
A Detail about Implementing Softmax Function

To avoid overflow in the exp function, we use the following trick:

Suppose the input \[\bold{x}\] is a vector \[\bold{x} = [x_0, x_1, \dots, x_n]\]. Then the original softmax function is \[[\frac{e^{x_0}}{\sum_j{e^{x_j}}}, \dots, \frac{e^{x_n}}{\sum_j{e^{x_j}}}]\]. When some entry of \[\bold{x}\] is extremely big, the exp function overflows (and you get an inf).
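As a quick illustration (a minimal sketch, assuming NumPy's default float64), np.exp already overflows once its argument exceeds roughly 709:

import numpy as np

print(np.exp(709.0))  # about 8.2e307, still representable in float64
print(np.exp(710.0))  # inf, with a RuntimeWarning about overflow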

Therefore, in practice we divide the numerator and denominator of the softmax function by \[e^{x_{max}}\] to avoid overflow:

\[\frac{e^{x_0}}{\sum_{j}e^{x_j}}= \frac{e^{x_0-x_{max}}}{\sum_{j}e^{x_j-x_{max}}}\]
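The identity holds because multiplying the numerator and denominator by the same constant \[e^{-x_{max}}\] does not change the ratio:

\[\frac{e^{x_i}}{\sum_{j}e^{x_j}} = \frac{e^{x_i}\cdot e^{-x_{max}}}{\sum_{j}e^{x_j}\cdot e^{-x_{max}}} = \frac{e^{x_i-x_{max}}}{\sum_{j}e^{x_j-x_{max}}}\]

After the shift, every exponent is at most 0, so the largest term is exactly 1 and nothing can overflow (tiny entries may underflow to 0, which is harmless).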

You can test with the following code. When the input is not extreme, softmax and softmax_origin return the same values. With an extreme input, softmax_origin fails, while softmax still works well.

import numpy as np

def softmax(x):
    # Numerically stable version: shift by the max before exponentiating
    max_ = np.max(x)
    x = x - max_
    x_exp = np.exp(x)
    x_sum = np.sum(x_exp)
    s = x_exp / x_sum
    return s

def softmax_origin(x):
    # Naive version: exponentiate directly, which may overflow
    x_exp = np.exp(x)
    x_sum = np.sum(x_exp)
    s = x_exp / x_sum
    return s

if __name__ == '__main__':
    x = np.array([0, 1, 2, 3])
    print("## Input x is ", x)
    print("original softmax: {}".format(softmax_origin(x)))
    print("minus max version: {}\n".format(softmax(x)))
    x = np.array([0, 1, 1000, 2000000])
    print("## Input x is ", x)
    print("original softmax: {}".format(softmax_origin(x)))
    print("minus max version: {}\n".format(softmax(x)))

Implementation in CS224N Assignment 2:

def softmax(x):
    """Compute softmax for each row of x.

    Arguments:
    x -- A D dimensional vector or N x D dimensional numpy matrix.
    """
    orig_shape = x.shape
    if len(x.shape) > 1:
        # Matrix: subtract the row-wise max, then normalize each row
        tmp = np.max(x, axis=1)
        x -= tmp.reshape((x.shape[0], 1))
        x = np.exp(x)
        tmp = np.sum(x, axis=1)
        x /= tmp.reshape((x.shape[0], 1))
    else:
        # Vector: subtract the max, then normalize
        tmp = np.max(x)
        x -= tmp
        x = np.exp(x)
        tmp = np.sum(x)
        x /= tmp
    assert x.shape == orig_shape
    return x
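As a quick sanity check (a small usage sketch, not part of the assignment code), calling it on a 2-D float array softmaxes each row independently, and every row sums to 1:

X = np.array([[1.0, 2.0, 3.0],
              [1000.0, 1001.0, 1002.0]])
probs = softmax(X)
print(probs)                # second row matches the first despite the huge inputs
print(probs.sum(axis=1))    # [1. 1.]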