To avoid overflow in the exp function, we use the following trick:
Suppose the input is a vector \[\bold{x} = [x_0, x_1, \dots, x_n]\]. Then the original softmax function is \[\left[\frac{e^{x_0}}{\sum_j{e^{x_j}}}, \dots, \frac{e^{x_n}}{\sum_j{e^{x_j}}}\right]\]. When some entry of \[\bold{x}\] is extremely big, the exp function overflows (and you get inf).
Therefore, in practice we divide both the numerator and the denominator of the softmax function by \[e^{x_{max}}\] to avoid overflow:
\[\frac{e^{x_0}}{\sum_{j}e^{x_j}}= \frac{e^{x_0-x_{max}}}{\sum_{j}e^{x_j-x_{max}}}\]
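For example, with \[\bold{x} = [1000, 1001]\], computing \[e^{1001}\] directly overflows IEEE double precision (anything above roughly \[e^{709}\] does), but after subtracting \[x_{max} = 1001\] the terms become \[e^{-1}\] and \[e^{0}\], and the result \[[\frac{e^{-1}}{e^{-1}+1}, \frac{1}{e^{-1}+1}] \approx [0.269, 0.731]\] is exactly the same as the original softmax.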
You can test this with the code below. When the input contains no extreme values, softmax and softmax_origin return the same values; with an extreme input, softmax_origin fails (it returns nan with an overflow warning), while softmax still works.
```python
import numpy as np
```
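A minimal, self-contained sketch of such a comparison (the function bodies and the test inputs below are illustrative; only the names softmax_origin and softmax come from the description above):

```python
import numpy as np

def softmax_origin(x):
    # Naive softmax: exponentiate directly, then normalise.
    e = np.exp(x)
    return e / np.sum(e)

def softmax(x):
    # Stable softmax: subtract the maximum entry before exponentiating.
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

# Moderate input: both versions agree.
x = np.array([1.0, 2.0, 3.0])
print(softmax_origin(x))   # [0.09003057 0.24472847 0.66524096]
print(softmax(x))          # same values

# Extreme input: the naive version overflows (inf / inf gives nan),
# while the stabilised version still returns a valid distribution.
x_big = np.array([1000.0, 1001.0, 1002.0])
print(softmax_origin(x_big))  # [nan nan nan]
print(softmax(x_big))         # [0.09003057 0.24472847 0.66524096]
```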
Implementation in CS224N Assignment 2:
```python
import numpy as np

def softmax(x):
    """Compute the softmax for each row of the input x.

    Arguments:
    x -- A D dimensional vector or N x D dimensional numpy matrix.
    """
    orig_shape = x.shape

    if len(x.shape) > 1:
        # Matrix: subtract each row's max, exponentiate, normalise row-wise.
        tmp = np.max(x, axis=1)
        x -= tmp.reshape((x.shape[0], 1))
        x = np.exp(x)
        tmp = np.sum(x, axis=1)
        x /= tmp.reshape((x.shape[0], 1))
    else:
        # Vector: subtract the max, exponentiate, normalise.
        tmp = np.max(x)
        x -= tmp
        x = np.exp(x)
        tmp = np.sum(x)
        x /= tmp

    assert x.shape == orig_shape
    return x
```
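As a quick usage check (the example inputs below are illustrative, not part of the assignment code), the function accepts both a single vector and an N x D matrix of row vectors:

```python
import numpy as np

print(softmax(np.array([1.0, 2.0])))
# [0.26894142 0.73105858]

print(softmax(np.array([[1001.0, 1002.0],
                        [3.0, 4.0]])))
# [[0.26894142 0.73105858]
#  [0.26894142 0.73105858]]
```

Both rows give the same distribution because softmax depends only on the differences between entries, which is exactly why subtracting \[x_{max}\] does not change the result.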