To avoid overflow in the exp function, we use the following trick:
Suppose the input is a vector \[\bold{x} = [x_0, x_1, \dots, x_n]\]. Then the original softmax function is \[\left[\frac{e^{x_0}}{\sum_j{e^{x_j}}}, \dots, \frac{e^{x_n}}{\sum_j{e^{x_j}}}\right]\]. When some entry of \[\bold{x}\] is extremely big, the exp function overflows (and you get inf).
Therefore, in practice we divide both the numerator and the denominator of the softmax function by \[e^{x_{max}}\] to avoid overflow:
\[\frac{e^{x_0}}{\sum_{j}e^{x_j}}= \frac{e^{x_0-x_{max}}}{\sum_{j}e^{x_j-x_{max}}}\]
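For example, with \[\bold{x} = [1000, 1001]\], computing \[e^{1001}\] directly overflows IEEE double precision (anything above roughly \[e^{709}\] does), but after subtracting \[x_{max} = 1001\] the terms become \[e^{-1}\] and \[e^{0}\], and the result \[[\frac{e^{-1}}{e^{-1}+1}, \frac{1}{e^{-1}+1}] \approx [0.269, 0.731]\] is exactly the same as the original softmax.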
You can test this with the code below. When the input contains no extreme values, softmax and softmax_origin return the same values; with an extreme input, softmax_origin fails (it returns nan with an overflow warning), while softmax still works.
```python
import numpy as np
```
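A minimal, self-contained sketch of such a comparison (the function bodies and the test inputs below are illustrative; only the names softmax_origin and softmax come from the description above):

```python
import numpy as np

def softmax_origin(x):
    # Naive softmax: exponentiate directly, then normalise.
    e = np.exp(x)
    return e / np.sum(e)

def softmax(x):
    # Stable softmax: subtract the maximum entry before exponentiating.
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

# Moderate input: both versions agree.
x = np.array([1.0, 2.0, 3.0])
print(softmax_origin(x))   # [0.09003057 0.24472847 0.66524096]
print(softmax(x))          # same values

# Extreme input: the naive version overflows (inf / inf gives nan),
# while the stabilised version still returns a valid distribution.
x_big = np.array([1000.0, 1001.0, 1002.0])
print(softmax_origin(x_big))  # [nan nan nan]
print(softmax(x_big))         # [0.09003057 0.24472847 0.66524096]
```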
Implementation in CS224N Assignment 2:
```python
import numpy as np

def softmax(x):
    """Compute the softmax for each row of the input x.

    Arguments:
    x -- A D dimensional vector or N x D dimensional numpy matrix.
    """
    orig_shape = x.shape

    if len(x.shape) > 1:
        # Matrix: subtract each row's max, exponentiate, normalise row-wise.
        tmp = np.max(x, axis=1)
        x -= tmp.reshape((x.shape[0], 1))
        x = np.exp(x)
        tmp = np.sum(x, axis=1)
        x /= tmp.reshape((x.shape[0], 1))
    else:
        # Vector: subtract the max, exponentiate, normalise.
        tmp = np.max(x)
        x -= tmp
        x = np.exp(x)
        tmp = np.sum(x)
        x /= tmp

    assert x.shape == orig_shape
    return x
```
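As a quick usage check (the example inputs below are illustrative, not part of the assignment code), the function accepts both a single vector and an N x D matrix of row vectors:

```python
import numpy as np

print(softmax(np.array([1.0, 2.0])))
# [0.26894142 0.73105858]

print(softmax(np.array([[1001.0, 1002.0],
                        [3.0, 4.0]])))
# [[0.26894142 0.73105858]
#  [0.26894142 0.73105858]]
```

Both rows give the same distribution because softmax depends only on the differences between entries, which is exactly why subtracting \[x_{max}\] does not change the result.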