This article sorts out the four activation functions that you will encounter when getting started with deep learning. Each of the four will be introduced in terms of its formula, code, and image. First, let's clarify which four they are:
- The Sigmoid function
- Tanh function
- ReLU function
- SoftMax function
The role of activation functions
Figure A below shows a linearly separable problem, which means that for the two types of points (blue and green), complete classification can be achieved with a straight line.
Of course, figure A is the most ideal and simplest binary classification problem, but in reality there are often very complex, linearly inseparable problems. For example, in figure B you cannot find any straight line that completely separates the blue points from the green points; you have to draw a closed curve around them.
The activation function is the nonlinear function that helps "draw" this closed curve. With activation functions, many algorithms become more powerful and can also handle linearly inseparable problems.
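To make the point concrete, here is a minimal sketch (an illustrative addition, not from the original article) showing that stacking two linear layers without an activation function collapses into a single linear map, while inserting a Sigmoid in between does not:
import numpy as np

np.random.seed(0)
x = np.random.randn(4, 3)                       # a small batch: 4 samples, 3 features
W1, W2 = np.random.randn(3, 5), np.random.randn(5, 2)

# Two linear layers with no activation are equivalent to one linear layer with weights W1 @ W2
print(np.allclose(x @ W1 @ W2, x @ (W1 @ W2)))  # True -> the stack is still linear

# Inserting a Sigmoid between the layers breaks this equivalence and adds nonlinearity
hidden = 1 / (1 + np.exp(-(x @ W1)))
output = hidden @ W2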
The Sigmoid function
The Sigmoid function, which was mentioned in the introduction to logistic regression, has the following mathematical expression:
$$\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$$
where $e$ is Napier's constant, whose value is approximately 2.7182…. The graph of the function looks like this:
Several characteristics can be observed from the graph:
- The range of the curve is (0, 1).
- When x = 0, the Sigmoid function value is 0.5.
- As x increases, the Sigmoid function value approaches 1.
- As x decreases, the Sigmoid function value approaches 0.
For gradient descent, how much a weight update changes the model depends largely on the gradient. An obvious weakness of the Sigmoid function is that when its value is very close to 0 or 1, the curve is nearly flat, so the gradient there is almost zero. This is very unfavorable for weight updates and slows down the convergence of the model.
The Sigmoid function is coded as follows:
import numpy as np

def sigmoid(x):
    # 1 / (1 + e^(-x)); np.exp is element-wise, so x can be a scalar or an array
    return 1 / (1 + np.exp(-x))
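As a quick check of the properties listed above (a usage sketch added for illustration, using the sigmoid function defined here), the snippet below also evaluates the derivative sigmoid(x) * (1 - sigmoid(x)); for large |x| it is nearly zero, which is exactly the saturation problem described earlier:
x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
y = sigmoid(x)
grad = y * (1 - y)          # derivative of the Sigmoid function
print(y)                    # values stay within (0, 1), with sigmoid(0) = 0.5
print(grad)                 # about 4.5e-05 at |x| = 10: the gradient has almost vanished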
Tanh function
The Tanh function is the hyperbolic tangent function, and its mathematical expression is:
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
The Tanh function is very similar to the Sigmoid function, as can be seen from the graph:
The two functions have this in common: when the input x is very large or very small, the output values barely change because the curves flatten out. The shared disadvantage is that the gradient there is very small, which is unfavorable for updating the weights. The difference is that the range of the Tanh function is (-1, 1), and when x is 0 the output is 0.
The code for Tanh is as follows:
import numpy as np

def tanh(x):
    # (e^x - e^(-x)) / (e^x + e^(-x)); use np.exp so it works element-wise on arrays
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
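A small usage sketch (an illustrative addition, relying on the sigmoid and tanh functions defined above) shows the (-1, 1) range, that tanh(0) = 0, and that both functions flatten out for large |x|:
x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(tanh(x))      # about [-0.9999, -0.7616, 0.0, 0.7616, 0.9999] -> range (-1, 1)
print(sigmoid(x))   # about [ 0.0067,  0.2689, 0.5, 0.7311, 0.9933] -> range (0, 1)
# NumPy's built-in np.tanh(x) gives the same result as the hand-written version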
ReLU function
ReLU is the rectified linear unit (also translated as the modified linear unit), and its mathematical expression is:
$$\mathrm{ReLU}(x) = \max(0, x)$$
ReLU is a piecewise function, and its graph is as follows:
The graph is easy to understand: if the input x is less than 0, the output is 0; if the input x is greater than 0, x is output directly. It should be noted that the ReLU function is not differentiable at x = 0 (although it is continuous there), but it can still be used as an activation function.
Compared with the Sigmoid and Tanh functions, the ReLU function has an obvious advantage: it converges quickly when used with gradient descent. When the input is positive, there is no gradient saturation problem, because the part greater than 0 is simply linear. This advantage has made ReLU one of the most widely used activation functions today.
The code for ReLU is as follows:
import numpy as np

def relu(x):
    # element-wise max(0, x): negative inputs become 0, positive inputs pass through unchanged
    return np.maximum(0, x)
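A quick usage sketch (added for illustration) of the piecewise behaviour on a small array:
x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))   # [0.  0.  0.  0.5 3. ] -> negatives are clipped to 0, positives are unchanged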
SoftMax function
Classification problems can be divided into binary classification and multi-class classification problems. The Sigmoid function is more suitable for binary classification, while the SoftMax function is more suitable for multi-class classification.
The mathematical expression of the SoftMax function is:
$$S_i = \frac{e^{V_i}}{\sum_{j=1}^{C} e^{V_j}}$$
where $V_i$ is the output of the classifier for category $i$, $i$ is the category index, $C$ is the total number of categories, and $S_i$ is the ratio of the exponential of the current element to the sum of the exponentials of all elements. In a nutshell, the SoftMax function scales the output values of multiple categories into relative probabilities, making the outputs easier to interpret and compare.
In order to prevent overflow or underflow when computing the SoftMax function, V is usually pre-processed numerically: the maximum value of V is subtracted from every element of V. With D = max(V), the expression of the SoftMax function becomes:
$$S_i = \frac{e^{V_i - D}}{\sum_{j=1}^{C} e^{V_j - D}}$$
Since the SoftMax function produces a probability distribution over all the inputs rather than a single curve, it is not shown as a graph. The code for the SoftMax function is as follows:
import numpy as np

def softmax(x):
    D = np.max(x)                 # subtract the maximum to avoid overflow in np.exp
    exp_x = np.exp(x - D)
    return exp_x / np.sum(exp_x)  # normalize so the outputs sum to 1
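A usage sketch (an illustrative addition): the outputs sum to 1 and can be read as relative probabilities, and thanks to the max-subtraction trick even very large inputs do not overflow:
scores = np.array([1.0, 2.0, 3.0])
probs = softmax(scores)
print(probs)             # [0.09003057 0.24472847 0.66524096]
print(np.sum(probs))     # 1.0 (up to floating-point rounding)

large = np.array([1000.0, 1001.0, 1002.0])
print(softmax(large))    # same probabilities; np.exp(1002) on its own would overflow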