Common activation functions
Activation functions give a network its nonlinearity, which is critical for modeling complex relationships between inputs and outputs. Without a nonlinear activation function, the network can only express a simple linear mapping: no matter how many hidden layers it has, the whole network is equivalent to a single-layer neural network. Only after a nonlinear activation function is added does a deep neural network gain its remarkable ability to learn nonlinear mappings. Activation functions can be applied at multiple layers in the network.
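To make this concrete, here is a small sketch (not from the original article; the weight matrices W1 and W2 are made up for illustration) showing that two stacked linear layers with no activation in between collapse into a single linear layer:

import numpy as np

# Hypothetical weights for two purely linear layers
W1 = np.random.randn(4, 3)
W2 = np.random.randn(3, 2)

x = np.random.randn(5, 4)            # a batch of 5 input vectors

# Two stacked linear layers without a nonlinear activation in between
y_two_layers = (x @ W1) @ W2

# A single linear layer with the combined weight matrix is equivalent
y_one_layer = x @ (W1 @ W2)

print(np.allclose(y_two_layers, y_one_layer))   # True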
Sigmoid activation function
Sigmoid is one of the most widely used activation functions. It maps any real number to the interval (0, 1) and can therefore be used in binary classification problems.
The sigmoid function is defined as follows:
$sigmoid(x) = \frac{1}{1 + e^{-x}}$
Use Python to implement this function:
import numpy as np

def sigmoid(x):
    # Map any real-valued input to the interval (0, 1)
    return 1 / (1 + np.exp(-x))
The graph of the function is shown below. You can see that its shape resembles the letter S, which is why it is also called an S-shaped growth curve:
- Advantages of the sigmoid function: smooth and easy to differentiate.
- Disadvantages of the sigmoid function: the derivative used in backpropagation involves division, so it is relatively expensive to compute, and the gradient vanishes easily during backpropagation, which limits the training of deep networks.
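To illustrate why the gradient vanishes, here is a sketch (added for illustration, reusing the sigmoid function defined above): the derivative of sigmoid is sigmoid(x) * (1 - sigmoid(x)), which shrinks toward zero as |x| grows.

def sigmoid_grad(x):
    # Derivative of the sigmoid function: sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    print(x, sigmoid_grad(x))
# 0.0  -> 0.25
# 2.0  -> about 0.105
# 5.0  -> about 0.0066
# 10.0 -> about 0.000045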
Tanh activation function
Tanh is a hyperbolic function and an improvement on the sigmoid activation function. It is a symmetric function centered on zero, and its values lie in the open interval (-1, 1). The tanh activation function is computed as follows:
$tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
Use Python to implement this function:
def tanh(x):
    # Hyperbolic tangent: zero-centered, values in (-1, 1)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
The graph of the function is shown below. It is an odd, monotonically increasing function whose values lie in the open interval (-1, 1), and its graph is symmetric about the origin:
- Advantages of the tanh function: as an improvement on the sigmoid function, it converges faster and the loss value does not oscillate.
- Disadvantages of the tanh function: it does not solve the vanishing-gradient problem, and because it is computed from exponentials it is relatively expensive to evaluate.
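As a quick numerical check of the relationship between the two functions (added here for illustration, reusing the sigmoid and tanh implementations above), tanh is just a shifted and rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1.

x = np.linspace(-5, 5, 11)

# tanh is a rescaled, zero-centered sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
print(np.allclose(tanh(x), 2 * sigmoid(2 * x) - 1))   # True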
ReLU activation function
The Rectified Linear Unit (ReLU) activation function is an excellent substitute for the sigmoid and tanh activation functions, and it is one of the most important breakthroughs in the field of deep learning. The ReLU activation function is computed as follows:
$ReLU(x) = \max(0, x)$
Use Python to implement this function:
def relu(x):
    # Output the input unchanged when it is >= 0, otherwise output 0
    return np.where(x > 0, x, 0)
The graph of the function is shown below. When the input is greater than or equal to 0, ReLU outputs it unchanged; when the input is less than 0, ReLU outputs 0. Because the derivative of ReLU is a fixed constant over the positive part and 0 over the negative part, it is much faster to train models with the ReLU function.
- Advantages of the ReLU function: there is no vanishing-gradient problem, the computational cost is very low, and convergence is much faster than with the sigmoid and tanh functions.
- Disadvantages of the ReLU function: when the gradient is too large, a weight can become negative after the update; for negative inputs the derivative of the ReLU function is always zero, so that weight's gradient is no longer updated. This is known as the dying ReLU problem.
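A minimal sketch of the dying ReLU effect (added for illustration; relu_grad is a hypothetical helper, and relu is the implementation above): once a unit's pre-activation values are negative, its gradient is zero and it can no longer be updated.

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise
    return np.where(x > 0, 1.0, 0.0)

z = np.array([-3.0, -0.5, 0.5, 3.0])   # example pre-activation values
print(relu(z))        # [0.   0.   0.5  3. ]
print(relu_grad(z))   # [0. 0. 1. 1.] -> the first two units receive no gradient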
Linear activation function
Linear activation simply outputs the input value as is: $linear(x) = x$. Use Python to implement this function:
def linear(x):
    # Identity activation: output the input unchanged
    return x
This function is only used in the output layer of a neural network that solves a regression problem. Note that linear activations are not used in hidden layers, since stacking linear layers still yields a linear mapping.
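As a sketch of how this is typically arranged (the layer sizes and weight matrices below are made up for illustration, reusing the relu and linear functions defined above), a small regression network uses a nonlinear hidden layer and a linear output layer:

# Hypothetical weights: 3 inputs -> 4 hidden units -> 1 regression output
W_hidden = np.random.randn(3, 4)
W_out = np.random.randn(4, 1)

def regression_forward(x):
    h = relu(x @ W_hidden)        # nonlinear hidden layer
    return linear(h @ W_out)      # linear activation only at the output layer

x = np.random.randn(2, 3)         # a batch of 2 samples
print(regression_forward(x))      # unbounded real-valued predictions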
Softmax activation function
Typically, softmax is applied just before the neural network outputs its final result. Softmax is often used to determine the probability that the input belongs to one of n possible output categories. Suppose we are classifying digit images into one of 10 possible categories (the digits 0 through 9). In this case there are 10 output values, each representing the probability that the input image belongs to a particular category. The softmax activation function is computed as follows:
$softmax(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$
Softmax activation provides a probability value for each category in the output, where $i$ is the index of the output. Use Python to implement this function:
def softmax(x):
    # Convert a vector of scores into probabilities that sum to 1
    return np.exp(x) / np.sum(np.exp(x))
The softmax function generally acts as the last layer of a neural network: it takes the values produced by the previous layer and converts them into probabilities. For example, suppose we want to classify an image whose label might be apple, banana, lemon, or pear. If the last layer of the network outputs the values [1.0, 2.0, 3.0, 4.0], then after the softmax function they become [0.0320586, 0.08714432, 0.23688282, 0.64391426].
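That example can be checked directly with the softmax implementation above:

scores = np.array([1.0, 2.0, 3.0, 4.0])
probs = softmax(scores)
print(probs)          # [0.0320586  0.08714432 0.23688282 0.64391426]
print(probs.sum())    # 1.0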