Common activation functions
Activation functions give a network its nonlinearity, which is critical for modeling complex relationships between inputs and outputs. Without a nonlinear activation function, the network can only express a simple linear mapping: no matter how many hidden layers it has, the whole network is equivalent to a single-layer neural network. Only after a nonlinear activation function is added does a deep neural network gain its remarkable ability to learn nonlinear mappings. Activation functions can be applied at multiple layers in the network.
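To make this concrete, here is a small sketch (not from the original article; the weight matrices W1 and W2 are made up for illustration) showing that two stacked linear layers with no activation in between collapse into a single linear layer:

import numpy as np

# Hypothetical weights for two purely linear layers
W1 = np.random.randn(4, 3)
W2 = np.random.randn(3, 2)

x = np.random.randn(5, 4)            # a batch of 5 input vectors

# Two stacked linear layers without a nonlinear activation in between
y_two_layers = (x @ W1) @ W2

# A single linear layer with the combined weight matrix is equivalent
y_one_layer = x @ (W1 @ W2)

print(np.allclose(y_two_layers, y_one_layer))   # True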
Sigmoid activation function
Sigmoid is one of the most widely used activation functions. It maps any real number to the interval (0, 1) and can therefore be used in binary classification problems.
The sigmoid function is defined as follows:
$sigmoid(x) = \frac{1}{1 + e^{-x}}$
Use Python to implement this function:
import numpy as np

def sigmoid(x):
    # Map any real-valued input to the interval (0, 1)
    return 1 / (1 + np.exp(-x))
The graph of the function is shown below. You can see that its shape resembles the letter S, which is why it is also called an S-shaped growth curve:
- Advantages of the sigmoid function: smooth and easy to differentiate.
- Disadvantages of the sigmoid function: the derivative used in backpropagation involves division, so it is relatively expensive to compute, and the gradient vanishes easily during backpropagation, which limits the training of deep networks.
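To illustrate why the gradient vanishes, here is a sketch (added for illustration, reusing the sigmoid function defined above): the derivative of sigmoid is sigmoid(x) * (1 - sigmoid(x)), which shrinks toward zero as |x| grows.

def sigmoid_grad(x):
    # Derivative of the sigmoid function: sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    print(x, sigmoid_grad(x))
# 0.0  -> 0.25
# 2.0  -> about 0.105
# 5.0  -> about 0.0066
# 10.0 -> about 0.000045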
Tanh activation function
Tanh is a hyperbolic function and an improvement on the sigmoid activation function. It is a symmetric function centered on zero, and its values lie in the open interval (-1, 1). The tanh activation function is computed as follows:
$tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
Use Python to implement this function:
def tanh(x):
    # Hyperbolic tangent: zero-centered, values in (-1, 1)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
The graph of the function is shown below. It is an odd, monotonically increasing function whose values lie in the open interval (-1, 1), and its graph is symmetric about the origin:
- Advantages of the tanh function: as an improvement on the sigmoid function, it converges faster and the loss value does not oscillate.
- Disadvantages of the tanh function: it does not solve the vanishing-gradient problem, and because it is computed from exponentials it is relatively expensive to evaluate.
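As a quick numerical check of the relationship between the two functions (added here for illustration, reusing the sigmoid and tanh implementations above), tanh is just a shifted and rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1.

x = np.linspace(-5, 5, 11)

# tanh is a rescaled, zero-centered sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
print(np.allclose(tanh(x), 2 * sigmoid(2 * x) - 1))   # True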
ReLU activation function
The Rectified Linear Unit (ReLU) activation function is an excellent substitute for the sigmoid and tanh activation functions, and it is one of the most important breakthroughs in the field of deep learning. The ReLU activation function is computed as follows:
$ReLU(x) = \max(0, x)$
Use Python to implement this function:
def relu(x):
    # Output the input unchanged when it is >= 0, otherwise output 0
    return np.where(x > 0, x, 0)
The graph of the function is shown below. When the input is greater than or equal to 0, ReLU outputs it unchanged; when the input is less than 0, ReLU outputs 0. Because the derivative of ReLU is a fixed constant over the positive part and 0 over the negative part, it is much faster to train models with the ReLU function.
- Advantages of the ReLU function: there is no vanishing-gradient problem, the computational cost is very low, and convergence is much faster than with the sigmoid and tanh functions.
- Disadvantages of the ReLU function: when the gradient is too large, a weight can become negative after the update; for negative inputs the derivative of the ReLU function is always zero, so that weight's gradient is no longer updated. This is known as the dying ReLU problem.
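A minimal sketch of the dying ReLU effect (added for illustration; relu_grad is a hypothetical helper, and relu is the implementation above): once a unit's pre-activation values are negative, its gradient is zero and it can no longer be updated.

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise
    return np.where(x > 0, 1.0, 0.0)

z = np.array([-3.0, -0.5, 0.5, 3.0])   # example pre-activation values
print(relu(z))        # [0.   0.   0.5  3. ]
print(relu_grad(z))   # [0. 0. 1. 1.] -> the first two units receive no gradient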
Linear activation function
Linear activation simply outputs the input value as is: $linear(x) = x$. Use Python to implement this function:
def linear(x):
    # Identity activation: output the input unchanged
    return x
This function is only used in the output layer of a neural network that solves a regression problem. Note that linear activations are not used in hidden layers, since stacking linear layers still yields a linear mapping.
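As a sketch of how this is typically arranged (the layer sizes and weight matrices below are made up for illustration, reusing the relu and linear functions defined above), a small regression network uses a nonlinear hidden layer and a linear output layer:

# Hypothetical weights: 3 inputs -> 4 hidden units -> 1 regression output
W_hidden = np.random.randn(3, 4)
W_out = np.random.randn(4, 1)

def regression_forward(x):
    h = relu(x @ W_hidden)        # nonlinear hidden layer
    return linear(h @ W_out)      # linear activation only at the output layer

x = np.random.randn(2, 3)         # a batch of 2 samples
print(regression_forward(x))      # unbounded real-valued predictions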
Softmax activation function
Typically, softmax is applied just before the neural network outputs its final result. Softmax is often used to determine the probability that the input belongs to one of n possible output categories. Suppose we are classifying digit images into one of 10 possible categories (the digits 0 through 9). In this case there are 10 output values, each representing the probability that the input image belongs to a particular category. The softmax activation function is computed as follows:
$softmax(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$
Softmax activation provides a probability value for each category in the output, where $i$ is the index of the output. Use Python to implement this function:
def softmax(x):
    # Convert a vector of scores into probabilities that sum to 1
    return np.exp(x) / np.sum(np.exp(x))
The softmax function generally acts as the last layer of a neural network: it takes the values produced by the previous layer and converts them into probabilities. For example, suppose we want to classify an image whose label might be apple, banana, lemon, or pear. If the last layer of the network outputs the values [1.0, 2.0, 3.0, 4.0], then after the softmax function they become [0.0320586, 0.08714432, 0.23688282, 0.64391426].
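That example can be checked directly with the softmax implementation above:

scores = np.array([1.0, 2.0, 3.0, 4.0])
probs = softmax(scores)
print(probs)          # [0.0320586  0.08714432 0.23688282 0.64391426]
print(probs.sum())    # 1.0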