Why do we need activation functions?
Let's use the following neural network for the analysis:
For the figure above, we know the layer shapes: $X^{4 \times 1}$, $H^{5 \times 1}$, $O^{7 \times 1}$ (the input, hidden, and output vectors).
Calculation between each layer: $H = W_1 X + b_1$ and $O = W_2 H + b_2$.
Combining the two formulas above: $O = W_2(W_1 X + b_1) + b_2 = W_2 W_1 X + W_2 b_1 + b_2$.
Now look at the matrix operation: $W_2 W_1$ is itself just a $7 \times 4$ matrix, and $W_2 b_1 + b_2$ is just a $7 \times 1$ vector, so we can write $W = W_2 W_1$ and $b = W_2 b_1 + b_2$.
Substituting back into the formula, it becomes $O = WX + b$. In that case this multi-layer neural network is pointless: since the layers can be merged, why not just write a single layer directly? Sounds great! But multi-layer models can do things a single layer cannot, so how do we keep the layers from being trivially merged into one? That is exactly what the activation function is for.
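To make this concrete, here is a minimal NumPy sketch (not from the original post; the variable names and random values are only for illustration) that checks the two linear layers really do collapse into a single one, using the shapes from the figure:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 1))                              # input, 4x1
W1, b1 = rng.standard_normal((5, 4)), rng.standard_normal((5, 1))
W2, b2 = rng.standard_normal((7, 5)), rng.standard_normal((7, 1))

# Two "layers" with no activation function in between
H = W1 @ X + b1
O_two_layers = W2 @ H + b2

# The equivalent single layer: W = W2 W1 (7x4), b = W2 b1 + b2 (7x1)
W = W2 @ W1
b = W2 @ b1 + b2
O_one_layer = W @ X + b

print(np.allclose(O_two_layers, O_one_layer))  # True: the layers merge
```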
1. ReLU function (Rectified Linear Unit)
Formula: $\operatorname{ReLU}(x) = \max(x, 0)$
The ReLU function preserves only positive elements and discards all negative elements by setting the corresponding activation value to 0.
The graph of the ReLU function looks like this:
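For reference, a minimal NumPy implementation of the formula above might look like this (the function name and sample inputs are just for illustration):

```python
import numpy as np

def relu(x):
    # Keep positive elements, set negative elements to 0
    return np.maximum(x, 0)

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(relu(x))  # [0. 0. 0. 1. 3.]
```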
2. Sigmoid function
Sigmoid was the first activation function I came across when I started studying. In Andrew Ng's course he explains in detail the benefits of using sigmoid; I took notes at the time, which you can find here: Logistic Regression | Logistic Regression – juejin.cn
Formula: $\operatorname{sigmoid}(x) = \frac{1}{1 + \exp(-x)}$
For any input in the domain $\mathbb{R}$, the sigmoid function transforms it into an output on the interval (0, 1). For this reason, sigmoid is often called a squashing function: it compresses an arbitrary input in the range $(-\infty, \infty)$ to a value in the range (0, 1).
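A minimal NumPy sketch of the formula above (the function name and sample inputs are just for illustration) shows this squashing behavior:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the interval (0, 1)
    return 1 / (1 + np.exp(-x))

x = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])
print(sigmoid(x))  # approaches 0 and 1 at the extremes, exactly 0.5 at x = 0
```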
3. Tanh function
Formula: $\operatorname{tanh}(x) = \frac{1 - \exp(-2x)}{1 + \exp(-2x)}$
Similar to the sigmoid function, the tanh (hyperbolic tangent) function compresses its input into the interval (-1, 1).
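And a matching NumPy sketch of the tanh formula above (again, the names and sample inputs are just for illustration); NumPy's built-in `np.tanh` gives the same result:

```python
import numpy as np

def tanh(x):
    # Squashes input into the interval (-1, 1); equivalent to np.tanh(x)
    return (1 - np.exp(-2 * x)) / (1 + np.exp(-2 * x))

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(tanh(x))     # approaches -1 and 1 at the extremes, exactly 0 at x = 0
print(np.tanh(x))  # NumPy's built-in agrees
```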