Why do we need activation functions?
Let's use the following neural network for the analysis:
For the figure above, we know the layer shapes: $X^{4 \times 1}$, $H^{5 \times 1}$, $O^{7 \times 1}$ (the input, hidden, and output vectors).
Calculation between each layer: $H = W_1 X + b_1$ and $O = W_2 H + b_2$.
Combining the two formulas above: $O = W_2(W_1 X + b_1) + b_2 = W_2 W_1 X + W_2 b_1 + b_2$.
Now look at the matrix operation: $W_2 W_1$ is itself just a $7 \times 4$ matrix, and $W_2 b_1 + b_2$ is just a $7 \times 1$ vector, so we can write $W = W_2 W_1$ and $b = W_2 b_1 + b_2$.
Substituting back into the formula, it becomes $O = WX + b$. In that case this multi-layer neural network is pointless: since the layers can be merged, why not just write a single layer directly? Sounds great! But multi-layer models can do things a single layer cannot, so how do we keep the layers from being trivially merged into one? That is exactly what the activation function is for.
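To make this concrete, here is a minimal NumPy sketch (not from the original post; the variable names and random values are only for illustration) that checks the two linear layers really do collapse into a single one, using the shapes from the figure:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 1))                              # input, 4x1
W1, b1 = rng.standard_normal((5, 4)), rng.standard_normal((5, 1))
W2, b2 = rng.standard_normal((7, 5)), rng.standard_normal((7, 1))

# Two "layers" with no activation function in between
H = W1 @ X + b1
O_two_layers = W2 @ H + b2

# The equivalent single layer: W = W2 W1 (7x4), b = W2 b1 + b2 (7x1)
W = W2 @ W1
b = W2 @ b1 + b2
O_one_layer = W @ X + b

print(np.allclose(O_two_layers, O_one_layer))  # True: the layers merge
```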
1. ReLU function (Rectified Linear Unit)
Formula: $\operatorname{ReLU}(x) = \max(x, 0)$
The ReLU function preserves only positive elements and discards all negative elements by setting the corresponding activation value to 0.
The graph of the ReLU function looks like this:
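For reference, a minimal NumPy implementation of the formula above might look like this (the function name and sample inputs are just for illustration):

```python
import numpy as np

def relu(x):
    # Keep positive elements, set negative elements to 0
    return np.maximum(x, 0)

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(relu(x))  # [0. 0. 0. 1. 3.]
```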
2. Sigmoid function
Sigmoid was the first activation function I came across when I started studying. In Andrew Ng's course he explains in detail the benefits of using sigmoid; I took notes at the time, which you can find here: Logistic Regression | Logistic Regression – juejin.cn
Formula: $\operatorname{sigmoid}(x) = \frac{1}{1 + \exp(-x)}$
For any input in the domain $\mathbb{R}$, the sigmoid function transforms it into an output on the interval (0, 1). For this reason, sigmoid is often called a squashing function: it compresses an arbitrary input in the range $(-\infty, \infty)$ to a value in the range (0, 1).
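A minimal NumPy sketch of the formula above (the function name and sample inputs are just for illustration) shows this squashing behavior:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the interval (0, 1)
    return 1 / (1 + np.exp(-x))

x = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])
print(sigmoid(x))  # approaches 0 and 1 at the extremes, exactly 0.5 at x = 0
```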
3. Tanh function
Formula: $\operatorname{tanh}(x) = \frac{1 - \exp(-2x)}{1 + \exp(-2x)}$
Similar to the sigmoid function, the tanh (hyperbolic tangent) function compresses its input into the interval (-1, 1).
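And a matching NumPy sketch of the tanh formula above (again, the names and sample inputs are just for illustration); NumPy's built-in `np.tanh` gives the same result:

```python
import numpy as np

def tanh(x):
    # Squashes input into the interval (-1, 1); equivalent to np.tanh(x)
    return (1 - np.exp(-2 * x)) / (1 + np.exp(-2 * x))

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(tanh(x))     # approaches -1 and 1 at the extremes, exactly 0 at x = 0
print(np.tanh(x))  # NumPy's built-in agrees
```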