This article was originally published by **Luo Zhouyang ([email protected])** on the author's blog: stupidme.me/2018/08/25/…

Back propagation is the cornerstone of deep learning.

Derivative

So let's first review the derivative. For a single-variable function $f(x)$, the derivative is defined as

$$\frac{df(x)}{dx} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

For a function of several variables, the derivative with respect to each individual variable is called a partial derivative.

For example, for the function $f(x, y) = xy$, we have $\frac{\partial f}{\partial x} = y$ and, at the same time, $\frac{\partial f}{\partial y} = x$.

A gradient is simply the vector of partial derivatives. In the example above, $\nabla f = \left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right] = [y, x]$.
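As a quick sanity check, partial derivatives can be compared against finite-difference approximations. Below is a minimal sketch of my own (the product function and the evaluation point are just illustrative choices, not from the original article):

def f(x, y):
    return x * y  # a simple example function

h = 1e-5          # small step for the finite-difference approximation
x, y = 3.0, -4.0  # arbitrary evaluation point

df_dx = (f(x + h, y) - f(x, y)) / h  # should be close to y = -4
df_dy = (f(x, y + h) - f(x, y)) / h  # should be close to x = 3

print(df_dx, df_dy)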

The chain rule

For simple functions, we can compute the derivative directly from the basic rules. For complicated functions, however, it is not easy to write down the derivative directly, and that is where the chain rule comes in.

Without further definition, let’s do an example, just to get a feel for the magic of the chain rule.

Take the familiar sigmoid function $\sigma(x) = \frac{1}{1 + e^{-x}}$. If you can't remember its derivative, how do we work it out?

The solution steps are as follows:

  • Modularize the function into basic parts, and for each part you can take the derivative using simple rules
  • Using the chain rule, link these derivatives together, and compute the final derivative

Details are as follows:

Let $p = -x$, then $\dfrac{dp}{dx} = -1$

Let $q = e^{p}$, then $\dfrac{dq}{dp} = e^{p}$

Let $r = 1 + q$, then $\dfrac{dr}{dq} = 1$

Let $s = \dfrac{1}{r}$, then $\dfrac{ds}{dr} = -\dfrac{1}{r^{2}}$

This last variable $s$ is exactly our $\sigma(x)$, so according to the chain rule:

$$\frac{d\sigma(x)}{dx} = \frac{ds}{dr} \cdot \frac{dr}{dq} \cdot \frac{dq}{dp} \cdot \frac{dp}{dx} = \left(-\frac{1}{r^{2}}\right) \cdot 1 \cdot e^{p} \cdot (-1) = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}} = \sigma(x)\bigl(1 - \sigma(x)\bigr)$$

The derivative of the sigmoid function can be directly expressed in terms of itself, which is also a wonderful property. Isn’t that easy to do?
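To convince ourselves, we can compare the formula $\sigma'(x) = \sigma(x)(1 - \sigma(x))$ with a numerical estimate of the derivative. This is a minimal sketch of my own (the test point and step size are arbitrary choices, not from the original article):

import math

def sigmoid(x):
    return 1.0 / (1 + math.exp(-x))

x = 0.5   # arbitrary test point
h = 1e-6  # finite-difference step

analytic = sigmoid(x) * (1 - sigmoid(x))               # sigma(x) * (1 - sigma(x))
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # centered difference

print(analytic, numeric)  # the two values should agree to several decimal places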

Backpropagation code implementation

Now that we understand derivatives and the chain rule, how do we write the code for forward propagation and back propagation?

Let's use a slightly more complicated example this time:

$$f(x, y) = \frac{x + \sigma(y)}{\sigma(x) + (x + y)^{2}}$$
Let’s take a look at its forward pass code:

import math

x = 3
y = -4

sigy = 1.0 / (1 + math.exp(-y))   # sigmoid of y
num = x + sigy                    # numerator
sigx = 1.0 / (1 + math.exp(-x))   # sigmoid of x
xpy = x + y
xpy_sqr = xpy**2
den = sigx + xpy_sqr              # denominator
invden = 1.0 / den                # 1 / denominator
f = num * invden                  # final function value

It’s pretty simple, isn’t it, to take complex functions and break them down into simple functions.

Let’s take a look at the backpropagation process:

dnum = invden  # gradient of f with respect to num

Because

$$f = \text{num} \cdot \text{invden}$$

we have

$$\frac{\partial f}{\partial\, \text{num}} = \text{invden}$$

That is,

$$d\text{num} = \frac{\partial f}{\partial\, \text{num}} = \text{invden}$$
dinvden = num  # in the same way, since f = num * invden, df/dinvden = num

dden = (-1.0 / (den**2)) * dinvden  # chain rule: dinvden/dden = -1/den^2

To expand:

$$\frac{\partial f}{\partial\, \text{den}} = \frac{\partial f}{\partial\, \text{invden}} \cdot \frac{\partial\, \text{invden}}{\partial\, \text{den}}$$

and since

$$\text{invden} = \frac{1}{\text{den}}, \qquad \frac{\partial\, \text{invden}}{\partial\, \text{den}} = -\frac{1}{\text{den}^{2}}$$

so

$$d\text{den} = \left(-\frac{1}{\text{den}^{2}}\right) \cdot d\text{invden}$$
So, by the same token, we can write all the derivatives:

# backprop den = sigx + xpy_sqr
dsigx = (1) * dden
dxpy_sqr = (1) * dden

# backprop xpy_sqr = xpy**2
dxpy = (2 * xpy) * dxpy_sqr

# backprop xpy = x + y
dx = (1) * dxpy
dy = (1) * dxpy

# backprop sigx = 1.0 / (1 + math.exp(-x))
# note "+=" instead of "=": x already received a gradient above, so we accumulate
dx += ((1 - sigx) * sigx) * dsigx  # dsigma(x) = (1 - sigma(x)) * sigma(x)

# backprop num = x + sigy
dx += (1) * dnum
dsigy = (1) * dnum

# backprop sigy = 1.0 / (1 + math.exp(-y))
dy += ((1 - sigy) * sigy) * dsigy  # note "+=" again
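A good habit after writing backpropagation by hand is to verify the analytic gradients against numerical ones. The sketch below is an addition of mine, not part of the original article; it wraps the forward pass in a helper function (here called forward) and compares finite-difference estimates with the dx and dy computed above:

def forward(x, y):
    # same forward pass as above, wrapped in a function for convenience
    sigy = 1.0 / (1 + math.exp(-y))
    sigx = 1.0 / (1 + math.exp(-x))
    num = x + sigy
    den = sigx + (x + y)**2
    return num / den

h = 1e-5
dx_numeric = (forward(x + h, y) - forward(x, y)) / h
dy_numeric = (forward(x, y + h) - forward(x, y)) / h

print(dx, dx_numeric)  # the analytic and numerical gradients should be close
print(dy, dy_numeric)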

Question:

  • Why use “+=” instead of “=” in the above calculation?

If a variable, such as x or y here, appears in more than one place in the forward expression, be careful during back propagation: use += instead of = so that the gradients flowing back into that variable accumulate; plain = would overwrite one contribution with another. This follows the multivariate chain rule, which says that when a variable branches into different parts of the computation, the gradients arriving along those branches should be added up on the way back.
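As a tiny, self-contained illustration of why the gradients must be accumulated (my own example, independent of the code above), consider f(x) = x * x, where x feeds into the multiplication twice:

x = 3.0

# forward: f = a * b with a = x and b = x (x is used twice)
a = x
b = x
f = a * b

# backward: each use of x produces its own gradient contribution
da = b  # df/da = b
db = a  # df/db = a

dx = 0.0
dx += da * 1.0  # contribution from the branch a = x
dx += db * 1.0  # contribution from the branch b = x

print(dx)  # 6.0, which matches d(x^2)/dx = 2x at x = 3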

To contact me

  • Email: [email protected]

  • WeChat: luozhouyang0528

  • Personal WeChat official account, which you may be interested in: