Produced by: The Cabin | Author: Peter | Editor: Peter
Machine Learning — Neural Networks Learning
This article builds on the previous section on neural networks and covers:
- Neural network cost function
- The backpropagation algorithm and its interpretation
- Gradient checking
- Summary of neural networks
Neural network cost function
Parameter notation
The notation used for each parameter is explained below:
- $m$: the number of training samples
- $x$, $y$: the input and output variables
- $L$: the number of layers in the neural network
- $S_l$: the number of neurons in layer $l$
- $S_L$: the number of neurons in the output layer
Classification cases
There are two main cases: binary classification and multi-class classification.
- Binary classification: $S_L = 1$, $y = 0$ or $1$; the output is a single real number
- $K$-class classification: $S_L = K$, and $y_i = 1$ indicates that the example belongs to class $i$; the output is a $K$-dimensional vector
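As a small illustration of the multi-class label representation, the sketch below one-hot encodes integer class labels into $K$-dimensional vectors (the variable names and example values are made up for this sketch):

```python
import numpy as np

# Hypothetical labels for a K = 4 class problem (classes 0..3).
y = np.array([0, 2, 3, 1])
K = 4

# Each label becomes a K-dimensional vector with a 1 in the position of its class.
Y = np.eye(K)[y]
print(Y)
# [[1. 0. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 0. 0. 1.]
#  [0. 1. 0. 0.]]
```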
Cost function
Cost function in logistic regression (LR):
$$J\left(\theta\right) = -\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)}\log h_\theta\left(x^{(i)}\right) + \left(1-y^{(i)}\right)\log\left(1-h_\theta\left(x^{(i)}\right)\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^{2}$$
In logistic regression there is only one output variable, a scalar. In a neural network, however, there are multiple output variables: $h_\theta(x)$ is a $K$-dimensional vector.
Denote the $i$-th output as:
$$h_\theta\left(x\right) \in \mathbb{R}^{K}, \qquad \left(h_\theta\left(x\right)\right)_{i} = i^{\text{th}}\ \text{output}$$
The cost function $J$ is expressed as (a code sketch of this computation follows the explanation below):

$$J(\Theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{k=1}^{K} y_k^{(i)}\log\left(h_\Theta(x^{(i)})\right)_k + \left(1-y_k^{(i)}\right)\log\left(1-\left(h_\Theta(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{ji}^{(l)}\right)^2$$
Explanation:
- The cost function measures the error between the algorithm's predictions and the actual values
- For each row of features there are $K$ predictions, computed in a loop over the outputs
- The prediction with the highest probability among the $K$ predictions is selected and compared with the actual data $y$
- The regularization term is the sum of the squared entries of the $\theta$ matrix of every layer, after excluding the bias terms $\theta_0$
- The index $j$ (determined by the number of activation units in layer $l+1$, i.e. $s_{l+1}$) loops through all the rows, and the index $i$ (determined by the number of activation units in layer $l$, i.e. $s_l$) loops through all the columns
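To make the formula and the points above concrete, here is a minimal sketch of the cost computation for a 3-layer network; the names `nn_cost`, `Theta1`, `Theta2` are assumptions for this sketch, not part of the original text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(Theta1, Theta2, X, Y, lam):
    """J(Theta) for a 3-layer network; X is (m, n), Y is (m, K) one-hot labels."""
    m = X.shape[0]

    # Forward propagation, adding a bias column of ones before each layer.
    a1 = np.hstack([np.ones((m, 1)), X])
    a2 = np.hstack([np.ones((m, 1)), sigmoid(a1 @ Theta1.T)])
    h = sigmoid(a2 @ Theta2.T)                      # (m, K) predictions

    # Unregularized cost: sum over the m examples and the K outputs.
    J = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m

    # Regularization: squares of all weights, excluding the bias columns.
    reg = (lam / (2 * m)) * (np.sum(Theta1[:, 1:] ** 2) + np.sum(Theta2[:, 1:] ** 2))
    return J + reg
```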
Backpropagation Algorithm
To compute the partial derivatives of the cost function, $\frac{\partial J(\Theta)}{\partial \Theta_{ij}^{(l)}}$, a neural network uses the backpropagation algorithm:
- First calculate the error of the last layer
- Then find the error of each layer, working backwards, until the second layer is reached
Forward propagation example
Suppose we have a sample of data:
The neural network has 4 layers, where $K = S_L = L = 4$.
Forward propagation is calculated from the input layer to the output layer in the order of the neural network.
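A minimal sketch of forward propagation for such a network, assuming sigmoid activations and a list of weight matrices `Thetas` (the names are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Thetas):
    """Propagate a single example x through the network, layer by layer."""
    activations = []
    a = x
    for Theta in Thetas:
        a = np.concatenate([[1.0], a])   # prepend the bias unit a_0 = 1
        activations.append(a)
        a = sigmoid(Theta @ a)           # a^(l+1) = g(Theta^(l) a^(l))
    activations.append(a)                # the last activation is h_theta(x)
    return activations
```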
Backpropagation example
- Start by computing the error of the last layer: the error is the difference between the predictions of the activation units, $a^{(4)}$, and the actual value $y^{(k)}$, i.e. $\delta^{(4)} = a^{(4)} - y$
- $\delta$ is used to represent the error: error = the model's predicted value − the true value
- The error of the previous layer (layer 3) is $\delta^{(3)} = \left(\Theta^{(3)}\right)^{T}\delta^{(4)} \ast g'\left(z^{(3)}\right)$, where $g'\left(z^{(3)}\right)$ is the derivative of the S-shaped (sigmoid) function; its specific expression is $g'\left(z^{(3)}\right) = a^{(3)} \ast \left(1 - a^{(3)}\right)$
- The error of the layer before that (layer 2) is $\delta^{(2)} = \left(\Theta^{(2)}\right)^{T}\delta^{(3)} \ast g'\left(z^{(2)}\right)$
- The first layer consists of the input variables, so it has no error
- Assuming $\lambda = 0$, i.e. no regularization is used, the partial derivatives are $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = a_{j}^{(l)}\delta_{i}^{(l+1)}$ (a code sketch of these computations follows the notation explanation below)
The meanings of the superscripts and subscripts in the formula above are:
- $l$ indicates which layer it is
- $j$ is the subscript of the activation unit in the layer being computed
- $i$ is the subscript of the error unit
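As a sketch of how these errors could be computed for the 4-layer example above (the helper names are assumptions; `a2`, `a3`, `a4` are the activations from a forward pass, with bias units prepended to `a2` and `a3`):

```python
import numpy as np

def sigmoid_gradient(a):
    # g'(z) written in terms of the activation a = g(z):  g'(z) = a * (1 - a)
    return a * (1 - a)

def backward_deltas(a2, a3, a4, y, Theta2, Theta3):
    delta4 = a4 - y                                               # output-layer error
    delta3 = (Theta3.T @ delta4)[1:] * sigmoid_gradient(a3[1:])   # drop the bias entry
    delta2 = (Theta2.T @ delta3)[1:] * sigmoid_gradient(a2[1:])
    return delta2, delta3, delta4                                 # layer 1 has no error
```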
Algorithm
- Use forward propagation to compute the activation units of every layer
- Compute the error of the last layer from the true labels of the training set and the predictions of the neural network
- Finally, use backpropagation to compute the errors of all the layers down to the second layer
The errors are accumulated with $\Delta^{(l)}_{ij} := \Delta^{(l)}_{ij} + a^{(l)}_{j}\delta^{(l+1)}_{i}$
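Putting the algorithm together, here is a minimal sketch of accumulating $\Delta^{(l)}$ over all training examples and averaging to obtain the gradients, assuming $\lambda = 0$ as in the text; `forward` is the forward-propagation sketch above, and the other names are hypothetical:

```python
import numpy as np

def backprop_gradients(Thetas, X, Y):
    """Accumulate Delta^(l) over the m examples and average to get the gradients."""
    m = X.shape[0]
    Deltas = [np.zeros_like(T) for T in Thetas]

    for x, y in zip(X, Y):
        acts = forward(x, Thetas)                 # activations a^(1) .. a^(L)
        delta = acts[-1] - y                      # delta^(L) = a^(L) - y
        for l in range(len(Thetas) - 1, -1, -1):  # walk backwards through the layers
            a = acts[l]
            Deltas[l] += np.outer(delta, a)       # Delta^(l) += delta^(l+1) * a^(l)^T
            if l > 0:                             # the input layer has no error
                delta = (Thetas[l].T @ delta)[1:] * a[1:] * (1 - a[1:])

    return [D / m for D in Deltas]                # D^(l) = (1/m) * Delta^(l)
```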
Intuitive understanding of backpropagation
Principle of forward propagation
- 2 input units; 2 hidden layers (not including bias units); 1 output unit
- The subscript $i$ indicates which layer, and the subscript $j$ indicates which feature or attribute
Note: there is a small error in the figure; check the bottom-right corner of the screenshot!
According to the conclusions of the backpropagation method above:
Backpropagation principle
Unrolling parameters
The formulas above show how backpropagation is used to compute the derivatives of the cost function. This part introduces how to unroll the parameters from matrix form into vector form.
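A small sketch of what this unrolling could look like in code; the shapes below are made up, and only the ravel/reshape pattern matters:

```python
import numpy as np

# Hypothetical weight-matrix shapes for a 4-layer network.
shapes = [(5, 4), (5, 6), (4, 6)]
Thetas = [np.random.randn(*s) for s in shapes]

# Unroll all matrices into one long vector (what a generic optimizer expects).
theta_vec = np.concatenate([T.ravel() for T in Thetas])

# Inside the cost function, reshape the vector back into the matrices.
def reshape_params(vec, shapes):
    mats, start = [], 0
    for rows, cols in shapes:
        mats.append(vec[start:start + rows * cols].reshape(rows, cols))
        start += rows * cols
    return mats
```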
Gradient checking
How do we solve for the derivative at a point?
How do we take the derivative of the cost function with respect to some parameter $\theta$?
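The usual answer is the two-sided difference approximation $\frac{\partial}{\partial\theta} J(\theta) \approx \frac{J(\theta+\varepsilon)-J(\theta-\varepsilon)}{2\varepsilon}$. Below is a minimal sketch of checking every parameter this way; the function and variable names are hypothetical:

```python
import numpy as np

EPS = 1e-4

def numerical_gradient(J, theta):
    """Approximate dJ/dtheta for each parameter with a two-sided difference."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        plus, minus = theta.copy(), theta.copy()
        plus[i] += EPS
        minus[i] -= EPS
        grad[i] = (J(plus) - J(minus)) / (2 * EPS)
    return grad

# Usage: compare against the backpropagation gradient; the two should agree to
# several decimal places. Turn the check off before actual training, since it
# is far too slow to run on every iteration.
# assert np.allclose(numerical_gradient(cost, theta_vec), backprop_grad, atol=1e-7)
```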
Summary of neural networks
The first task
When constructing a neural network, the first thing to consider is how to choose the network structure: how many layers, and how many units in each layer.
- The number of units in the first layer is the number of features in our training set.
- The number of units in the last layer is the number of classes in our training set's output.
- If there is more than one hidden layer, make sure every hidden layer has the same number of units. In general, the more hidden units the better.
Steps for training a neural network
- Randomly initialize the parameters
- Compute all the $h_{\theta}(x)$ values with forward propagation
- Write the code that computes the cost function $J$
- Compute all the partial derivatives with backpropagation
- Verify these partial derivatives with numerical gradient checking
- Use an optimization algorithm to minimize the cost function
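An end-to-end sketch tying these steps together, reusing the hypothetical helpers (`forward`, `backprop_gradients`, `reshape_params`) sketched in the earlier sections; none of the names or shapes come from the original text:

```python
import numpy as np
from scipy.optimize import minimize

shapes = [(5, 4), (5, 6), (4, 6)]          # assumed Theta shapes for a 4-layer network

def objective(theta_vec, X, Y):
    Thetas = reshape_params(theta_vec, shapes)
    h = np.array([forward(x, Thetas)[-1] for x in X])                      # step 2: predictions
    J = -np.mean(np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h), axis=1))  # step 3: cost
    grads = backprop_gradients(Thetas, X, Y)                               # step 4: derivatives
    return J, np.concatenate([g.ravel() for g in grads])

# Step 1: random initialization with small values to break the symmetry.
init = np.concatenate([((np.random.rand(r, c) - 0.5) * 0.24).ravel() for r, c in shapes])

# Steps 5-6: verify the gradients numerically once, then minimize the cost.
# res = minimize(objective, init, args=(X, Y), jac=True, method="L-BFGS-B")
```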