Produced by: The Cabin | Author: Peter | Editor: Peter
Machine Learning — Neural Networks Learning
This article builds on the previous section on neural networks and covers:
- Neural network cost function
- The backpropagation algorithm and its interpretation
- Gradient checking
- Summary of neural networks
Neural network cost function
Parameter notation
The notation used for each parameter is explained below:
- $m$: the number of training samples
- $x$, $y$: the input and output variables
- $L$: the number of layers in the neural network
- $S_l$: the number of neurons in layer $l$
- $S_L$: the number of neurons in the output layer
Classification cases
There are two main cases: binary classification and multi-class classification.
- Binary classification: $S_L = 1$, $y = 0$ or $1$; the output is a single real number
- $K$-class classification: $S_L = K$, and $y_i = 1$ indicates that the example belongs to class $i$; the output is a $K$-dimensional vector
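As a small illustration of the multi-class label representation, the sketch below one-hot encodes integer class labels into $K$-dimensional vectors (the variable names and example values are made up for this sketch):

```python
import numpy as np

# Hypothetical labels for a K = 4 class problem (classes 0..3).
y = np.array([0, 2, 3, 1])
K = 4

# Each label becomes a K-dimensional vector with a 1 in the position of its class.
Y = np.eye(K)[y]
print(Y)
# [[1. 0. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 0. 0. 1.]
#  [0. 1. 0. 0.]]
```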
Cost function
Cost function in logistic regression (LR):
$$J\left(\theta\right) = -\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)}\log h_\theta\left(x^{(i)}\right) + \left(1-y^{(i)}\right)\log\left(1-h_\theta\left(x^{(i)}\right)\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^{2}$$
In logistic regression there is only one output variable, a scalar. In a neural network, however, there are multiple output variables: $h_\theta(x)$ is a $K$-dimensional vector.
Denote the $i$-th output as:
$$h_\theta\left(x\right) \in \mathbb{R}^{K}, \qquad \left(h_\theta\left(x\right)\right)_{i} = i^{\text{th}}\ \text{output}$$
The cost function $J$ is expressed as (a code sketch of this computation follows the explanation below):

$$J(\Theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{k=1}^{K} y_k^{(i)}\log\left(h_\Theta(x^{(i)})\right)_k + \left(1-y_k^{(i)}\right)\log\left(1-\left(h_\Theta(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{ji}^{(l)}\right)^2$$
Explanation:
- The cost function measures the error between the algorithm's predictions and the actual values
- For each row of features there are $K$ predictions, computed in a loop over the outputs
- The prediction with the highest probability among the $K$ predictions is selected and compared with the actual data $y$
- The regularization term is the sum of the squared entries of the $\theta$ matrix of every layer, after excluding the bias terms $\theta_0$
- The index $j$ (determined by the number of activation units in layer $l+1$, i.e. $s_{l+1}$) loops through all the rows, and the index $i$ (determined by the number of activation units in layer $l$, i.e. $s_l$) loops through all the columns
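To make the formula and the points above concrete, here is a minimal sketch of the cost computation for a 3-layer network; the names `nn_cost`, `Theta1`, `Theta2` are assumptions for this sketch, not part of the original text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(Theta1, Theta2, X, Y, lam):
    """J(Theta) for a 3-layer network; X is (m, n), Y is (m, K) one-hot labels."""
    m = X.shape[0]

    # Forward propagation, adding a bias column of ones before each layer.
    a1 = np.hstack([np.ones((m, 1)), X])
    a2 = np.hstack([np.ones((m, 1)), sigmoid(a1 @ Theta1.T)])
    h = sigmoid(a2 @ Theta2.T)                      # (m, K) predictions

    # Unregularized cost: sum over the m examples and the K outputs.
    J = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m

    # Regularization: squares of all weights, excluding the bias columns.
    reg = (lam / (2 * m)) * (np.sum(Theta1[:, 1:] ** 2) + np.sum(Theta2[:, 1:] ** 2))
    return J + reg
```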
Backpropagation Algorithm
To compute the partial derivatives of the cost function, $\frac{\partial J(\Theta)}{\partial \Theta_{ij}^{(l)}}$, a neural network uses the backpropagation algorithm:
- First calculate the error of the last layer
- Then find the error of each layer, working backwards, until the second layer is reached
Forward propagation example
Suppose we have a sample of data:
The neural network has 4 layers, where $K = S_L = L = 4$.
Forward propagation is calculated from the input layer to the output layer in the order of the neural network.
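A minimal sketch of forward propagation for such a network, assuming sigmoid activations and a list of weight matrices `Thetas` (the names are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Thetas):
    """Propagate a single example x through the network, layer by layer."""
    activations = []
    a = x
    for Theta in Thetas:
        a = np.concatenate([[1.0], a])   # prepend the bias unit a_0 = 1
        activations.append(a)
        a = sigmoid(Theta @ a)           # a^(l+1) = g(Theta^(l) a^(l))
    activations.append(a)                # the last activation is h_theta(x)
    return activations
```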
Backpropagation example
- Start by computing the error of the last layer: the error is the difference between the predictions of the activation units, $a^{(4)}$, and the actual value $y^{(k)}$, i.e. $\delta^{(4)} = a^{(4)} - y$
- $\delta$ is used to represent the error: error = the model's predicted value − the true value
- The error of the previous layer (layer 3) is $\delta^{(3)} = \left(\Theta^{(3)}\right)^{T}\delta^{(4)} \ast g'\left(z^{(3)}\right)$, where $g'\left(z^{(3)}\right)$ is the derivative of the S-shaped (sigmoid) function; its specific expression is $g'\left(z^{(3)}\right) = a^{(3)} \ast \left(1 - a^{(3)}\right)$
- The error of the layer before that (layer 2) is $\delta^{(2)} = \left(\Theta^{(2)}\right)^{T}\delta^{(3)} \ast g'\left(z^{(2)}\right)$
- The first layer consists of the input variables, so it has no error
- Assuming $\lambda = 0$, i.e. no regularization is used, the partial derivatives are $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = a_{j}^{(l)}\delta_{i}^{(l+1)}$ (a code sketch of these computations follows the notation explanation below)
The meanings of the superscripts and subscripts in the formula above are:
- $l$ indicates which layer it is
- $j$ is the subscript of the activation unit in the layer being computed
- $i$ is the subscript of the error unit
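As a sketch of how these errors could be computed for the 4-layer example above (the helper names are assumptions; `a2`, `a3`, `a4` are the activations from a forward pass, with bias units prepended to `a2` and `a3`):

```python
import numpy as np

def sigmoid_gradient(a):
    # g'(z) written in terms of the activation a = g(z):  g'(z) = a * (1 - a)
    return a * (1 - a)

def backward_deltas(a2, a3, a4, y, Theta2, Theta3):
    delta4 = a4 - y                                               # output-layer error
    delta3 = (Theta3.T @ delta4)[1:] * sigmoid_gradient(a3[1:])   # drop the bias entry
    delta2 = (Theta2.T @ delta3)[1:] * sigmoid_gradient(a2[1:])
    return delta2, delta3, delta4                                 # layer 1 has no error
```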
Algorithm
- Use forward propagation to compute the activation units of every layer
- Compute the error of the last layer from the true labels of the training set and the predictions of the neural network
- Finally, use backpropagation to compute the errors of all the layers down to the second layer
The errors are accumulated with $\Delta^{(l)}_{ij} := \Delta^{(l)}_{ij} + a^{(l)}_{j}\delta^{(l+1)}_{i}$
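Putting the algorithm together, here is a minimal sketch of accumulating $\Delta^{(l)}$ over all training examples and averaging to obtain the gradients, assuming $\lambda = 0$ as in the text; `forward` is the forward-propagation sketch above, and the other names are hypothetical:

```python
import numpy as np

def backprop_gradients(Thetas, X, Y):
    """Accumulate Delta^(l) over the m examples and average to get the gradients."""
    m = X.shape[0]
    Deltas = [np.zeros_like(T) for T in Thetas]

    for x, y in zip(X, Y):
        acts = forward(x, Thetas)                 # activations a^(1) .. a^(L)
        delta = acts[-1] - y                      # delta^(L) = a^(L) - y
        for l in range(len(Thetas) - 1, -1, -1):  # walk backwards through the layers
            a = acts[l]
            Deltas[l] += np.outer(delta, a)       # Delta^(l) += delta^(l+1) * a^(l)^T
            if l > 0:                             # the input layer has no error
                delta = (Thetas[l].T @ delta)[1:] * a[1:] * (1 - a[1:])

    return [D / m for D in Deltas]                # D^(l) = (1/m) * Delta^(l)
```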
Intuitive understanding of backpropagation
Principle of forward propagation
- 2 input units; 2 hidden layers (not including bias units); 1 output unit
- The subscript $i$ indicates which layer, and the subscript $j$ indicates which feature or attribute
Note: there is a small error in the figure; check the bottom-right corner of the screenshot!
According to the conclusions of the backpropagation method above:
Backpropagation principle
Unrolling parameters
The formulas above show how backpropagation is used to compute the derivatives of the cost function. This part introduces how to unroll the parameters from matrix form into vector form.
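A small sketch of what this unrolling could look like in code; the shapes below are made up, and only the ravel/reshape pattern matters:

```python
import numpy as np

# Hypothetical weight-matrix shapes for a 4-layer network.
shapes = [(5, 4), (5, 6), (4, 6)]
Thetas = [np.random.randn(*s) for s in shapes]

# Unroll all matrices into one long vector (what a generic optimizer expects).
theta_vec = np.concatenate([T.ravel() for T in Thetas])

# Inside the cost function, reshape the vector back into the matrices.
def reshape_params(vec, shapes):
    mats, start = [], 0
    for rows, cols in shapes:
        mats.append(vec[start:start + rows * cols].reshape(rows, cols))
        start += rows * cols
    return mats
```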
Gradient checking
How do we solve for the derivative at a point?
How do we take the derivative of the cost function with respect to some parameter $\theta$?
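The usual answer is the two-sided difference approximation $\frac{\partial}{\partial\theta} J(\theta) \approx \frac{J(\theta+\varepsilon)-J(\theta-\varepsilon)}{2\varepsilon}$. Below is a minimal sketch of checking every parameter this way; the function and variable names are hypothetical:

```python
import numpy as np

EPS = 1e-4

def numerical_gradient(J, theta):
    """Approximate dJ/dtheta for each parameter with a two-sided difference."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        plus, minus = theta.copy(), theta.copy()
        plus[i] += EPS
        minus[i] -= EPS
        grad[i] = (J(plus) - J(minus)) / (2 * EPS)
    return grad

# Usage: compare against the backpropagation gradient; the two should agree to
# several decimal places. Turn the check off before actual training, since it
# is far too slow to run on every iteration.
# assert np.allclose(numerical_gradient(cost, theta_vec), backprop_grad, atol=1e-7)
```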
Summary of neural networks
The first task
When constructing a neural network, the first thing to consider is how to choose the network structure: how many layers, and how many units in each layer.
- The number of units in the first layer is the number of features in our training set.
- The number of units in the last layer is the number of classes in our training set's output.
- If there is more than one hidden layer, make sure every hidden layer has the same number of units. In general, the more hidden units the better.
Steps for training a neural network
- Randomly initialize the parameters
- Compute all the $h_{\theta}(x)$ values with forward propagation
- Write the code that computes the cost function $J$
- Compute all the partial derivatives with backpropagation
- Verify these partial derivatives with numerical gradient checking
- Use an optimization algorithm to minimize the cost function
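An end-to-end sketch tying these steps together, reusing the hypothetical helpers (`forward`, `backprop_gradients`, `reshape_params`) sketched in the earlier sections; none of the names or shapes come from the original text:

```python
import numpy as np
from scipy.optimize import minimize

shapes = [(5, 4), (5, 6), (4, 6)]          # assumed Theta shapes for a 4-layer network

def objective(theta_vec, X, Y):
    Thetas = reshape_params(theta_vec, shapes)
    h = np.array([forward(x, Thetas)[-1] for x in X])                      # step 2: predictions
    J = -np.mean(np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h), axis=1))  # step 3: cost
    grads = backprop_gradients(Thetas, X, Y)                               # step 4: derivatives
    return J, np.concatenate([g.ravel() for g in grads])

# Step 1: random initialization with small values to break the symmetry.
init = np.concatenate([((np.random.rand(r, c) - 0.5) * 0.24).ravel() for r, c in shapes])

# Steps 5-6: verify the gradients numerically once, then minimize the cost.
# res = minimize(objective, init, args=(X, Y), jac=True, method="L-BFGS-B")
```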