This article introduces the traditional three-layer neural network model. First, the neural unit is introduced: a single neural unit can be regarded as a logistic regression model, so a neural network can be seen as logistic regression extended in width and depth. Then, forward propagation is presented as the evaluation of a composite function whose final loss depends on the learning target. Finally, back propagation is the process of differentiating that composite function by the chain rule. Of course, the three-layer neural network is only the prototype of deep learning, which has since grown into a vast field.
Author | loevinger
Editor | yuquanle
Three-layer neural network
A. Neural unit
The development of deep learning is usually divided into three stages: perceptron → three-layer neural network → deep learning (representation learning). Because the perceptron is a linear model, it cannot solve the XOR problem and its representation ability is limited. For this reason, the three-layer neural network gives up the clean interpretability of the perceptron and introduces a nonlinear activation function to increase the representation ability of the model. There are two differences between the three-layer neural network and the perceptron:
1) A nonlinear activation function is introduced, which enables the model to solve nonlinear problems.
2) After the activation function is introduced, the perceptron's loss is no longer applicable; the logarithmic loss is adopted instead, which makes the three-layer neural network look like a composition of three layers of multivariate (neural unit) logistic regressions.
Each neuron in the network can be seen as a logistic regression model, and a three-layer neural network is a three-layer composition of such logistic regression models. Unlike logistic regression, which has only a single unit, the input and hidden layers generally contain multiple neurons, while the output layer corresponds to a logistic regression unit, a softmax unit, or a linear regression model.
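To make the "one neuron = one logistic regression" picture concrete, here is a minimal sketch in plain C++ (independent of the Matrix class used in the code section later; the weights w, input x, and bias b are illustrative values, not taken from the article's data):

#include <cmath>
#include <iostream>
#include <vector>

// A single neural unit: z = w.x + b followed by a sigmoid -- exactly a logistic regression.
double neuron(const std::vector<double>& w, const std::vector<double>& x, double b)
{
    double z = b;
    for (size_t i = 0; i < w.size(); ++i)
        z += w[i] * x[i];                        // linear part
    return 1.0 / (1.0 + std::exp(-z));           // nonlinear activation (sigmoid)
}

int main()
{
    std::vector<double> w = {0.5, -0.3};
    std::vector<double> x = {1.0, 2.0};
    std::cout << neuron(w, x, 0.1) << std::endl; // a value in (0, 1)
    return 0;
}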
Here is a brief introduction to some commonly used nonlinear activation functions (sigmoid, tanh, ReLU), their properties, and their derivatives:
Properties: these nonlinear activation functions can be regarded as approximations of the perceptron's step function, and an important reason for adopting such approximations is to obtain usable derivatives. The smooth sigmoid and tanh functions were often used in early work, but their derivatives are extremely small at both ends, which makes the gradient vanish when training multi-layer networks and makes training difficult. The ReLU function largely avoids the problem of tiny derivatives at the ends and is one way to address the vanishing-gradient problem.
Derivatives: sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)); tanh'(z) = 1 - tanh(z)^2; ReLU'(z) = 1 for z > 0 and 0 for z < 0.
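As a small, self-contained sketch of the three activations and the derivatives listed above (plain C++, element-wise on a scalar z):

#include <algorithm>
#include <cmath>

// Common activations and their derivatives.
double sigmoid(double z)       { return 1.0 / (1.0 + std::exp(-z)); }
double sigmoid_deriv(double z) { double a = sigmoid(z); return a * (1.0 - a); }

double tanh_deriv(double z)    { double a = std::tanh(z); return 1.0 - a * a; }

double relu(double z)          { return std::max(0.0, z); }
double relu_deriv(double z)    { return z > 0.0 ? 1.0 : 0.0; }  // derivative at z = 0 taken as 0 by convention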
B. Forward propagation
Forward propagation is the evaluation of a composite function: each neuron composes a linear function with a nonlinear activation. In the notation below, the superscript [l] marks the layer, so a^[L] denotes the output layer.

Vector form (one sample, with a^[0] = x):

z^[l] = W^[l] a^[l-1] + b^[l],    a^[l] = g^[l](z^[l])

Matrix form (m samples stacked as columns, with A^[0] = X):

Z^[l] = W^[l] A^[l-1] + b^[l],    A^[l] = g^[l](Z^[l])
Note, however, that each layer contains more than one neuron, so the vectors of logistic regression are expanded into matrices with one row per neuron (it is precisely because there are multiple neurons that the network can extract features). The nonlinear function g can be chosen from the activations introduced above; at present the ReLU function has certain advantages.
It is also worth being clear about what the rows and columns of each matrix mean. Deep learning usually uses one column per sample, so with n^[l] neurons in layer l and m samples, the sizes of the matrices in the network are: A^[l] and Z^[l] are n^[l] x m, W^[l] is n^[l] x n^[l-1], and b^[l] is n^[l] x 1 (broadcast across the m columns).
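A hedged sketch of one matrix-form forward step with these shapes, using plain nested vectors rather than the article's Matrix class (the function name forward_layer and the data layout are illustrative):

#include <algorithm>
#include <cmath>
#include <vector>

using Mat = std::vector<std::vector<double>>;   // row-major: Mat[row][col]

// One layer of forward propagation: Z = W * A_prev + b (b broadcast over columns), A = g(Z).
// W is n x n_prev, A_prev is n_prev x m, the result A is n x m.
Mat forward_layer(const Mat& W, const Mat& A_prev, const std::vector<double>& b, bool use_relu)
{
    size_t n = W.size(), n_prev = A_prev.size(), m = A_prev[0].size();
    Mat A(n, std::vector<double>(m, 0.0));
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < m; ++j) {
            double z = b[i];
            for (size_t k = 0; k < n_prev; ++k)
                z += W[i][k] * A_prev[k][j];
            A[i][j] = use_relu ? std::max(0.0, z)             // hidden layers: ReLU
                               : 1.0 / (1.0 + std::exp(-z));  // output layer: sigmoid
        }
    return A;
}

Chaining such calls layer by layer, with ReLU for the hidden layers and a sigmoid at the output, reproduces the forward pass described above.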
The loss function again uses the logarithmic loss (binary classification):

L = -(1/m) * sum_{i=1..m} [ y_i * log(a_i) + (1 - y_i) * log(1 - a_i) ]

where a_i is the output-layer activation for sample i and y_i is its 0/1 label.
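A small sketch of this logarithmic loss, averaged over the m samples (the vectors a and y hold the output-layer activations and the 0/1 labels):

#include <cmath>
#include <vector>

// Binary cross-entropy: L = -(1/m) * sum_i [ y_i*log(a_i) + (1-y_i)*log(1-a_i) ]
double log_loss(const std::vector<double>& a, const std::vector<double>& y)
{
    double loss = 0.0;
    for (size_t i = 0; i < a.size(); ++i)
        loss += y[i] * std::log(a[i]) + (1.0 - y[i]) * std::log(1.0 - a[i]);
    return -loss / a.size();
}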
C. Back propagation
Since a neural network is a multi-layer composite function and forward propagation evaluates that composite function, back propagation is a chain-rule differentiation process: it determines the negative gradient direction for every parameter, and gradient descent is then used to adjust the parameters of each layer.
1. Loss function: for one sample, dL/da = -( y/a - (1 - y)/(1 - a) ).
2. Activation function: da/dz = g'(z); for the sigmoid output this is a(1 - a), so at the output layer dz = a - y.
3. Linear function: differentiating the loss with respect to each variable directly gives dW^[l] = (1/m) * dZ^[l] (A^[l-1])^T, db^[l] = (1/m) * sum over the columns of dZ^[l], and dA^[l-1] = (W^[l])^T dZ^[l].
It is worth noting that the activation function acts element-wise, so its derivative does not involve matrix calculus. For the linear function, because it acts on several samples at once, dW and db are averaged over the samples when determining the negative gradient direction, while the derivative dA^[l-1] passed back to the previous layer is not averaged.
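Putting these pieces together, here is a hedged sketch of the backward step for one layer plus a plain gradient-descent update, again with nested vectors rather than the article's Matrix class (dZ is assumed to already include the activation derivative, i.e. dZ = dA * g'(Z) element-wise; the names backward_layer and sgd_update are illustrative):

#include <vector>

using Mat = std::vector<std::vector<double>>;

// Backward step for one layer: dW = (1/m) dZ A_prev^T, db = (1/m) row-sums of dZ,
// dA_prev = W^T dZ. dW and db are averaged over the m samples; dA_prev is not.
void backward_layer(const Mat& dZ, const Mat& A_prev, const Mat& W,
                    Mat& dW, std::vector<double>& db, Mat& dA_prev)
{
    size_t n = dZ.size(), m = dZ[0].size(), n_prev = A_prev.size();
    dW.assign(n, std::vector<double>(n_prev, 0.0));
    db.assign(n, 0.0);
    dA_prev.assign(n_prev, std::vector<double>(m, 0.0));
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < m; ++j) {
            db[i] += dZ[i][j] / m;                        // averaged bias gradient
            for (size_t k = 0; k < n_prev; ++k) {
                dW[i][k] += dZ[i][j] * A_prev[k][j] / m;  // averaged weight gradient
                dA_prev[k][j] += W[i][k] * dZ[i][j];      // passed to the previous layer, not averaged
            }
        }
}

// Gradient-descent update: W -= lr * dW, b -= lr * db.
void sgd_update(Mat& W, std::vector<double>& b, const Mat& dW,
                const std::vector<double>& db, double lr)
{
    for (size_t i = 0; i < W.size(); ++i) {
        b[i] -= lr * db[i];
        for (size_t k = 0; k < W[i].size(); ++k)
            W[i][k] -= lr * dW[i][k];
    }
}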
Code practice
int trainDNN()
{
    Matrix x;
    char file[20] = "data\\train.txt";
    x.LoadData(file);                          // load the training data, one sample per row
    x = x.transposeMatrix();                   // transpose so that one column = one sample
    cout << "x,y" << endl;
    cout << "----------------------------------------------" << endl;
    Matrix y;
    cout << x.row << endl;
    y = x.getOneRow(x.row - 1);                // the last row holds the labels
    x.deleteOneRow(x.row - 1);                 // remove the label row from the features
    y = one_hot(y, 2);                         // one-hot encode the two classes
    cout << x.row << "*" << x.col << endl;
    cout << y.row << "*" << y.col << endl;

    // Hyper-parameters. Note that the assignments inside the DNN(...) call below are
    // ordinary C++ assignment expressions, so they overwrite the values set here.
    const char *initialization = "he";
    double learn_rateing = 0.1;
    int iter = 1000;
    double lambd = 0.1;
    double keep_prob = 0.5;
    bool print_cost = true;
    const char *optimizer = "gd";
    int mini_batch_size = 64;
    double beta1 = 0.9;
    double beta2 = 0.999;
    double epsilon = 0.00000001;

    DNN(x, y, optimizer = "gd", learn_rateing = 0.001, initialization = "he", lambd = 0.001, keep_prob = 1,
        mini_batch_size = 64, beta1 = 0.9, beta2 = 0.999, epsilon = 0.00000001, iter = 5000, print_cost = true);
    predict(x, y);
    return 0;
}
int DNN(Matrix X, Matrix Y, const char *optimizer, double learn_rateing, const char *initialization, double lambd, double keep_prob,
        int mini_batch_size, double beta1, double beta2, double epsilon, int iter, bool print_cost)
{
    /** Initialize the network structure and parameters **/
    int i = 0, k = 0;
    int lay_dim = 3;                                     // three layers: input, hidden, output
    int lay_n[3] = {0, 3, 1};                            // neurons per layer; the input size is filled in below
    lay_n[0] = X.row;
    string lay_active[3] = {"relu", "relu", "sigmoid"};
    sup_par.layer_dims = lay_dim;
    for (i = 0; i < lay_dim; i++)
    {
        sup_par.layer_n[i] = lay_n[i];
        sup_par.layer_active[i] = lay_active[i];
    }
    init_parameters(X, initialization);                  // e.g. "he" initialization

    double loss;
    Matrix AL(Y.row, Y.col, 0, "ss");
    double *keep_probs;
    if (keep_prob == 1)                                  // no dropout: keep every unit
    {
        keep_probs = new double[sup_par.layer_dims];
        for (k = 0; k < sup_par.layer_dims; k++)
            keep_probs[k] = 1;
    }
    else if (keep_prob < 1)                              // dropout on the hidden layer only
    {
        keep_probs = new double[sup_par.layer_dims];
        for (k = 0; k < sup_par.layer_dims; k++)
        {
            if (k == 0 || k == sup_par.layer_dims - 1)
                keep_probs[k] = 1;                       // never drop input or output units
            else
                keep_probs[k] = keep_prob;               // hidden units are kept with probability keep_prob
        }
    }

    for (i = 0; i < iter; i++)
    {
        AL = model_forward(X, keep_probs);               // forward propagation
        loss = cost_cumpter(AL, Y, lambd);               // loss, with L2 regularization weight lambd
        if (i % 100 == 0)
            cout << "loss=" << loss << endl;
        model_backword(AL, Y, lambd, keep_probs);        // back propagation
        updata_parameters(learn_rateing, i + 1, optimizer, beta1, beta2, epsilon);  // parameter update
    }
    predict(X, Y);
    return 0;
}
int predict(Matrix X, Matrix Y)
{
    int i, k;
    parameters *p;
    p = &par;                                   // 'par' is assumed to be the global parameters struct holding the trained weights
    p->A = X.copyMatrix();
    Matrix AL;
    double *keep_probs = new double[sup_par.layer_dims];
    for (k = 0; k < sup_par.layer_dims; k++)
        keep_probs[k] = 1;                      // no dropout at prediction time
    AL = model_forward(X, keep_probs);
    for (i = 0; i < Y.col; i++)                 // threshold the sigmoid output at 0.5
    {
        if (AL.data[0][i] > 0.5)
            AL.data[0][i] = 1;
        else
            AL.data[0][i] = 0;
    }
    double pre = 0;
    for (i = 0; i < Y.col; i++)                 // count correct predictions
    {
        if ((AL.data[0][i] == 1 && Y.data[0][i] == 1) || (AL.data[0][i] == 0 && Y.data[0][i] == 0))
            pre += 1;
    }
    pre /= Y.col;                               // accuracy
    cout << "pre=" << pre << endl;
    return 0;
}
The End