My main research topics include reinforcement learning, computer vision, deep learning, and machine learning. I want to share my notes and experience from the learning process. I look forward to your attention, and welcome everyone to learn and make progress together!
This article focuses on neural networks in detail: from the definition of a neural network to the M-P model, then to the single-layer perceptron and the multi-layer feedforward neural network, and finally to deep neural networks. (Some derivations are abbreviated; if anything is unclear, we can discuss it together in the WeChat group.)
Definition of a neural network
Different fields, and different people within them, call neural networks by slightly different names. The common names include:
- Neural Network
- Artificial Neural Network
- Artificial Neural Systems
- Neural Computer
- Adaptive Systems
- Adaptive Network
- Connectionism
The definition of a neural network varies widely from discipline to discipline. The most widely used definition is: a widely interconnected network of adaptive simple units organized in a way that mimics the responses of the biological nervous system to real-world objects.
M-P neuron model
In a neural network, the most basic information-processing unit is the M-P neuron model.
An M-P neuron receives input signals $x_1, x_2, \ldots, x_n$ from other neurons. The input signals are transmitted through weighted connections; the total weighted input received by the neuron is compared with the neuron's threshold $\theta$, and the result is processed by an activation function $f$ to produce the neuron's output:

$$y = f\left(\sum_{i=1}^{n} w_i x_i - \theta\right)$$

At this point, the so-called M-P model is essentially a generalized linear regression model.
The ideal activation function is the step function, but the step function is discontinuous and non-differentiable, so in practice the Sigmoid function $f(x) = \frac{1}{1 + e^{-x}}$ is generally used; it is explained in detail in the classical machine learning series on linear models and generalized linear models.
When the Sigmoid function is chosen as the activation function of the M-P neuron, the M-P model is in fact a logistic regression model.
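As a minimal sketch of this neuron (the weights, threshold, and inputs below are illustrative assumptions, not values from the text):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: maps any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def mp_neuron(x, w, theta):
    """M-P neuron: weighted sum of inputs, minus the threshold,
    passed through the activation function."""
    return sigmoid(np.dot(w, x) - theta)

# Illustrative example with three inputs
x = np.array([1.0, 0.5, -0.2])
w = np.array([0.4, 0.3, 0.9])
theta = 0.5
print(mp_neuron(x, w, theta))  # a value in (0, 1)
```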
Single-layer perceptron
The so-called single-layer perceptron is a neural network with only one layer of M-P neurons. It contains only two layers of neurons: an input layer and an output layer. The input layer receives external input signals and passes them to the output layer; the output layer consists of one or more M-P neurons that perform the activation processing.
Given a training data set, the weights $w_i$ and the threshold $\theta$ can be learned. The threshold $\theta$ can be regarded as the connection weight $w_{n+1}$ of a dummy node whose input is fixed to $-1$; in this way, weight learning and threshold learning are unified as weight learning.
Its learning rule is as follows: for a training sample $(\boldsymbol{x}, y)$, if the current output of the perceptron is $\hat{y}$, the perceptron weights are adjusted by:

$$w_i \leftarrow w_i + \Delta w_i, \qquad \Delta w_i = \eta \, (y - \hat{y}) \, x_i$$

The formula above is obtained by taking the partial derivative of the error with respect to the parameter $w_i$; $\eta \in (0, 1)$ is called the learning rate. As the equation shows, if the prediction is correct, i.e. $\hat{y} = y$, the error is 0 and the perceptron parameters are not adjusted; otherwise, the weights are adjusted in proportion to the error.
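As a hedged sketch of this rule (the step activation, data, and learning rate below are illustrative; the threshold is folded in as a dummy input, as described above):

```python
import numpy as np

def perceptron_train(X, y, eta=0.1, epochs=50):
    """Single-layer perceptron trained with the update rule
    w_i <- w_i + eta * (y - y_hat) * x_i."""
    X = np.hstack([X, -np.ones((X.shape[0], 1))])  # dummy input -1 for the threshold
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            y_hat = 1.0 if np.dot(w, xi) > 0 else 0.0  # step activation
            w += eta * (yi - y_hat) * xi               # zero adjustment when correct
    return w

# The linearly separable AND problem converges:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])
print(perceptron_train(X, y_and))
```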
For linearly separable problems, the perceptron will eventually converge. For a linearly non-separable problem such as XOR, the perceptron oscillates: it is hard to stabilize, and no proper solution can be obtained. In other words, the perceptron cannot solve the nonlinear XOR problem.
To solve non-linearly separable problems, we need multiple layers of functional neurons: a hidden layer is placed between the input layer and the output layer, and both the hidden-layer and output-layer neurons are functional neurons with activation functions.
Multilayer feedforward neural network
As hidden layers are added, the connections between neurons become more complex. Each layer of neurons is fully connected to the next layer, and there are no same-layer or cross-layer connections between neurons. Such a network structure is usually called a "multi-layer feedforward neural network."
"Multi-layer" refers to the existence of one or more hidden layers in addition to the input and output layers. "Feedforward" means that external signals flow from the input layer through the hidden layers to the output layer, with no signals propagating backward.
Similar to the M-P model, neurons in the input layer receive external input signals, the hidden and output layers process the signals, and the final result is output by the output-layer neurons.
What the neural network needs to do is adjust the connection weights between neurons and the threshold of each functional neuron according to the given training samples, so that the resulting model has some generalization ability on unseen samples.
BP solution of single hidden layer feedforward network
We take the single-hidden-layer feedforward network as an example. Since the network structure is fixed here, we do not discuss how to learn the structure itself, only the parameter-learning problem. The Back Propagation algorithm (BP algorithm for short) is adopted to solve for the model parameters. Neural network learning means adjusting the connection weights between neurons and the threshold of each M-P neuron according to the training data.
Given a training set $D = \{(\boldsymbol{x}_1, \boldsymbol{y}_1), (\boldsymbol{x}_2, \boldsymbol{y}_2), \ldots, (\boldsymbol{x}_m, \boldsymbol{y}_m)\}$ with $\boldsymbol{x}_i \in \mathbb{R}^d$ and $\boldsymbol{y}_i \in \mathbb{R}^l$, i.e. each input sample is described by $d$ attributes and the output is an $l$-dimensional real-valued vector. The network structure is as follows:
Assume the hidden layer contains $q$ neurons. The threshold of the $j$-th output-layer neuron is denoted by $\theta_j$, and the threshold of the $h$-th hidden-layer neuron is denoted by $\gamma_h$.
- The connection weight between the $i$-th input-layer neuron and the $h$-th hidden-layer neuron is $v_{ih}$.
- The connection weight between the $h$-th hidden-layer neuron and the $j$-th output-layer neuron is $w_{hj}$.
The input received by the $h$-th hidden-layer neuron is $\alpha_h = \sum_{i=1}^{d} v_{ih} x_i$;
The input received by the $j$-th output-layer neuron is $\beta_j = \sum_{h=1}^{q} w_{hj} b_h$, where $b_h$ is the output of the $h$-th hidden-layer neuron.
It is assumed that both the hidden-layer and output-layer neurons use the Sigmoid activation function. For a training example $(\boldsymbol{x}_k, \boldsymbol{y}_k)$, suppose the output of the neural network is $\hat{\boldsymbol{y}}_k = (\hat{y}_1^k, \hat{y}_2^k, \ldots, \hat{y}_l^k)$; that is, the predicted output of the network is:

$$\hat{y}_j^k = f(\beta_j - \theta_j)$$

Then the mean squared error of the network on the example $(\boldsymbol{x}_k, \boldsymbol{y}_k)$ is:

$$E_k = \frac{1}{2} \sum_{j=1}^{l} \left(\hat{y}_j^k - y_j^k\right)^2$$
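As a minimal numpy sketch of this forward pass and per-example error (all layer sizes and values below are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d, q, l = 4, 5, 3                        # illustrative layer sizes
rng = np.random.default_rng(0)
V = rng.normal(scale=0.1, size=(d, q))   # input-to-hidden weights v_ih
W = rng.normal(scale=0.1, size=(q, l))   # hidden-to-output weights w_hj
gamma = np.zeros(q)                      # hidden thresholds gamma_h
theta = np.zeros(l)                      # output thresholds theta_j

def forward(x):
    b = sigmoid(x @ V - gamma)           # b_h = f(alpha_h - gamma_h)
    y_hat = sigmoid(b @ W - theta)       # y_hat_j = f(beta_j - theta_j)
    return b, y_hat

x_k = rng.normal(size=d)
y_k = np.array([0.0, 1.0, 0.0])
_, y_hat = forward(x_k)
E_k = 0.5 * np.sum((y_hat - y_k) ** 2)   # squared error on this example
print(E_k)
```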
At this point, the mathematical model of our entire feedforward neural network has been established. Next, we need to solve for the parameter matrices of the network under the optimization goal of minimizing the error. Before doing so, let us count the parameters of the network: $d \times q$ weights from the input layer to the hidden layer; $q \times l$ weights from the hidden layer to the output layer; $q$ hidden-layer neuron thresholds; and $l$ output-layer neuron thresholds, for a total of $(d + l + 1) \times q + l$ parameters.
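For a concrete count (the sizes here are chosen purely for illustration): with $d = 4$ inputs, $q = 5$ hidden neurons, and $l = 3$ outputs, the network has $4 \times 5 + 5 \times 3 + 5 + 3 = 43$ parameters, which agrees with $(d + l + 1) \times q + l = (4 + 3 + 1) \times 5 + 3 = 43$.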
Then we use the BP algorithm to update the estimates of these parameters. The BP algorithm is based on the gradient descent strategy and adjusts the parameters in the direction of the negative gradient of the objective. For the error $E_k$ and a given learning rate $\eta$, we have:

$$\Delta w_{hj} = -\eta \frac{\partial E_k}{\partial w_{hj}}$$

Note that $w_{hj}$ first affects the input $\beta_j$ of the $j$-th output-layer neuron, then affects its output value $\hat{y}_j^k$, and finally affects the error $E_k$, so:

$$\frac{\partial E_k}{\partial w_{hj}} = \frac{\partial E_k}{\partial \hat{y}_j^k} \cdot \frac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \frac{\partial \beta_j}{\partial w_{hj}}$$

From the definition of $\beta_j$, we have:

$$\frac{\partial \beta_j}{\partial w_{hj}} = b_h$$

From the property of the Sigmoid function, $f'(x) = f(x)\big(1 - f(x)\big)$, we can obtain:

$$g_j = -\frac{\partial E_k}{\partial \hat{y}_j^k} \cdot \frac{\partial \hat{y}_j^k}{\partial \beta_j} = \hat{y}_j^k \big(1 - \hat{y}_j^k\big)\big(y_j^k - \hat{y}_j^k\big)$$

So we obtain the BP update formula for $w_{hj}$:

$$\Delta w_{hj} = \eta \, g_j \, b_h$$

Similarly, applying the chain rule, we can obtain:

$$\Delta \theta_j = -\eta \, g_j, \qquad \Delta v_{ih} = \eta \, e_h \, x_i, \qquad \Delta \gamma_h = -\eta \, e_h$$

where:

$$e_h = -\frac{\partial E_k}{\partial b_h} \cdot \frac{\partial b_h}{\partial \alpha_h} = b_h \big(1 - b_h\big) \sum_{j=1}^{l} w_{hj} \, g_j$$
At this point, the general process of solving the model parameters is completed.
Finally, to summarize: the working process of the BP-based single-hidden-layer feedforward neural network is as follows. For each training sample, the input is propagated forward through the input layer, the hidden layer, and the output layer; the error is computed from the output-layer result and then back-propagated to the hidden-layer neurons; finally, the connection weights and thresholds are adjusted according to the hidden-layer neurons' errors. The process iterates until a preset termination condition is reached (such as a set number of iterations, or the error falling below some value).
The pseudo-code of the algorithm is shown below.
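Since the original figure is not reproduced here, the following is a minimal Python sketch of this per-sample training loop under the assumptions above (the layer sizes, learning rate, and initialization ranges are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_train(X, Y, q=5, eta=0.1, epochs=1000):
    """Standard (per-sample) BP for a single-hidden-layer feedforward network.
    X: (m, d) inputs, Y: (m, l) targets, q: number of hidden neurons."""
    d, l = X.shape[1], Y.shape[1]
    rng = np.random.default_rng(0)
    V = rng.uniform(-0.5, 0.5, (d, q))   # input-to-hidden weights v_ih
    W = rng.uniform(-0.5, 0.5, (q, l))   # hidden-to-output weights w_hj
    gamma = np.zeros(q)                  # hidden thresholds
    theta = np.zeros(l)                  # output thresholds
    for _ in range(epochs):
        for x, y in zip(X, Y):
            # forward pass
            b = sigmoid(x @ V - gamma)       # hidden outputs b_h
            y_hat = sigmoid(b @ W - theta)   # network outputs
            # backward pass: output-layer and hidden-layer error terms
            g = y_hat * (1 - y_hat) * (y - y_hat)   # g_j
            e = b * (1 - b) * (W @ g)               # e_h
            # gradient-descent updates from the formulas above
            W += eta * np.outer(b, g)    # Delta w_hj = eta * g_j * b_h
            theta -= eta * g             # Delta theta_j = -eta * g_j
            V += eta * np.outer(x, e)    # Delta v_ih = eta * e_h * x_i
            gamma -= eta * e             # Delta gamma_h = -eta * e_h
    return V, W, gamma, theta

# Example: XOR, which the single-layer perceptron cannot solve
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
V, W, gamma, theta = bp_train(X, Y, q=4, eta=0.5, epochs=5000)
print(sigmoid(sigmoid(X @ V - gamma) @ W - theta).round(2))
```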
Problems existing in BP algorithm
In actual applications, the BP algorithm also faces some practical problems:
- Structural learning problem
The multilayer feedforward neural network includes input layer, output layer and one or more hidden layers.
Mathematically, it has been proved that a feedforward neural network with a single hidden layer containing enough neurons can approximate any continuous function to arbitrary accuracy. In other words, a two-layer feedforward neural network (one hidden layer) is enough to approximate any complex continuous function, so a single hidden layer is usually chosen.
Once the number of network layers is determined, how many neurons should each layer have? In general, the input layer size depends on the data dimensionality and data type of the problem being solved; the output layer size is determined by the number of classes to predict; and the number of hidden-layer neurons is a hyperparameter that, in practice, is set by experience.
- Initialization problem
Before network learning begins, connection weights and thresholds are initialized to different small random numbers. "Different" ensures the network can learn: if all parameters started equal, symmetric neurons would always remain identical. "Small" prevents values from being so large that the Sigmoid enters its saturation region prematurely. If the initialization is poor, the network tends to fall into a local optimum; when this happens, run the program several times with fresh re-initialization.
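A minimal sketch of such an initialization (the range and sizes are illustrative assumptions):

```python
import numpy as np

def init_params(d, q, l, scale=0.1, seed=None):
    """Initialize weights and thresholds to distinct small random values:
    distinct so that symmetric neurons can diverge and learn,
    small so the Sigmoid does not saturate prematurely."""
    rng = np.random.default_rng(seed)
    V = rng.uniform(-scale, scale, (d, q))   # input-to-hidden weights
    W = rng.uniform(-scale, scale, (q, l))   # hidden-to-output weights
    gamma = rng.uniform(-scale, scale, q)    # hidden thresholds
    theta = rng.uniform(-scale, scale, l)    # output thresholds
    return V, W, gamma, theta
```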
- Step size setting problem
The learning rate controls the convergence rate of the algorithm: too large and training easily oscillates; too small and convergence is too slow. A better approach is to set a large step size at the start of training and let it decrease gradually as training proceeds. In practice, an adaptive step size can be adopted, with the step size varying as the network trains.
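One simple decaying schedule, sketched below (the decay form and constants are illustrative, not prescribed by the text):

```python
def step_size(epoch, eta0=0.5, decay=0.01):
    """Start with a large step size and shrink it as training proceeds."""
    return eta0 / (1.0 + decay * epoch)

for epoch in (0, 100, 1000):
    print(epoch, step_size(epoch))  # 0.5, 0.25, ~0.045
```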
- Update of weights and thresholds
The standard BP algorithm uses per-sample updating: the weights and thresholds are updated after each sample is processed. Its defects are frequent parameter updates, gradient directions from different samples that may cancel each other out, and more iterations overall. Moreover, the input order of the training samples strongly influences the result: per-sample processing is biased toward the most recently seen samples, and finding a good ordering is quite difficult.
The alternative is epoch-wise updating, in which the weights and thresholds are updated only after all training samples have been processed once. However, once the accumulated error has decreased to a certain level, further decline becomes very slow, which is exactly where per-sample updating works better. The compromise between the two is mini-batch updating, sketched below.
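A sketch of the mini-batch compromise (the batch size and shuffling scheme are illustrative choices):

```python
import numpy as np

def minibatches(X, Y, batch_size=16, seed=0):
    """Yield shuffled mini-batches; updating parameters once per batch is
    the compromise between per-sample and whole-epoch updating."""
    idx = np.random.default_rng(seed).permutation(len(X))
    for start in range(0, len(X), batch_size):
        sel = idx[start:start + batch_size]
        yield X[sel], Y[sel]
```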
- Overfitting problem
The overfitting problem in neural networks manifests as a training error that keeps decreasing while the test error rises. It can be alleviated by early stopping and regularization. Early stopping: stop training once the training error is still decreasing but the test error starts to increase. Regularization: add a term describing network complexity to the error objective, such as the sum of squares of the connection weights and thresholds, which reflects a preference for simpler networks.
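Concretely, one common form of such a regularized objective (a hedged sketch in the weight-decay style, with trade-off parameter $\lambda \in (0, 1)$) is:

$$E = \lambda \frac{1}{m} \sum_{k=1}^{m} E_k + (1 - \lambda) \sum_{i} w_i^2$$

where the sum over $w_i$ ranges over all connection weights and thresholds.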
Deep neural network
With the improvement of computing power, the inefficiency of training can be alleviated, and the large increase in available training data reduces the risk of overfitting. Therefore, complex models represented by deep learning have begun to attract attention.
Typical deep learning models are very deep neural networks: neural networks with more than two hidden layers.
The deep neural network model is more complex, and its ability to fit more complex functions comes from increasing the number of hidden-layer neurons (model width) and the number of hidden layers (model depth). Increasing the number of hidden layers makes the model more complex because it increases not only the number of neurons with activation functions but also the number of nested layers of activation functions.
However, as the number of hidden layers increases, it becomes harder and harder for the error to propagate backward layer by layer in the BP algorithm, leading to the vanishing-gradient problem. So how can deep neural network models be trained? At present, two approaches are mainly adopted:
- Pre-training + fine-tuning
In the pre-training stage, unsupervised layer-by-layer training is adopted: each hidden layer is trained using the output of the previous hidden layer as its input, and its own output in turn serves as the input for training the next hidden layer.
After all the pre-training is completed, fine-tuning training is carried out on the whole network, generally using BP algorithm.
The practice of pre-training + fine-tuning can be viewed as grouping a large number of parameters, first finding a locally good setting for each group, and then combining these locally good results for global optimization.
- Weight sharing
Weight sharing means that a group of neurons share the same connection weights. The convolutional neural network (CNN) is a typical weight-sharing network: a CNN processes the input signal through stacked convolutional layers and sampling (pooling) layers, and then realizes the mapping between the input signal and the output target in the fully connected layers.
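A minimal sketch of weight sharing in a 1-D convolution (the kernel and input values are illustrative): every output position reuses the same small set of weights.

```python
import numpy as np

def conv1d(x, kernel, bias=0.0):
    """1-D convolution (valid padding): the same kernel weights are
    applied at every window of the input -- this is weight sharing."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) + bias
                     for i in range(len(x) - k + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([0.5, -0.5])   # one shared weight pair for all positions
print(conv1d(x, kernel))         # [-0.5 -0.5 -0.5 -0.5]
```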
Understanding deep learning
Through multi-layer processing, deep learning gradually transforms the initial low-level feature representation into a high-level feature representation, after which complex classification and learning tasks can be completed with a "simple model." Deep learning can thus be understood as feature learning or representation learning.
When non-deep-learning techniques are used to solve real-world tasks, the features describing the samples usually have to be designed manually by human experts, which is called "feature engineering." As is well known, the quality of features has a crucial influence on the generalization performance of a model, and it is not easy for human experts to design good features by hand. Feature learning, by contrast, generates good features automatically through deep learning, which moves machine learning a step closer to "fully automated data analysis."