I. Overview of the BP neural network
According to our understanding of the structure and mechanism of the biological nervous system, it consists of a vast number of nerve cells (neurons) connected into a complex network. By building mathematical models and algorithms on such a network, it can be made to carry out data-driven "intelligent" functions such as pattern recognition and function mapping; a network of this kind is called a neural network. Among these models, the BP (Back Propagation) neural network was proposed in 1986 by a group of scientists led by Rumelhart and McClelland. It is a multi-layer feedforward network trained with the error back-propagation algorithm. A BP network can learn and store a large number of input-output mappings without the equations describing those mappings having to be specified in advance. Its learning rule uses steepest descent: the weights and thresholds of the network are adjusted repeatedly through back propagation so as to minimize the sum of squared errors of the network output. It is one of the most widely used neural network models, so understanding the structure of the BP network and its weight-adjustment algorithm is an important foundation for learning other neural networks.
To fit 150 sets of stock data (see Appendix 1 for the full data), five daily series are selected: the opening price, the highest price, the lowest price, the closing price, and the trading volume. These are used to predict the next day's closing price, which gives a stock prediction model. A three-layer BP network consisting of an input layer, a hidden layer, and an output layer is adopted. As shown in Figure 1, the input layer contains five neurons, the hidden layer contains three neurons, and the output layer contains one neuron. The activation function of the hidden-layer neurons is the unipolar (asymmetric) Sigmoid function f(x) = 1/(1 + e^(-x)); the activation function of the output-layer neuron is the linear function f(x) = x. The 150 sets of data are divided into three equal parts: two of them are used as training samples for network training and learning, and the remaining one is used as a test sample to check the generalization ability of the trained network. The BP algorithm is used to modify the weights of the hidden layer and the output layer so that the difference between the calculated output and the actual sample output is minimized, finally yielding a more accurate prediction.
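As a minimal MATLAB sketch of this setup (the variable names here are illustrative assumptions, not taken from the source code below), the 5-3-1 layer sizes and the two activation functions could be written as:

nInput  = 5;                         % opening, highest, lowest, closing price and trading volume
nHidden = 3;
nOutput = 1;
sigmoid = @(x) 1 ./ (1 + exp(-x));   % unipolar Sigmoid for the hidden layer
linearf = @(x) x;                    % linear function for the output layer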
The BP (Back Propagation) neural network was proposed by the group led by Rumelhart and McClelland in 1986; see the paper "Learning Representations by Back-propagating Errors" by Rumelhart, Hinton, and Williams, published in Nature.
The BP neural network is one of the most widely used neural network models. It can learn and store a large number of input-output mappings without the mapping equations having to be specified in advance; its learning rule uses steepest descent, adjusting the network's weights and thresholds through back propagation so as to minimize the sum of squared output errors.
As noted last time, the multilayer perceptron runs into a bottleneck: how to obtain the weights of the hidden layers. Since we cannot obtain the hidden-layer weights directly, can we adjust them indirectly, by first obtaining the error between the actual output of the output layer and the desired output? The BP algorithm is designed around exactly this idea. Its basic idea is that the learning process consists of two phases: forward propagation of the signal and back propagation of the error. In forward propagation, the input sample is fed in at the input layer, processed layer by layer through the hidden layers, and passed on to the output layer. If the actual output of the output layer does not match the desired output (the teacher signal), the error back-propagation phase begins. In back propagation, the output error is transmitted back toward the input layer, layer by layer through the hidden layers, and in some form apportioned to all the units of each layer; this yields an error signal for each unit, which serves as the basis for correcting that unit's weights. The detailed flow of these two phases is described later.
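As an illustration, using notation assumed here rather than taken from the source, the two phases for a three-layer network can be summarized as follows: the forward pass computes the hidden outputs y_j and the network outputs o_k from an input x, and the backward phase starts from the output error e_k:

y_j = f\left(\sum_i v_{ij} x_i\right), \qquad o_k = f\left(\sum_j w_{jk} y_j\right), \qquad e_k = d_k - o_k

where v_{ij} and w_{jk} are the input-to-hidden and hidden-to-output weights, f is the layer's transfer function, and d_k is the desired output of the k-th output unit.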
The signal-flow diagram of the BP algorithm is shown in the figure below.
3. Analysis of BP network characteristics — BP three elements
When we analyze an ANN, we usually start with its three elements, namely
1) Network topology;
2) Transfer function;
3) Learning algorithm.
Together, the characteristics of these elements determine the functional characteristics of the ANN, so we also examine the BP network from these three elements.
3.1 Topology structure of BP network
As we said last time, a BP network is in fact a multilayer perceptron, so its topology is the same as that of the multilayer perceptron. Since a single-hidden-layer (three-layer) perceptron can already solve simple nonlinear problems, it is the most commonly used configuration. The topology of the three-layer perceptron is shown in the figure below.
A simple three-layer BP network:
3.2 Transfer function of BP network
The transfer function adopted by the BP network is a nonlinear transformation function, the Sigmoid function (also called the S function). Its characteristic is that both the function and its derivative are continuous, which makes it very convenient to work with. Why this function is chosen will be explained further when the learning algorithm of the BP network is introduced.
The unipolar S-type function curve is shown in the figure below.
The bipolar S-type function curve is shown in the figure below.
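For reference, the standard forms of these two S-type functions are assumed here (the curves themselves are only shown as figures):

f(x) = \frac{1}{1 + e^{-x}} \quad \text{(unipolar, output in } (0, 1)\text{)}

f(x) = \frac{1 - e^{-x}}{1 + e^{-x}} \quad \text{(bipolar, output in } (-1, 1)\text{)}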
3.3 Learning algorithm of BP network
The learning algorithm of the BP network is the BP algorithm, also called the δ (delta) algorithm (in studying ANNs we will meet many terms that go by several names). Taking the three-layer perceptron as an example, when the network output differs from the expected output there is an output error E, defined as follows.
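A common form of this definition, assuming d_k is the desired output and o_k the actual output of the k-th of q output neurons, is the sum of squared errors:

E = \frac{1}{2} \sum_{k=1}^{q} (d_k - o_k)^2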
Next, we will introduce the specific process of BP network learning and training.
Training a BP neural network really means adjusting the network's weights and biases. The training process of a BP neural network has two parts:
Forward transmission: the output values are passed forward layer by layer. Reverse feedback: the weights and biases are adjusted backward, layer by layer. Let us look at forward transmission first. Before training the network, the weights and biases are randomly initialized: a random real number in [-1, 1] is taken for each weight and a random real number in [0, 1] for each bias. Then forward transmission begins.
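A minimal MATLAB sketch of this initialization and of one forward pass through a single layer (the sizes and names are illustrative assumptions, not the source code below):

nIn = 5;  nHid = 3;                 % illustrative layer sizes
W = 2 * rand(nHid, nIn) - 1;        % weights: random real numbers in [-1, 1]
b = rand(nHid, 1);                  % biases:  random real numbers in [0, 1]
f = @(x) 1 ./ (1 + exp(-x));        % transfer function of the layer
x = rand(nIn, 1);                   % one input sample (placeholder data)
y = f(W * x + b);                   % forward transmission through this layer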
The training of a neural network is completed over many iterations; each iteration uses every record in the training set, but each individual training step uses only one record. Abstractly, the process is:
while termination conditions are not met:
    for record in dataset:
        trainModel(record)
4.1 Backpropagation
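In its commonly used standard form (a hedged summary, assuming x_i, y_j, o_k, and d_k denote the input, hidden output, actual output, and desired output, v_{ij} and w_{jk} the input-to-hidden and hidden-to-output weights, η the learning rate, and f' the derivative of the transfer function at the unit's net input), the error signal of each unit and the corresponding weight corrections are:

\delta_k = (d_k - o_k)\, f'(net_k), \qquad \Delta w_{jk} = \eta\, \delta_k\, y_j

\delta_j = \left(\sum_k \delta_k\, w_{jk}\right) f'(net_j), \qquad \Delta v_{ij} = \eta\, \delta_j\, x_i

Each weight is then updated by adding its correction; the same rule extends to the biases.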
4.2 Training termination conditions
Each training round uses all the records in the data set; training stops when one of the following conditions is met (a minimal sketch of this check follows the list):
A maximum number of iterations is reached, for example training stops after the data set has been iterated over 100 times
The prediction accuracy of the network on the training set is computed, and training stops once it reaches a preset threshold
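A minimal MATLAB sketch of such a termination check (maxEpochs, goalAccuracy, and computeAccuracy are illustrative names, not from the source code):

maxEpochs = 100;                    % condition 1: maximum number of passes over the data set
goalAccuracy = 0.95;                % condition 2: accuracy threshold on the training set
computeAccuracy = @() rand();       % placeholder: would evaluate the network on the training set
epoch = 0;  accuracy = 0;
while epoch < maxEpochs && accuracy < goalAccuracy
    % ... train the model on every record of the training set here ...
    accuracy = computeAccuracy();
    epoch = epoch + 1;
end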
5. Specific flow of BP network operation
5.1 Network Structure
The input layer has n neurons, the hidden layer has p neurons, and the output layer has q neurons.
5.2 Definition of Variables
Step 9: Judge the rationality of the model
Determine whether the network error meets the requirements.
When the error reaches the preset precision, or the number of learning iterations exceeds the designed maximum, the algorithm ends.
Otherwise, the next learning sample and its corresponding desired output are selected, the procedure returns to Step 3, and the next round of learning begins.
6. Design of the BP network
When designing a BP network, the number of layers, the number of neurons in each layer, the activation function, the initial values, and the learning rate generally all need to be considered. Some selection principles follow.
6.1 Number of Network Layers
Theory has proved that a network with biases and at least one S-type hidden layer plus one linear output layer can approximate any rational function. Increasing the number of layers can further reduce the error and improve accuracy, but it also makes the network more complicated. A single-layer network with only a nonlinear activation function should not be relied on either: any problem that a single-layer network can solve can also be solved, and faster, by an adaptive linear network, while for problems that can only be solved with nonlinear functions a single layer does not provide sufficient precision, so the desired result can only be reached by adding layers.
6.2 Number of Hidden-Layer Neurons
Training accuracy can be improved by using one hidden layer and increasing the number of its neurons, which is structurally much simpler than increasing the number of layers. Generally, the design quality of a neural network is measured by its precision and its training time:
(1) With too few neurons, the network cannot learn well, requires relatively many training iterations, and does not reach high training accuracy.
(2) With too many neurons, the network is more powerful and more accurate, but the number of training iterations also grows and over-fitting may occur.
The principle for choosing the number of hidden-layer neurons is therefore: on the premise that the problem can be solved, add one or two extra neurons to speed up the reduction of the error.
6.3 Selection of Initial Weights
Generally, the initial weights are random numbers between -1 and 1. In addition, after analyzing how a two-layer network trains a function, Widrow et al. proposed a strategy of choosing an initial weight magnitude of s^(1/r), where r is the number of inputs and s is the number of neurons in the first layer.
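A minimal MATLAB sketch of initializing first-layer weights with the magnitude described above (r and s are illustrative values):

r = 5;                              % number of inputs
s = 3;                              % number of neurons in the first layer
scale = s^(1/r);                    % weight magnitude suggested by Widrow et al.
W1 = (2 * rand(s, r) - 1) * scale;  % random initial weights in [-scale, scale]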
6.4 Learning Rate
The learning rate usually lies between 0.01 and 0.8. A large learning rate may make the system unstable, while a small one leads to slow convergence and a long training time. For a more complex network, different learning rates may be needed at different positions on the error surface. To reduce the number of training runs and the time spent searching for a suitable learning rate, a better approach is to use a variable, adaptive learning rate, so that the network uses different learning rates at different stages.
6.5 Selection of the Expected Error
During network design, an appropriate expected-error value should also be determined through comparative training, since a suitable value is relative to the number of hidden-layer nodes required. In general, two networks with different expected-error values can be trained in parallel, and one of them is finally chosen by weighing all factors.
7. Limitations of the BP network
BP networks have the following problems:
(1) Long training time may be needed: this is mainly caused by a learning rate that is too small, and can be improved by using a variable or adaptive learning rate.
(2) Training may fail completely: this mainly shows up as network paralysis. To avoid it, one usually chooses smaller initial weights and adopts a smaller learning rate.
(3) Local minima: the gradient descent method used here may converge to a local minimum; better results may be obtained by using more layers or more neurons.
8. Improvements to the BP algorithm
The main goals of improving the BP algorithm are to speed up training and to avoid getting trapped in local minima. Common improvements include the momentum-factor method, an adaptive learning rate, a varying learning rate, and the action-function shrink method. The basic idea of the momentum-factor method is to add, on top of standard back propagation, a term proportional to the previous weight change to every weight update, and to generate the new weight change according to the back-propagation rule. The adaptive learning-rate method targets specific problems; its principle for changing the learning rate is that, over successive iterations, if the sign of the derivative of the objective function with respect to a given weight stays the same, that weight's learning rate is increased, and if the sign alternates, the learning rate is decreased. The action-function shrink method shifts the activation function, that is, adds a constant to it.
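A minimal MATLAB sketch of the momentum-factor update (eta, alpha, and the gradient values are illustrative placeholders; dW_prev holds the previous weight change):

eta = 0.05;                         % learning rate
alpha = 0.9;                        % momentum factor
W = rand(3, 5);                     % current weights (placeholder values)
dW_prev = zeros(3, 5);              % previous weight change
gradW = rand(3, 5);                 % placeholder for the back-propagated weight gradient term
dW = eta * gradW + alpha * dW_prev; % new change = back-propagation term + momentum term
W = W + dW;                         % apply the weight change
dW_prev = dW;                       % remember it for the next iteration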
II. Source code
clear all
clc
clf
% A three-layer BP network structure is used:
% 5 neurons in the input layer, 3 in the hidden layer, 1 in the output layer
maxcishu = 5000;            % maximum number of iterations
e = zeros(maxcishu, 1);     % stores the difference between calculated output and actual sample output
% The input data dimension is 5, so there are 5 input nodes
% maxp  daily high price sequence
% minp  daily low price sequence
% sp    daily opening price
% ep    daily closing price
% tnum  daily trading volume
% load the data
% shuju = xlsread('dm.xlsx', 'B1:K151');
shuju=importdata('BP_ZXF.xlsx');
sp=shuju.data(:,1)';
maxp=shuju.data(:,2)';
minp=shuju.data(:,3)';
tnum=shuju.data(:,10)';
ep=shuju.data(:,4)';
% Divide the data set into a training sample set and a test sample set in 2:1 order
jishu = length(ep);
jishu = ceil(jishu/3*2);
% The test sample set runs from 2/3 of the data to the end
spt   = sp(jishu+1:end);
maxpt = maxp(jishu+1:end);
minpt = minp(jishu+1:end);
tnumt = tnum(jishu+1:end);
ept   = ep(jishu+1:end);
% The training sample set is the first 2/3 of the data
sp   = sp(1:jishu);
maxp = maxp(1:jishu);
minp = minp(1:jishu);
tnum = tnum(1:jishu);
ep   = ep(1:jishu);
% Record the maximum and minimum of each series to prepare for normalizing the training sample set
maxp_max = max(maxp);  maxp_min = min(maxp);
minp_max = max(minp);  minp_min = min(minp);
ep_max   = max(ep);    ep_min   = min(ep);
sp_max   = max(sp);    sp_min   = min(sp);
tnum_max = max(tnum);  tnum_min = min(tnum);
% The target data is the next day's closing price, i.e. the closing-price series shifted forward by one unit
goalp = ep(2:jishu);
% Normalize all data to (0, 1)
guiyi = @(A) ((A - min(A)) / (max(A) - min(A)));
maxp = guiyi(maxp);
minp = guiyi(minp);
sp   = guiyi(sp);
ep   = guiyi(ep);
tnum = guiyi(tnum);
% goalp is ep shifted forward by one position, so the last group has no target value;
% therefore every data series except goalp is shortened by one
maxp = maxp(1:jishu-1);
minp = minp(1:jishu-1);
sp   = sp(1:jishu-1);
ep   = ep(1:jishu-1);
tnum = tnum(1:jishu-1);
% Number of learning cycles loopn, i.e. the number of training samples
loopn = length(maxp);
% For convenience, the 5 row vectors are stacked into a 5*loopn matrix simp; each column is one sample vector
simp = [maxp; minp; sp; ep; tnum];
% According to the literature, the number of hidden nodes should be smaller than the number of input nodes,
% usually about 1/2 of it
bn = 3;
% The hidden-layer activation function is the S-type function
jihuo = @(x) (1/(1+exp(-x)));
% bx stores the net input of each hidden node;
% bxe stores bx after the S function, i.e. the input to the output layer
bx  = zeros(bn,1);
bxe = zeros(bn,1);
% Weight learning rate u
u = 0.02;
% W1(m,n) is the weight of the n-th input of the m-th hidden node, so each row corresponds to one node;
% the input-to-hidden weights W1 therefore form a bn*5 matrix, initialized randomly
W1 = rand(bn,5);
% W2(m) is the initial weight of the m-th input of the output node, also generated randomly
W2 = rand(1,bn);
% loopn training samples give loopn outputs
out = zeros(loopn,1);
for k = 1:1:maxcishu          % training begins
    for i = 1:1:loopn         % i means the input is the i-th sample vector
        % the coefficients of hidden node j are row j of W1
        for j = 1:1:bn
            bx(j)  = W1(j,:) * simp(:,i);
            bxe(j) = jihuo(bx(j));
        end
        % ... (output-layer computation and weight updates omitted in this excerpt)
    end
end
III. Operation results
IV. Remarks
Version: MATLAB R2014a