Deep Learning 003 - Build and train deep neural network models

(Python libraries and versions used in this article: Python 3.6, NumPy 1.14, scikit-learn 0.19, matplotlib 2.2)

Previously, we explained the single-layer neural network model and found that its simple structure makes it hard to solve some practical, complex problems, so now we move on to the deep neural network model.

The depth of a deep neural network is mainly reflected in the number of hidden layers. The earlier single-layer neural network has only one hidden layer, while a deep neural network uses two or more. Its basic structure is as follows:

There are two hidden layers here: the maroon circles (four neurons) and the green circles (two neurons), so the structure of this deep neural network is 3-4-2. The picture is from Lecture 2 ("Single neuron / single-layer neural network") of Professor Zhu Xingquan's academic viewpoint-and-summary lecture series.

For some very complex deep neural networks, the number of hidden layers may reach hundreds or even thousands, as in the ResNet family, and their training process is correspondingly more complicated and time-consuming. Such models are described as very deep.

Using a deeper neural network yields better expressive power, which can be understood intuitively as follows: each network layer abstracts the features of its input step by step, and the next layer directly takes the previous layer's features and combines them further, linearly or nonlinearly, so the output is produced stage by stage.
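To make this layer-by-layer combination concrete, here is a minimal NumPy sketch of a forward pass through the 3-4-2 structure above. It is an illustration only; the tanh activation and random weights are assumptions for demonstration, not part of the original article:

# A minimal forward-pass sketch (illustrative assumptions: tanh activation, random weights)
import numpy as np

def forward(x, weights, biases):
    # Each layer linearly combines the previous layer's features,
    # then applies a nonlinearity (tanh here)
    a = x
    for W, b in zip(weights, biases):
        a = np.tanh(a @ W + b)
    return a

# The 3-4-2 structure from the figure: 3 inputs, hidden layers of 4 and 2 neurons
np.random.seed(42)
weights = [np.random.randn(3, 4), np.random.randn(4, 2)]
biases = [np.zeros(4), np.zeros(2)]
x = np.random.randn(1, 3)  # one sample with 3 features
print(forward(x, weights, biases))  # the 2 outputs of the green layer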


1. Construct and train the deep neural network model

1.1 Prepare the dataset

This time we use data generated by ourselves; the code that generates it is as follows:

# Prepare the dataset
import numpy as np

# Generate some raw data points here
dataset_X = np.linspace(-10, 10, 100)
dataset_y = 2 * np.square(dataset_X) + 7  # i.e., label = 2 * feature^2 plus a bias of 7
dataset_y /= np.linalg.norm(dataset_y)  # Normalize the labels
dataset_X = dataset_X[:, np.newaxis]  # Convert to a column vector of shape (100, 1)

The data distribution of this dataset is as follows:
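The original scatter figure is not reproduced here; a minimal matplotlib sketch to draw the same distribution, assuming the dataset_X and dataset_y generated above, would be:

# Plot the distribution of the generated dataset
import matplotlib.pyplot as plt

plt.scatter(dataset_X, dataset_y)
plt.title('Distribution of the generated dataset')
plt.xlabel('dataset_X')
plt.ylabel('dataset_y')
plt.show()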

1.2 Build and train the model

Straight to the code:

# Build and train the model
import neurolab as nl

x_min, x_max = dataset_X[:, 0].min(), dataset_X[:, 0].max()
# Model structure: two hidden layers with 10 neurons each, plus one output layer
multilayer_net = nl.net.newff([[x_min, x_max]], [10, 10, 1])
multilayer_net.trainf = nl.train.train_gd  # Set the training algorithm to gradient descent
dataset_y = dataset_y[:, np.newaxis]  # Reshape labels to a (100, 1) column
error = multilayer_net.train(dataset_X, dataset_y, epochs=800, show=100, goal=0.01)

--------------------------------------------------
Epoch: 100; Error: 2.933891201182385;
Epoch: 200; Error: 0.032819979078409965;
Epoch: 300; Error: 0.040183833367277225;
The goal of learning is reached
--------------------------------------------------

Although we set 800 epochs, training quit automatically once the error goal of 0.01 was reached. You can plot the error curve to see how the error evolved during training.
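A minimal sketch for that plot, assuming error is the per-epoch error list returned by multilayer_net.train() above:

# Plot the training error curve returned by train()
import matplotlib.pyplot as plt

plt.plot(error)
plt.xlabel('epoch')
plt.ylabel('training error')
plt.title('Error curve during training')
plt.show()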

1.3 Use the trained model to predict new data

We have no genuinely new data here, so we treat the original dataset_X as new data, predict its outputs, and compare the predicted values with the true values to see the model's prediction performance at a glance.

# Use the trained model to predict
import matplotlib.pyplot as plt

predict_y = multilayer_net.sim(dataset_X)  # Run the network on the inputs
plt.scatter(dataset_X, dataset_y, label='dataset')
plt.scatter(dataset_X, predict_y, label='predicted')
plt.legend()
plt.title('Comparison of Truth and Predicted')
plt.show()

It can be seen that the model's predicted values roughly match the true values, which at least indicates that the model performs well on the training set.
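Beyond eyeballing the scatter plot, a quick sketch to quantify the training fit with mean squared error (a simple check added here for illustration, not part of the original article):

# Quantify the training fit with mean squared error
import numpy as np

mse = np.mean((predict_y - dataset_y) ** 2)
print('Training MSE: {:.6f}'.format(mse))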

For more details on deep neural networks, see our blog post "Neural Networks: From Neurons to Deep Learning".

In fact, to solve complex problems we do not necessarily have to increase the model's depth (more hidden layers with fewer neurons in each layer, giving a deep-and-thin structure); we can instead increase its width (only one or a few hidden layers, but more neurons in each, giving a shallow-and-fat structure). So which is better?

As mentioned in "The Most Overlooked Basics of Neural Networks, Part 1": although studies have shown that a shallow-and-fat network structure can fit any function, it requires the network to be very "fat", possibly needing tens of thousands of neurons in a single hidden layer, which greatly increases the number of model parameters. The comparison chart is as follows:

As can be seen from the figure above, at similar accuracy the parameter counts differ by several times. This also suggests that we should tend to use deep neural networks rather than shallow "fat" ones.
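To make the parameter-count argument concrete, here is a small sketch that counts the weights and biases of fully connected networks. The layer sizes below are illustrative assumptions, not the ones in the figure:

# Count weights + biases in a fully connected network
def param_count(layer_sizes):
    return sum(m * n + n for m, n in zip(layer_sizes[:-1], layer_sizes[1:]))

# "Deep and thin": three hidden layers of 10 neurons each
print(param_count([1, 10, 10, 10, 1]))  # 251 parameters
# "Shallow and fat": one hidden layer of 1000 neurons
print(param_count([1, 1000, 1]))        # 3001 parameters

Even in this toy 1-D setting, the single 1000-neuron layer carries roughly twelve times as many parameters as the three 10-neuron layers.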

######################## Summary ########################

1. Mature frameworks such as Keras, TensorFlow, and PyTorch already exist for building and training deep neural networks, and they are much simpler to use. neurolab is used here only to explain the internal structure and to run a simple modeling and training exercise (a Keras sketch follows after this summary).

2. To solve more complex problems, we generally choose the deep-and-thin model structure rather than the shallow-and-fat one, because the shallow-and-fat model has a very large number of parameters and takes a long time to train.

#########################################################
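For comparison with the neurolab code in section 1.2, here is a minimal sketch of the same 10-10-1 network in Keras. This is a hypothetical equivalent, not part of the original article, and it assumes TensorFlow 2.x, which postdates the library versions listed at the top:

# A hypothetical Keras equivalent of the 10-10-1 network (assumes TensorFlow 2.x)
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='tanh', input_shape=(1,)),
    tf.keras.layers.Dense(10, activation='tanh'),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), loss='mse')
model.fit(dataset_X, dataset_y, epochs=800, verbose=0)
predict_y = model.predict(dataset_X)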


Note: The code for this part has been uploaded to (my Github); you are welcome to download it.

References:

1. Classic Examples of Python Machine Learning, by Prateek Joshi, translated by Tao Junjie and Chen Xiaoli