Preface

Motivation
Today there are many mature and useful deep learning frameworks, such as PyTorch, TensorFlow, and MXNet, in which most popular algorithms and models are already implemented and ready to call. So why write a neural network by hand? I think it is worthwhile: after learning the theory of deep learning, and back propagation in particular, it is hard to reach a deeper understanding without implementing one yourself. Beyond that, in practice a task is sometimes not complex enough to justify introducing a large framework, or your model needs to train and predict on a small device, so being able to implement a simple neural network directly is a useful skill.
Prerequisites
- A basic understanding of deep learning
- Familiarity with Python and its libraries, such as NumPy, Pandas, and Matplotlib
Goals

Step by step, I will walk you through a handwritten implementation of a neural network in Python, explaining the key steps along the way to help you understand them.
import csv
import pandas as pd
Pandas, introduced here primarily for reading and inspecting the dataset, is a powerful toolkit for analyzing structured data. It comes up constantly in data mining and analysis, so if you do not know it yet, it is worth learning.
Preparing the dataset
Before we start writing code, let's talk about the task that motivates us to write a neural network; only with a reasonably complete understanding of a task can we come up with a solution to it. The task is to predict whether a person has heart disease, using a heart disease dataset from the UCI repository. You can download it here.
Data characteristics
headers = ['age','sex','chest_pain','resting_blood_pressure','serum_clolestoral','fasting_blood_sugar','resting_ecg_results','max_heart_rate_achieved','exercise_induced_angina','oldpeak','slope_of_the_peak','number_of_major_vessels','thal','heart_disease']
This defines the attributes of each sample in the dataset; the last column is the label, which indicates whether the record corresponds to heart disease. If you are interested in these indicators you can look them up, but I will not go into them here.
heart_df = pd.read_csv('./data/heart.dat',sep=' ',names=headers)
heart_df.head()
|   | age | sex | chest_pain | resting_blood_pressure | serum_clolestoral | fasting_blood_sugar | resting_ecg_results | max_heart_rate_achieved | exercise_induced_angina | oldpeak | slope_of_the_peak | number_of_major_vessels | thal | heart_disease |
|---|-----|-----|------------|------------------------|-------------------|---------------------|---------------------|-------------------------|-------------------------|---------|-------------------|-------------------------|------|---------------|
| 0 | 70.0 | 1.0 | 4.0 | 130.0 | 322.0 | 0.0 | 2.0 | 109.0 | 0.0 | 2.4 | 2.0 | 3.0 | 3.0 | 2 |
| 1 | 67.0 | 0.0 | 3.0 | 115.0 | 564.0 | 0.0 | 2.0 | 160.0 | 0.0 | 1.6 | 2.0 | 0.0 | 7.0 | 1 |
| 2 | 57.0 | 1.0 | 2.0 | 124.0 | 261.0 | 0.0 | 0.0 | 141.0 | 0.0 | 0.3 | 1.0 | 0.0 | 7.0 | 2 |
| 3 | 64.0 | 1.0 | 4.0 | 128.0 | 263.0 | 0.0 | 0.0 | 105.0 | 1.0 | 0.2 | 2.0 | 1.0 | 7.0 | 1 |
| 4 | 74.0 | 0.0 | 2.0 | 120.0 | 269.0 | 0.0 | 2.0 | 121.0 | 1.0 | 0.2 | 1.0 | 1.0 | 3.0 | 1 |
As developers, we do not need to know much about these features to get good predictions from our model. This is one benefit of deep learning: you do not need a domain expert to process and feature-engineer the data. Note the values in the last column, the label: 1 means no heart disease and 2 means heart disease.
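As an optional check before modeling, a minimal sketch like the following counts how many samples fall into each class (standard Pandas value_counts; at this point the labels are still 1 and 2):

# Optional sanity check: how many samples fall into each class?
heart_df['heart_disease'].value_counts()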
# Check for missing values
heart_df.isna().sum()
age 0
sex 0
chest_pain 0
resting_blood_pressure 0
serum_clolestoral 0
fasting_blood_sugar 0
resting_ecg_results 0
max_heart_rate_achieved 0
exercise_induced_angina 0
oldpeak 0
slope_of_the_peak 0
number_of_major_vessels 0
thal 0
heart_disease 0
dtype: int64
Viewing data types
# Check the data type
heart_df.dtypes
age float64
sex float64
chest_pain float64
resting_blood_pressure float64
serum_clolestoral float64
fasting_blood_sugar float64
resting_ecg_results float64
max_heart_rate_achieved float64
exercise_induced_angina float64
oldpeak float64
slope_of_the_peak float64
number_of_major_vessels float64
thal float64
heart_disease int64
dtype: object
There are no missing values in the data, and the feature columns are all of type float64 (the label column is int64).
Splitting the data into training and test sets
Now that we have a general understanding of the data, such as its attributes, whether it is complete, and the type of each column, we can split the dataset into a training set and a test set. For splitting and for standardizing the data we use two utilities provided by scikit-learn: train_test_split and StandardScaler.
import numpy as np
import warnings
warnings.filterwarnings("ignore")
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
X = heart_df.drop(columns=['heart_disease'])
# Remap heart_disease from {1, 2} to {0, 1}. In binary classification it is
# conventional to represent the two classes as 0 and 1
heart_df['heart_disease'] = heart_df['heart_disease'].replace(1, 0)
heart_df['heart_disease'] = heart_df['heart_disease'].replace(2, 1)
# print(heart_df['heart_disease'].values.shape)
# (270,) to (270,1) [[0],[0]...]
y_label = heart_df['heart_disease'].values.reshape(X.shape[0], 1)
X_train, X_test, y_train,y_test = train_test_split(X,y_label,test_size=0.2,random_state=2)
sc = StandardScaler()
sc.fit(X_train)
X_train = sc.transform(X_train)
X_test = sc.transform(X_test)
print(f"Shape of train set: {X_train.shape}")
print(f"Shape of test set: {X_test.shape}")
print(f"Shape of train label set: {y_train.shape}")
print(f"Shape of test label set: {y_test.shape}")
Shape of train set: (216, 13)
Shape of test set: (54, 13)
Shape of train label set: (216, 1)
Shape of test label set: (54, 1)
Defining the layers
Our neural network structure is relatively simple: a stack of layers. Without activation functions, it would just be a sequence of linear transformations of our data (matrices).
A layer in the middle of the network consists mainly of an input, an output, and parameters (weights). As shown in the figure, the edges between nodes are the weights; when every input node is connected to every output node, we have a fully connected layer. Each element (component) of the output vector is a linear transformation of all components of the input: $y_j = \sum_{i} w_{ij} x_i + b_j$.
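As a minimal sketch of this transformation with made-up shapes (a 3-feature input and 2 output nodes, not part of the model we build below):

# A single sample as a 1 x 3 row vector, a 3 x 2 weight matrix, a 1 x 2 bias:
# the output y = xW + b is a 1 x 2 row vector, one value per output node
x = np.array([[1.0, 2.0, 3.0]])
W = np.random.rand(3, 2) - 0.5
b = np.random.rand(1, 2) - 0.5
y = np.dot(x, W) + b
print(y.shape)  # (1, 2)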
class Layer:
    def __init__(self):
        self.input = None
        self.output = None

    def forward(self, input):
        raise NotImplementedError

    def backward(self, output_error, learning_rate):
        raise NotImplementedError
Defining the base layer
The base class Layer can be understood as an interface: it defines what a layer looks like and what functionality it must provide. Here forward and backward correspond to forward propagation and back propagation respectively. forward defines the computation the layer performs on its input during propagation; backward computes the gradients of the loss with respect to the layer's input and parameters.
class FCLayer(Layer):
    def __init__(self, input_size, output_size):
        # Initialize weights and biases uniformly in [-0.5, 0.5)
        self.weights = np.random.rand(input_size, output_size) - 0.5
        self.bias = np.random.rand(1, output_size) - 0.5

    def forward(self, input_data):
        self.input = input_data
        self.output = np.dot(self.input, self.weights) + self.bias
        return self.output

    def backward(self, output_error, learning_rate):
        # Gradient of the loss w.r.t. this layer's input,
        # to be passed back to the previous layer
        input_error = np.dot(output_error, self.weights.T)
        # Gradient of the loss w.r.t. the weights
        weights_error = np.dot(self.input.T, output_error)
        # Gradient descent update of the parameters
        self.weights -= learning_rate * weights_error
        self.bias -= learning_rate * output_error
        return input_error
We have implemented a fully connected layer. Forward propagation is simply $Y = XW + B$ (with $X$ a row vector). The key part is back propagation: computing the derivatives of the loss with respect to the input, the weights, and the bias, which will be explained below.
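If you want to convince yourself that backward is correct, a minimal numerical gradient check sketch like the one below can help. It uses the sum of outputs as a toy loss, and the step size and tolerance are arbitrary choices of mine:

# Numerical gradient check for FCLayer (illustrative sketch)
layer = FCLayer(3, 2)
x = np.random.rand(1, 3)
eps = 1e-6

# Use the sum of outputs as a toy "loss", so dE/dY is all ones
y = layer.forward(x)
grad_y = np.ones_like(y)
analytic = np.dot(x.T, grad_y)  # same formula as weights_error in backward

numeric = np.zeros_like(layer.weights)
for i in range(layer.weights.shape[0]):
    for j in range(layer.weights.shape[1]):
        layer.weights[i, j] += eps
        loss_plus = layer.forward(x).sum()
        layer.weights[i, j] -= 2 * eps
        loss_minus = layer.forward(x).sum()
        layer.weights[i, j] += eps  # restore the original weight
        numeric[i, j] = (loss_plus - loss_minus) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-4))  # expect True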
class ActivationLayer(Layer):
    def __init__(self, activation, activation_prime):
        # activation is the function itself, activation_prime its derivative
        self.activation = activation
        self.activation_prime = activation_prime

    def forward(self, input_data):
        self.input = input_data
        self.output = self.activation(self.input)
        return self.output

    def backward(self, output_error, learning_rate):
        # Elementwise: gradient of the loss w.r.t. this layer's input
        return self.activation_prime(self.input) * output_error
Back propagation
Forward propagation in a neural network is fairly easy to understand and implement: it is just linear transformations expressed as matrix operations; if your linear algebra is rusty it may look a bit strange at first, but a quick review is enough. The hard part is back propagation, which involves matrix derivatives. This example is relatively simple, and I will list the derivations; they are not hard to follow with some basic background, and working through the formulas is the most convincing way. You need to be familiar with the chain rule, and when you take the derivative with respect to a particular parameter, you need to account for every path from that parameter to the loss function.
Derivative of the loss with respect to the weights
Here we take the partial derivative of the loss $E$ with respect to a single weight $w_{ij}$ as an example. As the figure above shows, the path from $w_{ij}$ to the loss is relatively simple: $w_{ij}$ reaches the output, and hence the loss, only through $y_j$.
Taking the derivative of the loss with respect to the parameters means taking the derivative of $E$ with respect to each element of the weight matrix and then assembling the results back into a matrix.
Let's use the graph to build intuition. The $j$-th component of the output vector $Y$ is a weighted sum of all elements (features) of the input sample $X$, so in $y_j$ the coefficient of $w_{ij}$ is $x_i$, i.e. $\partial y_j / \partial w_{ij} = x_i$.
Applying the chain rule, we get the following formula:
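$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial y_j}\,\frac{\partial y_j}{\partial w_{ij}} = \frac{\partial E}{\partial y_j}\, x_i$$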
Extending this to the whole matrix $W$, we take the derivative with respect to each entry and arrange the results as a matrix:
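This matches weights_error = np.dot(self.input.T, output_error) in the code above:

$$\frac{\partial E}{\partial W} = X^{T}\,\frac{\partial E}{\partial Y}$$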
Derivative of the loss with respect to the bias
The next step is the bias. Because $b_j$ only affects $y_j$, the gradient is relatively easy to compute, as shown below. I will not explain too much here; if you are interested, please leave me a message.
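Since $\partial y_j / \partial b_j = 1$, we get (matching self.bias -= learning_rate * output_error in the code):

$$\frac{\partial E}{\partial b_j} = \frac{\partial E}{\partial y_j}, \qquad \frac{\partial E}{\partial B} = \frac{\partial E}{\partial Y}$$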
Derivative of the loss with respect to the input $X$
Back propagation is about updating parameters using their gradients, so why take the derivative with respect to the input variables as well? Because the gradient of the loss needs to be passed down layer by layer, and the gradient with respect to one layer's input is exactly the output error of the layer before it.
When differentiating with respect to the input $X$, we likewise take the partial derivative of $E$ with respect to each element of the sample and then assemble them into the gradient of $X$.
Again we look at the graph: for each element of $X$ we count the paths from it to the output, take the derivative along each path, and sum them up:
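$$\frac{\partial E}{\partial x_i} = \sum_{j} \frac{\partial E}{\partial y_j}\, w_{ij}$$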
If this derivation is unfamiliar, you can verify it by tracing the diagram above.
Finally, we collect these derivatives into matrix form:
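This matches input_error = np.dot(output_error, self.weights.T) in the code above:

$$\frac{\partial E}{\partial X} = \frac{\partial E}{\partial Y}\, W^{T}$$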
Derivative through the activation function

An activation layer applies a function elementwise, so its backward pass is simply $\frac{\partial E}{\partial X} = f'(X) \odot \frac{\partial E}{\partial Y}$, which is exactly what ActivationLayer.backward computes. Here we use tanh as the activation, whose derivative is $1 - \tanh^2(x)$:
def tanh(x):
    return np.tanh(x)

def tanh_prime(x):
    return 1 - np.tanh(x)**2
Loss function

For the loss we use mean squared error, $E = \frac{1}{n}\sum_{i}(\hat{y}_i - y_i)^2$, whose derivative with respect to the prediction is $\frac{2}{n}(\hat{y}_i - y_i)$:
def mse(y_true, y_pred):
    return np.mean(np.power(y_true - y_pred, 2))

def mse_prime(y_true, y_pred):
    return 2 * (y_pred - y_true) / y_true.size
Defining the network structure
The network structure here is a simple stack of layers, so in the constructor we maintain a list, and fully connected layers and activation layers are appended to the network through the add method. The use method accepts two functions: one (loss) computes the loss value and the other (loss_prime) computes its gradient for back propagation. The fit method trains the network, updating the parameters of every layer according to the gap between the model output and the true value, while predict runs the model on input data.
class Network:
    def __init__(self):
        self.layers = []
        self.loss = None
        self.loss_prime = None

    def add(self, layer):
        self.layers.append(layer)

    def use(self, loss, loss_prime):
        self.loss = loss
        self.loss_prime = loss_prime

    def predict(self, input_data):
        samples = len(input_data)
        result = []
        for i in range(samples):
            # Run one sample through every layer in order
            output = input_data[i]
            for layer in self.layers:
                output = layer.forward(output)
            result.append(output)
        return result

    def fit(self, x_train, y_train, epochs, learning_rate):
        samples = len(x_train)
        for i in range(epochs):
            err = 0
            for j in range(samples):
                # Forward pass
                output = x_train[j]
                for layer in self.layers:
                    output = layer.forward(output)
                err += self.loss(y_train[j], output)
                # Backward pass: propagate the gradient from the last layer
                # back to the first, updating parameters along the way
                error = self.loss_prime(y_train[j], output)
                for layer in reversed(self.layers):
                    error = layer.backward(error, learning_rate)
            err /= samples
            # (i+1)/epochs reports progress as a fraction of the total epochs
            print(f"epoch {(i+1)/epochs} error ={err}")
Before tackling the heart disease data, let's try the network on a simpler dataset, XOR, to see how it behaves.
x_train = np.array([[[0, 0]], [[0, 1]], [[1, 0]], [[1, 1]]])
y_train = np.array([[[0]], [[1]], [[1]], [[0]]])
The infrastructure is ready; the next step is to assemble these modules into a network, starting with a neural network for the XOR task.
net = Network()
net.add(FCLayer(2, 3))
net.add(ActivationLayer(tanh, tanh_prime))
net.add(FCLayer(3, 1))
net.add(ActivationLayer(tanh,tanh_prime))
net.use(mse,mse_prime)
net.fit(x_train,y_train,epochs=2000, learning_rate=0.01)
out = net.predict(x_train)
print(out)
epoch 0.0005 error =0.7136337374260526
epoch 0.001 error =0.5905739289052101
epoch 0.0015 error =0.49847803406876834
epoch 0.002 error =0.4315229169435869
epoch 0.0025 error =0.3833645522357618
epoch 0.003 error =0.34869455999659626
epoch 0.0035 error =0.3235475456763668
epoch 0.004 error =0.3051103124983356
epoch 0.0045 error =0.29142808210519544
...
error =0.28114855069402345
[array([[0.01603471]]), array([[0.884377]]), array([[0.89208809]]), array([[0.02946468]])]
Judging from the predictions, the network does learn something over 2,000 epochs and gives a good answer: outputs near 0 for the 0 cases and near 1 for the 1 cases.
[array([[0.01603471]]), array([[0.884377]]), array([[0.89208809]]), array([[0.02946468]])]
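To turn these raw outputs into hard 0/1 predictions, one simple option is to threshold at 0.5; a minimal sketch, not part of the original code:

# Threshold the raw outputs at 0.5 to get hard 0/1 predictions
predictions = [int(o[0][0] > 0.5) for o in out]
print(predictions)  # expect [0, 1, 1, 0]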
Give it a try
Here is a small exercise for you. Below I build a network for the heart disease dataset and run it; it does not perform particularly well on this data, so feel free to tune it and see what you can get.
X_train = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
net2 = Network()
net2.add(FCLayer(13, 8))
net2.add(ActivationLayer(tanh, tanh_prime))
net2.add(FCLayer(8, 5))
net2.add(ActivationLayer(tanh, tanh_prime))
net2.add(FCLayer(5, 2))
net2.add(ActivationLayer(tanh, tanh_prime))
net2.use(mse, mse_prime)
# Train on the training split and its matching labels
net2.fit(X_train, y_train, epochs=35, learning_rate=0.1)
X_test = X_test.reshape(X_test.shape[0], 1, X_test.shape[1])
output = net2.predict(X_test[0:3])
output
[array([[0.82602593, 0.82602593]]), array([[0.17209814, 0.17209814]]), array([[0.95838074, 0.95838074]])]
y_test[:3]
And the corresponding true labels:
array([[1],
[0],
[0]])
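To get a number for the whole test set, a minimal evaluation sketch might look like the following; thresholding the first output component at 0.5 is an assumption of mine, since the final layer here outputs two values:

# Hypothetical evaluation sketch: threshold the first output component at 0.5
preds = net2.predict(X_test)
pred_labels = np.array([int(p[0][0] > 0.5) for p in preds]).reshape(-1, 1)
accuracy = np.mean(pred_labels == y_test)
print(f"test accuracy: {accuracy:.3f}")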
If you have any questions about the content above, please leave a message. This was written in some haste, so if anything is missing or wrong, corrections and criticism are very welcome.