Petting cats with code! This article is taking part in the [Cat Essay Campaign].

The problem

This task is one of Andrew Ng's course programming assignments and is also my first demo as a machine learning beginner. The dataset and complete code can be found at github.com/Asthestarsf…

There are two kinds of images in the dataset:

One class is cat and the other is not cat. Our task is to train a classifier that takes an input image and outputs its category: is cat, or is not cat.

Overall approach

Since this is an introduction to machine learning, we will not use a convolutional neural network here; instead we use a multi-layer perceptron (MLP), built from what is usually called the fully connected layer.

I divided the process into the following steps:

  1. Data reading and processing
  2. Parameter initialization
  3. Forward propagation
  4. Loss calculation
  5. Back propagation
  6. Parameter update
  7. Prediction
  8. Additional features

To get a clearer understanding of each step inside an MLP, I implemented it with NumPy; MegEngine and PyTorch implementations are given at the end of this article.

The code

I assume you already have some understanding of fully connected layers, activation functions, loss functions, gradient descent and so on, so let's go straight to the code.

Data reading and processing

In this case the training set contains 209 images named 0.jpg to 208.jpg and the test set contains 50 images; two TXT files store the category labels. We need to read and process these data. The code is as follows:

import os
import cv2
import numpy as np


def Read_label(path):
    # the label file is a string of 0s and 1s; strip spaces and newlines
    with open(path, 'r') as file:
        data = list(file.read().replace(' ', '').replace('\n', ''))
    label = list(map(int, data))
    return label


def Read_data(path):
    img = []
    filenames = os.listdir(path)
    filenames.sort(key=lambda x: int(x[:-4]))   # sort filenames numerically (xx.jpg)
    for filename in filenames:
        # read in color and resize to 64x64
        img.append(cv2.resize(cv2.imread(path + filename, 1), (64, 64)))
    return np.array(img)

Labels and images are read separately. They correspond one to one, but os.listdir does not return filenames in numerical order, so we sort the filenames first, read each image with cv2.imread, and finally obtain a matrix that will be used for training later.
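A quick illustration of why the numeric sort key matters: a plain lexicographic sort would put '10.jpg' before '2.jpg', which would misalign images and labels.

names = ['0.jpg', '10.jpg', '2.jpg', '1.jpg']
print(sorted(names))                             # ['0.jpg', '1.jpg', '10.jpg', '2.jpg'] -- wrong order
print(sorted(names, key=lambda x: int(x[:-4])))  # ['0.jpg', '1.jpg', '2.jpg', '10.jpg'] -- matches the labels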

Before feeding the data into the network, we need to process what we read: flattening and normalization.

# labels are reshaped into a (1, number of files) row vector
train_label = np.array(Read_label(Path_train_label)).reshape(1, -1)
test_label = np.array(Read_label(Path_test_label)).reshape(1, -1)
# flatten and transpose to a (64*64*3, number of files) matrix (each image becomes one column), then normalize
train_data = Read_data(Path_train).reshape(train_label.shape[1], -1).T / 255
test_data = Read_data(Path_test).reshape(test_label.shape[1], -1).T / 255

We can use other normalization methods here, and we’ll leave it to you to explore.
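For example, a minimal sketch of standardizing with the mean and standard deviation instead of simply dividing by 255 (the statistics are computed on the training set only and reused for the test set; this is an illustration, not what the article's code does):

mean = train_data.mean(axis=1, keepdims=True)        # per-feature mean over the training images
std = train_data.std(axis=1, keepdims=True) + 1e-8   # small constant avoids division by zero
train_data = (train_data - mean) / std
test_data = (test_data - mean) / std                 # reuse the training statistics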

Parameter initialization

def Init_params(layers):
    np.random.seed(3)
    parameters = {}                      # a dictionary stores the parameters of every layer
    L = len(layers)                      # number of layers in the network
    for l in range(1, L):
        # Xavier initialization: scale the random weights by the size of the previous layer
        parameters["W" + str(l)] = np.random.randn(layers[l], layers[l - 1]) / np.sqrt(layers[l - 1])
        parameters["b" + str(l)] = np.zeros((layers[l], 1))   # biases are initialized to zero
    return parameters

Xavier initialization is used here, which helps the network converge faster. Random initialization also works, but all-zero initialization cannot be used: every unit in a layer would then learn exactly the same thing, which defeats the purpose of having multiple units and layers.

Note that a dictionary is used to store the weights and biases of each layer, and the ability to save parameters will be added later.
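A hypothetical usage example (the hidden-layer sizes here are just an illustration, not necessarily the ones used in the article; the first entry is the flattened input size and the last is the single output neuron):

layers = [64 * 64 * 3, 20, 7, 5, 1]    # hypothetical architecture
parameters = Init_params(layers)
print(parameters["W1"].shape)          # (20, 12288)
print(parameters["b1"].shape)          # (20, 1)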

Forward propagation

def Forward_propagation(X, parameters):
    """
    A is the activation of the current layer and is fed to the next layer,
    caches stores the intermediate values of every layer,
    Yhat is the output of the last layer.
    """
    caches = []
    A = X
    L = len(parameters) // 2             # // gives an int: the number of layers
    for l in range(1, L):                # hidden layers, e.g. (1, 3)
        A, cache = Activation_forward(A, parameters['W' + str(l)],
                                      parameters['b' + str(l)], "Hiden")
        caches.append(cache)
    # output layer
    Yhat, cache = Activation_forward(A, parameters['W' + str(L)],
                                     parameters['b' + str(L)], "Output")
    caches.append(cache)
    return Yhat, caches

The values stored in caches will be used later to compute the gradients.

Yhat is the output of the last layer; the loss is computed from it and back propagation starts there.

We add an activation function between every two layers. Here I use TanH for the hidden layers and Sigmoid for the last layer; of course, you can also use other activation functions you know, such as ReLU (a sketch is given after the next code block).

def TanH(Z):
    return (np.exp(2*Z)-1)/(np.exp(2*Z)+1)


def Sigmoid(Z):
    return 1/(1+np.exp(-Z))
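If you would rather use ReLU for the hidden layers, as mentioned above, a minimal sketch of the forward function and the matching derivative (you would also have to dispatch to them in Activation_forward and in the backward pass below):

def ReLU(Z):
    # keep positive values, zero out the rest
    return np.maximum(0, Z)


def ReLU_backward(dA, A):
    # the derivative is 1 where the activation is positive and 0 elsewhere
    return dA * (A > 0)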

Activation_forward encapsulates one step of forward propagation: a linear propagation followed by a nonlinear activation. The last layer uses the Sigmoid activation function.

def Activation_forward(A_pre, W, b, Type='Hiden'):
    """
    A_pre is the activation of the previous layer and is fed to the linear unit
    to realize the full connection; Z is the result of the linear propagation
    and is fed to the activation function; b is broadcast to match W·A_pre.
    """
    Z = Linear_forward(A_pre, W, b)
    cache = (A_pre, W, b)
    if Type == "Output":
        A = Sigmoid(Z)
    elif Type == "Hiden":
        A = TanH(Z)
    return A, cache


def Linear_forward(A, W, b):
    # linear propagation: Z = W·A + b (b is broadcast over the batch dimension)
    return np.dot(W, A) + b

Calculate loss

def Compute_cost(Yhat, Y):
    m = Y.shape[1]
    # cross-entropy loss; combined with the Sigmoid output it gives a convex problem
    cost = -np.sum(np.multiply(np.log(Yhat), Y) +
                   np.multiply(np.log(1 - Yhat), 1 - Y)) / m
    # gradient of the loss with respect to the network output Yhat
    dYhat = -(np.divide(Y, Yhat) - np.divide(1 - Y, 1 - Yhat))
    return cost, dYhat

Cross entropy is used as the loss function here, and the gradient of the loss with respect to Yhat is also computed, which is where back propagation starts.

Back propagation

We need backward passes for the linear layer and for the two activation functions, as follows:

def Linear_backward(dZ, cache):
    A_pre, W, b = cache
    m = A_pre.shape[1]
    dW = np.dot(dZ, A_pre.T) / m                   # gradient of the weights
    db = np.sum(dZ, axis=1, keepdims=True) / m     # dZ/db = I, keep the dimension unchanged
    dA = np.dot(W.T, dZ)                           # gradient passed to the previous layer
    return dA, dW, db


def Sigmoid_backward(dA, A):
    # the derivative of Sigmoid is S(1-S)
    dZ = dA * A * (1 - A)          # gradient with respect to cost
    return dZ


def TanH_backward(dA, A):
    # the derivative of TanH is 1 - tanh^2
    dZ = dA * (1 - A ** 2)         # gradient with respect to cost
    return dZ

One thing to note is that the chain rule is used here, so every gradient computed is taken directly with respect to the loss, not with respect to the layer's input.
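As a quick sanity check of the activation derivatives, you can compare them with a numerical difference quotient; a minimal sketch for TanH (the test point and step size are arbitrary choices):

Z = np.array([[0.5, -1.2, 2.0]])
eps = 1e-6
A = TanH(Z)
analytic = TanH_backward(np.ones_like(Z), A)             # with dA = 1 this is exactly dTanH/dZ
numeric = (TanH(Z + eps) - TanH(Z - eps)) / (2 * eps)    # central-difference approximation
print(np.max(np.abs(analytic - numeric)))                # should be very close to zero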

Then encapsulate the above functions to represent a layer of back propagation:

def Activation_backward(dA, cache, A_next, activation="Hiden"):
    """
    cache stores (A_pre, W, b); A_next is the activation output by this layer,
    i.e. the value fed to the next layer in the forward pass.
    dA is the gradient coming back from the layer above, so every gradient
    returned here is taken directly with respect to the cost.
    """
    if activation == "Hiden":
        dZ = TanH_backward(dA, A_next)
    elif activation == "Output":
        dZ = Sigmoid_backward(dA, A_next)
    dA, dW, db = Linear_backward(dZ, cache)
    return dA, dW, db


def Backward_propagation(dYhat, Yhat, Y, caches):
    grads = {}
    L = len(caches)                      # number of layers, e.g. 4
    # output layer
    grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)] = \
        Activation_backward(dYhat, caches[L - 1], Yhat, "Output")
    # hidden layers, from the last one back to the first
    for l in reversed(range(L - 1)):     # (L-2, ..., 0]
        grads["dA" + str(l + 1)], grads["dW" + str(l + 1)], grads["db" + str(l + 1)] = \
            Activation_backward(grads["dA" + str(l + 2)], caches[l],
                                caches[l + 1][0], "Hiden")   # caches[l+1][0] stores A, used as A_next
    return grads

A dictionary is again used to store the gradients, which will then be used to update the parameters.

Parameter update

def Update_params(parameters, grads, learning_rate):
    L = len(parameters) // 2
    for l in range(L):
        # one gradient-descent step on every weight and bias
        parameters["W" + str(l + 1)] -= learning_rate * grads["dW" + str(l + 1)]
        parameters["b" + str(l + 1)] -= learning_rate * grads["db" + str(l + 1)]
    return parameters

It is so simple that I believe everyone can understand it!

At this point the network itself is complete, but don't get too excited; we still have a few more features to add.

Training

def Train_model(X, Y, parameters, learning_rate, iterations, threshold):
    costs = []                           # store the loss every 100 iterations for the line graph
    for i in range(iterations):
        Yhat, caches = Forward_propagation(X, parameters)        # forward pass
        cost, dYhat = Compute_cost(Yhat, Y)                      # compute the error
        grads = Backward_propagation(dYhat, Yhat, Y, caches)     # compute the gradients
        parameters = Update_params(parameters, grads, learning_rate)
        if i % 100 == 0:
            costs.append(cost)
            print(f"iteration: {i}, error: {cost}")
        if cost < threshold:             # stop early once the loss is small enough
            costs.append(cost)
            print(f"iteration: {i}, error: {cost}")
            break
    return parameters, costs, i

X and Y are the images and labels, parameters are the network parameters, learning_rate is the learning rate, iterations is the number of iterations, and threshold allows training to stop early once the loss falls below it.

The returned parameters will be used when saving the model, and costs will be used to draw the loss curve.
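For reference, a minimal sketch of how training might be launched with the pieces defined so far (the layer sizes and hyperparameters below are illustrative assumptions, not the exact values used in the article):

train_label = np.array(Read_label(Path_train_label)).reshape(1, -1)
train_data = Read_data(Path_train).reshape(train_label.shape[1], -1).T / 255

layers = [64 * 64 * 3, 20, 7, 5, 1]          # hypothetical architecture
learning_rate = 0.0075                       # hypothetical learning rate
parameters = Init_params(layers)
parameters, costs, i = Train_model(train_data, train_label, parameters,
                                   learning_rate, iterations=2500, threshold=0.05)

The returned costs then feed straight into the plotting function below.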

Draw loss curve

def Plot(costs, layers):
    plt.plot(costs)
    plt.ylabel('cost')
    plt.xlabel('iterations')
    plt.title("Learning rate =" +
              str(learning_rate) + f",layers={layers}")
    plt.show()

Some of the network configuration (the learning rate and the layer sizes) is added to the title when the curve is drawn.

Parameter saving and reading

def Save_params(parameters, layers, path):
    # save the layer sizes so the network can be rebuilt when loading
    np.savetxt(path + 'layers.csv', layers, delimiter=',')
    n = len(parameters) // 2
    # store every weight and bias in its own file, using a unified naming scheme
    for i in range(1, n + 1):
        np.savetxt(path + 'W' + str(i) + '.csv', parameters['W' + str(i)], delimiter=',')
        np.savetxt(path + 'b' + str(i) + '.csv', parameters['b' + str(i)], delimiter=',')


def Load_params(path):
    parameters = {}
    layers = list(np.loadtxt(path + 'layers.csv', dtype=int, delimiter=','))
    n = len(layers)
    for i in range(1, n):
        parameters['W' + str(i)] = np.loadtxt(path + 'W' + str(i) + '.csv',
                                              delimiter=",").reshape(layers[i], -1)
        parameters['b' + str(i)] = np.loadtxt(path + 'b' + str(i) + '.csv',
                                              delimiter=",").reshape(layers[i], 1)
    return layers, parameters

np.savetxt is used for storage: the weights and biases of each layer are saved to the specified folder with a unified naming scheme, which makes them easy to save and load. Taking a four-layer network as an example, we get the following:
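Assuming the parameters are saved to a folder named params/ (the folder name is just an example and must already exist), the files would look like this:

params/layers.csv
params/W1.csv   params/b1.csv
params/W2.csv   params/b2.csv
params/W3.csv   params/b3.csv
params/W4.csv   params/b4.csv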

Visual interface

After a few minutes of training we have parameter files that can be loaded to make predictions, but how could we go without a "nice looking" interface? Here I use the Tkinter package to build a simple visual interface:

import tkinter as tk
from tkinter import filedialog
from PIL import Image, ImageTk


def Create_window(parameters):
    global window
    window = tk.Tk()
    window.title('Cat identifier')
    window.geometry('650x650')
    window.configure(background='lightpink')
    label = tk.Label(window, text='', font=('楷体', 15), fg='Purple', bg='orange')
    label.pack(side='top')
    num = tk.Label(window, text='2000301712', font=('fira_Code'), bg='orange')
    num.pack(side='top')
    # the lambda keeps the callback (which takes arguments) from running immediately
    choose_button = tk.Button(window, text='Open a picture', fg='deeppink',
                              bg='violet', activebackground='yellow',
                              font=('宋体', 20),
                              command=lambda: Show_img(parameters))
    choose_button.pack(side='bottom')
    window.mainloop()


def Show_img(parameters):
    global window, img
    file = filedialog.askopenfilename()                  # path of the selected image
    img = Image.open(file)                               # PIL image used for display
    # read the image with cv2 and flatten it to a column, just like the training data
    data = cv2.resize(cv2.imread(file, 1), (64, 64)).reshape(1, -1).T / 255
    img = ImageTk.PhotoImage(img)
    predict_button = tk.Button(window, text='Identify!', fg='CornflowerBlue',
                               bg='slateblue', activebackground='red',
                               font=('宋体', 20),
                               command=lambda: Predict(data, parameters))
    predict_button.pack(side='bottom')
    predict_button.after(5000, predict_button.destroy)   # remove the button after 5 seconds
    label_img = tk.Label(window, image=img)
    label_img.pack(side='top')
    label_img.after(5000, label_img.destroy)             # remove the preview after 5 seconds

Running it gives the following interface (when run on Windows the Chinese text displays correctly):

If you don’t like it, you can change it in the code above

Click the button below to select a picture for prediction:

The smiley face below indicates a cat, and the command line prints the probability that the image is a cat.
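The Predict function wired to the button lives in the repository; a minimal sketch of its core logic, assuming it simply runs a forward pass and thresholds the Sigmoid output at 0.5 (the real version also updates the window with the smiley or sad image):

def Predict(data, parameters):
    # data is one image flattened to a (64*64*3, 1) column, just like the training data
    Yhat, _ = Forward_propagation(data, parameters)
    prob = Yhat[0, 0]                          # probability that the image is a cat
    print(f"probability of cat: {prob:.4f}")   # printed to the command line
    return 1 if prob > 0.5 else 0              # 1: is cat, 0: is not cat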

The complete code

The full code and data set can be accessed on my Github: github.com/Asthestarsf…

MegEngine

Only the model part is implemented here:

import megengine as mge
import megengine.module as M


class CustomMLP(M.Module):
    def __init__(self, layers: list, in_dim: int):
        super(CustomMLP, self).__init__()
        # named net (not modules) to avoid clashing with Module's own modules() method
        self.net = M.Sequential(*self._make_layer(layers, in_dim))

    def forward(self, inputs):
        return self.net(inputs)

    def _make_layer(self, layers, in_dim):
        length = len(layers)
        modules = [M.Linear(in_dim, layers[0])]
        for i in range(length - 1):
            activation = M.ReLU()
            layer = M.Linear(layers[i], layers[i + 1])
            modules.append(activation)
            modules.append(layer)
        modules.append(M.Sigmoid())
        return modules


model = CustomMLP([20, 8, 7, 1], 3)
print(model)

Running the script prints the network structure:

PyTorch

import torch
import torch.nn as nn


class CustomMLP(nn.Module):
    def __init__(self, layers: list, in_dim: int):
        super(CustomMLP, self).__init__()
        # named net (not modules) to avoid clashing with nn.Module's modules() method
        self.net = nn.Sequential(*self._make_layer(layers, in_dim))

    def forward(self, inputs):
        return self.net(inputs)

    def _make_layer(self, layers, in_dim):
        length = len(layers)
        modules = [nn.Linear(in_dim, layers[0])]
        for i in range(length - 1):
            activation = nn.ReLU()
            layer = nn.Linear(layers[i], layers[i + 1])
            modules.append(activation)
            modules.append(layer)
        modules.append(nn.Sigmoid())
        return modules


model = CustomMLP([20, 8, 7, 1], 3)
print(model)

The code is almost identical to the MegEngine version. If you have any questions, feel free to contact me!