The original link: mp.weixin.qq.com/s/Q8tNXsDh6…

This is the second article in the PyTorch quick-start tutorial; it explains how to build a neural network. Previous article:

  • Quick start Pytorch(1)- Installation, tensors and gradients

Contents of this article:


Neural networks

In PyTorch, torch.nn is used to build neural networks. nn.Module contains the network layers and a forward(input) method that returns the network output.

Here is a classic LeNet network for classifying characters.

For a neural network, a standard training process goes like this (a minimal sketch of one iteration follows the list):

  • Define a multi-layer neural network
  • Preprocess and prepare the dataset as input to the network
  • Feed the data into the network
  • Compute the loss of the network output
  • Backpropagate to compute the gradients
  • Update the weights of the network; a simple update rule is weight = weight - learning_rate * gradient
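
The sketch below strings these steps together for a single training iteration. It assumes a network net, a loss criterion, a learning_rate, and a batch of inputs and targets are already defined; these names are illustrative, and each step is explained in detail in the sections that follow.

output = net(inputs)                        # feed the data into the network
loss = criterion(output, targets)           # compute the loss
net.zero_grad()                             # clear the old gradients
loss.backward()                             # backpropagate to get the gradients
for w in net.parameters():                  # simple manual update rule
    w.data.sub_(learning_rate * w.grad.data)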

3.1 Defining a Network

First, define the neural network. The following is a five-layer convolutional neural network consisting of two convolutional layers and three fully connected layers:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # Input image is single channel, conv1 kernel size = 5*5, 6 output channels
        self.conv1 = nn.Conv2d(1, 6, 5)
        # conv2 kernel size = 5*5, 16 output channels
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Fully connected layers
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) sliding window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the pooling window is square, a single number is enough, e.g. 2 instead of (2, 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        # All dimensions except the batch dimension
        size = x.size()[1:]
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)

Print network structure:

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
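
As a quick sanity check of where fc1's in_features=400 comes from, the sketch below traces the spatial size of a 32*32 input through the two conv/pool stages, reusing the net instance created above:

# 32x32 -> conv1 (5x5) -> 28x28 -> pool (2x2) -> 14x14
#       -> conv2 (5x5) -> 10x10 -> pool (2x2) -> 5x5, with 16 channels
# so fc1 expects 16 * 5 * 5 = 400 input features, matching the printed structure
x = torch.randn(1, 1, 32, 32)
x = F.max_pool2d(F.relu(net.conv1(x)), 2)
x = F.max_pool2d(F.relu(net.conv2(x)), 2)
print(x.size())   # torch.Size([1, 16, 5, 5])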

Only the forward function must be implemented; the backward function is defined automatically by autograd, and any tensor operation can be used inside forward.
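
As a small illustration (this toy module is not part of the tutorial's network), forward can freely mix layers with ordinary tensor operations, and autograd still provides backward automatically:

# A toy module, for illustration only: forward mixes a linear layer with plain tensor ops
class TinyNet(nn.Module):
    def __init__(self):
        super(TinyNet, self).__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        x = self.fc(x)
        return torch.tanh(x) + x.mean()   # arbitrary tensor operations are fine

tiny = TinyNet()
y = tiny(torch.randn(3, 4)).sum()
y.backward()                              # backward is defined automatically by autograd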

net.parameters() returns the network's trainable parameters, as shown in the following example:

params = list(net.parameters())
print('Number of parameters:', len(params))
# conv1.weight
print('First parameter size:', params[0].size())

Output:

Number of parameters: 10
First parameter size: torch.Size([6, 1, 5, 5])
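
To see where the 10 parameters come from (a weight and a bias for each of the five layers), a small sketch listing them by name:

# List every trainable parameter and its shape -- 10 entries in total
for name, p in net.named_parameters():
    print(name, p.size())
# conv1.weight / conv1.bias, conv2.weight / conv2.bias,
# fc1.weight / fc1.bias, fc2.weight / fc2.bias, fc3.weight / fc3.bias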

Then run a simple test of the network with a randomly generated 32*32 input:

# Randomly define a variable input to the network
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

Output result:

tensor([[ 0.1005, 0.0263, 0.0013, 0.1157, 0.1197, 0.0141, 0.1425, 0.0521, 0.0689, 0.0220]], grad_fn=<ThAddmmBackward>)

Then, for backpropagation, first clear the gradient buffers of all parameters and then backpropagate with a random gradient:

# Clear the gradient buffers of all parameters, then backpropagate with a random gradient
net.zero_grad()
out.backward(torch.randn(1, 10))

Note:

torch.nn only supports **mini-batches** of data; the input cannot be a single sample. For example, nn.Conv2d takes a 4-dimensional tensor of shape nSamples * nChannels * Height * Width.

So, if you have a single sample, use input.unsqueeze(0) to add a fake batch dimension, turning the 3-dimensional tensor into a 4-dimensional one.
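
For example, a minimal sketch (the sample shape here is just for illustration):

# A single sample with shape (channels, height, width)
single_sample = torch.randn(1, 32, 32)
print(single_sample.size())              # torch.Size([1, 32, 32])
# Add a fake batch dimension at position 0
batched = single_sample.unsqueeze(0)
print(batched.size())                    # torch.Size([1, 1, 32, 32])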

3.2 Loss function

A loss function takes the pair (output, target), i.e. the network output and the ground-truth label, and returns a value that measures how far the output is from the target.

PyTorch already provides many loss functions. Here we use a simple mean squared error: nn.MSELoss.

output = net(input)
# Define a dummy target
target = torch.randn(10)
# Reshape it to the same size as output
target = target.view(1, -1)
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

The output is as follows:

tensor(0.6524, grad_fn=<MseLossBackward>)
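
As a quick check of what MSELoss computes, the same value can be reproduced by hand (a small sketch using the output and target from above):

# Mean squared error computed manually; should match criterion(output, target)
manual_loss = ((output - target) ** 2).mean()
print(manual_loss)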

The computation graph of the whole network, from the input data to the loss, is shown below; it is the path along which the loss is computed, from the input layer to the output:

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss

When loss.backward() is called, the whole graph is differentiated with respect to the loss; every tensor in the graph whose attribute requires_grad=True will have the gradient accumulated into its .grad tensor.

Use code to illustrate:

# MSELoss
print(loss.grad_fn)
# Linear layer
print(loss.grad_fn.next_functions[0][0])
# ReLU
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])

Output:

<MseLossBackward object at 0x0000019C0C349908>

<ThAddmmBackward object at 0x0000019C0C365A58>

<ExpandBackward object at 0x0000019C0C3659E8>
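
To make the accumulation behaviour of .grad concrete, here is a separate minimal sketch, unrelated to the LeNet example above:

# .grad accumulates across backward() calls until it is explicitly zeroed
w = torch.ones(3, requires_grad=True)
(w * 2).sum().backward()
print(w.grad)            # tensor([2., 2., 2.])
(w * 3).sum().backward()
print(w.grad)            # tensor([5., 5., 5.]) -- accumulated, not overwritten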

3.3 Back Propagation

Backpropagation only requires calling loss.backward(). Of course, the existing gradients must be cleared first with the zero_grad() method; otherwise the new gradients would be accumulated on top of the previous ones, which would affect the weight updates.

Here is a simple example that prints the gradient of conv1's bias before and after backpropagation:

# Clear the gradient buffers of all parameters
net.zero_grad()
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

Output result:

conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])

conv1.bias.grad after backward
tensor([ 0.0069, 0.0021, 0.0090, 0.0060, 0.0008, 0.0073])

To learn more about the torch.nn library, check out the official documentation:

Pytorch.org/docs/stable…

3.4 Updating weights

The simplest weight-update rule is Stochastic Gradient Descent (SGD):

weight = weight - learning_rate * gradient

Following this rule, the code implementation looks like this:

# Simple weight implementation update example
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
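
An equivalent sketch in a slightly different style, updating the parameters inside torch.no_grad() instead of going through .data:

# Same SGD update, written with torch.no_grad()
with torch.no_grad():
    for f in net.parameters():
        f -= learning_rate * f.grad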

However, this is only the simplest rule. Deep learning uses many optimization algorithms besides SGD, such as Nesterov-SGD, Adam, and RMSProp. To use these different methods, PyTorch provides the torch.optim library, as shown in the following example:

import torch.optim as optim
# create optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# Perform the following actions during training
optimizer.zero_grad()  # Clear the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
# Update weight
optimizer.step()

Note that optimizer.zero_grad() must still be called to clear the gradient buffers.
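
Swapping in one of the other optimizers mentioned above only changes the construction line; for example, a hedged sketch with Adam (the learning rate here is just an illustrative value):

# Same training step, but with the Adam optimizer instead of SGD
optimizer = optim.Adam(net.parameters(), lr=0.001)

optimizer.zero_grad()
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()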

The tutorial for this section:

Pytorch.org/tutorials/b…

The code for this section:

Github.com/ccc013/Deep…


Summary

This second article mainly covers how to build a neural network: defining the network, choosing a loss function, computing gradients with backpropagation, and updating the weight parameters.

You are welcome to follow my WeChat official account, Machine Learning and Computer Vision, or scan the QR code below, so that we can communicate, learn, and make progress together!

Previous highlights

Machine learning series
  • A hands-on machine learning tutorial for beginners!
  • Model evaluation, over-fitting, under-fitting and hyperparameter tuning methods
  • Summary and Comparison of Commonly used Machine Learning Algorithms
  • Summary and Comparison of Common Machine Learning Algorithms (PART 1)
  • How to Build a Complete Machine Learning Project
  • Data Preprocessing for feature Engineering (PART 1)
  • Learn about eight applications of computer vision
Github projects & Resource tutorials recommended
  • [Github Project recommends] a better site for reading and finding papers
  • TensorFlow is now available in Chinese
  • Must-read AI and Deep learning blog
  • An easy-to-understand TensorFlow tutorial
  • Recommend some Python books and tutorials, both beginner and advanced!
  • [Github project recommendation] Machine learning & Python
  • [Github Project Recommendations] Here are three tools to help you get the most out of Github
  • Github provides information about universities and foreign open course videos
  • Did you pronounce all these words correctly? Plus three recommended English tutorials just for programmers!