We hoped to turn learning into a game, but learning is not a game after all; rather, we simply found ourselves hooked on PyTorch.

How can we use PyTorch to implement a network? Networks nowadays are more and more complicated, and it is hard to grasp an author's ideas at a glance. How, then, can we read the implementations of today's popular papers? Often the reason we cannot follow them is that our basics are not solid: we did not grow with the field step by step as the authors did, but jumped into the industry midway. No hurry; today we start from something simple. Building a simple network usually requires the following basics.

  • Break the task down into a problem we can model
  • Find an appropriate set of functions; usually we use a network with a certain structure to approximate this complex set of functions
  • Collect appropriate data that accurately reflects the task
  • Set a target, which for a neural network is the loss function, so that the model knows in which direction to adjust its parameters
  • Set the policy for tuning the parameters, that is, the optimizer
  • Set the evaluation metrics, that is, the indicators the task cares about most and which reflect how well it is being solved, such as accuracy, precision and recall

Everything we do here is based on PyTorch. People like PyTorch because it provides an elegant, module- and class-based design.

import torch
import torch.nn as nn

torch.nn is the network design module; it provides the classes used to design a neural network.

import torch.nn.functional as F

This module provides functions for defining convolutions, activation functions and so on. After using PyTorch for some time, you may notice there is some overlap between torch.nn and torch.nn.functional. The difference is that torch.nn provides classes that inherit from nn.Module and are stateful, whereas torch.nn.functional provides stateless functions that require everything to be passed in. For example, an nn.Conv2d module has internal attributes such as self.weight, while F.conv2d just defines the operation and needs all the parameters (including weights and biases) passed explicitly.
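
To make the difference concrete, here is a minimal sketch (the layer sizes are arbitrary, chosen only for illustration): the module owns its weights, while the functional call needs them passed in.

conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)  # module: owns conv.weight and conv.bias
x = torch.randn(1, 1, 28, 28)
out_module = conv(x)
# functional form: the same operation, but the weights must be passed explicitly
out_functional = F.conv2d(x, conv.weight, conv.bias)
print(torch.allclose(out_module, out_functional))  # True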

If torch.nn is more comprehensive than torch.nn.functional, why do we need the latter at all? Because nn.functional carries no extra baggage, it is more flexible than torch.nn and therefore indispensable.

x = torch.randn(1,1)
w = nn.Parameter(torch.randn(1,1))

output = x * w
print(output)

If a variable needs to take part in gradient computation, the tensor returned by the calculation carries a grad_fn attribute; here the output tensor gets its grad_fn because the parameter w requires gradients.

tensor([[-0.1428]], grad_fn=<MulBackward0>)

output.backward()
print(w.grad)

tensor([[0.2470]])
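
Since output = x * w, the derivative of output with respect to w is simply x, so the value printed for w.grad is just the value that x happened to take.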

After that small detour we return to the main task and continue importing the packages we need.

from torch.utils.data import DataLoader
import torch.optim as optim

import torchvision.datasets as datasets
import torchvision.transforms as transforms

From torch.utils.data we import DataLoader to load the dataset; later we will look at how to define a custom dataset. We also import torch.optim, which provides the optimizers used below. torchvision is a package built on PyTorch that provides predefined models, datasets and transforms for computer vision; these tools make it easy to do vision-oriented work.

Define the network

class Net(nn.Module):
  def __init__(self,input_size,num_classes):
    super(Net,self).__init__()
    self.hidden_dim = 30
    # define the basic layers: two fully connected layers with a hidden size of 30
    self.fc1 = nn.Linear(input_size,self.hidden_dim)
    self.fc2 = nn.Linear(self.hidden_dim,num_classes)

  def forward(self,x):
    # organize the layers: fc1 -> ReLU -> fc2
    x = F.relu(self.fc1(x))
    x = self.fc2(x)
    return x

Usually, in __init__ we define the basic modules or layers of the network, and in forward we organize those layers into the actual computation. This network consists of two fully connected layers with a ReLU in between; input_size specifies the size of each input sample and num_classes the number of categories. We then pass x in for forward propagation.

model = Net(784,10)
x = torch.randn(64,784)
print(model(x).shape)

Here we define a network and specify that the input sample size is 28 x 28 = 784, which you can recognize at a glance as the MNIST dataset: each input is a single-channel 28 x 28 image, flattened to 784 dimensions before being fed into the network.

In x = torch.randn(64,784), the 64 means a batch of 64 samples is fed in, and for each sample the network outputs a score for each of the 10 categories.

torch.Size([64, 10])
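
Note that these ten numbers per sample are raw scores (logits) rather than probabilities; if probabilities are needed, a softmax can be applied, for example:

probs = F.softmax(model(x), dim=1)
print(probs.sum(dim=1))  # each row sums to 1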

Setting the device the network runs on

Python lets us assign the result of a conditional expression directly to a variable, so selecting the device becomes a one-liner.

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Setting hyperparameters

input_size = 784
num_classes = 10
learning_rate = 0.001
batch_size = 64
num_epochs = 1

These hyperparameters are not learned by the network; we set them ourselves. They are the input sample size, the number of classes, the learning rate, the number of samples fed to the model at a time (the batch size), and the number of passes over the whole dataset. An epoch is one complete pass of all the training samples through the model.
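
With MNIST's 60,000 training images and a batch size of 64, one epoch is roughly 60000 / 64 ≈ 938 iterations.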

Preparing the data set

train_dataset = datasets.MNIST(root='dataset/',train=True,transform=transforms.ToTensor(),download=True)
train_loader = DataLoader(dataset=train_dataset,batch_size=batch_size,shuffle=True)

test_dataset = datasets.MNIST(root='dataset/',train=False,transform=transforms.ToTensor(),download=True)
test_loader = DataLoader(dataset=test_dataset,batch_size=batch_size,shuffle=True)

The datasets module provides many datasets, including MNIST. root specifies where the dataset is stored; train indicates whether to load the training split; transform preprocesses the images; download indicates whether to download the dataset if it is not already there. The dataset defines how to fetch individual samples and what structure each sample has, while the DataLoader defines how the data is served: how many samples we feed per batch and whether the order is shuffled each time.
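
As a quick sanity check, we can index a single sample straight from the dataset and look at the shape the DataLoader will later be batching:

image, label = train_dataset[0]
print(image.shape, label)  # torch.Size([1, 28, 28]) and an integer class label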

Initializing the model

model = Net(input_size=input_size,num_classes=num_classes).to(device)

Loss functions and optimizers

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(),lr=learning_rate)

For the loss function we choose cross entropy, which is the usual choice for classification problems, and for the optimizer we use Adam.
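
One thing worth knowing about nn.CrossEntropyLoss is that it expects the raw scores from the network together with integer class labels, and applies the softmax internally; a tiny illustration:

logits = torch.randn(4, 10)           # raw scores for 4 samples
targets = torch.tensor([3, 0, 7, 1])  # integer class labels
print(nn.CrossEntropyLoss()(logits, targets))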


Training the model

Now we start training the model. This is a long process, and it is one of the places where experience shows most clearly, so let's go through it bit by bit.

for epoch in range(num_epochs):
  for index,(data,targets) in enumerate(train_loader):
    data = data.to(device)
    targets = targets.to(device) 
    print(data.shape)
    break
    

The DataLoader behaves like a generator: each iteration yields one batch of sample data.

torch.Size([64, 1, 28, 28])

64 is the number of images in each batch, 1 is the number of channels, and 28 and 28 are the width and height of each image respectively.
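
Before such a batch can go into our fully connected network, each image has to be flattened: 1 × 28 × 28 = 784, so the reshape in the full training loop below turns [64, 1, 28, 28] into [64, 784].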

for epoch in range(num_epochs):
  for index,(data,targets) in enumerate(train_loader):
    data = data.to(device)
    targets = targets.to(device) 
    data = data.reshape(data.shape[0],-1)

    # forward propagation
    pred = model(data)
    loss = criterion(pred,targets)

    # Backpropagation
    optimizer.zero_grad()
    loss.backward()

    # update parameters
    optimizer.step()
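
To keep an eye on progress it can help to print the loss every so often; a small optional addition inside the inner loop (not part of the original code) might look like this:

    # e.g. right after optimizer.step(), inside the inner loop
    if index % 100 == 0:
      print(f"epoch {epoch} iter {index} loss {loss.item():.4f}")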

Validation

Validation mainly means checking, every so many iterations, whether the model produced by the current round of updates is the best so far according to some metric, so that we can save it.

def validation_acc(loader,model):
  num_correct = 0
  num_samples = 0
  model.eval()
  with torch.no_grad():
    for x,y in loader:
      x = x.to(device)
      y = y.to(device)

      x = x.reshape(x.shape[0],-1)

      y_hat = model(x)
      _,pred = y_hat.max(1)

      num_correct += torch.eq(pred,y).sum()
      num_samples += pred.size(0)
    print(f"acc {float(num_correct)/float(num_samples)}")
  acc = float(num_correct)/float(num_samples)
  model.train()
  return acc

The first step is to switch the model to eval mode. In eval mode some layers behave differently than during training, for example Dropout and BatchNorm layers, and for evaluation these training-time behaviors need to be turned off. We also wrap the loop in torch.no_grad() to disable gradient computation.
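
A quick way to see the effect of torch.no_grad(): tensors produced inside the block carry no grad_fn and will not take part in backpropagation.

with torch.no_grad():
  y = model(torch.randn(1, 784).to(device))
print(y.requires_grad)  # False: no computation graph was built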

validation_acc(test_loader,model)

The output

acc 0.9215
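
As mentioned above, validation is usually combined with checkpointing so that the best model seen so far is kept. A minimal sketch of that idea (the file name is just an example):

best_acc = 0.0
acc = validation_acc(test_loader, model)
if acc > best_acc:
  best_acc = acc
  torch.save(model.state_dict(), 'best_model.pth')  # example path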