"Deep Learning with PyTorch: A 60 Minute Blitz" is a tutorial on the official PyTorch website, and several translations of it are available online. With the release of PyTorch 1.0, the tutorial's code changed substantially. A Jupyter Notebook version will be published on GitHub. (Huang Haiguang)

If you want to run it locally, you can download it from GitHub:

Github.com/fengdu78/ma…

The official address of the tutorial this translation is based on:

Pytorch.org/tutorials/b…

By Soumith Chintala

Objectives of this tutorial:

  • Understanding PyTorch’s Tensor library and neural networks at a high level
  • Training a small neural network to classify images
  • This tutorial assumes a basic understanding of NumPy

Note: Make sure you have the Torch and TorchVision packages installed.

Contents

  • 1. What is PyTorch?
  • 2. Autograd: automatic differentiation
  • 3. Neural networks
  • 4. Training a classifier
  • 5. Data parallelism

1. What is PyTorch?

PyTorch is a Python-based scientific computing package targeted at two sets of users:

  • A replacement for NumPy that uses the power of GPUs

  • A deep learning research platform that provides maximum flexibility and speed

Getting started

Tensors

Tensors are similar to NumPy's ndarrays, with the addition that tensors can also use the GPU to accelerate computing.

from __future__ import print_function
import torch

Construct an uninitialized 5×3 matrix:

x = torch.empty(5, 3)
print(x)

Output:

tensor([[ 0.0000e+00,  0.0000e+00,  1.3004e-42],
        [ 0.0000e+00,  7.0065e-45,  0.0000e+00],
        [ 3.8593e+35,  7.8753e-43,  0.0000e+00],
        [ 0.0000e+00,  1.8368e-40,  0.0000e+00],
        [ 3.8197e+35,  7.8753e-43,  0.0000e+00]])

Construct a zero matrix of dtype long:

x = torch.zeros(5, 3, dtype=torch.long)
print(x)

Output:

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

Build a tensor directly from the data:

x = torch.tensor([5.5, 3])
print(x)

Output:

tensor([5.5000, 3.0000])

Or you can build a tensor from an existing tensor. These methods will reuse the properties of the input tensor, e.g. dtype, unless the user provides new values.

x = x.new_ones(5, 3, dtype=torch.double)      # new_* methods take in sizes
print(x)

x = torch.randn_like(x, dtype=torch.float)    # override dtype!
print(x)                                      # result has the same size

Output:

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[ 1.1701,  0.8342,  0.6769],
        [ 1.3060,  0.3636,  0.6758],
        [ 1.9133,  0.3494,  1.1412],
        [ 0.9735,  0.9492,  0.3082],
        [ 0.9469,  0.6815,  1.3808]])

Get the size of a tensor:

print(x.size())

Output:

torch.Size([5, 3])

Note

torch.Size is in fact a tuple, so it supports all tuple operations.
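Since torch.Size is a tuple, the usual tuple operations apply directly — a quick sketch (variable names here are illustrative):

```python
import torch

x = torch.zeros(5, 3)
rows, cols = x.size()           # tuple unpacking works
print(rows, cols)               # 5 3
print(x.size() == (5, 3))       # True: compares equal to a plain tuple
print(len(x.size()))            # 2: the number of dimensions
```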

Operations

Operations on tensors have multiple syntactic forms, and we’ll use addition as an example.

Syntax 1:

y = torch.rand(5, 3)
print(x + y)

Output:

tensor([[ 1.7199,  0.1819,  0.1543],
        [ 0.5413,  1.1591,  1.4098],
        [ 2.0421,  0.5578,  2.0645],
        [ 1.7301,  0.3236,  0.4616],
        [ 1.2805,  0.4026,  0.6916]])

Syntax 2:

print(torch.add(x, y))

Output:

tensor([[ 1.7199,  0.1819,  0.1543],
        [ 0.5413,  1.1591,  1.4098],
        [ 2.0421,  0.5578,  2.0645],
        [ 1.7301,  0.3236,  0.4616],
        [ 1.2805,  0.4026,  0.6916]])

Syntax 3: provide an output tensor as an argument

result = torch.empty(5, 3)
torch.add(x, y, out=result)
print(result)

Output:

tensor([[ 1.7199,  0.1819,  0.1543],
        [ 0.5413,  1.1591,  1.4098],
        [ 2.0421,  0.5578,  2.0645],
        [ 1.7301,  0.3236,  0.4616],
        [ 1.2805,  0.4026,  0.6916]])

Syntax 4: in-place operation

# adds x to y
y.add_(x)
print(y)

Output:

tensor([[ 1.7199,  0.1819,  0.1543],
        [ 0.5413,  1.1591,  1.4098],
        [ 2.0421,  0.5578,  2.0645],
        [ 1.7301,  0.3236,  0.4616],
        [ 1.2805,  0.4026,  0.6916]])

Note

Any operation that mutates a tensor in place is post-fixed with an underscore. For example, x.copy_(y) and x.t_() will change x.
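A minimal sketch of the difference: add_ mutates its receiver, while add returns a new tensor and leaves the original alone.

```python
import torch

x = torch.ones(2, 2)
y = x.add(1)               # out-of-place: returns a new tensor, x is unchanged
print(x.sum().item())      # 4.0: x is still all ones
x.add_(1)                  # in-place: x itself is modified
print(x.sum().item())      # 8.0: x is now all twos
print(torch.equal(x, y))   # True: both now hold the same values
```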

You can use standard NumPy-like indexing, with all its bells and whistles:

print(x[:, 1])

Output:

tensor([0.8342, 0.3636, 0.3494, 0.9492, 0.6815])

Resizing: if you want to resize/reshape a tensor, you can use torch.view:

x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)  # the size -1 is inferred from the other dimensions
print(x.size(), y.size(), z.size())

Output:

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])

If you have a one-element tensor, use .item() to get its value as a Python number:

x = torch.randn(1)
print(x)
print(x.item())

Output:

tensor([0.3441])
0.34412217140197754

NumPy bridge

Converting a Torch tensor to a NumPy array and vice versa is simple.

The Torch tensor and the NumPy array share their underlying memory, and changing one will also change the other.

Converting a Torch tensor to a NumPy array

a = torch.ones(5)
print(a)

Output:

tensor([1., 1., 1., 1., 1.])

Input:

b = a.numpy()
print(b)
print(type(b))

Output:

[ 1.  1.  1.  1.  1.]
<class 'numpy.ndarray'>

See how the NumPy array's values change when we modify the Torch tensor:

a.add_(1)
print(a)
print(b)

Output:

tensor([2., 2., 2., 2., 2.])
[ 2.  2.  2.  2.  2.]

Converting a NumPy array to a Torch tensor

See how changing the NumPy array automatically changes the Torch tensor:

import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

Output:

[ 2.  2.  2.  2.  2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)

All tensors on the CPU except CharTensor support converting to NumPy and back.

CUDA tensors

Tensors can be moved onto any device using the .to method.

# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!

Total script run time: 0.003 seconds

2. Autograd: automatic differentiation

At the heart of all the neural networks in PyTorch is the Autograd package. Let’s start with a brief introduction to the package, and then train our first neural network.

The autograd package provides automatic differentiation for all operations on tensors. It is a define-by-run framework, which means that backpropagation is defined by how your code runs, and every single iteration can be different.

Let’s look at the package with some simple examples:

Tensors

torch.Tensor is the central class of the package. If you set its attribute .requires_grad to True, it starts to track all operations on it. When the computation is finished you can call .backward() and have all the gradients computed automatically. The gradient for this tensor will be accumulated into the .grad attribute.

To stop a tensor from tracking history, call .detach() to detach it from the computation history and prevent future computation from being tracked.

To prevent tracking history (and using memory), you can also wrap the code block in with torch.no_grad():. This can be particularly helpful when evaluating a model, which may have trainable parameters with requires_grad=True, but for which we don't need the gradients.

There is another class that is very important for the Autograd implementation – Function.

Tensor and Function are interconnected and build up an acyclic graph that encodes a complete history of computation. Each tensor has a .grad_fn attribute that references the Function that created the Tensor (except for Tensors created by the user, whose grad_fn is None).

If you want to compute the derivatives, you can call .backward() on a Tensor. If the Tensor is a scalar (i.e. it holds one element of data), you don't need to specify any arguments to backward(); however, if it has more elements, you need to specify a gradient argument, which is a tensor of matching shape.

import torch

Create a tensor and set requires_grad=True to track computation with it:

x = torch.ones(2, 2, requires_grad=True)
print(x)

Output:

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

Perform an operation on the tensor:

y = x + 2
print(y)

Output:

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward>)

Since y was created as the result of an operation, it has a grad_fn, while x was created by the user, so its grad_fn is None.

print(y.grad_fn)
print(x.grad_fn)

Output:

<AddBackward object at 0x000001C015ADFFD0>
None

Perform more operations on y:

z = y * y * 3
out = z.mean()
print(z, out)

Output:

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward1>)

.requires_grad_(...) changes an existing Tensor's requires_grad flag in place. The input flag defaults to False if not given.

a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

Output:

False
True
<SumBackward0 object at 0x000001E020B79FD0>

Gradients

Let's backpropagate now. Because out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)).

out.backward()

Output the gradient d(out)/dx of out with respect to x:

print(x.grad)

Output:

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])

You should have got a matrix of 4.5s. Let's call the out tensor "o". We have o = (1/4) Σᵢ zᵢ, with zᵢ = 3(xᵢ + 2)² and zᵢ|ₓᵢ₌₁ = 27. Therefore ∂o/∂xᵢ = (3/2)(xᵢ + 2), hence ∂o/∂xᵢ|ₓᵢ₌₁ = 9/2 = 4.5.

Mathematically, what autograd computes is a vector-Jacobian product. This property makes it convenient to feed an external gradient into a model with a non-scalar output.

Now let's look at an example of a vector-Jacobian product:

x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
print(y)

Output:

tensor([ 384.5854,   13.6405, 1049.2870], grad_fn=<MulBackward0>)

Now in this case y is no longer a scalar. torch.autograd cannot compute the full Jacobian directly, but if we just want the vector-Jacobian product, we simply pass the vector to backward as an argument:

v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)
print(x.grad)

Output:

tensor([5.1200e+01, 5.1200e+02, 5.1200e-02])

You can also stop autograd from tracking history on tensors with .requires_grad=True by wrapping the code block in with torch.no_grad():

print(x.requires_grad)
print((x ** 2).requires_grad)
with torch.no_grad():
    print((x ** 2).requires_grad)

Output:

True
True
False

Documentation for autograd and Function is at https://pytorch.org/docs/autograd

3. Neural networks

Neural networks can be constructed using the torch.nn package.

Now that you have had a glimpse of autograd: nn depends on autograd to define models and differentiate them. An nn.Module contains layers and a method forward(input) that returns the output.

For example, look at this network that classifies digit images:

convnet

It is a simple feed-forward network. It takes the input, feeds it through several layers one after the other, and finally gives the output.

The typical training process of neural network is as follows:

  • Define the neural network model, which has some learnable parameters (or weights);
  • Iterating over data sets;
  • Processing input through neural network;
  • Calculate the loss (how far the output is from the correct value);
  • Propagate the gradients back into the network's parameters;
  • Update the weights of the network, typically using a simple update rule:

weight = weight - learning_rate * gradient

Define the network

Let’s define a network

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)

Output:

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

You just have to define the forward function; the backward function (where gradients are computed) is automatically defined for you using autograd. You can use any Tensor operation in the forward function.

net.parameters() returns the model's learnable parameters.

params = list(net.parameters())
print(len(params))
print(params[0].size())

Output:

10
torch.Size([6, 1, 5, 5])

Let's try a random 32×32 input. Note: the expected input size of this network (LeNet) is 32×32. To use this network on the MNIST dataset, resize the images in the dataset to 32×32.

input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

Output:

tensor([[ 0.1217,  0.0449,  0.0392,  0.1103,  0.0534,  0.1108,  0.0565,  0.0116,  0.0867,  0.0102]], grad_fn=<AddmmBackward>)
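As a side note on the MNIST remark above, one way to sketch the 28×28 → 32×32 resize is torch.nn.functional.interpolate (bilinear mode is an arbitrary choice here):

```python
import torch
import torch.nn.functional as F

# a dummy batch of one MNIST-sized (1-channel, 28x28) image
mnist_batch = torch.randn(1, 1, 28, 28)

# upsample to the 32x32 input that LeNet expects
resized = F.interpolate(mnist_batch, size=(32, 32),
                        mode='bilinear', align_corners=False)
print(resized.size())  # torch.Size([1, 1, 32, 32])
```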

Zero the gradient buffers of all parameters, then backpropagate with random gradients:

net.zero_grad()
out.backward(torch.randn(1, 10))

Note

  • torch.nn only supports mini-batches. The entire torch.nn package only supports inputs that are a mini-batch of samples, not a single sample.

  • For example, nn.Conv2d takes a 4-dimensional tensor of:

    nSamples × nChannels × Height × Width

  • If you have a single sample, just use input.unsqueeze(0) to add a fake batch dimension.
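A quick sketch of that last point, turning one 3-D sample into a 4-D mini-batch of size 1:

```python
import torch

sample = torch.randn(1, 32, 32)   # a single 1-channel 32x32 image: 3-D
batch = sample.unsqueeze(0)       # insert a fake batch dimension at position 0
print(sample.size())              # torch.Size([1, 32, 32])
print(batch.size())               # torch.Size([1, 1, 32, 32])
```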

Before proceeding further, let's recap all the classes we've seen so far.

Recap

  • torch.Tensor – a multi-dimensional array with support for autograd operations like backward(); also holds the gradient w.r.t. the tensor.
  • nn.Module – neural network module; a convenient way of encapsulating parameters, with helpers for moving them to the GPU, exporting, loading, etc.
  • nn.Parameter – a kind of tensor that is automatically registered as a parameter when assigned as an attribute to a Module.
  • autograd.Function – implements the forward and backward definitions of an autograd operation. Every tensor operation creates at least one Function node that connects to the functions that created the tensor and encodes its history.
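To illustrate the nn.Parameter point, here is a minimal sketch (the class and attribute names are made up for the example): a Parameter assigned as a Module attribute is registered automatically, while a plain tensor attribute is not.

```python
import torch
import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super(Tiny, self).__init__()
        self.w = nn.Parameter(torch.zeros(3))  # registered as a parameter
        self.scratch = torch.zeros(3)          # plain tensor: NOT registered

m = Tiny()
print(len(list(m.parameters())))  # 1: only self.w
```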

Loss function

A loss function takes the (output, target) pair as inputs (output is the network's output, target the true value) and computes a value that estimates how far the output is from the target.

There are several different loss functions in the nn package. A simple one is nn.MSELoss, which computes the mean squared error between the input (i.e. the network's output) and the target.

Such as:

output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)

Output:

tensor(0.5663, grad_fn=<MseLossBackward>)

Now, if you follow loss backward using its .grad_fn attribute, you will see a graph of computations that looks like this:

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss

So, when we call loss.backward(), the whole graph is differentiated with respect to the loss, and all tensors in the graph that have requires_grad=True will have their gradients accumulated into their .grad attribute.

To illustrate, let’s reverse track a few steps:

print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])

Output:

<MseLossBackward object at 0x0000029E54C509B0>
<AddmmBackward object at 0x0000029E54C50898>
<AccumulateGrad object at 0x0000029E54C509B0>

Backpropagation

To backpropagate the error, all we have to do is call loss.backward(). You need to clear the existing gradients first, though, or gradients will be accumulated into the existing ones.

Now we'll call loss.backward() and have a look at conv1's bias gradients before and after the backward pass.

net.zero_grad()     # zeroes the gradient buffers of all parameters
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

Output:

conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([0.0006, 0.0164, 0.0122, 0.0060, 0.0056, 0.0052])

Update the weights

The simplest update rule used in practice is stochastic gradient descent (SGD):

weight = weight - learning_rate * gradient

We can implement this rule using simple Python code.

learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)

However, as you use neural networks, you will want to use various different update rules such as SGD, Nesterov-SGD, Adam, RMSprop, etc. To enable this, we built a small package, torch.optim, that implements all these methods. Using it is very simple:

import torch.optim as optim
# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)
# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update

Note

Observe how the gradient buffers had to be manually set to zero using optimizer.zero_grad(). This is because gradients are accumulated, as explained in the Backpropagation section.
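A tiny sketch of that accumulation behavior on a single parameter (the numbers fall out of d(3w)/dw = 3):

```python
import torch

w = torch.ones(1, requires_grad=True)

(w * 3).sum().backward()
print(w.grad)             # tensor([3.])

(w * 3).sum().backward()  # no zeroing in between: gradients accumulate
print(w.grad)             # tensor([6.])

w.grad.zero_()            # what optimizer.zero_grad() does for each parameter
print(w.grad)             # tensor([0.])
```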

4. Training a classifier

You have learned how to define a neural network, calculate losses and update the weights of the network.

Now you’re probably wondering: Where does the data come from?

About data

Generally, when you have to deal with image, text, audio or video data, you can use standard Python packages to load the data into a NumPy array, and then convert that array into a torch.*Tensor.

  • For images, packages such as Pillow and OpenCV are useful
  • For audio, packages such as SciPy and Librosa
  • For text, either raw Python or Cython based loading, or NLTK and SpaCy, are useful

Specifically for vision, we have created a package called torchvision, which has data loaders for common datasets such as ImageNet, CIFAR10, MNIST, etc., and image transformers, i.e. torchvision.datasets and torch.utils.data.DataLoader.

This provides a huge convenience and avoids writing boilerplate code.

In this tutorial, we use the CIFAR10 dataset, which has 10 classes: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'. The images in CIFAR10 are of size 3 × 32 × 32, i.e. 3-channel color images of 32 × 32 pixels.


Training an image classifier

We will do the following steps in order:

  • Load and normalize the CIFAR10 training and test sets using torchvision
  • Define a convolutional neural network
  • Define a loss function
  • Train the network on the training data
  • Test the network on the test data

1. Load and normalize CIFAR10

Loading CIFAR10 with torchvision is very easy.

import torch
import torchvision
import torchvision.transforms as transforms

The output of torchvision datasets are PILImage images in the range [0, 1]. We transform them into tensors normalized to the range [-1, 1].

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')
# This step is a bit slow: it downloads about 340 MB of image data.


Output:

Files already downloaded and verified
Files already downloaded and verified
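To see the normalization arithmetic concretely: ToTensor maps pixel values into [0, 1], and Normalize with mean 0.5 and std 0.5 then computes (x - 0.5) / 0.5 per channel, mapping them into [-1, 1]. A sketch:

```python
import torch

pixels = torch.tensor([0.0, 0.5, 1.0])  # the range ToTensor produces
normalized = (pixels - 0.5) / 0.5       # what Normalize((0.5, ...), (0.5, ...)) computes
print(normalized)                       # tensor([-1.,  0.,  1.])
```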

Let's show some of the training images.

import matplotlib.pyplot as plt
import numpy as np
# functions to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()
# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

Output:

plane  deer   dog plane

2. Define a convolutional neural network

The network is copied from the Neural Networks section above and modified to take 3-channel images instead of single-channel images.

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

3. Define a loss function and optimizer

We use cross-entropy as the loss function and SGD with momentum as the optimizer.

import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

4. Train the network

This is when things start to get interesting. We simply loop over our data iterator, feed the inputs to the network, and optimize.

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

Output:

[1,  2000] loss: 2.286
[1,  4000] loss: 1.921
[1,  6000] loss: 1.709
[1,  8000] loss: 1.618
[1, 10000] loss: 1.548
[1, 12000] loss: 1.496
[2,  2000] loss: 1.435
[2,  4000] loss: 1.409
[2,  6000] loss: 1.373
[2,  8000] loss: 1.348
[2, 10000] loss: 1.326
[2, 12000] loss: 1.313
Finished Training

5. Test the network on the test data

We have trained the network for 2 passes over the training dataset. But we need to check whether the network has learned anything at all.

We will check this by predicting the class label that the neural network outputs and comparing it against the ground truth. If the prediction is correct, we add the sample to the list of correct predictions.

Okay, first step: let's get familiar with some images from the test set.

dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

Output:

GroundTruth:    cat  ship  ship plane

Now let's see what the neural network thinks these examples above are:

outputs = net(images)

The outputs are scores for the 10 classes. The higher the score for a class, the more the network thinks the image is of that class. So let's get the index of the highest score:

_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))

Output:

Predicted:    cat  ship  ship plane

It looks pretty good.

Let’s take a look at the results of the network on the entire test set.

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

Output:

Accuracy of the network on the 10000 test images: 54 %

That looks way better than chance, which would be 10% accuracy (randomly picking one class out of 10). It seems the network has learned something.

So which classes performed well, and which performed poorly?

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1
for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))

Output:

Accuracy of plane : 52 %
Accuracy of   car : 63 %
Accuracy of  bird : 43 %
Accuracy of   cat : 33 %
Accuracy of  deer : 36 %
Accuracy of   dog : 46 %
Accuracy of  frog : 68 %
Accuracy of horse : 62 %
Accuracy of  ship : 80 %
Accuracy of truck : 63 %

Training on the GPU

Just as you transfer a Tensor onto the GPU, you transfer the neural net onto the GPU. This operation recursively goes over all modules and converts their parameters and buffers to CUDA tensors.

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Assuming that we are on a CUDA machine, this should print a CUDA device:
print(device)

Output:

cuda:0

Assuming we are on a CUDA machine, this method will recursively go over all modules and convert their parameters and buffers to CUDA tensors:

net.to(device)

Output:

Net(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

Remember that you will have to send the inputs and targets to the GPU at every step as well:

inputs, labels = inputs.to(device), labels.to(device)

Why don't we notice a massive speedup compared to the CPU? Because the network is really small.

Exercise: Try increasing the width of your network (the 2nd argument of the first nn.Conv2d and the 1st argument of the second nn.Conv2d need to be the same number) and see what kind of speedup you get.
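One way this exercise could look (a sketch, with 32 as a hypothetical width; conv1's output channels must equal conv2's input channels):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WiderNet(nn.Module):
    # Same LeNet-style layout as above, but with 32 intermediate channels
    # instead of 6 (a hypothetical choice for the exercise).
    def __init__(self):
        super(WiderNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 5)   # widened: 6 -> 32 output channels
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(32, 16, 5)  # 1st argument must match conv1's width
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

out = WiderNet()(torch.randn(1, 3, 32, 32))
print(out.size())  # torch.Size([1, 10])
```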

Goals achieved:

  • Understood PyTorch's Tensor library and neural networks at a high level.
  • Trained a small neural network to classify images.

5. Data parallelism (optional)

Authors: Sung Kim and Jenny Kang

In this tutorial, we will learn how to use data parallelism to work with multiple GPUs.

It is very easy to use GPUs with PyTorch. You can put a model on a GPU as follows:

device = torch.device("cuda:0")
model.to(device)

Then you can copy all the tensors onto the GPU:

mytensor = my_tensor.to(device)

Note that just calling my_tensor.to(device) returns a new copy of my_tensor on the GPU rather than moving my_tensor itself. You need to assign it to a new tensor and use that tensor on the GPU.
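In other words, .to() is not in-place for tensors. A small sketch (it falls back to the CPU when no GPU is present, so the "wrong" line is only harmful on a CUDA machine):

```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

t = torch.zeros(3)
t.to(device)      # return value discarded: t itself has not moved
t = t.to(device)  # correct: rebind the name to the returned tensor
print(t.device)
```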

Performing forward and backward propagation on multiple GPUs is natural. However, PyTorch uses only one GPU by default. You can easily run your operations on multiple GPUs by wrapping your model in DataParallel:

model = nn.DataParallel(model)

This is the core behind this tutorial, which we’ll cover in more detail next.

Imports and parameters

Import the PyTorch module and define the parameters.

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
# Parameters and DataLoaders
input_size = 5
output_size = 2
batch_size = 30
data_size = 100

Device:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

Dummy dataset

To make a dummy (random) dataset, you just need to implement __getitem__.

class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)
    def __getitem__(self, index):
        return self.data[index]
    def __len__(self):
        return self.len
rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)


A simple model

As a demonstration, our model takes a single input, performs a linear operation, and returns the result. However, you can use DataParallel on any model (CNN, RNN, Capsule Net, etc.).

We have placed a print statement inside the model to monitor the size of the input and output tensors. Pay attention to what is printed at batch rank 0.

class Model(nn.Module):
    # Our model
    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)
    def forward(self, input):
        output = self.fc(input)
        print("\tIn Model: input size", input.size(),
              "output size", output.size())
        return output

Create a model and DataParallel

This is the core part of the tutorial. First, we need to make a model instance and check if we have multiple GPUs. If we do, we wrap our model with nn.DataParallel. Then we put the model on the GPUs via model.to(device).

model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
  print("Let's use", torch.cuda.device_count(), "GPUs!")
  # dim = 0: [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
  model = nn.DataParallel(model)
model.to(device)
Copy the code

Output:

Model(
  (fc): Linear(in_features=5, out_features=2, bias=True)
)
Copy the code


Run the model

Now we can see the sizes of the input and output tensors.

for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("Outside: input size", input.size(),
          "output_size", output.size())
Copy the code

Output:

In Model: input size torch.Size([30, 5]) output size torch.Size([30, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
  In Model: input size torch.Size([30, 5]) output size torch.Size([30, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
  In Model: input size torch.Size([30, 5]) output size torch.Size([30, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
  In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
Copy the code
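The last pair of lines shows a batch of 10 rather than 30 because data_size (100) is not a multiple of batch_size (30): without drop_last, the DataLoader yields full batches followed by one smaller batch holding the remainder. A quick sketch of that arithmetic (batch_sizes is a hypothetical helper, not part of PyTorch):

```python
def batch_sizes(data_size, batch_size):
    # Without drop_last, a DataLoader yields `full` complete batches,
    # then one smaller batch with the remainder, if any.
    full, remainder = divmod(data_size, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])

print(batch_sizes(100, 30))  # [30, 30, 30, 10]
```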


Results

When we batch 30 inputs and 30 outputs, the model gets 30 inputs and produces 30 outputs as expected. But if you have multiple GPUs, you will see something like the following.
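The per-GPU chunk sizes in the outputs that follow come from a ceiling-division split along dim 0, in the style of torch.chunk: each replica gets ceil(batch / num_gpus) samples, and the last replicas take whatever remains. A pure-Python sketch of that sizing rule (scatter_sizes is a name introduced here for illustration, not a PyTorch function):

```python
def scatter_sizes(batch_size, num_gpus):
    # Each replica receives ceil(batch_size / num_gpus) samples;
    # the final replica takes whatever is left over.
    chunk = -(-batch_size // num_gpus)  # ceiling division
    sizes = []
    remaining = batch_size
    while remaining > 0:
        sizes.append(min(chunk, remaining))
        remaining -= chunk
    return sizes

print(scatter_sizes(30, 2))  # [15, 15]
print(scatter_sizes(10, 3))  # [4, 4, 2]
```

Note that this also explains why, with 8 GPUs and a final batch of 10, only five replicas receive a chunk (of 2 samples each) in the 8-GPU output below.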

2 GPUs

If you have 2 GPUs, you will see:

# on 2 GPUs
Let's use 2 GPUs!
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
    In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
Copy the code

3 GPUs

If you have 3 GPUs, you will see:

Let's use 3 GPUs!
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
Copy the code

8 GPUs

If you have 8 GPUs, you will see:

Let's use 8 GPUs!
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
Copy the code


Summary

DataParallel automatically splits your data and dispatches the work to model replicas on multiple GPUs. After each replica finishes its job, DataParallel collects and merges the results before returning them to you.
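That split-then-merge behavior can be sketched in plain Python by treating a batch as a list of samples. Here scatter and gather are illustrative stand-ins for DataParallel's internal steps, not its actual API:

```python
def scatter(batch, num_devices):
    # Split a batch into ceiling-sized chunks, one chunk per device.
    chunk = -(-len(batch) // num_devices)  # ceiling division
    return [batch[i:i + chunk] for i in range(0, len(batch), chunk)]

def gather(outputs):
    # Concatenate per-device results back into a single batch, in order.
    return [sample for chunk in outputs for sample in chunk]

batch = list(range(30))
chunks = scatter(batch, 3)
print([len(c) for c in chunks])  # [10, 10, 10]
assert gather(chunks) == batch   # the round trip restores the original batch
```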

For more information, see here:

Pytorch.org/tutorials/b…

(End of text)

All the code for this article is posted on Huang Haiguang's Github (and will be updated):

Github.com/fengdu78/ma…

Official original content (English):

Pytorch.org/tutorials/b…

