“Deep Learning with PyTorch: A 60 Minute Blitz” is a tutorial on PyTorch’s website, and several translations of it are available online. With the release of PyTorch 1.0 there were major code changes in this tutorial; a Jupyter Notebook version of this translation will be published on Github. (Huang Haiguang)
If you need to debug locally, you can download it from Github:
Github.com/fengdu78/ma…
The official address of the tutorial this translation is based on:
Pytorch.org/tutorials/b…
By Soumith Chintala
Objectives of this tutorial:
- Understanding PyTorch’s Tensor library and neural networks at a high level
- Training a small neural network to classify images
- This tutorial assumes a basic understanding of Numpy
Note: make sure you have the torch and torchvision packages installed.
Contents
- What is PyTorch?
- Autograd: automatic differentiation
- Neural networks
- Training a classifier
- Data parallelism
What is PyTorch?
It is a Python-based scientific computing package aimed at two kinds of users:
- A replacement for NumPy that can use the power of GPUs
- A deep learning research platform that provides maximum flexibility and speed
Getting started
Tensors
Tensors are similar to NumPy’s ndarrays, except that tensors can also use the GPU to accelerate computation.
from __future__ import print_function
import torch
Construct an uninitialized 5×3 matrix:
x = torch.Tensor(5, 3)
print(x)
Output:
tensor([[ 0.0000e+00,  0.0000e+00,  1.3004e-42],
        [ 0.0000e+00,  7.0065e-45,  0.0000e+00],
        [ 3.8593e+35,  7.8753e-43,  0.0000e+00],
        [ 0.0000e+00,  1.8368e-40,  0.0000e+00],
        [ 3.8197e+35,  7.8753e-43,  0.0000e+00]])
Construct a matrix of zeros with dtype long:
x = torch.zeros(5, 3, dtype=torch.long)
print(x)
Output:
tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
Build a tensor directly from the data:
x = torch.tensor([5.5, 3])
print(x)
Output:
tensor([5.5000, 3.0000])
Or you can create a tensor based on an existing tensor. These methods will reuse properties of the input tensor, e.g. its dtype, unless new values are provided by the user.
x = x.new_ones(5, 3, dtype=torch.double)  # new_* methods take in sizes
print(x)

x = torch.randn_like(x, dtype=torch.float)
print(x)                                   # result has the same size
Output:
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[ 1.1701,  0.8342,  0.6769],
        [ 1.3060,  0.3636,  0.6758],
        [ 1.9133,  0.3494,  1.1412],
        [ 0.9735,  0.9492,  0.3082],
        [ 0.9469,  0.6815,  1.3808]])
Get the size of the tensor:
print(x.size())
Output:
torch.Size([5, 3])
Note
torch.Size is in fact a tuple, so it supports all tuple operations.
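For instance, with the x above (a minimal illustration):
rows, cols = x.size()   # tuple unpacking works
print(rows, cols)       # 5 3
print(x.size()[0])      # indexing works too: 5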
Operations
There are multiple syntaxes for operations on tensors; we’ll use addition as an example.
Syntax 1
y = torch.rand(5, 3)
print(x + y)
Output:
tensor([[ 1.7199,  0.1819,  0.1543],
        [ 0.5413,  1.1591,  1.4098],
        [ 2.0421,  0.5578,  2.0645],
        [ 1.7301,  0.3236,  0.4616],
        [ 1.2805,  0.4026,  0.6916]])
Syntax 2
print(torch.add(x, y))
Output:
tensor([[ 1.7199,  0.1819,  0.1543],
        [ 0.5413,  1.1591,  1.4098],
        [ 2.0421,  0.5578,  2.0645],
        [ 1.7301,  0.3236,  0.4616],
        [ 1.2805,  0.4026,  0.6916]])
Syntax 3: providing an output tensor as an argument
result = torch.empty(5, 3)
torch.add(x, y, out=result)
print(result)
Output:
tensor([[ 1.7199,  0.1819,  0.1543],
        [ 0.5413,  1.1591,  1.4098],
        [ 2.0421,  0.5578,  2.0645],
        [ 1.7301,  0.3236,  0.4616],
        [ 1.2805,  0.4026,  0.6916]])
Syntax 4: in-place operation
# adds x to y, in place
y.add_(x)
print(y)
Output:
tensor([[ 1.7199,  0.1819,  0.1543],
        [ 0.5413,  1.1591,  1.4098],
        [ 2.0421,  0.5578,  2.0645],
        [ 1.7301,  0.3236,  0.4616],
        [ 1.2805,  0.4026,  0.6916]])
Note
Any operation that mutates a tensor in place is post-fixed with an underscore ‘_’. For example, x.copy_(y) and x.t_() will change x.
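A quick sketch of the in-place convention (t and u are throwaway tensors, not part of the tutorial’s code):
t = torch.ones(2, 2)
t.add_(1)       # in place: t itself becomes all 2s
u = t.add(1)    # not in place: t is unchanged, u is all 3s
t.t_()          # in-place transpose of t
print(t, u)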
You can use standard NumPy-like indexing, with all its usual features:
print(x[:, 1])
Output:
tensor([0.8342, 0.3636, 0.3494, 0.9492, 0.6815])
Resizing: if you want to resize/reshape a tensor, you can use torch.view:
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)  # the size -1 is inferred from the other dimensions
print(x.size(), y.size(), z.size())
Output:
torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
If you have a one-element tensor, use .item() to get its value as a Python number:
x = torch.randn(1)
print(x)
print(x.item())
Output:
tensor([0.3441])
0.34412217140197754
NumPy bridge
Converting a Torch tensor to a NumPy array, and vice versa, is simple.
The Torch tensor and the NumPy array share the underlying memory, so changing one will also change the other.
Converting a Torch tensor to a NumPy array
a = torch.ones(5)
print(a)
Output:
tensor([1., 1., 1., 1., 1.])
Input:
b = a.numpy()
print(b)
print(type(b))
Output:
[ 1. 1. 1. 1. 1.]
<class 'numpy.ndarray'>
Let’s see how the value of the NumPy array changes when we modify the Torch tensor:
a.add_(1)
print(a)
print(b)
Output:
tensor([2., 2., 2., 2., 2.])
[ 2.  2.  2.  2.  2.]
Converting a NumPy array to a Torch tensor
See how changing the NumPy array automatically changes the Torch tensor:
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)
Output:
[ 2.  2.  2.  2.  2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
All tensors on the CPU, except CharTensor, support converting to NumPy and back.
CUDA tensors
Tensors can be moved onto any device using the .to method.
# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!
Total script run time: 0.003 seconds
Autograd: automatic differentiation
At the heart of all the neural networks in PyTorch is the Autograd package. Let’s start with a brief introduction to the package, and then train our first neural network.
The autograd package provides automatic differentiation for all operations on tensors. It is a define-by-run framework, which means that backpropagation is defined by how your code is run, and can be different on every iteration.
Let’s look at the package with some simple examples:
Tensor
torch.Tensor is the central class of the package. If you set its attribute .requires_grad to True, it starts to track all operations on it. When you finish your computation you can call .backward() and have all the gradients computed automatically. The gradient for this tensor is accumulated into the .grad attribute.
To stop a tensor from tracking history, you can call .detach() to detach it from the computation history and prevent future computation from being tracked.
To prevent tracking history (and using memory), you can also wrap a code block in with torch.no_grad():. This can be particularly helpful when evaluating a model, because the model may have trainable parameters with requires_grad=True for which we don’t need the gradients.
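A minimal sketch of .detach() (w and v are stand-in tensors, not part of the tutorial’s code):
w = torch.ones(3, requires_grad=True)
v = (w * 2).detach()    # v holds the same values but is cut off from the graph
print(v.requires_grad)  # False: operations on v will not be tracked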
There is one more class which is very important for the autograd implementation: Function.
Tensor and Function are interconnected and build up an acyclic graph that encodes a complete history of the computation. Each tensor has a .grad_fn attribute that references the Function that created it (except for tensors created by the user, whose grad_fn is None).
If you want to compute the derivatives, you can call .backward() on a Tensor. If the Tensor is a scalar (i.e. it holds a single element of data), you don’t need to specify any arguments to backward(); if it has more elements, you need to specify a gradient argument, which is a tensor of matching shape.
import torch
Create a tensor and set requires_grad = True to track its calculation
x = torch.ones(2, 2, requires_grad=True)
print(x)
Output:
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
Perform an operation on the tensor:
y = x + 2
print(y)
Output:
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward>)
Since y was created as a result of an operation, it has a grad_fn, while x was created by the user, so its grad_fn is None.
print(y.grad_fn)
print(x.grad_fn)
Output:
<AddBackward object at 0x000001C015ADFFD0>
None
Do more operations on y:
z = y * y * 3
out = z.mean()
print(z, out)
Output:
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward1>)
.requires_grad_(...) changes an existing Tensor’s requires_grad flag in place. The flag defaults to False if not given.
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)
Output:
False
True
<SumBackward0 object at 0x000001E020B79FD0>
Gradients
Let’s backpropagate now. out.backward() is equivalent to out.backward(torch.tensor(1.)).
out.backward()
Output the gradient d(out)/dx of out with respect to x:
print(x.grad)
Output:
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
You should get a matrix with all values equal to 4.5. Let’s call the out tensor o; then:
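From the code above, x is all ones, y = x + 2, z = 3y², and out = z.mean(), so o = (1/4) Σᵢ zᵢ with zᵢ = 3(xᵢ + 2)², which equals 27 at xᵢ = 1. Therefore ∂o/∂xᵢ = (3/2)(xᵢ + 2), and at xᵢ = 1 this gives 9/2 = 4.5.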
More generally, torch.autograd is an engine for computing vector-Jacobian products. This property makes it convenient to feed external gradients into a model that has a non-scalar output.
Now let’s look at an example of a vector-Jacobian product:
x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
print(y)
Output:
tensor([ 384.5854,  13.6405,  1049.2870], grad_fn=<MulBackward0>)
In this case y is no longer a scalar. torch.autograd cannot compute the full Jacobian directly, but if we just want the vector-Jacobian product, we simply pass the vector to backward as an argument:
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)
print(x.grad)
Output:
tensor([5.1200e+01, 5.1200e+02, 5.1200e-02])
You can also stop autograd from tracking history on tensors with .requires_grad=True by wrapping the code block in with torch.no_grad():
print(x.requires_grad)
print((x ** 2).requires_grad)
with torch.no_grad():
    print((x ** 2).requires_grad)
Output:
True
True
False
Documentation for autograd and Function is at http://pytorch.org/docs/autograd
Neural networks
Neural networks can be constructed using the torch.nn package.
Now that you have seen the autograd package: nn depends on autograd to define models and differentiate them. An nn.Module contains layers and a forward(input) method that returns the output.
For example, look at this network that classifies digit images:
(figure: convnet, the LeNet architecture)
It is a simple feed-forward network: it takes the input, feeds it through several layers one after the other, and finally gives the output.
The typical training process of neural network is as follows:
- Define the neural network model, which has some learnable parameters (or weights);
- Iterating over data sets;
- Processing input through neural network;
- Calculate the loss (the difference between the output and the correct value)
- The gradient is propagated back to the network parameters;
- Update the weights of the network, typically using a simple update rule:
weight = weight - learning_rate * gradient
Define the network
Let’s define a network
import torch
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)
Output:
Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
You only have to define the forward function; the backward function (where gradients are computed) is automatically created for you by autograd. You can use any of the Tensor operations in the forward function.
net.parameters() returns the model’s learnable parameters.
params = list(net.parameters())
print(len(params))
print(params[0].size())
Output:
10
torch.Size([6, 1, 5, 5])
Let’s try a random 32×32 input. Note: the expected input size of this network (LeNet) is 32×32. To use this network on the MNIST dataset, resize the images in the dataset to 32×32.
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)
Output:
tensor([[ 0.1217,  0.0449,  0.0392,  0.1103,  0.0534,  0.1108,  0.0565,  0.0116,  0.0867,  0.0102]], grad_fn=<AddmmBackward>)
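As an aside, the 32×32 resizing for MNIST mentioned above could be done with a torchvision transform; a minimal sketch (this Compose pipeline is an illustration, not part of the original tutorial):
import torchvision.transforms as transforms

mnist_transform = transforms.Compose([
    transforms.Resize((32, 32)),  # LeNet expects 32x32 inputs
    transforms.ToTensor(),
])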
Now zero the gradient buffers of all parameters and backpropagate with random gradients:
net.zero_grad()
out.backward(torch.randn(1, 10))
Note
torch.nn only supports mini-batches: the entire torch.nn package only supports inputs that are a mini-batch of samples, not a single sample.
For example, nn.Conv2d takes a 4-dimensional tensor of nSamples × nChannels × Height × Width.
If you have a single sample, just use input.unsqueeze(0) to add a fake batch dimension, as in the sketch below.
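A minimal sketch of adding that fake batch dimension (image is just a stand-in single-channel sample):
image = torch.randn(1, 32, 32)  # a single sample: channels x height x width
batch = image.unsqueeze(0)      # shape becomes (1, 1, 32, 32)
print(batch.size())
out = net(batch)                # now the network accepts it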
Before continuing, let’s recap all the classes we’ve seen so far.
Recap
- torch.Tensor: a multi-dimensional array with support for autograd operations such as backward(); also holds the gradient w.r.t. the tensor.
- nn.Module: a neural network module; a convenient way of encapsulating parameters, with helpers for moving them to the GPU, exporting, loading, etc.
- nn.Parameter: a kind of Tensor that is automatically registered as a parameter when assigned as an attribute to a Module.
- autograd.Function: implements the forward and backward definitions of an autograd operation. Every Tensor operation creates at least one Function node, which connects to the functions that created the Tensor and encodes its history.
Loss function
A loss function takes an (output, target) pair as inputs (output is the network’s output, target is the true value) and computes a value that estimates how far the output is from the target.
There are several different loss functions in the nn package. A simple one is nn.MSELoss, which computes the mean squared error between the output and the target.
For example:
output = net(input)
target = torch.randn(10)     # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)
Output:
tensor(0.5663, grad_fn=<MseLossBackward>)
Now, if you follow loss in the backward direction using its .grad_fn attribute, you will see a graph of computations that looks like this:
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
-> view -> linear -> relu -> linear -> relu -> linear
-> MSELoss
-> loss
So, when we call loss.backward(), the whole graph is differentiated with respect to the loss, and all tensors in the graph that have requires_grad=True will have their .grad tensor accumulated with the gradient.
For illustration, let’s follow a few steps backward:
print(loss.grad_fn)                                            # MSELoss
print(loss.grad_fn.next_functions[0][0])                       # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])
Output:
<MseLossBackward object at 0x0000029E54C509B0>
<AddmmBackward object at 0x0000029E54C50898>
<AccumulateGrad object at 0x0000029E54C509B0>
Backpropagation
To backpropagate the error, all we have to do is call loss.backward(). You need to clear the existing gradients first, though, or the new gradients will be accumulated into the existing ones.
Now we’ll call loss.backward() and look at conv1’s bias gradients before and after the backward pass.
net.zero_grad() # zeroes the gradient buffers of all parameters
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
Output:
conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([ 0.0006,  0.0164,  0.0122,  0.0060,  0.0056,  0.0052])
Update the weights
The simplest update rule used in practice is stochastic gradient descent (SGD):
weight = weight - learning_rate * gradient
We can implement this rule using simple Python code.
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
However, when using neural networks you usually want to use various different update rules such as SGD, Nesterov-SGD, Adam, RMSprop, etc. To enable this, PyTorch provides a small package, torch.optim, that implements all of these methods. Using it is very simple:
import torch.optim as optim
# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)
# in your training loop:
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step() # Does the update
Note
Observe how the gradient buffers had to be manually set to zero with optimizer.zero_grad(). This is because gradients are accumulated, as explained in the Backpropagation section; a small sketch of that behaviour follows.
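A minimal sketch of gradient accumulation (p is a throwaway parameter, not part of the network above):
p = torch.ones(2, requires_grad=True)
(p * 2).sum().backward()
print(p.grad)     # tensor([2., 2.])

(p * 2).sum().backward()
print(p.grad)     # tensor([4., 4.]) - without zeroing, gradients add up

p.grad.zero_()    # this is what optimizer.zero_grad() does for each parameter
print(p.grad)     # tensor([0., 0.])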
Training a classifier
You have learned how to define a neural network, calculate losses and update the weights of the network.
Now you’re probably wondering: Where does the data come from?
About data
Generally, when you have to deal with image, text, audio or video data, you can use standard Python packages that load the data into a NumPy array, and then convert that array into a torch.*Tensor.
- For images, packages such as Pillow and OpenCV are useful
- For audio, packages such as SciPy and Librosa
- For text, raw Python or Cython based loading, or NLTK and SpaCy
Specifically for vision, we have created a package called torchvision, which has data loaders for common datasets such as ImageNet, CIFAR10, MNIST, etc., and image transformers, namely torchvision.datasets and torch.utils.data.DataLoader.
This provides a huge convenience and avoids writing boilerplate code.
In this tutorial, we use the CIFAR10 dataset, which has the following 10 classes: ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’. The images in this dataset are of size 3 × 32 × 32, i.e. 3-channel colour images of 32 × 32 pixels.
Training an image classifier
We will do the following steps in order:
- Load and normalize the CIFAR10 training and test sets using torchvision
- Define a convolutional neural network
- Define a loss function
- Train the network on the training data
- Test the network on the test data
1. Load and normalize CIFAR10
Using torchvision, it is extremely easy to load CIFAR10.
import torch
import torchvision
import torchvision.transforms as transforms
The output of the torchvision datasets are PILImage images in the range [0, 1]. We transform them into tensors normalized to the range [-1, 1].
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
# This step is a bit slow: it downloads roughly 340 MB of image data.
Output:
Files already downloaded and verified
Files already downloaded and verified
We show some interesting training images.
import matplotlib.pyplot as plt
import numpy as np

# functions to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# get some random training images
dataiter = iter(trainloader)
images, labels = dataiter.next()

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
Output:
plane deer dog plane
2. Define a convolutional neural network
Copy the neural network from the Neural Networks section above and modify it to take 3-channel images instead of the 1-channel images it was defined for.
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
3. Define a loss function and optimizer
We use cross-entropy as the loss function and SGD with momentum as the optimizer.
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
4. Train the network
This is when things start to get interesting: we simply loop over our data iterator, feed the inputs to the network, and optimize.
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
Output:
[1,  2000] loss: 2.286
[1,  4000] loss: 1.921
[1,  6000] loss: 1.709
[1,  8000] loss: 1.618
[1, 10000] loss: 1.548
[1, 12000] loss: 1.496
[2,  2000] loss: 1.435
[2,  4000] loss: 1.409
[2,  6000] loss: 1.373
[2,  8000] loss: 1.348
[2, 10000] loss: 1.326
[2, 12000] loss: 1.313
Finished Training
5. Test the network on the test set
We have trained the network for 2 passes over the training set, but we need to check whether the network has learned anything at all.
We will check this by predicting the class label the neural network outputs and comparing it against the ground truth. If the prediction is correct, we add the sample to the list of correct predictions.
First, let’s get familiar with some images from the test set.
dataiter = iter(testloader)
images, labels = dataiter.next()

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
Output:
GroundTruth: cat ship ship plane
Now let’s see what the neural network thinks these images are:
outputs = net(images)
The outputs are scores for the 10 classes: the higher the score for a class, the more the network thinks the image is of that class. So let’s get the index of the highest score:
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))
Output:
Predicted: cat ship ship plane
It looks pretty good.
Let’s take a look at the results of the network on the entire test set.
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
Output:
Accuracy of the network on the 10000 test images: 54 %
That is better than chance, which would be 10% accuracy (randomly picking one class out of 10), so it seems the network has learned something.
Which classes did the network do well on, and which did it do poorly on?
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))
Output:
Accuracy of plane : 52 %
Accuracy of car : 63 %
Accuracy of bird : 43 %
Accuracy of cat : 33 %
Accuracy of deer : 36 %
Accuracy of dog : 46 %
Accuracy of frog : 68 %
Accuracy of horse : 62 %
Accuracy of ship : 80 %
Accuracy of truck : 63 %
Training on the GPU
Just as you move a Tensor onto the GPU, you move the neural network onto the GPU. This operation recursively goes over all modules and converts their parameters and buffers to CUDA tensors.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Assuming we are on a CUDA machine, this should print a CUDA device:
print(device)
Output:
cuda:0
Assuming we have a CUDA machine, this method will recursively go over all modules and convert their parameters and buffers to CUDA tensors:
net.to(device)
Output:
Net(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
Remember that you will also have to send the inputs and targets to the GPU at every step:
inputs, labels = inputs.to(device), labels.to(device)
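Putting it together, a minimal sketch of how the earlier training loop changes (assuming net, criterion, optimizer and trainloader are defined as above and the network has already been moved with net.to(device)):
for epoch in range(2):
    for inputs, labels in trainloader:
        # move the batch to the same device as the model
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()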
Why don’t we notice a massive speedup compared to the CPU? Because the network is really small.
Exercise: Try increasing the width of your network (the second argument of the first nn.Conv2d and the first argument of the second nn.Conv2d need to be the same number) and see what kind of speedup you get; one possible variant is sketched below.
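For instance, a widened variant (the choice of 32 channels is arbitrary, just for illustration):
class WideNet(nn.Module):
    def __init__(self):
        super(WideNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 5)   # widened: 6 -> 32 output channels
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(32, 16, 5)  # must match conv1's output channels
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)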
Goals achieved:
- Understood PyTorch’s Tensor library and neural networks at a high level.
- Trained a small neural network to classify images.
Data parallelism (optional)
Authors: Sung Kim and Jenny Kang
In this tutorial, we will learn how to use multiple GPUs with DataParallel.
It is very easy to use GPUs with PyTorch. You can put a model on a GPU as follows:
device = torch.device("cuda:0")
model.to(device)
Then you can copy all the tensors onto the GPU:
mytensor = my_tensor.to(device)
Note that just calling my_tensor.to(device) returns a new copy of my_tensor on the GPU instead of rewriting my_tensor; you need to assign the result to a new variable and use that tensor on the GPU.
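A minimal illustration of that point (assuming device refers to a GPU as above):
my_tensor = torch.zeros(2, 2)
my_tensor.to(device)              # returns a GPU copy; my_tensor itself stays on the CPU
print(my_tensor.device)           # still cpu

my_tensor = my_tensor.to(device)  # assign the result to actually use the GPU copy
print(my_tensor.device)           # cuda:0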
It is natural to run forward and backward propagation on multiple GPUs. However, PyTorch will only use one GPU by default. You can easily run your operations on multiple GPUs by making your model run in parallel with DataParallel:
model = nn.DataParallel(model)
This is the core behind this tutorial, which we’ll cover in more detail next.
Imports and parameters
Import the PyTorch module and define the parameters.
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
# Parameters and DataLoaders
input_size = 5
output_size = 2
batch_size = 30
data_size = 100
Device:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
Dummy dataset
To make a dummy (random) dataset, you just need to implement __getitem__.
class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len

rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)
A simple model
As a demonstration, our model takes a single input, performs a linear operation, and returns the result. However, you can use DataParallel on any model (CNN, RNN, Capsule Net, etc.).
We have placed a print statement inside the model to monitor the size of the input and output tensors. Please pay attention to what is printed at batch rank 0.
class Model(nn.Module):
    # Our model

    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        print("\tIn Model: input size", input.size(),
              "output size", output.size())
        return output
Create a model and DataParallel
This is the core part of the tutorial. First, we need to make a model instance and check whether we have multiple GPUs. If we do, we wrap our model with nn.DataParallel. Then we put the model on the GPUs with model.to(device).
model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = nn.DataParallel(model)

model.to(device)
Output:
Model(
(fc): Linear(in_features=5, out_features=2, bias=True)
)
Run the model
Now we can see the sizes of the input and output tensors.
for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("Outside: input size", input.size(),
          "output_size", output.size())
Output:
In Model: input size torch.Size([30, 5]) output size torch.Size([30, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([30, 5]) output size torch.Size([30, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([30, 5]) output size torch.Size([30, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
Results
When we batch 30 inputs and 30 outputs, the model gets 30 and outputs 30, as expected. But if you have multiple GPUs, you get results like the following.
2 GPUs
If you have 2 GPUs, you will see:
# on 2 GPUs
Let's use 2 GPUs!
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
3 GPUs
If you have 3 GPUs, you will see:
Let's use 3 GPUs!
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
8 GPUs
If you have 8 GPUs, you will see:
Let's use 8 GPUs!
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
Conclusion
DataParallel splits your data automatically and sends jobs to multiple models on multiple GPUs. After each model finishes its job, DataParallel collects and merges the results before returning them to you.
For more information, see here:
Pytorch.org/tutorials/b…
Afterword
All the code for this article is posted on Huang Haiguang’s Github (and will be updated):
Github.com/fengdu78/ma…
The official original (English):
Pytorch.org/tutorials/b…