A 60-Minute Introduction to the Deep Learning Tool PyTorch
By Soumith Chintala
Original tutorial:
Pytorch.org/tutorials/b…
Chinese translation and annotation: Huang Haiguang
GitHub download:
Github.com/fengdu78/ma…
All code has been tested.
Environment: PyTorch 1.3, Python 3.7
Host: GPU: one 1080 Ti; RAM: 32 GB (note: most of the code does not require a GPU)
Contents
- 1. What is PyTorch?
- 2. Autograd: automatic differentiation
- 3. Neural networks
- 4. Training a classifier
- 5. Data parallelism
1. What is PyTorch?
It is a Python-based scientific computing package targeted at two kinds of users:
- A replacement for NumPy that can use the power of GPUs
- A deep learning research platform that provides maximum flexibility and speed
Getting started
Tensors
Tensors are similar to NumPy's ndarrays, except that tensors can also be used on a GPU to accelerate computation.
from __future__ import print_function
import torch
Construct an uninitialized 5*3 matrix:
x = torch.Tensor(5, 3)
print(x)
tensor([[0.0000e+00, 0.0000e+00, 1.3004e-42],
        [0.0000e+00, 7.0065e-45, 0.0000e+00],
        [3.8593e+35, 7.8753e-43, 0.0000e+00],
        [0.0000e+00, 1.8368e-40, 0.0000e+00],
        [3.8197e+35, 7.8753e-43, 0.0000e+00]])
Construct a matrix filled with zeros, with dtype long:
x = torch.zeros(5, 3, dtype=torch.long)
print(x)
tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
Build a tensor directly from the data:
x = torch.tensor([5.5, 3])
print(x)
tensor([5.5000, 3.0000])
Or you can build a tensor from an existing tensor. These methods reuse the properties of the input tensor, e.g. dtype, unless new values are provided by the user.
x = x.new_ones(5, 3, dtype=torch.double)  # new_* methods take in sizes
print(x)
x = torch.randn_like(x, dtype=torch.float)  # result has the same size
print(x)
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[1.1701, 0.8342, 0.6769],
        [1.3060, 0.3636, 0.6758],
        [1.9133, 0.3494, 1.1412],
        [0.9735, 0.9492, 0.3082],
        [0.9469, 0.6815, 1.3808]])
Get the size of the tensor:
print(x.size())
torch.Size([5, 3])
Note:
torch.Size is in fact a tuple, so it supports all tuple operations.
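For instance, a minimal sketch using the x defined above (tuple-style unpacking and len() both work):
rows, cols = x.size()   # unpacks like a tuple
print(rows, cols)       # 5 3
print(len(x.size()))    # 2, just like len() on a tuple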
Operations
There are multiple syntaxes for operations on tensors; we'll use addition as an example.
Syntax 1:
y = torch.rand(5, 3)
print(x + y)
tensor([[1.7199, 0.1819, 0.1543],
        [0.5413, 1.1591, 1.4098],
        [2.0421, 0.5578, 2.0645],
        [1.7301, 0.3236, 0.4616],
        [1.2805, 0.4026, 0.6916]])
Syntax 2:
print(torch.add(x, y))
tensor([[1.7199, 0.1819, 0.1543],
        [0.5413, 1.1591, 1.4098],
        [2.0421, 0.5578, 2.0645],
        [1.7301, 0.3236, 0.4616],
        [1.2805, 0.4026, 0.6916]])
Syntax 3:
Provide an output tensor as an argument:
result = torch.empty(5, 3)
torch.add(x, y, out=result)
print(result)
tensor([[1.7199, 0.1819, 0.1543],
        [0.5413, 1.1591, 1.4098],
        [2.0421, 0.5578, 2.0645],
        [1.7301, 0.3236, 0.4616],
        [1.2805, 0.4026, 0.6916]])
Syntax 4:
In-place operation:
# adds x to y
y.add_(x)
print(y)
tensor([[1.7199, 0.1819, 0.1543],
        [0.5413, 1.1591, 1.4098],
        [2.0421, 0.5578, 2.0645],
        [1.7301, 0.3236, 0.4616],
        [1.2805, 0.4026, 0.6916]])
Note:
Any operation that mutates a tensor in place is post-fixed with an _. For example, x.copy_(y) and x.t_() will change x.
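A minimal sketch of the difference between out-of-place and in-place operations (the tensors a and b are introduced here only for illustration):
a = torch.ones(2, 2)
b = torch.ones(2, 2)
c = a.add(b)   # out-of-place: a is unchanged, c holds the result
a.add_(b)      # in-place: a itself is modified
a.t_()         # in-place transpose: a is replaced by its transpose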
Indexing: you can use standard NumPy-style indexing, with all its fancy features.
print(x[:, 1])
tensor([0.8342, 0.3636, 0.3494, 0.9492, 0.6815])
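A few more indexing forms that carry over from NumPy (a minimal sketch using the x above):
print(x[0])          # first row
print(x[1:3, :2])    # rows 1 and 2, first two columns
print(x[x > 1])      # boolean mask indexing returns a 1-D tensor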
Resizing: if you want to resize/reshape a tensor, you can use torch.view:
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)  # -1 means the size of that dimension is inferred from the others
print(x.size(), y.size(), z.size())
torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
If you have a one-element tensor, use .item() to get its value as a Python number:
x = torch.randn(1)
print(x)
print(x.item())
tensor([0.3441])
0.34412217140197754
Further reading:
Here: (pytorch.org/docs/stable…)
More than one hundred tensor operations are described there, including transposing, indexing, mathematical operations, linear algebra, random numbers, and more.
NumPy bridge
Converting a Torch tensor to a NumPy array and vice versa is easy.
The Torch tensor and the NumPy array share their underlying memory, so changing one will also change the other.
Converting a Torch tensor to a NumPy array
a = torch.ones(5)
print(a)
tensor([1., 1., 1., 1., 1.])
b = a.numpy()
print(b)
print(type(b))
[ 1. 1. 1. 1. 1.]
<class 'numpy.ndarray'>
Let's see how the values of the NumPy array change when we modify the Torch tensor in place:
a.add_(1)
print(a)
print(b)
tensor([2., 2., 2., 2., 2.])
[ 2. 2. 2. 2. 2.]
Converting a NumPy array to a Torch tensor
See how changing the NumPy array automatically changes the Torch tensor:
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)
[ 2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
All tensors on the CPU except CharTensor support converting to and from NumPy.
CUDA tensors
Tensors can be moved onto any device using the .to method.
# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")            # a CUDA device object
    y = torch.ones_like(x, device=device)    # directly create a tensor on GPU
    x = x.to(device)                         # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))         # ``.to`` can also change dtype together!
tensor([1.3441], device='cuda:0')
tensor([1.3441], dtype=torch.float64)
Official code for this chapter:
- Jupyter notebook: Pytorch.org/tutorials/_…
2. Autograd: automatic differentiation
At the heart of all neural networks in PyTorch is the autograd package. Let's first briefly introduce it, and then we will train our first neural network.
The autograd package provides automatic differentiation for all operations on tensors. It is a define-by-run framework, which means that backpropagation is defined by how your code is run, and can differ from iteration to iteration.
Let's look at the package with some simple examples.
Tensor
torch.Tensor is the central class of the package. If you set its attribute .requires_grad to True, it starts tracking all operations on it. When the computation is finished, you can call .backward() and have all the gradients computed automatically. The gradient for this tensor is accumulated into its .grad attribute.
To stop a tensor from tracking history, you can call .detach() to detach it from the computation history and prevent future computation from being tracked.
To prevent tracking history (and using memory), you can also wrap a code block in with torch.no_grad(): this can be particularly helpful when evaluating a model, which may have trainable parameters with requires_grad=True, but for which we don't need the gradients.
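As a quick illustration of .detach() (a minimal sketch; the with torch.no_grad() form is shown later in this section):
a = torch.ones(2, 2, requires_grad=True)
b = a * 2
c = b.detach()            # same values as b, but detached from the graph
print(b.requires_grad)    # True
print(c.requires_grad)    # False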
There is one more class that is very important for the autograd implementation: Function.
Tensor and Function are interconnected and build up an acyclic graph that encodes a complete history of the computation. Each tensor has a .grad_fn attribute that references the Function that created the tensor (except for tensors created by the user, whose grad_fn is None).
If you want to compute the derivatives, you can call .backward() on a Tensor. If the Tensor is a scalar (i.e. it holds a single element of data), you don't need to specify any arguments to backward(); if it has more elements, you need to specify a gradient argument, which is a tensor of matching shape.
import torch
Create a tensor and set requires_grad=True to track computation with it:
x = torch.ones(2, 2, requires_grad=True)
print(x)
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
To perform operations on tensors:
y = x + 2
print(y)
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
Since y was created as the result of an operation, it has a grad_fn, while x was created by the user, so its grad_fn is None.
print(y.grad_fn)
print(x.grad_fn)
<AddBackward0 object at 0x000001E020B794A8>
None
Perform the operation on y
z = y * y * 3
out = z.mean()
print(z, out)
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward1>)
.requires_grad_(...) changes an existing tensor's requires_grad flag in place. The flag defaults to False if not given.
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)
False
True
<SumBackward0 object at 0x000001E020B79FD0>
Gradients
Let's backprop now. Because out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)).
out.backward()
Output the gradient d(out)/dx of out with respect to x:
print(x.grad)
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
You should have got a matrix of all 4.5. Let's call the out tensor $o$. We have $o = \frac{1}{4}\sum_i z_i$, $z_i = 3(x_i+2)^2$ and $z_i\big|_{x_i=1} = 27$, therefore $\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2)$ and $\frac{\partial o}{\partial x_i}\big|_{x_i=1} = \frac{9}{2} = 4.5$. Mathematically, if you have a vector-valued function $\vec{y} = f(\vec{x})$, then the gradient of $\vec{y}$ with respect to $\vec{x}$ is the Jacobian matrix $J$, with entries $J_{ij} = \frac{\partial y_i}{\partial x_j}$.
Generally speaking, torch.autograd is an engine for computing vector-Jacobian products. That is, given any vector $v$, it computes the product $v^{T} \cdot J$. If $v$ happens to be the gradient of a scalar function $l = g(\vec{y})$, that is $v = \left(\frac{\partial l}{\partial y_1}, \dots, \frac{\partial l}{\partial y_m}\right)^{T}$, then by the chain rule the vector-Jacobian product $J^{T} \cdot v$ is the gradient of $l$ with respect to $\vec{x}$.
This property of the vector-Jacobian product makes it very convenient to feed external gradients into a model with a non-scalar output.
Now let's look at an example of a vector-Jacobian product:
x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)
tensor([ 384.5854,   13.6405, 1049.2870], grad_fn=<MulBackward0>)
Now in this case y is no longer a scalar. torch.autograd cannot compute the full Jacobian directly, but if we just want the vector-Jacobian product, we simply pass the vector to backward as an argument:
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)
print(x.grad)
tensor([5.1200e+01, 5.1200e+02, 5.1200e-02])
You can also stop autograd from tracking history on tensors with .requires_grad=True by wrapping the code block in with torch.no_grad():
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)
True
True
False
Documentation for autograd and Function can be found at http://pytorch.org/docs/autograd
Official code for this chapter:
- Jupyter notebook: Pytorch.org/tutorials/_…
3. Neural networks
Neural networks can be built using the torch.nn package.
You already know the autograd package; nn relies on autograd to define models and differentiate them. An nn.Module contains layers and a forward(input) method that returns the output.
For example, consider the following network that classifies digit images.
It is a simple feed-forward network: it takes the input, feeds it through several layers one after the other, and finally gives the output.
A typical training procedure for a neural network is as follows:
- Define the neural network with some learnable parameters (or weights);
- Iterate over a dataset of inputs;
- Process the input through the network;
- Compute the loss (how far the output is from the correct value);
- Propagate gradients back into the network's parameters;
- Update the weights of the network, typically using a simple update rule:
weight = weight - learning_rate * gradient
Define the network
Let’s first define a network:
import torch
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)
Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
You just have to define the forward function; the backward function (where gradients are computed) is automatically defined for you by autograd. You can use any Tensor operation in the forward function.
The learnable parameters of the model are returned by net.parameters().
params = list(net.parameters())
print(len(params))
print(params[0].size())
10
torch.Size([6, 1, 5, 5])
The input to forward and its output are autograd-tracked tensors (autograd.Variable in older PyTorch versions). Note: the expected input size of this network (LeNet) is 32*32. If you use the MNIST dataset to train this network, resize the images to 32*32.
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)
tensor([[0.1217, 0.0449, 0.0392, 0.1103, 0.0534, 0.1108, 0.0565, 0.0116, 0.0867, 0.0102]], grad_fn=<AddmmBackward>)
Zero the gradient buffers of all parameters, then backprop with random gradients:
net.zero_grad()
out.backward(torch.randn(1, 10))
Note:
torch.nn only supports mini-batches. The entire torch.nn package only supports inputs that are a mini-batch of samples, not a single sample.
For example, nn.Conv2d takes a 4-dimensional tensor of (number of samples x number of channels x height x width).
If you have a single sample, just use input.unsqueeze(0) to add a fake batch dimension (see the short sketch below).
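For instance, a minimal sketch using the net defined above:
single = torch.randn(1, 32, 32)   # one 1-channel 32x32 image, no batch dimension
batched = single.unsqueeze(0)     # shape becomes (1, 1, 32, 32)
out = net(batched)                # now acceptable to the network above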
Before proceeding further, let's recap all the classes we've seen so far.
Recap
- torch.Tensor – supports autograd operations such as backward(), and holds the gradient w.r.t. the tensor.
- nn.Module – neural network module; a convenient way of encapsulating parameters, with helpers for moving them to the GPU, exporting, loading, etc.
- nn.Parameter – a kind of tensor that is automatically registered as a parameter when assigned as an attribute to a Module.
- autograd.Function – implements the forward and backward definitions of an autograd operation. Every Tensor operation creates at least one Function node that connects to the functions that created the tensor and encodes its history.
At this point, we have covered:
- Defining a neural network
- Processing inputs and calling backward
Still left:
- Computing the loss
- Updating the weights of the network
Loss function
A loss function takes a pair (output, target) as inputs (output being the network's output and target the true value) and computes a value estimating how far the output is from the target.
There are several different loss functions in the nn package. A simple one is nn.MSELoss, which computes the mean squared error between the output and the target.
For example:
output = net(input)
target = torch.randn(10)      # a dummy target, for example
target = target.view(1, -1)   # make it the same shape as output
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
tensor(0.5663, grad_fn=<MseLossBackward>)
Now, if you follow loss in the backward direction using its .grad_fn attribute, you will see a graph of computations that looks like this:
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d -> view -> linear -> relu -> linear -> relu -> linear -> MSELoss -> loss
So, when you call loss.backward(), the whole graph is differentiated with respect to the loss, and all tensors in the graph with requires_grad=True will have their .grad attribute accumulated with the gradient.
To illustrate, let’s reverse track a few steps:
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])
<MseLossBackward object at 0x0000029E54C509B0>
<AddmmBackward object at 0x0000029E54C50898>
<AccumulateGrad object at 0x0000029E54C509B0>
Backpropagation
To backpropagate the error, all we have to do is call loss.backward(). You need to clear the existing gradients first, otherwise the new gradients will be accumulated into the existing ones.
Now we call loss.backward() and take a look at conv1's bias gradients before and after the backward pass.
net.zero_grad() # zeroes the gradient buffers of all parameters
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([0.0006, 0.0164, 0.0122, 0.0060, 0.0056, 0.0052])
Further reading:
The neural network package contains various modules and loss functions that form the building blocks of deep neural networks. A full list with documentation is here: pytorch.org/docs/nn
The only thing left to learn is:
- Updating the weights of the network
Updating the weights
The simplest update rule used in practice is stochastic gradient descent (SGD):
weight = weight - learning_rate * gradient
We can implement this rule using simple Python code.
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
However, as you use neural networks, you will want to use various different update rules, such as SGD, Nesterov-SGD, Adam, RMSprop, etc. To enable this, we built a small package, torch.optim, that implements all of these methods. Using it is very simple:
import torch.optim as optim
# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)
# in your training loop:
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step() # Does the update
Note:
Observe how the gradient buffers had to be manually set to zero using optimizer.zero_grad(). This is because gradients are accumulated, as explained in the Backpropagation section; a small sketch of this behaviour follows.
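A minimal sketch of the accumulation behaviour (a fresh scalar example, unrelated to the network above):
w = torch.ones(1, requires_grad=True)
(w * 3).sum().backward()
print(w.grad)        # tensor([3.])
(w * 3).sum().backward()
print(w.grad)        # tensor([6.])  -- the second backward added to the first
w.grad.zero_()       # this is what optimizer.zero_grad() does for every parameter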
Official code for this chapter:
- Jupyter notebook: Pytorch.org/tutorials/_…
4. Training a classifier
You have learned how to define a neural network, calculate losses and update the weights of the network.
Now you’re probably wondering: Where does the data come from?
About data
In general, when you have to deal with image, text, audio, or video data, you can use standard Python packages that load the data into a NumPy array, and then convert that array into a torch.*Tensor (a short sketch of this manual path appears after the list below).
- For images, packages such as Pillow and OpenCV are useful
- For audio, packages such as SciPy and librosa
- For text, either raw Python or Cython-based loading, or NLTK and SpaCy
Specifically for vision, we have created a package called torchvision, which has data loaders for common datasets such as ImageNet, CIFAR10, and MNIST, as well as image transformers, namely torchvision.datasets and torch.utils.data.DataLoader.
This provides great convenience and avoids writing boilerplate code.
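Going back to the manual route mentioned above, a minimal sketch of loading an image with Pillow and converting it to a tensor (the file name is hypothetical and an RGB image is assumed):
from PIL import Image
import numpy as np
import torch

img = Image.open('some_image.png')            # hypothetical RGB image file
arr = np.array(img)                           # H x W x C uint8 NumPy array
t = torch.from_numpy(arr).permute(2, 0, 1)    # reorder to C x H x W
t = t.float() / 255.0                         # scale values to [0, 1]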
In this tutorial, we will use the CIFAR10 dataset. It has the classes: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'. The images in CIFAR10 are of size 3 × 32 × 32, i.e. 3-channel color images of 32 × 32 pixels.
Train an image classifier
We will do the following steps in order:
- Load and normalize the CIFAR10 training and test datasets using torchvision
- Define a convolutional neural network
- Define a loss function
- Train the network on the training data
- Test the network on the test data
1. Loading and normalizing CIFAR10
Loading CIFAR10 with TorchVision is very easy.
import torch
import torchvision
import torchvision.transforms as transforms
The output of torchvision datasets are PILImage images in the range [0, 1]. We transform them to tensors normalized to the range [-1, 1].
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
# The first run is a bit slow because it downloads roughly 340 MB of image data.
Files already downloaded and verified
Files already downloaded and verified
We show some interesting training images.
import matplotlib.pyplot as plt
import numpy as np

# functions to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# get some random training images
dataiter = iter(trainloader)
images, labels = dataiter.next()

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
plane deer dog plane
2. Define a convolutional neural network
The network code is copied from the Neural Networks section above and modified to take 3-channel images instead of 1-channel images.
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
3. Define loss functions and optimizers
We use cross-entropy as the loss function and stochastic gradient descent with momentum as the optimizer.
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
4. Train the network
This is when things start to get interesting. We simply loop over our data iterator, feed the inputs to the network, and optimize.
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
[1,  2000] loss: 2.286
[1,  4000] loss: 1.921
[1,  6000] loss: 1.709
[1,  8000] loss: 1.618
[1, 10000] loss: 1.548
[1, 12000] loss: 1.496
[2,  2000] loss: 1.435
[2,  4000] loss: 1.409
[2,  6000] loss: 1.373
[2,  8000] loss: 1.348
[2, 10000] loss: 1.326
[2, 12000] loss: 1.313
Finished Training
5. Test the network on the test set
We have trained the network for 2 passes over the training dataset, but we need to check whether the network has learned anything at all.
We will check this by predicting the class label that the neural network outputs and comparing it against the ground truth. If the prediction is correct, we add the sample to the list of correct predictions.
First step: let us display some images from the test set to get familiar with them.
dataiter = iter(testloader)
images, labels = dataiter.next()
# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
GroundTruth:  cat ship ship plane
Now let’s see what the neural network thinks the picture is.
outputs = net(images)
The outputs are scores for the 10 classes. The higher the score for a class, the more the network thinks the image belongs to that class. So let's get the index of the highest score:
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))
Predicted:  cat ship ship plane
It looks pretty good.
Let’s take a look at the results of the network on the entire test set.
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
Accuracy of the network on the 10000 test images: 54 %
That looks much better than chance, which would be 10% accuracy (randomly picking one of 10 classes), so it seems the network has learned something.
Which classes performed well, and which performed poorly?
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))
Accuracy of plane : 52 %
Accuracy of car : 63 %
Accuracy of bird : 43 %
Accuracy of cat : 33 %
Accuracy of deer : 36 %
Accuracy of dog : 46 %
Accuracy of frog : 68 %
Accuracy of horse : 62 %
Accuracy of ship : 80 %
Accuracy of truck : 63 %
What’s next?
How do we run a neural network on a GPU?
Training on GPU
Just as you transfer a Tensor onto the GPU, you transfer the neural network onto the GPU. This operation will recursively go over all modules and convert their parameters and buffers to CUDA tensors.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Assuming that we are on a CUDA machine, this should print a CUDA device:
print(device)
cuda:0
Assuming we are on a CUDA machine, these methods will recursively go over all modules and convert their parameters and buffers to CUDA tensors:
net.to(device)
Net(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
Keep in mind that you will also have to send the inputs and targets to the GPU at every step (a minimal training-step sketch follows the snippet below):
inputs, labels = inputs.to(device), labels.to(device)
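Putting it together, one training pass on the GPU might look like the following minimal sketch (it reuses trainloader, net, criterion and optimizer from earlier in this section and assumes net.to(device) has already been called; it is not the tutorial's official code):
for data in trainloader:
    inputs, labels = data
    inputs, labels = inputs.to(device), labels.to(device)   # move the batch to the GPU
    optimizer.zero_grad()
    outputs = net(inputs)             # net already lives on the GPU
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()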
Why don't we notice a massive speedup compared to the CPU? Because the network is really small.
Exercise:
Try increasing the width of your network (2nd argument to the first nn.Conv2d, 1st argument to the second nn.Conv2d, they need to be the same number) and see what kind of acceleration you get.
Goals achieved:
- A deeper understanding of PyTorch's tensor library and neural networks
- Trained a small neural network to classify images
Official code for this chapter:
- Jupyter notebook: Pytorch.org/tutorials/_…
5. Data parallelism (optional)
By Sung Kim and Jenny Kang
In this tutorial, we will learn how to use DataParallel to work with multiple GPUs.
PyTorch is very easy to use on a GPU. You can put a model on a GPU as follows:
device = torch.device("cuda:0")
model.to(device)
Then you can copy all the tensors onto the GPU:
mytensor = my_tensor.to(device)
Note that just calling my_tensor.to(device) returns a new copy of my_tensor on the GPU instead of rewriting my_tensor; you need to assign it to a new variable and use that tensor on the GPU.
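A minimal sketch of that point, assuming device is defined as above:
cpu_tensor = torch.randn(3)
gpu_tensor = cpu_tensor.to(device)   # returns a new tensor on the device
# cpu_tensor itself is unchanged and still lives on the CPU;
# use gpu_tensor (or rebind the name) for work on the GPU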
It is natural to execute forward and backward propagation on multiple GPUs. However, PyTorch will only use one GPU by default. You can easily run your operations on multiple GPUs by making your model run in parallel with DataParallel:
model = nn.DataParallel(model)
This is the core behind this tutorial, which we’ll cover in more detail next.
Imports and parameters
Import the PyTorch module and define the parameters.
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
# Parameters and DataLoaders
input_size = 5
output_size = 2
batch_size = 30
data_size = 100
Device:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
Dummy dataset
To make a dummy (random) dataset, you just need to implement __getitem__.
class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len

rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)
A simple model
As a demonstration, our model takes a single input, performs a linear operation, and returns the result. However, you can use DataParallel on any model (CNN, RNN, Capsule Net, etc.).
We have placed a print statement inside the model to monitor the size of the input and output tensors. Pay attention to what is printed at batch rank 0.
class Model(nn.Module):
    # Our model

    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        print("\tIn Model: input size", input.size(),
              "output size", output.size())
        return output
Create the model and DataParallel
This is the core part of the tutorial. First, we need to make a model instance and check whether we have multiple GPUs. If we do, we wrap our model with nn.DataParallel. Then we put the model on the GPUs with model.to(device).
model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = nn.DataParallel(model)

model.to(device)
Model(
(fc): Linear(in_features=5, out_features=2, bias=True)
)
Run the model
Now we can see the sizes of the input and output tensors.
for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("Outside: input size", input.size(),
          "output_size", output.size())
        In Model: input size torch.Size([30, 5]) output size torch.Size([30, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
        In Model: input size torch.Size([30, 5]) output size torch.Size([30, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
        In Model: input size torch.Size([30, 5]) output size torch.Size([30, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
        In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
Results
With a batch of 30 inputs, the model gets 30 inputs and returns 30 outputs as expected. But if you have multiple GPUs, you will get results like the following.
2 GPUs
If you have 2 GPUs, you will see:
3 GPUs
If you have 3 GPUs, you will see:
8 GPUs
If you have 8 GPUs, you will see:
Summary
DataParallel automatically splits your data and sends jobs to multiple models on several GPUs. After each model finishes its job, DataParallel collects and merges the results before returning them to you.
For more information, see here:
Pytorch.org/tutorials/b…
Official code for this chapter:
- Jupyter notebook: Pytorch.org/tutorials/_…
Conclusion
Github download:
Github.com/fengdu78/ma…