Autograd: Automatic differentiation

This part can be hard to follow at first. If it doesn't make sense on a first reading, it is worth going through it more than once.

The autograd package is central to all neural networks in PyTorch. Let's introduce it briefly, and then we'll train our first neural network. The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means your backpropagation is defined by how your code runs, and every single iteration can be different. Let's look at some examples involving tensors and gradients.
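To make "define-by-run" concrete, here is a minimal sketch (not part of the walkthrough below) in which ordinary Python control flow changes the graph from one iteration to the next:

import torch

# A small define-by-run demo: the graph is rebuilt on every forward pass,
# so ordinary Python control flow changes its shape between iterations.
x = torch.ones(3, requires_grad=True)

for step in range(2):
    y = x * 2
    if step % 2 == 0:      # this branch only exists on even iterations
        y = y * y
    out = y.sum()
    out.backward()         # backprop through whatever graph was just built
    print(step, x.grad)
    x.grad = None          # reset the accumulated gradient between steps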

Tensor

torch.Tensor is the core class of the package. If you set its .requires_grad attribute to True, it starts tracking all operations on the tensor. Once the computation is finished, you can call .backward() to have all the gradients computed automatically. The gradient for this tensor is accumulated in the .grad attribute.

To stop a tensor from tracking history, you can call .detach(), which detaches it from the computation history and prevents future computation from being tracked.
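A short sketch of what .detach() does:

import torch

# .detach() returns a new tensor that shares data with the original
# but is cut off from the computation history.
x = torch.ones(2, 2, requires_grad=True)
y = (x * 3).detach()

print(y.requires_grad)   # False: operations on y are no longer tracked
print(x.requires_grad)   # True: x itself is unchanged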

To prevent tracking history (and using memory), you can also wrap the code block in with torch.no_grad():. This is particularly helpful when evaluating a model: the model may have trainable parameters with requires_grad=True so that it can be tuned during training, but we don't need the gradients during evaluation.
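A minimal sketch of that evaluation pattern (the nn.Linear module here is just a stand-in for any trained network):

import torch
import torch.nn as nn

# The layer's parameters have requires_grad=True for training, but inside
# torch.no_grad() the forward pass builds no graph and tracks no history.
model = nn.Linear(4, 2)
inputs = torch.randn(8, 4)

with torch.no_grad():
    outputs = model(inputs)

print(outputs.requires_grad)   # False: nothing was tracked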

There is one more class that is very important for the autograd implementation: Function. Tensor and Function are interconnected and build up an acyclic graph that holds the history of the entire computation. Each tensor has a .grad_fn attribute that references the Function that created the tensor (except for tensors created directly by the user, whose grad_fn is None).

If you want to compute the derivatives, you can call .backward() on a Tensor. If the Tensor is a scalar (i.e. it holds a single element of data), you don't need to pass any argument to backward(); however, if it has more elements, you need to pass a gradient argument, a tensor of matching shape.
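A small sketch of that distinction (separate from the walkthrough below):

import torch

# backward() on a scalar needs no arguments; on a non-scalar result you
# must pass a "gradient" tensor of the same shape.
x = torch.ones(3, requires_grad=True)

s = (x * 2).sum()        # scalar result
s.backward()             # no argument needed
print(x.grad)            # tensor([2., 2., 2.])

x.grad = None            # clear the accumulated gradient
v = x * 2                # non-scalar result
v.backward(torch.tensor([1.0, 1.0, 1.0]))   # gradient argument required
print(x.grad)            # tensor([2., 2., 2.])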

import torch

Create a tensor and set requires_grad=True to track computation on it:

x = torch.ones(2, 2, requires_grad=True)
print(x)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

Now perform an operation on the tensor:

y = x + 2
print(y)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)

y was created as the result of an operation, so it has a grad_fn.

print(y.grad_fn)

<AddBackward0 object at 0x000001787ADEDE80>

Let's do a few more operations on y:

z = y * y * 3
out = z.mean()

print(z, out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)

.requires_grad_( ... ) changes an existing tensor's requires_grad flag in place. When a tensor is created, requires_grad defaults to False if not specified.

a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)

a.requires_grad_(True)
print(a.requires_grad)

b = (a * a).sum()
print(b.grad_fn)

False
True
<SumBackward0 object at 0x00000178757355B0>

Gradients

Let's backpropagate now. Because out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)).

out.backward()

Print the gradients d(out)/dx:

print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])

You should have got a matrix filled with 4.5. Call the out tensor $o$. We have $o = \frac{1}{4}\sum_i z_i$, $z_i = 3(x_i + 2)^2$ and $z_i\bigr\rvert_{x_i=1} = 27$. Therefore, $\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i + 2)$, hence $\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$.
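As a quick sanity check, here is a small sketch that recomputes out directly from the formula above and compares it against the analytic gradient:

import torch

# out = mean(3 * (x + 2)^2), so d(out)/dx_i = (3/2) * (x_i + 2).
x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()
out.backward()

print(x.grad)          # tensor([[4.5000, 4.5000], [4.5000, 4.5000]])
print(1.5 * (x + 2))   # the analytic formula gives the same 4.5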

You can do many more things with autograd. In the next example, y is doubled inside a Python loop until its norm exceeds 1000:

x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)

tensor([ 786.6164, 1688.7915,  530.0458], grad_fn=<MulBackward0>)
gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(gradients)

print(x.grad)

tensor([1.0240e+02, 1.0240e+03, 1.0240e-01])
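What happened here: the tensor passed to backward() is the vector v in a vector-Jacobian product. Every element of y is just the corresponding element of x scaled by the same power of two, so x.grad is simply v scaled by that factor. A small sketch (assuming x has no zero entries, which is almost surely true for randn) that checks this:

import torch

# y = x * 2**k after k doublings, so the Jacobian is 2**k times the
# identity and x.grad equals the supplied vector v scaled by 2**k.
x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2

v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)

scale = (y / x).detach()                  # the overall factor 2**k, per element
print(torch.allclose(x.grad, v * scale))  # True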

You can also stop autograd from tracking history on tensors with .requires_grad=True by wrapping the code block in with torch.no_grad():

print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

True
True
False