Using PyTorch to build a neural network and compute backpropagation

Some time ago there was a COVID outbreak in Nanjing, probably caused by improper cleaning of an inbound international aircraft, which infected the cleaning crew. Shenyang also reported an infected case, and on the 22nd I happened to fly from Shenyang to Hangzhou seated within three rows of a close contact, which made me a contact of a close contact. As soon as I got off the plane in Hangzhou I was taken away by the CDC to enjoy the full free quarantine package; I have to say the CDC's big-data contact tracing is really powerful. This stretch of time also let me settle down and get some things done, so the public account I had long been neglecting is getting new posts again, even if quarantine does make me feel like a shut-in. Without further ado, this section continues the analysis of neural networks in PyTorch, starting from the backpropagation algorithm.

Building a neural network

When training a neural network, the most common algorithm is backpropagation. In this algorithm, the parameters are adjusted according to the gradient of the loss function with respect to each parameter. To compute these gradients, PyTorch provides a built-in differentiation engine, torch.autograd, which supports automatic gradient computation for any computational graph. Let's analyze it in detail by building a one-layer neural network:

import torch

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)  # weight matrix, gradients will be tracked
b = torch.randn(3, requires_grad=True)     # bias vector, gradients will be tracked
z = torch.matmul(x, w)+b                   # forward pass of the single layer
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

With this we have defined, in PyTorch, a one-layer network with input x, parameters w and b, and a loss function.
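As a side note (not part of the original example), the same one-layer network can also be written with torch.nn.Linear, which creates the weight and bias internally with requires_grad=True. A minimal sketch:

import torch
import torch.nn as nn

x = torch.ones(5)                  # input tensor
y = torch.zeros(3)                 # expected output

layer = nn.Linear(5, 3)            # holds a 3x5 weight and a length-3 bias, both tracked by autograd
loss_fn = nn.BCEWithLogitsLoss()   # same loss as binary_cross_entropy_with_logits

z = layer(x)                       # equivalent to matmul(x, w) + b
loss = loss_fn(z, y)
print(layer.weight.requires_grad, layer.bias.requires_grad)  # True True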

Tensors, functions, computations

print('Gradient function for z =',z.grad_fn)
print('Gradient function for loss =', loss.grad_fn)
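Running these two prints produces output along the following lines (the exact class names and memory addresses depend on the PyTorch version and the run):

Gradient function for z = <AddBackward0 object at 0x...>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x...>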

Every function we apply to tensors to construct the computational graph is in fact an object of class Function. This object knows how to compute the function in the forward direction, and also how to compute its derivative during the backpropagation step. A reference to the backward propagation function is stored in the grad_fn property of a tensor.

See the PyTorch documentation at pytorch.org for more details on Function. In this network, w and b are the parameters that need to be optimized. To be able to compute the gradients of the loss function with respect to them, we set the requires_grad attribute on these tensors.

Computing the gradient

A gradient is, first of all, a vector: it points in the direction in which the directional derivative of a function at a given point is largest, i.e. the direction in which the function increases fastest at that point. To optimize the weights of the network we need the derivatives of the loss function with respect to the parameters, which we compute by calling loss.backward().

loss.backward()
print(w.grad)
print(b.grad)

The output is a 5×3 tensor w.grad and a length-3 tensor b.grad; the exact values depend on the random initialization of w and b.

We can only obtain the grad attribute for the leaf nodes of the computational graph whose requires_grad property is set to True; the gradient is not available for the other nodes. Moreover, for performance reasons, backward can only be used once to compute gradients on a given graph. If we need to call backward several times on the same graph, we have to pass retain_graph=True to the backward call.
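As a sanity check (my own addition, not part of the original text): with the mean reduction used by binary_cross_entropy_with_logits over 3 outputs, the gradient of the loss with respect to z and b is (sigmoid(z) - y) / 3, and because x is all ones every row of w.grad equals that same vector. Continuing the example above, the values produced by loss.backward() can be verified like this:

with torch.no_grad():
    manual = (torch.sigmoid(z) - y) / 3                  # analytic gradient of the mean BCE-with-logits loss
    print(torch.allclose(b.grad, manual))                # expected: True
    print(torch.allclose(w.grad, manual.expand(5, 3)))   # expected: True, every row equals `manual`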

More on computational graphs

Conceptually, autograd keeps a record of the data (tensors) and all the executed operations (together with the resulting new tensors) in a directed acyclic graph (DAG) made up of Function objects. In this DAG, the leaves are the input tensors and the roots are the output tensors. By tracing the graph from roots to leaves, the gradients can be computed automatically using the chain rule. In the forward pass, autograd computes the result tensors and records each operation's gradient function in the DAG; in the backward pass, triggered by calling backward() on the root, it computes the gradient of each operation, accumulates the results in the respective grad attributes, and uses the chain rule to propagate them all the way to the leaf tensors.

The DAG is dynamic in PyTorch: each time backward() is called, autograd starts recording a new graph from scratch, which is what allows the model to use different control flow on each iteration if needed.
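A minimal sketch (my own illustrative example, with made-up tensors) of what this dynamism allows: the Python control flow, and therefore the graph that gets recorded, can change between iterations and autograd still computes the right gradients.

import torch

w = torch.randn(3, requires_grad=True)
for step in range(2):
    x = torch.randn(3)
    if step % 2 == 0:                 # the recorded graph depends on the branch taken this iteration
        out = (w * x).sum()
    else:
        out = (w * x).pow(2).sum()
    out.backward()                    # a fresh graph was built during this forward pass
    print(step, w.grad)
    w.grad.zero_()                    # clear accumulated gradients before the next iteration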

The gradient and the Jacobian product

Strictly speaking, the gradient of an n-th order tensor field is an (n+1)-th order tensor field. The Jacobian is the matrix of all possible partial derivatives of one vector with respect to another, i.e. the gradient of a vector with respect to a vector. Autograd can differentiate tensors and backpropagate from a single variable; in deep learning this variable usually holds the value of the cost function, and all the backpropagation gradients are computed automatically. Rather than computing the Jacobian matrix itself, for a given vector v passed to backward(v), PyTorch computes the vector-Jacobian product vᵀ·J.

The following example and its results illustrate this behaviour;

inp = torch.eye(5, requires_grad=True)                   # 5x5 identity matrix as input
out = (inp+1).pow(2)                                     # element-wise (inp + 1)^2
out.backward(torch.ones_like(inp), retain_graph=True)    # backward with v = ones, keeping the graph
print("First call\n", inp.grad)
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nSecond call\n", inp.grad)
inp.grad.zero_()
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nCall after zeroing gradients\n", inp.grad)
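Since out = (inp + 1)**2, the gradient of the sum of its elements with respect to inp is 2 * (inp + 1), i.e. 4 on the diagonal and 2 elsewhere, so the printed result should look roughly like this:

First call
 tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.],
        [2., 2., 2., 2., 4.]])

Second call
 tensor([[8., 4., 4., 4., 4.],
        [4., 8., 4., 4., 4.],
        [4., 4., 8., 4., 4.],
        [4., 4., 4., 8., 4.],
        [4., 4., 4., 4., 8.]])

Call after zeroing gradients
 tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.],
        [2., 2., 2., 2., 4.]])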

When backward is called a second time with the same argument, the gradient values are different. This happens because during backpropagation PyTorch accumulates gradients: the newly computed values are added to the grad attribute of every leaf node of the computational graph. If you want correct, fresh gradients, you need to zero out grad first; in real training, an optimizer's zero_grad() call does exactly this.

Conclusion

A note on disabling gradient tracking: sometimes we do not need to track the whole computation history. Freezing some parameters of the network is a common technique when fine-tuning a pretrained model, and disabling tracking also speeds up computation when we only need the forward pass. As for graph computation, autograd keeps a record of the data (tensors) and all executed operations (along with the resulting new tensors) in a directed acyclic graph (DAG) of Function objects; the leaves of this DAG are the input tensors and the roots are the output tensors. By tracing the graph from roots to leaves, the gradients are computed automatically with the chain rule: in the forward pass autograd computes the result tensors and records each operation's gradient function in the DAG, and in the backward pass it computes the gradient from each grad_fn, accumulates the results in the .grad attribute, and propagates them all the way to the leaf tensors. The next section will look at optimizing model parameters.
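A minimal self-contained sketch of the two usual ways to disable gradient tracking mentioned above (using the same shapes as the earlier example):

import torch

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
print(z.requires_grad)                 # True: z is part of the graph

with torch.no_grad():                  # option 1: run the operation inside torch.no_grad()
    z_no = torch.matmul(x, w) + b
print(z_no.requires_grad)              # False

z_det = z.detach()                     # option 2: detach an already computed tensor from the graph
print(z_det.requires_grad)             # False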

Refer to the development documentation: Pytorch.org/tutorials/b…

Recommended reading

  • Differential operator method

  • Using PyTorch to construct neural network model for handwriting recognition

  • Neural network model and back propagation calculation were constructed using PyTorch

  • How to optimize model parameters and integrate models

  • TORCHVISION Target detection fine-tuning tutorial