0x00 Abstract

This series, spanning roughly ten articles, looks at how PyTorch's automatic differentiation is implemented. This article is the first one on forward propagation: it introduces the base classes on which automatic differentiation (gradient computation) relies. Because the article is long (about 12,000 words), it is split into two parts.

The previous chapters in the series are linked as follows:

Automatic Differentiation of Deep Learning Tools (1)

Automatic Differentiation of Deep Learning Tools (2)

Automatic differentiation of Deep Learning Tools (3) — Interpretation of examples

0x01 General Logic

For completeness, we have extracted the general logical relationship from the end of the previous article as follows.

If you look at forward computation from the perspective of the computation graph, it consists of building the graph and executing the graph. Building the graph "describes the relationships between the operation nodes." Executing the graph runs those operation relationships in a session, i.e., it is the process of tensors propagating forward through the computation graph.

Forward computation relies on some base classes, and before looking at forward propagation in detail, we need to look at the logical relationships between these base classes. Analyzing the PyTorch system from a DAG perspective, the logic is as follows.

  • The graph represents the computing task. PyTorch treats computation as a kind of directed acyclic graph, the computational graph, but this is a virtual graph with no concrete data structure in the code.

  • The graph consists of nodes and edges.

  • Nodes represent operations.

    • A node receives zero or more Tensors through its edges, performs its computation, and then produces zero or more Tensors.
    • A node’s member variable next_functions is a list of tuples that records which other functions this node outputs to. The length of the list is the number of edges of this grad_fn. Each tuple in the list corresponds to an Edge and contains (edge.function, edge.input_nr).
  • Edges represent the flow relationships between the operations.

    • Edge.function: indicates which other function this Edge needs to output to.
    • Edge.input_nr: specifies which input of that function this Edge feeds.
  • Tensors represent the data, i.e., the data flowing between nodes; without them the graph is meaningless.

See the following figure for details:

  +---------------------+              +----------------------+
  | SubBackward0        |              | PowBackward0         |
  |                     |      Edge    |                      |  Edge
  |   next_functions  +-----+--------> |     next_functions +----------> ...
  |                     |   |          |                      |
  +---------------------+   |          +----------------------+
                            |
                            |          +----------------------+
                            |  Edge    | MulBackward0         |
                            +--------> |                      |  Edge
                                       |     next_functions +----------> ...
                                       |                      |
                                       +----------------------+
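To make the figure concrete, here is a minimal sketch. The exact expression is an assumption for illustration (not necessarily the one used in the previous article), but it produces a SubBackward0 node whose next_functions contain a MulBackward0 edge and a PowBackward0 edge:

import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(4.0, requires_grad=True)

# Q is produced by a subtraction whose operands come from a multiplication
# and a power operation, so Q.grad_fn is SubBackward0.
Q = a * b - b ** 2

print(Q.grad_fn)                 # <SubBackward0 object at ...>
print(Q.grad_fn.next_functions)  # ((<MulBackward0 object at ...>, 0), (<PowBackward0 object at ...>, 0))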

0x02 Deprecated Classes

Let's first look at a few deprecated classes. Although they are deprecated, they still appear a lot in the code, and many articles on the Internet discuss them, so we need to study them. We may use the old and new terms interchangeably in this article; please bear with us.

2.1 Variable

In early versions, there were two data structures for storing data, Tensor and Variable: Tensor was only a multidimensional array, and the work of automatic differentiation was done by Variable. A Variable contains the autograd-related attributes and can be a leaf node in the computation graph or an intermediate variable produced during computation.

Since version 0.4.0, the functionality of Tensor and Variable has been merged, which makes automatic differentiation much easier to use. Variable is now really just a Tensor and is kept only for backward compatibility.

Variable (deprecated)
^^^^^^^^^^^^^^^^^^^^^
​
.. warning::
    The Variable API has been deprecated: Variables are no longer necessary to
    use autograd with tensors. Autograd automatically supports Tensors with
    ``requires_grad`` set to ``True``. Below please find a quick guide on what
    has changed:
​
    - ``Variable(tensor)`` and ``Variable(tensor, requires_grad)`` still work as expected,
      but they return Tensors instead of Variables.
    - ``var.data`` is the same thing as ``tensor.data``.
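A short sketch of what this means in practice (behavior of recent PyTorch versions; the printed values are illustrative):

import torch
from torch.autograd import Variable  # deprecated, but still importable

t = torch.ones(2, 2)
v = Variable(t, requires_grad=True)

# Variable() now simply returns a Tensor; autograd only needs requires_grad=True.
print(type(v))           # <class 'torch.Tensor'>
print(v.requires_grad)   # True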

In the definition of Variable, torch/csrc/autograd/variable.h, we can see the relevant comments under "Gradient Edges". As the comments say, a Variable has the notion of a gradient_edge, which is an edge of the autograd graph used during the backward pass to associate the variable with a particular input of its gradient function.

To be more precise, the gradient function can be one of two functions:

  • grad_fn, if the variable is in the interior of the graph. This is the gradient function of the operation that produced the variable.
  • grad_accumulator, if the variable is a leaf node; it accumulates a scalar gradient value into the variable's grad attribute.
namespace torch { namespace autograd {

/// `Variable` is exactly the same as `Tensor` (i.e. we have `using Variable = at::Tensor`).
/// This means you can perform all the usual mathematical and other
/// operations you can perform on `Tensor`s also on `Variable`s.
///
/// The only reason we are keeping the `Variable` class is backward compatibility
/// with external user's legacy C++ frontend code. Our intention is to eliminate
/// the `Variable` class in the near future.
using Variable = at::Tensor;

} // namespace autograd
} // namespace torch

/// Gradient Edges
///~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Furthermore, `Variable`s have the notion of a `gradient_edge`, which is the
/// edge in the autograd graph that connects the variable to a particular input
/// of the gradient function that will be invoked with the variable during the
/// backward pass. More precisely, this gradient function can be one of two
/// things:
/// 1. A `grad_fn`, if the variable is in the interior of the graph. This is the
///    gradient of the function that produced the variable.
/// 2. A `grad_accumulator`, if the variable is a leaf, which accumulates a
///    scalar gradient value into its `grad` variable.

2.2 Function

Combined with the concept of Variable above, a Function is the operation performed at a node in the computation graph, such as addition, subtraction, multiplication, division, convolution, and so on. Every time an operation is applied to a Tensor, a Function object is created; it records the inputs of the operation, records what happened, and produces the result of the operation. The Tensor's .grad_fn attribute records this Tensor's entry point into the graph.

/// To use custom autograd operations, implement a Function subclass with
/// static forward and backward functions:
///
/// `forward` can take as many arguments as you want and should return either a
/// variable list or a Variable. Use of any direct Variable arguments will be
/// registered in the graph but no vectors/sets or any other data structures
/// will be traversed. You can use c10::optional<Tensor> as one of the arguments
/// and it will be registered as a variable in the graph if the argument has a
/// value. It should take a pointer to `torch::autograd::AutogradContext` as the
/// first argument. Variables can be saved in the `ctx` using
/// `ctx->save_for_backward`
/// (see `torch::autograd::AutogradContext::save_for_backward`) and other data
/// can be saved in the `ctx->saved_data` map
/// (see `torch::autograd::AutogradContext::saved_data`)
/// in the form of `<std::string, at::IValue>` pairs.
///
/// `backward` should take a pointer to `torch::autograd::AutogradContext`
/// and a variable list containing as many Variables as there were outputs from
/// `forward` as arguments. It should return as many Variables as there were
/// inputs with each of them containing the gradient w.r.t. its corresponding
/// input. Variables saved in `forward` can be accessed with
/// `ctx->get_saved_variables` (see
/// `torch::autograd::AutogradContext::get_saved_variables`) and other saved
/// data can be accessed from `ctx->saved_data`.
template <class T>
struct TORCH_API Function {
  // We need to use a different template parameter than T here because T will
  // inherit from Function, and when Function<T> is instantiated, T::forward
  // is not declared yet.
  // The enable_if check is to ensure that the user doesn't explicitly provide
  // the parameter X.
  template <typename X = T, typename... Args>
  static auto apply(Args&&... args)
      -> std::enable_if_t<std::is_same<X, T>::value, forward_t<X, Args...>>;
};
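On the Python side the counterpart is torch.autograd.Function. Below is a minimal, hedged sketch of the standard custom-Function pattern (MyExp is an illustrative example, not code from PyTorch's source): forward saves what it needs into ctx, and backward returns the gradient with respect to each input.

import torch

class MyExp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        y = x.exp()
        ctx.save_for_backward(y)   # save what backward will need
        return y

    @staticmethod
    def backward(ctx, grad_output):
        y, = ctx.saved_tensors
        return grad_output * y     # d(exp(x))/dx = exp(x)

x = torch.randn(3, requires_grad=True)
y = MyExp.apply(x)
y.sum().backward()
print(torch.allclose(x.grad, x.exp()))  # True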

0x03 Tensor

As mentioned earlier, Tensor is the structural basis of forward/backward propagation, and Tensor is one of the building blocks of the computation graph in PyTorch.

Tensor is the key data structure PyTorch uses for multidimensional array calculation and automatic differentiation.

  • A Tensor is similar to NumPy's ndarray, and various mathematical operations can be performed on Tensors;
  • When requires_grad = True is set, the operations performed on that Tensor are recorded for the subsequent gradient computation.

3.1 Defining in Python

Let's look at a Tensor at run time using the earlier example; the dump below shows the Tensor's member variables.

Q = {Tensor}
  data = {Tensor} tensor(-12.)
  device = {device} cpu
  dtype = {dtype} torch.float32
  grad = {NoneType} None
  grad_fn = {SubBackward0}
    metadata = {dict: 0} {}
    next_functions = {tuple: 2}
      0 = {tuple: 2} (<MulBackward0 object at 0x000001F9547A5848>, 0)
      1 = {tuple: 2} (<PowBackward0 object at 0x000001F9547A53C8>, 0)
      __len__ = {int} 2
    requires_grad = {bool} True
  is_cuda = {bool} False
  is_leaf = {bool} False
  is_meta = {bool} False
  is_mkldnn = {bool} False
  is_mlc = {bool} False
  is_quantized = {bool} False
  is_sparse = {bool} False
  is_sparse_csr = {bool} False
  is_vulkan = {bool} False
  is_xpu = {bool} False
  layout = {layout} torch.strided
  name = {NoneType} None
  names = {tuple: 0} ()
  ndim = {int} 0
  output_nr = {int} 0
  requires_grad = {bool} True
  shape = {Size: 0} torch.Size([])

Let's go through some of the member variables; a short runnable demo follows the list.

  • data: the tensor's data.

  • dtype: the tensor's data type.

  • device: the type of device the tensor is stored on, such as CPU or GPU.

  • grad: holds the gradient corresponding to data, with the same shape as data.

    • PyTorch automatically tracks and records all operations on the tensor; after the forward computation has been evaluated, calling backward() computes the gradients automatically and stores them into the grad attribute.
    • If requires_grad = False, grad is None.
    • Gradient values are not cleared automatically; the gradients from the previous step must be zeroed before each backward computation, otherwise the gradients keep accumulating.
  • grad_fn: points to a Function object.

    • This Function object is used to compute the gradients of the inputs during back propagation.
    • If the tensor is a non-leaf node, the Function is the backward function that propagates toward the leaf nodes. For example, the Function corresponding to node O in the example is MulBackward, i.e., the backward Function of the multiplication operation.
    • If the tensor is a leaf node and requires_grad is True, grad_fn is None.
    • grad_fn has a property next_functions, which is a tuple of tuples of the form ((function 1, int 1), (function 2, int 2), ..., (function n, int n)). We will explain this in detail later.
  • is_leaf: records whether the tensor is a leaf node.

    • Tensors explicitly created by the user are leaf nodes.
    • By convention, all tensors with requires_grad=False are also leaf nodes.
    • The is_leaf attribute only matters for tensors that require gradients. For any tensor, tensor.is_leaf tells us whether it is a leaf tensor. During back propagation, only tensors that require gradients and have is_leaf=True keep their gradient results in grad.
    • For leaf nodes, grad_fn is None; for non-leaf nodes, which are produced by some operation, grad_fn is not None.
  • requires_grad: indicates whether this Tensor needs to be tracked so that its gradient can be computed; True means the Tensor requires a derivative.

    • requires_grad defaults to False, i.e., a Tensor does not require gradients by default.
    • If a node's requires_grad is True, then requires_grad of all nodes that depend on it is also True. Conversely, if all the nodes a node depends on do not require gradients, its requires_grad is also False, and during back propagation the subgraph containing that node is excluded from the computation.
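Here is a minimal sketch (illustrative values, assuming a recent PyTorch) showing how these attributes behave:

import torch

x = torch.ones(2, 2, requires_grad=True)   # created by the user => leaf
y = (x * 3).sum()                          # produced by operations => non-leaf

print(x.is_leaf, x.grad_fn)    # True None
print(y.is_leaf, y.grad_fn)    # False <SumBackward0 object at ...>

y.backward()
print(x.grad)                  # tensor([[3., 3.], [3., 3.]])

# Gradients accumulate across backward() calls unless cleared first.
(x * 3).sum().backward()
print(x.grad)                  # tensor([[6., 6.], [6., 6.]])
x.grad.zero_()                 # zero the gradient before the next iteration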

The Python definition is really just a mapping of what is defined in the C++ world. Let’s look at how it is defined in C++.

3.2 Searching for Definitions

Let's trace the definition of Tensor step by step.

First up is torch/_C/_VariableFunctions.pyi:

def tensor(data: Any, dtype: Optional[_dtype]=None, device: Union[_device, str, None]=None, requires_grad: _bool=False) -> Tensor: ...

Then comes torch/_tensor.py.

3.2.1 Tensor

You can see that the base class of Tensor is torch._C._TensorBase.

class Tensor(torch._C._TensorBase):

3.2.2 _TensorBase

_TensorBase is dynamically generated, for example in IDE stub code such as python_stubs\xxx\torch\_C\_TensorBase.py:

class _TensorBase(object):

As you can see in torch/_C/__init__.pyi.in, torch._C._TensorBase is actually defined in the C++ world, but it needs to be exported to the Python world:

# Defined in torch/csrc/autograd/python_variable.cpp
class _TensorBase(metaclass=_TensorMeta):
    requires_grad: _bool
    shape: Size
    data: Tensor
    names: List[str]
    device: _device
    dtype: _dtype
    layout: _layout
    real: Tensor
    imag: Tensor
    T: Tensor
    ndim: _int
    output_nr: _int
    _version: _int
    _base: Optional[Tensor]
    _cdata: _int
    grad_fn: Any
    _grad_fn: Any
    _grad: Optional[Tensor]
    _backward_hooks: Optional[Dict[_int, Callable[[Tensor], Optional[Tensor]]]]
    ${tensor_method_hints}

3.3 Conversion

Here we take a quick look at how the transition from the C++ world to the Python world happens; we won't go deep into it.

3.3.1 Python import

Bringing PyTorch into your code is done with import torch; during this import, torch/__init__.py imports torch._C:

from torch._C import *

torch._C is a compiled C++ shared library file, such as a .so file on Linux.

The Tensor class inherits from torch._C._TensorBase. Importing torch._C brings in torch._C._TensorBase, which gives torch.Tensor a base class to inherit from. The details are as follows:

+---------------------------+
|      import torch         |
+------------+--------------+
             |
             |
             v
+------------+--------------+
| torch/__init__.py         |
|                           |
|   from torch._C import *  |
|                           |
+------------+--------------+
             |
             |
             v
+------------+--------------+
|  torch._C._TensorBase     |
+---------------------------+
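A quick way to confirm this inheritance chain from the Python side (output from a recent PyTorch build; the exact class names may vary between versions):

import torch

# torch.Tensor is defined in torch/_tensor.py and inherits from the
# C++-exported base class torch._C._TensorBase.
print(torch.Tensor.__mro__)
# (<class 'torch.Tensor'>, <class 'torch._C._TensorBase'>, <class 'object'>)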

So let’s look at how torch._C is exported from the C++ world to python.

3.3.2 C++ export & initialization

Let’s look at how the C++ world exports TensorBase.

To be able to import torch._C in Python, you must export the symbol using Python’s extension specification.

3.3.2.1 Shared Library Entry

For a Python module, the shared library implements the PyInit_<modulename> symbol as the logical entry point for import. For PyTorch this module name is _C, and the function PyInit__C is implemented in torch/csrc/stub.cpp.

#include <Python.h>
extern PyObject* initModule();
PyMODINIT_FUNC PyInit__C()
{
  return initModule();
}

When the deploy/JIT interpreter is used, we look at torch/csrc/deploy/interpreter/interpreter_impl.cpp instead; much of the code is omitted here.

struct ConcreteInterpreterImpl : public torch::deploy::InterpreterImpl {
  ConcreteInterpreterImpl() {
    PyImport_AppendInittab("torch._C", initModule);
  }
};

This is the interpreter's code; it also registers initModule as the initializer for torch._C.

3.3.2.2 initModule

The initModule function initializes the torch module in the Python environment. It is defined in torch/csrc/Module.cpp; a lot of code is omitted below.

PyObject* initModule() {
  // ......
  THPSize_init(module);
  THPDtype_init(module);
  THPDTypeInfo_init(module);
  THPLayout_init(module);
  THPMemoryFormat_init(module);
  THPQScheme_init(module);
  THPDevice_init(module);
  THPStream_init(module);
  ASSERT_TRUE(THPVariable_initModule(module));  // this sets up _TensorBase
  ASSERT_TRUE(THPFunction_initModule(module));
  ASSERT_TRUE(THPEngine_initModule(module));
  // ......
}

initModule calls THPVariable_initModule. The code is in torch/csrc/autograd/python_variable.cpp, and this is where _TensorBase gets set.

bool THPVariable_initModule(PyObject *module) {
  THPVariableMetaType.tp_base = &PyType_Type;
  if (PyType_Ready(&THPVariableMetaType) < 0)
    return false;
  Py_INCREF(&THPVariableMetaType);
  PyModule_AddObject(module, "_TensorMeta", (PyObject *)&THPVariableMetaType);

  static std::vector<PyMethodDef> methods;
  THPUtils_addPyMethodDefs(methods, torch::autograd::variable_methods);
  THPUtils_addPyMethodDefs(methods, extra_methods);
  THPVariableType.tp_methods = methods.data();
  if (PyType_Ready(&THPVariableType) < 0)
    return false;
  Py_INCREF(&THPVariableType);
  // Set _TensorBase
  PyModule_AddObject(module, "_TensorBase", (PyObject *)&THPVariableType);
  torch::autograd::initTorchFunctions(module);
  torch::autograd::initTensorImplConversion(module);
  return true;
}

3.3.2.3 Registering _TensorBase

When THPVariable_initModule executes, the following code registers THPVariableType as torch._C._TensorBase. So torch._C._TensorBase is THPVariableType on the C++ side.

PyModule_AddObject(module, "_TensorBase",   (PyObject *)&THPVariableType); 

Let’s look at THPVariableType. It defines a lot of functions.

PyTypeObject THPVariableType = {
    PyVarObject_HEAD_INIT(&THPVariableMetaType, 0)
    "torch._C._TensorBase",                      /* tp_name */
    sizeof(THPVariable),                         /* tp_basicsize */
    0,                                           /* tp_itemsize */
    (destructor)THPVariable_dealloc,             /* tp_dealloc */
    // omitted ......
    nullptr,                                     /* tp_methods */
    nullptr,                                     /* tp_members */
    THPVariable_properties,                      /* tp_getset */
    THPVariable_pynew,                           /* tp_new */
};

Now that the torch._C._TensorBase Python class has been registered, we need to register some functions on it.

tp_getset is part of the Python type machinery: it is the array of attribute getters/setters for the class, and here it is set to THPVariable_properties. Below is the property set of _TensorBase, where we can see the familiar faces of grad_fn and grad.

static struct PyGetSetDef THPVariable_properties[] = {
  {"T", (getter)THPVariable_get_T, nullptr, nullptr, nullptr},
  {"_cdata", (getter)THPVariable_get_cdata, nullptr, nullptr, nullptr},
  {"_version", (getter)THPVariable_get_version, nullptr, nullptr, nullptr},
  {"grad_fn", (getter)THPVariable_get_grad_fn, nullptr, nullptr, nullptr},
  {"_grad_fn", (getter)THPVariable_get_grad_fn, (setter)THPVariable_set_grad_fn, nullptr, nullptr},
  {"is_leaf", (getter)THPVariable_is_leaf, nullptr, nullptr, nullptr},
  {"data", (getter)THPVariable_get_data, (setter)THPVariable_set_data, nullptr, nullptr},
  {"_grad", (getter)THPVariable_get_grad, (setter)THPVariable_set_grad, nullptr, nullptr}, // Allows the python class to override .grad
  {"grad", (getter)THPVariable_get_grad, (setter)THPVariable_set_grad, nullptr, nullptr},
  {"_base", (getter)THPVariable_get_base, nullptr, nullptr, nullptr},
  {"volatile", (getter)THPVariable_get_volatile, (setter)THPVariable_set_volatile, nullptr, nullptr},
  {"output_nr", (getter)THPVariable_get_output_nr, nullptr, nullptr, nullptr},
  {"requires_grad", (getter)THPVariable_get_requires_grad, (setter)THPVariable_set_requires_grad, nullptr, nullptr},
  {"_backward_hooks", (getter)THPVariable_get_backwards_hooks, (setter)THPVariable_set_backwards_hooks, nullptr, nullptr},
  {"name", (getter)THPVariable_get_name, nullptr, nullptr, nullptr},
  {"shape", (getter)THPVariable_get_shape, nullptr, nullptr, nullptr},
  {"is_cuda", (getter)THPVariable_is_cuda, nullptr, nullptr, nullptr},
  {"is_xpu", (getter)THPVariable_is_xpu, nullptr, nullptr, nullptr},
  {"is_sparse", (getter)THPVariable_is_sparse, nullptr, nullptr, nullptr},
  {"is_sparse_csr", (getter)THPVariable_is_sparse_csr, nullptr, nullptr, nullptr},
  {"is_mkldnn", (getter)THPVariable_is_mkldnn, nullptr, nullptr, nullptr},
  {"is_mlc", (getter)THPVariable_is_mlc, nullptr, nullptr, nullptr},
  {"is_vulkan", (getter)THPVariable_is_vulkan, nullptr, nullptr, nullptr},
  {"is_complex", (getter)THPVariable_is_complex, nullptr, nullptr, nullptr},
  {"is_quantized", (getter)THPVariable_is_quantized, nullptr, nullptr, nullptr},
  {"is_meta", (getter)THPVariable_is_meta, nullptr, nullptr, nullptr},
  {"dtype", (getter)THPVariable_dtype, nullptr, nullptr, nullptr},
  {"layout", (getter)THPVariable_layout, nullptr, nullptr, nullptr},
  {"device", (getter)THPVariable_device, nullptr, nullptr, nullptr},
  {"ndim", (getter)THPVariable_get_ndim, nullptr, nullptr, nullptr},
  {"names", (getter)THPVariable_get_names, (setter)THPVariable_set_names, nullptr, nullptr},
  {"real", (getter)THPVariable_get_real, (setter)THPVariable_set_real, nullptr, nullptr},
  {"imag", (getter)THPVariable_get_imag, (setter)THPVariable_set_imag, nullptr, nullptr},
  {nullptr}
};
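
From Python these getters appear as ordinary tensor attributes. A small sketch (the printed values are illustrative):

import torch

t = torch.ones(2, 3, requires_grad=True)

# Each attribute below is served by a getter registered in THPVariable_properties.
print(t.requires_grad, t.is_leaf, t.is_cuda)   # True True False
print(t.shape, t.dtype, t.device, t.layout)    # torch.Size([2, 3]) torch.float32 cpu torch.strided
print(t.grad_fn, t.output_nr)                  # None 0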


The initialization logic and mapping logic are as follows:

                 Python                    +                  C++
                                           |
+---------------------------+              |    +---------------------------+
|       import torch        |              |    |         PyInit__C         |
+------------+--------------+              |    +-------------+-------------+
             |                             |                  |
             v                             |                  v
+------------+--------------+              |    +-------------+-------------+
|  torch/__init__.py        |              |    |         initModule        |
|                           |              |    +-------------+-------------+
|  from torch._C import *   |              |                  |
+------------+--------------+              |                  v
             |                             |    +-------------+------------------+
             |                             |    | THPVariable_initModule(module) |
             |                             |    +-------------+------------------+
             |                             |                  |
             |                             |                  v
             |                             |    +-------------+-------------------------------------+
             |                             |    | PyModule_AddObject(module, "_TensorBase",          |
             |                             |    |                    (PyObject *)&THPVariableType)   |
             |                             |    +-------------+-------------------------------------+
             |                             |                  |
             v                             |                  v
+------------+--------------+              |    +-------------+-------------+      +------------------------------------------------------+
|   torch._C._TensorBase    | <----------------> |     THPVariableType      |      | THPVariable_properties                              |
+---------------------------+              |    |                           |      |                                                      |
                                           |    |      tp_getset +---------------> | { grad, grad_fn, T, _cdata, is_leaf, output_nr ... }|
                                           |    +---------------------------+      +------------------------------------------------------+
                                           +


3.4 Setting next_functions

next_functions is the essence of the graph structure, and it is set up inside autograd, so we need to look at the autograd initialization process; then we will know how next_functions gets set.

3.5 Initializing Autograd

Let’s take AccumulateGrad as an example to see how this is initialized.

Some AccumulateGrad member functions are omitted below. As the constructor shows, an AccumulateGrad instance must be constructed with a Variable, which is stored in the internal member variable. Its apply call receives a variable_list instance, and the AccumulateGrad node is associated with the Variable's grad_accumulator_.

struct TORCH_API AccumulateGrad : public Node {
  explicit AccumulateGrad(Variable variable_);
  variable_list apply(variable_list&& grads) override;
  Variable variable;
};   

In older versions, the definition is as follows:

struct AccumulateGrad : public Function {
  explicit AccumulateGrad(Variable variable_);
  variable_list apply(variable_list&& grads) override;
  Variable variable;
};

Let’s see how AccumulateGrad is initialized.

3.5.1 Extension Initialization

After the initModule() function finishes, import torch is still not done: the Python initialization script goes on to handle many modules, for example in the torch/__init__.py file:

# Check to see if we can load C extensions, and if not provide some guidance
# on what the problem might be.
try:
    # _initExtension is chosen (arbitrarily) as a sentinel.
    from torch._C import _initExtension

_initExtension is invoked as _C._initExtension(manager_path()), and _C._initExtension corresponds to THPModule_initExtension.

static PyMethodDef TorchMethods[] = {
  {"_initExtension",  THPModule_initExtension, METH_O, nullptr}, 
  // ....
}

The THPModule_initExtension function calls THPAutograd_initFunctions, which initializes the automatic differentiation system.

// Callback for python part. Used for additional initialization of python classes
static PyObject * THPModule_initExtension(PyObject *_unused, PyObject *shm_manager_path) {
  // omitted ......
  THPQInt8Storage_postInit(module);
  THPQInt32Storage_postInit(module);
  THPBFloat16Storage_postInit(module);
  THPComplexDoubleStorage_postInit(module);
  THPComplexFloatStorage_postInit(module);
  THPAutograd_initFunctions();  // called here to initialize the differentiation system
  // omitted ......
}

THPAutograd_initFunctions registers a further set of classes and properties (in the torch._C._functions module). It calls the addClass method to associate AccumulateGrad with accumulate_grad_properties.

void THPAutograd_initFunctions() {
  THPObjectPtr module(PyModule_New("torch._C._functions"));
  if (!module) throw python_error();

  static PyTypeObject AccumulateGradClass;
  addClass<AccumulateGrad, NoCtor>(module, AccumulateGradClass, "AccumulateGrad",
                                   accumulate_grad_properties);  // AccumulateGrad related

  static PyTypeObject CopyBackwardsClass;
  addClass<CopyBackwards, NoCtor>(module, CopyBackwardsClass, "CopyBackwards");

  // others omitted ......
}

3.5.2 addClass

addClass first sets up the type with function_properties (here accumulate_grad_properties is passed in as function_properties, and type is AccumulateGradClass), and then calls registerCppFunction to register the type.

template<typename C, typename T>
static void addClass(PyObject* module, PyTypeObject& type, const char* name,
                     PyGetSetDef* function_properties=nullptr,
                     PyMethodDef* function_methods=nullptr)
{
  // accumulate_grad_properties is set up here
  createForwardFunctionPyTypeObject<T>(type, name, function_properties, function_methods);
  Py_INCREF(&type);
  PyModule_AddObject(module, name, (PyObject*)&type);
  // register the type
  registerCppFunction(typeid(C), &type);
}

Two groups of operations are involved here: createForwardFunctionPyTypeObject and registerCppFunction. Let's look at the relevant pieces one by one: first accumulate_grad_properties, then createForwardFunctionPyTypeObject.

3.5.2.1 accumulate_grad_properties

The addClass method associates AccumulateGrad with accumulate_grad_properties; specifically, the association is made through createForwardFunctionPyTypeObject.

accumulate_grad_properties is defined in torch/csrc/autograd/functions/init.cpp:

static struct PyGetSetDef accumulate_grad_properties[] = {
  THP_FUNCTION_DEFAULT_PROPERTIES,
  {(char*)"variable", accumulateGradVar, nullptr, nullptr, nullptr},
  {nullptr}
};

THP_FUNCTION_DEFAULT_PROPERTIES is defined in torch/csrc/autograd/python_cpp_function.h:

#define THP_FUNCTION_DEFAULT_PROPERTIES \
  {(char*)"next_functions", (getter)THPCppFunction_next_functions, nullptr, nullptr, nullptr}, \
  {(char*)"requires_grad", (getter)THPCppFunction_requires_grad, nullptr, nullptr, nullptr}, \
  {(char*)"metadata", (getter)THPCppFunction_metadata, nullptr, nullptr, nullptr}
​
PyObject* THPCppFunction_next_functions(THPCppFunction* self, PyObject* hook);
PyObject* THPCppFunction_metadata(THPCppFunction *self, void *_unused);
PyObject* THPCppFunction_requires_grad(THPCppFunction* self, void *_unused);

So accumulate_grad_properties is the expansion of THP_FUNCTION_DEFAULT_PROPERTIES plus the accumulateGradVar ("variable") entry:

static struct PyGetSetDef accumulate_grad_properties[] = {
  // expansion of THP_FUNCTION_DEFAULT_PROPERTIES
  {(char*)"next_functions", (getter)THPCppFunction_next_functions, nullptr, nullptr, nullptr},
  {(char*)"requires_grad", (getter)THPCppFunction_requires_grad, nullptr, nullptr, nullptr},
  {(char*)"metadata", (getter)THPCppFunction_metadata, nullptr, nullptr, nullptr},
  // plus the "variable" entry
  {(char*)"variable", accumulateGradVar, nullptr, nullptr, nullptr},
  {nullptr}
};

The next_functions entry corresponds to THPCppFunction_next_functions. Logically, accumulate_grad_properties is as follows:

+-----------------------------------------------------------------------+
|accumulate_grad_properties                                             |
|                                                                       |
|                                                                       |
|                                                                       |
|              "variable", accumulateGradVar                            |
|                                                                       |
|                                                                       |
|              "next_functions", (getter)THPCppFunction_next_functions  |
|                                                                       |
|                                                                       |
|              "requires_grad", (getter)THPCppFunction_requires_grad    |
|                                                                       |
|                                                                       |
|              "metadata", (getter)THPCppFunction_metadata              |
|                                                                       |
+-----------------------------------------------------------------------+
​
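These properties can be seen from Python on an AccumulateGrad node. A minimal sketch (assuming a recent PyTorch; printed values are illustrative):

import torch

w = torch.randn(3, requires_grad=True)      # leaf tensor
y = w.sum()

acc_grad = y.grad_fn.next_functions[0][0]   # the AccumulateGrad node of the leaf w

# These attributes are backed by the getters in accumulate_grad_properties.
print(type(acc_grad).__name__)    # AccumulateGrad
print(acc_grad.variable is w)     # True: the accumulator holds the leaf tensor itself
print(acc_grad.next_functions)    # ()
print(acc_grad.metadata)          # {}
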
3.5.2.3 createForwardFunctionPyTypeObject

createForwardFunctionPyTypeObject is used to set accumulate_grad_properties; the function is as follows:

template<typename Ctor>
PyTypeObject* createForwardFunctionPyTypeObject(PyTypeObject& type, const char* name,
  PyGetSetDef* function_properties=nullptr, PyMethodDef* function_methods=nullptr)
{
  type.tp_new = &CppFunction_pynew<Ctor>;
  return _initFunctionPyTypeObject(type, name, function_properties, function_methods);
}

_initFunctionPyTypeObject puts function_properties onto tp_getset:

PyTypeObject* _initFunctionPyTypeObject(PyTypeObject& type, const char* name,
  PyGetSetDef* function_properties, PyMethodDef* function_methods)
{
  type.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC;
  type.tp_name = name;
  type.tp_basicsize = sizeof(THPCppFunction);
  type.tp_call = THPCppFunction_call;
  type.tp_methods = function_methods ? function_methods : default_methods;
  // function_properties is set onto tp_getset here
  type.tp_getset = function_properties ? function_properties : default_properties;
  type.tp_dealloc = THPCppFunction_dealloc;
  type.tp_traverse = THPCppFunction_traverse;
  type.tp_clear = THPCppFunction_clear;
  if (PyType_Ready(&type) < 0) {
    auto msg = std::string("Unable to instantiate PyTypeObject for ") + name;
    throw std::runtime_error(msg);
  }
  return &type;
}

So THPCppFunction_next_functions becomes the next_functions getter of AccumulateGradClass: AccumulateGradClass holds a set of property getters in which next_functions corresponds to THPCppFunction_next_functions.

+---------------------+
| AccumulateGradClass |
|                     |
|       tp_getset     |
|           +         |
|           |         |
+---------------------+
            |
            |
            v
+-----------+-----------------------------------------------------------+
|accumulate_grad_properties                                             |
|                                                                       |
|                                                                       |
|                                                                       |
|              "variable", accumulateGradVar                            |
|                                                                       |
|                                                                       |
|              "next_functions", (getter)THPCppFunction_next_functions  |
|                                                                       |
|                                                                       |
|              "requires_grad", (getter)THPCppFunction_requires_grad    |
|                                                                       |
|                                                                       |
|              "metadata", (getter)THPCppFunction_metadata              |
|                                                                       |
+-----------------------------------------------------------------------+

Let's recall the _TensorBase properties mentioned earlier for comparison:

static struct PyGetSetDef THPVariable_properties[] = {
  {"grad_fn", (getter)THPVariable_get_grad_fn, nullptr, nullptr, nullptr},
  {"_grad_fn", (getter)THPVariable_get_grad_fn, (setter)THPVariable_set_grad_fn, nullptr, nullptr},
  {"is_leaf", (getter)THPVariable_is_leaf, nullptr, nullptr, nullptr},
  {"data", (getter)THPVariable_get_data, (setter)THPVariable_set_data, nullptr, nullptr},
  {"_grad", (getter)THPVariable_get_grad, (setter)THPVariable_set_grad, nullptr, nullptr}, // Allows the python class to override .grad
  {"grad", (getter)THPVariable_get_grad, (setter)THPVariable_set_grad, nullptr, nullptr},
  {"_base", (getter)THPVariable_get_base, nullptr, nullptr, nullptr},
  {"output_nr", (getter)THPVariable_get_output_nr, nullptr, nullptr, nullptr},
  {"requires_grad", (getter)THPVariable_get_requires_grad, (setter)THPVariable_set_requires_grad, nullptr, nullptr},
  {"_backward_hooks", (getter)THPVariable_get_backwards_hooks,  
  .....
};
​

At this point, the business logic is as follows:

                                Python    +   C++
                                          |
+--------------------------------------+  |   +---------------------------+
| torch/__init__.py                    |  |   |                           |
|                                      |  |   |  THPModule_initExtension  |
|  from torch._C import _initExtension |  |   |                           |
|                                      |  |   +--------------+------------+
+-------------------+------------------+  |                  |
                    |                     |                  |
                    |                     |                  v
                    |                     |  +---------------+--------------+
                    |                     |  |                              |
                    |                     |  |  THPAutograd_initFunctions() |
                    |                     |  |                              |
                    |                     |  +---------------+--------------+
                    |                     |                  |
                    |                     |                  |
                    |                     |                  v
                    |                     |  +---------------+-------------------------------------------+
                    |                     |  |                                                           |
                    |                     |  | addClass<AccumulateGrad, NoCtor>(module,                  |
                    |  import             |  |                               AccumulateGradClass,        |
                    |                     |  |                               "AccumulateGrad",           |
                    |                     |  |                               accumulate_grad_properties) |
                    |                     |  |                                                           |
                    |                     |  +--------------+--------------------------------------------+
                    |                     |                 |
                    |                     |                 |  register
                    v                     |                 v
                                          |                                                               +----------------------------------------------------------+
        +----------------------+          |     +--------------------+       +---------------------+      |accumulate_grad_properties                                |
        |                      |          |     |                    |       | AccumulateGradClass |      |                                                          |
        |   AccumulateGrad     | <------------> |   AccumulateGrad   +-----> |                     |      |  "variable", accumulateGradVar                           |
        |                      |          |     |                    |       |       tp_getset +------->  |                                                          |
        |                      |          |     |                    |       |                     |      |  "next_functions", (getter)THPCppFunction_next_functions |
        +----------------------+          |     +--------------------+       |                     |      |                                                          |
                                          |                                  +---------------------+      |  "requires_grad", (getter)THPCppFunction_requires_grad   |
                                          |                                                               |                                                          |
                                          |                                                               |  "metadata", (getter)THPCppFunction_metadata             |
                                          |                                                               |                                                          |
                                          |                                                               +----------------------------------------------------------+
​


3.5.2.4 next_functions

THPCppFunction_next_functions is defined in torch/csrc/autograd/python_cpp_function.cpp. It traverses next_edges_ and extracts a list of tuples, where the content of each tuple is (edge.function, edge.input_nr); this list is returned as next_functions.

PyObject* THPCppFunction_next_functions(THPCppFunction* self, PyObject* hook)
{
  const auto num_next = self->cdata->num_outputs();
  THPObjectPtr py_functions(PyTuple_New(num_next));
  if (!py_functions) return nullptr;
  for (size_t i = 0; i < num_next; ++i) {              // iterate over the edges
    auto& c_tuple = self->cdata->next_edge(i);         // get Edge i
    THPObjectPtr tuple(PyTuple_New(2));
    if (!tuple) return nullptr;
    PyObject *py_fn = functionToPyObject(c_tuple.function);   // py_fn is edge.function
    if (!py_fn) return nullptr;
    PyTuple_SET_ITEM(tuple.get(), 0, py_fn);
    PyObject *py_idx = THPUtils_packUInt32(c_tuple.input_nr); // py_idx is edge.input_nr
    if (!py_idx) return nullptr;
    PyTuple_SET_ITEM(tuple.get(), 1, py_idx);
    // tuple is (py_fn, py_idx), i.e. (edge.function, edge.input_nr)
    PyTuple_SET_ITEM(py_functions.get(), i, tuple.release());
  }
  return py_functions.release();  // return the tuple of tuples
}

next_edge is defined in torch/csrc/autograd/function.h. It is a member function of Node and returns one Edge from the edge list; AccumulateGrad is a class derived from Node.

struct TORCH_API Node : std::enable_shared_from_this<Node> {
  const Edge& next_edge(size_t index) const noexcept {
    return next_edges_[index];
  }

  edge_list next_edges_;  // the edges associated with this operator's input variables
};

Edge is defined as follows:

struct Edge {
  /// The function this `Edge` points to.
  std::shared_ptr<Node> function;

  /// The identifier of a particular input to the function.
  uint32_t input_nr;  // specifies which input of `function` this Edge is
};
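A small sketch of what input_nr means in practice (a hedged illustration; the exact grad_fn class names can vary slightly across versions): when a forward node has several outputs, the edge that consumes its i-th output carries input_nr = i.

import torch

x = torch.randn(2, 3, requires_grad=True)
a, b = x.unbind(0)      # one forward node with two outputs
y = b.sum()

# The edge from SumBackward0 points to the second input slot of UnbindBackward0,
# so the tuple's input_nr is 1.
print(y.grad_fn.next_functions)
# ((<UnbindBackward0 object at ...>, 1),)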

3.5.3 The Essence of next_functions

Taking AccumulateGrad as an example, we can summarize as follows.

  • grad_fn has a property next_functions, which is a tuple of tuples of the form ((function 1, int 1), (function 2, int 2), ..., (function n, int n)).
  • Each tuple corresponds to an Edge, and its content is (edge.function, edge.input_nr); this list is generated by THPCppFunction_next_functions.
  • AccumulateGrad's next_functions points to such a tuple list (label 2 in the figure below); AccumulateGrad itself is an instance of AccumulateGradClass. During back propagation, the gradients are computed step by step by following next_functions.

Roughly as follows:

+-----------------+     +------------------+        +------------------+       +------------------+
| Tensor          |     | SubBackward0     |        | PowBackward0     |       | AccumulateGrad   |
|                 |     |                  |        |                  |       |                  |
|      grad_fn +------> |  next_functions +---+---> |  next_functions +------> |  next_functions +------> {}
+-----------------+     +------------------+   |    +------------------+       +------------------+
                                                |
                                                |    +------------------+       +------------------+       +------------------+
                                                |    | MulBackward0     |       | PermuteBackward  |       | AccumulateGrad   |
                                                +--> |                  |       |                  |       |                  |
                                                     |  next_functions +------> |  next_functions +------> |  next_functions +---+
                                                     +------------------+       +------------------+       +------------------+   |
                                                                                                                                   |
                                                                                           2. point to the tuple list             |
                                                                                                                                   v
       +------------>  { (function 1, int 1), (function 2, int 2), ... , (function n, int n) }  <---------------------------------+
       |
       |  1. generate the tuple list
       |
+------+-------------------------------------------------------------+       +---------------------+
| accumulate_grad_properties                                          |       | AccumulateGradClass |
|                                                                     |       |                     |
|     "variable", accumulateGradVar                                   | <-----+      tp_getset      |
|                                                                     |       |                     |
|     "next_functions", (getter)THPCppFunction_next_functions         |       +---------------------+
|                                                                     |
|     "requires_grad", (getter)THPCppFunction_requires_grad           |
|                                                                     |
|     "metadata", (getter)THPCppFunction_metadata                     |
+---------------------------------------------------------------------+
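As a final sketch, walking next_functions recursively from a root tensor's grad_fn reaches the AccumulateGrad nodes of the leaves, and the recursion stops where next_functions is empty (the same assumed expression as earlier is used here for illustration):

import torch

def walk(fn, depth=0):
    # Print the backward graph by following the (function, input_nr) tuples.
    print("  " * depth + type(fn).__name__)
    for next_fn, input_nr in fn.next_functions:
        if next_fn is not None:
            walk(next_fn, depth + 1)

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(4.0, requires_grad=True)
Q = a * b - b ** 2

walk(Q.grad_fn)
# SubBackward0
#   MulBackward0
#     AccumulateGrad
#     AccumulateGrad
#   PowBackward0
#     AccumulateGrad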


At this point, we have analyzed some of the basic classes. Because of length constraints, we will continue to analyze the remaining basic classes in the next article.

0xEE Personal information

★★★★ Thoughts on life and technology ★★★★★

Wechat official account: Rosie’s Thoughts

0xFF Reference

Github.com/KeithYin/re…

Pytorch Learning Note (13) : Low-level implementation parsing of backward processes

Initialization of PyTorch

Pytorch’s automatic derivative mechanism – the establishment of computational graph

How autograd encodes the history

Pytorch.org/tutorials/b…

Pytorch Note (Calculation diagram + Autograd)-Node(1)

Explain the network construction in Pytorch

PyTorch’s optimizer

Distribution of PyTorch

PyTorch’s Tensor

PyTorch’s Tensor

PyTorch’s Tensor

PyTorch dynamic diagram (part 2)

PyTorch dynamic diagram (part 1)

Calculation diagram — Explain teacher Li Hongyi’s PPT with Pytorch

How to use PyTorch to find gradients automatically

PyTorch Automatic Derivative (Autograd) principle analysis

Pytorch Automatic Derivation of Autograd

PyTorch’s core developers take the inner workings of the game personally

PyTorch Automatic differential fundamentals

Towardsdatascience.com/pytorch-aut…