0x00 Summary

This series of about ten articles looks at how PyTorch's automatic differentiation is implemented. This article, the first on backward propagation, covers the invocation process: how we get from Python code into the C++ autograd engine.

The previous chapters in the series are linked as follows:

Automatic Differentiation of Deep Learning Tools (1)

Automatic Differentiation of Deep Learning Tools (2)

Automatic Differentiation of Deep Learning Tools (3) — Interpretation of Examples

PyTorch How to Implement Forward Propagation (1) — Base Class (1)

PyTorch How to Implement Forward Propagation (2) — Base Class (2)

PyTorch How to Implement Forward Propagation (3) — Implementation

0x01 Review

Let’s first look at the connection between forward and backward propagation from three perspectives.

1.1 Training Process

Let’s start by recalling the training process.

A neural network (NN) is a collection of nested functions applied to some input data. These functions are defined by parameters (weights and biases), which are stored as tensors in PyTorch. An NN is trained in two steps:

  • Forward propagation: In forward propagation, the neural network makes its best guess at the correct output. It makes this guess by running the input data through each of its functions.
  • Back propagation: In back propagation, the neural network adjusts its parameters in proportion to the error of its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the function parameters (the gradients), and optimizing the parameters using gradient descent.

1.2 Example

Second, let’s recall the previous example.

def train_loop(model, optimizer, iterations):
    for _ in range(iterations):
        optimizer.zero_grad()            # Clear accumulated gradients
        output = model(input)            # Forward propagation
        loss = loss_fn(output, target)   # Calculate loss
        loss.backward()                  # Backward propagation
        optimizer.step()                 # Update parameters
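For concreteness, here is a self-contained, runnable version of the loop above. The specific model, loss_fn, input, and target are stand-ins chosen for illustration, not from the original:

import torch
import torch.nn as nn

model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
input, target = torch.randn(8, 4), torch.randn(8, 1)

for _ in range(3):
    optimizer.zero_grad()            # Clear accumulated gradients
    output = model(input)            # Forward propagation builds the graph
    loss = loss_fn(output, target)   # Calculate loss
    loss.backward()                  # Backward propagation: the subject of this article
    optimizer.step()                 # Update parameters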

After the forward computation is complete, we have the dependencies of the computation graph, and back propagation can begin. We need to start our analysis with backward.

1.3 Source Code Analysis

For sub_Tensor, the result of the forward computation of sub, we know the following (a small demo follows the list):

  • How to know what to call for the backward computation: the autograd_meta_ of the forward result is of type DifferentiableViewMeta, whose grad_fn_ is the gradient function for the backward computation; here grad_fn_ points to SubBackward0.
  • How the backward propagation is computed: by calling SubBackward0.
  • The input of SubBackward0: the output of the forward computation (it is set as SubBackward0's input_metadata_ during back propagation).
  • The output of SubBackward0: the constructed next_edges_ serve as its output edges during back propagation; the backward graph can be derived from these next_edges_.

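As a quick illustration (a minimal sketch, not from the original article), grad_fn_ and next_edges_ can be observed from Python; next_functions is the Python-side view of next_edges_:

import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)
b = torch.tensor([3.0, 4.0], requires_grad=True)
result = a - b

# grad_fn_ of the forward result points to SubBackward0
print(result.grad_fn)                 # <SubBackward0 object at ...>
# next_functions mirrors next_edges_: each entry is a (Node, input_nr) pair, i.e. an Edge
print(result.grad_fn.next_functions)  # ((<AccumulateGrad ...>, 0), (<AccumulateGrad ...>, 0))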
Now that we have sorted out the relationship between forward and backward propagation, let us look at how backward propagation is entered.

0x02 Python Call Procedure

2.1 Invocation

We start with torch/_tensor.py, which contains two functions that compute gradients; we pick backward and look at it.

def backward(self, gradient=None, retain_graph=None, create_graph=False, inputs=None):
    r"""Computes the gradient of current tensor w.r.t. graph leaves.
    """
    if has_torch_function_unary(self):
        return handle_torch_function(
            Tensor.backward,
            (self,),
            self,
            gradient=gradient,
            retain_graph=retain_graph,
            create_graph=create_graph,
            inputs=inputs)
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)

Next comes torch/autograd/__init__.py. The main logic of backward there is:

  • The input and gradient tensors are constructed from the input parameters.
  • The elements of grad_tensors are re-organized into tuple(list(torch.Tensor, ...)) by _make_grads.
  • Back propagation is then performed by Variable._execution_engine.run_backward.
def backward(
    tensors: _TensorOrTensors,
    grad_tensors: Optional[_TensorOrTensors] = None,
    retain_graph: Optional[bool] = None,
    create_graph: bool = False,
    grad_variables: Optional[_TensorOrTensors] = None,
    inputs: Optional[_TensorOrTensors] = None,
) -> None:
    r"""Computes the sum of gradients of given tensors with respect to graph leaves."""
    if grad_variables is not None:
        warnings.warn("'grad_variables' is deprecated. Use 'grad_tensors' instead.")
        if grad_tensors is None:
            grad_tensors = grad_variables
        else:
            raise RuntimeError("'grad_tensors' and 'grad_variables' (deprecated) "
                               "arguments both passed to backward(). Please only "
                               "use 'grad_tensors'.")
    if inputs is not None and len(inputs) == 0:
        raise RuntimeError("'inputs' argument to backward() cannot be empty.")

    # Construct the input and gradient tensors from the input parameters
    tensors = (tensors,) if isinstance(tensors, torch.Tensor) else tuple(tensors)
    inputs = (inputs,) if isinstance(inputs, torch.Tensor) else \
        tuple(inputs) if inputs is not None else tuple()

    # _make_grads re-organizes the elements of grad_tensors into tuple(list(torch.Tensor, ...))
    grad_tensors_ = _tensor_or_tensors_to_tuple(grad_tensors, len(tensors))
    grad_tensors_ = _make_grads(tensors, grad_tensors_)
    if retain_graph is None:
        retain_graph = create_graph

    # Perform backward propagation through the engine
    Variable._execution_engine.run_backward(
        tensors, grad_tensors_, retain_graph, create_graph, inputs,
        allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag

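One consequence of _make_grads worth noting: for a scalar output it synthesizes tensor(1.) as the initial gradient, but for a non-scalar output the caller must supply gradient explicitly. A minimal demonstration (illustrative sketch):

import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x * 3          # non-scalar output

# y.backward() alone would raise "grad can be implicitly created only for scalar outputs";
# an explicit gradient is required, and it becomes grad_tensors in the logic above
y.backward(gradient=torch.ones_like(y))
print(x.grad)      # tensor([3., 3.])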
Variable._execution_engine.run_backward is where we enter the C++ world.

Python                                     |    C++
                                           |
backward                                   |
   +                                       |
   |                                       |
   v                                       |
Variable._execution_engine.run_backward +------------->
                                           |

2.2 The Engine

The _execution_engine is generated in torch/autograd/variable.py:

from torch._C import _ImperativeEngine as ImperativeEngine
​
Variable._execution_engine = ImperativeEngine()

From torch/_C/__init__.pyi.in we can see that we should look for the answer in python_engine.cpp in the C++ world.

# Defined in torch/csrc/autograd/python_engine.cpp
class _ImperativeEngine:

0x03 C++ World

Once we enter the C++ world, we slow down and first review the supporting infrastructure; otherwise things get too complicated.

3.1 Support System

3.1.1 Edge

Edge represents an edge in the graph as the pair (function, input_nr).

using tensor_list = std::vector<at::Tensor>;
using variable_list = std::vector<Variable>;
using edge_list = std::vector<Edge>;
using saved_variable_list = std::vector<SavedVariable>;
using IndexRange = std::pair<size_t, size_t>;

/// Represents a particular input of a function.
struct Edge {
  Edge() noexcept : function(nullptr), input_nr(0) {}

  Edge(std::shared_ptr<Node> function_, uint32_t input_nr_) noexcept
      : function(std::move(function_)), input_nr(input_nr_) {}

  /// The function this `Edge` points to.
  std::shared_ptr<Node> function;

  /// The identifier of a particular input to the function.
  uint32_t input_nr; // Specifies which input of `function` this Edge is during back propagation
};
}} // namespace torch::autograd

3.1.2 Edge-related Functions

The edge-related functions are located in torch/csrc/autograd/function.h. They are all member functions of the Node class.

  void set_next_edge(size_t index, Edge edge) {
    update_topological_nr(edge);
    next_edges_[index] = std::move(edge);
  }
​
  void add_next_edge(Edge edge) {
    update_topological_nr(edge);
    next_edges_.push_back(std::move(edge));
  }
​
  void set_next_edges(edge_list&& next_edges) {
    next_edges_ = std::move(next_edges);
    for(const auto& next_edge : next_edges_) {
      update_topological_nr(next_edge);
    }
  }
​
  const Edge& next_edge(size_t index) const noexcept {
    return next_edges_[index];
  }
​
  const edge_list& next_edges() const noexcept {
    return next_edges_;
  }
​
  edge_list& next_edges() noexcept {
    return next_edges_;
  }
​
  uint32_t num_outputs() const noexcept {
    return next_edges_.size();
  }

There are also some edge-related functions in torch/csrc/jit/runtime/graph_executor.cpp.

void addOutputForTensor(const at::Tensor& tensor) {
  auto v = Variable(tensor);
  add_next_edge(
      v.defined() ? torch::autograd::impl::gradient_edge(v)
                  : autograd::Edge{});
}

void addOutputForIValue(const IValue& value) {
  if (value.isTensorList()) {
    for (const at::Tensor tensor : value.toTensorList()) {
      addOutputForTensor(tensor);
    }
  } else if (value.isTensor()) {
    addOutputForTensor(value.toTensor());
  } else {
    // We could have None passed here via `Optional[Tensor]`
    add_next_edge(autograd::Edge{});
  }
}

The gradient_edge function used above builds an Edge from a Variable. Its code is as follows:

Edge gradient_edge(const Variable& self) {
  // If grad_fn is null (as is the case for a leaf node), we instead
  // interpret the gradient function to be a gradient accumulator, which will
  // accumulate its inputs into the grad property of the variable. These
  // nodes get suppressed in some situations, see "suppress gradient
  // accumulation" below. Note that only variables which have `requires_grad =
  // True` can have gradient accumulators.
  if (const auto& gradient = self.grad_fn()) {
    // This is an intermediate node. self.output_nr() specifies which input of
    // `function` this Edge is: the nth output in forward propagation is the
    // nth input in back propagation.
    return Edge(gradient, self.output_nr());
  } else {
    // This is a leaf node; the function is an AccumulateGrad, and 0 means this
    // Edge is the first input of `function`.
    return Edge(grad_accumulator(self), 0);
  }
}
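To see output_nr at work from Python, here is an illustrative sketch using unbind, a single forward op with several outputs (exact node names vary between PyTorch versions):

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
a, b, c = x.unbind()              # one forward op (unbind) with three outputs

# b is output 1 of the unbind node, so gradient_edge(b) is (UnbindBackward0, 1):
# the 2nd output in forward propagation becomes the 2nd input in back propagation
y = b.sum()
print(y.grad_fn.next_functions)   # ((<UnbindBackward0 ...>, 1),)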

3.1.3 Python extension

Let’s move on to Python extensions. In general, one does not call C functions from Python directly. Instead, one writes a C module and then wraps it so that Python can call it. The process goes roughly like this:

  1. Include the Python.h header file in the C code.
  2. Write a wrapper function that handles arguments passed in from the Python world.
  3. Implement the functional logic in C.
  4. Wrap the C return values as Python objects.
  5. Register the required functions in the PyMethodDef structure.
  6. Register the module name in the initialization method.
  7. Compile the C source files into a library that Python can load.

PyMethodDef is defined as follows:


typedef PyObject *(*PyCFunction)(PyObject *, PyObject *);

struct PyMethodDef {
    const char  *ml_name;   /* The name of the built-in function/method */
    PyCFunction ml_meth;    /* The C function that implements it */
    int         ml_flags;   /* Combination of METH_xxx flags, which mostly
                               describe the args expected by the C func */
    const char  *ml_doc;    /* The __doc__ attribute, or NULL */
};
typedef struct PyMethodDef PyMethodDef;
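PyTorch's own torch._C is exactly such a compiled extension module, and the engine's methods are registered through a PyMethodDef table (shown later when we look at THPEngine_methods). A quick look from Python, assuming a standard PyTorch build:

import torch

# torch._C is a compiled C/C++ extension module
print(torch._C.__file__)   # e.g. .../torch/_C.cpython-39-x86_64-linux-gnu.so

# run_backward is a built-in method registered through a PyMethodDef table
print(torch._C._ImperativeEngine.run_backward)
# <method 'run_backward' of 'torch._C._EngineBase' objects>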

3.2 Initialization

3.2.1 initModule

In torch/csrc/Module.cpp, initModule performs the C++ world initialization. It is a huge function; for the purposes of this article we focus only on THPFunction_initModule and THPEngine_initModule, omitting much of the code.

PyObject* initModule() {
  ......
  ASSERT_TRUE(THPFunction_initModule(module));
  ASSERT_TRUE(THPEngine_initModule(module));
  ......
}
3.2.1.1 Initializing the Inheritance System

During initialization, THPFunction_initModule(module) creates torch._C._FunctionBase.

bool THPFunction_initModule(PyObject *module)
{
  if (PyType_Ready(&THPFunctionType) < 0)
    return false;
  Py_INCREF(&THPFunctionType);
  // Creates `torch._C._FunctionBase`
  PyModule_AddObject(module, "_FunctionBase", (PyObject *)&THPFunctionType);
  return true;
}

In torch/autograd/function.py, the following two classes use torch._C._FunctionBase as a base class:

class Function(with_metaclass(FunctionMeta, _C._FunctionBase, _ContextMethodMixin, _HookMixin))
class BackwardCFunction(_C._FunctionBase, _ContextMethodMixin, _HookMixin)
3.2.1.2 Initializing the Engine

THPEngine_initModule(module) creates the torch._C._EngineBase class, which is responsible for preprocessing the dynamic graph before execution. _EngineBase preprocesses requests such as backward in torch.autograd and sends them to the real Engine for execution.

PyObject* initModule() {
  ......
  ASSERT_TRUE(THPVariable_initModule(module));
  ASSERT_TRUE(THPFunction_initModule(module));
  ASSERT_TRUE(THPEngine_initModule(module)); // Initialize the engine
  ......
}

THPEngine_initModule registers the THPEngineType object into the module (a PyObject) via PyModule_AddObject and names it _ImperativeEngine. This is the _ImperativeEngine used in the Python world.

bool THPEngine_initModule(PyObject *module)
{
#ifndef _WIN32
  if (pthread_atfork(nullptr, nullptr, child_atfork) != 0) {
    throw std::runtime_error("unable to set pthread_atfork handler");
  }
#endif
  if (PyType_Ready(&THPEngineType) < 0)
    return false;
  Py_INCREF(&THPEngineType);
  // Register _ImperativeEngine for the Python world
  PyModule_AddObject(module, "_ImperativeEngine", (PyObject *)&THPEngineType);
  set_default_engine_stub(python::PythonEngine::get_python_engine);
  return true;
}

THPEngineType is defined as follows. As you can see, the type name of the generated instance is "torch._C._EngineBase".

PyTypeObject THPEngineType = {
  PyVarObject_HEAD_INIT(nullptr, 0)
  "torch._C._EngineBase",                      /* tp_name */
  sizeof(THPEngine),                           /* tp_basicsize */
  0,                                           /* tp_itemsize */
  nullptr,                                     /* tp_dealloc */
  0,                                           /* tp_vectorcall_offset */
  nullptr,                                     /* tp_getattr */
  nullptr,                                     /* tp_setattr */
  nullptr,                                     /* tp_reserved */
  nullptr,                                     /* tp_repr */
  nullptr,                                     /* tp_as_number */
  nullptr,                                     /* tp_as_sequence */
  nullptr,                                     /* tp_as_mapping */
  nullptr,                                     /* tp_hash  */
  nullptr,                                     /* tp_call */
  nullptr,                                     /* tp_str */
  nullptr,                                     /* tp_getattro */
  nullptr,                                     /* tp_setattro */
  nullptr,                                     /* tp_as_buffer */
  Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,    /* tp_flags */
  nullptr,                                     /* tp_doc */
  nullptr,                                     /* tp_traverse */
  nullptr,                                     /* tp_clear */
  nullptr,                                     /* tp_richcompare */
  0,                                           /* tp_weaklistoffset */
  nullptr,                                     /* tp_iter */
  nullptr,                                     /* tp_iternext */
  THPEngine_methods,                           /* tp_methods */
  nullptr,                                     /* tp_members */
  nullptr,                                     /* tp_getset */
  nullptr,                                     /* tp_base */
  nullptr,                                     /* tp_dict */
  nullptr,                                     /* tp_descr_get */
  nullptr,                                     /* tp_descr_set */
  0,                                           /* tp_dictoffset */
  nullptr,                                     /* tp_init */
  nullptr,                                     /* tp_alloc */
  THPEngine_new                                /* tp_new */
};
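We can confirm this naming from Python (a quick check, assuming a standard PyTorch build): the class is registered in torch._C under the name _ImperativeEngine, but its tp_name, and therefore what type() reports, is torch._C._EngineBase:

import torch
from torch.autograd.variable import Variable

print(torch._C._ImperativeEngine)        # <class 'torch._C._EngineBase'>
print(type(Variable._execution_engine))  # <class 'torch._C._EngineBase'>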

3.2.2 Connecting with the Python World

Now that the C++ engine is linked to the Python engine, let's look at the engine's concrete functions.

For torch._C._EngineBase, the member functions are given by THPEngine_methods. THPEngine_methods is of type PyMethodDef, which we introduced earlier for Python extensions. run_backward, queue_callback, and is_checkpoint_valid are defined here. Recall that run_backward is the entry point called from the Python world.

static struct PyMethodDef THPEngine_methods[] = {
  {(char*)"run_backward", // Corresponds to run_backward in the Python world
    castPyCFunctionWithKeywords(THPEngine_run_backward),
    METH_VARARGS | METH_KEYWORDS, nullptr},
  {(char*)"queue_callback", THPEngine_queue_callback, METH_O, nullptr},
  {(char*)"is_checkpoint_valid", THPEngine_is_checkpoint_valid, METH_NOARGS, nullptr},
  {nullptr}
};

As defined in the PyMethodDef above, "run_backward" is the method name and THPEngine_run_backward is the corresponding C function. Thus, Variable._execution_engine.run_backward in the Python world corresponds to THPEngine_run_backward.

Python                                    |   C++
                                          |
backward                                  |   initModule
   +                                      |      +
   |                                      |      |
   |                                      |      v
   |                                      |   THPEngine_initModule
   v                                      |      +
Variable._execution_engine.run_backward   |      |
   +                                      |      v
   |                                      |   PyModule_AddObject(module, "_ImperativeEngine", &THPEngineType)
   |                                      |      +
   |                                      |      |
   |                                      |      v
   |                                      |  +------------------------------------------------------+
   |                                      |  | module                                               |
   |   +----------------------+           |  |  +------------------------------------------------+  |
   |   | _ImperativeEngine    |           |  |  | _ImperativeEngine                              |  |
   +-> |                      |           |  |  |  +------------------------------------------+  |  |
       |                      |           |  |  |  | THPEngine_methods                        |  |  |
       |  run_backward +--------------------------> "run_backward" : THPEngine_run_backward  |  |  |
       |                      |           |  |  |  +------------------------------------------+  |  |
       +----------------------+           |  |  +------------------------------------------------+  |
                                          |  +------------------------------------------------------+


We then analyze THPEngine_run_backward in the C++ world.

3.3 C++ Engine Entry Point

THPEngine_run_backward is the entry point of the C++ engine; it is located in torch/csrc/autograd/python_engine.cpp.

The main logic is as follows:

  • First, the input arguments are parsed via the PyArg_ParseTupleAndKeywords function and assigned to newly defined variables:

    • The new variables are tensors, grad_tensors, keep_graph, create_graph, inputs, and allow_unreachable. inputs, for example, is a vector.
    • The Python-world call is torch.autograd.backward(tensors, grad_tensors); those parameters are converted to the tensors and grad_tensors variables in the C++ world respectively. Both are of type PyObject* in C++. PyObject is the base class of every Python object; in this method, tensors and grad_tensors are actually instances of the THPVariable class.
  • The input and gradient tensors are obtained from the arguments; this mainly checks that the variable types and tuple sizes of tensors and grad_tensors are consistent.

  • Three variables are constructed from the inputs: edge_list roots, edge_list output_edges, and variable_list grads, which respectively represent the starting points of back propagation, the output edge information, and the gradients of the model's final outputs.

    • roots is the vector of gradient_edge() results for the forward-propagation output nodes, i.e. (grad_fn_, 0) pairs. Note that grad_fn_ is an instance of a Node subclass, so roots holds Nodes.
    • grads are the gradients produced by forward propagation; if not configured, they are initialized to (tensor(1.),).
    • output_edges are the backward-propagation output edges constructed from the forward-propagation input nodes in inputs.
  • Finally, outputs = engine.execute(roots, grads, keep_graph, create_graph, output_edges) is called to formally enter the back propagation engine.
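Before the code, a small Python-level illustration (a sketch, not from the original) of the accumulate_grad / backward_api_called distinction mentioned above: autograd.backward accumulates gradients into leaf .grad, while autograd.grad builds output_edges from inputs and returns the gradients instead:

import torch

x = torch.ones(2, requires_grad=True)

# autograd.backward: accumulate_grad=True, gradients land in x.grad
torch.autograd.backward((x * 2).sum())
print(x.grad)                               # tensor([2., 2.])

# autograd.grad: accumulate_grad=False; output_edges are built from `inputs`
# and the gradients are returned instead of accumulated
g, = torch.autograd.grad((x * 2).sum(), x)
print(g)                                    # tensor([2., 2.])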

The specific code is as follows:

// Implementation of torch._C._EngineBase.run_backward
PyObject *THPEngine_run_backward(PyObject *self, PyObject *args, PyObject *kwargs)
{
  HANDLE_TH_ERRORS
  PyObject *tensors = nullptr;
  PyObject *grad_tensors = nullptr;
  unsigned char keep_graph = 0;
  unsigned char create_graph = 0;
  PyObject *inputs = nullptr;
  unsigned char allow_unreachable = 0;
  unsigned char accumulate_grad = 0; // Indicate whether to accumulate grad into leaf Tensors or capture
  const char *accepted_kwargs[] = { // NOLINT
      "tensors", "grad_tensors", "keep_graph", "create_graph", "inputs",
      "allow_unreachable", "accumulate_grad", nullptr
  };

  // Parse the input arguments and assign them to the newly defined variables
  // tensors, grad_tensors, etc.; inputs, for example, is a vector
  if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OObb|Obb", (char**)accepted_kwargs,
        &tensors, &grad_tensors, &keep_graph, &create_graph, &inputs,
        &allow_unreachable, &accumulate_grad))
    return nullptr;

  // Check that tensors and grad_tensors have consistent variable types and tuple sizes
  Py_ssize_t num_tensors = PyTuple_GET_SIZE(tensors);
  Py_ssize_t num_gradients = PyTuple_GET_SIZE(grad_tensors);
  THPUtils_assert(num_tensors == num_gradients, "got %ld tensors and %ld "
                  "gradients", num_tensors, num_gradients);

  // The user either called autograd.backward(...) or autograd.grad(...) to get here
  bool backward_api_called = accumulate_grad;

  // using variable_list = std::vector<Variable>;
  // using edge_list = std::vector<Edge>;
  edge_list roots; // The starting points (root nodes) of back propagation
  roots.reserve(num_tensors);
  variable_list grads; // The gradients
  grads.reserve(num_tensors);

  // Configure roots and grads
  for (int i = 0; i < num_tensors; i++) {
    // tensors are the output nodes of the forward graph
    PyObject *_tensor = PyTuple_GET_ITEM(tensors, i);
    THPUtils_assert(THPVariable_Check(_tensor), "element %d of tensors "
                    "tuple is not a Tensor", i);
    const auto& variable = THPVariable_Unpack(_tensor);
    // gradient_edge = Edge(grad_fn(), output_nr())
    auto gradient_edge = torch::autograd::impl::gradient_edge(variable);
    roots.push_back(std::move(gradient_edge)); // Add the Edge

    PyObject *grad = PyTuple_GET_ITEM(grad_tensors, i);
    if (THPVariable_Check(grad)) {
      const Variable& grad_var = THPVariable_Unpack(grad);
      if (grad_var.has_names()) {
        TORCH_WARN(
            "Autograd was passed a named grad tensor with dims ", grad_var.names(),
            ". Autograd does not yet support named tensor semantics, so all names ",
            "will be ignored. In practice all computed gradients will still be correct "
            "according to regular tensor semantics.");
      }
      grads.push_back(grad_var);
    }
  }

  std::vector<Edge> output_edges;
  if (inputs != nullptr) {
    int num_inputs = PyTuple_GET_SIZE(inputs);
    output_edges.reserve(num_inputs);
    // Iterate over the input list
    for (int i = 0; i < num_inputs; ++i) {
      PyObject *input = PyTuple_GET_ITEM(inputs, i);
      const auto& tensor = THPVariable_Unpack(input);
      const auto output_nr = tensor.output_nr();
      auto grad_fn = tensor.grad_fn();
      if (!grad_fn) {
        // Get the grad_accumulator, which tells us whether this is a leaf node
        grad_fn = torch::autograd::impl::try_get_grad_accumulator(tensor);
      }
      if (!grad_fn) {
        // NOTE [ Autograd Unreachable Input ]
        // Since input has no grad_accumulator, its guaranteed to be unreachable.
        // We initialize an edge pointing to a non-nullptr Node so nodes in the graph
        // (e.g., mul when an operand is scalar) that have edges pointing to nullptr
        // don't get erroneously assigned `needed = True` in exec_info.
        output_edges.emplace_back(std::make_shared<Identity>(), 0);
      } else {
        // This is an intermediate node
        output_edges.emplace_back(grad_fn, output_nr);
      }
    }
  }

  // At this point:
  // - roots is a vector of (grad_fn_, 0) built from the forward-propagation output nodes
  // - grads are the gradients produced by forward propagation; if not configured, (tensor(1.),)
  // - output_edges are the backward output edges built from `inputs`, the forward-propagation input nodes
  variable_list outputs;
  {
    pybind11::gil_scoped_release no_gil;
    auto& engine = python::PythonEngine::get_python_engine();
    // Formally enter the engine for execution
    outputs = engine.execute(roots, grads, keep_graph, create_graph, accumulate_grad, output_edges);
  }

  if (!backward_api_called && inputs != nullptr) {
    int num_inputs = PyTuple_GET_SIZE(inputs);
    THPObjectPtr py_outputs {PyTuple_New(num_inputs)};
    if (!py_outputs) return nullptr;
    for (int i = 0; i < num_inputs; i++) {
      PyTuple_SET_ITEM(py_outputs.get(), i, THPVariable_Wrap(outputs[i]));
    }
    return py_outputs.release();
  } else {
    Py_RETURN_NONE;
  }
  END_HANDLE_TH_ERRORS
}

Let’s next examine a few helper functions used with THPEngine_run_backward.

3.3.1 try_get_grad_accumulator

In the code above, grad_fn = torch::autograd::impl::try_get_grad_accumulator(tensor) obtains the method for computing the gradient. Only leaf nodes have a non-empty grad_accumulator_.

try_get_grad_accumulator returns grad_accumulator_, a std::weak_ptr<Node> pointing to a Node object, which is the method for computing the gradient.

The specific logic is:

  • First, get_autograd_meta returns an AutogradMeta structure.
  • Then the member variable grad_accumulator_ of that structure is accessed; grad_accumulator_ is a std::weak_ptr pointing to a Node object.
  • Finally, lock() creates a std::shared_ptr to manage the object.
std::shared_ptr<Node> try_get_grad_accumulator(const Variable& self) {
  if (get_autograd_meta(self)) {
    return get_autograd_meta(self)->grad_accumulator_.lock();
  } else {
    return nullptr;
  }
}
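A quick Python-side view of the leaf / intermediate distinction this helper deals with (an illustrative sketch):

import torch

x = torch.ones(3, requires_grad=True)   # leaf: grad_fn is None, gradient goes to x.grad
y = x * 2                               # intermediate: grad_fn is MulBackward0

print(x.is_leaf, x.grad_fn)             # True None
print(y.is_leaf, y.grad_fn)             # False <MulBackward0 object at ...>

# The leaf's AccumulateGrad node shows up among y's backward edges:
print(y.grad_fn.next_functions)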

3.3.2 gradient_edge

In the code above gradient_edge is used to build an Edge from the input tensor.

auto gradient_edge = torch::autograd::impl::gradient_edge(variable);
roots.push_back(std::move(gradient_edge)); // Add the Edge to roots

gradient_edge is as follows:

Edge gradient_edge(const Variable& self) {
  // If grad_fn is null (as is the case for a leaf node), we instead
  // interpret the gradient function to be a gradient accumulator, which will
  // accumulate its inputs into the grad property of the variable. These
  // nodes get suppressed in some situations, see "suppress gradient
  // accumulation" below. Note that only variables which have `requires_grad =
  // True` can have gradient accumulators.
  if (const auto& gradient = self.grad_fn()) {
    return Edge(gradient, self.output_nr());
  } else {
    return Edge(grad_accumulator(self), 0);
  }
}

3.3.3 output_edges

In the code above, std::vector<Edge> output_edges builds the list of output edges.

If the tensor has no grad_fn of its own, we try to obtain its grad_accumulator_ and assign it to grad_fn; this also tells us whether it is a leaf node. Leaf nodes and intermediate nodes are then constructed respectively and placed into output_edges.

if (!grad_fn) {
  // NOTE [ Autograd Unreachable Input ]
  // Since input has no grad_accumulator, its guaranteed to be unreachable.
  // We initialize an edge pointing to a non-nullptr Node so nodes in the graph
  // (e.g., mul when an operand is scalar) that have edges pointing to nullptr
  // don't get erroneously assigned `needed = True` in exec_info.
  output_edges.emplace_back(std::make_shared<Identity>(), 0); // Leaf node with no accumulator yet
} else {
  output_edges.emplace_back(grad_fn, output_nr); // Non-leaf node
}
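The Identity placeholder branch corresponds to inputs that are unreachable from the roots. From Python this surfaces via allow_unused (an illustrative sketch):

import torch

x = torch.ones(2, requires_grad=True)
z = torch.ones(2, requires_grad=True)   # never used in the graph
y = (x * 2).sum()

# z has neither a grad_fn nor a live grad_accumulator reachable from y, so its
# output edge falls into the Identity branch above; with allow_unused=True,
# grad simply returns None for it
gx, gz = torch.autograd.grad(y, (x, z), allow_unused=True)
print(gx, gz)                           # tensor([2., 2.]) None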

Let’s look at the variables grad_fn and output_nr that make up the output_edges and see where they come from.

grad_fn is a std::shared_ptr pointing to a Node object, obtained from the try_get_grad_accumulator method. It is how the gradient is computed.

output_nr is set as follows; its value comes from uint32_t output_nr_, a member variable of the AutogradMeta structure.

const auto output_nr = tensor.output_nr();
Copy the code

The emplace_back() function adds an element to the container; the element is constructed in place, with no copy or move operation involved.

Recall the definition of Edge. As you can see, emplace_back() uses these inputs to generate an Edge.

/// Represents a particular input of a function.
struct Edge {
  Edge() noexcept : function(nullptr), input_nr(0) {}

  Edge(std::shared_ptr<Node> function_, uint32_t input_nr_) noexcept
      : function(std::move(function_)), input_nr(input_nr_) {}

  /// The function this `Edge` points to.
  std::shared_ptr<Node> function;

  /// The identifier of a particular input to the function.
  uint32_t input_nr; // Specifies which input of `function` this Edge is during back propagation
};

You can see how the input is converted from Python to the C++ engine; take the following variables as examples:

  • Python's tensors is converted to C++'s roots.
  • Python's grad_tensors is converted to C++'s grads.
  • Python's inputs is converted to C++'s output_edges.
  • Finally, these three variables are passed into the engine: PythonEngine.execute(roots, grads, keep_graph, create_graph, accumulate_grad, output_edges).
backward(tensors, grad_tensors, inputs)
    +           +           +
    |           |           |                                                  Python
    |           |           |
+------------------------------------------------------------------------------------------+
    |           |           |                                                  C++
    |           |           |        THPEngine_run_backward
    |           |           |
    |           |           +---------------------------------+
    |           |                                             |
    |           +------------------------+                    |
    v                                    |                    |
roots = [(tensor_1.grad_fn_, 0), ..., (tensor_n.grad_fn_, 0)] |
                                         |                    |
                                         v                    |
grads = [grad_tensor_1, ..., grad_tensor_n]                   |
                                                              v
output_edges = [(input_1.grad_fn_, output_nr_1), ..., (input_n.grad_fn_, output_nr_n)]
    +                       +                       +
    |                       |                       |
    v                       v                       v
PythonEngine.execute(roots, grads, keep_graph, create_graph, accumulate_grad, output_edges)

3.4 PythonEngine

The relevant code in THPEngine_run_backward is shown below; we can see that it ultimately calls PythonEngine's processing logic.

auto& engine = python::PythonEngine::get_python_engine();
outputs = engine.execute(roots, grads, keep_graph, create_graph, accumulate_grad, output_edges);

3.4.1 Obtaining an Engine

get_python_engine defines a static variable here. The whole PyTorch program maintains only one engine instance globally, this PythonEngine instance.

Engine& PythonEngine::get_python_engine() {
  static PythonEngine engine;
  // This is "probably" thread-safe because the flag is set in a fork handler
  // before any threads are created, and this function is only called with the
  // GIL held. However, using fork + threads is playing with fire so this is
  // more of a "best effort" thing. For example, if the fork occurs while the
  // backwards threads hold a lock, we'll probably deadlock in the engine
  // destructor.
  if (_reinitialize_engine) {
    engine.release_workers();
    engine.~PythonEngine();
    new (&engine) torch::autograd::python::PythonEngine();
    _reinitialize_engine = false;
  }
  return engine;
}

3.4.2 Definition

Let's look at the PythonEngine definition. PythonEngine is a derived class of Engine that wraps it: the subclass overrides the parent's execute function to translate C++ exceptions into Python exceptions, while the Engine base class does the core work:

struct PythonEngine : public Engine {
  static Engine& get_python_engine();
  ~PythonEngine() override;
  void thread_init(int device,
      const std::shared_ptr<ReadyQueue>& ready_queue,
      bool should_increment) override;
  void thread_on_exception(
      std::shared_ptr<GraphTask> graph_task,
      const std::shared_ptr<Node>& fn,
      std::exception& e) override;
  variable_list execute(
      const edge_list& roots,
      const variable_list& inputs,
      bool keep_graph,
      bool create_graph,
      bool accumulate_grad,
      const edge_list& outputs = {}) override;
​
  std::shared_ptr<at::ivalue::Future> execute_with_graph_task(
      const std::shared_ptr<GraphTask>& graph_task,
      std::shared_ptr<Node> graph_root,
      InputBuffer&& input_buffer) override;
​
  std::unique_ptr<AnomalyMetadata> make_anomaly_metadata() override;
  private:
    PythonEngine();
};

The execute code looks like this, so we’ll start by looking at how Engine works.

variable_list PythonEngine::execute(
    const edge_list& roots,
    const variable_list& inputs,
    bool keep_graph,
    bool create_graph,
    bool accumulate_grad,
    const edge_list& outputs) {
  try {
    return Engine::execute(roots, inputs, keep_graph, create_graph, accumulate_grad, outputs);
  } catch (python_error& e) {
    e.restore();
    throw;
  }
}
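The try/catch restores Python exceptions (python_error) raised inside the engine; together with the HANDLE_TH_ERRORS machinery at the entry point, errors from the C++ engine surface in Python as RuntimeError. An illustrative sketch:

import torch

x = torch.ones(2, requires_grad=True)
y = (x * 2).sum()
y.backward()        # first pass frees the graph (retain_graph defaults to False)

try:
    y.backward()    # the C++ engine raises; it reaches Python as a RuntimeError
except RuntimeError as e:
    print(e)        # "Trying to backward through the graph a second time ..."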

The current logic expansion is as follows:

backward(tensors, grad_tensors, inputs)
    +           +           +
    |           |           |                                                  Python
    |           |           |
+------------------------------------------------------------------------------------------+
    |           |           |                                                  C++
    |           |           |        THPEngine_run_backward
    |           |           |
    |           |           +---------------------------------+
    |           |                                             |
    |           +------------------------+                    |
    v                                    |                    |
roots = [(tensor_1.grad_fn_, 0), ..., (tensor_n.grad_fn_, 0)] |
                                         |                    |
                                         v                    |
grads = [grad_tensor_1, ..., grad_tensor_n]                   |
                                                              v
output_edges = [(input_1.grad_fn_, output_nr_1), ..., (input_n.grad_fn_, output_nr_n)]
    +                       +                       +
    |                       |                       |
    v                       v                       v
PythonEngine.execute(roots, grads, keep_graph, create_graph, accumulate_grad, output_edges)
    +          +          +          +
    |          |          |          |
    v          v          v          v
Engine::execute(roots, inputs, keep_graph, create_graph, accumulate_grad, outputs)


3.5 Another Call Path

Finally, let's examine another run_backward.

This run_backward is located in torch/csrc/autograd/autograd.cpp. It serves direct calls from the C++ world, as opposed to the roundabout route from Python.

void backward(
    const variable_list& tensors,
    const variable_list& grad_tensors,
    c10::optional<bool> retain_graph,
    bool create_graph,
    const variable_list& inputs) {
  variable_list gradients = _make_grads(tensors, grad_tensors);
  if (!retain_graph) {
    retain_graph = create_graph;
  }
  run_backward(tensors, gradients, retain_graph.value(), create_graph, inputs,
               /*allow_unused=*/true, /*accumulate_grad=*/true);
}

variable_list grad(
    const variable_list& outputs,
    const variable_list& inputs,
    const variable_list& grad_outputs,
    c10::optional<bool> retain_graph,
    bool create_graph,
    bool allow_unused) {
  variable_list gradients = _make_grads(outputs, grad_outputs);
  if (!retain_graph) {
    retain_graph = create_graph;
  }
  return run_backward(
      outputs, gradients, retain_graph.value(), create_graph, inputs,
      allow_unused, /*accumulate_grad=*/false);
}

run_backward also ends up calling Engine::get_default_engine().execute.

variable_list run_backward(
    const variable_list& outputs,
    const variable_list& grad_outputs,
    bool keep_graph,
    bool create_graph,
    const variable_list& inputs,
    bool allow_unused,
    bool accumulate_grad) {
  size_t num_tensors = outputs.size();
  edge_list roots;
  roots.reserve(num_tensors);
  for (size_t i = 0; i < num_tensors; i++) {
    const Variable& output = outputs[i];
    auto gradient_edge = impl::gradient_edge(output);
    roots.push_back(std::move(gradient_edge));
  }

  edge_list output_edges;
  if (!inputs.empty()) {
    size_t num_inputs = inputs.size();
    output_edges.reserve(num_inputs);
    for (size_t i = 0; i < num_inputs; ++i) {
      const Variable& input = inputs[i];
      const auto output_nr = input.output_nr();
      auto grad_fn = input.grad_fn();
      if (!grad_fn) {
        grad_fn = impl::try_get_grad_accumulator(input);
      }
      if (!grad_fn) {
        // See NOTE [ Autograd Unreachable Input ] for details
        output_edges.emplace_back(std::make_shared<Identity>(), 0);
      } else {
        output_edges.emplace_back(grad_fn, output_nr);
      }
    }
  }

  // Call the default engine with roots, grads, and output_edges
  variable_list grad_inputs = Engine::get_default_engine().execute(
      roots, grad_outputs, keep_graph, create_graph, accumulate_grad, output_edges);
  // check if grad_inputs contains None or not base on the allow_unused flag
  if (!inputs.empty() && !allow_unused) {
    size_t num_inputs = inputs.size();
    for (size_t i = 0; i < num_inputs; ++i) {
      TORCH_CHECK(
          grad_inputs[i].defined(),
          "One of the "
          "differentiated Tensors appears to not have been used "
          "in the graph. Set allow_unused=True if this is the "
          "desired behavior.");
    }
  }
  return grad_inputs;
}

At this point, the analysis of the invocation process is complete. Its core is to call the engine for processing, so next we will begin to analyze the engine itself.

0xEE Personal information

★★★★ Thoughts on life and technology ★★★★★

Wechat official account: Rosie’s Thoughts

0xFF Reference

Write Python modules in C