Translated from: Deep Learning From Scratch I: Computational Graphs

Translator: Sun Yimeng

Reviewer: Kaiser


This is the first chapter in a series of tutorials that will introduce you to the mathematical and algorithmic foundations of deep neural networks. Modeling our API after TensorFlow, we will then implement our own neural network library in Python.

  • Chapter 1: Computational Graphs
  • Chapter 2: Perceptrons
  • Chapter 3: Training Criteria
  • Chapter 4: Gradient Descent and Backpropagation
  • Chapter 5: Multi-Layer Perceptrons
  • Chapter 6: TensorFlow

No background in machine learning or neural networks is required to follow this chapter. However, you should be familiar with undergraduate-level calculus, linear algebra, basic algorithms, and probability. If you get stuck, feel free to ask in the comments.

By the end of this chapter, you will have an in-depth understanding of the mathematics behind neural networks and of the role that deep learning libraries play behind the scenes.

I will keep the code as simple as possible: it is meant to be easy to understand, not to run efficiently. Since our API is modeled after TensorFlow, by the end of this chapter you will also know how to use the TensorFlow API and how it works under the hood (without having to spend time learning the most versatile and efficient implementation).




Computational Graphs

We start with the theory of computational graphs, since neural networks themselves are a special form of computational graph.

A computational graph is a directed graph in which the nodes correspond to operations or variables.

Variables can feed their values into operations, and operations can feed their outputs into other operations. In this way, every node in the graph defines a function of the variables.

The values that flow into and out of the nodes are called tensors, which is just a word for multi-dimensional arrays. Tensors therefore include scalars, vectors, matrices, and higher-dimensional arrays.
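For example, here is a minimal NumPy sketch of tensors of different ranks (later in this chapter we will represent all tensors as NumPy arrays):

import numpy as np

scalar = np.array(3.0)                 # a rank-0 tensor (scalar)
vector = np.array([1.0, 2.0])          # a rank-1 tensor (vector)
matrix = np.array([[1.0, 0.0],
                   [0.0, -1.0]])       # a rank-2 tensor (matrix)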

As a first example, consider a computational graph that computes the sum z of two inputs x and y.

Here, x and y are input nodes of z, and z is a consumer of x and y. z therefore defines a function z : R^2 -> R, where

z(x, y) = x + y

The concept of a computational graph becomes more important as the computations grow more complex. For example, consider a computational graph that defines an affine transformation z = Ax + b: a matmul operation first computes y = Ax, and an add operation then computes z = y + b.
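Before building this graph node by node, here is a plain NumPy sketch of the computation it represents, using the same example values that we will feed into the graph later in this chapter:

import numpy as np

A = np.array([[1, 0], [0, -1]])   # the matrix of the affine transformation
b = np.array([1, 1])              # the offset vector
x = np.array([1, 2])              # the input

z = A.dot(x) + b                  # z = Ax + b
print(z)                          # [ 2 -1]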





Operations

Each Operation has three characteristics:

  • A compute function that computes the operation's output for given inputs
  • A list of input_nodes, which can be variables or other operations
  • A list of consumers: operations that use this operation's output as one of their inputs

Implementation code:

class Operation:
    """Represents a graph node that performs a computation. An `Operation` is a node in a `Graph` that takes zero or more objects as input, and produces zero or more objects as output. """

    def __init__(self, input_nodes=[]):
        """Construct Operation """
        self.input_nodes = input_nodes

        # Initialize list of consumers (i.e. nodes that receive this operation's output as input)
        self.consumers = []

        # Append this operation to the list of consumers of all input nodes
        for input_node in input_nodes:
            input_node.consumers.append(self)

        # Append this operation to the list of operations in the currently active default graph
        _default_graph.operations.append(self)

    def compute(self):
        """Computes the output of this operation. "" Must be implemented by the particular operation. """
        pass


Some simple operations

To get familiar with the Operation class (we will need it later), let's implement two simple operations. For both, we assume that all tensors are NumPy arrays, so element-wise addition and matrix multiplication are already provided for us; we don't need to implement them ourselves.

add

class add(Operation):
    """Returns x + y element-wise. """

    def __init__(self, x, y):
        """Construct add Args: x: First summand node y: Second summand node """
        super().__init__([x, y])

    def compute(self, x_value, y_value):
        """Compute the output of the add operation Args: x_value: First summand value y_value: Second summand value """
        return x_value + y_value


Matrix multiplication

class matmul(Operation):
    """Multiplies matrix a by matrix b, producing a * b. """

    def __init__(self, a, b):
        """Construct matmul Args: a: First matrix b: Second matrix """
        super().__init__([a, b])

    def compute(self, a_value, b_value):
        """Compute the output of the matmul operation Args: a_value: First matrix value b_value: Second matrix value """
        return a_value.dot(b_value)
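As a side note: for the NumPy arrays we use in this chapter, a_value.dot(b_value) computes the same matrix product you would get from NumPy's @ operator (np.matmul).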




Placeholders

Not all nodes in a computational graph are operations. For example, in the affine transformation graph, A, x and b are not operations. Rather, they are inputs to the graph: if we want to compute the graph's output, we have to supply a value for each of them. To provide such values, we introduce placeholders.

class placeholder:
    """Represents a placeholder node that has to be provided with a value when computing the output of a computational graph """

    def __init__(self):
        """Construct placeholder """
        self.consumers = []

        # Append this placeholder to the list of placeholders in the currently active default graph
        _default_graph.placeholders.append(self)




Variables

In the affine transformation graph, there is a fundamental difference between x and the other nodes A and b. While x is an input to an operation, A and b are parameters of the operation, that is, they are intrinsic to the graph itself. We will refer to parameters such as A and b as Variables.

class Variable:
    """Represents a variable (i.e. an intrinsic, changeable parameter of a computational graph). """

    def __init__(self, initial_value=None):
        """Construct Variable Args: initial_value: The initial value of this variable """
        self.value = initial_value
        self.consumers = []

        # Append this variable to the list of variables in the currently active default graph
        _default_graph.variables.append(self)




Graph class

Finally, we need a class that holds all the operations, placeholders, and variables. When creating a new graph, we call its as_default method to make it the default graph, stored in the global _default_graph.

This way, we can create operations, placeholders, and variables without having to pass around a reference to the graph every time.

class Graph:
    """Represents a computational graph """

    def __init__(self):
        """Construct Graph"""
        self.operations = []
        self.placeholders = []
        self.variables = []

    def as_default(self):
        global _default_graph
        _default_graph = self




Example

Now let's use the classes defined above to create the computational graph for our affine transformation:

# Create a new graph
Graph().as_default()

# Create variables
A = Variable([[1, 0], [0, -1]])
b = Variable([1, 1])

# Create placeholder
x = placeholder()

# Create hidden node y
y = matmul(A, x)

# Create output node z
z = add(y, b)
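As a quick sanity check (assuming the snippet above has just been run), we can inspect the node attributes to see that the wiring is as intended:

print(z.input_nodes == [y, b])          # True: the add node consumes y and b
print(x.consumers == [y])               # True: the matmul node consumes the placeholder
print(len(_default_graph.operations))   # 2: the matmul and the add operation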




Computing the output of an operation

Now that we know how to create computational graphs, let's think about how to compute the output of an operation.

Let's create a Session class that encapsulates one execution of an operation. We want to be able to call a run method on a session instance, passing in the operation whose output we want and a dictionary that supplies values for the placeholders:

session = Session()
output = session.run(z, {
    x: [1, 2]
})

Here is how the computation should proceed:

To compute the function represented by an operation, we need to perform the computations in the right order. For example, we cannot compute z before the intermediate value y has been computed. We therefore have to make sure the operations are carried out in an order in which the values of a node's input nodes are computed before the node itself is evaluated. This can be achieved with a post-order tree traversal. For our example graph, a post-order traversal starting at z visits A, x, y, b and finally z itself.

import numpy as np


class Session:
    """Represents a particular execution of a computational graph. """

    def run(self, operation, feed_dict={}):
        """Computes the output of an operation Args: operation: The operation whose output we'd like to compute. feed_dict: A dictionary that maps placeholders to values for this session """

        # Perform a post-order traversal of the graph to bring the nodes into the right order
        nodes_postorder = traverse_postorder(operation)

        # Iterate all nodes to determine their value
        for node in nodes_postorder:

            if type(node) == placeholder:
                # Set the node value to the placeholder value from feed_dict
                node.output = feed_dict[node]
            elif type(node) == Variable:
                # Set the node value to the variable's value attribute
                node.output = node.value
            else:  # Operation
                # Get the input values for this operation from node_values
                node.inputs = [input_node.output for input_node in node.input_nodes]

                # Compute the output of this operation
                node.output = node.compute(*node.inputs)

            # Convert lists to numpy arrays
            if type(node.output) == list:
                node.output = np.array(node.output)

        # Return the requested node value
        return operation.output


def traverse_postorder(operation):
    """Performs a post-order traversal, returning a list of nodes in the order in which they have to be computed Args: operation: The operation to start traversal at """

    nodes_postorder = []

    def recurse(node):
        if isinstance(node, Operation):
            for input_node in node.input_nodes:
                recurse(input_node)
        nodes_postorder.append(node)

    recurse(operation)
    return nodes_postorder
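As a quick check (assuming the example graph from the previous sections is still in scope), the traversal indeed returns the nodes in an order in which every node's inputs come before the node itself:

order = traverse_postorder(z)
print([type(node).__name__ for node in order])
# ['Variable', 'placeholder', 'matmul', 'Variable', 'add']  -> A, x, y, b, z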


Let's test this class on the example from above:

session = Session()
output = session.run(z, {
    x: [1, 2]
})
print(output)
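Working this out by hand: Ax = [[1, 0], [0, -1]] · [1, 2] = [1, -2], and adding b = [1, 1] gives [2, -1]. So the session should print [ 2 -1].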

Not bad at all!

If you have any questions, please feel free to comment.



