By Jacob Buckman

Compiled by: Bot

Editor’s note: In the summer of 2017, Jacob Buckman, then a master’s student in computer science at CMU, was selected for the Google AI Residency, a 12-month training program at Google headquarters focused on NLP and reinforcement learning. Jacob had extensive programming experience and a solid grounding in machine learning, and although he had never worked with Tensorflow, he assumed that, given his background, it would be an easy tool to master. Unfortunately, reality hit him in the face…


In the three years since its release, Tensorflow has become a cornerstone of the deep learning ecosystem, yet it is less intuitive for beginners than libraries like PyTorch and DyNet, which are built around dynamic, define-by-run graphs.

From linear regression to MNIST classification to machine translation, Tensorflow’s tutorials are a great resource for getting a first project off the ground and getting a foot in the door of machine learning. But for developers who want to break new ground in areas machine learning has not yet explored, Tensorflow can be daunting.

The purpose of this article is to fill that gap. Rather than focusing on a specific task, it takes a more general approach and explains the fundamental abstractions underpinning Tensorflow. With these concepts in hand, developers can use Tensorflow for deep learning research far more intuitively.

Note: This tutorial is intended for developers who have some experience with programming and machine learning and need to get up to speed with Tensorflow.

Tensorflow is not a normal Python library

Most Python libraries are, in effect, extensions of Python. When you import one, you get a set of variables, functions, and classes that act as a “toolbox” of code for the developer’s real-world needs. Tensorflow does not work that way, and approaching it by asking “how do I interact with this code?” is fundamentally misguided.

When it comes to the relationship between Python and Tensorflow, a rough analogy is Javascript and HTML. Javascript is a general-purpose programming language with which we can do all sorts of things. HTML, by contrast, is a framework for representing a particular kind of abstract computation (namely, describing what should be rendered on a web page). The job of Javascript in a web page is to assemble the HTML objects the user sees when the page loads, and to replace old HTML objects with new ones as the page changes.

Like HTML, Tensorflow is a framework for representing abstract computations. When we drive Tensorflow from Python, the first thing our code does is assemble a computational graph, and the second thing it does is interact with that graph (via Tensorflow sessions). But the graph does not live inside our variables; it lives in the global namespace. As Shakespeare might have put it: all the RAM’s a stage, and all the variables are merely pointers.


The first key concept: computational graphs

As you browse the Tensorflow documentation, you’ll find plenty of references to “graphs” and “nodes.” If you read carefully, you may even have found the Graphs and Sessions page, which covers the details of data flow graphs. That page is the basis for the discussion below, but where the official documentation stays technical, we will sacrifice some technical detail to capture the intuition.

So what is a computational graph? Essentially, it is a global data structure: a directed graph that captures the instructions for how values are computed and how data flows between them.

Let’s start with an example:

import tensorflow as tf

Computational graph:


After importing Tensorflow, we have a blank graph: a single, empty global variable. On this basis, let’s perform some actual “Tensorflow operations”:

Code:

import tensorflow as tf
two_node = tf.constant(2)
print two_node

Output:

Tensor("Const:0", shape=(), dtype=int32)

Computational graph:


Here we have a single node that contains the constant 2, created by the function tf.constant. When we print the variable, we can see that it returns a tf.Tensor object, which is a pointer to the node we just created. To verify this, here’s another example:

Code:

import tensorflow as tf
two_node = tf.constant(2)
another_two_node = tf.constant(2)
two_node = tf.constant(2)
tf.constant(3)

Computational graph:


Every time we call tf.constant, a new node is created in the graph. This is true even when the call is functionally identical to an existing node, even when we repeatedly assign the result to the same variable, and even when we don’t assign the result to a variable at all.

Conversely, if we create a new variable and set it equal to an existing node, we simply copy the pointer to that node, and no new node appears in the graph:

Code:

import tensorflow as tf
two_node = tf.constant(2)
another_pointer_at_two_node = two_node
two_node = None
print two_node
print another_pointer_at_two_node

Output:

None
Tensor("Const:0", shape=(), dtype=int32)

Computational graph:


Next, let’s try something fun:

Code:

import tensorflow as tf
two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node ## equivalent to tf.add(two_node, three_node)

Computational graph:


The figure above is a genuine computational graph. Note that Tensorflow overloads common mathematical operators: the + above is shorthand for tf.add. It may not look like we added a node, but adding two tensors together does create a new node in the graph.

So two_node points to a node containing 2, three_node points to a node containing 3, and sum_node points to a node containing… +? That seems a little unusual. Why does sum_node contain + rather than 5?

Because the computational graph contains only the steps of the computation, not the results! At least… not yet!

The second key concept: sessions

If there were a prize for the part of Tensorflow responsible for the most bugs and confusion, sessions would win it. Because of their unintuitive naming and universal applicability, nearly every Tensorflow program explicitly invokes tf.Session() at some point.

The purpose of the session is to handle memory allocation and optimization while the program runs, so that we can actually carry out the computations specified by the graph. You can think of the computational graph as a “template” for the computation we want to do: it lays out all the steps in detail. Before running the graph, we first need a session to allocate resources and do the work; when the computation is finished, we should close the session so the system can reclaim those resources and avoid leaks.
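
As an aside, the short examples in this article create sessions and never close them, which is fine for toy scripts. In longer-running programs it is worth releasing those resources explicitly; here is a minimal sketch of the usual patterns (this is just standard tf.Session usage, nothing specific to the examples here):

import tensorflow as tf

two_node = tf.constant(2)

# Option 1: close the session explicitly when we're done with it
sess = tf.Session()
print sess.run(two_node)
sess.close()

# Option 2: a `with` block closes the session automatically, even on errors
with tf.Session() as sess:
    print sess.run(two_node)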

The session holds its own pointer to the global graph, which is constantly updated with pointers to every node in it. That means it doesn’t matter whether you create the session before or after you create your nodes.

After creating the session object, we can return the value of the node with sess.run(node), and Tensorflow performs all the calculations needed to determine the value.

Code:

import tensorflow as tf
two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
sess = tf.Session()
print sess.run(sum_node)

Output:

5

Computational graph:


We can also pass a list, sess.run([node1, node2, …]), and have it return multiple outputs:

Code:

import tensorflow as tf
two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
sess = tf.Session()
print sess.run([two_node, sum_node])

Output:

[2, 5]

Computational graph:


Calls to sess.run() are usually the biggest bottleneck in a Tensorflow program, so the less you use it, the better. Whenever possible, have a single call return multiple results rather than calling it repeatedly, and avoid putting it inside tight inner loops.
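
As a toy illustration of that advice (my own sketch, not a benchmark), compare making three separate calls with fetching everything in one call:

import tensorflow as tf

two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
sess = tf.Session()

# Wasteful: three separate round trips through the session
a = sess.run(two_node)
b = sess.run(three_node)
c = sess.run(sum_node)

# Better: fetch all three values in a single call
a, b, c = sess.run([two_node, three_node, sum_node])
print a, b, c  # 2 3 5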

Placeholders and feed_dict

So far, the computations we’ve done have taken no input, so they always produce the same output. Next we’ll explore something more meaningful: building a computational graph that accepts an input, processes it in some way, and returns an output.

The most direct way to do this is by using Placeholders, which are nodes that accept external input.

Code:

import tensorflow as tf
input_placeholder = tf.placeholder(tf.int32)
sess = tf.Session()
print sess.run(input_placeholder)

Output:

Traceback (most recent call last):
...
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder' with dtype int32
     [[Node: Placeholder = Placeholder[dtype=DT_INT32, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Computational graph:


This is a classic failure case: a placeholder has no value of its own, and because we did not feed it one, Tensorflow raises an error.

To supply a value, we use the feed_dict argument of sess.run():

Code:

import tensorflow as tf
input_placeholder = tf.placeholder(tf.int32)
sess = tf.Session()
print sess.run(input_placeholder, feed_dict={input_placeholder: 2})

Output:

2

Computational graph:


Notice the format of feed_dict: it is a dictionary whose keys are the placeholders in the graph that need values (which, as mentioned earlier, are really pointers to placeholder nodes in the graph) and whose values are typically scalars or Numpy arrays.
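
For instance (a small sketch of my own, assuming a placeholder created without a fixed shape), the same placeholder can be fed either a scalar or a Numpy array:

import numpy as np
import tensorflow as tf

input_placeholder = tf.placeholder(tf.float32)  # no shape given, so anything goes
doubled = input_placeholder * 2.
sess = tf.Session()

# Feeding a scalar
print sess.run(doubled, feed_dict={input_placeholder: 1.5})  # 3.0

# Feeding a Numpy array
print sess.run(doubled, feed_dict={input_placeholder: np.array([1., 2., 3.])})  # [ 2.  4.  6.]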

The third key concept: computational paths

Let’s try another example involving placeholders:

Code:

import tensorflow as tf
input_placeholder = tf.placeholder(tf.int32)
three_node = tf.constant(3)
sum_node = input_placeholder + three_node
sess = tf.Session()
print sess.run(three_node)
print sess.run(sum_node)

Output:

3
Traceback (most recent call last):
...
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder_2' with dtype int32
     [[Node: Placeholder_2 = Placeholder[dtype=DT_INT32, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Computational graph:


Here’s another error. Why does the second sess.run() fail this time, and why doesn’t the first call fail even though we never fed input_placeholder? The answer lies in the third key concept of Tensorflow: computational paths. Fortunately, this one is fairly intuitive.

When we call sess.run() on a node, Tensorflow computes not just that node but every node it depends on. If those nodes have dependencies of their own, we keep working backwards until we reach the “top” of the graph, where no further nodes influence the target node.

The following is the computational path of sum_node:



To compute sum_node, we must evaluate all three nodes, including the placeholder we never fed, which explains the error.

In contrast, the computational path of three_node contains only itself:


Evaluating that single node is enough, so even though input_placeholder has no value, it doesn’t affect sess.run(three_node).

Tensorflow’s design benefits enormously from routing computation along these paths. Imagine a huge graph with many unnecessary nodes: by following only the computational path, we can skip most of those nodes and compute only what is needed, which can save a huge amount of runtime. It also lets us build large “multi-purpose” graphs that share some core nodes but perform different computations along different paths.
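
Here is a toy sketch of that idea (my own example, not from the original article): two “heads” share a core node, and each sess.run() call only computes the nodes on the path it actually needs:

import tensorflow as tf

input_placeholder = tf.placeholder(tf.float32)
shared_node = input_placeholder * 2.   # core node shared by both heads
head_a = shared_node + 1.              # one computational path
head_b = shared_node * shared_node     # a different computational path

sess = tf.Session()
print sess.run(head_a, feed_dict={input_placeholder: 3.})  # 7.0 (head_b is never computed)
print sess.run(head_b, feed_dict={input_placeholder: 3.})  # 36.0 (head_a is never computed)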

Variables and side effects

So far, we’ve worked with two kinds of “no-ancestor” nodes: tf.constant, whose value is the same on every run, and tf.placeholder, whose value changes from run to run. There is a third case to consider: a node that keeps the same value across many runs, but can also be updated to a new value when we want. That is where Variables come in.

Understanding variables is essential for doing deep learning with Tensorflow, because the parameters of a model are basically all variables. During training, we update the parameters with gradient descent; during evaluation, we hold the parameters fixed and feed a large number of different test inputs into the model. So, whenever possible, we want all trainable parameters to be variables.

The way to create a variable is tf.get_variable(). The first two arguments, tf.get_variable(name, shape), are required; the rest are optional. name is a string that identifies the variable object and must be unique, with no duplicate names in the graph. shape is a list of integers corresponding to the shape of a tensor, in order, with one integer per dimension. For example, a 3x8 matrix has shape [3, 8]. To create a scalar, use the empty list [].

Code:

import tensorflow as tf
count_variable = tf.get_variable("count", [])
sess = tf.Session()
print sess.run(count_variable)

Output:

Traceback (most recent call last):
...
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value count
     [[Node: _retval_count_0_0 = _Retval[T=DT_FLOAT, index=0, _device="/job:localhost/replica:0/task:0/device:CPU:0"](count)]]

Computational graph:


Something went wrong again. Why this time? When a variable is first created, its value is essentially “null”, and any attempt to evaluate it raises an error. A variable must be assigned a value before we can evaluate it. There are two ways to do this: using an initializer, or using tf.assign(). Let’s look at tf.assign() first:

Code:

import tensorflow as tf
count_variable = tf.get_variable("count", [])
zero_node = tf.constant(0.)
assign_node = tf.assign(count_variable, zero_node)
sess = tf.Session()
sess.run(assign_node)
print sess.run(count_variable)

Output:

0

Computational graph:


Compared with the nodes we’ve seen so far, tf.assign(target, value) has some unique properties:

  • Identity: tf.assign(target, value) does no interesting computation of its own; its output is always just equal to value.
  • Side effects: when computation flows through assign_node, a side effect happens to other things in the graph. Here, the side effect is to replace the value stored in the count_variable node with the value of the zero_node node.
  • Non-dependent edges: although the count_variable node and the assign_node are connected in the graph, neither depends on the other (shown as a dotted line).

Nodes with side effects underpin most of the Tensorflow deep learning workflow, so it’s important to understand how they work. When we run sess.run(assign_node), the computational path goes through assign_node and zero_node:

Computational graph:


As mentioned earlier, when we compute a target node we also compute everything on its path, and that includes side effects. As shown in green in the figure above, thanks to the particular side effect of tf.assign, the count_variable that used to hold “null” is now permanently set to 0. That means the next time we call sess.run(count_variable), it will print 0 instead of raising an error.
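
To make the persistence of that side effect concrete, here is a small extension of the example above (my own sketch): once an assignment has run, later sess.run() calls on the variable succeed, and further assignments keep updating it:

import tensorflow as tf

count_variable = tf.get_variable("count", [])
assign_node = tf.assign(count_variable, tf.constant(0.))
another_assign_node = tf.assign(count_variable, tf.constant(1.))
sess = tf.Session()

sess.run(assign_node)           # side effect: count_variable now holds 0.
print sess.run(count_variable)  # 0.0, no "uninitialized value" error
sess.run(another_assign_node)   # side effect: count_variable now holds 1.
print sess.run(count_variable)  # 1.0, the new value persists across calls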

Next, let’s look at setting the initial value:

Code:

import tensorflow as tf
const_init_node = tf.constant_initializer(0.)
count_variable = tf.get_variable("count", [], initializer=const_init_node)
sess = tf.Session()
print sess.run([count_variable])

Output:

Traceback (most recent call last):
...
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value count
     [[Node: _retval_count_0_0 = _Retval[T=DT_FLOAT, index=0, _device="/job:localhost/replica:0/task:0/device:CPU:0"](count)]]

Computational graph:


Okay, so why didn’t the initializer do anything here?

The answer lies in the separation between the session and the graph. We pointed the initializer property of get_variable at const_init_node, but that only adds a dashed connection between two nodes in the graph. Nothing has actually been initialized, because the session has not allocated any resources or done any work yet. We need to use the session to make the variable take on the value of const_init_node.

Code:

import tensorflow as tf
const_init_node = tf.constant_initializer(0.)
count_variable = tf.get_variable("count", [], initializer=const_init_node)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
print sess.run(count_variable)

Output:

0.

Computational graph:


To do this, we added another special node: init = tf.global_variables_initializer(). Like tf.assign(), this is a node with side effects, but unlike tf.assign() we don’t need to specify its inputs. tf.global_variables_initializer() inspects the global graph at the moment of its own creation and automatically adds a dependency on every initializer in the graph. When we run sess.run(init), it tells each of those initializers to do its job, initializing all the variables and avoiding the error.

Variable sharing

In practice, you will sometimes come across Tensorflow code that uses variable sharing, which involves creating a scope and setting “reuse=True”. We strongly discourage this. If you want to use a single variable in more than one place, simply keep track of the pointer to that variable’s node programmatically and reuse it wherever it is needed. In other words, call tf.get_variable() exactly once for each parameter you intend to keep in memory.
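
A minimal sketch of what that looks like (my own illustration, using a hypothetical shared embedding matrix): call tf.get_variable() once, keep the Python pointer, and reuse it in both places:

import tensorflow as tf

# One call to tf.get_variable() for the one parameter we want in memory
embedding = tf.get_variable("embedding", [1000, 64])

encoder_ids = tf.placeholder(tf.int32, [None])
decoder_ids = tf.placeholder(tf.int32, [None])

# Reuse the same Python pointer in two different parts of the graph;
# both lookups read (and backpropagate into) the same variable node.
encoder_vectors = tf.nn.embedding_lookup(embedding, encoder_ids)
decoder_vectors = tf.nn.embedding_lookup(embedding, decoder_ids)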

Optimizers

With that out of the way, we can now do real deep learning! The rest of the concepts should be pretty simple for you if you’ve read this far.

In deep learning, a typical training “inner loop” looks like this:

  • Get an input and a true_output
  • Compute a “guess” based on the input and the parameters
  • Compute a loss by comparing the guess to the true_output
  • Update the parameters according to the gradient of the loss

Here is a simple script for the linear regression problem:

Code:

import tensorflow as tf

### build the graph
## first, set up the parameters
m = tf.get_variable("m", [], initializer=tf.constant_initializer(0.))
b = tf.get_variable("b", [], initializer=tf.constant_initializer(0.))
init = tf.global_variables_initializer()

## then, set up the computations
input_placeholder = tf.placeholder(tf.float32)
output_placeholder = tf.placeholder(tf.float32)

x = input_placeholder
y = output_placeholder
y_guess = m * x + b

loss = tf.square(y - y_guess)

## finally, set up the optimizer and minimization node
optimizer = tf.train.GradientDescentOptimizer(1e-3)
train_op = optimizer.minimize(loss)

### start the session
sess = tf.Session()
sess.run(init)

### perform the training loop
import random

## set up the problem
true_m = random.random()
true_b = random.random()

for update_i in range(100000):
  ## (1) get the input and output
  input_data = random.random()
  output_data = true_m * input_data + true_b

  ## (2), (3), and (4) all happen in a single call to sess.run()!
  _loss, _ = sess.run([loss, train_op], feed_dict={input_placeholder: input_data, output_placeholder: output_data})
  print update_i, _loss

### finally, print out the values we learned for our two variables
print "True parameters:     m=%.4f, b=%.4f" % (true_m, true_b)
print "Learned parameters:  m=%.4f, b=%.4f" % tuple(sess.run([m, b]))

Output:

0 2.3205383
1 0.5792742
2 1.55254
3 1.5733259
4 0.6435648
5 2.4061265
6 1.0746256
7 2.1998715
8 1.6775116
9 1.6462423
10 2.441034
...
99990 2.9878322e-12
99991 5.158629e-11
99992 4.53646e-11
99993 9.422685e-12
99994 3.991829e-11
99995 1.134115e-11
99996 4.9467985e-11
99997 1.3219648e-11
99998 5.684342e-14
99999 3.007017e-11
True parameters:     m=0.3519, b=0.3242
Learned parameters:  m=0.3519, b=0.3242

As you can see, the loss falls to essentially nothing, and we end up with a very good estimate of the true parameters. Only the following lines of the script should be new to you:

## finally, set up the optimizer and minimization node
optimizer = tf.train.GradientDescentOptimizer(1e-3)
train_op = optimizer.minimize(loss)

The first line, optimizer = tf.train.GradientDescentOptimizer(1e-3), does not add a node to the graph. It simply creates a Python object that contains useful helper functions. The second line, train_op = optimizer.minimize(loss), does add a node to the graph and stores a pointer to it in the variable train_op.

The train_op node has no output, but it has a very important side effect: it traces the computational path of loss backwards, looking for variable nodes. For each variable it finds, it computes the gradient of the loss with respect to that variable, and then computes the variable’s update: the current value minus the gradient times the learning rate. Finally, it performs an assignment to update the variable’s value.
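
Conceptually (the exact internals differ, but this sketch of mine uses the optimizer’s two-step API, which behaves equivalently for our purposes), minimize() is a gradient computation followed by a batch of assignment-style side effects:

import tensorflow as tf

m = tf.get_variable("m", [], initializer=tf.constant_initializer(0.))
x = tf.placeholder(tf.float32)
loss = tf.square(x - m)

optimizer = tf.train.GradientDescentOptimizer(1e-3)
grads_and_vars = optimizer.compute_gradients(loss)    # [(d(loss)/d(m), m)], variables found on the loss's path
train_op = optimizer.apply_gradients(grads_and_vars)  # side-effect node: m <- m - 1e-3 * gradient

sess = tf.Session()
sess.run(tf.global_variables_initializer())
print sess.run([loss, train_op], feed_dict={x: 1.})[0]  # loss computed for this step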

So whenever we call sess.run(train_op), every variable takes one step of gradient descent. Of course, we also need to fill the input and output placeholders via feed_dict, and we fetch the loss as well so we can print it for debugging.

Debugging with tf.Print

Now that we’ve started doing complicated things with Tensorflow, debugging becomes unavoidable. In general, it’s hard to inspect what is happening inside the graph: we can’t use ordinary Python statements to look at intermediate values, because those values only exist during a call to sess.run(). Before sess.run() is called, the values don’t exist yet; once sess.run() returns, they are gone!

Let’s look at a simple example:

Code:

import tensorflow as tf
two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
sess = tf.Session()
print sess.run(sum_node)

Output:

5

Two plus three is five. But what if we also want to inspect the intermediate values two_node and three_node? One option is to add every node we want to inspect to the list of fetches passed to sess.run(), and then print the returned values.

Code:

import tensorflow as tf
two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
sess = tf.Session()
answer, inspection = sess.run([sum_node, [two_node, three_node]])
print inspection
print answer

Output:

[2, 3]
5

In general, this works, but as code gets more complex it becomes less and less convenient. An easier approach is to use a tf.Print statement. Confusingly, tf.Print is itself a Tensorflow node, with an output and a side effect! It takes two required arguments: a node to copy and a list of things to print. The “node to copy” can be any node in the graph; tf.Print is an identity operation with respect to that node, meaning its output is simply a copy of its input. Its side effect is to print the current values of everything in the “list of things to print”.

Code:

import tensorflow as tf
two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
print_sum_node = tf.Print(sum_node, [two_node, three_node])
sess = tf.Session()
print sess.run(print_sum_node)

Output:

[2][3]
5

Computational graph:


The important point is that, like other side effects, the printing only happens when computation actually flows through the tf.Print node. If the tf.Print node is not on the computational path, nothing is printed, even when the node it copies is on the path. So if you want to inspect a node, a good rule of thumb is to create the tf.Print node immediately after creating the node it copies.

Code:

import tensorflow as tf
two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
### this new copy of two_node is not on the computational path, so nothing prints!
print_two_node = tf.Print(two_node, [two_node, three_node, sum_node])
sess = tf.Session()
print sess.run(sum_node)

Output:

5

Computational graph:


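For contrast, here is a variation of my own in which the tf.Print copy is created immediately and then used downstream, so it sits on the computational path and the print actually fires:

import tensorflow as tf
two_node = tf.constant(2)
### create the copy right away, and use the copy from here on
print_two_node = tf.Print(two_node, [two_node])
three_node = tf.constant(3)
sum_node = print_two_node + three_node
sess = tf.Session()
print sess.run(sum_node)  # prints "[2]" as a side effect, then 5
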
For more on debugging, there is a great resource worth checking out.

Hopefully this article has helped you better understand what Tensorflow is, how it works, and how to use it.