TensorFlow fundamentals
The most obvious difference between TensorFlow and other numerical libraries such as NumPy is that TensorFlow manipulates symbols. This is a powerful feature, and it lets TensorFlow do things (such as automatic differentiation) that libraries like NumPy cannot; it is also a big part of why TensorFlow is more complicated. In this article we take a step-by-step look at TensorFlow and provide some guidelines and best practices for using it more effectively.
Let's start with a simple example where we multiply two random matrices. First, here is how this looks in NumPy:
import numpy as np
x = np.random.normal(size=[10, 10])
y = np.random.normal(size=[10, 10])
z = np.dot(x, y)
print(z)
Now we perform the exact same calculation using TensorFlow:
import tensorflow as tf
x = tf.random_normal([10, 10])
y = tf.random_normal([10, 10])
z = tf.matmul(x, y)
sess = tf.Session()
z_val = sess.run(z)
print(z_val)
Unlike NumPy, which immediately performs the calculation and copies the result into the output variable z, TensorFlow only gives us a tensor handle that we can manipulate. If we try to print the value of z directly, we get something like this:
Tensor("MatMul:0", shape=(10, 10), dtype=float32)Copy the code
Since both inputs have fully defined shapes and types, TensorFlow can infer the shape of the result tensor as well as its type. To compute the value of the tensor, we need to create a Session and evaluate it with the Session.run method.
To see what such symbolic computation really buys us, let's look at another example. Suppose we have samples from a curve (for example, f(x) = 5x^2 + 3) and we want to estimate f(x) without knowing its parameters. We define a parametric function g(x, w) = w0*x^2 + w1*x + w2, which is a function of the input x and latent parameters w. Our goal is to find the latent parameters such that g(x, w) ≈ f(x). This can be done by minimizing the loss function L(w) = (f(x) - g(x, w))^2. Although this problem has a simple closed-form solution, we choose a more general approach that can be applied to any differentiable problem: stochastic gradient descent. We simply compute the average gradient of L(w) with respect to w over a set of sample points and move in the opposite direction of the gradient.
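As an aside, here is a minimal NumPy sketch (not part of the TensorFlow code below, just for comparison) of the closed-form least-squares solution the paragraph alludes to:
import numpy as np

x = np.random.uniform(-10.0, 10.0, size=100)
y = 5 * np.square(x) + 3  # samples of f(x) = 5x^2 + 3

# Solve min_w ||[x^2, x, 1] @ w - y||^2 directly.
feats = np.stack([np.square(x), x, np.ones_like(x)], axis=1)
w, _, _, _ = np.linalg.lstsq(feats, y, rcond=None)
print(w)  # approximately [5, 0, 3]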
Here’s how to do it in TensorFlow:
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

w = tf.get_variable("w", shape=[3, 1])

f = tf.stack([tf.square(x), x, tf.ones_like(x)], 1)
yhat = tf.squeeze(tf.matmul(f, w), 1)

loss = tf.nn.l2_loss(yhat - y) + 0.1 * tf.nn.l2_loss(w)

train_op = tf.train.AdamOptimizer(0.1).minimize(loss)

def generate_data():
    x_val = np.random.uniform(-10.0, 10.0, size=100)
    y_val = 5 * np.square(x_val) + 3
    return x_val, y_val

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for _ in range(1000):
    x_val, y_val = generate_data()
    _, loss_val = sess.run([train_op, loss], {x: x_val, y: y_val})
    print(loss_val)

print(sess.run([w]))
Running this code, we get something like the following:
[4.9924135, 0.00040895029, 3.4504161]
This is quite close to the true parameters (f(x) = 5x^2 + 3 corresponds to w = [5, 0, 3]).
This is just the tip of the iceberg of what TensorFlow can do. Many problems, such as optimizing large neural networks with millions of parameters, can be efficiently implemented in TensorFlow with just a few lines of code. TensorFlow can scale across multiple devices and threads and supports a variety of platforms.
Understand static and dynamic shapes
In TensorFlow, each tensor has a static shape attribute that is determined during graph construction. The static shape may be only partially defined: for example, we can define a tensor of shape [None, 128].
import tensorflow as tf
a = tf.placeholder(tf.float32, [None, 128])
This means that the first dimension of the tensor can be of any size and will be determined dynamically during Session.run(). You can query the static shape of a tensor like this:
static_shape = a.shape.as_list()  # returns [None, 128]
To get the dynamic shape of a tensor, you can call tf.shape, which returns a tensor representing the shape of the given tensor:
dynamic_shape = tf.shape(a)
The static shape of a tensor can be set with the Tensor.set_shape() method, as in:
a.set_shape([32, 128])  # static shape of a is [32, 128]
a.set_shape([None, 128])  # first dimension of a is determined dynamically
You can reshape a tensor dynamically by calling tf.reshape, for example:
a = tf.reshape(a, [32, 128])
It is convenient to define a function that returns the static shape where it is defined and the dynamic shape where it is not, as in:
def get_shape(tensor):
    static_shape = tensor.shape.as_list()
    dynamic_shape = tf.unstack(tf.shape(tensor))
    dims = [s[1] if s[0] is None else s[0]
            for s in zip(static_shape, dynamic_shape)]
    return dims
Now, if we need to turn a rank-3 tensor into a rank-2 tensor by collapsing the second and third dimensions into one, we can use the get_shape() function we just defined, like this:
b = tf.placeholder(tf.float32, [None, 10, 32])
shape = get_shape(b)
b = tf.reshape(b, [shape[0], shape[1] * shape[2]])
Notice that this code works whether the shape of the tensor is statically specified or determined dynamically. In fact, we can write a general-purpose reshape function that collapses any list of dimensions:
import numpy as np
import tensorflow as tf

def reshape(tensor, dims_list):
    shape = get_shape(tensor)
    dims_prod = []
    for dims in dims_list:
        if isinstance(dims, int):
            dims_prod.append(shape[dims])
        elif all([isinstance(shape[d], int) for d in dims]):
            dims_prod.append(np.prod([shape[d] for d in dims]))
        else:
            dims_prod.append(tf.reduce_prod([shape[d] for d in dims]))
    tensor = tf.reshape(tensor, dims_prod)
    return tensor
Then collapsing the second and third dimensions becomes very easy:
b = tf.placeholder(tf.float32, [None, 10, 32])
b = reshape(b, [0, [1, 2]])
Scope and when to use it
In TensorFlow, variables and tensors have a name attribute that is used to identify them in the graph. If you don’t explicitly name variables or tensors when creating them, TF will automatically, implicitly assign them names, such as:
a = tf.constant(1)
print(a.name)  # prints "Const:0"

b = tf.Variable(1)
print(b.name)  # prints "Variable:0"
You can also override the default names of variables or tensors by explicitly naming them at definition time, such as:
a = tf.constant(1, name="a")
print(a.name) # prints "b:0"b = tf.Variable(1, name="b")
print(b.name) # prints "b:0"Copy the code
TF has introduced two different context managers for changing the names of tensors or variables. The first is tf.name_scope, for example:
with tf.name_scope("scope"):
a = tf.constant(1, name="a")
print(a.name) # prints "scope/a:0"
b = tf.Variable(1, name="b")
print(b.name) # prints "scope/b:0"
c = tf.get_variable(name="c", shape=[])
print(c.name) # prints "c:0"Copy the code
Notice that in TF we have two ways to define a new variable: by calling tf.Variable() or by calling tf.get_variable(). When tf.get_variable() is called with a new name, a new variable is created; but if the name already exists in the current variable scope, a ValueError is raised, meaning duplicate declarations of a variable are not allowed.
tf.name_scope() only affects the names of tensors and of variables created with tf.Variable; it does not affect variables created with tf.get_variable().
Unlike tf.name_scope(), tf.variable_scope() also changes the names of variables created with tf.get_variable():
with tf.variable_scope("scope"):
a = tf.constant(1, name="a")
print(a.name) # prints "scope/a:0"
b = tf.Variable(1, name="b")
print(b.name) # prints "scope/b:0"
c = tf.get_variable(name="c", shape=[])
print(c.name) # prints "scope/c:0"with tf.variable_scope("scope"):
a1 = tf.get_variable(name="a", shape=[])
a2 = tf.get_variable(name="a", shape=[]) # DisallowedCopy the code
But what if we actually want to reuse a previously declared variable? Variable scopes also provide a mechanism for that:
with tf.variable_scope("scope"):
a1 = tf.get_variable(name="a", shape=[])with tf.variable_scope("scope", reuse=True):
a2 = tf.get_variable(name="a", shape=[]) # OKThis becomes handy for example when using built-in neural network layers:
features1 = tf.layers.conv2d(image1, filters=32, kernel_size=3)# Use the same convolution weights to process the second image:with tf.variable_scope(tf.get_variable_scope(), reuse=True):
features2 = tf.layers.conv2d(image2, filters=32, kernel_size=3)Copy the code
This syntax may not seem particularly clear-cut. In particular, if you want to share many variables in a model, you need to keep track of when to define new variables and when to reuse them, which is troublesome and error-prone. So TF provides templates (tf.make_template) to handle variable sharing automatically:
conv3x32 = tf.make_template("conv3x32", lambda x: tf.layers.conv2d(x, 32, 3))
features1 = conv3x32(image1)
features2 = conv3x32(image2)  # Will reuse the convolution weights.
You can convert any function to a TF template. The first time the template is called, the variables declared in the function are defined, and they are automatically reused in subsequent calls.
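For instance, here is a small sketch (the function and names are made up for illustration) showing that an ordinary function wrapped with tf.make_template shares its variables across calls:
import tensorflow as tf

def dense_block(x):
    # A bare call to get_variable would raise on the second use; the template handles reuse.
    w = tf.get_variable("w", shape=[4, 10])
    return tf.matmul(x, w)

block = tf.make_template("dense_block", dense_block)

x1 = tf.placeholder(tf.float32, [None, 4])
x2 = tf.placeholder(tf.float32, [None, 4])
y1 = block(x1)  # defines "dense_block/w" on the first call
y2 = block(x2)  # reuses the same "dense_block/w"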
Advantages and disadvantages of broadcasting
TensorFlow supports broadcasting for element-wise operations. Normally, when you want to perform operations such as addition or multiplication, you need to make sure the shapes of the operands match; for example, you cannot add a tensor of shape [3, 2] to a tensor of shape [3, 4]. There is one special case, though: if one of the operands has a dimension of size 1, TF implicitly tiles it along that dimension to match the other operand. So adding a tensor of shape [3, 2] to a tensor of shape [3, 1] is legal in TF.
import tensorflow as tf

a = tf.constant([[1., 2.], [3., 4.]])
b = tf.constant([[1.], [2.]])

# c = a + tf.tile(b, [1, 2])
c = a + b
Broadcasting lets us do this tiling implicitly, which makes the code shorter and more memory efficient, since we do not need to store the result of the tiling operation. One scenario that shows off this advantage is combining feature vectors of different lengths. To concatenate features of different lengths, we usually tile the smaller input, concatenate the result, and then apply a series of nonlinear operations. This is a common pattern across a large class of neural network architectures.
a = tf.random_uniform([5, 3, 5])
b = tf.random_uniform([5, 1, 6])

# concat a and b and apply nonlinearity
tiled_b = tf.tile(b, [1, 3, 1])
c = tf.concat([a, tiled_b], 2)
d = tf.layers.dense(c, 10, activation=tf.nn.relu)
But this can be done more efficiently with broadcasting. We use the fact that f(m(x + y)) = f(mx + my), so we can do the linear operations separately and rely on broadcasting to perform an implicit concatenation:
pa = tf.layers.dense(a, 10, activation=None)
pb = tf.layers.dense(b, 10, activation=None)
d = tf.nn.relu(pa + pb)
In fact, this code is general enough that it can be applied to tensors of arbitrary shape:
def merge(a, b, units, activation=tf.nn.relu):
    pa = tf.layers.dense(a, units, activation=None)
    pb = tf.layers.dense(b, units, activation=None)
    c = pa + pb
    if activation is not None:
        c = activation(c)
    return c
This gives us a more general form of the merge function.
So far we have discussed the advantages of broadcasting, but the same mechanism has its drawbacks. Implicit assumptions almost always make debugging harder. Consider the following example:
a = tf.constant([[1.], [2.]])
b = tf.constant([1., 2.])
c = tf.reduce_sum(a + b)
What do you think the result will be? If you guessed 6, you are wrong: the answer is 12. This is because when the ranks of two tensors do not match, TensorFlow automatically expands the lower-rank tensor with leading dimensions of size 1 before the element-wise operation, so the result of the addition is [[2, 3], [3, 4]] and the reduction gives 12.
The way to avoid this is to be as explicit as possible. If we had specified the dimension to reduce over, spotting the bug would be easy:
a = tf.constant([[1.], [2.]])
b = tf.constant([1., 2.])
c = tf.reduce_sum(a + b, 0)
Here the value of c would be [5, 7], and the shape of the result immediately hints at the problem. A general rule is to always specify the dimension in reduce operations and when using tf.squeeze.
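For example, here is a small sketch of the same rule applied to tf.squeeze:
a = tf.random_uniform([1, 3, 1])
b = tf.squeeze(a)          # shape [3]: silently drops every size-1 dimension
c = tf.squeeze(a, axis=2)  # shape [1, 3]: raises an error if axis 2 is not of size 1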
Feed data to TensorFlow
TensorFlow is designed to work efficiently with large amounts of data, so to get the best performance, remember never to "starve" your TF model. Generally speaking, there are a few ways to "feed" data to your model.
Constant mode (tf.constant)
The easiest way to do this is to embed the data directly into your graph as constants, as in:
import tensorflow as tf
import numpy as np

actual_data = np.random.normal(size=[100])
data = tf.constant(actual_data)
This is very efficient, but it is not flexible. The big problem with this approach is that to reuse your model on another dataset, you have to rewrite the graph. You also have to load all the data at once and keep it in memory, which means it only works for small datasets.
Placeholder mode (tf.placeholder)
Both problems of the constant approach can be solved with placeholders, such as:
import tensorflow as tf
import numpy as np

data = tf.placeholder(tf.float32)
prediction = tf.square(data) + 1

actual_data = np.random.normal(size=[100])
tf.Session().run(prediction, feed_dict={data: actual_data})
The placeholder operator returns a tensor whose value is obtained in the session using the manually specified feed_dict parameter.
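As a small additional sketch (not part of the original example), you can also give a placeholder a partially defined shape, so that feeds are shape-checked while the batch size stays flexible:
import numpy as np
import tensorflow as tf

data = tf.placeholder(tf.float32, shape=[None, 3])
mean = tf.reduce_mean(data, axis=1)

# Any batch size works, but the last dimension must be 3.
tf.Session().run(mean, feed_dict={data: np.random.normal(size=[100, 3])})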
Python operations (tf.py_func)
You can also feed data using Python operations:
def py_input_fn():
    actual_data = np.random.normal(size=[100])
    return actual_data

data = tf.py_func(py_input_fn, [], (tf.float32))
Python operations allow you to convert a regular Python function into a TF operation.
Use TF’s own dataset API
The most recommended way is to feed data through TF’s own data set API, such as:
actual_data = np.random.normal(size=[100])
dataset = tf.contrib.data.Dataset.from_tensor_slices(actual_data)
data = dataset.make_one_shot_iterator().get_next()
If you need to read data from files, you may want to convert them to the TFRecord format first, which makes the whole pipeline more efficient:
dataset = tf.contrib.data.TFRecordDataset(path_to_data)
Check out the official documentation to learn how to convert your dataset to the TFRecord format.
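As a rough sketch of the idea (the file name and feature key here are made up; see the docs for the full recipe), writing data with the TF 1.x TFRecord API looks roughly like this:
import numpy as np
import tensorflow as tf

values = np.random.normal(size=[100]).astype(np.float32)
with tf.python_io.TFRecordWriter("data.tfrecord") as writer:
    for v in values:
        example = tf.train.Example(features=tf.train.Features(feature={
            "value": tf.train.Feature(float_list=tf.train.FloatList(value=[float(v)])),
        }))
        writer.write(example.SerializeToString())
Once you have a dataset object, a typical processing pipeline looks like this: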
dataset = ...
dataset = dataset.cache()
if mode == tf.estimator.ModeKeys.TRAIN:
    dataset = dataset.repeat()
    dataset = dataset.shuffle(batch_size * 5)
dataset = dataset.map(parse, num_threads=8)
dataset = dataset.batch(batch_size)
After the dataset is read in, we use Dataset.cache() to cache it in memory for better efficiency. In training mode, we repeat the dataset indefinitely, which lets us process it multiple times (multiple epochs). We also shuffle the dataset so that each batch has a different sample distribution. Next, we use Dataset.map() to preprocess the raw records and convert them into a format the model can use. Then we create batches of samples with Dataset.batch().
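As a minimal sketch (assuming the dataset pipeline above), the result is typically consumed in a session like this:
iterator = dataset.make_one_shot_iterator()
next_batch = iterator.get_next()

sess = tf.Session()
while True:
    try:
        batch = sess.run(next_batch)
        # ... run a training step on `batch` here ...
    except tf.errors.OutOfRangeError:
        break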
Use operator overloading
Like NumPy, TensorFlow overloads many Python operators, which makes it easier to build graphs and keeps the code readable.
The slicing operation is one of many overloaded operators that make indexing tensors easy:
z = x[begin:end]  # z = tf.slice(x, [begin], [end-begin])
However, you need to be very careful with it: slicing can be very inefficient and is best avoided, especially when the number of slices is large. To understand just how inefficient this operator can be, let's look at an example. We want to manually implement a reduction over the rows of a matrix:
import tensorflow as tf
import time

x = tf.random_uniform([500, 10])

z = tf.zeros([10])
for i in range(500):
    z += x[i]

sess = tf.Session()
start = time.time()
sess.run(z)
print("Took %f seconds." % (time.time() - start))
On my MacBook Pro, this code took 2.67 seconds! The reason it took so long was that we called the slice operation 500 times, which was extremely slow to run! A better option is to use the tf.unstack() operation to cut a matrix into a list of vectors only once!
z = tf.zeros([10])
for x_i in tf.unstack(x):
    z += x_i
This operation took 0.18 seconds and, of course, the most correct way to implement this requirement is to use the tf.reduce_sum() operation:
z = tf.reduce_sum(x, axis=0)
This takes only 0.008 seconds, about 300 times faster than the original implementation! In addition to slicing, TensorFlow also overloads a series of arithmetic and logical operators, such as:
z = -x       # z = tf.negative(x)
z = x + y    # z = tf.add(x, y)
z = x - y    # z = tf.subtract(x, y)
z = x * y    # z = tf.multiply(x, y)
z = x / y    # z = tf.div(x, y)
z = x // y   # z = tf.floordiv(x, y)
z = x % y    # z = tf.mod(x, y)
z = x ** y   # z = tf.pow(x, y)
z = x @ y    # z = tf.matmul(x, y)
z = x > y    # z = tf.greater(x, y)
z = x >= y   # z = tf.greater_equal(x, y)
z = x < y    # z = tf.less(x, y)
z = x <= y   # z = tf.less_equal(x, y)
z = abs(x)   # z = tf.abs(x)
z = x & y    # z = tf.logical_and(x, y)
z = x | y    # z = tf.logical_or(x, y)
z = x ^ y    # z = tf.logical_xor(x, y)
z = ~x       # z = tf.logical_not(x)
You can also use the augmented versions of these operators, such as x += y and x **= 2, which are also legal. Note that Python does not allow overloading of the and, or, and not keywords.
TensorFlow also does not allow tensors to be used as Boolean types, as this can be error-prone:
x = tf.constant(1.)
if x:  # This will raise a TypeError
    ...
If you want to branch on the value of a tensor, use tf.cond(x, ...) instead; or, if you just want to check whether the variable is set, use if x is None.
Some operators are not supported at all: the equality (==) and inequality (!=) operators, which are overloaded in NumPy, are not overloaded in TF. If you need them, use the function versions tf.equal() and tf.not_equal().
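For example, a small sketch:
x = tf.constant([1, 2, 3])
y = tf.constant([1, 0, 3])
eq = tf.equal(x, y)       # [True, False, True]
ne = tf.not_equal(x, y)   # [False, True, False]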
Understand the order of execution and control dependencies
As we know, TensorFlow uses symbolic programming. It does not run defined operations immediately; instead, it creates corresponding nodes in a computation graph that can be executed with Session.run(). This lets TF decide on the optimal order of execution at run time and prune nodes that are not needed for the result, all of which happens during the run. If you only use tf.Tensors in your graph, you don't need to worry about dependencies, but you will most likely also use tf.Variable(), which makes things harder. My suggestion is to use Variables only when Tensors don't do the job. This may not make much sense to you now, so let's look at an example:
import tensorflow as tf

a = tf.constant(1)
b = tf.constant(2)
a = a + b

tf.Session().run(a)
Evaluating a returns 3, as expected. Notice that we now have three tensors: two constant tensors and one tensor that stores the result of the addition. Note that we cannot overwrite the value of a tensor; if we want to change it, we have to create a new tensor, which is what we did here.
Tip: if you do not explicitly define a new computation graph, TF automatically builds a default graph for you. You can use tf.get_default_graph() to get a handle to the graph and inspect it, for example by printing all the tensors that belong to it:
print(tf.contrib.graph_editor.get_tensors(tf.get_default_graph()))
Unlike tensors, variables can be updated, so let’s use variables to implement what we just needed:
a = tf.Variable(1)
b = tf.constant(2)
assign = tf.assign(a, a + b)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(assign))
Again, we get 3, as expected. Notice that tf.assign() returns the tensor representing the assignment operation. So far, everything looks great, but let’s look at a slightly more complicated example:
a = tf.Variable(1)
b = tf.constant(2)
c = a + b

assign = tf.assign(a, 5)

sess = tf.Session()
for i in range(10):
    sess.run(tf.global_variables_initializer())
    print(sess.run([assign, c]))
Notice that the tensor c does not have a deterministic value. Its value can be 3 or 7, depending on whether the addition or the assignment runs first.
You should also note that the order in which you define operations in code does not affect the order in which they are executed at the TF runtime; the only thing that matters is the control dependencies. Control dependencies are straightforward for tensors. Every time you use a tensor in an operation, the operation will define an implicit dependency on that tensor. But if you also use variables, things get worse because variables can take on many values.
When dealing with these variables, you may need to explicitly control dependencies by using tf.control_dependencies(), as in:
a = tf.Variable(1)
b = tf.constant(2)
c = a + b

with tf.control_dependencies([c]):
    assign = tf.assign(a, 5)

sess = tf.Session()
for i in range(10):
    sess.run(tf.global_variables_initializer())
    print(sess.run([assign, c]))
This ensures that the assignment is called after the addition operation.
Control flow operations: conditions and loops
When building complex models such as recurrent neural networks, you may need conditionals and loops to control the flow of operations. In this section, we introduce some commonly used control-flow operations.
Suppose you want to decide whether to multiply or add two given tensors based on predicates. This can be done simply with tf.cond, which acts as the Python “if” function:
a = tf.constant(1)
b = tf.constant(2)
p = tf.constant(True)
x = tf.cond(p, lambda: a + b, lambda: a * b)
print(tf.Session().run(x))
Since the predicate is True in this case, the output will be the result of addition, which is 3.
Most of the time, with TensorFlow, you are working with large tensors and want to perform operations in batches. The related conditional operation is tf.where, which, like tf.cond, accepts a predicate, but selects the output element-wise across the batch based on the condition.
a = tf.constant([1, 1])
b = tf.constant([2, 2])
p = tf.constant([True, False])

x = tf.where(p, a + b, a * b)
print(tf.Session().run(x))
This will return [3,2].
Another widely used control-flow operation is tf.while_loop. It allows you to build dynamic loops in TensorFlow that operate on sequences of variable length. Let's see how we can generate the Fibonacci sequence with tf.while_loop:
n = tf.constant(5)

def cond(i, a, b):
    return i < n

def body(i, a, b):
    return i + 1, b, a + b

i, a, b = tf.while_loop(cond, body, (2, 1, 1))
print(tf.Session().run(b))
This will print 5. In addition to the initial values of the loop variables, tf.while_loop takes a condition function and a loop body function. The loop variables are updated by repeated calls to the body function until the condition returns False.
Now imagine that we want to preserve the entire Fibonacci sequence. We can update our loop body to record the history of the current value:
n = tf.constant(5)

def cond(i, a, b, c):
    return i < n

def body(i, a, b, c):
    return i + 1, b, a + b, tf.concat([c, [a + b]], 0)

i, a, b, c = tf.while_loop(cond, body, (2, 1, 1, tf.constant([1, 1])))
print(tf.Session().run(c))
Now, if you try to run this, TensorFlow will complain that the shape of the fourth loop variable is changing. So you must make it explicit that the change is intentional:
i, a, b, c = tf.while_loop(
cond, body, (2, 1, 1, tf.constant([1, 1])),
shape_invariants=(tf.TensorShape([]),
tf.TensorShape([]),
tf.TensorShape([]),
tf.TensorShape([None])))
Not only does this get ugly, it is also somewhat inefficient. Note that we are building many intermediate tensors that we never use. TensorFlow has a better solution for this kind of growing array: take a look at tf.TensorArray. Let's do the same thing, this time with tensor arrays:
n = tf.constant(5)

c = tf.TensorArray(tf.int32, n)
c = c.write(0, 1)
c = c.write(1, 1)

def cond(i, a, b, c):
    return i < n

def body(i, a, b, c):
    c = c.write(i, a + b)
    return i + 1, b, a + b, c

i, a, b, c = tf.while_loop(cond, body, (2, 1, 1, c))
c = c.stack()
print(tf.Session().run(c))
TensorFlow while loops and tensor arrays are essential tools for building complex recurrent neural networks. As an exercise, try implementing beam search using tf.while_loop. Can you make it more efficient with tensor arrays?
Prototype kernels and advanced visualizations with Python operations
Operation kernels in TensorFlow are written entirely in C++ for efficiency. But writing a TensorFlow kernel in C++ can be quite a pain. So, before spending hours implementing a kernel, you may want to prototype it quickly, however inefficiently. With tf.py_func(), you can turn any piece of Python code into a TensorFlow operation.
For example, this is how you can implement a simple ReLU nonlinearity kernel in TensorFlow as a Python operation:
import numpy as np
import tensorflow as tf
import uuid

def relu(inputs):
    # Define the op in python
    def _relu(x):
        return np.maximum(x, 0.)

    # Define the op's gradient in python
    def _relu_grad(x):
        return np.float32(x > 0)

    # An adapter that defines a gradient op compatible with TensorFlow
    def _relu_grad_op(op, grad):
        x = op.inputs[0]
        x_grad = grad * tf.py_func(_relu_grad, [x], tf.float32)
        return x_grad

    # Register the gradient with a unique id
    grad_name = "MyReluGrad_" + str(uuid.uuid4())
    tf.RegisterGradient(grad_name)(_relu_grad_op)

    # Override the gradient of the custom op
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": grad_name}):
        output = tf.py_func(_relu, [inputs], tf.float32)
    return output
To verify that the gradient is correct, use TensorFlow’s gradient checker:
x = tf.random_normal([10])
y = relu(x * x)

with tf.Session():
    diff = tf.test.compute_gradient_error(x, [10], y, [10])
    print(diff)
compute_gradient_error() computes the gradient numerically and returns its difference from the provided (analytical) gradient. What we want is a very small difference.
Note that this implementation is very inefficient and only suitable for prototyping, because the Python code is not parallelizable and will not run on the GPU. Once you have validated your idea, you will want to rewrite it as a C++ kernel.
In practice, we usually use Python operations for visualization on Tensorboard. Consider that you are building an image classification model and want to visualize the model’s predictions during training. TensorFlow allows visualization of images using the tf.summary.image() function:
image = tf.placeholder(tf.float32)
tf.summary.image("image", image)Copy the code
But this only displays the input image. In order to display the predictions, you would have to find a way to annotate the image, which is almost impossible with existing TensorFlow operations. An easier way is to do the drawing in Python and wrap it in a Python operation:
import io
import matplotlib.pyplot as plt
import numpy as np
import PIL
import tensorflow as tf

def visualize_labeled_images(images, labels, max_outputs=3, name="image"):
    def _visualize_image(image, label):
        # Do the actual drawing in python
        fig = plt.figure(figsize=(3, 3), dpi=80)
        ax = fig.add_subplot(111)
        ax.imshow(image[::-1, ...])
        ax.text(0, 0, str(label),
                horizontalalignment="left",
                verticalalignment="top")
        fig.canvas.draw()

        # Write the plot as a memory file.
        buf = io.BytesIO()
        data = fig.savefig(buf, format="png")
        buf.seek(0)

        # Read the image and convert to numpy array
        img = PIL.Image.open(buf)
        return np.array(img.getdata()).reshape(img.size[0], img.size[1], -1)

    def _visualize_images(images, labels):
        # Only display the given number of examples in the batch
        outputs = []
        for i in range(max_outputs):
            output = _visualize_image(images[i], labels[i])
            outputs.append(output)
        return np.array(outputs, dtype=np.uint8)

    # Run the python op.
    figs = tf.py_func(_visualize_images, [images, labels], tf.uint8)
    return tf.summary.image(name, figs)
Note that since summaries are usually evaluated only occasionally (not at every step), this implementation can be used in practice without worrying about efficiency.
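A hedged sketch of what "evaluating the summary only occasionally" looks like in practice (the step interval, log directory, sess, and train_op here are assumptions, not from the original code):
summary_op = tf.summary.merge_all()
writer = tf.summary.FileWriter("/tmp/logs", sess.graph)

for step in range(10000):
    sess.run(train_op)
    if step % 100 == 0:  # only evaluate the (expensive) python-op summary occasionally
        writer.add_summary(sess.run(summary_op), step)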
Multiple GPUs and data parallelism
If you wrote your software in a language like C++ for a single CPU core, making it run on multiple GPUs in parallel would require rewriting it from scratch. Not so with TensorFlow: because of its symbolic nature, TensorFlow can hide all of this complexity, so there is no need to rewrite your program to scale it across multiple CPUs and GPUs.
Let’s start with a simple example of adding two vectors on a CPU:
import tensorflow as tf

with tf.device(tf.DeviceSpec(device_type="CPU", device_index=0)):
    a = tf.random_uniform([1000, 100])
    b = tf.random_uniform([1000, 100])
    c = a + b

tf.Session().run(c)
The same thing can be done on a GPU:
with tf.device(tf.DeviceSpec(device_type="GPU", device_index=0)):
a = tf.random_uniform([1000, 100])
b = tf.random_uniform([1000, 100])
c = a + bCopy the code
But what if we have two GPUs and want to use them both? To do this, we can split the data and use a separate GPU to process each half:
split_a = tf.split(a, 2)
split_b = tf.split(b, 2)

split_c = []
for i in range(2):
    with tf.device(tf.DeviceSpec(device_type="GPU", device_index=i)):
        split_c.append(split_a[i] + split_b[i])

c = tf.concat(split_c, axis=0)
Let’s rewrite it in more general form so that we can replace addition with any other operation:
def make_parallel(fn, num_gpus, **kwargs):
    in_splits = {}
    for k, v in kwargs.items():
        in_splits[k] = tf.split(v, num_gpus)

    out_split = []
    for i in range(num_gpus):
        with tf.device(tf.DeviceSpec(device_type="GPU", device_index=i)):
            with tf.variable_scope(tf.get_variable_scope(), reuse=i > 0):
                out_split.append(fn(**{k: v[i] for k, v in in_splits.items()}))

    return tf.concat(out_split, axis=0)

def model(a, b):
    return a + b

c = make_parallel(model, 2, a=a, b=b)
You can replace model with any function that takes a set of tensors as input and returns a tensor as a result, provided the inputs and outputs are batched. Note that we also added a variable scope and set reuse to True; this ensures that we use the same variables to process both splits. This will come in handy in our next example.
Let's look at a slightly more practical example. We want to train a neural network on multiple GPUs. During training, we need to compute not only the forward pass but also the backward pass (the gradients). But how do we compute gradients in parallel? It turns out to be quite simple.
Recall from the first section that we wanted to fit a second-degree polynomial to a set of samples. We reorganized the code so that the bulk of the work happens in the model function:
import numpy as np
import tensorflow as tf

def model(x, y):
    w = tf.get_variable("w", shape=[3, 1])
    f = tf.stack([tf.square(x), x, tf.ones_like(x)], 1)
    yhat = tf.squeeze(tf.matmul(f, w), 1)
    loss = tf.square(yhat - y)
    return loss

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

loss = model(x, y)

train_op = tf.train.AdamOptimizer(0.1).minimize(tf.reduce_mean(loss))

def generate_data():
    x_val = np.random.uniform(-10.0, 10.0, size=100)
    y_val = 5 * np.square(x_val) + 3
    return x_val, y_val

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for _ in range(1000):
    x_val, y_val = generate_data()
    _, loss_val = sess.run([train_op, loss], {x: x_val, y: y_val})

print(sess.run(tf.contrib.framework.get_variables_by_name("w")))
Now let's parallelize it with the make_parallel function we just wrote. We only need to change two lines of code:
loss = make_parallel(model, 2, x=x, y=y)

train_op = tf.train.AdamOptimizer(0.1).minimize(
    tf.reduce_mean(loss), colocate_gradients_with_ops=True)
The only thing we need to change to parallelize gradient backpropagation is to set the colocate_gradients_with_ops flag to True. This ensures that each gradient operation runs on the same device as its corresponding forward operation.
Debug the TensorFlow model
The symbolic nature of TensorFlow makes debugging TensorFlow code relatively difficult compared to regular Python code. Here, we introduce some of the tools that come with TensorFlow to make debugging easier.
Probably the most common mistake with TensorFlow is passing tensors of the wrong shape to operations. Many TensorFlow operations can work on tensors of different ranks and shapes. This is convenient when using the API, but it can cause extra trouble when something goes wrong.
For example, consider the tf.matmul operation, which can multiply two matrices:
a = tf.random_uniform([2, 3])
b = tf.random_uniform([3, 4])
c = tf.matmul(a, b)  # c is a tensor of shape [2, 4]
But the same function can also be used for batch matrix multiplication:
a = tf.random_uniform([10, 2, 3])
b = tf.random_uniform([10, 3, 4])
c = tf.matmul(a, b)  # c is a tensor of shape [10, 2, 4]
Another example we talked about earlier in the broadcast section is the addition operation that supports broadcasts:
a = tf.constant([[1.], [2.]])
b = tf.constant([1., 2.])
c = a + b  # c is a tensor of shape [2, 2]
Use tf.assert* operations to validate your tensors
One way to reduce the chance of unwanted behavior is to explicitly verify the rank or shape of intermediate tensors with tf.assert* operations:
a = tf.constant([[1.], [2.]])
b = tf.constant([1., 2.])

check_a = tf.assert_rank(a, 1)  # This will raise an InvalidArgumentError exception
check_b = tf.assert_rank(b, 1)

with tf.control_dependencies([check_a, check_b]):
    c = a + b  # c is a tensor of shape [2, 2]
Remember that the assertion node is part of the graph like any other operation, and if it is not evaluated, it will be pruned during Session.run(). Therefore, be sure to create explicit control dependencies on assertion operations to force TensorFlow to execute them.
You can also verify the value of a tensor at run time using assertions:
check_pos = tf.assert_positive(a)
For a complete list of assertion operations, see the official documentation.
Use tf.Print to log tensor values
Another useful built-in function for debugging is tf.Print, which logs the given tensors to standard error:
input_copy = tf.Print(input, tensors_to_print_list)
Notice that tf.Print returns a copy of its first argument as output. One way to force tf.Print to run is to pass its output to another operation that does get executed. For example, if we want to print the values of tensors a and b before adding them, we could do this:
a = ...
b = ...
a = tf.Print(a, [a, b])
c = a + b
Alternatively, we can define control dependencies manually.
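A small sketch of that alternative (using the same a and b as above); the Print op becomes a control dependency of the addition, so it runs whenever c is evaluated:
print_op = tf.Print(a, [a, b])
with tf.control_dependencies([print_op]):
    c = a + b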
Use tf.test.compute_gradient_error to check gradients
Not all operations in TensorFlow have gradients, and it is easy to inadvertently build graphs for which TensorFlow cannot compute gradients.
Let’s look at an example:
import tensorflow as tf

def non_differentiable_entropy(logits):
    probs = tf.nn.softmax(logits)
    return tf.nn.softmax_cross_entropy_with_logits(labels=probs, logits=logits)

w = tf.get_variable("w", shape=[5])
y = -non_differentiable_entropy(w)

opt = tf.train.AdamOptimizer()
train_op = opt.minimize(y)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(10000):
    sess.run(train_op)

print(sess.run(tf.nn.softmax(w)))
We use tf.nn.softmax_cross_entropy_with_logits to define the entropy of a categorical distribution. We then use the Adam optimizer to find the weights that maximize the entropy. If you have taken an information theory course, you know that entropy is maximized by the uniform distribution, so you would expect the result to be [0.2, 0.2, 0.2, 0.2, 0.2]. But if you run this, you might get unexpected results:
Author: ApacheCN_dragon. Link: https://www.jianshu.com/p/57ce5a27a5f3