Author: Chen_H | WeChat & QQ: 862251340 | WeChat official account: Coderpai


Build a multi-task learning model step by step with TensorFlow

Introduction

Why multi-task learning?

When you learn something new, you often use your previous experience and knowledge to speed up the current learning process. When we learn a new language, especially a related one, we usually draw on the languages we already know to learn the new one faster. This process can also work the other way round: learning a new language can help you understand and express yourself better in your own.

Our brains learn many different tasks at once: whether we are translating from English to Chinese or from Chinese to German, we use the same underlying structure, our own brain. Similarly, in a machine learning model, if we use the same network to perform two tasks simultaneously, we call this "multi-task learning".

"Multi-task learning" has been a very interesting and exciting area of research in recent years, because it radically reduces the amount of data required to learn new concepts. One of the great things about deep learning is that we can share parameters between models, and this approach is particularly prominent in multi-task learning.

Before I started studying this area, I ran into a few obstacles: while it was easy to understand the network architecture needed for multi-task learning, it was hard to figure out how to implement it in TensorFlow. Doing anything beyond a standard network in TensorFlow requires a good understanding of how it works, and most tutorials on the web don't explain this well. I hope the following tutorial explains some of the key concepts simply and helps you past these difficulties.

What we will do

  1. Walk through an example of a TensorFlow graph. If you already know how TensorFlow graphs work, you can skip this part.
  2. Learn how to use a computational graph for multi-task learning. We will use an example to show how to adapt a simple computational graph for multi-task learning.

Use a simple example to understand the computational graph

TensorFlow’s graphs make TensorFlow run faster and are an important, if sometimes confusing, component of deep learning.

Computational graphs make it easy to organize the structure of a model, which is very useful for multi-task learning. First, let's build some intuition with a simple example.

A simple example: linear transformations

We will perform a simple calculation on the computational graph: a linear transformation of the input data, followed by a squared-error loss.

```python
# Import Tensorflow and numpy
import tensorflow as tf
import numpy as np

# ======================
# Define the Graph
# ======================

# Create Placeholders For X And Y (for feeding in data)
X = tf.placeholder("float", [10, 10], name="X")  # Our input is 10x10
Y = tf.placeholder("float", [10, 1], name="Y")   # Our output is 10x1

# Create a Trainable Variable, "W", our weights for the linear transformation
initial_W = np.zeros((10, 1))
W = tf.Variable(initial_W, name="W", dtype="float32")

# Define Your Loss Function
Loss = tf.pow(tf.add(Y, -tf.matmul(X, W)), 2, name="Loss")
```
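As a sanity check on what this graph computes, here is the same calculation in plain NumPy (a sketch only, not part of the TensorFlow graph; the random inputs stand in for the placeholder feeds):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10, 10))   # same shapes as the placeholders above
Y = rng.random((10, 1))
W = np.zeros((10, 1))      # same initial value as the TF variable

# Elementwise squared error of the linear transformation,
# matching tf.pow(tf.add(Y, -tf.matmul(X, W)), 2)
loss = (Y - X @ W) ** 2
print(loss.shape)   # (10, 1)
```

Since W starts at zero here, the loss is simply the square of Y; training would move W away from zero to reduce it.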

The diagram and code above have a few points worth emphasizing:

  • If we run this code now, we won't get any output. Remember, the graph is just a template; it doesn't compute anything by itself. If we want output, we must tell TensorFlow to run the computation using a Session.
  • We haven't explicitly created a graph object. You might expect that we would have to create one somewhere so that TensorFlow knows what graph we want to build. In fact, simply using TensorFlow operations tells TensorFlow which code belongs to the graph.

Tip: Keep the graph code separate. You usually do a fair amount of data manipulation and computation outside the graph, so you have to distinguish between code that belongs to the graph and code that doesn't. I like to keep my graph definition in a separate file so the two are easy to tell apart.

The calculations in the graph are performed within a TensorFlow Session. To get results from a session, you need to provide two things: the target results and the input data.

Target results or operations. You tell TensorFlow which parts of the graph you want values returned from, and it automatically works out which internal computations need to run. For example, you can run an operation that initializes the variables.

Input feeds. For most calculations you will provide temporary input data. In that case, you build the graph with placeholders for this data and feed it in at run time. Not all calculations or operations require input; for many operations, all the information needed is already contained in the graph.

```python
# Import Tensorflow and Numpy
import tensorflow as tf
import numpy as np

# ======================
# Define the Graph
# ======================

# Create Placeholders For X And Y (for feeding in data)
X = tf.placeholder("float", [10, 10], name="X")  # Our input is 10x10
Y = tf.placeholder("float", [10, 1], name="Y")   # Our output is 10x1

# Create a Trainable Variable, "W", our weights for the linear transformation
initial_W = np.zeros((10, 1))
W = tf.Variable(initial_W, name="W", dtype="float32")

# Define Your Loss Function
Loss = tf.pow(tf.add(Y, -tf.matmul(X, W)), 2, name="Loss")

with tf.Session() as sess:  # set up the session
    sess.run(tf.initialize_all_variables())
    Model_Loss = sess.run(
        Loss,   # the first argument is the Tensorflow tensor you want returned
        {       # the second argument is the data for the placeholders
            X: np.random.rand(10, 10),
            Y: np.random.rand(10, 1),
        })
    print(Model_Loss)
```

How to do multi-task learning with computational graphs

When we create a neural network for multi-task learning, we want some of the neurons in the network to be shared, while other parts are designed individually for the different tasks. When we train, we want each individual task to update the shared neurons.

So first, we sketch a simple two-task network structure with a shared layer and a task-specific layer for each task. We feed these outputs, together with our targets, into the loss functions. I have marked in the diagram where placeholders are needed.

```python
# GRAPH CODE
# ============

# Import Tensorflow
import tensorflow as tf

# ======================
# Define the Graph
# ======================

# Define the Placeholders
X = tf.placeholder("float", [10, 10], name="X")
Y1 = tf.placeholder("float", [10, 1], name="Y1")
Y2 = tf.placeholder("float", [10, 1], name="Y2")

# Define the weights for the layers
shared_layer_weights = tf.Variable(tf.random_normal([10, 20]), name="share_W")
Y1_layer_weights = tf.Variable(tf.random_normal([20, 1]), name="share_Y1")
Y2_layer_weights = tf.Variable(tf.random_normal([20, 1]), name="share_Y2")

# Construct the Layers with RELU Activations
shared_layer = tf.nn.relu(tf.matmul(X, shared_layer_weights))
Y1_layer = tf.nn.relu(tf.matmul(shared_layer, Y1_layer_weights))
Y2_layer = tf.nn.relu(tf.matmul(shared_layer, Y2_layer_weights))

# Calculate Loss
Y1_Loss = tf.nn.l2_loss(Y1 - Y1_layer)
Y2_Loss = tf.nn.l2_loss(Y2 - Y2_layer)
```
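To make the parameter sharing concrete, here is a NumPy sketch of the same forward pass (the weight values are arbitrary; only the shapes matter):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((10, 10))
shared_W = rng.random((10, 20))   # one set of shared weights...
W1 = rng.random((20, 1))          # ...and one head of weights per task
W2 = rng.random((20, 1))

def relu(z):
    return np.maximum(z, 0.0)

shared = relu(X @ shared_W)   # (10, 20): computed once, used by both tasks
y1 = relu(shared @ W1)        # (10, 1): task-1 output
y2 = relu(shared @ W2)        # (10, 1): task-2 output
print(shared.shape, y1.shape, y2.shape)
```

Both task outputs read from the same shared representation, which is exactly what lets gradients from either task's loss flow back into the shared weights.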

When we train the network, we want to be able to train on task 2 without changing the parameters of task 1, while the shared layer's parameters change when training either task. This may seem a little difficult: usually there is only one optimizer in the graph, because you only optimize one loss function. Fortunately, we can exploit the nature of the graph to train the model in either of two ways.

Alternate training

The first solution is particularly useful when you want to train on a batch of task-1 data, then a batch of task-2 data, and so on.

Keep in mind that TensorFlow automatically works out which computations are needed for the operation you request, and runs only those. This means that if we define an optimizer on just one of the losses, it will train only the parameters needed to compute that loss, and leave the rest untouched. Since task 1 depends only on the task-1 layer and the shared layer, the parameters of the task-2 layer do not change. So let's add the required optimizer at the end of each task's branch of the graph.

```python
# GRAPH CODE
# ============

# Import Tensorflow and Numpy
import tensorflow as tf
import numpy as np

# ======================
# Define the Graph
# ======================

# Define the Placeholders
X = tf.placeholder("float", [10, 10], name="X")
Y1 = tf.placeholder("float", [10, 20], name="Y1")
Y2 = tf.placeholder("float", [10, 20], name="Y2")

# Define the weights for the layers
initial_shared_layer_weights = np.random.rand(10, 20)
initial_Y1_layer_weights = np.random.rand(20, 20)
initial_Y2_layer_weights = np.random.rand(20, 20)

shared_layer_weights = tf.Variable(initial_shared_layer_weights, name="share_W", dtype="float32")
Y1_layer_weights = tf.Variable(initial_Y1_layer_weights, name="share_Y1", dtype="float32")
Y2_layer_weights = tf.Variable(initial_Y2_layer_weights, name="share_Y2", dtype="float32")

# Construct the Layers with RELU Activations
shared_layer = tf.nn.relu(tf.matmul(X, shared_layer_weights))
Y1_layer = tf.nn.relu(tf.matmul(shared_layer, Y1_layer_weights))
Y2_layer = tf.nn.relu(tf.matmul(shared_layer, Y2_layer_weights))

# Calculate Loss
Y1_Loss = tf.nn.l2_loss(Y1 - Y1_layer)
Y2_Loss = tf.nn.l2_loss(Y2 - Y2_layer)

# optimisers
Y1_op = tf.train.AdamOptimizer().minimize(Y1_Loss)
Y2_op = tf.train.AdamOptimizer().minimize(Y2_Loss)
```
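The claim that the task-2 parameters are untouched by the task-1 optimizer can be checked directly. In a NumPy sketch of this network (same shapes and loss as above), perturbing the task-2 weights leaves the task-1 loss unchanged, while perturbing the shared weights does not:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10, 10))
Y1 = rng.random((10, 20))
Ws = rng.random((10, 20))   # shared weights
W1 = rng.random((20, 20))   # task-1 weights
W2 = rng.random((20, 20))   # task-2 weights

def relu(z):
    return np.maximum(z, 0.0)

def task1_loss(Ws, W1):
    shared = relu(X @ Ws)
    y1_pred = relu(shared @ W1)
    return 0.5 * np.sum((Y1 - y1_pred) ** 2)   # same form as tf.nn.l2_loss

base = task1_loss(Ws, W1)
W2 += 100.0   # drastically perturb the task-2 weights
assert task1_loss(Ws, W1) == base         # task-1 loss is unaffected by W2
assert task1_loss(Ws + 0.1, W1) != base   # but it does depend on the shared weights
```

Because the task-1 loss has no dependence on W2, its gradient with respect to W2 is zero, which is exactly why the task-1 optimizer leaves the task-2 layer alone.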

We can then do multi-task learning by calling each task's optimizer alternately, so some information from each task is constantly passed to the other through the shared layer. Loosely speaking, we are discovering the "commonality" between the tasks. The following code implements this process. If you have followed along step by step, you can run it directly:

```python
# Calculation (Session) Code
# ==========================

# open the session
with tf.Session() as session:
    session.run(tf.initialize_all_variables())
    for iters in range(10):
        if np.random.rand() < 0.5:
            _, Y1_loss = session.run([Y1_op, Y1_Loss],
                            {
                                X: np.random.rand(10, 10) * 10,
                                Y1: np.random.rand(10, 20) * 10,
                                Y2: np.random.rand(10, 20) * 10,
                            })
            print(Y1_loss)
        else:
            _, Y2_loss = session.run([Y2_op, Y2_Loss],
                            {
                                X: np.random.rand(10, 10) * 10,
                                Y1: np.random.rand(10, 20) * 10,
                                Y2: np.random.rand(10, 20) * 10,
                            })
            print(Y2_loss)
```

Hint: When is alternate training a good idea?

Alternate training is a good idea when you have a different dataset for each task (e.g., translating from English to French and from English to German). By designing the network this way, we can improve the performance of each task without having to hunt for more training data.

Alternate training is the most common situation, because often there is no single dataset that covers both tasks. Take an example from machine vision: one task might be to predict whether an object has been rotated, while another might involve the position of the camera. The two tasks are obviously related.

Hint: When is alternate training not a good idea?

Alternate training can easily become biased towards a specific task. The first way this happens is obvious: if one of your tasks has a much larger dataset than the others, and you train in proportion to dataset size, your shared layer will end up containing more information about the task with more data.
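A quick sketch of this bias (the dataset sizes below are hypothetical): if you pick the task for each training step in proportion to dataset size, the larger task dominates the shared-layer updates, while sampling the tasks uniformly is a simple counter-measure:

```python
import numpy as np

rng = np.random.default_rng(42)
n_task1, n_task2 = 90_000, 10_000   # hypothetical, imbalanced dataset sizes

# Picking the task for each training step in proportion to dataset size
p_task1 = n_task1 / (n_task1 + n_task2)
proportional = rng.random(10_000) < p_task1
print(proportional.mean())   # around 0.9: task 1 gets roughly 9x the updates

# Simple counter-measure: pick each task with equal probability per step
uniform = rng.random(10_000) < 0.5
print(uniform.mean())        # around 0.5
```

Which sampling scheme is right depends on whether you care about the tasks equally; uniform sampling effectively up-weights the smaller task.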

The second way is subtler: if you always alternate the tasks in the same order, the task trained last will bias the shared parameters. There is no obvious way to overcome this, but it does suggest using the second training method, joint training, whenever you don't strictly need to alternate.

Joint training

When you have a dataset with multiple labels per input, what you really want is to train the tasks jointly. The question is: how do you keep the tasks independent of one another? The answer is very simple: you just add the individual task losses together and optimize the sum. Here is a diagram showing a network structure that supports joint training, with the code below:

```python
# GRAPH CODE
# ============

# Import Tensorflow and Numpy
import tensorflow as tf
import numpy as np

# ======================
# Define the Graph
# ======================

# Define the Placeholders
X = tf.placeholder("float", [10, 10], name="X")
Y1 = tf.placeholder("float", [10, 20], name="Y1")
Y2 = tf.placeholder("float", [10, 20], name="Y2")

# Define the weights for the layers
initial_shared_layer_weights = np.random.rand(10, 20)
initial_Y1_layer_weights = np.random.rand(20, 20)
initial_Y2_layer_weights = np.random.rand(20, 20)

shared_layer_weights = tf.Variable(initial_shared_layer_weights, name="share_W", dtype="float32")
Y1_layer_weights = tf.Variable(initial_Y1_layer_weights, name="share_Y1", dtype="float32")
Y2_layer_weights = tf.Variable(initial_Y2_layer_weights, name="share_Y2", dtype="float32")

# Construct the Layers with RELU Activations
shared_layer = tf.nn.relu(tf.matmul(X, shared_layer_weights))
Y1_layer = tf.nn.relu(tf.matmul(shared_layer, Y1_layer_weights))
Y2_layer = tf.nn.relu(tf.matmul(shared_layer, Y2_layer_weights))

# Calculate Loss
Y1_Loss = tf.nn.l2_loss(Y1 - Y1_layer)
Y2_Loss = tf.nn.l2_loss(Y2 - Y2_layer)
Joint_Loss = Y1_Loss + Y2_Loss

# optimisers
Optimiser = tf.train.AdamOptimizer().minimize(Joint_Loss)
Y1_op = tf.train.AdamOptimizer().minimize(Y1_Loss)   # per-task optimisers (not used below)
Y2_op = tf.train.AdamOptimizer().minimize(Y2_Loss)

# Joint Training
# Calculation (Session) Code
# ==========================

# open the session
with tf.Session() as session:
    session.run(tf.initialize_all_variables())
    _, joint_loss = session.run([Optimiser, Joint_Loss],
                    {
                        X: np.random.rand(10, 10) * 10,
                        Y1: np.random.rand(10, 20) * 10,
                        Y2: np.random.rand(10, 20) * 10,
                    })
    print(joint_loss)
```

Conclusion

In this article, we have looked at the basic principles of multi-task learning in deep learning. If you have used TensorFlow before and have your own project, hopefully this is enough to get you started.

For readers who want more detail, and more examples of how to use these techniques to improve multi-task performance, check out the related papers yourself.


Source: making