
Deep Learning with Python

This article is one of a series of notes I wrote while studying Deep Learning with Python (2nd edition, by François Chollet). These posts are converted from the original Jupyter notebooks to Markdown; you can find the original .ipynb notebooks on GitHub or Gitee.

You can read the original text of the book online (in English), and the book's author has also published the accompanying Jupyter notebooks.

This is one of the notes for Chapter 7, Advanced Deep Learning Best Practices.

7.1 Going beyond the Sequential model: the Keras functional API

The solution when a Sequential model won't do: the Keras functional API

The Sequential model we have used so far is the most basic and most common kind of model: it has exactly one input and one output, and the network is a linear stack of layers.

However, sometimes our networks need more than one input. For example, to predict the price of a piece of clothing we might feed in structured commodity information, a text description, and a picture; these three kinds of input should be processed by a Dense module, an RNN, and a CNN respectively. A merging module then integrates the extracted information to predict the final price, roughly as sketched below.
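
Although the functional API is only introduced below, a minimal sketch of such a three-branch model may help make the idea concrete. All input shapes, layer sizes, and names here are made up for illustration:

from tensorflow.keras import Input, layers
from tensorflow.keras.models import Model

# Three inputs, each handled by a suitable branch (hypothetical shapes)
metadata_input = Input(shape=(16,), name='metadata')           # structured commodity information
text_input = Input(shape=(None,), dtype='int32', name='text')  # text description (word indices)
image_input = Input(shape=(64, 64, 3), name='image')           # product picture (RGB)

meta_features = layers.Dense(32, activation='relu')(metadata_input)        # Dense branch
text_features = layers.LSTM(32)(layers.Embedding(10000, 64)(text_input))  # RNN branch
image_features = layers.GlobalAveragePooling2D()(
    layers.Conv2D(32, 3, activation='relu')(image_input))                 # CNN branch

# Merge the three branches and regress the price
merged = layers.concatenate([meta_features, text_features, image_features])
price = layers.Dense(1, name='price')(merged)

model = Model([metadata_input, text_input, image_input], price)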

Sometimes our network needs multiple outputs (multiple heads). For example, given the text of a novel, we may want both a classification of the novel and a guess at when it was written. Such a model should use a common module to process the text and extract a representation, then hand that representation to a novel classifier and a date regressor, which predict the category and the writing date respectively.

Sometimes, complex networks use a nonlinear network topology. For example, in architectures such as Inception, the input is processed by multiple parallel convolution branches, and the outputs of those branches are then merged into a single tensor. There is also a technique called a residual connection, in which an earlier output tensor is added to a later output tensor, re-injecting earlier representations into the downstream flow of data and preventing information loss along the way.

These networks are graph-like structures, not Sequential linear stacks. To implement such complex models in Keras, you need to use the Keras functional API.

Functional API

Keras's functional API treats layers as functions that receive and return tensors:

from tensorflow.keras import Input, layers

input_tensor = Input(shape=(32,))   # Input tensor
dense = layers.Dense(32, activation='relu')    # layer function
output_tensor = dense(input_tensor)   # Output tensor

Let's build a simple network with the functional API and compare it with the Sequential equivalent:

# Sequential model

from tensorflow.keras.models import Sequential
from tensorflow.keras import layers

seq_model = Sequential()
seq_model.add(layers.Dense(32, activation='relu', input_shape=(64, )))
seq_model.add(layers.Dense(32, activation='relu'))
seq_model.add(layers.Dense(10, activation='softmax'))

seq_model.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense_1 (Dense) (None, 32) 2080 _________________________________________________________________ dense_2 (Dense) (None, 32) 1056 _________________________________________________________________ dense_3 (Dense) (None, 10) 330 = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = Total params: 3466 Trainable params: 3466 Non - trainable params: 0 _________________________________________________________________Copy the code
# Functional API model

from tensorflow.keras.models import Model
from tensorflow.keras import Input
from tensorflow.keras import layers

input_tensor = Input(shape=(64, ))
x = layers.Dense(32, activation='relu')(input_tensor)
x = layers.Dense(32, activation='relu')(x)
output_tensor = layers.Dense(10, activation='softmax')(x)

func_model = Model(input_tensor, output_tensor)

func_model.summary()
Model: "functional_1" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_2 (InputLayer) [(None, 64)] 0 _________________________________________________________________ dense_4 (Dense) (None, 32) 2080 _________________________________________________________________ dense_5 (Dense) (None, 32) 1056 _________________________________________________________________ dense_6 (Dense) (None, 10) 330 = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = Total params: 3466 Trainable params: 3466 Non - trainable params: 0 _________________________________________________________________Copy the code

When instantiating a Model object, you provide only the input tensor(s) and the output tensor(s) obtained by transforming the inputs through various layers. Keras automatically retrieves every layer on the way from input_tensor to output_tensor and combines them into a graph-like data structure: a Model.
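
As a quick sanity check (using the func_model built above), you can list the layers Keras collected into this graph:

# The model records every layer on the path from input to output
print([layer.name for layer in func_model.layers])
# e.g. ['input_2', 'dense_4', 'dense_5', 'dense_6']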

Note that the output_tensor must be obtained by transforming the corresponding input_tensor. If you build a Model from unrelated input and output tensors, you will get a Graph disconnected ValueError:

>>> unrelated_input = Input(shape=(32,))
>>> bad_model = Model(unrelated_input, output_tensor)
...  # Traceback
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_2:0", shape=(None, 64), dtype=float32) at layer "dense_4". The following previous layers were accessed without issue: []

In other words, Keras has no way to build a graph from the specified input to the specified output.

A network built with the functional API is compiled, trained, and evaluated in the same way as a Sequential model:

# Compile
func_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

# Random training data
import numpy as np
x_train = np.random.random((1000, 64))
y_train = np.random.random((1000, 10))

# Train
func_model.fit(x_train, y_train, epochs=10, batch_size=128)

# Evaluate
score = func_model.evaluate(x_train, y_train)
Epoch 1/10
8/8 [==============================] - 0s 1ms/step - loss: 33.5245
...
Epoch 10/10
8/8 [==============================] - 0s 685us/step - loss: 74.5707
32/32 [==============================] - 0s 617us/step - loss: 78.1296

Multiple input model

The functional API can be used to build models with multiple inputs. Such models usually combine their different branches at some point with a layer that can merge tensors. Tensors can be merged by addition, concatenation, and so on; Keras provides layers such as keras.layers.add and keras.layers.concatenate for this.
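
As a quick illustration of the difference (shapes chosen arbitrarily): add merges tensors element-wise and keeps the shape, while concatenate joins them along an axis:

from tensorflow.keras import Input, layers

a = Input(shape=(32,))
b = Input(shape=(32,))

added = layers.add([a, b])                    # element-wise sum: shape (None, 32)
joined = layers.concatenate([a, b], axis=-1)  # joined along the last axis: shape (None, 64)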

Let's look at a concrete example: a question-answering model. A typical QA model takes two inputs:

  • The question text
  • A reference text providing the information needed to answer the question (such as a relevant news article)

The model generates (outputs) an answer. In the simplest case the answer is a single word, which we can obtain by applying a softmax over some predefined vocabulary at the output.

To implement this model with the functional API, we first build two independent branches that encode the reference text and the question as vectors, then concatenate these two vectors, and finally add a softmax classifier on top of the concatenated representation:

from tensorflow.keras.models import Model
from tensorflow.keras import layers
from tensorflow.keras import Input

text_vocabulary_size = 10000
question_vocabulary_size = 10000
answer_vocabulary_size = 500

# Reference text branch
text_input = Input(shape=(None, ), dtype='int32', name='text')
embedded_text = layers.Embedding(text_vocabulary_size, 64)(text_input)
encoded_text = layers.LSTM(32)(embedded_text)

# Question branch
question_input = Input(shape=(None, ), dtype='int32', name='question')
embedded_question = layers.Embedding(question_vocabulary_size, 32)(question_input)
encoded_question = layers.LSTM(16)(embedded_question)

# Merge the reference and question branches
concatenated = layers.concatenate([encoded_text, encoded_question], axis=-1)

# Top-level classifier
answer = layers.Dense(answer_vocabulary_size, activation='softmax')(concatenated)

model = Model([text_input, question_input], answer, name='QA')

model.summary()

model.compile(optimizer='rmsprop', 
              loss='categorical_crossentropy', 
              metrics=['acc'])

When training a multi-input model, you can pass the inputs as a list, or as a dictionary keyed by input name (for inputs that were given names):

import numpy as np

num_samples = 1000
max_length = 100

text = np.random.randint(1, text_vocabulary_size, 
                         size=(num_samples, max_length))
question = np.random.randint(1, question_vocabulary_size, 
                             size=(num_samples, max_length))
answers = np.eye(answer_vocabulary_size)[
    np.random.randint(0, answer_vocabulary_size, size=num_samples)]  # one-hot encoded

# Method 1. Pass the list
model.fit([text, question], answers, epochs=2, batch_size=128)

# Method 2. Pass the dictionary
model.fit({'text': text, 'question': question}, answers, epochs=2, batch_size=128)

Multiple output model

It is also convenient to use the functional API to build models with multiple outputs (heads). For example, let's build a network that tries to predict several different properties of the data simultaneously: given some of a person's social media posts, predict that person's age, gender, and income level.

The implementation is simple: just attach three different outputs at the end:

from tensorflow.keras.layers import Conv1D, MaxPooling1D, GlobalMaxPool1D, Dense, Embedding
from tensorflow.keras import Input
from tensorflow.keras.models import Model

vocabulary_size = 50000
num_income_groups = 10

posts_input = Input(shape=(None,), dtype='int32', name='posts')
embedded_post = Embedding(vocabulary_size, 256)(posts_input)
x = Conv1D(128, 5, activation='relu')(embedded_post)
x = MaxPooling1D(5)(x)
x = Conv1D(256, 5, activation='relu')(x)
x = Conv1D(256, 5, activation='relu')(x)
x = MaxPooling1D(5)(x)
x = Conv1D(256, 5, activation='relu')(x)
x = Conv1D(256, 5, activation='relu')(x)
x = GlobalMaxPool1D()(x)
x = Dense(128, activation='relu')(x)

# Define multiple heads (outputs)
age_prediction = Dense(1, name='age')(x)
income_prediction = Dense(num_income_groups, activation='softmax', name='income')(x)
gender_prediction = Dense(1, activation='sigmoid', name='gender')(x)

model = Model(posts_input, [age_prediction, income_prediction, gender_prediction])

model.summary()

Compilation of the multi-head model

When compiling this model, note that a different loss function needs to be specified for each head of the network, because the objectives differ.

However, gradient descent can only minimize a single scalar. So in Keras, the different losses specified for different outputs at compile time are summed into one global loss during training, and the training process minimizes this global loss.

In this case, if the loss contributions are severely unbalanced, the model will prioritize the task with the largest loss value at the expense of the other tasks. To solve this, the different losses can be weighted. For example, the MSE loss typically takes values around 3-5, while the binary_crossentropy loss can be as low as 0.1. To balance the loss contributions, we can give binary_crossentropy a weight of 10 and the MSE loss a weight of 0.25.
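
With the example numbers above, the global loss that gradient descent actually minimizes is just the weighted sum; a small illustration with made-up per-head loss values:

# Hypothetical per-head loss values, for illustration only
mse_loss = 4.0   # age head, typically around 3-5
cce_loss = 1.5   # income head
bce_loss = 0.1   # gender head, can be as low as 0.1

# Weighted global loss, using the weights chosen above
global_loss = 0.25 * mse_loss + 1. * cce_loss + 10. * bce_loss  # = 3.5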

Multiple losses and weights are specified using lists or dictionaries:

model.compile(optimizer='rmsprop',
              loss=['mse', 'categorical_crossentropy', 'binary_crossentropy'],
              loss_weights=[0.25, 1., 10.])

# Or, since the output layers have names, use dictionaries:
model.compile(optimizer='rmsprop',
              loss={'age': 'mse', 'income': 'categorical_crossentropy', 'gender': 'binary_crossentropy'},
              loss_weights={'age': 0.25, 'income': 1., 'gender': 10.})

Training of multi-head models

To train this model, simply pass the target outputs as a list or a dictionary:

model.fit(posts, [age_targets, income_targets, gender_targets],
          epochs=10, batch_size=64)

# or

model.fit(posts, {'age': age_targets,
                  'income': income_targets,
                  'gender': gender_targets},
          epochs=10, batch_size=64)
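
The listing above assumes that posts and the three target arrays already exist. For a self-contained test, one could generate random placeholder data shaped to match the model, for example:

import numpy as np

num_samples = 1000
max_length = 100

posts = np.random.randint(1, vocabulary_size, size=(num_samples, max_length))
age_targets = np.random.randint(18, 80, size=(num_samples, 1))   # regression targets
income_targets = np.eye(num_income_groups)[
    np.random.randint(0, num_income_groups, size=num_samples)]   # one-hot class targets
gender_targets = np.random.randint(0, 2, size=(num_samples, 1))  # binary targets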

Directed acyclic graphs of layers

With the functional API, in addition to building multi-input and multi-output models, we can also implement networks with complex internal topologies.

In fact, a neural network in Keras can be any directed acyclic graph of layers. Two well-known graph-structured components are the Inception module and residual connections.

Inception module

Inception is a stack of modules, each of which looks like a small independent network; each module is split into multiple parallel branches whose resulting features are joined together at the end. This lets the network learn spatial features and channel-wise features of the image separately, which is more efficient than learning them jointly with a single network.

Here is a simplified Inception V3 module:

Note: the 1×1 convolution used here, known as a pointwise convolution, is a hallmark of the Inception module. It looks at one pixel at a time, so the features it computes mix information across the input channels without mixing in any spatial information. This helps separate the learning of channel-wise features from the learning of spatial features.
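
A quick way to see this (feature-map shape made up for illustration): a 1×1 convolution changes only the number of channels and leaves the spatial dimensions untouched:

from tensorflow.keras import Input, layers

feature_map = Input(shape=(28, 28, 192))        # hypothetical feature map
pointwise = layers.Conv2D(64, 1)(feature_map)   # mixes channels, one pixel at a time
print(pointwise.shape)                          # (None, 28, 28, 64): spatial size unchanged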

This can be implemented with the functional API:

from tensorflow.keras import Input, layers

x = Input(shape=(None, None, 3))    # RGB images

branch_a = layers.Conv2D(128, 1, activation='relu', strides=2, padding='same')(x)

branch_b = layers.Conv2D(128, 1, activation='relu', padding='same')(x)
branch_b = layers.Conv2D(128, 3, activation='relu', strides=2, padding='same')(branch_b)

branch_c = layers.AveragePooling2D(3, strides=2, padding='same')(x)
branch_c = layers.Conv2D(128, 3, activation='relu', padding='same')(branch_c)

branch_d = layers.Conv2D(128, 1, activation='relu', padding='same')(x)
branch_d = layers.Conv2D(128, 3, activation='relu', padding='same')(branch_d)
branch_d = layers.Conv2D(128, 3, activation='relu', strides=2, padding='same')(branch_d)

output = layers.concatenate([branch_a, branch_b, branch_c, branch_d], axis=-1)

In fact, Keras ships with the complete Inception V3 architecture built in; it can be loaded via keras.applications.inception_v3.InceptionV3.
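
For example, a randomly initialized Inception V3 can be created like this (pass weights='imagenet' instead to download the pre-trained weights):

from tensorflow.keras.applications.inception_v3 import InceptionV3

inception = InceptionV3(weights=None, include_top=True, classes=1000)
inception.summary()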

Related to Inception, there is also an architecture called Xception, whose name stands for extreme inception. It takes the Inception idea to its extreme, completely separating the learning of channel-wise features from the learning of spatial features. Xception has roughly the same number of parameters as Inception V3, but it uses them more efficiently, so it achieves better performance and higher accuracy on large datasets.

Residual connection

A residual connection is a commonly used component that addresses two problems that plague large-scale deep-learning models: vanishing gradients and representational bottlenecks. In general, adding residual connections to any model with more than 10 layers is likely to help.

  • Vanishing gradients: after many layers, previously learned representations become blurred or are lost entirely, and the network can no longer be trained.
  • Representational bottlenecks: layers are stacked, so a later layer can access only what the previous layer has learned. If one layer is too small (too little information can be crammed into its activations), the whole model is held back by it, and a bottleneck occurs.

A residual connection feeds the output of an earlier layer into a later layer, creating a shortcut in the network. The earlier output is not concatenated with the later activation but added to it (if the shapes differ, a linear transformation first reshapes the earlier activation to the target shape).

Note: the linear transformation can be a Dense layer without an activation or, in a CNN, a 1×1 convolution without an activation.

from tensorflow.keras import layers

x = ...

y = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.MaxPooling2D(2, strides=2)(y)

# A linear transformation is required when the shapes differ:
residual = layers.Conv2D(128, 1, strides=2, padding='same')(x)  # Use a 1×1 convolution to linearly downsample x to the same shape as y

y = layers.add([y, residual])
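
When the feature-map shapes already match, no linear transformation is needed and the earlier output can be added back directly; a minimal sketch of this identity variant (assuming x already has 128 channels):

from tensorflow.keras import layers

x = ...  # a 4D feature-map tensor with 128 channels

y = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)

y = layers.add([y, x])  # shapes match, so x is added back as-is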

Shared layer weights

Another thing the functional API allows is reusing a layer instance several times. If the same layer instance is called multiple times, the same weights are reused with each call. This feature lets us build models with shared branches, where several branches share the same knowledge and perform the same operations.

For example, suppose we want to evaluate the semantic similarity between two sentences. The model takes two sentences as input and outputs a score between 0 and 1; the larger the value, the more similar the two sentences' meanings.

In this problem, the two input sentences are interchangeable (the similarity of sentence A to sentence B equals the similarity of B to A). Therefore we should not learn two separate models to process the two inputs. Instead, a single LSTM layer should process both sentences, its representations (weights) learned from both inputs simultaneously. This model is called a Siamese LSTM, or shared LSTM.

Such a model is implemented using layer sharing in the Keras functional API:

from tensorflow.keras import layers
from tensorflow.keras import Input
from tensorflow.keras.models import Model

lstm = layers.LSTM(32)  # Instantiate a single LSTM layer

left_input = Input(shape=(None, 128))
left_output = lstm(left_input)

right_input = Input(shape=(None, 128))
right_output = lstm(right_input)

merged = layers.concatenate([left_output, right_output], axis=-1)
predictions = layers.Dense(1, activation='sigmoid')(merged)

model = Model([left_input, right_input], predictions)

model.summary()
Model: "functional_8" __________________________________________________________________________________________________ Layer (type) Output Shape Param # Connected to ================================================================================================== input_13 (InputLayer)  [(None, None, 128)] 0 __________________________________________________________________________________________________ input_14 (InputLayer) [(None, None, 128)] 0 __________________________________________________________________________________________________ lstm_15 (LSTM) (None, 32) 20608 input_13[0][0] input_14[0][0] __________________________________________________________________________________________________ concatenate_13 (Concatenate) (None, 64) 0 lstm_15[0][0] lstm_15[1][0] __________________________________________________________________________________________________ dense_15 (Dense) (None, 1) 65 concatenate_13[0][0] ================================================================================================== Total params: Non-trainable Params: 20,673 0 __________________________________________________________________________________________________Copy the code

Treat the model as a layer

In Keras, a model can be used as if it were a layer (you can think of a model as one big layer). Both the Sequential class and the Model class can be used this way: just call the model on a tensor, like a function:

y = model(x)
y1, y2 = model_with_multi_inputs_and_outputs([x1, x2])

For example, consider a vision model that takes two cameras as its inputs (such a model can perceive depth). We implement the network by using the applications.Xception model as a layer, sharing it across both inputs as in the previous section:

from tensorflow.keras import layers
from tensorflow.keras import Input
from tensorflow.keras import applications

xception_base = applications.Xception(weights=None, include_top=False)

left_input = Input(shape=(250, 250, 3))
right_input = Input(shape=(250, 250, 3))

left_features = xception_base(left_input)
right_features = xception_base(right_input)

merged_features = layers.concatenate([left_features, right_features], axis=-1)
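
The listing stops at the merged features. To turn it into a usable model, you would still add a head and wrap everything in a Model; here is one hypothetical way to finish it, with a made-up single-value depth head:

from tensorflow.keras import layers
from tensorflow.keras.models import Model

# Collapse the merged feature map and regress a single depth-related value
pooled = layers.GlobalAveragePooling2D()(merged_features)
depth_output = layers.Dense(1, name='depth')(pooled)

model = Model([left_input, right_input], depth_output)
model.summary()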

Conclusion

With the Keras functional API, we can flexibly build complex neural networks structured as directed acyclic graphs, such as:

  • Multiple input model
  • Multiple output model
  • Residual connection
  • Shared layer weights

These techniques can be used to produce models that are more powerful than simple Sequential neural networks.