The application of RNN in Alibaba's DIEN
0x00 Introduction
This article is based on the DIEN code from Alibaba's recommendation system. It sorts out some RNN concepts as well as some of the related TensorFlow source code. The goal is to help you understand each step in detail and why it is done.
0x01 Background
1.1 RNN
RNN stands for Recurrent Neural Network.
People do not think from scratch. For example, when we read, our understanding of each word depends on the information we have already seen, rather than discarding everything read so far. Applied to deep learning, if we want to learn something that depends on earlier context, an RNN can do it: it has a recurrent operation that lets it retain what it has learned previously.
The most common way to define an RNN is:
output = new_state = f(W * input + U * state + B) = act(W * input + U * state + B)
- U and W are the network's weight matrices and B is the bias; these parameters are learned by training the network with backpropagation.
- act is the activation function; usually sigmoid or tanh is chosen.
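As a minimal sketch of that formula (plain NumPy, not the TensorFlow implementation; the shapes and names are illustrative), a single RNN step looks like this:

import numpy as np

def rnn_step(x, state, W, U, B, act=np.tanh):
    # output = new_state = act(W * input + U * state + B)
    return act(x @ W + state @ U + B)

input_size, state_size = 4, 3
x = np.random.randn(1, input_size)           # one input sample
h = np.zeros((1, state_size))                # initial state h0
W = np.random.randn(input_size, state_size)
U = np.random.randn(state_size, state_size)
B = np.zeros(state_size)

h = rnn_step(x, h, W, U, B)                  # h1, used both as output and as the next state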
1.2 DIEN project code
In the DIEN project, TensorFlow's RNN code is copied into the project and modified, specifically:
- GRUCell is used;
- a custom VecAttGRUCell is added;
- rnn.py is modified, because the VecAttGRUCell interface changed.
0x02 Cell
The basic unit of an RNN is called a Cell. Don't underestimate this small Cell: it contains not just one neuron but N hidden units.
This is why, when TensorFlow defines a cell (BasicRNNCell / BasicLSTMCell / GRUCell / RNNCell / LSTMCell), the parameter you must provide is the hidden unit size (num_units).
In an actual neural network, each gate's processing function is in fact computed by a certain number of hidden-layer neurons.
In an RNN, the actual function of a hidden layer composed of M neurons is f(Wx + b), which is realized in two steps:
- First, the M hidden-layer neurons are fully connected to the input vector x: the weight matrix W computes a weighted sum over the dimensions of x.
- Second, a bias is added and the activation function f is applied, producing the hidden layer's output.
In an LSTM cell, one cell contains several gate processing functions. If each gate is physically realized by num_hidden neurons, then each gate has its own weight matrix W and bias, which is what the figure above shows.
As can be seen from the diagram, a cell has four gates, and each gate corresponds to 128 hidden-layer neurons, which is equivalent to four hidden layers. Each hidden layer is fully connected to the input x, and the input vector x consists of two parts: the cell's output at the previous moment (size 128) and the current input sample vector (size 6). After the cell's internal computation, the output at the current moment is obtained; its size is 128, i.e. num_hidden, and it is used as part of the input to the cell at the next moment.
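As a rough shape check, here is a sketch using the numbers from that example (num_hidden = 128, sample input size 6; the variable names are illustrative):

import numpy as np

num_hidden = 128          # size of the hidden state (also the cell's output size)
input_size = 6            # size of the current input sample vector
concat_size = input_size + num_hidden   # x_t concatenated with h_{t-1}: 6 + 128 = 134

# Each of the four LSTM gates has its own weight matrix and bias of these shapes:
W_gate_shape = (concat_size, num_hidden)   # (134, 128)
b_gate_shape = (num_hidden,)               # (128,)

params_per_gate = concat_size * num_hidden + num_hidden
total_params = 4 * params_per_gate
print(total_params)       # 4 * (134 * 128 + 128) = 69120 trainable parameters in one cell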
The following is a detailed analysis of the implementation mechanism and principle of Cell based on TensorFlow.
2.1 RNNCell (Abstract parent)
2.1.1 Basics
RNNCell is the basic unit for implementing an RNN in TensorFlow. Every RNNCell has a call method, used as (output, next_state) = call(input, state).
RNNCell is an abstract parent class that all other RNNCells inherit from; the subclasses then implement the call() function.
An RNNCell is an object that holds a state and can process an input matrix. The RNNCell transforms the input matrix into an output matrix with "self.output_size" columns.
About state: state is the state of the RNN cell inside the network. If "self.state_size" is an integer N, then every time you run the RNN you should supply a 2-D tensor of shape [batch_size, self.state_size]; if "self.state_size" is a tuple, the state you pass in should also be a tuple, with each element shaped as before.
- If the "self.state_size" attribute is defined as an integer, RNNCell also outputs a state matrix with "self.state_size" columns.
- If "self.state_size" is defined as a tuple of integers, the output is a tuple of state matrices of the corresponding length, each with the number of columns given by the matching entry of "self.state_size".
RNNCell provides the zero_state() and call() functions.
- zero_state() initializes the initial state h0 as an all-zero vector.
- call() defines the actual RNNCell operations (for example, a vanilla RNN has one activation, a GRU has two gates, an LSTM has three gates, and so on; the differences between RNN variants live mainly in this function).
Besides the call method, two class properties are important for RNNCell: state_size and output_size. Both are defined as methods but exposed as properties via Python's built-in @property decorator (calling a method as if it were an attribute, like a C# property or field):
- state_size: the size of the hidden state (the state size of the cell);
- output_size: the size of the output (the output dimension).
For example, we usually feed a batch into the model. If the input data has shape (batch_size, input_size), then the hidden state obtained during computation has shape (batch_size, state_size) and the output has shape (batch_size, output_size).
Neither property is implemented in RNNCell itself, which means we have to write a subclass that inherits from RNNCell and implements both.
class RNNCell(base_layer.Layer):

  def __call__(self, inputs, state, scope=None):
    if scope is not None:
      with vs.variable_scope(scope,
                             custom_getter=self._rnn_get_variable) as scope:
        return super(RNNCell, self).__call__(inputs, state, scope=scope)
    else:
      with vs.variable_scope(vs.get_variable_scope(),
                             custom_getter=self._rnn_get_variable):
        return super(RNNCell, self).__call__(inputs, state)

  def _rnn_get_variable(self, getter, *args, **kwargs):
    variable = getter(*args, **kwargs)
    if context.in_graph_mode():
      trainable = (variable in tf_variables.trainable_variables() or
                   (isinstance(variable, tf_variables.PartitionedVariable) and
                    list(variable)[0] in tf_variables.trainable_variables()))
    else:
      trainable = variable._trainable  # pylint: disable=protected-access
    if trainable and variable not in self._trainable_weights:
      self._trainable_weights.append(variable)
    elif not trainable and variable not in self._non_trainable_weights:
      self._non_trainable_weights.append(variable)
    return variable

  @property
  def state_size(self):
    raise NotImplementedError("Abstract method")

  @property
  def output_size(self):
    raise NotImplementedError("Abstract method")

  def build(self, _):
    pass

  def zero_state(self, batch_size, dtype):
    with ops.name_scope(type(self).__name__ + "ZeroState", values=[batch_size]):
      state_size = self.state_size
      return _zero_state_tensors(state_size, batch_size, dtype)
2.1.2 call
Each derived RNNCell must have the properties above and implement a function with the signature (output, next_state) = call(input, state). The optional third parameter, scope, exists for backward compatibility and for customization by subclasses; it is a variable scope used to make variable management easier.
call runs one step of the RNN cell from the given state, based on the cell's input.
Args:
- inputs: shape [batch_size, input_size].
- state: if self.state_size is an integer, state should be a 2-D tensor of shape [batch_size, self.state_size]; otherwise, if self.state_size is a tuple of integers (for example, an LSTM must track both the cell state and the hidden state, so it uses a tuple), state should be a tuple of tensors of shape [batch_size, s] for each s in self.state_size.
- scope: variable scope for variables created by subclasses.
Returns:
- output: [batch_size, self.output_size];
- state: a tensor (or tuple) matching the shape of the input state.
Each call to RNNCell's call method "advances" one step in time; this is RNNCell's basic function.
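To make "one step per call" concrete, here is a sketch of stepping a cell manually with a Python loop (TensorFlow 1.x style, illustrative shapes); dynamic_rnn, discussed in 0x03, does exactly this loop for you:

import tensorflow as tf

batch_size, time_steps, input_size, num_units = 32, 10, 8, 16

inputs = tf.placeholder(tf.float32, [batch_size, time_steps, input_size])
cell = tf.nn.rnn_cell.BasicRNNCell(num_units=num_units)
state = cell.zero_state(batch_size, tf.float32)   # h0: (batch_size, num_units)

outputs = []
for t in range(time_steps):
    # each call advances one step: (output, next_state) = call(input, state)
    output, state = cell(inputs[:, t, :], state)
    outputs.append(output)
# outputs: a list of time_steps tensors, each of shape (batch_size, num_units)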
2.2 BasicRNNCell (Basic)
2.2.1 Basics
RNNCell is an abstract class; in practice we mostly use its subclasses BasicRNNCell and BasicLSTMCell. As the names imply, the former is the basic cell for a vanilla RNN and the latter for an LSTM.
BasicRNNCell is what we usually mean by "RNN". The simplest RNN structure is shown in the figure above. The code is as follows:
class BasicRNNCell(RNNCell):

  def __init__(self, num_units, activation=None, reuse=None):
    super(BasicRNNCell, self).__init__(_reuse=reuse)
    self._num_units = num_units
    self._activation = activation or math_ops.tanh
    self._linear = None

  @property
  def state_size(self):
    return self._num_units

  @property
  def output_size(self):
    return self._num_units

  def call(self, inputs, state):
    """Most basic RNN: output = new_state = act(W * input + U * state + B)."""
    if self._linear is None:
      self._linear = _Linear([inputs, state], self._num_units, True)
    output = self._activation(self._linear([inputs, state]))
    # output = Ht = tanh([x, Ht-1] * W + B)
    # One copy of output is used as the state Ht fed to the next time step,
    # the other is passed on as the output of this layer.
    return output, output
2.2.2 Meaning of Parameters
You can see that __init__ takes several arguments:
def __init__(self, num_units, activation=None, reuse=None):
The most important parameter of __init__ is num_units, the number of neurons in the cell. activation is the activation function, defaulting to tanh, and reuse determines whether the cell can be reused.
We know that a basic RNN unit has three trainable parameters W, U, B and two input variables, so we need to specify the dimensions of these parameters when constructing the RNN.
Note that in the figure above, n represents the input dimension dim.
For BasicRNNCell:
- state_size is num_units: def state_size(self): return self._num_units
- output_size is num_units: def output_size(self): return self._num_units
- state_size and output_size are defined to be the same, and Ht and the output are also the same (call returns two identical values: return output, output; in other words, it does not define a separate output part).
- As can be seen from _Linear, output_size is the dimension of the bias B (_Linear is explained below).
2.2.3 Function
The main logic is what the first-line comment of the call function says: the input and the previous state go through a linear function and then through an activation function, which is the most common RNN definition. That is,
output = new_state = f(W * input + U * state + B) = act(W * input + U * state + B)
The state_size() and output_size() methods both return num_units, the number of neurons.
Next, in the call() method:
- inputs and state are the input x and the previous hidden state;
- _Linear performs the linear transformation W * [inputs, state] + B, i.e. output = new_state = tanh(W * input + U * state + B);
- besides _Linear, BasicRNNCell's call() also applies _activation(), i.e. a tanh activation on top of the linear transformation, as the output;
- the returned values are output and output: the first is the output, the second is the hidden state, which here equals the output.
2.2.4 Linear
The _Linear class is used here, so let's introduce it now.
This class takes [inputs, state] as args; its __call__() method concatenates them, multiplies by the weight matrix with matmul(), and then applies bias_add(), thereby performing the linear transformation.
output_size is the size of the output layer. As we can see:
- in BasicRNNCell, output_size is _num_units;
- in GRUCell, it is 2 * _num_units;
- in BasicLSTMCell, it is 4 * _num_units.
This is because _Linear computes the Wx + Uh + B part of several equations of the RNN, and the number of such equations varies between RNN variants. For example, an LSTM needs four of these computations, so it defines output_size as 4 * _num_units and then splits the output into four variables.
Below is an abbreviated version of the source code
class _Linear(object):

  def __init__(self, args, output_size, build_bias, bias_initializer=None,
               kernel_initializer=None):
    self._build_bias = build_bias
    if not nest.is_sequence(args):
      args = [args]
      self._is_sequence = False
    else:
      self._is_sequence = True
    # Calculate the total size of arguments on dimension 1.
    total_arg_size = 0
    shapes = [a.get_shape() for a in args]
    for shape in shapes:
      total_arg_size += shape[1].value
    dtype = [a.dtype for a in args][0]

  # This function is called num_step (sentence length) times to finish one layer.
  def __call__(self, args):
    # At time 0 the incoming state (the previous output H0) is all zeros.
    # input shape: [batch_size, emb_size]
    # state shape: [batch_size, hidden_size]
    # matmul: matrix multiplication
    # [Xt, Ht-1] has shape [batch_size, input_size + hidden_size]
    if not self._is_sequence:
      args = [args]
    if len(args) == 1:
      res = math_ops.matmul(args[0], self._weights)
    else:
      # [input, state] * [W, U] == [Xt, Ht-1] * W, shape [batch_size, hidden_size]
      res = math_ops.matmul(array_ops.concat(args, 1), self._weights)
    # B has shape [hidden_size]
    # [Xt, Ht-1] * W has shape [batch_size, hidden_size]
    # nn_ops.bias_add adds B to every row of the batch,
    # so res has shape [batch_size, hidden_size].
    # This Ht is then used as the input at the next moment and the input to the next layer.
    if self._build_bias:
      res = nn_ops.bias_add(res, self._biases)
    return res
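The key trick in _Linear is that concatenating [x, h] and multiplying by one stacked weight matrix is the same as computing W*x + U*h separately. A quick NumPy sketch of that equivalence (names and shapes are illustrative):

import numpy as np

batch, input_size, hidden = 2, 6, 4
x = np.random.randn(batch, input_size)
h = np.random.randn(batch, hidden)

W = np.random.randn(input_size, hidden)   # weights for the input
U = np.random.randn(hidden, hidden)       # weights for the state
b = np.random.randn(hidden)

separate = x @ W + h @ U + b                                        # W*x + U*h + B
stacked = np.concatenate([x, h], axis=1) @ np.vstack([W, U]) + b    # what _Linear does
print(np.allclose(separate, stacked))     # True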
2.3 GRUCell
GRU stands for Gated Recurrent Unit. A GRU has only two gates: a reset gate and an update gate. It merges the cell state Ct and the hidden state, so its overall structure is simpler than the standard LSTM, which made it very popular.
Next, let's look at the definition of GRU. Compared with BasicRNNCell, only the call function changes: it adds a reset gate and an update gate, represented by r and u respectively; c then represents the candidate state used in the update. The corresponding figure and formulas are as follows:
r = f(W1 * input + U1 * state + B1)
u = f(W2 * input + U2 * state + B2)
c = f(W3 * input + U3 * r * state + B3)
h_new = u * h + (1 - u) * c
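As a minimal NumPy sketch of those four formulas (sigmoid for the two gates, tanh for the candidate; the weight names are illustrative, not the TensorFlow variables):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, Wr, Ur, Br, Wu, Uu, Bu, Wc, Uc, Bc):
    r = sigmoid(x @ Wr + h @ Ur + Br)          # reset gate
    u = sigmoid(x @ Wu + h @ Uu + Bu)          # update gate
    c = np.tanh(x @ Wc + (r * h) @ Uc + Bc)    # candidate state
    h_new = u * h + (1 - u) * c                # as in the code: new_h = u * state + (1 - u) * c
    return h_new

dim, hid = 3, 5
x, h = np.random.randn(1, dim), np.zeros((1, hid))
Wr, Wu, Wc = (np.random.randn(dim, hid) for _ in range(3))
Ur, Uu, Uc = (np.random.randn(hid, hid) for _ in range(3))
Br = Bu = Bc = np.zeros(hid)
h = gru_step(x, h, Wr, Ur, Br, Wu, Uu, Bu, Wc, Uc, Bc)   # one GRU step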
GRUCell’s implementation code is abbreviated as follows:
class GRUCell(RNNCell):

  def __init__(self,
               num_units,
               activation=None,
               reuse=None,
               kernel_initializer=None,
               bias_initializer=None):
    super(GRUCell, self).__init__(_reuse=reuse)
    self._num_units = num_units
    self._activation = activation or math_ops.tanh
    self._kernel_initializer = kernel_initializer
    self._bias_initializer = bias_initializer
    self._gate_linear = None
    self._candidate_linear = None

  @property
  def state_size(self):
    return self._num_units

  @property
  def output_size(self):
    return self._num_units

  def call(self, inputs, state):
    value = math_ops.sigmoid(self._gate_linear([inputs, state]))
    r, u = array_ops.split(value=value, num_or_size_splits=2, axis=1)

    r_state = r * state
    if self._candidate_linear is None:
      with vs.variable_scope("candidate"):
        self._candidate_linear = _Linear(
            [inputs, r_state],
            self._num_units,
            True,
            bias_initializer=self._bias_initializer,
            kernel_initializer=self._kernel_initializer)
    c = self._activation(self._candidate_linear([inputs, r_state]))
    new_h = u * state + (1 - u) * c
    return new_h, new_h
The specific functions are analyzed as follows:
The state_size() and output_size() methods both return num_units, the number of neurons.
In the call() method, the reset gate rt and the update gate zt are represented by the variables r and u. The method first concatenates ht-1 (state) and xt (inputs), applies a linear transformation, and then the sigmoid function:
value = math_ops.sigmoid(self._gate_linear([inputs, state]))
r, u = array_ops.split(value=value, num_or_size_splits=2, axis=1)
Then we compute the candidate state h̃t, first multiplying rt by ht-1 (state):
r_state = r * state
Then [inputs, r_state] is passed through _Linear and the tanh activation is applied:
c = self._activation(self._candidate_linear([inputs, r_state]))
Finally, the hidden state and the output are computed; they are identical:
new_h = u * state + (1 - u) * c
This returns the output and the hidden state.
return new_h, new_h
2.4 Customizing the RNNCell
To customize an RNNCell, you need to implement three functions: __init__, build, and call. (Note: build is called only once; variable instantiation should be done in build, and the actual RNNCell operations in call.)
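A minimal sketch of a custom cell following that pattern (TensorFlow 1.x style, mirroring the structure of the cells above; this is illustrative, not the DIEN code):

import tensorflow as tf

class MyBasicCell(tf.nn.rnn_cell.RNNCell):
    """output = new_state = tanh([x, h] * kernel + bias)."""

    def __init__(self, num_units):
        super(MyBasicCell, self).__init__()   # __init__: only store the configuration
        self._num_units = num_units

    @property
    def state_size(self):
        return self._num_units

    @property
    def output_size(self):
        return self._num_units

    def build(self, inputs_shape):
        # build runs once: instantiate the variables here
        input_depth = inputs_shape[1].value
        self._kernel = self.add_variable(
            "kernel", shape=[input_depth + self._num_units, self._num_units])
        self._bias = self.add_variable(
            "bias", shape=[self._num_units],
            initializer=tf.zeros_initializer())
        self.built = True

    def call(self, inputs, state):
        # call does the actual per-step computation
        concat = tf.concat([inputs, state], axis=1)
        output = tf.tanh(tf.matmul(concat, self._kernel) + self._bias)
        return output, output

Such a cell can then be passed to dynamic_rnn like any built-in cell.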
2.5 VecAttGRUCell in DIEN
The code for calling VecAttGRUCell is as follows:
rnn_outputs2, final_state2 = dynamic_rnn(VecAttGRUCell(HIDDEN_SIZE), inputs=rnn_outputs, att_scores = tf.expand_dims(alphas, -1), sequence_length=self.seq_len_ph, dtype=tf.float32, scope="gru2")
First notice the use of tf.expand_dims, which adds one dimension to alphas.
alphas = Tensor("Attention_layer_1/Reshape_4:0", shape=(?, ?), dtype=float32)
The -1 means the new dimension is added at the end.
att_scores = tf.expand_dims(alphas, -1)
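A quick shape sketch of what that expand_dims does (the shapes here are illustrative; in DIEN the two unknown dimensions are batch_size and the sequence length):

import tensorflow as tf

alphas = tf.placeholder(tf.float32, [None, None])   # (batch_size, max_time)
att_scores = tf.expand_dims(alphas, -1)              # (batch_size, max_time, 1)
print(alphas.shape, att_scores.shape)                # (?, ?) (?, ?, 1)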
The change Alibaba made here is mainly in the call function, and it concerns att_score:
u = (1.0 - att_score) * u
new_h = u * state + (1 - u) * c
return new_h, new_h
The specific code is:
def call(self, inputs, state, att_score=None):
    ......
    c = self._activation(self._candidate_linear([inputs, r_state]))
    u = (1.0 - att_score) * u            # newly added
    new_h = u * state + (1 - u) * c      # newly added
    return new_h, new_h
Where the runtime variables are as follows:
inputs = {Tensor} Tensor("rnn_2/gru2/while/TensorArrayReadV3:0", shape=(?, 36), dtype=float32)
state = {Tensor} Tensor("rnn_2/gru2/while/Identity_2:0", shape=(?, 36), dtype=float32)
att_score = {Tensor} Tensor("rnn_2/gru2/while/strided_slice:0", shape=(?, 1), dtype=float32)
new_h = {Tensor} Tensor("rnn_2/gru2/while/add_1:0", shape=(?, 36), dtype=float32)
u = {Tensor} Tensor("rnn_2/gru2/while/mul_1:0", shape=(?, 36), dtype=float32)
c = {Tensor} Tensor("rnn_2/gru2/while/Tanh:0", shape=(?, 36), dtype=float32)
This corresponds to the attention-based update-gate formula (AUGRU) in the DIEN paper.
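As a sketch of the effect (plain NumPy, illustrative numbers): when att_score is close to 1, the rescaled update gate goes to zero and the new state jumps to the candidate c; when att_score is close to 0, the cell falls back to the ordinary GRU update:

import numpy as np

def att_gru_update(state, c, u, att_score):
    # the VecAttGRUCell modification: the attention score rescales the update gate
    u = (1.0 - att_score) * u
    return u * state + (1 - u) * c

state = np.array([1.0, 1.0])
c = np.array([0.0, 0.0])   # candidate state
u = np.array([0.8, 0.8])   # update gate before rescaling

print(att_gru_update(state, c, u, att_score=0.0))  # [0.8 0.8] -> ordinary GRU update
print(att_gru_update(state, c, u, att_score=1.0))  # [0. 0.]   -> jumps to the candidate c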
0x03 RNN
3.1 Perform multiple steps at a time
3.1.1 Basics
There is an obvious problem with using a single RNNCell directly: each call to its call function only advances one step in sequence time. For example, x1 and h0 give h1, x2 and h1 give h2, and so on. If our sequence length is 10, we would have to call the call function 10 times, which is quite troublesome. For this, TensorFlow provides tf.nn.dynamic_rnn, which is equivalent to calling the call function n times: given an initial state h0 and inputs {x1, x2, ..., xn}, it directly returns {h1, h2, ..., hn}.
def dynamic_rnn(cell, inputs, att_scores=None, sequence_length=None, initial_state=None,
                dtype=None, parallel_iterations=None, swap_memory=False,
                time_major=False, scope=None):
Important parameters:
- cell: the memory unit of the LSTM or GRU, i.e. a cell instance, for example cell = tf.nn.rnn_cell.LSTMCell(num_units), where num_units is the number of neurons in the RNN cell, i.e. cell.output_size. It returns an LSTM or GRU cell, which is passed in as a parameter.
- inputs: the input training or test data, in the format [batch_size, max_time, embed_size], where batch_size is the number of samples fed in, max_time is the maximum sequence length in the data, and embed_size is the dimension of the word embedding.
- sequence_length: a list recording the actual length of each sequence in the batch; for example, if three sentences of lengths 5, 10 and 25 are fed in, then sequence_length=[5, 10, 25].
- time_major: determines the tensor format of the output. If True, the tensors have shape [max_time, batch_size, cell.output_size]; if False, they have shape [batch_size, max_time, cell.output_size]. cell.output_size is the number of neurons in the RNN cell.
The return values are as follows:
- outputs: all outputs over the time_steps steps; its shape is (batch_size, time_steps, cell.output_size).
- state: the hidden state of the last step; its shape is (batch_size, cell.state_size).
In more detail:
- outputs is a tensor containing the output of every step.
  - If time_major==True, its shape is [max_time, batch_size, cell.output_size].
  - If time_major==False (the default), its shape is [batch_size, max_time, cell.output_size].
- state is a tensor: the final state, i.e. the state output by the last cell in the sequence. Generally state has shape [batch_size, cell.output_size]; when a BasicLSTMCell is used, state has shape [2, batch_size, cell.output_size], where the 2 corresponds to the LSTM's cell state and hidden state.
cell.output_size is the number of neurons in the RNN cell. If three sentences are fed in, max_time is the number of words in the longest sentence.
3.1.2 Usage
Suppose our input data has the format (batch_size, time_steps, input_size), where:
- batch_size is the amount of data fed in this batch;
- time_steps is the length of the sequence itself; for example, in a char-RNN, a sentence of length 10 has time_steps equal to 10;
- input_size is the size of the input at a single time step within one sequence.
Having defined an RNNCell, dynamic_rnn calls its call function time_steps times for us:
# inputs: shape = (batch_size, time_steps, input_size)
# cell: an RNNCell instance
# initial_state: shape = (batch_size, cell.state_size). The initial state; an all-zero state is always a valid choice.
outputs, state = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial_state)
The following illustrates the parameters with sample data (four 5-character Chinese sentences):
- Xiao Ming loves studying
- Xiao Wang loves studying
- Xiao Li loves studying
- Xiao Hua loves studying
Typically, the sample data is fed into the model as (batch_size, time_step, embedding_size), here (4, 5, 100):
- 4 means four samples are fed per batch: at the first time step the characters (Xiao, Xiao, Xiao, Xiao) are fed, at the second (Ming, Wang, Li, Hua), and so on;
- 5 is the time step: each sentence consists of five characters.
Here’s another example:
import tensorflow as tf
import numpy as np
from tensorflow.python.ops import variable_scope as vs

output_size = 5
batch_size = 4
time_step = 3
dim = 3

cell = tf.nn.rnn_cell.BasicRNNCell(num_units=output_size)
inputs = tf.placeholder(dtype=tf.float32, shape=[time_step, batch_size, dim])
h0 = cell.zero_state(batch_size=batch_size, dtype=tf.float32)
X = np.array([[[1, 2, 1], [2, 0, 0], [2, 1, 0], [1, 1, 0]],  # x1
              [[1, 2, 1], [2, 0, 0], [2, 1, 0], [1, 1, 0]],  # x2
              [[1, 2, 1], [2, 0, 0], [2, 1, 0], [1, 1, 0]]]) # x3
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=h0, time_major=True)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
a, b = sess.run([outputs, final_state], feed_dict={inputs: X})
print(a)
print(b)
3.1.3 time_step
Specific explanations are as follows:
Text data
If the data consists of 1000 sentences, each sentence has 25 words, and each word is vectorized with dimension 300, then batch_size=1000, time_steps=25, input_size=300.
time_steps is the length of the sentence; input_size is the length of each word vector.
Image data
Take the MNIST handwritten digit set as an example: if the training data has 6000 handwritten digit images and each image is 28*28, then batch_size=6000, time_steps=28, input_size=28. We can think of each image as being split into 28 rows, each row of shape (1, 28).
Audio data
For single-channel audio, the data is one-dimensional, say shape=(8910,). Data fed to an RNN must be at least two-dimensional, and with batch_size added it becomes three-dimensional: the first dimension is batch_size, the second is time_steps, and the third is input_size. So we reshape the data into 3-D so that the RNN can consume it.
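A sketch of that reshaping, using the 8910-sample figure from above (how many samples to feed per time step is a design choice; 90 is assumed here for illustration):

import numpy as np

audio = np.random.randn(8910)            # one single-channel clip, 1-D

input_size = 90                          # samples fed per time step (assumed)
time_steps = 8910 // input_size          # 99 steps
batch = audio.reshape(1, time_steps, input_size)   # (batch_size, time_steps, input_size)
print(batch.shape)                       # (1, 99, 90)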
3.2 How the loop is implemented
An RNN in TensorFlow can be static or dynamic.
- static_rnn unrolls the RNN, trading space for time.
- dynamic_rnn uses a while loop.
The static_rnn call actually builds a graph with the RNN unrolled along the time axis. Open TensorBoard and you will see sequence_length rnn_cells stacked together, although these cells share weights. As a result, sequence_length is baked into the graph topology, which forces every batch to use the same sequence_length.
dynamic_rnn does not unroll the RNN. Instead it uses the tf.while_loop API to generate a graph containing control-flow nodes such as Enter, Switch, Merge, LoopCondition, NextIteration and so on, which can execute a loop (the graph is still static, because its topology does not change during execution). On TensorBoard you will only see one rnn_cell surrounded by a bunch of control-flow nodes. For dynamic_rnn, sequence_length only controls the number of iterations and has nothing to do with the graph topology, so each batch can have a different sequence_length.
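For comparison, a sketch of the two call styles (TensorFlow 1.x; the shapes and scope names are illustrative):

import tensorflow as tf

batch_size, time_steps, input_size, num_units = 32, 10, 8, 16
inputs = tf.placeholder(tf.float32, [batch_size, time_steps, input_size])

# static_rnn: the graph is unrolled; it takes a Python list of time_steps tensors
cell_a = tf.nn.rnn_cell.GRUCell(num_units)
static_out, static_state = tf.nn.static_rnn(
    cell_a, tf.unstack(inputs, axis=1), dtype=tf.float32, scope="static_rnn")

# dynamic_rnn: one cell plus a while_loop; sequence length can vary per batch
cell_b = tf.nn.rnn_cell.GRUCell(num_units)
dynamic_out, dynamic_state = tf.nn.dynamic_rnn(
    cell_b, inputs, dtype=tf.float32, scope="dynamic_rnn")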
For DIEN, when the program runs, the stack looks like this:
call, utils.py:144
__call__, utils.py:114
<lambda>, rnn.py:752
_rnn_step, rnn.py:236
_time_step, rnn.py:766
_BuildLoop, control_flow_ops.py:2590
BuildLoop, control_flow_ops.py:2640
while_loop, control_flow_ops.py:2816
_dynamic_rnn_loop, rnn.py:786
dynamic_rnn, rnn.py:615
__init__, model.py:364
train, train.py:142
<module>, train.py:231
The implementation of the loop lives mainly in control_flow_ops.py.
while_loop repeatedly runs the body argument as long as the cond argument is true.
def while_loop(cond, body, loop_vars, shape_invariants=None,
               parallel_iterations=10, back_prop=True, swap_memory=False,
               name=None):
  """Repeat `body` while the condition `cond` is true.

  `cond` is a callable returning a boolean scalar tensor. `body` is a callable
  returning a (possibly nested) tuple, namedtuple or list of tensors of the same
  arity (length and structure) and types as `loop_vars`. `loop_vars` is a
  (possibly nested) tuple, namedtuple or list of tensors that is passed to both
  `cond` and `body`. `cond` and `body` both take as many arguments as there are
  `loop_vars`.

  Args:
    cond: A callable that represents the termination condition of the loop.
    body: A callable that represents the loop body.
    loop_vars: A (possibly nested) tuple, namedtuple or list of numpy array,
      `Tensor`, and `TensorArray` objects.
  """
  if context.in_eager_mode():
    while cond(*loop_vars):
      loop_vars = body(*loop_vars)
    return loop_vars
  if shape_invariants is not None:
    nest.assert_same_structure(loop_vars, shape_invariants)
  loop_context = WhileContext(parallel_iterations, back_prop, swap_memory)  # pylint: disable=redefined-outer-name
  ops.add_to_collection(ops.GraphKeys.WHILE_CONTEXT, loop_context)
  result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
  return result
Here’s an example:
i = tf.constant(0)
c = lambda i: tf.less(i, 10)
b = lambda i: tf.add(i, 1)
r = tf.while_loop(c, b, [i])
print(sess.run(r))  # 10
In rnn.py, _time_step is passed to while_loop as the loop body, which completes the iteration:
_, output_final_ta, final_state = control_flow_ops.while_loop(
    cond=lambda time, *_: time < time_steps,
    body=_time_step,
    loop_vars=(time, output_ta, state),
    parallel_iterations=parallel_iterations,
    swap_memory=swap_memory)
3.3 The RNN in DIEN
In the DIEN project, the main change is to the _time_step function, because the att_scores parameter has to be added.
It mainly:
- calls the cell's call function via lambda: cell(input_t, state, att_score); this is the business logic we wrote earlier;
- performs the loop iteration by calling control_flow_ops.while_loop(cond=lambda time, *_: time < time_steps, body=_time_step, ...).
The code for the reduced version is as follows:
def _time_step(time, output_ta_t, state, att_scores=None):
  """Take a time step of the dynamic RNN.

  Args:
    time: int32 scalar Tensor.
    output_ta_t: List of `TensorArray`s that represent the output.
    state: nested tuple of vector tensors that represent the state.
  Returns:
    The tuple (time + 1, output_ta_t with updated flow, new_state).
  """
  ......
  if att_scores is not None:
    att_score = att_scores[:, time, :]
    call_cell = lambda: cell(input_t, state, att_score)
  else:
    call_cell = lambda: cell(input_t, state)
  ......
  output_ta_t = tuple(
      ta.write(time, out) for ta, out in zip(output_ta_t, output))
  if att_scores is not None:
    return (time + 1, output_ta_t, new_state, att_scores)
  else:
    return (time + 1, output_ta_t, new_state)

if att_scores is not None:
  _, output_final_ta, final_state, _ = control_flow_ops.while_loop(
      cond=lambda time, *_: time < time_steps,
      body=_time_step,
      loop_vars=(time, output_ta, state, att_scores),
      parallel_iterations=parallel_iterations,
      swap_memory=swap_memory)
else:
  _, output_final_ta, final_state = control_flow_ops.while_loop(
      cond=lambda time, *_: time < time_steps,
      body=_time_step,
      loop_vars=(time, output_ta, state),
      parallel_iterations=parallel_iterations,
      swap_memory=swap_memory)
......
0xEE Personal information
★★★★★ Thoughts on life and technology ★★★★★
WeChat official account: Rosie's Thoughts
If you want timely notifications of new articles, or want to see the technical material I recommend, please follow this account.