The application of RNN in Alibaba's DIEN
0x00 Introduction
This article is based on the DIEN code from Alibaba's recommendation system. It sorts out some RNN concepts as well as some of the related TensorFlow source code. The goal is to help you understand each step in detail and why it is done.
0x01 Background
1.1 RNN
RNN stands for Recurrent Neural Network.
People do not think from scratch. For example, when we read, our understanding of each word depends on the information we have already seen, rather than discarding everything read so far. Applied to deep learning, if we want to learn something that depends on earlier context, an RNN can do it: it has a recurrent operation that lets it retain what it has learned previously.
The most common way to define an RNN is:
output = new_state = f(W * input + U * state + B) = act(W * input + U * state + B)
- U and W are the network's weight matrices and B is the bias; these parameters are learned by training the network with backpropagation.
- act is the activation function; usually sigmoid or tanh is chosen.
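As a minimal sketch of that formula (plain NumPy, not the TensorFlow implementation; the shapes and names are illustrative), a single RNN step looks like this:

import numpy as np

def rnn_step(x, state, W, U, B, act=np.tanh):
    # output = new_state = act(W * input + U * state + B)
    return act(x @ W + state @ U + B)

input_size, state_size = 4, 3
x = np.random.randn(1, input_size)           # one input sample
h = np.zeros((1, state_size))                # initial state h0
W = np.random.randn(input_size, state_size)
U = np.random.randn(state_size, state_size)
B = np.zeros(state_size)

h = rnn_step(x, h, W, U, B)                  # h1, used both as output and as the next state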
1.2 DIEN project code
In the DIEN project, TensorFlow's RNN code is copied into the project and modified, specifically:
- GRUCell is used;
- a custom VecAttGRUCell is added;
- rnn.py is modified, because the VecAttGRUCell interface changed.
0x02 Cell
The basic unit of an RNN is called a Cell. Don't underestimate this small Cell: it contains not just one neuron but N hidden units.
This is why, when TensorFlow defines a cell (BasicRNNCell / BasicLSTMCell / GRUCell / RNNCell / LSTMCell), the parameter you must provide is the hidden unit size (num_units).
In an actual neural network, each gate's processing function is in fact computed by a certain number of hidden-layer neurons.
In an RNN, the actual function of a hidden layer composed of M neurons is f(Wx + b), which is realized in two steps:
- First, the M hidden-layer neurons are fully connected to the input vector x: the weight matrix W computes a weighted sum over the dimensions of x.
- Second, a bias is added and the activation function f is applied, producing the hidden layer's output.
In an LSTM cell, one cell contains several gate processing functions. If each gate is physically realized by num_hidden neurons, then each gate has its own weight matrix W and bias, which is what the figure above shows.
As can be seen from the diagram, a cell has four gates, and each gate corresponds to 128 hidden-layer neurons, which is equivalent to four hidden layers. Each hidden layer is fully connected to the input x, and the input vector x consists of two parts: the cell's output at the previous moment (size 128) and the current input sample vector (size 6). After the cell's internal computation, the output at the current moment is obtained; its size is 128, i.e. num_hidden, and it is used as part of the input to the cell at the next moment.
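As a rough shape check, here is a sketch using the numbers from that example (num_hidden = 128, sample input size 6; the variable names are illustrative):

import numpy as np

num_hidden = 128          # size of the hidden state (also the cell's output size)
input_size = 6            # size of the current input sample vector
concat_size = input_size + num_hidden   # x_t concatenated with h_{t-1}: 6 + 128 = 134

# Each of the four LSTM gates has its own weight matrix and bias of these shapes:
W_gate_shape = (concat_size, num_hidden)   # (134, 128)
b_gate_shape = (num_hidden,)               # (128,)

params_per_gate = concat_size * num_hidden + num_hidden
total_params = 4 * params_per_gate
print(total_params)       # 4 * (134 * 128 + 128) = 69120 trainable parameters in one cell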
The following is a detailed analysis of the implementation mechanism and principle of Cell based on TensorFlow.
2.1 RNNCell (Abstract parent)
2.1.1 Basics
RNNCell is the basic unit for implementing an RNN in TensorFlow. Every RNNCell has a call method, used as (output, next_state) = call(input, state).
RNNCell is an abstract parent class that all other RNNCells inherit from; the subclasses then implement the call() function.
An RNNCell is an object that holds a state and can process an input matrix. The RNNCell transforms the input matrix into an output matrix with "self.output_size" columns.
About state: state is the state of the RNN cell inside the network. If "self.state_size" is an integer N, then every time you run the RNN you should supply a 2-D tensor of shape [batch_size, self.state_size]; if "self.state_size" is a tuple, the state you pass in should also be a tuple, with each element shaped as before.
- If the "self.state_size" attribute is defined as an integer, RNNCell also outputs a state matrix with "self.state_size" columns.
- If "self.state_size" is defined as a tuple of integers, the output is a tuple of state matrices of the corresponding length, each with the number of columns given by the matching entry of "self.state_size".
RNNCell provides the zero_state() and call() functions.
- zero_state() initializes the initial state h0 as an all-zero vector.
- call() defines the actual RNNCell operations (for example, a vanilla RNN has one activation, a GRU has two gates, an LSTM has three gates, and so on; the differences between RNN variants live mainly in this function).
Besides the call method, two class properties are important for RNNCell: state_size and output_size. Both are defined as methods but exposed as properties via Python's built-in @property decorator (calling a method as if it were an attribute, like a C# property or field):
- state_size: the size of the hidden state (the state size of the cell);
- output_size: the size of the output (the output dimension).
For example, we usually feed a batch into the model. If the input data has shape (batch_size, input_size), then the hidden state obtained during computation has shape (batch_size, state_size) and the output has shape (batch_size, output_size).
Neither property is implemented in RNNCell itself, which means we have to write a subclass that inherits from RNNCell and implements both.
class RNNCell(base_layer.Layer):

  def __call__(self, inputs, state, scope=None):
    if scope is not None:
      with vs.variable_scope(scope,
                             custom_getter=self._rnn_get_variable) as scope:
        return super(RNNCell, self).__call__(inputs, state, scope=scope)
    else:
      with vs.variable_scope(vs.get_variable_scope(),
                             custom_getter=self._rnn_get_variable):
        return super(RNNCell, self).__call__(inputs, state)

  def _rnn_get_variable(self, getter, *args, **kwargs):
    variable = getter(*args, **kwargs)
    if context.in_graph_mode():
      trainable = (variable in tf_variables.trainable_variables() or
                   (isinstance(variable, tf_variables.PartitionedVariable) and
                    list(variable)[0] in tf_variables.trainable_variables()))
    else:
      trainable = variable._trainable  # pylint: disable=protected-access
    if trainable and variable not in self._trainable_weights:
      self._trainable_weights.append(variable)
    elif not trainable and variable not in self._non_trainable_weights:
      self._non_trainable_weights.append(variable)
    return variable

  @property
  def state_size(self):
    raise NotImplementedError("Abstract method")

  @property
  def output_size(self):
    raise NotImplementedError("Abstract method")

  def build(self, _):
    pass

  def zero_state(self, batch_size, dtype):
    with ops.name_scope(type(self).__name__ + "ZeroState", values=[batch_size]):
      state_size = self.state_size
      return _zero_state_tensors(state_size, batch_size, dtype)
2.1.2 call
Each derived RNNCell must have the properties above and implement a function with the signature (output, next_state) = call(input, state). The optional third parameter, scope, exists for backward compatibility and for customization by subclasses; it is a variable scope used to make variable management easier.
call runs one step of the RNN cell from the given state, based on the cell's input.
Args:
- inputs: shape [batch_size, input_size].
- state: if self.state_size is an integer, state should be a 2-D tensor of shape [batch_size, self.state_size]; otherwise, if self.state_size is a tuple of integers (for example, an LSTM must track both the cell state and the hidden state, so it uses a tuple), state should be a tuple of tensors of shape [batch_size, s] for each s in self.state_size.
- scope: variable scope for variables created by subclasses.
Returns:
- output: [batch_size, self.output_size];
- state: a tensor (or tuple) matching the shape of the input state.
Each call to RNNCell's call method "advances" one step in time; this is RNNCell's basic function.
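To make "one step per call" concrete, here is a sketch of stepping a cell manually with a Python loop (TensorFlow 1.x style, illustrative shapes); dynamic_rnn, discussed in 0x03, does exactly this loop for you:

import tensorflow as tf

batch_size, time_steps, input_size, num_units = 32, 10, 8, 16

inputs = tf.placeholder(tf.float32, [batch_size, time_steps, input_size])
cell = tf.nn.rnn_cell.BasicRNNCell(num_units=num_units)
state = cell.zero_state(batch_size, tf.float32)   # h0: (batch_size, num_units)

outputs = []
for t in range(time_steps):
    # each call advances one step: (output, next_state) = call(input, state)
    output, state = cell(inputs[:, t, :], state)
    outputs.append(output)
# outputs: a list of time_steps tensors, each of shape (batch_size, num_units)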
2.2 BasicRNNCell (Basic)
2.2.1 Basics
RNNCell is an abstract class; in practice we mostly use its subclasses BasicRNNCell and BasicLSTMCell. As the names imply, the former is the basic cell for a vanilla RNN and the latter for an LSTM.
BasicRNNCell is what we usually mean by "RNN". The simplest RNN structure is shown in the figure above. The code is as follows:
class BasicRNNCell(RNNCell):

  def __init__(self, num_units, activation=None, reuse=None):
    super(BasicRNNCell, self).__init__(_reuse=reuse)
    self._num_units = num_units
    self._activation = activation or math_ops.tanh
    self._linear = None

  @property
  def state_size(self):
    return self._num_units

  @property
  def output_size(self):
    return self._num_units

  def call(self, inputs, state):
    """Most basic RNN: output = new_state = act(W * input + U * state + B)."""
    if self._linear is None:
      self._linear = _Linear([inputs, state], self._num_units, True)
    output = self._activation(self._linear([inputs, state]))
    # output = Ht = tanh([x, Ht-1] * W + B)
    # One copy of output is used as the state Ht fed to the next time step,
    # the other is passed on as the output of this layer.
    return output, output
2.2.2 Meaning of Parameters
You can see that __init__ takes several arguments:
def __init__(self, num_units, activation=None, reuse=None):
The most important parameter of __init__ is num_units, the number of neurons in the cell. activation is the activation function, defaulting to tanh, and reuse determines whether the cell can be reused.
We know that a basic RNN unit has three trainable parameters W, U, B and two input variables, so we need to specify the dimensions of these parameters when constructing the RNN.
Note that in the figure above, n represents the input dimension dim.
For BasicRNNCell:
- state_size is num_units: def state_size(self): return self._num_units
- output_size is num_units: def output_size(self): return self._num_units
- state_size and output_size are defined to be the same, and Ht and the output are also the same (call returns two identical values: return output, output; in other words, it does not define a separate output part).
- As can be seen from _Linear, output_size is the dimension of the bias B (_Linear is explained below).
2.2.3 Function
The main logic is what the first-line comment of the call function says: the input and the previous state go through a linear function and then through an activation function, which is the most common RNN definition. That is,
output = new_state = f(W * input + U * state + B) = act(W * input + U * state + B)
The state_size() and output_size() methods both return num_units, the number of neurons.
Next, in the call() method:
- inputs and state are the input x and the previous hidden state;
- _Linear performs the linear transformation W * [inputs, state] + B, i.e. output = new_state = tanh(W * input + U * state + B);
- besides _Linear, BasicRNNCell's call() also applies _activation(), i.e. a tanh activation on top of the linear transformation, as the output;
- the returned values are output and output: the first is the output, the second is the hidden state, which here equals the output.
2.2.4 Linear
The _Linear class is used here, so let's introduce it now.
This class takes [inputs, state] as args; its __call__() method concatenates them, multiplies by the weight matrix with matmul(), and then applies bias_add(), thereby performing the linear transformation.
output_size is the size of the output layer. As we can see:
- in BasicRNNCell, output_size is _num_units;
- in GRUCell, it is 2 * _num_units;
- in BasicLSTMCell, it is 4 * _num_units.
This is because _Linear computes the Wx + Uh + B part of several equations of the RNN, and the number of such equations varies between RNN variants. For example, an LSTM needs four of these computations, so it defines output_size as 4 * _num_units and then splits the output into four variables.
Below is an abbreviated version of the source code
class _Linear(object):

  def __init__(self, args, output_size, build_bias, bias_initializer=None,
               kernel_initializer=None):
    self._build_bias = build_bias
    if not nest.is_sequence(args):
      args = [args]
      self._is_sequence = False
    else:
      self._is_sequence = True
    # Calculate the total size of arguments on dimension 1.
    total_arg_size = 0
    shapes = [a.get_shape() for a in args]
    for shape in shapes:
      total_arg_size += shape[1].value
    dtype = [a.dtype for a in args][0]

  # This function is called num_step (sentence length) times to finish one layer.
  def __call__(self, args):
    # At time 0 the incoming state (the previous output H0) is all zeros.
    # input shape: [batch_size, emb_size]
    # state shape: [batch_size, hidden_size]
    # matmul: matrix multiplication
    # [Xt, Ht-1] has shape [batch_size, input_size + hidden_size]
    if not self._is_sequence:
      args = [args]
    if len(args) == 1:
      res = math_ops.matmul(args[0], self._weights)
    else:
      # [input, state] * [W, U] == [Xt, Ht-1] * W, shape [batch_size, hidden_size]
      res = math_ops.matmul(array_ops.concat(args, 1), self._weights)
    # B has shape [hidden_size]
    # [Xt, Ht-1] * W has shape [batch_size, hidden_size]
    # nn_ops.bias_add adds B to every row of the batch,
    # so res has shape [batch_size, hidden_size].
    # This Ht is then used as the input at the next moment and the input to the next layer.
    if self._build_bias:
      res = nn_ops.bias_add(res, self._biases)
    return res
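The key trick in _Linear is that concatenating [x, h] and multiplying by one stacked weight matrix is the same as computing W*x + U*h separately. A quick NumPy sketch of that equivalence (names and shapes are illustrative):

import numpy as np

batch, input_size, hidden = 2, 6, 4
x = np.random.randn(batch, input_size)
h = np.random.randn(batch, hidden)

W = np.random.randn(input_size, hidden)   # weights for the input
U = np.random.randn(hidden, hidden)       # weights for the state
b = np.random.randn(hidden)

separate = x @ W + h @ U + b                                        # W*x + U*h + B
stacked = np.concatenate([x, h], axis=1) @ np.vstack([W, U]) + b    # what _Linear does
print(np.allclose(separate, stacked))     # True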
2.3 GRUCell
GRU stands for Gated Recurrent Unit. A GRU has only two gates: a reset gate and an update gate. It merges the cell state Ct and the hidden state, so its overall structure is simpler than the standard LSTM, which made it very popular.
Next, let's look at the definition of GRU. Compared with BasicRNNCell, only the call function changes: it adds a reset gate and an update gate, represented by r and u respectively; c then represents the candidate state used in the update. The corresponding figure and formulas are as follows:
r = f(W1 * input + U1 * state + B1)
u = f(W2 * input + U2 * state + B2)
c = f(W3 * input + U3 * r * state + B3)
h_new = u * h + (1 - u) * c
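As a minimal NumPy sketch of those four formulas (sigmoid for the two gates, tanh for the candidate; the weight names are illustrative, not the TensorFlow variables):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, Wr, Ur, Br, Wu, Uu, Bu, Wc, Uc, Bc):
    r = sigmoid(x @ Wr + h @ Ur + Br)          # reset gate
    u = sigmoid(x @ Wu + h @ Uu + Bu)          # update gate
    c = np.tanh(x @ Wc + (r * h) @ Uc + Bc)    # candidate state
    h_new = u * h + (1 - u) * c                # as in the code: new_h = u * state + (1 - u) * c
    return h_new

dim, hid = 3, 5
x, h = np.random.randn(1, dim), np.zeros((1, hid))
Wr, Wu, Wc = (np.random.randn(dim, hid) for _ in range(3))
Ur, Uu, Uc = (np.random.randn(hid, hid) for _ in range(3))
Br = Bu = Bc = np.zeros(hid)
h = gru_step(x, h, Wr, Ur, Br, Wu, Uu, Bu, Wc, Uc, Bc)   # one GRU step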
GRUCell’s implementation code is abbreviated as follows:
class GRUCell(RNNCell):

  def __init__(self,
               num_units,
               activation=None,
               reuse=None,
               kernel_initializer=None,
               bias_initializer=None):
    super(GRUCell, self).__init__(_reuse=reuse)
    self._num_units = num_units
    self._activation = activation or math_ops.tanh
    self._kernel_initializer = kernel_initializer
    self._bias_initializer = bias_initializer
    self._gate_linear = None
    self._candidate_linear = None

  @property
  def state_size(self):
    return self._num_units

  @property
  def output_size(self):
    return self._num_units

  def call(self, inputs, state):
    value = math_ops.sigmoid(self._gate_linear([inputs, state]))
    r, u = array_ops.split(value=value, num_or_size_splits=2, axis=1)

    r_state = r * state
    if self._candidate_linear is None:
      with vs.variable_scope("candidate"):
        self._candidate_linear = _Linear(
            [inputs, r_state],
            self._num_units,
            True,
            bias_initializer=self._bias_initializer,
            kernel_initializer=self._kernel_initializer)
    c = self._activation(self._candidate_linear([inputs, r_state]))
    new_h = u * state + (1 - u) * c
    return new_h, new_h
The specific functions are analyzed as follows:
The state_size() and output_size() methods both return num_units, the number of neurons.
In the call() method, the reset gate rt and the update gate zt are represented by the variables r and u. The method first concatenates ht-1 (state) and xt (inputs), applies a linear transformation, and then the sigmoid function:
value = math_ops.sigmoid(self._gate_linear([inputs, state]))
r, u = array_ops.split(value=value, num_or_size_splits=2, axis=1)
Then we compute the candidate state h̃t, first multiplying rt by ht-1 (state):
r_state = r * state
Then [inputs, r_state] is passed through _Linear and the tanh activation is applied:
c = self._activation(self._candidate_linear([inputs, r_state]))
Finally, the hidden state and the output are computed; they are identical:
new_h = u * state + (1 - u) * c
This returns the output and the hidden state.
return new_h, new_h
2.4 Customizing the RNNCell
To customize an RNNCell, you need to implement three functions: __init__, build, and call. (Note: build is called only once; variable instantiation should be done in build, and the actual RNNCell operations in call.)
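A minimal sketch of a custom cell following that pattern (TensorFlow 1.x style, mirroring the structure of the cells above; this is illustrative, not the DIEN code):

import tensorflow as tf

class MyBasicCell(tf.nn.rnn_cell.RNNCell):
    """output = new_state = tanh([x, h] * kernel + bias)."""

    def __init__(self, num_units):
        super(MyBasicCell, self).__init__()   # __init__: only store the configuration
        self._num_units = num_units

    @property
    def state_size(self):
        return self._num_units

    @property
    def output_size(self):
        return self._num_units

    def build(self, inputs_shape):
        # build runs once: instantiate the variables here
        input_depth = inputs_shape[1].value
        self._kernel = self.add_variable(
            "kernel", shape=[input_depth + self._num_units, self._num_units])
        self._bias = self.add_variable(
            "bias", shape=[self._num_units],
            initializer=tf.zeros_initializer())
        self.built = True

    def call(self, inputs, state):
        # call does the actual per-step computation
        concat = tf.concat([inputs, state], axis=1)
        output = tf.tanh(tf.matmul(concat, self._kernel) + self._bias)
        return output, output

Such a cell can then be passed to dynamic_rnn like any built-in cell.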
2.5 VecAttGRUCell in DIEN
The code for calling VecAttGRUCell is as follows:
rnn_outputs2, final_state2 = dynamic_rnn(VecAttGRUCell(HIDDEN_SIZE), inputs=rnn_outputs, att_scores = tf.expand_dims(alphas, -1), sequence_length=self.seq_len_ph, dtype=tf.float32, scope="gru2")
First notice the use of tf.expand_dims, which adds one dimension to alphas.
alphas = Tensor("Attention_layer_1/Reshape_4:0", shape=(?, ?), dtype=float32)
The -1 means the new dimension is added at the end.
att_scores = tf.expand_dims(alphas, -1)
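A quick shape sketch of what that expand_dims does (the shapes here are illustrative; in DIEN the two unknown dimensions are batch_size and the sequence length):

import tensorflow as tf

alphas = tf.placeholder(tf.float32, [None, None])   # (batch_size, max_time)
att_scores = tf.expand_dims(alphas, -1)              # (batch_size, max_time, 1)
print(alphas.shape, att_scores.shape)                # (?, ?) (?, ?, 1)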
The change Alibaba made here is mainly in the call function, and it concerns att_score:
u = (1.0 - att_score) * u
new_h = u * state + (1 - u) * c
return new_h, new_h
The specific code is:
def call(self, inputs, state, att_score=None):
    ......
    c = self._activation(self._candidate_linear([inputs, r_state]))
    u = (1.0 - att_score) * u            # newly added
    new_h = u * state + (1 - u) * c      # newly added
    return new_h, new_h
Where the runtime variables are as follows:
inputs = {Tensor} Tensor("rnn_2/gru2/while/TensorArrayReadV3:0", shape=(?, 36), dtype=float32)
state = {Tensor} Tensor("rnn_2/gru2/while/Identity_2:0", shape=(?, 36), dtype=float32)
att_score = {Tensor} Tensor("rnn_2/gru2/while/strided_slice:0", shape=(?, 1), dtype=float32)
new_h = {Tensor} Tensor("rnn_2/gru2/while/add_1:0", shape=(?, 36), dtype=float32)
u = {Tensor} Tensor("rnn_2/gru2/while/mul_1:0", shape=(?, 36), dtype=float32)
c = {Tensor} Tensor("rnn_2/gru2/while/Tanh:0", shape=(?, 36), dtype=float32)
This corresponds to the attention-based update-gate formula (AUGRU) in the DIEN paper.
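As a sketch of the effect (plain NumPy, illustrative numbers): when att_score is close to 1, the rescaled update gate goes to zero and the new state jumps to the candidate c; when att_score is close to 0, the cell falls back to the ordinary GRU update:

import numpy as np

def att_gru_update(state, c, u, att_score):
    # the VecAttGRUCell modification: the attention score rescales the update gate
    u = (1.0 - att_score) * u
    return u * state + (1 - u) * c

state = np.array([1.0, 1.0])
c = np.array([0.0, 0.0])   # candidate state
u = np.array([0.8, 0.8])   # update gate before rescaling

print(att_gru_update(state, c, u, att_score=0.0))  # [0.8 0.8] -> ordinary GRU update
print(att_gru_update(state, c, u, att_score=1.0))  # [0. 0.]   -> jumps to the candidate c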
0x03 RNN
3.1 Perform multiple steps at a time
3.1.1 Basics
There is an obvious problem with using a single RNNCell directly: each call to its call function only advances one step in sequence time. For example, x1 and h0 give h1, x2 and h1 give h2, and so on. If our sequence length is 10, we would have to call the call function 10 times, which is quite troublesome. For this, TensorFlow provides tf.nn.dynamic_rnn, which is equivalent to calling the call function n times: given an initial state h0 and inputs {x1, x2, ..., xn}, it directly returns {h1, h2, ..., hn}.
def dynamic_rnn(cell, inputs, att_scores=None, sequence_length=None, initial_state=None,
                dtype=None, parallel_iterations=None, swap_memory=False,
                time_major=False, scope=None):
Important parameters:
- cell: the memory unit of the LSTM or GRU, i.e. a cell instance, for example cell = tf.nn.rnn_cell.LSTMCell(num_units), where num_units is the number of neurons in the RNN cell, i.e. cell.output_size. It returns an LSTM or GRU cell, which is passed in as a parameter.
- inputs: the input training or test data, in the format [batch_size, max_time, embed_size], where batch_size is the number of samples fed in, max_time is the maximum sequence length in the data, and embed_size is the dimension of the word embedding.
- sequence_length: a list recording the actual length of each sequence in the batch; for example, if three sentences of lengths 5, 10 and 25 are fed in, then sequence_length=[5, 10, 25].
- time_major: determines the tensor format of the output. If True, the tensors have shape [max_time, batch_size, cell.output_size]; if False, they have shape [batch_size, max_time, cell.output_size]. cell.output_size is the number of neurons in the RNN cell.
The return values are as follows:
- outputs: all outputs over the time_steps steps; its shape is (batch_size, time_steps, cell.output_size).
- state: the hidden state of the last step; its shape is (batch_size, cell.state_size).
In more detail:
- outputs is a tensor containing the output of every step.
  - If time_major==True, its shape is [max_time, batch_size, cell.output_size].
  - If time_major==False (the default), its shape is [batch_size, max_time, cell.output_size].
- state is a tensor: the final state, i.e. the state output by the last cell in the sequence. Generally state has shape [batch_size, cell.output_size]; when a BasicLSTMCell is used, state has shape [2, batch_size, cell.output_size], where the 2 corresponds to the LSTM's cell state and hidden state.
cell.output_size is the number of neurons in the RNN cell. If three sentences are fed in, max_time is the number of words in the longest sentence.
3.1.2 Usage
Suppose our input data has the format (batch_size, time_steps, input_size), where:
- batch_size is the amount of data fed in this batch;
- time_steps is the length of the sequence itself; for example, in a char-RNN, a sentence of length 10 has time_steps equal to 10;
- input_size is the size of the input at a single time step within one sequence.
Having defined an RNNCell, dynamic_rnn calls its call function time_steps times for us:
# inputs: shape = (batch_size, time_steps, input_size)
# cell: an RNNCell instance
# initial_state: shape = (batch_size, cell.state_size). The initial state; an all-zero state is always a valid choice.
outputs, state = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial_state)
The following illustrates the parameters with sample data (four 5-character Chinese sentences):
- Xiao Ming loves studying
- Xiao Wang loves studying
- Xiao Li loves studying
- Xiao Hua loves studying
Typically, the sample data is fed into the model as (batch_size, time_step, embedding_size), here (4, 5, 100):
- 4 means four samples are fed per batch: at the first time step the characters (Xiao, Xiao, Xiao, Xiao) are fed, at the second (Ming, Wang, Li, Hua), and so on;
- 5 is the time step: each sentence consists of five characters.
Here’s another example:
import tensorflow as tf
import numpy as np
from tensorflow.python.ops import variable_scope as vs

output_size = 5
batch_size = 4
time_step = 3
dim = 3

cell = tf.nn.rnn_cell.BasicRNNCell(num_units=output_size)
inputs = tf.placeholder(dtype=tf.float32, shape=[time_step, batch_size, dim])
h0 = cell.zero_state(batch_size=batch_size, dtype=tf.float32)
X = np.array([[[1, 2, 1], [2, 0, 0], [2, 1, 0], [1, 1, 0]],  # x1
              [[1, 2, 1], [2, 0, 0], [2, 1, 0], [1, 1, 0]],  # x2
              [[1, 2, 1], [2, 0, 0], [2, 1, 0], [1, 1, 0]]]) # x3
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=h0, time_major=True)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
a, b = sess.run([outputs, final_state], feed_dict={inputs: X})
print(a)
print(b)
3.1.3 time_step
Specific explanations are as follows:
Text data
If the data consists of 1000 sentences, each sentence has 25 words, and each word is vectorized with dimension 300, then batch_size=1000, time_steps=25, input_size=300.
time_steps is the length of the sentence; input_size is the length of each word vector.
Image data
Take the MNIST handwritten digit set as an example: if the training data has 6000 handwritten digit images and each image is 28*28, then batch_size=6000, time_steps=28, input_size=28. We can think of each image as being split into 28 rows, each row of shape (1, 28).
Audio data
For single-channel audio, the data is one-dimensional, say shape=(8910,). Data fed to an RNN must be at least two-dimensional, and with batch_size added it becomes three-dimensional: the first dimension is batch_size, the second is time_steps, and the third is input_size. So we reshape the data into 3-D so that the RNN can consume it.
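A sketch of that reshaping, using the 8910-sample figure from above (how many samples to feed per time step is a design choice; 90 is assumed here for illustration):

import numpy as np

audio = np.random.randn(8910)            # one single-channel clip, 1-D

input_size = 90                          # samples fed per time step (assumed)
time_steps = 8910 // input_size          # 99 steps
batch = audio.reshape(1, time_steps, input_size)   # (batch_size, time_steps, input_size)
print(batch.shape)                       # (1, 99, 90)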
3.2 How the loop is implemented
An RNN in TensorFlow can be static or dynamic.
- static_rnn unrolls the RNN, trading space for time.
- dynamic_rnn uses a while loop.
The static_rnn call actually builds a graph with the RNN unrolled along the time axis. Open TensorBoard and you will see sequence_length rnn_cells stacked together, although these cells share weights. As a result, sequence_length is baked into the graph topology, which forces every batch to use the same sequence_length.
dynamic_rnn does not unroll the RNN. Instead it uses the tf.while_loop API to generate a graph containing control-flow nodes such as Enter, Switch, Merge, LoopCondition, NextIteration and so on, which can execute a loop (the graph is still static, because its topology does not change during execution). On TensorBoard you will only see one rnn_cell surrounded by a bunch of control-flow nodes. For dynamic_rnn, sequence_length only controls the number of iterations and has nothing to do with the graph topology, so each batch can have a different sequence_length.
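For comparison, a sketch of the two call styles (TensorFlow 1.x; the shapes and scope names are illustrative):

import tensorflow as tf

batch_size, time_steps, input_size, num_units = 32, 10, 8, 16
inputs = tf.placeholder(tf.float32, [batch_size, time_steps, input_size])

# static_rnn: the graph is unrolled; it takes a Python list of time_steps tensors
cell_a = tf.nn.rnn_cell.GRUCell(num_units)
static_out, static_state = tf.nn.static_rnn(
    cell_a, tf.unstack(inputs, axis=1), dtype=tf.float32, scope="static_rnn")

# dynamic_rnn: one cell plus a while_loop; sequence length can vary per batch
cell_b = tf.nn.rnn_cell.GRUCell(num_units)
dynamic_out, dynamic_state = tf.nn.dynamic_rnn(
    cell_b, inputs, dtype=tf.float32, scope="dynamic_rnn")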
For DIEN, when the program runs, the stack looks like this:
call, utils.py:144
__call__, utils.py:114
<lambda>, rnn.py:752
_rnn_step, rnn.py:236
_time_step, rnn.py:766
_BuildLoop, control_flow_ops.py:2590
BuildLoop, control_flow_ops.py:2640
while_loop, control_flow_ops.py:2816
_dynamic_rnn_loop, rnn.py:786
dynamic_rnn, rnn.py:615
__init__, model.py:364
train, train.py:142
<module>, train.py:231
The implementation of the loop lives mainly in control_flow_ops.py.
while_loop repeatedly runs the body argument as long as the cond argument is true.
def while_loop(cond, body, loop_vars, shape_invariants=None,
               parallel_iterations=10, back_prop=True, swap_memory=False,
               name=None):
  """Repeat `body` while the condition `cond` is true.

  `cond` is a callable returning a boolean scalar tensor. `body` is a callable
  returning a (possibly nested) tuple, namedtuple or list of tensors of the same
  arity (length and structure) and types as `loop_vars`. `loop_vars` is a
  (possibly nested) tuple, namedtuple or list of tensors that is passed to both
  `cond` and `body`. `cond` and `body` both take as many arguments as there are
  `loop_vars`.

  Args:
    cond: A callable that represents the termination condition of the loop.
    body: A callable that represents the loop body.
    loop_vars: A (possibly nested) tuple, namedtuple or list of numpy array,
      `Tensor`, and `TensorArray` objects.
  """
  if context.in_eager_mode():
    while cond(*loop_vars):
      loop_vars = body(*loop_vars)
    return loop_vars
  if shape_invariants is not None:
    nest.assert_same_structure(loop_vars, shape_invariants)
  loop_context = WhileContext(parallel_iterations, back_prop, swap_memory)  # pylint: disable=redefined-outer-name
  ops.add_to_collection(ops.GraphKeys.WHILE_CONTEXT, loop_context)
  result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
  return result
Here’s an example:
i = tf.constant(0)
c = lambda i: tf.less(i, 10)
b = lambda i: tf.add(i, 1)
r = tf.while_loop(c, b, [i])
print(sess.run(r))  # 10
In rnn.py, _time_step is passed to while_loop as the loop body, which completes the iteration:
_, output_final_ta, final_state = control_flow_ops.while_loop(
    cond=lambda time, *_: time < time_steps,
    body=_time_step,
    loop_vars=(time, output_ta, state),
    parallel_iterations=parallel_iterations,
    swap_memory=swap_memory)
3.3 The RNN in DIEN
In the DIEN project, the main change is to the _time_step function, because the att_scores parameter has to be added.
It mainly:
- calls the cell's call function via lambda: cell(input_t, state, att_score); this is the business logic we wrote earlier;
- performs the loop iteration by calling control_flow_ops.while_loop(cond=lambda time, *_: time < time_steps, body=_time_step, ...).
The code for the reduced version is as follows:
def _time_step(time, output_ta_t, state, att_scores=None):
  """Take a time step of the dynamic RNN.

  Args:
    time: int32 scalar Tensor.
    output_ta_t: List of `TensorArray`s that represent the output.
    state: nested tuple of vector tensors that represent the state.
  Returns:
    The tuple (time + 1, output_ta_t with updated flow, new_state).
  """
  ......
  if att_scores is not None:
    att_score = att_scores[:, time, :]
    call_cell = lambda: cell(input_t, state, att_score)
  else:
    call_cell = lambda: cell(input_t, state)
  ......
  output_ta_t = tuple(
      ta.write(time, out) for ta, out in zip(output_ta_t, output))
  if att_scores is not None:
    return (time + 1, output_ta_t, new_state, att_scores)
  else:
    return (time + 1, output_ta_t, new_state)

if att_scores is not None:
  _, output_final_ta, final_state, _ = control_flow_ops.while_loop(
      cond=lambda time, *_: time < time_steps,
      body=_time_step,
      loop_vars=(time, output_ta, state, att_scores),
      parallel_iterations=parallel_iterations,
      swap_memory=swap_memory)
else:
  _, output_final_ta, final_state = control_flow_ops.while_loop(
      cond=lambda time, *_: time < time_steps,
      body=_time_step,
      loop_vars=(time, output_ta, state),
      parallel_iterations=parallel_iterations,
      swap_memory=swap_memory)
......
0xEE Personal information
★★★★★ Thoughts on life and technology ★★★★★
WeChat official account: Rosie's Thoughts
If you want timely notifications of new articles, or want to see the technical material I recommend, please follow this account.