This article is excerpted from Lovemiss-Y's blog at blog.csdn.net/qq_27825451… It helps make the actual operating mechanism of an RNN clear.


Abstract: Implementing a recurrent neural network with today's mainstream deep learning frameworks is simple and convenient, so it is easy to lose track of what the network's inputs are, what its outputs are, and what the so-called state passed between the cells of a recurrent layer actually is. This article assumes basic knowledge and some basic theory of recurrent neural networks, which can be found in my previous two articles:

Blog.csdn.net/qq_27825451…

The reason I am writing this article is that many of the posts I have seen are simply reproductions of one another; some originals do explain the basic implementation, but they remain ambiguous and never get into the details (or perhaps I simply failed to understand them). So I decided to write an article that makes only one point: what exactly are the output of each cell of a recurrent neural network and the state passed between them, and why are they like this? Given my limited level, some inaccuracies may remain.

The basic structure of RNN

An example to thoroughly understand the “dimension” of output and state

(1) Step 1: Create training data sample X

(2) Step 2: Create the computation graph structure

(3) Step 3: Build graph and test

Analysis of the running results of the program

RNN output value and state value summary

Supplement

The basic structure of RNN

You can see this structure in many books and blogs as follows:

Of course, there is nothing wrong with it, but the drawback is that the diagram has been distilled by experts to the point of being too abstract and not easy to understand. So what did I do? I made a sketch in my sketchbook, which I find easier to understand:

It’s not a pretty drawing, but the following illustration explains it in some detail.

  • In fact, at the beginning I struggled over whether the internal operation concatenates the input and state first (matmul(concat(input, state), W) + b) or multiplies them by separate matrices and then adds (input * W_i + state * W_s). The process above already answers my question: clearly the latter. Another discovery is that the state has the same number of units as the hidden layer. (The two formulations are in fact mathematically equivalent; see the sketch right after this list.)
  • The figure shows a batch of batch_size samples; after the computation inside the RNN, we obtain a sequence result of shape [batch_size, max_length, vocab_size] and a state result of shape [batch_size, vocab_size].
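To make the first point above concrete, here is a minimal NumPy sketch (my own illustration with made-up sizes and random weights, not TensorFlow's internal code) showing that the concatenated form and the split form give the same result whenever the big matrix W is simply W_i stacked on top of W_s:

import numpy as np

features, num_units, batch_size = 3, 5, 4
rng = np.random.default_rng(0)

x = rng.standard_normal((batch_size, features))     # input at one time step
s = rng.standard_normal((batch_size, num_units))    # previous state
W_i = rng.standard_normal((features, num_units))    # input weights
W_s = rng.standard_normal((num_units, num_units))   # state weights
b = rng.standard_normal(num_units)

# Form 1: concatenate input and state, then one matmul with the stacked matrix
W = np.vstack([W_i, W_s])                            # shape (features + num_units, num_units)
out_concat = np.tanh(np.concatenate([x, s], axis=1) @ W + b)

# Form 2: two separate matmuls, then add
out_split = np.tanh(x @ W_i + s @ W_s + b)

print(np.allclose(out_concat, out_split))            # True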

An example to thoroughly understand the “dimension” of output and state

This article uses the TensorFlow framework, version 1.9, and builds the network with the dynamic_rnn() function.

(1) Step 1: Create training data sample X

import tensorflow as tf
import numpy as np
import pprint
 
# Sample data of the form (samples, timesteps, features),
# where samples = 4, timesteps = 3, features = 3
train_X = np.array([
    [[0, 1, 2], [9, 8, 7], [3, 6, 8]],
    [[3, 4, 5], [1, 3, 5], [6, 2, 9]],
    [[6, 7, 8], [6, 5, 4], [1, 7, 4]],
    [[9, 0, 1], [3, 7, 4], [5, 8, 2]]])
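As a quick check (my own addition, not part of the original code), the shape of this array matches the description above:

print(train_X.shape)   # (4, 3, 3) -> (samples, timesteps, features)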

(2) Step 2: Create an operational graph structure

# Create a placeholder to hold the training data
X = tf.placeholder(tf.float32, shape=[None, 3, 3])

# TensorFlow can handle variable-length time sequences; here each recurrent cell has 5 neurons
basic_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=5)

# Create a dynamically computed RNN using dynamic_rnn
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)
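Even before running anything, we can already see the static shapes TensorFlow infers for the two tensors returned by dynamic_rnn (this small check is my own addition and is not part of the original code):

print(outputs.get_shape())   # (?, 3, 5)  -> (batch_size, timesteps, num_units)
print(states.get_shape())    # (?, 5)     -> (batch_size, num_units)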

(3) Step 3: Build graph and test

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    outputs_val, states_val = sess.run([outputs, states], feed_dict={X: train_X})
    pprint.pprint(outputs_val)
    pprint.pprint(states_val)
    print('==============================================')
    # View the dimensions of the output values and the state values
    print(np.shape(outputs_val))
    print(np.shape(states_val))
    print('++++++++++++++++++++++++++++++++++++++++++++++++')
    print(basic_cell.state_size)
    print(basic_cell.output_size)

Analysis of the running results of the program

The running results of the program are as follows:

array([[[-0.6108298 ,  0.0596378 , -0.37820065,  0.3211917 ,  0.56089014],
        [-0.99999994,  0.9993362 ,  0.9778955 ,  0.5386584 , -0.9203638 ],
        [-0.9997577 ,  0.9915552 , -0.9343918 , -0.24285667,  0.63978183]],

       [[-0.9990114 ,  0.9177295 , -0.06776464,  0.6655134 ,  0.4589014 ],
        [-0.98075587,  0.53816617, -0.05612217, -0.24713938,  0.48741972],
        [-0.99997896, -0.47020257,  0.9985639 ,  0.99964374,  0.99789286]],

       [[-0.9999998 ,  0.9958609 ,  0.2563717 ,  0.85442424,  0.34319228],
        [-0.99999506,  0.9972327 ,  0.8965514 , -0.5725894 , -0.9418285 ],
        [-0.98404294,  0.99636745, -0.99936426, -0.98879707, -0.83194304]],

       [[-0.9999995 ,  0.92161566,  0.9999575 ,  0.9958721 , -0.23263188],
        [-0.99924177,  0.9996709 , -0.97150767, -0.9945894 , -0.991192  ],
        [-0.99982065,  0.99869967, -0.8663484 , -0.98502225, -0.98442185]]],
      dtype=float32)
array([[-0.9997577 ,  0.9915552 , -0.9343918 , -0.24285667,  0.63978183],
       [-0.99997896, -0.47020257,  0.9985639 ,  0.99964374,  0.99789286],
       [-0.98404294,  0.99636745, -0.99936426, -0.98879707, -0.83194304],
       [-0.99982065,  0.99869967, -0.8663484 , -0.98502225, -0.98442185]],
      dtype=float32)
==============================================
(4, 3, 5)
(4, 5)
++++++++++++++++++++++++++++++++++++++++++++++++
5
5

Through the above tests, we found that:

The dimension of the output value Y is (4, 3, 5). Why? This is illustrated in the hand-drawn diagram above;

The dimension of the state value S is (4, 5), as was also explained above.

In principle, Y and S come from the same computation, but TensorFlow defines the state as the output of the last time step, and that is exactly what S is. We can also see from the results above that the last vector of each sample in the output Y is equal to the corresponding row of S, marked as follows:

array([[[-0.6108298 ,  0.0596378 , -0.37820065,  0.3211917 ,  0.56089014],
        [-0.99999994,  0.9993362 ,  0.9778955 ,  0.5386584 , -0.9203638 ],
        [-0.9997577 ,  0.9915552 , -0.9343918 , -0.24285667,  0.63978183]],   <-- last step of sample 1

       [[-0.9990114 ,  0.9177295 , -0.06776464,  0.6655134 ,  0.4589014 ],
        [-0.98075587,  0.53816617, -0.05612217, -0.24713938,  0.48741972],
        [-0.99997896, -0.47020257,  0.9985639 ,  0.99964374,  0.99789286]],   <-- last step of sample 2

       [[-0.9999998 ,  0.9958609 ,  0.2563717 ,  0.85442424,  0.34319228],
        [-0.99999506,  0.9972327 ,  0.8965514 , -0.5725894 , -0.9418285 ],
        [-0.98404294,  0.99636745, -0.99936426, -0.98879707, -0.83194304]],   <-- last step of sample 3

       [[-0.9999995 ,  0.92161566,  0.9999575 ,  0.9958721 , -0.23263188],
        [-0.99924177,  0.9996709 , -0.97150767, -0.9945894 , -0.991192  ],
        [-0.99982065,  0.99869967, -0.8663484 , -0.98502225, -0.98442185]]],  <-- last step of sample 4
      dtype=float32)
array([[-0.9997577 ,  0.9915552 , -0.9343918 , -0.24285667,  0.63978183],     <-- equals last step of sample 1
       [-0.99997896, -0.47020257,  0.9985639 ,  0.99964374,  0.99789286],     <-- equals last step of sample 2
       [-0.98404294,  0.99636745, -0.99936426, -0.98879707, -0.83194304],     <-- equals last step of sample 3
       [-0.99982065,  0.99869967, -0.8663484 , -0.98502225, -0.98442185]],    <-- equals last step of sample 4
      dtype=float32)
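The same fact can be verified programmatically; a one-line check (my own addition), run after the session code above:

print(np.allclose(outputs_val[:, -1, :], states_val))   # True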

RNN output value and state value summary

This summary is based on TensorFlow's dynamic_rnn, but it describes the essence of how an RNN operates, so it is generally applicable.

Conclusion summary:

(1) The dimension of the RNN output output_Y is (samples, timesteps, num_units).

(2) The dimension of the RNN state output_S is (samples, num_units).

Note that output_S and output_Y are essentially the same quantity: the intermediate states are only passed between the cells of the RNN and are never emitted, so they are not counted among the output values; only the state of the last time step is returned as the output state.
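To make the summary concrete, here is a minimal pure-NumPy sketch of the recurrence (my own illustration, not TensorFlow's implementation): the output at every step is simply the new state, and only the last one is returned as the final state.

import numpy as np

def simple_rnn_forward(x, W_i, W_s, b):
    # x has shape (samples, timesteps, features)
    samples, timesteps, _ = x.shape
    num_units = b.shape[0]
    state = np.zeros((samples, num_units))              # initial state: all zeros
    outputs = np.zeros((samples, timesteps, num_units))
    for t in range(timesteps):
        state = np.tanh(x[:, t, :] @ W_i + state @ W_s + b)
        outputs[:, t, :] = state                        # the output at step t is the new state
    return outputs, state                               # final state == outputs[:, -1, :]

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3, 3))
W_i, W_s, b = rng.standard_normal((3, 5)), rng.standard_normal((5, 5)), np.zeros(5)
output_Y, output_S = simple_rnn_forward(x, W_i, W_s, b)
print(output_Y.shape, output_S.shape)                   # (4, 3, 5) (4, 5)
print(np.allclose(output_S, output_Y[:, -1, :]))        # True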

Addendum: take a look at the definition of dynamic_rnn:

tf.nn.dynamic_rnn(
    cell,
    inputs,
    sequence_length=None,
    initial_state=None,
    dtype=None,
    parallel_iterations=None,
    swap_memory=False,
    time_major=False,
    scope=None
)
'''
Defined in tensorflow/python/ops/rnn.py.
See the guide: Neural Network > Recurrent Neural Networks
Creates a recurrent neural network specified by RNNCell cell.
Performs fully dynamic unrolling of inputs.
'''
# Example:
# =====================================================================
# A usage example for reference:
# create a BasicRNNCell
rnn_cell = tf.nn.rnn_cell.BasicRNNCell(hidden_size)
 
# 'outputs' is a tensor of shape [batch_size, max_time, cell_state_size]
 
# defining initial state
initial_state = rnn_cell.zero_state(batch_size, dtype=tf.float32)
 
# 'state' is a tensor of shape [batch_size, cell_state_size]
outputs, state = tf.nn.dynamic_rnn(rnn_cell, input_data,
                                   initial_state=initial_state,
                                   dtype=tf.float32)
# =====================================================================
# Another usage example for reference:
# create 2 LSTMCells
rnn_layers = [tf.nn.rnn_cell.LSTMCell(size) for size in [128, 256]]
 
# create a RNN cell composed sequentially of a number of RNNCells
multi_rnn_cell = tf.nn.rnn_cell.MultiRNNCell(rnn_layers)
 
# 'outputs' is a tensor of shape [batch_size, max_time, 256]
# 'state' is a N-tuple where N is the number of LSTMCells containing a
# tf.contrib.rnn.LSTMStateTuple for each cell
outputs, state = tf.nn.dynamic_rnn(cell=multi_rnn_cell,
                                   inputs=data,
                                   dtype=tf.float32)
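One note on this second example (my own addition, based on how MultiRNNCell states are structured): state here is a tuple with one LSTMStateTuple per layer, each holding a cell state c and a hidden state h of shape [batch_size, size].

# The final hidden state of the top layer can be accessed as follows; for the
# fixed-length case it matches the output at the last time step, outputs[:, -1, :].
top_layer_h = state[-1].h      # shape [batch_size, 256]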

Parameter Description:

  • cell: an instance object created from an RNNCell subclass.
  • inputs: the input to the RNN. If time_major == False, inputs must have shape [batch_size, max_time, features]; if time_major == True, they must have shape [max_time, batch_size, features]. Note: max_time plays the role of the timesteps used earlier, but why call it max_time here? Because sequences are sometimes of variable length, this is the maximum sequence length; how to handle variable-length sequences is discussed later, so max_time is used for generality. features is also referred to as depth in some places.
  • sequence_length: an optional parameter used to handle variable-length sequences; it is not explained here.
  • time_major: this parameter determines the shape of inputs and outputs. If True, inputs and outputs have shape [max_time, batch_size, depth]; if False, they have shape [batch_size, max_time, depth]. Note that the default is False, but True is more efficient: as I explained in a previous article, the timesteps-first layout corresponds to the way static_rnn unrolls the computation, so no transpose operations are required, whereas the batch-first layout used by dynamic_rnn by default requires some internal transpose operations and is relatively less efficient. (A small sketch follows this list.)
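As a small sketch of the time_major layout (my own addition, built in a fresh graph so it does not clash with the variables created earlier):

tf.reset_default_graph()
X_tm = tf.placeholder(tf.float32, shape=[3, None, 3])     # [max_time, batch_size, features]
cell_tm = tf.nn.rnn_cell.BasicRNNCell(num_units=5)
outputs_tm, states_tm = tf.nn.dynamic_rnn(cell_tm, X_tm, dtype=tf.float32, time_major=True)
print(outputs_tm.get_shape())   # (3, ?, 5) -> [max_time, batch_size, num_units]
print(states_tm.get_shape())    # (?, 5)
# The batch-major train_X from earlier can be converted with np.transpose(train_X, (1, 0, 2)).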

Return values:

  • outputs: The RNN output Tensor.

    • If time_major == False (the default), this will be a Tensor shaped: [batch_size, max_time, cell.output_size].

    • If time_major == True, this will be a Tensor shaped: [max_time, batch_size, cell.output_size].

  • state: the final state; when cell.state_size is an int (as for BasicRNNCell), it is shaped [batch_size, cell.state_size], as discussed above.

Supplement

Finally, here are several animated diagrams of RNNs, collected to share with you: