1. What is an RNN?

A Recurrent Neural Network (RNN) is a class of neural networks that takes sequence data as input, recurses in the direction in which the sequence progresses, and connects all of its nodes (recurrent units) in a chain.

1.1 Applications of RNN

  • Text generation (Generate sequences)
  • Machine translation
  • Image captioning (look at a picture and describe it)
  • Text sentiment analysis
  • Intelligent customer service
  • Chatbot
  • Speech recognition
  • Search engine
  • Personalized recommendation

1.2 Why do we need RNNs when we already have CNNs?

  • In traditional neural networks (including CNNs), inputs and outputs are independent of each other: a cat and a dog in an image are recognized separately. But for some tasks, later outputs depend on earlier content. For example: "I am Chinese, my mother tongue is ____." This is a fill-in-the-blank question that depends on the earlier input.
  • Therefore, RNN introduces the concept of "memory": the output depends on the previous inputs in the sequence, and the key inputs need to be remembered. The term "recurrent" comes from the fact that the network performs the same task on every element of the sequence.
  • Instead of rigidly memorizing all fixed-length sequences, it stores information about previous time steps in hidden states.

1.3 Network structure of RNN

First the structure diagram, then the explanation:

(Figure: RNN with a hidden state)

Now let’s consider time dependence in the input data. Assume that $X_t \in \mathbb{R}^{n \times d}$ is the minibatch input at time step $t$ of the sequence, and $H_t \in \mathbb{R}^{n \times h}$ is the hidden variable of that time step. Then, according to the structure diagram above, the hidden variable of the current time step is computed as:

$$H_t = \phi(X_t W_{xh} + H_{t-1} W_{hh} + b_h)$$

From this formula we can see that we keep the hidden variable $H_{t-1}$ of the previous time step and introduce a new weight parameter $W_{hh}$ that describes how the previous time step's hidden variable is used at the current time step. Specifically, the hidden variable $H_t$ of time step $t$ is determined jointly by the input of the current time step and the hidden variable of the previous time step. $\phi$ is an activation function.

Here we have added the term $H_{t-1} W_{hh}$, which links the hidden variables of adjacent time steps. The hidden variable can therefore capture the historical information of the sequence up to the current time step, just like the state or memory of the network at the current time step; for this reason the hidden variable is also called the hidden state. Since the definition of the hidden state at the current time step uses the hidden state of the previous time step, the computation of this formula is recurrent. Networks that use recurrent computation are called recurrent neural networks.

At time step $t$, the output of the output layer is computed similarly to a multilayer perceptron:

$$O_t = H_t W_{hq} + b_q$$
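To make the recurrence concrete, here is a minimal NumPy sketch of the forward computation above (the $\tanh$ activation and all shapes are illustrative assumptions, not values from the text):

```python
import numpy as np

def rnn_forward(X, H0, W_xh, W_hh, b_h, W_hq, b_q):
    """Vanilla RNN forward pass over a sequence.

    X: (T, n, d) inputs; H0: (n, h) initial hidden state.
    Returns the per-step outputs and the final hidden state.
    """
    H = H0
    outputs = []
    for X_t in X:                           # iterate over time steps
        # H_t = phi(X_t W_xh + H_{t-1} W_hh + b_h), with phi = tanh
        H = np.tanh(X_t @ W_xh + H @ W_hh + b_h)
        # O_t = H_t W_hq + b_q
        outputs.append(H @ W_hq + b_q)
    return np.stack(outputs), H

# Tiny usage example with random parameters
T, n, d, h, q = 5, 2, 3, 4, 2               # steps, batch, input, hidden, output dims
rng = np.random.default_rng(0)
X = rng.normal(size=(T, n, d))
params = [rng.normal(scale=0.1, size=s) for s in
          [(d, h), (h, h), (h,), (h, q), (q,)]]
O, H_T = rnn_forward(X, np.zeros((n, h)), *params)
print(O.shape)                              # (5, 2, 2)
```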
1.4 Bidirectional RNN

The recurrent neural network models introduced above all assume that the current time step is determined by the earlier time steps of the sequence, so they all pass information forward through hidden states. Sometimes, however, the current time step may also be determined by later time steps. For example, when we write a sentence, we may revise words at the front of the sentence based on words that come later. Bidirectional recurrent neural networks handle such information more flexibly by adding a hidden layer that passes information from back to front. The figure below illustrates the architecture of a bidirectional recurrent neural network with a single hidden layer.

In the architecture of a bidirectional recurrent neural network, assume the forward hidden state at time step $t$ is $\overrightarrow{H}_t \in \mathbb{R}^{n \times h}$ (the number of forward hidden units is $h$) and the backward hidden state is $\overleftarrow{H}_t \in \mathbb{R}^{n \times h}$ (the number of backward hidden units is $h$). We compute the forward and backward hidden states separately:

$$\overrightarrow{H}_t = \phi(X_t W_{xh}^{(f)} + \overrightarrow{H}_{t-1} W_{hh}^{(f)} + b_h^{(f)})$$

$$\overleftarrow{H}_t = \phi(X_t W_{xh}^{(b)} + \overleftarrow{H}_{t+1} W_{hh}^{(b)} + b_h^{(b)})$$

Then we concatenate the hidden states of the two directions, $\overrightarrow{H}_t$ and $\overleftarrow{H}_t$, to get the hidden state $H_t \in \mathbb{R}^{n \times 2h}$ and feed it to the output layer. The output layer computes the output $O_t \in \mathbb{R}^{n \times q}$ (the number of outputs is $q$):

$$O_t = H_t W_{hq} + b_q$$
The hidden state of the bidirectional recurrent neural network in each time step depends on both the subsequence before and after the time step (including the input of the current time step).
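In a framework such as TensorFlow, this architecture corresponds to wrapping a recurrent layer in a bidirectional wrapper; concatenating the two directions doubles the hidden dimension, as in the equations above. A minimal Keras sketch (all layer sizes are assumed example values):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, 8)),   # (time steps, features)
    # Runs one SimpleRNN left-to-right and one right-to-left, then
    # concatenates their hidden states: output dim is 2 * 16 = 32.
    tf.keras.layers.Bidirectional(
        tf.keras.layers.SimpleRNN(16, return_sequences=True)),
    tf.keras.layers.Dense(4),                 # output layer: O_t = H_t W_hq + b_q
])
model.summary()
```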

1.5 BPTT algorithm

You’ve seen before how forward propagation (the direction of the blue arrows in the figure) is computed from left to right in the network until all the predictions have been produced. For backpropagation, as you may have guessed, the direction of computation (the direction of the red arrows) is essentially the opposite of forward propagation.

Let’s define an element-wise loss function for a single time step (here, the cross-entropy loss):

$$L^{\langle t \rangle}(\hat{y}^{\langle t \rangle}, y^{\langle t \rangle}) = -y^{\langle t \rangle} \log \hat{y}^{\langle t \rangle} - (1 - y^{\langle t \rangle}) \log (1 - \hat{y}^{\langle t \rangle})$$
The loss function of the whole sequence is the sum over all time steps:

$$L(\hat{y}, y) = \sum_{t=1}^{T_y} L^{\langle t \rangle}(\hat{y}^{\langle t \rangle}, y^{\langle t \rangle})$$
In this calculation, the loss at each time step can be computed from $\hat{y}^{\langle t \rangle}$ and $y^{\langle t \rangle}$: first the loss of the first time step, then the second, then the third, and so on until the last time step. Finally, to compute the overall loss $L$, we sum them all up through the equation above, adding the loss of every individual time step. The parameters can then be updated by gradient descent using the derivatives with respect to those parameters.
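As a quick sanity check, the total loss is literally a sum of per-step cross-entropy terms. A minimal sketch (the binary labels and probability values are assumptions for illustration):

```python
import numpy as np

def sequence_loss(y_hat, y, eps=1e-12):
    """Total loss L: the sum of per-time-step cross-entropy losses L^<t>.

    y_hat, y: shape (T,) predicted probabilities and binary labels.
    """
    per_step = -(y * np.log(y_hat + eps)
                 + (1 - y) * np.log(1 - y_hat + eps))  # L^<t> for each t
    return per_step.sum()                              # L = sum over t

print(sequence_loss(np.array([0.9, 0.2, 0.7]), np.array([1, 0, 1])))
```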

In backpropagation, the most important information transfer, or the most important recursive computation, runs from right to left, which is why this algorithm has its distinctive name, "backpropagation through time". It got this name because in forward propagation you compute from left to right, with the time index 𝑡 increasing along the way, while in backpropagation you compute from right to left, as if going backward in time. "Backpropagation through time" sounds like time travel, as if you would need a time machine to implement the algorithm.

2. Other types of RNN

  • **One to one:** Not especially important; this is just a small standard neural network with input 𝑥 and output 𝑦.

  • **One to many:** Music generation, where the goal is to have the network output a sequence of notes. The input 𝑥 can be an integer representing the genre of music you want, or the first note of the piece you want; if you don't want to input anything, 𝑥 can be an empty input set to the zero vector.

  • **Many to one:** Sentence classification: input a document, output the document's category (see the sketch after this list).

  • **Many to many (input and output sequences of the same length):** Named entity recognition.

  • **Many to many (input and output sequences of different lengths):** Machine translation, where the input sentence and its translation can have different numbers of words.
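In Keras, for example, the many-to-one and equal-length many-to-many cases differ only in whether the recurrent layer returns its full output sequence. A small sketch (all shapes are assumed example values):

```python
import numpy as np
import tensorflow as tf

x = np.random.randn(1, 10, 4).astype("float32")  # (batch, time steps, features)

# Many to one: only the final hidden state is returned -> (1, 16)
print(tf.keras.layers.SimpleRNN(16)(x).shape)

# Many to many (same length): one output per input step -> (1, 10, 16)
print(tf.keras.layers.SimpleRNN(16, return_sequences=True)(x).shape)
```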

3. The difference between CNN and RNN

| Category | Description |
| --- | --- |
| Similarities | 1. Both are extensions of traditional neural networks.<br>2. A forward pass produces the results, and a backward pass updates the model.<br>3. Each layer can hold multiple neurons side by side, and multiple layers can be stacked vertically. |
| Differences | 1. CNN expands in space: neurons convolve with features. RNN expands in time: neurons compute over the outputs of multiple time steps.<br>2. RNN has a memory function and can describe outputs of continuous states over time, while CNN is used for static outputs. |

4. Why does the loss fluctuate greatly during RNN training?

Because an RNN's memory carries the effect of earlier time steps into later ones, the gradients alternate between large and small values, and the learning rate cannot be adjusted individually for each step; as a result, the loss fluctuates during training. To address this problem, a threshold can be set during training: when the gradient exceeds the threshold, it is clipped, and the threshold is used as the magnitude of the gradient, preventing large oscillations. This technique is known as gradient clipping.
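A minimal sketch of gradient clipping in TensorFlow (the threshold 5.0 and the optimizer choice are assumed example values):

```python
import tensorflow as tf

# Option 1: let the optimizer clip each gradient's norm automatically.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, clipnorm=5.0)

# Option 2: clip by hand inside a custom training step, e.g.:
#   grads = tape.gradient(loss, model.trainable_variables)
#   grads, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
#   optimizer.apply_gradients(zip(grads, model.trainable_variables))
```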

5. Example code

TensorFlow implementation of an RNN:
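A minimal self-contained sketch of training a simple RNN classifier in TensorFlow (the synthetic data, shapes, and layer sizes are all illustrative assumptions):

```python
import numpy as np
import tensorflow as tf

# Toy sequence-classification task: the label depends on the sequence mean.
T, d, n_classes = 10, 4, 2
X = np.random.randn(256, T, d).astype("float32")
y = (X.mean(axis=(1, 2)) > 0).astype("int32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(T, d)),
    tf.keras.layers.SimpleRNN(16),        # many to one: final hidden state
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=3, batch_size=32)
```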



Author: @mantchs

GitHub: github.com/NLP-LOVE/ML…

Welcome to join the discussion and improve this project together! Group number: [541954936]