I have written a few basic articles recently, but they were all concepts with no examples. Today we will look at a basic network structure, the RNN, and then write an example to get a feel for deep neural networks. The mighty RNN may look very profound, but don't panic: there is no heavy theory here, it is all plain language, and everyone can understand it.

Note: when reading, I suggest looking at the table of contents first to see what will be covered; you can also jump straight to the points you care about, which makes it quicker to understand.

1. What is RNN

RNN is short for Recurrent Neural Network. Why "recurrent"? That will be explained step by step; there is no hurry.

RNNs are very effective on sequential data: they can mine the temporal and semantic information in the data. For text, this means an RNN carries context information; it can understand context and dig out the relationships between data points when you do analysis.

For example, take the sentence "I don't love beauty" and split it into words: I, don't, love, beauty. In an ordinary fully connected neural network, each word goes in independently, with no relationship to the others; the network just fits the data with a pile of functions, so the machine may well conclude "love beauty" without ever considering the "don't" that comes before it.

An RNN can solve this problem: it records the information of the whole sentence and then makes a comprehensive judgment before reaching a conclusion.

To sum up: RNN neural network is good at finding the relationship between data.

2. Principle description

2.1 The difference between an RNN and a fully connected neural network

An ordinary fully connected neural network looks like the figure below. Every attribute is independent of the others; the network fits the data with a large number of function parameters and then draws its conclusion, as described in the earlier article on function fitting. You can see that there is no correlation between the individual inputs.
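As a quick illustration (this is my own toy sketch, not code from the original article, and the 4-dimensional word vectors are an arbitrary choice): a plain fully connected layer scores each word vector on its own, so shuffling the word order does not change the combined result at all.

import torch
from torch import nn

fc = nn.Linear(4, 1)                        # hypothetical 4-dim word vectors -> one score each
words = torch.randn(5, 4)                   # 5 word vectors making up one sentence
shuffled = words[[4, 3, 2, 1, 0]]           # the same words in reverse order
print(torch.allclose(fc(words).sum(), fc(shuffled).sum()))  # True: word order is ignored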

An RNN is a recurrent neural network. I want to describe the RNN in the most straightforward terms possible, but you will run into the following picture constantly when you read other material, so I include it here to save you from being stumped by it later. The figure is not that easy to understand at first; once you understand RNNs it becomes clear, but it is a little hard for beginners.

The left part of the figure is the RNN before it is unrolled. So where is the loop in this recurrent neural network? Let's look at the symbols first:

X is a vector that represents the value of the input layer

U is the weight matrix from the input layer to the hidden layer

S is a vector that represents the value of the hidden layer

V is the weight matrix from the hidden layer to the output layer.

O is also a vector; it represents the value of the output layer

W is the weight matrix that feeds the hidden layer's value at the previous time step back into the hidden layer at the current step; this feedback is the loop that gives the recurrent network its name

Expressed as a function:
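The formula from the figure is not reproduced here, so for reference this is the standard form it usually takes (f and g are activation functions, which the simplified code below leaves out):

S_t = f(U * X_t + W * S_{t-1})
O_t = g(V * S_t)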

Let's express that in simplified code (ignoring the activation functions):

def getHidenS(x, u, w, prevS):
    # hidden value: current input times U, plus the previous hidden value times W
    return x * u + prevS * w

def getOutput(s, v):
    # output value: hidden value times V
    return s * v

2.3 RNN expansion diagram interpretation

The diagram on the right looks simple enough: the x at the bottom has gained a time index, and each x_t represents the word that is input at time step t. For example:

For the sentence "I love China": x_{t-1} is the vector representation of the word "I", x_t is the vector representation of "love", and x_{t+1} is the vector representation of "China".

O is the output of the neural network for each input word. In other words, every time you feed in a word vector there is an output; you can then use just one of those outputs or combine several of them, depending on your needs.
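A small sketch with PyTorch's nn.RNN (the same module the example in section 4 uses; the sizes here are arbitrary toy values): a 3-word sentence produces one output vector per time step, and it is up to you whether to keep only the last one or all of them.

import torch
from torch import nn

rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
sentence = torch.randn(1, 3, 4)   # (batch=1, time_step=3, input_size=4): three word vectors
out, h = rnn(sentence)
print(out.shape)                  # torch.Size([1, 3, 8]): one output vector per word
print(out[:, -1, :].shape)        # torch.Size([1, 8]): keep only the last step's output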

2.4 A few key points about RNN

Shared parameters: the whole RNN shares a single set of parameters across its time steps. Whenever the input arrives, at whatever time point, the network uses the same weights, so there is only one set of weight parameters (U, W, V).

Memory: the memory function is realized through the hidden layer's output value, because the hidden layer preserves the information from the previous step.
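To make both points concrete, here is a small sketch (toy sizes, my own example, not from the article): nn.RNN owns just one set of weights no matter how many time steps you feed through it, which is the parameter sharing, and the hidden state it returns can be passed back in as the memory for the next input. (The V matrix from hidden layer to output would live in a separate layer such as nn.Linear.)

import torch
from torch import nn

rnn = nn.RNN(input_size=1, hidden_size=4, batch_first=True)
for name, p in rnn.named_parameters():
    print(name, tuple(p.shape))   # one weight_ih_l0 (the U role) and one weight_hh_l0 (the W role), reused at every step

h = None                          # no memory yet
for step in range(3):
    x = torch.randn(1, 5, 1)      # 5 time steps, all processed by the same weights printed above
    out, h = rnn(x, h)            # h carries the memory forward into the next chunk
print(h.shape)                    # torch.Size([1, 1, 4])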

3. RNN in pseudo-code

Encode "I love China" as numbers: I = 1, love = 2, China = 3.

Input: x = [1, 2, 3]

u = 1      # input layer to hidden layer weight matrix U
w = 1      # hidden layer to hidden layer weight matrix W
v = 2      # hidden layer to output layer weight matrix V
prevS = 1  # previous hidden layer output value

def getHidenS(x, u, w, prevS):
    return x * u + prevS * w

def getOutput(s, v):
    return s * v

sentence = [1, 2, 3]
for x in sentence:
    prevS = getHidenS(x, u, w, prevS)
    o = getOutput(prevS, v)
    print('Hidden layer value: ' + str(prevS))
    print('Output layer value: ' + str(o))
    print('----------------------')

prevS saves the memory of everything seen so far, and each step's output can be used for the final judgment.
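Tracing the loop by hand (u = 1, w = 1, v = 2, initial prevS = 1) shows the memory at work: each hidden value folds in everything that came before it.

x = 1 (I):     prevS = 1*1 + 1*1 = 2,  output = 2*2 = 4
x = 2 (love):  prevS = 2*1 + 2*1 = 4,  output = 4*2 = 8
x = 3 (China): prevS = 3*1 + 4*1 = 7,  output = 7*2 = 14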

4. Here’s a quick example

import torch
from torch import nn
import numpy as np
import matplotlib.pyplot as plt
 
# https://www.cnblogs.com/lokvahkoor/p/12263953.html
# torch.manual_seed(1) # reproducible
 
# hyperparameter definition
TIME_STEP = 10  # rnn time step
INPUT_SIZE = 1  # rnn input size
LR = 0.02  # learning rate
HIDDEN_SIZE = 32  # number of hidden layer neurons
EPOCH = 100
 
# generate 100 evenly spaced points for the horizontal axis
steps = np.linspace(0, np.pi * 2, 100, dtype=np.float32)  # float32 for converting to torch FloatTensor
x_np = np.sin(steps)
y_np = np.cos(steps)
# the input is a sine sequence and the target output is a cosine sequence
plt.plot(steps, y_np, 'r-', label='target (cos)')
plt.plot(steps, x_np, 'b-', label='input (sin)')
plt.legend(loc='best')
plt.show()
 
input("Please enter:")  # pause so you can look at the plot before training starts
 
class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()
 
        self.rnn = nn.RNN(
            input_size=INPUT_SIZE,
            hidden_size=HIDDEN_SIZE,  # Number of hidden neurons
            num_layers=1,     # one layer of RNN
            batch_first=True, # input & output have batch size as the first dimension, e.g. (batch, time_step, input_size)
        )
        self.out = nn.Linear(HIDDEN_SIZE, 1)
 
    def forward(self, x, h_state):
        # x = (batch, time_step, input_size)
        # h_state = (n_layers, batch, hidden_size)
        # r_out = (batch, time_step, hidden_size)
 
        out, h_state = self.rnn(x, h_state)
 
        out = out.view(-1, HIDDEN_SIZE) # (10, 32)
        out = self.out(out)         # (10, 1)
        out = out.unsqueeze(dim=0)  # (1, 10, 1) = (batch, time_step, output_size)
        return out, h_state
 
 
rnn = RNN()
print(rnn)
optimizer = torch.optim.Adam(rnn.parameters(), lr=LR)  # the optimizer
loss_func = nn.MSELoss()  # Loss function
 
h_state = None  # hidden state of the RNN, starts as None
 
plt.figure(1, figsize=(12, 5))
plt.ion()  # continuously plot
 
for step in range(EPOCH):
    # Every time new data is generated, the overall trend is to fit the cosine curve
    start, end = step * np.pi, (step + 1) * np.pi  # time range
    # use sin predicts cos
    steps = np.linspace(start, end, TIME_STEP, dtype=np.float32,
                        endpoint=False)  # float32 for converting torch FloatTensor
    x_np = np.sin(steps)
    y_np = np.cos(steps)
    # np.newaxis insert new dimensions,(1,10,1)
    # shape (batch, time_step, input_size)
    # means one in each batch
    x = torch.from_numpy(x_np[np.newaxis, :, np.newaxis])
    y = torch.from_numpy(y_np[np.newaxis, :, np.newaxis])
 
    prediction, h_state = rnn(x, h_state)  # compute output
    # save this step's hidden state so it can be fed in at the next step
    h_state = h_state.data  # repack the hidden state, break the connection from last iteration
 
    loss = loss_func(prediction, y)  # Calculation error
    optimizer.zero_grad()  # clear the previous gradients
    loss.backward()  # Backpropagation
    optimizer.step()  # Optimization parameters
 
    # Start drawing
    plt.plot(steps, y_np.flatten(), 'r-')
    plt.plot(steps, prediction.data.numpy().flatten(), 'b-')
    plt.draw()
    plt.pause(0.05)
 
plt.ioff()
plt.show()

Take a look at the final fit:

5. Problems existing in RNN

Gradient vanishing: improved variants of the RNN have special ways of storing "memories", so memories with large gradients from the past are not erased immediately the way they are in a simple RNN; this overcomes the vanishing gradient problem to a certain extent.

Gradient explosion: the usual way to overcome gradient explosion is gradient clipping. That is, when a computed gradient exceeds a threshold C or falls below -C, clip it back to C or -C.
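A minimal sketch of gradient clipping in PyTorch (my own toy model, not part of the article's example): torch.nn.utils.clip_grad_value_ clamps every gradient element into [-C, C], which is exactly the rule described above.

import torch
from torch import nn

model = nn.Linear(4, 1)                        # stand-in for an RNN model
loss = model(torch.randn(8, 4)).pow(2).mean()  # a throwaway loss just to get gradients
loss.backward()                                # gradients are now populated

C = 1.0
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=C)  # clip every gradient element into [-C, C]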

6. Summary

The key point of an RNN is its memory function: it preserves the context information. But it also has some problems, and we will analyze how to solve them in a later article.

Original content is not easy to produce; please give it a like to support me. This is a labor of love.

Click on a title below to jump to the article:

1. If you don't jump in now, it will be too late

2. If you don't jump in now, it will be too late: what does the simplest neural network look like?

3. Understand what a tensor is in three minutes