Introduction
This article is a summary of the Introduction to PyTorch: Deep Learning on Torch course series taught by Jizhi Academy.
The series includes a large number of hands-on projects, such as text classification, handwritten digit recognition, a translator, a music-composing machine, and AI games. It also weaves in many basic and classic topics in machine learning, such as the principle of backpropagation, how neurons work, RNNs, LSTMs, and word2vec for NLP, and transfer learning and convolutional neural networks for images. It is very well suited to students who want an introduction to machine learning, and is a rare find among online courses.
I would like to express my sincere thanks to Mr. Zhang Jiang; I have benefited greatly from studying his course.
Other articles in the same series:
- Thoroughly understand backpropagation in deep learning through a linear-regression fitting example
- Your first neural network — bike-sharing predictor
Main text
The task of this article is to build a neural network model, using product-review text from an e-commerce site, that can judge whether a review is positive or negative. The task is essentially a binary classification problem. In the same way, binary classification can handle tasks such as filtering out meaningless comments or deciding whether an image shows a dog or a cat; the only difference is whether the input is text, images, or some other data type.
Before we begin, let me give a brief outline so you have a general idea of what follows. The core of the task is divided into two parts:
First, data processing. In Your first neural network — bike-sharing predictor, I also spent a lot of space on data processing, including one-hot encoding, normalization, and other basic operations. In this task, data processing is the most important part. It essentially boils down to the question of how to make a machine recognize text and compute with it. Here we mainly use the simplest bag-of-words model (BOW). In addition, regular expressions and the jieba word-segmentation tool are needed to handle the Chinese text.
Second, the basic steps of training a neural network. These are essentially the same as in Your first neural network — bike-sharing predictor: build a model, feed in the input data, predict, compute the loss, backpropagate, adjust the weights with a gradient step, and repeat the whole process until the loss stops falling. So the training steps will not be explained in detail here; you can go back to that article if anything is unclear. The only difference is that the loss function for a classification task differs from that for a regression task, which will be explained later.
Now let's get started, beginning with data processing.
1. Data preprocessing
Let's first look at the data used for modeling. There are two files, good.txt and bad.txt, which store the positive and negative reviews respectively; each line is one review. The data comes from JD.com (2013) and can be obtained from GitHub.
By reading the files line by line and storing the contents in order, filtering out punctuation and segmenting each sentence as we read, we end up with two arrays: one of positive-review texts and one of negative-review texts.
Punctuation can be filtered out with a regular expression, and jieba word segmentation can be used directly; this tool accurately splits a sentence into meaningful words. When words such as "good" or "like" appear in a sentence, we usually take the sentence to be a positive comment, so Chinese words are the smallest unit of analysis.
import re     # regular expression package
import jieba  # jieba word segmentation package

all_words = []      # all words collected from the corpus
pos_sentences = []  # word lists of the positive reviews

# positive reviews
with open('good.txt', 'r') as fr:
    for idx, line in enumerate(fr):
        # filter out punctuation
        line = re.sub("[\s+\.\!\/_,$%^*(+\"\')]+|[+——！，。？、~@#￥%……&*（）：]+", "", line)
        # word segmentation
        words = jieba.lcut(line)
        if len(words) > 0:
            all_words += words
            pos_sentences.append(words)

# Negative reviews are processed the same way, producing the array neg_sentences
BOW (bag-of-words model)
For a computer to process text, the first step is to find a way to vectorize the text. The bag-of-words (BOW) method is a very easy-to-understand way of doing this; let me give a simple example. The idea is to take the number of distinct words in the text as the dimension of the vector, and the frequency of each word in the current sentence as the value at the corresponding position.
Sentence 1: "I like dancing, and so does Xiao Ming." Sentence 2: "I also like singing."
These are two sentences, and we now want to represent them as vectors. From them we can extract a dictionary containing all the words that appear in either sentence.
Dictionary = {1: "I", 2: "like", 3: "dance", 4: "Xiao Ming", 5: "also", 6: "sing"}
The number of words in the dictionary is taken as the dimension of the vector, and the frequency of each word in the current sentence is taken as the value at the corresponding position. So we immediately have vector representations of the two sentences ("like" counts twice in sentence 1, since Xiao Ming also likes dancing):
Sentence 1: [1, 2, 1, 1, 1, 0]
Sentence 2: [1, 1, 0, 0, 1, 1]
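To make the example concrete, here is a minimal sketch (my own illustration, not part of the course code; the English word lists stand in for the real Chinese segmentation) that builds the toy dictionary above and turns the two segmented sentences into count vectors:

```python
from collections import Counter

# toy corpus, already segmented into words (illustrative data)
sent1 = ["I", "like", "dance", "Xiao Ming", "also", "like"]
sent2 = ["I", "also", "like", "sing"]

# dictionary: word -> index, in order of first appearance
diction = {}
for w in sent1 + sent2:
    if w not in diction:
        diction[w] = len(diction)

def bow_vector(words, diction):
    # Count each word and place the count at that word's index
    vec = [0] * len(diction)
    for w, c in Counter(words).items():
        vec[diction[w]] = c
    return vec

print(bow_vector(sent1, diction))  # [1, 2, 1, 1, 1, 0]
print(bow_vector(sent2, diction))  # [1, 1, 0, 0, 1, 1]
```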
Let's get back to our task. Following the idea above, we need to build a large dictionary of all the words and count each word's frequency in order to obtain a vector representation of every sentence. Using the collections module here makes the word-frequency counting even easier.
from collections import Counter  # counter, which makes word-frequency statistics easier

diction = {}  # the big dictionary to build
cnt = Counter(all_words)
for word, freq in cnt.items():
    diction[word] = [len(diction), freq]  # store each word's index and frequency in the dictionary
Once the big dictionary is built, we process the review text line by line.
dataset = []    # vector representations of all sentences; the data used for training and testing
labels = []     # labels of all sentences
sentences = []  # the original segmented sentences

# Handle the positive reviews
for sentence in pos_sentences:
    new_sentence = []
    for l in sentence:
        if l in diction:
            new_sentence.append(word2index(l, diction))
    dataset.append(sentence2vec(new_sentence, diction))
    labels.append(0)  # the positive label is 0
    sentences.append(sentence)

# where:
# word2index encodes a word as its index in the dictionary
# sentence2vec converts the encoded sentence into a vector
# They are not shown in detail here; you can try writing them yourself,
# or download the source code from the GitHub address mentioned at the beginning of this article
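The two helpers are not shown in the article; the following is a rough sketch of what they might look like, assuming diction maps each word to [index, frequency] as built above (this is my own reconstruction, not the course's source code):

```python
import numpy as np

def word2index(word, diction):
    # Return the word's index in the dictionary, or -1 if it is not there
    if word in diction:
        return diction[word][0]
    return -1

def sentence2vec(sentence, diction):
    # sentence is a list of word indices; return a length-normalized bag-of-words vector
    vector = np.zeros(len(diction))
    for idx in sentence:
        vector[idx] += 1
    return 1.0 * vector / len(sentence)
```

Dividing by the sentence length keeps long and short reviews on the same scale; a plain count vector would also work.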
dataset and labels now contain all the information we need: the text vectors and their corresponding labels. Next we can move on to training the model.
The code that follows is fairly long. As mentioned earlier, the detailed explanation of the training process is almost the same as in Your first neural network — the bike-sharing predictor, so the repeated parts will not be explained again in detail. Don't be put off by the amount of code below; it is a paper tiger.
2. Start training
2.1 Build the input and target, and construct the model
That is, process the prepared data and split it, based on dataset and labels, into a training set, a validation set, and a test set.
# Split the whole data set into training, validation, and test sets;
# the validation and test sets are each 1/10 of the whole data set
test_size = len(dataset) // 10
train_data = dataset[2 * test_size :]
train_label = labels[2 * test_size :]
valid_data = dataset[: test_size]
valid_label = labels[: test_size]
test_data = dataset[test_size : 2 * test_size]
test_label = labels[test_size : 2 * test_size]
PyTorch can be used to quickly build a simple neural network model:
import torch
import torch.nn as nn
import numpy as np

# The input dimension is the size of the dictionary: each review is represented as a bag-of-words vector
model = nn.Sequential(
    nn.Linear(len(diction), 10),
    nn.ReLU(),
    nn.Linear(10, 2),
    nn.LogSoftmax(dim=1),
)
- The input is a text vector whose length is the size of the dictionary
- It passes through a linear layer (dictionary size → 10)
- then a nonlinear ReLU transformation
- then another linear layer (10 → 2)
- and finally LogSoftmax normalization
Why is the output two-dimensional here? Isn't our label just 1 or 0?
In fact, for ease of computation we treat the label like a one-hot encoding, with one output per class. One-hot encoding was also mentioned in the previous article: for 0 and 1 there is no relation such as "1 is greater than 0"; like Monday and Tuesday they are categorical variables, and encoding them this way prevents the numeric values 0 and 1 from influencing the training of the neural network.
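To make this concrete, here is a small illustrative snippet (toy_model and its input size are made up, not from the article) showing that the model outputs two log-probabilities per sample, one per class, and how the predicted class is read off:

```python
import torch
import torch.nn as nn

# a model with the same shape as above, but a tiny made-up input size for illustration
toy_model = nn.Sequential(
    nn.Linear(6, 10),
    nn.ReLU(),
    nn.Linear(10, 2),
    nn.LogSoftmax(dim=1),
)

x = torch.rand(1, 6)                     # one fake bag-of-words vector
log_probs = toy_model(x)                 # shape (1, 2): log-probability of class 0 and class 1
pred_class = torch.max(log_probs, 1)[1]  # index of the larger value = predicted class
print(log_probs, pred_class)
```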
2.2 Training + validation process
Let’s go straight to the code
# The loss function is cross entropy (NLLLoss paired with the LogSoftmax output layer)
cost = torch.nn.NLLLoss()
# The optimizer is Adam, which can adjust the learning rate automatically
optimizer = torch.optim.Adam(model.parameters(), lr = 0.01)
records = []

# Loop over 10 epochs
losses = []
for epoch in range(10):
    for i, data in enumerate(zip(train_data, train_label)):
        x, y = data
        x = torch.tensor(x, requires_grad = True, dtype = torch.float).view(1, -1)
        y = torch.tensor([y], dtype = torch.long)
        optimizer.zero_grad()
        predict = model(x)
        loss = cost(predict, y)
        # Add the loss value to the list
        losses.append(loss.data.numpy())
        # Backpropagate the gradients
        loss.backward()
        # Take one optimization step on the parameters
        optimizer.step()

        # Every 3000 steps, run the validation set and print interim results
        if i % 3000 == 0:
            rights = []
            val_losses = []
            for j, val in enumerate(zip(valid_data, valid_label)):
                x, y = val
                x = torch.tensor(x, requires_grad = True, dtype = torch.float).view(1, -1)
                y = torch.tensor([y], dtype = torch.long)
                predict = model(x)
                # Call the rightness function to compute the accuracy
                right = rightness(predict, y)
                rights.append(right)
                loss = cost(predict, y)
                val_losses.append(loss.data.numpy())
            # Compute the average accuracy on the validation set
            right_ratio = 1.0 * np.sum([i[0] for i in rights]) / np.sum([i[1] for i in rights])
            print('Epoch {}, training loss: {:.2f}, validation loss: {:.2f}, validation accuracy: {:.2f}'.format(
                epoch, np.mean(losses), np.mean(val_losses), right_ratio))
            records.append([np.mean(losses), np.mean(val_losses), right_ratio])
Here I want to emphasize the following points:
First, the code above differs slightly from the previous article in that the validation process is written together with the training process, but the idea is the same: the validation data is run through the model as it trains, and we watch how the validation loss val_loss changes, which gives a more objective picture of the result.
Second, for classification problems we can also compute the accuracy of the results with the rightness function. It compares the true labels with the predicted values and counts how many predictions are correct. The predictions form a batch_size × num_classes matrix, and the labels are the corresponding correct answers.
def rightness(predictions, labels):
    """Compute the number of correct predictions.
    predictions: a batch of outputs from the model, a batch_size x num_classes matrix
    labels: the correct answers from the data
    """
    # For each row (one sample) of the output, take the index of the largest element
    pred = torch.max(predictions.data, 1)[1]
    # Compare these indices with the categories in labels and count how many are correct
    rights = pred.eq(labels.data.view_as(pred)).sum()
    # Return the number of correct predictions and how many elements were compared
    return rights, len(labels)
Last, and most important, is the loss function torch.nn.NLLLoss(), i.e. cross entropy.
The cross-entropy formula for binary classification is as follows:
loss = -[y·log(p) + (1 - y)·log(1 - p)]
This is also the loss used in the current task, where:
- y is the label of the sample: 1 for a positive sample, 0 for a negative sample
- p is the predicted probability that the sample is positive.
For a classification problem, the value a neural network predicts is usually a probability. In the current task, for example, the model might output [0.8, 0.2], meaning it assigns a higher probability to the first class (the first value is larger). Cross entropy measures the loss of such a prediction: if the sample truly belongs to the first class, it compares the "distance" between [0.8, 0.2] and the one-hot target [1, 0]; the smaller this distance, the closer the prediction is to the truth.
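As a small numeric illustration (the numbers are invented), this is how NLLLoss scores the [0.8, 0.2] prediction when the true class is the first one; it simply returns -log(0.8):

```python
import torch
import torch.nn as nn

cost = nn.NLLLoss()

# The model's LogSoftmax layer outputs log-probabilities, so we take the log of [0.8, 0.2] here
log_probs = torch.log(torch.tensor([[0.8, 0.2]]))
target = torch.tensor([0])      # the true class is the first one

print(cost(log_probs, target))  # tensor(0.2231), i.e. -log(0.8)
```

Had the true class been the second one, the loss would jump to -log(0.2) ≈ 1.61, so confident wrong predictions are punished heavily.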
When the loss no longer decreases, the model has basically finished training. The figure below plots how the training-set loss, the validation-set loss, and the accuracy change over training.
The model can be considered to perform best in the region where the validation loss and the training loss overlap. If training continues, the training loss keeps falling, but the validation loss stops falling and starts to rise; at that point the model has overfitted.
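A minimal plotting sketch for reproducing such a figure from the records list collected during training (the matplotlib code is my own addition, not from the course):

```python
import numpy as np
import matplotlib.pyplot as plt

records_arr = np.array(records)  # columns: training loss, validation loss, validation accuracy
plt.plot(records_arr[:, 0], label='training loss')
plt.plot(records_arr[:, 1], label='validation loss')
plt.plot(records_arr[:, 2], label='validation accuracy')
plt.xlabel('checkpoint (every 3000 steps)')
plt.legend()
plt.show()
```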
3. Test the model
We use the test-set data to see how well the model predicts.
rights = []
for i, data in enumerate(zip(test_data, test_label)):
    x, y = data
    x = torch.tensor(x, requires_grad = True, dtype = torch.float).view(1, -1)
    y = torch.tensor([y], dtype = torch.long)
    predict = model(x)
    right = rightness(predict, y)
    rights.append(right)

right_ratio = 1.0 * np.sum([i[0] for i in rights]) / np.sum([i[1] for i in rights])
print('Test accuracy: {:.2f}'.format(right_ratio))
The final output accuracy is 0.91.
At this point the job is done: we have a text classifier that can distinguish positive from negative reviews with 91% accuracy.
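To actually classify a brand-new review, the same preprocessing pipeline has to be applied before calling the model. A rough sketch follows (the predict_review wrapper and the example review are mine; it reuses diction, word2index, sentence2vec, and model from above, and assumes the negative label is 1, mirroring the positive label 0 used earlier):

```python
import jieba
import torch

def predict_review(text):
    # Segment the raw text and keep only the words that exist in the dictionary
    words = jieba.lcut(text)
    indices = [word2index(w, diction) for w in words if w in diction]
    if len(indices) == 0:
        return None  # nothing in the review was recognized
    x = torch.tensor(sentence2vec(indices, diction), dtype=torch.float).view(1, -1)
    log_probs = model(x)
    # Class 0 was used for positive reviews above, class 1 (assumed) for negative ones
    return 'positive' if torch.max(log_probs, 1)[1].item() == 0 else 'negative'

print(predict_review('质量很好，非常喜欢'))  # a made-up positive review
```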
Conclusion
Thank you for reading this far. The promised text classifier has now been delivered.
Over the next while, I will also work through real cases such as image recognition, text translation, and AI games. All of the study cases come from Teacher Zhang Jiang's PyTorch and Deep Learning course.
I look forward to sharing them with you.