To view the tutorial in full, go to studyai.com/pytorch-1.4…

We will build and train a basic character-level RNN to classify words. A character-level RNN reads a word as a series of characters, outputting a prediction and "hidden state" at each step and feeding its previous hidden state into the next step. We take the final prediction as the output, i.e. which category the word belongs to.

Specifically, we will train on a few thousand surnames from 18 languages of origin and predict which language a name comes from based on its spelling:

$ python predict.py Hinton
(-0.47) Scottish
(-1.52) English
(-3.57) Irish

$ python predict.py Schmidhuber
(-0.19) German
(-2.48) Czech
(-2.68) Dutch

Recommended reading:

I assume you have recently installed PyTorch, know Python, and understand what Tensors are:

https://pytorch.org/ for installation instructions
Deep Learning with PyTorch: A 60 Minute Blitz to get started with PyTorch in general
Learning PyTorch with Examples for a broad and in-depth overview
PyTorch for Former Torch Users if you are a former Lua Torch user

It would be useful to know RNNs and how they work:

The Unreasonable Effectiveness of Recurrent Neural Networks shows a bunch of real-life examples. Understanding LSTM Networks is specifically about LSTMs, but also informative about RNNs in general.

Preparing the data

Note

Download the data from here and extract it to the current directory.

Contained in the data/names directory are 18 text files named [Language].txt. Each file contains a number of names, one name per line, mostly romanized (but we still need to convert them from Unicode to ASCII).

In the end we get a dictionary mapping each language to a list of names, {language: [names…]}. The generic variables "category" and "line" (for language and name in this case) are used for later extensibility.

from __future__ import unicode_literals, print_function, division
from io import open
import glob
import os

def findFiles(path): return glob.glob(path)

print(findFiles('data/names/*.txt'))

import unicodedata
import string

all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)

# Turn a Unicode string to plain ASCII, thanks to http://stackoverflow.com/a/518232/2809427
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
        and c in all_letters
    )

print(unicodeToAscii('Ślusarski'))
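# Expected output: Slusarski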

# Build the category_lines dictionary, a list of names per language
category_lines = {}
all_categories = []

# Read a file and split into lines
def readLines(filename):
    lines = open(filename, encoding='utf-8').read().strip().split('\n')
    return [unicodeToAscii(line) for line in lines]

for filename in findFiles('data/names/*.txt'):
    category = os.path.splitext(os.path.basename(filename))[0]
    all_categories.append(category)
    lines = readLines(filename)
    category_lines[category] = lines

n_categories = len(all_categories)

Now we have category_lines, a dictionary mapping each category (language) to a list of lines (names). We also keep all_categories (just a list of the languages) and n_categories around for later reference.

print(category_lines['Italian'][:5])
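# Prints the first five Italian names in the dataset.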

Converting names to tensors

Now that we have all the names organized, we need to convert them to tensors to use them.

To represent a single letter, we use a "one-hot vector" of size <1 x n_letters>. A one-hot vector is filled with zeros except for a 1 at the index of the current letter, e.g. "b" = <0 1 0 0 0 …>.

To make a word, we join a bunch of those into a 2D matrix of size <line_length x 1 x n_letters>.

That extra dimension of 1 is because PyTorch assumes everything is in batches; we are just using a batch size of 1 here.

import torch

# Find letter index from all_letters, e.g. "a" = 0
def letterToIndex(letter):
    return all_letters.find(letter)

# Just for demonstration, turn a letter into a <1 x n_letters> Tensor
def letterToTensor(letter):
    tensor = torch.zeros(1, n_letters)
    tensor[0][letterToIndex(letter)] = 1
    return tensor

# Turn a line into a <line_length x 1 x n_letters>,
# or an array of one-hot letter vectors
def lineToTensor(line):
    tensor = torch.zeros(len(line), 1, n_letters)
    for li, letter in enumerate(line):
        tensor[li][0][letterToIndex(letter)] = 1
    return tensor

print(letterToTensor('J'))

print(lineToTensor('Jones').size())
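# letterToTensor('J') prints a <1 x n_letters> tensor that is all zeros except for a 1 at the index of 'J';
# lineToTensor('Jones').size() prints torch.Size([5, 1, 57]) — 5 letters, batch of 1, 57 possible characters.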

Creating the network

Before autograd, creating a recurrent neural network in Torch involved cloning the parameters of a layer over several time steps. The layers held hidden state and gradients, which are now handled entirely by the graph itself. This means you can implement an RNN in a very "pure" way, as regular feed-forward layers.

This RNN module (mostly copied from the PyTorch for Torch Users tutorial) is just two linear layers which operate on an input and hidden state, with a LogSoftmax layer after the output.

import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()

        self.hidden_size = hidden_size

        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.i2o(combined)
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)

n_hidden = 128
rnn = RNN(n_letters, n_hidden, n_categories)
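# 57 one-hot inputs (n_letters), 128 hidden units, 18 output categories (n_categories).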

To run a step of this network, we need to pass an input (in our case, the tensor for the current letter) and a previous hidden state (which we initialize to zeros at first). We get back the output (the probability of each language) and the next hidden state (which we keep for the next step).

input = letterToTensor('A')
hidden = torch.zeros(1, n_hidden)

output, next_hidden = rnn(input, hidden)
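# output holds <1 x n_categories> log-probabilities; next_hidden is <1 x n_hidden> and is fed into the next step.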

For efficiency, we don't want to create a new tensor for every step, so we will use lineToTensor instead of letterToTensor and use slices. This could be further optimized by pre-computing batches of tensors (see the sketch after the code below).

input = lineToTensor('Albert')
hidden = torch.zeros(1, n_hidden)

output, next_hidden = rnn(input[0], hidden)
print(output)
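For instance, pre-computing every line tensor once up front could look like this minimal sketch (line_tensors is a hypothetical name, not part of the tutorial), trading memory for per-step tensor creation:

# A sketch of the 'pre-computing' idea: build each name's tensor once,
# then reuse it on every training pass instead of rebuilding it.
line_tensors = {
    category: [lineToTensor(line) for line in lines]
    for category, lines in category_lines.items()
}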

The output, as printed above, is a <1 x n_categories> tensor, where every item is the likelihood of that category (higher is more likely).

Preparing for training

Before we start training, we should make a few helper functions. The first is to interpret the output of the network, which we know to be a likelihood of each category. We can use Tensor.topk to get the index of the greatest value:

def categoryFromOutput(output):
    top_n, top_i = output.topk(1)
    category_i = top_i[0].item()
    return all_categories[category_i], category_i

print(categoryFromOutput(output))
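# Returns a (category name, category index) tuple; with an untrained network the guess is essentially random.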

We also need a quick way to get a training example (a name and its language):

import random

def randomChoice(l):
    return l[random.randint(0, len(l) - 1)]

def randomTrainingExample():
    category = randomChoice(all_categories)
    line = randomChoice(category_lines[category])
    category_tensor = torch.tensor([all_categories.index(category)], dtype=torch.long)
    line_tensor = lineToTensor(line)
    return category, line, category_tensor, line_tensor

for i in range(10):
    category, line, category_tensor, line_tensor = randomTrainingExample()
    print('category =', category, '/ line =', line)

Training the network

Now all it takes to train this network is to show it a bunch of examples, have it make guesses, and tell it when it's wrong.

nn.NLLLoss is an appropriate loss function, since the last layer of the RNN is nn.LogSoftmax.

criterion = nn.NLLLoss()
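As a quick sanity check (not part of the tutorial), nn.NLLLoss applied to log-probabilities gives the same value as nn.CrossEntropyLoss applied to the raw scores, which is exactly why the LogSoftmax/NLLLoss pairing works:

import torch
import torch.nn as nn

# Hypothetical scores and target, just to demonstrate the equivalence.
scores = torch.randn(1, 18)                # raw scores for 18 categories
target = torch.tensor([7])                 # target category index
log_probs = nn.LogSoftmax(dim=1)(scores)
print(nn.NLLLoss()(log_probs, target))     # same value as the line below
print(nn.CrossEntropyLoss()(scores, target))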

Each loop of training will:

Create the input and target tensors
Create a zeroed initial hidden state
Read each letter in, keeping the hidden state for the next letter
Compare the final output to the target
Back-propagate
Return the output and loss
learning_rate = 0.005  # If you set this too high, it might explode. If too low, it might not learn.

def train(category_tensor, line_tensor):
    hidden = rnn.initHidden()

    rnn.zero_grad()

    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)

    loss = criterion(output, category_tensor)
    loss.backward()

    # Add parameters' gradients to their values, multiplied by learning rate
    for p in rnn.parameters():
        p.data.add_(p.grad.data, alpha=-learning_rate)

    return output, loss.item()
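Equivalently, you could let torch.optim handle the update; this sketch (train_with_optimizer is a hypothetical name, not the tutorial's code) behaves the same as the manual loop above for plain SGD:

import torch.optim as optim

# Same training step as train(), but with the parameter update delegated to an optimizer.
optimizer = optim.SGD(rnn.parameters(), lr=learning_rate)

def train_with_optimizer(category_tensor, line_tensor):
    hidden = rnn.initHidden()
    optimizer.zero_grad()
    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)
    loss = criterion(output, category_tensor)
    loss.backward()
    optimizer.step()  # p <- p - learning_rate * p.grad for every parameter
    return output, loss.item()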

Now we just have to run that with a bunch of examples. Since the train function returns both the output and the loss, we can print its guesses and also keep track of the loss for plotting. Since there are thousands of examples, we print only every print_every-th example and take an average of the loss.

import time
import math

n_iters = 100000
print_every = 5000
plot_every = 1000



# Keep track of losses for plotting
current_loss = 0
all_losses = []

def timeSince(since):
    now = time.time()
    s = now - since
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)

start = time.time()

for iter in range(1, n_iters + 1):
    category, line, category_tensor, line_tensor = randomTrainingExample()
    output, loss = train(category_tensor, line_tensor)
    current_loss += loss

    # Print iter number, loss, name and guess
    if iter % print_every == 0:
        guess, guess_i = categoryFromOutput(output)
        correct = '✓' if guess == category else '✗ (%s)' % category
        print('%d %d%% (%s) %.4f %s / %s %s' % (iter, iter / n_iters * 100, timeSince(start), loss, line, guess, correct))

    # Add current loss avg to list of losses
    if iter % plot_every == 0:
        all_losses.append(current_loss / plot_every)
        current_loss = 0

Plotting the results

Plotting the historical loss from all_losses shows the network learning:

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

plt.figure()
plt.plot(all_losses)
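The plot is easier to read with labeled axes; a small optional addition (not in the original code):

plt.xlabel('iteration (x%d)' % plot_every)
plt.ylabel('average loss')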

Evaluating the results

To see how well the network performs on different categories, we will create a confusion matrix, indicating for every actual language (rows) which language the network guesses (columns). To calculate the confusion matrix, a bunch of samples are run through the network with evaluate(), which is the same as train() minus the backpropagation.

# Keep track of correct guesses in a confusion matrix
confusion = torch.zeros(n_categories, n_categories)
n_confusion = 10000

# Just return an output given a line
def evaluate(line_tensor):
    hidden = rnn.initHidden()

    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)

    return output

# Go through a bunch of examples and record which are correctly guessed
for i in range(n_confusion):
    category, line, category_tensor, line_tensor = randomTrainingExample()
    output = evaluate(line_tensor)
    guess, guess_i = categoryFromOutput(output)
    category_i = all_categories.index(category)
    confusion[category_i][guess_i] += 1

# Normalize by dividing every row by its sum
for i in range(n_categories):
    confusion[i] = confusion[i] / confusion[i].sum()

# Set up plot
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(confusion.numpy())
fig.colorbar(cax)

# Set up axes
ax.set_xticklabels([' '] + all_categories, rotation=90)
ax.set_yticklabels([' '] + all_categories)

# Force label at every tick
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
ax.yaxis.set_major_locator(ticker.MultipleLocator(1))

# sphinx_gallery_thumbnail_number = 2
plt.show()

You can pick out bright spots off the main diagonal that show which languages the network guesses incorrectly, e.g. Chinese for Korean, and Spanish for Italian. It seems to do very well with Greek and very poorly with English (perhaps because of overlap with other languages).

Running on user input

def predict(input_line, n_predictions=3):
    print('\n> %s' % input_line)
    with torch.no_grad():
        output = evaluate(lineToTensor(input_line))

        # Get top N categories
        topv, topi = output.topk(n_predictions, 1, True)
        predictions = []

        for i in range(n_predictions):
            value = topv[0][i].item()
            category_index = topi[0][i].item()
            print('(%.2f) %s' % (value, all_categories[category_index]))
            predictions.append([value, all_categories[category_index]])

        return predictions

predict('Dovesky')
predict('Jackson')
predict('Satoshi')

A final version of this code (in the Practical PyTorch repo) splits the above code into a few files:

model.py (defines the RNN)
train.py (runs the training procedure)
predict.py (runs predict() with a command-line argument)
server.py (uses bottle.py to serve predictions as a JSON API)

Run train.py to train and save the network.

Run predict.py with a name to see the predicted results:

$ python predict.py Hazaki
(-0.42) Japanese
(-1.39) Polish
(-3.51) Czech

Run server.py and visit http://localhost:5533/Yourname to get JSON output of the prediction.

Exercises

Try a different dataset with the structure line -> category, for example:
Any word -> language
First name -> gender
Character name -> author
Page title -> blog
Get better results with a bigger and/or better-shaped network:
Add more linear layers
Try the nn.LSTM and nn.GRU layers
Combine multiple of these RNNs into a higher-level network