To view the full version of the tutorial, go to studyai.com/pytorch-1.4…
In the last tutorial, we used an RNN to classify names into the language they belong to. This time we'll turn around and generate names from languages.
> python sample.py Russian RUS
Rovakov
Uantov
Shavakov
> python sample.py German GER
Gerren
Ereng
Rosher
> python sample.py Spanish SPA
Salla
Parer
Allan
> python sample.py Chinese CHI
Chan
Hang
Iun
We are still hand-building a small RNN with a few linear layers. The big difference is that instead of predicting a category after reading all the letters of a name, we input a category and output one letter at a time. Recurrently predicting characters to form language (this could also be done with words or other higher-order structures) is often referred to as a "language model."
Recommended reading:
I assume you have at least installed PyTorch, know Python, and understand what tensors are:
- https://pytorch.org/ for installation instructions
- Deep Learning with PyTorch: A 60 Minute Blitz to get started with PyTorch in general
- Learning PyTorch with Examples for a wide and deep overview
- PyTorch for Former Torch Users if you are a former Lua Torch user
It would be useful to know RNNs and how they work:
- The Unreasonable Effectiveness of Recurrent Neural Networks shows a bunch of real-life examples
- Understanding LSTM Networks is about LSTMs specifically, but also informative about RNNs in general
I also recommend the previous tutorial on classifying names with a character-level RNN.
Preparing the data
Note
Download the data from here and extract it to the current directory.
Refer to the previous tutorial for more information on this process. To put it simply, we have a bunch of plain text files, data/names/[Language].txt, where each line is a name. We split lines into an array, converted from Unicode to ASCII, and constructed a dictionary: {language: [names…] }.
from __future__ import unicode_literals, print_function, division
from io import open
import glob
import os
import unicodedata
import string
all_letters = string.ascii_letters + " .,;'-"
n_letters = len(all_letters) + 1 # Plus EOS marker
def findFiles(path): return glob.glob(path)
# Turn a Unicode string to plain ASCII, thanks to http://stackoverflow.com/a/518232/2809427
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
        and c in all_letters
    )
# Read a file and split into lines
def readLines(filename):
    lines = open(filename, encoding='utf-8').read().strip().split('\n')
    return [unicodeToAscii(line) for line in lines]
# Build the category_lines dictionary, a list of lines per category
category_lines = {}
all_categories = []
for filename in findFiles('data/names/*.txt'):
    category = os.path.splitext(os.path.basename(filename))[0]
    all_categories.append(category)
    lines = readLines(filename)
    category_lines[category] = lines

n_categories = len(all_categories)

if n_categories == 0:
    raise RuntimeError('Data not found. Make sure that you downloaded data '
                       'from https://download.pytorch.org/tutorial/data.zip and extract it to '
                       'the current directory.')
print('# categories:', n_categories, all_categories)
print(unicodeToAscii("O'Néàl"))
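As a quick sanity check (assuming the data archive was extracted and Russian.txt is among the downloaded files), you can print a few of the loaded names:

# Verify the loaded data (assumes 'Russian' is one of the categories)
print(len(category_lines['Russian']), 'Russian names loaded')
print(category_lines['Russian'][:5])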
Creating the network
This network extends the RNN from the previous tutorial with an extra argument for the category tensor, which is concatenated along with the letter and hidden-state inputs. The category tensor is a one-hot vector, just like the letter input.
We will interpret the output as the probability of the next letter. When sampling, the most likely output letter is used as the next input letter.
I added a second linear layer o2o (after combining the hidden state and output) to give it more capacity to work with. There is also a Dropout layer, which randomly zeroes parts of its input with a given probability (0.1 here) and is usually used to blur inputs to prevent overfitting. Here we use it towards the end of the network to purposely add some chaos and increase sampling variety.
import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size

        self.i2h = nn.Linear(n_categories + input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(n_categories + input_size + hidden_size, output_size)
        self.o2o = nn.Linear(hidden_size + output_size, output_size)
        self.dropout = nn.Dropout(0.1)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, category, input, hidden):
        input_combined = torch.cat((category, input, hidden), 1)
        hidden = self.i2h(input_combined)
        output = self.i2o(input_combined)
        output_combined = torch.cat((hidden, output), 1)
        output = self.o2o(output_combined)
        output = self.dropout(output)
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)
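To make the wiring concrete, here is a minimal shape check for one forward step; it assumes the data-loading code above has run so that n_letters and n_categories are defined, and the rnn_check name is just for this illustration:

# Minimal shape check for one forward step
rnn_check = RNN(n_letters, 128, n_letters)
category_in = torch.zeros(1, n_categories)  # placeholder one-hot category
letter_in = torch.zeros(1, n_letters)       # placeholder one-hot letter
hidden = rnn_check.initHidden()
output, hidden = rnn_check(category_in, letter_in, hidden)
print(output.size())  # (1, n_letters) - log-probabilities over next letters
print(hidden.size())  # (1, 128)       - the next hidden state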
Training

Preparing for training

Before training, we need a few helper functions to get random (category, line) pairs:
import random
# Random item from a list
def randomChoice(l):
    return l[random.randint(0, len(l) - 1)]

# Get a random category and random line from that category
def randomTrainingPair():
    category = randomChoice(all_categories)
    line = randomChoice(category_lines[category])
    return category, line
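For a quick look at what this produces:

# Print a few random (category, line) pairs
for _ in range(3):
    print(randomTrainingPair())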
For each timestep (that is, for each letter in a training word), the inputs of the network will be (category, current letter, hidden state) and the outputs will be (next letter, next hidden state). So for each training set, we need the category, a set of input letters, and a set of output/target letters.

Since at every timestep we predict the next letter from the current letter, the letter pairs are groups of consecutive letters from the line - e.g. for "ABCD" we would create ("A", "B"), ("B", "C"), ("C", "D"), ("D", "EOS").

The category tensor is a one-hot tensor of size <1 x n_categories>. When training, we feed it to the network at every timestep. This is a design choice; it could have been included as part of the initial hidden state, or some other strategy could be used.
# One-hot vector for category
def categoryTensor(category):
    li = all_categories.index(category)
    tensor = torch.zeros(1, n_categories)
    tensor[0][li] = 1
    return tensor

# One-hot matrix of first to last letters (not including EOS) for input
def inputTensor(line):
    tensor = torch.zeros(len(line), 1, n_letters)
    for li in range(len(line)):
        letter = line[li]
        tensor[li][0][all_letters.find(letter)] = 1
    return tensor

# LongTensor of second letter to end (EOS) for target
def targetTensor(line):
    letter_indexes = [all_letters.find(line[li]) for li in range(1, len(line))]
    letter_indexes.append(n_letters - 1)  # EOS
    return torch.LongTensor(letter_indexes)
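To see the shifted pairs described above in tensor form, here is a small illustration with a made-up name:

# Illustrate the shifted targets for a made-up name
name = 'Abel'
pairs = list(zip(name, list(name[1:]) + ['<EOS>']))
print(pairs)               # [('A', 'b'), ('b', 'e'), ('e', 'l'), ('l', '<EOS>')]
print(targetTensor(name))  # the same targets as letter indices; EOS is n_letters - 1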
To make training more convenient, we'll add a randomTrainingExample function that fetches a random (category, line) pair and turns it into the required (category, input, target) tensors.
# Make category, input, and target tensors from a random category, line pair
def randomTrainingExample():
    category, line = randomTrainingPair()
    category_tensor = categoryTensor(category)
    input_line_tensor = inputTensor(line)
    target_line_tensor = targetTensor(line)
    return category_tensor, input_line_tensor, target_line_tensor
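As a quick check of the shapes involved (the first dimension depends on the length of the sampled name):

# Inspect the tensors produced for one random example
category_tensor, input_line_tensor, target_line_tensor = randomTrainingExample()
print(category_tensor.size())     # (1, n_categories)
print(input_line_tensor.size())   # (name_length, 1, n_letters)
print(target_line_tensor.size())  # (name_length,)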
Training the network

In contrast to classification, where only the last output is used to make a prediction and compute the loss, here we make a prediction at every step, so we calculate the loss at every step.

The magic of autograd allows you to simply sum the losses at each step and call backward at the end.
criterion = nn.NLLLoss()

learning_rate = 0.0005

def train(category_tensor, input_line_tensor, target_line_tensor):
    target_line_tensor.unsqueeze_(-1)
    hidden = rnn.initHidden()

    rnn.zero_grad()

    loss = 0

    for i in range(input_line_tensor.size(0)):
        output, hidden = rnn(category_tensor, input_line_tensor[i], hidden)
        l = criterion(output, target_line_tensor[i])
        loss += l

    loss.backward()

    for p in rnn.parameters():
        p.data.add_(-learning_rate, p.grad.data)

    return output, loss.item() / input_line_tensor.size(0)
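The code above updates the parameters by hand with a fixed learning rate. As a sketch of an equivalent setup using torch.optim (a substitution, not what the tutorial's own code does), to be run after the rnn = RNN(n_letters, 128, n_letters) line further below:

import torch.optim as optim

# Equivalent update using an optimizer instead of the manual parameter loop
optimizer = optim.SGD(rnn.parameters(), lr=learning_rate)

def train_with_optimizer(category_tensor, input_line_tensor, target_line_tensor):
    target_line_tensor.unsqueeze_(-1)
    hidden = rnn.initHidden()
    optimizer.zero_grad()
    loss = 0
    for i in range(input_line_tensor.size(0)):
        output, hidden = rnn(category_tensor, input_line_tensor[i], hidden)
        loss += criterion(output, target_line_tensor[i])
    loss.backward()
    optimizer.step()
    return output, loss.item() / input_line_tensor.size(0)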
To keep track of the time required for training, I’ve added a timeSince(timestamp) function that returns a human-readable string:
import time
import math
def timeSince(since):
    now = time.time()
    s = now - since
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)
Training is business as usual: call train a bunch of times and wait a few minutes, print the current time and loss every print_every examples, and keep an average loss per plot_every examples in all_losses for plotting later.
rnn = RNN(n_letters, 128, n_letters)
n_iters = 100000
print_every = 5000
plot_every = 500
all_losses = []
total_loss = 0 # Reset every plot_every iters
start = time.time()
for iter in range(1, n_iters + 1):
    output, loss = train(*randomTrainingExample())
    total_loss += loss

    if iter % print_every == 0:
        print('%s (%d %d%%) %.4f' % (timeSince(start), iter, iter / n_iters * 100, loss))

    if iter % plot_every == 0:
        all_losses.append(total_loss / plot_every)
        total_loss = 0
Plotting the losses

Plotting the historical loss from all_losses shows the network learning:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
plt.figure()
plt.plot(all_losses)
Sampling the network

To sample, we give the network a letter and ask what the next one is, feed that back in as the next letter, and repeat until the EOS token.
- Create tensors for the input category, starting letter, and an empty hidden state
- Create a string output_name starting with the initial letter
- Up to the maximum output length:
  - Feed the current letter to the network
  - Get the next letter from the highest output, and the next hidden state
  - If the letter is EOS, stop here
  - If it is a regular letter, add it to output_name and continue
- Return the final name
Note
Another strategy would be not to give a starting letter at all, but to include a "start of string" token in training and let the network choose its own starting letter.
max_length = 20
# Sample from a category and starting letter
def sample(category, start_letter='A'):
    with torch.no_grad():  # no need to track history in sampling
        category_tensor = categoryTensor(category)
        input = inputTensor(start_letter)
        hidden = rnn.initHidden()

        output_name = start_letter

        for i in range(max_length):
            output, hidden = rnn(category_tensor, input[0], hidden)
            topv, topi = output.topk(1)
            topi = topi[0][0]
            if topi == n_letters - 1:
                break
            else:
                letter = all_letters[topi]
                output_name += letter
            input = inputTensor(letter)

        return output_name
# Get multiple samples from one category and multiple starting letters
def samples(category, start_letters='ABC'):
    for start_letter in start_letters:
        print(sample(category, start_letter))

samples('Russian', 'RUS')

samples('German', 'GER')

samples('Spanish', 'SPA')

samples('Chinese', 'CHI')
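The sample function above is greedy: it always takes the single most likely letter. A common variation, not part of this tutorial's code, is to draw the next letter from the output distribution, optionally sharpened by a temperature, which gives more varied names. A sketch using the trained rnn and the helpers above (sample_multinomial is a made-up name):

# Sample the next letter from the distribution instead of taking the argmax
def sample_multinomial(category, start_letter='A', temperature=0.8):
    with torch.no_grad():
        category_tensor = categoryTensor(category)
        input = inputTensor(start_letter)
        hidden = rnn.initHidden()
        output_name = start_letter

        for i in range(max_length):
            output, hidden = rnn(category_tensor, input[0], hidden)
            # output holds log-probabilities; rescale by temperature and exponentiate
            letter_weights = (output.squeeze() / temperature).exp()
            topi = torch.multinomial(letter_weights, 1).item()
            if topi == n_letters - 1:  # EOS
                break
            letter = all_letters[topi]
            output_name += letter
            input = inputTensor(letter)

        return output_name

print(sample_multinomial('Russian', 'R'))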
Exercises

- Try another dataset with the structure category -> line, for example:
  - Fictional series -> Character name
  - Part of speech -> Word
  - Country -> City
- Use a "start of sentence" token so that sampling can be done without choosing a starting letter (see the sketch after this list)
- Try combining several of these RNNs into a higher-level network, or use the nn.LSTM and nn.GRU layers
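As a sketch of the "start of sentence" idea from the note and the exercise above: give the alphabet one extra SOS symbol, prepend it to every training line, and start sampling from it instead of a chosen letter. The names below (n_letters_sos, SOS_IDX, EOS_IDX, inputTensorSOS, targetTensorSOS) are made up for this illustration, and the RNN would have to be rebuilt and retrained with the larger alphabet before they could be used:

# Hypothetical SOS variant: one extra symbol in the alphabet
n_letters_sos = len(all_letters) + 2  # letters + EOS + SOS
EOS_IDX = n_letters_sos - 2
SOS_IDX = n_letters_sos - 1

def inputTensorSOS(line):
    # Like inputTensor, but prepends the SOS marker to every line
    tensor = torch.zeros(len(line) + 1, 1, n_letters_sos)
    tensor[0][0][SOS_IDX] = 1
    for li, letter in enumerate(line):
        tensor[li + 1][0][all_letters.find(letter)] = 1
    return tensor

def targetTensorSOS(line):
    # Targets are now the whole line (predicted from SOS onward) plus EOS
    letter_indexes = [all_letters.find(c) for c in line]
    letter_indexes.append(EOS_IDX)
    return torch.LongTensor(letter_indexes)

During sampling you would then feed a one-hot SOS vector as the first input and take the network's first prediction as the name's first letter.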