1. First, a look at the effect
There are many beautiful things in the world, but when we want to put them into words we often find our cultural reserves falling short.
- I see the sea, my heart fills with excitement, I think for a long while, and I say: it's really big!
- I see the Bird's Nest stadium, my heart fills with excitement, I think for a long while, and I say: it's really big!
- I see a beautiful woman, my heart fills with excitement, I think for a long while, and I say: really big!
Yeah, that's what having no literary culture looks like.
But you were born in the digital age, and five thousand years of Chinese cultural heritage are right at your fingertips! In this tutorial we have an AI read a large amount of poetry and learn the patterns in it; then you give it a few keywords and it gives you back a poem.
See the effect:
| Input keywords | Output verse |
|---|---|
| The sea, the cool wind | The sea is wide and green, and the moon is empty. The drum moves with song, the cool wind rustles. |
| Building, Bird's Nest | Building drum bell urges, bird's nest wears wu shore. Deep words in gao He, where birds swim. |
| The beauty | Beauty step cold spring, home will not live. Day and night climb to see, Yin xuan see love. |
| I, love, beauty, woman | I mean this leisurely, love chrysanthemum corresponding. Beautiful flower wine fear spring, female MOE step bride. |
| Lao, Ban, Ying, Ming | Old lock cable sorrow spring, plate pavilion know me. British Fujian asked the old swim, Ming Lord good time late. |
2. Implementation steps
In the previous post, RNN Text Generation: Want to Write Poetry for Your Girlfriend (Part 1), we covered how to train on the data.
In this post, we will put those training results to use.
2.1 Restoring the model
However the model was trained in the first place, we now need to restore that scene: how many floors the building had, how many people were in the meeting, where each person sat. Everything is the same now; only the topic of discussion has changed.
import tensorflow as tf
import numpy as np
import os
import time
import random

# Read the dictionary
vocab = np.load('vocab.npy')
# Create a mapping from unique characters to indexes
char2idx = {u: i for i, u in enumerate(vocab)}
# Create a mapping from indexes back to characters
idx2char = np.array(vocab)
# The number of unique characters is the size of the dictionary
vocab_size = len(vocab)
# The dimension of the generated embedding vectors
embedding_dim = 256
# Number of units in the RNN
rnn_units = 1024

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(rnn_units,
                            return_sequences=True,
                            stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)])
    return model

# Read the saved training results
checkpoint_dir = './training_checkpoints'
# Rebuild the model with batch_size=1 for prediction
model = build_model(vocab_size, embedding_dim,
                    rnn_units, batch_size=1)
# Only the weights were saved, so only the weights are loaded
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
# Build the model on top of the restored weights
model.build(tf.TensorShape([1, None]))
In the end we get a model. It contains the input structure, the layer structure, the output format and so on, and most importantly it has loaded the trained weights. Those weights start out random; after training they become values that can actually predict something. If you print model.summary(), it looks like this:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (1, None, 256) 1377280
_________________________________________________________________
gru_1 (GRU) (1, None, 1024) 3938304
_________________________________________________________________
dense_1 (Dense) (1, None, 5380) 5514500
=================================================================
Total params: 10,830,084
Trainable params: 10,830,084
Non-trainable params: 0
_________________________________________________________________
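As a sanity check on the Param # column, the three counts can be reproduced by hand. A small calculation, assuming Keras' default GRU configuration (reset_after=True):

```python
vocab_size, embedding_dim, rnn_units = 5380, 256, 1024

# Embedding: one 256-dim vector per character
embedding_params = vocab_size * embedding_dim                                  # 1,377,280

# GRU (reset_after=True): 3 gates, each with input weights, recurrent weights and 2 biases
gru_params = 3 * (rnn_units * (embedding_dim + rnn_units) + 2 * rnn_units)     # 3,938,304

# Dense: one weight per (unit, character) pair plus one bias per character
dense_params = rnn_units * vocab_size + vocab_size                             # 5,514,500

print(embedding_params + gru_params + dense_params)                            # 10,830,084
```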
About the model
The model is a sequence of three parts: text goes in, passes through each layer in turn, and out comes a prediction of what the next character is likely to be.
[Diagram: input → Embedding layer → GRU → Dense → a probability for every candidate next character (…, rabbit: 1%)]
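As a minimal sketch of that flow (assuming the `model` restored above is in scope, and using made-up character ids), you can push a toy batch through the three layers and watch the shapes change:

```python
import tensorflow as tf

# a pretend batch: 1 sequence of 4 character ids (the values are arbitrary)
ids = tf.constant([[15, 208, 1808, 42]])

x = model.layers[0](ids)       # Embedding -> (1, 4, 256): each id becomes a 256-dim vector
h = model.layers[1](x)         # GRU       -> (1, 4, 1024): memory-aware features per position
logits = model.layers[2](h)    # Dense     -> (1, 4, 5380): one score per candidate next character
print(x.shape, h.shape, logits.shape)
```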
2.1.1 Embedding layer (Embedding)
Give emotion to words
Words carry emotional associations. When we see fallen leaves, we feel melancholy; when we see summer rain on the lotus, we call out for the Emperor in our dreams.
But the machine knows none of this. All it knows is 1010 1010. Poor computer, what a sad existence!
To solve this, we have to turn characters into numbers and let the machine work out the relationships between them through computation.
How colors are represented as numbers:

| Color | RGB value |
|---|---|
| red | [255, 0, 0] |
| green | [0, 255, 0] |
| blue | [0, 0, 255] |
| yellow | [255, 255, 0] |
| white | [255, 255, 255] |
Now witness a little miracle. Anyone who knows a bit of color theory knows the answer: what do you get when you mix red and green?
Come, read after me: red + green = yellow.
[255,0,0]+[0,255,0] = [255,255,0]
Is that hard? Not really. As long as the numbers are assigned well, even a bench can grow legs and run.
The same goes for text, and that is exactly what the embedding layer does.
The colors above are described with 3 dimensions; the embedding layer uses far more. In our code it has 256 dimensions.
Every character is described by 256 dimensions. Take "IT guy" and "IT practitioner", for example:
| Dimension | "IT guy" | "IT practitioner" |
|---|---|---|
| IT industry | 100% | 100% |
| male | 100% | 35.1% |
| plaid shirt | 20% | 30% |
| … | … | … |
| flirting | 0.01% | 5% |
With those numbers in place, when we ask the machine to compute the difference between "IT guy" and "IT practitioner", it finds that the biggest difference lies in the "male" dimension.
So, you see, the computer understands the relationship between words.
This is the contribution of the embedding layer.
When we feed in text, thanks to the earlier training, every character already carries these dimensions.
With these dimensions attached to each character, the neural network is in a position to make predictions.
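Here is a minimal, self-contained sketch of the idea. The layer, the two ids and the similarity check are purely illustrative: an untrained embedding starts out random, and only training gives the distances any meaning.

```python
import tensorflow as tf

# a toy embedding: 5380 possible characters, 256 dimensions each (untrained, so random)
embedding = tf.keras.layers.Embedding(input_dim=5380, output_dim=256)

it_guy, it_worker = 101, 102                 # two hypothetical character ids
vec_a = embedding(tf.constant(it_guy))       # shape (256,): the 256 "traits" of id 101
vec_b = embedding(tf.constant(it_worker))    # shape (256,): the 256 "traits" of id 102

# cosine similarity: the closer to 1, the more alike the two are in the learned space
cos = tf.reduce_sum(vec_a * vec_b) / (tf.norm(vec_a) * tf.norm(vec_b))
print(vec_a.shape, float(cos))
```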
2.1.2 Gated recurrent unit (GRU)
Like a girlfriend who remembers, in great detail, every single thing over the years that upset her.
Fill in the blank: I was sitting on the curb smoking. In the distance a man took out a cigarette, flipped it into his mouth with a practiced motion, felt his pockets, patted them again, shook his head, walked over to me and said: "Buddy, _____."
What should go in the blank?
Come on, shout it out with me: got a light?
Why is that? Have you ever asked yourself why you can answer it?
But if the question were only: he shook his head, walked towards me and said: "Buddy, _____."
Could you still get it right? Would you know for sure what to say? Why not?
It is because of memory. The text around the blank is related to it; we call that context, and you have surely seen the word context in code as well.
You understand the text that came before and you hold it in memory, so you can predict what comes next.
We solved the machine’s understanding problem above, but it also needs to have memory like we do.
That is exactly what the GRU layer does: it handles memory.
And it is quite sophisticated about it, keeping only the memories relevant to the outcome and ignoring the irrelevant words.
After training and processing, it effectively boils the question down to something like this:
Fill in the blank: I was smoking; a man took out a cigarette, put it in his mouth, shook his head and said: "Buddy, _____." Answer: got a light.
With memory, neural networks have the confidence and power to make predictions.
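A minimal sketch of that memory, using a tiny stateful GRU with toy sizes (nothing to do with the 1024 units above): because the layer keeps its state between calls, the same input produces a different output the second time, i.e. it remembers what it has already seen.

```python
import tensorflow as tf

# a tiny stateful GRU: batch of 1, 8 memory units, 4 input features per step
gru = tf.keras.layers.GRU(8, stateful=True, return_sequences=True,
                          batch_input_shape=[1, None, 4])

step = tf.ones([1, 1, 4])                   # one time step of dummy input
out1 = gru(step)                            # first call: the state starts at zero
out2 = gru(step)                            # second call: same input, different output, because the state carried over
print(bool(tf.reduce_any(out1 != out2)))    # True: the layer remembers

gru.reset_states()                          # wipe the memory, as model.reset_states() does in section 2.2
```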
2.1.3 Fully connected layer (Dense)
Though the waters run three thousand leagues, take but one ladleful.
In this case, it’s actually a classifier.
In the code that builds the model, it is the Dense layer.
In the model summary above, its entry looks like this:
dense_1 (Dense) (1, None, 5380) 5514500
What it does is this: whatever arrives from the layers in front of it, it forces into a fixed set of output channels.
For example, to recognize numbers from 0 to 9, I have 500 neurons involved in the judgment, but the final output is 10 channels (0,1,2,3,4,5,6,7,8,9).
Recognize letters, that’s 26 channels.
The text we trained on here, more than 70,000 verses, contains 5,380 distinct characters in total, so there are 5,380 channels. Given an input, the output is a probability for each character.
[Diagram: given the character "Ming" (bright), the probability of each candidate next character, e.g. moon: 30%, turtle: 0.5%, …, rabbit: 1%]
With the classification layer, the neural network has a way to output the predicted results.
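A minimal sketch of that last step, with random numbers standing in for the GRU output: the Dense layer produces one score per channel (character), and a softmax turns those scores into probabilities. The layer here is created fresh for illustration, not taken from the trained model.

```python
import tensorflow as tf

rnn_features = tf.random.normal([1, 1024])   # pretend GRU output for one position
to_vocab = tf.keras.layers.Dense(5380)       # one output channel per known character

logits = to_vocab(rnn_features)              # shape (1, 5380): a raw score per character
probs = tf.nn.softmax(logits)                # scores become probabilities that sum to 1

top = tf.math.top_k(probs[0], k=3)           # the 3 most likely "next characters"
print(top.indices.numpy(), top.values.numpy())
```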
2.2 Predicting data
With the explanation above, I'm sure you can make sense of the code that follows. If you can't, you're clearly not one of my readers.
Here is an example of predicting the next character from a single character.
# A single starting character: 大 ("big")
start_string = "大"
# Convert the starting string to numbers
input_eval = [char2idx[s] for s in start_string]
print(input_eval)  # [1808]
# The trained model expects batched input, so add a batch dimension
input_eval = tf.expand_dims(input_eval, 0)
print(input_eval)  # Tensor([[1808]])
# Get the prediction, which is multi-dimensional
predictions = model(input_eval)
print(predictions)
'''
The output is the prediction: for the one input character, a score for every
possible next character, e.g.
Tensor([[[-3.3992984  2.3124864 -2.7357426 ... -10.154563]]])
'''
# Squeeze [[xx]] down to [xx]
predictions1 = tf.squeeze(predictions, 0)
# Sample from the categorical distribution: draw num_samples characters
# out of the 5380 candidates according to their predicted probabilities
predicted_ids = tf.random.categorical(predictions1, num_samples=1).numpy()
print(idx2char[predicted_ids])  # the sampled next character, e.g. [['名']]
The following is an example of generating an acrostic poem.
# Given a starting string, predict the characters that follow
def generate_text(model, start_string, num_generate=6):
    # Convert the starting string to numbers (vectorization)
    input_eval = [char2idx[s] for s in start_string]
    # e.g. [2, 3, 4, 5]
    # The trained model expects batched input, so add a batch dimension
    input_eval = tf.expand_dims(input_eval, 0)
    # now [[2, 3, 4, 5]]
    # An empty list to collect the generated characters
    text_generated = []
    # Clear the GRU's memory before generating
    model.reset_states()
    for i in range(num_generate):
        # Get the prediction, which is multi-dimensional
        predictions = model(input_eval)
        # [[xx, xx]] becomes [xx, xx]
        predictions = tf.squeeze(predictions, 0)
        # Use the categorical distribution to sample the next character,
        # taking the prediction for the last position in the sequence
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
        # Feed the predicted character back in as the next input
        # (the hidden state is kept inside the stateful model)
        input_eval = tf.expand_dims([predicted_id], 0)
        # Save the predicted character
        text_generated.append(idx2char[predicted_id])
    # Return the starting string plus everything generated
    return start_string + ''.join(text_generated)

# %%
# The four seed characters of the acrostic: 掘金不止 ("keep on digging")
s = "掘金不止"
array_keys = list(s)
all_string = ""
for word in array_keys:
    # Append the next seed character, then let the model finish the line
    all_string = all_string + word
    next_len = 5 - len(word)
    print("input:", all_string)
    all_string = generate_text(model, start_string=all_string, num_generate=next_len)
    print("out:", all_string)
print("Final output:" + all_string)
# %%
(The console prints the poem as it grows, line by line; the final result is the acrostic shown below.)
Here is the finished acrostic on the seed characters 掘金不止 ("keep on digging"):
| Nuggets Never Stop · Modern · TF Boy |
|---|
| **Dig** Falcon drag jun later |
| **Gold** The horse acts bootstrap |
| **Not** What word blarney |
| **Stop** The foot will be boring |