- Part-of-Speech Tagging tutorial with the Keras Deep Learning Library
- Original author: Cdiscount Data Science
- Translation from: The Gold Project
- Permalink to this article: github.com/xitu/gold-m…
- Translator: luochen
- Proofread by: Stormluke, Mingxing47
In this tutorial, you will see how to use a simple Keras model to train and evaluate an artificial neural network for a multi-class classification problem.
Photo by Joao Tzanno on Unsplash
Part-of-speech (POS) tagging is a well-known task in natural language processing. It consists of categorizing words into their parts of speech (also known as word classes or lexical categories). It is a supervised learning approach.
Artificial neural networks have been applied to POS tagging with great success. We will focus on the multilayer perceptron, a very popular network architecture that is considered state of the art for the POS tagging problem. (Translator's note: RNNs work better for POS tagging problems.)
Let’s put it into practice!
In this post, you will get a quick tutorial on how to implement a simple multilayer perceptron in Keras and train it on an annotated corpus.
### Ensuring reproducibility
To ensure that our experiment can be reproduced, we need to set a random seed:
import numpy as np
CUSTOM_SEED = 42
np.random.seed(CUSTOM_SEED)
### Get the annotated corpus
The Penn Treebank is an annotated corpus of POS tags. A sample of it is available in the [NLTK](https://github.com/nltk/nltk) Python library and can be used to train and test some natural language processing (NLP) models.
First, we download the annotated corpus:
import nltk
nltk.download('treebank')
Then we load the tagged sentences.
from nltk.corpus import treebank
sentences = treebank.tagged_sents(tagset='universal')
Then let’s pick a random sentence and see:
import random
print(random.choice(sentences))
This is a list of tuples (term, tag).
[('Mr.', 'NOUN'), ('Otero', 'NOUN'), (',', '.'), ('who', 'PRON'), ('apparently', 'ADV'), ('has', 'VERB'), ('an', 'DET'), ('unpublished', 'ADJ'), ('number', 'NOUN'), (',', '.'), ('also', 'ADV'), ('could', 'VERB'), ("n't", 'ADV'), ('be', 'VERB'), ('reached', 'VERB'), ('.', '.')]
This is a multi-class classification problem with more than forty different classes. POS tagging on the Treebank corpus is a well-known problem, and we can expect the accuracy of our model to exceed 95%.
tags = set([
tag for sentence in treebank.tagged_sents()
for _, tag in sentence
])
print('nb_tags: %s\ntags: %s' % (len(tags), tags))
This produces:
46
{'IN', 'VBZ', '.', 'RP', 'DT', 'VB', 'RBR', 'CC', '#', ',', 'VBP', 'WP$', 'PRP', 'JJ', 'RBS', 'LS', 'PRP$', 'WRB', 'JJS', '``', 'EX', 'POS', 'WP', 'VBN', '-LRB-', '-RRB-', 'FW', 'MD', 'VBG', 'TO', '$', 'NNS', 'NNPS', "''", 'VBD', 'JJR', ':', 'PDT', 'SYM', 'NNP', 'CD', 'RB', 'WDT', 'UH', 'NN', '-NONE-'}
### Data set preprocessing for supervised learning
We split the tagged sentences into three datasets:

- a training set, i.e. the sample data used to fit the model,
- a validation set, used to tune the parameters of the classifier, for example to choose the number of neurons in the network,
- a test set, used only to assess the performance of the classifier.

We use about 60% of the tagged sentences for training, 20% as the validation set, and 20% to evaluate our model.
train_test_cutoff = int(.80 * len(sentences))
training_sentences = sentences[:train_test_cutoff]
testing_sentences = sentences[train_test_cutoff:]

train_val_cutoff = int(.25 * len(training_sentences))
validation_sentences = training_sentences[:train_val_cutoff]
training_sentences = training_sentences[train_val_cutoff:]
### Feature engineering
Our feature set is very simple. For each word, we build a feature dictionary from the sentence the word was extracted from. These features include the preceding and following words as well as the word's prefixes and suffixes.
def add_basic_features(sentence_terms, index):
    """ Compute some very basic word features.

        :param sentence_terms: [w1, w2, ...]
        :type sentence_terms: list
        :param index: the index of the word
        :type index: int
        :return: dict containing features
        :rtype: dict
    """
    term = sentence_terms[index]
    return {
        'nb_terms': len(sentence_terms),
        'term': term,
        'is_first': index == 0,
        'is_last': index == len(sentence_terms) - 1,
        'is_capitalized': term[0].upper() == term[0],
        'is_all_caps': term.upper() == term,
        'is_all_lower': term.lower() == term,
        'prefix-1': term[0],
        'prefix-2': term[:2],
        'prefix-3': term[:3],
        'suffix-1': term[-1],
        'suffix-2': term[-2:],
        'suffix-3': term[-3:],
        'prev_word': '' if index == 0 else sentence_terms[index - 1],
        'next_word': '' if index == len(sentence_terms) - 1 else sentence_terms[index + 1]
    }
We map the list of sentences to the list of feature dictionaries.
def untag(tagged_sentence):
    """ Remove the tag for each tagged term.

        :param tagged_sentence: a POS-tagged sentence
        :type tagged_sentence: list
        :return: a list of tags
        :rtype: list of strings
    """
    return [w for w, _ in tagged_sentence]

def transform_to_dataset(tagged_sentences):
    """ Split tagged sentences into X and y datasets and append basic features.

        :param tagged_sentences: a list of POS-tagged sentences
        :param tagged_sentences: list of lists of tuples (term_i, tag_i)
        :return:
    """
    X, y = [], []

    for pos_tags in tagged_sentences:
        for index, (term, class_) in enumerate(pos_tags):
            # Add basic NLP features for each sentence term
            X.append(add_basic_features(untag(pos_tags), index))
            y.append(class_)
    return X, y
For the training, validation, and testing sentences, we split the attributes into X (input variables) and y (output variables).
X_train, y_train = transform_to_dataset(training_sentences)
X_test, y_test = transform_to_dataset(testing_sentences)
X_val, y_val = transform_to_dataset(validation_sentences)
### Feature encoding
Our neural network takes vectors as inputs, so we need to convert our dict features into vectors. scikit-learn's built-in [DictVectorizer](http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.DictVectorizer.html) provides a very straightforward way to do this vectorization.
from sklearn.feature_extraction import DictVectorizer
# Fit the dictionary vectorizer with our feature set
dict_vectorizer = DictVectorizer(sparse=False)
dict_vectorizer.fit(X_train + X_test + X_val)
# Convert dictionary features to vectors
X_train = dict_vectorizer.transform(X_train)
X_test = dict_vectorizer.transform(X_test)
X_val = dict_vectorizer.transform(X_val)
Our y vectors must be encoded as well. The output variable contains 49 different string values that are encoded as integers.
from sklearn.preprocessing import LabelEncoder
# Fit the label encoder with our list of classes
label_encoder = LabelEncoder()
label_encoder.fit(y_train + y_test + y_val)

# Encode class values as integers
y_train = label_encoder.transform(y_train)
y_test = label_encoder.transform(y_test)
y_val = label_encoder.transform(y_val)
We then need to convert those encoded values to dummy variables (one-hot encoding).
# Convert integers to dummy variables
from keras.utils import np_utils
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
y_val = np_utils.to_categorical(y_val)
### Build the Keras model
[Keras](https://github.com/fchollet/keras/) is a high-level framework for designing and running neural networks on multiple backends such as [TensorFlow](https://github.com/tensorflow/tensorflow/), [Theano](https://github.com/Theano/Theano) and [CNTK](https://github.com/Microsoft/CNTK).
We want to create a very basic neural network: a multilayer perceptron. This kind of linear stack of layers can easily be built with the Sequential model. The model will contain an input layer, a hidden layer, and an output layer. To overcome overfitting, we use Dropout regularization. We set the dropout rate to 20%, meaning that each input neuron is randomly dropped with a probability of 20% at every parameter update during training.
We use Rectified Linear Unit (ReLU) activations for the hidden layers, as they are the simplest non-linear activation functions available.
For the multi-class classification problem, we want the neuron outputs converted to probabilities, which can be done with the softmax function. We decided to use the categorical cross-entropy loss function. Finally, we chose the Adam optimizer because it seems well suited to classification tasks.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

def build_model(input_dim, hidden_neurons, output_dim):
    """ Construct, compile and return a Keras model which will be used to fit/predict. """
    model = Sequential([
        Dense(hidden_neurons, input_dim=input_dim),
        Activation('relu'),
        Dropout(0.2),
        Dense(hidden_neurons),
        Activation('relu'),
        Dropout(0.2),
        Dense(output_dim, activation='softmax')
    ])
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
### Create a wrapper between the Keras API and scikit-learn
[Keras](https://github.com/fchollet/keras/) provides a wrapper called [KerasClassifier](https://keras.io/scikit-learn-api/) which implements the scikit-learn classifier interface.
All model parameters are defined below. We need to provide a function (build_fn) that returns the structure of the neural network. The number of hidden neurons and the batch size were chosen rather arbitrarily. We set the number of epochs to 5 because with more iterations the multilayer perceptron starts overfitting (even with Dropout regularization).
from keras.wrappers.scikit_learn import KerasClassifier
model_params = {
    'build_fn': build_model,
    'input_dim': X_train.shape[1],
    'hidden_neurons': 512,
    'output_dim': y_train.shape[1],
    'epochs': 5,
    'batch_size': 256,
    'verbose': 1,
    'validation_data': (X_val, y_val),
    'shuffle': True
}

clf = KerasClassifier(**model_params)
### Training the Keras model
Finally, we train the multilayer perceptron on the training set.
hist = clf.fit(X_train, y_train)
With the history callback, we can visualize how the model's log loss and accuracy evolve over time.
import matplotlib.pyplot as plt

def plot_model_performance(train_loss, train_acc, train_val_loss, train_val_acc):
    """ Plot model loss and accuracy through epochs. """

    blue = '#34495E'
    green = '#2ECC71'
    orange = '#E23B13'

    # Draw the model loss curve
    fig, (ax1, ax2) = plt.subplots(2, figsize=(10, 8))
    ax1.plot(range(1, len(train_loss) + 1), train_loss, blue, linewidth=5, label='training')
    ax1.plot(range(1, len(train_val_loss) + 1), train_val_loss, green, linewidth=5, label='validation')
    ax1.set_xlabel('# epoch')
    ax1.set_ylabel('loss')
    ax1.tick_params('y')
    ax1.legend(loc='upper right', shadow=False)
    ax1.set_title('Model loss through #epochs', color=orange, fontweight='bold')

    # Draw the model accuracy curve
    ax2.plot(range(1, len(train_acc) + 1), train_acc, blue, linewidth=5, label='training')
    ax2.plot(range(1, len(train_val_acc) + 1), train_val_acc, green, linewidth=5, label='validation')
    ax2.set_xlabel('# epoch')
    ax2.set_ylabel('accuracy')
    ax2.tick_params('y')
    ax2.legend(loc='lower right', shadow=False)
    ax2.set_title('Model accuracy through #epochs', color=orange, fontweight='bold')
Then, look at the performance of the model:
plot_model_performance(
    train_loss=hist.history.get('loss', []),
    train_acc=hist.history.get('acc', []),
    train_val_loss=hist.history.get('val_loss', []),
    train_val_acc=hist.history.get('val_acc', [])
)
Model performance varies with the number of iterations.
After two epochs, we see that the model begins to overfit.
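Since the validation curves stop improving after only a couple of epochs, one common remedy is to stop training automatically. Below is a minimal sketch (not part of the original tutorial) using Keras's EarlyStopping callback; the larger epoch budget and the patience value are illustrative choices, not tuned settings.

```python
from keras.callbacks import EarlyStopping

# Stop training as soon as the validation loss stops improving
early_stopping = EarlyStopping(monitor='val_loss', patience=1, verbose=1)

# Allow more epochs than before; early stopping will cut training short
clf_es = KerasClassifier(**{**model_params, 'epochs': 20})
hist_es = clf_es.fit(X_train, y_train, callbacks=[early_stopping])
```

Because `validation_data` is already part of `model_params`, the callback can monitor `val_loss` directly.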
### Evaluating the multilayer perceptron
Since our model has been trained, we can evaluate it directly:
score = clf.score(X_test, y_test)
print(score)
[Out] 0.95816
Our accuracy on the test set is close to 96%, which is quite impressive given the basic features we fed into the model. Keep in mind that 100% accuracy is not attainable, even for human annotators: human POS tagging accuracy is estimated at about 98%.
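To make the whole pipeline concrete, here is a small sketch (not part of the original tutorial) of how the trained classifier could tag a new, already-tokenized sentence by reusing the `add_basic_features`, `dict_vectorizer`, and `label_encoder` objects defined above; the `tag_sentence` helper and the example sentence are purely illustrative.

```python
def tag_sentence(tokens):
    """Predict a POS tag for each token of an already-tokenized sentence."""
    # Build the same dict features used during training, then vectorize them
    features = [add_basic_features(tokens, index) for index in range(len(tokens))]
    X = dict_vectorizer.transform(features)
    # KerasClassifier.predict returns integer class indices; map them back to tag names
    predicted_classes = clf.predict(X)
    predicted_tags = label_encoder.inverse_transform(predicted_classes)
    return list(zip(tokens, predicted_tags))

print(tag_sentence(['This', 'is', 'a', 'simple', 'test', '.']))
```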
### Visualization of the model
from keras.utils import plot_model
plot_model(clf.model, to_file='model.png', show_shapes=True)
### Save the Keras model
Saving the Keras model is very simple, as the library natively provides a way to save it locally:
clf.model.save('/tmp/keras_mlp.h5')
The model structure, weights, and training configuration (loss function, optimizer) are saved.
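As a quick sketch of the round trip (not shown in the original tutorial), the saved file can be restored with Keras's `load_model` function and reused for prediction without retraining; the slice of `X_test` below is just an illustrative input.

```python
from keras.models import load_model

# Restore the architecture, weights and training configuration from disk
restored_model = load_model('/tmp/keras_mlp.h5')

# The restored model outputs class probabilities; take the most likely tag
probabilities = restored_model.predict(X_test[:1])
print(label_encoder.inverse_transform(probabilities.argmax(axis=-1)))
```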
### Resources

- Keras: the Python Deep Learning library: [doc]
- Adam: A Method for Stochastic Optimization: [paper]
- Improving neural networks by preventing co-adaptation of feature detectors: [paper]
In this post, you learned how to use the Keras library to define and evaluate the accuracy of a neural network for multi-class classification. The code is available here: [.py | .ipynb].