Preface
We will build a network that classifies Reuters newswires into 46 mutually exclusive topics. Because there are more than two categories, this is a multiclass classification problem. Since each data point belongs to exactly one category, it is more precisely a single-label, multiclass classification problem. If each data point could belong to multiple categories (topics), it would instead be a multilabel, multiclass classification problem.
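To make the distinction concrete, here is a minimal sketch of what the targets look like in each case (the vectors below are made up for illustration): a single-label target contains exactly one 1, while a multilabel target may contain several.
import numpy as np
# Single-label, multiclass: each sample belongs to exactly one of, say, 5 topics
single_label_target = np.array([0, 0, 1, 0, 0])   # the sample belongs to topic 2
# Multilabel, multiclass: a sample may belong to several topics at once
multilabel_target = np.array([0, 1, 1, 0, 0])     # the sample belongs to topics 1 and 2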
The Reuters dataset
This article uses the Reuters dataset, a collection of short newswires and their corresponding topics, published by Reuters in 1986. It is a simple, widely used text classification dataset. It covers 46 different topics: some topics have more samples than others, but every topic has at least 10 examples in the training set. Like IMDB and MNIST, the Reuters dataset comes packaged as part of Keras. Let's take a look.
from tensorflow.keras.datasets import reuters
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)
As with the IMDB dataset, the argument num_words=10000 restricts the data to the 10,000 most frequently occurring words. There are 8,982 training samples and 2,246 test samples. As with the IMDB reviews, each sample is a list of integers (word indices). If you are curious, you can decode a sample back into words with the following code.
word_index = reuters.get_word_index()
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
# Indices are offset by 3 because 0, 1 and 2 are reserved for "padding",
# "start of sequence" and "unknown"
decoded_newswire = ' '.join([reverse_word_index.get(i - 3, '?') for i in train_data[0]])
print(decoded_newswire)
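As a quick sanity check of the figures above, you can also print the size of each split and inspect a raw sample and its label:
print(len(train_data))     # 8982 training samples
print(len(test_data))      # 2246 test samples
print(train_data[0][:10])  # a sample is a list of word indices
print(train_labels[0])     # the label is an integer between 0 and 45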
Prepare the data
First, vectorize the input data.
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    # Create an all-zero matrix of shape (len(sequences), dimension)
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.  # set the indices of the words that appear to 1
    return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
There are two ways to vectorize the labels: you can cast the label list to an integer tensor, or you can use one-hot encoding. One-hot encoding, also called categorical encoding, is a widely used format for categorical data. In this case, one-hot encoding represents each label as an all-zero vector of length 46 in which only the element at the label's index is 1. The code looks like this.
def to_one_hot(labels, dimension=46):
    results = np.zeros((len(labels), dimension))
    for i, label in enumerate(labels):
        results[i, label] = 1.  # set the element at the label's index to 1
    return results

one_hot_train_labels = to_one_hot(train_labels)
one_hot_test_labels = to_one_hot(test_labels)
Note that Keras has a built-in method for this, which you've already seen in the MNIST example.
from tensorflow.keras.utils import to_categorical
one_hot_train_labels = to_categorical(train_labels)
one_hot_test_labels = to_categorical(test_labels)
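The other option mentioned above, leaving the labels as integer tensors, also works with Keras; in that case you switch the loss to sparse_categorical_crossentropy when compiling the model (which is built in the next section). A minimal sketch:
y_train = np.array(train_labels)  # keep the labels as plain integers (0-45)
y_test = np.array(test_labels)
# With integer labels, compile with the sparse variant of the loss:
# model.compile(optimizer='rmsprop',
#               loss='sparse_categorical_crossentropy',
#               metrics=['accuracy'])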
Build the network
This problem has one new constraint: the number of output classes is 46, so the output space is much larger. In a stack of Dense layers like the ones used here, each layer can only access information present in the output of the previous layer. If one layer drops information relevant to the classification problem, that information can never be recovered by later layers: every layer can become an information bottleneck. The previous example used 16-dimensional intermediate layers, but a 16-dimensional space is probably too small to learn to separate 46 different classes; such a small layer may become a bottleneck and permanently lose relevant information. For this reason, the layers below use 64 units each.
from tensorflow.keras import layers
from tensorflow.keras import models
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))
Two other points should be noted about this architecture.
- The last layer of the network is a Dense layer of size 46. This means that, for each input sample, the network outputs a 46-dimensional vector. Each element of this vector (each dimension) corresponds to a different output class.
- The last layer uses a softmax activation, which you've already seen in the MNIST example. The network outputs a probability distribution over the 46 output classes: for each input sample, the network produces a 46-dimensional vector in which output[i] is the probability that the sample belongs to class i. The 46 probabilities sum to 1 (a tiny numeric illustration of softmax follows this list).
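For intuition, softmax turns a vector of raw scores into a probability distribution by exponentiating and normalizing. A tiny illustration with made-up scores for 3 classes (the last layer does the same over its 46 outputs):
import numpy as np
scores = np.array([2.0, 1.0, 0.1])               # made-up raw scores (logits)
probs = np.exp(scores) / np.sum(np.exp(scores))  # softmax
print(probs)         # each value is between 0 and 1
print(probs.sum())   # the values sum to 1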
For this problem, the best loss function is categorical_crossentropy. It measures the distance between two probability distributions: here, the distribution output by the network and the true distribution of the labels. By minimizing the distance between these two distributions, you train the network to output something as close as possible to the true labels.
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
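To see what categorical_crossentropy actually computes, here is a hand-worked example with NumPy (the predicted distribution is made up): for a one-hot label, the per-sample loss is -sum(y_true * log(y_pred)), which shrinks as the probability assigned to the true class approaches 1.
import numpy as np
y_true = np.zeros(46)
y_true[3] = 1.               # one-hot label: the true topic is index 3
y_pred = np.full(46, 0.01)   # made-up predicted distribution over 46 topics
y_pred[3] = 0.55             # most of the probability mass on the true topic
loss = -np.sum(y_true * np.log(y_pred))
print(loss)  # equals -log(0.55); a better prediction gives a smaller loss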
Validate your approach
We set aside 1,000 samples from the training data as a validation set.
x_val = x_train[:1000]
partial_x_train = x_train[1000:]
y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]
Now train the network for 20 epochs.
history = model.fit(partial_x_train,
partial_y_train,
epochs=20,
batch_size=512,
validation_data=(x_val, y_val))
Finally, let's plot the training and validation loss and accuracy curves.
import matplotlib.pyplot as plt

loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
plt.clf()
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
The network begins to overfit after nine epochs. Let's train a new network from scratch for nine epochs and then evaluate it on the test set.
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
model.fit(partial_x_train,
partial_y_train,
epochs=9,
batch_size=512,
validation_data=(x_val, y_val))
results = model.evaluate(x_test, one_hot_test_labels)  # [test loss, test accuracy]
This approach reaches an accuracy of about 80%. For a balanced binary classification problem, a completely random classifier would reach 50% accuracy. In this case the random baseline is only about 19%, so the results look quite good, at least compared with that baseline.
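If you want to estimate that random baseline yourself, a common trick is to shuffle a copy of the test labels and measure how often the shuffled labels happen to match the real ones; a quick sketch:
import copy
test_labels_copy = copy.copy(test_labels)
np.random.shuffle(test_labels_copy)
hits = np.array(test_labels) == np.array(test_labels_copy)
print(float(np.sum(hits)) / len(test_labels))  # roughly 0.19 for this dataset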
Generate predictions on new data
The predict method of the model instance returns a probability distribution over the 46 topics for each sample. Let's generate topic predictions for all of the test data.
predictions = model.predict(x_test)
Each element in the predictions is a vector of length 46.
predictions[0].shape
# (46,)
The sum of all the elements of this vector is 1.
np.sum(predictions[0])
# 1.0
The index of the largest element is the predicted class, i.e. the class with the highest probability.
np.argmax(predictions[0])
# 4
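To turn these per-sample predictions into an overall test accuracy, you can compare the argmax of every prediction with the original integer test labels; a small sketch:
predicted_classes = np.argmax(predictions, axis=1)         # most likely topic per sample
test_accuracy = np.mean(predicted_classes == test_labels)  # test_labels are the raw integers
print(test_accuracy)  # should roughly match the accuracy reported by model.evaluate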
The complete code
from tensorflow.keras.datasets import reuters
from tensorflow.keras.utils import to_categorical
from tensorflow.keras import layers
from tensorflow.keras import models
import matplotlib.pyplot as plt
import numpy as np
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)
#word_index = reuters.get_word_index()
#reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
#decoded_newswire = ' '.join([reverse_word_index.get(i - 3, '?') for i in train_data[0]])
#print(decoded_newswire)
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results
x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
def to_one_hot(labels, dimension=46):
    results = np.zeros((len(labels), dimension))
    for i, label in enumerate(labels):
        results[i, label] = 1.
    return results
one_hot_train_labels = to_one_hot(train_labels)
one_hot_test_labels = to_one_hot(test_labels)
# one_hot_train_labels = to_categorical(train_labels)
# one_hot_test_labels = to_categorical(test_labels)
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
x_val = x_train[:1000]
partial_x_train = x_train[1000:]
y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]
history = model.fit(partial_x_train,
partial_y_train,
epochs=9,
batch_size=512,
validation_data=(x_val, y_val))
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
# plt.clf()
# acc = history.history['accuracy']
# val_acc = history.history['val_accuracy']
# plt.plot(epochs, acc, 'bo', label='Training acc')
# plt.plot(epochs, val_acc, 'b', label='Validation acc')
# plt.title('Training and validation accuracy')
# plt.xlabel('Epochs')
# plt.ylabel('Accuracy')
# plt.legend()
# plt.show()
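# Optionally, evaluate on the test set and generate predictions,
# as in the sections above
results = model.evaluate(x_test, one_hot_test_labels)
print(results)                    # [test loss, test accuracy]
predictions = model.predict(x_test)
print(np.argmax(predictions[0]))  # predicted topic for the first test sample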