Artificial intelligence (AI)

Introduction to the MNIST handwritten digit dataset

January 3, 2024

by Michael O'Sullivan-Atkinson

1. Introduction to the dataset

The MNIST dataset is a classic dataset in machine learning. The simplest way to obtain it is to load it directly through Keras with the following code:

import tensorflow as tf
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

(60000, 28, 28) (60000,)
(10000, 28, 28) (10000,)

As the output shows, the dataset consists of 60,000 training samples and 10,000 test samples.
Each sample is a 28-by-28 pixel grayscale image of a handwritten digit.
Each pixel is an integer between 0 and 255.
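
The shape, value range, and label balance described above can also be checked programmatically. The following is a minimal sketch, assuming NumPy is installed. `summarize` is a hypothetical helper, not part of Keras, and the small random array is only a stand-in with the same per-sample shape and dtype as MNIST, so the snippet runs without downloading anything. With the real data you would pass `X_train` and `y_train` instead.

```python
import numpy as np


def summarize(images, labels):
    """Return basic facts about an image dataset (hypothetical helper)."""
    return {
        "num_samples": images.shape[0],
        "image_shape": images.shape[1:],
        "dtype": str(images.dtype),
        "min": int(images.min()),
        "max": int(images.max()),
        # Number of samples per digit class 0-9
        "class_counts": np.bincount(labels, minlength=10).tolist(),
    }


# Stand-in data: 100 random 28x28 uint8 images, NOT the real MNIST digits.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=100)

summary = summarize(fake_images, fake_labels)
print(summary["num_samples"], summary["image_shape"], summary["dtype"])
```

Run on the real training set, the class counts show how many examples of each digit are available, which is worth checking before training a classifier.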

2. Display the first handwritten image

import matplotlib.pyplot as plt

# Show the first training image with a colour bar mapping colours to pixel values
plt.figure()
plt.imshow(X_train[0])
plt.colorbar()
plt.grid(False)
plt.show()

3. Display the first 25 handwritten digits

# Scale the pixel values to the range 0-1
X_train = X_train / 255.0
X_test = X_test / 255.0
# All class labels
class_names = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

# Draw the first 25 training images in a 5x5 grid, each labelled with its digit
plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(X_train[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[y_train[i]])
plt.show()
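
The scaling step in the code above maps each integer pixel in 0-255 to a floating-point value between 0 and 1. As a small illustration of what that does to the data, here is a sketch assuming NumPy is installed. `scale_pixels` is a hypothetical helper, and casting to `float32` first is an assumption made here to save memory: dividing a `uint8` array directly by `255.0` works equally well but yields `float64`.

```python
import numpy as np


def scale_pixels(images):
    """Map uint8 pixels in 0-255 to float32 values in 0.0-1.0 (hypothetical helper)."""
    return images.astype(np.float32) / 255.0


# Tiny stand-in "image" covering both extremes and two values in between.
pixels = np.array([[0, 51], [204, 255]], dtype=np.uint8)
scaled = scale_pixels(pixels)

print(scaled.dtype)
print(scaled.min(), scaled.max())
```

The same function could be applied to both `X_train` and `X_test`, so that the training and test images are guaranteed to be scaled in the same way.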