This article works through the MNIST handwritten digit classification task, a basic and classic classification problem. I strongly suggest following along with the code; the source code has been uploaded to the public account.

1. Exploratory data analysis

In general, before training a model we should first analyze the dataset. This step is commonly abbreviated to EDA, short for Exploratory Data Analysis.

To obtain the dataset, the plan was to use torchvision.datasets.MNIST(), which was mentioned earlier in the course. However, the full MNIST download offered by torchvision is about 200 MB, so I directly provide CSV files of the MNIST data instead (train.csv and test.csv). They are only 14 MB once compressed into a .zip, and the code is based on these data files.

1.1 Basic data set information

import pandas as pd
# read the training set
train_df = pd.read_csv('./MNIST_csv/train.csv')
n_train = len(train_df)
n_pixels = len(train_df.columns) - 1
n_class = len(set(train_df['label']))
print('Number of training samples: {0}'.format(n_train))
print('Number of training pixels: {0}'.format(n_pixels))
print('Number of classes: {0}'.format(n_class))

# read the test set
test_df = pd.read_csv('./MNIST_csv/test.csv')
n_test = len(test_df)
n_pixels = len(test_df.columns)
print('Number of test samples: {0}'.format(n_test))
print('Number of test pixels: {0}'.format(n_pixels))

Output result:

The training set has 42,000 images, each with 784 pixels (so to display a sample as an image, its 784 values need to be reshaped to 28×28), and the samples fall into 10 classes, the digits 0 to 9. There are 28,000 samples in the test set.
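
To make the 784 to 28×28 step concrete, here is a minimal plain-Python sketch. It assumes the 784 values of a sample are stored row by row, which is an assumption about the CSV layout rather than something stated above; `flat` and `to_image` are illustrative names, not part of the article's code.

```python
def to_image(flat, height=28, width=28):
    """Split a flat list of height * width pixel values into `height` rows."""
    if len(flat) != height * width:
        raise ValueError("expected {} values, got {}".format(height * width, len(flat)))
    return [flat[r * width:(r + 1) * width] for r in range(height)]


# Stand-in for one CSV row: the values 0..783 instead of real pixel intensities.
flat = list(range(784))
image = to_image(flat)

print(len(image), len(image[0]))  # number of rows, number of columns
print(image[1][0])                # first pixel of the second row, i.e. flat[28]
```

With NumPy the same step is a single reshape call, which is what the visualization code in the next section relies on.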

1.2 Data set visualization

# Show some pictures
import numpy as np
from torchvision.utils import make_grid
import torch
import matplotlib.pyplot as plt
random_sel = np.random.randint(len(train_df), size=8)
data = (train_df.iloc[random_sel, 1:].values.reshape(-1, 1, 28, 28) / 255.)

grid = make_grid(torch.Tensor(data), nrow=8)
plt.rcParams['figure.figsize'] = (16, 2)
plt.imshow(grid.numpy().transpose((1, 2, 0)))
plt.axis('off')
plt.show()
print(*list(train_df.iloc[random_sel, 0].values), sep = ', ')

The output is a picture, followed by one printed line.

Eight samples are randomly selected for visualization, and then the label values corresponding to those samples are printed.

1.3 Category balance

Next we need to check whether the classes in the training set are balanced, using a bar chart:

# Check for category imbalance
plt.figure(figsize=(8, 5))
plt.bar(train_df['label'].value_counts().index, train_df['label'].value_counts())
plt.xticks(np.arange(n_class))
plt.xlabel('Class', fontsize=16)
plt.ylabel('Count', fontsize=16)
plt.grid(True, axis='y')
plt.show()

Output image:

The classes are roughly balanced, so nothing further needs to be done here.
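
Besides eyeballing the chart, the balance can be summarized as a single number. The sketch below is one way to do that, not the article's method: it counts samples per class and reports the ratio between the largest and smallest class. `imbalance_ratio` and `toy_labels` are hypothetical names, and the labels are toy data rather than the real MNIST label column.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return (counts per class, largest count / smallest count)."""
    counts = Counter(labels)
    ratio = max(counts.values()) / min(counts.values())
    return counts, ratio


# Hypothetical toy labels, not the real MNIST label column.
toy_labels = [0, 1, 1, 2, 2, 2, 0, 1, 0, 2]
counts, ratio = imbalance_ratio(toy_labels)

for cls in sorted(counts):
    print(cls, counts[cls])
print("max/min ratio: {:.2f}".format(ratio))
```

On the real data one would pass in `train_df['label'].tolist()`; a ratio close to 1 means the classes are balanced.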

2. Training and inference

2.1 Build a dataset

We can start a new Python script, again importing the libraries and reading the files:

import numpy as np  # used below for np.uint8 and np.random
import pandas as pd
train_df = pd.read_csv('./MNIST_csv/train.csv')
test_df = pd.read_csv('./MNIST_csv/test.csv')
n_train = len(train_df)
n_test = len(test_df)
n_pixels = len(train_df.columns) - 1
n_class = len(set(train_df['label']))

Next, define a Dataset; the DataLoader is then created from it:

import torch
from torch.utils.data import Dataset,DataLoader
from torchvision import transforms

class MNIST_data(Dataset):
    """MNIST data read from a CSV file, with or without a label column."""

    def __init__(self, file_path,
                 transform=transforms.Compose([transforms.ToPILImage(), transforms.ToTensor(),
                                               transforms.Normalize(mean=(0.5,), std=(0.5,))])
                 ):
        df = pd.read_csv(file_path)
        if len(df.columns) == n_pixels:
            # test data: every column is a pixel, there is no label column
            self.X = df.values.reshape((-1, 28, 28)).astype(np.uint8)[:, :, :, None]
            self.y = None
        else:
            # training data: column 0 is the label, the rest are pixels
            self.X = df.iloc[:, 1:].values.reshape((-1, 28, 28)).astype(np.uint8)[:, :, :, None]
            self.y = torch.from_numpy(df.iloc[:, 0].values)
        self.transform = transform

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        if self.y is not None:
            return self.transform(self.X[idx]), self.y[idx]
        else:
            return self.transform(self.X[idx])

As you can see, the dataset returns different values depending on whether labels are present: the training set returns both data and labels, while the test set returns only data.

batch_size = 64

train_dataset = MNIST_data('./MNIST_csv/train.csv',
                           transform= transforms.Compose([
                            transforms.ToPILImage(),
                            transforms.RandomRotation(degrees=20),
                            transforms.ToTensor(),
                            transforms.Normalize(mean=(0.5,), std=(0.5,))]))
test_dataset = MNIST_data('./MNIST_csv/test.csv')

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                           batch_size=batch_size, shuffle=False)

About this code:

  • The train and test datasets are constructed first, and then the corresponding dataloader is built for each.
  • Random rotation is used in train_dataset. Because this transform operates on PIL images, the data is first converted to PIL, then rotated, and then converted to a Tensor and normalized. The value 0.5 used for normalization is an arbitrary choice and can be changed if necessary.
  • Note that before conversion to PIL the data is a numpy array, so it must be laid out as H×W×C. Since these are single-channel images, the shape of the training data is (42000, 28, 28, 1), where 42000 is the number of samples.
  • Image augmentation methods such as rotation and scaling are used only on the training set: they make training harder and the model more robust. They are not normally applied to the test set. (The training phase is for the model to learn, while the testing phase is mainly about getting accurate predictions, which may sound like stating the obvious…)
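
To see what normalizing with 0.5 does to a pixel, the arithmetic can be traced by hand. The sketch below assumes the usual two-step pipeline for uint8 images: ToTensor scales 0..255 to 0..1, then Normalize applies (x - mean) / std. `normalize` is an illustrative helper, not a torchvision function.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Scale a raw 0..255 pixel to 0..1, then apply (x - mean) / std."""
    x = pixel / 255.0          # the scaling ToTensor applies to uint8 images
    return (x - mean) / std    # the per-channel step Normalize applies


for raw in (0, 128, 255):
    print(raw, round(normalize(raw), 4))
```

With mean and std both 0.5, the pixel range is mapped from [0, 1] onto [-1, 1], which is why 0.5 is a convenient default even though it is not the dataset's true mean.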

2.2 Build model classes

import torch.nn as nn
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        self.features1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
        self.features = nn.Sequential(
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(64 * 7 * 7, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(512, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(512, 10),
        )

        # Xavier initialization for conv and linear weights;
        # BatchNorm2d layers start with scale 1 and shift 0
        for m in self.modules():
            if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def forward(self, x):
        x = self.features1(x)
        x = self.features(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 64 * 7 * 7)
        x = self.classifier(x)
        return x

This model class is fairly tidy overall, and it uses only methods we have covered before. Quiz: do you remember how Xavier initialization works? It is a very common initialization method, and it was derived in detail in a previous article.
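
As a quick refresher, Xavier (Glorot) uniform initialization draws weights from U(-a, a) with a = gain × sqrt(6 / (fan_in + fan_out)). The sketch below only evaluates that bound for two layers of the model above; `xavier_uniform_bound` is an illustrative helper, and gain = 1 is assumed because the model calls the initializer without a gain argument.

```python
import math


def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    """Bound a such that weights are drawn from U(-a, a)."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


# First conv layer: 1 input channel, 32 output channels, 3x3 kernel.
# For a conv layer, fan_in and fan_out each include the kernel area.
conv_bound = xavier_uniform_bound(fan_in=1 * 3 * 3, fan_out=32 * 3 * 3)

# First linear layer: 64 * 7 * 7 inputs, 512 outputs.
fc_bound = xavier_uniform_bound(fan_in=64 * 7 * 7, fan_out=512)

print("conv bound: {:.4f}".format(conv_bound))
print("fc bound:   {:.4f}".format(fc_bound))
```

The wider the layer, the smaller the bound, which is how this scheme keeps the variance of activations roughly stable from layer to layer.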

After that, we instantiate the model, pass the model's parameters to the optimizer, and set a learning rate decay strategy. With learning rate decay, the learning rate gets lower as more epochs are trained; this will be described in detail in a later article.
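
The decay strategy used in this article's setup is a step schedule: the learning rate is multiplied by gamma once every step_size epochs. The sketch below reproduces only that rule in plain Python, using the values from the setup code (initial rate 0.003, step_size 7, gamma 0.1); `step_lr` is an illustrative function, not the PyTorch scheduler itself.

```python
def step_lr(initial_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate in effect after `epoch` scheduler steps under step decay."""
    return initial_lr * gamma ** (epoch // step_size)


for epoch in (0, 6, 7, 13, 14):
    print(epoch, "{:.6f}".format(step_lr(0.003, epoch)))
```

One consequence worth noticing: with step_size = 7, a run shorter than seven epochs never sees the rate change at all.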

import torch.optim as optim

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = Net().to(device)
# model = torchvision.models.resnet50(pretrained=True).to(device)
optimizer = optim.Adam(model.parameters(), lr=0.003)
criterion = nn.CrossEntropyLoss().to(device)
exp_lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)
print(model)

The result, of course, is to print out the entire model:

Net(
  (features1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (features): Sequential(
    (0): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (1): ReLU(inplace=True)
    (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): ReLU(inplace=True)
    (9): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (10): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=3136, out_features=512, bias=True)
    (2): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): ReLU(inplace=True)
    (4): Dropout(p=0.5, inplace=False)
    (5): Linear(in_features=512, out_features=512, bias=True)
    (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): ReLU(inplace=True)
    (8): Dropout(p=0.5, inplace=False)
    (9): Linear(in_features=512, out_features=10, bias=True)
  )
)
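
The in_features=3136 of the first linear layer follows from the feature map size after the two pooling layers. A small sketch of that arithmetic, using the standard output-size formula floor((n + 2p - k) / s) + 1; `conv_out` is an illustrative helper, not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28
size = conv_out(size, kernel=3, stride=1, padding=1)  # features1 conv
size = conv_out(size, kernel=3, stride=1, padding=1)  # second conv
size = conv_out(size, kernel=2, stride=2)             # first max pool
size = conv_out(size, kernel=3, stride=1, padding=1)  # third conv
size = conv_out(size, kernel=3, stride=1, padding=1)  # fourth conv
size = conv_out(size, kernel=2, stride=2)             # second max pool

channels = 64
print(size, channels * size * size)
```

The 3×3 convolutions with padding 1 leave the side length unchanged, so only the two pooling layers shrink the image, from 28 to 14 to 7.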

2.3 Training Model

def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        # Read data
        data = data.to(device)
        target = target.to(device)
        # Calculate the predicted results and losses of the model
        output = model(data)
        loss = criterion(output, target)

        optimizer.zero_grad()  # Clear the gradients from the previous step
        loss.backward()        # Back-propagate the loss
        optimizer.step()       # Then update the parameters
        # Record the loss so the plot has data (log is the list defined in the main code)
        log.append(loss.item())
        if (batch_idx + 1) % 50 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, (batch_idx + 1) * len(data), len(train_loader.dataset),
                       100. * (batch_idx + 1) / len(train_loader), loss.item()))

    exp_lr_scheduler.step()  # Advance the learning rate schedule once per epoch

The function above trains for one epoch; the main code below calls it once per epoch, n_epochs times in total.

log = []  # Record the changes of loss
n_epochs = 2
for epoch in range(n_epochs):
    train(epoch)

# Plot log as a line graph
import matplotlib.pyplot as plt
plt.plot(log)
plt.show()

Note that running this raises an error. Let's take a look; this is how I personally go about reading an error message:

The error arises because the images are single-channel grayscale, while RandomRotation here requires a three-channel input. It would be perfectly possible to copy the fourth dimension of the numpy array, turning (42000, 28, 28, 1) into (42000, 28, 28, 3), before converting to PIL format. But I want to take this opportunity to introduce another transform, torchvision.transforms.Grayscale(num_output_channels), and learn it by using it.
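
For comparison, the copy-the-channel alternative mentioned above might look like the following plain-Python sketch, shown on a tiny 2×2×1 stand-in rather than a real 28×28×1 digit. `gray_to_three_channels` is a hypothetical helper, not part of the article's code.

```python
def gray_to_three_channels(image):
    """Turn an H x W x 1 nested list into H x W x 3 by repeating the value."""
    return [[[pixel[0]] * 3 for pixel in row] for row in image]


# Tiny 2 x 2 x 1 stand-in for a 28 x 28 x 1 digit image.
gray = [[[0], [128]],
        [[255], [64]]]
rgb = gray_to_three_channels(gray)

print(rgb[0][1])                              # one pixel, now with three equal channels
print(len(rgb), len(rgb[0]), len(rgb[0][0]))  # height, width, channels
```

On the real array, NumPy's np.repeat(X, 3, axis=-1) performs the same replication in one call. The article takes the Grayscale route instead.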

So change train_dataset to:

train_dataset = MNIST_data('./MNIST_csv/train.csv',
                           transform= transforms.Compose([
                            transforms.ToPILImage(),
                            transforms.Grayscale(num_output_channels=3),
                            transforms.RandomRotation(degrees=20),
                            transforms.ToTensor(),
                            transforms.Normalize(mean=(0.5,), std=(0.5,))]))
test_dataset = MNIST_data('./MNIST_csv/test.csv',
                          transform=transforms.Compose([
                              transforms.ToPILImage(),
                              transforms.Grayscale(num_output_channels=3),
                              transforms.ToTensor(),
                              transforms.Normalize(mean=(0.5,), std=(0.5,))]))

Then don't forget to change the number of input channels of the first convolution layer in the model class to 3:

# self.features1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
self.features1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)

Re-run the code and training now proceeds normally. Part of the printed output is shown in the following screenshot:
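
The counters in each progress line follow directly from the dataset size and the batch size. A small sketch of that arithmetic, assuming the 42,000 training samples and batch_size = 64 from above, and that the last, smaller batch is kept (the DataLoader default); the variable names are illustrative.

```python
import math

n_samples = 42000   # training samples, from the EDA section
batch_size = 64     # from the DataLoader setup
log_every = 50      # the train function prints every 50 batches

n_batches = math.ceil(n_samples / batch_size)
last_batch = n_samples - (n_batches - 1) * batch_size
n_prints = n_batches // log_every

print("batches per epoch:", n_batches)
print("size of last batch:", last_batch)
print("progress lines per epoch:", n_prints)

# Counters shown in the first progress line (after 50 batches).
seen = log_every * batch_size
percent = 100.0 * log_every / n_batches
print("[{}/{} ({:.0f}%)]".format(seen, n_samples, percent))
```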

Then take a look at how the loss declines: it converges, and training for more epochs should do even better.

One thing to note: when all the data is used for training and there is no validation set, over-fitting can occur without being noticed (training here runs for only a small number of epochs, so over-fitting is unlikely). A safer approach is to split the data into a training set and a validation set, at a ratio such as 2:1, 3:1 or 4:1. 4:1 is the most commonly used, and this is the idea behind the N-fold approach.

This will be introduced in detail later in the course, but it is not a difficult topic, so you can also look it up yourself.
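
A 4:1 split can be sketched on sample indices alone. This is a minimal illustration, not the article's code: it assumes the 42,000 training samples from the EDA section and a fixed random seed; `train_val_split` is a hypothetical helper.

```python
import random


def train_val_split(n_samples, ratio=(4, 1), seed=0):
    """Shuffle the indices 0..n_samples-1 and split them train:val by `ratio`."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = n_samples * ratio[1] // sum(ratio)
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = train_val_split(42000)
print(len(train_idx), len(val_idx))
```

Index lists like these could then be used to build two datasets, for example with torch.utils.data.Subset. Repeating the split so that each fifth of the data serves as the validation set once is what turns this into 5-fold cross-validation.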

2.4 Inference prediction

def prediction(data_loader):
    model.eval()
    test_pred = torch.LongTensor()

    with torch.no_grad():  # no gradients are needed for inference
        for i, data in enumerate(data_loader):
            data = data.to(device)
            output = model(data)
            # index of the largest score in each row = predicted class
            pred = output.cpu().max(1, keepdim=True)[1]
            test_pred = torch.cat((test_pred, pred), dim=0)
    return test_pred

test_pred = prediction(test_loader)

Similar to train, we write a prediction function that returns the predicted values. Then, as in the EDA section, we draw eight samples from the test set to see how well the images match the predictions.
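
One detail of the prediction function: max(1, keepdim=True)[1] takes the maximum along dimension 1 (the class dimension) and keeps the indices rather than the values, so each sample's prediction is the index of its highest score. A plain-Python sketch of the same idea, using made-up scores rather than real model output; `argmax_rows` is an illustrative name.

```python
def argmax_rows(scores):
    """For each row of class scores, return the index of the largest score."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Hypothetical scores for two samples over three classes (not real model output).
scores = [[0.1, 2.5, -1.0],
          [3.2, 0.0, 0.7]]
print(argmax_rows(scores))
```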

from torchvision.utils import make_grid
random_sel = np.random.randint(len(test_df), size=8)
data = (test_df.iloc[random_sel, :].values.reshape(-1, 1, 28, 28) / 255.)

grid = make_grid(torch.Tensor(data), nrow=8)
plt.rcParams['figure.figsize'] = (16, 2)
plt.imshow(grid.numpy().transpose((1, 2, 0)))
plt.axis('off')
plt.show()
print(*list(test_pred[random_sel].numpy()), sep = ', ')

The output is an image, followed by a printed line of predictions.

OK, congratulations: you have completed classification of the MNIST handwritten digit dataset.