Most deep learning tutorials and videos use the MNIST dataset to demonstrate what deep learning can do. But MNIST is tiny, and its images are monochrome; almost any model achieves good results on it. What if we are not satisfied with that and want to train a neural network to classify color images instead — is that possible?

Of course you can, but it’s not as easy as you think.

When I first set up the neural network, training accuracy stayed below 30% and the model would not converge. I then gradually applied my understanding of deep learning to adjust the network's structure and adopted effective optimization methods, pushing the model to 50%, then 70%, and finally 99% training accuracy and 85% test accuracy. This model therefore witnessed the growth of my ability during my deep learning journey, which is why I am sharing this article.

So, what are the general steps of neural network training?

The figure above illustrates the general supervised learning training process.

  1. Load the dataset and preprocess it.
  2. Split the preprocessed data into features and labels. The features are fed into the model; the labels serve as ground-truth.
  3. The model receives the features as input and, through a series of operations, outputs a prediction.
  4. A loss function is built from the prediction and the ground-truth; its value represents the gap between them.
  5. An optimizer is set up to minimize the loss function. The smaller the loss, the more accurate the model's predictions.
  6. During optimization, the model updates its weights according to the optimizer's rules. This cycle repeats until the loss value stabilizes and no smaller value can be obtained.
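The six steps above can be sketched end to end in a few lines of PyTorch. The toy dataset, model and hyperparameters below are invented purely for illustration; only the loop structure matters.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Steps 1-2: a made-up dataset split into feature and ground-truth label.
feature = torch.randn(64, 10)
label = (feature.sum(dim=1) > 0).long()

model = nn.Linear(10, 2)                           # step 3: the Model
loss_fn = nn.CrossEntropyLoss()                    # step 4: the Loss function
optimizer = optim.SGD(model.parameters(), lr=0.1)  # step 5: the Optimizer

first_loss = None
for step in range(100):                            # step 6: repeat until loss stabilizes
    optimizer.zero_grad()
    predict = model(feature)                       # Model outputs a prediction
    loss = loss_fn(predict, label)                 # gap between prediction and ground-truth
    if first_loss is None:
        first_loss = loss.item()
    loss.backward()                                # gradients via autograd
    optimizer.step()                               # weights updated according to the rules
```

After the loop, the loss should be well below its starting value, which is exactly the "repeated cycle" of step 6.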

Note: this article is developed with PyTorch; the same ideas are easy to implement in other deep learning frameworks such as TensorFlow.

Since the code follows PyTorch conventions, I assume the reader already has a basic understanding of PyTorch as a deep learning framework.

1. Load the data set

We could write the dataset-loading code ourselves, but if the goal is learning, writing that code by hand would be frustrating and boring.

Fortunately, PyTorch offers a very convenient package called TorchVision.

torchvision provides a DataLoader for common datasets such as MNIST, CIFAR-10 and ImageNet, and also provides transforms to transform, normalize and visualize images.

Core packages: torchvision.datasets, torch.utils.data.DataLoader

In this article, we aim to use PyTorch to create an image classifier based on the CIFAR-10 dataset.

CIFAR-10 has 50,000 training images and 10,000 test images in 10 categories: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship' and 'truck'. Each CIFAR-10 image is 32×32×3, so these are small color images.

How do I load it?

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim
import time
import os

transform = transforms.Compose(
    [
     transforms.RandomHorizontalFlip(),
     transforms.RandomGrayscale(),
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

transform1 = transforms.Compose(
    [
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=100,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform1)
testloader = torch.utils.data.DataLoader(testset, batch_size=50,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

The transform in the above code is mainly used to preprocess data.

transforms.RandomHorizontalFlip(),
transforms.RandomGrayscale(),

These two lines perform data augmentation. To prevent overfitting during training on a small dataset, it is common to enlarge the effective training set by randomly flipping images horizontally and randomly converting them to grayscale.

transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))

The dataset stores images as NumPy arrays, so transforms.ToTensor converts them to Tensors (with values scaled to [0, 1]). The input image is then normalized.

At test time, however, there is no need to augment the data, so the test transform is slightly different.

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)

torchvision.datasets.CIFAR10 designates the CIFAR-10 dataset; the module defines the dataset, how to download it, and how to load it from local storage.

root specifies where the dataset is stored, and train specifies whether this is the training split.

trainloader = torch.utils.data.DataLoader(trainset, batch_size=100,
                                          shuffle=True, num_workers=2)

A Dataset is used together with a DataLoader, which continuously draws data from the dataset and feeds it to the model for training or prediction. shuffle=True means the data is drawn in random order, since we optimize with stochastic gradient descent; at test time no parameters are updated, so there is no need to shuffle. num_workers=2 sets the number of worker processes used for loading.
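The Dataset/DataLoader division is easy to see on a tiny in-memory dataset. The shapes below mimic CIFAR-10 but the data is random; this is only a sketch of how the loader batches things up.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

features = torch.randn(10, 3, 32, 32)   # 10 fake 32x32 color images
labels = torch.randint(0, 10, (10,))    # 10 fake class labels

dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=5, shuffle=True)

batches = list(loader)                  # the loader yields (feature, label) batches
```

With 10 samples and batch_size=5, the loader yields exactly two batches, each shaped like a mini CIFAR-10 batch.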

After running the code, the dataset is automatically downloaded and stored in the data directory under the current directory.

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
Files already downloaded and verified

2. Define the neural network

Looking back at the figure at the beginning of the article, the Model sits at the core, and a neural network is one kind of Model.

The neural network I created for this blog post is deep, modeled on VGG, in order to push the test accuracy higher. However, I did not follow VGG-16 exactly: my GPU is a GTX 1080 Ti, and earlier experiments in TensorFlow exposed out-of-memory problems, so I reduced the number of convolution kernels relative to VGG-16.

The biggest feature of VGG is its extensive use of 3×3 convolution kernels. For details, please refer to my blog post: Deep learning: Interpretation of the classic neural network VGG paper.
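The appeal of 3×3 kernels is easy to verify with a little arithmetic: two stacked 3×3 convolutions cover the same 5×5 receptive field as a single 5×5 convolution, but with fewer weights. The channel count below is arbitrary, chosen only for the sketch.

```python
def conv_weights(k, c_in, c_out):
    """Number of weights in one k x k convolution layer (bias ignored)."""
    return k * k * c_in * c_out

C = 128                                   # an arbitrary channel count
stacked_3x3 = 2 * conv_weights(3, C, C)   # two 3x3 layers: 5x5 receptive field
single_5x5 = conv_weights(5, C, C)        # one 5x5 layer: same receptive field
ratio = stacked_3x3 / single_5x5          # 18/25 = 0.72 of the parameters
```

The stacked version also inserts an extra nonlinearity between the two layers, which is part of VGG's argument for it.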

input (32×32×3) color image
Conv1 3×3, 64
Conv2 3×3, 64
Maxpool 2×2, stride=2
Batch Normalization
ReLU
Conv3 3×3, 128
Conv4 3×3, 128
Maxpool 2×2, stride=2
Batch Normalization
ReLU
Conv5 3×3, 128
Conv6 3×3, 128
Conv7 1×1, 128
Maxpool 2×2, stride=2
Batch Normalization
ReLU
Conv8 3×3, 256
Conv9 3×3, 256
Conv10 1×1, 256
Maxpool 2×2, stride=2
Batch Normalization
ReLU
Conv11 3×3, 512
Conv12 3×3, 512
Conv13 1×1, 512
Maxpool 2×2, stride=2
Batch Normalization
ReLU
FC14 (8192, 1024)
Dropout
ReLU
FC15 (1024, 1024)
Dropout
ReLU
FC16 (1024, 10)
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, 3, padding=1)
        self.conv2 = nn.Conv2d(64, 64, 3, padding=1)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu1 = nn.ReLU()

        self.conv3 = nn.Conv2d(64, 128, 3, padding=1)
        self.conv4 = nn.Conv2d(128, 128, 3, padding=1)
        self.pool2 = nn.MaxPool2d(2, 2, padding=1)
        self.bn2 = nn.BatchNorm2d(128)
        self.relu2 = nn.ReLU()

        self.conv5 = nn.Conv2d(128, 128, 3, padding=1)
        self.conv6 = nn.Conv2d(128, 128, 3, padding=1)
        self.conv7 = nn.Conv2d(128, 128, 1, padding=1)
        self.pool3 = nn.MaxPool2d(2, 2, padding=1)
        self.bn3 = nn.BatchNorm2d(128)
        self.relu3 = nn.ReLU()

        self.conv8 = nn.Conv2d(128, 256, 3, padding=1)
        self.conv9 = nn.Conv2d(256, 256, 3, padding=1)
        self.conv10 = nn.Conv2d(256, 256, 1, padding=1)
        self.pool4 = nn.MaxPool2d(2, 2, padding=1)
        self.bn4 = nn.BatchNorm2d(256)
        self.relu4 = nn.ReLU()

        self.conv11 = nn.Conv2d(256, 512, 3, padding=1)
        self.conv12 = nn.Conv2d(512, 512, 3, padding=1)
        self.conv13 = nn.Conv2d(512, 512, 1, padding=1)
        self.pool5 = nn.MaxPool2d(2, 2, padding=1)
        self.bn5 = nn.BatchNorm2d(512)
        self.relu5 = nn.ReLU()

        self.fc14 = nn.Linear(512*4*4, 1024)
        self.drop1 = nn.Dropout2d()
        self.fc15 = nn.Linear(1024, 1024)
        self.drop2 = nn.Dropout2d()
        self.fc16 = nn.Linear(1024, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.pool1(x)
        x = self.bn1(x)
        x = self.relu1(x)

        x = self.conv3(x)
        x = self.conv4(x)
        x = self.pool2(x)
        x = self.bn2(x)
        x = self.relu2(x)

        x = self.conv5(x)
        x = self.conv6(x)
        x = self.conv7(x)
        x = self.pool3(x)
        x = self.bn3(x)
        x = self.relu3(x)

        x = self.conv8(x)
        x = self.conv9(x)
        x = self.conv10(x)
        x = self.pool4(x)
        x = self.bn4(x)
        x = self.relu4(x)

        x = self.conv11(x)
        x = self.conv12(x)
        x = self.conv13(x)
        x = self.pool5(x)
        x = self.bn5(x)
        x = self.relu5(x)
        # print(" x shape ", x.size())
        x = x.view(-1, 512*4*4)
        x = F.relu(self.fc14(x))
        x = self.drop1(x)
        x = F.relu(self.fc15(x))
        x = self.drop2(x)
        x = self.fc16(x)

        return x

In PyTorch you define a custom neural network by subclassing nn.Module, setting up the structure in __init__() and the forward-propagation flow in forward(). Because PyTorch computes gradients automatically, there is no need to define backward propagation explicitly.
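A minimal sketch of this pattern (the tiny network below is invented purely for illustration): structure in __init__(), forward pass in forward(), and gradients arrive for free via autograd.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super(TinyNet, self).__init__()
        self.fc = nn.Linear(4, 2)      # structure is declared in __init__

    def forward(self, x):              # only forward propagation is written out
        return self.fc(x)

net = TinyNet()
out = net(torch.randn(3, 4))
out.sum().backward()                   # backward pass is generated automatically
```

After backward(), every parameter of the module has a populated .grad, with no hand-written backward code.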

It is worth noting that I used Batch Normalization to accelerate neural network training; see this blog post of mine for details: 【Deep learning】The role and theoretical basis of Batch Normalization.

In addition, forward() does not apply a softmax to the output at the end; it performs inference only. During training, the loss function includes this operation, and for testing or prediction the softmax is not needed.
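The reason softmax can be skipped at prediction time is that softmax is monotonic, so the argmax over the raw outputs equals the argmax over the softmax probabilities. A small check (the logits below are made up):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0,  0.5],
                       [0.1,  3.0, -2.0]])        # raw network outputs

pred_raw = logits.argmax(dim=1)                    # argmax of the logits
pred_soft = F.softmax(logits, dim=1).argmax(dim=1) # argmax after softmax
```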

3. Define the Loss function and optimizer

According to the flow chart above, after defining the neural network model, Loss function and Optimizer need to be defined.

The cross-entropy loss is used as the loss function and Adam as the optimizer; SGD could also be used.

# optimizer = optim.SGD(self.parameters(), lr=0.01)
optimizer = optim.Adam(self.parameters(), lr=0.0001)

It should be noted that nn.CrossEntropyLoss() combines the LogSoftmax and NLLLoss operations.
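This equivalence is easy to confirm numerically (the logits are random and the targets arbitrary, purely for the check):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 10)              # a batch of 4 raw outputs over 10 classes
targets = torch.tensor([1, 0, 9, 3])     # arbitrary ground-truth classes

ce = nn.CrossEntropyLoss()(logits, targets)
manual = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
```

The two loss values agree, which is why the model's forward() can emit raw logits.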

4. Training

With the Loss and Optimizer defined, training can begin. Training means using the Optimizer to make the value of Loss as small as possible, so that the neural network's predictions become more and more accurate.

for epoch in range(100):  # loop over the dataset multiple times
    timestart = time.time()

    running_loss = 0.0
    total = 0
    correct = 0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = self(inputs)
        l = loss(outputs, labels)
        l.backward()
        optimizer.step()

        # print statistics
        running_loss += l.item()
        if i % 500 == 499:  # print every 500 mini-batches
            print('[%d, %5d] loss: %.4f' %
                  (epoch, i, running_loss / 500))
            running_loss = 0.0
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            print('Accuracy of the network on the %d train images: %.3f %%' % (total,
                    100.0 * correct / total))
            total = 0
            correct = 0

    print('epoch %d cost %3f sec' % (epoch, time.time() - timestart))

print('Finished Training')

The training process runs for 100 epochs and then stops. If you are unclear on the difference between an epoch and an iteration: an iteration is one mini-batch training step, while an epoch depends on the dataset size and the batch size. The CIFAR-10 training set holds 50,000 images and the batch size is 100, so an epoch takes 500 iterations to complete. An epoch can roughly be understood as the network having seen every image in the training set once, from beginning to end.
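The epoch/iteration arithmetic for this setup, written out as a tiny sketch:

```python
train_images = 50000                 # CIFAR-10 training set size
batch_size = 100                     # as configured in the DataLoader

iterations_per_epoch = train_images // batch_size   # 500 mini-batches per epoch
total_iterations = 100 * iterations_per_epoch       # over the full 100-epoch run
```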

inputs, labels = data
inputs, labels = inputs.to(device),labels.to(device)

PyTorch can specify a device, such as a GPU or CPU, and the code above does just that. The default is the CPU.

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

The code above specifies the device type.

optimizer.zero_grad()

# forward + backward + optimize
outputs = self(inputs)
l = loss(outputs, labels)
l.backward()
optimizer.step()


The last few lines of code are the core of the training.

Forward propagation + backward propagation + optimization, according to the official PyTorch example.

running_loss += l.item()
if i % 500 == 499:  # print every 500 mini-batches
    print('[%d, %5d] loss: %.4f' %
          (epoch, i, running_loss / 500))
    running_loss = 0.0
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum().item()
    print('Accuracy of the network on the %d train images: %.3f %%' % (total,
            100.0 * correct / total))
    total = 0
    correct = 0

As mentioned just now, 500 iterations here make up one epoch. We want to monitor the training as it runs, so the loss value and accuracy are printed for each epoch. Accuracy is the percentage of samples in that epoch that were predicted correctly.
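The accuracy bookkeeping used in the loop can be seen in isolation on made-up outputs; torch.max returns both the largest logit per row and its index, and only the index is needed.

```python
import torch

outputs = torch.tensor([[0.1, 0.9],
                        [0.8, 0.2],
                        [0.3, 0.7]])     # fake logits for 3 samples, 2 classes
labels = torch.tensor([1, 0, 0])         # ground-truth

_, predicted = torch.max(outputs, 1)     # class index of the largest logit per row
correct = (predicted == labels).sum().item()
total = labels.size(0)
accuracy = 100.0 * correct / total       # 2 of 3 samples are correct here
```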

4.1 Save and restore the training model

Some readers may not have a usable GPU, or only a low-end one, so training will take a very long time; the ability to save and restore the training state at any time is therefore very important.

PyTorch also supports this functionality.

torch.save({'epoch':epoch,
            'model_state_dict':net.state_dict(),
            'optimizer_state_dict':optimizer.state_dict(),
            'loss':loss
            },path)

torch.save() can save the epoch, the state_dicts, and the loss during training.

What is a state_dict? It is a dictionary of learnable parameters. Note that both the network's state_dict and the optimizer's state_dict can be saved.
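A quick look at the state_dict of a throwaway model shows it is simply a name-to-tensor mapping (the little Sequential below is made up for the demonstration):

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
sd = model.state_dict()          # an OrderedDict: parameter name -> tensor
names = list(sd.keys())          # the ReLU at index 1 contributes no parameters
```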

path specifies where the model file is saved.

So we can save the training progress and status at the necessary time according to the actual situation.

path = 'weights.tar'
checkpoint = torch.load(path)
self.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
initepoch = checkpoint['epoch']
loss = checkpoint['loss']

Load the checkpoint with torch.load(), then restore the parameters through the load_state_dict() methods of the net and the optimizer. The previous epoch value and loss can be restored at the same time.
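The whole save/restore round trip can be exercised on a throwaway model. The model, epoch number and file name below are all made up; a temporary directory keeps the sketch self-contained.

```python
import os
import tempfile
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)
optimizer = optim.Adam(model.parameters(), lr=0.0001)

path = os.path.join(tempfile.gettempdir(), 'demo_weights.tar')  # hypothetical path
torch.save({'epoch': 7,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict()}, path)

restored = nn.Linear(4, 2)                 # a fresh model with random weights
checkpoint = torch.load(path)
restored.load_state_dict(checkpoint['model_state_dict'])
initepoch = checkpoint['epoch']
```

After loading, the fresh model's weights match the saved ones exactly, and training can resume from the stored epoch.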

Thus, the complete code of the training function with the function of saving and restoring the model is as follows:

def train_sgd(self, device):
    optimizer = optim.Adam(self.parameters(), lr=0.0001)

    path = 'weights.tar'
    initepoch = 0

    if not os.path.exists(path):
        loss = nn.CrossEntropyLoss()
        # optimizer = optim.SGD(self.parameters(), lr=0.01)
    else:
        checkpoint = torch.load(path)
        self.load_state_dict(checkpoint['model_state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        initepoch = checkpoint['epoch']
        loss = checkpoint['loss']

    for epoch in range(initepoch, 100):  # loop over the dataset multiple times
        timestart = time.time()

        running_loss = 0.0
        total = 0
        correct = 0
        for i, data in enumerate(trainloader, 0):
            # get the inputs
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = self(inputs)
            l = loss(outputs, labels)
            l.backward()
            optimizer.step()

            # print statistics
            running_loss += l.item()
            if i % 500 == 499:  # print every 500 mini-batches
                print('[%d, %5d] loss: %.4f' %
                      (epoch, i, running_loss / 500))
                running_loss = 0.0
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()
                print('Accuracy of the network on the %d train images: %.3f %%' % (total,
                        100.0 * correct / total))
                total = 0
                correct = 0
                torch.save({'epoch': epoch,
                            'model_state_dict': self.state_dict(),
                            'optimizer_state_dict': optimizer.state_dict(),
                            'loss': loss
                            }, path)

        print('epoch %d cost %3f sec' % (epoch, time.time() - timestart))

    print('Finished Training')

5. Test

The purpose of training the neural network is to make predictions, but in order to test the ability of the model, it needs to be tested after the training is completed.

The test images and the training images come from separate splits of the data.

During testing, the model's parameters are fixed and there is no need to compute gradients. Evaluation then mainly measures the percentage of the model's predictions that match the ground-truth.
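The "no gradients needed" part is exactly what the torch.no_grad() context provides: inside it, no autograd graph is recorded and outputs carry no gradient history (the toy model below is made up for the sketch):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
x = torch.randn(3, 4)

with torch.no_grad():        # no graph is recorded; parameters stay fixed
    out = model(x)
```

Besides correctness, skipping graph construction also saves memory and time during evaluation.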

def test(self, device):
    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = self(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print('Accuracy of the network on the 10000 test images: %.3f %%' % (
            100.0 * correct / total))

Finally, we look at the end result.

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = Net()
net = net.to(device)
net.train_sgd(device)
net.test(device)

A Net object is created, then trained, then tested.

Files already downloaded and verified
Files already downloaded and verified
[99,   499] loss: 0.0383
Accuracy of the network on the 100 train images: 99.000 %
epoch 99 cost 15.083664 sec
Finished Training
Accuracy of the network on the 10000 test images: 85.050 %

Running one epoch took me about 15 seconds, and the network's test accuracy finally reached 85%. The best known result on CIFAR-10 is 96.53%, and the 85% shown here would rank roughly 37th on that list.

The complete code covered in this article is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim
import time
import os

transform = transforms.Compose(
    [
     transforms.RandomHorizontalFlip(),
     transforms.RandomGrayscale(),
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

transform1 = transforms.Compose(
    [
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=100,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform1)
testloader = torch.utils.data.DataLoader(testset, batch_size=50,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')



class Net(nn.Module):


    def __init__(self):
        super(Net,self).__init__()
        self.conv1 = nn.Conv2d(3,64,3,padding=1)
        self.conv2 = nn.Conv2d(64,64,3,padding=1)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu1 = nn.ReLU()

        self.conv3 = nn.Conv2d(64,128,3,padding=1)
        self.conv4 = nn.Conv2d(128, 128, 3,padding=1)
        self.pool2 = nn.MaxPool2d(2, 2, padding=1)
        self.bn2 = nn.BatchNorm2d(128)
        self.relu2 = nn.ReLU()

        self.conv5 = nn.Conv2d(128,128, 3,padding=1)
        self.conv6 = nn.Conv2d(128, 128, 3,padding=1)
        self.conv7 = nn.Conv2d(128, 128, 1,padding=1)
        self.pool3 = nn.MaxPool2d(2, 2, padding=1)
        self.bn3 = nn.BatchNorm2d(128)
        self.relu3 = nn.ReLU()

        self.conv8 = nn.Conv2d(128, 256, 3,padding=1)
        self.conv9 = nn.Conv2d(256, 256, 3, padding=1)
        self.conv10 = nn.Conv2d(256, 256, 1, padding=1)
        self.pool4 = nn.MaxPool2d(2, 2, padding=1)
        self.bn4 = nn.BatchNorm2d(256)
        self.relu4 = nn.ReLU()

        self.conv11 = nn.Conv2d(256, 512, 3, padding=1)
        self.conv12 = nn.Conv2d(512, 512, 3, padding=1)
        self.conv13 = nn.Conv2d(512, 512, 1, padding=1)
        self.pool5 = nn.MaxPool2d(2, 2, padding=1)
        self.bn5 = nn.BatchNorm2d(512)
        self.relu5 = nn.ReLU()

        self.fc14 = nn.Linear(512*4*4,1024)
        self.drop1 = nn.Dropout2d()
        self.fc15 = nn.Linear(1024,1024)
        self.drop2 = nn.Dropout2d()
        self.fc16 = nn.Linear(1024,10)


    def forward(self,x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.pool1(x)
        x = self.bn1(x)
        x = self.relu1(x)


        x = self.conv3(x)
        x = self.conv4(x)
        x = self.pool2(x)
        x = self.bn2(x)
        x = self.relu2(x)

        x = self.conv5(x)
        x = self.conv6(x)
        x = self.conv7(x)
        x = self.pool3(x)
        x = self.bn3(x)
        x = self.relu3(x)

        x = self.conv8(x)
        x = self.conv9(x)
        x = self.conv10(x)
        x = self.pool4(x)
        x = self.bn4(x)
        x = self.relu4(x)

        x = self.conv11(x)
        x = self.conv12(x)
        x = self.conv13(x)
        x = self.pool5(x)
        x = self.bn5(x)
        x = self.relu5(x)
        # print(" x shape ",x.size())
        x = x.view(-1,512*4*4)
        x = F.relu(self.fc14(x))
        x = self.drop1(x)
        x = F.relu(self.fc15(x))
        x = self.drop2(x)
        x = self.fc16(x)

        return x

    def train_sgd(self,device):
        optimizer = optim.Adam(self.parameters(), lr=0.0001)

        path = 'weights.tar'
        initepoch = 0

        if not os.path.exists(path):
            loss = nn.CrossEntropyLoss()
            # optimizer = optim.SGD(self.parameters(),lr=0.01)

        else:
            checkpoint = torch.load(path)
            self.load_state_dict(checkpoint['model_state_dict'])
            optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
            initepoch = checkpoint['epoch']
            loss = checkpoint['loss']




        for epoch in range(initepoch,100):  # loop over the dataset multiple times
            timestart = time.time()

            running_loss = 0.0
            total = 0
            correct = 0
            for i, data in enumerate(trainloader, 0):
                # get the inputs
                inputs, labels = data
                inputs, labels = inputs.to(device),labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward + backward + optimize
                outputs = self(inputs)
                l = loss(outputs, labels)
                l.backward()
                optimizer.step()

                # print statistics
                running_loss += l.item()
                # print("i ",i)
                if i % 500 == 499:  # print every 500 mini-batches
                    print('[%d, %5d] loss: %.4f' %
                          (epoch, i, running_loss / 500))
                    running_loss = 0.0
                    _, predicted = torch.max(outputs.data, 1)
                    total += labels.size(0)
                    correct += (predicted == labels).sum().item()
                    print('Accuracy of the network on the %d train images: %.3f %%' % (total,
                            100.0 * correct / total))
                    total = 0
                    correct = 0
                    torch.save({'epoch':epoch,
                                'model_state_dict':self.state_dict(),
                                'optimizer_state_dict':optimizer.state_dict(),
                                'loss':loss
                                },path)

            print('epoch %d cost %3f sec' %(epoch,time.time()-timestart))

        print('Finished Training')

    def test(self,device):
        correct = 0
        total = 0
        with torch.no_grad():
            for data in testloader:
                images, labels = data
                images, labels = images.to(device), labels.to(device)
                outputs = self(images)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        print('Accuracy of the network on the 10000 test images: %.3f %%' % (
                100.0 * correct / total))

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = Net()
net = net.to(device)
net.train_sgd(device)
net.test(device)


I have personally tested that the code above runs.

Things that could be optimized

1. The number of layers of the neural network and the number of convolution kernels

Some readers may have a better GPU than mine; you can try deepening the network or increasing the number of convolution kernels in each layer.

Readers with a weak GPU, or no GPU at all, can try reducing the number of layers and the number of convolution kernels per layer; otherwise training may take quite a long time.

2. Adjust the architecture of the neural network

My neural network is a modification of VGG-16. Interested readers can refer to Inception-v3 or ResNet to improve accuracy further.

3. Visualization of training results

TensorFlow has a visualization tool called TensorBoard; with PyTorch, visualization can be done through Visdom.

I hope you can practice and improve the accuracy of the test.