Cat and dog recognition

In this article you will learn how to use PyTorch for image recognition

🔥 This article has been included in the GitHub repository github.com/kzbkzb/Pyth…

  • Author: Student K
  • From the column: 100 Cases of Deep Learning - PyTorch edition
  • Data link: pan.baidu.com/s/1YREL1omT… (Extraction code: IONW)

My environment:

  • Language: Python 3.8
  • Editor: Jupyter Lab
  • Deep learning environment:
    • torch==1.10.0+cu113
    • torchvision==0.11.1+cu113
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda, Compose
import matplotlib.pyplot as plt
import torchvision.transforms as transforms
import numpy as np

Data reading and preprocessing

Check whether your data augmentation is actually reasonable! Originally, I used the code below to augment the data in an attempt to compensate for the small dataset. After testing, I found that not every augmentation had a positive effect.

To check this, I ran several small comparative experiments, keeping all other parameters unchanged and changing only the data augmentation method. The final test accuracies were as follows:

  • No data augmentation: 79.2%
  • Random rotation: 80.8%
  • Random rotation + Gaussian blur: 83.3%
  • Random vertical flip: 73.3%

In a later article of 100 Cases of Deep Learning I will make a more detailed comparison. For now, I just want you to be aware of the issue.

train_datadir = './1-cat-dog/train/'
test_datadir  = './1-cat-dog/val/'

train_transforms = transforms.Compose([
    transforms.Resize([224, 224]),  # resize every input image to the same size
    # transforms.RandomRotation(degrees=(-10, 10)),  # random rotation, angle chosen between -10 and 10 degrees
    # transforms.RandomHorizontalFlip(p=0.5),  # random horizontal flip with probability p
    # transforms.RandomVerticalFlip(p=0.5),  # random vertical flip with probability p
    # transforms.RandomPerspective(distortion_scale=0.6, p=1.0),  # random perspective transform
    transforms.GaussianBlur(kernel_size=(5, 9), sigma=(0.1, 5)),  # Gaussian blur with sigma drawn from [0.1, 5]
    transforms.ToTensor(),          # convert a PIL Image or numpy.ndarray to a tensor and scale values to [0, 1]
    transforms.Normalize(           # standardize each channel (subtract mean, divide by std) so the model converges more easily
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225])  # mean and std are the commonly used ImageNet statistics, estimated by sampling from that dataset
])

test_transforms = transforms.Compose([
    transforms.Resize([224, 224]),  # resize every input image to the same size
    transforms.ToTensor(),          # convert a PIL Image or numpy.ndarray to a tensor and scale values to [0, 1]
    transforms.Normalize(           # standardize each channel (subtract mean, divide by std) so the model converges more easily
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225])  # same statistics as for the training data
])

train_data = datasets.ImageFolder(train_datadir,transform=train_transforms)

test_data  = datasets.ImageFolder(test_datadir,transform=test_transforms)

train_loader = torch.utils.data.DataLoader(train_data,
                                          batch_size=4,
                                          shuffle=True,
                                          num_workers=1)
test_loader  = torch.utils.data.DataLoader(test_data,
                                          batch_size=4,
                                          shuffle=True,
                                          num_workers=1)

For more information about transforms.Compose, see pytorch-cn.readthedocs.io/zh/latest/t…

If you want to know which other data augmentations are available, see pytorch.org/vision/stab… The corresponding API reference is at pytorch.org/vision/stab…

for X, y in test_loader:
    print("Shape of X [N, C, H, W]: ", X.shape)
    print("Shape of y: ", y.shape, y.dtype)
    break
Output:
Shape of X [N, C, H, W]:  torch.Size([4, 3, 224, 224])
Shape of y:  torch.Size([4]) torch.int64

Define the model

import torch.nn.functional as F

# Use the GPU for training if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using {} device".format(device))

# Define model
class LeNet(nn.Module):
    # __init__ defines the layers the network needs, such as convolutional and fully connected layers
    def __init__(self):
        super(LeNet, self).__init__()
        # Conv2d arguments: number of input channels, number of output channels, kernel size
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # For 224x224 inputs the last pooling layer outputs 16 channels of 53x53 feature maps,
        # so the first fully connected layer takes 16*53*53 inputs
        self.fc1 = nn.Linear(16 * 53 * 53, 120)
        self.fc2 = nn.Linear(120, 84)
        # There are 2 classes (cat and dog), so the last fully connected layer has 2 outputs
        self.fc3 = nn.Linear(84, 2)
        self.pool = nn.MaxPool2d(2, 2)

    # forward defines the forward pass, written like ordinary Python code
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        # Flatten the feature maps into one vector per image so the fully connected layers can process them
        x = x.view(-1, 16 * 53 * 53)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = LeNet().to(device)
print(model)
Output:
Using cuda device
LeNet(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=44944, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=2, bias=True)
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
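
The value in_features=44944 in the printed model follows from the input size. As a worked check, the sketch below applies the standard output-size formula for convolution and pooling layers to a 224x224 input. `conv_out` is a helper name introduced here for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

side = 224
side = conv_out(side, kernel=5)            # conv1: 224 -> 220
side = conv_out(side, kernel=2, stride=2)  # pool:  220 -> 110
side = conv_out(side, kernel=5)            # conv2: 110 -> 106
side = conv_out(side, kernel=2, stride=2)  # pool:  106 -> 53

flat_features = 16 * side * side           # 16 channels after conv2
print(side, flat_features)  # 53 44944
```

If the Resize target in the transforms changes, this number changes too, and the first fully connected layer has to be adjusted to match.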

Loss functions and optimizers

Define a loss function and an optimizer.

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
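
To illustrate what `nn.CrossEntropyLoss` computes, the following sketch compares it with a manual log-softmax followed by the negative log-probability of the true class. The logits and targets are made-up values.

```python
import math
import torch
from torch import nn
import torch.nn.functional as F

loss_fn = nn.CrossEntropyLoss()

# Raw, unnormalized scores (logits) for a batch of 3 images and 2 classes
logits = torch.tensor([[2.0, 0.5], [0.1, 0.3], [0.0, 0.0]])
targets = torch.tensor([0, 1, 0])

builtin = loss_fn(logits, targets)

# Manual version: log-softmax, then pick the log-probability of the true class
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(targets)), targets].mean()

# Equal logits mean the model cannot tell the two classes apart,
# which gives a loss of ln(2), roughly 0.693
uniform = loss_fn(torch.zeros(1, 2), torch.tensor([0]))

print(builtin.item(), manual.item())
print(uniform.item(), math.log(2))
```

Because the loss applies log-softmax internally, the model's forward function returns raw scores without a softmax layer. The ln(2) value also explains why the losses in the first epochs of the training log sit close to 0.69: at that point the network is still roughly guessing.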

Define the training function

In a single training cycle, the model makes predictions on the training data set (fed to it in batches) and propagates the prediction errors back to adjust the model’s parameters.

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Calculate the prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Report progress every 100 batches
        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
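
The three backpropagation calls in the training function can be followed on a toy problem. The sketch below is a minimal illustration with a single two-element parameter and a made-up loss, and it checks that `optimizer.step()` applies the plain SGD rule w ← w - lr · grad (no momentum or weight decay, as in the optimizer defined above).

```python
import torch

lr = 1e-3
w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=lr)

# A toy loss whose gradient is easy to check: d/dw sum(w^2) = 2w
loss = (w ** 2).sum()

optimizer.zero_grad()  # clear gradients left over from a previous step
loss.backward()        # fill w.grad with d(loss)/dw, here [2.0, -4.0]
expected = w.detach() - lr * w.grad  # plain SGD rule computed by hand
optimizer.step()       # update w in place

print(w.detach(), expected)
```

Without `zero_grad()`, gradients from successive batches would accumulate in `w.grad`, so each step would use the sum of all gradients seen so far.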

Define the test function

def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
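
The accuracy line in the test function counts how often the class with the highest score matches the label. A small sketch with made-up logits and labels shows the intermediate values:

```python
import torch

# Hypothetical logits for 4 images and 2 classes (index 0 and 1)
pred = torch.tensor([[1.2, -0.3],
                     [0.1, 0.9],
                     [2.0, 1.5],
                     [-1.0, 0.4]])
y = torch.tensor([0, 1, 1, 1])

predicted_class = pred.argmax(1)  # index of the largest score in each row
correct = (predicted_class == y).type(torch.float).sum().item()
accuracy = correct / len(y)

print(predicted_class.tolist(), correct, accuracy)  # [0, 1, 0, 1] 3.0 0.75
```

In the test function the same count is accumulated over all batches and divided by the dataset size at the end.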

Training

epochs = 20
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_loader, model, loss_fn, optimizer)
    test(test_loader, model, loss_fn)
print("Done!")
Output:
Epoch 1
-------------------------------
loss: 0.697082  [    0/  480]
loss: 0.686452  [  400/  480]
Test Error: 
 Accuracy: 50.8%, Avg loss: 0.692428

Epoch 2
-------------------------------
loss: 0.696046  [    0/  480]
loss: 0.674288  [  400/  480]
Test Error: 
 Accuracy: 50.0%, Avg loss: 0.690799

Epoch 3
-------------------------------
loss: 0.682432  [    0/  480]
loss: 0.677850  [  400/  480]
Test Error: 
 Accuracy: 58.3%, Avg loss: 0.686088

Epoch 4
-------------------------------
loss: 0.707287  [    0/  480]
loss: 0.681919  [  400/  480]
Test Error: 
 Accuracy: 60.8%, Avg loss: 0.681735

Epoch 5
-------------------------------
loss: 0.662526  [    0/  480]
loss: 0.686361  [  400/  480]
Test Error: 
 Accuracy: 59.2%, Avg loss: 0.678261

Epoch 6
-------------------------------
loss: 0.643308  [    0/  480]
loss: 0.588915  [  400/  480]
Test Error: 
 Accuracy: 63.3%, Avg loss: 0.661859

Epoch 7
-------------------------------
loss: 0.456625  [    0/  480]
loss: 0.446218  [  400/  480]
Test Error: 
 Accuracy: 64.2%, Avg loss: 0.660168

Epoch 8
-------------------------------
loss: 0.416538  [    0/  480]
loss: 0.779305  [  400/  480]
Test Error: 
 Accuracy: 61.7%, Avg loss: 0.647555

Epoch 9
-------------------------------
loss: 0.622066  [    0/  480]
loss: 0.547348  [  400/  480]
Test Error: 
 Accuracy: 66.7%, Avg loss: 0.647476

Epoch 10
-------------------------------
loss: 0.690601  [    0/  480]
loss: 0.458835  [  400/  480]
Test Error: 
 Accuracy: 65.0%, Avg loss: 0.637805

Epoch 11
-------------------------------
loss: 0.441014  [    0/  480]
loss: 0.798121  [  400/  480]
Test Error: 
 Accuracy: 68.3%, Avg loss: 0.644360

Epoch 12
-------------------------------
loss: 0.340511  [    0/  480]
loss: 0.479057  [  400/  480]
Test Error: 
 Accuracy: 67.5%, Avg loss: 0.608323

Epoch 13
-------------------------------
loss: 0.435809  [    0/  480]
loss: 0.755974  [  400/  480]
Test Error: 
 Accuracy: 65.0%, Avg loss: 0.621828

Epoch 14
-------------------------------
loss: 0.403148  [    0/  480]
loss: 0.312620  [  400/  480]
Test Error: 
 Accuracy: 66.7%, Avg loss: 0.646973

Epoch 15
-------------------------------
loss: 0.165473  [    0/  480]
loss: 0.518625  [  400/  480]
Test Error: 
 Accuracy: 70.0%, Avg loss: 0.600993

Epoch 16
-------------------------------
loss: 0.328379  [    0/  480]
loss: 0.196470  [  400/  480]
Test Error: 
 Accuracy: 72.5%, Avg loss: 0.526722

Epoch 17
-------------------------------
loss: 1.021464  [    0/  480]
loss: 0.422744  [  400/  480]
Test Error: 
 Accuracy: 75.8%, Avg loss: 0.539513

Epoch 18
-------------------------------
loss: 0.140470  [    0/  480]
loss: 0.335353  [  400/  480]
Test Error: 
 Accuracy: 71.7%, Avg loss: 0.538070

Epoch 19
-------------------------------
loss: 0.265230  [    0/  480]
loss: 0.180824  [  400/  480]
Test Error: 
 Accuracy: 75.0%, Avg loss: 0.485590

Epoch 20
-------------------------------
loss: 0.277113  [    0/  480]
loss: 0.548571  [  400/  480]
Test Error: 
 Accuracy: 73.3%, Avg loss: 0.498096

Done!
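
Each epoch in the log prints exactly two loss lines, at 0 and 400 samples. The short calculation below shows why, using the training set size of 480 from the log, the batch size of 4 from the DataLoader, and the `batch % 100 == 0` condition in the training function. The variable names are chosen here for illustration.

```python
import math

train_size = 480  # dataset size shown in the log
batch_size = 4    # value passed to the DataLoader
log_every = 100   # the training function prints when batch % 100 == 0

num_batches = math.ceil(train_size / batch_size)
logged = [batch * batch_size for batch in range(num_batches) if batch % log_every == 0]

print(num_batches, logged)  # 120 [0, 400]
```

Each printed value is the loss of a single batch of 4 images, which is why it jumps around far more than the averaged test loss.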