ResNet-18: Cat and Dog Recognition

This is the 25th day of my participation in the August Genwen Challenge.

1. Foreword

Cat and dog recognition is an entry-level task for CNNs. Implementing it gives a better understanding of how a CNN is structured and how well it works in practice. Better still, the task is simple and gives good results quickly, which helps keep up the motivation to learn.

Dogs vs. Cats | Kaggle link: www.kaggle.com/c/dogs-vs-c…

2. Prepare data sets

www.kaggle.com/c/dogs-vs-c…

From Kaggle, we can download 25,000 pictures of dogs and cats, including 12,500 cats and 12,500 dogs.

Here is a tip for downloading Kaggle data sets:

Start the download in Google Chrome first; Chrome will go to a Google mirror site (should be). Then copy the download link and open it in a download tool such as Xunlei (Thunder). The download runs several times faster, which cuts the waiting time.

Unzip the downloaded data into the train directory of the project folder.

3. Split the data

import os
import shutil


def get_address():
    """Get all image file names."""
    data_file = os.listdir('./train/')

    dog_file = list(filter(lambda x: x[:3] == 'dog', data_file))
    cat_file = list(filter(lambda x: x[:3] == 'cat', data_file))

    root = os.getcwd()

    return dog_file, cat_file, root


def arrange():
    """Collate the data: move the images into per-class folders."""
    dog_file, cat_file, root = get_address()

    print('Start data collation')
    # Create the new folders
    for i in ['dog', 'cat']:
        for j in ['train', 'val']:
            try:
                os.makedirs(os.path.join(root, j, i))
            except FileExistsError as e:
                pass

    # Move 10% (1250) of the dog images to the validation set
    for i, file in enumerate(dog_file):
        ori_path = os.path.join(root, 'train', file)
        if i < 0.9*len(dog_file):
            des_path = os.path.join(root, 'train', 'dog')
        else:
            des_path = os.path.join(root, 'val', 'dog')
        shutil.move(ori_path, des_path)

    # Move 10% (1250) of the cat images to the validation set
    for i, file in enumerate(cat_file):
        ori_path = os.path.join(root, 'train', file)
        if i < 0.9*len(cat_file):
            des_path = os.path.join(root, 'train', 'cat')
        else:
            des_path = os.path.join(root, 'val', 'cat')
        shutil.move(ori_path, des_path)

    print('Data collation completed')

Since Kaggle does not provide a validation set, we set aside part of the training set as a validation set. A common rule of thumb in supervised learning is an 8:1:1 split between training, validation and test data. Here we move 10% of the data into the validation set, namely 1,250 pictures of cats and 1,250 pictures of dogs.

Note that the 2,500 pictures taken out here must not be put back into the training set. If the training set overlaps the validation set, the validation accuracy becomes over-optimistic and overfitting goes unnoticed (the results look pretty good, but the model is not useful in practice).
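One way to make such a split reproducible, and to check that the two sets never overlap, is to shuffle the file names with a fixed seed before cutting off 10%. The sketch below is a variant for illustration, not the code used above (which takes the files in os.listdir order); split_files and the seed value are hypothetical, and the file names only mimic the Kaggle naming scheme.

```python
import random


def split_files(files, val_ratio=0.1, seed=0):
    """Return (train, val) lists with no file in both (hypothetical helper)."""
    files = sorted(files)               # fixed starting order
    random.Random(seed).shuffle(files)  # reproducible shuffle
    n_val = int(len(files) * val_ratio)
    return files[n_val:], files[:n_val]


# Stand-in names that mimic the Kaggle files (cat.0.jpg, cat.1.jpg, ...)
cats = ['cat.{}.jpg'.format(i) for i in range(12500)]
train_cats, val_cats = split_files(cats)

print(len(train_cats), len(val_cats))
print(set(train_cats).isdisjoint(val_cats))
```

Running the same function on the dog files would give the matching 11,250 / 1,250 split for the other class.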

4. Convert to readable data

"""get_data.py"""


def get_data(input_size, batch_size):
    """Get the file data and convert it."""
    from torchvision import transforms
    from torchvision.datasets import ImageFolder
    from torch.utils.data import DataLoader

    # Chain several image transformations (training set)
    # transforms.RandomResizedCrop(input_size): crop a random region, then resize the crop to a fixed size
    # transforms.RandomHorizontalFlip(): flip the given PIL image horizontally with a given probability
    # transforms.ToTensor(): convert the image to a Tensor, scaled to [0, 1]
    # transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]): normalize each channel with this mean and standard deviation
    transform_train = transforms.Compose([
        transforms.RandomResizedCrop(input_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
    ])
    # Get the training set (through the transformations above)
    train_set = ImageFolder('train', transform=transform_train)
    # Wrap the training set in a loader
    train_loader = DataLoader(dataset=train_set,
                              batch_size=batch_size,
                              shuffle=True)

    # Chain several image transformations (validation set)
    transform_val = transforms.Compose([
        transforms.Resize([input_size, input_size]),  # Note that Resize takes a 2-element size here, unlike RandomResizedCrop
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
    ])
    # Get the validation set (through the transformations above)
    val_set = ImageFolder('val', transform=transform_val)
    # Wrap the validation set in a loader
    val_loader = DataLoader(dataset=val_set,
                            batch_size=batch_size,
                            shuffle=False)
    # Output
    return transform_train, train_set, train_loader, transform_val, val_set, val_loader

To read the data, I used PyTorch's built-in loading utilities. Besides reading the data, they can also apply the same processing to every image as it is read.
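ImageFolder takes the label of each image from the name of the sub-folder it sits in, numbering the class folders in sorted order. The pure-Python sketch below only mimics that mapping for the train/cat and train/dog layout created earlier; class_to_index is a hypothetical helper, not part of torchvision.

```python
def class_to_index(class_dirs):
    """Map class folder names to integer labels in sorted order."""
    return {name: idx for idx, name in enumerate(sorted(class_dirs))}


mapping = class_to_index(['dog', 'cat'])
print(mapping)  # {'cat': 0, 'dog': 1}
```

With this layout, label 0 therefore stands for cat and label 1 for dog.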

Here PyTorch's ImageFolder reads the image set directly (its first parameter is the folder path). The images all have different sizes, so each one has to be transformed into data the network can accept (this is the transform parameter). Besides scaling the image, normalization is also needed to bring the values into a uniform range and make them easier to process. transforms.Compose chains these image transformations, and passing the result to ImageFolder quickly yields the required data. In the code above, for example, I chain a random crop to a fixed size, a random horizontal flip, normalization and so on. This makes it easy to feed the data into the network for training, and it augments the data (a mirrored dog is still a dog).
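To see what the normalization does to the values, the arithmetic can be followed on a single pixel. This is a pure-Python sketch of the two formulas only (ToTensor divides 8-bit values by 255, and Normalize computes (value - mean) / std per channel); to_unit_range and normalize are illustrative names, not torchvision functions.

```python
def to_unit_range(pixel):
    """Scale an 8-bit pixel value (0..255) to [0, 1], as ToTensor does."""
    return pixel / 255.0


def normalize(value, mean=0.5, std=0.5):
    """Apply (value - mean) / std, the per-channel formula of Normalize."""
    return (value - mean) / std


for pixel in (0, 255):
    print(pixel, '->', normalize(to_unit_range(pixel)))
```

With mean 0.5 and std 0.5, the darkest pixel maps to -1.0 and the brightest to 1.0, so the network sees inputs centered on zero.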

5. Build a network

ResNet-18: a residual network (the 18 refers to the 18 layers with weights, counting the convolutional and fully connected layers but not the pooling or BN layers). I may cover ResNet in detail in a separate article, so I will not go into it here; put simply, it is an improved CNN.
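The figure 18 can be checked with a little arithmetic on the standard ResNet-18 layout. The sketch below counts only the weighted layers on the main path; the 1x1 convolutions on the shortcut connections are, by convention, left out.

```python
# Layer layout of ResNet-18 (layers with weights on the main path)
stem_convs = 1         # the initial 7x7 convolution
stages = 4             # four residual stages
blocks_per_stage = 2   # two basic blocks in each stage
convs_per_block = 2    # two 3x3 convolutions in each basic block
fc_layers = 1          # the final fully connected layer

weighted_layers = stem_convs + stages * blocks_per_stage * convs_per_block + fc_layers
print(weighted_layers)  # 18
```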

Download the resnet18 network model together with its pretrained weights.

import torch
import torch.nn as nn
from torchvision import models

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")     # Select the training device
# Use the resnet18 model; pretrained=True loads the pretrained weights
# (newer torchvision releases deprecate this flag in favor of the weights argument)
transfer_model = models.resnet18(pretrained=True)
for param in transfer_model.parameters():
    # Freeze the pretrained weights; only the final fully connected layer will be trained
    param.requires_grad = False
# Change the dimension of the last layer, i.e. replace the original fully connected layer with one of output dimension 2
# Get the input dimension of the fc layer
dim = transfer_model.fc.in_features
# Set the output of the fully connected layer to 2
transfer_model.fc = nn.Linear(dim, 2)
# Build the neural network
net = transfer_model.to(device)

Since we are dealing with a classification problem, and a binary one at that, we need to set the output of the fully connected layer to 2. The rest of the network structure stays unchanged.
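Freezing everything except the new layer leaves very few weights to train. As a rough check, the sketch below counts the parameters of a fully connected layer, assuming the 512 input features that torchvision's ResNet-18 feeds into fc; linear_params is a hypothetical helper.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


in_features = 512  # assumed value of fc.in_features for ResNet-18
print(linear_params(in_features, 1000))  # original 1000-class head: 513000
print(linear_params(in_features, 2))     # new cat/dog head: 1026
```

Only these 1,026 values receive gradient updates during training, which is one reason the fine-tuning runs quickly.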

6. Set training parameters

input_size = 224      # Side length of the input images, in pixels
batch_size = 128      # Number of samples used in one training step (directly affects GPU memory usage)
save_path = './weights.pt'  # Where the trained parameters are saved
lr = 1e-3             # Learning rate
n_epoch = 10          # Number of training epochs

Set the training parameters:
input_size: the side length, in pixels, of the images fed to the network
batch_size: the number of samples used in one training step (directly affects GPU memory usage)
save_path: where the trained parameters are stored
lr: the learning rate
n_epoch: the number of training epochs (10 here)
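The batch size also fixes how many batches one epoch contains. A minimal sketch, assuming the 22,500 / 2,500 split from earlier and a loader that keeps the last, smaller batch (the DataLoader default); batches_per_epoch is a hypothetical helper.

```python
import math


def batches_per_epoch(n_samples, batch_size):
    """Number of batches when the last, smaller batch is kept."""
    return math.ceil(n_samples / batch_size)


batch_size = 128
print(batches_per_epoch(22500, batch_size))  # training set: 176
print(batches_per_epoch(2500, batch_size))   # validation set: 20
```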

7. Start training

def train(net, optimizer, device, criterion, train_loader):
    """Training function."""
    net.train()
    batch_num = len(train_loader)
    running_loss = 0.0
    for i, data in enumerate(train_loader, start=1):
        # Pass the inputs to the GPU (CPU)
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        # Zero the parameter gradients, forward, backward, optimize
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # Accumulate the loss and display it
        running_loss += loss.item()
        if i % 10 == 0:
            # Average over the last 10 batches
            print('batch:{}/{} loss:{:.3f}'.format(i, batch_num, running_loss / 10))
            running_loss = 0.0

optimizer.zero_grad(): zero the gradients (because gradients accumulate across backward passes).
outputs = net(inputs): forward propagation, compute the predicted values.
loss = criterion(outputs, labels): compute the loss.
loss.backward(): back propagation, compute the current gradients.
optimizer.step(): update the network parameters according to the gradients.

Basically, there is not much to explain: the data is fed in batch by batch, the network computes its predictions, the loss is computed and propagated back, and the network parameters are updated, so that the network gradually converges on the image features of cats and dogs.
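The same five steps can be followed by hand on a toy problem. The sketch below is not the ResNet training loop: it fits a single weight w to the made-up data y = 2x with plain gradient descent, and the comments mark which line plays the role of which PyTorch call.

```python
# Toy data for y = 2 * x; the single weight w stands in for the network.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
n = len(xs)

w = 0.0    # parameter to learn
lr = 0.05  # learning rate

for step in range(100):
    grad = 0.0                                                 # zero the gradient
    preds = [w * x for x in xs]                                # forward pass
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n    # mean squared error
    grad += sum(2 * (p - y) * x
                for p, y, x in zip(preds, ys, xs)) / n         # backward pass
    w -= lr * grad                                             # optimizer step

print(round(w, 4))
```

The weight ends up at 2 to within rounding, just as the real network's weights settle on values that separate cats from dogs.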

8. Validation function

import torch


def validate(net, device, val_loader):
    """Validation function."""
    net.eval()  # Evaluation mode: dropout must be turned off
    correct = 0
    total = 0
    with torch.no_grad():
        for data in val_loader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print('Test network accuracy of image: %d %%' %
          (100 * correct / total))

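In validation, it is important to switch to evaluation mode with net.eval(), which turns off dropout and makes the batch normalization layers use their stored statistics. Otherwise, running validation would alter the state of the trained network and distort the results.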
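The accuracy computed in validate() is simply the share of images whose highest-scoring class equals the label. A pure-Python sketch of that calculation follows; the scores are made up for illustration, and accuracy is a hypothetical helper that mirrors the torch.max(outputs, 1) step.

```python
def accuracy(outputs, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    correct = 0
    for scores, label in zip(outputs, labels):
        predicted = max(range(len(scores)), key=lambda k: scores[k])  # argmax
        correct += int(predicted == label)
    return correct / len(labels)


# Made-up network outputs for four images (column 0 = cat, column 1 = dog)
outputs = [[2.1, -0.3], [0.2, 1.5], [-1.0, 0.4], [0.9, 0.8]]
labels = [0, 1, 0, 0]
print('accuracy: {:.0%}'.format(accuracy(outputs, labels)))
```

Here the third image is a cat scored as a dog, so three of the four predictions are correct.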

9. Start training

# Assumes f is the module that holds train() and validate() from above, and that
# train_loader and val_loader were returned by get_data(input_size, batch_size)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.fc.parameters(), lr=lr)
# optimizer = torch.optim.Adam(net.parameters(), lr=lr)
for epoch in range(n_epoch):
    print('{} session'.format(epoch+1))
    f.train(net, optimizer, device, criterion, train_loader)
    f.validate(net, device, val_loader)

# Save the model parameters
torch.save(net.state_dict(), save_path)

The optimizer I chose is stochastic gradient descent (I tried both Adam and SGD, and SGD was slightly better).

Because this is a classification problem, the cross entropy loss function is used here (it will also be introduced in a separate article later).
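As a brief preview, cross entropy for one sample is the negative log of the softmax probability that the network assigns to the correct class. The sketch below computes it in plain Python for two-class scores; cross_entropy is an illustrative function, and the scores are made up.

```python
import math


def cross_entropy(logits, label):
    """Negative log of the softmax probability assigned to the true class."""
    shifted = [z - max(logits) for z in logits]  # subtract the max for numerical stability
    log_sum = math.log(sum(math.exp(z) for z in shifted))
    return -(shifted[label] - log_sum)


print(round(cross_entropy([0.0, 0.0], 0), 4))  # undecided network: 0.6931
print(cross_entropy([5.0, -5.0], 0) < cross_entropy([-5.0, 5.0], 0))  # True
```

A confident correct prediction gives a loss near zero, while a confident wrong one is penalized heavily, which is what pushes the network toward the right class.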

10. Training results

The network reached 95% accuracy after one epoch of training and 97% after ten epochs.

Here the network is wrapped in a simple GUI built with Tk. The output is as follows:

Project address: github.com/1224667889/…