1. Foreword
Cat-vs-dog recognition is an entry-level task for CNNs. Implementing it gives a better feel for the structure of a CNN and for how well it works in practice. Better still, the task is simple and the results come quickly, which helps keep you motivated to learn.
Dogs vs. Cats | Kaggle link: www.kaggle.com/c/dogs-vs-c…
2. Prepare the dataset
www.kaggle.com/c/dogs-vs-c…
From Kaggle, we can download 25,000 pictures of dogs and cats, including 12,500 cats and 12,500 dogs.
Here’s a tip for downloading Kaggle datasets:
- Start the download in Google Chrome first (Chrome seems to be routed to a Google mirror site), then copy the download link and open it in a download manager such as Xunlei (Thunder). The download can run several times faster, which cuts the waiting time.
Unzip the downloaded data into the train directory of the project folder.
3. Split the data
import os
import shutil


def get_address():
    """Get all image file names."""
    data_file = os.listdir('./train/')
    # Keep only regular files, so re-running the script skips the dog/ and cat/ folders
    data_file = [x for x in data_file if os.path.isfile(os.path.join('./train/', x))]
    dog_file = list(filter(lambda x: x[:3] == 'dog', data_file))
    cat_file = list(filter(lambda x: x[:3] == 'cat', data_file))
    root = os.getcwd()
    return dog_file, cat_file, root


def arrange():
    """Organize the data by moving the images into class folders."""
    dog_file, cat_file, root = get_address()
    print('Start data collation')
    # Create the new folders
    for i in ['dog', 'cat']:
        for j in ['train', 'val']:
            try:
                os.makedirs(os.path.join(root, j, i))
            except FileExistsError:
                pass
    # Move 10% (1250) of the dog images to the validation set
    for i, file in enumerate(dog_file):
        ori_path = os.path.join(root, 'train', file)
        if i < 0.9 * len(dog_file):
            des_path = os.path.join(root, 'train', 'dog')
        else:
            des_path = os.path.join(root, 'val', 'dog')
        shutil.move(ori_path, des_path)
    # Move 10% (1250) of the cat images to the validation set
    for i, file in enumerate(cat_file):
        ori_path = os.path.join(root, 'train', file)
        if i < 0.9 * len(cat_file):
            des_path = os.path.join(root, 'train', 'cat')
        else:
            des_path = os.path.join(root, 'val', 'cat')
        shutil.move(ori_path, des_path)
    print('Data collation completed')
Since Kaggle does not provide a validation set, we split off part of the training set for validation. A common rule of thumb in supervised learning is an 8:1:1 split (training : validation : test). Here we moved 10% of the data into the validation set, namely 1,250 cat pictures and 1,250 dog pictures.
Note that the 2,500 images taken out here must not be put back into the training set. If the training set overlaps with the validation set, the validation accuracy can no longer reveal overfitting (the results look good but are of little use in practice).
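The 8:1:1 rule mentioned above can be sketched as follows. This is only an illustration, not the script used in this article (which takes the last 10% of the files in directory order and has no test split); the function name `split_811`, the fixed seed, and the generated file names are assumptions made for the example.

```python
import random


def split_811(filenames, seed=0):
    """Shuffle a copy of the file list and split it 8:1:1 into train/val/test."""
    files = sorted(filenames)            # sort first so the result does not depend on input order
    random.Random(seed).shuffle(files)   # a fixed seed keeps the split reproducible
    n_train = len(files) * 8 // 10
    n_val = len(files) // 10
    train = files[:n_train]
    val = files[n_train:n_train + n_val]
    test = files[n_train + n_val:]
    return train, val, test


# Hypothetical file names, generated only for this example
dogs = ['dog.{}.jpg'.format(i) for i in range(12500)]
train, val, test = split_811(dogs)
print(len(train), len(val), len(test))   # 10000 1250 1250
assert not set(train) & set(val)         # no image ends up in both sets
```

Shuffling before splitting avoids depending on whatever order the file system happens to list the files in.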
4. Convert to readable data
"""get_data.py"""


def get_data(input_size, batch_size):
    """Read the image files and convert them to tensors."""
    from torchvision import transforms
    from torchvision.datasets import ImageFolder
    from torch.utils.data import DataLoader

    # Chain several image transforms (training set)
    # transforms.RandomResizedCrop(input_size): take a random crop, then resize it to a fixed size
    # transforms.RandomHorizontalFlip(): flip the given PIL image horizontally with a given probability
    # transforms.ToTensor(): convert the image to a Tensor, scaled to [0, 1]
    # transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]): normalize each channel with the given mean and standard deviation
    transform_train = transforms.Compose([
        transforms.RandomResizedCrop(input_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
    ])
    # Load the training set (applying the transforms above)
    train_set = ImageFolder('train', transform=transform_train)
    # Wrap the training set in a DataLoader
    train_loader = DataLoader(dataset=train_set,
                              batch_size=batch_size,
                              shuffle=True)
    # Chain several image transforms (validation set)
    transform_val = transforms.Compose([
        transforms.Resize([input_size, input_size]),  # Note that Resize takes a 2-element size here, unlike RandomResizedCrop
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
    ])
    # Load the validation set (applying the transforms above)
    val_set = ImageFolder('val', transform=transform_val)
    # Wrap the validation set in a DataLoader
    val_loader = DataLoader(dataset=val_set,
                            batch_size=batch_size,
                            shuffle=False)
    # Return everything
    return transform_train, train_set, train_loader, transform_val, val_set, val_loader
For reading the data, I used PyTorch’s built-in loading utilities. Besides reading the data, they can also apply uniform preprocessing as the data is read.
Here, ImageFolder from torchvision reads the image set directly (its first parameter is the folder path). The images come in different sizes, however, so each one has to be transformed (the transform parameter) into data the network can consume. Besides scaling the images, normalization is also needed to bring the values into a consistent range and simplify processing. transforms.Compose chains these image transforms together, and passing the result to ImageFolder quickly yields the required data. In the code above, for example, I use its random crop to a fixed size, random horizontal flip, normalization, and so on. This makes it easier to feed the data into the network for training, and it also augments the data (a flipped dog is still a dog).
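To make the normalization step concrete, here is a small sketch in plain Python of the arithmetic that ToTensor and Normalize(mean=0.5, std=0.5) apply to a single 8-bit channel value. The helper names `to_tensor_value` and `normalize` are made up for this illustration; the real transforms operate on whole image tensors.

```python
def to_tensor_value(pixel):
    """Mimic ToTensor for one 8-bit channel value: scale 0..255 to 0..1."""
    return pixel / 255.0


def normalize(value, mean=0.5, std=0.5):
    """Mimic Normalize for one channel value: (value - mean) / std."""
    return (value - mean) / std


for pixel in (0, 255):
    print(pixel, '->', normalize(to_tensor_value(pixel)))
# 0 -> -1.0
# 255 -> 1.0
```

So with a mean and standard deviation of 0.5, the pixel values end up in the range [-1, 1], centered on zero.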
5. Build the network
ResNet-18: a residual network (the 18 refers to its 18 weighted layers, counting the convolutional and fully connected layers but not the pooling and BN layers). I may cover ResNet in detail in a separate article, so I will not go into it here; put simply, it is an improved CNN.
Load the resnet18 model together with its pre-trained weights.
import torch
import torch.nn as nn
from torchvision import models

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # Train on the GPU if available, otherwise on the CPU
# Use the resnet18 model; pretrained=True loads the pre-trained weights
# (newer torchvision releases replace this argument with weights=models.ResNet18_Weights.DEFAULT)
transfer_model = models.resnet18(pretrained=True)
for param in transfer_model.parameters():
    # Freeze the pre-trained weights so that only the final fully connected layer is trained
    param.requires_grad = False
# Change the output dimension of the last layer: replace the original fully connected layer with one whose output dimension is 2
# Get the input dimension of the fc layer
dim = transfer_model.fc.in_features
# Replace the fully connected layer with a 2-output one
transfer_model.fc = nn.Linear(dim, 2)
# Move the network to the selected device
net = transfer_model.to(device)
Since we are dealing with a classification problem, and a binary one at that, we need to set the output dimension of the fully connected layer to 2. The rest of the network structure stays unchanged.
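A quick back-of-the-envelope check shows how little is actually trained once the backbone is frozen. The sketch below only counts the parameters of a fully connected layer; `linear_param_count` is a helper invented for this example, and 512 is the `fc.in_features` value of torchvision's resnet18.

```python
def linear_param_count(in_features, out_features):
    """Parameters of a fully connected layer: a weight matrix plus one bias per output."""
    return in_features * out_features + out_features


in_features = 512  # fc.in_features of torchvision's resnet18
print(linear_param_count(in_features, 1000))  # original 1000-class ImageNet head: 513000
print(linear_param_count(in_features, 2))     # new cat/dog head: 1026
```

With every other layer frozen, only those 1,026 values receive gradient updates, which is one reason this kind of transfer learning trains quickly.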
6. Set training parameters
input_size = 224  # Side length (in pixels) of the images fed to the network
batch_size = 128  # Number of samples per training step (directly affects GPU memory usage)
save_path = './weights.pt'  # Where the trained parameters are saved
lr = 1e-3  # Learning rate
n_epoch = 10  # Number of training epochs
The training parameters are:
- input_size: the side length, in pixels, of the images fed to the network.
- batch_size: the number of samples used in one training step (directly affects GPU memory usage).
- save_path: where the trained parameters are saved.
- lr: the learning rate.
- n_epoch: the number of training epochs (10 here).
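These settings also determine how many optimizer steps the training takes. As a rough sketch, assuming the 22,500 / 2,500 split from section 3 and a DataLoader that keeps the last partial batch (its default behavior), the counts work out as below; `batches_per_epoch` is a helper invented for this example.

```python
import math


def batches_per_epoch(n_samples, batch_size):
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(n_samples / batch_size)


batch_size = 128
n_epoch = 10
train_batches = batches_per_epoch(22500, batch_size)  # 11,250 cats + 11,250 dogs
val_batches = batches_per_epoch(2500, batch_size)     # 1,250 cats + 1,250 dogs
print(train_batches, val_batches, train_batches * n_epoch)  # 176 20 1760
```

So each epoch runs 176 training batches, and the full 10-epoch run performs 1,760 parameter updates.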
7. Start training
def train(net, optimizer, device, criterion, train_loader):
    """Train for one epoch."""
    net.train()
    batch_num = len(train_loader)
    running_loss = 0.0
    for i, data in enumerate(train_loader, start=1):
        # Move the inputs to the GPU (or CPU)
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        # Zero the parameter gradients, then forward, backward, optimize
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # Accumulate the loss and report it
        running_loss += loss.item()
        if i % 10 == 0:
            # Average over the 10 batches since the last report
            print('batch:{}/{} loss:{:.3f}'.format(i, batch_num, running_loss / 10))
            running_loss = 0.0
- optimizer.zero_grad(): reset the gradients to zero (because gradients accumulate across backward passes).
- outputs = net(inputs): forward propagation, computing the predicted values.
- loss = criterion(outputs, labels): compute the loss.
- loss.backward(): backward propagation, computing the current gradients.
- optimizer.step(): update the network parameters according to the gradients.
There is not much more to it: feed the data in batch by batch, run the forward computation, evaluate the loss function, propagate it backward, and update the network parameters, until the network converges on features that separate cats from dogs.
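The same five steps can be written out by hand for a model with a single weight. The sketch below is a toy illustration in plain Python, not PyTorch: it fits y = w * x with stochastic gradient descent, and the function name `sgd_fit`, the data, and the hyperparameters are all chosen for the example.

```python
def sgd_fit(xs, ys, lr=0.05, n_epoch=50):
    """Fit y = w * x by plain SGD, one sample at a time."""
    w = 0.0
    for _ in range(n_epoch):
        for x, y in zip(xs, ys):
            grad = 0.0                    # reset the gradient (like optimizer.zero_grad())
            pred = w * x                  # forward pass (like outputs = net(inputs))
            loss = (pred - y) ** 2        # squared-error loss (like criterion(...))
            grad += 2 * (pred - y) * x    # backward pass: d(loss)/dw (like loss.backward())
            w -= lr * grad                # parameter update (like optimizer.step())
    return w


w = sgd_fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(round(w, 6))  # 2.0
```

The data follows y = 2x exactly, so the learned weight settles at 2. A real network repeats the same cycle over its many parameters.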
8. Validation function
def validate(net, device, val_loader):
    """Validation function."""
    net.eval()  # Evaluation mode: dropout is turned off
    correct = 0
    total = 0
    with torch.no_grad():
        for data in val_loader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('Network accuracy on the validation images: %d %%' %
          (100 * correct / total))
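In validation, it is important to switch the network to evaluation mode with net.eval(), which turns off dropout and makes the batch-normalization layers use their stored running statistics. Otherwise, running the validation data through the network would alter the trained network and degrade it.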
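The accuracy bookkeeping in the validation function boils down to picking the highest-scoring class for each image and comparing it with the label. Here is a plain-Python sketch of that logic; the scores and labels are invented for the example, and class 0 / class 1 stand for cat / dog, which is the order ImageFolder assigns when it sorts the class folder names alphabetically.

```python
def accuracy(outputs, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    correct = 0
    for scores, label in zip(outputs, labels):
        predicted = scores.index(max(scores))  # index of the largest score, like torch.max(outputs, 1)
        correct += int(predicted == label)
    return correct / len(labels)


# Made-up network outputs for four images: [score_cat, score_dog]
outputs = [[2.0, 0.5], [0.1, 1.2], [0.3, 0.2], [0.0, 3.0]]
labels = [0, 1, 1, 1]
print('{:.0%}'.format(accuracy(outputs, labels)))  # 75%
```

The third image is a dog that scores slightly higher as a cat, so three of the four predictions are correct.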
9. Start training
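# Build the data loaders (assuming get_data from get_data.py above has been imported)
_, _, train_loader, _, _, val_loader = get_data(input_size, batch_size)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.fc.parameters(), lr=lr)
# optimizer = torch.optim.Adam(net.parameters(), lr=lr)
# f is assumed to be the module that holds the train() and validate() functions above
for epoch in range(n_epoch):
    print('Epoch {}'.format(epoch + 1))
    f.train(net, optimizer, device, criterion, train_loader)
    f.validate(net, device, val_loader)
# Save the model parameters
torch.save(net.state_dict(), save_path)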
The optimizer I chose is stochastic gradient descent (I tried both Adam and SGD, and SGD was slightly better).
Because this is a classification problem, the cross-entropy loss function is used here (it will also be covered in a separate article later).
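As a rough illustration of what the cross-entropy loss measures, the sketch below computes it for a single sample from raw two-class scores, using the standard definition -log(softmax(logits)[label]). The function and the example scores are made up for this illustration; nn.CrossEntropyLoss performs the same calculation on whole batches and, by default, averages the result.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy for one sample: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


print(round(cross_entropy([2.0, 0.0], 0), 3))  # confident and correct: 0.127
print(round(cross_entropy([2.0, 0.0], 1), 3))  # confident and wrong: 2.127
print(round(cross_entropy([0.0, 0.0], 0), 3))  # undecided: 0.693
```

The loss is small when the network puts most of its score on the right class and grows quickly when it is confidently wrong, which is what pushes the weights in the right direction during training.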
10. Training results
The accuracy of the network reached 95% after one epoch of training and 97% after ten epochs.
I also wrapped the network in a simple interface built with Tk. The output is as follows:
Project address: github.com/1224667889/…