Ta-ying Cheng is a PhD candidate at Oxford University and a technology blogger on Medium. Her articles have been featured in Towards Data Science.
Translation | Song Xian
Image classification has been one of the hottest areas of deep learning over the past few years. Traditional image recognition relied heavily on processing methods such as dilation/erosion or frequency-domain transforms, but the difficulty of hand-crafting good features limited the progress of these methods.
Nowadays, neural networks significantly improve the accuracy of image recognition, because a network can learn the relationship between the input image and the output label and continually adjust its recognition strategy accordingly.
However, neural networks often need a lot of data for training, and high-quality training data is not readily available. As a result, many researchers are studying so-called data augmentation: expanding an existing small dataset by creating new data "out of thin air".
This article introduces Mixup, a simple and effective data augmentation strategy, and shows how to implement it directly in PyTorch.
Why do we need data augmentation?
The parameters of a neural network are trained and updated according to the given data. However, since the training data only covers part of the possible data distribution, the network is likely to overfit to the "visible" part of that distribution.
So the more training data we have, the better we can, in theory, cover the entire distribution, which is why data-centric AI is so important. Of course, limited data does not leave us without options. With data augmentation, we can generate new data by perturbing the original data and feed it to the network as "new" samples for training.
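As an illustration (not part of the Mixup experiment below), a classic augmentation pipeline in torchvision randomly crops and flips each training image so that the network never sees exactly the same input twice; the specific transforms chosen here are just an example:
"""
A classic augmentation pipeline (illustrative only, not used in the Mixup experiment below)
"""
import torchvision.transforms as transforms

basic_augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # randomly shift the image within a padded frame
    transforms.RandomHorizontalFlip(),      # flip left-right with probability 0.5
    transforms.ToTensor(),
])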
What is Mixup?
Figure 1: A simple demo of Mixup
Suppose we want to classify pictures of cats and dogs, and we have a set of images labeled as cat or dog (for example, [1, 0] -> dog, [0, 1] -> cat). Mixup then simply averages two images and their labels to produce a new data point.
Specifically, we can write the concept of Mixup mathematically:
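x = λ · xᵢ + (1 − λ) · xⱼ
y = λ · yᵢ + (1 − λ) · yⱼ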
Where x and y are the mixed image and label obtained from xᵢ (labeled yᵢ) and xⱼ (labeled yⱼ), and λ is a random number drawn from a given beta distribution.
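For example, with λ = 0.7, mixing a dog image xᵢ (label yᵢ = [1, 0]) with a cat image xⱼ (label yⱼ = [0, 1]) gives an image that is 70% dog and 30% cat, with the soft label y = 0.7 · [1, 0] + 0.3 · [0, 1] = [0.7, 0.3].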
As a result, Mixup provides continuous samples between data categories, directly expanding the distribution covered by the training set and making the network more robust at test time.
The versatility of Mixup
Mixup is purely a data augmentation method, orthogonal to whatever network architecture is used for classification. That is, we can apply Mixup to the dataset of any classification task, regardless of the network.
Hongyi Zhang et al., the authors of Mixup, conducted experiments on multiple datasets and architectures in the original paper, mixup: Beyond Empirical Risk Minimization, and found that Mixup also demonstrates its power in applications beyond image classification.
Computing environment
Libraries
We will build the entire program with PyTorch (including torchvision). The samples Mixup needs are drawn from a beta distribution, which NumPy provides, and we will also use Python's random module to pick random images to mix. The following code imports all the libraries we need:
"""
Import necessary libraries to train a network using mixup
The code is mainly developed using the PyTorch library
"""
import numpy as np
import pickle
import random
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import Dataset, DataLoader
The dataset
For demonstration purposes, we will use the traditional image classification task to illustrate the power of Mixup, and CIFAR-10 is an ideal dataset for this. CIFAR-10 contains 60,000 color images in 10 categories (6,000 per category), split into training and test sets at a 5:1 ratio. The images are fairly simple to classify, but harder than MNIST, the most basic digit-recognition dataset.
There are many ways to download the CIFAR-10 dataset, for example from the University of Toronto website. I recommend using Gewu Titanium's open dataset platform, where you can access free dataset resources without downloading them if you use their SDK.
In fact, the open dataset platform hosts hundreds of well-known, high-quality datasets, each with a description from its authors as well as labels for different training tasks such as classification or object detection. Of course, you can also download other classification datasets, such as CompCars or SVHN, to test Mixup in different scenarios.
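If you prefer to work with the raw batch files directly, which is what the custom dataset class below expects, one minimal way to get them is to download and extract the Python-version archive from the official Toronto page (the Graviti SDK is an equally valid route); the local paths here are just an illustration:
"""
One possible way to fetch the raw CIFAR-10 batch files (a sketch; the SDK route works too)
The archive extracts to a 'cifar-10-batches-py' folder containing data_batch_1..5 and test_batch
"""
import tarfile
import urllib.request

CIFAR_URL = 'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz'
urllib.request.urlretrieve(CIFAR_URL, 'cifar-10-python.tar.gz')
with tarfile.open('cifar-10-python.tar.gz', 'r:gz') as tar:
    tar.extractall('.')  # creates ./cifar-10-batches-py/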
Hardware requirements
In general, it is best to train a neural network with a GPU (graphics card) because it can significantly improve the training speed. However, if only the CPU is available, we can still do a simple test of the program.
To have the program automatically detect the available hardware, use the following code:
"""
Determine if any GPUs are available
"""
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
Implementation
Network
Here, our goal is to test the performance of Mixup rather than to tune the network itself, so we simply implement a convolutional neural network (CNN) with four convolutional layers followed by two fully connected layers. To compare training with and without Mixup, we use the same network in both runs so that the comparison is fair.
We can use the following code to build the simple network described above:
"""
Create a simple CNN
"""
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # Network consists of 4 convolutional layers followed by 2 fully-connected layers
        self.conv11 = nn.Conv2d(3, 64, 3)
        self.conv12 = nn.Conv2d(64, 64, 3)
        self.conv21 = nn.Conv2d(64, 128, 3)
        self.conv22 = nn.Conv2d(128, 128, 3)
        self.fc1 = nn.Linear(128 * 5 * 5, 256)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = F.relu(self.conv11(x))
        x = F.relu(self.conv12(x))
        x = F.max_pool2d(x, (2, 2))
        x = F.relu(self.conv21(x))
        x = F.relu(self.conv22(x))
        x = F.max_pool2d(x, (2, 2))
        # Size is calculated based on kernel size 3 and padding 0
        x = x.view(-1, 128 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return nn.Sigmoid()(x)
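As an optional sanity check, you can push a dummy CIFAR-sized input through the network to confirm that the flattened feature size of 128 * 5 * 5 is correct for 32 x 32 inputs with 3 x 3 kernels, no padding, and two 2 x 2 poolings:
"""
Optional shape check for the network above
"""
dummy = torch.randn(1, 3, 32, 32)   # one fake CIFAR-10 image
print(CNN()(dummy).shape)           # expected: torch.Size([1, 10])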
Mixup
The Mixup step happens while the dataset is being loaded, so we have to write our own Dataset class instead of using the defaults provided by torchvision.datasets.
The following code implements Mixup in a simple way, drawing the mixing coefficient from NumPy's beta distribution.
"""
Dataset and Dataloader creation
All data are downloaded via Graviti Open Dataset which links to CIFAR-10 official page
The dataset implementation is where mixup takes place
"""
class CIFAR_Dataset(Dataset):
    def __init__(self, data_dir, train, transform):
        self.data_dir = data_dir
        self.train = train
        self.transform = transform
        self.data = []
        self.targets = []
        # Loading all the data depending on whether the dataset is training or testing
        if self.train:
            for i in range(5):
                with open(data_dir + 'data_batch_' + str(i+1), 'rb') as f:
                    entry = pickle.load(f, encoding='latin1')
                    self.data.append(entry['data'])
                    self.targets.extend(entry['labels'])
        else:
            with open(data_dir + 'test_batch', 'rb') as f:
                entry = pickle.load(f, encoding='latin1')
                self.data.append(entry['data'])
                self.targets.extend(entry['labels'])
        # Reshape it and turn it into the HWC format which PyTorch takes in the images
        # Original CIFAR format can be seen via its official page
        self.data = np.vstack(self.data).reshape(-1, 3, 32, 32)
        self.data = self.data.transpose((0, 2, 3, 1))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Create a one hot label
        label = torch.zeros(10)
        label[self.targets[idx]] = 1.
        # Transform the image by converting to tensor and normalizing it
        if self.transform:
            image = self.transform(self.data[idx])
        # If data is for training, perform mixup; only mixup roughly 1 in every 5 images
        if self.train and idx > 0 and idx % 5 == 0:
            # Choose another image/label randomly
            mixup_idx = random.randint(0, len(self.data)-1)
            mixup_label = torch.zeros(10)
            mixup_label[self.targets[mixup_idx]] = 1.
            if self.transform:
                mixup_image = self.transform(self.data[mixup_idx])
            # Select a random number from the given beta distribution
            # Mixup the images accordingly
            alpha = 0.2
            lam = np.random.beta(alpha, alpha)
            image = lam * image + (1 - lam) * mixup_image
            label = lam * label + (1 - lam) * mixup_label
        return image, label
Note that we did not apply Mixup to every image, but roughly to one in five. We also used a beta distribution with α = 0.2. You can vary the distribution parameter and the fraction of mixed images in your own experiments, and you might get even better results!
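To get an intuition for what α = 0.2 means, you can sample a few λ values yourself (a quick side check, not part of the training code). With such a small α, most draws land close to 0 or 1, so the mixed image is usually dominated by one of its two sources:
"""
A quick look at the beta distribution used for mixing
"""
import numpy as np
print(np.round(np.random.beta(0.2, 0.2, size=10), 3))
# most values are close to 0 or 1, so one of the two images usually dominates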
Training and evaluation
The following code shows the training process. We set the batch size to 128, the learning rate to 1e-3, and the number of epochs to 30. The whole training procedure is run twice, the only difference being whether Mixup is used.
Note that we need to define the loss function ourselves, since the built-in BCE loss does not support the fractional labels that Mixup produces.
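The training loop below refers to a few names (LEARNING_RATE, NUM_EPOCHS, train_dataloader, test_dataloader) whose definitions are not shown in this article. A minimal sketch of those definitions might look like the following; the data directory and normalization values are assumptions you may need to adjust:
"""
Hyperparameters and dataloaders assumed by the training loop (a sketch, adjust paths as needed)
"""
BATCH_SIZE = 128
LEARNING_RATE = 1e-3
NUM_EPOCHS = 30
DATA_DIR = './cifar-10-batches-py/'  # placeholder: wherever the CIFAR-10 batch files live

# Convert the HWC uint8 arrays to tensors in [0, 1] and normalize each channel
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

train_dataset = CIFAR_Dataset(DATA_DIR, train=True, transform=transform)
test_dataset = CIFAR_Dataset(DATA_DIR, train=False, transform=transform)
train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)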
"""
Initialize the network, loss and Adam optimizer
Torch BCE Loss does not support mixup labels (not 1 or 0), so we implement our own
"""
net = CNN().to(device)
optimizer = torch.optim.Adam(net.parameters(), lr=LEARNING_RATE)

def bceloss(x, y):
    eps = 1e-6
    return -torch.mean(y * torch.log(x + eps) + (1 - y) * torch.log(1 - x + eps))

best_Acc = 0

"""
Training Procedure
"""
for epoch in range(NUM_EPOCHS):
    net.train()
    # We train and visualize the loss every 100 iterations
    for idx, (imgs, labels) in enumerate(train_dataloader):
        imgs = imgs.to(device)
        labels = labels.to(device)
        preds = net(imgs)
        loss = bceloss(preds, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if idx % 100 == 0:
            print("Epoch {} Iteration {}, Current Loss: {}".format(epoch, idx, loss))
    # We evaluate the network after every epoch based on test set accuracy
    net.eval()
    with torch.no_grad():
        total = 0
        numCorrect = 0
        for (imgs, labels) in test_dataloader:
            imgs = imgs.to(device)
            labels = labels.to(device)
            preds = net(imgs)
            numCorrect += (torch.argmax(preds, dim=1) == torch.argmax(labels, dim=1)).float().sum()
            total += len(imgs)
        acc = numCorrect / total
        print("Current image classification accuracy at epoch {}: {}".format(epoch, acc))
        if acc > best_Acc:
            best_Acc = acc
To evaluate Mixup, we ran three controlled trials to obtain the final accuracy. Without Mixup, the network's accuracy on the test set was about 74.5%; with Mixup, it improved to about 76.5%!
Beyond image classification
Mixup has pushed the accuracy of image classification to new levels, and research shows its benefits extend to other computer vision tasks, such as the generation of and defense against adversarial examples. Related work has also extended Mixup to 3D representations, and current results, such as PointMixup, show that it is very effective in that domain as well.
Conclusion
With that, our little Mixup experiment is done! In this article, we briefly introduced the concept of Mixup and demonstrated how to apply it when training an image classification network. The full implementation can be found in the accompanying GitHub repository.
[About Gewu Titanium]:
Positioned as a data platform for machine learning, Gewu Titanium is committed to building the next generation of infrastructure for AI developers and fundamentally changing the way they interact with unstructured data. Through its unstructured data management tool TensorBay and its Open Datasets, it helps machine learning teams and individuals reduce the cost of data acquisition, storage, and processing, accelerate AI development and product innovation, and provide a solid foundation for AI to empower a wide range of industries and drive industrial upgrading.