Preface:

In computer vision, the essence of data augmentation is to artificially inject prior knowledge about human vision, which can significantly improve model performance. It has essentially become a standard component of model training. Many new data augmentation methods have appeared in recent years, and this article summarizes them.

This article covers the role of data augmentation, the classification of augmentation methods, the commonly used methods, and some special methods such as Cutout, Random Erasing, Mixup, Hide-and-Seek, CutMix, and GridMask. The FenceMask and KeepAugment methods are also introduced, as well as several multi-sample augmentation methods such as SMOTE, Mosaic, and SamplePairing.

It is worth mentioning that almost every one of these papers uses CNN visualization methods to demonstrate the effectiveness of its augmentation method (and, to some extent, to pad the workload and word count). Readers interested in CNN visualization methods can read the summary series in the CV Technical Guide.

CNN Visualization Technology Summary


The role of data augmentation

1. Avoid overfitting. When a data set has some obvious biases, for example when its images are basically all taken in the same scene, methods such as Cutout or style transfer can keep the model from learning information that is irrelevant to the target.

2. Improve the robustness of the model and reduce its sensitivity to image perturbations. A model trained only on ideal data easily makes mistakes in special situations such as occlusion, brightness changes, and blur. Adding noise and masks to the training data improves robustness.

3. Increase the amount of training data and improve the model's generalization ability.

4. Avoid sample imbalance. In industrial defect detection and medical disease recognition, extreme imbalance between positive and negative samples is common. Applying data augmentation to the minority class can reduce the imbalance.

Classification of data augmentation methods

Data augmentation can be divided into two types: online augmentation and offline augmentation. The difference is that offline augmentation processes the data set before training, which often multiplies the size of the data set, while online augmentation preprocesses the loaded data during training and does not change the amount of stored training data.

Offline augmentation is generally used for small data sets where training data is insufficient, while online augmentation is generally used for large data sets.

Commonly used methods

The more commonly used geometric transformations are flipping, rotation, cropping, scaling, translation, and jittering. Note that in some tasks these methods require the labels to be changed along with the image. For example, when flipping is used in object detection, the ground-truth boxes need to be adjusted accordingly.
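As a small illustration of adjusting labels together with the image (my own example, not taken from any of the cited papers), the sketch below horizontally flips an image and mirrors its bounding boxes:

```python
import numpy as np

def hflip_with_boxes(image, boxes):
    """Horizontally flip an HxWxC image and mirror its bounding boxes.

    boxes: (N, 4) array in (x_min, y_min, x_max, y_max) pixel coordinates.
    """
    h, w = image.shape[:2]
    flipped = image[:, ::-1].copy()            # reverse the width axis
    boxes = boxes.astype(np.float32).copy()
    x_min, x_max = boxes[:, 0].copy(), boxes[:, 2].copy()
    boxes[:, 0] = w - x_max                    # new x_min comes from the old x_max
    boxes[:, 2] = w - x_min
    return flipped, boxes
```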

The commonly used pixel-level transformations include adding salt-and-pepper noise or Gaussian noise, Gaussian blur, adjusting contrast in HSV space, adjusting brightness and saturation, histogram equalization, and white-balance adjustment.

These common methods are relatively simple and will not be described here.

Cutout(2017)

This method is derived from the paper Improved Regularization of Convolutional Neural Networks with Cutout.

Occlusion is common in tasks such as human pose estimation, face recognition, object tracking, and person re-identification. Cutout was proposed as a data augmentation method to improve model robustness in these situations. It is based on the observation that Cutout makes the CNN use the global information of an image rather than relying on a small set of specific visual features.

Method: randomly select a small square region of the image and set the pixel values in this region to 0 (or some other uniform value). Note: with a probability of 50%, Cutout is not applied to an image.
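A minimal NumPy sketch of this idea (the 50% skip probability comes from the note above; the default patch size and fill value are my own placeholders, and this is not the official implementation linked below):

```python
import numpy as np

def cutout(image, size=16, p=0.5, fill=0):
    """With probability p, set one random square patch of side `size` to `fill`."""
    if np.random.rand() > p:                   # 50% of images are left untouched
        return image
    h, w = image.shape[:2]
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = max(0, cy - size // 2), min(h, cy + size // 2)
    x1, x2 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = image.copy()
    out[y1:y2, x1:x2] = fill                   # uniform replacement value
    return out
```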

The effect is as follows:

Official code:

Github.com/uoguelph-ml…

Random Erasing(2017)

This method is derived from the paper Random Erasing Data Augmentation.

This approach is somewhat similar to Cutout, which was published in the same year. The difference is that in Random Erasing the height and width of the masked region and the replacement pixel values are all random, whereas Cutout uses a fixed-size square and a single replacement value.

The specific algorithm is as follows:
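As a rough NumPy paraphrase of that procedure (the area and aspect-ratio ranges are common defaults, not necessarily those of the official code linked below; a uint8 HxWxC image is assumed):

```python
import numpy as np

def random_erasing(image, p=0.5, area=(0.02, 0.4), ratio=(0.3, 3.3)):
    """Erase one rectangle of random size and aspect ratio with random values."""
    if np.random.rand() > p:
        return image
    h, w, c = image.shape
    for _ in range(100):                       # retry until the rectangle fits
        target = np.random.uniform(*area) * h * w
        aspect = np.random.uniform(*ratio)
        eh = int(round(np.sqrt(target * aspect)))
        ew = int(round(np.sqrt(target / aspect)))
        if 0 < eh < h and 0 < ew < w:
            y = np.random.randint(0, h - eh)
            x = np.random.randint(0, w - ew)
            out = image.copy()
            out[y:y + eh, x:x + ew] = np.random.randint(0, 256, size=(eh, ew, c))
            return out
    return image
```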

The effect is as follows:

Official code:

Github.com/zhunzhong07…

Mixup(2018)

This method is derived from the paper mixup: Beyond Empirical Risk Minimization.

The main idea is to randomly select two images from the data set and fuse them in a certain proportion, including their label values. The code given in the paper can be understood at a glance.
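The core of that code is a single convex combination; a minimal sketch (assuming one-hot or soft label vectors, with lambda drawn from a Beta(alpha, alpha) distribution as in the paper):

```python
import numpy as np

def mixup(img_a, label_a, img_b, label_b, alpha=0.2):
    """Blend two images and their labels with the same random weight lam."""
    lam = np.random.beta(alpha, alpha)
    img = lam * img_a.astype(np.float32) + (1 - lam) * img_b.astype(np.float32)
    label = lam * label_a + (1 - lam) * label_b   # labels are mixed identically
    return img, label
```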

The effect is as follows:

A combination of a sailing boat and a panda.

Official code:

Github.com/facebookres…

Hide-and-Seek(2018)

This method comes from the paper Hide-and-Seek: A Data Augmentation Technique for Weakly-Supervised Localization and Beyond.

The main idea is to divide the image into an S x S grid and mask each cell with a certain probability (0.5). Inevitably, a small target will sometimes be completely masked out. When this idea is applied to action recognition, the video is divided into several segments, and each segment is masked with a certain probability.

Note: the paper mentions that the replacement value used for the mask has some influence on recognition. After some theoretical analysis, filling with the mean pixel value of the images over the whole data set has the least influence.
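A minimal sketch of the grid masking (the per-image mean here is only a stand-in for the data-set mean mentioned in the note; the grid size and hiding probability are illustrative):

```python
import numpy as np

def hide_and_seek(image, grid=4, p_hide=0.5, fill=None):
    """Split the image into grid x grid cells and hide each cell with prob p_hide."""
    h, w = image.shape[:2]
    if fill is None:
        fill = image.mean(axis=(0, 1))         # stand-in for the data-set mean
    out = image.astype(np.float32)
    ys = np.linspace(0, h, grid + 1, dtype=int)
    xs = np.linspace(0, w, grid + 1, dtype=int)
    for i in range(grid):
        for j in range(grid):
            if np.random.rand() < p_hide:
                out[ys[i]:ys[i + 1], xs[j]:xs[j + 1]] = fill
    return out
```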

In the CNN visualization summary we mentioned that CNN visualization can increase the credibility of a method (as well as the workload and word count), and this is well reflected in this paper, which uses visualization methods such as CAM and convolution kernel visualization to analyze the rationality of the algorithm.

Summary of CNN Visualization Techniques — Convolutional Kernel Visualization

CNN Visualization Technology Summary — Class Visualization

Effect:

Official code:

Github.com/kkanshul/Hi…

CutMix(2019)

This method comes from CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features.

This method combines the ideas of Cutout, Random Erasing, and Mixup and makes some intermediate, harmonized changes. It also selects a small region and masks it, but here the mask is filled with the corresponding region of another image. A picture makes this easier to understand.

After understanding the figure above, the implementation is relatively simple; the formula is as follows:

The size of the mask region is determined by the following formula:

The width and height always satisfy this equation.
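For reference, the formulas from the CutMix paper can be written out as follows, where $x_A, x_B$ are the two images, $y_A, y_B$ their labels, and $M$ is a binary mask that is 0 inside the cut rectangle:

$\tilde{x} = M \odot x_A + (1 - M) \odot x_B, \qquad \tilde{y} = \lambda\, y_A + (1 - \lambda)\, y_B$

The rectangle's center is sampled uniformly over the image, and its width and height are set so that the cut area matches the mixing ratio:

$r_w = W\sqrt{1 - \lambda}, \qquad r_h = H\sqrt{1 - \lambda}, \qquad \frac{r_w\, r_h}{W H} = 1 - \lambda$

with $\lambda$ drawn from a Beta$(\alpha, \alpha)$ distribution (the paper sets $\alpha = 1$, making $\lambda$ uniform).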

The effect is as follows:

Official code:

Github.com/clovaai/Cut…

GridMask(2020)

This method comes from GridMask Data Augmentation

The main idea is to improve on the previous methods: because those methods choose the mask region randomly, they can easily cover an entire important part (or miss it completely). GridMask instead arranges the masked squares in a regular grid, so that important regions are at most, and almost always, only partially covered.

The specific implementation determines the mask by setting the side length of each small square and the distance d between two adjacent squares, which controls the granularity of the mask.
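A minimal sketch of that idea (the parameter names, grid period d and the ratio controlling the square side, are my own; the official repo linked below implements this more carefully):

```python
import numpy as np

def gridmask(image, d=32, ratio=0.5, fill=0):
    """Mask a regular grid of squares: period d, square side l = d * ratio."""
    h, w = image.shape[:2]
    l = int(d * ratio)                          # side length of each masked square
    oy, ox = np.random.randint(d), np.random.randint(d)   # random grid offset
    out = image.copy()
    for y in range(-d + oy, h, d):
        for x in range(-d + ox, w, d):
            y1, y2 = max(0, y), max(0, min(h, y + l))
            x1, x2 = max(0, x), max(0, min(w, x + l))
            out[y1:y2, x1:x2] = fill
    return out
```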

The effect is as follows:

Official code:

Github.com/akuxcw/Grid…

FenceMask(2020)

This method is derived from FenceMask: A Data Augmentation Approach for Pre-extracted Image Features.

This method is an improvement on GridMask. The authors argue that square masks can severely affect small targets, so a better-shaped mask is proposed, which gives FenceMask finer granularity.

The effect comparison is as follows:

KeepAugment(2020)

This approach comes from the paper KeepAugment: A Simple Information-Preserving Data Augmentation Approach.

The main idea is to improve on the random selection of mask regions in the previous methods. By computing a saliency map, the least important region can be selected for Cutout, or the most important region can be selected for pasting in CutMix.

The saliency map is computed in the same way as in class visualization: the gradient of the class score with respect to the input is obtained by back-propagation, and the gradient at each pixel indicates how much that pixel influences the predicted class. A region is judged most important or least important according to whether the sum of the saliency values inside it is above or below a corresponding threshold.
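A rough sketch of how such a region score can be computed with a PyTorch classifier (the helper name and the exact scoring are my own illustration, not the paper's code):

```python
import torch

def region_saliency(model, image, label, box):
    """Sum of input-gradient saliency inside a candidate box (x1, y1, x2, y2).

    image: (C, H, W) float tensor; label: integer class index.
    Regions scoring below a threshold are candidates for cutting (Cutout);
    regions scoring above it are candidates for pasting (CutMix).
    """
    x = image.unsqueeze(0).clone().requires_grad_(True)
    score = model(x)[0, label]                 # class score for the true label
    score.backward()                           # gradients w.r.t. the input pixels
    saliency = x.grad.abs().sum(dim=1)[0]      # (H, W) importance map
    x1, y1, x2, y2 = box
    return saliency[y1:y2, x1:x2].sum().item()
```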

For details on how to calculate the saliency map, see CNN Visualization Summary – Class Visualization.

The algorithm is as follows (Selective-cut corresponds to Cutout, Selective-paste to CutMix):

The effect is as follows:

Other data augmentation methods

RandAugment, Fast AutoAugment, and AutoAugment all belong to the family of methods that search for an augmentation policy: a search procedure (reinforcement learning in AutoAugment's case) is used to find a combination of augmentation operations suited to a given data set. The biggest characteristic of these methods is that they are very expensive and time-consuming, so they are not introduced here.

Note: the methods mentioned earlier are basically zero-cost, which is why they are more commonly used; these few search-based methods are only suitable for "aristocrats" with abundant compute.

There are also augmentations such as style transfer using GANs.

Multi-sample data augmentation methods

Apart from CutMix and Mixup, the methods mentioned above are basically single-sample augmentations. There are also multi-sample augmentation methods, whose main principle is to use multiple samples to generate new samples.

SMOTE - this method dates back to 2002 and is mainly used to generate new samples for small or imbalanced data sets. The idea is to randomly select a sample, compute its distances to the other samples to find its K nearest neighbors, and then randomly pick among these K neighbors to construct new samples. The method operates on feature vectors rather than images, but readers can design an image-oriented variant themselves.
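A minimal sketch of that interpolation on feature vectors (my own paraphrase of the standard SMOTE procedure, not the original implementation):

```python
import numpy as np

def smote(minority, n_new, k=5):
    """Generate n_new synthetic samples from an (N, D) minority-class matrix."""
    n = len(minority)
    dists = np.linalg.norm(minority[:, None] - minority[None, :], axis=-1)
    np.fill_diagonal(dists, np.inf)             # ignore each sample's distance to itself
    neighbors = np.argsort(dists, axis=1)[:, :k]  # K nearest neighbors per sample
    synthetic = []
    for _ in range(n_new):
        i = np.random.randint(n)                # pick a random minority sample
        j = neighbors[i, np.random.randint(k)]  # and one of its K neighbors
        gap = np.random.rand()                  # interpolate between the two
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)
```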

Mosaic - derived from YOLOv4. Mosaic takes four images and stitches them together into a single image. The advantage is that the background of the composite is no longer a single scene but four different scenes, and when batch normalization is used, each layer is effectively normalized over four images at once, which allows the batch size to be greatly reduced.
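A heavily simplified sketch of the stitching (the real Mosaic also jitters the joint point and remaps each image's bounding boxes into the new canvas; here four same-sized images are simply placed in the four quadrants):

```python
import numpy as np

def mosaic(img_tl, img_tr, img_bl, img_br):
    """Stitch four HxWxC images of equal size into one (2H)x(2W)xC image."""
    h, w = img_tl.shape[:2]
    canvas = np.zeros((2 * h, 2 * w, img_tl.shape[2]), dtype=img_tl.dtype)
    canvas[:h, :w] = img_tl                     # top-left quadrant
    canvas[:h, w:] = img_tr                     # top-right quadrant
    canvas[h:, :w] = img_bl                     # bottom-left quadrant
    canvas[h:, w:] = img_br                     # bottom-right quadrant
    return canvas
```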

SamplePairing - the principle of this method is to randomly select two images from the training set, apply basic geometric augmentation to each, and then average them pixel by pixel to obtain a new sample. The details are shown in the figure below.
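In code the operation itself is just a pixel-wise mean; a minimal sketch (the paper keeps the label of the first image for the mixed sample):

```python
import numpy as np

def sample_pairing(img_a, label_a, img_b):
    """Average two (already geometrically augmented) images; keep the first label."""
    mixed = (img_a.astype(np.float32) + img_b.astype(np.float32)) / 2.0
    return mixed, label_a
```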

Conclusion

This article has introduced the commonly used data augmentation methods, several special single-sample augmentation methods, and several multi-sample augmentation methods.

Strictly speaking, data augmentation could also be taken to include methods for dealing with the imbalance of positive and negative samples, such as hard negative example mining and focal loss.

DropOut, DropConnect, and DropBlock can also be viewed as data augmentation methods, because like Cutout, Hide-and-Seek, and GridMask they selectively discard part of the information.



Reference papers

Improved Regularization of Convolutional Neural Networks with Cutout
Random Erasing Data Augmentation
mixup: Beyond Empirical Risk Minimization
Hide-and-Seek: A Data Augmentation Technique for Weakly-Supervised Localization and Beyond
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
GridMask Data Augmentation
FenceMask: A Data Augmentation Approach for Pre-extracted Image Features
KeepAugment: A Simple Information-Preserving Data Augmentation Approach
SMOTE: Synthetic Minority Over-sampling Technique
YOLOv4: Optimal Speed and Accuracy of Object Detection
Data Augmentation by Pairing Samples for Images Classification