Data, algorithms, and computing power are the three pillars of artificial intelligence development. Data determines the upper limit of what an AI model can learn: the larger the scale and the higher the quality of the data, the better the model's generalization ability can be. In practical engineering, however, there are often problems such as too little data (relative to the model), imbalanced samples, and difficulty covering all scenarios. An effective solution to these problems is to obtain better generalization from model learning through data augmentation.

1 Data augmentation

Data augmentation derives additional data representations from the raw data without substantively collecting new data, improving the quantity and quality of the original data so as to approximate the value of a larger dataset. The principle is that injecting prior knowledge into the original data yields more data representations, which helps the model distinguish statistical noise in the data, strengthens its learning of the essential features, reduces overfitting, and improves generalization ability.

Take the classic machine learning example of a husky being misclassified as a wolf:

Using interpretability methods, it was found that the misclassification was due to snow in the image: dog images generally have less snow in the background than wolf images, so the classifier learned to use snow as the feature for classifying an image as wolf or dog, while ignoring the characteristics of the animal itself. Here, the model can be trained on transformed data added through augmentation (background color changes, added noise, etc.) to help it learn the essential features and improve generalization. Note that augmented samples may also introduce one-sided noise and lead to overfitting; in that case the augmentation method should be adjusted, or an algorithm used to select the best subset of the augmented data (drawing on the idea of PU learning), to improve the model's generalization ability.

Common data augmentation methods can be divided into augmentation based on sample transformation and augmentation based on deep learning.

2 Data augmentation based on sample transformation

Sample-transformation-based augmentation applies preset transformation rules to amplify existing data, and includes single-sample augmentation and multi-sample augmentation.

2.1 Single-sample augmentation

Single-sample (image) augmentation mainly includes geometric operations, color transformations, random erasing, noise injection, and similar methods; implementations of all of these can be found in the imgaug open-source library.
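
A minimal sketch with the imgaug library, chaining one transform of each kind (the parameter values are illustrative):

# single-sample augmentation with imgaug (parameter values are illustrative)
import imgaug.augmenters as iaa

seq = iaa.Sequential([
    iaa.Fliplr(0.5),                                   # geometric: horizontal flip
    iaa.Affine(rotate=(-20, 20)),                      # geometric: random rotation
    iaa.AddToHueAndSaturation((-30, 30)),              # color transformation
    iaa.CoarseDropout(0.02, size_percent=0.5),         # random erasing
    iaa.AdditiveGaussianNoise(scale=(0, 0.05 * 255)),  # additive noise
])
images_aug = seq(images=images)  # images: uint8 array of shape (N, H, W, 3)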

2.2 Multi-sample augmentation

Multi-sample augmentation combines and transforms multiple samples, using prior knowledge to construct neighbors of known samples in feature space; representative methods include SMOTE, SamplePairing, and Mixup.

  • SMOTE (Synthetic Minority Over-sampling Technique)

SMOTE is usually used for class-imbalance learning. Its core idea is to synthesize a new sample by interpolating between a randomly chosen sample and one of its same-class nearest neighbors in the training set. The method can be divided into three steps:

1. For each sample x_i, calculate the Euclidean distance to the other samples of its class and determine its k nearest same-class neighbors;

2. Randomly select one neighbor x_ik from those k nearest neighbors and generate a new sample by interpolation:

x_smote_ik = x_i + rand(0, 1) * (x_ik − x_i)

3. Repeat step 2 N times to synthesize N new samples.
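
As a quick illustration, the interpolation in step 2 is a one-line NumPy operation (x_i and x_ik here stand for two same-class feature vectors):

# step 2 of SMOTE as a NumPy interpolation (x_i, x_ik: same-class vectors)
import numpy as np

x_new = x_i + np.random.rand() * (x_ik - x_i)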

# SMOTE with imbalanced-learn (x_train / y_train are an existing feature
# matrix and a pandas Series of labels)
from imblearn.over_sampling import SMOTE

print("Before OverSampling, counts of label:\n{}".format(y_train.value_counts()))
smote = SMOTE()
x_train_res, y_train_res = smote.fit_resample(x_train, y_train)
print("After OverSampling, counts of label:\n{}".format(y_train_res.value_counts()))
  • SamplePairing

The core idea of the SamplePairing algorithm is to superimpose two images randomly drawn from the training set by averaging their pixels, and to use the label of the first image as the label of the composite image.
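
A minimal NumPy sketch of this pixel averaging (img_a, img_b, and label_a are assumed inputs from the caller):

# SamplePairing: average two images' pixels, keep the first image's label
import numpy as np

def sample_pairing(img_a, img_b, label_a):
    mixed = (img_a.astype(np.float32) + img_b.astype(np.float32)) / 2.0
    return mixed.astype(img_a.dtype), label_a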

  • Mixup

The core idea of the Mixup algorithm is to randomly mix two training samples, and their labels, in a given proportion. This mixing not only increases sample diversity but also smooths the decision boundary, improves discrimination of hard samples, and increases model robustness. The method can be divided into two steps:

1. Randomly select two samples (x_i, y_i) and (x_j, y_j) from the original training data, with the labels y one-hot encoded;

2. Combine the two samples and their labels in proportion to form a new, weighted sample:

x = λ * x_i + (1 − λ) * x_j
y = λ * y_i + (1 − λ) * y_j

The final loss is the weighted sum of the cross-entropy losses for each label. The mixing coefficient λ ∈ [0, 1] is drawn from a Beta(α, α) distribution, where α is the Mixup hyperparameter controlling the interpolation strength between the two samples.

# Mixup
import numpy as np

def mixup_batch(x, y, step, batch_size, alpha=0.2):
    """Get one mixed batch.
    :param x: training data
    :param y: one-hot labels
    :param step: current step
    :param batch_size: batch size
    :param alpha: Beta-distribution hyperparameter, 0.2 by default
    :return: mixed x, y
    """
    candidates_data, candidates_label = x, y
    offset = (step * batch_size) % (candidates_data.shape[0] - batch_size)

    # slice out the current batch
    train_features_batch = candidates_data[offset:(offset + batch_size)]
    train_labels_batch = candidates_label[offset:(offset + batch_size)]

    if alpha == 0:
        return train_features_batch, train_labels_batch

    if alpha > 0:
        # sample mixing weights lambda ~ Beta(alpha, alpha), one per pair
        weight = np.random.beta(alpha, alpha, batch_size)
        x_weight = weight.reshape(batch_size, 1)
        y_weight = weight.reshape(batch_size, 1)
        # pair each sample with a randomly permuted partner and interpolate
        index = np.random.permutation(batch_size)
        x1, x2 = train_features_batch, train_features_batch[index]
        x = x1 * x_weight + x2 * (1 - x_weight)
        y1, y2 = train_labels_batch, train_labels_batch[index]
        y = y1 * y_weight + y2 * (1 - y_weight)
        return x, y

3 Data augmentation based on deep learning

3.1 Feature-space data augmentation

Unlike traditional augmentation methods, which transform samples in the input space, a neural network maps input samples to low-dimensional vectors in its hidden layers (representation learning), so data can be augmented directly by combining and transforming vectors in the learned feature space, as in the MoEx method.
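
As a rough illustration of feature-space augmentation in the spirit of MoEx (not the full method), the sketch below swaps the positional moments, mean and standard deviation across channels, of two samples' hidden feature maps; the shapes and normalization axis are assumptions:

# feature-space augmentation sketch: exchange positional moments of two
# hidden feature maps f_a, f_b of shape (H, W, C)
import numpy as np

def moment_exchange(f_a, f_b, eps=1e-5):
    mu_a = f_a.mean(axis=-1, keepdims=True)
    std_a = f_a.std(axis=-1, keepdims=True) + eps
    mu_b = f_b.mean(axis=-1, keepdims=True)
    std_b = f_b.std(axis=-1, keepdims=True) + eps
    # normalize f_a, then re-inject f_b's mean/std
    return (f_a - mu_a) / std_a * std_b + mu_b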

3.2 Data augmentation based on generative models

Generative models such as the variational auto-encoder (VAE) and the generative adversarial network (GAN) can also be used to synthesize samples for data augmentation. Such network-based synthesis is more complex than traditional augmentation techniques, but the generated samples are more diverse.

  • Variational auto-encoder (VAE)

The basic idea of the VAE is that an encoder network transforms real samples into an idealized latent distribution, from which a decoder network constructs generated samples; training drives the generated samples to be sufficiently close to the real ones.

# VAE (training step, Keras-style)
class VAE(keras.Model):
    ...
    def train_step(self, data):
        with tf.GradientTape() as tape:
            z_mean, z_log_var, z = self.encoder(data)
            reconstruction = self.decoder(z)
            reconstruction_loss = tf.reduce_mean(
                tf.reduce_sum(
                    keras.losses.binary_crossentropy(data, reconstruction),
                    axis=(1, 2)))
            kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
            kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
            total_loss = reconstruction_loss + kl_loss
        grads = tape.gradient(total_loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        self.total_loss_tracker.update_state(total_loss)
        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
        self.kl_loss_tracker.update_state(kl_loss)
        return {
            "loss": self.total_loss_tracker.result(),
            "reconstruction_loss": self.reconstruction_loss_tracker.result(),
            "kl_loss": self.kl_loss_tracker.result(),
        }
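
Once trained, augmentation samples are drawn by decoding random latent vectors; a minimal sketch, assuming a trained vae instance of the class above and a known latent_dim:

# decode random latent vectors into synthetic samples (vae, latent_dim assumed)
import numpy as np

z = np.random.normal(size=(16, latent_dim)).astype("float32")
generated = vae.decoder.predict(z)
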
  • Generative adversarial network (GAN)

A generative adversarial network (GAN) consists of a generator (G) and a discriminator (D). The generator network learns a mapping G: z → x (input noise z, output generated image data x), while the discriminator network judges whether its input comes from real data or from the generator.

# DCGAN (training step, Keras-style)
class DCGAN(keras.Model):
    ...
    def train_step(self, real_images):
        batch_size = tf.shape(real_images)[0]
        random_latent_vectors = tf.random.normal(shape=(batch_size, self.latent_dim))

        # G: z -> x (input noise z, output generated image data x)
        generated_images = self.generator(random_latent_vectors)

        # combine the generated and real samples and assign labels
        combined_images = tf.concat([generated_images, real_images], axis=0)
        labels = tf.concat(
            [tf.ones((batch_size, 1)), tf.zeros((batch_size, 1))], axis=0)
        # add random noise to the labels
        labels += 0.05 * tf.random.uniform(tf.shape(labels))

        # train the discriminator
        with tf.GradientTape() as tape:
            predictions = self.discriminator(combined_images)
            d_loss = self.loss_fn(labels, predictions)
        grads = tape.gradient(d_loss, self.discriminator.trainable_weights)
        self.d_optimizer.apply_gradients(
            zip(grads, self.discriminator.trainable_weights))

        # train the generator (without updating discriminator weights)
        random_latent_vectors = tf.random.normal(shape=(batch_size, self.latent_dim))
        misleading_labels = tf.zeros((batch_size, 1))
        with tf.GradientTape() as tape:
            predictions = self.discriminator(self.generator(random_latent_vectors))
            g_loss = self.loss_fn(misleading_labels, predictions)
        grads = tape.gradient(g_loss, self.generator.trainable_weights)
        self.g_optimizer.apply_gradients(zip(grads, self.generator.trainable_weights))

        # update loss metrics
        self.d_loss_metric.update_state(d_loss)
        self.g_loss_metric.update_state(g_loss)
        return {
            "d_loss": self.d_loss_metric.result(),
            "g_loss": self.g_loss_metric.result(),
        }
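
Similarly, after training, new augmentation images come from the generator alone; a minimal sketch, assuming a trained dcgan instance of the class above:

# sample synthetic images from the trained generator (dcgan assumed)
random_latent_vectors = tf.random.normal(shape=(16, dcgan.latent_dim))
generated_images = dcgan.generator(random_latent_vectors)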

3.3 Data augmentation based on neural style transfer

Neural style transfer can transfer the style of one image onto another while preserving the original content. Beyond color-space and lighting transformations, it can also generate different textures and artistic styles.

Neural style transfer is achieved by optimizing three types of losses:

style_loss: makes the generated image close to the local textures of the style reference image;

content_loss: makes the content representation of the generated image close to that of the base image;

total_variation_loss: a regularization loss that keeps the generated image locally coherent.

# style loss: match Gram matrices (local textures) of style and generated images
def style_loss(style, combination):
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3
    size = img_nrows * img_ncols
    return tf.reduce_sum(tf.square(S - C)) / (4.0 * (channels ** 2) * (size ** 2))

# content loss: keep the generated image's content close to the base image
def content_loss(base, combination):
    return tf.reduce_sum(tf.square(combination - base))

# regularization (total variation) loss: keep the generated image locally coherent
def total_variation_loss(x):
    a = tf.square(
        x[:, : img_nrows - 1, : img_ncols - 1, :] - x[:, 1:, : img_ncols - 1, :])
    b = tf.square(
        x[:, : img_nrows - 1, : img_ncols - 1, :] - x[:, : img_nrows - 1, 1:, :])
    return tf.reduce_sum(tf.pow(a + b, 1.25))
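
The three losses are then combined into the total objective with scalar weights; a minimal sketch (the weight names are illustrative, and in practice the style loss is summed over several network layers):

# total objective as a weighted sum of the three losses (weights illustrative)
loss = (content_weight * content_loss(base_features, combination_features)
        + style_weight * style_loss(style_features, combination_features)
        + tv_weight * total_variation_loss(combination_image))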

3.4 Data augmentation based on meta-learning

Meta-learning in deep learning research usually refers to using neural networks to optimize neural networks; meta-learning-based data augmentation includes methods such as Neural Augmentation.

  • Neural Augmentation

Neural Augmentation obtains better data augmentation, and thereby better classification, by jointly training an augmentation network with the classification network. The steps are as follows (a training-step sketch appears after the list):

1. Take a random pair of images from the same class as the target image and map them through the front-end CNN augmentation network into a synthetic image; compare the synthetic image with the target image to compute the augmentation loss;

2. After a neural-style transformation toward the target image, feed the synthetic image into the classification network and compute its classification loss;

3. Take a weighted average of the augmentation and classification losses and backpropagate it to update the weights of both the classification network and the augmentation network, reducing within-class differences in the output images and making classification accurate.
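
A minimal training-step sketch of this loop, in the Keras style used above; augmenter, classifier, the two loss functions, the weighting beta, and the batch tensors are all assumed names for illustration, not code from the original method:

# Neural Augmentation training step (all names below are illustrative)
import tensorflow as tf

with tf.GradientTape() as tape:
    # step 1: map a same-class image pair to a synthetic image
    synthetic = augmenter(tf.concat([img_a, img_b], axis=-1))
    aug_loss = aug_loss_fn(target_img, synthetic)
    # step 2: classify the synthetic image
    cls_loss = class_loss_fn(labels, classifier(synthetic))
    # step 3: weighted average of the two losses
    total_loss = beta * aug_loss + (1.0 - beta) * cls_loss
variables = augmenter.trainable_weights + classifier.trainable_weights
grads = tape.gradient(total_loss, variables)
optimizer.apply_gradients(zip(grads, variables))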


This article was first published on the Algorithm Advanced WeChat public account; the original post links to the source code on GitHub.