• How to Configure Image Data Augmentation When Training Deep Learning Neural Networks
  • Originally by Jason Brownlee
  • The Nuggets translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: ccJia
  • Proofreader: LSvih, Minghao23

Image data augmentation is a technique for artificially expanding the size of a training dataset by creating modified versions of the images it contains.

Training deep learning neural networks on more data can result in more skillful models, and augmentation techniques create variations of the images that improve the ability of the model to generalize to new images.

In the Keras deep learning framework, data augmentation is provided via the ImageDataGenerator class.

In this tutorial, you will discover how to use image data augmentation when training deep learning models.

After completing this tutorial, you will know the following:

  • Image data augmentation artificially expands the training dataset in order to improve the performance and generalization ability of the model.
  • In Keras, image data augmentation is supported via the ImageDataGenerator class.
  • How to use shift, flip, brightness, and zoom augmentation methods.

Let’s get started.

Tutorial overview

This tutorial is divided into the following eight parts:

  1. Image data augmentation
  2. Sample image
  3. Image augmentation with ImageDataGenerator
  4. Horizontal and vertical shift augmentation
  5. Horizontal and vertical flip augmentation
  6. Random rotation augmentation
  7. Random brightness augmentation
  8. Random zoom augmentation

Image data augmentation

The performance of deep learning neural networks often improves with the amount of data available.

Data augmentation is a technique for artificially creating new training data from existing data. This is done by applying domain-specific transforms to examples from the training set, generating new and different training examples.

Image data augmentation is perhaps the best-known type of data augmentation. It involves creating transformed versions of the images in the dataset that still belong to the same class as the original image.

The transforms come from the field of image manipulation and include shifts, flips, zooms, and much more.

The intent is to expand the training dataset with new, plausible examples; in other words, to let the model see more varied training data. For example, a horizontal flip of a photograph of a cat makes sense, because the photo could have been taken from the left or the right. A vertical flip makes little sense, because the model is very unlikely to ever see a photo of an upside-down cat.

Therefore, the data augmentation methods applied to a training dataset must be chosen carefully, within the context of the training data and the problem domain. In addition, it can be useful to experiment with augmentation methods in isolation, on a small prototype dataset, to confirm that they actually improve model performance.

Modern deep learning algorithms, such as the convolutional neural network (CNN), can learn features that are invariant to their location in the image. Augmentation can further support this by helping the model learn features that are also invariant to transforms such as left-to-right and top-to-bottom ordering, light levels in photographs, and more.

Image data augmentation is typically applied only to the training dataset, not to the validation or test datasets. This is different from data preparation steps such as image resizing and pixel scaling, which must be applied consistently across all datasets that interact with the model.


Sample image

We need a sample image to demonstrate standard data augmentation techniques.

In this tutorial, we will use a photograph of a bird by AndYaDontStop, titled 'Feathered Friend'.

Download the photograph and save it in your current working directory with the filename 'bird.jpg'.

Feathered Friend, by AndYaDontStop. Some rights reserved.

  • Image download link (bird.jpg)

Image augmentation with ImageDataGenerator

The Keras deep learning framework can automatically apply data augmentation while training a model.

This is achieved by using the ImageDataGenerator class.

First, the class is instantiated, and the configuration for the chosen types of augmentation is passed as arguments to the class constructor.

The class supports a range of techniques, including pixel scaling methods, but we will focus on the following five main image augmentation techniques:

  • Image shifts via the width_shift_range and height_shift_range arguments.
  • Image flips via the horizontal_flip and vertical_flip arguments.
  • Image rotations via the rotation_range argument.
  • Image brightness via the brightness_range argument.
  • Image zoom via the zoom_range argument.

For example, an instance of the ImageDataGenerator class can be constructed as follows:

# create a data generator
datagen = ImageDataGenerator()

Once constructed, an iterator can be created for an image dataset.

The iterator returns one batch of augmented images per iteration.

An iterator can be created from an image dataset loaded into memory via the flow() function; for example:

# load the image dataset
X, y = ...
# create an iterator
it = datagen.flow(X, y)

Alternatively, an iterator can be created for an image dataset located on disk in a specified directory, where the images for each class are stored in a separate subdirectory.

# create an iterator for a dataset on disk
it = datagen.flow_from_directory(directory, ...)
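
The per-class subdirectory layout that flow_from_directory expects can be sketched with plain Python; the 'cats' and 'dogs' class names below are hypothetical:

```python
import os
import tempfile

# flow_from_directory infers the class labels from subdirectory names:
# each class's images live in their own folder under the dataset root
root = tempfile.mkdtemp()
for class_name in ['cats', 'dogs']:
    os.makedirs(os.path.join(root, class_name))

# the discovered class names, in the alphabetical order Keras would use
classes = sorted(os.listdir(root))
print(classes)  # prints ['cats', 'dogs']
```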

Once the iterator is created, it can be used to train a neural network model by calling the fit_generator() function.

The steps_per_epoch argument must specify the number of batches that make up one epoch. For example, if your original dataset has 10,000 images and your batch size is 32, then a reasonable steps_per_epoch when fitting the model on the augmented data would be ceil(10000/32), or 313 batches.

# define model
model = ...
# fit model on the augmented dataset
model.fit_generator(it, steps_per_epoch=313, ...)
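
The steps_per_epoch arithmetic above can be checked directly:

```python
import math

# one epoch must cover every image in the original dataset once
n_images = 10000
batch_size = 32
steps_per_epoch = math.ceil(n_images / batch_size)
print(steps_per_epoch)  # prints 313
```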

The images in the dataset are not used directly; instead, only augmented images are provided to the model. Because the augmentations are performed randomly, both modified images and close facsimiles of the original images (i.e. almost no augmentation) are generated and used during training.

A data generator can also be used for the validation and test datasets. Often, the ImageDataGenerator used for validation and testing applies the same pixel scaling as the one used for training (not covered in this tutorial), but does not perform data augmentation. This is because data augmentation is meant only to artificially expand the training dataset, in order to improve performance on the unaugmented datasets.

Now that we are familiar with how to use the ImageDataGenerator, let's look at a few specific augmentation techniques for image data.

We will demonstrate each technique standalone and show examples of the augmented images. This is a good practice and is recommended when configuring your own data augmentation: review what each augmentation does to your images. It is also common to combine several augmentation techniques during training; for clarity, each technique is discussed in its own section.

Horizontal and vertical shift augmentation

A shift moves all of the pixels of an image in one direction, either horizontally or vertically, while keeping the image dimensions the same.

This means that some of the original pixels will be clipped off the image, and a region on the opposite side will need new pixel values filled in.

The width_shift_range and height_shift_range arguments of the ImageDataGenerator constructor control the amount of horizontal and vertical shift, respectively.

These arguments can be specified as a floating-point value between 0 and 1, interpreted as the fraction of the image width or height to shift by. Alternatively, they can be specified as a number of pixels.

Specifically, a shift value is randomly sampled for each image from the range between the negative and the positive of the given fraction (or pixel count). Alternatively, you can pass a tuple or array of two values specifying the minimum and maximum of the sampling range, for example: [-100, 100] pixels or [-0.5, 0.5] as a fraction.
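
As an illustrative sketch (not the actual Keras internals), the sampling just described might look like the following, where sample_shift is a hypothetical helper:

```python
import random

def sample_shift(shift_range, size):
    """Sample a shift in pixels along one image dimension of the given size.

    A single number is treated as a symmetric bound (a fraction of the
    dimension if less than 1, otherwise pixels); a pair of values is
    treated as explicit [min, max] bounds.
    """
    if isinstance(shift_range, (list, tuple)):
        low, high = shift_range
    else:
        bound = shift_range * size if shift_range < 1 else shift_range
        low, high = -bound, bound
    return random.uniform(low, high)

# a fraction of 0.5 on a 300-pixel-wide image stays within [-150, 150]
shift = sample_shift(0.5, 300)
```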

The example below sets the width_shift_range argument to [-200, 200] pixels and plots the resulting images.

# example of horizontal shift image augmentation
from numpy import expand_dims
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the image
img = load_img('bird.jpg')
# convert to a numpy array
data = img_to_array(img)
# expand dimensions to one sample
samples = expand_dims(data, 0)
# create the image data augmentation generator
datagen = ImageDataGenerator(width_shift_range=[-200, 200])
# prepare the iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# generate a batch of images
	batch = it.next()
	# convert to unsigned integers for viewing
	image = batch[0].astype('uint8')
	# plot raw pixel data
	pyplot.imshow(image)
# show the figure
pyplot.show()

Running the example configures the ImageDataGenerator, creates the iterator, then loops nine times, generating and plotting an augmented image on each iteration.

Looking at the results, you can see that the image is randomly shifted in the positive or negative direction, and that the empty region created by the shift is filled by replicating the pixels at the edge of the image.

Plot of images with random horizontal shift augmentation

A similar example below demonstrates vertical shifts, with the height_shift_range argument set to 0.5, i.e. half the image height.

# example of vertical shift image augmentation
from numpy import expand_dims
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the image
img = load_img('bird.jpg')
# convert to a numpy array
data = img_to_array(img)
# expand dimensions to one sample
samples = expand_dims(data, 0)
# create the image data augmentation generator
datagen = ImageDataGenerator(height_shift_range=0.5)
# prepare the iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# generate a batch of images
	batch = it.next()
	# convert to unsigned integers for viewing
	image = batch[0].astype('uint8')
	# plot raw pixel data
	pyplot.imshow(image)
# show the figure
pyplot.show()

Running the example generates images augmented with random positive or negative vertical shifts.

We can see that both horizontal and vertical shifts, positive or negative, produce plausible augmented images, although the refilled regions carry no meaning for the model.

Note that other fill methods can be specified via the fill_mode argument.

Plot of images with random vertical shift augmentation

Horizontal and vertical flip augmentation

A flip reverses the rows of pixels in the case of a vertical flip, or the columns of pixels in the case of a horizontal flip.
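
The row and column reversal can be seen directly on a small NumPy array standing in for an image:

```python
import numpy as np

# a tiny 2x3 "image" of pixel values
img = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

h_flip = img[:, ::-1]  # horizontal flip: reverse each row's columns
v_flip = img[::-1, :]  # vertical flip: reverse the order of the rows

print(h_flip.tolist())  # prints [[2, 1, 0], [5, 4, 3]]
print(v_flip.tolist())  # prints [[3, 4, 5], [0, 1, 2]]
```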

Flipping is enabled via the boolean horizontal_flip and vertical_flip arguments of the ImageDataGenerator constructor. For a photograph like the bird image used here, horizontal flips make sense, while vertical flips do not.

For other types of images, such as aerial, astronomical, or microscopic photographs, vertical flips are likely to be useful.

The example below demonstrates augmenting the bird photograph with horizontal flips via the horizontal_flip argument.

# example of horizontal flip image augmentation
from numpy import expand_dims
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the image
img = load_img('bird.jpg')
# convert to a numpy array
data = img_to_array(img)
# expand dimensions to one sample
samples = expand_dims(data, 0)
# create the image data augmentation generator
datagen = ImageDataGenerator(horizontal_flip=True)
# prepare the iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# generate a batch of images
	batch = it.next()
	# convert to unsigned integers for viewing
	image = batch[0].astype('uint8')
	# plot raw pixel data
	pyplot.imshow(image)
# show the figure
pyplot.show()

Running the example generates nine augmented images.

Because the flip is applied at random, you will see that the horizontal flip is applied to only some of the images.

Plot of images with random horizontal flip augmentation

Random rotation augmentation

A rotation augmentation randomly rotates the image clockwise by some number of degrees, between 0 and 360.

Rotation also moves pixels out of the image frame and leaves areas of the frame with no pixel values that must be filled in.

The example below demonstrates random rotations via the rotation_range argument, rotating the image between 0 and 90 degrees.

# example of random rotation image augmentation
from numpy import expand_dims
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the image
img = load_img('bird.jpg')
# convert to a numpy array
data = img_to_array(img)
# expand dimensions to one sample
samples = expand_dims(data, 0)
# create the image data augmentation generator
datagen = ImageDataGenerator(rotation_range=90)
# prepare the iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# generate a batch of images
	batch = it.next()
	# convert to unsigned integers for viewing
	image = batch[0].astype('uint8')
	# plot raw pixel data
	pyplot.imshow(image)
# show the figure
pyplot.show()

Running the example generates rotated images, with the empty areas filled in using the nearest-neighbor method.

Plot of images with random rotation augmentation

Random brightness augmentation

A brightness augmentation randomly darkens images, brightens images, or both.

The intent is to allow the model to generalize across images captured at different lighting levels.

You can specify the brightness_range argument in the ImageDataGenerator constructor as a tuple or list of minimum and maximum values, from which a brightness factor is randomly sampled.

Factors less than 1.0 darken the image, e.g. [0.5, 1.0], whereas factors larger than 1.0 brighten the image, e.g. [1.0, 1.5]. A factor of 1.0 leaves the brightness unchanged.
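
Keras applies the factor with PIL's brightness enhancer under the hood; conceptually, the factor simply scales pixel intensities, roughly as in this sketch:

```python
import numpy as np

pixels = np.array([100.0, 200.0, 250.0])  # example pixel intensities
factor = 0.5  # as if sampled from brightness_range; less than 1.0 darkens

# scale and clip back into the valid 0-255 range
darker = np.clip(pixels * factor, 0, 255)
print(darker.tolist())  # prints [50.0, 100.0, 125.0]
```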

The example below demonstrates a random brightness augmentation with factors between 0.2 (20%) and 1.0 (unchanged).

# example of brightness image augmentation
from numpy import expand_dims
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the image
img = load_img('bird.jpg')
# convert to a numpy array
data = img_to_array(img)
# expand dimensions to one sample
samples = expand_dims(data, 0)
# create the image data augmentation generator
datagen = ImageDataGenerator(brightness_range=[0.2, 1.0])
# prepare the iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# generate a batch of images
	batch = it.next()
	# convert to unsigned integers for viewing
	image = batch[0].astype('uint8')
	# plot raw pixel data
	pyplot.imshow(image)
# show the figure
pyplot.show()

Running the example shows the images darkened by varying amounts.

Plot of images with random brightness augmentation

Random zoom augmentation

A zoom augmentation randomly zooms the image in or out, either adding new pixel values around the image or interpolating pixel values, respectively.

Zooming is configured via the zoom_range argument of the ImageDataGenerator constructor. The value can be specified as a single float or as an array or tuple of two values, [lower, upper].

If a single float is given, the zoom range is [1 - value, 1 + value]. For example, with zoom_range set to 0.3, the zoom is sampled between [0.7, 1.3], in other words between 70% (zoom in) and 130% (zoom out).

The zoom amount is sampled uniformly from the zoom range, independently for each dimension (width and height).

The zoom values are somewhat counter-intuitive: a value smaller than 1.0 zooms in, making the object in the image larger; for example, [0.5, 0.5] makes the object 50% larger, or closer. A value larger than 1.0 zooms out; for example, [1.5, 1.5] makes the object 50% smaller, or further away. A zoom of 1.0 has no effect.
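
The mapping from a single float to the sampled interval can be captured in a small hypothetical helper:

```python
def zoom_bounds(zoom_range):
    """Return the [lower, upper] zoom interval to sample from."""
    if isinstance(zoom_range, (list, tuple)):
        return float(zoom_range[0]), float(zoom_range[1])
    # a single float f means the interval [1 - f, 1 + f]
    return 1.0 - zoom_range, 1.0 + zoom_range

print(zoom_bounds(0.5))         # prints (0.5, 1.5)
print(zoom_bounds([0.5, 1.0]))  # prints (0.5, 1.0)
```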

The example below demonstrates zooming in, making the object in the photograph larger.

# example of zoom image augmentation
from numpy import expand_dims
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the image
img = load_img('bird.jpg')
# convert to a numpy array
data = img_to_array(img)
# expand dimensions to one sample
samples = expand_dims(data, 0)
# create the image data augmentation generator
datagen = ImageDataGenerator(zoom_range=[0.5, 1.0])
# prepare the iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# generate a batch of images
	batch = it.next()
	# convert to unsigned integers for viewing
	image = batch[0].astype('uint8')
	# plot raw pixel data
	pyplot.imshow(image)
# show the figure
pyplot.show()

Running the example generates zoomed-in versions of the image. Because the zoom is sampled independently for the width and height dimensions, the aspect ratio of the object in the image also varies.

Plot of images with random zoom augmentation

Further reading

This section provides more resources on the topic if you would like to go deeper.

Posts

  • Image Augmentation for Deep Learning With Keras

API

  • Image Preprocessing Keras API
  • Keras Image Preprocessing Code
  • Sequential Model API

Articles

  • Building powerful image classification models using very little data, Keras Blog.

Conclusion

In this tutorial, you discovered how to use image data augmentation when training deep learning models.

Specifically, you learned:

  • Image data augmentation artificially expands the training dataset in order to improve the performance and generalization ability of the model.
  • Image data augmentation is supported in Keras via the ImageDataGenerator class.
  • How to use shift, flip, brightness, and zoom augmentation methods.

Do you have any questions? Leave a comment below and I will do my best to answer.
