Introduction
The ability to do well in deep learning hackathons (or, frankly, any data science hackathon) often comes down to feature engineering. How creative can you be when you don't have enough data to build a winning deep learning model?
I speak from my own experience of participating in multiple deep learning hackathons, where we were given datasets containing hundreds of images, not nearly enough to win or even finish near the top of the leaderboard. So how do we deal with this problem?
The answer? Well, that depends on the skills of the data scientist! This is where our curiosity and creativity come to the fore. That is the idea behind feature engineering: how well we can come up with new features from existing ones. And the same idea applies to image data.
This is where image augmentation plays a major role. The concept is not limited to hackathons; we use it in industry and in real-world deep learning projects too!
Image augmentation helps me expand my existing dataset without spending a lot of time and effort. And I'm sure you'll find this technique very helpful for your own projects.
So, in this article, we will learn about the concept of image augmentation, why it is useful, and what the different image augmentation techniques are. We will also implement these techniques to build an image classification model using PyTorch.
Table of Contents
- Why do we need image augmentation?
- Different image augmentation techniques
- Basic guidelines for selecting the right augmentation techniques
- Case study: solving an image classification problem using image augmentation
Why do we need image augmentation?
Deep learning models usually require large amounts of data for training. In general, the more data, the better the performance of the model. But accessing massive amounts of data presents its own challenges. Not everyone has the deep pockets of a big corporation.
A lack of data can prevent our deep learning model from learning the patterns or functions in the data, which means it may not perform well on unseen data.
So what can we do in that situation? Instead of spending days collecting data manually, we can use image augmentation techniques.
Image augmentation is the process of generating new images for training our deep learning model. These new images are generated from the existing training images, so we don't have to collect them manually.
There are a variety of image augmentation techniques, and we will discuss some of the common and most widely used ones in the next section.
Different image augmentation techniques
Image Rotation
Image rotation is one of the most commonly used augmentation techniques. It helps our model become robust to changes in the orientation of objects. Even if we rotate the image, the information in it remains the same. A car is a car even if we look at it from a different angle:
So we can use this technique to increase the size of our data by creating rotated images from the original ones. Let's see how to rotate an image:
# importing all the required libraries
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import skimage.io as io
from skimage.transform import rotate, AffineTransform, warp
from skimage.util import random_noise
from skimage.filters import gaussian
import matplotlib.pyplot as plt
%matplotlib inline
I will use this image to demonstrate the different image augmentation techniques. Feel free to try it with any other image of your choice.
Let’s import the image and visualize it:
# reading the image using its path
image = io.imread('emergency_vs_non-emergency_dataset/images/0.jpg')
# shape of the image
print(image.shape)
# displaying the image
io.imshow(image)
This is the original image. Now let's see how we can rotate it. I will use the rotate function of the skimage library to rotate the image:
print('Rotated Image')
#rotating the image by 45 degrees
rotated = rotate(image, angle=45, mode = 'wrap')
#plot the rotated image
io.imshow(rotated)
Looking good! Setting the mode to 'wrap' fills the points outside the boundaries of the input image with the remaining pixels of the image.
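If you are curious what the other fill modes look like, here is a quick comparison of my own (not part of the original walkthrough) using a few of the modes that skimage's rotate supports:
# comparing a few of the fill modes supported by skimage's rotate
fig, ax = plt.subplots(nrows=1, ncols=4, figsize=(20, 5))
for col, fill_mode in enumerate(['constant', 'edge', 'reflect', 'wrap']):
    ax[col].imshow(rotate(image, angle=45, mode=fill_mode))
    ax[col].set_title(fill_mode)
    ax[col].axis('off')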
Image Shifting
There may be cases where the objects in an image are not perfectly centered and aligned. In these cases, image shifting can be used to add shift invariance to the images.
By shifting the images, we can change the position of the objects in the image and hence give the model more variety. The result is a more generalized model.
Image shifting is a geometric transformation that maps the position of every object in the image to a new location in the final output image.
After the shift operation, an object present at position (x, y) in the input image is shifted to a new position (X, Y):
- X = x + dx
- Y = y + dy
Here, dx and dy are the respective shifts along the different dimensions. Let's see how we can apply shift to an image:
# applying the shift operation
transform = AffineTransform(translation=(25, 25))
wrapShift = warp(image, transform, mode='wrap')
plt.imshow(wrapShift)
plt.title('Wrap Shift')
The translation hyperparameter defines the number of pixels by which the image should be shifted. Here, I have shifted the image by (25, 25) pixels. You are free to set this hyperparameter's value.
Again, I have used the 'wrap' mode, which fills the points outside the boundaries of the input with the remaining pixels of the image. In the output above, you can see that both the height and the width of the image have been shifted by 25 pixels.
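You can also shift in the opposite direction by passing negative values. A small sketch of my own:
# shifting in the opposite direction by passing negative offsets
transform = AffineTransform(translation=(-25, -25))
shiftedOpposite = warp(image, transform, mode='wrap')
plt.imshow(shiftedOpposite)
plt.title('Opposite Shift')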
Flipping Images
Flipping is an extension of rotation. It allows us to flip an image in the left-right as well as the up-down direction. Let's see how to implement flipping:
#flip image left-to-right
flipLR = np.fliplr(image)
plt.imshow(flipLR)
plt.title('Left to Right Flipped')
Here, I have used NumPy's fliplr function to flip the image from left to right. It flips the pixel values of each row, and the output confirms the same. Similarly, we can flip the image in the up-down direction:
# Flip the image up and down
flipUD = np.flipud(image)
plt.imshow(flipUD)
plt.title('Up Down Flipped')
This is how we can flip an image and make more generalized models that learn from the original as well as the flipped images. Adding random noise to images is also an augmentation technique. Let's understand it with an example.
Adding Noise to Images
Image noising is an important augmentation step that allows our model to learn how to separate the signal from the noise in an image. It also makes the model more robust to changes in the input.
We will use the random_noise function of the skimage library to add some random noise to the original image.
I have taken the standard deviation of the noise to be 0.155 (you can change this value as well). Keep in mind that increasing this value adds more noise to the image and vice versa:
# standard deviation of the noise to be added to the image
sigma = 0.155
# adding random noise to the image
noisyRandom = random_noise(image, var=sigma**2)
plt.imshow(noisyRandom)
plt.title('Random Noise')
We can see that random noise has been added to the original image. Try different standard deviation values and see what you get.
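By the way, random_noise is not limited to Gaussian noise. As a quick sketch of my own (not part of the original walkthrough), here is a comparison of a few of the noise models skimage supports:
# comparing a few of the noise models supported by random_noise
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(15, 5))
for col, noise_mode in enumerate(['gaussian', 's&p', 'speckle']):
    ax[col].imshow(random_noise(image, mode=noise_mode))
    ax[col].set_title(noise_mode)
    ax[col].axis('off')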
Blurring Images
All lovers of photography will immediately understand the idea.
Images come from different sources. As a result, the image quality from each source will be different. Some images may be of high quality, while others may be of poor quality.
In such cases, we can blur the image. How does that help? Well, it makes our deep learning model more robust to variations in image quality.
Let’s see how we can do that. We will use a Gaussian filter to blur the image:
# Blur images
blurred = gaussian(image,sigma=1,multichannel=True)
plt.imshow(blurred)
plt.title('Blurred Image')
Sigma here is the standard deviation of the Gaussian filter. I have taken it as 1. The higher the sigma value, the stronger the blurring effect. Setting multichannel to True ensures that each channel of the image is filtered separately.
Again, you can try different sigma values to change the amount of blurring.
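To get a feel for how sigma controls the effect, here is a small sketch of my own comparing a few values side by side:
# comparing the blurring effect for a few sigma values
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(15, 5))
for col, s in enumerate([1, 3, 5]):
    ax[col].imshow(gaussian(image, sigma=s, multichannel=True))
    ax[col].set_title('sigma = ' + str(s))
    ax[col].axis('off')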
These are some of the image augmentation techniques that help make our deep learning models robust and generalizable. They also increase the size of the training set.
We are almost at the implementation part of this tutorial. Before that, let's look at some basic guidelines for deciding on the right augmentation technique.
Basic guidelines for selecting the right augmentation techniques
I believe it's important to have a few guidelines in mind when deciding on an augmentation technique based on the problem you are trying to solve. Here is a brief summary of these guidelines:
- The first step in any model-building process is to make sure that the size of our input matches what the model expects. We also have to make sure that all the images are of a similar size. For this, we can resize our images to the appropriate size (see the sketch after this list).
- Suppose you are working on a classification problem and have relatively few data samples. In this case, you can use different augmentation techniques such as image rotation, image noising, flipping, shifting, etc. Remember that all these operations apply to classification problems where the location of objects in the image is irrelevant.
- If you are working on an object detection task, where the location of the object is exactly what we want to detect, these techniques may not be appropriate.
- Normalizing image pixel values is a good strategy to ensure better and faster convergence of the model. If the model has specific requirements, we must preprocess the images according to the model's requirements.
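As a small illustration of the first and last guidelines, here is a sketch of my own, assuming skimage's resize and simple per-image standardization (your model may require different preprocessing):
from skimage.transform import resize

# resizing the image to the input size expected by the model (224 x 224 here)
img_resized = resize(image, (224, 224))  # also rescales pixel values to [0, 1]

# standardizing the pixel values (zero mean, unit variance)
img_standardized = (img_resized - img_resized.mean()) / img_resized.std()
print(img_resized.shape, round(img_standardized.mean(), 3), round(img_standardized.std(), 3))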
Now, without further ado, let's move on to the model building part. We will apply the augmentation techniques discussed in this article to generate images, and then use those images to train the model.
We will work on the emergency vs non-emergency vehicle classification problem. If you have read my previous PyTorch articles, you will be familiar with this problem statement.
The aim of this project is to classify vehicle images as emergency or non-emergency. And as you guessed, it's an image classification problem. You can download the dataset here.
Loading the Dataset
Let's get started! First, we will load the data into our notebook. Then we will apply the image augmentation techniques, and finally, build a convolutional neural network (CNN) model.
Let’s import the required libraries:
# import libraries
from torchsummary import summary
import pandas as pd
import numpy as np
from skimage.io import imread, imsave
from tqdm import tqdm
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from skimage.transform import rotate
from skimage.util import random_noise
from skimage.filters import gaussian
from scipy import ndimage
Now we will read the CSV file containing the image name and its corresponding label:
# loading the dataset
data = pd.read_csv('emergency_vs_non-emergency_dataset/emergency_train.csv')
data.head()
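Before loading the images, it is worth checking how balanced the two classes are. This quick check is my addition, not part of the original walkthrough:
# checking the class distribution of the target variable
data['emergency_or_not'].value_counts()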
0 indicates that the vehicle is a non-emergency vehicle, and 1 indicates that the vehicle is an emergency vehicle. Now let’s load all the images from the dataset:
# loading the images
train_img = []
for img_name in tqdm(data['image_names']):
    image_path = 'emergency_vs_non-emergency_dataset/images/' + img_name
    img = imread(image_path)
    # normalizing the pixel values to the range [0, 1]
    img = img/255
    train_img.append(img)

# converting the lists into arrays
train_x = np.array(train_img)
train_y = data['emergency_or_not'].values
train_x.shape, train_y.shape
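Since the model expects all inputs to have the same size, a quick sanity check (my addition; the images in this dataset are already 224 x 224) can confirm that:
# verifying that every image has the same shape before stacking
shapes = set(img.shape for img in train_img)
print(shapes)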
There are 1,646 images in the dataset. Let's split this data into training and validation sets. We will use the validation set to evaluate the model's performance on unseen data:
train_x, val_x, train_y, val_y = train_test_split(train_x, train_y, test_size = 0.1, random_state = 13, stratify=train_y)
(train_x.shape, train_y.shape), (val_x.shape, val_y.shape)
I have kept the test_size at 0.1, so 10% of the data will be randomly selected as the validation set and the remaining 90% will be used to train the model. The training set has 1,481 images, which is quite small for training a deep learning model.
So, next, we will augment these training images to increase the training set and possibly improve our model's performance.
Augmenting the Images
We will use the image augmentation techniques we discussed earlier:
final_train_data = []
final_target_train = []
for i in tqdm(range(train_x.shape[0])):
    # original image
    final_train_data.append(train_x[i])
    # image rotated by 45 degrees
    final_train_data.append(rotate(train_x[i], angle=45, mode='wrap'))
    # images flipped left-right and up-down
    final_train_data.append(np.fliplr(train_x[i]))
    final_train_data.append(np.flipud(train_x[i]))
    # image with random noise
    final_train_data.append(random_noise(train_x[i], var=0.2**2))
    # one target label for each of the five images
    for j in range(5):
        final_target_train.append(train_y[i])
We generated four augmented images for each of the 1,481 images in the training set. Let's convert the images into arrays and verify the size of our dataset:
len(final_target_train), len(final_train_data)
final_train = np.array(final_train_data)
final_target_train = np.array(final_target_train)
This confirms that we have augmented the images and increased the size of our training set. Let's visualize these augmented images:
fig, ax = plt.subplots(nrows=1, ncols=5, figsize=(20, 20))
for i in range(5):
    ax[i].imshow(final_train[i+30])
    ax[i].axis('off')
The first image here is the original image from the dataset. The remaining four were generated using the different image augmentation techniques (rotation, left-to-right flipping, up-down flipping, and adding random noise).
Our dataset is now ready. It's time to define the architecture of our deep learning model and then train it on the augmented training set. Let's first import all the functions from PyTorch:
# PyTorch libraries and modules
import torch
from torch.autograd import Variable
from torch.nn import Linear, ReLU, CrossEntropyLoss, Sequential, Conv2d, MaxPool2d, Module, Softmax, BatchNorm2d, Dropout
from torch.optim import Adam, SGD
We must convert the training set and validation set to PyTorch format:
# converting training images into torch format
final_train = final_train.reshape(7405, 3, 224, 224)
final_train = torch.from_numpy(final_train)
final_train = final_train.float()

# converting the target into torch format
final_target_train = final_target_train.astype(int)
final_target_train = torch.from_numpy(final_target_train)
Again, we will transform the validation set:
# converting validation images into torch format
val_x = val_x.reshape(165, 3, 224, 224)
val_x = torch.from_numpy(val_x)
val_x = val_x.float()

# converting the target into torch format
val_y = val_y.astype(int)
val_y = torch.from_numpy(val_y)
Model Architecture
Next, we will define the architecture of the model. It is a bit complex, with four convolutional blocks followed by four fully connected layers:
torch.manual_seed(0)

class Net(Module):
    def __init__(self):
        super(Net, self).__init__()

        self.cnn_layers = Sequential(
            # defining a 2D convolution layer
            Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
            ReLU(inplace=True),
            # adding batch normalization
            BatchNorm2d(32),
            MaxPool2d(kernel_size=2, stride=2),
            # adding dropout
            Dropout(p=0.25),
            # defining another 2D convolution layer
            Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            ReLU(inplace=True),
            # adding batch normalization
            BatchNorm2d(64),
            MaxPool2d(kernel_size=2, stride=2),
            # adding dropout
            Dropout(p=0.25),
            # defining another 2D convolution layer
            Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            ReLU(inplace=True),
            # adding batch normalization
            BatchNorm2d(128),
            MaxPool2d(kernel_size=2, stride=2),
            # adding dropout
            Dropout(p=0.25),
            # defining another 2D convolution layer
            Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
            ReLU(inplace=True),
            # adding batch normalization
            BatchNorm2d(128),
            MaxPool2d(kernel_size=2, stride=2),
            # adding dropout
            Dropout(p=0.25),
        )

        self.linear_layers = Sequential(
            Linear(128 * 14 * 14, 512),
            ReLU(inplace=True),
            Dropout(),
            Linear(512, 256),
            ReLU(inplace=True),
            Dropout(),
            Linear(256, 10),
            ReLU(inplace=True),
            Dropout(),
            Linear(10, 2)
        )

    # defining the forward pass
    def forward(self, x):
        x = self.cnn_layers(x)
        x = x.view(x.size(0), -1)
        x = self.linear_layers(x)
        return x
Let’s define the other hyperparameters of the model, including the optimizer, learning rate, and loss function:
# defining the model
model = Net()
# defining the optimizer
optimizer = Adam(model.parameters(), lr=0.000075)
# defining the loss function
criterion = CrossEntropyLoss()
# checking if GPU is available
if torch.cuda.is_available():
model = model.cuda()
criterion = criterion.cuda()
print(model)
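Since we imported summary from torchsummary at the top, we can also print a layer-by-layer overview of the architecture. Note that torchsummary assumes a CUDA device by default, so pass device='cpu' if you are running on a CPU:
# layer-wise output shapes and parameter counts
# (pass device='cpu' to summary if no GPU is available)
summary(model, input_size=(3, 224, 224))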
Training the Model
Let's train our deep learning model for 20 epochs:
torch.manual_seed(0)

# batch size of the model
batch_size = 64

# number of epochs to train the model
n_epochs = 20

for epoch in range(1, n_epochs+1):

    train_loss = 0.0
    # shuffling the training set before each epoch
    permutation = torch.randperm(final_train.size()[0])
    training_loss = []
    for i in tqdm(range(0, final_train.size()[0], batch_size)):
        # picking the indices of the current mini-batch
        indices = permutation[i:i+batch_size]
        batch_x, batch_y = final_train[indices], final_target_train[indices]

        if torch.cuda.is_available():
            batch_x, batch_y = batch_x.cuda(), batch_y.cuda()

        optimizer.zero_grad()
        outputs = model(batch_x)
        loss = criterion(outputs, batch_y)
        training_loss.append(loss.item())
        loss.backward()
        optimizer.step()

    training_loss = np.average(training_loss)
    print('epoch: \t', epoch, '\t training loss: \t', training_loss)
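The loop above only prints the average loss per epoch. If you would also like to plot the trend, one option (my addition, not part of the original code) is to collect those averages in a list, say epoch_losses, by appending training_loss at the end of every epoch, and then plot it:
# assumption: epoch_losses is a list filled inside the training loop with
# epoch_losses.append(training_loss) at the end of every epoch
plt.plot(range(1, n_epochs + 1), epoch_losses)
plt.xlabel('epoch')
plt.ylabel('average training loss')
plt.title('Training Loss')
plt.show()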
This is a summary of the training phase. You will notice that the training loss decreases as the epochs increase. Let's save the weights of the trained model so that we can use them in the future without retraining the model:
torch.save(model, 'model.pt')
If you do not want to train the model at your end, you can use this link to download the weights of the model trained for 20 epochs.
Next, let’s load this model:
the_model = torch.load('model.pt')
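One caveat worth knowing: because we saved the entire model object with torch.save(model, 'model.pt'), loading it on a CPU-only machine needs a map_location argument. A small hedged sketch:
# loading the saved model on a machine without a GPU
the_model = torch.load('model.pt', map_location=torch.device('cpu'))
the_model.eval()  # putting dropout and batch-norm layers into inference mode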
Checking the Model's Performance
Finally, let's make predictions for the training and validation sets and check the respective accuracies:
torch.manual_seed(0)

# prediction for the training set
prediction = []
target = []
permutation = torch.randperm(final_train.size()[0])
for i in tqdm(range(0, final_train.size()[0], batch_size)):
    indices = permutation[i:i+batch_size]
    batch_x, batch_y = final_train[indices], final_target_train[indices]

    if torch.cuda.is_available():
        batch_x, batch_y = batch_x.cuda(), batch_y.cuda()

    with torch.no_grad():
        # batch_x has already been moved to the GPU above (if available)
        output = model(batch_x)

    softmax = torch.exp(output).cpu()
    prob = list(softmax.numpy())
    predictions = np.argmax(prob, axis=1)
    prediction.append(predictions)
    target.append(batch_y)

# training accuracy
accuracy = []
for i in range(len(prediction)):
    accuracy.append(accuracy_score(target[i].cpu(), prediction[i]))
print('training accuracy: \t', np.average(accuracy))
We got an accuracy of more than 91% on the training set! Quite promising. But let's hold our judgement until we check the same for the validation set:
# prediction for the validation set
torch.manual_seed(0)
if torch.cuda.is_available():
    val_x = val_x.cuda()
with torch.no_grad():
    output = model(val_x)
softmax = torch.exp(output).cpu()
prob = list(softmax.numpy())
predictions = np.argmax(prob, axis=1)
# validation accuracy
accuracy_score(val_y, predictions)
The validation accuracy was about 78%. Very good!
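A single accuracy number does not tell us which class the model struggles with. As an extra check (my addition), scikit-learn's confusion_matrix breaks the validation predictions down per class:
from sklearn.metrics import confusion_matrix

# rows are actual classes, columns are predicted classes
# (0 = non-emergency, 1 = emergency)
print(confusion_matrix(val_y, predictions))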
End Notes
When we have limited training data, image augmentation can come to the rescue.
In this article, we covered most of the commonly used image augmentation techniques. We learned how to rotate, shift, and flip images. We also learned how to add random noise to images or blur them. Then we discussed basic guidelines for selecting the right augmentation technique.
You can try these image augmentation techniques on any image classification problem and then compare the performance with and without augmentation. Feel free to share your results in the comments section below.
Also, if you are not familiar with deep learning, computer vision, and image data, it is recommended that you complete the following courses:
- Computer vision with Deep Learning 2.0
The original link: www.analyticsvidhya.com/blog/2019/1…