An organization in the Netherlands called Vlinderstichting collects large numbers of butterfly observations every year. Volunteers help identify the butterfly species in their gardens, while Vlinderstichting gathers and analyzes the results.
Since volunteers do the identification, errors are inevitable, and Vlinderstichting staff end up checking the submitted records themselves, which costs a huge amount of time.
Specifically, there are three species that many people confuse:
• Meadow brown or Maniola jurtina
• Gatekeeper or Pyronia tithonus
• Small heath or Coenonympha pamphilus
This article describes the steps to distinguish the first two butterflies using a deep learning model.
Use the Flickr API to download images
In order to train the convolutional neural network, we need correctly classified butterfly images, and to be efficient we want an automated way to get them: the Flickr API, used from Python.
Set up the Flickr API
First of all, install the flickrapi package with pip (https://pypi.org/project/flickrapi/2.3/). Then create API keys on the Flickr website (https://www.flickr.com/services/api/misc.api_keys.html) to connect to the Flickr API.
Besides the flickrapi package, import the os and urllib packages for downloading images and setting up directories.
from flickrapi import FlickrAPI
import urllib
import os
import config
In the configuration module, define the public key and secret for the Flickr API. It is just a Python script (config.py) with the following content:
API_KEY = 'XXXXXXXXXXXXXXXXX'  # replace with your key
API_SECRET = 'XXXXXXXXXXXXXXXXX'  # replace with your secret
IMG_FOLDER = 'XXXXXXXXXXXXXXXXX'  # replace with the folder to store the images
For security reasons, these keys live in a separate file. That way you can store your code in a public repository like GitHub or Bitbucket and put config.py in .gitignore, sharing the code without worrying about anyone accessing your credentials.
To download the images per butterfly category, we wrote a function download_flickr_photos, explained in detail below. The full code is on GitHub (https://github.com/bertcarremans/Vlindervinder/tree/master/flickr).
The input parameters
First, check that the input parameters have the correct type and value; if not, raise an exception. A description of the parameters can be found in the function's docstring.
if not (isinstance(keywords, str) or isinstance(keywords, list)):
    raise AttributeError('keywords must be a string or a list of strings')
if not (size in ['thumbnail', 'square', 'medium', 'original']):
    raise AttributeError('size must be "thumbnail", "square", "medium" or "original"')
if not (max_nb_img == -1 or (max_nb_img > 0 and isinstance(max_nb_img, int))):
    raise AttributeError('max_nb_img must be an integer greater than zero or equal to -1')
Second, define some parameters that are used later in the walk method: turn the keywords into a list and select the URL attribute matching the requested image size.
if isinstance(keywords, str):
    keywords_list = []
    keywords_list.append(keywords)
else:
    keywords_list = keywords

if size == 'thumbnail':
    size_url = 'url_t'
elif size == 'square':
    size_url = 'url_q'
elif size == 'medium':
    size_url = 'url_c'
elif size == 'original':
    size_url = 'url_o'
Connect to the Flickr API
When calling the Flickr API, use the API keys defined in the configuration module.
flickr = FlickrAPI(config.API_KEY, config.API_SECRET)
Create subfolders for each butterfly category
Store the images of the different butterflies in separate subfolders, each named after the butterfly species as given by the keyword. If the subfolder does not exist yet, create it.
results_folder = config.IMG_FOLDER + keyword.replace(" ", "_") + "/"
if not os.path.exists(results_folder):
    os.makedirs(results_folder)
Browse the Flickr library
photos = flickr.walk(
    text=keyword,
    extras='url_m',
    license='1,2,4,5',
    per_page=50)
Use the Flickr API's walk method to search for images with the specified keyword. It takes the same parameters as the Flickr API's search method.
The text parameter holds the keyword to search for. In the extras parameter, url_m specifies the URL of a small to medium sized version of each image. For more about image sizes and URLs, see the Flickcurl C library (http://librdf.org/flickcurl/api/flickcurl-searching-search-extras.html).
The license parameter selects images with non-commercial licenses; the Flickr API platform documents the license codes and their meaning. Finally, the per_page parameter specifies how many images to return per page.
The result is a generator called photos, which we iterate over to download the images.
Download Flickr images
With the photos generator you can iterate over all images found for the search query. For each photo, get the URL of the image to download, and increment a count variable used to build the image's file name.
The image is downloaded with the urlretrieve method and saved in the folder of the corresponding butterfly species. If an error occurs, an error message is printed.
count = 0
for photo in photos:
    try:
        url = photo.get('url_m')
        print(url)
        count += 1
        urllib.request.urlretrieve(url, results_folder + str(count) + ".jpg")
    except Exception as e:
        print(e, 'Download failure')
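Putting the pieces above together, download_flickr_photos looks roughly like this. It is a condensed sketch; the complete version is in the linked GitHub repo:

def download_flickr_photos(keywords, size='medium', max_nb_img=-1):
    # ... parameter checks, keywords_list and size_url as shown above ...
    flickr = FlickrAPI(config.API_KEY, config.API_SECRET)
    for keyword in keywords_list:
        results_folder = config.IMG_FOLDER + keyword.replace(" ", "_") + "/"
        if not os.path.exists(results_folder):
            os.makedirs(results_folder)
        photos = flickr.walk(text=keyword, extras=size_url,
                             license='1,2,4,5', per_page=50)
        count = 0
        for photo in photos:
            try:
                url = photo.get(size_url)
                count += 1
                urllib.request.urlretrieve(url, results_folder + str(count) + ".jpg")
            except Exception as e:
                print(e, 'Download failure')
            if count == max_nb_img:
                break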
To download images for multiple butterfly species, create a list and call the download_flickr_photos function in a for loop. For simplicity, we only download images for two of the three species.
butterflies = ['meadow brown butterfly', 'gatekeeper butterfly']
for butterfly in butterflies:
    download_flickr_photos(butterfly)
Image data augmentation
Training a convnet on a small number of images leads to overfitting: the model then makes errors when classifying new images it has never seen before. Data augmentation can help avoid this, and luckily Keras has some nice tools to transform images easily.
The more training data, the less likely the classifier is to make mistakes, so the convnet needs more butterfly images than we have so far. A simple solution is data augmentation: applying a set of transformations to the Flickr images.
Keras provides a wide range of image transformations (https://keras.io/preprocessing/image/). But first, the images have to be converted into a form Keras can process.
Convert images to numbers
First, import the Keras modules. The transformations are illustrated on one example image, loaded with the load_img method.
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img

i = load_img('data/train/maniola_jurtina/1.jpg')
x = img_to_array(i)
x = x.reshape((1,) + x.shape)
The load_img method creates a Python Imaging Library (PIL) image. It needs to be converted to a Numpy array for later use with the ImageDataGenerator methods, which is done with the img_to_array method. This gives a 75x75x3 array, whose dimensions reflect the width, height and RGB values.
Each pixel of the image has 3 RGB values, ranging from 0 to 255 and reflecting the intensity of red, green and blue; the higher the value, the more intense the color. For example, a pixel might be represented as [78, 136, 60], and black is [0, 0, 0].
Finally, an extra (batch) dimension needs to be added to avoid an error later on; this is done with the reshape method.
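As a quick sanity check of the shapes (assuming the same 75x75 example image as above):

print(img_to_array(i).shape)  # (75, 75, 3): height, width, channels
print(x.shape)                # (1, 75, 75, 3): batch dimension added by reshape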
With that in place, we can move on to the transformations.
Rotation
Given a value between 0 and 180, Keras rotates the image by a random angle, clockwise or counterclockwise. In this example the image is rotated by at most 90 degrees.
ImageDataGenerator also has a fill_mode parameter, whose default value is 'nearest'. Rotating the image within its original width and height leaves 'empty' pixels, and fill_mode fills those with the nearest pixels.
imgGen = ImageDataGenerator(rotation_range = 90)
i = 1
for batch in imgGen.flow(x, batch_size=1, save_to_dir='example_transformations',
                         save_format='jpeg', save_prefix='trsf'):
    i += 1
    if i > 3:
        break
The flow method determines where the transformed images are saved (make sure the directory exists!). For convenience, it also prefixes the file names of the new images. The flow method would run indefinitely, but three example images are enough here, so the for loop breaks once the counter exceeds that value. The resulting images are written to the example_transformations folder.
Width shift
With the width_shift_range parameter you specify how far the image may randomly shift left or right (as a fraction of the original width, or in pixels). fill_mode again fills the newly created empty pixels. For the remaining examples, only the instantiation of the image generator with the relevant parameter is shown; the code to generate the images is the same as in the rotation example.
imgGen = ImageDataGenerator(width_shift_range = 90)
In the transformed image you can see that the image was shifted to the right; the filled-in empty pixels give it a stretched-out look.
Shifting up or down works the same way, with the height_shift_range parameter, as shown below.
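For example, mirroring the width shift above (the value of 90 is just illustrative):

imgGen = ImageDataGenerator(height_shift_range = 90)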
Rescale
Rescaling multiplies every pixel's RGB values by the chosen factor before any other processing. Here, min-max scaling is used, so all values end up between 0 and 1. The smaller values make the model easier to train.
imgGen = ImageDataGenerator(rescale = 1./255)
Shear
With the shear_range parameter you specify how strongly the shearing transformation is applied. As long as the value is not set too high, it does not produce overly strange images.
imgGen = ImageDataGenerator(shear_range = 0.2)
Zoom
This transformation zooms inside the image. Just like the shear parameter, the value should not be too large, to keep the images realistic.
imgGen = ImageDataGenerator(zoom_range = 0.2)
Horizontal flip
This transformation flips the image horizontally. Life can be simple sometimes…
imgGen = ImageDataGenerator(horizontal_flip = True)
Combine all transformations
Now that you’ve seen the effects of each transformation, apply all of them together.
imgGen = ImageDataGenerator(rotation_range = 40,
                            width_shift_range = 0.2,
                            height_shift_range = 0.2,
                            rescale = 1./255,
                            shear_range = 0.2,
                            zoom_range = 0.2,
                            horizontal_flip = True)
i = 1
for batch in imgGen.flow(x, batch_size=1, save_to_dir='example_transformations',
                         save_format='jpeg', save_prefix='all'):
    i += 1
    if i > 3:
        break
Setting up the folder structure
These images need to be saved in a specific folder structure, so that the flow_from_directory method can augment the images and create the corresponding labels. The folder structure needs to look like this:
• train
    • maniola_jurtina
        • 0.jpg
        • 1.jpg
        • …
    • pyronia_tithonus
        • 0.jpg
        • 1.jpg
        • …
• validation
    • maniola_jurtina
        • 0.jpg
        • 1.jpg
        • …
    • pyronia_tithonus
        • 0.jpg
        • 1.jpg
        • …
To set up this structure, we created the gist img_train_test_split.py (https://gist.github.com/bertcarremans/679624f369ed9270472e37f8333244f5).
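For the general idea, here is a heavily simplified sketch of what such a split script does. The function layout below is an assumption for illustration; the linked gist is the authoritative version:

import os
import random
import shutil

def img_train_test_split(img_source_dir, train_size=0.8):
    # Simplified sketch: copy each species' images into train/ and
    # validation/ subfolders under data/ (the gist handles the edge cases)
    for species in os.listdir(img_source_dir):
        files = os.listdir(os.path.join(img_source_dir, species))
        random.shuffle(files)
        split = int(len(files) * train_size)
        for subset, subset_files in [('train', files[:split]),
                                     ('validation', files[split:])]:
            dest = os.path.join('data', subset, species)
            os.makedirs(dest, exist_ok=True)
            for f in subset_files:
                shutil.copy(os.path.join(img_source_dir, species, f), dest)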
Creating the generators
As mentioned earlier, the transformation parameters are given only to the training generator. The validation images are not transformed like the training images; their RGB values are merely rescaled.
The flow_from_directory method takes the images from the training or validation folder and generates batches of 32 transformed images. Setting class_mode to 'binary' creates one-dimensional binary labels based on the subfolder names.
train_datagen = ImageDataGenerator(rotation_range = 40,
                                   width_shift_range = 0.2,
                                   height_shift_range = 0.2,
                                   rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)
validation_datagen = ImageDataGenerator(rescale = 1./255)

train_generator = train_datagen.flow_from_directory('data/train',
                                                    batch_size=32,
                                                    class_mode='binary')
validation_generator = validation_datagen.flow_from_directory('data/validation',
                                                              batch_size=32,
                                                              class_mode='binary')
How to deal with images of different sizes?
With the Flickr API we downloaded images of one specific size, but in real-world applications image sizes vary. If the aspect ratio is the same, images can simply be resized; otherwise they can be cropped, although it is hard to crop an image while keeping the object of interest intact.
Keras can handle images of different sizes: when configuring the model, you can specify None for the width and height in input_shape.
input_shape=(3, None, None)  # Theano
input_shape=(None, None, 3)  # Tensorflow
We wanted to show that handling images of different sizes is possible, but it has some disadvantages:
• not all layers (e.g. Flatten) accept 'None' as an input dimension
• the computations can become heavy
Build a deep learning model
Next, the convolutional neural network architecture is discussed and illustrated with some examples from the butterfly project. The results of the first classification come at the end of the article.
What layers do convolutional neural networks consist of?
There are of course many choices for which layers, and of what kind, to add to your convolutional neural network (also known as a CNN or convnet). In this project we start with the structure described below, going through what each layer does and how to create it with Keras.
The input layer
The input consists of the transformed versions of the butterfly images, converted into a numerical representation: a matrix of dimensions width x height x number of (color) channels. For RGB images the number of channels is three; for grayscale images it is one. The examples below work with the numerical representation of a 7x7 image.
If the images are of size 75x75, that size needs to be specified in the input_shape parameter when adding the first convolutional layer.
cnn = Sequential()
cnn.add(Conv2D(32, (3,3), input_shape=(3, 75, 75)))
Convolution layer
In its first layers, a convolutional neural network looks for low-level features such as horizontal or vertical edges. Deeper into the network, it looks for higher-level features such as butterfly wings. But how can it look for features when all it receives as input is numbers?
Filters (or kernels)
You can think of a filter as a searchlight of a particular size that scans over the image. The filter example used here is 3x3x3 and contains weights that detect vertical edges. For grayscale images the size would be 3x3x1. Typically, a filter is smaller than the image to classify; 3x3, 5x5 or 7x7 are commonly used. The third dimension should always equal the number of channels.
As the filter scans over the image, the RGB values are multiplied by the filter weights and the products are summed over all channels. For the 7x7x3 example image and a 3x3x3 filter, the result has size 5x5x1.
To illustrate the convolution operation, suppose for convenience that we look for vertical edges in the red channel only, so the green and blue channels get weights of 0. Remember, though, that the products over those channels are still added to the result of the red channel.
The convolution layer thus produces numbers: the higher the number, the better the filter matched that part of the image, in this case a vertical edge.
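To make the arithmetic concrete, here is a minimal numpy sketch of this single-filter convolution. It is purely illustrative and not how Keras implements the operation:

import numpy as np

def convolve_single_filter(image, kernel, stride=1):
    # Naive 'valid' convolution of a square HxHxC image with an fxfxC filter
    h, w, _ = image.shape
    f = kernel.shape[0]
    out_size = 1 + (h - f) // stride
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            patch = image[i*stride:i*stride+f, j*stride:j*stride+f, :]
            out[i, j] = np.sum(patch * kernel)  # multiply and sum over all channels
    return out

img = np.random.rand(7, 7, 3)            # a 7x7 RGB image
kern = np.zeros((3, 3, 3))
kern[:, 0, 0], kern[:, 2, 0] = 1, -1     # vertical-edge weights in the red channel only
print(convolve_single_filter(img, kern).shape)  # (5, 5), i.e. the 5x5x1 output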
We can specify that more filters should be used; each filter then learns to detect its own feature in the images. Taking 32 filters of size 3x3x3 and stacking the results of all filters gives, in this case, an output of size 5x5x32. In the Keras snippet earlier, 32 filters of 3x3x3 were indeed added.
Stride
In the example above, the filter moved one pixel at a time; this is called a stride of 1. Increasing the stride makes the filter move over more pixels at once and shrinks the output more quickly. With a stride of 2, convolving a 7x7x3 image with a 3x3x3 filter produces a 3x3x1 result.
Padding
Applying filters quickly reduces the size of the original image. In particular, the pixels on the edges of the image are used only once in the convolution operation, which leads to a loss of information. To avoid that, you can specify padding, which simply adds 'extra pixels' around the image.
If you pad a 7x7x3 image with one pixel, the result is a 9x9x3 image. Applying a 3x3x3 filter with a stride of 1 then produces a 7x7x1 output. In that case the original image size is preserved and the edge pixels are used more than once.
With padding and stride, the output size of the convolution operation can be computed as:

output size = 1 + (n + 2p - f) / s

where n is the original image size, p the padding, f the filter size and s the stride. For example, suppose we have this convolution setup:
• a 7x7x3 image
• a 3x3x3 filter
• padding of 1 pixel
• a stride of 2 pixels

The computation gives 1 + (7 + 2x1 - 3) / 2 = 4.
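As a quick check, this arithmetic can be reproduced with a small hypothetical helper:

def conv_output_size(n, f, p=0, s=1):
    # Output width/height of a convolution: 1 + (n + 2p - f) / s
    return 1 + (n + 2 * p - f) // s

print(conv_output_size(7, 3, p=1, s=2))  # 4: the padded, stride-2 example above
print(conv_output_size(7, 3))            # 5: no padding, stride 1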
Why use convolutional layers?
An advantage of using conv layers is that far fewer parameters need to be estimated than with an ordinary hidden layer. Take the convolution of a 7x7x3 image with a 3x3x3 filter, without padding and with stride 1: the convolutional layer has 5x5x1 + 1 bias = 26 weights to estimate. In a neural network where the 7x7x3 = 147 inputs are fully connected to 5x5x1 = 25 hidden neurons, 147 x 25 = 3,675 weights need to be estimated. Imagine what that number becomes with larger images…
Activation function layer
Also called the ReLU (rectified linear unit) layer, it adds nonlinearity to the network. The convolution layer is a linear operation: it sums the RGB values multiplied by the filter weights.
The activation function returns 0 for all values x <= 0, and x itself otherwise. The Keras code to add an activation layer is:
cnn.add(Activation('relu'))
Pooling layer
Pooling aggregates the input values, further reducing the size. This cuts the number of parameters to estimate, which speeds up computation, and it also helps the network avoid overfitting. Below, max pooling is illustrated with a 2x2 size and a stride of 2.
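Since the original illustration is an image, here is the same idea as a small numpy example (illustrative only, not Keras code):

import numpy as np

m = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 1, 8]])

# 2x2 max pooling with stride 2: keep the maximum of each 2x2 block
pooled = m.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 4]
               #  [7 9]]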
The Keras code to add a 2x2 max pooling layer is:
cnn.add(MaxPooling2D(pool_size = (2, 2)))
Finally, the convnet is able to detect higher-level features in the input image. These can serve as input for a fully connected layer, but before that the output of the last activation layer has to be flattened, i.e. converted into a vector. The vector values are then connected to all the neurons of the fully connected layer. In Python this is done with the following Keras functions:
cnn.add(Flatten())
cnn.add(Dense(64))
Dropout
Just like pooling, dropout helps avoid overfitting. During training, the specified fraction of the inputs is randomly set to 0. Dropout rates between 20% and 50% have been reported to work well.
cnn.add(Dropout(0.2))
Sigmoid activation
To produce the probability that an image shows one of the two butterfly species (i.e. binary classification), use a sigmoid activation layer:
cnn.add(Activation('relu'))
cnn.add(Dense(1))
cnn.add(Activation('sigmoid'))
Applying the convolutional neural network to the butterfly images
The complete convolutional neural network structure can now be defined, as shown at the beginning of this article. First import the necessary Keras modules; then the layers discussed above can be added.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Flatten, Dense, Dropout
from keras.preprocessing.image import ImageDataGenerator
import time
IMG_SIZE = # Replace with the size of your images
NB_CHANNELS = # 3 for RGB images or 1 for grayscale images
BATCH_SIZE = # Typical values are 8, 16 or 32
NB_TRAIN_IMG = # Replace with the total number of training images
NB_VALID_IMG = # Replace with the total number of validation images
Some extra parameters are listed explicitly for the conv layers. A quick rundown:
• kernel_size specifies the filter size; for the first conv layer this is 2x2
• padding = 'same' means zero padding is applied, preserving the original image size
• padding = 'valid' means no padding is applied
• data_format = 'channels_last' indicates that the number of color channels in input_shape comes last
cnn = Sequential()
cnn.add(Conv2D(filters=32, kernel_size=(2,2), strides=(1,1), padding='same',
               input_shape=(IMG_SIZE, IMG_SIZE, NB_CHANNELS),
               data_format='channels_last'))
cnn.add(Activation('relu'))
cnn.add(MaxPooling2D(pool_size=(2,2), strides=2))
cnn.add(Conv2D(filters=64, kernel_size=(2,2), strides=(1,1), padding='valid'))
cnn.add(Activation('relu'))
cnn.add(MaxPooling2D(pool_size=(2,2), strides=2))
cnn.add(Flatten())
cnn.add(Dense(64))
cnn.add(Activation('relu'))
cnn.add(Dropout(0.25))
cnn.add(Dense(1))
cnn.add(Activation('sigmoid'))
cnn.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
Finally, the network is compiled with binary cross-entropy (binary_crossentropy) as the loss function, which suits a binary target, and with accuracy as the evaluation metric.
With the network structure defined, the generators for the training and validation samples are created. The data augmentation described earlier is applied to the training samples only; the validation samples are not augmented, since they are used to evaluate the model's performance.
train_datagen = ImageDataGenerator(rotation_range = 40,
                                   width_shift_range = 0.2,
                                   height_shift_range = 0.2,
                                   rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)
validation_datagen = ImageDataGenerator(rescale = 1./255)

train_generator = train_datagen.flow_from_directory('../flickr/img/train',
                                                    target_size=(IMG_SIZE, IMG_SIZE),
                                                    class_mode='binary',
                                                    batch_size=BATCH_SIZE)
validation_generator = validation_datagen.flow_from_directory('../flickr/img/validation',
                                                              target_size=(IMG_SIZE, IMG_SIZE),
                                                              class_mode='binary',
                                                              batch_size=BATCH_SIZE)
By applying flow_from_directory on the generators, you can easily run through all the images in the specified directories.
Finally, the convolutional neural network can be trained on the training data and evaluated on the validation data. The resulting model weights can be saved and reused later.
start = time.time()
cnn.fit_generator(
    train_generator,
    steps_per_epoch=NB_TRAIN_IMG//BATCH_SIZE,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=NB_VALID_IMG//BATCH_SIZE)
end = time.time()
print('Processing time:', (end - start)/60)
cnn.save_weights('cnn_baseline.h5')
The number of epochs is arbitrarily set to 50. An epoch is one cycle of forward propagation, checking the error, and adjusting the weights during backpropagation.
The steps_per_epoch parameter is set to the number of training images divided by the batch size (by the way, the double division operator makes sure the result is an integer and not a float). For example, 1,000 training images with a batch size of 32 give 1000 // 32 = 31 steps per epoch. Specifying a larger batch size speeds up the process. The same goes for the validation_steps parameter.
Results
After running for 50 epochs, training accuracy came out at 0.8091 and validation accuracy at 0.7359, so the convolutional neural network still overfits somewhat. The validation accuracy also fluctuates considerably, because the validation set is small; k-fold cross-validation would give a more reliable evaluation.
To address this overfitting, we can:
• increase the dropout rate
• apply dropout after each layer
• find more training data

We looked into the first two options and tested the results. The results of this first model serve as a baseline. Applying dropout after every layer and increasing the dropout rate did improve the model's overfitting, as sketched below.
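As an illustration of the first two remedies, the baseline model could be modified along the following lines. This is a sketch with assumed dropout rates, not the exact follow-up model from the project:

# Sketch: dropout after every pooling and dense layer, at higher rates
cnn = Sequential()
cnn.add(Conv2D(filters=32, kernel_size=(2,2), strides=(1,1), padding='same',
               input_shape=(IMG_SIZE, IMG_SIZE, NB_CHANNELS)))
cnn.add(Activation('relu'))
cnn.add(MaxPooling2D(pool_size=(2,2), strides=2))
cnn.add(Dropout(0.4))                     # absent in the baseline
cnn.add(Conv2D(filters=64, kernel_size=(2,2), strides=(1,1), padding='valid'))
cnn.add(Activation('relu'))
cnn.add(MaxPooling2D(pool_size=(2,2), strides=2))
cnn.add(Dropout(0.4))                     # absent in the baseline
cnn.add(Flatten())
cnn.add(Dense(64))
cnn.add(Activation('relu'))
cnn.add(Dropout(0.5))                     # raised from 0.25
cnn.add(Dense(1))
cnn.add(Activation('sigmoid'))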
All the code can be found on GitHub: https://github.com/bertcarremans/Vlindervinder