Public account: You and the cabin. Author: Peter. Editor: Peter.

Hello, I’m Peter

This article documents a first application of convolutional neural networks to image recognition: telling cats from dogs in photographs. It covers:

  • Data processing
  • Building the neural network model
  • Implementing data augmentation

The deep learning framework used in this article is Keras.

The image data comes from the official Kaggle website: www.kaggle.com/c/dogs-vs-c…

Data processing

Data volume

The dataset contains 25,000 images, 12,500 each of cats and dogs. From it we build a training set of 1000 samples per class, a validation set of 500 samples per class, and a test set of 500 samples per class.

Note: only part of the data is used for modeling.

Creating the directories

In [1]:

import os, shutil

In [2]:

current_dir = !pwd  # current directory (IPython shell command)
current_dir[0]

Out[2]:

'/Users/peter/Desktop/kaggle/kaggle_12_dogs&cats/dogs-vs-cats'

Create a new directory to store the required data set:

base_dir = current_dir[0] + '/cats_dogs_small'
os.mkdir(base_dir)  # create the directory

Create directories for the training, validation, and test sets:

train_dir = os.path.join(base_dir,"train")
os.mkdir(train_dir)
validation_dir = os.path.join(base_dir,"validation")
os.mkdir(validation_dir)
test_dir = os.path.join(base_dir,"test")
os.mkdir(test_dir)

# Per-class (cats, dogs) image directories for training, validation, and testing
train_cats_dir = os.path.join(train_dir, "cats")
os.mkdir(train_cats_dir)
train_dogs_dir = os.path.join(train_dir, "dogs")
os.mkdir(train_dogs_dir)

validation_cats_dir = os.path.join(validation_dir, "cats")
os.mkdir(validation_cats_dir)
validation_dogs_dir = os.path.join(validation_dir, "dogs")
os.mkdir(validation_dogs_dir)

test_cats_dir = os.path.join(test_dir, "cats")
os.mkdir(test_cats_dir)
test_dogs_dir = os.path.join(test_dir, "dogs")
os.mkdir(test_dogs_dir)

Copying the dataset

In [5]:

# 1. The first 1000 cat images form the training set
fnames = ['cat.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(current_dir[0] + "/train", fname)
    dst = os.path.join(train_cats_dir, fname)
    shutil.copyfile(src, dst)

In [6]:

# 2. The next 500 cat images form the validation set
fnames = ['cat.{}.jpg'.format(i) for i in range(1000, 1500)]
for fname in fnames:
    src = os.path.join(current_dir[0] + "/train", fname)
    dst = os.path.join(validation_cats_dir, fname)
    shutil.copyfile(src, dst)

In [7]:

# 3. The following 500 cat images form the test set
fnames = ['cat.{}.jpg'.format(i) for i in range(1500, 2000)]
for fname in fnames:
    src = os.path.join(current_dir[0] + "/train", fname)
    dst = os.path.join(test_cats_dir, fname)
    shutil.copyfile(src, dst)

In [8]:

# Do the same for the dog images

# 1. The first 1000 images form the training set
fnames = ['dog.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(current_dir[0] + "/train", fname)
    dst = os.path.join(train_dogs_dir, fname)
    shutil.copyfile(src, dst)

# 2. The next 500 images form the validation set
fnames = ['dog.{}.jpg'.format(i) for i in range(1000, 1500)]
for fname in fnames:
    src = os.path.join(current_dir[0] + "/train", fname)
    dst = os.path.join(validation_dogs_dir, fname)
    shutil.copyfile(src, dst)

# 3. The following 500 images form the test set
fnames = ['dog.{}.jpg'.format(i) for i in range(1500, 2000)]
for fname in fnames:
    src = os.path.join(current_dir[0] + "/train", fname)
    dst = os.path.join(test_dogs_dir, fname)
    shutil.copyfile(src, dst)
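
The six copy loops differ only in the file prefix, the index range, and the destination directory, so they could be folded into one helper. The sketch below is one possible refactoring, not the code used in this article: `copy_range` and `SPLITS` are hypothetical names, and the demo runs on empty placeholder files in a temporary directory rather than on the real Kaggle images.

```python
import os
import shutil
import tempfile

# Index ranges used in this article: 1000 train, 500 validation, 500 test.
SPLITS = {"train": (0, 1000), "validation": (1000, 1500), "test": (1500, 2000)}


def copy_range(src_dir, dst_dir, prefix, start, stop):
    """Copy '<prefix>.<i>.jpg' for i in [start, stop) from src_dir to dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = 0
    for i in range(start, stop):
        fname = "{}.{}.jpg".format(prefix, i)
        shutil.copyfile(os.path.join(src_dir, fname), os.path.join(dst_dir, fname))
        copied += 1
    return copied


# Demo on placeholder files (hypothetical stand-ins for the Kaggle images).
demo_root = tempfile.mkdtemp()
demo_src = os.path.join(demo_root, "train")
os.makedirs(demo_src)
for animal in ("cat", "dog"):
    for i in range(2000):
        open(os.path.join(demo_src, "{}.{}.jpg".format(animal, i)), "w").close()

demo_counts = {}
for split, (start, stop) in SPLITS.items():
    for animal in ("cat", "dog"):
        dst = os.path.join(demo_root, "cats_dogs_small", split, animal + "s")
        demo_counts[(split, animal)] = copy_range(demo_src, dst, animal, start, stop)

print(demo_counts)
shutil.rmtree(demo_root)  # clean up the temporary demo data
```

Keeping the ranges in one dictionary makes it harder for the three splits to overlap by accident.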

Checking the data

Check how many images each split (training, validation, test) contains for the cat and dog classes:
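
The code for this check is not shown in the article. A minimal sketch of such a check, assuming the split/class directory layout created earlier, might look like this. `count_images` is a hypothetical helper, and the demo builds a tiny placeholder tree so the snippet runs on its own; on the real data one would pass `base_dir` instead.

```python
import os
import shutil
import tempfile


def count_images(root):
    """Return {(split, class_name): number of files} for a root/split/class layout."""
    counts = {}
    for split in sorted(os.listdir(root)):
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            continue
        for class_name in sorted(os.listdir(split_dir)):
            class_dir = os.path.join(split_dir, class_name)
            if os.path.isdir(class_dir):
                counts[(split, class_name)] = len(os.listdir(class_dir))
    return counts


# Placeholder tree with 3 files per folder (stand-in for cats_dogs_small).
demo_base = tempfile.mkdtemp()
for split in ("train", "validation", "test"):
    for class_name in ("cats", "dogs"):
        folder = os.path.join(demo_base, split, class_name)
        os.makedirs(folder)
        for i in range(3):
            open(os.path.join(folder, "img.{}.jpg".format(i)), "w").close()

image_counts = count_images(demo_base)
for key, n in image_counts.items():
    print(key, n)
shutil.rmtree(demo_base)  # clean up the placeholder tree
```

With the split described in this article, the expected counts are 1000 per class for training and 500 per class for validation and test.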

Building the neural network

A quick review of the structure of a convolutional neural network: Conv2D layers (with ReLU activation) and MaxPooling2D layers stacked alternately.

When the images are larger and the problem more complex, an additional Conv2D (ReLU) + MaxPooling2D stage is added.

This has two benefits:

  • It increases the capacity of the network
  • It further reduces the size of the feature maps

Note: cat-versus-dog classification is a binary classification problem, so the last layer of the network is a single unit (a Dense layer of size 1) with a sigmoid activation.

Through the network, the depth of the feature maps gradually increases (from 32 to 128), while their size gradually decreases (from 150×150 to 7×7).

  1. Increasing depth: the original images are complex, so more filters are needed to represent them
  2. Decreasing size: each additional convolution and pooling layer further compresses and abstracts the image
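
The shrinking sizes can be checked by hand. The sketch below assumes what the model in this article uses: 3×3 convolutions without padding (each shortens the side length by 2) and 2×2 max pooling (each halves the side length, rounding down). `feature_map_sizes` is a hypothetical helper, not a Keras function.

```python
def feature_map_sizes(input_size, n_blocks=4, kernel=3, pool=2):
    """Side length after each Conv2D (no padding) and MaxPooling2D layer."""
    sizes = []
    size = input_size
    for _ in range(n_blocks):
        size = size - kernel + 1   # unpadded convolution
        sizes.append(size)
        size = size // pool        # max pooling rounds down
        sizes.append(size)
    return sizes


sizes = feature_map_sizes(150)
print(sizes)  # [148, 74, 72, 36, 34, 17, 15, 7]
```

The final 7×7 map with 128 channels gives 7 × 7 × 128 = 6272 values after flattening.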

Building the network

In [15]:

# Use tf.keras throughout; mixing it with the standalone keras package can fail
from tensorflow.keras import layers, models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation="relu",
                        input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(64, (3, 3), activation="relu"))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(128, (3, 3), activation="relu"))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(128, (3, 3), activation="relu"))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Flatten())
model.add(layers.Dense(512, activation="relu"))
model.add(layers.Dense(1, activation="sigmoid"))  # single unit for binary output

model.summary()

Compiling the model (choosing the optimizer)

The last layer of the network is a single sigmoid unit, so binary cross-entropy is used as the loss function.
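
To make this pairing concrete, the sketch below computes the sigmoid and the binary cross-entropy for single predictions in plain Python. It illustrates the standard formulas only; the function names are hypothetical, and Keras's own implementation differs in detail.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, p, eps=1e-7):
    """Loss for one sample: -[y*log(p) + (1-y)*log(1-p)]."""
    p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


p_neutral = sigmoid(0.0)                            # 0.5: the network is undecided
loss_neutral = binary_cross_entropy(1, p_neutral)   # log(2), about 0.693
loss_confident_right = binary_cross_entropy(1, sigmoid(4.0))
loss_confident_wrong = binary_cross_entropy(0, sigmoid(4.0))
print(p_neutral, loss_neutral, loss_confident_right, loss_confident_wrong)
```

A confident correct prediction is penalized very little, while a confident wrong one is penalized heavily, which is what makes this loss a natural fit for a sigmoid output.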

In [16]:

# Originally: from keras import optimizers
from tensorflow.keras import optimizers

model.compile(loss="binary_crossentropy",
              optimizer=optimizers.RMSprop(learning_rate=1e-4),  # named `lr` in older Keras versions
              metrics=["acc"])

Data preprocessing

Data must be converted into floating point tensors before being fed into the neural network.

Keras provides an image-processing module: keras.preprocessing.image.

It includes the ImageDataGenerator class, which lets you quickly create Python generators that turn image files on disk into batches of tensors.

How should generators in Python be understood?
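
As a brief illustration: a generator is a function that uses `yield` to hand back one value at a time and resumes where it left off when the next value is requested. The names below are made up for the example.

```python
def count_up(start, stop):
    """Yield the integers start, start + 1, ..., stop - 1 one at a time."""
    current = start
    while current < stop:
        yield current   # pause here until the next value is requested
        current += 1


gen = count_up(0, 3)
first = next(gen)        # runs the body up to the first yield
rest = list(gen)         # consumes the remaining values
print(first, rest)       # 0 [1, 2]
```

Because values are produced on demand, a generator can describe a very long or even endless sequence without holding it all in memory.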

Preprocessing steps

  1. Read the image files
  2. Decode the JPEG content into RGB grids of pixels
  3. Convert the pixel grids into floating-point tensors
  4. Rescale the pixel values from [0, 255] to [0, 1] (the rescale=1./255 argument)

In [18]:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255)  # rescale pixel values to [0, 1]
test_datagen = ImageDataGenerator(rescale=1./255)  # rescale pixel values to [0, 1]

train_generator = train_datagen.flow_from_directory(
    train_dir,  # directory to process
    target_size=(150, 150),  # resize all images to 150x150
    batch_size=20,
    class_mode="binary"  # the loss is binary_crossentropy, so use binary labels
)

validation_generator = test_datagen.flow_from_directory(
    validation_dir,  # directory to process
    target_size=(150, 150),  # resize all images to 150x150
    batch_size=20,
    class_mode="binary"  # the loss is binary_crossentropy, so use binary labels
)

Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.

In [19]:

for data_batch, labels_batch in train_generator:
    print(data_batch.shape)
    print(labels_batch.shape)
    break

(20, 150, 150, 3)
(20,)

The generator yields batches of 150×150 RGB images, with shape (20, 150, 150, 3), and batches of binary labels, with shape (20,). Each batch contains 20 samples (the batch size).

The generator produces these batches endlessly, looping over the images in the target folder again and again, which is why the loop above needs a break.

A Keras model is fitted on a generator with the fit_generator method. Its steps_per_epoch parameter sets how many batches are drawn from the generator before the fit moves on to the next epoch.

In this example there are 2000 training samples and each batch holds 20, so 100 batches are needed per epoch.
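
Why a step count is needed can be seen in a toy version of such a generator. The sketch below is not how `flow_from_directory` is implemented; `endless_batches` is a hypothetical stand-in that loops over sample indices forever, so the caller has to decide when an epoch ends.

```python
import itertools


def endless_batches(n_samples, batch_size):
    """Yield lists of sample indices forever, restarting after each full pass."""
    while True:
        for start in range(0, n_samples, batch_size):
            yield list(range(start, min(start + batch_size, n_samples)))


batches = endless_batches(n_samples=2000, batch_size=20)
steps_per_epoch = 2000 // 20  # 100 batches cover every sample exactly once

# Take one epoch's worth of batches from the endless stream.
epoch = list(itertools.islice(batches, steps_per_epoch))
seen = [i for batch in epoch for i in batch]
next_batch = next(batches)    # the generator simply starts over
print(len(epoch), len(seen), next_batch[:3])
```

After exactly 100 steps every sample index has been seen once, and the next batch starts again from the beginning.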

Fitting the model

In [20]:

history = model.fit_generator(
    train_generator,  # the first argument must be a Python generator
    steps_per_epoch=100,  # 2000 / 20
    epochs=30,  # number of epochs
    validation_data=validation_generator,  # validation data
    validation_steps=50  # 1000 / 20
)

Save the model

In [21]:

model.save("cats_and_dogs_small.h5")

Loss and accuracy curves

In [22]:

import matplotlib.pyplot as plt
%matplotlib inline

In [23]:

history_dict = history.history  # dict of per-epoch metrics
for key in history_dict:
    print(key)

loss
acc
val_loss
val_acc

In [24]:

acc = history_dict["acc"]
val_acc = history_dict["val_acc"]

loss = history_dict["loss"]
val_loss = history_dict["val_loss"]

In [25]:

epochs = range(1, len(acc) + 1)

# acc
plt.plot(epochs, acc, "bo", label="Training acc")
plt.plot(epochs, val_acc, "b", label="Validation acc")
plt.title("Training and Validation acc")
plt.legend()

plt.figure()

# loss
plt.plot(epochs, loss, "bo", label="Training loss")
plt.plot(epochs, val_loss, "b", label="Validation loss")
plt.title("Training and Validation loss")
plt.legend()


Summary: the curves show overfitting

  1. Training accuracy keeps increasing over time and approaches 100%, while validation accuracy stays at about 70%
  2. Validation loss reaches its minimum after 6 epochs and then stays flat for a number of epochs, while training loss keeps decreasing and approaches zero
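
The point where validation loss bottoms out can also be read from the history programmatically. A minimal sketch: `best_epoch` is a hypothetical helper, and the numbers are made-up placeholders, not the values from this training run.

```python
def best_epoch(val_loss):
    """Return the 1-based epoch with the lowest validation loss."""
    lowest = min(range(len(val_loss)), key=lambda i: val_loss[i])
    return lowest + 1


# Placeholder curve (NOT real results): falls, bottoms out, then drifts up.
example_val_loss = [0.69, 0.65, 0.62, 0.60, 0.58, 0.57, 0.58, 0.60, 0.63]
epoch = best_epoch(example_val_loss)
print(epoch)  # 6
```

On the real run, one would call `best_epoch(history.history["val_loss"])`.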

Data augmentation

What is data augmentation?

Data augmentation is another way to combat overfitting. Two other methods are:

  1. Dropout
  2. Weight decay (regularization)

Data augmentation generates more training data from the existing training samples by applying random transformations that still produce believable images.

As a result, the model never sees exactly the same image twice during training.
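
The idea can be shown on a toy "image" without any library. The sketch below applies a random horizontal flip and a random horizontal shift to a small grid of numbers. It only illustrates the principle, with hypothetical helper names, and is unrelated to how Keras implements its transformations.

```python
import random


def horizontal_flip(img):
    """Mirror each row of a 2-D list."""
    return [row[::-1] for row in img]


def shift_right(img, k):
    """Shift each row k pixels right, filling with the nearest edge value."""
    if k == 0:
        return [row[:] for row in img]
    return [[row[0]] * k + row[:-k] for row in img]


def augment(img, rng):
    """Return a randomly flipped and shifted copy of img."""
    out = horizontal_flip(img) if rng.random() < 0.5 else [row[:] for row in img]
    return shift_right(out, rng.randint(0, 1))


toy = [[1, 2, 3],
       [4, 5, 6]]
rng = random.Random(0)  # fixed seed so the run is repeatable
variants = [augment(toy, rng) for _ in range(5)]
for v in variants:
    print(v)
```

Each variant keeps the shape and the overall content of the original, which is why the label (cat or dog) stays valid after the transformation.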

Configuring data augmentation

In [26]:

datagen = ImageDataGenerator(
    rotation_range=40,  # rotation range in degrees (0-180)
    width_shift_range=0.2,  # horizontal/vertical shift range, as a fraction of total width or height
    height_shift_range=0.2,
    shear_range=0.2,  # angle of the random shear transformation
    zoom_range=0.2,  # range of the random zoom
    horizontal_flip=True,  # randomly flip half of the images horizontally
    fill_mode="nearest"  # strategy for filling newly created pixels
)

Displaying augmented images

In [27]:

from keras.preprocessing import image

fnames = [os.path.join(train_cats_dir,fname) for fname in os.listdir(train_cats_dir)]
img_path = fnames[3]

In [28]:

# Read the image and resize it
img = image.load_img(img_path, target_size=(150, 150))
# Convert it to an array of shape (150, 150, 3)
x = image.img_to_array(img)

# Reshape to (1, 150, 150, 3)
x = x.reshape((1,) + x.shape)

i = 0
for batch in datagen.flow(x, batch_size=1):  # yields batches of randomly transformed images
    plt.figure()
    imgplot = plt.imshow(image.array_to_img(batch[0]))
    i += 1
    if i % 4 == 0:
        break  # the generator loops forever, so stop after 4 images

plt.show()

A new convolutional neural network with a Dropout layer

With data augmentation, the network never sees the same input twice during training. The inputs are still highly correlated, however, because they all derive from a small number of original images, so overfitting cannot be eliminated completely.

To reduce overfitting further, add a Dropout layer just before the densely connected classifier.
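
What such a layer does during training can be sketched in a few lines. This is the usual "inverted dropout" formulation written in plain Python for illustration. `dropout` here is a hypothetical function, not the Keras layer, and at inference time the layer passes its input through unchanged.

```python
import random


def dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale survivors by 1/(1-rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(42)          # fixed seed for a repeatable demo
activations = [1.0] * 10000
dropped = dropout(activations, rate=0.5, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(round(zero_fraction, 2), round(mean_after, 2))
```

Roughly half of the values are zeroed and the rest are doubled, so the average activation stays about the same while the network cannot rely on any single unit.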

In [29]:

# Use tf.keras throughout; mixing it with the standalone keras package can fail
from tensorflow.keras import layers, models, optimizers

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation="relu",
                        input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(64, (3, 3), activation="relu"))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(128, (3, 3), activation="relu"))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(128, (3, 3), activation="relu"))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Flatten())

# New: dropout before the densely connected classifier
model.add(layers.Dropout(0.5))

model.add(layers.Dense(512, activation="relu"))
model.add(layers.Dense(1, activation="sigmoid"))

model.compile(loss="binary_crossentropy",
              optimizer=optimizers.RMSprop(learning_rate=1e-4),  # named `lr` in older Keras versions
              metrics=["acc"])

Training the convolutional neural network with data augmentation (resolving an error)

About the error: we have 2000 training images, 1000 validation images, and 1000 test images.

  • With steps_per_epoch=100 and batch_size=32, one epoch asks for 100 × 32 = 3200 images, more than the 2000 training images available.
  • With validation_steps=50 and batch_size=32, validation asks for 50 × 32 = 1600 images, more than the 1000 validation images available.

Therefore, set steps_per_epoch = 2000/32 ≈ 63 and validation_steps = 1000/32 ≈ 32 (both rounded up).
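
The arithmetic behind these numbers, as a small sketch. `steps_for` is a hypothetical helper, and rounding up is an assumption that matches the values 63 and 32 used in this article.

```python
import math


def steps_for(n_samples, batch_size):
    """Number of batches needed to cover n_samples once, rounding up."""
    return math.ceil(n_samples / batch_size)


images_requested = 100 * 32               # steps_per_epoch=100, batch_size=32
train_steps = steps_for(2000, 32)         # 62.5 rounded up
validation_steps = steps_for(1000, 32)    # 31.25 rounded up
print(images_requested, train_steps, validation_steps)  # 3200 63 32
```

Computing the step counts from the dataset size and batch size avoids having to revisit them whenever the batch size changes.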

In [44]:

# Augment the training data
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)

# Validation data must not be augmented
test_datagen = ImageDataGenerator(rescale=1.0 / 255)

train_generator = train_datagen.flow_from_directory(
    train_dir,  # target directory
    target_size=(150, 150),  # resize all images to 150x150
    batch_size=32,
    class_mode="binary"
)

validation_generator = test_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode="binary"
)

# Fit the model; the step counts were changed to resolve the error
history = model.fit_generator(
    train_generator,
    # originally: steps_per_epoch=100
    steps_per_epoch=63,  # 2000 / 32 ≈ 63
    epochs=100,
    validation_data=validation_generator,
    # originally: validation_steps=50
    validation_steps=32  # 1000 / 32 ≈ 32
)

Saving the model:

# Save the model
model.save("cats_and_dogs_small_2.h5")

Loss and accuracy curves

In [46]:

history_dict = history.history  # dictionary form

acc = history_dict["acc"]
val_acc = history_dict["val_acc"]
loss = history_dict["loss"]
val_loss = history_dict["val_loss"]

Plotting code:

epochs = range(1, len(acc) + 1)

# acc
plt.plot(epochs, acc, "bo", label="Training acc")
plt.plot(epochs, val_acc, "b", label="Validation acc")
plt.title("Training and Validation acc")
plt.legend()

plt.figure()

# loss
plt.plot(epochs, loss, "bo", label="Training loss")
plt.plot(epochs, val_loss, "b", label="Validation loss")
plt.title("Training and Validation loss")
plt.legend()

plt.show()

Conclusion: with data augmentation the model no longer overfits, and the training curves closely track the validation curves. The accuracy reaches 81%, which is better than before.