I. Preliminary work

This article implements recognition of characters from the animated series Ling Cage ("Spirit Cage"). Compared with the previous article, this time I use the VGG-19 architecture and add two parts: prediction, and saving and loading the model.

My environment:

  • Language: Python 3.6.5
  • Editor: Jupyter Notebook
  • Deep learning environment: TensorFlow 2.4.1

Recommended Reading:

  • 100 Examples of Deep Learning – Convolutional neural network (CNN) for MNIST handwritten digit recognition | Day 1
  • 100 Examples of Deep Learning – Convolutional neural network (CNN) for color image classification | Day 2
  • 100 Examples of Deep Learning – Convolutional neural network (CNN) for weather recognition | Day 5
  • 100 Examples of Deep Learning – Convolutional neural network (VGG-16) for recognizing the One Piece Straw Hat crew | Day 6

From the column: 100 Examples of Deep Learning

1. Set the GPU

You can skip this step if you are using a CPU

import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")

if gpus:
    tf.config.experimental.set_memory_growth(gpus[0], True)  # Allocate GPU memory on demand
    tf.config.set_visible_devices([gpus[0]], "GPU")

2. Import data

import matplotlib.pyplot as plt
# Support Chinese characters in plots
plt.rcParams['font.sans-serif'] = ['SimHei']  # Display Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False  # Display the minus sign correctly

import os,PIL

# Set random seeds to reproduce the results as much as possible
import numpy as np
np.random.seed(1)

# Set random seeds to reproduce the results as much as possible
import tensorflow as tf
tf.random.set_seed(1)

from tensorflow import keras
from tensorflow.keras import layers,models

import pathlib
data_dir = "D:/jupyter notebook/DL-100-days/datasets/linglong_photos"

data_dir = pathlib.Path(data_dir)

3. View data

The dataset contains six characters: Bai Yuekui, Charles, Hongkou, Mark, Morgan, and Ran Bing.

folder       character     number of images
baiyuekui    Bai Yuekui    40
chaersi      Charles       76
hongkou      Hongkou       36
make         Mark          38
mogen        Morgan        30
ranbing      Ran Bing      60

image_count = len(list(data_dir.glob('*/*')))

print("The total number of pictures is:", image_count)

The total number of pictures is: 280
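The same glob pattern can also produce the per-class counts listed in the table. The following is a minimal sketch, assuming one sub-folder per character directly under the data directory; the helper name count_images_per_class is hypothetical and not part of the original notebook.

```python
import pathlib
from collections import Counter


def count_images_per_class(root):
    """Return {class folder name: number of files} for a root/class/file layout."""
    counts = Counter()
    for path in pathlib.Path(root).glob("*/*"):
        if path.is_file():
            # The parent folder name is the class label, e.g. "baiyuekui".
            counts[path.parent.name] += 1
    return dict(sorted(counts.items()))
```

Calling count_images_per_class(data_dir) on the dataset above should give 40, 76, 36, 38, 30 and 60 images for the six folders, which add up to the 280 reported.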

II. Data preprocessing

1. Load data

Use the image_dataset_from_directory method to load the data from disk into a tf.data.Dataset.

batch_size = 16
img_height = 224
img_width = 224

# For details on image_dataset_from_directory(), see: https://mtyjkh.blog.csdn.net/article/details/117018789
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.1,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
Found 280 files belonging to 6 classes.
Using 252 files for training.

# For details on image_dataset_from_directory(), see: https://mtyjkh.blog.csdn.net/article/details/117018789
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.1,
    subset="validation",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
Found 280 files belonging to 6 classes.
Using 28 files for validation.
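The 252/28 split reported above follows from validation_split=0.1 applied to 280 files. A small sketch of the arithmetic, assuming the validation subset takes the integer part of total × validation_split and training takes the rest (the helper split_sizes is hypothetical):

```python
def split_sizes(total, validation_split):
    """Return (training count, validation count) under the assumed rule."""
    n_val = int(total * validation_split)  # integer part goes to validation
    return total - n_val, n_val


train_count, val_count = split_sizes(280, 0.1)
print(train_count, val_count)  # 252 28
```

Both calls use the same seed (123), which is what keeps the training and validation subsets from overlapping.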

class_names gives the labels of the dataset. The labels correspond to the directory names in alphabetical order.

class_names = train_ds.class_names
print(class_names)
['baiyuekui', 'chaersi', 'hongkou', 'make', 'mogen', 'ranbing']
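To make the alphabetical correspondence concrete, here is a small sketch that sorts the folder names and assigns each an integer label. The unsorted folders list and the label_of dictionary are illustrative names, not part of the original code.

```python
# Folder names in arbitrary order, as a file system might return them.
folders = ["ranbing", "make", "baiyuekui", "mogen", "hongkou", "chaersi"]

# Sorting gives the order of class_names; the position is the integer label.
class_names = sorted(folders)
label_of = {name: index for index, name in enumerate(class_names)}

print(class_names)
print(label_of["make"])  # 3
```

The integer labels in each batch (labels_batch) index into this list, which is why class_names[labels[i]] recovers the folder name.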

2. Visualize data

plt.figure(figsize=(10, 5))  # The figure is 10 wide and 5 high
plt.suptitle("Wechat official Account: STUDENT K")

for images, labels in train_ds.take(1):
    for i in range(8):

        ax = plt.subplot(2, 4, i + 1)

        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])

        plt.axis("off")

plt.imshow(images[1].numpy().astype("uint8"))

3. Check the data again

for image_batch, labels_batch in train_ds:
    print(image_batch.shape)
    print(labels_batch.shape)
    break
(16, 224, 224, 3)
(16,)

  • image_batch is a tensor of shape (16, 224, 224, 3): a batch of 16 images, each of shape 224x224x3 (the last dimension is the RGB color channels).
  • labels_batch is a tensor of shape (16,); these are the labels of the 16 images.
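The first dimension is the batch size, so the number of batches per epoch follows from the dataset size. A minimal sketch for the 252 training images and batch_size = 16, assuming the last batch is kept even when it is smaller (the helper batch_sizes is hypothetical):

```python
def batch_sizes(num_samples, batch_size):
    """List the size of every batch when the final partial batch is kept."""
    full, rest = divmod(num_samples, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


sizes = batch_sizes(252, 16)
print(len(sizes), sizes[-1])  # 16 12
```

This gives 16 steps per epoch, with a final batch of only 12 images, so the leading dimension of a batch is not always 16.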

4. Configure the data set

  • shuffle(): shuffles the data. For a detailed introduction to this function, see: zhuanlan.zhihu.com/p/42417456
  • prefetch(): prefetches data to speed up training. The process is described in my previous two articles.
  • cache(): caches the dataset in memory to speed up training.

AUTOTUNE = tf.data.AUTOTUNE

train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
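The idea behind a buffered shuffle can be sketched in plain Python. This is an illustration of the general technique, not TensorFlow's implementation; the function buffered_shuffle and its seed argument are hypothetical.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in random order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        index = rng.randrange(buffer_size)
        yield buffer[index]      # emit a random buffered element
        buffer[index] = item     # and replace it with the new one
    rng.shuffle(buffer)          # input exhausted: drain the rest randomly
    yield from buffer


print(list(buffered_shuffle(range(16), buffer_size=1000, seed=1)))
```

When the buffer is at least as large as the dataset, as with shuffle(1000) here, every element is buffered before anything is emitted and the result is a full shuffle. With a buffer of size 1 the original order is preserved.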

5. Normalization

normalization_layer = layers.experimental.preprocessing.Rescaling(1./255)

normalization_train_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
val_ds = val_ds.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(val_ds))
first_image = image_batch[0]

# View normalized data
print(np.min(first_image), np.max(first_image))
0.00390696 1.0

III. Build the VGG-19 network

Choose either the official model or the self-built model and comment out the other; both are genuine VGG-19.

Advantages and disadvantages of VGG:

  • Advantages

The structure of VGG is very simple: the whole network uses the same convolution kernel size (3×3) and the same max-pooling size (2×2).

  • Disadvantages

1) Training takes a long time and tuning is difficult. 2) It needs a lot of storage, which hinders deployment. For example, the VGG-16 weight file is more than 500 MB, which makes it unsuitable for embedded systems.
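The 500 MB figure can be checked with a rough calculation. The sketch below assumes the standard 1000-class VGG-16 parameter count (138,357,544, a value not stated in this article) and 4 bytes per 32-bit float weight; it ignores any file-format overhead.

```python
VGG16_PARAMS = 138_357_544  # assumption: standard 1000-class VGG-16
BYTES_PER_FLOAT32 = 4

size_bytes = VGG16_PARAMS * BYTES_PER_FLOAT32
size_mib = size_bytes / 2**20  # bytes -> MiB

print(f"{size_mib:.1f} MiB")
```

The weights alone come to roughly 528 MiB, which is consistent with the "more than 500 MB" stated above.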

1. Official model (packaged)

I'll cover calling the official model in later articles; below I focus on VGG-19 itself.

# model = keras.applications.VGG19(weights='imagenet')
# model.summary()

2. Self-built model

from tensorflow.keras import layers, models, Input
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout

def VGG19(nb_classes, input_shape):
    input_tensor = Input(shape=input_shape)
    # 1st block
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(input_tensor)
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)
    # 2nd block
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)
    # 3rd block
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv4')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)
    # 4th block
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv4')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)
    # 5th block
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv4')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)
    # full connection
    x = Flatten()(x)
    x = Dense(4096, activation='relu',  name='fc1')(x)
    x = Dense(4096, activation='relu', name='fc2')(x)
    output_tensor = Dense(nb_classes, activation='softmax', name='predictions')(x)

    model = Model(input_tensor, output_tensor)
    return model

model=VGG19(1000, (img_width, img_height, 3))
model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 224, 224, 3)]     0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_conv4 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_conv4 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv4 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000
=================================================================
Total params: 143,667,240
Trainable params: 143,667,240
Non-trainable params: 0
_________________________________________________________________
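The parameter counts in the summary can be reproduced by hand. The sketch below assumes 3×3 kernels with one bias per filter for the convolutions and one bias per unit for the dense layers; the helper names (conv_params, dense_params, vgg19_params) are illustrative and not part of the notebook.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights (kernel x kernel x in_channels) plus one bias, per output filter."""
    return (kernel * kernel * in_channels + 1) * out_channels


def dense_params(in_units, out_units):
    """One weight per input plus one bias, per output unit."""
    return (in_units + 1) * out_units


def vgg19_params(nb_classes=1000, input_channels=3, flat_units=7 * 7 * 512):
    # (filters, number of conv layers) for the five blocks
    blocks = [(64, 2), (128, 2), (256, 4), (512, 4), (512, 4)]
    total, channels = 0, input_channels
    for filters, n_convs in blocks:
        for _ in range(n_convs):
            total += conv_params(channels, filters)
            channels = filters
    units_in = flat_units
    for units_out in (4096, 4096, nb_classes):  # fc1, fc2, predictions
        total += dense_params(units_in, units_out)
        units_in = units_out
    return total


print(conv_params(3, 64))  # 1792, matches block1_conv1
print(vgg19_params())      # 143667240, matches "Total params"
```

Most of the parameters sit in fc1: 25088 × 4096 + 4096 = 102,764,544, about 72% of the total.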

3. Network structure diagram

For background on how convolutions are calculated, see: mtyjkh.blog.csdn.net/article/det…

Structure description:

  • The 16 convolutional layers are named blockX_convX
  • The 3 fully connected layers are named fcX and predictions
  • The 5 pooling layers are named blockX_pool

VGG-19 contains 19 weight layers (16 convolutional layers and 3 fully connected layers), hence the name VGG-19.
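The layer count and the feature-map sizes in the summary can be traced with a few lines of arithmetic. This sketch assumes 'same' padding for the convolutions (so they keep the spatial size) and 2×2 pooling with stride 2 and no padding; pooled_size is a hypothetical helper.

```python
def pooled_size(size, pool=2, stride=2):
    """Output size of a pooling layer without padding."""
    return (size - pool) // stride + 1


size = 224
sizes = []
for _ in range(5):  # one pooling layer at the end of each block
    size = pooled_size(size)
    sizes.append(size)

flat_units = size * size * 512            # input width of fc1
weight_layers = sum([2, 2, 4, 4, 4]) + 3  # conv layers + fully connected layers

print(sizes)          # [112, 56, 28, 14, 7]
print(flat_units)     # 25088
print(weight_layers)  # 19
```

The five pooling layers halve the 224×224 input down to 7×7, which is why the Flatten layer outputs 7 × 7 × 512 = 25088 values.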

IV. Compile

Before the model is ready for training, it needs a few more settings. These are added in the model's compile step:

  • Loss function: measures how accurate the model is during training.
  • Optimizer: determines how the model is updated based on the data it sees and its loss function.
  • Metrics: used to monitor the training and testing steps. The following example uses accuracy, the fraction of images that are correctly classified.

# Set optimizer, I changed the learning rate here.
opt = tf.keras.optimizers.Nadam(learning_rate=1e-5)

model.compile(optimizer=opt,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
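To illustrate what the loss and the metric measure, here is a minimal pure-Python sketch of sparse categorical cross-entropy and accuracy for single examples. It is an illustration of the formulas, not Keras code, and the function names are chosen for this example. It assumes the loss receives probabilities; with from_logits=True, Keras instead treats the model outputs as raw scores and applies the softmax itself.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def sparse_categorical_crossentropy(probabilities, label):
    """Negative log-probability assigned to the true integer label."""
    return -math.log(probabilities[label])


def accuracy(predicted_labels, true_labels):
    """Fraction of predictions that equal the true label."""
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


uniform = softmax([0.0] * 6)  # a model with no preference among 6 classes
print(sparse_categorical_crossentropy(uniform, 2))  # ln(6)
print(accuracy([0, 1, 2], [0, 1, 1]))
```

The labels are plain integers (0 to 5 here), which is why the "sparse" variant of the loss is used rather than one that expects one-hot vectors.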

V. Train the model

epochs = 10

history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs
)

Epoch 1/10
16/16 [==============================] - 12s 276ms/step - loss: 5.4474 - accuracy: 0.1501 - val_loss: 6.8601 - val_accuracy: 0.0714
Epoch 2/10
16/16 [==============================] - 2s 133ms/step - loss: 1.7873 - accuracy: 0.3191 - val_loss: 6.8396 - val_accuracy: 0.4643
Epoch 3/10
16/16 [==============================] - 2s 137ms/step - loss: 1.4631 - accuracy: 0.4250 - val_loss: 6.8453 - val_accuracy: 0.5714
Epoch 4/10
16/16 [==============================] - 2s 130ms/step - loss: 1.1500 - accuracy: 0.6090 - val_loss: 6.8554 - val_accuracy: 0.3571
Epoch 5/10
16/16 [==============================] - 2s 130ms/step - loss: 1.0349 - accuracy: 0.6292 - val_loss: 6.8421 - val_accuracy: 0.4643
Epoch 6/10
16/16 [==============================] - 2s 131ms/step - loss: 1.0131 - accuracy: 0.5919 - val_loss: 6.8288 - val_accuracy: 0.5714
Epoch 7/10
16/16 [==============================] - 2s 131ms/step - loss: 0.6961 - accuracy: 0.7776 - val_loss: 6.8388 - val_accuracy: 0.6429
Epoch 8/10
16/16 [==============================] - 2s 130ms/step - loss: 0.3716 - accuracy: 0.8975 - val_loss: 6.8132 - val_accuracy: 0.5714
Epoch 9/10
16/16 [==============================] - 2s 130ms/step - loss: 0.3372 - accuracy: 0.8586 - val_loss: 6.8059 - val_accuracy: 0.6071
Epoch 10/10
16/16 [==============================] - 2s 130ms/step - loss: (missing) - accuracy: 0.9736 - val_loss: 6.7767 - val_accuracy: 0.8929

VI. Model evaluation

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs_range = range(epochs)

plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.suptitle("Wechat official Account: STUDENT K")

plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()

To stay faithful to the original VGG-19, this article does not modify the model's parameters. In practice, you can adjust them to suit your task and improve the classification results.

Compared with the previous article, 100 Examples of Deep Learning – Convolutional neural network (VGG-16) for recognizing the One Piece Straw Hat crew | Day 6, I made three changes:

  • Changed the model from VGG-16 to VGG-19.
  • Changed the learning_rate from 1e-4 to 1e-5.
  • Changed the dataset.

Are you starting to see a pattern?

It doesn't matter if you aren't; I'll explain these points one by one later. For now, this is just a first taste.

VII. Save and load the model

This is the simplest way to save and load a model:

# Save model
model.save('model/my_model.h5')
# Load model
new_model = keras.models.load_model('model/my_model.h5')

VIII. Prediction

# Use the loaded model (new_model) to see the prediction results

plt.figure(figsize=(10, 5))  # The figure is 10 wide and 5 high
plt.suptitle("Wechat official Account: STUDENT K")

for images, labels in val_ds.take(1):
    for i in range(8):
        ax = plt.subplot(2, 4, i + 1)

        # Display the image
        plt.imshow(images[i])

        # Add a batch dimension to the image
        img_array = tf.expand_dims(images[i], 0)

        # Use the model to predict the character in the picture
        predictions = new_model.predict(img_array)
        plt.title(class_names[np.argmax(predictions)])

        plt.axis("off")
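The last step, turning a vector of scores into a character name, can be sketched without TensorFlow. The scores below are made-up numbers for illustration, and predicted_class is a hypothetical helper; it also checks that the winning index actually falls within class_names, which only holds when the model's output size matches the number of classes.

```python
class_names = ['baiyuekui', 'chaersi', 'hongkou', 'make', 'mogen', 'ranbing']


def predicted_class(scores, names):
    """Return the name whose position matches the highest score."""
    best = max(range(len(scores)), key=lambda index: scores[index])
    if best >= len(names):
        raise ValueError(f"predicted index {best} has no matching class name")
    return names[best]


# Made-up scores for one image, one value per class.
example_scores = [0.05, 0.10, 0.60, 0.10, 0.10, 0.05]
print(predicted_class(example_scores, class_names))  # hongkou
```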

This VGG-19 article actually hides quite a few pitfalls. I hid them rather cleverly, so I wonder whether you have spotted them. Feel free to discuss what you find in the comments below. For a perfectionist, these flaws are hard to look at; let's see whether I can write a couple of articles about them.



To get the dataset, search for [K同学啊] (Student K) on WeChat, follow the account, and reply [DL+7].