1. VGG model architecture

VGG was developed by the Visual Geometry Group at Oxford University. It comes in two main versions, VGG16 and VGG19, with 16 and 19 weight layers respectively; this article covers only VGG16. According to the paper published on arxiv.org, VGG uses 3×3 convolution kernels, 2×2 max pooling windows, ReLU activations in the hidden layers, and Softmax in the output layer. If we know the input and output shape of every layer, we can build VGG ourselves with Keras. Fortunately, we don't need to dig these parameter details out of an obscure paper: Keras can give us the full specification of the model.

Create a VGG16 model using Keras and load the weights trained on ImageNet:

from keras.applications.vgg16 import VGG16

VGG16_model = VGG16(weights='imagenet')
Using TensorFlow backend.
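The VGG16() constructor accepts a few other useful arguments as well. A minimal sketch, assuming only the keras.applications API already imported above:

# weights=None gives the same architecture with randomly initialized
# weights -- effectively what we will rebuild by hand in section 2
untrained_model = VGG16(weights=None)

# include_top=False drops the fully connected classifier head,
# a common starting point for transfer learning
conv_base = VGG16(weights='imagenet', include_top=False)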

Since this is a Keras model, can we use the summary() method to view its architecture, just as we would for a model we built ourselves? The answer is yes.

VGG16_model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________
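As a quick sanity check on the Param # column, each count follows directly from the layer shapes. A small illustration (the formulas are standard; the printed values match the table above):

# Conv2D params = kernel_h * kernel_w * input_channels * filters + filters (biases)
print(3 * 3 * 3 * 64 + 64)      # block1_conv1 -> 1792
print(3 * 3 * 64 * 64 + 64)     # block1_conv2 -> 36928

# Dense params = input_units * output_units + output_units (biases)
print(25088 * 4096 + 4096)      # fc1 -> 102764544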

With this printout, we have everything we need to build a VGG16 model by hand. So roll up your sleeves and do it!

2. Build VGG16 from scratch

This article builds the model with the Keras functional API, but you can also use the Sequential model; a sketch of that approach follows below, and you can try it out for yourself.
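For reference, here is a minimal sketch of what the Sequential version could look like (the compact loop over blocks is a shorthand assumed here, not code from the sections below):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
# (number of conv layers, filters) for each of the five VGG16 blocks
for i, (n_convs, filters) in enumerate([(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]):
    for j in range(n_convs):
        if i == 0 and j == 0:
            # the very first layer must declare the input shape
            model.add(Conv2D(filters, (3, 3), padding='same',
                             activation='relu', input_shape=(224, 224, 3)))
        else:
            model.add(Conv2D(filters, (3, 3), padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=2))
model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dense(4096, activation='relu'))
model.add(Dense(1000, activation='softmax'))

The rest of this article sticks to the functional API.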

2.1 Import Keras model and layer

From the model architecture printed above, we can see that VGG16 uses Conv2D, MaxPooling2D, Flatten, and Dense layers, so we import these from keras.layers.

from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

2.2 Design model layer

VGG16 contains 13 convolutional layers and 3 fully connected layers (the last of which is the output layer) — 16 layers with trainable parameters in total, which is what the 16 in VGG16 refers to. There are also five max pooling layers and one flatten layer, but these have no parameters or weights, so VGG does not count them in the layer total.
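You can verify this count programmatically: only the layers that actually carry weights contribute. A quick check, run against the VGG16_model loaded earlier:

# Conv2D and Dense layers have weights; InputLayer, MaxPooling2D and Flatten do not
print(sum(1 for layer in VGG16_model.layers if layer.weights))  # 16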

# input layer
inputs = Input(shape=(224, 224, 3))

# convolutional and max pooling layers
conv1 = Conv2D(64, (3, 3), padding='same', activation='relu')(inputs)
conv2 = Conv2D(64, (3, 3), padding='same', activation='relu')(conv1)
pool1 = MaxPooling2D(pool_size=2)(conv2)

conv3 = Conv2D(128, (3, 3), padding='same', activation='relu')(pool1)
conv4 = Conv2D(128, (3, 3), padding='same', activation='relu')(conv3)
pool2 = MaxPooling2D(pool_size=2)(conv4)

conv5 = Conv2D(256, (3, 3), padding='same', activation='relu')(pool2)
conv6 = Conv2D(256, (3, 3), padding='same', activation='relu')(conv5)
conv7 = Conv2D(256, (3, 3), padding='same', activation='relu')(conv6)
pool3 = MaxPooling2D(pool_size=2)(conv7)

conv8 = Conv2D(512, (3, 3), padding='same', activation='relu')(pool3)
conv9 = Conv2D(512, (3, 3), padding='same', activation='relu')(conv8)
conv10 = Conv2D(512, (3, 3), padding='same', activation='relu')(conv9)
pool4 = MaxPooling2D(pool_size=2)(conv10)

conv11 = Conv2D(512, (3, 3), padding='same', activation='relu')(pool4)
conv12 = Conv2D(512, (3, 3), padding='same', activation='relu')(conv11)
conv13 = Conv2D(512, (3, 3), padding='same', activation='relu')(conv12)
pool5 = MaxPooling2D(pool_size=2)(conv13)

# flatten layer
flat = Flatten()(pool5)

# fully connected layers
fc1 = Dense(4096, activation='relu')(flat)
fc2 = Dense(4096, activation='relu')(fc1)

# output layer (softmax over the 1000 ImageNet categories)
outputs = Dense(1000, activation='softmax')(fc2)

2.3 Create and Preview the model

Having printed the VGG16 model architecture with the functional API above, and having designed each layer's parameters and the connections between layers accordingly, you can now create your own VGG16 model with Model(), specifying the inputs and outputs arguments.

my_VGG16_model = Model(inputs=inputs, outputs=outputs)

Use the summary() method to look at your VGG16 model and see if it has the same structure as the previous model.

my_VGG16_model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         (None, 224, 224, 3)       0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 224, 224, 64)      1792
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 224, 224, 64)      36928
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 112, 112, 64)      0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 112, 112, 128)     73856
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 112, 112, 128)     147584
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 56, 56, 128)       0
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 56, 56, 256)       295168
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 56, 56, 256)       590080
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 56, 56, 256)       590080
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 28, 28, 256)       0
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 28, 28, 512)       1180160
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 28, 28, 512)       2359808
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 28, 28, 512)       2359808
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 14, 14, 512)       0
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 14, 14, 512)       2359808
_________________________________________________________________
conv2d_12 (Conv2D)           (None, 14, 14, 512)       2359808
_________________________________________________________________
conv2d_13 (Conv2D)           (None, 14, 14, 512)       2359808
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 7, 7, 512)         0
_________________________________________________________________
flatten_1 (Flatten)          (None, 25088)             0
_________________________________________________________________
dense_1 (Dense)              (None, 4096)              102764544
_________________________________________________________________
dense_2 (Dense)              (None, 4096)              16781312
_________________________________________________________________
dense_3 (Dense)              (None, 1000)              4097000
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________
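The layer names differ (conv2d_1 instead of block1_conv1, and so on) because Keras names unnamed layers automatically, but the shapes and parameter counts line up exactly. A quick check of this, assuming both models are still in scope:

# layer-by-layer comparison of the two architectures
for ours, theirs in zip(my_VGG16_model.layers, VGG16_model.layers):
    assert ours.output_shape == theirs.output_shape
    assert ours.count_params() == theirs.count_params()
print('Architectures match')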

3. Classify images using models

3.1 Image preprocessing

To classify images with VGG16, we first need to preprocess each image into a tensor. The helper function below does exactly this: given an image's storage path, it returns a 4-dimensional tensor (a NumPy multidimensional array) that the VGG16 model can process.

import numpy as np
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input

def path_to_tensor(img_path):
    # PIL loads the RGB image as a PIL.Image.Image object
    img = image.load_img(img_path, target_size=(224, 224))
    # convert the PIL.Image.Image to a 3-dimensional tensor of shape (224, 224, 3)
    x = image.img_to_array(img)
    # convert the 3-dimensional tensor to a 4-dimensional tensor of shape (1, 224, 224, 3)
    tensor = np.expand_dims(x, axis=0)
    # apply VGG-specific preprocessing to the tensor
    return preprocess_input(tensor)
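A quick usage check, assuming an image file such as the dog.jpeg used later is present locally:

print(path_to_tensor('dog.jpeg').shape)  # (1, 224, 224, 3)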

3.2 Using models to predict classification

The raw output of the model is a NumPy array of length 1000, holding the predicted probability of each of the 1000 categories. We don't need the probability of every category here — only the most probable one, which we take as the model's prediction. Therefore, model_predict() applies numpy.argmax() to get the index of the most likely category (which is also the image's category label) before returning.

def model_predict(model, img_path):
    tensor = path_to_tensor(img_path)
    predict_label = model.predict(tensor)
    return np.argmax(predict_label)
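If you want to inspect more than the single best guess, Keras also ships a decode_predictions helper for its ImageNet models that maps the raw probability vector to human-readable labels. Note that it needs the full 1000-element output, not the argmax index, and the first call may download a small class-index file:

from keras.applications.vgg16 import decode_predictions

probs = VGG16_model.predict(path_to_tensor('dog.jpeg'))
# list of (class_id, class_name, probability) triples for the top 3 classes
print(decode_predictions(probs, top=3))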

3.3 Comparison test

Use the VGG16_model loaded earlier and our hand-built my_VGG16_model to predict the same image.

img_path = 'dog.jpeg'

print('VGG16_model predict label: {}'.format(model_predict(VGG16_model, img_path)))
print('my_VGG16_model predict label: {}'.format(model_predict(my_VGG16_model, img_path)))
VGG16_model predict label: 245
my_VGG16_model predict label: 788

We find that our hand-built VGG16 predicts differently from the VGG16 loaded by Keras. What went wrong? In fact, we have only created a model with the same architecture as VGG16; it has never been trained, so its weights are randomly initialized, whereas the loaded VGG16 comes with weights pre-trained on the ImageNet dataset. Now that the problem is clear, all we have to do is set our model's weights to match VGG16's. Let's try it.

3.4 Setting Weights

Keras models provide the get_weights() and set_weights() methods to read and set a model's weights, respectively. So the natural move is to get the weights of VGG16_model and use them to set the weights of my_VGG16_model.

weights = VGG16_model.get_weights()

my_VGG16_model.set_weights(weights)
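If you want to be thorough, you can confirm the copy worked before predicting again. A small check:

# every weight array in the two models should now be identical
print(all(np.array_equal(a, b)
          for a, b in zip(VGG16_model.get_weights(), my_VGG16_model.get_weights())))  # True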

All is well. Next, run the previous code again to see if the predictions of the two models are consistent.

print('VGG16_model predict label: {}'.format(model_predict(VGG16_model, img_path)))
print('my_VGG16_model predict label: {}'.format(model_predict(my_VGG16_model, img_path)))
VGG16_model predict label: 245
my_VGG16_model predict label: 245

Awesome! Everything works as expected: both models now output the same value, 245. So what does 245 mean? As mentioned earlier, it is an index into the 1000 common categories defined by ImageNet. Imagine a dictionary whose keys are the labels our model predicts and whose values are the text names of the categories. Where can we find that dictionary? Here: gist.github.com/yrevar/942d…

3.5 Actual Category Name

The link above provides a .pkl file containing the dictionary, which can be loaded with pickle. Download the file locally and load it with pickle.load() to get the dictionary we need.

Note: when saving the downloaded file, the browser may automatically append a .txt suffix, giving the full file name imagenet1000_clsid_to_human.pkl.txt; the author removed this suffix manually.

import pickle

with open('imagenet1000_clsid_to_human.pkl', 'rb') as f:
    cat1000 = pickle.load(f)
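A quick sanity check on what was loaded:

print(len(cat1000))  # 1000 categories, keyed by integer label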

Now, finally: what is the 245 our model predicted?

cat1000[245]
'French bulldog'

'French bulldog' — so the model says our image is a French bulldog. From the file name 'dog.jpeg' we can tell it is at least half right: it has successfully identified a dog, whether or not it really is a French bulldog.

At this point, isn’t it time to give a round of applause to the model we built ourselves? ^^

3.6 Friendlier image classification functions

Now that we have verified that the VGG model we just built works, and we have the text dictionary of 1000 categories, we may as well wrap everything up to make the model easier to use. All we need is a tidy function that takes an image path img_path and returns the text name of the image's category. If your English is as limited as the author's, Google Translate or Baidu Translate will come in handy for the category names.

def imgcate(img_path):
    label = model_predict(my_VGG16_model, img_path)
    return cat1000[label]

Readers can experiment with various images and experience the fun of AI.

print(imgcate('dog.jpeg'))
print(imgcate('xx.jpeg'))
French bulldog
maillot, tank suit


