Contents

2.0 Brief introduction to convolutional neural networks

2.1 Two-dimensional convolution layer

2.1.1. Two-dimensional cross-correlation operation

2.1.2. Object edge detection in images

2.1.3. Edge detection analysis with a VGGNet example


2.0 Brief introduction to convolutional neural networks

This chapter introduces convolutional neural networks, which have been the foundation of deep learning in computer vision in recent years and are increasingly used in other fields such as natural language processing, recommendation systems, and speech recognition. We will first describe the working principles of the convolutional layer and the pooling layer, and explain the meaning of padding, stride, input channels, and output channels. After mastering these basics, we will explore the design ideas behind several representative deep convolutional neural networks: the pioneering AlexNet, as well as later networks built from repeating elements (VGG), networks of networks (NiN), networks with parallel branches (GoogLeNet), residual networks (ResNet), and densely connected networks (DenseNet). Many of these have excelled over the past few years in the well-known ImageNet computer vision competition. Although deep models may simply look like neural networks with many layers, obtaining an effective deep model is not easy. Fortunately, batch normalization and residual networks, both described in this chapter, provide two important ideas for training and designing deep models.

2.1 Two-dimensional convolution layer

A convolutional neural network is a neural network that contains convolutional layers. The networks introduced in this chapter all use the most common kind, the two-dimensional convolutional layer. It has two spatial dimensions, height and width, and is often used to process image data. In this section, we introduce how a two-dimensional convolutional layer works in its simplest form.

2.1.1. Two-dimensional cross-correlation operation

Although the convolution layer is named after the convolution operation, we usually use the more intuitive cross-correlation operation in the convolution layer. In a two-dimensional convolution layer, a two-dimensional input array and a two-dimensional kernel array produce a two-dimensional output array through a cross-correlation operation. Let's use a concrete example to illustrate the meaning of the two-dimensional cross-correlation operation. As shown in Figure 2.1, the input is a two-dimensional array with a height of 3 and a width of 3. We call the shape of this array 3×3 or (3, 3). The height and width of the kernel array are both 2. This array is also called a convolution kernel or filter in convolution computations. The shape of the convolution kernel window (also known as the convolution window) is given by the height and width of the convolution kernel, that is, 2×2. The shaded part in Figure 2.1 shows the first output element together with the input and kernel array elements used to calculate it: 0×0 + 1×1 + 3×2 + 4×3 = 19.

Figure 2.1 two-dimensional cross-correlation operation

In the two-dimensional cross-correlation operation, the convolution window starts at the top left of the input array and slides over the input array from left to right and from top to bottom. When the convolution window slides to a certain position, the input subarray inside the window and the kernel array are multiplied element-wise and summed, giving the element at the corresponding position in the output array. The height and width of the output array in Figure 2.1 are both 2, and its four elements are calculated by two-dimensional cross-correlation:

0×0 + 1×1 + 3×2 + 4×3 = 19,

1×0 + 2×1 + 4×2 + 5×3 = 25,

3×0 + 4×1 + 6×2 + 7×3 = 37,

4×0 + 5×1 + 7×2 + 8×3 = 43.
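The four calculations above all follow the same pattern. In general, for an input X of shape (h, w) and a kernel K of shape (h1, w1), each output element is (using zero-based indices, consistent with the code below):

```latex
Y_{i,j} \;=\; \sum_{a=0}^{h_1-1} \sum_{b=0}^{w_1-1} X_{i+a,\;j+b} \cdot K_{a,b},
\qquad 0 \le i \le h - h_1, \;\; 0 \le j \le w - w_1,
```

so the output Y has shape (h − h1 + 1, w − w1 + 1).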

Let's implement the above process in a corr2d function. It takes an input array X and a kernel array K and returns the output array Y.

Code:

import numpy as np

def corr2d(X, K):
    """Compute the 2D cross-correlation of input X with kernel K."""
    h, w = X.shape
    h1, w1 = K.shape
    Y = np.zeros((h - h1 + 1, w - w1 + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # Multiply the current window element-wise with the kernel and sum
            Y[i, j] = np.sum(X[i:i+h1, j:j+w1] * K)
    return Y


X = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
K = np.array([[0, 1], [2, 3]])

Y = corr2d(X, K)
print(Y)

Output:

[[19. 25.]
 [37. 43.]]
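As a quick sanity check (an addition not in the original text, assuming SciPy is available), our corr2d should agree with scipy.signal.correlate2d in 'valid' mode, which likewise slides the kernel only over positions fully inside the input:

```python
import numpy as np
from scipy.signal import correlate2d

X = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
K = np.array([[0, 1], [2, 3]])

# 'valid' mode produces an output of shape (h - h1 + 1, w - w1 + 1),
# exactly matching the hand-written corr2d above
Y = correlate2d(X, K, mode='valid')
print(Y)
```

Note that correlate2d performs cross-correlation directly (no kernel flip), whereas scipy.signal.convolve2d would first flip the kernel.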

2.1.2. Object edge detection in images

Let's look at a simple application of the convolution layer: detecting the edges of an object in an image, that is, finding the positions where pixel values change. First we construct a 6×8 image (that is, an image with a height of 6 pixels and a width of 8 pixels). Its middle four columns are black (0) and the rest are white (1).

In [4]:
import numpy as np
X = np.ones((6, 8))
X[:, 2:6] = 0
X
Out[4]:
[[1. 1. 0. 0. 0. 0. 1. 1.]
 [1. 1. 0. 0. 0. 0. 1. 1.]
 [1. 1. 0. 0. 0. 0. 1. 1.]
 [1. 1. 0. 0. 0. 0. 1. 1.]
 [1. 1. 0. 0. 0. 0. 1. 1.]
 [1. 1. 0. 0. 0. 0. 1. 1.]]

Then we construct a convolution kernel K with a height of 1 and a width of 2. When it is cross-correlated with the input, the output is 0 wherever the horizontally adjacent elements are the same; otherwise the output is non-zero.

In [5]:
K = np.array([[1, -1]])

Now let's cross-correlate the input X with the convolution kernel K we designed. As you can see, the white-to-black edge is detected as 1 and the black-to-white edge as -1. The rest of the output is all zeros.

In [6]:
Y = corr2d(X, K)
Y
Out[6]:
[[ 0.  1.  0.  0.  0. -1.  0.]
 [ 0.  1.  0.  0.  0. -1.  0.]
 [ 0.  1.  0.  0.  0. -1.  0.]
 [ 0.  1.  0.  0.  0. -1.  0.]
 [ 0.  1.  0.  0.  0. -1.  0.]
 [ 0.  1.  0.  0.  0. -1.  0.]]

Thus, we can see that by applying the same convolution kernel repeatedly across the input, the convolution layer can effectively represent local patterns, here extracting the edge information of the image. Stacking multiple convolution layers convolves the outputs of the previous layer, combining simple edges into more complex boundary patterns. Now let's examine what each convolution layer learns through VGGNet.
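One more observation worth checking (a small sketch reusing the corr2d logic defined above): the 1×2 kernel K only responds to horizontal changes. If we transpose X, its vertical edges become horizontal ones, and K can no longer detect anything:

```python
import numpy as np

def corr2d(X, K):
    # Same cross-correlation as defined earlier in this section
    h, w = X.shape
    h1, w1 = K.shape
    Y = np.zeros((h - h1 + 1, w - w1 + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = np.sum(X[i:i+h1, j:j+w1] * K)
    return Y

X = np.ones((6, 8))
X[:, 2:6] = 0
K = np.array([[1, -1]])

# Every row of X.T is constant, so horizontally adjacent elements are
# always equal and the output is all zeros
Y_t = corr2d(X.T, K)
print(Y_t)
```

This illustrates why different kernels are needed to detect edges in different orientations, a kernel like [[1], [-1]] would detect the horizontal edges instead.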

2.1.3. Edge detection analysis with a VGGNet example

At the first layer, the network might learn something as simple as a diagonal line. At each subsequent layer, the network combines these findings to learn increasingly complex concepts. This may all sound vague, but Zeiler and Fergus (2013) did an excellent job of visualizing what a CNN learns. Below is the CNN they used in their paper; the VGG16 model that won the ImageNet contest was based on it.

CNN by Zeiler & Fergus (2013)

This picture may look confusing at first, but don't panic! Let's start with the things we can all see. First, the input image is square, 224×224 pixels. The filters discussed earlier are 7×7 pixels here. The model has an input layer, seven hidden layers, and an output layer; the C in the output layer refers to the number of classes the model predicts. Now for the most interesting part: what the model learns at different layers!

Layer 2 of the CNN

The left image represents what CNN has learned, and the right image represents part of the actual image.

At layer 2 of the CNN, the model has picked up more interesting shapes than diagonals.

◆ In the sixth square (counting horizontally), you can see the model picking up round shapes.

◆ In addition, the last square is picking up corners.

Layer 3 of the CNN

At layer 3, we can see the model starting to learn more concrete things.

◆ The first square shows that the model can now recognize geographic patterns.

◆ The sixth square is identifying car tires.

◆ The eleventh square is identifying people.

Layers 4 and 5 of the CNN

Finally, layers 4 and 5 continue this trend. Layer 5 picks up features that are very useful for our dog-and-cat problem; it also picks up unicycles and bird/reptile eyes. Note that these images show only a small portion of what each layer learns.