
Convolutional Neural Networks (CNNs) are a class of feedforward neural networks with a deep structure that perform convolutional computation; they are among the representative algorithms of deep learning. Thanks to its hierarchical structure, a CNN is capable of representation learning and shift-invariant classification of its input, which is why it is also known as a “Shift-Invariant Artificial Neural Network” (SIANN). (From Baidu Baike)

The CNN is an extremely common network, and its shadow can be seen in many areas of image processing. In this chapter, I will introduce the core of the convolutional neural network, the convolutional layer, and how to calculate the number of parameters in that layer.

I. The convolution kernel

First let’s look at what a convolution kernel is:

A convolution kernel is a two-dimensional n×n matrix (n is usually odd, e.g. 1×1, 3×3, 5×5); the figure above shows a 3×3 kernel.

Next, introduce a single-channel 5×5 “image”:

The so-called “convolution” is the process of sliding the kernel over the “image” position by position and taking the inner product at each stop:

The calculation of the above steps is shown below:

Set the stride s = 1 (the stride can be adjusted) and compute the next step:

Here are the complete steps (image source) :
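The sliding-and-summing procedure above can be sketched in a few lines of NumPy. This is a minimal illustration, not a library implementation; the image and kernel values are chosen arbitrarily (as in most frameworks, the kernel is applied without flipping, i.e. cross-correlation):

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide the kernel over the image and take the inner product at each stop."""
    n = kernel.shape[0]
    out_size = (image.shape[0] - n) // stride + 1
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r + n, c:c + n] * kernel)
    return out

# A 5x5 "image" and a 3x3 kernel, matching the sizes used above
image = np.arange(25).reshape(5, 5)
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
print(conv2d_valid(image, kernel).shape)  # (3, 3)
```

With stride 1, a 3×3 kernel over a 5×5 image yields a 3×3 output, as the figures show.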

II. Neurons

Each neuron is composed of n convolution kernels plus a bias value b, where n equals the number of channels of the input image. For example, to convolve an RGB (three-channel) photo, we need 3 convolution kernels. Each kernel convolves the channel it is responsible for, the three results are summed, and finally the bias b is added, yielding a single number (the classic figure below is the one I learned this from myself):

Another thing to note: if the image size is not covered exactly by the kernel's sweep, the image is expanded at its edges with zeros; these are the gray zero cells in the figure.

Repeating the procedure from Section I, one neuron gives us one slice of output. Using multiple neurons gives multiple slices of output, which form the depth of the output.

Therefore, the depth of the convolution layer's output is determined by the number of neurons. Of course, all the kernels must be exactly the same size.
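The idea that one neuron bundles one kernel per input channel plus a bias, and that stacking neurons produces the output depth, can be sketched as follows (a toy illustration, stride 1 and “valid” mode assumed):

```python
import numpy as np

def neuron_output(image, kernels, bias):
    """One 'neuron': one kernel per input channel, results summed, bias added."""
    depth = image.shape[2]               # number of channels, e.g. 3 for RGB
    n = kernels.shape[0]
    out_size = image.shape[0] - n + 1    # stride 1, 'valid' mode
    out = np.full((out_size, out_size), bias, dtype=float)
    for d in range(depth):
        for i in range(out_size):
            for j in range(out_size):
                out[i, j] += np.sum(image[i:i + n, j:j + n, d] * kernels[:, :, d])
    return out

# An RGB 5x5 image: one neuron (3 kernels + 1 bias) gives one 3x3 slice;
# stacking the outputs of 4 neurons gives an output of depth 4.
rgb = np.random.rand(5, 5, 3)
layer = np.stack(
    [neuron_output(rgb, np.random.rand(3, 3, 3), 0.1) for _ in range(4)],
    axis=2)
print(layer.shape)  # (3, 3, 4): depth = number of neurons
```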

Here’s the full GIF:

III. Convolution modes

It should be noted that there is more than one convolution mode; they fall into three categories:

  1. full
  2. same
  3. valid

full

The start and end of the sweep are marked by the kernel's edge just touching the image's edge:

(The selected part is the convolution kernel)

This convolution mode doesn't seem to be very common (or perhaps I just haven't studied it enough).

same

The start and end of the sweep are marked by the kernel's center coinciding with the image's edge:

It is called “same” because when the stride s = 1, the output of the convolution layer is exactly the same size as the input (again, the depth is determined only by the number of neurons and has nothing to do with the convolution mode).
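A quick sanity check of why “same” preserves the size: with stride 1 and an odd kernel of size n, zero-padding the image by (n − 1) / 2 cells on each side makes the “valid” output exactly as large as the original input (a small sketch, assuming the sizes used earlier):

```python
import numpy as np

n = 3                                # odd kernel size
pad = (n - 1) // 2                   # 1 cell of zero padding per side
image = np.random.rand(5, 5)
padded = np.pad(image, pad)          # 7x7 after zero padding
out_size = padded.shape[0] - n + 1   # 'valid' output size of the padded image
print(out_size)  # 5, same as the input
```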

valid

The start and end of the sweep are marked by the kernel staying fully inside the image:

“Valid” is the convolution mode introduced above; it is the most common one and has appeared many times already. Here is the formula for the size of the output layer:

```
Image size:   A × A × D
Kernel size:  I × I
Neurons:      N
Stride:       S
Each neuron:  D kernels plus one bias b
Output size:  [(A - I) / S + 1] × [(A - I) / S + 1] × N
```

(Important things bear repeating three times: the depth of the output layer is determined by the number of neurons.)
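The output-size formula is easy to wrap in a small helper for checking shapes (a sketch; the divisibility assertion is my own addition to catch strides that don't fit):

```python
def output_size(a, i, s):
    """'Valid' output size: (A - I) / S + 1, which must be a whole number."""
    assert (a - i) % s == 0, "stride must fit the image exactly"
    return (a - i) // s + 1

print(output_size(5, 3, 1))    # 3
print(output_size(150, 5, 1))  # 146
```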

IV. Calculating the number of parameters

The parameters of a convolution layer live mainly in the convolution kernels; the kernels we did so much computation with above are adjusted through backpropagation.

Another thing that is easy to forget: the bias value b is also a parameter, used for an overall adjustment.

In addition, each neuron carries its own set of kernels and bias, so multiplying by the number of neurons gives the following formula:

```
Image size:   A × A × D
Kernel size:  I × I
Neurons:      N
Parameters:   N × (I × I × D + 1)
```

(Note: the stride does not affect the number of parameters, but it does affect the amount of computation.)

Here’s an example:

Given an image of size 150×150×3.

We want the convolution to produce an output of size 146×146×32.

To complete this convolution we set the stride s = 1, the kernel size I × I = 5×5, and the number of neurons n = 32 (I don't think I need to repeat why).

Number of parameters = 32 × (5×5×3 + 1) = 2432.
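The arithmetic above is easy to verify with a one-line helper (an illustrative function, directly encoding the N × (I × I × D + 1) formula from Section IV):

```python
def conv_params(n_neurons, i, d):
    """N * (I*I*D + 1): each neuron has D kernels of I x I weights plus one bias."""
    return n_neurons * (i * i * d + 1)

print(conv_params(32, 5, 3))  # 2432
```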