A convolutional neural network slides a set of small kernel matrices over a larger input matrix, taking a dot product at each position; the result captures local information. Look at the picture (from the Internet, for illustration only):

MATLAB code that carries out this example is given below:

% Three zero-padded 7x7 input channels (R, G, B)
A1 = [0 0 0 0 0 0 0;
      0 0 1 1 0 2 0;
      0 2 2 2 2 1 0;
      0 1 0 0 2 0 0;
      0 0 1 1 0 0 0;
      0 1 2 0 0 2 0;
      0 0 0 0 0 0 0];
A2 = [0 0 0 0 0 0 0;
      0 1 0 2 2 0 0;
      0 0 0 0 2 0 0;
      0 1 2 1 2 1 0;
      0 1 0 0 0 0 0;
      0 1 2 1 1 1 0;
      0 0 0 0 0 0 0];
A3 = [0 0 0 0 0 0 0;
      0 2 1 2 0 0 0;
      0 1 0 0 1 0 0;
      0 0 2 1 0 1 0;
      0 0 1 2 2 2 0;
      0 2 1 0 0 1 0;
      0 0 0 0 0 0 0];

% One 3x3 kernel per input channel (the original listed these as
% 1x9 vectors, with w2 mistyped as W3; reshaped here to 3x3 so the
% element-wise multiply below works)
w1 = [-1  1  0;  0 1 0;  0  1  1];
w2 = [ 0  0  1;  0 1 0;  1 -1 -1];
w3 = [ 0  0 -1;  0 1 0;  1 -1 -1];

a = zeros(3,3);        % output feature map
for i = 1:2:5          % stride 2
    for j = 1:2:5
        sum1 = sum(sum(A1(i:i+2,j:j+2).*w1));
        sum2 = sum(sum(A2(i:i+2,j:j+2).*w2));
        sum3 = sum(sum(A3(i:i+2,j:j+2).*w3));
        a((i+1)/2,(j+1)/2) = sum1 + sum2 + sum3 + 1;   % bias = 1
    end
end
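For readers who prefer Python, the same loop can be sketched with NumPy (my own translation of the MATLAB above; the kernel values are read off the illustration, so double-check them against your picture):

```python
import numpy as np

# Zero-padded 7x7 input channels, copied from the MATLAB example
A1 = np.array([[0,0,0,0,0,0,0],
               [0,0,1,1,0,2,0],
               [0,2,2,2,2,1,0],
               [0,1,0,0,2,0,0],
               [0,0,1,1,0,0,0],
               [0,1,2,0,0,2,0],
               [0,0,0,0,0,0,0]])
A2 = np.array([[0,0,0,0,0,0,0],
               [0,1,0,2,2,0,0],
               [0,0,0,0,2,0,0],
               [0,1,2,1,2,1,0],
               [0,1,0,0,0,0,0],
               [0,1,2,1,1,1,0],
               [0,0,0,0,0,0,0]])
A3 = np.array([[0,0,0,0,0,0,0],
               [0,2,1,2,0,0,0],
               [0,1,0,0,1,0,0],
               [0,0,2,1,0,1,0],
               [0,0,1,2,2,2,0],
               [0,2,1,0,0,1,0],
               [0,0,0,0,0,0,0]])

# One 3x3 kernel per input channel
w1 = np.array([[-1, 1,  0], [0, 1, 0], [0,  1,  1]])
w2 = np.array([[ 0, 0,  1], [0, 1, 0], [1, -1, -1]])
w3 = np.array([[ 0, 0, -1], [0, 1, 0], [1, -1, -1]])

bias = 1
a = np.zeros((3, 3), dtype=int)
for i in range(0, 5, 2):        # stride 2
    for j in range(0, 5, 2):
        a[i // 2, j // 2] = ((A1[i:i+3, j:j+3] * w1).sum()
                             + (A2[i:i+3, j:j+3] * w2).sum()
                             + (A3[i:i+3, j:j+3] * w3).sum()
                             + bias)
print(a)  # the 3x3 output feature map
```

Each output entry is the sum of three per-channel window products plus the bias, exactly as in the MATLAB loop.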

The input is a picture represented by a matrix with three channels (RGB), and it is convolved with two convolution kernels, each of which also has three channels. The reason the output goes from three channels to two is that each kernel's three per-channel convolution results are summed together and a bias is added, producing a single feature map per kernel; two kernels therefore give two output channels. Still not convinced? Check for yourself that multiplying the Filter W0 layer with the matching Input Volume window and adding the bias of 1 gives 6, and carry on the same way to fill in the final output matrix.

Now let's talk about how the depth (D), width (W) and height (H) of the convolved image work out. First, there is a concept called padding. In TensorFlow and Keras there are two padding modes for convolution: valid and same. Valid means no zeros are added around the input; same means enough zeros are added on each side that (at stride 1) the output has the same spatial size as the input. OK, so the question is: how many zeros do we add on each side? Don't worry, this is fixed by the kernel size.
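The "two kernels, three channels each" idea can be sketched in NumPy as well. This is a hypothetical example with random integer data (the shapes, not the values, are the point): a 3-channel 5×5 input padded to 7×7, two kernels of shape 3×3×3, stride 2, one bias per kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
# 3-channel 5x5 input, zero-padded by 1 on the height and width axes only
x = np.pad(rng.integers(0, 3, size=(3, 5, 5)), ((0, 0), (1, 1), (1, 1)))
w = rng.integers(-1, 2, size=(2, 3, 3, 3))   # (num_kernels, channels, kh, kw)
b = np.array([1, 0])                          # one bias per kernel

out = np.zeros((2, 3, 3), dtype=int)
for n in range(2):              # one feature map per kernel
    for i in range(0, 5, 2):    # stride 2
        for j in range(0, 5, 2):
            patch = x[:, i:i+3, j:j+3]        # 3x3 window across all 3 channels
            # sum over channels AND spatial positions, then add the bias
            out[n, i // 2, j // 2] = (patch * w[n]).sum() + b[n]

print(out.shape)  # (2, 3, 3): two kernels -> output depth 2
```

The channel dimension disappears inside `.sum()`, which is exactly why three input channels collapse into one feature map per kernel.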

 

  • For a 3×3 kernel, the padding is usually 1; 1 zero on each side adds 2 to each dimension.
  • For a 5×5 kernel, the padding is usually 2; 2 zeros on each side add 4 to each dimension.
  • The depth (D) of the output equals the number of convolution kernels. In other words, each convolution kernel generates one feature_map; the green matrix in the picture is a feature_map. As for length and width, they work out the same because the matrix is square, and there's a formula:

    W_out = (W - K + 2P) / S + 1

  • Doesn't it make a lot more sense once it's written in symbols?
  • S is the stride, that is, the step length; W/H is the input width/height; P is the padding on one side (key point: one side, not both); and K is the size of the convolution kernel (for a 5×5 kernel, K = 5). OK, one more question: why do we add 1? Think of it this way: how many numbers are there between 1 and 9 inclusive? It's 9 − 1 + 1 = 9 — the +1 counts the starting position, just like it counts the kernel's starting placement here.
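Putting the formula and the padding rule together, here is a small helper (the function name and `'valid'`/`'same'` handling are my own sketch, mirroring the TensorFlow/Keras modes described above, for odd kernel sizes):

```python
def conv_out_size(n, k, s, padding):
    """Spatial output size of a convolution along one dimension.

    n: input size (W or H), k: kernel size, s: stride,
    padding: 'valid' (no zeros) or 'same' (pad so that at
    stride 1 the output size equals the input size).
    """
    if padding == 'valid':
        p = 0
    elif padding == 'same':
        p = (k - 1) // 2          # one-side padding: 1 for 3x3, 2 for 5x5
    else:
        raise ValueError(padding)
    return (n - k + 2 * p) // s + 1   # the (W - K + 2P)/S + 1 formula

print(conv_out_size(5, 3, 1, 'valid'))  # 3
print(conv_out_size(5, 3, 1, 'same'))   # 5
print(conv_out_size(5, 3, 2, 'same'))   # 3  (the 5x5, stride-2 example above)
```

The last call reproduces the worked example: a 5×5 input, 3×3 kernel, padding 1 and stride 2 give (5 − 3 + 2)/2 + 1 = 3, which is why the output feature map is 3×3.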