• A Gentle Introduction to Convolutional Layers for Deep Learning Neural Networks
  • Originally by Jason Brownlee
  • The Nuggets translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: QiaoN
  • Proofreader: HearFishle, Shixi-Li

Photo by Mendhak. All rights reserved.

Convolution and the convolutional layer are the major building blocks used in convolutional neural networks.

Convolution is simply the application of a filter to an input. Repeated application of the same filter to an input produces a map of activations called a feature map, indicating the locations and strength of a detected feature in the input, such as an image.

The innovation of convolutional neural networks is the ability to automatically learn a large number of filters in parallel, specific to a training dataset, under the constraints of a specific predictive modeling problem, such as image classification. The result is that highly specific features can be detected anywhere in an input image.

In this tutorial, you will learn how convolution works in convolutional neural networks.

After completing this tutorial, you will know:

  • Convolutional neural networks use filters to derive feature maps from inputs that summarize the presence of features detected in the inputs.
  • Filters can be designed by hand, such as line detectors, but the innovation of convolutional neural networks is that filters are learned during training in the context of specific prediction problems.
  • How to calculate the feature map for one-dimensional and two-dimensional convolutional layers in a convolutional neural network.

Let’s get started.

Tutorial overview

This tutorial is divided into four parts, which are:

  1. Convolution in convolutional neural networks
  2. Convolution in computer vision
  3. The ability to learn filters
  4. Example of convolution layer

Convolution in convolutional neural networks

Convolutional neural network, or CNN, is a neural network model designed specifically for processing two-dimensional image data, although it can also be used for one-dimensional and three-dimensional data.

The core of convolutional neural network is the convolutional layer, which is also the origin of the name of convolutional neural network. The operation that this layer performs is called “convolution.”

In the context of convolutional neural networks, convolution is a linear operation involving a set of weights multiplied by the input, much like traditional neural networks. Since the technique is designed for two-dimensional inputs, this multiplication takes place between an array of input data and a two-dimensional weight array, called a filter or kernel.

The filter is smaller than the input data, and the type of multiplication applied between a filter-sized patch of the input and the filter is a dot product. A dot product is the element-wise multiplication between the filter-sized patch of the input and the filter, which is then summed, always resulting in a single value. Because it results in a single value, the operation is often referred to as the “scalar product.”

Using a filter smaller than the input is intentional, as it allows the same filter (set of weights) to be multiplied by the input array multiple times at different points on the input. Specifically, the filter is applied systematically to each overlapping, filter-sized patch of the input data, left to right and top to bottom.
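This systematic left-to-right, top-to-bottom application can be sketched in plain NumPy. This is a minimal illustration assuming a stride of 1 and no padding, not the optimized implementation a deep learning library would use; `cross_correlate_2d` is a name chosen here for illustration:

```python
import numpy as np

def cross_correlate_2d(image, kernel):
    # Slide the kernel over every filter-sized patch of the image,
    # left to right and top to bottom, taking a dot product at each step.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # element-wise multiply, then sum
    return out

# A 4x4 image with a vertical edge and a 3x3 vertical-line filter
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[0, 1, 0],
                   [0, 1, 0],
                   [0, 1, 0]], dtype=float)
print(cross_correlate_2d(image, kernel))  # 2x2 feature map
```

Each output value is the dot product of the filter with one patch, so a 4×4 input and a 3×3 filter yield a 2×2 feature map.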

Systematically applying the same filter across an image is a powerful idea. If the filter is designed to detect a specific type of feature in the input, then applying that filter systematically across the entire input image allows the filter an opportunity to discover that feature anywhere in the image. This capability is commonly referred to as translation invariance, e.g. a general interest in whether the feature is present rather than where it is present.

Local translation invariance can be a very useful property if we care more about whether a feature exists than exactly where it exists. For example, when determining whether an image contains a face, we don’t need to know exactly where the eyes are at the pixel level, we just need to know that there is one eye on the left and one on the right of the face.

— Page 342, Deep Learning, 2016.

Multiplying the filter once with the input array yields a single value. Because the filter is applied to the input array many times, the result is a two-dimensional array output that represents the filtered input, and this result is called a “feature map.”

After the feature map is created, we can pass each value in the feature map through a non-linear function, such as ReLU, just as we did for the output of the fully connected layer.
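As a minimal sketch, applying ReLU to a feature map is just an element-wise maximum with zero (the values below are made up for illustration):

```python
import numpy as np

# A small feature map with positive and negative activations
feature_map = np.array([[-2.0, 0.0, 3.0],
                        [ 1.0, -1.0, 0.5]])

# ReLU passes positive activations through and zeroes out negative ones
activated = np.maximum(feature_map, 0.0)
print(activated)
```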

Example filter for creating feature maps for two-dimensional inputs

If you come from a digital signal processing field or a related area of mathematics, you may understand the convolution operation on a matrix as something different. Specifically, the filter (kernel) is flipped prior to being applied to the input. Technically, the convolution as used in convolutional neural networks is actually a “cross-correlation.” Nevertheless, in deep learning it is referred to as a “convolution” operation.
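The difference can be checked in a few lines of NumPy; `cross_correlate_1d` below is a helper defined just for this sketch. True convolution flips the kernel first, while the “convolution” in deep learning libraries does not:

```python
import numpy as np

def cross_correlate_1d(x, w):
    # Slide w over x without flipping (what deep learning calls "convolution")
    n = len(w)
    return np.array([np.dot(x[i:i + n], w) for i in range(len(x) - n + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([1.0, 0.0, -1.0])

corr = cross_correlate_1d(x, w)         # cross-correlation: no flip
conv = cross_correlate_1d(x, w[::-1])   # true convolution: kernel flipped

print(corr)  # [-2. -2. -2.]
print(conv)  # [ 2.  2.  2.], matches np.convolve(x, w, mode='valid')
assert np.allclose(conv, np.convolve(x, w, mode='valid'))
```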

The cross-correlation implemented by many machine learning libraries is called convolution.

— Page 333, Deep Learning, 2016.

In summary, we have an input, such as an image of pixel values, and we have a filter, which is itself a set of weights, and the filter is systematically applied to the input data to create a feature map.

Convolution in computer vision

For convolutional neural networks, the idea of processing image data with convolution is not new or unique. It is a common technique in computer vision.

Historically, filters have been manually designed by computer vision specialists and then applied to images to produce feature maps or filtered outputs, which somehow make image analysis easier.

For example, below is a hand-crafted 3×3 filter for detecting vertical lines:

0.0, 1.0, 0.0
0.0, 1.0, 0.0
0.0, 1.0, 0.0

Applying this filter to an image produces a feature map containing only vertical lines. This is a vertical line detector.

You can see this in the weight values of the filter: any pixel values along the central vertical line are positively activated, while pixels on either side contribute nothing (zero weight). Systematically sliding this filter across the pixel values of an image can only highlight vertical line pixels.

We can also create a horizontal line detector and apply it to the image, for example:

0.0, 0.0, 0.0
1.0, 1.0, 1.0
0.0, 0.0, 0.0

Combining the results of two filters, such as combining two feature maps, will highlight all the lines in the image.
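As a sketch, the two hand-designed filters above can be applied with a simple sliding window and their feature maps merged. Combining the maps with an element-wise maximum is one reasonable choice among several, and `apply_filter` is a helper defined here for illustration:

```python
import numpy as np

def apply_filter(image, kernel):
    # Valid cross-correlation: dot product of the kernel with each patch
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    return np.array([[np.sum(image[i:i+kh, j:j+kw] * kernel)
                      for j in range(w)] for i in range(h)])

vertical   = np.array([[0., 1., 0.], [0., 1., 0.], [0., 1., 0.]])
horizontal = np.array([[0., 0., 0.], [1., 1., 1.], [0., 0., 0.]])

# An image containing one vertical and one horizontal line
image = np.zeros((7, 7))
image[:, 2] = 1.0  # vertical line
image[4, :] = 1.0  # horizontal line

v_map = apply_filter(image, vertical)
h_map = apply_filter(image, horizontal)
combined = np.maximum(v_map, h_map)  # merge the two feature maps
print(combined)
```

Both lines show up as strong activations (values of 3) in the combined map, each detected by its matching filter.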

A set of dozens or even hundreds of other small filters can be designed to detect other features in the image.

The novelty of using convolution in neural networks is that the values of the filters are the weights that need to be learned in network training.

The network will learn what types of features to extract from the input. Specifically, training under stochastic gradient descent forces the network to learn to extract features from the image that minimize the loss for the specific task the network is being trained to solve, e.g. extract features that are the most useful for classifying images as dogs or cats.

In this case, you can see that this is a very powerful idea.

The ability to learn filters

Learning individual filters for machine learning tasks is a powerful technique.

However, convolutional neural networks achieve more in practice.

Multiple filters

Convolutional neural networks do not learn just a single filter; in fact, they learn multiple filters in parallel for a given input.

For example, the convolution layer typically learns 32 to 512 filters in parallel for a given input.

This gives the model 32 or even 512 different ways of extracting features from inputs, or “learning to see”, and many different ways of “seeing” the input data after training.

This diversity allows specialization, e.g. not just lines, but the specific lines seen in your specific training data.

Multiple channels

Color images have multiple channels, usually one for each color channel, such as red, green, and blue.

From a data perspective, this means that a single image provided as input to the model is in fact three images.

A filter must always have the same number of channels as the input, often referred to as “depth.” If an input image has three channels (a depth of 3), then a filter applied to that image must also have three channels (a depth of 3). In this case, a 3×3 filter would in fact be 3×3×3, or [3, 3, 3], in rows, columns, and depth. Regardless of the depth of the input and the depth of the filter, the filter is applied to the input using a dot product operation that produces a single value.
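A small NumPy sketch of this: however deep the input, the dot product still collapses the whole filter-sized volume to a single number. The values here are made up for illustration:

```python
import numpy as np

# A 3x3 patch of a 3-channel image, shape (rows, cols, channels)
patch = np.ones((3, 3, 3))

# A 3x3x3 filter: one 3x3 set of weights per input channel
filt = np.zeros((3, 3, 3))
filt[:, 1, :] = 1.0  # vertical-line weights, replicated across all 3 channels

# The dot product sums over rows, columns, AND channels: a single value
value = np.tensordot(patch, filt, axes=3)
print(value)  # 9.0: nine weights of 1.0, each multiplied by an input of 1.0
```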

This means that if a convolutional layer has 32 filters, then for a two-dimensional image input these 32 filters are not just two-dimensional but three-dimensional, with specific filter weights for each of the three channels. Yet each filter produces a single feature map, which means that the output depth of a convolutional layer applying 32 filters is 32, one for each of the 32 feature maps created.
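The shapes involved can be sketched with random values and a naive loop, purely for illustration (a real convolutional layer would compute this far more efficiently):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))        # one 8x8 RGB input (rows, cols, channels)
filters = rng.random((32, 3, 3, 3))  # 32 filters, each 3x3 with depth 3

# Apply every filter at every valid position: one feature map per filter
out = np.zeros((6, 6, 32))
for f in range(32):
    for i in range(6):
        for j in range(6):
            out[i, j, f] = np.tensordot(image[i:i+3, j:j+3, :], filters[f], axes=3)

print(out.shape)  # (6, 6, 32): output depth equals the number of filters
```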

Multiple layers

The convolution layer applies not only to input data such as raw pixel values, but also to the outputs of other layers.

Stacking of convolutional layers allows for hierarchical decomposition of inputs.

Filters that operate directly on the raw pixel values will learn to extract low-level features, such as lines.

Filters that operate on the output of the first line-detecting layers may extract features that are combinations of the lower-level features, such as features comprising multiple lines that express shapes.

This process continues until very deep layers are extracting faces, animals, houses, and so on.

This is exactly what we see in practice: the deeper the network, the higher-order the extracted features.

Example of convolution layer

Keras, the deep learning library, provides a series of convolutional layers.

We can better understand the convolution operation by looking at some worked examples with artificial data and hand-crafted filters.

In this section, we will examine examples of both one-dimensional and two-dimensional convolution layers, both of which embody convolution operations and also provide demonstrations of the use of Keras layers.

An example of a one-dimensional convolution layer

We can define a one-dimensional input with eight elements, with a bump of two elements in the middle valued 1.0 and the rest valued 0.0.

[0, 0, 0, 1, 1, 0, 0, 0]

For the one-dimensional convolution layer, the Keras input must be three-dimensional.

The first dimension refers to each input sample, and in this case we only have one sample. The second dimension refers to the length of each sample, which in this case is 8. The third dimension is the number of channels per sample, and in this case we only have one channel.

Therefore, the shape of the input array is [1, 8, 1].

# define input data
data = asarray([0, 0, 0, 1, 1, 0, 0, 0])
data = data.reshape(1, 8, 1)

We will define a model whose input sample shape is [8, 1].

The model will have a single filter with a shape of 3, i.e. three elements wide. Keras refers to the shape of the filter as the kernel_size.

# create model
model = Sequential()
model.add(Conv1D(1, 3, input_shape=(8, 1)))

By default, the filters in a convolutional layer are initialized with random weights. In this contrived example, we will manually specify the weights of the single filter. We will define a filter that is capable of detecting a bump, that is, a high input value surrounded by low input values, as we defined in our input example.

We define the three-element filter as follows:

[0, 1, 0]

The convolutional layer also has a bias input value that also requires a weight, which we will set to zero.

Therefore, we can force the weight of our one-dimensional convolution layer to use a manual filter as shown below:

# define a vertical line detector
weights = [asarray([[[0]], [[1]], [[0]]]), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)

The weights must be set in a three-dimensional structure of rows, columns, and channels. The filter has one row, three columns, and one channel.

We can retrieve the weights and verify that they are set correctly.

# confirm they were stored
print(model.get_weights())

Finally, we can apply a single filter to the input data.

We can do this by calling the predict() function on the model. This returns the feature map directly: this is the output of the filter systematically applied to the input sequence.

# apply filter to input data
yhat = model.predict(data)
print(yhat)

Putting all this together, the complete sample is listed below.

# example of calculation 1d convolutions
from numpy import asarray
from keras.models import Sequential
from keras.layers import Conv1D
# define input data
data = asarray([0, 0, 0, 1, 1, 0, 0, 0])
data = data.reshape(1, 8, 1)
# create model
model = Sequential()
model.add(Conv1D(1, 3, input_shape=(8, 1)))
# define a vertical line detector
weights = [asarray([[[0]], [[1]], [[0]]]), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)
# confirm they were stored
print(model.get_weights())
# apply filter to input data
yhat = model.predict(data)
print(yhat)

Running the sample, we first print out the weights of the network, which confirms that our manual filter is set up in the model as we expect.

Next, the filter is applied to the input pattern and the feature map is calculated and displayed. We can see from the values of the feature map that the bump was detected correctly.

[array([[[0.]],
       [[1.]],
       [[0.]]], dtype=float32), array([0.], dtype=float32)]

[[[0.]
  [0.]
  [1.]
  [1.]
  [0.]
  [0.]]]

Let’s take a closer look at what’s going on.

Recall that the input is an eight-element vector whose values are: [0, 0, 0, 1, 1, 0, 0, 0].

First, the three-element filter [0, 1, 0] was applied to the first three inputs of the input [0, 0, 0] by calculating the dot product (the “.” operator), which resulted in a single output value of zero in the feature map.

Remember, the dot product is the sum of the products of the corresponding elements, in this case it’s 0 x 0 + 1 x 0 + 0 x 0 = 0. In NumPy, this can be implemented manually as:

from numpy import asarray
print(asarray([0, 1, 0]).dot(asarray([0, 0, 0])))

In our manual example, the details are as follows:

[0, 1, 0] . [0, 0, 0] = 0

The filter was then moved along one element of the input sequence and the process repeated. Specifically, the same filter was applied to the input sequence at indexes 1, 2, and 3, which also resulted in a zero output in the feature map.

[0, 1, 0] . [0, 0, 1] = 0

We are systematic, so again, the filter moves along another element of the input and applies to the input at indexes 2, 3, and 4. This time the output value in the feature map is 1. We detect this feature and activate it accordingly.

[0, 1, 0] . [0, 1, 1] = 1

This process is repeated until we calculate the entire feature map.

[0, 0, 1, 1, 0, 0]

Notice that the feature map has six elements, whereas our input has eight elements. This is an artifact of how the filter was applied to the input sequence. There are other ways to apply the filter to the input sequence that change the shape of the resulting feature map, such as padding, but we will not discuss them in this article.
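For example, zero-padding the input (what Keras calls padding='same') keeps the feature map the same length as the input. A minimal NumPy sketch, with `conv1d_valid` defined here for illustration:

```python
import numpy as np

def conv1d_valid(x, w):
    # Apply filter w at every fully overlapping position of x
    n = len(w)
    return np.array([np.dot(x[i:i + n], w) for i in range(len(x) - n + 1)])

x = np.array([0., 0., 0., 1., 1., 0., 0., 0.])
w = np.array([0., 1., 0.])

# No padding: 8 inputs and a 3-wide filter give 6 outputs
print(conv1d_valid(x, w))             # [0. 0. 1. 1. 0. 0.]

# Zero-pad one element on each side: the output length matches the input
print(conv1d_valid(np.pad(x, 1), w))  # [0. 0. 0. 1. 1. 0. 0. 0.]
```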

As you can imagine, with different inputs we may detect the feature with more or less intensity, and with different weights in the filter we would detect different features in the input sequence.

An example of a two-dimensional convolution layer

We can extend the bump detection example from the previous section to a vertical line detector in a two-dimensional image.

Again, we can constrain the input, in this case a square 8×8 pixel input image with a single channel (e.g. grayscale) and a vertical line in the middle.

[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]

The input to a Conv2D (two dimensional convolution layer) must be four dimensional.

The first dimension defines the sample, which in this case is only one sample. The second dimension defines the number of rows, which in this case is 8. The third dimension defines the number of columns, again 8 in this case. Finally, define the number of channels, which in this case is 1.

Therefore, the input must have the four-dimensional shape [samples, rows, columns, channels], or [1, 8, 8, 1] in this case.

# define input data
data = [[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0]]
data = asarray(data)
data = data.reshape(1, 8, 8, 1)

We will define Conv2D with a single filter, just as we did with the Conv1D sample in the previous section.

The filter will be two-dimensional and square with the shape 3×3. The layer will expect input samples to have the shape [8, 8, 1] in this case.

# create model
model = Sequential()
model.add(Conv2D(1, (3,3), input_shape=(8, 8, 1)))

We will define a filter for a vertical line detector to detect a single vertical line in the input data.

The filter is shown below:

0, 1, 0
0, 1, 0
0, 1, 0

We can implement this as follows:

# define a vertical line detector
detector = [[[[0]], [[1]], [[0]]],
            [[[0]], [[1]], [[0]]],
            [[[0]], [[1]], [[0]]]]
weights = [asarray(detector), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)
# confirm they were stored
print(model.get_weights())

Finally, we apply the filter to the input image, which will result in a feature map indicating the detection of vertical lines in the input image, as we hope.

# apply filter to input data
yhat = model.predict(data)

The output of the feature map will be four-dimensional with the shape [batch, rows, columns, filters]. We performed a single batch and we have a single filter (one filter and one input channel), therefore the output shape is [1, ?, ?, 1]. We can print the content of the single feature map as follows:

for r in range(yhat.shape[1]):
	# print each column in the row
	print([yhat[0,r,c,0] for c in range(yhat.shape[2])])

Putting all this together, the complete sample is listed below.

# example of calculation 2d convolutions
from numpy import asarray
from keras.models import Sequential
from keras.layers import Conv2D
# define input data
data = [[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0]]
data = asarray(data)
data = data.reshape(1, 8, 8, 1)
# create model
model = Sequential()
model.add(Conv2D(1, (3,3), input_shape=(8, 8, 1)))
# define a vertical line detector
detector = [[[[0]], [[1]], [[0]]],
            [[[0]], [[1]], [[0]]],
            [[[0]], [[1]], [[0]]]]
weights = [asarray(detector), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)
# confirm they were stored
print(model.get_weights())
# apply filter to input data
yhat = model.predict(data)
for r in range(yhat.shape[1]):
	# print each column in the row
	print([yhat[0,r,c,0] for c in range(yhat.shape[2])])

Running the example first confirms that the hand-crafted filter was correctly defined in the layer weights.

Next, print the calculated feature map. From the scale of the numbers we can see that the filter does detect a single strongly activated vertical line in the middle of the feature map.

[array([[[[0.]],
        [[1.]],
        [[0.]]],
       [[[0.]],
        [[1.]],
        [[0.]]],
       [[[0.]],
        [[1.]],
        [[0.]]]], dtype=float32), array([0.], dtype=float32)]

[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]

Let’s take a closer look at what we calculate.

First, the filter was applied to the top-left corner of the image, i.e. a 3×3 patch of the image. Technically, the image patch is three-dimensional with a single channel, and the filter has the same dimensions. We cannot implement this in NumPy using the dot() function; instead, we must use the tensordot() function so that we can appropriately sum across all dimensions, for example:

from numpy import asarray
from numpy import tensordot
m1 = asarray([[0, 1, 0],
	[0, 1, 0],
	[0, 1, 0]])
m2 = asarray([[0, 0, 0],
	[0, 0, 0],
	[0, 0, 0]])
print(tensordot(m1, m2))

This calculation results in a single output value of 0.0, i.e. the feature was not detected. This gives us the first element in the top-left corner of the feature map.

Manually, this would be as follows:

0, 1, 0     0, 0, 0
0, 1, 0  .  0, 0, 0 = 0
0, 1, 0     0, 0, 0

The filter was moved along one column to the right and the process repeated. Again, the feature was not detected.

0, 1, 0     0, 0, 1
0, 1, 0  .  0, 0, 1 = 0
0, 1, 0     0, 0, 1

Moving along one more column to the right, the feature is detected for the first time, resulting in a strong activation.

0, 1, 0     0, 1, 1
0, 1, 0  .  0, 1, 1 = 3
0, 1, 0     0, 1, 1

This process was repeated until the right edge of the filter sat against the right edge, or final column, of the input image. This gives the last element in the first full row of the feature map.

[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]

The filter then moved down one row and back to the first column, and the process was repeated from left to right to give the second row of the feature map, and so on until the bottom edge of the filter sat against the bottom, or last row, of the input image.

As in the previous section, we can see that the feature map is a 6×6 matrix, smaller than the 8×8 input image, because of the constraints of how the filter was applied to the input image.
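In general, with a stride of 1 and no padding, the feature map size follows a simple rule, output = input - kernel + 1 along each spatial dimension:

```python
def conv_output_size(input_size, kernel_size):
    # Valid convolution, stride 1, no padding
    return input_size - kernel_size + 1

print(conv_output_size(8, 3))  # 6: an 8x8 image and a 3x3 filter give a 6x6 map
print(conv_output_size(8, 1))  # 8: a 1x1 filter preserves the spatial size
```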

Further reading

This section provides additional resources on this topic if you wish to go deeper.

Articles

  • Crash Course in Convolutional Neural Networks for Machine Learning

Books

  • Chapter 9: Convolutional Networks, Deep Learning, 2016.
  • Chapter 5: Deep Learning for Computer Vision, Deep Learning with Python, 2017.

API

  • Keras Convolutional Layers API
  • numpy.asarray API

Conclusion

In this tutorial, you learned how convolution works in convolutional neural networks.

Specifically, you learned:

  • Convolutional neural networks use filters to derive feature maps from inputs that summarize the presence of features detected in the inputs.
  • Filters can be designed by hand, such as line detectors, but the innovation of convolutional neural networks is to learn filters during training in the context of a particular prediction problem.
  • How to calculate the feature map for one-dimensional and two-dimensional convolutional layers in a convolutional neural network.

Do you have any questions? Ask your questions in the comments below and I will do my best to answer.
