Convolution computation for convolutional neural networks

One-dimensional convolution

Vector multiplication can be implemented with one-dimensional arrays: when two arrays have the same shape, they are multiplied element by element.
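
As a minimal sketch of the same-shape case, the snippet below multiplies two one-dimensional arrays element by element with NumPy's multiply(). The arrays a and b and their values are made up for illustration and do not come from the original listing.

```python
import numpy as np

# Hypothetical example arrays of the same shape (values chosen for illustration).
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Element-wise product: a[0]*b[0], a[1]*b[1], a[2]*b[2]
product = np.multiply(a, b)
print(product)  # [ 4 10 18]
```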

If the shapes of two arrays are different, what should we do?

The simplest rule of thumb applies: start with one-dimensional arrays, then move on to higher-dimensional ones.

1. The sliding window

Calling the multiply() method directly on two arrays of different shapes raises an error.
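
A small sketch of that failure, assuming x has three elements and w has two, as in the rest of this section. The concrete values are an assumption: they are chosen so that the later sums match the 5 and 9 quoted in the text, and they may differ from the values in the original figures.

```python
import numpy as np

# Assumed values: x has length 3, w has length 2.
x = np.array([1, 2, 3])
w = np.array([3, 1])

error_message = None
try:
    np.multiply(x, w)  # shapes (3,) and (2,) cannot be paired up element by element
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```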

The solution is simple, and it is the usual routine for complex problems: decompose the problem and solve it step by step.

Specifically, for arrays of different shapes, the multiplication is decomposed into multiplying corresponding elements one pair at a time. The code and results are shown in Figure 1.

Figure 1. Multiplication of corresponding elements

x and w are two one-dimensional arrays. “Corresponding elements” means that the element with index 0 in x is paired with the element with index 0 in w; their product is stored in the variable product_00.

The element with index 1 in x corresponds to the element with index 1 in w, and their product is stored in the variable product_11.

The two products are stored in a new array, arr0, which is the result of the first step.
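
The first step might look like the sketch below. The names product_00, product_11 and arr0 come from the text; the values of x and w are an assumption, chosen so that the sums later in the article come out as 5 and 9.

```python
import numpy as np

# Assumed values (the figure with the real values is not reproduced here).
x = np.array([1, 2, 3])
w = np.array([3, 1])

# Window at its initial position: x[0] pairs with w[0], x[1] pairs with w[1].
product_00 = x[0] * w[0]
product_11 = x[1] * w[1]

# Result of the first step.
arr0 = np.array([product_00, product_11])
print(arr0)  # [3 2]
```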

Obviously, the multiplication of x and w is not yet finished; only the first step is done. Before moving on, let’s first understand the first step through Figure 2.

The larger numbers in the three squares of Figure 2 are the three elements of array x.

The smaller numbers in the subscript position inside the shaded squares of Figure 2 are the two elements of array w.

Each element of w is written as a subscript at the lower right corner of its corresponding element of x. The subscript indicated by the arrow in Figure 2 is w[1].

The first step of multiplying w and x can be seen intuitively as multiplying each element in a shaded square by its subscript.

With this in place, the second step is easy: move the shaded area one square to the right (slide one step to the right), as shown in Figure 3.

Figure 2. Schematic diagram: initial window position

Figure 3. Schematic diagram: window slid one step to the right

At this point, the element with index 1 in x corresponds to the element with index 0 in w, and the element with index 2 in x corresponds to the element with index 1 in w, as shown by the arrow in Figure 3.

Arrays of different shapes can be multiplied by moving (sliding) the shaded area. The shaded area looks like a window, so it is called a sliding window.

Going from Figure 2 to Figure 3, the window slides from x[0] and x[1] to x[1] and x[2].

The length of the window equals the length of w, which is 2 in this case, so the shaded area always covers 2 squares.

The term sliding window is used in many fields, but it does not always mean the same thing. For now, we only need to understand it literally: a window that slides.

Array w is the window that slides over array x. Since x has length 3 and w has length 2, the window slides only once (that is, one step).
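
The window positions can be listed with plain slicing, as in this sketch. The values of x and w are assumptions for illustration; only their lengths (3 and 2) come from the text. The number of positions is len(x) - len(w) + 1, which is 2 here: the initial position plus one slide.

```python
# Assumed values; only the lengths 3 and 2 are taken from the text.
x = [1, 2, 3]
w = [3, 1]

# Number of window positions: the initial one plus one per slide.
positions = len(x) - len(w) + 1

# The part of x covered by the window at each position.
windows = [x[start:start + len(w)] for start in range(positions)]
print(positions)  # 2
print(windows)    # [[1, 2], [2, 3]]
```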

Here is a summary of the main points of this section:

In this section, the window determines which elements of the two arrays correspond to each other, and the corresponding elements in the area covered by the window are multiplied one pair at a time.
Presenting concepts, processes and data graphically is a very efficient way to communicate. Data visualization is widely used in many areas of economic and social activity, but the visualization of knowledge lags far behind the visualization of business data. Visualizations also often carry unnecessary decoration, such as flashy backgrounds and complex ornaments, which interferes with conveying knowledge and information. I hope students will aim for simplicity and efficiency in their future study and work rather than complexity and showiness.
Sliding the window makes it possible to multiply two arrays of different lengths. This section only gives the code for the first step; try to implement the second step (multiplying the corresponding elements after the window slides to the right) yourself, and start right away!

2. A one-dimensional convolution

Convolution is the most basic and central concept in deep learning algorithms, and it usually operates on high-dimensional arrays (such as four-dimensional arrays). For beginners, however, the convolution of high-dimensional arrays is hard to grasp directly, so we first experience the simplest version of convolution, then gradually increase the number of dimensions until we master it fully.

The simplest version of convolution takes one-dimensional arrays as input and produces a one-dimensional array as the result, so it is called a one-dimensional convolution. Figure 6 shows the code and results.

Before the convolution operation, we first complete the second step of the previous section. The code and operation results are shown in Figure 4.

Figure 4. Second step of multiplying w and x arrays
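
A sketch of this second step follows. The name arr1 comes from the text; product_10 and product_21 are hypothetical names that follow the naming pattern of the first step, and the values of x and w are assumed so that the later sums equal 5 and 9.

```python
import numpy as np

# Assumed values (the figure with the real values is not reproduced here).
x = np.array([1, 2, 3])
w = np.array([3, 1])

# Window slid one step to the right: x[1] pairs with w[0], x[2] pairs with w[1].
product_10 = x[1] * w[0]  # hypothetical variable name
product_21 = x[2] * w[1]  # hypothetical variable name

# Result of the second step.
arr1 = np.array([product_10, product_21])
print(arr1)  # [6 3]
```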

Now arr0 and arr1 need to be combined, in the following two steps:

(1) Sum arr0 and arr1 separately.

(2) Form a new array from the two sums.

The first step is completed, and the code and results are shown in Figure 5.

Figure 5. The sum() method

For arr0, there are two elements in the array: product_00 and product_11. The sum() method adds product_00 to product_11, giving the scalar 5.

Do the same for arr1, and the result is 9.
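
The summing step can be sketched as follows. The contents of arr0 and arr1 are assumed (they follow from taking x = [1, 2, 3] and w = [3, 1], values picked to reproduce the sums 5 and 9 stated in the text); sum0 and sum1 are hypothetical names.

```python
import numpy as np

# Assumed intermediate results of the two window positions.
arr0 = np.array([3, 2])  # products at the initial window position
arr1 = np.array([6, 3])  # products after sliding one step to the right

sum0 = arr0.sum()  # 3 + 2
sum1 = arr1.sum()  # 6 + 3
print(sum0, sum1)  # 5 9
```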

With a good understanding of what is going on, the next operation, convolution, is easy. The code and results are shown in Figure 6.

Figure 6. The correlate() method

The sums of arr0 and arr1 form a new array, array([5, 9]), which is the final result of the convolution of the two arrays x and w.

What deep learning (DL) calls convolution corresponds to what mathematics calls cross-correlation, which is why the NumPy method that computes it is named correlate(). The code and results are shown in Figure 6.

Beginners are advised not to spend time and energy on the naming question; just remember that the one-dimensional convolution in this section can be computed with NumPy’s correlate() method.
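
For readers who are curious about the naming anyway, the sketch below compares NumPy's correlate() with its convolve(). The values of x and w are an assumption chosen to reproduce array([5, 9]). The mathematical convolution in convolve() flips w before sliding it, so it gives a different result unless w is reversed first.

```python
import numpy as np

# Assumed values, chosen so that the result matches array([5, 9]) in the text.
x = np.array([1, 2, 3])
w = np.array([3, 1])

# Sliding-window multiply-and-sum, as described in this section.
# np.correlate uses mode="valid" by default.
correlated = np.correlate(x, w)

# Mathematical convolution: w is flipped before sliding.
convolved = np.convolve(x, w, mode="valid")

# Correlating with the reversed window reproduces np.convolve.
flipped = np.correlate(x, w[::-1])

print(correlated)  # [5 9]
print(convolved)   # [ 7 11]
print(flipped)     # [ 7 11]
```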

The main point of this section is the whole computation of a one-dimensional convolution, which ties together this section and the previous one. Taking the two arrays x and w as an example, the process can be summarized in three steps:

(1) With w as the window and w[0] aligned with x[0], multiply the corresponding elements (element-wise multiplication) using multiply(), then add up the resulting array using sum().

(2) Slide w one step to the right, multiply the corresponding elements at the new window position, and sum the results.

(3) Use the results of the steps above as elements of a new array, which is the result of the convolution of the two arrays x and w.
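
The three steps above can be sketched as one small function that works for any window that is no longer than the array. The function name conv1d and the sample values are hypothetical; this is an illustration of the steps, not the book's own code.

```python
import numpy as np

def conv1d(x, w):
    """Slide w over x; at each position multiply element-wise and sum."""
    x = np.asarray(x)
    w = np.asarray(w)
    positions = len(x) - len(w) + 1  # initial position plus one per slide
    sums = []
    for start in range(positions):
        window = x[start:start + len(w)]             # part of x under the window
        sums.append(np.sum(np.multiply(window, w)))  # steps (1) and (2)
    return np.array(sums)                            # step (3)

# Assumed values, chosen so that the result matches array([5, 9]) in the text.
result = conv1d([1, 2, 3], [3, 1])
print(result)  # [5 9]
```

For the same inputs, the result agrees with np.correlate(x, w) in its default "valid" mode.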

Note that the steps above cover only one-dimensional convolution. Higher-dimensional convolution (such as two-dimensional convolution) involves more steps, but the basic principle is the same.

We have implemented one-dimensional convolution by hand, which is a huge achievement! Once you have mastered the points of this lesson, take a break and treat yourself.

What does convolution yield

In the last lesson, we implemented one-dimensional convolution ourselves. After class, a student asked the teaching assistant: “What is the array produced by the convolution operation used for?”

In the DL context, the result of the convolution operation is the features learned by the algorithm. So the question is: what are features?

1. Features and learning

A person’s height, weight and age are features of that person.

The ears, nose and tail of a cat are features of the cat.

In a picture, different colors and brightness levels make up the objects in it, and the color and brightness of an object are its features.

The task of CV (Computer Vision) is to achieve a high-level understanding of images. Give a CV algorithm a picture, and it can identify whether the picture contains a cat or a dog, a pedestrian or a car.

Take autonomous driving as an example. A driverless car captures images through its cameras and passes them to a CV algorithm. The algorithm must not only recognize pedestrians and other vehicles in the image, but also determine the distance and direction of each person, car or other object, and then decide whether to keep moving, accelerate, decelerate, brake or do something else.

This kind of CV task is also colloquially known as semantic understanding.

For us humans this is a natural and everyday process; we repeat this kind of recognition and judgment every time we go outside. For a computer program, however, it is a huge challenge.

Having picked up some programming basics, we know that a computer program is, at the very least, code and instructions written in advance by humans: where to draw a line and in what color, how to process which data, what message to send to a teacher or an elder. All of these are described precisely in a programming language. The premise of programming is being able to state a problem or task clearly in human natural language (that is, the words we speak and the words written in books).

Traditional programming involves translating natural language into code, for example:

Natural language: draw a dot at (0,0).
Code: plot(0,0,'o').

Here natural language maps directly to code that tells the computer exactly what we need it to do. So getting a program to recognize a cat or a car in the traditional way is possible only if the cat or the car can be described precisely in natural language.

Try describing exactly what a cat is in natural language: its color, size, tail length, head, body and claws, what a cat looks like and what it does not look like. You will find it almost impossible. So how does DL do it? By learning!

Suppose we want to develop a program that identifies cats in pictures, and we name the program AlphaMeow.

The following steps are only meant to build a first intuition for how artificial intelligence algorithms learn:

(1) Give AlphaMeow a thousand pictures of cats and tell it that these contain cats.

(2) Give AlphaMeow a thousand pictures without cats and tell it that there are no cats in these pictures. As for what exactly makes a cat a cat, we do not and cannot tell AlphaMeow. AlphaMeow looks at a picture with a cat, then at a picture without one, finds the differences between the two and extracts some cat features. It looks at a picture of a black cat, then at a picture of a white cat, finds the similarities between the two and extracts more cat features. It then builds its own knowledge base (W) from these features.

(3) Show AlphaMeow a picture and ask it to guess whether there is a cat in it. If it guesses correctly, it gets a piece of dried fish; if it guesses wrong, it gets a kick. Each time AlphaMeow guesses wrong, it updates its knowledge base with what it has just learned. This is just like checking the questions we got wrong in homework, exercises and exams against the textbook and the answers, to find the knowledge points we had not understood or mastered and to fill those gaps.

(4) Repeat the process above until AlphaMeow’s accuracy meets the requirement. If the requirement is 95%, there can be no more than 50 wrong guesses in 1,000 tests. This section is only meant to give an intuitive sense of how a DL algorithm learns, so we stop here.
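
The guess, correct, repeat loop in the steps above can be imitated with a very small toy program. This sketch is only an analogy: it uses the classic perceptron update rule as a stand-in for learning, not a convolutional network, and every “picture” is reduced to two made-up numbers. All names and data here are hypothetical.

```python
import numpy as np

# Hypothetical toy data: each "picture" is two made-up numbers.
pictures = np.array([[2, 1], [3, 2], [2, 3], [-1, -2], [-2, -1], [-3, -2]])
has_cat = np.array([1, 1, 1, 0, 0, 0])  # 1 = cat, 0 = no cat

W = np.zeros(2)  # the "knowledge base"
bias = 0.0
required_accuracy = 0.95

def guess(picture, W, bias):
    """Guess 1 (cat) if the weighted score is positive, else 0 (no cat)."""
    return 1 if picture @ W + bias > 0 else 0

accuracy = 0.0
for round_number in range(100):  # safety cap on the number of rounds
    for picture, label in zip(pictures, has_cat):
        error = label - guess(picture, W, bias)
        if error != 0:  # wrong guess: update the knowledge base
            W = W + error * picture
            bias = bias + error
    # Test on all pictures and stop once the requirement is met.
    accuracy = np.mean([guess(p, W, bias) == t for p, t in zip(pictures, has_cat)])
    if accuracy >= required_accuracy:
        break

# The arithmetic from step (4): 95% of 1,000 tests allows at most 50 wrong guesses.
tests = 1000
allowed_errors = tests - (tests * 95) // 100
print(accuracy, allowed_errors)
```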

2. Combination of features

The previous section sketched in broad strokes how AlphaMeow learns to recognize pictures of cats. One detail was left unexplored: what the cat features that AlphaMeow extracts from the pictures actually look like. To illustrate this, this section gives AlphaMeow a new task: recognizing faces.

To simplify the computation, the face images are cropped to a uniform size and converted to grayscale.

There are 24 faces in Figure 7. Taking the face indicated by the arrow as an example, how can AlphaMeow recognize this face?

First of all, how do we usually recognize faces? Mainly by looking at the facial features and skin color rather than the hairstyle alone, right? The appearance of each facial feature and their relative positions are the features of the face that we want to extract.

Figure 7. Faces

Figure 8 contains 32 partial images of facial features. Taking the nose indicated by the arrow as an example: from the image’s point of view, a nose is made up of lines of different lengths, directions and brightness, namely horizontal, vertical, 30°, 60° and 90° lines.

Figure 8. Partial images of facial features

Looking at Figure 9 without the previous two figures, even an AI expert could not tell that these are the lines that make up the contour of a nose. Together with the previous two figures, however, we can see that AlphaMeow can combine different lines into a local facial feature, and then recognize a face from the appearance and relative positions of the facial features as a whole.

Figure 9. Lines

3. A minimal experience of features

In the previous section, we reduced recognizing a face image to recognizing the different lines that together form the outline of a specific part (such as the nose). The lines that make up the contour are the edges of the part, so the operation of identifying these lines is called edge detection.

Open the following URL in Chrome:

Colab.research.google.com/github/Mach…

Then select the Runtime→Run all command, as shown in Figure 10.

Figure 10. Edge detection

The example outputs three plots, from left to right axs[0], axs[1] and axs[2].

axs[0] shows the original image, the capital letter J that we generated in the previous chapter. One difference is that the image is now 360 by 360 instead of 12 by 12, as you can see from the axes.

axs[1] and axs[2] are the results of two different convolution operations on the original image. The two operations differ in the values of the window w, which will be discussed in more detail in the following sections.

The first stroke of the capital letter J is a horizontal bar. The bar in Figure 10 is black and the background is white. From top to bottom, the transition from white to black is the edge of the bar, that is, the white line on the gray background indicated by the arrow in axs[1].

The horizontal at the position indicated by the axs[1] arrow is the edge of the transition from white to black from left to right.

To further experience the edges, let’s look directly at the data that generated these images, with the code and results shown in Figure 11.

Figure 11. Upper-left corner of J in axs[0]

The example looks at the upper-left corner of the J image in axs[0], where the change from 255 to 0 is visible. Now look at the result of the convolution operation on this image. The code and results are shown in Figure 12.

Figure 12. Upper-left corner in axs[1]

This is the left end of the white line on the gray background indicated by the upper-left arrow in axs[1] (the middle plot of Figure 10). The white line is the feature extracted from the original image (the capital J) by the convolution operation: the edge where white changes to black, two key rows of pixels within the 360 x 360 image (the edge of the first stroke of the J, the horizontal bar).

A student raised a hand: “Grayscale is represented by a number from 0 to 1. Why 255 and 765?” That is a very good question, and it is the topic of our next section. Before moving on, let’s summarize the main points of this section:

Edge detection is the first step a CV algorithm takes toward semantic recognition of images. From the local contours formed by edges (such as a human nose, a cat’s tail or a car tire), overall recognition (of a person, a cat or a car) can be achieved.
Edges can be extracted from the pixels of the original image by the convolution operation. In a grayscale image, an edge is a boundary from black to white or from white to black.
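
To make the second point concrete, here is a small sketch of edge extraction on a tiny grayscale image. Everything in it is an assumption for illustration: the 5 x 5 image, the helper correlate2d_valid, and the window values. The window was picked because it turns a white-to-black transition (255 to 0) into 765, the two numbers quoted above; the window used in the Colab notebook may differ.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide kernel over image; at each position multiply element-wise and sum."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.int64)
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out

# Hypothetical image: two white rows (255) above three black rows (0).
image = np.zeros((5, 5), dtype=np.int64)
image[:2, :] = 255

# Assumed window: responds to a change from white (top) to black (bottom).
kernel = np.array([[ 1,  1,  1],
                   [ 0,  0,  0],
                   [-1, -1, -1]])

edges = correlate2d_valid(image, kernel)
print(edges)
# [[765 765 765]
#  [765 765 765]
#  [  0   0   0]]
```

With this window, each output value is the sum of three pixels in the top row of the window minus the sum of three pixels in its bottom row, so 3 x 255 = 765 appears wherever white sits above black, and 0 appears where the area is uniform.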

This article is adapted from Introduction to Minimalist Programming for Artificial Intelligence (Based on Python), a new book from China Machine Press. The book follows the teaching principle of “the simplest experience” throughout and imitates a real classroom: in plain, humorous language it takes readers by the hand and leads them step by step, so that in an immersive, relaxed and cheerful atmosphere they master the basics of artificial intelligence and get through the door of the field.
