Sliding Windows are used for many tasks, such as crowd counting, and the tag file is a two-dimensional matrix the size of the picture, with the center of the head 1 and the rest of the head 0. I want to find the densest region of heads in this picture (given the size of the region), so how do I find it?
The solution I came up with is to use the size of this region as a fixed window, and slide through the entire label matrix. Each time I slide to a place, I will calculate the number of “1” in the current window. The region with the largest number (the largest sum) is the region with the densest heads, that is, the current window.
For sliding window calculation, the most obvious idea is to use a two-layer for loop. However, first of all, it needs to deal with the boundary problem. Secondly, as the size of the image increases, the efficiency will become very low and the memory consumption will be very large.
In this paper, Numpy is used to complete the sliding window calculation and to calculate the largest region of the sum of the element values (the size of the region is set to the size of the convolution kernel). In order to increase the universality of the algorithm, the tag file element’s value is not limited to two Numbers 0, 1, in order to solve this problems, changes need to be calculated in sliding window after adding a small piece of area element additive action, along the way of thinking can use numpy to achieve maximum and mean additional complete two tasks of pool.
To complete the discussion below, take a simple example:
$$ X=\left[ \begin{matrix} 1 & 2 & 3\\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{matrix} \right] $$
If the size of Kernal is 2×2 and the stride is 1, then the result of sliding window calculation is the new matrix A composed of the following 4 small matrices:
At this point, the manual calculation version of the sliding window is over. In the following task, the four small matrices are respectively summed up to get four values [12,16,24,28]. It can be known that the largest region of the sum of the elements’ values is the one in the lower right corner.
Let’s use a program to replicate the above discussion.
def pool2d(X, kernel_size, stride, padding, pool_mode='avg'): # ** X = np. Pad (X, padding, If ((x.shape [0] -kernel_size [0] + 2*padding) + 2*padding)// (x.shape [1] -kernel_size [1] -kernel_size [1] + 2*padding) Strides = (Strides * x.trides [0], stride* x.trides [1]) + x.trides) If pool_mode == 'Max ': *kernel_size) Reshape (1,2, 3). Reshaping (1,2, 3). Reshaping (1, 3, 4). Return A, a. m.(axis=(1,2)). Reshape (output_shape) # achieve mean pooling task
Step 1: Edge padding
The matrix is pored. The constant parameter indicates that the same value is continuously filled, i.e. Padding. It is generally full zero filling.
The second step: size estimation
The size of each small matrix AI after window sliding is calculated in advance. This part can be calculated according to the formula of size change before and after convolution:
new=(old-kennel_size+2*padding)/stride+1
In the above example output_shape is (2,2). [(4-2 + 2 * 0) / / 2 + 1 = 2]
Step 3: Sliding window calculation
Call the as_strided function to do the window sliding calculation. The function takes three main arguments:
- I don’t need to say much about the matrix I’m going to operate on.
- Shape:Returns the size of the matrix, different from “output_shape”The shape is the size of the matrix A, that is, the size of all the small matrices put together. This size is not necessarily equal to the size of the input matrix X. In the example above, shape is
,2,2,2 (2)
, and the size of the input matrix X is(3, 3)
。 -
Strides: This is a property of the Numpy array, as stated in the official manual in terms of the number of bytes required to travel across the dimensions of the array. Use the matrix X above as an example:
- from
X[0][0]
toX[0][1]
It takes four bytes. Why four? The data type of A is INT32, which is exactly 4 bytes. -
It takes 12 bytes to go from X[0][0] to X[1][0]. Why 12? Since Python is a row-first programming language, which reads the elements of a matrix line by line, flatten the matrix X to [0, 1, 2, 3, 4, 5, 6, 7, 8], going from 0 to 3 requires traversing 3 elements, each of which is 4 bytes, so a total of 12 bytes is required.
Note 1: Of the common programming languages, only MATLAB and FORTRAN have column order precedence.
Note 2: For a closer look at Strides, Strides is another efficient implementation of the convolution algorithm.
- from
The result of this step is the following matrix A, with the dimensions (2,2,2,2).
! The formula of the [[]] (https://www.zhihu.com/equatio…
Step 4: Dense area search task
In fact, the calculation of the sliding window is finished at the third step. This step is to readjust the size. (2,2,2,2) is adjusted to (4,2,2), that is, the number of small matrices in the first dimension, and the next two dimensions are the size of small matrices. The sum of each small matrix can be compared to find the largest region of the sum, and then complete the dense region search task.
Step 5: Two types of pooling tasks
Take averaging pooling as an example, and call the following function
Arjun ean (axis = (1, 2)). Reshape (output_shape)
Axis =(1,2) is because at this time, the matrix A dimension is (4,2,2), and the processing should start from the second dimension.
Reshape (output_shape) is because according to the requirements of pooling task, the output results should be consistent with the dimensions of small matrices, i.e. (2,2).
Program running results:
Reference:
https://zhuanlan.zhihu.com/p/…