1. The principle

Many image affine transformations require interpolation. Common interpolation algorithms include nearest-neighbor interpolation, bilinear interpolation, bicubic interpolation, and Lanczos interpolation, all of which OpenCV offers. Among them, bilinear interpolation is the most widely used because it strikes a good compromise between interpolation quality and speed.

The simpler the model, the better the example, so let's take a simple image: a 3 × 3, 256-level grayscale image. Suppose its pixel matrix looks like the following (call the original image Source):

234, 38, 22
67, 44, 12
89, 65, 63

In this matrix, the coordinates (x, y) of an element are determined as follows: x runs from left to right, starting at zero, and y runs from top to bottom, also starting at zero. This is the most common coordinate system in image processing.

Now suppose we want to enlarge this image to 4 × 4. How do we do it? We draw the new matrix; of course, every pixel in it is still unknown and waiting for us to fill it in (call the image to be filled Destination):

?, ?, ?, ?
?, ?, ?, ?
?, ?, ?, ?
?, ?, ?, ?

Where do the values for this empty matrix come from? From the source image. First fill in the top-left pixel of the target image, at coordinate (0,0). Using

srcX = dstX * (srcWidth / dstWidth), srcY = dstY * (srcHeight / dstHeight)

we find the corresponding source coordinate: (0 * (3/4), 0 * (3/4)) => (0 * 0.75, 0 * 0.75) => (0, 0). Having found it, we can copy the pixel value 234 at source (0,0) into the target image at (0,0). Then do the same for the target pixel (1,0): applying the formula gives (1 * 0.75, 0 * 0.75) => (0.75, 0). The resulting coordinate contains a decimal. What can we do?
An image in a computer is a digital image: the pixel is the smallest unit, and pixel coordinates are integers, never decimals. One strategy used in this case is to round the non-integer coordinate to the nearest integer (you could also simply truncate the decimal part). Rounding (0.75, 0) gives (1, 0), so the complete operation looks like this: (1 * 0.75, 0 * 0.75) => (0.75, 0) => (1, 0). We can then fill in another pixel of the target matrix: the value 38 at source coordinate (1, 0) goes into the target image at (1, 0). Filling in each pixel in turn produces an enlarged image with the following pixel matrix:

234, 38, 22, 22
67, 44, 12, 12
89, 65, 63, 63
89, 65, 63, 63

This method of scaling images is called nearest-neighbor interpolation. It is the most basic and simplest image scaling algorithm, and its results are also the worst: enlarged images show severe mosaic artifacts, and reduced images show severe distortion. The root cause of the poor results is that this naive interpolation method introduces serious distortion. When the source coordinate derived from a target pixel is a floating-point number, simply rounding it and taking the nearest pixel's value is not scientific. When the coordinate value is 0.75, it should not simply become 1. Since 0.75 is 0.25 less than 1 and 0.75 greater than 0, the target pixel value should instead be computed from the four real pixels surrounding this virtual point in the source image according to certain rules, so as to achieve a better scaling result.
2. Bilinear interpolation

Bilinear interpolation is a better image scaling algorithm. It makes full use of the four real pixel values surrounding the virtual point in the source image to jointly determine one pixel value in the target image, so its scaling results are much better than those of simple nearest-neighbor interpolation.

The algorithm can be described as follows. For a destination pixel, let the floating-point coordinate obtained by the reverse coordinate transformation be (i+u, j+v), where i and j are the integer parts of the floating-point coordinate, and u and v are the fractional parts, floating-point numbers in the range [0, 1). Then the value f(i+u, j+v) of this pixel is determined by the values of the four surrounding source pixels at coordinates (i,j), (i+1,j), (i,j+1) and (i+1,j+1), namely:

f(i+u, j+v) = (1-u)(1-v) f(i,j) + (1-u)v f(i,j+1) + u(1-v) f(i+1,j) + uv f(i+1,j+1)

where f(i,j) denotes the source pixel value at (i,j), and so on.

Take the example above: if the target pixel coordinate is (1,1), the derived source coordinate is (0.75, 0.75). This is only a virtual pixel; no such pixel actually exists in the source image, so the target pixel value at (1,1) can only be determined by the four real source pixels (0,0), (1,0), (0,1) and (1,1). Since (0.75, 0.75) is closest to (1,1), that pixel plays the largest role in determining the result, which is reflected in the coefficient uv = 0.75 × 0.75 in the formula; (0.75, 0.75) is farthest from (0,0), so that pixel plays the smallest role, which is likewise reflected in the coefficient (1-u)(1-v) = 0.25 × 0.25.

  

First, two linear interpolations are performed in the X direction, followed by one in the Y direction.
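In symbols, writing the four neighbors as Q11 = (i, j), Q21 = (i+1, j), Q12 = (i, j+1), Q22 = (i+1, j+1), with fractional parts u and v, the two-step form is:

```latex
% Interpolate along x at rows j and j+1:
f(R_1) = (1-u)\,f(Q_{11}) + u\,f(Q_{21})
\qquad
f(R_2) = (1-u)\,f(Q_{12}) + u\,f(Q_{22})
% Then interpolate the two intermediate results along y:
\qquad
f(P) = (1-v)\,f(R_1) + v\,f(R_2)
```

Expanding f(P) term by term recovers the one-step formula given above.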

  

In image processing, we first compute the position of the target pixel in the source image according to srcX = dstX * (srcWidth / dstWidth), srcY = dstY * (srcHeight / dstHeight). The srcX and srcY obtained here are generally floating-point numbers. For example, f(1.2, 3.4) is a virtual pixel. Writing it in the form f(i+u, j+v) gives i = 1, j = 3, u = 0.2, v = 0.4, and its four adjacent real pixels are (1,3), (2,3), (1,4) and (2,4). Interpolating along the X direction first, f(R1) = u * (f(Q21) - f(Q11)) + f(Q11), and likewise for f(R2); the Y direction is then handled the same way. Alternatively, compute it directly in one step: f(i+u, j+v) = (1-u)(1-v) f(i,j) + (1-u)v f(i,j+1) + u(1-v) f(i+1,j) + uv f(i+1,j+1).

3. Acceleration and optimization strategies

An interpolation routine implemented simply along the lines above can only barely complete the job; neither its speed nor its results are ideal. There are some small tricks in the concrete code implementation. Collected from the OpenCV source code and online blogs, they are as follows:

  • Alignment of source image and target image geometry center.
  • Conversion of floating-point operations to integer operations.

3.1 Alignment of geometric center of source image and target image

srcX = dstX * (srcWidth / dstWidth), srcY = dstY * (srcHeight / dstHeight)

With center alignment (OpenCV also works this way):

srcX = (dstX + 0.5) * (srcWidth / dstWidth) - 0.5
srcY = (dstY + 0.5) * (srcHeight / dstHeight) - 0.5

The blog explains it this way: "if you choose the upper-left corner as the origin (0,0), the right-most and bottom-most pixels are not actually involved in the calculation, and the grayscale computed for each target pixel is biased toward the upper-left of the source image." I'm somewhat skeptical of that explanation. Expanding the center-aligned formula gives srcX = dstX * (srcWidth/dstWidth) + 0.5 * (srcWidth/dstWidth - 1), so compared with the naive formula the extra term is 0.5 * (srcWidth/dstWidth - 1). The sign of this term can be positive or negative; it depends on the ratio of srcWidth to dstWidth, that is, on whether the current interpolation enlarges or shrinks the image. What does it do? Look at an example. Suppose the source image is 3*3 with center point (1,1), and the target image is 9*9 with center point (4,4). When mapping coordinates for interpolation, we hope to use the source image's pixel information as evenly as possible; the most intuitive requirement is that (4,4) maps to (1,1). Computing directly gives srcX = 4*3/9 = 1.333 != 1, which means the pixels used in the interpolation are concentrated toward the lower right of the image rather than distributed evenly across it. Now consider center alignment: srcX = (4+0.5)*3/9 - 0.5 = 1, which is exactly what we want. See also the post on optimizing bilinear interpolation at image boundaries.

3.2 Conversion of floating-point operations to integer operations

If the computation is carried out directly, the srcX and srcY obtained are floating-point numbers, and a large number of floating-point multiplications follow; since image data is large, the speed will not be ideal. The solution: floating-point operations → integer operations → << / >> bitwise shift operations.
The main objects to scale up are the floating-point numbers u and v; OpenCV chooses a scaling factor of 2048. "How to pick an appropriate scaling factor? Consider three aspects. First, precision: if the number is too small, the computed result may contain large errors. Second, the number must not be so large that the computation overflows what the integer type can represent. Third, speed: if the factor were 12, the final result would have to be divided by 12*12 = 144, but if it is 16, the final divisor is 16*16 = 256. That is a good number, because the division can be done with a right shift, which is much faster than ordinary division." Since 2048 = 2^11, we can scale up by shifting left 11 bits, and remove both scalings at the end by shifting right 22 bits.

4. The code

  

    // Assumes matSrc has been loaded as an 8-bit, 3-channel (BGR) image,
    // matDst1 and matDst2 have been created with the target size and the same
    // type, and scale_x = (double)matSrc.cols / matDst1.cols,
    // scale_y = (double)matSrc.rows / matDst1.rows.
    uchar* dataDst = matDst1.data;
    int stepDst = matDst1.step;
    uchar* dataSrc = matSrc.data;
    int stepSrc = matSrc.step;
    int iWidthSrc = matSrc.cols;
    int iHeightSrc = matSrc.rows;

    for (int j = 0; j < matDst1.rows; ++j)
    {
        // Center-aligned mapping back to the source, split into the integer
        // part sy and the fractional part fy.
        float fy = (float)((j + 0.5) * scale_y - 0.5);
        int sy = cvFloor(fy);
        fy -= sy;
        sy = std::min(sy, iHeightSrc - 2);
        sy = std::max(0, sy);

        // Fixed-point row weights, scaled by 2048.
        short cbufy[2];
        cbufy[0] = cv::saturate_cast<short>((1.f - fy) * 2048);
        cbufy[1] = 2048 - cbufy[0];

        for (int i = 0; i < matDst1.cols; ++i)
        {
            float fx = (float)((i + 0.5) * scale_x - 0.5);
            int sx = cvFloor(fx);
            fx -= sx;

            // Clamp so that sx and sx+1 both stay inside the source image.
            if (sx < 0) {
                fx = 0, sx = 0;
            }
            if (sx >= iWidthSrc - 1) {
                fx = 0, sx = iWidthSrc - 2;
            }

            // Fixed-point column weights, scaled by 2048.
            short cbufx[2];
            cbufx[0] = cv::saturate_cast<short>((1.f - fx) * 2048);
            cbufx[1] = 2048 - cbufx[0];

            // Weighted sum of the four neighbors; each weight product carries
            // a factor of 2048 * 2048 = 2^22, removed by the final right
            // shift. Note the hard-coded 3 in the pixel offsets: this loop
            // assumes a 3-channel image.
            for (int k = 0; k < matSrc.channels(); ++k)
            {
                *(dataDst + j*stepDst + 3*i + k) = (*(dataSrc + sy*stepSrc + 3*sx + k) * cbufx[0] * cbufy[0] +
                    *(dataSrc + (sy+1)*stepSrc + 3*sx + k) * cbufx[0] * cbufy[1] +
                    *(dataSrc + sy*stepSrc + 3*(sx+1) + k) * cbufx[1] * cbufy[0] +
                    *(dataSrc + (sy+1)*stepSrc + 3*(sx+1) + k) * cbufx[1] * cbufy[1]) >> 22;
            }
        }
    }
    cv::imwrite("linear_1.jpg", matDst1);

    // For comparison, OpenCV's own bilinear resize (1 == cv::INTER_LINEAR).
    cv::resize(matSrc, matDst2, matDst1.size(), 0, 0, 1);
    cv::imwrite("linear_2.jpg", matDst2);

References:

  • OpenCV resize function: implementation process of the five interpolation algorithms.
  • Basic principles of nearest-neighbor interpolation and bilinear interpolation in image scaling.