Mastering the principle of SIFT feature detection in images
The SIFT algorithm, short for Scale-Invariant Feature Transform, is a feature matching algorithm that can be decomposed into the following four steps:
- Scale-space extremum detection: search over all scales and image locations. Potential interest points that are invariant to scale and rotation are identified with a difference-of-Gaussian function.
- Keypoint localization: at each candidate location, a fine-fitting model determines the location and scale. Keypoints are selected based on measures of their stability.
- Orientation assignment: one or more orientations are assigned to each keypoint location based on the local image gradient directions. All subsequent operations are performed on image data transformed relative to the assigned orientation, scale, and location of each keypoint, thereby providing invariance to these transformations.
- Keypoint description: the local image gradients are measured at the selected scale in a neighborhood around each keypoint. These gradients are transformed into a representation that tolerates significant local shape deformation and illumination change.
Detecting scale-space extrema
Construct the scale space. The scale here can be understood as the degree of blur of the image, like the degree of nearsightedness of the eye: the larger the scale, the fewer the details. SIFT aims to extract information at all scales, so a scale space of the image is constructed, that is, smoothing kernels of different widths are applied to smooth the image. A Gaussian kernel is used as the smoothing kernel, and the spatial scale is determined by the Gaussian kernel's scale σ:

L(x, y, σ) = G(x, y, σ) * I(x, y)

where I is the original image, * is the convolution symbol, L(x, y, σ) is the scale image at scale σ, and G(x, y, σ) = (1 / 2πσ²) · exp(−(x² + y²) / 2σ²) is the Gaussian kernel.
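As a sketch, the scale image L(x, y, σ) = G(σ) * I can be computed with a separable Gaussian convolution. This is plain NumPy for illustration (the kernel radius of 3σ is a common truncation choice, not something mandated by SIFT):

```python
import numpy as np

def gaussian_kernel1d(sigma):
    # Sample the 1-D Gaussian; radius ~ 3*sigma covers >99% of its mass.
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()                      # normalize so the image mean is preserved

def gaussian_blur(img, sigma):
    # Separable convolution: blur each row, then each column (reflect padding).
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    pad = np.pad(img, r, mode="reflect")
    rows = np.apply_along_axis(lambda m: np.convolve(m, k, "valid"), 1, pad)
    return np.apply_along_axis(lambda m: np.convolve(m, k, "valid"), 0, rows)
```

Because the 2-D Gaussian factorizes into two 1-D Gaussians, the separable form costs O(r) per pixel instead of O(r²).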
The scale space describes features at different degrees of blur, but it does not describe image size, so the scale space is combined with an image pyramid. The pyramid is obtained by smoothing and downsampling the image octave by octave. If each octave covers s measured scales, the scale factor between adjacent layers is k = 2^(1/s), and the scale of the n-th layer in an octave is σ0 · kⁿ. The first image of each octave is obtained by downsampling (with sampling ratio 0.5) the image of the previous octave whose scale is twice that octave's base scale, so the scale varies continuously from the bottom of the pyramid to the top.
DoG extremum
The image has now been mapped into scale space, and we want to find salient points in this space: points with obvious change, the points that attract your attention as the image is gradually blurred. The salience of a point is reflected in its trend of change along the scale axis, that is, a gradient extremum along the scale axis. The gradient along the scale axis is computed as the difference of adjacent scale images, giving the difference of Gaussians (DoG):

D(x, y, σ) = L(x, y, kσ) − L(x, y, σ)
This DoG is approximately equal to the scale-normalized Laplacian of Gaussian, σ²∇²G, and the scale-normalized LoG is more stable than other corner detection operators such as the gradient, the Hessian, or Harris corner features.
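The approximation can be made precise through the heat-diffusion equation satisfied by the Gaussian (a standard derivation, following Lowe's paper):

```latex
\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G
\quad\Longrightarrow\quad
G(x, y, k\sigma) - G(x, y, \sigma)
\approx (k - 1)\,\sigma^2 \nabla^2 G .
```

So the DoG differs from the scale-normalized LoG σ²∇²G only by the constant factor (k − 1), which is the same across all scales and therefore does not affect the location of extrema.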
Now we look for extrema in the DoG space: a point that is larger or smaller than all of its surrounding points is a candidate keypoint. Since the DoG layers are stacked along the scale axis, the neighborhood here is three-dimensional: 8 neighbors in the same layer plus 9 in each of the two adjacent layers, 26 adjacent points in total.
The search therefore proceeds layer by layer along the scale axis, looking for extrema at each scale. Viewed along the scale axis, a single pixel of the image traces out a one-dimensional signal.
Each octave of the pyramid contains s scale layers, that is, s measured scales, and we want to find extrema at all s scales. Because each comparison involves three adjacent layers, s + 2 DoG layers are needed. And since each DoG layer is the difference of two adjacent scale images, s + 2 DoG layers require s + 3 scale images.
So s + 3 scale images must be computed in each octave of the pyramid.
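The layer count and per-layer scales can be sketched numerically. The defaults sigma0 = 1.6 and s = 3 follow Lowe's commonly cited choices but are assumptions here:

```python
def scale_schedule(sigma0=1.6, s=3, octaves=4):
    # k = 2**(1/s): after s layers the scale doubles.  Each octave needs
    # s + 3 Gaussian images so that s + 2 DoG layers can be formed and
    # extrema can be compared at all s measured scales.
    k = 2.0 ** (1.0 / s)
    return [[sigma0 * (2.0 ** o) * k ** i for i in range(s + 3)]
            for o in range(octaves)]
```

Note that each octave starts at exactly twice the base scale of the previous one, which is what makes the scale continuous across the downsampling steps.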
This gives a rough set of keypoints from the DoG space. To refine the results further, the candidates must be screened to remove noise points.
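The 26-neighbor comparison described above can be sketched as a brute-force scan over a stack of DoG layers (a minimal illustration, not an optimized implementation):

```python
import numpy as np

def local_extrema_3d(dog):
    """dog: DoG stack of shape (layers, H, W); returns (layer, y, x) tuples."""
    L, H, W = dog.shape
    keypoints = []
    for l in range(1, L - 1):        # first/last layer lack a scale neighbor
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                cube = dog[l-1:l+2, y-1:y+2, x-1:x+2]   # 3x3x3 neighborhood
                v = dog[l, y, x]
                # strict extremum over all 26 neighbors in the cube
                if (v == cube.max() or v == cube.min()) and (cube == v).sum() == 1:
                    keypoints.append((l, y, x))
    return keypoints
```

Requiring a strict extremum (the center value appears exactly once in the cube) discards plateaus, which cannot be localized reliably.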
Removing interference
Both LoG and DoG are affected by edges when they perform keypoint detection, so edge responses must be suppressed.
Interference 1: low-contrast extrema
Of course, the extremum values found could simply be thresholded, but to locate the keypoints more accurately, a 3D quadratic function is fitted at each candidate (which can be compared to Newton's method in optimization) and the extremum of the fitted function is computed. The DoG function is expanded in a second-order Taylor series around each candidate point:

D(x) = D + (∂D/∂x)ᵀ x + ½ xᵀ (∂²D/∂x²) x

where x = (x, y, σ)ᵀ is the offset from the candidate, and D, ∂D/∂x, and ∂²D/∂x² are the value, first derivative, and second derivative of the three-variable function at the candidate. This expansion is the 3D quadric fitted around the candidate; setting its derivative to zero gives the extremum offset:

x̂ = −(∂²D/∂x²)⁻¹ (∂D/∂x)

and the value at the extremum is D(x̂) = D + ½ (∂D/∂x)ᵀ x̂. Candidates whose |D(x̂)| falls below a contrast threshold (0.03 in Lowe's paper, for pixel values in [0, 1]) are removed.
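The refinement step can be sketched as follows; here grad and hess stand for the first- and second-order finite differences of D at the candidate point (hypothetical inputs in this sketch):

```python
import numpy as np

def refine_extremum(d, grad, hess):
    # x_hat = -H^{-1} g : offset of the fitted 3-D quadric's extremum
    # from the sampled candidate, in (x, y, sigma).
    x_hat = -np.linalg.solve(hess, grad)
    # Value of the quadric at its extremum: D(x_hat) = d + 0.5 * g . x_hat
    value = d + 0.5 * grad @ x_hat
    return x_hat, value
```

A candidate is typically kept only if |value| exceeds the contrast threshold and the offset stays within half a sample in every dimension; a larger offset means the true extremum lies closer to a neighboring sample, where the fit should be repeated.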
Interference 2: edge responses
In one dimension, the trend of a curve is captured by its curvature. When a detected extremum may be edge noise, we can examine the curvature of the surface's profile curves at that point: the larger the curvature, the steeper the change; the smaller the curvature, the gentler it is. For a blob region, that is, a peak, the curvature is ideally large and the same in every direction. A ridge line, that is, an edge, changes slowly along the ridge direction (small curvature), while perpendicular to the ridge the change is sharp (large curvature). The ratio of the two principal curvatures therefore tells a peak apart from a ridge line.
For a two-dimensional surface, the second-moment matrix at a point, namely the Hessian matrix, describes how the surface changes around that point. The eigenvalues of the Hessian correspond to the rates of change along its eigenvector directions: the larger an eigenvalue, the faster the function changes in that direction, that is, the larger the curvature. The eigenvalues of the Hessian are therefore proportional to the principal curvatures, and the curvature ratio used to eliminate edge points can be replaced by the ratio of the Hessian's eigenvalues. The Hessian matrix is defined as:

H = | Dxx  Dxy |
    | Dxy  Dyy |
The entries Dxx, Dyy, Dxy can be computed by second-order differences of neighboring samples, and the ratio of the eigenvalues can then be obtained without an eigenvalue decomposition. Suppose the two eigenvalues are α and β, and look at the following two expressions:

Tr(H) = Dxx + Dyy = α + β
Det(H) = Dxx · Dyy − Dxy² = α · β

Let α be the larger eigenvalue, with α = r · β. Then

Tr(H)² / Det(H) = (α + β)² / (α · β) = (r + 1)² / r,

which depends only on the ratio r and increases monotonically with it, so these two quantities replace the eigenvalue decomposition, which is much easier.
So the points we want to eliminate are those satisfying Tr(H)² / Det(H) ≥ (r + 1)² / r (Lowe uses r = 10); such points are removed as edge responses.
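The ratio test can be sketched directly from the trace and determinant. Here dxx, dyy, dxy are the second differences of the DoG image at the candidate point, and r = 10 follows the threshold used in Lowe's paper:

```python
def is_edge_response(dxx, dyy, dxy, r=10.0):
    tr = dxx + dyy                  # alpha + beta
    det = dxx * dyy - dxy * dxy     # alpha * beta
    if det <= 0:
        return True                 # curvatures of opposite sign: not a blob
    # (alpha + beta)^2 / (alpha * beta) = (r + 1)^2 / r grows with the
    # eigenvalue ratio, so one threshold replaces an eigen-decomposition.
    return tr * tr / det >= (r + 1.0) ** 2 / r
```

The det <= 0 check also rejects saddle points, where the two principal curvatures have opposite signs.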