1 Background
In theory, image processing is usually divided into three levels: low-level vision, involving simple computations on the image (statistics and basic operations such as histograms, filtering, and texture); mid-level vision, related to known characteristics of human vision (image understanding tasks such as gradients, matching and tracking, and image features); and high-level vision, which is closely tied to human visual perception and reasoning (classification, detection, and semantics-based processing).
Among these, image features are a long-standing topic. Early methods such as gradient operators and local histograms can serve as local image features. In recent years, widely studied methods such as convolutional neural networks obtain one (or several) feature maps of a specific size through convolutional layers and use them as image features for tasks such as classification.
This paper focuses on local feature computation in traditional image processing and introduces one application of local image features: panoramic image stitching.
2 Problems in local feature extraction
Image features are an important kind of image information and play an irreplaceable role in image matching and image stitching. Without image features, it is hard to find other effective ways to solve problems such as image matching, as shown in the figure below.
By direct visual inspection, we can easily match the corresponding regions of two images. But how do we make a computer perform image matching automatically and systematically? This requires extracting image features and matching them.
However, the feature information of an image is not so easy to obtain. If you do not consult the relevant research on how to extract features, the following problems arise.
Scene 1: a large change in lighting
Scene 2: large changes in viewing angle and distance
For Scene 1, despite the lighting change, you can still tell which parts of the scene correspond. For Scene 2, however, most people cannot visually match the regions at all. If the problem is so hard that even humans cannot do it, can it be handed over to a computer? The answer is yes, and the results are shown below.
We can summarize what kind of features we need, or what conditions a good feature should satisfy.
1. Geometric invariance: does not change under geometric deformation
2. Illumination invariance: does not change when the illumination changes
3. Repeatability and accuracy: extracting the features repeatedly gives the same result, with high accuracy
4. Sufficient quantity: enough points are needed to cover a region
5. Distinctiveness: each feature should be unique and clearly distinguishable from the others
Thus, we have identified what an image feature is and what properties it is best to have.
3 Harris corner detection
Objective:
- Repeatable detection
- Accurate positioning
- Contains the high-frequency information of the image
Essence: look for two-dimensional signal changes in the image; near a corner, the image gradient has significant changes in two or more directions.
Corner points follow these mathematical patterns:
In a flat region, the gradient is small in both the horizontal and vertical directions. In an edge region, the gradient is large in one direction (across the edge) and small in the other (along the edge). At a corner point, the gradient is large in both directions. The following diagram illustrates the differences between the three.
3.1 Mathematical expression of Harris corner detection
Shifting the image by $(u, v)$ produces a grayscale change $E(u, v)$:

$$E(u,v) = \sum_{x,y} w(x,y)\,\big[I(x+u,\,y+v) - I(x,y)\big]^2$$

where $w(x,y)$ is the window function. Applying a Taylor expansion to $I(x+u, y+v)$,

$$I(x+u,\,y+v) \approx I(x,y) + u I_x + v I_y,$$

the higher-order terms can be approximated as zero, so

$$E(u,v) \approx \sum_{x,y} w(x,y)\,\big[u I_x + v I_y\big]^2.$$

Expanding the quadratic and writing it in matrix form:

$$E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}, \qquad M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}.$$

When programming, we treat $M$ as a 2×2 matrix that can be computed from the image gradients. Let $\lambda_1$ and $\lambda_2$ be the eigenvalues of $M$. Looking at the expression above:

- $\lambda_1$ and $\lambda_2$ measure the gradient change along the two principal directions (for example, the x and y directions when the main gradients are axis-aligned).
- If either $\lambda_1$ or $\lambda_2$ is close to zero, the point is not a corner; only when both $\lambda_1$ and $\lambda_2$ are large is the point a corner. In practice the corner response $R = \det(M) - k\,(\operatorname{trace}(M))^2$ is computed, and a large $R$ indicates a corner.
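To make the matrix form concrete, here is a minimal numeric sketch (an illustration only, not part of the assignment code in the later sections): it builds $M$ for a single window from made-up gradient values and evaluates both the eigenvalues and the response $R$.

import numpy as np

# Hypothetical gradient patches Ix, Iy inside one window (values made up for illustration)
Ix = np.array([[0.9, 1.0, 0.8],
               [1.1, 0.9, 1.0],
               [0.8, 1.0, 0.9]])
Iy = np.array([[0.1, 0.0, 0.2],
               [1.0, 0.9, 1.1],
               [0.0, 0.1, 0.2]])

# Structure tensor M = [[sum(Ix*Ix), sum(Ix*Iy)], [sum(Ix*Iy), sum(Iy*Iy)]]
M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
              [np.sum(Ix * Iy), np.sum(Iy * Iy)]])

# Both eigenvalues large -> corner; one large, one small -> edge; both small -> flat
lam1, lam2 = np.linalg.eigvalsh(M)

# Harris response R = det(M) - k * trace(M)^2
k = 0.04
R = np.linalg.det(M) - k * np.trace(M) ** 2
print(lam1, lam2, R)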
3.2 Harris Corner detection Example
For an input image, the Harris corner response map with non-maximum suppression is shown below (for the code, see the code walkthrough and examples later in this article).
The bright spots and red dots mark the corner regions computed by Harris corner detection.
Harris corner detection guarantees translation invariance, rotation invariance and intensity (illumination) invariance of image features, but it cannot guarantee scale invariance: different window sizes detect different corners. As a result, the features change with shooting distance and are not stable.
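As a quick way to see this window-size sensitivity in practice, a sketch like the following (it reuses the harris_corners function defined in the code walkthrough below; the test image name is only an example) compares the number of corners detected with two different window sizes.

from skimage.io import imread
from skimage.feature import corner_peaks
from panorama import harris_corners  # defined in the code walkthrough below

img = imread('sudoku.png', as_gray=True)  # any grayscale test image works

# Detect corners with two different window sizes and compare the counts
for window_size in (3, 9):
    response = harris_corners(img, window_size=window_size)
    corners = corner_peaks(response, threshold_rel=0.05)
    print(window_size, corners.shape[0])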
4 Histogram of Oriented Gradients (HOG)
Goal:
- Find a stable set of features that can distinguish objects (mainly people)
- Mainly used for the overall description of image regions (sliding windows over the image)
Difficult points:
- Too much variation in shape and appearance
- Under different lighting conditions, the background is too complex
- Real-time processing speed is limited
The principle behind HOG involves relatively little mathematics, so its operation can be described as a step-by-step process; a quick illustration follows.
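Before the hand-written HOG descriptor in the code walkthrough, a quick way to see what HOG produces is scikit-image's built-in implementation. This is only an illustration, assuming a recent scikit-image version, and is not used in the stitching pipeline below.

from skimage import data
from skimage.feature import hog

# Take one channel of a sample photo as a grayscale image
image = data.astronaut()[:, :, 0]

# Compute the HOG feature vector and a visualization image
# (older scikit-image versions spell the last keyword 'visualise')
features, hog_image = hog(image,
                          orientations=9,          # number of gradient-direction bins
                          pixels_per_cell=(8, 8),  # cell size
                          cells_per_block=(2, 2),  # cells grouped per normalization block
                          visualize=True)
print(features.shape)  # one long vector describing the whole image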
5 Application of image features: computational imaging of panoramic images in cameras
The unsmooth transition in the composite image here is caused by changes in the amount of light entering the lens during imaging. There are many ways to address this, such as image pyramid fusion and weighted fusion, which are not the focus of this paper; a minimal sketch of the weighted-fusion idea is shown below.
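As a minimal sketch of weighted fusion (a toy example with made-up image values; the linear_blend function in the code walkthrough handles the real case with masks and margins), two aligned images that overlap in a band of columns can be combined with linearly varying weights.

import numpy as np

# Two hypothetical aligned grayscale images that overlap in columns 60..99
H, W, overlap = 100, 140, 40
left = np.zeros((H, W));  left[:, :100] = 0.8    # left image occupies columns 0..99
right = np.zeros((H, W)); right[:, 60:] = 0.4    # right image occupies columns 60..139

# Left weight falls from 1 to 0 across the overlap; right weight is its complement
w_left = np.ones((H, W))
w_left[:, 100:] = 0
w_left[:, 60:100] = np.linspace(1, 0, overlap)
w_right = 1 - w_left

blended = w_left * left + w_right * right  # smooth transition inside the overlap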
6 Code walkthrough
# Python 3.6
# panorama.py
import numpy as np
from skimage import filters
from skimage.feature import corner_peaks
from skimage.util.shape import view_as_blocks
from scipy.spatial.distance import cdist
from scipy.ndimage.filters import convolve
from utils import pad, unpad, get_output_space, warp_image
def harris_corners(img, window_size=3, k=0.04):
"""Calculate the Harris corner value of the image by the following formula: R=Det(M)-k(Trace(M)^2). Tip: you can use the convolution function scipy. Ndimage. Filters. Convolve. Parameter: img: grayscale image with shape (H, W) WINDOW_size: kernel size k: sensitivity adjustment factor Returned value: Response: corner response value of image (H, W)"""
# img = img [:, :, 0] # 29 line if an error, please put the 28 img = [0] :, :, remove comments
H, W = img.shape
window = np.ones((window_size, window_size))
response = np.zeros((H, W))
dx = filters.sobel_v(img)
dy = filters.sobel_h(img)
### YOUR CODE HERE
#M = np.ones((H, W,2, 2))
#padded_dx = np.pad(dx, int((window_size-1)/2), mode='edge')
#padded_dy = np.pad(dy, int((window_size-1)/2), mode='edge')
#for i in range(H):
#for j in range(W):
#M[i, j, :, :] = np.array([[sum(sum(b) for b in (padded_dx[i:i+2, j:j+2]*padded_dx[i:i+2, j:j+2])),sum(sum(b) for b in (padded_dx[i:i+2, j:j+2]*padded_dy[i:i+2, j:j+2]))],[sum(sum(b) for b in (padded_dx[i:i+2, j:j+2]*padded_dy[i:i+2, j:j+2])),sum(sum(b) for b in (padded_dy[i:i+2, j:j+2]*padded_dy[i:i+2, j:j+2]))]])
#for a in range(H):
#for b in range(W):
#response[a][b]=np.linalg.det(M[a][b])-k*(np.trace(M[a][b])**2)
dxx = dx * dx
dyy = dy * dy
dxy = dx * dy
mxx = convolve(dxx, window)
mxy = convolve(dxy, window)
myy = convolve(dyy, window)
for i in range(H):
for j in range(W):
M = np.array([[mxx[i, j], mxy[i, j]], [mxy[i, j], myy[i, j]]])
response[i, j] = np.linalg.det(M) - k * np.trace(M) ** 2
### END YOUR CODE
return response
def simple_descriptor(patch):
"""The pixel value distribution is normalized to describe the small patch (the mean value of the standard normal distribution is 0 and the standard deviation is 1), and then it is expanded into a one-dimensional array. Normalizing operations make descriptors more robust to illumination changes. Hint: If the denominator is 0, use 1 instead. Returns: feature: one-dimensional array with size (H * W)"""
feature = []
### YOUR CODE HERE
patch = patch.reshape(-1)
mean = np.mean(patch)
delta = np.std(patch)
if delta > 0.0:
    patch = (patch - mean) / delta
else:
    patch = patch - mean
feature = list(patch)
### END YOUR CODE
return feature
def describe_keypoints(image, keypoints, desc_func, patch_size=16):
"""Args: image: grayscale image of size (H, W) Keypoints: two-dimensional array, each row contains a keypoint (y, x). Desc_func: feature description function, input a graph, and output patch_size: the local image size corresponding to each feature point. For simplicity, I'm going to set the length and width to be the same. Returns: desc: A feature vector describing each feature point."""
image = image.astype(np.float32)
desc = []
for i, kp in enumerate(keypoints):
y, x = kp
patch = image[y-(patch_size//2):y+((patch_size+1)//2),
x-(patch_size//2):x+((patch_size+1)//2)]
desc.append(desc_func(patch))
return np.array(desc)

def match_descriptors(desc1, desc2, threshold=0.5):
"""Match feature points by the Euclidean distance between descriptors.

We have two sets of feature vectors. Take a vector v1 from the first set and compute
its distance to every vector in the second set. If the two closest vectors to v1 are
v21 and v22, then v1 and v21 are considered a match only when d(v1, v21) is much
smaller than d(v1, v22), i.e. d(v1, v21) / d(v1, v22) is below a given threshold.
Repeating this for every v1 gives all matching pairs.

Hint: np.sort, np.argmin and np.asarray in NumPy may be helpful.

Args:
    desc1: array of shape (M, P); each row is a descriptor of size P (M feature points)
    desc2: array of shape (N, P); each row is a descriptor of size P (N feature points)

Returns:
    matches: array of shape (Q, 2); each row holds the indices of one matching pair of descriptors
"""
matches = []
N = desc1.shape[0]
dists = cdist(desc1, desc2)  # pairwise Euclidean distances between the descriptors
### YOUR CODE HERE
#M = desc2.shape[1]
idx = np.argsort(dists, axis=1)
for i in range(N):
closed_dist = dists[i, idx[i, 0]]
second_dist = dists[i, idx[i, 1]]
if(closed_dist < threshold * second_dist):
matches.append([i, idx[i, 0]])
matches = np.array(matches)
### END YOUR CODE
return matches
def fit_affine_matrix(p1, p2):
"""Fitting the Affine transformation matrix so that P2 * H = P1 hint: you can solve this problem using the Np.linalg.lstsq function. Args: P1: a matrix of size (M, P), each row representing the coordinates of a feature point p2: a matrix of size (M, P), each row representing the coordinates of a feature point Return: H: a matrix of size (P, P), he converts P2 to P1. """
assert (p1.shape[0] == p2.shape[0])
p1 = pad(p1)
p2 = pad(p2)
### YOUR CODE HERE
H = np.linalg.lstsq(p2, p1)[0]
### END YOUR CODE
# Force the last column of the affine matrix to be [0, 0, 1]
H[:,2] = np.array([0, 0, 1])
return H
def ransac(keypoints1, keypoints2, matches, n_iters=200, threshold=20):
"""Using RANSAC to find the Affine transformations of a robust. Select a random set of match points (for an Affine transform, you need to select at least three matches, why?). 2. Calculate the transformation matrix of Affine 3. Calculate the number of inliers 4. Matches: Args: keypoints1: M1 x 2 keypoints2: M2 x 2 matches: Args: keypoints1: M1 x 2 matches: N x 2 matrix, each row representing a set of match [keypoint1, keypoint 2] n_iters: number of times the RANSAC algorithm will go through the loop. To simplify the calculation, we fixed the number of cycles to 200. (Remember we talked in class about how to determine the number of cycles? How do you dynamically change the number of loops without knowing if the data is good or bad? Returns: H: A robust estimate of the Affine transform, which transforms keypoints2 to keypoints 1"""
Save the match array to avoid overwriting it directly
orig_matches = matches.copy()
matches = matches.copy()
N = matches.shape[0]
print(N)
n_samples = int(N * 0.2)  # tie n_samples to N; for an affine transform it could be fixed to 3
matched1 = pad(keypoints1[matches[:,0]])
matched2 = pad(keypoints2[matches[:,1]])
max_inliers = np.zeros(N, dtype=bool)  # boolean mask of the inlier matches
n_inliers = 0
# RANSAC main loop
### YOUR CODE HERE
for i in range(n_iters):
temp_max = np.zeros(N, dtype=np.int32)
temp_n = 0
idx = np.random.choice(N, n_samples, replace=False)
p1 = matched1[idx, :]
p2 = matched2[idx, :]
H = np.linalg.lstsq(p2, p1)[0]
H[:, 2] = np.array([0, 0, 1])
temp_max = np.linalg.norm(matched2.dot(H) - matched1, axis=1) ** 2 < threshold
temp_n = np.sum(temp_max)
if temp_n > n_inliers:
max_inliers = temp_max.copy()
n_inliers = temp_n
H = np.linalg.lstsq(matched2[max_inliers], matched1[max_inliers])[0]
H[:, 2] = np.array([0, 0, 1])
### END YOUR CODE
print(H)
return H, matches[max_inliers]
def hog_descriptor(patch, pixels_per_cell=(8,8)):
"""Generate the HOG descriptor by following the steps: 1. Compute the gradient image in the X - and y directions (already calculated) 2. For each cell, compute the gradient histogram 3. For each block, tile them into one-dimensional vectors. 4. Normalizing the tiled block would make the descriptors more robust to illumination changes. Parameter: Patch: grayscale image with size (H, W) pixels_per_cell: Returns: Block: one-dimensional vectors representing the feature description vectors of the patch, size: ((H*W*n_bins)/(M*N))"""
assert (patch.shape[0] % pixels_per_cell[0] == 0),\
'Heights of patch and cell do not match'
assert (patch.shape[1] % pixels_per_cell[1] == 0),\
'Widths of patch and cell do not match'
n_bins = 9
degrees_per_bin = 180 // n_bins
Gx = filters.sobel_v(patch)
Gy = filters.sobel_h(patch)
# unsigned gradient
G = np.sqrt(Gx**2 + Gy**2)
theta = (np.arctan2(Gy, Gx) * 180 / np.pi) % 180
# Treat each cell of size (M, N) as one "pixel"
# G_cells.shape = theta_cells.shape = (H//M, W//N)
# G_cells[0, 0].shape = theta_cells[0, 0].shape = (M, N)
G_cells = view_as_blocks(G, block_shape=pixels_per_cell)
theta_cells = view_as_blocks(theta, block_shape=pixels_per_cell)
rows = G_cells.shape[0]
cols = G_cells.shape[1]
# For each cell, compute the gradient histogram. Its size is N_bins
cells = np.zeros((rows, cols, n_bins))
# For each cell, compute the gradient histogram
### YOUR CODE HERE
# Compute histogram per cell
for i in range(rows):
for j in range(cols):
for m in range(pixels_per_cell[0]):
for n in range(pixels_per_cell[1]):
idx = int(theta_cells[i, j, m, n] // degrees_per_bin)
if idx == 9:
idx = 8
cells[i, j, idx] += G_cells[i, j, m, n]
cells = (cells - np.mean(cells)) / np.std(cells)
block = cells.reshape(-1)
### END YOUR CODE
return block
def linear_blend(img1_warped, img2_warped):
"""Follow these steps to mix img1_Warped and img2_Warped linearly: 1. Determine the left and right Spaces (already done for you) 2. To determine a weight matrix for img1_Warped and IMG2_Warped, you might use the Np. linspace and Np. tile functions 3. Img2_warped: merged image into one Args: img1_warped: merged image into one img2 Args: img1_Warped: merged image into one img2 Returns: merged image"""
out_H, out_W = img1_warped.shape # Size of output imageimg1_mask = (img1_warped ! = 0)# Mask == 1 inside the imageimg2_mask = (img2_warped ! = 0)# Mask == 1 inside the image
Find the rightmost column of IMg1 in the output image
# This is the end point of our img1 weight matrix (i.e. the position where the weight value is 0)
right_margin = out_W - np.argmax(np.fliplr(img1_mask)[out_H//2, :].reshape(1, out_W), 1)[0]
# Find the leftmost column of img2 in the output image.
# This is the starting point of our img2 weight matrix (i.e. the position where the weight value is 1).
left_margin = np.argmax(img2_mask[out_H//2, :].reshape(1, out_W), 1)[0]
### YOUR CODE HERE
left_matrix = np.array(img1_mask, dtype=np.float64)
right_matrix = np.array(img2_mask, dtype=np.float64)
left_matrix[:, left_margin: right_margin] = np.tile(np.linspace(1, 0, right_margin - left_margin), (out_H, 1))
right_matrix[:, left_margin: right_margin] = np.tile(np.linspace(0, 1, right_margin - left_margin), (out_H, 1))
img1 = left_matrix * img1_warped
img2 = right_matrix * img2_warped
merged = img1 + img2
### END YOUR CODE
return merged
def stitch_multiple_images(imgs, desc_func=simple_descriptor, patch_size=5):
"""Splice together a series of images. Parameters: IMgs: a List of length m, where each element is a picture and the picture is ordered. Desc_func: feature description function, input a graph, and output patch_size: the local image size corresponding to each feature point. For simplicity, I'm going to set the length and width to be the same. Returns: Panorama: The resulting panorama."""
# For each photo, detect its feature points
keypoints = []  # keypoints[i] corresponds to imgs[i]
for img in imgs:
kypnts = corner_peaks(harris_corners(img, window_size=3),
threshold_rel=0.05,
exclude_border=8)
keypoints.append(kypnts)
# Describe feature points
descriptors = []  # descriptors[i] corresponds to keypoints[i]
for i, kypnts in enumerate(keypoints):
desc = describe_keypoints(imgs[i], kypnts,
desc_func=desc_func,
patch_size=patch_size)
descriptors.append(desc)
# Match feature points of adjacent images together
matches = []  # matches[i] records the matches between descriptors[i] and descriptors[i+1]
H = []
robust_matches = []
for i in range(len(imgs) - 1):
    mtchs = match_descriptors(descriptors[i], descriptors[i + 1], 0.7)
    matches.append(mtchs)
    h, robust_m = ransac(keypoints[i], keypoints[i + 1], matches[i])
    H.append(h)
    robust_matches.append(robust_m)

# Use the matched points to stitch the images together
### YOUR CODE HERE
output_shape, offset = get_output_space(imgs[1],
                                        [imgs[0], imgs[2], imgs[3]],
                                        [np.linalg.inv(H[0]), H[1], np.dot(H[1], H[2])])

img1_warped = warp_image(imgs[0], np.linalg.inv(H[0]), output_shape, offset)
img1_mask = (img1_warped != -1)
img1_warped[~img1_mask] = 0

img2_warped = warp_image(imgs[1], np.eye(3), output_shape, offset)
img2_mask = (img2_warped != -1)
img2_warped[~img2_mask] = 0

img3_warped = warp_image(imgs[2], H[1], output_shape, offset)
img3_mask = (img3_warped != -1)
img3_warped[~img3_mask] = 0

img4_warped = warp_image(imgs[3], np.dot(H[1], H[2]), output_shape, offset)
img4_mask = (img4_warped != -1)
img4_warped[~img4_mask] = 0

panorama = linear_blend(img1_warped, img2_warped)
panorama = linear_blend(panorama, img3_warped)
panorama = linear_blend(panorama, img4_warped)
### END YOUR CODE
return panorama
# Python 3.6
# utils.py
import numpy as np
from scipy.ndimage import affine_transform
# Convert between Euclidean coordinates and homogeneous coordinates
pad = lambda x: np.hstack([x, np.ones((x.shape[0], 1))])
unpad = lambda x: x[:,:-1]
def plot_matches(ax, image1, image2, keypoints1, keypoints2, matches,
keypoints_color='k', matches_color=None, only_matches=False):
"""Draw the matching feature points. Parameter ---------- ax: matplotlib.axes.Axes axes used to draw images and match point pairs image1: (N, M [, 3]) array grayscale image or color image (N, M [, 3]) array grayscale image or color image keyPoints1: (K1, 2) Array (col, 1). Matches: (K2, 2) array matches (row, col) matches: (Q, 2) array match matches Keypoints_color: matplotlib color, optional specifies the color to draw. Matches_color: Only_matches: bool, optional Specifies whether to draw the coordinate of the keyPoint.""
image1.astype(np.float32)
image2.astype(np.float32)
new_shape1 = list(image1.shape)
new_shape2 = list(image2.shape)
if image1.shape[0] < image2.shape[0]:
new_shape1[0] = image2.shape[0]
elif image1.shape[0] > image2.shape[0]:
new_shape2[0] = image1.shape[0]
if image1.shape[1] < image2.shape[1]:
new_shape1[1] = image2.shape[1]
elif image1.shape[1] > image2.shape[1]:
new_shape2[1] = image1.shape[1]
if new_shape1 != image1.shape:
    new_image1 = np.zeros(new_shape1, dtype=image1.dtype)
    new_image1[:image1.shape[0], :image1.shape[1]] = image1
    image1 = new_image1

if new_shape2 != image2.shape:
    new_image2 = np.zeros(new_shape2, dtype=image2.dtype)
    new_image2[:image2.shape[0], :image2.shape[1]] = image2
    image2 = new_image2

image = np.concatenate([image1, image2], axis=1)
offset = image1.shape

if not only_matches:
ax.scatter(keypoints1[:, 1], keypoints1[:, 0],
facecolors='none', edgecolors=keypoints_color)
ax.scatter(keypoints2[:, 1] + offset[1], keypoints2[:, 0],
facecolors='none', edgecolors=keypoints_color)
ax.imshow(image, interpolation='nearest', cmap='gray')
ax.axis((0, 2 * offset[1], offset[0], 0))
for i in range(matches.shape[0]):
idx1 = matches[i, 0]
idx2 = matches[i, 1]
if matches_color is None:
color = np.random.rand(3)
else:
color = matches_color
ax.plot((keypoints1[idx1, 1], keypoints2[idx2, 1] + offset[1]),
(keypoints1[idx1, 0], keypoints2[idx2, 0]),
'-', color=color)
def get_output_space(img_ref, imgs, transforms):
"""Args: img_ref: Reference image: images to be transformed: Transforms [I] maps points in image imgs[I] to img_ref Returns: outputs the size of the image."""
assert (len(imgs) == len(transforms))
r, c = img_ref.shape
corners = np.array([[0, 0], [r, 0], [0, c], [r, c]])
all_corners = [corners]
for i in range(len(imgs)):
r, c = imgs[i].shape
H = transforms[i]
corners = np.array([[0, 0], [r, 0], [0, c], [r, c]])
warped_corners = corners@H[:2,:2] + H[2,:2]
all_corners.append(warped_corners)
# Find the boundary of Corner
all_corners = np.vstack(all_corners)
# The shape of the output image is the difference between the two farthest corners
corner_min = np.min(all_corners, axis=0)
corner_max = np.max(all_corners, axis=0)
output_shape = (corner_max - corner_min)
# Make sure the output shape is an integer
output_shape = np.ceil(output_shape).astype(int)
offset = corner_min
return output_shape, offset
def warp_image(img, H, output_shape, offset):
# Note how affine_transform works: given a pixel o of the output image,
# its pixel value is taken from the input image at
# np.dot(matrix, o) + offset.
# Therefore we need to invert the transformation matrix; this inverse
# (backward) mapping makes interpolation more convenient.
Hinv = np.linalg.inv(H)
m = Hinv.T[:2, :2]
b = Hinv.T[:2, 2]
img_warped = affine_transform(img.astype(np.float32),
                              m, b + offset,
                              output_shape,
                              cval=-1)
return img_warped
7 Examples
7.1 Harris Corner part
We start with Harris corner detection
Initial configuration
import numpy as np
from skimage import filters
from skimage.feature import corner_peaks
from skimage.io import imread
import matplotlib.pyplot as plt
from time import time
from panorama import harris_corners
%matplotlib inline
plt.rcParams['figure.figsize'] = (15.0, 12.0)  # set the default figure size
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
# Automatic reload of external modules
%load_ext autoreload
%autoreload 2
img = imread('sudoku.png', as_gray=True)
# Calculate Harris corner response value
response = harris_corners(img)
# Display the corner response values
plt.imshow(response)
plt.axis('off')
plt.title('Harris Corner Response')
# Non-maximum suppression of Harris response images
# and get the coordinates of the response peaks
corners = corner_peaks(response, threshold_rel=0.01)

# Show the corners
plt.imshow(img)
plt.scatter(corners[:,1], corners[:,0], marker='x')
plt.axis('off')
plt.title('Detected Corners')
plt.show()
from panorama import harris_corners
img1 = imread('uttower1.jpg', as_grey=True)
img2 = imread('uttower2.jpg', as_grey=True)
# Use the Harris corner detection algorithm to detect keypoints in the two images
keypoints1 = corner_peaks(harris_corners(img1, window_size=3),
                          threshold_rel=0.05,
                          exclude_border=8)
keypoints2 = corner_peaks(harris_corners(img2, window_size=3),
                          threshold_rel=0.05,
                          exclude_border=8)

# Show the detected keypoints
plt.subplot(1, 2, 1)
plt.imshow(img1)
plt.scatter(keypoints1[:, 1], keypoints1[:, 0], marker='x')
plt.axis('off')
plt.title('Detected Keypoints for Image 1')
plt.subplot(1, 2, 2)
plt.imshow(img2)
plt.scatter(keypoints2[:, 1], keypoints2[:, 0], marker='x')
plt.axis('off')
plt.title('Detected Keypoints for Image 2')
plt.show()
Having obtained the corner features, we now need to describe and match them: find matching vectors between the two sets of descriptors. First, take a descriptor (the vector generated from a keypoint) from each image and compute their Euclidean distance. The Euclidean distance determines whether they are a corresponding pair of descriptors: if the distance between a pair is significantly smaller than the distances to all other descriptors, we consider them a corresponding pair of feature points. The output of this function is an array in which each row holds the indices of one corresponding pair of descriptors.
from panorama import simple_descriptor, match_descriptors, describe_keypoints
from utils import plot_matches
patch_size = 5
# Generate a description vector (descriptor) from each keypoint
desc1 = describe_keypoints(img1, keypoints1,
desc_func=simple_descriptor,
patch_size=patch_size)
desc2 = describe_keypoints(img2, keypoints2,
desc_func=simple_descriptor,
patch_size=patch_size)
# Match the descriptors
matches = match_descriptors(desc1, desc2, 0.7)

# Draw the matching points
fig, ax = plt.subplots(1, 1, figsize=(15, 12))
ax.axis('off')
plot_matches(ax, img1, img2, keypoints1, keypoints2, matches)
plt.show()
We now have a list of corresponding descriptors. We will use this list to compute the transformation matrix that maps image 2 onto image 1. In other words, if $p_1$ is a point in image 1 and $p_2$ is the corresponding point in image 2, we need to find an affine transformation matrix $H$ that performs the coordinate transformation so that

$$\tilde{p}_2 H = \tilde{p}_1,$$

where $\tilde{p}_1$ and $\tilde{p}_2$ are homogeneous coordinates.

Note that the matrix $H$ may not make every pair of corresponding points match exactly, but we can estimate the optimal matrix $H$ by least squares. Suppose we are given $N$ pairs of matching feature points, and let $X_1$ and $X_2$ be the two matrices whose rows are the homogeneous coordinates of the corresponding feature points (the original coordinates with a 1 appended). Then the matrix $H$ should satisfy:

$$X_2 H = X_1.$$
from panorama import fit_affine_matrix
# Check whether the transformation matrix solution is correct
# Test values
a = np.array([[0.5, 0.1], [0.4, 0.2], [0.8, 0.2]])
b = np.array([[0.3, -0.2], [-0.4, -0.9], [0.1, 0.1]])
H = fit_affine_matrix(b, a)

# Expected output value
sol = np.array([[1.25, 2.5, 0.0],
                [-5.75, -4.5, 0.0],
                [0.25, -1.0, 1.0]])
error = np.sum((H - sol) ** 2)

if error < 1e-20:
    print('Implementation correct!')
else:
    print('There is something wrong.')
from utils import get_output_space, warp_image
# Get feature points
p1 = keypoints1[matches[:,0]]
p2 = keypoints2[matches[:,1]]
# Solve for the transformation matrix
H = fit_affine_matrix(p1, p2)
output_shape, offset = get_output_space(img1, [img2], [H])
print("Output shape:", output_shape)
print("Offset:", offset)
# coordinate transformation
img1_warped = warp_image(img1, np.eye(3), output_shape, offset)# Picture 1 remains the same, so the identity matrix is used as the transformation matrix
img1_mask = (img1_warped != -1) # Mask not equal to -1 indicates that the position is the interpolated point of the original image
img1_warped[~img1_mask] = 0 # Mask reverses to indicate that there is no value at the location (the location is the background), and 0 makes the location black
img2_warped = warp_image(img2, H, output_shape, offset)
img2_mask = (img2_warped != -1) # Mask not equal to -1 indicates that the position is the interpolated point of the original image
img2_warped[~img2_mask] = 0 # Mask reverses to indicate that there is no value at the location (the location is the background), and 0 makes the location black
# Draw the coordinate-transformed images
plt.subplot(1, 2, 1)
plt.imshow(img1_warped)
plt.title('Image 1 warped')
plt.axis('off')

plt.subplot(1, 2, 2)
plt.imshow(img2_warped)
plt.title('Image 2 warped')
plt.axis('off')
plt.show()
Output shape: [496 615]
Offset: [-39.37184617   0.        ]
merged = img1_warped + img2_warped
# Determine which values are added twice
overlap = (img1_mask * 1.0 +  # multiplying converts bool to float
           img2_mask)
# Divide the value of the overlapped part by 2 and the value of the unoverlapped part by 1 to ensure the same brightness
normalized = merged / np.maximum(overlap, 1)
plt.imshow(normalized)
plt.axis('off')
plt.show()
So that is the final stitched result. You probably have plenty of question marks in your head, maybe even a "what on earth is this" reaction. Let us analyze why this happened. When matching feature points we found that the features corresponded very well, and we visualized them, so the problem is not in feature detection.
After that, to warp the image, we solved for an affine transformation matrix. It is at this step that the result went wrong: when determining corresponding points, our constraints were not strict enough, so many invalid matches were introduced. The RANSAC (Random Sample Consensus) method can be used to further filter the data and then solve for the transformation matrix.
The main steps of RANSAC include:
- Pick the corresponding points randomly
- Compute transformation matrix
- Calculate valid data points (Inliers) within a given threshold range
- Repeat steps 1-3 to record the maximum number of valid points
- The transformation matrix is recalculated using valid data points and least square method
from panorama import ransac
# Set the same seed to ensure the same random numbers are generated
np.random.seed(131)
H, robust_matches = ransac(keypoints1, keypoints2, matches, threshold=1)
# show results using RANSAC
fig, ax = plt.subplots(1, 1, figsize=(15, 12))
plot_matches(ax, img1, img2, keypoints1, keypoints2, robust_matches)
plt.axis('off')
plt.show()
Now we can use the newly computed transformation matrix to create a more stable panoramic image.
output_shape, offset = get_output_space(img1, [img2], [H])
# coordinate transformation
img1_warped = warp_image(img1, np.eye(3), output_shape, offset)# Picture 1 remains the same, so the identity matrix is used as the transformation matrix
img1_mask = (img1_warped != -1) # Mask not equal to -1 indicates that the position is the interpolated point of the original image
img1_warped[~img1_mask] = 0 # Mask reverses to indicate that there is no value at the location (the location is the background), and 0 makes the location black
img2_warped = warp_image(img2, H, output_shape, offset)
img2_mask = (img2_warped != -1) # Mask not equal to -1 indicates that the position is the interpolated point of the original image
img2_warped[~img2_mask] = 0 # Mask reverses to indicate that there is no value at the location (the location is the background), and 0 makes the location black
# Display the coordinate-transformed images
plt.subplot(1, 2, 1)
plt.imshow(img1_warped)
plt.title('Image 1 warped')
plt.axis('off')

plt.subplot(1, 2, 2)
plt.imshow(img2_warped)
plt.title('Image 2 warped')
plt.axis('off')
plt.show()
merged = img1_warped + img2_warped
# Determine which values are added twice
overlap = (img1_mask * 1.0 +  # multiplying converts bool to float
           img2_mask)
# Divide the value of the overlapped part by 2 and the value of the unoverlapped part by 1 to ensure the same brightness
normalized = merged / np.maximum(overlap, 1)
plt.imshow(normalized)
plt.axis('off')
plt.show()
7.2 Histogram of Oriented Gradients
In the previous computation we used simple_descriptor to compute the feature vectors of the feature points. In this part we will use HOG to describe the feature points instead.
HOG stands for Histogram of Oriented Gradients. In the HOG algorithm, we use the distribution of gradient directions as the feature vector. Gradients (in the x and y directions) are useful information in an image and can be used to determine the type of a point, such as an edge.
The HOG steps include:
- Compute the gradient image in the x and y directions
- Calculate the gradient histogram: Divide the image into several cells and calculate the gradient histogram for each cell
- Several cells are combined into a block, and the gradient histogram inside is transformed into feature vector
- Normalize the eigenvectors
from panorama import hog_descriptor
img1 = imread('uttower1.jpg', as_grey=True)
img2 = imread('uttower2.jpg', as_grey=True)
# Keypoint detection
keypoints1 = corner_peaks(harris_corners(img1, window_size=3),
                          threshold_rel=0.05,
                          exclude_border=8)
keypoints2 = corner_peaks(harris_corners(img2, window_size=3),
                          threshold_rel=0.05,
                          exclude_border=8)

# Extract features
desc1 = describe_keypoints(img1, keypoints1,
desc_func=hog_descriptor,
patch_size=16)
desc2 = describe_keypoints(img2, keypoints2,
desc_func=hog_descriptor,
patch_size=16)
# Feature matching
matches = match_descriptors(desc1, desc2, 0.7)

# Draw the matching features
fig, ax = plt.subplots(1, 1, figsize=(15, 12))
ax.axis('off')
plot_matches(ax, img1, img2, keypoints1, keypoints2, matches)
plt.show()
from panorama import ransac
# Set the same seed to ensure the same random numbers are generated
np.random.seed(131)
H, robust_matches = ransac(keypoints1, keypoints2, matches, threshold=1)
# Draw the matching points
fig, ax = plt.subplots(1, 1, figsize=(15, 12))
plot_matches(ax, img1, img2, keypoints1, keypoints2, robust_matches)
plt.axis('off')
plt.show()
plt.imshow(imread('solution_hog_ransac.png'))
plt.axis('off')
plt.title('HOG descriptor + RANSAC Solution')
plt.show()
output_shape, offset = get_output_space(img1, [img2], [H])
# Coordinate transformation
img1_warped = warp_image(img1, np.eye(3), output_shape, offset)# Picture 1 only needs translation, so the identity matrix is used as the transformation matrix
img1_mask = (img1_warped != -1) # Mask not equal to -1 indicates that the position is the interpolated point of the original image
img1_warped[~img1_mask] = 0 # Mask reverses to indicate that there is no value at the location (the location is the background), and 0 makes the location black
img2_warped = warp_image(img2, H, output_shape, offset)
img2_mask = (img2_warped != -1) # Mask not equal to -1 indicates that the position is the interpolated point of the original image
img2_warped[~img2_mask] = 0 # Mask reverses to indicate that there is no value at the location (the location is the background), and 0 makes the location black
# Draw the coordinate-transformed images
plt.subplot(1, 2, 1)
plt.imshow(img1_warped)
plt.title('Image 1 warped')
plt.axis('off')

plt.subplot(1, 2, 2)
plt.imshow(img2_warped)
plt.title('Image 2 warped')
plt.axis('off')
plt.show()
merged = img1_warped + img2_warped
# Determine which values are added twice
overlap = (img1_mask * 1.0 +  # multiplying converts bool to float
           img2_mask)
# Divide the value of the overlapped part by 2 and the value of the unoverlapped part by 1 to ensure the same brightness
normalized = merged / np.maximum(overlap, 1)
plt.imshow(normalized)
plt.axis('off')
plt.show()
plt.imshow(imread('solution_hog_panorama.png'))
plt.axis('off')
plt.title('HOG Descriptor Panorama Solution')
plt.show()
7.3 Simple Image Fusion Strategy
Looking at the final output image, there is a clear seam in the blended region. These seams are not what we want, so we need to remove them, using a very simple method called linear blending.
In the previous computation of the blended image, we divided the intensity values in the overlap region by 2 and everywhere else by 1. This means the left and right images are weighted equally. In practice, however, the two sides of the overlap should not carry equal weight: near the left image, the left image should weigh more; near the right image, the right image should weigh more. Accordingly, we can use linear blending to improve the quality of the final panorama.
Linear blending can be achieved by the following steps:
- Determine the left and right boundaries of the blending region.
- Determine a weight matrix for image 1:
- from the far left of image 1 to the left boundary of the blending region, the weight is 1;
- from the left boundary of the blending region to the far right of image 1, the weight decreases linearly from 1 to 0;
- Determine a weight matrix for image 2:
- from the right boundary of the blending region to the far right of image 2, the weight is 1;
- from the left boundary of the blending region to the right boundary, the weight increases linearly from 0 to 1;
- Apply the weight matrices to the left and right images respectively;
- Add the two weighted images.
from panorama import linear_blend
img1 = imread('uttower1.jpg', as_grey=True)
img2 = imread('uttower2.jpg', as_grey=True)
# Set the same seed to ensure the same random numbers are generated
np.random.seed(131)
# Detect feature points
keypoints1 = corner_peaks(harris_corners(img1, window_size=3),
                          threshold_rel=0.05,
                          exclude_border=8)
keypoints2 = corner_peaks(harris_corners(img2, window_size=3),
                          threshold_rel=0.05,
                          exclude_border=8)

# Extract feature description vectors
desc1 = describe_keypoints(img1, keypoints1,
desc_func=hog_descriptor,
patch_size=16)
desc2 = describe_keypoints(img2, keypoints2,
desc_func=hog_descriptor,
patch_size=16)
# Match the feature description vectors
matches = match_descriptors(desc1, desc2, 0.7)
H, robust_matches = ransac(keypoints1, keypoints2, matches, threshold=1)
output_shape, offset = get_output_space(img1, [img2], [H])

# Place the images into the final output image
img1_warped = warp_image(img1, np.eye(3), output_shape, offset)
img1_mask = (img1_warped != -1) # Mask == 1 inside the image
img1_warped[~img1_mask] = 0 # Return background values to 0
img2_warped = warp_image(img2, H, output_shape, offset)
img2_mask = (img2_warped != -1) # Mask == 1 inside the image
img2_warped[~img2_mask] = 0 # Return background values to 0
# Make two pictures into one
merged = linear_blend(img1_warped, img2_warped)
plt.imshow(merged)
plt.axis('off')
plt.show()
7.4 Multi-image registration: panoramic image stitching
Suppose you now have m images. We can treat each pair of adjacent images as a group and compute the coordinate transformation matrix between them (from image i+1 to image i), which is the affine transformation we computed earlier. Next, we select the middle photo as the reference image and transform the coordinate systems of the other photos into the coordinate system of this reference image, then stitch the images together. A sketch of how the transforms are composed follows.
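For example, with four images and the pairwise matrices H[0], H[1], H[2] returned by ransac (H[i] maps points of image i+1 into the frame of image i), the per-image transforms relative to the second image are composed roughly as follows; this is only a sketch of the idea, mirroring the composition used by stitch_multiple_images in the code walkthrough.

import numpy as np

def transforms_to_reference(H):
    # Per-image transforms into the frame of the reference image (here imgs[1]),
    # following the same composition as stitch_multiple_images in the code walkthrough
    return [np.linalg.inv(H[0]),    # image 0 -> reference: invert H[0] (image 1 -> image 0)
            np.eye(3),              # image 1 is the reference itself
            H[1],                   # image 2 -> reference
            np.dot(H[1], H[2])]     # image 3 -> image 2 -> reference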
from panorama import stitch_multiple_images
# Set the same seed to ensure the same random numbers are generated
np.random.seed(131)
# Load the images to be stitched
img1 = imread('yosemite1.jpg', as_grey=True)
img2 = imread('yosemite2.jpg', as_grey=True)
img3 = imread('yosemite3.jpg', as_grey=True)
img4 = imread('yosemite4.jpg', as_grey=True)
imgs = [img1, img2, img3, img4]
# Splice images together
panorama = stitch_multiple_images(imgs, desc_func=simple_descriptor, patch_size=5)
# Show the spliced image
plt.imshow(panorama)
plt.axis('off')
plt.show()
7.5 Expanding the Discussion
To reduce the amount of computation, the code in this paper processes grayscale images, but this does not mean that features cannot be extracted from, or stitching performed on, color images. Computing features and registering based on them is equally feasible for color images: feature matching on the grayscale version (or on the three channels combined) can drive the stitching of the color images, as sketched below.
In addition, there are three or four mainstream directions in current image-stitching research. One of the most interesting is shape-preserving half-projective warping (SPHP, CVPR 2014), which is based on mesh deformation. The idea is to find an optimal common plane and map all the images onto it.
From the perspective of shape correction, SPHP gradually transitions the non-overlapping regions toward a global similarity transform and adds similarity constraints over the whole image, which corrects the shape of the stitched image and reduces projection distortion (the core idea of the algorithm, mesh deformation, is shown below).
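As a sketch of one simple way to handle a color image (assuming H, output_shape and offset have already been computed from the grayscale pipeline above, and reusing warp_image from utils.py), each RGB channel can be warped with the same transform and the warped channels stacked back together:

import numpy as np
from skimage.io import imread
from utils import warp_image

# Assumes H, output_shape and offset were computed from the grayscale pipeline above
img2_color = imread('uttower2.jpg') / 255.0  # H x W x 3, values in [0, 1]

channels = []
for c in range(3):
    warped = warp_image(img2_color[:, :, c], H, output_shape, offset)
    warped[warped == -1] = 0  # warp_image marks background pixels with -1
    channels.append(warped)
img2_color_warped = np.dstack(channels)  # blend per channel afterwards, e.g. with linear_blend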
The algorithm flow chart is as follows:
The Matlab code can be downloaded from this URL: www.cmlab.csie.ntu.edu.tw/~frank/SPH/cvpr14_SPHP_code.tar
The feature point matching results of the method are as follows:
This method produces very good results and can correct local nonlinear distortions of the image, which is the advantage of mesh deformation.
Interested readers can look into the related methods in this field; features and registration remain an active research direction in computer vision.
8 References
Harris corner detection reference [1], published in 1988, cited 17,316 times:
[1] Harris C G, Stephens M. A combined corner and edge detector[C]//Alvey vision conference. 1988, 15(50): 10-5244.
Google Scholar and Baidu Scholar do not provide the full text of this 1988 article; it is probably a bit too old. Here is an online viewing page from a Chinese document-sharing site: max.book118.com/html/2017/0…
IJCV and CVPR are among the top journals and conferences in computer vision. [2] was published in 2005 and has been cited 30,607 times; [3] was published in 2012 and has been cited 166 times.
[2] Dalal N, Triggs B. Histograms of Oriented Gradients for Human Detection[C]// 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). IEEE, 2005.
eee.uci.edu/16w/34830/h…
[3] Chandrasekhar V, Takacs G, Chen D M, et al. Compressed Histogram of Gradients: A Low-Bitrate Descriptor[J]. International Journal of Computer Vision, 2012, 96(3): 384-399.
web.stanford.edu/~bgirod/pdf…
SPHP image stitching reference [4], published in 2014, cited 158 times:
[4] Chang C H, Sato Y, Chuang Y Y. Shape-preserving half-projective warps for image stitching[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 3254-3261.
www.cv-foundation.org/openaccess/…