Hello everyone, today we are going to look at a very, very classic algorithm problem – nearest point pair problem.
This question often appears in various interviews, the difficulty is not low, few people can answer. To tell you the truth, I have been asked the same question, so I did not answer because I was unprepared. Yes, it’s a bit of a magic question that unprepared people often fail to answer.
The question
So let’s look at the problem. The problem is very simple, n points in a plane. So now we know the coordinates of these n points, and we want to find the distance between the two nearest points of these n points.
I’m not sure if this question comes from astronomy, but it’s very appropriate to put it in an astronomical context. Imagine that there are millions of stars in the universe, and we want to find the closest two of them. They could be binary stars or companion galaxies… Does that sound romantic when you think about it?
If we analyze the problem, we will find a contradiction. The paradox is that if we want to figure out the distance between every two points, the complexity must be zeroBecause if you take two of n points, one of them will bePossible. If there were a faster algorithm, then of course we wouldn’t be able to find all the distances between the pairs, but if we haven’t even enumerated all the distances, how can we tell if we’ve got it right?
That’s what I said in the interview. We now know it’s wrong, but without this information, how can you judge?
Divide and conquer method
If we think about it a little bit, this is a very similar problem to sorting. Because when we sort, it looks like there’s a magnitude relationship between every two points, and it looks like we’re trying to sort to get those relationships. But actually, we all know that you can do either quicksort or merge sortTime to complete the sorting.
Both quicksort and merge sort are essentially divide-and-conquer. Can this problem be solved using divide-and-conquer?
The answer is yes, of course, since we are using divide-and-conquer, the first thing we need to do is split the whole data into two parts, use recursion to complete the two parts, and then merge to get the complete result. In this problem, it is very simple for us to split the data. We just need to sort all the points according to the X-axis coordinates, and then select the midpoint for segmentation. After segmentation, we get the following results:
After the split, we only need to count the closest point pair of the left part, the closest point pair of the right part, and one point on the left and one point on the right. The first two cases are easy to solve, we can just recurse, but what about the third case? That is the difficulty of this question.
It is not easy to analyze this problem clearly, and in-depth thinking is required. First of all, we can obtain the shortest distance D1 of SL on the left and the shortest distance D2 of SR on the right through recursive calls. We takeThat’s the smallest distance between the left and the right, and that should make sense.
Once we have D, we can use it to limit the range of pairs of points in SL and SR, otherwise we have to compare n/2 points on each side, which is still computatively complicated.
So let’s analyze the problem, let’s pick a random point p on the left, and let’s think about a question, for point P, all of the points on the side of SR could be closest to it, right? Of course not, there are some far away is obviously impossible, for these points we do not need to go through, directly can batch ignore. To form the nearest point pair with p, it must be within the dotted line below.
The box formed by the dotted line is a rectangle with a width of D and a length of 2D. Where does that come from? In fact, for point P, in order to form a global nearest point pair with it, the distance from it must be less than the current optimal solution D. And since the distance has to be less than D, that means that the absolute value of the difference between their vertical and horizontal coordinates has to be less than D.
Of course, this box is just what we see intuitively. There is no such box when the actual algorithm runs. We can only judge whether a certain point is in the box according to the coordinate axis.
With this box, we have another question, which is how useful is this box? Does this box reduce the complexity of the algorithm? Is there an extreme case where all the right points are in the box?
And it turns out that it’s not possible, and it turns out that it’s not possible, and it turns out that it’s very, very surprising.
First of all, let’s argue why it’s impossible for all the points on the right to fall in this dotted box. So let’s look at the extreme case first, and the extreme case is where we pick p right on the dividing line. Then the frames drawn with it should all fall in the SR region, and the drawing should look something like this:
But if we think about it a little bit, the problem is that the number of dots in this dotted box can’t be infinite. Because we have a basic requirement for the points in the box, that the points that are in the box and in the SR region should be at least D apart. If it’s less than D then it contradicts what we just said that D is the minimum distance between the left and the right. So the situation in the diagram above is actually impossible, because there are so many points clustered together that the distance between the two points is obviously less than D, which is a contradiction.
So because of this distance, there’s a finite number of points that can fall into this dotted box, and it’s a much smaller number than you might think. How small is it? As small as 6, this is the case:
In the figure above, there are six points, and the shortest distance between them is D, which is the most extreme case. No matter how we add points to it, it’s bound to produce two points that are less than D apart. This is our intuitive feeling, is there any way to prove it? In fact, there is. We can divide the rectangle into six equal parts and make it look like this:
So let’s see, the length of each of these little rectangles is, and the width isIts diagonal length is 1. Then according to thePigeon house principleIf we put more than six points, there must be two points in a small rectangle. However, the maximum distance within the small rectangle is less than D, which means that the distance between the two points must also be less than D, which contradicts our previous assumption. Therefore, it can be concluded that there is no case where there are more than 7 points.
In other words, for point P on SL side, we can only find 6 points on SR side at most to constitute the shortest point pair, so the number of point pairs we need to screen is greatly reduced. And for the points on SL side, not all points need to be considered, only the points whose abscissa difference with the midpoint O is less than D need to be considered.
On the surface it looks like all of our analysis is over, but there’s actually one problem left unsolved. How do we find these six points? Obviously, you can’t just take the x-coordinate, but you have to take the y-coordinate. After dividing the point set into the left and right parts, we sorted the right part according to ordinate. For the point on the left (x, y), we only need to screen out the points on the right that meet the ordinate range of (y-d, y + D), and there are only 6 such points at most. We can use the dichotomy method to find the smallest point whose ordinate is greater than y-d, and then enumerate the next six points.
Code implementation
Before we can implement the algorithm, we need to generate test data, otherwise how do we verify that our algorithm has problems? I also developed this algorithm from scratch, which is also helpful for DEBUG.
In this problem, the test data is relatively simple, only need to generate two random numbers as coordinates. We call this method to make 200 points as a test.
import random
def random_point(a):
x, y = random.uniform(0.1000), random.uniform(0.1000)
return (x, y)
points = [random_point() for _ in range(200)]
Copy the code
And then we’re going to implement the brute force solution, to test the correctness of our algorithm, which I don’t think I have to say much about.
def distance(x, y):
return math.sqrt((x[0] - y[0]) * *2 + (x[1] - y[1]) * *2)
def brute_force(points):
ret = sys.maxsize
a, b = None.None
n = len(points)
for i in range(n):
for j in range(i+1, n):
dis = distance(points[i], points[j])
if dis < ret:
ret = dis
a, b = i, j
return ret, points[a], points[b]
Copy the code
In fact, the algorithm itself is not a large amount of code, but a lot of details, a slight mistake may be overturned.
def divide_algorithm(points):
n = len(points)
# select only one or two dots
if n < 2:
return sys.maxsize, None.None
elif n == 2:
return distance(points[0], points[1]), points[0], points[1]
# Sort all points in horizontal order
points = sorted(points)
half = (n - 1) / /2
# recursion, here's a question, why sort first and then recurse?
d1, a1, b1 = divide_algorithm(points[:half])
d2, a2, b2 = divide_algorithm(points[half:])
d, a, b = (d1, a1, b1) if d1 < d2 else (d2, a2, b2)
calibration = points[half][0]
# Divide the set of points into two parts based on the middle position
left, right = [], []
for u in points:
if calibration - d < u[0] < calibration:
left.append(u)
elif calibration <= u[0] < calibration + d:
right.append(u)
The set of points on the right is sorted in ordinate order
right = sorted(right, key=lambda x: (x[1], x[0]))
res = d
for u in left:
# left open right closed dichotomy
l, r = - 1, len(right)- 1
while r - l > 1:
m = (l + r) >> 1
if right[m][1] <= u[1] - d:
l = m
else:
r = m
idx = r
A maximum of 6 points in a range can form the nearest point pair
for j in range(7) :
if j + idx >= len(right):
break
if distance(u, right[idx + j]) < res:
res = distance(u, right[idx + j])
a, b = u, right[idx + j]
return res, a, b
Copy the code
The algorithm is done, but there are still some details, like why do we need to sort and recurse when we divide and conquer? Can we just recurse in two parts? Why not? For example, when we’re dichotomizing, we’re using the interval dichotomy, right?
I’m not going to answer these two questions, but I hope you can try to think about it for yourself. If you have any ideas, feel free to leave a comment below.
That’s all for today’s article. I sincerely wish you all a fruitful day. If you still like today’s content, please join us in a three-way support.
Original link, ask a concern