Use GeoHash to encode a location

One, foreword

There was a recent requirement to calculate all stores within 5km of customer coordinates and sort them by walking distance.

The most direct method is to traverse all stores in the city, but this method is obviously not desirable, because the time complexity is too high in the case of a large number of stores and the calculation of walking distance and ranking.

It suddenly occurred to me that when I was making images, I met a problem: given three thousand consecutive frames in a video, but they were out of order, I told you the first frame and sorted the three thousand frames. Traversing through all the pixels of an image was also not desirable; the solution was to use perceptual hashing to calculate the fingerprints of all images, then use Ming distance to calculate the image closest to the first as the second, then calculate the image closest to the second as the third, and so on.

In the same way, there must be a hashing method that converts a geographic location into a code that can be used for geocomputation. It is the GeoHash.

2. Relevant knowledge

Before entering the body, let’s review the geography of junior high school:

West of the prime meridian is the West meridian, divided into 12 zones, 15 degrees each zone a total of 180 degrees; East longitude, also divided into 12 zones, a total of 180 degrees.

North of the equator is north latitude, a total of 90 degrees; South is south latitude, same 90 degrees.

Then in the computer, the coordinates are expressed as:

The west longitude is negative and the east longitude is positive, so the longitude is [-180, 180].

The north latitude is positive and the south is negative, so the latitude is [-90, 90].

We know that the equator is about 40,000 kilometers long, so each degree of longitude is about 111 kilometers. The earth is actually an irregular sphere, but for simplicity of calculation we shall assume that each degree of latitude is approximately equal to 222 kilometers.

3. First acquaintance with GeoHash

1. Calculate the binary code

Firstly, the binary coding is calculated, starting with [-180, 180] in longitude and [-90, 90] in latitude. The interval is divided into two parts each time. If the input coordinate is less than the middle value, the coding is 0. If it is greater than, the code is 1, and the next interval is the right half interval. Similarly, the longer the code, the closer it is to the coordinate value, and thus the more accurate it is.

Taking 118°04 ’04 and 24°26′ 46 as examples, the coding of longitude is firstly calculated:

The length of the code interval The median coding instructions Accuracy (km)
1 [- 180, 180] 0 1 118°04 ’04 is greater than 0°, so code 1 is taken as the right range 19980
2 [0, 180] 90 1 Over 90 ° 118 ° 04 ’04 9990
3 [90, 180] 135 0 118°04 ’04 is less than 135°, so the code 0 is taken as the left interval 4995
4 [90, 135] 112.5 1 2497.5
5 [112.5, 135] 123.75 0 1248.75
6 [112.5, 123.75] 118.125 0 624.375 ‬
7 [112.5, 118.125] 115.3125 1 312.188
8 [115.3125, 118.125] 116.71875 ‬ 1 156.094
9 ‬ [116.71875, 118.125] 117.421875 1 78.047 ‬
10 [117.421875, 118.125] 117.7734375 ‬ 1 39.024
N . . . .

Similarly, calculate the latitude to get the code:

The length of the code interval The median coding instructions Accuracy (km)
1 [- 90, 90] 0 1 24°26 ’46 is greater than 0°, so code 1, take the right interval 19980
2 [0, 90] 45 0 24°26 ’46 is less than 45°, so code 0 and take the left interval 9990
3 [0, 45] 22.5 1 4995
4 [45] 22.5, 33.75 0 2497.5
5 [22.5, 33.75] 28.125 0 1248.75
6 [22.5, 28.125] 25.3125 0 624.375 ‬
7 [22.5, 25.3125] 23.906 1 312.188
8 [23.906, 25.3125] 24.609 0 156.094
9 [23.906, 24.609] 24.2575 ‬ 1 78.047 ‬
10 ‬ [24.2575, 24.609] 24.433 0 39.024
N . . . .

To sum up, suppose we only take the tens digit code, the longitude code is 1101001111, and the latitude code is 1010001010.

Interweave the two codes like a warp and weft web:

11100110000011101110

2. Convert base32 encoding

Binary encoding can actually be used as geolocation encoding, but:

  1. Not easy to find the surrounding block. To calculate adjacent blocks, it is necessary to parse them into warp and weft codes and then calculate them. When base32 encoding is used, table lookup method can be used to speed up the calculation.
  2. The binary code length is too long for retrieval.

Therefore, GeoHash uses both Base32 and Base36 encodings, and since most do, this article covers only base32 encodings.

Decimal 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Base 32 0 1 2 3 4 5 6 7 8 9 B C D E F G
Decimal 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Base 32 H J K M N P Q R S T U V W X Y Z

Base32 encoding has 32 encodings, so 5 bits are needed to convert 11100110000011101110 to Base32 encoding:

WS7F

4. Calculate adjacent blocks

1. The general method

The input of the general method is binary code, so unlike the lookup table method has a bit limit, applicable to all cases, the process is simple.

Firstly, the binary code of longitude and latitude is divided into longitude and latitude. If the input position is north, the latitude is added by one. South, the latitude is reduced by one; To the west, the longitude is reduced by one; East, add one to the longitude. Note that the addition and subtraction must keep the original significant bits (for example, 11 + 01 = 00, keeping two significant bits). Finally, the calculated results are recombined to obtain the binary code of the desired position.

The process is as follows:

2. The look-up table method

The table lookup method is much faster than the general method and does not divide the input encoding into latitude and longitude for addition and subtraction, but it is important to note that the input is base32 encoding and therefore only works with geohashes that are multiples of five binary encoding bits.

Most articles on the web only talk about how to use lookup table method to calculate the neighborhood block, so how to get this lookup table? Now that I have the code, how do I compute the neighborhood block? Most articles on the web only show how to use table lookup to calculate neighborhood blocks, but how to get this table lookup?

From the previous, we know that the encoding consists of five bits, which are interlaced with warp and weft. Therefore, if the code is in odd digits, that is, weft by weft interleaved; If it is in even position, it is weft by weft by weft.

So, take W as an example, binary is 11100. If it is in odd digits, it is upper right, lower left in the table; if it is in even digits, it is upper right, lower left, lower left in the table. Therefore, the position of W in the table lookup is shown as follows:

WS7F
F
F
E, G, D, 9, C
5, 4, 1
5, 4, 1
7
7
K

Therefore, the surrounding adjacent blocks are located WS7E, WS7G, WS7D, WS79, WS7C, WSK5, WSK4 and WSK1 respectively, and the position relationship is as follows:

Five, the summary

Mainly introduced in theory:

  1. GeoHash encodes geographic points by taking a binary based on a “recursive dichotomy” and converting the binary to Base32 encoding.
  2. A quick way to compute the GeoHash neighborhood is to take the last bit code, look up the neighborhood code based on the even-digit table, look up the previous bit code if one direction is out of bounds, and look up the corresponding direction code based on the odd/even table.

Use GeoHash to encode a location

Sixth, the extensions

  1. GeoHash can be used to encode not only positional points but also faces, helping to compute multiple combinations of points and faces. For example, determine whether the location is within the store’s distribution fence.

  2. The GeoHash is an application of the Peano curve as follows:

    There are also many space-filling curves, such as the Hilbert curve, which is generally considered good and has no large mutations.

reference

  1. Core principles of GeoHash
  2. Based on fast GeoHash, how to achieve efficient matching between mass goods and business circles?