I learned how to use different data types (String, Hash, List, Set, Sorted Set, HyperLogLog, Bitmap) to solve statistics problems in different scenarios.

The product manager said he had an idea: give teenagers a way to connect with each other,

so that boys and girls at this most beautiful age can meet that special someone at any hour of the day.

So I want to develop an app where users can log in, find people nearby, and connect with each other.

How do I implement discovering people nearby? I also hope to meet a goddess through this app…

In my memory, one night after work, she was moving lightly from the crowd, her tall and slender figure like an elegant note floating in space. Her eyes were full of clear sunlight and vitality, and she had the stars of the Milky Way in her eyes.

The opening remarks

Practice your presentation skills, especially at work. A lot of people say, “The people who do the work are not as good as the people who do the PPT.” In fact, the boss is not stupid. Why do they recognize the people who do the PPT more?

Because they see things from the boss’s point of view, and what the boss needs is a “solution”. Think from the point of view of a creator rather than a programmer.

Think about what value this thing provides, not “how am I going to make it happen?” Of course, how you get there is essential, but often not the most important thing.

What is an LBS app

Latitude and longitude together form a coordinate system, also known as the geographic coordinate system. It is a spherical coordinate system that uses a three-dimensional sphere to define positions on the Earth and can identify any location on it (with 7 decimal places the precision is about 1 centimeter).

Longitude ranges over [-180, 180] and latitude over [-90, 90]. Latitude is measured from the equator, positive to the north and negative to the south; longitude is measured from the prime meridian (Greenwich Observatory, UK), positive to the east and negative to the west.

“People nearby” is an example of LBS (Location Based Services): a service that uses the user’s current geographical location data to provide accurate encounter services.

The core ideas behind “people nearby” are as follows:

  1. With “me” as the center, search for people nearby;
  2. Calculate the distance between others and “me” based on my current geographical location;
  3. Sort by the distance between “me” and others, and pick out the users closest to me.
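
The three steps above can be sketched as a brute-force search. This is only an illustration, not the approach the article ends up with: the class name, the `User` type, and the sample coordinates are hypothetical, and the distance uses the haversine formula with a mean Earth radius of 6371 km.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical names throughout; a brute-force sketch of the three steps above.
class NearbySketch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    // A user with a position; this type is an assumption made for illustration.
    static class User {
        final String name;
        final double lng;
        final double lat;

        User(String name, double lng, double lat) {
            this.name = name;
            this.lng = lng;
            this.lat = lat;
        }
    }

    // Great-circle distance in km between two points (haversine formula).
    static double distanceKm(double lng1, double lat1, double lng2, double lat2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Center on "me", sort everyone by distance to me, keep the closest `limit` users.
    static List<User> closest(List<User> others, double myLng, double myLat, int limit) {
        List<User> sorted = new ArrayList<>(others);
        sorted.sort(Comparator.comparingDouble(u -> distanceKm(myLng, myLat, u.lng, u.lat)));
        return sorted.subList(0, Math.min(limit, sorted.size()));
    }

    public static void main(String[] args) {
        List<User> others = new ArrayList<>();
        others.add(new User("far", 116.50, 39.90));
        others.add(new User("near", 116.41, 39.91));
        others.add(new User("middle", 116.45, 39.90));
        for (User u : closest(others, 116.40, 39.90, 2)) {
            System.out.println(u.name);
        }
    }
}
```

Every call computes the distance to every user, which is exactly the cost the following sections try to avoid.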

MySQL implementation

To find “nearby people”, we take one coordinate, look up the other records around it, and order them by distance. Where do we start?

If we draw a circle with a radius of 1000 meters, the users in the circle are the “nearby people” we want to meet.

Store latitude and longitude in MySQL:

CREATE TABLE `nearby_user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL COMMENT 'name',
  `longitude` double DEFAULT NULL COMMENT 'longitude',
  `latitude` double DEFAULT NULL COMMENT 'latitude',
  `create_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT 'Creation time',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;


But we cannot walk through every “goddess” in the table, compute the distance between her coordinates and our own, and then sort by that distance. That is far too much computation.

Instead, we can first use a region to filter down to a limited set of “goddess” coordinates, then compute the exact distance only for the records inside that rectangular region and sort them. This reduces the amount of computation significantly.

How do we divide the rectangular area?

Draw a square around the outside of the circle. Using the maximum and minimum longitude and latitude of that square (the user’s longitude and latitude plus or minus the distance) as filter conditions, it is easy to find the “goddesses” inside the square.

What about the extra areas?

Users in the extra corner areas are farther from the center than the radius of the circle. So we calculate the distance between the current user (the center) and every user in the square, and keep only those whose distance is less than or equal to the radius. The users that remain lie inside the circle and are the nearby people we are looking for.
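
The square-then-circle filter can be sketched in plain Java. This is a minimal illustration under stated assumptions: all names are hypothetical, the Earth is treated as a sphere of radius 6371 km, and the bounding-box formula ignores the poles and the 180° meridian. In the real setup the square filter would run in the database rather than in a loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; a sketch of "filter by square first, then by circle".
class BoundingBoxSketch {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Returns {minLng, maxLng, minLat, maxLat} of the square enclosing the search circle.
    static double[] boundingBox(double lng, double lat, double radiusKm) {
        double angular = radiusKm / EARTH_RADIUS_KM; // radius as an angle in radians
        double dLat = Math.toDegrees(angular);
        double dLng = Math.toDegrees(Math.asin(Math.sin(angular) / Math.cos(Math.toRadians(lat))));
        return new double[] {lng - dLng, lng + dLng, lat - dLat, lat + dLat};
    }

    // Cheap test: is the point inside the square?
    static boolean inSquare(double[] box, double lng, double lat) {
        return lng >= box[0] && lng <= box[1] && lat >= box[2] && lat <= box[3];
    }

    // Great-circle distance in km (haversine formula).
    static double distanceKm(double lng1, double lat1, double lng2, double lat2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // points: each entry is {lng, lat}. Keeps points in the square AND within the radius.
    static List<double[]> withinRadius(List<double[]> points, double lng, double lat, double radiusKm) {
        double[] box = boundingBox(lng, lat, radiusKm);
        List<double[]> result = new ArrayList<>();
        for (double[] p : points) {
            // The exact distance is only computed for points that passed the square test.
            if (inSquare(box, p[0], p[1]) && distanceKm(lng, lat, p[0], p[1]) <= radiusKm) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0.005, 0.0});   // inside the circle
        points.add(new double[] {0.008, 0.008}); // in a corner of the square, outside the circle
        points.add(new double[] {0.02, 0.0});    // outside the square
        System.out.println(withinRadius(points, 0.0, 0.0, 1.0).size());
    }
}
```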

For the rectangular-region filter to perform well, the table needs an index on the longitude and latitude columns so the range query is as fast as possible.

Hands-on

To obtain the maximum and minimum longitude and latitude of the bounding rectangle, and to calculate the distance between two coordinates, we use a third-party library:

<dependency>
     <groupId>com.spatial4j</groupId>
     <artifactId>spatial4j</artifactId>
     <version>0.5</version>
</dependency>

After obtaining the bounding rectangle, search for the users inside it using its minimum and maximum longitude and latitude, then eliminate the users who are farther away than the specified distance. What remains is the final list of nearby people.

/**
 * Get the people within the given distance of the current user.
 *
 * @param distance search radius, in km
 * @param userLng  longitude of the current user
 * @param userLat  latitude of the current user
 */
public String nearBySearch(double distance, double userLng, double userLat) {
  // 1. Get the bounding square
  Rectangle rectangle = getRectangle(distance, userLng, userLat);
  // 2. Get all users whose positions are within the square
  List<User> users = userMapper.selectUser(rectangle.getMinX(), rectangle.getMaxX(), rectangle.getMinY(), rectangle.getMaxY());
  // 3. Remove the users whose distance exceeds the search radius
  users = users.stream()
    .filter(a -> getDistance(a.getLongitude(), a.getLatitude(), userLng, userLat) <= distance)
    .collect(Collectors.toList());
  return JSON.toJSONString(users);
}

// Get the bounding rectangle (the distance is converted from km to degrees)
private Rectangle getRectangle(double distance, double userLng, double userLat) {
  return spatialContext.getDistCalc()
    .calcBoxByDistFromPt(spatialContext.makePoint(userLng, userLat),
                         distance * DistanceUtils.KM_TO_DEG, spatialContext, null);
}

/**
 * The distance between two points on a sphere.
 *
 * @param longitude longitude 1
 * @param latitude  latitude 1
 * @param userLng   longitude 2
 * @param userLat   latitude 2
 * @return distance in km
 */
private double getDistance(Double longitude, Double latitude, double userLng, double userLat) {
  return spatialContext.calcDistance(spatialContext.makePoint(userLng, userLat),
      spatialContext.makePoint(longitude, latitude)) * DistanceUtils.DEG_TO_KM;
}

Since the filtering by distance is done in the business code, the SQL statement is very simple.

SELECT * FROM nearby_user
WHERE 1=1
AND (longitude BETWEEN #{minlng} AND #{maxlng})
AND (latitude BETWEEN #{minlat} AND #{maxlat})

However, database query performance is limited, and if there are a lot of “nearby” query requests, this may not be a good solution in high concurrency situations.

The Redis Hash attempt failed

Let’s analyze the characteristics of LBS data:

  1. Each “goddess” has an ID number, and each ID corresponds to latitude and longitude information.
  2. When an “otaku” logs in to the app to look for a girl, the app uses the otaku’s latitude and longitude to find the nearby “goddesses”.
  3. After obtaining the list of “goddess” IDs matching the location, the “goddess” information corresponding to each ID is fetched from the database and returned to the user.

The defining feature of the data is that each goddess (user) corresponds to one latitude/longitude pair, which reminds me of the Redis Hash structure: a key (goddess ID) maps to a value (longitude and latitude).

The Hash looks like it could work, but besides recording latitude and longitude, an LBS application also needs to run range queries over the data and sort it by the distance derived from latitude and longitude.

The data in a Hash is unordered, so it is clearly not a good fit.

Sorted Set is on its way

Is the Sorted Set type appropriate? Because it can sort.

The Sorted Set type also maps a key to a value: the key is the element itself, and the value is the element’s weight score.

The Sorted Set can sort elements by their weight score, so it looks like it satisfies our needs.

For example, the Sorted Set element is a “goddess ID,” and the element weight score is latitude and longitude information.

But the weight score of a Sorted Set element is a single floating-point number, while latitude and longitude are two values. Can we convert a latitude/longitude pair into one floating-point number?

Right idea. To make latitude and longitude comparable, Redis adopts GeoHash, an encoding widely used in the industry: it encodes longitude and latitude separately and then combines the two codes into one final code.

This converts the latitude and longitude into a single value, and Redis uses the Sorted Set as the underlying data structure of the GEO type.

Let’s see how GeoHash encodes latitude and longitude.

GeoHash encoding

For more about GeoHash, see: en.wikipedia.org/wiki/Geohas…

The GeoHash algorithm maps two-dimensional latitude and longitude data to one-dimensional integers, so that all elements are placed on a single line, and points that are close together in two dimensions generally end up close together on that line.

When we want to calculate “nearby people,” we first map the target location to this line, and then get the nearby points on this one-dimensional line.

GeoHash encodes a longitude value into an N-bit binary value by bisecting the longitude range [-180,180] N times, where N can be chosen freely.

On the first bisection, the longitude range [-180,180] is divided into two sub-intervals: [-180,0) and [0,180] (which I call the left and right partitions).

At this point, we can check whether the longitude value to be coded falls in the left partition or the right partition. If it falls in the left partition, we’ll call it 0; If it falls in the right partition, it’s a 1.

This way, every time we do a binary partition, we get a 1-bit code value (either 0 or 1).

Next, bisect the partition that the longitude value fell into, check again whether the value falls in the left or the right half, and emit one more bit by the same rule. After N bisections, the longitude value is expressed as an N-bit number.

Every coordinate on the map ends up in exactly one square of a grid. The smaller the squares, the more accurate the coordinates. The squares are then encoded as integers, and the closer two squares are on the map, the closer their codes are.

After encoding, the coordinates of each map element become an integer from which the coordinates can be restored. The longer the integer, the smaller the loss in the restored coordinates. For the “Nearby” feature, a small loss of accuracy is negligible.
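
To make the precision claim concrete, here is a small sketch that replays the bisections to restore the interval a code stands for, and shows how the cell narrows as bits are added. The class and method names, the bit string `1101`, and the bit counts 4, 13 and 26 are sample values chosen for illustration.

```java
// Hypothetical helper names; shows how the cell narrows as more bits are used.
class GeoHashPrecisionSketch {
    // Width of one cell after `bits` bisections of [min, max].
    static double cellWidth(double min, double max, int bits) {
        return (max - min) / Math.pow(2, bits);
    }

    // Restores the interval {lo, hi} that a bit string stands for by replaying the bisections.
    static double[] decode(String bits, double min, double max) {
        double lo = min;
        double hi = max;
        for (char c : bits.toCharArray()) {
            double mid = (lo + hi) / 2;
            if (c == '1') {
                lo = mid; // right half
            } else {
                hi = mid; // left half
            }
        }
        return new double[] {lo, hi};
    }

    public static void main(String[] args) {
        for (int bits : new int[] {4, 13, 26}) {
            System.out.println(bits + " bits -> cell width "
                    + cellWidth(-180, 180, bits) + " degrees of longitude");
        }
        double[] cell = decode("1101", -180, 180);
        System.out.println("1101 -> [" + cell[0] + ", " + cell[1] + "]"); // [112.5, 135.0]
    }
}
```

The restored value can be anywhere inside the final interval, so the interval width is the accuracy that was lost.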

For example, to encode the longitude value 169.99 in 4 bits (N = 4, four bisections), first divide the longitude interval [-180,180] into a left partition [-180,0) and a right partition [0,180].

  1. 169.99 belongs to the right partition, so the first bit is 1;
  2. Next, [0, 180] is divided into [0, 90) and [90, 180]. 169.99 is again in the right partition, so the second bit is 1;
  3. Divide [90, 180] into [90, 135) and [135, 180]. 169.99 is still in the right partition, so the third bit is 1;
  4. Divide [135, 180] into [135, 157.5) and [157.5, 180]. 169.99 is in the right partition, so the fourth bit is 1.

So we end up with the four-bit code 1111.

Latitude is encoded in the same way as longitude, so I won’t repeat it.
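
The bisection procedure is short enough to write out. The following is a minimal sketch with hypothetical names, assuming (as in the intervals above) that a value exactly on the midpoint goes to the right partition. The same method handles latitude by passing the range [-90, 90].

```java
// Hypothetical names; a sketch of the N-step bisection described above.
class GeoHashEncodeSketch {
    // Encodes `value` within [min, max] as a bit string: 0 = left half, 1 = right half.
    static String encode(double value, double min, double max, int bits) {
        StringBuilder code = new StringBuilder();
        double lo = min;
        double hi = max;
        for (int i = 0; i < bits; i++) {
            double mid = (lo + hi) / 2;
            if (value >= mid) {
                code.append('1');
                lo = mid; // continue in the right partition
            } else {
                code.append('0');
                hi = mid; // continue in the left partition
            }
        }
        return code.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(169.99, -180, 180, 4)); // a longitude, 4 bits
        System.out.println(encode(35.679, -90, 90, 5));   // a latitude, 5 bits
    }
}
```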

Combined latitude and longitude coding

Suppose the calculated longitude and latitude codes are 11011 and 00101 respectively. Bit 0 of the combined code takes bit 0 of the longitude code (1), bit 1 of the combined code takes bit 0 of the latitude code (0), and so on, alternating between the two codes.

In this way the pair is represented by the single value 1010011011, and this value can be used as the weight score of a Sorted Set element.
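
The interleaving step can be sketched as follows. The class and method names are hypothetical. The bit order (longitude first, then latitude, alternating) follows the description above.

```java
// Hypothetical names; a sketch of combining the two codes bit by bit.
class GeoHashInterleaveSketch {
    // Longitude bits go to even positions (0, 2, 4, ...), latitude bits to odd positions.
    static String interleave(String lngBits, String latBits) {
        if (lngBits.length() != latBits.length()) {
            throw new IllegalArgumentException("both codes must have the same number of bits");
        }
        StringBuilder combined = new StringBuilder();
        for (int i = 0; i < lngBits.length(); i++) {
            combined.append(lngBits.charAt(i)).append(latBits.charAt(i));
        }
        return combined.toString();
    }

    public static void main(String[] args) {
        String combined = interleave("11011", "00101");
        System.out.println(combined);                    // 1010011011
        System.out.println(Long.parseLong(combined, 2)); // 667, usable as a numeric score
    }
}
```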

Redis GEO implementation

The GEO type uses the geohashed combined values of latitude and longitude as the score weight of the Sorted Set element.

We need to save the girl’s ID and the corresponding latitude and longitude to the Sorted Set.
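
As a rough mental model of what "a GeoHash code as the Sorted Set score" means, here is a simplified in-memory sketch. It is not Redis's actual implementation: the class, the member IDs, and the choice of 26 bits per coordinate with a latitude range of [-90, 90] are assumptions made for illustration, and a plain map that is sorted on demand stands in for the real Sorted Set.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; NOT how Redis is implemented internally.
class GeoSortedSetSketch {
    static final int BITS = 26; // bits per coordinate, an assumption for this sketch

    // member -> score; members are unique, like the elements of a Sorted Set.
    private final Map<String, Long> scores = new HashMap<>();

    // Bisects [min, max] BITS times and returns the resulting bits as an integer.
    static long bisect(double value, double min, double max) {
        long bits = 0;
        double lo = min;
        double hi = max;
        for (int i = 0; i < BITS; i++) {
            double mid = (lo + hi) / 2;
            bits <<= 1;
            if (value >= mid) {
                bits |= 1;
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return bits;
    }

    // Interleaves longitude and latitude bits, longitude first, into one 52-bit score.
    static long score(double lng, double lat) {
        long lngBits = bisect(lng, -180, 180);
        long latBits = bisect(lat, -90, 90);
        long score = 0;
        for (int i = BITS - 1; i >= 0; i--) {
            score = (score << 2) | (((lngBits >> i) & 1) << 1) | ((latBits >> i) & 1);
        }
        return score;
    }

    // Stores the member with its GeoHash-style score.
    void add(String member, double lng, double lat) {
        scores.put(member, score(lng, lat));
    }

    // Members ordered by score; a real Sorted Set maintains this order incrementally.
    List<String> membersInScoreOrder() {
        List<String> members = new ArrayList<>(scores.keySet());
        members.sort((a, b) -> Long.compare(scores.get(a), scores.get(b)));
        return members;
    }

    public static void main(String[] args) {
        GeoSortedSetSketch geo = new GeoSortedSetSketch();
        geo.add("goddess:1", 13.361389, 38.115556);
        geo.add("goddess:2", 15.087269, 37.502669);
        for (String member : geo.membersInScoreOrder()) {
            System.out.println(member + " -> " + geo.scores.get(member));
        }
    }
}
```

Because each member is just an element with a numeric score, ordinary Sorted Set operations apply to it.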

IO /commands#ge…

GEOADD

Redis provides the GEOADD key longitude latitude member command, which records a latitude/longitude pair and the corresponding “goddess ID” in a collection of the GEO type. For example, the following records the locations of two users (Sora Aoi and Noori Haruo) in one call:

GEOADD girl:locations 13.361389 38.115556 "Sora" 15.087269 37.502669 "Noori Haruo"

GEORADIUS

I log in to the app and get my latitude and longitude. How do I find other users within a certain range centered on this position?

The Redis GEO type provides the GEORADIUS command, which looks for other elements within a given radius of the input latitude and longitude.

Assuming your longitude and latitude are (15.087269 37.502669), and you need to get the “goddesses” within 10 km and return them to the LBS app:

GEORADIUS girl:locations 15.087269 37.502669 10 km ASC COUNT 10

ASC sorts the returned “goddesses” from the nearest to the farthest relative to the given center.

The COUNT option specifies the number of goddesses to be returned to prevent too many nearby goddesses and save bandwidth resources.

If you feel like you need more goddesses, you can have no limit, but you need to pay attention to your body and eat more eggs.

After a user logs out, how do we delete that “goddess’s” longitude and latitude?

That’s a good question. The GEO type is implemented based on the Sorted Set, so you can borrow the ZREM command to delete geo-location information.

For example, delete the location information of “Sora”:

ZREM girl:locations "Sora"

Summary

Instead of designing a new underlying data structure itself, GEO uses the Sorted Set collection type directly.

The GEO type uses GeoHash encoding to convert latitude and longitude into the weight score of a Sorted Set element. The two key mechanisms are partitioning the two-dimensional map into intervals and encoding those intervals.

Once a latitude/longitude pair falls within a certain interval, it is represented by the code value of that interval, and the code value is used as the weight score of the Sorted Set element.

In a map application, there may be millions of pieces of data about cars, restaurants and people, and if you use Redis’ Geo data structure, they will all be in a Zset set.

In a Redis cluster environment, the collection may be migrated from one node to another. If the data under a single key is too large, cluster migration is heavily affected. In a cluster environment the data under a single key should not exceed 1M; otherwise migration will stall and online services will be affected.

Therefore, it is recommended that Geo data be deployed using a separate Redis cluster instance.

If the data volume exceeds 100 million or more, it is necessary to split the Geo data by country, by province, by city, and even by region in mega-populous cities.

This can significantly reduce the size of a single Zset set.