Hornet’s nest technology original content, more dry goods please pay attention to the public number: MFWtech

With the expansion of smartphone storage capacity and the popularity of photo album backup technology, we can record our life with mobile phone video anytime and anywhere. It has become common to store thousands or even tens of thousands of photos on mobile phones. But on the other hand, when we try to find one from so many photos, it is also a trouble.

Hornet’s nest, as a platform for travel and entertainment, hopes to realize the connection between “people who can play” and “fun things”. Many travel enthusiasts record and share their travel memories here, which makes hornet’s nest accumulate a lot of content in the field of tourism UGC. That’s why we’re constantly trying to optimize the user experience when publishing content.

Using photos and videos is the most direct way to record your trip. This article will introduce how Hornet’s nest through the application of App geographic album spatial index, to provide users with intuitive, user-friendly picture sharing services.

Part.1 Application Scenarios and Requirements

In order to enable users to quickly find the photos/videos they want to share, we need an effective and reasonable screening method, which can aggregate and sort users’ albums and improve the ease and convenience of sharing and recording life by relying on albums.

The first step is to determine the filter dimension of aggregation sort. The geographic location of the photo is the most intuitive classification dimension; At the same time, recording recent events is consistent with the user’s Posting behavior. Therefore, our plan should meet the following requirements:

  1. According to the destination and time, the user albums are aggregated and sorted.

  2. Searches a user’s album for photos/videos in a given range based on a geographic location and a given range.

The geographic photo album service mentioned in this paper mainly has two landing scenes in hornet’s Nest App.

1.1 notes

“Notes” is short travel content sharing in the form of pictures and videos. User the first link of the release notes is to select need to release from the album photo/video, in the new App, based on geographic photo album service data, combined with the hornet’s nest has its own destination album to users to classify according to the location dimension of polymerization, and according to the piece/video creation time near and sorting, promote efficiency of the user to select release.

1.2 footprint

The footprint feature is designed to make it easier for hornet’s cell users to record their trips by automatically synchronizing or manually clicking on countries and regions they have visited. There is a scene in My Footprints that encourages users to post notes about places they have visited but haven’t posted notes yet. At this point, the geographic album service can help users to publish all the photos in the album with the designated place as the center and within a given radius.

Part.2 Scheme design and algorithm selection

2.1 Initial Plan

At the beginning, the scheme we came up with was intuitive and crude, that is, the service end would calculate the result after traversing the album. To be specific, first take out all photos/videos of users carrying geographic information, then upload the geographic information (latitude and longitude) to the server, which will conduct aggregation and screening, and return the results to the client. However, this scheme has many disadvantages.

At the beginning of this article, we have described that the number of photos in users’ mobile devices is tens of thousands. If we go through all the pictures, the volume of uploaded data is huge. At the same time, the geographical location of the photos of ordinary users will be clustered, because we usually take many photos within the range of a place, which leads to a large number of repeated clustering calculation.

If we want to optimize this scenario, we can use the cache + incremental request approach for the first requirement, because the user classification data is stable. However, for a given range of queries, we can not do caching, which requires a large amount of time to request the server to do a lot of calculations, the consumption of time is intolerable.

As can be seen, the challenges of the above scheme mainly lie in the data volume and repetition of geographic information in user albums, performance problems caused by relying on server to calculate search results, and user experience. After investigation, we found that the index algorithm based on geographic spatial points (latitude and longitude) can solve these problems well.

2.2 Practice of geospatial point index algorithm

Combined with our actual needs to understand the geospatial point index algorithm, that is, to find a suitable method to add index to the massive coordinates in the geographic space, so as to quickly query and sort the spatial points of an algorithm.

We have compared the selection of some common geospatial point index algorithms, and the GeoHash algorithm and Google-S2 algorithm are mainly introduced below.

2.2.1 Algorithm selection

(1) the GeoHash

The GeoHash algorithm is the geographical distance sorting algorithm. Geohash is a type of geocoding invented by Gustavo Niemeyer. It uses a hierarchical data structure to divide space into grids.

GeoHash is a practical application of the Z-order curve of a spatial filling curve. One property of GeoHash related to the Z-order curve is that the Hash string near a point always has a common prefix, and the longer the length of the common prefix, the closer the two points are. Because of this feature, geoHashes are often used as unique identifiers, such as to uniquely represent a point in a database.

The feature of the common prefix GeoHash can be used to quickly search for nearby points. The closer the point is, the longer the common prefix is with the GeoHash string of the target point. But the z-order curve has a serious problem, that is, its mutability. At each corner of the letter Z, there may be a sudden change in the order, resulting in poor accuracy of the search for nearby points, which cannot meet the accuracy requirements of our business scenarios.

(2) S2 algorithm

S2 actually comes from the mathematical symbol S squared in geometric mathematics, which refers to the unit sphere.

S2 algorithm USES the cubes projection of the earth, then using the Hilbert fractal curve will open after two dimensional earth populated, completed the three dimensionality of the fractal of the earth, to get the space coordinates of point and Hilbert fractal curve function relation, the spherical latitude and longitude coordinates into spherical xyz coordinate, Then it is converted to the coordinates on the cube projection plane, and finally transformed to the corrected coordinates in the coordinate system and mapped to the interval [0,2^30^-1]. The last step is to map all the points in the coordinate system to the Hilbert curve. Finally, the points mapped to the Hilbert curve become Cell ids, which are the indexes of spatial coordinate points.

The biggest advantage of S2 lies in its high accuracy. Geohash has 12 levels, ranging from 5000km to 3.7cm, with large variations at each level. Sometimes the next level might be a lot bigger, and the next level might be a little bit smaller. S2 has 30 levels, from 0.7cm² to 85,000,000km², and the changes of each level in the middle are relatively gentle, close to the fourth power curve. So there are no Geohash problems when choosing precision.

To sum up, S2 algorithm can meet our requirements for function and accuracy, so S2 algorithm is finally selected as the realization scheme of spatial point index algorithm.

Part.3 Function realization and performance optimization

3.1 Module Design

The App geographic album service in this paper is mainly realized based on four modules: album index data operation, user album scanning, album index service and album location classification calculation:

The following are introduced respectively.

3.1.1 Album index data operation module

The index of album location information uses database as storage medium, and stores user photo information and Cell ID calculated by S2 algorithm into the database. Cell ids from Level4 to Level16 are stored, considering the number of stores and the longitude requirements for searching and aggregating.

Album index data operation module, composed of database (DB) and database operation layer (DAO). The design of the data table is shown below:

The DATABASE Operation layer (DAO) encapsulates the API of data insertion, deletion, query and other basic operations.

3.1.2 User album scanning module

Based on the API of album query provided by iOS native, the user album scanning module compares the data of user album with the photo data stored in the local database, and extracts the newly added photo data and deleted photos of users.

3.1.3 Album index service module

Album index service module is the core module of album service based on S2 algorithm. Module functions are as follows:

  • Directly interact with the data module, shield the data operation details of the data layer to users, and provide API to meet the needs of query and search

  • Query photo resources under the specified Cell ID

  • Query the Cell ID of the photo index of an album at a specified Level

  • Queries photos centered at the specified coordinate point and within the specified radius

  • Interact with the user album scanning module, obtain the data of newly added photos and deleted photos, update the database content, and support the query and notification of update status

3.1.4 Album location classification calculation module

The location classification calculation module of photo album is the core module to calculate the location classification result of user photo album. The main functions of this module are as follows:

  • Obtain the photo Cell ID of S2 album index service and upload it to the server as a parameter. The server returns the aggregation result of the Cell ID to the server according to the aggregation interface provided by the map service

  • Considering the accuracy and data volume of the Cell ID, select the Cell ID at Level12 as the Cell ID level of the request server

  • The album index service module is called to obtain the Cell ID after deduplication according to the method of obtaining the Cell ID at the specified Level

  • The data structure returned by the server is a one-to-many mapping between mDD_ID (destination ID) and Cell ID

  • Use the photo Cell ID in the local S2 album index service to classify the photos according to the classification data returned by the server in the previous step

  • Caches the results of each location classification

3.2 Overall Process

The album index service module will update the service when the App starts, and synchronize the local data with the album data. When the user triggers the function of place album, the album place classification calculation module will first take out the cached calculation results of local album place classification and show them to the user, and drive the update of album index service at the same time.

After receiving the notification of the completion of the update service, it first requests the Cell ID of 12Level full deduplication from the album, and then uploads the Cell ID to the service end for the service end to calculate the classification. Finally, combining with the full photo data of the album index service, it calculates the location classification result of the photo, caches the result and renders it to the user.

3.3 Performance Optimization

3.3.1 Obtaining incremental photos of an album

The album index service module needs to synchronize the photo resource data of the service and the user’s album, find the new data and add it to the service database. The initial design for acquiring new data is as follows:

Step.1 Obtain the data of the full user album

Step.2 Run the user photos to check whether the photos are stored in the local service database

However, when this scheme is applied to mobile phones with a large number of photos, the delay of obtaining new photos is very high. After investigation, we found that the reason was that the time delay of the full traversal of user albums was very high, and it was time-consuming to query the database frequently during the traversal. According to the survey, iOS user albums have the album category of “recent projects”, under which resources are only arranged in reverse order of addition, that is, the newer photos are the more advanced. Therefore, the scheme is optimized as follows:

Step.1: Cut 100 entries from the head of the list

Step.2: Append the 100 photos as new photos

Step.3: Check whether the last item in the 100, that is, the latest item to be added, exists in the service database

  • If not, continue with Step.1

  • If so, stop capturing and get new photos

3.3.2 Progressively calculate the location classification of photos in the album

After obtaining the classification result (mapping relationship between MDD_ID and Cell ID list) returned by the server, the album location classification calculation module classifies the photos in the local service database according to the result. The initial plan was as follows:

Step.1: Iterates through the result list to obtain the Cell ID list of each MDD_ID mapping

  • A. Traverse the Cell ID list and query the photo resources under the Cell ID from the album index service module to obtain the photo resources corresponding to the MDD_ID

  • B. Sort the photos in reverse order according to their creation time

Step.2: Classify the results of all destination dimension photos, sort them in reverse order according to the latest creation time of photos in each result set, that is, the creation time of the first photo, and obtain the final calculation results of place albums sorted according to location dimension and creation time.

As a result, when the location album is calculated for the first time, users need to wait for the results of all destinations to be presented to users. Meanwhile, they need to sort the results according to the creation time for many times, resulting in high latency and poor user experience under cold startup.

Therefore, we optimized the scheme to reduce the number of sorting and optimize the user experience through progressive loading. The main idea is that in the database of album index service module, the creation time of stored photos can be queried through SQL, and all photo resources arranged in reverse order according to the creation time can be obtained in reverse order:

Step.1: Take 1000 photos from the header of the photo resource set each time

  • Traverse each photo and query the destination in the mDD_id -Cell ID mapping table according to the Cell ID of the photo to determine whether the photo resource classification set of the destination exists in the photo destination classification result set

  • If yes, append the photo

  • Create a result set for that destination, append to the photo destination classification result set, and append the photo

Step.2: Render the classification results of 1000 photos and present them to users

Step.3: After calculating the classification of all photos, notify the end of rendering. The calculation is complete.

The above scheme, the full amount of local photo resources in a batch of 1000, progressive calculation, at the same time progressive rendering, shorten the user waiting time; At the same time, relying on the sorting ability of relational database, the sorting times are reduced and the performance is optimized.

Part.4 Future planning and summary

At present, the location album based on The Google-S2 algorithm introduced in this paper has been online for a period of time in the iOS client of Hornet’s Nest APP, and has brought positive growth to the number of notes published. However, this scheme still has great room for optimization and exploration in the application of Google-S2 algorithm in database data processing, and our team will continue to optimize and dig into it in the future.

The implementation and landing of Google-S2 algorithm service in the iOS client of Horhoneycomb App not only meets the exploration of note release scene, but also enables the client to index and search users’ album photos with 100-meter accuracy, which can serve for more and more complex business scenes in the future. We believe that in the near future, we can provide users with more convenient and interesting travel record products.

Author: Wang Yan, Wang Mingyou, Hornet’s Nest content business r&d engineer.