This article is a transcript of the keynote speech "nLive Vol.001 | Meituan Graph Database Platform Construction and Business Practice". You can watch the video on Bilibili.
Hello, everyone. I’m Zhao Dengchang from Meituan. Today, I’d like to share with you the construction and business practice of the Meituan graph database platform.
This is the outline of the report, which mainly includes the following six aspects.
Background
First, I will introduce Meituan’s business requirements for graph data.
Meituan has many requirements for graph data storage and multi-hop queries. Generally speaking, they fall into the following four areas:
The first area is knowledge graphs. Meituan has nearly 10 knowledge graphs, including a food graph, a commodity graph, and a tourism graph, with data on the order of 100 billion vertices and edges. In the process of iterating on and mining this data, a component is needed to manage the graph data uniformly.
The second area is security risk control. The business units need content risk control and hope to identify fake reviews through multi-hop queries across merchants, users, and reviews. Financial risk control runs verification during payment, querying risk points in real time across multiple hops.
The third area is link analysis, for example data lineage management. There are many ETL jobs on the company’s data platform, with strong and weak dependencies between jobs. These dependencies form a graph that needs to be analyzed during ETL job optimization or fault handling. Similar requirements include code analysis, service governance, and so on.
The fourth area is organizational structure management, including corporate org-chart management, solid-line and dotted-line reporting chains, virtual organization management, and chain-store management for businesses. For example, we need to manage which stores a merchant has in different regions, with support for multi-layer relationship search and reverse relationship search.
In general, we needed a component that can manage graphs with hundreds of billions of vertices and edges and solve the problems of graph data storage and multi-hop queries.
Traditional relational databases and NoSQL databases can store graph data, but they cannot handle high-frequency multi-hop queries on graphs well. Neo4j compared the multi-hop query performance of MySQL and Neo4j in a social scenario: finding friends of friends up to a depth of 5 in a social network of 1 million people with about 50 friends each. According to the test results, once the query depth reached 5, MySQL could no longer return results, while Neo4j could still return data within seconds. The experiment shows that graph databases have a clear advantage for multi-hop queries.
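To make the multi-hop cost concrete: a friends-of-friends lookup is a breadth-first k-hop expansion. The sketch below (in Python, with an illustrative toy graph, not from the benchmark) shows the traversal a graph database performs natively, whereas a relational database must emulate each hop with a self-join whose cost multiplies with depth:

```python
from collections import deque

def k_hop_neighbors(graph, start, k):
    """Breadth-first expansion: all vertices within k hops of `start`.
    `graph` is an adjacency dict, e.g. {"alice": ["bob", "carol"]}."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand past the hop limit
        for friend in graph.get(node, []):
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, depth + 1))
    seen.discard(start)  # the start vertex is not its own neighbor
    return seen
```

At depth 5 with ~50 friends per person, the candidate set grows toward 50^5, which is why index-free adjacency in a graph database pays off.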
Graph Database selection
Next, I will introduce our graph database selection work.
We had the following five selection requirements for a graph database:
- A. The project is open source; paid graph databases were not considered for the time being;
- B. Distributed architecture with good scalability;
- C. Millisecond-level multi-hop query latency;
- D. Support for storing hundreds of billions of vertices and edges;
- E. The ability to batch-import data from the data warehouse.
We analyzed the top 30 graph databases on DB-Engines, stripped out the non-open-source projects, and divided the remaining graph databases into three categories.
- The first category includes Neo4j, ArangoDB, Virtuoso, TigerGraph, and RedisGraph. These databases are open source only in their stand-alone versions. They perform excellently, but cannot cope with growing data scale in a distributed scenario, so they fail selection requirements B and D.
- The second category includes JanusGraph and HugeGraph. These databases add a general graph-semantics layer on top of an existing storage system. The graph layer provides graph traversal, but, limited by the storage layer or architecture, it does not support full computation pushdown, so multi-hop performance is poor and it is hard to meet the low-latency requirements of OLTP scenarios; they fail requirement C.
- The third category includes Dgraph and Nebula Graph. These databases design the data storage model, vertex/edge distribution, and execution engine from scratch around the characteristics of graph data, with deep optimization of multi-hop graph traversal, and basically meet our selection requirements.
We then tested Nebula Graph, Dgraph, and HugeGraph in depth on an open social dataset of approximately 20 billion vertices. The evaluation covered three aspects:
- Batch data import
- Real-time write performance test
- Data multi-hop query performance test
For details of the testing, see the Nebula Forum post: Mainstream Open-source Distributed Graph Database Benchmark 🔗
The test results show that Nebula Graph performs better than the competition in data import, real-time writes, and multi-hop queries. In addition, the Nebula Graph community is active and responsive, so the team chose to build the graph database platform on Nebula Graph.
Graph database platform construction
Next, I will describe the construction of the Meituan graph database platform. The platform consists of four layers.
The first layer is the data production layer. The platform’s graph data comes from two sources. The first is the business side converting data-warehouse data into Hive tables of vertices and edges via ETL jobs and then importing them into the graph database offline. The second is real-time business data, or near-line data produced by Spark, Flink, and other stream processing, written into the graph database in real time through the online batch-write interface provided by Nebula Graph.
The second layer is the data storage layer. The platform supports two deployment modes for graph database clusters.
- The first is the CP scheme (Consistency & Partition tolerance), the cluster deployment natively supported by Nebula Graph. In this single-cluster mode, the number of machines is greater than or equal to the number of replicas, and the number of replicas is at least 3. As long as more than half of the replicas are alive, the cluster can serve requests. The CP scheme guarantees read-write consistency, but cluster availability is lower in this mode.
- The second is the AP scheme (Availability & Partition tolerance). Multiple graph database clusters are deployed for one application, each cluster holding a single copy of the data, and the clusters back each other up. The advantage of this mode is high availability for the application as a whole, but read-write consistency is weaker.
The third layer is the data application layer. Business services can include the graph SDK provided by the platform to create, read, update, and delete graph data in real time.
The fourth layer is the supporting platform, which provides Schema management, permission management, data quality checks, data CRUD, cluster scale-out and scale-in, graph profiling, graph data export, monitoring and alerting, graph visualization, cluster package management, and other functions.
With these four layers, the graph database platform provides one-stop self-service management of graph data. A business team that wants graph database capability can create a graph database cluster, create the graph Schema, import the graph data, configure the import execution plan, and include the SDK provided by the platform to operate on the data. The platform side is mainly responsible for the stability of each business’s database clusters. At present, there are 30 to 40 businesses on the platform, and it can basically meet the needs of all business parties.
Below I introduce the design of several core modules of the platform.
High availability module design
The first is the high-availability module for a single application with multiple clusters (the AP scheme). Why design an AP scheme? Because the metric that businesses joining the graph database platform care about most is cluster availability. Online services have very high availability requirements: the baseline is four nines (99.99%), i.e., less than about an hour of downtime per year. For an online service, cluster availability is the lifeline of the business; if it cannot be guaranteed, the business side will not consider the platform no matter how rich its capabilities are. Availability is the basis of business adoption.
In addition, the company requires middleware to have cross-region disaster recovery (DR) capability, that is, the ability to deploy clusters across multiple regions. We analyzed the business requirements of the platform’s users and found that about 80% of the scenarios are T+1 full data import with online reads only. In such scenarios the requirement for strong read-write consistency of graph data is not high, so we designed the single-application multi-cluster deployment scheme.
For the deployment mode, see the figure above. A customer creates an application on the platform and deploys four clusters: two in Beijing and two in Shanghai. Normally all four clusters serve traffic at the same time. If Beijing cluster 1 goes down, Beijing cluster 2 can take over; if the whole Beijing region goes down or its network becomes unreachable, the Shanghai clusters can serve. Under this mode, the platform does its best to guarantee the availability of the whole application. Within each cluster, we also try to place all machines in the same data center, because a graph database cluster makes many internal RPC calls; frequent cross-data-center or cross-region calls would lower the cluster’s overall performance.
The design of this AP module mainly includes the following four parts:
- The first part is the graph database Agent on the right: a process deployed alongside the graph database cluster that collects information about the machines and Nebula Graph’s core modules and reports it to the graph database platform. The Agent can also receive commands from the platform and operate on the graph database.
- The second part is the graph management platform, which mainly manages the clusters and synchronizes cluster status to the configuration center.
- The third part is the graph database SDK, which manages connections to the graph database clusters. When a business sends a query request, the SDK routes and load-balances across clusters and selects a healthy connection to send the request on. The SDK also handles automatic degradation and recovery of problem machines in a cluster, and supports smooth switching between data versions.
- The fourth part is the configuration center, similar to ZooKeeper, which stores the current state of the clusters.
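As a rough illustration of the SDK’s routing, load balancing, and degradation behavior described above, here is a minimal Python sketch. The class and field names are hypothetical, and the real SDK is of course more elaborate (connection pools, version switching, health probes):

```python
import random
import time

class ClusterEndpoint:
    """One graph database cluster as seen by the SDK (illustrative)."""
    def __init__(self, name):
        self.name = name
        self.degraded_until = 0.0  # monotonic timestamp; 0 = healthy

class GraphSDKRouter:
    """Pick a healthy cluster at random (load balancing); on failure,
    degrade a cluster for a cooldown period, after which it recovers
    automatically and rejoins the rotation."""
    def __init__(self, clusters, cooldown=30.0):
        self.clusters = clusters
        self.cooldown = cooldown

    def pick(self):
        now = time.monotonic()
        candidates = [c for c in self.clusters if c.degraded_until <= now]
        if not candidates:
            raise RuntimeError("no available cluster")
        return random.choice(candidates)

    def report_failure(self, cluster):
        cluster.degraded_until = time.monotonic() + self.cooldown
```

The automatic-recovery part falls out of the timestamp check: once the cooldown passes, `pick` considers the cluster available again without any explicit reset.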
Design of the tens-of-billions-per-hour data import module
The second module is data import at the scale of tens of billions per hour. As mentioned, 80% of business scenarios are T+1 full data import followed by online reads only. In late 2019 and early 2020, the platform did full imports by calling Nebula Graph’s online batch-write interface. The write rate was roughly one billion records per hour, so a full import took about 10 hours, which was too long. Moreover, writing at hundreds of thousands of records per second occupied the machines’ CPU and I/O for long stretches, which both consumed machine resources and weakened the cluster’s read performance during the import.
To solve these two problems, the platform made the following optimization: generate SST files in a Spark cluster, then ingest the files into the graph database using RocksDB’s bulk-load capability. This speed optimization started at the end of 2019, but did not go live due to a core dump problem. In June and July 2020, Benli from WeChat submitted a PR on this to the community; after communicating with him online, our problem was solved. Our thanks to Benli 💐.
The core data import process is shown in the figure on the right. When a user submits an import operation on the platform, the platform submits a Spark task to the company’s Spark cluster. The Spark task generates the SST files for the graph database’s vertices, edges, and indexes and uploads them to the company’s S3 cloud storage. Once the files are generated, the platform notifies the application’s multiple clusters to download them, perform the ingest and compact operations, and finally switch the data version.
To accommodate the different needs of the various businesses, the platform unified application-level import, cluster-level import, offline import, online import, full import, and incremental import, and subdivided the process into the following nine stages to keep the application available throughout the import:
- SST file generation
- SST file download
- Ingest
- Compact
- Data validation
- Incremental data replay
- Data version switching
- Cluster restart
- Data warm-up
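The staged flow above can be sketched as a simple orchestration loop; the stage and cluster names here are illustrative, not the platform’s actual code. Running the disruptive per-cluster stages one cluster at a time is what keeps the application as a whole available during an import:

```python
# SST generation happens once, on the Spark cluster; the remaining
# eight stages run per cluster, in a rolling fashion.
PER_CLUSTER_STAGES = [
    "sst_download", "ingest", "compact", "data_validation",
    "incremental_replay", "version_switch", "cluster_restart",
    "data_warmup",
]

def run_import(clusters):
    """Return the ordered log of stage executions for one full import."""
    log = ["sst_generation"]
    for cluster in clusters:          # one cluster at a time stays out
        for stage in PER_CLUSTER_STAGES:  # of rotation; the rest serve
            log.append(f"{cluster}:{stage}")
    return log
```

In a real orchestrator each stage would of course be a remote call to the Agent with retries and rollback, but the ordering constraint is the essential point.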
Design of real-time write multi-cluster data synchronization module
The third module is multi-cluster data synchronization for real-time writes. In 15% of the platform’s scenarios, newly produced business data must be written into the cluster in real time and read in real time, but the requirement for strong read-write consistency is not high; that is, data written to the graph database by the business does not need to be readable immediately.
For this scenario, when a business uses the single-application multi-cluster deployment, the data in the multiple clusters must be kept consistent. The platform’s design has three parts. The first is the introduction of Kafka: the business writes to the graph database through the SDK, but the SDK does not write the database directly. Instead, it writes the operation into a Kafka queue, and the application’s multiple clusters consume the queue asynchronously.
The second part is that consumption concurrency can be configured per cluster at the application level, to control the rate at which data is written into each cluster. The specific process is:
- The SDK parses the user’s write statements and decomposes batch vertex and edge operations into single vertex and edge operations; that is, it rewrites the write statements.
- The Agent consumes Kafka in such a way that operations on each vertex and its outgoing edges are executed sequentially on a single thread, which ensures the final result is consistent across clusters after the writes are applied.
- Concurrency scaling: the consumption rate of Kafka operations is adjusted by changing the number of Kafka partitions and the number of consumer threads in the Agent.
- If Nebula Graph supports transactions in the future, the configuration above will need to change to single-partition, single-thread consumption, and the platform design will need adjustment.
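The ordering guarantee in the steps above comes from routing every operation on a vertex (and its outgoing edges) to the same Kafka partition, keyed by the source vertex ID, so that one consumer thread per partition applies them sequentially. A minimal sketch, with an illustrative operation format that is not the platform’s actual wire format:

```python
import hashlib
from collections import defaultdict

def partition_for(vertex_id, num_partitions=8):
    """Hash the source vertex so that all operations on that vertex and
    its out-edges land in the same Kafka partition, where a single
    consumer thread applies them in order."""
    digest = hashlib.md5(str(vertex_id).encode()).hexdigest()
    return int(digest, 16) % num_partitions

def route_ops(ops, num_partitions=8):
    """ops: already-decomposed single operations, e.g.
    ("insert_edge", src, dst) or ("insert_vertex", vid).
    Returns {partition: [ops...]} preserving per-key order."""
    queues = defaultdict(list)
    for op in ops:
        queues[partition_for(op[1], num_partitions)].append(op)
    return queues
```

Raising `num_partitions` (and the Agent’s consumer thread count) scales throughput without breaking per-vertex ordering, which matches the concurrency-scaling bullet above.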
The third part is that while data is being written in real time, the platform can synchronously generate a full data version and switch to it smoothly; the process in the figure on the right ensures the data is not duplicated, not lost, and not delayed.
Graph visualization module design
The fourth module is graph visualization. In the first half of 2020, after examining Nebula Graph’s official visualization design and a number of third-party open-source visualization components, we added general graph visualization capability to the graph database platform to support subgraph exploration. With proper interaction design, users viewing graph data through the platform’s visualization component can avoid the screen being overwhelmed by too many nodes.
Currently, the visualization module on the platform has the following functions.
The first is to find vertices by ID or index.
The second is a card view of vertices and edges (the card shows their properties and property values); vertices can be selected singly, in groups, by box selection, or by type.
The third is graph exploration. When the user clicks a vertex, the system shows its one-hop neighborhood: which outgoing edges the vertex has, how many vertices each edge reaches, and the same for its incoming edges. With this one-hop summary, users can quickly understand a vertex’s neighbors and explore subgraphs faster; the platform also supports edge filtering during exploration.
The fourth is graph editing, which lets platform users add, delete, and modify vertices and edges without knowing Nebula Graph syntax, making temporary interventions on online data.
Business practices
Here are some projects that have landed on our platform.
The first project is Intelligent Assistant. Its data is a catering and entertainment knowledge graph built from Meituan merchant data and user reviews, covering food, hotels, tourism, and other domains, with 13 entity types and 22 relationship types. The number of vertices and edges is currently on the order of 10 billion, with T+1 full updates. It is mainly used for search and intelligent-assistant KBQA (Knowledge Based Question Answering) problems. The core flow uses NLP algorithms to recognize the entities and relationships in a question, then constructs a Nebula Graph query statement to retrieve the answer from the graph database.
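The KBQA flow can be sketched as filling a query template from the NLP output. The entity and relation names and the template below are illustrative, not the production schema; the generated GO statement follows Nebula Graph 1.0 syntax:

```python
def build_query(entity_id, relation, max_hops=1):
    """Turn an NLP-recognized entity and relation into a Nebula Graph
    1.0 GO statement. hash() maps a string ID to a vertex ID, as in
    the data lineage query later in this talk."""
    return (f"GO {max_hops} STEPS FROM hash('{entity_id}') "
            f"OVER {relation} YIELD {relation}._dst")

# Example: "does Wangjing Xinhui City have a Haidilao?" might yield
# entity "wangjing_xinhui_city" and relation "has_store" from NLP.
```

The real system presumably maps question intents to a richer set of templates; the point is that recognized entities and relations slot into a parameterized graph query rather than free-form SQL.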
A typical application scenario is in-mall search. For example, if a user wants to know whether the Wangjing Xinhui City shopping mall has a Haidilao, the intelligent assistant can quickly find the answer and tell the user.
Another typical scenario is finding shops by tag: a user wants to know whether there is a restaurant near Wangjing SOHO suitable for a couple’s date, optionally adding a few scene tags, and the system can find it.
The second project is search recall. The data is a medical-aesthetics knowledge graph built from medical-aesthetics merchant information, with 9 entity types and 13 relationship types, and millions of vertices and edges, also updated in full at T+1. It is mainly used for real-time recall in the main search, returning merchants, products, or doctors related to the query, solving the few-results and no-results problem for medical-aesthetics search terms. For example, if a user searches for a symptom such as “beer belly” or a brand such as “Runbaiyan”, the system can recall the relevant medical-aesthetics stores.
The third project is graph-based recommendation. The data comes from user profiles, merchant features, and users’ collection and purchase behavior over the past half year; the current scale is about 1 billion, with T+1 full updates. The goal is to give an explainable reason for restaurant-list recommendations. Why do this? Currently, the default merchant recommendation lists on the Meituan and Dianping apps are generated by deep learning models, but the models give no reason for the lists they produce, which makes them hard to explain. In the graph, however, there are naturally multiple paths connecting a user and a merchant. We want to select an appropriate path to generate a recommendation reason and show the user, in the app interface, why a particular store is recommended. For example, based on collaborative filtering over the user, we find multiple paths along combination dimensions such as hometown, consumption level, preferred category, and preferred cuisine, score those paths, select a high-scoring one, and then produce a recommendation reason from a fixed pattern. In this way we can get reasons like “Shandong natives in Beijing who like Beijing cuisine say this restaurant is great”, or “Cantonese people like its authentic Beijing fried-sauce noodles”.
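The path-scoring step can be sketched as follows. The paths (as sequences of relation types), the weights, and the rendering are all illustrative, not the production model:

```python
def score_path(path, weights):
    """Score a user->merchant path by the relation types it traverses;
    unknown relation types contribute nothing."""
    return sum(weights.get(edge, 0.0) for edge in path)

def best_reason(paths, weights):
    """Pick the highest-scoring path. A real system would then render
    the winning path through a textual pattern such as
    '<group> who like <X> say this restaurant is great'."""
    return max(paths, key=lambda p: score_path(p, weights))
```

The choice of weights encodes which dimensions make a more persuasive reason (e.g., shared hometown plus shared cuisine preference beating consumption level alone).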
The fourth project is code dependency analysis, which writes the dependencies in the company’s code repositories into the graph database. The repositories contain a great deal of service code; services expose interfaces, an interface’s implementation depends on class member functions, and those member functions depend on the class’s member variables and functions or on other classes’ member functions. These dependencies form a graph, and we write that graph into the graph database. What do we use it for?
The typical scenario is QA precision testing. After an engineer completes a requirement and submits a PR to the company’s code repository, the changes are written into the graph database in real time, so the engineer can look up which external interfaces the code change affects and see the invocation paths. If the engineer intended to change only the behavior of interface A, the change may also affect external interfaces B, C, and D without the engineer realizing it; code dependency analysis catches this.
The fifth project is service governance. Meituan has hundreds of thousands of microservices, whose mutual calls form a graph. We write these call relationships into the graph database in real time and use them for service-link governance and alert optimization.
The sixth project is data lineage. The dependencies of ETL tasks in the data warehouse, roughly tens of millions of records, are written into the graph database. Data is written in real time, with a full reload every morning, and is mainly used to find the upstream and downstream dependencies of data tasks. For instance, the statement `FIND NOLOOP PATH FROM hash('task1') OVER depend WHERE depend.type == 'strongly dependent' UPTO 50 STEPS` finds all strongly-dependent paths of task1. Here we made some enhancements to Nebula Graph’s official FIND PATH feature, adding loop-free path retrieval and WHERE-clause filtering.
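What the NOLOOP variant computes can be illustrated with a small pure-Python depth-first search that never revisits a vertex on the current path (the dependency graph below is illustrative; this is a semantic sketch, not how the storage-level implementation works):

```python
def noloop_paths(graph, start, max_steps):
    """All loop-free paths from `start`, each at most `max_steps` edges
    long. `graph` is an adjacency dict of task -> dependent tasks."""
    def dfs(node, path):
        if len(path) - 1 >= max_steps:
            return
        for nxt in graph.get(node, []):
            if nxt in path:        # skip edges that would close a loop
                continue
            new_path = path + [nxt]
            yield new_path
            yield from dfs(nxt, new_path)
    return list(dfs(start, [start]))
```

A WHERE filter like `depend.type == 'strongly dependent'` would simply prune edges inside the loop before they are followed.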
Meituan and Nebula
Finally, I will talk about our team’s contributions to the community.
To better serve the needs of the platform’s users, we made some enhancements and performance optimizations to the Nebula Graph 1.0 kernel and contributed the more generally useful features back to the Nebula Graph community. We also contributed the “Mainstream Open-source Distributed Graph Database Benchmark” 🔗 to the community’s public account.
Of course, with Nebula Graph, we’ve solved many of the business problems within the company, and while we’re currently contributing little to the Nebula Graph community, we’ll continue to work on community technology sharing in the hope of nurturing more Nebula committers.
Future plans for the Meituan graph database platform
There are two main plans for the future. The first is to adapt our graph database platform to the Nebula Graph 2.0 kernel once it is stable. The second is to mine more value from graph data. With Nebula Graph’s basic graph storage and multi-hop query capabilities in place, we will next explore graph learning and graph computing to provide more value-mining capability for the platform’s users.
The above is the graph database platform construction talk given by Zhao Dengchang, technical expert at Meituan NLP.
If you are interested in graph storage, graph learning, or graph computing, please send your resume to Zhao Dengchang at [email protected].
Want to discuss graph database technology? NebulaGraphbot will take you into the NebulaGraph community.