1. Background
1.1 Distributed database architecture
At present there are many distributed database architectures, but overall they are quite similar. The main components all include coordinator nodes, data shards, metadata nodes, and a global clock. A common distributed architecture is shown below:
- GTM: global transaction manager (global clock); one active node with multiple standbys.
- Catalog: metadata management; one active node with multiple standbys.
- Group: a horizontal data shard; each group consists of one active data storage node and multiple standbys.
- Proxy: a stateless coordinator node. It handles client requests, routes them to the data shards according to the sharding rules, merges the data returned by the shards, and cooperates with the other components to guarantee the consistency of distributed transactions.
1.2 Sorting Problems
Sorting is an important function in a distributed database. Take `select * from T1 order by field1` as an example: the rows being queried may be spread across different data shards, so the proxy must re-sort the ordered data returned by each shard and return globally ordered data to the client.
If the amount of data involved is small, the proxy can hold the rows returned by the shards in memory, re-sort them there, and return them to the client. But if a large amount of data must be re-sorted in memory, an OOM may occur; and if the data is staged on the proxy's disk instead, the disk may fill up and a large amount of disk I/O is incurred. A sorting method for distributed databases, and its optimization, is introduced below.
2. Solution
2.1 Introduction to the sorting scheme
To improve distributed sorting performance, each data shard first sorts its own data, so the data each shard returns to the proxy is already ordered. The proxy can then combine these ordered streams with a merge sort or a priority queue, which greatly reduces its load.
The sort buffer size can be configured according to the proxy's memory, and defaults to 10 MB. If a query involves N data shards, the sort buffer is divided into N sections, one per shard, each 10 MB/N in size.
Merging directly in memory proceeds as follows:
- The client sends the sort query `select * from T1 order by ID` to the proxy.
- The proxy forwards the query `select * from T1 order by ID` to the relevant data shards group1 and group2 according to the sharding key and rules.
- Each data shard queries and sorts its data locally, then sends the ordered data to the proxy.
- The proxy stores the ordered data returned by each shard in that shard's section of the sort buffer and merges the ordered streams.
- The proxy sends the merged, sorted data to the client.
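The proxy's merge step can be sketched in a few lines of Python. This is an illustration only, not the product's actual code: the shard result sets and the `proxy_merge` helper are hypothetical, and `heapq.merge` stands in for the proxy's merge of already-sorted streams.

```python
import heapq

# Hypothetical sorted result sets returned by two shards (group1, group2)
# for "select * from T1 order by ID"; each row is (ID, payload).
group1 = [(1, "a"), (4, "d"), (5, "e")]
group2 = [(2, "b"), (3, "c"), (6, "f")]

def proxy_merge(*shard_results):
    """Merge already-sorted shard result sets into one globally
    ordered stream, as the proxy does in its sort buffer."""
    return list(heapq.merge(*shard_results, key=lambda row: row[0]))

merged = proxy_merge(group1, group2)
# IDs now come out globally ordered: 1, 2, 3, 4, 5, 6
```

Because every input stream is already sorted, the merge is O(total rows × log N) for N shards, which is why pushing the per-shard sort down to the shards pays off.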
2.2 Defects of the sorting scheme
This approach only works for small data volumes. For larger volumes, the sort buffer on the proxy can be enlarged, but a larger sort buffer consumes more memory, so it cannot grow without bound.
2.3 Sorting optimization idea
Instead, the ordered data returned by the shards can be saved to disk and the disk data re-sorted. The next section introduces such an optimization: a distributed sorting method for large data volumes.
3. Optimization Scheme
3.1 Introduction to the sorting scheme
Because memory is limited, merging and sorting a large amount of data entirely in memory is not feasible; the returned data must be staged on disk. The optimized steps are as follows:
1) The client sends the sort query `select * from T1 order by ID` to the proxy.
2) The proxy forwards the query `select * from T1 order by ID` to the relevant data shards group1 and group2 according to the sharding key.
3) Each data shard queries and sorts its data locally, then sends the ordered data to the proxy.
4) The proxy stores the ordered data returned by each shard in a disk file dedicated to that shard.
5) The proxy re-sorts with a priority queue:
- To avoid the cost of reading rows from disk one by one during the merge, the proxy first reads data from each shard's disk file to pre-fill that shard's sort buffer.
- One row is taken from each shard's sort buffer to build the heap, so the number of heap nodes equals the number of data shards.
- The row at the top of the heap is popped and sent to the client.
- After the top row is popped, the next row is read from the sort buffer of the shard the popped row came from and pushed onto the heap.
- When a shard's sort buffer is exhausted, more data is pulled from that shard's disk file to refill the buffer.
- This continues until all data has been consumed and sent to the client.
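The steps above can be sketched as follows. This is a minimal illustration under assumed details: rows are plain integers, the "sort buffer" is a small in-memory batch refilled from a temp file, and `spill_to_disk`, `buffered_reader`, and `merge_from_disk` are hypothetical names, not the product's API.

```python
import heapq
import os
import tempfile

BUFFER_ROWS = 2  # tiny per-shard sort buffer, for illustration only

def spill_to_disk(shard_rows):
    """Write one shard's sorted rows to a temp file (one value per
    line), as the proxy does in step 4."""
    f = tempfile.NamedTemporaryFile("w+", delete=False)
    f.write("\n".join(str(r) for r in shard_rows))
    f.close()
    return f.name

def buffered_reader(path, batch=BUFFER_ROWS):
    """Yield rows one at a time, refilling an in-memory buffer from
    disk only every `batch` rows (the per-shard sort buffer)."""
    with open(path) as fh:
        buf = []
        while True:
            if not buf:
                lines = [fh.readline() for _ in range(batch)]
                buf = [int(l) for l in lines if l.strip()]
                if not buf:        # file exhausted
                    return
            yield buf.pop(0)

def merge_from_disk(shard_files):
    readers = [buffered_reader(p) for p in shard_files]
    # heapq.merge keeps one element per reader in its heap and
    # refills from the popped reader, matching step 5.
    yield from heapq.merge(*readers)

files = [spill_to_disk([1, 4, 7]),
         spill_to_disk([2, 5, 8]),
         spill_to_disk([3, 6, 9])]
result = list(merge_from_disk(files))   # globally sorted output
for p in files:
    os.remove(p)
```

The buffered reads are the key point: the heap only ever holds one row per shard, while disk I/O happens in batches of `BUFFER_ROWS`.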
3.2 Defects of the sorting scheme
- To work around insufficient memory, the proxy must collect the ordered data from all relevant shards and store it on disk. But disk space is also limited: if the data volume is too large, the proxy's disk may not hold all the data to be sorted.
- Staging the data on the proxy's disk incurs a large amount of disk I/O.
- Take `select * from T1 order by field1 limit 1000000` as an example: if the queried data lives on 50 shards, the proxy must pull 1,000,000 rows from each shard and save them to disk, i.e. 50,000,000 rows (1,000,000 × 50), while the client only needs 1,000,000 of them. This wastes a great deal of network bandwidth and disk I/O.
3.3 Sorting optimization idea
In the scheme above, the proxy pulls all the ordered data from the relevant shards before sorting. Could the proxy instead pull the data in batches, process one batch, and then pull the next batch from the shards? The next section introduces such a batched sorting method.
4. Final Scheme
4.1 Introduction to the sorting scheme
In the final scheme, the proxy no longer stages shard data on disk. Instead, it pulls a fixed amount of ordered data from each shard at a time and fills the pulled data into that shard's sort buffer. The steps are as follows:
1) The client sends the sort query `select * from T1 order by ID` to the proxy.
2) The proxy forwards the query `select * from T1 order by ID` to the relevant data shards group1 and group2 according to the sharding key.
3) Each data shard queries and sorts its data locally, then sends a fixed-size batch of ordered data to the proxy.
4) The proxy stores the ordered data returned by each shard in that shard's sort buffer.
5) The proxy re-sorts with a priority queue:
- One row is taken from each shard's sort buffer to build the heap, so the number of heap nodes equals the number of data shards.
- The row at the top of the heap is popped and sent to the client.
- After the top row is popped, the next row is read from the sort buffer of the shard the popped row came from and pushed onto the heap.
- When a shard's sort buffer is exhausted, the proxy pulls the next batch from that shard to refill the buffer.
- This continues until all data has been consumed and sent to the client.
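The steps above can be sketched as follows. All names here (`Shard`, `proxy_sort`, `BATCH_SIZE`) are hypothetical illustrations of the scheme, not the product's actual code; a `Shard` simulates one group that serves its locally sorted rows in fixed-size batches.

```python
import heapq

BATCH_SIZE = 3  # rows pulled from a shard per round trip (fixed size)

class Shard:
    """Hypothetical data shard: holds locally sorted rows and serves
    them in fixed-size batches (step 3)."""
    def __init__(self, rows):
        self.rows = sorted(rows)
        self.pos = 0
    def fetch(self):
        batch = self.rows[self.pos:self.pos + BATCH_SIZE]
        self.pos += len(batch)
        return batch

def proxy_sort(shards, limit=None):
    """Priority-queue merge over per-shard sort buffers, refilling a
    buffer from its shard only when it runs dry (steps 4-5)."""
    buffers = [s.fetch() for s in shards]        # initial fill
    heap = []
    for i, buf in enumerate(buffers):            # one row per shard
        if buf:
            heapq.heappush(heap, (buf.pop(0), i))
    out = []
    while heap and (limit is None or len(out) < limit):
        row, i = heapq.heappop(heap)             # global minimum
        out.append(row)                          # "send to client"
        if not buffers[i]:                       # buffer exhausted:
            buffers[i] = shards[i].fetch()       # pull next batch
        if buffers[i]:
            heapq.heappush(heap, (buffers[i].pop(0), i))
    return out

g1 = Shard([1, 4, 5, 10, 11])
g2 = Shard([2, 3, 6, 7, 12])
top6 = proxy_sort([g1, g2], limit=6)   # -> [1, 2, 3, 4, 5, 6]
```

Note how a `limit` query stops pulling as soon as enough rows have been returned, which is exactly the bandwidth saving analyzed in the next section.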
4.2 Sorting scheme analysis
This scheme resolves the three defects listed in section 3.2 as follows.
Defect 1: the proxy must collect the ordered data from all relevant shards and store it on disk to work around insufficient memory, but disk space is also limited; when the data volume is too large, the proxy's disk may not hold the data to be sorted.
Solution: as shown in the figure, the proxy no longer stores shard data on disk.
Defect 2: staging the data on the proxy's disk incurs a large amount of disk I/O.
Solution: the proxy no longer stores shard data on disk, so there is no disk pressure.
Defect 3: for `select * from T1 order by field1 limit 1000000`, if the queried data lives on 50 shards, the proxy had to pull 1,000,000 rows from each shard and save them to disk, i.e. 50,000,000 rows (1,000,000 × 50), while the client only needs 1,000,000 of them, wasting a great deal of network bandwidth and disk I/O.
Solution: each pull fetches only a fixed amount of data from a shard, and rows are returned to the client while sorting proceeds. Once 1,000,000 rows have been returned, the query completes, greatly reducing the wasted network bandwidth.
Assuming each shard's sort buffer on the proxy is 2 MB, the amount of data pulled from the shards is:
- Worst case: about 2 MB × 50 of buffered data plus the 1,000,000 rows returned to the client, with nothing saved to disk.
- Best case: the data is distributed uniformly, so after 1,000,000 rows have been returned to the client every shard's sort buffer is almost empty (only a residue remains), and the amount pulled is close to 1,000,000 + 50 rows.
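The worst-case arithmetic can be made concrete with a back-of-the-envelope calculation. The 100-byte average row size below is an assumption for illustration only; the shard count, limit, and buffer size are those from the example above.

```python
# Rough comparison of rows pulled over the network.
SHARDS = 50
LIMIT = 1_000_000                 # rows the client asked for
BUFFER_BYTES = 2 * 1024 * 1024    # 2 MB sort buffer per shard
ROW_BYTES = 100                   # assumed average row size

rows_per_buffer = BUFFER_BYTES // ROW_BYTES   # ~20,971 rows

# Disk-based scheme (section 3): every shard sends all 1M rows.
disk_scheme = SHARDS * LIMIT                  # 50,000,000 rows

# Batch-pull scheme, worst case: every buffer filled once beyond
# what is needed, plus the rows actually returned to the client.
worst_case = SHARDS * rows_per_buffer + LIMIT # ~2,050,000 rows

print(disk_scheme, worst_case)
```

Under these assumptions the batch-pull scheme moves roughly 25× fewer rows than the disk-based scheme, and no disk writes at all.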
4.3 Scheme Usage Restrictions
1) Each data shard node must support sorting itself; most data shard implementations do.
2) Each data shard must support batched reads.
Taking MySQL as an example, the proxy can use streaming queries or cursor queries. Other distributed databases are designed with distribution in mind: their data shard nodes keep the query context until the query ends, so their batched-read performance is better; they are not discussed here.
5. References
1. JDBC operation of MySQL (3) – Queries
2. Analysis of the communication principle of MySQL JDBC StreamResult
Author: vivo Internet Database Team – Xia Qianyong