Editor’s note: In recent years, China's domestic database market has been booming, and many excellent domestic database products have emerged. This article selects three typical domestic distributed databases for an all-round comparison and stress test, and presents the current state of development of domestic distributed databases.

For any application system, data is the core asset that carries the business, and the database system that stores this data is one of its most critical components. As the localization effort advances, building application systems on domestic databases has become increasingly important and is now the mainstream direction in database selection.

In recent years, the domestic database market has been booming, many excellent domestic database products have emerged, and major vendors are investing heavily in database R&D. This article selects three typical domestic distributed databases for a comprehensive comparative stress test and analyzes the current state of domestic distributed databases for the reader's reference.

Test environment and database architectures

PolarDB-X

Database architecture:

OceanBase

Database architecture:

TiDB

Database architecture:

Stress test metrics analysis

Sysbench stress test:

1. Stress test parameters

Test table structure:

CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4

2. Scenario Description

  • 16 tables in total, 10 million rows per table, with uniform data distribution.
  • TiDB scenario: distributed using range-based horizontal splitting (by default TiDB automatically balances all table data by range, so the data of a given test table is spread evenly across multiple machines).
  • OceanBase scenario: the non-partitioned (single) table is what the sysbench script creates automatically with no modification; the distributed variant uses hash-based horizontal splitting and is referred to below as a partitioned table.
  • PolarDB-X scenario: non-partitioned (single) tables created automatically by the sysbench script without modification; a hash-partitioned variant using local indexes; and a hash-partitioned variant whose secondary index is a GSI (global secondary index). A sketch of the partitioned variants follows this list.
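For reference, a minimal sketch of the partitioned-table variants is shown below. The table name sbtest1_part, the index name g_k, and the partition count of 16 are illustrative assumptions; the PARTITION BY HASH clause is the MySQL-compatible form, and the global index DDL follows PolarDB-X's documented GSI style as recalled here, so both should be verified against the specific database version.

-- Hash-partitioned variant of the sysbench table (illustrative partition count).
CREATE TABLE sbtest1_part (
  `id`  INT NOT NULL,
  `k`   INT NOT NULL DEFAULT 0,
  `c`   CHAR(120) NOT NULL DEFAULT '',
  `pad` CHAR(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)                  -- local secondary index, stored inside each partition
) DEFAULT CHARSET = utf8mb4
PARTITION BY HASH (`id`) PARTITIONS 16;

-- PolarDB-X GSI variant: the secondary index on k is itself hash-partitioned by k,
-- so a lookup on k is routed to a single index partition instead of fanning out to
-- every partition of the base table.
CREATE GLOBAL INDEX `g_k` ON sbtest1_part (`k`) PARTITION BY HASH (`k`);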

3. Test result data

TPC-C (5,000 warehouses)

TPC-C is a benchmark specification for online transaction processing (OLTP) systems, commonly referred to as online business processing systems. First released in July 1992, TPC-C results have been published by almost all major international vendors of hardware and software platforms in the OLTP market, and the results are continually refreshed as computer technology develops.

TPC-C simulates a complex online transaction processing workload and measures a database's transaction processing capability under heavy load. The stress test below compares the maximum tpmC achieved by the three distributed databases:

tiup bench tpcc prepare --host xxx --port xxx -U root --db tpcc --warehouses 5000 --threads xxx
tiup bench tpcc run --host xxx --port xxx -U root --db tpcc --warehouses 5000 --threads xxx --time xxx

TPC-H (100 GB)

TPC-H is a decision-support (business intelligence) benchmark defined by the Transaction Processing Performance Council (TPC) to simulate decision-support applications. It is widely used in academia and industry to evaluate the performance of decision-support workloads.

This benchmark comprehensively evaluates a system's overall analytical computing capability; it places higher demands on vendors and has broad practical commercial relevance. It is widely used in bank credit and credit-card analysis, telecom operations analysis, tax analysis, and decision analysis in the tobacco industry. TPC-H at the 100 GB scale is used below to compare the analytical capability of the three distributed databases:

tiup bench tpch prepare --host xxx --port xxx --db tpch_100 --sf 100 --analyze --threads xxx
tiup bench tpch run --host xxx --port xxx --db tpch_100 --sf 100 --check=true

DDL capability

1. Scenario Description

The test data is the lineitem table generated at TPC-H 100 GB scale: a single table with 600 million rows.

2. Parallel DDL testing

Parallel DDL testing checks whether, while one DDL statement is still running (adding a column to the lineitem table), a second DDL submitted at the same time (creating another table in the same database, or adding an index on a small table such as the nation table) can return immediately. If the second statement returns immediately, parallel DDL is supported.
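A minimal sketch of this probe follows. The column name l_extra_flag, the probe table ddl_probe, and the index name idx_nation_name are hypothetical; lineitem and nation are tables from the TPC-H data set described above.

-- Session 1: long-running DDL on the 600-million-row table.
ALTER TABLE lineitem ADD COLUMN l_extra_flag TINYINT NOT NULL DEFAULT 0;

-- Session 2: issued while session 1 is still running. If these statements return
-- immediately instead of queueing behind session 1, parallel DDL is supported.
CREATE TABLE ddl_probe (id INT PRIMARY KEY);
CREATE INDEX idx_nation_name ON nation (n_name);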

Hot row update

Database resources are finite. When a large number of requests all try to update the same record, heavy lock contention arises (an update takes a row lock on that record), which consumes server resources while the success rate of the requests stays low; in other words, server resources are wasted for little return. The hot row update scenario tests the database's lock management and transaction handling under high-concurrency pressure.
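A minimal sketch of the hot-row workload, using a hypothetical hot_account table: every worker thread updates the same row, so all statements contend for one row lock.

CREATE TABLE hot_account (
  id      INT PRIMARY KEY,
  balance BIGINT NOT NULL
);
INSERT INTO hot_account VALUES (1, 0);

-- The statement fired concurrently by every worker thread; throughput and success
-- rate under growing concurrency are the metrics of interest.
UPDATE hot_account SET balance = balance + 1 WHERE id = 1;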

Read/write separation

Scenario description: consistent read checks whether data read from a read-only node is consistent, covering both strong consistent reads and weak consistent reads; read-only node delay control checks the maximum replication delay the standby database is allowed to have while serving business reads.
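The sketch below illustrates the consistency check with a hypothetical rw_probe table: write on the primary, then read through the read-only route and compare. The per-database routing mechanisms shown in the comments are given as recalled from their documentation and should be verified against the versions under test.

CREATE TABLE rw_probe (id INT PRIMARY KEY, ts DATETIME(6));

-- On the primary (read-write) connection:
INSERT INTO rw_probe VALUES (1, NOW(6));

-- On the read-only route, check whether the row is visible and how stale it is.
-- OceanBase: weak-consistency read hint
SELECT /*+ READ_CONSISTENCY(WEAK) */ * FROM rw_probe WHERE id = 1;
-- TiDB: route reads to follower replicas for this session
SET SESSION tidb_replica_read = 'follower';
SELECT * FROM rw_probe WHERE id = 1;
-- PolarDB-X: route the query to a read-only node via hint
SELECT /*+TDDL:slave()*/ * FROM rw_probe WHERE id = 1;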

Partition change feature

Scenario description: partition rule changes verify a database's ability to adjust how data is distributed. Being able to change the partitioning strategy lets an online table adapt flexibly to business scenarios, especially when converting a single (non-partitioned) table into a partitioned (distributed) table or into a broadcast table.
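A sketch of the two conversions is shown below, using hypothetical table names (orders, currency_dict). The ALTER TABLE ... PARTITION BY form is MySQL-compatible repartitioning DDL; the BROADCAST form is the PolarDB-X-style table-type change as recalled here, and the exact syntax differs per database, so check it against the target version.

-- Single (non-partitioned) table -> hash-partitioned, distributed table:
ALTER TABLE orders PARTITION BY HASH (order_id) PARTITIONS 16;

-- Single table -> broadcast table (a full replica on every node, suited to small
-- dimension tables that are frequently joined):
ALTER TABLE currency_dict BROADCAST;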

Special scenarios

1. Large transactions

select * from lineitem;
update lineitem set L_PARTKEY = L_PARTKEY + 1;

Test results:

2. JSON type
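This scenario probes whether the JSON column type and JSON functions are usable at all. Below is a minimal sketch using standard MySQL 5.7-style JSON syntax and a hypothetical t_json table; per the analysis later in this article, support differs across the three databases.

CREATE TABLE t_json (
  id  INT PRIMARY KEY,
  doc JSON
);
INSERT INTO t_json VALUES (1, '{"name": "alice", "tags": ["a", "b"]}');

-- ->> and JSON_LENGTH are standard MySQL 5.7+ JSON operators/functions.
SELECT id,
       doc->>'$.name'             AS name,
       JSON_LENGTH(doc->'$.tags') AS tag_cnt
FROM t_json;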

3. Single-machine table count

The single-machine table count test measures the maximum number of tables (partitions) a single machine can host in a complex business scenario. It verifies the database's metadata management capability: the more tables a single machine can support, the more the distributed storage cost can be reduced.

4. Impact of dropping a large table

TiDB, OceanBase, and PolarDB-X can all drop the large table smoothly without affecting online services.

5. Emergency SQL throttling

Scenario description: emergency throttling restricts certain bad or faulty SQL statements from running during an online emergency, so that the system can recover quickly while the majority of business traffic continues to run normally.

6. Resource isolation

Scenario description: verifies whether resources are automatically isolated between OLTP and OLAP workloads. OLAP queries and analysis consume large amounts of resources; without resource isolation they may affect online business traffic and stability.

7. Dynamic index binding

Scenario description: used to test the ability to bind execution plans. A concrete example follows.
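As one concrete illustration, the statement below uses TiDB's SQL binding syntax to pin the query to the k_1 index of the sbtest1 table defined earlier; OceanBase and PolarDB-X provide analogous outline / plan-management features with their own syntax, so this is not a cross-database command.

-- Bind the plain query to a hinted plan that forces the k_1 index.
CREATE GLOBAL BINDING FOR
  SELECT * FROM sbtest1 WHERE k = 100
USING
  SELECT /*+ USE_INDEX(sbtest1, k_1) */ * FROM sbtest1 WHERE k = 100;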

Analysis of test results

TiDB:

1. The experimental Plan Cache feature was enabled for the test, which is not recommended for production use. With Plan Cache disabled, as it is by default in production environments, point_select performance drops by about 60%, to only around 360,000 QPS in a point-lookup scenario on roughly 100 CPU cores.

2. Sysbench scenarios contain many WHERE id BETWEEN x AND y queries, but in real business a range query purely on a user ID or transaction ID is rarely meaningful; time-range queries are far more common. With TiDB's range-based splitting, partition pruning for such a BETWEEN predicate touches only one data shard, whereas the hash-based splitting of PolarDB-X and OceanBase may have to touch five data shards, so TiDB's data layout has an advantage on Sysbench's simple metrics. Note: to quantify the difference between range and hash partitioning, a comparison was run on PolarDB-X for the read_only scenario; range partitioning outperformed hash partitioning by about 45% (280,000 vs. 190,000).

3. In the TPC-C scenario, TiDB is at a clear overall disadvantage.

4. In the TPC-H scenario, performance with TiFlash is good, but in plain TiKV mode some SQL statements cannot produce results.

5. In the special scenarios, the performance of index-creation DDL needs improvement; JSON is supported but not recommended for production.

OceanBase:

1. Non-partitioned tables (commonly understood as single tables): OceanBase balances them at the table level across the distributed nodes, so the data of one table is stored on only one node. The multiple tables in the sysbench scenario are completely independent during the test, so OceanBase can exploit multiple machines to achieve better total throughput; in this mode it leads TiDB by 30% to 70%. In real business scenarios, a set of independent single tables generally has to be combined with database and table sharding on the application side.

2. Partitioned tables: OceanBase can spread a table's data across multiple machines, providing row-level horizontal scalability. Partitioned tables incur extra costs such as distributed transactions and cross-shard aggregation queries, which is the real test of distributed capability. The sysbench results show a significant gap between partitioned and non-partitioned tables: in write and mixed read-write scenarios in particular, the partitioned table achieves only about 1/5 of the single-table result, so distributed transaction performance still needs to be improved.

3. Excellent performance in the TPC-C scenario. In the TPC-H scenario, the overall performance of parallel computing over row storage is good.

4. In the special scenarios, JSON is not supported, and hot row updates hit an obvious bottleneck.

PolarDB-X:

1. Non-partitioned tables (commonly understood as single tables): PolarDB-X supports a locality mode that places different tables on different nodes, so the data of one table is stored on only one node, which exercises single-node capability. In pure-read and mixed read-write scenarios its performance is 2 to 2.5 times better than TiDB's.

2. Partitioned tables: PolarDB-X can spread a table's data across multiple machines, implemented in essentially the same way as the distributed tables of TiDB and OceanBase. In the write_only scenario its overall performance is better than TiDB's; in read_write, the most common business scenario, both its partitioned tables and its single tables perform far better than OceanBase's, its non-partitioned tables hold a clear advantage over TiDB, and its partitioned tables are roughly on par with TiDB.

3. Excellent performance in the TPC-C scenario. In the TPC-H scenario, the overall performance of parallel computing over row storage is good.

4. In the special scenarios, fast ADD-type DDL (adding columns/indexes) still needs optimization; JSON is supported, and the optimization for hot row updates is clearly effective.

5. PolarDB-X has the best support for partition rule changes, covering essentially all common partition change strategies.

Conclusion

1. PolarDB-X, OceanBase, and TiDB all deliver distributed horizontal scaling with similar performance and little differentiation.

2. TiDB has some good experimental features (such as Plan Cache and JSON) that help performance and ease of use considerably, but they are not currently recommended for production.

3. OceanBase's model is more complex, so test scenarios require a full understanding of partitioned versus non-partitioned (single) tables. In non-partitioned (single-table) mode its performance is decent; since this mode mainly exercises pure single-node capability, the result is only fair, slightly below MySQL. In partitioned-table mode, performance drops considerably, so the choice has to be made according to the business.

4. PolarDB-X offers good functionality and usability, with complete support for JSON, large transactions, and hot row updates. In non-partitioned (single-table) mode its pure single-node MySQL performance is good; in partitioned-table mode, performance can be scaled out further through its distributed capability, and it supports the most complete set of partition change strategies. (End)


This article is original content from Alibaba Cloud (Aliyun) and may not be reproduced without permission.