1 Background

With constant expansion of the Internet financial scene, vast amounts of data access and processing from the traditional centralized database began to show the performance bottlenecks, and scenes of distributed data solution for use arises at the historic moment, and data security and compliance with the enterprise more and more high to the requirement of data using more seriously. So in this scenario, the distributed data solution with high performance, scalable, highly available features, such as the traditional centralized database is difficult to meet at the same time, this paper focuses on analyzing the characteristics of distributed data middleware and related products of comparative analysis, and according to the results of the analysis, how to choose an appropriate depots table middleware advice is given.

2 Comparative analysis

From the analysis of technical route, distributed data can be divided into three modes: embedded client mode, distributed database middleware mode and native distributed database mode.

  • Cut-in customer model

Functions are gathered in jar. Applications rely on JAR without additional dependency or deployment, and then only need simple configuration, zero intrusion to services, and flexible disassembly.

  • Middleware pattern

Distributed database middleware is a middle-tier solution based on multiple traditional single point database systems. By dividing data into different database nodes, middleware is used to manage and access the data in each database. The distributed database schemes initially used in Internet industry are mostly based on middleware and have achieved good results in solving the problem of service pressure.

  • Distributed database mode Native distributed database refers to the architecture design, underlying storage and query processing are oriented to distributed data management requirements, database cluster as a whole to provide external services, users do not need to pay attention to the implementation details of the cluster.

2.1 Analysis of mainstream middleware

2.1.1 Cobar

Cobar is an open source MySQL distributed processing middleware developed by Alibaba, which can provide massive data services in a distributed environment just like traditional databases.

2.1.1.1 Main technology stack

Using Java language development, Cobar front-end implementation of MySQL protocol, that is, only accept the MySQL protocol request, docking received SQL parsing, routing, result set processing and other operations; Internal implementation of Hash, enumeration and other commonly used algorithms;

Back-end access to MySQL is directly binary oriented.

2.1.1.2 Community activity

Analysis results:

parameter instructions note
attention Around 3k, attention is only one aspect, it depends on time Only reference, can not accurately determine whether active
active From 2014 to 2020, the project activity decreased significantly, and now it is basically at a standstill. More accurately determine whether active or not

2.1.1.3 Function Constraints

  • Only MySQL databases are supported
  • SQL support is limited

2.1.2 MyCat

MyCat is an open source distributed data middleware optimized based on Cobar. Its backend is compatible with JDBC protocol and supports multiple database access. MyCat currently exists in two versions, one is 1.xxx and the other is 2.0. In the future version 2.0, client mode will be provided to support the operation of MYCAT in API mode without starting the network layer, so as to realize the execution of SQL.

2.1.2.1 Main technology stack

Using Java language development, MyCat front-end implementation of MySQL protocol, using NIO or AIO to receive application requests, received SQL parsing, routing, result set processing and other operations.

The backend uses the native binary protocol to access the MySQL database. It is compatible with JDBC and supports multiple types of database access.

2.1.2.2 Community activity

Analysis results:

parameter instructions note
attention MyCat1 was about 8.3K, MyCat2 624 Only reference, can not accurately determine whether active
active It has been active since May 2020 and is under continuous construction More accurately determine whether active or not

2.1.2.3 Function Constraints

  • The operation after a transaction is started can only be the same shard
  • A global table that is not in the same shard cannot be rolled back when used in a transaction
  • Some associated subqueries are not currently supported (non-push-down subqueries)
  • Oracle support is limited

2.1.3 Dble

Dble was developed based on the open source project Mycat1.xxx, supporting only MySQL database, eliminating support for other databases, making deep improvements/optimizations to compatibility, complex queries and distributed transaction behavior.

2.1.3.1 Main technology stack

Using Java syntax development, front-end implementation of MySQL protocol, NIO or AIO to receive application requests, internal implementation of SQL optimizer to optimize the REWRITING of SQL, received SQL parsing, routing, result set processing and other operations.

The back-end uses native binary protocol to access the MySQL database.

2.1.3.2 Community activity

Analysis results:

parameter instructions note
attention 685 Only reference, can not accurately determine whether active
active From 2016 to 2020, the project activity is relatively stable and is under continuous maintenance More accurately determine whether active or not

2.1.3.3 Function Constraints

  • Only MySQL databases are supported

2.1.4 ShardingSphere

ShardingSphere provides the Proxy mode sharding-proxy, and also provides the client mode Sharding-JDBC; Sharding-proxy is positioned as a transparent database Proxy and provides server version that encapsulates database binary protocol to support heterogeneous languages. Currently available in MySQL/PostgreSQL, it can use any MySQL/PostgreSQL compliant access Client (e.g. MySQL Command Client, MySQL Workbench, Navicat, etc.) to manipulate data.

Sharding-jdbc is the embedded client mode, embedded in the application in the form of Jar

2.1.4.1 Main technology stack

Using Java language development, internal implementation of MySQL protocol to receive MySQL protocol requests, custom SQL syntax parsing template, integration of Antl4 parser to analyze SQL, at the same time through the extension of Spring Namespace label configuration of the sub-library sub-table. Compatible springboot.

2.1.4.2 Community activity

Analysis results:

parameter instructions note
attention 12k Only reference, can not accurately determine whether active
active The development was most active in 2019, and the latter stage is under stable maintenance More accurately determine whether active or not

2.1.4.3 Function Constraints

  • Only MySQL and PostgreSQL databases are supported
  • SQL support is limited

2.2 Comparison Conclusion

Functional comparison

component The database Heterogeneous language performance Static entry note
Cobar Only the MySQL any Loss is slightly high There are Provides embedded clients
MyCat any any Loss is slightly high There are Apis to support embedded patterns will be provided in future releases
Dble Only the MySQL any Loss is slightly high There are
Sharding-JDBC any Only the Java Low loss There is no

Attention contrast

Activity contrast

  • Analysis of the data in the table shows that Cobar has been in a stagnant state and has not been maintained in recent years.

  • MyCat2.0 began to develop in 2019 with a relatively fast development speed. It is in the construction stage and its functions are being improved. Due to the short development time, it is uncertain whether it will last in the later stage.

  • Dble develops stably and is under continuous maintenance. Issues raised by users on Github are also periodically resolved, but only the MySQL database is supported.

  • Sharding-jdbc attention is around 12K and activity is stable.

3 How to Choose

Generally according to the following principles to choose a suitable sub-database sub-table middleware, there is no best, only the right.

  • Simple principle

    If the amount of data in the last 3-5 years has not reached the database bottleneck, it is best not to use sub-database sub-table components, as far as possible to reduce complexity, to ensure the simplicity of business, the use of sub-database sub-table components will increase other complexity, the key component does not play an obvious role in this period, there is loss.

  • Business complexity principle

For business systems with relatively simple SQL, the client embedded mode is preferred, because in the client mode, components are embedded in applications in the form of JAR packages, and the life cycle of components is consistent with the life cycle of the business system. There is no need to consider features such as high availability of components, and the performance of middleware in the proxy mode is significantly better than that in the proxy mode. In addition, you need to consider whether physical resources are sufficient. Because they are embedded in applications, physical resources such as cpus are shared with service systems.

For the business system with complex SQL and cross-shard query, middleware proxy mode should be considered in this scenario. One of the big differences between client mode and proxy mode is SQL support. In addition, the proxy mode needs to be deployed independently and users need to consider the load. At present, the open source middleware itself basically does not provide load. The complexity is higher than that of the client mode.

  • Select based on existing components

Common embedded client mode components include TDDL of Taobao, Meituan Zebra, Sharding-JDBC, etc. Sharding-jdbc is recommended to use, which has obvious advantages from attention and activity. TDDL and Zebra have stopped maintenance at present. You can also consider self-developed embedded components, which cost is lower than the proxy mode.

As for the selection of middleware, it can be seen from the conclusion of data analysis in Section 2 that Cobar has stagnated and is not recommended for use.

MyCat2.0 is developed based on 1.xx and has been under construction since the middle of 2019. The first stage is still under construction and the functions are still being improved. If it is used, in-depth analysis is needed.

Dble is in stable maintenance and actively resolves the problems raised in the community. It can be considered, but only supports MySQL database.

This article is formatted using MDNICE