Amid the wave of microservices and cloud native, servitization has been the focus of attention. However, data access after servitization is rarely discussed. Although relational databases are far from cloud native and have long been criticized as unfriendly to distributed systems, they still play a very important role today.

Given its maturity and surrounding ecosystem, the flexibility of its data querying, the degree of control it offers developers and DBAs, and the difficulty of recruiting suitable people otherwise, it will be difficult for either NoSQL or NewSQL to completely replace the relational database in the near future.

So, is there an effective way to manage the increasing vertical split of databases in microservices architectures, as well as the horizontal split of databases as the volume of data explodes? Can the popular Service Mesh concept shed some light on database governance?

Database Mesh

Database Mesh is a new term derived from the wave of Service Mesh. As the name suggests, a Database Mesh uses a Mesh layer to unify the management of databases scattered around the system. The network of interactions between applications and databases, woven together by the Mesh layer, is as complex and yet as ordered as a spider's web.

In this sense, the concept of a Database Mesh is similar to that of a Service Mesh. It is called a Database Mesh, not a Data Mesh, because its primary focus is not the data stored in the database, but how applications interact with the database.

Database Mesh focuses on how to organically connect distributed data-access applications with databases. It pays more attention to interaction, effectively bringing order to the otherwise chaotic interplay between applications and databases.

With a Database Mesh, the applications that access databases and the databases themselves eventually form a huge grid system. Applications and databases only need to be placed into the grid, and all of them are governed by the Mesh layer.

Review the Service Mesh

Service governance focuses on non-functional requirements such as service discovery, load balancing, dynamic routing, circuit breaking and degradation, call-link tracing, and SLA collection. Typically, it can be achieved through either a proxy or a client architecture.

The proxy solution is gateway-based. The application servers that provide services are hidden behind the gateway; access requests must pass through it, and the gateway performs the service governance actions before routing traffic to the back-end application. Nginx, Kong, and Kubernetes Ingress are examples of this approach.

The client-side solution is that the libraries deployed on the application side perform the corresponding service governance actions and access the service provider in a point-to-point manner. Dubbo, Spring Cloud and others use this solution.

Both proxy and client service governance have their own advantages and disadvantages.

The advantage of proxy-side service governance is that the application only needs to know the gateway address, and the complex deployment structure of the back end is completely shielded. The disadvantage is that the performance and availability of the proxy itself become the bottleneck of the whole system, and the consequences of its downtime are severe. This centralized architecture runs contrary to the idea of cloud native.

Service governance on the client side has the advantage of using a decentralized architecture without worrying about a node becoming a system bottleneck. The downside is the intrusion of service governance into business code. For the zero intrusion that cloud native values, the client-side approach to service governance is clearly not feasible. Client-side governance schemes are even less able to support heterogeneous languages.

The third architecture model, Sidecar, is a better fit for a cloud native architecture that wants both zero intrusion and no center. Sidecar is started as an independent process. Each host can share the same Sidecar process, or each application can have an exclusive Sidecar process.

All service governance functions are taken over by the Sidecar, and applications only ever talk to the Sidecar. Clearly, a Service Mesh based on the Sidecar mode is a better way to implement a cloud native architecture: zero intrusion and decentralization make Service Mesh highly recommended.

It is especially powerful when used with Mesos or Kubernetes: the Sidecar can be started on each host via Marathon or a DaemonSet, combined with their ability to dynamically schedule containers. Kubernetes (or Mesos) + Service Mesh = elastic scaling + zero intrusion + no center; together they form the infrastructure that cloud native requires.

Similarities and differences between Database Mesh and Service Mesh

The goals of database application governance and service governance overlap but also differ. Unlike services, databases are stateful and cannot be routed at random to equivalent peers, so data sharding is an important capability. Automatic discovery of database instances is less important, again because of the stateful nature of databases: starting or stopping a database instance usually means data migration. Of course, methods such as multiple data replicas, read/write splitting, and writing only to the primary can be used for further processing. Other functions, such as load balancing across multiple replicas, circuit breaking, and call-link collection, are equally applicable to database governance.

As with service governance, governance for database applications can also apply these three architectural solutions.

The proxy-based solution uses a proxy server that implements the corresponding database communication protocol, such as MySQL's. Cobar, MyCAT, Kingshard, and the upcoming Sharding-JDBC-Server use this approach. Client-based solutions must be tightly bound to a development language; in Java, for example, they can be implemented through JDBC or an ORM framework. TDDL and Sharding-JDBC use this scheme.

Similarly, both the proxy and the client have their own advantages and disadvantages. The advantage of the proxy side is the support of heterogeneous languages, while the disadvantage is still the centralized architecture. The advantage of the client is that it has no centralized architecture, but the disadvantage is that it cannot support heterogeneous languages. Therefore, it cannot effectively support the command line and graphical interface of various databases.

In Sidecar mode, the advantages of the proxy and client can be effectively combined, while shielding their disadvantages. But is a service governance-based Sidecar the same as a database access-based Sidecar? Of course not. The main difference is data sharding.

Sharding is a complex process. To remain transparent to the application, it is common to parse the SQL, route it to the corresponding databases for execution, and finally merge the execution results so that the logic remains correct even when the data is sharded.

The core process of data sharding is SQL parsing -> SQL routing -> SQL rewriting -> SQL execution -> result merging. To achieve zero intrusion into legacy code, the SQL execution protocol needs to be encapsulated. For example, on the proxy side you need to simulate the communication protocol of MySQL or another corresponding database; when implemented in the Java client, the corresponding methods of the JDBC interface need to be overridden.
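A minimal sketch of this pipeline may help make the flow concrete. All class and method names below are invented for illustration; this is not Sharding-JDBC's real API, and a modulo sharding strategy is assumed:

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch of the sharding pipeline:
// parse -> route -> rewrite -> execute -> merge.
// Names and the modulo strategy are assumptions, not Sharding-JDBC's API.
public class ShardingPipelineSketch {

    // Route an order id to one of N physical tables by modulo,
    // a common sharding strategy.
    static String route(long orderId, int tableCount) {
        return "t_order_" + (orderId % tableCount);
    }

    // Rewrite the logical SQL to target the routed physical table.
    static String rewrite(String logicalSql, String physicalTable) {
        return logicalSql.replace("t_order", physicalTable);
    }

    // "Execute" against every routed target and merge the results
    // (here, simply collect the distinct rewritten statements to show the flow).
    static List<String> executeAndMerge(String logicalSql, List<Long> ids, int tableCount) {
        return ids.stream()
                .map(id -> rewrite(logicalSql, route(id, tableCount)))
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        executeAndMerge("SELECT * FROM t_order WHERE order_id = ?",
                List.of(1L, 2L, 3L), 2)
                .forEach(System.out::println);
    }
}
```

A real implementation parses the SQL to find the sharding column rather than string-replacing table names, but the stages line up with the five steps above.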

Is there an implementation of Database Mesh? Unfortunately, not yet. Even with the widespread popularity of Service Mesh, its various products are still maturing.

Earlier versions of Linkerd and Envoy can be used in production environments, but the next generation of Istio has caught the industry’s eye with a much grander picture, just not yet available in production. As an extension of Service Mesh, Database Mesh is in the embryonic stage of development.

Sharding-JDBC

Sharding-JDBC was open-sourced by Dangdang in 2016. Initially, it was a database middleware layer that implemented database and table sharding at Java's JDBC layer. This year, JD Financial Cloud decided to make Sharding-JDBC its core external output.

Therefore, Sharding-JDBC must innovate, and Database Mesh is definitely the right direction of development for the cloud. Sharding-JDBC decided to implement a Sidecar version, with the expectation of becoming a purely cloud native database middleware product.

The goal

The ultimate goal of Sharding-JDBC is to let users transparently use databases scattered across systems as if they were a single database, and to make it as easy as possible for application developers and DBAs to migrate their work to a Sharding-JDBC-based cloud native environment. Sharding-JDBC aims to provide a decentralized, zero-intrusion, cross-language cloud native solution.

The evolution

Sharding-JDBC has always taken sharding at the JDBC layer as its core concept. Its architecture diagram is as follows:

It is divided into a sharding module, a flexible-transaction module, and a database governance module. As the core, the sharding module implements SQL parsing, routing, rewriting, execution, and merging. However, for a product that aspires to serve cloud native architectures, providing services only at the JDBC layer is not enough.

Sharding-JDBC will provide three different versions, Driver, Server, and Sidecar, forming a Sharding-JDBC ecosystem that offers targeted, differentiated services for different needs and environments. Sharding-JDBC will release its Server version soon, and a Sidecar version will be developed in the near future. The original Sharding-JDBC will be renamed Sharding-JDBC-Driver.

Since the core sharding functionality is already implemented, adjusting the architecture model is not complicated. The core of Sharding-JDBC-Server still uses the original sharding logic of Sharding-JDBC, but wraps the MySQL protocol around it; in the future, the corresponding protocols of other databases will also be provided. The architecture diagram is as follows:

With the advent of Sharding-JDBC-Server, the problem that DBAs could not operate on the data through Sharding-JDBC-Driver is solved. Since Sharding-JDBC-Driver does not require a second forwarding hop through a proxy layer, its online performance is better. Sharding-JDBC can therefore be used in the following mixed deployment scheme:

Online applications connect directly to the database through Sharding-JDBC-Driver for optimal performance, while the MySQL command line or a UI client connects to Sharding-JDBC-Server for convenient data queries and DDL statement execution. Both use the same registry cluster: after an administrator configures the data in the registry, the registry automatically pushes configuration changes to the Driver and Server applications. If the number of database connections skyrockets because of excessive database splitting, you can consider using Sharding-JDBC-Server directly online to effectively control the connection count.
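The registry-push model above can be sketched with an in-memory stand-in for the shared registry (in practice a coordination service such as ZooKeeper plays this role). All names here are hypothetical, chosen only to show the publish/subscribe shape:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory registry: Driver and Server instances subscribe,
// and every configuration change made by the administrator is pushed to them.
public class ConfigRegistrySketch {
    private final Map<String, String> config = new HashMap<>();
    private final List<Consumer<Map<String, String>>> listeners = new ArrayList<>();

    // A Driver or Server instance registers interest in config changes.
    public void subscribe(Consumer<Map<String, String>> listener) {
        listeners.add(listener);
    }

    // The administrator updates a key; an immutable snapshot of the
    // full configuration is pushed to all subscribers.
    public void put(String key, String value) {
        config.put(key, value);
        listeners.forEach(l -> l.accept(Map.copyOf(config)));
    }

    public static void main(String[] args) {
        ConfigRegistrySketch registry = new ConfigRegistrySketch();
        registry.subscribe(c -> System.out.println("driver sees: " + c));
        registry.subscribe(c -> System.out.println("server sees: " + c));
        registry.put("sharding.t_order.table-count", "4");
    }
}
```

The point of the design is that Driver and Server never poll for configuration; a single administrative change reaches every node through the registry.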

In the near future, Sharding-JDBC-Sidecar will also be available; its deployment architecture looks like this:

A Database Mesh and a Service Mesh based on Sharding-JDBC do not interfere with each other; they complement each other. The interaction between services is taken over by the Service Mesh Sidecar, and SQL-based database access is taken over by Sharding-JDBC-Sidecar.

For business applications, whether making RPC calls or accessing a database, there is no need to pay attention to the real physical deployment structure, achieving true zero intrusion. Since Sharding-JDBC-Sidecar is created and destroyed with the life cycle of its host, it has no static IP; it is a completely dynamic and elastic presence, and the entire system contains no central node. For data operations and maintenance, you can still start a Sharding-JDBC-Server process as a static-IP entry point and operate through various command-line or UI clients.

Clearing the Sharding-JDBC fog

Since its birth, Sharding-JDBC's user manual has been relatively complete, but because the project has been developing rapidly, few details of its internal implementation have been disclosed. Although the code is open source, it is unrealistic to expect engineers to read through third-party code when evaluating a technology. This article does not have the space to analyze all of the Sharding-JDBC source code, but it can answer some common questions.

1. How is the SQL parsing of Sharding-JDBC done? Is there any problem with efficiency?

Sharding-JDBC uses lexical and syntactic analysis to parse SQL: the SQL is first broken down into tokens and then parsed according to SQL grammar and its context. SQL parsing in Sharding-JDBC does not produce an AST (abstract syntax tree); instead, it directly extracts the parsing context required for sharding, such as tables, select items, conditions, order-by items, group-by items, and limit. In addition, the parsing context is applied directly to routing, eliminating the need for a second traversal of an AST, which further improves performance. Parsing a relatively complex SQL statement with Sharding-JDBC takes about 10ms, several times to more than ten times faster than JavaCC-based SQL parsers such as JSqlParser.
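As a rough illustration of the lexical step only, a toy tokenizer might split SQL into tokens like this. Sharding-JDBC's real lexer is hand-written and far more complete; this regex-based sketch is purely illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy lexer: splits a SQL string into identifiers/keywords, numbers,
// and a few punctuation symbols. Not Sharding-JDBC's real implementation.
public class SqlLexerSketch {
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|\\d+|[*,=?()]");

    public static List<String> tokenize(String sql) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(sql);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [SELECT, order_id, FROM, t_order, WHERE, user_id, =, ?]
        System.out.println(tokenize("SELECT order_id FROM t_order WHERE user_id = ?"));
    }
}
```

From a token stream like this, a parser can pick out the table (`t_order`) and the condition column (`user_id`) directly, which is the "parsing context" the passage above describes.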

2. For paging, sorting, and grouping queries, does Sharding-JDBC need to fetch all the data into memory to operate on it, overloading the memory?

Sharding-JDBC uses both streaming merge and in-memory merge. Streaming merge consumes almost no memory; its principle is the same as iterating a JDBC ResultSet. In-memory merge does take up memory: all data in the ResultSets is loaded into memory before merging. Contrary to what most people might expect, Sharding-JDBC only uses in-memory merging when ORDER BY and GROUP BY both exist and their orderings are inconsistent.

Queries with LIMIT only, ORDER BY only, GROUP BY only, or with ORDER BY and GROUP BY in the same order are all merged by streaming and occupy no extra memory. The specific implementation details cannot be covered in this article and will be analyzed in future articles. In summary, Sharding-JDBC has made many kernel optimizations.
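The streaming ORDER BY merge can be sketched as a k-way merge: each shard's result set arrives already sorted, so a priority queue over the shard cursors yields a globally sorted stream while holding only one row per shard in memory. The following simplified sketch (my own, not Sharding-JDBC's code) uses in-memory lists standing in for sorted ResultSets:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of streaming merge: k sorted inputs merged into one sorted
// output, holding at most one element per input in the queue at a time.
public class StreamingMergeSketch {

    public static List<Integer> merge(List<List<Integer>> sortedShards) {
        // Queue entries: {value, shardIndex, offsetWithinShard}
        PriorityQueue<int[]> queue =
                new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < sortedShards.size(); i++) {
            if (!sortedShards.get(i).isEmpty()) {
                queue.add(new int[]{sortedShards.get(i).get(0), i, 0});
            }
        }
        List<Integer> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            int[] head = queue.poll();
            result.add(head[0]);
            // Advance the cursor of the shard the smallest row came from.
            List<Integer> shard = sortedShards.get(head[1]);
            int next = head[2] + 1;
            if (next < shard.size()) {
                queue.add(new int[]{shard.get(next), head[1], next});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: [1, 2, 3, 4, 5, 6, 7, 8, 9]
        System.out.println(merge(List.of(
                List.of(1, 4, 7), List.of(2, 5, 8), List.of(3, 6, 9))));
    }
}
```

In the real middleware the "shards" are open JDBC ResultSets and rows are pulled lazily as the application iterates, which is why memory usage stays flat regardless of result size.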

According to incomplete statistics, dozens or even hundreds of companies are using Sharding-JDBC, including some well-known enterprises. As the fog gradually clears, hopefully Sharding-JDBC will give plenty of confidence to those evaluating the technology.

Future planning

2018 will continue to be a year of rapid growth for Sharding-JDBC. Future planning focuses on the following four aspects:

  1. Cloud native. Sharding-JDBC-Driver is mature, Sharding-JDBC-Server will be released soon, and Sharding-JDBC-Sidecar will be put on the agenda in the near future. Sharding-JDBC will embrace all three architectural models and let them shine in cloud native architectures.

  2. SQL compatibility. At present, the Sharding-JDBC SQL kernel supports most DQL, DML, DDL, DCL, and MySQL admin statements, except subqueries and OR. As the core module of a database product, Sharding-JDBC will add support for subqueries and OR, striving to support as much SQL as possible and truly achieve maximum compatibility with legacy code.

  3. Functional closed loop. A database middleware layer focuses on data sharding, distributed transactions, and database governance. Sharding-JDBC currently invests more in data sharding than in the other two. In the future, while improving data sharding, it will gradually increase investment in the other two areas. Database Mesh covers governance of two parts, the data access layer and the database itself, and the database governance part of Sharding-JDBC has yet to be developed. For distributed transactions, Sharding-JDBC will also improve flexible transactions by automatically rolling back data through binlog parsing.

  4. Link collection. Sharding-JDBC currently supports the OpenTracing protocol and has been officially recognized by the OpenTracing project. Integration with SkyWalking, another well-known APM project, lets you view and analyze Sharding-JDBC call links directly. The other core product of JD Financial Cloud, Service Governance and Monitoring (SGM), will also be integrated, enabling JD Financial Cloud to provide more integrated external services. More official information about SGM will be revealed in the near future.

If you are interested in Sharding-JDBC, please visit GitHub:

Github.com/shardingjdb… .

Thanks to Guo Lei for correcting this article.