On July 21, 2021, at the Amazon Cloud Technology China Summit, Pan Juan, co-founder of SphereEx and Apache ShardingSphere PMC member, was invited to give a talk titled “Open Source Ecosystem Building for the Apache ShardingSphere Distributed Database Middleware”. The talk covered the spread of the open source concept, community building, and how ShardingSphere practices the Apache Way. This article is a summary of Pan Juan’s talk.

01 A new ecosystem layer above the database and below the business: close to the application layer, close to the database layer

Industries, users, positioning, and needs all differ. Today’s databases face more complex data application scenarios and more personalized, customized data processing requirements than in the past. Increasingly demanding production environments push each database to optimize performance indicators such as read/write speed, latency, and throughput.

Over time, as data application scenarios became clearly specialized, the database market fragmented, and a single database that fits every scenario perfectly is unlikely to emerge. Choosing different databases for different business scenarios has become common practice in enterprises.

However, this “hundred flowers blooming” landscape of databases also brings an equally varied set of problems. From a macro point of view, these problems share commonalities that can be abstracted into a set of de facto standards. If a platform layer is built on top of these varied databases to unify how applications manage data, development can follow fixed standards while the differences between the underlying databases are shielded. Such a standardized solution greatly reduces the burden and learning cost of managing data infrastructure.

Apache ShardingSphere sits at this layer. By reusing the capabilities of the existing databases, it helps technical teams develop incremental capabilities such as sharding and encryption/decryption on this layer. Looking downward, it does not need to care how the underlying databases are configured; looking upward, it is transparent to users. In this way, teams can quickly give business applications the ability to connect directly to the databases and can easily manage large-scale data clusters.

02 How ShardingSphere practices the Apache Way

ShardingSphere can stack multiple functions at the same time to meet users’ diverse needs.

As service volume grows, a single database can no longer carry the load. The database must therefore be scaled horizontally, which inevitably introduces the problem of distributed management. ShardingSphere builds a hot-pluggable functional layer on top of the database and exposes the operating model of a traditional database. This shields users from changes in the underlying databases and lets developers manage a large-scale database cluster as if it were a single database. ShardingSphere mainly covers the following four application scenarios:

Sharding strategy

As business volume grows, the pressure on data sharding increases and the sharding strategies required become more complex. ShardingSphere helps users build sharding strategies that go beyond basic horizontal scaling, in a flexible and easily extensible way and at minimal cost, and it also supports user-defined extensions.
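To make the idea of a pluggable sharding strategy concrete, here is a minimal sketch. It is not ShardingSphere’s API: the names `ShardRouter`, `mod_algorithm` and `range_algorithm`, the table name `t_order`, and the range size of 1000 are all assumptions made for illustration. It only shows how a logical table plus a sharding value can be mapped to a physical table, and how the algorithm can be swapped.

```python
from typing import Callable

# Hypothetical type: (sharding value, shard count) -> shard index.
ShardingAlgorithm = Callable[[int, int], int]


def mod_algorithm(sharding_value: int, shard_count: int) -> int:
    """Route by remainder, spreading consecutive ids across shards."""
    return sharding_value % shard_count


def range_algorithm(sharding_value: int, shard_count: int) -> int:
    """Route by blocks of 1000 ids (assumed size), capped at the last shard."""
    return min(sharding_value // 1000, shard_count - 1)


class ShardRouter:
    """Maps a logical table and a sharding value to a physical table name."""

    def __init__(self, logic_table: str, shard_count: int,
                 algorithm: ShardingAlgorithm) -> None:
        self.logic_table = logic_table
        self.shard_count = shard_count
        self.algorithm = algorithm  # pluggable: any function with this shape

    def route(self, sharding_value: int) -> str:
        index = self.algorithm(sharding_value, self.shard_count)
        return f"{self.logic_table}_{index}"


if __name__ == "__main__":
    by_mod = ShardRouter("t_order", 4, mod_algorithm)
    by_range = ShardRouter("t_order", 4, range_algorithm)
    print(by_mod.route(7))       # t_order_3
    print(by_range.route(2500))  # t_order_2
```

A custom strategy in this sketch is simply another function with the same signature passed to `ShardRouter`, which mirrors the “customize and extend” idea described above.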

Read/write separation

In most cases, a primary/secondary deployment effectively relieves pressure on the database. However, if a machine or a database table in the cluster fails and normal reads and writes cannot be performed, services are severely affected. To avoid service unavailability, developers usually have to write their own high-availability policies to switch the read/write databases and tables between primary and secondary. ShardingSphere can automatically probe the status of all clusters, detect problems such as failing requests and primary/secondary switches in the underlying databases as soon as they occur, and automatically restore the primary/secondary state without upper-layer users being aware of it.
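The routing behaviour described above can be sketched in a few lines. This is a simplified illustration, not ShardingSphere’s implementation: the class `ReadWriteRouter`, the node names, and the `mark` method are hypothetical, and health is set by hand here, whereas a real middleware layer would probe the databases itself.

```python
from typing import Dict, List


class ReadWriteRouter:
    """Sends reads to healthy replicas and writes to the primary (sketch)."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        self.replicas = list(replicas)
        self.healthy: Dict[str, bool] = {n: True for n in [primary, *replicas]}
        self._counter = 0  # round-robin position for reads

    def mark(self, name: str, is_healthy: bool) -> None:
        """Record a health-check result (assumed to come from a prober)."""
        self.healthy[name] = is_healthy

    def route(self, sql: str) -> str:
        is_read = sql.lstrip().lower().startswith("select")
        if is_read:
            candidates = [r for r in self.replicas if self.healthy[r]]
            if candidates:
                target = candidates[self._counter % len(candidates)]
                self._counter += 1
                return target
        # Writes, or reads with no healthy replica, go to the primary.
        if not self.healthy[self.primary]:
            self._promote()
        return self.primary

    def _promote(self) -> None:
        """Switch the primary to the first healthy replica."""
        for replica in self.replicas:
            if self.healthy[replica]:
                self.replicas.remove(replica)
                self.replicas.append(self.primary)
                self.primary = replica
                return
        raise RuntimeError("no healthy database available")


if __name__ == "__main__":
    router = ReadWriteRouter("primary", ["replica_0", "replica_1"])
    print(router.route("SELECT * FROM t_order"))       # replica_0
    router.mark("primary", False)
    print(router.route("INSERT INTO t_order VALUES (1)"))  # replica_0, now promoted
```

The caller keeps issuing SQL against one router object and never sees the switch, which is the “without users being aware of it” property in miniature.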

Sharding Scaling

As the business grows, it may be necessary to split an already sharded data cluster again. ShardingSphere’s Scaling component can start a task with a single SQL command and display its running status in real time in the background. The Scaling “pipeline” connects the old and new database ecosystems.
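To illustrate what re-splitting involves, the sketch below computes which rows would have to move when the shard count changes. It assumes simple modulo sharding on an integer key, which the talk does not specify, and the function name `plan_moves` is hypothetical. It says nothing about how the Scaling component itself migrates data.

```python
from typing import Dict, Iterable, List, Tuple


def plan_moves(ids: Iterable[int], old_count: int,
               new_count: int) -> Dict[Tuple[int, int], List[int]]:
    """Group row ids by (source shard, destination shard) for rows that move.

    Assumes modulo sharding on the id both before and after the split.
    """
    moves: Dict[Tuple[int, int], List[int]] = {}
    for row_id in ids:
        source = row_id % old_count
        destination = row_id % new_count
        if source != destination:
            moves.setdefault((source, destination), []).append(row_id)
    return moves


if __name__ == "__main__":
    # Going from 2 shards to 4: half of each old shard moves to a new shard.
    print(plan_moves(range(8), 2, 4))
    # {(0, 2): [2, 6], (1, 3): [3, 7]}
```

Under these assumptions, doubling the shard count leaves half of the rows in place, which is one reason shard counts are often grown by a factor of two.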

Data encryption and decryption

In database applications, encrypting and decrypting key data is also very important. If the original system’s monitoring is not up to standard, some sensitive data may have been stored in plaintext and needs to be encrypted afterwards. This is a common problem for many teams. By standardizing these capabilities and integrating them into the middleware ecosystem, ShardingSphere automates data masking, encryption, and decryption for both new and existing businesses, and the whole process is transparent to users. It also provides a variety of built-in encryption/decryption and masking algorithms, and users can customize and extend the algorithms to suit their own needs.
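A minimal sketch of the “pluggable algorithm” idea follows. Everything in it is assumed for illustration: the registry `ALGORITHMS`, the functions `keep_first_last_mask`, `sha256_digest` and `protect_row`, and the column names are not ShardingSphere identifiers. The digest is a one-way hash rather than reversible encryption; a real deployment would plug in a proper cipher such as AES from a cryptography library.

```python
import hashlib
from typing import Callable, Dict


def keep_first_last_mask(value: str, first: int = 3, last: int = 4) -> str:
    """Mask the middle of a value, keeping `first` and `last` characters."""
    if len(value) <= first + last:
        return "*" * len(value)
    hidden = len(value) - first - last
    return value[:first] + "*" * hidden + value[len(value) - last:]


def sha256_digest(value: str) -> str:
    """One-way digest; a stand-in for a real encryption algorithm."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical registry: users extend it by adding their own functions.
ALGORITHMS: Dict[str, Callable[[str], str]] = {
    "mask": keep_first_last_mask,
    "sha256": sha256_digest,
}


def protect_row(row: Dict[str, str], rules: Dict[str, str]) -> Dict[str, str]:
    """Apply the configured algorithm to each sensitive column of a row."""
    return {
        column: ALGORITHMS[rules[column]](value) if column in rules else value
        for column, value in row.items()
    }


if __name__ == "__main__":
    row = {"name": "alice", "phone": "13812345678"}
    print(protect_row(row, {"phone": "mask"}))
    # {'name': 'alice', 'phone': '138****5678'}
```

Because the rules live in configuration rather than in application code, the business logic that writes `row` does not change when a column becomes protected.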

Building access points for data: the pluggable Database Plus platform

To serve these varied requirements and application scenarios, ShardingSphere offers developers in different fields three access forms: Java-oriented JDBC, Proxy for heterogeneous languages, and cloud-oriented Sidecar. Users can choose according to their specific needs and perform sharding, read/write separation, data migration, and other related operations on their existing clusters.

JDBC access: used entirely through JDBC, it can be understood as an enhanced JDBC driver that is fully compatible with JDBC and the various ORM frameworks. It provides distributed management, horizontal scaling, data masking, and other operations without additional deployment or dependencies.

Proxy access: Proxy presents itself as a database service and manages the real database cluster underneath, with no changes required to business services.

On-cloud Mesh access: ShardingSphere is deployed in the public cloud. SphereEx has joined Amazon Cloud Technology’s cloud initiative and will continue to work with Amazon Cloud Technology on its Marketplace in China and overseas to provide Amazon cloud users with more powerful Proxy image deployment capabilities, working together to create a more mature on-cloud environment for enterprise applications.

03 Open source: connecting personal work to the world

ShardingSphere has had considerable influence in the industry since it was open-sourced. Today in China, ShardingSphere is usually on the shortlist whenever tools or capabilities for horizontal scaling are under consideration. This is of course thanks to the project maintainers’ contributions over the years, which have made ShardingSphere’s features increasingly complete. It is also thanks to the steadily improving open source atmosphere in China.

In earlier years, most Chinese users in open source communities only downloaded programs and consulted code, with little involvement in community building. In recent years, as the open source concept has spread in China, more and more people with a strong passion for technology have emerged. Their participation is what makes the ShardingSphere community increasingly active. A good open source project is judged not only by its advanced ideas and technology, but also by the deep foundation it has built up in areas such as technical influence, open source influence, ecosystem building, and developer community.

This is why ShardingSphere, as an Apache Top-Level Project, still actively calls for participation in the open source community. After all, in the office you are only in contact with the people around you and the work in front of you; you are “confined” to that group of people every day. Through open source, I can connect my work to the world, put aside the books and truly devote myself to projects, broaden my horizons, gradually cultivate a spirit of openness and cooperation, and rediscover the value I am creating.

Source: SphereEx, Pan Juan (shared at the Amazon Cloud Technology China Summit)