After nearly two years of optimization and polishing, the Apache ShardingSphere 5.0.0 GA version was officially released this month. Compared with the 4.1.1 GA version, 5.0.0 GA brings extensive optimizations at the kernel level. First, the kernel has been completely revamped around a pluggable architecture, so that kernel features can be freely combined and stacked. Second, to strengthen distributed SQL query capability, 5.0.0 GA introduces a new Federation execution engine to handle users' complex business scenarios. In addition, the release includes many improvements to the kernel feature APIs that lower the cost of using those features. This article explains these major kernel optimizations in detail and compares the differences between the two GA releases, using a typical scenario that combines data sharding, read/write splitting, and data encryption to help users better understand the changes and upgrade smoothly.
About the author

Duan Zhengqiang
SphereEx Senior Middleware Development Engineer, Apache ShardingSphere Committer.
He began working with Apache ShardingSphere middleware in 2018 and led his company's internal database and table sharding over massive data sets, gaining rich hands-on experience. He loves open source and enjoys sharing, and currently focuses on the development of the Apache ShardingSphere kernel module.
Pluggable Architecture Kernel
The Apache ShardingSphere 5.0.0 GA version puts forward the new concept of Database Plus, which aims to build a standard and ecosystem layer on top of heterogeneous databases and provide users with precise, differentiated capabilities. Database Plus is characterized by connectivity, enhancement, and pluggability. Specifically, Apache ShardingSphere can connect to different heterogeneous databases and, on top of their basic services, provide incremental features such as data sharding, data encryption and decryption, and distributed transactions. Furthermore, through the pluggable platform, the incremental capabilities provided by Apache ShardingSphere can be extended without limit, and users can flexibly extend them as needed. The emergence of Database Plus turns ShardingSphere from database-and-table sharding middleware into a truly powerful distributed database ecosystem. By practicing the Database Plus concept and building on the extension points of the pluggable platform, the Apache ShardingSphere kernel itself has become fully pluggable. The new pluggable kernel architecture is shown below:
The Apache ShardingSphere kernel process provides rich extension points for metadata loading, SQL parsing, SQL routing, SQL rewriting, SQL execution, and result merging. Based on these extension points, Apache ShardingSphere implements data sharding, read/write splitting, encryption and decryption, shadow database stress testing, high availability, and other features by default.
Extension points can be divided into technical extension points and functional extension points, depending on whether they serve a technical mechanism or a feature implementation. In the Apache ShardingSphere kernel process, the extension points of the SQL parsing engine and the SQL execution engine are technical extension points, while those for metadata loading, the SQL routing engine, the SQL rewriting engine, and the result merging engine are functional extension points.
The SQL parsing engine provides two extension points: SQL syntax tree parsing and SQL syntax tree traversal. Based on them, the Apache ShardingSphere SQL parsing engine supports parsing and traversing the MySQL, PostgreSQL, Oracle, SQL Server, openGauss, and SQL92 dialects by default. Users can also implement database dialects not yet supported by the parsing engine, or develop new features such as SQL auditing, on top of these two extension points.
The SQL execution engine extension point provides extensions by execution mode. At present, the Apache ShardingSphere SQL execution engine offers a single-threaded execution engine and a multi-threaded execution engine. The single-threaded engine mainly handles statements executed inside a transaction, while the multi-threaded engine suits non-transactional scenarios and improves SQL execution performance. In the future, Apache ShardingSphere will use this extension point to provide more execution engines, such as an MPP execution engine, to meet the requirements of SQL execution in distributed scenarios.
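The execution engine's behavior can also be tuned through server properties. As a hedged sketch (the property `max-connections-size-per-query` does exist in ShardingSphere, but the value shown is illustrative, not a recommendation), it caps how many connections a single query may hold on each data source, and the execution engine consults it when deciding how to group rewritten SQL units onto connections:

```yaml
# server.yaml (fragment) -- illustrative value only
props:
  # Maximum connections one query may hold per data source; the execution
  # engine uses this limit when grouping SQL units onto connections.
  max-connections-size-per-query: 1
```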
On top of the functional extension points, Apache ShardingSphere provides data sharding, read/write splitting, encryption and decryption, shadow database stress testing, high availability, and other features. Each feature implements all or part of the functional extension points according to its own needs, and further refines them into feature-internal extension points such as sharding strategies, distributed ID generation, and load balancing algorithms. The extension points implemented by the Apache ShardingSphere kernel features are as follows:
- Data sharding: implements all the extension points of metadata loading, SQL routing, SQL rewriting, and result merging, and internally provides extension points such as sharding algorithms and distributed IDs.
- Read/write splitting: implements the SQL routing extension point, and internally provides a load balancing algorithm extension point.
- Encryption and decryption: implements the metadata loading, SQL rewriting, and result merging extension points, and internally provides an encryption and decryption algorithm extension point.
- Shadow database stress testing: implements the SQL routing extension point, and internally provides a shadow algorithm extension point.
- High availability: implements the SQL routing extension point.
These extension points leave plenty of room to extend Apache ShardingSphere: features such as multi-tenancy and SQL auditing can be seamlessly integrated into the Apache ShardingSphere ecosystem through them. Users can also develop customized features for their own business needs on top of the extension points and quickly build a distributed database system. For details about the pluggable architecture extension points, refer to the developer manual on the official website: shardingsphere.apache.org/document/cu…
Overall, the main differences between the 5.0.0 GA pluggable architecture kernel and the 4.1.1 GA kernel are as follows:

| | 4.1.1 GA | 5.0.0 GA |
|---|---|---|
| Positioning | Database and table sharding middleware | Distributed database ecosystem |
| Functionality | Provides basic features | Provides infrastructure and best practices |
| Coupling | High coupling, with dependencies between features | Features isolated from and unaware of each other |
| Combination | Fixed combination mode: must be based on data sharding, with read/write splitting and encryption stacked on top | Free combination: data sharding, read/write splitting, shadow database stress testing, encryption and decryption, high availability, etc. can be combined arbitrarily |
First, in terms of project positioning, the 5.0.0 GA version uses the pluggable architecture to complete the transformation from database-and-table sharding middleware into a distributed database ecosystem; any feature can be integrated into that ecosystem through the pluggable architecture. Second, in terms of functionality, 4.1.1 GA only provides some basic features, while 5.0.0 GA focuses on providing infrastructure and best practices for those features; users can even discard them entirely and develop customized features on top of the kernel infrastructure. In terms of coupling, kernel features in 5.0.0 GA are isolated from and unaware of each other, which maximizes kernel stability. Finally, in terms of combination, 5.0.0 GA makes all features peers in the hierarchy: data sharding, read/write splitting, shadow database stress testing, encryption and decryption, high availability, and so on can be combined arbitrarily as needed. In 4.1.1 GA, by contrast, users had to center everything on data sharding and stack the other features on top of it.
As these comparisons show, the 5.0.0 GA pluggable kernel has been comprehensively enhanced, allowing users to stack and combine features like building blocks to meet more business needs. However, the pluggable architecture also significantly changes how kernel features are used; later in this article we will use examples to show how to combine these features in the 5.0.0 GA release.
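As a minimal sketch of this building-block style (data source and rule names here are hypothetical), a 5.0.0 GA configuration can enable read/write splitting and encryption side by side, with no sharding rule required as a base:

```yaml
rules:
- !READWRITE_SPLITTING
  dataSources:
    pr_ds:
      writeDataSourceName: write_ds
      readDataSourceNames:
        - read_ds_0
        - read_ds_1
- !ENCRYPT
  encryptors:
    aes_encryptor:
      type: AES
      props:
        aes-key-value: 123456abc
  tables:
    t_user:
      columns:
        telephone:
          cipherColumn: telephone_cipher
          encryptorName: aes_encryptor
```

Each rule stands on its own; adding or removing one does not affect the others.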
Federation Execution Engine
The Federation execution engine is another highlight of the 5.0.0 GA kernel. It aims to support distributed queries that could not be executed in 4.1.1 GA, such as join queries and subqueries across database instances. With the Federation execution engine, business developers can focus on business logic without worrying about the usable scope of their SQL, reducing functional limitations at the business level.
The Federation execution engine follows the same process as the standard kernel: SQL parsing, SQL routing, SQL rewriting, and SQL execution. The only difference is that it introduces an SQL optimization phase, in which distributed query statements are optimized by a Rule Based Optimizer (RBO) and a Cost Based Optimizer (CBO) to obtain the execution plan with the lowest cost. In the SQL routing phase, the routing engine decides whether a statement goes through the Federation execution engine based on whether it spans multiple database instances.
The Federation engine is currently under rapid development and still needs a lot of optimization. It is an experimental feature and is therefore disabled by default. To try it, set sql-federation-enabled: true to enable the feature.
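With ShardingSphere-Proxy, this switch lives under props; a minimal fragment based on the property named above might look like:

```yaml
# server.yaml (fragment)
props:
  sql-federation-enabled: true
```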
The Federation execution engine is mainly used to support join queries and subqueries across multiple database instances, as well as aggregation queries that the kernel does not otherwise support. Let's look at the statements the Federation execution engine supports in concrete scenarios.
- Cross-instance join query: when the tables in a join query are distributed across different database instances, the Federation execution engine provides support.

For example, in the following data sharding configuration, t_order and t_order_item are sharded tables spanning multiple data nodes with no binding table rule configured, while t_user and t_user_role are single tables distributed across different database instances.
```yaml
rules:
- !SHARDING
  tables:
    t_order:
      actualDataNodes: ds_${0..1}.t_order_${0..1}
      tableStrategy:
        standard:
          shardingColumn: order_id
          shardingAlgorithmName: t_order_inline
    t_order_item:
      actualDataNodes: ds_${0..1}.t_order_item_${0..1}
      tableStrategy:
        standard:
          shardingColumn: order_id
          shardingAlgorithmName: t_order_item_inline
```
Because the following commonly used SQL statements span multiple database instances, their join queries are executed through the Federation execution engine.
```sql
SELECT * FROM t_order o INNER JOIN t_order_item i ON o.order_id = i.order_id WHERE o.order_id = 1;
SELECT * FROM t_order o INNER JOIN t_user u ON o.user_id = u.user_id WHERE o.user_id = 1;
SELECT * FROM t_order o LEFT JOIN t_user_role r ON o.user_id = r.user_id WHERE o.user_id = 1;
SELECT * FROM t_order_item i LEFT JOIN t_user u ON i.user_id = u.user_id WHERE i.user_id = 1;
SELECT * FROM t_order_item i RIGHT JOIN t_user_role r ON i.user_id = r.user_id WHERE i.user_id = 1;
SELECT * FROM t_user u RIGHT JOIN t_user_role r ON u.user_id = r.user_id WHERE u.user_id = 1;
```
- Subquery: the Apache ShardingSphere Simple Push Down engine supports subqueries whose sharding conditions are consistent, as well as subqueries routed to a single shard. When neither the subquery nor the outer query specifies a sharding key, or their sharding key values are inconsistent, the Federation execution engine provides support.
Here are some of the subquery scenarios supported by the Federation execution engine:
```sql
SELECT * FROM (SELECT * FROM t_order) o;
SELECT * FROM (SELECT * FROM t_order) o WHERE o.order_id = 1;
SELECT * FROM (SELECT * FROM t_order WHERE order_id = 1) o;
SELECT * FROM (SELECT * FROM t_order WHERE order_id = 1) o WHERE o.order_id = 2;
```
- Aggregation query: for aggregation queries not yet supported by the Apache ShardingSphere Simple Push Down engine, the Federation execution engine also provides support.
```sql
SELECT user_id, SUM(order_id) FROM t_order GROUP BY user_id HAVING SUM(order_id) > 10;
SELECT (SELECT MAX(user_id) FROM t_order) a, order_id FROM t_order;
SELECT COUNT(DISTINCT user_id), SUM(order_id) FROM t_order;
```
The Federation execution engine significantly enhances the distributed query capability of Apache ShardingSphere. In the future, Apache ShardingSphere will keep optimizing it, effectively reduce its memory footprint, and continuously improve distributed query capabilities. For the detailed list of statements the Federation execution engine supports, see the experimental supported SQL section of the official documentation: shardingsphere.apache.org/document/5….
Kernel Feature API Adjustments
To reduce the cost of using kernel features, the 5.0.0 GA release also brings many optimizations at the API level. First, in response to community feedback that the data sharding API was too complex and hard to understand, the community held a full discussion and delivered a brand-new data sharding API in 5.0.0 GA. At the same time, as the project positioning shifts from traditional database middleware to a distributed database ecosystem, transparent data sharding becomes increasingly important, so 5.0.0 GA provides an automatic sharding strategy: users no longer need to care about sharding details and can achieve automatic sharding simply by specifying the shard count. In addition, with the introduction of the pluggable architecture and further enhancement of the shadow database stress testing feature, the related kernel feature APIs have been optimized accordingly. Below we walk through the 5.0.0 GA API changes feature by feature.
Data Sharding API Adjustments
In 4.x, the community often reported that the data sharding API was too complex to understand. The following is the data sharding configuration in 4.1.1 GA: sharding strategies include standard, complex, inline, hint, and none, and the parameters of the different strategies vary so much that ordinary users find them difficult to understand and use.
```yaml
shardingRule:
  tables:
    t_order:
      databaseStrategy:
        standard:
          shardingColumn: order_id
          preciseAlgorithmClassName: xxx
          rangeAlgorithmClassName: xxx
        complex:
          shardingColumns: year, month
          algorithmClassName: xxx
        hint:
          algorithmClassName: xxx
        inline:
          shardingColumn: order_id
          algorithmExpression: ds_${order_id % 2}
        none:
      tableStrategy:
        ...
```
The 5.0.0 GA version simplifies the sharding strategies in the sharding API. First, the inline strategy is removed, keeping only the standard, complex, hint, and none strategies. Meanwhile, sharding algorithms are extracted out of the sharding strategies into a separate shardingAlgorithms attribute, and a strategy references an algorithm through the shardingAlgorithmName attribute.
```yaml
rules:
- !SHARDING
  tables:
    t_order:
      databaseStrategy:
        standard:
          shardingColumn: order_id
          shardingAlgorithmName: database_inline
        complex:
          shardingColumns: year, month
          shardingAlgorithmName: database_complex
        hint:
          shardingAlgorithmName: database_hint
        none:
      tableStrategy:
        ...
  shardingAlgorithms:
    database_inline:
      type: INLINE
      props:
        algorithm-expression: ds_${order_id % 2}
    database_complex:
      type: CLASS_BASED
      props:
        strategy: COMPLEX
        algorithmClassName: xxx
    database_hint:
      type: CLASS_BASED
      props:
        strategy: HINT
        algorithmClassName: xxx
```
The configuration above is the 4.1.1 GA sharding configuration rewritten for 5.0.0 GA; the new sharding API is clearly more concise. To further reduce the amount of configuration, Apache ShardingSphere ships many built-in sharding algorithms, and users can still customize algorithms through the CLASS_BASED type. For more information, refer to the built-in sharding algorithms in the official documentation: shardingsphere.apache.org/document/5….
In addition to optimizing the data sharding API, the 5.0.0 GA release provides an automatic sharding strategy for transparent data sharding. The following shows the difference between the automatic and manual sharding strategy configurations:
```yaml
rules:
- !SHARDING
  autoTables: # automatic sharding strategy
    t_order:
      actualDataSources: ds_0, ds_1
      shardingStrategy:
        standard:
          shardingColumn: order_id
          shardingAlgorithmName: auto_mod
      keyGenerateStrategy:
        column: order_id
        keyGeneratorName: snowflake
  shardingAlgorithms:
    auto_mod:
      type: MOD
      props:
        sharding-count: 4
  tables: # manually declared sharding strategy
    t_order:
      actualDataNodes: ds_${0..1}.t_order_${0..1}
      tableStrategy:
        standard:
          shardingColumn: order_id
          shardingAlgorithmName: table_inline
      databaseStrategy:
        standard:
          shardingColumn: user_id
          shardingAlgorithmName: database_inline
```
Automatic sharding strategies are configured under the autoTables attribute. Users only need to specify the data sources for storage and the shard count used by the automatic sharding algorithm; they no longer declare the data distribution through actualDataNodes or set database and table strategies manually. Apache ShardingSphere manages the data sharding automatically.
In addition, the 5.0.0 GA release removes the defaultDataSourceName configuration from the data sharding API. Apache ShardingSphere is now positioned as a distributed database ecosystem: users interact with the services it provides just as with a traditional database and do not need to be aware of the underlying storage. Apache ShardingSphere manages single tables outside the sharding rules through the built-in SingleTableRule, loading and routing them automatically.
To simplify configuration, 5.0.0 GA also adds a defaultShardingColumn configuration that works with the defaultDatabaseStrategy and defaultTableStrategy sharding strategies as the default sharding key. When multiple tables share the same sharding key, defaultShardingColumn can be used instead of per-table shardingColumn settings. In the following sharding configuration, the sharding strategy of the t_order table uses the default defaultShardingColumn:
```yaml
rules:
- !SHARDING
  tables:
    t_order:
      actualDataNodes: ds_${0..1}.t_order_${0..1}
      tableStrategy:
        standard:
          shardingAlgorithmName: table_inline
  defaultShardingColumn: order_id
  defaultDatabaseStrategy:
    standard:
      shardingAlgorithmName: database_inline
  defaultTableStrategy:
    none:
```
Read/Write Splitting API Adjustments
The basic functionality of the read/write splitting API changes little in 5.0.0 GA; the main adjustment is renaming MasterSlave to ReadWriteSplitting, and usage is otherwise much the same. Below is a comparison of the 4.1.1 GA and 5.0.0 GA read/write splitting APIs.
```yaml
# 4.1.1 GA read/write splitting API
masterSlaveRule:
  name: ms_ds
  masterDataSourceName: master_ds
  slaveDataSourceNames:
    - slave_ds_0
    - slave_ds_1

# 5.0.0 GA read/write splitting API
rules:
- !READWRITE_SPLITTING
  dataSources:
    pr_ds:
      writeDataSourceName: write_ds
      readDataSourceNames:
        - read_ds_0
        - read_ds_1
```
Moreover, in 5.0.0 GA the high availability feature is built on the pluggable architecture, so read/write splitting can be combined with it to provide highly available read/write splitting with automatic primary/secondary switchover. Stay tuned to the official documentation and upcoming technical articles on the high availability feature.
Encryption and Decryption API Adjustments
The 5.0.0 GA version slightly optimizes the encryption and decryption API by adding a table-level queryWithCipherColumn attribute, making it easy for users to switch between plaintext and ciphertext for encrypted columns on a per-table basis. Other configurations are basically the same as in 4.1.1 GA.
```yaml
rules:
- !ENCRYPT
  encryptors:
    aes_encryptor:
      type: AES
      props:
        aes-key-value: 123456abc
    md5_encryptor:
      type: MD5
  tables:
    t_encrypt:
      columns:
        user_id:
          plainColumn: user_plain
          cipherColumn: user_cipher
          encryptorName: aes_encryptor
        order_id:
          cipherColumn: order_cipher
          encryptorName: md5_encryptor
      queryWithCipherColumn: true # table level
  queryWithCipherColumn: false # rule level
```
Shadow Database Stress Testing API Adjustments
The shadow database stress testing API has been completely redesigned in 5.0.0 GA. The logical shadow column has been removed, and powerful shadow matching algorithms have been added to give users more flexible routing control. The 4.1.1 GA shadow API shown below is comparatively simple: it decides whether to route to the shadow database according to the value of the logical column.
```yaml
shadowRule:
  column: shadow
  shadowMappings:
    ds: shadow_ds
```
The 5.0.0 GA shadow API is much more powerful. Users can control whether shadow stress testing is enabled through the enable attribute and, at table granularity, specify exactly which production tables need to be shadowed. Multiple matching algorithms are supported, such as column value matching, column regular expression matching, and SQL comment matching.
```yaml
rules:
- !SHADOW
  enable: true
  dataSources:
    shadowDataSource:
      sourceDataSourceName: ds
      shadowDataSourceName: shadow_ds
  tables:
    t_order:
      dataSourceNames:
        - shadowDataSource
      shadowAlgorithmNames:
        - user-id-insert-match-algorithm
        - simple-hint-algorithm
  shadowAlgorithms:
    user-id-insert-match-algorithm:
      type: COLUMN_REGEX_MATCH
      props:
        operation: insert
        column: user_id
        regex: "[1]"
    simple-hint-algorithm:
      type: SIMPLE_NOTE
      props:
        shadow: true
        foo: bar
```
We will introduce the shadow database stress testing feature in detail in subsequent technical articles, so we will not expand on it here. For more on shadow matching algorithms, refer to the official documentation: shardingsphere.apache.org/document/5….
5.0.0 GA Upgrade Guide
The 5.0.0 GA kernel has been optimized in detail across the pluggable kernel architecture, the Federation execution engine, and the kernel feature APIs. Given all the differences between the two versions, the biggest question is how to upgrade from 4.1.1 GA to 5.0.0 GA. Below, we use a typical scenario that combines data sharding, read/write splitting, and encryption to detail what needs to be considered when upgrading to 5.0.0 GA.
In 4.1.1 GA, combining multiple features had to start from data sharding, with read/write splitting and encryption stacked on top, so a 4.1.1 GA configuration usually looks like this:
```yaml
shardingRule:
  tables:
    t_order:
      actualDataNodes: ms_ds_${0..1}.t_order_${0..1}
      tableStrategy:
        inline:
          shardingColumn: order_id
          algorithmExpression: t_order_${order_id % 2}
    t_order_item:
      actualDataNodes: ms_ds_${0..1}.t_order_item_${0..1}
      tableStrategy:
        inline:
          shardingColumn: order_id
          algorithmExpression: t_order_item_${order_id % 2}
  bindingTables:
    - t_order,t_order_item
  broadcastTables:
    - t_config
  defaultDataSourceName: ds_0
  defaultDatabaseStrategy:
    inline:
      shardingColumn: user_id
      algorithmExpression: ms_ds_${user_id % 2}
  defaultTableStrategy:
    none:
  masterSlaveRules:
    ms_ds_0:
      masterDataSourceName: ds_0
      slaveDataSourceNames:
        - ds_0_slave_0
        - ds_0_slave_1
      loadBalanceAlgorithmType: ROUND_ROBIN
    ms_ds_1:
      masterDataSourceName: ds_1
      slaveDataSourceNames:
        - ds_1_slave_0
        - ds_1_slave_1
      loadBalanceAlgorithmType: ROUND_ROBIN
  encryptRule:
    encryptors:
      aes_encryptor:
        type: aes
        props:
          aes.key.value: 123456abc
    tables:
      t_order:
        columns:
          content:
            plainColumn: content_plain
            cipherColumn: content_cipher
            encryptor: aes_encryptor
      t_user:
        columns:
          telephone:
            plainColumn: telephone_plain
            cipherColumn: telephone_cipher
            encryptor: aes_encryptor
```
As the configuration shows, t_order and t_order_item have sharding rules configured, and the content column of t_order has encryption rules using the AES algorithm. t_user is an ordinary, non-sharded table whose telephone column has encryption rules configured. Note that the read/write splitting and encryption rules are configured as attributes inside the sharding rule, a concrete manifestation of the feature dependencies in 4.1.1 GA: other features must be based on data sharding.
With the configuration in place, we start the 4.1.1 GA Proxy and initialize the t_order, t_order_item, and t_user tables. The initialization statements execute as follows:
```sql
CREATE TABLE t_order(order_id INT(11) PRIMARY KEY, user_id INT(11), content VARCHAR(100));
# Logic SQL: CREATE TABLE t_order(order_id INT(11) PRIMARY KEY, user_id INT(11), content VARCHAR(100))
# Actual SQL: ds_0 ::: CREATE TABLE t_order_0(order_id INT(11) PRIMARY KEY, user_id INT(11), content VARCHAR(100))
# Actual SQL: ds_0 ::: CREATE TABLE t_order_1(order_id INT(11) PRIMARY KEY, user_id INT(11), content VARCHAR(100))
# Actual SQL: ds_1 ::: CREATE TABLE t_order_0(order_id INT(11) PRIMARY KEY, user_id INT(11), content VARCHAR(100))
# Actual SQL: ds_1 ::: CREATE TABLE t_order_1(order_id INT(11) PRIMARY KEY, user_id INT(11), content VARCHAR(100))

CREATE TABLE t_order_item(item_id INT(11) PRIMARY KEY, order_id INT(11), user_id INT(11), content VARCHAR(100));
# Logic SQL: CREATE TABLE t_order_item(item_id INT(11) PRIMARY KEY, order_id INT(11), user_id INT(11), content VARCHAR(100))
# Actual SQL: ds_0 ::: CREATE TABLE t_order_item_0(item_id INT(11) PRIMARY KEY, order_id INT(11), user_id INT(11), content VARCHAR(100))
# Actual SQL: ds_0 ::: CREATE TABLE t_order_item_1(item_id INT(11) PRIMARY KEY, order_id INT(11), user_id INT(11), content VARCHAR(100))
# Actual SQL: ds_1 ::: CREATE TABLE t_order_item_0(item_id INT(11) PRIMARY KEY, order_id INT(11), user_id INT(11), content VARCHAR(100))
# Actual SQL: ds_1 ::: CREATE TABLE t_order_item_1(item_id INT(11) PRIMARY KEY, order_id INT(11), user_id INT(11), content VARCHAR(100))

CREATE TABLE t_user(user_id INT(11) PRIMARY KEY, telephone VARCHAR(100));
# Logic SQL: CREATE TABLE t_user(user_id INT(11) PRIMARY KEY, telephone VARCHAR(100))
# Actual SQL: ds_0 ::: CREATE TABLE t_user(user_id INT(11) PRIMARY KEY, telephone VARCHAR(100))
```
Routing and rewriting for the sharding of the t_order table are normal, but the rewriting for encryption is missing, because 4.1.1 GA does not support rewriting DDL statements in encryption scenarios; users therefore have to create the corresponding encryption tables in the underlying databases in advance. DDL rewriting for encryption is fully supported in 5.0.0 GA, saving users these unnecessary steps.
The t_order_item table is normal because no encryption is involved. The t_user table has the same DDL rewriting problem for encryption, and it is routed to the ds_0 data source because defaultDataSourceName: ds_0 is configured in the sharding rule, which is how non-sharded tables are routed.
For the t_order and t_user tables, we manually create the encryption tables in the underlying databases matching the routing results, using the following SQL.
```sql
# ds_0
CREATE TABLE t_order_0(order_id INT(11) PRIMARY KEY, user_id INT(11), content_plain VARCHAR(100), content_cipher VARCHAR(100));
CREATE TABLE t_order_1(order_id INT(11) PRIMARY KEY, user_id INT(11), content_plain VARCHAR(100), content_cipher VARCHAR(100));
CREATE TABLE t_user(user_id INT(11) PRIMARY KEY, telephone_plain VARCHAR(100), telephone_cipher VARCHAR(100));

# ds_1
CREATE TABLE t_order_0(order_id INT(11) PRIMARY KEY, user_id INT(11), content_plain VARCHAR(100), content_cipher VARCHAR(100));
CREATE TABLE t_order_1(order_id INT(11) PRIMARY KEY, user_id INT(11), content_plain VARCHAR(100), content_cipher VARCHAR(100));
```
We then restart the Proxy and insert data into the t_order, t_order_item, and t_user tables. During insertion, t_order and t_order_item are routed to the corresponding data nodes according to the sharding key values and the configured sharding strategies, while t_user is routed to the ds_0 data source according to defaultDataSourceName.
```sql
INSERT INTO t_order(order_id, user_id, content) VALUES(1, 1, 'TEST11'), (2, 2, 'TEST22'), (3, 3, 'TEST33');
# Logic SQL: INSERT INTO t_order(order_id, user_id, content) VALUES(1, 1, 'TEST11'), (2, 2, 'TEST22'), (3, 3, 'TEST33')
# Actual SQL: ds_0 ::: INSERT INTO t_order_0(order_id, user_id, content_cipher, content_plain) VALUES(2, 2, 'mzIhTs2MD3dI4fqCc5nF/Q==', 'TEST22')
# Actual SQL: ds_1 ::: INSERT INTO t_order_1(order_id, user_id, content_cipher, content_plain) VALUES(1, 1, '3qpLpG5z6AWjRX2sRKjW2g==', 'TEST11'), (3, 3, 'oVkQieUbS3l/85axrf5img==', 'TEST33')

INSERT INTO t_order_item(item_id, order_id, user_id, content) VALUES(1, 1, 1, 'TEST11'), (2, 2, 2, 'TEST22'), (3, 3, 3, 'TEST33');
# Logic SQL: INSERT INTO t_order_item(item_id, order_id, user_id, content) VALUES(1, 1, 1, 'TEST11'), (2, 2, 2, 'TEST22'), (3, 3, 3, 'TEST33')
# Actual SQL: ds_0 ::: INSERT INTO t_order_item_0(item_id, order_id, user_id, content) VALUES(2, 2, 2, 'TEST22')
# Actual SQL: ds_1 ::: INSERT INTO t_order_item_1(item_id, order_id, user_id, content) VALUES(1, 1, 1, 'TEST11'), (3, 3, 3, 'TEST33')

INSERT INTO t_user(user_id, telephone) VALUES(1, '11111111111'), (2, '22222222222'), (3, '33333333333');
# Logic SQL: INSERT INTO t_user(user_id, telephone) VALUES(1, '11111111111'), (2, '22222222222'), (3, '33333333333')
# Actual SQL: ds_0 ::: INSERT INTO t_user(user_id, telephone_cipher, telephone_plain) VALUES(1, 'jFZBCI7G9ggRktThmMlClQ==', '11111111111'), (2, 'lWrg5gaes8eptaQkUM2wtA==', '22222222222'), (3, 'jeCwC7gXus4/1OflXeGW/w==', '33333333333')
```
Next we run a few simple queries to see whether read/write splitting takes effect. According to the logs, the t_order and t_order_item queries are rewritten for encryption and correctly routed to the replica databases, while the t_user query is still executed on the ds_0 data source and the configured read/write splitting rule has no effect. This is because in 4.1.1 GA, read/write splitting and encryption are integrated on top of sharding, which inherently limits any feature used outside sharding.
SELECT * FROM t_order WHERE user_id = 1 AND order_id = 1;
# Logic SQL: SELECT * FROM t_order WHERE user_id = 1 AND order_id = 1
# Actual SQL: ds_1_slave_0 ::: SELECT order_id, user_id, content_plain, content_cipher FROM t_order_1 WHERE user_id = 1 AND order_id = 1
SELECT * FROM t_order_item WHERE user_id = 1 AND order_id = 1;
# Logic SQL: SELECT * FROM t_order_item WHERE user_id = 1 AND order_id = 1
# Actual SQL: ds_1_slave_1 ::: SELECT * FROM t_order_item_1 WHERE user_id = 1 AND order_id = 1
SELECT * FROM t_user WHERE user_id = 1;
# Logic SQL: SELECT * FROM t_user WHERE user_id = 1
# Actual SQL: ds_0 ::: SELECT user_id, telephone_plain, telephone_cipher FROM t_user WHERE user_id = 1
The 5.0.0 GA version is based on the pluggable architecture, and the kernel has been fully upgraded so that the various kernel features can be used in any combination. At the same time, 5.0.0 GA removes the defaultDataSourceName setting, which previously required additional user configuration; by default, SingleTableRule now handles metadata loading and routing for single tables. Let's take a look at how the same functionality is configured and used in 5.0.0 GA:
rules:
- !SHARDING
  tables:
    t_order:
      actualDataNodes: ms_ds_${0..1}.t_order_${0..1}
      tableStrategy:
        standard:
          shardingColumn: order_id
          shardingAlgorithmName: t_order_inline
    t_order_item:
      actualDataNodes: ms_ds_${0..1}.t_order_item_${0..1}
      tableStrategy:
        standard:
          shardingColumn: order_id
          shardingAlgorithmName: t_order_item_inline
  bindingTables:
    - t_order,t_order_item
  broadcastTables:
    - t_config
  defaultDatabaseStrategy:
    standard:
      shardingColumn: user_id
      shardingAlgorithmName: database_inline
  defaultTableStrategy:
    none:
  shardingAlgorithms:
    database_inline:
      type: INLINE
      props:
        algorithm-expression: ms_ds_${user_id % 2}
    t_order_inline:
      type: INLINE
      props:
        algorithm-expression: t_order_${order_id % 2}
    t_order_item_inline:
      type: INLINE
      props:
        algorithm-expression: t_order_item_${order_id % 2}
- !READWRITE_SPLITTING
  dataSources:
    ms_ds_0:
      writeDataSourceName: ds_0
      readDataSourceNames:
        - ds_0_slave_0
        - ds_0_slave_1
      loadBalancerName: ROUND_ROBIN
    ms_ds_1:
      writeDataSourceName: ds_1
      readDataSourceNames:
        - ds_1_slave_0
        - ds_1_slave_1
      loadBalancerName: ROUND_ROBIN
- !ENCRYPT
  encryptors:
    aes_encryptor:
      type: AES
      props:
        aes-key-value: 123456abc
  tables:
    t_order:
      columns:
        content:
          plainColumn: content_plain
          cipherColumn: content_cipher
          encryptor: aes_encryptor
    t_user:
      columns:
        telephone:
          plainColumn: telephone_plain
          cipherColumn: telephone_cipher
          encryptor: aes_encryptor
First, from a configuration perspective, the biggest difference between 5.0.0 GA and 4.1.1 GA is the relationship between the different features. Unlike in 4.1.1 GA, they now sit in a horizontal relationship with no dependency between them, and each feature can be flexibly loaded and unloaded in a pluggable way. Second, when these features are used together, they are combined in a pipeline-like manner. For example, the read/write separation rule aggregates two logical data sources, ms_ds_0 and ms_ds_1, from two sets of master-slave relationships. The data sharding rule then configures sharding on top of the logical data sources aggregated by read/write separation, and aggregates the logical table t_order. The encryption and decryption feature focuses on rewriting columns and values, and its rules are configured against the logical tables aggregated by data sharding. Read/write separation, data sharding, and encryption/decryption rules are passed along layer by layer, with functionality continuously added through the decorator pattern.
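The layer-by-layer decoration described above can be sketched as follows. This is an illustrative model only, not ShardingSphere's actual API: the RuleDecorator interface and the string-replacement rewrites are hypothetical stand-ins for the kernel's real routing and rewriting engines.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of decorator-style rule composition: each rule
// refines the routing/rewriting result produced by the previous one.
interface RuleDecorator {
    String decorate(String sql);
}

class ShardingDecorator implements RuleDecorator {
    // Replace the logical table with the actual shard.
    public String decorate(String sql) {
        return sql.replace("t_order", "t_order_1");
    }
}

class ReadwriteSplittingDecorator implements RuleDecorator {
    // Replace the logical data source with a concrete replica for reads.
    public String decorate(String sql) {
        return sql.replace("ms_ds_1", "ds_1_slave_0");
    }
}

class EncryptDecorator implements RuleDecorator {
    // Rewrite the logical column to its cipher column.
    public String decorate(String sql) {
        return sql.replace("content", "content_cipher");
    }
}

public class DecoratorDemo {
    public static void main(String[] args) {
        List<RuleDecorator> decorators = Arrays.asList(
            new ShardingDecorator(), new ReadwriteSplittingDecorator(), new EncryptDecorator());
        String sql = "ms_ds_1 ::: SELECT content FROM t_order";
        for (RuleDecorator each : decorators) {
            sql = each.decorate(sql);   // each rule adds its own rewriting
        }
        System.out.println(sql);
        // ds_1_slave_0 ::: SELECT content_cipher FROM t_order_1
    }
}
```

Because each decorator only depends on the output of the previous one, adding or removing a feature is just a matter of adding or removing an element of the list, which is the essence of the pluggable combination described above.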
To compare with 4.1.1 GA, we test 5.0.0 GA with the same table creation, insert, and query statements.
CREATE TABLE t_order(order_id INT(11) PRIMARY KEY, user_id INT(11), content VARCHAR(100));
# Logic SQL: CREATE TABLE t_order(order_id INT(11) PRIMARY KEY, user_id INT(11), content VARCHAR(100))
# Actual SQL: ds_1 ::: CREATE TABLE t_order_0(order_id INT(11) PRIMARY KEY, user_id INT(11), content_cipher VARCHAR(100), content_plain VARCHAR(100))
# Actual SQL: ds_1 ::: CREATE TABLE t_order_1(order_id INT(11) PRIMARY KEY, user_id INT(11), content_cipher VARCHAR(100), content_plain VARCHAR(100))
# Actual SQL: ds_0 ::: CREATE TABLE t_order_0(order_id INT(11) PRIMARY KEY, user_id INT(11), content_cipher VARCHAR(100), content_plain VARCHAR(100))
# Actual SQL: ds_0 ::: CREATE TABLE t_order_1(order_id INT(11) PRIMARY KEY, user_id INT(11), content_cipher VARCHAR(100), content_plain VARCHAR(100))
CREATE TABLE t_order_item(item_id INT(11) PRIMARY KEY, order_id INT(11), user_id INT(11), content VARCHAR(100));
# Logic SQL: CREATE TABLE t_order_item(item_id INT(11) PRIMARY KEY, order_id INT(11), user_id INT(11), content VARCHAR(100))
# Actual SQL: ds_1 ::: CREATE TABLE t_order_item_0(item_id INT(11) PRIMARY KEY, order_id INT(11), user_id INT(11), content VARCHAR(100))
# Actual SQL: ds_1 ::: CREATE TABLE t_order_item_1(item_id INT(11) PRIMARY KEY, order_id INT(11), user_id INT(11), content VARCHAR(100))
# Actual SQL: ds_0 ::: CREATE TABLE t_order_item_0(item_id INT(11) PRIMARY KEY, order_id INT(11), user_id INT(11), content VARCHAR(100))
# Actual SQL: ds_0 ::: CREATE TABLE t_order_item_1(item_id INT(11) PRIMARY KEY, order_id INT(11), user_id INT(11), content VARCHAR(100))
CREATE TABLE t_user(user_id INT(11) PRIMARY KEY, telephone VARCHAR(100));
# Logic SQL: CREATE TABLE t_user(user_id INT(11) PRIMARY KEY, telephone VARCHAR(100))
# Actual SQL: ds_1 ::: CREATE TABLE t_user(user_id INT(11) PRIMARY KEY, telephone_cipher VARCHAR(100), telephone_plain VARCHAR(100))
Version 5.0.0 GA adds support for rewriting DDL statements for encryption and decryption, so that the creation of t_order is routed and rewritten correctly whether data sharding, read/write separation, or encryption/decryption is involved. The log shows that the t_user table is routed to the ds_1 data source. In 5.0.0 GA, t_user is a single table, and users do not need to configure a data source for it; when the CREATE TABLE statement is executed, a data source is randomly selected for routing. For a single table, we only need to ensure that it is unique within the logical database to guarantee accurate routing results.
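The encrypt DDL rewrite visible in the log, where the logical column content is expanded into content_cipher and content_plain, can be sketched as below. This naive string-replacement rewriter is a simplified illustration under assumed column types, not ShardingSphere's actual SQL rewrite engine.

```java
import java.util.Map;

// Simplified illustration of encrypt DDL rewriting: a logical column in a
// CREATE TABLE statement is expanded into its cipher and plain columns.
public class EncryptDdlRewriter {
    private final Map<String, String[]> encryptColumns;

    public EncryptDdlRewriter(Map<String, String[]> encryptColumns) {
        this.encryptColumns = encryptColumns; // logical column -> {cipher, plain}
    }

    public String rewrite(String createTableSql) {
        String result = createTableSql;
        for (Map.Entry<String, String[]> entry : encryptColumns.entrySet()) {
            String logical = entry.getKey();
            String cipher = entry.getValue()[0];
            String plain = entry.getValue()[1];
            // e.g. "content VARCHAR(100)" -> "content_cipher VARCHAR(100), content_plain VARCHAR(100)"
            result = result.replace(
                logical + " VARCHAR(100)",
                cipher + " VARCHAR(100), " + plain + " VARCHAR(100)");
        }
        return result;
    }
}
```

Feeding the logical CREATE TABLE for t_order through this rewriter yields the rewritten DDL seen in the Actual SQL log lines above (before sharding substitutes the actual table names).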
INSERT INTO t_order(order_id, user_id, content) VALUES(1, 1, 'TEST11'), (2, 2, 'TEST22'), (3, 3, 'TEST33');
# Logic SQL: INSERT INTO t_order(order_id, user_id, content) VALUES(1, 1, 'TEST11'), (2, 2, 'TEST22'), (3, 3, 'TEST33')
# Actual SQL: ds_1 ::: INSERT INTO t_order_1(order_id, user_id, content_cipher, content_plain) VALUES(1, 1, '3qpLpG5z6AWjRX2sRKjW2g==', 'TEST11'), (3, 3, 'oVkQieUbS3l/85axrf5img==', 'TEST33')
# Actual SQL: ds_0 ::: INSERT INTO t_order_0(order_id, user_id, content_cipher, content_plain) VALUES(2, 2, 'mzIhTs2MD3dI4fqCc5nF/Q==', 'TEST22')
INSERT INTO t_order_item(item_id, order_id, user_id, content) VALUES(1, 1, 1, 'TEST11'), (2, 2, 2, 'TEST22'), (3, 3, 3, 'TEST33');
# Logic SQL: INSERT INTO t_order_item(item_id, order_id, user_id, content) VALUES(1, 1, 1, 'TEST11'), (2, 2, 2, 'TEST22'), (3, 3, 3, 'TEST33')
# Actual SQL: ds_1 ::: INSERT INTO t_order_item_1(item_id, order_id, user_id, content) VALUES(1, 1, 1, 'TEST11'), (3, 3, 3, 'TEST33')
# Actual SQL: ds_0 ::: INSERT INTO t_order_item_0(item_id, order_id, user_id, content) VALUES(2, 2, 2, 'TEST22')
INSERT INTO t_user(user_id, telephone) VALUES(1, '11111111111'), (2, '22222222222'), (3, '33333333333');
# Logic SQL: INSERT INTO t_user(user_id, telephone) VALUES(1, '11111111111'), (2, '22222222222'), (3, '33333333333')
# Actual SQL: ds_1 ::: INSERT INTO t_user(user_id, telephone_cipher, telephone_plain) VALUES(1, 'jFZBCI7G9ggRktThmMlClQ==', '11111111111'), (2, 'lWrg5gaes8eptaQkUM2wtA==', '22222222222'), (3, 'jeCwC7gXus4/1OflXeGW/w==', '33333333333')
When data is inserted into the t_user table, it is automatically routed based on the information stored in the metadata. Since t_user was routed to the ds_1 data source in the previous step, subsequent statements are routed according to the metadata entry t_user: ds_1.
SELECT * FROM t_order WHERE user_id = 1 AND order_id = 1;
# Logic SQL: SELECT * FROM t_order WHERE user_id = 1 AND order_id = 1
# Actual SQL: ds_1_slave_0 ::: SELECT `t_order_1`.`order_id`, `t_order_1`.`user_id`, `t_order_1`.`content_cipher` AS `content` FROM t_order_1 WHERE user_id = 1 AND order_id = 1
SELECT * FROM t_order_item WHERE user_id = 1 AND order_id = 1;
# Logic SQL: SELECT * FROM t_order_item WHERE user_id = 1 AND order_id = 1
# Actual SQL: ds_1_slave_1 ::: SELECT * FROM t_order_item_1 WHERE user_id = 1 AND order_id = 1
SELECT * FROM t_user WHERE user_id = 1;
# Logic SQL: SELECT * FROM t_user WHERE user_id = 1
# Actual SQL: ds_1_slave_0 ::: SELECT `t_user`.`user_id`, `t_user`.`telephone_cipher` AS `telephone` FROM t_user WHERE user_id = 1
When executing the query statements, we can see that the t_user table is routed to the ds_1_slave_0 data source, achieving read/write separation for a single table. In 5.0.0 GA, the Apache ShardingSphere kernel internally maintains the data distribution of single tables through metadata loading, and fully considers combinations with the other features, so that single tables are well supported.
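The single-table routing behavior described above can be sketched as follows. The SingleTableRouter class and its method names are hypothetical illustrations of the idea, not ShardingSphere's internal implementation: a CREATE TABLE picks a data source at random and records the placement in metadata, and later statements consult that metadata.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of single-table routing: the kernel records which data
// source each single table lives in, and later statements consult that metadata.
public class SingleTableRouter {
    private final Map<String, String> tableToDataSource = new ConcurrentHashMap<>();
    private final String[] dataSourceNames;

    public SingleTableRouter(String... dataSourceNames) {
        this.dataSourceNames = dataSourceNames;
    }

    // CREATE TABLE: pick a data source at random and remember the placement.
    public String routeCreate(String table) {
        String chosen = dataSourceNames[ThreadLocalRandom.current().nextInt(dataSourceNames.length)];
        tableToDataSource.put(table, chosen);
        return chosen;
    }

    // Subsequent DML/queries: route to wherever the table was created.
    public String route(String table) {
        String dataSource = tableToDataSource.get(table);
        if (dataSource == null) {
            throw new IllegalStateException("Unknown single table: " + table);
        }
        return dataSource;
    }
}
```

In the walkthrough above, routeCreate("t_user") happened to pick ds_1, so every later insert and query against t_user resolves to ds_1 (and, with read/write separation layered on top, reads go to one of its slaves).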
There are many new features in the 5.0.0 GA release, and the examples in this upgrade guide cover only features supported in both GA releases. Hopefully they help you understand the new capabilities and upgrade smoothly. If you are interested in the pluggable architecture, the Federation execution engine, or other new features, please check the official documentation.
Conclusion
After two years of polishing, Apache ShardingSphere presents itself with a whole new look. The pluggable architecture kernel offers infinite possibilities to all developers, and in the future we will continue to develop new features on top of it to enrich the Apache ShardingSphere ecosystem. The Federation execution engine opens the door to distributed queries, and we will focus on memory and performance optimization to provide more reliable and efficient distributed query capabilities. Finally, everyone is welcome to actively participate in and promote the development of Apache ShardingSphere.
Reference documentation
1. Apache ShardingSphere Release Note: github.com/apache/shar…
2. New Sharding configuration API of Release 5.x: github.com/apache/shar…
3. Automatic Sharding Strategies for Databases and Tables: github.com/apache/shar…
4. From middleware to a distributed database ecosystem: the innovations of ShardingSphere 5.x
5. ShardingSphere X openGauss: what kind of chemical reaction will they produce?
6. Contribution guidelines: shardingsphere.apache.org/community/c…
7. Chinese community: community.sphere-ex.com/
Welcome to add the community manager on WeChat (SS_assistant_1); reply "communication group" to join the group and discuss with many other ShardingSphere enthusiasts!