Distributed primary key issues

In traditional database development, automatic primary key generation is a basic requirement, and every database provides support for it, such as MySQL's auto-increment columns and Oracle's sequences. After data sharding, however, generating globally unique primary keys across different data nodes becomes a very tricky problem: the auto-increment keys of the different actual tables behind one logical table know nothing about each other, so duplicate primary keys appear. Collisions can be avoided by constraining the initial value and step of each auto-increment key (for example, with a step of 2, one node issues 1, 3, 5, … while the other issues 2, 4, 6, …), but this introduces extra operation and maintenance rules, so the solution lacks integrity and scalability.

There are a number of third-party solutions to this problem, such as UUIDs, which rely on an algorithm to generate non-repeating keys, or introducing a dedicated primary-key generation service. To be easy to use and to cover different scenarios, Apache ShardingSphere not only provides built-in distributed key generators such as UUID and SNOWFLAKE, but also exposes the key-generator interface so that users can implement their own custom key generators.
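To make SNOWFLAKE concrete, here is a minimal sketch of how a snowflake-style generator packs a timestamp, a worker ID and a per-millisecond sequence into one 64-bit long. This is a simplified illustration with assumed bit widths and epoch, not ShardingSphere's actual implementation:

// Illustrative snowflake-style ID generator: NOT ShardingSphere's implementation.
// Assumed layout for this sketch: 41 bits timestamp | 10 bits worker | 12 bits sequence.
public final class SimpleSnowflake {
    private static final long EPOCH = 1609459200000L; // 2021-01-01 UTC, arbitrary custom epoch
    private final long workerId;                      // must be unique per process, 0..1023
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SimpleSnowflake(long workerId) {
        if (workerId < 0 || workerId > 1023) {
            throw new IllegalArgumentException("workerId out of range");
        }
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long now = System.currentTimeMillis(); // NOTE: a real generator must also handle clock rollback
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF;        // 12-bit sequence wraps at 4096 IDs per ms
            if (sequence == 0) {                      // sequence exhausted: spin until the next ms
                while ((now = System.currentTimeMillis()) <= lastTimestamp) { }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << 22) | (workerId << 12) | sequence;
    }
}

Because the worker ID is baked into every ID, two processes configured with different worker IDs can never collide, which is why each process must be assigned a unique worker ID.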

See the previous article: should you use auto-increment IDs?

Shard key

The database fields used for sharding are the key fields for splitting databases and tables horizontally. For example, if the order table is sharded by taking the order primary key modulo some number, the order primary key is the sharding field. If a SQL statement contains no sharding field, full routing is performed, and performance is poor. Besides single sharding fields, Apache ShardingSphere also supports sharding by multiple fields.
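As a quick illustration of modulo sharding on a single sharding field, here is a hypothetical helper that maps a shard-key value to its actual table name (the table names follow the practice section below; the helper itself is made up for the example):

// Hypothetical illustration of modulo routing on the sharding key.
public class ShardRouter {

    // Maps a shard-key value to the actual table, mirroring t_order_sharding_${order_id % 2}.
    static String actualTable(long orderId, int tableCount) {
        return "t_order_sharding_" + (orderId % tableCount);
    }

    public static void main(String[] args) {
        System.out.println(actualTable(654340378203258881L, 2));  // t_order_sharding_1 (odd key)
        System.out.println(actualTable(1447457650144055296L, 2)); // t_order_sharding_0 (even key)
    }
}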

Routing

Broadcast routing

Broadcast routing is used for SQL that does not carry the sharding key.


SELECT * FROM t_order WHERE good_prority IN (1, 10);

Such a SQL statement is broadcast to every actual table for querying:

SELECT * FROM t_order_0 WHERE good_prority IN (1, 10);
SELECT * FROM t_order_1 WHERE good_prority IN (1, 10);
SELECT * FROM t_order_2 WHERE good_prority IN (1, 10);
SELECT * FROM t_order_3 WHERE good_prority IN (1, 10);

Unicast routing

Unicast routing is used for SQL that only needs metadata about real tables; it only needs to read from any one actual table in any database.

Practice

The development direction of our database is roughly: a single table first, then table sharding within a single database, then database-plus-table sharding. So we follow the trend and practice single-database table sharding first, then database-plus-table sharding.

A note on versions: the practice below uses ShardingSphere 5.0.0-beta.

Single-database table sharding – new business

Let's assume an ideal situation where the tables are split before the business even launches.

Since ShardingSphere does not automatically create tables according to the sharding rules, we first create the tables manually. Only the table names differ; the table structure is identical. The following table-creation statements are used for testing:

CREATE TABLE `t_order_sharding_0` (
  `order_id` bigint(20) NOT NULL,
  `user_id` bigint(20) DEFAULT NULL,
  `create_time` datetime DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
  `update_time` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
  PRIMARY KEY (`order_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

CREATE TABLE `t_order_sharding_1` (
  `order_id` bigint(20) NOT NULL,
  `user_id` bigint(20) DEFAULT NULL,
  `create_time` datetime DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
  `update_time` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
  PRIMARY KEY (`order_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

To keep the illustration simple, we create just two tables, t_order_sharding_0 and t_order_sharding_1.

Here is the table-sharding part of my YAML configuration file:

# Configure table sharding
sharding:
  sharding-algorithms:
    table_inline:
      props:
        algorithm-expression: t_order_sharding_${order_id % 2}
      type: INLINE
  tables:
    t_order_sharding:
      # Configure t_order table rules
      actual-data-nodes: write-ds.t_order_sharding_$->{0..1}
      table-strategy: # table splitting strategy
        standard:
          sharding-algorithm-name: table_inline
          sharding-column: order_id

Note that you can’t run this configuration for two reasons:

1. The algorithm name table_inline must be changed to table-inline; the underscore in the name does not bind correctly here.
2. The expression t_order_sharding_${order_id % 2} must be written as t_order_sharding_$->{order_id % 2}, because ${...} collides with Spring's own property placeholder syntax, so ShardingSphere's inline expressions use $->{...} instead.

The final working configuration is:

# Configure table sharding
sharding:
  sharding-algorithms:
    table-inline:
      props:
        algorithm-expression: t_order_sharding_$->{order_id % 2}
      type: INLINE
  tables:
    t_order_sharding:
      # Configure t_order table rules
      actual-data-nodes: write-ds.t_order_sharding_$->{0..1}
      table-strategy: # table splitting strategy
        standard:
          sharding-algorithm-name: table-inline
          sharding-column: order_id

As you can see from the configuration, my policy is simple: since there are only two tables, take the value of the order_id column modulo 2; the result can only be 0 or 1, which happen to be the suffixes of my two tables. For example, order_id 654340378203258881 is odd, so that row lands in t_order_sharding_1.

If you want more tables, or other custom table-sharding policies and algorithms, you can refer to the official documentation to set them up, along the lines of the sketch below.
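For reference, a custom algorithm is a class implementing ShardingSphere's StandardShardingAlgorithm SPI, registered via a META-INF/services file and then referenced by its type name under sharding-algorithms. This is only a rough sketch assuming the 5.0.0-beta interface shape; check the official SPI documentation for your exact version before relying on it:

import java.util.Collection;
import org.apache.shardingsphere.sharding.api.sharding.standard.PreciseShardingValue;
import org.apache.shardingsphere.sharding.api.sharding.standard.RangeShardingValue;
import org.apache.shardingsphere.sharding.api.sharding.standard.StandardShardingAlgorithm;

// Assumed SPI shape for 5.0.0-beta; depending on the exact version the SPI may
// require additional methods (e.g. init/props), so verify against the docs.
public final class OrderIdModuloAlgorithm implements StandardShardingAlgorithm<Long> {

    @Override
    public String doSharding(Collection<String> availableTargetNames, PreciseShardingValue<Long> shardingValue) {
        long suffix = shardingValue.getValue() % availableTargetNames.size();
        for (String target : availableTargetNames) {
            if (target.endsWith("_" + suffix)) {
                return target;
            }
        }
        throw new UnsupportedOperationException("no target for suffix " + suffix);
    }

    @Override
    public Collection<String> doSharding(Collection<String> availableTargetNames, RangeShardingValue<Long> shardingValue) {
        return availableTargetNames; // range queries fan out to all tables in this sketch
    }

    @Override
    public String getType() {
        return "ORDER_ID_MODULO"; // the name used in the YAML 'type' field
    }
}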

Next we write the application side, use an interface request to insert some data, and check whether the rows land in different tables according to our rule. At this point user_id and order_id are generated with a homegrown Snowflake utility class.
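On the application side nothing ShardingSphere-specific is needed: you write SQL against the logical table and ShardingSphere rewrites it. A minimal sketch, assuming a plain JDBC DataSource that Spring has wired through ShardingSphere (the DAO shape here is made up for the example):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

public class OrderDao {

    private final DataSource dataSource; // wired by Spring through ShardingSphere

    public OrderDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Insert against the LOGICAL table; ShardingSphere rewrites it to
    // t_order_sharding_0 or t_order_sharding_1 based on order_id % 2.
    public void insert(long orderId, long userId) throws SQLException {
        String sql = "INSERT INTO t_order_sharding (order_id, user_id) VALUES (?, ?)";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, orderId);
            ps.setLong(2, userId);
            ps.executeUpdate();
        }
    }
}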

Of course, ShardingSphere's configuration can also generate the keys for us automatically:

# Configure table sharding
sharding:
  key-generators:
    snowflake:
      type: SNOWFLAKE
      props:
        worker:
          id: 123
  sharding-algorithms:
    table-inline:
      props:
        algorithm-expression: t_order_sharding_$->{order_id % 2}
      type: INLINE
  tables:
    t_order_sharding:
      # Configure t_order table rules
      actual-data-nodes: write-ds.t_order_sharding_$->{0..1}
      table-strategy: # table splitting strategy
        standard:
          sharding-algorithm-name: table-inline
          sharding-column: order_id
      # Key generation strategy
      key-generate-strategy:
        key-generator-name: snowflake
        column: order_id

Keep an eye on the key-generators configuration.

We inserted data via the business interface and then queried four rows, finding that they had indeed been inserted into different tables. Importantly, the business program does not need any changes; the logic is the same as before with a single table, for example using SpringBoot + MybatisPlus for fast business implementation. Once configured, ShardingSphere automatically does the relevant work for you.

{
    "code": 100000,
    "msg": "",
    "data": [
        {
            "orderId": 654340378203258881,
            "userId": 12,
            "createTime": "2021-10-11 15:15:03",
            "updateTime": "2021-10-11 15:15:03"
        },
        {
            "orderId": 1447456383522967551,
            "userId": 12,
            "createTime": "2021-10-11 14:59:14",
            "updateTime": "2021-10-11 14:59:14"
        },
        {
            "orderId": 1447457650144055296,
            "userId": 12,
            "createTime": "2021-10-11 15:02:50",
            "updateTime": "2021-10-11 15:02:50"
        },
        {
            "orderId": 1447457651482038272,
            "userId": 12,
            "createTime": "2021-10-11 15:02:52",
            "updateTime": "2021-10-11 15:02:52"
        }
    ]
}

Once the tables are configured, sharding is done: there is no need to modify business code, and you write SQL the way you always did, which is close to perfect. But do pay attention to your SQL, because the supported syntax must be checked against the official documentation. Ordinary SQL is generally fine, while some special statements are not; often those special statements come from lazily solving a whole problem with one SQL statement instead of designing the program properly. For example, a complex SQL statement can usually be replaced by several simple SQL statements whose results are combined in memory by the program.
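As a sketch of that last point, one complex cross-shard JOIN can become two simple queries plus an in-memory merge; the DAO interfaces below are hypothetical stand-ins for MybatisPlus mappers:

import java.util.List;
import java.util.Map;

// Illustrative only: replacing one complex cross-shard JOIN with two simple
// queries and an in-memory merge.
public class OrderQueryService {

    interface UserDao {
        Map<Long, String> findNamesByIds(List<Long> userIds); // simple query on t_user
    }

    interface OrderView {
        long getUserId();
        void setUserName(String userName);
    }

    interface OrderDao {
        List<OrderView> findByUserIds(List<Long> userIds);    // simple query on the sharded table
    }

    private final UserDao userDao;
    private final OrderDao orderDao;

    public OrderQueryService(UserDao userDao, OrderDao orderDao) {
        this.userDao = userDao;
        this.orderDao = orderDao;
    }

    // Instead of: SELECT o.*, u.name FROM t_order_sharding o JOIN t_user u ON ...
    public List<OrderView> ordersWithUserNames(List<Long> userIds) {
        Map<Long, String> names = userDao.findNamesByIds(userIds); // simple query 1
        List<OrderView> orders = orderDao.findByUserIds(userIds);  // simple query 2
        for (OrderView order : orders) {
            order.setUserName(names.get(order.getUserId()));       // merge in memory
        }
        return orders;
    }
}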

Table sharding can also be combined with read/write splitting. Here is the complete configuration:

spring:
  profiles:
    include: common-local
  shardingsphere:
    props:
      sql:
        # Set whether SQL is displayed
        show: true
      check:
        table:
          metadata:
            enabled: false
    datasource:
      names: write-ds,read-ds-0
      write-ds:
        jdbcUrl: jdbc:mysql://mysql.local.test.myapp.com:23306/test?allowPublicKeyRetrieval=true&useSSL=false&allowMultiQueries=true&serverTimezone=Asia/Shanghai&useSSL=false&autoReconnect=true&useUnicode=true&characterEncoding=utf-8&zeroDateTimeBehavior=convertToNull
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        username: root
        password: nicai
        connectionTimeoutMilliseconds: 3000
        idleTimeoutMilliseconds: 60000
        maxLifetimeMilliseconds: 1800000
        maxPoolSize: 50
        minPoolSize: 1
        maintenanceIntervalMilliseconds: 30000
      read-ds-0:
        jdbcUrl: jdbc:mysql://mysql.local.test.read1.myapp.com:23306/test?allowPublicKeyRetrieval=true&useSSL=false&allowMultiQueries=true&serverTimezone=Asia/Shanghai&useSSL=false&autoReconnect=true&useUnicode=true&characterEncoding=utf-8&zeroDateTimeBehavior=convertToNull
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        username: root
        password: nicai
        connectionTimeoutMilliseconds: 3000
        idleTimeoutMilliseconds: 60000
        maxLifetimeMilliseconds: 1800000
        maxPoolSize: 50
        minPoolSize: 1
        maintenanceIntervalMilliseconds: 30000
    rules:
      readwrite-splitting:
        data-sources:
          glapp:
            write-data-source-name: write-ds
            read-data-source-names:
              - read-ds-0
            load-balancer-name: roundRobin # Load balancing algorithm name
        load-balancers:
          roundRobin:
            type: ROUND_ROBIN
      # Configure table sharding
      sharding:
        key-generators:
          snowflake:
            type: SNOWFLAKE
            props:
              worker:
                id: 123
        sharding-algorithms:
          table-inline:
            props:
              algorithm-expression: t_order_sharding_$->{order_id % 2}
            type: INLINE
        tables:
          t_order_sharding:
            # Configure t_order table rules
            actual-data-nodes: write-ds.t_order_sharding_$->{0..1}
            table-strategy: # table splitting strategy
              standard:
                sharding-algorithm-name: table-inline
                sharding-column: order_id
            key-generate-strategy:
              key-generator-name: snowflake
              column: order_id


Single-database table sharding – old business

To be continued in the next article.

Reference

  • shardingsphere.apache.org/document/5….