This article is from the official account: Java Universe (wechat id: Javagogo)

The original link: mp.weixin.qq.com/s/Fy45BKBIZ… Author: Gao Hongtao

Automatic sharding is a mainstream feature of distributed databases: all the major distributed databases, and even database middleware, are working toward it. To introduce sharding algorithms, I use Apache ShardingSphere as an example.

Shard key

ShardingSphere first provides distributed primary key generation, which is the basis for generating shard keys. Primary keys generated by a single database instance (for example, auto-increment columns) are not suitable for distributed scenarios, because a distributed database typically spans multiple database nodes and each node would generate its keys independently.

Two commonly used stateless generation algorithms are UUID and Snowflake.

UUID is the simplest method, but its generation efficiency is not high and its data dispersion is mediocre. Therefore, the latter algorithm, Snowflake, is the one currently used in production environments.

A Snowflake ID has three meaningful parts.

  • Timestamp: stored Unix-style, as the number of milliseconds between a fixed starting time (the epoch) and the current time. With the usual 41-bit timestamp field, the algorithm can be used for nearly 70 years.

  • Working node ID: ensures that independently working database nodes do not generate duplicate IDs.

  • Access sequence: ensures that IDs generated in the same process within the same millisecond are not repeated.
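To make the three parts concrete, here is a minimal sketch of a Snowflake-style generator. It is not ShardingSphere's implementation: the class name SnowflakeSketch, the epoch value, and the error handling are assumptions for illustration. The bit widths (41-bit timestamp, 10-bit worker ID, 12-bit sequence) follow the common Snowflake layout.

```java
import java.time.Instant;

/** Snowflake-style ID sketch: 41-bit timestamp | 10-bit worker ID | 12-bit sequence. */
final class SnowflakeSketch {
    // Assumed custom epoch; any fixed past instant works if every node uses the same one.
    static final long EPOCH_MS = Instant.parse("2020-01-01T00:00:00Z").toEpochMilli();
    static final int WORKER_BITS = 10;
    static final int SEQUENCE_BITS = 12;
    static final long MAX_WORKER_ID = (1L << WORKER_BITS) - 1;   // 1023
    static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1; // 4095

    private final long workerId;
    private long lastMs = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId must be in [0, " + MAX_WORKER_ID + "]");
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMs) {
            // Clock moved backwards; a production generator needs a policy here (wait or fail).
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastMs) {
            // Same millisecond: bump the sequence so IDs stay unique.
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: spin until the next one.
                while ((now = System.currentTimeMillis()) <= lastMs) { }
            }
        } else {
            sequence = 0;
        }
        lastMs = now;
        return ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    /** Extracts the worker ID part from a generated ID. */
    static long workerIdOf(long id) {
        return (id >> SEQUENCE_BITS) & MAX_WORKER_ID;
    }

    /** Extracts the timestamp part and converts it back to Unix milliseconds. */
    static long timestampOf(long id) {
        return (id >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS;
    }

    public static void main(String[] args) {
        SnowflakeSketch generator = new SnowflakeSketch(7);
        long id = generator.nextId();
        System.out.println(id + " worker=" + workerIdOf(id)
                + " time=" + Instant.ofEpochMilli(timestampOf(id)));
    }
}
```

Because the timestamp occupies the high bits, IDs from one node increase over time, and the worker ID keeps IDs from different nodes apart even when they are generated in the same millisecond.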

Shard tools

To keep sharding calculations flexible, ShardingSphere provides standard sharding algorithms and a set of tools that let users define database sharding strategies flexibly and uniformly.

  • PreciseShardingAlgorithm can be combined with a hash function to implement hash sharding.
  • RangeShardingAlgorithm implements range sharding.
  • ComplexShardingStrategy lets you implement a combined sharding algorithm that uses multiple sharding keys.
  • Sharding modes are sometimes not fully consistent across tables. For such special cases, HintShardingStrategy can specify routing rules at run time instead of relying on the unified sharding configuration.
  • To implement special sharding algorithms, such as one based on geographic location, users can write a custom sharding strategy in either of two ways:
    • Inline expressions: written directly in the configuration with no compilation required; suitable for simple custom sharding calculations.
    • Java code: supports more complex calculations, but must be compiled and packaged.
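The difference between hash sharding and range sharding can be sketched without the library. The example below does not implement ShardingSphere's interfaces; the class ShardingSketch, its method names, and the table names t_order_0 to t_order_2 are hypothetical. It only shows the routing logic that such algorithms typically encode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Library-free sketch of hash (precise) routing and range routing. */
final class ShardingSketch {

    /** Hash sharding: an exact key value maps to exactly one target by modulo. */
    static String preciseRoute(List<String> targets, long shardingKey) {
        // floorMod keeps the index non-negative even for negative keys.
        int index = (int) Math.floorMod(shardingKey, (long) targets.size());
        return targets.get(index);
    }

    /**
     * Range sharding: each map entry is the inclusive lower bound of the keys a target owns.
     * Returns every target whose key interval intersects [low, high].
     */
    static List<String> rangeRoute(TreeMap<Long, String> lowerBounds, long low, long high) {
        if (low > high) {
            throw new IllegalArgumentException("low must not exceed high");
        }
        Long first = lowerBounds.floorKey(low);
        if (first == null) {
            throw new IllegalArgumentException("no target covers key " + low);
        }
        return new ArrayList<>(lowerBounds.subMap(first, true, high, true).values());
    }

    public static void main(String[] args) {
        List<String> targets = List.of("t_order_0", "t_order_1");
        System.out.println(preciseRoute(targets, 5L));

        TreeMap<Long, String> bounds = new TreeMap<>();
        bounds.put(Long.MIN_VALUE, "t_order_0"); // keys below 1000
        bounds.put(1000L, "t_order_1");          // keys 1000 to 1999
        bounds.put(2000L, "t_order_2");          // keys 2000 and above
        System.out.println(rangeRoute(bounds, 500L, 1500L));
    }
}
```

Hash routing spreads adjacent keys across targets, so an exact-match lookup touches one target. Range routing keeps adjacent keys together, so a range query touches only the targets whose intervals overlap it.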

Automatic sharding

ShardingSphere provides Sharding-Scaling to support elastic scaling of database nodes; this is how it supports automatic sharding. The following figure shows automatic sharding in action: the original two databases are expanded to three after Sharding-Scaling is applied.

Automatic sharding consists of four steps, as shown below.

As the figure shows, ShardingSphere can support complex hash-based automatic sharding.
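To see why hash-based scaling needs tool support, consider what happens to a simple modulo hash when two databases grow to three. The sketch below is not how Sharding-Scaling works internally; ReshardSketch and its methods are hypothetical names. It only computes which keys would land on a different node after the expansion.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: which keys change shard when a modulo-hashed cluster grows. */
final class ReshardSketch {

    /** Modulo hash: the shard index of a key for a given shard count. */
    static int shardOf(long key, int shardCount) {
        return (int) Math.floorMod(key, (long) shardCount);
    }

    /** Keys whose shard differs between the old and the new shard count. */
    static List<Long> keysToMove(List<Long> keys, int oldCount, int newCount) {
        List<Long> moved = new ArrayList<>();
        for (long key : keys) {
            if (shardOf(key, oldCount) != shardOf(key, newCount)) {
                moved.add(key);
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        List<Long> keys = new ArrayList<>();
        for (long k = 0; k < 12; k++) {
            keys.add(k);
        }
        // Grow from 2 databases to 3 and list the keys that must be migrated.
        List<Long> moved = keysToMove(keys, 2, 3);
        System.out.println(moved.size() + " of " + keys.size() + " keys move: " + moved);
    }
}
```

With keys 0 through 11, eight of the twelve keys change shard, so most rows have to be copied while the system keeps serving traffic.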

Note also that automatic sharding is very difficult without professional, automated elastic scaling tools.

The above is a practical example of sharding algorithms, using the classic horizontal sharding mode.

