Core requirements for a distributed ID:

  • Globally unique
  • Trend-increasing and ordered
  • High performance

1 UUID

A globally unique ID can be generated with UUID. However, a UUID string used as, say, an order number carries no meaning at all and exposes no useful information about the order. For databases, it is also not recommended as a business primary key: it is too long, it is a string, and storing and querying it performs poorly.

advantages

  • Generation is simple and purely local, with no network overhead, and the result is unique

disadvantages

  • It is an unordered string and does not have the trend increment characteristic
  • No specific business meaning, no useful information related to the order
  • A UUID is 128 bits (16 bytes), or 36 characters as a string, which is too long; storing and querying it is costly for MySQL performance. MySQL officially recommends keeping primary keys as short as possible. Moreover, using an unordered UUID as a primary key causes data positions to change frequently, which seriously hurts performance

Applicable scenario

  • Generating tokens and similar values, where being unreadable, unordered, and sufficiently long is exactly what is wanted (a minimal generation sketch follows this list)
  • Scenarios with no requirement for purely numeric values, ordered increments, or readability
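
As an illustration, the sketch below generates a UUID locally with the JDK's standard java.util.UUID class; no network call is involved, which is where the "no network consumption" advantage comes from.

 import java.util.UUID;

 public class UuidExample {
     public static void main(String[] args) {
         // Version-4 (random) UUID, generated entirely locally
         String token = UUID.randomUUID().toString();
         System.out.println(token);  // 36-character string like xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
     }
 }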

2 Database auto-increment ID

A database-based auto_increment ID can act as a distributed ID: whenever an ID is needed, insert a record into a table and take the returned primary key as the ID. This approach has a fatal drawback, though: when traffic surges, MySQL itself becomes the bottleneck of the system, so using it to implement a distributed ID service is risky and not recommended. The related SQL is as follows:

 CREATE DATABASE `SEQ_ID`;
 CREATE TABLE SEQ_ID.SEQUENCE_ID (
     id bigint(20) unsigned NOT NULL auto_increment,
     value char(10) NOT NULL default '',
     PRIMARY KEY (id)
 ) ENGINE=MyISAM;

 insert into SEQ_ID.SEQUENCE_ID(value) VALUES ('values');
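A minimal sketch of how a service might obtain such an ID through JDBC, using the standard RETURN_GENERATED_KEYS mechanism; the connection and table are assumed to match the SQL above.

 import java.sql.*;

 public class DbAutoIncrementId {
     // Inserts one row and returns the auto_increment primary key assigned by MySQL.
     public static long nextId(Connection conn) throws SQLException {
         String sql = "insert into SEQ_ID.SEQUENCE_ID(value) VALUES ('values')";
         try (PreparedStatement ps = conn.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS)) {
             ps.executeUpdate();
             try (ResultSet rs = ps.getGeneratedKeys()) {
                 rs.next();
                 return rs.getLong(1);
             }
         }
     }
 }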

advantages

  • Simple to implement; IDs increase monotonically; the numeric type is fast to query

disadvantages

  • The single-point database (DB) may go down, and it cannot support high-concurrency scenarios

Applicable scenario

  • Small-scale services with low data access volume
  • Scenarios without high concurrency, where the rate of inserted records is controllable

3 Database multi-master mode

A single-point database is not acceptable, so we optimize the above approach for high availability by switching to a cluster. To avoid a single primary node going down and taking the service with it, the MySQL cluster runs in dual-primary mode, meaning both MySQL instances generate auto-increment IDs independently.

Question: if both MySQL instances start auto-incrementing from 1, won't they generate duplicate IDs?

Solution: give each instance a different starting offset and the same increment step size.

MySQL_1 configuration:

 set @@auto_increment_offset = 1;     -- starting value
 set @@auto_increment_increment = 2;  -- step size
 -- The auto-increment IDs are 1, 3, 5, 7, 9, ...

MySQL_2 configuration:

 set @@auto_increment_offset = 2;     -- starting value
 set @@auto_increment_increment = 2;  -- step size
 -- The auto-increment IDs are 2, 4, 6, 8, 10, ...

What if the cluster's performance still cannot support high concurrency? Add nodes to scale MySQL out:

Horizontally scaling the database cluster helps relieve the single-point pressure on the database, and for ID generation the auto-increment step size is set according to the number of machines. To add a third MySQL instance, you have to manually change the starting values and step sizes of the first two instances, and set the third instance's starting ID far beyond the current maximum ID, while making sure the IDs on the first two instances have not yet grown to the third instance's starting value; otherwise auto-increment IDs will collide. If necessary, the instances may even have to be stopped while the change is made.

advantages

  • Solve the DB single point problem

disadvantages

  • It is not conducive to later scaling; in practice the pressure on each database is still heavy, so it still cannot meet high-concurrency requirements

Applicable scenario

  • Scenarios where the data volume is small and the database does not need to be expanded

Aside from being hard to adapt to large-scale distributed, high-concurrency scenarios, this solution fits ordinary business scale well, so it is worth keeping in mind.

4 Database number segment mode

The number segment pattern is one of the mainstream ways to implement a distributed ID generator today. It can be understood as fetching auto-increment IDs from the database in batches: each time, a range of numbers (a segment) is taken from the database. For example, (1, 1000] represents 1000 IDs. The table structure is as follows:

 CREATE TABLE id_generator (
   id int(10) NOT NULL,
   max_id bigint(20) NOT NULL COMMENT 'Current maximum ID',
   step int(20) NOT NULL COMMENT 'Step size of number segment',
   biz_type  int(20) NOT NULL COMMENT 'Type of business',
   version int(20) NOT NULL COMMENT 'Version number',
   PRIMARY KEY (`id`)
 ) 

Biz_type: indicates different service types

Max_id: indicates the maximum available ID

Step: indicates the length of the number segment

Version: An optimistic lock. The version is updated every time to ensure data correctness during concurrent operations

 id    biz_type    max_id    step    version
 1     101         1000      2000    0

When this batch of segment IDs is used up, apply to the database for a new segment by updating max_id = max_id + step. If the update succeeds, the new segment has been obtained, and its range is (max_id, max_id + step].

update id_generator set max_id=max_id+${step}, version = version+1 where version=${version} and biz_type=${XXX}

Because multiple service instances may operate on the table at the same time, the update uses optimistic locking via the version number. This distributed ID generation scheme does not depend strongly on the database and does not access it frequently, which greatly reduces the pressure on the database.
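
To make the flow concrete, here is a minimal single-table sketch of the allocation loop using plain JDBC and the optimistic-lock update above. The class name and retry policy are illustrative; the column names follow the id_generator table defined earlier.

 import java.sql.*;

 // Minimal sketch: fetch a number segment with an optimistic-lock update, then hand out IDs in memory.
 public class SegmentIdAllocator {
     private long current;  // last ID handed out
     private long max;      // highest ID of the current segment (inclusive)

     public synchronized long nextId(Connection conn, int bizType) throws SQLException {
         if (current >= max) {
             fetchNextSegment(conn, bizType);  // segment exhausted, fetch a new one
         }
         return ++current;
     }

     private void fetchNextSegment(Connection conn, int bizType) throws SQLException {
         while (true) {
             long maxId = 0, step = 0, version = 0;
             try (PreparedStatement ps = conn.prepareStatement(
                     "select max_id, step, version from id_generator where biz_type = ?")) {
                 ps.setInt(1, bizType);
                 try (ResultSet rs = ps.executeQuery()) {
                     rs.next();
                     maxId = rs.getLong("max_id");
                     step = rs.getLong("step");
                     version = rs.getLong("version");
                 }
             }
             try (PreparedStatement ps = conn.prepareStatement(
                     "update id_generator set max_id = max_id + step, version = version + 1 "
                     + "where biz_type = ? and version = ?")) {
                 ps.setInt(1, bizType);
                 ps.setLong(2, version);
                 if (ps.executeUpdate() == 1) {   // optimistic lock succeeded
                     this.current = maxId;        // new segment is (maxId, maxId + step]
                     this.max = maxId + step;
                     return;
                 }
                 // Another instance won the race; re-read and retry.
             }
         }
     }
 }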

5 Redis mode

Redis can also be used to implement this; the principle is to use the Redis incr command to obtain an atomically auto-incremented ID.

 # Initialize the auto-increment ID to 1
 127.0.0.1:6379> set seq_id 1
 OK

 # Increment by 1 and return the new value
 127.0.0.1:6379> incr seq_id
 (integer) 2
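
From application code this is a single atomic call. A minimal sketch with the Jedis client, assuming Jedis is on the classpath and Redis is running locally on the default port:

 import redis.clients.jedis.Jedis;

 public class RedisIdExample {
     public static void main(String[] args) {
         try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
             // INCR is atomic on the server side, so concurrent callers never receive the same value
             long id = jedis.incr("seq_id");
             System.out.println("generated id = " + id);
         }
     }
 }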

One caveat of the Redis implementation is persistence. Redis has two persistence mechanisms, RDB and AOF:

  • RDB: takes periodic snapshots for persistence. If Redis has not yet persisted the latest increments and then crashes, IDs generated after the restart will repeat
  • AOF: persists every write command, so IDs will not repeat even if Redis crashes; however, because incr commands are so numerous, replaying them makes data recovery after a Redis restart take a long time

advantages

  • Sequential increment, readable
  • Performance is acceptable for a certain scale of traffic

disadvantages

  • Strongly dependent on Redis, there may be a single point of problem
  • It consumes network bandwidth, and network latency and similar issues must be considered for their impact on performance

Applicable scenario

  • Scenarios with modest performance requirements and small business scale. The health of the Redis deployment also matters: watch out for network problems and single-point pressure, and a distributed Redis setup brings even more concerns. For these reasons this approach is used relatively rarely in practice

In fact, the reliability of the Redis solution deserves scrutiny. Since it depends on the network, latency, failures, or downtime may make the service unavailable, and that risk has to be accounted for in the system design.

6 Snowflake algorithm

The Snowflake algorithm is an ID generation algorithm used in Twitter's internal distributed projects. After it was open-sourced, it was widely adopted and praised by major domestic companies, and under its influence many companies have developed distributed ID generators with their own characteristics.

Snowflake generates an ID of type Long. A Long is 8 bytes and each byte is 8 bits, so a Long is 64 bits. A Snowflake ID is a 64-bit Long composed of: sign bit (1 bit) + timestamp (41 bits) + machine ID (5 bits) + data center ID (5 bits) + sequence (12 bits).

  • Sign bit (1 bit): in Java the highest bit of a long is the sign bit, 0 for positive and 1 for negative. Generated IDs are generally positive, so this bit defaults to 0
  • Timestamp (41 bits): millisecond precision. Rather than storing the current timestamp, store the difference (current timestamp minus a fixed start timestamp) so that generated IDs start from a smaller value. A 41-bit timestamp lasts 69 years: (1L << 41) / (1000L * 60 * 60 * 24 * 365) = 69
  • Machine ID (10 bits): also known as the workerId; this can be configured flexibly, for example as a combination of data center and machine number
  • Sequence (12 bits): an auto-increment that allows the same node to generate 4096 IDs within the same millisecond (a sketch of the whole layout follows this list)
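
A minimal sketch of the bit layout described above. The class name, the fixed start epoch, the ordering of the two 5-bit machine/data-center fields, and the clock-rollback handling are illustrative choices, not Twitter's original implementation.

 public class SnowflakeIdGenerator {
     // Bit allocation: 1 sign + 41 timestamp + 5 machine + 5 data center + 12 sequence
     private static final long EPOCH = 1609459200000L;  // fixed start timestamp (here: 2021-01-01 00:00:00 UTC)
     private static final long WORKER_ID_BITS = 5L;
     private static final long DATACENTER_ID_BITS = 5L;
     private static final long SEQUENCE_BITS = 12L;

     private static final long MAX_WORKER_ID = ~(-1L << WORKER_ID_BITS);          // 31
     private static final long MAX_DATACENTER_ID = ~(-1L << DATACENTER_ID_BITS);  // 31
     private static final long SEQUENCE_MASK = ~(-1L << SEQUENCE_BITS);           // 4095

     private static final long DATACENTER_ID_SHIFT = SEQUENCE_BITS;
     private static final long WORKER_ID_SHIFT = SEQUENCE_BITS + DATACENTER_ID_BITS;
     private static final long TIMESTAMP_SHIFT = SEQUENCE_BITS + DATACENTER_ID_BITS + WORKER_ID_BITS;

     private final long workerId;
     private final long datacenterId;
     private long sequence = 0L;
     private long lastTimestamp = -1L;

     public SnowflakeIdGenerator(long workerId, long datacenterId) {
         if (workerId < 0 || workerId > MAX_WORKER_ID || datacenterId < 0 || datacenterId > MAX_DATACENTER_ID) {
             throw new IllegalArgumentException("workerId/datacenterId out of range");
         }
         this.workerId = workerId;
         this.datacenterId = datacenterId;
     }

     public synchronized long nextId() {
         long timestamp = System.currentTimeMillis();
         if (timestamp < lastTimestamp) {
             // Clock moved backwards: refuse to generate IDs to avoid duplicates
             throw new IllegalStateException("Clock moved backwards by " + (lastTimestamp - timestamp) + " ms");
         }
         if (timestamp == lastTimestamp) {
             sequence = (sequence + 1) & SEQUENCE_MASK;
             if (sequence == 0) {
                 // 4096 IDs already issued in this millisecond: spin until the next one
                 while ((timestamp = System.currentTimeMillis()) <= lastTimestamp) { }
             }
         } else {
             sequence = 0L;
         }
         lastTimestamp = timestamp;
         return ((timestamp - EPOCH) << TIMESTAMP_SHIFT)
                 | (workerId << WORKER_ID_SHIFT)
                 | (datacenterId << DATACENTER_ID_SHIFT)
                 | sequence;
     }
 }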

advantages

  • It can generate millions of distinct IDs per second, so performance is good
  • The timestamp occupies the high bits, the machine ID the middle bits, and the increasing sequence the low bits, so the whole ID trends upward
  • The bit allocation can be adjusted flexibly according to the business scenario, which makes it highly flexible

disadvantages

  • It depends strongly on the machine clock. If the clock is rolled back, duplicate IDs will be generated, so implementations based on this algorithm usually throw an exception to stop ID generation when a clock rollback is detected, which may make the service unavailable

Applicable scenario

  • The obvious weakness of the Snowflake algorithm is its dependence on the clock. If the machines do not experience clock rollback, this approach is a viable way to generate distributed IDs; it can of course also be used in smaller-scale systems

7 Baidu (uid-generator)

uid-generator is based on the Snowflake algorithm. Unlike the original Snowflake algorithm, uid-generator supports custom bit widths for the timestamp, machine ID (workerId), and sequence parts, and it adopts a user-defined workerId generation strategy.

uid-generator needs to work with a database: a WORKER_NODE table is added, and when the application starts it inserts a row into this table. The auto-increment ID returned by the successful insert is that machine's workerId; the row data consists of host and port.

The ID composition of uid-generator:

The workerId takes 22 bits, the timestamp 28 bits, and the sequence 13 bits. Unlike the original Snowflake, the time unit is seconds rather than milliseconds. The workerId is also handled differently: the same application consumes a new workerId every time it restarts.

8 Meituan (Leaf)

Leaf supports both segment mode and Snowflake algorithm mode, which can be switched between.

8.1 Leaf-segment database scheme

Create table leaf_alloc:

 DROP TABLE IF EXISTS `leaf_alloc`;
 CREATE TABLE `leaf_alloc` (
   `biz_tag` varchar(128)  NOT NULL DEFAULT ' ' COMMENT 'the key of business',
   `max_id` bigint(20) NOT NULL DEFAULT '1' COMMENT 'Maximum id currently allocated',
   `step` int(11) NOT NULL COMMENT 'Initial step size, also dynamically adjusted minimum step size',
   `description` varchar(256)  DEFAULT NULL COMMENT 'Description of business Key',
   `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Database maintenance update time',
   PRIMARY KEY (`biz_tag`)
 ) ENGINE=InnoDB;

When the first Leaf machine runs out of its segment, it loads another segment of length step=1000. Assuming the other two machines have not updated their segments yet, the segment newly loaded by the first machine should be 3001-4000; at the same time, the max_id of the corresponding biz_tag row in the database is updated to 4000. The SQL that updates the segment is as follows:

 Begin
 UPDATE table SET max_id=max_id+step WHERE biz_tag=xxx;
 SELECT tag, max_id, step FROM table WHERE biz_tag=xxx;
 Commit

advantages

  • Leaf services can easily scale linearly, and the performance is sufficient to support most business scenarios
  • The ID is a 64-bit (8-byte) number that increases in trend, which satisfies the primary-key storage requirements of the databases discussed above
  • High disaster recovery: Leaf service has internal number segment cache. Even if DB breaks down, Leaf can still provide external services normally in a short time
  • You can customize the size of max_ID to facilitate service migration from the original ID mode

disadvantages

  • The ID numbers are not random enough; they can leak how many IDs have been issued, which is not very secure
  • TP999 latency fluctuates a lot: when a segment is used up, requests still block on the database I/O that updates the segment, so TP999 occasionally spikes
  • DB outage may cause the entire system to be unavailable

Double buffer optimization

The second disadvantage only appears when a segment is used up, so if the next segment is fetched before the current one runs out, the problem is solved:

A double-buffer approach is used: the Leaf service keeps two segment buffers. When 10% of the current segment has been consumed and the next segment has not been fetched yet, a separate update thread is started to fetch the next segment. Once the current segment is fully consumed and the next segment is ready, the service switches to the next segment, and the cycle repeats (a sketch of this mechanism follows the list below):

  • Each biz-tag's consumption rate is monitored. It is usually recommended to set the segment length to about 600 times the peak QPS of number issuing (i.e. 10 minutes' worth of IDs), so that even if the DB goes down, Leaf can keep issuing numbers for 10 to 20 minutes without impact
  • Every incoming request checks the state of the next segment and updates it when needed, so occasional network jitter does not affect the update of the next segment
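
A minimal sketch of the double-buffer idea under the assumptions above: two in-memory segments, with an asynchronous fetch triggered once 10% of the current segment is consumed. Segment loading is hidden behind a hypothetical loadNextSegment supplier (for example, the optimistic-lock DB update shown earlier); this illustrates the mechanism and is not Leaf's actual code.

 import java.util.concurrent.*;
 import java.util.function.Supplier;

 // One in-memory number segment that hands out IDs from (start, start + step].
 class Segment {
     final long start, step;
     long used;                      // how many IDs have been handed out
     Segment(long start, long step) { this.start = start; this.step = step; }
     boolean exhausted()      { return used >= step; }
     boolean tenPercentUsed() { return used * 10 >= step; }
     long next()              { return start + (++used); }
 }

 public class DoubleBufferAllocator {
     private final Supplier<Segment> loadNextSegment;   // hypothetical: e.g. the DB update from the segment scheme
     private final ExecutorService loader = Executors.newSingleThreadExecutor();
     private Segment current;
     private Future<Segment> next;                      // the segment being prepared in the background

     public DoubleBufferAllocator(Supplier<Segment> loadNextSegment) {
         this.loadNextSegment = loadNextSegment;
         this.current = loadNextSegment.get();
     }

     public synchronized long nextId() throws Exception {
         // Trigger the background fetch once 10% of the current segment is consumed.
         if (next == null && current.tenPercentUsed()) {
             Callable<Segment> task = loadNextSegment::get;
             next = loader.submit(task);
         }
         if (current.exhausted()) {
             // Switch to the prepared segment (blocks only if the background fetch is still running).
             current = next.get();
             next = null;
         }
         return current.next();
     }
 }

The single background loader thread mirrors the "separate update thread" described above; a production version would also need a timeout and fallback around the blocking get.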

Leaf High availability disaster recovery

For the third problem, DB availability, Leaf uses one master and two slaves deployed across machine rooms, with semi-synchronous replication between master and slaves.

8.2 Leaf-snowflake scheme

The Leaf-segment scheme generates IDs that increase in trend, and the ID values are predictable, which makes it unsuitable for scenarios such as order IDs: a competitor could place an order at 12 o'clock on two consecutive days and roughly work out a day's order volume by subtracting the two order IDs, which is unacceptable. To address this, Leaf provides the Leaf-snowflake scheme. Leaf-snowflake keeps the bit design of the Snowflake scheme, i.e. "1 + 41 + 10 + 12". When the service cluster is small, the workerId can be configured manually; when the Leaf service is deployed at scale, manual configuration becomes too costly, so the workerId is assigned to each Snowflake node automatically using Zookeeper's persistent sequential nodes. Leaf-snowflake starts up as follows (a registration sketch follows these steps):

  • Start the Leaf-snowflake service, connect to Zookeeper, and check whether this node has already registered under the leaf_forever parent node
  • If it has registered, read back its workerId (the integer ID generated by the ZK sequential node) and start the service
  • If it has not, create a persistent sequential node under the parent node, take the returned sequence number as its workerId, and start the service
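
A minimal sketch of this registration flow using Apache Curator. The parent path and the node naming convention (ip:port plus the sequence number Zookeeper appends) are illustrative; Leaf's real implementation differs in detail.

 import org.apache.curator.framework.CuratorFramework;
 import org.apache.curator.framework.CuratorFrameworkFactory;
 import org.apache.curator.retry.ExponentialBackoffRetry;
 import org.apache.zookeeper.CreateMode;
 import java.util.List;

 public class ZkWorkerIdAssigner {
     private static final String PARENT = "/leaf_forever";

     // Returns the workerId for this node, registering it under /leaf_forever if needed.
     public static int assignWorkerId(String zkAddress, String listenAddress) throws Exception {
         CuratorFramework client = CuratorFrameworkFactory.newClient(
                 zkAddress, new ExponentialBackoffRetry(1000, 3));
         client.start();

         // Look for an existing registration of this ip:port under the parent node.
         if (client.checkExists().forPath(PARENT) != null) {
             List<String> children = client.getChildren().forPath(PARENT);
             for (String child : children) {
                 // Children look like "ip:port-0000000001"; the numeric suffix is the sequence number.
                 if (child.startsWith(listenAddress + "-")) {
                     return Integer.parseInt(child.substring(child.lastIndexOf('-') + 1));
                 }
             }
         }

         // Not registered yet: create a persistent sequential node and use its sequence as the workerId.
         String created = client.create()
                 .creatingParentsIfNeeded()
                 .withMode(CreateMode.PERSISTENT_SEQUENTIAL)
                 .forPath(PARENT + "/" + listenAddress + "-");
         return Integer.parseInt(created.substring(created.lastIndexOf('-') + 1));
     }
 }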

9 Didi (Tinyid)

Tinyid is implemented on the same number segment principle as Leaf: each service instance fetches a number segment such as (1000, 2000], (2000, 3000], or (3000, 4000].

Tinyid supports access over HTTP and via tinyid-client.

9.1 Http Access

Step 1: Get the Tinyid source code

 git clone https://github.com/didi/tinyid.git

Step 2: Create the table

 -- create tables
 CREATE TABLE `tiny_id_info` (
   `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
   `biz_type` varchar(63) NOT NULL DEFAULT '' COMMENT 'Business type, unique',
   `begin_id` bigint(20) NOT NULL DEFAULT '0' COMMENT 'Start ID, which only records the initial value and has no other meaning; on initialization begin_id and max_id should be the same',
   `max_id` bigint(20) NOT NULL DEFAULT '0' COMMENT 'Current maximum ID',
   `step` int(11) DEFAULT '0' COMMENT 'step',
   `delta` int(11) NOT NULL DEFAULT '1' COMMENT 'Increment of each ID',
   `remainder` int(11) NOT NULL DEFAULT '0' COMMENT 'remainder',
   `create_time` timestamp NOT NULL DEFAULT '2010-01-01 00:00:00' COMMENT 'Creation time',
   `update_time` timestamp NOT NULL DEFAULT '2010-01-01 00:00:00' COMMENT 'Update time',
   `version` bigint(20) NOT NULL DEFAULT '0' COMMENT 'Version number',
   PRIMARY KEY (`id`),
   UNIQUE KEY `uniq_biz_type` (`biz_type`)
 ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COMMENT 'ID information table';

 CREATE TABLE `tiny_id_token` (
   `id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment id',
   `token` varchar(255) NOT NULL DEFAULT '' COMMENT 'token',
   `biz_type` varchar(63) NOT NULL DEFAULT '' COMMENT 'The business type this token can access',
   `remark` varchar(255) NOT NULL DEFAULT '' COMMENT 'remark',
   `create_time` timestamp NOT NULL DEFAULT '2010-01-01 00:00:00' COMMENT 'Creation time',
   `update_time` timestamp NOT NULL DEFAULT '2010-01-01 00:00:00' COMMENT 'Update time',
   PRIMARY KEY (`id`)
 ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COMMENT 'Token information table';

 -- insert tiny_id_info data
 INSERT INTO `tiny_id_info` (`id`, `biz_type`, `begin_id`, `max_id`, `step`, `delta`, `remainder`, `create_time`, `update_time`, `version`) VALUES (1, 'test', 1, 1, 100000, 1, 0, '2018-07-21 23:52:58', '2018-07-22 23:19:27', 1);
 INSERT INTO `tiny_id_info` (`id`, `biz_type`, `begin_id`, `max_id`, `step`, `delta`, `remainder`, `create_time`, `update_time`, `version`) VALUES (2, 'test_odd', 1, 1, 100000, 2, 1, '2018-07-21 23:52:58', '2018-07-23 00:39:24', 3);

 -- insert tiny_id_token data
 INSERT INTO `tiny_id_token` (`id`, `token`, `biz_type`, `remark`, `create_time`, `update_time`) VALUES (1, '0f673adf80504e2eaa552f5d791b644c', 'test', '1', '2017-12-14 16:36:46', '2017-12-14 16:36:48');
 INSERT INTO `tiny_id_token` (`id`, `token`, `biz_type`, `remark`, `create_time`, `update_time`) VALUES (2, '0f673adf80504e2eaa552f5d791b644c', 'test_odd', '1', '2017-12-14 16:36:46', '2017-12-14 16:36:48');

Step 3: Configure the database

datasource.tinyid.names=primary
datasource.tinyid.primary.driver-class-name=com.mysql.jdbc.Driver
datasource.tinyid.primary.url=jdbc:mysql://ip:port/databaseName?autoReconnect=true&useUnicode=true&characterEncoding=UTF-8
datasource.tinyid.primary.username=root
datasource.tinyid.primary.password=123456

Step 4: Test after starting Tinyid-server

 # Get a distributed auto-increment ID
 http://localhost:9999/tinyid/id/nextIdSimple?bizType=test&token=0f673adf80504e2eaa552f5d791b644c
 Return result: 3

 # Get distributed auto-increment IDs in batch
 http://localhost:9999/tinyid/id/nextIdSimple?bizType=test&token=0f673adf80504e2eaa552f5d791b644c&batchSize=10
 Return result: 4,5,6,7,8,9,10,11,12,13
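
Since this is a plain GET interface, any HTTP client works. A minimal sketch with the JDK 11 HttpClient against the endpoint shown above, assuming tinyid-server is running locally:

 import java.net.URI;
 import java.net.http.HttpClient;
 import java.net.http.HttpRequest;
 import java.net.http.HttpResponse;

 public class TinyidHttpExample {
     public static void main(String[] args) throws Exception {
         // Endpoint and token taken from the example above
         String url = "http://localhost:9999/tinyid/id/nextIdSimple"
                 + "?bizType=test&token=0f673adf80504e2eaa552f5d791b644c";
         HttpClient client = HttpClient.newHttpClient();
         HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
         HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
         System.out.println("next id = " + response.body());
     }
 }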

9.2 Access using a Java Client

Step 1: Introduce dependencies

<dependency>
    <groupId>com.xiaoju.uemc.tinyid</groupId>
    <artifactId>tinyid-client</artifactId>
    <version>${tinyid.version}</version>
</dependency>

Step 2: Configure the file

tinyid.server=localhost:9999
tinyid.token=0f673adf80504e2eaa552f5d791b644c

Step 3: Test. The test and tinyid.token values must be inserted into the database tables in advance: test is the concrete business type, and tinyid.token controls which business types may be accessed

 // Get a single distributed auto-increment ID
 Long id =  TinyId.nextId("test");
 // Get a batch of distributed auto-increment IDs on demand
 List<Long> ids = TinyId.nextId("test", 10);