As a system grows more complex and the data volume increases, database and table sharding is introduced. At that point the database's auto-increment ID can no longer meet the requirements, so distributed IDs are needed. Below are several approaches to generating distributed IDs.
Database auto-increment ID
Approach
- Introduce a dedicated database (called the ID database) and create an ID table in it
- Before inserting a record, first insert a row into the ID table and use the returned auto-increment ID as the record's ID (a minimal sketch follows below)
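A minimal JDBC sketch of this idea, assuming an ID table named id_generator with an auto-increment primary key and a throwaway stub column (the table, column, and connection details are illustrative):

```java
import java.sql.*;

public class DbIdGenerator {
    // Assumed schema: CREATE TABLE id_generator (id BIGINT PRIMARY KEY AUTO_INCREMENT, stub CHAR(1));
    private final String url = "jdbc:mysql://localhost:3306/id_db";

    public long nextId() throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO id_generator (stub) VALUES ('a')",
                     Statement.RETURN_GENERATED_KEYS)) {
            ps.executeUpdate();
            try (ResultSet rs = ps.getGeneratedKeys()) {
                rs.next();
                // the auto-increment id returned by the ID database becomes the record's distributed ID
                return rs.getLong(1);
            }
        }
    }
}
```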
advantages
- Simple to implement
disadvantages
- Low availability: once the ID database fails, distributed IDs can no longer be generated
- High performance cost:
  - Every ID acquisition involves a network round trip
  - Database disk I/O cost
  - The database's maximum concurrency becomes the upper bound on the service's insert concurrency
Database multi-master mode
To address the availability problem, multiple master databases are used instead of a single one.
Approach
- Each database generates IDs through
new ID = old ID + step
with the step and starting values configured per database so that the ID sequences never overlap (a toy simulation follows below)
- A background service obtains IDs from the different databases in a load-balanced fashion
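A toy simulation of this scheme, not real database configuration (in MySQL the same effect is usually achieved with the auto_increment_increment and auto_increment_offset variables): two masters share step 2 but start from 1 and 2, so their IDs never collide.

```java
// Illustrative only: simulates two master databases with the same step (2)
// and different initial values (1 and 2) handing out non-overlapping IDs.
public class MultiMasterDemo {
    static class Master {
        private long current;
        private final long step;
        Master(long initial, long step) { this.current = initial - step; this.step = step; }
        synchronized long nextId() { current += step; return current; }  // new ID = old ID + step
    }

    public static void main(String[] args) {
        Master db1 = new Master(1, 2);   // produces 1, 3, 5, ...
        Master db2 = new Master(2, 2);   // produces 2, 4, 6, ...
        for (int i = 0; i < 3; i++) {
            System.out.println("db1 -> " + db1.nextId() + ", db2 -> " + db2.nextId());
        }
    }
}
```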
advantages
- High availability
- The concurrency pressure on each database is greatly reduced compared with the single-master setup
disadvantages
- Once the step size and the initial ID value are fixed, horizontal scaling of the database is difficult
Redis INCR command
Approach
- To obtain an ID, ask Redis to execute INCR and use the incremented value as the ID (a minimal sketch follows below)
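A minimal sketch using the Jedis client (the host, port, and key name are assumptions):

```java
import redis.clients.jedis.Jedis;

public class RedisIdGenerator {
    private final Jedis jedis = new Jedis("localhost", 6379);  // stand-alone Redis assumed
    private static final String KEY = "distributed:id";        // key name is an assumption

    public long nextId() {
        // INCR atomically increments the counter and returns the new value, used as the ID
        return jedis.incr(KEY);
    }
}
```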
advantages
- Simple to implement
- High concurrency performance
disadvantages
- Low availability: because Redis replication is asynchronous, only a stand-alone Redis instance can guarantee no duplicate IDs
- There is still a significant network performance overhead
UUID
advantages
- Simple to implement: every language's standard library provides a corresponding method (see the example after this list)
- Locally generated with no IO overhead
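For example, Java's standard library exposes this directly:

```java
import java.util.UUID;

public class UuidDemo {
    public static void main(String[] args) {
        String id = UUID.randomUUID().toString();
        System.out.println(id + " (length " + id.length() + ")");  // 36 characters including 4 hyphens
    }
}
```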
disadvantages
- Too long: MySQL recommends keeping primary keys as short as possible, while a UUID string is 36 characters
- Not sequential: used as a database primary key it easily causes page splits
- Clock rollback may produce duplicate IDs (for time-based UUID versions such as v1)
Snowflake algorithm
Twitter’s Snowflake algorithm for distributed ID generation.
Approach
A brief description of the 64-bit layout:
- The highest bit is the sign bit; it is always 0 and is not used.
- A 41-bit timestamp, accurate to the millisecond, which covers about 69 years. Another useful property of the timestamp bits is that IDs can be sorted by time. Note that the 41 bits do not store the current timestamp itself but the difference between the current timestamp and a start timestamp. The start timestamp is usually the time the ID generator was first put into service and is specified by the application (for example, the START_STMP property of the SnowFlake class). 41 bits of milliseconds last about 69 years: T = (1L << 41) / (1000L * 60 * 60 * 24 * 365) ≈ 69
- A 10-bit machine ID supports a maximum of 1024 nodes.
- A 12-bit sequence number: a counter that increments within a single millisecond, allowing each node to generate up to 4096 IDs per millisecond.
Altogether this is exactly 64 bits, which fits in a Long. The algorithm is compact yet remains a solid ID generation strategy. The 10-bit machine identifier is usually split into a 5-bit IDC (data center) ID plus a 5-bit machine number, which together uniquely identify a machine.
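A minimal Java sketch of this layout (the concrete START_STMP value and the clock-rollback handling are illustrative choices, not the canonical Twitter implementation):

```java
// Layout: 1 sign bit | 41-bit timestamp delta | 5-bit IDC id | 5-bit machine id | 12-bit sequence
public class SnowFlake {
    private static final long START_STMP = 1577836800000L;  // 2020-01-01, an assumed start timestamp

    private static final long SEQUENCE_BIT = 12;
    private static final long MACHINE_BIT = 5;
    private static final long DATACENTER_BIT = 5;

    private static final long MAX_SEQUENCE = ~(-1L << SEQUENCE_BIT);          // 4095
    private static final long MAX_MACHINE_NUM = ~(-1L << MACHINE_BIT);        // 31
    private static final long MAX_DATACENTER_NUM = ~(-1L << DATACENTER_BIT);  // 31

    private static final long MACHINE_LEFT = SEQUENCE_BIT;
    private static final long DATACENTER_LEFT = SEQUENCE_BIT + MACHINE_BIT;
    private static final long TIMESTMP_LEFT = DATACENTER_LEFT + DATACENTER_BIT;

    private final long datacenterId;
    private final long machineId;
    private long sequence = 0L;
    private long lastStmp = -1L;

    public SnowFlake(long datacenterId, long machineId) {
        if (datacenterId > MAX_DATACENTER_NUM || datacenterId < 0
                || machineId > MAX_MACHINE_NUM || machineId < 0) {
            throw new IllegalArgumentException("datacenterId/machineId out of range");
        }
        this.datacenterId = datacenterId;
        this.machineId = machineId;
    }

    public synchronized long nextId() {
        long currStmp = System.currentTimeMillis();
        if (currStmp < lastStmp) {
            // clock rollback: refuse to generate to avoid duplicate IDs
            throw new RuntimeException("Clock moved backwards, refusing to generate id");
        }
        if (currStmp == lastStmp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0L) {
                // sequence exhausted within this millisecond: busy-wait for the next one
                while ((currStmp = System.currentTimeMillis()) <= lastStmp) { }
            }
        } else {
            sequence = 0L;
        }
        lastStmp = currStmp;
        return (currStmp - START_STMP) << TIMESTMP_LEFT   // 41-bit timestamp delta
                | datacenterId << DATACENTER_LEFT          // 5-bit IDC id
                | machineId << MACHINE_LEFT                // 5-bit machine number
                | sequence;                                // 12-bit sequence
    }
}
```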
advantages
- IDs are monotonically increasing on a single machine; across a distributed system, as long as no clock rollback occurs, IDs trend upward over time.
disadvantages
- Clock rollback may produce duplicate IDs
Number segment mode
Each request obtains an entire number segment (a batch of IDs) rather than a single ID.
advantages
- IDs still trend upward
- Reduces the concurrency pressure on the distributed ID service
- Reduces the I/O overhead of the distributed ID service
Redis INCRBY command
INCRBY key increment
Approach
- To obtain a number segment, ask Redis to execute INCRBY and take the returned value as id (see the sketch after this list)
- The obtained segment is (id - increment, id]
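A minimal Jedis sketch of this segment variant (the key name and segment size are assumptions): each INCRBY round trip buys the whole range (id - increment, id], which is then consumed locally.

```java
import redis.clients.jedis.Jedis;

public class RedisSegmentGenerator {
    private final Jedis jedis = new Jedis("localhost", 6379);  // stand-alone Redis assumed
    private static final String KEY = "distributed:id";        // key name is an assumption
    private static final long INCREMENT = 1000L;               // segment size, an assumption

    private long current;   // last id handed out
    private long max;       // inclusive upper bound of the current segment

    // fetch the segment (id - increment, id] with a single INCRBY round trip
    private void fetchSegment() {
        long id = jedis.incrBy(KEY, INCREMENT);
        current = id - INCREMENT;
        max = id;
    }

    public synchronized long nextId() {
        if (current >= max) {
            fetchSegment();
        }
        return ++current;
    }
}
```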
advantages
- Simple to implement
- High concurrency performance
disadvantages
- Low availability: because Redis replication is asynchronous, only a stand-alone Redis instance can guarantee no duplicate IDs
Meituan Leaf
Leaf-segment
Approach
- Create the number segment table:

```sql
CREATE TABLE `tiny_id_info` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `biz_type` varchar(64) NOT NULL COMMENT 'business type',
  `max_id` int(11) NOT NULL COMMENT 'current max id',
  `step` int(11) NOT NULL COMMENT 'step size',
  `version` int(11) NOT NULL COMMENT 'version',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```
- A distributed ID service obtains number segments from this table, advancing max_id by step each time
- Double buffering: when a certain portion of the current segment has been consumed, a background thread asynchronously fetches and loads the next segment from the database (see the sketch after this list)
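A simplified sketch of the double-buffer idea; nextSegmentFromDb() stands in for the "UPDATE ... SET max_id = max_id + step" round trip against the table above. This is an illustration, not Meituan's actual Leaf code.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

public class SegmentIdAllocator {
    static class Segment {
        final AtomicLong value;   // next id to hand out
        final long max;           // exclusive upper bound
        Segment(long start, long max) { this.value = new AtomicLong(start); this.max = max; }
    }

    private static final long STEP = 1000L;                  // segment size, an assumption
    private final ExecutorService loader = Executors.newSingleThreadExecutor();
    private final AtomicLong dbMaxId = new AtomicLong(0);    // stands in for the DB row's max_id

    private volatile Segment current;
    private volatile Future<Segment> next;                   // the asynchronously loaded buffer

    public SegmentIdAllocator() {
        current = nextSegmentFromDb();
    }

    // Simulates the database round trip that advances max_id by STEP and returns the new segment.
    private Segment nextSegmentFromDb() {
        long newMax = dbMaxId.addAndGet(STEP);
        return new Segment(newMax - STEP + 1, newMax + 1);
    }

    public synchronized long nextId() {
        // kick off asynchronous loading of the next segment once 90% of the current one is used
        long used = current.value.get() - (current.max - STEP);
        if (next == null && used >= STEP * 0.9) {
            next = loader.submit(this::nextSegmentFromDb);
        }
        long id = current.value.getAndIncrement();
        if (id < current.max) {
            return id;
        }
        // current segment exhausted: switch to the preloaded one (or load synchronously as a fallback)
        try {
            current = (next != null) ? next.get() : nextSegmentFromDb();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
        next = null;
        return nextId();
    }
}
```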
advantages
- Easy to scale linearly
- Good disaster tolerance: the segments cached in the service keep it working for a short time even if the DB goes down
- Double buffering ensures requests do not block waiting for a new segment
disadvantages
- Availability is not high, since the service still depends on the database
- A single database has concurrency performance bottlenecks
Leaf-snowflake
Approach
Based on the Snowflake algorithm; each Snowflake node is automatically assigned a workerID by using ZooKeeper persistent sequential nodes, avoiding the cost of configuring workerIDs by hand when the service cluster is large.
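A sketch of workerID assignment with Apache Curator and a ZooKeeper persistent sequential node (the znode path and the modulo folding are assumptions, not Leaf's exact implementation):

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class ZkWorkerIdAssigner {
    public static long assignWorkerId(String zkConnect, String listenAddress) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                zkConnect, new ExponentialBackoffRetry(1000, 3));
        client.start();
        // Each service instance creates a persistent sequential node; ZooKeeper appends
        // a monotonically increasing 10-digit sequence number to the node name.
        String createdPath = client.create()
                .creatingParentsIfNeeded()
                .withMode(CreateMode.PERSISTENT_SEQUENTIAL)
                .forPath("/snowflake/forever/" + listenAddress + "-", new byte[0]);
        long sequence = Long.parseLong(createdPath.substring(createdPath.length() - 10));
        return sequence % 1024;   // fold into the 10-bit workerID range (an illustrative policy)
    }
}
```

A fuller implementation would typically first look up the node previously created for this instance's address and cache the workerID locally, so that restarts keep the same workerID.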
Didi TinyId
Based on Leaf-segment.
This section describes the TinyId mechanism.
features
- Supports multiple DBs, improving database availability
- Provides a client SDK: the server hands out number segments and the client generates IDs from them locally, reducing network I/O overhead
- The client also uses a double-segment cache, so there is no wait for a new segment when the current one is exhausted, and disaster tolerance is further improved