In distributed systems, there are some scenarios where globally unique ids are required. In this case, 36-bit UUID can be used to prevent ID collisions. However, UUID has some disadvantages
Sometimes we want to use a simpler ID, and we want the ids to be generated in chronological order
Public number: Longtai technical notes
GitHub:acmenlt
What is the Snowflake algorithm
Snowflake, which means Snowflake in Chinese and is often referred to as the Snowflake algorithm, is Twitter’s open-source distributed ID generation algorithm
The Twitter snowflake algorithm is generated as a 64-bit long value, with the time stamp introduced in the component, which basically keeps the increment
Advantages of the SnowFlake algorithm:
- High performance high availability: Independent of the database, it is completely generated in memory
- High throughput: Generate millions of autoincrement ids per second
- ID increment: Saves the ID to the database, which improves index efficiency
Disadvantages of the SnowFlake algorithm:
Depending on the consistency with the system time, if the system time is called back or changed, ID conflicts or duplicate may occur
Snowflake algorithm composition
The snowflake structure is shown below:
There are four components
Not used: 1bit, the highest bit is the sign bit, 0 indicates positive, 1 indicates negative, fixed 0
Timestamp: 41bit, millisecond timestamp (69 years of use for 41 bits)
Identifier bit: 5-bit DATA center ID and 5-bit working machine ID. Combined, a maximum of 1024 nodes can be deployed
Serial number: the 12-bit increasing serial number indicates that the node generates repeated ids within milliseconds. The serial number indicates that the node is unique. The 12-bit number can generate 4096 ids every millisecond
If 4096 non-repeating ids can be generated from serial numbers in one millisecond, 4096 x 1000 = 409W IDS can be generated in one second
The default snowflake algorithm is 64 bits, and the length can be customized. If you want to run longer, increase the number of timestamp bits; If more nodes need to be deployed, increase the identifier bit length. If the concurrency is high, increase the serial number digit
Summary: The snowflake algorithm is not invariable and can be customized according to the specific scene in the system
The Snowflake algorithm applies to scenarios
Because snowflake algorithm order increment, guarantee MySQL B+ Tree index structure insert high performance
Therefore, in daily business use, snowflake algorithm is mostly applied to the primary key ID of database and the primary key of business association
Snowflake algorithm generates ID duplication problem
Hypothesis: an order microservice generates ID through the Snowflake algorithm and deploys three nodes with the same identification bit
At this time, there are 200 concurrent, evenly distributed three nodes, three nodes in the same millisecond with the same serial number to generate ID, then will generate duplicate ID
Based on the above hypothetical scenarios, it can be known that there are certain preconditions for the generation of ID conflicts by snowflake algorithm
- The service is deployed in a cluster with the same identifier bits on some machines
- Services have a certain amount of concurrency, without which duplicate problems cannot be triggered
- When the ID is generated: The sequence numbers are the same for the same millisecond
How is an identity bit defined
If you can ensure that the identity bit is not duplicated, then the snowflake ID will not be duplicated
From the above example, you know the necessary conditions for ID duplication. If you want to avoid duplicate ids within the service, you need to move the article from the identity bit
Let’s take a look at how to define identifier bits in an open source framework using the Snowflake algorithm
Mybatis-Plus v3.4.2 Snowflake algorithm implementation class Sequence, provides two construction methods: no parameter construction, automatic generation of dataCenterId and workerId; Parameter constructs that explicitly specify the identifier bit when creating a Sequence
Hutool V5.7.9 provides a default implementation based on Mybatis-Plus dataCenterId and workerId generation schemes
How to generate dataCenterId and workerId
public static long getDataCenterId(long maxDatacenterId) { long id = 1L; final byte[] mac = NetUtil.getLocalHardwareAddress(); if (null ! = mac) { id = ((0x000000FF & (long) mac[mac.length - 2]) | (0x0000FF00 & (((long) mac[mac.length - 1]) << 8))) >> 6; id = id % (maxDatacenterId + 1); } return id; }Copy the code
The input parameter maxDatacenterId is a fixed value representing the maximum number of data center ids. The default value is 31
Why is the maximum 31? Because the maximum binary value of 5 bits is 11111, which corresponds to the decimal value 31
When obtaining the dataCenterId, the network interface is empty and the default value is 1L. The other is not empty and obtains the dataCenterId from the Mac address
The value of dataCenterId depends on the Mac address
Now look at the workerId
public static long getWorkerId(long datacenterId, long maxWorkerId) {
final StringBuilder mpid = new StringBuilder();
mpid.append(datacenterId);
try {
mpid.append(RuntimeUtil.getPid());
} catch (UtilException igonre) {
//ignore
}
return (mpid.toString().hashCode() & 0xffff) % (maxWorkerId + 1);
}
Copy the code
The input parameter maxWorkderId is also a fixed value representing the maximum number of working machine ids. The default value is 31. DatacenterId is taken from the getDatacenterId method above
The value of the name variable is PID@IP, so name needs to be divided according to @ and get the subscript 0 to get PID
The MAC + PID hashcode is used to obtain 16 low values, and then the workerId is finally obtained
Assign identifier bit
The acquisition of Mybatis-Plus identification bit depends on Mac address and process PID. Although it can be done as little as possible, there is still a small probability
How can identity bits be defined so that they are not duplicated? There are two schemes: pre-allocation and dynamic allocation
pre-allocated
Before the application goes online, count the number of nodes currently serving the application and manually apply for an IDENTIFIER
This scheme, without the amount of code development, can be used when service nodes are fixed or projects are few, but it cannot solve the problem of dynamic expansion of service nodes
Dynamic allocation
The identification bit is stored in middleware such as Redis, Zookeeper and MySQL to request the identification bit when the service is started. After the request, the identification bit is updated to the next available one
By storing the identifier bits, we extend the question of whether the ID of the Snowflake algorithm is unique within the service or globally
Take Redis as an example. If you want to be unique in the service, the Redis node storing the identifier bits can be used in its own project. If globally unique, all applications that use the snowflake algorithm use the same Redis node
The only difference is whether Redis is shared between different services. If there is no requirement for global uniqueness, it is best to make it unique within the ID service, because this avoids single point problems
If the number of service nodes exceeds 1024, you need to perform additional expansion. You can extend the 10-bit identifier or choose the open source distributed ID framework
Dynamic allocation implementation scheme
Redis stores a Hash Key that contains two key-value pairs: dataCenterId and workerId
When the application starts, the Lua script goes to Redis to get the identity bit. The dataCenterId and workerId are obtained and incremented in the Lua script, and the call returns the available flag bits
The specific Lua script logic is as follows:
- Redis may not have the “snowflake_work_id_key” Hash when obtaining the first service node. If the Hash does not exist, the dataCenterId and workerId will be initialized to 0
- If the Hash exists, check whether the dataCenterId and workerId are equal to the maximum value of 31. If the conditions are met, the system returns after the dataCenterId and workerId are initialized to 0
- The total number of permutations of dataCenterId and workerId is 1024. The workerId is allocated first
- Check whether workerId! If (workerId = 31); if (workerId = 31); If workerId = 31, increment the dataCenterId and set the workerId to 0
DataCenterId and workerId are all pushed down, forming a loop. Through the atomicity of Lua script, ensure that the generation of snowflake algorithm under 1024 nodes is not repeated. If the identity bit is equal to 1024, the loop continues from the beginning
Open source distributed ID framework
Both Leaf and Uid have implemented the snowflake algorithm. Leaf provides the number segment mode to generate ID in addition
Meituan Leaf:https://github.com/Meituan-Dianping/Leaf
Baidu Uid:https://github.com/baidu/uid-generator
The Snowflake algorithm can meet most scenarios. If it is not necessary, it is not recommended to introduce open source solutions to increase system complexity
review
This article helps readers sort out what is the snowflake algorithm and how to solve the problem of ID conflict generated by snowflake algorithm
As for the problem of ID conflict generated by snow ring algorithm, a scheme is proposed: assigning identifier bits; By assigning the component identifier bits of the snowflake algorithm, the default ID generation is unique under 1024 nodes
You can go to see Hutool or Mybatis-Plus snowflake algorithm concrete implementation, to help you better understand
The Snowflake algorithm is not a panacea and cannot be applied to all scenarios. If the ID needs to be globally unique and the number of service nodes exceeds 1024, you can modify the composition of the algorithm itself, that is, expand the identifier bit, or choose LEAF and UID
Creation is not easy, the article read help, point attention to support, wish good