How do YOU generate unique ids?
When working with databases, we often need to generate unique ids, because each row must be uniquely identifiable by its primary key, a basic principle of relational design. In the era of the RDBMS (relational database management system), the database itself provides sequence generators, such as Oracle sequences, MySQL AUTO_INCREMENT and so on. An RDBMS is a centralized (single-machine) environment, so global uniqueness only needs to be decided by that one machine. But in a distributed (decentralized) environment, where multiple hosts coexist, how can they automatically generate ids that never collide globally?
There are two main types of solutions:
Method one: still adopt the idea of centralization
A batch of sequence numbers is pre-generated in the RDBMS, and each node in the distributed environment fetches a number segment from the RDBMS when it starts, then hands out ids from that segment on its own. The Segment mode of Meituan Leaf falls into this category.
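To make the segment idea concrete, here is a minimal sketch in Python. It is not Leaf's actual implementation: `SegmentStore` is a hypothetical in-memory stand-in for the RDBMS, and the class names and the `step` parameter are assumptions made for illustration. It also refetches a segment whenever the current one runs out, which goes slightly beyond the "at startup" description above.

```python
import threading


class SegmentStore:
    """Hypothetical in-memory stand-in for the RDBMS row holding the highest allocated id."""

    def __init__(self):
        self._max_id = 0
        self._lock = threading.Lock()

    def allocate(self, step):
        # In a real deployment this would be one atomic database transaction.
        with self._lock:
            start = self._max_id + 1
            self._max_id += step
            return start, self._max_id


class SegmentIdGenerator:
    """Hands out ids from a locally cached segment, refetching when it is used up."""

    def __init__(self, store, step=1000):
        self._store = store
        self._step = step
        self._lock = threading.Lock()
        self._next = 1
        self._end = 0  # empty segment, so the first call fetches one

    def next_id(self):
        with self._lock:
            if self._next > self._end:
                self._next, self._end = self._store.allocate(self._step)
            value = self._next
            self._next += 1
            return value
```

Two generators sharing one store never overlap, because each segment is handed out exactly once; the trade-off is that ids are no longer strictly ordered across nodes.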
Method two: adopt the idea of decentralization
Following an agreed rule, each node in the distributed environment generates globally unique ids by itself. UUID, GUID, and the Snowflake algorithm all fall into this category.
❉❉❉❉❉❉ Snowflake algorithm ❉❉❉❉❉❉
Many good innovations are very simple, and the Snowflake algorithm is one of them. What we need to learn is the design philosophy behind it, which can be applied to id generation in any distributed environment.
The Snowflake algorithm was open-sourced by Twitter. Each id is 64 bits long and consists of a first bit, a timestamp, a machine ID and an increment sequence:
- First bit, 1 bit, fixed to 0; [Thinking: Why is the first bit 0?]
- Timestamp, 41 bits, the difference in milliseconds between the current time and a specified start date; [Thinking: Why store a time difference?]
- Cluster node ID, 10 bits, so at most 2^10 = 1024 machines;
- Increment sequence, 12 bits, so at most 2^12 = 4096 ids per millisecond per node.
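The layout above comes down to a few shifts and masks. The following Python sketch packs the three fields into one integer and splits them apart again; the function and constant names are illustrative assumptions, not taken from Twitter's code.

```python
TIMESTAMP_BITS = 41
NODE_BITS = 10
SEQUENCE_BITS = 12

NODE_SHIFT = SEQUENCE_BITS                   # node ID sits above the sequence
TIMESTAMP_SHIFT = SEQUENCE_BITS + NODE_BITS  # timestamp sits above both


def pack(ts_delta_ms, node_id, sequence):
    """Combine the three fields into one 64-bit id (the top bit stays 0)."""
    if not 0 <= ts_delta_ms < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp delta does not fit in 41 bits")
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node ID does not fit in 10 bits")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence does not fit in 12 bits")
    return (ts_delta_ms << TIMESTAMP_SHIFT) | (node_id << NODE_SHIFT) | sequence


def unpack(snowflake_id):
    """Split an id back into (ts_delta_ms, node_id, sequence)."""
    sequence = snowflake_id & ((1 << SEQUENCE_BITS) - 1)
    node_id = (snowflake_id >> NODE_SHIFT) & ((1 << NODE_BITS) - 1)
    ts_delta_ms = snowflake_id >> TIMESTAMP_SHIFT
    return ts_delta_ms, node_id, sequence
```

Because the timestamp occupies the highest bits, ids from the same node sort by creation time when compared as plain integers.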
No two snowflakes are alike
The ids generated by a single node are locally unique because they differ in timestamp or in increment sequence. Adding the cluster node ID makes them globally unique, so the snowflake algorithm achieves the goal of “no two snowflakes are the same in the world”.
At the same time, the increment sequence supports up to 4096 ids per millisecond, so each node can generate 4096000 ids per second, and the timestamp will only overflow its 41 bits after (2^41-1)/1000/86400/365 ≈ 69 years.
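These figures follow directly from the bit widths, which a few lines of Python can confirm. The variable names are illustrative, and the year is approximated as 365 days, as in the text.

```python
SEQUENCE_BITS = 12
TIMESTAMP_BITS = 41

# Ids one node can issue per millisecond and per second.
ids_per_ms = 2 ** SEQUENCE_BITS
ids_per_second = ids_per_ms * 1000

# How long a 41-bit millisecond counter lasts, in 365-day years.
lifetime_years = (2 ** TIMESTAMP_BITS - 1) / 1000 / 86400 / 365

print(ids_per_ms, ids_per_second, round(lifetime_years, 1))
```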
The core of the design
So the core of its design is:
1. An auto-incrementing sequence, reused in a cycle, which ensures local uniqueness within each time unit;
2. A millisecond-level timestamp, which lets each node generate a large number of ids per second to cope with high request volumes;
3. A cluster node ID, which must be globally unique.
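Putting the three points together, a generator might look like the Python sketch below. It is a simplified illustration, not Twitter's implementation: `EPOCH_MS` is an arbitrary start date chosen for this example, the `clock` parameter exists only to make the time source replaceable, and the handling of a clock that moves backwards (raising an error) is one possible choice among several.

```python
import threading
import time

EPOCH_MS = 1577836800000  # 2020-01-01T00:00:00Z, an arbitrary start date for this sketch
NODE_BITS = 10
SEQUENCE_BITS = 12
MAX_NODE_ID = (1 << NODE_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


def _now_ms():
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    def __init__(self, node_id, clock=_now_ms):
        if not 0 <= node_id <= MAX_NODE_ID:
            raise ValueError("node_id out of range")
        self._node_id = node_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: advance the sequence, wrapping at 4096.
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence used up: wait for the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                # New millisecond: restart the sequence.
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (NODE_BITS + SEQUENCE_BITS))
                | (self._node_id << SEQUENCE_BITS)
                | self._sequence
            )
```

A call such as `SnowflakeGenerator(node_id=7).next_id()` returns one id; the lock keeps the sequence consistent when several threads share a generator.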
Once the design idea is understood, it can be adapted as needed. For example, what should be done if Baidu's cluster has more than 1024 nodes?
Baidu adjusted the Snowflake layout: its UID is a 1-bit first bit + 28-bit timestamp + 22-bit machine ID + 13-bit sequence number. So the Baidu UID supports 2^22 = 4194304 nodes, and each node can generate 2^13 = 8192 ids per second. However, the timestamp is shorter and only has second-level precision, so it will overflow its 28 bits after (2^28-1)/86400/365 ≈ 8.5 years.
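The same arithmetic can be checked for this adjusted layout. The short Python sketch below uses the bit widths quoted above; the variable names are illustrative, and the year is again taken as 365 days.

```python
FIRST_BITS = 1
TIMESTAMP_BITS = 28  # counts seconds, not milliseconds
NODE_BITS = 22
SEQUENCE_BITS = 13

# The four fields still add up to one 64-bit integer.
total_bits = FIRST_BITS + TIMESTAMP_BITS + NODE_BITS + SEQUENCE_BITS

max_nodes = 2 ** NODE_BITS
ids_per_second = 2 ** SEQUENCE_BITS

# How long a 28-bit second counter lasts, in 365-day years.
lifetime_years = (2 ** TIMESTAMP_BITS - 1) / 86400 / 365

print(total_bits, max_nodes, ids_per_second, round(lifetime_years, 1))
```

The trade-off is visible in the numbers: many more nodes, but far fewer ids per node per second and a much shorter lifetime.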
So, folks at Baidu, what are you going to do after eight and a half years?
Extension: What are the problems with the Snowflake algorithm? What’s the solution? What other scenarios can it be used in?