Preface
Generating a system-wide unique ID is a problem we frequently run into when designing a system, and one we often struggle with.
This article lays out several approaches for generating distributed, globally unique IDs; I hope it helps.
Where it falls short, corrections are welcome!
The problem
Why do we need distributed, globally unique IDs, and what are the business requirements for them?
In a complex distributed system, it is often necessary to uniquely identify large amounts of data and messages. For example, in Meituan-Dianping's finance, payment, restaurant, hotel, and Cat's Eye (Maoyan) movie systems, the data keeps growing, and once the databases are split into multiple databases and tables, a unique ID is needed to identify each row of data or each message.
In particular, orders, riders, and coupons all need to be tagged with a unique ID.
At that point, a system capable of generating globally unique IDs becomes necessary.
Mandatory requirements for ID generation rules
- Globally unique
- Trend increasing: MySQL's InnoDB engine uses a clustered index, and since most RDBMSs store indexes in a B-tree structure, we should use ordered primary keys wherever possible to ensure write performance
- Monotonically increasing: guarantee that the next ID is larger than the previous one; this is needed for scenarios such as transaction version numbers, incremental IM messages, and sorting
- Information security: if IDs are consecutive, it is very easy for malicious users to crawl data by simply walking the URLs in order; if they are order numbers, it is even more dangerous, because competitors can directly read off our daily order volume. So in some application scenarios the IDs need to be irregular, so that competitors cannot guess them
- Contains a timestamp: so that during development you can quickly tell when a distributed ID was generated
Availability requirements for the ID generation system
- High availability: when a request for a distributed ID arrives, the server must be able to create a unique distributed ID 99.999% of the time
- Low latency: when a request for a distributed ID arrives, the server must respond fast, extremely fast
- High QPS: for example, if 100,000 requests to create distributed IDs arrive at once, the server must handle all of them successfully
Common general-purpose solutions
UUID
UUID.randomUUID(): the standard UUID contains 32 hexadecimal digits, split by hyphens into five segments of the form 8-4-4-4-12, for 36 characters in total. Performance is very high: it is generated locally, with no network overhead.
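A minimal sketch of generating one with the plain JDK (the class name is illustrative):

import java.util.UUID;

public class UuidDemo {
    public static void main(String[] args) {
        // Generated locally, no network round trip; 36 characters including the hyphens
        String id = UUID.randomUUID().toString();
        System.out.println(id);          // e.g. 3f2504e0-4f89-41d3-9a0c-0305e82c3301
        System.out.println(id.length()); // 36
    }
}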
Problems
Poor database insert performance, because UUIDs are unordered.
Unordered: you cannot predict the generation order, and it cannot produce numbers in increasing order.
First of all, distributed IDs are generally used as primary keys, but MySQL officially recommends that primary keys be as short as possible; every UUID is long, so it is not recommended.
When the ID is the primary key, there are problems in certain situations.
For example, as a DB primary key, UUID is a very poor fit for MySQL.
Index: B+ tree index splits.
Since the distributed ID is the primary key, and the primary key carries an index that MySQL implements as a B+ tree, every newly inserted UUID forces the underlying index B+ tree to be adjusted to keep queries efficient. Because UUID data is unordered, every UUID insert heavily modifies the primary-key B+ tree, which is bad: completely random inserts not only cause intermediate nodes to split, but also create many under-filled nodes, which greatly reduces database insert performance.
UUID only guarantees global uniqueness; it satisfies neither trend-increasing nor monotonically-increasing.
Database auto-increment primary key
Stand-alone version
In a distributed environment, the database auto-increment ID mechanism is mainly built on the database's auto-increment ID plus MySQL's REPLACE INTO. REPLACE INTO works like INSERT, with one difference: it first tries to insert into the table, and if a row already exists (based on the primary key or a unique index), it deletes that row first and then inserts; otherwise it inserts the new row directly.
In other words, REPLACE INTO inserts a record, and if the value of a unique index conflicts with an existing row, the old row is replaced.
REPLACE INTO t_test(stub) VALUES ('b');
SELECT LAST_INSERT_ID();
Every time we insert, the existing row is replaced and the returned ID keeps growing.
This satisfies:
- Increasing
- Monotonicity
- Uniqueness
In a distributed setting with little concurrency, this scheme can be used to obtain globally unique IDs.
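A minimal JDBC sketch of this scheme; the table schema and connection settings are assumptions for illustration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DbIdDemo {
    public static void main(String[] args) throws Exception {
        // Assumed schema:
        //   CREATE TABLE t_test (
        //     id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        //     stub CHAR(1) NOT NULL,
        //     UNIQUE KEY uk_stub (stub)
        //   ) ENGINE = InnoDB;
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "root", "root");
             Statement stmt = conn.createStatement()) {
            // REPLACE keeps the table at a single row while the auto-increment counter grows
            stmt.executeUpdate("REPLACE INTO t_test(stub) VALUES ('b')");
            try (ResultSet rs = stmt.executeQuery("SELECT LAST_INSERT_ID()")) {
                rs.next();
                System.out.println("next id = " + rs.getLong(1));
            }
        }
    }
}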
Distributed cluster
Is the database auto-increment ID mechanism a good fit for distributed IDs? Not really.
It is hard to scale the system horizontally. For example, once the step size and the number of machines are defined, what do we do when we need to add machines? Suppose there are machines generating 1, 2, 3, 4, 5 (step size 1) and one more machine needs to be added. Setting the initial values of the second batch of machines much higher than the first batch seems workable, but if there are already 100 machines online, expanding capacity at that point is a nightmare. So the horizontal-scaling story for this scheme is complicated and hard to implement.
Database pressure also remains high: every ID requires reading and writing the database, which performs poorly and does not meet the low-latency and high-QPS requirements for distributed IDs (under high concurrency, going to the database for every ID performs very badly).
Redis-based global ID generation
Stand-alone version
Because Redis is single-threaded, it is naturally atomic, so the atomic commands INCR and INCRBY can be used.
INCRBY: sets the growth step.
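A minimal sketch using the Jedis client; the key name and connection settings are assumptions for illustration:

import redis.clients.jedis.Jedis;

public class RedisIdDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // INCR: atomically add 1 to the key and return the new value
            long id1 = jedis.incr("order:id");
            // INCRBY: atomically add a custom step (here 5) and return the new value
            long id2 = jedis.incrBy("order:id", 5);
            System.out.println(id1 + ", " + id2);
        }
    }
}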
Cluster version
Note: with a Redis cluster, just as with MySQL, you need to configure a different starting value and the growth step for each node, and the keys must be given an expiry; a Redis cluster can then be used to achieve higher throughput.
Suppose there are five Redis nodes in the cluster. You can initialize them with the values 1, 2, 3, 4, 5 respectively and set the step size to 5.
The IDs generated by each Redis node are then:
A: 1 6 11 16 21
B: 2 7 12 17 22
C: 3 8 13 18 23
D: 4 9 14 19 24
E: 5 10 15 20 25
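A sketch simulating the five-node scheme above on a single local Redis, using five keys in place of five instances (purely for illustration); each "node" is seeded so that its first INCRBY returns its assigned initial value:

import redis.clients.jedis.Jedis;

public class RedisClusterIdDemo {
    private static final int STEP = 5;

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            for (int node = 1; node <= 5; node++) {
                String key = "id:node:" + node;
                // Seed with (initial value - STEP) so the first INCRBY yields the initial value
                jedis.set(key, String.valueOf(node - STEP));
                StringBuilder ids = new StringBuilder();
                for (int i = 0; i < 5; i++) {
                    ids.append(jedis.incrBy(key, STEP)).append(' ');
                }
                System.out.println((char) ('A' + node - 1) + ": " + ids); // A: 1 6 11 16 21, ...
            }
        }
    }
}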
But a Redis cluster is troublesome to maintain and configure; because of the single-point-of-failure problem, sentinels have to watch the nodes.
The main problem, though, is that you have to bring in an entire Redis cluster just to generate an ID.
SnowFlake algorithm
What is it?
SnowFlake is Twitter's distributed self-increasing ID algorithm.
Twitter originally migrated its storage from MySQL to Cassandra (an open-source distributed NoSQL database developed by Facebook). Because Cassandra has no sequential ID generation mechanism, Twitter developed a globally unique ID generation service.
In tests, Twitter's SnowFlake algorithm was able to generate 260,000 self-increasing, sortable IDs per second.
- IDs generated by Twitter's SnowFlake are ordered by time
- A SnowFlake ID is a 64-bit Long integer (at most 19 digits when converted to a string)
- There are no ID collisions in a distributed system (IDs are distinguished by datacenterId and workerId), and generation is efficient
In distributed systems, some scenarios require globally unique IDs, with the following basic requirements for ID generation:
- In a distributed environment, the ID must be globally unique
- It should generally be monotonically increasing, because unique IDs are usually stored in a database; InnoDB stores row content in the leaf nodes of the primary-key index, increasing from left to right, so for database performance it is best if the generated IDs increase monotonically. A 36-character UUID would avoid ID collisions, but UUIDs have drawbacks: they are relatively long, and they are generally unordered
- There may also be a requirement that IDs follow no obvious pattern, because if something like a unique ID is used as an order number, it must not let anyone work out how many orders were placed in a day
Structure
The core parts of the SnowFlake algorithm
In Java, a 64-bit integer is a long, so the IDs generated by SnowFlake are stored as long values.
The first part
In binary, the highest bit is the sign bit: 1 means negative, 0 means positive. Generated IDs are positive, so the highest bit is fixed at 0.
The second part
The second part is a 41-bit timestamp, recording the time in milliseconds.
41 bits can represent 2^41 - 1 numbers.
Used only for non-negative integers, the range is 0 to 2^41 - 1; we subtract 1 because the representable range starts at 0, not 1.
That means 41 bits can cover 2^41 - 1 milliseconds, which works out to about 69.73 years.
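The arithmetic, as a quick check (a throwaway snippet, class name illustrative):

public class TimestampRangeDemo {
    public static void main(String[] args) {
        long maxMillis = (1L << 41) - 1;   // 2^41 - 1 = 2,199,023,255,551 ms
        double years = maxMillis / (1000.0 * 60 * 60 * 24 * 365);
        System.out.println(years);         // ~= 69.73 years
    }
}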
The third part
The third part is the worker machine ID: 10 bits are used to record the machine ID.
It allows deployment on 2^10 = 1024 nodes, split into a 5-bit datacenterId and a 5-bit workerId.
The largest value representable in 5 bits is 2^5 - 1 = 31, which is used to distinguish data centers and machine codes.
The fourth part
The 12-bit sequence can represent 2^12 = 4096 non-negative integers, i.e. 0, 1, 2, ..., 4095, meaning up to 4096 ID sequence numbers can be generated within the same millisecond on the same machine.
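A minimal sketch of how the four parts combine into one 64-bit long; the shift amounts follow the 41/5/5/12 layout described above, and the variable names are illustrative:

public class SnowflakeLayoutDemo {
    public static void main(String[] args) {
        long epoch = 1480166465631L;       // custom start epoch (same value as the implementation below)
        long datacenterId = 2, workerId = 3, sequence = 0;

        long timestampPart  = (System.currentTimeMillis() - epoch) << 22; // shift past 5 + 5 + 12 bits
        long datacenterPart = datacenterId << 17;                         // shift past 5 + 12 bits
        long workerPart     = workerId << 12;                             // shift past 12 bits

        long id = timestampPart | datacenterPart | workerPart | sequence;
        System.out.println(Long.toBinaryString(id));
        System.out.println(id);
    }
}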
SnowFlake guarantees that:
All generated ids increase over time
No duplicate ids are generated across the distributed system because the datacenterId and workerId distinguish between them
Implementation
The SnowFlake algorithm was originally written in Scala; others have implemented it in Java. GitHub address:
Github.com/beyondfengy…
/**
 * Java implementation of Twitter's SnowFlake algorithm.
 *
 * @author beyond
 */
public class SnowFlake {

    /** Start timestamp (custom epoch) */
    private final static long START_STMP = 1480166465631L;

    /** Number of bits occupied by each part */
    private final static long SEQUENCE_BIT = 12;   // sequence number
    private final static long MACHINE_BIT = 5;     // machine id
    private final static long DATACENTER_BIT = 5;  // datacenter id

    /** Maximum value of each part */
    private final static long MAX_DATACENTER_NUM = -1L ^ (-1L << DATACENTER_BIT);
    private final static long MAX_MACHINE_NUM = -1L ^ (-1L << MACHINE_BIT);
    private final static long MAX_SEQUENCE = -1L ^ (-1L << SEQUENCE_BIT);

    /** Left shift of each part */
    private final static long MACHINE_LEFT = SEQUENCE_BIT;
    private final static long DATACENTER_LEFT = SEQUENCE_BIT + MACHINE_BIT;
    private final static long TIMESTMP_LEFT = DATACENTER_LEFT + DATACENTER_BIT;

    private long datacenterId;   // datacenter id
    private long machineId;      // machine id
    private long sequence = 0L;  // sequence number within the current millisecond
    private long lastStmp = -1L; // timestamp of the last generated id

    public SnowFlake(long datacenterId, long machineId) {
        if (datacenterId > MAX_DATACENTER_NUM || datacenterId < 0) {
            throw new IllegalArgumentException("datacenterId can't be greater than MAX_DATACENTER_NUM or less than 0");
        }
        if (machineId > MAX_MACHINE_NUM || machineId < 0) {
            throw new IllegalArgumentException("machineId can't be greater than MAX_MACHINE_NUM or less than 0");
        }
        this.datacenterId = datacenterId;
        this.machineId = machineId;
    }

    /** Generate the next id */
    public synchronized long nextId() {
        long currStmp = getNewstmp();
        if (currStmp < lastStmp) {
            throw new RuntimeException("Clock moved backwards. Refusing to generate id");
        }

        if (currStmp == lastStmp) {
            // Same millisecond: increment the sequence number
            sequence = (sequence + 1) & MAX_SEQUENCE;
            // Sequence exhausted within this millisecond: wait for the next millisecond
            if (sequence == 0L) {
                currStmp = getNextMill();
            }
        } else {
            // Different millisecond: reset the sequence number to 0
            sequence = 0L;
        }

        lastStmp = currStmp;

        return (currStmp - START_STMP) << TIMESTMP_LEFT // timestamp part
                | datacenterId << DATACENTER_LEFT       // datacenter part
                | machineId << MACHINE_LEFT             // machine part
                | sequence;                             // sequence part
    }

    private long getNextMill() {
        long mill = getNewstmp();
        while (mill <= lastStmp) {
            mill = getNewstmp();
        }
        return mill;
    }

    private long getNewstmp() {
        return System.currentTimeMillis();
    }

    public static void main(String[] args) {
        SnowFlake snowFlake = new SnowFlake(2, 3);
        for (int i = 0; i < (1 << 12); i++) {
            System.out.println(snowFlake.nextId());
        }
    }
}
Putting it into practice in a project
Hutool toolkit
Address: github.com/looly/hutoo…
Integrating the snowflake algorithm with Spring Boot
Add the Hutool dependency:
<dependency>
    <groupId>cn.hutool</groupId>
    <artifactId>hutool-all</artifactId>
    <version>5.3.1</version>
</dependency>
Integration

import cn.hutool.core.lang.Snowflake;
import cn.hutool.core.net.NetUtil;
import cn.hutool.core.util.IdUtil;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct;

/**
 * Snowflake ID generator backed by Hutool.
 */
@Component
public class SnowFlakeDemo {

    private long workerId = 0;      // worker/machine id
    private long datacenterId = 1;  // datacenter id

    private Snowflake snowFlake = IdUtil.createSnowflake(workerId, datacenterId);

    @PostConstruct
    public void init() {
        try {
            // Convert the local IP address to a long to use as the worker id
            workerId = NetUtil.ipv4ToLong(NetUtil.getLocalhostStr());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public synchronized long snowflakeId() {
        return this.snowFlake.nextId();
    }

    public synchronized long snowflakeId(long workerId, long datacenterId) {
        Snowflake snowflake = IdUtil.createSnowflake(workerId, datacenterId);
        return snowflake.nextId();
    }

    public static void main(String[] args) {
        SnowFlakeDemo snowFlakeDemo = new SnowFlakeDemo();
        for (int i = 0; i < 20; i++) {
            new Thread(() -> {
                System.out.println(snowFlakeDemo.snowflakeId());
            }, String.valueOf(i)).start();
        }
    }
}
Results
1251350711346790400 1251350711346790402 1251350711346790401 1251350711346790403 1251350711346790405
1251350711346790404 1251350711346790406 1251350711346790407 1251350711350984704 1251350711350984706
1251350711350984705 1251350711350984707 1251350711350984708 1251350711350984709 1251350711350984710
1251350711350984711 1251350711350984712 1251350711355179008 1251350711355179009 1251350711355179010
Advantages and disadvantages
Advantages
- The millisecond timestamp sits in the high bits and the incrementing sequence in the low bits, so the whole ID trends upward
- It does not depend on third-party systems such as a database; deployed as a service, it offers higher stability and very high ID-generation performance
- Bits can be allocated according to your own business characteristics, which is very flexible
Disadvantages
- It depends on the machine clock; if the clock is rolled back, duplicate IDs will be generated
- It is strictly increasing on a single machine, but in a distributed environment the clocks of different machines cannot be perfectly synchronized, so the IDs are sometimes not globally increasing. This is usually not a problem: distributed IDs generally only need to be trend-increasing rather than strictly increasing, which covers about 90% of requirements.
Other notes
- To solve the problem of duplicate IDs caused by clock rollback, later solutions were proposed, for example:
- UidGenerator, Baidu's open-source distributed unique ID generator
- Leaf, Meituan-Dianping's distributed ID generation system