Honey, do you know how the Snowflake algorithm works?

SnowFlake is an open-source distributed ID generation algorithm for Twitter. The idea is to use a 64-bit long number as a globally unique ID. It is widely used in distributed systems, and the introduction of time stamps for IDS is basically self-incrementing, with detailed annotations in the following code. \

One of the 64 bits is unused and 41 bits are used as the number of milliseconds, 10 bits are used as the work machine ID, and 12 bits are used as the serial number.

To give you an example, consider the following 64 bit long:

The first part, which is a bit: 0, is meaningless.
The second part is 41 bits: the timestamp.
The third part contains five bits: the machine room ID, 10001.
The fourth part has five bits: the machine ID, 1 1001.
The fifth part is the 12 bits: the serial number, which is the serial number of the ID generated at the same time in a millisecond on a machine in a machine room, 0000 00000000.

①1 bit: No, why not?

Since the first bit in binary is a 1, it is a negative number, but the generated ID is a positive number, so the first bit is always 0.

②41 bit: indicates the timestamp in milliseconds.

41 bits can represent as many as 2^ 41-1 milliseconds, or 69 years in adult years.

③10 bit: Record the working machine ID, indicating that the service can be deployed on a maximum of 2^10 machines, i.e. 1024 machines.

However, five bits represent the machine room ID and five bits represent the machine ID. A maximum of 2 ^ 5 equipment rooms (32 equipment rooms). Each equipment room can represent 2 ^ 5 machines (32 machines), or it can be determined based on the actual situation of your company.

④12 bit: This is used to record different ids generated in the same millisecond.

The largest positive integer represented by 12 bits is 2 ^ 12-1 = 4096, which means you can use the 12 bits to distinguish 4096 different ids in the same millisecond.

In simple terms, one of your services assumes that you want to generate a globally unique ID, so you can send a request to the system that has the SnowFlake algorithm deployed, and the SnowFlake algorithm system generates the unique ID.

The SnowFlake algorithm must first know the machine room id = 17 and machine ID = 12.

When the SnowFlake algorithm receives the request, it first generates a 64-bit long ID using binary arithmetic. The first of the 64 bits is meaningless.

This is followed by 41 bits to use the current timestamp (in milliseconds), 5 bits to set the machine room ID, and 5 bits to set the machine ID.

Finally, determine the number of requests in this millisecond on the machine in the current machine room, and add a sequence number to the request to generate the ID as the last 12 bits.

The result is a 64-bit ID that looks something like this:

This algorithm ensures that a unique ID is generated in the same millisecond on a single machine in a machine room. It is possible to generate multiple ids in a millisecond, but with the ordinal number of the last 12 bits to distinguish them.

Let’s take a quick look at a code implementation of the SnowFlake algorithm. This is an example. Once you understand what this means, you can try to modify the algorithm for yourself.

In a word, each bit of a 64 bit number is used to set different flag bits to distinguish each ID.

SnowFlake algorithm implementation code is as follows:

public class IdWorker {

 // Since the first bit in binary is 1, it is negative, but the generated id is all positive, so the first bit is always 0.

 // Machine ID is binary 5 bits 32 bits minus 1 bit 31 bits
 private long workerId;
 // Machine room ID is 5 bits in base 2, minus 1 bit and 31 bits
 private long datacenterId;
 // Represents the latest sequence number of multiple ids generated in one millisecond
 private long sequence;
 // Set an initial time value of 2^ 41-1 to approximately 69 years
 private long twepoch = 1585644268888L;
 // A 5-bit machine ID
 private long workerIdBits = 5L;
 // 5-digit equipment room ID
 private long datacenterIdBits = 5L;
 // Number of ids generated per millisecond 2 to the 12th
 private long sequenceBits = 12L;
 // This is a binary operation, i.e., a maximum of 31 digits in 5 bits, i.e., a maximum of 32 machine ids
 private long maxWorkerId = - 1L ^ (- 1L << workerIdBits);
 A maximum of 31 digits can be contained in 5 bits. A maximum of 32 digits can be contained in the machine room ID
 private long maxDatacenterId = - 1L ^ (- 1L << datacenterIdBits);

 private long workerIdShift = sequenceBits;
 private long datacenterIdShift = sequenceBits + workerIdBits;
 private long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;
 private long sequenceMask = - 1L ^ (- 1L << sequenceBits);
 // Record the number of milliseconds of the generation time and check whether it is the same as 1 milliseconds
 private long lastTimestamp = - 1L;
 public long getWorkerId(){
  return workerId;
 }
 public long getDatacenterId() {
  return datacenterId;
 }
 public long getTimestamp() {
  return System.currentTimeMillis();
 }

 public IdWorker(long workerId, long datacenterId, long sequence) {

  // Check whether the equipment room ID and machine ID exceed 31 and cannot be smaller than 0
  if (workerId > maxWorkerId || workerId < 0) {
   throw new IllegalArgumentException(
     String.format("worker Id can't be greater than %d or less than 0",maxWorkerId));
  }

  if (datacenterId > maxDatacenterId || datacenterId < 0) {

   throw new IllegalArgumentException(
     String.format("datacenter Id can't be greater than %d or less than 0",maxDatacenterId));
  }
  this.workerId = workerId;
  this.datacenterId = datacenterId;
  this.sequence = sequence;
 }

 // This is the core method that causes the snowflake algorithm on the current machine to generate a globally unique ID by calling nextId()
 public synchronized long nextId() {
  // Get the current timestamp in milliseconds
  long timestamp = timeGen();
  if (timestamp < lastTimestamp) {

   System.err.printf(
     "clock is moving backwards. Rejecting requests until %d.", lastTimestamp);
   throw new RuntimeException(
     String.format("Clock moved backwards. Refusing to generate id for %d milliseconds",
       lastTimestamp - timestamp));
  }

  A request is sent to generate an ID within the same millisecond
  // The seqence sequence number must be incremented by 1, up to 4096
  if (lastTimestamp == timestamp) {

   // This means that there can only be 4096 digits in a millisecond. No matter how many you pass in,
   // This bit operation is guaranteed to always be in the range of 4096, so that you do not pass a sequence beyond the range of 4096
   sequence = (sequence + 1) & sequenceMask;
   // When the number of generated ids exceeds 4095 in a certain millisecond, the system waits until the next millisecond, and the system continues to generate ids
   if (sequence == 0) { timestamp = tilNextMillis(lastTimestamp); }}else {
   sequence = 0;
  }
  // The last time the id was generated is in milliseconds
  lastTimestamp = timestamp;
  // Here is the core bit operation, generating a 64bit ID
  // Move the current timestamp left to 41 bit; [Fixed] Move machine room ID left to 5 bit [Fixed] Move the machine ID left to 5 bit Put the serial number in the last 12 bits
  // Finally concatenated into a 64 bit binary number, converted to base 10 is a long
  return ((timestamp - twepoch) << timestampLeftShift) |
    (datacenterId << datacenterIdShift) |
    (workerId << workerIdShift) | sequence;
 }

 /** * When the number of ids generated in a certain millisecond exceeds 4095, the system will wait until the next millisecond, the system continues to generate id * @param lastTimestamp * @return */
 private long tilNextMillis(long lastTimestamp) {

  long timestamp = timeGen();

  while (timestamp <= lastTimestamp) {
   timestamp = timeGen();
  }
  return timestamp;
 }
 // Get the current timestamp
 private long timeGen(){
  return System.currentTimeMillis();
 }

 /**
  *  main 测试类
  * @param args
  */
 public static void main(String[] args) {
  System.out.println(1&4596);
  System.out.println(2&4596);
  System.out.println(6&4596);
  System.out.println(6&4596);
  System.out.println(6&4596);
  System.out.println(6&4596); }}Copy the code

Advantages of the SnowFlake algorithm:

(1) High performance and high availability: the generation does not depend on the database, and is completely generated in memory.

(2) Large capacity: millions of self-increasing ids can be generated every second.

(3) ID increment: save in the database, index efficiency is high.

Disadvantages of the SnowFlake algorithm:

Depending on the consistency with the system time, if the system time is called back or changed, id conflicts or duplicate may occur.

In practice, we do not have so many machine rooms, so we can improve the algorithm and optimize the 10bit machine ID into business tables or businesses related to our system.

From: blog.csdn.net/lq18050010830/article/details/89845790

Honey, do you know how the Snowflake algorithm works?

Related Posts

What timers are available in TCP?

Find out how to write a blockbuster article in Nuggets through crawler

FAQ for RocksDB storage engine integration in ArangoDB