Today we will take apart the Snowflake algorithm, see how Baidu, Meituan, Tencent, and other large companies have designed their globally unique ID services, and then design a new globally unique ID generation algorithm for specific needs. On top of that, we will discuss the distributed CAP trade-offs and performance bottlenecks of a globally unique ID service.

Those already familiar with Snowflake can skip ahead to the big companies' designs and trade-offs.

Baidu UidGenerator: github.com/baidu/uid-g…

Meituan Leaf: tech.meituan.com/2017/04/21/…

Tencent Seqsvr: www.infoq.cn/article/wec…

A globally unique ID is an important piece of infrastructure in distributed systems and order-style business systems. Here is Meituan's description:

In a complex distributed system, it is often necessary to uniquely identify large amounts of data and messages. For example, in the systems behind Meituan-Dianping's finance, payment, catering, hotel, and Maoyan Movies products, data grows day by day, and once the data is sharded across databases and tables, a unique ID is required to identify a row or a message. A database auto-increment ID obviously cannot meet this need. Special items such as orders, riders, and coupons also need to be marked with unique IDs.

At this point, you might ask: I still don’t understand, why does it have to be globally unique?

Once a MySQL database is sharded across databases and tables, MySQL auto-increment can no longer produce IDs that are globally sequential. In that situation, a globally unique, trend-increasing ID is a good way to preserve the ordering of the data.

In a crawler scenario, data goes through cleaning, validation, correction, analysis, and other steps before it is written to the database, and during those steps there is some probability of retries or failures. In other words, the data needs an ID to identify it before it ever reaches the database.

What attributes should a globally unique ID have to satisfy this scenario?

Meituan’s technical team listed four attributes that I think are very accurate. They are:

  1. Global uniqueness: no duplicate IDs are ever produced. This is the basic requirement for a unique identifier.
  2. Increasing trend: MySQL's InnoDB engine uses a clustered index, and since most RDBMSs store index data in B-tree-like structures, we should use ordered primary keys as far as possible to ensure write performance.
  3. Monotonic increment: guarantee that the next ID is greater than the previous one. This is needed for special requirements such as transaction version numbers, incremental IM messages, and sorting.
  4. Information security: if IDs are consecutive, it is trivial for malicious users to crawl your data by simply walking through URLs in order; if they are order numbers, it is even worse, because competitors can read off your daily order volume. Therefore, some scenarios require IDs that are irregular and unpredictable.

Points 3 and 4 seem to conflict slightly, but I will come back to that later. Besides the ID attributes listed above, a service built on such a generation algorithm also needs to meet requirements for high QPS, high availability, and low latency.

What are the common ID generation methods in the industry?

UUID and GUID, which you probably learned in school, produce values that look like this:

6F9619FF-8B86-D011-B42D-00C04FC964FF

Since it is not purely numeric, it does not satisfy the trend-increasing and monotonic-increment properties, and it also hurts write performance. As mentioned above, database auto-increment IDs cannot identify data before it enters the database and do not work in distributed (sharded) scenarios, so they are ruled out too.

Some people propose doing this with Redis, for example order number = date + an auto-increment counter for that day, with the counter advanced via INCR. But this approach cannot satisfy the requirement that IDs be hard to guess.
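
To make the Redis idea concrete, here is a minimal sketch (my own illustration, not from any of the cited articles) of the "date + daily counter" order number, assuming a local Redis and the go-redis client; the key name order:seq: is made up:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// nextOrderNo builds an order number as "date + daily counter", e.g. 20240101 followed
// by a zero-padded sequence. The counter is advanced atomically with INCR and the key
// expires so each date starts again from 1.
func nextOrderNo(ctx context.Context, rdb *redis.Client) (string, error) {
	day := time.Now().Format("20060102")
	key := "order:seq:" + day // illustrative key name
	seq, err := rdb.Incr(ctx, key).Result()
	if err != nil {
		return "", err
	}
	rdb.Expire(ctx, key, 48*time.Hour) // keep the key a bit past the day, then let it go
	return fmt.Sprintf("%s%011d", day, seq), nil
}

func main() {
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	no, err := nextOrderNo(context.Background(), rdb)
	if err != nil {
		panic(err)
	}
	// consecutive calls yield strictly increasing, and therefore guessable, numbers
	fmt.Println(no)
}
```

The sketch also shows exactly why this scheme fails the security attribute: anyone who sees one order number can enumerate its neighbors.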

MongoDB's ObjectID was also suggested, but don't forget that it generates an ID like this: 5b6b3171599d6215a8007se0, which again is not purely numeric.

Is there really nothing that can put up a fight?

The famous Snowflake

In 2010, Twitter open-sourced Snowflake, the globally unique ID generation algorithm used by its internal teams. Snowflake does not depend on a database; IDs can be generated directly in code. It uses a clever bit layout so that IDs satisfy the incremental property while not being strictly consecutive, and it meets the four attributes of a globally unique ID listed above. Three consecutively generated IDs look like this:

563583455628754944
563583466173235200
563583552944996352

Snowflake stores the four parts of an ID in 64 bits:

1. The highest bit is 1 bit and is fixed at 0, which ensures the generated ID is a positive number;

2. The middle segment is 41 bits and holds a millisecond timestamp;

3. The next segment is 10 bits and holds the worker machine ID, allowing up to 1024 machines;

4. The last segment is 12 bits and holds the sequence number generated within the current millisecond, allowing up to 4096 IDs per machine per millisecond.

Snowflake implementations are widely available online in almost every language. Here is a Golang implementation I found online while running my experiments:

The code can be viewed and downloaded in my Gist.
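
Since the Gist itself is not reproduced here, below is a minimal, hand-rolled sketch of the bit layout described above (41-bit millisecond timestamp, 10-bit machine ID, 12-bit sequence). It is for illustration only and is not the exact code from the Gist:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const (
	epoch       = int64(1288834974657) // Twitter's original custom epoch, in milliseconds
	machineBits = 10
	seqBits     = 12
	maxSeq      = int64(1)<<seqBits - 1 // 4095
)

type Snowflake struct {
	mu        sync.Mutex
	lastStamp int64
	machineID int64
	sequence  int64
}

// NextID composes a 64-bit ID: 41-bit millisecond timestamp | 10-bit machine ID | 12-bit sequence.
// Note that a clock rollback (now < lastStamp) is not handled here, which is exactly the
// weakness discussed in the next section.
func (s *Snowflake) NextID() int64 {
	s.mu.Lock()
	defer s.mu.Unlock()

	now := time.Now().UnixMilli()
	if now == s.lastStamp {
		s.sequence = (s.sequence + 1) & maxSeq
		if s.sequence == 0 { // sequence exhausted, spin until the next millisecond
			for now <= s.lastStamp {
				now = time.Now().UnixMilli()
			}
		}
	} else {
		s.sequence = 0
	}
	s.lastStamp = now

	return (now-epoch)<<(machineBits+seqBits) | s.machineID<<seqBits | s.sequence
}

func main() {
	sf := &Snowflake{machineID: 1}
	for i := 0; i < 3; i++ {
		fmt.Println(sf.NextID())
	}
}
```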

The problem with Snowflake

Snowflake doesn't rely on a database or in-memory storage and can generate IDs at any time, which is why it is so popular. But precisely because it uses a timestamp to avoid depending on memory or a database, it depends on the server's clock. As the 4-segment structure above shows, the magnitude of an ID is dominated by its highest-value segment; since the top bit is fixed at 0, the magnitude is governed by the middle segment, i.e. the timestamp.

Now imagine the server's clock drifts or is rolled back. That directly affects the generated IDs: there is a real chance of producing duplicate IDs and breaking the incremental property. This is a fatal flaw. Just think how serious it would be if payment order numbers or purchase order numbers were duplicated!

In addition, the limited widths of the machine-ID and sequence segments severely cap how many IDs it can generate per millisecond. And since the middle segment is a 41-bit millisecond timestamp, it only lasts about 70 years from its starting epoch before the 41 bits run out.
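
A quick back-of-the-envelope check of that 70-year figure (my own calculation, not taken from the Snowflake documentation):

```go
package main

import "fmt"

func main() {
	// a 41-bit millisecond counter can represent at most 2^41 - 1 milliseconds
	ms := int64(1)<<41 - 1
	years := float64(ms) / 1000 / 3600 / 24 / 365
	fmt.Printf("%.1f years\n", years) // ≈ 69.7 years
}
```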

Moreover, fetching the operating system time is relatively expensive for the program: compared with using a random number or a constant, the performance is far worse, and this is the biggest factor limiting Snowflake's throughput.

How do the big companies solve the globally unique ID problem

To keep it brief, let's look at what Baidu, Meituan, and Tencent (WeChat) do.

The Baidu team open-sourced the UidGenerator algorithm.

It borrows future time and a double buffer to work around clock rollback and generation-performance problems, and uses MySQL to assign worker IDs. It is essentially an optimized Snowflake, and a good choice. Would it be your first choice?

The Meituan team proposed two schemes for different business scenarios: Leaf-segment, based on the number-segment idea, and Leaf-snowflake, based on Snowflake.

The reason two schemes exist is that Leaf-segment does not meet the information-security attribute: its IDs are easy to guess, so it cannot be used in public-facing scenarios such as orders. Leaf-snowflake reduces the dependence on ZooKeeper with a file-system cache, and addresses Snowflake's clock-rollback problem through time comparison and alerting. Both are good choices. Would one of them be your first choice?
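
For readers unfamiliar with the number-segment idea, here is a highly simplified sketch of it (my own illustration, not Meituan's actual implementation; the table and column names leaf_alloc, biz_tag, max_id, and step are illustrative): a database row keeps a max_id and a step for each business tag, each service instance atomically grabs a whole segment [max_id, max_id + step) at a time, and then hands out IDs from memory.

```go
package main

import (
	"database/sql"
	"fmt"

	_ "github.com/go-sql-driver/mysql"
)

// Segment is a range of IDs [cur, max) held in memory by one service instance.
type Segment struct {
	cur, max int64
}

// nextSegment grabs the next segment for a business tag in one transaction.
func nextSegment(db *sql.DB, bizTag string) (*Segment, error) {
	tx, err := db.Begin()
	if err != nil {
		return nil, err
	}
	defer tx.Rollback()

	// advance the high-water mark, then read it back inside the same transaction
	if _, err := tx.Exec("UPDATE leaf_alloc SET max_id = max_id + step WHERE biz_tag = ?", bizTag); err != nil {
		return nil, err
	}
	var maxID, step int64
	if err := tx.QueryRow("SELECT max_id, step FROM leaf_alloc WHERE biz_tag = ?", bizTag).Scan(&maxID, &step); err != nil {
		return nil, err
	}
	if err := tx.Commit(); err != nil {
		return nil, err
	}
	return &Segment{cur: maxID - step, max: maxID}, nil
}

// Next returns the next ID in the segment, or false when the segment is used up.
func (s *Segment) Next() (int64, bool) {
	if s.cur >= s.max {
		return 0, false
	}
	id := s.cur
	s.cur++
	return id, true
}

func main() {
	db, _ := sql.Open("mysql", "user:pass@tcp(localhost:3306)/ids")
	seg, err := nextSegment(db, "order")
	if err != nil {
		panic(err)
	}
	if id, ok := seg.Next(); ok {
		fmt.Println(id)
	}
}
```

The IDs handed out this way are strictly increasing within a segment, which is exactly why they are so easy to guess.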

The WeChat team's business is special: it uses an ID to mark the order of messages, ensuring that the messages we receive arrive in order. In that case it is not a globally unique ID but a per-user globally unique ID; you only need to guarantee that the IDs of messages sent by one user keep increasing.

This project, called Seqsvr, does not rely on time; instead it uses auto-increment plus number segments to solve the generation problem. That's a good choice too. Would it be your first choice?

How was the algorithm designed to perform 587 times better than Snowflake?

After studying Snowflake's strengths and weaknesses and reading the designs of Baidu UidGenerator, Meituan Leaf, and Tencent WeChat's Seqsvr, I wanted to design a globally unique ID generation algorithm that satisfies the four attributes above, with higher performance, a longer service life, no per-unit-time limit, and no dependence on time.

It sounded simple enough, but absorbing what I had learned, designing, implementing, and tuning performance took me four weekends. In my mind the design process felt like liquid water turning into a gas-like mist, so I named the algorithm Mist. Let's look at how the Mist algorithm is designed and implemented.

The number of bits is the main factor that determines the upper bound of the ID value, and in Snowflake the machine-ID and sequence segments also cap how many IDs can be generated per unit of time. To fix both problems, the composition of the ID has to be redesigned.

Setting the middle segment aside for now, let's look at the lower two segments. In Snowflake the 10-bit segment holds the machine number, and the 12-bit segment holds the sequence number within the unit of time (the same millisecond), i.e. "the 5th (or 150th) value generated in this millisecond". Together they also make the ID value hard to predict, which satisfies the security attribute. In fact, there is no need to record the machine number at all, and we can ignore how many values are generated per unit of time: the security attribute can instead be achieved by combining several groups of random numbers. As the counter grows and the random numbers change, guessing the order from the IDs becomes very difficult.

The highest bit stays fixed at 0 and does not need to change. Now for the all-important middle segment. Snowflake's middle segment is a millisecond timestamp; since Mist is not going to rely on time, it certainly won't use a timestamp. Instead, the middle segment simply increments: 1, 2, 3, 4, 5… It determines the upper bound of the generated IDs and how long the algorithm can be used. If we kept 41 bits, the upper bound would be similar to the timestamp's. After some calculation, I chose a different split than Snowflake:

Shrink the lower two segments and widen the middle segment, so that we get a higher upper bound and a longer life. How high and how long exactly? The upper bound of the middle segment is int64(1<<47 - 1) = 140737488355327, i.e. on the order of a hundred trillion. Assuming a billion IDs per day, the Mist algorithm can run for 385+ years, which is several lifetimes.

The two lower segments are 8 bits each, so each has an upper bound of 255, i.e. values in the closed interval [0, 255]. If both segments are filled with random numbers there are 256 * 256 possible combinations, changing on every ID, which makes guessing quite difficult. Unlike Snowflake, Mist does not need to track a per-millisecond sequence number, so the code is not very long. The code can be found in my GitHub repository:
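
As a stand-in for the repository code, here is a minimal sketch of the bit layout just described: a 47-bit in-memory counter in the middle plus two 8-bit random segments at the bottom. In a real service the counter must be persisted, as discussed later; this sketch keeps it purely in memory:

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
	"sync"
)

const (
	saltBits = 8
	saltMax  = int64(1) << saltBits // 256, exclusive upper bound of each random segment
)

type Mist struct {
	mu      sync.Mutex
	counter int64 // the 47-bit middle segment; must be persisted in a real service
}

// NextID composes: 0 (1 bit) | counter (47 bits) | random (8 bits) | random (8 bits).
func (m *Mist) NextID() int64 {
	m.mu.Lock()
	m.counter++
	c := m.counter
	m.mu.Unlock()

	// crypto/rand backed by math/big: the "true random" variant discussed below
	a, _ := rand.Int(rand.Reader, big.NewInt(saltMax))
	b, _ := rand.Int(rand.Reader, big.NewInt(saltMax))

	return c<<(saltBits*2) | a.Int64()<<saltBits | b.Int64()
}

func main() {
	m := &Mist{}
	for i := 0; i < 3; i++ {
		fmt.Println(m.NextID()) // increasing overall, but with unpredictable low bits
	}
}
```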

Now for performance. Fetching a timestamp is relatively expensive, and not fetching one is of course faster, but where does the 500+ times come from? Taking Golang as an example (my experiments were done in Golang), there are three ways to generate a random number:

  • a pseudo-random number seeded with a fixed value;
  • a pseudo-random number re-seeded with the current timestamp;
  • a true random number based on big integers;

A random number seeded with a fixed value produces the same sequence every time; it is pseudo-random and cannot be used here. Seeding with the timestamp to generate random numbers is currently the mainstream practice among Golang developers, with a measured performance of about 8800 ns/op. Far fewer people know about true randomness based on big integers, whose measured performance is 335 ns/op, so the gap is nearly 30x.
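
For reference, here is what the two usable approaches look like in Go. My reading is that the "big number" variant refers to crypto/rand with math/big; the ns/op figures above are the author's measurements, not mine:

```go
package main

import (
	crand "crypto/rand"
	"fmt"
	"math/big"
	mrand "math/rand"
	"time"
)

func main() {
	// 1. pseudo-random, seeded with the current timestamp (the mainstream practice)
	r := mrand.New(mrand.NewSource(time.Now().UnixNano()))
	fmt.Println(r.Int63n(256)) // a value in [0, 256)

	// 2. "true" random based on big integers (crypto/rand + math/big)
	n, err := crand.Int(crand.Reader, big.NewInt(256))
	if err != nil {
		panic(err)
	}
	fmt.Println(n.Int64()) // also a value in [0, 256)
}
```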

Even true randomness based on big integers has some cost. If you want to push performance to the peak, simply replace the two low random segments with constants: the measured performance with constants is 15 ns/op, 587 times that of timestamp-seeded random numbers.

Note that while putting constants in the two low segments gives the highest performance, the difficulty of guessing the IDs drops accordingly.

The dependency problem of the Mist algorithm

To avoid depending on time, the Mist algorithm has to depend on storage instead. The incrementing middle value only lives in memory, so it must be persisted to storage; otherwise a crash or abnormal exit could cause duplicate IDs.

It looks that way, but is it really an extra storage dependency?

Think about it: such an important service must be highly available. Whether you use Twitter's, Baidu's, Meituan's, or Tencent WeChat's solution, the architecture must be highly available, and high availability inevitably requires storage. In that context, the Mist algorithm's dependency is not an extra one; it is a design that integrates naturally with the architecture.

Combining the Mist algorithm with Redis

Having proposed the Mist algorithm, how could I not provide a genuinely usable engineering practice? After writing the Mist algorithm, I started on the engineering side, combining Mist with a KV store to provide a globally unique ID generation service. I chose Redis, which I know well; the combination of Mist and Redis I named Medis.

The high performance is not made up. Here are the JMeter load-test parameters and results:

The screenshot above, from the Medis README, shows a performance test of roughly 2.5W/sec (about 25,000 requests per second) over a large sample. Besides the high performance of the Mist algorithm itself, Medis's own design contributes a great deal:

  • A Go channel is used as the data cache; this alone gave the ID-issuing service a roughly seven-fold performance improvement (see the sketch after this list);
  • A prefetch strategy keeps the channel non-empty in most cases, so client requests can be answered quickly;
  • The time-consuming prefetch runs in its own goroutine and does not block responses to client requests;
  • Values are fetched from Redis in batches with an LRANGE + LTRIM combination, which is more efficient than reading one value at a time in a loop or batching reads through a pipeline;
  • Writes to Redis use pipelined batch writes, which are more efficient than writing one value at a time in a loop;
  • Sequence values are computed at pre-storage time rather than at request time. Although the Mist algorithm runs in nanoseconds, it still costs something under high concurrency, so computing during pre-storage is clearly better;
  • Thanks to the high performance of the Golang Echo framework and Golang itself, I am very satisfied with the whole thing. If you want to chase the ultimate performance, I recommend trying Rust.
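
Here is a minimal sketch of the channel-plus-prefetch idea (my own simplification, not the actual Medis source; the key name medis:ids and the batch sizes are made up), assuming the go-redis client:

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

const (
	idKey     = "medis:ids" // illustrative key: a Redis list of pre-computed IDs
	batchSize = 1000
	lowWater  = 500
)

type Medis struct {
	rdb *redis.Client
	ch  chan string // buffered channel acting as the in-memory cache
}

// prefetch reads a batch of pre-stored IDs with LRANGE, removes them with LTRIM,
// and feeds them into the channel.
func (m *Medis) prefetch(ctx context.Context) error {
	ids, err := m.rdb.LRange(ctx, idKey, 0, batchSize-1).Result()
	if err != nil {
		return err
	}
	if err := m.rdb.LTrim(ctx, idKey, int64(len(ids)), -1).Err(); err != nil {
		return err
	}
	for _, id := range ids {
		m.ch <- id
	}
	return nil
}

// Next hands out one ID from the channel; when the cache runs low, a prefetch is
// kicked off in a goroutine so the slow part stays off the request path.
// (A real implementation would also guard against launching concurrent prefetches.)
func (m *Medis) Next(ctx context.Context) string {
	if len(m.ch) < lowWater {
		go m.prefetch(ctx)
	}
	return <-m.ch
}

func main() {
	m := &Medis{
		rdb: redis.NewClient(&redis.Options{Addr: "localhost:6379"}),
		ch:  make(chan string, 2*batchSize),
	}
	ctx := context.Background()

	// Seed some pre-computed IDs with a pipelined batch write
	// (in Medis this would be done by the pre-storage goroutine).
	pipe := m.rdb.Pipeline()
	for i := 1; i <= batchSize; i++ {
		pipe.RPush(ctx, idKey, fmt.Sprintf("%d", i))
	}
	if _, err := pipe.Exec(ctx); err != nil {
		panic(err)
	}

	if err := m.prefetch(ctx); err != nil { // warm the cache before serving
		panic(err)
	}
	fmt.Println(m.Next(ctx))
}
```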

The Medis service startup process and interface access flowchart are shown below:

If you're interested, download it and try it out: run server.go in the Medis root directory, then visit http://localhost:1558/sequence to get a globally unique ID.

Highly available architecture and distributed performance

The distributed CAP trade-off (consistency, availability, partition tolerance) is a settled topic, and a service like this typically goes for an availability-first (AP) architecture. Since the design uses pre-storage and prefetching, and the IDs as a whole must keep increasing, it is preferable to serve requests from a single machine; in other words, the performance ceiling of the distributed architecture is the ceiling of the single machine that serves the IDs.

What if you want multiple machines serving IDs at the same time?

Such a requirement changes Medis's logic as well as the composition of the application. To have multiple machines serving at once, you have to scrap the Redis + Channel prefetch mechanism, abandon the channel, and generate in real time so that multiple servers can work simultaneously. But then the performance bottleneck shifts to the KV store (Redis in this case), and overall performance is capped by that single Redis. You could use etcd or ZooKeeper to make the KV store itself multi-node, but doesn't that bring you right back to the CAP starting point?

As for which to choose, discuss with your architects based on the actual business scenario and requirements, and pick a suitable deployment.

After seeing Mist and Medis, I'm sure you'll come up with other ingenious ideas. Please leave them in the comments so we can improve together!

NightTeam was founded in 2019. The team consists of Cui Qingcai (Jingmi), Zhou Ziqi (Loco), Chen Xiangan (CXA), Tang Yifei (Big Fish | BruceDone), Feng Wei (Weiwei), Cai Jin (owner of Yuelai Inn), Dai Huangjin (Xianyu), Zhang Yeqing (MarvinZ), Wei Shidong (Asyncins | Quinn), and Wen Anzhe (sml2h3).

Our programming languages include but are not limited to Python, Rust, C++, and Go, covering crawling, deep learning, service development, reverse engineering, software security, and more. The team is neither good nor evil; we only do what we think is right. Please bear that in mind.