1, the introduction

When many people think of IM application development, their first impression is “long connection”, “socket”, “live”, “protocol” and so on. Yes, these are definitely the technologies involved in IM development.

But when you actually start writing the first line of code, the real question is “How do I generate chat message ids?” This seemingly insignificant little thing. It seems trivial because it’s so common and ubiquitous in IM. However, although it seems trivial, it is actually very important, because the advantages and disadvantages of its generation algorithm and generation strategy in a sense, determines the difficulty of implementing certain functions of your IM application layer.

With that in mind, IM has put together a series of articles called IM Message ID Technology Topics that hopefully will give you a deeper understanding and best practices on this small but important technical point.

This article is the fifth in a series of special articles, specially introduces baidu open source distributed message ID generator algorithm logic, implementation ideas, key source code interpretation, etc., may bring you more inspiration.

Learning and communication:

– Instant messaging/push technology development communication group 5:
215477170
[recommend]

– Introduction to MOBILE IM Development
Just one entry for beginners: Developing mobile IM from scratch”

(This article is simultaneously published at: www.52im.net/thread-2953…)

2. Thematic catalogue

This article is the fifth in a series of articles on IM Message ID technology.


IM Message ID Technology Topic (I) : Wechat’s massive IM Chat Message Serial number Generation Practice (Algorithm Principle)




IM Message ID Technical Topic (2) : Wechat massive IM Chat Message Sequence Number Generation Practice (Disaster Recovery Solution)




IM Message ID technical topic (3) : Decrypting the chat message ID generation strategy of Rongyun IM products




IM message ID technology topic (4) : Deep decryption of Meituan distributed ID generation algorithm




IM Message ID technology Topic (5) : Baidu open source distributed ID generator UidGenerator introduction
“(* Article)

3. Basic introduction

Global ID services (such as message ID in IM chat system, order number in e-commerce system, order number in take-out application, etc.) are basic services in distributed services, which need to be globally unique, efficient, and highly reliable. Monotonicity may sometimes be required, but it does not have to be strictly increasing or decreasing.

The global ID can also be obtained by the auto-increment primary key of the database, but requiring a high QPS is obviously unrealistic.

UidGenerator is baidu’s open source unique ID generator based on The Snowflake algorithm (a modified version of Baidu’s Snowflake algorithm), which further improves efficiency by introducing the RingBuffer concept of high-performance queue Disruptor.

UidGenerator is implemented in Java language. It works in application projects in the form of components and supports custom workerId bits and initialization policies, which is suitable for scenarios such as automatic restart and drift of instances in docker virtualization environment.

In terms of technical implementation, UidGenerator has the following key features:

1) UidGenerator solves the natural concurrency limitation of sequence by borrowing future time;

2) RingBuffer is used to cache generated UID and parallelize the production and consumption of UID;

3) Complete the CacheLine at the same time to avoid the hardware-level “pseudo-sharing” problem caused by RingBuffer.

Based on the above technical features, the UidGenerator’s stand-alone pressure test data indicates that its QPS can be up to 6 million.

Dependent environment:

1) Java8 and above (new features such as functional programming statements are used in the code, please see:
Uid-generator source code online version
);


2) MySQL (built-in WorkerID allocator, distributed through DB at startup stage; DB is not mandatory if the implementation is custom.

The following are resources for the UidGenerator project:

1) Complete source code address:
Github.com/baidu/uid-g…


2) Alternate source code address:
Github.com/52im/uid-ge…


3) Read the source code online:
Docs.52im.net/extend/docs…
(recommended)

What is the Snowflake algorithm?

4.1 Principles of the SnowFlake algorithm

Friendly tip: This section is excerpted from the article “IM Message ID Technical Topic (4) : In-depth Decryption of Meituan’s distributed ID Generation algorithm”. If you want to know about Meituan’s understanding and application of SnowFlake algorithm, you can read it in detail.

SnowFlake is an open-source distributed ID generation algorithm for Twitter. The idea is to use a 64-bit long number as a globally unique ID.

One of the 64 bits is unused and 41 bits are used as the number of milliseconds, 10 bits are used as the work machine ID, and 12 bits are used as the serial number.

SnowFlake ID composition:

(This figure is quoted from “IM Message ID Technical Topic (4) : Distributed ID Generation Algorithm for Deep Decryption of Meituan”)

Sample SnowFlake ID:

(This figure is quoted from “IM Message ID Technical Topic (4) : Distributed ID Generation Algorithm for Deep Decryption of Meituan”)

To give you an example, like the following 64-bit long:

1) The first part is a bit: 0, which is meaningless;

2) The second part, 41 bits: represents the timestamp;

3) In the third part, there are 5 bits: the machine room ID is 10001;

4) In the fourth part, there are 5 bits: 1 1001;

5) The fifth part is 12 bits: the serial number, which is the serial number of the ID generated at the same time in a certain machine room in this millisecond, 0000 00000000.

1) 1 bit:

No. Why is that?

Since the first bit in binary is a 1, it is a negative number, but the generated ID is a positive number, so the first bit is always 0.

(2) 41 bit:

Represents a timestamp in milliseconds.

41 bits can represent as many as 2^ 41-1 milliseconds, or 69 years in adult years.

(3) 10 bit:

Record the working machine ID, which indicates that the service can be deployed on a maximum of 2^10 machines, or 1024 machines.

However, five bits represent the machine room ID and five bits represent the machine ID. This means a maximum of 2 ^ 5 machine rooms (32 machine rooms), each machine room can represent 2 ^ 5 machines (32 machines).

(4) 12 bit:

This is used to record different ids generated in the same millisecond.

The largest positive integer represented by 12 bits is 2 ^ 12-1 = 4096, which means you can use the 12 bits to distinguish 4096 different ids in the same millisecond. Theoretically, snowflake’s QPS is about 409.6W /s, which ensures that any machine in any IDC will generate a different ID in any millisecond.

In simple terms, one of your services assumes that you want to generate a globally unique ID, so you can send a request to the system that has the SnowFlake algorithm deployed, and the SnowFlake algorithm system generates the unique ID.

  • 1) The SnowFlake algorithm system must know the machine room and machine where it is located, for example, machine ID = 17, machine ID = 12;
  • 2) When the SnowFlake algorithm receives the request, it first generates a 64-bit long ID using binary arithmetic. The first of the 64 bits is meaningless.
  • 3) After 41 bits, you can use the current timestamp (in milliseconds), then 5 bits to set the machine room ID, and 5 bits to set the machine ID;
  • 4) Finally, determine the number of requests in this millisecond on the machine in the current machine room, and add a sequence number to the request to generate the ID as the last 12 bits.

The result is a 64-bit ID that looks something like this:

(This figure is quoted from “IM Message ID Technical Topic (4) : Distributed ID Generation Algorithm for Deep Decryption of Meituan”)

This algorithm ensures that a unique ID is generated in the same millisecond on a single machine in a machine room. It is possible to generate multiple ids in a millisecond, but with the ordinal number of the last 12 bits to distinguish them.

Let’s take a quick look at a code implementation of the SnowFlake algorithm. This is an example. Once you understand what this means, you can try to modify the algorithm for yourself.

In a word, each bit of a 64 bit number is used to set different flag bits to distinguish each ID.

4.2 Code implementation of SnowFlake algorithm

A typical Java implementation of the SnowFlake algorithm can be seen in the article section “6.5 Solution 4: Analysis of the SnowFlake algorithm’s Ideas” : “Easy to Understand: How to Design a Database architecture that can support millions of concurrent applications”. Is the code Jack Jiang actually used in a project.

4.3 Advantages and disadvantages of the SnowFlake algorithm

The advantages and disadvantages of the SnowFlake algorithm for distributed business systems are as follows.

Excellent advantages:

1) The number of milliseconds is high, the increment sequence is low, and the whole ID is trending upward;

2) Independent of third-party systems such as databases, deployed in the way of services, with higher stability and high performance of ID generation;

3) It can allocate bits according to its own service characteristics, which is very flexible.

Has disadvantages:

If the clock on the machine is dialed back, the number may be repeated or the service may be unavailable.

5. UidGenerator improved SnowFlake algorithm

In the previous section, we learned the basics of the original SnowFlake algorithm.

Specifically, the core components of the original SnowFlake algorithm are:

The specific meanings of the fields in the original SnowFlake algorithm are:

1) 1-bit sign Identifies the bit;

2) 41-bit timestamp;

3) 10-bit workId (i.e. 5-bit data center ID + 5-bit work machine ID);

4) 12-bit increment sequence.

The core composition of UidGenerator’s improved SnowFlake algorithm is shown as follows:

In short, the UidGenerator guarantees that”

Specifies machine & the same time & a concurrent sequence

“, is unique, and generates a 64-bit unique ID (long) based on this. The byte allocation method shown in the preceding figure is used by default.

Unlike the original Snowflake algorithm, UidGenerator also supports custom time stamps, work machine ids, and serial numbers for different scenarios (see source code implementation).

As shown in the figure above, the meanings of data bits in the default ID of UidGenerator are as follows:

  • 1) sign (1 -) :

    Fixed the 1bit symbol identifier, that is, the generated UID is a positive number.

  • 2) Delta seconds (28 bits) :

    Current time, incremental value relative to time base point “2016-05-20”, in seconds, up to about 8.7 years (note: (a) here is in seconds, not milliseconds! (b) Support for up to 8.7 years.

  • 3) Worker ID (22 bits) :

    Machine ID, which can support up to 420W machine startup. The built-in implementation is allocated by the database at startup, the default allocation policy is deprecated after use, and the subsequent reuse policy can be provided.

  • 4) Sequence (13 bits) :

    A 13-bit QPS sequence supports 8192 concurrent requests per second (note the default QPS maximum of 8192).

6. Specific code implementation analysis of UidGenerator

As you can see from reading the UidGenerator source code, there are two options for implementing UidGenerator: DefaultUidGenerator and CachedUidGenerator. Let’s take a look at the subtleties of the two specific code implementations.

6.1 DefaultUidGenerator

The source code for DefaultUidGenerator clearly states the implementation logic for several key bits that generate ids.

1) Delta seconds (28 bits) :

This value refers to the time difference between the current time and the epoch time in seconds. The epoch time refers to the time when the distributed ID service will go online for the first time by integrating DefaultUidGenerator. It is configurable and must be configured according to your online time, because the default epoch time is 2016-09-20. Otherwise, several years of available time will be wasted.

2) Worker ID (22bits) :

DefaultUidGenerator assigns a value to the worker ID. Create a table for DefaultUidGenerator:

DROP DATABASE IF EXISTS `xxxx`;
CREATE DATABASE `xxxx` ;
use `xxxx`;
DROP TABLE IF EXISTS WORKER_NODE;
CREATE TABLE WORKER_NODE
(
ID BIGINT NOT NULL AUTO_INCREMENT COMMENT 'auto increment id',
HOST_NAME VARCHAR(64) NOT NULL COMMENT 'host name',
PORT VARCHAR(64) NOT NULL COMMENT 'port',
TYPE INT NOT NULL COMMENT 'node type: ACTUAL or CONTAINER',
LAUNCH_DATE DATE NOT NULL COMMENT 'launch date',
MODIFIED TIMESTAMP NOT NULL COMMENT 'modified time',
CREATED TIMESTAMP NOT NULL COMMENT 'created time',
PRIMARY KEY(ID)
)
COMMENT='DB WorkerID Assigner for UID Generator', ENGINE = INNODB;
Copy the code

DefaultUidGenerator inserts a row of data into the table when the instance that integrates it to generate the distributed ID is started, and the resulting ID value is the one that will be assigned to the workerId. Since the workerId defaults to 22 bits, all instances that integrate DefaultUidGenerator to generate a distributed ID are not allowed to restart more than 4194303 times (that is, 2^22-1), otherwise an exception will be thrown.

3) Sequence (13bits) :

The core code is as follows, several key points to achieve:

A. synchronized Ensures thread safety;

B. If there is any callback, throw an exception.

C. If the current time is the same as the last time, sequence increases automatically. If the increment exceeds 2^13-1 in the same second, the spin waits for the next second (getNextSecond);

D. If the time is the new second, sequence starts from 0 again.

(The above source code is extracted from the nextId() method of DefaultUidGenerator.)

4) Summary:

Through the implementation of DefaultUidGenerator, it can be seen that the processing of clock callback is simple and crude. In addition, if you use the DefaultUidGenerator mode of UidGenerator to generate distributed ID, be sure to adjust the number of bits occupied by each field according to your business situation and characteristics:

<! -- Specified bits & epoch as your demand. No specified the default value will be used --><property name="timeBits" value="29"/><property name="workerBits" value="21"/><property name="seqBits" value="13"/><property name="epochStr" value="2016-09-20"/>Copy the code

6.2 CachedUidGenerator

CachedUidGenerator is an important improved implementation of DefaultUidGenerator. Its core makes use of RingBuffer, which is essentially an array with each item in the array called slot. The CachedUidGenerator designs two ringBuffers, one for the unique ID and one for the flag. RingBuffer has a size of 2^n, and n must be a positive integer.

Here is a schematic of RingBuffer in CachedUidGenerator:

Expanded knowledge: What is RingBuffer?

The concept of the Ring Buffer, actually derived from the Linux kernel (Maybe), provides a lock-free solution to the problem of contention in some special cases. This special case is when there is only one producer and one consumer, and it must be locked to use it in other cases.

Circular buffers usually have a read pointer and a write pointer. Read Pointers point to readable data in the ring buffer, and write Pointers point to writable buffers in the ring buffer. Data can be read and written to the buffer by moving the read and write Pointers. In general, the read user of the ring buffer only affects the read pointer, and the write user only affects the write pointer. If there is only one read user and one write user, there is no need to add mutex protection to ensure data correctness. If more than one user accesses the ring buffer, then mutual exclusion protection must be added to ensure that multiple users mutually exclusive access to the ring buffer.

For more details on the CachedUidGenerator code, please refer to baidu’s UID-Generator documentation page to see how the algorithm works.

So just to summarize briefly,CachedUidGeneratorThe following measures and schemes are adopted to avoid the clock callback problem and enhance the uniqueness:

  • 1) Autoincrement column: The workerId of the CachedUidGenerator is initialized with each instance restart and is the autoincrement ID of the database, so that there is no conflict between the workerIDS obtained by each instance;
  • 2) RingBuffer: Instead of calculating distributed ids in real time each time an ID is fetched, the CachedUidGenerator uses the RingBuffer data structure to pre-generate several distributed ids and save them;
  • 3) Time increment: Traditional SnowFlake implementations use System.currentTimemillis () to get the time and compare it to the last time, which is heavily dependent on the server’s time. CachedUidGenerator is AtomicLong and takes the next time from the server by incrementAndGet(), so there is no clock callback. (There is a small problem with this, too, That is, the time information in the distributed ID may not be the actual time at which the ID was generated, for example: {“timestamp”:”2019-05-02 23:26:39″,”workerId”:”21″,”sequence”:”1″}, {“timestamp”:”2019-05-02 23:26:39″,” sequence”:”1″} However, this ID may not be generated at the time of “2019-05-02 23:26:39”.

6.3 Summary

The CachedUidGenerator pregenerates a list of unique ids by caching to avoid the time-consuming process of obtaining unique ids. The downside of this approach is that it takes a lot of memory to cache this data, and if there is not much traffic, the timestamp in the UID generated in advance may be very old. For most scenarios DefaultUidGenerator will suffice, and there is no need to be a CachedUidGenerator.

In addition, suggestions for UidGenerator bit allocation:

For applications that do not require high concurrency and are expected to be used for a long time, you can increase the number of timeBits and decrease the number of seqBits. For example, if {“workerBits”:23,”timeBits”:31,”seqBits”:9} is configured, the WorkerIdAssigner policy is discarded and the restart frequency is 12 times/day. Support 28 nodes running for 68 years at an overall concurrency rate of 14,400 UID/s.

For applications that restart nodes frequently and are expected to be used for a long time, you can increase the workerBits and timeBits bits and reduce the seqBits bits. For example, if {“workerBits”:27,”timeBits”:30,”seqBits”:6} is configured to the WorkerIdAssigner policy and the restart frequency is 24 x 12 times per day, It can support 37 nodes running at an overall concurrency of 2400 UID/s for 34 years.

7. Throughput pressure test of UidGenerator

The UidGenerator test data shows the following UID throughput tests for CachedUidGenerator (single instance) on a MacBook Pro (2.7ghz Intel Core I5, 8G DDR3).

First, set workerBits to any value (e.g., 20), and count the throughput when timeBits change (e.g., from 25 to 32, the total duration corresponds to 1 year and 136 years respectively). The test results are shown in the following figure:

Then set timeBits to any value (for example, 31) and count the throughput of workerBits when they change (for example, from 20 to 29, the total number of restarts corresponds to 1 million and 500 million respectively). The test results are shown as follows:

No matter how configured, the CachedUidGenerator provides a stable throughput of 6 million /s with a reduced lifetime, which is great!

Finally, fix workerBits and timeBits (e.g., 23 and 31), and count the throughput of UID users with different numbers (e.g., 1 to 8, local CPU cores 4). The test results are as follows:

8. Reference materials

[1] Improved Snowflake Global ID generator -UID-generator

[2] UidGenerator

[3] UID-Generator source code analysis for Baidu Open source distributed ID generator

Appendix: More articles on hot technologies for IM development


Just one entry for beginners: Developing mobile IM from scratch




Mobile IM Developers must read (1) : Easy to understand the “weak” and “slow” mobile web




Mobile IM Developers must read (ii) : Summary of the most comprehensive mobile weak Network optimization methods ever




This paper discusses the message reliability and delivery mechanism of mobile IM from the perspective of client




Summary of optimization means of modern mobile terminal network short connection: request speed, weak network adaptation, security guarantee




Tencent technology sharing: the evolution of bandwidth compression technology for social network pictures




Session and Token in HTTP short connections




IM Development Basics Tutorial: Understand the principle of the front HTTP SSO single sign-on interface




How to ensure the efficiency and real time of pushing large-scale group messages in MOBILE IM?




The technical issues that mobile IM development needs to face




Is it better to design your own protocol using byte streams or character streams to develop IM?




Does anyone know the mainstream implementation of voice message chat?




Realization of IM message delivery guarantee mechanism (I) : to ensure the reliable delivery of online real-time messages




Realization of IM message delivery guarantee mechanism (II) : Ensure the reliable delivery of offline messages




How to ensure the “timing” and “consistency” of IM real-time messages?




A low cost method to ensure IM message timing




Should I use “push” or “pull” to synchronize online status in IM chat and group chat?




IM group chat messages are so complex, how to ensure that not lost and not heavy?




Talk about optimizing login requests in mobile IM development




How to save traffic by pulling data during IM login on the mobile terminal?




Brief introduction to the principle of multi-point login and message roaming in MOBILE IM




How to design a “retry failure” mechanism for a completely self-developed IM?




Easy to understand: Load balancing solution sharing at the mobile IM access layer based on a cluster




Technical Experiment and Analysis of wechat’s Influence on Network




Principle, Technology and Application of Instant Messaging System (Technical Paper)




The status of open source IM project “Mogujie TeamTalk” : An open source show with no end




QQ Music team share: Android image compression technology in detail (part 1)




QQ Music team share: Android image compression technology details (part 2)




Tencent original share (a) : how to greatly improve the mobile QQ picture transmission speed and success rate under the mobile network




How to greatly reduce APP traffic Consumption in mobile Network (Part 1)




(3) How to greatly reduce APP traffic Consumption in mobile Network (Part 2)




As promised: Mars, a cross-platform component library for wechat’s mobile IM network layer, has been officially open source




How does Social network-based Yelp achieve lossless compression of massive user images?




Tencent technology Sharing: How Tencent dramatically reduced bandwidth and Network traffic (photo compression)




How Tencent dramatically reduced bandwidth and Network traffic




Character encoding thing: quick understanding of ASCII, Unicode, GBK, and UTF-8




Fully master the features, performance and tuning of mainstream image formats on mobile terminals




Bullet SMS behind the bright: netease yunxin chief architect to share 100 million IM platform technology practice




IM development basics tutorial (five) : easy to understand, correctly understand and use MQ message queues




Wechat Technology Sharing: Practice of generating serial number of wechat’s Massive IM Chat Messages (Algorithm Principle)




Is it so hard to develop IM yourself? Hand-in-hand teach you from a simple Andriod version OF IM (source code)




Rongyun technology sharing: decrypting the chat message ID generation strategy of Rongyun IM products




IM development basic knowledge make-up lesson (six) : database with NoSQL or SQL? Just read this!




For beginners: Developing an IM server from scratch (based on Netty, with complete source code)




Pick up the keyboard: Develop a distributed IM system freehand with me




How does the “People nearby” function work in IM? How can it be done efficiently?




Basic KNOWLEDGE of IM development tutorial (7) : the principle and design idea of the main mobile terminal account login method




IM development of basic knowledge complement (eight) : the history of the most popular, thoroughly understand the nature of the character garbled problem




IM “Scan” feature easy to do? Take a look at the full technical implementation of wechat’s “Scan knowledge”




How to implement the IM scan login function? One article to understand the mainstream application of scanning code login technology principle




IM to do mobile phone scan code login? Let’s take a look at the technical principle of the scanning code login function of wechat

>>
More of the same…

(This article is simultaneously published at: www.52im.net/thread-2953…)