Distributed ID generation methods - Collected posts

Database auto-increment sequence or column

The most common approach. The database generates the ID, which is unique across that database.

Advantages:
- Simple, little code, acceptable performance.
- Numeric IDs sort naturally, which helps with pagination and with results that must be ordered.
Disadvantages:
- Syntax and implementation differ between databases, so database migrations or support for multiple database versions need special handling.
- With a single database, read/write splitting, or one master with multiple slaves, only the one master can generate IDs. This is a single point of failure.
- Hard to scale when performance falls short.
- Merging multiple systems or migrating data can be painful.
- Causes trouble when splitting tables and databases (sharding).
Optimization scheme:
- To remove the single master bottleneck, run multiple masters, give each master a different start number, and set the step size to the number of masters. For example: Master1 generates 1, 4, 7, 10; Master2 generates 2, 5, 8, 11; Master3 generates 3, 6, 9, 12. This produces unique IDs across the cluster and reduces the ID generation load on each database.
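
The start-and-step scheme above can be sketched in a few lines. This is only a simulation with in-memory generators, not database code; the names `stepped_ids` and `MASTER_COUNT` are made up for illustration. In MySQL the same idea roughly corresponds to the `auto_increment_offset` and `auto_increment_increment` settings.

```python
from itertools import islice
from typing import Iterator


def stepped_ids(start: int, step: int) -> Iterator[int]:
    """Yield start, start + step, start + 2 * step, ... like an
    auto-increment column with a per-master offset and a shared step."""
    current = start
    while True:
        yield current
        current += step


# Three hypothetical masters; the step equals the number of masters.
MASTER_COUNT = 3
masters = {
    f"Master{i}": stepped_ids(start=i, step=MASTER_COUNT)
    for i in range(1, MASTER_COUNT + 1)
}

# Take the first four IDs from each master.
first_four = {name: list(islice(gen, 4)) for name, gen in masters.items()}
for name, ids in first_four.items():
    print(name, ids)
```

Because every master starts at a different offset and all share the same step, the sequences never overlap. Adding a master later means changing the step on every existing master, which is the main operational cost of this scheme.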

UUID

A common approach. UUIDs can be generated by the database or by application code, and are globally unique for practical purposes.

Advantages:
- Simple, little code.
- ID generation is very fast, with almost no performance problems.
- Globally unique, so data migration, merging data from multiple systems, and database changes do not cause ID conflicts.
Disadvantages:
- Unordered; there is no guarantee of an increasing trend.
- UUIDs are usually stored as strings, which makes queries less efficient.
- Takes relatively more storage space. For a very large database, storage capacity has to be considered.
- More data to transmit.
- Not human-readable.
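
As a small illustration of the storage point, the sketch below uses Python's standard `uuid` module to compare the usual string form of a UUID with its raw binary form. The variable names are arbitrary.

```python
import uuid

uid = uuid.uuid4()    # random (version 4) UUID
as_text = str(uid)    # canonical hyphenated text form
as_bytes = uid.bytes  # raw binary form

print(as_text)
print(len(as_text))   # 36 characters when stored as a string
print(len(as_bytes))  # 16 bytes when stored as binary
```

Storing the 16-byte binary form instead of the 36-character string reduces the space cost, although the values remain unordered.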

Redis-generated ID

When generating IDs in the database is not fast enough, Redis can be used instead. This works because Redis executes commands on a single thread, so the atomic operations INCR and INCRBY can produce globally unique IDs. Multiple Redis instances can be used for higher throughput. Suppose there are five Redis instances in a cluster. Initialize them with the values 1, 2, 3, 4, 5 respectively, and use a step size of 5. The IDs generated by each Redis are:
- A: 1, 6, 11, 16, 21
- B: 2, 7, 12, 17, 22
- C: 3, 8, 13, 18, 23
- D: 4, 9, 14, 19, 24
- E: 5, 10, 15, 20, 25
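
The five-node example can be simulated without a Redis server. In the sketch below, `FakeRedis` is a hypothetical in-memory stand-in that only mimics the semantics of INCRBY (add to an integer key and return the new value); the key name `order:id` is also made up. A real client such as redis-py exposes a similarly named `incrby` call, but nothing here talks to Redis. One assumption to note: each counter is seeded at start minus step, so that the first INCRBY returns the start value.

```python
class FakeRedis:
    """In-memory stand-in for one Redis node (hypothetical, illustration only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def incrby(self, key, amount):
        # Mimics INCRBY: add amount to the stored integer, return the new value.
        self._data[key] = self._data.get(key, 0) + amount
        return self._data[key]


NODE_COUNT = 5
KEY = "order:id"  # hypothetical key name

# Node A starts at 1, node B at 2, ... node E at 5; all use step 5.
nodes = []
for start in range(1, NODE_COUNT + 1):
    node = FakeRedis()
    node.set(KEY, start - NODE_COUNT)  # seed so the first INCRBY yields `start`
    nodes.append(node)


def next_id(node):
    return node.incrby(KEY, NODE_COUNT)


ids_a = [next_id(nodes[0]) for _ in range(5)]
ids_e = [next_id(nodes[4]) for _ in range(5)]
print("A:", ids_a)
print("E:", ids_e)
```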

Advantages:
- Independent of the database, flexible and convenient, and faster than the database.
- Numeric IDs sort naturally, which helps with pagination and with results that must be ordered.
Disadvantages:
- If the system does not already use Redis, a new component has to be introduced, which increases system complexity.
- Requires a considerable amount of coding and configuration.

Twitter Snowflake

When migrating from MySQL to Cassandra, Twitter developed a globally unique ID generation service called Snowflake, because Cassandra has no sequential ID generation mechanism. A Snowflake ID is laid out as follows, from the highest bit down:
- 1 sign bit: the highest bit, always 0.
- 41-bit timestamp: millisecond precision; 41 bits last about 69 years.
- 10-bit machine ID: supports a maximum of 1024 nodes.
- 12-bit sequence number: supports 4096 IDs generated by each node every millisecond.
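
The bit layout above can be sketched as follows. This is a minimal illustration, not Twitter's implementation: the class name, the custom epoch (2020-01-01 UTC, chosen arbitrarily), and the injectable `clock` parameter are all assumptions made for the example. The sketch refuses to generate IDs when the clock moves backwards, which is one simple way to avoid duplicates.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # arbitrary custom epoch: 2020-01-01 00:00:00 UTC
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1  # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1   # 4095


class SnowflakeGenerator:
    def __init__(self, machine_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id out of range")
        self._machine_id = machine_id
        self._clock = clock  # returns the current time in milliseconds
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                | (self._machine_id << SEQUENCE_BITS)
                | self._sequence
            )


def decompose(snowflake_id):
    """Split an ID back into (timestamp_ms, machine_id, sequence)."""
    sequence = snowflake_id & MAX_SEQUENCE
    machine_id = (snowflake_id >> SEQUENCE_BITS) & MAX_MACHINE_ID
    timestamp_ms = (snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)) + EPOCH_MS
    return timestamp_ms, machine_id, sequence


generator = SnowflakeGenerator(machine_id=7)
new_id = generator.next_id()
print(new_id, decompose(new_id))
```

Because the timestamp occupies the high bits, IDs from one node increase over time, and IDs from different nodes are roughly time-ordered as long as their clocks are close.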

Advantages:
- High performance, low latency; runs as a standalone application.
- Ordered by time.
Disadvantages:
- Has to be developed and deployed as a separate service.
- If the host clock is set back, duplicate IDs can be generated.
- IDs are ordered but not consecutive.

MongoDB's ObjectId

MongoDB's ObjectId algorithm is similar to Snowflake's. It is designed to be lightweight, so that different machines can easily generate globally unique values in the same way. MongoDB was designed from the start as a distributed database, and handling multiple nodes was a core requirement; ObjectId makes ID generation much easier in a sharded environment.

An ObjectId uses 12 bytes of storage, laid out as follows:

| 0 1 2 3   | 4 5 6      | 7 8 | 9 10 11 |
| timestamp | machine ID | PID | counter |

This is the original layout; newer MongoDB versions replace the machine ID and PID fields with a 5-byte random value.

The first four bytes are a timestamp in seconds since the Unix epoch. It has the following properties:
1. The timestamp, combined with the 5 bytes that follow it, provides uniqueness at one-second granularity.
2. Because the timestamp comes first, ObjectIds sort roughly in insertion order.
3. The ObjectId implicitly records the creation time of the document.
4. The actual value of the timestamp is not important, so clocks do not need to be synchronized between servers. The machine ID and process ID keep the value unique, and uniqueness is the ultimate objective of ObjectId.

The machine ID identifies the server host and is usually a hash of the machine's host name. Multiple mongod instances can run on the same machine, so the process identifier (PID) is also required. Together, the first nine bytes ensure that ObjectIds generated by different processes on different machines in the same second are unique. The last three bytes are an auto-incrementing counter (each mongod process keeps one global counter) that ensures ObjectIds generated by the same process within one second are unique. Each process can generate at most 256^3 = 16777216 different ObjectIds in one second.
To summarize: the timestamp provides uniqueness at one-second granularity; the machine ID accounts for distributed deployment and avoids the need for clock synchronization; the PID distinguishes multiple mongod instances on the same server; and the counter guarantees uniqueness within the same second. The number of bytes given to each field balances storage efficiency against the upper limit on concurrency. The "_id" can be generated on either the server or the client; generating it on the client reduces load on the server.
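
The 12-byte layout described above (timestamp, machine ID, PID, counter) can be sketched as follows. This is not MongoDB's or any driver's actual code: the function names are hypothetical, SHA-256 is used as an arbitrary stable hash of the host name, and the counter starts at a random value as an assumption.

```python
import hashlib
import itertools
import os
import socket
import struct
import time

# Process-wide counter, started at a random 3-byte value (an assumption).
_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))


def make_object_id(now=None):
    """Build a 12-byte ID following the timestamp/machine/PID/counter layout."""
    seconds = int(time.time()) if now is None else now
    timestamp = struct.pack(">I", seconds)                               # bytes 0-3
    machine = hashlib.sha256(socket.gethostname().encode()).digest()[:3]  # bytes 4-6
    pid = struct.pack(">H", os.getpid() & 0xFFFF)                        # bytes 7-8
    counter = (next(_counter) & 0xFFFFFF).to_bytes(3, "big")             # bytes 9-11
    return timestamp + machine + pid + counter


def creation_time(object_id):
    """Recover the creation time (seconds since the Unix epoch) from bytes 0-3."""
    return struct.unpack(">I", object_id[:4])[0]


oid = make_object_id()
print(oid.hex(), creation_time(oid))
```

The big-endian timestamp in the leading bytes is what makes the IDs sort roughly by creation time when compared byte by byte.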

Many Chinese companies have built their own variants of the Snowflake algorithm. Most are further optimizations of Snowflake, for example to solve the clock rollback problem. For example:

Baidu uID-Generator: github.com/baidu/uid-g…
Meituan Leaf: github.com/zhuzhong/id…

In general, a distributed unique ID should meet the following requirements:

- High availability: no single point of failure.
- Global uniqueness: no duplicate IDs. This is the most basic requirement of a unique identifier.
- Increasing trend: MySQL's InnoDB engine uses a clustered index, and most RDBMSs store index data in B-tree structures, so primary keys should be as ordered as possible to keep write performance high.
- Time order: IDs sort by time, or contain a timestamp. This saves one index and makes it easy to separate hot and cold data.
- Sharding support: the shard ID can be controlled. For example, all articles of one user should land on the same shard, so that queries are efficient and modifications are easy.
- Monotonically increasing: the next ID is guaranteed to be greater than the previous one, as required for transaction version numbers, incremental IM messages, and sorting.
- Moderate length: not too long, preferably 64 bits. A 64-bit value fits in a long and is easy to handle, whereas a 96-bit value is inconvenient to shift and may not be supported by some components.
- Information security: if IDs are consecutive, malicious scraping becomes very easy, since a user can simply download the target URLs in order. For order numbers it is even more dangerous, because competitors can read off the daily order volume directly. Some application scenarios therefore require irregular, unpredictable IDs.
