preface

In the Internet business system, there are various IDS involved, such as payment ID and refund ID in the payment system. So what are the general solutions for generating ids? Especially in the complex business scenario of distributed systems, it is very important for us to adopt the right solution for ourselves. We’ll list them one by one, not all of them will fit, but these solutions are just for your reference and may work for you. You can follow the public account “Java Architect Secrets”

The body of the

Features of distributed ids

Uniqueness: Ensure that the generated ID is unique across the network. Sequential increment: Ensures that the generated ID is sequential increment for a user or service. High availability: Ensure that ids are generated correctly at all times. With time: ID contains time, a glance to know which day of the transaction.

1. The core idea of the UUID algorithm is to generate A UUID by combining the machine’s network card, local time, and a random number.

Advantages: Local generation, simple generation, good performance, and no high availability risk Disadvantages: Excessive length, redundant storage, unordered and unreadable storage, low query efficiency

2. Automatic increment OF the DATABASE ID Use the automatic increment of the DATABASE ID policy, for example, auto_increment of MySQL. In addition, the two databases can be used to set the asynchronous length and generate a policy of non-duplicate ids to achieve high availability.

Advantages: The ids generated by the database are in absolute order, and the high availability mode is simple. Disadvantages: The database instance needs to be deployed independently, which is costly and has performance bottlenecks

3. Generate ids in batches multiple ids are generated in batches. Each generation requires access to the database, change the database to the maximum ID value, and record the current value and maximum value in the memory.

Advantages: Avoids the pressure of accessing the database every time an ID is generated, and improves performance. Disadvantages: A local generation policy has a single point of failure, and ids are discontinuous due to service restart

All command operations in Redis are single-threaded and provide auto-increment atomic commands such as INCr and Increby, so the generated ids are guaranteed to be uniquely ordered.

Advantages: independent of database, flexible and convenient, and better performance than database; Numeric ID natural sorting, helpful for pagination or results that need sorting.

Disadvantages: If there is no Redis in the system, new components need to be introduced to increase the system complexity; The amount of coding and configuration required is considerable.

Given the performance bottleneck of a single node, Redis clustering can be used to achieve higher throughput. Suppose there are five Redis in a cluster. You can initialize each Redis with a value of 1, 2, 3, 4, 5, and then step size of 5. The ID generated by each Redis is:

A: 1, 6, 11, 16, 21 B: 2, 7, 12, 17, 22 C: 3, 8, 13, 18, 23 D: 4, 9, 14, 19, 24 E: 5, 10, 15, 20, 25Copy the code

Random load to which machine is determined, it is difficult to modify in the future. Step sizes and initial values must be determined in advance. Redis clustering can also be used to address single point of failure issues. In addition, it is good to use Redis to generate daily serial numbers starting from 0. For example, order number = date + daily growth number. You can generate a Key every day in Redis and use INCR to accumulate. Twitter implements a global ID generated service using ZooKeeper: Snowflake: github.com/twitter/sno… P1-jj.byteimg.com/tos-cn-i-t2… As you can see above, Twitter’s Snowflake algorithm consists of the following parts:

1 bit sign bit:

Since long is signed in Java, the highest bit is the sign bit, the positive number is 0, and the negative number is 1, and the ID used in the actual system is generally positive, so the highest bit is 0.

41 bit timestamp (millisecond level) :

Note that the 41-bit timestamp here is not the current timestamp, but the difference of the timestamp (current timestamp – start timestamp). The start timestamp here is usually the timestamp used by the ID generator, as specified by the program. So the 41-bit ms timestamp can be used at most (1 << 41)/(1000x60x60x24x365) = 69 years.

10 data machine bits:

Including five data identification bits and five machine identification bits, these 10 bits determine that a distributed system can deploy a maximum of 1 << 10 = 1024 S nodes. Any more than that, and the generated ids are likely to conflict.

12-bit ms sequence:

This 12-bit count allows each node to generate up to 1 << 12 = 4096 ids per millisecond (same machine, same time) that add up to exactly 64 bits, a Long.

Advantages: High performance, low latency, ordered by time, generally does not cause ID collisions Disadvantages: requires independent development and deployment, dependent on the machine clock

Simple implementation

Public class IdWorker {/** * start time: 2017-04-01 */ Private final Long epoch = 1491004800000L; Private final Long workerIdBits = 5L; /** * private final Long dataCenterIdBits = 5L; Private final Long maxWorkerId = ~(-1l << workerIdBits); private final Long maxWorkerId = ~(-1l << workerIdBits); Private Final Long maxDataCenterId = ~(-1 << dataCenterIdBits); private Final Long maxDataCenterId = ~(-1 << dataCenterIdBits); */ private final Long sequenceBits = 12L; */ private final Long workerIdShift = sequenceBits; */ Private final Long dataCenterIdShift = sequenceBits + workerIdBits; */ Private final Long timestampShift = sequenceBits + workerIdBits + dataCenterIdBits; */ Private final Long timestampShift = sequenceBits + workerIdBits + dataCenterIdBits; Private final Long sequenceMask = ~(-1l << sequenceBits); private final Long sequenceMask = ~(-1l << sequenceBits); /** * dataCenterId (0 ~ 31) */ private long dataCenterId; /** * private long workerId; /** * private long sequence; /** * private long lastTimestamp = -1l; public IdWorker(long dataCenterId, long workerId) { if (dataCenterId > maxDataCenterId || dataCenterId < 0) { throw new IllegalArgumentException(String.format("dataCenterId can't be greater than %d or less than 0", maxDataCenterId)); } if (workerId > maxWorkerId || workerId < 0) { throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId)); } this.dataCenterId = dataCenterId; this.workerId = workerId; } /** * get the nextId * @return snowflakeId */ public synchronized long nextId() {long timestamp = timeGen(); If (timestamp < lastTimestamp) {throw new; if (timestamp < lastTimestamp) {throw new RuntimeException(String.format("Clock moved backwards. Refusing to generate id for %d milliseconds", lastTimestamp - timestamp)); If (timestamp == lastTimestamp) {sequence = (sequence + 1) &sequencemask; If (sequence == 0) {timestamp = nextMillis(lastTimestamp); }} else {// The timestamp changes, the sequence reset sequence = 0L; } lastTimestamp = timestamp; / / shift and through a bitwise or operation up to 64 - bit ID return ((timestamp - epoch) < < timestampShift) | (dataCenterId < < | dataCenterIdShift) (workerId << workerIdShift) | sequence; } /** * returns the current time in milliseconds * @return current time (milliseconds) */ protected long timeGen() {return system.currentTimemillis (); } /** * blocks until the next millisecond, * @return current timestamp */ protected Long nextMillis(long lastTimestamp) {long timestamp = timeGen(); while (timestamp <= lastTimestamp) { timestamp = lastTimestamp; } return timestamp; }}Copy the code

UidGenerator UidGenerator is an open source distributed ID generator for Baidu, based on the implementation of the Snowflake algorithm, which looks and feels ok. However, the project maintenance of domestic open source is really worried. For details, please refer to the official website: github.com/baidu/uid-g… Leaf is an open source distributed ID generator of Meituan, which can ensure global uniqueness, trend increasing, monotonic increasing, and information security. It also refers to the comparison of several distributed schemes, but it also needs to rely on middleware such as relational database and Zookeeper. For details, please refer to the official website: tech.meituan.com/MT_Leaf.htm…

summary

This article shares several common scenarios for global ID generation services and compares their advantages, disadvantages and application scenarios. In practical work, you can choose the appropriate system based on your own business and system architecture. Welcome to add Q group: 230419550 to learn and discuss the advanced knowledge of architects

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Several generation schemes of distributed unique ids

preface

The body of the

summary

Several generation schemes of distributed unique ids

preface

The body of the

summary

Related Posts

Ali P7 big guy! ConcurrentHashMap source code: Too thorough

TAPD integration GitLab

Kafka summary