CosIdGeneral-purpose, flexible, high-performanceA distributed idsThe generator
Introduction to the
CosId is designed to provide a universal, flexible, and high-performance distributed ID generator. There are currently two classes of ID generators:
SnowflakeId
: Single-machine TPS performance: 409W/s JMH benchmarks, the main solutionThe clock is set back 、Machine number assignment problemAnd provide a more friendly and flexible use experience.SegmentId
: Get one paragraph at a time (Step
) ID to reduce the frequency of network I/O requests to the number dispenser to improve performance.IdSegmentDistributor
: number dispenser (number storage)RedisIdSegmentDistributor
Based on:RedisThe number dispenser of.JdbcIdSegmentDistributor
Based on:JdbcNumber segment dispenser, support a variety of relational databases.
SegmentChainId
(recommended) :SegmentChainId
(lock-free) is toSegmentId
The enhanced. Approximate performance can be achievedAtomicLong
的 TPS performance: + 12743 w/s JMH benchmarks 。PrefetchWorker
Maintain a safe distance (safeDistance
) and supports dynamics based on hunger statussafeDistance
Expansion/contraction.
Background (WhyA distributed ids)
In the process of software system evolution, with the growth of business scale, we need to cluster deployment to share computing and storage pressure, we can easily achieve stateless, elastic scaling of application services. But is simply increasing the number of copies of services enough? Obviously not enough, because the performance bottleneck is often at the database level, so this time we need to consider how to expand, scale, cluster the database, usually using the database, sub-table way to deal with. So how do I shard (horizontal shard, and of course vertical shard but that’s not the topic of this article), shard is that we have to have an ID, and then we can shard according to the shard algorithm. (For example, the simple and commonly used ID modulo sharding algorithm, which is similar to the concept of Hash algorithm, we need to have the key to Hash the slot.)
Of course, there are many distributed scenarios that require distributed ids, and I won’t list them all here.
Core metrics for distributed ID schemes
- Global (same business) uniqueness: The uniqueness guarantee isIDIt’s easy to understand that primary key conflicts will occur if the ID is not unique.
- Generally speaking, global uniqueness does not mean that all business services should be unique, but that different deployed copies of the same business service should be unique. For example, multiple deployment copies of the Order service are being generated
t_order
This tableId
Is required to be globally unique. As for thet_order_item
The generatedID
witht_order
It doesn’t affect the uniqueness constraint, it doesn’t have any side effects. The same is true for different business modules. That is, uniqueness mainly solves the problem of ID conflicts.
- Generally speaking, global uniqueness does not mean that all business services should be unique, but that different deployed copies of the same business service should be unique. For example, multiple deployment copies of the Order service are being generated
- order: Order assurance is required for query-oriented data structure algorithms (except Hash algorithms), isBinary searchDivide and conquer.
- Mysq-innodb B+ tree is the most widely used, assuming that Id is unordered, B+ tree will frequently insert the Id in the middle of the index and move the position of the node behind to maintain the order, and even cause frequent page splitting, which has a great impact on performance. So if we were able to keep the ids in order it would be a completely different situation, just append. So the ordering of IDS is very important and an inevitable feature of ID design.
- Throughput/Performance (OPS/Time): Indicates the number of ids generated per unit time (per second). Generating an ID is a very high-frequency operation, and it’s also the most basic. Given the slow performance of ID generation, no amount of system tuning will yield better performance.
- It is not hard to understand why, assuming that ID generation is slow, the overall performance ceiling will be limited.
- Stability (time/op): Generally, the stability index can be adoptedThe time of each operation is sampled at the percentile levelTo analyze, for exampleCosIdPercentile samplingP9999 = 0.208 us/op, i.e.,0% ~ 99.99%The unit operation time is less than or equal to0.208 us/op.
- Percentile WIKI: A statistical term used to describe the percentile of a percentage point when a set of data is sorted from smallest to largest and the corresponding cumulative percentile is calculated. Pk is the KTH percentile. The percentile is used to compare the relative status of individuals in a population.
- Why not averageTime per operation: Is Mr. Ma’s price equal to yours? Does the average value make sense?
- You can use the minimumTime per operationAnd the biggestTime per operationFor reference? Because the minimum and maximum only explain the case of the zero bound point, although they can be used as a reference for stability, they are still not comprehensive. andpercentileI’ve covered both of these.
- Autonomy (dependence): Is there any dependence on the external environmentThem roughly modeWill strongly rely on third-party storage middleware to obtain
NexMaxId
. Autonomy also has an impact on usability. - availability: The availability of distributed ids is mainly affected by autonomy, such asSnowflakeIdThe clock is dialed back, and the system becomes unavailable for a short time. whileThem roughly modeWill be subject to third party transmitters (
NexMaxId
).- Usability WIKI: The percentage of the total available time for an individual feature at a given time interval.
- MTBF: mean fault interval
- MDT: Average repair/recovery time
- Availability=MTBF/(MTBF+MDT)
- Suppose MTBF is 1 year and MDT is 1 hour, i.e
The Availability = (365 * 24)/(365 * 24 + 1) = 0.999885857778792 99.99% material
That’s what we call the four nines for usability.
- adaptive: refers to the adaptive ability in the face of changes in the external environment. Here, we mainly talk about the performance of dynamically scaling distributed IDS in the face of traffic burst.
- The SegmentChainId can dynamically scale the safe distance based on the starvation state.
- SnowflakeId regular bit allocation scheme has a constant performance of 409.6W. Although it is possible to adjust the bit allocation scheme to obtain different TPS performance, the change of bit allocation method is destructive. Generally, the bit allocation scheme is determined according to the business scenario and will not be changed.
- Storage space: using mysq-Innodb B+ tree as an example, normal indexes (secondary indexes) will store primary keys. The larger the primary key, the more memory cache and disk space will be occupied. Page The less data the Page stores, the more times the disk I/O is accessed. In general, keeping the storage footprint as small as possible is a good design principle in most scenarios when the business needs are met.
Comparison of core indicators of different distributed ID schemes
Order (divide and conquer, dichotomy search, must maintain me)
We’ve just discussed the importance of ID ordering, so we should design our ID algorithm so that ids are monotonically increasing as much as possible, like the primary key of a table is incrementing. Unfortunately, because of distributed system problems such as global clock and performance, we often have to choose a combination of locally monotonically increasing and globally trending (just as we have to choose ultimate consistency in distributed systems) to get multiple trade-offs. Let’s take a look at what is monotonically increasing and trending increasing.
Monotonically increasing orderliness
Monotonically increasing: T represents the global absolute time point, assuming that Tn+1>Tn (absolute time is always forward, and relativity, time machine, etc., are not considered here), then F(Tn+1)>F(Tn) is inevitable, and the auto-increasing primary key of the database belongs to this category. It is also important to note that monotone increment and continuous increment are different concepts. Continuous increment: F(n+1)=(F(n)+step), that is, the next ID must be equal to the current ID+ step, when step =1, similar to such a sequence :1->2->3->4->5.
If the primary key of a database is not incrementing continuously, you must have experienced this situation. Why is the database designed this way?
The tendency of order increases
Increasing trend: Tn>Tn-s, then the large probability is F(Tn)>F(Tn-s). Although there are disordered intervals, the overall trend is increasing. From the diagram above, there is an upward trend (trend line).
- inSnowflakeIdIn then-sGlobal clock synchronization is affected. Procedure
- In segment mode (SegmentId)n-sSubject to number segment availability interval (
Step
).
Distributed ID allocation scheme
UUID/GUID
- : ThumbSup: Does not rely on any third party middleware
- : thumbsup: high performance
- : Thumbsdown: Completely unordered
- : Thumbsdown: Large footprint, requiring 128 bits of storage.
The biggest problem with UUIds is that they are random, unordered, and when used for primary keys can cause inefficient primary key indexes in databases (frequently inserting data in the middle of the index, rather than appending, to maintain the index tree). This is the most important reason why UUID does not apply to database primary keys.
SnowflakeId
SnowflakeId is a distributed ID algorithm that uses Long (64-bit) bit partitions to generate ids. The general bit allocation scheme is timestamp(41-bit)+machineId(10-bit)+sequence(12-bit)=63-bit.
- 41-bit
timestamp
=(1L<<41)/(1000/3600/365), about 69 years of time stamps can be stored, that is, the absolute time that can be used isEPOCH
+69 years, usually we need customEPOCH
For product development time, it is also possible to increase the number of timestamp bits by squeezing the allocation bits in other regions to extend the available time. - 10-bit
machineId
=(1L<<10)=1024, that is, 1024 copies of the same service can be deployed (in Kubernetes concept, there is no master and slave copies, here directly use the definition of Kubernetes). It is generally not necessary to use this number of digits, so it will be redefined depending on the deployment size. - 12-bit
sequence
=(1L<<12)*1000=4096000, that is, a single machine can generate about 409W ids per second, and a global service cluster can generate ids4096000 * 1024 = 419430 w = 4.19 billion (TPS)
.
From the SnowflakeId design:
- :thumbsup:
timestamp
At the high level, single instanceSnowflakeIdIs will ensure that the clock is always forward (check the local clock back), so is the local monotone incrementing. This parameter is affected by global clock synchronization/clock rollbackSnowflakeIdThe global trend is increasing. - :thumbsup:SnowflakeIdThere is no strong dependency on any third party middleware and performance is very high.
- : ThumbSup: Bit assignment solutions can be flexibly configured according to business system needs for optimal use.
- : ThumbsDown: Strong reliance on the native clock, with potential clock back issues resulting in duplicate IDS and a brief unavailable state.
- :thumbsdown:
machineId
This parameter must be set manually. In actual deployment, this parameter is requiredmachineId
, would be very inefficient.
SnowflakeId machine number assignment problem
The bit allocation scheme in SnowflakeId is determined according to the business design and there is little change and maintenance required. However, machineIDS always need to be configured and cannot be duplicated in the cluster. Otherwise, the partitioning principle will be broken and the ID uniqueness principle will be broken. When the cluster size is large, the maintenance of Machineids is very tedious and inefficient.
It should be noted that SnowflakeId’s MachineId is a logical concept, not a physical one. Imagine that the MachineId is physical, which means that a machine can only have one MachineId. What’s the problem?
Currently, CosId provides the following three MachineId allocators.
- ManualMachineIdDistributor: manual configuration
machineId
It is not recommended for use only when the cluster size is very small. - Use StatefulSetMachineIdDistributor:
Kubernetes
theStatefulSet
The stable ID (HOSTNAME=service-01) is provided as the machine number. - Use RedisMachineIdDistributor:RedisAs the distribution store of the machine number, it will also store
MachineId
The last time stamp ofThe clock goes back on startupThe inspection.
SnowflakeId clock rollback problem
The fatal problem with clock backtracking is (understandably) ID duplication, collision, and ID duplication is clearly not tolerated. In the SnowflakeId algorithm, partition IDS are based on machineIDS. It is not difficult to understand that different Machineids cannot produce the same ID. So the clock rollback problem we solved is the clock rollback problem of the current MachineId, not all cluster nodes.
The MachineId clock back problem can be divided into two cases:
- Run time clock back: that is, the current timestamp obtained at run time is smaller than the last timestamp obtained. The clock rollback for this scenario is easy to handle, in generalSnowflakeIdThe code is stored as it is implemented
lastTimestamp
Check for clock rollback at run time and throw clock rollback exceptions.- Throwing an exception when the clock goes back is not a good practice because there are few other options for the downstream user (oh, what else can I do, wait), clock synchronization is the only option, and don’t let the user choose when there is only one option.
ClockSyncSnowflakeId
isSnowflakeId
Which is used when a clock rollback occursClockBackwardsSynchronizer
Actively wait for clock synchronization to regenerate the ID, providing a more user-friendly experience.
- Boot-time clock rollback: The current clock obtained when the service instance is started is smaller than the last time the service was stopped. At this time
lastTimestamp
Cannot be stored in process memory. When access to external storageMachine statusIs used when the value is greater than the current clockClockBackwardsSynchronizer
Active clock synchronization.- LocalMachineStateStorage: Use local file storage
MachineState
(Machine number, last timestamp). Because of the use of local files, only if the instance’s deployment environment is stable,LocalMachineStateStorage
To apply. - RedisMachineIdDistributor:
MachineState
Stored in theRedisDistributed cache, which ensures that the last time a service instance was down is always availableMachine status.
- LocalMachineStateStorage: Use local file storage
JavaScript value overflow from SnowflakeId
MAX_SAFE_INTEGER is only 53-bit. If you return a 63-bit SnowflakeId directly to the front end, you will overflow the value. It’s just that snowflakeids appear faster). Obviously overflow is not acceptable, and there are two possible solutions:
- The 63-bit that will be generated
SnowflakeId
convertString
Type.- Directly to the
long
Converted toString
. - use
SnowflakeFriendlyId
willSnowflakeId
To a friendlier string representation:{timestamp}-{machineId}-{sequence} -> 20210623131730192-1-0
- Directly to the
- The custom
SnowflakeId
Bit allocation to shortenSnowflakeId
The number of digits (53-bit) enablesID
Does not overflow when supplied to the front end- use
SafeJavaScriptSnowflakeId
(JavaScript
The safety ofSnowflakeId
)
- use
Segment mode (SegmentId)
From the above design diagram, it is not difficult to see that the basic design idea of number segment mode is to obtain available ID (ID segment/number segment) of a certain length (Step) each time to reduce the number of NETWORK I/O requests and improve performance.
- : ThumbsDown: Strong reliance on 3rd party number dispenser, usability is affected by 3rd party number dispenser.
- :thumbsdown: Fetched every time the number segment is used up
NextMaxId
Network I/O requests are required, resulting in low performance. - The single instance ID increases monotonically, and the global trend increases.
- It’s not hard to see from the blueprintsInstance 1Every time acquired
NextMaxId
, must be larger than the last time, which means that the next number segment must be larger than the last time, so it is monotonically increasing from the single instance. - Multiple instances each hold different number segments, which means that the ids generated by different instances at the same time are out of order, but the overall trend is increasing, so the global trend is increasing.
- It’s not hard to see from the blueprintsInstance 1Every time acquired
- The degree of ID disordering is affected by Step length and cluster size (as can be seen from the increasing trend graph).
- If there is only one instance in the cluster, the number segment pattern is monotonically increasing.
Step
The smaller it is, the smaller the disorder. whenStep=1
, will be infinitely close to monotonically increasing. It’s important to note that this is infinitely close rather than monotonically increasing, because you can think of a scenario like this:- Number dispenser T1Moment toInstance 1Distribution of the
ID=1
,T2Moment toInstance 2Distribution of theID=2
. Because of machine performance, network, etc.,Instance 2
Network IO write request precedesInstance 1
To arrive. At this point, the ids are still out of order for the database.
- Number dispenser T1Moment toInstance 1Distribution of the
SegmentChainId
The SegmentChainId is an enhanced version of the SegmentId. It has the following advantages over the SegmentId:
- Stability:SegmentId(P9999=46.624(US /op)) is mainly due to the synchronization after the number segment is used up
NextMaxId
(which produces network IO).- SegmentChainId(P9999=0.208(US /op)) Introduced new rolesPrefetchWorkerFor maintenance and warrantysafe distance, ideally so that the thread getting the ID does not need to wait for synchronization almost at all
NextMaxId
The performance can be approximatedAtomicLong
的 TPS performance: + 12743 w/s JMH benchmarks 。
- SegmentChainId(P9999=0.208(US /op)) Introduced new rolesPrefetchWorkerFor maintenance and warrantysafe distance, ideally so that the thread getting the ID does not need to wait for synchronization almost at all
- Adaptability: FromSegmentIdIn the introduction we know the effectID out-of-orderCluster size,
Step
Size. Cluster size is out of our control, butStep
It’s adjustable.Step
It should be close to possible to makeID monotonically increasingIs more likely to occur.Step
Too little will affect throughput, so how do we set it properlyStep
? The answer is that we cannot accurately estimate throughput requirements at all points in time, so the best approach is to automatically increase Step when throughput requirements are high and automatically shrink Step when throughput is low.- SegmentChainId introduces the concept of starvation state. PrefetchWorker will detect whether the current safety distance needs to expand or shrink according to the starvation state in order to achieve the tradeoff between throughput and order. This is the adaptiveness of SegmentChainId.
SegmentChainId- Throughput (ops/s)
RedisChainIdBenchmark-Throughput
MySqlChainIdBenchmark-Throughput
SegmentChainId- The percentile of time taken for each operation
RedisChainIdBenchmark-Sample
MySqlChainIdBenchmark-Sample
Description of the benchmark report running environment
- Benchmark running environment: MacBook Pro-(M1)
- All benchmarking is performed on the development notebook.