CosId profile

CosId is designed to provide a universal, flexible, and high-performance distributed ID generator. There are currently two types of ID generators:

  • SnowflakeId : Single-machine TPS performance: 409W/sJMH benchmark, mainly addressedThe clock is set backMachine number assignment problemAnd provide a more friendly and flexible use experience.
  • SegmentId: Obtain a Step ID at a time to reduce the frequency of network I/O requests to the number dispenser and improve performance. IdSegmentDistributor: them roughly the dispenser () them roughly memory RedisIdSegmentDistributor: based onRedisThe number dispenser of. JdbcIdSegmentDistributor: based on theJdbcNumber segment dispenser, support a variety of relational databases. SegmentChainId(recommended):SegmentChainId (lock-free) is an enhancement to the SegmentId. The performance can approximate that of AtomicLongTPS performance: + 12743 w/sJMH benchmarks. PrefetchWorker maintains safeDistance and supports dynamic safeDistance expansion/contraction based on hunger state.

Background (why you need distributed IDS)

In the process of software system evolution, with the growth of business scale, we need to cluster deployment to share computing and storage pressure, we can easily achieve stateless, elastic scaling of application services. But is simply increasing the number of copies of services enough? Obviously not enough, because the performance bottleneck is often at the database level, so this time we need to consider how to expand, scale, cluster the database, usually using the database, sub-table way to deal with. So how do I shard (horizontal shard, and of course vertical shard but that’s not the topic of this article), the premise of shard is that we have to have an ID, and then we can shard according to the shard algorithm. (For example, the simple and commonly used ID modulo sharding algorithm, which is similar to the concept of Hash algorithm, we need to have the key to Hash the slot.)

Of course, there are many distributed scenarios that require distributed ids, and I won’t list them all here.

Core metrics for distributed ID schemes

  • Global (same business) uniqueness: The guarantee of uniqueness is a necessary condition for ids, and it is easy to understand that assuming ids are not unique will result in primary key conflicts. Generally speaking, global uniqueness does not mean that all business services should be unique, but that different deployed copies of the same business service should be unique. For example, multiple deployed copies of the Order service are required to be globally unique when generating the Id of the T_order table. Whether the ID and T_order generated by t_order_item are unique does not affect the uniqueness constraint and has no side effects. The same is true for different business modules. That is, uniqueness mainly solves the problem of ID conflicts.
  • Order: A guarantee of order is required for all query-oriented data structure algorithms (except Hash algorithms) and is a prerequisite for dichotomous search (divide and conquer). Mysq-innodb B+ tree is the most widely used, assuming that Id is unordered, B+ tree will frequently insert the Id in the middle of the index and move the position of the node behind to maintain the order, and even cause frequent page splitting, which has a great impact on performance. So if we were able to keep the ids in order it would be a completely different situation, just append. So the ordering of IDS is very important and an inevitable feature of ID design.
  • Throughput/Performance (OPS/Time) : Indicates the number of ids generated per second. Generating an ID is a very high-frequency operation, and it’s also the most basic. Given the slow performance of ID generation, no amount of system tuning will yield better performance. It is not hard to understand why, assuming that ID generation is slow, the overall performance ceiling will be limited.
  • Stability (time/op): Generally, the stability index can be adoptedThe time of each operation is sampled at the percentile levelTo analyze, for exampleCosIdPercentile samplingP9999 = 0.208 us/op, i.e.,0% ~ 99.99%The unit operation time is less than or equal to0.208 us/op. Percentile WIKI: A statistical term used to describe the percentile of a percentage point when a set of data is sorted from smallest to largest and the corresponding cumulative percentile is calculated. Pk is the KTH percentile. The percentile is used to compare the relative number of bits of individuals in a population. Why not averageTime per operation: Is Mr. Ma’s price equal to yours? Does the average value make sense? You can use the minimumTime per operationAnd the biggestTime per operationFor reference? Because the minimum and maximum only explain the case of the zero bound point, although they can be used as a reference for stability, they are still not comprehensive. andpercentileBoth of these indicators have been covered.
  • Autonomy (dependency) : mainly refers to whether there is dependence on the external environment, such as the number segment mode will strongly rely on third-party storage middleware to obtain the NexMaxId. Autonomy also has an impact on usability.
  • availability: The availability of distributed ids is mainly affected by autonomy, such asSnowflakeIdThe clock is dialed back, and the system becomes unavailable for a short time. whileThem roughly modeCan be affected by the availability of a third party transmitter (NexMaxId). Usability WIKI: The percentage of the total available time for an individual feature at a given time interval. MTBF: mean failure interval MDT: Mean repair/recovery time Availability=MTBF/(MTBF+MDT) Assume that MTBF is 1 year and MDT is 1 hour, that is, Availability=(36524)/(365)24+1)=0.999885857778792≈99.99%, which is what we usually call 4 9s for availability.
  • Adaptability: refers to the adaptive ability in the face of external environment changes. Here we mainly talk about the performance of dynamically scaling distributed ID in the face of traffic burst. SegmentChainId can dynamically scale the safe distance based on the starvation state. SnowflakeId regular bit allocation scheme has a constant performance of 409.6W. Although it is possible to adjust the bit allocation scheme to obtain different TPS performance, the change of bit allocation method is destructive. Generally, the bit allocation scheme is determined according to the business scenario and will not be changed.
  • Storage space: using mysq-Innodb B+ tree as an example, normal indexes (secondary indexes) will store primary keys. The larger the primary key, the more memory cache and disk space will be occupied. Page The less data the Page stores, the more times the disk I/O is accessed. In general, keeping the storage footprint as small as possible is a good design principle in most scenarios when the business needs are met.

Comparison of core indicators of different distributed ID schemes

A distributed ids Global uniqueness order throughput Stability (1s=1000,000us) autonomy availability adaptive The storage space
UUID/GUID is Complete disorder 3078638(ops/s) P9999 = 0.325 (us/op) Completely autonomous 100% no 128-bit
SnowflakeId is Local monotone increasing, global trend increasing (affected by global clock) 4096000(ops/s) P9999 = 0.244 (us/op) Rely on the clock A clock rollback may cause a temporary unavailability no 64-bit
SegmentId is Local monotone increasing, global trend increasing (influenced by Step) 29506073(ops/s) P9999 = 46.624 (us/op) Relies on a third party number dispenser Affected by number segment dispenser availability no 64-bit
SegmentChainId is Local monotonicity increases and global trend increases (affected by Step and safe distance) 127439148(ops/s) P9999 = 0.208 (us/op) Relies on a third party number dispenser This is affected by the availability of the number segment dispenser, but is higher than the SegmentId because the security distance exists and the ID segment is reserved is 64-bit

Order (divide and conquer, dichotomy search, must maintain me)

We’ve just discussed the importance of ID ordering, so we should design our ID algorithm so that ids are monotonically increasing as much as possible, like the primary key of a table is incrementing. Unfortunately, because of distributed system problems such as global clock and performance, we often have to choose a combination of locally monotonically increasing and globally trending (just as we have to choose ultimate consistency in distributed systems) to get multiple trade-offs. Let’s take a look at what is monotonically increasing and trending increasing.

Monotonically increasing orderliness

Monotonically increasing: T represents the global absolute time point, assuming that Tn+1>Tn (absolute time is always forward, and relativity, time machine, etc., are not considered here), then F(Tn+1)>F(Tn) is inevitable, and the auto-increasing primary key of the database belongs to this category. In addition, it should be noted that monotone increment and continuous increment (F(n+1)=F(n)+step) are different concepts.

The tendency of order increases

Increasing trend: Tn>Tn-s, then the large probability is F(Tn)>F(Tn-s). Although there are disordered intervals, the overall trend is increasing. From the diagram above, there is an upward trend (trend line).

  • In SnowflakeId, n-s is affected by global clock synchronization.
  • In segment mode (SegmentId) n-s is affected by segment available interval (Step).

Distributed ID allocation scheme

UUID/GUID

  • Does not rely on any third party middleware
  • High performance
  • Complete disorder
  • The storage space is large, requiring 128 bits.

The biggest problem with UUIds is that they are random, unordered, and when used for primary keys can cause inefficient primary key indexes in databases (frequently inserting data in the middle of the index, rather than appending, to maintain the index tree). This is the most important reason why UUID does not apply to database primary keys.

SnowflakeId

SnowflakeId is a distributed ID algorithm that uses Long (64-bit) bit partitions to generate ids. The general bit allocation scheme is timestamp(41-bit)+machineId(10-bit)+sequence(12-bit)=63-bit.

  • 41-bitTIMESTAMP =(1L<<41)/(1000/3600/365), about 69 years of time stamps can be stored, that is, the absolute time that can be used is EPOCH+69 years. Generally, we need to customize EPOCH as product development time. In addition, we can compress the allocated bits in other regions. To increase the number of timestamp bits to extend the available time.
  • 10-bitmachineId=(1L<<10)=1024, that is, 1024 copies of the same service can be deployed. It is generally not necessary to use this number of digits, so it will be redefined depending on the deployment size.
  • 12-bitsequence=(1L<<12)1000=4096000. That is, a single server can generate 409W ids per second, and a global service cluster can generate 4096000 ids1024 = 419430 w = 4.19 billion (TPS).

From the SnowflakeId design:

  • Timestamp in high position, single instanceSnowflakeIdIs will ensure that the clock is always forward (check the local clock back), so is the local monotone incrementing. This parameter is affected by global clock synchronization/clock rollbackSnowflakeIdThe global trend is increasing.
  • SnowflakeIdThere is no strong dependency on any third party middleware and performance is very high.
  • The bit allocation scheme can be flexibly configured according to service system requirements to achieve the optimal use effect.
  • With a strong reliance on the local clock, potential clock back problems can result in duplicate ids and temporary unavailable states.
  • The machineId needs to be set manually, and it is very inefficient to assign machineids manually in actual deployment.

SnowflakeId machine number assignment problem

The bit allocation scheme in SnowflakeId is determined according to the business design and there is little change and maintenance required. However, machineIDS always need to be configured and cannot be duplicated in the cluster. Otherwise, the partitioning principle will be broken and the ID uniqueness principle will be broken. When the cluster size is large, the maintenance of Machineids is very tedious and inefficient.

It should be noted that SnowflakeId’s MachineId is a logical concept, not a physical one. Imagine that the MachineId is physical, which means that a machine can only have one MachineId. What’s the problem?

Currently, CosId provides the following three MachineId allocators.

  • ManualMachineIdDistributor: manual configuration machineId, generally only in cluster scale are likely to use a very small, is not recommended.
  • StatefulSetMachineIdDistributor: use Kubernetes StatefulSet provide stable identity ID number (HOSTNAME = service – 01) as a machine.
  • RedisMachineIdDistributor: use Redis as the machine number of distributed storage, as well as storage MachineId last timestamp, used for inspection of startup clock callback.

SnowflakeId clock rollback problem

The fatal problem with clock backtracking is (understandably) ID duplication, collision, and ID duplication is clearly not tolerated. In the SnowflakeId algorithm, partition IDS are based on machineIDS. It is not difficult to understand that different Machineids cannot produce the same ID. So the clock rollback problem we solved is the clock rollback problem of the current MachineId, not all cluster nodes.

The MachineId clock back problem can be divided into two cases:

  • Run time clock back: that is, the current timestamp obtained at run time is smaller than the last timestamp obtained. The clock callback in this scenario is very easy to handle. The normal implementation of SnowflakeId code stores lastTimestamp for runtime clock callback checks and throws a clock callback exception. Throwing an exception when the clock goes back is not a good practice because there are few other options for the downstream user (oh, what else can I do, wait), clock synchronization is the only option, and don’t let the user choose when there is only one option. ClockSyncSnowflakeId is SnowflakeId wrapper, when the clock callback when using ClockBackwardsSynchronizer active waiting for clock synchronization to regenerate ID, provide a more friendly experience.
  • Boot-time clock rollback: The current clock obtained when the service instance is started is smaller than the last time the service was stopped. At this point, the lastTimestamp cannot be stored in the process memory. When access to external storage machine is greater than the current state of the clock when the clock, will use ClockBackwardsSynchronizer active synchronous clock. LocalMachineStateStorage: Store MachineState(machine number, last timestamp) using local files. Because local files are used, LocalMachineStateStorage is applicable only if the instance’s deployment environment is stable. RedisMachineIdDistributor: MachineState stored in Redis distributed cache, so that you can always get to the last service instances when stop machine status.

JavaScript value overflow from SnowflakeId

MAX_SAFE_INTEGER is only 53-bit. If you return a 63-bit SnowflakeId directly to the front end, you will overflow the value. It’s just that snowflakeids appear faster). Obviously overflow is not acceptable, and the following two solutions can be used:

  • Convert the generated 63-bitSnowflakeId to String. Convert long directly to String. SnowflakeFriendlyId: {timestamp}-{machineId}-{sequence} -> 20210623131730192-1-0
  • Custom SnowflakeId distribution to shorten SnowflakeId digits (53 – bit) make the ID for the front not overflow use SafeJavaScriptSnowflakeId (JavaScript security SnowflakeId)

Segment mode (SegmentId)

From the above design diagram, it is not difficult to see that the basic design idea of number segment mode is to obtain available ID (ID segment/number segment) of a certain length (Step) each time to reduce the number of NETWORK I/O requests and improve performance.

  • Strong dependence on third party number dispenser, availability is affected by third party number dispenser.

  • Obtaining the NextMaxId every time the number segment is used up requires network I/O requests, which results in low performance.

  • The single instance ID increases monotonically, and the global trend increases.

    • From the design diagram, it is not difficult to see that the NextMaxId obtained by Instance 1 each time must be larger than the last time, which means that the next number segment must be larger than the last time, so from the single Instance view is monotonically increasing.
    • Multiple instances each hold different number segments, which means that the ids generated by different instances at the same time are out of order, but the overall trend is increasing, so the global trend is increasing.
  • The degree of ID disordering is affected by Step length and cluster size (as can be seen from the increasing trend graph).

    • If there is only one instance in the cluster, the number segment pattern is monotonically increasing.
    • The smaller the Step, the smaller the disorder. When Step=1, will be infinitely close to monotonically increasing. Note that this is infinitely close rather than monotonically increasing. You can consider the scenario where the number dispenser at time T1 assigns ID=1 to Instance 1 and at time T2 assigns ID=2 to Instance 2. The network I/O write request for Instance 2 precedes the one for Instance 1 due to machine performance, network, etc. At this point, the ids are still out of order for the database.

SegmentChainId

The SegmentChainId is an enhanced version of the SegmentId. It has the following advantages over the SegmentId:

  • Stability:SegmentIdThe stability problem (P9999=46.624(us/op)) is mainly caused by the synchronous acquisition of NextMaxId after the number segment is used up (resulting in network IO).SegmentChainId(P9999=0.208(US /op)) Introduced new rolesPrefetchWorkerFor maintenance and warrantysafe distanceIdeally, the thread that gets the ID does not need to wait for the NextMaxId to get the ID synchronously almost at all, and the performance can approximate that of AtomicLongTPS performance: + 12743 w/sJMH benchmarks.
  • Adaptability: From the SegmentId introduction, we know that there are two factors that affect ID disordering: cluster size and Step size. Cluster size is out of our control, but Step can be adjusted. The Step should be as small as possible to make it more likely that the ID will increase monotonically. Step too small affects throughput, so how do we set Step properly? The answer is that we cannot accurately estimate throughput requirements at all points in time, so the best approach is to automatically increase Step when throughput requirements are high and automatically shrink Step when throughput is low. SegmentChainId introduces the concept of starvation state. PrefetchWorker will detect whether the current safety distance needs to expand or shrink according to the starvation state in order to achieve the tradeoff between throughput and order. This is the adaptiveness of SegmentChainId.

SegmentChainId- Throughput (ops/s)

SegmentChainId- The percentile of time taken for each operation

Description of the benchmark report running environment

  • Benchmark running environment: MacBook Pro-(M1)
  • All benchmarking is performed on the development notebook.