background
Suppose we have a distributed system where we need to maintain a global ID field, which we can treat as a unique identifier that cannot be repeated. How do we generate such an ID?
One of the easiest ways to do this is to use the Redis key pair, so every time you update it, you call incr and you get a unique ID, and another way is to use MySQL or some other database, because we know that MySQL can generate auto-increment primary keys, It is also possible to use this primary key as a distributed ID.
However, the above two methods are not very efficient and depend on the third party. If we want to generate distributed IDS more efficiently, the best way is to generate them locally as far as possible without negotiating with other nodes. However, there is a problem, how to ensure that ID does not repeat? We can use the Snowflake algorithm to solve this problem.
concept
Snowflake can be used to generate unique ids in distributed systems. It was developed by Twitter. There are many different versions of Snowflake, but the basic idea is the same.
Snowflakes use 64 bits, but only 63 bits are used. The first bit is a sign bit.
Timestamp is a 41-bit timestamp that is used to generate an ID or a timestamp difference relative to a specific time. Machine ID is the ID number assigned to each machine in the distributed system. 10 bits indicates a maximum of 1024 machines. Sequence number represents the sequence number, because multiple ids may be assigned to the same timestamp.
ID generated
The machine ID of each machine is pre-configured and can be obtained directly from the data center ID and the machine ID of the data center. When we need to generate an ID, first we need to obtain the current timestamp, determine whether it is consistent with the last timestamp, if it is consistent with the last timestamp, then we should increase the sequence number, and then construct a corresponding 64-bit ID number through the shift operation. If the current timestamp is different from the last one, then we can directly modify the timestamp, and then set the serial number to zero, and concatenate it.
Code implementation
The implementation of the algorithm is quite simple, the core code is given below:
func (s *Snowflake) GetID(a) int64 {
// Get the current timestamp first
timestamp := time.Now().UnixMilli()
// Same timestamp sequence number +1
if timestamp == s.LastTimestamp {
s.Sequence = (s.Sequence + 1) & sequenceMask
// The loop is redone
// Multiple ids are generated in the same timestamp
if s.Sequence == 0 {
for timestamp <= s.LastTimestamp {
timestamp = time.Now().UnixMilli()
}
}
} else {
s.Sequence = 0
}
// Reset
s.LastTimestamp = timestamp
// Make a splice
return (timestamp-epoch)<<timestampShift | (s.DatacenterID << datacenterIDShift) | (s.WorkerID << workerIDShift) | s.Sequence
}
Copy the code
test
I wrote a very simple benchmark
func BenchmarkGetID(b *testing.B) {
s := New(10.10)
for i := 0; i < b.N; i++ {
s.GetID()
}
}
Copy the code
The following is the result, which is not bad, so that assuming 250ns each time, 1s can also generate 4,000,000 different ids
goos: windows goarch: amd64 pkg: snowflake cpu: Intel(R) Core(TM) i7-8565U CPU @1.80ghz benchmarkGetiD-8 4897538 246.0ns /op PASS OK Snowflake 2.069sCopy the code
Twitter also provides its own implementation of Scala, which can be found on GitHub and in my repository for this article