In 2010, Twitter opened source Snowflake, a globally unique ID generation algorithm used by internal teams, which translates as Snowflake. Snowflake doesn’t rely on a database, but can be generated directly by a programming language. It generates three consecutive ids that look like this: 563583455628754944, 563583466173235200, 563583552944996352.
Snowflake stores the four parts of an ID in 64 bits:
- The highest bit is 1 bit and the value is fixed to 0 to ensure that the generated ID is a positive number.
- The median is 41 bits, and the value is the millisecond timestamp.
- The value is the ID of the working machine. The upper limit of the value is 1024.
- The last bit is 12 bits, and the value is the different ID generated in the current millisecond. The upper limit of the value is 4096.
This design allows up to 4096 ids to be generated in 1 millisecond, and the generated ids are ordered incrementally because the median is a timestamp.
Snowflake code implementation
There is an important knowledge point in the code implementation – bit operation. For those unfamiliar with bitwise operations, read the bitwise section of this book (Python Programming Reference).
First we import the necessary libraries and set the length of the middle, bottom and bottom bits:
Import time import Logging # WORKER_BITS = 5 DATACENTER_BITS = 5 SEQUENCE_BITS = 12Copy the code
The middle and lower bits here contain five machine room ids and five machine ids, which together make up the 10-bit middle and lower bits. Then through the length of the lower computer room and the upper limit of the machine;
WORKER_UPPER_LIMIT = -1 ^ (-1 << WORKER_BITS) DATACENTER_UPPER_LIMIT = -1 ^ (-1 << DATACENTER_BITS)Copy the code
As mentioned above, this is a set of 64 bit lengths, and after getting each segment we need to group them together bitwise, so here we need to calculate the offset of the position:
WORKER_SHIFT = SEQUENCE_BITS DATACENTER_ID_SHIFT = SEQUENCE_BITS + WORKER_BITS TIMESTAMP_LEFT_SHIFT = WORKER_BITS TIMESTAMP_LEFT_SHIFT = SEQUENCE_BITS + WORKER_BITS + DATACENTER_BITSCopy the code
Then set the mask and meta time:
SEQUENCE_MASK = -1 ^ (-1 << SEQUENCE_BITS) # Mask EPOCH = 1577808001000 # Meta timestamp The meta is set to 2020-01-01 00:00:01Copy the code
At Snowflake’s 41-bit length, the timestamp is limited to a maximum of 70 years. Starting from 1970 would be a waste of decades, so the meta-time can be set to the time when the code was written or to recent years.
Copyright watermarking wechat public number Python programming reference
This algorithm involves time boundary, timestamp generation and number retrieval operations. The basic structure of the code is as follows:
class SnowFlake: def __init__(self, data_center_id, worker_id, sequence=0): pass def _timestamp(): Pass def take(self): "" pass def _generate(self): "" def _wait_next_time(self, last_TIMESTAMP): """ wait until next unit time """ passCopy the code
The init function does some initialization and basic boundary checking, such as checking the boundary between WORKER ID and DATACENTER ID, and initializing the ID, as follows:
Def __init__(self, data_center_id, worker_id, sequence=0): if worker_id > WORKER_UPPER_LIMIT: Raise ValueError('WORKER ID exceeds the upper limit ') if worker_id < 0: Raise ValueError('WORKER ID lower than lower limit ') if data_center_id > DATACENTER_UPPER_LIMIT: Raise ValueError('DATA CENTER ID exceeds the upper limit ') if data_center_id < 0: Raise ValueError('DATA CENTER ID is lower than the lower limit ') self.worker_id = worker_id self.datacenter_id = data_center_id self.sequence = Sequence self.last_timestamp = -1 # Sequence self.last_timestamp = -1 #Copy the code
Timestamp generation is simple, but we need to write our code with the single responsibility principle in mind, so we can use it as a separate function:
@staticMethod def _timestamp(n=1e3): """ return int(time.time() * n)Copy the code
If the number generated in a unit of time exceeds the upper limit, it needs to wait until the next unit of time, and the corresponding code is as follows:
def _wait_next_time(self, last_timestamp): Timestamp = self._timestamp() while timestamp <= last_timestamp: timestamp = self._timestamp() return timestampCopy the code
There are many operations involved in number retrieval, such as whether the check time is dialed back, whether the number generated per unit time exceeds the limit, and account generation, etc. Therefore, multiple small functions need to be divided and called, following the principle of single responsibility:
def take(self) -> int: Timestamp = self._timestamp() self._check(timestamp) self.last_timestamp = timestamp # Updates the timestamp of the last generated number return self._generate(timestamp) def _generate(self, timestamp): "" "to generate a serial number "" "number = ((timestamp - EPOCH) < < TIMESTAMP_LEFT_SHIFT) | (self. Datacenter_id < < | \ DATACENTER_ID_SHIFT) (self.worker_id << WORKER_SHIFT) | self.sequence return number def _check(self, timestamp): """ self._time_back_off_check(timestamp) self._number_check(timestamp) def _number_check(self, timestamp): """ If timestamp == self.last_timestamp: self.sequence = (self.sequence + 1) & SEQUENCE_MASK if self.sequence == 0: timestamp = self._wait_next_time(self.last_timestamp) else: Self. sequence = 0 def _time_back_off_check(self, timestamp): """ if timestamp < self.sequence = 0 def _time_back_off_check(self, timestamp): "" Logging. error(' clock rollback is found and the last timestamp is {}'. Format (self.last_timestamp)) raise Exception(" Clock rollback Exception ")Copy the code
Copyright watermark Wei Shidong’s technical column www.weishidong.com
At this point, the Snowflake algorithm is written. The complete code is shown below:
This is the wechat official account Python programming reference. If you find this article helpful, please follow me. Go to wei Shidong’s technology column at www.weishidong.com for more useful information. Top articles at a glance:
[1] Several Effective Ways to Cope with the 35-year-old Crisis
[2] Practical Guide to Drawing and Design for Engineers
[3] Continuous Delivery Practices – GitHub Actions Deploying Node applications to cloud Servers
[4] Practice on SSH Password-free/Username Free/IP-free Cloud Server Login
[5] Practice on ElasticSearch deleting data on specified days