The original link

preface

Hi everybody, I’m Asong, this is my twelfth article, today I’m going to introduce the snowflake algorithm to you. The snowflake algorithm is secondary because it’s all too familiar, and the main purpose is to recommend my new series. Today, I got the idea to create a new collection. This series mainly stores the algorithms used in our daily work development, such as snowflake algorithm, hash algorithm and so on. There is a lot of knowledge about these algorithms on the Internet, but it is very miscellaneous, it is not easy to find an algorithm, but also to read all kinds of blogs, all kinds of. Therefore, our current idea is to put these algorithms together and implement them all in GO, with detailed documents such as basic knowledge learning and code description attached. In this way, when we want to learn a new algorithm, we can find it here. If not, we can also submit issues to improve it. At present, it is just starting, now only the code document of snowflake algorithm, it will be gradually improved in the future, interested partners welcome to join in, we improve together. See github notes at github.com/asong2020/g…

Snowflake algorithm background

The background of the Snowflake algorithm is of course the need for unique ID generation in the highly concurrent environment of Twitter. Thanks to the excellent technology inside Twitter, the Snowflake algorithm has survived and is widely used today because of several characteristics

  • To meet the high concurrency distributed system environment ID is not repeated
  • High generation efficiency
  • Based on the timestamp, you can guarantee a basic sequential increment
  • Does not rely on third-party libraries or middleware
  • The generatedidIt has timing and uniqueness

Principle of Snowflake algorithm

Let’s start with a picture, from the Internet:

The snowFlake ID structure is a 64-bit int.

  • 1 a bit

In binary, the highest bit is 1, which is a negative number, because we’re supposed to be using integer ids, so the highest bit here should be 0.

  • 41 bit timestamp

The 41 bits can represent 2 to the 41st minus 1. If only positive integers were used, the range of values would be 0 – (2 to the 41st minus 1). The reason for subtracting 1 here is because the range starts at 0, not 1.

This is in milliseconds, so 41 bits would represent 2^41-1 milliseconds, which translates into years of 2^41-1 /(1000 * 60 * 60 * 24 * 365) = 69

  • 10bit- Indicates the ID of the working machine

Here is the id used to record the work machine. 2^10=1024 indicates that the current rule allows a maximum of 1024 distributed nodes. This includes a 5-bit workerID and a 5-bit dataCenterID, which can be ignored here, but my code below does.

  • 12 bit – serial number

Used to record different ids generated in the same millisecond. The maximum positive integer that 12 bits can represent is 2^12-1=4095, which is 0,1,2,3,…… 4094 The 4095 digits represent the 4095 ID serial numbers generated in the same machine in the same timestamp (milliseconds).

The principle is the above, no difficulty, let’s see how to implement the code:

Go implements snowflake algorithm

1. Define basic constants

First look at the code, let’s go through it in turn:

const (
	workerIDBits =  uint64(5)  // 10bit 5bit workerID in the machine ID
	dataCenterIDBits = uint64(5) // 10 bit 5bit dataCenterID of the working machine ID
	sequenceBits = uint64(12)

	maxWorkerID = int64(- 1) ^ (int64(- 1) << workerIDBits) // The maximum value of the node ID is used to prevent overflow
	maxDataCenterID = int64(- 1) ^ (int64(- 1) << dataCenterIDBits)
	maxSequence = int64(- 1) ^ (int64(- 1) << sequenceBits)

	timeLeft = uint8(22)  TimeLeft = workerIDBits + sequenceBits // The left offset of the timestamp
	dataLeft = uint8(17)  // dataLeft = dataCenterIDBits + sequenceBits
	workLeft = uint8(12)  // workLeft = sequenceBits // Left offset of node IDx
	// 2020-05-20 08:00:00 +0800 CST
	twepoch = int64(1589923200000) // Constant timestamp (milliseconds)
)
Copy the code

The following explains each of the constants in this code:

  • worerIDBits: here is the corresponding 10bit- machine ID in the figure above, I have split here. This is one of 5bit ‘ ‘worerIDBits’
  • dataCenterIDBits: here is the corresponding 10bit- machine ID in the figure above, I have split here. This is 5 bits of itdataCenterIDBits
  • sequenceBits: corresponds to the 12bit sequence number shown in the preceding figure
  • maxWorkerID: Here is maximizing, but we use the xor method, because the binary representation of -1 is 1’s complement, in plain English, here is actually2 ^ 5-1If you don’t understand, you can verify it by yourself
  • maxDataCenterID: Same principle as above
  • maxSequence: Same principle as above
  • timeLeft: the left offset of the timestamp, so you may not understand, look at the picture above, by the left offset is 22, so you understand?
  • dataLeft: The principle is the same as above, also to find the offset
  • workLeft: Same principle as above;
  • twepoch: The time stamp of 41bit, in milliseconds. Here I choose time as2020-05-20 08:00:00 +0800 CSTDo not change this ID as soon as it is generated, otherwise the same ID will not be generated.

Definition 2.workerWork node

Since this is the ID generation algorithm used in distributed, we need to generate multiple workers, so we need to abstract the node parameters.

type Worker struct {
	mu sync.Mutex
	LastStamp int64 // Record the timestamp of the last ID
	WorkerID int64 // ID of the node
	DataCenterID int64 // Data center ID of the node
	Sequence int64 // The number of ids that have been generated in the current millisecond (accumulated from 0) is 4096 at most in 1 millisecond
}
Copy the code

Explanation of code:

  • mu sync.Mutex: Adds a mutex to ensure concurrent security
  • LastStamp int64: Records the timestamp of the last ID generation
  • WorkerID int64: The ID of this worknode has the same meaning as the 5bit workerID shown in the figure above
  • DataCenterID int64: Same principle as above
  • Sequence int64: Id sequence number that has been generated in the current millisecond (accumulated from 0) A maximum of 4096 ids can be generated in one millisecond

3. Create oneworkerobject

// In the distributed case, we should assign a separate ID to each machine by external profile or other means
func NewWorker(workerID,dataCenterID int64) *Worker  {
	return &Worker{
		WorkerID: workerID,
		LastStamp: 0,
		Sequence: 0,
		DataCenterID: dataCenterID,
	}
}
Copy the code

There is nothing to explain here.

4. Generate id

First look at the code:

func (w *Worker) getMilliSeconds(a) int64 {
	return time.Now().UnixNano() / 1e6
}

func (w *Worker)NextID(a) (uint64,error) {
	w.mu.Lock()
	defer w.mu.Unlock()


	return w.nextID()
}

func (w *Worker)nextID(a) (uint64,error) {
	timeStamp := w.getMilliSeconds()
	if timeStamp < w.LastStamp{
		return 0,errors.New("time is moving backwards,waiting until")}if w.LastStamp == timeStamp{

		w.Sequence = (w.Sequence + 1) & maxSequence

		if w.Sequence == 0 {
			for timeStamp <= w.LastStamp {
				timeStamp = w.getMilliSeconds()
			}
		}
	}else {
		w.Sequence = 0
	}

	w.LastStamp = timeStamp
	id := ((timeStamp - twepoch) << timeLeft) |
		(w.DataCenterID << dataLeft)  |
		(w.WorkerID << workLeft) |
		w.Sequence

	return uint64(id),nil
}
Copy the code

The code is a bit long, so LET me explain it in turn:

  • getMilliSeconds(): a encapsulated method that gets the current value for milliseconds
  • func (w *Worker)NextID() (uint64,error)

Func (w *Worker)nextID() (uint64,error); There is also a particularly important place, lock, release lock, this step is very important.

  • func (w *Worker)nextID() (uint64,error)

Here is the actual generated ID code. There are several steps:

  • Get the current timestamp, and make sure that the current timestamp value is greater than the last time the ID was generated, otherwise there will be duplication.
  • If you want to wait, first get the id sequence number that has been generated for the current millisecond. You may not be getting it here, but it’s the equivalent ofif w.sequence++ > maxSequence

If the current millisecond has generated an overflow of ID sequence numbers, you need to wait for the next millisecond. If you do not wait, there will be a lot of duplication.

  • We set w.count to zero in the else. To explain, if the current time is not the same as the last time the worker generated ID, then we need to reset the sequence number of the worker generated ID.

  • The final step, the more important, adopted or operation, here is the bit in place of each part and the purpose of through a bitwise or operation (this is the ‘|’) integrating it. < < this is the role of the offset to the left to right, and the | operation is to integrate. In case you don’t understand, look at the following picture:

What do you think? You know what it means.

5. Test

Here I send 10000 Goroutine to generate ID, save it to map, see if there is any duplicate, see the code:

var wg sync.WaitGroup

func main(a)  {
	w := idgen.NewWorker(5.5)

	ch := make(chan uint64.10000)
	count := 10000
	wg.Add(count)
	defer close(ch)
	// Create a goroutine with a snowFlake ID
	for i:=0 ; i < count ; i++ {
		go func(a) {
			defer wg.Done()
			id,_ := w.NextID()
			ch <- id
		}()
	}
	wg.Wait()
	m := make(map[uint64]int)
	for i := 0; i < count; i++  {
		id := <- ch
		// If the map contains key id, the generated snowflake ID is duplicate
		_, ok := m[id]
		if ok {
			fmt.Printf("repeat id %d\n",id)
			return
		}
		// Store the id as the key to the map
		m[id] = i
	}
	// Generate snowflake ID successfully
	fmt.Println("All".len(m), "snowflake ID Get successed!")}Copy the code

Verification results:

All 10000 snowflake ID Get successed!
Copy the code

conclusion

Well, this article to the end here, the snowflake algorithm is not as difficult as we imagined, or very good to achieve, you learn? Did not learn it does not matter, download the source code, take a look, is not a problem for you.

GitHub:github.com/asong2020/g… Welcome star, thank you ~ ~ ~

Guys, remember what I said at the beginning? Interested in welcome to join you, we make progress together ~ ~ ~.

At the end of a small welfare to everyone, I recently read [micro service architecture design patterns] this book, it is very good, I also collected a PDF, there is a need to the guy can download. Access to: follow the public number: [Golang Dreamworks], the background reply: [microservices], you can get.

I have translated a Chinese GIN document, which will be maintained regularly. If you need a background reply [GIN], you can download it.

I am ASong, an ordinary program ape, let me slowly become strong together. I built one myselfgolangCommunication group, there is a need to add mevxI’ll pull you into the group. We’ll see you next time ~~~

Recommended previous articles:

  • Detailed Context package, read this article is enough!!

  • Go -ElasticSearch

  • Interviewer: Have you ever used “for-range” in go? Can you explain the reasons for these questions

  • Learning to wire dependency injection and cron scheduled tasks is as simple as that!

  • I heard you don’t know JWT and Swagger yet. – I’m not eating dinner and HERE I am with a practical project

  • Master these Go features, and you’ll be on your way up to the next level.

  • Go achieve multiplayer chat room, here you want to chat anything can be!!

  • GRPC Practice – Learning GRPC is that simple

  • Go library RPC practices

  • The latest Gin framework Chinese document asong picked up English again, diligently translation

  • Several thermal loading methods based on GIN

  • Boss: This guy doesn’t know how to use the Validator library yet