“Consistent hash is designed to address distributed caching, not only to hash, but also to move data as little as possible when a server goes down. It is therefore widely used for routing functions in state services.”

01 Routing algorithm of distributed system

Suppose you have a push message system with the following simple architecture

The device access layer not only receives device login and logout status commands, but also pushes developer messages to devices. At this time, the device access layer needs to maintain the status information of the device (of course, you can separate a state service to maintain these information, requiring that this part must have less code update, the specific reason to think about oh =_=). At this time, each server in the device access layer retains a cache of status information of a batch of devices. Which server should the device connect to to obtain data, and to which server should the messages of the middle layer be pushed? So this is a consistent hash algorithm.

02 What is the Consistent Hash Algorithm

Consistent hash consists of objects, resources, algorithms, and machines. What it does is: the object uses an algorithm to determine which machine to connect to. In the preceding system, the device ID (userID) is the object. Its corresponding state data (cache) is a resource. A server is a machine.

  


In a consistent hash algorithm, these resources form a closed loop, and each machine holds a resource segment, each resource segment corresponds to a batch of objects/devices. In this way, if a machine dies, its corresponding resources are transferred to the nearest machine X, and the device corresponding to the Dead Server is connected to machine X.

Now assume that the devices corresponding to these four resource segments have a large difference in activity. For example, the devices corresponding to resource segments 1 and 2 are particularly active, while resource segments 3 and 4 have little activity. In this way, machine 1-2 needs to store a large number of state data, while machine 3-4 has a large number of vacant, which is obviously unreasonable. The improved consistent hash algorithm works like this: instead of each machine holding a contiguous resource segment, each machine holds a partial resource segment of multiple regions. For example, machine 1 stores 1/4 of each resource segment, machine 2 stores 1/4 of each resource segment, and machine 3 and 4 do the same. In this way, even if there are hot spots in individual number segments, they will be evenly distributed to different machines.

  


03 Application of Consistent Hash in the system

The concept and improvement of consistent hash have been introduced above. In system practice, we have a large number of users, often more than one cluster. Here’s how we use consistency hashing:

First, select the corresponding cluster according to the number segment, which is configurable

After the cluster is determined, the device is matched to an instance of the server based on the consistent hash (multiple device access layer instances are deployed on each server (1). Each instance holds more scattered state information; 2. Service gc issues will be alleviated)

Create a virtual node for the machine: reverse the order of users (scramble the previous consecutive userids) to form a new resource segment. The virtual server node is created

Record the number of devices that each server locks. If machine A fails, select the machine with the least number of devices to accept kicked-device

04 Not all cases are suitable for consistent hash

The principles and practices of consistent Hash have been introduced above, but not all services are suitable for routing with consistent Hash. For example, in the message push system in Section 01, the middle layer is stateless. The developer access layer can request any machine of Cluter-A, and it can send the message asynchronously to MQ after completing basic verification without waiting for the result to return directly. The device access layer is stateful and cannot tolerate high latency. Therefore, it is better to select server-instance with consistent Hash and communicate through TCP/UDP.