In the go-zero distributed caching system sharing session, Kevin focused on the principles of consistent hashing and its practice in distributed caching. This article goes into detail about the principles of consistent hashing and its implementation in go-zero.
Take storage as an example: in a microservice system as a whole, our storage cannot be a single node.
- First, for stability: if storage is a single node, that node going down makes the entire storage system unavailable.
- Second, for data fault tolerance: with multiple nodes, even if the data on one node is physically damaged, backups exist on other nodes, unless the nodes that back each other up are damaged at the same time.
So the question is, in the multi-node case, which node should the data be written to?
hash
So essentially, we need a way to map an input value to a smaller value that is typically unique and extremely compact, such as a uint64:
- Idempotent: hashing the same input must always produce the same output
That’s what the hash algorithm does.
However, if routing is done directly with the hash, for example key % N, then when a node exits the cluster or its heartbeat communication becomes abnormal, a large amount of data is redistributed to different nodes. Each node that receives a request for a key it did not own before has to rebuild the data for it; in a caching scenario this can trigger a cache avalanche.
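To make the scale of the problem concrete, here is a small illustrative experiment (the node count and key names are made up) that counts how many keys change owner when a plain key % N scheme goes from 10 nodes to 9:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashOf maps a string key to a uint64 using FNV-1a.
func hashOf(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

func main() {
	const total = 10000
	moved := 0
	for i := 0; i < total; i++ {
		key := fmt.Sprintf("key-%d", i)
		before := hashOf(key) % 10 // 10 nodes in the cluster
		after := hashOf(key) % 9   // one node leaves
		if before != after {
			moved++
		}
	}
	// roughly 90% of the keys end up on a different node
	fmt.Printf("%d/%d keys moved to a different node\n", moved, total)
}
```

Almost every key has to be fetched again from the backing store at the same time, which is exactly the avalanche scenario described above.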
In this case, the consistent Hash algorithm needs to be introduced.
consistent hash
Let’s take a look at how consistent Hash solves these problems:
rehash
First, how it solves the massive-rehashing problem:
As shown in the figure above, when a new node is added, only key31 is affected. When a node is added (or removed), only the data adjacent to that node on the ring is affected; data on other nodes is untouched, which solves the problem of node changes.
This property is called monotonicity, and its absence is exactly why the ordinary hash algorithm is not suitable for distributed scenarios.
Data skew
As you can see from the figure above, most of the keys end up on Node 1. When the number of nodes is small, most keys can become concentrated on one node, and what you then observe in monitoring is uneven load across nodes.
To solve this problem, Consistent Hash introduces the concept of virtual Node.
Since the load is uneven, we artificially construct a balanced distribution; but there are only so many real nodes, so virtual nodes are used to divide the ring into many small ranges while the actual service nodes stay the same as before. The small experiment below illustrates the effect.
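This is not go-zero's code, just an illustrative comparison (node names, key names, and replica counts are made up): two real nodes are placed on the ring once with a single virtual node each and once with 100 virtual nodes each, and 10,000 keys are counted per node. More virtual nodes means noticeably less skew:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

func hashOf(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// distribute places two real nodes on the ring with `replicas` virtual nodes
// each, routes 10,000 keys, and returns how many keys each real node gets.
func distribute(replicas int) map[string]int {
	ring := map[uint64]string{} // virtual node hash -> actual node
	var ringKeys []uint64
	for _, node := range []string{"node-a", "node-b"} {
		for i := 0; i < replicas; i++ {
			vh := hashOf(fmt.Sprintf("%s#%d", node, i))
			ring[vh] = node
			ringKeys = append(ringKeys, vh)
		}
	}
	sort.Slice(ringKeys, func(i, j int) bool { return ringKeys[i] < ringKeys[j] })

	counts := map[string]int{}
	for k := 0; k < 10000; k++ {
		kh := hashOf(fmt.Sprintf("key-%d", k))
		// first virtual node clockwise from the key's hash, wrapping around
		idx := sort.Search(len(ringKeys), func(i int) bool { return ringKeys[i] >= kh }) % len(ringKeys)
		counts[ring[ringKeys[idx]]]++
	}
	return counts
}

func main() {
	fmt.Println("1 virtual node per node:   ", distribute(1))
	fmt.Println("100 virtual nodes per node:", distribute(100))
}
```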
The specific implementation
Let's start with Get():
Get
Here’s how the implementation works:
- Compute the hash of the key
- Find the index of the first matching virtual node, and take the corresponding h.keys[index]: the hash value of that virtual node
- Use this hash to find the matching actual node in the ring
In fact, we can see here that looking up the ring yields a []node. This is because hash conflicts may occur when computing virtual node hashes: virtual nodes belonging to different actual nodes can produce the same hash, so one hash can map to more than one actual node.
Actual nodes map one-to-many to virtual nodes, and the internal ring is designed accordingly:
This reflects the allocation strategy of consistent hash:
- Virtual nodes are used as the dividers of ranges; the node a key is assigned to is determined by the virtual node that bounds the range the key falls into.
- Because virtual nodes are spread over the ring by hashing, the keys assigned to the different actual nodes are roughly uniform; in other words, the fixed binding between keys and nodes is broken up.
- When a new node is added, it is assigned multiple virtual nodes, so it can take over pressure from several existing nodes at once; globally, this makes it easier to achieve load balancing during capacity expansion.
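The Get lookup described above can be sketched as a small standalone program. This is a simplified illustration rather than go-zero's actual code: node names are plain strings instead of interface{}, and the secondary hash used to break collisions is an arbitrary choice:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

func hashOf(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// get implements the three steps described above: hash the key, binary-search
// the first virtual node hash >= it (wrapping around the ring), then resolve
// the actual node, breaking virtual-node hash collisions with a second hash.
func get(keys []uint64, ring map[uint64][]string, key string) (string, bool) {
	if len(keys) == 0 {
		return "", false
	}
	kh := hashOf(key)
	index := sort.Search(len(keys), func(i int) bool { return keys[i] >= kh }) % len(keys)
	nodes := ring[keys[index]]
	switch len(nodes) {
	case 0:
		return "", false
	case 1:
		return nodes[0], true
	default:
		// several actual nodes share this virtual node hash: pick one deterministically
		return nodes[int(hashOf("inner:"+key)%uint64(len(nodes)))], true
	}
}

func main() {
	ring := map[uint64][]string{} // virtual node hash -> actual node(s)
	var keys []uint64
	for _, node := range []string{"node-a", "node-b", "node-c"} {
		for i := 0; i < 100; i++ { // 100 virtual nodes per actual node
			vh := hashOf(fmt.Sprintf("%s#%d", node, i))
			keys = append(keys, vh)
			ring[vh] = append(ring[vh], node)
		}
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })

	node, _ := get(keys, ring, "user:42")
	fmt.Println("user:42 ->", node)
}
```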
Add Node
If you look at Get, you can see the entire consistent hash design:
```go
type ConsistentHash struct {
	hashFunc Func                            // hash function
	replicas int                             // virtual node magnification factor
	keys     []uint64                        // stores the hashes of the virtual nodes
	ring     map[uint64][]interface{}        // mapping between virtual nodes and actual nodes
	nodes    map[string]lang.PlaceholderType // actual node storage
	lock     sync.RWMutex
}
```
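Given this structure, adding a node might look roughly like the sketch below. This is an illustrative approximation, not go-zero's actual Add implementation: it assumes Func is a func([]byte) uint64 (as hashFunc suggests), that lang.Placeholder comes from go-zero's core/lang package, and that fmt, sort, and strconv are imported; the real code also removes an existing node before re-adding it and serializes nodes differently.

```go
// Add expands the actual node into `replicas` virtual nodes, hashes each of
// them onto the ring, and keeps the virtual node hashes sorted so that Get
// can binary-search for the first virtual node clockwise from a key's hash.
func (h *ConsistentHash) Add(node interface{}) {
	h.lock.Lock()
	defer h.lock.Unlock()

	nodeRepr := fmt.Sprintf("%v", node)
	h.nodes[nodeRepr] = lang.Placeholder

	for i := 0; i < h.replicas; i++ {
		// "<node><i>" is just one possible naming convention for virtual nodes
		hash := h.hashFunc([]byte(nodeRepr + strconv.Itoa(i)))
		h.keys = append(h.keys, hash)
		h.ring[hash] = append(h.ring[hash], node)
	}

	sort.Slice(h.keys, func(i, j int) bool {
		return h.keys[i] < h.keys[j]
	})
}
```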
Ok, so basically a consistent hash implementation is complete.
Specific code: github.com/tal-tech/go…
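Usage is then straightforward. The snippet below is a hedged sketch assuming the package exposes a NewConsistentHash constructor plus the Add/Get methods discussed above, and that the import path matches the repository linked in this article (newer go-zero versions live under github.com/zeromicro):

```go
package main

import (
	"fmt"

	"github.com/tal-tech/go-zero/core/hash"
)

func main() {
	ch := hash.NewConsistentHash()
	ch.Add("cache-node-1")
	ch.Add("cache-node-2")
	ch.Add("cache-node-3")

	// the same key always routes to the same cache node
	if node, ok := ch.Get("user:42"); ok {
		fmt.Println("user:42 ->", node)
	}
}
```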
Usage scenarios
Consistent hashing can be widely used in distributed systems:
- Distributed caching: on top of a storage system such as redis cluster, you can build a cache proxy and control the routing yourself; that routing rule can use a consistent hash algorithm
- Service discovery
- Distributed task scheduling
In all of the above distributed systems, consistent hashing can be used in the load-balancing module.
The project address
Github.com/tal-tech/go…
You are welcome to use go-zero, and star it to support us!
WeChat communication group
Follow the "Microservice Practice" WeChat official account and click "exchange group" to get the QR code for the community group.