Consistent Hashing
Consistent hashing is a special kind of hash algorithm. With consistent hashing, when the number of slots in a hash table changes (it is resized), only K/n keys need to be remapped on average, where K is the number of keys and n is the number of slots. In a traditional hash table, by contrast, adding or removing a slot forces almost all keys to be remapped.
— Wikipedia
Application scenarios
Consistent hashing is a distributed hash table (DHT) technique proposed at MIT in 1997, originally designed to solve hot spot problems on the Internet. It fixes the problems caused by the simple hashing used by CARP, making DHTs genuinely practical in P2P environments. Today it is most commonly used in distributed web services: each server is a node on the hash ring, and when a node changes (for example, it goes down), only part of the cache has to be redistributed rather than all of it being invalidated.
Its features are as follows:
- Low redundancy
- Load balancing
- Smooth transition
- Balanced storage
- Keyword monotonicity
In a distributed cluster, adding or removing a machine, or automatically evicting a machine after it fails, is among the most basic functions of cluster management. If the naive hash(object) % N algorithm is used, much of the existing data can no longer be located after a machine is added or removed, which seriously violates the principle of monotonicity. The simulation below shows how bad the damage is; after that, we look at how the consistent hashing algorithm is designed.
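Here is a quick simulation of the modulo approach (the key count and node counts are made-up numbers for illustration) that counts how many keys land on a different node when the cluster grows from 4 to 5 machines:

```js
// Count how many of 100000 keys map to a different node
// when hash(object) % N goes from N = 4 to N = 5.
// For simplicity the key itself stands in for its hash value.
const total = 100000;
let moved = 0;
for (let key = 0; key < total; key++) {
  if (key % 4 !== key % 5) moved++;
}
console.log(`${((moved / total) * 100).toFixed(1)}% of keys moved`); // 80.0%
```

Roughly 80% of the keys move, whereas consistent hashing would move only about K/n = 20% of them.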
How the algorithm works
Suppose we have a cache cluster, where each node represents one server in the cluster.
Picture a hash ring covering the range 0 to 2^32 - 1; every node and every key is hashed (for example, via its hash code) onto a position on this ring.
Steps to read the cache:
- Compute each node's hash and map it onto the 0 to 2^32 - 1 ring
- When reading a cache entry, compute the hash of its key and map it onto the same ring
- Walk clockwise from the key's position; the first node encountered owns the key
- Read the cache from that node, and update the cache there on a miss
The steps for updating the cache are the same, except the value is written to the node found in the last step. A minimal sketch of the whole ring follows.
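To make the steps concrete, here is a minimal hash ring in Node.js (the class name, the md5-based hash function, and the node names are all illustrative assumptions, not a production implementation):

```js
const crypto = require('crypto');

// Map any string onto the 0 .. 2^32 - 1 ring using the first 4 bytes of its md5
function hashOnRing(value) {
  return crypto.createHash('md5').update(String(value)).digest().readUInt32BE(0);
}

class ConsistentHashRing {
  constructor(nodes = []) {
    this.ring = []; // sorted [position, node] pairs
    nodes.forEach((node) => this.addNode(node));
  }

  addNode(node) {
    this.ring.push([hashOnRing(node), node]);
    this.ring.sort((a, b) => a[0] - b[0]);
  }

  removeNode(node) {
    this.ring = this.ring.filter(([, n]) => n !== node);
  }

  // Walk clockwise: the first node at or after the key's position owns it;
  // wrap around to the first node when we fall off the end of the ring.
  getNode(key) {
    const pos = hashOnRing(key);
    const entry = this.ring.find(([nodePos]) => nodePos >= pos);
    return (entry || this.ring[0])[1];
  }
}

const ring = new ConsistentHashRing(['node1', 'node2', 'node3']);
console.log(ring.getNode('user:42')); // the node that owns this key
```

Removing a node only reassigns the keys it owned to its clockwise successor; all other keys keep their node.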
When a node fails or a new node joins, only the keys on the arc between the affected node and its counter-clockwise predecessor are remapped; the rest of the cache is untouched.
Data skew problem
If there are too few service nodes, say only node1 and node3 remain on the ring, then most of the cache may fall on node1.
Solution: generate several virtual nodes for each physical node.
For example, node1 generates node1-1 and node1-2, and node2 generates node2-1 and node2-2; a key that lands on a virtual node is served by its physical node, as the sketch below shows.
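Building on the ConsistentHashRing sketch above, virtual nodes can be added by hashing each physical node several times under different labels (the replica count and the node#i naming scheme are arbitrary choices for illustration):

```js
// Each physical node is placed on the ring `replicas` times,
// so keys spread far more evenly across nodes.
class VirtualNodeRing extends ConsistentHashRing {
  constructor(nodes = [], replicas = 100) {
    super();
    this.replicas = replicas;
    nodes.forEach((node) => this.addNode(node));
  }

  addNode(node) {
    for (let i = 0; i < this.replicas; i++) {
      // every virtual node gets its own position, but maps back to the physical node
      this.ring.push([hashOnRing(`${node}#${i}`), node]);
    }
    this.ring.sort((a, b) => a[0] - b[0]);
  }
}

const skewedRing = new VirtualNodeRing(['node1', 'node3']);
console.log(skewedRing.getNode('user:42')); // load now splits roughly evenly
```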
Google's jump consistent hash
Jump Consistent Hash is a consistent hashing algorithm from Google that consumes no memory, distributes keys evenly, runs fast, and fits in about five lines of code:
```cpp
int32_t JumpConsistentHash(uint64_t key, int32_t num_buckets) {
  int64_t b = -1, j = 0;
  while (j < num_buckets) {
    b = j;
    key = key * 2862933555777941757ULL + 1;
    j = (b + 1) * (double(1LL << 31) / double((key >> 33) + 1));
  }
  return b;
}
```
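Since the rest of this post uses Node.js, here is a sketch of the same algorithm ported to JavaScript (the function name is our own; BigInt is needed because the 64-bit multiplication overflows ordinary JS numbers):

```js
// Jump Consistent Hash ported to Node.js, using BigInt to emulate uint64 math
function jumpConsistentHash(key, numBuckets) {
  const MASK64 = 0xFFFFFFFFFFFFFFFFn; // keep the key within 64 bits, like uint64_t
  let k = BigInt(key) & MASK64;
  let b = -1n;
  let j = 0n;
  while (j < BigInt(numBuckets)) {
    b = j;
    k = (k * 2862933555777941757n + 1n) & MASK64;
    // (k >> 33) + 1 fits in 32 bits, so converting to Number is safe
    j = BigInt(Math.floor(Number(b + 1n) * (2 ** 31 / (Number(k >> 33n) + 1))));
  }
  return Number(b);
}

console.log(jumpConsistentHash(123456789, 10)); // a stable bucket in 0..9
```

Note the trade-off: jump hash returns a bucket number rather than a named node, so it suits bucket ranges that only grow or shrink at the end, not arbitrary node removal.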
Code implementation
In Node.js, we can use the hashring package:
```js
const HashRing = require('hashring');

// Each node is given a weight, which affects its number of virtual nodes
const ring = new HashRing({
  '127.0.0.1:11211': 200,
  '127.0.0.2:11211': { weight: 200 }, // equivalent to the shorthand above
  '127.0.0.3:11211': 3200
});

console.log('>>> Hit node', ring.get('your own ip'));
```
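The hashring package also exposes add and remove methods for changing the ring at runtime (per its documentation; treat the exact signatures as an assumption to verify). A sketch of how only part of the cache should move:

```js
// Removing a node: only keys that used to land on it are redistributed
ring.remove('127.0.0.3:11211');
console.log('>>> After removal, hit node', ring.get('your own ip'));

// Adding a node back, using the same weighted-object form as the constructor
ring.add({ '127.0.0.3:11211': 3200 });
```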