In the above hashing process, we did not need to traverse all Redis servers to improve performance. However, there are some problems with using Hash caches, and all caches will change when the Redis server changes. For example, now that we have eight Redis cache servers, we have changed from hash(product.png) % 6 = 5 to hash(product.png) % 8 =? It’s definitely not going to be the original 5. Also, in a cluster of six servers, when one of the master and slave groups fails and cannot be cached, we need to remove the failed machines, so the modulus changes from 6 to 5. The formula that we calculate will also change.
Since the above hash algorithm uses modulus to cache, the hash consistency algorithm is born to avoid the above situation
Principle of consistent Hash algorithm
The consistent Hash algorithm also modulates the number of servers, whereas the consistent Hash algorithm modulates 32 squares of 2. That is, the consistent Hash algorithm organizes the entire Hash space into a virtual circle. The value space of the Hash function is 0 ~ 2^ 32-1 (a 32-bit unsigned integer). The entire Hash ring is as follows:
The entire ring is organized clockwise, with the point directly above the ring representing 0 and the first point to the right of 0 representing 1, and so on. In the second step, we Hash each server using Hash. Specifically, we can select the IP or host name of the server as the keyword for Hash, so that each server is determined in a position of the Hash ring. For example, we have three machines, and the position of each server in the ring space after using IP address Hash is shown in Figure 1-4:
Now, we use the following algorithm to locate data access to the corresponding server:
Calculate the Hash value of the data Key using the same function Hash, and determine the position of the data on the ring. From this position, search clockwise along the ring, and the server encountered is the server to which it should be located.
For example, there are ObjectA, ObjectB, ObjectC data objects hashed in the ring space as follows:
According to the consistency algorithm, Object -> NodeA, ObjectB -> NodeB, ObjectC -> NodeC
Data skew problem
When there are too few nodes serving the consistent Hash algorithm, it is easy to cause data skew due to uneven node distribution (most cached objects are cached on a certain server), as shown in Figure 1-8.
At this time, we find that A large amount of data is concentrated on node A, while node B has only A small amount of data. In order to solve the problem of data skew, the consistent Hash algorithm introduces the virtual node mechanism, that is, it computes multiple hashes for each server node and places a service node, called a virtual node, in each computed result position.