preface
After lastIllustration of Redis Sentinel modeThe entire Redis high availability solution remainsCluster
Cluster mode (collectively referred to laterCluster
(No.Cluster
The plan is to use two related articles, the first one is today’s aboutCluster
Mode ofhash slot
Algorithm. So let’s basically figure out how this algorithm worksCluster
That’s pretty much it. About thishash slot
Will pass the basichash
Algorithm, consistencyhash
Algorithm tohash slot
The basic idea of the algorithm through the way of drawing a good brother better understand. If you think the picture is good, please click “like” and “follow” (it is still difficult to draw a picture, sometimes it takes a whole morning).
An overview of the
I do not know the good brothers have seen or hands-on practice of the database of the sub-database sub-table, I believe that the good brothers have some understanding of the database sub-database sub-table (no words first Baidu see about), when doing sub-database or sub-table need to consider a problem is the distribution of data using what way. Common data distribution methods include hashing (redundant hash, consistent hash), by data range, and by data volume.
Of course, we will not compare the application scenarios and advantages and disadvantages of various data distribution methods in detail here. In Cluster, hash partition rules are selected for data distribution. There are many ways to implement hash partitioning. Here we mainly illustrate node redundancy hash, consistency hash, and virtual slot hash.
Take more than a hash
Take more than a hash
It uses specific data, for exampleThe user Id
, mobile phone number, and then the number of nodesN
By formula:hash(key) % N
Compute the hash value. The diagram below:
Let’s say we have three Redis master servers that we want to storeThe user id
for10
The data. At this time, our formula is:hash(10) % 3 = 1
(hash(10) = 10
), we can locate the userThe user id
for10
Data stored inmaster1
In the server node. When we get the data, we can locate it by the formulamaster1
Master nodes, which greatly improves performance by eliminating the need to traverse all servers.
Advantages and disadvantages
advantage
The outstanding advantage of this method is simplicity, and it is often used in the rules of database partition and table partition, generally using the way of pre-partition. Plan the number of partitions based on the data volume, for example, 512 or 1024 tables, to ensure that the data volume can be supported for a period of time. Migrate the tables to other databases based on the load. Capacity expansion is usually doubled to avoid full migration caused by data mapping disruption.
disadvantage
The disadvantages of this approach are also obvious. Firstly, as shown in the figure above, when a network exception or downtime occurs on a master node, the number of available servers in the whole cluster environment will change (from three to two). As a result, the entire cluster is unavailable because the calculation formula involved has changed, resulting in data failure and confusion.
On the other hand, scaling the entire cluster environment horizontally (adding master nodes) is also very unfriendly, requiring re-mapping of old data to nodes.
Consistency of the hash
Consistency of the hash
That’s what the algorithm is forTake more than a hash
The main reason for the first failure is that the cardinality of the mod is dynamically changing according to the number of nodes in the cluster!Consistency of the hash
The algorithm fixed the base of the mod to a constant 2^32^, so that it can be invariant. Thus, the mod range is controlled within the interval of [0,2^32^-1], and this interval is evenly distributed on a ring in a clockwise direction, which is calledHash ring
(See picture above).
Secondly, since the consistent hash algorithm fixes the base of mod to a constant 2^32^, there is no problem related to horizontal expansion.
As shown in the figure above, when we need to query the data, the hash value calculated is 10(the first key of N1 is clockwise in the figure above). In this case, our addressing rule is to find the first data node that is greater than or equal to the hash value clockwise according to the circle.
Advantages and disadvantages
advantage
The first is to solve the problem of data failure, and the second is that there is no need to consider the problem of horizontal expansion. Good fault tolerance and scalability.
disadvantage
Do you guys think this is already perfect? Of course not, if the N2 node in the figure above is down, would it be possible to hash data that falls clockwise from N1 to N3 using this node? This results in uneven data distribution (data skew).
That is to say, when a small number of nodes are used, node changes will greatly affect the data mapping in the hash ring, so this approach is not suitable for the distributed scheme with a small number of data nodes. When adding or removing nodes, you need to double or subtract half of the nodes to ensure data and load balance.
Consistent hash + virtual node
This one makes a lot of sense,Consistency of the hash
What is the main problem of unbalanced data and load? That is less nodes, so in order to solve this problem, you are not less nodes, add to me, but I do not have so many servers ah how to do. Ah, I give you the entire virtual node out of the line soon, the server or the same few, is not to solve this problem. As a result, some virtual nodes of actual nodes appear as shown above.
The hash value is clockwise to find a virtual node of N3, and to use the N3 server to provide services.
Advantages and disadvantages
The main advantage is to solve the problem of consistent hash data skew and small number of nodes. Disadvantage words don’t pick bones inside the egg, this is already very perfect ok (mainly I don’t know, have a good brother to tell me in the comment section).
Virtual slot hash
Whether it’s a redundant hash, a consistent hash, or a consistent hash plus a virtual node, it’s all addressed from the server’s perspective (using the hash algorithm to determine which server to use).
What is a virtual slot? Virtual slots make clever use of hash space, using well-distributed hash functions to map all data into a fixed range of integers defined as slots. The range is generally larger than the number of nodes. For example, the range of Cluster slots is 0 to 16,383. A slot is the basic unit for data management and migration in a cluster. The main purpose of adopting a wide range of slots is to facilitate data splitting and cluster expansion. Each node is responsible for a certain number of slots.
For example, we have the image belowMaster1
,Master2
,Master3
Three servers, all contained in each server16383%3 = 5461
Groove (Master1
:0-5461.
.Master2
:5462-10922.
.Master3
:10922-16383.
). When we need to gethash
为100
Theta is going to fall at this timeMaster1
On this server. So this approach to the virtual slot is to get the server from a data () perspective.
Advantages and disadvantages
advantage
- Decouple the relationship between data and nodes, making it easier to expand and shrink nodes (just move slots).
- The node maintains slot mapping, and does not require the client or proxy service to maintain slot partition metadata.
- Supports query of mappings between nodes, slots, and keys for data routing and online scaling.
- In a sense there is no data skew.
disadvantage
- Special services need to be processed in some scenarios. For example, to view data in sequence, the data needs to be stored on the same service.
- Batch operations are not supported in this mode because batch data may exist on multiple services.
conclusion
This article focuses on the understanding of redundant hash, consistent hash and cluster addressing algorithm virtual slot hash. To be fair, this is often asked in the interview, as long as the Redis high availability solution is asked this is a must ask. Of course, understanding this is useful for the entire Redis Cluster pattern. Feel a harvest of good brother point a like plus a concern!
That’s the end of this issue. Welcome to leave your comments in the comments sectionAsk for attention, ask for likes
Illustrated Redis Sentinel model