preface

After lastIllustration of Redis Sentinel modeThe entire Redis high availability solution remainsClusterCluster mode (collectively referred to laterCluster(No.ClusterThe plan is to use two related articles, the first one is today’s aboutClusterMode ofhash slotAlgorithm. So let’s basically figure out how this algorithm worksClusterThat’s pretty much it. About thishash slotWill pass the basichashAlgorithm, consistencyhashAlgorithm tohash slotThe basic idea of the algorithm through the way of drawing a good brother better understand. If you think the picture is good, please click “like” and “follow” (it is still difficult to draw a picture, sometimes it takes a whole morning).

An overview of the

I do not know the good brothers have seen or hands-on practice of the database of the sub-database sub-table, I believe that the good brothers have some understanding of the database sub-database sub-table (no words first Baidu see about), when doing sub-database or sub-table need to consider a problem is the distribution of data using what way. Common data distribution methods include hashing (redundant hash, consistent hash), by data range, and by data volume.

Of course, we will not compare the application scenarios and advantages and disadvantages of various data distribution methods in detail here. In Cluster, hash partition rules are selected for data distribution. There are many ways to implement hash partitioning. Here we mainly illustrate node redundancy hash, consistency hash, and virtual slot hash.

Take more than a hash

Take more than a hashIt uses specific data, for exampleThe user Id, mobile phone number, and then the number of nodesNBy formula:hash(key) % NCompute the hash value. The diagram below:



Let’s say we have three Redis master servers that we want to storeThe user idfor10The data. At this time, our formula is:hash(10) % 3 = 1(hash(10) = 10), we can locate the userThe user idfor10Data stored inmaster1In the server node. When we get the data, we can locate it by the formulamaster1Master nodes, which greatly improves performance by eliminating the need to traverse all servers.

Advantages and disadvantages

advantage

The outstanding advantage of this method is simplicity, and it is often used in the rules of database partition and table partition, generally using the way of pre-partition. Plan the number of partitions based on the data volume, for example, 512 or 1024 tables, to ensure that the data volume can be supported for a period of time. Migrate the tables to other databases based on the load. Capacity expansion is usually doubled to avoid full migration caused by data mapping disruption.

disadvantage

The disadvantages of this approach are also obvious. Firstly, as shown in the figure above, when a network exception or downtime occurs on a master node, the number of available servers in the whole cluster environment will change (from three to two). As a result, the entire cluster is unavailable because the calculation formula involved has changed, resulting in data failure and confusion.

On the other hand, scaling the entire cluster environment horizontally (adding master nodes) is also very unfriendly, requiring re-mapping of old data to nodes.

Consistency of the hash



Consistency of the hashThat’s what the algorithm is forTake more than a hashThe main reason for the first failure is that the cardinality of the mod is dynamically changing according to the number of nodes in the cluster!Consistency of the hashThe algorithm fixed the base of the mod to a constant 2^32^, so that it can be invariant. Thus, the mod range is controlled within the interval of [0,2^32^-1], and this interval is evenly distributed on a ring in a clockwise direction, which is calledHash ring(See picture above).

Secondly, since the consistent hash algorithm fixes the base of mod to a constant 2^32^, there is no problem related to horizontal expansion.

As shown in the figure above, when we need to query the data, the hash value calculated is 10(the first key of N1 is clockwise in the figure above). In this case, our addressing rule is to find the first data node that is greater than or equal to the hash value clockwise according to the circle.

Advantages and disadvantages

advantage

The first is to solve the problem of data failure, and the second is that there is no need to consider the problem of horizontal expansion. Good fault tolerance and scalability.

disadvantage

Do you guys think this is already perfect? Of course not, if the N2 node in the figure above is down, would it be possible to hash data that falls clockwise from N1 to N3 using this node? This results in uneven data distribution (data skew).

That is to say, when a small number of nodes are used, node changes will greatly affect the data mapping in the hash ring, so this approach is not suitable for the distributed scheme with a small number of data nodes. When adding or removing nodes, you need to double or subtract half of the nodes to ensure data and load balance.

Consistent hash + virtual node



This one makes a lot of sense,Consistency of the hashWhat is the main problem of unbalanced data and load? That is less nodes, so in order to solve this problem, you are not less nodes, add to me, but I do not have so many servers ah how to do. Ah, I give you the entire virtual node out of the line soon, the server or the same few, is not to solve this problem. As a result, some virtual nodes of actual nodes appear as shown above.

The hash value is clockwise to find a virtual node of N3, and to use the N3 server to provide services.

Advantages and disadvantages

The main advantage is to solve the problem of consistent hash data skew and small number of nodes. Disadvantage words don’t pick bones inside the egg, this is already very perfect ok (mainly I don’t know, have a good brother to tell me in the comment section).

Virtual slot hash

Whether it’s a redundant hash, a consistent hash, or a consistent hash plus a virtual node, it’s all addressed from the server’s perspective (using the hash algorithm to determine which server to use).

What is a virtual slot? Virtual slots make clever use of hash space, using well-distributed hash functions to map all data into a fixed range of integers defined as slots. The range is generally larger than the number of nodes. For example, the range of Cluster slots is 0 to 16,383. A slot is the basic unit for data management and migration in a cluster. The main purpose of adopting a wide range of slots is to facilitate data splitting and cluster expansion. Each node is responsible for a certain number of slots.

For example, we have the image belowMaster1,Master2,Master3Three servers, all contained in each server16383%3 = 5461Groove (Master1:0-5461..Master2:5462-10922..Master3:10922-16383.). When we need to gethash100Theta is going to fall at this timeMaster1On this server. So this approach to the virtual slot is to get the server from a data () perspective.

Advantages and disadvantages

advantage

  1. Decouple the relationship between data and nodes, making it easier to expand and shrink nodes (just move slots).
  2. The node maintains slot mapping, and does not require the client or proxy service to maintain slot partition metadata.
  3. Supports query of mappings between nodes, slots, and keys for data routing and online scaling.
  4. In a sense there is no data skew.

disadvantage

  1. Special services need to be processed in some scenarios. For example, to view data in sequence, the data needs to be stored on the same service.
  2. Batch operations are not supported in this mode because batch data may exist on multiple services.

conclusion

This article focuses on the understanding of redundant hash, consistent hash and cluster addressing algorithm virtual slot hash. To be fair, this is often asked in the interview, as long as the Redis high availability solution is asked this is a must ask. Of course, understanding this is useful for the entire Redis Cluster pattern. Feel a harvest of good brother point a like plus a concern!

That’s the end of this issue. Welcome to leave your comments in the comments sectionAsk for attention, ask for likes

Illustrated Redis Sentinel model