Earlier articles covered Redis data persistence (RDB and AOF) and the principles and practice of Redis Sentinel. This time we look at Redis's clustering feature and the principle behind hash-slot sharding.

Cluster sharding mode

If Redis relies only on master/slave replication, a single node may be unable to hold the dataset once the data volume grows huge, and on top of that the master and each slave must each store a complete copy. In this situation, data sharding is a very good solution.

Redis Cluster is designed to solve this problem. It provides two main functions:

  1. Automatically sharding data across the nodes
  2. Continuing to process commands even when some nodes in the cluster fail or become unreachable

For the second point, the behavior is somewhat similar to a failover performed by Sentinel (see the earlier Sentinel article), so I won't go into it here. The following takes a detailed look at Redis's slot-sharding principle. Before that, we first look at the simple hash algorithm and the consistent hashing algorithm used in distributed systems, to help understand the role slots play.

Simple hash algorithm

Suppose there are three machines. The algorithm that decides which machine a piece of data lands on is:

  c = Hash(key) % 3

For example, if key A's hash value is 4, then 4 % 3 = 1 and it falls on the second machine. If key ABC's hash value is 11, then 11 % 3 = 2 and it falls on the third machine.

With this algorithm, suppose the data volume grows too large and a machine needs to be added. Key A used to be on the second machine, but by the same algorithm 4 % 4 = 0 now points to the first machine, which holds no value for A. As machines are added or removed, this algorithm invalidates most of the cache at once, causing an avalanche of cache misses that penetrate to the backing store.
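The remapping problem can be sketched in a few lines of Python. The hash values for keys A and ABC follow this article's example; in practice they would come from a real hash function:

```python
# Sketch of simple modulo sharding and what happens when a node is added.

def node_for(key_hash: int, num_nodes: int) -> int:
    """Return the index of the machine a key lands on."""
    return key_hash % num_nodes

# With 3 machines: key A (hash 4) -> index 1, the second machine.
assert node_for(4, 3) == 1
# Key ABC (hash 11) -> index 2, the third machine.
assert node_for(11, 3) == 2

# Add a fourth machine: A moves from machine 1 to machine 0,
# so a lookup on the new layout misses the cache.
assert node_for(4, 4) == 0

# Measure how many keys move when going from 3 to 4 nodes.
moved = sum(1 for h in range(10_000) if node_for(h, 3) != node_for(h, 4))
print(f"{moved / 10_000:.0%} of keys remapped")  # roughly 75% move
```

Roughly three quarters of all keys change machines, which is exactly the avalanche described above.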

Consistent hashing algorithm

In 1997, Karger et al. at MIT proposed the consistent hashing algorithm to solve this distributed caching problem.

In the consistent hashing algorithm, the entire hash space is treated as a virtual ring.

Suppose there are four nodes: Node A, Node B, Node C, and Node D. After hashing their IP addresses, their positions on the ring are as follows

There are four objects to store: Object A, B, C, and D. After hashing their keys, their positions on the ring are as follows

That is roughly how consistent hashing works. But how fault-tolerant and scalable is it?

Suppose Node C goes down. Walking clockwise from where Object C hashes, the next node found is Node D. So if Node C fails, only the data between Node B and Node C is affected, and that data is transferred to Node D for storage.

Similarly, suppose the data volume is large and a node, Node X, needs to be added. If Node X lands between Node B and Node C, only the data between Node B and Node X is affected, and it is remapped onto Node X.

So the consistent hashing algorithm has very good fault tolerance and scalability. However, it also has a serious problem: data skew.

If a sharded cluster has too few nodes and they are unevenly distributed on the ring, consistent hashing leaves some nodes with too much data and others with too little; that is, you cannot control how data is allocated to nodes. As shown in the figure below, most of the data sits on A while B holds little.
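The clockwise ring lookup can be sketched as follows. This is a minimal illustration, not Redis code; the node names match this article's example, and MD5 is an arbitrary choice of hash function:

```python
import bisect
import hashlib

def h(value: str) -> int:
    """Map a string onto the ring's hash space (here 0 .. 2**32 - 1)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2 ** 32)

class Ring:
    def __init__(self, nodes):
        # Each node sits on the ring at the hash of its name/address.
        self._points = sorted((h(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the first node."""
        hashes = [p for p, _ in self._points]
        i = bisect.bisect_right(hashes, h(key)) % len(self._points)
        return self._points[i][1]

ring = Ring(["Node A", "Node B", "Node C", "Node D"])
before = {k: ring.node_for(k) for k in ("Object A", "Object B", "Object C")}

# Remove Node C (it "hangs"): only keys that were on Node C are remapped;
# every other key keeps its original node.
smaller = Ring(["Node A", "Node B", "Node D"])
for key, node in before.items():
    if node != "Node C":
        assert smaller.node_for(key) == node
```

The final loop demonstrates the fault-tolerance claim: removing a node disturbs only the keys that lived on it.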

Hash slot

Redis Cluster does not use the consistent hashing described above; instead it uses the concept of hash slots. The main reason, as mentioned, is that consistent hashing gives poor control over data distribution and node placement.

First of all, "hash slot" actually covers two concepts. The first is the hashing algorithm: Redis Cluster's hash is not a simple hash() but CRC16, a checksum algorithm.

The other is the concept of slots, the rules for dividing up the hash space. A hash slot is essentially very similar to consistent hashing; the difference lies in how the hash space is defined. Consistent hashing's space is a ring, and node placement follows the ring, so data distribution cannot be controlled well. In Redis Cluster the slot space is user-defined, similar to disk partitions on Windows: each partition's size and position can be customized.

A Redis Cluster contains 16384 hash slots. After computation, each key is assigned to a specific slot, and the user decides which slots belong to which storage node. For example, a machine with a small disk can be given fewer slots, while a machine with a large disk can be given more; if all nodes have identical disks, the slots can be distributed evenly. Hash slots thus neatly solve consistent hashing's distribution problem.
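The slot computation can be sketched in Python. Redis's CRC16 is the CRC-16/XMODEM variant (polynomial 0x1021, initial value 0); this bit-by-bit implementation is for illustration only, and it omits the hash-tag rule real Redis applies (when a key contains {...}, only the part between the braces is hashed):

```python
# slot = CRC16(key) mod 16384 -- a sketch of Redis Cluster's key-to-slot mapping.

def crc16(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def keyslot(key: str) -> int:
    return crc16(key.encode()) % 16384

# CRC-16/XMODEM has a well-known check value for the string "123456789".
assert crc16(b"123456789") == 0x31C3

print(keyslot("foo"))  # a number in 0..16383; compare with CLUSTER KEYSLOT foo
```

Running `cluster keyslot foo` against a real instance should agree with this computation.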

In terms of fault tolerance and scalability, the behavior is like consistent hashing in that the affected data is transferred. In essence, the slots a failed node was responsible for are moved to another healthy node, and when scaling out, slots from existing nodes are transferred to the new node.

It is important to note, however, that a Redis cluster does not move or reassign slots automatically; they must be configured manually. The high availability of a Redis cluster therefore depends on master/slave replication and automatic failover between master and slave nodes.

Setting up the cluster

The following uses the simplest possible example of building a Redis cluster and allocating slots, to deepen the understanding of the principles and concepts above; it leaves out the high-availability side of master/slave replication and failover.

redis.conf configuration

Go to redis.conf and enable cluster.

  cluster-enabled yes

cluster-enabled is off by default. To enable cluster mode and make this Redis instance part of a cluster, you need to turn it on manually.

Then configure the cluster configuration file (cluster-config-file).

To keep the setup simple, all the Redis instances run on the same machine. After giving each copy a different cluster config file name, three copies of redis.conf are used to start three cluster instances (a cluster requires at least three master nodes).

Linking the cluster nodes

  > redis-server /usr/local/etc/redis/redis-6379.conf --port 6379 &
  > redis-server /usr/local/etc/redis/redis-6380.conf --port 6380 &
  > redis-server /usr/local/etc/redis/redis-6381.conf --port 6381 &

The trailing & makes each command run in the background, although the program's log output still prints to the console. You can also make Redis run in the background by setting daemonize yes in redis.conf.

Connect to the Redis instance on 6379 and view the cluster's known nodes with the cluster nodes command.

On 6379, use the cluster meet command to establish a link with 6380 and 6381.

  127.0.0.1:6379> cluster meet 127.0.0.1 6380
  127.0.0.1:6379> cluster meet 127.0.0.1 6381

Slot allocation

Run the cluster info command to check the cluster status

Then allocate slots 0 to 5000 to 6379, 5001 to 10000 to 6380, and 10001 to 16383 to 6381.

  > redis-cli -c -p 6379 cluster addslots {0..5000}
  > redis-cli -c -p 6380 cluster addslots {5001..10000}
  > redis-cli -c -p 6381 cluster addslots {10001..16383}

Take another look at cluster info.

Testing the result

Connect to any instance, adding the -c flag to enable the cluster-mode client; otherwise commands will not be redirected properly.

  redis-cli -c -p 6380

Try the set and get operations

Use the cluster keyslot command to compute the slot a key belongs to, and from that work out which node the command is redirected to.
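Putting it together, the redirection target can be sketched against the slot ranges allocated earlier. The port mapping mirrors this article's setup, and the slot number in the example is illustrative:

```python
# Which node serves a given slot, using this article's slot allocation.
SLOT_RANGES = [
    (0, 5000, 6379),
    (5001, 10000, 6380),
    (10001, 16383, 6381),
]

def node_for_slot(slot: int) -> int:
    """Return the port of the instance responsible for a slot."""
    for lo, hi, port in SLOT_RANGES:
        if lo <= slot <= hi:
            return port
    raise ValueError(f"slot {slot} not assigned to any node")

# A key hashing to slot 12182 falls in the 10001..16383 range,
# so the cluster client transparently redirects to port 6381.
assert node_for_slot(12182) == 6381
assert node_for_slot(0) == 6379
```

This lookup is what the -c client does for you: on a MOVED reply it reissues the command against the node that owns the key's slot.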


For more technical articles and quality content, follow the blog Zackku.com and the WeChat official account: Zack said code.