Redis Cluster

From the previous analysis, we know that single-machine Redis has many limitations, so we can use multiple machines to partition storage, build a larger database, and meet higher business demands.

Earlier we implemented master-slave replication, which gives us a one-master, many-slaves architecture. Viewed abstractly, though, it is still a single-Redis architecture: only one primary actually serves reads and writes, so it is not a multi-master structure. That is why we need Redis Cluster: to spread the access pressure off a single server, balance the load, lift the single-machine storage limit, and improve scalability.

Single machine architecture

Redis 3.0 introduced Redis Cluster to meet the need for building clusters. A cluster can be constructed manually: prepare the nodes, let them handshake with one another, and then assign slots by hand.

The Ruby script redis-trib.rb is also available, and using it is the recommended way to set up and manage a cluster quickly. It is worth mentioning that standalone Redis has databases 0-15, while each Redis node in a cluster only has DB0.
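
For instance, a six-node cluster with one slave per master can be created in a single command; a minimal sketch, with illustrative addresses:

    redis-trib.rb create --replicas 1 127.0.0.1:7000 127.0.0.1:7001 \
        127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005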

Cluster architecture

Data partition

With stand-alone storage, data goes directly into the single Redis instance. With clustered storage, however, we need to consider how to partition the data.

Data partitioning is usually either sequential or hash-based.

Sequential distribution keeps data in order, but its low dispersion means one partition can end up with hot data while the others hold cold data, so access across partitions becomes unbalanced.

Hash distribution itself comes in several variants, such as node-remainder (modulo) partitioning, consistent hashing, and so on. Redis Cluster uses virtual slot partitioning.

Virtual Slot Partition

A Redis cluster has 16384 slots, numbered 0 to 16383. Each slot maps to a subset of the data, and a hash function decides which slot a key's data lands in. Each node in the cluster stores some of the slots.

When a key is stored, the hash function CRC16(key) produces an integer; that integer modulo 16384 gives the slot number. The node owning that slot is then located, and the data is stored there.
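
The slot for any key can be checked with the CLUSTER KEYSLOT command; the port and the printed slot value below are illustrative:

    # slot = CRC16(key) mod 16384
    $ redis-cli -p 7000 CLUSTER KEYSLOT user:1000
    (integer) 1649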

Cluster communication

However, finding the right slot is not always a single hit. For example, suppose the key above is stored in slot 14396 on node3; node3 is not necessarily reached straight away, and the client may ask node1 first and only then access node3.

Communication between cluster nodes guarantees that the node owning a slot is reached within at most two attempts. Every node keeps information about the other nodes, so it knows which node is responsible for which slot. Even if the first attempt misses the slot, the client is told which node owns it, and the access to that node then hits precisely.
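
From redis-cli this looks roughly as follows; the key, addresses, and value are illustrative, while the MOVED redirect is a real cluster reply:

    # Asking a node that does not own slot 14396 returns a MOVED redirect
    $ redis-cli -p 7000 GET foo
    (error) MOVED 14396 127.0.0.1:7002
    # With -c, redis-cli follows the redirect automatically
    $ redis-cli -c -p 7000 GET foo
    "bar"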

  1. Node A sends a meet operation to node B; once B replies, A and B can communicate.
  2. Node A sends a meet operation to node C; after C replies, A and C can communicate as well.
  3. Through its exchanges with A, B then learns about C, and B and C establish a connection.
  4. This continues until all nodes are connected to one another.

In this way, every node ends up knowing which slots each of the others is responsible for.
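
The handshake is driven by the CLUSTER MEET command, and the resulting view can be inspected with CLUSTER NODES; both are real commands, and the addresses are illustrative:

    # Introduce the node on port 7001 to the node on port 7000
    $ redis-cli -p 7000 CLUSTER MEET 127.0.0.1 7001
    OK
    # Gossip spreads the topology; each node lists all peers and their slots
    $ redis-cli -p 7000 CLUSTER NODES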

Cluster scaling

Once a cluster is established, the number of nodes is not fixed: new nodes join, and existing nodes go offline. This is cluster expansion and shrinkage. Because cluster nodes and slots are closely tied, scaling the cluster also means migrating slots and their data.

Cluster expansion

A node newly prepared for the cluster starts out as an isolated node. There are two ways to join it: either an existing cluster node handshakes with the isolated node via the cluster meet command, or the node is added with the redis-trib.rb script.

  1. On an existing cluster node: cluster meet new_node_ip new_node_port
  2. redis-trib.rb add-node new_node_ip:port cluster_node_ip:port

The new node then takes one of two roles, either master or slave:

  • Master node: gets slots and data allocated to it
  • Slave node: serves as a failover backup

Migrating a slot involves the following steps:
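
A sketch of the standard per-slot procedure, using real cluster subcommands; the slot number 4096, the key count, the prompts, and the node IDs are illustrative:

    # 1. On the target node: mark the slot as importing from the source
    target> CLUSTER SETSLOT 4096 IMPORTING <source-node-id>
    # 2. On the source node: mark the slot as migrating to the target
    source> CLUSTER SETSLOT 4096 MIGRATING <target-node-id>
    # 3. On the source: list up to 100 keys in the slot, then move them over
    source> CLUSTER GETKEYSINSLOT 4096 100
    source> MIGRATE target_ip target_port somekey 0 5000
    # 4. Repeat step 3 until the slot holds no more keys
    # 5. Announce to the cluster that the slot now belongs to the target
    any> CLUSTER SETSLOT 4096 NODE <target-node-id>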

Cluster shrinkage

The process for bringing a node offline is as follows:

  1. Check whether the node holds any slots. If it does, first migrate the slots to other nodes; if it does not, move on to the next step
  2. Notify the other nodes (cluster forget) to forget the offline node
  3. Stop the service on the offline node

Note that taking the master node offline first triggers a failover onto its slave, so take the slave node offline before the master.
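
With redis-trib.rb the shrink can be sketched like this; the address and node ID are illustrative, and del-node makes the remaining nodes run cluster forget:

    # Move the departing node's slots onto the remaining masters (interactive)
    $ redis-trib.rb reshard 127.0.0.1:7000
    # Then remove the node from the cluster
    $ redis-trib.rb del-node 127.0.0.1:7000 <departing-node-id>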

Failover

In addition to nodes being taken offline manually, sudden failures can also occur. The discussion below focuses on failures of master nodes, because a failed slave does not affect the master's work; the master simply notes which slave went offline and sends that information to the other nodes. After the failed slave reconnects, it copies the master's data again.

Only master nodes need failover. When we studied master-slave replication earlier, we needed Redis Sentinel to perform failover. Redis Cluster does not need Sentinel: it has its own failover capability.

As we have seen, nodes communicate with each other and exchange ping/pong messages, so failures can be discovered through them. Cluster fault detection likewise distinguishes subjective offline from objective offline.

Subjective offline

Each node keeps a fault list that records the information the current node has received about every other node. If a node gets no pong reply from a peer within the timeout, it marks that peer as subjectively offline, meaning only this node currently considers the peer down.

Objective offline

When more than half of the masters holding slots have marked a node as subjectively offline, the node is marked objectively offline: the cluster as a whole now considers it failed, and failover can begin.

Failover

The cluster performs automatic failover, similar to Sentinel: after an objective offline, a slave of the failed node stands ready to “take over”.

First comes the qualification check; only eligible slave nodes can take part in the election:

  • Each slave of the faulty master checks how long it has been disconnected from that master
  • If the disconnection time exceeds cluster-node-timeout * cluster-slave-validity-factor (the factor defaults to 10), the slave loses its election eligibility

Next comes the election ordering. Slaves with different replication offsets get different election ranks: the slave with the largest offset has the highest rank and the highest election priority, while slaves with smaller offsets delay their elections.

When a slave starts an election, the master nodes receive the message and vote. Because the slave with the largest offset joins the election first, it is the most likely to collect the majority of votes and be chosen as the new master.

Once a slave wins the election and becomes the master, it replaces the failed master:

  1. Run slaveof no one to turn the slave into a master
  2. Allocate the slots the faulty node was responsible for to this node
  3. Broadcast a pong message to the other nodes in the cluster to signal that failover is complete
  4. When the faulty node restarts, it becomes a slave of the new master (new_master)
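
For comparison, a planned switchover can also be triggered by hand on a slave with the CLUSTER FAILOVER command, which is a real command; the port below is illustrative:

    # On a slave: promote it in coordination with its (still healthy) master
    $ redis-cli -p 7004 CLUSTER FAILOVER
    OK
    # Verify the new roles and slot ownership afterwards
    $ redis-cli -p 7004 CLUSTER NODES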

Read/write separation of the cluster

Slave nodes in cluster mode do not serve clients by default: they reject both read and write requests. When a command tries to fetch data from a slave, the slave redirects it to the node responsible for the data's slot.

Why is the connection read-only? A client can issue the readonly command on a slave, after which reads can be served by that slave, but only on that connection. Once the client disconnects and reconnects, its requests are redirected again.
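
A sketch of the difference; the ports, slot, key, and value are illustrative, and READONLY only helps when the slave's own master owns the key's slot:

    $ redis-cli -p 7003
    # By default the slave redirects even reads to the slot owner
    127.0.0.1:7003> GET foo
    (error) MOVED 14396 127.0.0.1:7002
    # READONLY makes this one connection readable from the slave
    127.0.0.1:7003> READONLY
    OK
    127.0.0.1:7003> GET foo
    "bar"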

Read/write separation in cluster mode is more complicated: the client has to track, for each master, which slaves serve which slots.

Building read/write separation in cluster mode is generally not recommended; adding nodes is the usual way to meet demand. Note, however, that inter-node communication consumes bandwidth, so the official recommendation is to keep a cluster at no more than 1000 nodes.

Single machine? Cluster?

A cluster is not always the best choice; whether to build one depends on the business requirements. When a small workload can be served by a single Redis plus Redis Sentinel, there is no need to build a cluster.

Clustering also has some limitations:

  • The keys of a Redis transaction must be on one node
  • Batch operations such as MSET and MGET require all keys in the same slot (see the hash-tag sketch after this list)
  • There is only one database, DB0
  • Some commands cannot cross nodes, such as scan, flushall, and keys
  • Clusters are harder to maintain
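
A common workaround for the same-slot restriction is hash tags: when a key contains {...}, only the substring inside the braces is hashed, so keys sharing a tag land in the same slot. The key names and values here are illustrative:

    # Both keys hash only on "user:1000", so they share one slot and MSET works
    $ redis-cli -c -p 7000 MSET {user:1000}.name alice {user:1000}.age 30
    OK
    $ redis-cli -c -p 7000 MGET {user:1000}.name {user:1000}.age
    1) "alice"
    2) "30"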

Therefore, before building a cluster, consider whether a single Redis truly can no longer handle the business concurrency. If Redis Sentinel already satisfies the high-availability requirement and concurrency is far from saturated, building a cluster is gilding the lily.