Interviews at big tech companies often ask about the three Redis cluster modes. I have discussed this question privately with several friends, and many of them could not give a complete answer. Most people are comfortable using Redis, but they know only a little about the thinking and design behind it.

So today, let's talk about the three Redis cluster modes.

Voice-over: In fact, this question has come up quite often in Lai’s interview experience.

Master-slave mode

1. Architecture diagram

2. Cluster introduction

1) In master/slave mode, one Redis instance acts as the master and the remaining instances act as slaves;

2) The master serves both writes and reads, while the slaves serve reads and synchronize data from the master;

3) Throughout this mode, the master and slave instances hold the same data.

3. Master/slave replication principle

  • Full synchronization
  1. When a slave node starts, it sends the SYNC command to the master node.

  2. After receiving the SYNC command, the master node saves a snapshot in the background to generate an RDB file and uses a buffer to record all write commands executed from then on.

  3. Once the snapshot is complete, the master node sends the snapshot file to the slave node and keeps recording the write commands that are executed while the snapshot is being sent.

  4. After the snapshot has been sent, the master node sends the write commands in the buffer to the slave node.

  5. After loading the snapshot file, the slave node starts accepting command requests and executes the write commands from the master node's buffer.
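The five steps above can be sketched as a small in-memory simulation. This is only an illustration under simplifying assumptions: the `Master` and `Replica` classes are hypothetical, a plain dictionary stands in for the RDB file, and no networking is involved.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.buffer = None  # write commands recorded while a sync is in progress

    def write(self, key, value):
        self.data[key] = value
        if self.buffer is not None:
            self.buffer.append((key, value))

    def begin_full_sync(self):
        """Handle SYNC: take a snapshot and start buffering later writes."""
        self.buffer = []
        return dict(self.data)  # stands in for the RDB file

    def finish_full_sync(self):
        """Hand over the buffered writes and stop buffering."""
        buffered, self.buffer = self.buffer, None
        return buffered


class Replica:
    def __init__(self):
        self.data = {}

    def load_snapshot(self, snapshot):
        self.data = dict(snapshot)

    def apply(self, commands):
        for key, value in commands:
            self.data[key] = value


master = Master()
master.write("a", 1)

replica = Replica()
snapshot = master.begin_full_sync()        # steps 1-2: SYNC received, snapshot taken
master.write("b", 2)                       # writes that arrive during the transfer
master.write("a", 3)
replica.load_snapshot(snapshot)            # step 3: snapshot sent and loaded
replica.apply(master.finish_full_sync())   # steps 4-5: buffered writes replayed
print(replica.data == master.data)
```

The buffer is what keeps the slave from missing the writes that happen between the moment the snapshot is taken and the moment it finishes loading.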

  • Incremental synchronization

When a slave node loses its connection for network reasons and later reconnects to the master node, it has missed some data. If conditions permit, the master node resends only the missing data to the slave node. Because the resent data is much smaller than the full data set, the high cost of full replication is avoided.
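One way to picture "if conditions permit" is a fixed-size backlog of recent writes on the master. The sketch below is a simplification under stated assumptions: the `Backlog` class is hypothetical, and it counts whole commands, whereas Redis tracks replication offsets in bytes.

```python
from collections import deque


class Backlog:
    """Fixed-size buffer of recent write commands, indexed by replication offset."""

    def __init__(self, capacity):
        self.commands = deque(maxlen=capacity)
        self.end_offset = 0  # offset just past the newest command

    def append(self, command):
        self.commands.append(command)
        self.end_offset += 1

    def since(self, replica_offset):
        """Return the commands after replica_offset, or None if they were evicted."""
        start_offset = self.end_offset - len(self.commands)
        if replica_offset < start_offset or replica_offset > self.end_offset:
            return None  # conditions do not permit: a full synchronization is needed
        return list(self.commands)[replica_offset - start_offset:]


backlog = Backlog(capacity=3)
for command in ["SET a 1", "SET b 2", "SET c 3", "SET d 4"]:
    backlog.append(command)

print(backlog.since(2))  # slave had applied 2 commands before the disconnect
print(backlog.since(0))  # slave is too far behind; the oldest command is gone
```

If the slave's offset is still covered by the backlog, only the tail is resent; otherwise the slave has to fall back to a full synchronization.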

The principle of master/slave replication is basically the same in all three modes.

4. Master node fault handling

In master-slave mode, each client is configured with the IP address and port of the Redis instance it connects to. If that instance goes offline because of a failure, nothing notifies the client to switch to another address, so the switch has to be done manually.

5. High availability is not supported

Obviously, master-slave mode solves the problem of data backup well. But when the master node goes offline because of a failure, the client configuration has to be changed manually before clients can reconnect, so this mode cannot guarantee high availability of the service.

This is where sentinel mode comes in...

Sentinel mode

1. Architecture diagram

2. Cluster introduction

Unlike master-slave mode, sentinel mode adds separate processes (the sentinels) that monitor every move in the cluster. When connecting to the cluster, the client first connects to a sentinel, asks it for the address of the master node, and then connects to the master node for data interaction.

As shown in the following figure, if the master becomes abnormal, a master-slave switchover is performed and the best slave is promoted to master.

At the same time, the sentinels keep monitoring the failed master node; when it recovers, it rejoins the cluster as a new slave node.
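The lookup through the sentinels can be sketched as follows. This is a toy model, not a real client: the `Sentinel` class, the `resolve_master` helper, and the addresses are all made up for illustration.

```python
class Sentinel:
    """Hypothetical stand-in for a sentinel process; it only tracks the master address."""

    def __init__(self, master_address):
        self.master_address = master_address
        self.reachable = True

    def get_master_address(self):
        if not self.reachable:
            raise ConnectionError("sentinel unreachable")
        return self.master_address

    def promote(self, new_master_address):
        # Called after a switchover so that later lookups return the new master.
        self.master_address = new_master_address


def resolve_master(sentinels):
    """Ask each sentinel in turn and return the first master address reported."""
    for sentinel in sentinels:
        try:
            return sentinel.get_master_address()
        except ConnectionError:
            continue
    raise ConnectionError("no sentinel could be reached")


sentinels = [Sentinel(("10.0.0.1", 6379)) for _ in range(3)]
sentinels[0].reachable = False                 # one sentinel is down
before = resolve_master(sentinels)

for sentinel in sentinels:                     # switchover: a slave becomes the master
    sentinel.promote(("10.0.0.2", 6379))
after = resolve_master(sentinels)
print(before, after)
```

Because the client asks the sentinels instead of holding a fixed address, a switchover only requires a fresh lookup and no manual configuration change.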

3. Master node fault handling / how sentinels work

  1. Each sentinel sends a PING command every second to the master, the slaves, and the other sentinels in the cluster.

  2. If the time since an instance's last valid PING reply exceeds a certain value, that instance is marked as subjectively offline.

  3. If the master is marked as subjectively offline, the other sentinels monitoring it check once per second whether they also consider it offline. When the number of sentinels that agree reaches a certain threshold, the master is marked as objectively offline, and the slave servers are notified to update their configuration files and switch to the new master.

  4. If the master node fails, the client asks the sentinel again and obtains the address of the new master node.
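The difference between subjective and objective offline comes down to two checks, sketched below. The function names and the numbers are illustrative assumptions; `down_after` and `quorum` are named after the corresponding Sentinel settings.

```python
def is_subjectively_down(last_reply_at, now, down_after):
    """One sentinel's own view: no valid PING reply for longer than down_after seconds."""
    return now - last_reply_at > down_after


def is_objectively_down(votes, quorum):
    """Cluster-wide view: at least `quorum` sentinels consider the master down."""
    return sum(votes) >= quorum


now = 100.0
# Time at which each sentinel last received a valid reply from the master.
last_replies = {"sentinel-1": 60.0, "sentinel-2": 65.0, "sentinel-3": 99.5}

votes = [is_subjectively_down(t, now, down_after=30.0) for t in last_replies.values()]
print(votes)
print(is_objectively_down(votes, quorum=2))
```

A single sentinel with a bad network link can only produce a subjective verdict; the switchover starts only after enough sentinels agree.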

4. Capacity expansion

Sentinel mode solves the problem that master/slave mode cannot switch over automatically when the master node is down (that is, it provides high availability). However, as services grow, capacity expansion is inevitable.

There are two common expansion methods, vertical and horizontal:

  • Vertical expansion: increase capacity by adding memory to the master;
  • Horizontal expansion: increase capacity by adding master nodes.

Vertical expansion is convenient and needs no extra nodes, but the capacity of a single machine is limited, so horizontal expansion eventually becomes necessary. Horizontal expansion involves data migration, and the service must stay available during migration, so data should be moved as little as possible.

Obviously, sentinel mode does not meet this requirement. Hence Redis Cluster.

Redis Cluster mode

1. Architecture diagram

2. Cluster introduction

  1. Redis Cluster mode has no central node; each master node keeps connections to the other master nodes. Nodes exchange information through the gossip protocol, and each master node has one or more slave nodes.

  2. The client connects directly to the master nodes of the Redis cluster. Each key is hashed with CRC16, and the result modulo the number of hash slots decides which hash slot the key is stored in.

  3. Redis Cluster shards data across 16384 hash slots. As shown in the figure below, these hash slots are distributed across three master nodes:

  • Master1 is responsible for hash slots 0 to 5460
  • Master2 is responsible for hash slots 5461 to 10922
  • Master3 is responsible for hash slots 10923 to 16383
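The mapping from key to slot to master can be tried out with a few lines of code. This is a minimal sketch: it uses `binascii.crc_hqx` with an initial value of 0 as the CRC16 function, ignores hash tags (`{...}`), and hard-codes the three-master layout from the example, with Master3 starting at slot 10923 so the ranges do not overlap. The helper names are hypothetical.

```python
import binascii

SLOT_COUNT = 16384


def key_slot(key):
    """CRC16 of the key, modulo 16384 (hash tags are not handled in this sketch)."""
    return binascii.crc_hqx(key.encode("utf-8"), 0) % SLOT_COUNT


# Slot ranges of the three-master example, as (first slot, last slot, master name).
SLOT_RANGES = [
    (0, 5460, "Master1"),
    (5461, 10922, "Master2"),
    (10923, 16383, "Master3"),
]


def owner_of(slot):
    """Find the master whose range contains the slot."""
    for first, last, name in SLOT_RANGES:
        if first <= slot <= last:
            return name
    raise ValueError(f"slot {slot} is out of range")


for key in ["foo", "hello"]:
    slot = key_slot(key)
    print(key, slot, owner_of(slot))
```

Because the slot depends only on the key, every client and every node computes the same slot for the same key without consulting a central directory.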

  4. Each node stores a data distribution table that records which node owns which slots. Each node sends its slot information to the other nodes, so the table is continuously propagated between nodes.

  5. When a client connects to the cluster, it uses the IP address of any node in the cluster. When the client sends a command to this node, for example to get the value of a key, the command succeeds if the slot that holds the key is on that node. If the slot is not on that node, the node returns a MOVED error that tells the client which node owns the slot, and the client can then run the command on that node.
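The MOVED redirect described above can be simulated in memory. Everything here is a hypothetical model for illustration: `Node`, `MovedError`, and `cluster_get` are made-up names, the slot function ignores hash tags, and a real client would usually also cache the slot-to-node mapping it learns from the redirect.

```python
import binascii


def key_slot(key):
    """CRC16 of the key modulo 16384 (hash tags are not handled in this sketch)."""
    return binascii.crc_hqx(key.encode("utf-8"), 0) % 16384


class MovedError(Exception):
    """Stands in for the MOVED reply; carries the slot and the node that owns it."""

    def __init__(self, slot, owner):
        super().__init__(f"MOVED {slot} {owner}")
        self.slot = slot
        self.owner = owner


class Node:
    def __init__(self, name, first_slot, last_slot):
        self.name = name
        self.slots = range(first_slot, last_slot + 1)
        self.store = {}
        self.cluster = {}  # name -> Node; plays the role of the data distribution table

    def owner_of(self, slot):
        return next(n.name for n in self.cluster.values() if slot in n.slots)

    def execute_get(self, key):
        slot = key_slot(key)
        if slot not in self.slots:
            raise MovedError(slot, self.owner_of(slot))
        return self.store.get(key)


def cluster_get(cluster, entry_node, key):
    """Try the entry node first and follow a single MOVED redirect if needed."""
    try:
        return cluster[entry_node].execute_get(key)
    except MovedError as moved:
        return cluster[moved.owner].execute_get(key)


nodes = [Node("Master1", 0, 5460), Node("Master2", 5461, 10922), Node("Master3", 10923, 16383)]
cluster = {node.name: node for node in nodes}
for node in nodes:
    node.cluster = cluster

cluster["Master3"].store["foo"] = "bar"        # the slot of "foo" belongs to Master3
print(cluster_get(cluster, "Master1", "foo"))  # asks Master1, gets redirected
```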

3. Master node fault handling

In Redis Cluster, master node faults are handled in a way similar to sentinel mode. When a node cannot complete PING communication with another node in the cluster within the agreed time, it marks that node as subjectively offline and broadcasts this information to the whole cluster.

When a node learns that a majority of the master nodes in the cluster have marked some node as subjectively offline, it marks that node as objectively offline and broadcasts a FAIL message to the cluster. A master-slave switchover is then performed for the failed node immediately. After the original master node recovers, it automatically becomes a slave of the new master. If the master node has no slave nodes, the cluster becomes unavailable when it fails.
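The majority rule and the "no slave, no failover" case can be written down in a few lines. This is a rough sketch with hypothetical helper names; choosing the first slave is an assumption made here for brevity, since the text does not describe how the replacement is selected.

```python
def is_objectively_offline(reporters, master_count):
    """True once more than half of the masters have reported the node as offline."""
    return len(reporters) > master_count // 2


def handle_failure(slaves):
    """Return the slave to promote, or None if the failed master had no slaves."""
    # Assumption: simply pick the first slave in the list.
    return slaves[0] if slaves else None


master_count = 5
reporters = set()
history = []
for reporter in ["Master2", "Master3", "Master4"]:
    reporters.add(reporter)                    # a subjective-offline report arrives
    history.append(is_objectively_offline(reporters, master_count))

print(history)
print(handle_failure(["Slave1a", "Slave1b"]))  # a slave takes over the slots
print(handle_failure([]))                      # nothing to promote: slots stay uncovered
```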

4. Capacity expansion

In sentinel mode we ran into problems with capacity expansion, so how do we dynamically bring a node online in a cluster? How are hash slots allocated when a node is added to the cluster? When a new node is added to a cluster, it shakes hands with one node in the cluster. That node sends information about the other nodes in the cluster to the new node through the gossip protocol. The new node then shakes hands with those nodes and joins the cluster.

Then each node in the cluster hands over part of its hash slots to the new node, as shown below:

  • Master1 is responsible for hash slots 1365 to 5460
  • Master2 is responsible for hash slots 6827 to 10922
  • Master3 is responsible for hash slots 12288 to 16383
  • Master4 is responsible for hash slots 0 to 1364, 5461 to 6826, and 10923 to 12287
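The new layout follows from simple arithmetic: with four masters, each should hold 16384 / 4 = 4096 slots, so every existing master gives up whatever it holds beyond 4096. The sketch below reproduces the numbers; `rebalance` is a hypothetical helper that only computes ranges and does not migrate any data, and taking slots from the head of each range is an assumption chosen to match the example.

```python
def rebalance(assignment, new_node):
    """Move slots from the head of each existing range so all nodes hold an equal share."""
    total = sum(last - first + 1
                for ranges in assignment.values()
                for first, last in ranges)
    target = total // (len(assignment) + 1)    # slots per node after the new node joins

    result, moved = {}, []
    for node, ranges in assignment.items():
        first, last = ranges[0]                # assumes one contiguous range per node
        surplus = (last - first + 1) - target
        moved.append((first, first + surplus - 1))
        result[node] = [(first + surplus, last)]
    result[new_node] = moved
    return result


before = {
    "Master1": [(0, 5460)],
    "Master2": [(5461, 10922)],
    "Master3": [(10923, 16383)],
}
after = rebalance(before, "Master4")
for node, ranges in after.items():
    print(node, ranges)
```

Only the slots handed to Master4 have to be migrated; the remaining 12288 slots stay where they are, which keeps data movement small during expansion.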

To remove a node from a cluster, you simply move all of its hash slots to the other nodes and then remove the now-empty node, which no longer holds any hash slots.

About the author

Author: Hello, I’m Lewu, a "brick mover" (a rank-and-file engineer) at BAT. On the way from a small company to a big tech company I learned a lot, and I want to share that experience with people who need it, so I started the public account “IT Migrant Workers”. It is updated regularly, and I hope it helps you.