In the article on the differences between Redis SYNC and PSYNC, we talked about read/write separation, but we haven't talked about failover yet. This time we'll talk about a Redis high availability solution.
Redis Sentinel
What is a sentinel? When we first started writing projects, we probably learned to read names literally, the way the development documentation asks us to name variables. So what is a sentinel in the literal sense? It is the guard who raises the alarm before an invasion. How come there was no one watching over M78 when Belial infiltrated it?
The Sentinel in Redis, however, does more than raise the alarm: it is the Ultraman who can defeat Belial. For example, in a master-slave setup, if the master becomes unreachable, Sentinel promotes a slave to the master role so that it keeps serving clients, which is how high availability is achieved.
Simple Sentinel flow chart:
The deployment in the image above is the classic three-sentinel setup. But why are three sentinels the classic choice? What about two sentinels?
To answer that, let's first take a look at Sentinel's automatic switching mechanism.
Subjective down versus objective down (explained below if the terms are unfamiliar)
Redis Sentinel has two failure states, SDOWN and ODOWN. SDOWN is subjectively down: if a single sentinel thinks a master is down, that is a subjective judgment. ODOWN is objectively down: if a quorum of sentinels think a master is down, it is objectively down.
The SDOWN condition is very simple: if a sentinel pings a master and gets no valid reply for more than the number of milliseconds specified by down-after-milliseconds, that sentinel considers the master down. The condition for SDOWN to become ODOWN is also simple: if, within a specified period of time, a sentinel learns that a quorum of sentinels (itself included) consider the master SDOWN, the master is considered ODOWN.
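To make the two states concrete, here is a minimal sketch of the check as one sentinel might perform it. The class name `MasterView` and its fields are hypothetical and only mirror the configuration described above; this is not how Redis implements it internally.

```python
from dataclasses import dataclass, field


@dataclass
class MasterView:
    """One sentinel's view of a monitored master (hypothetical structure)."""
    down_after_ms: int                 # mirrors down-after-milliseconds
    quorum: int                        # mirrors the configured quorum
    last_valid_reply_ms: float = 0.0   # time of the last valid PING reply
    peers_reporting_down: set = field(default_factory=set)

    def is_sdown(self, now_ms: float) -> bool:
        # Subjectively down: no valid reply for longer than the threshold.
        return now_ms - self.last_valid_reply_ms > self.down_after_ms

    def is_odown(self, now_ms: float) -> bool:
        # Objectively down: this sentinel plus the agreeing peers reach the quorum.
        if not self.is_sdown(now_ms):
            return False
        return 1 + len(self.peers_reporting_down) >= self.quorum
```

With `quorum=2`, the sentinel's own SDOWN judgment is not enough; one more sentinel has to report the master as down before `is_odown` returns `True`.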
Automatic discovery mechanism for sentinel cluster
Sentinels discover each other through Redis' Pub/Sub system. Each sentinel publishes messages to a channel, and all other sentinels consume those messages and thereby become aware of its presence. Every two seconds, each sentinel publishes a message to the channel of every master and slave it monitors, containing its own host, IP, and run ID, as well as its monitoring configuration for the master. Each sentinel also subscribes to that channel on every master and slave it monitors, and so perceives the other sentinels that are monitoring the same master and slaves. Sentinels also exchange their monitoring configurations for the master with each other, so that the configurations stay synchronized.
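The discovery idea can be sketched with a toy in-memory channel. Everything here (`ToyChannel`, `ToySentinel`, the message fields) is a hypothetical stand-in, not real Redis; in an actual deployment the announcements travel over a Pub/Sub channel on the monitored nodes, which as far as I know is named `__sentinel__:hello`.

```python
class ToyChannel:
    """In-memory stand-in for a Redis Pub/Sub channel (not real Redis)."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        for callback in list(self.subscribers):
            callback(message)


class ToySentinel:
    def __init__(self, run_id, ip, port, channel):
        self.run_id, self.ip, self.port = run_id, ip, port
        self.known_peers = {}              # run_id -> (ip, port)
        self.channel = channel
        channel.subscribe(self.on_hello)

    def say_hello(self):
        # Real sentinels announce themselves every two seconds;
        # here the announcement is triggered manually.
        self.channel.publish(
            {"run_id": self.run_id, "ip": self.ip, "port": self.port}
        )

    def on_hello(self, msg):
        if msg["run_id"] != self.run_id:   # ignore our own announcement
            self.known_peers[msg["run_id"]] = (msg["ip"], msg["port"])
```

After every sentinel has called `say_hello()` once, each one knows the address of all the others without any of them being configured by hand.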
Automatic correction of slave configurations
Sentinel is responsible for automatically correcting some of the slaves' configuration. For example, if a slave is a potential master candidate, Sentinel makes sure that it is replicating data from the current master. If slaves are connected to the wrong master, for example after a failover, Sentinel makes sure they are reconnected to the correct master.
Quorum and majority
Before a master/standby switch, a quorum of sentinels must first agree that the master is ODOWN; then one sentinel is elected to perform the switch, and that sentinel must also be authorized by a majority of the sentinels. If quorum < majority, for example five sentinels with a majority of 3 and quorum set to 2, then authorization from three sentinels is enough to perform the switch. However, if quorum >= majority, then a quorum-sized number of sentinels must authorize it: with five sentinels and quorum set to 5, all five must agree before the switch can take place.
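The rule above boils down to a small piece of arithmetic. A minimal sketch, with hypothetical function names:

```python
def majority(total_sentinels: int) -> int:
    """Smallest number of sentinels that is more than half of the total."""
    return total_sentinels // 2 + 1


def required_authorizations(total_sentinels: int, quorum: int) -> int:
    """Authorizations the elected sentinel needs before it may fail over."""
    # Whichever is larger wins: the majority, or the configured quorum.
    return max(majority(total_sentinels), quorum)
```

For five sentinels, `required_authorizations(5, 2)` gives 3 and `required_authorizations(5, 5)` gives 5, matching the two cases described above.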
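With that, I think we have an answer to the earlier question, right?
Why not set up a cluster of two sentinels?
With two sentinels, the smallest possible quorum is 1, but the majority of two is 2, and a failover requires the consent of a majority of sentinels. If one of the two sentinels goes down (for example, together with the master's machine), the remaining sentinel can still mark the master ODOWN with quorum = 1, but it can never collect two authorizations, so no failover happens. With three sentinels the majority is still 2, so the cluster can lose one sentinel and still fail over. Why did I raise this question here? You never know what the interviewer is going to ask, so it doesn't hurt to find out.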
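The two-sentinel problem can be checked with a few lines. This is a simplified sketch with a hypothetical helper name; it assumes the majority is counted against all configured sentinels, not only the ones still alive.

```python
def can_fail_over(total_sentinels: int, alive_sentinels: int, quorum: int) -> bool:
    """Can the surviving sentinels both declare ODOWN and authorize a failover?"""
    majority = total_sentinels // 2 + 1
    # ODOWN needs `quorum` sentinels; authorization needs a majority of all of them.
    return alive_sentinels >= quorum and alive_sentinels >= majority
```

`can_fail_over(2, 1, 1)` is `False`, while `can_fail_over(3, 2, 2)` is `True`: three sentinels survive the loss of one, two sentinels do not.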
I also covered the automatic discovery mechanism of the sentinel cluster above. It shows up as an Alibaba interview question:
Can you tell me how the Redis sentinels communicate?
The sentinels in a cluster communicate with each other through Redis publish/subscribe.
Expanded content:
Heartbeat detection
How do we know the master is still alive? Each sentinel periodically sends the INFO command to the master to fetch its current state, and keeps a heartbeat check going with it.
When a failure occurs, the failover mechanism needs to start immediately, so how is timeliness ensured?
Each sentinel node sends the PING command to the master, the slaves, and the other sentinels every second. If a node responds within the specified time, it is considered healthy and alive. If it does not respond within the specified time (configurable), the sentinel considers that node to be offline.
Electing a sentinel leader
After confirming that the master node is faulty, the fault recovery phase begins. Fault recovery itself also involves a series of steps.
The first step is to elect a sentinel leader that alone is responsible for the failover, so that multiple sentinels do not try to perform it at the same time. Electing the leader requires negotiation among the sentinel nodes.
In distributed systems, this kind of negotiation is called consensus, and the algorithm used to negotiate is called a consensus algorithm.
A consensus algorithm solves the problem of how multiple nodes agree on a single result in a distributed scenario.
There are many consensus algorithms, such as Paxos, Raft, Gossip, etc.
Sentinel's leader election is similar to the Raft algorithm, whose rules are simple enough to understand.
The process is as follows:
- Each sentinel sets a random timeout; when it expires, the sentinel sends a request to the other sentinels asking to become the leader
- Each of the other sentinels replies only to the first request it receives
- The first sentinel to collect confirmations from a majority becomes the leader
- If, once the replies are counted, no sentinel has reached a majority, a new election is held, and this repeats until a leader is chosen
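The steps above can be sketched as a toy simulation. The function `elect_leader` and its random "network delay" model are hypothetical simplifications made up for illustration; real sentinels also track epochs and exchange actual messages.

```python
import random


def elect_leader(sentinel_ids, rng=None, max_rounds=10):
    """Return the elected sentinel id, or None if no round produced a majority."""
    if rng is None:
        rng = random.Random()
    needed = len(sentinel_ids) // 2 + 1
    for _ in range(max_rounds):
        # Each sentinel waits a random time before asking for votes.
        timeouts = {s: rng.random() for s in sentinel_ids}
        votes = {s: 0 for s in sentinel_ids}
        for voter in sentinel_ids:
            # Toy network: a request reaches other sentinels after an extra
            # random delay; a sentinel "hears" its own request immediately.
            arrival = {
                c: timeouts[c] + (0.0 if c == voter else rng.random() * 0.5)
                for c in sentinel_ids
            }
            # Each voter grants its single vote to the first request it receives.
            first = min(arrival, key=arrival.get)
            votes[first] += 1
        winner = max(votes, key=votes.get)
        if votes[winner] >= needed:
            return winner
        # No majority this round: start a new election.
    return None
```

Calling `elect_leader(["s1", "s2", "s3"])` usually returns a leader in the first round, because the sentinel with the shortest timeout tends to reach the others before they send their own requests.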
Once the sentinel leader has been elected, it performs all subsequent failover operations.
Select a new master
When the master fails, the sentinel leader needs to select one of the master's slave nodes to replace it.
The selection follows an order of precedence. When there are multiple slaves, the criteria are applied in this order: slave-priority configuration > data completeness > smaller run ID.
In other words, the slave with the lowest slave-priority value is selected first. If the priority is the same for all slaves, the slave with the most complete data is selected. If the data is equally complete, the slave with the smaller run ID is selected.
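The ordering above maps naturally onto a sort key. A minimal sketch with a hypothetical `Replica` type; it assumes data completeness is measured by the replication offset, and that a priority of 0 marks a slave that must never be promoted, which is how I understand the Redis setting but is not stated in this article.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    priority: int      # slave-priority; lower is preferred, 0 means never promote
    repl_offset: int   # how much of the master's data has been replicated
    run_id: str


def pick_new_master(replicas):
    """Return the best candidate, or None if no replica is eligible."""
    candidates = [r for r in replicas if r.priority != 0]
    if not candidates:
        return None
    # Lowest priority first, then largest offset, then smallest run ID.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))
```

With equal priorities, a replica at offset 900 beats one at offset 500; if two replicas are both at 900, the one whose run ID sorts first wins.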
Promote the new master
After the candidate master node has been selected by priority, the next step is to perform the actual master/slave switch.
The sentinel leader sends the slaveof no one command to the candidate node to make it the master.
The sentinel leader then sends the slaveof $newmaster command (where $newmaster stands for the new master's address) to all the other slaves of the failed node, making them slaves of the new master so that they start synchronizing data from it.
Finally, the sentinel leader demotes the faulty node to a slave and records this in its own configuration file. After the faulty node recovers, it automatically becomes a slave of the new master.
At this point, the whole failover is complete.
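The order of operations can be written down as a plan of commands. This sketch only builds the list and sends nothing to Redis; the function name and the `(name, ip, port)` tuples are hypothetical.

```python
def failover_plan(new_master, other_slaves, failed_master):
    """Each node is a (name, ip, port) tuple; returns (target, command) pairs in order."""
    _, ip, port = new_master
    follow = f"SLAVEOF {ip} {port}"
    plan = [(new_master[0], "SLAVEOF NO ONE")]        # step 1: promote the candidate
    plan += [(s[0], follow) for s in other_slaves]    # step 2: repoint the other slaves
    plan.append((failed_master[0], follow))           # step 3: demote the old master
    return plan
```

The last entry is only applied once the failed node is reachable again, which is why the sentinel leader records the demotion in its configuration instead of relying on the node being up.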
How the client learns about the new master
Finally, how does the client get the latest master address?
After a failover, the sentinel publishes a message on a Pub/Sub channel of the sentinel node; clients can subscribe to that channel to be notified when the master changes. A client can also get the latest master address by actively querying a sentinel node for the current master.
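As a sketch of the subscription side: to my knowledge the notification arrives on a channel named `+switch-master` with a payload of the form `<master name> <old ip> <old port> <new ip> <new port>`, and the active query is the `SENTINEL get-master-addr-by-name` command. Neither name appears in this article, so treat them as assumptions to verify against your Redis version. The parser below is a hypothetical helper that only handles the payload string.

```python
def parse_switch_master(payload: str):
    """Split a switch-master payload into the old and new master addresses."""
    name, old_ip, old_port, new_ip, new_port = payload.split()
    return {
        "master": name,
        "old": (old_ip, int(old_port)),
        "new": (new_ip, int(new_port)),
    }
```

A client would feed each message received on the channel into this function and then reconnect to the address under `"new"`.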
In addition, Sentinel provides a "hook" mechanism. You can configure a script in the Sentinel configuration file that is triggered when the failover is complete; the script notifies the client that a failover has occurred, and the client then fetches the latest master address from Sentinel again.
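A hook script could look roughly like the sketch below. It assumes the script is registered with the `sentinel client-reconfig-script <master-name> <script-path>` directive and receives the arguments `<master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>`; that is my understanding of the sample sentinel.conf, not something stated in this article, so check it against your version. The function name is hypothetical.

```python
import sys


def parse_reconfig_args(argv):
    """argv as the script would receive it, including the script name at index 0."""
    master_name, role, state, from_ip, from_port, to_ip, to_port = argv[1:8]
    return {
        "master": master_name,
        "role": role,
        "state": state,
        "old": (from_ip, int(from_port)),
        "new": (to_ip, int(to_port)),
    }


if __name__ == "__main__" and len(sys.argv) >= 8:
    event = parse_reconfig_args(sys.argv)
    # A real hook would notify clients here (for example, update a config store).
    print(f"{event['master']} moved to {event['new'][0]}:{event['new'][1]}")
```

The script itself does not need to talk to Redis; its job is only to tell the clients that they should ask Sentinel for the master address again.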