Follow github.com/hsfxuebao for more. If you find this helpful, please give it a Star.

Building on the master-slave replication described earlier: what happens if the master node fails? In a Redis master-slave deployment, the Sentinel mechanism is what automatically switches from a failed master to one of its slaves, which solves the failover problem of plain master-slave replication.

1. Overview

Redis Sentinel was introduced in Redis 2.8. Its core function is automatic failover of the master node.

The following figure shows the logical layout of a typical Sentinel cluster monitoring a master-slave deployment:

What does Sentinel do? The Redis documentation describes it as follows:

  • Monitoring: Sentinel continuously checks whether the master and slave nodes are working as expected.
  • Automatic failover: When the master is not working as expected, Sentinel starts a failover, promoting one of the failed master's slaves to be the new master and reconfiguring the other slaves to replicate from it.
  • Configuration provider: On startup, clients connect to Sentinel to obtain the address of the current master of the Redis service.
  • Notification: Sentinel can notify clients of the result of a failover.

Of these, monitoring and automatic failover let Sentinel detect a master failure and complete the switchover, while the configuration provider and notification functions govern its interaction with clients.

2. How the Sentinel mechanism works

2.1 Forming the Sentinel cluster

How is the Sentinel cluster in the figure above organized? Sentinel instances discover each other through Redis's pub/sub (publish/subscribe) mechanism.

In a master-slave cluster, the master has a channel named __sentinel__:hello, through which Sentinels discover and communicate with each other. In the figure below, Sentinel 1 publishes its IP address (172.16.19.3) and port (26579) to the __sentinel__:hello channel, to which Sentinels 2 and 3 are subscribed. Sentinels 2 and 3 can therefore read Sentinel 1's IP address and port directly from the channel and establish network connections to it.

In the same way, Sentinels 2 and 3 establish a connection with each other, forming a Sentinel cluster. The Sentinels communicate over these connections, for example to judge and agree on whether the master is offline.
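The discovery step can be sketched as a small in-process simulation. This is not Redis code: HelloChannel and Sentinel are hypothetical classes standing in for the master's __sentinel__:hello channel and for Sentinel processes, and the addresses of Sentinels 2 and 3 are made up.

```python
class HelloChannel:
    """Hypothetical stand-in for the master's __sentinel__:hello pub/sub channel."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, sentinel):
        self.subscribers.append(sentinel)

    def publish(self, ip, port):
        # Pub/sub delivers the message to every subscriber, including the publisher.
        for subscriber in self.subscribers:
            subscriber.on_hello(ip, port)


class Sentinel:
    """Hypothetical Sentinel that records the peers it learns about."""

    def __init__(self, ip, port):
        self.addr = (ip, port)
        self.peers = set()  # addresses of other Sentinels discovered so far

    def on_hello(self, ip, port):
        if (ip, port) != self.addr:  # ignore our own announcement
            self.peers.add((ip, port))  # a real Sentinel would now open a connection


channel = HelloChannel()
s1 = Sentinel("172.16.19.3", 26579)
s2 = Sentinel("172.16.19.4", 26579)  # made-up address
s3 = Sentinel("172.16.19.5", 26579)  # made-up address
for s in (s1, s2, s3):
    channel.subscribe(s)
for s in (s1, s2, s3):
    channel.publish(*s.addr)

print(sorted(s1.peers))
```

After each Sentinel has published once, every Sentinel knows the other two, which is the fully connected cluster described above.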

2.2 How Sentinels monitor the Redis nodes

  • How does Sentinel know which slaves to monitor?

Sentinel sends the INFO command to the master. As shown in the figure below, Sentinel 2 sends INFO to the master, which replies with, among other things, its list of slaves. Using the connection information in that list, Sentinel connects to each slave and keeps monitoring it over that connection. Sentinels 1 and 3 connect to the slaves in the same way.

  • Periodic monitoring tasks

Task 1: Every 10 seconds, each Sentinel sends INFO to the master and the slaves to obtain the current topology. This is why a Sentinel only needs to be configured to monitor the master: sending INFO to the master reveals its slaves, and a newly added slave is detected promptly.

Task 2: Every 2 seconds, each Sentinel publishes its own information, together with its view of the master, to the __sentinel__:hello channel on the Redis data nodes. Every Sentinel subscribes to this channel, so it learns about the other Sentinels and their view of the master. This exchange happens entirely through publish/subscribe.

Task 3: Every second, each Sentinel sends PING to the master, the slaves, and the other Sentinels as a heartbeat. The replies are an important basis on which a Sentinel judges whether a node is healthy.
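The three periodic tasks can be summarized as a simple schedule. This sketch is a simplification that assumes all three tasks start together at t=0 and fire on exact whole-second boundaries; the real Sentinel tracks when each command was last sent rather than aligning on a global clock. TASK_PERIODS and tasks_due are hypothetical names.

```python
# Periods, in seconds, of the three periodic tasks described above.
TASK_PERIODS = {
    "INFO to master/slaves": 10,
    "PUBLISH to __sentinel__:hello": 2,
    "PING to all instances": 1,
}


def tasks_due(second):
    """Return the tasks that fire at a given whole second (simplified model)."""
    return sorted(name for name, period in TASK_PERIODS.items() if second % period == 0)


for t in range(0, 11):
    print(t, tasks_due(t))
```

Over any 10-second window this yields 10 PING rounds, 5 hello publications, and 1 INFO round per monitored instance.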

2.3 Determining that the master is offline

How does Sentinel tell whether the master is offline?

First, two concepts need to be understood: subjective offline and objective offline.

  • Subjective offline (SDOWN): a single Sentinel considers an instance to be offline, for example because it received no valid replies or because of a network failure.

Sentinel sends PING once per second to every instance (master, slaves, and other Sentinels) with which it has a command connection, and decides whether the instance is online, from this Sentinel's own point of view, by checking whether the reply to PING is valid.

If an instance keeps returning invalid replies, or no reply at all, for down-after-milliseconds milliseconds, Sentinel considers it offline and sets the SRI_S_DOWN bit in its flags. Because down-after-milliseconds is configured per Sentinel, several Sentinels monitoring the same service may use different values and reach the subjective-offline verdict at different times. Keep this in mind in production.
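The subjective-offline rule can be sketched as follows, assuming a hypothetical InstanceState class that stands in for the state Sentinel keeps per instance (s_down plays the role of SRI_S_DOWN). Per the Redis Sentinel documentation, +PONG, -LOADING and -MASTERDOWN count as valid PING replies; anything else, or silence, does not refresh the timer.

```python
class InstanceState:
    """Hypothetical per-instance state; s_down plays the role of the SRI_S_DOWN flag."""

    VALID_REPLIES = {"+PONG", "-LOADING", "-MASTERDOWN"}

    def __init__(self, down_after_ms, now_ms):
        self.down_after_ms = down_after_ms
        self.last_valid_reply_ms = now_ms
        self.s_down = False

    def on_ping_reply(self, reply, now_ms):
        # Only valid replies refresh the timer; errors or silence do not.
        if reply in self.VALID_REPLIES:
            self.last_valid_reply_ms = now_ms

    def check(self, now_ms):
        # Subjectively offline once no valid reply has arrived for down_after_ms.
        self.s_down = now_ms - self.last_valid_reply_ms > self.down_after_ms
        return self.s_down


inst = InstanceState(down_after_ms=30000, now_ms=0)
inst.on_ping_reply("+PONG", now_ms=1000)
inst.on_ping_reply("-ERR unknown", now_ms=2000)  # invalid reply, timer not refreshed
print(inst.check(now_ms=20000))  # False: 19000 ms since the last valid reply
print(inst.check(now_ms=31001))  # True: 30001 ms since the last valid reply
```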

  • Objective offline (ODOWN): when the subjectively offline node is the master, the Sentinel asks the other Sentinels for their judgment of the master with the SENTINEL is-master-down-by-addr command. If enough of them also consider the master subjectively offline, so that the number of agreeing Sentinels reaches quorum, the Sentinel concludes that the master really has failed and marks it objectively offline. In short, a master is objectively offline once enough Sentinels agree that it is down.

When one Sentinel (Sentinel 2 in the figure below) judges the master to be "subjectively offline", it sends is-master-down-by-addr to the other Sentinels. Each of them replies Y (yes) or N (no), depending on its own connection to the master.

If the number of Y votes (2 here) is greater than or equal to the quorum setting in the Sentinel configuration file (quorum=2 in this example), the master is considered objectively offline.
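The objective-offline decision amounts to counting votes against quorum. In this sketch, is_objectively_down is a hypothetical helper: the asking Sentinel's own subjective-offline verdict counts as one vote, and the Y/N strings model the answers to is-master-down-by-addr as simplified in the figure.

```python
def is_objectively_down(own_sdown, peer_replies, quorum):
    """Hypothetical helper: own SDOWN verdict counts as one vote, each peer 'Y' as another."""
    votes = (1 if own_sdown else 0) + sum(1 for reply in peer_replies if reply == "Y")
    return votes >= quorum


# Sentinel 2 sees the master as down; one peer agrees (Y), one does not (N).
print(is_objectively_down(True, ["Y", "N"], quorum=2))  # True: 2 votes >= quorum 2
```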

2.4 Sentinel leader election

Once the master is judged offline, which Sentinel performs the master/slave switchover? This is decided by the Sentinel voting mechanism.

  • Why is an election (consensus) mechanism needed?

To avoid a single point of failure, Sentinels run as a distributed cluster, and any distributed cluster has to solve a consensus problem, which here takes the form of an election: both the failover and the notification should be carried out by a single leader Sentinel.

  • What is Sentinel's election mechanism?

Sentinel's election mechanism is fairly simple and is based on the Raft leader-election algorithm: a Sentinel that receives at least num(Sentinels)/2+1 votes becomes the leader; if no Sentinel does, a new round of election starts.

See the article "Distributed Algorithms – Raft Algorithm" for details.

  • A Sentinel that wants to become the Leader must meet two conditions:

    • First, it must receive votes from a majority of the Sentinels;
    • Second, its vote count must also be greater than or equal to the quorum value in the Sentinel configuration file.

With three Sentinels and quorum set to 2, a Sentinel needs only two votes to become the Leader.
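The two leader conditions combine into a single threshold: the larger of the majority and quorum. A hypothetical helper that mirrors the rule described above:

```python
def votes_needed(num_sentinels, quorum):
    """Votes a candidate must collect: a majority of all Sentinels, and never fewer than quorum."""
    majority = num_sentinels // 2 + 1
    return max(majority, quorum)


print(votes_needed(3, 2))  # 2: majority of 3 is 2, quorum is 2
print(votes_needed(5, 2))  # 3: majority of 5 is 3, larger than quorum
print(votes_needed(5, 4))  # 4: quorum is larger than the majority
```

Setting quorum above the majority therefore makes elections stricter, while setting it below the majority only lowers the bar for the objective-offline judgment, not for electing a leader.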

Many people confuse the master being judged "objectively offline" with the cluster actually being able to perform the master/slave switchover, which also requires a successful election. Consider another example.

A Redis deployment has 1 master, 4 slaves, and 5 Sentinels, with quorum set to 2. If 3 Sentinels fail and then the master goes down, can the remaining Sentinels judge the master to be "objectively offline"? Can they perform the switchover automatically?

In an actual test:

1. The Sentinel cluster can determine that the master is "objectively offline". Since quorum=2, when one surviving Sentinel judges the master "subjectively offline" and asks the other surviving Sentinel, it gets the same answer. Two Sentinels agreeing reaches the quorum value, so the Sentinel cluster marks the master "objectively offline".

2. The Sentinels cannot complete the master/slave switchover, however. To be elected Sentinel leader, a Sentinel needs a majority of the votes, that is, at least 5/2+1=3. With only two Sentinels left, no matter how they vote, a Sentinel can receive at most 2 votes and never reaches N/2+1.
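The 5-Sentinel scenario can be checked with a small calculation. The sketch assumes the best case, where every surviving Sentinel agrees the master is down and all of them vote for the same candidate; election_possible is a hypothetical helper.

```python
def election_possible(total_sentinels, alive_sentinels, quorum):
    """Best case: can survivors reach objective offline, and can one of them win the election?"""
    # Objective offline only needs `quorum` Sentinels agreeing that the master is down.
    can_odown = alive_sentinels >= quorum
    # Leader election needs a majority of ALL configured Sentinels, and at least quorum.
    needed = max(total_sentinels // 2 + 1, quorum)
    can_elect = alive_sentinels >= needed
    return can_odown, can_elect


print(election_possible(5, 2, 2))  # (True, False): ODOWN reached, no leader
print(election_possible(5, 3, 2))  # (True, True): one more survivor allows failover
```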

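2.5 Selecting the new master

Once the master has been judged objectively offline, how is a new master chosen from the remaining slaves?

  • Filter out unhealthy slaves, such as those that are offline or disconnected and have not responded to Sentinel's PING.
  • Choose the slave with the highest priority, configured by slave-priority (replica-priority since Redis 5.0) in redis.conf; a lower value means a higher priority.
  • Among slaves with the same priority, choose the one with the largest replication offset, i.e. the most up-to-date data.
  • If there is still a tie, choose the slave with the lexicographically smallest run ID.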
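The selection rules can be sketched as a filter followed by a sort. Replica and select_new_master are hypothetical names, and the sample slaves and run IDs are made up; the sketch also follows the Redis documentation in never promoting a slave whose priority is 0.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    healthy: bool     # replied to PING recently and is not disconnected
    priority: int     # slave-priority / replica-priority: lower is preferred, 0 = never promote
    repl_offset: int  # replication offset
    run_id: str


def select_new_master(replicas):
    candidates = [r for r in replicas if r.healthy and r.priority != 0]
    if not candidates:
        return None
    # Lower priority value first, then larger offset, then lexicographically smaller run ID.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


replicas = [
    Replica("slave-1", True, 100, 5000, "bbb"),
    Replica("slave-2", True, 100, 4800, "aaa"),
    Replica("slave-3", False, 10, 9000, "ccc"),  # unhealthy: filtered out
    Replica("slave-4", True, 0, 9999, "ddd"),    # priority 0: never promoted
]
print(select_new_master(replicas).name)  # slave-1
```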

2.6 Failover

Once the new master is selected, failover can begin.

Suppose, following the initial figure, that the master is objectively offline and Sentinel 3 has been elected Sentinel leader.

The failover process is as follows:

  • Detach slave-1 from its master (SLAVEOF NO ONE; since Redis 5.0, REPLICAOF NO ONE) and promote it to master.
  • Point slave-2 at the new master.
  • Notify clients that the master has changed.
  • Reconfigure the old master as a slave of the new master once it comes back online.
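The topology changes can be modeled with a dictionary mapping each node to the master it replicates. This hypothetical failover helper only rewrites that mapping; sending the actual commands and notifying clients are left out.

```python
def failover(topology, old_master, new_master):
    """Hypothetical helper. topology maps node -> the master it replicates (None = is a master)."""
    new_topology = dict(topology)
    new_topology[new_master] = None  # REPLICAOF NO ONE: promote the chosen slave
    for node, master in topology.items():
        if master == old_master and node != new_master:
            new_topology[node] = new_master  # repoint the remaining slaves
    new_topology[old_master] = new_master  # old master rejoins as a slave when it recovers
    return new_topology


before = {"master": None, "slave-1": "master", "slave-2": "master"}
after = failover(before, old_master="master", new_master="slave-1")
print(after)  # {'master': 'slave-1', 'slave-1': None, 'slave-2': 'slave-1'}
```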

After the failover:

3. How Sentinel obtains server information

3.1 Obtaining master information

By default, Sentinel sends the INFO command to the master over its command connection every 10 seconds and parses the reply to obtain the master's current state. Two kinds of information are captured:

  • Information about the master itself, including its run_id and its role.
  • Information about all of its slaves: each slave is described by a line starting with the string slave, which records the slave's IP address and port (the master keeps this connection information for its slaves).
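Parsing the master's INFO reply for these two kinds of information could look like the sketch below. It assumes the usual INFO text layout (run_id in the Server section, role and slaveN lines in the Replication section); parse_master_info and the sample reply, including its run ID, are made up for illustration.

```python
def parse_master_info(info_text):
    """Hypothetical parser: extract run_id, role and the slave list from a master's INFO reply."""
    result = {"run_id": None, "role": None, "slaves": []}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue  # skip blank lines and section headers
        key, value = line.split(":", 1)
        if key == "run_id":
            result["run_id"] = value
        elif key == "role":
            result["role"] = value
        elif key.startswith("slave") and key[5:].isdigit():
            # e.g. slave0:ip=127.0.0.1,port=6380,state=online,offset=1234,lag=0
            fields = dict(item.split("=", 1) for item in value.split(","))
            result["slaves"].append((fields["ip"], int(fields["port"])))
    return result


SAMPLE_MASTER_INFO = """# Server
run_id:0123456789abcdef0123456789abcdef01234567

# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6380,state=online,offset=1234,lag=0
slave1:ip=127.0.0.1,port=6381,state=online,offset=1234,lag=1
"""

print(parse_master_info(SAMPLE_MASTER_INFO))
```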

3.2 Obtaining slave information

When Sentinel detects a new slave, it creates a sentinelRedisInstance structure for it and also opens a command connection and a subscription connection to it. By default, Sentinel then sends INFO to the slave over the command connection every 10 seconds and parses the reply to obtain the slave's current information, including the slave's run_id, its role, the master's IP address and port, the status of the master-slave link (master_link_status), and the slave's priority (slave_priority).
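A similar hypothetical parser can pull the listed fields out of a slave's INFO reply. The sample reply is invented, and the sketch assumes one key:value pair per line with the master address reported as master_host and master_port.

```python
def parse_slave_info(info_text):
    """Hypothetical parser: extract the fields Sentinel tracks for a slave."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and ":" in line:
            key, value = line.split(":", 1)
            fields[key] = value
    return {
        "run_id": fields.get("run_id"),
        "role": fields.get("role"),
        "master_host": fields.get("master_host"),
        "master_port": int(fields["master_port"]) if "master_port" in fields else None,
        "master_link_status": fields.get("master_link_status"),
        "slave_priority": int(fields["slave_priority"]) if "slave_priority" in fields else None,
    }


SAMPLE_SLAVE_INFO = """# Server
run_id:fedcba9876543210fedcba9876543210fedcba98

# Replication
role:slave
master_host:127.0.0.1
master_port:6379
master_link_status:up
slave_priority:100
"""

print(parse_slave_info(SAMPLE_SLAVE_INFO))
```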

3.3 Sentinel sends information to the master and slaves

By default, every 2 seconds Sentinel sends a command of the following form to every monitored master and slave over its command connection:

PUBLISH __sentinel__:hello "<s_ip>,<s_port>,<s_runid>,<s_epoch>,<m_name>,<m_ip>,<m_port>,<m_epoch>"

This publishes a message to the server's __sentinel__:hello channel. The message contains the following parameters:

  • Parameters starting with s_ describe the Sentinel itself.
  • Parameters starting with m_ describe a master: if the Sentinel is sending to a master, they describe that master; if it is sending to a slave, they describe the master that the slave is replicating.
Parameter   Description
s_ip        IP address of the Sentinel
s_port      Port number of the Sentinel
s_runid     Run ID of the Sentinel
s_epoch     Current configuration epoch of the Sentinel
m_name      Name of the master
m_ip        IP address of the master
m_port      Port number of the master
m_epoch     Current configuration epoch of the master

Here is an example of a message that Sentinel sends to the master with PUBLISH:

In this example, the Sentinel's IP address is 127.0.0.1, its port is 26379, its run ID is the string shown in the message, and its current configuration epoch is 0. The master is named mymaster, with IP address 127.0.0.1, port 6379, and current configuration epoch 0.
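Since the message body is just the eight parameters joined by commas, building and parsing it is straightforward. The Hello type and the encode_hello/decode_hello helpers are hypothetical, and the run ID below is a placeholder because the original example's run ID is not reproduced here.

```python
from typing import NamedTuple


class Hello(NamedTuple):
    """Hypothetical container for the eight hello-message parameters, in wire order."""
    s_ip: str
    s_port: int
    s_runid: str
    s_epoch: int
    m_name: str
    m_ip: str
    m_port: int
    m_epoch: int


def encode_hello(hello):
    # Join the parameters with commas, in the order listed in the table above.
    return ",".join(str(value) for value in hello)


def decode_hello(payload):
    parts = payload.split(",")
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}")
    s_ip, s_port, s_runid, s_epoch, m_name, m_ip, m_port, m_epoch = parts
    return Hello(s_ip, int(s_port), s_runid, int(s_epoch),
                 m_name, m_ip, int(m_port), int(m_epoch))


msg = Hello("127.0.0.1", 26379, "placeholder-runid", 0, "mymaster", "127.0.0.1", 6379, 0)
payload = encode_hello(msg)
print(payload)  # 127.0.0.1,26379,placeholder-runid,0,mymaster,127.0.0.1,6379,0
```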

3.4 Sentinel receives channel messages from the master and slaves

After Sentinel establishes a subscription connection to a master or slave, it sends the command SUBSCRIBE __sentinel__:hello over that connection. So for every server it is connected to, Sentinel both publishes messages to the server's __sentinel__:hello channel over the command connection and receives messages from that channel over the subscription connection.

Suppose three Sentinels, sentinel1, sentinel2, and sentinel3, are monitoring the same server. When sentinel1 publishes a message to the server's __sentinel__:hello channel, every Sentinel subscribed to that channel, including sentinel1 itself, receives the message.

When a Sentinel receives a message from the __sentinel__:hello channel, it parses the message, extracts the 8 parameters (Sentinel IP, port, run ID, and so on), and performs the following checks:

  • If the Sentinel run ID in the message is the same as the receiving Sentinel's own run ID, the message was sent by this Sentinel itself, so it is discarded without further processing.

  • Otherwise, the message was sent by another Sentinel monitoring the same server, and the receiving Sentinel updates the instance structure of the corresponding master based on the parameters in the message.
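The self-check above boils down to comparing the third field of the message (s_runid) with the receiver's own run ID. A minimal sketch with a hypothetical should_process helper and made-up run IDs:

```python
def should_process(hello_payload, my_runid):
    """Hypothetical helper: False when the hello message was published by this Sentinel itself."""
    fields = hello_payload.split(",")
    if len(fields) != 8:
        raise ValueError("malformed hello message")
    sender_runid = fields[2]  # the third field is s_runid
    return sender_runid != my_runid


own = "127.0.0.1,26379,runid-A,0,mymaster,127.0.0.1,6379,0"
other = "127.0.0.1,26380,runid-B,0,mymaster,127.0.0.1,6379,0"
print(should_process(own, "runid-A"))    # False: our own message, discard it
print(should_process(other, "runid-A"))  # True: from another Sentinel, process it
```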

3.5 Sentinel updates its sentinels dictionary

In the instance structure it keeps for each master, Sentinel maintains a sentinels dictionary that records every other Sentinel monitoring the same master (the Sentinel itself is not included). When a Sentinel receives a message from another Sentinel, it extracts two groups of parameters from it:

  • Sentinel parameters: the sending Sentinel's IP address, port, run ID, and configuration epoch.

  • Master parameters: the monitored master's name, IP address, port, and configuration epoch.

Suppose three Sentinels, 127.0.0.1:26379, 127.0.0.1:26380, and 127.0.0.1:26381, are monitoring the master 127.0.0.1:6379, and Sentinel 127.0.0.1:26379 receives the following messages:

This Sentinel then performs the following actions:

  • The first message was sent by itself, so it is ignored.

  • The second message was sent by 26381, so the Sentinel uses its contents to update the instance structure for 26381 in the sentinels dictionary.

  • The third message was sent by 26380, so it likewise updates the instance structure for 26380 in the dictionary.

Each Sentinel has its own sentinels dictionary. For Sentinel 26379, the dictionary holds the two other Sentinels, 26380 and 26381; the same applies to the other Sentinels.
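The walk-through above can be reproduced with a small sketch. The three messages, their run IDs, and the choice of "ip:port" as the dictionary key are assumptions for illustration; update_sentinels is a hypothetical helper.

```python
def update_sentinels(master_state, payload, my_runid):
    """Hypothetical helper: record the sender of a hello message in the master's sentinels dict."""
    s_ip, s_port, s_runid, s_epoch, m_name, m_ip, m_port, m_epoch = payload.split(",")
    if s_runid == my_runid:
        return  # our own message: ignore it
    if m_name != master_state["name"]:
        return  # message about a different master
    key = f"{s_ip}:{s_port}"
    master_state["sentinels"][key] = {"runid": s_runid, "epoch": int(s_epoch)}


# State kept by Sentinel 127.0.0.1:26379 (run ID "runid-A") for the master it monitors.
master_state = {"name": "mymaster", "ip": "127.0.0.1", "port": 6379, "sentinels": {}}
messages = [
    "127.0.0.1,26379,runid-A,0,mymaster,127.0.0.1,6379,0",  # sent by itself
    "127.0.0.1,26381,runid-C,0,mymaster,127.0.0.1,6379,0",
    "127.0.0.1,26380,runid-B,0,mymaster,127.0.0.1,6379,0",
]
for msg in messages:
    update_sentinels(master_state, msg, my_runid="runid-A")

print(sorted(master_state["sentinels"]))  # ['127.0.0.1:26380', '127.0.0.1:26381']
```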

3.6 Sentinel creates command connections to other Sentinels

When a Sentinel discovers a new Sentinel through channel messages, it not only updates its sentinels dictionary but also creates a command connection to the new Sentinel, and the new Sentinel likewise creates a command connection back to it. In the end, the Sentinels monitoring the same master form an interconnected network, as shown below:
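Because each pair of Sentinels opens a command connection in each direction, n Sentinels end up with n(n-1) directed command connections. A hypothetical helper that enumerates them:

```python
from itertools import combinations


def command_links(sentinels):
    """Hypothetical helper: every pair of Sentinels connects in both directions."""
    links = set()
    for a, b in combinations(sentinels, 2):
        links.add((a, b))  # a opens a command connection to b
        links.add((b, a))  # b opens one back to a
    return links


sentinels = ["127.0.0.1:26379", "127.0.0.1:26380", "127.0.0.1:26381"]
links = command_links(sentinels)
print(len(links))  # 6 = 3 * 2 directed connections
```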

 
