1. Introduction

Master-slave replication is the foundation of distributed Redis, but plain master-slave replication is not highly available. If the master goes down, O&M personnel have to switch masters by hand, which is not acceptable in production. To handle node failures, Redis officially provides a high-availability solution: Redis Sentinel. A Sentinel system consists of one or more Sentinel instances and can monitor any number of masters and their slaves. When a monitored master goes down, Sentinel automatically takes it offline and promotes one of its slaves to be the new master.

If the old Master stays offline longer than the limit configured by the user, the Sentinel system performs a failover for it. The failover consists of three steps:

  1. Select the best Slave of the old Master as the new Master
  2. Send new replication instructions to the other Slaves so that they replicate the new Master
  3. Keep monitoring the old Master and, if it comes back online, make it a Slave of the new Master

This article uses the following nodes:

IP address        Node role                Port
192.168.211.104   Redis Master / Sentinel  6379 / 26379
192.168.211.105   Redis Slave / Sentinel   6379 / 26379
192.168.211.106   Redis Slave / Sentinel   6379 / 26379

2. Sentinel initialization and network connection

Sentinel is a stripped-down Redis server: at startup it loads a different command table and configuration file, so it is essentially a Redis service with fewer commands and some special features. When a Sentinel starts, it goes through the following steps:

  1. Initialize the Sentinel server
  2. Replace the regular Redis code with Sentinel-specific code
  3. Initialize the Sentinel state
  4. Initialize the list of masters to monitor from the Sentinel configuration file supplied by the user
  5. Create network connections to the masters
  6. Obtain slave information from the masters and create network connections to the slaves
  7. Obtain information about other Sentinels through publish/subscribe and create network connections between Sentinels

2.1 Initializing the Sentinel Server

Sentinel is essentially a Redis server, so starting Sentinel requires starting a Redis server, but Sentinel does not need to read the RDB/AOF file to restore the data state.

2.2 Replace the regular Redis code with the Sentinel special code

Sentinel uses only a small subset of Redis commands; most commands are not available to Sentinel clients. Sentinel also has some special features of its own. For both reasons, at startup Sentinel replaces the code used by a regular Redis server with Sentinel-specific code and loads a different command table. Commands such as SET and DBSIZE are not supported, while commands such as PING, PSUBSCRIBE, SUBSCRIBE, UNSUBSCRIBE, and INFO are kept because Sentinel relies on them to do its work.
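
The restricted command table can be pictured as a dispatcher that only knows a handful of commands. The following Python sketch is an illustration only: the dispatch function and its simulated replies are hypothetical, and the command set is just the examples named above plus SENTINEL, not Sentinel's complete table.

```python
# Illustrative subset only: the commands named in the text, plus SENTINEL itself.
SENTINEL_COMMANDS = {"PING", "SENTINEL", "SUBSCRIBE", "UNSUBSCRIBE", "PSUBSCRIBE", "INFO"}


def dispatch(command_line: str) -> str:
    """Return a simulated reply; commands outside the table are rejected."""
    parts = command_line.split()
    if not parts:
        return "-ERR empty command"
    name = parts[0].upper()
    if name not in SENTINEL_COMMANDS:
        return f"-ERR unknown command '{parts[0]}'"
    if name == "PING":
        return "+PONG"
    # Placeholder reply: a real server would execute the command here.
    return f"+OK ({name} accepted)"


print(dispatch("PING"))         # +PONG
print(dispatch("SET key val"))  # -ERR unknown command 'SET'
```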

2.3 Initializing the Sentinel status

After loading the Sentinel-specific code, Sentinel initializes the sentinelState structure, which stores Sentinel-related state. Its most important member is the masters dictionary.

    struct sentinelState {
        // Current epoch, used for failover
        uint64_t current_epoch;
        // Monitored masters:
        // key   -> master name
        // value -> pointer to a sentinelRedisInstance
        dict *masters;
        // ...
    } sentinel;

2.4 Initialize the list of primary servers monitored by Sentinel

The masters monitored by Sentinel are stored in the masters dictionary of sentinelState, which is populated right after sentinelState is created.

  • Each key in masters is the name of a master
  • Each value in masters is a pointer to a sentinelRedisInstance

The name of the primary server is specified by our sentinel.conf configuration file. The following primary server name is redis-master:

    daemonize yes
    port 26379
    protected-mode no
    dir "/usr/local/sof/redis-6.2.4/sentinel-tmp"
    sentinel monitor redis-master 192.168.211.104 6379 2
    sentinel down-after-milliseconds redis-master 30000
    sentinel failover-timeout redis-master 180000
    sentinel parallel-syncs redis-master 1

The sentinelRedisInstance holds the Redis server information (master, slave, and Sentinel information are all stored in this instance).

    typedef struct sentinelRedisInstance {
        // Type and state of this instance, e.g. SRI_MASTER, SRI_SLAVE, SRI_SENTINEL
        int flags;
        // Instance name: the configured name for a master, ip:port for slaves and Sentinels
        char *name;
        // Server run ID
        char *runid;
        // Configuration epoch, used for failover
        uint64_t config_epoch;
        // Instance address
        sentinelAddr *addr;
        // Time without a valid reply before the instance is judged subjectively offline
        // (sentinel down-after-milliseconds redis-master 30000)
        mstime_t down_after_period;
        // Number of votes needed to judge the instance objectively offline
        // (sentinel monitor redis-master 192.168.211.104 6379 2)
        int quorum;
        // Number of slaves that may sync with the new master at the same time during failover
        // (sentinel parallel-syncs redis-master 1)
        int parallel_syncs;
        // Failover timeout
        // (sentinel failover-timeout redis-master 180000)
        mstime_t failover_timeout;
        // ...
    } sentinelRedisInstance;

With the one-master, two-slave configuration above, the masters dictionary holds a single entry: the key redis-master, pointing to a sentinelRedisInstance flagged SRI_MASTER whose address is 192.168.211.104:6379.
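
The path from configuration file to masters dictionary can be sketched in Python. This is a simplified illustration, not Redis's actual parser: the Instance class and load_masters function are hypothetical, only the four sentinel directives shown above are handled, and the default values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical, simplified stand-in for sentinelRedisInstance."""
    name: str
    ip: str
    port: int
    quorum: int
    flags: set = field(default_factory=lambda: {"SRI_MASTER"})
    down_after_period: int = 30000   # ms, placeholder default
    failover_timeout: int = 180000   # ms, placeholder default
    parallel_syncs: int = 1          # placeholder default


def load_masters(conf_text: str) -> dict:
    """Build the masters dictionary (master name -> Instance) from sentinel.conf text."""
    masters = {}
    options = {
        "down-after-milliseconds": "down_after_period",
        "failover-timeout": "failover_timeout",
        "parallel-syncs": "parallel_syncs",
    }
    for line in conf_text.splitlines():
        words = line.split()
        if len(words) < 4 or words[0].lower() != "sentinel":
            continue  # not a "sentinel ..." directive
        directive, name = words[1].lower(), words[2]
        if directive == "monitor" and len(words) == 6:
            masters[name] = Instance(name, words[3], int(words[4]), int(words[5]))
        elif directive in options and name in masters:
            setattr(masters[name], options[directive], int(words[3]))
    return masters


CONF = """
port 26379
sentinel monitor redis-master 192.168.211.104 6379 2
sentinel down-after-milliseconds redis-master 30000
sentinel failover-timeout redis-master 180000
sentinel parallel-syncs redis-master 1
"""

masters = load_masters(CONF)
print(masters["redis-master"])
```

The dictionary is keyed by master name, so a Sentinel that monitors several masters simply gets one entry per sentinel monitor line.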

2.5 Creating a Network Connection to the primary server

Once the instance structure is initialized, Sentinel creates network connections to the Master and becomes a client of the Master. Two connections are created between Sentinel and the Master, a command connection and a subscription connection:

  • The command connection is used to send commands and obtain master and slave information
  • The subscription connection is used to exchange information between Sentinels: each Sentinel subscribes to the __sentinel__:hello channel of every master and slave it monitors (note that Sentinels do not create subscription connections to one another; they get the initial information about other Sentinels by subscribing to the __sentinel__:hello channel)

After the connections are created, Sentinel sends an INFO command to the Master every 10 seconds. The reply provides two kinds of information:

  • Information about the Master itself
  • Information about the Slaves under the Master
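
To make this concrete, here is a minimal Python sketch that pulls both kinds of information out of an INFO reply. The parse_master_info function is hypothetical, and the sample text is made up for illustration (the run ID and offsets are invented); it only mimics the slaveN:ip=...,port=... lines that a master reports in its replication section.

```python
import re

# Made-up sample reply; the run ID and offsets are invented for illustration.
SAMPLE_INFO = """\
# Server
run_id:7611c59dc3a29aa6fa0609f841bb6a1019008a9c
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.211.105,port=6379,state=online,offset=1000,lag=0
slave1:ip=192.168.211.106,port=6379,state=online,offset=1000,lag=1
"""


def parse_master_info(info_text: str) -> dict:
    """Extract the master's own fields and its slave list from INFO output."""
    result = {"slaves": {}}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue  # skip blank lines and section headers
        key, value = line.split(":", 1)
        if re.fullmatch(r"slave\d+", key):
            # One line per slave: comma-separated key=value pairs.
            fields = dict(item.split("=", 1) for item in value.split(","))
            result["slaves"][f"{fields['ip']}:{fields['port']}"] = fields
        elif key in ("run_id", "role"):
            result[key] = value
    return result


info = parse_master_info(SAMPLE_INFO)
print(info["role"], sorted(info["slaves"]))
```

Keying the slaves by ip:port mirrors how the instance name is formed for slaves in the structure shown earlier.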



2.6 Creating a Network Connection to the secondary server

Using the slave information obtained from the Master, Sentinel creates network connections to each Slave. As with the Master, it creates both a command connection and a subscription connection.

Once connected, Sentinel is a client of the Slave and sends it an INFO command every 10 seconds as well. At this point Sentinel has the relevant data for both the Master and its Slaves. The important fields are:

  • Server IP address and port
  • Server run ID (run_id)
  • Server role (role)
  • Master link status (master_link_status)
  • Slave replication offset (slave_repl_offset), used when electing a new Master during failover
  • Slave priority (slave_priority)

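The Master's instance structure now also records the two Slaves, 192.168.211.105:6379 and 192.168.211.106:6379, each as its own sentinelRedisInstance named by its ip:port.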

2.7 Creating a network connection between Sentinels

Sentinels discover and talk to one another through the __sentinel__:hello channel. Each Sentinel subscribes to this channel on every master and slave it monitors, and every 2 seconds it publishes a message to the channel in the following form:

    PUBLISH __sentinel__:hello "<s_ip>,<s_port>,<s_runid>,<s_epoch>,<m_name>,<m_ip>,<m_port>,<m_epoch>"

Fields prefixed with s_ describe the Sentinel and fields prefixed with m_ describe the Master: ip is the IP address, port the port, runid the run ID, epoch the configuration epoch, and m_name the Master's name.

Sentinels that monitor the same master have the same master IP and port in their configuration files, so they all subscribe to the same __sentinel__:hello channel and can learn each other's IP and port from the messages received on it. Two cases need to be handled:

  • If the run ID in the message equals the Sentinel's own run ID, the Sentinel published the message itself and discards it
  • Otherwise the message was published by another Sentinel, and the receiver adds or updates that Sentinel's instance data according to its IP address and port
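
The two cases above can be sketched as a small message handler. This is an illustration under stated assumptions: handle_hello is a hypothetical function, the message is assumed to carry eight comma-separated fields (Sentinel ip, port, run ID, and epoch, followed by master name, ip, port, and epoch), and the run IDs in the example are shortened placeholders.

```python
def handle_hello(my_runid: str, sentinels: dict, payload: str) -> bool:
    """Process one hello message; return True if it came from another Sentinel."""
    # Assumed field layout: Sentinel fields first, then master fields.
    s_ip, s_port, s_runid, s_epoch, m_name, m_ip, m_port, m_epoch = payload.split(",")
    if s_runid == my_runid:
        return False  # published by this Sentinel itself: discard
    # Add the other Sentinel, or refresh what is known about it, keyed by ip:port.
    sentinels[f"{s_ip}:{s_port}"] = {
        "runid": s_runid,
        "epoch": int(s_epoch),
        "master": m_name,
    }
    return True


known = {}
handle_hello("aaa", known, "192.168.211.104,26379,aaa,0,redis-master,192.168.211.104,6379,0")
handle_hello("aaa", known, "192.168.211.105,26379,bbb,0,redis-master,192.168.211.104,6379,0")
print(known)
```

After both calls only the Sentinel at 192.168.211.105:26379 is recorded, because the first message carries the receiver's own run ID.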

Sentinels create only command connections to one another, never subscription connections. Each Sentinel discovered this way is added to the Master's instance structure as a sentinelRedisInstance named by its ip:port.


3. How Sentinel works

Sentinel's primary job is to monitor the Redis servers and switch to a new Master when the current Master has been offline longer than the preset limit. The process has many details, but it can be divided into four steps: detecting whether the Master is subjectively offline, detecting whether the Master is objectively offline, electing the lead Sentinel, and failing over.

3.1 Checking whether the Master is subjectively offline

Every second, Sentinel sends a PING command to every Master, Slave, and Sentinel it holds a sentinelRedisInstance for, to determine whether they are still online.

    sentinel down-after-milliseconds redis-master 30000

This is the down-after-milliseconds option in the Sentinel configuration file. If an instance returns only invalid replies to this Sentinel's PINGs for down-after-milliseconds consecutive milliseconds (30000 here), this Sentinel considers the instance subjectively offline. The value configured for a master also applies to that master's slaves and to the other Sentinels monitoring it.

An invalid reply is any reply other than +PONG, -LOADING, or -MASTERDOWN, or no reply at all.

If Sentinel decides that the Master is subjectively offline, it sets the SRI_S_DOWN flag in the flags field of the Master's sentinelRedisInstance.
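
The subjective-offline check can be sketched in a few lines of Python. This is a minimal illustration, not Sentinel's actual code: the function names are hypothetical, timestamps are plain millisecond integers, and a missing reply is modelled as None.

```python
VALID_REPLIES = ("+PONG", "-LOADING", "-MASTERDOWN")


def is_valid_reply(reply) -> bool:
    """Only the three replies above count as valid; None models no reply at all."""
    return reply is not None and reply.upper().startswith(VALID_REPLIES)


def is_subjectively_down(last_valid_reply_ms: int, now_ms: int, down_after_ms: int) -> bool:
    """True once no valid reply has arrived for longer than down_after_ms."""
    return now_ms - last_valid_reply_ms > down_after_ms


# Simulated PING history as (time in ms, reply) pairs, with down-after-milliseconds = 30000.
flags = {"SRI_MASTER"}
last_valid = 0
for now, reply in [(1000, "+PONG"), (2000, None), (31000, "-ERR"), (31001, None)]:
    if is_valid_reply(reply):
        last_valid = now
    if is_subjectively_down(last_valid, now, 30000):
        flags.add("SRI_S_DOWN")

print("SRI_S_DOWN" in flags)  # True
```

In this run the last valid reply arrives at 1000 ms, so the flag is set only by the check at 31001 ms, when more than 30000 ms have passed without a valid reply.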


3.2 Checking whether the Master is objectively offline

At this point the Master is offline only in this Sentinel's own view, that is, subjectively offline. To decide whether the Master is objectively offline, the Sentinel asks the other Sentinels. Once the number of Sentinels that consider the Master offline (subjectively or objectively) reaches the configured quorum, the current Sentinel marks the Master as objectively offline.

The current Sentinel sends the following command to the other Sentinels recorded in the Master's sentinelRedisInstance:

    SENTINEL is-master-down-by-addr <ip> <port> <current_epoch> <runid>

  • ip: IP address of the Master judged to be subjectively offline
  • port: port of the Master judged to be subjectively offline
  • current_epoch: the current Sentinel's configuration epoch
  • runid: the run ID of the current Sentinel, or * (see below)

current_epoch and runid are both used in the Sentinel election. Once the Master is objectively offline, a lead Sentinel must be elected to choose the new Master, and these two parameters play an important role in that election.

A Sentinel that receives this command checks whether the Master is offline according to the parameters in the command, then returns a reply with three parameters:

  • down_state: the check result; 1 means offline, 0 means not offline
  • leader_runid: * when the command is only an offline check; a run ID when it is used to elect the lead Sentinel
  • leader_epoch: the configuration epoch, meaningful only when leader_runid is a run ID; otherwise always 0

For the offline check, the exchange looks like this:

  1. When a Sentinel detects that the Master is subjectively offline, it queries the other Sentinels with current_epoch = 0 and runid = *
  2. Each Sentinel that receives the command replies with down_state = 1 or 0, leader_runid = *, and leader_epoch = 0
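
The replies are then tallied against quorum. A minimal Python sketch, assuming a hypothetical is_objectively_down function that receives only the down_state values returned by the other Sentinels:

```python
def is_objectively_down(own_view_down: bool, peer_down_states: list, quorum: int) -> bool:
    """Count this Sentinel's own view plus every peer reply with down_state == 1."""
    if not own_view_down:
        return False  # only a Master already judged subjectively offline is put to a vote
    votes = 1 + sum(1 for state in peer_down_states if state == 1)
    return votes >= quorum


# Three Sentinels with quorum 2: the asking Sentinel plus one agreeing peer is enough.
print(is_objectively_down(True, [1, 0], quorum=2))  # True
print(is_objectively_down(True, [0, 0], quorum=2))  # False
```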



3.3 Electing the lead Sentinel

A reply with down_state = 1 means that the Sentinel receiving the is-master-down-by-addr command also considers the Master offline. If the number of Sentinels that consider the Master offline (including the current one) is greater than or equal to quorum (the value set in the configuration file), the current Sentinel marks the Master as objectively offline. It then sends the following command again:

    SENTINEL is-master-down-by-addr <ip> <port> <current_epoch> <runid>

This time runid is no longer * but the current Sentinel's own run ID, meaning that the current Sentinel asks every Sentinel receiving is-master-down-by-addr to set it as the lead Sentinel. Votes are granted on a first-come, first-served basis: each receiving Sentinel sets as its leader the Sentinel whose request reaches it first. The sending Sentinel uses the replies to work out how many Sentinels have chosen it. If more than half of the Sentinels have set it as their leader (the total number is available from the sentinels dictionary of the Master's sentinelRedisInstance), it considers itself the lead Sentinel and starts the failover. Because more than half of the votes are needed and each Sentinel sets only one leader, at most one lead Sentinel can emerge. If no Sentinel wins, the election is repeated until a lead Sentinel is elected.
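
The first-come, first-served vote and the majority rule can be sketched as follows. This is a simplified model, not Sentinel's implementation: Voter and run_election are hypothetical names, requests are processed in a fixed arrival order, and a candidate's vote for itself is modelled as an ordinary request.

```python
from collections import Counter


class Voter:
    """Hypothetical model of a Sentinel answering lead-Sentinel requests."""

    def __init__(self, runid: str):
        self.runid = runid
        self.leader_epoch = 0
        self.leader_runid = None

    def request_vote(self, candidate_runid: str, epoch: int) -> str:
        """Vote for the first candidate seen in a new epoch; later requests get the same answer."""
        if epoch > self.leader_epoch:
            self.leader_epoch = epoch
            self.leader_runid = candidate_runid
        return self.leader_runid


def run_election(voters, requests, epoch):
    """requests: (candidate_runid, voter_runid) pairs in arrival order. Return the winner or None."""
    by_runid = {v.runid: v for v in voters}
    for candidate, voter_runid in requests:
        by_runid[voter_runid].request_vote(candidate, epoch)
    tally = Counter(v.leader_runid for v in voters if v.leader_epoch == epoch)
    winner, votes = tally.most_common(1)[0] if tally else (None, 0)
    # A strict majority of all Sentinels is required.
    return winner if votes > len(voters) // 2 else None


voters = [Voter("s1"), Voter("s2"), Voter("s3")]
# s1 and s2 both stand; each votes for itself first, and s1's request reaches s3 before s2's.
requests = [("s1", "s1"), ("s2", "s2"), ("s1", "s3"), ("s2", "s3"), ("s2", "s1"), ("s1", "s2")]
print(run_election(voters, requests, epoch=1))  # s1
```

With three Sentinels, s1 collects two votes (its own and s3's), which is more than half, so it becomes the lead Sentinel. A tied round returns None and would be retried with a higher epoch.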

3.4 Failover

Failover is carried out by the lead Sentinel, which does the following:

  1. Select the best slave of the old master as the new master
  2. Make the other slaves replicate the new master
  3. Keep monitoring the old master and, if it comes back online, make it a slave of the new master

The hardest part is choosing the best new Master. The lead Sentinel filters and sorts the slaves as follows:

  1. Remove every slave that is offline from the candidate list
  2. Remove every slave that has not replied to the Sentinel's INFO command within the last 5 seconds
  3. Remove every slave that has been disconnected from the offline Master for longer than down_after_milliseconds * 10
  4. From the remaining slaves, select the one with the highest priority according to slave_priority (in Redis, a lower slave_priority value means a higher priority)
  5. If priorities are equal, select the slave with the largest replication offset, slave_repl_offset
  6. If offsets are also equal, sort the slaves by run ID and select the one with the smallest run ID
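
The filtering and sorting steps above map naturally onto a filter followed by a composite sort key. The Python sketch below is illustrative only: the Slave class, its field names, and select_new_master are hypothetical, and it assumes that a lower slave_priority value means a higher priority.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    """Hypothetical summary of what Sentinel knows about one slave."""
    runid: str
    priority: int                 # slave_priority
    repl_offset: int              # slave_repl_offset
    online: bool = True
    info_age_ms: int = 0          # time since the last INFO reply
    master_link_down_ms: int = 0  # time disconnected from the old master


def select_new_master(slaves, down_after_ms):
    """Apply the three filters, then rank by priority, offset, and run ID."""
    candidates = [
        s for s in slaves
        if s.online
        and s.info_age_ms <= 5000
        and s.master_link_down_ms <= down_after_ms * 10
    ]
    if not candidates:
        return None
    # Assumption: lower priority value wins; then larger offset; then smaller run ID.
    return min(candidates, key=lambda s: (s.priority, -s.repl_offset, s.runid))


slaves = [
    Slave("aaa", priority=100, repl_offset=900),
    Slave("bbb", priority=100, repl_offset=1000),
    Slave("ccc", priority=100, repl_offset=1000, info_age_ms=8000),  # stale INFO reply
]
best = select_new_master(slaves, down_after_ms=30000)
print(best.runid)  # bbb
```

Here ccc is filtered out because its last INFO reply is older than 5 seconds, and bbb beats aaa on replication offset.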

Once the new Master has been chosen, the lead Sentinel promotes it by sending it SLAVEOF no one, then sends SLAVEOF <new_master_ip> <new_master_port> to the other slaves of the offline Master (excluding the new Master itself) so that they become slaves of the new Master.

That completes the Sentinel workflow. If the new Master later goes offline, the same cycle repeats.