The previous article introduced the Redis master-slave replication mechanism. Replication lets us add nodes that hold copies of the data, which supports read/write separation, data backup, and similar functions depending on the business scenario. However, when the master node fails, replication alone cannot switch nodes or fail over automatically. The Sentinel mechanism introduced in this article is a node monitoring and management mechanism built on top of Redis master-slave replication. It switches nodes and performs failover when the above problem occurs, and it is one implementation of a Redis high-availability scheme.

Architecture topology

  • Master (master node): the primary service database of Redis, responsible for receiving writes of business data. Generally there is one master (horizontally scaling to multiple masters through sharding in a distributed architecture is not covered here; only the general architecture topology of Redis Sentinel is discussed).
  • Slave (slave node): the secondary database of Redis, which replicates the data of the master node.
  • Sentinel node: monitors the status of the service data nodes, that is, the master and the slaves. Sentinel is usually deployed as a cluster of several Sentinel nodes, which is what makes the Sentinel mechanism itself highly available.

| Component | Role | Function | Count |
| --- | --- | --- | --- |
| Master | Primary service data node | Receives client requests; readable and writable | 1 |
| Slave | Secondary service data node | Replicates master data; disaster recovery; readable (read/write separation) | >= 1 |
| Sentinel node | Sentinel node | Monitors the master and slave service data nodes and performs failover when faults occur | >= 1 |

Operation mechanism

The main task of Redis Sentinel is to monitor the status of all Redis nodes at all times and, once an exception occurs, to handle the fault according to preconfigured rules so that Redis remains highly available. The core of the implementation is three scheduled monitoring tasks that discover and monitor each node.

| Scheduled task | Trigger interval | Function |
| --- | --- | --- |
| Periodic info | 10s | The Sentinel node obtains the latest Redis node information and topology |
| Periodic publish/subscribe | 2s | Sentinel nodes communicate with each other by subscribing to a channel on the master |
| Periodic ping | 1s | The Sentinel node checks network connectivity with all Redis nodes and the other Sentinel nodes |

  • [Worker-1] Every 10 seconds, each Sentinel node sends the info command to the master and the slaves to obtain the latest topology. This scheduled task serves the following purpose: when a fault occurs or a new node joins, the Sentinel node can detect the change and update its view of the current Redis topology.

After you run info replication on the master node, the following information can be viewed:

    # Replication
    role:master
    connected_slaves:2
    slave0:ip=127.0.0.1,port=6380,state=online,offset=4917,lag=1
    slave1:ip=127.0.0.1,port=6381,state=online,offset=4917,lag=1
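As a rough illustration of how a monitoring process could turn this output into a topology, the following Python sketch parses the info replication text into a dictionary. The names parse_info_replication and SAMPLE are made up for this example, and the sample assumes the one-field-per-line layout that the command returns; it is not how Sentinel itself is implemented.

```python
def parse_info_replication(text):
    """Parse the text returned by `info replication` into a dict."""
    info = {"slaves": []}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and the "# Replication" section header.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        if key.startswith("slave") and key[5:].isdigit():
            # slaveN lines hold comma-separated key=value pairs.
            fields = dict(pair.split("=", 1) for pair in value.split(","))
            for numeric in ("port", "offset", "lag"):
                fields[numeric] = int(fields[numeric])
            info["slaves"].append(fields)
        else:
            info[key] = value
    return info


# Sample output, matching the values shown in this article.
SAMPLE = """\
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6380,state=online,offset=4917,lag=1
slave1:ip=127.0.0.1,port=6381,state=online,offset=4917,lag=1
"""

topology = parse_info_replication(SAMPLE)
print(topology["role"], [s["port"] for s in topology["slaves"]])
```

Each slaveN line becomes one entry in the slaves list, so a new or missing slave shows up simply as a change in that list between two polls.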

  • [Worker-2] Every 2 seconds, each Sentinel node publishes, on the __sentinel__:hello channel of the Redis data nodes, its own judgment of the master node together with information about itself. Every Sentinel node also subscribes to this channel to learn about the other Sentinel nodes and their judgments of the master.

All Sentinel nodes publish and subscribe to the master node's __sentinel__:hello channel to communicate and exchange information, which provides the basis for the objective-offline judgment and for leader election.

  • [Worker-3] Every second, each Sentinel node sends a ping command to the master, the slaves, and the other Sentinel nodes as a heartbeat check, to confirm that these nodes are currently reachable.
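The three intervals can be modelled with a very small scheduler. This is only a simplified sketch: the names TASK_INTERVALS and tasks_due are hypothetical, and the whole-second tick counter is an assumption made for illustration, not a description of how Sentinel schedules its work internally.

```python
# Intervals in seconds: info every 10s, publish/subscribe every 2s, ping every 1s.
TASK_INTERVALS = {"info": 10, "publish_hello": 2, "ping": 1}


def tasks_due(elapsed_seconds):
    """Return the tasks a Sentinel node would run at this whole-second tick."""
    return [name for name, interval in TASK_INTERVALS.items()
            if elapsed_seconds % interval == 0]


for second in (1, 2, 10):
    print(second, tasks_due(second))
```

With these intervals, ping runs on every tick, the hello publication on every second tick, and info only on every tenth.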

Failover

  • [step-1] The Sentinel nodes detect that the master node has failed; the slave nodes can no longer replicate data from the master.

  • [step-2] After the Sentinel nodes find that the master node is abnormal, the Sentinel cluster elects a leader by internal vote. The leader carries out the failover of the master and slave service data nodes and notifies the clients.

Timeout detection: uses the down-after-milliseconds parameter. If a node has not responded when the timeout expires, it is considered faulty. Because the Sentinel nodes form a cluster, a Sentinel node that detects an abnormal master asks the other Sentinel nodes to vote before taking the next step, which greatly reduces the chance that a single node misjudges a failure.

  • [step-3] Once the new master has been chosen, the slave nodes replicate from the new master, while the Sentinel nodes continue to monitor the old master node.

  • [step-4] When the old master node recovers from the fault, the Sentinel cluster, which has been monitoring it all along, brings it back under cluster management as a slave of the new master. The recovered node becomes a slave and starts replicating from the new master, so the node is reused after its failure.
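The topology changes in steps 3 and 4 can be sketched as two small functions over a dictionary. Everything here is illustrative: the function names fail_over and rejoin, the "down" list, and the address 127.0.0.1:6379 for the original master are assumptions for the example, and the sketch only tracks who replicates from whom, not the commands Sentinel actually sends.

```python
def fail_over(topology, new_master):
    """Step 3: promote new_master and keep the old master under watch."""
    if new_master not in topology["slaves"]:
        raise ValueError("new master must be one of the current slaves")
    return {
        "master": new_master,
        # The remaining slaves now replicate from the new master.
        "slaves": [s for s in topology["slaves"] if s != new_master],
        # The failed node is still monitored rather than forgotten.
        "down": topology["down"] + [topology["master"]],
    }


def rejoin(topology, recovered):
    """Step 4: a recovered node comes back as a slave of the current master."""
    if recovered not in topology["down"]:
        raise ValueError("node is not marked as down")
    return {
        "master": topology["master"],
        "slaves": topology["slaves"] + [recovered],
        "down": [n for n in topology["down"] if n != recovered],
    }


before = {"master": "127.0.0.1:6379",
          "slaves": ["127.0.0.1:6380", "127.0.0.1:6381"],
          "down": []}
after_failover = fail_over(before, "127.0.0.1:6380")
after_recovery = rejoin(after_failover, "127.0.0.1:6379")
print(after_recovery)
```

Keeping the failed node in a separate list mirrors the point of step 4: because it is never dropped from monitoring, it can be reattached as a slave as soon as it answers again.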

The interactions of the failover process described above, under the Redis Sentinel architecture, are summarized in the following sequence diagram:

Cluster election

Sentinel node election

Because Sentinel is deployed as a cluster to ensure high availability, one Sentinel node must be elected as Leader to carry out the failover. Any Sentinel node can become the Leader. Election process:

  • When a Sentinel node determines that the master node of the Redis cluster is offline,
  • it asks the other Sentinel nodes to elect it as Leader. A Sentinel node that receives such a request grants it if it has not already agreed to another Sentinel node's request (the requester's vote count is +1); otherwise it refuses.
  • When the number of votes a Sentinel node has obtained reaches the minimum needed to become Leader (number of Sentinel nodes / 2 + 1), that Sentinel node is elected Leader; otherwise, a new election is held.
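One round of this vote can be sketched in a few lines of Python. The function name elect_leader and the node names s1 to s4 are hypothetical, and the sketch assumes that every candidate also votes for itself and that requests are processed in arrival order; it illustrates the first-come, first-served rule and the majority threshold, not Sentinel's actual implementation.

```python
def elect_leader(sentinels, requests):
    """Run one voting round.

    sentinels: names of all Sentinel nodes.
    requests:  (candidate, voter) pairs in arrival order.
    Returns the elected leader, or None if nobody reached a majority.
    """
    granted = {}  # voter -> the candidate it agreed to
    for candidate, voter in requests:
        # A voter agrees only to the first request it receives.
        granted.setdefault(voter, candidate)

    tally = {}
    for candidate in granted.values():
        tally[candidate] = tally.get(candidate, 0) + 1

    needed = len(sentinels) // 2 + 1  # number of Sentinel nodes / 2 + 1
    for candidate, votes in tally.items():
        if votes >= needed:
            return candidate
    return None  # no winner: a new election is needed


# s1 and s2 both stand for election; s3 hears from s1 first.
leader = elect_leader(
    ["s1", "s2", "s3"],
    [("s1", "s1"), ("s2", "s2"), ("s1", "s3"), ("s2", "s3"), ("s2", "s1")],
)
# With four nodes and a 2-2 split, nobody reaches the 3 votes needed.
split = elect_leader(
    ["s1", "s2", "s3", "s4"],
    [("s1", "s1"), ("s2", "s2"), ("s1", "s3"), ("s2", "s4")],
)
print(leader, split)
```

The split case shows why the process allows for a new election: with an even split no candidate reaches the threshold, so the round ends without a Leader.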

Sentinel cluster election uses a leader election approach based on the Raft algorithm. If you are interested, you can explore the internals of that algorithm further.

Subjective offline & objective offline:

  • Subjective offline

Each Sentinel node in the Sentinel cluster periodically sends heartbeat packets to all nodes in the Redis cluster to check whether they are healthy. If a node does not reply to a Sentinel node's heartbeat within down-after-milliseconds, that Sentinel node marks the Redis node as subjectively offline. "Subjective" means the judgment is made by a single Sentinel node. It may be that only this Sentinel node cannot communicate with the master, rather than the master being unreachable from every node, so several Sentinel nodes need to confirm the judgment.
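The subjective-offline check comes down to comparing the time since the last valid reply with down-after-milliseconds. A minimal sketch, in which the function name, the millisecond timestamps, and the value 30000 are assumptions chosen for the example:

```python
DOWN_AFTER_MS = 30_000  # assumed value for down-after-milliseconds


def is_subjectively_down(last_valid_reply_ms, now_ms, down_after_ms=DOWN_AFTER_MS):
    """True if no valid heartbeat reply arrived within down_after_ms."""
    return now_ms - last_valid_reply_ms > down_after_ms


print(is_subjectively_down(last_valid_reply_ms=0, now_ms=10_000))
print(is_subjectively_down(last_valid_reply_ms=0, now_ms=30_001))
```

The result only reflects what one Sentinel node observes, which is why it is not enough on its own to trigger a failover.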

  • Objective offline

When a node is marked as subjectively offline by one Sentinel node, that does not mean the node has definitely failed. It must also be judged subjectively offline by other Sentinel nodes in the Sentinel cluster; once enough of them agree (the configured quorum), the node is considered objectively offline.
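Turning several subjective judgments into an objective one is a counting exercise. The sketch below assumes a threshold called quorum (in Redis this number is given on the sentinel monitor configuration line); the function name and the boolean inputs are hypothetical and stand in for the replies a Sentinel node collects from its peers.

```python
def is_objectively_down(own_judgment, other_judgments, quorum):
    """Combine subjective judgments into an objective one.

    own_judgment:    True if this Sentinel node sees the master as down.
    other_judgments: one boolean per other Sentinel node that was asked.
    quorum:          how many Sentinel nodes must agree.
    """
    if not own_judgment:
        # A node that still sees the master as healthy does not start the check.
        return False
    agreeing = 1 + sum(1 for judged_down in other_judgments if judged_down)
    return agreeing >= quorum


print(is_objectively_down(True, [True, False], quorum=2))
print(is_objectively_down(True, [False, False], quorum=2))
```

In the first call two of three Sentinel nodes agree, which meets a quorum of 2; in the second only the asking node sees a failure, so the master stays merely subjectively offline.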

Redis node election

Once the Sentinel cluster has elected a Sentinel leader, the leader selects one slave to become the new master.

Election process:

  • Filter out faulty nodes
  • Choose the slave with the highest slave-priority as the master (in Redis, a lower slave-priority value means a higher priority); if that does not single out one node, continue
  • Choose the slave with the largest replication offset as the master (the offset records, in bytes, how much data has been written; the master synchronizes it to the slaves, and when the offsets are equal the data is fully synchronized); if that does not single out one node, continue
  • Choose the slave with the smallest runid as the master (Redis generates a random runid each time it starts, as an identifier of the instance). This is effectively a random choice, and it is the last-resort fallback
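The selection order can be expressed as a single sort key. This sketch assumes, as in Redis, that a lower slave-priority number means a higher priority and that a priority of 0 excludes a node from promotion; the function name, the dictionary fields, and the short runid strings are hypothetical.

```python
def select_new_master(slaves):
    """Pick the slave to promote, or None if no slave is eligible."""
    # Filter out faulty nodes and nodes excluded by priority 0.
    candidates = [s for s in slaves if s["online"] and s["priority"] > 0]
    if not candidates:
        return None
    # Best priority first, then largest offset, then smallest runid.
    return min(candidates,
               key=lambda s: (s["priority"], -s["offset"], s["runid"]))


slaves = [
    {"name": "127.0.0.1:6380", "online": True, "priority": 100,
     "offset": 4917, "runid": "c3"},
    {"name": "127.0.0.1:6381", "online": True, "priority": 100,
     "offset": 5020, "runid": "a7"},
    {"name": "127.0.0.1:6382", "online": False, "priority": 10,
     "offset": 6000, "runid": "b1"},
]
chosen = select_new_master(slaves)
print(chosen["name"])
```

Here 6382 is filtered out because it is offline, the two remaining slaves tie on priority, and 6381 wins on the larger replication offset, so the runid comparison is never reached.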
