The previous article introduced the Redis master-slave replication mechanism. Replication lets us add nodes that hold copies of the data, which supports read/write separation, data backup, and similar functions depending on the business scenario. However, when the master node fails, replication alone cannot automatically switch nodes or perform failover. The Sentinel mechanism introduced in this article is a node monitoring and management mechanism built on top of Redis master-slave replication. It switches nodes and performs failover when the problems above occur, and it is one implementation of the Redis high-availability scheme.

Structure topology

Master (master node): the primary Redis database, responsible for receiving writes of business data. Generally there is one. (Horizontal scaling to multiple masters after sharding in a distributed architecture is not covered here; only the general topology of Redis Sentinel is discussed.)
Slave (slave node): a Redis database that replicates the data of the master node.
Sentinel node: monitors the status of the data nodes (master and slaves). Sentinel is usually deployed as several Sentinel nodes, which is what makes the Sentinel mechanism itself highly available.

Component	Role	Function	Count
Master	Primary data node	Receives client requests; readable and writable	1
Slave	Replica data node	Replicates master data for disaster recovery; readable (read/write separation)	>= 1
Sentinel node	Sentinel node	Monitors the master and slave data nodes and performs failover when a fault occurs	>= 1

Operation mechanism

The main task of Redis Sentinel is to monitor the status of all Redis nodes at all times and, once an exception occurs, to handle the fault according to a preset mechanism so that Redis remains highly available. At its core, Sentinel discovers and monitors each node through three scheduled monitoring tasks.

Scheduled task	Trigger interval	Function
Periodic info	10s	Each Sentinel node obtains the latest Redis node information and topology
Periodic publish/subscribe	2s	Sentinel nodes exchange information with each other through a channel on the master
Periodic ping	1s	Each Sentinel node checks connectivity with all Redis nodes and the other Sentinel nodes
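
As a rough illustration of the three intervals in the table, the sketch below lists which tasks would fire at a given second. The function name due_tasks and the task labels are hypothetical, and the modulo check is only a simplification for illustration, not how Sentinel schedules its timers internally.

```python
# Intervals in seconds, taken from the table above; the labels are illustrative only.
TASK_INTERVALS = {"info": 10, "publish_hello": 2, "ping": 1}


def due_tasks(elapsed_seconds: int) -> list:
    """Return the tasks whose interval evenly divides the elapsed time."""
    return [name for name, interval in TASK_INTERVALS.items()
            if elapsed_seconds % interval == 0]


for second in (1, 2, 10):
    print(second, due_tasks(second))
```

At second 1 only the ping is due, at second 2 the hello publication joins it, and at second 10 all three tasks coincide.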

[Worker-1] Every 10 seconds, each Sentinel node sends the info command to the master and the slaves to obtain the latest topology. This scheduled task ensures that when a fault occurs or a new node is added, the Sentinel periodically picks up the change and updates its view of the current Redis topology.

After you run info replication on the master node, the following information can be viewed:

# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6380,state=online,offset=4917,lag=1
slave1:ip=127.0.0.1,port=6381,state=online,offset=4917,lag=1
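
To make the idea of "obtaining the topology from info" concrete, here is a minimal sketch that parses replication output of the shape shown above into a role and a list of slaves. The function name parse_info_replication and the returned dictionary layout are assumptions made for this example; Sentinel's own parsing is done internally and is not shown in this article.

```python
def parse_info_replication(text: str) -> dict:
    """Parse `info replication` style output into a role and a list of slaves."""
    topology = {"role": None, "slaves": []}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the section header
        key, _, value = line.partition(":")
        if key == "role":
            topology["role"] = value
        elif key.startswith("slave") and key[5:].isdigit():
            # e.g. slave0:ip=127.0.0.1,port=6380,state=online,offset=4917,lag=1
            fields = dict(item.split("=", 1) for item in value.split(","))
            topology["slaves"].append(fields)
    return topology


sample = """# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6380,state=online,offset=4917,lag=1
slave1:ip=127.0.0.1,port=6381,state=online,offset=4917,lag=1
"""
topology = parse_info_replication(sample)
print(topology["role"], [s["port"] for s in topology["slaves"]])
```

All values are kept as strings; a real consumer would likely convert port, offset, and lag to integers.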

[Worker-2] Every 2 seconds, each Sentinel node publishes to the __sentinel__:hello channel on the Redis data nodes its own judgment of the master node together with information about itself. Each Sentinel node also subscribes to this channel to learn about the other Sentinel nodes and their judgments of the master.

All Sentinel nodes publish to and subscribe to the __sentinel__:hello channel of the master node to communicate and exchange information, which provides the basis for the objective offline decision and for leader election.
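
The exchange over the hello channel can be pictured with a small in-memory sketch. The Hello message below is a simplified, hypothetical structure and not the real wire format, and SentinelView is an invented name; the point is only that each Sentinel ignores its own announcements and records what the others report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hello:
    """Simplified, hypothetical hello message (not the real wire format)."""
    sentinel_id: str
    sentinel_addr: str
    master_name: str
    master_addr: str


class SentinelView:
    """What one Sentinel has learned from the hello channel."""

    def __init__(self, my_id: str):
        self.my_id = my_id
        self.known_sentinels = {}    # sentinel_id -> sentinel address
        self.reported_master = {}    # sentinel_id -> master address it reports

    def on_hello(self, msg: Hello) -> None:
        if msg.sentinel_id == self.my_id:
            return  # ignore our own announcement
        self.known_sentinels[msg.sentinel_id] = msg.sentinel_addr
        self.reported_master[msg.sentinel_id] = msg.master_addr


view = SentinelView("s1")
for msg in (
    Hello("s1", "127.0.0.1:26379", "mymaster", "127.0.0.1:6379"),
    Hello("s2", "127.0.0.1:26380", "mymaster", "127.0.0.1:6379"),
    Hello("s3", "127.0.0.1:26381", "mymaster", "127.0.0.1:6379"),
):
    view.on_hello(msg)
print(sorted(view.known_sentinels))
```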

[Worker-3] Every second, each Sentinel node sends a ping command to the master, the slaves, and the other Sentinel nodes as a heartbeat check to verify that these nodes are currently reachable.

Failover

[step-1] The Sentinel nodes detect that the master node is faulty, and the slave nodes can no longer replicate data from the master.

[step-2] Once the Sentinel nodes find that the master node is abnormal, the Sentinel cluster elects a leader by internal vote. The leader carries out the failover of the master and slave data nodes and notifies the clients.

Timeout detection uses the down-after-milliseconds parameter: if no response is received before the timeout expires, the node is considered faulty. Because the Sentinel nodes form a cluster, a Sentinel node that detects an abnormal master asks the other Sentinel nodes for their judgment before taking the next step, which greatly reduces the chance that a single node misjudges the failure.

[step-3] Once the new master has been chosen, the slave nodes replicate from the new master, while the Sentinel nodes continue to monitor the old master node.

[step-4] When the old master node recovers from the fault, the Sentinel cluster, which has been monitoring it all along, brings it back under cluster management as a slave of the new master. The recovered node becomes a slave and starts replicating from the new master, so the node is reused after its failure.
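
The effect of step-3 and step-4 on the topology can be sketched as a pure function. The name promote and the dictionary layout are hypothetical and exist only for this illustration; the sketch models the bookkeeping, not the commands Sentinel actually sends to the nodes.

```python
def promote(topology: dict, new_master: str) -> dict:
    """Return a new topology in which new_master leads and the old master is demoted."""
    if new_master not in topology["slaves"]:
        raise ValueError("the new master must be one of the current slaves")
    remaining = [s for s in topology["slaves"] if s != new_master]
    # The old master stays under management as a slave, so that it
    # replicates from the new master once it comes back (step-4).
    return {"master": new_master, "slaves": remaining + [topology["master"]]}


before = {"master": "127.0.0.1:6379",
          "slaves": ["127.0.0.1:6380", "127.0.0.1:6381"]}
after = promote(before, "127.0.0.1:6380")
print(after)
```

Returning a new dictionary instead of mutating the input keeps the "before" view available, which is convenient when comparing the two states.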

The interaction of the failover process above under the Redis Sentinel architecture can be summarized as a sequence diagram:

[Figure: sequence diagram of the Redis Sentinel failover process]

Cluster election

Sentinel node election

Because Sentinel is deployed as a cluster to ensure high availability, one Sentinel node must be selected as the leader to carry out the failover. Any Sentinel node can become the leader. Election process:

When a Sentinel node determines that the master node of the Redis cluster is offline, it asks the other Sentinel nodes to elect it as leader.
A Sentinel node that receives such a request grants it if it has not yet agreed to another Sentinel node's request, in which case the requester's vote count increases by 1; otherwise it refuses.
When the number of votes a Sentinel node has obtained reaches the minimum required to become leader, max(quorum, number of Sentinel nodes / 2 + 1), that Sentinel node is elected leader; otherwise a new election is held.
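
The election steps above can be sketched as a single simulated round. The function name elect_leader and the representation of requests as (candidate, voter) pairs are assumptions made for this example, and the quorum argument stands in for the Sentinel configuration value of the same name; timing, epochs, and retries are deliberately left out.

```python
def elect_leader(requests, sentinels, quorum=1):
    """Simulate one election round.

    requests lists (candidate, voter) pairs in arrival order.
    Each voter grants only the first request it receives.
    """
    granted = {}  # voter -> candidate it voted for
    for candidate, voter in requests:
        if voter not in granted:
            granted[voter] = candidate  # the first request wins this voter's vote
    votes = {}
    for candidate in granted.values():
        votes[candidate] = votes.get(candidate, 0) + 1
    needed = max(quorum, len(sentinels) // 2 + 1)
    winners = [c for c, n in votes.items() if n >= needed]
    return winners[0] if winners else None  # None means a new election is needed


# s1 and s2 each vote for themselves; s3 receives the request from s1 first.
leader = elect_leader(
    [("s1", "s1"), ("s2", "s2"), ("s1", "s3"), ("s2", "s3")],
    sentinels=["s1", "s2", "s3"],
    quorum=2,
)
print(leader)

# With four Sentinels and a 2-2 split, nobody reaches the 3 votes required.
split = elect_leader(
    [("s1", "s1"), ("s2", "s2"), ("s1", "s3"), ("s2", "s4")],
    sentinels=["s1", "s2", "s3", "s4"],
)
print(split)
```

Because a winner needs more than half of the votes, at most one candidate can satisfy the threshold in a round, which is why taking the first entry of winners is safe here.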

Sentinel leader election is based on the Raft algorithm. If you are interested, you can explore the internals of this algorithm further.

Subjective offline & objective offline:

Subjective offline

Each Sentinel node in the Sentinel cluster periodically sends heartbeat packets to all nodes in the Redis cluster to check whether they are healthy. If a node does not reply to a Sentinel node's heartbeat within down-after-milliseconds, that Sentinel node marks the Redis node as subjectively offline. Subjective offline is the judgment of a single Sentinel node. The cause may be that only this Sentinel cannot communicate with the master, rather than the master being unreachable by all nodes, so multiple Sentinel nodes need to confirm the judgment.
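
The subjective offline check reduces to a comparison against down-after-milliseconds. The function name is_subjectively_down and its arguments are hypothetical, and the 30000 ms value is only an example setting.

```python
def is_subjectively_down(last_valid_reply_ms: int, now_ms: int,
                         down_after_milliseconds: int) -> bool:
    """True if no valid heartbeat reply has arrived within the configured window."""
    return now_ms - last_valid_reply_ms > down_after_milliseconds


# Example values only: a 30000 ms window, checked 31 s and 5 s after the last reply.
print(is_subjectively_down(last_valid_reply_ms=0, now_ms=31_000,
                           down_after_milliseconds=30_000))
print(is_subjectively_down(last_valid_reply_ms=0, now_ms=5_000,
                           down_after_milliseconds=30_000))
```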

Objective offline

When a node is marked subjectively offline by one Sentinel node, that does not mean the node is definitely faulty. Only when enough of the other Sentinel nodes in the Sentinel cluster (at least the configured quorum in total) also judge it subjectively offline is the node considered objectively offline.
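
A minimal sketch of the step from subjective to objective offline follows. It assumes the threshold is the Sentinel quorum setting; the function name is_objectively_down and the report dictionary are invented for the example.

```python
def is_objectively_down(sdown_reports: dict, quorum: int) -> bool:
    """True if at least `quorum` Sentinels consider the master subjectively offline."""
    return sum(1 for down in sdown_reports.values() if down) >= quorum


# Two of three Sentinels agree, which meets a quorum of 2.
print(is_objectively_down({"s1": True, "s2": True, "s3": False}, quorum=2))
# A single report is not enough.
print(is_objectively_down({"s1": True, "s2": False, "s3": False}, quorum=2))
```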

Redis node election

Once the Sentinel cluster has elected a Sentinel leader, the leader selects one slave to become the new master.

Election process:

Filter out faulty nodes.
Select the slave with the highest priority according to slave-priority (in Redis, the lowest non-zero value) as the master; if no single slave stands out, continue.
Select the slave with the largest replication offset as the master; if no single slave stands out, continue. (The replication offset is the amount of data written, in bytes. The master synchronizes its offset to the slaves, and when the offsets are equal the data is fully synchronized.)
Select the slave with the smallest runid as the master. (Redis generates a random runid as its identifier each time it starts, so this choice is effectively random and serves as the last-resort fallback.)
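
The selection order above can be expressed as a single sort key. This is a sketch under stated assumptions: the Slave dataclass and the select_new_master function are hypothetical, a lower slave-priority value is treated as a higher promotion priority with 0 excluding the node, and "faulty" is reduced to a simple online flag.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    name: str
    online: bool
    priority: int      # slave-priority: lower value is preferred, 0 means never promote
    repl_offset: int   # replication offset in bytes
    run_id: str


def select_new_master(slaves):
    """Pick a slave by priority, then replication offset, then smallest run id."""
    candidates = [s for s in slaves if s.online and s.priority != 0]
    if not candidates:
        return None
    # Smaller priority first, larger offset first (hence the minus sign), smaller run id last.
    return min(candidates, key=lambda s: (s.priority, -s.repl_offset, s.run_id))


slaves = [
    Slave("a", online=True, priority=100, repl_offset=4917, run_id="b2"),
    Slave("b", online=True, priority=100, repl_offset=5000, run_id="c3"),
    Slave("c", online=False, priority=1, repl_offset=5000, run_id="a1"),
]
chosen = select_new_master(slaves)
print(chosen.name)
```

In this example node c is filtered out because it is offline, a and b tie on priority, and b wins on the larger replication offset.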

