1. Introduction

High availability has two aspects: minimizing data loss, and keeping the service available as much as possible. AOF and RDB ensure that persisted data is lost as rarely as possible (the principles of AOF and RDB are covered in part two of this core series), while master-slave replication adds copies, so that one piece of data is stored on multiple instances. Sentinel performs automatic failover when a node goes down, so that the system can continue to provide service.

2. Master-Slave Replication

In Redis, you can run the replicaof command, or set replicaof in the configuration file, to make one server replicate another and establish a master-slave relationship. The server being replicated is the master server; the servers that replicate it are called slave servers.

The relationship between master and slave:

  • A master server can have multiple slave servers or none; a Redis server that has had nothing configured is a master by default
  • A slave server can have only one master

2.1 Master/Slave Association

A Redis server runs as a master by default. To set up a master/slave relationship, configure the master node information on the slave in any of the following three ways to enable master/slave replication:

  • Configuration file: Add the following configuration to the configuration file of the secondary server
replicaof <masterip> <masterport>
  • Start parameter: add --replicaof masterip masterport to the redis-server start command
  • Client command: connect to the Redis instance and run replicaof masterip masterport to turn that instance into a slave node (a minimal client-side sketch follows this list)
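
The following is a minimal sketch of the client-command approach using the redis-py client; the host and port values are placeholders, and it assumes a second Redis instance is already running locally on port 6380.

# Minimal sketch: turn a running instance into a replica via a client command.
# Host/port values are placeholders; assumes redis-py is installed.
import redis

replica = redis.Redis(host="127.0.0.1", port=6380)

# Equivalent to running "replicaof 127.0.0.1 6379" in redis-cli (Redis 5+).
replica.execute_command("REPLICAOF", "127.0.0.1", "6379")

# To promote the instance back to a standalone master:
# replica.execute_command("REPLICAOF", "NO", "ONE")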

2.1.1 Demo

  • After connecting a client to the slave instance, run the replicaof command described above

  • Then observe the master node's logs

As the master's logs show, a BGSAVE is executed when a slave node is associated with the master. Redis Core Part 2 – Data Persistence (juejin.cn) describes this in detail; readers unfamiliar with it can refer to that article.

2.2 Primary/Secondary Data Consistency

Since a master-slave setup involves multiple nodes, it inevitably involves synchronization delay between nodes, that is, a data consistency scheme. How does Redis weigh data consistency? As for how data can be synchronized between multiple nodes, I sketch roughly three directions in the figure above; if you have other solutions, please leave a comment:

  • Strong consistency: data is synchronized between nodes fully synchronously, and the client gets a response only after all nodes have written the data. In the tradeoff between high performance and high reliability, this approach favours reliability: the data is safe, but performance suffers, especially when there are many slave nodes.
  • Weak consistency: data is synchronized between nodes completely asynchronously. This trades reliability for performance: as performance improves, data reliability is no longer guaranteed and data is easily lost.
  • Eventual consistency: the master puts write commands into an asynchronous queue (or buffer) and a background thread consumes the queue (buffer) to synchronize the other nodes, balancing high performance and high reliability. This comes from the BASE theory: Basically Available, Soft state, Eventual consistency.

In fact, most open-source products do not pick one specific answer in the tradeoff between high performance and high reliability; they usually hand the decision to the user. Kafka's ack mechanism, for example, lets users set acks to 0, 1, or -1 (all) and decide for themselves, based on their business scenario, between high throughput and high reliability. Redis, however, does not provide such flexible configuration: it defaults to asynchronous replication, a weak-consistency solution, because it pursues extreme performance.

2.3 Principle of Master/Slave Synchronization

There are two different implementations of master/slave replication in Redis: sync before 2.8 and psync after 2.8. In this article, we will introduce the implementations of the two commands respectively.

Master/slave replication consists of two operations: data synchronization and command propagation

  • Data synchronization: Updates the status of the secondary server to that of the primary server.
  • Command propagation: when the master's state is modified by write commands and the master and slave become inconsistent, the master propagates those write commands to the slave to bring the two back in sync.

Data synchronization can be divided into two scenarios:

  • First synchronization: the first time a slave node synchronizes data from the master
  • Disconnect and reconnect: a slave node loses its connection to the master and later reconnects

2.3.1 sync

2.3.1.1 Initial Synchronization

The first replication between the master and slave libraries can be roughly divided into three stages: 1) the connection establishment stage (i.e. the preparation stage); 2) the master library synchronizes data to the slave library; 3) the master sends the new write commands received during synchronization to the slave library. A small sketch of the flow follows this list.

  • Connection establishment phase: the master and slave establish a connection in preparation for full data synchronization. When the slave connects to the master, it runs slaveof and sends the sync command to tell the master that synchronization is about to take place; once the master acknowledges, synchronization between the two begins.
  • Phase 2: the master runs bgsave to generate an RDB file and sends it to the slave; at the same time, the master creates a buffer for each slave recording every write command received since the RDB generation started. When the slave receives the RDB file, it saves it to disk, empties its current data set, and then loads the RDB data into memory.
  • Phase 3: after the slave has loaded the RDB, the master sends the buffered write commands to the slave, which receives and executes them, bringing its data set into the same state as the master's.
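
The flow above can be condensed into a small self-contained sketch; plain Python dictionaries stand in for the master and replica data sets, and no real Redis is involved.

# Toy simulation of the full synchronization flow described above.
# Dicts stand in for the master / replica key spaces; no real Redis here.
def full_sync(master_data, writes_during_sync):
    # Phase 2: the master snapshots its data (standing in for the RDB file)
    # and buffers every write command received from this point on.
    rdb_snapshot = dict(master_data)
    replication_buffer = []
    for key, value in writes_during_sync:   # writes keep arriving on the master
        master_data[key] = value
        replication_buffer.append((key, value))

    # The replica discards its old data and loads the snapshot.
    replica_data = dict(rdb_snapshot)

    # Phase 3: the master sends the buffered writes and the replica replays them.
    for key, value in replication_buffer:
        replica_data[key] = value
    return replica_data

master = {"k1": "v1"}
replica = full_sync(master, [("k2", "v2"), ("k1", "v1-new")])
assert replica == master == {"k1": "v1-new", "k2": "v2"}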

It is easy to see from this flow that if business concurrency is high during synchronization and the QPS of write commands spikes, the replication buffer the master keeps for the slave can overflow while the slave is still loading the RDB file. The synchronization then fails and has to start again, which can easily turn into an endless loop of bgsave and RDB retransmission.

2.3.1.2 Disconnecting and reconnecting

Prior to version 2.8, the sync command used for synchronization did not handle disconnection and reconnection well: when a master/slave link dropped and was later re-established, the slave had to go through exactly the same steps as the first synchronization, i.e. a full resynchronization.

2.3.2 psync

From version 2.8, Redis uses the psync command for master/slave synchronization. psync mainly fixes sync's drawback of requiring full replication after a master/slave disconnection, replacing full replication with incremental replication where possible.

Incremental replication: Used for replication after a network interruption. Only the write commands executed by the primary node during the interruption are sent to the secondary node. Compared with full replication, incremental replication is more efficient

2.3.2.1 Replication Offset, Replication Backlog Buffer, and Server Run ID

The partial resynchronization function consists of the following three parts:

  • Replication offset of the primary server and replication offset of the secondary server
  • Replication Backlog for the primary server.
  • Server run ID (RUN ID)

When the master server propagates a command, it not only sends the write command to all the slave servers, but also stores the write command into the replication backlog buffer.

  • The replication offsets of the master and slave servers

    • The master's replication offset is incremented by N each time it propagates N bytes of data to the slaves
    • The slave's replication offset is incremented by N each time it receives N bytes of data
  • Server run ID (runId): every server, master or slave, has its own run ID, generated automatically when the server starts, e.g. 33b9b28ef80q2fdc9ab5e3fcbbbabff4dcdcedbg

  • Replication backlog buffer:

    • The replication backlog buffer is a first-in, first-out queue maintained by the master server. Its default size is 1MB; when the queue is full, the earliest queued element is automatically evicted
    • The buffer stores two kinds of data: replication offsets and the command bytes at those offsets. When the master propagates a command, it not only sends the write command to the slave servers but also writes the command and its offset into this queue.
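
As a rough model (illustrative only, not the Redis source), the backlog can be pictured as a fixed-size FIFO of bytes indexed by an ever-growing replication offset; can_partial_resync is a made-up helper name for the check described in the next section.

# Illustrative model of the replication backlog: a fixed-size FIFO of bytes
# indexed by a monotonically growing replication offset (not Redis source code).
from collections import deque

class ReplicationBacklog:
    def __init__(self, capacity=1 * 1024 * 1024):   # ~1MB, mirroring the default size
        self.capacity = capacity
        self.buffer = deque()            # (offset, byte) pairs, oldest first
        self.master_repl_offset = 0      # grows forever, never reset

    def feed(self, command_bytes):
        # Called whenever the master propagates a write command.
        for b in command_bytes:
            self.master_repl_offset += 1
            self.buffer.append((self.master_repl_offset, b))
            if len(self.buffer) > self.capacity:     # oldest bytes are evicted
                self.buffer.popleft()

    def can_partial_resync(self, replica_offset):
        # Partial resync is possible only if every byte after replica_offset
        # is still present in the backlog.
        if not self.buffer:
            return replica_offset == self.master_repl_offset
        return replica_offset + 1 >= self.buffer[0][0]

backlog = ReplicationBacklog()
backlog.feed(b"SET k1 v1\r\n")
print(backlog.master_repl_offset)      # 11
print(backlog.can_partial_resync(4))   # True while those bytes are still buffered
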
2.3.2.2 Implementation principles of Incremental Synchronization

Now that you know about server run IDs, replication offsets, and the replication backlog buffer, let's look at how psync implements incremental synchronization, as shown below:

  • When a slave server reconnects after a disconnection and replication resumes, the psync command it sends to the master carries the runId of the master it last replicated and the offset of its last replication.
  • When the master receives the psync command, it first checks whether that runId matches its own run ID. If it does, it then checks whether the command data after offset is still in the replication backlog buffer: if so, it returns an incremental (partial) synchronization; otherwise it returns a full synchronization. With incremental synchronization, the slave only replays the backlog commands it missed, and the master does not need to generate an RDB file for the slave to reload. A sketch of this decision follows.
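
Reusing the ReplicationBacklog sketch from the previous section, the master-side decision can be pictured roughly as follows; handle_psync is a made-up helper, not the actual Redis implementation.

# Sketch of how the master chooses between partial and full resynchronization
# when it receives "psync <runid> <offset>" (illustrative, not Redis source code).
def handle_psync(master_runid, backlog, replica_runid, replica_offset):
    runid_matches = (replica_runid == master_runid)
    offset_available = backlog.can_partial_resync(replica_offset)

    if runid_matches and offset_available:
        # +CONTINUE: send only the bytes the replica missed; no RDB is generated.
        missing = bytes(b for off, b in backlog.buffer if off > replica_offset)
        return ("+CONTINUE", missing)

    # Otherwise fall back to +FULLRESYNC: generate and ship a fresh RDB file.
    return ("+FULLRESYNC", master_runid, backlog.master_repl_offset)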

3. Sentinel

Although the master-slave architecture supports read/write separation and reduces the read pressure on a single machine, when the master dies it cannot automatically elect a new master, so the whole setup can no longer serve writes. For a highly available architecture, therefore, master-slave alone is far from enough: there must be a high-availability mechanism that, after the master goes down, elects one of the original slaves as the new master under certain conditions, so that the Redis cluster as a whole can continue to serve reads and writes.

3.1 What Is Sentinel

Sentinel is a running mode of Redis. It focuses on monitoring the state of Redis instances and, when the master node fails, can select a new master and switch over through a series of mechanisms, achieving failover and keeping the whole Redis system available. In summary, Sentinel provides the following functions:

  • Monitoring: continuously checks whether the master and slaves are working as expected
  • Offline detection:
    • Subjective offline
    • Objective offline
  • Failover

3.2 Sentinel Functions

3.2.1 Monitoring

By default, sentinel sends a PING command once per second to all instances (primary, secondary, and other Sentinel service nodes) with which it has created a command link, and determines whether the instance is online or alive based on the PING response from the instance

An instance responds to the PING command in two ways:

  • Valid reply: The instance returns one of the three responses: +PONG, -loading, or -masterDown.
  • Invalid reply: the instance returns any reply other than +PONG, -loading, or -masterDown, or does not return any reply within the specified time limit

Invalid-reply time window: the down-after-milliseconds option in the sentinel configuration file specifies how long Sentinel waits before deciding that an instance is offline. If an instance keeps returning invalid replies to Sentinel throughout the down-after-milliseconds window, Sentinel marks the instance as subjectively offline.
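
The bookkeeping behind this can be sketched as follows (illustrative only, not Sentinel's actual code); down_after_ms mirrors the down-after-milliseconds option, and the reply strings follow the list above.

# Sketch of the subjective-offline check: an instance is flagged as subjectively
# down when it has produced no valid PING reply for more than down-after-milliseconds.
import time

VALID_REPLIES = {"+PONG", "-LOADING", "-MASTERDOWN"}

class MonitoredInstance:
    def __init__(self, down_after_ms=30000):
        self.down_after_ms = down_after_ms
        self.last_valid_reply = time.monotonic()

    def on_ping_reply(self, reply):
        # Called roughly once per second with whatever the instance answered (or None).
        if reply in VALID_REPLIES:
            self.last_valid_reply = time.monotonic()

    def subjectively_down(self):
        elapsed_ms = (time.monotonic() - self.last_valid_reply) * 1000
        return elapsed_ms > self.down_after_ms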

3.2.2 Offline Detection

3.2.2.1 Subjective Offline

The sentinel uses the PING command to check whether the instance it is linked to is alive. If the reply from the PING command is invalid, the sentinel marks the instance as subjective offline.

Judging an instance's liveness only by subjective offline leaves a lot of room for misjudgment. As the monitoring section shows, many factors affect whether a PING reply is judged valid or invalid, such as each sentinel's load, its network condition, and its own down-after-milliseconds configuration.

For a monitored instance whose role (queried with the info command) is slave, subjective offline alone is enough to settle whether it is alive. If the instance's role is master, however, a single sentinel's subjective-offline verdict cannot decide the matter; the monitoring results of more sentinels must be combined to reduce the misjudgment rate.

3.2.2.2 Objective Offline

A single sentinel cannot be left to decide on its own that a master is offline; the master is marked as objectively offline only when the number of sentinels that judge it subjectively offline reaches quorum (configurable).

Quorum is a configurable parameter, determined by sentinel configuration; Here is a section of the sentinel configuration file:

# sentinel monitor <master-name> <master-host> <master-port> <quorum>

sentinel monitor mymaster 127.0.0.1 6379 2

This configuration item tells the sentry which primary node to listen on:

  • sentinel monitor: the monitoring directive
  • mymaster: the name given to the primary node
  • 127.0.0.1 and 6379: the IP address and port of the primary node
  • 2: the quorum, meaning that the master is marked as objectively offline only when at least two sentinels consider it unavailable

Different sentinels monitoring the same master node may have different thresholds for declaring it objectively offline: when one sentinel decides the master is objectively offline, another may not see it that way, because, for example, that other sentinel's quorum might be configured as 5.

Note that quorum only governs the objective-offline decision; to actually carry out the failover, a lead sentinel must still be elected by a majority of the sentinels, that is, at least N/2 + 1 of the N sentinel instances, as described in the next section.
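
Expressed as a sketch (illustrative only, not Sentinel's actual code), the objective-offline decision combines the local sentinel's own view with the votes reported by its peers and compares the total against quorum.

# Sketch of the objective-offline decision.
def objectively_down(local_sdown, peer_votes, quorum):
    # local_sdown: this sentinel's own subjective-offline flag for the master.
    # peer_votes: booleans reported by the other sentinels for the same master.
    # quorum: the value from "sentinel monitor <name> <ip> <port> <quorum>".
    votes = int(local_sdown) + sum(1 for agrees in peer_votes if agrees)
    return local_sdown and votes >= quorum

# With quorum 2, one agreeing peer is enough once this sentinel itself
# already considers the master subjectively offline.
print(objectively_down(True, [True, False], quorum=2))   # True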

3.2.3 Automatic Primary/Secondary Switchover

3.2.3.1 Electing the Lead Sentinel

When a primary server is judged objectively offline, the sentinels monitoring that offline primary negotiate to elect a lead sentinel, which then performs the failover on the offline primary server.

Before we get to the election of the lead sentinel, keep one key term in mind: the configuration epoch.

Configuration epoch: the configuration epoch is a counter, and each sentinel election increments it by 1. Within one configuration epoch, each sentinel has one and only one chance to set some sentinel as the leader.

To elect the lead sentinel, the nodes in the sentinel cluster send each other the is-master-down-by-addr command carrying their own runId, which asks the receiver to set the sender as its local lead sentinel, as shown in the figure below:

  • The rule for setting the local lead sentinel is first come, first served: the source sentinel whose request reaches the target sentinel first becomes that target's local lead sentinel, and all later requests are rejected by the target sentinel
  • The target sentinel's reply contains the leader_runid and leader_epoch parameters, which respectively give the run ID and configuration epoch of the target sentinel's local lead sentinel
  • After receiving the reply, the source sentinel checks whether leader_epoch matches its own configuration epoch; if so, it reads the leader_runid parameter, and if that value equals its own run ID, the target sentinel has set the source sentinel as its local lead sentinel
  • If a sentinel is set as the local lead sentinel by more than half of the sentinels, it becomes the lead sentinel; if no lead sentinel is elected within a given time limit, the sentinels try again after a while until one is elected (a simplified voting sketch follows this list)
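
The voting rules above can be condensed into a highly simplified sketch (networking, retries and timeouts are ignored; this is not Sentinel's actual code).

# Simplified sketch of lead-sentinel election: within one epoch each sentinel
# grants its vote to the first requester, and a candidate that collects votes
# from more than half of all sentinels becomes the leader.
class SentinelVoter:
    def __init__(self):
        self.current_epoch = 0
        self.voted_for = None                 # runId granted in the current epoch

    def request_vote(self, candidate_runid, epoch):
        if epoch > self.current_epoch:        # new election round: forget the old vote
            self.current_epoch = epoch
            self.voted_for = None
        if self.voted_for is None:            # first come, first served
            self.voted_for = candidate_runid
        return self.voted_for, self.current_epoch   # leader_runid, leader_epoch

def is_leader(candidate_runid, replies, total_sentinels):
    votes = sum(1 for runid, _ in replies if runid == candidate_runid)
    return votes > total_sentinels // 2       # strictly more than half

voter_a, voter_b = SentinelVoter(), SentinelVoter()
replies = [v.request_vote("sentinel-1", 1) for v in (voter_a, voter_b)]
print(is_leader("sentinel-1", replies, total_sentinels=3))   # True: 2 of 3 votes
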
3.2.3.2 Failover

After the lead sentinel is elected, it performs the failover on the offline primary server. Failover consists of three main steps:

  • Elect a new master: pick one slave server and convert it into the master
  • Make all the slaves of the offline master replicate the new master
  • If the original offline primary server comes back online, make it a slave of the new master node
3.2.3.2.1 Selecting a new primary server

In order to pick a slave server in good condition with relatively complete data, the lead sentinel filters the candidates according to certain rules (before filtering, it saves the former master's slave nodes into a list); a small sketch follows the rules below:

  • Node online status: offline nodes are removed from the list
  • Disconnection time: remove any slave whose connection to the offline primary had been broken for more than down-after-milliseconds * 10 milliseconds; if a slave was disconnected from the master for longer than that, its network condition is considered poor
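
The filtering step could be sketched like this (only the two rules listed above; the real selection goes on to compare replica priority, replication offset and run ID among the remaining candidates).

# Sketch of the candidate filter: drop replicas that are offline or whose link
# to the old master had been down for too long (not Sentinel's actual code).
def filter_candidates(replicas, down_after_ms):
    # replicas: list of dicts such as
    # {"name": ..., "online": bool, "disconnect_ms": how long the master link was down}
    max_disconnect_ms = down_after_ms * 10
    return [
        r for r in replicas
        if r["online"] and r["disconnect_ms"] <= max_disconnect_ms
    ]

replicas = [
    {"name": "slave-1", "online": True,  "disconnect_ms": 5_000},
    {"name": "slave-2", "online": False, "disconnect_ms": 0},
    {"name": "slave-3", "online": True,  "disconnect_ms": 600_000},
]
print([r["name"] for r in filter_candidates(replicas, down_after_ms=30_000)])
# ['slave-1']  (slave-3 exceeds 30_000 * 10 = 300_000 ms)
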
3.2.3.2.2 Notification

Once the new master server has been chosen, the lead sentinel's next step is to make all the slave servers of the offline master replicate the new master. This is simply the master/slave association operation; for details, see the master-slave section earlier in this article.

3.3 Summary

  • The master/slave switchover is not performed by a randomly chosen sentinel; instead, a leader is selected by vote arbitration and is responsible for the switchover
  • Sentinel sends a PING command once per second to each instance (the master server, the slave servers, and the other sentinels) and judges from the reply whether the instance is online; when an instance keeps returning invalid replies for the configured period, Sentinel marks it as subjectively offline