Author | codedump, blogger at codedump.info, with many years of backend server development experience in the Internet industry. Visit codedump's blog to read more articles.
Redis is one of the most widely used pieces of infrastructure software. For engineers, architects, and operations personnel, it is essential to understand Redis's high availability solutions and the principles behind them. In this article, the author analyzes every aspect of Redis high availability in depth and summarizes it; it should serve most readers as a useful guide.
To achieve high availability (HA), Redis adopts the following two methods:
Master-slave data replication.
Sentinels monitor the running status of the data nodes. Once the master node fails, a slave node is promoted to continue providing service.
Master-slave replication
There are two types of data replication in Redis: full replication and partial replication.
Full replication in the old version
Full replication is implemented with the sync command. The flow is as follows:
The slave server sends the sync command to the master server.
After receiving the sync command, the master server runs the bgsave command to generate an up-to-date RDB file and sends it to the slave server. The slave server loads the RDB file, which brings it to the state the master server was in when bgsave ran.
The master server then sends the write commands stored in its command buffer to the slave server, which executes them so that its state matches the current state of the master server.
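The three steps above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Master` and `Slave` classes, the in-memory dict standing in for the RDB file, and the `full_sync` helper are all hypothetical names invented for illustration, not Redis internals.

```python
import copy


class Master:
    def __init__(self):
        self.data = {}
        self.buffer = []  # write commands received while a snapshot is in flight

    def set(self, key, value, buffering=False):
        self.data[key] = value
        if buffering:
            self.buffer.append(("SET", key, value))

    def snapshot(self):
        # Stands in for the RDB file produced by bgsave.
        return copy.deepcopy(self.data)


class Slave:
    def __init__(self):
        self.data = {}

    def load_snapshot(self, snap):
        self.data = dict(snap)

    def apply(self, cmd):
        op, key, value = cmd
        if op == "SET":
            self.data[key] = value


def full_sync(master, slave, writes_during_transfer):
    snap = master.snapshot()                 # step 2: master takes a snapshot
    for key, value in writes_during_transfer:
        master.set(key, value, buffering=True)  # writes keep arriving meanwhile
    slave.load_snapshot(snap)                # slave now equals master at snapshot time
    for cmd in master.buffer:                # step 3: replay the buffered writes
        slave.apply(cmd)
    master.buffer.clear()
```

After `full_sync` returns, the slave holds the same data as the master, including the writes that arrived while the snapshot was being transferred.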
The biggest problem with the old full replication is that after a slave server disconnects and reconnects, a full replication is required even if the slave server already holds part of the data, which is very inefficient. The new version of Redis improves on this.
Replication in the new version
The new version of Redis uses the psync command instead of the sync command. psync supports both full and partial synchronization.
Replication offset
Both parties to replication, the master server and the slave server, maintain a replication offset:
Each time the master server sends N bytes of data to the slave server, it adds N to its replication offset.
Each time the slave server receives N bytes of data from the master server, it adds N to its replication offset.
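As a rough illustration of how the two offsets reveal whether a slave has fallen behind, here is a minimal sketch. The `Node` class and the `propagate` and `in_sync` helpers are hypothetical names, and the `delivered` flag simply simulates a lost transmission.

```python
class Node:
    def __init__(self):
        self.offset = 0  # replication offset, counted in bytes


def propagate(master, slave, payload: bytes, delivered=True):
    # The master always advances by the number of bytes it sent.
    master.offset += len(payload)
    # The slave advances only if the bytes actually arrived.
    if delivered:
        slave.offset += len(payload)


def in_sync(master, slave):
    return master.offset == slave.offset
```

If a payload is lost, the two offsets diverge, and the difference is the number of bytes the slave still needs.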
Replication backlog buffer
The master server maintains a fixed-length FIFO queue as the replication backlog buffer, with a default size of 1 MB.
When the master server propagates commands, write commands are not only sent to the slave server but also written to the replication backlog buffer.
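To make the role of the backlog concrete, the sketch below models it as a bounded byte queue. It is a simplified model, not the Redis implementation: the `ReplicationBacklog` class and its methods are hypothetical, and the offset convention (a slave offset equals the number of bytes it has received) is an assumption chosen for readability.

```python
from collections import deque


class ReplicationBacklog:
    def __init__(self, size=1024 * 1024):
        self.buf = deque(maxlen=size)  # oldest bytes fall out when the queue is full
        self.master_offset = 0         # total bytes ever propagated

    def feed(self, payload: bytes):
        self.buf.extend(payload)
        self.master_offset += len(payload)

    def first_offset(self):
        # Offset of the oldest byte still held in the backlog.
        return self.master_offset - len(self.buf)

    def read_from(self, slave_offset):
        """Return the bytes the slave is missing, or None if they were evicted."""
        if slave_offset < self.first_offset() or slave_offset > self.master_offset:
            return None
        start = slave_offset - self.first_offset()
        return bytes(list(self.buf)[start:])
```

When `read_from` returns `None`, the missing data is no longer in the backlog, so a partial resynchronization is impossible and a full one is needed.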
Server run ID
Each Redis server has its own run ID, generated automatically at startup. The master server sends its run ID to the slave server, which saves it.
When a slave server reconnects after a disconnection, the run ID determines how synchronization proceeds:
If the master run ID saved on the slave server matches the run ID of the current master server, the slave server has reconnected to the same master server it was replicating before, and the master server can try partial synchronization.
Otherwise, the run IDs differ, and a full synchronization is performed.
Psync command flow
With that in mind, let’s examine the flow of the psync command:
If the slave server has never replicated any master server, or has previously executed slaveof no one, it sends the psync ? -1 command to the master server, requesting a full data synchronization.
Otherwise, if the slave server has already synchronized some data, it sends the psync <runid> <offset> command to the master server, where runid is the run ID of the previous master server and offset is the slave server's current replication offset.
In either case, after the master server receives the psync command, one of three things happens:
The master server replies with +FULLRESYNC <runid> <offset>, meaning it will perform a full synchronization with the slave server. Here runid is the run ID of the current master server and offset is its current replication offset.
The master server replies with +CONTINUE, meaning it will perform a partial synchronization with the slave server.
The master server replies with -ERR, meaning its version is earlier than 2.8 and it does not recognize the psync command. In this case, the slave server sends the sync command to the master server to perform a full synchronization.
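The decision logic on both sides can be sketched as follows. This is a simplified model under stated assumptions: the function names and parameters are hypothetical, and the check that the requested offset still lies inside the replication backlog is how the three pieces described earlier (offset, backlog, run ID) are assumed to fit together here.

```python
def slave_request(saved_runid=None, saved_offset=None):
    # A slave with no replication history asks for a full synchronization.
    if saved_runid is None:
        return "psync ? -1"
    return f"psync {saved_runid} {saved_offset}"


def handle_psync(master_runid, master_offset, backlog_first_offset,
                 req_runid, req_offset):
    full = f"+FULLRESYNC {master_runid} {master_offset}"
    # Unknown or different master: the slave's data cannot be trusted.
    if req_runid != master_runid:
        return full
    # Same master, but the missing bytes are no longer in the backlog.
    if not (backlog_first_offset <= req_offset <= master_offset):
        return full
    return "+CONTINUE"


def slave_next_step(reply):
    if reply.startswith("+FULLRESYNC"):
        return "full sync"
    if reply.startswith("+CONTINUE"):
        return "partial sync"
    if reply.startswith("-ERR"):
        return "send sync"  # old master: fall back to the sync command
    raise ValueError(f"unexpected reply: {reply!r}")
```

Note that a matching run ID alone is not enough for a partial synchronization in this sketch; the slave's offset must also still be covered by the backlog.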
Overview of the Sentinel mechanism
Redis uses the Sentinel mechanism for high availability (HA). It roughly works as follows:
A set of sentinel nodes monitors the availability of the master and slave Redis services.
Once the master Redis node is found to have failed, one sentinel node is elected leader.
The sentinel leader then selects one of the remaining Redis nodes to serve as the new master Redis node.
In the description above, Redis nodes fall into two types:
Sentinel node: monitors the running status of the data nodes.
Data node: a Redis node that serves client requests. Data nodes are divided into master and slave nodes.
That is the general process. It raises the following questions:
How are the Redis data nodes monitored?
How is a Redis data node determined to have failed?
How is the sentinel leader elected?
On what basis does the sentinel leader select the new master Redis node?
These questions are answered one by one below.
Three Monitoring Tasks
The Sentinel node monitors the service availability of the Redis data node through three scheduled monitoring tasks.
The info command
Every 10 seconds, each sentinel node sends the info command to the master and slave Redis data nodes to obtain the latest topology information.
Redis topology information includes:
Role of this node: master or slave.
Addresses and ports of the master and slave nodes.
In this way, the sentinel node learns about slave nodes automatically from the info command, so slave nodes added later are discovered without explicit configuration.
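As an illustration of how topology can be read from the reply, the sketch below parses the replication section of an info reply from a master node. The sample text is illustrative with made-up addresses, and `parse_replication_info` is a hypothetical helper, not part of any Redis client library.

```python
import re

# Illustrative sample of the replication section of an info reply.
SAMPLE_INFO = """# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.2,port=6379,state=online,offset=1024,lag=0
slave1:ip=10.0.0.3,port=6380,state=online,offset=1024,lag=1
"""


def parse_replication_info(text):
    info = {"role": None, "slaves": []}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")
        if key == "role":
            info["role"] = value
        elif re.fullmatch(r"slave\d+", key):
            # Each slaveN line is a comma-separated list of field=value pairs.
            fields = dict(item.split("=", 1) for item in value.split(","))
            info["slaves"].append((fields["ip"], int(fields["port"])))
    return info
```

Calling `parse_replication_info(SAMPLE_INFO)` yields the node's role and the address of every slave, which is all a sentinel needs to start monitoring newly added slaves.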
Sync information to the __sentinel__:hello channel
Every 2 seconds, each sentinel node publishes its view of the master node, together with its own information, to the __sentinel__:hello channel of the Redis data nodes. Because the other sentinels subscribe to this channel, this effectively exchanges master node and sentinel node information among the sentinels.
This accomplishes two things:
* Discover new sentinel nodes: when a new sentinel node joins, the other sentinels save its information and later establish a connection with it.
* Exchange status information about the master node, which serves as a basis for judging the master node objectively offline.
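The discovery side of this exchange can be sketched as below. The message is modeled as a plain (sender, master address) pair rather than the real wire format, and `SentinelView` is a hypothetical class used only for illustration.

```python
class SentinelView:
    """What one sentinel has learned from hello messages (simplified)."""

    def __init__(self, name):
        self.name = name
        self.peers = {}  # sentinel name -> master address that sentinel reported

    def on_hello(self, sender, master_addr):
        """Handle one hello message; return True if the sender is newly discovered."""
        if sender == self.name:
            return False  # our own message echoed back on the channel
        is_new = sender not in self.peers
        self.peers[sender] = master_addr
        return is_new
```

A `True` return value is the point at which a real sentinel would go on to open a connection to the newly discovered peer.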
Heartbeat detection on the data nodes
Every second, each sentinel node sends a ping command to the master and slave data nodes and to the other sentinel nodes as a heartbeat check. This is the basis for judging a data node subjectively offline.
Subjective and objective offline
Subjective offline
The third monitoring task performs heartbeat detection. If a data node sends no valid reply within the configured down-after-milliseconds, it is considered subjectively offline (“sdown”).
Why is it called “subjective offline”? In a distributed system, many machines work together and the network can fail in various ways, so the judgment of a single node is not enough to declare a data node offline. That requires the “objective offline” described next.
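The subjective check itself is just a timeout comparison. Here is a minimal sketch, assuming timestamps in milliseconds; the function and parameter names are hypothetical.

```python
def is_subjectively_down(last_valid_reply_ms, now_ms, down_after_ms):
    # The node is "sdown" once it has been silent for longer than the threshold.
    return now_ms - last_valid_reply_ms > down_after_ms
```

For example, with a threshold of 30000 ms, a node whose last valid reply was 30001 ms ago is flagged, while one that replied exactly 30000 ms ago is not.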
Objective offline
When a sentinel node considers the master node subjectively offline, it uses the “sentinel is-master-down-by-addr” command to ask the other sentinel nodes whether they also consider the master node offline. If enough sentinel nodes agree (at least the configured quorum, typically more than half), the master node is considered objectively offline (“odown”).
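The counting step can be sketched as follows. The required number of agreeing sentinels is passed in as a parameter, since it is a configuration choice; the `majority` helper shows the "more than half" case. All names here are hypothetical.

```python
def majority(total_sentinels):
    # Smallest count that is strictly more than half.
    return total_sentinels // 2 + 1


def is_objectively_down(own_opinion, peer_replies, required):
    # Count this sentinel's own view plus every peer that answered "down".
    agree = (1 if own_opinion else 0) + sum(1 for reply in peer_replies if reply)
    return agree >= required
```

With three sentinels and `required=majority(3)`, one agreeing peer is enough to turn a subjective judgment into an objective one.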
Elect the sentinel leader
Once the master node is objectively offline, one sentinel node must be elected sentinel leader to carry out the selection of a new master node.
The general idea of this election is:
Each sentinel node asks to become sentinel leader by sending the “sentinel is-master-down-by-addr” command to the other sentinel nodes.
Each sentinel node that receives this command grants its vote only to the first sentinel that asks and rejects requests from the others.
If a sentinel node receives more than half of the votes, it becomes the sentinel leader.
If these three steps do not elect a sentinel leader within a certain period of time, a new election is started.
As you can see, this process is very similar to leader election in Raft.
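The voting rule can be simulated with a short sketch. The `epoch` counter that separates one election round from the next is an assumption borrowed from Raft's notion of a term, and the `Voter` class and `run_election` function are hypothetical names.

```python
class Voter:
    def __init__(self, name):
        self.name = name
        self.voted_for = {}  # epoch -> candidate that received this voter's vote

    def request_vote(self, epoch, candidate):
        # The first request seen in an epoch wins the vote; later ones are rejected.
        granted = self.voted_for.setdefault(epoch, candidate)
        return granted == candidate


def run_election(epoch, candidate, sentinels):
    # The candidate asks every sentinel, itself included, for a vote.
    votes = sum(1 for s in sentinels if s.request_vote(epoch, candidate))
    return votes > len(sentinels) // 2
```

Because each voter grants at most one vote per epoch, at most one candidate can collect more than half of the votes in any given round.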
Select a new primary node
From the remaining Redis slave nodes, select the new master node in the following order:
Filter out “unhealthy” slave nodes: those that are subjectively offline or disconnected, those that have not replied to the sentinel's ping command within 5 seconds, and those that have lost contact with the master node.
Select the slave node with the highest priority, as configured by slave-priority (in Redis, a lower slave-priority value means a higher priority). If there is one, return it; otherwise continue with the following steps.
Select the slave node with the largest replication offset, which means its data is the most complete. If there is one, return it; otherwise continue with the following step.
At this point, all remaining slave nodes are in the same state. Select the slave node with the smallest run ID.
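The selection order above maps naturally onto a filter followed by a sort key. The sketch below assumes, as in Redis, that a lower slave-priority number means a higher priority; the `SlaveInfo` record and its `healthy` flag are hypothetical simplifications of the health checks in the first step.

```python
from dataclasses import dataclass


@dataclass
class SlaveInfo:
    run_id: str
    priority: int        # slave-priority; a lower number is preferred
    offset: int          # replication offset; larger means more complete data
    healthy: bool = True


def select_new_master(slaves):
    candidates = [s for s in slaves if s.healthy]  # step 1: drop unhealthy nodes
    if not candidates:
        return None
    # Steps 2 to 4: best priority, then largest offset, then smallest run ID.
    return min(candidates, key=lambda s: (s.priority, -s.offset, s.run_id))
```

Using a single tuple key keeps the tie-breaking order explicit: a later field is consulted only when all earlier fields are equal.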
Promote the new master node
After the new master node is selected, a final process makes it the new master node:
The sentinel leader sends the “slaveof no one” command to the selected slave node to make it the master node.
The sentinel leader sends commands to the remaining slave nodes to make them slaves of the new master node.
The sentinel nodes record the original master node as a slave node and, once it recovers, command it to replicate from the new master node.
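The promotion steps can be written down as an ordered command plan. This sketch only builds the list of (target node, command) pairs; it does not talk to any server, and `failover_plan` and the example addresses are hypothetical.

```python
def failover_plan(new_master, other_slaves, old_master):
    host, port = new_master
    # Step 1: the selected slave stops replicating and becomes the master.
    plan = [(new_master, "slaveof no one")]
    # Step 2: the remaining slaves are pointed at the new master.
    for node in other_slaves:
        plan.append((node, f"slaveof {host} {port}"))
    # Step 3: the old master is demoted; this command applies once it recovers.
    plan.append((old_master, f"slaveof {host} {port}"))
    return plan
```

For example, `failover_plan(("10.0.0.2", 6379), [("10.0.0.3", 6379)], ("10.0.0.1", 6379))` returns three entries, starting with the promotion of 10.0.0.2 and ending with the demotion of the old master.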