Abstract: Redis master-slave mode raises a number of issues that need to be considered. This article analyzes and summarizes several of them for multi-server Redis deployments.
A single Redis node is a single point of failure. To address this, one generally configures slave nodes for Redis and then uses Sentinel to monitor whether the master node is alive. If the master node fails, a slave node can continue to provide the cache service. Master-slave configuration combined with Sentinel mode removes the single point of failure and improves Redis availability. The slave nodes serve only read operations, while the master node serves write operations. For read-heavy, write-light workloads, you can configure multiple slave nodes for the master node to improve response efficiency.
Master/slave replication process:
- Run slaveof {masterIP} {masterPort} on the slave node, which saves the master node's information
- A scheduled task on the slave node discovers the master node information and establishes a socket connection with the master node
- The slave node sends a PING, the master node returns PONG, and the two sides can now communicate with each other
- After the connection is established, the master node sends all data to the slave node (data synchronization)
- After the master node synchronizes the current data to the slave node, the replication process is completed. The master node then continuously sends write commands to the slave node to ensure data consistency between the master and slave nodes.
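To make the first steps more concrete, here is a minimal sketch of how the slaveof and PING commands are framed on the wire in the Redis serialization protocol (an array of bulk strings). The helper name `encode_command` and the master address are assumptions made for illustration, and the rest of the replication handshake is not shown.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        # Each argument is sent as "$<length>\r\n<bytes>\r\n".
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


# Hypothetical master address, used only for illustration.
slaveof = encode_command("SLAVEOF", "192.168.0.10", "6379")
ping = encode_command("PING")
```

A client would write these bytes to the socket and then read the reply, for example `+PONG` for the PING.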
Redis data synchronization process:
Before Redis 2.8 the sync command is used (it takes no arguments); from Redis 2.8 onward the psync {runId} {offset} command is used.
The difference is that sync supports only full replication, while psync supports both full and partial replication.
Before looking at synchronization, here are some concepts:
RunId: Each Redis node generates a unique run ID at startup. The runId changes every time Redis restarts.
Offset: The master node and the slave node each maintain their own replication offset. When the master node processes a write command, offset = offset + the length of the command in bytes. After receiving the command from the master node, the slave node increases its own offset in the same way and reports it to the master node. The master node therefore holds both its own offset and the slave node's offset, and can tell whether their data is consistent by comparing the two.
Repl_backlog_size: The size of the replication backlog buffer, a fixed-length FIFO queue kept on the master node, 1MB by default.
While the master node is sending data to the slave node, it continues to process write operations, and that data is stored in the replication buffer. After the slave node finishes synchronizing the master node's data, the master node sends the data in the buffer to the slave node for partial replication.
When the master node responds to a write command, it not only sends the command to the slave node but also writes it to the replication backlog buffer, so that commands lost during replication can be recovered.
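The relationship between the offset and the replication backlog buffer can be sketched as a small fixed-size FIFO. This is only an illustration under simplifying assumptions: the class and method names are hypothetical, the commands are plain byte strings rather than real protocol frames, and the buffer is a `bytearray` rather than the circular buffer a real server would use.

```python
class ReplicationBacklog:
    """Fixed-size FIFO of the most recently replicated bytes (illustrative)."""

    def __init__(self, size: int = 1024 * 1024) -> None:
        self.size = size            # plays the role of repl_backlog_size
        self.buf = bytearray()
        self.master_offset = 0      # total bytes ever written by the master

    def append(self, command: bytes) -> None:
        """Record a write command and advance the master offset."""
        self.buf += command
        self.master_offset += len(command)
        if len(self.buf) > self.size:
            # Drop the oldest bytes once the fixed size is exceeded.
            del self.buf[: len(self.buf) - self.size]

    @property
    def first_offset(self) -> int:
        """Offset of the oldest byte still held in the backlog."""
        return self.master_offset - len(self.buf)

    def read_from(self, slave_offset: int):
        """Return the bytes after slave_offset, or None if already evicted."""
        if slave_offset < self.first_offset or slave_offset > self.master_offset:
            return None
        return bytes(self.buf[slave_offset - self.first_offset:])


backlog = ReplicationBacklog(size=16)
backlog.append(b"SET a 1;")  # 8 bytes
backlog.append(b"SET b 2;")  # 8 bytes
backlog.append(b"SET c 3;")  # 8 bytes, evicts the first command
missing = backlog.read_from(16)  # slave has 16 bytes, needs the last command
too_old = backlog.read_from(0)   # data already evicted
```

A slave whose offset is still covered by the backlog gets only the missing bytes; a slave that has fallen too far behind gets `None`, which corresponds to the case where a full replication is needed.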
Here is the psync execution flow:
When the slave node sends the psync {runId} {offset} command, the master node responds in one of three ways:
FULLRESYNC: first connection, full replication is performed
CONTINUE: partial replication is performed
ERR: the master node does not support the psync command, so the slave node falls back to sync and performs full replication
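On the slave side, the three replies can be mapped to actions roughly as follows. This is a sketch, not the actual Redis implementation: the function name, the action labels, and the sample run ID are assumptions chosen for illustration.

```python
def parse_psync_reply(line: str):
    """Map the master's reply line to the action the slave should take."""
    if line.startswith("+FULLRESYNC"):
        # Format: +FULLRESYNC {runId} {offset}
        _, run_id, offset = line.split()
        return ("full", run_id, int(offset))
    if line.startswith("+CONTINUE"):
        return ("partial", None, None)
    if line.startswith("-ERR"):
        # Master does not understand psync: fall back to the old sync command.
        return ("fallback_to_sync", None, None)
    raise ValueError(f"unexpected reply: {line!r}")


# Hypothetical run ID, used only for illustration.
sample_run_id = "a" * 40
full = parse_psync_reply(f"+FULLRESYNC {sample_run_id} 1024")
partial = parse_psync_reply("+CONTINUE")
fallback = parse_psync_reply("-ERR unknown command")
```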
Full replication and partial replication
The full replication process consists mainly of the following steps:
1. The slave node sends the psync ? -1 command. The runId is ? because the slave does not yet know the master node's runId on the first request, and the offset is -1 because this is the first replication.
2. When the master node finds that this is the slave node's first replication, it returns FULLRESYNC {runId} {offset}. The runId is the master node's runId, and the offset is the master node's current offset.
3. After receiving the master node's information, the slave node saves it to info.
4. After sending FULLRESYNC, the master node runs the bgsave command to generate the RDB file (data persistence).
5. The master node sends the RDB file to the slave node. Write commands that the master node receives between the start of bgsave and the moment the slave node finishes loading the data are put into the buffer.
6. The slave node clears the data in its own database.
7. The slave node loads the RDB file and saves the data to its own database.
8. If AOF is enabled on the slave node, the slave node asynchronously rewrites the AOF file.
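The full replication steps above can be illustrated with a toy in-memory model. Everything here is hypothetical: `Master`, `Slave`, and their methods are invented names, a deep copy of a dict stands in for the RDB file, and networking and AOF are left out.

```python
import copy


class Master:
    def __init__(self) -> None:
        self.data = {}
        self.buffer = []          # writes received while a snapshot is in flight
        self.snapshotting = False

    def write(self, key, value) -> None:
        self.data[key] = value
        if self.snapshotting:
            self.buffer.append((key, value))

    def start_snapshot(self):
        """Stand-in for bgsave: return a point-in-time copy of the data."""
        self.snapshotting = True
        self.buffer = []
        return copy.deepcopy(self.data)

    def finish_snapshot(self):
        """Return the writes buffered since the snapshot started."""
        self.snapshotting = False
        pending, self.buffer = self.buffer, []
        return pending


class Slave:
    def __init__(self) -> None:
        self.data = {"stale": "x"}

    def full_resync(self, snapshot, pending) -> None:
        self.data.clear()             # clear the old database
        self.data.update(snapshot)    # load the snapshot
        for key, value in pending:    # replay the buffered writes
            self.data[key] = value


master, slave = Master(), Slave()
master.write("a", 1)
snapshot = master.start_snapshot()
master.write("b", 2)                  # arrives during the transfer
pending = master.finish_snapshot()
slave.full_resync(snapshot, pending)
```

After `full_resync`, the slave's stale key is gone and its data matches the master's, including the write that arrived while the snapshot was being transferred.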
Here are some notes on partial replication:
1. Partial replication is an optimization that Redis introduced to reduce the high cost of full replication. It is implemented with the psync {runId} {offset} command. If the network is interrupted or commands are lost while the slave node is replicating from the master node, the slave node asks the master node to resend the lost command data. If that data is still in the master node's replication backlog buffer, it is sent directly to the slave node, which keeps the master and slave consistent. This data is generally much smaller than the full data set.
2. The master node still responds to the command when the master/slave connection is interrupted, but the command cannot be sent to the slave node because the replication connection is interrupted. However, the replication backlog buffer in the master node can still store the write command data in the recent period.
3. After the master/slave connection is restored, the slave node has saved its copied offset and the running ID of the master node. They are therefore sent to the master node as psync parameters, requesting partial replication.
4. After receiving the psync command, the master node first checks whether the runId parameter matches its own. If it does, the slave node was previously replicating from the current master node. The master node then searches its replication backlog buffer using the offset parameter. If the data after that offset is still present, it sends the +CONTINUE response to the slave node, indicating that partial replication can be performed. Because the buffer size is fixed, a full replication is performed if the requested data has already been overwritten.
5. The master node sends the data in the replication backlog buffer to the slave node according to the offset to ensure that the master/slave replication enters the normal state.
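The master-side decision described in these notes can be sketched as a single function. This is a simplified model: the function name and parameters are assumptions, and the offset check is reduced to a range comparison against the oldest and newest offsets held in the backlog.

```python
def handle_psync(master_run_id: str, backlog_first_offset: int,
                 master_offset: int, req_run_id: str, req_offset: int) -> str:
    """Decide between partial and full replication for a psync request."""
    full = f"+FULLRESYNC {master_run_id} {master_offset}"
    if req_run_id != master_run_id:
        # First replication ("?") or the slave followed a different master.
        return full
    if not (backlog_first_offset <= req_offset <= master_offset):
        # The data after req_offset is no longer in the fixed-size backlog.
        return full
    return "+CONTINUE"


run_id = "b" * 40  # hypothetical run ID, for illustration only
first_time = handle_psync(run_id, 500, 1000, "?", -1)
reconnect = handle_psync(run_id, 500, 1000, run_id, 800)
fell_behind = handle_psync(run_id, 500, 1000, run_id, 100)
```

Only the reconnecting slave whose offset is still covered by the backlog gets `+CONTINUE`; the other two cases lead to a full replication.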
Redis master-slave replication has the following problems:
- Once the master node goes down, a slave node must be promoted to master, the master node address used by the application must be changed, and all other slave nodes must be told to replicate from the new master node. The whole process requires manual intervention.
- The write capability of the primary node is limited by the stand-alone node.
- The storage capacity of the primary node is limited by the single node.
- The disadvantages of native replication are especially pronounced in earlier versions. For example, after a replication break the slave node initiates psync, and if partial synchronization fails, a full synchronization is performed against the master node. While the master node produces the full backup, delays of milliseconds or even seconds may occur.
Sentinel is therefore used to address these problems.
Functions of Sentinel
Redis Sentinel features include master node liveness detection, master/slave health detection, automatic failover, and master/slave switchover. The minimum Redis Sentinel configuration is one master and one slave.
Redis’ Sentinel system can be used to manage multiple Redis servers. The system can perform the following four tasks:
- Monitoring: Continuously check whether the primary and secondary servers are running properly.
- Notification: When a monitored Redis server has a problem, Sentinel notifies administrators or other applications through an API or scripts.
- Automatic failover: When the master node does not function properly, Sentinel starts an automatic failover. It promotes one of the failed master node's slave nodes to be the new master node and points the other slave nodes to the new master node, so that manual intervention is not required.
- Configuration provider: In Redis Sentinel mode, the client application connects to the set of Sentinel nodes during initialization to obtain the master node information.
How Sentinel works
Each Sentinel node performs the following tasks periodically:
1. Each Sentinel sends a PING command once per second to the master server, the slave servers, and the other Sentinel instances it knows of.
2. If an instance has gone longer than down-after-milliseconds without returning a valid reply to PING, the Sentinel flags it as subjectively offline (SDOWN).
3. If a master server is marked as subjectively offline, all Sentinel nodes monitoring that server check once per second to confirm whether it really is subjectively offline.
4. If a sufficient number of Sentinels (at least the number specified in the configuration file) agree with this determination within the specified time frame, the master server is marked as objectively offline (ODOWN).
5. Normally, each Sentinel sends INFO commands to all known master and slave servers once every 10 seconds. When a master server is marked as objectively offline, Sentinel sends INFO commands to all of that master's slave servers once per second instead of once every 10 seconds.
6. The Sentinels negotiate with one another about the status of the master node. Once they agree that the master node is down, a new master node is selected automatically by voting, and the remaining slave nodes are pointed to the new master node for data replication.
7. If not enough Sentinels agree that the master server is offline, its objectively offline status is removed. If the master server again returns a valid reply to a Sentinel's PING command, its subjectively offline status is removed.
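The subjective and objective offline checks described above can be sketched with a few small functions. This is a simplified model rather than Sentinel's actual code: the function names are hypothetical, and the sample values for down-after-milliseconds and the quorum are chosen only for illustration.

```python
def is_subjectively_down(last_valid_reply_ms: int, now_ms: int,
                         down_after_ms: int) -> bool:
    """One Sentinel's view: no valid PING reply for longer than the limit."""
    return now_ms - last_valid_reply_ms > down_after_ms


def is_objectively_down(sdown_votes: list, quorum: int) -> bool:
    """Enough Sentinels agree that the master is subjectively down."""
    return sum(1 for vote in sdown_votes if vote) >= quorum


def info_interval_seconds(master_odown: bool) -> int:
    """INFO is sent to slaves every second once the master is ODOWN."""
    return 1 if master_odown else 10


# Three Sentinels, each with its own timestamp of the last valid reply.
now_ms = 100_000
last_replies_ms = [60_000, 65_000, 99_500]
votes = [is_subjectively_down(t, now_ms, down_after_ms=30_000)
         for t in last_replies_ms]
odown = is_objectively_down(votes, quorum=2)
interval = info_interval_seconds(odown)
```

Here two of the three Sentinels have not had a valid reply within 30 seconds, which meets the quorum of 2, so the master is treated as objectively offline and the INFO interval drops to one second.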