In a Redis master-slave architecture, the connection between the master and a slave will inevitably be interrupted at some point, whether by a network outage, congestion, or some other cause.

Command propagation

Under normal circumstances, the master writes each operation command to a replication buffer, and the data in that buffer is then sent to the slave over the network connection.

In a master-slave architecture, the master keeps one replication buffer for each slave connected to it. After the master executes an operation command, it writes the command to the replication buffer of every slave. In addition, the command is written to the replication backlog on the master. Unlike the replication buffer, the replication backlog is not kept per slave: each master has exactly one.
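The relationship between the per-slave replication buffers and the single backlog can be sketched as follows. This is only a toy model, not how Redis is implemented: the class `Master`, its methods, and the slave names are hypothetical, and the backlog here is unbounded for simplicity.

```python
from collections import deque


class Master:
    """Toy model: one replication buffer per slave, one shared backlog."""

    def __init__(self):
        # slave name -> commands waiting to be sent to that slave
        self.replication_buffers = {}
        # a single backlog for the whole master (unbounded in this sketch)
        self.backlog = deque()

    def attach_slave(self, name):
        self.replication_buffers[name] = deque()

    def execute(self, command):
        # After executing a write, queue it for every connected slave...
        for buf in self.replication_buffers.values():
            buf.append(command)
        # ...and record it once in the master's backlog.
        self.backlog.append(command)


master = Master()
master.attach_slave("slave-a")
master.attach_slave("slave-b")
master.execute("SET k1 v1")
master.execute("INCR counter")

print(len(master.replication_buffers))  # number of per-slave buffers
print(len(master.backlog))              # commands recorded in the one backlog
```

Two slaves mean two replication buffers, but each command is stored in the backlog only once.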

What is the replication backlog

The replication backlog is a circular buffer whose size can be set with the repl-backlog-size parameter.

The master records its write position in master_repl_offset, and the slave records its read position in slave_repl_offset, also referred to simply as the offset.

At the start of replication, both values are equal and point to the initial position. Each time the master writes a command to the replication backlog, master_repl_offset increases by the number of bytes written; after the slave has copied the command, slave_repl_offset increases by the same amount.

Normally, slave_repl_offset follows master_repl_offset closely, so the two are either equal or differ only slightly.
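To make the circular structure and the two offsets concrete, here is a minimal sketch. It assumes offsets are counted in bytes and uses a deliberately tiny 8-byte buffer; the class `CircularBacklog` and the plain-text commands are hypothetical stand-ins, not Redis internals.

```python
class CircularBacklog:
    """Fixed-size circular buffer; old bytes are overwritten when it wraps."""

    def __init__(self, size):
        self.size = size
        self.buf = bytearray(size)
        self.master_repl_offset = 0  # total number of bytes ever written

    def write(self, data):
        for byte in data:
            # The write position wraps around to the start of the buffer.
            self.buf[self.master_repl_offset % self.size] = byte
            self.master_repl_offset += 1


backlog = CircularBacklog(size=8)
slave_repl_offset = 0

backlog.write(b"SET a 1")      # master writes 7 bytes
slave_repl_offset += 7         # slave has copied those 7 bytes

backlog.write(b"DEL a")        # 5 more bytes; the buffer wraps around
lag = backlog.master_repl_offset - slave_repl_offset

print(backlog.master_repl_offset, slave_repl_offset, lag)
```

The gap between the two offsets (`lag`) is the data the slave still has to copy. Once the write position wraps, the oldest bytes in the buffer are gone.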

After disconnection and restoration

If the network connection between the master and the slave breaks, the slave can no longer copy the master's operation commands, but the master keeps writing them to the replication backlog.

When the network is restored, the slave requests data synchronization from the master again by sending the following command:

psync runID offset

Here runID is the run ID of the master that the slave was replicating from, and offset is the slave's slave_repl_offset. In this case, the master only needs to send the slave the operation commands that lie between slave_repl_offset and master_repl_offset.

However, as mentioned earlier, the replication backlog is a circular structure. If the network outage lasts too long, master_repl_offset keeps growing and some of the commands that the slave has not yet copied are overwritten. If that happens, a full replication has to be performed again.
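The decision between continuing from the backlog and falling back to a full copy can be sketched like this. It is a simplified model under stated assumptions: the function `resync_mode` and its parameters are hypothetical, offsets are treated as byte counts, and the real master performs more checks than shown here.

```python
def resync_mode(master_run_id, requested_run_id,
                master_repl_offset, backlog_size, slave_repl_offset):
    """Return "partial" if the backlog still covers the slave's offset, else "full"."""
    # A slave that was following a different master cannot resume.
    if requested_run_id != master_run_id:
        return "full"

    # Oldest offset that has not been overwritten in the circular backlog.
    oldest_available = max(0, master_repl_offset - backlog_size)

    if oldest_available <= slave_repl_offset <= master_repl_offset:
        return "partial"
    return "full"


# Short outage: the slave is 500 bytes behind, the backlog holds 1024 bytes.
print(resync_mode("abc", "abc", 5000, 1024, 4500))

# Long outage: the slave is 2000 bytes behind, so its data was overwritten.
print(resync_mode("abc", "abc", 5000, 1024, 3000))
```

In the second call the slave's offset falls before the oldest byte still held in the backlog, which is exactly the situation that forces a full copy.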

To avoid full replication, set the replication backlog to an appropriate size by changing the value of the repl-backlog-size parameter.

This value should be chosen based on the actual workload. Let T be the time, in seconds, between a command being generated on the master and being replicated on the slave, C the number of commands generated per second, and S the size of a command. The value must not be less than the product T × C × S. In practice, multiply this product by 2 or more to allow for unexpected network pressure and possible congestion while the system is running.
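As a worked example of this estimate, the small helper below computes a suggested size. The function name `suggested_backlog_bytes` and the sample numbers (T = 5 seconds, C = 2000 commands per second, S = 100 bytes) are made up for illustration; measure your own workload before picking a value.

```python
import math


def suggested_backlog_bytes(t_seconds, commands_per_second,
                            command_size_bytes, safety_factor=2):
    """Lower bound T * C * S, multiplied by a safety factor of 2 or more."""
    minimum = t_seconds * commands_per_second * command_size_bytes
    return math.ceil(minimum * safety_factor)


# Hypothetical workload: T = 5 s, C = 2000 commands/s, S = 100 bytes.
size = suggested_backlog_bytes(5, 2000, 100)
print(f"repl-backlog-size {size}")
```

With these sample numbers the minimum is 5 × 2000 × 100 = 1,000,000 bytes, and doubling it gives 2,000,000 bytes, roughly 2 MB.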