Redis series 4: Redis replication mechanism (master/slave replication)

This post is the fourth in a series of Redis tutorials and covers the master-slave replication mechanism.

The first three articles in this series can be viewed by clicking on the following links:

Redis series (1) : Introduction to Redis and environment installation

Redis series (2) : Redis 5 data structures and common commands

Redis series 3: Redis persistence mechanism (RDB, AOF)

Redis master-slave replication comes up often in interviews. Several companies I interviewed with recently asked me how master-slave replication works as soon as Redis was mentioned.

1. Why is master/slave replication required?

In the last blog post in this series, we talked about Redis persistence, which is a great way to solve the problem of data loss caused by an unexpected Redis server process exit or Redis server downtime.

However, the persistence mechanism can restore data only if your Redis server starts up properly.

In an extreme case such as a power outage (unlikely, but possible), the Redis server cannot even start. How do you restore the data then, and how do you keep the service highly available?

Even if the Redis server is up and running, its network connection can fail. You have probably seen news reports of services becoming unavailable because a cable was cut.

Because of these risks, running a single Redis server in a production environment is not advisable. And once multiple Redis servers are in use, how is data synchronized between them?

This requires the Redis replication mechanism.

Another reason is that, although Redis performs very well, a single machine is still a bottleneck. Master/slave replication makes read/write separation possible and improves the availability of Redis: the master server executes write commands and multiple slave servers execute read commands, similar to read/write separation in a database.

To sum up, master-slave replication has the following two usage scenarios:

Data backup
Read/write separation

2. Master/slave replication practice

First, I started two Redis instances (that is, two Redis servers) on my machine, at 127.0.0.1:6379 and 127.0.0.1:6380.

Then connect to the Redis instance at 127.0.0.1:6380 using redis-cli and run the following command:

SLAVEOF 127.0.0.1 6379

In this case, we call 127.0.0.1:6379 the master of 127.0.0.1:6380 and 127.0.0.1:6380 the slave of 127.0.0.1:6379.

The relationship between the two is as follows:

We then execute the following write command on the primary server:

SET msg "hello world"

At this point, we can get the value not only on the master server, but also on the slave server:

We then execute the following delete command on the primary server:

DEL msg

When the msg key is deleted on the master server, it is also deleted on the slave server:

Therefore, the databases on both the master and slave servers in replication will hold the same data.

Note that, by default, a slave server can only execute read commands. Executing a write command on it reports the following error:

To stop the slave server from replicating the master server, run the SLAVEOF NO ONE command on the slave server.

3. Implementation of the old replication function (SYNC)

By “old” I mean the pre-2.8 version of Redis.

The Redis replication function is divided into the following two operations:

Synchronization: Used to update the database state of the slave server to the current database state of the master server.
Command propagation: When the database state of the master server is modified, leaving the master and slave servers inconsistent, command propagation brings the two back to a consistent state.

3.1 Synchronization

When the client sends the SLAVEOF command to the slave server to copy the master server, the slave server sends the SYNC command to the master server. The execution steps of this command are as follows:

The slave server sends the SYNC command to the master server.
On receiving the SYNC command, the master server runs the BGSAVE command to generate an RDB file in the background, and uses a buffer to record every write command executed from that point on.
When the master server's BGSAVE command finishes, the master server sends the generated RDB file to the slave server, which receives and loads it. The slave server's database state now matches the master server's state at the moment BGSAVE was executed.
The master server sends all write commands recorded in the buffer to the slave server, which receives and executes them. The slave server's database state now matches the master server's current database state.
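
The four steps above can be sketched as a small Python simulation. This is only an illustration under simplifying assumptions: the database is a plain dict, a deep copy of it stands in for the RDB file, and the names Master, apply_command, start_bgsave, and finish_sync are hypothetical, not Redis internals.

```python
import copy


def apply_command(db, command):
    """Apply a (name, *args) write command to a dict used as a toy database."""
    name, *args = command
    if name == "SET":
        key, value = args
        db[key] = value
    elif name == "DEL":
        db.pop(args[0], None)
    else:
        raise ValueError(f"unsupported command: {name}")


class Master:
    """Toy master server (hypothetical; not how Redis is implemented)."""

    def __init__(self):
        self.db = {}
        self.buffer = None  # None means no synchronization is in progress

    def execute(self, command):
        apply_command(self.db, command)
        if self.buffer is not None:
            self.buffer.append(command)  # record writes made after BGSAVE began

    def start_bgsave(self):
        """Step 2: take a snapshot (stand-in for the RDB file) and start buffering."""
        self.buffer = []
        return copy.deepcopy(self.db)

    def finish_sync(self):
        """Step 4: hand over the buffered write commands and stop buffering."""
        buffered, self.buffer = self.buffer, None
        return buffered


master = Master()
master.execute(("SET", "k1", "v1"))

snapshot = master.start_bgsave()      # steps 1-2: SYNC received, BGSAVE starts
master.execute(("SET", "k2", "v2"))   # a write that arrives during BGSAVE

slave_db = copy.deepcopy(snapshot)    # step 3: the slave loads the snapshot
for command in master.finish_sync():  # step 4: the slave replays buffered writes
    apply_command(slave_db, command)

print(slave_db == master.db)  # True
```

The snapshot alone would miss k2; it is the replay of the buffered commands in step 4 that brings the slave up to the master's current state.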

The following figure shows the communication between the primary and secondary servers during the SYNC command execution:

3.2 Command Propagation

After the synchronization is complete, the database status of the primary and secondary servers is consistent. When the primary server executes the write command sent by the client, the database status of the primary server is modified. As a result, the database status of the primary and secondary servers is inconsistent.

To bring the master and slave servers back to a consistent state, the master server performs command propagation: it sends the write command it has just executed to the slave server, and once the slave server executes the same command, the two databases are consistent again.

As a concrete example, suppose the master and slave servers both start with the five keys k1, k2, k3, k4, and k5, and a client then sends the command DEL k3 to the master server. The master server executes the command and propagates it to the slave server, so that the slave server's database stays consistent with the master server's.

The whole change process is as follows:
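
The DEL k3 example can also be traced with a minimal Python sketch. It assumes dicts as databases and a hypothetical Master class whose execute method forwards every write to its slaves; real Redis propagates the command over the network connection.

```python
def apply_command(db, command):
    """Apply a (name, *args) write command to a dict used as a toy database."""
    name, *args = command
    if name == "SET":
        db[args[0]] = args[1]
    elif name == "DEL":
        db.pop(args[0], None)
    else:
        raise ValueError(f"unsupported command: {name}")


class Master:
    """Toy master that propagates each write to its slaves (hypothetical)."""

    def __init__(self, db):
        self.db = dict(db)
        self.slaves = []  # databases of the attached slave servers

    def execute(self, command):
        apply_command(self.db, command)       # 1. execute on the master
        for slave_db in self.slaves:
            apply_command(slave_db, command)  # 2. propagate to every slave


initial = {f"k{i}": f"v{i}" for i in range(1, 6)}  # k1 .. k5
master = Master(initial)
slave_db = dict(initial)  # the slave starts out synchronized
master.slaves.append(slave_db)

master.execute(("DEL", "k3"))

print(sorted(master.db))  # ['k1', 'k2', 'k4', 'k5']
print(sorted(slave_db))   # ['k1', 'k2', 'k4', 'k5']
```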

4. Defects of the old replication function

By “old” I mean the pre-2.8 version of Redis.

Prior to Redis 2.8, a slave server's replication of a master server fell into one of the following two cases:

Initial replication

The slave server has never replicated any master server before, or the master server it is about to replicate is different from the one it last replicated.
Replication after disconnection

Replication between a master and slave server in the command propagation phase is interrupted for network reasons, but the slave server automatically reconnects to the master server and continues replicating it.

The old replication function handles initial replication well, but it is very inefficient at replication after disconnection.

For example, slave server B has been replicating master server A. At first everything is normal, and every write command executed by master server A is passed to slave server B through command propagation. Then, for network reasons, replication between master server A and slave server B is suddenly interrupted. Suppose master server A executes 10 more write commands during the outage, and slave server B then reconnects to master server A and resumes replication. How does replication proceed at this point?

Slave server B sends the SYNC command to master server A. On receiving it, master server A executes the BGSAVE command and records all write commands executed from then on in a buffer. When BGSAVE finishes, master server A sends the generated RDB file to slave server B, which receives and loads it. Master server A then sends the write commands in the buffer to slave server B for execution. At that point the two servers' database states are consistent again, and command propagation resumes.

In other words, after every disconnection the SYNC command performs a full replication. Yet all that slave server B actually needs are the write commands master server A executed during the disconnection: in the example above, just 10 write commands.

The SYNC command is an expensive operation:

The master server needs to execute the BGSAVE command to generate an RDB file, which consumes a lot of CPU, memory, and disk I/O on the master server.
The master server needs to send the generated RDB file to the slave server, which consumes a lot of network resources (bandwidth and traffic) on the master and slave servers.
The slave server that receives the RDB file needs to load the RDB file. During the load, the slave server blocks and cannot process command requests.

5. Implementation of the new replication function (PSYNC)

The new version here refers to Redis 2.8 and later.

Starting with Redis 2.8, Redis uses the PSYNC command instead of the SYNC command to perform synchronization during replication.

The PSYNC command has the following two modes:

Full resynchronization

Full resynchronization handles initial replication. Its steps are essentially the same as those of the SYNC command.
Partial resynchronization

Partial resynchronization handles replication after disconnection. When a slave server reconnects to the master server after a disconnection, the master server can, if conditions allow, send the slave server only the write commands executed during the disconnection. Once the slave server receives and executes those write commands, its database is updated to the master server's current state.

Returning to the example above: with the new replication function, instead of generating and sending an entire RDB file, the master server only needs to send the slave server the 10 write commands executed during the disconnection, which greatly improves performance.

The communication process between the master and slave servers during partial resynchronization is shown as follows:

So how does partial resynchronization work?

The partial resynchronization function consists of the following three parts:

Replication offsets of the master and slave servers
Replication backlog buffer of the master server
Run ID of the server

Let’s go through them one by one.

5.1 Replication Offset

The master and slave servers that perform replication each maintain a replication offset:

Each time the master propagates N bytes of data to the slave, it adds N to the value of its own replication offset.
Each time the slave receives N bytes of data propagated from the master, it adds N to its own replication offset.

As an example, suppose that the master server has three slave servers, each with a replication offset of 10086, as shown in the following figure:

The master server then propagates 33 bytes of data to the three slave servers, and the master server's replication offset increases by 33 to 10119.

Slave server A is disconnected at this point and does not receive the data, so its offset is still 10086.

Slave servers B and C receive the data normally, so their offsets are updated to 10119, as shown below:

Obviously, it is easy to know whether the primary and secondary servers are in a consistent state by comparing their replication offsets.
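
The offset bookkeeping in this example can be sketched in a few lines of Python. The Server class and propagate function are hypothetical names for illustration. The payload is also an assumption: the example does not say which 33 bytes were propagated, so the sketch uses the Redis protocol encoding of SET msg hello, which happens to be exactly 33 bytes long.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Toy server that only tracks its replication offset (hypothetical)."""
    name: str
    offset: int
    connected: bool = True


def propagate(master, slaves, data):
    """Master adds N to its offset; each slave that receives the data does too."""
    master.offset += len(data)
    for slave in slaves:
        if slave.connected:
            slave.offset += len(data)


master = Server("master", 10086)
slaves = [
    Server("A", 10086, connected=False),  # A is disconnected and misses the data
    Server("B", 10086),
    Server("C", 10086),
]

# Assumed payload: "SET msg hello" in Redis protocol form, 33 bytes in total.
data = b"*3\r\n$3\r\nSET\r\n$3\r\nmsg\r\n$5\r\nhello\r\n"
propagate(master, slaves, data)

for slave in slaves:
    state = "consistent" if slave.offset == master.offset else "behind"
    print(slave.name, slave.offset, state)
# A 10086 behind
# B 10119 consistent
# C 10119 consistent
```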

Slave server A then reconnects to the master server and sends it the PSYNC command, reporting that its current replication offset is 10086. The master server now has two questions to answer:

Should a full or a partial resynchronization be performed for slave server A?
If a partial resynchronization is performed, where does the master server get the data that slave server A missed during the disconnection?

With these two questions in mind, let’s take a look at the replication backlog buffer.

5.2 Replication backlog buffer

The replication backlog buffer is a fixed-length first-in, first-out queue maintained by the primary server, with a default size of 1MB.

When the master server propagates commands, it not only sends write commands to all slave servers, but also queues them into the replication backlog buffer, as shown below:

Therefore, the replication backlog buffer on the primary server holds a portion of the recently propagated write commands and records the corresponding replication offset for each byte in the queue, as shown below:

Offset	...	10087	10088	10089	10090	10091	...
Byte value	...	'*'	3	'\r'	'\n'	'$'	...
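
The table can be reproduced with a short Python sketch. It assumes a hypothetical ReplicationBacklog class built on collections.deque with a maximum length, which gives the fixed-length first-in, first-out behaviour described above; the payload (the Redis protocol encoding of SET msg hello) is also an assumption chosen for illustration, not the actual command from the example.

```python
from collections import deque


class ReplicationBacklog:
    """Toy fixed-length FIFO of bytes (hypothetical; not Redis's own C code)."""

    def __init__(self, size, start_offset=0):
        self.buffer = deque(maxlen=size)  # oldest bytes fall out when full
        self.last_offset = start_offset   # offset of the newest byte written

    def feed(self, data):
        self.buffer.extend(data)          # iterating over bytes yields ints
        self.last_offset += len(data)

    def first_offset(self):
        """Offset of the oldest byte still held in the backlog."""
        return self.last_offset - len(self.buffer) + 1

    def offset_table(self):
        """Pair every byte in the backlog with its replication offset."""
        start = self.first_offset()
        return [(start + i, chr(byte)) for i, byte in enumerate(self.buffer)]


# Default-sized (1MB) backlog on a master whose offset is currently 10086.
backlog = ReplicationBacklog(size=1024 * 1024, start_offset=10086)
backlog.feed(b"*3\r\n$3\r\nSET\r\n$3\r\nmsg\r\n$5\r\nhello\r\n")
print(backlog.offset_table()[:5])
# [(10087, '*'), (10088, '3'), (10089, '\r'), (10090, '\n'), (10091, '$')]

# A tiny backlog shows the FIFO behaviour: only the newest 8 bytes survive.
small = ReplicationBacklog(size=8)
small.feed(b"0123456789")
print(small.first_offset(), bytes(small.buffer))
# 3 b'23456789'
```

Because offsets are derived from the newest offset and the number of bytes held, the backlog never needs to store an offset per byte.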

When the slave server reconnects to the master server, it sends its replication offset to the master server through the PSYNC command. The master server decides which synchronization operation to perform on the slave server based on the following rules:

If the data after the reported offset (that is, starting at offset + 1) is still in the replication backlog buffer, the master server performs a partial resynchronization with the slave server.
If the data after the reported offset is no longer in the replication backlog buffer, the master server performs a full resynchronization with the slave server.

Back to the previous example:

Slave server A reconnects to the master server and sends it the PSYNC command, reporting its replication offset as 10086.
On receiving the PSYNC command with offset 10086, the master server checks whether the data after offset 10086 is still in the replication backlog buffer. It is, so the master server sends a +CONTINUE reply to slave server A, indicating that the data will be synchronized by partial resynchronization.
The master server then sends slave server A all the data in the replication backlog buffer after offset 10086 (offsets 10087 to 10119).
Slave server A receives the 33 bytes of missing data and returns to the same state as the master server.
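
The decision rule and this example can be expressed as a small Python function. This is a sketch under assumptions: choose_sync is a hypothetical name, and the backlog is described only by how many of the most recent bytes it still holds.

```python
def choose_sync(slave_offset, master_offset, backlog_len):
    """Decide between partial and full resynchronization.

    backlog_len is the number of most recent bytes still held in the backlog.
    Returns the mode and, for a partial resync, the number of bytes to send.
    """
    first_kept = master_offset - backlog_len + 1  # oldest offset in the backlog
    first_needed = slave_offset + 1               # first byte the slave lacks
    if first_kept <= first_needed <= master_offset + 1:
        return "partial", master_offset - slave_offset
    return "full", None


# Slave A reports 10086; the master is at 10119 and still holds all 33 bytes.
print(choose_sync(10086, 10119, backlog_len=33))  # ('partial', 33)

# If the backlog only held the newest 20 bytes, offset 10087 would be gone.
print(choose_sync(10086, 10119, backlog_len=20))  # ('full', None)
```

This is why the backlog size matters: a slave that stays disconnected for longer than the backlog can cover falls back to a full resynchronization.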

5.3 Server Running ID

Each Redis server, whether primary or secondary, has its own run ID, which is automatically generated when the server is started and consists of 40 hexadecimal characters, as shown in the figure below:

When the slave server replicates the master server for the first time, the master server sends its run ID to the slave server, which saves it.

When the secondary server disconnects and reconnects to the primary server, the secondary server sends the previously saved run ID to the currently connected primary server:

If the run ID saved by the slave server matches the run ID of the master server it is now connected to, the slave server was replicating this same master server before the disconnection, and the master server can go on to attempt a partial resynchronization.
If the run ID saved by the slave server differs from the run ID of the master server it is now connected to, the slave server was replicating a different master server before the disconnection, and the master server performs a full resynchronization with the slave server.
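
As a rough illustration, a run ID of the same shape (40 hexadecimal characters) and the comparison above can be sketched in Python. How Redis itself generates the ID is not covered here; using secrets.token_hex is an assumption made only to get a random string of the right length, and both function names are hypothetical.

```python
import secrets
import string


def generate_run_id():
    """Return 40 random hexadecimal characters (20 random bytes)."""
    return secrets.token_hex(20)


def same_master(saved_run_id, master_run_id):
    """True if the slave was replicating this very master before disconnecting."""
    return saved_run_id == master_run_id


run_id = generate_run_id()
print(len(run_id), all(c in string.hexdigits for c in run_id))  # 40 True

print(same_master(run_id, run_id))        # True: partial resync may be tried
print(same_master("a" * 40, "b" * 40))    # False: full resync is required
```

A restarted master generates a new run ID, so a slave reconnecting to it is treated as if it were talking to a different master.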

5.4 Execution details of the PSYNC command

For slave servers, the PSYNC command can be invoked in two ways:

If the slave server has never replicated any master server before, or has previously executed the SLAVEOF no one command, then when it starts a new replication it sends the PSYNC ? -1 command to the master server, actively requesting a full resynchronization.
If the slave server has already replicated a master server, then when it starts a new replication it sends the PSYNC {runid} {offset} command to the master server, where runid is the run ID of the master server it last replicated and offset is the slave server's current replication offset.

On receiving the PSYNC command, the master server returns one of the following three replies to the slave server:

If the master server returns +FULLRESYNC {runid} {offset}, it will perform a full resynchronization with the slave server. Here runid is the master server's run ID, which the slave server saves and sends with its next PSYNC command, and offset is the master server's current replication offset, which the slave server uses as its own initial offset.
If the master server returns +CONTINUE, it will perform a partial resynchronization with the slave server, and the slave server only has to wait for the master server to send the data it is missing.
If the master server returns -ERR, the master server is older than Redis 2.8 and does not recognize the PSYNC command. The slave server then sends the SYNC command to the master server and performs a full synchronization with it.
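
The two ways of calling PSYNC and the three possible replies can be put together in a small Python sketch. All function names are hypothetical, the run ID and the error text are made up for illustration, and the master-side check follows the simplified rules from sections 5.2 and 5.3 rather than the actual Redis source.

```python
def build_psync(saved_run_id=None, offset=None):
    """Slave side: build the PSYNC command to send."""
    if saved_run_id is None:  # never replicated, or SLAVEOF no one was run
        return "PSYNC ? -1"
    return f"PSYNC {saved_run_id} {offset}"


def handle_psync(command, master_run_id, master_offset, backlog_len):
    """Master side (2.8 and later): choose a reply to a PSYNC command."""
    _, run_id, offset = command.split()
    offset = int(offset)
    first_kept = master_offset - backlog_len + 1  # oldest offset in the backlog
    if run_id == master_run_id and first_kept <= offset + 1 <= master_offset + 1:
        return "+CONTINUE"
    return f"+FULLRESYNC {master_run_id} {master_offset}"


def next_step(reply):
    """Slave side: interpret the master's reply."""
    if reply.startswith("+FULLRESYNC"):
        _, run_id, offset = reply.split()
        return ("full resynchronization", run_id, int(offset))
    if reply.startswith("+CONTINUE"):
        return ("partial resynchronization", None, None)
    if reply.startswith("-ERR"):
        return ("fall back to SYNC", None, None)
    raise ValueError(f"unexpected reply: {reply}")


MASTER_ID = "a" * 40  # made-up run ID for illustration

# First replication: the slave has no saved run ID.
first = handle_psync(build_psync(), MASTER_ID, 10119, 33)
print(next_step(first)[0])   # full resynchronization

# Reconnection: same master, and offset 10087 onward is still in the backlog.
again = handle_psync(build_psync(MASTER_ID, 10086), MASTER_ID, 10119, 33)
print(next_step(again)[0])   # partial resynchronization

# A pre-2.8 master would answer with an error (illustrative text).
print(next_step("-ERR unknown command 'PSYNC'")[0])  # fall back to SYNC
```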

The process described above can be represented by the following flow chart:

6. Source code and reference

Redis Design and Implementation by Huang Jianhong