Last time an Alibaba interviewer asked me how Redis master-slave replication works. This time I finally figured it out!

1. Introduction

A single Redis node is a single point of failure. To remove it, configure slave nodes for the Redis node and use Sentinel to monitor whether the master node is alive. If the master node fails, a slave node can continue to provide caching. How does data get from the master node to the slave nodes? That is the job of Redis master-slave replication.

2. Master-slave configuration and functions

Temporary configuration:

After connecting to the slave Redis node with redis-cli, run slaveof [masterIP] [masterPort]

Permanent configuration:

Open the configuration file of the slave node and add slaveof [masterIP] [masterPort]
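
As a minimal sketch of the permanent configuration, the helper below adds a slaveof line to the text of a configuration file, replacing any earlier one. The function name add_slaveof and the sample addresses are hypothetical, and the sketch works on a string rather than on a real configuration file.

```python
def add_slaveof(conf_text, master_ip, master_port):
    """Return conf_text with exactly one 'slaveof <ip> <port>' directive."""
    directive = f"slaveof {master_ip} {master_port}"
    kept = [
        line for line in conf_text.splitlines()
        # Drop any previous slaveof directive; commented-out lines are kept.
        if not line.strip().lower().startswith("slaveof ")
    ]
    kept.append(directive)
    return "\n".join(kept) + "\n"


# Hypothetical sample configuration for a slave listening on port 6380.
conf = "port 6380\nslaveof 10.0.0.1 6379\n"
new_conf = add_slaveof(conf, "192.168.1.10", 6379)
print(new_conf)
```

The slave node reads the file at startup, so the directive survives a restart, unlike the command run from redis-cli.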

Function:

1) Master-slave configuration combined with Sentinel mode removes the single point of failure and improves Redis availability

2) The slave nodes serve only read operations, while the master node handles write operations. When reads far outnumber writes, multiple slave nodes can be configured for one master node to improve response efficiency
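
The read/write split in point 2 can be sketched as a small router. This is only an illustration under assumptions: the class ReadWriteRouter, its WRITE_COMMANDS subset, and the node addresses are all hypothetical, and a real client would open connections rather than return address strings.

```python
import itertools


class ReadWriteRouter:
    """Toy router: writes go to the master, reads rotate over the slaves."""

    WRITE_COMMANDS = {"SET", "DEL", "INCR", "EXPIRE"}  # illustrative subset only

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master when no slave is configured.
        self._readers = itertools.cycle(slaves or [master])

    def route(self, command):
        """Return the address of the node that should serve this command."""
        if command.upper() in self.WRITE_COMMANDS:
            return self.master
        return next(self._readers)


router = ReadWriteRouter("10.0.0.1:6379", ["10.0.0.2:6379", "10.0.0.3:6379"])
targets = [router.route(c) for c in ("SET", "GET", "GET", "GET")]
print(targets)
```

Adding another slave address to the list spreads reads further without touching the write path.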

Supplement:

Master-slave replication does not scale Redis horizontally; clustering does

3. Replication process

1) Run slaveof [masterIP] [masterPort] on the slave node, which saves the master node's information

2) A scheduled task in the slave node discovers the master node information and establishes a socket connection with the master node

3) The slave node sends a PING and the master node replies with PONG, confirming that the two sides can communicate

4) After the connection is established, the master node sends all of its data to the slave node (data synchronization)

5) Once the master node has synchronized its current data to the slave node, the initial replication is complete. From then on, the master node continuously sends write commands to the slave node to keep the data on the two nodes consistent
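
The five steps above can be sketched as a toy in-memory simulation. The Master and Slave classes, their methods, and the log strings are hypothetical names for illustration; no real sockets or Redis protocol are involved.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.slaves = []

    def ping(self):
        return "PONG"

    def attach(self, slave):
        # Step 4: send a full copy of the current data.
        slave.data = dict(self.data)
        self.slaves.append(slave)

    def set(self, key, value):
        self.data[key] = value
        # Step 5: propagate every later write to the attached slaves.
        for slave in self.slaves:
            slave.data[key] = value


class Slave:
    def __init__(self):
        self.data = {}
        self.master = None
        self.log = []

    def slaveof(self, master):
        self.master = master                  # step 1: save master info
        self.log.append("saved master info")
        self.log.append("connected")          # step 2: connection is simulated
        if master.ping() != "PONG":           # step 3: PING / PONG check
            raise ConnectionError("master did not answer PING")
        self.log.append("ping ok")
        master.attach(self)                   # step 4: data synchronization
        self.log.append("full sync done")


master = Master()
master.set("a", 1)            # written before the slave exists
slave = Slave()
slave.slaveof(master)
master.set("b", 2)            # written after the initial synchronization
print(slave.data)
```

The key written before the slave attached arrives through the full copy, and the later key arrives through command propagation.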

4. Synchronize data

Before Redis 2.8 the sync command was used (it takes no arguments); since Redis 2.8, psync [runId] [offset] is used. The difference is that sync supports only full replication, while psync supports both full and partial replication. Before looking at synchronization, a few concepts need introducing:

runId: each Redis node generates a unique runId when it starts, so the runId changes every time Redis restarts

offset: the master node and the slave node each maintain their own replication offset. When the master node processes a write command, offset = offset + the length of the command in bytes. After receiving the command from the master node, the slave node also increases its own offset and reports it to the master node. The master node therefore holds both its own offset and the slave node's offset, and comparing the two shows whether the data on the two nodes is consistent

repl_backlog_size: the size of the replication backlog buffer, a fixed-length FIFO queue kept on the master node, with a default size of 1MB

1) While the master node is sending data to the slave node, it keeps executing write commands, and that data is stored in the replication buffer. Once the slave node has finished synchronizing the master node's data, the master node sends the data in the buffer to the slave node so that it catches up.

2) When the master node processes a write command, it not only sends the command to the slave node but also writes it to the replication backlog buffer, which is used to recover command data lost during replication.
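
To make the offset and the backlog concrete, here is a minimal sketch of a fixed-size FIFO buffer that tracks the master offset. The class ReplicationBacklog and its methods are hypothetical, the 16-byte size is chosen only to force eviction, and the commands are shown as plain text rather than in the wire format Redis actually uses.

```python
class ReplicationBacklog:
    """Fixed-size FIFO holding the most recent bytes written by the master."""

    def __init__(self, size=1024 * 1024):
        self.size = size
        self.buf = bytearray()
        self.master_offset = 0  # total bytes of write commands so far

    def feed(self, command):
        # offset = offset + the length of the command in bytes
        self.master_offset += len(command)
        self.buf += command
        # Keep only the newest `size` bytes; older bytes fall out of the queue.
        if len(self.buf) > self.size:
            del self.buf[: len(self.buf) - self.size]

    def read_from(self, slave_offset):
        """Return the bytes after slave_offset, or None if already evicted."""
        first_kept = self.master_offset - len(self.buf)
        if slave_offset < first_kept or slave_offset > self.master_offset:
            return None
        return bytes(self.buf[slave_offset - first_kept:])


backlog = ReplicationBacklog(size=16)
backlog.feed(b"SET a 1\r\n")   # 9 bytes
backlog.feed(b"SET b 2\r\n")   # 9 more bytes, 18 in total, so 2 are evicted

print(backlog.master_offset)   # master offset after both commands
print(backlog.read_from(9))    # slave that already has the first command
print(backlog.read_from(0))    # slave that is too far behind
```

A slave whose offset still falls inside the buffer can be caught up from it; one that has fallen behind the oldest retained byte cannot.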

psync execution flow

The slave node sends the psync [runId] [offset] command, and the master node replies with one of the following:

FULLRESYNC: this is the first connection, so full replication is performed

CONTINUE: partial replication is performed

ERR: the master node does not support the psync command, so full replication is performed
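
On the slave side, handling the three replies might look like the sketch below. The function parse_psync_reply and the mode labels it returns are hypothetical, and the sample reply strings are illustrative rather than captured from a real server.

```python
def parse_psync_reply(reply):
    """Map the master's reply line to a (mode, runid, offset) tuple."""
    line = reply.strip()
    if line.startswith("+FULLRESYNC"):
        # Expected shape: +FULLRESYNC <runId> <offset>
        _, runid, offset = line.split()
        return ("full", runid, int(offset))
    if line.startswith("+CONTINUE"):
        return ("partial", None, None)
    if line.startswith("-ERR"):
        # The master does not understand psync: fall back to full replication.
        return ("full_legacy", None, None)
    raise ValueError(f"unexpected reply: {reply!r}")


print(parse_psync_reply("+FULLRESYNC 3f1a9c 1024\r\n"))
print(parse_psync_reply("+CONTINUE\r\n"))
print(parse_psync_reply("-ERR unknown command\r\n"))
```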

Full replication process

1) The slave node sends psync ? -1. The runId of the master node is not known because this is the first request, so ? is sent in its place. The offset is -1 because this is the first replication.

2) When the master node sees that the slave node is replicating for the first time, it returns FULLRESYNC {runId} {offset}. The runId is the master node's runId, and the offset is the master node's current offset.

3) After receiving the master node's information, the slave node saves the runId and offset in its info.

4) After sending FULLRESYNC, the master node runs the bgsave command to generate an RDB file (data persistence).

5) The master node sends the RDB file to the slave node.

6) From the time the slave node starts receiving the RDB file until it has finished loading it, write commands executed on the master node are placed in the buffer.

7) The slave node clears the data in its own database.

8) The slave node loads the RDB file and saves the data to its own database.

9) If AOF (another persistence scheme) is enabled on the slave node, the slave node asynchronously rewrites the AOF file.
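
The snapshot-plus-buffer idea behind the full replication steps can be sketched with plain dictionaries. The function full_resync and its parameters are hypothetical; a dict copy stands in for the RDB file, and a list stands in for the buffer of writes that arrive during the transfer.

```python
def full_resync(master_data, writes_during_transfer, slave_data):
    """Simulate a full replication and return the slave's data afterwards."""
    # bgsave: take a point-in-time snapshot (stands in for the RDB file).
    rdb_snapshot = dict(master_data)

    # While the snapshot is being transferred, the master keeps accepting
    # writes and places them in a buffer.
    buffer = []
    for key, value in writes_during_transfer:
        master_data[key] = value
        buffer.append((key, value))

    # The slave clears its own data, then loads the snapshot.
    slave_data.clear()
    slave_data.update(rdb_snapshot)

    # Finally the buffered writes are replayed so the slave catches up.
    for key, value in buffer:
        slave_data[key] = value
    return slave_data


master = {"a": 1}
slave = {"stale": 0}
full_resync(master, [("b", 2), ("a", 3)], slave)
print(slave)
```

Without the buffer, the two writes made during the transfer would be missing from the slave even though the snapshot loaded correctly.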

Partial replication process

1) Partial replication is mainly an optimization Redis makes for the high cost of full replication, and it is implemented with the psync {runId} {offset} command. While a slave node is replicating the master node, if a network interruption or command loss occurs, the slave node asks the master node to resend the lost command data. If that data is still in the master node's replication backlog buffer, the master node sends it directly to the slave node, which keeps the two nodes consistent. The resent data is generally far smaller than the full data set.

2) While the master-slave connection is interrupted, the master node still responds to commands, but it cannot send them to the slave node because the replication connection is down. However, the replication backlog buffer on the master node still holds the write command data from the recent period, up to 1MB by default. When the slave node's network is restored, the slave node connects to the master node again.

3) After the master-slave connection is restored, the slave node still has its own replication offset and the master node's runId saved, so it sends them to the master node as the psync parameters, requesting partial replication.

4) After receiving the psync command, the master node first checks whether the runId parameter matches its own. If it does, the slave node was previously replicating from this same master node. The master node then searches its replication backlog buffer using the offset parameter. If the data after that offset is still in the buffer, the master node sends a +CONTINUE response to the slave node, indicating that partial replication can be performed. Because the buffer size is fixed, full replication is required if the requested data has already been overwritten.

5) The master node sends the data in the replication backlog buffer from that offset onward to the slave node, after which master-slave replication returns to its normal state.
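
The check in step 4 can be sketched as a single decision function on the master side. The name handle_psync and the parameter backlog_start (the oldest offset still held in the backlog) are hypothetical, and the sample runId and offsets are made up for illustration.

```python
def handle_psync(master_runid, master_offset, backlog_start, req_runid, req_offset):
    """Return the reply the master would send for a psync request."""
    full = f"+FULLRESYNC {master_runid} {master_offset}"
    # A different runId means the slave was replicating some other master
    # (or this is its first request, sent as "?" and -1).
    if req_runid != master_runid:
        return full
    # The requested offset must still be covered by the fixed-size backlog.
    if req_offset < backlog_start or req_offset > master_offset:
        return full
    return "+CONTINUE"


# Master "abc" is at offset 1000 and its backlog still covers 200..1000.
print(handle_psync("abc", 1000, 200, "abc", 500))   # inside the backlog
print(handle_psync("abc", 1000, 200, "abc", 100))   # already overwritten
print(handle_psync("abc", 1000, 200, "?", -1))      # first replication
```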

5. Supplement

Redis troubleshooting

If the master node fails and is restarted, its runId changes. When a slave node then runs the psync command, the master node cannot match the original runId, so full replication is performed again. To avoid this, use a Redis failover mechanism that promotes a slave node to master when the master node fails, such as Sentinel mode.
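
A rough sketch of why a restart forces full replication: the identifier is regenerated at every start, so the value the slave node remembered no longer matches. The Node class is hypothetical, and secrets.token_hex is used only as a stand-in for however Redis generates its 40-character hexadecimal run id.

```python
import secrets


class Node:
    def __init__(self):
        self.restart()

    def restart(self):
        # A fresh random id on every start (40 hex characters).
        self.runid = secrets.token_hex(20)


master = Node()
remembered_runid = master.runid   # what the slave saved before the failure
master.restart()                  # the master fails and comes back up

needs_full_replication = remembered_runid != master.runid
print(needs_full_replication)
```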
