In the previous two articles, we introduced the Redis memory model and Redis persistence techniques and solution selection, respectively.

As mentioned in the previous article, Redis's high-availability solutions include persistence, master-slave replication (and read-write separation), Sentinel, and clustering. Persistence focuses on backing up Redis data (from memory to disk). Master-slave replication focuses on hot backup of data; in addition, it can also provide load balancing and failure recovery.

In this article, we will detail all aspects of Redis master-slave replication, including: how to use master-slave replication, the principles of master-slave replication (full replication, partial replication, and the heartbeat mechanism), problems in practical applications (such as data inconsistency, replication timeout, and replication buffer overflow), and configuration options related to master-slave replication (such as repl-timeout and client-output-buffer-limit slave).

Overview of master-slave replication

Master-slave replication refers to the replication of data from one Redis server to another Redis server. The former is called the master node and the latter is called the slave node. The replication of data is one-way and can only go from the master node to the slave node.

By default, each Redis server is a master node, and a master node can have multiple slave nodes (or none), but a slave node can only have one master node.

Master-slave replication provides the following functions:

  • Data redundancy: Master/slave replication implements hot backup of data and is a data redundancy method other than persistence.

  • Failure recovery: when the master node fails, a slave node can take over and provide services, enabling fast failure recovery; this is in fact a form of service redundancy.

  • Load balancing: On the basis of master/slave replication and read/write separation, the master node provides the write service, and the slave node provides the read service (that is, the application connects to the master node when writing Redis data, and the application connects to the slave node when reading Redis data) to share server load. Especially in the scenario of less write and more read, the concurrency of the Redis server can be greatly increased by sharing the read load with multiple slave nodes.

  • Cornerstone of high availability: besides the above, master-slave replication is the foundation on which Sentinel and clustering are built, and is therefore the basis of Redis high availability.

How to use master-slave replication

To get a more intuitive understanding of master/slave replication, before introducing its internal mechanism, we first explain how to enable master/slave replication.

Note that master-slave replication is initiated entirely on the slave node and does not require us to do anything on the master node.

There are three ways to enable master-slave replication on the slave node:

  • The configuration file

    Add slaveof <masterip> <masterport> to the configuration file of the slave server

  • Start the command

    redis-server --slaveof <masterip> <masterport>

  • Client command

    After the Redis server starts, run slaveof <masterip> <masterport> on the client to make the Redis instance become a slave node.

The above three methods are equivalent. The following uses the client command as an example to look at the changes in the primary and secondary nodes of Redis after slaveof is executed.

Preparations: Start two nodes

For convenience, the master and slave nodes used in the experiment are two Redis instances on one machine: the master node listens on port 6379 and the slave node on port 6380. The slave node's listening port can be changed in its configuration file:

After startup, you can see:

When the two Redis nodes are started (called the 6379 node and the 6380 node respectively), both are master nodes by default.

Establish replication

At this point, run the slaveof command on node 6380 to make it a slave node:

Observe the effect

Verify that data on the master node is copied to the slave node after the master/slave replication is established.

  • First, query a nonexistent key on the slave node:

  • Then add this key on the master node:

  • Query the key again on the slave node; the master node's operation has been synchronized to the slave node:

  • Then delete this key on the master node:

  • The deletion has also been synchronized to the slave node:

After running slaveof <masterip> <masterport> to establish the master-slave replication relationship, you can run slaveof no one to break it. Note that after replication is disconnected, the slave node does not delete its existing data; it simply no longer accepts new data changes from the master node.

After slaveof no one is executed on the slave node, the log output is as follows. After replication is disconnected, the slave node becomes a master node again:

The following logs are displayed on the master node:

Implementation principle of master-slave replication

In the previous section, we showed how to establish a master-slave relationship. This section describes the implementation principle of master-slave replication.

The master-slave replication process can be divided into three stages: connection establishment (preparation), data synchronization, and command transmission.

The main purpose of the connection establishment phase is to set up a connection between the master and slave nodes in preparation for data synchronization.

Step 1: Save the master node's information

The slave node server internally maintains the masterhost and masterport fields, which store the IP address and port of the master node.

Note that slaveof is an asynchronous command: after saving the master node's IP address and port, the slave node immediately returns OK to the client that issued the slaveof command, and the actual replication work starts afterwards.

During this process, you can see the log printed from the node as follows:

Step 2: Establish a socket connection

The replication cron function replicationCron() is invoked once per second on the slave node. If it finds a master node available to connect to, it creates a socket connection to the master node's IP address and port. If the connection succeeds:

The slave node registers a file event handler for this socket, dedicated to the subsequent replication work, such as receiving the RDB file and receiving propagated commands.

After the master node accepts the socket connection from the slave node, it creates the corresponding client state for the socket and from then on treats the slave node as a client connected to the master node. The subsequent steps all take the form of command requests from the slave node to the master node.

During this process, the slave node prints logs as follows:

Step 3: Send the ping command

After becoming a client of the master node, the slave node sends the ping command as its first request, to check whether the socket connection is usable and whether the master node can currently handle requests.

After the slave node sends the ping command, three situations may occur:

  • Pong is returned: the socket connection is normal and the master node can handle requests, so the replication process continues.

  • Timeout: the slave node receives no reply from the master node within the specified time, which means the socket connection is unusable; the slave node disconnects the socket and reconnects.

  • A reply other than pong is returned: the master node cannot currently handle commands (for example, it is busy with a script that has timed out); the slave node disconnects the socket and reconnects.

When the master node returns pong, the slave node prints the following log:

Step 4: Authentication

If the masterauth option is set on the slave node, the slave node needs to authenticate with the master node; if it is not set, no authentication is required. The slave node authenticates by sending the auth command to the master node, with the value of masterauth from its configuration file as the argument.

If the password set on the master node (requirepass) is consistent with the slave node's masterauth (consistent means both exist with the same password, or neither exists), authentication succeeds and replication continues; otherwise, the slave node disconnects the socket and reconnects.

Step 5: Send the slave node's port information

After authentication, the slave node sends its listening port number (6380 in the earlier example) to the master node, which saves it in the slave_listening_port field of that slave's client state. This port has no use other than being displayed when info replication is run on the master node.

After the connection between the primary and secondary nodes is established, data synchronization can start. This phase can be understood as the initialization of data on the secondary node. To perform this operation, the secondary node sends the psync command to the primary node to start the synchronization.

The data synchronization phase is the core phase of the primary/secondary replication. Based on the status of the primary/secondary nodes, the data synchronization phase can be divided into full replication and partial replication. The following sections will explain the two replication modes and the execution process of the psync command.

Note that before the data synchronization phase, the slave node is the client of the master node, but the master node is not the client of the slave node. At this stage and beyond, the master and slave nodes are clients to each other. The reason is that: Before this, the master node only needs to respond to the request of the slave node, and does not need to take the initiative to send the request, but in the data synchronization stage and the later command propagation stage, the master node needs to take the initiative to send the request (such as the write command in the push buffer) to the slave node to complete the replication.

After the data synchronization phase is complete, the primary and secondary nodes enter the command transmission phase. In this phase, the master node sends the write command to the slave node, and the slave node receives and executes the command to ensure data consistency between the master and slave nodes.

In the command propagation phase, in addition to sending write commands, the master and slave nodes also maintain heartbeat mechanisms: PING and REPLCONF ACK. Because the heartbeat mechanism involves partial replication, it will be introduced separately after the introduction of partial replication.

Delay and inconsistency

Note that command propagation is an asynchronous process, that is, the master node does not wait for a reply from the slave node after sending a write command. Therefore, it is difficult to maintain real-time consistency between master and slave nodes and delay is inevitable. The extent of data inconsistency depends on the network status between the primary and secondary nodes, the execution frequency of write commands on the primary node, and the repl-disable-tcp-nodelay configuration on the primary node.

repl-disable-tcp-nodelay: controls whether TCP_NODELAY is disabled during command propagation between the master and slave nodes. The default value is no, i.e. TCP_NODELAY is not disabled. When set to yes, TCP merges small packets to save bandwidth, but the send frequency drops and the slave node's data delay increases, hurting consistency; the send interval depends on the Linux kernel configuration and defaults to 40ms. When set to no, TCP sends data from the master node to the slave node immediately, using more bandwidth but with lower latency.

In general, the value is set to yes only when the application has a high tolerance for Redis data inconsistencies and the network between master and slave nodes is poor. The default value no is used in most cases.
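At the operating-system level, this setting corresponds to the TCP_NODELAY socket option, which disables Nagle's packet-coalescing algorithm. A minimal Python sketch of the toggle, independent of Redis and shown purely for illustration:

```python
import socket

# repl-disable-tcp-nodelay = no  -> behave as if TCP_NODELAY is on:
# segments are sent immediately (lower latency, more bandwidth).
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay_on = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

# repl-disable-tcp-nodelay = yes -> TCP_NODELAY off: the kernel may merge
# small packets (less bandwidth, higher slave-side delay).
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0)
nodelay_off = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()
```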

Full replication and partial replication

Let’s look at full and partial replication in the data synchronization phase.

Before Redis 2.8, the slave node sent the sync command to the master node to synchronize data, and only full replication was possible. Since Redis 2.8, the slave node can send the psync command instead, and the synchronization may be either full or partial depending on the state of the master and slave nodes. This section uses Redis 2.8 and later as an example.

  • Full replication: used for first-time replication or other cases where partial replication is impossible; it sends all of the master node's data to the slave node and is a very heavyweight operation.

  • Partial replication: used to resynchronize after a network interruption; the master node sends the slave node only the write commands executed during the interruption, which is far more efficient than full replication. Note that if the interruption lasts too long and the master node cannot fully retain the write commands executed during it, partial replication is impossible and full replication is used instead.

The full replication process of Redis using the psync command is as follows:

  • The slave node requests full replication when it determines that partial replication is impossible, or it requests partial replication but the master node decides that partial replication cannot be performed (the exact decision process is covered after partial replication is introduced).

  • On receiving the full replication request, the master node runs bgsave to generate an RDB file in the background, and uses a buffer (called the replication buffer) to record all write commands executed from that moment on.

  • When bgsave completes, the master node sends the RDB file to the slave node. The slave node first discards its own old data, then loads the RDB file, bringing its database to the state the master node was in when bgsave was executed.

  • The master node sends all the write commands in the replication buffer described above to the slave node, which executes these write commands to update the database state to the latest state of the master node.

  • If AOF is enabled on the slave node, bgrewriteaof is triggered to ensure its AOF file is brought up to date.
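The steps above can be sketched as a toy model (illustrative Python, not Redis code; the class and method names are invented for the example):

```python
# Toy model of full replication: the master snapshots its dataset (standing in
# for the RDB file) while buffering writes that arrive during the snapshot;
# the slave discards its old data, loads the snapshot, then replays the buffer.
class ToyMaster:
    def __init__(self):
        self.data = {}
        self.repl_buffer = None          # write commands recorded during "bgsave"

    def write(self, key, value):
        self.data[key] = value
        if self.repl_buffer is not None:  # snapshot in progress: record command
            self.repl_buffer.append(("set", key, value))

    def full_sync(self):
        self.repl_buffer = []             # start buffering ("bgsave" begins)
        return dict(self.data)            # snapshot stands in for the RDB file

class ToySlave:
    def __init__(self):
        self.data = {"stale": "old"}      # pre-existing data to be discarded

    def load(self, snapshot, buffered_cmds):
        self.data.clear()                 # slave clears its old data first
        self.data.update(snapshot)        # load the "RDB" snapshot
        for _, key, value in buffered_cmds:
            self.data[key] = value        # replay writes buffered during bgsave

master, slave = ToyMaster(), ToySlave()
master.write("k1", "v1")
snapshot = master.full_sync()
master.write("k2", "v2")                  # arrives while the "RDB" is in flight
slave.load(snapshot, master.repl_buffer)
print(slave.data)                         # {'k1': 'v1', 'k2': 'v2'}
```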

The following shows the logs generated by the primary and secondary nodes during full replication. The logs correspond to the preceding steps:

  • Logs generated on the master node

  • Logs generated on the slave node

Among them, a few points need to be noted:

  • The slave node received 89260 bytes of data from the master node;

  • The slave node clears its old data before loading the RDB file from the master node;

  • The slave node calls bgrewriteaof after synchronizing the data.

As can be seen from this process, full replication is a very heavyweight operation:

  • The master node runs bgsave, forking a child process to generate the RDB file, which is costly in CPU, memory (page-table copying), and disk I/O. The performance issues of bgsave were explained in my article Redis High Availability: Persistence Techniques and Solution Choices.

  • The master node sends RDB files to the slave nodes over the network, which consumes a lot of bandwidth of both the master and slave nodes.

  • The slave node's clearing of old data and loading of the new RDB file are blocking operations during which it cannot respond to client commands; running bgrewriteaof on the slave node adds further cost.

Full replication is very inefficient when the master node holds a large amount of data, so Redis 2.8 introduced partial replication to handle data synchronization after network interruptions.

The implementation of partial replication relies on three important concepts:

Replication offset

Both the master node and the slave node maintain a replication offset, representing the number of bytes the master has propagated to the slave. Each time the master node propagates N bytes to the slave node, it increases its offset by N; each time the slave node receives N bytes from the master node, it increases its offset by N.

The offsets are used to check whether the databases of the master and slave nodes are in the same state: equal offsets mean a consistent state, unequal offsets mean inconsistency. In the latter case, the two offsets identify exactly which data the slave node is missing. For example, if the master node's offset is 1000 and the slave node's is 500, partial replication must send the data at offsets 501-1000 to the slave node; that data is stored in the replication backlog buffer described below.
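A toy illustration of the two offsets (illustrative Python, not Redis code):

```python
# Toy model of the replication offsets kept on both sides of the link.
class OffsetPair:
    def __init__(self):
        self.master_offset = 0   # bytes the master has propagated
        self.slave_offset = 0    # bytes the slave has received

    def send(self, payload: bytes, lost: bool = False):
        self.master_offset += len(payload)      # master adds N on every send
        if not lost:
            self.slave_offset += len(payload)   # slave adds N only on receipt

pair = OffsetPair()
pair.send(b"x" * 500)               # delivered normally
pair.send(b"y" * 500, lost=True)    # dropped by the network
missing = (pair.slave_offset + 1, pair.master_offset)
print(missing)                      # (501, 1000): bytes that must be re-sent
```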

Replication backlog buffer

The replication backlog buffer is a fixed-length, first-in-first-out (FIFO) queue maintained by the master node, 1MB by default. It is created when the master node first gains a slave node, and it backs up the data the master node has recently sent to its slaves. Note that the master node keeps only one replication backlog buffer, regardless of whether it has one slave node or several.

In the command propagation phase, the master node not only sends each write command to the slave nodes but also writes a copy into the replication backlog buffer as a backup. Alongside the write commands, the buffer stores the replication offset of each byte it contains. Because the buffer is fixed-length and FIFO, it holds the write commands the master node executed most recently; older commands are pushed out of the buffer.

Because the buffer length is fixed and limited, the write commands it can back up are limited: if the offset difference between the master and slave nodes exceeds the buffer length, partial replication is impossible and full replication must be used. Conversely, to raise the probability of partial replication after a network outage, the buffer can be enlarged as needed (via the repl-backlog-size configuration). For example, if the average network outage lasts 60 seconds and the master node generates on average 100KB of write commands (in protocol format) per second, the replication backlog needs about 6MB on average; to be safe, it can be set to 12MB so that partial replication covers most disconnections.
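The sizing rule of thumb in this paragraph is simple arithmetic:

```python
# Estimating repl-backlog-size, using the illustrative numbers from the text.
write_rate_kb_per_s = 100    # avg KB/sec of write commands produced by the master
outage_seconds = 60          # avg duration of a network interruption

needed_mb = write_rate_kb_per_s * outage_seconds / 1024   # ~5.9, i.e. about 6MB
repl_backlog_size_mb = 2 * round(needed_mb)               # double for safety -> 12MB
print(round(needed_mb), repl_backlog_size_mb)             # 6 12
```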

After the slave node sends the offset to the master node, the master node decides whether to perform a partial copy based on the offset and buffer size:

  • If all data after offset is still in the replication backlog, partial replication is performed.

  • If the data after offset is no longer in the replication backlog (the data has been squeezed out), a full copy is performed.

Server run ID (runid)

Every Redis node (master or slave) automatically generates a random run ID at startup (different on every start), consisting of 40 random hexadecimal characters. The runid uniquely identifies a Redis node; you can view a node's runid with the info server command:

When the master and slave nodes replicate for the first time, the master node sends its runid to the slave node, which saves it. On reconnection after a disconnect, the slave node sends this saved runid back, and the master node uses it to decide whether partial replication is possible:

  • If the runid saved by the slave node matches the master node's runid, the two have synchronized before, and the master node will try to continue with partial replication (whether it actually can depends on the offset and the replication backlog buffer).

  • If the runid saved by the slave node differs from the master node's current runid, the node the slave synchronized with before the disconnection is not the current master node, and only full replication is possible.

With the replication offset, replication backlog buffer, and node run ID covered, this section describes the parameters and return values of the psync command, to show how the master and slave nodes decide between full and partial replication during psync execution.

The following figure shows the execution process of the psync command:

Image credit: Redis Design & Implementation

(1) First, the slave node decides how to call the psync command according to the current state:

  • If the slave node has never run slaveof before, or has just run slaveof no one, it sends psync ? -1 to request full replication from the master node.

  • If the slave node has replicated from a master before, it sends psync <runid> <offset>, where runid is the run ID of the master node it last replicated from, and offset is the replication offset the slave node had saved when replication was interrupted.

(2) The master node decides whether to perform full or partial replication based on the psync command received and the current server status:

  • If the master node's version is earlier than Redis 2.8, it replies -ERR; the slave node then falls back to the sync command and performs full replication.

  • If the master node's version is new enough, its runid matches the one sent by the slave node, and all data after the slave's offset is still in the replication backlog buffer, it replies +CONTINUE, indicating partial replication; the slave node then waits for the master node to send the missing data.

  • If the master node's version is new enough but the runid differs from the one sent by the slave node, or the data after the slave's offset is no longer in the replication backlog (it has been pushed out of the queue), it replies +FULLRESYNC <runid> <offset>, indicating full replication; runid is the master node's current run ID and offset its current offset, and the slave node saves both values for later use.
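The master-side decision described above can be sketched as a toy function (illustrative Python, not Redis source):

```python
# Toy version of the master's reply to PSYNC <runid> <offset>.
def psync_reply(master_runid, master_offset, backlog_len, slave_runid, slave_offset):
    backlog_start = master_offset - backlog_len   # oldest offset still buffered
    if slave_runid != master_runid or slave_offset < backlog_start:
        # unknown slave, or the missing bytes were already evicted: full resync
        return f"+FULLRESYNC {master_runid} {master_offset}"
    return "+CONTINUE"                            # partial resync is possible

print(psync_reply("abc", 1000, 600, "abc", 500))  # +CONTINUE
print(psync_reply("abc", 1000, 600, "def", 500))  # +FULLRESYNC abc 1000 (runid mismatch)
print(psync_reply("abc", 1000, 300, "abc", 500))  # +FULLRESYNC abc 1000 (backlog too short)
```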

In the following demonstration, the network is restored after a few minutes of interruption, and the disconnected primary and secondary nodes are partially replicated. To facilitate the simulation of network outages, the master and slave nodes in this example are on two machines in the LAN.

Network interruption

After a period of network interruption, the master and slave nodes each determine that the connection has timed out (how they judge timeouts is explained later). The slave node then starts trying to reconnect to the master node; since the network has not recovered, the reconnection fails and the slave node keeps retrying.

The log of the master node is as follows:

The log of the slave node is as follows:

Network recovery

After the network is restored, the slave node successfully connects to the master node and requests partial replication; the master node accepts the request, and the two nodes perform partial replication to synchronize their data.

The log of the master node is as follows:

The log of the slave node is as follows:

Heartbeat mechanism in the command propagation phase

In the command propagation phase, in addition to sending write commands, the master and slave nodes also maintain heartbeat mechanisms: PING and REPLCONF ACK. The heartbeat mechanism is useful for determining timeout and data security of primary and secondary replication.

At a fixed interval, the master node sends the PING command to its slave nodes; its purpose is to let the slave nodes judge whether the connection has timed out.

The PING frequency is controlled by the repl-ping-slave-period parameter, in seconds; the default value is 10.

There is some debate over whether the PING command is sent from the master node to the slave node or the other way around, because the comment on this parameter in the official Redis configuration file says the PING is sent from the slave node to the master node, as shown in the following figure:

However, according to the parameter name (including ping-slave) and the code implementation, I believe that the ping command is sent from the master node to the slave node. The relevant codes are as follows:

In the command propagation phase, the slave node sends the REPLCONF ACK command to the master node once per second, in the format REPLCONF ACK <offset>, where offset is the replication offset currently saved by the slave node.

The REPLCONF ACK command provides the following functions:

Real-time monitoring of the network state between master and slave nodes

This command lets the master node judge replication timeouts. In addition, running info replication on the master node shows a lag value for each slave node, representing the time since the master node last received a REPLCONF ACK from it; normally this value is 0 or 1, as shown below:

Detecting command loss

The slave node sends its own offset, and the master node compares it with its own. If the slave node has lost data (for example, to network packet loss), the master node pushes the missing data to it (again using the replication backlog buffer). Note that the offsets and the replication backlog buffer serve not only partial replication but also this command-loss case; the difference is that partial replication happens after a disconnect and reconnect, whereas command-loss repair happens while the master and slave nodes remain connected.
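This repair step can be sketched as a toy function (illustrative Python, not Redis code):

```python
# Toy model: on REPLCONF ACK <offset>, the master compares the slave's acked
# offset with its own and re-sends any missing bytes from the backlog buffer.
def resend_after_ack(backlog: bytes, backlog_start: int,
                     master_offset: int, acked_offset: int) -> bytes:
    if acked_offset >= master_offset:
        return b""                       # slave is fully up to date
    # slice the backlog from the slave's position up to the master's position
    return backlog[acked_offset - backlog_start:master_offset - backlog_start]

backlog = b"SET k1 v1\nSET k2 v2\n"      # most recent bytes the master sent
to_resend = resend_after_ack(backlog, backlog_start=0,
                             master_offset=len(backlog), acked_offset=10)
print(to_resend)                         # b'SET k2 v2\n'
```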

Helping guarantee the number and lag of slave nodes

Through the min-slaves-to-write and min-slaves-max-lag parameters, the master node can refuse to execute write commands under unsafe conditions.

Unsafe means that there are too few slave nodes or their lag is too high. For example, with min-slaves-to-write set to 3 and min-slaves-max-lag set to 10, the master node refuses to execute write commands if fewer than 3 slave nodes are connected or all slave nodes have a lag greater than 10 seconds. The lag here is the same value shown by info replication, determined by when the master node last received a REPLCONF ACK command.
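A toy version of this safety check (illustrative Python, not Redis source; real Redis counts slaves whose lag is within the limit):

```python
# Toy check of min-slaves-to-write / min-slaves-max-lag: the master accepts
# writes only when enough slaves have a small enough lag (seconds since the
# last REPLCONF ACK).
def write_allowed(slave_lags, min_slaves_to_write=3, min_slaves_max_lag=10):
    healthy = sum(1 for lag in slave_lags if lag <= min_slaves_max_lag)
    return healthy >= min_slaves_to_write

print(write_allowed([0, 1, 0]))    # True: three slaves, all lag <= 10s
print(write_allowed([0, 1]))       # False: too few slaves
print(write_allowed([0, 1, 25]))   # False: only two slaves within the lag limit
```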

Problems in practical applications

Read/write separation on the basis of master/slave replication can realize read load balancing of Redis: the master node provides write service, and one or more slave nodes provide read service (multiple slave nodes can not only improve data redundancy, but also maximize read load capacity). In application scenarios with heavy read loads, the concurrency of the Redis server can be greatly increased.

Here are some things to be aware of when using Redis for read/write separation:

Delays and inconsistencies

As mentioned earlier, since command propagation for master-slave replication is asynchronous, latency and data inconsistencies are inevitable. If the application has low acceptance of data inconsistencies, possible optimization measures include:

  • Optimize the network environment between the primary and secondary nodes (for example, in the same equipment room).

  • Monitor the delay between the master and slave nodes (via offsets); if a slave node's delay grows too large, notify the application not to read from that slave node.

  • Use clustering to extend both the write load and the read load.

Data inconsistency on slave nodes can be more severe outside the command propagation phase, for example while the connection is in the data synchronization phase, or when the slave node has lost its connection to the master node.

The slave node's slave-serve-stale-data parameter controls its behavior in this case: if yes (the default), the slave node still responds to client commands; if no, it responds only to a few commands such as info and slaveof. Set this parameter according to the application's consistency requirements: if data consistency requirements are high, set it to no.

Data expiration problem

In Redis standalone, there are two deletion strategies:

  • Lazy deletion: the server does not delete expired data proactively; when a client queries a key, the server checks whether it has expired and deletes it if so.

  • Periodic deletion: the server runs a scheduled task to delete expired data, but because of the trade-off between memory and CPU (deletion frees memory, but frequent deletion is CPU-unfriendly), the frequency and duration of each deletion pass are limited.

In the master-slave replication scenario, to keep data consistent between the master and slave nodes, the slave node does not delete data proactively; instead, the master node controls the deletion of expired data on the slave node. Since neither lazy nor periodic deletion guarantees that the master node removes expired data promptly, clients reading from the slave node can easily read expired data.

In Redis 3.2, the slave node checks whether data has expired when reading it; if so, it does not return the data to the client. Upgrading to Redis 3.2 resolves the data expiration problem.
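The Redis 3.2 slave-side behavior can be sketched as a toy store (illustrative Python, not Redis code): expired keys are hidden from reads, but actual deletion still waits for the master.

```python
# Toy model: the slave never deletes expired keys itself (the master drives
# deletion via replicated DELs), but a read checks the logical expire time
# and hides expired data from the client.
class ToySlaveStore:
    def __init__(self):
        self.data = {}
        self.expire_at = {}

    def apply_set(self, key, value, expire_at=None):
        self.data[key] = value            # command replicated from the master
        if expire_at is not None:
            self.expire_at[key] = expire_at

    def get(self, key, now):
        if key in self.expire_at and now >= self.expire_at[key]:
            return None                   # logically expired: hidden, not deleted
        return self.data.get(key)

store = ToySlaveStore()
store.apply_set("session", "abc", expire_at=100)
print(store.get("session", now=50))    # 'abc': still valid
print(store.get("session", now=150))   # None: expired, though still stored
print("session" in store.data)         # True: waiting for the master's DEL
```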

Failover problem

In a read-write separation scenario without Sentinel, the application connects to different Redis nodes for reads and writes. When the master node or a slave node changes because of a failure, the application's read and write connections must be updated in time. Connection switching can be done manually or by a custom monitoring program, but the former reacts slowly and is error-prone, while the latter is complex and costly to build.

Conclusion

Before using read/write separation, consider other ways to increase Redis's read capacity: optimize the master node as much as possible (reduce slow queries, reduce blocking caused by persistence and other operations) to improve its load capacity, or use Redis Cluster to scale both read and write loads. If read/write separation is used, Sentinel can make master/slave failover largely automatic and reduce intrusion into the application.

Timeout is one of the most important causes of replication interruption. This section describes timeout separately, and the next section describes other problems that can cause replication interruption.

Significance of timeout judgment

During and after the replication connection is established, both the master and slave nodes have mechanisms to determine whether the connection times out. The significance is as follows:

  • If the master node judges that the connection has timed out, it releases that slave node's connection and the associated resources; otherwise an invalid slave node would keep occupying the master node's resources (output buffer, bandwidth, connections, etc.). Timeout judgment also lets the master node know the number of currently effective slave nodes more accurately, which helps guarantee data safety (in conjunction with the min-slaves-to-write parameters mentioned earlier).

  • If the slave determines that the connection has timed out, it can re-establish the connection promptly and avoid long-term data inconsistency with the master.
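The data-safety check mentioned above can be sketched in a few lines of Python (a simplified illustration of the min-slaves-to-write / min-slaves-max-lag logic, with hypothetical helper names, not Redis source):

```python
def write_allowed(slave_lags, min_slaves_to_write=3, min_slaves_max_lag=10):
    """Return True if the master should accept writes: at least
    min_slaves_to_write slaves must have a replication lag (seconds since
    their last REPLCONF ACK) no greater than min_slaves_max_lag."""
    healthy = sum(1 for lag in slave_lags if lag <= min_slaves_max_lag)
    return healthy >= min_slaves_to_write

# Three healthy slaves -> writes allowed; one slave lagging 70s -> rejected
print(write_allowed([1, 2, 3]))    # True
print(write_allowed([1, 2, 70]))   # False
```

This is why accurate timeout accounting on the master matters: a dead slave that is never detected would still count as "healthy" in such a check.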

Judgment mechanism

The core of master/slave replication timeout detection is the repl-timeout parameter, which specifies the timeout threshold (60 seconds by default) for both the master and the slave. The conditions that trigger a timeout on each side are as follows:

(1) Master node: the replication cron function replicationCron() is called once per second and checks, for each slave, whether the time since its last REPLCONF ACK exceeds repl-timeout; if so, the slave's connection is released.

(2) Slave node: the slave also checks for timeout in the replication cron function; the basic logic is:

  • If the connection is being established and the time since the last data from the master exceeds repl-timeout, the slave aborts the connection attempt.

  • If it is in the data synchronization phase and receiving the RDB file from the master has exceeded repl-timeout since the last I/O, the slave stops the synchronization and releases the connection.

  • If it is in the command propagation phase and the time since the last PING or data was received from the master exceeds repl-timeout, the slave releases the connection to the master.

The related source code for master and slave nodes to determine connection timeout is as follows:


/* Replication cron function, called 1 time per second. */
void replicationCron(void) {
    static long long replication_cron_loops = 0;

    /* Non blocking connection timeout? */
    if (server.masterhost &&
        (server.repl_state == REDIS_REPL_CONNECTING ||
         slaveIsInHandshakeState()) &&
        (time(NULL)-server.repl_transfer_lastio) > server.repl_timeout)
    {
        redisLog(REDIS_WARNING,"Timeout connecting to the MASTER...");
        undoConnectWithMaster();
    }

    /* Bulk transfer I/O timeout? */
    if (server.masterhost && server.repl_state == REDIS_REPL_TRANSFER &&
        (time(NULL)-server.repl_transfer_lastio) > server.repl_timeout)
    {
        redisLog(REDIS_WARNING,"Timeout receiving bulk data from MASTER... If the problem persists try to set the 'repl-timeout' parameter in redis.conf to a larger value.");
        replicationAbortSyncTransfer();
    }

    /* Timed out master when we are an already connected slave? */
    if (server.masterhost && server.repl_state == REDIS_REPL_CONNECTED &&
        (time(NULL)-server.master->lastinteraction) > server.repl_timeout)
    {
        redisLog(REDIS_WARNING,"MASTER timeout: no data nor PING received...");
        freeClient(server.master);
    }

    // omit irrelevant code here......

    /* Disconnect timedout slaves. */
    if (listLength(server.slaves)) {
        listIter li;
        listNode *ln;

        listRewind(server.slaves,&li);
        while((ln = listNext(&li))) {
            redisClient *slave = ln->value;

            if (slave->replstate != REDIS_REPL_ONLINE) continue;
            if (slave->flags & REDIS_PRE_PSYNC) continue;
            if ((server.unixtime - slave->repl_ack_time) > server.repl_timeout)
            {
                redisLog(REDIS_WARNING, "Disconnecting timedout slave: %s",
                    replicationGetSlaveName(slave));
                freeClient(slave);
            }
        }
    }

    // omit irrelevant code here......
}

Pitfalls to watch out for

Here are some practical issues related to connection timeouts during the replication phase:

Data synchronization phase: during full replication, the master runs bgsave, forking a child process to save the current data to an RDB file, and then transfers the RDB file to the slave over the network. If the RDB file is large, forking the child and saving the file can take so long that the slave goes a long time without receiving data and judges a timeout. The slave then reconnects to the master and full replication starts over, the slave times out again, reconnects again... a vicious cycle. To avoid this, besides keeping the data volume of a single Redis instance from growing too large, increase repl-timeout appropriately; the specific value can be tuned according to the bgsave duration.

Command propagation phase: in this phase, the master sends PING commands to the slave at a frequency controlled by repl-ping-slave-period. This parameter must be significantly smaller than repl-timeout (repl-timeout should be at least several times larger). Otherwise, if the two values are equal or close, a few PING commands lost to network jitter while the master happens to have no data to send can easily make the slave judge a timeout.
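A configuration respecting this rule of thumb might look like the following redis.conf fragment (values are illustrative, not a recommendation for every deployment):

```conf
# Master pings slaves every 10 seconds...
repl-ping-slave-period 10
# ...so the timeout threshold is kept several times larger
repl-timeout 60
```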

Slow query blocking: if the master or slave executes slow queries (such as keys * or hgetall on a large hash), the server blocks; while blocked it cannot respond to requests from the other end of the replication connection, which may cause a replication timeout.

In addition to master/slave timeouts, there are other conditions that can interrupt replication; the most important is replication buffer overflow.

Replication buffer overflow

As mentioned earlier, during full replication the master stores subsequent write commands in the replication buffer. The buffered data covers the write commands the master executes during the following periods: while bgsave generates the RDB file, while the master sends the RDB file to the slave, and while the slave clears its old data and loads the RDB file.

If the master's data volume is large, or the network latency between master and slave is high, the buffer size may exceed the limit; in that case the master disconnects the slave, and the process can fall into a loop: full replication → replication buffer overflow causes disconnection → reconnection → full replication → replication buffer overflow causes disconnection → ...

The size of the replication buffer is set by client-output-buffer-limit slave {hard limit} {soft limit} {soft seconds}; the default is client-output-buffer-limit slave 256mb 64mb 60, meaning that if the buffer exceeds 256MB, or stays above 64MB for 60 consecutive seconds, the master disconnects the slave. This parameter can be changed dynamically with config set (i.e., it takes effect without restarting Redis).
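The hard/soft limit semantics can be sketched as follows (a simplified Python illustration with hypothetical variable names; Redis implements this check in C):

```python
def should_disconnect(buf_bytes, over_soft_since, now,
                      hard=256 * 2**20, soft=64 * 2**20, soft_seconds=60):
    """Mimic client-output-buffer-limit slave <hard> <soft> <soft_seconds>:
    disconnect when the buffer exceeds the hard limit, or has stayed above
    the soft limit for soft_seconds. over_soft_since is the timestamp when
    the buffer first rose above the soft limit (None if it is below it)."""
    if buf_bytes > hard:
        return True
    if over_soft_since is not None and now - over_soft_since >= soft_seconds:
        return True
    return False

print(should_disconnect(300 * 2**20, None, 0))   # True: over 256MB hard limit
print(should_disconnect(100 * 2**20, 0, 30))     # False: over 64MB for only 30s
print(should_disconnect(100 * 2**20, 0, 61))     # True: over 64MB for 61s
```

When full replication of a large dataset keeps overflowing the default, the limit can be raised at runtime, e.g. config set client-output-buffer-limit "slave 512mb 128mb 60".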

When the replication buffer overflows, the master logs a warning and disconnects the slave.

Note that the replication buffer is a kind of client output buffer. The master allocates a separate replication buffer for each slave, whereas there is only one replication backlog buffer per master, regardless of how many slaves it has.

Now that we have covered the details of Redis replication, we can summarize, for the following common scenarios, when partial replication is possible and what to watch out for.

Setting up replication for the first time

In this case full replication is unavoidable, but note the following: if the master holds a large amount of data, avoid traffic peaks so as not to cause congestion; if multiple slaves need to replicate from the master, stagger them to avoid saturating the master's bandwidth. In addition, if there are too many slaves, the replication topology can be changed from one-master-many-slaves to a tree (an intermediate node is both a slave of its own master and the master of its own slaves). Use the tree structure with caution: cutting the number of direct slaves reduces the master's load, but the replication delay of deeper slaves grows, data consistency worsens, and the structure is complex and hard to maintain.

Restarting the master node

A master restart can be discussed in two cases: a crash caused by a failure, and a planned restart.

The master node goes down

After the master restarts, its runid changes, so the slaves can only perform full replication, not partial replication. In fact, when the master goes down, failover should be performed: promote one slave to master and have the other slaves replicate from the new master. Failover should be as automatic as possible; Sentinel, covered in a later article, can do it automatically.

Safe restart: debug reload

In some scenarios, you may want to restart the master deliberately, for example because its memory fragmentation ratio is too high, or to adjust parameters that can only be set at startup. If the master is restarted in the normal way, its runid changes, which may trigger unnecessary full replication.

To solve this problem, Redis provides the debug reload restart: afterwards, the master's runid and offset are unaffected, avoiding full replication.

After a debug reload restart, the runid and offset are unaffected.

However, debug reload is a double-edged sword: it empties the current memory and reloads the data from the RDB file, a process during which the master blocks, so it must also be used with caution.

Restarting a slave node

After a slave goes down and restarts, the master runid it had saved is lost, so even if slaveof is executed again, partial replication cannot be performed.

Network interruption

If a network problem causes a brief interruption between the master and slave, several cases can be distinguished:

  • The network problem lasted a very short time, causing only brief packet loss, and neither side judged a timeout (repl-timeout was not triggered). In this case, the lost data only needs to be supplemented through the offset reported by REPLCONF ACK.

  • The network problem lasted a long time: both sides judged a timeout (repl-timeout was triggered), and more data was lost than the replication backlog buffer holds. In this case only full replication is possible, not partial replication. To avoid this, adjust the size of the replication backlog buffer according to the actual situation; detecting and repairing network interruptions promptly also reduces full replications.

  • In between: both sides judged a timeout, but the lost data is still in the replication backlog buffer. In this case partial replication can be performed.
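The three cases reduce to the decision the master makes when a slave reconnects and sends psync <runid> <offset>. A minimal sketch (simplified, with hypothetical helper names; the real logic lives in Redis's masterTryPartialResynchronization):

```python
def resync_type(slave_runid, slave_offset, master_runid,
                backlog_off, backlog_histlen):
    """Decide between full and partial resynchronization.
    backlog_off is the offset of the first byte still held in the
    replication backlog; backlog_histlen is the backlog's current length."""
    if slave_runid != master_runid:
        return "full"      # slave previously replicated a different master
    if slave_offset < backlog_off or \
       slave_offset > backlog_off + backlog_histlen:
        return "full"      # lost data is no longer in the backlog
    return "partial"       # missing commands can be replayed from the backlog

print(resync_type("abc", 1000, "abc", 800, 500))  # partial
print(resync_type("abc", 700, "abc", 800, 500))   # full
print(resync_type("xyz", 1000, "abc", 800, 500))  # full
```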

This section summarizes the replication-related configuration options: what they do, at which stage they take effect, and how to set them. Understanding them deepens the understanding of Redis replication, and mastering them helps you use Redis well and avoid pitfalls.

The options roughly fall into three groups: those relevant to both master and slave, those for the master, and those for the slave, as described below.

Configurations related to both master and slave nodes

First, the most special option, the one that determines whether a node is a master or a slave:

  • slaveof <masterip> <masterport>: takes effect at startup; a Redis server started with this option becomes a slave node. The option is commented out by default, i.e., Redis servers are master nodes by default.

  • repl-timeout 60: determines the connection timeout between master and slave in each phase; see the earlier section for details.

Configurations related to the master node

  • repl-diskless-sync no: controls whether the master uses diskless replication during full replication. In diskless replication, the master writes the RDB data directly into the slave's socket instead of into an RDB file on disk; the disk is not involved at all. Diskless replication is advantageous when disk I/O is slow and the network is fast. Note that as of Redis 3.0, diskless replication is experimental and disabled by default.

  • repl-diskless-sync-delay 5: applies to the full replication phase when diskless replication is used, and determines how long (in seconds, default 5) the master pauses before it starts sending data to the slave; it is only meaningful when diskless replication is enabled. The pause exists for two reasons: (a) once the transfer into a slave's socket begins, newly connected slaves must wait for the current transfer to end before a new one can start; (b) multiple slaves are likely to establish replication within a short window, so a short wait lets a single transfer serve them all.

  • client-output-buffer-limit slave 256mb 64mb 60: relates to the master's replication buffer size during the full replication phase; see the earlier section for details.

  • repl-disable-tcp-nodelay no: relates to the delay of command propagation; see the earlier section for details.

  • masterauth <master-password>: relates to authentication during the connection establishment phase, as described earlier.

  • repl-ping-slave-period 10: relates to timeout detection during command propagation; see the earlier section for details.

  • repl-backlog-size 1mb: specifies the size of the replication backlog buffer, as described earlier.

  • repl-backlog-ttl 3600: how long the replication backlog buffer is kept after the master no longer has any slaves, so that a reconnecting slave can still perform partial replication (default 3600 s). If set to 0, the backlog is never released.

  • min-slaves-to-write 3 and min-slaves-max-lag 10: specify the minimum number of healthy slaves the master requires and the corresponding maximum lag, as described earlier.
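Put together, a master tuned for replication of a larger dataset might carry a fragment like this in redis.conf (values are illustrative and must be adapted to the actual data volume and network; redis.conf comments go on their own lines):

```conf
# Disk-based full sync (diskless replication is experimental as of 3.0)
repl-diskless-sync no
repl-diskless-sync-delay 5
# Keep the ping period well below the timeout
repl-ping-slave-period 10
# Raised from the default 60s to tolerate a slow bgsave
repl-timeout 120
# Raised replication buffer for full sync of a large dataset
client-output-buffer-limit slave 512mb 128mb 60
# More room for partial resynchronization after short outages
repl-backlog-size 64mb
repl-backlog-ttl 3600
# Refuse writes when no sufficiently fresh slave is connected
min-slaves-to-write 1
min-slaves-max-lag 10
```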

Configurations related to the slave node

  • slave-serve-stale-data yes: whether the slave responds to client commands when its data is stale; see the earlier section for details.

  • slave-read-only yes: whether the slave is read-only (read-only by default). Enabling writes on a slave can make master and slave data inconsistent, so do not change this setting.
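A matching slave-side fragment might look like this (the master address and password line are placeholders):

```conf
# Replicate from the master (placeholder address)
slaveof 192.168.0.10 6379
# Uncomment if the master has requirepass set
# masterauth <master-password>
slave-read-only yes
slave-serve-stale-data yes
```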


As discussed in Redis High Availability: Persistence Technology and Solution Selection, a single Redis instance's memory should not grow too large; an excessive data volume also causes the following replication problems:

Master switchover: when the master goes down, a common disaster-recovery strategy is to promote one slave to master and attach the other slaves to the new master. At that point those slaves can only perform full replication. If a single Redis instance holds 10GB, the synchronization time per slave is on the order of minutes, and recovery gets slower with more slaves. If the read load is high and the slaves cannot serve during this period, the system comes under great pressure.
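The minutes-level estimate can be checked with a rough back-of-the-envelope calculation (a simplification that ignores fork and RDB load time and assumes the network is the bottleneck):

```python
def full_sync_seconds(data_gb, net_mb_per_s):
    """Rough time to transfer an RDB of data_gb gigabytes at net_mb_per_s MB/s."""
    return data_gb * 1024 / net_mb_per_s

# 10GB over a gigabit link (~100 MB/s effective) is already ~1.7 minutes,
# per slave, before the slave even starts loading the RDB into memory.
print(round(full_sync_seconds(10, 100)))  # 102
```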

Slave expansion: if traffic surges and more slaves are needed to share the read load, an overly large data volume means the new slaves cannot synchronize in a timely manner.

Buffer overflow: switchover and slave expansion are scenarios where the slaves can synchronize normally (if slowly), but if the data volume is too large, the master's replication buffer overflows during the full replication phase, interrupting replication; data synchronization then falls into the loop: full replication → replication buffer overflow causes disconnection → reconnection → full replication → replication buffer overflow causes disconnection → ...

Timeout: if the data volume is too large, the master's fork + RDB save during full replication takes so long that the slave receives no data for an extended period and triggers a timeout; synchronization may likewise fall into the loop: full replication → timeout causes disconnection → reconnection → full replication → timeout causes disconnection → ...

Moreover, the absolute amount of memory on the master should not be too large, nor its share of the host's memory: it is best to use only 50-65% of the host's memory, leaving 30-45% for bgsave, the replication buffers, and so on.

On the master node, info replication shows the replication state and every connected slave.

On a slave node, the top half of the output shows its status as a slave and, starting with connected_slaves, its status as a potential master.
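For reference, the master's output looks roughly like the following (the address, offsets, and counts are placeholders from a hypothetical deployment):

```
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.0.11,port=6379,state=online,offset=1120,lag=0
master_repl_offset:1120
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1120
```

On a slave, role:slave and fields such as master_link_status and slave_repl_offset appear above the connected_slaves line.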

Much of what info replication shows has already been covered in this article and is not repeated here.

Summary

To review the main points of this article:

Functions of master/slave replication: Understand the problems to be solved by master/slave replication, such as data redundancy, fault recovery, and read load balancing.

Master/slave replication operation: the slaveof command.

Principles of master/slave replication: replication goes through the connection establishment phase, the data synchronization phase, and the command propagation phase. In the data synchronization phase there are two modes: full replication and partial replication. During command propagation, the master and slave use the PING and REPLCONF ACK commands as mutual heartbeats.

Problems in applications: including read/write separation (data inconsistency, data expiration, failover, etc.), replication timeout, and replication interruption, followed by a summary of replication-related configuration; among these options, repl-timeout and client-output-buffer-limit slave are often helpful for solving master/slave replication problems.

Although master/slave replication solves or alleviates data redundancy, fault recovery, read load balancing, and similar problems, its shortcomings remain obvious: fault recovery cannot be automated; write operations cannot be load balanced; storage capacity is limited by a single machine. Solving these problems requires Sentinel and Cluster, which I will cover in later articles.


Author: The programming myth

Source: www.cnblogs.com/kismetv/p/9236731.html