Redis's high-availability solutions include persistence, master/slave replication (with read/write separation), Sentinel, and Cluster. Persistence focuses on backing up Redis data from memory to disk. Master/slave replication focuses on hot backup of data; it can also provide read load balancing and fault recovery.
However, truly solving the single-node load and memory limits of Redis depends on the cluster mode.
Redis master/slave replication has two modes: the old replication used before Redis 2.8 and the new replication introduced in Redis 2.8. The new version addresses the low efficiency and redundant copying of the old one. The following describes the flow of both versions and the working principle of replication.
Redis master/slave replication process
The old replication process
Redis replication is divided into two operations: synchronization and command propagation.
- Synchronization: Updates the database state of the slave server to the current database state of the master server, that is, the master server’s memory is completely copied to the slave server
- Command propagation: When the master server receives and executes a command from the client, the state of the master server is inconsistent with that of the slave server. In this case, the master server propagates the command to all the slave servers to ensure consistency between the master and slave servers
Synchronization
Running the SLAVEOF command on a slave server tells it to replicate a master server and synchronize with the master's database state. The slave first connects to the master, sends a PING command to check network connectivity between the two, and then sends the synchronization command so that the master prepares the data. The slave's old database state is cleared before the new data is loaded.
- The slave server sends the SYNC command to the master server.
- After receiving the SYNC command, the master server executes BGSAVE to generate an RDB file in the background and, at the same time, creates a replication buffer that records all write commands executed from that moment on.
- When the master server's BGSAVE finishes, the master sends the RDB file to the slave server. The slave server receives the RDB file and loads it into memory. Once loading finishes, the snapshot part of the synchronization is complete on the slave side.
- The master server sends all write commands recorded in the replication buffer to the slave server, which executes them to bring its database state up to the master's current state, so the data in memory is consistent.
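The steps above can be sketched as a small in-memory simulation. This is only an illustration under simplifying assumptions, not Redis's implementation: the `Master` and `Slave` classes are hypothetical, a deep-copied dict stands in for the RDB file, and writes are reduced to SET-like key/value pairs.

```python
import copy


class Master:
    """Toy master: a dict stands in for the database state."""

    def __init__(self):
        self.db = {}
        self.repl_buffer = None  # list of writes while a sync is in progress

    def execute(self, key, value):
        """Apply a SET-like write and record it if a sync is running."""
        self.db[key] = value
        if self.repl_buffer is not None:
            self.repl_buffer.append((key, value))

    def handle_sync(self):
        """On SYNC: take a snapshot (stand-in for BGSAVE) and start buffering."""
        self.repl_buffer = []
        return copy.deepcopy(self.db)  # stand-in for the RDB file

    def drain_buffer(self):
        """Return the writes buffered since the snapshot and stop buffering."""
        buffered, self.repl_buffer = self.repl_buffer, None
        return buffered


class Slave:
    def __init__(self):
        self.db = {"stale": "data"}

    def full_sync(self, master, writes_during_transfer=()):
        snapshot = master.handle_sync()            # SYNC received, snapshot taken
        for key, value in writes_during_transfer:  # writes arriving mid-transfer
            master.execute(key, value)
        self.db = snapshot                         # discard old state, load snapshot
        for key, value in master.drain_buffer():   # replay the buffered writes
            self.db[key] = value


master = Master()
master.execute("a", 1)
slave = Slave()
slave.full_sync(master, writes_during_transfer=[("b", 2)])
print(slave.db == master.db)
```

The write to `b` happens after the snapshot is taken, so it reaches the slave only through the replication buffer, which is exactly the gap that buffer exists to cover.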
Command propagation
After the initial synchronization above, the master and slave servers enter the command propagation phase. Each time the master executes a write command, it sends that command to the slave so that the two stay consistent. However, because command propagation is asynchronous, this is not strong consistency: the slave can briefly lag behind the master.
Disadvantages of the old replication
- On the first replication, the slave server sends the SYNC command to the master server, the master server executes BGSAVE to generate an RDB snapshot file, and the slave performs a full resynchronization to bring its database state in line with the master's.
- When the slave server disconnects and reconnects, it sends the SYNC command again and repeats the steps of the first replication. This is a complete resynchronization every time, which is inefficient.
The SYNC command is a very resource-intensive operation:
- The master server has to generate an RDB file, which uses a large amount of its CPU, memory, and disk I/O. As a result, the master processes requests more slowly when client request volume is high.
- The master server consumes a large amount of network bandwidth when sending the RDB file to the slave server.
- After receiving the RDB file, the slave server has to load it to update its database state. While loading, the slave server is blocked and cannot serve client requests.
New master/slave replication process
In order to solve the problem that complete resynchronization consumes resources when the original version of the master/slave replication is disconnected and reconnected, Redis uses the PSYNC command instead of the SYNC command to improve the efficiency of the master/slave replication.
The PSYNC command supports full and partial resynchronization modes.
- Full resynchronization: used for the first master/slave replication. It serves the same purpose as the SYNC command and follows basically the same process: the master generates a complete RDB file and sends it to the slave server, then sends the write commands in the replication buffer to complete the synchronization.
- Partial resynchronization: used mainly when a slave server reconnects after a disconnection. If the conditions for partial resynchronization are met, the master server sends the slave only the write commands executed while the slave was offline, rather than performing a complete synchronization. The slave only needs to receive and execute this incremental part to bring its database state up to date.
Implementation of partial resynchronization
Partial resynchronization relies on three components:
- The replication offsets of the master and slave servers
  The replication offset records how much replication data each side has processed and is used to determine whether the master and slave are in a consistent state. Each time the master server propagates N bytes of commands to a slave, it adds N to its own replication offset. Each time the slave server receives N bytes from the master, it adds N to its own replication offset. If the two offsets are equal, the servers are consistent.
- The replication backlog
  The replication backlog is a fixed-size first-in, first-out queue maintained by the master server. It is used to recover the write commands a slave missed while it was offline. Each time the master server propagates a write command to its slaves, it also appends that command to the replication backlog. The default size is 1 MB.
- The server run ID
  During the first master/slave replication, the master server sends its run ID to the slave server. The slave server saves this run ID so that, when it reconnects, it can be determined whether the server it is connecting to is the same master it replicated from before.
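To make the relationship between the offset and the backlog concrete, here is a minimal sketch of a fixed-size backlog indexed by replication offset. The `ReplicationBacklog` class and its method names are hypothetical, and the offset convention is simplified: `slave_offset` means "number of bytes the slave has already received".

```python
class ReplicationBacklog:
    """Fixed-size FIFO of the most recently propagated bytes."""

    def __init__(self, size=1024 * 1024):  # 1 MB, the default mentioned above
        self.size = size
        self.data = bytearray()
        self.master_offset = 0  # total bytes propagated so far

    def feed(self, payload: bytes):
        """Record propagated bytes and evict the oldest ones if over capacity."""
        self.data += payload
        self.master_offset += len(payload)
        if len(self.data) > self.size:
            del self.data[: len(self.data) - self.size]

    def first_offset(self):
        """Offset of the oldest byte still held in the backlog."""
        return self.master_offset - len(self.data)

    def read_from(self, slave_offset):
        """Return the bytes the slave is missing, or None if they were evicted."""
        if slave_offset < self.first_offset() or slave_offset > self.master_offset:
            return None
        return bytes(self.data[slave_offset - self.first_offset():])


backlog = ReplicationBacklog(size=10)  # tiny size so eviction is easy to see
backlog.feed(b"SET a 1;")
backlog.feed(b"SET b 2;")
print(backlog.read_from(8))  # slave has the first 8 bytes, misses the rest
print(backlog.read_from(4))  # bytes 4..5 were already evicted
```

A slave whose offset has fallen out of the retained range gets `None` here, which corresponds to the case where only a full resynchronization can help.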
Partial resynchronization execution process
When a slave server reconnects, the master checks whether the conditions for partial resynchronization are met. Partial resynchronization is performed only if they are; otherwise, a full resynchronization is performed to guarantee data consistency.
- After disconnecting and reconnecting, the slave server sends the master server a PSYNC command that carries the run ID of the master it recorded earlier and the replication offset it had reached before the disconnection.
- The master server receives the PSYNC command and compares its own run ID with the run ID sent by the slave. If they match, partial resynchronization may be possible. If they differ, partial resynchronization cannot be performed; this indicates that the master changed (for example, through a master/slave switchover) while the slave was disconnected.
- If the run IDs match, the master server looks up the received offset in the replication backlog. If the data after that offset is no longer there, some commands are missing and the slave needs a full resynchronization. If the data is there, the incremental commands from the disconnection period can be found and a partial resynchronization can be performed.
- If the data exists in the replication backlog, the master server replies to the slave with +CONTINUE and then sends the incremental data from the replication backlog, after which the master and slave return to a consistent state.
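The decision the master makes in these steps can be sketched as a single function. This is a simplified illustration, not Redis source code: `handle_psync` and its parameters are hypothetical names, and the `+FULLRESYNC <run id> <offset>` reply (which Redis uses to announce a full resynchronization) is not covered in the text above.

```python
def handle_psync(master_run_id, backlog_start, master_offset,
                 slave_run_id, slave_offset):
    """Decide how to answer a PSYNC request.

    backlog_start..master_offset is the offset range still held in the
    replication backlog.
    """
    # Different run ID: the slave was replicating from another master.
    if slave_run_id != master_run_id:
        return f"+FULLRESYNC {master_run_id} {master_offset}"
    # Same master, but the missing data has already left the backlog.
    if not (backlog_start <= slave_offset <= master_offset):
        return f"+FULLRESYNC {master_run_id} {master_offset}"
    # Same master and the missing data is still available.
    return "+CONTINUE"


print(handle_psync("abc", 100, 200, "abc", 150))
print(handle_psync("abc", 100, 200, "xyz", 150))
print(handle_psync("abc", 100, 200, "abc", 50))
```

Only the first call qualifies for partial resynchronization; the other two fall back to a full one because of a run ID mismatch and an evicted offset respectively.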
Master/slave synchronization process
General process of full and partial resynchronization:
Principle of master slave replication in Redis
Implementation of master-slave replication
Master/slave replication synchronizes data from the master server to the slave server so that the state of the two servers is consistent. Replication is asynchronous, so it cannot guarantee strong consistency at every moment.
Master slave replication process
- **Step 1:** Set the IP address and port of the master server
  SLAVEOF <master_ip> <master_port>
- **Step 2:** Establish a socket connection
  The slave server connects to the master server using the IP address and port specified in Step 1, which makes the slave server a client of the master server.
- **Step 3:** Send the PING command
  Once the connection from Step 2 is established, the first thing the slave does is send a PING command. This serves two purposes. First, it checks that the connection works and that the master can receive commands from the slave. Second, it prepares for replication: the master and slave communicate through commands, so the slave needs to confirm that the master can receive and process commands properly. If the master replies with PONG and the slave receives that reply, the connection is healthy and communication can proceed.
- **Step 4:** Authentication
  If the connection check in Step 3 succeeds, the slave decides whether to authenticate based on whether the masterauth option is set. If authentication is required, the slave server sends an AUTH command to the master server with the value of masterauth as its argument.
- **Step 5:** Send port information
  If the authentication in Step 4 succeeds, the slave server runs the REPLCONF listening-port <port> command to send the port number it listens on to the master server.
- **Step 6:** Synchronization
  The slave server sends the PSYNC command to synchronize with the database state of the master server and bring the data into a consistent state.
  During the synchronization phase, the master server also becomes a client of the slave server, because only a client can send commands to a Redis server and have them executed.
- **Step 7:** Command propagation
  Once the initial synchronization is complete and the database states of the master and slave servers are consistent, the two enter the command propagation phase. In this phase, the master only needs to send each write command it executes to the slave over the network connection. The slave receives and executes it, which keeps the master and slave data consistent (eventual consistency).
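As an illustration of Steps 3 to 6, the sketch below builds the commands a slave would send, encoded in the Redis wire protocol (RESP) as arrays of bulk strings. The helper names `encode_command` and `handshake_commands` are hypothetical, and the defaults `?` and `-1` are the values a slave sends with PSYNC when it has no previous master to resume from. No network connection is made, and reading the replies is left out.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        raw = str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(raw), raw))
    return b"".join(parts)


def handshake_commands(listening_port, masterauth=None, run_id="?", offset=-1):
    """Return the commands a slave sends in Steps 3 to 6, in order."""
    commands = [("PING",)]                          # Step 3
    if masterauth is not None:
        commands.append(("AUTH", masterauth))       # Step 4, only if configured
    commands.append(("REPLCONF", "listening-port", listening_port))  # Step 5
    commands.append(("PSYNC", run_id, offset))      # Step 6
    return [encode_command(*c) for c in commands]


for raw in handshake_commands(6380, masterauth="secret"):
    print(raw)
```

In a real exchange the slave waits for each reply (PONG, OK, and so on) before sending the next command; the list here only shows the order and the encoding.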
Heartbeat detection in master/slave replication
During the command propagation phase, the slave server sends a REPLCONF ACK <replication_offset> command to the master server once per second as a heartbeat, where <replication_offset> is the slave's current replication offset.
The heartbeat serves three main purposes for the master and slave servers:
- Checking the network connection status between the master and slave servers
- Helping implement the min-slaves options (min-slaves-to-write and min-slaves-max-lag), which guard against the split-brain problem described in the Redis Sentinel article
- Detecting lost commands
When a write command is lost in transit, the master server notices from the offset in the slave's REPLCONF ACK command that the slave's replication offset is behind its own. It then resends the missing data from the replication backlog, starting at the slave's offset, to restore consistency between the two servers.
This resending of lost commands, driven by REPLCONF ACK, was added in Redis 2.8. Before Redis 2.8, neither the master nor the slave noticed when a command was lost.
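The three uses of the heartbeat can be sketched from the master's point of view. The `SlaveLink` class and the helper functions are hypothetical names for illustration; only `min-slaves-to-write` and `min-slaves-max-lag` correspond to real configuration options, and the check below is a simplified reading of what they do.

```python
import time


class SlaveLink:
    """What a master might track per slave (illustrative only)."""

    def __init__(self):
        self.ack_offset = 0
        self.last_ack_time = time.monotonic()

    def on_replconf_ack(self, offset, now=None):
        """Record the offset and arrival time of a REPLCONF ACK."""
        self.ack_offset = offset
        self.last_ack_time = time.monotonic() if now is None else now

    def lag(self, now=None):
        """Seconds since the last heartbeat from this slave."""
        now = time.monotonic() if now is None else now
        return now - self.last_ack_time


def missing_bytes(master_offset, link):
    """How many propagated bytes the slave has not acknowledged yet."""
    return max(0, master_offset - link.ack_offset)


def writes_allowed(links, min_slaves_to_write, min_slaves_max_lag, now=None):
    """Refuse writes when too few slaves have sent a recent heartbeat."""
    healthy = sum(1 for link in links if link.lag(now) <= min_slaves_max_lag)
    return healthy >= min_slaves_to_write


link = SlaveLink()
link.on_replconf_ack(100, now=10.0)
print(missing_bytes(120, link))                  # bytes to resend from the backlog
print(writes_allowed([link], 1, 10, now=15.0))   # heartbeat is recent enough
print(writes_allowed([link], 1, 10, now=25.0))   # heartbeat is too old
```

A non-zero result from `missing_bytes` is the signal that something was lost or is still in flight, and a stale heartbeat is what lets the master stop accepting writes it cannot replicate.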
Summary
- Master/slave replication is implemented with RDB snapshot files, the copy-on-write (COW) mechanism that copies only the memory pages modified while the snapshot is generated, and the replication buffer.
- Since Redis 2.8, master/slave replication is divided into full resynchronization and partial resynchronization.
- Partial resynchronization is implemented with the server run ID, the replication backlog, and the replication offset.
- The master server keeps the master and slave states consistent through command propagation. Replication is asynchronous, and the heartbeat that the slave sends to the master is used to check whether their database states are consistent.
Related issues
- Why does master/slave full synchronization always fail?
During master/slave full synchronization, you may encounter synchronization failure in the following scenarios:
The slave sends a full synchronization request to the master. The master generates an RDB and sends it to the slave. The slave loads the RDB.
Due to the large size of the RDB data, the slave load takes a long time.
At this point, you will find that the slave has not finished loading the RDB, but the connection between the master and slave is disconnected, and the data synchronization fails.
Then you will see that the slave initiates a full synchronization, and the master generates an RDB to send to the slave.
Again, while the slave is loading the RDB, the connection drops, and master/slave synchronization fails over and over.
What's going on here?
This is the Redis "replication storm" problem.
What is a replication storm?
As just described: master/slave full synchronization fails, synchronization restarts, synchronization fails again, and so on, in a vicious cycle that continues to waste machine resources.
Why does this happen?
This problem can occur if your Redis deployment has the following characteristics:
- The master instance holds too much data, so the slave takes too long to load the RDB
- The client-output-buffer-limit configured for slave connections is too small
- The master receives a large number of write requests
During full synchronization, the master first writes incoming write requests to the master/slave replication buffer. The upper limit of this buffer is set in the configuration.
When the slave loads the RDB too slowly, it cannot read data from the replication buffer in time, and the buffer overflows.
To keep its memory from growing without bound, the master forcibly disconnects the slave, and the full synchronization fails.
The slave then starts the full synchronization over again, the same problem recurs, and the result is the vicious cycle known as a "replication storm".
How can this be solved? Here are two suggestions:
- Keep Redis instances from growing too large, so that the RDB stays small
- Set the replication buffer limit generously, so that the slave has enough time to load the RDB and full synchronization is less likely to fail
If you run into the same pitfall, these two measures should help.
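A rough back-of-the-envelope check can show whether a given buffer limit is likely to survive a full synchronization. The function below is a hypothetical estimate, not anything Redis computes: it assumes a constant write rate, counts only the time the slave spends loading the RDB, and looks only at the hard limit of the replication output buffer (the first size in `client-output-buffer-limit slave <hard> <soft> <seconds>`).

```python
MB = 1024 * 1024


def buffer_overflows(write_rate_bytes_per_s, rdb_load_seconds, hard_limit_bytes):
    """True if writes accumulated while the slave loads the RDB exceed the limit."""
    accumulated = write_rate_bytes_per_s * rdb_load_seconds
    return accumulated > hard_limit_bytes


# 5 MB/s of writes while the slave spends 120 s loading the RDB
print(buffer_overflows(5 * MB, 120, 256 * MB))
print(buffer_overflows(5 * MB, 120, 1024 * MB))
```

With these illustrative numbers, 600 MB of writes pile up during the load, so a 256 MB limit would be exceeded while a 1024 MB limit would not. The same arithmetic also shows why a smaller instance helps: a shorter load time shrinks the amount that has to be buffered.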