Master-slave replication process

First, a few concepts

Replication offset

Both the master and slave nodes maintain a replication offset, which counts the bytes of replication data the node has handled. The master node's offset counts the bytes of write commands it has received from clients, and the slave node's offset counts the bytes it has received from the master node. For example, when a slave node receives N bytes of data from the master node, its offset increases by N.

The offset plays a very important role: it is the only standard for measuring whether the data on the master and slave nodes is consistent. If the two offsets are equal, the data is consistent; otherwise it is not. When the data is inconsistent, the part the slave node is missing can be located from the two offsets. For example, if the master node's offset is 500 and the slave node's offset is 400, the master node only needs to send the data from 401 to 500 to the slave node. This is called partial replication.
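The offset comparison above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the example; the function name missing_range is hypothetical and is not part of Redis.

```python
def missing_range(master_offset: int, slave_offset: int):
    """Return the (first, last) byte positions the slave still needs,
    or None when the two offsets are equal (data is consistent)."""
    if slave_offset > master_offset:
        raise ValueError("slave offset is ahead of master offset")
    if slave_offset == master_offset:
        return None
    # The slave already has everything up to slave_offset.
    return slave_offset + 1, master_offset


print(missing_range(500, 400))  # (401, 500)
print(missing_range(500, 500))  # None
```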

Replication backlog buffer

The replication backlog buffer is a cache queue maintained by the master node and has the following characteristics:

  • Maintained by the master node
  • The default size is 1MB, configured with the repl-backlog-size parameter
  • It’s a first-in, first-out queue

During the command propagation phase, in addition to sending write commands to the slave nodes, the master node also writes them to the replication backlog buffer, as a backup for use in partial replication. Because the buffer is a fixed-size first-in, first-out queue, it only holds the commands most recently written by the master node. When the offsets of the master and slave nodes differ so much that the gap falls outside the range covered by the backlog buffer, partial replication is not possible and full replication must be used. To reduce the full replications caused by network interruptions, we need to evaluate the size of the replication backlog buffer carefully and adjust it appropriately. For example, if a network outage lasts 60s and the master node receives 100KB of write commands per second, the backlog buffer needs to hold about 6MB (60s × 100KB/s), so we can set the size to 6MB or even 10MB to ensure that partial replication is available for the vast majority of interruptions.
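The sizing estimate can be written down as a small calculation. This is a rough sketch: the function name required_backlog_kb and the safety_factor parameter are assumptions made for illustration, and the outage length and write rate have to come from your own measurements.

```python
import math


def required_backlog_kb(outage_seconds: float, write_kb_per_second: float,
                        safety_factor: float = 1.0) -> int:
    """Estimate how many KB of write commands accumulate during an outage.

    safety_factor is a hypothetical headroom multiplier, not a Redis setting.
    """
    return math.ceil(outage_seconds * write_kb_per_second * safety_factor)


# The example from the text: a 60s outage at 100KB of writes per second.
print(required_backlog_kb(60, 100))       # 6000 KB, about 6MB
print(required_backlog_kb(60, 100, 1.5))  # 9000 KB, with 50% headroom
```

The result is then rounded up to a convenient value (for example 10MB) and applied through the repl-backlog-size parameter.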

Run ID (runid)

When each Redis node starts, it generates a run ID (runid) that uniquely identifies the node. It is a string of 40 random hexadecimal characters. You can view a node's runid with the info server command.

When a slave node first connects for full replication (it sends psync ? -1), the master node tells the slave node its runid, and the slave node saves it. When the master and slave nodes are disconnected and then reconnect, the slave node sends this runid to the master node, and the master node decides which kind of replication to use based on it:
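As a small illustration, the runid can be picked out of the text returned by info server, where it appears in the run_id field. The helper names below are hypothetical, and the sample text is made up for the example rather than captured from a real server.

```python
import re


def parse_run_id(info_text: str):
    """Return the value of the run_id field from INFO output, or None."""
    for line in info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key == "run_id":
            return value.strip()
    return None


def looks_like_runid(value: str) -> bool:
    """Check the shape described in the text: 40 hexadecimal characters."""
    return re.fullmatch(r"[0-9a-f]{40}", value) is not None


# Made-up sample; a real server returns many more fields.
sample_info = "# Server\r\nrun_id:" + "0123456789abcdef0123" * 2 + "\r\ntcp_port:6379\r\n"

run_id = parse_run_id(sample_info)
print(run_id, looks_like_runid(run_id))
```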

  • If the runid sent by the slave node matches the current master node's runid, the master node attempts partial replication, which of course also depends on whether the requested offset is still in the replication backlog buffer
  • If the runid sent by the slave node does not match the current master node's runid, full replication is performed

Full replication process

  1. Since this is the first data synchronization, the slave node does not know the runid of the master node, so it sends psync ? -1

  2. After receiving the command from the slave node, the master node determines that full replication is needed, so it sends its runid and offset to the slave node. The response is +FULLRESYNC {runid} {offset}.

  3. After the slave node receives the response from the master node, it saves the master node's runid and offset.

  4. After responding to the slave node's command, the master node runs bgsave and saves the generated RDB file locally. (If the master node receives a partial replication request from a slave node but the runid does not match, it likewise performs full replication, replying +FULLRESYNC with its runid and offset.)

  5. The master node sends the generated RDB file to the slave node. The slave node receives the RDB file and saves it as its data file. If the slave node already has a local RDB file, it clears that file first.

  6. If AOF is also enabled on the slave node, an AOF rewrite is also performed
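Step 3 above, where the slave node saves the runid and offset from the reply, can be sketched as a small parser. The function name parse_fullresync is hypothetical; it only splits the +FULLRESYNC {runid} {offset} reply line described in step 2 and does not cover the RDB transfer that follows.

```python
def parse_fullresync(reply: str):
    """Split a '+FULLRESYNC {runid} {offset}' reply into (runid, offset)."""
    parts = reply.strip().split()
    if len(parts) != 3 or parts[0] != "+FULLRESYNC":
        raise ValueError(f"not a +FULLRESYNC reply: {reply!r}")
    runid, offset = parts[1], int(parts[2])
    return runid, offset


# Made-up reply for illustration.
saved_runid, saved_offset = parse_fullresync("+FULLRESYNC " + "a" * 40 + " 1024\r\n")
print(saved_runid, saved_offset)
```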

Partial replication

When a network interruption during command propagation causes data loss, the slave node asks the master node for the missing data. If the requested offset is still in the replication backlog buffer, the master node resends the missing data to the slave node, which keeps the two nodes consistent. Because the data to be resent is generally small, the overhead is small compared to full replication. The flow is as follows:

  1. If the network between the master and slave nodes is interrupted for longer than repl-timeout, the master node considers the slave node faulty and unreachable

  2. Since the master node is not down, it still responds to client commands. These commands are not lost, because they are stored in the replication backlog buffer, which is 1MB by default

  3. When the network between the master and slave nodes is restored, the slave node reconnects to the master node

  4. Once the connection is re-established, the slave node already holds the master node's runid and its own replication offset from before, so it only needs to send the psync {runid} {offset} command

  5. After receiving the psync command from the slave node, the master node first checks whether the requested runid matches its own; if it does, the slave node was previously replicating the current master node. It then checks whether the requested offset is still in the replication backlog buffer. If it is, partial replication is performed; otherwise, full replication is performed. For partial replication, the master node replies +CONTINUE, and the slave node receives the reply

  6. In partial replication, the master node only needs to send the slave node the data in the replication backlog buffer that follows the requested offset
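The role of the backlog buffer in the steps above can be sketched with a toy model. This is a simplified illustration under stated assumptions, not how Redis implements the buffer: the class name ReplicationBacklog and its methods are hypothetical, and the slave is assumed to report the offset of the last byte it received.

```python
class ReplicationBacklog:
    """Toy fixed-size FIFO that keeps only the most recently written bytes."""

    def __init__(self, size: int):
        self.size = size
        self.buf = bytearray()
        self.master_offset = 0  # total bytes written so far

    def write(self, data: bytes) -> None:
        self.buf += data
        self.master_offset += len(data)
        overflow = len(self.buf) - self.size
        if overflow > 0:
            del self.buf[:overflow]  # the oldest bytes are dropped first

    def data_after(self, slave_offset: int):
        """Return the bytes the slave is missing, or None when they are no
        longer in the buffer and full replication would be needed."""
        oldest_kept = self.master_offset - len(self.buf)
        if slave_offset < oldest_kept or slave_offset > self.master_offset:
            return None
        return bytes(self.buf[slave_offset - oldest_kept:])


backlog = ReplicationBacklog(size=8)
backlog.write(b"abcdefgh")     # the slave is in sync at offset 8
backlog.write(b"ijkl")         # written while the slave is disconnected
print(backlog.data_after(8))   # b'ijkl': partial replication is possible
print(backlog.data_after(3))   # None: offset fell out of the buffer
```

A slave whose offset is still covered only receives the tail of the buffer; a slave that has fallen too far behind gets None here, which corresponds to falling back to full replication.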

Supplement: how the psync command is executed

  1. The slave node first decides how to call the psync command based on its current state:
  • If the slave node has never replicated from the master node, or has previously executed slaveof no one, it sends psync ? -1 to request full replication from the master node.
  • If the slave node has replicated from the master node before, it sends psync {runid} {offset} to attempt partial replication. Whether full or partial replication is performed is decided by the master node.
  2. The master node responds according to its own situation:
  • If the master node's version is lower than 2.8, it replies -ERR. On receiving this reply, the slave node sends the sync command to perform full replication.
  • If the master node finds that the command is psync ? -1, it judges that the slave node is connecting for the first time and replies +FULLRESYNC <runid> <offset> to perform full replication.
  • If the master node finds that the requested runid does not match its own, or that it matches but the requested offset is not in the replication backlog buffer, it replies +FULLRESYNC <runid> <offset> to perform full replication.
  • If the master node finds that the requested runid matches its own and the requested offset is still in the replication backlog buffer, it replies +CONTINUE to perform partial replication.
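The two sides of this exchange can be summarized as a pair of small functions. This is a sketch of the decision rules listed above, not Redis source code: the function names, the supports_psync flag (standing in for "version is 2.8 or later") and the backlog_first_offset parameter are assumptions, and the offset in the command is assumed to be the last byte the slave received.

```python
from typing import Optional


def slave_psync_command(saved_runid: Optional[str], saved_offset: int) -> str:
    """Build the command the slave sends, based on what it has saved."""
    if saved_runid is None:
        return "psync ? -1"  # nothing saved: request full replication
    return f"psync {saved_runid} {saved_offset}"


def master_reply(command: str, master_runid: str, master_offset: int,
                 backlog_first_offset: int, supports_psync: bool = True) -> str:
    """Pick the master's reply following the rules in the list above."""
    if not supports_psync:
        return "-ERR"  # the slave then falls back to the sync command
    _, runid, offset_text = command.split()
    full = f"+FULLRESYNC {master_runid} {master_offset}"
    if runid == "?" or runid != master_runid:
        return full
    next_needed = int(offset_text) + 1
    if not (backlog_first_offset <= next_needed <= master_offset + 1):
        return full  # the missing data is no longer in the backlog buffer
    return "+CONTINUE"


runid = "m" * 40  # made-up runid
print(master_reply(slave_psync_command(None, -1), runid, 500, 301))
print(master_reply(slave_psync_command(runid, 400), runid, 500, 301))
print(master_reply(slave_psync_command(runid, 100), runid, 500, 301))
```

The first and third calls fall back to +FULLRESYNC, while the second, whose offset is still covered by the backlog buffer, gets +CONTINUE.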