Redis preparation guide (6) | Mastering master-slave replication

Preface

In the previous article, we took an in-depth look at the two Redis persistence methods. Through persistence, the Redis server saves its in-memory data to the hard disk, so when Redis goes down, the data can be restored into memory from the RDB or AOF file.

But if the server’s host fails, is the data lost?

Next, we will learn about the master-slave replication mechanism, which replicates data from one server to multiple servers so that the data survives the failure of a single machine.

So what exactly does Redis master-slave replication do?
How do you configure a master-slave cluster?
How is it implemented under the hood?

This article will help you with those questions.

One, Overview of master-slave replication

Master-slave replication means copying the data of one Redis server to other Redis servers. The former is called the master node (master/leader) and the latter the slave node (slave/follower). Data replication is one-way: it can only go from the master node to the slave nodes. The master mainly handles writes, while the slaves mainly handle reads.

By default, each Redis server is a master node. A master node can have multiple slave nodes (or none), but a slave node can have only one master node.

The general structure of master-slave replication is one master node with one or more slave nodes attached to it.

The main functions of master-slave replication are:

Data redundancy: Master-slave replication provides a hot backup of the data. It is a form of data redundancy in addition to persistence.
Fault recovery: When the master node fails, a slave node can take over and provide service, which allows rapid recovery. This is in effect redundancy of the service.
Load balancing: Building on master-slave replication, read/write separation lets the master node serve writes and the slave nodes serve reads (that is, the application connects to the master node to write Redis data and to a slave node to read it), which spreads the server load. Especially when writes are few and reads are many, spreading the read load across multiple slave nodes can greatly increase the concurrency that Redis can handle.
Cornerstone of high availability: Beyond the above, master-slave replication is the foundation on which Sentinel and Cluster are built, so it is the basis of high availability in Redis.
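
To make the read/write separation described above concrete, here is a minimal Python sketch. It only decides which node address a command should go to and does not talk to Redis. The ReadWriteRouter class and the WRITE_COMMANDS set are hypothetical names made up for this illustration, and the command list is deliberately incomplete.

```python
from itertools import cycle

# Hypothetical, incomplete set of commands treated as writes.
WRITE_COMMANDS = {"SET", "DEL", "INCR", "EXPIRE", "LPUSH", "HSET"}


class ReadWriteRouter:
    """Picks the node address a command should be sent to."""

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master for reads when there are no slaves.
        self._readers = cycle(slaves or [master])

    def route(self, command):
        name = command.split()[0].upper()
        if name in WRITE_COMMANDS:
            return self.master          # writes always go to the master
        return next(self._readers)      # reads rotate across the slaves


router = ReadWriteRouter(
    master="192.168.81.101:6379",
    slaves=["192.168.81.102:6379", "192.168.81.103:6379"],
)
print(router.route("SET k1 v1"))
print(router.route("GET k1"))
print(router.route("GET k1"))
```

Round-robin is only one possible way to spread reads. Because replication is asynchronous, a read routed to a slave may briefly return older data than the master holds.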

Generally speaking, if you want to use Redis in a real project, you should never rely on a single Redis server, for the following reasons:
1. Structurally, a single Redis server is a single point of failure, and it has to handle the entire request load alone, which puts it under great pressure.
2. In terms of capacity, the memory of a single Redis server is limited. Even if a server has 256 GB of memory, not all of it can be used as Redis storage. In general, the memory used by a single Redis instance should not exceed 20 GB.

Two, Setting up master and slave

We will use three servers:

Server	Role
192.168.81.101	master
192.168.81.102	slave
192.168.81.103	slave

The Redis version is 6.0.8.

2.1 Node Configuration

There are a few points to note when configuring the Redis master and slave nodes:

If no password is used, no password settings are needed.
If a password is used, it must be the same on all three Redis servers, and the master node's password must also be configured on each slave node (masterauth).

All three servers here use a password, so the configuration is as follows:

Master node configuration:

bind 0.0.0.0
# Password that clients must provide
requirepass redis@admin2021
# Run in the background
daemonize yes
# Turn protected mode off
protected-mode no

Slave node configuration:

bind 0.0.0.0
# Password that clients must provide
requirepass redis@admin2021
# Run in the background
daemonize yes
# Turn protected mode off
protected-mode no
# IP address and port of the master node
replicaof 192.168.81.101 6379
# Password of the master node
masterauth redis@admin2021
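
The password rule above is easy to get wrong, so a small check can help. The following Python sketch is a hypothetical helper, not part of Redis. It parses redis.conf-style text and reports when a slave's masterauth does not match the master's requirepass. It assumes simple whitespace-separated directives and does not handle quoted arguments.

```python
def parse_conf(text):
    """Parse redis.conf-style text into {directive: [args]}; lines starting with '#' are comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, *args = line.split()
        conf[name.lower()] = args
    return conf


def check_slave(master_conf, slave_conf):
    """Return a list of problems that would keep the slave from replicating."""
    problems = []
    if "replicaof" not in slave_conf:
        problems.append("slave has no replicaof directive")
    required = master_conf.get("requirepass")
    if required and slave_conf.get("masterauth") != required:
        problems.append("masterauth does not match the master's requirepass")
    return problems


MASTER_CONF = """
bind 0.0.0.0
requirepass redis@admin2021
"""

SLAVE_CONF = """
bind 0.0.0.0
requirepass redis@admin2021
replicaof 192.168.81.101 6379
masterauth redis@admin2021
"""

print(check_slave(parse_conf(MASTER_CONF), parse_conf(SLAVE_CONF)))
```

An empty list means the two files agree on the points this sketch checks. It says nothing about network reachability or other settings.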

Start Redis on all three servers, then run the info replication command on each node to check the replication status of the cluster.

127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.51.102,port=6379,state=online,offset=458,lag=1
slave1:ip=192.168.51.103,port=6379,state=online,offset=472,lag=0
master_replid:3d0b1a1d07922824b5b29d3f2387aa61d6a32bc1
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:472
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:472

127.0.0.1:6379> info replication
# Replication
role:slave
master_host:192.168.51.101
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:486
master_link_down_since_seconds:41
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:3d0b1a1d07922824b5b29d3f2387aa61d6a32bc1
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:486
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:15
repl_backlog_histlen:472
127.0.0.1:6379> set k1 v1
(error) READONLY You can't write against a read only replica.
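
The info replication output is plain key:value text, so it is easy to inspect from a script. Below is a minimal Python sketch that parses a sample copied from the master's output and computes how many bytes each slave is behind the master. The function names parse_info and parse_slave_entry are hypothetical helpers written for this illustration. In practice the text would come from a Redis client rather than a hard-coded string.

```python
def parse_info(text):
    """Parse INFO-style 'key:value' lines into a dict, skipping blanks and '# Section' headers."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def parse_slave_entry(value):
    """Split 'ip=...,port=...,state=...' into a dict."""
    return dict(item.split("=", 1) for item in value.split(","))


SAMPLE = """\
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.51.102,port=6379,state=online,offset=458,lag=1
slave1:ip=192.168.51.103,port=6379,state=online,offset=472,lag=0
master_repl_offset:472
"""

info = parse_info(SAMPLE)
master_offset = int(info["master_repl_offset"])
for i in range(int(info["connected_slaves"])):
    slave = parse_slave_entry(info[f"slave{i}"])
    behind = master_offset - int(slave["offset"])
    print(slave["ip"], slave["state"], "bytes behind:", behind)
```

A slave whose offset equals master_repl_offset has received everything the master has written so far.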

2.2 Cluster Test

First, try to write data on a slave node. By default a slave node is read-only, so the write is rejected:

127.0.0.1:6379> set k1 v1
(error) READONLY You can't write against a read only replica.
127.0.0.1:6379> get k1
(nil)

Next, write data on the master node and then read it on a slave node.

master:

127.0.0.1:6379> set k1 v1
OK

slave:

127.0.0.1:6379> get k1
"v1"

With that, the cluster of one master and two slaves is set up.

Three, Master-slave replication theory

After setting up the master-slave cluster, let’s look more closely at the theory behind master-slave replication.

3.1 CAP principle

CAP theory is the theoretical cornerstone of distributed storage. The three letters stand for:

C: Consistency
A: Availability
P: Partition tolerance

The nodes of a distributed system are usually deployed on different machines and separated by the network, which means there is always a risk of the network being disconnected. Such a disconnection is called a network partition.

During a network partition, two distributed nodes cannot communicate with each other, so modifications made on one node cannot be synchronized to the other. Data consistency therefore cannot be guaranteed.

The only way to keep consistency is to sacrifice availability: the system stops accepting data modifications while the network is down and resumes only after the network is restored.

In one sentence: CAP theory says that when a network partition occurs, consistency and availability cannot both be achieved.

For Redis, high availability is a must, so it gives up strong consistency and settles for eventual consistency. While the network between master and slave is disconnected, their data may diverge. Once the network is restored, the slave node works to catch up with the master node until the data is consistent again.
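
The trade-off can be illustrated with a toy Python model. This is not how Redis is implemented. It is only a sketch in which two dictionaries stand in for the master and slave data sets, and the MasterSlavePair class is a hypothetical name. The master keeps accepting writes during a partition, the slave serves stale data, and the two converge after the link is restored.

```python
class MasterSlavePair:
    def __init__(self):
        self.master = {}
        self.slave = {}
        self.link_up = True
        self.unsent = []  # writes the slave has not received yet

    def write(self, key, value):
        # The master always accepts the write, so it stays available.
        self.master[key] = value
        if self.link_up:
            self.slave[key] = value
        else:
            self.unsent.append((key, value))

    def partition(self):
        self.link_up = False

    def heal(self):
        # Once the network is back, the slave replays what it missed.
        self.link_up = True
        for key, value in self.unsent:
            self.slave[key] = value
        self.unsent.clear()


pair = MasterSlavePair()
pair.write("k1", "v1")
pair.partition()
pair.write("k1", "v2")
print(pair.master["k1"], pair.slave["k1"])  # v2 v1 (the slave is stale)
pair.heal()
print(pair.master == pair.slave)  # True (consistent again)
```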

3.2 Full Replication

Full replication in Redis usually happens during the slave initialization phase, when the slave needs to copy all of the data on the master. The specific steps are as follows:

1. The slave connects to the master and sends the PSYNC command to request data replication.
2. After receiving the PSYNC command, the master starts persisting data in the background, using BGSAVE to generate an up-to-date RDB snapshot file. While this is in progress, the master keeps accepting client requests, which may modify the data set in memory, and it buffers those write commands.
3. When persistence is complete, the master sends the RDB file to the slave.
4. The slave saves the received data as an RDB file on disk and then loads it into memory.
5. The master sends the write commands it buffered in memory to the slave.
6. If the connection between master and slave is lost for some reason, the slave automatically reconnects to the master.
7. If the master receives connection requests from several slaves at the same time, it performs the persistence only once rather than once per connection, and then sends the same data to all of those slaves.
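
Steps 2 to 5 above can be sketched as a toy Python simulation. It is a simplified model under stated assumptions, not the Redis implementation: a deep copy of a dictionary stands in for the RDB file, and the Master, Slave, and full_sync names are hypothetical. The point it shows is that the snapshot plus the writes buffered during the snapshot bring the slave to the same state as the master.

```python
import copy


class Master:
    def __init__(self):
        self.data = {}
        self.buffer = None  # holds writes that arrive while a snapshot is in progress

    def write(self, key, value):
        self.data[key] = value
        if self.buffer is not None:
            self.buffer.append((key, value))

    def begin_snapshot(self):
        # Steps 2-3: freeze a copy of the data set (stands in for the RDB file).
        self.buffer = []
        return copy.deepcopy(self.data)

    def drain_buffer(self):
        # Step 5: hand over the writes buffered since the snapshot began.
        buffered, self.buffer = self.buffer, None
        return buffered


class Slave:
    def __init__(self):
        self.data = {"old": "data"}

    def load_snapshot(self, snapshot):
        # Step 4: discard the old data set and load the snapshot.
        self.data = dict(snapshot)

    def apply(self, commands):
        for key, value in commands:
            self.data[key] = value


def full_sync(master, slave, writes_during_snapshot):
    snapshot = master.begin_snapshot()
    for key, value in writes_during_snapshot:  # clients keep writing meanwhile
        master.write(key, value)
    slave.load_snapshot(snapshot)
    slave.apply(master.drain_buffer())


master, slave = Master(), Slave()
master.write("k1", "v1")
full_sync(master, slave, [("k2", "v2")])
print(slave.data == master.data)  # True
```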

As the process above shows, full replication is a very heavy operation:

1. The master node runs BGSAVE, which forks a child process to do the RDB persistence. This consumes CPU, memory, and disk I/O.
2. The master node sends the RDB file to the slave node over the network, which consumes bandwidth on both the master and the slave.
3. The slave node is blocked while it clears its old data and loads the new RDB file, and cannot respond to client commands during that time.

3.3 Partial Replication

Partial replication was added because full replication causes many problems, such as its high time cost and the difficulty of isolating its impact. Redis needed a mechanism to keep the cost of re-synchronization as low as possible when the connection to the master is briefly interrupted.

Partial replication is the optimization Redis introduced to address the high cost of full replication. It is implemented with the psync {runId} {offset} command. If the connection is briefly interrupted or commands are lost while a slave node is replicating from the master node, the slave node asks the master node to resend the missing command data. If the master node still has that data in its replication backlog buffer, it sends it directly to the slave node, which keeps the master and slave nodes consistent. This data is generally far smaller than the full data set, so the cost is very low.

Implementation principle:

The master records the write commands it executes, including those executed while a slave is disconnected, in the replication backlog buffer (a fixed-size queue, 1 MB by default). After the slave reconnects, it sends the psync command to the master, passing its offset and the runId. If the runId matches and the offset is still within the backlog buffer, the master sends the data from that offset to the end of the queue, which completes the synchronization without the overhead of full replication. Otherwise the master falls back to full replication.
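
The decision the master makes can be sketched in Python as follows. This is a simplified model, not Redis source code: the Master class, its feed and psync methods, and the reply labels are hypothetical. Offsets here simply count bytes of the replication stream starting from 0, and the backlog keeps only the most recent backlog_size bytes.

```python
class Master:
    def __init__(self, run_id, backlog_size):
        self.run_id = run_id
        self.backlog_size = backlog_size
        self.backlog = bytearray()  # most recent bytes of the replication stream
        self.offset = 0             # total bytes written to the stream so far

    def feed(self, data):
        """Append a write command to the stream, dropping the oldest bytes if full."""
        self.backlog += data
        self.offset += len(data)
        overflow = len(self.backlog) - self.backlog_size
        if overflow > 0:
            del self.backlog[:overflow]

    def psync(self, run_id, offset):
        """Return ('CONTINUE', missing_bytes) or ('FULLRESYNC', None)."""
        first = self.offset - len(self.backlog)  # oldest offset still in the backlog
        if run_id == self.run_id and first <= offset <= self.offset:
            return "CONTINUE", bytes(self.backlog[offset - first:])
        return "FULLRESYNC", None


master = Master(run_id="abc", backlog_size=10)
master.feed(b"SET a 1;")  # the slave received this before disconnecting
master.feed(b"SET b 2;")  # written while the slave was away

print(master.psync("abc", 8))    # ('CONTINUE', b'SET b 2;')
print(master.psync("abc", 0))    # ('FULLRESYNC', None): offset fell out of the backlog
print(master.psync("other", 8))  # ('FULLRESYNC', None): runId does not match
```

The second call shows why the backlog size matters: if the slave stays disconnected long enough for its offset to be overwritten, only full replication can bring it back in sync.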

3.4 Advantages and Disadvantages of Master-Slave Replication

3.4.1 Advantages

1. The master automatically synchronizes data to the slaves, which makes read/write separation possible and takes read load off the master.
2. Synchronization is non-blocking on the master side. While it is in progress, clients can still submit query and update requests.

3.4.2 Disadvantages

1. There is no automatic fault tolerance or recovery.
2. If the master fails before some data has been synchronized to the slaves, data inconsistency may appear after clients switch to another IP address.
3. It is difficult to support online expansion.

Four, Summary

This article first walked through setting up a Redis cluster of one master and several slaves, and then explained the theory behind master-slave replication. Setting up replication is relatively simple, but the theory is not so easy to understand. This article only scratches the surface. For now, understanding this much is enough and there is no need to dig deeper. If I gain a deeper understanding later, I will write another article to explore it.

I am Xiao Lei, welcome to read!