Master-slave replication
In Redis, users can execute the SLAVEOF command or set the slaveof configuration option to have one server replicate another. The replicated server is called the master server, and the servers that replicate it are called slave servers.
The databases of the master and slave servers in replication hold the same data; conceptually this is called database-state consistency, or simply "consistency".
Redis used the old version of replication before version 2.8 and uses the new version from 2.8 onward. The old version is inefficient when a slave reconnects after a brief disconnection; the new version solves this problem through partial resynchronization. The following briefly covers the implementation principles of both versions.
Old version replication
Redis replication is divided into two operations: synchronization and command propagation:
- The synchronization operation is used to update the database state of the slave server to the current database state of the master server.
- The command propagation operation is used to bring the master and slave databases back to a consistent state when the master's database is modified and the two fall out of sync.
Synchronization
When the client sends the SLAVEOF command to the slave server to replicate the master server, the slave server first needs to perform a synchronization operation, that is, update the database state of the slave server to the current database state of the master server.
The steps are as follows:
- The slave server sends the SYNC command to the master server.
- On receiving the SYNC command, the master server executes the BGSAVE command to generate an RDB file in the background, and uses a buffer to record all write commands executed from that point on.
- When BGSAVE finishes executing, the master server sends the generated RDB file to the slave server, which receives and loads it, updating its database state to that of the master at the moment BGSAVE was executed.
- The master server sends all write commands recorded in the buffer to the slave server, which executes them to bring its database up to the master's current state. (A runnable sketch of this flow follows the list.)
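To make the four steps concrete, here is a minimal Python sketch (not Redis source code) in which plain dicts stand in for the databases and a list stands in for the master's write-command buffer:

```python
import copy

class Master:
    def __init__(self):
        self.db = {}
        self.sync_buffer = None  # records write commands issued during "BGSAVE"

    def handle_sync(self):
        # Step 2: "BGSAVE" -- snapshot the current database as the "RDB file"
        snapshot = copy.deepcopy(self.db)
        self.sync_buffer = []    # start buffering subsequent writes
        return snapshot

    def execute_write(self, key, value):
        self.db[key] = value
        if self.sync_buffer is not None:
            self.sync_buffer.append(("SET", key, value))

class Slave:
    def __init__(self):
        self.db = {}

    def sync_from(self, master):
        # Steps 1 and 3: send SYNC, then load the RDB snapshot
        self.db = master.handle_sync()
        # Step 4: replay the write commands buffered during BGSAVE
        for _cmd, key, value in master.sync_buffer:
            self.db[key] = value

master, slave = Master(), Slave()
master.execute_write("hello", "world")
slave.sync_from(master)
assert slave.db == master.db  # database states are now consistent
```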
Command propagation
After the synchronization operation completes, the databases of the master and slave servers reach a consistent state, but this consistency is not permanent: whenever the master server executes a write command sent by a client, the master's database may be modified, leaving the master and slave states inconsistent.
To bring the slave servers back to a consistent state, the master server performs the command propagation operation: it sends every write command it executes (the write command that caused the inconsistency) to the slave servers for execution. Once the slaves have executed the same write command, the master and slave servers return to a consistent state.
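Continuing the same toy model, command propagation amounts to the master forwarding every write it executes to all of its slaves:

```python
class PropagatingMaster(Master):
    """Extends the Master sketch above with command propagation."""
    def __init__(self):
        super().__init__()
        self.slaves = []

    def execute_write(self, key, value):
        super().execute_write(key, value)   # modify the master's database
        for slave in self.slaves:
            slave.db[key] = value           # each slave executes the same write

m, s = PropagatingMaster(), Slave()
s.sync_from(m)                   # synchronization first...
m.slaves.append(s)
m.execute_write("k", "v")        # ...then every write is propagated
assert s.db == m.db              # consistency is restored after each write
```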
Defects of the old version
In Redis, a slave server's replication of a master server falls into two cases:
- Initial replication: the slave server has never replicated any master before, or the master it is about to replicate is different from the one it replicated last time.
- Replication after disconnection: replication between the master and slave is interrupted by a network problem, but the slave reconnects through automatic reconnection and continues replicating the master.
For initial replication, the old replication function completes the task well; but for replication after a disconnection, although the old version can still bring the master and slave back to a consistent state, it is very inefficient.
No matter how much of the master's database state the slave already holds, it must send SYNC to the master, and the master must then execute BGSAVE to generate a full RDB file, which is a very resource-consuming process. Let's see how the new version of replication solves this problem.
New version replication
To solve the inefficiency of the replication function after a short disconnection, Redis uses the PSYNC command instead of the SYNC command to perform the synchronization operation during replication from version 2.8 onward.
The PSYNC command supports full and partial resynchronization modes.
- Full resynchronization: its execution steps are basically the same as those of the SYNC command; synchronization is performed by having the master create and send an RDB file and then send the write commands stored in the buffer to the slave server.
- Partial resynchronization: when a slave server reconnects to the master after a disconnection, and conditions allow, the master can send the slave only the write commands executed during the disconnection; the slave just needs to receive and execute those write commands to bring its database up to the master's current state.
Comparing the two methods: although both SYNC and PSYNC can restore a disconnected master-slave pair to a consistent state, partial resynchronization requires far fewer resources than the SYNC command and completes much faster. SYNC needs the entire RDB file to be generated, transmitted, and loaded, while partial resynchronization only needs the missing write commands to be sent to the slave server for execution.
Implementation of partial resynchronization
The partial resynchronization capability consists of three parts: replication offsets, the replication backlog buffer, and server run IDs.
Replication offset:
Each side of the replication maintains a replication offset that records the number of bytes replicated so far.
- Each time the master propagates N bytes of data to the slave, it adds N to the value of its own replication offset.
- Each time the slave receives N bytes of data propagated from the master, it adds N to its own replication offset.
By comparing the replication offsets of the master and slave servers, the program can easily tell whether they are in a consistent state: if the two offsets differ, the master and slave are not in the same state.
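A tiny worked example of this bookkeeping, using the RESP wire encoding of a SET command as the propagated bytes:

```python
# Both sides start consistent at the same offset.
master_offset = slave_offset = 10086

# The master propagates a RESP-encoded "SET foo bar" (31 bytes)...
payload = b"*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n"
master_offset += len(payload)   # master adds N after propagating N bytes

# ...and the slave receives all of it.
slave_offset += len(payload)    # slave adds N after receiving N bytes

assert master_offset == slave_offset   # equal offsets => consistent state
```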
When a disconnection occurs, the master and slave can achieve partial resynchronization by means of the commands buffered in the replication backlog buffer.
Replication backlog buffer: a fixed-length FIFO queue maintained by the master server, 1MB in size by default.
When the master server performs command propagation, it not only sends each write command to all slave servers but also enqueues the command into the replication backlog buffer. The buffer therefore holds a portion of the most recently propagated write commands, and it records the corresponding replication offset for every byte in the queue. When a slave server reconnects to the master, it sends its own replication offset to the master through the PSYNC command, and the master uses this offset to decide which synchronization operation to perform on the slave.
If the data after the slave's offset is still in the replication backlog buffer, the master performs partial resynchronization on the slave; if that data is no longer in the buffer, a full resynchronization is performed.
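A minimal sketch of this decision. A deque of (offset, byte) pairs stands in for the backlog here purely for clarity (real Redis uses a circular byte buffer); the +CONTINUE and +FULLRESYNC reply names follow the actual PSYNC protocol:

```python
from collections import deque

BACKLOG_SIZE = 1 * 1024 * 1024   # default backlog size: 1MB

class ReplicationBacklog:
    def __init__(self):
        # fixed-length FIFO: the oldest bytes fall off the front when full
        self.buf = deque(maxlen=BACKLOG_SIZE)
        self.master_offset = 0

    def feed(self, data: bytes):
        # every propagated byte is queued along with its replication offset
        for b in data:
            self.master_offset += 1
            self.buf.append((self.master_offset, b))

    def handle_psync(self, slave_offset: int):
        oldest = self.buf[0][0] if self.buf else self.master_offset + 1
        if slave_offset + 1 >= oldest:
            # the bytes after slave_offset are still buffered: partial resync
            missing = bytes(b for off, b in self.buf if off > slave_offset)
            return "+CONTINUE", missing
        return "+FULLRESYNC", None   # too far behind: fall back to full resync
```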
Heartbeat detection
During command propagation, the slave server sends the REPLCONF ACK <replication_offset> command to the master server once per second, where replication_offset is the slave's current replication offset. Sending this command serves three purposes:
- Checking the network connection between the master and slave servers: if the master has not received a REPLCONF ACK from a slave for more than one second, it knows there is a problem with the connection between them.
- Helping implement the min-slaves options: Redis's min-slaves-to-write and min-slaves-max-lag options can be configured so that the master refuses write commands when it has fewer than X connected slaves or when the lag of the slaves exceeds Y seconds (see the sketch after this list).
- Detecting command loss: if a write command sent by the master is lost on the way to the slave because of a network fault, then when the slave next sends the REPLCONF ACK command, the master will notice that the slave's replication offset is smaller than its own. Based on the offset the slave submitted, the master can find the data the slave is missing in the replication backlog buffer and resend it to the slave.
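A sketch of the min-slaves check the master could perform from these heartbeats, assuming a hypothetical per-slave record of when the last REPLCONF ACK arrived:

```python
import time

def can_accept_writes(slaves, min_slaves_to_write=3, min_slaves_max_lag=10):
    """Refuse writes unless enough slaves have acked recently.

    `slaves` is a hypothetical bookkeeping list of dicts whose
    'last_ack_time' field is updated whenever a REPLCONF ACK arrives.
    """
    now = time.time()
    fresh = [s for s in slaves if now - s["last_ack_time"] <= min_slaves_max_lag]
    return len(fresh) >= min_slaves_to_write

slaves = [{"last_ack_time": time.time()} for _ in range(3)]
print(can_accept_writes(slaves))   # True: 3 fresh slaves, lag within 10s
```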
Master-slave replication demo
```
# Note: use SLAVEOF before version 5.0 and REPLICAOF from 5.0 onward
# replica-read-only yes    sets the slave server to read-only
```
The steps are as follows:
```
# 1. Start the master server on port 6379
# 2. Start the slave server on port 12345
# 3. Send SLAVEOF to server 12345 via the CLI client
127.0.0.1:12345> SLAVEOF 127.0.0.1 6379

# Run a write command on the master server through the CLI client
127.0.0.1:6379> SET hello masterAndSlave

# Get the value of the hello key from the slave server
127.0.0.1:12345> GET hello
```
Sentinel
Sentinel is Redis's high-availability solution: a sentinel system composed of one or more sentinel instances can monitor any number of master servers, along with all the slave servers under those masters. When a monitored master server goes offline, the system automatically promotes one of that master's slaves to be the new master, and the new master then takes over from the offline master in processing command requests.
How sentinel mode works
Sentinel is essentially a Redis server running in a special mode that uses sentinel.conf as its configuration file. By default, after the sentinel server is created, it sends the INFO command to the monitored master and slave servers through the command connection once every ten seconds, and obtains the current information of the master and slave servers by analyzing the reply of the INFO command. The sentinel sends publish/subscribe commands to all monitored master and slave servers every two seconds to obtain node information. Once a second, the sentinel sends PING commands to all instances (master, slave, and other sentinels) with which it has created command connections, and determines whether the instance is online by the PING response returned by the instance.
Subjective and objective offline
As mentioned earlier, a sentinel sends PING commands at a rate of once per second. When an instance fails to respond within a certain number of milliseconds for some reason (network fluctuation, an instance outage, etc.), the sentinel deems that instance subjectively offline. This number of milliseconds can be configured with the down-after-milliseconds option in the configuration file.
When a sentinel judges a master server to be subjectively offline, in order to confirm that the master is really offline, it asks the other sentinels that are also monitoring this master whether they too consider it offline (subjectively or objectively). When the sentinel has received enough offline judgments from other sentinel nodes, it marks the master as objectively offline and performs a failover on it. How many agreeing sentinels are enough to judge the master objectively offline can be configured through the quorum parameter in the configuration file.
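A minimal sketch of the objective-offline decision; thinks_master_is_down() is a hypothetical stand-in for polling another sentinel with SENTINEL is-master-down-by-addr:

```python
def is_objectively_down(i_think_down, other_sentinels, quorum):
    """Count agreeing sentinels (including ourselves) against the quorum."""
    if not i_think_down:             # no subjective-offline judgment yet
        return False
    votes = 1                        # our own subjective-offline judgment
    for sentinel in other_sentinels:
        if sentinel.thinks_master_is_down():
            votes += 1
    return votes >= quorum
```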
Different sentinel nodes can have different configurations; that is, they may use different millisecond thresholds for judging subjective offline and different quorum values for judging objective offline.
Electing the lead sentinel
When a master server is judged objectively offline, the sentinels monitoring the offline master negotiate to elect a lead sentinel, which then performs the failover operation on the offline master server.
The election rules are as follows:
- All sentinels that are online are eligible to be elected lead sentinel.
- After every lead sentinel election, each sentinel's configuration epoch (`epoch`) increases by one, whether or not the election succeeded.
- Within one configuration epoch, every sentinel has exactly one chance to set some sentinel as its local lead sentinel, and once set, the local leader cannot be changed within that epoch.
- Every sentinel that finds the master server going objectively offline asks the other sentinels to set itself as their local lead sentinel.
- When one sentinel sends the `SENTINEL is-master-down-by-addr` command to another, and the `runid` parameter in the command is not the `*` symbol but the run ID of the source sentinel, it means the source sentinel is asking the target sentinel to set the former as the latter's local lead sentinel.
- The rule for setting local lead sentinels is first come, first served.
- After receiving the `SENTINEL is-master-down-by-addr` command, the target sentinel returns a command reply to the source sentinel; the `leader_runid` and `leader_epoch` parameters of the reply record the run ID and the configuration epoch of the target sentinel's local lead sentinel, respectively.
- After receiving the reply, the source sentinel checks whether the `leader_epoch` value matches its own configuration epoch; if so, it then reads the `leader_runid` value, and if that value matches the source sentinel's own run ID, the target sentinel has set the source sentinel as its local lead sentinel.
- If some sentinel is set as the local lead sentinel by more than half of all sentinels, that sentinel becomes the lead sentinel.
- Because becoming lead sentinel requires the support of more than half of the sentinels, and each sentinel can set only one local lead sentinel per configuration epoch, only one lead sentinel can emerge in any configuration epoch.
- If no lead sentinel is elected within a given time limit, the sentinels will hold another election after a while, until a lead sentinel is elected.
An example lead sentinel election runs as follows (a voting sketch follows the list):
- Each sentinel that has judged the master subjectively offline sends the `SENTINEL is-master-down-by-addr host port runid` command to the other sentinel nodes, asking to be set as leader.
- A sentinel node receiving the command agrees if it has not yet agreed to the same request from another sentinel (that is, if it has not yet voted); otherwise it refuses.
- If a sentinel finds that its votes have reached the quorum, it becomes the lead sentinel.
- If no sentinel is elected lead sentinel during the process, the sentinels wait a period of time and then re-elect.
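The first-come-first-served voting above can be sketched in a few lines of Python; the SentinelVoter class and run_election helper below are illustrative names, not part of Redis:

```python
class SentinelVoter:
    """First come, first served voting within a configuration epoch."""
    def __init__(self, runid):
        self.runid = runid
        self.epoch = 0
        self.local_leader = None     # at most one local leader per epoch

    def request_vote(self, candidate_runid, candidate_epoch):
        if candidate_epoch > self.epoch:
            self.epoch = candidate_epoch
            self.local_leader = None             # new epoch: vote available again
        if self.local_leader is None and candidate_epoch == self.epoch:
            self.local_leader = candidate_runid  # first request wins the vote
        return self.local_leader, self.epoch     # (leader_runid, leader_epoch)

def run_election(candidate_runid, voters, epoch):
    votes = sum(
        1 for v in voters
        if v.request_vote(candidate_runid, epoch) == (candidate_runid, epoch)
    )
    return votes > len(voters) // 2   # lead sentinel needs more than half
```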
Failover
After the lead sentinel is elected, it performs a failover operation on the offline master server, which consists of the following three steps:
- Out of all slave servers under the offline master, select one slave server and convert it to the master server.
- Reroute all slave servers under the offline master to replicate the new master.
- Demote the offline master server to a slave of the new master, so that when the old master comes back online it becomes a slave server of the new master server.
Sentinel demo
```
# sentinel.conf serves as the configuration file for the sentinel node

# Start the sentinel node and monitor the master server
sentinel monitor mymaster 127.0.0.1 6379 1

# You can also boot this way
redis-server sentinel.conf --sentinel

# Use INFO to obtain master/slave replication information
info replication
```
Cluster
A Redis cluster is a distributed group of servers consisting of multiple master-slave node groups, with replication, high availability, and sharding capabilities. A Redis cluster can perform node removal and failover without sentinels. Each node must be set to cluster mode; this mode has no central node and, according to the official documentation, can scale horizontally and linearly to tens of thousands of nodes (although no more than 1000 nodes are officially recommended). Redis clusters provide better performance and higher availability than the earlier sentinel mode, and cluster configuration is very simple.
How the cluster works
The Redis cluster divides all data into 16384 slots, with each node responsible for a portion of them. Slot information is stored on each node. When a Redis cluster client connects to the cluster, it also obtains a copy of the cluster's slot configuration and caches it locally, so that when it looks up a key it can locate the target node directly. Because the slot information held by the client and the server may become inconsistent, a correction mechanism is needed to verify and adjust the slot information.
Slot location algorithm
By default, the cluster hashes the key with the CRC16 algorithm to obtain an integer value, then takes that value modulo 16384 to get the specific slot:
HASH_SLOT = CRC16(key) mod 16384
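This calculation can be reproduced in a few lines of Python; the bitwise implementation below assumes the CRC16-CCITT (XModem) variant, which is the one Redis cluster uses:

```python
def crc16(data: bytes) -> int:
    """Bitwise CRC16-CCITT (XModem)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    return crc16(key.encode()) % 16384   # HASH_SLOT = CRC16(key) mod 16384

print(hash_slot("hello"))   # prints this key's slot number (0..16383)
```

Note that real Redis additionally supports hash tags: when a key contains a {...} section, only the substring between the braces is hashed, so related keys can be forced into the same slot; the sketch above omits that detail.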
Jump redirection
When a client sends a command to the wrong node, that node will find that the slot containing the command's key is not managed by itself. It then returns a special redirection instruction to the client, carrying the address of the node responsible for the target slot, telling the client to connect to that node to access the data. On receiving the instruction, the client not only redirects to the correct node but also updates and corrects its local slot mapping cache, so that all subsequent keys use the new slot mapping table.
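A minimal sketch of client-side MOVED handling, reusing hash_slot() from above; connections, slot_cache, and the send() helper are hypothetical stand-ins for a real client's connection pool and protocol layer:

```python
def cluster_get(connections, slot_cache, key):
    """connections: {addr: conn}; slot_cache: {slot: addr};
    send() is a hypothetical helper that issues a command and
    returns either the value or a raw error string."""
    addr = slot_cache.get(hash_slot(key), next(iter(connections)))
    reply = send(connections[addr], "GET", key)
    if isinstance(reply, str) and reply.startswith("MOVED"):
        _, slot, new_addr = reply.split()    # e.g. "MOVED 866 127.0.0.1:8002"
        slot_cache[int(slot)] = new_addr     # correct the local slot cache
        reply = send(connections[new_addr], "GET", key)  # retry on the owner
    return reply
```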
Resharding
A Redis cluster resharding operation can reassign any number of slots from one node to another, moving the key-value pairs of the affected slots from the source node to the target node. Resharding can be performed online: the cluster does not go offline, and both the source and target nodes continue to process command requests.
Resharding is performed by Redis's cluster management tool, redis-trib. Redis provides all the commands that resharding requires, and redis-trib carries out the operation by sending those commands to the source and target nodes.
The steps are as follows:
- redis-trib sends the `CLUSTER SETSLOT <slot> IMPORTING <source_id>` command to the target node, preparing it to import the key-value pairs belonging to the slot from the source node.
- redis-trib sends the `CLUSTER SETSLOT <slot> MIGRATING <target_id>` command to the source node, preparing it to migrate the key-value pairs belonging to the slot to the target node.
- redis-trib sends the `CLUSTER GETKEYSINSLOT <slot> <count>` command to the source node to obtain up to `count` key names of key-value pairs belonging to the slot.
- For each key name obtained in step 3, redis-trib sends a `MIGRATE <target_ip> <target_port> <key_name> 0 <timeout>` command to the source node, migrating the selected key atomically from the source node to the target node.
- Steps 3 and 4 are repeated until all key-value pairs belonging to the slot have been migrated from the source node to the target node.
- redis-trib sends the `CLUSTER SETSLOT <slot> NODE <target_id>` command to any node in the cluster, assigning the slot to the target node. This assignment is propagated to the entire cluster through messages, and eventually every node in the cluster knows that the slot has been assigned to the target node. (A sketch of this sequence follows the list.)
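Assuming the redis-py client library, the six steps above could be replayed for a single slot roughly as follows; this is a sketch of what redis-trib does, not its actual code:

```python
import redis  # assumes the redis-py client library

def migrate_slot(src, dst, slot, src_id, dst_id, dst_ip, dst_port, timeout=1000):
    """src and dst are redis.Redis connections to the two nodes."""
    dst.execute_command("CLUSTER", "SETSLOT", slot, "IMPORTING", src_id)   # step 1
    src.execute_command("CLUSTER", "SETSLOT", slot, "MIGRATING", dst_id)   # step 2
    while True:
        keys = src.execute_command("CLUSTER", "GETKEYSINSLOT", slot, 100)  # step 3
        if not keys:
            break   # step 5: loop until the slot holds no more keys
        for key in keys:
            # step 4: atomically move each key to the target node
            src.execute_command("MIGRATE", dst_ip, dst_port, key, 0, timeout)
    # step 6: announce the slot's new owner
    for node in (src, dst):
        node.execute_command("CLUSTER", "SETSLOT", slot, "NODE", dst_id)
```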
The ASK error
During resharding, while the source node is migrating a slot to the target node, a moment may arise when some of the slot's key-value pairs are still on the source node while others have already been moved to the target node. If a client then sends the source node a command about a key that belongs to the migrating slot, the source node first looks for the key in its own database: if the key is found, the node executes the command and replies directly; if not, the key may already have been migrated to the target node, so the source node returns an ASK error, directing the client to the target node that is importing the slot and telling it to resend the command there.
A client receiving the ASK error turns to the target node importing the slot, using the IP address and port number carried by the error; it first sends an ASKING command to the target node and then resends the original command. (Without the preceding ASKING, the target node would reply with a MOVED error instead.)
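Extending the MOVED sketch above, ASK handling differs in exactly one way: the redirect is one-shot, so the slot cache is not updated, and an ASKING command must precede the retried command:

```python
def handle_redirect(connections, slot_cache, reply, key):
    """Dispatch on MOVED vs ASK; send() is the same hypothetical helper."""
    kind, slot, addr = reply.split()        # e.g. "ASK 866 127.0.0.1:8002"
    if kind == "MOVED":
        slot_cache[int(slot)] = addr        # MOVED: correct the cache
    else:                                   # ASK: one-shot, cache untouched
        send(connections[addr], "ASKING")   # without ASKING we'd get MOVED back
    return send(connections[addr], "GET", key)
```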
Replication and failover
Each node in the cluster periodically sends PING messages to the other nodes to check whether they are online. If a node that receives a PING does not return a PONG message within the specified time, the node that sent the PING marks it as suspected offline (PFAIL).
Each node in a cluster sends messages to each other to exchange status information about each node in the cluster, for example, whether a node is online, suspected to be offline (PFAIL), or offline (FAIL).
If, in a cluster, more than half of the primary nodes responsible for processing slots report a primary node X as suspected offline, that primary node X will be marked as offline, and all nodes that mark primary node X as offline will broadcast a FAIL message to the cluster about primary node X. All nodes that receive this FAIL message immediately mark primary node X as offline.
Failover
When a slave node discovers that the master node it is replicating has entered the offline (FAIL) state, it begins to fail over the offline master as follows:
- Of all the slave nodes replicating the offline master, one slave node is selected.
- The selected slave node executes the `SLAVEOF no one` command and becomes the new master node.
- The new master revokes all slot assignments of the offline master and assigns all of those slots to itself.
- The new master broadcasts a PONG message to the cluster, letting the other nodes know immediately that this slave has become a master and has taken over the slots previously handled by the offline node.
- The new master node starts accepting command requests related to the slots it is responsible for, and the failover is complete.
Electing a new master node
The process is similar to the lead sentinel election; the steps are as follows:
- The secondary node discovers that its primary node becomes the FAIL state
- Add one to the currentEpoch and broadcast the FAILOVER_AUTH_REQUEST message
- The other nodes receive this message, and only the master node responds by determining the legitimacy of the requester and sending FAILOVER_AUTH_ACK, which is sent only once in each epoch
- The secondary node that attempts failover collects the FAILOVER_AUTH_ACK returned by other primary nodes
- A slave node that receives ACKs from more than half of the master nodes becomes the new master node. (This is why a cluster requires at least three master nodes: with only two, when one goes down, the single remaining master cannot form a majority and no new master can be elected.) If no new master is elected within this epoch, the election is rerun in a new epoch.
- The new master node broadcasts the PONG message to inform the other cluster nodes
Messages
Redis cluster nodes communicate with one another through the gossip protocol, exchanging the following five kinds of messages:
- MEET: A node sends a MEET to a new node to join the cluster, and then the new node starts communicating with other nodes.
- PING: Each node frequently sends PING messages to other nodes, which contain its own status and cluster metadata maintained by it. Each node can exchange metadata (such as adding and removing cluster nodes and hash slot information) through PING.
- PONG: The return of PING and MEET messages, containing its own status and other information, can also be used for information broadcasts and updates.
- FAIL: After judging that another node fails, a node sends a FAIL message to inform other nodes that the specified node is down.
- PUBLISH: When a node receives a PUBLISH command, it executes the command and broadcasts a PUBLISH message to the cluster. All nodes that receive the PUBLISH message execute the same PUBLISH command.
Cluster demo
```
# Start the cluster
redis-cli --cluster create --cluster-replicas 1 127.0.0.1:8001 127.0.0.1:8002 127.0.0.1:8003 127.0.0.1:8004 127.0.0.1:8005 127.0.0.1:8006

# Get cluster information
cluster info
cluster nodes
```
Conclusion
In the previous article, I covered the usage and basic principles of the standalone Redis database, including persistence and events. A standalone database, however, faces four problems: read pressure, write pressure, data backup, and fault self-healing. In this article we studied master-slave replication, sentinel, and cluster: master-slave replication solves read pressure and data backup; sentinel mode, built on top of master-slave replication, solves fault self-healing; and the cluster, which incorporates the capabilities of both master-slave replication and sentinel, solves write pressure through its sharding and slot allocation design.