Preface
Redis is usually not deployed as a single instance, since that would create a single point of failure. So what high-availability solutions does Redis offer?
Master-slave replication
Users can use the SLAVEOF command, or a configuration option, to make one server replicate another. The server being replicated is called the master server, and the server doing the replicating is called the slave server. Keys written on the master can then be read on the slave.
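For example, running the following on one server makes it a slave of the server at 127.0.0.1:6379 (the address here is illustrative); the same effect can be had with a slaveof 127.0.0.1 6379 line in its configuration file:

SLAVEOF 127.0.0.1 6379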
The replication process is divided into two steps: synchronization and command propagation.
Synchronization
Synchronization updates the database state of the slave server to the current database state of the master server.
When a client sends the SLAVEOF command to the slave server, the slave synchronizes with the master using the SYNC command, as follows:
- The slave server sends the SYNC command to the master server.
- On receiving the SYNC command, the master executes the BGSAVE command, generating an RDB file in the background, and uses a buffer to record every write command executed from that moment on.
- When BGSAVE finishes, the master sends the generated RDB file to the slave. The slave receives and loads the RDB file, bringing its database state up to the master's state as of when BGSAVE was executed.
- The master then sends all the write commands in the buffer to the slave; the slave executes them, bringing its database state up to the master's current state.
Command propagation
After synchronization, the database states of the master and slave are consistent. But every time the master receives a write command from a client, the two databases diverge again. Consistency is then restored through command propagation: the master sends every write command it executes on to the slave.
PSYNC synchronization optimization
Before Redis 2.8, every synchronization was a full synchronization. But if a slave is disconnected only briefly, there is no need to start from scratch; only the data missed during the disconnection needs to be synchronized. So from version 2.8, PSYNC is used instead of SYNC.
PSYNC has two modes, full synchronization and partial synchronization: full synchronization handles first-time synchronization, and partial synchronization handles disconnection and reconnection.
Implementation of partial synchronization
Partial synchronization relies on the following three parts:
- The replication offsets of the master and slave servers
- The master server's replication backlog buffer
- Running ID of the server
Replication offsets
Replication offset of the master server: each time the master propagates N bytes of data to the slave, it adds N to its own replication offset. Replication offset of the slave server: each time the slave receives N bytes of data propagated by the master, it adds N to its own replication offset. If the master and slave are in a consistent state, their offsets are always equal; if the offsets differ, they are in an inconsistent state.
Replication backlog buffer
The replication backlog buffer is a fixed-length FIFO queue maintained by the master server, 1MB by default. When the maximum length is reached, the oldest element is evicted to make room for the newly enqueued element.
Write commands are propagated not only to the slave server but also into the replication backlog buffer. When a slave reconnects to the master, it sends its replication offset to the master via the PSYNC command, and the master decides between partial and full synchronization based on that offset: if the data after the offset is still in the replication backlog buffer, partial synchronization is used; otherwise, full synchronization is used.
(The book does not say how this is determined; my guess is that it is the master's replication offset minus the slave's: if the difference exceeds the 1MB backlog size, the data is no longer in the backlog.)
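Under that guess, the master's partial-versus-full decision can be sketched in C like this (an illustrative sketch, not the Redis source; real PSYNC additionally checks the run ID, described in the next section):

#include <stdint.h>

typedef struct {
    uint64_t master_repl_offset;  /* the master's current replication offset */
    uint64_t backlog_histlen;     /* bytes of history currently held in the backlog */
} ReplState;

/* Returns 1 if a partial resync can serve this slave, 0 if a full resync is needed. */
int can_partial_resync(const ReplState *m, uint64_t slave_offset) {
    if (slave_offset > m->master_repl_offset)
        return 0;  /* slave claims more data than the master has: full resync */
    uint64_t missing = m->master_repl_offset - slave_offset;  /* bytes the slave missed */
    return missing <= m->backlog_histlen;  /* partial only if they are all still buffered */
}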
Running ID of the server
When a server starts, a random 40-character hexadecimal string is generated as the server's run ID.
When a slave replicates a master for the first time, the master passes its run ID to the slave and the slave saves it. When the slave disconnects and reconnects, it sends the saved run ID to the master: if the saved run ID matches the master's current run ID, partial synchronization is attempted; if it differs, full synchronization is performed.
The overall process of PSYNC
Heartbeat detection
During command propagation, the slave server by default sends the REPLCONF ACK <replication_offset> command to the master once per second, where replication_offset is the slave's current replication offset. Sending the REPLCONF ACK command has three effects for the master and slave:
- Checking the network connection state of the master and slave.
- Helping implement the min-slaves options.
- Detecting command loss.
Checking the network connection state of the master and slave
The master and slave can check whether the network connection is working by sending and receiving REPLCONF ACK commands: if the master has not received a REPLCONF ACK from a slave for more than one second, it knows the connection to that slave has a problem.
Helping implement the min-slaves options
Redis' min-slaves-to-write and min-slaves-max-lag options prevent the master from executing write commands in unsafe situations.
min-slaves-to-write 3
min-slaves-max-lag 10
With the configuration above, the master refuses to execute write commands if the number of slaves is less than 3, or if all three slaves have a lag of 10 seconds or more.
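A minimal sketch of that check (illustrative, not the Redis source), assuming lag is measured from each slave's last REPLCONF ACK:

#include <time.h>

typedef struct {
    time_t last_ack;  /* when this slave's last REPLCONF ACK arrived */
} Slave;

/* Writes are allowed only if at least min_to_write slaves have a lag
 * below max_lag seconds. */
int write_allowed(const Slave *slaves, int nslaves, int min_to_write, int max_lag) {
    time_t now = time(NULL);
    int good = 0;
    for (int i = 0; i < nslaves; i++)
        if (now - slaves[i].last_ack < max_lag) good++;
    return good >= min_to_write;
}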
Detecting command loss
If a write command propagated from the master to the slave is lost in transit because of a network fault, then when the slave sends the REPLCONF ACK command, the master will notice that the slave's current replication offset is smaller than its own. Using the slave's offset as a reference, the master can find the missing data in the replication backlog buffer and resend it to the slave.
Master/slave replication summary
Master-slave replication is essentially keeping an extra copy of the data: even with RDB and AOF persistence, the whole machine hosting the master can go down. With master-slave replication the slave can be deployed on a different machine, so even if the master's machine dies, you can manually switch over to the slave and keep serving.
Sentinel
With master-slave replication the data is backed up on both servers, but when the master fails you must manually switch a slave over to master. Sentinel switches a slave over to master automatically when the master fails.
The Sentinel system can monitor all the master and slave servers. Suppose server1 goes offline: when server1's offline duration exceeds the user-configured upper limit, the Sentinel system fails over server1:
- The Sentinel system first picks one of the slave servers under server1 and promotes it to be the new master.
- The Sentinel system then sends new replication commands to all the slaves under server1, making them slaves of the new master. The failover is complete once all the slaves are replicating the new master.
- Sentinel also keeps watching the offline server1; when it comes back online, it is configured as a slave of the new master.
Initializing the Sentinel state
struct sentinelState {
    // The Sentinel's run ID
    char myid[CONFIG_RUN_ID_SIZE+1];
    // Current epoch, used for failover
    uint64_t current_epoch;
    // All master servers monitored by this Sentinel
    // key: master name; value: pointer to a sentinelRedisInstance structure
    dict *masters;
    // Whether TILT mode is on
    int tilt;
    // Number of scripts currently being executed
    int running_scripts;
    // Time at which TILT mode was entered
    mstime_t tilt_start_time;
    // Last time the time event handler ran
    mstime_t previous_time;
    // FIFO queue containing all user scripts waiting to be executed
    list *scripts_queue;
    char *announce_ip;
    int announce_port;
    unsigned long simfailure_flags;
    int deny_scripts_reconfig;
    char *sentinel_auth_pass;
    char *sentinel_auth_user;
    int resolve_hostnames;
    int announce_hostnames;
} sentinel;
Initializing the masters attribute of the Sentinel state
masters records information about every master server monitored by Sentinel: the dictionary keys are the names of the monitored servers, and the values are the corresponding sentinelRedisInstance structures. A sentinelRedisInstance is an instance monitored by the Sentinel server; it can be a master server, a slave server, or another Sentinel instance.
typedef struct sentinelRedisInstance {
    // Flag value recording the instance's type and current state
    int flags;
    // Instance name
    // A master's name is set in the configuration file;
    // slave and sentinel names are set automatically by Sentinel, in ip:port format
    char *name;
    // Instance run ID
    char *runid;
    // Configuration epoch, used for failover
    uint64_t config_epoch;
    // Instance address
    sentinelAddr *addr;
    // Milliseconds with no valid reply after which the instance is considered subjectively down
    mstime_t down_after_period;
    // Number of sentinels that must agree before this master is judged objectively down
    unsigned int quorum;
    // Number of slaves that may be reconfigured to replicate the new master at the same time during failover
    int parallel_syncs;
    // Maximum time allowed for a failover
    mstime_t failover_timeout;
    // Other sentinels monitoring the same master
    // key: sentinel name in ip:port format; value: the sentinel's instance structure
    dict *sentinels;
    // ...
} sentinelRedisInstance;
Create a network connection to the primary server
The final step of initializing Sentinel is creating network connections to the monitored master servers. Two connections are created to each master: a command connection, dedicated to sending commands to the master and receiving command replies, and a subscription connection, dedicated to subscribing to the master's __sentinel__:hello channel.
Get master server information
By default, Sentinel sends the INFO command to the monitored master every 10 seconds over the command connection and obtains the master's current information from the reply, including:
- The master's run_id
- Information about all the slaves under the master
With this information, Sentinel can update the name and runid fields of the corresponding sentinelRedisInstance structures.
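An abridged, illustrative INFO reply from a master might look like this (all values are made up):

run_id:53b9b28df8042fdc9ab5e3fcaf966fb70e2e2552
role:master
slave0:ip=127.0.0.1,port=6380,state=online,offset=43,lag=0
slave1:ip=127.0.0.1,port=6381,state=online,offset=43,lag=0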
Get slave server information
Sentinel also creates command and subscription connections to the slave server.
By default, Sentinel sends the INFO command to the slave every 10 seconds over the command connection and obtains the slave's current information from the reply, including:
- The slave's run ID
- The slave's role
- The master's IP address and port
- The state of the master-slave link, master_link_status
- The slave's priority, slave_priority
- The slave's replication offset, slave_repl_offset
Based on the INFO reply, Sentinel can update the slave's instance structure.
Sending messages to the subscription channels of masters and slaves
By default, Sentinel publishes a message in the following format to the __sentinel__:hello channel of every monitored master and slave every 2 seconds:

PUBLISH __sentinel__:hello "<s_ip>,<s_port>,<s_runid>,<s_epoch>,<m_name>,<m_ip>,<m_port>,<m_epoch>"

- s_ip: Sentinel's IP address
- s_port: Sentinel's port number
- s_runid: Sentinel's run ID
- s_epoch: Sentinel's current configuration epoch
- m_name: the master server's name
- m_ip: the master server's IP address
- m_port: the master server's port number
- m_epoch: the master server's current configuration epoch
Messages sent to the __sentinel__:hello channel are also received by every other Sentinel monitoring the same server (including the sender itself).
Create command connections to other Sentinels
Sentinels create command connections with each other; the Sentinels monitoring the same master form an interconnected network.
No subscription connections are created between Sentinels.
Detect subjective offline status
Once a second, Sentinel sends a PING command to all the instances it has command connections to (masters, slaves, and other Sentinels) and judges from the reply whether each instance is online. A valid reply is one of +PONG, -LOADING, or -MASTERDOWN; an invalid reply is anything else, or no reply within the specified time. If an instance keeps returning invalid replies to Sentinel for down-after-milliseconds, Sentinel modifies the instance structure by turning on the SRI_S_DOWN flag in its flags attribute, indicating that the instance is in the subjectively offline state. (down-after-milliseconds can be set in Sentinel's configuration file.)
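For reference, the relevant Sentinel configuration directives look like this (the master name, address, quorum, and timeout are illustrative):

sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 30000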
Detect objective offline status
When Sentinel judges a master to be subjectively offline, it asks the other Sentinels monitoring the same master whether they also consider it offline. If the number that agree reaches a configured threshold, the master is judged objectively offline.
Asking other Sentinels whether they agree that the master is offline
SENTINEL is-master-down-by-addr <ip> <port> <current_epoch> <runid>
A Sentinel queries other Sentinels with the SENTINEL is-master-down-by-addr command. The parameters mean the following:
- ip: the IP address of the master being checked
- port: the port number of the master being checked
- current_epoch: the asking Sentinel's current configuration epoch
- runid: either *, meaning the command is only being used to check the master's objective offline state, or the asking Sentinel's run ID, meaning it is also asking to be elected lead Sentinel
Receiving the SENTINEL is-master-down-by-addr command
After receiving the SENTINEL is-master-down-by-addr command, a Sentinel checks, based on the master's IP and port, whether it also considers the master offline, and then returns a Multi Bulk reply containing three parameters: down_state, leader_runid, and leader_epoch. The asking Sentinel counts how many Sentinels consider the master offline; when the count reaches the configured quorum, it turns on the SRI_O_DOWN flag in the master's flags attribute, indicating that the master has entered the objectively offline state.
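An illustrative reply (values hypothetical): down_state is 1 when the target Sentinel also considers the master offline; leader_runid is * when the command was only an offline check; leader_epoch is meaningful only when leader_runid is not *:

1) (integer) 1
2) "*"
3) (integer) 0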
Electing the lead Sentinel
When a master is judged objectively offline, the Sentinels monitoring it negotiate to elect a lead Sentinel, which performs the failover.
After confirming that the master is objectively offline, the SENTINEL is-master-down-by-addr command is sent again, this time to elect the lead Sentinel.
Election rules
- Every one of the online Sentinels monitoring the same master may become the lead Sentinel.
- After each lead Sentinel election, whether or not it succeeds, every Sentinel's configuration epoch is incremented by one. (The configuration epoch is really just a counter.)
- Within one configuration epoch, every Sentinel has one chance to set some Sentinel as its local lead Sentinel; once set in that epoch, it cannot be changed.
- Every Sentinel that finds the master objectively offline asks the other Sentinels to set itself as their local lead Sentinel; that is, it sends the SENTINEL is-master-down-by-addr command, trying to get the others to set it as their local lead.
- When the runid parameter of a SENTINEL is-master-down-by-addr command one Sentinel sends to another is not * but the source Sentinel's own run ID, the source Sentinel is asking the target Sentinel to set it as the local lead Sentinel.
- The rule for setting the local lead is first come, first served: the first request sets the local lead Sentinel, and all later requests in that epoch are rejected (see the sketch after this list).
- After receiving a SENTINEL is-master-down-by-addr command, the target Sentinel returns a reply to the source Sentinel: the leader_runid and leader_epoch parameters in the reply record the run ID and configuration epoch of the target Sentinel's local lead Sentinel.
- After receiving the reply, the source Sentinel checks whether the returned configuration epoch matches its own; if so, it further checks whether the returned leader run ID matches its own run ID. If both match, the target Sentinel has set the source Sentinel as its local lead.
- A Sentinel becomes the lead Sentinel when more than half of the Sentinels have set it as their local lead Sentinel.
- Because the lead Sentinel needs the support of more than half, and each Sentinel may set a local lead only once per configuration epoch, only one lead Sentinel can appear in a given configuration epoch.
- If no lead Sentinel is elected within a time limit (no Sentinel wins more than half the votes), the election is run again after a while, until a lead Sentinel is elected.
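The first-come-first-served rule above can be pictured in C like this (a minimal sketch, not the Redis source; the structure and function names are invented for illustration):

#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t epoch;        /* the epoch this vote belongs to */
    char leader_runid[41]; /* run ID of the local lead sentinel, "" if none yet */
} Vote;

/* Handle a vote request; writes the run ID of whoever holds the vote
 * into out (caller provides a buffer of at least 41 bytes). */
void handle_vote_request(Vote *v, uint64_t req_epoch,
                         const char *candidate_runid, char *out) {
    if (req_epoch > v->epoch) {       /* new epoch: forget the old vote */
        v->epoch = req_epoch;
        v->leader_runid[0] = '\0';
    }
    if (req_epoch == v->epoch && v->leader_runid[0] == '\0') {
        /* first request in this epoch wins the vote */
        strncpy(v->leader_runid, candidate_runid, sizeof(v->leader_runid) - 1);
        v->leader_runid[sizeof(v->leader_runid) - 1] = '\0';
    }
    strcpy(out, v->leader_runid);     /* later requests get the recorded leader */
}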
Failover
Failover involves the following three steps:
- Of all the slaves under the offline master, select one and convert it into a master.
- Make all the other slaves under the offline master replicate the new master.
- Set the offline master as a slave of the new master, so that when it comes back online it becomes a slave of the new master.
Select the new master server
Select one slave from the offline master's slaves and send it the SLAVEOF no one command, converting it into a master.
Rules for picking a new master server
The lead Sentinel saves all the slaves of the offline master into a list and then filters this list to pick the new master.
- Delete from the list all slaves that are offline or disconnected.
- Delete all slaves that have not replied to the lead Sentinel's INFO command within the last 5 seconds.
- Delete all slaves that were disconnected from the offline master for more than down-after-milliseconds * 10 milliseconds.
- Sort the remaining slaves by slave priority and select the one with the highest priority.
- If several slaves share the highest priority, sort them by replication offset and select the one with the largest offset (the largest replication offset means its data is the most up to date).
- If the replication offsets are also equal, sort by run ID and select the slave with the smallest run ID (see the sketch after this list).
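A minimal sketch of ranking the filtered candidates (illustrative, not the Redis source). Note that in real Redis a smaller slave_priority value means a higher priority, so "highest priority" corresponds to the smallest number:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int priority;         /* slave_priority: smaller value = higher priority */
    uint64_t repl_offset; /* slave_repl_offset: larger = more up-to-date data */
    char runid[41];       /* run ID: lexicographically smaller wins last */
} Candidate;

static int cmp_candidates(const void *a, const void *b) {
    const Candidate *x = a, *y = b;
    if (x->priority != y->priority) return x->priority - y->priority;
    if (x->repl_offset != y->repl_offset)
        return (y->repl_offset > x->repl_offset) ? 1 : -1;
    return strcmp(x->runid, y->runid);
}

/* Sorts the candidates so the best choice ends up at index 0. */
Candidate *pick_new_master(Candidate *c, size_t n) {
    if (n == 0) return NULL;
    qsort(c, n, sizeof(Candidate), cmp_candidates);
    return &c[0];
}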
After sending SLAVEOF no one, the lead Sentinel sends INFO to the promoted slave once per second (instead of the usual once every 10 seconds). Once the role in the reply changes from slave to master, the lead Sentinel knows the slave has been promoted to master.
Change the replication target of the slaves
The SLAVEOF command is used to make the remaining slaves replicate the new master. When Sentinel detects that the old master has come back online, it sends SLAVEOF to it as well, making it a slave of the new master.
Sentinel summary
Sentinel is essentially a monitoring system: after the master goes offline, a lead Sentinel is chosen through an election, and the lead Sentinel switches one of the offline master's slaves over to master, so no manual switching is needed.
Cluster
Sentinel mode automates the master/slave switch, but there is still only one master handling writes (Sentinel can also monitor multiple masters, but then clients must implement the load balancing themselves). Redis also officially provides its own way to implement a cluster.
Nodes
Each Redis service instance is a node, and multiple connected nodes form a cluster.
CLUSTER MEET <ip> <port>
Sending the CLUSTER MEET command to a node makes it shake hands with the target node; once the handshake succeeds, the target node is added to the current cluster.
Start node
When a Redis server starts, it decides whether to enable cluster mode based on whether the cluster-enabled configuration option is set to yes.
Cluster data structure
Each node uses a clusterNode structure to record its own state, and creates a corresponding clusterNode structure for every other node in the cluster to record that node's state.
typedef struct clusterNode {
    // Node creation time
    mstime_t ctime;
    // Node name
    char name[CLUSTER_NAMELEN];
    // Node flags
    // Various flag values record the node's role (e.g. master or slave)
    // and its current state (e.g. online or offline)
    int flags;
    // The node's current configuration epoch, used for failover
    uint64_t configEpoch;
    // Node IP address
    char ip[NET_IP_STR_LEN];
    // Connection information for this node
    clusterLink *link;
    // Offline reports about this node
    list *fail_reports;
    // ...
} clusterNode;
clusterLink holds the information needed for connecting to a node:
typedef struct clusterLink {
    // ...
    // Connection creation time
    mstime_t ctime;
    // The node associated with this connection, or NULL if none
    struct clusterNode *node;
    // ...
} clusterLink;
Each node also keeps a clusterState structure, which records the current state of the cluster from that node's perspective, such as whether the cluster is online or offline and how many nodes it contains.
typedef struct clusterState {
    // Pointer to the current node's own clusterNode structure
    clusterNode *myself;
    // The cluster's current configuration epoch, used for failover
    uint64_t currentEpoch;
    // Cluster state: online or offline
    int state;
    // Number of nodes in the cluster handling at least one slot
    int size;
    // Cluster node list (including the myself node)
    // key: node name; value: the node's clusterNode structure
    dict *nodes;
} clusterState;
Implementation of the CLUSTER MEET command
CLUSTER MEET <ip> <port>
- Node A creates a clusterNode structure for node B and adds it to its own clusterState.nodes dictionary.
- Node A then sends a MEET message to node B at the IP address and port given in the CLUSTER MEET command.
- If all goes well, node B receives the MEET message, creates a clusterNode structure for node A, and adds it to its own clusterState.nodes dictionary.
- Node B then returns a PONG message to node A.
- If all goes well, node A receives the PONG message, from which it knows that node B received its MEET message.
- Node A then returns a PING message to node B.
- If all goes well, node B receives the PING message, from which it knows that node A received its PONG message; the handshake is complete.
Slot assignment
The cluster's entire database is divided into 16384 slots; every key belongs to one of these slots, and every node in the cluster handles anywhere from 0 up to 16384 of them. When every one of the 16384 slots has a node handling it, the cluster is online; otherwise it is offline.
CLUSTER ADDSLOTS <slot> [slot ...]
The CLUSTER ADDSLOTS command assigns slots to the current node. For example, CLUSTER ADDSLOTS 0 1 2 3 4 assigns slots 0 through 4 to the current node.
Recording a node's slot assignment
The slots and numslots attributes of the clusterNode structure record which slots the node is responsible for:
typedef struct clusterNode {
unsigned char slots[CLUSTER_SLOTS/8];
int numslots;
// ...
} clusterNode;
slots is a bit array of 16384 bits; if bit i is 1, the node handles slot i. The numslots attribute records the number of slots the node handles, i.e. the number of bits in slots whose value is 1.
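How such a bitmap is typically tested and set can be sketched in C (the helper names are invented for illustration, not the Redis source):

/* Slot i lives at bit (i & 7) of byte (i >> 3) of the 2048-byte array. */
static int slot_test(const unsigned char *slots, int i) {
    return (slots[i >> 3] >> (i & 7)) & 1;
}

static void slot_set(unsigned char *slots, int i) {
    slots[i >> 3] |= (unsigned char)(1 << (i & 7));
}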
Propagate slot assignment information for a node
Besides recording its own slots in clusterNode, a node also sends its slots array to the other nodes in the cluster to tell them which slots it is currently handling. In addition, each node records the assignment of every slot in the cluster in the clusterState structure:
typedef struct clusterState {
clusterNode *slots[CLUSTER_SLOTS];
} clusterState;
The slots array contains 16384 entries; each entry is a pointer to the clusterNode of the node the slot is assigned to, or NULL if the slot is not assigned to any node.
Implementation of the CLUSTER ADDSLOTS command
Execute commands in the cluster
When a client sends a database-related command to a node, the receiving node calculates which slot the key to be handled belongs to and checks whether that slot is assigned to itself.
If the slot is assigned to itself, the node executes the command directly. If not, the node returns a MOVED error to the client, directing it to the correct node, and the client resends the command there.
Which slot does the key belong to
The slot is computed as CRC16(key) & 16383: CRC16(key) computes the CRC16 checksum of the key, and & 16383 takes the result modulo 16384, yielding a slot number between 0 and 16383.
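A runnable sketch of the slot computation. The bitwise routine below implements CRC16-CCITT/XMODEM (polynomial 0x1021, initial value 0), the CRC16 variant Redis Cluster uses; Redis itself uses a table-driven version, and the key here is made up:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint16_t crc16(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i]) << 8;
        for (int j = 0; j < 8; j++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

int main(void) {
    const char *key = "user:1000";              /* illustrative key */
    int slot = crc16(key, strlen(key)) & 16383; /* slot number in 0..16383 */
    printf("key %s -> slot %d\n", key, slot);
    return 0;
}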
Check whether the slot is handled by the current node
After computing the slot number i for the key, a node can check whether it handles that slot itself: if clusterState.slots[i] equals clusterState.myself, the node can execute the command directly; if not, clusterState.slots[i] points to the clusterNode holding the responsible node's IP and port, and the node returns a MOVED error directing the client to that node.
A client in cluster mode (for example, redis-cli started with -c) does not print the MOVED error; it is redirected automatically.
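For example, a client in single-machine mode might see a reply like this (the slot number and address are illustrative), meaning slot 10086 is served by the node at 127.0.0.1:7002:

(error) MOVED 10086 127.0.0.1:7002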
Resharding
Redis cluster resharding allows any number of slots assigned to one node to be reassigned to another node, with the key-value pairs of those slots moved from the source node to the target node. Resharding happens online: the cluster stays up during the process, and both the source and target nodes keep serving command requests. Resharding of a Redis cluster is carried out by redis-trib, and proceeds as follows:
- redis-trib sends the target node a
CLUSTER SETSLOT <slot> IMPORTING <source_id>
command, preparing the target node to import the key-value pairs of the slot from the source node.
- redis-trib sends the source node a
CLUSTER SETSLOT <slot> MIGRATING <target_id>
command, preparing the source node to migrate the key-value pairs of the slot to the target node.
- redis-trib sends the source node a
CLUSTER GETKEYSINSLOT <slot> <count>
command to obtain the names of up to count keys belonging to the slot.
- For each key name obtained in the previous step, redis-trib sends the source node a
MIGRATE <target_ip> <target_port> <key_name> 0 <timeout>
command, migrating the selected key-value pair from the source node to the target node.
- Steps 3 and 4 are repeated until every key-value pair of the slot on the source node has been migrated to the target node.
- redis-trib sends any node in the cluster a
CLUSTER SETSLOT <slot> NODE <target_id>
command, assigning the slot to the target node. This assignment is eventually propagated to the whole cluster through messages (a concrete example follows).
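Put together, migrating one slot (say slot 100) from a source node to a target node comes down to a sequence like this, where the node IDs, address, key name, and counts are all hypothetical:

CLUSTER SETSLOT 100 IMPORTING <source_id>    (sent to the target node)
CLUSTER SETSLOT 100 MIGRATING <target_id>    (sent to the source node)
CLUSTER GETKEYSINSLOT 100 10                 (sent to the source node)
MIGRATE 127.0.0.1 7001 somekey 0 5000        (sent once per key returned)
CLUSTER SETSLOT 100 NODE <target_id>         (sent to any node)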
Implementation of the CLUSTER SETSLOT IMPORTING command
typedef struct clusterState {
// ...
clusterNode *importing_slots_from[CLUSTER_SLOTS];
} clusterState;
importing_slots_from records the slots the current node is importing from other nodes: if importing_slots_from[i] is not NULL, it points to the clusterNode structure of the source node given by the CLUSTER SETSLOT <i> IMPORTING <source_id> command.
Implementation of the CLUSTER SETSLOT MIGRATING command
typedef struct clusterState {
// ...
clusterNode *migrating_slots_to[CLUSTER_SLOTS];
} clusterState;
migrating_slots_to records the slots the current node is migrating to other nodes: if migrating_slots_to[i] is not NULL, it points to the clusterNode structure of the target node of the migration.
The ASK error
During resharding, while the source node migrates a slot to the target node, some of the key-value pairs belonging to that slot may still be on the source node while others are already on the target node. Suppose a client sends a command about a key in that slot to the source node: the source node first looks for the key in its own database and executes the command if it finds it. If not, the node checks migrating_slots_to[i] to see whether the slot is being migrated, and if so returns an ASK error directing the client to the target node.
ASKING
After receiving the ASK error, the client executes the ASKING command and then resends the original command to the target node. The ASKING command turns on the REDIS_ASKING flag of the client that sends it. Normally, if a node finds that the slot a command refers to is not assigned to itself, it returns a MOVED error; but if importing_slots_from[i] shows that the node is importing slot i, and the client that sent the command carries the REDIS_ASKING flag, the node executes the command about slot i this one time.
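An illustrative exchange during migration (the slot number, address, and key are hypothetical):

(error) ASK 16330 127.0.0.1:7003    (returned by the source node)
ASKING                              (the client sends this to 127.0.0.1:7003)
GET somekey                         (the client resends the original command)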
Failover of the cluster
Cluster failover works much like Sentinel mode: a slave node is promoted to master, and when the old master comes back online it becomes a slave of the new master.
Fault detection
Every node in the cluster periodically sends PING messages to the other nodes to check whether they are online; if the corresponding PONG message is not received within the specified time, the other node is marked as probably offline: the node finds the other node's clusterNode structure in the clusterState.nodes dictionary and turns on the REDIS_NODE_PFAIL flag in its flags attribute. The nodes in the cluster also exchange messages with each other to share state information about every node. For example, when master A learns through a message that master B considers master C probably offline, master A finds node C's clusterNode structure in its clusterState.nodes dictionary and adds master B's offline report to that structure's fail_reports list. Each offline report is represented by a clusterNodeFailReport structure:
typedef struct clusterNodeFailReport {
    // The node that made the offline report
    struct clusterNode *node;
    // Time the last offline report was received from that node
    mstime_t time;
} clusterNodeFailReport;
If more than half of the masters responsible for handling slots report a master X as probably offline, master X is marked as offline. The node that marks X as offline then broadcasts a FAIL message about X to the cluster, and every node that receives this FAIL message marks X as offline.
Failover
When a slave node finds that the master it is replicating has entered the offline state, it begins to fail over the offline master:
- From all the slaves replicating the offline master, one is selected.
- The selected slave executes the SLAVEOF no one command and becomes the new master.
- The new master revokes all the slot assignments of the offline master and assigns all of those slots to itself.
- The new master broadcasts a PONG message to the cluster, which lets the other nodes know immediately that this node has changed from slave to master and has taken over the slots the offline master was handling.
- The new master starts accepting command requests for the slots it is responsible for, and the failover is complete.
Electing the new master node
The new master is elected as follows:
- The cluster's configuration epoch is an auto-incrementing counter with an initial value of 0.
- When some node in the cluster starts a failover, the cluster's configuration epoch is incremented by one.
- In each configuration epoch, every master responsible for handling slots has one chance to vote, and the first slave that asks it for a vote gets that vote.
- When a slave finds that the master it replicates has gone offline, it broadcasts a CLUSTERMSG_TYPE_FAILOVER_AUTH_REQUEST message to the cluster, asking every master with voting rights that receives the message to vote for it.
- If a master has voting rights (it is handling slots) and has not yet voted for another slave, it returns a CLUSTERMSG_TYPE_FAILOVER_AUTH_ACK message to the requesting slave, indicating that it supports this slave becoming the new master.
- Each slave taking part in the election counts how many masters support it by how many CLUSTERMSG_TYPE_FAILOVER_AUTH_ACK messages it receives.
- If there are N masters with voting rights in the cluster, a slave is elected as the new master once it collects N / 2 + 1 or more votes.
- Since each master may vote only once per configuration epoch, if N masters vote, at most one slave can collect N / 2 + 1 or more votes, which guarantees there is only one new master.
- If no slave collects enough votes in a configuration epoch, the cluster enters a new configuration epoch and the election is held again, until a new master is elected.
The master election process is very similar to the lead Sentinel election.
Data loss
Data loss in master-slave replication
Replication between master and slave is asynchronous, so some of the master's data may not yet have been synchronized to the slave when the master dies; that unsynchronized data is then lost.
Split brain
Split brain means that the machine a master runs on suddenly drops off the normal network and can no longer be reached by the slave machines, while the master process itself is still running. The Sentinels may then decide the master is down, start an election, and switch a slave over to master, at which point there are two masters in the cluster: that is split brain. Although a slave has been switched to master, clients may not yet have switched over and keep writing data to the old master. When the old master recovers, it is attached to the new master as a slave, its own data is cleared, and it replicates everything from the new master again, so those writes are lost.
Configuration to reduce data loss
min-slaves-to-write 1
min-slaves-max-lag 10
The configuration above means that if the master has gone more than 10 seconds without receiving an ACK from at least one slave, it stops executing write requests.
Master-slave data inconsistency
If the slave is blocked because of network problems or commands with high complexity, data synchronization lags behind and the master and slave data become inconsistent.
If you've read this far, leave a like before you go 🙂