5. Multi-machine database

This section mainly introduces replication, Sentinel, and clustering, the features Redis provides for multi-machine databases.

Replication

Source code: replication.c

In Redis, you can use the SLAVEOF command (or set the slaveof configuration option) to make one server replicate another. This is the classic master-slave setup, usually used for read/write splitting: the master server handles both reads and writes, while the slave servers handle reads only, as shown in the figure below.
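To make the direction of replication concrete, here is a minimal sketch using the redis-py client (an assumption of this example, not something the original text prescribes; the addresses are placeholders):

    import redis

    # Connect to the server that should become the slave (replica).
    replica = redis.Redis(host="127.0.0.1", port=6380)

    # Equivalent to running "SLAVEOF 127.0.0.1 6379" against it:
    # it starts replicating the master at that address.
    replica.slaveof("127.0.0.1", 6379)

    # Calling slaveof() with no arguments sends "SLAVEOF NO ONE",
    # promoting the server back to a standalone master.
    # replica.slaveof()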

Comparing the old and new replication implementations

Redis 2.8 introduced a replication implementation that is more efficient and safer than the old one. The two versions are compared below:

The synchronization operation of the old version is as follows:

Note: while the master server performs BGSAVE and sends the RDB file, all incoming write commands are appended to a buffer; once the transfer completes, the buffered write commands are sent to the slave server.

An example of command propagation is shown below:

Partial synchronization of the new replication feature is shown below:

The new replication function invocation flowchart is as follows:

The following explains in more detail how the new version of replication implements partial resynchronization:

· Server run ID: when a slave server first synchronizes with a master, it records the master's run ID. When the slave later disconnects and reconnects, it sends this ID back, which lets the master determine whether the slave was previously replicating from it.

· Replication offset: similar to a TCP sequence number, both the master and the slave maintain a replication offset. When the master propagates N bytes (bytes, not commands) to the slave, it increases its offset by N; when the slave receives N bytes, it increases its offset by N. This is one key to partial resynchronization: it lets the master know exactly which data the slave is missing!

· Replication backlog buffer: a fixed-length (configurable, 1 MB by default) FIFO queue on the master server. This is the other key to partial resynchronization: the queue keeps the recently propagated write-command bytes together with their replication offsets, as shown in the figure below.

If the data the slave is missing is still in this queue, a partial resynchronization can be performed; if it is not, a full resynchronization is the only option!
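The sketch below models the backlog and the partial-vs-full decision in Python. It is a simplified illustration; the class and method names are invented, not Redis internals:

    class ReplicationBacklog:
        """A fixed-length FIFO, modeled as a ring buffer, like Redis's backlog."""

        def __init__(self, size=1024 * 1024):      # 1 MB by default
            self.size = size
            self.buffer = bytearray(size)
            self.master_offset = 0                  # offset of the newest byte

        def feed(self, data: bytes):
            """Append propagated write-command bytes, overwriting the oldest ones."""
            for b in data:
                self.buffer[self.master_offset % self.size] = b
                self.master_offset += 1

        def can_partial_resync(self, slave_offset: int) -> bool:
            """Partial resync works only if every missing byte is still buffered."""
            oldest = max(0, self.master_offset - self.size)
            return oldest <= slave_offset <= self.master_offset

    backlog = ReplicationBacklog()
    backlog.feed(b"SET key value\r\n" * 100)        # 1500 bytes propagated
    print(backlog.can_partial_resync(1000))         # True  -> partial resynchronization
    print(backlog.can_partial_resync(-1))           # False -> full resynchronization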

The complete replication process

· Set the address and port of the master server

· Establish socket connections

· Send the PING command (check the status of the master server)

· Authenticate (if masterauth is set)

· Send port information: the slave server sends its own listening port to the master (used mainly for display, e.g. in INFO replication output)

· Synchronization (after this step, the master and slave are clients of each other, which allows write commands to be transmitted)

· Command propagation

In this phase, the slave server sends REPLCONF ACK <replication_offset> to the master server once per second by default. This acts as a heartbeat, used to detect broken network links and lost commands.
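As a rough sketch of this heartbeat (only the bookkeeping; the real slave speaks the replication protocol over its connection to the master, and the helper functions here are invented):

    import time

    def ack_loop(send_to_master, current_offset, interval=1.0):
        """Once per second (the default), report the slave's replication offset.

        The master uses the offset to detect lost commands, and the arrival
        time of the ACKs to detect a broken link.
        """
        while True:
            send_to_master(f"REPLCONF ACK {current_offset()}")
            time.sleep(interval)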

Sentinel

Source code: sentinel.c

Introduction

Sentinel is an important high-availability component in the Redis multi-machine architecture. Its main functions are as follows:

(1) Cluster monitoring: checking whether the Redis master and slave processes are working normally (heartbeat detection);

(2) Message notification: if a Redis instance fails, Sentinel is responsible for sending alarm messages to notify the administrator;

(3) Failover: if the master node fails, service is automatically transferred to a slave node;

(4) Configuration center: if a failover occurs, Sentinel notifies clients of the new master's address.

The Sentinels themselves are also distributed: they run as a cluster of Sentinels that work cooperatively with each other:

(1) During failover, deciding that a master node is down requires the agreement of a majority (more than half) of the Sentinels; this involves a distributed election;

(2) The Sentinel cluster keeps working even if some Sentinel nodes fail, which matters because a failover system that was itself a single point of failure would undermine the high-availability mechanism it is part of;

The following diagrams show how the Sentinel system monitors Redis servers:

The initial state is as follows:

The master server goes offline:

Perform failover:

The original master server comes back online and is demoted:

Initialize the Sentinel

There are four main steps:

· Initialize server: Sentinel itself is a Redis server running in special mode!

The differences between Sentinel and regular Redis servers are as follows:

· Use Sentinel-specific code: some code used by regular servers is not used;

· Initialize the Sentinel state, including the IP and port of the monitored master servers, etc. (read from the configuration file);

· Create two network connections to each monitored server: one is a command connection, responsible for exchanging commands with the master server; the other subscribes to the server's __sentinel__:hello channel.

Obtaining master and slave server information

After the connections are established, Sentinel sends the INFO command to the master and slave servers every 10 seconds by default and parses the reply.

Note: Sentinel initially connects only to the master server; when it discovers a new slave server through the master's INFO reply, it creates an instance structure for it (with flags set to SRI_SLAVE), establishes connections, and subscribes.

· Obtain master server information:

On the one hand, there is information about the master server itself, including the server run ID recorded in the run_id field and the server role (master/slave) recorded in the role field.

The other part is information about all slave servers under the master. Each slave server is recorded by a line beginning with the string "slave": the ip= field of each line records the slave's IP address, and the port= field records its port number. With these addresses and port numbers, Sentinel discovers slave servers automatically, without the user having to provide their address information.

Sentinel uses this information to update the master server's instance structure.

· Slave server information:

The slave server's run ID (run_id).

The server's role (role).

The master server's IP address (master_host) and port number (master_port), and the state of the master-slave connection (master_link_status).

The slave server's priority (slave_priority, which matters in later elections).

The slave server's replication offset (slave_repl_offset).

Sentinel uses this information to update the slave server's instance structure (a sketch follows).
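As an illustration of this polling, a sketch using redis-py (an assumption of this example; the field names match the INFO replication section described above):

    import redis

    master = redis.Redis(host="127.0.0.1", port=6379)

    # What Sentinel does every 10 seconds, in spirit:
    info = master.info("replication")

    print(info["role"])                    # "master" or "slave"
    for key, value in info.items():
        if key.startswith("slave") and isinstance(value, dict):
            print(key, value)              # e.g. slave0: ip, port, state, offset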

Interact with the master and slave servers

By default, Sentinel sends the following command over the command connection to each monitored master and slave server every two seconds:

PUBLISH __sentinel__:hello "<s_ip>,<s_port>,<s_runid>,<s_epoch>,<m_name>,<m_ip>,<m_port>,<m_epoch>"

The main purpose of this command is to keep the connection alive and to share each Sentinel's view of itself and of the master.

Because every Sentinel is also subscribed to the same __sentinel__:hello channel, each Sentinel both publishes to and receives messages from it.

Messages on this channel are seen by all Sentinels monitoring the same server, and the other Sentinels update their instance structures accordingly.
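A sketch of publishing and consuming such hello messages with redis-py (the payload values are placeholders following the format shown above):

    import redis

    r = redis.Redis(host="127.0.0.1", port=6379)

    # Publish this Sentinel's view of itself and of the master.
    r.publish("__sentinel__:hello",
              "10.0.0.1,26379,abcd1234,0,mymaster,10.0.0.2,6379,0")

    # Every Sentinel monitoring the same server is subscribed to the channel.
    p = r.pubsub()
    p.subscribe("__sentinel__:hello")
    for message in p.listen():
        if message["type"] != "message":
            continue                        # skip the subscribe confirmation
        (s_ip, s_port, s_runid, s_epoch,
         m_name, m_ip, m_port, m_epoch) = message["data"].decode().split(",")
        # ...update the instance structures for this Sentinel and this master...
        break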

Detecting offline servers

This is divided into detecting the subjective and the objective offline state.

· Subjective offline: by default, Sentinel sends PING to all instances once per second and judges whether an instance is down from how it replies (or how long it fails to reply).

Note: different Sentinels can be configured with different down-after-milliseconds values, so one Sentinel may consider a server subjectively offline while another, with a longer threshold, does not yet.

· Objective offline: this builds on the previous point. When a Sentinel considers a server subjectively offline, it keeps asking the other Sentinels monitoring that server whether they also consider it offline. Only after receiving enough (a configurable quorum of) agreeing judgments does it mark the server as truly, objectively offline and proceed to failover.

Note: the objective offline judgment is always based on the subjective one.
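A tiny sketch of the quorum check (the quorum corresponds to the value configured with "sentinel monitor <name> <ip> <port> <quorum>"; the function and parameter names are invented):

    def objectively_down(my_subjective_view: bool, peer_views: list, quorum: int) -> bool:
        """Count this Sentinel's own subjective judgment plus agreeing peers."""
        votes = int(my_subjective_view) + sum(1 for v in peer_views if v)
        return votes >= quorum

    print(objectively_down(True, [True, False, True], quorum=3))   # True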

Failover

Failover is divided into two steps:

· Electing the lead Sentinel. When the master server is judged objectively offline, the Sentinels monitoring it negotiate (by broadcasting to each other) and elect a lead Sentinel, which then performs the failover. This process mainly follows these rules:

There can be only one lead Sentinel per configuration epoch.

Every Sentinel is eligible to become the leader.

After every election, whether or not it produces a leader, the configuration epoch is incremented.

Within one configuration epoch, each Sentinel has only one chance to set some Sentinel as its local leader; once set, it cannot change within that epoch.

Every Sentinel that finds the master objectively offline asks its peers to set it as their local leader (requests are honored first come, first served).

If a Sentinel is set as the local leader by more than half of the Sentinels, it becomes the global leader; if no leader is elected within the given time, the election is repeated after a while.

· Failover: once the lead Sentinel is selected, it performs the failover, which mainly consists of three steps, as shown below: selecting a new master from the slaves of the offline master, making the remaining slaves replicate the new master, and demoting the old master to a slave of the new one when it comes back online.

The cluster

Source code: cluster.c

A cluster is simply multiple Redis instances running together. It differs from a distributed system in the general sense: a cluster adds hardware resources so the whole system performs better (with load balancing), whereas a distributed system splits the work into parts, each with its own role, communicating via IPC.

The Redis instances are started one by one; when the CLUSTER MEET command is issued, the corresponding instances shake hands and form a cluster, as sketched below.
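A sketch of the handshake, driving CLUSTER MEET through redis-py's generic command interface (the addresses are placeholders):

    import redis

    seed = redis.Redis(host="127.0.0.1", port=7000)

    # Ask the seed node to shake hands with the others; gossip then spreads
    # the membership, so every node learns about every other node.
    for port in (7001, 7002):
        seed.execute_command("CLUSTER", "MEET", "127.0.0.1", port)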

Assign hash slots

Redis Cluster distributes data using hash slots, which are very flexible and can be configured freely. The cluster stores key-value pairs by sharding: the whole database is divided into 16384 slots, each key belongs to exactly one of them, and each node can handle anywhere from zero up to all 16384 slots.

Only when all 16384 slots have been assigned is the cluster online (ok); otherwise the cluster is offline (fail).

CLUSTER ADDSLOTS can be used to allocate slots.
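For example, a sketch that spreads the 16384 slots evenly over three nodes, again via redis-py's generic command interface (placeholder addresses):

    import redis

    nodes = [redis.Redis(host="127.0.0.1", port=p) for p in (7000, 7001, 7002)]

    # Give each node a contiguous third of the slot space.
    for i, node in enumerate(nodes):
        start = i * 16384 // 3
        end = (i + 1) * 16384 // 3
        node.execute_command("CLUSTER", "ADDSLOTS", *range(start, end))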

The slots a node is responsible for are recorded in the slots attribute of its clusterNode structure, which is a binary (bit) array:

Each node in the cluster sends its slots array to the other nodes via messages, and each receiving node stores the array in the corresponding clusterNode structure. As a result, every node in the cluster knows which node each of the 16384 slots is assigned to.

Note: in addition, each node records the overall assignment of all 16384 slots in the slots array of its clusterState structure (clusterNode *slots[16384], as opposed to the bit array in clusterNode described above). If slots[i] is NULL, slot i has not been assigned; otherwise it points to the clusterNode of the node responsible for slot i, giving access to its IP, port, and other information.
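A sketch of the clusterNode bit array in Python, mirroring the C layout (16384 bits, i.e. 2048 bytes; bit i is set when the node is responsible for slot i; the helper names are invented):

    SLOTS_BYTES = 16384 // 8       # 2048 bytes

    slots = bytearray(SLOTS_BYTES)

    def set_slot(slots: bytearray, i: int) -> None:
        slots[i // 8] |= 1 << (i % 8)

    def owns_slot(slots: bytearray, i: int) -> bool:
        return bool(slots[i // 8] & (1 << (i % 8)))

    set_slot(slots, 100)
    print(owns_slot(slots, 100))   # True
    print(owns_slot(slots, 101))   # False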

How commands are executed in the cluster

When a client issues a command, the key is to work out which cluster node should handle it, in two steps:

· Calculate which slot the key is in:

def slot_number(key):
    return CRC16(key) & 16383
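The CRC16 here is the XMODEM/CCITT variant that Redis Cluster specifies (polynomial 0x1021, initial value 0); a self-contained version so the function above is runnable:

    def CRC16(key: bytes) -> int:
        crc = 0
        for byte in key:
            crc ^= byte << 8
            for _ in range(8):
                if crc & 0x8000:
                    crc = ((crc << 1) ^ 0x1021) & 0xFFFF
                else:
                    crc = (crc << 1) & 0xFFFF
        return crc

    print(CRC16(b"user:1000") & 16383)   # the slot for this key, in 0..16383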

· Determine which node the slot is on

The node that received the command looks up clusterState.slots[i] to check whether it is itself responsible for the slot; if not, it returns a MOVED error to redirect the client.

Note: a cluster-aware client (such as redis-cli in cluster mode) handles MOVED errors silently and redirects automatically; a standalone client just prints the MOVED error, because it does not understand it.

Note: In cluster mode, only database 0 can be used.

Resharding

Redis clusters can be resharded online; the operation is performed by the cluster management tool redis-trib, as shown below:

Note: during resharding there is a moment when, for a slot being migrated, some of its key-value pairs are on the source node and the rest are on the target node. What happens if a client issues a command for such a slot? The source node first looks for the key locally; if it finds it, it executes the command; otherwise the key may already have been migrated, so the source node returns an ASK error pointing at the target node, and the client retries there.

Note: like MOVED, the ASK redirection is handled automatically by cluster-aware clients and hidden from the user. The difference between the two:

1) A MOVED error means that responsibility for a slot has moved permanently from one node to another: after a client receives a MOVED error about slot i, every subsequent request concerning slot i can be sent directly to the node the MOVED error points to, because that node is now the one responsible for slot i.

2) In contrast, an ASK error is a temporary measure used by the two nodes during slot migration: after the client receives an ASK error about slot i, it sends the command for slot i to the node indicated by the ASK error only on the next request. The redirection has no effect on later requests: the client goes back to sending slot-i requests to the node currently recorded as responsible for slot i, unless another ASK error occurs (see the sketch below).
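A sketch of how a cluster-aware client could react to the two redirections (the connection table, reply handling, and function names are invented for illustration):

    def execute(command, slot, connections, slot_table):
        node = slot_table[slot]                    # cached owner of the slot
        while True:
            reply = connections[node].send(command)
            if reply.startswith("MOVED"):
                # Permanent move: update the cache, retry at the new owner,
                # and use it for all future requests about this slot.
                _, _, node = reply.split()
                slot_table[slot] = node
            elif reply.startswith("ASK"):
                # One-off redirection during migration: send ASKING, retry once
                # at the target, but leave the cached mapping unchanged.
                _, _, node = reply.split()
                connections[node].send("ASKING")
            else:
                return reply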

Failover

The nodes in the cluster are divided into master nodes and slave nodes: masters handle slots, while slaves replicate a master and replace it when it fails and goes offline. For example, in the following cluster, 7004 and 7005 are two slave nodes replicating the data of master node 7000.

If node 7000 goes offline due to a failure, the remaining online masters 7001, 7002, and 7003 will select one of the slaves 7004 and 7005 as the new master. Even if node 7000 later comes back online, it will become a slave of the new master.

Each node in the cluster periodically sends PING messages to the other nodes to check whether they are online. If a node does not return a PONG message within the specified time, the sending node marks it as suspected offline (possible fail, PFAIL).

When more than half of the masters consider a node to be suspected offline, that node is marked as offline (FAIL) and this is broadcast to the whole cluster. Failover then proceeds with the following steps:

1) A slave node is selected from among the slaves of the offline master node.

2) The selected slave node runs the SLAVEOF no one command and becomes the new master node.

3) The new master node will revoke all slots assigned to the offline master node and assign all slots to itself.

4) The new master node broadcasts a PONG message to the cluster. This PONG message can let other nodes in the cluster know immediately that this node has changed from a slave node to a master node, and this master node has taken over the slot that was handled by the offline node.

5) The new master node starts to receive command requests related to the slot it is responsible for processing, and the failover is complete.

The slave node election works like this: when a slave node learns that its master has entered the FAIL state, it broadcasts a request for votes to the cluster, and each master node may vote for it. In a given configuration epoch, once a slave node has gathered more than N/2 supporters, it is selected as the new master and performs the subsequent steps.

The above election process is very similar to the Sentinel leader election; both are based on the leader election of the Raft algorithm.
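The majority rule shared by both elections can be written in one line (N being the number of voters: Sentinels in one case, masters in the other):

    def elected(votes: int, n_voters: int) -> bool:
        return votes > n_voters // 2      # i.e. at least N/2 + 1 supporters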