Redis supports different levels of on-disk persistence and provides high availability through Redis Sentinel and automatic partitioning (Redis Cluster).

High availability of Redis

Master-slave replication

In Redis, one server replicates another by executing the SLAVEOF command or setting the slaveof option. The server being replicated is called the master, and the server doing the replicating is called the slave. The databases on the master and slave servers hold the same data, a property conceptually called “database state consistency”, or simply “consistency”.

Implementation of replication

Step 1: Set the IP address and port of the primary server

When the client sends the following command to the slave server:

127.0.0.1:12345> SLAVEOF 127.0.0.1 6379

The master server's IP address 127.0.0.1 and port 6379 are saved to the masterhost and masterport properties of the server state:

struct redisServer {
    // ...
    // Master server address
    char *masterhost;
    // Master server port
    int masterport;
    // ...
};

The SLAVEOF command is an asynchronous command. After the masterhost and masterport properties are set, the slave server will return OK to the client that sent the SLAVEOF command, indicating that the replication instruction has been received. The actual replication will not start until the OK is returned.
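
The asynchronous behaviour described above can be illustrated with a minimal Python sketch. The `ServerState` class, the `slaveof` function, and the `replication_started` flag are hypothetical stand-ins used only for illustration, not Redis's actual implementation: the handler merely records the target address and replies OK, and the replication work is left for later.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerState:
    """Hypothetical stand-in for the replication fields of redisServer."""
    masterhost: Optional[str] = None
    masterport: Optional[int] = None
    replication_started: bool = False  # illustrative flag, not a Redis field


def slaveof(server: ServerState, host: str, port: int) -> str:
    """Record the master's address and reply at once; no network I/O happens here."""
    server.masterhost = host
    server.masterport = port
    return "OK"


server = ServerState()
reply = slaveof(server, "127.0.0.1", 6379)
print(reply, server.masterhost, server.masterport, server.replication_started)
```

In this sketch the OK reply is produced before any replication work has begun, which mirrors the ordering the text describes.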

Step 2: Establish a socket connection

After the SLAVEOF command is executed, the slave server creates a socket connection to the master server based on the IP address and port set by the command. If the slave server's socket connects to the master server successfully, the slave server associates with that socket a file event handler dedicated to replication. This handler is responsible for the subsequent replication work, such as receiving the RDB file and receiving the write commands propagated by the master server.

After the master server accepts the socket connection from the slave server, it creates the corresponding client state for the socket and treats the slave server as a client connected to the master server. The slave server therefore has both a server identity and a client identity: it can send command requests to the master server, and the master server returns command replies to it.

Step 3: Send the PING command

Once the slave server becomes a client of the master, the first thing it does is send a PING command to the master. It has two functions:

  • The PING command checks whether the read/write status of the socket is normal.
  • The PING command checks whether the master server can process command requests properly.

If the slave server reads a “PONG” reply, the network connection between the master and slave servers is normal and the master server can process command requests.



Step 4: Authentication

After the slave server receives the “PONG” reply from the master, the next step is to decide whether to authenticate: if the slave server has the masterauth option set, it authenticates with the master; otherwise it skips this step.

Step 5: Send port information

After the authentication step, the slave server executes the REPLCONF listening-port command to send its listening port number to the master server. After receiving this command, the master server records the port number in the client state corresponding to the slave server.

Step 6: Synchronize

In this step, the slave server sends the PSYNC command to the master server to perform the synchronization and update its database to the current state of the master database. Until the synchronization operation is performed, only the slave server is the client of the master server, but after the synchronization operation, the master server also becomes the client of the slave server. The PSYNC command supports full and partial resynchronization modes.

  • Full resynchronization handles the initial replication: the master creates and sends an RDB file, and then sends the slave the write commands stored in its buffer.
  • Partial resynchronization handles replication after a disconnection: when the slave reconnects to the master, the master can, if conditions permit, send only the write commands it executed while the slave was disconnected. The slave just receives and executes these write commands to bring its database up to the master's current state.

Step 7: Command propagation

When the synchronization is complete, the master server and the slave server enter the command propagation phase. At this time, the master server only needs to send the write command it executes to the slave server, and the slave server only needs to receive and execute the write command sent by the master server to ensure the consistency between the master server and the slave server. During command propagation, the secondary server sends heartbeat detection commands to the primary server once per second by default.
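
Command propagation can be pictured with a toy Python model. The `Server` class and its two supported commands are assumptions made for this sketch, and the network is replaced by a direct method call. The point is only that replaying the master's write commands in order keeps both databases identical.

```python
class Server:
    """Toy server holding a key-value database and a list of slaves."""

    def __init__(self):
        self.db = {}
        self.slaves = []

    def apply(self, command):
        """Apply one write command, given as a tuple such as ("SET", "k", "v")."""
        name, key, *args = command
        if name == "SET":
            self.db[key] = args[0]
        elif name == "DEL":
            self.db.pop(key, None)
        else:
            raise ValueError(f"unsupported command: {name}")

    def execute_write(self, command):
        """Master side: execute the write locally, then propagate it to each slave."""
        self.apply(command)
        for slave in self.slaves:
            slave.apply(command)


master, slave = Server(), Server()
master.slaves.append(slave)
master.execute_write(("SET", "msg", "hello"))
master.execute_write(("SET", "tmp", "1"))
master.execute_write(("DEL", "tmp"))
print(master.db == slave.db)
```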

Sentinel

A Sentinel system consisting of one or more Sentinel instances can monitor any number of primary servers, together with all the secondary servers under them. When a monitored primary server goes offline, the Sentinel system automatically promotes one of its secondary servers to be the new primary server, and the new primary server then continues to process command requests in place of the offline one.

  

Start and initialize Sentinel

When a Sentinel starts, it performs the following steps:

Step 1: Initialize the server

Sentinel is essentially just a Redis server running in a special mode, so the first step in starting Sentinel is to initialize an ordinary Redis server. However, because Sentinel does different work than an ordinary Redis server, its initialization is not exactly the same. For example, an ordinary server loads an RDB or AOF file during initialization to restore the database state, but Sentinel does not use a database, so it loads neither.

Step 2: Use the Sentinel special code

The next step is to replace some of the code used by ordinary Redis servers with Sentinel-specific code.

Step 3: Initialize the Sentinel status

After applying the Sentinel-specific code, the server initializes a sentinelState structure that holds all the Sentinel-related state on the server.

Step 4: Initialize the masters attribute of the Sentinel state

The masters dictionary in the Sentinel state records information about all primary servers monitored by Sentinel, where:

  • The dictionary key is the name of the monitored primary server.
  • The dictionary value is the sentinelRedisInstance structure corresponding to the monitored primary server. Each sentinelRedisInstance structure represents a Redis server instance monitored by Sentinel, which can be a master server, a slave server, or another Sentinel.

Step 5: Create a network connection to the primary server

The final step is to create a network connection to the monitored master server. Sentinel will be the client of the master server. It can send commands to the master server and obtain relevant information from the command reply. For each primary server monitored by Sentinel, Sentinel creates two asynchronous network connections to the primary:

  • One is the command connection, which is dedicated to sending commands to the master server and receiving command replies.
  • The other is the subscription connection, which is dedicated to subscribing to the main server’s channels.

Subjective offline

By default, Sentinel sends a PING command once per second to every instance (primary servers, secondary servers, and other Sentinels) with which it has created a command connection, and uses the instance's reply to determine whether the instance is online. A reply to the PING command falls into one of two categories:

  • Valid reply: the instance returns one of three replies: +PONG, -LOADING, or -MASTERDOWN.
  • Invalid reply: any reply other than the above three, or no reply is returned within the specified time limit.

If the configuration file sets Sentinel's down-after-milliseconds option to 50000 milliseconds, then when the primary server has returned only invalid replies to Sentinel for 50000 consecutive milliseconds, Sentinel marks the primary server as subjectively offline by turning on the SRI_S_DOWN flag in the flags attribute of the primary server's instance structure.
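
The subjective-offline check can be sketched as a small Python function. The function name, its parameters, and the bit value chosen for `SRI_S_DOWN` are assumptions for illustration; only the flag's name and the down-after-milliseconds rule come from the text.

```python
SRI_S_DOWN = 1 << 0  # illustrative bit position, not taken from the Redis source


def update_subjective_down(flags, last_valid_reply_ms, now_ms, down_after_ms):
    """Set SRI_S_DOWN if no valid reply arrived within down_after_ms, else clear it."""
    if now_ms - last_valid_reply_ms > down_after_ms:
        return flags | SRI_S_DOWN
    return flags & ~SRI_S_DOWN


# Last valid reply at t=0, checked at t=60000 ms with a 50000 ms limit.
flags = update_subjective_down(0, last_valid_reply_ms=0, now_ms=60000, down_after_ms=50000)
print(bool(flags & SRI_S_DOWN))
```

Clearing the flag in the other branch models an instance that starts answering again with valid replies.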

Objective offline

When Sentinel determines that a primary server is subjectively offline, it confirms that the primary server is really offline by asking the other Sentinels that also monitor it whether they, too, consider it offline (either subjectively or objectively). When Sentinel has received a sufficient number of offline judgments from the other Sentinels, it determines that the primary server is objectively offline and performs a failover on it.
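
A hedged sketch of the objective-offline decision follows. The text only says "a sufficient number", so the threshold parameter, called `quorum` here, and the choice to count the asking Sentinel's own judgment are assumptions of this example.

```python
def is_objectively_down(own_view_down, other_views_down, quorum):
    """Return True when enough Sentinels agree that the primary is offline.

    own_view_down:    this Sentinel's subjective judgment
    other_views_down: judgments reported by the other Sentinels
    quorum:           assumed threshold for "a sufficient number"
    """
    if not own_view_down:
        return False  # only a subjectively-down primary is put to the others
    agreeing = 1 + sum(1 for down in other_views_down if down)
    return agreeing >= quorum


print(is_objectively_down(True, [True, False], quorum=2))
```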

Electing the lead Sentinel

When a primary server is judged to be objectively offline, the Sentinels monitoring it negotiate to elect a lead Sentinel, which performs the failover operation on the offline primary server.

Suppose that three Sentinels are monitoring the same master server and that they have already confirmed, by exchanging commands, that the master has gone offline. To elect the lead Sentinel, the three Sentinels again send the SENTINEL is-master-down-by-addr command to one another, this time asking to be chosen as leader. A Sentinel that receives such a request votes for the first requester it hears from, and the Sentinel that collects votes from more than half of the Sentinels wins the election and can then start the failover operation on the primary server.

Failover

After the lead Sentinel is elected, it performs a failover operation on the offline primary server, which consists of the following three steps:

Step 1: Select a new primary server

The first step in the failover operation is to select a healthy slave server with complete data from all slave servers that are subordinate to the offline master server, and then send the SLAVEOF no one command to this slave server to convert it to the master server.

Step 2: Modify the replication target of the secondary server

When a new master server appears, the next step for the lead Sentinel is to make all slave servers of the offline master server replicate the new master server. This is done by sending the SLAVEOF command to those slave servers.

Step 3: Turn the old primary server into a secondary server

The final step in the failover operation is to set the offline primary server as a slave of the new primary server. Sentinel continues to monitor the server that has gone offline, and when it comes back online, Sentinel sends it the SLAVEOF command to make it a slave of the new master server.

Cluster

Redis Cluster is the distributed database solution provided by Redis. The cluster shares data through sharding and provides replication and failover. A Redis cluster is usually composed of multiple nodes. At first, the nodes are independent of one another, and each is in a cluster that contains only itself. To build a working cluster, we must connect the independent nodes to form a cluster containing multiple nodes.

Hash slot

Redis Cluster introduces the concept of hash slots. The cluster stores the key-value pairs of the database by sharding: the whole database is divided into 16384 hash slots, and each node in the cluster can handle anywhere from 0 to 16384 of them. When every one of the 16384 slots is handled by some node, the cluster is online (ok). Conversely, if any slot is not handled by any node, the cluster is offline (fail).

Slot assignment information of a node

The slots and numslots attributes of the clusterNode structure record which slots the node is responsible for:

struct clusterNode {
    // ...
    unsigned char slots[16384/8];
    int numslots;
    // ...
};

The slots attribute is a bit array. Its length is 16384/8 = 2048 bytes, which holds 16384 bits. Redis numbers these bits from 0 to 16383 and uses them as follows:

  • If the bit at index i of the slots array is 1, the node is responsible for processing slot i.
  • If the bit at index i of the slots array is 0, the node is not responsible for processing slot i.
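
The bit array can be modelled in Python with a 2048-byte `bytearray`. The helper names below are hypothetical, and the bit order within each byte (bit `i & 7` of byte `i >> 3`) is an assumption of this sketch rather than something the text specifies.

```python
CLUSTER_SLOTS = 16384


def new_slots():
    """Return an all-zero bit array with one bit per slot (2048 bytes)."""
    return bytearray(CLUSTER_SLOTS // 8)


def set_slot(slots, i):
    """Mark slot i as handled by this node."""
    slots[i >> 3] |= 1 << (i & 7)


def owns_slot(slots, i):
    """Return True if the bit for slot i is 1."""
    return bool(slots[i >> 3] & (1 << (i & 7)))


def count_slots(slots):
    """Count the bits set to 1, i.e. the value a numslots field would hold."""
    return sum(bin(byte).count("1") for byte in slots)


slots = new_slots()
for i in range(0, 8):          # the node takes slots 0 through 7
    set_slot(slots, i)
numslots = count_slots(slots)
print(len(slots), owns_slot(slots, 7), owns_slot(slots, 8), numslots)
```

Checking or setting a single slot touches one byte, so both operations take constant time.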

In addition to recording its own slots in the slots and numslots attributes of its clusterNode structure, a node also sends its slots array to the other nodes in the cluster via messages to tell them which slots it is currently handling.

When node A receives node B's slots array via a message, node A looks up the clusterNode structure corresponding to node B in its dictionary and saves or updates the slots array in that structure. Because every node in the cluster sends its slots array to the other nodes via messages, and every node that receives a slots array saves it in the corresponding clusterNode structure, each node in the cluster knows which node each of the 16384 slots in the database is assigned to.

Slot assignment information of the cluster

The slots array in the clusterState structure records the assignment of all 16384 slots in the cluster:

typedef struct clusterState {
    // ...
    clusterNode *slots[16384];
    // ...
} clusterState;

The slots array contains 16384 entries, each of which is a pointer to a clusterNode structure:

  • If slots[i] is NULL, slot i has not been assigned to any node.
  • If slots[i] points to a clusterNode structure, slot i has been assigned to the node represented by that structure.
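
A minimal Python sketch of this array follows, with `None` playing the role of NULL and plain strings standing in for clusterNode pointers. The node names and slot ranges are made up for the example. It also applies the rule from the hash slot section that the cluster is online only when every slot has an owner.

```python
CLUSTER_SLOTS = 16384

# One entry per slot; None means the slot is not assigned to any node.
slots = [None] * CLUSTER_SLOTS


def assign(slots, node, first, last):
    """Assign the inclusive slot range first..last to node."""
    for i in range(first, last + 1):
        slots[i] = node


def cluster_state(slots):
    """Return "ok" when every slot has an owner, otherwise "fail"."""
    return "ok" if all(node is not None for node in slots) else "fail"


assign(slots, "node-7000", 0, 5000)
assign(slots, "node-7001", 5001, 10000)
state_before = cluster_state(slots)     # slots 10001..16383 are still unassigned
assign(slots, "node-7002", 10001, 16383)
state_after = cluster_state(slots)
print(slots[2022], state_before, state_after)
```

Indexing the array by slot number finds the owner of a slot in constant time, whereas using only the per-node bit arrays would require checking every node.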

Calculating the slot of a key

The node uses the following algorithm to calculate which slot a given key belongs to:

def slot_number(key):
    return CRC16(key) & 16383

The expression CRC16(key) calculates the CRC-16 checksum of the key, and the & 16383 operation reduces it to an integer between 0 and 16383, which is the slot number of the key.

You can use the CLUSTER KEYSLOT command to view which slot a given key belongs to:

127.0.0.1:7000> CLUSTER KEYSLOT "date"
(integer) 2022
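
The slot calculation can be tried in plain Python. This sketch assumes that the CRC16 in question is the CRC-16/XMODEM variant, which `binascii.crc_hqx` from the standard library computes when given an initial value of 0. It hashes the whole key and ignores any hash-tag handling a real cluster may apply.

```python
import binascii


def slot_number(key: bytes) -> int:
    """Map a key to a slot in 0..16383 (assumes CRC-16/XMODEM, whole key hashed)."""
    return binascii.crc_hqx(key, 0) & 16383


print(slot_number(b"date"))
```

Under that assumption the function yields 2022 for the key "date", matching the CLUSTER KEYSLOT output shown above.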

Execute commands in the cluster

Once all 16,384 slots in the database have been assigned, the cluster comes online and clients can send data commands to nodes in the cluster.

When a client sends a command related to a database key to a node, the node receiving the command calculates which slot the database key to be processed by the command belongs to and checks whether the slot has been assigned to it:

  • If the slot in which the key is located happens to be assigned to the current node, the node simply executes the command.
  • If the slot in which the key is located is not assigned to the current node, the node returns a MOVED error to the client, which directs the client to the correct node, where it sends the command again.
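
The redirection logic can be sketched with a toy in-process cluster. The `Node`, `MovedError`, and `client_get` names, the addresses, and the slot ranges are all hypothetical. The slot function assumes CRC-16/XMODEM and ignores hash tags.

```python
import binascii


def slot_number(key):
    # Assumes the CRC-16/XMODEM variant; hash tags are ignored.
    return binascii.crc_hqx(key.encode(), 0) & 16383


class MovedError(Exception):
    """Toy counterpart of the MOVED error: carries the slot and the owner's address."""

    def __init__(self, slot, address):
        super().__init__(f"MOVED {slot} {address}")
        self.slot = slot
        self.address = address


class Node:
    def __init__(self, address, slot_owner):
        self.address = address
        self.slot_owner = slot_owner   # shared map: slot number -> owning address
        self.db = {}

    def get(self, key):
        slot = slot_number(key)
        owner = self.slot_owner[slot]
        if owner != self.address:
            raise MovedError(slot, owner)   # slot belongs to another node
        return self.db.get(key)


def client_get(nodes, start_address, key):
    """Send GET to start_address and follow MOVED redirections; count them."""
    address, redirects = start_address, 0
    while True:
        try:
            return nodes[address].get(key), redirects
        except MovedError as err:
            address = err.address
            redirects += 1


owner_map = {s: "127.0.0.1:7000" if s <= 5000 else "127.0.0.1:7001" for s in range(16384)}
nodes = {addr: Node(addr, owner_map) for addr in ("127.0.0.1:7000", "127.0.0.1:7001")}
nodes["127.0.0.1:7000"].db["date"] = "2013-12-31"
print(client_get(nodes, "127.0.0.1:7001", "date"))
```

Asking the wrong node costs one extra round trip in this model, after which the client reaches the node that owns the slot.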

How resharding is implemented

The resharding operation of a Redis cluster can reassign any number of slots from one node (the source node) to another node (the target node), and the key-value pairs belonging to those slots are moved from the source node to the target node. Resharding can be done online without taking the cluster offline, and both the source and target nodes can continue to process command requests.

Resharding of a Redis cluster is performed by Redis's cluster management software, redis-trib. Redis provides all the commands required for resharding, and redis-trib performs the resharding by sending those commands to the source and target nodes. redis-trib reshards a single slot of the cluster as follows:

  1. redis-trib sends CLUSTER SETSLOT <slot> IMPORTING <source_id> to the target node to prepare it to import the key-value pairs of the slot from the source node.
  2. redis-trib sends CLUSTER SETSLOT <slot> MIGRATING <target_id> to the source node to prepare it to migrate the key-value pairs of the slot to the target node.
  3. redis-trib sends CLUSTER GETKEYSINSLOT <slot> <count> to the source node to obtain the names of up to count keys belonging to the slot.
  4. For each key name obtained in step 3, redis-trib sends MIGRATE <target_ip> <target_port> <key_name> 0 <timeout> to the source node to migrate the selected key atomically from the source node to the target node.
  5. Steps 3 and 4 are repeated until all key-value pairs of the slot saved on the source node have been migrated to the target node.
  6. redis-trib sends CLUSTER SETSLOT <slot> NODE <target_id> to any node in the cluster to assign the slot to the target node. This assignment is propagated to the whole cluster via messages, so eventually every node knows that the slot has been assigned to the target node.
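
The key-moving loop of steps 3 to 5 can be simulated in memory. This is a sketch under loud assumptions: the two dictionaries stand in for the source and target nodes, `get_keys_in_slot` and `migrate_slot` are hypothetical helpers rather than redis-trib code, and `toy_slot` is a deliberately simplistic slot function used only to keep the demo short.

```python
def toy_slot(key):
    """Toy slot function for the demo only; Redis uses CRC16(key) & 16383."""
    return len(key) % 4


def get_keys_in_slot(db, slot, count):
    """Return at most count keys of db that belong to slot (step 3)."""
    return [key for key in db if toy_slot(key) == slot][:count]


def migrate_slot(source_db, target_db, slot, count=2):
    """Move every key of slot from source to target, count keys per round."""
    rounds = 0
    while True:
        batch = get_keys_in_slot(source_db, slot, count)
        if not batch:                 # step 5: stop when the slot is empty
            return rounds
        for key in batch:             # step 4: move each key in the batch
            target_db[key] = source_db.pop(key)
        rounds += 1


source = {"a": 1, "bb": 2, "ccccc": 3, "d": 4, "eeeee": 5, "fffffffff": 6}
target = {}
rounds = migrate_slot(source, target, slot=1)
print(rounds, sorted(target), source)
```

Fetching keys in batches of `count` keeps each round small, which fits the text's point that resharding happens while the nodes keep serving requests.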

Fault detection

Each node in the cluster periodically sends PING messages to the other nodes to check whether they are online. If a node that receives a PING message does not return a PONG message within the specified time, the sender marks it as suspected offline (probable fail, PFAIL). If more than half of the master nodes in the cluster report a given master node as suspected offline, that master node is marked as offline (FAIL), and the node that marks it broadcasts a FAIL message about it to the cluster.
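
One reading of the "more than half" rule above can be written as a short Python check. The function name and the use of sets of node names are assumptions of this sketch, and reports from nodes that are not masters are simply ignored.

```python
def should_mark_fail(suspecting_nodes, masters):
    """Return True when more than half of the masters suspect the node is offline."""
    reporters = set(suspecting_nodes) & set(masters)
    return len(reporters) > len(masters) // 2


masters = {"A", "B", "C", "D", "E"}
print(should_mark_fail({"A", "B", "C"}, masters))
```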

Cluster failover

When a slave node finds that the master node that it is replicating enters the offline state, the slave node starts to failover the offline master node. Here are the steps to failover:

  1. Of all slave nodes that replicate the offline master node, one slave node is selected.
  2. The selected slave node runs the SLAVEOF no one command to become the new master node.
  3. The new master node revokes all slot assignments of the offline master node and assigns those slots to itself.
  4. The new master node broadcasts a PONG message to the cluster, which lets the other nodes know immediately that this slave node has become the master node and has taken over the slots that were handled by the offline node.
  5. The new master node starts receiving command requests related to the slot it is responsible for processing, and the failover is complete.

Electing a new master node

The new master node is chosen by election. Here is how a cluster elects a new master node:

  1. Each master node in the cluster that is responsible for slots has one vote, and it gives that vote to the first slave node that asks for it.
  2. When a slave node finds that the master node it is replicating has gone offline, it broadcasts a message to the cluster asking every master node that receives the message and has the right to vote to vote for it.
  3. If a master node has not yet voted for another slave node, it returns an ACK message to the requesting slave node, indicating that it supports that slave node as the new master node.
  4. Each slave node taking part in the election counts the ACK messages it receives to determine how many master nodes support it.
  5. If there are N master nodes with voting rights in the cluster, a slave node is elected as the new master node once it has collected at least N/2+1 votes.
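
The voting rules above can be sketched as a small Python function. The `arrival_order` mapping is a hypothetical input that lists, for each voting master, the slaves in the order their requests arrived. Each master's single vote goes to the first requester, and a slave needs N/2+1 votes (integer division) to win.

```python
from collections import Counter


def elect_new_master(arrival_order):
    """Return the winning slave, or None if no slave reaches a majority.

    arrival_order maps each voting master to the list of slaves whose vote
    requests reached it, in arrival order.
    """
    votes = Counter()
    for master, requests in arrival_order.items():
        if requests:
            votes[requests[0]] += 1      # one vote, first come first served
    needed = len(arrival_order) // 2 + 1
    for slave, count in votes.items():
        if count >= needed:
            return slave
    return None


winner = elect_new_master({
    "master-A": ["slave-1", "slave-2"],
    "master-B": ["slave-1", "slave-2"],
    "master-C": ["slave-2", "slave-1"],
})
print(winner)
```

When the votes are split so that no slave reaches the threshold, the function returns `None`; what a cluster does in that case is outside the scope of this sketch.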

Persistence of Redis

Redis is an in-memory database that supports persistence, which means it must synchronize the data in memory to disk. Redis supports two types of persistence: snapshotting (RDB), which is the default, and the append-only file (AOF).

RDB persistence

Snapshotting is the default persistence mode, also called RDB persistence. It writes the data in memory to a binary file as a snapshot. The default file name is dump.rdb.

RDB file creation

There are two Redis commands that can be used to generate RDB files, one is SAVE and the other is BGSAVE.

  • The SAVE command blocks the Redis server process until the RDB file has been created, and the server cannot process any command requests while it is blocked.
  • The BGSAVE command forks a child process, which creates the RDB file while the server process (the parent) continues to process command requests.

Because the BGSAVE command can be executed without blocking the server process, Redis lets the user configure the server's save option so that the server automatically executes BGSAVE at intervals. The user can set multiple save conditions with the save option, and the server executes BGSAVE as soon as any one of them is met.

save 900 1
save 300 10
save 60 10000

The BGSAVE command is executed if any of the following three conditions are met:

  • The server made at least 1 change to the database within 900 seconds.
  • The server made at least 10 changes to the database within 300 seconds.
  • The server made at least 10,000 changes to the database within 60 seconds.

RDB file loading

RDB files are loaded automatically at server startup: as soon as the Redis server detects an RDB file at startup, it loads it. The server blocks while loading the RDB file until loading completes.

Because AOF files are usually updated more frequently than RDB files:

  • If AOF persistence is enabled on the server, AOF files are used to restore database state in preference.
  • The RDB file is used by the server to restore the database state only when AOF persistence is turned off.

Implementation principle

When the Redis server starts, the user can set the save option in the configuration file or pass it as a startup parameter. If the user does not set the save option, the server uses the following default conditions:

save 900 1
save 300 10
save 60 10000

The server program then sets the saveparams array of the server state (the redisServer structure) according to the save conditions given by the save option. Each element of the array holds one save condition:

struct saveparam {
    // Number of seconds
    time_t seconds;
    // Number of changes
    int changes;
};

In addition to the saveparams array, the server state also maintains a dirty counter and a lastsave property:

  • The dirty counter records how many changes (writes, deletes, updates, and so on) the server has made to the database state (all databases on the server) since the last SAVE or BGSAVE command was successfully executed.
  • The lastsave attribute is a UNIX timestamp that records the last time the server successfully executed the SAVE or BGSAVE command.

When the server successfully executes a database modification command, the program updates the dirty counter: the counter increases by the number of modifications the command made to the database.

struct redisServer {
    // ...
    // Counter of modifications
    long long dirty;
    // Time of the last successful save
    time_t lastsave;
    // ...
};

Redis's periodic server function serverCron runs every 100 milliseconds by default. It maintains the running server, and one of its jobs is to check whether any save condition set by the save option has been met and, if so, to execute the BGSAVE command.

The program iterates over all the save conditions in the saveparams array, and if any condition is met, the server executes the BGSAVE command.
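
The periodic check can be sketched in Python. The `SaveParam` class and `should_bgsave` function are illustrative names, times are plain integers in seconds, and the exact comparison operators are an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class SaveParam:
    seconds: int   # length of the time window
    changes: int   # minimum number of modifications


# The default conditions listed in the text.
DEFAULT_SAVEPARAMS = [SaveParam(900, 1), SaveParam(300, 10), SaveParam(60, 10000)]


def should_bgsave(saveparams, dirty, lastsave, now):
    """Return True if any save condition is met since the last successful save."""
    elapsed = now - lastsave
    return any(dirty >= sp.changes and elapsed > sp.seconds for sp in saveparams)


# 10000 modifications, 61 seconds after the last save: the third condition holds.
print(should_bgsave(DEFAULT_SAVEPARAMS, dirty=10000, lastsave=0, now=61))
```

After a successful save, a server following this scheme would reset `dirty` to 0 and set `lastsave` to the current time, so the conditions start over.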

AOF persistence

In addition to RDB persistence, Redis also provides AOF (Append Only File) persistence. Unlike RDB persistence, which records database state by saving key-value pairs in the database, AOF persistence records database state by saving write commands executed by the Redis server.

When AOF persistence is enabled, after executing a write command the server appends that command, in protocol format, to the end of the aof_buf buffer of the server state.
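
As an illustration, the sketch below encodes a command as an array of bulk strings in the Redis request protocol and appends it to a buffer. The `encode_command` helper is hypothetical, and the Python `bytearray` named `aof_buf` only mimics the buffer mentioned in the text.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a protocol array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]                 # number of arguments
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))  # length, then payload
    return b"".join(parts)


aof_buf = bytearray()
aof_buf += encode_command("SET", "KEY", "VALUE")
print(bytes(aof_buf))
```

Because each argument is prefixed with its byte length, values containing spaces or line breaks can be stored without escaping.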

AOF persistence policy

The appendfsync option selects one of three policies for syncing the AOF file to disk. The default value is everysec.

  • appendfsync always: sync to disk immediately after every write command; slowest, but the most durable.
  • appendfsync everysec: sync to disk once per second; a good compromise between performance and durability.
  • appendfsync no: leave the timing of syncing entirely to the operating system; best performance, but the weakest durability guarantee.

AOF file loading and data restoration

Because the AOF file contains all the write commands needed to restore the database state, the server simply reads and re-executes the write commands saved in the AOF file to restore the database state before the server was shut down. The detailed steps are as follows:

  1. Create a pseudo client with no network connection: Redis commands can only be executed in a client context, and the commands used when loading an AOF file come from the file rather than from a network connection, so the server uses a pseudo client with no network connection to execute the write commands saved in the AOF file. A command executed by the pseudo client has exactly the same effect as one executed by a client with a network connection.
  2. Parse and read one write command from the AOF file.
  3. Use the pseudo client to execute the write command that was read.
  4. Continue steps 2 and 3 until all write commands in the AOF file have been processed.
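
The loading loop can be sketched in Python. This is a toy under stated assumptions: the AOF content is held in memory as bytes, commands are encoded as protocol arrays of bulk strings, and the hypothetical `FakeClient` understands only SET and DEL.

```python
import io


def read_command(stream):
    """Parse one command from the stream; return None at end of input."""
    header = stream.readline()               # e.g. b"*3\r\n"
    if not header:
        return None
    argc = int(header[1:].strip())
    args = []
    for _ in range(argc):
        length = int(stream.readline()[1:].strip())   # e.g. b"$3\r\n"
        args.append(stream.read(length))
        stream.read(2)                        # skip the trailing CRLF
    return args


class FakeClient:
    """Pseudo client without a network connection; executes commands on db."""

    def __init__(self, db):
        self.db = db

    def execute(self, args):
        name = args[0].upper()
        if name == b"SET":
            self.db[args[1]] = args[2]
        elif name == b"DEL":
            for key in args[1:]:
                self.db.pop(key, None)
        else:
            raise ValueError(f"unsupported command: {name!r}")


def load_aof(data: bytes) -> dict:
    db = {}
    client = FakeClient(db)                   # step 1
    stream = io.BytesIO(data)
    while True:
        args = read_command(stream)           # step 2
        if args is None:
            return db                         # step 4: nothing left to process
        client.execute(args)                  # step 3


aof = (b"*3\r\n$3\r\nSET\r\n$1\r\nk\r\n$1\r\nv\r\n"
       b"*3\r\n$3\r\nSET\r\n$1\r\nk\r\n$2\r\nv2\r\n"
       b"*2\r\n$3\r\nDEL\r\n$1\r\nx\r\n")
print(load_aof(aof))
```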

AOF rewrite

To solve the problem of bloated AOF files, Redis provides AOF file rewriting. With this feature, the Redis server can create a new AOF file to replace the existing AOF file. The old and new AOF files hold the same database state, but the new AOF file does not contain any redundant commands that waste space, and is much smaller.

The AOF rewrite works by reading the current value of each key from the database and recording the key-value pair with a single command, in place of the multiple commands that previously built it up.
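
A minimal sketch of this idea in Python: the function walks a toy database and emits one command per key, regardless of how many commands originally produced the value. The value types handled, the `rewrite_aof` name, and the choice of SET, RPUSH, and SADD are assumptions of the example, and any per-command size limits a real server may apply are ignored.

```python
def rewrite_aof(db: dict) -> list:
    """Return one command per key that recreates the current database state."""
    commands = []
    for key in sorted(db):
        value = db[key]
        if isinstance(value, str):
            commands.append(["SET", key, value])
        elif isinstance(value, list):
            commands.append(["RPUSH", key, *value])
        elif isinstance(value, set):
            commands.append(["SADD", key, *sorted(value)])
        else:
            raise TypeError(f"unsupported value type for key {key!r}")
    return commands


# A list built by several pushes is recorded as a single RPUSH.
db = {"list": ["A", "B", "C"], "msg": "hi"}
print(rewrite_aof(db))
```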

The Redis server uses a single thread to process command requests, so if the server process performed the rewrite itself, it could not process client command requests while the AOF file was being rewritten. Redis therefore performs the AOF rewrite in a child process, which achieves two goals:

  • The server process (parent) can continue to process command requests while the child process does the AOF rewrite.
  • The child process has a copy of the server process's data, and using a child process instead of a thread keeps the data safe without using locks.

Because the server process continues to process command requests during the AOF rewrite, new commands may modify the database, so the server's current database state can differ from the state stored in the rewritten AOF file. To solve this inconsistency, the Redis server sets up an AOF rewrite buffer, which is used from the moment the server creates the child process. While the rewrite is in progress, the server sends each write command it executes to both the AOF buffer and the AOF rewrite buffer.

When the child completes the AOF rewrite, it sends a signal to the parent. The parent receives the signal and performs the following operations:

  • Write all the contents of the AOF rewrite buffer to the new AOF file, and the database state stored in the new AOF file will be the same as the current database state of the server.
  • Rename the new AOF file, overwrite the existing AOF file atomically, and complete the replacement of the old and new AOF files.
