Redis is an open source (BSD-licensed), in-memory data structure store that can be used as a database, cache, and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, HyperLogLogs, and geospatial indexes with radius queries. Redis has built-in replication, Lua scripting, LRU eviction, transactions, and different levels of on-disk persistence, and provides high availability through Redis Sentinel and automatic partitioning with Redis Cluster.
1. Master-slave architecture
The Redis SLAVEOF command turns the current server into a slave of the specified server (since Redis 5.0, the same command is also available as REPLICAOF).
If the current server is already a slave of some master, executing SLAVEOF host port makes it stop synchronizing with the old master, discard the old dataset, and start synchronizing with the new master.
In addition, executing SLAVEOF NO ONE on a slave turns off replication and switches the server back from slave to master without discarding the dataset it has already synchronized. Because that dataset is kept, a slave can be promoted with SLAVEOF NO ONE and used as the new master when the master fails.
redis 127.0.0.1:6380> SLAVEOF 127.0.0.1 6379
OK
redis 127.0.0.1:6380> SLAVEOF NO ONE
OK
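The SLAVEOF rules above can be summarized in a small model. This is a minimal sketch, assuming a hypothetical ToyServer class whose dataset is a plain dict; it only mirrors the two rules described here (following a new master replaces the dataset, NO ONE keeps it) and is not how Redis implements replication.

```python
class ToyServer:
    """Hypothetical toy model of SLAVEOF semantics; not Redis's implementation."""

    def __init__(self, name, data=None):
        self.name = name
        self.data = dict(data or {})
        self.master = None  # None means this server currently acts as a master

    @property
    def role(self):
        return "master" if self.master is None else "slave"

    def slaveof(self, master):
        # SLAVEOF host port: stop following any old master, discard the
        # old dataset, and take a fresh copy of the new master's data.
        self.master = master
        self.data = dict(master.data)

    def slaveof_no_one(self):
        # SLAVEOF NO ONE: stop replicating but keep the synchronized data,
        # so this server can take over as the new master.
        self.master = None


old_master = ToyServer("6379", {"user:1": "alice"})
replica = ToyServer("6380", {"leftover": "x"})
replica.slaveof(old_master)   # the leftover key is discarded
replica.slaveof_no_one()      # promoted to master, dataset kept
```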
1. Roles in the master-slave architecture
- The master server handles write requests
- The slave servers handle read requests
- Data on the master is copied to the slaves, so the slaves hold the same data as the master (replication is asynchronous, so a slave may briefly lag behind)
In addition to the form above, a less common variant of the master-slave architecture is cascading replication, in which a slave itself has slaves (master → slave → slave).
2. Advantages of master-slave architecture
- Read/write separation (the master handles writes, the slaves handle reads)
- High availability (if one slave goes down, the other slaves keep serving requests, so the service is not affected)
- More concurrency (every slave can serve read requests, so the overall read QPS goes up)
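Read/write separation is usually implemented on the client side or in a proxy. The sketch below is a hypothetical router, assuming a hand-picked WRITE_COMMANDS set and plain "host:port" strings for servers: writes go to the master and reads rotate round-robin over the slaves. Because replication is asynchronous, a read routed to a slave may return slightly stale data.

```python
import itertools

# Illustrative subset only; a real router needs the full list of write commands.
WRITE_COMMANDS = {"SET", "DEL", "INCR", "LPUSH", "HSET", "EXPIRE"}


class ReadWriteRouter:
    """Hypothetical client-side router: writes to the master, reads over slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; with no slaves, everything goes to the master.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, command):
        name = command.split()[0].upper()
        if name in WRITE_COMMANDS or self._slaves is None:
            return self.master
        return next(self._slaves)


router = ReadWriteRouter("127.0.0.1:6379", ["127.0.0.1:6380", "127.0.0.1:6381"])
```

Spreading reads this way is what lets read QPS grow with the number of slaves, while write throughput is still bounded by the single master.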
2. The replication feature
1. How replication is implemented
Replication consists of two operations:
- Synchronization (sync): updates the slave's database state to match the master's current database state
- Command propagation: when the master's database state is modified and master and slave become inconsistent again, the master sends the write command that caused the change to the slave, which executes it so that the two are consistent again
Synchronization between a slave and its master happens in two situations:
- Initial synchronization: the slave has never replicated any master, or the master it is about to replicate is different from the one it replicated last time.
- Synchronization after disconnection: replication between master and slave is interrupted (for example, by a network problem), and the slave automatically reconnects to the master and continues replicating it.
Before Redis 2.8, a slave that reconnected after a disconnection was missing only part of the data, yet the master and slave still had to re-execute SYNC. This was inefficient, because SYNC resynchronizes the entire dataset, not just the lost data.
Let’s take a closer look at how replication works in Redis 2.8 and later:
2. Preparation before replication
- The slave sets the master's IP address and port
- The slave establishes a socket connection to the master
- The slave sends a PING command (to check that the socket can be read and written and that the master can process commands)
- Authentication (if the slave has the corresponding option, masterauth, configured, it authenticates with the master)
- The slave sends its listening port to the master, and the master records it
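To make this handshake concrete, the sketch below encodes the commands a slave sends in this phase, followed by the PSYNC that starts synchronization, in RESP (the Redis wire protocol, where a command is an array of bulk strings). It is simplified: the helper names are hypothetical, and real Redis versions also send additional REPLCONF options (such as capability flags) that are omitted here. PSYNC ? -1 is what a slave with no previous master sends to ask for a full resynchronization.

```python
from typing import List, Optional


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(parts)


def handshake_commands(listening_port: int, masterauth: Optional[str] = None) -> List[bytes]:
    """Commands a slave sends to its master, in order (simplified)."""
    commands = [("PING",)]                   # is the connection usable?
    if masterauth is not None:               # only when authentication is configured
        commands.append(("AUTH", masterauth))
    commands.append(("REPLCONF", "listening-port", str(listening_port)))
    commands.append(("PSYNC", "?", "-1"))    # no previous master: request a full resync
    return [encode_command(*cmd) for cmd in commands]
```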
As mentioned earlier, before Redis 2.8 a slave had to re-execute SYNC after every disconnection, which was very inefficient. Let’s look at how synchronization works from Redis 2.8 on.
Starting from version 2.8, Redis uses the PSYNC command instead of the SYNC command to perform synchronization during replication. The PSYNC command has both full and partial resynchronization modes.
3. Full resynchronization
- The slave sends the PSYNC command to the master.
- On receiving PSYNC, the master executes BGSAVE to generate an RDB file in the background, and uses a buffer to record every write command executed from that moment on.
- When BGSAVE finishes, the master sends the generated RDB file to the slave, which receives and loads it. This brings the slave's database to the state the master's database was in when BGSAVE was executed.
- The master then sends all the write commands in the buffer to the slave, and the slave executes them, so that the data on both sides ends up consistent.
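The four steps can be traced in a small model. This is a minimal sketch, assuming hypothetical ToyMaster and ToySlave classes that only support SET-style writes, with a dict snapshot standing in for the RDB file; the point is why the buffer is needed: a write that arrives while BGSAVE runs is not in the snapshot and must be replayed afterwards.

```python
class ToyMaster:
    """Hypothetical model: only SET-style writes; a dict stands in for the RDB file."""

    def __init__(self):
        self.data = {}
        self.repl_buffer = None  # write commands recorded while BGSAVE runs

    def set(self, key, value):
        self.data[key] = value
        if self.repl_buffer is not None:
            self.repl_buffer.append((key, value))

    def start_bgsave(self):
        # Take a point-in-time snapshot and start buffering new writes.
        self.repl_buffer = []
        return dict(self.data)

    def flush_buffer(self):
        # Hand the buffered writes over and stop buffering.
        buffered, self.repl_buffer = self.repl_buffer, None
        return buffered


class ToySlave:
    def __init__(self):
        self.data = {"stale": "value"}

    def load_rdb(self, snapshot):
        self.data = dict(snapshot)  # the old dataset is replaced wholesale

    def apply(self, commands):
        for key, value in commands:
            self.data[key] = value


master, slave = ToyMaster(), ToySlave()
master.set("a", "1")
snapshot = master.start_bgsave()  # slave sent PSYNC, master runs BGSAVE
master.set("b", "2")              # write arrives while the RDB is being produced
slave.load_rdb(snapshot)
slave.apply(master.flush_buffer())
```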
4. Partial resynchronization
Now let’s look at partial resynchronization. It lets a slave that reconnects after a disconnection synchronize only the data it missed, instead of all the data as before Redis 2.8. Partial resynchronization is built on the following parts:
- The replication offsets of the master and the slave
- The master's replication backlog buffer
- The server run ID
Let’s go through them one by one.
Replication offset: both the master and the slave maintain a replication offset:
- The master server adds N to its own replication offset each time it propagates N bytes
- Each time the slave server receives N bytes from the master, it adds N to its own replication offset
By comparing the master's and the slave's replication offsets, it is easy to tell whether their data is consistent.
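The offset bookkeeping can be illustrated with a few lines. This is a minimal sketch, assuming a hypothetical ReplicationPeer class and plain byte strings as commands (real Redis propagates RESP-encoded commands, so the byte counts differ); a slave that misses a command ends up with a smaller offset than the master.

```python
class ReplicationPeer:
    """Hypothetical peer that only tracks its replication offset."""

    def __init__(self, name):
        self.name = name
        self.offset = 0  # total bytes of the replication stream sent or received


def propagate(master, slaves, command, reachable):
    """Master sends `command`; only slaves whose name is in `reachable` get it."""
    master.offset += len(command)
    for slave in slaves:
        if slave.name in reachable:
            slave.offset += len(command)


def in_sync(master, slave):
    return master.offset == slave.offset


m, a, b = ReplicationPeer("m"), ReplicationPeer("a"), ReplicationPeer("b")
propagate(m, [a, b], b"SET k v", reachable={"a", "b"})
propagate(m, [a, b], b"INCR n", reachable={"a"})  # b is briefly disconnected
```

After these two calls the master and slave a are at offset 13, while slave b is still at 7, so b is the one that needs to catch up.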
Now suppose a slave that has just reconnected sends PSYNC to the master and reports a replication offset of 36. Should the master perform a full or a partial resynchronization on it? The replication backlog decides. When the master propagates commands, it not only sends the write commands to all slaves but also appends them to the replication backlog, a fixed-size FIFO buffer whose size can be adjusted with repl-backlog-size. If the data after the slave's offset is still in the backlog, a partial resynchronization is performed; otherwise, a full resynchronization is performed.
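The backlog behaves like a circular buffer that keeps only the most recent bytes of the replication stream. The sketch below is a simplified model, assuming a hypothetical ReplicationBacklog class with a tiny capacity so the overwrite is visible (Redis's default repl-backlog-size is 1mb); it shows why an old offset can no longer be served and forces a full resynchronization.

```python
class ReplicationBacklog:
    """Hypothetical fixed-size backlog: keeps the most recent `capacity` bytes
    of the replication stream in a circular buffer (simplified model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = bytearray(capacity)
        self.master_offset = 0  # total bytes the master has propagated so far

    def append(self, data):
        # The byte at stream position i lives at buf[i % capacity];
        # once the buffer is full, new bytes overwrite the oldest ones.
        for byte in data:
            self.buf[self.master_offset % self.capacity] = byte
            self.master_offset += 1

    def first_offset(self):
        # Smallest slave offset whose following bytes are all still available.
        return max(0, self.master_offset - self.capacity)

    def can_continue(self, slave_offset):
        return self.first_offset() <= slave_offset <= self.master_offset

    def data_after(self, slave_offset):
        """Bytes the slave is missing, i.e. positions slave_offset..master_offset-1."""
        if not self.can_continue(slave_offset):
            raise ValueError("data already overwritten: full resynchronization needed")
        return bytes(self.buf[i % self.capacity]
                     for i in range(slave_offset, self.master_offset))


backlog = ReplicationBacklog(capacity=8)
backlog.append(b"SET k v\n")
backlog.append(b"INCR")  # overwrites the oldest 4 bytes
```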
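The run ID is used to check whether the slave is reconnecting to the same master. Each server has a run ID; when a slave first replicates a master, it saves the master's run ID, and after a disconnection it sends this saved ID along with PSYNC. If the ID matches the master's own run ID, the slave was replicating this same master before the disconnection, and the master can try a partial resynchronization. If the IDs differ, the slave was previously replicating a different master, so a full resynchronization is performed.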
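Putting the run ID and the backlog together, the master's decision can be written as one function. This is a minimal sketch with a hypothetical handle_psync function, assuming the slave's offset counts the bytes it has already received and that the backlog can still serve every offset from backlog_first up to master_offset; the +FULLRESYNC and +CONTINUE reply names follow the PSYNC protocol, but this is not Redis's source code.

```python
def handle_psync(master_run_id, backlog_first, master_offset, req_run_id, req_offset):
    """Return the reply a master would send to `PSYNC <req_run_id> <req_offset>`."""
    if req_run_id == "?" or req_run_id != master_run_id:
        # First-time replication, or the slave used to follow a different master.
        return f"+FULLRESYNC {master_run_id} {master_offset}"
    if backlog_first <= req_offset <= master_offset:
        # Same master, and the missing bytes are still in the backlog.
        return "+CONTINUE"
    # Same master, but the backlog has already overwritten the missing bytes.
    return f"+FULLRESYNC {master_run_id} {master_offset}"
```

For example, with run ID "abc" and a backlog covering offsets 100 to 200, a slave that reports "abc" and offset 150 gets +CONTINUE, while the same slave at offset 50 gets a full resynchronization.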
5. Command propagation
Once synchronization is complete, master and slave enter the command propagation phase. The master only needs to send its write commands to the slave, and the slave receives and executes them, which keeps the data on both sides consistent. During command propagation, the slave sends REPLCONF ACK <replication_offset> to the master once per second, where replication_offset is the slave's current replication offset. This command serves three purposes:
- Checking the network connection between master and slave
- Helping implement the min-slaves options (min-slaves-to-write and min-slaves-max-lag)
- Detecting lost commands
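Two of these uses can be sketched from the master's side. This is a minimal model with a hypothetical SlaveAckState structure: the time since the last ACK serves as the network check (Redis reports it as the lag field in INFO replication), and an acknowledged offset below the master's own offset marks the bytes that, according to the description above, the master would re-send from the replication backlog. In practice such a gap can also simply mean data still in flight.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SlaveAckState:
    """What a master might track per slave (hypothetical structure)."""
    offset: int = 0             # offset from the latest REPLCONF ACK
    last_ack_time: float = 0.0  # when that ACK arrived, in seconds


def on_replconf_ack(state, offset, now):
    # In the real protocol, each slave sends this once per second.
    state.offset = offset
    state.last_ack_time = now


def lag_seconds(state, now):
    # Network check: a growing lag means ACKs have stopped arriving.
    return now - state.last_ack_time


def missing_range(state, master_offset) -> Optional[Tuple[int, int]]:
    # Lost-command check: bytes [state.offset, master_offset) were not acknowledged.
    if state.offset < master_offset:
        return (state.offset, master_offset)
    return None
```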
3. Master/slave switchover (failover)
1. The Sentinel mechanism
Redis provides the Sentinel mechanism: if the master goes down, a slave can be promoted to master, and when the old master (the one that went down) comes back online, it becomes a slave of the new master.
- This process is called failover.
The Sentinel mechanism is mainly used to make Redis highly available. Its main functions are:
- Monitoring: Sentinel continuously checks whether the Redis master and slaves are working properly
- Notification: if a Redis instance fails, Sentinel can notify the administrator
- Automatic failover: if the master fails, Sentinel automatically promotes a slave to master (including the related configuration changes)
- Configuration provider: Sentinel tells clients the address of the current master
Sentinel makes Redis highly available, so Sentinel itself must also be highly available: it should run as a group of Sentinel processes rather than as a single point of failure.
Now let’s see how Sentinel promotes a slave to master.
2. Starting and initializing Sentinel
First things first: Sentinel is essentially just a Redis server running in a special mode. Because Sentinel and Redis servers do different things, they are initialized differently (for example, Sentinel does not load AOF/RDB files when initialized because Sentinel does not use a database).
At startup, part of the regular Redis server code is replaced with Sentinel-specific code. In particular, the command table is replaced, so although Sentinel is a Redis server, it cannot execute commands such as SET or DBSIZE.
Next, Sentinel initializes its state and builds the list of masters it monitors from the given configuration file.
Finally, Sentinel creates two network connections to each monitored master:
- A command connection, used to send commands to the master and receive its replies
- A subscription connection, used to subscribe to the master's __sentinel__:hello channel
3. Obtaining and updating information
Sentinel sends the INFO command to the master (every ten seconds by default) to obtain the addresses of all of the master's slaves, and creates an instance structure for each of them. When it detects a new slave, Sentinel also creates a command connection and a subscription connection to that slave.
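Slave discovery boils down to parsing the replication section of the INFO reply, where each slave appears on a line such as slave0:ip=...,port=.... This is a minimal sketch with a hypothetical parse_slaves helper; the SAMPLE text is illustrative, and the exact set of fields on these lines varies between Redis versions, so the parser only relies on ip and port.

```python
def parse_slaves(info_replication):
    """Extract (ip, port) pairs from the slaveN lines of INFO replication output."""
    slaves = []
    for line in info_replication.splitlines():
        key, sep, value = line.partition(":")
        # Keep only slave0, slave1, ...; skip keys like slave_repl_offset.
        if not sep or not key.startswith("slave") or not key[len("slave"):].isdigit():
            continue
        fields = dict(item.split("=", 1) for item in value.split(","))
        slaves.append((fields["ip"], int(fields["port"])))
    return slaves


SAMPLE = (
    "# Replication\r\n"
    "role:master\r\n"
    "connected_slaves:2\r\n"
    "slave0:ip=127.0.0.1,port=6380,state=online,offset=1234,lag=0\r\n"
    "slave1:ip=127.0.0.1,port=6381,state=online,offset=1234,lag=1\r\n"
    "master_repl_offset:1234\r\n"
)
```

Each (ip, port) pair that Sentinel has not seen before would then get its own instance structure and its own pair of connections.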
While Sentinel is running, it uses its command connections to publish a message to the __sentinel__:hello channel of every monitored master and slave once every two seconds, and it receives the messages posted to that channel through its subscription connections.
Through these hello messages, each Sentinel learns about the other Sentinels monitoring the same master and keeps the information in its instance structures up to date.
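As we understand the Redis source, a hello message is a comma-separated string of eight fields: the sending Sentinel's IP, port, run ID, and current epoch, followed by the master's name, IP, port, and configuration epoch. The sketch below parses such a payload and records the other Sentinels it reveals; Hello, parse_hello, and record_sentinel are hypothetical names, and the field layout should be checked against the Redis version in use.

```python
from typing import Dict, NamedTuple, Tuple


class Hello(NamedTuple):
    sentinel_ip: str
    sentinel_port: int
    sentinel_run_id: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> Hello:
    parts = payload.split(",")
    if len(parts) != 8:
        raise ValueError("expected 8 comma-separated fields")
    ip, port, run_id, epoch, name, m_ip, m_port, cfg_epoch = parts
    return Hello(ip, int(port), run_id, int(epoch), name, m_ip, int(m_port), int(cfg_epoch))


def record_sentinel(known: Dict[str, Tuple[str, int]], hello: Hello, my_run_id: str) -> None:
    # Ignore our own hello messages; remember every other Sentinel by run ID.
    if hello.sentinel_run_id != my_run_id:
        known[hello.sentinel_run_id] = (hello.sentinel_ip, hello.sentinel_port)
```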
4. Checking whether the master is offline
There are two cases to determine whether the primary server is offline:
- Subjectively down (SDOWN)
  - Once per second, Sentinel sends a PING command over its command connection to every instance it is connected to: masters, slaves, and other Sentinels. The reply tells Sentinel whether the instance is online.
  - If a master keeps returning invalid replies (or no reply) for down-after-milliseconds milliseconds, Sentinel considers that master subjectively down.
- Objectively down (ODOWN)
  - When a Sentinel finds a master subjectively down, it asks the other Sentinels monitoring that master whether they also consider it down, to confirm that the master really is offline.
  - If enough Sentinels agree that the master is down, the master is judged objectively down, and a failover is performed on it.
Both thresholds are configurable: how long a master must keep returning invalid replies before it is considered subjectively down, and how many Sentinels must agree before it is considered objectively down.
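The two checks can be written as two small predicates. This is a minimal sketch with hypothetical function names; down_after_ms plays the role of the down-after-milliseconds setting and quorum the role of the quorum argument of sentinel monitor in sentinel.conf. Real Sentinel also tracks replies per instance and exchanges these opinions over the network, which is left out here.

```python
def subjectively_down(last_valid_reply_ms, now_ms, down_after_ms):
    # No valid PING reply for longer than down-after-milliseconds.
    return now_ms - last_valid_reply_ms > down_after_ms


def objectively_down(i_think_down, other_sentinels_agree, quorum):
    # Count this Sentinel plus every other Sentinel that also reports the master down.
    if not i_think_down:
        return False
    return 1 + sum(other_sentinels_agree) >= quorum
```

With down_after_ms=30000, a master whose last valid reply was 31 seconds ago is subjectively down; with a quorum of 2, one other Sentinel agreeing is enough to make it objectively down.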
5. Electing the leader Sentinel and performing failover
When a master is judged objectively down, the Sentinels monitoring it negotiate to elect a leader Sentinel, which then performs the failover for that master.
The election has a number of rules, but in short it is first come, first served: in each election round (epoch), a Sentinel gives its vote to the first candidate that asks for it, and a candidate that collects votes from a majority of the Sentinels (and at least the configured quorum) becomes the leader. The leader Sentinel then performs failover on the offline master in three steps:
- Pick one of the offline master's slaves and promote it to master
- Make all the other slaves of the offline master replicate the new master
- When the offline master comes back online, make it a slave of the new master
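The first-come, first-served vote described above can be modeled in a few lines. This is a minimal sketch, assuming a hypothetical VotingSentinel class and that each candidate asks every Sentinel (itself included) in list order; it ignores network delays and vote timeouts, so it illustrates the voting rule rather than Redis's sentinel.c logic.

```python
class VotingSentinel:
    """Hypothetical model of one Sentinel's vote; not the real Redis logic."""

    def __init__(self, run_id):
        self.run_id = run_id
        self.votes = {}  # epoch -> run ID of the candidate that got this Sentinel's vote

    def request_vote(self, candidate, epoch):
        # First come, first served: the first candidate to ask in an epoch gets the vote.
        return self.votes.setdefault(epoch, candidate) == candidate


def run_election(candidate, sentinels, epoch, quorum):
    """The candidate asks every Sentinel, itself included, for a vote."""
    votes = sum(s.request_vote(candidate, epoch) for s in sentinels)
    majority = len(sentinels) // 2 + 1
    return votes >= max(majority, quorum)


sentinels = [VotingSentinel(name) for name in ("s1", "s2", "s3")]
```

If s1 asks first in epoch 1, it wins every vote and s2 gets none in that epoch; s2 can only win in a later epoch, which is why a failed election simply moves on to a new round.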
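The new master is chosen from the slaves using the following criteria:
- Disconnection time: slaves that have been disconnected from the master for too long are excluded
- Slave priority: the slave with the lowest slave-priority value is preferred
- Replication offset: among slaves with the same priority, the one with the largest offset (the most up-to-date data) is preferred
- Run ID: if everything else is equal, the slave with the smallest run ID is chosen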
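These criteria map naturally onto a filter followed by a sort key. This is a minimal sketch with a hypothetical SlaveInfo structure; max_disconnect_ms is an assumed parameter standing in for Sentinel's "disconnected for too long" threshold, and a priority of 0 is treated as "never promote", which matches the documented meaning of slave-priority 0.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SlaveInfo:
    run_id: str
    priority: int         # slave-priority; 0 means "never promote"
    repl_offset: int      # how much of the replication stream it has processed
    disconnected_ms: int  # how long it has been disconnected from the old master


def select_new_master(slaves: List[SlaveInfo], max_disconnect_ms: int) -> Optional[SlaveInfo]:
    eligible = [s for s in slaves
                if s.priority != 0 and s.disconnected_ms <= max_disconnect_ms]
    if not eligible:
        return None
    # Lower priority value first, then larger offset, then smaller run ID.
    return min(eligible, key=lambda s: (s.priority, -s.repl_offset, s.run_id))
```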
6. Data loss
With the master-slave + Sentinel architecture, Redis can be considered highly available, but to be clear, it can still lose data in two cases:
- Data loss caused by asynchronous replication
  - Writes that the master has accepted but not yet copied to any slave are lost if the master goes down
- Data loss caused by split-brain
  - Sometimes the master is cut off from the network and can no longer reach its slaves. The Sentinels may then consider it offline and start a failover that promotes one of the slaves, even though the old master is still running. At this point the system has two masters, which is called split-brain.
  - Even after a slave has been promoted, clients may keep writing to the old master until they switch to the new one. When the old master reconnects, it becomes a slave of the new master and discards its own dataset, so the writes it accepted in the meantime are lost.
You can use the following two settings to reduce the likelihood of data loss:
min-slaves-to-write 1
min-slaves-max-lag 10
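With these settings, the master refuses writes unless at least 1 slave has acknowledged the replication stream (through REPLCONF ACK) within the last 10 seconds. This bounds the loss rather than eliminating it: an old master isolated by a split-brain stops accepting writes after about 10 seconds. Since Redis 5.0 the same options are also available as min-replicas-to-write and min-replicas-max-lag. The sketch below models the check as a pure function over slave lags in seconds; the function name is hypothetical, and it assumes that setting either option to 0 disables the check.

```python
def master_accepts_writes(slave_lags, min_slaves_to_write, min_slaves_max_lag):
    """slave_lags: seconds since each online slave's last REPLCONF ACK."""
    if min_slaves_to_write == 0 or min_slaves_max_lag == 0:
        return True  # assumed: either option at 0 disables the check
    good = sum(1 for lag in slave_lags if lag <= min_slaves_max_lag)
    return good >= min_slaves_to_write
```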