How does Redis prevent data loss
We know that a cache database reads and writes data in memory, which is why its performance is so high. But data held in memory is lost when the server restarts, so to make sure data is not lost, the data in memory must be saved to disk; when the cache server restarts, it can then restore the original data from disk. This process is Redis data persistence.
This is one of the things that distinguishes Redis from other cache databases (Memcached, for example, has no persistence). Redis persists data in three ways.
- AOF log (Append Only File): records every operation command and appends it to a file as text.
- RDB snapshot (Redis DataBase): writes the in-memory data at a certain point in time to disk in binary form.
- Hybrid persistence: Redis 4.0 added a hybrid persistence approach that combines the benefits of RDB and AOF.
Let’s look at how these three approaches work.
How is AOF logging implemented
Typically, relational databases such as MySQL use a Write-Ahead Log (WAL): modifications are recorded in a log file before the data itself is actually written, so that the data can be recovered in the event of a failure. For example, MySQL's redo log records data modifications.
The AOF log records every write command received by Redis, and these commands are stored in text format. The difference is that Redis logs in the opposite order from a traditional relational database: AOF is a write-after log, meaning Redis first executes the command to write the data into memory, and only then appends the command to the log file.
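To make "stored in text format" concrete, here is a minimal Python sketch of how a single write command is laid out once appended to the AOF file. It follows the Redis protocol (RESP) layout that the AOF uses; the function name and the example key are only for illustration.

```python
def aof_entry(*args: str) -> str:
    # Lay out a command the way it appears in an AOF file:
    # "*<argument count>", then "$<byte length>" followed by the value
    # for each argument, all separated by \r\n.
    lines = [f"*{len(args)}"]
    for arg in args:
        lines.append(f"${len(arg.encode('utf-8'))}")
        lines.append(arg)
    return "\r\n".join(lines) + "\r\n"

print(aof_entry("SET", "name", "redis"))
# *3
# $3
# SET
# $4
# name
# $5
# redis
```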
Why does Redis execute the command first and only then write it to the log? To help you remember, here are the key points:
Because Redis does not syntax-check commands before writing them to the log, logging after execution guarantees that only commands that executed successfully are recorded, which avoids filling the log with invalid commands. Also, logging after the command has been executed does not block the current write operation.
Of course, there are risks associated with doing so (which you’ll need to explain in the interview).
Data may be lost: if Redis goes down right after executing a command but before logging it, that command's data may be lost.
Other operations may be blocked: although AOF is a write-after log and therefore does not block the command currently being executed, AOF logging also runs in the main thread, so when Redis is writing the log file to disk it can still block subsequent operations.
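How costly that disk write is depends on the appendfsync policy, which controls how often Redis flushes the AOF to disk. Here is a minimal sketch with the redis-py client, assuming a local Redis instance; the same options can also be set in redis.conf.

```python
import redis

r = redis.Redis(host="localhost", port=6379)  # assumes a local Redis instance

# Turn on AOF and choose how often the log is fsync'ed to disk.
r.config_set("appendonly", "yes")
r.config_set("appendfsync", "everysec")
# "always"   - fsync after every command: safest, blocks the most
# "everysec" - fsync once per second: at most about one second of data lost
# "no"       - leave flushing to the operating system: fastest, least safe
```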
AOF is not the only persistence mechanism Redis offers, so we also need to learn about RDB.
How are RDB snapshots implemented
The AOF log records operation commands rather than the actual data, so when you use AOF for fault recovery you have to replay the entire log. Once there are too many logged commands, Redis recovery is bound to be slow.
To solve this problem, Redis added RDB snapshots (a memory snapshot records the state of the in-memory data on disk at a certain point in time) to ensure reliability and fast recovery after downtime.
Unlike AOF, RDB records the actual data in Redis at a certain moment rather than the operations, so during data recovery you only need to read the RDB file directly into memory to complete a fast recovery.
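As a quick illustration (assuming a local instance and the redis-py client), you can check where the snapshot file lives and which rules trigger an automatic snapshot:

```python
import redis

r = redis.Redis()  # assumes a local Redis instance

# The RDB file is written to <dir>/<dbfilename> (dump.rdb by default).
print(r.config_get("dir"))
print(r.config_get("dbfilename"))

# The "save" rules decide when an automatic snapshot is taken; for example,
# a rule of "900 1" means: snapshot if at least 1 key changed within 900 seconds.
print(r.config_get("save"))
```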
Does RDB block the main thread when making snapshots
Redis's single-threaded model requires that all operations avoid blocking the main thread as much as possible, and RDB snapshots are no exception: if taking a snapshot blocked the main thread, Redis performance would suffer.
To address this, Redis provides two commands for generating an RDB snapshot file: save and bgsave. The save command runs in the main thread and therefore blocks it. The bgsave command forks a child process to write the RDB file, which avoids blocking the main thread and is Redis's default behavior for RDB snapshots.
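A small illustration of the two commands with redis-py, assuming a local instance; in production you almost always rely on bgsave precisely because save blocks:

```python
import redis

r = redis.Redis()  # assumes a local Redis instance

# SAVE runs the snapshot in the main thread and blocks every other command.
# r.save()

# BGSAVE forks a child process to write the RDB file, so the main thread
# keeps serving requests while the snapshot is written.
r.bgsave()
print(r.lastsave())  # timestamp of the last successful snapshot
```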
Can RDB data be modified when making snapshots
What would happen if the data could or could not be modified during snapshot execution?
If writes are allowed at this point, Redis can handle write operations normally, but then the data captured by the snapshot in progress may be modified while it is being taken.
If writes are not allowed at this point, then no Redis write can proceed until the snapshot completes, and the main thread is blocked again.
So how does Redis solve this problem? It relies on the bgsave child process, as follows:
- If the main thread only performs reads, the main thread and the bgsave child process do not affect each other.
- If the main thread performs a write, the affected memory is copied (copy-on-write): the bgsave child process keeps reading the data as it was at the moment of the snapshot and writes it to the RDB file, while the main thread's modification goes to its own copy, so it is never blocked (a sketch follows below).
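A minimal, Unix-only Python sketch: after a fork, the child keeps seeing the data as it was at fork time, while the parent's later writes land in its own copy. This is only an analogy for what the operating system does for the bgsave child at the memory-page level.

```python
import os
import time

data = {"counter": 0}

pid = os.fork()  # Unix only; analogous to the fork that starts bgsave
if pid == 0:
    # Child ("bgsave"): pretend to write the snapshot after a short delay.
    time.sleep(1)
    print("child sees:", data["counter"])   # 0 - the value at fork time
    os._exit(0)
else:
    # Parent ("main thread"): keep accepting writes while the child works.
    data["counter"] = 42
    os.waitpid(pid, 0)
    print("parent sees:", data["counter"])  # 42 - the latest value
```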
Note that the frequency of RDB snapshots matters, because it affects how complete the snapshot data is and how stable Redis stays. For this reason, Redis 4.0 added a hybrid persistence mechanism that combines AOF and RDB: the existing data is written to the file in RDB format, and subsequent operation commands are appended to the same file in AOF format. This keeps Redis restarts fast while reducing the risk of data loss.
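Hybrid persistence is controlled by a single configuration option; here is a sketch via redis-py against a local instance, purely for illustration (the same setting normally lives in redis.conf):

```python
import redis

r = redis.Redis()  # assumes a local Redis instance

r.config_set("appendonly", "yes")            # hybrid persistence builds on AOF
r.config_set("aof-use-rdb-preamble", "yes")  # Redis 4.0+: AOF rewrites start with
                                             # an RDB-format preamble, followed by
                                             # commands in AOF format
```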
To sum up, when asked how Redis avoids losing data, you should first recognize that the interviewer is testing your knowledge of Redis data persistence. Start by listing the persistence methods Redis offers, then analyze how AOF and RDB work and the problems of each, and finally cover the hybrid persistence mechanism added in Redis 4.0.
How does Redis make services highly available
In addition, Redis is not only used as a cache; in many cases it serves directly as the data store, so you need a highly available Redis service to keep the business running normally. So how do you design a Redis high-availability service that doesn't go down?
Think about it: what is the basic solution to making data highly available? Replicas. To design a highly available Redis service, you must consider running Redis on multiple nodes, namely master-slave replication, Sentinel mode, and Redis Cluster. These are the three things you should definitely cover in an interview.
- Master/slave synchronization (master/slave replication)
This is the most basic guarantee of a highly available Redis service. The scheme is to synchronize data from one master Redis server to multiple slave Redis servers, that is, a one-master, many-slaves setup. This allows us to separate reads and writes in Redis and handle more concurrent operations, which is the same principle as MySQL's master-slave replication.
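A minimal sketch of a one-master, one-slave setup with redis-py, assuming two local instances on ports 6379 and 6380 (the ports and the key are illustrative):

```python
import redis

master = redis.Redis(port=6379)   # assumed master instance
replica = redis.Redis(port=6380)  # assumed replica instance

# Make the second instance replicate the first (the SLAVEOF/REPLICAOF command).
replica.slaveof("127.0.0.1", 6379)

master.set("greeting", "hello")
# Replication is asynchronous, so the replica may briefly lag behind the master.
print(replica.get("greeting"))    # b'hello' once the write has been replicated

# Read/write separation: send writes to the master and reads to the replicas.
```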
- Redis Sentinel (Sentinel mode)
When using the Redis master/slave setup, there is a problem: when the master server fails, recovery has to be done manually. To solve this, Redis added Sentinel mode, which monitors the master and slave servers and provides automatic failover.
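With Sentinel, clients no longer hard-code the master's address; they ask Sentinel who the current master is. A minimal redis-py sketch, assuming a Sentinel listening on port 26379 that monitors a master group named mymaster:

```python
from redis.sentinel import Sentinel

sentinel = Sentinel([("127.0.0.1", 26379)], socket_timeout=0.5)

# These connections always point at the current master/replica, even after
# Sentinel performs an automatic failover.
master = sentinel.master_for("mymaster", socket_timeout=0.5)
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)

master.set("key", "value")   # writes go to whichever node is master right now
print(replica.get("key"))    # reads can be served by a replica
```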
- Redis Cluster
Redis Cluster is a distributed, decentralized mode of operation introduced in Redis 3.0. It spreads data across different servers to reduce the system's dependence on a single master node and improve the read and write performance of the Redis service.
Redis Cluster uses hash slots to handle the mapping between data and instances. In this scheme, a sharded cluster has a total of 16384 hash slots, which act like data partitions: each key-value pair is mapped to a hash slot according to its key. The mapping takes two steps.
- First, compute a 16-bit value from the key of the key-value pair using the CRC16 algorithm.
- Then take that 16-bit value modulo 16384; the result, which falls in the range 0 to 16383, identifies the corresponding hash slot (see the sketch below).
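A small sketch of the slot calculation in Python. The CRC16 variant below (polynomial 0x1021, initial value 0) is the one Redis Cluster uses, but the sketch ignores hash tags (the {...} syntax that forces related keys into the same slot).

```python
def crc16(data: bytes) -> int:
    # CRC-16/XMODEM: polynomial 0x1021, initial value 0x0000.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    # HASH_SLOT = CRC16(key) mod 16384
    return crc16(key.encode("utf-8")) % 16384

print(hash_slot("user:1000"))  # always a value in the range 0..16383
```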
The remaining question is, how are these hash slots mapped to specific Redis instances? There are two options.
- Evenly distributed: when you create a Redis cluster with the cluster create command, Redis automatically distributes all hash slots evenly across the cluster's instances. For example, if there are nine instances in the cluster, each instance gets roughly 16384/9 ≈ 1820 slots.
- Manual allocation: you can use the cluster meet command to manually establish connections between instances to form a cluster, and then use the cluster addslots command to specify the hash slots on each instance (a command sketch follows this list). For your convenience, I've used a diagram to explain the mapping between data, hash slots, and instances.
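A rough sketch of the manual approach with redis-py, assuming two cluster-enabled nodes on ports 7000 and 7001; the ports and slot ranges are illustrative, and in practice redis-cli's cluster tooling usually does this for you:

```python
import redis

node_a = redis.Redis(port=7000)  # assumes a node started with cluster-enabled yes
node_b = redis.Redis(port=7001)

# CLUSTER MEET introduces the two nodes to each other so they form a cluster.
node_a.execute_command("CLUSTER MEET", "127.0.0.1", 7001)

# CLUSTER ADDSLOTS assigns hash slots to a node; here node A takes the first
# half of the 16384 slots and node B takes the second half.
node_a.execute_command("CLUSTER ADDSLOTS", *range(0, 8192))
node_b.execute_command("CLUSTER ADDSLOTS", *range(8192, 16384))
```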