Redis is an in-memory database: data is stored in memory, which makes reads and writes fast, but data held only in memory is easily lost. Fortunately, Redis provides two persistence mechanisms: RDB (Redis DataBase) and AOF (Append Only File).
This article assumes that you already know the basic Redis commands, so basic usage is not covered here.
The two mechanisms are introduced below, from the basics to the details.
1. The persistence process
Since Redis data can be saved to disk, what does that process look like?
There are five steps:
(1) The client sends write operations to the server (data is stored in the client’s memory).
(2) The database server receives the data of the write request (the data is in the server’s memory).
(3) The server invokes the write system call to write data to disk (the data is in the buffer of system memory).
(4) The operating system transfers the data in the buffer to the disk controller (the data is in the disk cache).
(5) The disk controller writes the data to the physical media of the disk (the data actually falls on the disk).
These five steps describe a normal save under ideal conditions. In practice, machines and software fail in various ways, and two cases need to be distinguished:
(1) If the Redis process fails, the data is persisted as long as step 3 has completed; the operating system completes the remaining two steps for us.
(2) If the operating system fails (or the machine loses power), the data is safe only once all five steps have completed.
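The difference between step 3 and steps 4 and 5 can be sketched in a few lines of Python. This is only an illustration of the buffering layers, not Redis code; the function name durable_append is made up for the example, and whether fsync really reaches the physical media also depends on the disk and its controller.

```python
import os
import tempfile


def durable_append(path: str, payload: bytes) -> None:
    """Append payload and push it through each buffering layer in turn."""
    with open(path, "ab") as f:
        f.write(payload)      # data sits in the process's own user-space buffer
        f.flush()             # write() system call: data is now in the OS buffer (step 3)
        os.fsync(f.fileno())  # ask the OS to hand the data to the disk (steps 4 and 5)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.log")
    durable_append(demo_path, b"SET k v\n")
    with open(demo_path, "rb") as f:
        print(f.read())
```

If the process crashes after flush() the operating system still holds the data; only fsync() protects against a crash of the operating system itself.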
This article considers only failures that may occur during the save process. Saved data can also be damaged afterwards, which requires a recovery mechanism, but that is not covered here. The main question now is how Redis implements the five steps above. It provides two mechanisms for this: RDB and AOF.
2. RDB mechanism
An RDB is simply a snapshot of data stored on disk. What is a snapshot? You can think of it as taking a picture of the current moment and saving it.
RDB persistence writes a snapshot of the in-memory data set to disk at specified intervals. It is the default persistence mode, and by default the snapshot is written to a binary file named dump.rdb.
After Redis is installed, all of its configuration lives in the redis.conf file, which contains the settings for both the RDB and AOF persistence mechanisms.
Since RDB works by taking a snapshot of all the data at one point in time, something has to trigger the snapshot. RDB provides three trigger mechanisms: SAVE, BGSAVE, and automatic triggering. Let's look at them separately.
1. SAVE trigger mode
The SAVE command blocks the Redis server: Redis cannot process any other command until the RDB process is complete. The specific process is as follows:
If an old RDB file exists when execution finishes, the new RDB file replaces the old one. With tens or hundreds of thousands of clients connected, blocking all of them like this is obviously not desirable.
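A blocking save that replaces the old file can be sketched as follows. This is a simplified illustration, not the Redis implementation: JSON stands in for the binary RDB format, and the function name save_snapshot and the temp- file prefix are assumptions made for the example.

```python
import json
import os
import tempfile


def save_snapshot(data: dict, directory: str, filename: str = "dump.rdb") -> str:
    """Blocking snapshot: the caller can do nothing else until this returns."""
    final_path = os.path.join(directory, filename)
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="temp-", suffix=".rdb")
    with os.fdopen(fd, "w") as tmp:
        json.dump(data, tmp)          # JSON stands in for the real binary format
        tmp.flush()
        os.fsync(tmp.fileno())        # make sure the new file is on disk first
    os.replace(tmp_path, final_path)  # the new file replaces the old one in one step
    return final_path


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    save_snapshot({"k": "old"}, folder)
    save_snapshot({"k": "new"}, folder)
    print(os.listdir(folder))
```

Writing to a temporary file and renaming it means a reader never sees a half-written snapshot: it sees either the old file or the new one.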
2. BGSAVE trigger mode
When this command is executed, Redis takes the snapshot asynchronously in the background while continuing to respond to client requests. The specific process is as follows:
The Redis process calls fork to create a child process. The child process is responsible for the RDB persistence and exits automatically when it is done. Blocking occurs only during the fork phase, which is usually very short. Almost all RDB operations inside Redis use the BGSAVE approach.
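The fork-based approach can be sketched with os.fork, which is available on POSIX systems only. This is a toy model under stated assumptions: the function name bgsave is made up, JSON replaces the RDB format, and error handling is reduced to an exit status.

```python
import json
import os
import tempfile


def bgsave(data: dict, path: str) -> int:
    """Fork a child that writes the snapshot; return the child's PID to the parent."""
    pid = os.fork()
    if pid == 0:
        # Child process: it sees the data exactly as it was at the moment of the fork.
        status = 0
        try:
            tmp_path = f"{path}.tmp-{os.getpid()}"
            with open(tmp_path, "w") as f:
                json.dump(data, f)
            os.replace(tmp_path, path)
        except Exception:
            status = 1
        os._exit(status)  # leave immediately without running the parent's cleanup code
    return pid            # parent process: returns at once and keeps serving requests


if __name__ == "__main__":
    store = {"counter": 1}
    target = os.path.join(tempfile.mkdtemp(), "dump.json")
    child = bgsave(store, target)
    store["counter"] = 2              # the parent keeps accepting writes
    os.waitpid(child, 0)
    with open(target) as f:
        print(json.load(f))           # the snapshot holds the value from fork time
```

The write made by the parent after the fork does not appear in the snapshot, which is exactly the point-in-time behaviour described above.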
3. Automatic trigger
Automatic triggering is controlled by the configuration file. The redis.conf file contains the following settings:
①save: Configures the conditions that trigger RDB persistence, that is, when the data in memory is saved to disk. The form is "save m n": BGSAVE is triggered automatically when the data set has been modified at least n times within m seconds.
The default configuration is as follows:
save 900 1: save if at least 1 key has changed within 900 seconds.
save 300 10: save if at least 10 keys have changed within 300 seconds.
save 60 10000: save if at least 10000 keys have changed within 60 seconds.
If persistence is not required, you can comment out all the save lines to disable it.
②stop-writes-on-bgsave-error: The default value is yes. When RDB is enabled and the last background save failed, this controls whether Redis stops accepting writes. Refusing writes makes the user aware that data is not being persisted to disk correctly; otherwise no one might notice that a disaster has occurred. Redis accepts writes again once a background save succeeds, or after it is restarted.
③rdbcompression: The default value is yes. Controls whether snapshots stored on disk are compressed.
④rdbchecksum: The default value is yes. After storing the snapshot, Redis can verify the data with the CRC64 algorithm. This adds about 10% performance cost and can be turned off for maximum performance.
⑤dbfilename: Specifies the snapshot file name. The default name is dump.rdb.
⑥dir: Sets the directory in which snapshot files are stored. This setting must be a directory, not a file name.
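The "save m n" rule can be sketched as a simple check. This is a minimal model, not the Redis source: the function name should_trigger_bgsave is hypothetical, and the three rules in SAVE_POINTS are assumed to be the classic defaults shipped in older redis.conf files.

```python
def should_trigger_bgsave(save_points, changes_since_last_save, seconds_since_last_save):
    """Return True if any (seconds, changes) rule is satisfied."""
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in save_points
    )


# Assumed defaults: "save 900 1", "save 300 10", "save 60 10000".
SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]

if __name__ == "__main__":
    print(should_trigger_bgsave(SAVE_POINTS, 10, 300))   # 10 changes in 300 seconds
    print(should_trigger_bgsave(SAVE_POINTS, 9999, 60))  # not yet enough changes
```

The rules are independent: a snapshot starts as soon as any one of them is satisfied, and an empty rule list (all save lines commented out) never triggers.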
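We can modify these settings to achieve the desired effect. Since the third method is driven by configuration, let's compare the first two: SAVE runs synchronously and blocks every client until the snapshot is finished, while BGSAVE forks a child process and blocks only for the duration of the fork.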
4. Advantages and disadvantages of RDB
(1) Advantages
(1) An RDB file is a compact, full backup, which makes it very suitable for backups and disaster recovery.
(2) When the RDB file is generated, the main Redis process calls fork() to create a child process that handles all the save work. The main process does not need to do any disk I/O.
(3) RDB is faster than AOF at restoring large data sets.
(2) Disadvantages
An RDB snapshot is a full backup that stores the in-memory data in a binary serialized form, which is very compact. When a snapshot is taken, a child process is created to do the work. The child process holds a copy of the parent's memory as it was at fork time and does not see later modifications made by the parent. Therefore, data modified while the snapshot is being written is not included in it, and that data may be lost if Redis fails before the next snapshot.
3. AOF mechanism
Full backups are always time-consuming, so Redis also offers a more efficient way: AOF. Put simply, Redis appends every write command it receives to a file via the write function. You can think of it as logging.
1. How persistence works
Here's how it works:
Whenever a write command comes in, it is appended directly to the AOF file.
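The idea of logging every write command and replaying the log at startup can be sketched as follows. This is a toy model, not the Redis implementation: the class name TinyAof is made up, only SET and DEL are supported, and commands are stored as space-separated text lines rather than in the real AOF format, so values containing spaces are not handled.

```python
import os
import tempfile


class TinyAof:
    """A tiny key-value store that appends every write command to a log file."""

    def __init__(self, path: str):
        self.path = path
        self.data = {}

    def _apply(self, args) -> None:
        name = args[0].upper()
        if name == "SET":
            self.data[args[1]] = args[2]
        elif name == "DEL":
            self.data.pop(args[1], None)
        else:
            raise ValueError(f"unsupported command: {args[0]}")

    def execute(self, *args: str) -> None:
        self._apply(args)  # change the in-memory data first
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(" ".join(args) + "\n")  # then append the command to the log

    @classmethod
    def load(cls, path: str) -> "TinyAof":
        """Rebuild the in-memory data by replaying the log from the beginning."""
        store = cls(path)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as log:
                for line in log:
                    if line.strip():
                        store._apply(line.split())
        return store


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "appendonly.log")
    db = TinyAof(log_path)
    db.execute("SET", "a", "1")
    db.execute("SET", "b", "2")
    db.execute("DEL", "a")
    print(TinyAof.load(log_path).data)
```

Recovery replays every command in order, which is why restoring from a long log is slower than loading a snapshot.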
2. How file rewriting works
The AOF approach also has a problem: the persistence file keeps growing. To compact the AOF file, Redis provides the bgrewriteaof command. Redis forks a new process that writes the in-memory data to a temporary file in the form of commands, and that file then replaces the old AOF file.
Rewriting the AOF file does not read the old AOF file. Instead, the entire contents of the database in memory are written out as commands into a new AOF file, similar to taking a snapshot.
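The reason a rewrite shrinks the file can be shown with a small sketch. It is a simplified model under stated assumptions: only SET and DEL on string keys exist, and the helper names replay and rewrite_commands are made up for the example.

```python
def replay(commands) -> dict:
    """Apply a list of (name, key[, value]) commands to an empty data set."""
    data = {}
    for name, key, *rest in commands:
        if name == "SET":
            data[key] = rest[0]
        elif name == "DEL":
            data.pop(key, None)
        else:
            raise ValueError(f"unsupported command: {name}")
    return data


def rewrite_commands(data: dict) -> list:
    """Build a command list from the current data, ignoring the old log entirely."""
    return [("SET", key, value) for key, value in data.items()]


if __name__ == "__main__":
    history = [
        ("SET", "k", "1"),
        ("SET", "k", "2"),
        ("SET", "tmp", "x"),
        ("DEL", "tmp"),
        ("SET", "k", "3"),
    ]
    current = replay(history)
    compact = rewrite_commands(current)
    print(len(history), "commands before,", len(compact), "after")
    print(replay(compact) == current)
```

Overwritten values and deleted keys leave no trace in the new file, because it is generated from the data that exists now rather than from the history.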
3. AOF also has three synchronization policies
(1) always: synchronize on every change. Every data change is written to disk immediately. Performance is poor, but data integrity is good.
(2) everysec: synchronize once per second. This is an asynchronous operation; if the server goes down within that second, the data from that second is lost.
(3) no: Redis never synchronizes explicitly and leaves it to the operating system to decide when to write the data to disk.
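The three policies differ only in when fsync is called, which the following sketch makes visible by counting the calls. It is a simplification: the class name AofWriter is hypothetical, and the everysec branch runs inside append() instead of in a background thread as described for Redis. The clock parameter is injected only so that the behaviour can be demonstrated without waiting.

```python
import os
import tempfile
import time


class AofWriter:
    """Append commands to a file and call fsync according to the chosen policy."""

    def __init__(self, path: str, policy: str = "everysec", clock=time.monotonic):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.file = open(path, "ab")
        self.policy = policy
        self.clock = clock
        self.last_fsync = clock()
        self.fsync_calls = 0

    def append(self, command: bytes) -> None:
        self.file.write(command)
        self.file.flush()  # hand the bytes to the operating system
        now = self.clock()
        due = self.policy == "always" or (
            self.policy == "everysec" and now - self.last_fsync >= 1.0
        )
        if due:
            os.fsync(self.file.fileno())  # force the data out to the disk
            self.last_fsync = now
            self.fsync_calls += 1
        # With policy "no", fsync is never called here; the OS decides when to write.

    def close(self) -> None:
        self.file.close()


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    for policy in ("always", "everysec", "no"):
        writer = AofWriter(os.path.join(folder, policy + ".aof"), policy)
        for _ in range(1000):
            writer.append(b"SET k v\n")
        writer.close()
        print(policy, writer.fsync_calls)
```

With always, the number of fsync calls equals the number of writes, which explains its cost; with everysec the number is bounded by the elapsed time, not by the write rate.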
4. Advantages
(1) AOF protects better against data loss. Generally, AOF executes an fsync operation once per second in a background thread, so at most one second of data is lost.
(2) AOF log files are written in append-only mode, so there is no disk seek overhead, write performance is very high, and the file is not easily corrupted.
(3) Even if the AOF log file becomes too large, the background rewrite operation does not affect client reads and writes.
(4) Commands in the AOF log file are recorded in a very readable form, which is ideal for emergency recovery after a catastrophic deletion. For example, if someone accidentally runs flushall and wipes all the data, then as long as the background rewrite has not yet happened, you can delete the final flushall command from the AOF file and let Redis reload the file to restore all the data.
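Removing an accidental flushall from the end of the log can be sketched as follows. The sketch assumes the AOF file contains only commands in the plain Redis protocol form (an array header such as *3 followed by length-prefixed arguments) and has no binary preamble; the function names are made up for the example. Always work on a copy of the file and with the server stopped.

```python
def parse_resp_commands(raw: bytes) -> list:
    """Split a protocol-encoded log into commands, each a list of byte strings."""
    commands, pos = [], 0
    while pos < len(raw):
        line_end = raw.index(b"\r\n", pos)
        if raw[pos:pos + 1] != b"*":
            raise ValueError(f"expected '*' at offset {pos}")
        argc = int(raw[pos + 1:line_end])
        pos = line_end + 2
        args = []
        for _ in range(argc):
            line_end = raw.index(b"\r\n", pos)
            if raw[pos:pos + 1] != b"$":
                raise ValueError(f"expected '$' at offset {pos}")
            length = int(raw[pos + 1:line_end])
            start = line_end + 2
            args.append(raw[start:start + length])
            pos = start + length + 2  # skip the argument and its trailing \r\n
        commands.append(args)
    return commands


def encode_resp_command(args) -> bytes:
    """Encode one command as an array of length-prefixed strings."""
    out = b"*%d\r\n" % len(args)
    for arg in args:
        out += b"$%d\r\n%s\r\n" % (len(arg), arg)
    return out


def drop_trailing_flushall(raw: bytes) -> bytes:
    """Return the log without its final command if that command is FLUSHALL."""
    commands = parse_resp_commands(raw)
    if commands and commands[-1] and commands[-1][0].upper() == b"FLUSHALL":
        commands.pop()
    return b"".join(encode_resp_command(c) for c in commands)


if __name__ == "__main__":
    log = encode_resp_command([b"SET", b"k", b"v"]) + encode_resp_command([b"flushall"])
    print(parse_resp_commands(drop_trailing_flushall(log)))
```

Only a flushall in the last position is removed; one that appears earlier in the log is left alone, because the commands after it were executed against the emptied data set.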
5. Disadvantages
(1) For the same data, AOF log files are usually larger than RDB snapshot files.
(2) With AOF enabled, the supported write QPS is lower than with RDB, because AOF is usually configured to fsync the log file once per second. That said, fsync once per second still gives very high performance.
(3) AOF has had bugs in the past in which replaying the log did not restore exactly the same data.
4. How to choose RDB and AOF
If you have to choose, the two are better together. Which mechanism suits you depends on your own requirements, and you do not have to pick only one: they are usually used in combination. Here's a picture to summarize:
After comparing these features, the rest depends on your own requirements.