How does Redis persistence work?

What is persistence? It means storing data on a device where it will not be lost after a power outage.

First, let’s look at what a database does when it performs a write. There are five main steps:

  • The client sends a write operation to the server (the data is in the client’s memory).
  • The database server receives the data for the write request (the data is in the server’s memory).
  • The server calls the write system call to write the data to disk (the data is in the kernel’s buffer).
  • The operating system transfers the data in the buffer to the disk controller (the data is in the disk cache).
  • The disk controller writes the data to the physical medium of the disk (the data is actually on the disk).

Failure analysis

Write operations generally go through the five steps above. Based on those steps, we can look at failures at different levels:

  • When the database system fails but the operating system kernel is still intact: as long as step 3 has completed, the data is safe, because the operating system will carry out the remaining steps and ensure that the data eventually reaches the disk.
  • When the system loses power: all of the caches mentioned in the five steps are lost, and both the database and the operating system stop working. Only once step 5 has completed can we be sure that the data will survive the power failure.

With these five steps in mind, we may want to clarify the following questions:

  • How often does the database call write to move data into the kernel buffer?
  • How often does the kernel write data from the system buffer to the disk controller?
  • When does the disk controller write cached data to physical media?

For the first question, the database usually has full control. For the second question, the operating system has a default policy, but we can also force it to write data from the kernel buffer to the disk controller with the fsync family of system calls provided by the POSIX API. The third question seems to be out of the database’s reach, but in practice the disk cache is usually either turned off or enabled only for reads, meaning that writes are not cached and go directly to the disk.

It is recommended that write caching be enabled only when your disk device has a battery backup.
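As a rough illustration of steps 3 and 4, the Python sketch below pushes data through each buffering layer explicitly. The function name durable_write and the file name example.dat are made up for this example, and whether fsync also empties the disk controller's own cache depends on the operating system and the hardware.

```python
import os
import tempfile

def durable_write(path: str, payload: bytes) -> None:
    """Write payload and push it through each buffering layer in turn."""
    with open(path, "wb") as f:
        f.write(payload)      # data sits in the process's user-space buffer
        f.flush()             # write(2): data moves into the kernel buffer (step 3)
        os.fsync(f.fileno())  # ask the kernel to hand it to the disk (step 4)

# Hypothetical file name, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "example.dat")
durable_write(path, b"hello")
```

If the process crashes after flush() the kernel still holds the data, but only after fsync() returns has the operating system been asked to move it to the device.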

Data corruption

Data corruption means that the data cannot be recovered. Everything discussed above is about making sure data is actually written to disk, but data that reaches the disk can still be corrupted. For example, a single write request may result in two separate write operations, and if a failure occurs in between, one write may complete safely while the other does not. If the database’s data file structure is not organized properly, the data may become impossible to recover at all. There are three common strategies for organizing data so that data files are not corrupted beyond recovery:

  • The first is the crudest: make no attempt to guarantee recoverability through the way the data is organized. Instead, keep a synchronized backup of the data and restore from it after the data files are damaged. In fact, this is what MongoDB does when the operation log (journal) is disabled and replica sets are configured.
  • The second adds an operation log on top of the first, recording every operation so that the log can be used to recover the data. Because the operation log is written by sequential appends, the log itself cannot end up in an unrecoverable state. This is similar to MongoDB with the operation log enabled.
  • A safer approach is for the database never to modify old data and only append writes, so that the data itself is a log and can never become unrecoverable. CouchDB is a good example of this.
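To see why an append-only layout is hard to corrupt beyond recovery, here is a minimal sketch in Python. The length-prefixed record format and the names append_record and recover are assumptions made for this example; they are not the on-disk format of MongoDB, CouchDB, or Redis.

```python
import os
import struct
import tempfile
from typing import List

HEADER = struct.Struct("<I")  # 4-byte record length (illustrative format)

def append_record(path: str, record: bytes) -> None:
    """Append one length-prefixed record and force it to disk."""
    with open(path, "ab") as f:
        f.write(HEADER.pack(len(record)) + record)
        f.flush()
        os.fsync(f.fileno())

def recover(path: str) -> List[bytes]:
    """Return every complete record, ignoring a truncated tail."""
    with open(path, "rb") as f:
        data = f.read()
    records, offset = [], 0
    while offset + HEADER.size <= len(data):
        (length,) = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        if start + length > len(data):
            break  # the last write was interrupted; earlier records are intact
        records.append(data[start:start + length])
        offset = start + length
    return records

log_path = os.path.join(tempfile.mkdtemp(), "journal.log")
append_record(log_path, b"set a 1")
append_record(log_path, b"set b 2")
# Simulate a crash in the middle of a third write:
# the header promises 7 bytes but only 3 arrive.
with open(log_path, "ab") as f:
    f.write(HEADER.pack(7) + b"set")
recovered = recover(log_path)
```

Because earlier records are never modified, an interrupted write can only damage the tail of the file, and recovery simply stops at the first incomplete record.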

Redis provides RDB persistence and AOF persistence

Advantages and use of the RDB mechanism

RDB persistence writes a snapshot of the in-memory data set to disk at specified intervals. It is also the default persistence mode: the in-memory data is written as a snapshot to a binary file, named dump.rdb by default.

Snapshots can be taken automatically. We can configure Redis to take a snapshot if at least m keys have changed within n seconds. The following is the default snapshot configuration:

   # snapshot if at least 1 key changed within 900 seconds
   save 900 1
   # snapshot if at least 10 keys changed within 300 seconds
   save 300 10
   # snapshot if at least 10000 keys changed within 60 seconds
   save 60 10000
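The way these save points combine can be modelled in a few lines of Python. This is a simplified sketch rather than the Redis source: should_snapshot and SAVE_POINTS are names invented here, and the exact boundary comparison inside Redis may differ.

```python
# (seconds, changes) pairs mirroring the three save lines above
SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]

def should_snapshot(dirty_keys: int, last_save: float, now: float,
                    save_points=SAVE_POINTS) -> bool:
    """Return True if any save point is satisfied.

    dirty_keys: number of key changes since the last snapshot
    last_save, now: timestamps in seconds
    """
    elapsed = now - last_save
    return any(dirty_keys >= changes and elapsed >= seconds
               for seconds, changes in save_points)
```

The save points are alternatives: a snapshot starts as soon as any one of them is met, so a busy server snapshots after 60 seconds while a nearly idle one waits the full 900.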

RDB file saving process

  • Redis calls fork, so there is now a child process and a parent process.
  • The parent process continues to handle client requests, while the child process writes the memory contents to a temporary file. Thanks to the operating system’s copy-on-write mechanism, parent and child share the same physical pages. When the parent handles a write request, the OS creates a copy of the page being modified for the parent instead of writing to the shared page. The data in the child’s address space is therefore a snapshot of the entire database at the moment of the fork.
  • Once the child has written the snapshot to the temporary file, it replaces the original snapshot file with the temporary file and then exits.
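The same fork-and-replace pattern can be tried out in Python on a Unix-like system. This is only a sketch: background_snapshot is a made-up name, and JSON stands in for the binary RDB format, which is not reproduced here.

```python
import json
import os
import tempfile

def background_snapshot(dataset: dict, target: str) -> int:
    """Fork a child that dumps dataset as it looked at fork time.

    Returns the child's PID to the parent.
    """
    pid = os.fork()
    if pid == 0:
        # Child: works on a copy-on-write view of memory frozen at the fork.
        status = 1
        try:
            tmp = target + ".tmp"
            with open(tmp, "w") as f:
                json.dump(dataset, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, target)  # atomically swap in the new snapshot
            status = 0
        finally:
            os._exit(status)  # exit without running the parent's cleanup code
    return pid

dataset = {"counter": 1}
target = os.path.join(tempfile.mkdtemp(), "dump.json")
child = background_snapshot(dataset, target)
dataset["counter"] = 2   # the parent keeps accepting writes
os.waitpid(child, 0)     # wait only so the example can inspect the file
with open(target) as f:
    snapshot = json.load(f)
```

The snapshot file holds the value from the moment of the fork, while the parent's copy of the data has already moved on.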

The client can also use the save or bgsave command to tell Redis to persist a snapshot. The save command takes the snapshot in the main thread, and because Redis uses a single main thread to handle all client requests, this blocks every client. It is therefore not recommended.

Also note that each snapshot writes the entire in-memory data set to disk rather than incrementally syncing only the dirty data. With a large data set and many writes, this causes a large amount of disk I/O and can seriously affect disk performance.

Advantages

  • With RDB, your entire Redis database is contained in a single file, which makes it easy to back up. For example, you might archive a snapshot every day.
  • Because it is a single file, an RDB snapshot can easily be moved to another storage medium.
  • RDB can restore large data sets faster than AOF.
  • RDB maximizes Redis performance: the only thing the parent process has to do to save an RDB file is fork a child, which then handles all the saving work, so the parent never performs any disk I/O.

Disadvantages

  • If you need to avoid losing data when the server fails, RDB is not for you. Although Redis lets you set different save points to control how often RDB files are saved, saving is not a cheap operation, because an RDB file has to hold the state of the entire data set. You will therefore probably save an RDB file no more often than once every 5 minutes, and in that case you could lose several minutes of data if the server goes down unexpectedly.
  • Each time an RDB file is saved, Redis calls fork() to create a child process that does the actual persistence. With a large data set, fork() can be time-consuming and cause the server to stop serving clients for some milliseconds; if the data set is very large and CPU time is tight, the pause can even last a full second. AOF rewrites also require fork(), but however long the interval between AOF rewrites, data durability is not compromised.

AOF file saving process

Redis appends every write command it receives to a file using the write function (the file is named appendonly.aof by default).

When Redis restarts, it rebuilds the contents of the entire database in memory by re-executing the write commands saved in the file. Of course, since the OS caches the changes made by write in the kernel, they may not be written to disk immediately, so it is still possible to lose some changes with AOF persistence. However, we can tell Redis through the configuration file when to force the OS to write to disk via fsync. There are three options (the default is to fsync once per second):

# Enable AOF persistence
appendonly yes
# fsync after every write command: slowest but fully durable, not recommended
# appendfsync always
# fsync once per second: a good compromise between performance and durability, recommended
appendfsync everysec
# leave flushing entirely to the OS: best performance, no durability guarantee
# appendfsync no
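The difference between the three policies can be sketched in Python. This is a simplified model with invented names (AppendOnlyFile, fsync_count): it writes one command per line instead of the real AOF format, and it calls fsync inline where the everysec policy in Redis uses a background thread.

```python
import os
import tempfile
import time

class AppendOnlyFile:
    """Toy append-only log that models the three appendfsync policies."""

    def __init__(self, path: str, policy: str = "everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown fsync policy: {policy}")
        self.policy = policy
        self.file = open(path, "ab")
        self.last_fsync = time.monotonic()
        self.fsync_count = 0  # kept only so the example can be inspected

    def append(self, command: str) -> None:
        self.file.write(command.encode() + b"\n")
        self.file.flush()  # hand the bytes to the kernel buffer
        now = time.monotonic()
        if self.policy == "always" or (
            self.policy == "everysec" and now - self.last_fsync >= 1.0
        ):
            os.fsync(self.file.fileno())
            self.last_fsync = now
            self.fsync_count += 1
        # policy "no": flush timing is left entirely to the operating system

    def close(self) -> None:
        self.file.close()

aof_path = os.path.join(tempfile.mkdtemp(), "appendonly.aof")
aof = AppendOnlyFile(aof_path, policy="always")
for _ in range(3):
    aof.append("incr test")
aof.close()
```

With policy="always" every append pays for an fsync, which is why it is the slowest option; with policy="no" the same loop never calls fsync at all.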

The AOF approach also poses another problem: the persistence file keeps growing. For example, if we call the incr test command 100 times, all 100 commands must be saved in the file, even though 99 of them are redundant. To restore the state of the database, a single set test 100 command would be enough.

To compact the AOF file, Redis provides the bgrewriteaof command. On receiving this command, Redis uses a method similar to snapshotting to save the in-memory data to a temporary file in the form of commands, and finally replaces the original file. The process is as follows:

  • Redis calls fork, so there is now a parent process and a child process.
  • Based on the in-memory snapshot of the database, the child process writes the commands needed to rebuild the database state to a temporary file.
  • The parent process continues to handle client requests. In addition to writing write commands to the original AOF file, it also buffers the write commands it receives. This ensures that nothing is lost if the child’s rewrite fails.
  • When the child process has finished writing the snapshot contents to the temporary file as commands, it signals the parent process. The parent then appends the buffered write commands to the temporary file as well.
  • The parent process can now rename the temporary file to replace the old AOF file, and subsequent write commands are appended to the new AOF file.

Note that rewriting the AOF file does not read the old AOF file. Instead, the entire contents of the in-memory database are written as commands into a new AOF file, much like a snapshot.
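The effect of a rewrite on the incr test example can be shown with a small Python sketch. The functions replay and rewrite are invented for this illustration and handle only three commands; replay merely stands in for the in-memory state that a real rewrite starts from.

```python
def replay(commands):
    """Rebuild the in-memory state by re-executing logged commands."""
    state = {}
    for line in commands:
        parts = line.split()
        op, key = parts[0].lower(), parts[1]
        if op == "set":
            state[key] = parts[2]
        elif op == "incr":
            state[key] = str(int(state.get(key, "0")) + 1)
        elif op == "del":
            state.pop(key, None)
        else:
            raise ValueError(f"unsupported command: {line}")
    return state

def rewrite(state):
    """Emit one set command per key, enough to recreate the state."""
    return [f"set {key} {value}" for key, value in sorted(state.items())]

old_log = ["incr test"] * 100
new_log = rewrite(replay(old_log))
```

Here the 100-entry log collapses to the single entry set test 100, because the rewrite depends only on the final state and not on the history that produced it.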

Advantages

  • Using AOF persistence makes Redis much more durable: you can choose between different fsync policies, such as no fsync, fsync every second, or fsync on every write command. The default AOF policy is to fsync once per second. In this configuration Redis still performs well and loses at most one second of data in the event of an outage (fsync is performed in a background thread, so the main thread can keep working on command requests).
  • The AOF file is an append-only log, so writes to it do not need to seek. Even if the log ends with an incomplete command for some reason (for example, the disk filled up during a write or the write was interrupted), the redis-check-aof tool can easily fix the problem.

  • Redis can automatically rewrite the AOF in the background when the file becomes too large: the rewritten AOF file contains the minimum set of commands needed to restore the current data set. The whole rewrite is safe because Redis keeps appending commands to the existing AOF file while it creates the new one, so the existing AOF file is not lost even if an outage occurs during the rewrite. Once the new AOF file is ready, Redis switches from the old file to the new one and starts appending to it.

  • The AOF file stores all writes to the database in order, in the Redis protocol format, so its contents are easy to read and parse. Exporting an AOF file is also very simple. For example, if you accidentally execute the FLUSHALL command, then as long as the AOF file has not been rewritten, you can stop the server, remove the FLUSHALL command at the end of the AOF file, and restart Redis to restore the data set to the state it was in before FLUSHALL ran.
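To make the protocol format concrete, the Python sketch below encodes a command as a Redis protocol array of bulk strings and strips a trailing FLUSHALL from a log. The helper names encode_command and drop_trailing_flushall are made up here, and the sketch assumes an AOF file that contains nothing but plain protocol-encoded commands; always work on a copy of the real file.

```python
def encode_command(*args: str) -> bytes:
    """Encode one command as a Redis protocol array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)

def drop_trailing_flushall(aof: bytes) -> bytes:
    """Remove a FLUSHALL entry if it is the very last command in the log."""
    tail = encode_command("FLUSHALL")
    if aof.lower().endswith(tail.lower()):  # command names are case-insensitive
        return aof[: -len(tail)]
    return aof

log = encode_command("set", "test", "100") + encode_command("flushall")
repaired = drop_trailing_flushall(log)
```

Each command is stored as a count line (*3), followed by a length line and the payload for every argument, which is why the file can be inspected and edited with ordinary tools.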

Disadvantages

  • AOF files are usually larger than RDB files for the same data set.
  • Depending on the fsync policy used, AOF may be slower than RDB. With fsync once per second, performance is still very high under normal conditions, and with fsync turned off AOF can be as fast as RDB even under high load. However, RDB can offer a better guarantee on maximum latency when handling heavy write loads.
  • AOF has had bugs in the past where, because of certain commands, reloading an AOF file did not restore the data set exactly as it was saved. (For example, the blocking command BRPOPLPUSH once caused such a bug.) The test suite now includes tests for this: they automatically generate random, complex data sets and reload them to make sure everything is fine. Although this kind of bug is uncommon in AOF files, it is almost impossible with RDB by comparison.

Which to choose

In general, if you want data safety comparable to what PostgreSQL provides, you should use both persistence features. If you care deeply about your data but can still tolerate losing a few minutes of it, you can use RDB persistence alone.