
Introduction

Redis is one of the most commonly used in-memory databases. Because its data is stored in memory, that data would disappear if the Redis server process exited. To avoid this, Redis provides persistence mechanisms that save the in-memory data to disk and so improve the reliability of data storage. Databases generally offer two types of persistence: “snapshot” storage and “log” storage. Redis provides RDB persistence and AOF persistence to match them: RDB stores the in-memory data on disk as a snapshot, and AOF stores it by appending to a log. This article covers the following topics:

  • RDB file structure

  • RDB trigger mechanism and process

  • AOF persistence process

  • AOF buffer synchronization file policy

  • AOF rewrite

  • Comparison of RDB and AOF

RDB persistence

RDB is short for Redis Database. RDB persistence generates a snapshot of the data held in Redis memory at a certain point in time and stores it on disk or other media. The resulting file on disk is called an RDB file, and it is a compressed binary file. Because the RDB file is stored on disk, the Redis server can use it to restore the state of the database even if the Redis server process exits or the machine running it goes down, as long as the RDB file still exists. As shown in Figure 1, you can imagine that the Redis database has a database state at each point on a timeline and think of each state as a slice. The figure marks two slices of the database at two points in time. What RDB persistence does is save these “slices” of the database state to disk as RDB files, in the direction of the green arrow.

Figure 1 Saving the database state as an RDB file

RDB file structure

Although an RDB file is a compressed binary file, a basic understanding of its structure helps in understanding how it works. As shown in Table 1, an RDB file consists of five parts, in order from left to right:

  • “REDIS” is at the beginning of the file. It is 5 bytes long and holds the five characters “REDIS”. When loading a file, the program uses these five characters to determine whether the file is an RDB file.

  • “db_version” is 4 bytes long and holds an integer written as a string. It records the version number of the RDB file. For example, “0006” indicates that the RDB file version is version 6.

  • “databases” contains zero or more databases.

  • “EOF” is a 1-byte constant that marks the end of the RDB file body. When the loader reads the “EOF” value, all the database key-value pairs have been loaded.

  • “check_sum” is an 8-byte unsigned integer that holds a checksum. The checksum is calculated from the REDIS, db_version, databases, and EOF parts. When the Redis server loads the RDB file, it compares the checksum calculated from the loaded data with the checksum recorded in check_sum to determine whether the RDB file is corrupted.

REDIS | db_version | databases | EOF | check_sum

Table 1 File structure of RDB

Table 2 shows an RDB file whose databases part is empty. The leading “REDIS” identifies the file as an RDB file, and “0006” indicates that the RDB file version is version 6. Because databases is empty, there is no database information, so the version number is followed directly by the “EOF” constant. The final value, 6265312314761934562, is the checksum of the file.

REDIS | “0006” | EOF | 6265312314761934562

Table 2 Examples of RDB files
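The layout in Table 1 can be sketched in a few lines of Python. This is only an illustration of the five-part structure, not the loader Redis uses: the function names are made up for the example, the EOF value 0xFF and the little-endian byte order of the checksum are assumptions, and the checksum is read but not verified.

```python
import struct

MAGIC = b"REDIS"   # 5-byte signature at the start of the file
EOF_BYTE = 0xFF    # assumed value of the 1-byte EOF constant


def build_empty_rdb(version: int, checksum: int) -> bytes:
    """Build a file like the one in Table 2: no databases, only the fixed parts."""
    db_version = f"{version:04d}".encode("ascii")  # e.g. b"0006"
    # "<Q" packs an 8-byte unsigned integer; little-endian is an assumption.
    return MAGIC + db_version + bytes([EOF_BYTE]) + struct.pack("<Q", checksum)


def parse_rdb_layout(blob: bytes) -> dict:
    """Split a blob into the parts of Table 1 (the checksum is not verified)."""
    if len(blob) < 5 + 4 + 1 + 8:
        raise ValueError("too short to be an RDB file")
    if blob[:5] != MAGIC:
        raise ValueError("missing REDIS signature")
    if blob[-9] != EOF_BYTE:
        raise ValueError("missing EOF marker before the checksum")
    (checksum,) = struct.unpack("<Q", blob[-8:])
    return {
        "version": int(blob[5:9].decode("ascii")),
        "databases": blob[9:-9],  # raw bytes between db_version and EOF
        "checksum": checksum,
    }


info = parse_rdb_layout(build_empty_rdb(6, 6265312314761934562))
print(info)
```

Checking the signature first mirrors the role the article gives to the “REDIS” part: a file that does not start with it is rejected before anything else is read.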

RDB trigger mechanism and process

Now that the structure of RDB files is clear, let’s look at how Redis triggers RDB persistence and how the process works. There are three ways: synchronous triggering with save, asynchronous triggering with bgsave, and automatic triggering through configuration.

Triggering RDB persistence synchronously with save

Figure 2 shows the process of persisting RDB in the save synchronization mode:

  1. The Redis Client sends the save command to the Redis Server to request the RDB persistence operation.

  2. After receiving the command, Redis Server saves the current database snapshot to an RDB file.

  3. Since the save command is a synchronous operation, a command sent to the Redis Server by another Redis Client during this time is blocked until the save command of the first Redis Client completes.

Figure 2. The Save synchronization mode triggers RDB persistence

Triggering RDB persistence asynchronously with bgsave

As shown in Figure 3, the process of asynchronous mode is as follows:

  1. The Redis Client again sends a command to the Redis Server, but this time the command is bgsave (background save, which runs in the background).

  2. When the Redis Server receives the request, it forks a Redis child process.

  3. The child process creates the RDB file. Since the process is asynchronous, the Redis Server can accept other requests once the child process has started.

  4. After creating the RDB file, the Redis child process returns a success message to the Redis Server.

  5. Since the bgsave command is an asynchronous operation, requests that other Redis clients send to the Redis Server at the same time are not blocked. The Redis Server keeps responding to them while the child process creates the RDB file.

Figure 3 BGSave asynchronously triggers RDB persistence
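The difference between the two modes can be imitated with a toy example. Redis forks a child process for bgsave; the sketch below only stands in for that with a copy of the data and a background thread, and the class ToyServer, its methods, and the JSON file format are all hypothetical.

```python
import copy
import json
import os
import tempfile
import threading


class ToyServer:
    """Hypothetical in-memory store with a blocking and a background save."""

    def __init__(self, path: str):
        self.data = {}
        self.path = path
        self._worker = None

    def _dump(self, snapshot: dict) -> None:
        # Write to a temporary file first, then swap it into place.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(snapshot, f)
        os.replace(tmp, self.path)

    def save(self) -> None:
        """Synchronous: the caller waits until the snapshot is on disk."""
        self._dump(self.data)

    def bgsave(self) -> bool:
        """Asynchronous: copy the state, then write it in the background."""
        if self._worker is not None and self._worker.is_alive():
            return False  # sketch choice: one background save at a time
        snapshot = copy.deepcopy(self.data)  # stand-in for the forked child's view
        self._worker = threading.Thread(target=self._dump, args=(snapshot,))
        self._worker.start()
        return True

    def wait(self) -> None:
        if self._worker is not None:
            self._worker.join()


server = ToyServer(os.path.join(tempfile.mkdtemp(), "dump.json"))
server.data["hello"] = "A"
server.bgsave()
server.data["hello"] = "B"  # accepted while the background save runs
server.wait()
```

The snapshot on disk reflects the state at the moment bgsave was called, which is the “slice” idea from Figure 1: later writes do not leak into a snapshot that is already being written.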

Automatic triggering through configuration

As shown in Figure 4, this approach can be understood as reading a configuration file. It involves the following three steps:

  1. The Redis Server reads the Redis configuration file directly to obtain the RDB persistence settings.

  2. The save settings in the Redis configuration file take the place of a command sent by a Redis Client. Each setting consists of the save keyword, a number of seconds, and a number of changes. In this example, “save 500 3” means that a save is performed if the Redis database has had at least 3 changes within 500 seconds.

  3. Once a condition in the configuration file is met, the Redis Server performs the corresponding save operation to persist the data.

Figure 4 Reading the configuration file for RDB persistence
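The check behind such a rule can be sketched as follows. The function name and its arguments are hypothetical, and the sketch assumes a rule fires once at least the configured number of seconds has passed since the last save and at least the configured number of changes has accumulated.

```python
from typing import Iterable, Tuple


def should_trigger_save(
    rules: Iterable[Tuple[int, int]],
    seconds_since_last_save: float,
    changes_since_last_save: int,
) -> bool:
    """Return True if any (seconds, changes) rule is satisfied."""
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in rules
    )


# The "save 500 3" example from the text, as a (seconds, changes) pair.
rules = [(500, 3)]
print(should_trigger_save(rules, seconds_since_last_save=500, changes_since_last_save=3))
```

Several save lines in a configuration file simply become several pairs in the list; the first one that is satisfied is enough to trigger persistence.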

Generally, the Redis configuration information is stored in the redis.conf configuration file, which contains a lot of content. Here we will give you a brief introduction to the RDB persistence part. The SNAPSHOTTING section is found in the redis.conf file. Look at the following configuration items:

# 1 change in 900 seconds, 10 changes in 300 seconds, or 10,000 changes in 60 seconds.

# If any of the above conditions is met, RDB persistence is triggered.

save 900 1

save 300 10

save 60 10000

# Whether to stop accepting writes when bgsave fails. yes: stop; no: continue.

stop-writes-on-bgsave-error yes

# Whether to compress the RDB file. yes: compress; no: do not compress. The default value is yes.

rdbcompression yes

# Whether to verify the RDB file with a checksum when writing and reading it.

# yes: verify; no: do not verify; default: yes.

# With yes, Redis checks the RDB file when loading it and stops the startup if the file is corrupted.

rdbchecksum yes

# Set RDB file name

dbfilename dump.rdb

# RDB file path, if not specified separately, defaults to redis startup path.

dir ./

To restore Redis data, simply move the RDB file (e.g. dump.rdb) into the directory that Redis loads it from and start the Redis service. To find that directory, run the CONFIG GET dir command in Redis.

An RDB file is a compressed binary file that represents a snapshot of Redis at a point in time, which makes it suitable for database backup and full replication. For example, you can back up the database periodically and copy the RDB files to other servers for disaster recovery. Also thanks to compression, loading an RDB file is faster than loading an AOF file.

AOF persistence

The RDB modes described above do not meet the requirements of real-time persistence, because both save and bgsave consume a lot of resources (CPU, memory, disk) each time they run, and the amount of data backed up each time grows with the database. In addition, RDB files are stored in a binary format, and as Redis has evolved it has produced RDB files in several format versions, so an older version of Redis may be unable to read an RDB file in a newer format. Because of these problems with RDB, Redis provides AOF persistence. AOF (Append Only File) records every write command in a log. When the Redis Server starts, it executes the commands in the AOF file again to recover the data. AOF addresses the real-time problem of data persistence and is a mainstream persistence method for Redis.

AOF persistence process

As mentioned above, the process of AOF persistence is the process of continuously appending logs. Figure 5 shows the specific process:

  1. Redis Clients are the source of commands. There can be many clients, sending a steady stream of command requests.

  2. When these commands reach the Redis Server, they are not written to the AOF file directly. Instead, they are first stored in the AOF buffer. The AOF buffer is an area of memory: commands are collected there and written to disk once a certain amount has accumulated, which avoids frequent disk I/O operations.

  3. The commands in the AOF buffer are written to the AOF file on disk according to the configured policy.

  4. As the AOF file grows, its commands are merged according to a set of rules in order to compress the file. This is called AOF rewrite.

  5. When the Redis Server is restarted, data is loaded from the AOF file.

Figure 5. AOF processing flow chart
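The flow in Figure 5 (buffer, append to file, replay on restart) can be sketched with a toy log. The class ToyAof is hypothetical and supports only set and del; it writes one plain-text command per line for readability, whereas Redis stores commands in its protocol format.

```python
import os
import tempfile


class ToyAof:
    """Hypothetical append-only log: buffer commands, flush them, replay them."""

    def __init__(self, path: str):
        self.path = path
        self.buffer = []  # the in-memory "AOF buffer"

    def feed(self, command: str) -> None:
        """Step 2: a command goes to the buffer, not straight to disk."""
        self.buffer.append(command)

    def flush(self) -> None:
        """Step 3: append everything in the buffer to the file."""
        with open(self.path, "a", encoding="utf-8") as f:
            for command in self.buffer:
                f.write(command + "\n")
        self.buffer.clear()

    def replay(self) -> dict:
        """Step 5: rebuild the data by executing the logged commands in order."""
        db = {}
        if not os.path.exists(self.path):
            return db
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                parts = line.split()
                if not parts:
                    continue
                if parts[0] == "set":
                    db[parts[1]] = parts[2]
                elif parts[0] == "del":
                    db.pop(parts[1], None)
        return db


aof = ToyAof(os.path.join(tempfile.mkdtemp(), "appendonly.aof"))
for cmd in ["set hello A", "set hello B", "set temp 1", "del temp"]:
    aof.feed(cmd)
aof.flush()
print(aof.replay())
```

Commands still sitting in the buffer when the process dies are lost, which is why the choice of synchronization policy matters.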

AOF buffer synchronization file policy

As mentioned above, Redis writes commands to the AOF buffer first and then to the AOF file. Here are three strategies for AOF buffer synchronization.

  • Always policy: when a command is written to the AOF buffer, the system calls fsync to synchronize it to the AOF file, and the thread returns when fsync completes. fsync operates on a single file and blocks until the data has been written to disk, which guarantees that the data is persisted.

  • Everysec policy: after a command is written to the AOF buffer, the system write operation is called, and the thread returns when the write completes. The fsync operation is performed by a dedicated thread once per second. The write operation uses a delayed-write mechanism: the Linux kernel provides a page buffer to improve disk I/O performance, so write returns as soon as the data is in the system buffer, and the actual synchronization to disk depends on the operating system's scheduling. This is the default policy in Redis.

  • No policy: synchronization is left to the operating system, which decides when to write the buffered data to disk. Since the operating system determines when data is persisted, the timing is outside Redis's control.
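The three policies differ only in when fsync is called after write. A minimal sketch, assuming a hypothetical AofWriter class: it takes the current time as an argument so that the behavior is easy to follow, and it calls fsync inline, whereas the article describes a dedicated thread doing this for the everysec policy.

```python
import os
import tempfile


class AofWriter:
    """Hypothetical writer showing when each policy calls fsync."""

    def __init__(self, path: str, policy: str = "everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError("unknown policy: " + policy)
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.policy = policy
        self.buffer = bytearray()
        self.last_fsync = 0.0
        self.fsync_calls = 0

    def feed(self, command: str) -> None:
        self.buffer += command.encode("utf-8") + b"\n"

    def flush(self, now: float) -> None:
        if self.buffer:
            # write() hands the data to the operating system's buffer.
            os.write(self.fd, bytes(self.buffer))
            self.buffer.clear()
        due = self.policy == "everysec" and now - self.last_fsync >= 1.0
        if self.policy == "always" or due:
            os.fsync(self.fd)  # block until the data reaches the disk
            self.last_fsync = now
            self.fsync_calls += 1
        # policy "no": never fsync here; the operating system decides.

    def close(self) -> None:
        os.close(self.fd)


path = os.path.join(tempfile.mkdtemp(), "appendonly.aof")
writer = AofWriter(path, policy="everysec")
writer.feed("set hello A")
writer.flush(now=0.5)  # handed to the operating system, not yet fsynced
writer.flush(now=1.0)  # one second has passed: fsync
writer.close()
```

With everysec, whatever was written since the last fsync is the data at risk, which is where the "about one second" loss window comes from.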

AOF rewrite

The commands requested by Redis Clients are continuously synchronized from the AOF buffer to the AOF file. As the AOF file grows, an AOF rewrite becomes necessary. AOF rewrite is the process of converting the data in the Redis process into write commands and synchronizing them to a new AOF file. The rewritten AOF file is smaller for the following reasons:

  • Data in the process that has already expired is not written to the new AOF file.

  • The old AOF file contains commands that are no longer needed. The new AOF file is generated directly from the data in the process, so it keeps only the commands that write the final data. For example, if the file contains the three commands “set hello A”, “set hello B” and “set hello C”, only the last one, “set hello C”, has any effect, because all three set the value of the same key. So these three commands are replaced by the single command “set hello C” in the new AOF file.

  • In addition, multiple write commands can be combined into one. For example, the three commands lpush list A, lpush list B, and lpush list C can be combined into the single command lpush list A B C.

Not only does the AOF rewrite reduce the size of the file, but a smaller AOF can also be loaded faster by Redis.
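The three rules above can be illustrated with a small sketch. The functions replay and rewrite are hypothetical helpers, only set and lpush are handled, and expiry is modeled as a separate dictionary of expiry times; a real rewrite would also have to preserve the expiry of keys that are still alive.

```python
def replay(commands):
    """Apply set / lpush commands in order and return the resulting data."""
    db = {}
    for command in commands:
        name, key, *values = command.split()
        if name == "set":
            db[key] = values[0]
        elif name == "lpush":
            items = db.setdefault(key, [])
            for value in values:
                items.insert(0, value)  # lpush puts each value at the head
    return db


def rewrite(db, expires, now):
    """Generate a minimal command list from the in-memory data."""
    commands = []
    for key, value in db.items():
        if key in expires and expires[key] <= now:
            continue  # expired data is not written to the new file
        if isinstance(value, list):
            # One lpush with all elements; reversed() restores the push order.
            commands.append("lpush " + key + " " + " ".join(reversed(value)))
        else:
            commands.append("set " + key + " " + value)
    return commands


old_aof = [
    "set hello A", "set hello B", "set hello C",
    "lpush list A", "lpush list B", "lpush list C",
    "set temp X",
]
db = replay(old_aof)
new_aof = rewrite(db, expires={"temp": 100}, now=200)
print(new_aof)
```

The rewrite never reads the old log: it works from the current data, so the number of commands it emits depends on how many keys exist, not on how many writes happened.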

With AOF rewrite defined, let’s look at how it is carried out. In general, there are two ways to perform a rewrite: the bgrewriteaof command and automatic rewrite through configuration.

Rewriting with the bgrewriteaof command

As shown in Figure 6, the execution process consists of the following steps:

  • The Redis Client sends bgrewriteaof, which is an asynchronous command. Because the Redis Server keeps accepting commands from other Redis Clients while it handles bgrewriteaof, the second and third steps actually run in parallel: step 2 performs the AOF rewrite, while step 3 accepts other command requests that are unrelated to the rewrite.

  • When the Redis Server receives this command, it starts a Redis child process to perform the AOF rewrite. The rewrite walks back over the data in Redis memory and converts it into write commands. Commands that arrive while the rewrite is running are also kept in the “AOF rewrite buffer” (the red area in the figure) so that they can be added to the new AOF file.

  • As mentioned in the first step, the third step runs at the same time as the second: the server accepts client requests and saves them to the “old AOF file” through the “AOF buffer”.

  • Finally, when the AOF rewrite operation is complete, the “new AOF file” replaces the “old AOF file”, which completes the AOF rewrite.

Figure 6. AOF rewrite flowchart
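The interplay between the old AOF file and the AOF rewrite buffer can be sketched without any real processes. The class ToyRewriter is hypothetical: Python lists stand in for the files, a copy of the data stands in for the child process's view of memory, and only set commands are handled.

```python
class ToyRewriter:
    """Hypothetical model of a background rewrite that runs alongside writes."""

    def __init__(self):
        self.db = {}
        self.old_aof = []           # stands in for the old AOF file
        self.rewrite_buffer = None  # exists only while a rewrite is running
        self._snapshot = None

    def execute(self, key: str, value: str) -> None:
        """Step 3: keep serving writes, logging them as usual."""
        self.db[key] = value
        command = "set " + key + " " + value
        self.old_aof.append(command)
        if self.rewrite_buffer is not None:
            self.rewrite_buffer.append(command)  # also remember it for the new file

    def begin_rewrite(self) -> None:
        """Step 2: the rewrite works from the data as it is right now."""
        self._snapshot = dict(self.db)
        self.rewrite_buffer = []

    def finish_rewrite(self) -> list:
        """Final step: the new file replaces the old one."""
        new_aof = ["set " + k + " " + v for k, v in self._snapshot.items()]
        new_aof += self.rewrite_buffer  # writes that arrived during the rewrite
        self.old_aof = new_aof
        self.rewrite_buffer = None
        self._snapshot = None
        return new_aof


server = ToyRewriter()
server.execute("hello", "A")
server.execute("hello", "B")
server.begin_rewrite()
server.execute("hello", "C")  # arrives while the rewrite is running
server.execute("foo", "1")
new_aof = server.finish_rewrite()
print(new_aof)
```

Without the rewrite buffer, the new file would describe the data only as it was when the rewrite began, and the two writes that arrived afterwards would be missing after a restart.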

Automatic rewrite through configuration

The timing of an automatic rewrite is determined by two values in the Redis configuration file: auto-aof-rewrite-min-size, the minimum size an AOF file must reach before it is rewritten, and auto-aof-rewrite-percentage, the growth ratio that triggers a rewrite.

With these two values, a rewrite is triggered automatically when both of the following conditions hold:

  • The current size of the AOF file is larger than the minimum size for an AOF rewrite. The expression is: aof-current-size > auto-aof-rewrite-min-size

  • The growth of the AOF file since the last rewrite, that is, the current size minus the size after the last rewrite (aof-base-size), divided by the size after the last rewrite, has reached the rewrite ratio.

    The expression is: (aof-current-size - aof-base-size) / aof-base-size >= auto-aof-rewrite-percentage
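The two conditions can be combined into one check. This is a sketch under stated assumptions: the function and parameter names are hypothetical, sizes are in bytes, the ratio is expressed as a percentage (100 meaning the file has doubled) and compared with >=, and a base size of 0 is treated as 1 to avoid dividing by zero.

```python
MB = 1024 * 1024


def should_rewrite(current_size: int, base_size: int,
                   min_size: int, percentage: int) -> bool:
    """Return True when both automatic-rewrite conditions are met."""
    if current_size <= min_size:
        return False  # the file is still below the minimum rewrite size
    base = base_size if base_size > 0 else 1  # assumption: guard against zero
    growth = (current_size - base) * 100 / base
    return growth >= percentage


# A file that was 64 MB after the last rewrite and has grown to 128 MB.
print(should_rewrite(128 * MB, 64 * MB, min_size=64 * MB, percentage=100))
```

The minimum size keeps small files from being rewritten over and over: a file that grows from 1 MB to 10 MB has grown by 900%, but rewriting it saves very little.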

RDB is the default in the Redis configuration file, so AOF needs to be enabled manually. To enable it, set the following parameter in the Redis configuration file redis.conf:

# Enable AOF mode. yes: enable; no: disable.

appendonly yes

As with recovery from an RDB file, simply move the AOF file into the directory returned by CONFIG GET dir and start the Redis service. Redis loads the AOF file at startup.

From the above description, AOF has the advantages of data integrity and high safety: at most about one second of data is lost. In addition, the AOF file is an append-only log in which write operations are saved in the Redis protocol format, so its content is readable and suitable for emergency recovery after an accidental deletion. However, compared with RDB, the file is larger and Redis takes more time to load the data at startup.

Comparison of RDB and AOF

RDB and AOF mechanisms were introduced earlier, so here is a simple comparison.

  • Start priority: if Redis has both RDB and AOF persistence enabled, it loads the AOF first when it restarts, since the AOF is updated more frequently and therefore holds more complete data.

  • Size/Recovery speed: an RDB file is saved in compressed binary form, so it is smaller and loads faster when Redis recovers. An AOF file, on the other hand, is written as a log, so it is larger and slower to recover from.

  • Data security: RDB saves data as snapshots rather than in real time, so data written after the last snapshot may be lost. In this respect AOF is much less likely to lose data than RDB (for example, with the everysec policy).

  • Resource consumption: RDB consumes more resources because the full data set is saved to disk each time, whereas AOF saves only the incremental changes to Redis data each time.

Conclusion

Starting with why database persistence is needed, this article discussed the two persistence mechanisms in Redis: RDB and AOF. RDB persistence was described through the RDB file structure and the mechanisms and process that trigger persistence, which include the synchronous save mode, the asynchronous bgsave mode, and automatic triggering through configuration. For AOF persistence, the article covered the AOF persistence process, the buffer synchronization policies, and the AOF rewrite mechanism. The buffer synchronization policies are always, everysec, and no.
