Redis Persistence (RDB and AOF)

Redis keeps its data in memory, so a sudden outage such as a crash or power loss would wipe out everything. To prevent that, Redis needs a mechanism that saves the data to disk, and that mechanism is Redis persistence.

Redis has two main persistence mechanisms: RDB snapshots and the AOF log. If AOF persistence is enabled, the server prefers the AOF file when restoring the database state at startup. Only when AOF persistence is turned off does the server restore the database state from the RDB file.

RDB persistence

RDB persistence works by taking a snapshot of the in-memory data set and writing it to disk at configured intervals. On recovery, the snapshot file is read back into memory.

However, Redis handles client requests on a single thread, and taking a memory snapshot requires file I/O. File I/O cannot go through the multiplexing API, so if the same thread both served requests and wrote the snapshot, the file I/O would obviously drag down request performance.

How do we solve this problem? Redis uses COW (Copy On Write) for persistence.

Redis forks a separate child process to do the persistence. The child writes the data to a temporary file, and once the persistence process is complete, that file replaces the previous persistence file. During the whole process the main process performs no disk I/O, which keeps performance high. RDB is more efficient than AOF when large-scale data recovery is required and losing the most recent changes is tolerable. The downside of RDB is that data written after the last snapshot can be lost. (The default RDB file is dump.rdb)

The child process does the data persistence. It does not modify the existing in-memory data structures; it only reads them and serializes them to disk. The parent process, however, must keep serving client requests and therefore keeps modifying the in-memory data structures. The two processes cannot safely work on the same piece of memory at the same time, so Redis relies on copy on write.

A data segment is made up of many operating system pages. When the parent process modifies data on a page, the operating system first copies that shared page, and the parent modifies its own copy. The child's view of the page is unchanged: it still holds the data as it was at the moment the child was forked. That's why it's called a snapshot. (The specific process corresponds to the operation in the figure above)
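
The fork-and-snapshot idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not how Redis is implemented: the names bgsave_sketch and demo are made up, JSON stands in for the RDB format, and os.fork is only available on Unix-like systems.

```python
import json
import os
import tempfile


def bgsave_sketch(dataset, path):
    """Fork a child that serializes `dataset` as it was at fork time.

    Hypothetical helper for illustration; returns the child's pid.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: its memory is a copy-on-write view frozen at fork time.
        status = 1
        try:
            tmp_path = path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(dataset, f)
                f.flush()
                os.fsync(f.fileno())
            # Replace the previous snapshot only after the new one is complete.
            os.replace(tmp_path, path)
            status = 0
        finally:
            os._exit(status)  # never fall back into the parent's code path
    return pid


def demo():
    dataset = {"counter": 1}
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "dump.json")
        pid = bgsave_sketch(dataset, path)
        dataset["counter"] = 2  # the parent keeps modifying its own memory
        os.waitpid(pid, 0)
        with open(path) as f:
            snapshot = json.load(f)
    return snapshot, dataset
```

Whatever the timing between the two processes, the file written by the child reflects the data at the moment of the fork, while the parent's later change is visible only in the parent.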

Trigger conditions

We described the RDB operation above, so when is it triggered?

Command trigger:

SAVE: Persists the Redis data immediately; the server blocks all other commands until the save finishes
BGSAVE: Redis takes the snapshot asynchronously in the background while continuing to respond to client requests. The LASTSAVE command returns the time of the last successful snapshot
FLUSHALL: Flushing all data also triggers persistence, but the resulting dump.rdb file is empty and therefore meaningless
SHUTDOWN: The SHUTDOWN command also triggers persistence if AOF persistence is not enabled
DEBUG RELOAD: Reloading Redis with this command also triggers a save automatically

Config file trigger:

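In the Redis configuration file, save <seconds> <changes> rules make Redis persist automatically when at least the given number of changes occurs within the given number of seconds. (This also uses BGSAVE.)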

The default configuration is as follows:
# Trigger if at least 1 change occurs within 900 seconds
save 900 1
# Trigger if at least 10 changes occur within 300 seconds
save 300 10
# Trigger if at least 10000 changes occur within 60 seconds
save 60 10000
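
The way these save rules combine can be sketched as a small check. This is a simplified model, assuming a rule fires when the number of changes since the last save reaches its threshold and more than its number of seconds has passed; should_bgsave and its parameter names are hypothetical.

```python
# (seconds, changes) pairs mirroring the save rules listed above.
SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]


def should_bgsave(save_points, dirty, seconds_since_last_save):
    """Return True if any save rule is satisfied.

    dirty: number of changes since the last successful save.
    seconds_since_last_save: elapsed time since that save.
    """
    return any(
        dirty >= changes and seconds_since_last_save > seconds
        for seconds, changes in save_points
    )
```

For example, 10 changes after 301 seconds satisfy the second rule, while 5 changes after the same time satisfy none of them.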

When a slave (replica) node performs a full resynchronization, the master node automatically runs BGSAVE to generate an RDB file and sends it to the slave node.

Advantages and disadvantages

Advantages:

Well suited to scenarios where data integrity requirements are not high
Because the main process performs no disk I/O during the snapshot, performance stays high, and large-scale data recovery is fast

Disadvantages:

Forking the child process takes up additional memory
Snapshots are taken only at intervals, so if Redis goes down unexpectedly, the changes made since the last snapshot are lost

To restore persistent files:

Move the backup file into the directory configured by the dir parameter and rename it to match the dbfilename setting. Then start Redis, and the data is restored from that file.
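
As a rough aid for the step above, the following sketch works out where the backup file has to go by reading the dir and dbfilename settings from the text of a configuration file. The function name rdb_target_path is hypothetical, the parsing is deliberately minimal (it ignores include directives, for instance), and the fallback values assume the usual defaults of ./ and dump.rdb.

```python
import os
import shlex


def rdb_target_path(conf_text):
    """Return the path Redis would load its RDB file from, per the given config text."""
    settings = {"dir": "./", "dbfilename": "dump.rdb"}  # assumed defaults
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = shlex.split(line)
        if len(parts) >= 2 and parts[0].lower() in settings:
            settings[parts[0].lower()] = parts[1]
    return os.path.join(settings["dir"], settings["dbfilename"])
```

Copying the backup to the returned path (for example with shutil.copy2) while Redis is stopped, and then starting Redis, corresponds to the procedure described above.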

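If the dump.rdb file is corrupted, an error is reported during data restoration. You can check the file with redis-check-rdb (called redis-check-dump in older Redis versions). Note that, unlike redis-check-aof --fix, this tool verifies the file but does not repair it, so a damaged RDB file usually has to be replaced from a backup before you restart the Redis service.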

AOF persistence (Append Only File)

AOF records every write command in a separate log file, and the content is written directly in the text protocol format. When the server restarts, the commands in the AOF file are executed again to recover the data, which addresses the real-time limitation of RDB persistence.
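
To make the text protocol format concrete, here is a minimal sketch that encodes one command the way it would appear in the log, as a RESP array of bulk strings. The helper name encode_command is made up for this example.

```python
def encode_command(*args):
    """Encode a command such as ("SET", "key", "value") as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]  # array header: number of arguments
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(f"${len(data)}\r\n".encode())  # bulk string header: byte length
        parts.append(data + b"\r\n")
    return b"".join(parts)
```

Calling encode_command("SET", "key", "value") produces the bytes *3, $3, SET, $3, key, $5, value, each terminated by CRLF, which is readable in any text editor.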

# The default value is no; change it to yes to enable AOF
appendonly yes

If AOF is enabled, the default file name is appendonly.aof, and it is stored in the same directory as the RDB file.

Rewrite mechanism

Because AOF works by appending to a file, the file keeps growing, which is why the rewrite mechanism was introduced. When the AOF file size exceeds the configured threshold, Redis compacts the AOF content, keeping only the smallest set of commands that can recover the data.

Redis provides the BGREWRITEAOF command to slim down (rewrite) the AOF log. It works by forking a child process that traverses the in-memory data, converts it into a series of Redis commands, and serializes them into a new AOF log file. Once serialization finishes, the incremental AOF entries that accumulated during the rewrite are appended to the new file, which then immediately replaces the old AOF log file, and the slimming job is complete.
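
The rewrite idea can be modeled with a toy in-memory store. This is a sketch under heavy assumptions: it supports only SET, INCR and DEL on string values, commands are plain tuples rather than protocol text, and the names replay and rewrite are hypothetical.

```python
def replay(commands):
    """Rebuild the data set by executing a log of (op, key, *args) commands in order."""
    state = {}
    for op, key, *args in commands:
        if op == "SET":
            state[key] = args[0]
        elif op == "INCR":
            state[key] = str(int(state.get(key, "0")) + 1)
        elif op == "DEL":
            state.pop(key, None)
    return state


def rewrite(state_at_fork, commands_during_rewrite):
    """Build a compact log: one SET per surviving key, then the incremental commands."""
    new_log = [("SET", key, value) for key, value in sorted(state_at_fork.items())]
    new_log.extend(commands_during_rewrite)
    return new_log
```

A log of SET a 1, INCR a, INCR a, SET b x, DEL b shrinks to the single command SET a 3, and replaying the new log yields the same data set as replaying the old one.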

Specific workflow of AOF:

All write commands are appended to aof_buf (the AOF buffer) in real time as they are executed
The AOF buffer is synchronized to disk according to the configured policy
The parent process forks a child process, which rewrites the AOF file from the memory snapshot while the parent continues to respond to subsequent commands
When the Redis service restarts, the AOF file is loaded to restore the data

Triggering

Active trigger:

Use the bgrewriteaof command

Passive trigger:

Configuration file settings, with two parameters

# Whether to skip fsync for appended commands while a rewrite is in progress; the default is no (keep syncing) to ensure data safety
no-appendfsync-on-rewrite no

# Redis records the size of the AOF file after the last rewrite and triggers a rewrite when the current AOF file has grown by 100% relative to that size (that is, doubled) and is larger than 64MB
# auto-aof-rewrite-percentage sets the required growth, as a percentage of the AOF size after the last rewrite
auto-aof-rewrite-percentage 100
# auto-aof-rewrite-min-size sets the minimum AOF file size before a rewrite can be triggered
auto-aof-rewrite-min-size 64mb
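
How the two thresholds work together can be sketched as follows. This is a simplified model of the check, assuming the growth is measured against the file size after the last rewrite; should_rewrite and its parameter names are hypothetical.

```python
MB = 1024 * 1024


def should_rewrite(current_size, base_size, percentage=100, min_size=64 * MB):
    """Decide whether an automatic AOF rewrite should start.

    current_size: current AOF size in bytes.
    base_size: AOF size in bytes right after the last rewrite.
    """
    if percentage <= 0 or current_size <= min_size:
        return False  # feature disabled, or the file is still too small to bother
    base = base_size if base_size > 0 else 1
    growth = current_size * 100 / base - 100  # growth in percent since the last rewrite
    return growth >= percentage
```

With the settings above, a file that grew from 64MB to 128MB qualifies, while one that grew from 10MB to 60MB does not, because it is still below the minimum size.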

Advantages and disadvantages

Advantages:

If every change is synchronized, data integrity is best preserved
If you synchronize once per second, an outage loses at most about one second of data
If you never synchronize explicitly, write performance is highest

Disadvantages:

For the same data set, the AOF file is much larger than the RDB file, and recovery is slower than with RDB
AOF also runs slower than RDB, so the default Redis configuration uses RDB persistence

Fsync Synchronization policy

As mentioned above, the AOF buffer synchronizes to the hard disk according to the corresponding policy. What are the synchronization policies?

The AOF buffer synchronization policy, controlled by the appendfsync parameter, has three values

Option value	Description	Notes
always	After a command is written to aof_buf, fsync is called to synchronize it to the AOF file, and the thread returns once fsync completes	Every write requires an fsync, so it is hard to achieve high performance on ordinary SATA disks
everysec	After a command is written to aof_buf, the write system call is invoked, and the thread returns once write completes. A dedicated thread calls fsync once per second	The default and recommended synchronization policy
no	After a command is written to aof_buf, the write system call is invoked, but Redis does not fsync the AOF file. The operating system decides when to flush to disk, usually within 30 seconds at most	The interval between synchronizations is uncontrollable, and more data accumulates for each synchronization. Performance improves, but data safety is not guaranteed
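
The difference between the three policies can be sketched with a small writer class. It is an illustration under assumptions, not the Redis implementation: the class name AofWriter and its fields are made up, and the once-per-second fsync is done inline in flush rather than by a separate thread.

```python
import os
import time


class AofWriter:
    """Buffer appended commands and flush them according to an appendfsync-style policy."""

    def __init__(self, path, policy="everysec", clock=time.monotonic):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.policy = policy
        self.clock = clock
        self.buf = bytearray()  # plays the role of aof_buf
        self.unsynced = False   # data written to the file but not yet fsynced
        self.last_fsync = clock()
        self.fsync_count = 0

    def append(self, data):
        self.buf += data

    def flush(self):
        if self.buf:
            view = memoryview(bytes(self.buf))
            while view:  # os.write may write fewer bytes than requested
                written = os.write(self.fd, view)
                view = view[written:]
            self.buf.clear()
            self.unsynced = True
        now = self.clock()
        due = self.policy == "always" or (
            self.policy == "everysec" and now - self.last_fsync >= 1.0
        )
        if self.unsynced and due:
            os.fsync(self.fd)  # force the data to stable storage
            self.unsynced = False
            self.last_fsync = now
            self.fsync_count += 1
        # With policy "no", the operating system decides when to flush.

    def close(self):
        os.close(self.fd)
```

The trade-off in the table shows up directly in fsync_count: always pays one fsync per flush, everysec at most one per second, and no none at all.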

If both RDB and AOF are enabled, AOF is loaded first, because the AOF holds more complete data: with the everysec policy, at most about 1s of data is lost.

Interview questions about AOF

Why does AOF use the text protocol format directly?

The text protocol has good compatibility
With AOF enabled, every write command involves an append operation, and using the protocol format directly avoids the overhead of secondary processing
The text protocol is readable, which makes direct modification and processing easy

Why does AOF append commands to aof_buf?

Writing to the aof_buf buffer first improves performance, and it lets Redis offer multiple buffer synchronization policies

Why can rewritten AOF files become smaller?

Keys that have already expired in memory are no longer written to the file
The old AOF file contains commands that are no longer effective; the rewrite is generated directly from the in-memory data, so the new AOF file holds only the write commands needed for the final data
Multiple write commands can be combined into one. To avoid overflowing the client buffer, collections with more than 64 elements are split into multiple commands of at most 64 elements each
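
The 64-element splitting can be sketched for a list key. The function name rewrite_list is hypothetical, and the constant is named after the AOF_REWRITE_ITEMS_PER_CMD value in the Redis source, which I am assuming is still 64.

```python
AOF_REWRITE_ITEMS_PER_CMD = 64  # assumed to match the limit used by Redis


def rewrite_list(key, items, per_cmd=AOF_REWRITE_ITEMS_PER_CMD):
    """Emit RPUSH commands that recreate a list, with at most per_cmd elements each."""
    return [
        ("RPUSH", key, *items[start:start + per_cmd])
        for start in range(0, len(items), per_cmd)
    ]
```

A list of 130 elements therefore becomes three RPUSH commands carrying 64, 64 and 2 elements, instead of 130 separate pushes or one oversized command.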

