Preface

Redis is an in-memory database: all data is stored in memory. To avoid permanent data loss when the process exits, the data in Redis must be periodically saved from memory to the local disk, either as data or as commands. On the next restart, Redis uses the persistence files to recover the data. Redis provides two persistence mechanisms: RDB, which saves a snapshot of the current data to disk, and AOF, which saves each executed write command to disk (similar to MySQL's binlog). This article introduces the RDB and AOF persistence schemes in detail, including how to operate them and how they are implemented.

Main text

Redis is a database server based on key-value (K-V) storage. The following introduces the internal structure of a Redis database and the storage form of key-value pairs, which makes the persistence mechanism of Redis easier to understand.

1. Redis database structure

A single Redis server has 16 databases (0-15) by default, and the number of databases is configurable. Clients use database 0 by default and can switch to another database with the SELECT command.

Each database in Redis is represented by a redis.h/redisDb structure, which records the key space of that database, the expiration times of its keys, the keys in blocked and ready states, the database number, and so on.

typedef struct redisDb {
    // Database key space, which holds all the key-value pairs in the database
    dict *dict;
    // Expiration times: each dictionary key is a database key, and each value is its expiration time as a UNIX timestamp
    dict *expires;
    // Keys that clients are blocked on
    dict *blocking_keys;
    // Blocked keys that have become ready
    dict *ready_keys;
    // Keys being monitored by the WATCH command
    dict *watched_keys;
    // Pool of candidate keys for eviction
    struct evictionPoolEntry *eviction_pool;
    // Database number
    int id;
    // Average TTL of database keys, for statistics only
    long long avg_ttl;
} redisDb;


Because Redis is a key-value database, each database is itself a dictionary, represented by the redisDb structure. dict points to the dictionary of key-value pairs: each key is a string object, and each value can be any Redis object type, including strings, lists, hashes, sets, and sorted sets. expires points to a dictionary that records key expiration times: each key is a database key from dict, and each value is the expiration timestamp of that key, stored as a long long.
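
The relationship between dict and expires can be illustrated with a small Python model. This is only a sketch: the class name RedisDbSketch and its methods are hypothetical, timestamps are passed in explicitly, and expired keys are removed lazily on read, which is a simplification rather than a description of the real server.

```python
import time


class RedisDbSketch:
    """Toy model of redisDb: one dictionary for values, one for expiry times."""

    def __init__(self, db_id=0):
        self.id = db_id
        self.dict = {}     # database key -> value object
        self.expires = {}  # database key -> absolute expiry time (UNIX ms)

    @staticmethod
    def _now_ms():
        return int(time.time() * 1000)

    def set(self, key, value, ttl_ms=None, now_ms=None):
        now_ms = self._now_ms() if now_ms is None else now_ms
        self.dict[key] = value
        if ttl_ms is None:
            self.expires.pop(key, None)        # key without expiration
        else:
            self.expires[key] = now_ms + ttl_ms

    def get(self, key, now_ms=None):
        now_ms = self._now_ms() if now_ms is None else now_ms
        expire_at = self.expires.get(key)
        if expire_at is not None and expire_at <= now_ms:
            # The key has expired: drop it from both dictionaries.
            del self.dict[key]
            del self.expires[key]
            return None
        return self.dict.get(key)
```

Only keys that carry a TTL appear in expires, so a key such as msg without an expiration lives in dict alone.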

2. RDB persistence

RDB persistence (also known as snapshot persistence) generates a snapshot of the data in memory and saves it to disk in a file with the .rdb extension. The RDB file is a compressed binary file, and Redis reads it on restart to recover the data. rdbSave and rdbLoad are the two core functions of RDB persistence: the former generates the RDB file and saves it to disk, and the latter loads the data in the RDB file into memory.

An RDB file holds the full data set in a single file, which makes it suitable for backup and disaster recovery. Restoring a database from an RDB file is fast: usually, a 1 GB snapshot file takes about 20 seconds to load into memory. Redis provides two ways to generate RDB files: manually triggered saves and automatic saves at configured intervals. The following describes how RDB files are created and loaded.

2.1. RDB creation and loading

By default, the Redis server implements persistence in RDB mode. The relevant configuration items in the redis.conf file are as follows:

# RDB file name
dbfilename dump.rdb
# Directory where RDB and AOF files are stored
dir /usr/local/var/db/redis/

2.1.1. Manually triggered save

There are two ways to trigger an RDB backup in Redis. One is to trigger snapshot generation manually with the SAVE or BGSAVE command. The other is to configure save conditions (a time window and a number of writes), so that Redis triggers the save automatically when a condition is met.

1. SAVE command

SAVE is a synchronous command that blocks the Redis server process until the RDB file is created. The server cannot process any additional command requests while the server process is blocked.

  • Client command
127.0.0.1:6379> SAVE
OK
  • Server Logs
6266:M 15 Sep 2019 08:31:01.258 * DB saved on disk

After executing the SAVE command, Redis executes the SAVE operation in the server process (PID 6266), which blocks the request processing of the Redis client.

2. BGSAVE command

BGSAVE is an asynchronous command. Unlike SAVE, which blocks the server process directly, BGSAVE forks a child process that creates the RDB file, while the server process (the parent process) continues to process client commands.

  • Client command
127.0.0.1:6379> BGSAVE
Background saving started
  • Server Logs
6266:M 15 Sep 2019 08:31:22.914 * Background saving started by pid 6283
6283:C 15 Sep 2019 08:31:22.915 * DB saved on disk
6266:M 15 Sep 2019 08:31:22.934 * Background saving terminated with success

The server process (PID 6266) forks a child process (PID 6283) for the BGSAVE command. The child process saves the RDB file in the background, then notifies the parent process and exits when the operation is complete. Throughout the process, the server process spends only a small amount of time forking the child process and handling the child's signal; the rest of the time it remains available to serve clients.

BGSAVE is the main way to trigger RDB persistence. Its workflow is as follows:

  1. The client issues the BGSAVE command. The main Redis process checks whether a child process is already performing a backup and, if so, returns immediately.
  2. The parent process forks a child process and is blocked during the fork. You can run the INFO stats command and check the latest_fork_usec field to see how long the last fork took, in microseconds.
  3. After the fork completes, the parent process returns the "Background saving started" message and is no longer blocked.
  4. The child process generates a temporary snapshot file from the parent's in-memory data and then replaces the original file with it.
  5. After the backup is complete, the child process notifies the parent process, and the parent process updates its statistics.
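
The fork-then-replace pattern in the steps above can be sketched in Python on a POSIX system. This is not how Redis serializes data: the function names bgsave and demo are hypothetical, JSON stands in for the RDB format, and the file name dump.json is chosen only for the illustration.

```python
import json
import os
import tempfile


def bgsave(data, path):
    """Fork a child that writes `data` to `path`; return the child's PID."""
    pid = os.fork()  # the parent is blocked only for the duration of this call
    if pid != 0:
        return pid   # parent: go straight back to serving clients

    # Child: works on the memory image as it was at fork time (copy-on-write).
    status = 1
    try:
        tmp_path = f"{path}.tmp-{os.getpid()}"
        with open(tmp_path, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomically replace the previous snapshot
        status = 0
    finally:
        os._exit(status)  # exit without running the parent's cleanup handlers


def demo():
    data = {"msg": "hello"}
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "dump.json")
        pid = bgsave(data, path)
        data["msg"] = "changed after fork"  # not visible to the child
        _, wait_status = os.waitpid(pid, 0)
        assert os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
        with open(path) as f:
            return json.load(f)
```

Because the child owns a copy-on-write view of the parent's memory, the write made by the parent after the fork does not reach the snapshot, which is why demo() returns the value as it was at fork time.
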

3. Comparison between SAVE and BGSAVE

| Command       | SAVE                      | BGSAVE                                   |
|---------------|---------------------------|------------------------------------------|
| I/O type      | Synchronous               | Asynchronous                             |
| Blocking      | Blocks for the whole save | Blocks only during fork                  |
| Complexity    | O(n)                      | O(n)                                     |
| Advantages    | No extra memory consumed  | Does not block clients                   |
| Disadvantages | Blocks clients            | The forked child process consumes memory |

2.1.2. Automatic triggering of save

Because the BGSAVE command can run without blocking the server process, the Redis configuration file redis.conf provides a save option that lets the server execute BGSAVE automatically at intervals. Multiple save conditions can be set with the save option, and the server executes BGSAVE as soon as any one of them is met. By default, redis.conf contains the following three save conditions:

save 900 1
save 300 10
save 60 10000

The BGSAVE command is automatically executed if any of the following three conditions are met:

  • The server made at least one change to the database within 900 seconds.
  • The server made at least 10 changes to the database within 300 seconds.
  • The server made at least 10,000 changes to the database within 60 seconds.

For example, run SET msg "hello" to insert a key-value pair and wait 900 seconds for the Redis server process to trigger the save. The output is as follows:

6266:M 15 Sep 2019 08:46:22.981 * 1 changes in 900 seconds. Saving...
6266:M 15 Sep 2019 08:46:22.986 * Background saving started by pid 6476
6476:C 15 Sep 2019 08:46:23.015 * DB saved on disk
6266:M 15 Sep 2019 08:46:23.096 * Background saving terminated with success

The Redis server periodically runs the serverCron function, which executes every 100 milliseconds by default. One of its tasks is to check whether any of the save conditions set by the save option is met and, if so, to execute the BGSAVE command automatically.
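
The check performed on each serverCron tick can be sketched as follows. The function name should_bgsave and its parameters are hypothetical: dirty stands for the number of changes since the last save, and each save condition is a (seconds, changes) pair. The strict comparison on elapsed time is an assumption of this sketch.

```python
DEFAULT_SAVE_PARAMS = [(900, 1), (300, 10), (60, 10000)]


def should_bgsave(save_params, dirty, seconds_since_last_save):
    """Return True if any `save <seconds> <changes>` condition is satisfied."""
    return any(
        dirty >= changes and seconds_since_last_save > seconds
        for seconds, changes in save_params
    )
```

With the default conditions, a single change triggers a save only once more than 900 seconds have passed, while 10,000 changes trigger it after just over a minute.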

2.1.3. Automatic loading at startup

Unlike creating RDB files using SAVE and BGSAVE commands, Redis does not provide commands specifically for loading RDB files, which are automatically loaded when the Redis server starts. Whenever an RDB file is detected in the specified directory on startup, Redis automatically loads the RDB file using the rdbLoad function.

The following is the log printed when the Redis server starts; the second-to-last line is printed after the RDB file has been loaded successfully.

$ redis-server /usr/local/etc/redis.conf
6266:M 15 Sep 2019 08:30:41.832 # Server initialized
6266:M 15 Sep 2019 08:30:41.833 * DB loaded from disk: 0.001 seconds
6266:M 15 Sep 2019 08:30:41.833 * Ready to accept connections


Because AOF files are incremental backups of write commands while RDB files are full data backups, AOF files are usually updated more frequently than RDB files. Therefore, if AOF persistence is enabled, the server preferentially uses the AOF file to restore the database state. Only when AOF persistence is turned off does the server use the RDB file to restore the database state.

2.2. File structure of RDB

RDB files are compressed binaries. Here are some details about the internal construction of RDB files.

2.2.1. Storage path

Both the SAVE and BGSAVE commands back up the data of the current Redis instance (all of its databases) into a single file. The default backup file name is dump.rdb, and it can be changed with the dbfilename xxx.rdb item in the configuration file. You can run the following commands to view the backup directory and the RDB file name:

$ redis-cli -h 127.0.0.1 -p 6379
127.0.0.1:6379> CONFIG GET dir
1) "dir"
2) "/usr/local/var/db/redis"
127.0.0.1:6379> CONFIG GET dbfilename
1) "dbfilename"
2) "dump.rdb"

The path for storing RDB files can be configured before startup or dynamically configured through commands.

  • Configuration file: use dir to specify the directory and dbfilename to specify the file name
  • Dynamic configuration: after Redis has started, the RDB storage path can also be changed at runtime, which is very useful when a disk is damaged or running out of space. Run the following commands:
CONFIG SET dir $newdir
CONFIG SET dbfilename $newFileName

2.2.2. File format

RDB files have a fixed format. They store binary data, which can be roughly divided into the following parts:

  • REDIS: the file header, holding the 5-byte string "REDIS", which identifies the file as an RDB file
  • db_version: a 4-byte string of digits that records the version number of the RDB file
  • aux: metadata about the RDB file, made up of the following 8 auxiliary fields
    • redis-ver: the version number of the Redis instance
    • redis-bits: the architecture of the host running the Redis instance, 64-bit or 32-bit
    • ctime: the Unix timestamp at which the RDB file was created
    • used-mem: the amount of memory in use when the snapshot was taken
    • repl-stream-db: the index of the Redis server's DB
    • repl-id: the replication ID of the primary Redis instance
    • repl-offset: the replication offset of the primary Redis instance
    • aof-preamble: whether the RDB snapshot is placed at the head of an AOF file (i.e. whether mixed persistence is enabled)
  • databases: zero or more databases, together with the key-value pairs of each database
  • EOF: a 1-byte constant that marks the end of the RDB file body
  • check_sum: an 8-byte integer that holds the checksum calculated from all the preceding sections and is used to check the integrity of the RDB file
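
As a small illustration of the first two parts, the following sketch validates the REDIS magic string and extracts db_version from the first 9 bytes of a file. The function name parse_rdb_header is hypothetical, and the sketch does not attempt to parse the aux fields, the databases, or the checksum.

```python
def parse_rdb_header(data: bytes) -> int:
    """Check the 5-byte magic string and return the 4-digit RDB version."""
    if len(data) < 9 or data[:5] != b"REDIS":
        raise ValueError("not an RDB file: missing REDIS magic string")
    version = data[5:9]
    if not version.isdigit():
        raise ValueError("invalid db_version field")
    return int(version.decode("ascii"))
```

For a file that begins with the bytes REDIS0009, the function returns the version number 9.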

1. databases

The databases section of an RDB file contains zero or more databases. Each non-empty database consists of SELECTDB, db_number, and key_value_pairs.

  • SELECTDB: a 1-byte constant that tells the reading program that the next value is a db_number
  • db_number: stores a database number. When the program reads the db_number, the server immediately switches to the corresponding database, as the SELECT command does
  • key_value_pairs: stores all key-value pairs in the database, both with and without an expiration time

2. key_value_pairs

The key_value_pairs part of the RDB file holds one or more key-value pairs. If a key-value pair has an expiration time, the expiration time is stored in front of the pair. A key-value pair with an expiration time consists of the following fields (a pair without one has only TYPE, key, and value):

  • EXPIRETIME_MS: a 1-byte constant that tells the reading program that the next value is an expiration time in milliseconds
  • ms: an 8-byte integer that records the expiration time of the key-value pair as a timestamp in milliseconds
  • TYPE: records the type of the value and is 1 byte long. Each TYPE constant represents an object type or an underlying encoding. When the server reads key-value data from an RDB file, it uses the value of TYPE to decide how to read and interpret the value. TYPE is usually one of the following constants:
    • REDIS_RDB_TYPE_STRING: string
    • REDIS_RDB_TYPE_LIST: list
    • REDIS_RDB_TYPE_SET: set
    • REDIS_RDB_TYPE_ZSET: sorted set
    • REDIS_RDB_TYPE_HASH: hash
    • REDIS_RDB_TYPE_LIST_ZIPLIST: list encoded as a ziplist
    • REDIS_RDB_TYPE_SET_INTSET: set encoded as an intset
    • REDIS_RDB_TYPE_ZSET_ZIPLIST: sorted set encoded as a ziplist
    • REDIS_RDB_TYPE_HASH_ZIPLIST: hash encoded as a ziplist
  • key: a string object, encoded in the same way as a value of type REDIS_RDB_TYPE_STRING
  • value: depending on TYPE, a string, list, set, sorted set, or hash object

To see the internal structure of the RDB file, insert three key/value pairs into the Redis server:

127.0.0.1:6379> SADD fruits "apple" "banana" "orange"
(integer) 3
127.0.0.1:6379> LPUSH numbers 128 256 512
(integer) 3
127.0.0.1:6379> SET msg "hello"
OK

To force the data in the Redis process to be persisted to the dump.rdb file, run the SAVE command:

127.0.0.1:6379> SAVE
OK

Use the Linux od command to print the bytes of the binary dump.rdb file as characters; the output roughly matches the storage format described above:

$ od -c dump.rdb
0000000 R E D I S 0 0 0 9 372 \t r e d i s
0000020 - v e r 005 5 . 0 . 5 372 \n r e d i
0000040 s - b i t s 300 @ 372 005 c t i m e 200
0000060 200 200 231 ] 372 \b u s e d - m e m 302 200
0000100 \v 020 \0 372 \f a o f - p r e a m b l
0000120 e 300 \0 376 \0 373 003 \0 \0 003 m s g 005 h e
0000140 l l o 016 \a n u m b e r s 001 027 027 \0
0000160 \0 \0 022 \0 \0 \0 003 \0 \0 300 \0 002 004 300 \0 001
0000200 004 300 200 \0 377 002 006 f r u i t s 003 006 o
0000220 r a n g e 005 a p p l e 006 b a n a
0000240 n a 377 214 ک ** 3 366 < r X
0000253
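
The bytes `\0 003 m s g 005 h e l l o` in the dump can be read as one key-value pair. The sketch below decodes exactly that shape, assuming that a TYPE byte of 0 denotes a plain string and that a length below 64 is stored in a single byte whose top two bits are 00. The function names are hypothetical, and other length encodings and value types are deliberately not handled.

```python
def read_short_string(buf: bytes, pos: int):
    """Read a length-prefixed string whose length fits in 6 bits."""
    length = buf[pos]
    if length >> 6 != 0:
        raise ValueError("only the 6-bit length encoding is handled here")
    start = pos + 1
    end = start + length
    if end > len(buf):
        raise ValueError("truncated string")
    return buf[start:end], end


def read_string_pair(buf: bytes, pos: int = 0):
    """Decode TYPE, key and value of a plain string pair; return (key, value, next_pos)."""
    value_type = buf[pos]
    if value_type != 0:
        raise ValueError("this sketch only understands the string type (0)")
    key, pos = read_short_string(buf, pos + 1)
    value, pos = read_short_string(buf, pos)
    return key, value, pos
```

Applied to the 11 bytes `\x00\x03msg\x05hello`, read_string_pair yields the key msg and the value hello, matching the SET command issued earlier.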

2.3. Common RDB configuration items

Here are the common configuration items (and default values) associated with RDB files in the redis.conf file:

  • save m n: the condition that automatically triggers BGSAVE. Without any save m n item, automatic RDB persistence is turned off, although BGSAVE can still be triggered in other ways.
  • stop-writes-on-bgsave-error yes: whether Redis stops executing write commands when a BGSAVE error occurs. If set to yes, disk faults are noticed promptly, which avoids massive data loss. If set to no, Redis ignores BGSAVE errors and continues to execute write commands. Consider setting this option to no when the system running the Redis server (especially its disks) is already monitored.
  • rdbcompression yes: whether to compress RDB files.
  • rdbchecksum yes: whether to checksum RDB files. It takes effect both when files are written and when they are read. Turning the checksum off improves performance by roughly 10% when writing files and when loading them at startup, but data corruption can then go undetected.
  • dbfilename dump.rdb: the name of the RDB file.
  • dir ./: the directory where RDB files and AOF files are stored.

3. AOF persistence

RDB persistence periodically writes the full data set in memory to a file. In addition, Redis provides persistence based on AOF (Append Only File). AOF records every write command executed by the Redis server in a log file; when the server restarts, the commands in the AOF file are executed again to recover the data.

The main purpose of AOF is to make data persistence closer to real time, and it has become the mainstream way of persisting Redis data.

3.1. Creation and loading of AOF

By default, AOF is turned off and Redis persists data only through RDB. To enable AOF, change the appendonly configuration item to yes in the redis.conf file. Once AOF persistence is enabled, RDB-based snapshot persistence takes a lower priority. Modify redis.conf as follows:

# Whether to enable AOF. The default is no
appendonly yes
# AOF file name
appendfilename appendonly.aof
# Directory where RDB and AOF files are stored
dir /usr/local/var/db/redis/

3.1.1. Creation of AOF

After the Redis server process restarts, an appendonly.aof file is generated in the dir directory. The file is empty because the server has not yet executed any write commands. Run the following commands to write some test data:

127.0.0.1:6379> SADD fruits "apple" "banana" "orange"
(integer) 3
127.0.0.1:6379> LPUSH numbers 128 256 512
(integer) 3
127.0.0.1:6379> SET msg "hello"
OK

The AOF file is plain text, and the write commands above are written sequentially to appendonly.aof (the output below omits the '\r\n' line breaks):

/usr/local/var/db/redis$ cat appendonly.aof
*2 $6 SELECT $1 0
*5 $4 SADD $6 fruits $5 apple $6 banana $6 orange
*5 $5 LPUSH $7 numbers $3 128 $3 256 $3 512
*3 $3 SET $3 msg $5 hello

RDB persistence saves the key-value data, such as apple, banana and orange, in a binary RDB file, while AOF saves the commands executed by the Redis server, such as SADD, LPUSH and SET, in a text AOF file.

3.1.2. Loading of AOF

Restart the Redis server process again and observe the startup log. Redis will load data from the AOF file:

52580:M 15 Sep 2019 16:09:47.015 # Server initialized
52580:M 15 Sep 2019 16:09:47.015 * DB loaded from append only file: 0.001 seconds
52580:M 15 Sep 2019 16:09:47.015 * Ready to accept connections


Read the restored key/value data from the AOF file:

127.0.0.1:6379> SMEMBERS fruits
1) "apple"
2) "orange"
3) "banana"
127.0.0.1:6379> LRANGE numbers 0 -1
1) "512"
2) "256"
3) "128"
127.0.0.1:6379> GET msg
"hello"
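
Loading an AOF file amounts to parsing the recorded commands and executing them again. The following sketch replays a byte string in the request protocol format into plain Python containers. The function names are hypothetical, only SELECT, SET, SADD and LPUSH are understood, and the sketch is not meant to reflect how the server implements these commands.

```python
def parse_commands(data: bytes):
    """Yield every command in the byte string as a list of bytes arguments."""
    pos = 0
    while pos < len(data):
        line_end = data.index(b"\r\n", pos)
        if data[pos:pos + 1] != b"*":
            raise ValueError("expected '*' at the start of a command")
        argc = int(data[pos + 1:line_end])
        pos = line_end + 2
        args = []
        for _ in range(argc):
            line_end = data.index(b"\r\n", pos)
            if data[pos:pos + 1] != b"$":
                raise ValueError("expected '$' before an argument")
            size = int(data[pos + 1:line_end])
            start = line_end + 2
            args.append(data[start:start + size])
            pos = start + size + 2  # skip the argument and its trailing CRLF
        yield args


def replay(data: bytes):
    """Rebuild {db_number: {key: value}} by executing the commands in order."""
    dbs = {}
    current = 0
    for args in parse_commands(data):
        name = args[0].upper()
        if name == b"SELECT":
            current = int(args[1])
            continue
        db = dbs.setdefault(current, {})
        if name == b"SET":
            db[args[1]] = args[2]
        elif name == b"SADD":
            db.setdefault(args[1], set()).update(args[2:])
        elif name == b"LPUSH":
            items = db.setdefault(args[1], [])
            for item in args[2:]:
                items.insert(0, item)  # each element is pushed onto the head
        else:
            raise ValueError("command not supported by this sketch")
    return dbs
```

Replaying LPUSH numbers 128 256 512 leaves the list in the order 512, 256, 128, which is the order returned by LRANGE above.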

3.2. AOF implementation process

AOF does not need to set any trigger conditions. All write commands to the Redis server are automatically recorded in the AOF file. The following describes the execution process of AOF persistence.

The process of writing AOF files can be divided into the following three steps:

  1. Command append: Appends write commands executed by Redis to the AOF buffer aof_buf
  2. File write and file sync: the contents of aof_buf are written and synchronized to the AOF file on disk according to the configured policy
  3. File rewrite: the AOF file is rewritten periodically to compact the write commands.

3.2.1. Command append

Redis uses a single thread to process client commands. If every write command were written directly to the file, disk I/O would become the performance bottleneck of Redis, so write commands are first appended to the aof_buf buffer instead.

Commands are appended in the Redis command request protocol format, a plain-text format that is compatible, readable, and easy to process, and that needs no additional conversion. All write commands in the AOF file are sent by clients, except for the SELECT command used to specify the database (for example, SELECT 0 selects database 0), which is added by Redis itself.
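
The request protocol format is simple enough to produce by hand. The following sketch encodes a command as an argument count followed by length-prefixed arguments; the function name encode_command is hypothetical.

```python
def encode_command(*args) -> bytes:
    """Encode a command as '*<argc>' followed by '$<len>' + argument pairs."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        raw = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(raw), raw))
    return b"".join(parts)
```

For instance, encode_command("SET", "msg", "hello") produces the same `*3 $3 SET $3 msg $5 hello` sequence seen in appendonly.aof, with a '\r\n' after every element.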

3.2.2. File write and file synchronization

Redis provides several file synchronization policies for the AOF buffer. These policies involve the write() and fsync() functions of the operating system, as described below:

1. write()

To improve file writing efficiency, when a user calls the write function to write data to a file, the operating system writes the data to a memory buffer first. When the buffer is filled up or the specified time limit expires, the data in the buffer is actually written to the disk.

2. fsync()

While this buffering by write() improves efficiency at the operating-system level, it also puts data safety at risk: if the machine goes down, data still in the buffer is lost. The system therefore provides synchronization functions such as fsync(), which force the operating system to write the buffered data to disk immediately to ensure that the data is persisted.

Redis provides the appendfsync configuration item to control the file synchronization policy of the AOF buffer. appendfsync can be set to one of the following three policies:

  • appendfsync always: synchronize after every command

After a command is written to the aof_buf buffer, Redis immediately calls the system fsync function to synchronize it to the AOF file. The thread returns only after the fsync operation completes, so it is blocked for the whole process. Because every write command has to be synchronized to the AOF file, disk I/O becomes the performance bottleneck: Redis can sustain only a few hundred write TPS, which seriously degrades its performance.

  • appendfsync no: no explicit synchronization

After a command is written to the aof_buf buffer, Redis calls the system write operation but does not fsync the AOF file. Synchronization is left to the operating system, usually with a period of 30 seconds. In this case the synchronization time is not controllable and a large amount of data can accumulate in the buffer, so data safety cannot be guaranteed.

  • appendfsync everysec: synchronize every second

After a command is written to the aof_buf buffer, Redis calls the system write operation, and the thread returns as soon as the write completes. The fsync operation is called once per second by a dedicated thread. everysec is a compromise between the other two policies that balances performance and data safety, and is therefore the default and recommended configuration of Redis.

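| File synchronization policy | write() blocking | fsync() blocking | Data lost on a crash                                                    |
|-----------------------------|------------------|------------------|-------------------------------------------------------------------------|
| always                      | Blocks           | Blocks           | At most the data of one command                                         |
| no                          | Blocks           | Does not block   | Data written since the operating system last synchronized the AOF file |
| everysec                    | Blocks           | Does not block   | Generally no more than 1 second of data                                 |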
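
The difference between the three policies comes down to when fsync is called after write. The sketch below models this with a small writer class. The class AofWriter, its methods, and the fsync_count counter are hypothetical, and for simplicity the everysec fsync runs inline in flush() rather than on a separate thread.

```python
import os
import tempfile
import time


class AofWriter:
    """Buffer commands in aof_buf, then write() and fsync() them per policy."""

    def __init__(self, path, appendfsync="everysec"):
        if appendfsync not in ("always", "everysec", "no"):
            raise ValueError("unknown appendfsync policy")
        self.policy = appendfsync
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.aof_buf = bytearray()
        self.last_fsync = time.monotonic()
        self.fsync_count = 0  # bookkeeping for the illustration only

    def append(self, command: bytes):
        self.aof_buf += command  # step 1: command append

    def flush(self, now=None):
        """Step 2: hand the buffer to the OS, then fsync if the policy says so."""
        now = time.monotonic() if now is None else now
        wrote = len(self.aof_buf) > 0
        if wrote:
            pending = memoryview(bytes(self.aof_buf))
            while len(pending) > 0:
                written = os.write(self.fd, pending)
                pending = pending[written:]
            self.aof_buf.clear()
        if self.policy == "always":
            due = wrote
        elif self.policy == "everysec":
            due = now - self.last_fsync >= 1.0
        else:  # "no": leave synchronization to the operating system
            due = False
        if due:
            os.fsync(self.fd)
            self.last_fsync = now
            self.fsync_count += 1

    def close(self):
        os.close(self.fd)


def demo(policy):
    """Write three commands at simulated times and report (content, fsyncs)."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "appendonly.aof")
        writer = AofWriter(path, policy)
        writer.last_fsync = 0.0
        for now, command in [(0.2, b"a"), (0.4, b"b"), (1.2, b"c")]:
            writer.append(command)
            writer.flush(now=now)
        writer.close()
        with open(path, "rb") as f:
            return f.read(), writer.fsync_count
```

All three policies write the same bytes to the file; they differ only in how many fsync calls are made and therefore in how much data may still sit in the operating system's buffer when a crash occurs.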

3.2.3. File rewrite

As commands are continuously written into AOF, files become larger, occupying larger space and taking longer time to recover data. To solve this problem, Redis introduced a rewrite mechanism to merge write commands in AOF files, further reducing file size.

AOF file rewriting converts the data in the Redis process into write commands, writes them to a new AOF file, and then replaces the old AOF file with the new one, without performing any read or write operations on the old AOF file.

1. Trigger mechanism

The AOF rewrite process provides both manual and automatic triggering mechanisms:

  • Manual trigger: call the BGREWRITEAOF command directly. Its execution is similar to BGSAVE: a child process is forked to do the work, and the main process blocks only during the fork
  • Automatic trigger: the trigger time is determined by the auto-aof-rewrite-min-size and auto-aof-rewrite-percentage configuration items together with the aof_current_size and aof_base_size status values
    • auto-aof-rewrite-min-size: the minimum file size for an AOF rewrite to be triggered. The default value is 64MB
    • auto-aof-rewrite-percentage: the growth of the current AOF file size (aof_current_size) relative to the size after the last rewrite (aof_base_size), as a percentage, at which an AOF rewrite is triggered
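
The automatic trigger can be written as a single predicate over the two configuration items and the two status values. This is a sketch under the assumption that both conditions must hold at once; the function name should_rewrite is hypothetical, and the defaults mirror the 64MB and 100 values listed later in this article.

```python
MB = 1024 * 1024


def should_rewrite(aof_current_size, aof_base_size,
                   min_size=64 * MB, percentage=100):
    """Return True if the AOF file is large enough and has grown enough."""
    if percentage <= 0 or aof_current_size <= min_size:
        return False
    base = aof_base_size if aof_base_size > 0 else 1  # avoid division by zero
    growth = aof_current_size * 100 // base - 100     # growth in percent
    return growth >= percentage
```

With the defaults, a file that was 64MB after the last rewrite triggers a new rewrite once it reaches 128MB, that is, once it has doubled in size.
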

2. Rewrite process

The following uses a manually triggered AOF rewrite as an example. When the BGREWRITEAOF command is executed, the AOF file is rewritten as follows:

  1. The client sends an AOF rewrite request to the main Redis process with the BGREWRITEAOF command
  2. The main Redis process forks only when no BGSAVE or BGREWRITEAOF child process is running, and blocks during the fork. If a BGREWRITEAOF child process already exists, the command returns directly; if a BGSAVE child process exists, the rewrite waits until BGSAVE completes and then forks
  3. After the fork, the main process continues to process other commands and appends each new write command to both the aof_buf and aof_rewrite_buf buffers
    • Until the old file is replaced, the main process keeps appending write commands to the aof_buf buffer and synchronizing them to the old AOF file according to the appendfsync policy. This avoids data loss if the AOF rewrite fails and keeps the original AOF file correct
    • Because fork uses copy-on-write, the child process only sees the memory data as it was at fork time. The main process therefore also appends new commands to the aof_rewrite_buf buffer so that data written during the rewrite is not lost
  4. The child process reads the data snapshot of the Redis process, generates write commands according to the command merge rules, and writes them in batches to the new AOF file
  5. After the child process finishes writing the new AOF file, it signals the main process, which updates its statistics; these can be viewed with INFO persistence
  6. After receiving the child's signal, the main process appends the write commands in the aof_rewrite_buf buffer to the new AOF file
  7. The main process replaces the old AOF file with the new AOF file, and the AOF rewrite process is complete
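
The role of the aof_rewrite_buf buffer in the steps above can be shown with a single-process simulation. Everything here is hypothetical scaffolding: the class AofRewriteSketch keeps the AOF file as a Python list of command tuples, a dictionary copy stands in for the forked child's view of memory, and only SET is supported.

```python
class AofRewriteSketch:
    """Single-process model of an AOF rewrite with a rewrite buffer."""

    def __init__(self):
        self.db = {}
        self.aof_file = []           # commands in the current AOF file
        self.aof_rewrite_buf = None  # None means no rewrite is running

    def execute_set(self, key, value):
        self.db[key] = value
        command = ("SET", key, value)
        self.aof_file.append(command)  # the old file is still maintained
        if self.aof_rewrite_buf is not None:
            self.aof_rewrite_buf.append(command)  # also remembered for the new file

    def start_rewrite(self):
        if self.aof_rewrite_buf is not None:
            raise RuntimeError("a rewrite is already in progress")
        self.aof_rewrite_buf = []
        return dict(self.db)  # stands in for the child's snapshot at fork time

    def finish_rewrite(self, snapshot):
        # Output of the "child": one command per key in the snapshot.
        new_file = [("SET", key, value) for key, value in snapshot.items()]
        new_file.extend(self.aof_rewrite_buf)  # step 6: append buffered commands
        self.aof_file = new_file               # step 7: replace the old file
        self.aof_rewrite_buf = None
```

A write that arrives between start_rewrite and finish_rewrite ends up in the new file through aof_rewrite_buf, even though the snapshot was taken before it happened.
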

3. Compression mechanism

File rewriting can reduce the size of AOF files for several reasons:

  • Expired data is no longer written to the AOF file
  • Commands that no longer have any effect are not written to the AOF file, for example values that were later overwritten (set mykey v1, set mykey v2) or key-value pairs that were later deleted (sadd myset v1, del myset)
  • Multiple commands can be combined into a single command. For example, sadd myset v1, sadd myset v2, and sadd myset v3 can be combined into sadd myset v1 v2 v3. However, to prevent client buffer overflow caused by an oversized command, list, set, hash, and zset keys are not necessarily written with a single command; instead, the elements are split across multiple commands, with the maximum number of elements per command given by a constant defined in Redis
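
These rules follow naturally from generating commands out of the current data instead of copying the old file. The sketch below does this for strings, sets and lists. The function names are hypothetical, the items_per_cmd default of 64 is an assumed value for the per-command limit, RPUSH is used here to rebuild lists, and the sketch does not emit any command to restore the TTL of keys that have not yet expired.

```python
def rewrite_key(key, value, items_per_cmd=64):
    """Return the commands needed to recreate one key from its current value."""
    if isinstance(value, str):
        return [("SET", key, value)]
    if isinstance(value, set):
        name, items = "SADD", sorted(value)
    elif isinstance(value, list):
        name, items = "RPUSH", list(value)
    else:
        raise TypeError("value type not supported by this sketch")
    # Split large collections so that no single command grows too big.
    return [
        (name, key, *items[i:i + items_per_cmd])
        for i in range(0, len(items), items_per_cmd)
    ]


def rewrite_db(db, expires, now_ms, items_per_cmd=64):
    """Generate the commands of a rewritten AOF, skipping expired keys."""
    commands = []
    for key, value in db.items():
        if key in expires and expires[key] <= now_ms:
            continue  # expired data is not written to the new file
        commands.extend(rewrite_key(key, value, items_per_cmd))
    return commands
```

However many times myset was modified, the rewritten file contains only the commands needed to rebuild its final members, and keys that were deleted or have expired do not appear at all.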

3.3. Common configuration items of AOF

Here are the common configuration items (and default values) associated with AOF files in the redis.conf file:

  • appendonly no: whether to enable AOF persistence
  • appendfilename "appendonly.aof": the name of the AOF file
  • dir ./: the directory where RDB files and AOF files are stored
  • appendfsync everysec: the fsync persistence policy
  • no-appendfsync-on-rewrite no: whether to disable fsync operations during AOF file rewriting. Enabling this option reduces the load on the CPU and especially the disk during a rewrite, but data written during the rewrite may be lost, so load has to be balanced against safety
  • auto-aof-rewrite-percentage 100: one of the conditions for triggering an AOF file rewrite
  • auto-aof-rewrite-min-size 64mb: one of the conditions for triggering an AOF file rewrite
  • aof-load-truncated yes: whether the Redis server still loads the AOF file at startup if the end of the file is damaged

4. Data recovery mechanism

As mentioned earlier, when AOF persistence is enabled, the Redis server preferentially replays the commands in the AOF file to recover data. The RDB snapshot file is loaded only when AOF is disabled.

  • Redis server startup log when AOF is turned off and RDB persistence is enabled:
6266:M 15 Sep 2019 08:30:41.832 # Server initialized
6266:M 15 Sep 2019 08:30:41.833 * DB loaded from disk: 0.001 seconds
6266:M 15 Sep 2019 08:30:41.833 * Ready to accept connections

  • Redis server startup log when AOF is enabled and the AOF file exists:
9447:M 15 Sep 2019 23:01:46.601 # Server initialized
9447:M 15 Sep 2019 23:01:46.602 * DB loaded from append only file: 0.001 seconds
9447:M 15 Sep 2019 23:01:46.602 * Ready to accept connections

  • Redis server startup log when AOF is enabled but the AOF file does not exist (the RDB file is not loaded even if it exists):
9326:M 15 Sep 2019 22:49:24.203 # Server initialized
9326:M 15 Sep 2019 22:49:24.203 * Ready to accept connections
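
The three startup cases above can be condensed into a small decision function. The function name choose_load_source and its boolean parameters are hypothetical; the logic simply restates the behaviour described in this section.

```python
def choose_load_source(appendonly, aof_exists, rdb_exists):
    """Return 'aof', 'rdb', or None (start with an empty data set)."""
    if appendonly:
        # With AOF enabled, the RDB file is never consulted.
        return "aof" if aof_exists else None
    return "rdb" if rdb_exists else None
```

Note the third case in particular: enabling AOF on a server that so far only has an RDB file, and then restarting it, results in an empty data set according to the behaviour described above.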

5. Comparison between RDB and AOF

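| Persistence mechanism | RDB                 | AOF                         |
|-----------------------|---------------------|-----------------------------|
| Startup priority      | Low                 | High                        |
| Disk file size        | Small               | Large                       |
| Data recovery speed   | Fast                | Slow                        |
| Data safety           | Data is easily lost | Depends on the fsync policy |
| Operation weight      | Heavy               | Light                       |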

5.1. Advantages and disadvantages of RDB

5.1.1. Advantages

  • An RDB file is a very compact, compressed file that holds the data set at a certain point in time, which makes it suitable for backup and disaster recovery
  • RDB maximizes the performance of Redis: to create the RDB file, the server process only needs to fork a child process, and the parent process does no disk I/O itself
  • Large data sets can be restored faster than with AOF persistence

5.1.2. Shortcomings

  • RDB is less safe for data than AOF. Saving the entire data set is a heavyweight operation, so depending on the configuration, snapshots may be taken only every few minutes; if the server goes down, several minutes of data may be lost
  • When the Redis data set is large, the fork and the child process take more CPU and time to complete the snapshot

5.2. Advantages and disadvantages of AOF

5.2.1. Advantages

  • Data is more complete and safer: only seconds of data can be lost (depending on the fsync policy; at most about 1 second with everysec)
  • The AOF file is an append-only command log in which write operations are saved in the Redis protocol format. Its content is readable, which makes it suitable for emergency recovery

5.2.2. Shortcomings

  • For the same data set, AOF files are much larger than RDB files, and data recovery is slow
  • Depending on the fsync policy used, AOF may be slower than RDB. In general, however, performance with fsync every second is still very high

6. RDB-AOF hybrid persistence

When a Redis server restarts, the RDB snapshot file is rarely used on its own to restore the memory state, because a large amount of data may be lost. AOF files are replayed instead, but replaying AOF commands is much slower than loading an RDB file, so a Redis instance with a large data set takes a long time to start.

Because RDB snapshots may lose data and replaying AOF commands is slow, Redis 4.0 introduced a hybrid AOF-RDB persistence mechanism that retains the advantages of both. An AOF file rewritten in this mode consists of two parts: RDB-format data at the head and AOF commands at the tail.

Mixed persistence is turned off by default in Redis 4.0. Enable this feature by configuring aof-use-rdb-preamble to yes:

# Enable AOF-RDB hybrid persistence
aof-use-rdb-preamble yes

Check whether Redis server has mixed persistence enabled:

127.0.0.1:6379> CONFIG GET aof-use-rdb-preamble
1) "aof-use-rdb-preamble"
2) "yes"


In hybrid mode, the contents of the RDB data file are stored together with the incremental AOF commands in one file. The AOF part is no longer the full command history but only the incremental commands executed by the server process between the start and the end of the rewrite, which is usually small.

When the Redis server restarts, it first loads the full RDB data at the head of the AOF file and then replays the incremental AOF commands at its tail, which greatly reduces the time needed to restore data during restart.
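
One way to picture the loading decision is to look at the first bytes of the AOF file. The sketch below assumes that a hybrid file can be recognised by the same 5-byte REDIS header described in section 2.2.2, while a plain AOF file starts with a command in the request protocol format. The function name load_steps and the step descriptions are hypothetical.

```python
RDB_MAGIC = b"REDIS"


def load_steps(aof_head: bytes):
    """Return the loading steps implied by the first bytes of an AOF file."""
    if aof_head[:5] == RDB_MAGIC:
        return ["load RDB preamble", "replay AOF tail"]
    return ["replay AOF commands"]
```

A plain AOF file therefore keeps working unchanged, and only files produced by a rewrite in hybrid mode take the faster two-step path.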

7. Select a persistence policy

7.1. RDB and AOF performance overhead

Before introducing persistence strategies, it is important to understand that enabling persistence, whether RDB or AOF, incurs a performance overhead.

  • RDB persistence:
    • The main Redis server process blocks while forking the child process for BGSAVE
    • The Redis child process writing data to disk also causes I/O pressure
  • AOF persistence:
    • Data is written to disk much more frequently, which increases I/O pressure and can even block AOF appends
    • AOF file rewriting is similar to RDB’s BGSAVE: the parent process blocks during fork and the child process adds I/O pressure

Because AOF writes data to disk more frequently, its impact on the performance of the main Redis server process is greater.
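Both kinds of overhead can be observed in the output of the INFO command: latest_fork_usec reports how long the most recent fork took, and aof_delayed_fsync counts delayed fsync calls. The following is a minimal sketch that parses INFO-style text. The sample text and its numbers are made up for illustration, and parse_info is a hypothetical helper.

```python
def parse_info(text: str) -> dict:
    """Parse the 'key:value' lines of INFO output, skipping '# Section' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


# Sample INFO excerpt; the values are invented for this example
SAMPLE = """\
# Persistence
rdb_last_bgsave_status:ok
aof_enabled:1
aof_delayed_fsync:3

# Stats
latest_fork_usec:1520
"""

if __name__ == "__main__":
    info = parse_info(SAMPLE)
    fork_ms = int(info["latest_fork_usec"]) / 1000
    print(f"last fork took about {fork_ms:.2f} ms")
    print(f"delayed fsync count: {info['aof_delayed_fsync']}")
```

Against a live server, the same text could be obtained by running INFO through redis-cli.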

7.2. Persistence Policy

In actual production environments, there are various persistence strategies based on data volume, data security requirements of applications, and budget constraints.

  1. No persistence at all
  2. Use either RDB or AOF
  3. Enable both RDB and AOF persistence

In a distributed environment, the choice of persistence must be considered together with Redis’ master-slave replication, because both replication and persistence provide data backup, and the master node and slave node can each choose their persistence scheme independently.

The following discussion is for reference only. The actual solution may be more complex and diverse.

7.2.1. Database caching

If it does not matter that the data in Redis is lost completely (for example, when Redis is used solely as a cache for DB-layer data), persistence can be disabled entirely in both standalone and master-slave architectures.

7.2.2. Single-machine environment

In a single-machine environment, if losing ten minutes or more of data is acceptable, RDB is better for Redis performance. If only a few seconds of data loss is acceptable, AOF is more suitable.
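The choice between the cache-only and single-machine cases can be sketched as a small decision helper that returns redis.conf directives. This is only an illustration: the function name, the 600-second threshold, and the exact save rule are assumptions chosen to mirror the ten-minute figure above, not recommendations from Redis.

```python
from typing import List


def persistence_directives(cache_only: bool, max_loss_seconds: int) -> List[str]:
    """Return redis.conf directives for a hypothetical data-loss budget."""
    if cache_only:
        # Pure cache: disable both RDB snapshots and AOF
        return ['save ""', "appendonly no"]
    if max_loss_seconds >= 600:
        # Ten minutes or more of loss is tolerable: RDB snapshots only
        return ["save 600 1", "appendonly no"]
    # Only seconds of loss are tolerable: AOF with fsync once per second
    return ['save ""', "appendonly yes", "appendfsync everysec"]


if __name__ == "__main__":
    for line in persistence_directives(cache_only=False, max_loss_seconds=5):
        print(line)
```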

7.2.3. Master-slave deployment

In most cases, Redis is deployed in master-slave mode. The slave node not only provides hot backup of data, but can also share read requests and take over from the master node when it goes down.

In this case, a possible course of action is as follows:

  • Master: Turn persistence off completely (both RDB and AOF) to maximize performance on the master node
  • Slave: Disable RDB and enable AOF. (If data security requirements are not high, you can instead enable RDB and disable AOF.) Back up the persistent files periodically (for example, copy them to another folder and record the backup time). Then turn off automatic AOF rewriting and add a scheduled task that calls BGREWRITEAOF once a day when the Redis server is idle (such as at midnight).
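The periodic backup step on the slave can be sketched as follows. This is a minimal example, assuming the persistent file is simply copied into a backup folder under a name that records the backup time. The function name, folder layout, and timestamp format are hypothetical. Automatic rewriting itself can be turned off by setting auto-aof-rewrite-percentage to 0 in redis.conf.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_file(src: Path, backup_dir: Path, now: datetime) -> Path:
    """Copy src into backup_dir under a name that records the backup time."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = now.strftime("%Y%m%d-%H%M%S")
    dest = backup_dir / f"{src.name}.{stamp}"
    shutil.copy2(src, dest)  # copy2 also preserves the file's modification time
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        aof = Path(tmp) / "appendonly.aof"  # stand-in for the real AOF file
        aof.write_bytes(b"*1\r\n$4\r\nPING\r\n")
        copy = backup_file(aof, Path(tmp) / "backups", datetime.now())
        print(copy.name)
```

A scheduled task (cron, for example) could run such a script shortly after the daily BGREWRITEAOF finishes, so that the copied file is the compact, rewritten one.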

Why is persistence still necessary when master-slave replication already provides hot backup of data? In some special cases, master-slave replication alone is not enough to keep data safe, for example:

  • Master and slave stop at the same time: If the master and slave nodes are located in the same machine room, a power failure may shut down both machines at once and stop both Redis server processes. Without persistence, all data is lost.
  • Master restart: If the master node goes down due to a fault and the system has an automatic restart mechanism (that is, it restarts the service after detecting that it has stopped), the master node restarts automatically.
    • Since there is no persistent file, the master’s data set is empty after the restart, and the slave’s data becomes empty as well once it synchronizes with the master
    • If persistence is enabled on neither the master nor the slave node, all data is lost in this case as well
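The master-restart scenario can be illustrated with a toy model. This is only a sketch under simplifying assumptions (a node is a dictionary in memory plus an optional dictionary standing in for its persistent file, and synchronization copies the master's whole data set); it is not how Redis implements replication.

```python
class Node:
    """Toy model of a Redis node; not how Redis is implemented."""

    def __init__(self, persistent: bool):
        self.persistent = persistent
        self.memory = {}
        self.disk = {}

    def set(self, key, value):
        self.memory[key] = value
        if self.persistent:
            self.disk[key] = value  # stands in for an RDB/AOF write

    def restart(self):
        # After a restart only persisted data can be reloaded
        self.memory = dict(self.disk)

    def full_sync_from(self, master: "Node"):
        # A full resynchronization replaces the slave's data set with the master's
        self.memory = dict(master.memory)


def simulate(master_persistent: bool) -> dict:
    """Return the slave's data after the master crashes and is restarted."""
    master, slave = Node(master_persistent), Node(False)
    master.set("k", "v")
    slave.full_sync_from(master)
    master.restart()
    slave.full_sync_from(master)
    return slave.memory


if __name__ == "__main__":
    print(simulate(master_persistent=False))
    print(simulate(master_persistent=True))
```

In this model the slave ends up empty when the master has no persistent file, and keeps the key when the master reloads its data on restart.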

7.2.4. Remote disaster recovery

The preceding persistence policies address common system faults, such as process exit, outage, and power failure. These faults do not damage disks. However, for disasters that may damage hard disks, such as fires and earthquakes, remote disaster recovery is required.

  • Single-machine environment: Regularly copy RDB files or rewritten AOF files to a remote machine, such as one on Alibaba Cloud or AWS, using the scp command
  • Master-slave deployment: Periodically run BGSAVE on the master node and copy the RDB file to the remote machine, or run BGREWRITEAOF on the slave node to rewrite the AOF file and copy the AOF file to the remote machine

Because RDB files are small and fast to restore, RDB is generally used for disaster recovery. The frequency of remote backup depends on data security requirements and other conditions, but should be at least once a day.
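The remote copy step can be sketched as a helper that builds the scp argument list for a dated upload. The host, paths, and naming scheme below are placeholders invented for illustration, and scp_command is a hypothetical helper; the sketch only constructs the command and does not run it.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import List


def scp_command(local_rdb: str, remote: str, remote_dir: str, day: date) -> List[str]:
    """Build an scp argument list that uploads the RDB file under a dated name."""
    name = f"{PurePosixPath(local_rdb).name}.{day.isoformat()}"
    return ["scp", local_rdb, f"{remote}:{remote_dir}/{name}"]


if __name__ == "__main__":
    cmd = scp_command(
        "/var/lib/redis/dump.rdb",   # placeholder local path
        "backup@dr.example.com",     # placeholder remote user and host
        "/backups/redis",            # placeholder remote directory
        date.today(),
    )
    print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run(cmd, check=True) from a daily scheduled task, after BGSAVE has completed.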

summary

This article introduces the database structure of the Redis server and then the persistence mechanisms provided by Redis: RDB full persistence based on data snapshots, AOF incremental persistence based on command appending, and the hybrid persistence supported since Redis 4.0. For RDB persistence, it covers how RDB snapshots are created and restored, the RDB file structure, and the related configuration items. For AOF persistence, it covers how AOF logs are created and restored, the AOF execution process, the internal format of AOF files, and the relevant configuration items. Finally, it analyzes the advantages, disadvantages, and performance cost of RDB and AOF, and discusses persistence strategies for single-machine environments, master-slave deployment, and remote disaster recovery.
