If you work in web development, you have probably run into this question in an interview: how does Redis do persistence?

Before you answer, pause for a moment and think about it, then read on. Hopefully, by the end of this article you will be able to answer it.

Why persistence?

Redis is an in-memory database: while the server is running, the system allocates memory to hold its data. If the server process hangs or the machine goes down unexpectedly, everything in that memory is lost. To keep the data across a sudden shutdown, it has to be persisted, that is, saved from memory to disk.

For a program that persists data, a write travels from the program to the computer's disk as follows:

1. The client sends a write instruction to the database (the data is in the client’s memory).

2. The database receives the write instruction and the data (the data is now in server memory).

3. The database makes a system call to write the data to disk (the data is now in the kernel buffer).

4. The operating system transfers the data to the disk controller (the data is now in the disk cache).

5. The disk controller actually writes the data to the physical medium (such as a disk).

If you only consider failures at the database level, the data is safe after step 3: the system call has already been issued, so even if the database process crashes, the kernel still holds the data and will write it to disk. In step 4 the kernel moves the data from its buffer to the disk cache, but for the sake of efficiency it does not do this very often; by default the delay can be around 30 seconds. This means that if the machine loses power or the server shuts down suddenly before this step has run, up to about 30 seconds of data may be lost. This is the more common kind of catastrophic failure and has to be taken into account.

The POSIX API also provides system calls that force the kernel to write its cached data to disk, the most common being fsync.

int fsync(int fd);

The fsync function works on a single file, identified by the file descriptor fd, and does not return until the write operation is complete. Each call to fsync initiates a write that flushes the buffered data for that file to disk. fsync() blocks the calling process until the write completes, and on some systems other threads that are writing to the same file are blocked as well until it finishes.
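To make the write path concrete, here is a minimal C sketch of steps 3 to 5 from the program's point of view. The function name durable_append and its return convention are made up for this illustration and assume a POSIX system; this is not how Redis itself is written.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append the string `line` to the file at `path`
 * and force it to disk. Returns 0 on success and -1 on any error. */
int durable_append(const char *path, const char *line) {
    assert(path != NULL && line != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1) return -1;

    /* Step 3: write() copies the data into the kernel buffer. It may
     * accept fewer bytes than requested, so loop until all are written.
     * For brevity, every error (including EINTR) is treated as fatal. */
    size_t len = strlen(line), done = 0;
    while (done < len) {
        ssize_t n = write(fd, line + done, len - done);
        if (n == -1) { close(fd); return -1; }
        done += (size_t)n;
    }

    /* Steps 4-5: fsync() blocks until the kernel has handed the file's
     * data to the storage device. */
    if (fsync(fd) == -1) { close(fd); return -1; }

    return close(fd);
}
```

Calling durable_append("app.log", "hello\n") returns only after fsync has completed, which is exactly the cost the paragraph above describes: the caller is blocked for the duration of the flush.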

Persistence

Persistence is a mechanism for converting program data between a transient state and a persistent state. While a program runs, its data lives in memory. If that data is not synchronized to disk in time, a power failure or a sudden crash wipes it out. Only data that has been synchronized to disk is saved permanently and stays valid across downtime. In short, persistence is the act of synchronizing a program's data to disk.

Persistence of Redis

Redis offers two kinds of persistence, RDB and AOF. RDB is a snapshot: Redis backs up its data when the SAVE or BGSAVE command runs, writing the entire current data set to a *.rdb file. AOF is incremental: following its configuration, the server appends the write commands it executes to a *.aof file and synchronizes that file at the configured interval.

RDB

An RDB file is created with the SAVE or BGSAVE command. SAVE blocks the Redis server process until the RDB file has been written. BGSAVE forks a child process to create the RDB file: the parent and child share the same memory pages, the parent keeps serving reads and writes, and the child performs the backup. A shared page is copied only when it needs to be modified, that is, Copy On Write (COW). You can also set multiple save conditions in the configuration; as soon as any one of them is met, the server performs the save in the background.
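The effect of fork plus Copy On Write can be seen in a small experiment. The following C sketch is only an illustration, assuming a POSIX system: the dataset variable and the snapshot_via_fork function are invented names, and a real snapshot would of course write a file instead of sending one string through a pipe. The parent changes its data right after the fork, and the child still reports the value from the moment of the fork.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the in-memory data set (hypothetical, a single value). */
char dataset[32] = "v1";

static void close_pair(int fds[2]) { close(fds[0]); close(fds[1]); }

/* Fork a child, modify the data in the parent, then ask the child what
 * it sees. The child's view is copied into child_view.
 * Returns 0 on success and -1 on error. */
int snapshot_via_fork(char *child_view, size_t cap) {
    assert(child_view != NULL && cap > 1);

    int to_child[2], to_parent[2];
    if (pipe(to_child) == -1) return -1;
    if (pipe(to_parent) == -1) { close_pair(to_child); return -1; }

    pid_t pid = fork();
    if (pid == -1) { close_pair(to_child); close_pair(to_parent); return -1; }

    if (pid == 0) {
        /* Child: wait for the parent's "go" byte, then report the value
         * in its own view of memory. */
        char go;
        size_t len = strlen(dataset) + 1;
        close(to_child[1]);
        close(to_parent[0]);
        if (read(to_child[0], &go, 1) != 1) _exit(1);
        if (write(to_parent[1], dataset, len) != (ssize_t)len) _exit(1);
        _exit(0);
    }

    /* Parent: this write touches a shared page, so the kernel copies the
     * page for the parent and the child keeps the old contents. */
    close(to_child[0]);
    close(to_parent[1]);
    strcpy(dataset, "v2");

    ssize_t n = -1;
    if (write(to_child[1], "x", 1) == 1)
        n = read(to_parent[0], child_view, cap - 1);
    close(to_child[1]);
    close(to_parent[0]);

    int status;
    if (waitpid(pid, &status, 0) == -1 || n <= 0) return -1;
    child_view[n] = '\0';
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

After a successful call, child_view holds "v1" while the parent's dataset holds "v2". This is why the child can write a consistent snapshot while the parent keeps accepting writes.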

The code for the SAVE and BGSAVE commands is as follows:

void saveCommand(client *c) {
    /* SAVE cannot run while a BGSAVE is already in progress */
    if (server.rdb_child_pid != -1) {
        addReplyError(c,"Background save already in progress");
        return;
    }
    rdbSaveInfo rsi, *rsiptr;
    rsiptr = rdbPopulateSaveInfo(&rsi);
    if (rdbSave(server.rdb_filename,rsiptr) == C_OK) {
        addReply(c,shared.ok);
    } else {
        addReply(c,shared.err);
    }
}

/*
 * BGSAVE command implementation [optional parameter "schedule"]
 */
void bgsaveCommand(client *c) {
    int schedule = 0;

    /* When an AOF rewrite is being executed, the SCHEDULE parameter changes
     * the effect of BGSAVE: the BGSAVE is executed later instead of
     * reporting an error. */
    if (c->argc > 1) {
        /* The only accepted argument is "schedule" */
        if (c->argc == 2 && !strcasecmp(c->argv[1]->ptr,"schedule")) {
            schedule = 1;
        } else {
            addReply(c,shared.syntaxerr);
            return;
        }
    }

    if (server.rdb_child_pid != -1) {
        addReplyError(c,"Background save already in progress");
    } else if (server.aof_child_pid != -1) {
        if (schedule) {
            server.rdb_bgsave_scheduled = 1;
            addReplyStatus(c,"Background saving scheduled");
        } else {
            addReplyError(c,
                "An AOF log rewriting in progress: can't BGSAVE right now. "
                "Use BGSAVE SCHEDULE in order to schedule a BGSAVE whenever "
                "possible.");
        }
    } else if (rdbSaveBackground(server.rdb_filename,NULL) == C_OK) {
        /* Otherwise call rdbSaveBackground to perform the backup */
        addReplyStatus(c,"Background saving started");
    } else {
        addReply(c,shared.err);
    }
}

Once you have an RDB file, if the server is shut down or a new server needs to be added, you can restore the backed-up data by loading the RDB file when the database server starts. However, BGSAVE takes a long time and is not real-time, so an outage can lose a large amount of data written since the last snapshot.

AOF (Append Only File)

The RDB file holds the database’s key-value pair data, and the AOF file holds the write commands executed by the database.

The AOF implementation process has three steps:

append->write->fsync

Append adds the command to the AOF buffer, write writes the contents of that buffer to the AOF file (at which point the data sits in the operating system's buffer), and fsync flushes the operating system's buffer to the file on disk. When AOF persistence is enabled, every write command the server executes is appended to the aof_buf buffer of the redisServer structure in the Redis protocol format, which is not explained in detail here.
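For readers who want a feel for the append step, here is a rough C sketch of encoding one command in the Redis protocol format (an array of bulk strings) before it would be appended to a buffer. The function resp_encode is a name invented for this example, and unlike the real server it is not binary-safe, because it treats every argument as a NUL-terminated string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: encode argc arguments as a protocol array of bulk
 * strings into out. Returns the number of bytes written (excluding the
 * terminating NUL), or -1 if the buffer of size cap is too small. */
int resp_encode(int argc, const char **argv, char *out, size_t cap) {
    assert(argc > 0 && argv != NULL && out != NULL);

    /* Array header: "*<number of arguments>\r\n" */
    int n = snprintf(out, cap, "*%d\r\n", argc);
    if (n < 0 || (size_t)n >= cap) return -1;
    size_t used = (size_t)n;

    /* Each argument: "$<length>\r\n<bytes>\r\n" */
    for (int i = 0; i < argc; i++) {
        n = snprintf(out + used, cap - used, "$%zu\r\n%s\r\n",
                     strlen(argv[i]), argv[i]);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

With the arguments SET, k and v, the buffer ends up holding *3\r\n$3\r\nSET\r\n$1\r\nk\r\n$1\r\nv\r\n, which is the kind of text you see when you open an AOF file in an editor.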

AOF persistence has a configuration option, appendfsync, which controls when the AOF file is synchronized to disk. It takes one of three values:

always: writes all the contents of the aof_buf buffer to the AOF file and synchronizes the AOF file on every write.

everysec: writes all the contents of the aof_buf buffer to the AOF file and, if more than one second has passed since the AOF file was last synchronized, synchronizes it again.

no: writes all the contents of the aof_buf buffer to the AOF file but does not synchronize it. It is up to the operating system to decide when to synchronize the AOF file, which is generally about every 30 seconds by default.

In AOF persistence mode, each write command is appended to the AOF file, so the file grows larger and larger as the server keeps running. To keep the file from growing without bound, the server rewrites the AOF file, merging the multiple commands that operate on the same key to reduce the file size.
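The three appendfsync values (always, everysec, no) boil down to one decision after each write to the AOF file: synchronize now or not. A simplified C sketch of that decision follows. The enum and function names are invented for this example, and the real server does more work around it, so read it as a model of the rule rather than the actual code.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical names for the three appendfsync values. */
typedef enum { FSYNC_ALWAYS, FSYNC_EVERYSEC, FSYNC_NO } fsync_policy;

/* Return 1 if the AOF file should be synchronized right after the
 * buffer has been written to it, 0 otherwise. Times are in seconds. */
int should_fsync(fsync_policy policy, time_t now, time_t last_fsync) {
    assert(now >= last_fsync);
    switch (policy) {
    case FSYNC_ALWAYS:
        return 1;                       /* synchronize on every write */
    case FSYNC_EVERYSEC:
        return now - last_fsync >= 1;   /* at most once per second */
    case FSYNC_NO:
        return 0;                       /* leave it to the operating system */
    }
    return 0;
}
```

Under everysec, a crash can lose whatever was written since the last synchronization, which is where the figure of roughly one second of lost data comes from.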

For example, to save an employee's name, gender, and other details:

> hset employee_12345 name "hoohack"
> hset employee_12345 good_at "php"
> hset employee_12345 gender "male"

Just to record the state of this one hash key, the AOF file has to hold three commands. Further operations, such as deleting or updating values, add more commands and make the file larger. Rewriting reduces the file size.

AOF rewriting works by traversing the databases in the server, finding all the keys in each database, reading the key and value of each key-value pair, and writing out a command that recreates the pair according to the type of the key. For example, the commands above can be combined into the following single command:

> hset employee_12345 name "hoohack" good_at "php" gender "male"
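The idea that the rewrite reads the current state instead of replaying the old log can be modelled in a few lines of C. This is a toy model with invented names (hash_state, hash_set, hash_rewrite) and fixed-size tables, nothing like the real Redis data structures: commands are applied to an in-memory hash, and one command is then generated from the final state.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 16
#define MAX_LEN    32

/* Toy model of one hash key: a small fixed-size field table. */
typedef struct {
    char field[MAX_FIELDS][MAX_LEN];
    char value[MAX_FIELDS][MAX_LEN];
    int  count;
} hash_state;

/* Apply "hset key field value": overwrite the field if it exists,
 * otherwise add it. Returns 0 on success, -1 if it does not fit. */
int hash_set(hash_state *h, const char *field, const char *value) {
    assert(h != NULL && field != NULL && value != NULL);
    if (strlen(field) >= MAX_LEN || strlen(value) >= MAX_LEN) return -1;
    for (int i = 0; i < h->count; i++) {
        if (strcmp(h->field[i], field) == 0) {
            strcpy(h->value[i], value);
            return 0;
        }
    }
    if (h->count == MAX_FIELDS) return -1;
    strcpy(h->field[h->count], field);
    strcpy(h->value[h->count], value);
    h->count++;
    return 0;
}

/* Emit a single hset command that recreates the current state.
 * Returns the command length, or -1 if out is too small. */
int hash_rewrite(const hash_state *h, const char *key, char *out, size_t cap) {
    assert(h != NULL && key != NULL && out != NULL);
    int n = snprintf(out, cap, "hset %s", key);
    if (n < 0 || (size_t)n >= cap) return -1;
    size_t used = (size_t)n;
    for (int i = 0; i < h->count; i++) {
        n = snprintf(out + used, cap - used, " %s \"%s\"",
                     h->field[i], h->value[i]);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

However many times a field was set or overwritten, hash_rewrite produces exactly one command for the key, because it only looks at what is in memory at the time of the rewrite.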

Redis executes commands on a single thread, so if the server performed the rewrite itself, it could not handle any other commands in the meantime. Therefore, the Redis server starts a child process to perform the AOF rewrite.

How Redis performs the rewrite:

While the child process is performing the AOF rewrite, the server keeps handling client commands. After executing each write command, it appends the command to the AOF buffer and also to the AOF rewrite buffer. When the child process finishes rewriting, it sends a completion signal to the server, which appends the contents of the AOF rewrite buffer to the new AOF file and then atomically replaces the existing AOF file with the new one.
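The last step, appending the rewrite buffer and swapping the files atomically, can be sketched in C with rename(), which POSIX guarantees to replace the target atomically. The function name finish_rewrite and its parameters are invented for this sketch, and it assumes the child has already written its part to the temporary file; the real server handles many more details.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append the buffered commands to the temporary
 * file produced by the rewrite, flush it to disk, then atomically put it
 * in place of the live AOF file. Returns 0 on success, -1 on error. */
int finish_rewrite(const char *tmp_path, const char *aof_path,
                   const char *rewrite_buf) {
    assert(tmp_path != NULL && aof_path != NULL && rewrite_buf != NULL);

    int fd = open(tmp_path, O_WRONLY | O_APPEND);
    if (fd == -1) return -1;

    /* Append the commands that arrived while the child was rewriting. */
    size_t len = strlen(rewrite_buf), done = 0;
    while (done < len) {
        ssize_t n = write(fd, rewrite_buf + done, len - done);
        if (n == -1) { close(fd); return -1; }
        done += (size_t)n;
    }

    /* Make sure the new file is on disk before it becomes the live one. */
    if (fsync(fd) == -1) { close(fd); return -1; }
    if (close(fd) == -1) return -1;

    /* rename() swaps the files in one step: a reader sees either the old
     * AOF or the complete new one, never a half-written file. */
    return rename(tmp_path, aof_path) == 0 ? 0 : -1;
}
```

The order matters in this sketch: if the rename came before the append, commands executed during the rewrite would be missing from the new file for a moment.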

Advantages and disadvantages of RDB and AOF

Loading an RDB file is simple: the server just reads the data from the file into memory. Loading an AOF file requires the server to create a fake client and replay the commands.

RDB files are small and store the data set as of a certain point in time, which makes them suitable for disaster recovery and primary/secondary synchronization.

An RDB backup takes a long time. If a large amount of data is written in the meantime, some of it may be lost in the event of an outage. In addition, an RDB save is only triggered when the configured conditions are met. If the server goes down before that happens, the data written since the last save is lost as well.
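The save conditions take the form "at least N changes within M seconds". A simplified C model of the check is shown below. The type and function names are invented for this sketch, and the three rules in the usage note are the long-standing defaults from redis.conf (save 900 1, save 300 10, save 60 10000), used here only as sample values.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* One save condition: at least `changes` writes and more than `seconds`
 * seconds since the last save (hypothetical type for this sketch). */
typedef struct {
    time_t seconds;
    int    changes;
} save_rule;

/* Return 1 if any rule is met, meaning a background save is due.
 * `dirty` is the number of changes since the last save. */
int snapshot_due(const save_rule *rules, size_t nrules,
                 long dirty, time_t now, time_t last_save) {
    assert(rules != NULL || nrules == 0);
    for (size_t i = 0; i < nrules; i++) {
        if (dirty >= rules[i].changes &&
            now - last_save > rules[i].seconds) {
            return 1;
        }
    }
    return 0;
}
```

With the rules {900, 1}, {300, 10} and {60, 10000}, five changes made 400 seconds after the last save do not trigger anything. If the server dies at that moment, those five changes are gone, which is the data-loss window described above.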

In AOF mode, the file size is larger than that in RDB mode for the same data set.

How much data AOF persistence can lose also depends on the configuration. The default mode is to synchronize every second, the safest mode is to synchronize on every command, and the least safe mode is to wait for the operating system to synchronize the buffer to the disk file, which on most operating systems takes about 30 seconds. Typically, synchronization is configured to run once per second, so at most 1s of data will be lost.

What’s a better way to synchronize?

Combine RDB and AOF. Start a scheduled task that backs up the current state of the server every hour, with the backup file named by date and hour. Start another scheduled task that periodically deletes expired backup files (for example, those older than 48 hours). Configure AOF to synchronize every second. As a result, at most 1s of data can be lost, and if Redis is hit by an avalanche, it can be quickly restored to a state from the previous day without a long interruption of service.
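The two scheduled tasks need little more than a naming rule and an age check. Here is one possible C sketch; the file name pattern dump-YYYY-MM-DD-HH.rdb and both function names are assumptions made for this example, and the copying and deleting of files is left to whatever scheduler you use.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Build a backup file name from a timestamp (UTC), in the assumed form
 * dump-YYYY-MM-DD-HH.rdb. Returns 0 on success, -1 on error. */
int backup_name(time_t taken_at, char *out, size_t cap) {
    assert(out != NULL);
    struct tm tm_utc;
    if (gmtime_r(&taken_at, &tm_utc) == NULL) return -1;
    return strftime(out, cap, "dump-%Y-%m-%d-%H.rdb", &tm_utc) > 0 ? 0 : -1;
}

/* Return 1 if a backup taken at `taken_at` is older than max_age_hours
 * at time `now` and can be deleted, 0 otherwise. */
int backup_expired(time_t taken_at, time_t now, int max_age_hours) {
    return difftime(now, taken_at) > (double)max_age_hours * 3600.0;
}
```

The hourly task would copy the current RDB file to the name returned by backup_name, and the cleanup task would delete every backup for which backup_expired returns 1 with max_age_hours set to 48.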

Conclusion

There is no one-size-fits-all Redis persistence scheme, and what looks right on paper has to be combined with practical results to prove that it works.

This is an original article. My writing and knowledge are limited, so if anything here is wrong, please let me know.


References: oldblog.antirez.com/post/redis-… blog.httrack.com/blog/2013/1…