It’s from the hemlock code farming course.

What Redis persistence means for disaster recovery in production environments

Course outline

1. What happens when a fault occurs

When Redis fails suddenly, because it hangs, its process dies, or the machine it runs on is lost, important cached data can be lost with it. MySQL may go down as well, and then the Redis data is hard to recover.

2. How to deal with a failure

Many students have read Redis materials and books, and may have watched some Redis video courses

All of these materials cover Redis persistence, but so far I have not seen any of them explain it really carefully

Redis persistence: RDB and AOF, how they differ, what their characteristics are, and which scenarios each one suits

What does an enterprise-level Redis persistence solution look like, and which enterprise-level scenarios does it go with?

The point of Redis persistence is failure recovery

For example, you may deploy Redis as a cache and also store some important data in it

Without persistence, Redis loses all of its data in a catastrophic failure

If you persist a copy of the data to disk and then periodically sync and back it up to, say, a cloud storage service, you can make sure that you do not lose everything and can still recover part of it

Illustrated analysis of the Redis RDB and AOF persistence mechanisms

Course outline

1. Introduction of RDB and AOF persistence mechanisms
2. Advantages of RDB persistence mechanism
3. Disadvantages of RDB persistence mechanism
4. Advantages of AOF persistence
5. Disadvantages of AOF persistence
6. How to choose between RDB and AOF

We already know that persistence is indispensable for an enterprise-level Redis architecture

Enterprise-level Redis cluster architecture: massive data, high concurrency, high availability

Persistence mainly serves disaster recovery and data recovery, and can also be counted as one part of high availability

If your whole Redis goes down, Redis is unavailable, and what you need to do is make it available again as soon as possible

You restart Redis so that it can serve requests again as soon as possible, but as described earlier, if you made no backup, Redis is up yet still unusable, because the data is gone

A large number of requests then arrive, the cache cannot be hit, and the data cannot be found in Redis. This is the cache avalanche problem: every request that misses in Redis goes to the data source, such as the MySQL database, which suddenly has to absorb high concurrency and goes down

Once MySQL is down, there is no data to restore into Redis either. Where does the Redis data come from? From MySQL…

The full cache avalanche scenario, as well as enterprise-level solutions, will be covered later

If you get Redis persistence right and bring the backup and recovery plan up to enterprise level, then even when Redis fails you can recover quickly from the backup data and resume serving requests as soon as the data is restored

Redis persistence is tied to high availability, so it is explained as part of the enterprise-level Redis architecture

Redis persistence: RDB, AOF


1. Introduction of RDB and AOF persistence mechanisms

The RDB persistence mechanism periodically persists the data in Redis

The AOF mechanism logs every write command, appending it to a log file in append-only mode. When Redis restarts, the entire dataset can be rebuilt by replaying the write commands in the AOF log
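The append-and-replay idea can be illustrated with a toy log. This is only a sketch under simplifying assumptions: one command per line, keys and values without spaces, and only SET and DEL. The real AOF stores commands in the Redis protocol format, and the names append_command and replay are made up for this example.

```python
import os
import tempfile


def append_command(log_path, *args):
    """Append one write command to the end of the log (toy format: one line each)."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(" ".join(args) + "\n")
        log.flush()
        os.fsync(log.fileno())  # push the line to disk before returning


def replay(log_path):
    """Rebuild the in-memory dataset by re-executing every logged command in order."""
    data = {}
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            parts = line.split()
            if not parts:
                continue
            command = parts[0].upper()
            if command == "SET":
                data[parts[1]] = parts[2]
            elif command == "DEL":
                data.pop(parts[1], None)
    return data


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.aof")
        append_command(path, "SET", "user:1", "alice")
        append_command(path, "SET", "user:2", "bob")
        append_command(path, "DEL", "user:1")
        # A "restart": nothing is kept in memory, the state comes from the log alone.
        return replay(path)
```

Calling demo() rebuilds the dataset purely from the log file, which is the same principle Redis relies on when it restarts with AOF enabled.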

If we want Redis to serve only as a pure in-memory cache, we can disable both the RDB and the AOF persistence mechanisms

With RDB or AOF, the data in Redis memory can be persisted to disk, and that data can then be backed up elsewhere, for example to a cloud service such as Alibaba Cloud

If Redis fails and both the memory and the disk data on the server are lost, you can copy the earlier data back from the cloud service and put it in the configured directory. Then restart Redis, and it will automatically restore the in-memory data from the persistence file and continue to provide service

If both RDB and AOF are enabled, Redis uses the AOF to rebuild the data when it restarts, because the data in the AOF is more complete


2. Advantages of RDB persistence mechanism

(1) RDB generates multiple data files, each representing the Redis data at a certain point in time. This multi-file approach is very well suited to cold backup: such complete data files can be sent to remote, secure storage, such as the Amazon S3 cloud service or, in China, the ODPS distributed storage of Alibaba Cloud, so that the data in Redis is backed up regularly under a predetermined backup strategy

  • RDB does cold backup by generating multiple files, each representing a complete snapshot of the data at one point in time
  • AOF can also be used for cold backup. It has only one file, but you can copy that file at regular intervals
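A cold backup job of this kind might look roughly like the sketch below. It only keeps timestamped local copies of the snapshot file and prunes the oldest ones; uploading to remote storage is left out. The function name backup_snapshot, the file naming scheme, and the retention count are assumptions made for illustration.

```python
import os
import shutil
import tempfile
from datetime import datetime, timezone


def backup_snapshot(rdb_path, backup_dir, keep=48, now=None):
    """Copy the snapshot to a timestamped file and keep only the newest `keep` copies."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    now = now or datetime.now(timezone.utc)
    os.makedirs(backup_dir, exist_ok=True)
    name = "dump-" + now.strftime("%Y%m%dT%H%M%S") + ".rdb"
    target = os.path.join(backup_dir, name)
    shutil.copy2(rdb_path, target)
    # Timestamped names sort chronologically, so the oldest copies come first.
    backups = sorted(
        f for f in os.listdir(backup_dir)
        if f.startswith("dump-") and f.endswith(".rdb")
    )
    for old in backups[:-keep]:
        os.remove(os.path.join(backup_dir, old))
    return target


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        rdb = os.path.join(tmp, "dump.rdb")
        with open(rdb, "wb") as f:
            f.write(b"fake snapshot bytes")  # stand-in for a real RDB file
        backup_dir = os.path.join(tmp, "backups")
        for hour in (1, 2, 3):
            stamp = datetime(2024, 1, 1, hour, tzinfo=timezone.utc)
            backup_snapshot(rdb, backup_dir, keep=2, now=stamp)
        return sorted(os.listdir(backup_dir))
```

In demo(), three hourly backups are taken with keep=2, so only the two most recent copies remain. In practice such a function would be run on a schedule and followed by a copy to remote storage.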

(2) RDB has very little impact on the read and write service that Redis provides to clients, so Redis can maintain high performance: the main Redis process only needs to fork a child process and let the child perform the disk I/O for RDB persistence

With RDB, each write goes directly to Redis memory, and the data is written to disk only at certain intervals

With AOF, every write also has to be written to the log file. The write lands in the OS cache quickly, but it still has some time overhead, so it is slower than RDB
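The fork-based snapshot in point (2) can be sketched in a few lines. This is a toy model, not how Redis itself is implemented: it uses os.fork (Unix only), writes JSON instead of the RDB format, and the name snapshot_in_child is made up for the example. The point it shows is that the child sees the data as it was at the moment of the fork, while the parent keeps accepting writes.

```python
import json
import os
import tempfile


def snapshot_in_child(data, path):
    """Fork a child that writes `data` as it was at fork time; the parent returns at once."""
    pid = os.fork()
    if pid == 0:
        # Child process: its view of memory is a copy frozen at the moment of fork.
        status = 1
        try:
            tmp_path = path + ".tmp"
            with open(tmp_path, "w", encoding="utf-8") as f:
                json.dump(data, f)
            os.replace(tmp_path, path)  # publish the finished file in one step
            status = 0
        finally:
            os._exit(status)  # leave without running the parent's cleanup code
    return pid  # parent: continue serving while the child does the disk I/O


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "snapshot.json")
        data = {"counter": 1}
        pid = snapshot_in_child(data, path)
        data["counter"] = 2  # a write in the parent after the fork
        os.waitpid(pid, 0)   # wait only so the demo can read the file
        with open(path, encoding="utf-8") as f:
            return json.load(f), data
```

The snapshot on disk holds the value from fork time, and the parent's later write is visible only in the parent's memory.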

(3) Compared with the AOF persistence mechanism, restarting and restoring the Redis process directly from an RDB data file is faster

AOF stores a log of commands. To recover data, Redis has to replay and execute every logged command to rebuild all the data in memory

RDB is a data file that can simply be loaded into memory during recovery

Taken together, these advantages make RDB particularly well suited to cold backup


3. Disadvantages of RDB persistence mechanism

(1) If you want to lose as little data as possible when Redis fails, RDB is not as good as AOF. In general, an RDB snapshot file is generated every 5 minutes or at an even longer interval, so you have to accept that if the Redis process goes down, the data from the last 5 minutes or more is lost

(2) Every time RDB forks a child process to generate the snapshot file, if the data is very large, the service provided to clients may pause for milliseconds, or even seconds
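The data loss window in point (1) can be made concrete with simple arithmetic. The sketch below multiplies the persistence interval by an assumed steady write rate; the rate of 200 writes per second is a hypothetical figure chosen only for the example.

```python
def worst_case_lost_writes(persist_interval_s, writes_per_s):
    """Upper bound on writes lost if the process dies just before the next persist."""
    return persist_interval_s * writes_per_s


# Snapshot every 5 minutes (300 s) at an assumed 200 writes per second.
rdb_loss = worst_case_lost_writes(300, 200)
```

Under these assumptions a crash just before the next snapshot loses up to 60,000 writes, and the bound shrinks in proportion as the persistence interval gets shorter.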


4. Advantages of AOF persistence

(1) AOF protects better against data loss. Typically, AOF performs an fsync once per second in a background thread, so at most one second of data is lost

(2) The AOF log file is written in append-only mode, so there is no disk seek overhead and write performance is high. The file is also hard to corrupt, and even if the tail of the file is damaged, it is easy to repair

(3) Even when the AOF log file grows too large, the background rewrite does not affect client reads and writes. During the rewrite, the commands in the log are compacted to produce the minimal log needed to restore the data. While the new log file is being created, the old log file continues to be written as usual. When the new, merged log file is ready, the old and new log files are swapped.
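The effect of a rewrite can be shown with a toy command log. This is a sketch, assuming commands are plain tuples and only SET, INCR and DEL exist; the functions apply_log and rewrite are made up for the example. The new log is built from the resulting data rather than by editing the old log.

```python
def apply_log(commands):
    """Execute a toy command log and return the resulting dataset."""
    data = {}
    for command, key, *rest in commands:
        if command == "SET":
            data[key] = rest[0]
        elif command == "INCR":
            data[key] = str(int(data.get(key, "0")) + 1)
        elif command == "DEL":
            data.pop(key, None)
    return data


def rewrite(data):
    """Build the minimal log for the current data: one SET per live key."""
    return [("SET", key, value) for key, value in sorted(data.items())]


def demo():
    old_log = [
        ("SET", "page:views", "0"),
        ("INCR", "page:views"),
        ("INCR", "page:views"),
        ("SET", "tmp", "x"),
        ("DEL", "tmp"),
        ("SET", "user:1", "alice"),
    ]
    data = apply_log(old_log)
    new_log = rewrite(data)
    # Replaying the short log must give exactly the same dataset.
    return len(old_log), new_log, apply_log(new_log) == data
```

Here six logged commands shrink to two, and replaying the compacted log yields the same data as replaying the original.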

(4) The commands in the AOF log file are recorded in a very readable form, which makes AOF well suited to emergency recovery after a catastrophic accidental deletion. For example, if someone accidentally runs the flushall command and wipes all the data, then as long as a background rewrite has not yet happened, you can copy the AOF file, delete the final flushall command from it, put the file back, and let the recovery mechanism restore all the data
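Removing a trailing flushall could be done by hand in an editor, or with a small helper like the sketch below. It assumes the command was logged in the Redis protocol format as a bare FLUSHALL with no arguments and that it is the very last entry in the file; strip_trailing_flushall is a made-up name. It should be run on a copy of the file while Redis is stopped, and the layout of AOF files differs between Redis versions, so treat it as an illustration only.

```python
# A bare FLUSHALL as encoded in the Redis protocol: one argument of 8 bytes.
FLUSHALL_RESP = b"*1\r\n$8\r\nFLUSHALL\r\n"


def strip_trailing_flushall(aof_bytes):
    """Return the log without a final FLUSHALL entry; unchanged if there is none."""
    n = len(FLUSHALL_RESP)
    # Compare case-insensitively, since the command may be logged in lower case.
    if aof_bytes[-n:].upper() == FLUSHALL_RESP:
        return aof_bytes[:-n]
    return aof_bytes


def demo():
    set_cmd = b"*3\r\n$3\r\nSET\r\n$1\r\nk\r\n$1\r\nv\r\n"
    damaged = set_cmd + b"*1\r\n$8\r\nflushall\r\n"
    return (
        strip_trailing_flushall(damaged) == set_cmd,
        strip_trailing_flushall(set_cmd) == set_cmd,
    )
```

The helper leaves a log untouched when it does not end in FLUSHALL, so running it on a healthy file is harmless.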


5. Disadvantages of AOF persistence

(1) AOF log files are usually larger than RDB data snapshot files for the same data

(2) With AOF enabled, the write QPS that Redis supports is lower than with RDB, because AOF is usually configured to fsync the log file once per second. Of course, fsync once per second still gives very high performance

(3) AOF has had a bug in the past where replaying the log recorded by AOF did not restore exactly the same data. An approach like AOF, which relies on a more complex cycle of command logging, merging, and replay, is therefore more fragile and more bug-prone than the RDB approach of persisting a complete data snapshot file each time. To avoid bugs in the rewrite process, however, an AOF rewrite does not merge the old command log; it rebuilds the commands from the data in memory at that moment, which is much more robust.


6. How to choose between RDB and AOF

(1) Don’t use RDB alone, because that will cause you to lose a lot of data

(2) Don’t use AOF alone either, because that has two problems. First, recovering from a cold backup made with AOF is slower than recovering from one made with RDB. Second, RDB simply generates a snapshot each time, which is more robust and avoids the bugs that a complex backup and recovery mechanism such as AOF can have

(3) Use the AOF and RDB persistence mechanisms together: AOF, which ensures that data is not lost, is the first choice for data recovery; RDB is used for cold backups of varying degrees and for quick data recovery when the AOF files are lost, corrupted, or otherwise unavailable
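As a rough illustration, the two setups discussed in this lesson, a pure in-memory cache and the combined RDB plus AOF recommendation, map onto a handful of redis.conf directives. The sketch below renders them as text. The directive names save, appendonly and appendfsync are standard Redis settings; the snapshot threshold of 300 seconds and 10 changes, the mode names, and the function render_persistence_conf are assumptions for this example.

```python
def render_persistence_conf(mode):
    """Return redis.conf lines for one of two illustrative persistence setups."""
    if mode == "cache-only":
        # No snapshots and no append-only log: all data lives in memory only.
        lines = ['save ""', "appendonly no"]
    elif mode == "rdb+aof":
        lines = [
            "save 300 10",           # snapshot if >= 10 keys changed within 300 s
            "appendonly yes",        # log every write command
            "appendfsync everysec",  # fsync the log once per second
        ]
    else:
        raise ValueError("unknown mode: " + mode)
    return "\n".join(lines) + "\n"
```

The thresholds should be tuned to how much data you can afford to lose and how often you want a fresh file for cold backup.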