One, why persistence

Suppose Redis holds 10 GB of data in memory and the machine loses power or goes down unexpectedly. Is all 10 GB simply gone after the restart?! That is why persistence is required: after downtime, the data can be recovered from the persistence files.

Two, advantages and disadvantages

1. RDB file

RDB files are binary and compact. For example, 10 GB of in-memory data might produce an RDB file of about 1 GB (an illustrative figure only).

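2. Advantages

Because RDB files are compact binary files, disaster recovery from them is faster.
With bgsave (not save), the main process also handles commands more efficiently than with AOF; this is about command handling, not about the speed of persistence itself. RDB does no extra work per request: the main process only calls fork() to create a child process, and may afterwards hit copy-on-write page faults. A copy-on-write fault is a purely in-memory operation that copies a single page, far cheaper than a disk write. AOF, by contrast, appends every write command to a file, and when it is configured to sync on every command each write waits for the disk, typically on the order of milliseconds. There is no comparison.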

3. Disadvantages

Data reliability is lower than with AOF, that is, more data can be lost. AOF can be configured to persist every second or on every command. RDB can also be configured with a one-second save rule, but it is neither as flexible as the AOF configuration nor as fast to persist, because every RDB snapshot is a full dump while AOF only appends.

Three, two methods of RDB persistence

Besides the two commands below, the configuration file can define rules that trigger an RDB snapshot automatically. Snapshots triggered by these rules work the same way as bgsave.

1. save

1.1 Description

Synchronous, blocking

1.2 Disadvantages

The Redis service blocks during persistence and cannot serve other requests. To be exact, save blocks the thread that executes the command, but Redis executes commands on a single thread, so the entire service blocks. With a small amount of data this hardly matters. With a large amount of data it does: if each snapshot takes an hour, that is equivalent to an hour of downtime.

2. bgsave

2.1 Description

Asynchronous, non-blocking

2.2 Principle

fork() + copy-on-write

2.3 Advantages

bgsave persists data while the server keeps serving reads and writes, and the two do not affect each other. Newly written data does not affect the snapshot being persisted. If persistence fails or takes a long time, the service that handles client requests is unaffected. A completed snapshot overwrites the previous RDB file.

Four, fork()

bgsave works through fork() plus copy-on-write, so let's talk about fork() first.

1. What is fork()

fork() is an API of Unix-like operating systems such as Linux, not of Redis.

2. What does fork() do

fork() creates a child process, not a thread. The child shares its parent's memory data, but only as it was at the moment of the fork() call: data the parent modifies afterwards is not visible to the child. Likewise, data modified by the child is not visible to the parent.

For example, if process A calls fork() to create child process B, then A is called the main (parent) process. At that moment the main process and its child point to the same memory, so their data is identical. When A later modifies, adds, or deletes data in memory, B does not see the change. If child B crashes, main process A is unaffected and can keep providing service. The reverse is looser than it may seem: on Unix a child is not killed automatically when its parent exits, but a snapshot child has no purpose without its server, so in spirit this is a bit like a daemon thread. Redis uses the fork() API to persist the RDB.
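The visibility rule described above can be tried out with a few lines of Python on a Unix-like system. This is only a minimal sketch, not anything taken from Redis: the function name `fork_visibility_demo` and the pipe used to report the child's view are assumptions made for illustration.

```python
import os


def fork_visibility_demo():
    """Fork a child and show that writes made after fork() stay private."""
    data = {"k": 2}                     # state that exists before fork()
    read_fd, write_fd = os.pipe()       # lets the child report what it sees

    pid = os.fork()
    if pid == 0:                        # child process (B)
        os.close(read_fd)
        data["child_only"] = 1          # never visible to the parent
        # The child still sees the value "k" had at fork time.
        os.write(write_fd, str(data["k"]).encode())
        os.close(write_fd)
        os._exit(0)                     # exit without running parent cleanup

    # parent process (A)
    os.close(write_fd)
    data["k"] = 1                       # never visible to the child
    child_view = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)                  # reap the child
    return child_view, data


if __name__ == "__main__":
    child_view, parent_data = fork_visibility_demo()
    print("child saw k =", child_view)
    print("parent data =", parent_data)
```

Whichever process runs first, the child reports the fork-time value of k and the parent never sees the `child_only` key, because each process modifies its own private view.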

Five, fork() in Redis

Redis makes clever use of fork(). When bgsave is executed, the main Redis process checks whether a snapshot child already exists; if not, it calls fork() to create a child process that persists the RDB file. So the child process persists the RDB file while the main process continues to provide service.
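To make the division of labor concrete, here is a rough Python sketch of the idea, assuming a Unix-like system. It is not Redis's implementation: `bgsave_sketch`, `run_demo`, the JSON file format, and the temp-file-then-rename step are all assumptions of this sketch.

```python
import json
import os
import tempfile


def bgsave_sketch(dataset, dump_path):
    """Fork a child that writes a snapshot of `dataset`; return the child's pid."""
    pid = os.fork()
    if pid == 0:                                # child: persist, then exit
        status = 1
        try:
            tmp_path = f"{dump_path}.tmp-{os.getpid()}"
            with open(tmp_path, "w") as f:
                json.dump(dataset, f)           # the child's fork-time view
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_path, dump_path)     # overwrite the previous dump
            status = 0
        finally:
            os._exit(status)                    # never fall back into parent code
    return pid                                  # parent: returns immediately


def run_demo():
    dataset = {"k1": 1, "k2": 2}
    with tempfile.TemporaryDirectory() as tmp_dir:
        dump_path = os.path.join(tmp_dir, "dump.json")
        pid = bgsave_sketch(dataset, dump_path)
        dataset["k1"] = 11                      # parent keeps accepting writes
        _, wait_status = os.waitpid(pid, 0)
        ok = os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
        with open(dump_path) as f:
            snapshot = json.load(f)
    return snapshot, dataset, ok


if __name__ == "__main__":
    print(run_demo())
```

The parent's write to k1 happens after fork(), so the file holds the data as it was when the snapshot started, while the parent's own dataset already has the new value.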

We said that data modified by one process is not visible to the other, yet both point to the same memory. If that isolation required a full copy, then with 4 GB of memory I could use at most 2 GB for data, because half would have to be handed over to the RDB snapshot. That would be absurd! So what does it do instead? It uses copy-on-write.

Six, copy-on-write

The main process and its child now share the same memory. How can each modify data without affecting the other?

1. Principle

After the main process calls fork(), the kernel marks the memory pages the two processes share as read-only, and the child's address space points to the same physical pages as the main process's. When a process then writes to one of these pages (in Redis this is the main process applying a client's write; the child only reads memory while it writes the RDB file), the CPU detects that the page is read-only and raises a page fault, which traps into a kernel interrupt routine.

In the interrupt routine, the kernel copies the faulting page (only that page, meaning the page holding the modified data, not all the data in memory), so that the main process and the child each keep their own copy.

[Figure: the data before modification]

[Figure: the data after modification]

2. Back to the original problem

For example, suppose the main process receives the request set k 1 (the value of k was 2) while a child process is persisting the RDB. The data page that holds k is copied, the main process's mapping for k is pointed at the newly copied page, and the value is then changed to 1 there. The main process now references the new copy, while the page referenced by the child still holds k exactly as before.
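The set k 1 example can be modeled with a toy page table in Python. This is a simulation only: the real mechanism lives in the kernel and the MMU, and the `Page` and `AddressSpace` classes, the reference counter, and the two-page layout are assumptions invented for this sketch.

```python
class Page:
    """A toy physical page holding a few key/value slots."""

    def __init__(self, slots):
        self.slots = dict(slots)
        self.refs = 1                      # address spaces mapping this page


class AddressSpace:
    """A toy page table: page number -> Page."""

    def __init__(self, pages):
        self.pages = pages
        self.copies = 0                    # pages this space had to copy

    def fork(self):
        # Copy only the page table; every page becomes shared.
        for page in self.pages.values():
            page.refs += 1
        return AddressSpace(dict(self.pages))

    def read(self, page_no, key):
        return self.pages[page_no].slots[key]

    def write(self, page_no, key, value):
        page = self.pages[page_no]
        if page.refs > 1:                  # shared page: the "page fault" path
            page.refs -= 1
            page = Page(page.slots)        # copy just this one page
            self.pages[page_no] = page
            self.copies += 1
        page.slots[key] = value


def cow_demo():
    parent = AddressSpace({0: Page({"k": 2}), 1: Page({"other": 7})})
    child = parent.fork()                  # bgsave starts here
    parent.write(0, "k", 1)                # set k 1 arrives during the save
    return {
        "parent_k": parent.read(0, "k"),
        "child_k": child.read(0, "k"),
        "pages_copied": parent.copies,
        "page1_still_shared": parent.pages[1] is child.pages[1],
    }


if __name__ == "__main__":
    print(cow_demo())
```

Only the page containing k is duplicated; the untouched page stays shared, which is why the snapshot does not need a second full copy of memory.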

3. A one-paragraph summary

Copy-on-write: the child created by fork() shares the physical memory of the main process. When the main process writes to memory, the write to a read-only page raises a page fault and the kernel makes a copy of that page only; all other pages remain shared.

4. Additional supplements

In the Redis service, the child process only reads the data in shared memory and never writes to it, so only writes by the main process trigger this mechanism. For most Redis deployments, as for most databases, write requests are far fewer than read requests. So fork() plus copy-on-write performs very well and makes bgsave easy to implement.

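Seven, questions

0. Doesn't calling fork() block as well?

That is true: the fork() call itself runs in the main thread and blocks it until it returns. But the pause is usually negligible, especially compared with save, which blocks the main thread for the entire snapshot.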

1. Will there be multiple child processes at the same time?

No. Each time the main process receives a bgsave command, it first checks whether a snapshot child already exists. If one does, the new bgsave request is rejected; if not, it calls fork().

Why is it done this way?

Here is my guess:

Supporting several child processes in parallel would not only slow the server down but also cause data problems. Suppose one bgsave starts at 8:00 and another bgsave command arrives at 9:00. If the 9:00 snapshot finishes first and the 8:00 snapshot finishes afterwards, the older data overwrites the newer file. Wasn't the 9:00 run in vain? This is what I call a data problem. And if another bgsave is started at 10:00 before the earlier ones have finished, the children accumulate and drag down server performance.
Why not make the new request wait instead? If a request waited whenever a child was working and ran as soon as that child finished, requests would keep piling up, and with a large dataset where every snapshot is slow the backlog would only grow.
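The "at most one child" rule can be sketched as a small guard around fork(). This is a hedged illustration for Unix-like systems, not Redis source code: the `SnapshotScheduler` class, its return strings, and the sleep that stands in for writing the RDB file are all made up for the example.

```python
import os
import time


class SnapshotScheduler:
    """Allows at most one snapshot child at a time."""

    def __init__(self):
        self.child_pid = None

    def _reap_finished_child(self):
        if self.child_pid is None:
            return
        # WNOHANG: returns pid 0 at once if the child is still running.
        pid, _ = os.waitpid(self.child_pid, os.WNOHANG)
        if pid == self.child_pid:
            self.child_pid = None

    def bgsave(self, work_seconds=0.5):
        self._reap_finished_child()
        if self.child_pid is not None:
            return "rejected"              # a child is already saving
        pid = os.fork()
        if pid == 0:
            time.sleep(work_seconds)       # stand-in for writing the RDB file
            os._exit(0)
        self.child_pid = pid
        return "started"

    def wait(self):
        if self.child_pid is not None:
            os.waitpid(self.child_pid, 0)
            self.child_pid = None


def guard_demo():
    scheduler = SnapshotScheduler()
    first = scheduler.bgsave()
    second = scheduler.bgsave()            # arrives while the first child runs
    scheduler.wait()
    third = scheduler.bgsave(work_seconds=0)
    scheduler.wait()
    return first, second, third


if __name__ == "__main__":
    print(guard_demo())
```

Rejecting the second request instead of queueing it keeps the number of children bounded at one, which is the property argued for above.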

2. What would happen without copy-on-write?

If fork() made a full copy, usable memory would be cut in half, wasting resources, and copying 10 GB of data in full takes time. Who can stand that?
What if nothing were copied at all, so that the snapshot is written while clients keep modifying the same data? It doesn't look like a problem, but it is. For example, say k1 is 1 and k2 is 2 in Redis. During a bgsave, after the RDB has written the value of k1 but before it has written the value of k2, a client sends:

set k1 11 
set k2 22

The snapshot would then contain k2 = 22 but k1 = 1 rather than the latest 11, so the persisted data is inconsistent. Copy-on-write prevents this: once a bgsave request has been triggered, nothing the main process does afterwards affects the data persisted to the RDB file.
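The k1/k2 scenario is easy to reproduce in a deterministic simulation. The sketch below uses no real processes. Both function names are hypothetical, and the plain dict copy in `point_in_time_snapshot` merely stands in for the child's fork-time view, which real copy-on-write provides without copying everything up front.

```python
def naive_snapshot(dataset, writes_during_save):
    """Copy keys one by one from live data; a client write lands mid-copy."""
    snapshot = {}
    for index, key in enumerate(list(dataset)):
        snapshot[key] = dataset[key]
        if index == 0:                       # k1 saved, k2 not yet saved
            dataset.update(writes_during_save)
    return snapshot


def point_in_time_snapshot(dataset, writes_during_save):
    """Save from a frozen view, as a forked child effectively does."""
    frozen = dict(dataset)                   # stand-in for the fork-time view
    dataset.update(writes_during_save)       # later writes miss the frozen view
    return dict(frozen)


if __name__ == "__main__":
    writes = {"k1": 11, "k2": 22}
    print(naive_snapshot({"k1": 1, "k2": 2}, writes))          # mixed old and new
    print(point_in_time_snapshot({"k1": 1, "k2": 2}, writes))  # consistent
```

The naive version mixes an old k1 with a new k2, a state the database never actually had, while the frozen view is consistent as of the moment the save began.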

Eight, summary

Everything in this article matters and there is very little filler, so there is nothing left to sum up. The authors of Redis clearly know the underlying operating system well: first epoll, and now fork() and copy-on-write. Admirable!
