How does Redis achieve high performance?
Everyone in the industry knows Redis, the NoSQL database. Wherever there is data there is a need for a database, and among the many kinds of databases there are also plenty of NoSQL key-value in-memory stores. How did Redis, as one of them, reach the industry's ceiling? How does it achieve high performance, and how is it made highly available? In this article I'll walk through some of Redis's design; the focus here is on the high-performance side.
Efficient data structures
Compared with traditional relational databases, Redis is also special in its data structures. All of its data types can be regarded as entries in a map structure, with the key as the query condition.
Redis is a key-value in-memory database built around an internal hash table. Looking up data by a given key takes only O(1) average time, and each value is itself one of several specialized data structures, which gives Redis good performance for data operations.
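To make the O(1) claim concrete, here is a minimal sketch of a chained hash-table lookup. It is for illustration only, not Redis's actual dict (which uses SipHash and supports incremental rehashing across two tables):

```c
#include <string.h>

/* A minimal chained hash table, for illustration only --
 * Redis's real dict (dict.c) adds incremental rehashing,
 * two tables, and type-specific hash functions. */
typedef struct entry {
    char *key;
    void *val;
    struct entry *next;   /* chain for hash collisions */
} entry;

typedef struct {
    entry **buckets;
    unsigned long size;   /* assumed to be a power of two */
} dict;

/* djb2-style string hash (Redis itself uses SipHash). */
static unsigned long hash(const char *key) {
    unsigned long h = 5381;
    while (*key) h = ((h << 5) + h) ^ (unsigned char)*key++;
    return h;
}

/* O(1) average-case lookup: hash the key, index a bucket,
 * walk the (normally very short) collision chain. */
void *dict_get(dict *d, const char *key) {
    entry *e = d->buckets[hash(key) & (d->size - 1)];
    while (e) {
        if (strcmp(e->key, key) == 0) return e->val;
        e = e->next;
    }
    return NULL;
}
```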
Memory-based storage
In a traditional relational database, data files may be stored on disk as an LSM tree or a B+Tree, so reading them requires disk I/O. Redis, by contrast, works entirely in memory, so the command path involves no disk I/O, which makes it very fast.
As the memory-hierarchy picture above shows, main memory sits between the hard disk and the CPU caches, so finding data there is naturally much faster than going to disk. Of course (and this is just my personal take), a relational database that caches the results of common operations in memory also gets a performance boost, much like the page-fault handling inside an operating system, where data pages are kept cached in memory by special algorithms to reduce file-I/O overhead.
I/O multiplexing
The traditional answer to concurrency is multiple processes: if one process is busy, can't several processes serve several client connections at the same time? Multiple processes do solve part of the concurrency problem, but issues remain, above all the overhead of context switching: saving state to one PCB and restoring it from another, back and forth. As client requests increase, the number of threads grows with them; and once concurrent access to shared data is involved, locks are sometimes needed to enforce ordering, which stalls the other threads. (On Linux a process can also be understood as a thread, with each process holding a single thread of execution. I wrote 'process' above, so don't worry too much about the distinction...)
A thread is a logical flow that runs in a process's context. One process can contain multiple threads, and since they all run in the same process context they share the process's address space, which solves the hard problem of communication between processes. At the same time, because a thread's context is much smaller than a process's, a thread context switch is much more efficient than a process context switch.
Applications like Redis and Nginx are single-threaded programs, so why can they achieve such strong performance? Let's start with an example:
- Blocking IO
At lunch I ask the restaurant owner for a bowl of hot dry noodles, and then I wait there for the owner to make it, doing nothing at all until the noodles are ready.
This process is called blocking I/O, as shown below:
In the synchronous blocking I/O model, after the application makes a read call it blocks until the kernel has the data ready and has copied it into user space.
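In code, blocking I/O is just a plain read() on a socket. A minimal sketch, assuming `fd` is an already-connected socket:

```c
#include <unistd.h>

/* Blocking I/O sketch: read() does not return until the kernel
 * has received data and copied it into buf -- the thread simply
 * waits, like standing at the counter for the noodles. */
ssize_t blocking_read(int fd, char *buf, size_t len) {
    return read(fd, buf, len);  /* blocks here until data arrives */
}
```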
- Non-blocking IO
Now change the scenario slightly:
Again at lunch, I ask the restaurant owner for a bowl of hot dry noodles, and while the owner makes it I ask every few minutes, 'Is it ready yet?', until the owner finally says yes and I pick up my hot dry noodles.
In the synchronous non-blocking I/O model, the application keeps issuing read calls that return immediately while the data is not ready. Once the kernel has the data, the thread still blocks while it is copied from kernel space into user space; picking up the noodles corresponds to the kernel copying the prepared data into user space.
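A minimal sketch of that polling loop, using the standard O_NONBLOCK flag (illustration only):

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Non-blocking I/O sketch: with O_NONBLOCK set, read() returns
 * -1/EAGAIN instead of blocking when no data is ready -- the
 * caller keeps "asking the owner" in a loop. */
ssize_t nonblocking_read(int fd, char *buf, size_t len) {
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0) return n;                    /* data arrived */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                           /* real error */
        usleep(1000);  /* not ready yet: ask again shortly */
    }
}
```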
In summary, the two models have similar drawbacks: both spend their time waiting for the kernel to prepare the data, one by blocking and one by polling, and both still block while the data is copied to user space. That blocking is unavoidable.
- I/O Multiplexing
Once again, the same example:
At lunch I tell the restaurant owner I want a bowl of hot dry noodles. The owner passes the order to the cooks in the back; I don't know which cook gets it, or even how many cooks there are. Every once in a while the owner asks the cooks whether the order is done, and when it is, he calls me over to pick up my food.
In the I/O multiplexing model, the thread first makes a select call to ask the kernel whether data is ready on any of the watched sockets. When the kernel reports one ready, the user thread makes a read call on it; that read call (copying data from kernel space to user space) still blocks.
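A minimal select()-based sketch of "ask the kernel which sockets are ready"; `fds` is assumed to be an array of connected sockets:

```c
#include <sys/select.h>
#include <unistd.h>

/* I/O multiplexing sketch: one thread asks the kernel which of
 * these fds is readable, instead of reading each one blindly. */
void poll_ready(int *fds, int nfds, char *buf, size_t len) {
    fd_set rset;
    FD_ZERO(&rset);
    int maxfd = -1;
    for (int i = 0; i < nfds; i++) {
        FD_SET(fds[i], &rset);
        if (fds[i] > maxfd) maxfd = fds[i];
    }
    /* Blocks until at least one fd has data ready. */
    select(maxfd + 1, &rset, NULL, NULL, NULL);
    for (int i = 0; i < nfds; i++)
        if (FD_ISSET(fds[i], &rset))
            read(fds[i], buf, len);  /* the copy still blocks briefly */
}
```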
The Reactor pattern monitors client request events through an I/O multiplexer and distributes them through a dispatcher. A connection request is handled by an Acceptor, which sets up a corresponding handler for the connection's subsequent operations. For requests on established connections, the Reactor calls the registered handler to complete the read -> business logic -> write cycle and returns the result to the client. The whole process runs in one thread.
Redis is built on this single-threaded Reactor model: the I/O multiplexer receives user requests, pushes them onto a queue, and hands them to the file event dispatcher, which drives all subsequent operations. Because the entire flow runs on a single thread, as in the Reactor single-threaded design, Redis is commonly described as single-threaded.
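A stripped-down sketch of such a single-threaded event loop, using Linux epoll. Redis's real loop (ae.c) wraps epoll/kqueue/select behind one interface; `accept_client` and `handle_client` here are hypothetical stand-ins for the registered handlers:

```c
#include <sys/epoll.h>

/* Hypothetical handlers standing in for the ones Redis registers. */
void accept_client(int epfd, int listen_fd);
void handle_client(int fd);

/* Single-threaded Reactor loop: wait for ready sockets, then
 * dispatch each event to its handler -- all on one thread. */
void event_loop(int epfd, int listen_fd) {
    struct epoll_event events[128];
    for (;;) {
        /* Block until some socket becomes readable/writable. */
        int n = epoll_wait(epfd, events, 128, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listen_fd)
                accept_client(epfd, fd);  /* Acceptor: new connection */
            else
                handle_client(fd);        /* read -> process -> write */
        }
    }
}
```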
When we say single-threaded Redis is fast, we mean its request processing is very fast: it listens on many sockets in one thread, and whenever any socket becomes readable or writable, Redis reads the client request, manipulates the corresponding data in memory, and writes the response back to the socket.
Benefits of single-threading:
- No locking overhead when accessing shared resources
- Development and debugging are friendly, and the code is easy to maintain
- No extra overhead from context switching between multiple threads
Single-threading is not without drawbacks, and the main one is obvious: if one request takes a long time, the whole Redis instance blocks and no other request can be processed until that long operation completes and returns. This is why slow operations on large values have to be handled with care.
Starting with Redis 4.0, multi-threading was introduced, but only for asynchronous memory reclamation: it mainly solves the problem that freeing a very large value could block the whole of Redis. If a single Redis instance still becomes a bottleneck under heavy request load, Redis's cluster and high-availability solutions can take over, with the master node responsible for writes and the replicas responsible for reads. I'll stop at I/O multiplexing here and cover cluster high availability in a separate article.
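As a quick illustration of that lazy freeing (a sketch assuming a local Redis >= 4.0 and the hiredis client): UNLINK removes the key from the keyspace immediately and lets a background thread reclaim the memory, whereas DEL would free it synchronously on the main thread:

```c
#include <hiredis/hiredis.h>

/* Lazy-free sketch: UNLINK a (hypothetical) huge key so the
 * memory is reclaimed by a background thread instead of
 * blocking the single main thread, as DEL would. */
int main(void) {
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (c == NULL || c->err) return 1;

    redisReply *r = redisCommand(c, "UNLINK %s", "huge:hash");
    if (r) freeReplyObject(r);

    redisFree(c);
    return 0;
}
```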
Copy-on-write
Efficient data structures and the I/O multiplexing model take care of data-access efficiency, but Redis also has a snapshot mechanism to guarantee that data is not lost, and taking a snapshot means writing to disk. How does Redis keep serving data operations while still guaranteeing the integrity of the snapshot, without sacrificing access efficiency?
The answer is copy-on-write. What is copy-on-write? If you are a student, or your operating-system fundamentals are solid, the answer will be clear.
In operating-system design, process memory is divided into virtual memory and physical memory. (What is virtual memory? See my previous article, Virtual Memory.) Redis creates a child process from the main process through the fork() system call; the parent's virtual-to-physical memory mappings are copied to the child, and the underlying pages are marked as shared. The child is then solely responsible for writing the in-memory data out as an RDB snapshot. If the main process modifies memory during this time, copy-on-write kicks in: a private copy of the affected page is created for the writer, while the child continues to persist the original.
As the figure above shows, the main process keeps serving requests, and only when someone modifies the current in-memory data is the affected memory page copied, which achieves exactly the goal of creating a point-in-time snapshot.
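A minimal copy-on-write demo with fork() (illustration only, not Redis code): the parent's write after fork() triggers a page copy in the kernel, so the child keeps seeing the fork-time value, which is exactly the point-in-time view an RDB snapshot needs:

```c
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* After fork(), parent and child share the same physical pages.
 * The parent's write below forces the kernel to copy the page,
 * so the child still prints the value from the moment of fork. */
int main(void) {
    static char data[] = "value-at-fork-time";

    pid_t pid = fork();
    if (pid == 0) {                   /* child: the "snapshotter" */
        sleep(1);                     /* let the parent write first */
        printf("child sees:  %s\n", data);  /* still the old value */
        return 0;
    }
    strcpy(data, "value-after-fork"); /* parent mutates -> COW copy */
    printf("parent sees: %s\n", data);
    wait(NULL);
    return 0;
}
```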
Pipelined communication
Besides the server-side memory and data-structure operations, the network also affects client read/write efficiency. The Redis communication protocol is a text protocol (RESP); if you're interested, study it on your own, as I won't go into it here. Every time the client performs an operation, the command and its arguments are packed into the Redis protocol and transmitted to the server.
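For the curious, here is roughly what that packing looks like: a command is sent as a RESP array of bulk strings, shown here as a C string literal for illustration:

```c
/* "SET key value" on the wire in RESP (the Redis serialization
 * protocol): an array of three bulk strings. */
const char *wire =
    "*3\r\n"         /* array of 3 elements       */
    "$3\r\nSET\r\n"  /* bulk string of length 3   */
    "$3\r\nkey\r\n"
    "$5\r\nvalue\r\n";
```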
The time spent on each command is roughly: client send time + server processing time + one network round trip.
As the figure above shows, every command costs one round of network I/O, so a client that operates on data frequently also performs network operations frequently, and this time-consuming process hurts performance. Redis therefore makes an optimization on the client side by introducing the concept of pipelining.
A pipeline executes multiple unrelated commands in one batch, reducing the network interaction time that executing the commands one by one would incur. It suits scenarios where data is operated on in bulk.
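A sketch of pipelining with the hiredis client (assuming a Redis server on localhost:6379): three commands are buffered locally, flushed in one write, and the replies are read back together:

```c
#include <hiredis/hiredis.h>

/* Pipelining sketch: one network round trip for three commands
 * instead of three separate round trips. */
int main(void) {
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (c == NULL || c->err) return 1;

    /* Queue three independent commands (nothing sent yet). */
    redisAppendCommand(c, "SET k1 v1");
    redisAppendCommand(c, "SET k2 v2");
    redisAppendCommand(c, "GET k1");

    /* redisGetReply flushes the buffer, then reads each reply. */
    redisReply *r;
    for (int i = 0; i < 3; i++) {
        if (redisGetReply(c, (void **)&r) == REDIS_OK)
            freeReplyObject(r);
    }
    redisFree(c);
    return 0;
}
```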
Summary
Simplicity is built on complexity! Although on the client side it all looks like a few simple API calls, there is a great deal of design behind them worth learning from. After reading this article you may have a new understanding of Redis's high performance; don't underestimate small optimizations and careful solution choices, as they can sometimes bring significant performance gains. Of course, this article does not cover all of Redis's design, for example the AOF log, kernel file-descriptor mapping, asynchronous writes to disk, zero-copy techniques, and so on. A future post will cover how Redis achieves high availability.