Redis is single-threaded in the sense that its network IO and key-value pair reads and writes are completed by one thread, which is also the main thread through which Redis provides its key-value storage service externally. Other Redis functions, such as persistence, asynchronous deletion, and cluster data synchronization, are actually performed by additional threads.
1. Why does Redis use a single thread?
When writing everyday programs, we often hear the saying: "Using multithreading can increase system throughput or improve system scalability." Indeed, for a multithreaded system, with a reasonable allocation of resources, increasing the number of entities handling requests increases the number of requests the system can process simultaneously, that is, its throughput. The left figure below shows what we expect when we adopt multithreading.
But note that often, when we adopt multithreading without a good system design, what we actually get is something like this: when we first increase the number of threads, throughput rises, but as we add more threads, throughput growth slows and sometimes throughput even drops.
Why does this happen? A key bottleneck is that there is often a shared resource, such as a shared data structure, that can be accessed simultaneously by multiple threads. When multiple threads modify the shared resource, additional mechanisms are required to ensure that the shared resource is correct, and this additional mechanism incurs additional overhead.
Take Redis's List data type, which provides the LPUSH and LPOP operations, as an example. Suppose Redis adopted a multi-threaded design, as shown in the figure below, with two threads A and B. Thread A performs LPUSH on a List and increments the recorded queue length by one. At the same time, thread B performs LPOP on the same List and decrements the length by one. To keep the queue length correct, Redis needs to serialize the LPUSH of thread A and the LPOP of thread B so that it can record their changes to the List's length without error; otherwise, we might end up with the wrong length. This is the problem of concurrent access control over shared resources in multithreaded programming.
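As a rough illustration of this serialization requirement (a hypothetical Python sketch, not Redis's actual C implementation), consider two groups of threads pushing to and popping from a shared list while maintaining an explicit length counter. The lock is what keeps the recorded length consistent with the real one:

```python
import threading
from collections import deque

# Hypothetical shared state: a list plus an explicitly tracked length,
# mimicking how a multi-threaded Redis would have to guard a List's metadata.
items = deque()
length = 0
lock = threading.Lock()

def lpush(value):
    global length
    with lock:  # serialize LPUSH against any concurrent LPOP
        items.appendleft(value)
        length += 1

def lpop():
    global length
    with lock:  # without this lock, the tracked length could drift
        if items:
            length -= 1
            return items.popleft()
        return None

threads = [threading.Thread(target=lpush, args=(i,)) for i in range(100)]
threads += [threading.Thread(target=lpop) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With the lock, the tracked length always matches the real list length.
assert length == len(items)
```

This is exactly the overhead the text describes: every LPUSH and LPOP now pays for lock acquisition, and under contention most threads simply wait.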
Concurrent access control has always been a hard problem in multi-threaded development. Without a careful design, for example if you simply use a coarse-grained mutex, the result is not ideal: even with more threads added, most threads spend their time waiting for the mutex, access to the shared resource is still serialized, and system throughput does not grow with the number of threads.
Moreover, multithreaded development generally introduces synchronization primitives to protect concurrent access to shared resources, which also reduces the debuggability and maintainability of system code. To avoid these problems, Redis goes straight to single-threaded mode.
2. Why is single-threaded Redis so fast?
In general, the processing power of a single thread is much lower than that of multiple threads, yet Redis uses a single-threaded model to handle hundreds of thousands of operations per second. Why? In fact, this is the combined result of many of Redis's design choices.
On the one hand, most of Redis's operations are completed in memory, and it uses efficient data structures, such as hash tables and skip lists, which is an important reason for its high performance. On the other hand, Redis adopts a multiplexing mechanism, allowing it to concurrently handle a large number of client requests in its network IO operations and achieve high throughput. Next, we will focus on the multiplexing mechanism. First, we need to understand the basic IO model of network operations and its potential blocking points. After all, Redis uses a single thread for IO, and if that thread blocks, no multiplexing is possible.
Basic IO model and blocking points
To process a GET request, Redis needs to listen for client connection requests (bind/listen), establish a connection with the client (accept), read the request from the socket (recv), parse the request sent by the client (parse), read the key-value data according to the request type (get), and finally write the returned data back to the socket (send).
The following figure shows this process, where bind/listen, accept, recv, parse, and send are network IO operations, and get is a key-value data operation. Since Redis is single-threaded, the most basic implementation is to perform these operations in sequence in a single thread.
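The sequential flow above can be sketched in Python (illustrative only; real Redis implements this in C with its own RESP protocol, and the `GET key` text format here is a simplification):

```python
import socket

def make_listener(host="127.0.0.1"):
    """Set up the listening socket: bind + listen."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))   # bind (port 0 = let the OS pick one)
    srv.listen(1)         # listen
    return srv

def serve_one_get(srv, store):
    """Handle a single GET request, step by step, in one thread."""
    conn, _ = srv.accept()           # accept: blocks until a client connects
    data = conn.recv(1024)           # recv: blocks until request data arrives
    key = data.decode().split()[1]   # parse, e.g. b"GET name" -> "name"
    value = store.get(key, "(nil)")  # get: the actual key-value lookup
    conn.sendall(value.encode())     # send: write the result back
    conn.close()
    srv.close()
```

Every step runs in order on one thread, so a stall in `accept()` or `recv()` stalls everything, which is exactly the problem discussed next.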
However, there are potential blocking points in the network IO operations here, namely accept() and recv(). If Redis is waiting for a connection request from a client but no connection is ever established, the thread blocks in accept(), preventing other clients from establishing connections with Redis. Similarly, when Redis reads data from a client via recv(), the thread blocks in recv() if the data never arrives.
This blocks the entire Redis thread and prevents it from processing other client requests, which is inefficient. Fortunately, however, the socket network model itself supports a non-blocking mode.
Non-blocking mode
In the socket network model, non-blocking mode is mainly reflected in three key function calls. If you want to use sockets in non-blocking mode, you must understand the return values of these three functions and how the mode is set. Let's focus on them now.
In the socket model, different calls return different types of sockets. (A socket can be regarded as an endpoint for two-way communication between processes, possibly on different hosts; simply put, it is a convention between the two communicating parties, who complete the communication process using the socket-related functions.) Calling socket() returns an active socket; calling listen() converts the active socket into a listening socket, which listens for connection requests from clients; finally, calling accept() receives an incoming client connection and returns a connected socket.
For the listening socket, we can set non-blocking mode: when Redis calls accept() but no connection request has arrived, the Redis thread can return and process other operations instead of waiting forever. Note, however, that the listening socket must already exist when accept() is called.
And although the Redis thread no longer has to wait, there still needs to be a mechanism that keeps watching for subsequent connection requests on the listening socket and notifies Redis when one arrives.
Similarly, we can set non-blocking mode for connected sockets: after Redis calls recv(), if no data has arrived on the connected socket, the Redis thread can likewise return and process other operations. Again, we need a mechanism that keeps monitoring the connected socket and notifies Redis when data arrives.
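Non-blocking mode is easy to see in isolation. In this Python sketch (an analogy to the behavior described above, not Redis code), a listening socket is switched to non-blocking mode, so accept() with no pending connection returns immediately with an error instead of blocking the thread:

```python
import socket

# A listening socket in non-blocking mode: with no pending connection,
# accept() raises BlockingIOError immediately instead of blocking.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(8)
srv.setblocking(False)  # switch to non-blocking mode

free_to_work = False
try:
    srv.accept()        # no client yet: would have blocked forever before
except BlockingIOError:
    free_to_work = True # the thread gets control back right away
srv.close()

print("thread is free to do other work:", free_to_work)
```

The same `setblocking(False)` call applies to connected sockets, where recv() behaves analogously. What this snippet does not solve is the notification problem: the thread still has to find out, somehow, when a connection or data finally does arrive.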
This ensures that the Redis thread neither waits at a blocking point, as in the basic IO model, nor fails to process connection requests or data that actually arrive.
High-performance IO model based on multiplexing
The IO multiplexing mechanism refers to a single thread handling multiple IO streams; it is commonly known as the select/epoll mechanism. Simply put, this mechanism allows multiple listening sockets and connected sockets to exist in the kernel simultaneously while Redis runs on a single thread. The kernel continuously monitors these sockets for connection requests or data requests. As soon as a request arrives, it is handed over to the Redis thread for processing, achieving the effect of one Redis thread handling multiple IO streams.
The following figure shows the Redis IO model based on multiplexing. The multiple FDs in the figure are the multiple sockets just mentioned. The Redis network framework calls the epoll mechanism to have the kernel monitor these sockets. At this point, the Redis thread does not block on any particular listening or connected socket, that is, on any particular client request. Because of this, Redis can connect to multiple clients and process their requests at the same time, increasing concurrency.
To notify the Redis thread when a request arrives, select/epoll provides an event-based callback mechanism that invokes the corresponding handler for each type of event.
How does the callback mechanism work? In fact, select/epoll triggers an event whenever it detects that a request has arrived on an FD.
These events are put into an event queue, which the single Redis thread processes continuously. This way, Redis doesn't have to keep polling to check whether a request has actually arrived, avoiding wasted CPU resources. At the same time, when processing an event from the queue, Redis invokes the corresponding handler function, which implements the event-based callback. Because Redis is always processing the event queue, it can respond to client requests in a timely manner, improving its response performance.
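The event-dispatch loop described above can be sketched with Python's stdlib `selectors` module, which wraps epoll on Linux. The handler names `handle_accept` and `handle_read` are illustrative, not Redis's actual function names:

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # epoll on Linux
store = {"name": "redis"}

def handle_accept(srv):
    """Callback for an Accept event: a new client connection is ready."""
    conn, _ = srv.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, handle_read)

def handle_read(conn):
    """Callback for a Read event: request data has arrived."""
    data = conn.recv(1024)
    if data:
        key = data.decode().split()[1]            # parse e.g. "GET name"
        conn.sendall(store.get(key, "(nil)").encode())
    sel.unregister(conn)
    conn.close()

# Non-blocking listening socket, registered with its accept callback.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(8)
srv.setblocking(False)
sel.register(srv, selectors.EVENT_READ, handle_accept)

def run_once(timeout=None):
    # The "event queue": each ready event is dispatched to its callback.
    for key, _ in sel.select(timeout):
        key.data(key.fileobj)
```

Calling `run_once()` in a loop gives the single-threaded behavior the text describes: the thread sleeps in `select()` instead of polling, and each ready FD is dispatched to the callback registered for it.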
For clarity, let me use a connection request and a data-read request as examples.
These two requests correspond to the Accept and Read events, respectively, for which Redis registers the accept and get callback functions. When the Linux kernel detects a connection request or a read request, it triggers the Accept or Read event and calls back the corresponding accept or get function in Redis.
It’s like a patient going to the hospital. Each patient (equivalent to a request) needs to be triaged, temperature checked, registered, etc., before the doctor can actually diagnose them. If all this work is done by doctors, doctors will be less efficient. As a result, hospitals have set up a triage desk that handles the pre-diagnosis work all the time (similar to the Linux kernel listening for requests) and then passes it on to the doctor for actual diagnosis. In this way, even a single doctor (equivalent to a single Redis thread) can improve efficiency.
Finally, note that the multiplexing mechanism is available even if your application scenario involves different operating systems, because the mechanism has many implementations: select and epoll on Linux, kqueue on FreeBSD, and evport on Solaris. You can choose the appropriate multiplexing implementation based on the operating system actually running Redis.
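Python's `selectors` module makes this portability concrete: `DefaultSelector` picks the best multiplexing implementation available on the current OS (epoll on Linux, kqueue on FreeBSD and macOS, with plain select() as a portable fallback), so the same code runs everywhere:

```python
import selectors

# DefaultSelector chooses the most efficient mechanism the OS offers,
# mirroring how Redis selects epoll/kqueue/evport/select at build time.
sel = selectors.DefaultSelector()
print(type(sel).__name__)  # e.g. EpollSelector on Linux, KqueueSelector on macOS
```

Redis does the same thing at compile time in its ae event-loop layer, picking the best backend the platform provides.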