preface

As one of the most widely used NoSQL databases, Redis has undergone many upgrades, both large and small. Prior to version 4.0, single-threaded execution plus I/O multiplexing allowed Redis to reach very high performance. The author stated that it was designed to be single-threaded because the bottleneck of Redis is not the CPU, and a single thread avoids the locking overhead of multithreading. Over time, however, a single thread has become less suitable for some workloads: Redis 4.0 introduced an asynchronous thread to handle the main-thread blocking caused by deleting large keys, and Redis 6.0 introduced a multi-threaded mode because a single thread cannot exploit multi-core CPUs to meet higher concurrency. So it is becoming less and less accurate to say that Redis is single-threaded.

The event model

Redis itself is event-driven: it listens for file events and time events. A file event is an abstraction of a socket; each socket operation is abstracted into a file event. Redis built its own network event processor on the Reactor model. So what is the Reactor pattern?

communication

Consider a question: how does our server receive data? First, the two sides establish a TCP connection; once it is established, data can be sent and received. The sender writes data into its socket send buffer and waits for the kernel to take the data from the buffer and push it out through the network adapter. The receiver's network adapter copies arriving data into the corresponding socket receive buffer, where it waits for the application to fetch it. This is the data transmission process.

The cost of copying data

Because system calls are involved, the whole process shows that a piece of data must be copied from a user-space buffer into the kernel-space socket buffer on the sending side, and from the kernel-space socket buffer back into the user-space process on the receiving side. That is the cost of data copying.

How does arriving data find the right socket

The kernel maintains many sockets, so how does it know which socket to deliver data from the network adapter to? The answer is the port. A socket is identified by a 4-tuple: client IP + client port + server IP + server port. That is also why a single machine can run multiple networked programs at the same time.

How is the program notified to fetch data from the socket

When data has been copied from the network adapter into the corresponding socket buffer, how is the program notified to fetch it? And if the data has not arrived yet, what is the program doing? This comes down to how the CPU schedules processes. From the CPU's point of view, a process can be running, ready, or blocked.

  • Ready state: the process is waiting to execute; its resources are ready, and all that remains is for the CPU to schedule it.
  • Running state: the process is currently being executed by the CPU.
  • Blocked state: the process is waiting for some event to complete and does not occupy the CPU.

When there are multiple runnable processes, the CPU's time-slicing lets each of them execute for a short period, so they appear to run at the same time. This is called concurrency. When we create a socket connection, it looks something like this:

sockfd = socket(AF_INET, SOCK_STREAM, 0)
connect(sockfd, ...)
recv(sockfd, ...)
doSomething()

The operating system creates an fd handle for each socket. The fd points to the socket object we created, which contains the buffers, the process wait queue, and so on. For a process that creates a socket, if the data has not arrived, it gets stuck in recv: the process hangs itself on the socket object's wait queue, and to the CPU the process is blocked.

When the data arrives, the NIC notifies the CPU, and the CPU runs the interrupt handler, copies the data into the socket buffer, wakes up the process in the wait queue, and puts the process back into the run queue. When the CPU schedules the process again, it can perform the final read. There are two problems with this model:

  1. recv can only receive on one fd. What if you want to recv on multiple fds? You can loop over them one by one, but that is inefficient.
  2. Besides reading data, the process also has to run the logic that follows. When data has not arrived, the process is blocked; even if you loop over multiple fds, a blocking recv on one of them stalls the other sockets, and the whole process blocks with it.

In view of the above problems, the Reactor pattern and I/O multiplexing technology emerged.

Reactor

Reactor is a high-performance model for I/O processing. In Reactor, the main program only listens for events on file descriptors; this is important: the main program does not read or write file descriptors. So who does the reading and writing? When a readable or writable event occurs on a socket, the main program notifies a worker; it is the worker that actually reads and writes data on the socket. The advantage of this mode is that the main program stays lightweight and never blocks, so it can handle high concurrency, while events can be queued for workers to execute. The Reactor model registers each event together with its handler (a callback function) on the Reactor, for example:

type Reactor interface {
    RegisterHandler(event string, callback func())
}

// e.g. reactor.RegisterHandler("readEvent", ReadCallback)
//      reactor.RegisterHandler("writeEvent", WriteCallback)

When a client sends a set key value command to Redis, the command request is written into the socket buffer. When the Reactor sees that the socket is readable, it triggers a read event, and the command is parsed and executed through the ReadCallback injected in advance. When the socket buffer has enough space to write, the Reactor triggers a writable event and runs the injected WriteCallback: after set key value has executed, the worker writes OK into the socket buffer, and the client finally takes the OK out of it. In Redis, both ReadCallback and WriteCallback are executed by a single thread, so if events arrive at the same time they must queue. This was the default mode prior to Redis 6.0, and it is the single-threaded Redis most people know.

The whole flow shows why the Reactor main program is so fast: it never performs the real reads and writes. Everything else is the worker's job: I/O reading and writing, command parsing, command execution, and returning results. This is very important.

IO multiplexer

From the above we know that Reactor is an abstract model. How do we implement it? How do we listen for incoming socket events? The simplest solution is polling: since we do not know when a socket event will arrive, we keep asking the kernel. Suppose there are 10,000 socket connections; then we have to ask the kernel 10,000 times, which is obviously expensive.

Switching from user mode to kernel mode involves a context switch: the CPU has to save the user registers before entering the kernel and restore them when the kernel returns. This is not a small overhead.

Because traditional polling is too expensive, I/O multiplexers appeared: select, poll, evport, kqueue, and epoll. The Redis I/O multiplexing implementation defines rules with #include macros, so that at compile time the program automatically selects the highest-performance I/O multiplexing library available on the system as the underlying implementation:

/* Include the best multiplexing layer supported by this system.
 * The following should be ordered by performances, descending. */
#ifdef HAVE_EVPORT
#include "ae_evport.c"
#else
    #ifdef HAVE_EPOLL
    #include "ae_epoll.c"
    #else
        #ifdef HAVE_KQUEUE
        #include "ae_kqueue.c"
        #else
        #include "ae_select.c"
        #endif
    #endif
#endif

select was the first generation of I/O multiplexer. How does select solve the problem of repeatedly polling from user mode into the kernel?

select

Since polling one fd at a time is cumbersome, select hands the kernel a whole batch of socket fds at once. The kernel then iterates over the fds and checks the read/write status of each one. When some fd becomes ready, the user side can fetch and handle it.

fds := []int{fd1, fd2, ...}
for {
    select(fds)
    for i := 0; i < len(fds); i++ {
        if isReady(fds[i]) {
            read(fds[i])
        }
    }
}

Disadvantages of select: when a process listens on multiple sockets, the select call adds the process to the wait queue of every socket in the kernel (many sockets, one process). When any one socket has data, the CPU is notified, the process is woken from its blocked state and removed from all the socket wait queues, and it waits to be scheduled. When the CPU runs the process, the whole fds set has to be copied and traversed, because we do not know which fds actually received data; traversing the fds that received nothing is wasted work. Since every select call copies and traverses the whole socket set, a large number of sockets hurts overall efficiency, which is part of why select caps the set at 1024 fds.

epoll

Would it be more efficient if, instead of blindly traversing all sockets, we were told directly which socket fds have data? epoll emerged to solve exactly this problem:

epfd = epoll_create()
epoll_ctl(epfd, fd1, fd2, ...)
for {
    fds = epoll_wait()
    for fd := range fds {
        doSomething(fd)
    }
}
  1. First create an epoll object with epoll_create. It returns an fd handle which, like socket handles, is managed in the fds set.
  2. Use epoll_ctl to bind the socket fds to be monitored to the epoll object.
  3. Use epoll_wait to fetch the socket fds that have data. If no socket has data, the call blocks; when there is data, it returns the set of ready fds.
How does epoll do it?

First, a kernel socket is no longer bound to the user's process but to epoll. When socket data arrives, the interrupt handler appends the corresponding socket fd to epoll's ready queue and wakes up the process associated with epoll. When the CPU runs the process, it can fetch the sockets with events directly from the ready queue and perform the reads. Looking at the whole flow, the user program no longer traverses blindly and neither does the kernel; interrupts achieve the efficient behavior of "handle whoever has data".

Single-thread to multi-thread evolution

Single thread

Combining the Reactor idea with the high-performance epoll I/O model, Redis built a high-performance network I/O architecture: single-threaded I/O multiplexing. The I/O multiplexer receives network I/O events, and the events are queued up and processed one by one. This is the original single-threaded model. Why a single thread? Because single-threaded Redis can already sustain a load of about 100,000 QPS (less if you run complex collection operations), which satisfies most application scenarios, and a single thread does not have to deal with the locking problems of multithreading. If that does not meet your requirements, you can also configure sharding mode: letting different nodes handle different sharding keys allows your Redis servers to increase their load linearly as nodes are added.

Asynchronous thread

One problem in single-threaded mode: deleting a large set or hash is time-consuming (the memory is not contiguous), and in a single thread that means other commands queue up and wait. As the queue of waiting commands grows, bad things happen. Redis 4.0 introduced an asynchronous thread for large-key deletion: use unlink instead of del to perform the delete, and Redis will check whether the key needs to be deleted in the asynchronous thread (for example, when the number of set elements exceeds 64). If the value is large enough, it is freed in the asynchronous thread without affecting the main thread. flushall and flushdb also support an async mode. In addition, Redis can be configured to decide whether certain scenarios should be handled by the asynchronous thread (off by default):

lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no
  • lazyfree-lazy-eviction: when Redis evicts keys to bring memory back under maxmemory, the deletions are performed asynchronously. The drawback is that if the asynchronous deletion lags, memory is not freed in time.
  • lazyfree-lazy-expire: when a key with a TTL expires and is cleaned up by Redis, it is deleted by the asynchronous thread instead of synchronously.
  • replica-lazy-flush: when a node becomes a replica, it performs a flush to clear its own data. If the flush takes long, more data accumulates in the replication buffer and subsequent synchronization is slower; handing the replica's flush to the asynchronous thread speeds up synchronization.
  • lazyfree-lazy-server-del: this option covers implicit deletions inside commands such as RENAME key newkey. If newkey already exists, rename first deletes newkey's old value; if that old value is large, rename blocks. With this option on, the asynchronous thread does that deletion, so the main thread does not block.

multithreading

Redis single thread + asynchronous thread + sharding can already satisfy the majority of applications, but nothing stays fixed forever: Redis 6.0 still introduced a multi-threaded mode. By default, multithreaded mode is turned off.

# io-threads 4            # number of I/O worker threads
# io-threads-do-reads no

Where does multithreading come in?

We know from the above that reading from a socket requires copying data from kernel space to user space, and writing to a socket requires copying from user space into the kernel. Redis itself is very fast; the slow part is these socket I/O operations. When QPS is very high, single-threaded Redis cannot exploit the benefits of multi-core CPUs, so letting multiple threads share the I/O work is a good way to leverage multiple cores.

When enabled, the official recommendation (from the notes in redis.conf) is 2 or 3 I/O threads for a 4-core machine and 6 I/O threads for an 8-core machine:

So for instance if you have a four cores boxes, try to use 2 or 3 I/O threads, if you have a 8 cores, try to use 6 threads.

Principle of multithreading

Note that Redis multithreading only handles socket I/O reads and writes in multiple threads; commands are still executed in a single thread.

  1. Redis Server listens for client requests through the EventLoop. When a request arrives, the main thread does not parse it immediately, but puts the client into the global read queue clients_pending_read and tags each client with the CLIENT_PENDING_READ flag.
  2. The main thread then distributes all tasks across the I/O threads and the main thread itself using a round-robin (RR) policy.
  3. Each thread (the main thread and the I/O threads), guided by the CLIENT_PENDING_READ flag on the clients assigned to it, only reads and parses the request parameters; no command is executed.
  4. Each I/O thread maintains a local queue io_threads_list and a local atomic counter io_threads_pending; tasks between threads are isolated and do not overlap. When an I/O thread completes its tasks, io_threads_pending[index] = 0. When all io_threads_pending values are 0, the read phase is complete.
  5. When all reads are complete, the main thread traverses the clients_pending_read queue and performs the real command execution.
  6. After the commands have been read, parsed, and executed, the results must be sent back to the clients. The main thread adds the clients that need a response to the global clients_pending_write queue.
  7. The main thread traverses the clients_pending_write queue and again uses the round-robin (RR) policy to distribute the tasks across the I/O threads and itself; the threads write the response data back to the clients.

In multithreaded mode, each IO thread is responsible for processing its own queue and does not interfere with each other. IO threads are either reading or writing at the same time, but do not read or write at the same time. The main thread will only attempt to assign tasks again after all the child threads have completed their tasks. The final command execution is done by the main thread itself, and the whole process does not involve locking.