preface
In general, Redis is single-threaded, which means that the network IO and read/write key pairs of Redis are completed by a single thread. This is also the main process for Redis to provide external key and value storage service. But other functions, such as persistence and cluster data synchronization, are actually performed by additional threads.
Therefore, Redis is not single-threaded in the full sense of the word, but generally referred to as a typical example of single-threaded performance. So, a lot of people ask, why single threads? Why is a single thread so fast?
Why does Redis use single threads
First we need to understand the overhead of multithreading. Usually write procedures many people feel that the use of multithreading, can increase the throughput of the system, or increase the system scalability. Indeed, for a multithreaded system, under a reasonable allocation of resources, it is possible to increase the number of resource entities in the system to handle requests, and thus increase the number of requests that the system can handle at the same time, that is, throughput. However, if you don’t have good system design experience, you actually end up with an increase in throughput when you first increase the number of threads. However, when the thread is further increased, the throughput rate of the system will increase slowly, or even decrease.
Why does this happen? The key performance bottleneck is the simultaneous access of multiple threads to critical resources in the system. For example, when multiple threads need to modify a shared data, additional mechanisms such as mutex are needed to ensure that the resource is correct, and this additional mechanism requires additional overhead. For example, Redis has a data type, List, which provides queue exit (LPOP) and queue entry (LPUSH) operations. Assume that Redis has a multi-threaded design. Now assume that two threads, T1 and T2, T1 perform an LPUSH operation on a List, and T2 perform an LPOP operation on the List and subtract one from the queue length. To keep the queue length correct, we need to serialize the LPUSH and LPOP of these two threads, otherwise we might get the wrong length results. This is the problem of concurrent access control of shared resources often encountered in multithreaded programming.
And concurrency control is always a difficult problem in multi-threaded development. If you have no design experience and simply use a coarse-grained mutex, the undesirable result will be that even if you add more threads, most of the threads are still waiting for the mutex to access critical resources, resulting in parallelism and serial transformation, and the system throughput does not increase with the increase of threads.
Why is single-threaded Redis so fast
Typically, single-threaded processing is much less powerful than multi-threaded processing, but Redis can use the single-threaded model to achieve 100,000 levels of processing power per second. Why is that?
On the one hand, Redis does most of its operations in memory, plus it uses efficient data structures (such as hash tables, hop tables).
On the other hand, Redis adopts the multiplexing mechanism, which can concurrently process a large number of client requests in network IO operations, thus achieving high throughput. So why is Redis multiplexing?
In order to process a GET request, as shown in the figure above, Redis listens to the client request (bind/listen) and then establishes a connection with the client (accept). The socket reads the request (RECV), parses the request sent by the client, reads the key value data (GET) according to the request type, and finally returns the result to the client (SEND). Accept () and recv() are blocking operations by default. The accept() function is blocked when Redis listens for a connection request from a client, but fails to establish a connection. This can cause other clients to fail to establish a connection with Redis. Similarly, when Redist reads data from a client via recv(), Redis blocks at recv() if the data never arrives. As a result, the entire Redis thread is blocked and unable to process other client requests, which is extremely inefficient. Therefore, you need to set the socket to non-blocking.
Non-blocking mode of Redis
Non-blocking mode setting of the socket network model. FCNTL is generally called. Sample code is as follows
int flag flags = fcntl(fd, F_GETFL, 0); if(flags < 0) { ... } flags |= O_NONBLOCK; if(fcntl(fd, F_SETFL, flags) < 0) { ... return -1; }
Copy the code
The same non-blocking code in Redis’s anet.c file is similar logic. The anetSetBlock function is defined as follows:
int anetSetBlock(char *err, int fd, int non_block) { int flags; /* Set the socket blocking (if non_block is zero) or non-blocking. * Note that fcntl(2) for F_GETFL and F_SETFL can't be * interrupted by a signal. */ if ((flags = fcntl(fd, F_GETFL)) == -1) { anetSetError(err, "fcntl(F_GETFL): %s", strerror(errno)); return ANET_ERR; } /* Check if this flag has been set or unset, if so, * then there is no need to call fcntl to set/unset it again. */ if (!! (flags & O_NONBLOCK) == !! non_block) return ANET_OK; if (non_block) flags |= O_NONBLOCK; else flags &= ~O_NONBLOCK; if (fcntl(fd, F_SETFL, flags) == -1) { anetSetError(err, "fcntl(F_SETFL,O_NONBLOCK): %s", strerror(errno)); return ANET_ERR; } return ANET_OK; }Copy the code
When the socket is set to non-blocking mode and Redis invokes accept() but no connection request ever arrives, the Redis thread can go back and process other operations instead of waiting forever. Similarly, non-blocking mode can be set for connected sockets, and if recv() is called, the Redis thread can also return to handle other operations if no data has ever arrived on the connected socket. But we also need mechanisms to continue listening for that connected socket and notify Redis when data arrives. This ensures that the Redis thread does not wait at a chokepoint, as in the basic IO model, and does not cause Redis to fail to process connection requests that actually arrive.
Based on EPOLL mechanism
IO multiplexing in Linux refers to the ability of an executor to process multiple I/O streams at the same time. This is the commonly heard select/EPOLL mechanism. This mechanism allows multiple listening sockets and connected sockets to be allowed simultaneously in the kernel. The kernel always listens for connection requests on these sockets. Once a request arrives, it is handed over to the Redis thread.
The Redis network framework is based on the EPOLL mechanism, in which the Redis thread does not block on a particular listening or connected socket, and therefore does not block on a particular client request processing. Therefore, Redis can handle connection requests from multiple clients at the same time. The following figure
EPOLL provides a callback mechanism for events in order to notify the Redis thread when a request arrives. That is, the corresponding handlers are called for different events. So let’s talk about how it works. Okay
File events
Redis uses the following structure to record a file event:
/* File event structure */typedef struct aeFileEvent { int mask; /* one of AE_(READABLE|WRITABLE|BARRIER) */ aeFileProc *rfileProc; aeFileProc *wfileProc; void *clientData; } aeFileEvent;Copy the code
The mask is used to describe what happened in the structure:
-
AE_READABLE: The file descriptor is readable
-
AE_WRITABLE: the file descriptor is writable
-
AE_BARRIER: File descriptor blocks
So how does the callback mechanism work? RfileProc and wfileProc are callbacks to read and write events, respectively. Their corresponding functions are as follows
typedef void aeFileProc(struct aeEventLoop *eventLoop, int fd, void *clientData, int mask);
Copy the code
Event loop
Redis uses the following structure to record events registered in the system and their states:
/* State of an event based program */typedef struct aeEventLoop { int maxfd; /* highest file descriptor currently registered */ int setsize; /* max number of file descriptors tracked */ long long timeEventNextId; time_t lastTime; /* Used to detect system clock skew */ aeFileEvent *events; /* Registered events */ aeFiredEvent *fired; /* Fired events */ aeTimeEvent *timeEventHead; int stop; void *apidata; /* This is used for polling API specific data */ aeBeforeSleepProc *beforesleep; aeBeforeSleepProc *aftersleep; } aeEventLoop;Copy the code
The main elements in this structure are the file event pointer Events and the time event header pointer timeEventHead. The file event pointer points to a fixed-size (configurable) array with the file descriptor as a subscript to get the event object corresponding to the file.
AeApiAddEvent function
This function is mainly used to associate events with EPOLL, so the CTL method that calls EPOLL is defined as follows:
static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) { aeApiState *state = eventLoop->apidata; struct epoll_event ee = {0}; /* avoid valgrind warning */ /* If the fd was already monitored for some event, We need a MOD * operation. Otherwise we need an ADD operation. * This is a MOD action if an event/event is already associated. */ int op = eventLoop->events[fd].mask == AE_NONE ? EPOLL_CTL_ADD : EPOLL_CTL_MOD; ee.events = 0; mask |= eventLoop->events[fd].mask; /* Merge old events */ if (mask & AE_READABLE) ee.events |= EPOLLIN; if (mask & AE_WRITABLE) ee.events |= EPOLLOUT; ee.data.fd = fd; if (epoll_ctl(state->epfd,op,fd,&ee) == -1) return -1; return 0; }Copy the code
This is called when the Redis service creates a client request, registering a read event.
Redis calls prepareClientToWrite when it needs to write data to the client. This method mainly registers the write event corresponding to the FD.
If the registration fails, Redis does not write the data to the buffer.
If the corresponding suite word is writable, the Redis event loop writes the buffer’s new data to the socket.
Event registration function aeCreateFileEvent
This is the file event registration process, the function implementation is as follows
int aeCreateFileEvent(aeEventLoop *eventLoop, int fd, int mask, aeFileProc *proc, void *clientData){ if (fd >= eventLoop->setsize) { errno = ERANGE; return AE_ERR; } aeFileEvent *fe = &eventLoop->events[fd]; if (aeApiAddEvent(eventLoop, fd, mask) == -1) return AE_ERR; fe->mask |= mask; if (mask & AE_READABLE) fe->rfileProc = proc; if (mask & AE_WRITABLE) fe->wfileProc = proc; fe->clientData = clientData; if (fd > eventLoop->maxfd) eventLoop->maxfd = fd; return AE_OK; }Copy the code
This function first gets the file event object based on the file descriptor, then adds the file descriptor of interest to the operating system (using the addApiAddEvent function mentioned above), and finally logs the callback function to the file event object. Thus, a single thread can listen for multiple file events at the same time, which is called IO multiplexing.
AeMain function
The main loop of the Redis event handler
void aeMain(aeEventLoop *eventLoop) { eventLoop->stop = 0; while (! EventLoop - > stop) {/ / handle events aeProcessEvents (eventLoop, AE_ALL_EVENTS | AE_CALL_BEFORE_SLEEP | AE_CALL_AFTER_SLEEP); }}Copy the code
This method eventually calls epoll_wait() to fetch the event and execute it.
These events are put into an event queue, which is continuously processed by the Redis single thread. For example, when a read request arrives, the read request corresponds to a read event. Redis registers the GET callback function for this event. When the kernel listens for the arrival of a read request, the read event is triggered and the corresponding Redis get function is called back.
Return data to the client
After Redis completes the request, Redis does not register a write file event after processing a request and then write the result back to the client in the event callback function. When it detects file events, Redis processes them by calling the rReadProc or writeProc callback functions. After processing, the data that needs to be written back to the client is cached in the memory.
typedef struct client { ... list *reply; /* List of reply objects to send to the client. */ ... int bufpos; char buf[PROTO_REPLY_CHUNK_BYTES]; } client;Copy the code
The data sent to the client is stored in two places:
-
The reply pointer holds the object to be sent;
-
Buf holds the data to be returned, and BUfpos indicates the location of the last byte in the data.
Note: If you can store it in buF, try to store it in the BUF byte array. If you can’t store it in BUF, store it in the Reply object array.
The write back client handles the write back logic by calling the following functions before entering the next waiting file event
int writeToClient(int fd, client *c, int handler_installed) { while(clientHasPendingReplies(c)) { if (c->bufpos > 0) { nwritten = write(fd,c->buf+c->sentlen,c->bufpos-c->sentlen); if (nwritten <= 0) break; c->sentlen += nwritten; totwritten += nwritten; if ((int)c->sentlen == c->bufpos) { c->bufpos = 0; c->sentlen = 0; } } else { o = listNodeValue(listFirst(c->reply)); objlen = o->used; if (objlen == 0) { c->reply_bytes -= o->size; listDelNode(c->reply,listFirst(c->reply)); continue; } nwritten = write(fd, o->buf + c->sentlen, objlen – c->sentlen); if (nwritten <= 0) break; c->sentlen += nwritten; totwritten += nwritten; }}}