Recently in the UNIX network programming and research Redis implementation, Redis source code is very suitable for reading and analysis, which I/O multiplexing (Mutiplexing) part of the implementation is very clean and elegant, here would like to this part of the content of a simple arrangement. (Easter eggs at the end!)

Several I/O models

Why use I/O multiplexing in Redis?

First of all, the Redis is run in a single thread, all operations are carried out in accordance with the order linear, but due to the read and write operations waiting for user input or output is blocked, so the I/O operations in general often cannot return directly, which can lead to a particular file I/O block causes the entire process cannot provide services to other customers, And I/O multiplexing is to solve this problem.

Blocking I/O

Let’s take a look at how the traditional blocking I/O model works: When read or write is used to read or write a File Descriptor, if the current FD is unreadable or unwritable, the Redis service does not respond to other operations and becomes unavailable.

This is the traditional blocking model we use most often in programming:

The blocking model, while very common and well understood in development, is often not used when multiple client tasks need to be handled because it affects other FD corresponding services.

I/O multiplexing

There are many other I/O models, but none of them are covered here.

We need a more efficient I/O model to support multiple Clients of Redis (RedIS-CLI). Here is the I/O multiplexing model:

In the I/O multiplexing model, the most important function call is select, which monitors the readability and writability of multiple file descriptors at the same time. When some file descriptors are readable or writable, select returns the number of file descriptors that are readable and writable.

Select about the specific use of the method, there is a lot of information on the network, here is not more to expand the introduction;

There are other I/O multiplexing functions, epoll/kqueue/ EVport, that perform better than SELECT and support more services.

Reactor design pattern

The Redis service uses the Reactor approach to implement file event handlers (each network connection corresponds to a file descriptor)

The file event handler uses the I/O multiplexing module to listen for multiple FDS at the same time. When accept, Read, write, and close file events occur, the file event handler calls back to the FD-bound event handler.

Although the whole file event processor runs on a single thread, the I/O multiplexing module is introduced to realize the monitoring of multiple FD reads and writes at the same time, which improves the performance of the network communication model and ensures the simplicity of the whole Redis service implementation.

I/O multiplexing module

The I/O multiplexing module encapsulates the underlying I/O multiplexing functions such as SELECT, Epoll, Avport and Kqueue and provides the same interface for the upper layer.

Here we briefly describe how Redis wraps SELECT and epoll, and briefly understand the functionality of the module. The entire I/O multiplexing module smoothen out the differences between I/O multiplexing functions on different platforms and provides the same interface:

  • static int aeApiCreate(aeEventLoop *eventLoop)
  • static int aeApiResize(aeEventLoop *eventLoop, int setsize)
  • static void aeApiFree(aeEventLoop *eventLoop)
  • static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask)
  • static void aeApiDelEvent(aeEventLoop *eventLoop, int fd, int mask)
  • static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp)

At the same time, because the parameters required by each function are different, we use an aeApiState inside each sub-module to store the required context information:

// select
typedef struct aeApiState {
    fd_set rfds, wfds;
    fd_set _rfds, _wfds;
} aeApiState;

// epoll
typedef struct aeApiState {
    int epfd;
    struct epoll_event *events;
} aeApiState;
Copy the code

This context information is stored in the void *state of the eventLoop, is not exposed to the upper layer and is only used in the current submodule.

Encapsulating the SELECT function

Select can monitor FD readability, writability, and error conditions.

Before introducing how the I/O multiplexing module encapsulates the SELECT function, let’s look at the general flow of the select function:

int fd = /* file descriptor */

fd_set rfds;
FD_ZERO(&rfds);
FD_SET(fd, &rfds)

for(; ;) { select(fd+1, &rfds, NULL, NULL, NULL);if (FD_ISSET(fd, &rfds)) {
        /* file descriptor `fd` becomes readable */
    }
}
Copy the code
  1. Initialize a readablefd_setCollection that holds FDS that need to be monitored for readability;
  2. useFD_SET 将 fdjoinrfds;
  3. callselectMethods to monitorrfdsWhether FD in is readable;
  4. whenselectWhen you return, check the status of the FD and complete the operation.

In the ae_SELECT file of Redis, the code is organized in a similar order, first initializing RFDS and WFDS in the aeApiCreate function:

static int aeApiCreate(aeEventLoop *eventLoop) {
    aeApiState *state = zmalloc(sizeof(aeApiState));
    if(! state)return- 1; FD_ZERO(&state->rfds); FD_ZERO(&state->wfds); eventLoop->apidata = state;return 0;
}
Copy the code

AeApiAddEvent and aeApiDelEvent will modify the flag bit of the corresponding FD in FD_SET and FD_CLR:

static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) {
    aeApiState *state = eventLoop->apidata;
    if (mask & AE_READABLE) FD_SET(fd,&state->rfds);
    if (mask & AE_WRITABLE) FD_SET(fd,&state->wfds);
    return 0;
}
Copy the code

The most important function in the entire AE_SELECT submodule is aeApiPoll. It is the part that actually calls the SELECT function. Its purpose is to add the corresponding FD to the aeEventLoop’s fired array when the I/O multiplexing function returns. And return the number of events:

static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
    aeApiState *state = eventLoop->apidata;
    int retval, j, numevents = 0;

    memcpy(&state->_rfds,&state->rfds,sizeof(fd_set));
    memcpy(&state->_wfds,&state->wfds,sizeof(fd_set));

    retval = select(eventLoop->maxfd+1,
                &state->_rfds,&state->_wfds,NULL,tvp);
    if (retval > 0) {
        for (j = 0; j <= eventLoop->maxfd; j++) {
            int mask = 0;
            aeFileEvent *fe = &eventLoop->events[j];

            if (fe->mask == AE_NONE) continue;
            if (fe->mask & AE_READABLE && FD_ISSET(j,&state->_rfds))
                mask |= AE_READABLE;
            if(fe->mask & AE_WRITABLE && FD_ISSET(j,&state->_wfds)) mask |= AE_WRITABLE; eventLoop->fired[numevents].fd = j; eventLoop->fired[numevents].mask = mask; numevents++; }}return numevents;
}
Copy the code

Encapsulate the epoll function

Redis encapsulates epoll similarly, using epoll_create to create an EPFD used in epoll:

static int aeApiCreate(aeEventLoop *eventLoop) {
    aeApiState *state = zmalloc(sizeof(aeApiState));

    if(! state)return- 1; state->events = zmalloc(sizeof(struct epoll_event)*eventLoop->setsize);if(! state->events) { zfree(state);return- 1; } state->epfd = epoll_create(1024); /* 1024 is just a hintfor the kernel */
    if (state->epfd == -1) {
        zfree(state->events);
        zfree(state);
        return- 1; } eventLoop->apidata = state;return 0;
}
Copy the code

Use epoll_ctl in aeApiAddEvent to add FDS to be monitored and events to be listened to in epFD:

static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) {
    aeApiState *state = eventLoop->apidata;
    struct epoll_event ee = {0}; /* avoid valgrind warning */
    /* If the fd was already monitored for some event, we need a MOD
     * operation. Otherwise we need an ADD operation. */
    int op = eventLoop->events[fd].mask == AE_NONE ?
            EPOLL_CTL_ADD : EPOLL_CTL_MOD;

    ee.events = 0;
    mask |= eventLoop->events[fd].mask; /* Merge old events */
    if (mask & AE_READABLE) ee.events |= EPOLLIN;
    if (mask & AE_WRITABLE) ee.events |= EPOLLOUT;
    ee.data.fd = fd;
    if (epoll_ctl(state->epfd,op,fd,&ee) == -1) return- 1;return 0;
}
Copy the code

Because epoll is slightly different from the SELECT mechanism, there is no need to traverse all FDS to see the read and write status when epoll_wait returns. An epoll_event array is provided when the epoll_wait function returns:

typedef union epoll_data { void *ptr; int fd; /* File descriptor */ uint32_t u32; uint64_t u64; } epoll_data_t; struct epoll_event { uint32_t events; /* Epoll event */ epoll_data_t data; };Copy the code

It stores the epoll event (EPOLLIN, EPOLLOUT, EPOLLERR, and EPOLLHUP) that occurred and the FD in which the event occurred.

The aeApiPoll function simply adds the information stored in the epoll_event array to the eventLoop fired array, passing the information to the upper module:

static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
    aeApiState *state = eventLoop->apidata;
    int retval, numevents = 0;

    retval = epoll_wait(state->epfd,state->events,eventLoop->setsize,
            tvp ? (tvp->tv_sec*1000 + tvp->tv_usec/1000) : -1);
    if (retval > 0) {
        int j;

        numevents = retval;
        for (j = 0; j < numevents; j++) {
            int mask = 0;
            struct epoll_event *e = state->events+j;

            if (e->events & EPOLLIN) mask |= AE_READABLE;
            if (e->events & EPOLLOUT) mask |= AE_WRITABLE;
            if (e->events & EPOLLERR) mask |= AE_WRITABLE;
            if(e->events & EPOLLHUP) mask |= AE_WRITABLE; eventLoop->fired[j].fd = e->data.fd; eventLoop->fired[j].mask = mask; }}return numevents;
}
Copy the code

Submodule selection

Because Redis needs to run on multiple platforms, in order to maximize the efficiency and performance of the execution, different I/O multiplexing functions will be selected as sub-modules according to the compilation platform to provide a unified interface to the upper layer. In Redis, we use macro definition to reasonably select different submodules:

#ifdef HAVE_EVPORT
#include "ae_evport.c"
#else
    #ifdef HAVE_EPOLL
    #include "ae_epoll.c"
    #else
        #ifdef HAVE_KQUEUE
        #include "ae_kqueue.c"
        #else
        #include "ae_select.c"
        #endif
    #endif
#endif

Copy the code

Because the SELECT function is a system call in the POSIX standard and is implemented on different versions of the operating system, it is used as a guarantee:

Redis will preferentially select time complexity asI/O multiplexing functions as the underlying implementation, including in Solaries 10evportIn Linux,epollAnd macOS/FreeBSDkqueueThese functions all use internal kernel constructs and can serve hundreds of thousands of file descriptors.

However, it is selected if the current compilation environment does not have the above functionselectAs an alternative, its time complexity is poor because it scans all listened descriptors when used, and only 1024 file descriptors can be served at the same timeselectAs the first option.

conclusion

Redis for I/O multiplexing module design is very simple, through macro to ensure that THE I/O multiplexing module has excellent performance on different platforms, different I/O multiplexing functions packaged into the same API for the upper layer to use.

The whole module enables Redis to serve thousands of file descriptors while running in a single process, avoiding the increase in code implementation complexity caused by the introduction of multi-process applications and reducing the possibility of errors.