Netty source code analysis – epollWait and wake up for EPOLL

Preface

EpollEventLoop uses eventfd and timerfd in its select step. "Select" here is one of the three things the Reactor thread does, as described in the earlier article.

In the previous article, eventfd was mainly used to wake up epollWait, while timerfd, which blocks until its timer expires, was mainly used for timeout control. But epoll_wait already takes a timeout, so why use timerfd to control the timeout?

To answer up front: timerfd supports nanosecond resolution, whereas the epoll_wait timeout is expressed in milliseconds. So timerfd controls the timeout, and the timeout argument passed to epoll_wait is either 0 (return immediately, even when no event is ready) or -1 (wait indefinitely).
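The resolution gap can be made concrete with a small sketch. The helper names delay_to_epoll_ms and delay_to_timerspec are hypothetical and are not part of Netty; they only show what happens to the same delay when it is expressed in each unit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/timerfd.h>

/* Hypothetical helper: the delay as epoll_wait would see it, in whole
 * milliseconds. Anything below one millisecond is truncated away. */
int delay_to_epoll_ms(int64_t delay_ns) {
    assert(delay_ns >= 0);
    return (int) (delay_ns / 1000000);
}

/* Hypothetical helper: the same delay as a one-shot timerfd setting.
 * it_interval stays zero, so the timer fires once and does not repeat. */
struct itimerspec delay_to_timerspec(int64_t delay_ns) {
    struct itimerspec ts;

    assert(delay_ns >= 0);
    memset(&ts, 0, sizeof ts);
    ts.it_value.tv_sec = (time_t) (delay_ns / 1000000000);
    ts.it_value.tv_nsec = (long) (delay_ns % 1000000000);
    return ts;
}
```

A delay of 500,000 ns becomes 0 ms for epoll_wait, which would mean "return immediately", while the itimerspec keeps the full 500,000 ns.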

The source code

Let’s look at the EpollEventLoop initialization method:

FileDescriptor epollFd = null;
FileDescriptor eventFd = null;
FileDescriptor timerFd = null;
try {
    // Initialize epoll
    this.epollFd = epollFd = Native.newEpollCreate();
    // Initialize eventfd
    this.eventFd = eventFd = Native.newEventFd();
    try {
        Native.epollCtlAdd(epollFd.intValue(), eventFd.intValue(), Native.EPOLLIN);
    } catch (IOException e) {
    }
    // Initialize timerfd
    this.timerFd = timerFd = Native.newTimerFd();
    try {
        Native.epollCtlAdd(epollFd.intValue(), timerFd.intValue(), Native.EPOLLIN | Native.EPOLLET);
    } catch (IOException e) {
    }
    success = true;
} finally {
}

As you can see, epoll listens for I/O events on both timerfd and eventfd. This matters because both file descriptors show up again in the underlying system calls (the C code).

Now look at the run method. As always, it is an endless loop that runs until shutdown. Each iteration first computes a strategy that says what to do next, for example SELECT or BUSY_WAIT. The rule is: if no task is pending, do a blocking select; if tasks are pending, do a non-blocking select and then go on to process the tasks.
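The rule can be written down as a tiny decision function. This is a sketch in C rather than Netty's Java code: the names are hypothetical, the constants mirror the values of Netty's SelectStrategy (SELECT = -1, CONTINUE = -2, BUSY_WAIT = -3), and fake_select_now is only a stand-in for a non-blocking epoll_wait(..., 0).

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the SelectStrategy constants; non-negative values mean
 * "this many events are already ready". */
enum {
    STRATEGY_SELECT = -1,
    STRATEGY_CONTINUE = -2,
    STRATEGY_BUSY_WAIT = -3
};

/* Callback type for a non-blocking select that returns the ready count. */
typedef int (*select_now_fn)(void *ctx);

/* Stand-in for a non-blocking select: reports the count stored in ctx. */
int fake_select_now(void *ctx) {
    return *(int *) ctx;
}

/* With pending tasks: poll without blocking and return the ready count.
 * Without pending tasks: ask the loop to do a blocking select. */
int calculate_strategy(select_now_fn select_now, void *ctx, int has_tasks) {
    assert(select_now != NULL);
    return has_tasks ? select_now(ctx) : STRATEGY_SELECT;
}
```

The blocking path is only taken when the task queue is empty, so pending tasks never wait behind a blocked select.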

Let’s look at the two select methods:

case SelectStrategy.BUSY_WAIT:
    strategy = epollBusyWait();
    break;

case SelectStrategy.SELECT:
    strategy = epollWait(WAKEN_UP_UPDATER.getAndSet(this, 0) == 1);

The first calls epollBusyWait and the second calls epollWait. Note that the two end up in different native calls. Let's look at epollWait first, because epollBusyWait only optimizes the polling loop with low-level instructions.

epollWait(epollFd, events, timerFd, delaySeconds, delayNanos):

int ready = epollWait0(epollFd.intValue(), events.memoryAddress(), events.length(), timerFd.intValue(),
                       timeoutSec, timeoutNs);
if (ready < 0) {
    throw newIOException("epoll_wait", ready);
}
return ready;

This native method ends up executing the netty_epoll_native_epollWait0 function in the C source. A global search finds it; here is the implementation:

struct epoll_event *ev = (struct epoll_event*) (intptr_t) address;
int result, err;

if (tvSec == 0 && tvNsec == 0) {
    // Zero timeout: poll and return immediately
    do {
        result = epoll_wait(efd, ev, len, 0);
        if (result >= 0) {
            return result;
        }
    } while((err = errno) == EINTR);
} else {
    if (tvSec != ((jint) -1) && tvNsec != ((jint) -1)) {
        struct itimerspec ts;
        memset(&ts.it_interval, 0, sizeof(struct timespec));
        ts.it_value.tv_sec = tvSec;
        ts.it_value.tv_nsec = tvNsec;
        // Set our timeout. The second argument 0 means the time is counted from now,
        // so the timer fires after ts.it_value has elapsed
        if (timerfd_settime(timerFd, 0, &ts, NULL) < 0) {
            netty_unix_errors_throwChannelExceptionErrorNo(...);
            return -1;
        }
    }
    do {
        result = epoll_wait(efd, ev, len, -1);
        if (result > 0) {
            if (result == 1 && ev[0].data.fd == timerFd) {
                uint64_t timerFireCount;
                // We are in ET mode, so we need to read the value out so that we are
                // notified when new data comes in. See my previous article for details on how ET works
                result = read(timerFd, &timerFireCount, sizeof(uint64_t));
                return 0;
            }
            return result;
        }
    } while((err = errno) == EINTR);
}
return -err;

There are two branches. The first is simpler and returns immediately (the last argument to epoll_wait is 0). We are mainly interested in the blocking branch, where timerfd is used for timeout control.

As mentioned for the EpollEventLoop initialization, in addition to the epoll file descriptor we initialized two more file descriptors, eventfd and timerfd, and both were handed to epoll to listen for I/O. Let's first look at how timerfd controls the timeout.

The key line is result = epoll_wait(efd, ev, len, -1);. We will go through the logic around it in two parts: the C source and the Java source.

C source section

This is where we get to the heart of how epoll is controlled.

When select returns events, there are three cases:

  1. All selected events are socket I/O. Here result > 0, and even when result is 1 the check result == 1 && ev[0].data.fd == timerFd is false: the events are socket I/O, so timerfd does not appear in ev.
  2. The only selected event is timerfd I/O. As just mentioned, when the timeout expires, an expiration count is written into the timerfd file, so this case is in fact a timeout. The check result == 1 && ev[0].data.fd == timerFd is true: only the timerfd timeout wrote data, epoll detected it, and so the returned fd is of course timerfd.
  3. The selected events include both socket I/O and timerfd I/O. The check result == 1 && ev[0].data.fd == timerFd is false, because result is at least 2.

In the first case, ev[0].data.fd == timerFd is false even if result is 1 (a single socket event).
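The three cases can be spelled out as a small classifier. This is an illustrative sketch with hypothetical names; the Netty C code itself only needs the single check result == 1 && ev[0].data.fd == timerFd, which corresponds to WAKE_TIMEOUT_ONLY here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>

/* Hypothetical labels for the three cases. */
enum wake_kind {
    WAKE_SOCKETS_ONLY, /* case 1: only socket I/O            */
    WAKE_TIMEOUT_ONLY, /* case 2: only the timerfd fired     */
    WAKE_MIXED         /* case 3: socket I/O plus the timerfd */
};

/* Classifies the first `result` entries of ev, assuming each entry stores
 * its file descriptor in data.fd (as the registration code does). */
enum wake_kind classify_wakeup(const struct epoll_event *ev, int result, int timer_fd) {
    int saw_timer = 0;

    assert(ev != NULL && result > 0);
    for (int i = 0; i < result; i++) {
        if (ev[i].data.fd == timer_fd) {
            saw_timer = 1;
        }
    }
    if (!saw_timer) {
        return WAKE_SOCKETS_ONLY;
    }
    return result == 1 ? WAKE_TIMEOUT_ONLY : WAKE_MIXED;
}
```

Only WAKE_TIMEOUT_ONLY is turned into a return value of 0 by the native code; the other two cases return result unchanged.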

In the second case, where the only event is the timerfd event (a timeout), execution enters the following code block:

uint64_t timerFireCount;
result = read(timerFd, &timerFireCount, sizeof(uint64_t));
return 0;

Since the timer has already expired, read returns immediately. Note that we have to read the contents of timerfd here because we are in ET mode; only then can we receive the next notification. If you have forgotten how ET works, please refer to my previous article on how ET and LT work.

This shows how timerfd controls the epoll timeout. We let epoll listen to timerfd and then arm timerfd with a timeout that is exactly how long we want the select to block. The call result = epoll_wait(efd, ev, len, -1); then blocks for at most that long before it is woken up (returning the timerfd descriptor)!

That is how Netty uses epoll listening on timerfd for timeout control. The reason for using timerfd was explained above: timerfd can be set with nanosecond resolution, whereas the epoll_wait timeout only has millisecond resolution.

In the third case, socket events are mixed with the timerfd event. As in the first case, result is returned directly.

That raises a question: if the timerfd event is not read here, will we stop receiving new notifications from it in the future? If this question occurred to you, you have been reading carefully. The answer is in the Java section of the source.

Java source section

Look at the processReady method. When the native call returns successfully with a result other than 0, execution enters this method. As said above, the third case may mix socket I/O events with the timerfd event; here is how that is handled:

for (int i = 0; i < ready; i ++) {
    final int fd = events.fd(i);
    if (fd == eventFd.intValue()) {
        // Explained later, when we discuss what eventfd is for
        Native.eventFdRead(fd);
    } else if (fd == timerFd.intValue()) {
        // If socket I/O and timerfd I/O arrive together, the Reactor thread calls read once
        // so that we can still receive timerfd notifications under ET later
        Native.timerFdRead(fd);
    } else {
        final long ev = events.events(i);
        AbstractEpollChannel ch = channels.get(fd);
        if (ch != null) {
            AbstractEpollUnsafe unsafe = (AbstractEpollUnsafe) ch.unsafe();
            if ((ev & (Native.EPOLLERR | Native.EPOLLOUT)) != 0) {
                // Ready to write
                unsafe.epollOutReady();
            }
            if ((ev & (Native.EPOLLERR | Native.EPOLLIN)) != 0) {
                // Ready to read
                unsafe.epollInReady();
            }
            if ((ev & Native.EPOLLRDHUP) != 0) {
                // Peer closed the connection
                unsafe.epollRdHupReady();
            }
        } else {
            try {
                // If the channel is null, we no longer care about events for this channel,
                // so we remove the fd that belonged to this channel from epoll
                Native.epollCtlDel(epollFd.intValue(), fd);
            } catch (IOException ignore) {
            }
        }
    }
}

In the third case above, we still have to read the data out of timerfd so that we can receive its next I/O notification. Because the timerfd event arrived mixed with socket I/O events, the C code did not read it, so the Reactor thread performs the timerfd read here. Either way, the data that timerfd wrote when it timed out gets read.

We also see this in the code:

if (fd == eventFd.intValue()) {
    Native.eventFdRead(fd);
}

This involves the wake up process, which we’ll talk about in the next section.

The next step is to handle the socket I/O events. The code handles write and read events as well as EPOLLRDHUP, which is triggered (together with EPOLLIN) when the peer shuts down normally. EPOLLRDHUP is treated as a read event: as we will see in Netty, if the channel is still active, it is handled through the read path.

We will talk about event handling later. Here we look at what eventfd is for.

Using eventfd to control the epoll wake-up

Eventfd is also handed to epoll at initialization. We covered its usage earlier: one side writes and the other side reads. So if epoll_wait is blocked and we write to eventfd, epoll_wait returns right away, because it returns at least the eventfd event!

With that guess in mind, look directly at the wakeup method. As we said earlier when discussing the Reactor mechanism, when a task comes in we need to wake up the blocking select so that the new task does not sit behind a blocked select with no chance to execute:

if (!inEventLoop && WAKEN_UP_UPDATER.compareAndSet(this, 0, 1)) {
    Native.eventFdWrite(eventFd.intValue(), 1L);
}

Writing data to eventfd wakes up epoll_wait. Keep in mind that Netty's epoll uses ET by default, so after a write we must read the old data away in order to receive the next I/O notification from eventfd. This also resolves the point left open in the previous section:

if (fd == eventFd.intValue()) {
    Native.eventFdRead(fd);
}

Needless to say, this is the same routine as with timerfd: read the data away so that new data can be reported.
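The write side and the read side can be sketched together in C. The struct waker and its functions are hypothetical names, and this is a simplified, single-threaded illustration: waken_up plays the role of the flag behind WAKEN_UP_UPDATER, and waker_reset mirrors the getAndSet(this, 0) the loop performs before it blocks again. A real event loop also re-checks its task queue, which this sketch leaves out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical wake-up handle: an eventfd plus a "wake-up already requested" flag. */
struct waker {
    int evfd;
    atomic_int waken_up;
};

/* Creates the eventfd and registers it with epfd. Returns 0 on success, -1 on error. */
int waker_init(struct waker *w, int epfd) {
    struct epoll_event ev;

    assert(w != NULL);
    atomic_init(&w->waken_up, 0);
    w->evfd = eventfd(0, EFD_NONBLOCK);
    if (w->evfd < 0) return -1;

    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.fd = w->evfd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, w->evfd, &ev) < 0) {
        close(w->evfd);
        w->evfd = -1;
        return -1;
    }
    return 0;
}

/* Writes to the eventfd only on the 0 -> 1 transition of the flag, so that
 * repeated wake-up requests cost a single write. Returns 1 if it wrote. */
int waker_wakeup(struct waker *w) {
    int expected = 0;

    if (!atomic_compare_exchange_strong(&w->waken_up, &expected, 1)) {
        return 0; /* a wake-up is already pending */
    }
    return eventfd_write(w->evfd, 1) == 0 ? 1 : 0;
}

/* Reads the eventfd counter away. Returns the counter value, or 0 if nothing was pending. */
uint64_t waker_drain(struct waker *w) {
    eventfd_t value = 0;

    if (eventfd_read(w->evfd, &value) != 0) {
        return 0;
    }
    return (uint64_t) value;
}

/* Clears the flag before the next blocking wait and returns its previous value. */
int waker_reset(struct waker *w) {
    return atomic_exchange(&w->waken_up, 0);
}
```

After waker_wakeup, an epoll_wait on the same epoll instance returns with the eventfd ready; once waker_drain has read the counter, a further non-blocking epoll_wait reports nothing until the next write.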

Conclusion

Netty's epoll and NIO transports both rely on the Reactor model, as does KQueue. The three tasks of the Reactor thread, including the wake-up logic, are the same everywhere. The difference is that epoll relies on eventfd and timerfd, while NIO relies on the selector's wakeup.

In short, the two underlying fds serve the following purposes:

  • eventfd: wakes up a blocking select directly, on demand.
  • timerfd: wakes up a blocking select on a timer, when the timeout expires.

That concludes the walkthrough of the epoll select.