Of the three I/O multiplexing methods (select, poll, and epoll), epoll generally performs best.

Using epoll

epoll dispatches I/O events by monitoring multiple registered file descriptors. Unlike poll, epoll provides not only the default level-triggered mechanism but also an edge-triggered mechanism, which can perform better.

epoll stores the events the user is interested in, for each file descriptor, in an event table inside the kernel, so there is no need to pass in the whole set of file descriptors or events on every call as select and poll require. (The epoll instance is this event table.)

To use epoll for network programming, three steps are usually required: 1) epoll_create, 2) epoll_ctl, and 3) epoll_wait.

epoll_create

The function prototype

 int epoll_create(int size);
 int epoll_create1(int flags);

An epoll instance is created with the epoll_create() function. Since Linux 2.6.8 the size parameter is ignored, but it must still be an integer greater than 0. Originally it gave the kernel a hint about how large the event table needed to be.

The epoll instance is then passed to epoll_ctl and epoll_wait. When it is no longer needed, for example when the server shuts down cleanly, call close() on it so that the kernel can reclaim the resources allocated to the instance.

In the original epoll_create implementation, the size parameter told the kernel how many file descriptors the caller expected to monitor, and the kernel used this information to initialize its data structures. In the current implementation the parameter is no longer needed because the kernel allocates the required data structures dynamically. Just take care that size is always an integer greater than 0.

epoll_create1() is used in the same way as epoll_create(). If flags is 0, it behaves exactly like epoll_create(), except that the obsolete size argument is dropped.

The return value

On success, a non-negative file descriptor referring to the new epoll instance is returned. On failure, -1 is returned and errno is set.
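
The creation and release steps described above can be sketched as follows. This is a minimal illustration, not a definitive implementation; the helper names make_epoll_instance and destroy_epoll_instance are made up for this example, and only epoll_create and close come from the text.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: creates an epoll instance and returns its file
 * descriptor, or -1 on failure (errno is set by the failing call). */
int make_epoll_instance(void)
{
    /* The size argument is ignored by current kernels but must be > 0. */
    int epfd = epoll_create(1);
    if (epfd == -1) {
        perror("epoll_create");
    }
    return epfd;
}

/* Hypothetical helper: releases the instance once it is no longer needed,
 * so the kernel can reclaim its resources. Returns 0 on success. */
int destroy_epoll_instance(int epfd)
{
    return close(epfd);
}
```

A caller would typically create the instance once at startup, pass the returned descriptor to epoll_ctl and epoll_wait, and destroy it during shutdown.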

epoll_ctl

The function prototype

 int epoll_ctl(int epfd, int op, int fd, struct epoll_event* event);

After an epoll instance is created, you can add, modify, or remove monitored events on it by calling epoll_ctl.

Parameters

1) The epfd parameter is the file descriptor of the epoll instance just created by calling epoll_create(), that is, the epoll handle.

2) The op parameter indicates whether to add, remove, or modify a monitored event. It has three options:

  • EPOLL_CTL_ADD: Registers the event corresponding to the file descriptor with the epoll instance
  • EPOLL_CTL_DEL: Deletes the event corresponding to the file descriptor from the epoll instance
  • EPOLL_CTL_MOD: Modifies the event corresponding to the file descriptor

3) The fd parameter is the file descriptor whose events are being registered.

4) The event parameter describes the event types to register. The structure also carries user data; most commonly, the fd field of the union is set to the file descriptor that the event belongs to.

 typedef union epoll_data {
     void        *ptr;
     int          fd;
     uint32_t     u32;
     uint64_t     u64;
 } epoll_data_t;

 struct epoll_event {
     uint32_t     events;      /* Epoll events */
     epoll_data_t data;        /* User data variable */
 };

Like poll events, epoll events are bit masks. The event types include:

  • EPOLLIN: the file descriptor is readable.
  • EPOLLOUT: the file descriptor is writable.
  • EPOLLRDHUP: the peer closed the connection or shut down its writing half (half-close).
  • EPOLLHUP: a hang-up occurred on the file descriptor.
  • EPOLLET: requests edge-triggered notification; the default is level-triggered.

The return value

Returns 0 on success, -1 on failure, and sets errno.
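
To make the three op values concrete, here is a small sketch of wrappers around epoll_ctl. The function names watch_fd, rewatch_fd, and unwatch_fd are hypothetical and exist only for illustration; the sketch assumes the common convention of storing the file descriptor in data.fd.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: register fd with the given event mask. */
int watch_fd(int epfd, int fd, uint32_t events)
{
    struct epoll_event ev = {0};
    ev.events = events;   /* e.g. EPOLLIN, or EPOLLIN | EPOLLET */
    ev.data.fd = fd;      /* returned to us later by epoll_wait */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: replace the event mask of a registered fd. */
int rewatch_fd(int epfd, int fd, uint32_t events)
{
    struct epoll_event ev = {0};
    ev.events = events;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/* Hypothetical helper: stop monitoring fd. */
int unwatch_fd(int epfd, int fd)
{
    /* The event argument is ignored for EPOLL_CTL_DEL; a non-NULL
     * pointer is passed anyway for portability to very old kernels. */
    struct epoll_event ev = {0};
    return epoll_ctl(epfd, EPOLL_CTL_DEL, fd, &ev);
}
```

Each wrapper returns 0 on success and -1 on failure, mirroring epoll_ctl itself. Adding a descriptor that is already registered, or deleting one that is not, fails with -1.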

epoll_wait

The function prototype

 int epoll_wait(int epfd, struct epoll_event* events, int maxevents, int timeout);

Like select and poll, the epoll_wait() function blocks the caller while waiting for the kernel to deliver I/O events.

Parameters

1) The epfd parameter is the file descriptor of the epoll instance.

2) The events parameter is an array in user space into which the kernel writes the I/O events to be processed. The number of valid entries is given by the return value of epoll_wait, and each entry is one I/O event to be processed. (This array is used only to output the ready events detected by epoll_wait, unlike the array parameters of select and poll, which carry both the user-registered events and the ready events detected by the kernel.)

The events field of each entry takes the same event-type values that can be set in epoll_ctl.

Because epoll_wait returns only the ready events, the application can find ready file descriptors far more efficiently. (select and poll both require traversing all registered file descriptors to find the ready ones.)

3) The maxevents parameter is an integer greater than 0 giving the maximum number of events that epoll_wait may return.

4) The timeout parameter is how long, in milliseconds, epoll_wait may block. A value of -1 blocks until an event occurs. A value of 0 makes the call return immediately, even if no I/O event has occurred.

The return value

On success, returns the number of ready file descriptors; a return value of 0 means the timeout expired with no descriptor ready. On error, -1 is returned and errno is set.
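
A single round of waiting and dispatching might look like the sketch below. The helper name poll_once and the MAX_EVENTS constant are assumptions made for this example; it also assumes the descriptors were registered with their own value in data.fd.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 16   /* illustrative capacity of the output array */

/* Hypothetical helper: waits up to timeout_ms milliseconds, prints each
 * readable descriptor, and returns the number of ready descriptors
 * (0 on timeout, -1 on error). */
int poll_once(int epfd, int timeout_ms)
{
    struct epoll_event events[MAX_EVENTS];

    int n = epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);
    if (n == -1) {
        perror("epoll_wait");
        return -1;
    }

    /* Only the first n entries are valid; every one of them is ready,
     * so there is no need to scan descriptors that have no events. */
    for (int i = 0; i < n; i++) {
        if (events[i].events & EPOLLIN) {
            printf("fd %d is readable\n", events[i].data.fd);
        }
    }
    return n;
}
```

A real server would call such a function in a loop and replace the printf with its own read or accept handling.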

LT (Level Trigger) and ET (Edge Trigger)

epoll operates on file descriptors in two modes: LT (level-triggered) and ET (edge-triggered). LT is the default mode, in which epoll behaves like a more efficient poll. To use ET mode, include the EPOLLET flag when registering the file descriptor's events in the epoll kernel event table.

In LT mode, when epoll_wait detects an event on a file descriptor and notifies the application, the application does not have to process the event immediately. The next time the application calls epoll_wait, it reports the event again, and keeps doing so until the event is processed.

In ET mode, when epoll_wait detects an event on a file descriptor and notifies the application, the application must process the event immediately, because subsequent calls to epoll_wait will not report that event again.

Because ET mode greatly reduces the number of times the same epoll event is reported, it can be more efficient than LT mode.

Note: File descriptors used in ET mode should be non-blocking. In ET mode the application has to keep reading or writing until no more data can be transferred; with a blocking file descriptor, the last read or write call would block and stall the event loop, because no further event arrives to wake it.
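
The difference between the two modes can be observed with a small experiment. The sketch below is an illustration under stated assumptions: it uses a pipe instead of a socket, the function name count_notifications is made up, and the data is deliberately never read so that the event stays unprocessed.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical experiment: write one byte to a pipe, never read it, and
 * call epoll_wait `rounds` times with a zero timeout. Returns how many
 * calls reported the read end as ready, or -1 on setup failure.
 * mode_flags is 0 for LT or EPOLLET for ET. */
int count_notifications(uint32_t mode_flags, int rounds)
{
    int fds[2];
    if (pipe(fds) == -1) {
        return -1;
    }
    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    struct epoll_event ev = {0};
    ev.events = EPOLLIN | mode_flags;
    ev.data.fd = fds[0];

    int notified = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
        write(fds[1], "x", 1) == 1) {
        notified = 0;
        for (int i = 0; i < rounds; i++) {
            struct epoll_event out;
            if (epoll_wait(epfd, &out, 1, 0) == 1) {
                notified++;   /* the unread byte was reported again */
            }
        }
    }

    close(epfd);
    close(fds[0]);
    close(fds[1]);
    return notified;
}
```

With mode_flags set to 0 (LT), every call reports the unread data, so the count equals rounds. With EPOLLET, only the first call reports it, so the count is 1.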
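
One common way to meet this requirement is to switch the descriptor to non-blocking mode and then read until the kernel reports that nothing is left. The following is a minimal sketch of that pattern; the helper names set_nonblocking and drain_fd and the 4096-byte buffer are assumptions for illustration, and the data read is simply discarded.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode. Returns 0 on
 * success, -1 on failure. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: after an ET notification, read until the kernel
 * buffer is empty. Returns the number of bytes consumed, or -1 on a
 * real error. */
ssize_t drain_fd(int fd)
{
    char buf[4096];
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;       /* got data, keep going */
            continue;
        }
        if (n == 0) {
            break;            /* end of file: peer closed */
        }
        if (errno == EINTR) {
            continue;         /* interrupted by a signal, retry */
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            break;            /* buffer empty: wait for the next event */
        }
        return -1;            /* genuine error */
    }
    return total;
}
```

Because the descriptor is non-blocking, the final read returns -1 with EAGAIN instead of blocking, which is the signal to go back to epoll_wait.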

The experiment

The experiment is run against select and poll separately. In each test program, when data arrives from the peer, the program only prints a message and does not read the data from the buffer, which reveals whether select and poll behave in LT or ET mode.

With select, the program keeps printing the message in a loop.

With poll, the program also keeps printing the message in a loop.

In summary, both select and poll are level-triggered, i.e., they work in LT mode.
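
The poll half of this experiment can be approximated with the sketch below. It is not the original test program: it assumes a pipe in place of a network peer, counts notifications instead of printing in an endless loop, and the function name count_poll_notifications is hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical experiment: write one byte to a pipe, never read it, and
 * call poll `rounds` times with a zero timeout. Returns how many calls
 * reported the read end as readable, or -1 on setup failure. */
int count_poll_notifications(int rounds)
{
    int fds[2];
    if (pipe(fds) == -1) {
        return -1;
    }

    int notified = -1;
    if (write(fds[1], "x", 1) == 1) {
        notified = 0;
        for (int i = 0; i < rounds; i++) {
            struct pollfd pfd = { .fd = fds[0], .events = POLLIN, .revents = 0 };
            /* The data is left in the pipe on purpose. */
            if (poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN)) {
                notified++;
            }
        }
    }

    close(fds[0]);
    close(fds[1]);
    return notified;
}
```

Every call reports the unread byte again, so the count equals rounds, which matches the level-triggered behavior observed above.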