Select, poll, and epoll are all mechanisms for IO multiplexing. IO multiplexing is a mechanism that monitors multiple descriptors and notifies the program when a descriptor becomes ready (typically ready for reading or writing), so that the program can perform the corresponding read or write. Note that select, poll, and epoll are all synchronous I/O in nature, because after an event is reported as ready the program still has to do the read or write itself, and that read or write blocks; asynchronous I/O, by contrast, does not require the program to do the read or write itself, since the asynchronous I/O implementation takes care of copying the data from the kernel to user space. As for how to use these three multiplexers, the previous three summaries describe them clearly and test them with a server echo program.

IO multiplexing: select summary

IO multiplexing: poll summary

IO multiplexing: epoll summary

A comparison of the three is summarized as follows:

1, select implementation

The call process of select is as follows:

(1) The fd_set is copied from user space into kernel space using copy_from_user.

(2) The callback function __pollwait is registered.

(3) All fds are traversed and each fd's corresponding poll method is called (for a socket, this poll method is sock_poll, and sock_poll calls tcp_poll, udp_poll, or datagram_poll depending on the situation).

(4) Taking tcp_poll as an example, its core logic is __pollwait, the callback registered above.

(5) The main job of __pollwait is to attach current (the calling process) to the device's wait queue. Different devices have different wait queues; for tcp_poll, the wait queue is sk->sk_sleep. (Attaching current to the wait queue does not by itself put the process to sleep.) When the device receives a message (network device) or finishes filling in file data (disk device), it wakes up the processes sleeping on its wait queue, and current is woken up at that point.

(6) When the poll method returns, it returns a mask describing whether read and write operations are ready, and fd_set is filled in according to this mask.

(7) If, after iterating over all fds, no readable or writable mask has been returned, schedule_timeout is called to put the process that called select (i.e., current) to sleep. When the device driver finds that its own resources have become readable or writable, it wakes up the processes sleeping on its wait queue. If nothing wakes the process within the timeout (specified by schedule_timeout), the process that called select is woken up again, gets the CPU, and re-iterates over the fds to check whether any fd is ready.

(8) The fd_set is copied from kernel space back to user space.
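
To connect this kernel-side flow to user code, here is a minimal sketch of a select-based accept loop (my own illustration, assuming a hypothetical listen_fd that is already a listening TCP socket, not the echo server from the earlier summaries). Note how the fd_set has to be rebuilt before every call, which corresponds to the per-call copy in step (1):

```c
/* Minimal select() loop sketch (illustrative, not the article's echo server).
 * Assumes listen_fd is an already-created, listening TCP socket. */
#include <sys/select.h>
#include <sys/socket.h>
#include <stdio.h>
#include <unistd.h>

void select_loop(int listen_fd)
{
    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);                 /* fd_set is modified in place by the kernel,       */
        FD_SET(listen_fd, &rfds);       /* so it has to be rebuilt (and re-copied) per call */

        struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };

        /* Each call copies the fd_set into the kernel, walks every fd below
         * listen_fd + 1, and sleeps via schedule_timeout if nothing is ready. */
        int n = select(listen_fd + 1, &rfds, NULL, NULL, &tv);
        if (n < 0) {
            perror("select");
            return;
        }
        if (n == 0)
            continue;                   /* timeout: woken up with nothing ready */

        if (FD_ISSET(listen_fd, &rfds)) {
            int conn = accept(listen_fd, NULL, NULL);
            if (conn >= 0)
                close(conn);            /* real code would go on to read/write */
        }
    }
}
```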

The major disadvantages of select:

(1) Every call to select requires copying the fd set from user mode into kernel mode, which is expensive when there are many fds.

(2) Every call to select also requires the kernel to traverse all of the fds passed in, which is likewise expensive when there are many fds.

(3) The number of file descriptors supported by select is too small; the default is 1024.

2, poll implementation

The implementation of poll is very similar to that of select, except that it describes the fd collection differently: poll uses the pollfd structure instead of select's fd_set structure, and everything else is much the same.
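
For comparison, here is a minimal sketch of the same loop with poll (again my own illustration, under the same hypothetical listen_fd assumption): an array of struct pollfd replaces the fd_set, the per-call copy and full traversal remain, but the FD_SETSIZE cap does not apply:

```c
/* Minimal poll() loop sketch, mirroring the select example above.
 * Assumes listen_fd is an already-created, listening TCP socket. */
#include <poll.h>
#include <sys/socket.h>
#include <stdio.h>
#include <unistd.h>

void poll_loop(int listen_fd)
{
    /* A pollfd array replaces fd_set; its size is not capped at FD_SETSIZE. */
    struct pollfd fds[1] = {
        { .fd = listen_fd, .events = POLLIN },
    };

    for (;;) {
        /* The whole array is still copied into the kernel and traversed
         * on every call, just like select. */
        int n = poll(fds, 1, 5000 /* ms */);
        if (n < 0) {
            perror("poll");
            return;
        }
        if (n == 0)
            continue;                   /* timeout */

        if (fds[0].revents & POLLIN) {
            int conn = accept(listen_fd, NULL, NULL);
            if (conn >= 0)
                close(conn);            /* real code would go on to read/write */
        }
    }
}
```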

For an implementation analysis of select and poll, please refer to the following blog posts:

Select (poll) system call implementation

Select (poll) system call implementation (2)

Select (poll) system call implementation

Using event-driven model to implement efficient and stable web server programs

Introduction and comparison of several network server models

Select function implementation principle analysis

3, epoll

Epoll, being an improvement on select and poll, should avoid the three disadvantages above. So how does epoll do that? Before answering, let's look at how the call interface of epoll differs from that of select and poll. Select and poll each provide only a single function — select or poll. Epoll provides three functions: epoll_create, epoll_ctl, and epoll_wait. epoll_create creates an epoll handle; epoll_ctl registers the type of event to listen for; epoll_wait waits for an event to occur.
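
To make the three-function interface concrete, here is a small sketch (my own, assuming the same hypothetical listen_fd) that creates an epoll handle and registers interest in read events on it:

```c
/* Minimal epoll setup sketch: create the handle and register one fd.
 * Assumes listen_fd is an already-created, listening TCP socket. */
#include <sys/epoll.h>
#include <stdio.h>
#include <unistd.h>

int setup_epoll(int listen_fd)
{
    int epfd = epoll_create1(0);        /* creates the epoll handle */
    if (epfd < 0) {
        perror("epoll_create1");
        return -1;
    }

    struct epoll_event ev = {
        .events = EPOLLIN,              /* interested in readability */
        .data.fd = listen_fd,
    };

    /* EPOLL_CTL_ADD hands this fd to the kernel once, at registration time;
     * later epoll_wait calls do not copy the fd set again. */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev) < 0) {
        perror("epoll_ctl");
        close(epfd);
        return -1;
    }
    return epfd;
}
```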

For the first shortcoming, epoll's solution lies in the epoll_ctl function. Each time a new event is registered with the epoll handle (by specifying EPOLL_CTL_ADD in epoll_ctl), the corresponding fd is copied into the kernel at that point, rather than being copied over and over again during epoll_wait. Epoll thus guarantees that each fd is copied only once during the entire process.

For the second disadvantage, epoll's solution, unlike select or poll, does not add current to each fd's device wait queue on every call; it hangs current there only once, at epoll_ctl time (this one time is essential), and assigns a callback function to each fd. When the device becomes ready and wakes the waiters on its wait queue, this callback is invoked, and it adds the ready fd to a ready linked list. The real job of epoll_wait is simply to check whether there are any ready fds in this ready list (it uses schedule_timeout() to sleep for a while and check for a while, similar to step (7) of the select implementation).
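
Continuing the setup sketch above, the wait loop only has to walk the events that epoll_wait reports, i.e. the fds that the callbacks have already placed on the ready list, rather than every registered fd:

```c
/* Minimal epoll_wait loop sketch, continuing from setup_epoll() above. */
#include <sys/epoll.h>
#include <sys/socket.h>
#include <stdio.h>
#include <unistd.h>

void epoll_loop(int epfd, int listen_fd)
{
    struct epoll_event events[64];      /* only ready events are returned here */

    for (;;) {
        int n = epoll_wait(epfd, events, 64, 5000 /* ms */);
        if (n < 0) {
            perror("epoll_wait");
            return;
        }

        /* Iterate over the n ready fds only, not over every registered fd. */
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == listen_fd) {
                int conn = accept(listen_fd, NULL, NULL);
                if (conn >= 0)
                    close(conn);        /* real code would register conn with epoll_ctl */
            }
        }
    }
}
```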

For the third disadvantage, epoll does not have this limit. The maximum number of fds it supports is the maximum number of files that can be opened, which is generally far larger than 2048; for example, it is around 100,000 on a machine with 1GB of memory. The exact number can be checked with cat /proc/sys/fs/file-max; in general it depends heavily on the system's memory.
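
As a small side illustration (not part of the original summaries), these limits can also be read programmatically: /proc/sys/fs/file-max is the system-wide figure referred to above, while getrlimit(RLIMIT_NOFILE) gives the per-process cap:

```c
/* Print the system-wide open-file limit (/proc/sys/fs/file-max) and the
 * per-process limit (RLIMIT_NOFILE). Illustrative only. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    FILE *f = fopen("/proc/sys/fs/file-max", "r");
    if (f) {
        unsigned long long file_max = 0;
        if (fscanf(f, "%llu", &file_max) == 1)
            printf("system-wide file-max: %llu\n", file_max);
        fclose(f);
    }

    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
        printf("per-process RLIMIT_NOFILE: soft=%llu hard=%llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
    return 0;
}
```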

Conclusion:

(1) The select and poll implementations have to keep polling the entire fd collection themselves until a device becomes ready, possibly alternating between sleeping and waking several times along the way. Epoll likewise keeps calling epoll_wait to poll the ready list, and it too may alternate between sleeping and waking several times, but when a device becomes ready epoll invokes the callback function, which puts the ready fd on the ready list and wakes up the process that went to sleep in epoll_wait. Although both approaches alternate between sleeping and waking, select and poll have to traverse the whole fd collection while awake, whereas epoll only has to check whether the ready list is empty, which saves a great deal of CPU time. This is the performance benefit the callback mechanism brings.

(2) Select and poll have to copy the fd collection from user mode to kernel mode once on every call, and also have to hang current on the device wait queues once per call, whereas epoll needs only one copy, and hangs current on a wait queue only once (at the start of epoll_wait; note that the wait queue here is not a device wait queue, just a wait queue defined internally by epoll). This also saves a good deal of overhead.

References:

Select source code implementation analysis

Select, poll, epoll implementation analysis – combined with kernel source code

Select, poll, epoll comparison

Select, poll, and epoll use summary

Use epoll a complete C language source example