(1) select ==> time complexity O(n)

select only knows that some I/O event has happened, but it does not know which streams are involved (there could be one, several, or even all of them), so we have to poll all the streams indiscriminately to find the ones that are ready to read or write and then operate on them. So select has O(n) undifferentiated polling complexity: the more streams handled at the same time, the longer each polling pass takes.
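
To make the "undifferentiated polling" concrete, here is a minimal C sketch of a select() read loop (error handling omitted; `fds[]` and `nfds` are hypothetical, standing for an array of already-open socket descriptors). Notice that the fd_set is rebuilt before every call and every fd is checked with FD_ISSET afterwards.

```c
/* Sketch of a select() read loop. fds[] / nfds are hypothetical:
 * an array of already-open socket descriptors. Error handling omitted. */
#include <sys/select.h>
#include <unistd.h>

void select_loop(int fds[], int nfds)
{
    for (;;) {
        fd_set readfds;
        int maxfd = -1;

        FD_ZERO(&readfds);                  /* select() overwrites the set, so rebuild it every time */
        for (int i = 0; i < nfds; i++) {
            FD_SET(fds[i], &readfds);
            if (fds[i] > maxfd)
                maxfd = fds[i];
        }

        if (select(maxfd + 1, &readfds, NULL, NULL, NULL) <= 0)
            continue;                       /* interrupted or error in this sketch: just retry */

        for (int i = 0; i < nfds; i++) {    /* scan ALL fds to find the ready ones: O(n) */
            if (FD_ISSET(fds[i], &readfds)) {
                char buf[4096];
                read(fds[i], buf, sizeof(buf));   /* handle the readable fd */
            }
        }
    }
}
```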

(2) poll ==> time complexity O(n)

poll is essentially no different from select: it copies the array passed in by the user into kernel space and queries the device state of each fd. However, it has no limit on the maximum number of connections because it stores the descriptors in a linked list rather than a fixed-size set.
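
A comparable poll() sketch, under the same assumptions (hypothetical `fds[]`/`nfds`, error handling omitted): the pollfd array replaces the fixed-size fd_set, but the ready descriptors are still found by a linear scan.

```c
/* Sketch of a poll() read loop under the same hypothetical fds[] / nfds.
 * No FD_SETSIZE cap, but the scan over the array is still linear. */
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

void poll_loop(int fds[], int nfds)
{
    struct pollfd *pfds = calloc(nfds, sizeof(*pfds));
    for (int i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;            /* interested in readability */
    }

    for (;;) {
        if (poll(pfds, nfds, -1) <= 0)      /* -1: block until at least one fd is ready */
            continue;

        for (int i = 0; i < nfds; i++) {    /* still a linear scan of every entry */
            if (pfds[i].revents & POLLIN) {
                char buf[4096];
                read(pfds[i].fd, buf, sizeof(buf));
            }
        }
    }
}
```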

(3) epoll ==> time complexity O(1)

epoll can be understood as "event poll". Unlike busy polling and undifferentiated polling, epoll tells us which stream has which I/O event. So we say that epoll is actually event-driven (each event is associated with an fd), and our operations on these streams are all meaningful. (The complexity drops to O(1).)
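
A minimal epoll sketch under the same assumptions (hypothetical `fds[]`/`nfds`, error handling omitted): descriptors are registered once with epoll_ctl, and epoll_wait hands back only the descriptors that actually have events.

```c
/* Sketch of an epoll read loop under the same hypothetical fds[] / nfds.
 * Each fd is registered once; epoll_wait returns only the ready ones. */
#include <sys/epoll.h>
#include <unistd.h>

void epoll_loop(int fds[], int nfds)
{
    int epfd = epoll_create1(0);            /* create the epoll instance */

    for (int i = 0; i < nfds; i++) {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[i] };
        epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev);   /* register once, not per wait */
    }

    struct epoll_event events[64];
    for (;;) {
        int n = epoll_wait(epfd, events, 64, -1);      /* only ready fds come back */
        for (int i = 0; i < n; i++) {
            char buf[4096];
            read(events[i].data.fd, buf, sizeof(buf)); /* no scan over idle fds */
        }
    }
}
```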

select, poll, and epoll are all mechanisms for I/O multiplexing. I/O multiplexing is a mechanism for monitoring multiple descriptors and notifying the program when a descriptor is ready (typically readable or writable). But select, poll, and epoll are all synchronous I/O in nature, because after the read/write event is ready the program still has to do the reading and writing itself, i.e., the read/write call blocks, whereas with asynchronous I/O the program does not do the reading and writing itself; the asynchronous I/O implementation takes care of copying the data from the kernel to user space.

Both epoll and select provide a way to multiplex multiple I/O streams. epoll is Linux-specific, while select is required by POSIX and is implemented by most common operating systems.

Select:

select essentially works by setting and checking a data structure that holds fd flag bits (fd_set). The disadvantages of this are:

1. The number of fds that a single process can monitor is limited, which also limits the number of connections (ports) it can serve.

In general, this limit is closely related to system memory; you can check the system-wide limit with cat /proc/sys/fs/file-max. The default select limit is 1024 on a 32-bit machine and 2048 on a 64-bit machine.

2. Sockets are scanned linearly, i.e., by polling, which is inefficient:

When there are a large number of sockets, each select() call has to traverse up to FD_SETSIZE sockets, regardless of which sockets are actually active, which wastes a lot of CPU time. If sockets were instead registered with a callback function that fires automatically when they become active, this polling could be avoided, which is exactly what epoll and kqueue do.

3. select needs to maintain a data structure holding a large number of fds, and passing this structure back and forth incurs copy overhead between user space and kernel space.

Poll:

poll is essentially no different from select: it copies the array passed in by the user into kernel space and then queries the device state of each fd. If a device is ready, it adds that device to a queue and continues traversing; if no ready device is found after traversing all fds, the current process is suspended until a device becomes ready or the timeout expires, at which point it is woken up and iterates over the fds again. This process involves a lot of unnecessary traversal.

poll has no limit on the maximum number of connections because it stores the descriptors in a linked list, but it still has disadvantages:

  • Large arrays of fds are copied wholesale between user space and the kernel address space on every call, whether or not the copy is useful.

  • Another characteristic of poll is "level triggering": if an fd is reported but not handled, the next poll will report that fd again.

epoll:

epoll has two trigger modes: EPOLLLT and EPOLLET. LT (level triggered) is the default mode and ET (edge triggered) is the "high-speed" mode. In LT mode, epoll_wait keeps returning an event for an fd as long as it still has unread data, while in ET mode epoll_wait reports the fd only once, when new data arrives, regardless of whether the data has been fully read.

Therefore, in ET mode the buffer must be read until read returns less than the requested amount or fails with EAGAIN. Another feature is that epoll uses "event" readiness notification: an fd is registered once with epoll_ctl, and as soon as that fd becomes ready the kernel uses a callback-like mechanism to activate it, so epoll_wait is notified.
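
A minimal sketch of that draining rule, assuming `fd` is a non-blocking socket already registered with EPOLLIN | EPOLLET:

```c
/* Draining a readable fd in ET mode. Assumes `fd` is a non-blocking socket
 * registered with EPOLLIN | EPOLLET; epoll will not re-report data left behind. */
#include <errno.h>
#include <unistd.h>

void handle_et_readable(int fd)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            /* process n bytes here, then keep draining */
            continue;
        }
        if (n == 0)
            break;                           /* peer closed the connection */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                           /* buffer fully drained: safe to return to epoll_wait */
        break;                               /* some other error: handle/close as appropriate */
    }
}
```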

Why does epoll have EPOLLET trigger mode?

In EPOLLLT mode, if the system has a large number of ready file descriptors that you do not currently need to read or write, they are returned on every epoll_wait call, which greatly reduces the efficiency with which the handler retrieves the ready file descriptors it actually cares about. In EPOLLET edge-triggered mode, epoll_wait tells the handler to read or write only when a read or write event occurs on a monitored file descriptor.

If you do not finish reading or writing all the data at once (for example, the buffer is too small), epoll_wait will not notify you about that file descriptor on the next call; it notifies you again only when the next read/write event occurs on it. This mode is more efficient than level triggering, and the system is not flooded with ready file descriptors you do not care about.
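
For illustration, the only difference between the two registrations is the EPOLLET flag in the event mask; `epfd` and `fd` here are hypothetical (an existing epoll instance and socket):

```c
/* Same fd, two registrations: only the EPOLLET flag differs.
 * epfd and fd are hypothetical (an existing epoll instance and socket). */
#include <sys/epoll.h>

void register_level_triggered(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };            /* LT is the default */
    epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

void register_edge_triggered(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = fd };  /* ET: only new events are reported */
    epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}
```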

Advantages of epoll:

  • There is no hard limit on the number of concurrent connections; the maximum number of fds that can be opened is far higher than 1024 (roughly 100,000 connections can be monitored with 1 GB of memory).
  • Efficiency improves because epoll does not poll, so performance does not degrade as the number of fds grows; the callback is invoked only for active fds. The great advantage of epoll is that it focuses on your "active" connections rather than the total number of connections, so in a real network environment epoll is much more efficient than select and poll.
  • Memory copying: epoll uses mmap to map a region of memory shared with the kernel, which speeds up message passing with kernel space; that is, epoll uses mmap to reduce copy overhead.

Select, poll, epoll

1. Maximum number of connections a single process can open

select

The maximum number of connections a single process can open is defined by the FD_SETSIZE macro, whose size is 32 integers (so 32*32 = 1024 on a 32-bit machine and 32*64 = 2048 on a 64-bit machine). We can of course modify this and recompile the kernel, but performance may suffer; that would need further testing.
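
A quick way to see the value your own toolchain uses is to print FD_SETSIZE (a trivial sketch; the result depends on your libc):

```c
/* Print the compile-time select() limit on this system. */
#include <stdio.h>
#include <sys/select.h>

int main(void)
{
    printf("FD_SETSIZE = %d\n", FD_SETSIZE);   /* the size of fd_set your libc defines */
    return 0;
}
```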

poll

poll is essentially no different from select, but it has no limit on the maximum number of connections because it stores descriptors in a linked list.

epoll

The number of connections is capped, but the cap is very large: around 100,000 connections can be opened on a machine with 1 GB of memory and around 200,000 on a machine with 2 GB.

2. I/O efficiency as the number of fds grows

select

Because each call traverses the connections linearly, a growing number of fds causes a "linear degradation in performance": traversal becomes slower and slower.

poll

Same as above

epoll

Because epoll is implemented in the kernel with a callback function on each fd, only active sockets trigger the callback. So when few sockets are active, epoll does not suffer the linear performance degradation of the previous two; however, if all sockets are active, there may still be performance problems.

3. Message delivery

select

Delivering messages to user space requires the kernel to perform copy operations.

poll

Same as above

epoll

Epoll is implemented by sharing a single piece of memory between the kernel and user space.


Conclusion:

To sum up, the choice among select, poll, and epoll should be made according to the specific application scenario and the characteristics of these three mechanisms.

1. On the surface epoll performs best, but when the number of connections is small and the connections are very active, select and poll may actually perform better than epoll, because epoll's notification mechanism involves many function callbacks.

2. select is inefficient because it has to poll every time. But inefficiency is relative and, depending on the situation, can be mitigated by good design.

Below is a comparison of the three I/O multiplexing mechanisms, compiled from online and book references:

1. select implementation

A call to select goes through the following steps:

  • Copy fd_set from user space to kernel space using copy_from_user
  • Register the callback function __pollwait
  • Traverse all fds and call each one's poll method (for sockets this is sock_poll, which dispatches to tcp_poll, udp_poll, or datagram_poll depending on the socket type). Taking tcp_poll as an example, its core is to call __pollwait, the callback function registered above.
  • The main job of __pollwait is to attach current to the device's wait queue. Different devices have different wait queues; for tcp_poll, the wait queue is sk->sk_sleep (note that hanging current on a wait queue does not mean the process is already asleep). When the device receives data (a network device) or finishes filling in file data (a block device), it wakes up the processes sleeping on its wait queue, and current is woken.
  • The poll method returns a mask describing whether read/write operations are ready, and fd_set is updated according to this mask.
  • If all fds have been traversed and no readable/writable mask was returned, schedule_timeout is called to put the process calling select (i.e., current) to sleep. When the device driver can read or write its own resources, it wakes up the processes sleeping on its wait queue. If nothing wakes it within the specified timeout (given to schedule_timeout), the process calling select is woken again, gets the CPU, and traverses the fds once more to check whether any fd is ready (see the timeout sketch after this list).
  • Copy fd_set from kernel space to user space.
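
From user space, the sleep/wake-up and copy-back steps above look like this minimal sketch (hypothetical `fd`; a return value of 0 from select means the timeout expired, a positive value means the device woke the process):

```c
/* User-space view of the sleep/timeout step: select() blocks in the kernel
 * until a device wakes it or the timeout expires. `fd` is hypothetical. */
#include <stdio.h>
#include <sys/select.h>

int wait_readable(int fd, int seconds)
{
    fd_set readfds;
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);                       /* the set is copied in, then copied back modified */

    int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc == 0)
        printf("timed out: no fd became ready\n");   /* woken only by the timeout */
    else if (rc > 0)
        printf("fd %d is readable\n", fd);           /* woken by the device's wait queue */
    return rc;                                       /* negative on error */
}
```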


Conclusion:

select()

  • Every call to select copies the fd collection from user mode to kernel mode, which is expensive when there are many fds
  • Every call to select also requires the kernel to traverse all the fds passed in, which is likewise expensive when there are many fds
  • The number of file descriptors supported by select is too small; the default is 1024

2. poll implementation

The implementation of poll is very similar to that of select; it just describes the fd set differently: poll uses the pollfd structure instead of select's fd_set. Everything else is the same: poll also manages multiple descriptors by polling and processes them according to their state, but poll has no limit on the maximum number of file descriptors.

A drawback shared by poll and select is that the array containing a large number of file descriptors is copied wholesale between user space and the kernel address space, whether or not the descriptors are ready, and this overhead grows linearly with the number of file descriptors.

3. epoll implementation

Since epoll is an improvement on select and poll, it should avoid the three disadvantages above. So how does epoll do it? Before answering that, let's look at the difference between the calling interfaces of epoll and of select/poll. select and poll each provide only a single function, select or poll.

epoll provides three functions: epoll_create, epoll_ctl, and epoll_wait. epoll_create creates an epoll handle; epoll_ctl registers the type of event to listen for; epoll_wait waits for events to occur.
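
For reference, the three prototypes declared in <sys/epoll.h> are:

```c
#include <sys/epoll.h>

int epoll_create(int size);   /* create an epoll handle; since Linux 2.6.8 `size` is only a hint */
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);                /* add/modify/remove an fd (EPOLL_CTL_ADD/MOD/DEL) */
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);  /* wait for events, return how many are ready */
```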

For the first shortcoming, epoll's solution lies in epoll_ctl. Each time a new event is registered into the epoll handle (with EPOLL_CTL_ADD specified in epoll_ctl), the fd is copied into the kernel at that moment, rather than being copied repeatedly during epoll_wait. epoll guarantees that each fd is copied only once during the whole process.

For the second shortcoming, epoll does not add current to every fd's device wait queue on each call, as select or poll does. Instead, it hangs current on the queues only once, at epoll_ctl time (this step is unavoidable), and assigns a callback function to each fd. When a device becomes ready and wakes up the waiters on its wait queue, this callback is invoked, and the callback adds the ready fd to a ready linked list. The job of epoll_wait is then simply to check whether the ready list contains any ready fds (sleeping for a while with schedule_timeout(), similar to the timeout step in the select implementation above).

For the third shortcoming, epoll has no such limit. The maximum number of fds it supports is the maximum number of files the system can open, which is generally far larger than 2048; for example, around 100,000 on a machine with 1 GB of memory. The exact number can be checked with cat /proc/sys/fs/file-max; in general, it depends heavily on system memory.

Conclusion:

(1) select and poll have to repeatedly poll the entire fd collection themselves until a device is ready, possibly alternating between sleeping and waking several times. epoll also calls epoll_wait repeatedly and may likewise sleep and wake several times, but when a device becomes ready, epoll calls the callback function, puts the ready fd into the ready list, and wakes up the process sleeping in epoll_wait.

Although both alternate between sleeping and waking, select and poll traverse the entire fd collection each time they are awake, while epoll only has to check whether the ready list is empty, which saves a great deal of CPU time. This is the performance benefit the callback mechanism brings.

(2) select and poll copy the fd collection from user space to kernel space on every call and hang current on the device wait queues on every call, whereas epoll copies each fd only once and hangs current on a wait queue only once (at the start of epoll_wait; note that this wait queue is not a device wait queue, just a wait queue defined internally by epoll). This also saves a lot of overhead.
