The essence of I/O multiplexing is a mechanism (the kernel buffers the I/O data) that lets a single process monitor multiple file descriptors and, once any descriptor becomes ready (usually ready for reading or writing), notifies the program so it can perform the corresponding read or write.
The file descriptors monitored by the select function fall into three categories: readfds, writefds, and exceptfds. A call to select blocks until a descriptor becomes ready (readable, writable, or with an exceptional condition) or until the timeout expires (the timeout argument specifies how long to wait; a NULL timeout blocks indefinitely, while a zero timeout makes select return immediately), at which point the function returns. After select returns, the ready descriptors are found by iterating over the fd_set.
Essentially, select works by setting and then checking the data structure that holds the fd flag bits, and acting on whatever it finds there.
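To make that workflow concrete, here is a minimal sketch of the usual select pattern; the function name wait_readable, the descriptor sockfd and the 5-second timeout are illustrative assumptions, and error handling is trimmed.

```c
#include <stdio.h>
#include <sys/select.h>

/* Wait up to 5 seconds for sockfd to become readable. */
int wait_readable(int sockfd)
{
    fd_set readfds;
    struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };

    FD_ZERO(&readfds);          /* clear the descriptor set          */
    FD_SET(sockfd, &readfds);   /* mark the descriptor we care about */

    /* Blocks until sockfd is readable, the timeout expires, or an error occurs. */
    int ready = select(sockfd + 1, &readfds, NULL, NULL, &tv);
    if (ready > 0 && FD_ISSET(sockfd, &readfds))
        printf("fd %d is readable\n", sockfd);
    return ready;               /* >0 ready, 0 timeout, -1 error */
}
```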
Disadvantages:
1. The number of FDs a single process can monitor is limited, which in turn limits the number of connections it can serve (see the small sketch after this list).
In general this limit is related to system memory; the system-wide file limit can be checked with cat /proc/sys/fs/file-max. For select itself the default is 1024 on a 32-bit machine and 2048 on a 64-bit machine.
2. Sockets are scanned linearly, that is, by polling, which is inefficient:
When there are many sockets, each call to select() traverses all FD_SETSIZE descriptors regardless of which sockets are actually active, wasting a great deal of CPU time. The polling could be avoided if each socket were registered with a callback that runs when the socket becomes active, which is exactly what epoll and kqueue do.
3. A data structure holding a large number of FDs must be maintained, and the whole FD set is copied from user space to kernel space on every call to select.
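As a small illustration of points 1 and 3, this sketch just prints the platform's FD_SETSIZE and the size of the fd_set bitmap that select copies to the kernel on every call (the exact values depend on the platform).

```c
#include <stdio.h>
#include <sys/select.h>

int main(void)
{
    printf("FD_SETSIZE     = %d\n", FD_SETSIZE);            /* typically 1024 */
    printf("sizeof(fd_set) = %zu bytes\n", sizeof(fd_set)); /* copied on every select() */
    return 0;
}
```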
poll is essentially no different from select: it copies the array passed in by the user into kernel space and then queries the device status of each fd. If a device is ready, it is added to the ready queue and the traversal continues; if no ready device is found after traversing all fds, the current process is suspended until a device becomes ready or the call times out, at which point the process is woken up and traverses the fds all over again. This involves a lot of unnecessary traversal.
Unlike select, poll imposes no limit on the maximum number of connections, because the descriptors are stored in a linked list.
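For comparison with the select sketch earlier, here is the same wait-for-readability pattern with poll; sockfd and the 5000 ms timeout are illustrative assumptions.

```c
#include <poll.h>
#include <stdio.h>

/* Wait up to 5000 ms for sockfd to become readable. */
int wait_readable_poll(int sockfd)
{
    struct pollfd fds[1];
    fds[0].fd = sockfd;
    fds[0].events = POLLIN;     /* interested in readability */

    /* The pollfd array is copied into the kernel and scanned linearly
     * on every call, just like select's fd_set. */
    int ready = poll(fds, 1, 5000);
    if (ready > 0 && (fds[0].revents & POLLIN))
        printf("fd %d is readable\n", sockfd);
    return ready;               /* >0 ready, 0 timeout, -1 error */
}
```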
Disadvantages:
1. Large arrays of fds are copied wholesale between user space and the kernel address space, regardless of whether such copying makes sense.
2. Another characteristic of poll is "level triggering": if an fd is reported but not handled, the next call to poll will report the same fd again.
epoll offers two trigger modes:
LT mode (level-triggered): when epoll_wait detects that an event has occurred on a descriptor and notifies the application, the application does not have to handle the event immediately; the next call to epoll_wait will report the event again.
ET mode (edge-triggered): when epoll_wait detects that an event has occurred on a descriptor and notifies the application, the application must handle the event immediately; if it does not, later calls to epoll_wait will not report the event again.
```c
int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
```
epoll_create: creates an epoll handle. size tells the kernel roughly how many descriptors will be monitored. The size parameter does not limit the maximum number of descriptors epoll can listen on; it is merely a hint for the kernel's initial allocation of internal data structures.
epoll_ctl: performs operation op on the specified descriptor fd.
- epfd: the return value of epoll_create().
- op: the operation: EPOLL_CTL_ADD to add, EPOLL_CTL_DEL to delete, EPOLL_CTL_MOD to modify the events being listened for on the fd.
- fd: the file descriptor to listen on.
- event: tells the kernel which events (such as read and write events) to listen for.
epoll_wait: waits for I/O events on epfd and returns at most maxevents events.
- events: the array used to receive events from the kernel.
- maxevents: tells the kernel how large the events array is; its value must not be greater than the size passed to epoll_create().
- timeout: the timeout period (in milliseconds).
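A minimal sketch tying the three calls together; listen_fd, the size hint of 256, MAX_EVENTS and the 1000 ms timeout are illustrative assumptions, and error handling is omitted.

```c
#include <stdio.h>
#include <sys/epoll.h>

#define MAX_EVENTS 64

void epoll_loop(int listen_fd)
{
    int epfd = epoll_create(256);   /* size is only a hint for the kernel */

    struct epoll_event ev = {0};
    ev.events  = EPOLLIN;           /* read events, default level-triggered (LT) mode */
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);   /* register the descriptor */

    struct epoll_event events[MAX_EVENTS];
    for (;;) {
        /* Returns only descriptors that are ready, at most MAX_EVENTS of them. */
        int n = epoll_wait(epfd, events, MAX_EVENTS, 1000 /* ms */);
        for (int i = 0; i < n; i++)
            printf("fd %d is ready\n", events[i].data.fd);
    }
}
```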
epoll has two trigger modes: EPOLLLT and EPOLLET. LT is the default mode and ET is the "high-speed" mode. In LT mode, epoll_wait returns the event every time the fd has data to read; in ET (edge-triggered) mode it reports the event only once and stays silent until new data arrives, regardless of whether the fd still has unread data. In ET mode the buffer must therefore be drained until read returns less than the requested amount or fails with EAGAIN. Another characteristic is that epoll uses "event" readiness notification: an fd is registered through epoll_ctl, and once that fd becomes ready the kernel uses a callback-like mechanism to activate it so that epoll_wait is notified.
Why does epoll have EPOLLET trigger mode?
In EPOLLLT mode, if the system has a large number of ready file descriptors that you do not currently need to read or write, they are returned on every call to epoll_wait, which greatly reduces the efficiency with which the handler can retrieve the ready descriptors it actually cares about. In EPOLLET edge-triggered mode, when a read or write event occurs on a monitored file descriptor, epoll_wait() tells the handler to read or write. If all the data is not consumed (for example, the buffer is too small), the next call to epoll_wait() will not notify you again; it stays silent until a new read/write event occurs on that file descriptor. This mode is more efficient than level triggering, and the system is not flooded with ready file descriptors you do not care about.
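A sketch of the drain-until-EAGAIN read loop that ET mode requires; it assumes the descriptor was registered with EPOLLIN | EPOLLET and set non-blocking, and the 4 KB buffer size is an arbitrary choice.

```c
#include <errno.h>
#include <unistd.h>

/* Read everything currently available on a non-blocking, edge-triggered fd. */
void drain_fd(int fd)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0)
            continue;                           /* ... process n bytes, then keep reading ... */
        if (n == 0)
            break;                              /* peer closed the connection */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                              /* drained: safe to wait for the next edge */
        break;                                  /* real error */
    }
}
```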
Epoll advantages:
1. There is no limit on the maximum number of concurrent connections, and the upper limit on the number of FDs that can be opened is far higher than 1024 (roughly 100,000 connections can be monitored per 1 GB of memory);
2. Efficiency improves because epoll does not poll, so performance does not degrade as the number of FDs grows: the callback is invoked only for FDs that are actually active;
The great advantage of epoll is that its cost tracks your "active" connections rather than the total number of connections, so in a real network environment epoll is much more efficient than select and poll.
3. Memory copying: epoll uses mmap() to map memory shared with kernel space, speeding up message passing with the kernel; that is, epoll reduces copy overhead via mmap.
0. Underlying data structure
Select: array, poll: linked list, epoll: red-black tree.
1. Maximum number of connections a single process can open
select: the maximum number of connections a single process can open is defined by the FD_SETSIZE macro, whose size is 32 integers (32*32 on a 32-bit machine, 32*64 on a 64-bit machine). The macro can be modified and the kernel recompiled, but performance may be affected, which would require further testing.
poll: essentially no different from select, but it has no limit on the maximum number of connections because the descriptors are stored in a linked list.
epoll: there is an upper limit on the number of connections, but it is large: roughly 100,000 connections can be opened on a machine with 1 GB of memory and roughly 200,000 on a machine with 2 GB.
2. I/O efficiency as the number of FDs grows
select/poll: every call traverses the connections linearly, so a growing number of FDs causes a "linear degradation in performance" as the traversal slows down.
epoll: the kernel implementation registers a callback on each FD, and only active sockets trigger the callback, so when few sockets are active epoll does not suffer the linear degradation of the previous two. However, if all sockets are active, there may still be performance problems.
3. Message delivery
select/poll: the kernel needs to deliver messages to user space, which requires a kernel copy operation.
epoll: implemented by sharing a piece of memory between the kernel and user space.
Select, poll, and epoll
Historical Background:
1) Select was implemented in BSD in 1984.
2) Poll was implemented 14 years later, in 1997. The delay was not due to an efficiency problem; the hardware of that era was so weak that a server handling more than 1,000 connections was already remarkable, and select had satisfied the demand for a long time.
3) In 2002, Davide Libenzi implemented epoll.