I was asked about the difference between select, poll, and epoll when I was looking for a job after graduation. At the time I just searched online and memorized the answers without really understanding them. Now that I have some time, I plan to take a proper look at it. The following article is good, so I'm saving it here.
The original link
What are synchronous IO and asynchronous IO, blocking IO and non-blocking IO, and what are the differences? Different people give different answers in different contexts. So let’s limit the context of this article.
The background of this article is Network IO in Linux.
Concept Description
Before diving in, a few concepts need to be explained:
- User space and kernel space
- Process switching
- Process blocking
- File descriptor
- Cached I/O
User space versus kernel space
Today's operating systems use virtual memory, so for a 32-bit operating system the addressable space (virtual address space) is 4G (2^32). The core of the operating system is the kernel, which is independent of ordinary applications and can access the protected memory space as well as all the underlying hardware devices. To ensure that user processes cannot operate on the kernel directly and to keep the kernel safe, the system divides the virtual address space into two parts: kernel space and user space. For Linux, the highest 1G bytes (virtual addresses 0xC0000000 to 0xFFFFFFFF) are used by the kernel and are called kernel space, while the lower 3G bytes (virtual addresses 0x00000000 to 0xBFFFFFFF) are used by the various processes and are called user space.
Process switching
To control process execution, the kernel must have the ability to suspend a process running on the CPU and resume execution of a previously suspended process. This behavior is called process switching. Therefore, it can be said that any process runs under the support of the operating system kernel and is closely related to the kernel.
Switching from one running process to another involves the following steps:
- Save the processor context, including the program counter and other registers.
- Update the PCB information.
- Move the process's PCB to the appropriate queue, such as the ready queue or the queue of processes blocked on a given event.
- Select another process to execute and update its PCB.
- Update memory-management data structures.
- Restore the processor context.
Process blocking
If an event the executing process expects has not occurred (for example, a request for a system resource fails, it is waiting for an operation to complete, new data has not arrived, or there is no new work to do), the process automatically executes the blocking primitive (Block) and changes from the running state to the blocked state. Blocking is thus an active behavior of the process itself, so only a running process (one that holds the CPU) can enter the blocked state. A blocked process consumes no CPU resources.
File descriptor fd
A file descriptor is a computer-science term: an abstraction used to refer to a file.
Formally, a file descriptor is a non-negative integer. In practice it is an index into the table of open files that the kernel maintains for each process. When a program opens an existing file or creates a new one, the kernel returns a file descriptor to the process. Much low-level programming revolves around file descriptors. The concept, however, usually applies only to UNIX-like operating systems such as Linux.
Cached I/O
Cached I/O, also known as standard I/O, is the default I/O mechanism for most file systems. With Linux's cached I/O, the operating system caches I/O data in the file system's page cache: data is first copied into the kernel's buffer and only then copied from the kernel buffer into the application's address space.
Disadvantage of cached I/O:
During data transfer, the data has to be copied multiple times between the application address space and the kernel. These copy operations are expensive in terms of CPU and memory.
The IO model
As mentioned earlier, for an IO access (such as read), data is copied to the operating system kernel buffer before being copied from the operating system kernel buffer to the application address space. So, when a read operation occurs, it goes through two phases:
- Waiting for the data to be ready
- Copying the data from the kernel to the process
Because of these two phases, Linux has the following five network IO models.
- Blocking I/O (Blocking IO)
- Non-blocking I/O (nonblocking IO)
- I/O multiplexing
- Signal Driven I/O
- Asynchronous I/O (Asynchronous IO)
Note: Since Signal Driven IO is not commonly used in practice, I will only mention the remaining four IO models.
Blocking I/O (Blocking IO)
In Linux, all sockets are blocking by default. A typical read operation would look like this:
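The original article illustrates this with a diagram. As a rough code sketch instead (not from the original article; sockfd is assumed to be an already-connected socket, using standard POSIX calls):

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Minimal sketch: a blocking read on a connected socket.
 * Phase 1: the process blocks in recvfrom() until data arrives in the kernel.
 * Phase 2: the kernel copies the data into buf, and only then does recvfrom() return. */
void blocking_read(int sockfd)
{
    char buf[1024];
    ssize_t n = recvfrom(sockfd, buf, sizeof(buf), 0, NULL, NULL);
    if (n > 0)
        printf("received %zd bytes\n", n);
}
```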
So, the characteristic of blocking IO is that both phases of IO execution are blocked.
Non-blocking I/O (nonblocking IO)
On Linux, you can set the socket to make it non-blocking. When a read is performed on a non-blocking socket, the flow looks like this:
**So nonblocking IO requires the user process to constantly ask the kernel for data.**
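A minimal sketch of this polling pattern (not from the original article; it assumes sockfd is a connected socket and uses fcntl to set O_NONBLOCK):

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Minimal sketch: set the socket non-blocking, then keep asking the kernel
 * for data. While the data is not ready, recv() returns -1 with errno set
 * to EAGAIN/EWOULDBLOCK, and the process is free to do other work. */
void nonblocking_read(int sockfd)
{
    char buf[1024];

    fcntl(sockfd, F_SETFL, fcntl(sockfd, F_GETFL, 0) | O_NONBLOCK);

    for (;;) {
        ssize_t n = recv(sockfd, buf, sizeof(buf), 0);
        if (n >= 0) {
            printf("got %zd bytes\n", n);
            break;                        /* data was ready and has been copied */
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* not ready yet: do something else, then ask again */
            usleep(1000);
            continue;
        }
        perror("recv");                   /* a real error */
        break;
    }
}
```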
I/O multiplexing
IO multiplexing is what select, poll, and epoll provide; some places also call this IO model event driven IO. The advantage of select/epoll is that a single process can handle the IO of multiple network connections at the same time. The select, poll, and epoll functions poll all the sockets they are responsible for and notify the user process when data arrives on any of them.
So, I/O multiplexing is characterized by a mechanism whereby a process can wait for multiple file descriptors at the same time, and select() returns when any one of these file descriptors (socket descriptors) is read ready.
This flow is not really that different from blocking IO's; in fact it is slightly worse, because it requires two system calls (select and recvfrom), whereas blocking IO uses only one (recvfrom). The advantage of select, however, is that it can handle multiple connections at the same time.
So a web server using select/epoll does not necessarily perform better than one using multi-threading plus blocking IO, and may even have higher latency if the number of connections being handled is not very large. The advantage of select/epoll is not that it processes an individual connection faster, but that it can handle more connections.
In practice, in the IO multiplexing model each socket is usually set to non-blocking. As shown above, however, the user process is in fact blocked the whole time; it is just blocked by the select call rather than by socket IO.
Asynchronous I/O (Asynchronous IO)
Asynchronous IO is rarely used in Linux. Let’s take a look at the flow:
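The diagram is not reproduced here. As an illustrative sketch only (not from the original article), the POSIX AIO interface from <aio.h> follows this model: the read is initiated and the call returns at once, and the process finds out later that both phases (waiting for the data and copying it) have been completed. Note that glibc implements POSIX AIO with user-space threads rather than true kernel asynchronous IO; fd is assumed to be an already-open descriptor.

```c
#include <aio.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Minimal sketch of POSIX AIO (link with -lrt on older glibc).
 * aio_read() returns immediately; both phases run in the background. */
void async_read(int fd)
{
    static char buf[1024];
    struct aiocb cb;

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = 0;

    aio_read(&cb);                    /* initiate the IO and return at once */

    while (aio_error(&cb) == EINPROGRESS) {
        /* the process is not blocked here; it can do other work */
    }

    ssize_t n = aio_return(&cb);      /* result of the completed IO */
    printf("read %zd bytes asynchronously\n", n);
}
```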
Conclusion
The difference between blocking and non-blocking
Invoking blocking IO will block the corresponding process until the operation is complete, while non-blocking IO will return immediately if the kernel is still preparing data.
Synchronous IO and Asynchronous IO
Before explaining the difference between synchronous IO and Asynchronous IO, we need to define both. POSIX’s definition looks like this:
- A synchronous I/O operation causes the requesting process to be blocked until that I/O operation completes;
- An asynchronous I/O operation does not cause the requesting process to be blocked;
The difference between the two is that synchronous IO blocks the process while the "IO operation" is being performed. By this definition, the blocking IO, non-blocking IO, and IO multiplexing described above all belong to synchronous IO.
You might object that non-blocking IO is not blocked. Note that the "IO operation" here refers to the actual IO operation, such as recvfrom. With non-blocking IO, if the kernel data is not ready, the recvfrom system call does not block the process. But once the data in the kernel is ready, recvfrom copies the data from the kernel into user memory, and during that copy the process is blocked.
Asynchronous IO is different: the call returns as soon as the IO is initiated, and the process can ignore it until the kernel sends a signal telling the process that the IO is complete. During the entire process, the process is never blocked.
The comparison of each IO Model is shown in the figure:
I/O multiplexing in detail: select, poll, and epoll
select, poll, and epoll are all mechanisms for I/O multiplexing. I/O multiplexing is a mechanism by which a single process can monitor multiple descriptors and, once a descriptor becomes ready (generally readable or writable), notify the program so it can perform the corresponding read or write. select, poll, and epoll are all synchronous I/O in nature, because the process itself still has to do the reading and writing after the read/write event is ready, and that reading and writing blocks; asynchronous I/O, by contrast, does not require the process to do the reading and writing itself: the asynchronous I/O implementation takes care of copying the data from the kernel into user space.
select
```c
int select(int n, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
```
The file descriptors monitored by select fall into three categories: readfds, writefds, and exceptfds. The select call blocks until a descriptor becomes ready (readable, writable, or with an exceptional condition) or until the timeout expires (timeout specifies the wait time; a zero timeout makes select return immediately, while NULL makes it wait indefinitely), and then returns. After select returns, the ready descriptors can be found by iterating over the fd_set.
select is currently supported on almost all platforms, and its good cross-platform support is one of its advantages. One disadvantage of select is that there is a maximum limit on the number of file descriptors a single process can monitor, which on Linux is typically 1024. This limit can be raised by changing the macro definition or even recompiling the kernel, but doing so also tends to reduce efficiency.
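A minimal usage sketch of select (not from the original article; sockfd is assumed to be an already-connected socket):

```c
#include <stdio.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Minimal sketch: wait up to 5 seconds for sockfd to become readable. */
void wait_readable(int sockfd)
{
    fd_set readfds;
    struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };

    FD_ZERO(&readfds);
    FD_SET(sockfd, &readfds);

    /* the first argument is the highest-numbered fd plus one */
    int ret = select(sockfd + 1, &readfds, NULL, NULL, &tv);
    if (ret == -1) {
        perror("select");
    } else if (ret == 0) {
        printf("timeout, no data\n");
    } else if (FD_ISSET(sockfd, &readfds)) {
        char buf[1024];
        ssize_t n = recv(sockfd, buf, sizeof(buf), 0);   /* now safe to read */
        printf("read %zd bytes\n", n);
    }
}
```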
poll
```c
int poll(struct pollfd *fds, unsigned int nfds, int timeout);
```
Unlike select, which uses three bitmaps to represent the three fd sets, poll uses a pointer to an array of pollfd structures.
```c
struct pollfd {
    int   fd;       /* file descriptor */
    short events;   /* requested events to watch */
    short revents;  /* returned events witnessed */
};
```

The pollfd structure contains both the events to monitor and the events that occurred, which avoids select's parameter-value passing style. There is also no maximum limit on the number of pollfd entries (although too many will degrade performance). As with select, after poll returns, the pollfd array has to be traversed to find the ready descriptors.
As shown above, both select and poll need to traverse the file descriptors after returning in order to find the ready sockets. In fact, when a large number of clients are connected at the same time, only very few of them may be ready at any given moment, so efficiency decreases linearly as the number of monitored descriptors grows.
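For comparison, a minimal usage sketch of poll (again not from the original article; sockfd is assumed to be an already-connected socket):

```c
#include <poll.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Minimal sketch: wait up to 5000 ms for sockfd to become readable. */
void poll_readable(int sockfd)
{
    struct pollfd fds[1];
    fds[0].fd = sockfd;
    fds[0].events = POLLIN;

    int ret = poll(fds, 1, 5000);
    if (ret == -1) {
        perror("poll");
    } else if (ret == 0) {
        printf("timeout, no data\n");
    } else if (fds[0].revents & POLLIN) {
        char buf[1024];
        ssize_t n = recv(sockfd, buf, sizeof(buf), 0);
        printf("read %zd bytes\n", n);
    }
}
```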
epoll
epoll was introduced in the 2.6 kernel as an enhanced version of select and poll. Compared to them, epoll is more flexible and has no descriptor limit. epoll uses one file descriptor to manage many others: it stores the events for the file descriptors the user cares about in an event table inside the kernel, so the copy between user space and kernel space only needs to happen once.
I. The epoll operation process
The epoll operation requires three interfaces, which are as follows:
```c
int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
```
1. int epoll_create(int size);
The size argument tells the kernel roughly how many descriptors this epoll instance will monitor. It is different from the first parameter of select(), which is the maximum monitored fd plus one. size does not limit the number of descriptors epoll can monitor; it is only a hint that helps the kernel allocate its internal data structures initially. Creating an epoll handle itself occupies an fd (on Linux you can see it under /proc/<pid>/fd/), so close() must be called once you are done with epoll, otherwise fds may eventually be exhausted.
2. int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
This function performs the operation op on the specified descriptor fd.
- epfd: the return value of epoll_create().
- op: the operation, represented by three macros: EPOLL_CTL_ADD, EPOLL_CTL_DEL, and EPOLL_CTL_MOD, which add, delete, and modify the monitored events for fd, respectively.
- fd: the file descriptor to monitor.
- event: tells the kernel which events to monitor; struct epoll_event is defined as follows:
```c
struct epoll_event {
    __uint32_t   events;  /* Epoll events */
    epoll_data_t data;    /* User data variable */
};
```

events can be a combination of the following macros:

- EPOLLIN: the corresponding file descriptor is readable (including a normal close of the peer socket)
- EPOLLOUT: the corresponding file descriptor is writable
- EPOLLPRI: the corresponding file descriptor has urgent data to read (this generally indicates the arrival of out-of-band data)
- EPOLLERR: an error occurred on the corresponding file descriptor
- EPOLLHUP: the corresponding file descriptor was hung up
- EPOLLET: set epoll to Edge Triggered mode, as opposed to the default Level Triggered mode
- EPOLLONESHOT: monitor only one occurrence of the event; to keep monitoring the socket after the event fires, it must be added to the epoll queue again
3. int epoll_wait(int epfd, struct epoll_event * events, int maxevents, int timeout);
Waits for IO events on epfd and returns at most maxevents of them. maxevents tells the kernel how large the events array is; it must not be greater than the size passed to epoll_create(). timeout is the timeout in milliseconds: 0 returns immediately and -1 blocks indefinitely. The function returns the number of events that need to be processed; a return value of 0 indicates that the timeout expired.
II. Working modes
epoll operates on file descriptors in two modes: LT (level triggered) and ET (edge triggered). LT is the default. The difference between them is as follows:
LT mode: when epoll_wait detects that a descriptor event has occurred and reports it to the application, the application does not have to handle the event immediately; the next time epoll_wait is called, it will report the event again.
ET mode: when epoll_wait detects that a descriptor event has occurred and reports it to the application, the application must handle the event immediately; if it does not, the next time epoll_wait is called, the event will not be reported again.
1. LT model
LT (level triggered) is the default working mode and supports both blocking and non-blocking sockets. In this mode, the kernel tells you when a file descriptor is ready, and you can then perform IO on it. If you do nothing, the kernel will continue to notify you.
2. ET model
Edge-triggered (ET) is the high-speed working mode and supports only non-blocking sockets. In this mode, the kernel notifies you via epoll only when a descriptor changes from not ready to ready. It then assumes you know the file descriptor is ready and will not send another ready notification for it until you do something that makes it become not ready again (for example, reading or writing until the call would fail with EWOULDBLOCK because less data is available than requested). Note, however, that if you never perform IO on the fd (so it never becomes not ready again), the kernel will not send further notifications; it notifies only once.
ET mode greatly reduces the number of times an epoll event is triggered repeatedly, so it is more efficient than LT mode. When epoll works in ET mode, non-blocking sockets must be used, to prevent a blocking read or write on one file handle from starving the task that is handling multiple file descriptors.
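As a small sketch of that requirement (the helper name add_fd_et is made up for illustration): set the descriptor non-blocking before registering it with EPOLLET.

```c
#include <fcntl.h>
#include <sys/epoll.h>

/* Minimal sketch: make fd non-blocking, then register it with epoll in
 * edge-triggered mode. With EPOLLET, reads/writes must later be repeated
 * until they return EAGAIN, otherwise events can be missed. */
void add_fd_et(int epollfd, int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);

    struct epoll_event ev;
    ev.events = EPOLLIN | EPOLLET;    /* readable events, edge triggered */
    ev.data.fd = fd;
    epoll_ctl(epollfd, EPOLL_CTL_ADD, fd, &ev);
}
```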
3. Summary
Here’s an example:
1. We add the file handle (RFD) that represents the read end of a pipe to the epoll descriptor.
2. 2KB of data is written to the other end of the pipe.
3. epoll_wait(2) is called and reports RFD as ready for reading.
4. We read 1KB of the data.
5. epoll_wait(2) is called again...
LT mode: in step 5, epoll_wait(2) will still report that RFD is ready for reading.
ET mode: if we used the EPOLLET flag when adding RFD to the epoll descriptor in step 1, then epoll_wait(2) in step 5 will probably hang, because the remaining data is still sitting in the file's input buffer while the sender is still waiting for a reply to the data it has already sent. ET mode reports an event only when it occurs on the monitored file handle, so in step 5 the caller may end up waiting forever for data that is already in the input buffer.
When using epoll in ET mode and an EPOLLIN event is generated, we need to be careful when reading the data: if recv() returns a size equal to the requested size, there is very likely more data left in the buffer, which means the event has not been fully handled, so we need to keep reading:
```c
while (rs) {
    buflen = recv(activeevents[i].data.fd, buf, sizeof(buf), 0);
    if (buflen < 0) {
        // In non-blocking mode, errno == EAGAIN means the buffer currently
        // has no more data to read.
        if (errno == EAGAIN) {
            break;
        } else {
            return;
        }
    } else if (buflen == 0) {
        // The peer socket has been closed.
    }

    if (buflen == sizeof(buf)) {
        rs = 1;   // The buffer was filled completely; there may be more data, so read again.
    } else {
        rs = 0;
    }
}
```
EAGAIN in Linux
When developing on Linux you often run into errors (reported via errno), and EAGAIN is one of the more common ones (for example, with non-blocking operations). Literally, it is a cue to "try again". This error usually occurs when an application performs a non-blocking operation on a file or socket.
For example, if you open a file, socket, or FIFO with the O_NONBLOCK flag and then keep reading when there is no data available, read will not block waiting for data to arrive; instead it returns the error EAGAIN, telling your application that there is nothing to read right now and to try again later. As another example, when a system call (such as fork) fails because resources (such as virtual memory) are insufficient, it returns EAGAIN to suggest calling it again (it may succeed next time).
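A small sketch of the first case (the FIFO path /tmp/demo_fifo is made up for illustration and would need to be created with mkfifo first):

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical FIFO created beforehand with: mkfifo /tmp/demo_fifo */
    int fd = open("/tmp/demo_fifo", O_RDONLY | O_NONBLOCK);
    if (fd == -1) {
        perror("open");
        return 1;
    }

    char buf[128];
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n == -1 && errno == EAGAIN) {
        /* A writer has the FIFO open but no data is available yet:
         * instead of blocking, read() fails with EAGAIN ("try again later"). */
        printf("nothing to read yet, try again later\n");
    } else if (n == 0) {
        printf("no writer connected (end of file)\n");
    } else if (n > 0) {
        printf("read %zd bytes\n", n);
    }

    close(fd);
    return 0;
}
```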
III. Code demo
The following code is incomplete and not strictly well-formed; it is only intended to illustrate the process above, with some boilerplate removed.
```c
#define IPADDRESS   "127.0.0.1"
#define PORT        8787
#define MAXSIZE     1024
#define LISTENQ     5
#define FDSIZE      1000
#define EPOLLEVENTS 100

listenfd = socket_bind(IPADDRESS, PORT);

struct epoll_event events[EPOLLEVENTS];

// Create the epoll descriptor
epollfd = epoll_create(FDSIZE);

// Register the event for the listening descriptor
add_event(epollfd, listenfd, EPOLLIN);

// Event loop
for ( ; ; ) {
    // Returns the number of descriptors that are ready
    ret = epoll_wait(epollfd, events, EPOLLEVENTS, -1);
    // Handle the ready events (including new connections)
    handle_events(epollfd, events, ret, listenfd, buf);
}

// Event-handling function
static void handle_events(int epollfd, struct epoll_event *events, int num,
                          int listenfd, char *buf)
{
    int i;
    int fd;
    /* Traverse only the events that are ready; num is the value returned by
     * epoll_wait, not the FDSIZE passed to epoll_create. */
    for (i = 0; i < num; i++) {
        fd = events[i].data.fd;
        // Dispatch according to the descriptor and the event type
        if ((fd == listenfd) && (events[i].events & EPOLLIN))
            handle_accept(epollfd, listenfd);
        else if (events[i].events & EPOLLIN)
            do_read(epollfd, fd, buf);
        else if (events[i].events & EPOLLOUT)
            do_write(epollfd, fd, buf);
    }
}

// Register an event
static void add_event(int epollfd, int fd, int state)
{
    struct epoll_event ev;
    ev.events = state;
    ev.data.fd = fd;
    epoll_ctl(epollfd, EPOLL_CTL_ADD, fd, &ev);
}

// Accept a new connection
static void handle_accept(int epollfd, int listenfd)
{
    int clifd;
    struct sockaddr_in cliaddr;
    socklen_t cliaddrlen;
    clifd = accept(listenfd, (struct sockaddr *)&cliaddr, &cliaddrlen);
    if (clifd == -1)
        perror("accept error:");
    else {
        printf("accept a new client: %s:%d\n",
               inet_ntoa(cliaddr.sin_addr), cliaddr.sin_port);
        // Register a read event for the new client descriptor
        add_event(epollfd, clifd, EPOLLIN);
    }
}

// Read
static void do_read(int epollfd, int fd, char *buf)
{
    int nread;
    nread = read(fd, buf, MAXSIZE);
    if (nread == -1) {
        perror("read error:");
        close(fd);                           // Remember to close the fd
        delete_event(epollfd, fd, EPOLLIN);  // Remove the listener
    } else if (nread == 0) {
        fprintf(stderr, "client close.\n");
        close(fd);                           // Remember to close the fd
        delete_event(epollfd, fd, EPOLLIN);  // Remove the listener
    } else {
        printf("read message is : %s", buf);
        // Switch the descriptor from read to write
        modify_event(epollfd, fd, EPOLLOUT);
    }
}

// Write
static void do_write(int epollfd, int fd, char *buf)
{
    int nwrite;
    nwrite = write(fd, buf, strlen(buf));
    if (nwrite == -1) {
        perror("write error:");
        close(fd);                            // Remember to close the fd
        delete_event(epollfd, fd, EPOLLOUT);  // Remove the listener
    } else {
        modify_event(epollfd, fd, EPOLLIN);
    }
    memset(buf, 0, MAXSIZE);
}

// Delete an event
static void delete_event(int epollfd, int fd, int state)
{
    struct epoll_event ev;
    ev.events = state;
    ev.data.fd = fd;
    epoll_ctl(epollfd, EPOLL_CTL_DEL, fd, &ev);
}

// Modify an event
static void modify_event(int epollfd, int fd, int state)
{
    struct epoll_event ev;
    ev.events = state;
    ev.data.fd = fd;
    epoll_ctl(epollfd, EPOLL_CTL_MOD, fd, &ev);
}

// Note: the other end (the client) is omitted
```
IV. epoll summary
With select/poll, the kernel scans all the monitored file descriptors only after the process calls the corresponding function, whereas with epoll a file descriptor is registered in advance via epoll_ctl(). Once that descriptor becomes ready, the kernel uses a callback-like mechanism to activate it quickly, so the process is notified as soon as it calls epoll_wait(). Traversing all file descriptors is replaced by listening for callbacks; that is the beauty of epoll.
The advantages of epoll are as follows:
- There is no hard limit on the number of monitored descriptors. The maximum number of fds epoll can support is the maximum number of files that can be opened, which is generally much larger than 2048; on a machine with 1GB of memory it is around 100,000 (the exact number can be checked with cat /proc/sys/fs/file-max) and generally depends on system memory. select's biggest drawback is the limit on the number of fds a single process can open, which is not enough for servers handling a large number of connections. A multi-process solution is an option (this is what Apache does), and although creating a process is relatively cheap on Linux it is still not negligible, and inter-process data synchronization is far less efficient than synchronization between threads, so it is not a perfect solution.
- IO efficiency does not decrease as the number of monitored fds grows. Unlike the polling done by select and poll, epoll is implemented with a callback attached to each fd, and only the fds that are ready invoke their callback. So if there are not many idle or dead connections, epoll is not much more efficient than select/poll; but when there are a large number of idle connections, epoll is far more efficient than select/poll.
Reference