
What are synchronous IO and asynchronous IO, blocking IO and non-blocking IO, and what are the differences? Different people give different answers in different contexts. So let’s limit the context of this article.

The background of this article is Network IO in Linux.

I. Concept description

Before I explain, a few concepts need to be explained:

  • User space and kernel space
  • Process switching
  • Process blocking
  • File descriptor
  • The cache I/O

User space versus kernel space

Today's operating systems use virtual memory, so for a 32-bit operating system the addressing space (virtual address space) is 4 GB (2^32). The core of the operating system is the kernel, which is independent of ordinary applications and can access the protected memory space as well as all of the underlying hardware devices. To ensure that user processes cannot operate on the kernel directly and to keep the kernel safe, the system divides the virtual address space into two parts: kernel space and user space. For Linux, the highest 1 GB (virtual addresses 0xC0000000 to 0xFFFFFFFF) is used by the kernel and is called kernel space, while the lower 3 GB (virtual addresses 0x00000000 to 0xBFFFFFFF) is used by the various processes and is called user space.

Process switching

To control process execution, the kernel must have the ability to suspend a process running on the CPU and resume execution of a previously suspended process. This behavior is called process switching. Therefore, it can be said that any process runs under the support of the operating system kernel and is closely related to the kernel.

Switching from one running process to another goes through the following steps:

  1. Save the processor context, including the program counter and other registers.
  2. Update PCB information.
  3. Move the process PCB to the appropriate queue, such as ready, blocked at an event queue, etc.
  4. Select another process to execute and update its PCB.
  5. Update memory management data structures.
  6. Restore processor context.

Process blocking

If some event that the running process is expecting has not yet occurred (for example, a system resource request fails, it is waiting for an operation to complete, new data has not arrived, or there is no new work to do), the system automatically executes the block primitive, changing the process from the running state to the blocked state. Blocking is therefore an active behavior of the process itself, so only a running process (one that holds the CPU) can enter the blocked state. A blocked process consumes no CPU resources.

File descriptor fd

A file descriptor is a computer-science term: an abstraction used to refer to a file.

Formally, a file descriptor is a non-negative integer. In practice it is an index into the table of open files that the kernel maintains for each process. When a program opens an existing file or creates a new one, the kernel returns a file descriptor to the process. Much low-level systems programming revolves around file descriptors. The concept, however, applies mainly to UNIX-like operating systems such as Linux.
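As a small illustration (a sketch, not from the original article; the file name example.txt is hypothetical), here is how a C program obtains and uses a file descriptor:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* open() asks the kernel to open the file and returns a small
           non-negative integer: an index into this process's open-file table. */
        int fd = open("example.txt", O_RDONLY);   /* hypothetical file name */
        if (fd == -1) {
            perror("open");
            return 1;
        }
        printf("kernel handed us file descriptor %d\n", fd);

        char buf[128];
        ssize_t n = read(fd, buf, sizeof(buf));   /* all later IO refers to the fd */
        if (n >= 0)
            printf("read %zd bytes\n", n);

        close(fd);                                /* release the descriptor */
        return 0;
    }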

The cache I/O

Cache I/O, also known as standard I/O, is the default I/O operation for most file systems. In Linux’s cached I/O mechanism, the operating system caches I/O data in the file system’s page cache. That is, data is copied to the operating system kernel buffer before it is copied from the operating system kernel buffer to the application address space.

Disadvantages of caching I/O:

During data transmission, the data has to be copied multiple times between the application address space and the kernel. These copy operations cost considerable CPU and memory.

II. IO mode

As mentioned earlier, for an IO access (such as read), data is copied to the operating system kernel buffer before being copied from the operating system kernel buffer to the application address space. So, when a read operation occurs, it goes through two phases:

  1. Waiting for the data to be ready
  2. Copying the data from the kernel to the process

Because of these two phases, Linux has the following five network IO models:

  • Blocking I/O (blocking IO)
  • Nonblocking I/O (nonblocking IO)
  • I/O multiplexing
  • Signal-driven I/O
  • Asynchronous I/O (asynchronous IO)

Note: Since Signal Driven IO is not commonly used in practice, I will only mention the remaining four IO models.

Blocking I/O (Blocking IO)

In Linux, all sockets are blocking by default. A typical read operation would look like this:

When the user process issues the recvfrom system call, the kernel begins the first phase of IO: preparing the data. For network IO the data often has not arrived yet (for example, a complete UDP packet has not been received), so the kernel must wait for enough data to arrive. This takes time: the data has to be copied into a buffer inside the operating system kernel. On the user side, the entire process is blocked (by the process's own choice, of course). When the kernel has waited until the data is ready, it copies the data from the kernel to user memory, then returns the result; the user process unblocks and starts running again.

So, the characteristic of blocking IO is that both phases of IO execution are blocked.
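A minimal sketch of what the caller's side looks like under blocking IO (not the article's own code; sockfd is assumed to be an already connected socket left in its default blocking mode):

    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* sockfd is assumed to be a connected socket in its default (blocking) mode */
    void blocking_read(int sockfd) {
        char buf[1024];
        /* Phase 1: the kernel waits until data arrives (the call blocks here).
           Phase 2: the kernel copies the data into buf (still blocked).
           Only then does recv() return to the user process. */
        ssize_t n = recv(sockfd, buf, sizeof(buf), 0);
        if (n > 0)
            printf("got %zd bytes after blocking\n", n);
        else if (n == 0)
            printf("peer closed the connection\n");
        else
            perror("recv");
    }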

Nonblocking I/O (nonblocking IO)

On Linux, you can set the socket to make it non-blocking. When a read is performed on a non-blocking socket, the flow looks like this:

When a user process issues a read operation, if the data in the kernel is not ready, it does not block the user process, but immediately returns an error. From the user process’s point of view, when it initiates a read operation, it does not wait, but gets a result immediately. When the user process determines that the result is an error, it knows that the data is not ready, so it can send the read operation again. Once the kernel is ready and receives a system call from the user process again, it copies the data to the user’s memory and returns.

So, the characteristic of nonblocking IO is that the user process has to keep actively asking the kernel whether the data is ready.
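A minimal sketch of this polling pattern (illustrative only; sockfd is assumed to be an already connected socket):

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    void nonblocking_read(int sockfd) {
        /* switch the socket to non-blocking mode */
        int flags = fcntl(sockfd, F_GETFL, 0);
        fcntl(sockfd, F_SETFL, flags | O_NONBLOCK);

        char buf[1024];
        for (;;) {
            ssize_t n = recv(sockfd, buf, sizeof(buf), 0);
            if (n > 0) {
                printf("data ready, copied %zd bytes\n", n);
                break;
            }
            if (n == 0) {
                printf("peer closed the connection\n");
                break;
            }
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                /* kernel data not ready: the call returned immediately with an error,
                   so the process keeps asking (busy polling wastes CPU) */
                usleep(1000);
                continue;
            }
            perror("recv");
            break;
        }
    }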

I/O multiplexing

IO multiplexing is what select, poll, and epoll provide; some places also call this IO model event-driven IO. The advantage of select/epoll is that a single process can handle the IO of many network connections at the same time: the select, poll, or epoll call monitors all the sockets it is responsible for and notifies the user process when data arrives on any of them.

When a user process calls select, the whole process is blocked while the kernel "monitors" all the sockets that select is responsible for; select returns as soon as data is ready on any of them. The user process then calls the read operation to copy the data from the kernel to the user process.

So, I/O multiplexing is characterized by a mechanism in which one process can wait on multiple file descriptors at the same time, and select() returns as soon as any one of these file descriptors (socket descriptors) becomes readable.

This flow is not really that different from that of blocking IO; in fact it is slightly worse, because it requires two system calls (select and recvfrom), whereas blocking IO needs only one (recvfrom). The advantage of select, however, is that it can handle many connections at once.

So a web server using select/epoll does not necessarily perform better than one using multi-threading plus blocking IO, and it may even have higher latency if the number of connections it handles is not very large. The advantage of select/epoll is not that it processes an individual connection faster, but that it can handle more connections.

In the IO multiplexing model, each socket is usually set to non-blocking in practice; however, as described above, the user process is in fact blocked the whole time. It is just blocked by the select call rather than by the socket IO itself.

Asynchronous I/O (Asynchronous IO)

Asynchronous IO is rarely used in Linux. Let’s take a look at the flow:

As soon as the user process initiates the read operation, it can go off and do other things. From the kernel's point of view, when it receives an asynchronous read it returns immediately, so the user process is not blocked at all. The kernel then waits for the data to be ready and copies it to user memory; when this is done, the kernel sends a signal to the user process telling it that the read operation is complete.
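On Linux, one concrete (if rarely used) realization of this model is the POSIX AIO interface (aio_read and friends, linked with -lrt). Below is a minimal sketch, assuming fd is an already opened descriptor; note that glibc implements POSIX AIO with user-space threads, while kernel-native asynchronous IO goes through io_submit or the newer io_uring:

    #include <aio.h>
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* fd is assumed to be an already opened descriptor */
    void async_read(int fd) {
        static char buf[1024];
        struct aiocb cb;
        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_buf    = buf;
        cb.aio_nbytes = sizeof(buf);
        cb.aio_offset = 0;

        if (aio_read(&cb) == -1) {          /* returns immediately; no blocking */
            perror("aio_read");
            return;
        }

        /* the process is free to do other work here ... */

        while (aio_error(&cb) == EINPROGRESS)
            usleep(1000);                   /* or request a signal/callback instead of polling */

        ssize_t n = aio_return(&cb);        /* kernel has already copied data into buf */
        printf("asynchronous read finished: %zd bytes\n", n);
    }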

Conclusion

The difference between blocking and non-blocking IO

Calling a blocking IO operation blocks the corresponding process until the operation completes, whereas non-blocking IO returns immediately even if the kernel is still preparing the data.

Synchronous IO and Asynchronous IO

Before explaining the difference between synchronous IO and Asynchronous IO, we need to define both. POSIX’s definition looks like this:

  • A synchronous I/O operation causes the requesting process to be blocked until that I/O operation completes;
  • An asynchronous I/O operation does not cause the requesting process to be blocked;

The difference between the two is that synchronous IO blocks the process during “IO operation”. According to this definition, the aforementioned blocking IO, non-blocking IO, IO multiplexing all belong to synchronous IO.

You might ask: isn't non-blocking IO not blocked at all? Note that the "I/O operation" in the definition refers to the actual IO operation, such as recvfrom. With non-blocking IO, if the kernel data is not ready, the recvfrom call does not block the process. But once the kernel data is ready, recvfrom copies the data from the kernel into user memory, and during that copy the process is blocked.

Asynchronous IO, by contrast, simply returns after the IO is initiated, and the process pays no further attention to it until the kernel sends a signal telling it that the IO is complete. During the whole process, the process is never blocked.

The comparison of each IO Model is shown in the figure:

The difference between non-blocking IO and asynchronous IO is clear from the above picture. In non-blocking IO, although the process is not blocked most of the time, it still requires the process to actively check, and when the data is ready, it also requires the process to actively call recvfrom again to copy the data to user memory. Asynchronous IO is completely different. It is as if the user process hands off the entire IO operation to someone else (the kernel), who then sends a signal to notify when it is finished. During this time, the user process does not need to check the status of IO operations or actively copy data.

III. I/O multiplexing: select, poll, epoll

select, poll, and epoll are all mechanisms for I/O multiplexing. I/O multiplexing is a mechanism by which a process can monitor multiple descriptors and, once one of them becomes ready (usually readable or writable), tell the program to perform the corresponding read or write. select, poll, and epoll are all synchronous I/O in nature, because after a read/write event becomes ready the process still has to do the reading and writing itself, i.e. the read/write blocks the process; asynchronous I/O does not, because the asynchronous implementation takes care of copying the data from the kernel to user space.

select

    int select(int n, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

The file descriptors monitored by select fall into three classes: writefds, readfds, and exceptfds. The call blocks until some descriptor becomes ready (readable, writable, or with an exception) or until the timeout expires (timeout specifies the wait time; a NULL timeout blocks indefinitely, while a zero-valued timeout makes select return immediately). When select returns, the ready descriptors are found by iterating over the fd sets.

select is currently supported on almost all platforms, and this good cross-platform support is one of its advantages. One drawback of select is that there is a maximum limit on the number of file descriptors a single process can monitor, which on Linux is typically 1024. The limit can be raised by changing the macro definition and recompiling the kernel, but that also tends to reduce efficiency.
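A minimal sketch of the typical select() loop (illustrative only; listenfd is assumed to be a listening socket, and error handling is trimmed):

    #include <stdio.h>
    #include <sys/select.h>
    #include <sys/socket.h>

    void select_loop(int listenfd) {
        fd_set readfds;
        int maxfd = listenfd;

        for (;;) {
            /* the fd_set is modified in place, so it must be rebuilt every iteration */
            FD_ZERO(&readfds);
            FD_SET(listenfd, &readfds);

            /* blocks until at least one monitored descriptor is readable
               (a NULL timeout means wait indefinitely) */
            int ready = select(maxfd + 1, &readfds, NULL, NULL, NULL);
            if (ready == -1) {
                perror("select");
                break;
            }

            /* select only reports "something is ready": we still have to scan the set */
            if (FD_ISSET(listenfd, &readfds)) {
                int connfd = accept(listenfd, NULL, NULL);
                /* ... add connfd to the set on later iterations, track maxfd, handle IO */
                (void)connfd;
            }
        }
    }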

poll

    int poll(struct pollfd *fds, unsigned int nfds, int timeout);

Unlike select, which uses three bitmaps to represent the three fd sets, poll uses an array of pollfd structures, passed as a pointer:

    struct pollfd {
        int   fd;       /* file descriptor */
        short events;   /* requested events to watch */
        short revents;  /* returned events witnessed */
    };

The pollfd structure contains both the events to monitor and the events that actually occurred, so the parameter-value passing style of select is no longer needed. There is also no maximum number of pollfd entries (although too many will degrade performance). As with select, after poll returns, the pollfd array has to be scanned to find the ready descriptors.
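For comparison, a minimal sketch of the same kind of loop written with poll() (again listenfd is an assumed listening socket; only one descriptor is shown to keep it short):

    #include <poll.h>
    #include <stdio.h>
    #include <sys/socket.h>

    void poll_loop(int listenfd) {
        struct pollfd fds[1];
        fds[0].fd     = listenfd;
        fds[0].events = POLLIN;           /* events we ask the kernel to watch */

        for (;;) {
            /* unlike select, the array is not destroyed: events stays set,
               results come back in revents */
            int ready = poll(fds, 1, -1); /* -1 means block until something is ready */
            if (ready == -1) {
                perror("poll");
                break;
            }
            if (fds[0].revents & POLLIN) {
                int connfd = accept(listenfd, NULL, NULL);
                /* ... grow the pollfd array with connfd and watch it too */
                (void)connfd;
            }
        }
    }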

As noted above, both select and poll have to iterate over all the file descriptors to find the ready sockets after they return. In fact, when a large number of clients are connected at once, only a few of them may be ready at any moment, so monitoring efficiency decreases linearly as the number of descriptors grows.

epoll

epoll was introduced in kernel 2.6 as an enhanced version of the earlier select and poll. Compared with them, epoll is more flexible and has no descriptor limit. epoll uses one extra file descriptor to manage many descriptors: it stores the events of the file descriptors the user cares about in an event table inside the kernel, so the copy between user space and kernel space is done only once.

3.1 Epoll Operation Process

The epoll operation requires three interfaces, which are as follows:

    int epoll_create(int size);
    int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
    int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

1. int epoll_create(int size);

size tells the kernel roughly how many descriptors this epoll instance will monitor. This parameter differs from the first parameter of select(), which is the maximum monitored fd plus 1; size does not limit the number of descriptors epoll can monitor, it is only a hint that helps the kernel size its internal data structures initially. Creating an epoll handle itself occupies one fd value, which can be seen under /proc/<pid>/fd/ on Linux, so close() must be called when you are done with epoll, otherwise fds may be exhausted.

2. int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

This function performs the operation op on the specified file descriptor fd.

  • epfd: the return value of epoll_create().

  • op: the operation, represented by three macros: EPOLL_CTL_ADD (add), EPOLL_CTL_DEL (delete), and EPOLL_CTL_MOD (modify), which respectively add, delete, and modify the events monitored on fd.

  • fd: the file descriptor to monitor.

  • event: tells the kernel which events to monitor; struct epoll_event looks like this:

    struct epoll_event {
        __uint32_t   events;  /* Epoll events */
        epoll_data_t data;    /* User data variable */
    };

    events can be a combination of the following macros:

      EPOLLIN: the corresponding file descriptor can be read (including a normal close of the peer socket)
      EPOLLOUT: the corresponding file descriptor can be written
      EPOLLPRI: the corresponding file descriptor has urgent (out-of-band) data to read
      EPOLLERR: an error occurred on the corresponding file descriptor
      EPOLLHUP: the corresponding file descriptor was hung up
      EPOLLET: put epoll into edge-triggered (ET) mode, as opposed to the default level-triggered (LT) mode
      EPOLLONESHOT: monitor only one occurrence of the event; after it fires, the socket must be added to the epoll queue again if you want to keep monitoring it

3. int epoll_wait(int epfd, struct epoll_event * events, int maxevents, int timeout);

Wait for IO events on epfd; at most maxevents events are returned. The events parameter is where the kernel places the events that occurred, and maxevents tells the kernel how large that array is; its value must not be greater than the size passed to epoll_create(). timeout is the timeout in milliseconds (0 returns immediately, -1 blocks indefinitely). The function returns the number of events that need to be processed; a return value of 0 indicates a timeout.

3.2 Working Mode

epoll operates on file descriptors in two modes: LT (level triggered) and ET (edge triggered). LT is the default. The two modes differ as follows:

LT mode: when epoll_wait detects that a descriptor event has occurred and reports it to the application, the application does not have to handle the event immediately; the next time epoll_wait is called, it will report the event again.

ET mode: when epoll_wait detects that a descriptor event has occurred and reports it to the application, the application must handle the event immediately; if it does not, the next time epoll_wait is called it will not report this event again.

1. LT model

LT (level triggered) is the default mode and supports both blocking and non-blocking sockets. In this mode the kernel tells you when a file descriptor is ready, and you can then perform IO on that fd. If you do nothing, the kernel will keep notifying you.

2. ET model

ET (edge triggered) is the high-speed mode and supports only non-blocking sockets. In this mode the kernel tells you, via epoll, when a descriptor changes from not ready to ready. It then assumes you know the file descriptor is ready and will not send another readiness notification for it until you do something that makes the descriptor not ready again (for example, sending, receiving, or accepting, or hitting an EWOULDBLOCK error after sending or receiving less than a full buffer of data). Note, however, that the kernel will not send further notifications (it notifies only once) if you never perform IO on the fd so that it becomes not ready again.

ET mode greatly reduces the number of times epoll events are triggered repeatedly, so it is more efficient than LT mode. When epoll works in ET mode, non-blocking sockets must be used, to avoid a blocking read or write on one file handle starving the task of handling the other file descriptors.
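A minimal sketch of registering a descriptor in ET mode together with the non-blocking setting that this mode requires (epollfd and fd are assumed to already exist):

    #include <fcntl.h>
    #include <sys/epoll.h>

    /* epollfd: from epoll_create(); fd: the socket to watch (both assumed to exist) */
    int watch_edge_triggered(int epollfd, int fd) {
        /* ET mode requires a non-blocking descriptor, otherwise a read/write
           on a handle with no data left would block the whole event loop */
        int flags = fcntl(fd, F_GETFL, 0);
        if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
            return -1;

        struct epoll_event ev;
        ev.events  = EPOLLIN | EPOLLET;   /* read events, edge-triggered */
        ev.data.fd = fd;
        return epoll_ctl(epollfd, EPOLL_CTL_ADD, fd, &ev);
    }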

3. Summary

Here’s an example:

  1. We add the file handle (RFD) that represents the read side of a pipe to the epoll descriptor
  2. 2 KB of data is written to the other end of the pipe
  3. epoll_wait(2) is called and reports RFD as ready for reading
  4. We read 1 KB of data
  5. epoll_wait(2) is called again...

LT mode: in LT mode, the epoll_wait(2) at step 5 will still report RFD as ready.

ET mode: if we used the EPOLLET flag when adding RFD to the epoll descriptor in step 1, we may well hang at the epoll_wait(2) in step 5, even though the remaining data is still sitting in the file's input buffer and the sender is still waiting for a response to the data it already sent. ET mode reports an event only when it occurs on the monitored file handle, so at step 5 the caller may end up waiting for data that is already in the input buffer.

When using epoll in ET mode and an EPOLLIN event fires, we have to be careful when reading: if the size returned by recv() equals the requested size, there is very likely still unread data in the buffer, which means the event has not been fully handled, so we need to keep reading:

    while (rs) {
        buflen = recv(activeevents[i].data.fd, buf, sizeof(buf), 0);
        if (buflen < 0) {
            // under non-blocking IO, EAGAIN means there is no more data to read for now
            if (errno == EAGAIN) {
                break;
            } else {
                return;
            }
        }
        if (buflen == sizeof(buf)) {
            rs = 1;   // the buffer may still hold unread data, so read again
        } else {
            rs = 0;
        }
    }

EAGAIN in Linux

Development on Linux often runs into errors (set in errno), and EAGAIN is one of the more common ones (for example, with non-blocking operations). Literally it is a hint to "try again". This error usually occurs when an application performs some non-blocking operation (on a file or a socket).

For example, open a file, socket, or FIFO with the O_NONBLOCK flag and then keep reading when there is no data to read. Instead of blocking and waiting for data, read returns an error, EAGAIN, telling your application that there is nothing to read right now and it should try again later. Likewise, when a system call (such as fork) fails because there are not enough resources (such as virtual memory), it returns EAGAIN to suggest calling again (perhaps it will succeed next time).
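A small sketch of the FIFO case described above (the path /tmp/myfifo is hypothetical and assumed to have been created with mkfifo beforehand):

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* /tmp/myfifo is a hypothetical FIFO created beforehand with mkfifo(1) */
        int fd = open("/tmp/myfifo", O_RDONLY | O_NONBLOCK);
        if (fd == -1) {
            perror("open");
            return 1;
        }

        char buf[128];
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n == -1 && errno == EAGAIN) {
            /* a writer holds the FIFO open but has no data for us yet:
               instead of blocking, the kernel says "try again" */
            printf("nothing to read right now (EAGAIN), try again later\n");
        } else if (n >= 0) {
            printf("read %zd bytes\n", n);
        } else {
            perror("read");
        }
        close(fd);
        return 0;
    }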

4. Epoll summary in Kafka

Here we introduce the socket “read/write buffer” concept:

Level trigger (condition trigger): a read event fires as long as the read buffer is not empty; a write event fires as long as the write buffer is not full. This matches the usual programming convention and is epoll's default mode.

Edge trigger (state trigger): a read event fires once when the read buffer changes from empty to non-empty; a write event fires once when the write buffer changes from full to not full. For example, if you send a large file and fill up the write buffer, then once the buffer becomes writable again, a transition from full to not full occurs.

Through analysis, we can see that:

For LT mode, avoid the "write loop" problem: the probability of the write buffer being full is very small, so the "writable" condition is almost always satisfied. If you register a write event but have no data to write, it will keep firing; so in LT mode, after writing your data, be sure to cancel the write event.

For ET mode, avoid the "short read" problem: if 100 bytes arrive, the event fires once; if you read only 50 bytes, the remaining 50 bytes will not trigger another event, and the connection is effectively stuck. Therefore, in ET mode, it is essential to drain the read buffer completely.

Another difference between LT and ET is that LT works with both blocking and non-blocking IO, while ET works only with non-blocking IO. There is also the claim that ET performs better but is harder to program and more error-prone. Whether ET necessarily outperforms LT is debatable; real test data is needed to settle it. As mentioned above, epoll defaults to LT mode, and Java NIO uses epoll in LT mode.

(1) The notion of "event ready" means slightly different things for different event types

Read event ready: the easiest to understand; new data has arrived from the remote side and needs to be read. As long as there is data in the read buffer, the event keeps firing.

Write event ready: what does this mean? It means the local socket send buffer is not full. As long as it is not full, write events keep firing, so avoid the "write loop" problem and cancel the write event once you have finished writing.

Connect event ready: the connect() call has completed.

Accept event ready: a new connection has arrived; call accept().

(2) Different types of events are handled differently:

Connect event: register it once, then cancel it after the connection succeeds; it fires exactly once.

Read event: do not cancel it after registering; keep listening.

Write event: register it once for each send call, and cancel it once the send has succeeded (see the sketch below).
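A rough sketch of that write-event convention under LT mode (epollfd and fd are assumed to exist; the helper name want_write is illustrative):

    #include <string.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>

    /* register interest in "writable" only while we actually have data queued */
    static void want_write(int epollfd, int fd, int on) {
        struct epoll_event ev;
        ev.data.fd = fd;
        ev.events  = EPOLLIN | (on ? EPOLLOUT : 0);
        epoll_ctl(epollfd, EPOLL_CTL_MOD, fd, &ev);
    }

    void send_reply(int epollfd, int fd, const char *msg) {
        ssize_t n = send(fd, msg, strlen(msg), 0);
        if (n == (ssize_t)strlen(msg)) {
            /* everything went out: cancel the write event immediately,
               otherwise LT mode will keep firing "writable" forever */
            want_write(epollfd, fd, 0);
        } else {
            /* kernel buffer was full (or only partly drained): keep EPOLLOUT registered
               and finish sending when the "writable" event arrives */
            want_write(epollfd, fd, 1);
        }
    }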

3.3 Code Demo

The following is an incomplete, not particularly well-formed piece of code, intended only to illustrate the flow described above; some of the boilerplate has been removed.

    #define IPADDRESS   "127.0.0.1"
    #define PORT        8787
    #define MAXSIZE     1024
    #define LISTENQ     5
    #define FDSIZE      1000
    #define EPOLLEVENTS 100

    listenfd = socket_bind(IPADDRESS, PORT);

    struct epoll_event events[EPOLLEVENTS];

    // create an epoll descriptor
    epollfd = epoll_create(FDSIZE);

    // register the listening descriptor's read event
    add_event(epollfd, listenfd, EPOLLIN);

    // event loop
    for ( ; ; ) {
        // returns the number of descriptor events that are ready
        ret = epoll_wait(epollfd, events, EPOLLEVENTS, -1);
        // handle the received events
        handle_events(epollfd, events, ret, listenfd, buf);
    }

    // event dispatch
    static void handle_events(int epollfd, struct epoll_event *events, int num, int listenfd, char *buf)
    {
        int i;
        int fd;
        // only the ready IO events are traversed; num is not the FDSIZE passed to epoll_create
        for (i = 0; i < num; i++) {
            fd = events[i].data.fd;
            // dispatch according to the descriptor type and event type
            if ((fd == listenfd) && (events[i].events & EPOLLIN))
                handle_accpet(epollfd, listenfd);
            else if (events[i].events & EPOLLIN)
                do_read(epollfd, fd, buf);
            else if (events[i].events & EPOLLOUT)
                do_write(epollfd, fd, buf);
        }
    }

    // add an event
    static void add_event(int epollfd, int fd, int state)
    {
        struct epoll_event ev;
        ev.events = state;
        ev.data.fd = fd;
        epoll_ctl(epollfd, EPOLL_CTL_ADD, fd, &ev);
    }

    // handle a new connection
    static void handle_accpet(int epollfd, int listenfd)
    {
        int clifd;
        struct sockaddr_in cliaddr;
        socklen_t cliaddrlen;
        clifd = accept(listenfd, (struct sockaddr*)&cliaddr, &cliaddrlen);
        if (clifd == -1)
            perror("accpet error:");
        else {
            printf("accept a new client: %s:%d\n", inet_ntoa(cliaddr.sin_addr), cliaddr.sin_port);
            // register the client descriptor and its read event
            add_event(epollfd, clifd, EPOLLIN);
        }
    }

    // read handler
    static void do_read(int epollfd, int fd, char *buf)
    {
        int nread;
        nread = read(fd, buf, MAXSIZE);
        if (nread == -1) {
            perror("read error:");
            close(fd);                          // remember to close the fd
            delete_event(epollfd, fd, EPOLLIN); // remove the monitored event
        } else if (nread == 0) {
            fprintf(stderr, "client close.\n");
            close(fd);                          // remember to close the fd
            delete_event(epollfd, fd, EPOLLIN); // remove the monitored event
        } else {
            printf("read message is : %s", buf);
            // switch the descriptor's event from read to write
            modify_event(epollfd, fd, EPOLLOUT);
        }
    }

    // write handler
    static void do_write(int epollfd, int fd, char *buf)
    {
        int nwrite;
        nwrite = write(fd, buf, strlen(buf));
        if (nwrite == -1) {
            perror("write error:");
            close(fd);                           // remember to close the fd
            delete_event(epollfd, fd, EPOLLOUT); // remove the monitored event
        } else {
            // switch back from write to read
            modify_event(epollfd, fd, EPOLLIN);
        }
        memset(buf, 0, MAXSIZE);
    }

    // delete an event
    static void delete_event(int epollfd, int fd, int state)
    {
        struct epoll_event ev;
        ev.events = state;
        ev.data.fd = fd;
        epoll_ctl(epollfd, EPOLL_CTL_DEL, fd, &ev);
    }

    // modify an event
    static void modify_event(int epollfd, int fd, int state)
    {
        struct epoll_event ev;
        ev.events = state;
        ev.data.fd = fd;
        epoll_ctl(epollfd, EPOLL_CTL_MOD, fd, &ev);
    }

    // note: the other end (the client side) is omitted

IV. epoll summary

In select/poll, the kernel scans all the monitored file descriptors only after the call is made, whereas epoll registers a file descriptor in advance through epoll_ctl(). Once a file descriptor becomes ready, the kernel uses a callback-like mechanism to activate it quickly, so the process is notified as soon as it calls epoll_wait(). The traversal of all file descriptors is thus eliminated and replaced by a mechanism that listens for callbacks. That is the beauty of epoll.

The advantages of epoll are as follows:

  1. There is no limit on the number of monitored descriptors. The maximum number of fds epoll can support is the maximum number of files the system can open, which is generally far larger than 2048; on a machine with 1 GB of memory it is around 100,000 (the exact figure can be checked with cat /proc/sys/fs/file-max and generally depends on system memory). select's biggest drawback is precisely the limit on how many fds a single process can open, which is not enough for servers handling a large number of connections. A multi-process solution is possible (this is what Apache does), but although creating a process is relatively cheap on Linux, the cost is still not negligible, and data synchronization between processes is far less efficient than synchronization between threads, so it is not a perfect solution.

  2. IO efficiency does not drop as the number of monitored fds grows. Unlike the polling done by select and poll, epoll is implemented with a callback function registered on each fd, so only the ready fds invoke their callbacks. If there are not large numbers of idle or dead connections, epoll is not much more efficient than select/poll; but when there are large numbers of idle connections, epoll is far more efficient than select/poll.

Reference:

  • Linux IO mode and select, poll, epoll details
  • IO: synchronous, asynchronous, blocking, non-blocking
  • The difference between select, poll and epoll in Linux
  • I/O multiplexing: select summary
  • I/O multiplexing: poll summary
  • I/O multiplexing: epoll summary