IO Model Introduction

Blocking IO, nonblocking IO, IO multiplexing, signal-driven IO, asynchronous IO

First, the objects and phases involved in an IO operation:

*For a network IO (here we use read as an example), two system objects are involved:*

  • One is the process (or thread) that calls the IO.
  • One is the kernel.

*When a read operation occurs, it goes through two phases:*

  • Waiting for the data to be ready: for example, accept() waits for a connection to arrive and recv() waits for data to arrive.
  • Copying the data from the kernel to the process: for example, accept() takes the received request, and recv() copies the data the connection sent from the kernel buffer into the process's user space.

*For socket streams, the flow of data goes through two phases:*

  • The first step usually involves waiting for packets of data on the network to arrive and then be copied to some buffer in the kernel.
  • The second step copies the data from the kernel buffer to the application process buffer.

It is important to keep these two phases in mind, because the IO models differ precisely in how they behave at each of them.

Blocking I/O (Blocking IO)

In Linux, all sockets are blocking by default. A typical read operation would look like this:

When the user process makes the recvfrom system call, the kernel begins the first phase of IO: preparing the data. For network IO, the data often has not arrived yet (for example, a complete UDP packet has not been received), so the kernel must wait for enough data to arrive. This takes time, because the data has to be gathered into a buffer in the operating system kernel. On the user side, the entire process is blocked (by the process's own choice, of course). When the kernel has waited until the data is ready, it copies the data from the kernel to user memory, then returns the result, and the user process unblocks and starts running again.

So, the characteristic of blocking IO is that both phases of IO execution are blocked.
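Both blocked phases can be sketched in Python; a `socketpair` stands in for a real network connection, and the 0.2 s delay and the message are invented for illustration:

```python
import socket
import threading
import time

# A socketpair stands in for a real network connection; the 0.2 s delay
# simulates phase 1 (waiting for the data to reach the kernel).
a, b = socket.socketpair()

def sender():
    time.sleep(0.2)      # data is "on the network" for a while
    b.sendall(b"hello")  # now the kernel has data for the reader

start = time.monotonic()
t = threading.Thread(target=sender)
t.start()

# recv blocks through BOTH phases: waiting for data to be ready,
# then copying it from the kernel buffer into user space.
data = a.recv(1024)
elapsed = time.monotonic() - start
t.join()

print(data)            # b'hello'
print(elapsed >= 0.2)  # True: the call blocked until the data arrived
```

The calling thread does nothing else for the whole 0.2 s; that is the defining cost of this model.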

Non-blocking I/O (Nonblocking IO)

On Linux, you can set the socket to make it non-blocking. When a read is performed on a non-blocking socket, the flow looks like this:

When the user process issues a read operation and the data in the kernel is not yet ready, the kernel does not block the user process but immediately returns an error. From the user process's point of view, the read does not wait: it gets a result right away. When the user process sees that the result is an error, it knows the data is not ready, so it can issue the read operation again. Once the kernel has the data ready and receives another system call from the user process, it copies the data to user memory and returns.

So the characteristic of nonblocking IO is that the user process must repeatedly ask the kernel whether the data is ready.

It is worth noting that "non-blocking" only applies to the waiting-for-data phase: when recvfrom actually performs the copy, the IO is still a blocking, synchronous operation.
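The polling loop can be sketched as follows; a `socketpair` stands in for the connection, the poll count and the message are invented, and `BlockingIOError` is Python's surface for the kernel's EWOULDBLOCK error:

```python
import socket
import time

# A socketpair stands in for the connection.
a, b = socket.socketpair()
a.setblocking(False)  # same effect as setting O_NONBLOCK on the fd

polls = 0
data = None
deadline = time.monotonic() + 1.0
while data is None and time.monotonic() < deadline:
    try:
        # Returns immediately; raises if the kernel has no data yet.
        data = a.recv(1024)
    except BlockingIOError:
        polls += 1        # the kernel said "not ready, ask again later"
        if polls == 3:    # after a few failed polls, let data "arrive"
            b.sendall(b"ready")
        time.sleep(0.01)

print(data, polls)  # b'ready' 3
```

Note that the final successful `recv` (the copy from kernel to user space) still blocks; only the waiting phase was made non-blocking.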

I/O multiplexing

IO multiplexing refers to select, poll, and epoll; some places also call this IO mode event-driven IO. The advantage of select/epoll is that a single process can handle the IO of multiple network connections simultaneously: the select, poll, and epoll functions poll all the sockets they are responsible for and notify the user process when data arrives on any of them.

This graph isn't really that different from blocking IO's graph; in fact, it's worse, because two system calls (select and recvfrom) are required, while blocking IO makes only one (recvfrom). However, the advantage of select is that it can handle multiple connections simultaneously.

So, a web server using select/epoll does not necessarily perform better than a web server using multi-threading + blocking IO, and may even have greater latency when the number of connections being processed is not very high. The advantage of select/epoll is not that it processes an individual connection faster, but that it can handle more connections.

In the IO multiplexing model, each socket is usually set to non-blocking in practice; only then can the single thread/process avoid blocking (or locking) on any one socket and keep processing the others. As the figure above shows, the entire user process is in fact blocked the whole time, but it is blocked by the select call rather than by socket IO.

When a user process calls select, the whole process blocks, and all the sockets of interest are added to select's monitor list. The kernel then monitors every socket that select (or epoll, poll, etc.) is responsible for; the sockets in the monitor list are themselves non-blocking. Select uses some monitoring mechanism to check whether data has arrived on any particular socket, and returns as soon as data is ready on any of them. The user process then calls a read operation to copy the data from the kernel to the user process.

Comments:

  • *I/O multiplexing is characterized by a mechanism whereby one process can wait on multiple file descriptors at the same time.* If any one of these (socket) descriptors becomes read-ready, the select() function can return.
  • So IO multiplexing, by its very nature, is not concurrency: only one process or thread is working at any given moment. It improves efficiency because select/epoll puts the incoming sockets on its watch list and handles a socket as soon as it has readable or writable data. Detecting many ready sockets at once and returning them to the process together is more efficient than blocking and waiting on one socket at a time.
  • Of course, one could also open a process/thread per connection, but that consumes more memory and more system resources on process switching. So we can combine IO multiplexing with multi-process/multi-thread to achieve high concurrency: IO multiplexing efficiently receives socket notifications, and once a request is received it is handed to a process pool/thread pool to process the logic.
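The wait-on-many-descriptors idea can be sketched with Python's `select.select`; the three socketpairs and the messages are invented for illustration:

```python
import select
import socket

# Three "connections" via socketpairs; only two have data ready.
pairs = [socket.socketpair() for _ in range(3)]
readers = [r for r, _ in pairs]

pairs[0][1].sendall(b"one")
pairs[2][1].sendall(b"three")

# select blocks the single process until at least one socket is
# readable, then returns every ready descriptor in one batch.
readable, _, _ = select.select(readers, [], [], 1.0)

messages = {readers.index(s): s.recv(1024) for s in readable}
print(messages)  # {0: b'one', 2: b'three'}
```

One process served two ready connections in a single wait; the idle connection cost nothing beyond its slot in the watch list.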

Asynchronous I/O (Asynchronous IO)

Asynchronous IO is rarely used in Linux. Let’s take a look at the flow:

As soon as the user process initiates the read operation, it can go on doing other things. From the kernel's point of view, when it receives an asynchronous read it first returns immediately, so the user process is not blocked at all. The kernel then waits for the data to be ready and copies it to the user's memory. When this is done, the kernel sends a signal to the user process telling it that the read operation is complete.
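There is no convenient Python wrapper for POSIX aio_read, so as a rough analogy this sketch uses `asyncio` to mimic the flow: an invented `fake_kernel_read` plays the kernel doing both phases (waiting plus copying), and a done-callback plays the completion signal sent back to the process:

```python
import asyncio

results = []

async def fake_kernel_read():
    await asyncio.sleep(0.1)   # both IO phases happen out of our way
    return b"payload"

async def main():
    task = asyncio.create_task(fake_kernel_read())
    # Ask to be notified when the WHOLE operation has completed.
    task.add_done_callback(lambda t: results.append(t.result()))
    results.append("kept working")  # the caller is never blocked
    await task                      # join only so the demo can exit

asyncio.run(main())
print(results)  # ['kept working', b'payload']
```

The caller's work is recorded before the data appears, which is exactly the inversion of the blocking model above.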

The difference and relation between blocking and non-blocking I/O and synchronous and asynchronous I/O

Blocking IO VS non-blocking IO:

Concept: Blocking and non-blocking are concerned with the state of the program while it waits for the result of a call (a message, a return value). A blocking call means the current thread is suspended until the result of the call is returned; the calling thread does not resume until it has the result. A non-blocking call returns immediately even when the result is not yet available, without suspending the current thread.

Example: suppose you phone the bookstore owner to ask whether they have the book "Distributed Systems". With a blocking call, you stay "hung" on the line until the owner gives you an answer about the book. With a non-blocking call, you hang up and go about your own business regardless of what the owner says, though you may call back every few minutes to check whether there is a result yet. Blocking and non-blocking here have nothing to do with synchronous vs asynchronous; they say nothing about how the owner delivers the answer to you.

Analysis: Blocking IO will block the corresponding process until the operation is complete, but non-blocking IO will return immediately if the kernel is still preparing data.

Synchronous I/O VS Asynchronous I/O:

Synchronous communication means that when you make a call, it does not return until the result is available; once the call returns, you have the return value. In other words, the caller actively waits for the result of the call. Asynchronous communication, on the other hand, returns directly after the call is issued, with no result attached. In other words, when an asynchronous procedure call is made, the caller does not get the result immediately; instead, after the call is made, the callee notifies the caller of the result through a status, a notification, or a callback function.

Take Node.js as a popular example of the asynchronous programming model. Suppose again that you phone the bookstore owner to ask about "Distributed Systems". With a synchronous communication mechanism, the owner says "wait a moment while I check", goes to check, and only after the check finishes (maybe 5 seconds, maybe a day) tells you the result (returns the result). With asynchronous communication, the owner says "let me look it up, I'll call you when I know" and hangs up; when he has the answer, he phones you. Here the owner returns the result by "calling back".

Analysis: Before explaining the difference between synchronous I/O and asynchronous I/O, you need to define them. Stevens’ definition (actually POSIX) looks like this:

A synchronous I/O operation causes the requesting process to be blocked until that I/O operation completes;

An asynchronous I/O operation does not cause the requesting process to be blocked;

The difference between synchronous IO and asynchronous IO is whether the process is blocked during the "IO operation". By this definition, *the blocking IO, non-blocking IO, and IO multiplexing described earlier are all synchronous IO.* One might object that non-blocking IO is not blocked, but "IO operation" here means the actual IO operation, such as recvfrom. Non-blocking IO does not block in the recvfrom system call while the kernel data is not ready; however, once the kernel data is ready, recvfrom copies the data from the kernel to user memory, and during that copy the process is blocked.

Asynchronous I/O is different. When a process initiates an I/O operation, it returns and does not respond again until the kernel sends a signal telling the process that the I/O is complete. During this entire process, the process is not blocked at all.

Visual example of IO model

The four IO models: A, B, C and D go fishing. A uses an old-fashioned fishing rod, so he must wait by it until a fish bites before pulling up the rod. B's rod has an indicator that shows whether a fish has bitten, so B chats with the girl next to him and glances at the indicator every once in a while, pulling up the rod quickly when it shows a bite. C uses a rod like B's, but has the good idea of setting out several rods at once and standing by, pulling up whichever rod shows a bite. D is rich, so he hires someone to fish for him; the hired man sends D a text message as soon as he catches a fish.

Select/Poll/Epoll Polling mechanism

Select, poll, and epoll are all essentially synchronous IO, because the process itself is responsible for the reading and writing after the read/write event is ready; that is, the read/write process still blocks.

How do select, poll, and epoll each monitor whether data has arrived on a socket? How efficient is each? Which approach should we use for IO multiplexing? Listed below are their respective implementations, efficiency, and advantages and disadvantages:

(1) select and poll must themselves repeatedly poll the entire fd set until some device is ready, possibly alternating between sleeping and waking several times. epoll also calls epoll_wait to poll the ready list and may likewise alternate between sleeping and waking several times; however, when a device becomes ready, epoll's callback function puts the ready fd on the ready list and wakes the process sleeping in epoll_wait. So although both alternate between sleeping and waking, select and poll traverse the entire fd set each time they are awake, while epoll only has to check whether the ready list is empty, which saves a lot of CPU time. This is the performance benefit of the callback mechanism.

(2) Each call to select or poll must hang the calling process on the wait queue of every monitored device, while epoll_wait hangs the process on a single wait queue. Note that the wait queue here is not a device wait queue, just a wait queue defined internally by epoll. This also saves a lot of overhead.
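The epoll interface described above can be sketched with Python's Linux-only `select.epoll` wrapper; the socketpair and the message are invented, and the kernel-side callback that fills the ready list is implicit inside `epoll.poll`:

```python
import select
import socket

# Linux-only sketch: a socketpair stands in for a connection. epoll's
# internal callback puts the ready fd on the ready list; epoll.poll
# only hands that list back instead of rescanning every fd.
a, b = socket.socketpair()
ep = select.epoll()
ep.register(a.fileno(), select.EPOLLIN)

b.sendall(b"ping")      # data arrives; the fd becomes ready
events = ep.poll(1.0)   # [(fd, event_mask)] for ready fds only

ready_fds = [fd for fd, _ in events]
data = a.recv(1024) if a.fileno() in ready_fds else None
print(data)  # b'ping'

ep.unregister(a.fileno())
ep.close()
a.close()
b.close()
```

Registration happens once via `register`; afterwards each `poll` call returns only the ready descriptors, which is the ready-list advantage over select/poll rescanning the whole set.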

About me

If this article helped you, feel free to save and share it; that would be a great encouragement to me! In addition, you can follow my public account [code nongfuge] (COder2025), where I will keep publishing original articles on algorithms and computer-science fundamentals!