On Sunday afternoon, I had just put down the phone and was writing up my evaluation of the interviewee. I was in the middle of writing “don’t have a deep understanding of the basic IO model of Linux” when my girlfriend suddenly appeared.
In Java, there are three main IO models, namely blocking IO (BIO), non-blocking IO (NIO), and asynchronous IO (AIO).
The IO APIs provided in Java depend on operating-system-level IO operations to actually process files. For example, since Linux 2.6, both NIO and AIO in Java are implemented via epoll, while on Windows AIO is implemented via IOCP.
BIO, NIO and AIO in Java can be understood as the Java language's encapsulation of the operating system's various IO models. Programmers can use these APIs without needing operating system knowledge or writing different code for different operating systems; they just use the Java API.
In the Linux (UNIX) operating system, there are five IO models: the blocking IO model, the non-blocking IO model, the IO multiplexing model, the signal-driven IO model, and the asynchronous IO model.
Since we’re talking about eating fish at night, let’s use the example of fishing to explain the five IO models.
What exactly is IO
When we talk about IO, we refer to the input and output of files, but how do we define IO at the operating system level? What kind of process can be called an IO?
Take a disk file read for example. The file we want to read is stored on disk, and our goal is to read it into memory. This step can be simplified to reading data from the hardware (hard disk) into user space.
In fact, real file reading involves caching and other details, which I won’t go into here. If you don’t understand the relationship between user space, kernel space, hardware, and so on, the fishing example can help.
When fishing, the fish starts out in the fish pond, and the job is finished when the fish has been moved from the pond into our fish basket.
Here the fish pond can be mapped to the disk, the intermediate hook can be mapped to the kernel space, and the final fish basket can be mapped to the user space. A complete fishing (IO) operation is the transfer (copy) of fish (files) from the fish pond (hard disk) to the fish basket (user space).
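As a small illustration of that pond-to-basket transfer, here is a minimal Python sketch. The names `read_into_basket` and `demo` are made up for this example, and the temporary file simply stands in for "a file on disk"; caching is ignored, as in the text above.

```python
import os
import tempfile

def read_into_basket(path, basket_size=64):
    """Copy up to basket_size bytes of a file into a user-space buffer."""
    basket = bytearray(basket_size)              # the "fish basket": memory owned by our process
    with open(path, "rb", buffering=0) as pond:  # unbuffered, so readinto goes straight to the OS
        caught = pond.readinto(basket)           # the kernel copies file data into the basket
    return bytes(basket[:caught])

def demo():
    # Hypothetical setup: create a small "pond" file, read it, then clean up.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"fish")
        path = f.name
    try:
        return read_into_basket(path)
    finally:
        os.remove(path)
```

Calling `demo()` returns the bytes that were copied from the file into the process's own buffer.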
Blocking IO model
One way to fish is comfortable and relaxed: we sit in front of the fishing rod and do nothing else, holding the rod with both hands and quietly waiting for a fish to bite. As soon as we feel the fish pull, we lift it out and put it in the basket. Then we wait for the next fish.
Mapped to the Linux operating system, this is the blocking IO model, the simplest of the five. A process or thread waits for a condition and keeps waiting for as long as the condition is not met. Once the condition is met, it moves on to the next step.
The application process asks for data through the recvfrom system call, but it blocks because the kernel does not yet have the datagram ready. The process stays blocked until the datagram is ready in the kernel and recvfrom has finished copying it into user space.
This way of fishing is simple for the angler: no special rod is needed, and any sufficiently long stick will do (easy to implement). The disadvantage is that it is time-consuming, so it only suits situations where few fish are needed (low concurrency and low timeliness requirements).
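The blocking call described above can be sketched in Python, whose `socket.recvfrom` wraps the same system call. This is only an illustration under assumptions: the function name `blocking_receive`, the loopback UDP sockets, and the helper thread that sends the datagram after a delay are all invented for the demo.

```python
import socket
import threading
import time

def blocking_receive(delay=0.2):
    """Receive one datagram with a blocking recvfrom and report how long we waited."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    address = receiver.getsockname()

    def send_later():
        time.sleep(delay)                 # the fish takes a while to bite
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
            sender.sendto(b"fish", address)

    started = time.monotonic()
    threading.Thread(target=send_later, daemon=True).start()
    data, _ = receiver.recvfrom(1024)     # blocks here until a datagram has been copied to us
    waited = time.monotonic() - started
    receiver.close()
    return data, waited
```

The calling thread can do nothing else while `recvfrom` is waiting, which is exactly the "both hands on the rod" situation.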
Non-blocking IO model
When we are fishing, we can do other things, like playing King of Glory or watching an episode of Yanxi Palace, while waiting for a fish to bite. However, we have to check the rod from time to time and pull the fish in as soon as we see one on the hook.
Mapped to the Linux operating system, this is the non-blocking IO model. When the application process asks the kernel for data that is not ready, the call returns immediately instead of waiting. The process then polls, repeatedly asking the kernel whether the data is ready. When a poll finds the data ready, the data is copied to user space.
The application process keeps calling recvfrom until the kernel has the data ready. While the data is not ready, the kernel returns an error (EWOULDBLOCK), and the process issues recvfrom again some time later. Between requests, the process can do something else.
Fishing this way uses the same tools as blocking IO, but you can do other things while fishing, which makes better use of your time.
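A minimal Python sketch of this polling loop follows. It assumes a loopback UDP socket and a helper thread that sends the datagram late; the name `polling_receive` and the `time.sleep` that stands in for "other work" are illustrative only. In Python, the "not ready" error from the kernel surfaces as `BlockingIOError`.

```python
import socket
import threading
import time

def polling_receive(delay=0.2, poll_interval=0.05):
    """Poll a non-blocking socket until a datagram arrives; count the empty polls."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.setblocking(False)            # recvfrom now returns at once instead of waiting
    address = receiver.getsockname()

    def send_later():
        time.sleep(delay)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
            sender.sendto(b"fish", address)

    threading.Thread(target=send_later, daemon=True).start()
    empty_polls = 0
    while True:
        try:
            data, _ = receiver.recvfrom(1024)   # glance at the rod
            break
        except BlockingIOError:                 # nothing on the hook yet
            empty_polls += 1
            time.sleep(poll_interval)           # stand-in for doing something else
    receiver.close()
    return data, empty_polls
```

The trade-off is visible in `empty_polls`: every failed poll is a system call that achieved nothing.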
Signal-driven IO model
When we go fishing, in order to avoid checking the fishing rod again and again, we can fit the rod with an alarm. The alarm goes off as soon as a fish bites, and when we hear it, we go and pull in the fish.
Mapped to the Linux operating system, this is signal-driven IO. Before reading, the application process tells the kernel: if an event occurs on this socket, please send me a signal. When the signal arrives, the corresponding signal handler does the follow-up processing.
The application process registers a signal handler with the kernel in advance, and the call returns without blocking. When the data is ready in the kernel, the kernel sends a signal (SIGIO) to the process, and the process then calls recvfrom in the signal handler to copy the data into user space.
Compared with the previous methods, this way of fishing changes the tools a little and requires some customization (more complex to implement). But the angler is completely free to do something else until a fish bites, and only has to wait for the alarm.
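The alarm setup can be sketched in Python on Linux using `SIGIO`, `F_SETOWN` and `O_ASYNC`. Treat this as a rough, Linux-only illustration: the function name `signal_driven_receive` and the loopback test datagram are assumptions, and the code must run in the main thread because Python only allows signal handlers to be installed there.

```python
import fcntl
import os
import signal
import socket
import time

def signal_driven_receive(timeout=2.0):
    """Linux-only sketch: let SIGIO announce that a datagram is ready."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.setblocking(False)
    catches = []

    def on_sigio(signum, frame):
        # The alarm rang: data is ready, but we still copy it ourselves.
        try:
            data, _ = receiver.recvfrom(1024)
            catches.append(data)
        except BlockingIOError:
            pass                                        # signal was for some other socket event

    previous = signal.signal(signal.SIGIO, on_sigio)    # register the handler first
    fd = receiver.fileno()
    fcntl.fcntl(fd, fcntl.F_SETOWN, os.getpid())        # deliver SIGIO to this process
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_ASYNC)  # switch on signal-driven mode

    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        sender.sendto(b"fish", receiver.getsockname())

    deadline = time.monotonic() + timeout
    while not catches and time.monotonic() < deadline:
        time.sleep(0.01)                                # stand-in for other work
    receiver.close()
    signal.signal(signal.SIGIO, previous)               # put the old handler back
    return catches
```

Note that the handler still has to call `recvfrom` itself; the signal only says that the data is ready.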
IO multiplexing model
When we go fishing, in order to catch the most fish in the shortest time, we put out several fishing rods at once. Whichever rod gets a bite, we pull the fish in from that rod.
Mapped to the Linux operating system, this is the IO multiplexing model. Multiple IO requests can be registered with the same multiplexer, which interacts with the kernel on their behalf. When the data for any registered request is ready, the process copies that data into user space.
When a user process calls the select function, select monitors all the registered IOs. If none of the monitored IOs has data ready, the calling process blocks. When the data for any IO is ready, select returns, and the process copies the data with recvfrom.
The IO multiplexing model does not register a signal handler with the kernel, so it is not non-blocking. The select call does not return until at least one of the monitored IOs is ready, and the process then has to issue another request to copy the data.
This way of fishing, by adding fishing rods, can effectively improve efficiency.
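Here is a minimal Python sketch of the several-rods idea using `select.select`, which wraps the select call discussed above. The function name `multiplexed_receive`, the number of sockets, and the choice of which socket receives the datagram are all made up for the demo.

```python
import select
import socket

def multiplexed_receive(rod_count=3, biting_rod=1, timeout=2.0):
    """Watch several UDP sockets with one select call and read from the ready ones."""
    rods = []
    for _ in range(rod_count):
        rod = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        rod.bind(("127.0.0.1", 0))
        rods.append(rod)

    # Only one rod gets a bite in this demo.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        sender.sendto(b"fish", rods[biting_rod].getsockname())

    # Blocks until at least one socket is readable (or the timeout expires).
    readable, _, _ = select.select(rods, [], [], timeout)

    catches = []
    for rod in readable:
        data, _ = rod.recvfrom(1024)      # second call: the actual copy into user space
        catches.append((rods.index(rod), data))

    for rod in rods:
        rod.close()
    return catches
```

One blocked call covers all the rods, but each catch still costs a second call (`recvfrom`) to copy the data.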
Why are all four synchronous
We say that the blocking IO model, the non-blocking IO model, the IO multiplexing model and the signal-driven IO model are all synchronous IO models. The reason is that in every one of them, the actual data copy is synchronous.
Isn’t signal-driven IO asynchronous? With signal-driven IO, the kernel notifies the process when the data is ready, and the process then copies the data with recvfrom. We can think of the data preparation phase as asynchronous, but the data copy is synchronous. Therefore, the IO process as a whole cannot be considered asynchronous.
We can divide the fishing process into two steps: 1. the fish bites (data preparation); 2. we pull in the fish and put it in the basket (data copy). With every fishing method above, the second step is done by the person, not by the fishing rod itself. So all of these ways of fishing are actually synchronous.
Compare this with boiling water in a kettle that has an alarm. When the kettle's alarm goes off, the whole job is already done: the water has boiled.
When the fishing alarm goes off, the fish has only bitten; it has not yet been landed.
So boiling water in a kettle with an alarm is an asynchronous process.
Fishing with an alarm rod is still a synchronous one.
Asynchronous IO model
When we go fishing, we use a high-tech, fully automatic fishing rod. It senses the bite, reels in the line, and even puts the fish into the basket by itself. Then it notifies us that the fish has been caught and moves on to the next one.
Mapped to the Linux operating system, this is the asynchronous IO model. After the application process hands the IO request to the kernel, the kernel takes care of the data copy entirely. When the kernel has finished, it sends a signal to inform the application process that the IO is complete.
The user process calls aio_read, passing the descriptor, buffer pointer, buffer size, and so on to the kernel, and tells the kernel how to notify it when the entire operation is complete; it then immediately goes on to do something else. The aio_read call returns immediately while the kernel waits for the data. When the data is ready, the kernel copies it directly into user space and then notifies the process that the IO is complete.
This way of fishing is undoubtedly the most effortless. You don’t need to manage anything; just leave it all to the fishing rod.
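Python's standard library does not expose aio_read, so the sketch below only imitates its contract: the request returns at once, the copy happens without the caller's involvement, and a callback reports completion. A worker thread plays the role of the automatic rod here, which is an emulation rather than kernel asynchronous IO, and the names `aio_read_sketch` and `demo` are hypothetical.

```python
import os
import tempfile
import threading

def aio_read_sketch(path, size, on_complete):
    """Start a read and return immediately; on_complete(data) runs when the copy is done."""
    def worker():
        with open(path, "rb") as f:
            data = f.read(size)           # the copy happens here, not in the caller
        on_complete(data)                 # completion notification
    thread = threading.Thread(target=worker)
    thread.start()
    return thread

def demo():
    # Hypothetical setup: a small temporary file to read from.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"fish in the basket")
        path = f.name

    done = threading.Event()
    basket = []

    def on_complete(data):
        basket.append(data)
        done.set()

    worker = aio_read_sketch(path, 1024, on_complete)
    # The caller is free to do other work here; it never copies the data itself.
    done.wait(timeout=2.0)
    worker.join()
    os.remove(path)
    return basket
```

The difference from the four earlier models is that the caller never issues the copying call itself; by the time it is notified, the data is already in its basket.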
Comparison of 5 IO models
After explaining all this, I silently deleted the interview comment I had written before, “don’t have a deep understanding of the basic IO model of Linux”, and changed it to “don’t have a deep understanding of the IO system; can only use encapsulated APIs”.