IO model
IO is short for Input/Output. Linux network programming has five IO models:
- Blocking IO
- Nonblocking IO
- IO multiplexing
- Signal-driven IO
- Asynchronous IO
Introduction to Java IO
- The java.io package is built on the stream model and provides IO facilities such as the File abstraction and input/output streams. Its interaction mode is synchronous and blocking: the thread blocks until the read from the input stream or the write to the output stream completes. The advantage of java.io is simple, intuitive code; the disadvantage is limited I/O efficiency and scalability, which can easily make it the bottleneck of application performance.
- Some of the network APIs in java.net, such as Socket, ServerSocket, and HttpURLConnection, are also often classified as synchronous blocking IO libraries, since network communication is IO as well.
- The NIO framework (the java.nio package) was introduced in Java 1.4. It provides new abstractions such as Channel, Selector, and Buffer for building multiplexed IO programs, and offers high-performance data operations closer to the underlying operating system.
- Java 7 took NIO a step further with NIO 2, which introduced asynchronous non-blocking IO, also called Asynchronous IO (AIO). Its operations are based on events and callbacks.
Let's start with the difference between synchronous/asynchronous and blocking/non-blocking.
Synchronous and asynchronous
Synchronous and asynchronous describe how a user process interacts with the kernel.
- Synchronous means that after triggering an IO operation, the user process waits or polls until the operation is ready. For example: you go to the bank to handle your business yourself; you are tied up the whole time, and everything else has to wait until you are finished.
- Asynchronous means that after triggering the I/O operation, the user process goes on to do other things and is notified when the I/O has completed. For example: you entrust a relative to go to the bank for you, and meanwhile you do other things. (With asynchronous I/O, Java delegates the reading and writing to the OS, passing it the address and size of the data buffer.)
Blocking and non-blocking
Blocking and non-blocking describe how a process behaves when it accesses data, depending on whether the IO operation is ready.
- Blocking means that when a read or write is attempted on a file descriptor and there is nothing to read or no room to write, the program waits until the operation can proceed. At the bank: too many people are ahead of you, so you stand in the queue and wait until it is your turn.
- Non-blocking means that if there is nothing to read or no room to write, the read or write function returns immediately instead of waiting. At the bank: you take a queue number, then play with your phone or chat with others; when it is your turn, the bank's loudspeaker calls your number and you go to the counter.
Note that in the non-blocking case you still have to handle the business yourself, which is completely different from asynchrony. Synchronous/asynchronous and blocking/non-blocking are different concepts, so do not confuse the two.
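To make the contrast concrete, here is a minimal Python sketch. It assumes a Unix-like system and uses socket.socketpair() as a stand-in for a real network connection; the helper name try_read is made up for this illustration. In non-blocking mode the read returns at once whether or not data is there, whereas the same recv call on a blocking socket would simply wait.

```python
import socket


def try_read(sock):
    """Return the available data, or None if the call would have blocked."""
    try:
        return sock.recv(1024)
    except BlockingIOError:
        # Non-blocking mode: nothing to read, so the call returned immediately.
        return None


reader, writer = socket.socketpair()
reader.setblocking(False)      # switch the reading end to non-blocking mode

first = try_read(reader)       # nothing has been sent yet
writer.send(b"hello")
second = try_read(reader)      # the data is ready now

reader.close()
writer.close()
```

Note that the caller still performs the read itself in both cases, which is why non-blocking IO remains synchronous.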
I/O model classification
When an application sends an I/O request to the operating system kernel, the kernel waits for the data to be ready; the data may come from another application or from the network. Generally speaking, an IO operation has two phases:
- Waiting for data: the data may come from another application or from the network; if none has arrived yet, the application blocks and waits.
- Copying data: the ready data is copied from the kernel into the application's memory.
On Linux, the IO operation corresponds to the system call recvfrom(), so a single recvfrom() call includes both steps: waiting for the data to be ready and copying the data.
Synchronous blocking IO
In this mode, after initiating an I/O operation the user process must wait for it to complete, and can continue running only afterwards. Java's traditional BIO, the only option before JDK 1.4, works this way.
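As an illustration only, the sketch below shows a blocking recvfrom() in Python, whose socket module wraps the system call of the same name. The function name blocking_receive, the 0.2 second delay, and the Unix datagram socket pair used in place of a real network peer are all assumptions of this example. The calling thread sits inside recvfrom() for both phases: waiting for the data and copying it out.

```python
import socket
import threading
import time


def blocking_receive(delay=0.2):
    """Block in recvfrom() until a datagram sent `delay` seconds later arrives."""
    receiver, sender = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

    def send_later():
        time.sleep(delay)          # the data is "not ready" during this time
        sender.send(b"ready")

    worker = threading.Thread(target=send_later)
    worker.start()

    start = time.monotonic()
    data, _address = receiver.recvfrom(1024)   # waits, then copies the data
    waited = time.monotonic() - start

    worker.join()
    receiver.close()
    sender.close()
    return data, waited


data, waited = blocking_receive()
```

The measured wait is roughly the sender's delay, during which the receiving thread could do nothing else.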
Synchronous non-blocking IO
Java NIO (introduced in JDK 1.4)
In this mode, the user process can initiate an I/O operation and return immediately to do other things, but it has to ask from time to time whether the operation is ready. This repeated polling wastes CPU resources. Java NIO is a synchronous non-blocking IO.
Multiplexing IO
Used by Redis, Nginx, and Netty; the basis of the Reactor model.
Implemented with select, poll, and epoll; sometimes called event-driven IO.
The advantage of select/epoll is that a single process can handle the IO of many network connections at the same time. The basic principle is that the select/epoll function keeps polling all the sockets it is responsible for and notifies the user process when data arrives on any of them.
With multiplexing, a single select call monitors the kernel-side state of multiple IO requests. As soon as any one of them is ready, select returns, and the process then calls recvfrom() to complete the IO operation.
This lets the application listen for multiple IO requests at once, which scales much better than multithreaded blocking IO because the server needs only a few threads to communicate with a large number of clients.
Signal-driven IO model
On Unix systems, when an application initiates an IO request it can register a signal handler for that request, and the call returns immediately. The operating system waits for the data on the application's behalf; once the data is ready, it sends a signal to the application, which then calls the system function recvfrom() to complete the IO operation.
The signal-driven IO model is also a non-blocking IO model. Unlike the non-blocking IO model above, it does not need to poll to check whether the data is ready; it passively receives a signal and then calls recvfrom() to perform the I/O operation.
Compared with the multiplexed IO model, the signal-driven model targets the completion of a single IO request, while the multiplexed model targets many IO requests at once.
Asynchronous I/O
In this mode the entire IO operation, both waiting for the data to be ready and copying it into the application's memory, is left to the operating system. When the data is ready, the operating system copies it into the application's memory and then notifies the application, which is never blocked along the way.
The differences between the models
Take boiling water as an example:
- Synchronous blocking: you put the water on the stove and stand there waiting, watching until it boils.
- Synchronous non-blocking: you put the water on the stove and go watch TV, walking back to the stove every once in a while to check whether the water has boiled.
- Multiplexing: someone has modified the kettle so that it whistles when the water boils, so you watch TV and wait for the whistle.
- Asynchronous non-blocking: you arrange for someone else to boil the water and to call you once it has boiled, so you can watch TV with peace of mind.
Blocking IO, non-blocking IO, and IO multiplexing are all synchronous I/O. Asynchronous I/O is necessarily non-blocking, so there is no need to distinguish asynchronous blocking from asynchronous non-blocking. True asynchronous IO requires deep involvement from the operating system: I/O is only truly asynchronous when the user thread does not care at all how the I/O is executed, hands the whole job over, and simply waits for a completion signal. Therefore, spawning child threads to poll or loop endlessly, or using select, poll, or epoll, is not asynchronous.
A classic example
- Blocking I/O model
Lao Li goes to the train station to buy a ticket and queues for three days until a returned ticket becomes available. Cost: he eats, drinks, and sleeps at the station for 3 days and does nothing else.
- Non-blocking I/O model
Lao Li goes to the train station every 12 hours to ask whether a returned ticket is available, and gets one after three days. Cost: 6 round trips to the station and 6 hours on the road, but he gets plenty of other things done in the remaining time.
- I/O multiplexing model
select/poll: Lao Li, Lao Wang, Lao Liu, and a whole group of others want tickets and entrust the job to a scalper (a select scalper accepts at most 1024 orders; a poll scalper has no limit). The select/poll scalper keeps waiting at the station for tickets. When a ticket comes up, the scalper does not know whose it is and has to ask the clients one by one. Once that is confirmed, the scalper tells the right person to go to the station, pay, and collect the ticket.
epoll: Lao Li, Lao Wang, Lao Liu, and a whole group of others (with no limit on their number) entrust the job to a scalper. When the scalper gets a ticket, he knows whose it is without asking around and tells that person to go to the station, pay, and collect the ticket.
The meaning of multiplexing is that after accepting Lao Li's order the scalper also accepts the orders of Lao Wang and Lao Liu; everyone shares the same scalper.
- Signal-driven I/O model
Lao Li goes to the train station and leaves his phone number with the ticket clerk. When a ticket is available, the clerk calls Lao Li, who then goes to the station to pay and collect it. Cost: 2 round trips to the station and 2 hours on the road; no 100 yuan scalper fee, no phone calls to make, and no scalper involved.
- Asynchronous I/O model
Lao Li goes to the train station and leaves his phone number with the ticket clerk. When a ticket is available, the clerk has it delivered by courier and then phones Lao Li to tell him it has arrived. Cost: 1 round trip to the station and 1 hour on the road; no 100 yuan scalper fee, no phone calls to make, and no scalper involved.
More on IO multiplexing
I/O multiplexing is a mechanism for monitoring multiple descriptors and notifying the program when one of them becomes ready (typically readable or writable) so that it can perform the read or write. However, select, poll, and epoll are all synchronous I/O in nature: once a read or write event is ready, the program has to do the reading and writing itself, and that step blocks. With asynchronous I/O the program does not do the reading and writing itself; the asynchronous I/O implementation takes care of copying the data from the kernel to user space.
Multiplexing means managing multiple I/O streams at the same time in a single thread by recording the state of each socket (I/O stream).
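A rough sketch of that idea in Python, using the standard selectors module: one thread registers several sockets together with a piece of per-socket state and then reacts to whichever becomes readable. The function name multiplex and the client names are invented for this example, and socket pairs stand in for real client connections.

```python
import selectors
import socket


def multiplex(messages):
    """Read one message from each of several sockets using a single thread."""
    selector = selectors.DefaultSelector()
    writers = []
    for name, payload in messages.items():
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        # `data` is the per-socket state the selector keeps for us.
        selector.register(reader, selectors.EVENT_READ, data=name)
        writer.send(payload)
        writers.append(writer)

    received = {}
    while len(received) < len(messages):
        for key, _events in selector.select(timeout=1):
            # The thread still performs the read itself: synchronous IO.
            received[key.data] = key.fileobj.recv(1024)
            selector.unregister(key.fileobj)
            key.fileobj.close()

    selector.close()
    for writer in writers:
        writer.close()
    return received


result = multiplex({"Lao Li": b"ticket-1", "Lao Wang": b"ticket-2"})
```

DefaultSelector picks the most efficient mechanism available on the platform, so the same code runs on top of epoll, poll, or select.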
Some personal understanding:
Here is a simple analogy. On the left are several water collectors, and each one gets its water from a faucet on the right. Collectors and faucets are paired one to one, but the section in between is missing, so a water pipe has to be connected before water can be drawn (one pipe corresponds to one IO thread). Note that a faucet does not always have water; drawing water is triggered only once a collector is connected. The IO models can then be explained as follows:
1. Traditional blocking BIO: every collector has its own pipe to its faucet. Connecting the pipe triggers the water request, and the faucet then supplies water, so N collectors need N pipes. Moreover, a pipe does not deliver water the moment it is connected, and it stays blocked while it waits. When there are too many collectors, there are not enough pipes to connect them all.
Thread pool mode: the pool holds 10 pipes. Whenever a collector requests water, one pipe is taken from the pool and connected to the faucet matching that collector's number. When there are too many requests, the pipes have to be switched constantly.
2. Multiplexing IO:
select/poll: all collectors share one pipe; there are no spare pipes, and every collector is connected to this single pipe. (The difference between the two is that select supports only 1024 collectors, while poll has no limit.) When water arrives at a faucet, the pipe is notified, but it is not told which faucet. The pipe then has to try each faucet in turn, and when it finds one with water flowing, it carries the water to the matching collector.
epoll: all collectors again share one pipe, with no spare pipes. Unlike select/poll, when water arrives at a faucet the pipe already knows which faucet it is, so it can connect directly to that one.
Pseudocode describes the IO differences
- Non-blocking busy polling
    while true {
        for i in fd[] {
            if i has data
                read until unavailable
        }
    }
Multiple streams can be handled by scanning all of them from beginning to end, but this is a poor approach: if none of the streams has an I/O event, the loop just burns CPU time slices.
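The busy-polling loop might look like this in Python. It is a sketch under stated assumptions: socket pairs stand in for client connections, and busy_poll and empty_reads are names chosen here to make the wasted work visible.

```python
import socket


def busy_poll(readers, expected):
    """Scan non-blocking sockets in a loop until `expected` chunks are read."""
    received, empty_reads = [], 0
    while len(received) < expected:
        for sock in readers:
            while True:                      # read until unavailable
                try:
                    chunk = sock.recv(1024)
                except BlockingIOError:
                    empty_reads += 1         # a call that found nothing
                    break
                received.append(chunk)
    return received, empty_reads


pairs = [socket.socketpair() for _ in range(3)]
for reader, _writer in pairs:
    reader.setblocking(False)
pairs[2][1].send(b"data")                    # only the third stream has data

received, empty_reads = busy_poll([reader for reader, _ in pairs], expected=1)

for reader, writer in pairs:
    reader.close()
    writer.close()
```

Even in this tiny run, three of the four recv calls come back empty; with no data at all the loop would spin without ever yielding the CPU.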
- select: when a client connects, the connection is added to an array, and the program keeps polling that array; if a client I/O event shows up during polling, it is handled. select can monitor at most 1024 connections (by default a process can open only 1024 file descriptors), and it also has thread-safety issues.
    while true {
        select(fds[])  // block here until some stream has an I/O event; the array size is 1024
        for i in fds[] {
            if i has data
                read until unavailable
        }
    }
select only tells us that an I/O event has happened, not which streams it happened on (it could be one, several, or all of them), so we have to scan every stream indiscriminately to find those that can be read or written and then operate on them. select therefore has O(n) scanning complexity: the more streams handled at once, the longer the scan takes.
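A possible Python rendering of the select loop body is sketched below. One caveat: Python's select.select already returns the list of ready objects, so the explicit scan over every stream is written out here only to mirror what a C program does with FD_ISSET. The helper name read_ready and the socket pairs are assumptions of this example.

```python
import select
import socket


def read_ready(readers, timeout=1.0):
    """Block in select() until a socket is readable, then read the ready ones."""
    readable, _, _ = select.select(readers, [], [], timeout)
    ready = {}
    for index, sock in enumerate(readers):   # scan every stream, ready or not
        if sock in readable:
            ready[index] = sock.recv(1024)
    return ready


pairs = [socket.socketpair() for _ in range(3)]
pairs[0][1].send(b"a")
pairs[2][1].send(b"c")

ready = read_ready([reader for reader, _ in pairs])

for reader, writer in pairs:
    reader.close()
    writer.close()
```

The thread sleeps inside select instead of spinning, but the cost of the scan still grows with the number of monitored streams.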
- poll: fixes a number of select's problems, such as the limit on the number of monitored connections, but it still has thread-safety issues.
poll is essentially no different from select: it copies the array passed in by the user into kernel space and then queries the device state of each fd. It has no limit on the maximum number of connections because the descriptors are stored in a linked list.
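For comparison, here is the same kind of read loop written against select.poll, which is available on Linux and most other Unix systems. This is only a sketch: read_ready_with_poll and the by_fd lookup table are names made up here, and the socket pairs stand in for real connections. The whole set of registered descriptors is still handed to the kernel and checked on every call.

```python
import select
import socket


def read_ready_with_poll(readers, timeout_ms=1000):
    """Register every socket with poll() and read those reported readable."""
    poller = select.poll()
    by_fd = {}
    for index, sock in enumerate(readers):
        poller.register(sock, select.POLLIN)
        by_fd[sock.fileno()] = (index, sock)

    ready = {}
    for fd, mask in poller.poll(timeout_ms):   # list of (fd, event mask) pairs
        if mask & select.POLLIN:
            index, sock = by_fd[fd]
            ready[index] = sock.recv(1024)
    return ready


pairs = [socket.socketpair() for _ in range(4)]
pairs[1][1].send(b"b")
pairs[3][1].send(b"d")

ready = read_ready_with_poll([reader for reader, _ in pairs])

for reader, writer in pairs:
    reader.close()
    writer.close()
```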
- epoll: also monitors I/O events, but when an event occurs it tells you which connection the event belongs to, so there is no need to scan all connections. It is thread-safe, but only Linux supports it.
    while true {
        active_fds[] = epoll_wait(epollfd)
        for i in active_fds[] {
            read or write until unavailable
        }
    }
epoll can be understood as an event poll. Unlike busy polling and indiscriminate polling, epoll tells us which stream had which I/O event. That is why we say epoll is event-driven (each event is associated with an fd): every operation we perform on these streams is meaningful. (The complexity is reduced to O(1).)
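The epoll loop body can be sketched in Python with select.epoll, which wraps epoll_create, epoll_ctl, and epoll_wait. The function name read_ready_with_epoll and the by_fd table are assumptions of this example, and because select.epoll exists only on Linux, the call is skipped on other platforms.

```python
import select
import socket


def read_ready_with_epoll(readers, timeout=1.0):
    """Ask epoll which sockets are readable and read only those."""
    by_fd = {sock.fileno(): (index, sock) for index, sock in enumerate(readers)}
    epoll = select.epoll()
    try:
        for fd in by_fd:
            epoll.register(fd, select.EPOLLIN)
        ready = {}
        for fd, mask in epoll.poll(timeout):   # only descriptors with events
            if mask & select.EPOLLIN:
                index, sock = by_fd[fd]
                ready[index] = sock.recv(1024)
        return ready
    finally:
        epoll.close()


pairs = [socket.socketpair() for _ in range(3)]
pairs[1][1].send(b"b")

# select.epoll exists only on Linux, so skip the call elsewhere.
readers = [reader for reader, _ in pairs]
ready = read_ready_with_epoll(readers) if hasattr(select, "epoll") else None

for reader, writer in pairs:
    reader.close()
    writer.close()
```

The loop touches only the descriptors that epoll reports, so idle connections cost nothing per iteration.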