Relevant concepts
For network I/O, Java provides blocking I/O (BIO) and non-blocking I/O (NIO). Blocking means that during certain I/O operations the calling thread is suspended and executes no further code until the operation completes.
Blocking IO
So why does BIO block, and where does it block?
I/O operations are provided by the operating system. The application invokes the kernel through a system call, and the kernel performs the operation. Network I/O reads data that arrives through a network adapter. When no data has arrived yet, the kernel suspends the calling thread until data is available and only then returns it. That is why network I/O blocks. In Java, both the accept() method of ServerSocket and the read() method of a Socket's InputStream block.
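The two blocking points can be observed with a small experiment. The following is a minimal sketch, not a reference implementation: the class name BlockingDemo and the method measureReadBlockMillis are made up for illustration. A client connects, waits before sending one byte, and the server measures how long read() kept its thread suspended.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class; the names are illustrative only.
class BlockingDemo {
    /** Returns roughly how many milliseconds the server thread spent blocked inside read(). */
    static long measureReadBlockMillis(long clientDelayMillis) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {   // port 0: let the OS pick a free port
            int port = server.getLocalPort();
            Thread client = new Thread(() -> {
                try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
                    Thread.sleep(clientDelayMillis);         // connected, but nothing sent yet
                    OutputStream out = s.getOutputStream();
                    out.write('x');
                    out.flush();
                } catch (IOException | InterruptedException e) {
                    throw new RuntimeException(e);
                }
            });
            client.start();

            try (Socket conn = server.accept()) {            // blocks until a client connects
                InputStream in = conn.getInputStream();
                long start = System.nanoTime();
                int first = in.read();                       // blocks until one byte arrives
                long blockedMillis = (System.nanoTime() - start) / 1_000_000;
                client.join();
                if (first != 'x') throw new IllegalStateException("unexpected byte " + first);
                return blockedMillis;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("read() blocked for about "
                + measureReadBlockMillis(500) + " ms");
    }
}
```

The measured time should be close to the client's delay, because the server thread can do nothing else while it waits inside read().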
Read the data
How to deal with the problem of blocking when reading data?
In practice, we cannot rely on read() returning EOF (-1) to tell that a request has been read completely. The server sees EOF only after the client closes the connection, or at least its sending side. Until then, once all the data sent so far has been consumed, read() simply blocks. There are two simple ways to avoid this problem.
- Use NIO.
- Use short-lived connections or define a protocol format.
This article deals only with BIO, so only the second approach is covered here.
Defining the protocol format means that the client and server send and read data in an agreed format, so the server knows how much data to read and when to stop, instead of blocking on a read for data that will never arrive.
Take HTTP as an example. The HTTP protocol specifies the message format. A GET request normally has no body, and its header section ends with two consecutive CRLF sequences (\r\n\r\n), so the server stops reading as soon as it has seen them. A POST request carries a body, and the body has no end marker, so the request includes a Content-Length header that gives the length of the body in bytes. The server then reads exactly that many bytes and never blocks waiting for more.
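To make the two stopping rules concrete, here is a minimal sketch of reading one HTTP request from a blocking stream. The class HttpRequestReader and its method names are invented for this example, and it assumes a well-formed request with an optional Content-Length header (chunked transfer encoding is not handled).

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; names are illustrative only.
class HttpRequestReader {
    /** Reads the request line and headers, stopping right after the blank line (CRLF CRLF). */
    static String readHead(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int matched = 0;                       // how many bytes of "\r\n\r\n" have matched so far
        int b;
        while (matched < 4 && (b = in.read()) != -1) {
            buf.write(b);
            char expected = (matched % 2 == 0) ? '\r' : '\n';
            if (b == expected) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0; // a stray '\r' may start a new terminator
            }
        }
        if (matched < 4) throw new EOFException("connection closed before end of headers");
        return new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    /** Returns the Content-Length value, or 0 when the header is absent (e.g. a plain GET). */
    static int contentLength(String head) {
        for (String line : head.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                return Integer.parseInt(line.substring(colon + 1).trim());
            }
        }
        return 0;
    }

    /** Reads exactly length bytes, so the caller never blocks waiting for bytes beyond the body. */
    static byte[] readBody(InputStream in, int length) throws IOException {
        byte[] body = new byte[length];
        int off = 0;
        while (off < length) {
            int n = in.read(body, off, length - off);
            if (n == -1) throw new EOFException("connection closed before body was complete");
            off += n;
        }
        return body;
    }
}
```

Because readHead consumes one byte at a time and readBody consumes an exact count, the stream is left positioned at the start of the next request, which matters once a connection is reused.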
Why is BIO less efficient than NIO?
Because BIO blocks, it needs one thread per connection. For example, suppose a thread is handling connection 1 and is blocked waiting for connection 1 to send data. If data arrives on connection 2 at that moment, that thread cannot read it, so a second thread is needed to read connection 2. The result is a one-thread-per-connection model, and a large number of threads costs the server a lot of resources. NIO does not block and is event-driven, so the server can use a single thread to watch all connections. When a connection has a read or write event, that thread picks the connection out and hands it to a business thread for processing. In this model one thread manages all connections, so far fewer resources are used.
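The thread-per-connection model can be sketched as a tiny echo server. This is an illustration under assumptions, not a production design: ThreadPerConnectionServer and the handlersStarted counter are hypothetical names, and the server simply echoes each line back.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo server; names are illustrative only.
class ThreadPerConnectionServer implements AutoCloseable {
    private final ServerSocket server;
    final AtomicInteger handlersStarted = new AtomicInteger();

    ThreadPerConnectionServer() throws IOException {
        server = new ServerSocket(0);                 // port 0: let the OS pick a free port
        Thread acceptor = new Thread(this::acceptLoop, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() { return server.getLocalPort(); }

    private void acceptLoop() {
        while (!server.isClosed()) {
            try {
                Socket socket = server.accept();      // blocks until a client connects
                handlersStarted.incrementAndGet();
                // Each connection gets its own thread, because readLine() below may block
                // for as long as that client stays silent.
                Thread handler = new Thread(() -> echoLines(socket));
                handler.setDaemon(true);
                handler.start();
            } catch (IOException e) {
                // accept() throws once the server socket is closed; the loop condition then exits
            }
        }
    }

    private void echoLines(Socket socket) {
        try (Socket s = socket;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) {  // blocks until this client sends a line
                out.println(line);
            }
        } catch (IOException e) {
            // connection reset or closed; nothing more to do for this client
        }
    }

    @Override
    public void close() throws IOException { server.close(); }
}
```

With two clients connected, one silent and one active, the active client still gets its echo, but only because a second thread exists. Every additional idle client costs another blocked thread.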
Implementation
How do you implement an HTTP framework on BIO, and what should you watch out for?
When implementing an HTTP service with BIO, you need to handle the Connection: keep-alive request header in addition to the normal protocol parsing. The purpose of this header is to reuse the connection. Since TCP needs a three-way handshake to establish a connection and a four-way handshake to tear it down, frequently opening and closing short-lived connections can hurt performance.
Therefore, when processing HTTP connections, you need to record and track the state of each connection. In general, we use one thread to accept new connections and a thread pool to process them. After a request has been handled, if the connection has been closed, nothing more needs to be done. If it is still open and the request carried the Connection: keep-alive header, the connection must be put back into the thread pool so that the next request on it can be processed.
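The accept-thread-plus-pool arrangement with keep-alive re-queuing might look like the sketch below. It is deliberately simplified and rests on assumptions: KeepAliveServer and handleOne are hypothetical names, requests are assumed to have no body, and the response body just repeats the request line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo server; names are illustrative only.
class KeepAliveServer implements AutoCloseable {
    private static final Charset ISO = StandardCharsets.ISO_8859_1;
    private final ServerSocket server;
    private final ExecutorService workers = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);            // do not keep the JVM alive for idle connections
        return t;
    });

    KeepAliveServer() throws IOException {
        server = new ServerSocket(0);                  // port 0: let the OS pick a free port
        Thread acceptor = new Thread(this::acceptLoop, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() { return server.getLocalPort(); }

    private void acceptLoop() {
        while (!server.isClosed()) {
            try {
                Socket socket = server.accept();
                // One reader per connection, kept across requests so buffered bytes are not lost.
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(socket.getInputStream(), ISO));
                workers.execute(() -> handleOne(socket, in));
            } catch (IOException e) {
                // accept() throws once the server socket is closed; the loop condition then exits
            }
        }
    }

    /** Handles exactly one request, then either re-queues the connection or closes it. */
    private void handleOne(Socket socket, BufferedReader in) {
        try {
            String requestLine = in.readLine();
            if (requestLine == null) { socket.close(); return; }   // client closed the connection
            boolean keepAlive = false;
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                int colon = line.indexOf(':');
                if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Connection")
                        && line.substring(colon + 1).trim().equalsIgnoreCase("keep-alive")) {
                    keepAlive = true;
                }
            }
            byte[] body = requestLine.getBytes(ISO);
            String head = "HTTP/1.1 200 OK\r\nContent-Length: " + body.length
                    + "\r\nConnection: " + (keepAlive ? "keep-alive" : "close") + "\r\n\r\n";
            OutputStream out = socket.getOutputStream();
            out.write(head.getBytes(ISO));
            out.write(body);
            out.flush();
            if (keepAlive) {
                workers.execute(() -> handleOne(socket, in));      // back into the pool
            } else {
                socket.close();
            }
        } catch (IOException e) {
            try { socket.close(); } catch (IOException ignored) { }
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
        workers.shutdownNow();
    }
}
```

One limitation of this sketch is worth noting: a re-queued connection occupies a pool thread while it blocks waiting for the client's next request, so idle keep-alive connections can starve the pool if nothing closes them.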
At the same time, too many idle connections put a burden on the system. So we also record the time at which each socket last finished processing a request, and use a separate thread to watch for connections in the thread pool that have been idle for too long and close them periodically, so that the system can keep serving requests normally.
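The idle-connection bookkeeping can be sketched independently of the rest of the server. In this illustration IdleConnectionMonitor, touch, forget, and sweep are hypothetical names. Connections are tracked as Closeable so that a java.net.Socket can be registered directly, and timestamps are passed in explicitly to keep the logic easy to test.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; names are illustrative only.
class IdleConnectionMonitor {
    private final Map<Closeable, Long> lastActiveMillis = new ConcurrentHashMap<>();
    private final long maxIdleMillis;

    IdleConnectionMonitor(long maxIdleMillis) { this.maxIdleMillis = maxIdleMillis; }

    /** Call each time a request on this connection has finished processing. */
    void touch(Closeable connection, long nowMillis) {
        lastActiveMillis.put(connection, nowMillis);
    }

    /** Call when a connection is closed by normal processing, so it is no longer tracked. */
    void forget(Closeable connection) {
        lastActiveMillis.remove(connection);
    }

    /** Closes every connection idle for longer than maxIdleMillis; returns how many were closed. */
    int sweep(long nowMillis) {
        int closed = 0;
        Iterator<Map.Entry<Closeable, Long>> it = lastActiveMillis.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Closeable, Long> entry = it.next();
            if (nowMillis - entry.getValue() > maxIdleMillis) {
                Closeable connection = entry.getKey();
                it.remove();
                try {
                    connection.close();   // for a Socket, this also unblocks a thread stuck in read()
                } catch (IOException ignored) {
                    // already closed or broken; either way it is gone
                }
                closed++;
            }
        }
        return closed;
    }

    /** Starts a background thread that sweeps periodically; the caller shuts the returned timer down. */
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-connection-monitor");
            t.setDaemon(true);
            return t;
        });
        timer.scheduleAtFixedRate(() -> sweep(System.currentTimeMillis()),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```

A server would call touch(socket, System.currentTimeMillis()) after each response and forget(socket) when it closes a connection itself. Closing a socket from the monitor thread makes any blocked read() on it fail with an exception, which frees the worker thread.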
Conclusion
This article discussed what to be aware of when reading network data with BIO and when implementing HTTP connection handling on top of it. A real implementation has many more issues and complex scenarios to consider, and handling them while maintaining performance is a significant challenge.