Even if you understand how Java NIO’s non-blocking features work (selectors, channels, buffers, and so on), it is still difficult to design a non-blocking server. Non-blocking IO poses several challenges compared to blocking IO. This non-blocking server tutorial will discuss the main challenges of non-blocking servers and describe some potential solutions for them.
The ideas presented in this tutorial are designed around Java NIO. However, I believe these ideas can be reused in other languages, as long as they have some sort of Selector construct. As far as I know, that construct is provided by the underlying operating system, so there is a good chance you can get access to it in other languages as well.

### Non-blocking server – GitHub repository

I have created a simple proof-of-concept of the ideas presented in this tutorial and put it in a GitHub repository for you to look at. Here is the GitHub repository: github.com/jjenkov/jav…

### Non-blocking IO pipes

A non-blocking IO pipe is a chain of components that process non-blocking IO. This includes both reading and writing IO in a non-blocking manner. Here is a simplified example of a non-blocking IO pipe:
A non-blocking IO pipe does not need to both read and write data. Some pipes only read data, and some pipes only write data.
The figure above shows only a single component. A non-blocking IO pipe may have multiple components that process incoming data. The length of a non-blocking IO pipe depends on what the pipe needs to do.
A non-blocking IO pipe may also read from multiple channels at the same time, for example from multiple SocketChannel instances.
The control flow shown above has also been simplified. It is the component that initiates the reading of data from the Channel via the Selector. It is not the Channel that pushes data into the Selector and from there into the component, even if that is what the illustration suggests.

The biggest difference between non-blocking and blocking IO pipes is how data is read from the underlying channel (socket or file).
IO pipes typically read data from a stream (from a socket or file) and split that data into coherent messages. This is similar to breaking a data stream into tokens for parsing, except that here you break the stream up into larger messages. I will call the component that splits a stream into messages a Message Reader. Here is an illustration of a Message Reader splitting a stream into messages:
Using a blocking IO interface simplifies the implementation of a Message Reader a lot. A blocking Message Reader never has to deal with cases where no data is read from the stream, or where only a partial message is read from the stream and message parsing needs to be resumed later.
Similarly, a blocking message writer (a component that writes messages to a stream) never has to deal with writing only part of a message and having to resume writing that message later. While a blocking Message Reader is easier to implement, it has the unfortunate disadvantage of requiring a separate thread for each stream that needs to be split into messages. This is necessary because the IO interface of each stream blocks until there is some data to read from it. That means a single thread cannot attempt to read from one stream and, if there is no data, move on to another stream. As soon as a thread tries to read data from a stream, it blocks until there is actually some data to read.
If the IO pipe is part of a server that must handle a large number of concurrent connections, the server will need one thread per active incoming connection. If the server has only a few hundred concurrent connections at any one time, this may not be a problem. However, this type of design does not scale well if the server has millions of concurrent connections. Each thread takes up between 320KB (32-bit JVM) and 1024KB (64-bit JVM) of memory for its stack, so 1.000.000 threads would take up 1TB of memory! And that is before the server has used any memory at all to process incoming messages, such as memory allocated to objects used during message processing.
To reduce the number of threads, many servers use a design in which the server maintains a thread pool (say, 100 threads) that reads messages from the inbound connections one at a time. Inbound connections are kept in a queue, and the threads process messages from each inbound connection in the order the connections are put into the queue. Here is an illustration of the design:
The problem with this design is that the inbound connections need to send data reasonably often. If connections are idle for long periods, a large number of slow or inactive connections can end up occupying all the threads in the pool, leaving the server slow or unresponsive.

Some server designs try to mitigate this problem by allowing some flexibility in the number of threads in the thread pool. For example, if the thread pool runs out of threads, it might start more threads to handle the load. This solution means that a larger number of slow connections is required to make the server unresponsive. But keep in mind that there is still an upper limit on how many threads you can run, so this does not scale well to 1.000.000 slow connections.

### Basic non-blocking IO pipe design

A non-blocking IO pipe can use a single thread to read messages from multiple streams. This requires that the streams can be switched to non-blocking mode. When in non-blocking mode, a stream may return zero or more bytes when you try to read data from it. If the stream has no data to read, zero bytes are returned. When the stream actually has some data to read, one or more bytes are returned.
To avoid checking streams that have zero bytes to read, we use a Java NIO Selector. One or more SelectableChannel instances can be registered with a Selector. When you call select() or selectNow() on the Selector, it gives you only the SelectableChannel instances that actually have data to read. Here is an illustration of the design:
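As a minimal sketch of that design (not the implementation from the GitHub repository; the port number and class name are made up), a single thread can accept connections and read only from the channels the Selector reports as ready:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Minimal sketch: one thread reads from many connections via a Selector.
public class SelectorReadSketch {

    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();

        ServerSocketChannel serverChannel = ServerSocketChannel.open();
        serverChannel.bind(new InetSocketAddress(9999));
        serverChannel.configureBlocking(false);
        serverChannel.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer readBuffer = ByteBuffer.allocate(4 * 1024);

        while (true) {
            selector.select();                       // blocks until at least one channel is ready
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();

                if (key.isAcceptable()) {
                    // New inbound connection: switch it to non-blocking mode and register for reads.
                    SocketChannel socketChannel = serverChannel.accept();
                    socketChannel.configureBlocking(false);
                    socketChannel.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    // Only channels that actually have data to read end up here.
                    SocketChannel socketChannel = (SocketChannel) key.channel();
                    readBuffer.clear();
                    int bytesRead = socketChannel.read(readBuffer);   // may be 0, 1+ or -1
                    if (bytesRead == -1) {
                        key.cancel();
                        socketChannel.close();
                    } else {
                        readBuffer.flip();
                        // Hand the data block to the rest of the pipe (e.g. a Message Reader).
                    }
                }
            }
        }
    }
}
```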
When you read a block of data from a channel, you do not know whether that block contains less or more than one full message. A Message Reader therefore has to deal with two challenges:

- Detecting whether a complete message is present in the data block.
- Handling partial messages until the rest of the message arrives.

Detecting a complete message requires the Message Reader to look at the data in the data block to see whether it contains at least one complete message. If the data block contains one or more complete messages, those messages can be sent down the pipe for processing. The process of looking for complete messages will be repeated a lot, so it has to be as fast as possible.
Whenever there is a partial message in a data block, either by itself or after one or more complete messages, that partial message needs to be stored until the rest of the message arrives through the channel.
Detecting complete messages and storing partial messages is the responsibility of the Message Reader. To avoid mixing message data from different Channel instances, we will use one Message Reader per Channel, like this:
Message Readers are, of course, protocol specific. A Message Reader needs to know the message format of the messages it is trying to read. If we want our server implementation to be reusable across protocols, we need to be able to plug in a Message Reader implementation – possibly by accepting a Message Reader factory as a configuration parameter. Now that we have established that the Message Reader is responsible for storing partial messages until a complete message has been received, we need to figure out how this partial message storage should be implemented.
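One possible shape for these pluggable components is a pair of small interfaces. The names and signatures below are illustrative only; they are not necessarily the ones used in the GitHub repository:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.util.List;

// Reads available data from a channel and keeps partial messages internally.
interface MessageReader {
    // Reads whatever data is currently available on the channel into internal storage.
    void read(SocketChannel channel, ByteBuffer readBuffer) throws IOException;

    // Returns the complete messages detected so far; partial messages stay stored internally.
    List<byte[]> getCompleteMessages();
}

// A factory keeps the server protocol-agnostic: plug in one factory per protocol.
interface MessageReaderFactory {
    MessageReader createMessageReader();
}
```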
There are two design considerations we want to keep in mind:

- We want to copy message data as little as possible. The more copying, the lower the performance.
- We want complete messages to be stored in consecutive byte sequences to make it easier to parse them.

##### A buffer per Message Reader

Obviously, partial messages need to be stored in some kind of buffer. The straightforward implementation is to simply have one buffer internally in each Message Reader. But how big should that buffer be? It needs to be big enough to store the largest allowed message. If the largest allowed message is 1MB, then the internal buffer in each Message Reader needs to be at least 1MB.
When we get to millions of connections, using 1MB per connection does not really work. 1.000.000 x 1MB is still 1TB of memory! And what if the maximum message size is 16MB? Or 128MB?

##### Resizable buffers

Another option is to implement a resizable buffer inside each Message Reader. A resizable buffer starts small, and if a message grows too large for the buffer, the buffer is expanded. This way each connection does not necessarily require a 1MB buffer. Each connection only takes up as much memory as it needs to hold the next message.
There are several ways to implement a resizable buffer. All of them have advantages and disadvantages, so I will discuss them in the following sections.

##### Resize by copy

The first way to implement a resizable buffer is to start with a small buffer of, for example, 4KB. If a message cannot fit into the 4KB buffer, a larger buffer of, say, 8KB can be allocated and the data from the 4KB buffer copied into the larger buffer.
The advantage of the resize-by-copy approach is that all of a message’s data is stored in a single contiguous byte array. This makes parsing the message much easier.

The disadvantage of the resize-by-copy approach is that it results in a lot of data copying for larger messages.
To reduce data copying, you can analyze the size of the messages flowing through the system to find buffer sizes that reduce the amount of copying. For example, you might see that most messages are less than 4KB because they contain very small requests/responses. That means the first buffer size should be 4KB.
Then you might see that if the message is larger than 4KB, it’s usually because it contains a file. You may notice that most files flowing through the system are less than 128KB. It makes sense to make the second buffer size 128KB.
Finally you might see that once messages exceed 128KB, there is no real pattern in message size, so perhaps the final buffer size should be the maximum message size.
With these three buffer sizes, the amount of data copying depends on the size of the messages flowing through the system. Messages smaller than 4KB will never be copied. For 1.000.000 concurrent connections that results in 1.000.000 x 4KB = 4GB, which is possible on most servers today (2015). Messages between 4KB and 128KB will be copied once, and only the first 4KB needs to be copied into the 128KB buffer. Messages between 128KB and the maximum message size are copied twice: the first time 4KB is copied, the second time 128KB is copied, so a total of 132KB is copied for the largest messages. Given that there are not that many messages over 128KB, this may be acceptable.
Once the message is fully processed, the allocated memory should be freed. This way, the next message received from the same connection again starts with the minimum buffer size. This is necessary to ensure that memory can be shared more efficiently between connections. It is likely that not all connections require large buffers at the same time.
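Here is a rough sketch of a resize-by-copy buffer using the three tiers discussed above. The tier sizes, the assumed 1MB maximum message size, and the class name are all illustrative:

```java
import java.nio.ByteBuffer;

// Sketch of a resize-by-copy buffer with three tiers (4KB -> 128KB -> max message size).
public class ResizeByCopyBuffer {

    private static final int SMALL  = 4 * 1024;
    private static final int MEDIUM = 128 * 1024;
    private static final int LARGE  = 1024 * 1024;   // assumed maximum message size

    private byte[] data = new byte[SMALL];
    private int length = 0;

    // Appends a block of incoming data, growing (and copying) the buffer when needed.
    public void append(ByteBuffer src) {
        int incoming = src.remaining();
        while (length + incoming > data.length) {
            grow();
        }
        src.get(data, length, incoming);
        length += incoming;
    }

    private void grow() {
        int newCapacity;
        if (data.length < MEDIUM) {
            newCapacity = MEDIUM;
        } else if (data.length < LARGE) {
            newCapacity = LARGE;
        } else {
            throw new IllegalStateException("Message exceeds the maximum allowed size");
        }
        byte[] bigger = new byte[newCapacity];
        System.arraycopy(data, 0, bigger, 0, length);  // this copy is the cost of the approach
        data = bigger;
    }

    public byte[] data()   { return data; }
    public int    length() { return length; }
}
```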
I have a full tutorial on how to implement such a resizable memory buffer: Resizable Arrays. The tutorial also includes a link to a GitHub repository with code showing a working implementation.

##### Resize by append

Another way to resize a buffer is to make the buffer consist of multiple arrays. When you need to resize the buffer, you simply allocate another byte array and write data into that array.
There are two ways to grow such a buffer. One way is to allocate separate byte arrays and keep a list of those byte arrays. Another way is to allocate slices of a larger, shared byte array and keep a list of the slices allocated to the buffer. Personally, I think the slice approach is slightly better, but not by much.
The advantage of growing the buffer by appending separate arrays or slices is that no data needs to be copied during writing. All data can be copied directly from the socket into an array or slice.
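Here is a sketch of a resize-by-append buffer built from a list of separately allocated blocks. The 4KB block size and all names are assumptions made for illustration; the socket reads directly into the returned ByteBuffer, so data that has already been written is never copied:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch of a resize-by-append buffer: the message is stored in a list of separately
// allocated blocks, so growing the buffer never copies data that was already written.
public class ResizeByAppendBuffer {

    private static final int BLOCK_SIZE = 4 * 1024;

    private final List<byte[]> blocks = new ArrayList<>();
    private int usedInLastBlock = BLOCK_SIZE;   // forces allocation of the first block

    // Returns a ByteBuffer the socket can read into directly.
    public ByteBuffer nextWriteBuffer() {
        if (usedInLastBlock == BLOCK_SIZE) {
            blocks.add(new byte[BLOCK_SIZE]);   // grow by appending another block
            usedInLastBlock = 0;
        }
        byte[] lastBlock = blocks.get(blocks.size() - 1);
        return ByteBuffer.wrap(lastBlock, usedInLastBlock, BLOCK_SIZE - usedInLastBlock);
    }

    // Call after the socket read to record how many bytes were written into the buffer.
    public void commit(int bytesRead) {
        usedInLastBlock += bytesRead;
    }

    // Parsing has to walk this list of blocks, which is what makes the model harder to work with.
    public List<byte[]> blocks() {
        return blocks;
    }
}
```

A caller would use it roughly like this: `int bytesRead = channel.read(buffer.nextWriteBuffer()); if (bytesRead > 0) buffer.commit(bytesRead);`.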
The disadvantage of growing the buffer this way is that the data is not stored in a single contiguous array. This makes message parsing harder, since the parser needs to look out for both the end of every single array and the end of all arrays at the same time. Because you need to look for the end of a message in the written data, this model is not as easy to work with.

##### TLV encoded messages (Type, Length, Value)

Some protocol message formats are encoded in a TLV format (Type, Length, Value). This means that when a message arrives, the total length of the message is stored at the beginning of the message. That way you immediately know how much memory to allocate for the whole message.
TLV encoding makes memory management easier. You immediately know how much memory to allocate for messages. No memory is wasted at the end of a partially used buffer.
One disadvantage of TLV encoding is that all memory is allocated for the message before all the data for the message arrives. Some slow connections that send large messages can allocate all available memory, making the server unresponsive.
One solution to this problem is to use a message format that consists of multiple TLV fields. That way, memory is allocated per field rather than for the whole message, and only when a field arrives. However, a large field can have the same effect on your memory management as a large complete message.
Another solution is to time out messages that have not been fully received within, for example, 10-15 seconds. This lets your server recover from a coincidental, simultaneous arrival of many large messages, but it still leaves the server unresponsive for a while. In addition, an intentional DoS (denial of service) attack can still result in your server’s memory being fully allocated.
TLV encoding comes in different variations. Exactly how many bytes are used to specify the type and the length of a field depends on the individual TLV encoding. There are also TLV encodings that put the length of the field first, then the type, and then the value (an LTV encoding). Although the order of the fields is different, it is still a TLV variation.
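As a sketch of how a Message Reader could use the length field to decide whether a complete message has arrived, here is a tiny helper. The concrete layout (1-byte type, 4-byte big-endian length, then the value) is an assumption made for the example:

```java
import java.nio.ByteBuffer;

// Sketch of detecting a complete TLV message in a data block. The layout is assumed:
// 1 byte type, 4 byte big-endian length, then 'length' value bytes.
public class TlvSketch {

    private static final int HEADER_SIZE = 1 + 4;

    // Returns the total size of the first message in the block, or -1 if the block
    // does not yet contain a full message.
    public static int completeMessageSize(ByteBuffer block) {
        if (block.remaining() < HEADER_SIZE) {
            return -1;                                         // the header has not fully arrived yet
        }
        int valueLength = block.getInt(block.position() + 1);  // length field follows the type byte
        int totalSize = HEADER_SIZE + valueLength;
        return block.remaining() >= totalSize ? totalSize : -1;
    }
}
```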
The fact that TLV encoding makes memory management easier is one of the reasons HTTP 1.1 is such a bad protocol. That is one of the problems they are trying to fix in HTTP 2.0, where data is transferred in LTV encoded frames. It is also why we designed our own network protocol, which uses TLV encoding, for the VStack.co project.

Writing data in a non-blocking IO pipe is also a challenge. When you call write(ByteBuffer) on a channel in non-blocking mode, there is no guarantee about how many bytes from the ByteBuffer will be written. The write(ByteBuffer) method returns how many bytes were written, so it is possible to keep track of the number of bytes written. And that is the challenge: keeping track of partially written messages so that all bytes of each message are eventually sent.
To manage writing partial messages to channels, we will create a Message Writer. Just as with Message Readers, we need one Message Writer per Channel we write messages to. Inside each Message Writer we keep track of exactly how many bytes of the message currently being written have been sent.

If more messages arrive at the Message Writer than it can write directly to the Channel, the messages need to be queued up internally in the Message Writer. The Message Writer then writes the messages to the Channel as fast as the Channel can take them.
Here is a diagram showing the partial message writing design so far:
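In code, a Message Writer along these lines might look roughly like the sketch below. This is an illustration of the idea, not the implementation from the GitHub repository:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of a Message Writer that keeps track of how much of the current message has been
// written, and queues further messages internally until the channel can take them.
public class MessageWriter {

    private final Queue<ByteBuffer> queuedMessages = new ArrayDeque<>();
    private ByteBuffer messageInProgress;          // the message currently being written, if any

    public void enqueue(ByteBuffer message) {
        queuedMessages.add(message);
    }

    // Writes as much as the channel will accept right now; call again when the channel
    // becomes write-ready to continue with partially written messages.
    public void write(SocketChannel channel) throws IOException {
        while (true) {
            if (messageInProgress == null) {
                messageInProgress = queuedMessages.poll();
                if (messageInProgress == null) {
                    return;                        // nothing left to write
                }
            }
            channel.write(messageInProgress);      // may write only part of the message
            if (messageInProgress.hasRemaining()) {
                return;                            // channel cannot take more right now
            }
            messageInProgress = null;              // message fully written, move on to the next
        }
    }

    public boolean isEmpty() {
        return messageInProgress == null && queuedMessages.isEmpty();
    }
}
```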
If you have a lot of connections, you’ll have a lot of message writer instances. Checking, say, a million Message Writer instances to see if they can write any data is slow. First, many Message Writer instances don’t have any messages to send. We don’t want to check those Message Writer instances. Second, not all Channel instances are ready to write data. We don’t want to waste time trying to write data to a channel that can’t accept any data.
To check whether a channel is ready for writing, you can register the channel with a Selector. However, we do not want to register all Channel instances with the Selector. Imagine you had 1.000.000 connections, most of which were idle, and all 1.000.000 connections were registered with the Selector. Then, when you call select(), most of these Channel instances would be write-ready (they are mostly idle, remember?). You would then have to check the Message Writers of all those connections to see whether they have any data to write.
To avoid checking all Message Writer instances for messages, and all Channel instances that have no messages to be written to them anyway, we use this two-step approach:
- When a message is written to a Message Writer, the Message Writer registers its associated Channel with the Selector (if it is not already registered).
- When your server has time, it checks the Selector to see which of the registered Channel instances are ready for writing. For each write-ready Channel, its associated Message Writer is asked to write data to the Channel. If a Message Writer writes all of its messages to its Channel, the Channel is unregistered from the Selector again.

This little two-step approach makes sure that only Channel instances that actually have messages to be written to them are registered with the Selector.
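Here is a sketch of that two-step approach in code. It reuses the MessageWriter sketch from earlier; the method names and the use of the key attachment to find the writer are illustrative assumptions, and the channel is assumed to already be in non-blocking mode:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Sketch: a channel is only registered for OP_WRITE while its Message Writer has data to write.
public class WriteRegistrationSketch {

    // Step 1: when a message is queued, register the channel for write-readiness.
    public static void queueMessage(Selector writeSelector, SocketChannel channel,
                                    MessageWriter writer, ByteBuffer message) throws IOException {
        writer.enqueue(message);
        channel.register(writeSelector, SelectionKey.OP_WRITE, writer);
    }

    // Step 2: when the server has time, write to the channels that are ready,
    // and unregister channels whose writers have no more data.
    public static void writeReadyChannels(Selector writeSelector) throws IOException {
        if (writeSelector.selectNow() == 0) {
            return;
        }
        for (SelectionKey key : writeSelector.selectedKeys()) {
            MessageWriter writer = (MessageWriter) key.attachment();
            writer.write((SocketChannel) key.channel());
            if (writer.isEmpty()) {
                key.cancel();                      // nothing left: unregister from the Selector
            }
        }
        writeSelector.selectedKeys().clear();
    }
}
```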
As you can see, a non-blocking server needs to check for incoming data from time to time to see whether any new complete messages have been received. The server may need to check several times until one or more complete messages have arrived; checking once is not enough.

Similarly, a non-blocking server needs to check from time to time whether there is any data to write. If there is, the server needs to check whether the corresponding connections are ready to have that data written to them. It is not enough to check only when a message is first queued, because the message may be written only partially.
In summary, a non-blocking server ends up with three “pipes” it needs to execute regularly:

- The read pipe, which checks for new incoming data from the open connections.
- The process pipe, which processes any complete messages received.
- The write pipe, which checks whether any outgoing messages can be written to any of the open connections.

These three pipes are executed repeatedly in a loop. You may be able to optimize the execution somewhat. For example, if there are no queued messages, the write pipe can be skipped. Or, if no new complete messages were received, perhaps the process pipe can be skipped.
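As a very rough sketch, the loop could look like this. The pipe types and method names are invented for illustration and are not taken from the repository:

```java
// Sketch of the overall server loop: the three pipes executed repeatedly by one thread.
public class ServerLoopSketch implements Runnable {

    private final ReadPipe readPipe;
    private final ProcessPipe processPipe;
    private final WritePipe writePipe;

    public ServerLoopSketch(ReadPipe r, ProcessPipe p, WritePipe w) {
        this.readPipe = r;
        this.processPipe = p;
        this.writePipe = w;
    }

    @Override
    public void run() {
        while (true) {
            boolean gotMessages = readPipe.readFromOpenConnections();   // read pipe
            if (gotMessages) {
                processPipe.processCompleteMessages();                  // process pipe (skippable)
            }
            if (writePipe.hasQueuedMessages()) {
                writePipe.writeToReadyConnections();                    // write pipe (skippable)
            }
        }
    }

    interface ReadPipe    { boolean readFromOpenConnections(); }
    interface ProcessPipe { void processCompleteMessages(); }
    interface WritePipe   { boolean hasQueuedMessages(); void writeToReadyConnections(); }
}
```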
Here is a diagram illustrating the entire server cycle:
github.com/jjenkov/jav…