Virtual thread network I/O low-level implementation

Chris Hegarty on May 10, 2021 on inside.java/2021/05/10/…

Project Loom aims to bring new features and APIs to the Java virtual machine and the Java platform in support of easy-to-use, high-throughput, lightweight concurrency and new programming models. This opens up a number of interesting prospects, one of which is simpler code for network interactions. Today’s servers can handle far more open socket connections than the number of threads they can support, which presents both opportunities and challenges.

Unfortunately, writing scalable code that interacts with the network is hard. Beyond a certain threshold, code that uses synchronous APIs stops scaling, because such APIs can block while performing I/O, which in turn ties up a thread until the operation becomes ready; for example, when trying to read from a socket that currently has no data available. Threads are an expensive resource on the Java platform today, too costly to leave waiting for I/O operations to complete. To work around this limitation, we commonly reach for asynchronous I/O or reactive frameworks, which let us write code that does not tie up a thread during I/O but instead uses callbacks or event notifications when the operation completes or becomes ready.

Using asynchronous and non-blocking APIs is more challenging than using synchronous APIs, in part because they lead to code that does not read naturally. Synchronous APIs are for the most part easier to work with: the code is easier to write, easier to read, and easier to debug (with stack traces that make sense!). But as described above, code using synchronous APIs does not scale as well as its asynchronous counterpart, which leaves an unpleasant choice: pick the simpler synchronous code and accept that it will not scale, or pick the more scalable asynchronous code and deal with all of its complexity. Not a great choice! One of the compelling value propositions of Project Loom is to avoid having to make that choice; synchronous code should scale.

In this article, we look at how the Java platform’s networking APIs work under the hood when called on virtual threads. The details are largely an artifact of the implementation, and you do not need to know them to write code on top of it, but it is still interesting to see how things work underneath, and it may help answer questions that, left unanswered, could lead us back to having to make that tough choice.

Virtual Threads

Before going any further, we need to know a little about the new kind of thread in Project Loom: virtual threads.

Virtual threads are user-mode threads scheduled by the Java virtual machine rather than the operating system. A virtual thread requires very few resources, and a single Java virtual machine may support millions of them. Virtual threads are a great choice for tasks that spend much of their time blocked, typically waiting for I/O operations to complete.

Platform threads (the threads we are all familiar with in current versions of the Java platform) are typically mapped 1:1 to kernel threads scheduled by the operating system. Platform threads usually have a large stack and other resources that are maintained by the operating system.

Virtual threads typically use a small set of platform threads as carrier threads. Code executing in a virtual thread is usually not aware of the underlying carrier thread. Locking and I/O operations are scheduling points where a carrier thread is re-scheduled from one virtual thread to another. A virtual thread may park, which makes it ineligible for scheduling. A parked virtual thread may be unparked, which re-enables it for scheduling.
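The scheduling behaviour described above, where a virtual thread stops being schedulable and is later re-enabled, is exposed to library code as park and unpark. The following is a minimal sketch, assuming JDK 21 or later (where virtual threads are a final feature); the class and method names are hypothetical and chosen only for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class ParkUnparkSketch {

    /** Parks a virtual thread until a flag is set, then unparks it. Returns true if it resumed. */
    static boolean parkThenResume() {
        AtomicBoolean ready = new AtomicBoolean(false);
        AtomicBoolean resumed = new AtomicBoolean(false);

        Thread vt = Thread.ofVirtual().start(() -> {
            // park() can return spuriously, so re-check the condition in a loop.
            while (!ready.get()) {
                LockSupport.park();   // not eligible for scheduling until unparked
            }
            resumed.set(true);
        });

        ready.set(true);              // publish the condition first ...
        LockSupport.unpark(vt);       // ... then make the thread eligible for scheduling again
        try {
            vt.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return vt.isVirtual() && resumed.get();
    }

    public static void main(String[] args) {
        System.out.println("resumed: " + parkThenResume());
    }
}
```

The order in the caller matters: the flag is set before unpark, so the virtual thread sees the condition as true whether it had already parked or was about to.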

Networking APIs

There are two broad categories of networking APIs on the Java platform:

  1. Asynchronous – AsynchronousServerSocketChannel, AsynchronousSocketChannel
  2. Synchronous – java.net Socket / ServerSocket / DatagramSocket, java.nio.channels SocketChannel / ServerSocketChannel / DatagramChannel

The first category, asynchronous, initiates I/O operations that complete at some later time, possibly on a thread other than the one that initiated the operation. By definition, these APIs do not result in blocking system calls, and therefore require no special treatment when run in a virtual thread.
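To make the asynchronous category concrete, here is a small sketch in which a connect and a read are started on an AsynchronousSocketChannel and completed later through CompletionHandler callbacks. The loopback server, the class name AsyncReadSketch and the method readAsync are assumptions made for illustration; only the channel and handler types come from the JDK.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

class AsyncReadSketch {

    /** Sends payload over loopback and receives it through asynchronous callbacks. */
    static String readAsync(String payload) {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             AsynchronousSocketChannel client = AsynchronousSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

            CompletableFuture<String> result = new CompletableFuture<>();
            ByteBuffer buf = ByteBuffer.allocate(256);

            // connect() returns immediately; the handlers run later, possibly on another thread.
            client.connect(server.getLocalAddress(), null, new CompletionHandler<Void, Void>() {
                @Override
                public void completed(Void v, Void attachment) {
                    client.read(buf, null, new CompletionHandler<Integer, Void>() {
                        @Override
                        public void completed(Integer bytesRead, Void attachment2) {
                            buf.flip();
                            System.out.println("read completed on " + Thread.currentThread().getName());
                            result.complete(StandardCharsets.UTF_8.decode(buf).toString());
                        }

                        @Override
                        public void failed(Throwable t, Void attachment2) {
                            result.completeExceptionally(t);
                        }
                    });
                }

                @Override
                public void failed(Throwable t, Void attachment) {
                    result.completeExceptionally(t);
                }
            });

            try (SocketChannel peer = server.accept()) {
                peer.write(ByteBuffer.wrap(payload.getBytes(StandardCharsets.UTF_8)));
                return result.join();   // wait for the callback chain to finish
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAsync("hello"));
    }
}
```

Note how the application logic is split across nested handlers; this is the kind of structure the synchronous APIs avoid.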

The second category, synchronous, is more interesting from the point of view of how it behaves when run in a virtual thread. Within this category are the NIO channels that can be configured in non-blocking mode. Such channels are typically registered with an I/O event notification mechanism, such as a Selector, and do not result in blocking system calls. Like the asynchronous networking APIs, they require no special treatment when run in a virtual thread, because the I/O operations do not themselves make blocking system calls; that is typically left to the selector. That leaves the java.net socket types and the NIO channels configured in blocking mode. Let’s look at how these behave in a virtual thread.
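As an illustration of the non-blocking case, the sketch below registers a SocketChannel in non-blocking mode with a Selector and reads only when the selector reports readiness. The loopback setup and the names SelectorSketch and readWithSelector are hypothetical; the point is that read never blocks and the only blocking call is select().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

class SelectorSketch {

    /** Sends payload over loopback and reads it back via a non-blocking channel and a selector. */
    static String readWithSelector(String payload) {
        byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

            try (SocketChannel sender = SocketChannel.open(server.getLocalAddress());
                 SocketChannel receiver = server.accept()) {
                receiver.configureBlocking(false);
                receiver.register(selector, SelectionKey.OP_READ);

                sender.write(ByteBuffer.wrap(bytes));

                ByteBuffer buf = ByteBuffer.allocate(bytes.length);
                while (buf.hasRemaining()) {
                    selector.select();                       // the only call that blocks
                    Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                    while (keys.hasNext()) {
                        SelectionKey key = keys.next();
                        keys.remove();
                        if (key.isReadable()) {
                            // In non-blocking mode read() returns at once with whatever is available.
                            ((SocketChannel) key.channel()).read(buf);
                        }
                    }
                }
                buf.flip();
                return StandardCharsets.UTF_8.decode(buf).toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readWithSelector("ping"));
    }
}
```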

The semantics of the synchronous APIs require that an I/O operation, once initiated, either completes or fails in the calling thread before control returns to the caller. But what if the I/O operation is “not ready”, for example, when there is no data currently available to read from the socket?

Synchronous blocking API

The synchronous networking APIs, when run in a virtual thread, switch the underlying native socket into non-blocking mode. When an I/O operation invoked from Java code does not complete immediately (the native socket returns EAGAIN, meaning “not ready” or “would block”), the underlying native socket is registered with a JVM-wide event notification mechanism (a poller), and the virtual thread is parked. When the underlying I/O operation is ready (an event arrives at the poller), the virtual thread is unparked and the underlying socket operation is retried.
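The try, wait, retry shape described above can be sketched with public APIs. This is not the JDK's internal code: here a Selector stands in for the JVM-wide poller, and blocking in select() stands in for parking the virtual thread. The names RetryReadSketch, readWhenReady and demo are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class RetryReadSketch {

    /** Blocking-style read on a non-blocking channel: try, wait for readiness, retry. */
    static int readWhenReady(SocketChannel ch, Selector poller, ByteBuffer dst) throws IOException {
        if (!dst.hasRemaining()) {
            return 0;
        }
        int n;
        while ((n = ch.read(dst)) == 0) {              // 0 bytes: the "would block" case
            ch.register(poller, SelectionKey.OP_READ); // register interest in readability
            poller.select();                           // stand-in for parking the thread
            poller.selectedKeys().clear();
        }
        return n;                                      // bytes read, or -1 at end of stream
    }

    /** Sends payload over loopback after a short delay and reads it back. */
    static String demo(String payload) {
        byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector poller = Selector.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel sender = SocketChannel.open(server.getLocalAddress());
                 SocketChannel receiver = server.accept()) {
                receiver.configureBlocking(false);

                // Delay the write so that the first read attempt finds no data.
                CompletableFuture<Void> writer = CompletableFuture.runAsync(() -> {
                    try {
                        sender.write(ByteBuffer.wrap(bytes));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }, CompletableFuture.delayedExecutor(100, TimeUnit.MILLISECONDS));

                ByteBuffer buf = ByteBuffer.allocate(bytes.length);
                while (buf.hasRemaining() && readWhenReady(receiver, poller, buf) >= 0) {
                    // keep reading until the whole payload has arrived
                }
                writer.join();
                buf.flip();
                return StandardCharsets.UTF_8.decode(buf).toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("pong"));
    }
}
```

From the caller's point of view readWhenReady behaves like a blocking read, even though the channel underneath never blocks.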

Let’s take a closer look at this with an example. The retrieveURLs method downloads and returns the responses for a number of given URLs.

// Tuple of URL and response bytes
record URLData (URL url, byte[] response) { }

List<URLData> retrieveURLs(URL... urls) throws Exception {
  try (var executor = Executors.newVirtualThreadExecutor()) {
    var tasks = Arrays.stream(urls)
            .map(url -> (Callable<URLData>)() -> getURL(url))
            .toList();
    return executor.submit(tasks)
            .filter(Future::isCompletedNormally)
            .map(Future::join)
            .toList();
  }
}

The retrieveURLs method creates a list of tasks (one per URL), submits them to the executor, and then waits for the results. The executor starts a new virtual thread for each task, which calls getURL. For simplicity, only tasks that complete successfully contribute to the returned results.

The getURL method is written to use the synchronous URLConnection API to get the response.

URLData getURL(URL url) throws IOException {
  try (InputStream in = url.openStream()) {
    return new URLData(url, in.readAllBytes());
  }
}

The readAllBytes method is a bulk synchronous read operation that reads all of the response bytes. Under the hood, readAllBytes eventually bottoms out in the read method of the java.net.Socket input stream.
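The executor methods used in retrieveURLs (Executors.newVirtualThreadExecutor and a submit overload that returns a stream of futures) appear to come from the early-access build this article was written against and are not present in released JDKs. As a rough sketch, assuming JDK 21 or later, the same idea can be expressed with Executors.newVirtualThreadPerTaskExecutor and invokeAll. The names RetrieveSketch and retrieveAll are hypothetical, and the tasks are plain callables rather than real URL fetches so that the example runs without a network.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class RetrieveSketch {

    /** Runs each task in its own virtual thread and returns the results of those that succeeded. */
    static <T> List<T> retrieveAll(List<Callable<T>> tasks) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            return executor.invokeAll(tasks).stream()          // waits for every task to finish
                    .filter(f -> f.state() == Future.State.SUCCESS)
                    .map(Future::resultNow)
                    .toList();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> tasks = List.of(
                () -> "first response",
                () -> { throw new IOException("simulated failure"); },
                () -> "third response");
        System.out.println(retrieveAll(tasks));   // prints [first response, third response]
    }
}
```

In a real program each callable would wrap a call such as getURL(url); failed tasks are simply dropped, matching the simplification made in the article's version.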

If we run a small program that uses retrieveURLs to download an HTTP URL from a server that never sends a complete response, we can inspect the state of the threads as follows:

$ java Main & echo $!
89215
$ jcmd 89215 JavaThread.dump threads.txt
Created /Users/chegar/threads.txt

In threads.txt we see the usual system threads, the main thread of our test program, and the virtual thread blocked in the read operation. Note: a virtual thread has no name unless one is explicitly set, which is why it shows up as <unnamed>.
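If named threads would make a dump like this easier to read, a name can be set when the thread is created. The following is a small sketch, assuming JDK 21 or later; the "url-fetcher-" prefix and the class NamedVirtualThreads are made up for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

class NamedVirtualThreads {

    /** A virtual thread created without a name reports the empty string. */
    static String defaultName() {
        return Thread.ofVirtual().unstarted(() -> { }).getName();
    }

    /** Runs one task on an executor whose virtual threads are named prefix0, prefix1, ... */
    static String nameSeenByTask(String prefix) {
        ThreadFactory factory = Thread.ofVirtual().name(prefix, 0).factory();
        try (ExecutorService executor = Executors.newThreadPerTaskExecutor(factory)) {
            return CompletableFuture
                    .supplyAsync(() -> Thread.currentThread().getName(), executor)
                    .join();
        }
    }

    public static void main(String[] args) {
        System.out.println("default name is empty: " + defaultName().isEmpty());
        System.out.println("named: " + nameSeenByTask("url-fetcher-"));
    }
}
```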

$ cat threads.txt
...
"<unnamed>" #15 virtual
  java.base/java.lang.Continuation.yield(Continuation.java:402)
  java.base/java.lang.VirtualThread.yieldContinuation(VirtualThread.java:367)
  java.base/java.lang.VirtualThread.park(VirtualThread.java:534)
  java.base/java.lang.System$2.parkVirtualThread(System.java:2370)
  java.base/jdk.internal.misc.VirtualThreads.park(VirtualThreads.java:60)
  java.base/sun.nio.ch.NioSocketImpl.park(NioSocketImpl.java:184)
  java.base/sun.nio.ch.NioSocketImpl.park(NioSocketImpl.java:212)
  java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:320)
  java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:356)
  java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:807)
  java.base/java.net.Socket$SocketInputStream.read(Socket.java:988)
  java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:255)
  java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:310)
  java.base/java.io.BufferedInputStream.lockedRead(BufferedInputStream.java:382)
  java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:361)
  java.base/sun.net.www.MeteredStream.read(MeteredStream.java:141)
  java.base/java.io.FilterInputStream.read(FilterInputStream.java:132)
  java.base/sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3648)
  java.base/java.io.InputStream.readNBytes(InputStream.java:409)
  java.base/java.io.InputStream.readAllBytes(InputStream.java:346)
  Main.getURL(Main.java:24)
  Main.lambda$retrieveURLs$0(Main.java:13)
  java.base/java.util.concurrent.FutureTask.run(FutureTask.java:268)
  java.base/java.util.concurrent.ThreadExecutor$TaskRunner.run(ThreadExecutor.java:385)
  java.base/java.lang.VirtualThread.run(VirtualThread.java:295)
  java.base/java.lang.VirtualThread$VThreadContinuation.lambda$new$0(VirtualThread.java:172)
  java.base/java.lang.Continuation.enter0(Continuation.java:372)
  java.base/java.lang.Continuation.enter(Continuation.java:365)

The stack frames can be read from the bottom up. First, we see a number of frames related to setting up the virtual thread (“continuations” are the VM mechanism that virtual threads use internally); these correspond to the new thread created by the executor service. Second, we see the frames of the test program, the calls to retrieveURLs and getURL. Third, we see the frames of the HTTP protocol handler and the read method of the socket input stream implementation. Finally, at the top of the stack, we see that the virtual thread has parked. This is what we expect, because the server has not sent a complete response, so there is no data left to read from the socket. But how does the virtual thread get unparked when data does arrive on the socket?

Taking a closer look at the other system threads in threads.txt, we can see:

"Read-Poller" #16
  java.base@17-internal/sun.nio.ch.KQueue.poll(Native Method)
  java.base@17-internal/sun.nio.ch.KQueuePoller.poll(KQueuePoller.java:65)
  java.base@17-internal/sun.nio.ch.Poller.poll(Poller.java:195)
  java.base@17-internal/sun.nio.ch.Poller.lambda$startPollerThread$0(Poller.java:65)
  java.base@17-internal/sun.nio.ch.Poller$$Lambda$14/0x00000008010579c0.run(Unknown Source)
  java.base@17-internal/java.lang.Thread.run(Thread.java:1522)
  java.base@17-internal/jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:161)

This thread is the JVM-wide read poller. At its core it runs a basic event loop that monitors all of the synchronous networking read, connect, and accept operations that were not immediately ready when invoked in a virtual thread. When an I/O operation becomes ready, the poller is notified and unparks the corresponding parked virtual thread. There is an equivalent write poller for write operations.

The stack trace above was captured while running the test program on macOS, which is why we see stack frames for the macOS poller implementation, kqueue. On Linux the poller uses epoll, and on Windows it uses wepoll (which provides an epoll-like API on top of the Ancillary Function Driver for Winsock).

The poller maintains a map from file descriptor to virtual thread. When a file descriptor is registered with the poller, an entry is added to the map with the file descriptor as the key and the registering thread as the value. When woken up by an event, the poller’s event loop uses the event’s file descriptor to look up the corresponding virtual thread and unparks it.
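The bookkeeping described above can be modelled in a few lines. This is a simplified simulation, not the JDK's sun.nio.ch.Poller: file descriptors are plain integers, a BlockingQueue stands in for kqueue or epoll, and the demo uses platform threads so that it runs on older JDKs too. All names (PollerSketch, awaitReady, signalReady) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

class PollerSketch {
    private final Map<Integer, Thread> waiters = new ConcurrentHashMap<>();      // fd -> parked thread
    private final Set<Integer> ready = ConcurrentHashMap.newKeySet();            // fds reported ready
    private final BlockingQueue<Integer> events = new LinkedBlockingQueue<>();   // stand-in for the OS

    void start() {
        Thread loop = new Thread(this::pollLoop, "Read-Poller-sketch");
        loop.setDaemon(true);
        loop.start();
    }

    private void pollLoop() {
        try {
            while (true) {
                Integer fd = events.take();          // wait for the next readiness event
                ready.add(fd);
                Thread waiter = waiters.remove(fd);  // look up who registered this descriptor
                if (waiter != null) {
                    LockSupport.unpark(waiter);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Simulates the operating system reporting that fd is ready. */
    void signalReady(int fd) {
        events.add(fd);
    }

    /** Registers the calling thread for fd and parks it until the poller reports readiness. */
    void awaitReady(int fd) {
        waiters.put(fd, Thread.currentThread());
        while (!ready.remove(fd)) {                  // re-check: park() may return spuriously
            LockSupport.park();
        }
        waiters.remove(fd, Thread.currentThread());  // drop a stale entry if the event came first
    }

    /** Parks one thread per descriptor, signals them all, and returns how many woke up. */
    static int demo(int count) {
        PollerSketch poller = new PollerSketch();
        poller.start();
        AtomicInteger woken = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int fd = 0; fd < count; fd++) {
            int id = fd;
            Thread t = new Thread(() -> {
                poller.awaitReady(id);
                woken.incrementAndGet();
            });
            t.start();
            threads.add(t);
        }
        for (int fd = 0; fd < count; fd++) {
            poller.signalReady(fd);
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return woken.get();
    }

    public static void main(String[] args) {
        System.out.println("threads woken: " + demo(4));   // prints threads woken: 4
    }
}
```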

Scaling

If you squint, the behavior described above is not all that different from today’s scalable code that uses NIO channels and selectors, as found in many server-side frameworks and libraries. What is different with virtual threads is the programming model exposed to the developer. The former exposes a more complex model, in which user code must implement the event loop and maintain application state across I/O boundaries, while the latter exposes a simpler and more natural programming model, in which the Java platform takes care of scheduling tasks and maintaining context across I/O boundaries.

The default scheduler for virtual threads is a fork-join work-stealing scheduler, which is well suited to this job. The native event notification mechanisms used to monitor for ready I/O operations are as modern and efficient as the operating system offers. Virtual threads are built on top of continuation support in the Java VM. As a result, the synchronous Java networking APIs should scale comparably to the more complex asynchronous and non-blocking code constructs.
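One way to get a feel for this is to start far more blocked tasks than a platform thread pool would comfortably hold. The sketch below assumes JDK 21 or later and uses Thread.sleep as a stand-in for a blocking network read; the class name and the task count are arbitrary choices, and it makes no claim about real network throughput.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class ManyBlockedTasks {

    /** Starts one virtual thread per task, each blocking briefly, and returns how many completed. */
    static int run(int tasks) {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    Thread.sleep(Duration.ofMillis(200));   // stands in for a blocking read
                    return completed.incrementAndGet();
                });
            }
        }   // close() waits for all submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = run(10_000);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " tasks completed in " + millis + " ms");
    }
}
```

Because every task spends its time parked rather than occupying a carrier thread, the total run time should stay much closer to the length of a single sleep than to the sum of all of them.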

Conclusion

The synchronous Java networking APIs were re-implemented by JEP 353 and JEP 373 in preparation for Project Loom. When run in a virtual thread, I/O operations that do not complete immediately cause the virtual thread to park. When the I/O is ready, the virtual thread is unparked. This implementation uses several features of the Java VM and core libraries to offer a scalable and efficient alternative to today’s asynchronous and non-blocking code constructs.

Try out the Loom Early Access builds; we would love to hear about your experiences on the loom-dev mailing list.