Analyze the asynchronous I/O mechanism of Node.js step by step

What’s good about it isn’t original. What’s original about it isn’t good.

Simple Node

The sections in this article are shown in the figure below. It takes about 10 to 15 minutes to read.

Background

Among the resources of a computer, I/O and CPU computation can run in parallel given hardware support. Blocking I/O in synchronous programming therefore wastes resources: subsequent tasks, whether CPU computation or other I/O operations, sit waiting unnecessarily.

The hardware clearly supports parallelism; if the software does not use it, that capacity is wasted. So the idea is to do everything possible to keep blocking from causing unnecessary waiting.

Introducing the problem

Suppose we have a set of tasks that involve both I/O and CPU computation, and a multi-core computer with limited resources. What would you do to reduce the waste described above?

The first scheme: multithreading.

By creating multiple threads to perform CPU calculations and I/O separately, CPU calculations are not blocked by I/O.

It has the following disadvantages:

Hardware: Thread creation and thread context switching carry time overhead.
Software: Deadlocks, state synchronization, and the other problems of the multithreaded programming model give developers headaches.

The second scheme: a single thread plus asynchronous I/O.

First, it avoids the disadvantages of the first scheme.

It is event-driven: while the single thread performs CPU computation, I/O is invoked asynchronously and delivers its result later. This also keeps I/O from blocking CPU computation.

But it also has disadvantages:

A single thread cannot take advantage of a multi-core CPU. (One thread cannot run on multiple CPUs at once.)
If the thread dies, the whole program crashes. (With multithreading this is a minor issue.)
Non-blocking I/O is implemented through polling, which consumes additional CPU resources.

Problem decomposition

We will break down the problems described above and organize our thinking:

T1: Reduce the time that I/O blocks CPU computation.
T2: Do not introduce locks, state synchronization, and similar problems.
T3: Take advantage of multi-core CPUs.
T4: Do not introduce extra overhead.

Solving the problem

Node combines asynchronous calls, an internally maintained I/O thread pool, and an event loop to reduce or avoid the time that I/O blocks CPU computation. I will explain these three pieces one by one:

The asynchronous call

In one picture.

Here we abstract asynchronous call handling down to the operating system level. An asynchronous call means that when an application initiates an I/O call, it hands the request to the operating system and then continues executing without waiting. Once the operating system has completed the task, it returns the data; the application receives that data through a callback and runs the corresponding callback function.
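A minimal sketch of this flow, assuming a CommonJS script and using fs.readdir on the current directory as a stand-in for any I/O call (the `order` array exists only to make the sequence visible):

```javascript
const fs = require('fs');

// Records the order in which things happen.
const order = [];

// Initiate the I/O call. The callback runs only after the operating
// system has finished the work and the result has been handed back.
fs.readdir('.', (err, entries) => {
  if (err) throw err;
  order.push(`callback: received ${entries.length} directory entries`);
  console.log(order.join('\n'));
});

// Execution continues immediately, without waiting for the result.
order.push('main: readdir() returned, continuing with other work');
```

The "main" line is always recorded first: the callback cannot run until the current synchronous code has finished.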

Maintain the I/O thread pool

Applying the operating-system picture above to Node: Node maintains an I/O thread pool internally.

When the JavaScript thread encounters an I/O task during execution, it makes an asynchronous call: it wraps the parameters into a request object and places it in the wait queue of the thread pool.

When the thread pool has a free thread, that thread executes the I/O task. When the task completes, the thread is returned to the pool and the result of the I/O task becomes available.
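The queue-and-free-thread behaviour can be modelled in a few lines. This is only a toy simulation under stated assumptions: ToyPool, fakeIo, and every field name are hypothetical, the "threads" are just counters, and the "I/O" is a timer. It is not how Node implements its pool.

```javascript
// Toy model of a fixed-size pool with a wait queue (all names hypothetical).
class ToyPool {
  constructor(size) {
    this.size = size;   // number of simulated threads
    this.busy = 0;      // threads currently running a task
    this.maxBusy = 0;   // high-water mark, kept for illustration
    this.waiting = [];  // queued request objects: { task, resolve, reject }
  }

  // Wrap the task in a request object and place it in the wait queue.
  submit(task) {
    return new Promise((resolve, reject) => {
      this.waiting.push({ task, resolve, reject });
      this._dispatch();
    });
  }

  // Hand queued requests to free threads.
  _dispatch() {
    while (this.busy < this.size && this.waiting.length > 0) {
      const { task, resolve, reject } = this.waiting.shift();
      this.busy += 1;
      this.maxBusy = Math.max(this.maxBusy, this.busy);
      Promise.resolve()
        .then(task)
        .then(resolve, reject)
        .finally(() => {
          this.busy -= 1;   // return the thread to the pool
          this._dispatch(); // pick up the next waiting request, if any
        });
    }
  }
}

// Simulated I/O: resolves with `value` after `ms` milliseconds.
const fakeIo = (value, ms) => () =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

const pool = new ToyPool(2);
const results = Promise.all([
  pool.submit(fakeIo('a', 30)),
  pool.submit(fakeIo('b', 10)),
  pool.submit(fakeIo('c', 10)),
]);
results.then((values) => console.log(values, 'max busy:', pool.maxBusy));
```

With two simulated threads and three tasks, the third request waits in the queue until one of the first two finishes.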

The asynchronous I/O process is as follows:

IOCP (Input/Output Completion Port) is an application programming interface that supports many simultaneous asynchronous I/O operations. It is a Windows kernel object.

Event loop mechanism

How does the JavaScript thread know that the asynchronous task is complete?

The most brute-force and straightforward way is to have the CPU poll: run an infinite loop that checks the I/O completion status. But then solving problem T1 (reduce the time I/O blocks CPU computation) introduces problem T4 (do not introduce extra overhead), because the CPU spends extra resources on status checks and unnecessary idling.

This can be abstractly interpreted as the CPU polling the state of each thread in the thread pool.
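To make the cost of this brute-force polling concrete, here is a small sketch. The busyPoll function and the 20 ms "I/O" are assumptions for illustration; a real check would query a device or a thread, not the clock.

```javascript
// Busy-polling sketch: spin until a simulated I/O is done and count
// how many status checks were spent along the way.
function busyPoll(isDone) {
  let checks = 0;
  while (!isDone()) {
    checks += 1; // each iteration is CPU time that learns nothing new
  }
  return checks;
}

// Simulated I/O that completes 20 ms from now.
const deadline = Date.now() + 20;
const wastedChecks = busyPoll(() => Date.now() >= deadline);
console.log('status checks before completion:', wastedChecks);
```

The exact count depends on the machine, but every one of those checks is work the CPU did only to find out that nothing had changed.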

So we need to optimize problem T4 to reduce consumption as much as possible.

A well-known optimization approach is to define an ideal that cannot actually be reached and then design concrete methods that get ever closer to it. Here the ideal is that problem T4 does not exist at all, and we want to optimize until we approach it.

As just described, constantly checking the I/O status is the lowest-performance scheme (this is called the read scheme). Beyond it, there are several other schemes:

Poll the event state on file descriptors (select scheme). Because it stores state in an array of length 1024, it can check at most 1024 file descriptors, which is limiting.

A file descriptor is a simple integer that identifies each file and socket opened by the process. Do not assume 1024 is a lot; it is a really small number in the face of a flood of requests.

Improve on select by storing state in a linked list, which removes the length limit (poll scheme). However, performance is still poor when there are many file descriptors.
If no completed I/O event is detected when polling, sleep until an event occurs and wakes the poller up (epoll scheme). This is the most efficient I/O event notification mechanism on Linux, and it does not waste CPU because the polling thread (that is, the JavaScript thread) is asleep while it waits.

Let’s walk through the entire epoll-based scheme using the producer/consumer model:

The completion of I/O events in the threads of the thread pool is the producer of events.

The callback functions for those events in the JavaScript thread are the consumers of events.

Step 1: The polling mechanism of Node polls the I/O completion queue. If it finds the queue empty (that is, no thread has completed its I/O), it goes to sleep.

Step 2: When threads in the I/O thread pool complete their tasks, a signal (that the operating system has finished) wakes up the polling mechanism of Node, which takes each completed I/O object off the I/O completion queue and executes the corresponding callback function.

Step 3: If a poll finds the I/O completion queue empty, the polling mechanism sleeps until it is woken again.

The polling mechanism described above is the event loop of Node, and the I/O completion queue is also called the event observer.
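The three steps can be sketched as a toy model. Everything here is an assumption made for illustration: ToyLoop and its fields are hypothetical names, the "sleep" is a pending Promise and not a real epoll wait, and the producers are timers standing in for pool threads.

```javascript
// Toy event loop: sleeps while the completion queue is empty and is
// woken when a producer pushes a finished I/O result.
class ToyLoop {
  constructor() {
    this.completed = []; // I/O completion queue
    this.wake = null;    // resolver that wakes the sleeping loop, if any
    this.sleeps = 0;     // how many times the loop went to sleep
  }

  // Producer side: a pool thread reports a finished task.
  complete(result, callback) {
    this.completed.push({ result, callback });
    if (this.wake) {
      const wake = this.wake;
      this.wake = null;
      wake(); // signal the sleeping loop
    }
  }

  // Consumer side: run callbacks until `expected` results were handled.
  async run(expected) {
    let handled = 0;
    while (handled < expected) {
      if (this.completed.length === 0) {
        // Steps 1 and 3: nothing has completed, so sleep until woken.
        this.sleeps += 1;
        await new Promise((resolve) => { this.wake = resolve; });
        continue;
      }
      // Step 2: take a completed object off the queue, run its callback.
      const { result, callback } = this.completed.shift();
      callback(result);
      handled += 1;
    }
  }
}

const loop = new ToyLoop();
const seen = [];
const finished = loop.run(2).then(() => {
  console.log('handled:', seen, 'sleeps:', loop.sleeps);
});

// Two simulated pool threads finishing at different times.
setTimeout(() => loop.complete('file A', (r) => seen.push(r)), 10);
setTimeout(() => loop.complete('file B', (r) => seen.push(r)), 20);
```

While the loop is asleep it consumes no CPU at all, which is the point of the epoll scheme: the consumer runs only when a producer has something for it.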

For more on this section, see Chapter 3, Sections 3.3.2 to 3.3.5, of Simple Node.

With the event loop covered, we can piece together the entire asynchronous I/O process, as shown in the figure:

Conclusion: Node solves problem T1 (reduce the time I/O blocks CPU computation) by combining asynchronous calls, an I/O thread pool, and an event loop, while minimizing the impact of problem T4 (do not introduce extra overhead). Since JavaScript execution is always single-threaded, there is no need for locking or state synchronization, so problem T2 (do not introduce locks, state synchronization, and similar problems) never arises.

So, while JavaScript is single-threaded, Node is multi-threaded because it maintains an I/O thread pool.

Here we have covered only asynchronous I/O. There are, of course, asynchronous tasks that are not I/O, such as setTimeout. setTimeout simply inserts an event into the queue of the timer observer (there are multiple observers, and this one is not the I/O observer). On each iteration, the loop checks whether the event has expired and executes it once it has.

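It is worth noting that the timer observer is implemented as a red-black tree, at least in the Node version that Simple Node describes; newer libuv releases keep timers in a min-heap instead.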
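A small sketch of the observable behaviour: timers run when the loop finds them expired, in expiry order and not in registration order. The `fired` array and the 60 ms reporting timer are assumptions added for illustration.

```javascript
const fired = [];

// Registered in one order...
setTimeout(() => fired.push('30ms'), 30);
setTimeout(() => fired.push('10ms'), 10);
setTimeout(() => fired.push('20ms'), 20);

// ...but executed in expiry order, each once the loop finds it due.
const timersDone = new Promise((resolve) => {
  setTimeout(() => {
    console.log(fired); // [ '10ms', '20ms', '30ms' ]
    resolve(fired);
  }, 60);
});
```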

Ok, finally, we are ready to tackle the T3 problem mentioned at the beginning of this article:

How to take advantage of multi-core CPU?

In fact, the problem to solve here is that a single process on a single core leaves a multi-core machine underused.

To get straight to it: Node uses a multi-process architecture with a master-worker model. Ideally, each process is assigned its own CPU.

Note, however, that creating a worker process (that is, a child process) is expensive, requiring at least 30 ms of startup time and 10 MB of memory. So create them sparingly.

Be clear about the purpose: multiple processes exist to take advantage of multi-core CPUs, not to solve concurrency.

IPC can pass handles, which lets multiple processes listen on the same port and thereby achieve load balancing. For details, see Chapter 9 of Simple Node.
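One way to sketch the master-worker model is with the built-in cluster module. The helper names workerCount and startCluster, the port number, and the policy of restarting a dead worker are assumptions for illustration, not something the article prescribes.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Hypothetical helper: one worker per CPU core, but at least one.
function workerCount(cpuCount = os.cpus().length) {
  return Math.max(1, cpuCount);
}

// Hypothetical entry point. Nothing starts until this is called.
function startCluster(port) {
  // cluster.isPrimary needs Node 16+; older versions name it isMaster.
  if (cluster.isPrimary) {
    // Master: fork the workers and replace any worker that dies.
    for (let i = 0; i < workerCount(); i += 1) cluster.fork();
    cluster.on('exit', (worker) => {
      console.log(`worker ${worker.process.pid} exited, forking a new one`);
      cluster.fork();
    });
  } else {
    // Worker: every worker listens on the same port.
    http
      .createServer((req, res) => res.end(`handled by pid ${process.pid}\n`))
      .listen(port);
  }
}
```

Calling startCluster(3000) at the top level of a script would start it: each forked worker re-runs the same script and takes the worker branch. Because every fork carries the startup and memory cost noted above, the sketch forks once per core and not once per request.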

Conclusion

Node implements asynchronous I/O through asynchronous calls, an internally maintained I/O thread pool, and an epoll-based event loop, and it makes full use of multi-core CPUs through a master-worker multi-process architecture.

From now on, when someone makes one of the following claims, you can tell them they are wrong:

Node is single-threaded.
Node is only good for I/O-intensive work, not CPU-intensive work.
Programs written in Node crash too easily.

You can explain it to them and say: