Asynchronous Non-blocking IO and Event Loops

Node.js is a JavaScript runtime built on Chrome’s V8 engine. It handles highly concurrent requests with a single-threaded, event-driven, non-blocking I/O model, and libuv provides its asynchronous I/O capabilities.

Architecture of Node.js

From this diagram, we can see that the underlying framework of Node.js is composed of three parts: Node.js standard library, Node Bindings and underlying library.

Node.js standard library

This layer is written in JavaScript and is the API we call directly when using Node.js. You can find it in the lib directory of the source code; it contains the common core modules such as http, fs, and events.

Node bindings

This layer can be understood as a bridge between JavaScript and the C/C++ libraries. Through this bridge, the C/C++ libraries implemented at the bottom layer are exposed to the JavaScript environment, and JavaScript code is passed down to V8.

The underlying library

This layer mainly has the following four pieces:

V8: Google’s JavaScript virtual machine, which provides an environment for running JavaScript outside the browser;
libuv: gives Node.js a cross-platform thread pool, event loop, and asynchronous I/O capability, and is the main reason Node.js is efficient;
c-ares: provides asynchronous DNS capabilities;
http_parser, OpenSSL, and zlib: provide HTTP parsing, SSL, and data compression capabilities.

By the way, looking at the libuv architecture diagram, you can see that Node.js’s network I/O, file I/O, DNS operations, and some user code all run on top of libuv.

Single thread

There are generally two schemes for task scheduling. One is single-threaded serial execution, in which the execution order matches the order of the code. Its biggest problem is that it cannot take full advantage of a multi-core CPU: however high the load, the program can in theory use at most 100% of a single core. The other is multi-threaded parallel processing. Its advantage is that it makes effective use of a multi-core CPU; its disadvantages are that creating and switching threads is expensive, that it involves locking, state synchronization, and similar problems, and that threads often sit waiting for I/O to finish, which wastes CPU capacity.

A thread created for a client connection usually takes about 2 MB of memory, so in theory a server with 8 GB of memory can support at most about 4,000 concurrent connections in a thread-per-connection Java application. Node.js uses only one thread: when a client makes a connection request, an internal event is fired, and the non-blocking I/O, event-driven mechanism makes the processing look parallel. In theory, a server with 8 GB of memory can serve 30,000 to 40,000 users at the same time.

The single-threaded approach spares Node.js the complexity of locking and state synchronization and improves CPU utilization. Being single-threaded is not enough to make it efficient, though: Node.js also depends on the non-blocking I/O described below.
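A minimal sketch of what running on a single thread means in practice, assuming a plain Node.js script. The `busyWait` helper is hypothetical and exists only for this illustration: while it spins, it holds the only JavaScript thread, so a timer scheduled for 0 ms cannot fire until the loop has ended.

```javascript
// Hypothetical helper: occupy the single JavaScript thread for `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Spin: no other JavaScript can run on this thread in the meantime.
  }
}

const scheduledAt = Date.now();
let timerDelay = null;

setTimeout(() => {
  // Runs only after the synchronous code below has returned.
  timerDelay = Date.now() - scheduledAt;
  console.log(`timer fired after about ${timerDelay} ms`);
}, 0);

busyWait(50);
console.log('busy loop finished');
```

The timer reports a delay of at least 50 ms even though it was scheduled for 0 ms, which is why CPU-heavy synchronous work is a poor fit for the main thread.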

Non-blocking I/O

Concept

First of all, a network I/O operation involves two system objects:

The process or thread that calls the I/O operation
The system kernel

When a read occurs, it goes through two phases:

Waiting for the data to be ready
Copying the data from the kernel to the user process

It is important to keep these two phases in mind, because the I/O models differ in what happens during each of them.

Let’s clarify these concepts:

Blocking I/O: after an I/O operation is initiated, the process is blocked until a response is received or the operation times out.
Non-blocking I/O: the call returns immediately, without waiting for a response or a timeout, so the process can continue with other work, but it has to poll to check whether the data is ready.
Multiplexed I/O: select, poll, and epoll. The biggest advantage is that a single process can handle I/O for many network connections at once. The basic principle is that the select/poll function keeps polling all the sockets it is responsible for and notifies the user process when data arrives on one of them, while epoll uses a callback-based notification mechanism and is the most efficient I/O event mechanism on Linux.
Synchronous I/O: the process is blocked until the I/O operation it initiated gets a response or times out. The three models above (blocking I/O, non-blocking I/O, and multiplexed I/O) are all synchronous I/O. Note that non-blocking I/O counts as synchronous I/O because the process is still blocked while the data is copied from the kernel to the user process.
Asynchronous I/O: the call returns immediately and execution continues with the next statement. When the I/O operation completes or data is returned, the process that initiated the I/O is notified in the form of an event.

Conclusion:

Blocking and non-blocking I/O differ in whether the call waits for the I/O operation to complete (or the data to be returned) or returns immediately.

Synchronous and asynchronous I/O differ in whether the process is blocked at some point before the I/O operation completes, including during the copy from the kernel, or is simply notified once the operation has finished.

Design concept

Because Node.js uses a non-blocking I/O mechanism, the code after a read request runs immediately once the request has been issued, and the code that handles the result of the read is placed in a callback function, which improves the execution efficiency of the program. When an I/O operation completes, the thread that issued it is notified in the form of an event and executes the callback function for that event.

Event loop

The basic flow

1. Each Node.js process has only one main thread that executes program code, forming an execution stack (execution context stack).
2. In addition to the main thread, Node.js maintains an event queue. When a user’s network request or other asynchronous operation arrives, it is placed in the event queue instead of being executed immediately; the code does not block, and the main thread keeps running until its own code has finished.
3. After the main thread’s code has finished, the event loop mechanism checks whether the queue holds events to be processed. It takes the first event from the head of the queue and assigns a thread from the thread pool to handle it, then does the same for the second, the third, and so on, until every event in the queue has been handled. When an event completes, the main thread is notified; it executes the callback and the thread is returned to the thread pool. This process is called the event loop.
4. Step 3 is repeated over and over.
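The flow above can be observed with a short sketch. It assumes nothing beyond standard timers; `order` and `log` are hypothetical helpers that record and print each step. Three callbacks are queued while the main code is still running, and they are only taken from the queue, in first-in, first-out order, once the main code has finished.

```javascript
const order = [];
const log = (label) => {
  order.push(label);
  console.log(label);
};

log('main: start');

// Each call only queues a callback; nothing runs yet.
for (const name of ['event 1', 'event 2', 'event 3']) {
  setTimeout(() => log(`callback for ${name}`), 0);
}

log('main: end');
// Only now does the event loop start taking the queued callbacks
// from the head of the queue, one at a time.
```

Both `main:` lines are printed before any callback, and the callbacks then run in the order in which they were queued.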

Six stages

Note:

Each box is called a phase of the event loop.
Each phase has a FIFO (first in, first out) queue of callbacks to execute. Normally, when the event loop enters a given phase, it performs any operations specific to that phase and then executes the callbacks in that phase’s queue until the queue is exhausted or the maximum number of callbacks has been executed; the event loop then moves to the next phase.
While events are being processed in the poll phase, the kernel may queue new ones, that is, new poll events can be added while poll events are being handled. Long-running callbacks can therefore keep the poll phase running well past a timer’s threshold.
The event loop is said to have completed one tick when all phases have been executed once in sequence.

Phase overview:

Timers phase: executes callbacks scheduled by setTimeout and setInterval.
Pending callbacks phase: executes I/O callbacks that were deferred from the previous iteration of the event loop to this one.
Idle, prepare phase: used internally only.
Poll phase: the most important phase; executes I/O event callbacks, and Node blocks here under appropriate conditions.
Check phase: executes setImmediate callbacks.
Close callbacks phase: executes callbacks for close events, for example when a socket or handle is closed abruptly.

The pending callbacks, idle/prepare, and close callbacks phases of the event loop are greyed out because they are the phases used internally by Node.

Code written by Node.js developers runs only in the mainline, in the timers, poll, and check phases, and as microtasks.

To process asynchronous I/O events as quickly as possible, the event loop tends to stay in the poll phase whenever it can.
How long the current poll phase lasts (blocks) is determined by whether any later phase of the tick has a non-empty callback queue and by the expiry time of the nearest timer. If all queues are empty and no timers exist, the event loop stays in the poll phase indefinitely, so that an I/O callback can be executed as soon as it is added to the poll queue.
All callbacks in the check phase queue were registered with setImmediate, typically during the poll phase.

The poll phase has two main functions:

Determining when a timer (setTimeout or setInterval) has expired so that its callback can be executed
Executing the I/O callbacks in the poll queue

When the event loop enters the poll phase and no timers are scheduled, one of the following happens:
1. If the poll queue is not empty, the event loop executes the queued callbacks synchronously, one after another, until the queue is empty or the upper limit on callback executions is reached.
2. If the poll queue is empty, one of the following happens:
  - If the code has scheduled a callback with setImmediate(), the event loop ends the poll phase and moves on to the check phase to execute the check phase queue.
  - If the code has not used setImmediate(), the event loop blocks in the poll phase and waits for callbacks to be added to the poll queue, executing them immediately when they arrive.

Once the poll queue is empty, the event loop checks the timers; if any timer has expired, the event loop wraps back to the timers phase and executes the timer queue.

process.nextTick

process.nextTick() is not executed in any phase of the event loop; it runs during the switch between phases, that is, before the loop moves from one phase to the next.

A macrotask is a task executed in one of the phases of the event loop, and a microtask is a task executed between phases.

That is, the callbacks of the six phases above are macrotasks, and process.nextTick() callbacks belong with the microtasks.

The implementation of process.nextTick() has nothing to do with V8’s microtask queue; it is implemented at the Node.js level. It is more accurate to say that process.nextTick() behaves like a microtask. Promise.then callbacks are microtasks as well.

You can “starve” your I/O by calling process.nextTick() recursively, which prevents the event loop from reaching the poll phase.

A Promise.then callback is executed as a microtask, much like a process.nextTick callback. However, when both are pending, the process.nextTick callbacks are executed first. The priority is: process.nextTick > Promise.then = queueMicrotask

Case analysis

Case 1

Let’s look at the results of this code under different circumstances:

setTimeout(() => {
    console.log('timer1')

    Promise.resolve().then(function() {
        console.log('promise1')
    })
}, 0)

setTimeout(() => {
    console.log('timer2')

    Promise.resolve().then(function() {
        console.log('promise2')
    })
}, 0)

First, in the browser environment, the output is:

timer1
promise1
timer2
promise2

This is easy to explain with what we already know about the JavaScript event loop in browsers.

We then run it on Node.js versions below v11.0.0 and get:

timer1
timer2
promise1
promise2

However, on Node.js v11.0.0 and later, the result is:

timer1
promise1
timer2
promise2

The reason is that before Node v11, the microtask queue was processed only after all the tasks in the timers phase queue had run, whereas the browser processes the microtask queue after each individual macrotask. Starting with Node v11, the microtask queue is processed as soon as a single task finishes, both for setTimeout and setInterval callbacks in the timers phase and for setImmediate callbacks in the check phase, which makes the behavior consistent with the browser.

Case 2

setImmediate is designed to execute a script once the current poll phase completes
setTimeout schedules a script to run after a minimum threshold in milliseconds has elapsed

When the two are called outside an I/O callback (that is, in the main module), the order in which their callbacks run is not deterministic, because it depends on the performance of the process at that moment. For example:

setTimeout(() => {
  console.log('timeout');
}, 0);

setImmediate(() => {
  console.log('immediate');
});

The order of output is uncertain.

We know that the setTimeout callback executes in the timers phase and the setImmediate callback executes in the check phase. The event loop starts by checking the timers phase, but some preparation time passes before it gets there, and Node.js treats setTimeout(fn, 0) as setTimeout(fn, 1). So one of two things happens:

If the preparation before the timers phase takes more than 1 ms, so that loop->time >= 1 is satisfied, the timers phase callback (setTimeout) is executed first.
If the preparation before the timers phase takes less than 1 ms, the setImmediate callback in the check phase is executed first, and the setTimeout callback runs in the timers phase of the next loop iteration.

If both calls are made inside an I/O callback, the setImmediate callback always executes first.

const fs = require('fs');

fs.readFile(__filename, () => {
  setTimeout(() => {
    console.log('timeout');
  }, 0);
  setImmediate(() => {
    console.log('immediate');
  });
});

The analysis is as follows:

The fs.readFile callback is executed in the poll phase;
The setTimeout callback is registered for the timers phase;
The setImmediate callback is registered for the check phase;
The event loop leaves the poll phase and proceeds to the next phase, which happens to be the check phase, so the setImmediate callback executes first. When this iteration of the event loop finishes, the next iteration begins and executes the setTimeout callback.

Case 3

setInterval(() => {
  console.log('setInterval')
}, 100)

process.nextTick(function tick () {
  process.nextTick(tick)
})

Result: setInterval is never printed.

The process.nextTick calls recurse indefinitely, which keeps the event loop stuck processing the nextTick queue, so the macrotask callbacks in the other phases of the event loop never get a chance to execute. The usual solution is to use setImmediate instead of process.nextTick, as follows:

setInterval(() => {
  console.log('setInterval')
}, 100)

setImmediate(function immediate () {
  setImmediate(immediate)
})

Result: setInterval is printed every 100 ms.

Calling process.nextTick inside a nextTick callback registers the tick function at the end of the current nextTick queue, so that queue never finishes draining. Calling setImmediate inside a setImmediate callback registers the immediate function for the check phase of the next iteration of the event loop rather than the current one, which gives the other macrotasks on the event loop a chance to execute.

Case 4

setImmediate(() => {
  console.log('setImmediate1')
  setImmediate(() => {
    console.log('setImmediate2')
  })
  process.nextTick(() => {
    console.log('nextTick')
  })
})

setImmediate(() => {
  console.log('setImmediate3')
})

On Node versions below v11, the output is:

setImmediate1
setImmediate3
nextTick
setImmediate2

On Node v11 and later:

setImmediate1
nextTick
setImmediate3
setImmediate2

The reason is the same as in case one.

Case 5

setImmediate(() => {
  console.log(1)
  setTimeout(() => {
    console.log(2)
  }, 100)
  setImmediate(() => {
    console.log(3)
  })
  process.nextTick(() => {
    console.log(4)
  })
})
process.nextTick(() => {
  console.log(5)
  setTimeout(() => {
    console.log(6)
  }, 100)
  setImmediate(() => {
    console.log(7)
  })
  process.nextTick(() => {
    console.log(8)
  })
})
console.log(9)
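Case 5 can be traced by hand with the rules from the earlier sections. The sketch below is the same code with a hypothetical `log` helper that also records each number in a `seen` array. The order in the final comment is a hand trace that assumes Node v11 or later running a CommonJS script; it is not output quoted from the original article.

```javascript
const seen = [];
const log = (n) => {
  seen.push(n);
  console.log(n);
};

setImmediate(() => {
  log(1);
  setTimeout(() => log(2), 100);
  setImmediate(() => log(3));
  process.nextTick(() => log(4));
});
process.nextTick(() => {
  log(5);
  setTimeout(() => log(6), 100);
  setImmediate(() => log(7));
  process.nextTick(() => log(8));
});
log(9);
// Hand-traced order, assuming Node v11 or later: 9 5 8 1 4 7 3 6 2
```

Reading the trace: 9 comes from the main script; 5 and then 8 come from draining the nextTick queue before the event loop starts; in the check phase, 1 runs, its nextTick callback 4 runs right after it, and then 7 runs; 3 was registered during the check phase, so it waits for the next iteration; finally the two 100 ms timers fire in registration order, 6 and then 2. On versions below v11, by the reasoning of case 4, 7 would be expected before 4.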