When it comes to Node.js, most front-end engineers think of it as a way to build servers: master JavaScript and you can become a full-stack engineer. But Node.js is more than that.
Many high-level languages can reach down to the operating system, while JavaScript in the browser runs in a sandbox that locks front-end engineers into an ivory tower of the programming world. Node.js removes that barrier, letting front-end engineers reach the lower levels of the computer.
So the significance of Node.js is not only that it gives front-end engineers full-stack development capability; more importantly, it opens a door to the underlying world of computers. This article opens that door by analyzing how Node.js is implemented.
Node.js source code structure
The Node.js source repository has more than a dozen dependencies under the /deps directory, including both C/C++ modules (such as libuv and V8) and JavaScript modules (such as acorn and acorn-plugins), as shown in the figure below.
- acorn: a lightweight JavaScript parser, written in JavaScript.
- acorn-plugins: extensions that allow acorn to parse newer ES features, such as class declarations.
- brotli: the Brotli compression algorithm, written in C.
- cares (properly "c-ares"): written in C; handles asynchronous DNS requests.
- histogram: written in C; implements histogram generation.
- icu-small: the International Components for Unicode (ICU) library, written in C/C++ and trimmed down for Node.js; provides functions for manipulating Unicode.
- llhttp: a lightweight HTTP parser, written in C.
- nghttp2 / nghttp3 / ngtcp2: implement the HTTP/2, HTTP/3, and QUIC protocols.
- node-inspect: enables Node.js to support CLI debugging.
- npm: the Node.js package manager, written in JavaScript.
- openssl: written in C; the cryptography library used by the tls and crypto modules.
- uv (libuv): written in C; uses non-blocking I/O operations to give Node.js access to system resources.
- uvwasi: written in C; implements the WASI system call API.
- v8: the JavaScript engine, written in C++.
- zlib: used for fast compression; Node.js uses zlib to create synchronous, asynchronous, and streaming compression and decompression interfaces.
The most important of these are the modules in the v8 and uv directories. V8 itself has no built-in asynchrony; in the browser, asynchronous behavior is implemented with the help of other browser threads. That is why JS is often said to be single-threaded: the engine itself only parses and executes code synchronously. In Node.js, asynchrony relies mainly on libuv, so let's focus on how libuv works.
What is libuv
Libuv is a multi-platform asynchronous I/O library written in C. It mainly solves the problem that I/O operations can easily cause blocking. It was originally developed specifically for Node.js, but has since been adopted by other projects such as Luvit, Julia, and pyuv. Below is a diagram of libuv's structure.
Libuv implements asynchrony in two different ways, shown in the yellow boxes on the left and right of the diagram.
The left part is the network I/O module, which has a different implementation mechanism on each platform: Linux uses epoll, OSX and other BSD systems use kqueue, SunOS uses event ports, and Windows uses IOCP. Because this involves low-level operating system APIs and is fairly complicated to understand, I will not go into it here.
The right side includes the file I/O module, the DNS module, and user code, which perform asynchronous operations via a thread pool. Unlike network I/O, for file I/O libuv does not rely on the system's underlying non-blocking APIs; instead it performs blocking file I/O operations on a global thread pool.
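The size of this thread pool defaults to 4 in libuv and can be raised via the documented UV_THREADPOOL_SIZE environment variable. A minimal example (`app.js` is a placeholder for your own entry file):

```shell
# Increase the libuv thread pool from the default 4 threads to 8.
# Useful when many fs / dns.lookup / crypto.pbkdf2 calls run concurrently,
# since they all share this one pool.
UV_THREADPOOL_SIZE=8 node app.js
```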
The event loop in libuv
The following figure is the event loop flow chart provided on libuv's official website. Let's analyze it together with the code.
The core of libuv's event loop is implemented in the uv_run() function; below is part of its core code on Unix systems. Although it is written in C, C is a high-level language just like JavaScript, so it is not too difficult to follow. The biggest differences are probably the asterisks and the arrows, which we can mostly ignore: the parameter uv_loop_t* loop can be read as a variable loop of type uv_loop_t, and the arrow -> can be read as a dot, so loop->stop_flag can be read as loop.stop_flag.
```c
int uv_run(uv_loop_t* loop, uv_run_mode mode) {
  ...
  r = uv__loop_alive(loop);
  if (!r)
    uv__update_time(loop);

  while (r != 0 && loop->stop_flag == 0) {
    uv__update_time(loop);
    uv__run_timers(loop);
    ran_pending = uv__run_pending(loop);
    uv__run_idle(loop);
    uv__run_prepare(loop);
    ...
    uv__io_poll(loop, timeout);
    uv__run_check(loop);
    uv__run_closing_handles(loop);
    ...
  }
  ...
}
```
uv__loop_alive
This function determines whether the event loop should continue; it returns 0, and the loop exits, when there are no active tasks in the loop object.
In libuv these "tasks" have technical names: they are represented by requests and handles, which stand for short-lived and long-lived operations respectively. The specific code is as follows:
```c
static int uv__loop_alive(const uv_loop_t* loop) {
  return uv__has_active_handles(loop) ||
         uv__has_active_reqs(loop) ||
         loop->closing_handles != NULL;
}
```
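The same liveness check can be sketched in JavaScript. This is an illustrative model, not Node.js internals; the field names are invented for the sketch:

```javascript
// Model of uv__loop_alive: the loop keeps spinning while there are
// active handles, active requests, or handles still waiting to close.
function loopAlive(loop) {
  return loop.activeHandles > 0 ||
         loop.activeReqs > 0 ||
         loop.closingHandles !== null;
}
```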
uv__update_time
To reduce the number of time-related system calls, this function caches the current system time. The precision is high, down to nanoseconds, but the unit is still milliseconds.
The specific source code is as follows:
```c
UV_UNUSED(static void uv__update_time(uv_loop_t* loop)) {
  loop->time = uv__hrtime(UV_CLOCK_FAST) / 1000000;
}
```
uv__run_timers
Executes the callbacks of setTimeout() and setInterval() timers whose due time has been reached, by iterating in a for loop. As you can see from the code below, timer callbacks are stored in a min-heap data structure, and the loop exits when the heap is empty or the earliest due time has not been reached yet.
A timer is removed from the heap before its callback executes; if repeat is set, it is added back into the min heap, and then the timer callback runs.
The specific code is as follows:
```c
void uv__run_timers(uv_loop_t* loop) {
  struct heap_node* heap_node;
  uv_timer_t* handle;

  for (;;) {
    heap_node = heap_min(timer_heap(loop));
    if (heap_node == NULL)
      break;

    handle = container_of(heap_node, uv_timer_t, heap_node);
    if (handle->timeout > loop->time)
      break;

    uv_timer_stop(handle);
    uv_timer_again(handle);
    handle->timer_cb(handle);
  }
}
```
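To make the min-heap mechanism concrete, here is a simplified JavaScript sketch (not Node.js internals: a sorted array stands in for the heap, a number passed in stands in for the loop's cached time, and the re-arm rule for repeat is simplified):

```javascript
// Timers ordered by due time; expired ones run, repeating ones re-arm.
class TimerHeap {
  constructor() { this.timers = []; }
  insert(timer) {
    this.timers.push(timer);
    this.timers.sort((a, b) => a.timeout - b.timeout); // heap stand-in
  }
  min() { return this.timers[0] ?? null; }
  remove(timer) { this.timers = this.timers.filter(t => t !== timer); }
}

function runTimers(heap, now, fired) {
  for (;;) {
    const handle = heap.min();
    if (handle === null) break;      // heap empty
    if (handle.timeout > now) break; // earliest timer not due yet
    heap.remove(handle);             // uv_timer_stop
    if (handle.repeat) {             // uv_timer_again (simplified re-arm)
      heap.insert({ ...handle, timeout: handle.timeout + handle.repeat });
    }
    fired.push(handle.name);         // handle->timer_cb(handle)
  }
}
```

Running `runTimers` with a simulated "now" of 10 fires only the timers due at or before 10; later timers stay in the heap for the next loop iteration.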
uv__run_pending
Loops through the I/O callbacks stored in pending_queue: returns 0 if pending_queue is empty; otherwise executes the queued callbacks and returns 1.
The code is as follows:
```c
static int uv__run_pending(uv_loop_t* loop) {
  QUEUE* q;
  QUEUE pq;
  uv__io_t* w;

  if (QUEUE_EMPTY(&loop->pending_queue))
    return 0;

  QUEUE_MOVE(&loop->pending_queue, &pq);

  while (!QUEUE_EMPTY(&pq)) {
    q = QUEUE_HEAD(&pq);
    QUEUE_REMOVE(q);
    QUEUE_INIT(q);
    w = QUEUE_DATA(q, uv__io_t, pending_queue);
    w->cb(loop, w, POLLOUT);
  }

  return 1;
}
```
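The QUEUE_MOVE detail matters: the queue is detached as a whole before draining, so a callback that queues new work cannot extend the current turn. A hedged JavaScript sketch of the same logic (field names are invented for illustration):

```javascript
// Model of uv__run_pending: move the whole pending queue aside first,
// then drain it. Returns 0 when there was nothing to do, 1 otherwise.
function runPending(loop) {
  if (loop.pendingQueue.length === 0) return 0;
  const pq = loop.pendingQueue.splice(0); // QUEUE_MOVE: detach the queue
  for (const w of pq) {
    w.cb(loop, w); // invoke the stored I/O callback
  }
  return 1;
}
```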
uv__run_idle / uv__run_prepare / uv__run_check
All three functions are defined by a single macro function, UV_LOOP_WATCHER_DEFINE. A macro function can be understood as a code template — a function used to define functions. The macro is invoked three times with the name values prepare, check, and idle, which defines the uv__run_prepare, uv__run_check, and uv__run_idle functions.
Their logic is therefore the same: iterate through the loop->name##_handles queue in first-in, first-out order, take each object out, and execute its callback.
```c
#define UV_LOOP_WATCHER_DEFINE(name, type)                \
  void uv__run_##name(uv_loop_t* loop) {                  \
    uv_##name##_t* h;                                     \
    QUEUE queue;                                          \
    QUEUE* q;                                             \
    QUEUE_MOVE(&loop->name##_handles, &queue);            \
    while (!QUEUE_EMPTY(&queue)) {                        \
      q = QUEUE_HEAD(&queue);                             \
      h = QUEUE_DATA(q, uv_##name##_t, queue);            \
      QUEUE_REMOVE(q);                                    \
      QUEUE_INSERT_TAIL(&loop->name##_handles, q);        \
      h->name##_cb(h);                                    \
    }                                                     \
  }

UV_LOOP_WATCHER_DEFINE(prepare, PREPARE)
UV_LOOP_WATCHER_DEFINE(check, CHECK)
UV_LOOP_WATCHER_DEFINE(idle, IDLE)
```
uv__io_poll
uv__io_poll is mainly used to poll for I/O operations. The concrete implementation differs by operating system; here we take Linux as an example.
The source of uv__io_poll is fairly long; its core is two loops. Part of the code is as follows:
```c
void uv__io_poll(uv_loop_t* loop, int timeout) {
  ...
  while (!QUEUE_EMPTY(&loop->watcher_queue)) {
    q = QUEUE_HEAD(&loop->watcher_queue);
    QUEUE_REMOVE(q);
    QUEUE_INIT(q);

    w = QUEUE_DATA(q, uv__io_t, watcher_queue);
    e.events = w->pevents;
    e.data.fd = w->fd;

    if (w->events == 0)
      op = EPOLL_CTL_ADD;
    else
      op = EPOLL_CTL_MOD;

    if (epoll_ctl(loop->backend_fd, op, w->fd, &e)) {
      if (errno != EEXIST)
        abort();
      if (epoll_ctl(loop->backend_fd, EPOLL_CTL_MOD, w->fd, &e))
        abort();
    }

    w->events = w->pevents;
  }

  for (;;) {
    ...
    for (i = 0; i < nfds; i++) {
      pe = events + i;
      fd = pe->data.fd;
      w = loop->watchers[fd];

      pe->events &= w->pevents | POLLERR | POLLHUP;
      if (pe->events == POLLERR || pe->events == POLLHUP)
        pe->events |= w->pevents & (POLLIN | POLLOUT | UV__POLLRDHUP | UV__POLLPRI);

      if (pe->events != 0) {
        if (w == &loop->signal_io_watcher)
          have_signals = 1;
        else
          w->cb(loop, w, pe->events);
        nevents++;
      }
    }

    if (have_signals != 0)
      loop->signal_io_watcher.cb(loop, &loop->signal_io_watcher, POLLIN);
    ...
  }
  ...
}
```
In the while loop, the watcher queue watcher_queue is traversed: each watcher's events and file descriptor are copied into the epoll event object e, and epoll_ctl is called to register or modify the epoll event.
In the for loop, the file descriptors that epoll reports as ready are fetched (their count is nfds), and the code then iterates over those nfds events and executes the corresponding callbacks.
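The dispatch half of that inner loop can be modeled in JavaScript. This sketch uses illustrative bit-flag values and invented field names; it shows the key idea that only the event bits a watcher asked for (plus error/hangup, which are always reported) reach its callback:

```javascript
// Flags mirror poll(2) constants; the numeric values here are illustrative.
const POLLIN = 0x1, POLLOUT = 0x4, POLLERR = 0x8, POLLHUP = 0x10;

// For each ready event, look up its watcher by fd, mask the event bits,
// and invoke the callback if anything interesting remains.
function dispatchEvents(watchers, readyEvents) {
  let nevents = 0;
  for (const pe of readyEvents) {
    const w = watchers[pe.fd];
    const events = pe.events & (w.pevents | POLLERR | POLLHUP);
    if (events !== 0) {
      w.cb(pe.fd, events);
      nevents++;
    }
  }
  return nevents;
}
```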
uv__run_closing_handles
Iterates over the queue of handles waiting to be closed, closes each handle (such as stream, TCP, and UDP handles), and then calls the handle's corresponding close_cb. The code is as follows:
```c
static void uv__run_closing_handles(uv_loop_t* loop) {
  uv_handle_t* p;
  uv_handle_t* q;

  p = loop->closing_handles;
  loop->closing_handles = NULL;

  while (p) {
    q = p->next_closing;
    uv__finish_close(p);
    p = q;
  }
}
```
process.nextTick and Promise
Although process.nextTick and Promise are both asynchronous APIs, they are not part of the event loop; each has its own task queue, which is executed after each step of the event loop. So be careful when using these two APIs: if the callback you pass in runs long tasks or recurses, the event loop will be blocked and I/O operations will be starved.
The following code is an example where the fs.readFile callback never gets to execute because process.nextTick is called recursively:

```javascript
fs.readFile('config.json', (err, data) => {
  // ...
})

const traverse = () => {
  process.nextTick(traverse)
}
traverse()
```
To solve this problem, use setImmediate instead: setImmediate's callback queue is executed as part of the event loop itself, so each loop iteration still gets to run its other phases. Note also that the process.nextTick task queue has a higher priority than the Promise task queue, as the following Node.js internal code shows:
```javascript
function processTicksAndRejections() {
  let tock;
  do {
    while ((tock = queue.shift()) !== null) {
      const asyncId = tock[async_id_symbol];
      emitBefore(asyncId, tock[trigger_async_id_symbol], tock);
      try {
        const callback = tock.callback;
        if (tock.args === undefined) {
          callback();
        } else {
          const args = tock.args;
          switch (args.length) {
            case 1: callback(args[0]); break;
            case 2: callback(args[0], args[1]); break;
            case 3: callback(args[0], args[1], args[2]); break;
            case 4: callback(args[0], args[1], args[2], args[3]); break;
            default: callback(...args);
          }
        }
      } finally {
        if (destroyHooksExist()) emitDestroy(asyncId);
      }
      emitAfter(asyncId);
    }
    runMicrotasks();
  } while (!queue.isEmpty() || processPromiseRejections());
  setHasTickScheduled(false);
  setHasRejectionToWarn(false);
}
```
From the processTicksAndRejections() function you can see that the inner while loop first drains the callbacks in queue — the queue that process.nextTick appends to — and only after that loop completes is runMicrotasks() called to execute the Promise callbacks.
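That scheduling order can be modeled synchronously. The sketch below is a simplified stand-in for the internals above (plain arrays instead of Node's queues): the tick queue is drained fully — including ticks queued mid-drain — before any microtask runs:

```javascript
// Drain the nextTick queue completely, then run microtasks; repeat in
// case a microtask scheduled more ticks. Returns the log for inspection.
function drainTicksAndMicrotasks(tickQueue, microtaskQueue, log) {
  do {
    let tock;
    while ((tock = tickQueue.shift()) !== undefined) {
      tock(); // nextTick callbacks first, even ones queued mid-drain
    }
    let micro;
    while ((micro = microtaskQueue.shift()) !== undefined) {
      micro(); // then Promise (microtask) callbacks
    }
  } while (tickQueue.length > 0);
  return log;
}
```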
Conclusion
The structure of libuv, the core of Node.js, can be divided into two parts. One is network I/O, whose underlying implementation relies on different system APIs depending on the operating system; the other covers file I/O, DNS, and user code, which are handled by the thread pool.
Libuv's core mechanism for handling asynchronous operations is the event loop, which is divided into several phases, each of which roughly iterates over a queue and executes its callbacks.
process.nextTick and Promise are not part of the event loop and can block it when used improperly; setImmediate is a common way to avoid this problem.
Public account "Grace Front End"
Lighting the way for every aspiring front-end engineer and helping build their dreams.
Just to be an excellent front-end engineer!