background

Today, while reading up on hot module replacement (HMR) in Node.js, I was puzzled by how Nodemon restarts a service. The process is as follows:

1. Use the pstree plugin to get all child processes and kill them.
2. Stop the main process.
3. Start the service again (child_process.fork is preferred where possible, with child_process.spawn as the default fallback).
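Step 1 comes down to walking the process table downward from the main PID. Here is a minimal sketch. It assumes you already have a list of `{ pid, ppid }` rows, for example parsed from `ps -A -o pid=,ppid=` output. `collectDescendants` is a hypothetical helper and is not the pstree plugin's API.

```javascript
// Hypothetical helper: given rows from the process table, return every
// descendant PID of rootPid (children, grandchildren, ...), breadth-first.
function collectDescendants(rows, rootPid) {
  // Index children by parent PID once so each lookup is cheap
  const childrenOf = new Map();
  for (const { pid, ppid } of rows) {
    if (!childrenOf.has(ppid)) childrenOf.set(ppid, []);
    childrenOf.get(ppid).push(pid);
  }

  const result = [];
  const queue = [rootPid];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const child of childrenOf.get(current) || []) {
      result.push(child);
      queue.push(child); // visit the child's own children later
    }
  }
  return result;
}

// Example table: 100 is the main process, 101/102 its children, 103 a grandchild
const rows = [
  { pid: 100, ppid: 1 },
  { pid: 101, ppid: 100 },
  { pid: 102, ppid: 100 },
  { pid: 103, ppid: 101 },
  { pid: 200, ppid: 1 },
];
console.log(collectDescendants(rows, 100)); // [ 101, 102, 103 ]
```

Each returned PID could then be passed to `process.kill(pid)` before the main process itself is stopped.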

Do the child processes still exist after the main process is closed? With that question in mind, I dug into processes, child processes, and threads, and where each is used. This article has many sections. I recorded the content piecemeal and spent a day tidying it up, so I suggest working through the examples yourself.

Note: I use macOS, and the focus is on processes in Node.js applications.

Before we start

First, a quick word on the relevant Node.js APIs. This article also runs a lot of experiments, so it helps to know a few common Linux/macOS commands for debugging:

1. Find the process occupying a port: lsof -i:port
2. Check TCP port usage: netstat -anvp tcp
3. Check a process's status: top -pid pid
4. View a process's children: pstree -p pid
5. View a process's threads: ps -M pid
6. Kill a process: kill -9 pid (by PID) / pkill command (by name, e.g. pkill node kills all Node applications)

This article mainly uses four Node.js APIs: process, child_process, cluster, and worker_threads.

process

process provides the following:

1. It is an instance of EventEmitter, so you can listen to (or emit) the events of each phase of the process (beforeExit, exit, warning, rejectionHandled, etc.), e.g.:

process.on('exit', (code) => {
  console.log(code)
})

// process.emit() only triggers the listeners; the process keeps running
process.emit('exit', 'just emit, not exit') // logs: just emit, not exit
// process.exit() really ends the process (e.g. after handling an exception) and fires the listener
process.exit(1) // logs: 1

2. It exposes the startup arguments, e.g.:

// node index.js -x 3 -y 4
import { argv, argv0 } from 'process'

// Print the arguments; a library can also parse them into an object
console.log(argv, argv0)
// [ '/path/to/node', '/path/to/index.js', '-x', '3', '-y', '4' ] 'node'

3. It provides process information (pid, ppid, platform, etc.).
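You can try points 2 and 3 directly. In the small sketch below, `parseFlags` is a hypothetical helper and not a Node.js API. It only understands `-key value` pairs and bare switches, so a value that starts with `-`, such as a negative number, would be mistaken for a flag. Real argument-parsing libraries handle those cases. The printed process values depend on your machine.

```javascript
// Hypothetical helper: turn ['-x', '3', '-y', '4'] into { x: '3', y: '4' }
function parseFlags(args) {
  const result = {};
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (!arg.startsWith('-')) continue; // positional arguments are ignored in this sketch
    const key = arg.replace(/^-+/, ''); // accept both -x and --x
    const next = args[i + 1];
    if (next !== undefined && !next.startsWith('-')) {
      result[key] = next; // flag followed by a value
      i++;                // skip the value we just consumed
    } else {
      result[key] = true; // bare switch such as --verbose
    }
  }
  return result;
}

// argv[0] is the node binary and argv[1] the script path, so user flags start at index 2
const flags = parseFlags(process.argv.slice(2));

// Point 3: information about the current process
const info = {
  pid: process.pid,           // this process's ID
  ppid: process.ppid,         // the parent's ID (e.g. the shell or nodemon that started it)
  platform: process.platform, // 'darwin' on macOS, 'linux' on Linux
};

console.log(flags, info);
```

Running `node index.js -x 3 -y 4` prints `{ x: '3', y: '4' }` followed by the process information.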

child_process

child_process provides:

1. APIs that run a command or file: child_process.execFile() / child_process.exec().
2. child_process.fork(), which starts a new Node.js child process.
3. child_process.spawn(), which runs a command in a new process.
4. The returned ChildProcess is an EventEmitter with process management and information APIs (subprocess.kill(), subprocess.exitCode, subprocess.pid, etc.).

exec (execFile), fork, and spawn: all three APIs create new child processes, and exec (execFile) and fork are built on top of spawn.

Difference:

1. exec runs a command in a shell, while execFile runs an executable or script file directly without a shell. Neither needs to communicate with the parent process. Both accept a callback that receives (err, stdout, stderr), and the child exits once the command or script finishes.
2. fork() starts a new Node.js process running the given module and sets up an IPC channel with the parent (more on communication in another article). Unlike the POSIX fork, it does not copy the current process's memory.
3. spawn is used when neither of the above fits or meets the requirements. It is the most basic API, so in these specific scenarios it is less convenient than the dedicated wrappers. This is relative: if your implementation beats Node's, please send a PR.

Order of preference: for running commands or scripts, execFile -> exec -> spawn; for creating Node.js child processes, fork() -> spawn.

In short, it is like iterating over an array: there are many methods and the most basic is a for loop, but in a given scenario we should use the more efficient, more semantic API.
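The difference between the callback style of execFile() and the stream style of spawn() is easiest to see side by side. In this minimal sketch, both children are Node one-liners launched through `process.execPath`, so it runs anywhere Node does.

```javascript
const { execFile, spawn } = require('child_process');

// execFile: runs an executable directly (no shell); output is buffered and
// handed to the callback as (err, stdout, stderr) once the child has exited.
execFile(process.execPath, ['-e', 'console.log("hello from execFile")'], (err, stdout, stderr) => {
  if (err) throw err;
  console.log('execFile stdout:', stdout.trim()); // execFile stdout: hello from execFile
});

// spawn: the low-level building block; output arrives as a stream while the
// child runs, and the exit code comes with the 'close' event.
const child = spawn(process.execPath, ['-e', 'process.stdout.write("chunk"); process.exitCode = 3']);
let spawned = '';
child.stdout.on('data', (chunk) => {
  spawned += chunk;
});
child.on('close', (code) => {
  console.log(`spawn output: ${spawned}, exit code: ${code}`); // spawn output: chunk, exit code: 3
});
```

exec() would behave like execFile() but go through a shell, e.g. `exec('ls -l | wc -l', callback)`. That is handy for pipes, but the command string then needs careful escaping.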

cluster

cluster is Node's cluster management module. It is based on EventEmitter and provides fork(), isPrimary, isWorker, workers, etc. The example from the official docs:

import cluster from 'cluster';
import http from 'http';
import { cpus } from 'os';
import process from 'process';

const numCPUs = cpus().length;

// isMaster is the deprecated alias of isPrimary, kept for older Node versions
if (cluster.isPrimary || cluster.isMaster) {
  console.log(`Primary ${process.pid} is running`);

  // Fork workers.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker, code, signal) => {
    console.log(`worker ${worker.process.pid} died`);
  });
} else {
  // Workers can share any TCP connection
  // In this case it is an HTTP server
  http.createServer((req, res) => {
    res.writeHead(200);
    res.end('hello world\n');
  }).listen(8000);

  console.log(`Worker ${process.pid} started`);
}

This was confusing at first: isn't this creating n processes that all listen on the same port? The key is that cluster.fork() runs the same script again in a new worker process. There, isPrimary is false and isWorker is true, so the workers take the else branch. For details, see the source analysis of the Node cluster module (reference 4). In short, the primary process listens on the port and, via IPC, hands new connections to the worker processes, which handle the connections and their data.
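On non-Windows platforms, the primary's default scheduling policy (cluster.SCHED_RR) hands new connections to workers in round-robin order. The toy sketch below models only that dispatch decision. `RoundRobin` is a hypothetical name, plain strings stand in for workers, and this is not Node's internal implementation.

```javascript
// Toy round-robin dispatcher: each new connection goes to the next worker in turn.
class RoundRobin {
  constructor(workers) {
    this.workers = workers;
    this.next = 0; // index of the worker that gets the next connection
  }

  pick() {
    const worker = this.workers[this.next];
    this.next = (this.next + 1) % this.workers.length; // wrap around after the last worker
    return worker;
  }
}

const dispatcher = new RoundRobin(['worker-1', 'worker-2', 'worker-3']);
const assignments = ['conn-a', 'conn-b', 'conn-c', 'conn-d'].map(
  (conn) => `${conn} -> ${dispatcher.pick()}`
);
console.log(assignments.join('\n'));
// conn-a -> worker-1
// conn-b -> worker-2
// conn-c -> worker-3
// conn-d -> worker-1
```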

worker_threads

worker_threads lets JavaScript create new threads that run tasks in parallel. It provides:

1. APIs for thread information (isMainThread, parentPort, threadId, etc.).
2. MessageChannel (MessagePort), for communication between threads.
3. The Worker class, based on EventEmitter, with methods for managing threads (starting and stopping them), e.g.:

// Start a new thread that runs file
new Worker(file)
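You can try items 1 and 2 without starting a worker, because a MessageChannel also works within a single thread. Here is a minimal sketch:

```javascript
const { MessageChannel, isMainThread, threadId } = require('worker_threads');

// In the main thread, isMainThread is true and threadId is 0
console.log(isMainThread, threadId); // true 0

// A channel has two linked ports: whatever is posted on one arrives on the other
const { port1, port2 } = new MessageChannel();
port2.on('message', (msg) => {
  console.log('port2 received:', msg);
  port2.close(); // closing one side closes the channel, so the process can exit
});
port1.postMessage({ hello: 'world' }); // the object is cloned, not shared
```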

process

Concept: a process is a running instance of a program operating on a data set. It is the basic unit of resource allocation and scheduling, and the foundation of the operating system's structure. The concept is abstract, so I think of a process as an executing program that occupies some resources. In Node.js, it is the program that runs our code through node, e.g.:

import Koa from 'koa'

const app = new Koa()

app.use((ctx, next) => {
  ctx.body = 'hello world'
})
app.listen(3002)

You can find the process's PID from its port, and then query its running status from the PID, e.g.:

lsof -i:3002 
//COMMAND   PID USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
//node    82410   vb   23u  IPv6 *********      0t0  TCP *:exlm-agent (LISTEN)
top -pid 82410
// PID        COMMAND     %CPU     TIME         #TH      #WQ      #PORTS   MEM      PURG     CMPRS    PGRP     PPID     STATE        BOOSTS        %CPU_ME     %CPU_OTHRS     UID      FAULTS    COW      MSGSENT  MSGRECV  SYSBSD    SYSMACH  CSW      PAGEINS 
// 15715      node         0.0      00:02.36     8        0        30       106M     0B       102M     4084     1        sleeping     *0[1]         0.00000     0.00000        502      64987     629      106      47       12045     351      2730     0         

The top command shows a process's resource usage. The main indicators are memory usage, CPU usage, and state. An HTTP service is a long-running, daemon-like process that only does work when a request comes in, so its state is sleeping most of the time.
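You can read roughly the same indicators from inside Node.js, which is handy for logging. In this small sketch, note that `rss` is the resident set size, which may not match top's MEM column exactly.

```javascript
// The same kind of indicators top shows (memory and CPU), read from inside the process
const mem = process.memoryUsage(); // values in bytes
const cpu = process.cpuUsage();    // CPU time in microseconds since the process started

const toMB = (bytes) => (bytes / 1024 / 1024).toFixed(1);
console.log(`pid ${process.pid}: rss ${toMB(mem.rss)} MB, heapUsed ${toMB(mem.heapUsed)} MB`);
console.log(`cpu: user ${(cpu.user / 1000).toFixed(1)} ms, system ${(cpu.system / 1000).toFixed(1)} ms`);
```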

Process management

There are many excellent Node.js process management tools (such as PM2, Nodemon, and Forever), so managing processes by hand is rarely necessary. These tools mainly provide: 1. process management (start, stop, restart); 2. process daemons (watching for crashes and restarting automatically); 3. multiple processes; 4. load balancing; 5. log management. Features other than process management are not covered here. To understand Node.js processes better, we can try implementing the process management code by hand.

Kill/start/restart/hot restart the process

1. Kill the process

process.exit(code) // code is passed to the 'exit' listeners

2. Start the process

child_process.fork(modulePath) // modulePath: the service's entry file

3. Restart the process

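// Fork the new process first, then exit the current one
child_process.fork(modulePath) // modulePath: the service's entry file
process.exit(code) // code is passed to the 'exit' listeners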

Note: why this execution order works is explained in the problem record below.
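Besides a process exiting on its own, a parent can kill a child from the outside, which is what Nodemon does in its first step. The minimal sketch below uses subprocess.kill(). The child is a Node one-liner that would otherwise run for 10 seconds.

```javascript
const { spawn } = require('child_process');

// A child that would otherwise stay alive for 10 seconds
const child = spawn(process.execPath, ['-e', 'setTimeout(() => {}, 10000)']);

child.on('exit', (code, signal) => {
  // When a process is killed by a signal, code is null and signal holds the signal's name
  console.log(`child ${child.pid} exited: code=${code}, signal=${signal}`); // ... code=null, signal=SIGTERM
});

// Same as running `kill <pid>` in a shell; `kill -9` corresponds to 'SIGKILL'
child.kill('SIGTERM');
```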

Hot restart (rolling release)

For a Node service deployed on a single machine, hot restart is usually done as a rolling release: the workers are restarted one at a time. Each restart takes the following steps:

// 1. Tell the primary to stop handing new connections to this worker
worker.disconnect()
// 2. Wait about 10 s (your choice, usually based on the connection timeout, so in-flight tasks aren't cut off)
await sleep(10000) // sleep: a promise-based delay, e.g. setTimeout from 'timers/promises'
worker.kill()
// 3. Start a new worker to replace it
cluster.fork()

For details, refer to the source code.
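The three steps can be strung together so that each worker is handled in turn. The minimal sketch below assumes a hypothetical `ops` adapter; with cluster it would wrap `worker.disconnect()`, `worker.kill()` and `cluster.fork()`. Fake operations are used here so the order of the steps is easy to see, and the grace period is 10 ms instead of 10 s.

```javascript
// Promise-based delay: the `sleep` from step 2
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Restart workers one at a time so the remaining ones keep serving traffic.
// `ops` is a hypothetical adapter around the real cluster calls.
async function rollingRestart(workers, ops, graceMs) {
  const replacements = [];
  for (const worker of workers) {
    await ops.disconnect(worker);        // 1. stop routing new connections to it
    await sleep(graceMs);                // 2. give in-flight requests time to finish
    await ops.kill(worker);              // 3. shut it down...
    replacements.push(await ops.fork()); //    ...and start a replacement
  }
  return replacements;
}

// Fake adapter that only records the calls, to make the order visible
const log = [];
let nextId = 3;
const fakeOps = {
  disconnect: async (w) => { log.push(`disconnect ${w}`); },
  kill: async (w) => { log.push(`kill ${w}`); },
  fork: async () => {
    const w = `w${nextId++}`;
    log.push(`fork ${w}`);
    return w;
  },
};

const restarted = rollingRestart(['w1', 'w2'], fakeOps, 10).then((fresh) => {
  // One per line: disconnect w1, kill w1, fork w3, disconnect w2, kill w2, fork w4
  console.log(log.join('\n'));
  return fresh;
});
```

This follows the order listed in the text. A common variation is to fork the replacement first and wait for its 'listening' event before disconnecting the old worker, so capacity never drops.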

thread

A thread is the smallest unit of execution that an operating system can schedule (from Wikipedia). My own understanding is that a thread is the task scheduling unit inside a process, and each process schedules its tasks according to a specific algorithm. As we all know, JavaScript is single-threaded: functions are pushed onto and popped off the call stack, and the event loop feeds it asynchronous tasks from a queue. But is a Node.js process really single-threaded? Let's use a simple example:

import Koa from 'koa'

const app = new Koa()
app.use((ctx, next) => {
  ctx.body = 'hello world'
})
app.listen(3002)

// In a shell: get the PID from the port
lsof -i:3002
// Get the thread information from the PID
ps -M 45954
// USER   PID   TT  %CPU STAT PRI     STIME     UTIME COMMAND
// vb   45954 s012   0.0 S    31T   0:00.03   0:00.11 node index.js
//      45954        0.0 S    31T   0:00.00   0:00.00
//      45954        0.0 S    31T   0:00.00   0:00.01
//      45954        0.0 S    31T   0:00.00   0:00.00
// ... (further lines for the same PID, one per thread)

As you can see, a single Node.js process runs several threads, which V8 and libuv use internally, but we can't use them directly from our code. If there is a heavy computation, can we start another thread for it so it doesn't block requests? Yes: Node.js provides the worker_threads API for exactly that, e.g.:

// sum.js
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
  // Main thread: export a function that runs this same file in a worker
  module.exports = function sumAsync(n) {
    return new Promise((resolve, reject) => {
      const worker = new Worker(__filename, { workerData: n });
      worker.on('message', resolve);
      worker.on('error', reject);
      worker.on('exit', (code) => {
        if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
      });
    });
  };
} else {
  // Worker thread: do the heavy computation and post the result back
  function sum(n) {
    let total = 0;
    for (let i = 0; i < n; i++) total += i;
    return total;
  }
  parentPort.postMessage(sum(workerData));
}

// main.js
const Koa = require('koa')
const sum = require('./sum')

const app = new Koa()
app.use(async (ctx, next) => {
  const result = await sum(1e8)
  ctx.body = `hello world ${result}`
})
app.listen(3002)

But there are a few problems:

1. Creating threads and communicating with them is tedious for developers.
2. Creating a thread each time is expensive, so a thread pool is needed to keep threads around.
3. A thread is destroyed automatically once its work is done. Keeping it alive requires something long-lived inside it, such as a message listener, e.g.:

parentPort.on('message', (data) => {
  console.log(data)
})

So in practice this is usually handled with a library; popular ones include piscina and threads.

The thread pool

Process pools, thread pools, and connection pools all share the same design. To avoid the cost of creating resources on demand, several resources are created in advance and a task queue is set up. Adding a task triggers the queue, and available resources keep taking tasks from it.

The main flow is:

1. Initialize the thread pool (1 thread by default).
2. When a task comes in, wrap it in a Promise and put its resolve and reject into the queue as handles. The queue keeps being processed until all tasks are done.
3. When the task completes, notify the main thread of the result.

This is a simple version of a thread pool, and there are some points to note:

1. The pool is a single instance. If you need it in several places, mount it on a global variable or a globally accessible object and use it as a singleton.
2. Unlike new Worker(), which takes the path of the file to execute, this thread pool takes the function the thread should run: new Pool(function).

See the source code.
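Here is a minimal sketch of that flow using worker_threads directly. It assumes the pool receives a self-contained function (no closures or imports), because only the function's source text is sent to the threads. `SimplePool`, `run()` and `destroy()` are hypothetical names. Instead of polling, the pool hands queued tasks to idle threads whenever a task is added or a thread finishes.

```javascript
const { Worker } = require('worker_threads');

// Code every pooled thread runs. The open message listener keeps the thread alive
// between tasks; each message carries the function source and one argument.
const WORKER_SOURCE = `
  const { parentPort } = require('worker_threads');
  let fn = null;
  parentPort.on('message', ({ fnSource, arg }) => {
    if (!fn) fn = new Function('return (' + fnSource + ')')(); // rebuild the function once
    parentPort.postMessage(fn(arg));
  });
`;

class SimplePool {
  constructor(fn, size = 1) { // 1 thread by default
    this.fnSource = fn.toString();
    this.queue = [];   // waiting tasks: { arg, resolve, reject }
    this.idle = [];    // threads with nothing to do
    this.workers = [];
    for (let i = 0; i < size; i++) {
      const worker = new Worker(WORKER_SOURCE, { eval: true });
      this.workers.push(worker);
      this.idle.push(worker);
    }
  }

  // Wrap the task in a Promise and queue its resolve/reject handles
  run(arg) {
    return new Promise((resolve, reject) => {
      this.queue.push({ arg, resolve, reject });
      this.drain();
    });
  }

  // Hand queued tasks to idle threads until one of the two runs out
  drain() {
    while (this.queue.length > 0 && this.idle.length > 0) {
      const worker = this.idle.pop();
      const task = this.queue.shift();
      const cleanup = () => {
        worker.off('message', onMessage);
        worker.off('error', onError);
      };
      const onMessage = (result) => {
        cleanup();
        this.idle.push(worker); // the thread is free again
        task.resolve(result);
        this.drain();
      };
      const onError = (err) => {
        cleanup(); // a thread that threw has exited, so it is not reused
        task.reject(err);
      };
      worker.on('message', onMessage);
      worker.on('error', onError);
      worker.postMessage({ fnSource: this.fnSource, arg: task.arg });
    }
  }

  destroy() {
    return Promise.all(this.workers.map((worker) => worker.terminate()));
  }
}

// Usage: CPU-heavy sums run on 2 threads without blocking the main thread
const pool = new SimplePool((n) => {
  let total = 0;
  for (let i = 1; i <= n; i++) total += i;
  return total;
}, 2);

const sums = Promise.all([pool.run(10), pool.run(100), pool.run(4)]);
sums.then((results) => {
  console.log(results); // [ 55, 5050, 10 ]
  return pool.destroy(); // terminate the threads so the process can exit
});
```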

The problem record

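1. Does killing the parent process also kill all of its children? Not necessarily, which is why Nodemon kills the whole process tree before stopping the main process. According to the Node.js docs, a child created with fork()/spawn() may keep running after the parent exits, whether or not it is detached. A surviving child is re-parented to the system's init process (launchd on macOS). The detached option (default false) controls something narrower. On non-Windows platforms, detached: true makes the child the leader of a new process group and session, so signals sent to the parent's process group (for example Ctrl+C in the terminal) no longer reach it.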

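2. When restarting with fork() followed by exit() on the same port, how is the order guaranteed, so that the new process doesn't find the port still occupied? fork() returns as soon as the child is spawned and the child then boots asynchronously, whereas process.exit() runs synchronously. The child needs time to start up before it calls listen(), so in practice the parent has already exited and released the port by then. Strictly speaking, this is a race rather than a guarantee.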

3. How do you run TypeScript files with child_process.fork(), child_process.exec(), worker_threads, etc.? Register ts-node in the new process or thread with its register() method, e.g. for worker_threads:

import { WorkerOptions, Worker } from 'worker_threads'

// Run a .ts file in a worker by registering ts-node inside the worker first
const workerTs = (file: string, wkOpts: WorkerOptions) => {
  wkOpts.eval = true;
  if (!wkOpts.workerData) {
    wkOpts.workerData = {};
  }
  wkOpts.workerData.__filename = file;
  return new Worker(
    `
    const wk = require('worker_threads');
    require('ts-node').register();
    let file = wk.workerData.__filename;
    delete wk.workerData.__filename;
    require(file);
    `,
    wkOpts
  );
}
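Back to question 1: when a child should deliberately outlive its parent, the Node.js docs pair `detached: true` with `stdio: 'ignore'` and `subprocess.unref()`. In this minimal sketch, the child is a Node one-liner that simply waits one second so you can watch it outlive the parent.

```javascript
const { spawn } = require('child_process');

// The child just waits 1 second, long enough to outlive the parent
const child = spawn(process.execPath, ['-e', 'setTimeout(() => {}, 1000)'], {
  detached: true,  // POSIX: the child leads a new process group and session
  stdio: 'ignore', // don't keep pipes to the parent open
});

child.unref(); // the parent's event loop no longer waits for this child
console.log(`parent ${process.pid} exits; detached child ${child.pid} keeps running`);
```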

Reference documentation

1. ps command: ss64.com/osx/ps.html
2. Node.js Child Processes: Everything you need to know: www.freecodecamp.org/news/node-j…
3. How does cluster enable multiple processes, and can a port be listened on by multiple processes?: juejin.cn/post/691145…
4. Analyzing the Node Cluster module from the source: juejin.cn/post/684490…
5. A Complete Guide to Threads in Node.js: blog.logrocket.com/a-complete-…