Background
Today, while reading material on Node.js HMR, I was a little confused by the way Nodemon restarts a service. The process is as follows:
1. Use the pstree plug-in to obtain all child processes and shut them all down.
2. Stop the main process.
3. Start the service (child_process.fork is preferred, child_process.spawn is the default).
Will the child processes still exist after the main process is closed? With that question in mind, I took a deeper look at processes, child processes, and threads, and at their application scenarios. (This article has many sections. I recorded the material intermittently and spent a day tidying it up, so I suggest following along with the examples.)
Note: I use macOS, and this article mainly covers processes in Node.js applications.
Before we start
Before we start, let's go over the relevant Node.js interfaces. This article also involves a lot of testing, so it helps to know a few common Linux/macOS commands for debugging:
1. Check which process occupies a port: lsof -i:port
2. Check TCP port usage: netstat -anvp tcp
3. Check the status of a process: top -pid pid
4. View child processes: pstree -p pid
5. View threads: ps -M pid
6. Kill a process: kill -9 pid (kill a process by its PID) or pkill name (kill processes by name, e.g. pkill node kills all node applications)
This article mainly uses four modules provided by Node.js: process, child_process, cluster, and worker_threads.
process
process provides the following functions:
1. It is an instance of EventEmitter, so you can listen for and emit the events of each phase of the process (beforeExit, exit, warning, rejectionHandled, etc.) with on, once, and emit.
process.on('exit', (code) => { console.log(code) })
// Option A: end the process with process.exit(), which triggers the 'exit' event
process.exit(1) // 1
// Option B (instead of A): only emit the event to run the listeners without exiting
// (useful when handling some exceptions)
process.emit('exit', 'just emit, not exit') // just emit, not exit
2. Obtain startup parameters; e.g.
// node index.js -x 3 -y 4
// Print the parameters. You can also use a tool to parse them, e.g. argvs
console.log(process.argv, process.argv0)
// ['node', 'index.js', '-x', '3', '-y', '4'] 'node' (the first two entries are really absolute paths, shortened here)
3. It provides process information (pid, ppid, platform, etc.).
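To make items 2 and 3 more concrete, here is a minimal sketch that parses the startup parameters by hand and collects a few pieces of process information. parseFlags is a hypothetical helper written for this article, not a Node.js API, and it only handles the simple "-name value" form used above.

```javascript
// Hypothetical helper: turn ['-x', '3', '-y', '4'] into { x: '3', y: '4' }.
function parseFlags(args) {
  const flags = {};
  for (let i = 0; i < args.length; i++) {
    if (!args[i].startsWith('-')) continue;
    const name = args[i].replace(/^-+/, '');
    const next = args[i + 1];
    if (next === undefined || next.startsWith('-')) {
      // A flag with no value is treated as a boolean switch.
      flags[name] = true;
    } else {
      flags[name] = next;
      i++; // skip the value we just consumed
    }
  }
  return flags;
}

// argv[0] is the node binary and argv[1] the script, so skip both.
const flags = parseFlags(process.argv.slice(2));

// Process information exposed by the process module.
const info = {
  pid: process.pid,
  ppid: process.ppid,
  platform: process.platform,
};

console.log(flags, info);
```

Values stay strings here on purpose; converting '3' to a number is left to the caller, which is roughly what argument-parsing libraries do for you.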
child_process
child_process provides:
1. APIs for executing shell commands and files: child_process.execFile() / child_process.exec();
2. An API for starting a new Node.js child process: child_process.fork();
3. child_process.spawn(), which runs a command in a new process;
4. Based on EventEmitter, some process management and process information APIs (subprocess.kill(), subprocess.exitCode, subprocess.pid, etc.).
exec (execFile), fork, and spawn
All three APIs are used to create new child processes. exec (execFile) and fork are built on spawn.
Differences:
1. exec (execFile) runs a shell command or a script file and does not need to communicate with the parent process;
2. fork() starts a new Node.js child process, usually running a module of the existing program. Unlike the POSIX fork(), it does not clone the current process;
3. spawn is used when neither of the above fits the requirements.
In terms of implementation, running a command goes exec -> execFile -> spawn, and starting a new Node.js process goes fork() -> spawn().
spawn is the most basic API, but it can be less performant or less convenient than the specialized APIs in their specific scenarios (this is relative; if your implementation can do better than Node's, please submit a PR).
exec (execFile) accepts a callback and passes (err, stdout, stderr) to it.
fork() starts the new child process and sets up an IPC channel to it (more on communication in another article).
A child created by exec (execFile) exits once the shell command or script finishes.
In short, it is like the array methods: the most basic tool is the for loop, but in a specific scenario we should use the higher-performance, more semantic API.
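The differences above can be sketched side by side. This is only an illustration under a few assumptions: the child is a tiny inline script run with the current node binary (so no external command is needed), and the forked module is a hypothetical child.js written to a temp directory just for the demo.

```javascript
const { execFile, spawn, fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

const node = process.execPath; // the running node binary
const script = "console.log('hi from child')";

// execFile: output is buffered and handed to the (err, stdout, stderr) callback.
function viaExecFile() {
  return new Promise((resolve, reject) => {
    execFile(node, ['-e', script], (err, stdout, stderr) => {
      if (err) return reject(err);
      resolve(stdout.trim());
    });
  });
}

// spawn: the lowest level; output arrives as a stream, the exit code on 'close'.
function viaSpawn() {
  return new Promise((resolve, reject) => {
    const child = spawn(node, ['-e', script]);
    let out = '';
    child.stdout.on('data', (chunk) => { out += chunk; });
    child.on('error', reject);
    child.on('close', (code) => resolve({ code, out: out.trim() }));
  });
}

// fork: a new Node.js process running a module, with an IPC channel attached.
function viaFork() {
  // Hypothetical child module, created only for this demo.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'fork-demo-'));
  const file = path.join(dir, 'child.js');
  fs.writeFileSync(file, "process.on('message', (n) => process.send(n * 2));");
  return new Promise((resolve, reject) => {
    const child = fork(file, [], { execArgv: [] });
    child.on('error', reject);
    child.on('message', (answer) => {
      child.disconnect(); // closing the IPC channel lets the child exit
      resolve(answer);
    });
    child.send(21);
  });
}

viaExecFile().then((out) => console.log('execFile:', out));
viaSpawn().then((res) => console.log('spawn:', res));
viaFork().then((answer) => console.log('fork:', answer));
```

The point of the comparison is the shape of each API: a callback with buffered output for execFile, event streams for spawn, and message passing for fork.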
cluster
cluster is the cluster management interface provided by Node.js. Based on EventEmitter, it provides members such as fork, isPrimary, isWorker, and workers. The example from the official website is as follows:
import cluster from 'cluster';
import http from 'http';
import { cpus } from 'os';
import process from 'process';

const numCPUs = cpus().length;

// Compatible with the older cluster.isMaster
if (cluster.isPrimary || cluster.isMaster) {
  console.log(`Primary ${process.pid} is running`);

  // Fork workers.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker, code, signal) => {
    console.log(`worker ${worker.process.pid} died`);
  });
} else {
  // Workers can share any TCP connection
  // In this case it is an HTTP server
  http.createServer((req, res) => {
    res.writeHead(200);
    res.end('hello world\n');
  }).listen(8000);

  console.log(`Worker ${process.pid} started`);
}
This is a bit confusing at first: isn't it creating n processes that all listen on one port? In fact, cluster.fork() runs the same file again in a child process, where isPrimary is false and isWorker is true, so each child takes the else branch. For details, see "From the source analysis Node Cluster module" in the references. In simple terms, the main process listens on the port and, through IPC, hands new connections and their data to the child processes to handle.
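To picture how one listening process can feed several workers, here is a toy model of a round-robin dispatcher. It is only a conceptual sketch, not Node's actual implementation: the "workers" are plain ids, the "connections" are strings, and createDispatcher is a name made up for this example. Round-robin is, as far as I know, the default scheduling policy of cluster on non-Windows platforms.

```javascript
// Toy model of the primary's dispatch loop. Nothing here touches the real
// cluster module; it only shows the "take turns" idea.
function createDispatcher(workerIds) {
  let cursor = 0;
  return function dispatch(connection) {
    const workerId = workerIds[cursor];
    // Move to the next worker, wrapping around at the end of the list.
    cursor = (cursor + 1) % workerIds.length;
    return { workerId, connection };
  };
}

const dispatch = createDispatcher([1, 2, 3]);
const assigned = ['a', 'b', 'c', 'd'].map((conn) => dispatch(conn).workerId);
console.log(assigned); // [ 1, 2, 3, 1 ]
```

In the real module the hand-off happens over the IPC channel between the primary and each worker, which is why only one process ever binds the port.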
worker_threads
worker_threads allows JavaScript to create new threads that execute tasks in parallel. It provides:
1. APIs for obtaining thread information (isMainThread, parentPort, threadId, etc.);
2. MessageChannel (MessagePort), which provides methods for communication between threads;
3. The Worker class, based on EventEmitter, which provides methods for thread management (starting and stopping threads). e.g.
// Start a thread
new Worker(file)
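For item 2 in the list above, here is a minimal sketch of a MessageChannel. To keep it short and deterministic it stays inside one thread and reads the message synchronously with receiveMessageOnPort; in real use one of the two ports would normally be transferred to a Worker. The message contents are made up for the example.

```javascript
const { MessageChannel, receiveMessageOnPort } = require('worker_threads');

// A channel is a pair of connected ports: what goes into one comes out of the other.
const { port1, port2 } = new MessageChannel();

port1.postMessage({ task: 'sum', numbers: [1, 2, 3] });

// Synchronously take one pending message from the other end.
// Returns undefined when nothing is queued.
const received = receiveMessageOnPort(port2);
console.log(received.message);

// Close the channel so that it cannot keep anything alive.
port1.close();
```

The message is copied rather than shared, so changing the object after postMessage does not affect what the receiver sees.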
Process
Concept: a process is one running activity of a program on a data set. It is the basic unit of system resource allocation and scheduling, and the basis of the operating system's structure. The concept is abstract; I think of it as an executing program that occupies some resources. In Node.js, it is the program that runs our code through node, e.g.
import Koa from 'koa'
const app = new Koa()
app.use((ctx, next) => {
ctx.body = 'hello world'
})
app.listen(3002)
The PID of the process can be queried through the port. The running status of the process can be queried through the PID. e.g.
lsof -i:3002
//COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
//node 82410 vb 23u IPv6 ********* 0t0 TCP *:exlm-agent (LISTEN)
top -pid 82410
// PID COMMAND %CPU TIME #TH #WQ #PORTS MEM PURG CMPRS PGRP PPID STATE BOOSTS %CPU_ME %CPU_OTHRS UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH CSW PAGEINS
// 15715 node 0.0 00:02.36 8 0 30 106M 0B 102M 4084 1 sleeping *0[1] 0.00000 0.00000 502 64987 629 106 47 12045 351 2730 0
You can run the top command to view the resource usage of a process. The main indicators are memory usage, CPU usage, and state. (An HTTP service is a long-running process that only does work when a request comes in, so its state is sleeping by default.)
Process management
There are many excellent Node.js process management tools (such as PM2, Nodemon, Forever, etc.), so managing processes by hand is rarely necessary. These tools mainly provide the following functions:
1. Process management;
2. Process daemon (watching for exceptions and hot restart);
3. Multi-process;
4. Load balancing;
5. Log management.
Everything other than process management is not covered here. To learn more about Node.js processes, we can try to implement the process management code by hand.
Kill/start/restart/hot restart the process
1. Kill the process
process.exit(code) // the code is passed to 'exit' listeners
2. Start the process
child_process.fork(modulePath) // modulePath: the module to run in the new process
3. Restart the process
// fork() first, then exit()
child_process.fork(modulePath)
process.exit(code) // the code is passed to 'exit' listeners
Note: the order of execution is explained in the problem record at the end.
4. Hot restart (rolling release)
For Node.js services deployed on a single node, hot restart is generally implemented as a rolling release: the workers are restarted one at a time. The following steps need to be implemented:
// 1. Tell the main process to stop dispatching tasks to this worker (disconnect)
worker.disconnect()
// 2. Wait 10s (choose the time yourself, generally based on the connection timeout, to avoid terminating tasks in progress)
await sleep(10000) // sleep is a helper that resolves after the given milliseconds (not a built-in)
// 3. Shut down and restart the service
worker.kill()
cluster.fork()
For details, refer to the source code.
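The three steps can be put together roughly like this. It is a sketch under assumptions, not the source code referred to above: rollingRestart and sleep are names invented here, the cluster module is passed in as a parameter so the function stays easy to try out, and it assumes every worker starts a server (so that the 'listening' event fires).

```javascript
// Helper that resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Restart the workers one at a time so the others keep serving requests.
// Call it in the primary process, e.g. rollingRestart(require('cluster')).
async function rollingRestart(cluster, graceMs = 10000) {
  let restarted = 0;
  // Snapshot the current workers; replacements are added to cluster.workers later.
  for (const worker of Object.values(cluster.workers)) {
    // 1. Stop dispatching new connections to this worker.
    worker.disconnect();
    // 2. Give in-flight requests time to finish.
    await sleep(graceMs);
    // 3. Shut the old worker down (if still alive) and start a new one.
    if (!worker.isDead()) worker.kill();
    const replacement = cluster.fork();
    // Wait until the new worker is accepting connections before moving on.
    await new Promise((resolve) => replacement.once('listening', resolve));
    restarted++;
  }
  return restarted;
}
```

With a single worker this still leaves a short gap with nobody listening; forking the replacement before disconnecting the old worker would close that gap, at the cost of briefly running one extra process.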
Thread
A thread is the smallest unit that an operating system can schedule (from Wikipedia). My own understanding is that it is the task scheduling unit inside a process: each process schedules its tasks according to a specific algorithm. As we all know, JavaScript is single-threaded: it pushes and pops function calls on a call stack, together with the asynchronous task queues of the event loop. But is JavaScript really single-threaded? Let's look at a simple example:
import Koa from 'koa'
const app = new Koa()
app.use((ctx, next) => {
  ctx.body = 'hello world'
})
app.listen(3002)

// Obtain the PID through the port
lsof -i:3002
// Obtain thread information through the PID
ps -M pid
USER  PID    TT    %CPU  STAT  PRI  STIME     UTIME    COMMAND
vb    45954  s012  0.0   S     31T  05:00.03  0:00.11  node index.js
      45954        0.0   S     31T  0:00.00   0:00.00
      45954        0.0   S     31T  0:00 01
      45954        0.0   S     31T  0:00.00   0:00.01
      45954        0.0   S     31T  0:00      0:00.00
      45954        0.0   S     31T  0:00.00   0:00.00
      45954        0.0   S     31T  0:00
As you can see, a Node.js process has n threads running, but we cannot use these threads directly during development. If there is some complex calculation, can we start another thread to do it and avoid blocking requests? Yes: Node.js provides the worker_threads API for this, e.g.
// sum.js
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

// Stand-in for the expensive calculation (its body is not shown in the original)
function sum() {
  let total = 0;
  for (let i = 0; i < 1e8; i++) total += i;
  return total;
}

if (isMainThread) {
  // Main thread: export a function that runs this same file in a worker
  module.exports = function sumAsync(script) {
    return new Promise((resolve, reject) => {
      const worker = new Worker(__filename, { workerData: script });
      worker.on('message', resolve);
      worker.on('error', reject);
      worker.on('exit', (code) => {
        if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
      });
    });
  };
} else {
  // Worker thread: do the calculation and post the result back
  parentPort.postMessage(sum());
}

// main.js
const Koa = require('koa')
const app = new Koa()
const sum = require('./sum')

app.use(async (ctx, next) => {
  let result = await sum()
  ctx.body = `hello world ${result}`
})
app.listen(3002)
But there are a few problems:
1. Creating threads and communicating with them is tedious for developers;
2. Creating a thread is expensive each time, so a thread pool is needed to keep threads around;
3. A thread is destroyed automatically once its task is finished. To keep it alive for a long time, you can, for example, add a message listener in the thread so that it is not destroyed, e.g.
parentPort.on('message', (data) => { console.log(data) })
Therefore this is generally done through libraries; popular ones include piscina and threads.
Thread pool
Process pools, thread pools, connection pools and so on all share the same design: to avoid the cost of creation, several resources are created in advance and a queue is set up; adding a task triggers the queue, and idle resources are continually polled to take tasks. The main flow is as follows:
1. Initialize the thread pool (1 thread by default);
2. When a task comes in, wrap it in a Promise and put resolve and reject into the queue as handles (the queue keeps polling until all tasks are completed);
3. Notify the main thread of the result when the task is completed.
This is a simple version of a thread pool, and there are some points to note:
1. It is an instance. If it is needed in more than one place, mount it on a global variable or globally accessible object and use it as a singleton;
2. Unlike new Worker(), which accepts the path of the file to execute, this thread pool accepts the function that the thread should run: new pool(function).
See the source code.
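The flow above can be sketched in a few dozen lines. This is not the source code mentioned above, just a minimal illustration under stated assumptions: ThreadPool, run, drain and destroy are names chosen here; the task function is shipped to the workers as source text (so it must not use variables from the outer scope); and instead of polling, the queue is drained whenever a task is added or a worker becomes idle.

```javascript
const { Worker } = require('worker_threads');

class ThreadPool {
  constructor(fn, size = 1) {
    // Worker script: wait for an argument, run fn on it, post the outcome back.
    // The message listener also keeps each worker alive between tasks.
    const source = `
      const { parentPort } = require('worker_threads');
      const fn = (${fn.toString()});
      parentPort.on('message', async (arg) => {
        try {
          parentPort.postMessage({ ok: true, value: await fn(arg) });
        } catch (err) {
          parentPort.postMessage({ ok: false, error: String(err) });
        }
      });`;
    this.queue = [];
    this.workers = Array.from({ length: size }, () => new Worker(source, { eval: true }));
    this.idle = [...this.workers];
  }

  // Wrap the task in a Promise and queue its resolve/reject handles.
  run(arg) {
    return new Promise((resolve, reject) => {
      this.queue.push({ arg, resolve, reject });
      this.drain();
    });
  }

  // Hand queued tasks to idle workers until one of the two runs out.
  drain() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const { arg, resolve, reject } = this.queue.shift();
      worker.once('message', (result) => {
        this.idle.push(worker);
        if (result.ok) resolve(result.value);
        else reject(new Error(result.error));
        this.drain();
      });
      worker.postMessage(arg);
    }
  }

  // Terminate all workers so the process can exit.
  destroy() {
    return Promise.all(this.workers.map((worker) => worker.terminate()));
  }
}

// Usage: two threads, three tasks; the third waits in the queue.
const pool = new ThreadPool((n) => {
  let total = 0;
  for (let i = 1; i <= n; i++) total += i;
  return total;
}, 2);

Promise.all([pool.run(10), pool.run(100), pool.run(1000)])
  .then((results) => console.log(results)) // [ 55, 5050, 500500 ]
  .finally(() => pool.destroy());
```

A production pool would also need to handle a worker crashing mid-task ('error' and 'exit' events) and replace it, which is one reason libraries such as piscina are usually the better choice.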
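Problem record
1. Does killing the parent also kill all children? Not necessarily; this depends on detached. A process created through fork()/spawn() takes a detached creation option (default: false). With detached set to true, the child becomes the leader of its own process group and session (on non-Windows platforms), so it is not affected by signals sent to the parent's group and keeps running after the parent is killed, re-parented to the system (its PPID becomes 1). With the default false, the child stays in the parent's process group and receives signals sent to that group (for example Ctrl+C in the terminal), but killing only the parent's PID does not by itself guarantee that the child exits, which is presumably why Nodemon shuts down the child processes explicitly.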
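2. For a restart we call fork() and then exit(). If both processes use the same port, how do we make sure the order works out (i.e. that the port is not still occupied when the forked process starts listening)? fork() returns immediately while the new process starts up asynchronously, and exit() runs synchronously. Starting the new process takes longer than exiting, so in practice the old process has already released the port by the time the new one listens on it.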
3. How can child_process.fork(), child_process.exec(), worker_threads, etc. run TypeScript files? e.g. for worker_threads, this is handled with the register method of ts-node.
import { WorkerOptions, Worker } from 'worker_threads'

const workerTs = (file: string, wkOpts: WorkerOptions) => {
  wkOpts.eval = true;
  if (!wkOpts.workerData) {
    wkOpts.workerData = {};
  }
  wkOpts.workerData.__filename = file;
  return new Worker(`
      const wk = require('worker_threads');
      require('ts-node').register();
      let file = wk.workerData.__filename;
      delete wk.workerData.__filename;
      require(file);
    `,
    wkOpts
  );
}
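Going back to the first question in this list, the usual way to let a child outlive its parent looks roughly like this. It is a sketch, assuming a throwaway child (a node one-liner that just waits one second) in place of a real long-running job.

```javascript
const { spawn } = require('child_process');

// Hypothetical long-running job: a node one-liner that stays alive for 1s.
const child = spawn(process.execPath, ['-e', 'setTimeout(() => {}, 1000)'], {
  detached: true,  // own process group and session on non-Windows platforms
  stdio: 'ignore', // do not share the parent's stdio with the child
});

// Without unref() the parent's event loop would keep waiting for the child.
child.unref();

console.log(`parent ${process.pid} started detached child ${child.pid}`);
```

While it runs, pstree -p on the parent's PID and ps on the child's PID should show the child in its own process group, and after the parent exits the child's PPID changes.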
Reference documentation
1. ps command: ss64.com/osx/ps.html
2. Node.js Child Processes: Everything you need to know: www.freecodecamp.org/news/node-j…
3. How does cluster enable multiple processes and can a port be monitored by multiple processes?: juejin.cn/post/691145…
4. From the source analysis Node Cluster module: juejin.cn/post/684490…
5. A Complete Guide to Threads in Node.js: blog.logrocket.com/a-complete-…