This is the 15th day of my participation in Gwen Challenge
Node follows the single-threaded, single-process model. Single-threaded node means that the JS engine has only one instance and executes in the main thread of nodeJS. Meanwhile, Node handles asynchronous operations such as I/O in an event-driven manner. Node’s single-threaded mode, which maintains only one main thread, greatly reduces the overhead of switching between threads.
However, Node’s single thread prevents CPU intensive operations on the main thread that would otherwise block the main thread. For CPU-intensive operations, separate child processes can be created in Node using child_process. The parent and child processes communicate via IPC. The child processes can be external applications or Node subroutines.
In addition, node’s single thread, to run a single process, so can not use multi-core CPU and other resources, in order to schedule multi-core CPU and other resources, Node also provides a cluster module, using multi-core CPU resources, so that a string of node sub-process to handle load tasks, while ensuring a certain load balancing type. This article introduces the child_process and cluster modules from the perspective of node’s single thread and single process.
Single thread and single process in node
The first concept to understand is the single-threaded and single-process mode of Node. Node’s single-threaded approach reduces the overhead of switching between threads compared to multithreading in other languages and eliminates the need to worry about locks and thread pools when writing Node code. Node’s claimed single-threaded mode is better suited for IO intensive operations than other languages. So the classic question is:
Is Node really single-threaded?
When it comes to Node, we immediately think of single-threaded, asynchronous IO, event-driven, etc. The first thing to clarify is whether Node is really single-threaded, and if so, where asynchronous IO and timed events (setTimeout, setInterval, etc.) are executed.
Strictly speaking, Node is not single-threaded. Node has multiple threads, including:
- The thread that the js engine executes
- Timer thread (setTimeout, setInterval)
- Asynchronous HTTP threads (Ajax)
. A single threaded node is a node with only one JS engine running on the main thread. Other asynchronous IO and event-driven threads use Libuv to implement internal thread pooling and thread scheduling. Libv has an Event Loop that switches to achieve a multithreaded effect. Simply speaking, Event Loop is to maintain an execution stack and an Event queue. If asynchronous IO and timer functions are found in the current execution stack, these asynchronous callback functions will be put into the Event queue. After the current stack completes execution, the asynchronous callback functions in the event queue are executed in a certain order from the event queue.
In the figure above, the callback functions are executed in a certain order from the execution stack to the Event queue, and finally in the Event queue. The whole process is a simplified version of the Event Loop. In addition, when the callback function is executed, it also generates a stack of execution, and there may be nested asynchronous functions inside the callback function, that is, there is nesting of execution stack.
In other words, the single thread in Node means that the JS engine only runs on the main thread. Other asynchronous operations are also executed by independent threads. Libv Event Loop implements context switching and thread pool scheduling similar to multithreading. Threads are the smallest process, so Node is single-process. This explains why Node is single-threaded and single-process.
The child_process module in Node implements multiple processes
Node is a single process, there must be a problem, is the full use of CPU and other resources. Node provides the child_process module to implement child processes, thereby implementing a generalized multi-process pattern. Using the child_process module, you can implement the mode of one master process and multiple child processes. The master process is called the master process, and the child processes are also called worker processes. The child process can not only call other Node programs, but also execute non-Node programs, shell commands, etc. After executing the child process, it returns as a stream or callback.
API provided by the child_process module
Child_process provides four methods for creating child processes: spawn, execFile, exec, and fork. All methods are asynchronous, and a diagram can be used to illustrate the differences between the four methods
The figure above illustrates the differences between the four methods, and we can briefly describe the differences between the four methods.
- Spawn: a non-Node program executed in a child process that returns the results of the execution as a stream after providing a set of parameters.
- ExecFile: A non-Node program executed in a child process that, after providing a set of parameters, returns the result of the execution as a callback.
- Exec: A child process that executes a non-Node program, passing in a string of shell commands, the result of which is returned as a callback, with execFile
The difference is that exec can execute a string of shell commands directly.
- Fork: A child process executes a Node program. After providing a set of arguments, the result is returned as a stream. Unlike spawn, the child process can only execute node applications. The following sections describe these methods in detail.
ExecFile and exec
Let’s start by comparing the differences between execFile and exec. The similarities between the two methods are:
A non-Node application is executed and the result is returned as a callback function.
The differences are:
Exec is a shell command executed directly, while execFile is an application executed
Echo, for example, is a UNIX command that can be executed directly from the command line:
echo hello world
Copy the code
As a result, hello World is printed on the command line.
Create a new main.js file, if you want to use the exec method, then in this file write:
let cp=require('child_process');
cp.exec('echo hello world'.function(err,stdout){
console.log(stdout);
});
Copy the code
Execute main.js and the result will be Hello World. We find that the first parameter of exec is exactly like a shell command.
(2) Implement it through execFile
let cp=require('child_process');
cp.execFile('echo'['hello'.'world'].function(err,stdout){
console.log(stdout);
});
Copy the code
ExecFile is similar to executing an application named echo and passing in parameters. ExecFlie looks for an app named ‘echo’ in the PATH of process.env.PATH and executes it. The default process.env.PATH contains ‘usr/local/bin’, and the ‘usr/local/bin’ directory contains the program named ‘echo’, passing hello and world, and returning.
(3) Security analysis like exec, can directly execute a shell is extremely unsafe, such as the shell:
echo hello world; rm -rfCopy the code
Exec can be executed directly, rm -rf will delete files in the current directory. ExecFile, like the command line, is a very high level of execution that can lead to security issues. ExecFile is different:
execFile('echo'['hello'.'world'.'; rm -rf'])
Copy the code
When an argument is passed in, the security of the execution of the passed argument is checked, and an exception is thrown if there is a security problem. In addition to execFile, neither spawn nor fork can execute a shell directly, so security is high.
spawn
Spawn is also used to execute non-Node applications and cannot execute shells directly. In contrast to execFile, the result of spawn execution is not a one-time output after execution, but a stream output. For large volumes of data output, the use of memory can be introduced in the form of streams.
Let’s use the example of sorting and deduplicating a file:
In the diagram above, there are acBA unsorted words in the input. TXT file read first. Sorting function can be realized by sort program, and the output is Aabc. Finally, ABC can be obtained by uniq program. We can do this with spawn stream inputs and outputs:
let cp=require('child_process');
let cat=cp.spawn('cat'['input.txt']);
let sort=cp.spawn('sort');
let uniq=cp.spawn('uniq');
cat.stdout.pipe(sort.stdin);
sort.stdout.pipe(uniq.stdin);
uniq.stdout.pipe(process.stdout);
console.log(process.stdout);
Copy the code
After execution, the final result is entered into process.stdout. If the input. TXT file is large, the input and output in the form of streams can significantly reduce the memory footprint. By setting the buffer form, the memory footprint can be reduced and the input and output efficiency can be improved.
fork
In javascript, in terms of processing a large number of computational tasks, HTML is implemented through Web work, so that the task is separated from the main thread. Node uses a communication built in between parent and child processes to handle this problem, reducing the stress of running big data. The fork method is used to execute node programs in a separate process. The child process receives information from the parent process and returns the result to the parent process through communication between the parent process and the child process.
Using the fork method, you can open an IPC channel between the parent and child processes, allowing messages to be exchanged between different Node processes.
In the child process:
Messages are received and sent through the process.on(‘message’) and process.send() mechanisms.
In the parent process:
Messages are received and sent via child-on (‘message’) and process.send() mechanisms.
For a concrete example, in child.js:
process.on('message'.function(msg){
process.send(msg)
})
Copy the code
Js: in the parent.
let cp=require('child_process');
let child=cp.fork('./child');
child.on('message'.function(msg){
console.log('got a message is',msg);
});
child.send('hello world');
Copy the code
Got a message is hello world
Communication between a parent and child can be interrupted by calling in the parent process:
child.disconnect()
Copy the code
To disconnect IPC communication between father and son.
The child process that synchronizes execution
Exec, execFile, spawn, and fork execute child processes that are asynchronous by default and run without blocking the main process. In addition, the child_process module also provides execFileSync, spawnSync, and execSync to synchronously execute the child process.
Cluster module in node
Cluster means integration and integrates two aspects: child_process.fork and automatic load balancing after child processes are created on multi-core cpus.
Let’s take an example from the official website:
const cluster = require('cluster');
const http = require('http');
const numCPUs = require('os').cpus().length;
if (cluster.isMaster) {
console.log(` main process${process.pid}Running ');
// Spawn the work process.
for (let i = 0; i < numCPUs; i++) {
cluster.fork();
}
cluster.on('exit'.(worker, code, signal) = > {
console.log('Working process${worker.process.pid}Have withdrawn from `);
});
} else {
// Worker processes can share any TCP connection.
// In this case, an HTTP server is shared.
http.createServer((req, res) = > {
res.writeHead(200);
res.end('Hello world \n');
}).listen(8000);
console.log('Working process${process.pid}Has launched `);
}
Copy the code
The final output is:
$node server.js Main process3596The worker process is running4324The working process has been started4520The working process has been started6056The working process has been started5644Has startedCopy the code
We call the master process the master process, and the worker process the worker process. Using the Cluster module, Using the API, IPC channel and scheduler encapsulated by Node, it is very easy to create an architecture consisting of a master HTTP proxy server + multiple worker processes and multiple HTTP application servers.
conclusion
In this article, we first introduce the single-thread and single-process modes of Node. Then, we introduce how to implement child processes in Node from the single thread defects. We compare several different child-process generation schemes in child_process module. Finally, the built-in integration module Cluster which can realize load balancing of sub-process and CPU process is introduced.