An overview of the

In other languages, it is very easy to implement multi-process or multi-threading, whereas for a Node instance, it works on a single thread. However, Node can still implement multi-process, which is much more efficient in reading and writing than a single process. In my own test, I read and write more than 20000 files, a total of 2G data. Without using multiple processes, it takes about 3 minutes to complete the reading and writing, but using cluster, it takes about 40 seconds to complete the reading and writing.

Introduction of cluster

Cluster module can create a shared server port sub-process, mainly using the processor multi-core system, let the sub-process to deal with load tasks, in simple terms, the main process through the cluster to distribute tasks to sub-process, so that multiple tasks concurrent execution, which is very suitable in the process of reading and writing. The bottom layer of cluster is implemented by child_process, refer to this article for details

Simple code structure

Code logic:

  1. The main process creates child processes in a loop and sends variables to each child process
  2. The child process performs related operations after receiving the variables and sends a message to the main process to indicate that the operation is complete
  3. The main process receives the message from the subprocess, stops the subprocess, and displays the output of process xx
  4. All child processes are closed and code execution is complete.
var cluster = require("cluster");
// Get the number of cores in the current processor
var numCPUs = require("os").cpus().length;
// Check if it is the main process
if (cluster.isMaster) {
    // Initialize code, such as getting a directory structure under a folder
      let primes = [];
      for (let i = 0; i < numCPUs; i++) {
        // Start the child process
        const worker = cluster.fork(); 
        // In the main process, this sends a message to the child process
        // Assign write work
        worker.send({
            // Send variables
        });
      }
    
  The cluster module will trigger an 'exit' event when any worker process is shut down
  cluster.on("exit".function (worker, code, signal) {
    console.log("Process" + worker.process.pid + "The end");
  });
  When the main process receives the message, it performs some operations. In this case, the main process closes the child process when it receives the message
  cluster.on("message".function (worker, message, handle) {
    worker.kill();
  });
} else {
  // Listen for variable messages from the main process and perform related operations
  process.on("message".(msg) = > {
    // What to do
    // In the child process, a message is sent to the main process
    process.send({ data: "The end" });
  });
}
Copy the code

Matters needing attention

  1. When you want to initialize something, be sure to write it in here. If you judge the outside, each child process will execute the code, leading to repeated initialization
if (cluster.isMaster) {
// Initialize the code
}
Copy the code
  1. As for the number of sub-processes, I default to open the same number of sub-processes according to the number of cores. There are other people testing online, and the speed is the fastest when opening 12 cores, but I measured that the number of sub-processes is a little faster than opening 12 cores. In this question, I can make a choice based on the personal measured results
  2. Variable passing is a problem in the child process. Common variables are often unavailable in the child process. You can use the child process to communicate with the main process to solve this problem.

The resources

Node.js Api

Welcome to my personal blog