1. Introduction to cluster
Node introduced the cluster module back in v0.8 to address the problem of multi-core CPU utilization, and it also provides a fairly complete API for keeping processes robust.
The cluster module creates child processes by calling the fork method — the same fork used by child_process (see "Play with Node child processes – child_process"). The cluster module adopts the classic master-worker model: the cluster creates one master and then forks as many child processes as you specify. You can use the cluster.isMaster property to determine whether the current process is the master or a worker. The master manages all the child processes; it does not handle the actual work itself, but is mainly responsible for scheduling and management.
The cluster module uses built-in load balancing to spread the load across processes with a round-robin algorithm. Under the round-robin scheduling policy, the master accept()s all incoming connections and then hands each TCP connection off to the selected worker process (still communicating with it via IPC). The official example is as follows:
const cluster = require('cluster');
const http = require('http');
const cpuNums = require('os').cpus().length;

if (cluster.isMaster) {
  for (let i = 0; i < cpuNums; i++) {
    cluster.fork();
  }
  // Listen for child process exits
  cluster.on('exit', (worker, code, signal) => {
    console.log('worker process died, id', worker.process.pid);
  });
} else {
  // Give the child process a recognizable process name
  process.title = `cluster child process ${process.pid}`;
  // Workers can share the same TCP connection; in this case an HTTP server
  http.createServer((req, res) => {
    res.end(`response from worker ${process.pid}`);
  }).listen(3000);
  console.log(`Worker ${process.pid} started`);
}
The cluster module is essentially a combination of the child_process and net modules. When a cluster starts, it starts a TCP server internally and sends that server socket's file descriptor to the worker processes. A worker process forked by cluster.fork() has NODE_UNIQUE_ID in its environment variables; when the worker calls listen() on a network port, it fetches that file descriptor and reuses it (with SO_REUSEADDR set on the socket). This is how many child processes share one port.
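The fd-sharing decision described above can be pictured with a toy model. This is illustrative pseudologic, not Node's actual internals: listenSketch, the registry object, and the fd numbers are all invented for the example.

```javascript
// Conceptual sketch: a worker (marked by NODE_UNIQUE_ID) reuses the fd the
// "master" created for a port, while a standalone process creates its own.
let nextFd = 100;

function listenSketch(env, masterFdRegistry, port) {
  if (env.NODE_UNIQUE_ID !== undefined) {
    // Worker: reuse the file descriptor the master holds for this port
    if (!(port in masterFdRegistry)) masterFdRegistry[port] = nextFd++;
    return masterFdRegistry[port];
  }
  // Independently started process: a brand-new socket/fd
  return nextFd++;
}

const registry = {};
const fdWorker1 = listenSketch({ NODE_UNIQUE_ID: '1' }, registry, 3000);
const fdWorker2 = listenSketch({ NODE_UNIQUE_ID: '2' }, registry, 3000);
const fdStandalone = listenSketch({}, registry, 3000);

console.log(fdWorker1 === fdWorker2);    // true: workers share one fd
console.log(fdStandalone === fdWorker1); // false: a separate process gets its own fd
```

This is also why, as discussed later, independently started processes hit EADDRINUSE on the same port while cluster workers do not.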
2. Cluster events
- fork: triggered after a worker process has been forked;
- online: after being forked, the worker sends an online message to the master; the event fires once the master receives it;
- listening: after a worker calls listen() (the server socket is shared), it sends a listening message to the master; the event fires once the master receives it;
- disconnect: triggered when the IPC channel between the master and a worker is disconnected;
- exit: triggered when a worker process exits;
- setup: triggered after cluster.setupMaster() is executed.
Most of these events correspond to events in the child_process module and are built on top of interprocess messaging.
cluster.on('fork', () => {
  console.log('fork event...');
});
cluster.on('online', () => {
  console.log('online event...');
});
cluster.on('listening', () => {
  console.log('listening event...');
});
cluster.on('disconnect', () => {
  console.log('disconnect event...');
});
cluster.on('exit', () => {
  console.log('exit event...');
});
cluster.on('setup', () => {
  console.log('setup event...');
});
3. Master communicates with worker
cluster.fork() creates worker processes via child_process.fork(). In other words, the master and the workers are parent and child processes, and they communicate over an IPC channel just like parent and child processes created directly with child_process.
IPC stands for Inter-Process Communication; its purpose is to let different processes access each other's resources and coordinate their work. Node implements the IPC channel with pipes, provided by libuv: on Windows it is implemented with named pipes, and on *nix with Unix domain sockets. At the application layer, however, interprocess communication is exposed only as simple message events and the send method, which are very easy to use.
The parent process creates the IPC channel and starts listening on it before actually creating the child process, then tells the child the channel's file descriptor through an environment variable (NODE_CHANNEL_FD). During startup, the child connects to the existing IPC channel using that file descriptor, completing the connection between parent and child.
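A quick way to observe this: a process forked over an IPC channel gets both NODE_CHANNEL_FD in its environment and a wired-up process.send method, while a normally started Node process has neither. A small check, not an official API pattern:

```javascript
// Detect whether this process was forked with an IPC channel:
// child_process.fork sets NODE_CHANNEL_FD and provides process.send.
const hasIpcChannel =
  typeof process.send === 'function' &&
  process.env.NODE_CHANNEL_FD !== undefined;

console.log(hasIpcChannel
  ? 'connected to parent IPC channel'
  : 'no IPC channel (started normally)');
```

Run directly with `node`, this prints the "no IPC channel" branch; run as a child of child_process.fork, it prints the other.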
Once the connection is established, the parent and child can communicate freely. Because IPC channels are created with named pipes or domain sockets, they behave like network sockets and support two-way communication. The difference is that the communication happens in the system kernel without going through the actual network stack, which makes it very efficient. In Node, the IPC channel is abstracted as a Stream object: calling send writes data (similar to write), and received messages are delivered to the application layer through message events (similar to data).
The master and worker processes communicate through the IPC channel while server instances are being created. Does that interfere with our development — say, by flooding us with messages we don't actually care about? The answer, of course, is no. So how does it work?
Node supports sending handles between processes: in addition to sending data over IPC, the send method can also send a handle as its second parameter, as shown below:
child.send(message, [sendHandle])
A handle is a reference that identifies a resource; internally it contains a file descriptor pointing to an object. For example, a handle can identify a server-side socket object, a client-side socket object, a UDP socket, a pipe, and so on. So is sending a handle any different from sending the server object itself to the child process? Does it really send the server object over?
Before writing to the IPC pipe, the send() method assembles two things: the handle itself, and a message object like the following:
{
  cmd: 'NODE_HANDLE',
  type: 'net.Server',
  msg: message
}
What is actually sent to the IPC pipe is the file descriptor of the handle to be sent, which is an integer value, together with the message object, which is serialized into a string by JSON.stringify when written to the pipe. The child process reads the parent's message from the IPC channel and restores the string to an object with JSON.parse; a message event then delivers the message body to the application layer. If message.cmd is prefixed with NODE_, Node instead emits an internal event, internalMessage. If message.cmd is NODE_HANDLE, Node takes the message.type value and, together with the received file descriptor, restores a corresponding object.
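The envelope-and-routing behavior just described can be mimicked with plain objects. The object shapes follow the text above; wrapHandleMessage and route are illustrative helpers, not Node's real internals:

```javascript
// Sketch of the message envelope send() builds for a handle, and of how the
// receiving side routes NODE_-prefixed commands to internalMessage.
function wrapHandleMessage(message) {
  return { cmd: 'NODE_HANDLE', type: 'net.Server', msg: message };
}

function route(rawJson) {
  const obj = JSON.parse(rawJson); // the child restores the string with JSON.parse
  return obj.cmd && obj.cmd.startsWith('NODE_') ? 'internalMessage' : 'message';
}

// The envelope travels over the pipe as a JSON string (the fd travels separately)
const wire = JSON.stringify(wrapHandleMessage('server handle on the way'));
console.log(route(wire));                             // 'internalMessage'
console.log(route(JSON.stringify({ cmd: 'hello' }))); // 'message'
```

Ordinary messages fall through to the message event, which is why application code never sees the internal handle traffic.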
In a cluster, take a worker process notifying the master to create a server instance as an example. The worker pseudocode is as follows:
// worker process
const message = {
  cmd: 'NODE_CLUSTER',
  type: 'net.Server',
  msg: message
};
process.send(message);
The master pseudocode is as follows:
worker.process.on('internalMessage', fn);
4. How to implement port sharing
In the earlier example, the servers created in multiple workers listen on the same port 3000. Normally, if multiple processes listen on the same port, the system reports an EADDRINUSE exception. Why is cluster okay?
In independently started processes, the TCP server sockets have different file descriptors, so listening on the same port throws an exception. However, the server restored from the handle sent by send() has the same file descriptor in every worker, so listening on the same port raises no exception.
Note, however, that when multiple processes listen on the same port, the file descriptor can only be used by one process at a time. In other words, when a request reaches the server, only one lucky process grabs the connection and serves that request; the processes compete for service preemptively.
5. How to distribute requests to multiple workers
- Whenever a worker process creates a server instance to listen for requests, it registers with the master through the IPC channel. When a client request arrives, the master is responsible for forwarding it to the appropriate worker.
- Which worker gets the request is determined by the forwarding policy, which can be set with the NODE_CLUSTER_SCHED_POLICY environment variable or passed to cluster.setupMaster(options). The default policy is round-robin (SCHED_RR).
- When a client request arrives, the master polls the worker list, finds the first idle worker, and forwards the request to it.
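A minimal sketch of the default round-robin (SCHED_RR) policy is a master cycling through its worker list. makeRoundRobin is an illustrative helper; the real master also tracks whether a worker is busy before handing it a connection:

```javascript
// Round-robin picker: hand each incoming connection to the next worker in turn
function makeRoundRobin(workers) {
  let next = 0;
  return function pick() {
    const worker = workers[next];
    next = (next + 1) % workers.length; // wrap around to the first worker
    return worker;
  };
}

const pick = makeRoundRobin(['worker-1', 'worker-2', 'worker-3']);
const assignments = [pick(), pick(), pick(), pick()];
console.log(assignments); // ['worker-1', 'worker-2', 'worker-3', 'worker-1']
```

Four connections against three workers wrap back to the first worker, which is exactly the "polling" behavior described above.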
6. Working principle of pm2
pm2 is a Node process management tool. It simplifies the tedious tasks of managing Node applications, such as performance monitoring, automatic restarts, and load balancing. If you haven't used pm2 in practice, check out the author's other article, "PM2 Practical Guide".
pm2 itself is built on top of the cluster module. In this section we focus on pm2's Satan process, its God daemon, and the remote RPC calls between the two.
Satan, in the Bible, refers to the fallen angel, regarded as the source of evil and darkness, in opposition to the power of God.
God.js is responsible for keeping processes running normally. The God process runs from the moment it starts, acting like the master process in a cluster: it keeps the worker processes alive.
RPC (Remote Procedure Call) refers to calling a procedure in another address space. For example, with two servers A and B, an application deployed on A wants to call a function or method exposed by an application on B; since they do not share a memory space, the semantics of the call and its data must be conveyed over the network. Method calls between different processes on the same machine also fall within the scope of RPC. The execution flow is as follows:
Each time a command is entered on the command line, Satan is executed; if the God process is not yet running, it is started first. Satan then performs its logic via RPC calls to the corresponding methods in God.
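The Satan-to-God call can be pictured as a tiny RPC dispatcher: a method table on the God side and a stub on the Satan side that serializes the call. This is a conceptual sketch with invented names; real pm2 historically shipped calls over an axon-based RPC transport between the two processes:

```javascript
// "Server" side (God): methods exposed over RPC
const GodMethods = {
  prepare(opts) {
    // In real pm2 this would eventually call cluster.fork; here we echo a status
    return { status: 'online', script: opts.script, instances: opts.instances };
  },
};

// "Client" side (Satan): serialize the call, ship it, dispatch by method name
function rpcCall(methodName, args) {
  const payload = JSON.parse(JSON.stringify({ methodName, args })); // the "wire"
  return GodMethods[payload.methodName](...payload.args);
}

const result = rpcCall('prepare', [{ script: 'app.js', instances: 4 }]);
console.log(result.status); // 'online'
```

The point of the indirection is that Satan (the CLI) can exit after each command while God keeps running and holds all process state.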
Take pm2 start app.js -i 4 as an example. When executing for the first time, God configures the cluster and listens for cluster events:
// Configure the cluster
cluster.setupMaster({
  exec: path.resolve(path.dirname(module.filename), 'ProcessContainer.js')
});

// Listen for cluster events
(function initEngine() {
  cluster.on('online', function(clu) {
    // The worker process is up and running
    God.clusters_db[clu.pm_id].status = 'online';
  });
  // `kill pid` on the command line triggers exit; process.kill does not
  cluster.on('exit', function(clu, code, signal) {
    // Stop the process if it restarts too frequently
    God.clusters_db[clu.pm_id].status = 'starting';
    // logic
    // ...
  });
})();
After God starts, an RPC link is established between Satan and God, and the prepare method is called; prepare calls cluster.fork to start the cluster:
God.prepare = function(opts, cb) {
  // ...
  return execute(opts, cb);
};

function execute(env, cb) {
  // ...
  var clu = cluster.fork(env);
  // ...
  God.clusters_db[id] = clu;
  clu.once('online', function() {
    God.clusters_db[id].status = 'online';
    if (cb) return cb(null, clu);
    return true;
  });
  return clu;
}
7. Summary
Starting from the basic usage of cluster and its events, this article worked through the underlying implementation of cluster and then how pm2 builds process management on top of it, taking you from getting started all the way to the underlying principles and their higher-level applications. I hope it helps.
Blog GitHub address: github.com/fengshi123/… — a collection of all the author's blog posts; follows and stars are welcome~
References:
- Learn more about processes and threads in Node.js
- Node.js advanced: in-depth analysis of the cluster module
- The Node.js Chinese website
- NodeJS