Excerpts from: eggjs.org/zh-cn/core/…
We know that JavaScript code runs on a single thread; in other words, a Node.js process can only use one CPU core. So if you use Node.js as a web server, you cannot enjoy the benefits of multiple cores. As an enterprise-level solution, one of the problems we have to solve is:
How do we make full use of server resources and take advantage of concurrency on multi-core CPUs?
Node.js officially provides a solution: the cluster module, whose documentation contains a brief introduction:
A single instance of Node.js runs in a single thread. To take advantage of multi-core systems, the user will sometimes want to launch a cluster of Node.js processes to handle the load.
The cluster module allows easy creation of child processes that all share server ports.
What is a Cluster?
To put it simply,
- Start multiple processes simultaneously on the server.
- Each process runs the same source code (as if the work previously done by one process were divided among several).
- Even better, these processes can listen on the same port at the same time (see @davidcai1993's write-up on how Cluster implements this).
Among them:
- The process responsible for starting the other processes is called the Master process. Like a "contractor", it does no concrete work itself; it is only responsible for starting the other processes.
- The processes it starts are called Worker processes. They are, literally, the "workers" who do the actual work: they receive requests and serve external traffic.
- The number of Worker processes is generally set according to the number of CPU cores on the server, so that multi-core resources can be fully utilized.
The multi-process model of the framework
The example above is simple enough, but as an enterprise-level solution, there is a lot more to consider.
- How can I handle the abnormal exit of the Worker process?
- How do multiple Worker processes share resources?
- How to schedule multiple Worker processes?
- ...
Guarding the processes
Robustness is an issue that enterprise applications must consider. Beyond ensuring the code quality of the application itself, the framework also needs to provide a corresponding safety-net mechanism to keep the application available in extreme cases.
In general, Node.js process exits can be classified into two categories:
Uncaught exception
Node.js provides the process.on('uncaughtException', handler) interface to catch such exceptions. However, when a Worker process encounters an uncaughtException, it is already in an indeterminate state, and at that point we should let it exit gracefully:
- Close all TCP servers of the abnormal Worker process (quickly disconnect the existing connection and stop receiving new connections), disconnect the IPC channel with the Master and stop accepting new user requests.
- The Master immediately forks a new Worker process, keeping the total number of “workers” online unchanged.
- The abnormal Worker waits for a period of time, finishes handling the requests it has already accepted, and then exits.
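A simplified sketch of these steps for a single Worker. This is not the framework's actual graceful/egg-cluster implementation, just an illustration of the sequence; the grace period is an arbitrary example value:

```javascript
// Illustrative only: graceful exit of a Worker on an uncaught exception.
function guardWorker(server, gracePeriodMs) {
  process.on('uncaughtException', err => {
    console.error('worker hit an uncaught exception, exiting gracefully:', err);
    // 1. Stop accepting new connections; existing ones are allowed to finish.
    server.close();
    // 2. Disconnect the IPC channel so the Master knows and forks a new Worker.
    if (process.connected) process.disconnect();
    // 3. After a grace period for in-flight requests, force the process out.
    setTimeout(() => process.exit(1), gracePeriodMs).unref();
  });
}

module.exports = guardWorker;
```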
OOM and system exceptions
When a process crashes or is killed by the system because of OOM, however, we have no chance to keep the process running, unlike the uncaught-exception case. The Master can only let the current process exit directly and immediately fork a new Worker.
In the framework, we use the graceful and egg-cluster modules to implement the logic above. This scheme has been widely deployed in production at Alibaba and Ant Financial and has stood the test of the "Double 11" shopping festival, so it is relatively stable and reliable.
The Agent mechanism
At this point, the Node.js multi-process solution seems to be taking shape, and this is indeed what we ran online in the early days. Later, however, we found that some work does not need to be done by every Worker. Doing it everywhere wastes resources and, more importantly, may lead to resource-access conflicts between processes. For example, log files in production are usually archived by date, which is easy to do in a single-process model:
- At 00:00 every day, the current log file is renamed according to the date
- Destroy the previous file handle and create a new log file to continue writing
Imagine four processes all trying to do this at the same time. So for this kind of background logic, we want to run it in a separate process, which we call the Agent Worker, or Agent for short. The Agent is like a "secretary" the Master hires for the other Workers: it does not serve external traffic and only works for the App Workers, specializing in common chores. Our multi-process model now looks like this:
The startup sequence of our framework is as follows:
- The Master starts and forks the Agent process
- Once the Agent has initialized successfully, it notifies the Master through the IPC channel
- The Master then forks multiple App Workers
- Each App Worker notifies the Master once it has initialized successfully
- After all processes have initialized successfully, the Master notifies the Agent and the Workers that the application has started
In addition, there are several points to note about Agent workers:
- Since App Workers depend on the Agent, App Workers are forked only after the Agent has finished initializing
- Although the Agent is the App Workers' "secretary", business-related work should not be placed on it; wearing it out would be bad for everyone
- Because of the Agent's special role, we must keep it relatively stable. When it encounters an uncaught exception, the framework does not let it exit and restart like an App Worker; instead, it records the exception log, raises an alert, and waits for manual intervention
- The APIs mounted on the Agent are not exactly the same as those on a normal App Worker; see the framework documentation for the differences
Using the Agent
You can implement your own logic in agent.js at the root of your application or plugin (similar to startup customization in app.js, except that the entry argument is an agent object).
In this example, the code in agent.js runs on the Agent process and the code in app.js runs on the Worker processes. They communicate through the messenger object encapsulated by the framework (IPC); later chapters will explain the framework's IPC in detail.
Master VS Agent VS Worker
When an application is started, all three processes are started simultaneously.
Type | Number of processes | Role | Stability | Runs business code?
---|---|---|---|---
Master | 1 | Process management, message forwarding between processes | Very high | No
Agent | 1 | Background work (e.g. long-lived connection clients) | High | A little
Worker | Usually one per CPU core | Executes business code | Normal | Yes
Master
In this model, the Master process takes on process-management work (similar to pm2) and runs no business code. We only need to run a single Master process, and it handles the initialization and restarting of all Worker and Agent processes for us.
The stability of the Master process is extremely high. When running online, we only need to daemonize the Master process started by egg.startCluster with egg-scripts; there is no need for daemon modules such as pm2.
Agent
In most cases, you can ignore the Agent process completely when writing business code, but it becomes useful whenever you want a piece of code to run on only one process.
Because there is only one Agent, and it is responsible for many dirty and tiring jobs such as maintaining long-lived connections, it must not crash or restart casually. Therefore, when the Agent process encounters an uncaught exception, it does not exit; it only prints an error log, and we need to stay vigilant about uncaught exceptions in the logs.
Worker
The Worker process handles real user requests and runs scheduled tasks. Egg's scheduled tasks also offer the option of running on only one Worker process, so any problem that a scheduled task can solve should not be put on the Agent.
Workers run business code, which is relatively more complex and less stable than the code running on the Agent and Master processes. When a Worker process exits abnormally, the Master process restarts a new Worker.
Interprocess Communication (IPC)
Although each Worker process is relatively independent, they still need to communicate with each other; this is called inter-process communication (IPC). Node.js's child_process module provides this ability between a parent process and the children it forks.
If you look closely, you may notice that cluster's IPC channels only exist between the Master and a Worker/Agent; Worker and Agent processes have no channels between themselves. So how do Workers talk to each other? Through the Master.
To make this easier to use, we wrap a messenger object onto the app and agent instances, providing a friendly set of APIs.
send
app.messenger.broadcast(action, data): sends to all Agent / App processes (including the sender itself)
app.messenger.sendToApp(action, data): sends to all App processes
- called on an App process, it sends to itself and all other App processes
- called on the Agent, it sends to all App processes
app.messenger.sendToAgent(action, data): sends to the Agent process
- called on an App process, it sends to the Agent
- called on the Agent, it sends to the Agent itself
agent.messenger.sendRandom(action, data):
- App processes do not have this method (in the current Egg implementation it is equivalent to sendToAgent)
- the Agent sends a message to one random App process (the Master decides which one receives it)
app.messenger.sendTo(pid, action, data): sends to the specified process
All of the methods above on app.messenger can also be used on agent.messenger.
egg-ready
As mentioned in the example above, you need to wait for the egg-ready message before sending. Only after the Master has confirmed that all Agent and Worker processes have started successfully (and are ready) will it notify every Agent and Worker via messenger with the egg-ready message, indicating that everything is ready and the IPC channels can be used.
receive
Listen for action events on Messenger to receive messages from other processes.
IPC in practice
Let's use a simple example to get a feel for how to use IPC to solve a real problem under the framework's multi-process model.
Requirements
We have an interface that needs to fetch some data from a remote data source and expose it as an API. The data rarely changes, so we want to cache it in memory to improve throughput and reduce RT. We then need a mechanism to refresh the in-memory cache:
1. Periodically fetch data from the remote data source and refresh the memory cache; to reduce pressure on the data source, the refresh interval is set fairly long.
2. The remote data source provides an interface for checking whether the data has been updated; our service can call this interface more frequently and pull the data only when it has changed.
3. The remote data source pushes data-update messages through messaging middleware; our service listens for these messages to refresh the data.
In a real project, we can use scheme 1 as the safety net and combine it with scheme 2 or scheme 3 to make data updates more timely. In this example, we implement all three cache-update schemes with IPC plus scheduled tasks.
implementation
We encapsulate all the logic that interacts with the remote data source in a Service and provide a get method for Controllers to call.
Write a scheduled task to implement scheme 1: every 10 minutes, fetch data from the remote data source and refresh the cache, as the safety net.
Write another scheduled task to implement the checking logic of scheme 2: let one Worker call the check interface every 10 seconds and, when it finds the data has changed, notify all Workers through the method provided by messenger.
Listen for the refresh event in the startup customization file (app.js) and refresh the data there. Every Worker process receives the message and triggers its own update, which completes scheme 2.
Now let's look at scheme 3. We need a client for the messaging middleware, one that maintains a long-lived connection with the server. Such connections are better maintained on the Agent process, which effectively reduces the number of connections and the overhead on both ends. So we enable the message listener on the Agent process.
By using the Agent process, scheduled tasks, and IPC appropriately, we can easily handle requirements like this while reducing pressure on the data source. See examples/ipc for the full example code.