This article explains how I solved a Node single-thread bottleneck in a real project. The code comes from code that actually shipped, so feel free to use it as a reference; if you spot a problem, please point it out.

My graduation project was an online SQL examination system for my college. The goal was to support 300 people taking an SQL programming exam at the same time, with the whole stack developed by myself. As a front-end developer, I chose Node for the backend without hesitation: goodbye to bulky Java, hello to flexible JS everywhere. I was full of confidence in the project. Development went smoothly, the business logic was soon finished, and it was time for the final round of testing, the last step before the system went into real use (if testing was not thorough enough, the system would get roasted once it was live).

My impression was that Node's concurrency performance is good. After all, while preparing for interviews I had memorized lines like "Node is single-threaded but its I/O is non-blocking, so asynchronous performance is great." Besides, in 2021 server hardware itself is not exactly weak, so I figured 300 concurrent users should be fine even without optimization. I took over one of the school's cloud servers with 8 GB of memory and a 4-core CPU, deployed the system, and started a load test. Reality check: a system targeting 300 concurrent users could not even sustain 100, so I had no choice but to go hunting for the problem.

How to locate Node performance bottlenecks

Before digging into the performance problem, a small pat on my own back: having a performance bottleneck to solve at least means the business development is basically complete and the project is nearly done.

The performance test

First, the system needs to be load-tested so we can observe its metrics, evaluate its current performance, and then track down the bottleneck. I used two performance testing methods in this project:

  • The PTS load testing service of a certain cloud vendor, which can simulate concurrent requests from multiple IP addresses
  • JMeter thread-based load testing, which simulates high-frequency access from a single point

I won't describe the load testing steps in detail here; if you are interested, leave a comment and I will collect the questions and cover them in a later post.

I started a PTS load test that ramped concurrency from 0 up to 100, with the results shown below. Once concurrency reached 100, the number of failed requests could no longer be ignored: 860 failures out of 50,000 requests, so the system clearly could not hold up. Reading the generated test report, almost all of the failed requests came from a single endpoint, as shown in the following figure. That endpoint accounted for 844 of the 860 failed requests, making it the biggest bottleneck in the system.

Locate and narrow the problem

Looking up the failing endpoint in the network data, I found it was the user login endpoint. Why? Surprised that an endpoint unrelated to the core business had become the system bottleneck, I gathered more information to determine the cause.

First I tried upgrading the server. After all, the original 4-core CPU is nothing impressive, and throwing hardware at the problem is at least the most direct fix. However, even after upgrading to 32 cores and 32 GB of memory, performance did not improve at all. So the problem was not the hardware, and I had to dig deeper.

I ran the concurrency test again, this time with a terminal on the server and the cloud console both open. Running the top command to check per-process performance, I found that the CPU usage of the Node main process was pegged at 100%, while the cloud console reported the server's overall CPU usage at no more than 6%. Here is the real-time server performance monitoring from the cloud console:

Note that the maximum value on the vertical axis is only 6. On a multi-core machine this clearly means the main process is not making full use of the CPU: one core is working flat out while the other cores sit and watch. It also confirms that the CPU usage reported by top is for a single core.

The reason the main process cannot take advantage of multiple cores is easy to understand: Node uses a single-threaded model, and starting the project starts a single process. From the evidence so far, I could infer that the login endpoint was eating too much CPU. For final confirmation, I commented out the login endpoint and tested again: 100 concurrent users were handled easily and the single core was no longer maxed out. So the login endpoint really was consuming too much CPU and had become the bottleneck.

Code Review

Next I needed to find the code responsible for the heavy CPU usage, which meant reviewing the login endpoint I had written. Here is passhash.js, the utility file I wrapped around password hashing and verification:

const bcrypt = require("bcryptjs");

/**
 * Compares the password entered by the user with the stored hash
 * @param {string} password
 * @param {string} passhash
 * @returns {Promise<boolean>}
 */
function comparePassword(password, passhash) {
  return bcrypt.compare(password, passhash);
}

/**
 * Returns the hashed password
 * @param {string} password
 */
async function getPasshash(password) {
  const salt = await bcrypt.genSalt(10);
  const hash = await bcrypt.hash(password, salt);
  return hash;
}

module.exports = {
  comparePassword,
  getPasshash,
};

I used the bcryptjs library to hash passwords, with a cost factor of 10 salt rounds for better security. Every login calls comparePassword, which makes bcrypt redo the expensive hashing to check whether the password is correct. That is obviously a CPU-intensive operation, so I concluded that the bottleneck was password verification.
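If you want to see this cost for yourself, a quick micro-benchmark makes it obvious. The sketch below is mine, not project code; the file name and test password are made up. It hashes once with the same cost factor of 10 and then times a handful of compares, each of which typically burns roughly tens of milliseconds of pure CPU on common hardware:

// bench-compare.js: rough, hypothetical micro-benchmark (not part of the project)
const bcrypt = require("bcryptjs");

async function main() {
  // Same cost factor as passhash.js
  const hash = await bcrypt.hash("some-test-password", 10);

  console.time("bcrypt.compare x10");
  for (let i = 0; i < 10; i++) {
    await bcrypt.compare("some-test-password", hash);
  }
  console.timeEnd("bcrypt.compare x10");
}

main();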

Solve the single-thread bottleneck

Analysis and coding

Password verification is CPU-intensive, and a single Node process can only use one core. To break through this bottleneck, some extra coding is clearly needed to take full advantage of the multi-core CPU. Since Node itself cannot simply spin up extra threads here, the multi-process approach is the way to go: one main process handles the system's main business, and several child processes handle the CPU-intensive computation. As many processes as we start, that many CPU cores we can use.

Node offers two key modules here: child_process and cluster. child_process is the low-level child process module; it provides operations such as creating and destroying child processes and listening to their events. cluster manages a group of child processes in a unified way: it wraps child_process.fork() and hides some of its details.
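For reference, here is a minimal, self-contained sketch of driving a child directly with child_process.fork (purely illustrative, not the project's code; the project uses cluster, as shown next):

// fork-demo.js: minimal child_process sketch (illustrative only, not the project's code)
const { fork } = require("child_process");

if (process.send === undefined) {
  // Parent: fork a child running this same file and exchange one message over IPC
  const worker = fork(__filename);
  worker.on("message", (msg) => {
    console.log("result from worker:", msg);
    worker.kill();
  });
  worker.send({ type: "ping" });
} else {
  // Child: process.send is only defined when an IPC channel exists, i.e. when forked
  process.on("message", (msg) => {
    process.send({ type: "pong", received: msg.type });
  });
}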

In the project we can use cluster directly to create and manage child processes, which makes it easy to split work between the main process and the children. My project uses the Express framework, so the startup logic lives in server.js; the key code is as follows:

const cluster = require("cluster");
const numCPUs = require("os").cpus().length;

if (cluster.isMaster) {
  masterProcess();
} else {
  childProcess();
}

// Main process initialization
function masterProcess() {
  // ... start the server, listen on the port, listen for exit signals, omitted here ...
  // Start the child processes and set up their event listeners (the full code is longer; this is abridged)
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  bindEvent(cluster.workers); // bindEvent comes from the child process event manager
}

// Child process initialization. Child processes run the time-consuming work so the multi-core CPU gets used
function childProcess() {
  const passhash = require("./lib/passhash.js");
  const { childDataTypes } = require("./const/childWorker.js");

  // Listen for messages from the parent and check the type field of the attached data object.
  // If it is a password verification request, run the verification and send back the result.
  process.on("message", async function (data) {
    if (data.type === childDataTypes.password) {
      const isMatch = await passhash.comparePassword(
        data.password,
        data.passhash
      );
      // uuid is a unique ID generated by the caller, used to match the verification
      // result back to the HTTP request it belongs to
      if (isMatch) {
        process.send({ isMatch: true, uuid: data.uuid });
      } else {
        process.send({ isMatch: false, uuid: data.uuid });
      }
    } else {
      process.send({ noData: true, uuid: data.uuid });
    }
  });
}

After these steps, the main process and the child processes have been created and have a two-way bridge for message passing. The bindEvent method listens for messages from the child processes; it is defined in the event manager file event-binder.js. That file does two things: it listens for child process messages, and it holds each request's callback so the result can be delivered once a child finishes its work. The key code is as follows:

// Map that stores the callbacks, looked up when a child process sends a result back to the main process
const callbackMap = new Map();

/**
 * Creates the event bindings for the child processes
 * @param {Object} workers The cluster.workers object
 */
function bindEvent(workers) {
  const workerArray = Object.values(workers);
  if (!workers || workerArray.length === 0) {
    return;
  }
  workerArray.forEach((worker) => {
    worker.on("message", (data) => {
      const callback = callbackMap.get(data.uuid);
      callback?.(data);
      callbackMap.delete(data.uuid);
    });
  });
}

function addCallback(uuid, callback) {
  callbackMap.set(uuid, callback);
}

module.exports = { bindEvent, addCallback };

The last step is to hand the password verification task to a child process and register a callback with the event manager, so the verification result can be turned into a login response. This part is easy to write:

const uuid = uuidv4();
const worker = roundRobin(cluster.workers);
worker?.send({
  type: childDataTypes.password,
  password,
  passhash: user.passhash,
  uuid,
});
// Register the callback for this uuid with the child process event manager
addCallback(uuid, responseFunc);

The roundRobin method selects a child process in the simplest round-robin (polling) fashion and the password verification task is sent to that process. In effect, roundRobin is load balancing in a non-distributed setting. The responseFunc callback returns a different response depending on the verification result.
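I have not shown roundRobin itself, so here is a minimal sketch of what it could look like, assuming it simply cycles through cluster.workers in order; the real implementation in the project may differ:

// round-robin.js: hypothetical sketch of a simple round-robin worker selector
let nextIndex = 0;

/**
 * Returns the next worker in turn, or undefined if there are none.
 * @param {Object} workers The cluster.workers object
 */
function roundRobin(workers) {
  const workerArray = Object.values(workers).filter(Boolean);
  if (workerArray.length === 0) {
    return undefined;
  }
  const worker = workerArray[nextIndex % workerArray.length];
  nextIndex = (nextIndex + 1) % workerArray.length;
  return worker;
}

module.exports = { roundRobin };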

Retest performance

Finally, I reran the cloud PTS load test, again with a terminal on the server and the cloud console open. top now showed many Node processes running, each with CPU usage above 90%. The real-time performance chart on the cloud console looked like this:

The server's overall CPU usage finally climbed to nearly 90%, so the multi-core capacity was being fully used. Now look at the results for the login endpoint:

The success rate was above 99%, an improvement of more than an order of magnitude over the unoptimized version. The remaining failures were almost all timeouts, which I suspected were caused by the limited bandwidth of the intranet tunneling tool I was using. So I opened JMeter and ran a 300-thread load test directly on the intranet, with the following results:

The intranet test shows a 0% error rate. The conclusion: the cloud PTS test requires a public IP, and I could only expose the intranet server through a tunneling tool, whose bandwidth limit hurt concurrency in the public-network test. The fix for that is simply to apply for a server with a public IP.

What about sharing state between processes

There are two common approaches to sharing state between Node processes:

  • Store the state in a shared Redis database that every process reads and writes.
  • Keep the state in the main process and synchronize it over IPC: when the state changes in the main process it notifies the children, and when a child changes the state it notifies the main process so the change is synced everywhere (see the sketch after this list).
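As a rough illustration of the second approach, here is a sketch I put together on top of the same cluster setup (my own example, not code from the project): the main process owns the state, applies patches reported by the children, and broadcasts the new state to every worker.

// state-sync.js: hypothetical sketch of IPC-based state sharing with cluster
const cluster = require("cluster");

if (cluster.isMaster) {
  let sharedState = { onlineUsers: 0 }; // state owned by the main process

  cluster.fork();
  cluster.fork();

  // A child reports a change; the main process applies it and broadcasts the new state to all children
  cluster.on("message", (worker, msg) => {
    if (msg.type === "state:update") {
      sharedState = { ...sharedState, ...msg.patch };
      for (const w of Object.values(cluster.workers)) {
        w.send({ type: "state:sync", state: sharedState });
      }
    }
  });
} else {
  let localState = {};

  // Keep the local copy in sync with whatever the main process broadcasts
  process.on("message", (msg) => {
    if (msg.type === "state:sync") {
      localState = msg.state;
      console.log(`worker ${process.pid} state:`, localState);
    }
  });

  // Ask the main process to update the shared state
  process.send({ type: "state:update", patch: { onlineUsers: 1 } });
}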

If you have other good ideas, feel free to comment and share.

Conclusion

After this round of optimization, the backend can withstand load tests at 500 concurrent users, and the first official version finally went online with confidence. More problems will no doubt surface over time, and new bottlenecks will appear as the number of users grows, but the system is ready for this stage and there are ways to respond later. Finding problems, solving them, then optimizing and extending the system is just the normal way software evolves (and I can righteously hand the next round over to the juniors who take over the project; just kidding).

The Year of the Ox is almost here and spring recruitment is coming. I wish every student a satisfying offer and every working friend smooth sailing at work. Tencent AlloyTeam will keep hiring in the new year; consider this a heads-up, and feel free to send me your resume once spring recruitment officially opens.

Ah Lian will keep publishing practical articles, all drawn from real project work combined with study and reflection. If they help you, feel free to follow me and let's improve together.

References

Node documentation — Cluster

Other articles

Interview experience:

  • A junior front-end developer's big-company interview experience

Webpack:

  • Optimizing a large Taobao project, with takeaways (Webpack, SplitChunks code examples, with diagrams)
  • Mom no longer has to worry about my optimization | Webpack series (2): SplitChunksPlugin source code explained

CSS:

  • The interviewer wants to know how well you understand absolute positioning

For those still looking for direction:

  • How a back-end developer who switched to front-end suddenly landed a big-company offer, and what really happened