Introduction

With the release of Node v11.0, Node.js has come a long way. A number of server-side frameworks built on Node have emerged, allowing front-end engineering to be developed and deployed independently of the back end.

With business logic migrating toward the front end and server-side rendering (SSR) now available for the various MV* frameworks, Node-based SSR depends heavily on server performance. First-screen rendering performance and the stability of the Node service directly affect user experience. As a server-side language, Node is still not as well tuned as older server languages such as Java and PHP. While an alerting platform like Sentry can notify you of errors, it cannot prevent them. To prevent them, you first need to understand the main indicators of Node.js performance monitoring.

The following code is based on the Egg framework. If you are not familiar with Egg, you can browse the documentation first.

Indicators

The resource bottlenecks of the server are as follows:

  1. CPU
  2. Memory
  3. Disk
  4. I/O
  5. Network

Different Node environments require different types of resources. If Node is only used for front-end SSR, then CPU and network are major performance bottlenecks.

Of course, if you need to use Node for data persistence, then I/O and disk usage will also be high.

Even companies that are very advanced on the front end rarely use Node as a support for their business data. At best, it acts as a BFF layer to provide data services to the front end and does not directly touch persistent data. So disks and I/O are hardly a bottleneck for current front-end performance.

If there are platforms that use Node for data persistence, most are experimental or in-house. Not directly oriented to business scenarios.

Therefore, CPU, memory, and network are the main performance bottlenecks for Node in most scenarios.

CPU indicators

CPU load and CPU usage

As the name implies, both metrics quantify how busy the system’s CPU is at the moment; CPU load and CPU utilization are simply two different ways of measuring that busyness.

  • CPU load: process perspective
  • CPU usage: CPU time allocation

Processes are the smallest unit of resource allocation.

This sentence appears, in one form or another, in every operating-systems textbook and exam paper. That is, the system allocates resources at the process level, and one CPU core serves only one process at a time.

The load on the CPU is easy to understand: the average number of processes occupying or waiting for the CPU over a period of time is the CPU’s load average for that period. We usually call this figure loadavg.

CPU utilization, on the other hand, quantifies how CPU time is used. It is generally computed as 1 − (idle time / total CPU time).

This metric is explained very clearly on the wiki.

Quantifying CPU metrics

So which of these two indicators best represents the actual state of the system?

An analogy: slides are CPUs, people are processes.

Let’s say there are four slides, and each slide can hold up to 10 people (we assume all people are the same size). Then the following analogy can be drawn:

  • Loadavg = 0, indicating that there is no one on the slide

  • Loadavg = 0.5, indicating that the slides are half full on average, i.e., 20 people on the slides in total. Because of CPU scheduling, these people are generally spread evenly (everyone picks the slide with fewer people on it).

  • Loadavg = 1, which means that each slide is full and there is no free space

  • Loadavg = 2, which means that not only is every slide packed, there are another 40 people waiting behind them

The above analogies are based on transient loadavg.

Generally, loadavg is quantified over three time windows: one minute, five minutes, and fifteen minutes.

The one-minute figure is hard to read as a balanced indicator: one minute is so short that a single one-second spike can skew the average for the whole window. On the other hand, if loadavg suddenly reaches a high value within one minute, that can be a precursor to a system crash, so it is still a useful warning signal.

The five- and fifteen-minute figures are good indicators. A CPU running at high load for 5 or 15 minutes is very dangerous for the entire system. Anyone who has been stuck in a traffic jam knows that once a jam forms, unless it is cleared in time it only gets longer. The same is true for a CPU: if too many processes are waiting on it, tasks queued later are ever less likely to grab resources and may be blocked indefinitely.

On macOS, you can run the `sysctl -n vm.loadavg` command as root.

```js
// /app/lib/cpu.js
const os = require('os');
// Number of CPU cores
const length = os.cpus().length;
// Per-core load average
os.loadavg().map(load => load / length);
```

CPU utilization is not a good direct yardstick on its own, because processes block on the CPU for different reasons. For CPU-intensive tasks, utilization is a good indicator of how hard the CPU is working; but for I/O-intensive tasks, an idle CPU does not mean the system is idle — the task may simply be suspended waiting on other operations.

However, for an SSR Node system, rendering is essentially CPU-intensive work, so this indicator does reflect, to some extent, the CPU performance of the current business environment.

```js
// /app/lib/cpu.js
const os = require('os');

// Take an instantaneous snapshot of the accumulated CPU times
const instantaneousCpuTime = () => {
  let idleCpu = 0;
  let tickCpu = 0;
  const cpus = os.cpus();
  const length = cpus.length;

  let i = 0;
  while (i < length) {
    const cpu = cpus[i];

    for (const type in cpu.times) {
      tickCpu += cpu.times[type];
    }

    idleCpu += cpu.times.idle;
    i++;
  }

  return {
    idle: idleCpu / length,  // average idle time per core
    tick: tickCpu / length,  // average total CPU time per core
  };
};

const cpuMetrics = () => {
  const startQuantize = instantaneousCpuTime();
  return new Promise(resolve => {
    setTimeout(() => {
      const endQuantize = instantaneousCpuTime();
      const idleDifference = endQuantize.idle - startQuantize.idle;
      const tickDifference = endQuantize.tick - startQuantize.tick;

      resolve(1 - idleDifference / tickDifference);
    }, 1000);
  });
};

cpuMetrics().then(res => {
  console.log(res);
  // e.g. 0.074999
});
```

Combining the two indicators above gives a rough picture of the system’s operating state, so you can intervene — for example, by demoting SSR to CSR.

Memory metrics

Memory is a very easy metric to quantify. System memory usage is the common indicator of a memory bottleneck; for Node, the usage of the V8 heap is also a quantifiable metric.

```js
// /app/lib/memory.js
const os = require('os');

module.exports = {
  memory: () => {
    // Current Node memory usage
    const { rss, heapUsed, heapTotal } = process.memoryUsage();
    // Free and total system memory
    const sysFree = os.freemem();
    const sysTotal = os.totalmem();

    return {
      sys: 1 - sysFree / sysTotal,  // system memory usage
      heap: heapUsed / heapTotal,   // Node heap usage
      node: rss / sysTotal,         // Node's share of system memory
    };
  },
};
```

A few values in the object returned by process.memoryUsage() are worth noting:

An aside: my Node enlightenment book, “Simple Node.js”, is well behind the current Node.js releases, but its coverage of the V8 engine’s GC mechanism is still very useful. I recommend buying a legitimate copy to support the author.

  1. rss: resident set size, the total memory occupied by the Node process.
  2. heapTotal: the total size of the V8 heap.
  3. heapUsed: the heap memory actually in use.
  4. external: memory used by C++ objects bound to JavaScript, including the C++ parts of Node core.

The first thing to focus on is the heap footprint. In Node’s single-threaded model, the C++ side (the V8 engine) allocates heapTotal to the Node thread; newly declared variables are stored in this memory, growing heapUsed.

Node’s generational GC algorithm wastes memory resources to some extent, so a forced global.gc() is commonly triggered when heapUsed reaches half of heapTotal (see this article about GC operations). For system memory, monitoring cannot stop at the Node heap level: render degradation should also be considered, since 70% to 80% memory usage is a very dangerous situation. The exact threshold depends on the host the environment runs on.
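As an illustration of that rule of thumb, here is a minimal sketch. The 0.5 threshold is an assumption rather than a V8 constant, and `global.gc` only exists when Node is started with the `--expose-gc` flag:

```javascript
// Sample the heap pressure as heapUsed / heapTotal.
const heapPressure = () => {
  const { heapUsed, heapTotal } = process.memoryUsage();
  return heapUsed / heapTotal;
};

// Force a GC when pressure exceeds the threshold, but only if
// global.gc is actually available (i.e. --expose-gc was passed).
const maybeGc = (threshold = 0.5) => {
  if (heapPressure() > threshold && typeof global.gc === 'function') {
    global.gc();
    return true;
  }
  return false;
};
```

In practice a check like this would run on a timer or a scheduled task, alongside the system-level memory check described above.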

Strongloop-node. js Performance Tip of the Week: Managing Garbage Collection

QPS

Strictly speaking, QPS cannot be used as a direct standard for Web monitoring. But when the server, under high load, cannot reach a QPS close to what it achieved under stress testing, you need to look for some other cause of the performance bottleneck. Generally, when doing SSR in Node, if the Node cluster has at most 10 worker processes, then 10 pages can be rendered in parallel — which of course also depends on the number of host CPU cores.

With Node as the host environment for SSR, it’s easy to track how many requests the current machine responds to over a period of time. I tried several tools to stress test a Web site for my senior thesis.

ApacheBench

http_load

Siege

All three pressure tools are similar: they fire concurrent requests at a Web site on behalf of many simulated users, record the response time of every request, and can repeat the run, which simulates a Node environment under pressure quite well.

Based on the stress-test results and an estimate of peak traffic, you can roughly calculate how many machines are needed to keep the Web service stable and ensure that the majority of users get a response within an acceptable time.
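As a back-of-the-envelope sketch of that calculation (all numbers here are hypothetical, not benchmarks): if one machine sustains about 50 req/s in the stress test and peak traffic is estimated at 400 req/s, keep some headroom and round up:

```javascript
// machines = ceil(peak QPS / (per-machine QPS × headroom factor))
// The 0.7 headroom factor is an assumption, leaving ~30% capacity spare.
const machinesNeeded = (peakQps, perMachineQps, headroom = 0.7) =>
  Math.ceil(peakQps / (perMachineQps * headroom));

console.log(machinesNeeded(400, 50)); // 400 / 35 ≈ 11.4 → 12
```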

Test

Using the three indicators above, let's stress test the locally started environment.

The locally started Node environment is a React SSR setup built on the Egg framework. In the actual online environment, many static resources (JavaScript files, CSS, images, etc.) are pushed to a CDN, so those resources put no direct pressure on the environment. There are also many process differences between production and development, so actual performance is much better than a local boot. Here, for testing convenience, the Egg project is started directly on the local machine.

The test environment

The Node project can be started locally with PM2 or directly with the node command. In the local test environment, try not to use webpack-dev-server: it may keep Node's cluster mode from running properly, with the monitoring worker blocking the page-rendering worker. An Egg-based environment can use schedule tasks to periodically print monitoring logs; the Egg documentation covers this in more detail. You can define a custom log type and store monitoring logs separately from application logs for easier analysis and visualization.

```js
// /app/schedule/monitor.js
const memory = require('../lib/memory');
const cpu = require('../lib/cpu');

module.exports = app => {
  return {
    schedule: {
      interval: 10000,
      type: 'worker',
    },
    async task(ctx) {
      ctx.app.getLogger('monitorLogger').info('The log results you want to print');
    },
  };
};
```

```js
// /config/config.prod.js
const path = require('path');

// Route the monitoring logs to a separate log file
module.exports = appInfo => {
  return {
    customLogger: {
      monitorLogger: {
        file: path.resolve(__dirname, '../logs/monitor.log'),
      },
    },
  };
};
```

Then prepare Siege for the stress test: Install siege on Mac

Alternatively, on macOS it is easier to install Siege directly with Homebrew. This method is recommended, because otherwise the libssl library may not be linked and HTTPS requests will fail.

Test and monitor results

  • Baseline, with no incoming requests:

  • siege

Configure the list of siege request URLs: we can place the URLs for siege to request in a file and have the siege command read them (note that siege can only access HTTP sites; if the site forces HTTPS, other tools may need to be considered).

Urls file
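The urls file itself is just one URL per line. For illustration, a hypothetical file for a local Egg site (Egg's default dev port is 7001; these routes are made up) might look like:

```
http://localhost:7001/
http://localhost:7001/home
http://localhost:7001/list
```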

Execute: `siege -c 10 -r 5 -f urls -i -b`

-c: simulate N users accessing concurrently

-r: repeat the test N times

-f: read the test URLs from the specified file

-i: pick URLs from the file at random

-b: benchmark mode; requests are fired without waiting between them

The siege command above means: 10 concurrent users request random URLs from the urls file, repeated five times, with no waiting between requests.

Siege registers 515 hits on the server because, besides the main page, there are static resources to request: the hits include pages, JavaScript files, images, CSS, and so on. The average response time per resource is 0.83 seconds.

The requests end at 20:29:37, and you can see that after this time the CPU indicators begin to decline, while memory shows no significant change.

Now for a more stressful test:

Run: `siege -c 100 -r 5 -f urls -i -b`, raising the concurrency tenfold to 100.

You can see that the average response time has climbed to 3.85 seconds, a very significant jump. loadavg also rises very noticeably compared with the first run, while memory usage barely changes.

The test environment is a virtual machine that does not have exclusive use of the physical machine's resources, yet the CPU count it reports is that of the physical machine. Since the earlier calculations are per-core, the CPU-related results here have to be interpreted against both the physical core count and the cores actually available to the VM.

Interested readers can try to find the machine's limit, or run the stress test on a physical machine. I didn't dare to abuse my own machine like that.

Conclusion

Many businesses are moving in this direction, and the BFF (Backends for Frontends) concept is gradually being tried out by many teams: let the back end focus on providing a unified data model, migrate the business logic into a Node.js-based BFF layer, and let the front end provide API interfaces for itself. This saves much of the cost of front-end/back-end coordination, makes the RPC or HTTP interfaces provided by the back end more general, reduces changes to back-end projects, and speeds up development.

However, this depends heavily on the stability of the Node side. In a BFF architecture, once the Node side blocks due to errors, every front-end page loses service, with serious consequences. Monitoring the Node side therefore matters more and more; combined with traditional platforms such as Sentry or Zabbix, it can help build a stable front-end deployment environment.

References

Several web server performance measurement tools

Node.js Garbage Collection Explained

Pattern: Backends For Frontends

Node.js Performance Monitoring – Part 1: The Metrics to Monitor

Node.js Performance Monitoring – Part 2: Monitoring the Metrics

What is loadavg

Using LoadAvg for Performance Optimization