This article analyzes the profiler's top-calls step by step, starting from the code details, and shows how to solve the problems they reveal.

Our Node application was too slow to deliver the performance benefits we expected.

Groundwork

What generally affects system performance?

For a typical system, performance problems are mainly affected by:

  • Code implementation issues
  • Architecture issues: for example, can we add a caching system (e.g. Redis) to improve query efficiency?
  • Problems in the systems we depend on: for example, MySQL, which we rely on heavily, is overloaded

This time we will focus on the first point, code implementation, working through the code piece by piece.

General steps for performance analysis

Based on past experience, the general steps for analyzing performance problems are as follows:

  • Use a performance testing tool, such as JMeter, to assess performance and get a base value
  • Use Profiler tools to analyze which pieces of code take the most time
  • Optimize for the code that takes the most time
  • Repeat the above three steps until the performance reaches the expected value

The general idea: start from the code details to solve performance problems

This is really a matter of technique, and one we can keep accumulating experience in.

Our groundwork

  • Since Node is single-process, we can test the performance of a single Node process first; the overall multi-process performance can be re-evaluated before finally going live
  • Since we are also HTTP services, we used JMeter as our performance testing tool
  • For the profiler, we used v8-profiler, whose output can be loaded into Chrome DevTools for analysis (a minimal capture sketch follows this list)
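
For reference, a minimal capture sketch using the v8-profiler package; the API shown follows its README as I understand it, so treat the details as an assumption to verify against the version you install:

const fs = require('fs');
const profiler = require('v8-profiler');

profiler.startProfiling('app', true);               // begin CPU sampling
setTimeout(() => {
    const profile = profiler.stopProfiling('app');  // stop and collect the profile
    profile.export((error, result) => {
        fs.writeFileSync('app.cpuprofile', result); // open this file in Chrome DevTools
        profile.delete();                           // release the profile inside V8
    });
}, 30000);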

Get to work…

Getting a baseline value

Our Web framework, named Guice-Web V2, is self-implemented on top of our own dependency injection mechanism, with Express as the underlying HTTP server (basically a SpringMVC for Node). So let's start with a performance comparison between the framework itself and Express:

Performance indicator    guice-web V2    Express
QPS                      32              1390

I added several required middleware to this Express setup, so its result is lower than the official benchmark.

I was stunned to find that Express delivered 43 times the performance of Guice-Web V2.

But that's okay; let's find out where the performance is going.

Analysis 1: A first look at the V8 profiling results

Running V8 profiling gives the following results:

The profiling results order the functions in the code from highest to lowest by call count, and as you can see from this graph, the one that costs us the most is fs.readSync.

This surprised me, because I didn't think there was any file reading in the code, let alone synchronous file reading. So why?

Moving on to the bottom-up view, which shows the full function call stacks from the inside out:

The caller of fs.readSync turns out to be require. This came down to my understanding of Node's module loading mechanism: I had assumed Node would compile all require statements ahead of time, but it doesn't; require reads and compiles the module synchronously the first time it is hit.

In addition, let's write a demo to verify this logic:

// Demo 1: require at the top of the file
require('http');
require('https');
require('child_process');
require('cluster');
require('fs');
require('crypto');
require('net');
require('events');
require('zlib');
require('util');

function f() {}

var startTime = new Date().getTime();
f();
var endTime = new Date().getTime();
console.log(endTime - startTime); // prints 0

// Demo 2: require inside the function
function f() {
    require('http');
    require('https');
    require('child_process');
    require('cluster');
    require('fs');
    require('crypto');
    require('net');
    require('events');
    require('zlib');
    require('util');
}

var startTime = new Date().getTime();
f();
var endTime = new Date().getTime();
console.log(endTime - startTime); // prints 39

That says it all.

However, Node caches modules after the first require, so even though require is called many times and performs synchronous IO, it should not in itself be a performance bottleneck.

So let’s move on

Analysis 2: babel-register is to blame

Moving further along the call stack, the trail led to source-map.

Since the code makes extensive use of ES6/ES7 syntax and relies on Babel to convert it, and I didn't want to recompile before every run during development, I simply required babel-register at the entry point: require('babel-register'). I thought nothing of it, since Babel converts all the files once when the program starts; but I forgot that babel-register also needs to trace the stack, remapping every stack trace through source-map at runtime. The fix: precompile the code ahead of time and drop require('babel-register').
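
A minimal sketch of the change, assuming a standard babel-cli setup (paths and layout are illustrative):

// Before: transpile on the fly at runtime
require('babel-register'); // hooks require() and transpiles every module as it loads
require('./src/app');

// After: precompile once at build time with babel-cli:
//   $ babel src --out-dir lib
// and run the already-compiled JavaScript:
require('./lib/app');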

QPS increased from 32 to 50+, almost doubling.

Even so, it was much slower than I expected; my goal was at least half of Express's performance.

Analysis 3: Real optimization begins with Babel

The above two analyses and changes were minor tweaks that didn't touch the bones of the system itself. Continuing with the profile, we can see that the top file is error-stack-parser.js, which comes from the error-stack-parser dependency I introduced.

Why did I introduce this dependency?

I wrote a log output module, and the output format I want from it is:

Date (YYYYMMDD) LEVEL APP_NAME message (in fileName:lineNumber)

Here date (YYYYMMDD), fileName, and lineNumber are variables. Where do fileName and lineNumber come from?

From parsing (new Error()).stack, just as baryon/tracer does. To make the parsing easier, I introduced error-stack-parser; and error-stack-parser parses stacks with regular expressions, which is very slow!
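
For context, a minimal sketch of the technique; the frame index and regular expression here are illustrative, not the project's actual logger code:

function getCallSite() {
    var stack = new Error().stack.split('\n');
    // stack[0] is "Error", stack[1] is getCallSite itself,
    // stack[2] is whoever called us, i.e. the log call site
    var frame = (stack[2] || '').trim();
    // e.g. "at Object.<anonymous> (/path/to/file.js:10:5)"
    var match = /\(?([^()]+):(\d+):(\d+)\)?$/.exec(frame);
    return match
        ? { fileName: match[1], lineNumber: Number(match[2]) }
        : { fileName: 'unknown', lineNumber: 0 };
}

Every log call pays for building and parsing a stack trace this way, which is why it shows up so prominently in the profile.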

The cause of the problem has been found, but how do we solve it?

The easiest way, of course, is not to print fileName and lineNumber, but that’s not what I want.

The alternative, of course, is to pass fileName and lineNumber explicitly wherever you write a log. Node has a preset variable __filename, but there is no __line; and in any case, changing every logger.debug("msg") into logger.debug(__filename, __line, "msg") is not acceptable. No logger module should require that.

I remember when I wrote C++, the output of logging was wrapped in a macro, such as:

#define LOG_INFO(format, args...) { \
    logger->log(alog::LOG_LEVEL_INFO, __FILE__, __LINE__, __FUNCTION__, format, ##args); \
}

So can I do the same?

Babel is a nice thing: it effectively lets us create our own syntax. So we can use Babel to rewrite our code at compile time, from logger.debug("msg") into logger.debug({fileName: "fileName", lineNumber: 12}, "msg").

Let’s implement the Babel plugin:

const path = require('path');

const LOG_PATTERN = /__LOG_(\w+)__/;

module.exports = ({types}) => {
    const void0Expression = types.unaryExpression('void', types.numericLiteral(0), true); // (unused in this version)

    // Rewrite __LOG_XXX__(instance, ...args) into instance.xxx({fileName, lineNumber}, ...args)
    let transformLog = (nodePath, state) => {
        let calleeName = nodePath.node.callee.name;
        let matched = LOG_PATTERN.exec(calleeName);
        let methodName = matched[1].toLowerCase();              // DEBUG -> debug
        let filePath = path.resolve(state.file.opts.filename);  // file being compiled
        let startLineNumber = nodePath.node.loc.start.line;     // line of the log call
        let calleeInstance = "";
        if (nodePath.node.arguments[0].type === "MemberExpression") {
            // first argument is e.g. self.logger
            calleeInstance = nodePath.node.arguments[0].object.name + "." + nodePath.node.arguments[0].property.name;
        } else {
            // first argument is a plain identifier, e.g. logger
            calleeInstance = nodePath.node.arguments[0].name;
        }
        // hacky but effective: an identifier containing dots prints as a member access
        let calleeExpression = types.identifier(calleeInstance + "." + methodName);
        // inject {lineNumber, fileName} as the first argument, computed at compile time
        let logBasic = types.identifier(JSON.stringify({
            lineNumber: startLineNumber,
            fileName: filePath
        }));
        let calleeArguments = [logBasic];
        for (const arg of nodePath.node.arguments.slice(1)) {
            calleeArguments.push(arg);
        }
        let joinedCallExpression = types.callExpression(calleeExpression, calleeArguments);
        nodePath.replaceWith(joinedCallExpression);
    };

    return {
        visitor: {
            CallExpression: function(nodePath, state) {
                let calleeName = nodePath.node.callee.name;
                if (LOG_PATTERN.test(calleeName)) {
                    transformLog(nodePath, state);
                }
            }
        }
    };
};

After this plug-in conversion:

__LOG_DEBUG__(self.logger, 'got instance: %s', name);

becomes:

self.logger.debug({"lineNumber":224,"fileName":"/Users/wuliang/zone/node-guice/src/v2/Guice2.js"}, 'got instance: %s', name);Copy the code

Ok, let’s test the performance again:

QPS reached almost 300+, close to a ten-fold improvement over the original 32.

Analysis 4: Map vs Object

The performance results above are very encouraging, but there is still a long way to go to reach my goal. Let's go back to top-calls. As you can see, besides the logger, the other big consumer is getInstance in Guice2.js. This is because my entire framework relies heavily on Guice, the dependency injection framework I wrote myself, and getInstance is the method that fetches objects from it.

In this dependency injection framework, one instancePool holds all the binding information and another instanceCache caches objects to implement singletons. Both were plain objects ({}); could switching them to Map help?

jsperf already has a test case for this, es6-map-vs-object-properties. It shows that Map does hold a performance advantage, so we changed instancePool and instanceCache to Map (a hypothetical sketch of the change follows).
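
A hypothetical sketch of the change; the real Guice2.js internals surely differ, and the names below merely follow the article's description:

class Injector {
    constructor() {
        this.instancePool = new Map();   // binding information, previously a plain {}
        this.instanceCache = new Map();  // singleton cache, previously a plain {}
    }

    getInstance(name) {
        // serve singletons from the cache when possible
        if (this.instanceCache.has(name)) {
            return this.instanceCache.get(name);
        }
        const binding = this.instancePool.get(name);
        if (!binding) {
            throw new Error('no binding for ' + name);
        }
        const instance = binding.factory();
        this.instanceCache.set(name, instance);
        return instance;
    }
}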

Testing the performance again, the result is gratifying:

Performance clearly improved a lot: the lowest points match the previous results, and the peak reaches 800+.

Analysis 5: Strive to optimize each function to the utmost

With the results above, overall performance hit a plateau; the big items felt optimized, so let's look at which details can still be improved.

Detail optimization 1: no need to load ES5/6/7 polyfills in the Node environment

Since many of our base modules are cross-platform (Node and browser), we had introduced es5-shim, es6-shim, and es7-shim to address browser compatibility issues. In theory these polyfills decide, based on the running environment, whether to patch anything; but here's what blew me away:

Date in es5-shim

Look at the implementation in es5-shim.js:1413: the check is doesNotParseY2KNewYear || acceptsInvalidDates || !supportsExtendedYears, and by the shim's definition Node's Date fails it, so Date gets wrapped by the shim anyway.

So: stop loading es5-shim, es6-shim, and es7-shim in the Node environment, and load them only in the browser (a hedged sketch follows).
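
A hedged sketch of the idea; the environment check below is one common pattern rather than the article's actual code, and es7-shim's explicit shim() call is per its README:

// only load the shims when we are not running under Node
const isNode = typeof process !== 'undefined'
    && process.versions != null
    && process.versions.node != null;

if (!isNode) {
    require('es5-shim');          // patches on require
    require('es6-shim');          // patches on require
    require('es7-shim').shim();   // needs an explicit shim() call
}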

Detail optimization 2: Formatted output of Date

There are two points that can be optimized:

  • new Date().getTime() vs Date.now()
  • Date formatting library selection

new Date().getTime() vs Date.now()

jsperf already has a test case, date-now-vs-new-date: Date.now() has a significant performance advantage.
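
A quick way to confirm this locally; a rough micro-benchmark sketch whose exact numbers vary by machine and Node version:

let sink = 0; // keep a live result so the loop bodies are not trivially removed

console.time('new Date().getTime()');
for (let i = 0; i < 1e6; i++) {
    sink += new Date().getTime(); // allocates a Date object every iteration
}
console.timeEnd('new Date().getTime()');

console.time('Date.now()');
for (let i = 0; i < 1e6; i++) {
    sink += Date.now();           // no allocation, just a timestamp
}
console.timeEnd('Date.now()');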

Date formatting library selection

There are plenty of Date formatting options, such as moment and fecha (which is much smaller than moment).

I used to use fecha, but after performance testing I finally chose speed-date with its caching.

Why is that?

Because my Date formatting mainly happens when logging the originalUrl of each request, and our system has very heavy traffic, caching the formatted result pays off immediately: many requests within the same second can reuse the same string.
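
To illustrate why, here is a hand-rolled sketch of the caching idea; speed-date provides this built in, so the code below is illustrative rather than its API:

// reuse the previous result while the timestamp stays within the same second
function cachedFormatter(format) {
    var lastSecond = -1;
    var lastResult = '';
    return function (date) {
        var second = Math.floor(date.getTime() / 1000);
        if (second !== lastSecond) {
            lastSecond = second;
            lastResult = format(date); // only reformat when the second changes
        }
        return lastResult;
    };
}

// usage: wrap any formatter, e.g. one built on fecha
// var formatLogDate = cachedFormatter(d => fecha.format(d, 'YYYYMMDD HH:mm:ss'));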

Detail optimization 3: string concatenation optimization

A preference for sprintf may be ingrained from a history of C++ development, so I don't really like direct string addition, or even ES6 template literals.

Node already provides a util.format method, but I had ignored it and used alexei/sprintf.js instead. Profiling, however, showed util.format to be the faster choice, so I switched to it.

Some of the code contains constant concatenations that look like "ke" + "ys". For this case I added a Babel plugin, laysent/babel-plugin-transform-string-join, which folds them at compile time into: "keys"

Detail optimization 4: some optimizations for the production environment

For example, my log library colors the date, level, and other fields in its output (writing JS lets you do this kind of thing, ha ha). However, color output is useless in production, because production logs are written to a file, and well-behaved modules check isTTY to decide whether to emit color. In production that check is pure overhead.

Solution: add a startup environment variable, such as NODE_ENV=production, to switch off color output entirely (a sketch follows).
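
A minimal sketch of the idea, with illustrative names rather than the article's actual logger code: decide once at startup instead of checking isTTY on every log call.

const useColor = process.env.NODE_ENV !== 'production' && process.stdout.isTTY;

function colorize(text, ansiCode) {
    // in production this is a plain pass-through, with no per-call TTY check
    return useColor ? '\u001b[' + ansiCode + 'm' + text + '\u001b[0m' : text;
}

// e.g. colorize('INFO', 32) is green in a dev terminal, plain in production logs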

Detail optimization 5: some ground rules

There are a few more basic principles that have nothing to do with Node:

  • Initialize anything that can be initialized up front as early as possible, even if it wastes some memory
  • Return from functions as early as possible (a small sketch follows this list)
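
A small illustrative sketch of the second rule (hypothetical code, not from the project):

function findUser(users, id) {
    if (!users || users.length === 0) {
        return null; // bail out before doing any work
    }
    for (const user of users) {
        if (user.id === id) {
            return user; // return as soon as the answer is known
        }
    }
    return null;
}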

Analysis 6: Express-related details

There is an implicit optimization point I suddenly noticed: we add CORS headers to every response. The previous code was:

app.use(function (request, response, next) {
    response.header(
        'Access-Control-Allow-Origin',
        request.headers.origin
    );
    response.header(
        'Access-Control-Allow-Headers',
        'Content-Type, Access-Control-Allow-Headers, Authorization, X-Requested-With'
    );
    response.header(
        'Access-Control-Allow-Methods',
        'GET,HEAD,POST,PUT,DELETE,OPTIONS'
    );
    response.header('Access-Control-Allow-Credentials', 'true');
    next();
});

I changed it to:

response.header({
    'Access-Control-Allow-Origin': request.headers.origin,
    'Access-Control-Allow-Headers':
        'Content-Type, Access-Control-Allow-Headers, Authorization, X-Requested-With',
    'Access-Control-Allow-Methods':
        'GET,HEAD,POST,PUT,DELETE,OPTIONS',
    'Access-Control-Allow-Credentials': 'true'
});
next();

After the optimizations in analyses five and six above, the final test results are as follows:

The graph above only shows QPS at a few sample points. Analyzing the JMeter log, QPS reached 680, about 21 times the original 32.

Analysis 7: The title clearly said 40 times, so why only 21? Clickbait?

The framework I compared against here is guice-web 2.0, but what we run online is still the 1.x version, which manages only 17 QPS, much slower than 2.0. So the journey from 17 QPS (1.x) to 32 QPS (2.0) to 680 after optimization is where the 40-fold figure comes from.

So what optimizations distinguish our 1.x and 2.0 releases?

The points involved in performance are:

  • The 1.x release made heavy use of kriskowal/q as its Promise library, but profiling showed that this library's performance is not great
  • Don't use Promises blindly: many methods were wrapped in Promises purely for uniformity, which is too indiscriminate and hurts performance; use them where they are actually warranted
  • Don't use process.nextTick blindly: the same applies. It pushes the task to the next turn of the event loop, so using it blindly defers work that could be done right now, and you can't predict how long the tasks already queued ahead of you will take (a small sketch follows this list)
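
A contrived sketch of the overhead, illustrative only; timings vary by machine and Node version:

function directSum(n) {
    let total = 0;
    for (let i = 0; i < n; i++) total += i;
    return total;
}

// the same work, needlessly pushed to the next tick and wrapped in a Promise
function deferredSum(n) {
    return new Promise(resolve => {
        process.nextTick(() => resolve(directSum(n)));
    });
}

async function main() {
    console.time('direct');
    for (let i = 0; i < 100000; i++) directSum(10);
    console.timeEnd('direct');

    console.time('deferred');
    for (let i = 0; i < 100000; i++) await deferredSum(10);
    console.timeEnd('deferred');
}

main();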

Conclusion

This article optimized many details in the code, covering the following points:

  • Use regular expressions as sparingly as possible: V8's regular expression implementation is fairly slow, and I later thought writing a small syntax parser to replace them might be faster
  • Make good use of compile-time transforms and precompilation
  • Select the appropriate data structure
  • Research your choice of dependent libraries
  • Use Promises and process.nextTick wisely; don't push work to the next tick when it doesn't have to wait

There are many more points that can be optimized (for example, the question of await inside for...of loops); we will continue to summarize them later.