A few days ago we ran into a performance problem: when the dataset grew too large, a request timed out and one page of our web application failed to load its data. This article records the investigation and some afterthoughts.

Locating the problem

From the request logs, we traced the timeout to a Node interface. At first we suspected long MySQL I/O, but the MySQL slow-query log contained no related statements, which ruled out the query itself.

Since the problem only appeared with large datasets, suspicion shifted to a nested two-level for loop. Logging execution times in the test environment confirmed it: the two loops consumed nearly 20 seconds on a large dataset. Knowing that Node is poor at CPU-intensive computation, we had kept the loop body free of complex logic; it only did some conditional checks and object destructuring and assembly. So why was it so slow?
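For reference, the measurement looked roughly like the sketch below; the variable names are made up, only the shape of the loop matches our code:

```js
// Rough sketch of timing the hot path; `rows` and `items` are hypothetical
// names standing in for our real data structures.
console.time('assemble')
for (let i = 0; i < rows.length; i++) {
  for (let j = 0; j < rows[i].items.length; j++) {
    // only conditional checks and object pick/merge here, no heavy math
  }
}
console.timeEnd('assemble') // printed close to 20s on the large dataset
```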

Pinpointing the details

To make testing easier, I dumped a copy of the data locally, then wrote a small sample program for further investigation, running and analyzing it with WebStorm's built-in V8 profiling tool. The V8 profiling log is shown below:

As the profile shows, the most CPU-intensive method is lodash's internal copyObject. Going through every place the code used lodash, I eventually traced the cost to the _.defaults method. Since the keys of our objects are fixed, we removed _.defaults and assigned each key directly, like this:

```js
const res = {
  'key1': object.key1 || source1.key1 || source2.key1,
  'key2': object.key2 || source1.key2 || source2.key2,
  'key3': object.key3 || source1.key3 || source2.key3,
}
```
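For contrast, the original code was presumably something along the lines of the call below (a reconstruction, not the actual line). One caveat if you copy this trick: _.defaults only fills in keys whose value is undefined, while the || chain also replaces other falsy values such as 0, '' and false, so the two are only equivalent when that difference doesn't matter for your data.

```js
// Hypothetical reconstruction of the original call site:
const res = _.defaults({}, object, source1, source2)
// Note: _.defaults fills a key only when it is undefined on the destination,
// whereas `a || b` falls through on every falsy value (0, '', false, null).
```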

After this change, the interface's response time dropped to 8-9s.

Further optimization

In fact, 8-9s is still unacceptable, but optimization has to proceed step by step. Our next plan is to split the data retrieval, similar to a paged query, so that each individual request takes less time, at the cost of the frontend issuing a few more requests.
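A minimal sketch of what paged retrieval could look like on the frontend, assuming a hypothetical /api/records endpoint that accepts page and size parameters:

```js
// Hypothetical paged fetch; the endpoint and parameter names are made up.
async function fetchAllRecords(pageSize = 500) {
  const all = []
  for (let page = 0; ; page++) {
    const resp = await fetch(`/api/records?page=${page}&size=${pageSize}`)
    const batch = await resp.json()
    all.push(...batch)
    if (batch.length < pageSize) break // a short page means we hit the end
  }
  return all
}
```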

Digging in: anatomy of the _.defaults method

The problem was solved, but, determined to get to the bottom of it, I still wanted to read the source and see exactly what was eating so much CPU time. Here is the source of the defaults method:

```js
function defaults(object, ...sources) {
  object = Object(object)
  sources.forEach((source) => {
    if (source != null) {
      source = Object(source)
      for (const key in source) {
        const value = object[key]
        if (value === undefined ||
            (eq(value, objectProto[key]) && !hasOwnProperty.call(object, key))) {
          object[key] = source[key]
        }
      }
    }
  })
  return object
}
```

As you can see, it first calls Object() on object and on each source, in case the value passed in is not an object. It then iterates over the sources array with forEach(). For each enumerable key of a source, if the corresponding value on object is undefined (or equals the Object.prototype default and is not an own property), it assigns object[key] = source[key].
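A quick example of those semantics, using defaults as defined above: a key is filled from a source only when it is undefined on the destination:

```js
// Keys already set on the destination win; undefined ones get filled in.
const merged = defaults({ a: 1, b: undefined }, { a: 9, b: 2 }, { c: 3 })
console.log(merged) // { a: 1, b: 2, c: 3 } — a kept, b and c filled in
```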

Optimization points:

  • The Object() conversion is unnecessary in our scenario, since the arguments are always plain objects.
  • forEach() executes much more slowly than a plain for (let i = 0; i < arr.length; i++) loop.
  • The check (eq(value, objectProto[key]) && !hasOwnProperty.call(object, key)) is likewise unnecessary in our scenario.
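Putting these points together, a stripped-down version for our scenario might look like the sketch below. It assumes plain-object arguments and ignores the prototype-default check, so it is not a drop-in replacement for the general _.defaults:

```js
// Minimal sketch: plain for loop, no Object() coercion, no prototype check.
function fastDefaults(object, ...sources) {
  for (let i = 0; i < sources.length; i++) {
    const source = sources[i]
    if (source == null) continue // keep the null/undefined guard
    for (const key in source) {
      if (object[key] === undefined) {
        object[key] = source[key]
      }
    }
  }
  return object
}
```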

Conclusion

Finally, a general-purpose third-party library has to accommodate all kinds of input values and perform full argument validation, and that robustness has a cost. So when you reach for a third-party library, consider whether it is really necessary, or whether a simpler, more direct approach would do. Sometimes the most direct way is also the fastest :).
