preface

Node is a platform built on top of Chrome’s JavaScript runtime. It benefits directly from the V8 engine for JavaScript execution, gaining better performance and new language features (such as ES6) as V8 upgrades, but it is also constrained by V8’s limitations, such as its memory limits.

To answer the title question first: yes. Otherwise, when a memory leak shows up in production, all that is left to do is delete the database and run away.

By the end of this article you will have learned:

  1. What a memory leak is
  2. Why Node cares about memory
  3. Node memory and garbage collection strategies
  4. How to view Node memory metrics
  5. How to prevent memory leaks
  6. How to locate memory leaks

Why does Node care about memory

Front-end developers usually don’t have to worry about memory, because the JavaScript engine takes care of everything for us, and since each browser tab runs its own JavaScript engine instance, memory problems rarely surface on the browser side.

A memory leak means that a program fails to release memory it has allocated once it no longer needs it. The damage from a single leak may be negligible, but leaks accumulate: no matter how much memory is available, it will be exhausted sooner or later.

Node is very sensitive to memory leaks, because once we are serving tens of thousands of requests online, even a single leaked byte per request piles up. Garbage collection then spends more and more time scanning objects, the application responds slowly, and eventually the process runs out of memory and the entire application crashes.

In short, Node runs under heavy traffic, where small memory leaks add up until the application slows down or crashes.

So why do memory leaks cause slow applications or even crashes? Answering that requires a deeper understanding of Node memory and its garbage collection strategies.

Node memory and garbage collection strategies

Memory strategy

Node’s memory consists of two parts: heap memory allocated through V8 and off-heap memory allocated by Node itself. All JavaScript objects are allocated on the V8 heap.

To improve garbage collection efficiency, V8 divides the heap into a new generation and an old generation: the new generation holds objects with short lifetimes (collected frequently), while the old generation holds objects with long lifetimes (collected far less often).

The amount of heap memory V8 allocates in Node is limited, because V8 was originally designed for the browser, where large heaps are rare, and because V8’s garbage collection pauses JavaScript execution. Taking a 1.5GB heap as an example, V8 can take more than 50 milliseconds for a small garbage collection and more than a second for a non-incremental full collection. Since the browser has no large-memory scenarios, V8 simply caps how much memory can be allocated.

Node, however, deals with server-side data, logic, and I/O, and may encounter large-memory scenarios, such as reading a 2GB file into memory for string parsing. Because of the limit above, V8 cannot by default make full use of even 32GB of physical memory. We can, however, pass flags when starting Node to adjust the memory limits:

```bash
node --max-old-space-size=2000 test.js  # old generation limit, unit is MB
node --max-new-space-size=2048 test.js  # new generation limit, unit is KB
```

It is worth mentioning Buffer, the object for manipulating binary data. Whether the underlying content is a string or an image, the data is binary, so Buffer can be used for any type of file operation. The Buffer object itself is an ordinary object, stored in the heap and managed by V8, but the data it holds lives in off-heap memory allocated by C++. That data is therefore not managed or garbage collected by V8, which saves V8 resources to some extent and is not subject to the heap memory limit.
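
A quick way to observe this split is a minimal sketch like the one below (the 100MB size is purely illustrative): allocating a large Buffer grows the `external` figure reported by `process.memoryUsage()` far more than `heapUsed`.

```js
// Buffer data lives off the V8 heap, so a large allocation
// shows up in `external` rather than `heapUsed`.
const before = process.memoryUsage();

const buf = Buffer.alloc(100 * 1024 * 1024); // 100MB, allocated off the V8 heap

const after = process.memoryUsage();
console.log('external grew by MB:', (after.external - before.external) / 1024 / 1024);
console.log('heapUsed grew by MB:', (after.heapUsed - before.heapUsed) / 1024 / 1024);
console.log('buffer length:', buf.length); // keep buf referenced
```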

Garbage collection strategies

Based on differences in object lifetimes, Node’s heap is divided into new-generation and old-generation memory, and a different garbage collection strategy is applied to each according to its characteristics, to achieve the highest collection efficiency.

1. New generation garbage collection strategy

The new generation is characterized by objects with short lifetimes. It is collected with the Scavenge algorithm, which is implemented using Cheney’s algorithm:

Cheney’s algorithm is a garbage collection algorithm based on copying. It divides the new-generation heap into two equal parts, each called a semispace. Only one semispace is in use at a time; the other sits idle. The semispace in use is called the From space, and the idle one the To space. New objects are first allocated in the From space. When garbage collection begins, live objects in the From space are identified and copied to the To space, and the space occupied by dead objects is freed. Once copying completes, the roles of the From and To spaces are swapped. In short, live objects are copied back and forth between the two semispaces.

Disadvantage: only half of the new-generation space is usable at any time.

Advantage: because most new-generation objects have short lifetimes, few objects survive to be copied, so collection is very efficient.

2. Old generation garbage collection strategy

The old generation is characterized by objects that live a long time; its garbage collection mainly combines Mark-Sweep and Mark-Compact:

  1. Mark-Sweep means mark and sweep. It has two phases: in the mark phase, all objects in the heap are traversed and live objects are marked; in the subsequent sweep phase, only unmarked objects are cleared. The problem with Mark-Sweep is that after a collection the memory space becomes fragmented and discontinuous; if a large object then needs to be allocated, another garbage collection is triggered that would not otherwise be necessary.

  2. Mark-Compact means mark and compact. Its purpose is to solve the memory fragmentation caused by Mark-Sweep. It differs from Mark-Sweep in that, after dead objects are marked, live objects are moved toward one end of the space during cleanup; once the move is complete, all memory beyond the boundary is freed in one go.

3. Promotion from the new generation to the old generation

Promoting an object from the new generation to the old generation mainly depends on two conditions:

  1. Whether the object has already survived a Scavenge collection. When an object is copied from the From space to the To space, its memory address is checked to determine whether it has been through a Scavenge before; if it has, it is promoted to the old generation instead.
  2. Whether the To space is more than 25% full. When an object is copied from the From space to the To space, if more than 25% of the To space is already used, the object is copied directly into the old generation. The reason is that after the Scavenge completes, the To space becomes the From space, where subsequent memory allocation takes place; if its usage is too high, subsequent allocations would be affected.

4. Incremental marking strategy

To reduce the pause time caused by full-heap garbage collection, V8 introduced incremental marking: work that used to be completed in one stop-the-world pass is broken into many small steps, with short slices of JavaScript application logic allowed to run in between. Garbage collection and application logic alternate until the marking phase is complete.

Incremental marking lets the heap be marked in a series of small pauses of 5-10 milliseconds (on mobile devices). It is enabled once the heap reaches a certain size; from then on, each time a certain amount of memory is allocated, script execution pauses and a round of incremental marking is performed. Like regular marking, incremental marking is a depth-first search and uses the same white-gray-black mechanism to classify objects.

How to view Node memory metrics

Viewing memory metrics relies on three APIs that Node provides:

  1. process.memoryUsage()

rss (Resident Set Size) is the portion of the process’s memory held in RAM. A process’s memory has several parts; the RSS is one of them, with the rest residing in swap or in the filesystem.

heapTotal is the total amount of memory V8 has requested for the heap.

heapUsed is the amount of heap memory V8 is currently using.

external is the memory used by C++ objects bound to JavaScript objects managed by V8. arrayBuffers is the memory allocated for ArrayBuffer and SharedArrayBuffer instances.

  2. os.totalmem() returns the total amount of system memory in bytes, as an integer.

  3. os.freemem() returns the amount of free system memory in bytes, as an integer.
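
Putting the three APIs together, here is a minimal sketch that prints these metrics converted to MB:

```js
const os = require('os');

// Convert bytes to megabytes for readability
const toMB = (bytes) => (bytes / 1024 / 1024).toFixed(2);

const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
console.log(`rss:       ${toMB(rss)} MB`);
console.log(`heapTotal: ${toMB(heapTotal)} MB`);
console.log(`heapUsed:  ${toMB(heapUsed)} MB`);
console.log(`external:  ${toMB(external)} MB`);

console.log(`system total: ${toMB(os.totalmem())} MB`);
console.log(`system free:  ${toMB(os.freemem())} MB`);
```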

How to prevent memory leaks

As mentioned in the memory strategy section, the memory V8 can use in Node is limited. If the number of objects in memory keeps increasing and can never be released, memory will eventually fill up, slowing the process down or even crashing it. At that point the developer crashes too.

To prevent developer crashes, be aware of the following common Node memory leak scenarios.

  1. Cache misuse. Look at the following code:

```js
const cache = {};

const get = (key) => {
  if (cache[key]) {
    return cache[key];
  } else {
    // do something
  }
};

const set = (key, value) => {
  cache[key] = value;
};

// When any request comes in, we run this caching logic
set(someKey, someValue);
```

This code is usually fine in a browser, since each tab has its own V8 instance, but on a Node server all users share the server’s memory. If every request caches some data into the cache object, the object keeps growing and is never released, which is a memory leak. The fix is simple: cap the cache at a maximum number of keys, say 200, and apply an eviction policy such as LRU or FIFO when storing the 201st. Alternatively, an out-of-process cache such as Redis avoids the leak entirely while also solving cache sharing between processes.
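
A minimal sketch of the capped-cache idea, using a Map’s insertion order for FIFO eviction (the 200-key limit mirrors the example above; a production system might prefer an LRU library or Redis):

```js
const MAX_KEYS = 200;
const cache = new Map();

const set = (key, value) => {
  // Evict the oldest entry (first inserted key) once the cap is reached
  if (cache.size >= MAX_KEYS && !cache.has(key)) {
    const oldestKey = cache.keys().next().value;
    cache.delete(oldestKey);
  }
  cache.set(key, value);
};

const get = (key) => cache.get(key);
```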

  2. Untimely queue consumption. In JavaScript, queues (arrays) are used to accomplish many tasks. Queues often act as the intermediary in a producer-consumer model, and when producers outpace consumers, the queue piles up and eventually causes a memory leak.

An example is application log collection. Without much thought you might write logs to a database. But logs are usually massive, and a database sits on top of the file system, so writes are much slower than writing to a file directly. Database write operations therefore pile up, the associated JavaScript scopes are never released, memory usage never falls, and you have a memory leak. One solution is to monitor the queue length and, as soon as it starts to pile up, raise an alarm through the monitoring system to notify the relevant people. Another is to give every asynchronous call a timeout: if it does not complete within the limit, pass a timeout error through the callback, so that every asynchronous call has a bounded response time, putting a lower bound on consumption speed.
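
A minimal sketch of the timeout idea with promises; `withTimeout` and `writeLogToDatabase` are hypothetical names, not built-ins:

```js
// Reject if the wrapped operation does not settle within `ms` milliseconds
const withTimeout = (promise, ms) =>
  Promise.race([
    promise,
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms)
    ),
  ]);

// Usage: each write either settles or fails within 1 second, giving
// consumption speed a lower bound instead of piling up silently.
withTimeout(writeLogToDatabase(record), 1000)
  .catch((err) => console.error('log write failed:', err));
```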

  3. Unreleased scopes. This is the classic closure scenario: in V8, as long as an object is still referenced, it will never be garbage collected.
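
A minimal sketch of this scenario; the array of handlers stands in for any long-lived reference such as an event listener list:

```js
const handlers = [];

function createHandler() {
  const largeData = new Array(1e6).fill('*'); // captured by the closure below

  return function handler() {
    return largeData.length; // this reference keeps largeData alive
  };
}

// Every call adds a handler that is never removed, so each largeData
// array stays reachable and can never be garbage collected.
handlers.push(createHandler());
```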

The core idea behind preventing memory leaks is to keep massive request volume in mind when writing Node code, since most Node memory leaks are triggered by it. This is the key mindset difference between writing server-side and browser-side code.

How to locate memory leaks

If we really do have a memory leak in production, must we delete the database and run away? Not necessarily. Several common tools can help you locate the leak. If all of them fail, then run.

  1. node-heapdump lets you take snapshots of the V8 heap for after-the-fact analysis. Here is an example of using it to locate a memory leak:

    1. Install heapdump:

    ```bash
    npm install heapdump --save
    ```
    2. Require heapdump in your code and simulate a cache leak by continually adding data to a cache object:
    ```js
    const heapdump = require('heapdump');
    console.log(process.pid);

    const cache = {};
    let index = 0;
    const cacheValue = new Array(2000);

    setInterval(() => {
      cache[`testHeapKey---${index}`] = cacheValue;
      index++;
    }, 0);
    ```
    3. There are two ways to actively generate a snapshot; this example uses the second:
      1. Call writeSnapshot([filename], [callback]) directly
      2. Send a signal to the process: kill -USR2 <pid>
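
    For completeness, a minimal sketch of the first method (the filename here is illustrative):

    ```js
    // writeSnapshot([filename], [callback]) writes a snapshot on demand;
    // the callback receives the resulting filename.
    heapdump.writeSnapshot(`${Date.now()}.heapsnapshot`, (err, filename) => {
      if (err) console.error(err);
      else console.log(`heap snapshot written to ${filename}`);
    });
    ```
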
    4. By default, after the process receives the signal, a snapshot file is created in the project root directory. This file is not meant for human reading; we need Chrome’s developer tools to parse it.

    5. Open the Chrome developer tools, go to the Memory panel, and click the Load button to load the memory snapshot file we just generated.

    6. To locate a memory leak it is best to compare snapshots from two different points in time; the contrast makes problems much easier to spot. In a real case, if we suspect a Node service is leaking, we might export one snapshot on Monday and another on Friday, then compare the two to see which parts of memory grew. Here, I took snapshots at two points in time and imported them into the Chrome Memory panel: the first snapshot totals 9.2MB, while the second has grown to 42.9MB, which indicates a memory leak.

    7. Using the Memory panel’s comparison mode, you can see clearly which object types are increasing. In this case, the second snapshot contains 358,586 more String objects than the first, so we can infer that it is String objects that keep growing.

    8. Click into the String objects to see which strings were added. Here you can see the strings added by my leak-simulation code, which pinpoints where the leak occurs; from there you can modify the code to optimize and fix it.

  2. node-memwatch provides hooks that fire when a memory leak may have occurred and runs the listeners we register. The node-memwatch npm documentation covers it well, so I won’t discuss it further here.
  3. memwatch-next. The best time to measure memory usage is right after a V8 GC, which is exactly what memwatch does: each time V8 performs a garbage collection, memwatch emits an event we can listen to for memory statistics (a minimal sketch follows this list). See the memwatch-next npm documentation for usage.
  4. If you want a ready-made, complete monitoring solution, there is also Alibaba’s Node performance monitoring platform.
  5. PM2 has a maximum-memory restart feature that can automatically restart our Node process after it crosses a memory threshold or crashes from a leak, giving the service one more layer of protection.
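
As promised in item 3 above, a minimal sketch of listening to memwatch-next events (the handler bodies are illustrative):

```js
const memwatch = require('memwatch-next');

// Fires after each full garbage collection with heap usage statistics
memwatch.on('stats', (stats) => {
  console.log('post-GC heap stats:', stats);
});

// Fires when heap usage keeps growing across consecutive GCs,
// suggesting a possible leak
memwatch.on('leak', (info) => {
  console.error('possible memory leak detected:', info);
  // e.g. trigger an alarm or write a heap snapshot here
});
```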

There are many other tools for locating memory leaks, but the idea is always the same: take a snapshot of V8’s memory and analyze it. A memory snapshot is like a doctor’s ultrasound scan of a patient; once you can see what’s wrong, you can fix it.

Of course, we can also capture snapshots of a process’s memory from a script and do whatever we want with them, such as triggering monitoring alarms or drawing memory usage graphs.

conclusion

Node does need to care about memory. Although V8 handles memory management for us and we never manipulate memory directly, we still need to understand Node’s memory and garbage collection mechanisms, know the common leak scenarios, and know the tools for locating them, so that we can handle memory problems in production with confidence.

references

Chapter 5 of Node.js