- How JavaScript Works: Memory Management + How to Handle 4 Common Memory Leaks
- Originally written by Alexander Zlatkov
- The Nuggets translation Project
- Permanent link to this article: github.com/xitu/gold-m…
- Translator: Cao Xiaoshuai
- Proofreader: PCAaron Usey95
A few weeks ago, we started a series aimed at digging deeper into JavaScript and how it actually works. The idea is that by knowing the building blocks of JavaScript and how they work together, we will be able to write better code and applications.
The first article in this series provides an overview of the engine, the runtime, and the call stack. The second article takes a close look at the internals of Google’s V8 JavaScript engine and offers some advice on how to write better JavaScript code.
In this third article, we discuss another topic that developers increasingly overlook as the programming languages they use every day become more mature and complex: memory management. We also provide some tips on handling memory leaks in JavaScript that we follow at SessionStack, where we need to make sure that SessionStack causes no memory leaks and does not increase the memory consumption of the web application in which it is integrated.
Overview
Languages like C have low-level memory management primitives such as malloc() and free(). Developers use these primitives to explicitly allocate memory from, and free it back to, the operating system.
JavaScript, meanwhile, allocates memory when things (objects, strings, and so on) are created and “automatically” frees it when they are no longer in use, a process known as garbage collection. This seemingly “automatic” release of resources is a source of confusion: it gives JavaScript (and other high-level language) developers the false impression that they can choose not to care about memory management. This is a mistake.
Even when working with high-level languages, developers should have some understanding of memory management (at least the basics). Sometimes automatic memory management has problems (such as bugs or implementation limitations in the garbage collector) that developers have to understand in order to handle them properly (or to find a suitable workaround with minimal trade-offs).
Memory life cycle
No matter which programming language you use, the memory life cycle is always roughly the same:
Here’s an overview of what happens at each step of the cycle:
- Memory allocation: memory is allocated by the operating system, which allows your program to use it. In low-level languages (such as C), this is an explicit operation that the developer has to handle. In high-level languages, however, this is taken care of for you.
- Memory usage: this is when your program actually uses the previously allocated memory. Read and write operations take place as you use the allocated variables in your code.
- Memory release: this is when you release the memory you no longer need so that it becomes free and available again. As with memory allocation, this is an explicit operation in low-level languages.
For a quick overview of the concepts of the call stack and the memory heap, read our first article on the subject.
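As a minimal sketch, the three steps above might look like this in JavaScript. The variable names are made up for illustration, and the release step is only a hint to the engine, not an immediate free:

```javascript
// 1. Allocation: the engine reserves memory for the array and its elements.
let readings = new Array(1000).fill(0);

// 2. Use: writing to and reading from the allocated memory.
readings[0] = 42;
const firstReading = readings[0];

// 3. Release: JavaScript has no free(). Dropping the last reference only makes
//    the array eligible for garbage collection at some later, unspecified time.
readings = null;

console.log(firstReading); // 42
```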
What is memory?
Before jumping straight to memory in JavaScript, we’ll briefly discuss what memory is in general and how it works:
At the hardware level, computer memory consists of a large number of flip-flops. Each flip-flop contains a few transistors and can store one bit. Individual flip-flops are addressable by a unique identifier, so we can read and overwrite them. Thus, conceptually, we can think of the entire computer memory as one giant array of bits that we can read and write.
As humans, we are not very good at doing all our thinking and arithmetic in bits, so we organize them into larger groups that together can represent numbers. Eight bits are called a byte. Beyond bytes, there are words (sometimes 16 bits, sometimes 32 bits).
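The grouping of bits into bytes can be observed from JavaScript with typed arrays. The sketch below stores one 32-bit number and then looks at the same memory as four separate bytes; the value 0x01020304 is an arbitrary choice for the demonstration:

```javascript
// Reserve 4 bytes (32 bits) of raw memory.
const buffer = new ArrayBuffer(4);

// Store one 32-bit number. `false` asks for big-endian byte order,
// so the byte layout below does not depend on the machine.
new DataView(buffer).setUint32(0, 0x01020304, false);

// View the very same memory as four individual bytes.
const bytes = Array.from(new Uint8Array(buffer));
console.log(bytes); // [ 1, 2, 3, 4 ]
```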
A lot of things are stored in memory:
- All variables and other data used by all programs.
- Program code, including operating system code.
The compiler and the operating system together handle most of the memory management for you, but we recommend that you take a look at what’s going on underneath.
When you compile your code, the compiler can examine primitive data types and calculate ahead of time how much memory they will need. The required amount is then allocated to the program in the call stack space. The space in which these variables are allocated is called the stack space because, as functions get called, their memory is added on top of the existing memory. As they terminate, they are removed in LIFO (last-in, first-out) order. For example, consider the following declarations:
int n; // 4 bytes
int x[4]; // array of 4 elements, each 4 bytes
double m; // 8 bytes
The compiler can immediately see that the code requires
4 + 4 × 4 + 8 = 28 bytes.
That is how it works with the current sizes of integers and doubles. About 20 years ago, integers were typically 2 bytes and doubles 4 bytes. Your code should never depend on the sizes of the basic data types at any given moment.
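JavaScript does not expose the size of ordinary variables, but typed arrays have fixed element sizes, so the same arithmetic can be sketched with them. Treating Int32Array and Float64Array as stand-ins for the C types int and double is an assumption made for this illustration:

```javascript
// Size in bytes of one element of each typed array.
const intSize = Int32Array.BYTES_PER_ELEMENT;      // 4, like a 32-bit C int
const doubleSize = Float64Array.BYTES_PER_ELEMENT; // 8, like a C double

// int n; int x[4]; double m;
const totalBytes = intSize + 4 * intSize + doubleSize;
console.log(totalBytes); // 28
```

Reading the sizes from BYTES_PER_ELEMENT rather than hard-coding 4 and 8 follows the advice above about not depending on assumed sizes.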
The compiler inserts code that interacts with the operating system to request the necessary number of bytes on the stack for your variables to be stored.
In the example above, the compiler knows the exact memory address of each variable. In fact, whenever we write to the variable n, this gets translated internally into something like “memory address 4127963”.
Note that if we attempted to access x[4] here, we would access the data associated with m. That is because we would be accessing an element of the array that does not exist: it lies 4 bytes beyond the last actually allocated element, x[3], so we may end up reading (or overwriting) some of m’s bits. This would almost certainly have very undesirable consequences for the rest of the program.
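The out-of-bounds access described above can be simulated in JavaScript with a DataView over a 28-byte buffer. The layout (n at byte 0, x at bytes 4 to 19, m at bytes 20 to 27) and the helper names are assumptions for this sketch; a real compiler may arrange variables differently:

```javascript
// Hypothetical layout mirroring the C example:
// n at byte 0, x at bytes 4..19, m at bytes 20..27.
const stack = new DataView(new ArrayBuffer(28));
const X_OFFSET = 4;
const M_OFFSET = 20;

// Address x[i] the way C does: base address + i * element size, no bounds check.
const readX = (i) => stack.getInt32(X_OFFSET + i * 4, false);
const writeX = (i, value) => stack.setInt32(X_OFFSET + i * 4, value, false);

stack.setFloat64(M_OFFSET, 1.5, false); // m = 1.5
const leaked = readX(4);                // "x[4]" really reads the first 4 bytes of m
writeX(4, 0);                           // "x[4] = 0" silently overwrites part of m
const corruptedM = stack.getFloat64(M_OFFSET, false);

console.log(leaked !== 0, corruptedM); // true 0

// An ordinary JavaScript array is bounds-checked instead.
const safe = [1, 2, 3, 4];
console.log(safe[4]); // undefined
```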
When functions call other functions, each gets its own chunk of the stack when it is called. That chunk holds all of its local variables, as well as a program counter that remembers where it was in its execution. When the function finishes, its memory block is again made available for other purposes.
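The last-in, first-out behavior of stack frames can be made visible with a small recursive function. This is only an illustration of call order; the function and array names are invented, and JavaScript gives no direct access to the frames themselves:

```javascript
const events = [];

function countdown(n) {
  events.push('enter ' + n); // a new frame holding this call's own `n` is pushed
  if (n > 0) {
    countdown(n - 1);
  }
  events.push('leave ' + n); // the frame is popped when the call returns
}

countdown(2);
console.log(events.join(', '));
// enter 2, enter 1, enter 0, leave 0, leave 1, leave 2
```

The call that started last is the first to finish, which is the LIFO order described earlier.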
Dynamic allocation
Unfortunately, things are no longer simple when we don’t know how much memory a variable needs at compile time. Suppose we wanted to do something like this:
int n = readInput(); // reads input from the user
...
// create an array with "n" elements
Here, at compile time, the compiler does not know how much memory the array requires, because it is determined by the user-supplied value.
Therefore, it cannot allocate space for variables on the stack. Instead, our program needs to explicitly ask the operating system for the correct amount of memory at run time. This memory is allocated from heap space. The following table summarizes the differences between static and dynamic memory allocation:
The difference between static and dynamic memory allocation
To fully understand how dynamic memory allocation works, we would need to spend more time on pointers, which might be too much of a deviation from the topic of this article. If you’re interested in learning more, let us know in the comments, and we can cover pointers in more detail in a future article.
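In JavaScript, the run-time sizing discussed in this section could be sketched as follows. The readInput function is a hypothetical stand-in for real user input and simply returns a fixed number here:

```javascript
// Hypothetical stand-in for user input; in a real program
// the value would only be known at run time.
function readInput() {
  return 5;
}

const n = readInput();

// The engine allocates n * 4 bytes on the heap when this line runs.
const values = new Int32Array(n);
console.log(values.length, values.byteLength); // 5 20
```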
Memory allocation in JavaScript
Now we’ll explain how the first step (allocating memory) works in JavaScript.
JavaScript relieves developers of the responsibility of handling memory allocation: it allocates memory by itself when values are declared.
var n = 374; // allocates memory for a number
var s = 'sessionstack'; // allocates memory for a string
var o = {
  a: 1,
  b: null
}; // allocates memory for an object and its contained values
var a = [1, null, 'str']; // (like object) allocates memory for the
                          // array and its contained values
function f(a) {
  return a + 3;
} // allocates a function (which is a callable object)
// function expressions also allocate an object
someElement.addEventListener('click', function() {
  someElement.style.backgroundColor = 'blue';
}, false);
Some function calls also cause object allocation:
var d = new Date(); // allocates a Date object
var e = document.createElement('div'); // allocates a DOM element
Methods can allocate new values or objects:
var s1 = 'sessionstack';
var s2 = s1.substr(0, 3); // s2 is a new string
// Since strings are immutable,
// JavaScript may decide to not allocate memory,
// but just store the [0, 3] range.
var a1 = ['str1', 'str2'];
var a2 = ['str3', 'str4'];
var a3 = a1.concat(a2);
// new array with 4 elements being
// the concatenation of a1 and a2 elements
Using memory in JavaScript
Using allocated memory in JavaScript basically means reading from and writing to it.
This can be done by reading or writing the value of a variable or object property, or even passing a variable to a function.
Free memory when it is no longer needed
Most memory management problems arise at this stage.
The most difficult task here is to figure out when the allocated memory is no longer needed. It often requires the developer to determine which part of the program no longer needs the memory and to free it.
High-level languages embed a piece of software called the garbage collector, whose job is to track memory allocation and use in order to find when a piece of allocated memory is no longer needed, in which case it automatically frees it.
Unfortunately, this process is an approximation, since the general problem of knowing whether some piece of memory is still needed is undecidable (it can’t be solved by an algorithm).
Most garbage collectors work by collecting memory that can no longer be accessed, for example when all variables pointing to it have gone out of scope. That is, however, an under-approximation of the set of memory that could be collected, because at any point a memory location may still have a variable in scope pointing to it and yet never be accessed again.
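The gap between "reachable" and "still needed" can be sketched with a cache that is written to but never read. The cache and handleRequest names are hypothetical and exist only for this illustration:

```javascript
// Hypothetical cache: every payload stays reachable through this Map.
const cache = new Map();

function handleRequest(id) {
  const payload = { id, data: new Array(1000).fill(id) };
  cache.set(id, payload); // stored "just in case", but nothing ever calls cache.get()
  return payload.data.length;
}

for (let id = 0; id < 3; id++) {
  handleRequest(id);
}

// All three payloads are still reachable, so no collector may free them,
// even though the program will never read them again.
console.log(cache.size); // 3
```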
Garbage collection mechanism
Because finding out whether some memory is “no longer needed” is undecidable, garbage collectors implement a restricted solution to the general problem. This section explains the concepts needed to understand the main garbage collection algorithms and their limitations.
Memory references
The main concept that garbage collection algorithms rely on is that of a reference.
In the context of memory management, an object is said to reference another object if the former has access to the latter (either implicitly or explicitly). For instance, a JavaScript object has a reference to its prototype (an implicit reference) and to its property values (explicit references).
In this context, the notion of an “object” is extended to something broader than regular JavaScript objects and also includes function scopes (or the global lexical scope).
Lexical scoping defines how variable names are resolved in nested functions: inner functions contain the scope of their parent functions even after the parent function has returned.
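A small closure shows a function scope acting as a referenced "object" in this sense. The makeCounter name is invented for the example:

```javascript
function makeCounter() {
  let count = 0;          // lives in makeCounter's scope
  return function next() {
    count += 1;           // the inner function references the parent scope
    return count;
  };
}

const next = makeCounter(); // makeCounter has returned...
next();
const second = next();      // ...but its scope is still alive, because `next` references it
console.log(second); // 2
```

As long as next is reachable, the scope holding count cannot be collected.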
Reference-counting garbage collection
This is the simplest garbage collection algorithm. An object is considered “garbage collectible” if there are zero references pointing to it.
Take a look at the following code:
var o1 = {
  o2: {
    x: 1
  }
};
// 2 objects are created.
// 'o2' is referenced by 'o1' object as one of its properties.
// None can be garbage-collected
var o3 = o1; // the 'o3' variable is the second thing that
// has a reference to the object pointed by 'o1'.
o1 = 1; // now, the object that was originally in 'o1' has a
// single reference, embodied by the 'o3' variable
var o4 = o3.o2; // reference to 'o2' property of the object.
// This object has now 2 references: one as
// a property.
// The other as the 'o4' variable
o3 = '374'; // The object that was originally in 'o1' has now zero
// references to it.
// It can be garbage-collected.
// However, what was its 'o2' property is still
// referenced by the 'o4' variable, so it cannot be
// freed.
o4 = null; // what was the 'o2' property of the object originally in
// 'o1' has zero references to it.
// It can be garbage collected.
Cycles create problems
There is a limitation when it comes to cycles. In the following example, two objects are created and reference one another, thus creating a cycle. They go out of scope after the function call, so they are effectively useless and could be freed. However, the reference-counting algorithm considers that since each of the two objects is referenced at least once, neither can be garbage-collected.
function f() {
  var o1 = {};
  var o2 = {};
  o1.p = o2; // o1 references o2
  o2.p = o1; // o2 references o1. This creates a cycle.
}
f();
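To make the limitation concrete, here is a toy model of reference counting. Real engines do not expose reference counts, so the Cell class and the addRef, release, and setField helpers are all invented for this sketch:

```javascript
// Toy model of reference counting; real engines do not expose such counters.
class Cell {
  constructor(name) {
    this.name = name;
    this.refCount = 0;
    this.fields = {};
  }
}

// A variable or a property starts pointing at `target`.
function addRef(target) {
  target.refCount += 1;
}

// A reference to `target` disappears; free it (and release its fields) at zero.
function release(target, freed) {
  target.refCount -= 1;
  if (target.refCount === 0) {
    freed.push(target.name);
    Object.values(target.fields).forEach((child) => release(child, freed));
  }
}

function setField(owner, key, target) {
  owner.fields[key] = target;
  addRef(target);
}

// Acyclic case: `a` is held by a variable, `b` only by a.child.
const freedAcyclic = [];
const a = new Cell('a');
addRef(a);
setField(a, 'child', new Cell('b'));
release(a, freedAcyclic); // the variable goes away

// Cyclic case, mirroring f() above.
const freedCyclic = [];
const o1 = new Cell('o1');
const o2 = new Cell('o2');
addRef(o1); // local variable o1
addRef(o2); // local variable o2
setField(o1, 'p', o2);
setField(o2, 'p', o1);
release(o1, freedCyclic); // f() returns: both local variables go away
release(o2, freedCyclic);

console.log(freedAcyclic); // [ 'a', 'b' ]
console.log(freedCyclic, o1.refCount, o2.refCount); // [] 1 1
```

In the cyclic case each count stays at 1 because the two cells keep pointing at each other, so nothing is ever freed.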
Mark-and-sweep algorithm
To decide whether an object is needed, this algorithm determines whether the object is reachable.
The mark-and-sweep algorithm goes through these three steps:
1. Roots: in general, roots are global variables that are referenced in the code. In JavaScript, for example, a global variable that can act as a root is the “window” object. The equivalent object in Node.js is called “global”. The garbage collector builds a complete list of all roots.
2. The algorithm then inspects all roots and their children and marks them as active (meaning they are not garbage). Anything that cannot be reached from a root is marked as garbage.
3. Finally, the garbage collector frees all memory pieces that are not marked as active and returns that memory to the operating system.
A visualization of the mark-and-sweep algorithm in action.
This algorithm is better than the previous one because “an object has zero references” implies that the object is unreachable. The opposite is not true, as we have seen with cycles.
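The three steps can be sketched over a toy heap in which every object lists the objects it references. This is a simplified model with invented names (markAndSweep, refs, heap), not how any particular engine implements collection:

```javascript
// Returns the objects in `heap` that cannot be reached from `roots`.
function markAndSweep(heap, roots) {
  const marked = new Set();
  const pending = [...roots];

  // Mark: walk everything reachable from the roots.
  while (pending.length > 0) {
    const obj = pending.pop();
    if (marked.has(obj)) continue;
    marked.add(obj);
    obj.refs.forEach((child) => pending.push(child));
  }

  // Sweep: whatever was never marked is garbage.
  return heap.filter((obj) => !marked.has(obj));
}

const globalObj = { name: 'global', refs: [] };
const kept = { name: 'kept', refs: [] };
const o1 = { name: 'o1', refs: [] };
const o2 = { name: 'o2', refs: [] };

globalObj.refs.push(kept); // reachable from the root
o1.refs.push(o2);          // o1 and o2 only reference each other
o2.refs.push(o1);

const heap = [globalObj, kept, o1, o2];
const garbage = markAndSweep(heap, [globalObj]).map((obj) => obj.name);
console.log(garbage); // [ 'o1', 'o2' ]
```

The two mutually referencing objects are collected because neither is reachable from the root, which is exactly the case reference counting cannot handle.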
As of 2012, all modern browsers ship a mark-and-sweep garbage collector. All the improvements made in the field of JavaScript garbage collection over the last years (generational, incremental, concurrent, and parallel garbage collection) are implementation improvements of this algorithm (mark-and-sweep), not improvements to the garbage collection algorithm itself, nor to its goal of deciding whether an object is reachable or not.
In this article, you can read in greater detail about tracing garbage collection, which also covers mark-and-sweep along with its optimizations.
Cycles are no longer an issue
In the cycle example above, after the function call returns, the two objects are no longer referenced by anything reachable from the global object. The garbage collector therefore finds them unreachable.
Even though there are references between the two objects, they are not reachable from the root.
Counterintuitive behavior of garbage collectors
As convenient as garbage collectors are, they come with their own set of trade-offs. One of them is non-determinism. In other words, GCs (garbage collectors) are unpredictable: you can’t really tell when a collection will be performed. This means that in some cases programs use more memory than they actually need. In other cases, short pauses may be noticeable in particularly sensitive applications.
Although non-determinism means that you cannot be certain when a collection will run, most GC implementations share the common pattern of doing collection passes during allocation. If no allocations are performed, most GCs stay idle. Consider the following scenario:
- A sizable set of allocations is performed.
- Most (or all) of these elements are marked as unreachable (suppose we null a reference pointing to a cache we no longer need).
- No further allocations are performed.
In this scenario, most GCs will not run any further collection passes. In other words, even though there are unreachable references available for collection, the collector does not claim them. These are not strictly leaks, but they still result in higher-than-usual memory usage.
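The scenario can be observed roughly in Node.js with process.memoryUsage(). This sketch assumes a Node.js environment (the call does not exist in browsers), and the printed numbers vary from run to run, so no particular output should be expected:

```javascript
// Assumes Node.js: process.memoryUsage() is not available in browsers.
const before = process.memoryUsage().heapUsed;

let cache = [];
for (let i = 0; i < 100000; i++) {
  cache.push({ index: i }); // a sizable set of allocations
}
const whileReferenced = process.memoryUsage().heapUsed;

cache = null; // everything allocated above is now unreachable...
const afterDrop = process.memoryUsage().heapUsed; // ...but may not have been collected yet

console.log({ before, whileReferenced, afterDrop });
```

Whether afterDrop is lower than whileReferenced depends on if and when the collector happened to run, which is the non-determinism described above.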
What is a memory leak?
In essence, a memory leak is a piece of memory that an application used at some point but no longer needs, and that has not yet been returned to the operating system or to the pool of free memory.
Programming languages favor different ways of managing memory. However, whether a certain piece of memory is still in use is an undecidable problem. In other words, only developers can make it clear whether a piece of memory can be returned to the operating system or not.
Certain programming languages provide features that help developers do this. Others expect developers to be completely explicit about when a piece of memory is unused. Wikipedia has good articles on manual and automatic memory management.
The four common types of memory leaks in JavaScript
1: Global variables
JavaScript handles undeclared variables in an interesting way: when a value is assigned to an undeclared variable, a new variable is created on the global object. In a browser, the global object is window, which means that
function foo(arg) {
  bar = "some text";
}
is equivalent to:
function foo(arg) {
  window.bar = "some text";
}
Let’s say the purpose of bar is only to reference a variable inside the foo function. If you don’t declare it with var, however, a redundant global variable is created. In the case above this won’t cause much harm, but you can surely imagine a more damaging scenario.
You can also accidentally create a global variable using this:
function foo() {
  this.var1 = "potential accidental global";
}
// foo called on its own, so this points to the global object (window)
// rather than being undefined.
foo();
You can avoid all of this by adding 'use strict'; at the beginning of your JavaScript files, which switches on a much stricter mode of parsing JavaScript that prevents the unexpected creation of global variables.
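The effect of strict mode on both accidental globals can be sketched with function-level 'use strict' directives. The function names and the undeclaredBar variable are made up for the demonstration:

```javascript
function strictAssign() {
  'use strict';
  try {
    undeclaredBar = 'some text'; // no var/let/const: an error in strict mode
    return 'global created';
  } catch (err) {
    return err.name;
  }
}

function strictThis() {
  'use strict';
  return this; // undefined instead of the global object
}

console.log(strictAssign()); // ReferenceError
console.log(strictThis());   // undefined
```

In strict mode the mistake surfaces immediately as an error instead of silently adding a property to the global object.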
Unexpected globals are certainly an issue, but more often than not your code is affected by explicit global variables, which by definition cannot be collected by the garbage collector. Special attention needs to be paid to global variables used to temporarily store and process large amounts of information. If you must use a global variable to store data, make sure to assign null to it or reassign it once you are done with it.
2: Forgotten timers or callbacks
Take setInterval, for example, as it is often used in JavaScript.
Libraries that provide observers and other facilities that accept callbacks usually make sure that all references to the callbacks become unreachable once their instances are unreachable too. However, code like the following is not uncommon:
var serverData = loadData();
setInterval(function() {
  var renderer = document.getElementById('renderer');
  if (renderer) {
    renderer.innerHTML = JSON.stringify(serverData);
  }
}, 5000); // This will be executed every ~5 seconds.
The snippet above shows the consequences of using timers that reference nodes or data that is no longer needed.
The renderer object may be replaced or removed at some point, which makes the whole block encapsulated by the interval handler redundant. If that happens, neither the handler nor its dependencies will be collected, because the interval would need to be stopped first (remember, it is still active). It all boils down to the fact that serverData, which surely stores and processes loads of data, will not be collected either.
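One way to avoid this, sketched below, is to have the handler stop its own interval once the node is gone. The startPolling helper, its parameters, and the fake renderer object are assumptions made so the example runs without a browser; in a page, getRenderer would typically call document.getElementById:

```javascript
// Hypothetical helper: polls until the renderer disappears, then cleans up.
function startPolling(getRenderer, serverData, intervalMs) {
  const state = { active: true, timerId: null };

  function stop() {
    clearInterval(state.timerId); // the handler and serverData can now be collected
    state.active = false;
  }

  function tick() {
    const renderer = getRenderer();
    if (!renderer) {
      stop(); // the node is gone: stop instead of staying active forever
      return;
    }
    renderer.innerHTML = JSON.stringify(serverData);
  }

  state.timerId = setInterval(tick, intervalMs);
  return { tick, stop, state };
}

// Stand-in for a DOM node so the sketch runs outside a browser.
let fakeRenderer = { innerHTML: '' };
const poller = startPolling(() => fakeRenderer, { users: 3 }, 5000);

poller.tick(); // called by hand here instead of waiting 5 seconds
const firstRender = fakeRenderer.innerHTML;

fakeRenderer = null; // the node was removed
poller.tick();       // notices the missing node and clears the interval

console.log(firstRender, poller.state.active); // {"users":3} false
```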
When using observers, make sure you make an explicit call to remove them once you are done with them (either the observer is no longer needed or the object is about to become unreachable).
Fortunately, most modern browsers do this job for you: they automatically collect the observer handlers once the observed object becomes unreachable, even if you forgot to remove the listener. In the past, some browsers (good old IE6, for example) were unable to handle these cases.
Still, it is in line with best practices to remove observers once the object becomes obsolete. See the following example:
var element = document.getElementById('launch-button');
var counter = 0;
function onClick(event) {
  counter++;
  element.innerHTML = 'text ' + counter;
}
element.addEventListener('click', onClick);
// Do stuff
element.removeEventListener('click', onClick);
element.parentNode.removeChild(element);
// Now when element goes out of scope,
// both element and onClick will be collected even in old browsers
// that don't handle cycles well.
Nowadays, browsers have garbage collectors that can detect these cycles and handle them appropriately, so you no longer need to call removeEventListener before making a node unreachable.
If you use the jQuery APIs (other libraries and frameworks support this too), you can also have the listeners removed before a node becomes obsolete. The library then makes sure that there are no memory leaks even when the application runs under older browser versions.
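If you prefer to stay explicit, one pattern that makes removal hard to forget is to have registration return its own cleanup function. The listen helper below is hypothetical; the sketch uses the standard EventTarget and Event globals, which exist in browsers and in recent Node.js versions:

```javascript
// Hypothetical helper: registers a listener and returns a function that removes it.
function listen(target, type, handler) {
  target.addEventListener(type, handler);
  return function dispose() {
    target.removeEventListener(type, handler);
  };
}

// EventTarget and Event are globals in browsers and in recent Node.js versions.
const button = new EventTarget();
let clicks = 0;
const dispose = listen(button, 'click', () => { clicks += 1; });

button.dispatchEvent(new Event('click')); // the handler runs
dispose();                                // the listener is removed
button.dispatchEvent(new Event('click')); // nothing is listening any more
console.log(clicks); // 1
```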
3: Closures
A key aspect of JavaScript development is closures: an inner function that has access to the variables of the outer (enclosing) function. Due to implementation details of the JavaScript runtime, it is possible to leak memory in the following way:
var theThing = null;
var replaceThing = function () {
  var originalThing = theThing;
  var unused = function () {
    if (originalThing) // a reference to 'originalThing'
      console.log("hi");
  };
  theThing = {
    longStr: new Array(1000000).join('*'),
    someMethod: function () {
      console.log("message");
    }
  };
};
setInterval(replaceThing, 1000);
Once replaceThing is called, theThing gets a new object that consists of a large string (longStr) and a new closure (someMethod). Yet originalThing is referenced by the closure held in the unused variable (originalThing being the value of theThing from the previous call to replaceThing). The thing to remember is that once a scope is created for closures that live in the same parent scope, that scope is shared.
In this case, the scope created for the closure someMethod is shared with unused, and unused holds a reference to originalThing. Even though unused is never called, someMethod can be used through theThing outside the scope of replaceThing (for example, somewhere globally). Since someMethod shares its closure scope with unused, the reference that unused holds to originalThing forces it to stay alive (along with the entire scope shared between the two closures). This prevents it from being collected.
All of this can add up to a considerable memory leak. As the snippet above runs over and over, you can expect memory usage to keep rising, and it does not shrink when the garbage collector runs. A linked list of closures is created (with the theThing variable as its root), and each closure scope carries an indirect reference to the large string.
The Meteor team found this problem, and they have a great article that describes it in detail.
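One workaround often suggested for this pattern is to clear originalThing at the end of replaceThing, so the shared scope no longer points at the previous object. The sketch below is a variation on the snippet above under that assumption; it uses a smaller string and a plain loop instead of setInterval so that it finishes immediately:

```javascript
let theThing = null;

function replaceThing() {
  let originalThing = theThing;
  const unused = function () {
    if (originalThing) console.log('hi');
  };
  theThing = {
    longStr: new Array(1000).join('*'), // smaller than above to keep the demo light
    someMethod: function () {
      console.log('message');
    }
  };
  // Break the chain: the shared scope no longer points at the previous object.
  originalThing = null;
}

for (let i = 0; i < 5; i++) {
  replaceThing();
}
console.log(theThing.longStr.length); // 999
```

The example only shows the shape of the fix; confirming that memory stays flat would require a heap profiler.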
4: Out-of-DOM references
In some cases developers store DOM nodes inside data structures. Suppose you want to rapidly update the contents of several rows in a table. If you store a reference to each DOM row in a dictionary or an array, there will be two references to the same DOM element: one in the DOM tree and one in the dictionary. If you decide to get rid of these rows, you need to remember to make both references unreachable.
var elements = {
  button: document.getElementById('button'),
  image: document.getElementById('image')
};
function doStuff() {
  elements.image.src = 'http://example.com/image_name.png';
}
function removeImage() {
  // The image is a direct child of the body element.
  document.body.removeChild(document.getElementById('image'));
  // At this point, we still have a reference to #image in the
  // global elements object. In other words, the image element is
  // still in memory and cannot be collected by the GC.
}
There is an additional factor to consider when it comes to internal nodes or leaf nodes within a DOM tree. If you keep a reference to a table cell (td tag) in your code and decide to remove the table from the DOM but keep the reference to that particular cell, you can expect a serious memory leak. You might think that the garbage collector would free everything except that cell. But that is not the case. Because a cell is a child node of the table, and the child node keeps a reference to the parent node, such a single reference to a table cell keeps the entire table in memory.
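A sketch of dropping both references at once is shown below. The removeTracked helper is hypothetical, and the body object is a minimal stand-in for a DOM node so that the example runs outside a browser:

```javascript
// Hypothetical helper: removes a tracked node from the tree and forgets it.
function removeTracked(elements, key) {
  const node = elements[key];
  if (node && node.parentNode) {
    node.parentNode.removeChild(node); // reference #1: the DOM tree
  }
  delete elements[key];                // reference #2: our own data structure
}

// Minimal stand-ins for DOM nodes so the sketch runs outside a browser.
const body = {
  children: [],
  removeChild(node) {
    this.children = this.children.filter((child) => child !== node);
    node.parentNode = null;
  }
};

const elements = { image: { parentNode: body } };
body.children.push(elements.image);

removeTracked(elements, 'image');
console.log(body.children.length, 'image' in elements); // 0 false
```

With both references gone, nothing keeps the node (or, for a table cell, its whole parent table) alive.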
At SessionStack we try to follow these best practices and write code that handles memory allocation properly, and here is why:
Once you integrate SessionStack into your production web application, it starts recording everything: all DOM changes, user interactions, JavaScript exceptions, stack traces, failed network requests, debug messages, and so on.
With SessionStack, you can replay issues in your web application as videos and see everything that happened to your user. All of this has to take place with no performance impact on your web application.
Since users can reload the page or navigate around your application, all observers, interceptors, variable allocations, and so on have to be handled properly so that they don’t cause any memory leaks or increase the memory consumption of the web application in which we are integrated.
Here’s a free plan so you can try it out.
Resources
- Www-bcf.usc.edu/~dkempe/CS1…
- Blog.meteor.com/an-interest…
- www.nodesimplified.com/2017/08/jav…
- Auth0.com/blog/four-t…
The Nuggets Translation Project is a community that translates high-quality technical articles from the Internet, drawing on English articles shared on Nuggets. Its content covers Android, iOS, React, front-end, back-end, product, design, and other fields. For more high-quality translations, follow the Nuggets Translation Project, its official Weibo, and its Zhihu column.