A few weeks ago we started a series aimed at digging deeper into JavaScript and how it actually works: we believe that knowing the building blocks of JavaScript and how they fit together helps you write better code and applications.
The first post in the series gave an overview of the engine, the runtime, and the call stack. The second post took an in-depth look at the internals of Google’s V8 JavaScript engine and offered some tips on how to write better JavaScript code.
In this third post, we discuss another important topic that developers increasingly neglect because of the growing maturity and complexity of the programming languages we use every day: memory management. We also provide some advice on how to handle memory leaks in JavaScript. We follow this advice at SessionStack because we need to make sure that SessionStack causes no memory leaks and does not increase the memory consumption of the web app in which it is integrated.
Overview
Low-level languages such as C expose memory management primitives like malloc() and free(). Developers use these primitives to explicitly allocate memory from the operating system and to release it again.
JavaScript, by contrast, allocates memory when things (objects, strings, etc.) are created and “automatically” frees it when they are no longer used, a process called garbage collection. This seemingly automatic freeing of resources is a source of confusion: it gives JavaScript developers (and developers in other high-level languages) the false impression that they can choose not to care about memory management. That is a big mistake.
Even when working with high-level languages, developers should understand memory management, at least at a basic level. Sometimes automatic memory management has issues (such as bugs or implementation limitations in garbage collectors) that developers have to understand in order to handle them properly, or to find a suitable workaround with minimal cost and overhead.
Memory life cycle
No matter what programming language you use, the memory life cycle is pretty much always the same: allocate the memory you need, use it, and release it when it is no longer needed.
Here’s an overview of what happens at each step of the cycle:
- Allocate memory: memory is allocated by the operating system so that your program can use it. In low-level languages (such as C), this is an explicit operation that the developer has to handle. In high-level languages, it is taken care of for you.
- Use memory: this is when your program actually makes use of the previously allocated memory. Read and write operations take place as you use the allocated variables in your code.
- Release memory: this is the time to release all the memory you no longer need so that it becomes free and available again. As with allocation, this operation is explicit in low-level languages.
For a quick overview of the concepts of the call stack and the memory heap, you can read the first post in this series.
What is memory?
Before jumping straight into memory in JavaScript, let’s briefly discuss what memory is in general and how it works in a nutshell.
At the hardware level, computer memory consists of a large number of flip-flops. Each flip-flop contains a few transistors and is capable of storing one bit. Individual flip-flops are addressable by a unique identifier, so we can read and overwrite them. Conceptually, then, we can think of the entire computer memory as one giant array of bits that we can read and write.
As humans, we are not very good at thinking and doing arithmetic in bits, so we organize them into larger groups that can be used to represent numbers. Eight bits are called one byte. Beyond bytes, there are words (which are sometimes 16 bits, sometimes 32 bits).
Many things are stored in memory:
- All variables and all other data used in the program.
- Program code, including the operating system.
The compiler and the operating system work together to take care of most of the memory management for you, but we recommend that you take a look at what’s going on under the hood.
When you compile your code, the compiler can examine primitive data types and calculate ahead of time how much memory they will need. The required amount is then allocated to the program in the call stack space. The space in which these variables are allocated is called the stack space because, as functions get called, their memory is added on top of the existing memory. As they terminate, they are removed in LIFO (last in, first out) order. For example, consider the following declarations:
int n; // 4 bytes
int x[4]; // array of 4 elements, each 4 bytes
double m; // 8 bytes
The compiler can immediately see that the code requires
4 + 4 × 4 + 8 = 28 bytes.
That is how it works with the current sizes of integers and doubles. About 20 years ago, integers were typically 2 bytes and doubles 4 bytes. Your code should never depend on the size of the basic data types at any given moment.
The compiler inserts code that interacts with the operating system to request the necessary number of bytes to store your variables on the stack.
In the example above, the compiler knows the exact memory address of each variable. In fact, whenever we write to the variable n, this is translated internally into something like “memory address 4127963”.
Note that if we attempted to access x[4] here, we would be accessing the data associated with m. That is because we would be accessing an element of the array that does not exist: it lies 4 bytes beyond x[3], the last element actually allocated in the array, and we may end up reading (or overwriting) some of m’s bits. This would almost certainly have very undesired consequences for the rest of the program.
When functions call other functions, each one gets its own chunk of the stack when it is called. It keeps all of its local variables there, along with a program counter that remembers where in its execution it was. When the function finishes, its memory block is again made available for other purposes.
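The layout arithmetic above can be imitated in JavaScript with a DataView over a 28-byte ArrayBuffer. This is only a sketch: the buffer stands in for the stack space, the offsets assume the variables are packed back to back in declaration order (real compilers may reorder or pad them), and the names offsetN, offsetX, offsetM and addressOfX are made up for illustration.

```javascript
// Sizes assumed by the C example: 4-byte int, 8-byte double.
const INT_SIZE = 4;
const DOUBLE_SIZE = 8;

// Assumed packed layout, in declaration order: n, x[0..3], m.
const offsetN = 0;
const offsetX = offsetN + INT_SIZE;        // 4
const offsetM = offsetX + 4 * INT_SIZE;    // 20
const totalBytes = offsetM + DOUBLE_SIZE;  // 28

// A plain buffer standing in for the stack space.
const memory = new DataView(new ArrayBuffer(totalBytes));

// "Address" of x[i]: base of the array plus i elements.
function addressOfX(i) {
  return offsetX + i * INT_SIZE;
}

memory.setInt32(offsetN, 7, true);       // n = 7
memory.setFloat64(offsetM, 3.14, true);  // m = 3.14

// x[4] does not exist: its computed address is where m starts,
// so this write clobbers the first four bytes of m.
memory.setInt32(addressOfX(4), 12345, true);

const mAfterOverflow = memory.getFloat64(offsetM, true);
console.log(totalBytes, addressOfX(4) === offsetM, mAfterOverflow === 3.14);
// prints: 28 true false
```

Unlike C, JavaScript itself never lets an ordinary array index spill into a neighboring variable; the DataView is used here only to make the byte offsets visible.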
Dynamic allocation
Unfortunately, things are not quite as easy when we don’t know at compile time how much memory a variable will need. Suppose we want to do something like the following:
int n = readInput(); // reads input from the user
// create an array with "n" elements
Here, at compile time, the compiler does not know how much memory the array will need, because it is determined by the value provided by the user.
It therefore cannot allocate room for the variable on the stack. Instead, our program needs to explicitly ask the operating system for the right amount of space at run time. This memory is assigned from the heap space. The following table summarizes the differences between static and dynamic memory allocation.
To fully understand how dynamic memory allocation works, we would need to spend more time on it, which would probably take us too far from the topic of this post. If you’re interested in learning more, let us know in the comments and we can go into more detail in a future post.
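In JavaScript terms, the same situation looks roughly like the sketch below: the size of the buffer is only known at run time, so it has to be allocated dynamically. readInput here is a hypothetical stand-in that returns a fixed value instead of actually asking the user, and a typed array is used only because it makes the number of bytes visible.

```javascript
// Hypothetical stand-in for user input; a real program would read this
// from a form field, a prompt, or the command line.
function readInput() {
  return 5;
}

const n = readInput();

// The element count is only known at run time, so this allocation
// cannot be sized ahead of time. Each Int32 element takes 4 bytes.
const values = new Int32Array(n);

console.log(values.length, values.byteLength);
// prints: 5 20
```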
Allocation in JavaScript
Now let’s see how the first step (allocating memory) works in JavaScript.
JavaScript relieves developers of the responsibility of allocating memory by hand: it does so by itself whenever values are declared.
var n = 374; // allocates memory for a number
var s = 'sessionstack'; // allocates memory for a string
var o = {
  a: 1,
  b: null
}; // allocates memory for an object and its contained values
var a = [1, null, 'str']; // (like an object) allocates memory for the
                          // array and its contained values
function f(a) {
  return a + 3;
} // allocates a function (which is a callable object)
// function expressions also allocate an object
someElement.addEventListener('click', function() {
  someElement.style.backgroundColor = 'blue';
}, false);
Some function calls result in object allocation as well:
var d = new Date(); // allocates a Date object
var e = document.createElement('div'); // allocates a DOM element
Methods can allocate new values or objects:
var s1 = 'sessionstack';
var s2 = s1.substr(0, 3); // s2 is a new string
// Since strings are immutable,
// JavaScript may decide not to allocate memory,
// but just store the [0, 3] range.
var a1 = ['str1', 'str2'];
var a2 = ['str3', 'str4'];
var a3 = a1.concat(a2);
// new array with 4 elements being
// the concatenation of a1 and a2 elements
Using memory in JavaScript
Using allocated memory in JavaScript basically means reading from it and writing to it.
This can be done by reading or writing the value of a variable or an object property, or even by passing an argument to a function.
Releasing memory when it is no longer needed
Most memory management issues come up at this stage.
The hardest task here is figuring out when allocated memory is no longer needed. In low-level languages, it is often up to the developer to determine where in the program a piece of memory is no longer needed and to free it.
High-level languages embed a piece of software called a garbage collector, whose job is to track memory allocation and use in order to find out when a piece of allocated memory is no longer needed, in which case it frees it automatically.
Unfortunately, this process is an approximation, because the general problem of knowing whether some piece of memory is still needed is undecidable (it cannot be solved by an algorithm).
Most garbage collectors work by collecting memory that can no longer be accessed, for example because all the variables pointing to it have gone out of scope. That is, however, an under-approximation of the set of memory that could be collected: at any point, a memory location may still have a variable in scope pointing to it and yet never be accessed again.
Garbage collection
Because deciding whether some memory is “no longer needed” is undecidable, garbage collectors implement a restricted solution to the general problem. This section explains the notions needed to understand the main garbage collection algorithms and their limitations.
Memory references
The main concept garbage collection algorithms rely on is that of a reference.
In the context of memory management, an object is said to reference another object if the former has access to the latter (either implicitly or explicitly). For instance, a JavaScript object has a reference to its prototype (an implicit reference) and to its property values (explicit references).
In this context, the notion of an “object” is broader than regular JavaScript objects and also includes function scopes (or the global lexical scope).
Lexical scoping defines how variable names are resolved in nested functions: inner functions contain the scope of their parent function, even if the parent function has returned.
Reference-counting garbage collection
This is the simplest garbage collection algorithm. An object is considered garbage-collectible if there are zero references pointing to it.
Look at the following code:
var o1 = {
  o2: {
    x: 1
  }
};
// Two objects are created.
// 'o2' is referenced by 'o1' as one of its properties.
// Neither can be garbage-collected.
var o3 = o1; // The 'o3' variable is the second thing that
             // has a reference to the object pointed to by 'o1'.
o1 = 1; // Now the object that was originally in 'o1' has a
        // single reference, held by the 'o3' variable.
var o4 = o3.o2; // Reference to the 'o2' property of the object.
                // That object now has two references: one as
                // a property, the other as the 'o4' variable.
o3 = '374'; // The object that was originally in 'o1' now has
            // zero references to it. It can be garbage-collected.
            // However, what was its 'o2' property is still
            // referenced by the 'o4' variable, so it cannot be freed.
o4 = null; // What was the 'o2' property of the object originally in
           // 'o1' now has zero references to it.
           // It can be garbage-collected.
Cycles are a problem
There is a limitation when it comes to cycles. In the following example, two objects are created and reference one another, thus creating a cycle. They go out of scope after the function call, so they are effectively useless and could be freed. However, the reference-counting algorithm considers that, since each of the two objects is referenced at least once, neither can be garbage-collected.
function f() {
  var o1 = {};
  var o2 = {};
  o1.p = o2; // o1 references o2
  o2.p = o1; // o2 references o1. This creates a cycle.
}
f();
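To make the limitation concrete, here is a toy reference counter. It is a sketch for illustration only and not how any JavaScript engine is implemented: the ToyHeap class, its method names, and the string labels used as object identities are all made up. Objects are freed as soon as their count drops to zero, which works for a simple chain but never happens for the two objects in the cycle.

```javascript
// Toy model: every object is a label with a reference count and a list
// of outgoing references. Purely illustrative.
class ToyHeap {
  constructor() {
    this.counts = new Map();   // label -> number of incoming references
    this.outgoing = new Map(); // label -> labels it references
  }

  allocate(label) {
    this.counts.set(label, 0);
    this.outgoing.set(label, []);
  }

  // A variable or property starts pointing at `label`.
  addReference(label) {
    this.counts.set(label, this.counts.get(label) + 1);
  }

  // A variable or property stops pointing at `label`.
  dropReference(label) {
    if (!this.counts.has(label)) return; // already freed
    const remaining = this.counts.get(label) - 1;
    this.counts.set(label, remaining);
    if (remaining === 0) this.free(label);
  }

  // Models `from.p = to`.
  link(from, to) {
    this.outgoing.get(from).push(to);
    this.addReference(to);
  }

  free(label) {
    const children = this.outgoing.get(label);
    this.counts.delete(label);
    this.outgoing.delete(label);
    // Freeing an object drops the references it was holding.
    for (const child of children) this.dropReference(child);
  }

  isAlive(label) {
    return this.counts.has(label);
  }
}

const heap = new ToyHeap();

// A simple chain: a local variable -> 'parent' -> 'child'.
heap.allocate('parent');
heap.allocate('child');
heap.addReference('parent');  // the local variable
heap.link('parent', 'child');
heap.dropReference('parent'); // the variable goes out of scope
const chainFreed = !heap.isAlive('parent') && !heap.isAlive('child');

// The cycle from f(): o1.p = o2 and o2.p = o1.
heap.allocate('o1');
heap.allocate('o2');
heap.addReference('o1');      // var o1
heap.addReference('o2');      // var o2
heap.link('o1', 'o2');
heap.link('o2', 'o1');
heap.dropReference('o1');     // f() returns: both variables go away
heap.dropReference('o2');
const cycleLeaked = heap.isAlive('o1') && heap.isAlive('o2');

console.log(chainFreed, cycleLeaked);
// prints: true true
```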
Mark-and-sweep algorithm
To decide whether an object is still needed, this algorithm determines whether the object is reachable.
The mark-and-sweep algorithm goes through these three steps:
- Roots: in general, roots are global variables that are referenced in the code. In JavaScript, for example, a global variable that can act as a root is the “window” object. The equivalent object in Node.js is called “global”. The garbage collector builds a complete list of all roots.
- Mark: the algorithm then inspects all roots and their children and marks them as “active” (meaning they are not garbage). Anything that cannot be reached from a root is considered garbage.
- Sweep: finally, the garbage collector frees all the pieces of memory that are not marked as “active” and returns that memory to the operating system.
This algorithm is better than the previous one because “an object has zero references” implies that the object is unreachable, whereas the opposite is not true, as we have seen with cycles.
As of 2012, all modern browsers ship a mark-and-sweep garbage collector. All the improvements made to JavaScript garbage collection over the last few years (generational, incremental, concurrent, and parallel garbage collection) are implementation improvements of this algorithm (mark-and-sweep), not improvements over the garbage collection algorithm itself or over its goal of deciding whether an object is reachable.
In this article, you can read in greater detail about tracing garbage collection, including mark-and-sweep and its optimizations.
Cycles are no longer a problem
In the first example above, after the function call returns, the two objects are no longer referenced by anything reachable from the global object. Consequently, the garbage collector finds them unreachable.
Even though there are references between the two objects, they are not reachable from the root.
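The three steps, and the reason cycles stop being a problem, can be sketched with a toy object graph. Again, this is an illustration under simplifying assumptions, not an engine implementation: the heap is a plain array, each toy object lists its references in a refs array, and makeObject and markAndSweep are hypothetical helper names.

```javascript
// Toy object: a name plus the objects it references.
function makeObject(name) {
  return { name, refs: [] };
}

// Returns the objects that survive a collection, given the roots.
function markAndSweep(allObjects, roots) {
  // Mark: walk everything reachable from the roots.
  const active = new Set();
  const pending = [...roots];
  while (pending.length > 0) {
    const current = pending.pop();
    if (active.has(current)) continue; // already visited (handles cycles)
    active.add(current);
    pending.push(...current.refs);
  }
  // Sweep: keep only what was marked as active.
  return allObjects.filter((object) => active.has(object));
}

const globalObject = makeObject('global'); // stands in for window/global
const settings = makeObject('settings');
const o1 = makeObject('o1');
const o2 = makeObject('o2');

globalObject.refs.push(settings); // reachable from the root
o1.refs.push(o2);                 // o1 and o2 reference each other...
o2.refs.push(o1);                 // ...but nothing reachable references them

const toyHeap = [globalObject, settings, o1, o2];
const survivors = markAndSweep(toyHeap, [globalObject])
  .map((object) => object.name);

console.log(survivors.join(', '));
// prints: global, settings
```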
Counterintuitive behavior of garbage collectors
Although garbage collectors are convenient, they come with their own set of trade-offs. One of them is non-determinism. In other words, GCs are unpredictable: you cannot really tell when a collection will be performed. This means that in some cases programs use more memory than they actually require. In other cases, short pauses may be noticeable in particularly sensitive applications. Although non-determinism means that one cannot be certain when a collection will be performed, most GC implementations share the common pattern of doing collection passes during allocation. If no allocations are performed, most GCs stay idle. Consider the following scenario:
- A sizable set of allocations is performed.
- Most (or all) of these elements are marked as unreachable.
- No further allocations are performed.
In this scenario, most GCs will not run any further collection passes. In other words, even though there are unreachable references available for collection, the collector does not claim them. These are not strictly leaks, but they still result in higher-than-usual memory usage.
What are memory leaks?
In essence, a memory leak is a piece of memory that the application no longer needs but that, for some reason, is not returned to the operating system or to the pool of free memory.
Programming languages favor different ways of managing memory. However, whether a certain piece of memory is still in use is actually an undecidable problem. In other words, only developers can make it clear whether a piece of memory can be returned to the operating system.
Certain programming languages provide features that help developers do this. Others expect developers to be completely explicit about when a piece of memory is unused. Wikipedia has good articles on manual and automatic memory management.
The four types of common JavaScript leaks
1. Global variables
JavaScript handles undeclared variables in an interesting way: when an undeclared variable is assigned to, a new variable is created on the global object. In a browser, the global object is window, which means that
function foo(arg) {
bar = "some text";
}
Is equivalent to:
function foo(arg) {
window.bar = "some text";
}
Let’s say the purpose of bar is only to reference a variable inside the foo function. If you do not declare it with var, however, a redundant global variable is created. In the case above, this does not cause much harm, although you can certainly imagine more damaging scenarios.
You can also accidentally create a global variable using this:
function foo() {
this.var1 = "potential accidental global";
}
// foo is called on its own, so this points to the global object (window)
// rather than being undefined.
foo();
You can avoid all of this by adding 'use strict'; at the beginning of your JavaScript file. It switches on a much stricter mode of parsing JavaScript that prevents the unexpected creation of global variables.
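As a small sketch of what strict mode changes, the function-level form of the directive is used below so that the snippet behaves the same wherever it is pasted. The names assignUndeclared, whoIsThis and leakyName are made up for the example.

```javascript
// With 'use strict', assigning to an undeclared name throws instead of
// silently creating a property on the global object.
function assignUndeclared() {
  'use strict';
  try {
    leakyName = 'some text'; // never declared with var/let/const
    return 'assigned';
  } catch (error) {
    return error.name;
  }
}

// With 'use strict', `this` in a plain function call is undefined
// rather than the global object.
function whoIsThis() {
  'use strict';
  return this;
}

const assignmentResult = assignUndeclared();
const plainCallThis = whoIsThis();

console.log(assignmentResult, plainCallThis, typeof globalThis.leakyName);
// prints: ReferenceError undefined undefined
```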
Unexpected globals are certainly an issue. More often than not, however, your code is also infested with explicit global variables, which by definition cannot be collected by the garbage collector. Special attention needs to be given to global variables used to temporarily store and process large amounts of information. If you must use a global variable to store data, make sure to assign null to it or reassign it once you are done with it.
2. Forgotten timers and callbacks
Let’s take setInterval as an example, since it is often used in JavaScript. Libraries that provide observers and other instruments accepting callbacks usually make sure that all references to the callbacks become unreachable once their own instances are unreachable. Still, code like the following is not uncommon:
var serverData = loadData();
setInterval(function() {
  var renderer = document.getElementById('renderer');
  if (renderer) {
    renderer.innerHTML = JSON.stringify(serverData);
  }
}, 5000); // This will be executed every ~5 seconds.
The snippet above shows the consequences of using timers that reference nodes or data that are no longer needed. The renderer object may be replaced or removed at some point, which would make the block encapsulated by the interval handler redundant. If that happens, neither the handler nor its dependencies will be collected, because the interval would need to be stopped first (remember, it is still active). It all boils down to the fact that serverData, which stores and processes a load of data, will not be collected either.
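One way to avoid this, sketched below, is to keep the handle returned by setInterval and cancel the interval as soon as the target is gone. The startRendering helper, its getRenderer and getData parameters, and the plain object standing in for the DOM node are assumptions made for the sake of a runnable example; they are not part of the original snippet.

```javascript
function startRendering(getRenderer, getData, intervalMs) {
  let timerId = null;

  function stop() {
    if (timerId !== null) {
      clearInterval(timerId); // the timer no longer keeps tick alive
      timerId = null;
    }
  }

  function tick() {
    const renderer = getRenderer();
    if (!renderer) {
      stop(); // nothing left to update: cancel instead of leaking
      return;
    }
    renderer.innerHTML = JSON.stringify(getData());
  }

  timerId = setInterval(tick, intervalMs);
  return { tick, stop, isRunning: () => timerId !== null };
}

// A plain object stands in for document.getElementById('renderer').
let renderer = { innerHTML: '' };
const serverData = { status: 'ok' };
const rendering = startRendering(() => renderer, () => serverData, 5000);

rendering.tick();                 // simulate the timer firing once
const rendered = renderer.innerHTML;

renderer = null;                  // the node has been removed
rendering.tick();                 // the next tick notices and stops itself
const runningAfterRemoval = rendering.isRunning();

console.log(rendered, runningAfterRemoval);
// prints: {"status":"ok"} false
```

Calling tick() by hand only simulates the timer firing; in real code the interval would invoke it every five seconds until the renderer disappears.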
When using observers, you need to make sure you make an explicit call to remove them once you are done with them (either because the observer is no longer needed or because the object will become unreachable).
Luckily, most modern browsers do this job for you: they automatically collect the observer handlers once the observed object becomes unreachable, even if you forgot to remove the listener. In the past, some browsers were unable to handle these cases (good old IE6).
Still, it is in line with best practices to remove the observers once the object becomes obsolete. See the following example:
var element = document.getElementById('launch-button');
var counter = 0;
function onClick(event) {
  counter++;
  element.innerHTML = 'text ' + counter;
}
element.addEventListener('click', onClick);
// Do stuff
element.removeEventListener('click', onClick);
element.parentNode.removeChild(element);
// Now, when element goes out of scope,
// both element and onClick will be collected, even in old browsers
// that don't handle cycles well.
Nowadays, modern browsers use garbage collection algorithms that can detect these cycles and handle them correctly, so you no longer have to call removeEventListener before making a node unreachable.
If you use the jQuery APIs (other libraries and frameworks support this too), you can also have the listeners removed before a node is made obsolete. The library also makes sure there are no memory leaks even when the application is running under older browser versions.
3. Closures
A key aspect of JavaScript development is closures: an inner function has access to the variables of its outer (enclosing) function. Due to the implementation details of the JavaScript runtime, it is possible to leak memory in the following way:
var theThing = null;
var replaceThing = function () {
  var originalThing = theThing;
  var unused = function () {
    if (originalThing) // a reference to 'originalThing'
      console.log("hi");
  };
  theThing = {
    longStr: new Array(1000000).join('*'),
    someMethod: function () {
      console.log("message");
    }
  };
};
setInterval(replaceThing, 1000);
Once replaceThing is called, theThing gets a new object consisting of a large string and a new closure (someMethod). Yet originalThing is referenced by a closure that is held by the unused variable (originalThing being the value theThing had after the previous call to replaceThing). The thing to remember is that once a scope for closures is created for closures in the same parent scope, that scope is shared.
In this case, the scope created for the closure someMethod is shared with unused, and unused has a reference to originalThing. Even though unused is never used, someMethod can be used through theThing outside of the scope of replaceThing (for example, somewhere globally). Because someMethod shares the closure scope with unused, the reference that unused holds to originalThing forces it to stay active (along with the whole scope shared between the two closures). This prevents its collection.
In short, originalThing stays alive not because anything uses it, but because the scope that someMethod keeps reachable through theThing is the same scope that unused captured.
All this can result in a considerable memory leak. You can expect to see memory usage climb when the above snippet is run over and over again, and its size will not shrink when the garbage collector runs. A linked list of closures is created (its root being the theThing variable in this case), and each of the closure scopes carries an indirect reference to the large string.
This issue was found by the Meteor team, and they have a great article that describes it in detail.
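A commonly suggested mitigation, sketched here, is to clear originalThing at the end of replaceThing, so that the shared scope no longer points at the previous object. Treat it as an illustration of the idea rather than a definitive fix. The loop replaces the setInterval call, and someMethod returns its message instead of logging it, only so that the snippet terminates and its result is easy to inspect.

```javascript
let theThing = null;

function replaceThing() {
  let originalThing = theThing;
  const unused = function () {
    if (originalThing) console.log('hi');
  };
  theThing = {
    longStr: new Array(1000000).join('*'),
    someMethod: function () {
      return 'message';
    }
  };
  // Drop the reference held in the shared closure scope, so the previous
  // object is no longer reachable through someMethod's scope.
  originalThing = null;
}

for (let i = 0; i < 5; i++) {
  replaceThing();
}

console.log(theThing.longStr.length, theThing.someMethod());
// prints: 999999 message
```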
4. Out-of-DOM references
There are cases in which developers store DOM nodes inside data structures. Suppose you want to rapidly update the contents of several rows in a table. If you store a reference to each DOM row in a dictionary or an array, there will be two references to the same DOM element: one in the DOM tree and another in the dictionary. If you decide to get rid of these rows, you need to remember to make both references unreachable.
var elements = {
  button: document.getElementById('button'),
  image: document.getElementById('image')
};
function doStuff() {
  elements.image.src = 'http://example.com/image_name.png';
}
function removeImage() {
  // The image is a direct child of the body element.
  document.body.removeChild(document.getElementById('image'));
  // At this point, the global elements object still holds a
  // reference to #image. In other words, the image element is
  // still in memory and cannot be collected by the GC.
}
There is an additional consideration when it comes to references to inner or leaf nodes inside a DOM tree. If your code keeps a reference to a table cell (a <td> tag) and you decide to remove the table from the DOM while keeping the reference to that particular cell, you can expect a major memory leak. You might think that the garbage collector would free everything but that cell. That is not the case, however: because the cell is a child node of the table and children keep references to their parents, this single reference to the table cell keeps the whole table in memory.
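The effect of parent links can be illustrated without a browser by using plain objects as stand-ins for DOM nodes. In this sketch, makeNode and reachableFrom are hypothetical helpers, and the parentNode and children fields only mimic the corresponding DOM properties. Starting from a single retained cell, the traversal reaches every node of the table.

```javascript
// Plain-object stand-in for a DOM node with parent and child links.
function makeNode(tag, parent) {
  const node = { tag, parentNode: parent || null, children: [] };
  if (parent) parent.children.push(node);
  return node;
}

// Everything a garbage collector would have to keep alive if `start`
// were the only node still referenced from our code.
function reachableFrom(start) {
  const seen = new Set();
  const pending = [start];
  while (pending.length > 0) {
    const node = pending.pop();
    if (node === null || seen.has(node)) continue;
    seen.add(node);
    pending.push(node.parentNode, ...node.children);
  }
  return seen;
}

const table = makeNode('table');
const firstRow = makeNode('tr', table);
const secondRow = makeNode('tr', table);
const retainedCell = makeNode('td', firstRow);
makeNode('td', firstRow);
makeNode('td', secondRow);

// The table is "removed from the DOM", but we still hold retainedCell.
const keptAlive = reachableFrom(retainedCell);

console.log(keptAlive.size, keptAlive.has(table), keptAlive.has(secondRow));
// prints: 6 true true
```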
At SessionStack, we try to follow these best practices when writing code that handles memory allocation properly, and here is why:
Once you integrate SessionStack into your production web app, it starts recording everything: all DOM changes, user interactions, JavaScript exceptions, stack traces, failed network requests, debug messages, and so on. With SessionStack, you can replay issues in your web apps as videos and see everything that happened to your user. And all of this has to take place with no performance impact on your web app.
Since the user can reload the page or navigate within your app, all observers, interceptors, variable allocations, and so on have to be handled properly, so that they do not cause any memory leaks or increase the memory consumption of the web app in which we are integrated.
There is a free plan, so you can give it a try right now.
Resources
- www-bcf.usc.edu/~dkempe/CS1…
- blog.meteor.com/an-interest… by David Glasser
- www.nodesimplified.com/2017/08/jav…
- auth0.com/blog/four-t… by Sebastian Peyrott
- developer.mozilla.org/en-US/docs/… by MDN Web Docs (content reused)