Browser principles series -JS memory mechanism and garbage collection

Reprint please retain this part of the content, indicate the source. In addition, the front end team of Toutiao is looking forward to your joining us

JS language designed automatic garbage collection mechanism, do not need developers to declare how to use memory, resulting in memory management and garbage collection is often ignored by front-end developers, this article to talk about JavaScript language memory mechanism and garbage collection mechanism.

First, JavaScript types

Front-end developers know that JavaScript is a weakly typed, dynamic language, characterized by:

Weakly typed, there is no need to declare the JavaScript engine what data type this or that variable is; the JavaScript engine will figure it out when it runs the code.
Dynamic, which means you can use the same variable to hold different types of data.

This is one of the hallmarks of the JavaScript language.

The JavaScript language has two types of data types: primitive and reference. There are two different types because they are stored in different locations in memory.

Two, memory space

During the execution of JavaScript, there are three main types of memory space, respectivelyCode space 、 Stack space 和 Heap space. The code space mainly stores executable code. Let’s look at how stack space and heap space store variables.

Stack space and heap space

Stack space is the call stack, which is used to store execution context, including variable environment, lexical environment, etc. Here’s an example of how occupying space stores data:

function foo(){
 var a = "Chrome"
 var b = a
 var c = {name:"Chrome"}
 var d = c
}
foo()
Copy the code

When executing a piece of code, you need to compile, create the execution context, and then execute the code sequentially. When line 3 is executed, you can refer to the following call stack state diagram:

As you can see from the figure, when line 3 is executed, the values of variables A and B are stored in the execution context, and the execution context is pushed onto the stack, so you can also think of variables A and B as being on the stack.

Continue with line 4,Because the JavaScript engine determines that the value on the right is a reference type, the situation is different. Instead of storing the object directly in the variable environment, the JavaScript engine allocates it to the heap space, where the object has an address in the “heap”. Then write the address of that data into the variable value of C, the schematic diagram of the final allocated memory is as follows:

Object types are stored in the heap space, and only the reference address of the object is kept in the stack space. When JavaScript needs to access the data, it is accessed through the reference address in the stack.

So why are primitive values stored in the stack, while reference values are stored in the heap? Why don’t you just put all the data on the stack?

The answer is no. This is because the JavaScript engine needs to use the stack to maintain the state of the context during the execution of the program. If the stack space is too large, all the data will be stored in the stack space, which will affect the efficiency of the context switch, and thus affect the efficiency of the entire program execution. For example, when the execution of function foo in this article is finished, the JavaScript engine needs to leave the current execution context. It only needs to move the pointer down to the address of the previous execution context, and the stack space of function foo execution context is completely reclaimed.

Therefore, in general, the stack space is not set too large, mainly used to store some primitive type of small data. Reference data takes up a lot of space, so this kind of data will be stored in the heap, the heap space is large, can store a lot of large data, but the disadvantage is that allocating memory and reclaim memory will take a certain amount of time.

Let’s go back to the sample code and see how its final assignment of variable C to variable D does.

In JavaScript, assignments are very different from those in other languages. Primitive assignments copy the value of a variable, while reference assignments copy the reference address. D =c; d=c; d=c;

As you can see from the figure, variables C and D both refer to objects in the same heap, and ultimately they are the same object.

Implementation of closures in memory

As mentioned earlier, closures are the product of lexical scope. How are closures stored in memory?

Continue with sample code for closure in the Browser Principles series -JS Execution Context Details:

function foo() {
    var myName = "IE";
    let test1 = 1;
    const test2 = 2;
    var innerBar = {
        getName:function(){
            console.log(test1);
            return myName;
        },
        setName:function(newName){
            myName = newName;
        }
    }
    return innerBar;
}
var bar = foo();
bar.setName("Chrome");
bar.getName();
console.log(bar.getName());
Copy the code

Here is how the stack looks when the code executes to setName:

Function foo has a closure that contains two variables: test and myName. How is this closure formed? Let’s examine the execution flow of this code from a memory model perspective.

When the JavaScript engine executes on foo, it first compiles and creates an empty execution context.
When the internal function setName is encountered during compilation, the JavaScript engine also does a quick lexical scan of the internal function and finds that the internal function references the myName variable in foo. Since the internal function references the variable of the external function, So the JavaScript engine decides that this is a closure and creates a new “closure(foo)” object in the heap space (an internal object that JavaScript cannot access) to hold the myName variable.
The JavaScript engine then adds test1 to the “Closure (foo)” object when it continues to scan into the getName method and finds that the function also references the variable test1. The “closure(foo)” object in the heap now contains the myName and test1 variables.
Because test2 is not referenced by an internal function, test2 remains in the call stack.

From the above analysis, we can draw the stack state when executing the “return innerBar” statement in foo, as shown below:

When foo is executed, the closure is generated; When foo is finished executing, the returned getName and setName methods both refer to an object called “clourse(foo),” so even if foo exits, “Clname (foo)” is still referenced by its internal getName and setName methods. So the next time you call bar.setname or bar.getName, you create an execution context that includes “clourse(foo).”

In summary, there are two core steps to generating closures: the first step is to pre-scan the internal functions; The second step is to save the external variables referenced by the inner function to the heap.

How does the V8 engine store objects

Soul asks: What is the attribute complexity of accessing objects? Think about this for a moment, and the next article in the Browser Principles series – Optimizing object storage for the V8 engine will examine this question.

Three, garbage recycling

Garbage Collection strategy

In general, garbage data collection can be divided into manual collection and automatic collection. The familiar C/C++ language belongs to the manual garbage collection strategy, requiring developers to manually control the allocation and destruction of memory, and the other strategy is the automatic garbage collection strategy, such as JavaScript, Java, Python and other languages, the garbage data generated is released by the garbage collector, do not need to manually release through code.

The fact that JavaScript “automatically” frees resources leads many JavaScript developers to believe that they can ignore memory management, which is a big misconception. Let’s take a look at how JavaScript does garbage collection.

How is the data in the call stack recycled

As mentioned earlier, JavaScript pushes the execution line into the execution stack to form the call stack during execution. At the same time, there is a pointer to the current execution state, called ESP, which points to the context of execution in the call stack. When the function completes and needs to destroy its execution context, the JavaScript engine moves the ESP down one bit, destroying the execution context at the top of the stack. Because the pointer moves down, this portion of memory becomes invalid, and when another calling context is pushed onto the stack, it is overwritten to hold the context in which another function is executing.

When a function finishes executing, the JavaScript engine destroys its execution context on the stack by moving ESP down.

How is the data in the heap recycled

As mentioned earlier, non-basic types of data are stored in the heap. The calling context offsets the ESP pointer to free up stack space, but variables of reference types declared in the calling context still occupy heap space. To recycle garbage data from the heap, you need to use the JavaScript garbage collector.

Intergenerational hypothesis

Before we can learn more about JavaScript garbage collection strategies, we need to learn a hypothesis on which JavaScript garbage collection strategies are based.

The intergenerational hypothesis has the following two characteristics:

The first is that most objects live in memory for a short time. Simply put, many objects become inaccessible quickly once memory is allocated.
The second is the undead object, which will live longer.

In fact, these two characteristics not only apply to JavaScript, but also to most dynamic languages, such as Java, Python, etc.

In general, there are many garbage collection algorithms, but no one is suitable for all scenarios. You need to balance various scenarios and use different algorithms according to the life cycle of the object to achieve the best results.

Therefore, in V8, the heap is divided into new generation and old generation, with the new generation storing short-lived objects and the old generation storing long-lived objects.

Newborn areas usually only support 1 to 8M capacity, while older areas support much more capacity. V8 uses two different garbage collectors for each of these areas to perform garbage collection more efficiently.

Secondary garbage collector, mainly responsible for the new generation of garbage recycling.
Main garbage collector, mainly responsible for old generation garbage collection.

Garbage collector workflow: mark-collection-memory defragmenting

V8 divides the heap into two regions – the new generation and the old generation – and uses two different garbage collectors. Regardless of the type of garbage collector, they all have a common execution process.

The first step is to mark active and inactive objects in the space. Live objects are objects that are still in use, and inactive objects are objects that can be garbage collected.

The second step is to reclaim memory occupied by inactive objects. All objects marked as recyclable in memory are cleaned uniformly after all tags are completed.

The third step is to do memory defragment. Generally speaking, after frequent object collection, there will be a large number of discontinuous memory space, we call these discontinuous memory space memory fragmentation. When a large amount of memory fragmentation occurs in memory, it is possible to run out of memory if large contiguous memory needs to be allocated. So the last step is to defragment the memory, but this step is optional because some garbage collectors do not generate memory fragments, such as the sub-garbage collector in JavaScript.

Secondary garbage collector

The secondary garbage collector is mainly responsible for garbage collection in the new area. Typically, most of the small objects are allocated to the newborn area, so this area is not large, but garbage collection is frequent.

The Cenozoic are treated with the Scavenge algorithm. The Scavenge algorithm is the splitting of the Cenozoic space into two areas, half the insane and half the insane, as shown in the figure below:New objects are stored in the object area. When the object area is nearly full, a garbage cleanup operation is performed.

In the garbage collection process, the garbage in the object region is marked first; The garbage collector copies the surviving objects into the free area. It also arranges the objects in an orderly manner, so that the copying process is equivalent to memory defragmentation. After the copying, the free area is free of memory fragmentation.

After the replication is complete, the roles of the object area and the free area are reversed, that is, the original object area becomes the free area, and the original free area becomes the object area. This completes the collection of garbage objects, and this role reversal allows both areas to be reused indefinitely in the new generation.

Each time a cleanup operation is performed, live objects need to be copied from the object region to the free region. However, the replication operation requires time cost. If the space of the new area is set too large, each clean up will take too long. Therefore, the space of the new area will be set relatively small for efficiency. Because the newborn area doesn’t have much space, it’s easy to fill the entire area with living objects. To solve this problem, the JavaScript engine adopts the object promotion strategy, where objects that have survived two garbage collections are moved to the old area.

Main garbage collector

The main garbage collector is mainly responsible for garbage collection in the old area. In addition to the promoted objects in the freshman area, some large objects will be directly assigned to the old area. Therefore, objects in old age area have two characteristics, one is that objects occupy large space, the other is that objects live for a long time.

The main garbage collector uses a Mark-sweep algorithm for garbage collection.

The first is to mark the process phase. The marking phase is to start from a set of root elements and recursively traverse this set of root elements. In this traversal process, the elements that can be reached are called active objects, and the elements that cannot be reached can be judged as junk data.

Next comes the garbage removal process. It is completely different from the garbage removal process of the garbage collector. This process is to remove the red-marked data. Please refer to the following figure for a general understanding of the garbage removal process:The above marking and cleaning procedures are mark-clean algorithms, but executing the mark-clean algorithm on a piece of memory many times can result in a large number of discrete memory fragments. Too much fragmentation causes large objects to be unable to allocate enough contiguous memory, which leads to another algorithmMark-compact ， The marking process is still the same as in the mark-clean algorithm, but instead of cleaning up the recyclable objects directly, the next step is to move all surviving objects to one end and then clean up memory directly beyond the end boundary.Please refer to the following figure:

Full pause and incremental markers

Since JavaScript runs on the main thread, once the garbage collection algorithm is executed, the JavaScript script that is being executed needs to be suspended and script execution is resumed after garbage collection is completed. We call this behavior stop-the-world.

When there is 1.5GB of data in the heap, V8 takes more than a second to complete a garbage collection, during which time the application’s performance and responsiveness plummets due to garbage collection causing JavaScript threads to pause. In the V8 new generation garbage collection, due to its small space and fewer surviving objects, total pause has little impact, but the old generation is different. The main thread takes a long time during garbage collection.

In order to reduce the lag caused by old garbage collection, V8 divides the tagging process into sub-tagging processes and alternates the garbage collection tagging and JavaScript application logic until the tagging phase is complete. This algorithm is calledIncremental Marking algorithm 。

Using incremental marking algorithm, can take a full garbage collection task broken down into many small tasks, these small task execution time is shorter, can with among other JavaScript task execution, so that when performing the above animation effects, will not let the user feel because of the garbage collection task page card. The subsequent clean-ups and incremental compact are also used, respectively.

The incremental reclamation process is complex, and two difficulties need to be solved: suspended restart and “side effects” during program execution. If you’re interested, look at the three-color markers. This article will not go into depth.