preface

The V8 garbage collector (GC) has changed a lot over the years, with the Orinoco project adopting the Stop-the-World garbage collector to become a more parallel, concurrent and incremental garbage collector.

How is data reclaimed in the call stack?

Let’s start by talking about how data in the call stack is garbage collected. The first is the data in the call stack. We still analyze its recycling mechanism through the execution process of a sample code, as follows:

function foo(){
    var a = 1
    var b = {name:Geek Bang}
    function showName(){
      var c = "Geek Time"
      var d = {name:"Geek Time"}
    }
    showName()
}
foo()
Copy the code

When line 6 is executed, the call stack and heap space state diagram looks like this:

Then, when the showName function completes, the function execution flow goes to foo, and the execution context of the showName function needs to be destroyed. ESP(a pointer to the current execution state) is useful in this case. JavaScript moves ESP down to the execution context of foo, which destroys the execution context of showName. The details are shown below.

So, when a function finishes executing, the JavaScript engine destroys its execution context on the stack by moving ESP down.

How is the data in the heap recycled?

As we know from above, ESP should be pointing to the global execution context after foo is executed, but the two objects stored in the heap still take up space, as shown below:

To recycle garbage data from the heap, you need to use the JavaScript garbage collector.

The Intergenerational hypothesis &V8 Garbage Recycling mechanism

Intergenerational hypothesis

There is an important term in recycling: the “generational hypothesis”; The generation hypothesis states that many objects have a short lifetime in memory. In other words, from a garbage collection point of view, many objects become inaccessible as soon as memory is allocated. This hypothesis is not only applicable to V8 and JavaScript, but also has the following two characteristics: 1. The first is that most objects live in memory for a short time. Simply put, many objects become inaccessible quickly once memory is allocated. 2. The second one is undead and will live longer.

Generational heap layout

In V8, the heap is divided into two distinct regions, which are called old age and Cenozoic, and the Latter is further divided into the ‘nursery’ (from space) progeny and the ‘intermediate’ (to space) progeny. The first time an object is allocated memory is allocated to the ‘from-space’ children of the new generation. If the object is still in the new generation after the next garbage collection, we move it to the ‘to-space’ generation, and then after the next garbage collection the object is still in the new generation, we move it to the old generation.

V8 Two garbage collectors

V8 has two garbage collectors, a primary and a secondary garbage collector. The main garbage collector collects garbage from the entire heap, and the Scavenger collects garbage from the new generation. The main garbage collector is very efficient at collecting garbage from the entire heap, but the generational hypothesis tells us that newly allocated objects are also very likely to need garbage collection.

Secondary Garbage Collector — (Scavenger)

The sub-garbage collector only collects garbage from the new generation; surviving objects are always allocated to pages of memory. During cleaning, the initial free area is called “to-space”, and the area From which the object is copied is called “from-space”. In the worst case, if every object survives the cleanup, we have to copy every object.

For cleanup, we maintain an additional root set that holds references from old to new. These references are Pointers to objects in the new generation in old-space. We use this pointer to maintain a list of references from old to new, rather than tracking every object change in the entire heap. When the heap is used with global objects, we know every reference to an object in the new generation without having to trace the entire old generation.

Then move all live objects into a contiguous chunk of memory, which has the advantage of completely removing memory fragmentation (cleaning up memory debris left over from inactive objects). Then we swap the two memory Spaces, changing ‘to-space’ To ‘from-space’ and vice versa. Once garbage collection is complete, the newly allocated memory Space will start with the free memory address next to ‘from-space’.

With this strategy alone, we would quickly run out of memory for the new generation; In order not To run out of memory Space for the new generation, in the next garbage collection we move the live object To the old generation instead of ‘to-space’.

The final step in the cleanup is to update the pointer address of the moved object. Each copied object is left with a forward address that can be updated to point to the new address.

Main garbage collector

The main garbage collector is mainly responsible for garbage collection in the old area. In addition to the promoted objects in the freshman area, some large objects will be directly assigned to the old area. Therefore, objects in old age area have two characteristics, one is that objects occupy large space, the other is that objects live for a long time.

The Scavenge algorithm takes a lot of time to copy large objects to be recycled using the Scavenge algorithm, resulting in inefficient recycling execution and a waste of up to half the space. Therefore, the main garbage collector uses a Mark-sweep algorithm for garbage collection. Let’s see how this algorithm works.

The first is to mark the process phase. The marking phase is to start from a set of root elements and recursively traverse this set of root elements. In this traversal process, the elements that can be reached are called active objects, and the elements that cannot be reached can be judged as junk data.

Next comes the garbage removal process. It is completely different from the garbage collection process of the garbage collector. You can understand that this process is to remove the red marked data. Please refer to the following figure for an overview of the garbage collection process:

The above marking and cleaning procedures are mark-clean algorithms, but executing the mark-clean algorithm on a piece of memory many times can result in a large number of discrete memory fragments. Too much fragmentation causes large objects to be unable to allocate enough contiguous memory, which leads to another algorithm, mark-compact. The marking process is still the same as in mark-clean, but the next step is not to clean up the recyclable objects directly. Instead, move all surviving objects to one end and then clean up memory directly beyond the end boundary. You can refer to the picture below

Orinoco collection enforcement mechanism

Orinoco, the code name for the V8 garbage collector project, uses the latest and best garbage collection techniques to reduce main thread hang time, such as parallel, incremental, and concurrent garbage collection.

Parallel garbage collection

Parallelism is when the main thread and the helper thread perform the same work at the same time, but this is still a stop-the-world garbage collection, but the garbage collection time is equal to the total time divided by the number of threads participating (plus some synchronization overhead). This is the easiest JavaScript garbage collection of the three techniques; There is no JavaScript execution, so just make sure that only one helper thread is accessing the object at the same time.

Incremental garbage collection

Incremental garbage collection is a way for the main thread to do small amounts of garbage collection intermittently. We do not perform the entire garbage collection process in incremental garbage collection, but only a small part of the overall garbage collection process. This is extremely difficult to do because JavaScript also executes at the same time as incremental garbage collection, which means that the heap state has changed, potentially rendering the previous incremental collection completely invalid. As you can see from the figure, the main thread pause time does not decrease (in fact, it usually increases slightly), but only increases over time. But it’s still a good way to solve the problem by intermittently executing JavaScript, and also intermittently doing garbage collection, so that JavaScript execution is still responsive to user input or animation.

Concurrent garbage collection

Concurrency is when the main thread executes JavaScript all the time, while the helper thread performs full garbage collection in the background. This approach is the most difficult of the three techniques, as the contents of the JavaScript heap can change at any time, invalidating all previous work. Most importantly, there are now Read /write races, and it is very possible for the main thread and the helper thread to change the same object at the same time. The advantages of this approach are obvious: the main thread is not suspended, and JavaScript can execute freely, despite some synchronization overhead to ensure that only one helper thread is modifying the same object at a time.

conclusion

Most JavaScript developers don’t need to think about garbage collection, but understanding some of the internals of garbage collection can help you understand memory usage and the appropriate coding paradigm.

As you can see from the above analysis, there is often no perfect solution for either the garbage collection strategy or the strategy for dealing with Orinoco’s recycling enforcement mechanism. You need to spend some time making trade-offs, and this requires sacrificing some of the current measures in exchange for improvements in others.