Unlike the stack, where memory is reclaimed simply by moving ESP down and letting old frames be overwritten, the heap needs a real garbage collector

This article delves into the nuts and bolts of V8 GC, with a lot of pseudo-code and concepts, so it is not light reading

Before we talk about GC, let’s talk about some important concepts:

The basic concept

The Generational Hypothesis.

  1. Most objects are allocated and then become garbage soon afterwards
  2. A small number of objects live for a long time, sometimes for the program's entire lifetime

In response to the generational hypothesis, JS engines adopt the following split:

  1. New-generation objects are handled by what is called the minor garbage collector
  2. Old-generation objects are handled by what is called the major garbage collector

mutator

You don't need to know the exact definition of the mutator; you just need to know the behavior it brings:

  • Allocating new objects
  • Changing references between existing objects

As you can see, the mutator is simply the running program itself, and GC exists to deal with the garbage the mutator generates.
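A tiny TypeScript illustration of these two behaviors (the user/profile objects are made up for the example):

// Allocating new objects: the mutator asks the heap for memory
const user: { name: string; profile: { avatar: string } | null } = {
  name: "alice",
  profile: { avatar: "a.png" },
};

// Changing an object reference: the old profile object loses its only
// reference and becomes garbage for the collector to find
user.profile = { avatar: "b.png" };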

Common GC algorithms

Tracing collection
  • Mark-sweep:

    Details will be covered in subsequent chapters

    The advantage is that no extra memory is needed: it only has to mark the header of each live object, then perform one unified sweep at the end

  • Copying collection:

    It requires a second space into which live objects are copied, so no unified sweep is needed; its advantage is speed (a single pass), and it works especially well when a large proportion of the objects are garbage

In fact, tracing algorithms are used for GC in both the new and the old generation in JavaScript

Reference counting

Tracing algorithms need dedicated time for GC marking; reference counting does not

It simply tracks the references to each object and frees the unreferenced objects when GC is needed

Of course, it has its own problem with circular references; reference counting is not covered further here
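A rough sketch of reference counting and of why circular references defeat it (the rc field and helper names are invented; this is not how any engine actually stores counts):

interface RcObject { rc: number; refs: RcObject[] }

function addRef(from: RcObject, to: RcObject) {
  from.refs.push(to);
  to.rc += 1;
}

function release(obj: RcObject) {
  obj.rc -= 1;
  if (obj.rc === 0) {
    // the object is dead, so drop its outgoing references as well
    const children = obj.refs;
    obj.refs = [];
    for (const child of children) release(child);
  }
}

// Circular reference problem: a and b reference each other, so even after
// the root lets go of a, neither count ever reaches 0 and both leak.
const a: RcObject = { rc: 1, refs: [] }; // rc = 1: held by some root
const b: RcObject = { rc: 0, refs: [] };
addRef(a, b); // b.rc = 1
addRef(b, a); // a.rc = 2
release(a);   // the root drops a: a.rc = 1 (still held by b) -> nothing is freed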

V8 heap memory

This article focuses more on the technical details of GC, and the broad abstractions are only briefly described here

As shown in the figure above, V8 is divided into a new generation and an old generation; the benefits of this split were already covered in the generational hypothesis section

For convenience, let's introduce the new-generation space here first:

  • The new generation is divided into two spaces, to and from, corresponding to the object area and the free area in the figure above respectively
  • The new generation has only 1-8 MB of memory
  • The new-generation space stores objects that are expected to die soon

Cheney's algorithm for new-generation GC

Cheney is an implementation of the Scavenge algorithm

As mentioned above, this article is intended to dissect the technical details, but refer to other articles for general concepts

Cheney’s general process is as follows:

Now for the big part: how does Cheney's algorithm actually do this?

Core code: copy
copying(){
    // scan and free are both pointers into the to-space:
    // scan walks over already-copied objects, looking for child pointers to follow
    // free points to the next free position in the to-space (where the next copy lands)
    scan = $free = $to_start
    for(r : $roots)
        *r = copy(*r)

    while(scan != $free)
        for(child : children(scan))
            *child = copy(*child)
        scan += children(scan).size

    swap($from_start, $to_start)
}

copy(obj){
    // only copy obj if it has not been copied yet,
    // i.e. its forwarding pointer does not already point into the to-space
    if(is_pointer_to_heap(obj.forwarding, $to_start) == FALSE)
        copy_data($free, obj, obj.size)
        obj.forwarding = $free
        $free += obj.size
    return obj.forwarding
}

  • The so-called liveness check does not actually "check" anything: it directly traverses, via the reachability analysis algorithm, the objects pointed to by the roots (in JS's memory model, variables on the stack) and their children

  • The figure above shows the to space after executing the following statement, with the free pointer leading scan

    for(r : $roots){
    	*r = copy(*r)
    }
Iterative copy

  • The figure above shows the to space after iterating over and copying the child objects referenced from the roots

    while(scan != $free){
        for(child : children(scan)) {
            // copy() advances $free to the right as it copies
            *child = copy(*child)
        }
        scan += children(scan).size
    }
  • Combined with the figure above, it is easy to see that the loop ends once the objects reached by the scan pointer have no more child references left to copy, i.e. scan catches up with free

Algorithm principle

In fact, the scan and free pointers must eventually become equal:

  • The scan pointer initially lags behind the free pointer, because the root-referenced objects are copied first
  • The scan pointer gradually catches up with the free pointer, because scanning every copied object in turn has the effect of copying (and later scanning) all of their children as well

At that point, every surviving object has been visited and the copying function completes

Other technical details
  • Scavenging is done by the minor collector, which skips an object once it detects that the object belongs to the old generation; old-generation objects are handled by the major collector during a major GC
  • One problem remains: if the only reference to a new-generation object comes from an old-generation object, that object will never be found and copied

To solve the above problems, we introduce a concept:

Remembered set

During mutator execution, suppose object B gains a reference to object A. If B is an old-generation object and A is a new-generation object, then B (the object issuing the reference) is added to the remembered set, and the reference is found by scanning the remembered set during a minor collection.
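A sketch of how a minor collection could use the remembered set as extra roots (GcObject, rememberedSet and minorGcRoots are invented names for illustration):

interface GcObject { generation: "new" | "old"; children: GcObject[] }

// Old-generation objects that hold references into the new generation
const rememberedSet = new Set<GcObject>();

function minorGcRoots(stackRoots: GcObject[]): GcObject[] {
  const roots = [...stackRoots];
  // Besides the stack roots, scan the remembered set: each entry is an
  // old-generation object whose new-generation children must also survive
  for (const oldObj of rememberedSet) {
    for (const child of oldObj.children) {
      if (child.generation === "new") roots.push(child);
    }
  }
  return roots;
}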

Transition between old and new generations

Promotion

when?
  • The new-generation object has already survived one minor GC
  • During the copy process, usage of the to space exceeds 25%

In either case, the new-generation object is promoted to the old generation

how?

Looking at the promotion process, it is actually similar to copy in Cheney's algorithm: it is not a simple pointer move between spaces, but an actual copy of the object into the old generation.
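A simplified sketch of the promotion decision during a scavenge, using the two conditions above (NewGenObject, evacuate and the stand-in arrays are invented names; the 8 MB size is just one value within the 1-8 MB range mentioned earlier):

interface NewGenObject { size: number; survivedOnce: boolean }

const TO_SPACE_SIZE = 8 * 1024 * 1024; // assume an 8 MB semi-space
let toSpaceUsed = 0;

const oldSpace: NewGenObject[] = [];
const toSpace: NewGenObject[] = [];

function evacuate(obj: NewGenObject) {
  // Promote if the object has already survived one minor GC,
  // or if the to space is already more than 25% full during this copy
  if (obj.survivedOnce || toSpaceUsed > TO_SPACE_SIZE * 0.25) {
    oldSpace.push({ ...obj }); // an actual copy into the old generation, not a pointer move
  } else {
    toSpace.push({ ...obj });
    toSpaceUsed += obj.size;
    obj.survivedOnce = true; // if it survives again, the next scavenge promotes it
  }
}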

Remembered set update

As mentioned above, the remembered set stores not the new-generation object but the old-generation object that issues the reference. When a new-generation object is copied into the old generation, the corresponding references are updated along with it.

Old-generation GC: mark-sweep

It is not recommended to understand GC the way it is often presented online, as a pile of isolated terms such as mark-sweep or reference counting; GC should be viewed from the engine's perspective, not from a code-parsing perspective

Mark-sweep

Before we talk about the next algorithms, we must first understand the great classic of the GC world: the mark-sweep algorithm

Some basics

We must first know that an object in memory consists of two parts: a header and its fields. The header stores the object's metadata (including its type, size, etc.), and all of our marking operations are carried out on the header. You can think of the header as the object's index.
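A toy model of this header/fields split, with invented field names (type, size, marked); V8's actual object layout is more involved:

// The header carries the metadata the collector works with;
// the fields (the body) carry the object's actual data and references
interface Header {
  type: string;    // shape / type information
  size: number;    // object size
  marked: boolean; // the mark bit the collector flips during marking
}

interface HeapObject {
  header: Header;
  fields: HeapObject[]; // outgoing references live in the body
}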

process
  1. Get the root references (you can roughly think of them as the globals and stack variables)
  2. Traverse downward from the root references, find every object that can be reached, and mark it
  3. Iterate over the heap again and collect the unmarked objects as garbage

As you can easily see, the mark-sweep algorithm cannot be interrupted: it is a recursive process. To keep the rest of the program running smoothly, let's move on to another algorithm
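A minimal sketch of this recursive mark phase plus the sweep, assuming a simplified HeapObj shape and a flat heap array (not V8's real layout):

interface HeapObj { marked: boolean; children: HeapObj[] }

function mark(obj: HeapObj) {
  if (obj.marked) return;                         // already visited
  obj.marked = true;                              // set the mark bit in the header
  for (const child of obj.children) mark(child);  // recurse: cannot be paused midway
}

function sweep(heap: HeapObj[]): HeapObj[] {
  // anything never marked is garbage; survivors get their mark cleared for next time
  const survivors = heap.filter(o => o.marked);
  for (const o of survivors) o.marked = false;
  return survivors;
}

function markSweep(roots: HeapObj[], heap: HeapObj[]): HeapObj[] {
  for (const r of roots) mark(r); // steps 1 and 2: mark everything reachable from the roots
  return sweep(heap);             // step 3: collect the unmarked objects
}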

Tri-color marking

If you are familiar with React's async reconcile process, you know the trick of converting a recursive traversal into an interruptible, iterative one, using markers to skip fibers that have already been processed. Tri-color incremental marking works in exactly the same spirit.

Tri-color marking divides the object into three colors:

  • White (not yet visited)
  • Gray (visited, but its child references have not all been traversed yet)
  • Black (visited, and all of its child references have been processed too)

Obviously, only two colors exist after GC: white or black

At this point, white means garbage, because the object was never reached by the reachability traversal, and black marks the live objects

The technical details
How to color the objects
  1. Color the root references gray directly

  2. Color the child nodes of the root references gray (there is no need to recurse into the children yet), then color the root references black

  3. Keep processing the gray nodes in the same way until none are left

  4. Finally, sweep through and delete the nodes that are still white (a worklist sketch of these steps follows below)
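A worklist-style sketch of these coloring steps in TypeScript (the Obj type, grayQueue and the function names are invented for illustration; V8's real implementation differs):

type Color = "white" | "gray" | "black";
interface Obj { color: Color; children: Obj[] }

const grayQueue: Obj[] = [];

function shade(obj: Obj) {
  if (obj.color === "white") {
    obj.color = "gray";
    grayQueue.push(obj);
  }
}

function markRoots(roots: Obj[]) {
  for (const r of roots) shade(r); // step 1: roots become gray
}

function markStep(budget: number) {
  // steps 2-3: pop a gray object, gray its children, then blacken it.
  // Because the pending work lives in grayQueue, marking can stop after
  // `budget` objects and resume later.
  while (budget-- > 0 && grayQueue.length > 0) {
    const obj = grayQueue.pop()!;
    for (const child of obj.children) shade(child);
    obj.color = "black";
  }
}

function sweepWhite(heap: Obj[]): Obj[] {
  // step 4: anything still white after marking finishes is garbage
  return heap.filter(o => o.color === "black");
}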

Existing problems

Let’s start with a brief introduction to the above images:

  • Panel a: A is the root reference and B is a child reference of A; at this point coloring steps 1 and 2 have been performed
  • Between a and b: the mutator runs, a variable reference changes, and the child reference held by B is switched over to A
  • Panel b: when marking resumes, A has already been colored black and will not be searched again, and B's reference to C has been removed
  • Panel c: the object C is clearly alive, but it never gets marked

Think about how often this happens in real code, for example in Redux, where listeners are attached and removed while the program runs; objects like C being missed is a common problem. The mutator has dynamically modified a reference so that the traversal can no longer reach it. To deal with this, let's introduce a new concept

Write barriers

See the code directly

void write_barrier($obj, field, newField) {
  // If $obj is an old-generation object, newField is a new-generation object,
  // and $obj is not yet in the remembered set, record $obj
  if($obj >= $oldStart && newField < $oldStart && $obj.isRemembered == false) {
    addRememberedSet($obj)
    $obj.isRemembered = true
  }
  // References newly written by the mutator are colored gray directly
  if(!newField.mark) {
    newField.color = markingObj(newField, TRI_COLOR_MARKING_GREY)
    newField.mark = true
  }
  // Finally, actually update the reference
  *field = newField
}


The key point is that references newly written during mutator execution are colored gray directly. This is handled as a separate step, rather than being discovered by iterating from a parent node.

Marking phases

Incremental marking is actually divided into three parts:

  • Root reference lookup
  • Marking (incremental: marking -> mutator -> marking -> ...)
  • Sweeping (synchronous)

You only need to pay attention to a few things:

  1. The root reference scan is completely synchronous and happens only once
  2. Marking is not one continuous process; it is interleaved with mutator execution (sketched below)
  3. Since marking is not continuous, we introduce write barriers to handle issues such as reference changes

Indeed, it is very similar to React concurrent mode's diff and commit phases
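A sketch of how these three parts could be interleaved, reusing the Obj type, grayQueue, markRoots, markStep and sweepWhite helpers from the tri-color sketch above (incrementalGc and runMutatorSlice are invented names; the budget of 100 objects per step is arbitrary):

// Reuses Obj, grayQueue, markRoots, markStep and sweepWhite from the tri-color sketch
function incrementalGc(roots: Obj[], heap: Obj[], runMutatorSlice: () => void): Obj[] {
  // 1. Root scan: fully synchronous, happens exactly once
  markRoots(roots);

  // 2. Marking: interleaved with the mutator in small budgeted steps;
  //    write barriers keep the colors consistent while the mutator runs
  while (grayQueue.length > 0) {
    markStep(100);
    runMutatorSlice();
  }

  // 3. Sweep: synchronous, everything still white is reclaimed
  return sweepWhite(heap);
}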

Mark-compact

Mark-compact is a derivative of the mark-sweep algorithm, designed to eliminate the memory gaps (fragmentation) left behind after GC
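A toy sketch of the compaction idea: after marking, live objects are slid toward the start of the space so the free area becomes one contiguous region (Slot and compact are invented names; real compaction also has to rewrite pointers to the moved objects):

interface Slot { live: boolean; data: string }

// Slide live objects to the front, leaving one contiguous free region at the end
// instead of scattered gaps
function compact(space: (Slot | null)[]): (Slot | null)[] {
  const compacted: (Slot | null)[] = space.filter((s): s is Slot => s !== null && s.live);
  while (compacted.length < space.length) compacted.push(null); // contiguous free space
  return compacted;
}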

References

  1. v8 GC
  2. Reachability analysis algorithm
  3. Garbage collection

Pretty please, give this a star or a like

If you find it helpful, check out my other tech posts on GitHub