Heap GC is a stack GC that relies on ESP downscaling and memory overusing
This article delves into the nuts and nuts of V8 GC, with a lot of pseudo-code and concepts that are not suitable for reading
Before we talk about GC, let’s talk about some important concepts:
The basic concept
The Generational Hypothesis.
- Most variables are allocated and cleared soon after
- There are a few variables that don’t die for a long time, or even for the entire program cycle
In response to The Generational Hypothesis, JS proposed The following solution:
- This part of the collection is called the paralgarbage collector
- We call this part of the collection the main garbage collector
mutator
You don’t need to know the exact concept of Mutator, you just need to know the behavior it brings:
- Make a new object
- Change the old object reference
As you can see, mutator is a continuation of the program, and gc exists to deal with the garbage generated by mutator
Common GC algorithms
Tracking the recycling
-
Mark clearance:
Details will be covered in subsequent chapters
The advantage is that there is no extra memory, just need to mark the head of the existing object, and finally unified cleanup
-
Copy tracking:
It requires a new space to copy live objects, so no unified cleanup is required, and it has the advantage of being fast (one pass through) and having an advantage in the case of large garbage percentages
In fact, trace algorithms are used for gc in both old and new generations of javascript
Reference counting
Tracing algorithms require a separate time for GC marking, not reference counting
It simply identifies the references to each object and kills the unreferenced objects when gc is needed
Of course, it also has problems with circular reference, and reference counting is not covered here
V8 heap memory
This article focuses more on the technical details of GC, and the broad abstractions are only briefly described here
As shown in the figure above, V8 is divided into new/old generations, the benefits of which I have already mentioned in the generation hypothesis
To narrate convenient, introduce new generation space here
- The new generation is divided into to/ FROM two Spaces, corresponding to the object area and the free area in the figure above respectively
- The New generation has only 1-8m ram
- Cenozoic space stores only objects that soon die
Cheney of the Cenozoic GC
Cheney is an implementation of the Scavenge algorithm
As mentioned above, this article is intended to dissect the technical details, but refer to other articles for general concepts
Cheney’s general process is as follows:
Now the big part, how does Cheney do that?
Core code: copy
copying(){
scan = $free = $to_start
for(r : $roots)
*r = copy(*r)
// scan is used to search for Pointers that have been copied to to-space objects
// Free is a forward-leaning pointer used to identify the header
// Scan and free are Pointers to the to space
while(scan ! = $free)
for(child : children(scan))
*child = copy(*child)
scan += children(scan).size
swap($from_start, $to_start)
}
copy(obj){
// scan catch up $free ? true : false
if(is_pointer_to_heap(obj.forwarding, $to_start) == FALSE)
copy_data($free, obj, obj.size)
obj.forwarding = $free
$free += obj.size
return obj.forwarding
}
Copy the code
-
The so-called survival check does not actually “check “, but directly iterates the nodes and their children pointed to by root(in js’s memory model, variables on the stack) through the reachability analysis algorithm
-
The figure above shows the to space after executing the following statement, with the free pointer leading scan
for(r:roots){ *r = copy(*r) } Copy the code
Iterative copy
-
The figure above shows the to space after iterating over the child object that copies the root reference
while($free! = scan){for(child:children(scan)) { // $free right move *child = copy(*child) } scan += children(scan).size } Copy the code
-
Combined with the figure above, we can clearly imagine that the end condition occurs when the object searched by the Scan pointer has no child reference object to be copied iteratively
Algorithm principle
In fact, the scan and free Pointers must eventually be equal.
- The scan pointer initially lags behind the free pointer because the root block is copied first
- The scan pointer gradually catches up to the Free pointer because the scan pointer iterates through all the copied blocks to achieve the effect of iterating over the copied sub-blocks
At this point, the inspection of the surviving objects is accepted and the copying function completes
Other technical details
- This is done by the secondary collector, which skips it when it detects that an object belongs to the old generation. The main collector performs the major gc for the old generation object
- One problem is that if the only reference to a new generation object is an old generation object, it will not be copied
To solve the above problems, we introduce a concept:
recordset
In the mutator process, object B is added with A reference to object A. If object B is an old generation object and object A is A new generation object, its OBJECT B(the object that issued the reference) is added to the recordset, and A reference is obtained by scanning the recordset during side-collection.
Transition between old and new generations
Promotion promotion
when?
- The new generation target object has gone through a secondary collector GC
copy
In the process,to
Space usage exceeds 25%
In both cases, the new generation is promoted to the old generation
how?
We need to look at the process of promotion, and it’s actually similar to copy in Cheney, it’s not a simple pointer space move, it’s a copy of a new object to the old generation.
Recordset update
As mentioned above, the recordset does not store the new generation object, but the old generation object that initiates the reference. When the new generation is copied to the old generation, the reference will be synchronized to the old generation.
Old gc mark – clear
It is not recommended that you understand GC in an online way, such as mark-sweep or reference counting. Gc should be viewed from an engine perspective, not from a code parsing perspective
Mark-clear
Before we talk about the next algorithms, we must first understand the great —- mark-clean algorithm in the GC world
Some common sense
We must first know that an object has two fields in memory: head and field. Head stores its information (including type, size, etc.), and all our identification operations are carried out in the head field. You can think of the head field as an index of an object.
process
- Get root reference(you can interpret it as global)
- Look down through root reference for references that can be retrieved and mark them
- Iterate through the object again to find unmarked garbage for collection
As you can easily see, the mark-clear algorithm is not interruptible, it is a recursive process. But to make sure the rest of it works, let’s move on to another algorithm
Tri-color marking(tri-color marking)
If you are familiar with react’s async Reconcile process, you should be able to figure out how to convert recursive updates to iterative updates using accelerated update markers to skip updated Fiber. Tricolor increments work exactly the same way.
Tri-color marking divides the object into three colors:
- White (not traversed)
- Gray (iterate over part, wait to iterate over subreferences)
- Black (all child references are also processed)
Obviously, only two colors exist after GC: white or black
At this point, white means garbage because it has never been traversed by the reachability algorithm, and black is the living object
The technical details
How to paint the color
-
Gray the root reference directly
-
Gray out the root reference child nodes (there is no need to recursively process the child nodes), and then black out the root reference
-
Recursively color the grey nodes
-
Delete the white nodes in series
Existing problems
Let’s start with a brief introduction to the above images:
- A: A is the root reference, and B is the child reference of A. In this case, perform the corresponding steps of coloring 1 and 2
- A-b gap: Execute mutator where variable reference changes and child reference of B switches to A
- B: In the recovery marking phase, A has been blacked out and cannot be searched, and b’s reference to C has been lost
- C: The c object is obviously alive, but has not been marked at any time
Double listeners are used in redux to listen for changes to the queue triggered by the removal of the event. Notice how common the problem is for C objects to be missed. The Mutator has dynamically modified the reference so that it cannot be traversed. Let’s use a new concept
Write barriers
See the code directly
void write_barrier($obj,field,newField) {
// obj is an old generation object, newField is a new generation object, obj exists in the recordset
if($obj >= $oldStart && newField <= $oldStart && $obj.ismemorable = false) {
addMemorableSet($old)
$obj.ismemorable = true
}
// New references are directly greased
if(! newField.mark) { newField.color = markingObj(newField,TRI_COLOR_MARKING_GREY) newField.mark =true
}
// Modify the reference
*yield = $newObj.yield
}
Copy the code
The key is that new references to changes made during the Mutator process are directly grayed out. This process is handled separately, not iterated over by the parent node.
Color phase
The increment tag is actually divided into three parts:
- Root reference lookup
- Marking (asynchronous marking -> mutator -> marking)
- Clear (synchronous clear)
You only need to pay attention to a few things:
- The root reference search is completely synchronous and only happens once
- The tag is not a continuous process; it is called in between
mutator
- Since the tags are not continuous, we introduce
write barries
To handle issues such as reference changes
Indeed, it is very similar to react Concurrent-mode diff and commit
Mark-compress
Mark-compression is a derivative of the mark-clear algorithm, which is designed to remove memory gaps after GC
reference
- v8 GC
- Accessibility analysis algorithm
- The garbage collection
Ball ball you give a star or like
If you find it helpful, check out my other tech post on Github