preface
This article is a summary of garbage Collection Algorithms and Implementations
What’s V8
V8 JavaScript Engine is a JavaScript Engine written in C++.
Garbage collection
Garbage Collection is called Garbage Collection. While the code is running, all data is stored in memory, and without GC, the developer must manually manage the memory, or one day the memory will be full, causing the process to crash, or even the system to crash. The purpose of GC is to allow the computer to automatically manage memory for the developer and to recycle garbage from memory that does not need to be used again.
This article covers only the garbage collection algorithms in V8, but there are many more, and there is no perfect garbage collection algorithm.
terms
Pointer to the
During GC, the corresponding object is destroyed or preserved. At this point we can use the object pointer to search for other objects.
Pointers generally point to the first address of an object. In fact, can be simply understood as JS reference address.
The heap
Refers to the memory space where objects are stored dynamically. In JS, objects are stored in the heap.
mutator
The entity of mutator is “application”
Personally, in the JS context, it refers to the V8 engine. As V8 runs, objects are constantly generated and references between objects are changed (i.e., Pointers are updated). These changes introduce garbage, which is where GC is needed.
Active/inactive objects
Live objects are objects in the heap that can be referenced by the mutator.
Inactive objects are objects in the heap that cannot be referenced by mutator, that is, memory garbage.
The root
A noun mentioned many times below is root. What is a root? The term is an object that can be referenced directly by mutator. For example
const obj = new Object(); Obj. Child = new Object(); / / object BCopy the code
In the first line of code, we create an object A whose reference address is assigned toobj
. In the second line, we create an object whose reference address is assigned toobj.child
. The heap then looks like the following figureHere because we can pass global variablesobj
Find A, so A is the active object, and then we can find B through A, so B is also the active object. So this global variableobj
It’s just a root.
Generational garbage collection
Objects in memory space are divided into two categories, new generation and old generation. Newly created objects are stored in the Cenozoic, and some objects survive the Cenozoic GC. Treat objects that have survived several times as old age objects.
The new generation of objects -> the old generation of objects process, we call promotion.
GC
The classification of
There are two types of GC, conservative and accurate.
Conservative typeGC
Conservative (GC)
When GC cannot tell whether something is a pointer or not, the root is called undefined.
For example, we define the global variable A and the local variable obj, where A is a value and obj is a pointer (reference address). The reference address is the same as the value A, so GC can hardly tell whether a is a pointer or a value. Therefore, treat it conservatively as a pointer, and when the object obj points to should be garbage collected, it will not be garbage collected due to the presence of global variable A. This is conservative GC.
window.a = 0x00d0caf0; Const obj = new Object(); // Create an object at address 0x00d0caf0Copy the code
In a conservative GC scenario, objects cannot be moved. Because if you move an object, that means that the reference address of the object changes, then obj will be overwritten as the reference address of the moved object. At the same time, the global variable A will also be overwritten, which is very scary. So objects cannot be moved.
Accurate typeGC
Exact (GC)
As the name implies, precise GC correctly identifies what is a value and what is a pointer. To implement accurate GC, it depends on the processing of the programming language and means an increase in cost, which I won’t repeat here.
V8
In theGC
Exact GC is implemented in V8.
GC
In terms of algorithm, generational garbage collection is adopted, and its structure is as follows.
GC
Replication algorithm (By Cheney)
The GC replication algorithm divides the memory space into From and To. When the From space is full, the active objects (highlighted) in the From space are copied To the To space, the inactive objects are reclaimed, and then From and To are exchanged. Obviously, From and To have exactly the same size space.
There are many kinds of GC replication algorithms, such as those developed by Robert R.Fenichel and Jerome C.Yochelson and THOSE developed by C. J.Cheney. Here is the algorithm that Cheney developed.
Algorithm process
In Cheney’s replication algorithm, the algorithm flow is as follows
The initial state
First, copy all objects that are directly referenced from the root, B and G. Note that the new B refers to A in From, and the new G refers to B in From (B1 for differentiation) and E.
And then, searchB1
“Was found to be quotedA
, so theA
Copied to theTo
At the same timeB
Point to.
And then, searchG
,E
Copied to theTo
, andG
Point to theB1
The pointer has changedB
.
Finally, searchA
andE
If no reference object is found, delete itFrom
That will beFrom
andTo
Space swap, end of copy algorithm.
advantages
- Throughput of large
- Fragmentation does not occur
- There is no recursive call to the function. Cheney’s algorithm uses an iterative approach to replication, which means there is no excessive consumption of the stack
disadvantages
- The memory space utilization is small, and obviously we can only use half of the memory space to allocate objects at a time
- Not compatible with conservative
GC
Because the object is moved
trigger
When there is no partition in space
GC
Mark-clear algorithm
The algorithm is divided into two stages
- Marking phase: This phase recursively traverses all live objects in the heap, marking them
- Cleanup phase: Iterates through the heap to recycle all unmarked objects. The larger the heap, the longer the recovery time
It is clear that after these two phases, unused memory can be reused.
Advantages:
- Simple algorithm implementation
- Can be applied to the conservative form
GC
The scene of
Disadvantages:
Fragmentation occurs in memory after multiple GC sessions. The consequence of fragmentation is that even if the total amount of available memory is sufficient, it is impossible to allocate content because there is not enough individual space (this is where the mark-compression algorithm mentioned below is used)
Assume that the total memory space is 5KB. In the figure below, A, B, C, D, and E each occupy 1KBGC
After that, B and D are reclaimed, and 2KB is left in the memory space. At this time, an object with a size of 2KB is allocated to the memory space, which cannot be allocated because the remaining 2KB space is not contiguous.
trigger
- When a space in the old time space is not divided
- When a certain number of objects are allocated to the old chronospace (start the new generation
GC
Will check) - When there are no Cenozoic sized segments in the old chronospace (at this time the Cenozoic cannot be guaranteed
GC
(promotion)
GC
Mark-compression algorithm
-
Marking phase: V8 marks in a depth-first way, marking an object and then marking its children. Depth-first traversal is usually performed recursively, and when recursively, a stack is naturally required, which in V8 is generated by V8 itself. The space used by stack is the From space of the new generation. Because the old generation GC must be performed before the new generation GC, this time From space is empty, since the empty, it is better to put the stack here, do not use xD.
-
Compression stage:
Before compression, you can see that there are many empty little squares in the old chronospace, which are memory fragments
During compression, memory objects are moved sequentially to the front of the memory space
After compression, you can see that the blank squares are contiguous and there is no memory fragmentation
advantages
- Efficient use of the heap, unlike the replication algorithm can only use half of the heap, there is no memory fragmentation
disadvantages
- In the process of algorithm, the cost of constantly moving objects is very high compared with other algorithms
trigger
- When the amount of debris in the old chronosphere reaches a certain level
Afterword.
At first, the title was “Garbage Collection Algorithms”, however… In addition to the above, there are reference counting, incremental garbage collection, RC Immix algorithm and so on. Even if it is the same algorithm, different designers also have certain differences. There are also some differences in GC algorithm implementation in different languages. So much content that I added “in V8” in front of the title.
This article involves things that are not common in the actual development, and there are many terms, it is inevitable that there are some mistakes, if a friend found, please leave a comment in the comment section.