preface

This article is a summary of garbage Collection Algorithms and Implementations

What’s V8

V8 JavaScript Engine is a JavaScript Engine written in C++.

Garbage collection

Garbage Collection is called Garbage Collection. While the code is running, all data is stored in memory, and without GC, the developer must manually manage the memory, or one day the memory will be full, causing the process to crash, or even the system to crash. The purpose of GC is to allow the computer to automatically manage memory for the developer and to recycle garbage from memory that does not need to be used again.

This article covers only the garbage collection algorithms in V8, but there are many more, and there is no perfect garbage collection algorithm.

terms

Pointer to the

During GC, the corresponding object is destroyed or preserved. At this point we can use the object pointer to search for other objects.

Pointers generally point to the first address of an object. In fact, can be simply understood as JS reference address.

The heap

Refers to the memory space where objects are stored dynamically. In JS, objects are stored in the heap.

mutator

The entity of mutator is “application”

Personally, in the JS context, it refers to the V8 engine. As V8 runs, objects are constantly generated and references between objects are changed (i.e., Pointers are updated). These changes introduce garbage, which is where GC is needed.

Active/inactive objects

Live objects are objects in the heap that can be referenced by the mutator.

Inactive objects are objects in the heap that cannot be referenced by mutator, that is, memory garbage.

The root

A noun mentioned many times below is root. What is a root? The term is an object that can be referenced directly by mutator. For example

const obj = new Object(); Obj. Child = new Object(); / / object BCopy the code

In the first line of code, we create an object A whose reference address is assigned toobj. In the second line, we create an object whose reference address is assigned toobj.child. The heap then looks like the following figureHere because we can pass global variablesobjFind A, so A is the active object, and then we can find B through A, so B is also the active object. So this global variableobjIt’s just a root.

Generational garbage collection

Objects in memory space are divided into two categories, new generation and old generation. Newly created objects are stored in the Cenozoic, and some objects survive the Cenozoic GC. Treat objects that have survived several times as old age objects.

The new generation of objects -> the old generation of objects process, we call promotion.

GCThe classification of

There are two types of GC, conservative and accurate.

Conservative typeGCConservative (GC)

When GC cannot tell whether something is a pointer or not, the root is called undefined.

For example, we define the global variable A and the local variable obj, where A is a value and obj is a pointer (reference address). The reference address is the same as the value A, so GC can hardly tell whether a is a pointer or a value. Therefore, treat it conservatively as a pointer, and when the object obj points to should be garbage collected, it will not be garbage collected due to the presence of global variable A. This is conservative GC.

window.a = 0x00d0caf0; Const obj = new Object(); // Create an object at address 0x00d0caf0Copy the code

In a conservative GC scenario, objects cannot be moved. Because if you move an object, that means that the reference address of the object changes, then obj will be overwritten as the reference address of the moved object. At the same time, the global variable A will also be overwritten, which is very scary. So objects cannot be moved.

Accurate typeGCExact (GC)

As the name implies, precise GC correctly identifies what is a value and what is a pointer. To implement accurate GC, it depends on the processing of the programming language and means an increase in cost, which I won’t repeat here.

V8In theGC

Exact GC is implemented in V8.

GCIn terms of algorithm, generational garbage collection is adopted, and its structure is as follows.

GCReplication algorithm (By Cheney)

The GC replication algorithm divides the memory space into From and To. When the From space is full, the active objects (highlighted) in the From space are copied To the To space, the inactive objects are reclaimed, and then From and To are exchanged. Obviously, From and To have exactly the same size space.

There are many kinds of GC replication algorithms, such as those developed by Robert R.Fenichel and Jerome C.Yochelson and THOSE developed by C. J.Cheney. Here is the algorithm that Cheney developed.

Algorithm process

In Cheney’s replication algorithm, the algorithm flow is as follows

The initial state

First, copy all objects that are directly referenced from the root, B and G. Note that the new B refers to A in From, and the new G refers to B in From (B1 for differentiation) and E.

And then, searchB1“Was found to be quotedA, so theACopied to theToAt the same timeBPoint to.

And then, searchG,ECopied to theTo, andGPoint to theB1The pointer has changedB.

Finally, searchAandEIf no reference object is found, delete itFromThat will beFromandToSpace swap, end of copy algorithm.

advantages

  • Throughput of large
  • Fragmentation does not occur
  • There is no recursive call to the function. Cheney’s algorithm uses an iterative approach to replication, which means there is no excessive consumption of the stack

disadvantages

  • The memory space utilization is small, and obviously we can only use half of the memory space to allocate objects at a time
  • Not compatible with conservativeGCBecause the object is moved

trigger

When there is no partition in space

GCMark-clear algorithm

The algorithm is divided into two stages

  • Marking phase: This phase recursively traverses all live objects in the heap, marking them
  • Cleanup phase: Iterates through the heap to recycle all unmarked objects. The larger the heap, the longer the recovery time

It is clear that after these two phases, unused memory can be reused.

Advantages:

  • Simple algorithm implementation
  • Can be applied to the conservative formGCThe scene of

Disadvantages:

Fragmentation occurs in memory after multiple GC sessions. The consequence of fragmentation is that even if the total amount of available memory is sufficient, it is impossible to allocate content because there is not enough individual space (this is where the mark-compression algorithm mentioned below is used)

Assume that the total memory space is 5KB. In the figure below, A, B, C, D, and E each occupy 1KBGCAfter that, B and D are reclaimed, and 2KB is left in the memory space. At this time, an object with a size of 2KB is allocated to the memory space, which cannot be allocated because the remaining 2KB space is not contiguous.

trigger

  • When a space in the old time space is not divided
  • When a certain number of objects are allocated to the old chronospace (start the new generationGCWill check)
  • When there are no Cenozoic sized segments in the old chronospace (at this time the Cenozoic cannot be guaranteedGC(promotion)

GCMark-compression algorithm

  • Marking phase: V8 marks in a depth-first way, marking an object and then marking its children. Depth-first traversal is usually performed recursively, and when recursively, a stack is naturally required, which in V8 is generated by V8 itself. The space used by stack is the From space of the new generation. Because the old generation GC must be performed before the new generation GC, this time From space is empty, since the empty, it is better to put the stack here, do not use xD.

  • Compression stage:

Before compression, you can see that there are many empty little squares in the old chronospace, which are memory fragments

During compression, memory objects are moved sequentially to the front of the memory space

After compression, you can see that the blank squares are contiguous and there is no memory fragmentation

advantages

  • Efficient use of the heap, unlike the replication algorithm can only use half of the heap, there is no memory fragmentation

disadvantages

  • In the process of algorithm, the cost of constantly moving objects is very high compared with other algorithms

trigger

  • When the amount of debris in the old chronosphere reaches a certain level

Afterword.

At first, the title was “Garbage Collection Algorithms”, however… In addition to the above, there are reference counting, incremental garbage collection, RC Immix algorithm and so on. Even if it is the same algorithm, different designers also have certain differences. There are also some differences in GC algorithm implementation in different languages. So much content that I added “in V8” in front of the title.

This article involves things that are not common in the actual development, and there are many terms, it is inevitable that there are some mistakes, if a friend found, please leave a comment in the comment section.