GC stands for Garbage Collection.

What is computer program junk?

If you’ve ever written a computer program, you know that there are references, assignments, operations, etc., going on all the time. As a result, there are certain variables that the program will not use after they are used, but they still occupy a certain amount of memory. Data in memory that will not be used again by a program is called garbage.

What is recycling?

In short, recycling is the process of releasing the “garbage” generated by the above program so that the program can use this area of memory again.

The history of GC

  • In 1960 John McCarthy first published the famous CG algorithm, THE GC tag-sweep algorithm
  • In 1960 George E. Collins published the GC algorithm for reference counting
  • In 1963 Marvin L. Minsky published the GC assignment algorithm

Amazingly, all of the current GC algorithms are simply combinations or applications of the above three algorithms, that is, since the assignment algorithm was born in 1963, no new GC algorithms have been created so far!

Basic concepts in GC

First, let’s make it clear that the basic unit of GC destruction movement in memory is objects.

object

An object consists of a head and a field.

Head to head

The part of an object that holds information about the object itself is called the header. The information in the head cannot be accessed or modified by users and contains the following information.

  • Object size
  • Object types

The domain field

Fields are familiar to us, and the parts of JavaScript that can be accessed directly using property operators are called fields. A domain contains two types of data.

  • Pointers, that is, reference data types
  • Non-pointers, i.e., basic data types such as true, false, 1… .

mutator

This refers to modifying the reference relationships between CG objects. More accurately, it is the ‘application’ itself, which is the code we write.

Heap heap

The heap refers to the memory space used to store objects dynamically (that is, when a program is executing). When the Mutator requests to store objects, the required memory space is allocated to the Mutator from the heap.

Active object/inactive object

Objects allocated to memory that can be referenced by the mutator are called ‘live objects’ and’ inactive objects’, i.e. garbage.

Root root

In the CG world, the root refers to the starting part of the object pointer. That is, global variables in the **mutator **(application). Objects that can be traversed by these roots are active objects, and non-active objects (garbage).





GC tag – cleanup algorithm

As the name of the algorithm describes, its execution can be divided into two stages: marking and cleaning. A mark is an object in memory that can be traversed by the root (global object) mentioned above and marked as an active object. After the mark is completed, the unmarked object in memory is cleared, which is an inactive object. Through these two steps, the memory space can be reused.

Mark phase

The tagging process can actually be represented more concretely in pseudocode:

/* Marks the function */mark_phase(){// Iterate over the global object, i.e., the rootfor(r : $roots) mark(*r)} /* Marks the object iterated over, and then continues iterating through the object's references. The marking process uses a depth-first algorithm */ mark(obj){// traversal is set totrueObjects are no longer processedif(obj.mark == FALSE) obj.mark = TRUE // Continue traversing the reference objects of the current object until all active objects have been traversedfor(child : children(obj))      
    	mark(*child) 
}
Copy the code

Sweep phase

The cleanup phase traverses the object’s flag bits one by one, starting with the first address of the heap.

The pseudocode is shown below:

sweep_phase() {/ /$heap_startPointers refer to the pointer sweeping = of the first object in the heap$heap_start// Limit the size of the heap when traversing itwhile(sweeping < $heap_end)
    if(Sweeping. Mark == TRUE) // Encounter is marked astrueThe object then cancels the flag bit and prepares for the next GC sweeping. Mark = FALSEelse// This is the key recycling process, first$free_listIt is the 'free linked list' from which the memory that a program needs is allocated. /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *$free_listIs a pointer to the free list. Assign it to the next field of the object to be reclaimed, and the object to be reclaimed becomes the head of the free list, i.e. : header. The object is added to the free list and is then allocated as free space to the application. Finally, assign the header pointer to$free_listThe variable, that's just theta$free_listPoint back to the free list and continue iterating... */ sweeping.next =$free_list
        $free_list= the size of the objects that are collected is kept, and this step is actually moving to the next object in the heap for collection. sweeping += sweeping.size }Copy the code

The state of the heap after completing a GC collection can be visualized in the following figure:








Reference counting method

GC is essentially a technique that studies how to free objects that can’t be referenced. So, you can imagine if you let an object keep track of whether or not it’s being referenced by the program. This is reference counting.









Reference counting method
new_obj()
update_ptr()


The new_obj() function allocates memory

The pseudocode is as follows:

New_obj (size){// Allocate space for program from 'free list' obj = pickup_chunk(size,$free_list// Failed to allocate spaceif(obj == NULL)    
    allocation_fail()  
  else// Set the counter for the successfully allocated object and initialize obj.ref_cnt = 1return obj 
 }
Copy the code

The update_ptr() function, which is referenced by the program

The pseudocode is as follows:

update_ptr(ptr, // dec_ref_cnt(* PTR) // PTR pointing to obj * PTR = obj} // /* var a = {}; /* var a = {}; var b = {}; A counter increases, b counter decreases, b=a; * /Copy the code

Inc_ref_cnt () function pseudocode

Inc_ref_cnt (obj){obj.ref_cnt++}Copy the code

Dec_ref_cnt () function pseudocode:

Dec_ref_cnt (obj){// if the counter is 0, the obj object is garbageifRef_cnt == 0) // Then objects referenced by obj should also decrementfor(child: children(obj)) dec_ref_cnt(*child)Copy the code

disadvantages

Cyclic reference, causing memory cannot be reclaimed

// Note: The example is only used to describe the scenario and does not conform to the actual environment

funciton Person(name,lover) { 
  this.name = name;
  this.lover= lover;
}

let jxy = new Person("jxy"); 
let xxx = new Person("xxx")
jxy.lover = xxx;
xxx.lover = jxy;

xxx = null;
jxy = null;

// When GC uses reference counting to manage memory, in the above example the variables are assigned null, but the two objects themselves are mutual
// reference, which results in memory not being recycled effectively, i.e., memory leaks
Copy the code

GC replication algorithm

The copy algorithm, as its name implies, copies all live objects in the heap into another space, and then reclaims all of the original space. This has the advantage of preventing fragmentation of memory and making it easier to allocate new space for the program later. It can be vividly understood as the following figure:

Replication algorithm

We call the original live object space **From ** space, and the new space we will copy To is called To space. When the **From ** space is fully occupied, GC copies live objects To the To space. When the copy is complete, the algorithm swaps the From space with the To space, and the GC ends. The GC replication algorithm is summarized as follows:









From












copying()

copying() {/ /$freeThe variable records the starting position of the To space$free = $to_start// Iterate over all root objects and copy them To the To space using the copy methodfor(r : $roots// The *r returned is the new pointer To the object after it was copied into the To space. *r = copy(*r) *r = copy(*r)$from_start.$to_start)}Copy the code

**copy()**

/* Copy copies the object given as an argument and copies its child recursively */ copy(obj){// The tag field of an obj object is used to indicate whether it is an assigned objectif(obj.tag ! // Use the copy_data method specifically To copy an obj object, passing in the address of the To space and the size of the obj d object copy_data($free, obj, obj.size) // When the copy is done, mark the object as having been assigned.$freeTag = COPIED // The forwarding field of an old obj object saves a pointer to the new obj object for later assigning it to the original pointing to obj.forwarding = in the program$free
        // $freeSkip the space of the obj that has been copied and point To the free position of the To space for the next replication$free+= obj.size // Recursively copy the reference object of the obj objectfor(child: children(obj.forwarding)) *child = copy(*child) // Returns a pointer to the new object when the copy is completereturn obj.forwarding
}
Copy the code

Chrome V8 garbage collection

So what is the recycle algorithm that most browsers use? I think this is probably the most important concern of friends. So take Chrome, our favorite, which uses a combination of recycling algorithms instead of a single algorithm. V8’s GC algorithm is collectively called generational garbage collection algorithm, that is, by recording the number of references of objects, the objects with more than a certain number of references are divided into old objects, the rest are called ** new generation objects, and then adopt different garbage collection algorithms for them. ** What is the advantage of this division? We know that most objects generated in programs are actually created and then discarded. For example, the following code generates an object inside the function. After the function completes execution and exits the stack, both the function itself and its internal variables are immediately garbage:

// The execution context of this function is not global
function foo() {
    var a = {c:1};
    var c = {c:2};
}
Copy the code

For this new generation of objects, the collection becomes frequent, and using the GC tag cleanup algorithm means that many objects need to be processed at a time, wasting a lot of time. Therefore, if GC replication algorithm is used for new generation objects, it is only necessary to copy the active objects and empty the whole From. There is no need to traverse the objects that need to be cleared to achieve the purpose of optimization. On the other hand, for old objects, they have multiple references, which means that they are less likely to become inactive objects, which means that old objects will not easily become garbage. Further is very little waste generated in the old object, if using the algorithm of copy do more harm than good, a lot of old objects are copied to copy to also increase burden, so on older objects using the tag QingChuFa, need to remove the old object is a minority, this tag will sweep algorithm has more advantages. * * ** And as the program is implemented, the new generation of objects will become old objects, which is a complicated process with limited capabilities, so I will just mention it here. Since objects are divided into new band pairs and old objects, how they are distributed in the heap is shown below:









To








This article refer to