1 what are the garbage collection algorithms

1.1 Reference Counting

Algorithm idea: Each cell maintains a field and keeps the number of references to it by other cells (similar to digraph entry). When the number of references is zero, it is reclaimed. Reference counting is progressive, spreading the overhead of memory management across the program. C++ share_ptr uses the reference calculation method.

Reference counting algorithm implementations typically put all units in a cell pool, such as a free list. Then all the units are strung together and reference counting is done. The newly assigned unit count is set to 1 (note that it is not 0, because applications usually say PTR = new Object). Each time a pointer is set to point to the cell, the cell’s value is incremented by 1; And every time you delete a pointer to it, its count decreases by one. When its reference count reaches zero, the unit is reclaimed. Although this is simple, there are a lot of details to consider when implementing it. For example, when deleting a unit, all units it points to need to be subtracted by 1 from the reference count.

advantages

  • Progressive. Memory management is intertwined with the execution of the user program, spreading the cost of GC across the program. Unlike The mark-sweep algorithm, which requires STW (Stop The World, suspend The user program while GC).
  • The algorithm is simple.
  • Fast memory reclamation. Because it is interwoven with the execution of the user program, objects with a zero reference count can be immediately reclaimed, so memory can be reclaimed quickly, unlike other garbage collection algorithms, which wait until memory runs out or reaches a certain threshold before garbage collection.

Disadvantages:

  • Cannot handle circular references. This is probably the most criticized flaw of all. However, there are many solutions to this problem, such as strong references
  • Reduce operating efficiency. The update and deletion of memory units need to maintain the reference count of relevant memory units, which reduces the efficiency of program operation.

1.2 Mark Clearing

Algorithm idea: the local variables and any static variables on the stack are called roots. The algorithm is divided into two stages. The first stage is marking. All root variables are iterated over and marked, recursively marking the variables it references. In the second stage, all objects are traversed. The unmarked objects can be reclaimed, and the marked objects can be removed for the convenience of re-marking next time. Stage 1 pseudocode

func mark(objectP) {
	if! objectP.marked { objectP.marked =true
		for p := range objectP.refedObjs {
			mark(p)
		}
	}
}
Copy the code

Stage two pseudocode

func sweep(a) {
	for p range in the heap {
		if p.marked{
			p.marked = false
		}else{
			head.release(p)
		}
	}
}
Copy the code

Because tag scavenging keeps track of all objects accessed by root, it can properly mark and perform garbage collection even when there is a circular reference, which is the biggest advantage of tag scavenging, and there is no additional memory overhead and maintenance for objects.

The disadvantage is that you need to stop the Word during garbage collection, which affects the operation of user programs.

1.3 Node Replication

Also called copying algorithms. The free memory is initially divided into two parts, the FROM field and the TO field. Only the FROM field is used each time, and the TO field is free. When the FROM field runs out of memory and GC is performed, the surviving objects in the FROM field are copied to the TO field and then the FROM field is cleaned directly.

The JVM divides the Heap into new generation and old generation, Then it divides the Cenozoic generation into Eden and two Survivor Spaces, and then performs Copying between Eden — >Survivor Space and From Survivor Space and To Survivor Space Algorithm. However, when the JVM applies the coping algorithm, memory is not divided 1:1, which is a waste of memory space. The average JVM is 8:1. In other words, the ratio of Eden :From :To is 8:1:1. Always 90 percent of the space is available for creating objects, while the remaining 10 percent is used to store objects that survive recycling.

General steps:

  • 1. When Eden is full, the first Young GC is triggered to copy surviving objects to Survivor From zone; When Young GC is triggered in Eden area again, the Eden area and From area will be scanned and garbage collected in the two areas. The surviving objects will be directly copied To To area and Eden and From area will be emptied.
  • 2. When young GC occurs in Eden again, the Eden and To regions will be garbage collected, the surviving objects will be copied To the From region, and Eden and To regions will be emptied.
  • 3. Visible objects are copied back and forth between the From and To sections 15 times (determined by the JVM parameter MaxTenuringThreshold, which defaults To 15), and eventually saved To the old age if they are still alive

Advantages: In the case of few viable objects, high performance, can solve the memory fragmentation and reference update of the mark clearing algorithm

Disadvantages: will cause part of the memory waste. However, the memory block size can be adjusted according to the actual situation. If the number of surviving objects is large, the performance of coping becomes poor.

1.4 Collection by generation

A major problem with trace-based garbage collection algorithms (mark-sweep, node copy) is that they waste time on objects with long lifetimes (objects with long lifetimes do not require frequent scanning). Also, there is the fact that memory allocation is “most object die young”. Based on these two points, the generational garbage collection algorithm stores objects in two (or more) regions on the heap, called generations, over their lifetime. The recycling frequency of the Cenozoic area is significantly higher than that of the old area.

When assigning objects, you assign them from the new generation, and if you later discover that objects have a long life span, you move them to the old generation, a process called promote. As we promote, the size of the new generation will not take up a large proportion of the heap. When collecting, focusing on the Cenozoic generation will be relatively more efficient and the STW time will be shorter.

Advantages: Better performance. Objects with long life cycle have low GC frequency, while objects with short life cycle have high GC frequency. Most objects have short life cycle, which also reduces STW time.

Disadvantages: complex algorithm implementation.

2 go garbage recycling mechanism — three-color marking method

Tricolor marking method is an improved marking clearing algorithm, which mainly improves the marking part and makes garbage collection and user program execute concurrently

Algorithm idea:

  • First put all the objects in the white set
  • The object is traversed from the root node, and the white object traversed is moved from the white set into the gray set
  • Iterate over the objects in the gray set, putting the objects in the white set referenced by the gray object into the gray set, and putting the objects in the gray set traversed into the black set
  • Loop the previous step until there are no objects in the gray set
  • There are no objects in the gray collection, and the objects in the white collection are unreachable objects, which are garbage, which are collected

2.1 write barriers

Go does not have STW during the tricolor marking, that is to say, the object can still be modified at this time, which will cause problems. When the gray set is scanned during the tricolor marking, object A is scanned and all references of object A are marked, as shown in the figure belowAt this point, the reference to object D is scanned, at which point another Goroutine changes the reference to D->E to the one shown belowWill this result in E objects not being scanned and being mistaken for white objects, i.e. garbage

The write barrier is designed to solve this problem. After the introduction of the write barrier, E is considered alive after the above steps, even if it is discarded by A later, E will be reclaimed in the next GC, which will not be reclaimed.

The write barrier principle: At the beginning of each memory write, the compiler generates a short code block to ensure that constraints are not broken. When changing A reference to A->E, there is A small piece of code that disables this operation. This little piece of code is a constraint, and the constraint is that black objects cannot reference white objects.

Marking stage: Originally, STW was needed to mark nodes, but in order to avoid STW, the marking stage was also made concurrent with the user program without STW. The marking coroutine was jointly completed by multiple MarkWorker Goroutines. They were bound to P before the recycling task was completed, and then went into hibernation until they were awakened by the scheduler. They are executed concurrently with the user coroutine.

Two problems arise when the marking phase is concurrent with the user program: 1. Objects created by the user program. In the marking phase, objects created by the user may not be marked. Therefore, the object created by the user is directly considered black, and the tag will not mark it, so that the user program will create the object no problem. 2. The user program modifies the reference relationship of the object. Write barriers do not allow the black object reference white objects directly, of course write barriers is not really don’t let the black object reference white object, but rather a signal, the garbage collector will capture the signal after knew this object change, and scanning the object again, see the references or referenced whether be changed, This makes use of state resets to determine whether an object is alive or dead when its state changes.

So when does the marking phase STW?

Cleanup phase: No STW, because the cleanup is white objects, which cannot be used by the user program, so the cleanup program can execute concurrently with the user program without STW.

Concurrent cleanup is essentially an infinite loop that wakes up to perform cleanup tasks. The memory collector is triggered to reclaim by traversing all span objects. After the task is complete, sleep again and wait for the next task

To summarize some of the GO GC’s three-color labeling process:

The three-color notation scheme supports parallelism, meaning that user code can be run at the same time as GC code. Specifically, Golang GC is divided into several stages:

  • The Mark stage is divided into three parts:
    • Mark Prepare: Initializing the GC task, including enabling the write barrier and mutator assist (GC), and counting the number of tasks on the root object, requires STW.
    • GC Drains: Scans all root objects, including global Pointers and Pointers on the Goroutine (G) stack that will be stopped while scanning, adding them to a labeled queue (a gray queue), and cycles through grey-queued objects until the gray queue is empty. The process is executed in parallel in the background.
    • Mark Termination phase: This phase completes the marking and re-scans the global pointer and stack. Not only will new object assignments and Pointers not performed in A DRAIN, GC tests will require a write barrier to perform a drain, but re-scans will also perform a drain.
  • Sweep: Reclaims all white objects as a result of the markup, which is done in parallel in the background.
  • Sweep Termination: Sweep unswept spans. A new GC can be started only after the Sweep of the previous GC is complete.

To recap, Golang’s GC process will have two STW’s: the first STW will prepare for a scan of root objects, initiate a Write Barrier and a mutator assist GC in preparation for a DRAIN phase. A second STW will rescan a portion of root objects, disabling Write barriers and mutator assist GC as the GC drain phase is over.

The main reason the GO GC has so many phases is to allow the GC to execute concurrently with the user program. Split into phases, only go to STW when it is necessary, so that the other phases can be executed concurrently with the user phase.

Each STW takes very little time, generally within 1ms.

2.2 GO GC trigger time and conditions

(1) Trigger time. When an object larger than 32K bytes is allocated to the heap, the system checks whether garbage collection conditions are met. If yes, the system collects garbage.

(2) Trigger conditions.

// gcShouldStart returns true if the exit condition for the _GCoff
// phase has been met. The exit condition should be tested when
// allocating.
//
// If forceTrigger is true, it ignores the current heap size, but
// checks all other conditions. In general this should be false.
func gcShouldStart(forceTrigger bool) bool {
    return gcphase == _GCoff && (forceTrigger || memstats.heap_live >= memstats.gc_trigger) && memstats.enablegc && panicking == 0 && gcpercent >= 0
}
Copy the code

ForceTrigger is the forceGC flag; The number of active objects on the heap is greater than the GC trigger threshold we set during initialization. Heap_live will be updated all the time on Malloc and free.

3 GO GC tuning

1. Hard parameters

When reachable*(1+ GOG0/100)=8M, the GC is triggered to perform related GC operations.

Therefore, the GOGC parameter can be adjusted based on the memory footprint of the process to reduce the number of GC triggers

2. Tiops at code level

(1) Reduce the allocation of objects: the so-called reduction of the allocation of objects, in fact, as far as possible to achieve the reuse of objects. For example, the following two functions:

func(r*Reader)Read(a)([]byte,error)
func(r*Reader)Read(buf[]byte)(int,error)
Copy the code

The first function has no input parameter, so the first function allocates space and returns each time; The second function has an input parameter buf, so the passed buF can be used inside the function without allocating memory.

(2) do less string and []byte conversion. Reduce GC pressure

(3) Use less string concatenation. String concatenation with + generates a new object

(4) Predict the size of the array in advance. Always initialize the array with make, even if the size is 0, but it’s best to anticipate the size so you don’t have to expand it too often at append

4 reference

[1] Go garbage collection [2] Golang garbage collection analysis [3] Garbage collection in the way of marked clearance [4] Why Go in GC STW time is very short? [5] Garbage collection algorithm (count, mark, copy), Golang garbage collection [6] Golang source code exploration (three) GC implementation principle [7] in-depth understanding of Go- garbage collection mechanism [8] Java garbage collection copy algorithm