Take you through garbage collection algorithms

This is the sixth day of my participation in the August Text Challenge.More challenges in August

When it comes to writing code, there is also the problem of garbage collection. By garbage, I mean the memory space that is no longer needed in the program. Garbage collection refers to the collection of the memory space that is no longer needed so that the program can reuse the freed memory space.

So how does garbage collection work without algorithms? In this class, we will discuss the Java garbage collection algorithm.

Mark-sweep algorithm

Mark – Remove, as the name suggests, marks the garbage and then removes it. It is the most basic garbage collection algorithm, and many subsequent algorithms are based on it to improve. The mark-clear algorithm is divided into two stages:

Mark phase: Objects that need to be reclaimed. This process is essentially the process of using reachability analysis to determine whether an object is garbage.

Cleanup phase: After the marking is complete, the objects to be reclaimed are uniformly cleaned up.

As shown in the figure below:

The first step is to mark which objects are alive and which are recyclable. When the tag is complete, delete the object it recycles. You can see that in this picture. Mark clearing has some disadvantages. After mark clearing, a large number of discontinuous memory fragments will be generated. Too much space fragment may cause the program to be unable to find enough continuous memory and have to trigger another garbage collection action in advance when it needs to allocate large objects during the running process. For example, if we want to allocate three units of memory in a row, we need to allocate a large byte array. There is no way to do this in this memory structure.

Mark-compact algorithm

So let’s take a look at the mark-unscramble algorithm, and there are some articles that translate it into mark-compress, which is generally called mark-unscramble in this course, so how does mark-unscramble work?

First and marked for recycling of objects, the process and tag removal is the same, but after the completion of a tag is not directly to the recycling of clear object, but all the live objects to one end of the memory, the last in clearing out all the space boundaries, so does not produce memory fragments, increase the efficiency of memory, this algorithm is suitable for old age. As shown in the figure below:

It marks which objects are alive and which can be reclaimed, then compresses the surviving objects to one end of memory, and finally removes the recyclable objects. This avoids the problem of memory fragmentation, but it is also time consuming in the process of marking and defragmenting moves.

Copy algorithm

Replication algorithm generally this is a play, the memory is divided into two pieces, each time USES only one block, then put in use this memory inside of live objects are copied to not use of memory, and then clean up is using the memory all the objects in it, and then the role of the exchange of two pieces of memory, waiting for the next collection. As shown in the figure below:

Divide a block of memory into two pieces, use only one piece at a time, and when doing garbage collection, move the surviving objects to the other end of memory, then clean up all objects in this block. The next collection will do the same, moving the surviving objects back and then clearing out all objects in the memory.

Compare the three algorithms

These are the three basic garbage collection algorithms. Let’s compare them.

Advantages of the mark-clear algorithm: it is relatively simple to implement.

Disadvantages: Memory fragmentation exists. In addition, the speed of memory allocation is also affected when using the tag clearing algorithm. Why is this? You know, let’s say I have a relatively large object, because now I have a lot of fragmented memory. If you want to find a contiguous space, you need to facilitate the free linked list to find out which chunk of memory can hold such an object. So in extreme cases, you need to go through the entire list to know where the object should be allocated, or there is no way to allocate it at all. Traversing the free list takes time, so the speed of memory allocation is affected.

Advantages of the mark-defragment algorithm: no fragmentation.

Cons: Collation is expensive. Because markup collation requires gathering objects into one end of memory, this process requires calculation and time, and the more objects, the more memory, the more the collation costs.

The advantage of the replication algorithm is good performance and no fragmentation. In general, the copy algorithm performs better than the mark-clean and mark-tidy algorithms, because unlike the mark-clean or mark-tidy algorithms, which need to mark which objects are alive and which objects can be reclaimed, only the surviving objects need to be found. Then move the surviving object. So there are some performance advantages.

Disadvantages: Low memory usage. We can only use half of the entire memory at a time, and memory utilization is only 50% at most.

Two integrated algorithms

Ok, after exploring the three basic garbage collection algorithms, let’s look at two relatively comprehensive garbage collection algorithms in Java.

The first is the generational collection algorithm. The idea of generational collection algorithm is to divide a memory into multiple regions, and different regions are reclaimed using different collection algorithms. The generation collection algorithm is complex and extremely detailed. We will discuss this in detail below.

The second is the incremental algorithm. You know, if you have a very large memory, if you collect all the garbage at once, it will take a long time, which may cause the system to pause for a long time. So the idea of the incremental algorithm is to collect only a small area of memory space garbage at a time, so that the system can reduce pauses.

Generational collection algorithm

At present, the heap memory of various commercial virtual machines is basically collected by generation. So you can imagine how important generational collection algorithms are.

The idea of the current algorithm is to divide the memory into multiple regions according to the life cycle of objects, and then different regions use different garbage collection algorithms to collect objects. Java divides the heap into new generation and old generation. The following figure is illustrated in detail earlier when we explored Java memory structures.

So over the generations, garbage collection can be divided into the following categories:

One is the collection of the new generation (Minor or Young GC).

The second is the old collection (Major GC).

The third is Full GC of the whole heap.

Since a Major GC is usually accompanied by a Minor GC, it can be said that the Major GC is ≈ Full GC. So a lot of times, programmers talk about the same thing when they talk about Major GC and Full GC.

Let’s take a look at how objects are allocated to the heap memory, and let’s talk about the structure of the heap memory. When objects are created, they are stored in Eden first. When Eden is full, garbage collection is triggered. The process of garbage collection is To copy From or To Survivor of the surviving objects in Eden.

For example, if the first collection is From Survivor, the next collection copies the surviving object From Survivor To Survivor. The next time it copies the surviving object From Survivor into From Survivor and repeats.

It is not hard to see that this recycling process uses a replication algorithm, which is why the new generation has two survivors. Because copy algorithms need to split a memory in two. Each time an object goes through a garbage collection, its age increases by +1 if it survives. When an object’s age reaches the threshold, which is 15 by default, it is promoted to the old age.

In the old age, the survival rate of the object is generally relatively high, because you think, have recovered 15 times, or failed to recover, so the possibility of continuing to survive is still relatively large.

Older objects are usually recycled using either mark-clean or mark-clean algorithms. It should be noted here that this object allocation process is a typical allocation process, with some exceptions:

First, the interest rate of a newly created object will always be allocated to Eden, or it may directly enter the old age. There are two main scenarios. One, if your object size is larger than this threshold it will be allocated to the old age. Of course, the default value of this parameter is zero, which means that all objects will be allocated in Eden first. Second, if your object is very large, for example, a large array, the new generation of space is not enough, then this time will directly put the object to the old age to guarantee. The main reason for allowing objects to be directly allocated to the old age is that the new generation uses the replication algorithm. Allocating large objects in Eden will result in a large number of memory copies between Eden and two Survivor areas.

Second, the object does not have to reach the age threshold to enter the old age. Virtual machines have a concept of dynamic age, which means that the total size of all objects of the same age in the Survivor zone is already greater than half of the Survivor space. At this point, if an object is older than this age, it will go straight to the old age, for example, if you have a bunch of objects whose age is 9, and the sum of all the objects whose age is 9 is greater than half of the Survivor space, then the objects whose age is greater than 9 will go straight to the old age.

Conditions that trigger garbage collection

Let’s look at the conditions that trigger garbage collection in different regions:

Let’s take a look at the trigger conditions for Cenozoic generation collection. If Eden has insufficient space, Minor GC will collect Cenozoic generation.

Let’s look at the Full GC trigger condition. The Full GC trigger condition is a bit more complicated, and there are several scenarios:

One is, the old age space is insufficient, the old generation space is insufficient and divided into two situations: the first is the space is really insufficient. The second is memory fragmentation, which causes no contiguous memory to allocate objects, triggering the Full GC.

Second, the lack of meta-space.

Third, after a new generation is recycled, the space occupied by the object to be promoted to the old age is larger than the remaining space of the old age, which will also trigger Full GG.

Fourth, system.gc () is explicitly called. System.gc() is used to suggest garbage collection container diameter garbage collection. This code triggers Full GC. You can also ignore system.gc () calls with the parameter -xx :DisableExplicitGC.

Ok, so here we can briefly summarize, generation collection algorithm is based on the life cycle of the object, do the generation of memory. Then, in the allocation of objects, objects with different life cycles are placed in different generations, and appropriate recycling algorithms are selected for recycling on different generations.

For example, the life cycle of objects in the new generation is relatively short, and a large number of objects are found dead every time the garbage is collected. So IBM has done research that more than 98% of objects die quickly, and only a small number of objects survive, so the new generation can use replication algorithms to complete garbage collection. In the old age, the survival rate of objects is relatively high, so the use of mark-clearing algorithm or mark-sorting algorithm for recycling.

So what are the benefits of three generations of simple mark-clean, mark-clean and copy algorithms?

First of all, generation can more effectively remove unwanted objects. For objects with a short life cycle, objects will be recycled when they are still in the new generation.

Secondly, generation improves the efficiency of garbage collection. If you don’t do generational, you need to scan the entire heap. Now you just scan the new generation or the old.

conclusion

Ok, just to sum up, in this class we discussed five algorithms for garbage ports. The basic garbage collection algorithms are mark-sweep, mark-collation, and copy. In addition, two comprehensive garbage collection algorithms, namely generational collection algorithm and incremental algorithm, are discussed in detail.

Finally, the tuning principle of generational collection algorithm is summarized as follows:

First, set the size of the Survivor zone properly, because the Survivor zone does not use much memory. If the configuration is too large, the memory waste will be serious.

Second, try to make GC as minor-level as possible.

Minimize the occurrence of Full GC.

Also, summarize JVM parameters for heap memory.

parameter	role	The default
-XX:NewRatio=n	Old age: Memory size ratio of new generation	2
-XX:SurvivorRatio=n	Eden: Ratio of memory size to survivor zone	8
-XX:PretenureSizeThreshold=n	Object size This value is allocated in the old age, 0 means no restriction	0
-Xms	Small heap memory required	–
-Xmx	Maximum heap memory	–
-Xmn	Cenozoic size	–
-XX:+DisableExplicitGC	Ignore calls to system.gc ()	To enable the
-XX:NewSize=n	Initial memory size of new generation	–
-XX:MaxNewSize=n	New generation maximum memory	–

Take you through garbage collection algorithms

Mark-sweep algorithm

Mark-compact algorithm

Copy algorithm

Compare the three algorithms

Two integrated algorithms

Generational collection algorithm

Conditions that trigger garbage collection

conclusion

Related Posts

RocketMQ source learning (iii)-Producer business

What are some good books to read about TensorFlow, the popular algorithm framework?

LoadRunner- Summary (I)