JVM garbage collection algorithms and common garbage collectors

preface

In this article, we will focus on garbage collection in the JVM, including the common garbage collection algorithm and the common garbage collector.

Front knowledge

Garbage collection is mainly concentrated in the Java heap, allocating space for objects and recycling garbage object space

Before we get into garbage collection, we need to know how objects are allocated in heap memory. Objects are allocated according to the following three principles:

Objects are allocated in Eden area first

The heap space is divided into new generation and old age, which is mainly for the purpose of better regional garbage collection for objects with different lifetimes. When an object is allocated memory for the first time and there is enough space in the Eden region, it is allocated in the Eden region first.
Big object goes straight to the old age

When an object needs a large space and Eden does not have such a large space to allocate, the object will be allocated in the old age
Long-lived objects enter the old age

The virtual machine uses a generational collection algorithm to manage heap memory. If an object allocated to Eden is still alive after a Minor GC and Survivor has enough space, the surviving object is moved to one of the two Survivor zones and its age becomes 1. When a threshold is reached, the object will be promoted to the old age. The parameter -xx :MaxTenuringThreshold can set this promotion threshold.

The object is dead

Now that we know the basics, we should think about garbage collection. What objects are garbage?

Identifying garbage relies on two algorithms

Reference Counting

Every reference to an object is counted, with a counter +1 for every reference to the object, and -1 for every reference to fail. The object header holds the reference count. Objects with a count greater than 0 are considered alive, and objects with a count less than 0 can be reclaimed. The Recycler algorithm can be used to solve the Recycler problem, but it is not used in The Java virtual machine in a multi-threaded environment where changes to the Recycler reference count are expensive and slow to be synchronized.
Tracing GC

These referenced nodes form a reference chain. Objects that can be searched are considered reachable. If there is no reference link between the object and GC Root, the object will be reclaimed later. In the figure below, the green objects are the living objects and the gray objects are the recyclable objects.

Supplement:

Objects that can be used as GC Root objects:

The object referenced in the virtual machine stack (the local variable table in the stack frame)
Objects referenced in the Native method stack
The object referenced by the class static property in the method area
The object referenced by the constant in the method area
All objects held by the synchronization lock

Classification of references

A series of garbage collection operations are related to references, whether it is the reference counting algorithm to count the number of references to an object, or the reachability analysis algorithm to determine whether an object is reachable by segment. Before jdk1.2, the definition of references was relatively traditional. After 1.2, references were divided into four types: strong references, soft references, weak references, and virtual references.

Strong reference

Java programmers mostly use strong references. As long as an object has a strong reference relationship, the garbage collector will never reclaim it. Even in the case of low memory, the virtual machine would rather throw OOM errors than reclaim it.
Soft references

An object with a SoftReference is a useful but unnecessary object. If the system memory is insufficient, the object associated with a SoftReference is reclaimed by a vm. You can use the SoftReference class to implement a SoftReference.
A weak reference

Objects with weak references are also non-essential, but less strong than soft references. Weak references only survive until the next garbage collection occurs. The difference with soft references is that weak references are collected whenever garbage collection occurs, regardless of whether there is enough memory.
Phantom reference

Phantom reference belongs to that can be referenced when it does not exist, it is one of the weakest reference relationship, whether an object has a virtual reference completely will not affect the life cycle of an object, the purpose of existence of virtual reference only to the object by the garbage collector recovery when receive a system notification, can be done by PhantomReference class reference.

Garbage collection algorithm

After being able to identify garbage objects, how to recycle these garbage objects seems to be a problem, in the Java virtual machine to solve this problem mainly rely on garbage algorithms, the following details of the following several common garbage collection algorithms.

Generation collection theory

In today’s virtual machines, the design basically follows the theory of “generational collection”, which is based on his rule of thumb. It is based on two generational hypotheses:

The weak generation hypothesis: Most subjects die young.

Strong generation hypothesis: objects that survive multiple garbage collections have a good chance of surviving.

The above two hypotheses can be summed up in one sentence: the strong are always strong, and the weak are always weak.

The generation collection theory in Java VIRTUAL machine is shown as follows: Will heap space is divided into two areas: the new generation, old age, according to different area with different garbage collection algorithms, because most of the object is in the new generation of space allocation at the beginning, in the new generation of most objects are comply with the principle of facing life in death, so the new generation is divided into Eden and Survivor of two pieces of the same size, The surviving objects switch back and forth in the Survivor zone until the age of the object reaches the promotion threshold and enters the old age.

Mark-clean algorithm

The marking – clearing algorithm can be divided into “marking” and “clearing” stages. First, mark all objects that need to be reclaimed. After the mark is completed, all marked objects are reclaimed.

Its disadvantages are also obvious: it causes a lot of memory fragmentation, requires a lot of marking actions, and is less efficient.

Mark-copy algorithm

Principle: Divide the memory into two areas of the same size, use only one of them at a time, when one of the memory is used up, the surviving objects are moved to another area, and then the used memory is cleaned up once. The advantage of this algorithm is that it solves the problem of large amount of memory fragmentation caused by the marker clearing algorithm, and this algorithm has been applied in the new generation. The disadvantages are also obvious: only half of the space is used at the same time, which wastes more space.

Mark-collation algorithm

Principle: first to mark of garbage objects, then move the object to one end of the memory of survival, and then clean up memory area boundary, the difference between it and mark sweep algorithm is that it’s the middle of a move, he has an advantage in the efficient use of memory space to minimize the memory fragments at the same time.

A common garbage collector

With the above garbage collection algorithm, garbage collector is the concrete practice of garbage collection algorithm. Different garbage collectors have their own advantages and disadvantages, and are applicable to different scenarios, so it is up to us programmers to choose different garbage collectors according to different scenarios.

Serial collector

Serial collector is the oldest collector, it is based on the single-thread collector, the Serial collector when working, must suspend all other worker threads until it is finished collecting, from this can be seen in the garbage collection, will cause a pause, will affect the user experience. But even so, the Serial collector has a high collection efficiency with single threads and is well suited for virtual machines running in client mode.

ParNew collector

The ParNew collector can be understood as a multithreaded version of the Serial collector. It uses multiple threads to collect garbage during the garbage collection phase, but otherwise behaves exactly like the Serial collector. After JDK9, ParNew can only be used with CMS

Parallel avenge

The Parallel Collector is a new generation collector based on the mark-copy algorithm. The Parallel Insane collector is similar to the ParNew collector, but the focus of the Parallel Insane collector is to achieve a controlled throughput. The ratio of the amount of time the processor spends running user code to the total elapsed time the processor consumes. The Parallel Scavenge avenge provides a number of parameters to find the most appropriate pause times or maximum throughput. It is also a good idea to turn memory management optimization over to the VIRTUAL machine using the Parallel Insane collector in conjunction with an adaptive adjustment strategy.

Serial Old collector

Serial Old collector An older version of the Serial collector, also a single-threaded version, is intended for use by virtual machines in client mode. It is used in server mode in two ways: 1. Pair with the Parallel Scavenge collector in JDK5 and previous versions, and 2. As a fallback in case the CMS collector fails.

Parallel Old collector

Parallel Old is an older version of the Parallel Avenge collector, supported by multiple threads for concurrent collection and implemented on a mark-collation algorithm.

CMS collector

The CMS (Concurrent Mark Sweep) collector is concerned with minimizing the pause time of user threads during garbage collection. The shorter the pause times (low latency), the better the application that interacts with the user, and the better the response speed improves the user experience. At present, a large part of Java applications are concentrated on the server side of Internet sites or B/S systems. These applications pay special attention to the response speed of services and hope to have the shortest system pause time to bring users a better experience. The CMS collector is a good fit for such applications.

The CMS collector is a collector whose goal is to obtain the shortest collection pause time. It is perfectly suited for use in ux focused applications. The CMS collector is the first truly concurrent collector for the HotSpot VIRTUAL machine, allowing the garbage collector thread to work (basically) at the same time as the user thread.

The whole process of CMS is more complex than the previous collector. The whole process is divided into four main stages, namely the initial marking stage, concurrent marking stage, re-marking stage and concurrent clearing stage. (The main stages involved in STW are: initial marking and re-marking)

Initial-mark phase: In this phase, all worker threads in the program are briefly paused by stop-the-world, and the main task of this phase is simply to Mark objects that GC Roots can be directly associated with. Once the tag completes, all application threads that were suspended will be resumed. Because the directly related object is small, the speed here is very fast.
Concurrent-mark phase: The process of traversing the entire object graph from the directly associated objects of GC Roots. This process is time-consuming but does not require the suspension of the user thread and can be run concurrently with the garbage collection thread.
Re-marking phase: Because in concurrent mark phase, the program will work thread and garbage collection threads run at the same time or cross operation, thus to correct during concurrent tags, tags that changes caused by the user program continue to operate the part of object mark record, this phase of the pause time usually slightly longer than the initial mark phase, It also causes “stop-the-world” to occur, but for a much shorter time than the concurrent marking phase.
Concurrent-sweep phase: This phase removes dead objects judged by the concurrent-sweep phase, freeing up memory. Since there is no need to move live objects, this phase can also be concurrent with the user thread.

Advantages and disadvantages of CMS:

advantages

Low latency disadvantage of concurrent collection

Memory fragmentation occurs, resulting in insufficient space for user threads after concurrent cleanup. In the case that large objects cannot be allocated, the Full GC has to be triggered early. The CMS collector is very sensitive to CPU resources. In the concurrent phase, it does not cause user pauses, but it does slow down the application and reduce overall throughput by taking up a portion of the threads. The CMS collector cannot handle floating garbage.

G1 collector

G1 (garbage-first) Garbage collector is a new Garbage collector introduced after Java7, which is one of the most advanced achievements in the development of collector technology. At the same time, in order to accommodate today’s expanding memory and increasing number of processors, pause time is further reduced while maintaining good throughput. The official goal for G1 is to achieve the highest throughput possible with manageable latency, so the G1 collector is officially called a “fully functional collector” by Oracle. G1 is a server-oriented garbage collector for machines with multiple processors and large amounts of memory.

When using the G1 collector, it divides the entire Java heap into approximately 2048 independent Region blocks of the same size. Each Region block size is determined by the actual size of the heap, and the overall Region block size is controlled between 1MB and 32MB, and is controlled to the NTH power of 2, namely 1MB, 2MB, 4MB, 8MB, 16MB, and 32MB. Can be achieved by

XX: G1HeapRegionSize is set. All regions are the same size and do not change during the lifetime of the JVM.

Although the concept of Cenozoic and oldyn is still retained, Cenozoic and oldyn are no longer physically separated; they are collections of parts of regions (which do not need to be continuous). Dynamic Region allocation is used to achieve logical continuity.

A Region may belong to an Eden, Survivor, or Old/Tenured memory Region. However, a Region can belong to only one role.

Features:

Parallelism and concurrency: The G1 takes full advantage of CPU, multi-core hardware, and uses multiple cpus (cpus or CPU cores) to reduce stop-the-world pause times. While other collectors would have paused GC actions performed by Java threads, the G1 collector can still allow Java programs to continue executing concurrently.
Generational collection: Although G1 can manage the entire GC heap independently without the cooperation of other collectors, the concept of generational collection is retained.
Spatial integration: Different from CMS’s “mark-clean” algorithm, G1 is a collector based on the “mark-clean” algorithm as a whole; Locally, it is based on the mark-copy algorithm.
Predictable pauses: This is another big advantage G1 has over CMS. Reducing pause times is a common concern of BOTH G1 and CMS, but G1 also models predictable pause times, allowing users to explicitly specify a time segment of M milliseconds in length.

The G1 collector operates in the following steps:

Initial tag: Mark objects that GCRoots can associate directly with
Concurrency marker: It takes a long time to analyze the reachability of objects in the heap starting from GCRoot to find objects to recycle
Final sign: Do a short pause on the user thread
Filter reclamation: Updates the statistics of a Region, makes a reclamation plan based on the pause time expected by users, copies the surviving objects in the Region to be reclaimed to an empty Region, and clears the entire old Region space.

The G1 collector maintains a priority list behind the scenes, prioritizing the Region with the greatest collection value (hence its name garbage-first) based on the allowed collection time each time. This use of regions and prioritized Region collection ensures that the G1 collector can collect as efficiently as possible in a limited amount of time (dividing memory into pieces).

ZGC collector

ZGC is an experimental low-latency garbage collector added in JDK 11. It is designed to minimize the impact on throughput and achieve low latency that can limit garbage collection pauses to less than 10 milliseconds at any heap memory size. Is a garbage collector based on Region memory layout, no generation, the use of read barrier, dye pointer and memory multiple mapping technology to achieve concurrent mark-collation algorithm, with low latency as the primary goal. At present, JDK8 is mostly used in the production environment, so it is not used much. If you are interested, you can have a look at this article by Meituan technical team: Exploration and practice of the new generation of garbage collector ZGC.

Reference: Understanding the Java Virtual Machine in Depth: Advanced JVM Features and Best Practices (3rd edition)

————-end————

Java virtual machine will continue to update the related articles, to have a little wave of attention to go. Wechat public search Java wind