JVM Garbage Collection

Preface

Common interview questions for this section

The answers to these questions can be found in the text below:

How do you tell whether an object is dead? (two ways)
A brief introduction to strong, soft, weak, and phantom ("virtual") references (how phantom references differ from soft and weak references, and the benefits of using soft references).
How do you determine whether a constant is obsolete?
How do you tell whether a class is useless?
What are the garbage collection algorithms and their characteristics?
Why is the HotSpot heap divided into a new generation and an old generation?
What are the common garbage collectors?
Introduce the CMS and G1 collectors.
What is the difference between Minor GC and Full GC?

Motivation

When we need to troubleshoot memory overflow problems, or when garbage collection becomes the bottleneck that keeps a system from reaching higher concurrency, we have to monitor and tune these “automated” technologies as necessary.

1 Demystifying JVM memory allocation and reclamation

Java's automatic memory management comes down to two things: allocating memory for objects and reclaiming it. Its core is the allocation and reclamation of objects in heap memory.

The Java heap is the main area managed by the garbage collector, so it is also called the GC heap (Garbage Collected Heap). From the garbage collection point of view, the Java heap can be subdivided into the new generation and the old generation, and the new generation further into the Eden, From Survivor, and To Survivor spaces. The purpose of this finer partitioning is to reclaim memory better or to allocate it faster.

Basic structure of the heap space:

The Eden area, From Survivor0 (“From”) and To Survivor1 (“To”) shown in the figure above all belong to the new generation, while the Old Memory area belongs to the old generation.
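To see these spaces on a running JVM, one option is to list its heap memory pools through the standard java.lang.management API. The sketch below is an illustration, not part of the original article, and the class name HeapPools is hypothetical. Pool names depend on the collector in use: for example, the JDK 8 default Parallel collector reports "PS Eden Space", "PS Survivor Space" and "PS Old Gen", while G1 reports "G1 Eden Space", "G1 Survivor Space" and "G1 Old Gen". The two Survivor spaces are usually reported as a single pool.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Lists the heap memory pools (generations/spaces) reported by the running JVM.
class HeapPools {
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as Metaspace or the code cache
            }
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            long usedKb = usage == null ? -1 : usage.getUsed() / 1024;
            System.out.println(pool.getName() + ": used " + usedKb + " KB");
        }
    }
}
```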

In most cases, objects are first allocated in Eden. If an object is still alive after a new-generation garbage collection, it moves into S0 or S1 and its age is incremented by 1 (an object's age becomes 1 when it moves from Eden to a Survivor space). When its age reaches a certain value (15 by default), it is promoted to the old generation. The age threshold for promotion to the old generation can be set with -XX:MaxTenuringThreshold.


After this GC, Eden and “From” are empty. “From” and “To” then swap roles: the new “To” is the “From” of the previous GC, and the new “From” is the previous “To”. Either way, the Survivor space named “To” is always empty. Minor GC repeats this process until the “To” space is filled; once “To” is full, all objects are moved to the old generation.
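The Eden/Survivor cycle above can be sketched as a toy simulation. This is a simplified model written for illustration, not HotSpot code: sizes are counted in objects, the class and field names (SurvivorSimulation, tenuringThreshold, and so on) are hypothetical, and objects that no longer fit into “To” are promoted individually. As in HotSpot, an object is copied to a Survivor space only while its age is below the threshold, and the age is incremented as it is copied.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not HotSpot code) of Minor GC with Eden, two Survivor spaces and an old generation.
class SurvivorSimulation {
    static final class Obj {
        final String name;
        int age;             // number of Minor GCs this object has survived
        boolean live = true; // set to false to simulate the object becoming garbage
        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();
    List<Obj> to = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();
    final int survivorCapacity;  // counted in objects to keep the model simple
    final int tenuringThreshold; // stands in for MaxTenuringThreshold

    SurvivorSimulation(int survivorCapacity, int tenuringThreshold) {
        this.survivorCapacity = survivorCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o); // new objects go to Eden
        return o;
    }

    void minorGc() {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!o.live) {
                continue; // garbage is simply not copied
            }
            if (o.age < tenuringThreshold && to.size() < survivorCapacity) {
                o.age++;    // Eden -> Survivor gives age 1
                to.add(o);
            } else {
                old.add(o); // old enough, or "To" is full: promote to the old generation
            }
        }
        eden.clear();
        from.clear();
        List<Obj> tmp = from; // "From" and "To" swap roles, so "To" is empty again
        from = to;
        to = tmp;
    }

    public static void main(String[] args) {
        SurvivorSimulation sim = new SurvivorSimulation(2, 2);
        Obj a = sim.allocate("a");
        sim.allocate("b").live = false;
        for (int gc = 1; gc <= 3; gc++) {
            sim.minorGc();
            System.out.println("after GC " + gc + ": a.age=" + a.age
                    + ", survivor=" + sim.from.size() + ", old=" + sim.old.size());
        }
    }
}
```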

1.1 Objects are allocated in Eden first

Current mainstream garbage collectors use generational collection, so the heap is divided into the new generation and the old generation, which lets us choose an appropriate garbage collection algorithm for the characteristics of each generation.

In most cases, objects are allocated in the Eden area of the new generation. When Eden does not have enough space for an allocation, the virtual machine initiates a Minor GC. Let's test this in practice.

Testing:

```java
public class GCTest {
    public static void main(String[] args) {
        byte[] allocation1, allocation2;
        allocation1 = new byte[30900 * 1024];
        //allocation2 = new byte[900 * 1024];
    }
}
```

Run it with the following JVM parameter added:

-XX:+PrintGCDetails

Run result (the text in red in the figure is wrong; it should correspond to the permanent generation of JDK 1.7):

As the figure above shows, Eden is almost fully allocated (even if the program does nothing, the new generation uses more than 2000 K of memory). What happens if we also allocate memory for allocation2?

```java
allocation2 = new byte[900 * 1024];
```

Here is a brief explanation of what happens. When memory is allocated for allocation2, Eden is already almost fully used, and, as noted above, when Eden does not have enough space for an allocation the virtual machine initiates a Minor GC. During this GC, the virtual machine finds that allocation1 cannot fit into the Survivor space, so it has to use the allocation guarantee mechanism to move the new-generation objects into the old generation ahead of time. The old generation has enough space to hold allocation1, so no Full GC occurs. After the Minor GC, objects allocated later are still allocated in Eden as long as Eden can hold them. This can be verified by running the following code:

```java
public class GCTest {
    public static void main(String[] args) {
        byte[] allocation1, allocation2, allocation3, allocation4, allocation5;
        allocation1 = new byte[32000 * 1024];
        allocation2 = new byte[1000 * 1024];
        allocation3 = new byte[1000 * 1024];
        allocation4 = new byte[1000 * 1024];
        allocation5 = new byte[1000 * 1024];
    }
}
```
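Besides reading the -XX:+PrintGCDetails log, the same experiment can be observed from inside the program through the standard GarbageCollectorMXBean interface. The sketch below is an illustration; the class name GcCounts and the 8 x 1 MB allocation pattern are hypothetical choices. Collector names depend on the collector in use (under the JDK 8 default Parallel collector they are typically "PS Scavenge" for Minor GC and "PS MarkSweep" for the old generation). Running it with a small heap, for example -Xms20m -Xmx20m -Xmn10m, makes collections more likely to show up.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Reports how many collections each collector has run, as an in-process alternative to GC logs.
class GcCounts {
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        byte[][] chunks = new byte[8][];
        for (int i = 0; i < chunks.length; i++) {
            chunks[i] = new byte[1024 * 1024]; // 1 MB each, kept reachable through chunks
        }
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("collections during allocation: " + (totalCollections() - before)
                + " (" + chunks.length + " chunks still reachable)");
    }
}
```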

1.2 Large objects go directly into the old generation

Large objects are objects that require a large amount of contiguous memory (for example, long strings and large arrays).

Why is that?

To avoid the efficiency loss caused by copying large objects under the allocation guarantee mechanism when memory is allocated for them.

1.3 Long-lived objects enter the old generation

Since the virtual machine manages memory with generational collection, it must be able to tell which objects belong in the new generation and which belong in the old generation. To do this, it gives each object an age counter.

If an object is born in Eden, survives its first Minor GC, and can be accommodated by a Survivor space, it is moved to the Survivor space and its age is set to 1. Each Minor GC it survives in a Survivor space increases its age by 1; when its age reaches a certain value (15 by default), it is promoted to the old generation. The age threshold for promotion to the old generation can be set with -XX:MaxTenuringThreshold.

1.4 Dynamic object age determination

Fix (issue552): “When HotSpot traverses all objects, it accumulates the space they occupy by age, from the smallest age upward. When the accumulated size for some age exceeds half of a Survivor space (more precisely, TargetSurvivorRatio percent of it, 50 by default), HotSpot takes the smaller of that age and MaxTenuringThreshold as the new promotion age threshold.”

The HotSpot code for the dynamic age calculation is as follows:
```cpp
uint ageTable::compute_tenuring_threshold(size_t survivor_capacity) {
  // survivor_capacity is the capacity of a survivor space
  size_t desired_survivor_size = (size_t)((((double) survivor_capacity)*TargetSurvivorRatio)/100);
  size_t total = 0;
  uint age = 1;
  while (age < table_size) {
    total += sizes[age]; // sizes[age] is the total size of objects of this age
    if (total > desired_survivor_size) break;
    age++;
  }
  uint result = age < MaxTenuringThreshold ? age : MaxTenuringThreshold;
  ...
}
```
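To make the loop easy to experiment with, here is a Java transliteration of the C++ function above. It is an illustration only: the class and parameter names are hypothetical, and the sizes array is assumed to have 16 entries (ages 0 to 15, the range of the object age field) so that its length plays the role of table_size.

```java
// Java transliteration (illustration only) of HotSpot's ageTable::compute_tenuring_threshold.
class TenuringThreshold {
    /**
     * @param sizes                sizes[age] = total bytes of surviving objects of that age (index 0 unused);
     *                             its length plays the role of table_size
     * @param survivorCapacity     capacity of one Survivor space, in bytes
     * @param targetSurvivorRatio  percentage of the Survivor space to fill (TargetSurvivorRatio, 50 by default)
     * @param maxTenuringThreshold upper bound (MaxTenuringThreshold)
     */
    static int compute(long[] sizes, long survivorCapacity, int targetSurvivorRatio, int maxTenuringThreshold) {
        long desiredSurvivorSize = (long) (((double) survivorCapacity * targetSurvivorRatio) / 100);
        long total = 0;
        int age = 1;
        while (age < sizes.length) {
            total += sizes[age];               // accumulate sizes from the youngest age upward
            if (total > desiredSurvivorSize) {
                break;                         // this age pushes occupancy over the target
            }
            age++;
        }
        return Math.min(age, maxTenuringThreshold);
    }

    public static void main(String[] args) {
        long[] sizes = new long[16];           // ages 0..15
        sizes[1] = 100;
        sizes[2] = 200;
        sizes[3] = 300;                        // 100 + 200 + 300 = 600 > 500
        System.out.println(compute(sizes, 1000, 50, 15)); // prints 3
    }
}
```

With a 1000-byte Survivor space and TargetSurvivorRatio 50 the target is 500 bytes, so the threshold drops to 3 even though MaxTenuringThreshold is 15.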
A side note (issue672): most sources for the claim that the default promotion age is 15 trace back to the book Understanding the Java Virtual Machine. If you look up the VM option -XX:MaxTenuringThreshold=threshold on the Oracle website, you will find the following description:

“Sets the maximum tenuring threshold for use in adaptive GC sizing. The largest value is 15. The default value is 15 for the parallel (throughput) collector, and 6 for the CMS collector.”

So the default promotion age is not always 15: it depends on the garbage collector, and for CMS it is 6.
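One way to check the effective value on your own JVM is the HotSpot-specific HotSpotDiagnosticMXBean, sketched below (the class name ShowTenuringThreshold is hypothetical, and the API is only available on HotSpot-based JDKs). Running it with different collector flags should reflect the defaults quoted above; java -XX:+PrintFlagsFinal -version prints the same information for every flag.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Prints the effective value of a HotSpot VM option in the running JVM (HotSpot-specific API).
class ShowTenuringThreshold {
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue(); // throws IllegalArgumentException for unknown options
    }

    public static void main(String[] args) {
        System.out.println("MaxTenuringThreshold = " + vmOption("MaxTenuringThreshold"));
    }
}
```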

1.5 Areas where GC is mainly performed

Mr. Zhou Zhiming wrote in Understanding the Java Virtual Machine, 2nd edition, p. 92:

“Major GC (Full GC) refers to GC that occurs in the old generation…”

This has been corrected in the third edition of Understanding the Java Virtual Machine.

Conclusion:

For the HotSpot VM implementation, GCs fall into only two broad, precisely defined categories:

Partial GC:

Minor GC/Young GC: collects only the young (new) generation.
Major GC/Old GC: collects only the old generation. Note that in some contexts Major GC is also used to mean a whole-heap collection;
Mixed GC: collects the entire new generation and part of the old generation.

Full GC: Collects the entire Java heap and method area.

2 Is the object dead?

Almost all object instances live in the heap, and the first step before garbage collection is to determine which objects are dead (that is, objects that can no longer be used in any way).

2.1 Reference counting method

Add a reference counter to each object: whenever the object is referenced, the counter is incremented by 1; whenever a reference becomes invalid, it is decremented by 1. An object whose counter is 0 at any moment can no longer be used.

This method is simple to implement and efficient, but mainstream virtual machines do not use it to manage memory, mainly because it cannot easily handle circular references between objects. In the following code, objA and objB reference each other and nothing else references them. Because they reference each other, their reference counters never drop to zero, so a reference-counting algorithm cannot tell the GC to reclaim them.

```java
public class ReferenceCountingGc {
    Object instance = null;

    public static void main(String[] args) {
        ReferenceCountingGc objA = new ReferenceCountingGc();
        ReferenceCountingGc objB = new ReferenceCountingGc();
        objA.instance = objB;
        objB.instance = objA;
        objA = null;
        objB = null;
        // objA and objB now reference only each other
    }
}
```
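The cycle problem can be made concrete with a toy reference-counting model that replays the code above step by step. This is only an illustration of the technique (mainstream JVMs do not manage memory this way), and the names RefCountDemo, newLocal, setInstance and clearLocal are hypothetical. A local variable holding an object counts as one reference.

```java
// Toy reference-counting model (mainstream JVMs do NOT manage memory this way).
class RefCountDemo {
    static final class Node {
        final String name;
        int refCount;
        Node instance; // mirrors the `instance` field of ReferenceCountingGc
        Node(String name) { this.name = name; }
    }

    // A new object assigned to a local variable starts with one reference.
    static Node newLocal(String name) {
        Node n = new Node(name);
        n.refCount = 1;
        return n;
    }

    // owner.instance = target: the old target loses a reference, the new one gains one.
    static void setInstance(Node owner, Node target) {
        if (owner.instance != null) owner.instance.refCount--;
        owner.instance = target;
        if (target != null) target.refCount++;
    }

    // A local variable is set to null: the object it pointed to loses a reference.
    static void clearLocal(Node n) {
        n.refCount--;
    }

    static Node[] replayCycle() {
        Node objA = newLocal("objA");
        Node objB = newLocal("objB");
        setInstance(objA, objB); // objB.refCount == 2
        setInstance(objB, objA); // objA.refCount == 2
        clearLocal(objA);        // objA = null
        clearLocal(objB);        // objB = null
        return new Node[] {objA, objB};
    }

    public static void main(String[] args) {
        for (Node n : replayCycle()) {
            // Both counts stay at 1, so a pure reference-counting collector never frees them.
            System.out.println(n.name + " refCount = " + n.refCount);
        }
    }
}
```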

2.2 Reachability analysis algorithm

The basic idea of this algorithm is to take a set of objects called “GC Roots” as starting points and search downward from these nodes; the path traversed is called a reference chain. When an object is not connected to the GC Roots by any reference chain, the object is proven to be unusable.

Objects that can be used as GC Roots include:

Objects referenced from the virtual machine stack (the local variable tables in stack frames)
Objects referenced from the native method stack
Objects referenced by class static fields in the method area
Objects referenced by constants in the method area
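The marking step of reachability analysis amounts to a graph traversal from the GC Roots, sketched below on a toy object graph of named objects. The class name Reachability and the string-keyed graph representation are assumptions for illustration; a real collector walks actual heap references. Note that a cycle like objA/objB from section 2.1 is not marked, because nothing reachable points to it.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy reachability analysis: mark every object reachable from the GC Roots via reference chains.
class Reachability {
    /**
     * @param refs  object -> objects it references (the object graph)
     * @param roots the GC Roots (e.g. locals on the stack, static fields)
     * @return the live (marked) objects; everything else is unreachable
     */
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String obj = pending.pop();
            if (marked.add(obj)) { // first visit: follow its outgoing references
                for (String target : refs.getOrDefault(obj, Collections.<String>emptyList())) {
                    pending.push(target);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("localVar", Arrays.asList("x"));
        refs.put("x", Arrays.asList("y"));
        refs.put("objA", Arrays.asList("objB")); // objA and objB form a cycle...
        refs.put("objB", Arrays.asList("objA")); // ...but no reference chain from a root reaches them
        System.out.println(reachable(refs, Arrays.asList("localVar")));
    }
}
```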

2.3 References

Whether we count references with reference counting or check whether an object's reference chain is reachable with reachability analysis, deciding whether an object is alive always comes down to “references”.

Before JDK 1.2, Java's definition of a reference was the traditional one: if the value stored in a piece of reference-type data represents the starting address of another block of memory, that block of memory is said to represent a reference.

Since JDK 1.2, Java has expanded the concept of references into four kinds: strong references, soft references, weak references, and phantom (“virtual”) references.

1. StrongReference

Most of the references we have used so far are strong references, the most common kind. If an object has a strong reference, it is like an essential household item: the garbage collector will never reclaim it. When memory runs out, the Java virtual machine would rather throw an OutOfMemoryError and terminate the program than reclaim strongly referenced objects at random to solve the memory shortage.

2. SoftReference

If an object has only soft references, it is like a dispensable household item. If there is enough memory, the garbage collector does not reclaim it; if memory runs short, it reclaims the memory of these objects. As long as the garbage collector has not reclaimed it, the object can still be used by the program. Soft references can be used to implement memory-sensitive caches.

A soft reference can be used together with a reference queue (ReferenceQueue): if the object referenced by the soft reference is garbage collected, the Java virtual machine adds the soft reference to its associated reference queue.
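A memory-sensitive cache of the kind mentioned above can be sketched by holding values through java.lang.ref.SoftReference. The class SoftCache is hypothetical and deliberately minimal: it is not thread-safe, and entries whose values were reclaimed are only removed lazily, when they are next looked up.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Minimal memory-sensitive cache: values are held through SoftReferences,
// so the GC may clear them when memory runs low.
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();

    void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    // Returns null if the key is absent or its value has been reclaimed by the GC.
    V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key); // referent was cleared: drop the stale entry
        }
        return value;
    }

    public static void main(String[] args) {
        SoftCache<String, byte[]> cache = new SoftCache<>();
        cache.put("image", new byte[1024]);
        System.out.println(cache.get("image") != null ? "hit" : "reclaimed, reload it");
    }
}
```

Callers must always be prepared for get to return null and reload the value, since that is exactly the situation in which the GC has chosen to free memory.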

3. WeakReference

If an object has only weak references, it is also like a dispensable household item. The difference between weak and soft references is that objects with only weak references have a shorter lifetime. When the garbage collector thread scans the memory area under its control and finds an object with only weak references, it reclaims its memory regardless of whether current memory is sufficient. However, because the garbage collector is a low-priority thread, objects that have only weak references are not necessarily found quickly.

A weak reference can be used together with a reference queue (ReferenceQueue): if the object referenced by the weak reference is garbage collected, the Java virtual machine adds the weak reference to its associated reference queue.
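A WeakReference registered with a ReferenceQueue can be sketched as follows (the class name WeakRefDemo is hypothetical). Whether the reference is actually cleared and enqueued within the one-second wait depends on the GC: System.gc() is only a hint, so the second line of output may vary between runs.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

class WeakRefDemo {
    // While the referent is still strongly reachable, get() returns it.
    static boolean getWhileStronglyReachable() {
        Object payload = new Object();
        WeakReference<Object> weak = new WeakReference<>(payload);
        return weak.get() == payload; // reading payload here keeps it strongly reachable
    }

    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object payload = new Object();
        WeakReference<Object> weak = new WeakReference<>(payload, queue);
        System.out.println("before: get() returns the object: " + (weak.get() == payload));

        payload = null; // the object is now only weakly reachable
        System.gc();    // only a hint; the JVM is free to ignore it
        Reference<?> enqueued = queue.remove(1000); // wait up to 1 s for the reference to be enqueued
        System.out.println("enqueued: " + (enqueued == weak) + ", get() = " + weak.get());
    }
}
```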

4. PhantomReference

A phantom (“virtual”) reference is, as the name implies, a reference in name only. Unlike the other kinds, a phantom reference does not determine the lifetime of an object. If an object holds only phantom references, it can be garbage collected at any time, just as if it had no references at all.

Phantom references are mainly used to track the activity of objects being garbage collected.

One difference between a phantom reference and a soft or weak reference is that a phantom reference must be used together with a reference queue (ReferenceQueue). When the garbage collector is about to reclaim an object and finds that it has a phantom reference, it adds the phantom reference to its associated reference queue before reclaiming the object's memory. A program can tell whether a referenced object is about to be garbage collected by checking whether its phantom reference has been added to the reference queue. If it has, the program can take any necessary action before the object's memory is reclaimed.
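A minimal sketch of the pattern above (the class name PhantomRefDemo is hypothetical). PhantomReference.get() always returns null, so the program can never reach the object through it; the only signal is the reference appearing in the queue. As with any GC-dependent code, whether it is enqueued within the wait depends on the collector actually running.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

class PhantomRefDemo {
    static boolean getAlwaysReturnsNull() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(new Object(), queue);
        return phantom.get() == null; // true regardless of the referent's state
    }

    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object resource = new Object();
        PhantomReference<Object> phantom = new PhantomReference<>(resource, queue);
        System.out.println("get() = " + phantom.get()); // prints "get() = null"

        resource = null; // now only phantom-reachable
        System.gc();     // only a hint
        Reference<?> ref = queue.remove(1000); // wait up to 1 s
        if (ref == phantom) {
            System.out.println("referent is being reclaimed: release associated resources here");
        } else {
            System.out.println("not enqueued yet (the GC did not process it within the wait)");
        }
    }
}
```

On JDK 9 and later, java.lang.ref.Cleaner packages this phantom-reference pattern into a ready-made API for resource cleanup.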

In practice, weak references and phantom references are rarely used in program design, while soft references are used often. This is because soft references let the JVM reclaim cached objects when memory runs short, which helps keep the system running safely and prevents memory overflow (OutOfMemoryError) problems.

2.4 Unreachable objects are not necessarily dead

Even objects found unreachable by reachability analysis are not necessarily dead; at that point they are temporarily on “probation”. To truly declare an object dead, it must go through at least two marking passes. Objects found unreachable are marked for the first time and filtered once, and the filtering condition is whether it is necessary to execute the object's finalize method. If the object does not override finalize, or finalize has already been called by the virtual machine, the virtual machine treats both cases as “no need to execute”.

Objects judged to need finalization are placed in a queue and marked a second time; unless such an object manages to re-associate itself with some object on a reference chain at that point, it is actually reclaimed.

2.5 How do you determine whether a constant is obsolete?

The runtime constant pool mainly reclaims obsolete constants. So how do we tell whether a constant is obsolete?

An earlier version of this article said: “JDK 1.7 and later JVMs moved the runtime constant pool out of the method area and opened an area in the Java heap to hold it.” That statement is inaccurate; see the fix below.

Fix (issue747, reference) :

Before JDK 1.7, the runtime constant pool, which logically contained the string constant pool, was stored in the method area, and at that time the HotSpot virtual machine implemented the method area as the permanent generation.

In JDK 1.7, the string constant pool was moved from the method area to the heap. The runtime constant pool as a whole was not moved: only the string constant pool was taken out and placed in the heap, and the rest of the runtime constant pool stayed in the method area, that is, in the permanent generation in HotSpot.

In JDK 1.8 HotSpot, the permanent generation was removed and replaced by Metaspace. The string constant pool is still in the heap and the runtime constant pool is still in the method area, but the method area is now implemented by Metaspace instead of the permanent generation.

If the string “ABC” exists in the string constant pool but no String object refers to it, the constant “ABC” is obsolete. If memory reclamation occurs and it is necessary, “ABC” will be cleared out of the constant pool.

2.6 How do you determine whether a class is useless?

The method area mainly reclaims useless classes. So how do you tell whether a class is useless?

Deciding whether a constant is “obsolete” is relatively simple, but the criteria for deciding whether a class is “useless” are much stricter. A class is considered “useless” only if it meets all three of the following conditions:

All instances of the class have been reclaimed, meaning there are no instances of the class in the Java heap.
The ClassLoader that loaded the class has been reclaimed.
The java.lang.Class object corresponding to the class is not referenced anywhere, and the class's methods cannot be accessed through reflection anywhere.

A virtual machine may reclaim useless classes that meet these three conditions. This only says that it is allowed to; unlike objects, such classes are not necessarily reclaimed as soon as they are no longer used.

3 Garbage collection algorithms

3.1 Mark-sweep algorithm

The algorithm has two phases, “mark” and “sweep”: first it marks all objects that do not need to be reclaimed, and after marking is complete it reclaims all unmarked objects in one pass. It is the most basic collection algorithm; the later algorithms are improvements on its shortcomings. This algorithm has two obvious problems:

Efficiency
Space (marking and sweeping leaves a large number of non-contiguous fragments)
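The fragmentation problem can be seen in a toy mark-sweep over a heap of equal-size slots. In this sketch (class and method names are hypothetical) each slot holds an object name or null for free space, and the marked set is taken as the output of a previous mark phase such as the reachability analysis in section 2.2. Because sweeping frees objects in place, half of the heap can be free while no two free slots are adjacent.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy mark-sweep over a heap of equal-size slots; a slot holds an object name or null (free).
class MarkSweepDemo {
    // Sweep phase: every unmarked object is freed in place; live objects are never moved.
    static void sweep(String[] heap, Set<String> marked) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && !marked.contains(heap[i])) {
                heap[i] = null;
            }
        }
    }

    static int freeSlots(String[] heap) {
        int free = 0;
        for (String slot : heap) {
            if (slot == null) free++;
        }
        return free;
    }

    // Largest run of contiguous free slots = the largest object that can still be allocated.
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E", "F"};
        Set<String> marked = new HashSet<>(Arrays.asList("A", "C", "E")); // result of the mark phase
        sweep(heap, marked);
        System.out.println(Arrays.toString(heap)); // [A, null, C, null, E, null]
        System.out.println("free=" + freeSlots(heap) + ", largest contiguous=" + largestFreeRun(heap)); // free=3, largest contiguous=1
    }
}
```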

3.2 Copying algorithm

To solve the efficiency problem, the “copying” collection algorithm emerged. It divides memory into two equal halves and uses only one of them at a time. When that half is used up, the surviving objects are copied to the other half, and the used half is then cleaned up all at once. In this way, each collection reclaims one half of the memory.
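The copying idea can be sketched in the style of Cheney's algorithm, a classic way to implement semispace copying: live objects are copied into the empty half in breadth-first order, and the to-space itself doubles as the work queue. This is a toy model under stated assumptions (hypothetical class name, string-named objects, and a set standing in for forwarding pointers), not how HotSpot lays out objects. The work done is proportional to the number of live objects, which is why the technique suits a generation where most objects die young.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy semispace copying collector in the style of Cheney's algorithm.
class CopyingDemo {
    /**
     * Copies every object reachable from roots into a fresh to-space and returns it in copy order.
     * Unreachable objects are never touched: they are discarded along with the old from-space.
     */
    static List<String> collect(Map<String, List<String>> refs, List<String> roots) {
        List<String> toSpace = new ArrayList<>(); // objects are laid out contiguously in copy order
        Set<String> forwarded = new HashSet<>();  // stands in for forwarding pointers
        for (String root : roots) {
            if (forwarded.add(root)) toSpace.add(root);
        }
        int scan = 0;                             // to-space doubles as the breadth-first queue
        while (scan < toSpace.size()) {
            String obj = toSpace.get(scan++);
            for (String child : refs.getOrDefault(obj, Collections.<String>emptyList())) {
                if (forwarded.add(child)) toSpace.add(child); // copy each live object exactly once
            }
        }
        return toSpace;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("root", Arrays.asList("a", "b"));
        refs.put("a", Arrays.asList("c"));
        refs.put("dead1", Arrays.asList("dead2")); // unreachable: never copied
        System.out.println(collect(refs, Arrays.asList("root"))); // [root, a, b, c]
    }
}
```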

3.3 Mark-compact algorithm

This is a marking algorithm designed around the characteristics of the old generation. The marking phase is the same as in the “mark-sweep” algorithm, but instead of reclaiming the reclaimable objects directly, the next step moves all surviving objects toward one end and then directly clears the memory beyond that end boundary.
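A toy sliding compaction over the same slot-based heap model used for mark-sweep shows the effect: live objects slide toward index 0 in their original order, and everything past the new boundary becomes one contiguous free block. The class name is hypothetical, and the sketch omits a key part of a real mark-compact collector: computing forwarding addresses and updating every reference to the moved objects.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy sliding mark-compact over a heap of equal-size slots (reference updating omitted).
class MarkCompactDemo {
    /** Slides marked objects toward index 0 (keeping their order) and returns the new boundary. */
    static int compact(String[] heap, Set<String> marked) {
        int boundary = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && marked.contains(heap[i])) {
                heap[boundary++] = heap[i]; // move the live object to the next slot at the low end
            }
        }
        Arrays.fill(heap, boundary, heap.length, null); // everything past the boundary is free
        return boundary;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E", "F"};
        int boundary = compact(heap, new HashSet<>(Arrays.asList("A", "C", "E")));
        System.out.println(Arrays.toString(heap) + ", boundary=" + boundary); // [A, C, E, null, null, null], boundary=3
    }
}
```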

3.4 Generational collection algorithm

Current virtual machines use generational collection. The idea is not new; it simply divides memory into several blocks according to object lifetimes. The Java heap is generally divided into the new generation and the old generation, so that an appropriate collection algorithm can be chosen for the characteristics of each generation.

For example, in the new generation a large number of objects die in every collection, so the copying algorithm can complete each collection at the cost of copying only a small number of objects. Objects in the old generation have a higher survival rate, and there is no extra space to guarantee their allocation, so the “mark-sweep” or “mark-compact” algorithm must be used there.

Extended interview question: Why is the HotSpot heap divided into a new generation and an old generation?

Answer it using the description of the generational collection algorithm above.

4 Garbage Collector

If the collection algorithm is the methodology of memory collection, then the garbage collector is the concrete implementation of memory collection.

Although we compare collectors here, we are not trying to pick the best one. There is no single best garbage collector and no universal one, so all we can do is choose the collector that suits our application scenario. Think about it: if a perfect collector existed that could be used in every situation, the HotSpot virtual machine would not implement so many different garbage collectors.

4.1 Serial Collector

The Serial collector is the most basic and oldest garbage collector. As its name suggests, it is a single-threaded collector. “Single-threaded” means not only that it uses just one garbage collection thread to do the work, but also that it must pause all other worker threads (“Stop The World”) until the collection is complete.

The new generation uses the copying algorithm and the old generation uses the mark-compact algorithm.

The virtual machine designers were certainly aware of the poor user experience caused by Stop The World, so later collector designs kept reducing pause times (pauses still exist, and the search for better garbage collectors continues).

But does the Serial collector have any advantages over other collectors? It does: it is simple and efficient (compared with other collectors running single-threaded). Because it has no thread-interaction overhead, it naturally achieves high single-threaded collection efficiency. The Serial collector is a good choice for virtual machines running in Client mode.

4.2 ParNew collector

The ParNew collector is essentially a multithreaded version of the Serial collector, with the same behavior (control parameters, collection algorithms, collection strategies, and so on) as the Serial collector, except that it uses multiple threads for garbage collection.

The new generation uses the copying algorithm and the old generation uses the mark-compact algorithm.

It is the preferred collector for many virtual machines running in Server mode, and apart from the Serial collector it is the only one that can work with the CMS collector (a truly concurrent collector, described below).

A note on the concepts of parallelism and concurrency:

Parallel: multiple garbage collection threads work in parallel while the user threads remain waiting.
Concurrent: the user threads and the garbage collection threads execute at the same time (not necessarily in parallel; they may alternate), with the user program continuing to run while the garbage collector runs on another CPU.

4.3 Parallel Scavenge collector

The Parallel Scavenge collector is also a multithreaded collector that uses the copying algorithm, and it looks almost exactly like ParNew. So what is special about it?

```
-XX:+UseParallelGC      # Parallel Scavenge for the new generation + serial old generation
-XX:+UseParallelOldGC   # Parallel Scavenge for the new generation + Parallel Old for the old generation
```

The Parallel Scavenge collector focuses on throughput (efficient use of the CPU), whereas collectors such as CMS focus more on the pause time of user threads (improving the user experience). Throughput is the ratio of the CPU time spent running user code to the total CPU time consumed. The Parallel Scavenge collector provides many parameters for finding the most suitable pause time or the maximum throughput. If you are not sure how the collector works and manual tuning is difficult, using Parallel Scavenge with its adaptive sizing policy, which leaves memory management tuning to the virtual machine, is a good option.
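By the definition above, if the JVM spends 99 minutes running user code and 1 minute collecting garbage, throughput is 99 / (99 + 1) = 99%. The sketch below encodes that formula and then makes a rough in-process estimate. The class name Throughput is hypothetical, and the estimate is only an approximation: it uses wall-clock uptime and accumulated collection time from the management API rather than the CPU time the definition refers to. For Parallel Scavenge itself, the relevant tuning flags are -XX:MaxGCPauseMillis and -XX:GCTimeRatio.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class Throughput {
    // throughput = time running user code / (time running user code + GC time)
    static double throughput(double userTime, double gcTime) {
        return userTime / (userTime + gcTime);
    }

    public static void main(String[] args) {
        // Example: 99 minutes of user code and 1 minute of GC
        System.out.println(throughput(99, 1)); // 0.99

        // Rough in-process estimate: wall-clock uptime vs. accumulated collection time
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) gcMillis += t;
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        if (uptimeMillis > gcMillis) {
            System.out.printf("approximate throughput: %.4f%n", throughput(uptimeMillis - gcMillis, gcMillis));
        }
    }
}
```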

The new generation uses the copying algorithm and the old generation uses the mark-compact algorithm.

To see the default collector of JDK 1.8, run the command java -XX:+PrintCommandLineFlags -version:

```
-XX:InitialHeapSize=262921408 -XX:MaxHeapSize=4206742528 -XX:+PrintCommandLineFlags -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
```

JDK 1.8 uses Parallel Scavenge + Parallel Old by default. If -XX:+UseParallelGC is specified, -XX:+UseParallelOldGC is enabled by default as well; you can disable it with -XX:-UseParallelOldGC.

4.4 Serial Old collector

The old-generation version of the Serial collector, also single-threaded. It has two main uses: as a companion to the Parallel Scavenge collector in JDK 1.5 and earlier releases, and as a fallback for the CMS collector.

4.5 Parallel Old Collector

The old-generation version of the Parallel Scavenge collector. It uses multiple threads and the mark-compact algorithm. The Parallel Scavenge and Parallel Old collectors are the preferred combination for applications where throughput and CPU resources matter most.

4.6 CMS Collector

The CMS (Concurrent Mark Sweep) collector is a collector whose goal is the shortest possible collection pause time. It is well suited to applications that care about user experience.

The CMS (Concurrent Mark Sweep) collector was the HotSpot virtual machine's first truly concurrent collector; it lets the garbage collection threads work (mostly) at the same time as the user threads.

As the words Mark Sweep in its name suggest, the CMS collector is implemented with the mark-sweep algorithm, and its operation is more complex than that of the previous collectors. The whole process has four steps:

Initial mark: pause all other threads and record the objects directly connected to the roots; this is fast.
Concurrent mark: run the GC threads and user threads at the same time, using a closure structure to record reachable objects. However, at the end of this phase the closure is not guaranteed to contain all currently reachable objects, because the user threads may keep updating reference fields, so the GC threads cannot guarantee real-time reachability analysis. The algorithm therefore keeps track of where these reference updates occur.
Remark: this phase corrects the marks of the objects whose marking changed because the user program kept running during concurrent marking. Its pause is usually slightly longer than the initial mark and much shorter than the concurrent mark phase.
Concurrent sweep: the user threads resume, and at the same time the GC threads sweep the unmarked areas.

CMS is an excellent garbage collector whose main advantages are concurrent collection and low pauses. But it has the following three obvious disadvantages:

Sensitive to CPU resources;
Unable to handle floating garbage;
Its collection algorithm, “mark-sweep”, leaves a large amount of space fragmentation at the end of a collection.

4.7 G1 Collector

G1 (Garbage-First) is a server-oriented garbage collector aimed mainly at machines with multiple processors and large amounts of memory. It meets GC pause-time targets with very high probability while also delivering high throughput.

It is regarded as an important evolutionary feature of the HotSpot virtual machine in JDK 1.7, and it has the following characteristics:

Parallelism and concurrency: G1 takes full advantage of multi-CPU, multi-core hardware, using multiple CPUs (or CPU cores) to shorten Stop-The-World pauses. Where other collectors must pause the Java threads to perform GC actions, G1 can still let the Java program keep running concurrently.
Generational collection: although G1 can manage the entire GC heap on its own without the cooperation of other collectors, it still keeps the concept of generational collection.
Spatial consolidation: unlike CMS's “mark-sweep” algorithm, G1 as a whole is a collector based on the “mark-compact” algorithm, and locally it is based on the “copying” algorithm.
Predictable pauses: this is another big advantage G1 has over CMS. Reducing pause times is a common concern of both G1 and CMS, but G1 also builds a predictable pause-time model that lets users explicitly specify that, within a time segment of M milliseconds, no more than N milliseconds may be spent on garbage collection.

The G1 collector operates in the following steps:

Initial marking
Concurrent marking
Final marking
Live data counting and evacuation

The G1 collector maintains a priority list in the background and, each time, based on the allowed collection time, collects first the Region with the greatest collection value (hence the name Garbage-First). Dividing memory into Regions and collecting Regions by priority ensures that G1 collects as efficiently as possible within a limited amount of time.
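The “collect the most valuable Regions within the time budget” idea can be sketched as a greedy selection. This is a deliberately simplified model: the class G1RegionPicker, its fields, and the greedy rule (most reclaimable bytes first, skip any Region that would overrun the budget) are assumptions for illustration. Real G1 ranks candidate regions by its own efficiency estimates and uses a pause-prediction model driven by the -XX:MaxGCPauseMillis target (200 ms by default).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Greedy toy model of Garbage-First region selection under a pause-time budget.
class G1RegionPicker {
    static final class Region {
        final int id;
        final long reclaimableBytes;  // garbage that collecting this region would free
        final double predictedMillis; // predicted time to evacuate its live objects

        Region(int id, long reclaimableBytes, double predictedMillis) {
            this.id = id;
            this.reclaimableBytes = reclaimableBytes;
            this.predictedMillis = predictedMillis;
        }
    }

    /** Picks regions with the most garbage first, skipping any that would exceed the pause budget. */
    static List<Region> pick(List<Region> regions, double pauseBudgetMillis) {
        List<Region> byValue = new ArrayList<>(regions);
        byValue.sort(Comparator.comparingLong((Region r) -> r.reclaimableBytes).reversed());
        List<Region> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : byValue) {
            if (spent + r.predictedMillis <= pauseBudgetMillis) {
                chosen.add(r);
                spent += r.predictedMillis;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
                new Region(1, 100, 5), new Region(2, 500, 8),
                new Region(3, 300, 4), new Region(4, 50, 1));
        for (Region r : pick(regions, 10)) {
            System.out.println("collect region " + r.id); // regions 2 and 4
        }
    }
}
```

With a 10 ms budget the sketch takes Region 2 (8 ms), skips Regions 3 and 1 because either would overrun the budget, and fills the remaining time with Region 4.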

References

Understanding the Java Virtual Machine in Depth: Advanced JVM Features and Best Practices (2nd Edition)
my.oschina.net/hosee/blog/…
docs.oracle.com/javase/spec…