This is the twelfth article in the In-Depth Learning About the JVM series.

This article continues the tour of the garbage collectors that have appeared over the history of the JVM. Most of the material comes from the book In-depth Understanding of the Java Virtual Machine. There is little to add beyond it, but it is reproduced here for the sake of completeness.

Serial collector

The Serial collector is the most basic and oldest garbage collector. It is a new-generation collector based on the mark-copy algorithm.

It has two characteristics:

  • It only uses a single thread for garbage collection;
  • It is an exclusive collector: user threads are paused while it runs.

Being “single-threaded” means more than using only one garbage collection thread: while that thread is collecting, all other worker threads must be paused (“Stop The World”) until the collection finishes.

The designers of the virtual machine were of course aware of the poor user experience that Stop The World causes, so successive JDK releases brought new garbage collectors: from Serial to the Parallel collectors, then on to the Concurrent Mark Sweep (CMS) and Garbage First (G1) collectors. Pause times keep getting shorter, but pauses still exist, and the search for a better garbage collector is ongoing.

But does the Serial collector have any advantages over other garbage collectors? Of course it does: it is simple and efficient (compared with other collectors running on a single thread). Because it has no thread-interaction overhead, the Serial collector naturally achieves high single-threaded collection efficiency.

The -XX:+UseSerialGC parameter selects the serial collector for both the new generation and the old generation. The Serial collector is the default garbage collector when the virtual machine runs in Client mode, and it is a good choice there.

The collector produces log output like the following:

4.755: [GC (Allocation Failure) 4.755: [DefNew: 16384K->1647K(18432K), ... secs] 16384K->6199K(59392K), ... secs] [Times: user=0.01 sys=0.00, real=0.02 secs]
9.240: [GC (Allocation Failure) 9.240: [DefNew: 18431K->2047K(18432K), 0.0282920 secs] 22583K->20523K(59392K), 0.028333334 secs] [Times: user=0.03 sys=0.00, real=0.03 secs]
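Each bracketed section in this log format reads usedBefore->usedAfter(capacity): first for the new generation (DefNew), then for the whole heap. The following is a minimal sketch of pulling those three numbers out of one line with a regular expression. The class name GcLogLineSketch and the pattern are illustrative assumptions, not part of any JVM tooling, and the pattern only covers the simple layout shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcLogLineSketch {
    // Matches a generation section such as "DefNew: 18431K->2047K(18432K)".
    private static final Pattern GENERATION =
            Pattern.compile("(\\w+): (\\d+)K->(\\d+)K\\((\\d+)K\\)");

    // Returns {usedBeforeK, usedAfterK, capacityK} for the first
    // generation section found in the line, or null if there is none.
    static long[] parseGeneration(String line) {
        Matcher m = GENERATION.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3)),
            Long.parseLong(m.group(4))
        };
    }

    public static void main(String[] args) {
        // Second entry of the Serial collector log shown above.
        String line = "9.240: [GC (Allocation Failure) 9.240: "
                + "[DefNew: 18431K->2047K(18432K), 0.0282920 secs] "
                + "22583K->20523K(59392K), 0.028333334 secs]";
        long[] g = parseGeneration(line);
        System.out.println("before=" + g[0] + "K after=" + g[1] + "K capacity=" + g[2] + "K");
        System.out.println("freed in new generation=" + (g[0] - g[1]) + "K");
    }
}
```

For the second log entry, the new generation went from 18431K used to 2047K used out of 18432K, so this collection freed 16384K there.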

ParNew collector

The ParNew (parallel) collector is essentially a multithreaded version of the Serial collector. It is also based on the mark-copy algorithm and, apart from using multiple threads for garbage collection, behaves exactly like the Serial collector (control parameters, collection algorithm, collection strategy, and so on).

The collector works as shown in the figure below. The application is still paused during collection, but because the parallel collector uses multiple threads, it produces shorter pauses than the serial collector on machines with more CPUs or cores. On systems with a single CPU or little hardware parallelism, the parallel collector does no better than the serial collector.

It is the first choice for many virtual machines running in Server mode, and apart from the Serial collector, it is the only new-generation collector that works with the CMS collector (a truly concurrent collector, described below).

The ParNew collector can be started with the following parameters:

-XX:+UseParNewGC: new generation uses the ParNew collector, old generation uses the serial collector
-XX:+UseConcMarkSweepGC: new generation uses the ParNew collector, old generation uses CMS

The number of threads the ParNew collector uses can be set with the -XX:ParallelGCThreads parameter. It is generally best to match the number of CPUs, so that excess threads do not hurt garbage collection performance. By default, ParallelGCThreads equals the number of CPUs when there are eight or fewer, and 3+((5*CPU_Count)/8) when there are more than eight.
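The default rule above is easy to check by hand. The following is a small sketch of it, assuming integer division as in the formula; the class and method names are illustrative and are not a JVM API, and the value a real JVM picks can also depend on its version and environment.

```java
class ParallelGcThreadsDefault {
    // Default described above: one GC thread per CPU up to eight CPUs,
    // then 3 + (5 * cpus) / 8 using integer division.
    static int defaultParallelGcThreads(int cpuCount) {
        if (cpuCount <= 8) {
            return cpuCount;
        }
        return 3 + (5 * cpuCount) / 8;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("CPUs visible to the JVM: " + cpus);
        System.out.println("Estimated default ParallelGCThreads: "
                + defaultParallelGcThreads(cpus));
        for (int n : new int[] {4, 8, 16, 32}) {
            System.out.println(n + " CPUs -> " + defaultParallelGcThreads(n) + " GC threads");
        }
    }
}
```

With 16 CPUs the rule gives 3 + 80/8 = 13 threads, and with 32 CPUs it gives 23, so the thread count grows more slowly than the CPU count on larger machines.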

The log output of the ParNew collector is as follows:

4.822: [GC (Allocation Failure) 4.823: [ParNew: 16384K->1647K(18432K), 0.0254299 secs] 16384K->6751K(59392K), 0.0256276 secs] [Times: user=0.09 sys=0.02, real=0.03 secs]
9.381: [GC (Allocation Failure) 9.381: [ParNew: ...] [Times: user=0.11 sys=0.02, real=0.02 secs]

ParNew is the first collector here that collects with multiple threads. From this point on, the terms “concurrent” and “parallel” come up repeatedly. In the context of garbage collectors they can be understood as follows:

  • Parallel: describes the relationship between multiple garbage collection threads. Several of them work together at the same time, and by default the user threads are usually waiting.
  • Concurrent: describes the relationship between the garbage collector threads and the user threads. Both run at the same time. Because the user threads are not frozen, the program can still respond to service requests, but its throughput suffers because the collector threads take up part of the system’s resources. In addition, because user threads may change an object’s reference chain while collection is in progress, the collector threads have to account for those changes.

Parallel Scavenge collector

The Parallel Scavenge collector, like ParNew, is a new-generation collector based on the mark-copy algorithm, and it collects in parallel with multiple threads. In practice it is enabled as follows:

-XX:+UseParallelOldGC: new generation uses the Parallel Scavenge collector, old generation uses the Parallel Old collector

The Parallel Scavenge collector focuses on throughput (efficient CPU utilization), whereas collectors such as CMS focus on the pause times of user threads (improving user experience). Throughput is the ratio of the CPU time spent running user code to the total CPU time consumed, that is, user-code time plus garbage collection time.
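As a quick worked example of the throughput definition above, here is a minimal sketch. The class name and the numbers are illustrative only, not measurements from a real application.

```java
class GcThroughput {
    // Throughput = time running user code / (time running user code + time in GC).
    static double throughput(double userCodeTime, double gcTime) {
        return userCodeTime / (userCodeTime + gcTime);
    }

    public static void main(String[] args) {
        // Illustrative numbers: 99 minutes of user code and 1 minute of GC.
        double t = throughput(99.0, 1.0);
        System.out.println("throughput = " + t);
    }
}
```

If the virtual machine runs for 100 minutes in total and garbage collection takes 1 minute of that, throughput is 99%.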

The Parallel Scavenge collector provides a number of parameters for targeting a pause time or maximizing throughput. If you are not familiar with how the collector works and manual tuning is difficult, using Parallel Scavenge with its adaptive adjustment strategy, and leaving memory-management tuning to the virtual machine, is also a good option.

The log for this collector at work looks like this:

4.561: [GC (Allocation Failure) [PSYoungGen: 15360K->2540K(17920K)] 15360K->5861K(58880K), 0.0046649 secs] [Times: user=0.02 sys=0.01, real=0.00 secs]
8.778: [GC (Allocation Failure) [PSYoungGen: ...] [Times: user=0.03 sys=0.02, real=0.03 secs]

Serial Old collector

The old-generation version of the Serial collector. It is also a single-threaded collector and uses the mark-compact algorithm. It has two main uses: as a companion to the Parallel Scavenge collector in JDK 1.5 and earlier releases, and as a fallback for the CMS collector.

To enable the Serial Old collector, use one of the following parameters:

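-XX:+UseSerialGC: new generation and old generation both use the serial collector
-XX:+UseParNewGC: new generation uses the ParNew collector, old generation uses the serial collector
-XX:+UseParallelGC: new generation uses the Parallel Scavenge collector, old generation uses the serial collector (in early JDK releases; in JDK 8 this flag pairs it with Parallel Old)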

The collector produces log output like the following:

16.400: [Full GC (Allocation Failure) 40959K->40959K(40960K), 0.0762813 secs] 59391K->59391K(59392K) [Times: user=0.03 sys=0.00, real=0.03 secs]
... [Full GC (Allocation Failure) 16.476: 40959K->40959K(40960K), 0.0756075 secs] 59391K->59391K(59392K) [Times: user=... sys=0.00, real=0.003 secs]

Parallel Old collector

The old-generation version of the Parallel Scavenge collector. It uses multiple threads and the mark-compact algorithm. Where throughput and CPU resources matter most, the Parallel Scavenge and Parallel Old combination is a good first choice.

The log for this collector at work looks like this:

16.889: [Full GC (Ergonomics) [PSYoungGen: 15359K->15359K(17920K)] [ParOldGen: 40942K->40942K(40960K)] 56302K->56302K(58880K), [Metaspace: 3766K->3766K(1056768K)], 0.0311130 secs] [Times: user=0.29 sys=0.01, real=0.03 secs]
16.920: [Full GC (Ergonomics) [PSYoungGen: 15359K->15359K(17920K)] [ParOldGen: 40944K->40944K(40960K)] 56304K->56304K(58880K), [Metaspace: 3766K->3766K(1056768K)], 0.0304346 secs] [Times: user=0.29 sys=0.00, real=0.03 secs]

JDK 8 favors throughput and CPU efficiency, and uses the Parallel Scavenge and Parallel Old collectors by default.

% java -XX:+PrintCommandLineFlags -version
-XX:InitialHeapSize=268435456 -XX:MaxHeapSize=4294967296 -XX:+PrintCommandLineFlags -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
java version "1.8.0_301"
Java(TM) SE Runtime Environment (build 1.8.0_301-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.301-b09, mixed mode)

Here, UseParallelGC means the Parallel Scavenge + Parallel Old combination.
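The same check can be made from inside a running program. The following is a minimal sketch using the standard java.lang.management API. The class name ActiveCollectors is illustrative, and the bean names depend on the JVM and the selected collector: on HotSpot the Parallel Scavenge and Parallel Old combination typically reports PS Scavenge and PS MarkSweep, but treat those names as an expectation rather than a guarantee.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectors {
    // Returns the names of the collectors registered by the running JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // One line per collector: its name, how often it ran, and total time spent.
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(bean.getName()
                    + ": collections=" + bean.getCollectionCount()
                    + ", time(ms)=" + bean.getCollectionTime());
        }
    }
}
```

Running it with different collector flags is a simple way to confirm which combination a given flag actually selects on a given JDK.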

CMS collector

The CMS (Concurrent Mark Sweep) collector aims for the shortest possible collection pause time. It is well suited to applications that focus on user experience.

CMS was the first truly concurrent collector in the HotSpot virtual machine, and the first to let the garbage collection threads work (basically) at the same time as the user threads.

As the words “Mark Sweep” in its name imply, the CMS collector is based on the mark-sweep algorithm, and its operation is somewhat more complex than that of the previous collectors. The whole process is divided into five steps:

  • Initial mark: pauses all other threads and records the objects directly reachable from the GC roots. This is fast.
  • Concurrent mark: runs the GC threads and the user threads at the same time, using a closure structure to record reachable objects. At the end of this phase, however, the closure structure is not guaranteed to contain all currently reachable objects: because user threads may keep updating reference fields, the GC threads cannot keep the reachability analysis up to date in real time. The algorithm therefore keeps track of where these reference updates happen.
  • Remark: corrects the marking records for the objects whose marks changed because user threads kept running during concurrent marking. Other threads must be paused again. This pause is usually slightly longer than the initial mark pause, but much shorter than the concurrent mark phase.
  • Concurrent sweep: the user threads are resumed while the GC threads sweep the areas that were not marked.
  • Concurrent reset: after the collection completes, CMS reinitializes its data structures in preparation for the next collection.

Parameters to enable the CMS collector:

-XX:+UseConcMarkSweepGC: new generation uses the ParNew collector, old generation uses CMS

By default, CMS starts (ParallelGCThreads+3)/4 concurrent threads. ParallelGCThreads, as mentioned above, is the number of threads used for parallel GC; when the new generation uses ParNew, it is the number of new-generation GC threads. The number of concurrent threads can also be set manually with the -XX:ConcGCThreads or -XX:ParallelCMSThreads parameter.
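A small sketch of the default above, assuming integer division as in the formula. The class and method names are illustrative and are not a JVM API.

```java
class CmsThreadsDefault {
    // Default described above: (ParallelGCThreads + 3) / 4, integer division.
    static int defaultConcGcThreads(int parallelGcThreads) {
        return (parallelGcThreads + 3) / 4;
    }

    public static void main(String[] args) {
        for (int p : new int[] {1, 4, 8, 13}) {
            System.out.println("ParallelGCThreads=" + p
                    + " -> concurrent CMS threads=" + defaultConcGcThreads(p));
        }
    }
}
```

Adding 3 before dividing rounds the result up, so even a single parallel GC thread still yields one concurrent thread, and roughly a quarter of the parallel threads are used for concurrent work.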

The CMS log output is as follows:

12.724: [GC (CMS Initial Mark) [1 CMS-initial-mark: 37113K(40960K)] 39484K(59392K), 0.0004892 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
12.724: [CMS-concurrent-mark-start]
12.742: [CMS-concurrent-mark: ... secs] [Times: user=0.06 sys=0.00, real=0.03 secs]
12.742: [CMS-concurrent-preclean-start]
12.742: [CMS-concurrent-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
... [GC (CMS Final Remark) [YG occupancy: 2371 K (18432 K)]12.742: [Rescan (parallel), 0.0004109 secs]12.743: [weak refs processing, ... secs] ... [scrub symbol table, 0.0003475 secs]12.744: [scrub string table, 0.0001665 secs][1 CMS-remark: ...] [Times: user=0.00 sys=0.00, real=0.00 secs]
... [CMS-concurrent-sweep-start]
12.758: [CMS-concurrent-sweep: 0.014/0.014 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
12.758: [CMS-concurrent-reset-start]
12.758: [CMS-concurrent-reset: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]

The log covers the CMS phases described above, and it also shows how long each phase took along with heap memory information.

If a CMS collection fails, the log shows:

16.467: [Full GC (Allocation Failure) 16.468: [CMS16.487: [CMS-concurrent-mark: 0.018/0.019 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]
 (concurrent mode failure): 40959K->40959K(40960K), ... secs] 59391K->59391K(59392K), ... [Times: user=0.00 sys=0.00, real=0.25 secs]
... [Full GC (Allocation Failure) 16.561: [CMS: 40959K->40959K(40960K), 0.0829561 secs] 59391K->59391K(59392K), [Metaspace: ...] [Times: user=0.03 sys=0.00, real=0.03 secs]
... 16.644: [CMS: 40959K->1011K(40960K), 0.0202456 secs] 59391K->1011K(59392K), [Metaspace: ...] [Times: user=0.03 sys=0.00, real=0.03 secs]

This is most likely because the old generation ran out of space while the program was running.

Overall, CMS was an excellent garbage collector, but it has since been replaced. Its main advantages are concurrent collection and low pauses. It also has three obvious disadvantages:

  • Sensitive to CPU resources;
  • Unable to handle floating garbage;
  • The “mark-sweep” algorithm it uses leaves a large amount of memory fragmentation at the end of a collection.

Because it marks concurrently, the CMS collector also has to deal with the “object disappearance” problem. CMS handles this during concurrent marking with incremental updates.
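To make “object disappearance” and incremental update concrete, here is a toy tri-color marking sketch. It is a deliberately simplified model, not how HotSpot implements CMS: the Node class, the colors, and the dirty queue are illustrative assumptions. A user thread moves the only reference to C from a not-yet-scanned object B to an already-scanned (black) object A. Without a write barrier, C is never marked. With the barrier, A is recorded and rescanned in the remark step.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class IncrementalUpdateSketch {
    enum Color { WHITE, GREY, BLACK }

    static final class Node {
        final String name;
        Color color = Color.WHITE;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Black objects that gained a new reference during concurrent marking.
    private final Deque<Node> dirty = new ArrayDeque<>();
    private final boolean barrierEnabled;

    IncrementalUpdateSketch(boolean barrierEnabled) {
        this.barrierEnabled = barrierEnabled;
    }

    // Reference write by a user thread, with an incremental-update style barrier.
    void addRef(Node holder, Node target) {
        holder.refs.add(target);
        if (barrierEnabled && holder.color == Color.BLACK) {
            dirty.add(holder); // remember it so remark can rescan it
        }
    }

    // Scans every object in the worklist and marks what it reaches.
    static void drain(Deque<Node> worklist) {
        while (!worklist.isEmpty()) {
            Node n = worklist.poll();
            for (Node child : n.refs) {
                if (child.color == Color.WHITE) {
                    child.color = Color.GREY;
                    worklist.add(child);
                }
            }
            n.color = Color.BLACK;
        }
    }

    // Returns true if C ends up marked, false if it "disappears".
    static boolean run(boolean barrierEnabled) {
        IncrementalUpdateSketch gc = new IncrementalUpdateSketch(barrierEnabled);
        Node a = new Node("A"), b = new Node("B"), c = new Node("C");
        a.refs.add(b);
        b.refs.add(c);

        // Marking so far: A is fully scanned (black), B is discovered (grey), C is white.
        a.color = Color.BLACK;
        b.color = Color.GREY;
        Deque<Node> worklist = new ArrayDeque<>();
        worklist.add(b);

        // The user thread runs in between: A now references C, B no longer does.
        gc.addRef(a, c);
        b.refs.remove(c);

        // Concurrent marking continues from B, which no longer leads to C.
        drain(worklist);

        // Remark: rescan the black objects recorded by the barrier.
        for (Node n : gc.dirty) {
            n.color = Color.GREY;
        }
        drain(gc.dirty);

        return c.color != Color.WHITE;
    }

    public static void main(String[] args) {
        System.out.println("C marked without barrier: " + run(false));
        System.out.println("C marked with barrier:    " + run(true));
    }
}
```

The sketch records the black object that was written to and rescans it later, which is the essence of incremental update: new references inserted into already-scanned objects are tracked so that marking can be corrected during the remark pause.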

G1 collector

G1 (Garbage-First) is a server-oriented garbage collector, aimed mainly at machines with multiple processors and large amounts of memory. It meets GC pause-time requirements with very high probability while still delivering high throughput.

With the release of JDK 9, G1 replaced the Parallel Scavenge and Parallel Old combination as the default garbage collector in server mode, and CMS was declared deprecated.

Features introduced

For the collectors introduced so far, the target of a collection is either the entire new generation (Minor GC), the entire old generation (Major GC), or the entire Java heap.

The G1 collector, on the other hand, can build a Collection Set (commonly called the CSet) from any part of the heap. What it weighs is no longer which generation a piece of memory belongs to, but which pieces hold the most garbage and give the greatest return when collected. This is the Mixed GC mode of the G1 collector.

Specifically, although G1 keeps the concepts of new generation and old generation, the two are no longer physically separated. G1 divides the entire Java heap into independent areas of equal size called Regions. The new generation and the old generation are each a dynamic set of Regions, which need not be contiguous.

Each Region can act as Eden space, Survivor space, or old-generation space as needed. There is also a special kind of Region called Humongous, used for storing large objects. G1 treats an object as large when its size exceeds half the capacity of a Region.

Note that large objects can cause performance problems in the following scenarios.

  • Large objects have a short lifetime.
  • If the first condition holds, the object’s Region is reclaimed during young-generation collection.
  • Large objects are allocated frequently.

G1 divides the heap into Regions of equal size, targeting about 2048 Regions by default. Each Region is a logically contiguous range of memory, and its size ranges from 1MB to 32MB.
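The sizing rules above can be sketched as follows. This is a simplified model under stated assumptions: it takes a single heap size, aims for 2048 Regions, and clamps the result to the 1MB to 32MB range. Rounding down to a power of two is not stated in the text above; it is added here because G1 Region sizes are powers of two. The class and method names are illustrative, and the real ergonomics in HotSpot consider more inputs.

```java
class G1RegionSizing {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;
    static final long MAX_REGION = 32 * MB;
    static final long TARGET_REGIONS = 2048;

    // Simplified Region size: heap / 2048, rounded down to a power of two,
    // then kept within the 1MB..32MB range.
    static long regionSize(long heapBytes) {
        long size = Math.max(heapBytes / TARGET_REGIONS, MIN_REGION);
        size = Long.highestOneBit(size); // round down to a power of two
        return Math.min(size, MAX_REGION);
    }

    // An object counts as large (Humongous) when it exceeds half a Region.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    public static void main(String[] args) {
        long heap = 4096 * MB; // illustrative 4GB heap
        long region = regionSize(heap);
        System.out.println("region size (MB): " + region / MB);
        System.out.println("1MB object humongous? " + isHumongous(1 * MB, region));
        System.out.println("1MB + 1 byte object humongous? " + isHumongous(1 * MB + 1, region));
    }
}
```

With a 4GB heap this model yields 2MB Regions, so any object larger than 1MB would be stored in Humongous Regions.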

The structure is as follows:

In the figure, E and S mark Eden and Survivor Regions, and the blue squares with no letter can be understood as old-generation Regions. H, a concept not found in previous garbage collectors, stands for Humongous.

The G1 collector maintains a priority list behind the scenes and collects the Regions with the greatest collection value first (hence the name Garbage-First). More specifically, each Region has a garbage “value”, determined by the amount of space collecting it would free and the time the collection would take. Dividing memory into Regions and collecting them by priority lets the G1 collector collect as efficiently as possible in a limited amount of time.

Therefore, in the reclamation phase, the Regions with the highest reclamation value are processed first, and a single collection does not reclaim all Regions.
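The idea of collecting by priority within a limited time can be sketched as a greedy selection. This is only an illustration under stated assumptions: “value” is modeled as reclaimable space divided by estimated collection time, the pause budget and all sample numbers are made up, and the class and method names are hypothetical. G1’s real policy relies on its own prediction model and is considerably more involved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class CollectionSetSketch {
    static final class Region {
        final String name;
        final double reclaimableMb; // space that collecting this Region would free
        final double costMs;        // estimated time needed to collect it

        Region(String name, double reclaimableMb, double costMs) {
            this.name = name;
            this.reclaimableMb = reclaimableMb;
            this.costMs = costMs;
        }

        // Assumed "value": space gained per unit of collection time.
        double value() {
            return reclaimableMb / costMs;
        }
    }

    // Picks Regions in descending value order while they fit the pause budget.
    static List<String> choose(List<Region> candidates, double pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(candidates);
        Comparator<Region> ascending = Comparator.comparingDouble(Region::value);
        sorted.sort(ascending.reversed());

        List<String> chosen = new ArrayList<>();
        double spentMs = 0.0;
        for (Region r : sorted) {
            if (spentMs + r.costMs <= pauseBudgetMs) {
                chosen.add(r.name);
                spentMs += r.costMs;
            }
        }
        return chosen;
    }

    static List<Region> sampleRegions() {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("R1", 30, 10)); // value 3.0
        regions.add(new Region("R2", 8, 2));   // value 4.0
        regions.add(new Region("R3", 5, 5));   // value 1.0
        regions.add(new Region("R4", 20, 8));  // value 2.5
        return regions;
    }

    public static void main(String[] args) {
        // With a 15 ms budget, R2 and R1 fit; R4 and R3 would exceed it.
        System.out.println(choose(sampleRegions(), 15.0)); // prints [R2, R1]
    }
}
```

With a 15 ms budget only two of the four sample Regions are collected, which mirrors the point above that a single collection does not reclaim every Region.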

The G1 collector operates in the following steps:

  • Initial Marking: marks only the objects directly reachable from the GC Roots and updates the TAMS (Top at Mark Start) pointer, so that when the user program runs concurrently in the next phase, new objects are allocated correctly in the available Regions. This phase pauses user threads, but only very briefly. It is also carried out as part of a Minor GC, so the G1 collector adds no extra pause here.

  • Concurrent Marking: starting from the GC Roots, performs reachability analysis on the objects in the heap, recursively scanning the object graph to find the live objects. This phase is time-consuming, but it runs concurrently with the user program. After the object graph has been scanned, the objects whose references changed during concurrent marking, as recorded by SATB, still have to be processed.

  • Final Marking: another short pause of the user threads to process the last few SATB records left over after the concurrent phase ends.

  • Live Data Counting and Evacuation: updates the statistics for each Region, sorts the Regions by collection value and cost, and draws up a collection plan based on the pause time the user expects.

    Any number of Regions can be chosen to form the collection set. The live objects in those Regions are copied to empty Regions, and the old Regions are then cleared in their entirety.

    Because this step moves live objects, the user threads must be paused, and the work is done in parallel by multiple collector threads.

Apart from concurrent marking, every phase of the G1 collector fully pauses the user threads. As this shows, the goal of the G1 collector is to achieve the highest possible throughput while keeping latency controllable.

The G1 collector produces log output like the following:

[5.487s][info][gc,start] GC(0) Pause Young (G1 Evacuation Pause)
[5.487s][info][gc,task] GC(0) Using 10 workers of 10 for evacuation
[5.498s][info][gc,phases] GC(0) Pre Evacuate Collection Set: 37 ms
[5.498s][info][gc,phases] GC(0) Evacuate Collection Set: ...
... Post Evacuate Collection Set: 0.5ms
[5.498s][info][gc,phases] GC(0) Other: 0.2ms
[5.498s][info][gc,heap] GC(0) Eden regions: 24->0(12)
[5.498s][info][gc,heap] GC(0) Survivor regions: 0->3(3)
[5.498s][info][gc,heap] GC(0) Old regions: 0->11
[5.498s][info][gc,heap] GC(0) Humongous regions: 5->5
[5.498s][info][gc,metaspace] GC(0) Metaspace: 6984K->6984K(1056768K)
[5.498s][info][gc] GC(0) Pause Young (G1 Evacuation Pause) 29M->18M(60M) 10.904ms
... [info][gc,cpu] GC(0) User=0.05s Sys=0.04s Real=0.01s

Reference

In-depth Understanding of the Java Virtual Machine