An In-Depth Understanding of the JVM (XIII): Garbage Collectors

The GC classification

  1. By number of threads, collectors are divided into serial garbage collectors and parallel garbage collectors.
  • A serial garbage collector allows only one CPU to perform garbage collection at a time; the worker threads are suspended until the collection is complete.

    • On modest hardware, such as a single-CPU machine or an application with little memory, a serial collector can outperform parallel and concurrent collectors. For this reason serial collection is the default for a JVM running in Client mode.
    • On machines with more CPUs, a parallel collector produces shorter pause times than a serial collector.
  • A parallel garbage collector uses multiple CPUs to perform garbage collection at the same time, which improves application throughput. Parallel collection is still exclusive, however, and uses the same stop-the-world mechanism as serial collection.

Note: In the context of JVM garbage collection, “parallel” refers to garbage collection threads running in parallel with one another.

  2. By working mode, collectors are divided into concurrent garbage collectors and exclusive garbage collectors.
  • A concurrent garbage collector alternates with the application threads to minimize application pause time.
  • An exclusive garbage collector, once it runs, stops all user threads in the application until the garbage collection process is complete. (The serial garbage collector is an exclusive garbage collector.)

  3. By how fragmentation is handled, collectors are divided into compacting garbage collectors and non-compacting garbage collectors.
  • A compacting garbage collector compacts the surviving objects after a collection to eliminate the fragments left behind.
    • Object space is then reallocated by pointer bumping (“bump the pointer”).
  • A non-compacting garbage collector does not perform this step.
    • Object space is then reallocated from a free list.
  4. By the memory area they work on, collectors are divided into young-generation garbage collectors and old-generation garbage collectors.
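
The difference between the two allocation styles named above can be illustrated with a toy model. This is only a sketch over abstract byte offsets, not how HotSpot implements allocation; the class names AllocationSketch, BumpAllocator, and FreeListAllocator are made up for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy allocators over an abstract address range; sizes and offsets are in bytes. */
final class AllocationSketch {

    /** Bump-the-pointer: after compaction, free space is one contiguous block starting at top. */
    static final class BumpAllocator {
        private final long end;
        private long top;

        BumpAllocator(long capacity) { this.end = capacity; }

        /** Returns the start offset of the new object, or -1 if it does not fit. */
        long allocate(long size) {
            if (top + size > end) return -1;
            long start = top;
            top += size; // "bump" the pointer past the new object
            return start;
        }
    }

    /** Free list: without compaction, free space is a set of scattered holes (first-fit search). */
    static final class FreeListAllocator {
        private static final class Hole {
            long start;
            long size;
            Hole(long start, long size) { this.start = start; this.size = size; }
        }

        private final List<Hole> holes = new ArrayList<>();

        void addHole(long start, long size) { holes.add(new Hole(start, size)); }

        /** Returns the start offset of the new object, or -1 if no single hole is large enough. */
        long allocate(long size) {
            for (Iterator<Hole> it = holes.iterator(); it.hasNext(); ) {
                Hole h = it.next();
                if (h.size >= size) {
                    long start = h.start;
                    h.start += size;
                    h.size -= size;
                    if (h.size == 0) it.remove();
                    return start;
                }
            }
            return -1;
        }
    }

    public static void main(String[] args) {
        BumpAllocator bump = new BumpAllocator(64);
        System.out.println(bump.allocate(16)); // 0
        System.out.println(bump.allocate(16)); // 16

        FreeListAllocator freeList = new FreeListAllocator();
        freeList.addHole(0, 8);
        freeList.addHole(32, 8);
        // 16 bytes are free in total, but fragmentation prevents a 12-byte allocation.
        System.out.println(freeList.allocate(12)); // -1
    }
}
```

The last call shows why fragmentation matters: the free list holds enough total space, but no single hole can take the object.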

Performance metrics for evaluating GC

  • Throughput: the percentage of total elapsed time spent running user code.
    • (Total elapsed time = time spent running the program + time spent on memory reclamation.)
  • Pause time: the time the program’s worker threads are suspended while garbage collection is performed.
  • Memory footprint: the amount of memory occupied by the Java heap.
  • Garbage collection overhead: the complement of throughput, i.e. the ratio of garbage collection time to total elapsed time.
  • Collection frequency: how often collections occur relative to the execution of the application.
  • Promptness: the time that passes between an object’s creation and its reclamation.

Throughput, pause time, and memory footprint together form an “impossible triangle”. All three improve overall as technology advances, but a good collector usually achieves at most two of them at once.

Of these three, pause time is becoming increasingly important. As hardware evolves, a larger memory footprint becomes more tolerable, and faster hardware reduces the impact of the collector’s running time on the application, which raises throughput. A larger heap, on the other hand, has a negative effect on latency.

In short, two metrics matter most:

  • Throughput
  • Pause time

Throughput

  • Throughput is the ratio of the CPU time spent running user code to the total CPU time consumed, i.e.:

Throughput = time running user code / (time running user code + garbage collection time)

  • Throughput-oriented applications can tolerate longer individual pauses: they are measured over a long time baseline, and fast response is not the concern.

  • Throughput priority means keeping the total STW time per unit of time as short as possible, e.g. 0.2 + 0.2 = 0.4.

For example, if the virtual machine runs for 6 s in total and garbage collection takes 400 ms of that (two STW pauses of 200 ms each), the throughput is 93.33%.

Pause time

  • “Pause time” refers to a period of time during which the application thread is paused to allow the GC thread to execute

For example, a 100-millisecond pause time during GC means that no application threads are active during that 100-millisecond period.

  • Pause-time priority means keeping each single STW pause as short as possible, e.g. 0.1 + 0.1 + 0.1 + 0.1 + 0.1 = 0.5.

For example, if the virtual machine runs for 6 s in total and garbage collection takes 500 ms of that (five STW pauses of 100 ms each), the throughput is 91.67%.
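
The two worked examples can be checked with a few lines of Java. This is a minimal sketch of the throughput formula given earlier; the class and method names are made up for illustration, and the 6 s run is assumed to include the GC time.

```java
import java.util.Locale;

/** Throughput = user time / (user time + GC time), expressed as a percentage. */
final class GcThroughput {

    /**
     * @param totalMillis total elapsed time, GC included
     * @param gcMillis    time spent in stop-the-world garbage collection
     */
    static double throughputPercent(long totalMillis, long gcMillis) {
        long userMillis = totalMillis - gcMillis;
        return 100.0 * userMillis / totalMillis;
    }

    public static void main(String[] args) {
        // Throughput first: two 200 ms pauses in a 6 s run.
        System.out.println(String.format(Locale.ROOT, "%.2f%%", throughputPercent(6000, 2 * 200))); // 93.33%
        // Pause time first: five 100 ms pauses in the same 6 s run.
        System.out.println(String.format(Locale.ROOT, "%.2f%%", throughputPercent(6000, 5 * 100))); // 91.67%
    }
}
```

The second configuration has shorter individual pauses but more total pause time, and therefore lower throughput.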

Comparing the two

  • Why high throughput is better: it gives the application’s end users the impression that only application threads are doing “productive” work. Intuitively, the higher the throughput, the faster the program runs. (The program gets more code executed.)

  • Why low pause time (low latency) is better: from the end user’s point of view, it is always bad for an application to be suspended, whether by GC or for any other reason. Depending on the type of application, even a brief 200-millisecond pause can disrupt the user experience. A low maximum pause time is therefore very important, especially for interactive applications. (The goal is fast interactive response.)

  • “High throughput” and “low pause time” are competing goals.

    • If you give priority to throughput, you must reduce the frequency of memory reclamation, but then each GC needs a longer pause to do its work.
    • Conversely, if you give priority to low latency, the only way to shorten each pause is to reclaim memory more often, which means a smaller young generation and, in turn, lower program throughput.

The current standard: with maximum throughput as the priority, reduce pause time.

Overview of different garbage collectors

History of garbage collectors

  • The Serial GC, the first GC, arrived with JDK 1.3.1 in 1999 and works in serial mode. The ParNew garbage collector is a multithreaded version of the Serial collector.
  • On February 26, 2002, Parallel GC and Concurrent Mark Sweep GC were released along with JDK 1.4.2, and Parallel GC became the HotSpot default GC after JDK 6.
  • In 2012, G1 became available in JDK 1.7u4.
  • In 2017, G1 became the default garbage collector in JDK 9, as the intended replacement for CMS.
  • In March 2018, JDK 10 added parallel full garbage collection to G1, improving worst-case latency.
  • In September 2018, JDK 11 was released. It introduced the Epsilon garbage collector, also known as the “No-Op” collector, and ZGC, a scalable low-latency garbage collector (experimental).
  • In March 2019, JDK 12 was released. G1 was enhanced to automatically return unused heap memory to the operating system, and Shenandoah GC, a low-pause-time GC (experimental), was introduced.
  • In September 2019, JDK 13 was released. ZGC was enhanced to automatically return unused heap memory to the operating system.
  • In March 2020, JDK 14 was released. The CMS garbage collector was removed, and ZGC was extended to macOS and Windows.

Seven classic garbage collectors

  • Serial collectors: Serial, Serial Old
  • Parallel collectors: ParNew, Parallel Scavenge, Parallel Old
  • Concurrent collectors: CMS, G1

  1. A line between two collectors indicates that they can be used together:

    • Serial/Serial Old, Serial/CMS,
    • ParNew/Serial Old, ParNew/CMS,
    • Parallel Scavenge/Serial Old, Parallel Scavenge/Parallel Old,
    • G1.
  2. Serial Old is the fallback for CMS when a “Concurrent Mode Failure” occurs.

  3. (Red dotted line) Because of the cost of maintenance and compatibility testing, the Serial + CMS and ParNew + Serial Old combinations were deprecated in JDK 8 (JEP 173) and removed entirely in JDK 9 (JEP 214).

  4. JDK 14: the Parallel Scavenge + Serial Old GC combination was deprecated (JEP 366).

  5. JDK 14: the CMS garbage collector was removed (JEP 363).

Official garbage collector introduction documentation

View the default garbage collector

  • -XX:+PrintCommandLineFlags: prints the command-line flags in effect (including the garbage collector in use)

  • jinfo -flag <GC-related flag> <process ID>: queries a garbage collector flag of a running process
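
The collectors in use can also be listed from inside a running program through the standard java.lang.management API. The sketch below is one way to do it; the class name ShowCollectors is made up, and the names printed depend on the JVM and the collector selected.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ShowCollectors {

    /** Names of the collectors active in this JVM, as reported by the platform MXBeans. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // One line per collector: name, number of collections so far, accumulated time.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime());
        }
    }
}
```

Running the same program with different collector flags shows different bean names, which is a quick way to confirm which collector pair is actually active.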

Flags for switching the garbage collector:

  • -XX:+UseSerialGC: the young generation uses Serial GC and the old generation uses Serial Old GC

  • -XX:+UseParNewGC: the young generation uses ParNew GC

  • -XX:+UseParallelGC and -XX:+UseParallelOldGC: only one of the two needs to be set, because each activates the other

    • -XX:+UseParallelGC: the young generation uses Parallel GC
    • -XX:+UseParallelOldGC: the old generation uses Parallel Old GC
  • -XX:+UseConcMarkSweepGC: the old generation uses CMS GC, and ParNew is enabled automatically for the young generation

Serial collector – serial collection

The Serial collector is a single-threaded collector. “Single-threaded” means not only that it uses just one CPU or one collection thread to complete garbage collection, but, more importantly, that it must suspend all other worker threads while it collects, until the collection is finished (stop-the-world).

Serial

  • The Serial collector is the most basic and oldest garbage collector. Before JDK 1.3 it was the only option for collecting the young generation.
  • The Serial collector is the default young-generation garbage collector for HotSpot in Client mode.
  • The Serial collector uses the copying algorithm, serial collection, and the stop-the-world mechanism to reclaim memory.

Serial Old

  • Alongside the young-generation collector, Serial provides the Serial Old collector for old-generation garbage collection. Serial Old also uses serial collection and stop-the-world, but its reclamation algorithm is mark-compact.
    • Serial Old is the default old-generation garbage collector in Client mode.
    • Serial Old has two main uses in Server mode:
      • Paired with the young-generation Parallel Scavenge collector
      • As the fallback collection scheme for the old-generation CMS collector

Advantages:

  • Simple and efficient. In an environment limited to a single CPU, the Serial collector has no thread-interaction overhead, so by concentrating on garbage collection it achieves the highest single-threaded collection efficiency.
    • It is a good choice for a virtual machine running in Client mode.
  • In a desktop application scenario, the available memory is generally small (tens of MB to one or two hundred MB), so a collection completes in a relatively short time (tens of ms to a little over a hundred ms). As long as collections are not frequent, a serial collector is acceptable.

Parameter Settings:

  • -XX:+UseSerialGC specifies that both the young and old generations use serial collectors: Serial GC for the young generation and Serial Old GC for the old generation.

Conclusion:

  • This collector is worth knowing about, but serial collection is rarely used today. It only makes sense when limited to a single-core CPU, and machines are no longer single-core.

  • For highly interactive applications, this garbage collector is unacceptable. Serial garbage collectors are not typically used in Java Web applications.

ParNew – parallel collection

If Serial GC is the single-threaded garbage collector for the young generation, the ParNew collector is the multithreaded version of the Serial collector.

  • “Par” is short for Parallel; “New” means it only handles the young (new) generation.
  • Apart from performing memory reclamation in parallel, ParNew is almost identical to Serial. ParNew also uses the copying algorithm and the stop-the-world mechanism in the young generation.
  • ParNew is the default young-generation garbage collector for many JVMs running in Server mode.

  • In the young generation, collections are frequent, so collecting in parallel is efficient.

  • In the old generation, collections are less frequent, so collecting serially saves resources. (Parallel collection requires the CPU to switch between threads; serial collection avoids that cost.)

Since the ParNew collector is based on parallel collection, is it safe to assume that it collects more efficiently than the Serial collector in every scenario?

  • In a multi-CPU environment, ParNew can take full advantage of hardware resources such as multiple CPUs and cores, so it completes garbage collection faster and improves program throughput.
  • In a single-CPU environment, however, ParNew is no more efficient than Serial. Although Serial collects serially, the CPU does not need to switch tasks frequently, so it avoids the extra overhead of multithreaded interaction.
  • Apart from Serial, ParNew is currently the only young-generation collector that works with the CMS collector.

Parameter Settings:

  • -XX:+UseParNewGC: manually specifies the ParNew collector for memory reclamation. It makes the young generation use the parallel collector and does not affect the old generation.
  • -XX:ParallelGCThreads: sets the number of GC threads; when it is not set, the JVM chooses a default based on the number of CPUs.

Parallel collector – Throughput first

Parallel Scavenge

The Parallel Scavenge collector also uses the copying algorithm, parallel collection, and the stop-the-world mechanism.

  • Unlike the ParNew collector, the Parallel Scavenge collector aims to achieve a controllable throughput, which is why it is also known as the throughput-first garbage collector.
  • The adaptive adjustment strategy is another important difference between Parallel Scavenge and ParNew.

Parallel old

High throughput makes efficient use of CPU time and completes the program’s computation as soon as possible. It mainly suits background computation that does not need much interaction, so it is commonly used in server environments, for example in applications that perform batch processing, order processing, payroll, or scientific computing.

  • In JDK 1.6, the Parallel collector gained the Parallel Old collector for old-generation garbage collection, replacing the Serial Old collector.
  • The Parallel Old collector uses the mark-compact algorithm, and is likewise based on parallel collection and the stop-the-world mechanism.

  • In scenarios where application throughput comes first, the combination of the Parallel collector and the Parallel Old collector performs well in Server mode.
  • In Java 8, this is the default garbage collector.

Parameter Settings:

  • -XX:+UseParallelGC: manually specifies that the young generation uses the Parallel collector for memory reclamation.

  • -XX:+UseParallelOldGC: manually specifies that the old generation uses the parallel collector.

    • The two flags apply to the young generation and the old generation respectively. They are enabled by default in JDK 8.
    • Enabling either one of the two flags enables the other as well (mutual activation).
  • -XX:ParallelGCThreads: sets the number of threads for the young-generation parallel collector. It is generally best to match the number of CPUs, so that an excess of threads does not hurt collection performance.

    • By default, when the number of CPUs is 8 or fewer, ParallelGCThreads equals the number of CPUs.
    • When the number of CPUs is greater than 8, ParallelGCThreads = 3 + (5 * cpu_count) / 8.
  • -XX:MaxGCPauseMillis: sets the maximum garbage collection pause time (that is, the STW time), in milliseconds.

    • To keep pauses within MaxGCPauseMillis as far as possible, the collector adjusts the Java heap size or other parameters while it works.
    • For end users, the shorter the pause, the better the experience. On the server side, however, the focus is on high concurrency and overall throughput, so the server side is well suited to the Parallel collector.
    • Use this parameter with caution.
  • -XX:GCTimeRatio: the ratio of garbage collection time to total time (= 1 / (N + 1)), used to control throughput.

    • The value ranges from 0 to 100. The default is 99, which means garbage collection time should not exceed 1% of the total.
    • It is somewhat at odds with -XX:MaxGCPauseMillis: the longer the pause time, the more easily the collection time exceeds the configured ratio.
  • -XX:+UseAdaptiveSizePolicy: enables the adaptive sizing policy of the Parallel Scavenge collector.

    • In this mode, parameters such as the size of the young generation, the ratio of Eden to Survivor, and the age at which objects are promoted to the old generation are adjusted automatically to balance heap size, throughput, and pause time.
    • When manual tuning is difficult, you can rely on this adaptive mode: specify only the maximum heap size, the target throughput (GCTimeRatio), and the target pause time (MaxGCPauseMillis), and let the virtual machine do the tuning itself.
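
The two formulas in this list are easy to evaluate by hand, but a short sketch makes the numbers concrete. It only restates the rules given above (thread count equal to the CPU count up to 8 CPUs, then 3 + 5 * cpus / 8, and a GC share of 1 / (N + 1)); the class and method names are made up, and the actual JVM may compute its defaults differently across versions.

```java
final class ParallelGcDefaults {

    /** Default ParallelGCThreads as stated in the text: cpus if <= 8, else 3 + 5 * cpus / 8. */
    static int defaultParallelGcThreads(int cpuCount) {
        return cpuCount <= 8 ? cpuCount : 3 + (5 * cpuCount) / 8;
    }

    /** Target share of total time spent in GC for -XX:GCTimeRatio=N, i.e. 1 / (N + 1). */
    static double gcTimeFraction(int gcTimeRatio) {
        return 1.0 / (gcTimeRatio + 1);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("CPUs: " + cpus
                + ", estimated ParallelGCThreads: " + defaultParallelGcThreads(cpus));
        System.out.println("16 CPUs -> " + defaultParallelGcThreads(16) + " threads"); // 13 threads
        System.out.println("GCTimeRatio=99 -> GC share " + gcTimeFraction(99));        // 0.01
        System.out.println("GCTimeRatio=19 -> GC share " + gcTimeFraction(19));        // 0.05
    }
}
```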

CMS – low latency

In the JDK 1.5 era, HotSpot introduced a garbage collector that could almost be considered revolutionary for strongly interactive applications: the CMS (Concurrent Mark Sweep) collector. It was the first truly concurrent collector in the HotSpot virtual machine, and for the first time allowed garbage collection threads to work at the same time as user threads.

  • The focus of the CMS collector is to minimize the pause time of user threads during garbage collection. The shorter the pauses (the lower the latency), the better suited it is to programs that interact with users, since good response times improve the user experience.
  • At present, a large share of Java applications run on the server side of Internet sites or B/S systems. These applications care particularly about service response time and want the shortest possible system pauses to give users a better experience. The CMS collector is a good fit for such applications.
  • CMS uses the mark-sweep algorithm, and it still involves stop-the-world pauses.

Unfortunately, CMS, as an old-generation collector, cannot work with Parallel Scavenge, the young-generation collector that already existed in JDK 1.4.0. So when CMS is used to collect the old generation in JDK 1.5, the young generation can only use ParNew or Serial.

Before G1, CMS was widely used. Today, there are still many systems using CMS GC.

The four main phases of CMS:

The CMS process is more complex than that of the previous collectors. It is divided into four main phases: initial mark, concurrent mark, remark, and concurrent sweep.

  • Initial-mark phase: all worker threads in the program are briefly paused by the stop-the-world mechanism. The only task of this phase is to mark the objects that GC Roots can reach directly. Once marking completes, all suspended application threads are resumed. Because few objects are directly reachable, this phase is very fast.

  • Concurrent-mark phase: the entire object graph is traversed, starting from the objects directly reachable from GC Roots. This takes a long time but does not require pausing user threads, which run concurrently with the garbage collection threads.

  • Remark phase: during concurrent marking, the program’s worker threads and the garbage collection threads run at the same time or interleaved, so a remark phase is needed to correct the marking records of objects whose marks changed because the user program kept running. The pause in this phase is usually slightly longer than in the initial-mark phase, but far shorter than the concurrent-mark phase.

  • Concurrent-sweep phase: the objects judged dead during the marking phases are removed and their memory is freed. Because live objects do not need to be moved, this phase can also run concurrently with user threads.
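
As a compact summary of the four phases, the sketch below records for each phase whether it stops application threads. It is only a restatement of the list above in code form; the enum CmsPhase and its fields are made up for illustration and do not correspond to any JVM API.

```java
/** The four CMS phases described above, with whether each one pauses application threads. */
enum CmsPhase {
    INITIAL_MARK(true, "mark objects directly reachable from GC Roots"),
    CONCURRENT_MARK(false, "traverse the object graph from those objects"),
    REMARK(true, "correct marks changed by the application during concurrent marking"),
    CONCURRENT_SWEEP(false, "free objects judged dead, without moving live ones");

    final boolean stopTheWorld;
    final String work;

    CmsPhase(boolean stopTheWorld, String work) {
        this.stopTheWorld = stopTheWorld;
        this.work = work;
    }

    public static void main(String[] args) {
        for (CmsPhase phase : values()) {
            System.out.println(phase + (phase.stopTheWorld ? " [STW] " : " [concurrent] ") + phase.work);
        }
    }
}
```

Only the two short phases pause the application; the two long ones run alongside it, which is where the low overall pause time comes from.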

Insufficient memory enables the Serial Old collector

Although the CMS collector collects concurrently (non-exclusively), it still uses the stop-the-world mechanism to pause the program’s worker threads during its initial-mark and remark phases, though not for long. In other words, no current garbage collector can avoid stop-the-world entirely; collectors only keep the pauses as short as possible.

Because the most time-consuming phases, concurrent mark and concurrent sweep, do not require pauses, the collection as a whole is low-pause.

Because user threads are not interrupted while garbage is being collected, the application must still have enough memory available during a CMS collection. The CMS collector therefore cannot wait until the old generation is almost completely full, as other collectors do. Instead, it starts collecting when heap usage reaches a certain threshold, so that the application still has enough space to run while CMS is working.

If the memory reserved while CMS is running cannot meet the program’s needs, a “Concurrent Mode Failure” occurs. The virtual machine then starts its fallback: the Serial Old collector is temporarily enabled to redo the old-generation collection, which results in a long pause.

One might ask: since mark-sweep causes memory fragmentation, why not switch the algorithm to mark-compact?

The answer is that if memory were compacted during concurrent sweeping, the memory that user threads are still using would be moved out from under them. For user threads to keep running, the resources they depend on must not be affected. Mark-compact is therefore better suited to stop-the-world scenarios.

Advantages of CMS:

  • Concurrent collection
  • Low latency

Disadvantages of CMS:

  • It produces memory fragmentation, so the space available to user threads may be insufficient. When a large object cannot be allocated, a Full GC has to be triggered early.
  • The CMS collector is very sensitive to CPU resources. In the concurrent phases it does not pause user threads, but it takes up some threads and so slows the application down; total throughput decreases.
  • The CMS collector cannot handle floating garbage (new garbage created during the concurrent phases), and a “Concurrent Mode Failure” may occur, leading to another Full GC. During concurrent marking, the program’s worker threads and the garbage collection threads run at the same time or interleaved, so garbage created during that phase cannot be marked by CMS. These newly created garbage objects are therefore not collected in time, and the memory they occupy can only be freed in the next GC.

Parameter Settings:

  • -XX:+UseConcMarkSweepGC: manually specifies the CMS collector for memory reclamation.
    • Enabling this flag automatically enables -XX:+UseParNewGC, giving the combination ParNew (young generation) + CMS (old generation) + Serial Old.
  • -XX:CMSInitiatingOccupancyFraction: sets the heap usage threshold; once it is reached, collection begins.
    • In JDK 5 and earlier the default is 68, meaning a CMS collection is performed when old-generation usage reaches 68%. In JDK 6 and later the default is 92%.
    • If memory grows slowly, a larger threshold can be set. A larger threshold lowers how often CMS is triggered, and fewer old-generation collections can noticeably improve application performance. Conversely, if the application’s memory usage grows quickly, the threshold should be lowered to avoid falling back to the serial old-generation collector too often. Used this way, the option effectively reduces the number of Full GCs.
  • -XX:+UseCMSCompactAtFullCollection: compacts the memory space after a Full GC to avoid memory fragmentation. The drawback is longer pauses, because compaction cannot run concurrently.
  • -XX:CMSFullGCsBeforeCompaction: sets how many Full GCs are performed before the memory space is compacted.
  • -XX:ParallelCMSThreads: sets the number of CMS threads.
    • By default CMS starts (ParallelGCThreads + 3) / 4 threads, where ParallelGCThreads is the number of threads of the young-generation parallel collector. When CPU resources are tight, application performance during the collection phases can be very poor because of the CMS collector threads.
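
To make the threshold and thread-count rules concrete, here is a small sketch that evaluates them. It only encodes the two rules stated above (start a cycle once old-generation usage reaches the configured percentage, and a default of (ParallelGCThreads + 3) / 4 CMS threads); the class and method names are made up, and the real collector considers more than these two numbers.

```java
final class CmsDefaults {

    /** Default CMS thread count as stated in the text: (ParallelGCThreads + 3) / 4, integer division. */
    static int defaultCmsThreads(int parallelGcThreads) {
        return (parallelGcThreads + 3) / 4;
    }

    /** True once old-generation usage has reached the given occupancy percentage. */
    static boolean shouldStartCmsCycle(long usedBytes, long capacityBytes, int occupancyPercent) {
        return usedBytes * 100 >= capacityBytes * (long) occupancyPercent;
    }

    public static void main(String[] args) {
        System.out.println(defaultCmsThreads(8));  // 2
        System.out.println(defaultCmsThreads(13)); // 4

        long capacity = 1000L * 1024 * 1024; // a 1000 MB old generation, chosen for the example
        long used = 700L * 1024 * 1024;      // 70% occupied
        System.out.println(shouldStartCmsCycle(used, capacity, 68)); // true
        System.out.println(shouldStartCmsCycle(used, capacity, 92)); // false
    }
}
```

With the older default of 68 a cycle would already have started at 70% occupancy; with 92 the collector waits longer, leaving less headroom for allocations during the concurrent phases.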

Development:

  • JDK 9: CMS was marked as deprecated (JEP 291).
    • If you enable CMS with -XX:+UseConcMarkSweepGC on a HotSpot virtual machine of JDK 9 or later, you receive a warning that CMS will be removed in the future.
  • JDK 14: the CMS garbage collector was removed (JEP 363).
    • If -XX:+UseConcMarkSweepGC is used on JDK 14, the JVM does not report an error; it only prints a warning and does not exit. The JVM falls back automatically and starts with the default GC.

G1 – regionalization

Why is it called Garbage First (G1)?

  • G1 is a parallel collector that divides heap memory into many unrelated regions (physically discontiguous). Different regions are used to represent Eden, Survivor 0, Survivor 1, the old generation, and so on.

  • G1 deliberately avoids collecting the entire Java heap at once. It tracks the value of the garbage accumulated in each region (the space a collection would reclaim, weighed against the empirical time that collection takes), maintains a priority list in the background, and, within the allowed collection time, collects the regions with the highest value first.

  • Because this approach focuses on the regions that yield the most garbage, G1 was given its name: Garbage First.

  • Garbage-First (G1) is a garbage collector for server applications, aimed mainly at machines with multi-core CPUs and large memory. It meets the GC pause time target with high probability while also delivering high throughput.

  • It was officially released in JDK 1.7, dropping the Experimental label, and has been the default garbage collector since JDK 9, replacing the CMS collector and the Parallel + Parallel Old combination. Oracle officially calls it a “fully-featured garbage collector”.

  • Meanwhile, CMS was marked as deprecated in JDK 9. In JDK 8, G1 is not yet the default garbage collector and has to be enabled with -XX:+UseG1GC.

G1 features:

  1. Parallelism and concurrency
  • Parallelism: during collection, G1 can have multiple GC threads working at the same time, making effective use of multi-core computing power. User threads are stopped (STW) during this time.
  • Concurrency: G1 can alternate execution with the application, so part of its work runs at the same time as the application. Generally speaking, the application is therefore never completely blocked for the whole collection.
  2. Generational collection
  • G1 is still a generational garbage collector. It distinguishes the young generation from the old generation, and the young generation still has Eden and Survivor areas. In terms of heap structure, however, it does not require Eden, the young generation, or the old generation to be contiguous, nor does it insist on a fixed size or a fixed number of regions.
  • The heap space is divided into regions, which logically make up the young generation and the old generation.
  • Unlike earlier collectors, it handles both the young generation and the old generation; the other collectors work either in the young generation or in the old generation.
  3. Space compaction
  • CMS: mark-sweep algorithm, memory fragmentation, and a defragmentation pass after a number of GCs.
  • G1 divides memory into regions, and the region is the basic unit of reclamation. Between regions it uses the copying algorithm, while as a whole it can be regarded as mark-compact; both algorithms avoid memory fragmentation. This helps programs run for a long time: allocating a large object does not trigger the next GC prematurely because no contiguous memory can be found. The advantage is most pronounced when the Java heap is very large.
  4. Predictable pause time model (soft real-time)
  • This is another advantage of G1 over CMS. Besides pursuing low pauses, G1 builds a predictable pause time model, allowing users to specify that within a time slice of M milliseconds, no more than N milliseconds may be spent on garbage collection.

  • Because of partitioning, G1 can select only some of the regions for reclamation, which narrows the scope of each collection and keeps global pauses well under control.

  • G1 tracks the value of the garbage accumulated in each region (the space a collection would reclaim, weighed against the empirical time that collection takes), maintains a priority list in the background, and, within the allowed collection time, collects the regions with the highest value first. This ensures that G1 achieves the highest possible collection efficiency within a limited time.

  • Compared with CMS, G1’s pauses may not match CMS in the best case, but they are much better in the worst case.
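
The “highest value first, within the allowed time” idea can be sketched as a greedy selection. The following is a deliberately simplified model, not G1’s actual policy: the Region class, its value() definition (bytes reclaimed per millisecond of predicted pause), and the sample numbers are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class GarbageFirstSketch {

    /** Hypothetical per-region estimate: bytes a collection would free and its predicted cost. */
    static final class Region {
        final int id;
        final long reclaimableBytes;
        final double predictedMillis; // assumed to be > 0

        Region(int id, long reclaimableBytes, double predictedMillis) {
            this.id = id;
            this.reclaimableBytes = reclaimableBytes;
            this.predictedMillis = predictedMillis;
        }

        /** "Value" of collecting this region: bytes freed per millisecond of pause. */
        double value() {
            return reclaimableBytes / predictedMillis;
        }
    }

    /** Greedily picks the highest-value regions whose predicted cost fits the pause budget. */
    static List<Region> chooseCollectionSet(List<Region> regions, double pauseBudgetMillis) {
        List<Region> byValue = new ArrayList<>(regions);
        byValue.sort((a, b) -> Double.compare(b.value(), a.value())); // highest value first
        List<Region> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : byValue) {
            if (spent + r.predictedMillis <= pauseBudgetMillis) {
                chosen.add(r);
                spent += r.predictedMillis;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region(1, 800, 10)); // value 80
        regions.add(new Region(2, 900, 30)); // value 30
        regions.add(new Region(3, 500, 5));  // value 100
        regions.add(new Region(4, 100, 20)); // value 5
        for (Region r : chooseCollectionSet(regions, 20)) {
            System.out.println("collect region " + r.id);
        }
        // prints "collect region 3" then "collect region 1"
    }
}
```

Region 2 holds the most garbage in absolute terms, yet it is skipped because its predicted cost does not fit the 20 ms budget. That is the trade-off the pause time model makes.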

Comparison between CMS and G1:

Compared with CMS, G1 does not have a comprehensive, overwhelming advantage. For example, both the memory footprint G1 needs for garbage collection and the extra execution load (overload) it imposes while the user program runs are higher than for CMS. Empirically, CMS is more likely to outperform G1 in small-memory applications, while G1 is more likely to come out ahead in large-memory applications. The balance point is a heap of about 6 to 8 GB.

Parameter Settings:

  • -XX:+UseG1GC: manually specifies the G1 collector for memory reclamation.
  • -XX:G1HeapRegionSize: sets the size of each region. The value is a power of 2 between 1 MB and 32 MB, and the goal is to divide the heap into about 2048 regions based on the minimum Java heap size. The default is 1/2000 of the heap.
  • -XX:MaxGCPauseMillis: sets the maximum GC pause time target. The JVM tries to meet it but does not guarantee it. The default is 200 ms.
  • -XX:ParallelGCThreads: sets the number of STW worker threads. It can be set to at most 8.
  • -XX:ConcGCThreads: sets the number of concurrent marking threads. Set it to about 1/4 of the number of parallel garbage collection threads (ParallelGCThreads).
  • -XX:InitiatingHeapOccupancyPercent: sets the Java heap occupancy threshold that triggers a concurrent GC cycle. When occupancy exceeds this value, the cycle is triggered. The default is 45.

Performance tuning is a three-step process

G1 was designed to simplify JVM performance tuning; developers need only three simple steps:

  • Step 1: Enable the G1 garbage collector
  • Step 2: Set the maximum heap size
  • Step 3: Set the maximum pause time
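
The three steps map onto three JVM options. The sketch below simply assembles such a command line, for example to hand to a ProcessBuilder; -Xmx is the standard maximum heap option, while the 4g heap, the 200 ms target, and app.jar are placeholder values chosen for the example.

```java
import java.util.Arrays;
import java.util.List;

final class G1LaunchCommand {

    /** Builds a java command line applying the three G1 tuning steps. */
    static List<String> g1Command(String maxHeap, int maxPauseMillis, String jar) {
        return Arrays.asList(
                "java",
                "-XX:+UseG1GC",                           // step 1: enable G1
                "-Xmx" + maxHeap,                         // step 2: maximum heap size
                "-XX:MaxGCPauseMillis=" + maxPauseMillis, // step 3: maximum pause time target
                "-jar", jar);
    }

    public static void main(String[] args) {
        // app.jar is a hypothetical application archive.
        System.out.println(String.join(" ", g1Command("4g", 200, "app.jar")));
        // prints: java -XX:+UseG1GC -Xmx4g -XX:MaxGCPauseMillis=200 -jar app.jar
    }
}
```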

G1 provides three garbage collection modes: Young GC, Mixed GC, and Full GC, which are triggered under different conditions.

G1 Application Scenario

  • Server-oriented applications, on machines with large memory and multiple processors. (Its performance on an ordinary-sized heap is unremarkable.)
  • Its main use is to provide a solution for applications that need low GC latency and have a large heap;
  • For example, with a heap of about 6 GB or larger, predictable pause times can be below 0.5 seconds; G1 keeps each GC pause from getting too long by incrementally cleaning only some regions at a time rather than all of them.
  • It is meant to replace the CMS collector introduced in JDK 1.5; G1 may be a better choice than CMS when:
    • More than 50% of the Java heap is occupied by live data;
    • The object allocation rate or the promotion rate varies greatly;
    • GC pauses are too long (longer than 0.5 to 1 second).
  • Among HotSpot’s garbage collectors, all except G1 use built-in JVM threads to perform multithreaded GC work, whereas G1 can also use application threads to take on GC work that runs in the background. That is, when the JVM’s GC threads are slow, the application threads are called on to help speed up the garbage collection process.

Region partitioning: breaking the whole into parts

When the G1 collector is used, it divides the entire Java heap into approximately 2048 independent regions of equal size. The Region size depends on the actual heap size and is always a power of 2 between 1MB and 32MB, that is, 1MB, 2MB, 4MB, 8MB, 16MB, or 32MB. It can be set with -XX:G1HeapRegionSize. All regions are the same size, and the size does not change during the lifetime of the JVM.
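The sizing rule described above (about 2048 regions, a power of 2, between 1MB and 32MB) can be sketched as a small calculation. This is a simplified reading of the rule, not HotSpot’s actual code; the class and method names are hypothetical, and the real heuristic may differ in details such as which heap size it starts from.

```java
// Simplified sketch of the region-size rule; names are made up for illustration.
final class RegionSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;    // lower bound stated in the text
    static final long MAX_REGION = 32 * MB;   // upper bound stated in the text
    static final long TARGET_REGIONS = 2048;  // target region count stated in the text

    /** Region size in bytes for a given heap size: a power of 2 clamped to [1MB, 32MB]. */
    static long regionSizeBytes(long heapBytes) {
        long ideal = Math.max(heapBytes / TARGET_REGIONS, MIN_REGION);
        long powerOfTwo = Long.highestOneBit(ideal); // round down to a power of 2
        return Math.min(powerOfTwo, MAX_REGION);
    }

    public static void main(String[] args) {
        long[] heapsInGb = {1, 4, 6, 16, 128};
        for (long gb : heapsInGb) {
            long size = regionSizeBytes(gb * 1024 * MB);
            System.out.println(gb + "GB heap -> " + (size / MB) + "MB regions");
        }
    }
}
```

Rounding down keeps the region count at or above the target for mid-sized heaps; very small and very large heaps end up with fewer or more than 2048 regions because of the 1MB and 32MB bounds.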

Although the concepts of young generation and old generation are retained, the two generations are no longer physically separated; each is a collection of regions (which do not need to be contiguous). Regions are allocated dynamically to achieve logical continuity.

A region may serve as an Eden, Survivor, or Old/Tenured region, but it can play only one role at a time. In the figure, E marks an Eden region, S a Survivor region, and O an Old region. Blank spaces in the figure represent unused memory. G1 also adds a new kind of region called Humongous, shown as H blocks in the figure, which stores large objects. If an object is larger than 1.5 regions, it is placed in an H region.

Why H regions exist:

Large objects in the heap are allocated directly in the old generation by default, but for a short-lived large object this has a negative impact on the garbage collector. To solve this problem, G1 sets aside Humongous regions dedicated to large objects. If a large object does not fit in one H region, G1 looks for contiguous H regions to store it. Sometimes a Full GC has to be started in order to find contiguous H regions. For most purposes G1 treats H regions as part of the old generation.

Remembered Set

A Region cannot exist in isolation: objects in one Region can be referenced by objects in any other Region. Does the entire Java heap have to be scanned to determine whether an object is alive? Other generational collectors have the same problem (it is just more prominent in G1): collecting the young generation would also require scanning the old generation, which reduces the efficiency of Minor GC.

Solutions:

  • For both G1 and other generational collectors, the JVM uses a Remembered Set to avoid global scans.
  • Each Region has its own Remembered Set.
  • Each write to a field of reference type triggers a Write Barrier, which interrupts the write briefly. The barrier checks whether the reference being written points to an object in a different Region from the one holding the field (other collectors check whether an old-generation object refers to a young-generation object).
  • If the Regions differ, the reference is recorded, via the CardTable, in the Remembered Set of the Region that contains the referenced object.
  • During garbage collection, the Remembered Sets are added to the GC root enumeration scope. This guarantees that no global scan is needed and that nothing is missed.
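The write-barrier check described in the list above can be modelled in a few lines. This is a toy model under loud assumptions: addresses are plain integers, the region and card sizes are invented, and the class RememberedSetSketch has nothing to do with HotSpot’s real data structures.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of per-region remembered sets; all names and sizes are hypothetical.
final class RememberedSetSketch {
    static final int REGION_SIZE = 1024; // invented region size in "address units"
    static final int CARD_SIZE = 64;     // invented card size in "address units"

    // target region index -> cards (in other regions) that hold references into it
    private final Map<Integer, Set<Integer>> rememberedSets = new HashMap<>();

    static int regionOf(int address) { return address / REGION_SIZE; }
    static int cardOf(int address) { return address / CARD_SIZE; }

    /** Runs on every reference write: the field at fieldAddress now points to targetAddress. */
    void writeBarrier(int fieldAddress, int targetAddress) {
        int fromRegion = regionOf(fieldAddress);
        int toRegion = regionOf(targetAddress);
        if (fromRegion != toRegion) {
            // Cross-region reference: remember the card that contains the field.
            rememberedSets.computeIfAbsent(toRegion, r -> new HashSet<>())
                          .add(cardOf(fieldAddress));
        }
    }

    /** Cards to treat as extra roots when the given region is collected. */
    Set<Integer> cardsPointingInto(int region) {
        return rememberedSets.getOrDefault(region, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        RememberedSetSketch heap = new RememberedSetSketch();
        heap.writeBarrier(2058, 100); // region 2 -> region 0: recorded
        heap.writeBarrier(50, 100);   // region 0 -> region 0: ignored
        System.out.println("Extra roots for region 0: " + heap.cardsPointingInto(0));
    }
}
```

When region 0 is collected, only the remembered cards need to be scanned in addition to the ordinary GC roots, which is what avoids scanning the whole heap.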

The main stages of the G1 collection process

  • Young GC
  • Old-generation concurrent marking (Concurrent Marking)
  • Mixed GC
  • If needed, a single-threaded, exclusive, high-intensity Full GC still exists. It provides a fail-safe mechanism against GC evacuation failures, that is, a forceful collection. (Optional)

  1. The application allocates memory, and the young generation collection starts when the young generation’s Eden area is exhausted. G1’s young generation collection is parallel and exclusive: G1 suspends all application threads and starts multiple threads to perform the collection. It then moves surviving objects from young regions to Survivor regions, to old regions, or to both.

  2. When heap memory usage reaches a certain value (45% by default), the old-generation concurrent marking process begins.

  3. As soon as marking completes, mixed collection begins. During a mixed collection, G1 moves live objects from old regions to free regions, which then become part of the old generation. Unlike the young generation, the old generation does not have to be collected as a whole; G1 scans and reclaims only a small number of old regions at a time, and these old regions are collected together with the young generation. For example, four or five mixed collections may run after marking is complete.

Young GC process

  • When the JVM starts, G1 first prepares the Eden area. The program keeps creating objects in Eden while it runs, and when Eden runs out of space, G1 starts a young generation collection.
  • A young generation collection only collects the Eden and Survivor areas.
  • In a Young GC, G1 first stops the application (stop-the-world) and creates a Collection Set, the set of memory segments that need to be collected. In a young generation collection, the Collection Set contains all memory segments of the young generation’s Eden and Survivor areas.

Detailed stages:

  1. Scan the roots

The roots are the objects that static variables point to, the local variables in the chain of method calls currently executing, and so on. The root references, together with the external references recorded in the RSet, serve as the entry points for scanning live objects.

  2. Update the RSet

Process the cards in the dirty card queue and update the RSet. After this phase, the RSet accurately reflects the references from the old generation to objects in the memory segments being collected.

  3. Process the RSet

Identify the objects in Eden that are referenced by old-generation objects. These referenced objects are considered alive.

  4. Copy objects

In this phase the object tree is traversed, and surviving objects in Eden memory segments are copied to empty memory segments in the Survivor area. For surviving objects in Survivor memory segments, the age is increased by 1 if it has not reached the threshold; once the age reaches the threshold, the object is copied to an empty memory segment in the old area. If Survivor space is insufficient, some data in Eden is promoted directly to the old area.

  5. Process references

Process Soft, Weak, Phantom, Final, JNI Weak, and other references. At the end, Eden is empty and the GC stops working. The objects in the target memory segments are stored contiguously, without fragmentation, so the copying process also compacts memory and reduces fragmentation.
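The decision made for each live object in the copy stage can be written down as a small function. This is only a sketch of the rule as described above; the names are hypothetical, and the threshold of 15 used in main is the commonly cited default of -XX:MaxTenuringThreshold, taken here as an assumption.

```java
// Sketch of where one live young-generation object goes during the copy stage.
// Names are made up; this is not how HotSpot structures its evacuation code.
final class YoungEvacuationSketch {
    enum Target { SURVIVOR, OLD }

    /**
     * @param age               number of young collections the object has survived so far
     * @param tenuringThreshold age at which objects are promoted to the old area
     * @param survivorHasRoom   whether an empty Survivor segment can still take the object
     */
    static Target destination(int age, int tenuringThreshold, boolean survivorHasRoom) {
        if (age >= tenuringThreshold || !survivorHasRoom) {
            return Target.OLD;      // old enough, or Survivor space is insufficient
        }
        return Target.SURVIVOR;     // stays young; its age becomes age + 1
    }

    public static void main(String[] args) {
        int threshold = 15; // assumed default, see the note above
        System.out.println("Fresh Eden object:      " + destination(0, threshold, true));
        System.out.println("Object at threshold:    " + destination(15, threshold, true));
        System.out.println("Survivor space is full: " + destination(0, threshold, false));
    }
}
```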

Old-generation concurrent marking (Concurrent Marking)

  1. Initial marking

Marks the objects directly reachable from the roots. This phase is STW and triggers a young GC.

  2. Root region scanning

G1 scans the old-generation objects that are directly reachable from Survivor regions and marks the referenced objects. This phase must complete before the next young GC.

  3. Concurrent marking

Marking runs across the whole heap concurrently with the application and may be interrupted by a young GC. If all objects in a region are found to be garbage during this phase, the region is reclaimed immediately. The liveness of each region (the proportion of live objects in it) is also calculated during concurrent marking.

  4. Remark

Because the application keeps running, the result of the previous marking has to be corrected. This phase is STW. G1 uses snapshot-at-the-beginning (SATB), an initial-snapshot algorithm that is faster than the one CMS uses.

  5. Exclusive cleanup (STW)

Calculates the proportion of live objects and the reclaimable proportion of each region and sorts the regions to identify those that can be included in mixed collections. This sets the stage for the next phase. It is STW, but it does not actually collect any garbage.

  6. Concurrent cleanup

Identifies and cleans up regions that are completely free.

Mixed GC

As more and more objects are promoted to old regions, the virtual machine triggers a mixed garbage collection, known as a Mixed GC, to avoid running out of heap memory.

A Mixed GC is not an old-generation GC: in addition to all young regions, it collects a portion of the old regions.

Note that it is a portion of the old generation, not all of it. Because G1 can choose which old regions to collect, it can control the garbage collection time. Also note that a Mixed GC is not a Full GC.

  • After concurrent marking ends, old-generation memory segments that are 100% garbage are reclaimed, and the garbage in partially filled segments has been calculated. By default, these old-generation memory segments are collected over eight collections (configurable with -XX:G1MixedGCCountTarget).

  • The Collection Set of a mixed collection consists of one eighth of the old-generation memory segments plus the Eden and Survivor memory segments. The mixed collection algorithm is exactly the same as the young generation algorithm; it just also collects old-generation memory segments. See the young generation collection process above for details.

  • Since old-generation memory segments are collected over eight collections by default, G1 gives priority to the segments with the most garbage: the higher the proportion of garbage in a segment, the earlier it is collected. A threshold, -XX:G1MixedGCLiveThresholdPercent (65% by default), determines whether a segment is collected at all. It caps the proportion of live data a segment may contain and still be collected: if the garbage ratio is too low, the proportion of live objects is high and copying them takes more time.

  • Mixed collection does not have to run eight times. The threshold -XX:G1HeapWastePercent, 10% by default, means that 10% of the total heap is allowed to be wasted: if the garbage that could be reclaimed is less than 10% of the heap, no further mixed collection is done, because the GC would spend a lot of time for very little reclaimed memory.
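The interplay of the three thresholds can be sketched as a selection routine. Everything here is illustrative: the class and method names are hypothetical, the live-threshold flag is read as an upper bound on the live-data percentage of a region (an interpretation based on the flag name), and 65, 10 and 8 appear only as sample arguments.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of mixed-collection candidate selection; not G1's real policy code.
final class MixedGcSketch {

    /** One old region with the live bytes measured by concurrent marking. */
    static final class OldRegion {
        final String name;
        final long capacity;
        final long liveBytes;

        OldRegion(String name, long capacity, long liveBytes) {
            this.name = name;
            this.capacity = capacity;
            this.liveBytes = liveBytes;
        }

        long garbage() { return capacity - liveBytes; }
        double livePercent() { return 100.0 * liveBytes / capacity; }
    }

    /** Regions whose live data is below the threshold, most garbage first. */
    static List<OldRegion> candidates(List<OldRegion> oldRegions, double liveThresholdPercent) {
        List<OldRegion> result = new ArrayList<>();
        for (OldRegion r : oldRegions) {
            if (r.livePercent() < liveThresholdPercent) {
                result.add(r);
            }
        }
        result.sort((a, b) -> Long.compare(b.garbage(), a.garbage()));
        return result;
    }

    /** True if the reclaimable garbage is at least heapWastePercent of the heap. */
    static boolean worthCollecting(List<OldRegion> candidates, long heapBytes, double heapWastePercent) {
        long reclaimable = 0;
        for (OldRegion r : candidates) {
            reclaimable += r.garbage();
        }
        return 100.0 * reclaimable / heapBytes >= heapWastePercent;
    }

    /** The share of candidates taken by one mixed collection: about 1/countTarget of them. */
    static List<OldRegion> nextBatch(List<OldRegion> candidates, int countTarget) {
        int batchSize = (candidates.size() + countTarget - 1) / countTarget; // round up
        return new ArrayList<>(candidates.subList(0, Math.min(batchSize, candidates.size())));
    }

    public static void main(String[] args) {
        List<OldRegion> old = new ArrayList<>();
        old.add(new OldRegion("B", 100, 50));
        old.add(new OldRegion("A", 100, 10));
        old.add(new OldRegion("C", 100, 90));
        List<OldRegion> picked = candidates(old, 65.0);
        for (OldRegion r : nextBatch(picked, 8)) {
            System.out.println("Collect first: region " + r.name);
        }
        System.out.println("Worth collecting: " + worthCollecting(picked, 1000, 10.0));
    }
}
```

Sorting by garbage amount is what makes the collector "garbage first": the regions that free the most memory for the least copying are taken in the earliest mixed collections.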

Full GC (optional)

G1 was designed to avoid Full GC. If the mechanisms above do not work well enough, however, G1 stops the application (stop-the-world) and collects garbage with a single-threaded memory reclamation algorithm, which performs poorly and causes long application pauses.

There are two possible causes of a G1 Full GC:

  1. Evacuation does not find enough to-space to hold promoted objects.
  2. Space runs out before concurrent processing completes.

A note on the evacuation phase

According to official information from Oracle, the Evacuation phase was originally meant to run concurrently with user programs, but this is complicated to implement. Since G1 only collects part of the regions and the pause time is under user control, it was not urgent, and the feature was instead left to the low-latency garbage collector that came after G1 (ZGC).

In addition, G1 is not aimed only at low latency. Pausing user threads maximizes garbage collection efficiency, so the implementation that pauses user threads completely was chosen in order to preserve throughput.

G1 Optimization Suggestions

  1. Young generation size
  • Avoid explicitly setting the young generation size with options such as -Xmn or -XX:NewRatio.
    • If the young generation size is fixed, G1 cannot resize it dynamically, and the pause time that was set loses its effect.
  • Fixing the size of the young generation overrides the pause time target.
  2. Don’t make the pause time target too strict
  • The throughput goal of G1 GC is 90% application time and 10% garbage collection time.
  • When evaluating G1 GC throughput, don’t set the pause time target too aggressively. A target that is too strict means accepting more garbage collection overhead, which directly affects throughput.

Shenandoah – low pause

Shenandoah is undoubtedly the loneliest of the many GCs. It was the first HotSpot garbage collector not developed by the Oracle team, and it has inevitably been given the cold shoulder by the official side. For example, Oracle, which claims there is no difference between OpenJDK and OracleJDK, still refuses to support Shenandoah in OracleJDK 12.

Shenandoah is an implementation of a pauseless GC. It began as a garbage collector research project at Red Hat, designed to meet the need for low-pause memory reclamation on the JVM, and was contributed to OpenJDK in 2014.

Red Hat’s Shenandoah team claims that the collector’s pause times are independent of heap size: whether the heap is set to 200MB or 200GB, 99.9% of garbage collection pauses can be kept under 10 milliseconds. Actual performance, however, depends on the actual working heap size and the workload.

Shenandoah is a region-based garbage collector that, like G1, maintains the entire heap as a collection of regions. However, Shenandoah does not need a remembered set or card table to record cross-region references.

Main stages:

Each Shenandoah GC cycle consists of two STW (stop-the-world) phases and two concurrent phases.

  1. The initial mark phase is STW and scans the root set.

  2. The concurrent marking phase, in which Shenandoah GC runs together with the Java worker threads.

  3. The final mark phase is STW and is followed by a concurrent evacuation phase.

Detailed stages:

  1. Init Mark initializes concurrent marking. It prepares the heap and the application threads for concurrent marking and then scans the root set. This is the first pause in the GC cycle. Most of the work in this phase is scanning the root set, so the pause time depends on the size of the root set.

  2. Concurrent Marking walks the heap, starting from the root set and tracing all reachable objects. This phase runs alongside the application, hence “concurrent”. Its duration depends mainly on the number of live objects and on the structure of the object graph in the heap. Since the application can still allocate new data during this phase, heap usage increases during concurrent marking.

  3. Final Mark finishes concurrent marking by draining all pending marking/update queues and rescanning the root set. It also determines the regions to be collected (the collection set) and generally prepares for the next phase. Final Mark is the second pause in the GC cycle. Part of its work can be done in a concurrent pre-clean phase; its most time-consuming parts are draining the queues and scanning the root set.

  4. Concurrent Cleanup reclaims immediate garbage regions, that is, regions in which no live objects were found after concurrent marking.

  5. Concurrent Evacuation copies live objects out of the collection set into other regions, which is a major difference from other OpenJDK GCs. This phase again runs alongside the application, so the application can keep allocating memory. Its duration depends on the size of the collection set (for example, if the heap is divided into 128 regions, evacuating 16 selected regions takes longer than evacuating 8).

  6. Init Update Refs initializes the update-references phase. It does almost nothing except make sure that all GC and application threads have finished the concurrent evacuation phase and prepare the GC for the next phase. This is the third pause in the GC cycle and the shortest.

  7. Concurrent Update References walks the heap again and updates references to the objects that were moved during concurrent evacuation. This is also a major difference from other OpenJDK GCs. Its duration depends mainly on the number of objects in the heap, not on the structure of the object graph, because it scans the heap linearly. This phase runs concurrently with the application.

  8. Final Update Refs completes the update-references phase by updating the root set once more. It also reclaims the regions in the collection set, because the heap now holds no references to objects in those regions. This is the last pause in the GC cycle, and its duration depends mainly on the size of the root set.

  9. Concurrent Cleanup reclaims the collection set regions, which now have no references.

All four pauses are determined primarily by the size of the GC root set, not by the heap size.
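The nine detailed stages and their pause behaviour can be summarized in a small enum. This is purely a restatement of the list above in code form; the type and constant names are made up and do not correspond to Shenandoah’s source code.

```java
import java.util.ArrayList;
import java.util.List;

// Restates the phase list above; names are hypothetical, not Shenandoah internals.
final class ShenandoahCycleSketch {

    enum Phase {
        INIT_MARK(true),
        CONCURRENT_MARKING(false),
        FINAL_MARK(true),
        CONCURRENT_CLEANUP(false),
        CONCURRENT_EVACUATION(false),
        INIT_UPDATE_REFS(true),
        CONCURRENT_UPDATE_REFS(false),
        FINAL_UPDATE_REFS(true),
        CONCURRENT_CLEANUP_AFTER_UPDATE(false);

        final boolean stopTheWorld;

        Phase(boolean stopTheWorld) {
            this.stopTheWorld = stopTheWorld;
        }
    }

    /** The phases that pause the application, in cycle order. */
    static List<Phase> pauses() {
        List<Phase> result = new ArrayList<>();
        for (Phase p : Phase.values()) {
            if (p.stopTheWorld) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Phase p : Phase.values()) {
            System.out.println((p.stopTheWorld ? "[STW]        " : "[concurrent] ") + p);
        }
        System.out.println("Pauses per cycle: " + pauses().size());
    }
}
```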

Red Hat published a paper in 2016 that tested indexing 200GB of Wikipedia data with Elasticsearch (ES). The results showed:

  • Pause times were a qualitative leap over those of the other collectors, but the maximum pause time was not kept under ten milliseconds.
  • Throughput dropped significantly; the total running time was the longest of all collectors tested.

Advantages and disadvantages:

  • Disadvantage: throughput degrades under high load.
  • Advantage: low latency.

Parameter Settings:

-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC

ZGC – region-based, non-generational, low pause

ZGC’s goals are very similar to Shenandoah’s: achieve low latency that keeps garbage collection pauses under 10 milliseconds at any heap size, with as little impact on throughput as possible.

The ZGC collector is a garbage collector based on a Region memory layout, with (for now) no generations. It uses techniques such as read barriers, colored pointers, and memory multi-mapping to implement a concurrent mark-compact algorithm, with low latency as its primary goal.

ZGC runs concurrently almost everywhere; only the initial mark is STW. Pause time is therefore spent almost entirely on the initial mark, and in practice this takes very little time.

ZGC divides memory into regions, also known as ZPages. ZPages can be created and destroyed dynamically. They can also be sized dynamically (unlike G1 regions), and their sizes are multiples of 2MB. The page size groups are:

  • Small (2 MB)
  • Medium (32 MB)
  • Large (N * 2 MB)

Pages of each group can occur many times in the ZGC heap. Large pages are allocated contiguously.

Unlike other GCs, ZGC can map its physical heap memory into a larger heap address space (which can include virtual memory). This is critical for dealing with memory fragmentation. Imagine that a user wants to allocate a very large object, and there is enough free memory for it, but the allocation fails because no contiguous space is available.

In other collectors this typically leads to multiple GC cycles to free up enough contiguous space. If none is available even after (multiple) GC cycles, the JVM shuts down with an OutOfMemoryError. This particular case is not a problem for ZGC: because physical memory is mapped into a larger address space, finding a larger contiguous range is feasible.

Features:

  • Concurrent GC
  • Supports elastic scaling
  • Low latency (ZGC aims to keep pauses to no more than 10 milliseconds, regardless of heap size)
  • Uncommitted memory is returned to the operating system
  • It can mark memory and copy and relocate memory; all of these operations are concurrent, and it has a concurrent reference processor
  • All other garbage collectors use store barriers; ZGC uses load barriers to track memory
    • lock -> unlock -> read -> load (reading memory)
    • use -> assign -> store -> write (writing memory)

There are four stages of ZGC:

  1. Concurrent mark
  2. Concurrent prepare for relocation
  3. Concurrent relocation
  4. Concurrent remap

Parameter Settings

  • From JDK 11 until JDK 14, ZGC was supported only on Linux.

  • ZGC can now also be used on macOS and Windows.

    • -XX:+UnlockExperimentalVMOptions -XX:+UseZGC

Reference: Complete guide to Z Garbage Collector (ZGC)

AliGC and Zing

AliGC: developed by the Alibaba JVM team on the basis of the G1 algorithm, aimed at large-heap application scenarios.

Zing: a well-known low-latency GC.

Garbage collector summary

  • Minimal memory and parallelism overhead (single core): serial, Serial GC
  • Maximum application throughput: parallel, Parallel GC
  • Minimal GC interruptions or pause times: concurrent, CMS
  • Low pause with large memory: concurrent and parallel, G1, Shenandoah, ZGC

The following relationship is updated to JDK14

Related terms

  • parallel: multiple garbage collection threads work together; the application may be stopped
  • serial: the garbage collector has only one working thread
  • stop the world: the application is stopped
  • concurrent: the garbage collector runs in the background while the application runs at the same time
  • incremental: the garbage collector can stop before a collection is finished and come back later to do the rest

dirty card queue

For a reference assignment statement in the application such as object.field = object, the JVM performs special operations before and after it to enqueue a card that records the reference information into the dirty card queue. During a young generation collection, G1 processes all cards in the dirty card queue to update the RSet, ensuring that the RSet accurately reflects the current reference relationships.

Why not update the RSet directly at the reference assignment? For performance: updating the RSet requires thread synchronization, which is very expensive, and using a queue performs much better.
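The deferral described above, a cheap enqueue at write time and the real RSet update later, can be sketched as follows. The sketch is single-threaded and uses invented names; a real collector has to cope with many application threads writing at once, which is exactly the cost the queue is meant to keep out of the write path.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Single-threaded toy model of a dirty card queue; all names are hypothetical.
final class DirtyCardQueueSketch {
    private final Queue<Integer> dirtyCards = new ArrayDeque<>();
    private final Set<Integer> rset = new HashSet<>();

    /** Write barrier: only notes which card was written, no RSet work here. */
    void onReferenceWrite(int cardIndex) {
        dirtyCards.add(cardIndex);
    }

    /** Run at the start of a young collection: drain the queue into the RSet. */
    int drainIntoRSet() {
        int processed = 0;
        Integer card;
        while ((card = dirtyCards.poll()) != null) {
            rset.add(card);
            processed++;
        }
        return processed;
    }

    boolean remembers(int cardIndex) {
        return rset.contains(cardIndex);
    }

    public static void main(String[] args) {
        DirtyCardQueueSketch sketch = new DirtyCardQueueSketch();
        sketch.onReferenceWrite(3);
        sketch.onReferenceWrite(3); // the same card can be enqueued more than once
        sketch.onReferenceWrite(7);
        System.out.println("Before drain, card 3 remembered: " + sketch.remembers(3));
        System.out.println("Cards processed: " + sketch.drainIntoRSet());
        System.out.println("After drain, card 3 remembered: " + sketch.remembers(3));
    }
}
```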

ZGC introduces two new concepts: pointer coloring and load barriers.

Pointer Coloring

This feature lets ZGC discover, mark, locate, and remap objects. It works only on 64-bit operating systems, and colored pointers require virtual address masking.

  • Finalizable: the object can be reached only through a finalizer
  • Marked0 and Marked1: used to mark reachable objects
  • Remapped: the reference points to the current location of the object, which may have been relocated

Load Barrier

A load barrier is a piece of code that runs when a thread loads a reference from the heap, for example when accessing a field of non-primitive type on an object.

In ZGC, the load barrier checks the metadata bits of the reference and, depending on those bits, may do some processing on the reference before the application gets it. The reference may therefore be remapped when the object is fetched, without affecting its use.
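Colored pointers and the load barrier can be illustrated with plain bit operations on a long. The bit layout below (42 address bits followed by four metadata bits) and the forwarding map are assumptions chosen for the example; real ZGC works on raw memory inside the JVM and its layout varies between versions.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only: the bit positions and all names are assumptions.
final class ColoredPointerSketch {
    static final int ADDRESS_BITS = 42;
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;
    static final long MARKED0 = 1L << ADDRESS_BITS;
    static final long MARKED1 = 1L << (ADDRESS_BITS + 1);
    static final long REMAPPED = 1L << (ADDRESS_BITS + 2);
    static final long FINALIZABLE = 1L << (ADDRESS_BITS + 3);

    /** Strips the metadata bits and returns the plain address. */
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    /** Returns the same address carrying exactly the given color. */
    static long withColor(long pointer, long color) {
        return address(pointer) | color;
    }

    static boolean hasColor(long pointer, long color) {
        return (pointer & color) != 0;
    }

    // old address -> new address of relocated objects (hypothetical forwarding table)
    private final Map<Long, Long> forwarding = new HashMap<>();

    void relocate(long fromAddress, long toAddress) {
        forwarding.put(fromAddress, toAddress);
    }

    /** Load barrier: returns a reference that is safe to use, remapping it if needed. */
    long load(long pointer) {
        if (hasColor(pointer, REMAPPED)) {
            return pointer; // fast path: the reference already points to the current location
        }
        long oldAddress = address(pointer);
        long currentAddress = forwarding.getOrDefault(oldAddress, oldAddress);
        return withColor(currentAddress, REMAPPED);
    }

    public static void main(String[] args) {
        ColoredPointerSketch heap = new ColoredPointerSketch();
        heap.relocate(0x1000, 0x2000);
        long stale = withColor(0x1000, MARKED0);
        long fresh = heap.load(stale);
        System.out.println("Address after load barrier: 0x" + Long.toHexString(address(fresh)));
    }
}
```

The application never sees the stale address: every load goes through the barrier, so relocation can happen while the program keeps running.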

GC Log Analysis

Parameter Settings

  • -XX:+PrintGC prints GC logs. Similar to -verbose:gc
  • -XX:+PrintGCDetails prints detailed GC logs
  • -XX:+PrintGCTimeStamps prints GC timestamps (relative to JVM start time)
  • -XX:+PrintGCDateStamps prints GC timestamps as dates (e.g. 2013-05-04T21:53:59.234+0800)
  • -XX:+PrintHeapAtGC prints heap information before and after each GC
  • -Xloggc:../logs/gc.log sets the output path of the log file

Log analysis

import java.util.ArrayList;

/**
 * Run with: -Xms60m -Xmx60m -XX:SurvivorRatio=8 -XX:+PrintGCDetails -Xloggc:./logs/gc.log
 */
public class GCLogTest {
    public static void main(String[] args) {
        // Keep every array reachable so the heap fills up and GC is triggered.
        ArrayList<byte[]> list = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            byte[] arr = new byte[1024 * 100]; // 100 KB
            list.add(arr);
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}
  1. Run parameters: -Xms60m -Xmx60m -XX:SurvivorRatio=8 -XX:+PrintGC

Parameter analysis:

  • GC, Full GC: the GC type. GC collects only the young generation; Full GC covers the permanent generation, the young generation, and the old generation.
  • Allocation Failure: the reason the GC occurred.
  • 46179K->46157K(59392K): heap usage before and after the GC.
  • 59392K: total heap size.
  • 0.0032093 secs: GC duration.
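The fields listed above can be pulled out of a log line with a regular expression. The sample line in main is assembled from the fragments quoted in the list and is only meant to show the shape of -XX:+PrintGC output; the class name and the pattern are made up for this sketch and do not cover every log format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the "before->after(total), time secs" part of a GC log line.
final class GcLogLineSketch {
    // Matches "<before>K-><after>K(<total>K), <seconds> secs"
    private static final Pattern HEAP_CHANGE =
            Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\), (\\d+\\.\\d+) secs");

    private static Matcher match(String line) {
        Matcher m = HEAP_CHANGE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a GC summary line: " + line);
        }
        return m;
    }

    /** Heap usage before the GC minus heap usage after it, in KB. */
    static long freedKb(String line) {
        Matcher m = match(line);
        return Long.parseLong(m.group(1)) - Long.parseLong(m.group(2));
    }

    /** Total heap size in KB (the value in parentheses). */
    static long totalHeapKb(String line) {
        return Long.parseLong(match(line).group(3));
    }

    /** GC duration in seconds. */
    static double pauseSeconds(String line) {
        return Double.parseDouble(match(line).group(4));
    }

    public static void main(String[] args) {
        // Sample line reconstructed from the fragments above, for illustration only.
        String line = "[GC (Allocation Failure)  46179K->46157K(59392K), 0.0032093 secs]";
        System.out.println("Freed " + freedKb(line) + "K of " + totalHeapKb(line)
                + "K in " + pauseSeconds(line) + " s");
    }
}
```

Because the pattern requires the ", <seconds> secs" suffix, on a detailed line it skips the per-generation figures and matches the whole-heap figures that precede the duration.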
  2. Run parameters: -Xms60m -Xmx60m -XX:SurvivorRatio=8 -XX:+PrintGCDetails

Parameter analysis:

  • GC, Full GC: the GC type. GC collects only the young generation; Full GC covers the permanent generation, the young generation, and the old generation.
  • Allocation Failure: the reason the GC occurred.
  • PSYoungGen: size of the young generation before and after the GC, collected by the Parallel Scavenge collector
  • ParOldGen: size of the old generation before and after the GC, collected by the Parallel Old collector
  • Metaspace: size of the metadata area before and after the GC. Metaspace was introduced in JDK 1.8 to replace the permanent generation
  • 0.0122215 secs: the GC duration
  • Times: user is the total CPU time spent by the garbage collector, sys is the time spent waiting for system calls or system events, and real is the elapsed time of the GC from start to finish, including time slices used by other processes
    • [Times: user=0.00 sys=0.00, real=0.01 secs]
  3. Notes
  • “[GC” and “[Full GC” indicate the type of pause for the collection; “Full” indicates that the GC has “stopped the world”
  • With the Serial collector, the young generation is named Default New Generation, so the log shows “DefNew”
  • With the ParNew collector, the young generation is named “ParNew”, which stands for “Parallel New Generation”
  • With the Parallel Scavenge collector, the young generation is named “PSYoungGen”
  • The name of the old generation, like that of the young generation, is determined by the collector
  • With the G1 collector, the log shows “garbage-first heap”
  • Allocation Failure: the GC was caused by the young generation not having enough space to store new data
  • [PSYoungGen: 5986K->696K(8704K)] 5986K->704K(9216K)
    • Inside the brackets: size of the young generation before the GC, size after the GC, (total size of the young generation)
    • Outside the brackets: size of the young and old generations before the GC, size after the GC, (total size of the young and old generations)
  • user is the collection time spent in user mode, sys the time spent in kernel mode, and real the actual elapsed time. On multicore machines, the sum of the times may exceed the real time.

GC:

Full GC graph:

  4. Common log analysis tools:

Memory Analyzer Tool, GCViewer, GCEasy, GCHisto, GCLogViewer, HPjmeter, Garbagecat, etc.

In-depth understanding of the JVM series

  • 1. In-depth understanding of the JVM (I) – Introduction and architecture
  • 2. In-depth understanding of the JVM (II) – Class loader subsystem
  • 3. In-depth understanding of the JVM (III) – Runtime data area (virtual machine stack)
  • 4. In-depth understanding of the JVM (IV) – Runtime data area (program counter + native method stack)
  • 5. In-depth understanding of the JVM (V) – Runtime data area (heap)
  • 6. In-depth understanding of the JVM (VI) – Runtime data area (method area)
  • 7. In-depth understanding of the JVM (VII) – Execution engine (interpreter and JIT compiler)
  • 8. In-depth understanding of the JVM (VIII) – String constant pool
  • 9. In-depth understanding of the JVM (IX) – Object instantiation and memory layout
  • 10. In-depth understanding of the JVM (X) – Bytecode-level analysis of program execution
  • 11. In-depth understanding of the JVM (XI) – Concepts related to garbage collection
  • 12. In-depth understanding of the JVM (XII) – Garbage collection algorithms