In the previous article, we introduced the basic concepts of ZGC and Alibaba's experience running ZGC at scale. Alibaba's own business and its cloud customers have benefited from the response-time improvements ZGC brings, but they have also run into some practical problems. To use ZGC well, we need to understand some of its principles and learn how to read ZGC logs and tune ZGC.

First taste of the new Garbage Collector ZGC

ZGC principle

From a macro perspective, ZGC is a concurrent compacting GC algorithm:

  • Concurrent: while Java threads run, GC threads quietly do their work in the background;
  • Compacting: live objects in the heap are periodically compacted to resolve memory fragmentation.

Compared with Java's older collectors, Parallel GC and G1, whose pauses are on the order of 100 milliseconds, and CMS, whose fragmentation problem was never solved, the concurrent, compacting ZGC is a major leap forward in Java's GC capabilities: GC threads can defragment memory while Java threads keep running. ZGC reclaims the Java heap with a mark-compact strategy: it first concurrently marks the live objects in the heap, then concurrently relocates the live objects in selected parts of the heap. Unlike earlier Java GCs, the current ZGC is a single-generation collector, so the marking phase traverses all live objects in the heap. So how does ZGC mark and relocate concurrently? This brings us to the core technologies behind ZGC: load barriers (read barriers) and colored pointers.

ZGC's read barrier (load barrier) inserts a piece of processing logic every time a pointer is loaded:

  • If the pointer points to an object that has already been moved, the read barrier corrects the pointer;
  • During the marking phase, if the pointer has not been marked yet, the read barrier marks it;
  • During the relocation phase, if the pointer points into a region that is to be relocated, the read barrier moves the target object and then updates the pointer.

The read barrier ensures that every pointer load reaches the correct object even while GC threads run concurrently with Java threads.

ZGC's colored pointers use otherwise unused high bits of a pointer as its color, which represents the pointer's state, so the read barrier can read that state directly from the pointer and decide how to handle it. The production-ready ZGC supports 2^44 bytes = 16 TB of address space and actually uses 44 + 4 = 48 bits for a colored pointer, where the top four bits are the color. Colored pointers work together with the read barrier by turning the barrier's condition checks into a check of the pointer's color: if the color is "wrong", the read barrier fixes the pointer so that it is "correct".
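The interplay between colored pointers and the read barrier can be illustrated with a toy model. The sketch below is not HotSpot's implementation: it assumes the 44 + 4 bit layout described above, borrows ZGC's metadata bit names (Marked0, Marked1, Remapped, Finalizable) at assumed bit positions, and replaces the heap with a hypothetical forwarding map.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of colored pointers and a load barrier; not HotSpot's real code. */
class ColoredPointerSketch {
    static final int ADDRESS_BITS = 44;                        // 2^44 bytes = 16 TB
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;
    static final long COLOR_MASK = 0xFL << ADDRESS_BITS;       // 4 color bits above the address

    // One bit per color; the exact bit positions are an assumption of this sketch.
    static final long MARKED0 = 1L << ADDRESS_BITS;
    static final long MARKED1 = 1L << (ADDRESS_BITS + 1);
    static final long REMAPPED = 1L << (ADDRESS_BITS + 2);
    static final long FINALIZABLE = 1L << (ADDRESS_BITS + 3);

    // The color every pointer is expected to carry in the current GC phase.
    static long goodColor = REMAPPED;

    // Hypothetical forwarding table: old address -> new address of moved objects.
    static final Map<Long, Long> forwarding = new HashMap<>();

    static long address(long pointer) { return pointer & ADDRESS_MASK; }
    static long color(long pointer) { return pointer & COLOR_MASK; }

    /** Applied to every pointer that is loaded from the heap. */
    static long loadBarrier(long pointer) {
        if (color(pointer) == goodColor) {
            return pointer;                                    // fast path: a single color check
        }
        // Slow path: follow the forwarding entry if the object moved, then recolor.
        long oldAddress = address(pointer);
        long newAddress = forwarding.getOrDefault(oldAddress, oldAddress);
        return newAddress | goodColor;
    }

    public static void main(String[] args) {
        long stale = 0x1000L | MARKED0;                        // colored in an earlier phase
        forwarding.put(0x1000L, 0x8000L);                      // the object has been relocated
        long healed = loadBarrier(stale);
        System.out.printf("address=0x%x good=%b%n", address(healed), color(healed) == goodColor);
    }
}
```

The point of the design is that the common case costs one mask-and-compare; only pointers with the "wrong" color take the slow path, and once healed they pass the fast path on later loads.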

ZGC log analysis

The actual execution of a single ZGC cycle requires three short pauses, each followed by several concurrent phases.

    [2020-12-23T13:30:57.402+0800] GC(10) Garbage Collection (Allocation Rate)
    [2020-12-23T13:30:57.408+0800] GC(10) Pause Mark Start
    [2020-12-23T13:30:58.087+0800] GC(10) Concurrent Mark 674.216ms
    [2020-12-23T13:30:58.087+0800] GC(10) Pause Mark End 1.336ms
    [2020-12-23T13:30:58.105+0800] GC(10) Concurrent Process Non-strong References 18.293ms
    [2020-12-23T13:30:58.111+0800] GC(10) Concurrent Reset Relocation Set 5.533ms
    [2020-12-23T13:30:58.111+0800] GC(10) Concurrent Destroy Detached Pages 0.001ms
    [2020-12-23T13:30:58.121+0800] GC(10) Concurrent Select Relocation Set
    [2020-12-23T13:30:58.136+0800] GC(10) Concurrent Prepare Relocation Set 9.083ms
    [2020-12-23T13:30:58.136+0800] GC(10) Pause Relocate Start 2.452ms
    [2020-12-23T13:30:58.203+0800] GC(10) Concurrent Relocate 66.595ms
    ...
    [2020-12-23T13:30:58.203+0800] GC(10) Garbage Collection (Allocation Rate) 62020M(76%)->41270M(50%)

The GC log above shows a typical ZGC cycle. The pause phases are the phases whose names begin with Pause:

  • Pause Mark Start
  • Pause Mark End
  • Pause Relocate Start

You can see in the GC log above that ZGC's pause phases are well under 10 ms. The three pause phases are mainly responsible for marking and relocating GC roots and for synchronizing the marking threads.

The three pause phases are followed by the phases whose names begin with Concurrent, of which the two core phases are:

  • Concurrent Mark;
  • Concurrent Relocate.

The remaining concurrent phases are mostly preparation for concurrent relocation.
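Reading pause times out of a log by eye gets tedious for long runs. As a minimal sketch, the hypothetical helper below sums and ranks the Pause lines of a log. It assumes each line has the shape "GC(n) Pause <phase> <duration>ms"; real logs may carry extra decorators, and the last sample line is illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts ZGC pause durations from log lines; the line format is an assumption. */
class ZgcPauseSketch {
    // Captures the phase name and duration from e.g. "GC(10) Pause Mark End 1.336ms".
    private static final Pattern PAUSE = Pattern.compile(
            "GC\\(\\d+\\) (Pause [A-Za-z ]+?) (\\d+(?:\\.\\d+)?)\\s*ms",
            Pattern.CASE_INSENSITIVE);

    /** Returns the pause duration of a line in milliseconds, or -1 if it is not a pause line. */
    static double pauseMillis(String line) {
        Matcher m = PAUSE.matcher(line);
        return m.find() ? Double.parseDouble(m.group(2)) : -1.0;
    }

    /** Total pause time in milliseconds across all pause lines. */
    static double totalPauseMillis(List<String> lines) {
        double total = 0.0;
        for (String line : lines) {
            double ms = pauseMillis(line);
            if (ms >= 0) total += ms;
        }
        return total;
    }

    /** Longest single pause in milliseconds, or 0 if there is no pause line. */
    static double maxPauseMillis(List<String> lines) {
        double max = 0.0;
        for (String line : lines) {
            max = Math.max(max, pauseMillis(line));
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "[2020-12-23T13:30:58.087+0800] GC(10) Concurrent Mark 674.216ms",
                "[2020-12-23T13:30:58.087+0800] GC(10) Pause Mark End 1.336ms",
                // Illustrative line, written for this sketch.
                "[2020-12-23T13:30:58.136+0800] GC(10) Pause Relocate Start 2.452ms");
        System.out.printf(Locale.ROOT, "total=%.3fms max=%.3fms%n",
                totalPauseMillis(lines), maxPauseMillis(lines));
    }
}
```

Concurrent phases are deliberately ignored here, since they do not stop Java threads and therefore do not count toward pause time.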

(Figure: diagram of the ZGC phases)

ZGC's concurrent marking currently marks all live objects in the entire heap. Unlike the generational G1, CMS, and Parallel GC, it is a single-generation GC. Concurrent marking also fixes stale pointers in the heap along the way. To reduce the cost of relocating objects, ZGC's concurrent relocation selects only regions whose fragmentation reaches the ZFragmentationLimit threshold, similar to G1's garbage-first policy.

ZGC tuning

The following describes the tuning details related to ZGC, and users should complete at least the basic tuning.

Basic tuning

In general, ZGC users should set the heap size (-Xmx) and the number of concurrent GC threads (ConcGCThreads). We recommend that all ZGC users enable GC logging; -Xlog:gc*:gc.log:time is generally recommended because it records more ZGC details.

Heap size

A GC usually requires the developer to specify a heap size larger than the total size of the live objects in the heap. In general, the higher the proportion of spare space, the better the GC performs. For example, if the total size of live objects is estimated at 32 GB, you can set -Xmx40g to use a 40 GB heap.

ZGC differs from traditional GCs in that Java threads keep allocating new objects while ZGC is collecting. ZGC therefore needs a higher proportion of spare space than a traditional GC.

The total size of objects allocated during one ZGC cycle can be estimated as "allocation rate × duration of a single ZGC cycle", so the heap should be larger than "total size of live objects + total size of objects allocated during a single ZGC cycle".

Both the allocation rate and the duration of a single ZGC cycle can be found in the GC log.
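The heap-size lower bound above is simple arithmetic, sketched here as a small helper. The class and method names are hypothetical, and the allocation rate and cycle duration in main are made-up inputs; in practice both would be read from your own GC log.

```java
import java.util.Locale;

/** Back-of-the-envelope heap sizing for ZGC, following the estimate in the text. */
class ZgcHeapEstimate {
    /**
     * Lower bound on heap size in MB:
     * live objects + objects allocated while one GC cycle runs.
     */
    static double minHeapMb(double liveSetMb, double allocRateMbPerSec, double cycleSeconds) {
        double allocatedDuringCycleMb = allocRateMbPerSec * cycleSeconds;
        return liveSetMb + allocatedDuringCycleMb;
    }

    public static void main(String[] args) {
        double liveSetMb = 32 * 1024;       // 32 GB of live objects, as in the example above
        double allocRateMbPerSec = 2000;    // made-up allocation rate
        double cycleSeconds = 0.8;          // made-up duration of one ZGC cycle
        System.out.printf(Locale.ROOT, "heap should exceed %.0f MB%n",
                minHeapMb(liveSetMb, allocRateMbPerSec, cycleSeconds));
    }
}
```

This is only a floor: an unstable allocation rate or a longer-than-usual cycle eats into the margin, which is why extra headroom beyond this bound is usually worth having.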

Number of concurrent GC threads

The default number of concurrent GC threads is 1/8 of the number of CPU cores. For example, on a 16-core machine, if ConcGCThreads is not specified, ZGC will start two concurrent GC threads.

If "Allocation Stall" appears frequently in the GC log, collection cannot keep up with allocation, and ConcGCThreads may need to be increased. Of course, ConcGCThreads cannot be raised indefinitely: too many concurrent GC threads consume CPU resources and can even interfere with the normal execution of Java threads.

Note that the concurrent GC threads (ConcGCThreads), which run concurrently with Java threads, are different from the parallel GC threads (ParallelGCThreads), which do the work during GC pauses.
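As a rough aid for this tuning step, the sketch below computes the default described above (one eighth of the CPU cores) and counts Allocation Stall lines in a log. The helper names are hypothetical, the floor of one thread is an assumption of this sketch, and the sample stall line is illustrative rather than copied from a real log.

```java
import java.util.Arrays;
import java.util.List;

/** Small helpers for reasoning about ConcGCThreads; names and sample data are hypothetical. */
class ZgcThreadSketch {
    /** One eighth of the cores, as described in the text; the minimum of 1 is an assumption. */
    static int defaultConcGcThreads(int cpuCores) {
        return Math.max(1, cpuCores / 8);
    }

    /** Counts log lines that report an allocation stall. */
    static long countAllocationStalls(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains("Allocation Stall"))
                .count();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores + " defaultConcGCThreads=" + defaultConcGcThreads(cores));

        List<String> log = Arrays.asList(
                "GC(10) Concurrent Mark 674.216ms",
                // Illustrative stall line, written for this sketch.
                "Allocation Stall (worker-1) 12.000ms");
        System.out.println("stalls=" + countAllocationStalls(log));
    }
}
```

If the stall count keeps growing between runs, that is the signal to try a higher ConcGCThreads value, while watching that CPU usage by GC threads does not start to starve the application.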

Advanced tuning

The production-ready ZGC also supports several advanced tuning options. For details, refer to the release notes for Alibaba Dragonwell 11.0.11.7:

Github.com/alibaba/dra…

The core of advanced tuning is controlling when GC is triggered. Because objects are still being allocated while ZGC collects, ZGC cannot wait until the heap is full before triggering GC. Instead, it has to trigger GC somewhat in advance so that the heap does not fill up while the cycle is running, which would cause an Allocation Stall or an OOM. However, if ZGC is triggered too often, CPU consumption rises and throughput drops.

Dragonwell11 supports the following GC trigger timing options:

  • ZAllocationSpikeTolerance: ZGC uses "allocation rate × duration of a single ZGC cycle" to estimate the total size of objects allocated during one cycle, and triggers GC once that estimate is no longer smaller than the remaining free heap space. Because the allocation rate of a Java application is not stable, the allocation rate is multiplied by the ZAllocationSpikeTolerance "spike" factor so that GC is triggered conservatively, ahead of time. If the application's allocation rate is unstable and Allocation Stalls occur occasionally, consider increasing ZAllocationSpikeTolerance appropriately.
  • ZCollectionInterval: triggers GC periodically so that the interval between GCs does not grow too long.
  • ZProactive: literally "proactively trigger GC"; it handles situations where the allocation rate is low.
  • ZHighUsagePercent: triggers ZGC when heap usage exceeds this percentage.

ZGC is triggered as soon as any one of the conditions above is met.
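The way these triggers combine can be sketched as a few predicates joined with a logical OR. This is a simplified illustration under stated assumptions, not the collector's actual heuristics: the method names and sample numbers are hypothetical, an interval of 0 is treated as "disabled", and ZProactive is left out because the text does not describe its condition in detail.

```java
/** Simplified model of ZGC trigger rules; names, inputs, and details are assumptions. */
class ZgcTriggerSketch {
    /** Allocation-rate rule: the spike-adjusted allocation during one cycle would use up free space. */
    static boolean allocationRateRule(double freeMb, double allocRateMbPerSec,
                                      double spikeTolerance, double cycleSeconds) {
        double expectedAllocationMb = allocRateMbPerSec * spikeTolerance * cycleSeconds;
        return expectedAllocationMb >= freeMb;
    }

    /** Timer rule: too long since the last GC (an interval of 0 is treated as disabled here). */
    static boolean intervalRule(double secondsSinceLastGc, double collectionIntervalSeconds) {
        return collectionIntervalSeconds > 0 && secondsSinceLastGc >= collectionIntervalSeconds;
    }

    /** High-usage rule: heap usage has reached the configured percentage. */
    static boolean highUsageRule(double usedMb, double capacityMb, double highUsagePercent) {
        return usedMb / capacityMb * 100.0 >= highUsagePercent;
    }

    /** GC starts as soon as any single rule fires. */
    static boolean shouldStartGc(double usedMb, double capacityMb, double allocRateMbPerSec,
                                 double spikeTolerance, double cycleSeconds,
                                 double secondsSinceLastGc, double collectionIntervalSeconds,
                                 double highUsagePercent) {
        double freeMb = capacityMb - usedMb;
        return allocationRateRule(freeMb, allocRateMbPerSec, spikeTolerance, cycleSeconds)
                || intervalRule(secondsSinceLastGc, collectionIntervalSeconds)
                || highUsageRule(usedMb, capacityMb, highUsagePercent);
    }

    public static void main(String[] args) {
        // Made-up numbers: 40 GB heap, 39.5 GB used, 500 MB/s allocation, 0.8 s cycle.
        boolean start = shouldStartGc(39.5 * 1024, 40 * 1024, 500, 2.0, 0.8, 3, 0, 98);
        System.out.println("start GC: " + start);
    }
}
```

Raising the spike tolerance in this model makes the allocation-rate rule fire earlier for the same free space, which matches the advice to increase ZAllocationSpikeTolerance when stalls occur.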

The SoftMaxHeapSize option sets a "soft upper limit" on the ZGC heap, with a value between Xms and Xmx. The ZAllocationSpikeTolerance, ZProactive, and ZHighUsagePercent triggers above treat SoftMaxHeapSize as the soft cap of the heap: the heap can grow up to Xmx when allocation is too fast and shrink toward Xms when allocation is slow. SoftMaxHeapSize usually needs to be used together with -XX:+ZUncommit.

There are also some useful advanced tuning features:

  • ZFragmentationLimit: controls how much fragmentation ZGC tolerates. The lower the ZFragmentationLimit, the more thoroughly ZGC collects;
  • ZMarkStackSpaceLimit: adjusts the size of the ZGC mark stack space;
  • ZUnloadClassesFrequency: controls how often ZGC unloads classes;
  • ZRelocationReservePercent: controls the space ZGC reserves for relocation, reducing the risk of OOM;
  • ZStatisticsInterval: controls how often statistics are printed in the ZGC log. Previously, statistics were printed every 10 seconds, which could get in the way of reading GC details.

(ZHighUsagePercent, ZUnloadClassesFrequency, and ZRelocationReservePercent above are Dragonwell11-specific options. Avoid using them when switching to other OpenJDK builds.)

In later articles, readers will see how Alibaba Dragonwell11 solves some of the problems found in production practice by making ZGC production-ready.

About the author

Tang Hao joined the Alibaba Cloud programming languages and compilers team in 2019 and currently works on JVM memory management optimization. Dragonwell has joined the OpenAnolis community (Java Language and Virtual Machine SIG), and Anolis OS 8 supports Dragonwell cloud-native Java. You are welcome to join the SIG and take part in the community.

SIG address: openanolis.cn/SIG/Java/do…