1 preliminaries

The Epsilon GC garbage collector, ZGC, has been introduced to OpenJDK11. Shenandoah GC is a combination of OpenJDK11 and OpenJDK12 projects, named JEP333 and JEP189 respectively, based on G1. These two garbage collectors, which are currently under development, are the main performance comparisons with CMS and G1, two earlier concurrent collectors. Then I hit on a theme, which was comparing the performance of two garbage collectors. What is the comparison? What performance are you comparing? Can one garbage collector crush another garbage collector in every aspect of the performance dimension? I have my own opinions on this question. I have been thinking, in terms of JVM, teacher zhou zhiming’s masterpiece “in-depth understanding of Java virtual machine” third edition was completed in 2019, in which the principle analysis of ZGC and ShenandoahGC for low latency garbage collector has been done in detail, whether there is a need to write more articles to make some explanations. But then I thought, even though there is so much to be said about the JVM garbage collector, it would be a waste of intellectual value not to share it if even a little bit of my own opinion is valuable. So after some hesitation, I decided to share my understanding of THE JVM GC.

So what’s the scale of the impossible triangle we’re talking about with THE JVM garbage collector? As we all know, we can express the relation between JDK, JRE, and JVM as follows:



What we need to discuss in this article is just one part of the Java HotSpot Client and Server VM: the garbage collector

What are the most important metrics to describe a garbage collector?

According to the description of Teacher Zhou Zhiming in The third edition of In-depth Understanding of Java Virtual Machine in 2019, they are respectively —

A. Memory usage in the Heap range (size of off-heap auxiliary memory space) b. Throughput without GC (effect of read/write barriers) c. Delayed pauses (STW duration), these three indicators form an impossible triangle similar to CAP theory of transactions.



All of the dimensions we use to describe garbage collectors are meant to resolve the conflict between these three things, and it’s simply a matter of who to choose, who to sacrifice, or who to give more to and who to give less to.

When we talk about garbage collectors, we usually talk about the following:



The following three groups of classification: memory model classification, recycling algorithm by execution process classification and recycling algorithm by algorithm concept classification, one by one to do a simple explanation, this paper is limited by space, the core focus is to classify according to memory model.

According to the Java Virtual Machine specification, the memory managed by the Java Virtual machine is divided into the following five areas.

Strictly speaking, the following diagram cannot be called a memory model, but should be called a partition of memory:



The JVM garbage collector mainly manages the heap region within these five memory regions, and the memory model we are talking about is the MEMORY model of the JVM heap region.

For JVMS, I’ve concluded that the types of heap memory space fall into four broad categories:

A. Generation model B. Generation partition model C. Generation partition model D. Hierarchical partition model

Citing the explanation of memory model given by Our teacher Feng Zhiming, this explanation is very simple and simple, so it is a pity not to share it with you:

The concept of the memory model (also known as the memory consistency model) comes from hardware. Is a concept constructed to solve the problem of CPU cache and memory consistency. Since there are read and write operations, there are four consistency scenarios: LoadLoad, LoadStore, StoreLoad, and StoreStore. Each CPU has its own memory model. There are some strong consistency models, like x86, that only have consistency problems in StoreLoad scenarios. Some weak consistency models, such as ARMv7, have consistency problems in all four scenarios.

The weaker the consistency model, the simpler the CPU architecture and the higher the performance. But the demands on software developers are getting higher. Linux provides three assembly primitives to handle different conformance scenarios: write barrier, read barrier, and full barrier. In order to do a good job of data synchronization operations under different CPU and different memory models. The JVM is both software and virtual machine (virtual CPU), so the JVM must also have its own memory model (JMM). This memory model also covers all cpus, so the JMM is more general and opaque.

So the old JMM defined eight operations (obsolete concepts) read, Load, Store, write, and so on. Working memory can easily be mistaken for a part of memory, but it actually refers to the CPU cache. The JMM model is divided into four scenarios, but if you look at the JVM source code for x86, you’ll see that only StoreLoad() has the actual barrier code. The other three barrier functions are empty. In summary, the JMM is a universal memory model, and the JVM executes code according to the CPU’s memory model.

2. Generation model

Serial collector, ParNew collector, Parallel Scavenge collector, Serial Old collector, Parallel Old collector, CMS collector. Heap memory is divided into new generation and old generation, which are physically isolated.

The generation model is now known as the generation collection model, also known as the David Ungar heap memory model.



The designer of this memory model and the inventor of the garbage collection algorithm is an American engineer named David Ungar. The earliest description of generational recycling appears in David Ungar’s 1984 paper Generation Cycles: A non-disruptive High Performance Storage Reclamation Algorithm.

Why does generational recycling happen? The main reason is that a large number of objects in memory can become unreachable in a short period of time. A lot of objects don’t work after they’ve been used a few times and it’s a rule of thumb, but it applies to most situations. There are three hypotheses that form the basis of generational collection in Teacher Zhou Zhiming’s book, which are as follows:

1) Weak generation hypothesis: the vast majority of objects are ephemeral. 2) Strong generation hypothesis: the more garbage collections an object passes, the less likely it is to die. 3) Cross-generation citation hypothesis: Cross-generation citation is very rare compared with same-generation citation.

Based on these three hypotheses, the new generation GC only treats newly generated objects as objects, which reduces the time consumed by garbage collection within the same Heap range by reducing the scope of objects. Old-age GC is performed for old-age objects that are harder to garbage. The STW of the old GC is longer than that of the new generation GC but not longer than the length of the fullGC if it is executed without generation.

Assuming that the GC throughput efficiency of this algorithm is 4 times higher than the GC throughput efficiency of this algorithm and the maximum STW interval time is unchanged, it is assumed that the GC throughput efficiency of this algorithm is the same as the GC throughput efficiency of this algorithm and the maximum STW interval time is unchanged.

It is important to note that this increase in GC throughput efficiency is probabilistic. If the object generated by the service happens to live for a long time, the GC throughput efficiency will actually decrease, because generational collection will do multiple YGCs on top of FGC, even without considering THE YGC factor. It can also reduce throughput efficiency due to write barriers created by generation.

It should also be mentioned that for hypothesis 3: This class Remembered Set is called Card Marking in the JVM. The purpose of this class Remembered Set is to record references from older objects to younger objects. Prevents YGC from traversing older objects to see if they have references to younger objects. As we can see, if hypothesis 3 is not true, then you can imagine a scenario where YGC not only traverses the new generation objects but also traverses the objects in Remembered Set that are of the same size as the old ones. GC throughput must be very low. Also, the JVM wastes a lot of memory for Remembered Set, and the overall memory utilization of the JVM is not high.

Card Marking method is published by Paul Wilson and Thomas Moher in ACM SIGPLAN in 1989. In this method, the old chronological space is firstly divided according to equal sizes, and each space divided is called Card. When an object in an old age refers to an object of the new generation, the card corresponding to the old age will write a pointer to the object of the new generation. In this way, when the object of the new generation is referenced by the card in the card table, it will not be collected.

Speaking of Remembered Set and Card Marking, there is another concept called write barriers, and what write barriers do is describe some logic that needs to be added to remember Set to identify changes in this reference. In vernacular, this is a specific operation performed before or after a write. Generational collectors or garbage collectors with generational concepts have the concept of write barriers. In the JVM, write barriers are implemented in Dirty Card mode.

3. Generation and partition model

The generation and partitioning model is represented by the Garbage First(G1) Garbage collector. G1 breaks the previous pattern of fixed collection scope in the new generation or the old age, and G1 divides the Java heap space into several regions of the same size, namely regions. It includes Eden, Survivor, Old, and Humongous. Humongous is a special Old type that places large objects. This partitioning means that there is no need for a contiguous memory space management object. G1 divides the space into multiple zones, giving priority to those with the most garbage collection. The G1 uses good space integration capabilities without creating a lot of space debris. The value of a Region is a power of 2 between 1M and 32M bytes, and the JVM tries to divide about 2048 regions of the same size. One of the big advantages of the G1 is its predictable pause times, and the ability to complete garbage collection tasks as quickly as possible in a predictable time frame. In JDK11, G1 has been set as the default garbage collector.

G1, as a garbage collector originally conceived in 2004, was finally released in April 2012 with JDK7 update4. We can think of G1 as a garbage collector between generation model and partition model, which plays an important role in the history of garbage collector.

The reason why G1 garbage collector adopts the method of generation and partition is derived from the concept of G1 garbage collector, which pursues the balance between throughput rate and maximum STW time. Unlike pure generational garbage collector, which pursues the ultimate efficiency of garbage collection, and unlike the later garbage collector, which pursues the ultimate maximum STW time, Shenanoah and ZGC both pursue maximum STW times below 10ms.

In contrast, the G1 garbage collector makes a big compromise on memory space utilization, and the G1 garbage collector’s RememberedSet is relatively complex, with a size of about 20% of the heap space.

In addition to the concurrent write blocking that all generational garbage collectors have in the G1 garbage collector due to the Remembered Set mechanism, there is also a SATB based write block for concurrent marking. The G1 garbage collector will still encounter the FGC, which will still use the parallel tag scavenging algorithm, and the STW maximum interval is no better than the older generational collector.

4 Partition Model

The partition model is represented by Shenandoah.

Shenandoah garbage collector has been added to the low-latency garbage collector in The third edition of Zhou Zhming’s 2019 new book “Deep Understanding of Java Virtual Machine”, and its corresponding generation garbage collector and G1 garbage collector are classified as classic garbage collectors. Shenandoah’s origins date back to prior to 2014, when it was started by Red Hat with the goal of taking advantage of modern multicore cpus to reduce the downtime of garbage collection in a heap. Shenandoah has many similarities with G1. For example, Shenandoah has a region-based memory layout, Humongous Region for storing large objects, and the default reclamation policy prioritises the Region with the largest reclamation value. Shenandoah uses the Connection Matrix to record cross-region references, replacing G1’s Remembered Set for lower memory and computation costs. And, in contrast to the G1, Shenandoah’s memory model is generational.

Shenandoah collector memory model is shown as follows:



Shenanoah garbage collector also has components like card tables and Remembered Set to occupy memory space. In the Shenanoah garbage collector, this component, called the Connection Matrix, looks a lot like a two-dimensional card table that keeps track of cross-region references. Compared with Remembered Set, the cross-region reference Pointers for objects are relatively easy to maintain.

As shown below:



The join matrix can be simply understood as a two-dimensional table. If region N has objects pointing to region M, a mark will be placed in row N and column M of the table. If the object Baz in region 5 refers to the object Foo in region 3, and the object Foo in region 3 refers to the object Bar in region 1. So row 5, column 3 in the join matrix, row 3, column 1 in the join matrix should be marked. This join matrix allows you to figure out which regions have cross-generation references during garbage collection.

There is a performance test for several garbage collectors in the third edition of “Deep Understanding Java Virtual Machine” by Zhou Zhiming:



In addition to the illustrations in The third edition of “In-depth Understanding of Java Virtual Machines” by Zhou Zhiming, two more papers in 2016 “Shenandoah: An open-source Concurrent Compacting garbage Collector for OpenJDK

5. Hierarchical partition model

The layered partitioning model is represented by ZGC.

The goals of ZGC and Shenanoah are highly similar in that both want to achieve low latency that can limit garbage collection pauses to less than 10ms at any heap memory size, while minimizing the impact on GC throughput per unit time.

ZGC is a garbage collector that implements a concurrent mark-compression algorithm based on variable Region memory layout using techniques such as read barriers, dye Pointers, and memory multiple mapping.

Compared with Shenandoah, ZGC is a more advanced garbage collector in concept, which is mainly reflected in the many-to-one mapping and isolation of virtual memory space and real memory space, and the dyeing work is reflected in the pointer, rather than the object to which the pointer points. That is to say, from the beginning of ZGC, we no longer need to invade objects in the Heap range through the garbage collector to pollute the objects in the Heap to meet the needs of garbage collection. Instead, we complete the classification of objects in the virtual space through the coloring pointer and virtual space mapping. Starting with the ZGC, garbage collection no longer requires object memory space in the Heap to help with garbage collection. In trisolaran terms, ZGC’s transition from a model that treated the Heap range as a two-dimensional space to a three-dimensional space is a conceptual leap forward and an important platform on which GC can move toward stabilization of the memory model and standardization of the GC interface.



It is worth mentioning that the concept of ZGC and the 2011 paper C4: The Continuously Concurrent Compacting Collector is highly consistent, C4 is a low latency garbage Collector running in Azul’s paid JVM ZingVM. It’s worth noting that C4 describes itself in the paper as”C4 is a generational, continuously concurrent compacting collec- tor algorithm, it expands on the Pauseless GC algorithm [7] by in- cluding a generational form of a self healing Loaded Value Bar- rier (LVB), supporting simultaneous concurrent generational col- lection, as well as an enhanced heap management scheme that re- duces worst case space waste across the board“, it can be expected that in the future, ZGC will also support algorithms similar to generational collection. It is uncertain whether this generation is also similar to dyeing Pointers that continue to be implemented in the virtual memory layer.

The biggest difference between ZGC and Shenanoah is the realization of dyeing pointer + multiple mapping. ZGC writes marker information directly to the pointer of a reference object by staining the pointer. ZGC “marks references by traversing the reference graph”.



In the JEP333 documentation, the authors devote a great deal of time to describing the dye pointer:

A core design principle/choice in ZGC is the use of load barriers in combination with colored object pointers (i.e., colored oops). This is what enables ZGC to do concurrent operations, such as object relocation, while Java application threads are running.

From a Java thread’s perspective, the act of loading a reference field in a Java object is subject to a load barrier. In addition to an object address, a colored object pointer contains information used by the load barrier to determine if some action needs to be taken before allowing a Java thread to use the pointer.

For example, the object might have been relocated, in which case the load barrier will detect the situation and take appropriate action. Compared to alternative techniques, we believe the colored-pointers scheme offers some very attractive properties. In particular: It allows us to reclaim and reuse memory during the relocation/compaction phase, before pointers pointing into the reclaimed/reused regions have been fixed. This helps keep the general heap overhead down. It also means that there is no need to implement a separate mark-compact algorithm to handle a full GC.

It allows us to have relatively few and simple GC barriers. This helps keep the runtime overhead down. It also means that it’s easier to implement, optimize and maintain the GC barrier code in our interpreter and JIT compilers. We currently store marking and relocation related information in the colored pointers. However, the versatile nature of this scheme allows us to store any type of information (as long as we can fit it into the pointer) and let the load barrier take any action it wants to based on that information.

We believe this will lay the foundation for many future features. To pick one example, in a heterogeneous memory environment, this could be used to track heap access patterns to guide GC relocation decisions to move rarely used objects to cold storage.

Multiple mapping is the carrier for ZGC to implement dyeing Pointers:



The following data is taken from the Performance comparison provided in JEP333:

ZGC does not provide Benchmark with NO GC activity data as Shenanoah does, but it can be inferred that ZGC only has read barrier but no write barrier, so the data should be better than Shenanoah. Given that Shenanoah’s data are comparable to G1’s, it can be concluded that ZGC is superior or at least equal to G1’s.

Performance

Regular performance measurements have been done using SPECjbb® 2015[1]. both from a throughput and latency point of view. Below are typical benchmark scores (in percent, Normalized against ZGC’s max-jops), comparing ZGC and G1, in composite mode using a 128G heap. (Higher is better)

Below are typical GC pause times from the same benchmark. ZGC manages to stay well below the 10ms goal. Note that exact numbers can vary (both up and down, but not significantly) depending on the exact machine and setup used.

(Lower is better)

Ad-hoc performance measurements have also been done on various other SPEC®benchmarks and internal workloads.In general, ZGC manages to maintain single-digit millisecond pause times.[1] SPECjbb® 2015 is a registered trademark of the Standard Performance Evaluation Corporation (spec.org). The actual results are not represented as compliant because the SUT may Not meet SPEC’s requirements for general availability.

6. Summary

First, make a summary of the contents of this article by using a table:



As stated at the beginning of this article, GC can be understood and classified from many different perspectives, and any book describing GC must have its own train of thought and classification. No classification can be said to be completely scientific. This article first classiifies GC garbage collectors from the perspective of the Heap memory model. This article does not describe each garbage collector comprehensively, but only from the perspective of the Heap memory model. The reason is that the following text will continue to describe the various garbage collectors currently supported by Hotspot JVM from other perspectives. Hopefully, I’ll eventually put the pieces together so that readers, including myself, can see the full picture of each garbage collector.

This article also emphasizes that the purpose of the garbage collector from any point of view is to:

  • A. Memory footprint of the Heap range (size of the off-heap auxiliary memory space)
  • B. Throughput without GC (read/write barrier effect)
  • C. Delayed pauses (STW duration) rebalancing these three things

In the case of hardware conditions, there is no need for full Heap utilization, high program throughput, and low latency pause at the same time, but in the right hardware environment, using the right GC garbage collector, to achieve a better balance, this is also my own use of garbage collector, And try to take the time to learn a little bit about the purpose of the garbage collector principle.

【 Bibliography 】

  1. C4: The Continuously Concurrent Compacting Collector Gil Tene,Balaji Iyengar, Michael Wolf
  2. Shenandoah An Open-source Concurrent Compacting Garbage Collector for OpenJDK Christine H. Flood,Roman Kennke,Andrew Dinn ,Andrew Haley, Roland Westrelin
  3. Generation Cycles: A Non-Disruptive High Performance Storage Reclamation Algorithm by David Ungar
  4. Virtual Machines by James Smith, Ravi Nair
  5. JEP 333: ZGC: A Scalable Low-Latency Garbage Collector by Liden, Stefan Karlsson
  6. JEP 189: Shenandoah: A Low-pause-time Garbage Collector Christine H. Flood, Roman Kennke
  7. Understanding the Java Virtual Machine in Depth by Zhiming Zhou
  8. Algorithms and Implementation of Garbage Recycling. Nakamura, Shanyo
  9. ZGC Design and Implementation peng Chenghan