In-depth understanding of the JVM-ZGC collector
This is the 14th day of my participation in the August Text Challenge.
Preface
ZGC is a low-latency garbage collector developed by Oracle and first shipped in JDK 11. ZGC is a complicated topic with a lot of knowledge points, so I suggest making a cup of tea and reading this while you drink it.
Before the formal introduction, let’s take a look at what ZGC supports:
Keywords about ZGC are as follows:
- Concurrent
- Region-based
- Compacting (mark-compact algorithm)
- NUMA-aware (NUMA support)
- Using colored pointers
- Using load barriers
Overview
- Introduces the ZGC collector and its features (emphasis: colored pointers)
- Explains the basic working principle, workflow and steps of ZGC
- Suggests how to study ZGC in more depth (end of article)
ZGC compatibility
Supported Platforms
Platform | Supported | Since | Comment |
---|---|---|---|
Linux/x64 | YES | JDK 11 | |
Linux/AArch64 | YES | JDK 13 | |
macOS | YES | JDK 14 | |
Windows | YES | JDK 14 | Requires Windows version 1803 (Windows 10 or Windows Server 2019) or later. |
ZGC collector
Introduction
ZGC is short for Z Garbage Collector. ZGC and the Shenandoah garbage collector are both designed for low latency, and both aim to keep pause times under 10ms.
Shenandoah is arguably an extension and upgrade of the G1 garbage collector, whereas ZGC is more like a descendant of Azul's PGC (Pauseless GC) and C4 garbage collectors.
PGC and C4 are garbage collectors that run both the marking and compaction phases concurrently with user threads, but they are only available on Azul's VM (this was already implemented in 2005, which is impressive).
If I had to sum it up in one sentence: ZGC is a low-latency garbage collector based on the mark-compact algorithm, with no generations (for now), read barriers (note: no write barriers), colored pointers, and memory multi-mapping.
Features and characteristics of ZGC:
Here are the features of the ZGC garbage collector, which are quite complex and will be the focus of this article:
Mark-compact algorithm
ZGC combines mark-compact with copying: live objects are copied into an idle region, and compaction ensures that no memory fragmentation is left behind after a collection.
Region
Like the Shenandoah collector, ZGC lays out heap memory in regions, but ZGC's regions come in three sizes:
- Small: fixed capacity of 2MB; holds objects smaller than 256KB
- Medium: fixed capacity of 32MB; holds objects larger than 256KB and smaller than 4MB
- Large: each large region stores exactly one object. Despite the name, a large region is not necessarily big: it only has to fit its one object, so it can be as small as 4MB. In addition, large objects are never relocated (relocation is a ZGC processing step described in the workflow below), because copying a large object is very expensive, so ZGC forbids it.
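To make the size classes above concrete, here is a minimal Java sketch. It is not ZGC's actual code: `RegionKind` and `forObjectSize` are hypothetical names, and the handling of the exact boundary values (an object of exactly 256KB or 4MB) is an assumption, since the list above does not spell it out.

```java
/** Hypothetical helper: picks a region size class from an object's size. */
enum RegionKind {
    SMALL, MEDIUM, LARGE;

    private static final long KB = 1024L;
    private static final long MB = 1024L * KB;

    // Thresholds taken from the list above.
    static final long SMALL_OBJECT_LIMIT = 256 * KB;
    static final long MEDIUM_OBJECT_LIMIT = 4 * MB;

    static RegionKind forObjectSize(long bytes) {
        if (bytes <= 0) {
            throw new IllegalArgumentException("size must be positive: " + bytes);
        }
        if (bytes < SMALL_OBJECT_LIMIT) {
            return SMALL;   // goes into a 2MB small region
        }
        if (bytes < MEDIUM_OBJECT_LIMIT) {
            // Assumption: an object of exactly 256KB is treated as medium here.
            return MEDIUM;  // goes into a 32MB medium region
        }
        return LARGE;       // gets a region of its own and is never relocated
    }
}
```

For example, `RegionKind.forObjectSize(100 * 1024)` falls into the small class, while a 10MB array would get a large region of its own.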
Concurrent compaction algorithm
ZGC also compacts the heap concurrently, but it does so with read barriers and colored pointers, which is completely different from Shenandoah's implementation. Let's take a look at what a colored pointer is.
Colored pointers:
The core technique ZGC relies on is the colored pointer. A colored pointer is the purest form of marking: a small amount of extra information is stored directly in the pointer itself. ZGC targets the 46 bits of a 64-bit pointer that remain available for addressing on the operating system and takes the top 4 of them to store four flags, so the virtual machine can read information such as the reference's tri-color marking state directly from the pointer. Of course, squeezing flags into an address space that only had 46 bits to begin with leaves 42 bits for the address, which is why the memory ZGC can manage cannot exceed 4TB. Storing flags in the address bits also means that compressed pointers cannot be used.
For a better understanding, consider the pointer layout diagram given in the official source code. It shows that the colored pointer uses part of the address space to store the object's marking information, which also ensures that references to an object can be updated consistently after the object is moved. Bits 0 to 41, 42 bits in total, hold the normal address. This is why ZGC supports up to 4TB (theoretically 16TB) of memory: only 42 bits are used to represent addresses, which also means it cannot compress pointers and does not support 32-bit operating systems. Bits 42 to 45 are flag bits that record the state of the reference; pointers that differ only in these bits refer to the same object.
One thing to note here is the four flags M0 (Marked0), M1 (Marked1), Remapped, and Finalizable. In terms of address ranges, [0 ~ 4TB) corresponds to the Java heap, [4TB ~ 8TB) is called the M0 address space, [8TB ~ 12TB) is called the M1 address space, and [12TB ~ 16TB) is reserved and not used. Why leave it unused in JDK 11? It was kept as a reservation, which JDK 13 filled in to let ZGC support 16TB of memory.
ZGC stores object liveness information in bits 42 to 45 of the pointer, unlike traditional garbage collectors, which place it in the object header.
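The bit layout described above can be sketched with plain `long` arithmetic. This is only an illustration, not HotSpot code: the class and method names are made up, and the bit order (M0 in bit 42, M1 in bit 43, Remapped in bit 44, Finalizable in bit 45) is an assumption chosen so that it matches the address ranges listed above.

```java
/** Hypothetical model of a colored pointer: 42 address bits plus 4 flag bits. */
final class ColoredPointer {
    static final int ADDRESS_BITS = 42;
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1; // bits 0..41

    // Flag bits 42..45 (assumed order).
    static final long MARKED0 = 1L << 42;
    static final long MARKED1 = 1L << 43;
    static final long REMAPPED = 1L << 44;
    static final long FINALIZABLE = 1L << 45;

    private ColoredPointer() {}

    /** Returns the pointer with its flag bits replaced by the given flag. */
    static long color(long pointer, long flag) {
        return (pointer & ADDRESS_MASK) | flag;
    }

    /** Strips the flag bits and returns the plain 42-bit address. */
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    static boolean has(long pointer, long flag) {
        return (pointer & flag) != 0;
    }
}
```

With this layout, coloring an address with `MARKED0` adds exactly 4TB to it, which is why the M0 view of the heap occupies [4TB ~ 8TB), and coloring it with `MARKED1` adds 8TB, which gives [8TB ~ 12TB). Both colored values still decode to the same 42-bit address.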
How was object reference movement implemented in the past?
In earlier implementations, if you wanted to store extra information on an object for the garbage collector to use, you had to add extra fields to the object, typically in the object header, which holds information such as the object's age and lock state. Under normal circumstances this works smoothly, but once objects are moved, things become complicated: who does this information really belong to? There is a common misconception that this data belongs to the object itself, when in fact it belongs to the reference to the object. Imagine an object that nothing refers to: does that object have any value? Clearly it is garbage. So it is the reference, not the object, that this data is associated with.
With this in mind, HotSpot has used several designs for recording marks: some collectors store the mark in the object header (Serial), while others keep the marks in a separate data structure, a bitmap (G1, Shenandoah).
How does the colored pointer work?
Once the live objects of a Region have been moved out, the Region can be freed and reused immediately, without waiting for every reference into that Region across the heap to be corrected. In theory this means that as long as a free Region is available, ZGC can complete the collection.
This is exactly where Shenandoah falls short. With forwarding pointers, every reference has to be fixed (using CAS) before a Region can be released. In the extreme case where almost every object in the heap survives and has to be copied, Shenandoah needs about half of the heap's Regions to be free in order to complete the collection.
Why colored pointers make this possible has to do with their self-healing property.
Pointer self-healing
In short: when an access to a relocated object is caught by the memory barrier, the stale reference is repaired, using the forwarding table, so that it points to the object's new location. This process is called pointer self-healing.
What exactly is self-healing? It belongs to the concurrent relocation phase, but to reinforce the concept of colored pointers we cover it here and skip the rest of that phase for now. Colored pointers are the key to this step: from the color of a reference, ZGC can tell whether the referenced object is in the relocation set. If a user thread accesses an object in the relocation set, the access is intercepted by the pre-placed memory barrier, which immediately forwards it to the newly copied object based on the Region's forwarding table, and at the same time corrects the value of the reference so that it points directly to the new object.
This looks just like Shenandoah's forwarding pointer, but note the difference: Shenandoah uses read and write barriers plus forwarding pointers updated with CAS, whereas ZGC does this with colored pointers, forwarding tables, and a read barrier alone. The two approaches are fundamentally different.
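The self-healing behaviour described above can be simulated in a few lines of Java. This is a toy model under stated assumptions, not how HotSpot implements it: a heap slot is modelled as an `AtomicLong`, the forwarding table as a map from old address to new address, and a reference counts as "good" simply when its Remapped bit is set (in the real collector the good color changes from phase to phase). All names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Toy simulation of a load barrier that self-heals stale references. */
final class SelfHealingSketch {
    static final long ADDRESS_MASK = (1L << 42) - 1;
    static final long MARKED0 = 1L << 42;
    static final long REMAPPED = 1L << 44;

    // Old address -> new address for objects already copied out of the relocation set.
    final Map<Long, Long> forwardingTable = new ConcurrentHashMap<>();
    // Counts how often the slow path ran, to show that healing happens only once.
    final AtomicInteger slowPaths = new AtomicInteger();

    /** Simulated "read a reference from the heap" with a load barrier. */
    long loadReference(AtomicLong slot) {
        long ref = slot.get();
        if (ref == 0L || (ref & REMAPPED) != 0L) {
            return ref; // fast path: null or already a good reference
        }
        // Slow path: look up the new location and repair the slot.
        slowPaths.incrementAndGet();
        long oldAddress = ref & ADDRESS_MASK;
        long newAddress = forwardingTable.getOrDefault(oldAddress, oldAddress);
        long healed = newAddress | REMAPPED;
        // Self-heal: if the CAS fails, another thread has already updated the slot.
        slot.compareAndSet(ref, healed);
        return healed;
    }
}
```

The point of the design is visible in the counter: the first load of a stale slot pays for the forwarding-table lookup, while every later load of the same slot takes the fast path because the slot itself now holds the corrected reference.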
Virtual memory mapping technology
Note that this technique is what makes colored pointers usable in practice. Multi-mapping maps several virtual addresses to the same physical address, so that after the mapping, colored pointers that differ only in their flag bits can still be dereferenced and addressed normally.
Read barrier
G1 needs a write barrier to maintain its remembered sets so that it can handle cross-generation pointers and incremental Region collection. Shenandoah, as described earlier, uses Brooks pointers plus read and write barriers to repair old and new references to objects. ZGC does not use a write barrier at all; it relies only on a read barrier to collect garbage concurrently. How is the read barrier applied?
A read barrier is a small piece of code that the JVM inserts into the application code. It is executed when an application thread reads an object reference from the heap. Note that this code is triggered only by reading an object reference from the heap.
Examples of read barriers:
Object o = obj.FieldA;   // Reads a reference from the heap, so a barrier is needed
<Load barrier>
Object p = o;            // No barrier needed: the reference is not being read from the heap
o.doSomething();         // No barrier needed: the reference is not being read from the heap
int i = obj.FieldB;      // No barrier needed: this is not an object reference
NUMA support(JDK15)
The following is based on the official wiki's introduction to NUMA. Note the version history: the changelog at the end of this article lists improved NUMA awareness for ZGC under JDK 15, and G1 gained NUMA-aware allocation in JDK 14. ZGC's NUMA support works as follows:
When a Java thread allocates an object, the object ends up in memory local to the CPU that is executing the thread. ZGC first tries to allocate the object in the local memory of the requesting thread's current processor; only if local memory is insufficient is the object allocated from remote memory.
ZGC has NUMA support, which means that it tries to direct Java heap allocations to NUMA-local memory. This feature is enabled by default, but it is automatically disabled if the JVM detects that it is bound to a subset of the CPUs in the system. This means that we usually do not need to set this option, but it can be controlled explicitly with -XX:+UseNUMA or -XX:-UseNUMA.
When running on NUMA machines (such as multi-socket x86 machines), enabling NUMA support often results in a significant performance improvement.
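If you want to check from inside a running program which collector is active and whether a NUMA flag was passed explicitly, the standard `java.lang.management` API is enough. The sketch below is only a convenience probe; `GcFlagsProbe` is a hypothetical name, and it only sees flags given on the command line, not the default the JVM chose on its own.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical probe: reports active collectors and explicit UseNUMA flags. */
final class GcFlagsProbe {
    private GcFlagsProbe() {}

    /** Names of the garbage collector beans registered in this JVM. */
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    /** Any -XX:+UseNUMA or -XX:-UseNUMA arguments passed at startup. */
    static List<String> numaFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments().stream()
                .filter(arg -> arg.endsWith("UseNUMA"))
                .collect(Collectors.toList());
    }
}
```

Started with something like `java -XX:+UseZGC -XX:+UseNUMA ...`, the collector names should mention ZGC and `numaFlags()` should contain the flag. Keep in mind that before ZGC became production ready in JDK 15, `-XX:+UnlockExperimentalVMOptions` was also required to enable it.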
Before we introduce what NUMA is, let’s understand what SMP is:
Symmetric Multi-Processor (SMP) Architecture
A symmetric multiprocessing system contains multiple tightly coupled processors. In such a system, all CPUs share resources such as the bus, memory, and the I/O system. Put simply, its defining characteristic is that all CPUs in the computer share a single memory resource. This design was very common in the early days of the northbridge/southbridge chipset architecture. The structure diagram is as follows:
As the diagram shows, SMP is also called Uniform Memory Access (UMA), because all CPUs access the same memory (and at the same speed).
As processors advanced, memory access under the SMP architecture could not keep up with the CPUs' processing speed, so a large amount of processing capacity was "wasted". The NUMA architecture was later introduced as an improvement:
Non-uniform Memory Access (NUMA)
Wiki address: en.wikipedia.org/wiki/Non-un…
NUMA is one result of the effort to find ways of scaling up large systems effectively, given SMP's limited ability to scale. NUMA groups memory and CPUs together into nodes, and because of this structure a CPU's memory access time depends on where the memory is located, which is why it is called non-uniform memory access.
As you can see from the figure above, NUMA addresses the problem by giving each processor its own memory, avoiding the performance degradation that occurs when multiple processors try to access the same memory.
Finally, before ZGC, Parallel Scavenge, which is designed for throughput, was the only collector to support NUMA-aware allocation. G1 gained it in JDK 14, and ZGC's NUMA awareness was improved in JDK 15.
Only 64-bit systems are supported
ZGC only supports 64-bit systems: because it uses colored pointers, it divides the 64-bit virtual address space into multiple subspaces.
ZGC workflow
This article can only give you a rough idea of the workflow. To fully understand the details, see the book "Next Generation Garbage Collector ZGC Design and Implementation".
ZGC's operation can be roughly divided into the following four major phases. All four can run concurrently with user threads, with only short pauses between them: for example, the initial mark, which scans the GC Roots, works the same way as Shenandoah's initial mark.
The four important steps described in the book are: concurrent marking, concurrent prepare for relocation, concurrent relocation, and concurrent remapping.
Initial mark
Every garbage collector has this step. This phase is a pain point for the JVM: even ZGC needs an STW pause here, during which all objects directly reachable from the GC Roots are marked and recorded on the mark stack.
Concurrent marking
Starting from the objects marked from the GC Roots, the object graph is traversed. As with G1 and Shenandoah, this phase is bracketed by short pauses for the initial mark and the final mark. Note that ZGC marks pointers rather than objects: the marking phase updates the M0 and M1 (Marked0 and Marked1) flag bits in the colored pointers.
Concurrent prepare for relocation
In this phase, ZGC uses specific selection criteria to work out which Regions will be cleaned up in this collection, and groups those Regions into a relocation set. The relocation set differs from G1's collection set: ZGC does not pick out the most profitable Regions in order to collect incrementally. Instead it scans all Regions on every collection, accepting the cost of a wider scan in exchange for not having to maintain remembered sets.
The relocation set therefore only determines which live objects will be copied to other Regions, while marking always covers the whole heap. Class unloading and weak reference processing, supported since JDK 12, are also done in this phase.
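As a rough illustration of how a relocation set might be chosen, the sketch below scans every Region and picks those whose share of garbage exceeds a threshold. The article does not say what the actual selection criteria are, so the garbage-ratio threshold, the ordering by live bytes, and all class and method names here are assumptions made purely for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of relocation set selection over all regions. */
final class RelocationSetSketch {
    static final class Region {
        final String name;
        final long capacityBytes;
        final long liveBytes; // known after the whole-heap marking phase

        Region(String name, long capacityBytes, long liveBytes) {
            this.name = name;
            this.capacityBytes = capacityBytes;
            this.liveBytes = liveBytes;
        }

        /** Fraction of the region that is garbage, between 0.0 and 1.0. */
        double garbageRatio() {
            return 1.0 - (double) liveBytes / capacityBytes;
        }
    }

    private RelocationSetSketch() {}

    /** Scans every region and returns those worth evacuating (assumed criterion). */
    static List<Region> select(List<Region> allRegions, double minGarbageRatio) {
        List<Region> selected = new ArrayList<>();
        for (Region region : allRegions) {
            if (region.garbageRatio() >= minGarbageRatio) {
                selected.add(region);
            }
        }
        // Regions with the fewest live bytes come first: they are cheapest to evacuate.
        selected.sort(Comparator.comparingLong((Region r) -> r.liveBytes));
        return selected;
    }
}
```

Note that nothing in this selection needs a remembered set: the live-byte counts come from marking the full heap, which is the trade-off described above.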
Concurrent relocation
This is the core phase. It copies the live objects in the relocation set to new Regions. Concurrent relocation maintains a forwarding table for each Region in the relocation set, recording the mapping from each old object to its new copy. How access works while this is going on was covered under pointer self-healing above and is not repeated here.
Concurrent remapping
Remapping fixes up all references in the heap that still point to old objects in the relocation set. You can think of it as the step that actually repairs the object references, and in that sense it has the same goal as Shenandoah's concurrent update-references phase. ZGC, however, does not need to do this right away (thanks to pointer self-healing), so it cleverly merges the work of concurrent remapping into the concurrent marking phase of the next garbage collection cycle, which saves the cost of one extra traversal of the object graph.
Once all pointers have been fixed, the forwarding tables that record the mapping between old and new objects can be released.
One final note: the book "Understanding the JVM In Depth" goes fairly light on the details of these phases. For the details of each step, see the recommended reading and the related books.
Disadvantages of ZGC
ZGC's chief drawback is that it is not generational. Why is that a drawback, when the Region-based design already works well? Because generational collection is hard to implement. As mentioned at the start of the article, Azul implemented concurrent collection and object allocation back in 2005, and made it generational as well, but Azul did this for its own specific virtual machine, while the JDK has to consider compatibility across different operating systems. The lack of generations leads to problems such as the following:
- If a large number of small objects are allocated, ZGC accumulates floating garbage, because concurrent collection cannot keep pace with object creation
- Without generations, it cannot reclaim memory as quickly and precisely, and needs more complex algorithms to keep things under control
The official FAQ
Why is it called ZGC?
It doesn't stand for anything; ZGC is just a name. It was originally inspired by, or a homage to, ZFS (the file system), which was revolutionary in many ways when it first appeared. Originally, ZFS was an acronym for "Zettabyte File System", but that meaning was later dropped and it is now said to stand for nothing. It's just a name. For more details, see Jeff Bonwick's blog.
Changelog:
If you don't know which version added a feature, you can check the Main-ChangeLog:
JDK 16
- Concurrent Thread Stack Scanning (JEP 376)
- Support for in-place relocation
- Performance improvements (allocation/initialization of forwarding tables, etc)
JDK 15
- Production ready (JEP 377)
- Improved NUMA awareness
- Improved allocation concurrency
- Support for Class Data Sharing (CDS)
- Support for placing the heap on NVRAM
- Support for compressed class pointers
- Support for incremental uncommit
- Fixed support for transparent huge pages
- Additional JFR events
JDK 14
- macOS support (JEP 364)
- Windows support (JEP 365)
- Support for tiny/small heaps (down to 8M)
- Support for JFR leak profiler
- Support for limited and discontiguous address space
- Parallel pre-touch (when using -XX:+AlwaysPreTouch)
- Performance improvements (clone intrinsic, etc)
- Stability improvements
JDK 13
- Increased max heap size from 4TB to 16TB
- Support for uncommitting unused memory (JEP 351)
- Support for -XX:SoftMaxHeapSize
- Support for the Linux/AArch64 platform
- Reduced Time-To-Safepoint
JDK 12
- Support for concurrent class unloading
- Further pause time reductions
JDK 11
- Initial version of ZGC
- Does not support class unloading (using -XX:+ClassUnloading has no effect)
Recommended reading:
Here is a collection of articles by several experts; this article also draws on and quotes from them:
- Exploration and practice of new generation garbage collector ZGC
- Meituan interviewer asked me: what does ZGC stand for
- R大's JVM learning path: learning JVM implementations from the surface to the inside
- R大's blog
Conclusion
As you can see, ZGC is more complex than Shenandoah, with more material that takes more time to digest. Still, if you do not dive into the source code, the principles are fairly easy to understand. A lot of operating system knowledge is involved; if you find the article hard going, it is worth shoring up the fundamentals with CSAPP.
Don't be discouraged: if you are familiar with the principles and implementation of ZGC, you will probably have no trouble discussing it with an interviewer, because not everyone can discuss the JVM at the level of its source code.
Many parts of this article draw on other blogs, and some content is quoted directly. Reading how others understand the topic and then going back to the book yourself gives you a different perspective. Learning basically works this way: continually consulting references and thinking things through is how you grow.
Final words
Another long article. I don't know how many people will make it to the end.
My coverage of the workflow is still a bit rough, so I will write it up again once I have studied ZGC further.