Glory for Java programmers: R on ZGC in JDK 11

Preface

ZGC is here! Java programmers can finally be free of annoying GC pauses and GC tuning. ZGC's promise is that no matter how large a heap you configure (1288 GB? 2 TB?), JVM pauses stay below 10 ms.

In the SPECjbb 2015 benchmark, the maximum pause time with a 128 GB heap was only 1.68 ms (not the average, not the 90th or 99th percentile, but the maximum!). That is far below ZGC's conservative original target of 10 ms, and far better than its predecessor, G1.

G1 improved pause times on large heaps by collecting incrementally, reclaiming only part of the heap at a time instead of all of it, but it did not perform as well on normal-size heaps. With ZGC, none of that is a problem any more.

If the article is too long to read, just remember the following sentence:

Unlike traditional algorithms, which mark on the object, ZGC marks on the pointer and adds a load barrier (read barrier) wherever a pointer is read. If the object has been moved by the GC, the color on the pointer will be wrong, and the barrier first updates the pointer to the valid address and then returns it. In other words, at most a single object read is ever likely to be slowed down, and there is no stop-the-world pause imposed on the whole application just to keep it consistent with the GC.

In fact, C4, the flagship garbage collector of Azul's JDK, became a legend long ago with the same maximum pause of ten milliseconds. R, who "studied at Azul University", found the algorithm and results of ZGC in JDK 11 very familiar. After chatting with ZGC lead Per Liden, he confirmed that ZGC and Azul's Pauseless GC are equivalent. (R reviewed this article before the other readers saw it, thought for a long time, and settled on the word “equivalent”.)

(Photo: Per, taken by R at JVMLS)

Well, if you still have time, let's go through eight features of ZGC.

1. Almost all phases execute concurrently

By concurrent, we mean that application threads and GC threads run at the same time without stopping each other.

ZGC is not a zero-pause GC, because it still has three very short stop-the-world (STW) phases.

R: “For example, the Pause Mark Start phase scans the root set, which includes the object pointers in global variables, thread stacks, and so on, but not the object pointers inside the GC heap. So this pause does not vary with the size of the GC heap (it does vary with the number of threads, the size of the thread stacks, etc.).” That is why ZGC can confidently claim pauses below 10 ms regardless of heap size.

2. “Colored Pointer” and “Load Barrier” make concurrent execution possible

The principle was already covered in R's one-sentence summary above. Colored pointers borrow four bits from the 64-bit pointer for the Finalizable, Remapped, Marked1, and Marked0 flags. As a result, ZGC does not support 32-bit pointers or compressed pointers, and the heap is limited to 4 TB.
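To make the bit borrowing concrete, here is a minimal sketch of how such a pointer could be packed and unpacked. The class and method names are made up for illustration, and the bit positions (a 42-bit offset with the four flags directly above it) are an assumption based on the JDK 11 layout as commonly documented, so treat them as illustrative rather than authoritative.

```java
final class ColoredPointerSketch {
    // Assumed layout: the low 42 bits hold the object offset (2^42 bytes = 4 TB).
    static final int OFFSET_BITS = 42;
    static final long OFFSET_MASK = (1L << OFFSET_BITS) - 1;

    // Assumed positions of the four "color" bits, directly above the offset.
    static final long MARKED0     = 1L << 42;
    static final long MARKED1     = 1L << 43;
    static final long REMAPPED    = 1L << 44;
    static final long FINALIZABLE = 1L << 45;
    static final long COLOR_MASK  = MARKED0 | MARKED1 | REMAPPED | FINALIZABLE;

    // Combines an object offset with one color bit.
    static long color(long offset, long colorBit) {
        return (offset & OFFSET_MASK) | colorBit;
    }

    // Strips the color and returns the plain offset.
    static long offsetOf(long pointer) {
        return pointer & OFFSET_MASK;
    }

    // Returns only the color bits of a pointer.
    static long colorOf(long pointer) {
        return pointer & COLOR_MASK;
    }

    // The largest heap that 42 offset bits can address.
    static long maxHeapBytes() {
        return 1L << OFFSET_BITS;
    }

    public static void main(String[] args) {
        long pointer = color(0x1234L, MARKED0);
        System.out.println("offset = 0x" + Long.toHexString(offsetOf(pointer)));
        System.out.println("max heap in TB = " + (maxHeapBytes() >> 40));
    }
}
```

With 42 offset bits the addressable range is 2^42 bytes, which is where the 4 TB limit comes from.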

The load barrier looks at the pointer color and, depending on the current GC phase, decides whether it has to do something special (the slow path). Note in the figure that a read barrier is needed only for the first statement and not for the last three; for example, no barrier is needed when the loaded value is of a primitive type.
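A minimal sketch of the kind of four-statement example this paragraph describes. The Person class, its fields, and the describe method are hypothetical; the comments simply apply the rule stated above, that only a reference loaded from the heap needs a barrier.

```java
final class LoadBarrierPlacementSketch {
    // Hypothetical heap object with one reference field and one primitive field.
    static final class Person {
        String name = "Ada";
        int age = 36;
    }

    static String describe(Person person) {
        String n = person.name;       // reference loaded from the heap: barrier needed
        String p = n;                 // copy of a local variable: no barrier
        boolean empty = n.isEmpty();  // call through a local variable: no barrier here
        int age = person.age;         // primitive value, not a reference: no barrier
        return p + ":" + empty + ":" + age;
    }

    public static void main(String[] args) {
        System.out.println(describe(new Person()));
    }
}
```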

R also points out how ZGC's load value barrier differs from Red Hat's Shenandoah collector: Shenandoah chose the fairly basic Brooks pointer from the 1980s, while ZGC adds self-healing to the old Baker barrier. Take the following code:

Object a = obj.x;

Object b = obj.x;

Both lines get a read barrier. In ZGC, after the first read barrier not only does a hold the new address, but the value of obj.x itself has been corrected by self-healing, so the second read barrier goes straight down the fast path at no extra cost. Shenandoah does not correct the value of obj.x, so the second read barrier takes the slow path again.
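The difference can be simulated in a few lines. This is a single-threaded toy model, not HotSpot code: the class, the forwarding map, and the choice of Remapped as the only "good" color are all assumptions made for illustration, and a real collector would have to update the slot atomically. The selfHeal flag switches between the self-healing behaviour and the non-healing behaviour that the text attributes to Shenandoah.

```java
import java.util.HashMap;
import java.util.Map;

final class SelfHealingBarrierSketch {
    static final long OFFSET_MASK = (1L << 42) - 1; // low 42 bits: object offset (assumed)
    static final long REMAPPED = 1L << 44;          // the "good" color in this sketch

    final Map<Long, Long> forwarding = new HashMap<>(); // old offset -> new offset
    final boolean selfHeal;
    long fieldX;       // the slot behind obj.x, holding a colored pointer
    int slowPathHits;  // how often the barrier had to take the slow path

    SelfHealingBarrierSketch(long staleOffset, long newOffset, boolean selfHeal) {
        this.fieldX = staleOffset;   // REMAPPED bit not set: a "bad" color
        this.forwarding.put(staleOffset, newOffset);
        this.selfHeal = selfHeal;
    }

    // Models "Object a = obj.x;" together with its load barrier.
    long loadX() {
        long pointer = fieldX;
        if ((pointer & REMAPPED) != 0) {
            return pointer;          // fast path: the color is good
        }
        slowPathHits++;              // slow path: look up the new address
        long offset = pointer & OFFSET_MASK;
        long healed = forwarding.getOrDefault(offset, offset) | REMAPPED;
        if (selfHeal) {
            fieldX = healed;         // self-healing: fix the slot itself
        }
        return healed;
    }
}
```

Calling loadX() twice on an instance created with selfHeal set to true leaves slowPathHits at 1; with selfHeal set to false it ends at 2.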

3. Region-based like G1, but more flexible

ZGC divides the heap into regions, which are the unit for reclaiming memory, for moving objects, and for splitting work among parallel GC threads.

However, G1 divides the heap into fixed-size regions right at startup, while ZGC has three size groups (2 MB, 32 MB, and N × 2 MB), creates and destroys regions dynamically, and decides the size of each region as it goes.

Objects smaller than 256 KB are allocated in small pages, objects smaller than 4 MB in medium pages, and anything larger in large pages.

So ZGC handles the allocation of large objects better.
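The size rules above can be written down as a small function. This is only a sketch using the numbers from the text; the class and method names are invented, and whether the 256 KB and 4 MB limits are inclusive is an assumption (the code follows the text and treats them as exclusive).

```java
final class PageSizeSketch {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;

    static final long SMALL_PAGE = 2 * MB;
    static final long MEDIUM_PAGE = 32 * MB;
    static final long SMALL_OBJECT_LIMIT = 256 * KB;
    static final long MEDIUM_OBJECT_LIMIT = 4 * MB;

    // Names the size group an object of the given size would land in.
    static String sizeGroupFor(long objectBytes) {
        if (objectBytes < SMALL_OBJECT_LIMIT) return "small";
        if (objectBytes < MEDIUM_OBJECT_LIMIT) return "medium";
        return "large";
    }

    // Returns the size of the page that would hold the object.
    static long pageBytesFor(long objectBytes) {
        if (objectBytes < SMALL_OBJECT_LIMIT) return SMALL_PAGE;
        if (objectBytes < MEDIUM_OBJECT_LIMIT) return MEDIUM_PAGE;
        // Large page: N x 2 MB, with N rounded up so the object fits.
        long units = (objectBytes + SMALL_PAGE - 1) / SMALL_PAGE;
        return units * SMALL_PAGE;
    }

    public static void main(String[] args) {
        System.out.println(sizeGroupFor(5 * MB) + " page of " + pageBytesFor(5 * MB) / MB + " MB");
    }
}
```

In this model a 5 MB object gets a large page of 3 × 2 MB = 6 MB.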

4. Compacts like G1

CMS is mark-sweep: it marks dead objects and then reclaims them in place, which fragments memory and makes contiguous space harder and harder to find, until a Full GC is triggered.

ZGC is mark-compact: it moves all live objects out to another region and then reclaims the entire source region.

G1 is an incremental collector and compacts as well.

Below is a walk through one collection cycle, simplified by a factor of several dozen; the minor phases are skipped:

1. Pause Mark Start – initial mark, with a pause

The JVM pauses and marks the objects referenced directly from the roots: objects 1, 2, and 4 are marked live.

2. Concurrent Mark – concurrent marking

The remaining objects are marked recursively, concurrently with the application: objects 5 and 8 are also marked live.

3. Relocate – moving objects

Comparing the marks shows that objects 3, 6, and 7 are dead, so the two gray regions in the middle need to be compacted and reclaimed. Objects 4, 5, and 8 therefore have to be moved to the new region on the right. During the move, a forwarding table records where each object went.

As soon as all of its live objects have been moved out, a region can be released and reused as the to-region for the next region being evacuated. So in theory, a single empty region is enough to collect the entire heap.

Red Hat's Shenandoah, because of its forwarding-pointer design, needs half of the heap to be empty.

4. Remap – fixing up pointers

Finally, the pointers are updated to point to the new addresses. “The Remap of the previous cycle and the Mark of the next cycle are done together in one pass, which is very efficient and saves the cost of traversing the object graph twice.”
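The four steps can be traced with a toy model. Everything here is a simplification made up for illustration: objects are plain integer ids that double as their old addresses, the reference graph among objects 1 to 8 is hypothetical (the article only says which objects end up live), and all phases run sequentially in one thread.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class ZgcCycleSketch {
    // Hypothetical object graph: object id -> ids of the objects it references.
    static final Map<Integer, List<Integer>> REFS = new HashMap<>();
    static {
        REFS.put(1, Arrays.asList(5));
        REFS.put(2, Arrays.asList(8));
        REFS.put(6, Arrays.asList(7)); // garbage pointing at garbage
        for (int id : new int[] {3, 4, 5, 7, 8}) {
            REFS.put(id, Collections.<Integer>emptyList());
        }
    }
    static final List<Integer> ROOTS = Arrays.asList(1, 2, 4);

    // Steps 1 and 2: start from the roots, then trace everything reachable.
    static Set<Integer> mark() {
        Set<Integer> live = new TreeSet<>(ROOTS);
        Deque<Integer> work = new ArrayDeque<>(ROOTS);
        while (!work.isEmpty()) {
            for (int child : REFS.get(work.pop())) {
                if (live.add(child)) {
                    work.push(child);
                }
            }
        }
        return live;
    }

    // Step 3: move live objects out of the evacuated regions and record
    // old address -> new address in a forwarding table.
    static Map<Integer, Integer> relocate(Set<Integer> live, List<Integer> evacuated, int firstNewSlot) {
        Map<Integer, Integer> forwarding = new TreeMap<>();
        int next = firstNewSlot;
        for (int id : evacuated) {
            if (live.contains(id)) {
                forwarding.put(id, next++);
            }
        }
        return forwarding;
    }

    // Step 4: a reference to a moved object is redirected through the table.
    static int remap(int reference, Map<Integer, Integer> forwarding) {
        return forwarding.getOrDefault(reference, reference);
    }

    public static void main(String[] args) {
        Set<Integer> live = mark();
        Map<Integer, Integer> forwarding = relocate(live, Arrays.asList(3, 4, 5, 6, 7, 8), 100);
        System.out.println("live = " + live + ", forwarding = " + forwarding);
    }
}
```

With this graph, marking yields objects 1, 2, 4, 5, and 8 as live, and evacuating the regions that hold objects 3 to 8 produces forwarding entries only for 4, 5, and 8.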

5. No G1-style remembered set, and no write barrier

G1 keeps each GC pause from getting too long by collecting incrementally: it cleans some of the regions each time rather than all of them.

To clean one region independently, G1 needs a remembered set (RSet) that records object references between regions, so that it can work out which objects are live without scanning the whole heap. The RSet usually takes up 20% or more of the heap.

G1 also has to keep the remembered set up to date, both when the application writes a reference and when the GC moves an object, tracking references across generations and across regions. CMS only has a card table between the young and old generations, which is much lighter.

ZGC has almost no pauses, so its regions do not exist for the sake of incremental collection; every cycle covers all regions. It therefore has no need for a memory-hungry remembered set, and because it is not even generational yet, it has no write barrier at all.
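For readers who have not seen one, this is roughly the bookkeeping that a remembered set plus write barrier implies, and that ZGC avoids. It is a deliberately naive sketch, not how G1 is implemented (G1 works on cards and refines them concurrently); the class, the integer addresses, and the region size are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class RememberedSetSketch {
    static final int REGION_SIZE = 1024; // hypothetical region size in address units

    // Target region -> addresses of fields in other regions that point into it.
    final Map<Integer, Set<Integer>> rememberedSets = new HashMap<>();

    static int regionOf(int address) {
        return address / REGION_SIZE;
    }

    // Runs on every reference store: the cost a collector without a
    // write barrier does not have to pay.
    void writeBarrier(int fieldAddress, int targetAddress) {
        int from = regionOf(fieldAddress);
        int to = regionOf(targetAddress);
        if (from != to) {
            rememberedSets.computeIfAbsent(to, r -> new HashSet<>()).add(fieldAddress);
        }
    }

    // When collecting one region, these fields act as extra roots,
    // so the rest of the heap need not be scanned.
    Set<Integer> incomingReferences(int region) {
        return rememberedSets.getOrDefault(region, Collections.<Integer>emptySet());
    }
}
```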

6. NUMA support

Servers with multiple CPU sockets are now NUMA architectures. For example, on a server with two CPU sockets (24 cores) and 64 GB of memory, the 12 cores on one CPU can access their own 32 GB of local memory much faster than the other 32 GB of remote memory.

The Parallel Scavenge collector in the JDK supports NUMA, and gained 40% in the SPECjbb 2005 benchmark.

The principle: when reserving heap memory, take some from each NUMA node. When a thread allocates an object, allocate it in memory close to the CPU the thread is currently running on. As the thread continues, it will often access that object again, and as long as the thread has not been moved to another CPU, the access comes from the same CPU and is therefore fast.
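The allocation policy described above can be modelled in a few lines. This is a toy simulation under stated assumptions: the class is invented, CPUs are assumed to be numbered so that consecutive ids share a node, and falling back to a remote node when the local one is full is my own choice for the sketch, not something the article specifies.

```java
import java.util.Arrays;

final class NumaAllocationSketch {
    final long[] freeBytes;  // free heap bytes on each NUMA node
    final int cpusPerNode;

    NumaAllocationSketch(int nodes, int cpusPerNode, long heapBytes) {
        this.freeBytes = new long[nodes];
        Arrays.fill(this.freeBytes, heapBytes / nodes); // split the heap evenly
        this.cpusPerNode = cpusPerNode;
    }

    // Assumes CPUs 0..cpusPerNode-1 sit on node 0, the next batch on node 1, and so on.
    int nodeOf(int cpu) {
        return cpu / cpusPerNode;
    }

    // Allocates from the node local to the given CPU, falling back to the
    // other nodes in order. Returns the node used, or -1 if nothing fits.
    int allocate(int cpu, long bytes) {
        int local = nodeOf(cpu);
        for (int i = 0; i < freeBytes.length; i++) {
            int node = (local + i) % freeBytes.length;
            if (freeBytes[node] >= bytes) {
                freeBytes[node] -= bytes;
                return node;
            }
        }
        return -1;
    }
}
```

For the server in the example (2 nodes, 12 cores each, 64 GB), a thread running on CPU 17 would allocate from node 1 in this model.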

Unfortunately, neither CMS nor G1 supports NUMA. Now ZGC brings back basic support, hahaha.

R adds that G1 does still plan to support NUMA: http://openjdk.java.net/jeps/157

7. Parallel

According to the ZGC website, in the earlier benchmark on a 32-core server with a 128 GB heap, the configuration was:

20 ParallelGCThreads, working in parallel during the three very short STW phases – Mark Roots, Weak Root Processing (StringTable, JNI weak handles, etc.) and Relocate Roots;

4 ConcGCThreads, working concurrently with the application in the other phases – Mark, Process Reference, Relocate. Only four, in a considerate effort not to compete with the application for CPU.

Each of the ConcGCThreads starts with its own evenly assigned share of regions. A thread that finishes its share first tries to “steal” regions that the other threads have not yet processed.
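The stealing scheme in the last paragraph can be sketched as follows. This is a generic work-stealing illustration, not ZGC's actual data structures: the class and method names are invented, regions are plain integers, and "processing" a region just means recording its id.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;

final class RegionStealingSketch {
    // Splits the regions evenly over the workers, runs them, and returns
    // the set of region ids that were processed.
    static Set<Integer> processAll(int workers, int regions) {
        List<Deque<Integer>> queues = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            queues.add(new ConcurrentLinkedDeque<Integer>());
        }
        for (int r = 0; r < regions; r++) {
            queues.get(r % workers).add(r); // even initial assignment
        }

        Set<Integer> processed = ConcurrentHashMap.newKeySet();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            final int self = w;
            Thread thread = new Thread(() -> {
                Integer region;
                while ((region = next(queues, self)) != null) {
                    processed.add(region); // stand-in for the real GC work
                }
            });
            threads.add(thread);
            thread.start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return processed;
    }

    // Takes from the worker's own queue first; when that is empty,
    // steals from the tail of another worker's queue.
    static Integer next(List<Deque<Integer>> queues, int self) {
        Integer region = queues.get(self).pollFirst();
        if (region != null) {
            return region;
        }
        for (int i = 1; i < queues.size(); i++) {
            region = queues.get((self + i) % queues.size()).pollLast();
            if (region != null) {
                return region;
            }
        }
        return null; // every queue is empty: all regions are done
    }
}
```

Because no new regions are added once the workers start, a worker can stop as soon as it finds every queue empty.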

8. Single generation

Not being generational is probably ZGC's only weakness. So R places ZGC somewhere between Azul's early Pauseless GC and the generational C4, which is called GPGC (Generational Pauseless GC) in the code.

Generations were originally introduced on the assumption that most objects die young, so that the young and old generations could use different GC algorithms. But C4 is concurrent all the way through, so why does it still need generations?

R says:

“Because generational C4 can tolerate object allocation rates about 10 times higher than the original PGC.

If you run a fully concurrent collection cycle over the entire heap, it can take a long time, even a few minutes, and the objects created during that cycle are basically all treated as live, even though many of them will have died long before the cycle ends. With a generational algorithm, new objects are created in a dedicated region, collections of that region are more frequent and faster, and fewer objects are accidentally kept alive.

Per found a generational implementation too troublesome, so he first built a simpler, usable single-generation version. So if ZGC meets a very high object allocation rate, the only effective way to ‘tune’ it at this point is to increase the size of the whole GC heap and give ZGC more room to breathe.”
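A back-of-the-envelope version of R's argument, as a sketch. The model is my own simplification: it assumes every object allocated during a cycle is kept until the cycle ends, so the heap must hold the live set plus everything allocated in one cycle. The class name and the numbers in the example are hypothetical.

```java
final class AllocationHeadroomSketch {
    // Bytes allocated while one concurrent cycle runs; in this simple
    // model they all stay in the heap until the cycle ends.
    static long headroomBytes(long allocBytesPerSecond, long cycleSeconds) {
        return Math.multiplyExact(allocBytesPerSecond, cycleSeconds);
    }

    // Smallest heap that survives one cycle without running out of space.
    static long minHeapBytes(long liveBytes, long allocBytesPerSecond, long cycleSeconds) {
        return Math.addExact(liveBytes, headroomBytes(allocBytesPerSecond, cycleSeconds));
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        // Hypothetical workload: 40 GB live, 1 GB/s allocation, 2-minute cycle.
        System.out.println(minHeapBytes(40 * gb, gb, 120) / gb + " GB");
    }
}
```

In this model, a tenfold higher allocation rate needs ten times the headroom, which is why growing the heap is the only lever a single-generation collector leaves you.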

Summary

ZGC makes Java look good, it is not Java!!

Packed with R's chat records. If you don't share it, you're no fan of R!!!!

References

1. ZGC wiki:

https://wiki.openjdk.java.net/display/zgc/Main

2. R's Zhihu answer:

https://www.zhihu.com/question/287945354/answer/458761494

3. How sick is the ZGC collector? By He Zhuofan, ImportSource

Many of the pictures in this article are borrowed from it, with thanks. The link is too long to post; please search for it by title.

4. A FIRST LOOK INTO ZGC

http://dinfuehr.github.io/blog/a-first-look-into-zgc/

5. The Pauseless GC Algorithm, by Azul:

https://www.usenix.org/legacy/events/vee05/full_papers/p46-click.pdf

6. Azul's open-source C4 reference implementation, the implementation from the original paper:

https://github.com/GregBowyer/ManagedRuntimeInitiative/tree/master/MRI-J/hotspot/src/azshare/vm/