In-depth understanding of JVM-Shenadoah

This is the 13th day of my participation in the August More Text Challenge. For details, see:August is more challenging

The cow was very tired. The cow could not write any more

preface

The collector of ZGC and Shenadoah is facing the future, which is still in the stage of continuous improvement. Although we may not use it very much in ordinary times, it is necessary to understand and master it. There is very little content about this piece on the Internet, so I still use the contents in books for summary.

The other two garbage collectors completely discard the concept of generational collection, note that completely discard, not like the G1 collector which uses partitioning but is essentially generational collection.

Since these two collectors cover a lot of content, they are covered separately here, and the Shenadoah collector is covered in this section.

Mind mapping:

If you don’t want to read the text, check out the mind map: www.mubucm.com/doc/7L4W-FA…

An overview of the

Low latency garbage collector

Before the formal introduction, it is necessary to explain the whole background, the modern garbage collector mainly considers the following three conditions: Memory usage, throughput, latency. From the previous collector introduction, we know that although the mainstream G1 collector realizes concurrency in the marking phase, it still needs to stop world in stages during the initial marking and filtering reclamation phase. This garbage collector does not achieve real concurrency. And because of the design limitations of partition +region generation, there will inevitably be a pause in garbage collection. So the main goal of the garbage collector in the future will be to move towards very low latency, which is to strive for full concurrency between the user thread and the garbage collector thread.

It’s worth noting that even though the new low latency abandoned the concept of a generational garbage collector, but G1 Region partition and retained garbage pause model, we can see almost all of the garbage collector are based on the predecessors’ efforts to improve, so don’t need to be very fear difficult or content is totally subversive ideas.

Details of g1 and CMS implementation concurrency

The previous article mentioned incremental updates and raw snapshots, CMS uses incremental updates, G1 uses raw snapshots, and CMS uses a mark-clean algorithm, which is not free from memory fragmentation, while G1 uses mark-clean, but it still needs to be paused, so this is a very tricky issue.

Shenadoah

Introduction to the

The first garbage collector not officially developed by the JDK, the collector was developed by redhat and later donated to the eclipse foundation, where it is maintained and managed. While Shenadoah has a lot to improve on in terms of design details, it is well positioned to be used as a stand-alone garbage collector.

Shenadoah exists only in openJDK and cannot be deployed on Oracle JDK, but this garbage collector is still worth learning.

The characteristics of

Here are the features of Shenadoah:

Region

Similar to the design principle of G1 collector, the collector uses regions for partitioning and has the same concept of large objects. The default strategy is to reclaim the most valuable region based on the algorithm.

There is no generational and connection matrix

In other words, there is no new generation and no old generation. How do you design it? Shenandoah’s solution is to maintain region references using a independently constructed global data structure called “join matrix.” Don’t be put off by the term “join matrix.” It’s essentially a two-dimensional array structure. A mark is placed on the corresponding N rows and M columns, which means that all global object references are maintained through this table, which also means that the join matrix expands as the number of objects grows.

The G1 collector is designed to use partitions instead of fixed generations, but partitions are essentially generational, only you can decide which generation you belong to.

Here is a diagram directly from the book to show the design of the connection matrix:

Have to say is this connection matrix is designed many things, although very convenient but maintain a matrix as the grow in quantity of objects can present a table grows exponentially, the perspective is a questionable design, it will be mentioned in subsequent ZGC garbage collector is introduced, ZGC found connection matrix of the problem, Some improvements have been made to solve the problem of table inflation.

Supports concurrent collection and collation

Support for concurrent collection and collation, allowing the markup and collation phases to be executed completely concurrently with user threads.

Algorithm details

So how does the collector do all of this? Before we go into the workflow, let’s talk about the implementation details of the algorithm.

brooks pointer

For historical reasons, here is the meaning of this value: forward pointer. What is the forward pointer? It is used to solve the object movement and user program concurrent one kind of solution.

How Brooks Pointer works: When the object is moved, Brooks Pointer will point to the address of the new reference, so that the object pointing to the old reference can fix the reference to the new object. This structure is similar in form to the JVM handle positioning. Both use an indirect form of access, the difference being that the forward pointer is scattered within the object header. (We discussed earlier that object headers are dynamically extended formats.)

This design form is also somewhat similar to the linked list design form.

Supplement: How did you resolve object references before?

It uses a way of setting protection trap + exception handling on the original object memory. Once the behavior of accessing the old object appears, it will enter the protection trap and enter the exception handler to repair the code logic and reference. This approach seems to be very effective, but without operating system support, it requires constant switching from user mode to kernel mode, requires more context switching resources, and is a very costly performance compromise.

Disadvantages:

Although the forward pointer is optimized to a single line assembly instruction, it still consumes the efficiency of object access, which is undoubtedly better than the memory trap.

Concurrency problem:

The design of the forward pointer means that it must have concurrent problems. If concurrent operations occur, you need to ensure that the write operation must be under the newly copied object. Consider the following questions:

  1. The collector thread makes a new copy of the object
  2. The user thread updates a field of the object
  3. The collector thread updates the reference value of the forward pointer to the new copy address.

If these three problems are not prevented, object changes in user threads will all operate on old objects, so the pointer access must be synchronized. The solution is similar to how references are allocated to objects and is also an operational mechanism that uses CAS+ update failure retry.

Finally, it’s important to note that Shenadoah has to maintain the Brooks Pointer using a read/write barrier (concurrency issues require constant synchronization), which is very costly. Let’s move on to the question of readwritebarriers.

Reading and writing barrier

Shenandoah uses not only write barriers but also read barriers. Read barriers are another aspect of AOP similar to object reference operations. We all know that there are more read operations than write operations on objects, so the cost of using read barriers is much higher.

The concept of write barriers can be seen in the previous article: Understanding the JVM-Hotspot algorithm details # Write barriers

Shenandoah developers are aware of the problem, of course, in JDK13 version, based on “reference to access barrier” is to solve the problem of reading barriers, “reference to access barrier” refers to intercept only object is similar to a reference type data access barrier to intercept, so you can save some primitive types the concurrent modification operations, Reduce the large read barrier maintenance overhead.

It also shows the Redhat development team’s lack of experience in designing a JVM garbage collector, but can be tuned in time to fix the problem.

Working process:

Shenandoah’s work can be divided into nine steps, and the latest version of Shenandoah adds three steps before the initial markup step, which can be simply understood as Minor GC operations in generational collections.

Here are the steps:

  1. Initial markup: As in G1, all GC ROOT associated objects are marked first. Note that there is a pause at this stage.
  2. Concurrency marking: As in G1, the object graph is traversed against GC ROOT, marking all reachable objects, and this phase is concurrent with the user thread.
  3. Final marking: As in G1, the rest of the object scanning is handled, and the Region with the highest value to recycle is calculated. There is a short pause in the final marking phase.
  4. Concurrent cleanup: This phase is used to clean up a Region where there are no living objects. This phase is executed concurrently.
  5. Concurrent collection: The core difference, at this stage, is that the remaining objects in the collection are copied to other unused regions, but note that this operation is not synchronous but concurrent with user threads, again concurrent, not the alternate pause and run way G1 works. Note that the implementation is the same as the “Brooks Pointer” mentioned earlier, using the forward Pointer operation + CAS lock to restore the old object reference to the new object. This phase is also the biggest difference from G1, as garbage collection and concurrent operations by user threads are implemented.
  6. Initial reference update: After the object is copied in the concurrent reclamation phase, all updates in the heap that point to the old object need to be copied to the new address. This operation is also called reference update. There is also a very short pause.
  7. ** Concurrent reference updates: ** The actual operation of referencing updates begins, depending on the number of references. It was executed concurrently, of course
  8. Final reference update: After resolving reference updates in the heap, the GC ROOT reference is corrected. This stage is the last pause. The pause time depends on the number of GC Roots.
  9. Concurrent cleanup: Finally reclaim an empty Region without any objects

The three stages of concurrent markup, concurrent reclaim, and concurrent reference update are the most important.

The following image was taken from the official wiki:

Init Mark: Indicates the initial Mark

Final Mark: Final Mark

Init-ur: initial reference update

Final-ur: Final reference update

Results contrast

The officials have a comparison chart showing Shenandoah’s garbage collection time, which shows the nearly latency-free garbage collection:

conclusion

Shenadoah collector is not officially developed by JDK, but unfortunately, due to business competition, it only exists in OpenJDK and has not been commercialized. Besides, due to further development of ZGC, the role of Shenadoah is gradually decreasing. But admittedly, as a collector developed by developers with no experience in JVM garbage collector development, this collector meets the requirements and is worth learning from.

In addition, you can see that modern garbage collectors are very complex even when they are simplified. Since most developers are still using garbage collectors like JDK8 and G1, these garbage collectors are future-oriented for now, but there is no doubt that we need to learn more.

Other information:

Shenadoah garbage collector website introduction

Shenandoah JVM parameter of the collector case: Java – XX: XX: + UseShenandoahGC – ShenandoahGCHeuristics = passive – Xlog: gc

Write in the last

This book is a superficial description of the garbage collector, but it is enough for us to understand the collector. If you want to learn more, I suggest you start with the official WIKI provided above, since the people who developed it are the ones who know the most about it.