Introduction to ZGC
The original
ZGC is a new garbage collector recently open-sourced by Oracle for the OpenJDK. It was mainly written by Per Liden.
ZGC is similar to Shenandoah or Azul's C4 in that it focuses on reducing pause times while still compacting the heap. I won't give a full introduction here, but "compacting the heap" just means moving the still-alive objects to the start (or some other region) of the heap. This helps to reduce fragmentation, but usually it also means that the whole application (including all of its threads) needs to be halted while the GC does its magic; this is usually referred to as stopping the world. Only when the GC is finished can the application be resumed. In GC literature the application is often called the mutator, since from the GC's point of view the application mutates the heap. Depending on the size of the heap, such a pause could take several seconds, which could be quite problematic for interactive applications.
The translation
ZGC is a new garbage collector that has just been officially distributed as open source by Oracle as part of the OpenJDK and was developed primarily by Per Liden.
ZGC is similar to Shenandoah and Azul's C4 collector in that its main feature is reducing GC pause times while still being able to compact the heap, although we can't cover everything here. "Compacting the heap" simply means moving objects that are still alive to the start (or some other region) of the heap. This greatly reduces memory fragmentation, but usually comes at the cost of stopping the entire application (including all of its threads), which is commonly referred to as a stop-the-world (STW) pause; the application threads are resumed only once the GC is finished. In GC literature the application threads are often called "mutator" threads, to distinguish them from GC threads: from the collector's point of view, it is the application that mutates the layout of objects in the heap. Depending on the size of the heap, such a pause can take several seconds, which is a big problem for user-facing interactive applications.
ZGC Technical Options
The original
There are several ways to reduce pause times:
- The GC can employ multiple threads while compacting (parallel compaction).
- Compaction work can also be split across multiple pauses (incremental compaction).
- Compact the heap concurrently to the running application without stopping it (or just for a short time) (concurrent compaction).
- No compaction of the heap at all (an approach taken by e.g. Go’s GC).
ZGC uses concurrent compaction to keep pauses to a minimum. This is certainly not obvious to implement, so I want to describe how it works. Why is this complicated?
- You need to copy an object to another memory address, at the same time another thread could read from or write into the old object.
- If copying succeeded there might still be arbitrary many references somewhere in the heap to the old object address that need to be updated to the new address.
I should also mention that although concurrent compaction seems to be the best solution to reduce pause time of the alternatives given above, there are definitely some tradeoffs involved. So if you don’t care about pause times, you might be better off using a GC that focuses on throughput instead.
The translation
There are several important ways to reduce pause times.
- The GC can use multiple threads during the compaction phase (parallel compaction).
- The compaction work can also be split across several small pauses (incremental compaction).
- The heap can be compacted concurrently with the running application, with no pause or only very short ones (concurrent compaction).
- Skip compaction entirely and accept fragmentation (the approach taken by e.g. Go's GC).
ZGC uses concurrent compaction to minimize application pause times. This is anything but obvious to implement, so let's look at how it works. Why is it so complicated?
- An object must be copied to another memory address while, at the same time, another application thread may read from or write to the old copy of the object.
- Even if the copy succeeds, there may still be arbitrarily many references somewhere in the heap pointing to the old address, all of which need to be updated to the new address.
While "concurrent compaction" seems to be the best of the alternatives above for reducing pause times, it definitely involves trade-offs. So if you are not concerned with GC pause times or user-perceived latency, you may be better off using a throughput-focused GC.
GC Barriers
The original
The key to understanding how ZGC does concurrent compaction is the load barrier (often called a read barrier in GC literature). Although I have a separate section about ZGC's load barrier, I want to give a short overview, since not all readers might be familiar with them. If a GC has load barriers, the GC needs to do some additional action when reading a reference from the heap. In Java this basically happens every time you see code like obj.field. A GC could also need a write/store barrier for operations like obj.field = value. Both operations are special since they read from or write into the heap. The names are a bit confusing, but GC barriers are different from the memory barriers used in CPUs or compilers.
Both reading and writing the heap are extremely common, so both kinds of GC barriers need to be super efficient: just a few assembly instructions in the common case. Reads are an order of magnitude more frequent than writes (although this can certainly vary depending on the application), so read barriers are even more performance-sensitive. Generational GCs, for example, usually get by with just a write barrier; ZGC needs a read barrier but no write barrier. For concurrent compaction I haven't seen a solution without read barriers.
Another factor to consider: even if a GC needs some type of barrier, it might "only" be required when reading or writing references in the heap. Reading or writing primitives like int or double might not require the barrier.
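To make concrete where such barriers apply, here is a small illustrative Java snippet. The barrier itself is inserted by the JIT compiler, not written by hand; the comments only mark where a load-barrier-based GC such as ZGC would conceptually act:

```java
// Illustrative only: nothing here is ZGC API; the JIT emits the barrier code.
class Node {
    Node next;   // reference field
    int value;   // primitive field
}

class BarrierPlacement {
    static int sum(Node head) {
        int total = 0;
        Node n = head;        // copying a local variable: no heap access yet
        while (n != null) {
            total += n.value; // primitive read: no GC barrier needed
            n = n.next;       // reference load from the heap: load barrier runs here
        }
        return total;
    }
}
```

Only the `n.next` load touches a heap reference, so only that line would pay the barrier cost.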
The translation
The key to understanding how ZGC does "concurrent compaction" is the load barrier (commonly referred to as a read barrier in the GC literature). Although there is a separate section on ZGC's load barrier below, here is a brief overview, because not all developers are familiar with barriers. With a load barrier, the GC needs to perform some additional work when reading a reference from the heap, much like an AOP aspect that intercepts a method and runs some fixed logic before it.
In everyday Java development, every time you see code like obj.field, that reference load may trigger the load barrier.
Some GCs also require a write/store barrier for operations like obj.field = value. Both operations are special because they read from or write into the heap. Note that GC barriers are different from the memory barriers used by CPUs or compilers.
Both reading and writing of objects in heap memory are very common, so both GC barriers need to be very efficient: in the common case only a few assembly instructions. Read barriers are an order of magnitude more frequent than write barriers (although this of course varies from application to application), and are therefore even more performance-sensitive. Generational GCs, for example, typically require only a write barrier, not a read barrier; ZGC, by contrast, requires a read barrier but no write barrier. For concurrent compaction, I have yet to see a solution that does not use a read barrier.
Another factor to consider is that even if a GC requires some type of barrier, it may be needed "only" for reading or writing references in the heap. Reading or writing primitives such as int or double may not require a barrier.
Reference Coloring
The original
The key to understanding ZGC is reference coloring. ZGC stores additional metadata in heap references. On x64 a reference is 64 bits wide (ZGC doesn't support compressed oops or compressed class pointers at the moment), but today's hardware actually limits a reference to 48 bits for virtual memory addresses; to be exact only 47 bits, since bit 47 determines the value of bits 48-63 (for our purpose those bits are always 0).
ZGC reserves the low 42 bits for the actual address of the object (referred to as the offset in the source code). A 42-bit address gives you a theoretical heap limitation of 4TB in ZGC. The remaining bits are used for these flags: finalizable, remapped, marked1 and marked0 (one bit is reserved for future use). There is a really nice ASCII drawing in ZGC's source that shows all these bits:
 6                 4 4 4  4 4                                             0
 3                 7 6 5  2 1                                             0
+-------------------+-+----+-----------------------------------------------+
|00000000 00000000 0|0|1111|11 11111111 11111111 11111111 11111111 11111111|
+-------------------+-+----+-----------------------------------------------+
|                   | |    |
|                   | |    * 41-0 Object Offset (42-bits, 4TB address space)
|                   | |
|                   | * 45-42 Metadata Bits (4-bits)  0001 = Marked0
|                   |                                 0010 = Marked1
|                   |                                 0100 = Remapped
|                   |                                 1000 = Finalizable
|                   |
|                   * 46-46 Unused (1-bit, always zero)
|
* 63-47 Fixed (17-bits, always zero)
Having metadata information in heap references does make dereferencing more expensive, since the address needs to be masked to get the real address (without the meta-information). ZGC employs a nice trick to avoid this: when reading from memory, exactly one bit of marked0, marked1 or remapped is set. When allocating a page at offset x, ZGC maps the same page to 3 different addresses:
- for marked0: (0b0001 << 42) | x
- for marked1: (0b0010 << 42) | x
- for remapped: (0b0100 << 42) | x
ZGC therefore just reserves 16TB of address space (but does not actually use all of this memory) starting at address 4TB. Here is another nice drawing from ZGC's source:
+--------------------------------+ 0x0000140000000000 (20TB)
| Remapped View |
+--------------------------------+ 0x0000100000000000 (16TB)
| (Reserved, but unused) |
+--------------------------------+ 0x00000c0000000000 (12TB)
| Marked1 View |
+--------------------------------+ 0x0000080000000000 (8TB)
| Marked0 View |
+--------------------------------+ 0x0000040000000000 (4TB)
At any point of time only one of these 3 views is in use. So for debugging the unused views can be unmapped to better verify correctness.
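The address arithmetic behind the coloring and the multi-mapping can be sketched with plain bit operations. The following Java snippet is not JVM code; the constants are simply taken from the diagrams above:

```java
/**
 * Sketch of ZGC-style reference coloring on x64. The constants follow the
 * bit layout shown above; this is plain bit math, not actual ZGC code.
 */
final class ColoredPointers {
    static final int  OFFSET_BITS = 42;
    static final long OFFSET_MASK = (1L << OFFSET_BITS) - 1;   // low 42 bits
    static final long MARKED0     = 0b0001L << OFFSET_BITS;
    static final long MARKED1     = 0b0010L << OFFSET_BITS;
    static final long REMAPPED    = 0b0100L << OFFSET_BITS;

    /** Heap offset (the "real" address part) of a colored reference. */
    static long offset(long ref) { return ref & OFFSET_MASK; }

    /** The three virtual-memory views the same physical page is mapped to. */
    static long marked0View(long offset)  { return MARKED0  | offset; }
    static long marked1View(long offset)  { return MARKED1  | offset; }
    static long remappedView(long offset) { return REMAPPED | offset; }
}
```

Note how the view bases line up with the drawing: marked0View(0) is exactly 4TB, marked1View(0) is 8TB, and remappedView(0) is 16TB.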
The translation
The key to understanding ZGC is its colored-pointer implementation. ZGC stores additional metadata inside heap references. On x64, a reference is 64 bits wide (ZGC currently does not support compressed oops or compressed class pointers), but today's hardware actually limits virtual memory addresses to 48 bits; to be exact only 47 bits, since bit 47 determines the value of bits 48-63 (for our purposes those bits are always 0), as shown in the model below.
ZGC reserves the low 42 bits for the object's actual address (called the offset in the source code). A 42-bit address gives us a theoretical heap limit of 4TB in ZGC. The remaining bits are used for these flags: finalizable, remapped, marked1 and marked0 (one bit is reserved for future use). ZGC's source has a nice ASCII drawing that shows all of these bits:
 6                 4 4 4  4 4                                             0
 3                 7 6 5  2 1                                             0
+-------------------+-+----+-----------------------------------------------+
|00000000 00000000 0|0|1111|11 11111111 11111111 11111111 11111111 11111111|
+-------------------+-+----+-----------------------------------------------+
|                   | |    |
|                   | |    * 41-0 Object Offset (42-bits, 4TB address space)
|                   | |
|                   | * 45-42 Metadata Bits (4-bits)  0001 = Marked0
|                   |                                 0010 = Marked1
|                   |                                 0100 = Remapped
|                   |                                 1000 = Finalizable
|                   |
|                   * 46-46 Unused (1-bit, always zero)
|
* 63-47 Fixed (17-bits, always zero)
Including metadata in heap references does make dereferencing more expensive, because the address needs to be masked to get the real address (without the meta-information). ZGC uses a nice trick to avoid this: when reading from memory, exactly one of the marked0, marked1 or remapped bits is set. When a page is allocated at offset x, ZGC maps the same page to three different addresses:
- for marked0: (0b0001 << 42) | x
- for marked1: (0b0010 << 42) | x
- for remapped: (0b0100 << 42) | x
ZGC therefore reserves only 16TB of address space, starting at address 4TB (but does not actually use all of this memory):
+--------------------------------+ 0x0000140000000000 (20TB)
| Remapped View |
+--------------------------------+ 0x0000100000000000 (16TB)
| (Reserved, but unused) |
+--------------------------------+ 0x00000c0000000000 (12TB)
| Marked1 View |
+--------------------------------+ 0x0000080000000000 (8TB)
| Marked0 View |
+--------------------------------+ 0x0000040000000000 (4TB)
Pages & Physical & Virtual Memory
The original
Shenandoah separates the heap into a large number of equally-sized regions. An object usually does not span multiple regions, except for large objects that do not fit into a single region. Those large objects need to be allocated in multiple contiguous regions. I quite like this approach because it is so simple.
ZGC is quite similar to Shenandoah in this regard. In ZGC's parlance, regions are called pages. The major difference from Shenandoah: pages in ZGC can have different sizes (but always a multiple of 2MB on x64). There are 3 different page types in ZGC: small (2MB), medium (32MB) and large (some multiple of 2MB). Small objects (up to 256KB) are allocated in small pages, medium-sized objects (up to 4MB) in medium pages. Objects larger than 4MB are allocated in large pages. In contrast to small or medium pages, large pages can only store exactly one object. Somewhat confusingly, large pages can actually be smaller than medium pages (e.g. for a large object with a size of 6MB).
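As a sketch, the size thresholds above can be expressed as a small Java helper. The names here are ours, not ZGC's:

```java
/**
 * Sketch of ZGC's page-type selection, using the size thresholds from the
 * text above. Illustrative only; the identifiers are not ZGC's.
 */
final class Pages {
    static final long KB = 1024, MB = 1024 * KB;
    static final long SMALL_OBJECT_MAX  = 256 * KB; // goes into a 2MB small page
    static final long MEDIUM_OBJECT_MAX = 4 * MB;   // goes into a 32MB medium page

    enum PageType { SMALL, MEDIUM, LARGE }

    static PageType pageTypeFor(long objectSize) {
        if (objectSize <= SMALL_OBJECT_MAX)  return PageType.SMALL;
        if (objectSize <= MEDIUM_OBJECT_MAX) return PageType.MEDIUM;
        return PageType.LARGE;
    }

    /** Large pages hold exactly one object, rounded up to a multiple of 2MB. */
    static long largePageSize(long objectSize) {
        return ((objectSize + 2 * MB - 1) / (2 * MB)) * (2 * MB);
    }
}
```

This also shows the "confusing" case from the text: a 6MB object lands on a 6MB large page, which is smaller than a 32MB medium page.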
Another nice property of ZGC is that it also differentiates between physical and virtual memory. The idea behind this is that there usually is plenty of virtual memory available (always 4TB in ZGC) while physical memory is more scarce. Physical memory can be expanded up to the maximum heap size (set with -Xmx for the JVM), so this tends to be much less than the 4TB of virtual memory. Allocating a page of a certain size in ZGC means allocating both physical and virtual memory. With ZGC the physical memory doesn't need to be contiguous, only the virtual memory space. So why is this actually a nice property?
Allocating a contiguous range of virtual memory should be easy, since we usually have more than enough of it. But it is quite easy to imagine a situation where we have 3 free pages with size 2MB somewhere in the physical memory, but we need 6MB of contiguous memory for a large object allocation. There is enough free physical memory but unfortunately this memory is non-contiguous. ZGC is able to map this non-contiguous physical pages to a single contiguous virtual memory space. If this wasn’t possible, we would have run out of memory.
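A toy model of this mapping, assuming a simple virtual-to-physical page table. This simplifies away everything the OS actually does; it only illustrates how non-contiguous physical segments can back one contiguous virtual range:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of mapping non-contiguous 2MB physical segments into one
 * contiguous virtual range. Our own simplification, not ZGC code.
 */
final class VirtualMapping {
    static final long SEGMENT = 2L * 1024 * 1024; // 2MB granule
    private final Map<Long, Long> pageTable = new HashMap<>(); // virtual -> physical

    /** Maps each physical segment at consecutive virtual addresses from base. */
    void map(long virtualBase, long[] physicalSegments) {
        for (int i = 0; i < physicalSegments.length; i++) {
            pageTable.put(virtualBase + i * SEGMENT, physicalSegments[i]);
        }
    }

    /** Translates a virtual address to its backing physical address. */
    long translate(long virtualAddress) {
        long base = (virtualAddress / SEGMENT) * SEGMENT; // containing 2MB granule
        return pageTable.get(base) + (virtualAddress - base);
    }
}
```

Three scattered 2MB physical segments, mapped back-to-back, then behave like one 6MB contiguous allocation from the application's point of view.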
On Linux the physical memory is basically an anonymous file that is only stored in RAM (and not on disk), ZGC uses memfd_create to create it. The file can then be extended with ftruncate, ZGC is allowed to extend the physical memory (= the anonymous file) up to the maximum heap size. Physical memory is then mmaped into the virtual address space.
The translation
Shenandoah divides the heap into a large number of equally sized areas. Objects generally do not span multiple regions, except for large objects that do not fit into a single region. These large objects need to be distributed across multiple contiguous regions. I really like this method because it’s so simple.
ZGC is very similar to Shenandoah in this regard. In ZGC's terminology these regions are called pages. The main difference from Shenandoah is that pages in ZGC can have different sizes (but always a multiple of 2MB on x64). There are three different page types in ZGC: small (2MB), medium (32MB), and large (some multiple of 2MB).
Small objects (up to 256KB) are allocated on small pages, medium-sized objects (up to 4MB) on medium pages, and objects larger than 4MB on large pages. Unlike small and medium pages, a large page can store only one object. Somewhat confusingly, a large page can actually be smaller than a medium page (for example, for a large object with a size of 6MB).
Allocating a contiguous virtual memory range should be easy, because we usually have more than enough virtual memory. But it is easy to imagine a situation where we have three free 2MB pages somewhere in physical memory, but need 6MB of contiguous memory to allocate a large object. There is enough free physical memory, but unfortunately it is non-contiguous. ZGC is able to map these non-contiguous physical pages into a single contiguous virtual memory range. If this were not possible, we would run out of memory (OOM).
On Linux, the physical memory is basically an anonymous file that is stored only in RAM (not on disk); ZGC creates it using memfd_create. The file can then be extended with ftruncate, and ZGC may grow the physical memory (i.e. the anonymous file) up to the maximum heap size. The physical memory is then mmapped into the virtual address space.
Marking & Relocating objects
The original
A collection is split into two major phases: marking & relocating. (Actually there are more than those two phases but see the source for more details).
A GC cycle starts with the marking phase, which marks all reachable objects. At the end of this phase we know which objects are still alive and which are garbage. ZGC stores this information in the so called live map for each page. A live map is a bitmap that stores whether the object at the given index is strongly-reachable and/or final-reachable (for objects with a finalize-method).
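A minimal sketch of such a live map in Java, assuming one bit set per reachability kind per object index. ZGC's actual layout and index scheme differ; this only mirrors the idea:

```java
import java.util.BitSet;

/**
 * Sketch of a per-page live map: for each object index it records whether
 * the object is strongly reachable and/or final-reachable. Illustrative
 * only; not ZGC's real data structure.
 */
final class LiveMap {
    private final BitSet strong = new BitSet();
    private final BitSet finalReachable = new BitSet();

    void markStrong(int objectIndex)      { strong.set(objectIndex); }
    void markFinalizable(int objectIndex) { finalReachable.set(objectIndex); }

    /** An object is live if it is reachable in either sense; otherwise it is garbage. */
    boolean isLive(int objectIndex) {
        return strong.get(objectIndex) || finalReachable.get(objectIndex);
    }

    boolean isStronglyReachable(int objectIndex) { return strong.get(objectIndex); }
}
```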
During the marking-phase the load-barrier in application-threads pushes unmarked references into a thread-local marking buffer. As soon as this buffer is full, the GC threads can take ownership of this buffer and recursively traverse all reachable objects from this buffer. Marking in an application thread just pushes the reference into a buffer, the GC threads are responsible for walking the object graph and updating the live map.
After marking, ZGC needs to relocate all live objects in the relocation set. The relocation set is a set of pages that were chosen to be evacuated based on some criteria after marking (e.g. those pages with the most garbage). An object is either relocated by a GC thread or an application thread (again through the load-barrier). ZGC allocates a forwarding table for each page in the relocation set. The forwarding table is basically a hash map that stores the address an object has been relocated to (if the object has already been relocated).
The advantage with ZGC’s approach is that we only need to allocate space for the forwarding pointer for pages in the relocation set. Shenandoah in comparison stores the forwarding pointer in the object itself for each and every object, which has some memory overhead.
The GC threads walk over the live objects in the relocation set and relocate all those objects that haven’t been relocated yet. It could even happen that an application thread and a GC thread try to relocate the same object at the same time, in this case the first thread to relocate the object wins. ZGC uses an atomic CAS-operation to determine a winner.
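The "first thread wins" race can be modeled with a hash map and an atomic put. This is a sketch with hypothetical names; ZGC's real forwarding table is a per-page structure in C++, and the real CAS installs the copy's address:

```java
import java.util.concurrent.ConcurrentHashMap;

/**
 * Models the relocation race: whichever thread installs the forwarding
 * entry first wins; the loser discards its speculative copy and adopts
 * the winner's address. Names and addresses here are illustrative.
 */
final class ForwardingTable {
    private final ConcurrentHashMap<Long, Long> forwards = new ConcurrentHashMap<>();

    /** Returns the object's final new address, whether we won the race or not. */
    long relocate(long oldAddress) {
        long candidate = copyObject(oldAddress);                   // speculative copy
        Long winner = forwards.putIfAbsent(oldAddress, candidate); // CAS-like install
        return winner == null ? candidate : winner;                // loser adopts winner
    }

    /** Looks up an already-relocated address, or -1 if not relocated yet. */
    long lookup(long oldAddress) {
        Long to = forwards.get(oldAddress);
        return to == null ? -1 : to;
    }

    private long copyObject(long oldAddress) {
        // Stand-in for allocating space elsewhere and copying the object there.
        return oldAddress + 0x1000;
    }
}
```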
While not marking, the load-barrier relocates or remaps all references loaded from the heap. That ensures that every new reference the mutator sees already points to the newest copy of an object. Remapping an object means looking up the new object address in the forwarding table.
The relocation phase is finished as soon as the GC threads are finished walking the relocation set. Although that means all objects have been relocated, there will generally still be references into the relocation set that need to be remapped to their new addresses. These references will then be healed by trapping load-barriers or, if this doesn't happen soon enough, by the next marking cycle. That means marking also needs to inspect the forwarding table to remap (but not relocate: all objects are guaranteed to have been relocated) references to their new addresses.
This also explains why there are two marking bits (marked0 and marked1) in an object reference. The marking phase alternates between the marked0 and marked1 bit. After the relocation phase there may still be references that haven't been remapped and thus still have the bit from the last marking cycle set. If the new marking phase used the same marking bit, the load-barrier would detect these references as already marked.
The translation
Garbage collection is divided into two main phases: marking and relocation. (In fact there are more than these two phases; see the source for details.) The GC cycle starts with the marking phase, which marks all reachable objects. At the end of this phase, we know which objects are still alive and which are garbage.
ZGC stores this information in the so-called live map of each page. The live map is a bitmap that stores whether the object at a given index is strongly reachable and/or final-reachable (for objects with a finalize method).
During the marking phase, the load barrier in application threads pushes unmarked references into a thread-local marking buffer. Once the buffer is full, a GC thread can take ownership of it and recursively traverse all reachable objects from it. Marking in an application thread simply pushes the reference into a buffer; the GC threads are responsible for walking the object graph and updating the live map.
After marking, ZGC needs to relocate all live objects in the relocation set. The relocation set is a set of pages that were chosen for evacuation based on some criteria after marking (for example, the pages containing the most garbage). An object is relocated either by a GC thread or by an application thread (again through the load barrier).
The ZGC assigns a forwarding table to each page in the relocation set. The forwarding table is basically a hash map that stores the address to which the object has been relocated, if it has been relocated.
The advantage of the ZGC approach is that we only need to allocate space for forwarding Pointers to memory pages in the relocation set. By contrast, Shenandoah stores the forward pointer to each object in the object itself, which has some memory overhead.
The GC threads traverse the live objects in the relocation set and relocate any objects that have not been relocated yet. An application thread and a GC thread may even attempt to relocate the same object at the same time; in that case, the first thread to relocate the object wins. ZGC uses an atomic CAS operation to determine the winner.
Outside the marking phase, the load barrier relocates or remaps every reference loaded from the heap. This ensures that every new reference the mutator sees already points to the newest copy of the object; remapping means looking up the new address in the forwarding table.
The relocation phase ends when the GC threads have finished traversing the relocation set. Although this means that all objects have been relocated, there are usually still references into the relocation set that need to be remapped to their new addresses. These references are then healed by trapping load barriers or, if that does not happen soon enough, by the next marking cycle. This means that marking also needs to check the forwarding table in order to remap (but not relocate: all objects are guaranteed to have been relocated) references to their new addresses.
This also explains why there are two marking bits (marked0 and marked1) in an object reference. The marking phase alternates between the marked0 and marked1 bits. After the relocation phase, there may still be references that have not been remapped and therefore still have the bit from the previous marking cycle set. If the new marking phase used the same marking bit, the load barrier would detect these references as already marked.
Load-barrier
ZGC needs a so-called load-barrier (also referred to as a read barrier) when reading a reference from the heap. We need to insert this load-barrier each time the Java program accesses a field of object type, e.g. obj.field. Accessing fields of primitive type, e.g. obj.anInt or obj.aDouble, does not need a barrier. ZGC doesn't need store/write barriers for obj.field = someValue.
Depending on the stage the GC is currently in (stored in the global variable ZGlobalPhase), the barrier either marks the object or relocates it if the reference isn’t already marked or remapped.
The global variables ZAddressGoodMask and ZAddressBadMask store the mask that determines if a reference is already considered good (that means already marked or remapped/relocated) or if there is still some action necessary. These variables are only changed at the start of marking- and relocation-phase and both at the same time. This table from ZGC’s source gives a nice overview in which state these masks can be:
           GoodMask  BadMask  WeakGoodMask  WeakBadMask
--------------------------------------------------------------
Marked0    001       110      101           010
Marked1    010       101      110           001
Remapped   100       011      100           011
Assembly code for the barrier can be seen in the MacroAssembler for x64; I will only show some pseudo assembly code for this barrier:
mov rax, [r10 + some_field_offset]
test rax, [address of ZAddressBadMask]
jnz load_barrier_mark_or_relocate
# otherwise reference in rax is considered good
The first assembly instruction reads a reference from the heap: r10 stores the object address and some_field_offset is some constant field offset. The loaded reference is stored in the rax register. This reference is then tested (just a bitwise AND) against the current bad mask. Synchronization isn't necessary here, since ZAddressBadMask only gets updated while the world is stopped. If the result is non-zero, we need to execute the barrier. The barrier needs to either mark or relocate the object, depending on which GC phase we are currently in. After this action it needs to update the reference stored at r10 + some_field_offset with the good reference, so that subsequent loads from this field return a good reference. Since we might need to update the reference address, we need two registers, r10 and rax, for the object's address and the loaded reference. The good reference also needs to be stored into register rax, such that execution can continue just as if we had loaded a good reference.
Since every single reference needs to be marked or relocated, throughput is likely to decrease right after starting a marking- or relocation-phase. This should get better quite fast when most references are healed.
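The fast/slow path above can be rendered in Java as a sketch. The mask value and the slow path here are placeholders standing in for ZGC's real marking/relocation logic; the heap is modeled as a plain long array of colored references:

```java
/**
 * Java rendition of the pseudo-assembly above: the fast path is one AND and
 * a compare; the slow path marks or relocates and then self-heals the field.
 * Illustrative only; the mask value and SlowPath are placeholders.
 */
final class LoadBarrier {
    // Snapshot of ZAddressBadMask; only changes during a stop-the-world pause.
    // Example value: marking with Marked0 good, so Marked1|Remapped are bad.
    static volatile long badMask = 0b110L << 42;

    interface SlowPath { long markOrRelocate(long ref); }

    static long loadReference(long[] heap, int fieldIndex, SlowPath slow) {
        long ref = heap[fieldIndex];        // mov rax, [r10 + some_field_offset]
        if ((ref & badMask) != 0) {         // test rax, badMask
            ref = slow.markOrRelocate(ref); // jnz slow path: mark or relocate
            heap[fieldIndex] = ref;         // self-heal: store the good ref back
        }
        return ref;                         // good reference continues in "rax"
    }
}
```

A good reference pays only the AND-and-branch; a bad one is fixed once and every later load of that field takes the fast path.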
Translation (Read barrier)
ZGC requires a so-called load barrier (also known as a read barrier) when reading references from the heap. We need to insert this barrier every time a Java program accesses a field of object type, such as obj.field. Accessing fields of primitive type, such as obj.anInt or obj.aDouble, does not require a barrier. ZGC does not require a store/write barrier for obj.field = someValue.
Depending on the GC’s current phase (stored in the global variable ZGlobalPhase), the barrier marks the object or relocates the object if the reference has not been marked or remapped.
The global variables ZAddressGoodMask and ZAddressBadMask store the masks that determine whether a reference is already considered good (meaning already marked or remapped/relocated) or whether some action is still required. These variables change only at the beginning of the marking and relocation phases, and both at the same time. This table from the ZGC source gives a good overview of the states these masks can be in:
           GoodMask  BadMask  WeakGoodMask  WeakBadMask
--------------------------------------------------------------
Marked0    001       110      101           010
Marked1    010       101      110           001
Remapped   100       011      100           011
The assembly code for the barrier can be seen in the MacroAssembler for x64; I will only show some pseudo-assembly code for this barrier:
mov rax, [r10 + some_field_offset]
test rax, [address of ZAddressBadMask]
jnz load_barrier_mark_or_relocate
# otherwise reference in rax is considered good
The first assembly instruction reads the reference from the heap: r10 stores the object address, and some_field_offset is some constant field offset. The loaded reference is stored in the rax register. The reference is then tested against the current bad mask (just a bitwise AND). No synchronization is required, because ZAddressBadMask is updated only during a stop-the-world pause. If the result is non-zero, we need to execute the barrier, which marks or relocates the object depending on the GC phase we are currently in. Afterwards it needs to update the reference stored at r10 + some_field_offset with the good reference, so that subsequent loads of the field return a good reference. Since we may need to update the reference address, we need two registers, r10 and rax, for the object address and the loaded reference. The good reference also needs to be stored in the rax register, so that execution can continue as if a good reference had been loaded.
Since each reference needs to be marked or relocated, throughput may decrease immediately after the marking or relocation phase begins. This situation should improve soon when most references are processed.
Stop-the-World Pauses
· The ZGC doesn’t get rid of stop-the-world pauses completely. The collector needs pauses when starting marking, ending marking and starting relocation. But this pauses are usually quite short – only a few milliseconds.
When starting marking ZGC traverses all thread stacks to mark the applications root set. The root set is the set of object references from where traversing the object graph starts. It usually consists of local and global variables, but also other internal VM structures (e.g. JNI handles).
Another pause is required when ending the marking phase. In this pause the GC needs to empty and traverse all thread-local marking buffers. Since the GC could discover a large unmarked sub-graph this could take longer. ZGC tries to avoid this by stopping the end of marking phase after 1 millisecond. It returns into the concurrent marking phase until the whole graph is traversed, then the end of marking phase can be started again.
Starting relocation phase pauses the application again. This phase is quite similar to starting marking, with the difference that this phase relocates the objects in the root set.
STW pause phase
ZGC does not stop all application threads for the entire collection; it pauses them only at three points, and each pause is usually very short, only a few milliseconds.
- Start of marking: ZGC traverses all thread stacks to mark the GC root set. The root set is the set of object references from which traversal of the object graph starts. It typically includes local and global variables, but also other internal VM structures (such as JNI handles).
- End of marking: another pause is required when the marking phase ends. During this pause, the GC needs to empty and traverse all thread-local marking buffers. Since the GC may discover a large unmarked sub-graph, this can take longer; ZGC tries to avoid this by aborting the end-of-marking pause after 1 millisecond, returning to concurrent marking until the whole graph is traversed, and then attempting the end-of-marking pause again.
- Start of relocation: starting the relocation phase pauses the application again. This phase is quite similar to the start of marking, except that it relocates the objects in the root set.
Finally, a recommended link to the original author's ZGC material:
- cr.openjdk.java.net/~pliden/zgc…