Note: The following content is based on In-Depth Understanding of the JVM, 3rd edition.

Next, let’s talk about G1. G1 is currently the mainstream JVM garbage collector and is widely used at Internet companies. Only by understanding how G1 collects memory can we locate and solve GC problems effectively.


-XX:+UseG1GC enables G1.

G1 Memory Partition

G1 looks similar to CMS, but the implementation is quite different.

Traditional generational GC divides the entire heap into a few large areas, such as Eden, S0, S1, and Tenured.

In G1, the heap is divided into n regions of equal size, and the regions that make up a generation do not have to be contiguous. A region is between 1 MB and 32 MB, depending on the total heap size. The target is to have no more than 2048 regions.
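To make the sizing rule concrete, here is a minimal sketch of how a region size could be derived from the heap size. It assumes a simplified heuristic (heap size divided by 2048, rounded down to a power of two, then clamped to the 1 to 32 MB range); the real HotSpot ergonomics differ in detail, and the class and method names are made up for illustration.

```java
// Simplified sketch, not the exact HotSpot logic.
final class RegionSizing {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION_BYTES = 1 * MB;   // lower bound from the text
    static final long MAX_REGION_BYTES = 32 * MB;  // upper bound from the text
    static final long TARGET_REGION_COUNT = 2048;  // target from the text

    // Assumed heuristic: heap / 2048, rounded down to a power of two,
    // then clamped to [1 MB, 32 MB].
    static long regionSizeBytes(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGION_COUNT, MIN_REGION_BYTES);
        long powerOfTwo = Long.highestOneBit(raw); // largest power of two <= raw
        return Math.min(powerOfTwo, MAX_REGION_BYTES);
    }

    public static void main(String[] args) {
        long[] heapsInMb = {256, 4096, 6144, 131072};
        for (long heapMb : heapsInMb) {
            long regionMb = regionSizeBytes(heapMb * MB) / MB;
            System.out.println("heap=" + heapMb + "M -> region=" + regionMb + "M");
        }
    }
}
```

Under these assumptions a 4 GB heap gets 2 MB regions, and any heap of 64 GB or more hits the 32 MB cap. The region size can also be set explicitly with -XX:G1HeapRegionSize.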

Each Region plays a particular role in G1: Eden, Survivor, or Old.

In addition to the traditional young and old generations, G1 also has Humongous regions, which hold humongous objects (H-obj).

For humongous objects, note the following:

  • An H-obj is an object whose size is greater than or equal to half a Region.

  • An H-obj is allocated directly in the old generation to avoid repeated copying. However, H-objs are not reclaimed by Mixed GC; they are reclaimed in the cleanup phase of concurrent marking and during Full GC.

    Keep this in mind, because you will often see the following in GC logs during tuning:

    [GC pause (G1 Humongous Allocation) (young) (initial-mark), 0.0029216 secs]

    The puzzle is why a humongous allocation triggers a young GC.

    The reason is that the young GC’s initial-mark starts a concurrent marking cycle, whose cleanup phase then reclaims the humongous objects.

    If you want to study G1 logs, an obvious shortcut is to fill the heap with large objects so that GC triggers quickly. But if those objects are humongous, you may find no Mixed GC in the logs at all, only frequent young GCs and concurrent marking cycles. This is why.

  • An H-obj is never moved. G1’s collection algorithm is mark-compact overall, but an H-obj is either reclaimed in place or left where it is. H-objs can therefore cause significant memory fragmentation and, in turn, frequent GC.
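The size rule above can be written down in a few lines. This is only a sketch: it uses the "at least half a region" threshold stated in this article, assumes a 1 MB region in the demo, and the class and method names are illustrative rather than anything from HotSpot.

```java
// Illustrative only; threshold and region size are assumptions taken from the text.
final class HumongousCheck {
    // Rule from the text: humongous means at least half a region.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    // Number of whole regions an object of this size would occupy.
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long regionBytes = 1024L * 1024L; // assumed 1 MB region
        long[] sizes = {100 * 1024L, 512 * 1024L, 2_500_000L};
        for (long size : sizes) {
            System.out.println(size + " bytes: humongous=" + isHumongous(size, regionBytes)
                    + ", regions=" + regionsNeeded(size, regionBytes));
        }
    }
}
```

With 1 MB regions, a 2,500,000-byte array needs three regions and leaves part of the third one unused, which is one way the fragmentation mentioned above shows up.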


G1 Collection Process

Because of this region-based layout, G1 manages both the young and the old generation itself. Depending on which areas are collected, G1 has two collection modes:

  • Reclaim only the young generation: Young GC
  • Reclaim the whole young generation plus part of the old generation: Mixed GC

When Mixed GC cannot free memory fast enough to keep up with allocation, G1 falls back to a single-threaded Full GC (using the Serial Old code).

The whole Young GC runs stop-the-world (STW). Mixed GC consists of two phases: the first is concurrent marking; once it completes, old-generation collection follows, which copies live objects out and reclaims the regions they came from.


Mixed GC

It would be logical to start with Young GC, but Young GC is not the focus of G1 collection and has few parameters worth tuning, so I will skip the details. Still, it is important to know that Young GC works like the young collection of other collectors: mark, then copy, with the entire process STW. The marking done by Young GC is the same work as initial marking, the first step of concurrent marking in a later Mixed GC cycle, so the initial marking step is always piggybacked on a Young GC.

Mixed GC is divided into two phases: concurrent marking, then evacuation (screening and reclaiming regions). The concurrent marking phase consists of the following steps:

  • Initial marking
  • Concurrent marking
  • Final marking (remark)
  • Cleanup

Cleanup here does not reclaim individual objects (unlike CMS). Understanding the JVM treats cleanup and evacuation as a single step, but in GC logs they appear as completely separate steps. Here I follow the GC logs.

After marking, G1 selects some regions to evacuate and reclaim. Note that these steps do not have to run back to back; a run may look like this:

Start the program
-> young GC
-> young GC
-> young GC
-> young GC + initial marking
  (... concurrent marking ...)
-> young GC (... concurrent marking ...)
  (... concurrent marking ...)
-> young GC (... concurrent marking ...)
-> final marking
-> cleanup
-> mixed GC
-> mixed GC
-> mixed GC
-> mixed GC
-> young GC + initial marking
  (... concurrent marking ...)

What follows is a detailed description of each marking step.

Initial marking: STW. Marks all objects that are directly reachable from GC Roots. This step does pause the application, but it piggybacks on the pause of a Young GC, so there is no additional, separate pause.

Concurrent marking: runs concurrently with the application. Starting from the objects found in the previous step, it traces the object graph and marks every reachable object as live. Note: this step also processes the references recorded by SATB (snapshot-at-the-beginning).

A reminder about SATB: it solves the problem that objects may be mismarked during concurrent marking because user threads keep modifying references. CMS uses incremental update, whereas G1 uses SATB, which preserves the reference graph as it was when concurrent marking began by recording references that are overwritten afterwards.
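The idea behind SATB is easier to see in a toy model. The sketch below is my own illustration, not HotSpot code: objects have a single reference field, the "write barrier" is an ordinary method that logs the old value before a reference is overwritten, and the remark step drains that log. All class, field, and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of SATB marking; all names here are illustrative, not HotSpot code.
final class SatbSketch {
    static final class Node {
        final String name;
        Node next;          // the only reference field, to keep the example small
        boolean marked;
        Node(String name) { this.name = name; }
    }

    private final boolean barrierEnabled;
    private boolean markingActive;
    private final Deque<Node> satbQueue = new ArrayDeque<>();
    private final Deque<Node> markStack = new ArrayDeque<>();

    SatbSketch(boolean barrierEnabled) { this.barrierEnabled = barrierEnabled; }

    // Pre-write barrier: while marking is active, log the reference that is
    // about to be overwritten, so the snapshot taken at the start stays reachable.
    void writeRef(Node holder, Node newValue) {
        if (barrierEnabled && markingActive && holder.next != null) {
            satbQueue.add(holder.next);
        }
        holder.next = newValue;
    }

    private void markAndPush(Node n) {
        if (n != null && !n.marked) {
            n.marked = true;
            markStack.push(n);
        }
    }

    // Scan one already-marked object and mark what it points to.
    private void scanOne() {
        Node n = markStack.poll();
        if (n != null) markAndPush(n.next);
    }

    // Remark: drain the SATB log, then finish tracing.
    private void remark() {
        Node logged;
        while ((logged = satbQueue.poll()) != null) markAndPush(logged);
        while (!markStack.isEmpty()) scanOne();
    }

    // Returns true if B, which stays reachable the whole time, ends up marked.
    static boolean demo(boolean barrierEnabled) {
        SatbSketch gc = new SatbSketch(barrierEnabled);
        Node root = new Node("root"), a = new Node("A"), b = new Node("B");
        root.next = a;
        a.next = b;                 // root -> A -> B before marking starts

        gc.markingActive = true;    // initial mark
        gc.markAndPush(root);
        gc.scanOne();               // root is fully scanned; A is marked and pending

        gc.writeRef(root, b);       // mutator: root -> B (overwrites root -> A)
        gc.writeRef(a, null);       // mutator: cuts A -> B

        gc.remark();
        return b.marked;
    }

    public static void main(String[] args) {
        System.out.println("with barrier:    B marked = " + demo(true));
        System.out.println("without barrier: B marked = " + demo(false));
    }
}
```

Without the barrier, B is reachable from the already-scanned root but is never marked, which is exactly the mismarking problem described above. With the barrier, the overwritten reference to B lands in the log and is marked during remark. The price is that A, which became garbage during marking, also survives until the next cycle.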

Final marking (remark): STW. Although SATB records were already processed during concurrent marking, that step ran concurrently with the application, so once it finishes, all user threads must be paused to process the remaining SATB records. This step also handles weak references.

These three steps are similar to CMS, which also handles weak references during its final marking phase.

However, the final marking phase of CMS has to rescan the entire young generation, so the CMS remark phase can be very slow.

Cleanup: a pause phase. It cleans up and resets the marking state and counts the live objects in each region. If a region with no live objects is found in this phase, the whole region is returned to the list of allocatable regions.
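As a rough illustration of that last point, the following sketch walks a list of old regions and returns the ones without live data to a free list. The Region class, the free list, and the method name are hypothetical; real G1 bookkeeping is far more involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Illustrative model of the cleanup step; not HotSpot code.
final class CleanupSketch {
    static final class Region {
        final int id;
        final long liveBytes; // live data found by concurrent marking
        Region(int id, long liveBytes) {
            this.id = id;
            this.liveBytes = liveBytes;
        }
    }

    // Moves every region without live data onto the free list and
    // returns how many regions were reclaimed this way.
    static int reclaimEmptyRegions(List<Region> oldRegions, Deque<Integer> freeList) {
        int reclaimed = 0;
        Iterator<Region> it = oldRegions.iterator();
        while (it.hasNext()) {
            Region r = it.next();
            if (r.liveBytes == 0) {
                it.remove();
                freeList.add(r.id);
                reclaimed++;
            }
        }
        return reclaimed;
    }

    public static void main(String[] args) {
        List<Region> old = new ArrayList<>();
        old.add(new Region(1, 0));
        old.add(new Region(2, 4096));
        old.add(new Region(3, 0));
        Deque<Integer> free = new ArrayDeque<>();
        int n = reclaimEmptyRegions(old, free);
        System.out.println("reclaimed=" + n + ", free list=" + free + ", still old=" + old.size());
    }
}
```

Regions that still hold some live data are not touched here; they are left for the evacuation phase.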


After marking completes comes the Evacuation phase, which is fully STW. It copies the live objects out of a set of regions into empty regions and then reclaims the original regions. Any number of regions can be chosen to form the Collection Set (CSet). Once the Collection Set is chosen, the live objects in it are copied into new regions in parallel.
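How the Collection Set might be chosen can be sketched as a greedy selection: take the regions with the most garbage first and stop when the predicted copy cost would exceed the pause budget. This is an assumption-laden simplification; the linear cost model (milliseconds per live megabyte), the Region class, and the method names are all mine, and G1's real prediction logic is considerably more elaborate.

```java
import java.util.ArrayList;
import java.util.List;

// Greedy "garbage first" selection under a pause budget; illustrative only.
final class CSetSketch {
    static final long MB = 1024L * 1024L;

    static final class Region {
        final int id;
        final long sizeBytes;
        final long liveBytes;
        Region(int id, long sizeBytes, long liveBytes) {
            this.id = id;
            this.sizeBytes = sizeBytes;
            this.liveBytes = liveBytes;
        }
        long garbageBytes() { return sizeBytes - liveBytes; }
    }

    // Assumed cost model: copying costs copyMsPerLiveMb milliseconds per MB of live data.
    static List<Region> choose(List<Region> candidates, double pauseBudgetMs, double copyMsPerLiveMb) {
        List<Region> sorted = new ArrayList<>(candidates);
        // Most garbage first.
        sorted.sort((x, y) -> Long.compare(y.garbageBytes(), x.garbageBytes()));

        List<Region> cset = new ArrayList<>();
        double predictedMs = 0.0;
        for (Region r : sorted) {
            double costMs = (double) r.liveBytes / MB * copyMsPerLiveMb;
            if (predictedMs + costMs > pauseBudgetMs) break; // budget exhausted
            cset.add(r);
            predictedMs += costMs;
        }
        return cset;
    }

    public static void main(String[] args) {
        List<Region> candidates = new ArrayList<>();
        candidates.add(new Region(1, 4 * MB, 3 * MB));
        candidates.add(new Region(2, 4 * MB, 1 * MB));
        candidates.add(new Region(3, 4 * MB, 2 * MB));
        for (Region r : choose(candidates, 7.0, 2.0)) {
            System.out.println("collect region " + r.id + " (garbage " + r.garbageBytes() / MB + "M)");
        }
    }
}
```

Regions that do not fit into one pause are left for a later collection, which matches the way several Mixed GCs usually follow one marking cycle.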


Now that we understand the overall collection process of G1, we can compare it with CMS to see how G1 handles some of the problems of concurrent collection:

  1. Remembered Set: As mentioned earlier, CMS chooses not to maintain a remembered set for references from the young generation into the old generation, because the young generation changes too quickly and the maintenance cost is high. G1’s solution is to always put the whole young generation into the Collection Set, for Young GC and Mixed GC alike. Put simply, either only the young generation is collected, or the whole young generation is collected together with part of the old generation, so no remembered set for young-to-old references is needed.

    This only concerns references from the young generation into the old generation; references from the old generation into the young generation are still tracked in a remembered set.

  2. Reference changes during concurrent marking: to catch these up by the remark phase, CMS uses incremental update, while G1 uses snapshot-at-the-beginning (SATB).

  3. G1 also maintains both the remembered sets and SATB through write barriers.
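To show what "maintaining a remembered set through a write barrier" can look like, here is a toy post-write barrier. It models addresses as plain numbers and records, for each region, the cards in other regions that hold a reference into it. The 1 MB region size, the 512-byte card size, and every name in the code are assumptions for illustration; in the real collector the barrier only enqueues dirty cards into buffers, and the remembered sets are updated from those buffers later.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy remembered-set maintenance; addresses are plain longs, not real pointers.
final class RSetSketch {
    static final long REGION_BYTES = 1L << 20; // assumed 1 MB regions
    static final long CARD_BYTES = 512;        // assumed card size

    // target region index -> cards (in other regions) that point into it
    static final Map<Long, Set<Long>> rsets = new HashMap<>();

    static long regionOf(long addr) { return addr / REGION_BYTES; }
    static long cardOf(long addr) { return addr / CARD_BYTES; }

    // Post-write barrier: run after the field at fieldAddr was set to point to targetAddr.
    static void postWriteBarrier(long fieldAddr, long targetAddr) {
        if (regionOf(fieldAddr) == regionOf(targetAddr)) {
            return; // references inside one region need no remembered-set entry
        }
        rsets.computeIfAbsent(regionOf(targetAddr), k -> new HashSet<>())
             .add(cardOf(fieldAddr));
    }

    public static void main(String[] args) {
        long fieldInRegion5 = 5 * REGION_BYTES + 1024;
        long objectInRegion2 = 2 * REGION_BYTES + 8;
        postWriteBarrier(fieldInRegion5, objectInRegion2);
        System.out.println("cards pointing into region 2: " + rsets.get(2L));
    }
}
```

When region 2 is later evacuated, only the recorded cards have to be scanned to find incoming references, instead of the whole heap.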


G1 Reclamation Log

talk is cheap, show me the code

The process above is summarized from Understanding the JVM and some resources on the web. With that in mind, let’s look at actual G1 GC logs:

Young GC

// GC cause: humongous allocation. GC type: young GC that also carries the initial mark of a concurrent marking cycle. Total pause: 0.0130262 s
[GC pause (G1 Humongous Allocation) (young) (initial-mark), 0.0130262 secs]
   // The parallel part of the pause took 4.5 ms, with 4 GC worker threads
   [Parallel Time: 4.5ms, GC Workers: 4]
      // Timestamps at which the worker threads started
      [GC Worker Start (ms): Min: 1046.3, Avg: 1046.3, Max: 1046.4, Diff: 0.1]
      // Time spent scanning external roots (thread stacks, JNI, global variables, system tables, etc.)
      [Ext Root Scanning (ms): Min: 0.9, Avg: 1.0, Max: 1.2, Diff: 0.3, Sum: 4.0]
      // Time spent bringing the Remembered Sets up to date
      // Remembered Sets are maintained through write barriers and buffers
      [Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         // Number of buffers processed during Update RS
         [Processed Buffers: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0]
      // Time spent scanning the Remembered Sets
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      // Time spent scanning roots in compiled code
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
      // Time spent copying (evacuating) objects
      [Object Copy (ms): Min: 3.0, Avg: 3.0, Max: 3.1, Diff: 0.0, Sum: 12.1]
      // Work stealing: a worker that has finished its own tasks tries to help the others before it terminates
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         // Number of times the workers tried to steal work and terminate
         [Termination Attempts: Min: 1, Avg: 1.3, Max: 2, Diff: 1, Sum: 5]
      // Time spent on other tasks during the GC
      [GC Worker Other (ms): Min: 0.1, Avg: 0.2, Max: 0.3, Diff: 0.2, Sum: 0.8]
      // Minimum, average, maximum, difference, and total time over all GC worker threads
      [GC Worker Total (ms): Min: 4.2, Avg: 4.3, Max: 4.3, Diff: 0.1, Sum: 17.1]
      // Min is when the first worker finished; Max is when the last worker finished
      [GC Worker End (ms): Min: 1050.6, Avg: 1050.6, Max: 1050.6, Diff: 0.0]
   // Free the data structures used to manage the parallel collection work
   [Code Root Fixup: 0.0ms]
   // Clean up other data structures
   [Code Root Purge: 0.0ms]
   // Clear the card table (Remembered Set bookkeeping)
   [Clear CT: 0.8 ms]
   // Other work
   [Other: 7.8 ms]
      // Choose the regions to collect, according to the expected pause time
      [Choose CSet: 0.0 ms]
      // Process Java references: soft, weak, final, phantom, JNI, etc.
      [Ref Proc: 5.2 ms]
      // Iterate over all references and put those that cannot be reclaimed on the pending list
      [Ref Enq: 0.1 ms]
      // Cards modified during the collection are marked dirty again
      [Redirty Cards: 0.4 ms]
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.0 ms]
      // The regions being freed are returned to the free list
      [Free CSet: 0.1 ms]
   // Eden: 3072K used before the collection (capacity 12M), 0B used after (capacity 11M)
   // Survivors: 0B before, 1024K after
   // Whole heap: 101M used before (capacity 256M), 99M used after (capacity 256M)
   // The purpose of this young GC is to start concurrent marking
   [Eden: 3072.0K(12.0M)->0.0B(11.0M) Survivors: 0.0B->1024.0K Heap: 101.0M(256.0M)->99.0M(256.0M)]
 [Times: user=0.00 sys=0.00, real=0.01 secs]
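When there are many such entries, it helps to pull the key facts out of the first line of each pause. The sketch below parses only the header line in the format shown in this article (the older -XX:+PrintGCDetails style); the regular expression, the class, and the field names are my own, and other JDK versions or logging options print different formats.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parses the header line of a G1 pause as printed in this article; illustrative only.
final class PauseLineParser {
    private static final Pattern PAUSE = Pattern.compile(
            "\\[GC pause \\((.+?)\\) \\((young|mixed)\\)( \\(initial-mark\\))?, ([0-9.]+) secs\\]");

    static final class Pause {
        final String cause;        // e.g. "G1 Humongous Allocation"
        final String kind;         // "young" or "mixed"
        final boolean initialMark; // true if the pause also started concurrent marking
        final double seconds;
        Pause(String cause, String kind, boolean initialMark, double seconds) {
            this.cause = cause;
            this.kind = kind;
            this.initialMark = initialMark;
            this.seconds = seconds;
        }
    }

    // Returns null if the line is not a pause header in the expected format.
    static Pause parse(String line) {
        Matcher m = PAUSE.matcher(line);
        if (!m.find()) return null;
        return new Pause(m.group(1), m.group(2), m.group(3) != null,
                Double.parseDouble(m.group(4)));
    }

    public static void main(String[] args) {
        Pause p = parse("[GC pause (G1 Humongous Allocation) (young) (initial-mark), 0.0130262 secs]");
        System.out.println(p.cause + " / " + p.kind + " / initialMark=" + p.initialMark
                + " / " + p.seconds * 1000 + " ms");
    }
}
```

Counting how many young pauses carry initial-mark and have the cause "G1 Humongous Allocation" is a quick way to confirm the humongous-object pattern described earlier.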

Concurrent Marking

Concurrent marking normally follows a young GC, because the initial mark is done as part of that young GC.

// Start scanning the root regions
[GC concurrent-root-region-scan-start]
// Root region scan finished, taking 0.0004341 s
[GC concurrent-root-region-scan-end, 0.0004341 secs]
// Concurrent marking begins
[GC concurrent-mark-start]
// Concurrent marking ends, taking 0.0002240 s
[GC concurrent-mark-end, 0.0002240 secs]
// Remark begins; this is STW
// Finalize Marking took 0.0006341 s
// Reference processing took 0.0000478 s
// Class unloading took 0.0008091 s
// Total: 0.0020776 s
[GC remark [Finalize Marking, 0.0006341 secs] [GC ref-proc, 0.0000478 secs] [Unloading, 0.0008091 secs], 0.0020776 secs]
 [Times: user=0.00 sys=0.00, real=0.00 secs]
// Cleanup phase; this is STW
// Main work: mark all objects allocated after the initial mark, and mark the regions that have at least one live object
// Reclaim Old regions and Humongous regions that have no live objects
// Process the RSets of regions without any live objects
// Sort all Old regions by the liveness of their objects
// The cleanup took 0.0013110 s
[GC cleanup 150M->150M(256M), 0.0013110 secs]
 [Times: user=0.00 sys=0.00, real=0.00 secs]

Mixed GC

Once concurrent marking has completed, Mixed GC can run. The main job of Mixed GC is to reclaim the regions that were selected during concurrent marking.

Mixed GC works much like Young GC, except that what it reclaims is based on the result of concurrent marking.

G1 may not be able to collect all of the candidate regions at once, so it may perform several successive mixed collections, alternating with the application threads.

[GC pause (G1 Evacuation Pause) (mixed), 0.0080519 secs]
   [Parallel Time: 7.6ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 140411.4, Avg: 140415.1, Max: 140418.9, Diff: 7.4]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 0.2, Max: 0.5, Diff: 0.4, Sum: 1.0]
      [Update RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.3]
         [Processed Buffers: Min: 0, Avg: 1.3, Max: 4, Diff: 4, Sum: 5]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 0.0, Avg: 2.6, Max: 5.2, Diff: 5.2, Sum: 10.3]
      [Termination (ms): Min: 0.0, Avg: 0.9, Max: 1.8, Diff: 1.8, Sum: 3.4]
         [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 4]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 0.1, Avg: 3.8, Max: 7.5, Diff: 7.4, Sum: 15.2]
      [GC Worker End (ms): Min: 140418.9, Avg: 140418.9, Max: 140418.9, Diff: 0.0]
   [Code Root Fixup: 0.0ms]
   [Code Root Purge: 0.0ms]
   [Clear CT: 0.1 ms]
   [Other: 0.4 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.1 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.1 ms]
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.0 ms]
   [Eden: 4096.0K(4096.0K)->0.0B(138.0M) Survivors: 8192.0K->2048.0K Heap: 68.0M(256.0M)->65.5M(256.0M)]
 [Times: user=0.03 sys=0.00, real=0.01 secs]

The log is shown only to confirm that its layout matches the young GC log, so I will not annotate it further.


Full GC

A G1 Full GC log:

// GC type: Full GC. GC cause: System.gc(). 298M used before the GC and 509K after. Time: 0.0101774 s
[Full GC (System.gc()) 298M->509K(512M), 0.0101774 secs]
   // Eden: 122M used before the GC, 0B after; capacity grew from 154M to 230M
   // Survivors: 4096K used before the GC, 0B after
   // Heap: 298.8M used before the GC, 509.4K after; total capacity unchanged at 512M
   // Metaspace: 3308K used before the GC, 3308K after; capacity 1056768K
   [Eden: 122.0M(154.0M)->0.0B(230.0M) Survivors: 4096.0K->0.0B Heap: 298.8M(512.0M)->509.4K(512.0M)], [Metaspace: 3308K->3308K(1056768K)]
 [Times: user=0.01 sys=0.00, real=0.01 secs]
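For monitoring, the interesting part of a Full GC line is how much the heap shrank and how long it took. Here is a small sketch that extracts the heap transition from a line in the format shown above. The pattern and the names are my own assumptions and only cover this log style.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts "used before -> used after (capacity)" from a Full GC line; illustrative only.
final class FullGcLineParser {
    private static final Pattern FULL_GC = Pattern.compile(
            "\\[Full GC \\((.+)\\) (\\d+)([BKMG])->(\\d+)([BKMG])\\((\\d+)([BKMG])\\), ([0-9.]+) secs\\]");

    static long toBytes(long value, char unit) {
        switch (unit) {
            case 'K': return value << 10;
            case 'M': return value << 20;
            case 'G': return value << 30;
            default:  return value; // 'B'
        }
    }

    // Returns {usedBefore, usedAfter, capacity} in bytes, or null if the line does not match.
    static long[] heapTransition(String line) {
        Matcher m = FULL_GC.matcher(line);
        if (!m.find()) return null;
        return new long[] {
            toBytes(Long.parseLong(m.group(2)), m.group(3).charAt(0)),
            toBytes(Long.parseLong(m.group(4)), m.group(5).charAt(0)),
            toBytes(Long.parseLong(m.group(6)), m.group(7).charAt(0))
        };
    }

    public static void main(String[] args) {
        long[] t = heapTransition("[Full GC (System.gc()) 298M->509K(512M), 0.0101774 secs]");
        System.out.println("freed " + (t[0] - t[1]) / 1024 + "K of a " + (t[2] >> 20) + "M heap");
    }
}
```

A Full GC that frees very little is a stronger warning sign than one that frees most of the heap, so the freed amount is worth tracking alongside the pause time.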

The pause here is only 10 ms, but that is with a heap of just 512M, of which only about 300M was in use. G1 has no Full GC mechanism of its own; its Full GC reuses the Serial Old code (later versions made it multi-threaded, but it is still relatively slow). Full GC pauses can therefore be long, so watch for Full GC in production. One Full GC every few days is normally acceptable.

Causes of G1 Full GC:

  • Mixed GC cannot keep up with the allocation rate, so memory can only be freed by a Full GC. Solutions for this come later.

  • Heavy use of reflection and dynamic proxies: each dynamic proxy generates a new class, and the class metadata is stored in Metaspace. If Metaspace runs short, G1 relies on Full GC to expand it. The solution in this case is to increase the initial Metaspace size.

  • Humongous allocation failure. As mentioned earlier, humongous objects are reclaimed only by concurrent marking or Full GC, so a failed humongous allocation may trigger a Full GC.

    The exact rules are not clear to me, because in my tests a humongous allocation triggered concurrent marking instead.


G1 tuning

The point of all this is to know how to adjust the GC when it affects a production system. Once you understand the G1 collection process, you have a good idea of what each parameter does and how to change it.

  • -XX:+UseG1GC: use the G1 collector.

  • -XX:MaxGCPauseMillis=200: sets the pause-time target. The default is 200 (milliseconds). G1 only makes a best effort to meet this target, and the value should be tuned for each project. If it is too small, throughput may drop, and each Mixed GC reclaims too little garbage, so garbage accumulates and eventually forces a Full GC. If it is too large, long pauses may hurt the user experience.

  • -XX:InitiatingHeapOccupancyPercent=45: the heap occupancy threshold that starts a concurrent GC cycle. When the occupancy of the whole heap reaches this percentage, a concurrent cycle starts. As with CMS, this parameter mainly guards against running out of memory during the Mixed GC process: if concurrent collection starts too late, the memory left may not be enough for the allocations that user threads make while it runs, in which case G1 abandons concurrent marking and falls back to Full GC. When this happens you will see the words to-space exhausted in the GC log.

    Do not set this parameter too low either: concurrent cycles would then run back to back and waste CPU resources.

  • -XX:G1MixedGCCountTarget=8: the target number of Mixed GCs over which the old regions selected by a marking cycle are collected. The default is 8. This also helps with to-space exhausted: with a lower value each Mixed GC collects more old-generation memory, at the cost of a longer pause.

  • -XX:ConcGCThreads: the number of threads used for concurrent GC work. The default is not a fixed value. This too addresses to-space exhausted: more threads make the concurrent work finish faster, at the expense of CPU time for the application.

  • -XX:G1ReservePercent=10: the "false ceiling". It keeps 10% of the heap in reserve to guard against to-space exhausted during a concurrent cycle. Reserving too much wastes memory.

  • Do not set the young generation size: do not use -Xmn, because G1 grows and shrinks the young generation as needed. If you fix the young generation size, G1 can no longer use the pause-time target.
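Before tuning, it is worth confirming what the running JVM was actually started with. The sketch below uses the standard java.lang.management API to print the active collectors and the JVM arguments, and flags options that pin the young generation size. The class name and the pinsYoungSize helper are hypothetical, and the check for -XX:NewRatio is my addition; this article itself only mentions -Xmn.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Prints the collectors and JVM arguments of the current process; illustrative only.
final class GcSettingsCheck {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    // True if an argument fixes the young generation size.
    // -Xmn is from the article; -XX:NewRatio is an assumption added here.
    static boolean pinsYoungSize(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xmn") || arg.startsWith("-XX:NewRatio=")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("collectors: " + collectorNames());
        System.out.println("JVM args:   " + jvmArgs);
        if (pinsYoungSize(jvmArgs)) {
            System.out.println("warning: young generation size is fixed; G1 cannot adapt it to the pause target");
        }
    }
}
```

It could be started, for example, with java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=45 GcSettingsCheck, using the flags discussed above.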


These are only some of the common G1 parameters to be aware of. There may of course be other problems, such as humongous object allocation or Metaspace size. In general, if you understand the GC collection process, you can find the problem from the GC log.

References:

Some key technologies for Java HotSpot G1 GC - Meituan Technical Team

Consult the principle of G1 algorithm

In-depth Understanding of G1 GC Logs (Part 1)

Garbage First Garbage Collector Tuning

HotSpot Virtual Machine Garbage Collection Tuning Guide - G1
