Basic concept

Definition: Garbage First, a Garbage collector for server-side applications. -xx :+UseG1GC Target: “pause time model” collector: can support specifying targets with a high probability of not spending more than N milliseconds on garbage collection within a time segment of M milliseconds in length. Application scenario: Applies to machines with large memory and multiple cpus.

Design concept

Jumping out of the previous collection of either the new generation or the old age, G1 collects and recyles any part of all heap memory. The measurement standard is no longer which generation it belongs to, but which area has the largest amount of garbage stored and the largest recycling benefit.

division

Region area

Definition: Divide the Java heap into equal sized regions. Each Region can be a new generation or an old generation. G1HeapRegionSize: -xx :G1HeapRegionSize, not specified G1 is self-specified according to the heap size

Humongous Region (part of Region)

Definition:It is used to store large objects (objects that exceed half of the Region capacity are large objects). The objects that exceed the entire Region are stored in multiple contiguous Humongous regions.

Reclaim identity:G1 sees Humongous as part of an older era

Garbage collection

With R large answers, at the highest level, the Collector side of G1 has two large parts that can be executed relatively independently.

Global Concurrent marking
Copy alive objects (evacuation)

Global Concurrent marking

Concurrent markup in The form of SATB(Snapshot At The Begining).

1. Initial tag (STW)

G1 marks the roots. Scan the root collection, mark all objects reachable directly from the root collection (CMS’s initial markup is similar) and push their fields onto the marking stack for subsequent scans. The G1 uses an external bitmap to record mark information, rather than the Mark bit in the mark Word in the object header. In generational G1 mode, the initial marking phase borrows yong GC pauses, so there are no additional, separate pause phases.

Why is the initial marking phase done using Yong GC pauses?

Logically from “global concurrent mark” and “object” survival “copy is relatively independent, but” global concurrent tag “and the” initial tag “phase, the phase and Yong GC do overlap – traverse the root collection, so put them together to do in the implementation, Yong during the GC can do it, incidentally, also can not do.

2. Concurrent markup

G1 looks for accessible (live) objects throughout the heap, recursively scanning the object graph throughout the heap. Each scanned object is marked and pushed onto the scan stack. Repeat the scan process until the scan stack is empty. During the process, references recorded by the SATB write barrier are also scanned, which is described later.

3. Final/relabeling (STW)

Process the remaining unprocessed SATB write barrier records during the concurrent marking phase. ** This pause is essentially different from CMS remark. The pause only needs to scan the SATB buffer (rescan the old references as the root to avoid missing marks). CMS remark needs to rescan the dirty card in the mod-Union table plus the whole root set. ** In this case, the whole Yong area will be treated as part of the root set regardless of whether the object is dead or dead, so CMS remark may be very slow.

4. Cleanup (STW)

Counting and resetting the marking state is similar to the Sweep phase in Mark-sweep, but instead of sweeping the actual objects on the heap, the marking bitmap counts how many objects are marked alive for each Region. If a Region is found with no live objects at all, the whole Region is reclaimed into the list of allocatable regions (the free list).

Copy Alive Objects (STW)

Also called filter recycle/clean (STW), it is responsible for copying live objects from a part of a Region to an empty Region, and then reclaiming the space of the original Region. In this stage, you can select any multiple regions to form a Collection Set (CSet) independently, which is implemented by the RSet of each Region. This is characteristic of the Regional garbage collector. The ParallelScavenge avenge is the Young GC algorithm used to copy live objects from each Region in the CSet to a new Region, and the whole process is completely frozen. According to the paper “Garbage-First Garbage Collection”, the selection of CSet depends entirely on the statistical model to find several regions with the highest revenue and the cost within the upper limit specified by the user. Since each Region is covered by RSet, It is ok to evacuate any one or more regions. There are two submodes for selecting CSet in generational G1 mode:

Yong GC: Select all regions in the Yong Region. Control GC overhead by controlling the number of regions in the Yong Region.
Mixex GC: Select all Yong Gen regions, plus the old regions with the highest revenue according to Global Concurrent marking statistics. Select the old Region with high revenue as far as possible within the cost target range specified by the user

You can see that the Yong Region is always inside the CSet, so generational G1 does not maintain RSet updates for reference designs from the Yong Region.

Summary of workflow

The normal workflow of generational G1 (there is only generational G1, the others are not yet available) is to switch between YongGC and Mixed GC as the case may be, with global concurrency markers periodically behind it. Initialization flags are executed on YongGC by default; G1 does not choose to do MixedGC when global concurrent tokens are working, and does not initiate initialization tokens when MixedGC is in progress. There is no concept of a Full GC in a normal workflow, and the collection of Old sections is done entirely by MixedGC.

The problem

How to ensure that the collection thread and the user thread do not interfere with each other?

In the implementation details of the algorithm, the “three-color marking” algorithm is described. This algorithm clarifies all the states of an object during garbage collection, white, black, and gray. Garbage collection process, the state of an object may appear black object references the white object or gray with white object references between objects have disconnected (when two conditions at the same time satisfy the standard/wrong mark will appear when), in fact, that is, the original object structure in garbage collection process is broken, there are two solutions of this kind of situation: Incremental updates, raw snapshots. The G1 GC uses raw snapshots (SATB) and CMS uses incremental Updates.

SATB (Snapshot At The Begining)

SATB is a means of maintaining the correctness of concurrent GC, in the abstract

Live objects are considered alive at the start of a GC, at which point the object graph forms a logical “snapshot”;
Newly allocated objects are treated as alive during GC, and other unreachable objects are dead.

How does G1 know which objects are newly allocated after GC starts?

Each Region records two Pointers to TAMS (Top At Mark Start), prevTAMS and nextTAMS. G1 assigns newly allocated objects to TAMS during concurrent marking. Objects above TAMS are newly assigned and are therefore considered implicit tokens. G1 uses two bitmaps for concurrency markers:

PrevBitmap records the survival status of the object from the n-1 round of concurrent markup. Since the n-1 round of concurrent markup has been completed, the bitmap information can be used directly
NextBitmap records the results of the NTH round of concurrent marking. This bitmap is the result of current concurrent marking or ongoing marking and is not yet available

Each Region has several Pointers:

Top is the current allocation pointer of the Region, [bottom,top) is the used part of the Region, and [top,end) is the unused allocatable space of the Region.

[Bottom,preTAMS) The survival of the object here is known by prevBitmap
PrevTAMS,nextTAMS) is implicitly alive in the n-1 round of concurrent tokens
[nextTAMS,top) this part of the object is implicitly alive in the NTH round of concurrent markup

How does G1 handle changes to object references made by user threads during the concurrent marking phase?

SATB write barrier is a loop cut of the assignment to a reference field, which is covered by the barrier before and after the assignment. The part before the assignment is called a pre-write barrier, and the one after is called a post-write barrier. As we discussed in the JVM memory set article, other garbage collectors prior to the G1 GC only used post-write barriers. SATB maintains the logical snapshot of the “live object at GC start” state. Instead of marking the entire object graph from root, all you need to do is use a pre-write barrier to record the old reference values every time the reference relationship changes. In this way, when an object is concurrently marked, all changes in the reference type fields of that object are recorded, so that no objects alive in snapshot are missed. Of course, it is possible that an object is alive in the Snapshot, but as the concurrent GC progresses, it is dead but SATB will let it live through the GC, resulting in floal garbage. So in the G1 GC, the entire write barrier+oop_field_store looks like this:

void oop_field_store(oop* field, oop new_value) {  
  pre_write_barrier(field);             // pre-write barrier: for maintaining SATB invariant  
  *field = new_value;                   // the actual store  
  post_write_barrier(field, new_value); // post-write barrier: for tracking cross-region reference  
}  
Copy the code

The process of pre-write barrier is as follows:

void pre_write_barrier(oop* field) {  
  oop old_value = *field;  
  if(old_value ! = null) {if ($gc_phase == GC_CONCURRENT_MARK) { // SATB invariant only maintained during concurrent marking  
      $current_thread->satb_mark_queue->enqueue(old_value); }}}Copy the code

The enqueue action is the actual code in G1SATBCardTableModRefBS: : the enqueue (oop pre_val), It determines whether JavaThread:: SATb_mark_queue_set ().is_active() is currently used in the concurrent marking phase. SATBMarkQueueSet is only set to active in the concurrent marking phase. CMS’s incremental update design makes it necessary to rescan all thread stacks and the entire Yong area as Root during the Remark phase; However, G1 SATB design only needs to scan the remaining SATb_mark_queue in the Remark phase

Why does the pre-write barrier only put old references into the SATBMarkQueue, not into the tag stack?

In order to reduce the impact of write barriers on user thread performance, G1 moves some of the tasks that should be done in the barrier to other threads for concurrent execution. This separation is achieved through logging. User threads only record information about what they want to do in a barrier to a queue, and other threads pull information from the queue to complete the rest of the action.

Take the SATB write barrier as an example. Each Java thread has an independent, fixed lengthSATBMarkQueueThe user thread pushes only old_value into the barrier. When a queue is full, it is added to the global SATB queue set, SATBMarkQueueSet, to wait for processing, and the corresponding Java thread is given a new, clean queue to continue executing.

The size of the global SATB queue collection is periodically checked during concurrent marking.When the number of queues in SATBMarkQueueSet**** exceeds a certain threshold, the concurrent tag thread processes all the queues in the collection, **** marks each object recorded in the queue and pushes its reference field behind the tag stack for further marking.

How to resolve cross-region reference?

The JVM uses Remember sets to avoid full heap for GC Roots scanning, as described in the previous article. If you have forgotten the memory Set, you can go back to the JVM memory Set.

Remember Set

G1’s heap, like other GC’s, has a cart table that covers the entire heap. Logically speaking, G1’s RSet should have one for each Region. This RSet records cards pointing to the Region from other regions. So it’s a ponits-into RSet.

The RSet implemented with card table is usually points-out, that is, the card table needs to record Pointers to other ranges from its coverage. Take the card table of generational GC as an example, to record cross-generation Pointers to yong from old area. The card that’s marked is in the old block.

G1 GC adds a layer of structure on top of ponites-out card table to form points-into RSet: each Region records which other regions have Pointers to it, and these Pointers are in the range of which cards.The RSet is actually a Hash Table. The key is the starting address of another Region, and the value is a collection. The element in the RSet is the index of the card Table, which corresponds to a card page. The seconds in the figure below are the points-into RSet relationshipTips for Tuning the Garbage First Garbage Collector

For example:

If A key of Region A’s RSet is RegionB and A card with index 1234 is in value of Region A’s RSet, it means that A card of RegionB has A reference to Region A. So for A, the RSet records the points-into relationship, while the card table still records the points-out relationship (don’t get too involved in points-into and points-out).

disadvantages

The excessive number of REGIONS in G1 can cause the G1 collector to consume more memory than other collectors, and it has been a rule of thumb that the G1 collector consumes approximately 10% to 20% of Java heap memory to maintain its work.

How is RSet updated

G1 uses a post-write barrier to update the RSet. Theoretical post-write barrier logic

The card page pointing to the field is first fetched
Check whether the card page is marked as dirty. If it is, no action is taken
Make the current card page “dirty” to prevent multiple fields belonging to the same card page from repeating this logic
In G1, both the Yong GC and Mixed GC process the Yong GC. Therefore, RSet maintenance is involved in filtering references from the Yong GC. (Since GC can scan, why waste time updating the region?)
Find the Region to which the card page belongs
Find the Region in the heap that refers to Region
Update rsets for cart tables and regions

Similarly to SATBMarkQueue, each Java thread consists of a DirtyCardQueue, which has a global DirtyCardQueueSet. The RSet update is performed concurrently by multiple ConcurrentG1RefineThread. When the number of queues in the global set exceeds a specified threshold, ConcurrentG1RefineThread fetches several queues. Traverses the card recorded in each queue and adds the card to the RSet of the corresponding Region.

How to build a reliable pause prediction model?

-xx :MaxGCPauseMillis can be used to specify the expected pause time. The G1 GC pause prediction model is based on Decaying Average. During garbage collection, The G1 collector records the cost of each measurable step, such as the collection time of each Region, the number of dirty cards in each Region’s memory set, and analyzes statistical information such as the mean, standard deviation, and confidence. A “decaying average” means that it is more susceptible to new data than a regular average, which represents the overall average, but a decaying average more accurately represents the “most recent” average. In other words, the newer the statistical status of a Region is, the more it determines the value of its collection. This information is then used to predict which regions will make up the collection to get the most benefit without exceeding the expected pause time if the collection starts now.

advantages

The G1 collector is not a pure low-latency collector; its official goal is to achieve the highest throughput possible with manageable latency. The design philosophy is also different from that of previous garbage collectors, which have gone from cleaning up the entire Java heap at once to pursuing a memory allocation rate that can handle the application. 3. No space debris will be generated. When the program allocates memory for large objects, it is not easy to find continuous memory and trigger the next GC or Full GC in advance. You only need to scan for references in the SATB buffer.

disadvantages

If the Mixed GC cannot keep up with the speed of memory allocation and the Old region fills up and cannot allocate memory, it will switch to Serial Old GC outside G1 to collect the entire Java heap (including Yong, OId, Permgen), also known as Full GC. G1 in this state is the same as the Full GC of -xx :+UseSerialGC (the core code behind this is common to both)

G1 GC System. The default is a Full GC, GC () is the Serial Old GC, only add – XX: + ExplicitGCInvokesConcurrent will use its own concurrent GC to perform System. The GC (), The effect of system.gc () is to forcibly start a global Concurrent marking; Normally, only the initial mark is made and then returned, and subsequent concurrent marks are executed as usual.

Matters needing attention

Do not set -xx :MaxGCPauseMillis too low. Setting it too low will cause G1 garbage collection to fail to keep up with object allocation, which can lead to garbage accumulation and eventually to Full GC.

Related parameters

parameter	meaning
-XX:G1HeapRegionSize=n	Sets the Region size, not the final value
-XX:MaxGCPauseMills	Set the target time for G1 collection. The default is 200
-XX:G1NewSizePercent	Minimum value of new generation, default value 5%
-XX:G1MaxNewSizePercent	Maximum for new generation, 60% by default
-XX:ParallelGCThreads	Number of parallel GC threads during STW
-XX:ConcGCThreads=n	Concurrent marking phase, number of threads executing in parallel
-XX:InitiatingHeapOccupancyPercent	Sets the Java heap usage threshold for trigger marking cycles. The default value is 45%. The Java heap ratio here refers to non_YOUNG_CAPACity_bytes, including old+humongous