G1 (Garbage-First) Garbage Collector

Because the garbage collector summary in the last article was already long, and G1 is important enough to deserve its own treatment, this separate article summarizes the G1 garbage collector.

Garbage collectors (PS+PO, CMS, etc.)

The PS+PO and CMS collectors were fairly easy for me to learn, but with G1 the Java heap layout changes quite a bit, and things get progressively more confusing. If you don't understand it at first, read it a few more times and consult additional material!

A brief introduction to G1

G1 divides the heap into independent regions of equal size. The young generation and old generation are no longer physically separated, only logically defined: each Region can serve as Eden space, Survivor space, or old-generation space of the young or old generation as required. In addition, there is a special kind of Region called Humongous, used specifically to store large objects: if a new object is larger than half the Region size, it is allocated directly in one or more contiguous Regions labeled H. The layout looks like this:

As mentioned above, G1's heap memory is divided into multiple equal-size regions, but how many? I found some information:

The goal is to have around 2048 regions for the total heap.

A Region is a logically contiguous segment of space, with a size between 1MB and 32MB. Where do those numbers come from, and why should you believe me about 32MB and 2048? The HotSpot source provides the evidence:

#ifndef SHARE_VM_GC_G1_HEAPREGIONBOUNDS_HPP
#define SHARE_VM_GC_G1_HEAPREGIONBOUNDS_HPP

#include "memory/allocation.hpp"

class HeapRegionBounds : public AllStatic {
private:
  // Minimum region size; we won't go lower than that.
  // We might want to decrease this in the future, to deal with small
  // heaps a bit more efficiently.

  // The minimum Region size is 1 MB
  static const size_t MIN_REGION_SIZE = 1024 * 1024;

  // Maximum region size; we don't go higher than that. There's a good
  // reason for having an upper bound. We don't want regions to get too
  // large, otherwise cleanup's effectiveness would decrease as there
  // will be fewer opportunities to find totally empty regions after
  // marking.
  
  // The maximum Region size is 32 MB
  static const size_t MAX_REGION_SIZE = 32 * 1024 * 1024;

  // The automatic region size calculation will try to have around this
  // many regions in the heap (based on the min heap size).
  
  // The default Region number is 2048
  static const size_t TARGET_REGION_NUMBER = 2048; 

public:
  static inline size_t min_size();
  static inline size_t max_size();
  static inline size_t target_number();
};

#endif // SHARE_VM_GC_G1_HEAPREGIONBOUNDS_HPP



From: hg.openjdk.java.net/jdk/jdk/fil…

This is authoritative, but do these exact numbers matter much? I don't think they are critical; still, it is even better to know them!
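Based on the constants in that header, we can sketch how a region size might be derived from the heap size: aim for roughly 2048 regions, round down to a power of two, and clamp to [1MB, 32MB]. This is a simplified illustration, not HotSpot's exact code (the real ergonomics also consider the average of the initial and maximum heap sizes):

```java
public class RegionSizer {
    static final long MIN = 1L << 20;    // MIN_REGION_SIZE: 1 MB
    static final long MAX = 32L << 20;   // MAX_REGION_SIZE: 32 MB
    static final long TARGET = 2048;     // TARGET_REGION_NUMBER

    // Sketch: heap / 2048, rounded down to a power of two, clamped to [1 MB, 32 MB].
    static long regionSize(long heapBytes) {
        long size = Math.max(heapBytes / TARGET, MIN);
        size = Long.highestOneBit(size);            // round down to a power of two
        return Math.min(Math.max(size, MIN), MAX);
    }

    public static void main(String[] args) {
        // A 4 GB heap yields 2 MB regions under this sketch.
        System.out.println(regionSize(4L << 30) >> 20);  // 2
    }
}
```

So small heaps bottom out at 1MB regions, and anything above about 64GB tops out at 32MB regions.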

Without further ado: G1 redefines the heap space. It breaks up the fixed generational layout and divides the heap into Regions, so a collection does not have to cover the whole heap; this is its most notable characteristic. The benefit of not collecting the full heap is a predictable pause time: the user can specify how long a collection is allowed to take. If the user specifies a collection time of 0.05s, how does G1 get close to that target when it collects garbage? By collecting as few Regions as possible.

To restate the above: the G1 collector can establish a predictable pause-time model because it can systematically avoid collecting across the entire Java heap. G1 uses a reasonable cost model to quantify the collection cost of each Region, so that within a given pause-time limit it can always select a group of suitable Regions as the collection target and keep the collection cost within that limit, achieving near-real-time collection.
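The cost model can be illustrated with a toy greedy selection: given per-Region cost estimates (the names and numbers below are invented for the example), pick the Regions with the best reclaimable-bytes-per-millisecond payoff until the pause budget is spent. G1's real model is far more sophisticated; this only shows the "collect as few Regions as needed" idea:

```java
import java.util.*;

public class RegionSelector {
    // Hypothetical per-region stats: estimated collection cost and reclaimable bytes.
    record Region(String name, double costMs, long reclaimableBytes) {}

    // Greedy sketch: sort by payoff per millisecond, take regions while the
    // pause budget allows.
    static List<Region> select(List<Region> candidates, double budgetMs) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(
                (Region r) -> -((double) r.reclaimableBytes() / r.costMs())));
        List<Region> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : sorted) {
            if (spent + r.costMs() <= budgetMs) {
                chosen.add(r);
                spent += r.costMs();
            }
        }
        return chosen;
    }
}
```

With a 50 ms budget and three candidate Regions, the selector takes only the highest-payoff Region it can afford and skips the rest, which is exactly how G1 keeps pauses near the target.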

Now that we’ve taken a look at G1 at a macro level, let’s take a look at its execution flow steps:

How the G1 collector works

Initial Marking

This phase only marks the objects directly reachable from GC Roots and changes the value of TAMS (Next Top at Mark Start), so that when the user program runs concurrently in the next phase, new objects are allocated at the correct position within each Region. This phase requires pausing user threads, but it is very short!

Concurrent Marking

Starting from GC Roots, perform reachability analysis over the heap to find live objects. This is the most time-consuming phase, but it runs concurrently with user threads.

Final Marking

This phase corrects the part of the marking record that changed while the user program kept running during concurrent marking. The virtual machine records those changes in each thread's Remembered Set Logs; the final marking phase merges the Remembered Set Logs data into the Remembered Set. This phase requires pausing user threads, but it can be executed in parallel. (The Remembered Set shows up again here; more on it later.)

Filtering and Evacuation

Update Region statistics, rank each Region by collection value and cost, and make a collection plan based on the pause time the user expects.

Any number of Regions can be selected to form the collection set. The surviving objects of those Regions are then copied into empty Regions, and the entire space of the old Regions is cleared. Because this involves moving live objects, user threads must be paused, and the work is done in parallel by multiple collector threads.

The G1 workflow above keeps mentioning the Remembered Set. So what is a Remembered Set?

The technical points involved in G1

RSet (Remembered Set)

Each Region has an RSet that records the references pointing into that Region. For example, if an object in Region A references an object in Region B, Region B's RSet records that fact. This is what makes the collection process parallelizable and lets each Region be collected independently. The overall memory overhead of RSets is less than 5% of the heap.
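A remembered set can be sketched as a map from foreign regions to the card locations in them that may reference this Region (the class and method names below are illustrative, not HotSpot's actual structures):

```java
import java.util.*;

// Minimal remembered-set sketch: each region tracks WHERE incoming references
// live, so it can be evacuated without scanning the whole heap for pointers.
public class RememberedSet {
    private final Map<String, Set<Integer>> incoming = new HashMap<>();

    // Record that card `cardIndex` inside region `fromRegion` may reference us.
    void recordIncomingRef(String fromRegion, int cardIndex) {
        incoming.computeIfAbsent(fromRegion, k -> new HashSet<>()).add(cardIndex);
    }

    // When evacuating this region, only these locations need scanning to find
    // and fix up references into it.
    Map<String, Set<Integer>> locationsToScan() {
        return incoming;
    }
}
```

The point is the granularity: the RSet stores card-level locations, not individual references, which keeps it compact while still bounding the scan work per collected Region.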

Card Table

We also covered card tables in the last JVM garbage collector summary article! Let's review together:

When we talked about CMS in the last article, there was this scenario: CMS is an old-generation collector, but during concurrent marking it also has to scan the young generation, because there is no guarantee that an unreachable object stays unreachable during the concurrent phase (an object disconnected from GC Roots may be reconnected while user threads run). Fully rescanning both the young and old generations would be particularly slow! That is where the Card Table comes in. It is just an array, where each element holds one byte; CMS divides the old-generation space into 512-byte blocks, and each element in the card table corresponds to one block.

During concurrent marking, if an object's references change, the block where the object resides is marked as a dirty card.

It doesn’t matter if you didn’t read my last post, I posted it here:

Let’s use a picture to explain:

State of the object when marking concurrently:

But then the reference of current obj changed:

The block where current obj resides is marked as a Dirty card.

Then comes the pre-cleaning phase, one of whose tasks is to re-mark the objects that were modified during the concurrent marking phase. The objects made reachable through current obj are then marked, as follows:

The dirty-card flag is then removed. That is how the card table works for the old generation.

This is one of the functions of the Card Table, but the Card Table also has another function:

Let's imagine a scenario: an old-generation object may reference a young-generation object. When marking live objects for a Minor GC, such an old-to-young reference also counts as a GC Root; would we then have to scan the entire old generation? The cost is too high. Instead of scanning the whole old generation during a Minor GC, we can look for dirty cards in the card table and add the objects in those dirty cards to the Minor GC roots. After all dirty cards are scanned, the Java virtual machine clears them.

The card table thus spares us full scans of the old generation, which greatly improves GC efficiency.
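A card table can be sketched as a byte array with one entry per 512-byte block of the heap. This is a minimal illustration; HotSpot's card table has more states and is maintained by JIT-emitted barrier code:

```java
public class CardTable {
    static final int CARD_SHIFT = 9;              // 2^9 = 512-byte cards
    static final byte CLEAN = 0, DIRTY = 1;
    final byte[] cards;

    CardTable(long heapBytes) {
        cards = new byte[(int) (heapBytes >> CARD_SHIFT)];
    }

    // Write-barrier sketch: dirty the card covering the written address,
    // so the GC later scans only dirty cards instead of the whole space.
    void markDirty(long address) {
        cards[(int) (address >> CARD_SHIFT)] = DIRTY;
    }

    boolean isDirty(long address) {
        return cards[(int) (address >> CARD_SHIFT)] == DIRTY;
    }
}
```

Note that two addresses inside the same 512-byte block map to the same card, which is why a dirty card means "rescan this block", not "rescan this object".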

Tri-color marking

As objects are traversed during reachability analysis, they are colored according to whether they have been visited:

  • White: the object has not yet been visited by the garbage collector. At the start of reachability analysis all objects are white, and any object still white at the end of the analysis is unreachable.
  • Black: the object has been visited by the garbage collector and all of its references have been scanned. A black object is alive and will survive; if another object references a black object, it does not need to be scanned again. A black object can never point directly to a white object without passing through a gray object.
  • Gray: the object has been visited by the garbage collector, but at least one of its references has not been scanned yet.

As you can see, the gray object is an intermediate state between the black object and the white object. When the marking process is over, there will only be black and white objects, and the white object is the object that needs to be reclaimed.
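The marking process can be demonstrated with a small loop over a toy object graph, with adjacency lists standing in for reference fields (a sketch of the algorithm, not HotSpot's implementation):

```java
import java.util.*;

// Tri-color marking sketch: white = never visited, gray = queued for
// scanning, black = fully scanned.
public class TriColorMarker {
    enum Color { WHITE, GRAY, BLACK }

    static Map<Integer, Color> mark(Map<Integer, List<Integer>> graph, List<Integer> roots) {
        Map<Integer, Color> color = new HashMap<>();
        graph.keySet().forEach(n -> color.put(n, Color.WHITE));
        Deque<Integer> gray = new ArrayDeque<>(roots);
        roots.forEach(r -> color.put(r, Color.GRAY));
        while (!gray.isEmpty()) {
            int n = gray.poll();
            for (int ref : graph.getOrDefault(n, List.of())) {
                if (color.get(ref) == Color.WHITE) { // reach a white object: turn it gray
                    color.put(ref, Color.GRAY);
                    gray.add(ref);
                }
            }
            color.put(n, Color.BLACK);               // all of n's references scanned
        }
        return color;  // objects left WHITE are unreachable garbage
    }
}
```

When the loop ends the gray set is empty, matching the statement above that only black and white objects remain.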

During the concurrent marking phase, the garbage collector and user threads work at the same time, and this can cause problems: while the collector is coloring the object graph, the user thread may be modifying reference relationships, so the object graph changes underneath it.

One possibility is to mismark a dead object as alive. This error is tolerable: it merely produces a bit of floating garbage that escapes this collection and can be cleaned up next time.

The other is to mistakenly mark a live object as dead. The consequences are serious: an object the program still needs gets reclaimed, and the program will fail as a result. This is absolutely intolerable!

So next we focus on the object-disappearance problem. Let's first look at what the normal case looks like, so that the disappearing-object cases are easier to understand.

The following is a quote from a blog post by Why Technology

The first picture is the initial state, which is very simple: only the GC Roots are black. Note that the arrows in the pictures below represent directed edges; for example, one of the reference chains is: root node ->5->6->7->8->11->10

During the scan, the changes look like this:

If you look at the GIF above, the gray objects always sit on the boundary between black and white. When the scan is complete, the object graph looks like this:

At this point, the black object is alive, and the white object is dead and can be reclaimed.

Remember, this is a normal case where everything is fine

Object disappearance: case one

If the user thread modifies the reference relationship while marking, the following situation occurs:

When the scan is complete, the object graph looks like this:

Comparing this with the object graph after a normal scan, we can clearly see the problem: after scanning ends, object 9, which is still referenced by object 5, is white, so under the tri-color marking rules object 9 will be reclaimed as garbage.

Object disappearance: case two

Now let’s look at another case where an object disappears

As demonstrated above, the user thread cuts a reference and then re-attaches part of the original reference chain to a black object.

Objects 7 and 10 are already part of the original reference chain (root node ->5->6->7->8->11->10). The modified reference chain becomes (root node ->5->6->7->10).

When the scan is complete, the object graph looks like this:

Since black objects are not rescanned, objects 10 and 11 will both be reclaimed after the scan ends, even though both were part of the original reference chain before the modification.

So concurrent marking brings not only floating garbage but also disappearing objects.

Resolve object disappearance problem

The two object-disappearance scenarios shown above must both satisfy two conditions:

  • Condition 1: the mutator inserts one or more new references from black objects to white objects

  • Condition 2: the mutator deletes all direct or indirect references from gray objects to that white object

When these two conditions are met at the same time, the object will disappear. Let’s take a look at these two conditions:

References between black object 5 and white object 9 are newly created, corresponding to condition 1.

The reference between black object 6 and white object 9 is removed, corresponding to condition 2.

Since the object disappears only when both conditions are met, we can solve the problem by breaking either condition;

This led to two solutions: Incremental Update and Snapshot At The Beginning (SATB).

In the HotSpot virtual machine, CMS does concurrent marking based on incremental updates, whereas G1 uses raw snapshots.

Incremental updating

Incremental update breaks the first condition: when a black object gains a new reference to a white object, the newly inserted reference is recorded. After the concurrent scan ends, the black objects in these recorded references are rescanned.

You can simplify this as: a black object turns gray again once a reference to a white object is inserted.

After a concurrent scan, black object 5 points to white object 9:

Object 9 is then rescanned and marked black. It is not reclaimed, so it does not disappear.

The original snapshot

The original snapshot breaks the second condition: when a gray object deletes a reference to a white object, the deleted reference is recorded. After the concurrent scan ends, the gray objects in these recorded references are taken as roots and rescanned.

This can be simplified as: regardless of whether references are deleted, marking proceeds over the snapshot of the object graph taken at the moment the scan started.

Note that whether it is insertions or deletions being recorded, the virtual machine implements the recording through write barriers. Write barriers are an important topic in their own right, but they are not the focus of this article and will not be covered in detail.

Just two things to add:

1. The write barrier here is not the same thing as the “memory barrier” used to solve concurrent out-of-order execution problems.

2. A write barrier can be viewed as a VM-level AOP aspect around the action "assign to a reference-typed field". When a reference assignment happens, an around-style notification lets the VM perform extra actions; that is, both the point before and the point after the assignment are covered by the write barrier. The barrier before the assignment is called the pre-write barrier, and the one after it is called the post-write barrier.

Therefore, after simple deduction, we can know:

Incremental update uses the post-write barrier to record all newly inserted references.

The original snapshot uses the pre-write barrier to record the old value of every reference that is about to be deleted.
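Putting the two together, here is a toy illustration of the two barrier flavors. The `Obj` class and the queues are invented for the example; real barriers are emitted by the JIT compiler and feed per-thread queues:

```java
import java.util.*;

// Sketch of the two write-barrier flavors around a reference-field assignment.
public class WriteBarriers {
    static class Obj { Obj field; String name; Obj(String n) { name = n; } }

    static final List<Obj> satbQueue = new ArrayList<>();       // old values (SATB / G1)
    static final List<Obj> incUpdateQueue = new ArrayList<>();  // new values (incremental update / CMS)

    // Pre-write barrier: BEFORE overwriting, record the reference being deleted.
    static void preWriteBarrier(Obj holder) {
        if (holder.field != null) satbQueue.add(holder.field);
    }

    // Post-write barrier: AFTER the store, record the newly inserted reference.
    static void postWriteBarrier(Obj newValue) {
        if (newValue != null) incUpdateQueue.add(newValue);
    }

    // Every reference-field assignment is wrapped by the barriers (the "AOP aspect").
    static void writeField(Obj holder, Obj newValue) {
        preWriteBarrier(holder);
        holder.field = newValue;
        postWriteBarrier(newValue);
    }
}
```

Writing `a.field = b` and then `a.field = c` puts `b` into the SATB queue (it was the value about to be deleted) and both `b` and `c` into the incremental-update queue, which mirrors "old references" versus "new references" above.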

Full GC

This is the last-resort phase in G1 (as in most collectors), mainly caused by old-generation memory pressure. For example, a program that uses a template-parsing engine but does not extract the variables can easily lead to Full GC, as can the presence of many Humongous objects. If Full GCs occur in a program, then besides GC-related tuning, it is also worth spending time optimizing the business code.

Optimizing for Full GC

  • Increase the heap size, or adjust the ratio of old generation to young generation; the effect is that G1 gets more time to complete Concurrent Marking

  • If the cause may be too many Humongous objects, check the number of Humongous regions with -Xlog:gc+heap=info, and consider increasing the Region size with -XX:G1HeapRegionSize to prevent large objects from occupying too much old-generation space.

  • Add threads for Concurrent Marking, set with -XX:ConcGCThreads

  • Start the concurrent cycle earlier; by default it starts when 45% of the entire heap is occupied

Parameters

  • -XX:+UseG1GC

    Using the G1 collector

  • -XX:MaxGCPauseMillis=200

    Specifies the target pause time. The default is 200 ms.

    When setting the -XX:MaxGCPauseMillis value, do not target the average pause time; instead aim for 90% of pauses to fall within this time. Keep in mind that the pause-time goal is a goal: it cannot always be met.

  • -XX:InitiatingHeapOccupancyPercent=45

    When the heap usage reaches this ratio, concurrent GC cycles are triggered, which defaults to 45%.

  • -XX:NewRatio=n

    The old/young generation ratio. The default is 2, i.e. young is 1/3 and old is 2/3 of the heap.

    Do not set the young generation to a fixed size, otherwise:

    • G1 can no longer meet our pause-time goal
    • G1 cannot expand or shrink the young generation on demand
  • -XX:SurvivorRatio=n

    The Eden/Survivor ratio, default 8; this is the same as in other generational collectors

  • -XX:MaxTenuringThreshold=n

    The age threshold for promotion from the young generation to the old generation is the same as for other generational collectors

  • -XX:ParallelGCThreads=n

    The number of garbage collection threads in parallel collection

  • -XX:ConcGCThreads=n

    Number of garbage collection threads in the concurrent marking phase

    Increasing this value makes concurrent marking complete more quickly. If this value is not specified, the JVM calculates it using the following formula:

    ConcGCThreads = (ParallelGCThreads + 2) / 4

  • -XX:G1ReservePercent=n

    The percentage of heap memory reserved, 10 by default, to reduce the risk of promotion failure, i.e. 10% of the heap is set aside by default.

  • -XX:G1HeapRegionSize=n

    The size of each Region. By default it is calculated from the heap size and is a power of two between 1MB and 32MB. We usually specify only the overall heap size and let the JVM pick.

This article on the JVM ends here for now and will be filled in later as I learn more! Anything that can be solved with effort is, I think, the simplest kind of thing!