Previous articles
1. JVM (1): the runtime data areas
2. JVM (2): the Java object creation process
3. JVM (3): allocating memory for object creation
4. JVM (4): garbage collection algorithms
5. JVM (5): introduction to garbage collectors
6. JVM (6): the CMS garbage collector
Last time we looked at how CMS works; this time we start on G1. It is still fairly demanding material, so I don't recommend skipping the basics covered earlier.
One, partitioning and generations
1. The purpose and drawbacks of generations
As mentioned in Article 4, the goal of generational collection is to avoid scanning the entire heap at once and to scan one generation at a time instead, which reduces the time spent in garbage collection. But when the JVM heap is very large, say 64 GB, the old and the young generation can each be tens of gigabytes. At that size even a single generation is a lot of memory to scan, which leads to long pauses (remember the impossible triangle at the end of Article 5). Over the years, as application memory has kept growing, this problem has only gotten worse.
2. The advance of partitioning
Since generations alone cannot solve the problem of scanning too much memory on a large heap, collectors from G1 onward no longer divide the heap into physically separate generations and use partitioning instead. Put simply, the heap is divided into many small regions, and each garbage collection only needs to scan some of them. This avoids the long pauses caused by scanning too much memory when the total heap is large. The partitioned design was a big step forward for JVM garbage collection: designers realized that a collection does not need to scan a whole generation every time, as long as the collector can keep up with the rate of memory allocation. Since then, garbage collectors have also become smarter and easier to use.
Being smarter and easier to use doesn't mean a collector is less complex inside. It only feels simpler because more effort and smarter design went into it behind the scenes.
3. A common misunderstanding about generations and partitions
When interviewing candidates, I used to ask: once the heap is partitioned, are there no generations any more? It is a simple question, but many candidates got it wrong, and several claimed that G1, being partitioned, is not generational. What needs to be said here is that partitioning and generations do not conflict. They share the same purpose, which is to avoid scanning too much memory in one go (that easily causes long pauses), but they use different means. Generational collection divides memory into a young and an old generation according to object age, so that areas that probably don't need scanning are not scanned frequently; the GC frequency of the two generations is completely different (GC is frequent in the young generation and comparatively rare in the old generation). Partitioning limits the pause by controlling how much memory each collection scans. So the two can be used together.
Two, Region and RSet of G1
1. Region
G1 divides the heap into equally sized regions. They need not be physically contiguous, but logically they form one contiguous heap address space. Each Region is marked E, S, O, or H, for Eden, Survivor, Old, and Humongous respectively. E and S belong to the young generation; O and H belong to the old generation. The layout of G1 regions is shown in the following figure:
Objects that are equal to or larger than half of a Region are placed in H regions, which avoids copying large objects during GC. A Full GC is triggered when no contiguous area large enough to hold a large object can be found.
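To make the humongous rule concrete, here is a minimal sketch. It is not HotSpot code: the class and method names are made up for illustration, and the threshold follows the rule as stated in this article (equal to or larger than half a Region).

```java
public class HumongousCheck {
    static final long MB = 1024L * 1024L;

    /** True when the object counts as humongous under the rule stated in the article. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    /** Number of contiguous regions needed to hold an object of the given size. */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes; // ceiling division
    }

    public static void main(String[] args) {
        long region = 4 * MB; // assumed region size, e.g. -XX:G1HeapRegionSize=4m
        System.out.println(isHumongous(1 * MB, region));
        System.out.println(isHumongous(2 * MB, region));
        System.out.println(regionsNeeded(9 * MB, region));
    }
}
```

With 4 MB regions, a 9 MB object needs three contiguous regions, which is why a fragmented heap can fail to place it.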
2. RSet and Card
If, during a young GC, some objects in the young regions are still referenced by objects in the old generation, then in order to avoid scanning the whole old generation, each region of G1 maintains a Remembered Set, RSet for short, which records all references from external Old regions into this region. During GC it is then enough to scan the entries in the RSet.
The RSet maintains references from external Old regions into the local Region, but it does not include references from young regions. The young regions change so much that recording them is not worthwhile; the young generation is collected as a whole at every GC anyway, so any such reference is found simply by scanning the young regions.
Card: each Region is divided into several cards, and each Card is 512 bytes in size. Why bring up the Card here? Because the RSet records references at card granularity: if an object in Card1 of Region1 references an object in Region2, the RSet of Region2 contains the address of Region1's Card1. The relationship between Regions, RSets, and Cards is shown in the following figure:
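The Region, RSet, and Card relationship can be sketched in a few lines. This is a simplified model, not G1's actual data structure: the 1 MB region size, the class name, and the use of heap offsets instead of real addresses are assumptions, and the sketch ignores the rule that only references from Old regions are recorded.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class CardTableSketch {
    static final int CARD_SHIFT = 9;           // 2^9 = 512 bytes per card
    static final long REGION_BYTES = 1L << 20; // assumed 1 MB regions

    static long cardIndex(long heapOffset) { return heapOffset >>> CARD_SHIFT; }
    static int regionIndex(long heapOffset) { return (int) (heapOffset / REGION_BYTES); }

    // One RSet per region: the cards (located in other regions) that hold references into it.
    private final Map<Integer, Set<Long>> rsets = new HashMap<>();

    /** Records that the field at fromOffset now points to the object at toOffset. */
    void recordReference(long fromOffset, long toOffset) {
        int from = regionIndex(fromOffset);
        int to = regionIndex(toOffset);
        if (from == to) return; // references inside one region need no RSet entry
        rsets.computeIfAbsent(to, k -> new TreeSet<>()).add(cardIndex(fromOffset));
    }

    Set<Long> rsetOf(int region) {
        return rsets.getOrDefault(region, Collections.emptySet());
    }
}
```

At collection time, only the cards listed in a region's RSet need to be scanned to find incoming references, instead of the whole old generation.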
Three, the work process
1. Several working modes
Since G1 is a collector that manages the entire heap, it uses different modes to deal with different areas filling up with objects.
young gc
This mode is the easiest to understand: it is the GC that occurs when the E regions are full, and it clears them. The heap before and after collection is shown in the following two figures:
As you can see, there are still S regions after the collection, but the E regions have been emptied.
In fact, G1's Young GC goes through phases similar to those of other collectors, such as root scanning, Update RS, Scan RS, and Object Copy, all of which show up clearly in the GC logs. Object Copy usually takes the longest, so most of the Young GC pause is spent in this phase.
mixed gc
This mode is called mixed collection because it collects all young regions and some old regions at the same time. Why only some of the old regions rather than all of them? Because one of G1's strongest features is its predictable pause model. We can use -XX:MaxGCPauseMillis to set the expected pause time; G1 tries its best to meet this target and selects the regions to reclaim accordingly. After each GC, the amount of garbage in each region is tallied, and later mixed GCs collect the regions with the most garbage first to get the greatest benefit, which is where the name Garbage First comes from. The trigger for mixed GC can be controlled with -XX:InitiatingHeapOccupancyPercent. Its default value is 45, meaning that mixed GC is triggered once the old generation occupies 45 percent of the heap.
The mixed GC process is divided into the following key steps:
1. Global concurrent marking
Global concurrent marking can be further broken down into the following steps:
- Initial mark (STW). Marks the objects directly reachable from the GC Roots. The initial mark piggybacks on a Young GC pause, so it adds no separate pause of its own.
- Concurrent marking. Starting from the GC Roots, this phase marks the objects in the heap. The marking threads run concurrently with the application threads and collect liveness information for each Region.
- Final marking (Remark, STW). Marks the objects whose references changed during the concurrent marking phase, so that they are not wrongly reclaimed.
- Cleanup (partly STW). Regions found to contain no live objects at all are reclaimed as a whole and returned to the list of allocatable regions.
It looks very similar to CMS, but the two need to be distinguished. Looking closely, there are notable differences: the initial mark shares a Young GC pause, and the reclamation step, which is concurrent in CMS, is STW in G1 because objects are copied at that point. If copying ran concurrently with the user threads, its correctness would be hard to guarantee.
2. Copying live objects (Evacuation)
The Evacuation phase is fully STW. It copies the live objects of a region into an empty region (copying in parallel) and then reclaims the original region. Any number of regions can be chosen freely to form the collection set (CSet), and which regions go into the CSet depends on the pause prediction model mentioned above. Not every region with live objects is evacuated in this phase; only a small number of high-yield regions are chosen, so that the cost of the pause stays within a certain range (the value we gave in -XX:MaxGCPauseMillis).
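The idea of picking only high-yield regions within a pause budget can be illustrated with a greedy sketch. This is an illustration of the principle, not G1's real selection code: the `Region` fields, the efficiency measure (garbage per predicted millisecond), and the way the young-generation cost is subtracted from the budget are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class CSetSketch {
    static final class Region {
        final String name;
        final long garbageBytes;   // reclaimable space found by marking
        final double predictedMs;  // predicted cost of evacuating this region
        Region(String name, long garbageBytes, double predictedMs) {
            this.name = name;
            this.garbageBytes = garbageBytes;
            this.predictedMs = predictedMs;
        }
        double efficiency() { return garbageBytes / predictedMs; }
    }

    /** Picks old regions, most efficient first, until the pause budget is used up. */
    static List<Region> chooseOldRegions(List<Region> candidates,
                                         double youngCostMs, double maxPauseMs) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort((a, b) -> Double.compare(b.efficiency(), a.efficiency()));
        List<Region> cset = new ArrayList<>();
        double remaining = maxPauseMs - youngCostMs; // young regions are always collected
        for (Region r : sorted) {
            if (r.predictedMs > remaining) break;    // budget exhausted
            cset.add(r);
            remaining -= r.predictedMs;
        }
        return cset;
    }
}
```

A larger pause target leaves more budget after the young regions, so more old regions fit into each mixed collection.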
full gc
A full GC is triggered when collection cannot keep up with allocation. It is worth noting that G1 has no full GC mode of its own: when a full GC occurs, a serial collector collects the whole heap, which causes a long pause. In my own experience with G1, if the configuration is right and the code is fine, full GC is actually very rare. G1 does not fall into full GC as easily as CMS, and it does not have the flaws that push CMS into full GC.
2. Problems and solutions in the marking process
In the earlier article on algorithms we mentioned that references can change during concurrent marking, and we discussed both read and write barriers. G1 uses SATB (Snapshot At The Beginning) to deal with this. The details are complicated; put simply, if a reference held by an object changes during GC, the old reference is put into a queue, and the re-marking step then scans using the references in that queue as starting points. This guarantees correctness under concurrent marking. The queue is called satb_mark_queue. Here's a quote from R大:
All you need to do is use a pre-write barrier to record the old reference every time a reference changes. This way, by the time the concurrent marker reaches an object, all changes to that object's reference fields have been logged, so no object that was live in the snapshot is missed. Of course, an object may well be live in the snapshot and then die as the concurrent GC progresses, but SATB will still let it survive this GC. The incremental update design of CMS makes it necessary to rescan all thread stacks and the entire young gen as roots during the remark phase; the SATB design of G1 only needs to scan the remaining satb_mark_queue in the remark phase, which removes the risk of a long STW that the remark phase of CMS carries.
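The pre-write barrier idea from the quote can be modelled in a few lines of Java. This is a toy model under stated assumptions, not HotSpot's barrier: `Obj`, `writeField`, and the single shared queue are hypothetical, each object has just one reference field, and real G1 uses per-thread buffers.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SatbSketch {
    static final class Obj {
        final String name;
        Obj field;      // the single reference field of this toy object
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    private final Deque<Obj> satbQueue = new ArrayDeque<>();
    private boolean markingActive;

    void startMarking() { markingActive = true; }

    /** Reference store with a pre-write barrier: log the OLD value before overwriting it. */
    void writeField(Obj holder, Obj newValue) {
        if (markingActive && holder.field != null) {
            satbQueue.add(holder.field);
        }
        holder.field = newValue;
    }

    /** Remark: drain the queue and mark everything reachable from the logged old values. */
    void remark() {
        while (!satbQueue.isEmpty()) {
            Obj o = satbQueue.poll();
            while (o != null && !o.marked) {
                o.marked = true;
                o = o.field;
            }
        }
        markingActive = false;
    }

    int queued() { return satbQueue.size(); }
}
```

In this model, an object that was reachable when marking started stays marked even if the only reference to it is overwritten mid-cycle, which is the "floating garbage" trade-off the quote describes.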
But what about new objects allocated during concurrent marking? G1 gives each Region a pointer called Top at Mark Start (TAMS). The part of the Region above this pointer is where new objects are allocated while marking runs concurrently. G1 treats these objects as live by default and leaves them out of the current collection.
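The TAMS rule boils down to one address comparison. The following is a minimal sketch with hypothetical names; addresses are plain numbers, and the set of marked addresses stands in for the marking bitmap.

```java
import java.util.Set;

public class TamsSketch {
    /** Objects at or above TAMS were allocated after marking started: implicitly live. */
    static boolean implicitlyLive(long objectAddress, long tams) {
        return objectAddress >= tams;
    }

    /** An object is treated as live if it is above TAMS or was marked below it. */
    static boolean consideredLive(long objectAddress, long tams, Set<Long> markedBelowTams) {
        return implicitlyLive(objectAddress, tams) || markedBelowTams.contains(objectAddress);
    }
}
```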
3. Establishment of predictable pause models
We mentioned earlier that -XX:MaxGCPauseMillis controls the pause time we expect, but how does G1 meet that expectation? It does so with a decaying average. During collection, G1 records metrics for each Region, such as the number of dirty cards, the collection time, and the number of live objects, computes statistics over them, and works out which regions yield the most when collected. In the next collection it picks the most profitable of these regions (note that these are old regions).
There is actually another concept involved here, the decaying standard deviation, which comes from statistics. To keep this article approachable I won't go into it; interested readers can look it up.
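For readers who do want a feel for the decaying average and decaying standard deviation, here is a small sketch. The update formulas are a standard exponentially weighted scheme and the class is hypothetical; the weight `alpha` and the "average plus a few deviations" prediction are assumptions, not values or code taken from G1.

```java
public class DecayingStats {
    private final double alpha; // weight given to history (assumed, chosen by the caller)
    private double avg;
    private double variance;
    private int count;

    DecayingStats(double alpha) { this.alpha = alpha; }

    /** Adds a sample; recent samples count more than old ones. */
    void add(double sample) {
        if (count == 0) {
            avg = sample;
            variance = 0.0;
        } else {
            avg = (1 - alpha) * sample + alpha * avg;
            double diff = sample - avg;
            variance = (1 - alpha) * diff * diff + alpha * variance;
        }
        count++;
    }

    double average() { return avg; }
    double stdDev() { return Math.sqrt(variance); }

    /** Conservative prediction: the average plus a multiple of the deviation. */
    double predict(double sigmas) { return avg + sigmas * stdDev(); }
}
```

Feeding it the measured cost of each collection gives a prediction that follows recent behaviour while still leaving a safety margin when the measurements are noisy.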
4. Common misunderstandings
- Trading pause time against throughput: do not naively assume that the smaller -XX:MaxGCPauseMillis is, the better. If the pause target is too short, each collection can clean up too little space, so the collector has to run more often and throughput drops. A pause target that is too low also lets garbage pile up, which ends in a full GC and therefore in longer pauses.
Like the -XX:+UseCMSCompactAtFullCollection parameter when we use CMS, this parameter is neither "the smaller the better" nor "the bigger the better"; the value should be chosen from actual load testing. In general it lies in the range of a few hundred milliseconds.
- Comparison with CMS: according to "Deep Understanding of the Java Virtual Machine" (I read the third edition), CMS is still likely to outperform G1 for small-memory applications, while G1 has the advantage for large-memory ones, with the break-even heap size somewhere between 6 GB and 8 GB. Our project team uses JDK 1.8.0_20, and I gave our UAT environment only 3 GB of JVM memory, yet in my tests G1 was much better than CMS in both pause time and GC frequency. I therefore recommend choosing G1 as the production collector on 1.8 and later, if possible.
3. Common parameters
In addition to the above parameters, G1 also has the following important parameters:
- -XX:G1HeapRegionSize=n
This sets the size of each region. The value ranges from 1 MB to 32 MB and must be a power of 2. If it is not set, G1 determines it from the heap size.
- -XX:G1NewSizePercent
The minimum size of the young generation, as a percentage of the heap. The default is 5.
- -XX:G1MaxNewSizePercent
The maximum size of the young generation, as a percentage of the heap. The default is 60.
Many people are confused by the two parameters above. They matter because the ratio of G1's young generation to its old generation is not fixed; it varies at run time.
- -XX:ParallelGCThreads
The number of threads used for collection during STW. A value equal to the number of CPUs is recommended, and it should not exceed that number.
- -XX:ConcGCThreads=n
The number of threads that run in the concurrent marking phase. The default is recommended.
- -XX:InitiatingHeapOccupancyPercent
The default value is 45. This is conservative, and raising it somewhat is recommended.
- -XX:G1ReservePercent
The percentage of the heap held in reserve, meaning that some heap space is set aside for new objects during collection. The default is 10.
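For -XX:G1HeapRegionSize, the constraints in the list above (1 MB to 32 MB, power of 2) are easy to check in code. The sketch below also shows one commonly described way of deriving the size from the heap when the flag is not set, namely aiming for roughly 2048 regions; that target and the class itself are assumptions for illustration, not a statement of what a particular JDK build does.

```java
public class RegionSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;
    static final long MAX_REGION = 32 * MB;
    static final long TARGET_REGIONS = 2048; // assumed target number of regions

    /** Derives a region size from the heap size: power of 2, clamped to [1 MB, 32 MB]. */
    static long regionSizeFor(long heapBytes) {
        long size = Math.max(heapBytes / TARGET_REGIONS, MIN_REGION);
        size = Long.highestOneBit(size); // round down to a power of 2
        return Math.min(size, MAX_REGION);
    }

    /** Checks whether a value would be acceptable for -XX:G1HeapRegionSize. */
    static boolean isValidRegionSize(long bytes) {
        return bytes >= MIN_REGION && bytes <= MAX_REGION && Long.bitCount(bytes) == 1;
    }
}
```

Under this heuristic a 3 GB heap ends up with 1 MB regions, so any object of 512 KB or more would already count as a large object there.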
4. Log analysis
GC logs are annoying, and every collector's log format is different. I have read a lot of them myself and still can't remember the details; most of the time I have to flip through my notes. My advice is not to memorize the format: keep your own notes, or bookmark a good blog post and compare against it when reading a log.
Here we take part of a production log and analyze it; see my inline notes for the explanation.
2021-01-26T9:56:58.661+0800: 61596.688: [GC pause (G1 Evacuation Pause) (young), 0.0824483 secs]
// You can see that there are four GC threads, because I set -XX:ParallelGCThreads=4
[Parallel Time: 28.4 ms, GC Workers: 4]
[GC Worker Start (ms): Min: 61596688.3, Avg: 61596688.3, Max: 61596688.4, Diff: 0.1]
// Time spent scanning roots; Sum is the total time, in ms
[Ext Root Scanning (ms): Min: 1.5, Avg: 2.1, Max: 4.0, Diff: 2.5, Sum: …]
// Time each thread spent updating RS
[Update RS (ms): Min: 0.4, Avg: 2.4, Max: 3.2, Diff: 2.8, Sum: 9.5]
[Scan RS (ms): Min: 2.3, Avg: 2.6, Max: 2.8, Diff: 0.5, Sum: 10.4]
// Scanning the code roots, i.e. the objects referenced by JIT-compiled code; the reference relationships are stored in the RS
[Code Root Scanning (ms): Min: 0.1, Avg: 1.0, Max: 1.9, Diff: 1.8, Sum: 3.9]
[Object Copy (ms): Min: 19.2, Avg: 20.1, Max: 21.0, Diff: 1.7, Sum: 3.9]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: …]
[GC Worker Total (ms): Min: 28.2, Avg: 28.3, Max: 28.3, Diff: 0.1, Sum: …]
[GC Worker End (ms): Min: 61596716.6, Avg: 61596716.6, Max: 61596716.7, Diff: 0.1]
// The following are times spent on other work, not so important
[Code Root Fixup: 3.6 ms]
[Code Root Migration: 28.4 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.7 ms]
[Other: 21.3 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 16.9 ms]
[Ref Enq: 0.1 ms]
[Redirty Cards: 0.2 ms]
[Free CSet: 3.3 ms]
[Eden: 3282.0M(3282.0M)->0.0B(3278.0M) Survivors: 46.0M->50.0M Heap: 3450.4M(6656.6M)->172.4M(6656.6M)]
// As before, user is the CPU time used by all threads, sys is the CPU time spent in the kernel, and real is the actual pause time
[Times: user=0.17 sys=0.00, real=0.08 secs]
The G1 Young GC has several important steps: 1. enumerating root nodes, 2. updating RS, 3. scanning RS, 4. copying objects.
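Rather than memorizing the format, it can help to pull the one number you usually care about, the pause time, out of the log with a small script. The sketch below assumes the JDK 8 style header line shown in the log above; the class name and regular expression are my own and only cover that one line shape.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class G1PauseLine {
    // Matches e.g. "[GC pause (G1 Evacuation Pause) (young), 0.0824483 secs]"
    private static final Pattern PAUSE = Pattern.compile(
            "\\[GC pause \\(([^)]+)\\) \\((young|mixed)\\)[^,]*, ([0-9.]+) secs\\]",
            Pattern.CASE_INSENSITIVE);

    /** Returns the pause in milliseconds, or -1 if the line is not a G1 pause header. */
    static double pauseMillis(String line) {
        Matcher m = PAUSE.matcher(line);
        return m.find() ? Double.parseDouble(m.group(3)) * 1000.0 : -1.0;
    }

    /** Returns "young" or "mixed", or null if the line is not a G1 pause header. */
    static String kind(String line) {
        Matcher m = PAUSE.matcher(line);
        return m.find() ? m.group(2).toLowerCase() : null;
    }
}
```

Running this over a whole log file and sorting the results is a quick way to see whether the pauses stay near the -XX:MaxGCPauseMillis target.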
5. Reasons for full GC
Drawing on some good points from other articles, let's look at the reasons for a full GC in G1.
1. Marking is abandoned during mixed GC
This happens after the mixed GC cycle has started. As we know, the marking process runs concurrently with the user threads; if the old generation fills up before marking finishes, the marking is abandoned and a full GC takes over. The remedy is to increase the heap size; if the heap cannot be enlarged, reduce the value of -XX:InitiatingHeapOccupancyPercent and increase the value of -XX:ConcGCThreads=n.
2. Promotion failure
Pay attention to the difference between 1 and 2; the article I referred to explains this point incorrectly. The correct reason here is that the marking phase of mixed GC has already finished and reclamation has begun, but objects need to be promoted to the old generation and the old generation lacks space, so promotion fails and a full GC is triggered.
The best remedy is to reduce the value of -XX:InitiatingHeapOccupancyPercent; increasing -XX:G1ReservePercent also works well.
3. Large object allocation fails
Large objects are a tough problem in G1. Why? By default, G1 treats any object larger than half a Region as a large object and allocates dedicated Regions to store it. This means memory fragmentation, since up to half of a Region's space may be wasted. Fragmentation makes space utilization inefficient, and in early 1.8 versions only a full GC reclaimed these large objects.
We should increase the region size appropriately to reduce the number of large objects, or, if we are on JDK 11 or later, we can try ZGC.
References
The algorithm implementations in G1 are extremely complex. What I wrote may look simple, but I had to read the material many times before I understood it. To help you study further, here are the articles and books I used:
1. "Deep Understanding of the Java Virtual Machine", 3rd edition, Zhiming Zhou
Needs no introduction; a must-read for getting started with the JVM.
2. Some key technologies for Java Hotspot G1 GC
An article by the Meituan technical team, worth a look.
3. [Java Garbage Collection algorithm G1](juejin.cn/post/684490…)
Also a very good article, with a very comprehensive introduction.
4. hllvm-group.iteye.com/group/topic…
R大's explanations of the complex concepts in G1.
5. [Tips for Tuning the Garbage First Garbage Collector](www.infoq.com/articles/tu…)
An English-language article that introduces some key points of G1. My English is not good, but I worked through it anyway.