I recently read a number of articles about CMS and G1, and it was painful: many of them simply copy one another, so the mistakes get copied everywhere too. Others just throw in a few coarse-grained concepts, which is a waste of the reader’s time.
I spent a day going through RednaxelaFX’s Q&A threads and other material to sort out some of the implementation details of CMS and G1. Basic concepts, and anything a search engine turns up easily, are not covered here. The sections below are in no particular order.
What exactly is the GC root set for the initial mark phase of CMS?
In the context of CMS initial mark, the root set does not include young gen; it consists only of thread stacks, registers, globals, and so on. This is because during the CMS concurrent mark phase, the collector walks the live objects in young gen starting from that original root set. So from the point of view of initial mark plus concurrent mark taken together, young gen is still part of the root set (it is scanned but not collected).
In other words, the actual GC root set of the initial mark phase does not include the young generation; the young generation is traversed afterwards, in the concurrent mark phase (or, more precisely, in the precleaning phase). The young generation is nevertheless traversed again in the remark phase, for the same reason that dirty cards in the old generation are: while the application keeps running, references in the young generation keep changing as well.
G1 regions
From Meituan technology blog https://tech.meituan.com/2016/09/23/g1.html
H marks regions that store humongous objects (H-obj), that is, objects whose size is greater than or equal to half a region. An H-obj is allocated directly in old gen so that it is not copied and moved over and over.
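The size rule above can be written down as a one-line check. This is only a sketch of the rule as stated in this article; the class and method names are made up for illustration and are not HotSpot code.

```java
final class HumongousSketch {
    // Rule as stated in the text: an object is humongous when its size
    // is at least half of the region size.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    public static void main(String[] args) {
        long region = 1L << 20; // assumption: 1 MB regions, for illustration only
        System.out.println(isHumongous(600 * 1024, region));
        System.out.println(isHumongous(100 * 1024, region));
    }
}
```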
Under what circumstances can a missed mark occur?
In CMS and G1, three colors (black, white and gray) are used to describe the marking state of an object:
- White: The object has not been marked. If it is still white when the marking phase ends, it is collected as garbage.
- Gray: The object itself has been marked, but its fields have not all been marked yet.
- Black: The object has been marked and all of its fields have been marked.
Both of the following conditions must hold at the same time for a missed mark to occur:
1. The application inserts a new reference from a black object to a white object.
2. The application removes all direct or indirect references from gray objects to that white object.
Inserting a new reference from a black object to a white object means that the white object, which would otherwise have been collected as garbage, becomes reachable again. Because the reference comes from a black object, and black objects are not rescanned, the white object is missed. If the white object is also still referenced from a gray object, however, it can still be rescued: the gray object has not finished marking, so the scan will eventually go from the gray object to the white object.
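To make the two conditions concrete, here is a small single-threaded simulation of tri-color marking with no write barrier at all. It is a sketch under simplifying assumptions: the "mutator" simply runs between two marker steps, and the names (`Obj`, `drain`, `finalColorOfC`) are hypothetical. A is black, B is gray, C is white and initially reachable only through B.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class MissedMarkSketch {
    enum Color { WHITE, GRAY, BLACK }

    static final class Obj {
        Color color = Color.WHITE;
        final List<Obj> fields = new ArrayList<>();
    }

    // Marker loop: scan the fields of each gray object, then turn it black.
    static void drain(Deque<Obj> grayStack) {
        while (!grayStack.isEmpty()) {
            Obj o = grayStack.pop();
            for (Obj f : o.fields) {
                if (f.color == Color.WHITE) {
                    f.color = Color.GRAY;
                    grayStack.push(f);
                }
            }
            o.color = Color.BLACK;
        }
    }

    // Returns the color C ends up with once marking has finished.
    static Color finalColorOfC(boolean removeGrayReference) {
        Obj a = new Obj(), b = new Obj(), c = new Obj();
        a.fields.add(b);
        b.fields.add(c);

        // State in the middle of marking: A already scanned, B waiting to be scanned.
        Deque<Obj> grayStack = new ArrayDeque<>();
        a.color = Color.BLACK;
        b.color = Color.GRAY;
        grayStack.push(b);

        // Mutator activity, with no write barrier:
        a.fields.add(c);            // condition 1: black -> white reference inserted
        if (removeGrayReference) {
            b.fields.remove(c);     // condition 2: gray -> white reference removed
        }

        drain(grayStack);
        return c.color;
    }

    public static void main(String[] args) {
        System.out.println("both conditions:  C is " + finalColorOfC(true));
        System.out.println("condition 1 only: C is " + finalColorOfC(false));
    }
}
```

With both conditions C stays white even though A still references it; with only the first condition, C is reached through B and marked as usual.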
Using dirty cards to fix missed marks
The JVM divides memory into fixed-size cards, and a dedicated data structure (here, the card table) maintains the state of each card, using one byte per card. When a reference held by an object on a card changes, the corresponding card table entry is set to dirty (CMS actually uses the mod-union table for this). These dirty cards are processed in the precleaning and remark phases.
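A minimal sketch of the idea, assuming 512-byte cards and a heap addressed by plain integer offsets. The class, the `CLEAN`/`DIRTY` values and the method names are illustrative assumptions, not the actual HotSpot card table.

```java
import java.util.ArrayList;
import java.util.List;

final class CardTableSketch {
    static final int CARD_SHIFT = 9;   // assumption: 2^9 = 512-byte cards
    static final byte CLEAN = 0;       // illustrative values, not HotSpot's encoding
    static final byte DIRTY = 1;

    private final byte[] cards;        // one byte of state per card

    CardTableSketch(int heapBytes) {
        cards = new byte[(heapBytes + (1 << CARD_SHIFT) - 1) >>> CARD_SHIFT];
    }

    // Called by the write barrier when a reference field at this heap offset is written.
    void onReferenceStore(int fieldOffset) {
        cards[fieldOffset >>> CARD_SHIFT] = DIRTY;
    }

    // What a precleaning or remark step would do: collect the dirty cards
    // for rescanning and reset them to clean.
    List<Integer> takeDirtyCards() {
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] == DIRTY) {
                dirty.add(i);
                cards[i] = CLEAN;
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(4096);
        table.onReferenceStore(2000);
        System.out.println(table.takeDirtyCards());
    }
}
```

Only the objects on the returned cards need to be rescanned, rather than the whole old generation.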
What is the difference between CMS and G1?
- CMS uses an incremental-update write barrier, also called an insertion barrier, which focuses on newly inserted references.
- G1 uses an SATB (snapshot-at-the-beginning) write barrier, also called a deletion barrier, which focuses on old references being deleted.
Incremental update works like this: whenever the write barrier sees a reference to a white object being stored into a field of a black object, it turns the white object gray (for example, by pushing it on the marking stack, or by recording it in the mod-union table). This reliably rules out the first condition above.
SATB treats every object that was live in the logical snapshot taken at the beginning of marking as live. It does this in the write barrier by turning the target of every overwritten (old) reference non-white. In practice: if a field of a gray object originally points to a white object, and the field is assigned another value (say null) before the concurrent marker gets to scan it, the link from that field to the white object is cut. The SATB write barrier makes sure the object the field referred to is turned gray before the cut happens, which rules out the second condition above.
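The two barriers can be compared side by side in a toy model. This is a sketch, not collector code: objects have a single reference field, "graying" is done directly on a marking stack (one of the options mentioned above), and all names are hypothetical. The scenario is the missed-mark case: a black object gains a reference to a white object, and the gray object's reference to it is then cleared.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class WriteBarrierSketch {
    enum Color { WHITE, GRAY, BLACK }
    enum Barrier { NONE, INCREMENTAL_UPDATE, SATB }

    static final class Obj {
        Color color = Color.WHITE;
        Obj field; // a single reference field keeps the model small
    }

    private final Deque<Obj> grayStack = new ArrayDeque<>();
    private final Barrier barrier;

    WriteBarrierSketch(Barrier barrier) {
        this.barrier = barrier;
    }

    // Turn a white object gray and queue it for scanning.
    private void shade(Obj o) {
        if (o != null && o.color == Color.WHITE) {
            o.color = Color.GRAY;
            grayStack.push(o);
        }
    }

    // Mutator store "holder.field = newValue" while marking is in progress.
    void store(Obj holder, Obj newValue) {
        if (barrier == Barrier.SATB) {
            shade(holder.field);        // deletion barrier: gray the old target
        }
        if (barrier == Barrier.INCREMENTAL_UPDATE && holder.color == Color.BLACK) {
            shade(newValue);            // insertion barrier: gray the new target
        }
        holder.field = newValue;
    }

    // Marker loop: scan each gray object's field, then turn the object black.
    void drain() {
        while (!grayStack.isEmpty()) {
            Obj o = grayStack.pop();
            shade(o.field);
            o.color = Color.BLACK;
        }
    }

    // Runs the missed-mark scenario and returns the final color of the white object.
    static Color colorOfHiddenObject(Barrier barrier) {
        WriteBarrierSketch gc = new WriteBarrierSketch(barrier);
        Obj black = new Obj(), gray = new Obj(), white = new Obj();
        black.color = Color.BLACK;
        gray.field = white;
        gc.shade(gray);

        gc.store(black, white); // condition 1
        gc.store(gray, null);   // condition 2
        gc.drain();
        return white.color;
    }

    public static void main(String[] args) {
        for (Barrier b : Barrier.values()) {
            System.out.println(b + ": " + colorOfHiddenObject(b));
        }
    }
}
```

In this model the object stays white only when no barrier is installed; each barrier breaks one of the two conditions, which is enough to keep the object marked.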
Floating garbage in CMS and G1
CMS only tracks the insertion of new references, not their deletion. So when the reference to an already marked (black) object is deleted, CMS does not notice; the object is still treated as marked, and this garbage object is not reclaimed in the current cycle.
G1, in contrast, only tracks deletions, and treats every object allocated after initial marking as black (new references to pre-existing objects do not create floating garbage). G1 does not care whether these objects turn into garbage during concurrent marking. So G1 may produce more floating garbage in a single GC cycle than CMS.
Precleaning and AbortablePreclean (abortable precleaning) in CMS
Note that the precleaning and abortable precleaning phases are optional and can be turned off with -XX:-CMSPrecleaningEnabled. AbortablePreclean only runs if Eden usage is greater than CMSScheduleRemarkEdenSizeThreshold (default 2 MB); if the young generation holds too few objects, the phase is unnecessary and CMS goes straight to the remark phase. The precleaning phase mainly does two things: 1) it scans young gen to process newly created references into old gen; 2) it processes the cards that have already been marked dirty. Abortable precleaning simply runs precleaning in a loop, with the following abort conditions:
- The number of loop iterations exceeds CMSMaxAbortablePrecleanLoops. The default is 0, which means there is no limit.
- The running time of the loop reaches the CMSMaxAbortablePrecleanTime threshold; the default is 5 s.
- Eden usage in the young generation reaches the CMSScheduleRemarkEdenPenetration threshold (default 50%).
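The start condition and the three abort conditions can be summarized as two predicates. This is a sketch of the rules as described in this article, not the HotSpot loop itself; the field names are hypothetical, and their values mirror the defaults quoted above.

```java
final class AbortablePrecleanSketch {
    // Hypothetical fields mirroring the flags and defaults quoted in the text.
    long edenSizeThresholdBytes = 2L * 1024 * 1024; // CMSScheduleRemarkEdenSizeThreshold
    int maxLoops = 0;                               // CMSMaxAbortablePrecleanLoops, 0 = no limit
    long maxTimeMillis = 5_000;                     // CMSMaxAbortablePrecleanTime
    int edenPenetrationPercent = 50;                // CMSScheduleRemarkEdenPenetration

    // The phase is only entered when Eden usage is above the size threshold.
    boolean shouldStart(long edenUsedBytes) {
        return edenUsedBytes > edenSizeThresholdBytes;
    }

    // Checked after each precleaning iteration.
    boolean shouldAbort(int loopsDone, long elapsedMillis, long edenUsedBytes, long edenCapacityBytes) {
        if (maxLoops > 0 && loopsDone > maxLoops) {
            return true; // loop limit exceeded (only when a limit is set)
        }
        if (elapsedMillis >= maxTimeMillis) {
            return true; // time budget used up
        }
        // Eden occupancy has reached the configured percentage.
        return edenUsedBytes * 100 >= edenCapacityBytes * edenPenetrationPercent;
    }

    public static void main(String[] args) {
        AbortablePrecleanSketch s = new AbortablePrecleanSketch();
        System.out.println(s.shouldStart(1024 * 1024));
        System.out.println(s.shouldAbort(10, 1_000, 60, 100));
    }
}
```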
The precleaning phases run concurrently with the application in order to reduce the work left for the remark phase. Abortable precleaning waits up to 5 seconds in the hope that a young GC happens in the meantime, which greatly reduces the cost of scanning young gen during remark. We can also force a young GC ahead of remark with the CMSScavengeBeforeRemark parameter.
My guess is that turning this parameter on brings Eden usage below CMSScheduleRemarkEdenSizeThreshold, so that the AbortablePreclean phase is skipped.
CSet and RSet of G1
Collection Set (CSet)
The CSet is the set of regions to be collected in one GC. The regions in it can belong to any generation. During a GC, anything outside the CSet is ignored: the collector only cares about the regions in the CSet.
Remembered Set (RSet)
Logically, each region has an RSet. An RSet records which objects in other regions reference objects in this region. RSets are a points-into structure (who references my objects). For example, if object A refers to object B, that is, A.field = B, then the RSet of B’s region records the address of the card that A is on.
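A points-into structure can be sketched as a map from each region to the set of cards that hold references into it. The region size, card size and all names below are assumptions for illustration; real G1 RSets use more compact representations.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RememberedSetSketch {
    static final int CARD_SHIFT = 9;    // assumption: 512-byte cards
    static final int REGION_SHIFT = 20; // assumption: 1 MB regions

    // One RSet per region: region index -> cards in other regions that point into it.
    private final Map<Long, Set<Long>> rsets = new HashMap<>();

    // Models "A.field = B": fieldAddressInA is where the reference is stored,
    // addressOfB is the object it now points to.
    void recordStore(long fieldAddressInA, long addressOfB) {
        long fromRegion = fieldAddressInA >>> REGION_SHIFT;
        long toRegion = addressOfB >>> REGION_SHIFT;
        if (fromRegion == toRegion) {
            return; // references inside one region do not need an RSet entry
        }
        rsets.computeIfAbsent(toRegion, r -> new TreeSet<>())
             .add(fieldAddressInA >>> CARD_SHIFT);
    }

    // "Who references my objects?" for the given region.
    Set<Long> cardsPointingInto(long regionIndex) {
        return rsets.getOrDefault(regionIndex, Collections.emptySet());
    }

    public static void main(String[] args) {
        RememberedSetSketch rs = new RememberedSetSketch();
        rs.recordStore(0x100400L, 0x300008L); // A in region 1, B in region 3
        System.out.println(rs.cardsPointingInto(3));
    }
}
```

When region 3 is collected, only the recorded cards have to be scanned to find incoming references, instead of every other region.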
Images from Meituan technology blog https://tech.meituan.com/2016/09/23/g1.html
What does the G1 initial marking phase share with young GC?
The initial marking of global concurrent marking shares its pause with a young GC; what it actually shares is the root set produced by the young GC’s root scan. A young GC may or may not be accompanied by initial marking.
Why doesn’t G1 care about reference updates that come from young gen?
In generational G1 there are two modes of choosing the CSet, corresponding to young GC and mixed GC respectively:
- Young GC: selects all regions in young gen. The cost of a young GC is controlled by controlling the number of young gen regions.
- Mixed GC: selects all regions in young gen, plus some old gen regions with a high payoff according to the statistics gathered by global concurrent marking. Old gen regions are chosen in order of payoff, as many as fit within the cost target specified by the user.
As you can see, young gen regions are always in the CSet. A G1 collection only needs to know about references from outside the CSet to inside the CSet, so generational G1 does not maintain RSet updates for references that originate in young gen regions. CMS only collects old gen, so the CMS “CSet” is old gen.
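The two selection modes can be sketched as a simple greedy choice. This is an illustration of the description above, not G1's actual policy: the `Region` fields, the efficiency measure (reclaimable bytes per millisecond of predicted cost) and the pause budget handling are all assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CSetSelectionSketch {
    static final class Region {
        final String name;
        final boolean young;
        final long reclaimableBytes; // assumed to come from concurrent marking statistics
        final double costMillis;     // assumed predicted evacuation cost

        Region(String name, boolean young, long reclaimableBytes, double costMillis) {
            this.name = name;
            this.young = young;
            this.reclaimableBytes = reclaimableBytes;
            this.costMillis = costMillis;
        }

        double efficiency() {
            return reclaimableBytes / costMillis;
        }
    }

    // Returns the names of the regions chosen for the CSet.
    static List<String> chooseCSet(List<Region> heap, boolean mixed, double pauseBudgetMillis) {
        List<String> cset = new ArrayList<>();
        List<Region> oldRegions = new ArrayList<>();
        double spent = 0;
        for (Region r : heap) {
            if (r.young) {
                cset.add(r.name);        // young regions are always selected
                spent += r.costMillis;
            } else {
                oldRegions.add(r);
            }
        }
        if (!mixed) {
            return cset;                 // young GC: young regions only
        }
        // Mixed GC: add old regions with the highest payoff first, while the budget allows.
        oldRegions.sort(Comparator.comparingDouble(Region::efficiency).reversed());
        for (Region r : oldRegions) {
            if (spent + r.costMillis > pauseBudgetMillis) {
                break;
            }
            cset.add(r.name);
            spent += r.costMillis;
        }
        return cset;
    }

    public static void main(String[] args) {
        List<Region> heap = new ArrayList<>();
        heap.add(new Region("Y1", true, 0, 10));
        heap.add(new Region("O1", false, 900, 10));
        heap.add(new Region("O2", false, 100, 10));
        heap.add(new Region("O3", false, 500, 10));
        System.out.println(chooseCSet(heap, false, 30));
        System.out.println(chooseCSet(heap, true, 30));
    }
}
```

Because every young region ends up in the CSet in both modes, references that start in young gen are always found by scanning the CSet itself, which is why they never need an RSet entry.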
Evacuation process for G1
Evacuation is the cleanup step, and it is relatively independent of global concurrent marking. The evacuation phase is fully stop-the-world. It copies the live objects of a region into an empty region and then reclaims the space of the original region. Evacuation can put any number of regions into the CSet, because each region has its own RSet, which is used to determine which of its objects are live.
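A toy version of the copy step, assuming the set of live objects is already known (in G1 it is derived from the roots and the region's RSet). Objects are represented by plain strings, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class EvacuationSketch {
    static final class Region {
        final List<String> objects = new ArrayList<>();
    }

    // Copies the live objects of 'from' into 'to', then reclaims 'from' as a whole.
    // Returns the number of objects copied.
    static int evacuate(Region from, Region to, Set<String> live) {
        int copied = 0;
        for (String obj : from.objects) {
            if (live.contains(obj)) {
                to.objects.add(obj); // survivors end up packed next to each other
                copied++;
            }
        }
        from.objects.clear();        // the whole source region becomes free
        return copied;
    }

    public static void main(String[] args) {
        Region from = new Region();
        Region to = new Region();
        from.objects.addAll(List.of("a", "b", "c", "d"));
        int copied = evacuate(from, to, Set.of("a", "c"));
        System.out.println(copied + " copied, survivors: " + to.objects);
    }
}
```

The source region is freed in one piece and the survivors sit contiguously in the target region, so no holes are left behind.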
What are the advantages of G1 over CMS?
- While G1 marks the entire heap, it does not evacuate every region that contains live objects; by choosing only a few high-payoff regions to evacuate, the cost of each pause can be kept under control (to a certain extent). Still, copying objects requires a pause, and there is a limit to how short that pause can be. A G1 evacuation pause normally takes tens of milliseconds, up to one hundred or even two hundred. So be careful not to set -XX:MaxGCPauseMillis too low, or G1 will not be able to meet its target and garbage will tend to pile up, which in turn leads to full GCs and hurts performance. In general, values such as 100 ms or 250 ms are reasonable.
- CMS remark needs to rescan all the dirty cards in the mod-union table plus the whole root set. Here the whole of young gen (whether its objects are live or dead) counts as part of the root set, so CMS remark can be very slow. If the mutator keeps allocating at a high rate during the CMS concurrent mark phase and young gen holds a lot of objects, the remark pause may be long. The larger young gen is, the longer the CMS remark pause is likely to be. The G1 final marking (remark) phase only needs to scan the SATB buffers.
- CMS reclaims memory by sweeping (mark-sweep), which causes memory fragmentation. The evacuation phase of G1 uses a copying algorithm, which does not cause fragmentation.