CMS garbage collection mechanism

[Garbage Collection] Series (@.@)

Recently in the arrangement of THE JVM related PPT, the CMS algorithm again, each time to read the source can understand a little more, continue to adhere to.

What is the CMS

Concurrent MarkSweep CMS is a Concurrent garbage collector that uses a mark-sweep algorithm. If CMS garbage collector is used in the old days, the vm parameter -“XX:+UseConcMarkSweepGC” is added.

Usage Scenarios:

The GC process stops for a short time, which is suitable for services with high latency requirements. User threads are not allowed to stop for a long time.

Disadvantages:

The service runs for a long time, causing severe memory fragmentation. In addition, the algorithm implementation is relatively complex (if also a disadvantage)

Implementation mechanisms

According to the trigger mechanism of GC, there are two types: periodic Old GC (passive) and active Old GC (active).

Periodic Old GC

Periodic Old GC, execution logic is also called Background Collect, to recycling of Old age, are common in the GC logs, by the Background thread ConcurrentMarkSweepThread cycle to judge whether to trigger (default 2 s).

The trigger condition

1, if not set – XX: + UseCMSInitiatingOccupancyOnly, virtual opportunities according to the collected data to decide whether to trigger (advice online environment bring this parameter, or it will increase the difficulty of the problem screen). 2, use to old s threshold CMSInitiatingOccupancyFraction, 92% by default. 3, permanent use of generation, reach the threshold CMSInitiatingPermOccupancyFraction, default 92%, the premise is open CMSClassUnloadingEnabled. 4, the new generation of promotion guarantee failure.

Promotion guarantee failure

Is there enough space in the old age to accommodate all the new generation objects or the objects that have been promoted to the old age on average in history? If not, a collection of the old age should be carried out in advance to prevent promotion failure in the next YGC.

Periodic Old GC process

When the conditions are met, the “mark-clean” algorithm is used to recycle the old years. The process can be said to be very simple, such as marking the living objects and cleaning up the garbage objects. However, in order to realize the low delay of the whole process, the actual algorithm is far from so simple, and the whole process is divided into the following parts:

In the process of marking, objects can be divided into three types according to the marking situation:

  1. White object, indicating that it is not marked;
  2. A gray object indicating that it is marked, but that internal references are not processed;
  3. Black objects, indicating that they are marked and internal references are processed;

Suppose that when Background Collect occurs, the object distribution of the Java heap is as follows:

1. InitialMarking (InitialMarking, the whole process STW)

This stage is executed by a single thread and is mainly divided into two steps:

  1. Mark old age objects reachable by GC Roots;
  2. Traversing new generation objects, marking reachable old age objects;

After this process, the object distribution is as follows:

2. Marking (concurrent Marking)

This phase is executed concurrently by the GC thread and the application thread, traversing the live objects marked by the InitialMarking phase, and then recursively marking the objects reachable by those objects. [Obtaining resources]

Because the phase of the concurrent execution, Cenozoic may occur during operation is the object of promotion to the old age, or directly in the old s allocating objects, or update old s object reference relationship, and so on, for these objects, are required to mark, otherwise will be missing some object, target leakage happens.

To improve the efficiency of re-marking, the Card where the objects reside is identified as Dirty. In the future, only Dirty Card objects need to be scanned to avoid scanning the entire old age.

3. Precleaning

Disable this phase with the CMSPrecleaningEnabled parameter. It is enabled by default. It does two things:

  1. Handle references already discovered by the new generation. For example, in the concurrency phase, an OBJECT A is allocated in Eden, and an object A refers to an old object B (which has not been marked before). Object B is marked as an active object in this phase.
  2. In the concurrent marking phase, if the object’s internal reference changes in the old days, the Card will be marked as Dirty (actually this is not using CardTable, but a similar data structure called ModUnionTalble). By scanning these tables, Re-mark objects that refer to updates during the concurrent marking phase (objects promoted to the old age, objects that were originally old)
4, AbortablePreclean (interruptible preclean)

Is the premise of the phase, new generation the memory usage of Eden area is greater than the parameter CMSScheduleRemarkEdenSizeThreshold default is 2 m, if the object of the new generation is too little, there is no need to perform the stage, direct execution to mark phase. [Obtaining resources]

Why is this stage needed and what is the value of existence?

Because the ultimate goal of the CMS GC is to reduce the pause time for garbage collection, try your best to process older objects that were updated by the application thread during the concurrent phase so that less processing can be done during the relabelling phase of the pause and the pause time can be reduced accordingly. [Obtaining resources]

In this phase, the main cycle does two things:

  1. Handles objects in the From and To sections and marks reachable old age objects
  2. As in the previous phase, objects in the Dirty Card are scanned

Of course, this logic doesn’t go on forever, and there are three conditions for breaking the loop:

  1. You can set the maximum number of cyclesCMSMaxAbortablePrecleanLoops, the default is 0, meaning there is no limit on the number of cycles.
  2. If the time to execute this logic reaches the thresholdCMSMaxAbortablePrecleanTime, the default is 5s, which exits the loop.
  3. If the memory usage of the Eden area of the new generation reaches the thresholdCMSScheduleRemarkEdenPenetration, default 50%, will exit the loop. (This condition can be established on the premise that the utilization rate in Eden area is less than 1/10 during Precleaning)[Obtaining resources]

If a YGC occurs before the cycle exits, the burden of scanning the young generation will be greatly reduced for the subsequent Remark stage. However, the occurrence of YGC is not artificially controlled, so we can only pray for a YGC within 5s.

.1678.150: [CMS-concurrent-preclean-start] 
1678.186: [CMS-concurrent-preclean: 0.044/0.055 secs] 
1678.186: [CMS-concurrent-abortable-preclean-start] 
1678.365: [GC 1678.465: [ParNew: 2080530K->1464K(2044544K), 0.0127340 secs] 
1389293K->306572K(2093120K), 0.0167509 secs] 
1680.093: [CMS-concurrent-abortable-preclean: 1.052/1.907 secs]  
....
Copy the code

In the GC log above, 1678.186 starts the AbortablePreclean phase, followed by an YGC less than 2s later.

5. FinalMarking (concurrent re-marking, STW process)

This phase is executed concurrently. In the previous parallel phase (the GC thread and the application thread are executed simultaneously, like your mother cleaning the house and you throwing away paper scraps), new references may be generated as follows:

  1. New objects from the old era are referenced by GC Roots
  2. Unmarked objects of the old age are referenced by new generation objects
  3. Adds a new reference to an object marked in the old age to another object in the old age
  4. The new generation object pointing to the old age reference was removed
  5. There may be other cases…

Some of the above objects may have been processed in the Precleaning stage and AbortablePreclean stage, but there are always some that have not been processed, so there are also the following processing:

  1. Iterate over new generation objects and re-mark them
  2. Re-label according to GC Roots
  3. I’m going to go through old Dirty cards, and I’m going to re-mark them, because most of the Dirty cards here have already been dealt with in the Clean phase

In the first step, all objects of the new generation need to be traversed. If the new generation is heavily used, many objects need to be traversed, which is a disaster for the total time of this phase (because a large number of objects may be temporary, and these objects may also reference a large number of old objects). AbortablePreclean, if YGC happens exactly once in the AbortablePreclean phase, it can avoid scanning invalid objects.

What if there is no time to execute YGC once in the AbortablePreclean phase?

The CMS algorithm provides a parameter: CMSScavengeBeforeRemark, which is not enabled by default. If this parameter is enabled, YGC will be forcibly triggered before the implementation of this stage, which can reduce the traversal time of new generation objects and recycle them more thoroughly.

However, this parameter has advantages and disadvantages, the advantage is to reduce the pause time of the Remark phase, the disadvantage is that in the case of few new generation objects, there is also a YGC, the most pitiful is that in the AbortablePreclean phase, there is already a YGC, and then a silly trigger in this phase.

So the pros and cons need to be grasped.

Take the initiative to Old GC

This process of active Old GC is triggered by strict conditions:

  1. Promotion Failed occurs in the YGC process, and then the old years are recycled
  2. Let’s say it’s executedSystem.gc(), provided there are no parametersExplicitGCInvokesConcurrent
  3. Other cases…

If an active Old GC is triggered while a periodic Old GC is executing, the periodic Old GC will be deprived of execution (only one Old GC can be running at a time). Record concurrent mode failure or concurrent mode interrupted.

At the beginning of active GC, it is necessary to determine whether the GC should Compact the old space (because a long period of periodic GC will cause a large amount of debris space). The logic of judgment is as follows:

*should_compact =
 UseCMSCompactAtFullCollection &&
 ((_full_gcs_since_conc_gc >= CMSFullGCsBeforeCompaction) || 
GCCause::is_user_requested_gc(gch->gc_cause()) || 
gch->incremental_collection_will_fail(true /* consult_young */));
Copy the code

Compression occurs in three cases:

  1. The parameterUseCMSCompactAtFullCollection(default true) andCMSFullGCsBeforeCompaction(default 0), so by default each active GC compresses the old memory space by moving the object to the far left of memory.
  2. Of course, like executionSystem.gc(), provided there are no parametersExplicitGCInvokesConcurrent, will also compress.
  3. If the new generation of promotion guarantee will fail.

The algorithm with compression action, called MSC, mark-clean-compression, takes a single-threaded, all-pause approach to garbage collection with very, very long pauses…

So what does the algorithm look like without compression?

The execution logic without compression action is called Foreground Collect. Compared with periodic Old GC, the whole process lacks Precleaning and AbortablePreclean, and other processes are almost the same.

If the execution System. The gc (), and added ExplicitGCInvokesConcurrent parameters, and then does not belong to gc actively, it will promote Old gc periodically, such as just after once, will not wait after 2 s check condition, but immediately start the cyclical Old gc.

In the end, I wish you all success as soon as possible, get satisfactory offer, fast promotion and salary increase, and walk on the peak of life.

If you can, please give me a three support me?????? [Obtain information]