Relevant concepts

The official name of the CMS GC is “Mostly Concurrent Mark and Sweep Garbage Collector”. Scope: the old generation. Algorithm: concurrent mark-sweep. Enabling parameter: -XX:+UseConcMarkSweepGC. Default number of collection threads: (number of processor cores + 3) / 4. Since JDK 9, when CMS is used the young generation is always collected by ParNew, and this cannot be changed. CMS was deprecated in JDK 9 and removed in JDK 14.
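As a quick check of the thread-count formula, the sketch below computes the default for a few core counts. It is a minimal sketch under stated assumptions. HotSpot actually derives the CMS default from ParallelGCThreads as (ParallelGCThreads + 3) / 4. ParallelGCThreads itself is assumed here to follow the usual x86 heuristic: one thread per core up to 8 cores, then 5/8 of a thread per additional core. Because of this, the two formulas agree on machines with up to 8 cores. The class and method names are illustrative only. The real values for a given JVM can be printed with -XX:+PrintFlagsFinal.

```java
public final class CmsThreads {

    // Assumed approximation of HotSpot's default ParallelGCThreads on x86:
    // one thread per core up to 8 cores, then 5/8 of a thread per additional core.
    public static int parallelGcThreads(int cores) {
        return cores <= 8 ? cores : 8 + (cores - 8) * 5 / 8;
    }

    // Default number of concurrent CMS threads: (ParallelGCThreads + 3) / 4, integer division.
    public static int concGcThreads(int cores) {
        return (parallelGcThreads(cores) + 3) / 4;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores
                + " ParallelGCThreads~" + parallelGcThreads(cores)
                + " ConcGCThreads~" + concGcThreads(cores));
        for (int c : new int[] {2, 4, 8, 16, 32}) {
            System.out.println(c + " cores -> " + concGcThreads(c) + " CMS thread(s)");
        }
    }
}
```

On a 16-core machine, for example, the sketch gives 13 parallel threads and (13 + 3) / 4 = 4 concurrent CMS threads. In that case, roughly a quarter of the cores are busy during the concurrent phases.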

What is the difference between concurrency and parallelism? Parallel: describes the relationship among multiple garbage collector threads. It means that more than one GC thread is working at the same time, while user threads are usually left waiting. Concurrent: describes the relationship between garbage collector threads and user threads. It means that GC threads and user threads are running at the same time. Because user threads are not frozen, the application can keep serving requests. However, the GC threads take up system resources, so application throughput is somewhat reduced.

Design goal/advantage: to avoid long pauses when collecting the old generation. This goal is achieved by two main means:

  • First, instead of compacting the old generation, CMS manages reclaimed memory space with free lists.
  • Second, most of the work of the mark and sweep phases is done concurrently with the application threads.
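The free-list approach from the first point can be illustrated with a toy first-fit allocator. This is a minimal sketch, not HotSpot's implementation, because CMS's real free-list space is far more elaborate. All names are hypothetical. The sketch also shows the price of skipping compaction: the total free space can be sufficient while no single chunk is large enough.

```java
import java.util.ArrayList;
import java.util.List;

public final class FreeListSketch {

    // A free chunk covering the address range [start, start + size).
    static final class Chunk {
        final long start;
        final long size;
        Chunk(long start, long size) { this.start = start; this.size = size; }
    }

    private final List<Chunk> free = new ArrayList<>();

    // Returns a dead object's space to the free list; live objects are never moved.
    public void release(long start, long size) {
        free.add(new Chunk(start, size));
    }

    // First-fit allocation: returns the start address, or -1 if no single chunk is large enough.
    public long allocate(long size) {
        for (int i = 0; i < free.size(); i++) {
            Chunk c = free.get(i);
            if (c.size >= size) {
                if (c.size == size) {
                    free.remove(i);                                   // chunk used up completely
                } else {
                    free.set(i, new Chunk(c.start + size, c.size - size)); // keep the remainder
                }
                return c.start;
            }
        }
        return -1;
    }

    public long totalFree() {
        long sum = 0;
        for (Chunk c : free) sum += c.size;
        return sum;
    }

    public static void main(String[] args) {
        FreeListSketch old = new FreeListSketch();
        old.release(0, 100);   // two 100-byte holes left behind by dead objects
        old.release(200, 100);
        System.out.println("total free = " + old.totalFree());      // 200
        System.out.println("allocate(150) = " + old.allocate(150)); // -1: fragmented
        System.out.println("allocate(60) = " + old.allocate(60));   // 0
    }
}
```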

Applicable scenarios:

  • GC pauses are short and latency is low, so CMS suits systems with strict latency requirements

CMS is a sensible choice when the server has a multi-core CPU and the main tuning goal is to reduce the latency caused by GC pauses. In many cases, shortening each GC pause directly improves the user experience. However, GC threads consume part of the CPU most of the time. For that reason, CMS can deliver lower throughput than the Parallel GC when CPU resources are constrained. For most systems, the difference in throughput and latency between the two should not be significant. In practice, several young-generation Minor GCs may occur while the old generation is being collected concurrently. In that case, the log of the Full GC (here, the CMS old-generation cycle) is interspersed with multiple Minor GC events.

The main phases of the CMS GC

  • 1. Initial mark (CMS Initial Mark)
  • 2. Concurrent mark (CMS Concurrent Mark)
  • 3. Remark (CMS Remark)
  • 4. Concurrent sweep (CMS Concurrent Sweep)

Phase 1: Initial mark

This phase is stop-the-world (STW).

Working mode: single-threaded before JDK 7, multi-threaded from JDK 8.

Goal: mark all old-generation objects that are directly referenced by GC Roots, as well as old-generation objects referenced by live objects in the young generation. Only objects that GC Roots reference directly are marked, which is why this phase is fast.
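The following toy model shows what the initial mark does. Only direct references into the old generation are followed, whether they come from GC Roots or from the young generation, which is why the pause is short. The object model and method names are hypothetical and heavily simplified. Real roots include thread stacks, static fields, JNI handles and more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public final class InitialMarkSketch {

    // Toy heap object: a name, which generation it lives in, and its outgoing references.
    static final class Obj {
        final String name;
        final boolean old;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name, boolean old) { this.name = name; this.old = old; }
    }

    // rootTargets: objects referenced directly by GC Roots.
    // youngGen: every young-generation object, scanned as an extra root set.
    // Only direct references are followed: there is no transitive traversal here.
    static Set<String> initialMark(List<Obj> rootTargets, List<Obj> youngGen) {
        Set<String> marked = new TreeSet<>();
        for (Obj o : rootTargets) {
            if (o.old) marked.add(o.name);
        }
        for (Obj y : youngGen) {
            for (Obj target : y.refs) {
                if (target.old) marked.add(target.name);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Obj a = new Obj("A", true);   // old, referenced by a root
        Obj b = new Obj("B", true);   // old, only reachable through A
        Obj c = new Obj("C", true);   // old, referenced from the young generation
        Obj y = new Obj("Y", false);  // young, referenced by a root
        a.refs.add(b);
        y.refs.add(c);
        System.out.println(initialMark(List.of(a, y), List.of(y))); // [A, C]
    }
}
```

Here B stays unmarked even though it is live. It is reached later, during the concurrent mark phase, by tracing from A.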

Phase 2: Concurrent mark

In this phase, the CMS GC traverses the object graph and marks all live objects, starting from the roots found in the previous “initial mark” phase. Concurrent marking runs alongside the application without pausing it. Because user threads keep running, object state may change during this phase in the following ways:

  • Objects are promoted from the young generation to the old generation
  • Some objects are allocated directly in the old generation
  • References between old-generation and young-generation objects change

The JVM divides the heap into cards and marks the old-generation cards whose contents have changed as “dirty”. This is called card marking.



In the diagram above, an application thread breaks a reference to the “currently processed object”. In other words, the object relationships in this region have changed.
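Card marking can be sketched as a table with one entry per fixed-size slice of the old generation. The sketch assumes 512-byte cards, which is the card size HotSpot uses. It models the write barrier as a plain method call. The boolean encoding and all names are simplifications for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public final class CardTableSketch {

    // 512-byte cards: card index = byte offset >>> 9.
    static final int CARD_SHIFT = 9;

    // true = dirty. Simplified: HotSpot keeps one byte per card with its own encoding.
    private final boolean[] dirty;

    public CardTableSketch(long oldGenBytes) {
        int cards = (int) ((oldGenBytes + (1L << CARD_SHIFT) - 1) >>> CARD_SHIFT);
        dirty = new boolean[cards];
    }

    // What the write barrier conceptually does when the application stores a reference
    // into a field located at 'fieldOffset' inside the old generation.
    public void onReferenceStore(long fieldOffset) {
        dirty[(int) (fieldOffset >>> CARD_SHIFT)] = true;
    }

    public List<Integer> dirtyCards() {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < dirty.length; i++) {
            if (dirty[i]) result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(4096);  // 8 cards
        table.onReferenceStore(100);   // card 0 (bytes 0..511)
        table.onReferenceStore(1030);  // card 2 (bytes 1024..1535)
        table.onReferenceStore(1100);  // card 2 again
        System.out.println(table.dirtyCards()); // [0, 2]
    }
}
```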

Phase 3: Concurrent preclean

This phase also runs concurrently with the application threads, without stopping them. Purpose: to keep the STW pause of the final remark as short as possible. Marking targets:

  • Old-generation cards marked “dirty” during concurrent marking
  • Old-generation objects referenced from the survivor spaces (from and to)

To disable: -XX:-CMSPrecleaningEnabled (enabled by default).
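The dirty-card part of precleaning (the first target above) can be sketched as a single pass over the card table. The sketch is a simplified model with hypothetical names, and it does not cover the survivor-space target. The key design choice is to clean a card before rescanning it. That way, a store made by an application thread during the rescan dirties the card again instead of being lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

public final class PrecleanSketch {

    // One concurrent preclean pass over the card table (true = dirty).
    // Each dirty card is cleaned *before* its objects are rescanned, so a concurrent
    // reference store during the rescan re-dirties the card and is seen later.
    static int precleanPass(boolean[] dirty, IntConsumer rescanCard) {
        int processed = 0;
        for (int i = 0; i < dirty.length; i++) {
            if (dirty[i]) {
                dirty[i] = false;
                rescanCard.accept(i);
                processed++;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        boolean[] cards = {false, true, false, true};
        List<Integer> rescanned = new ArrayList<>();
        int n = precleanPass(cards, i -> {
            rescanned.add(i);
            if (i == 3) cards[1] = true;   // simulated application store while card 3 is scanned
        });
        System.out.println(n + " cards precleaned: " + rescanned);  // 2 cards precleaned: [1, 3]
        System.out.println("card 1 dirty again: " + cards[1]);      // true
    }
}
```

Cards that are still dirty after precleaning are left for the final remark. This is why effective precleaning makes the remark pause shorter.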





Phase 4: Concurrent abortable preclean

This phase also does not stop the application. It tries to do as much work as possible before the STW final remark phase. Its exact duration depends on a number of factors, because it repeats the same work in a loop until an exit condition is met. Examples include the number of iterations, the amount of useful work done, or the elapsed wall-clock time.

Objective: as with concurrent preclean, to keep the STW pause of the final remark as short as possible.

Value: it waits as long as possible for a Minor GC to happen before the final remark, which minimizes the remark pause.

Trigger condition: after the preclean step, the abortable preclean starts only if the following condition is met. Otherwise, the remark phase starts directly:

  • The used space of Eden is greater than -XX:CMSScheduleRemarkEdenSizeThreshold (default 2 MB).

Abort conditions (the loop ends as soon as any one is met):

  • CMSMaxAbortablePrecleanLoops: the number of loop iterations reaches this value. The default is 0, which means there is no limit on the number of iterations.
  • CMSMaxAbortablePrecleanTime: the time spent in the abortable preclean exceeds this value. The default is 5000 milliseconds.
  • Eden usage reaches -XX:CMSScheduleRemarkEdenPenetration. The default is 50%.
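The entry and exit conditions listed above can be modeled as a small simulation. This is a simplified model, not HotSpot's code, and it rests on several assumptions. Eden is assumed to grow by a fixed amount per preclean pass. The order in which the conditions are checked is chosen for illustration. The class, enum and method names are hypothetical. The parameters mirror CMSScheduleRemarkEdenSizeThreshold, CMSMaxAbortablePrecleanLoops, CMSMaxAbortablePrecleanTime and CMSScheduleRemarkEdenPenetration.

```java
public final class AbortablePrecleanSketch {

    enum Exit { MAX_LOOPS, MAX_TIME, EDEN_PENETRATION }

    // Entry check: abortable preclean only starts if Eden usage exceeds
    // CMSScheduleRemarkEdenSizeThreshold; otherwise remark starts directly.
    static boolean shouldStart(long edenUsedKb, long thresholdKb) {
        return edenUsedKb > thresholdKb;
    }

    // Simplified loop. maxLoops mirrors CMSMaxAbortablePrecleanLoops (0 = unlimited),
    // maxTimeMs mirrors CMSMaxAbortablePrecleanTime, edenPenetrationPct mirrors
    // CMSScheduleRemarkEdenPenetration. Eden grows by allocKbPerPass on every pass.
    static Exit run(int maxLoops, long maxTimeMs, int edenPenetrationPct,
                    long edenCapacityKb, long edenUsedKb,
                    long allocKbPerPass, long passMs) {
        if (passMs <= 0) throw new IllegalArgumentException("passMs must be positive");
        int loops = 0;
        long elapsedMs = 0;
        while (true) {
            // One preclean pass; meanwhile the application keeps allocating in Eden.
            loops++;
            elapsedMs += passMs;
            edenUsedKb += allocKbPerPass;
            if (maxLoops != 0 && loops >= maxLoops) return Exit.MAX_LOOPS;
            if (edenUsedKb * 100 >= edenCapacityKb * edenPenetrationPct) return Exit.EDEN_PENETRATION;
            if (elapsedMs > maxTimeMs) return Exit.MAX_TIME;
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldStart(4096, 2048));                         // true
        // Defaults from the text: 0 loops (unlimited), 5000 ms, 50%.
        System.out.println(run(0, 5000, 50, 100_000, 10_000, 1_000, 100));  // EDEN_PENETRATION
        System.out.println(run(0, 5000, 50, 100_000, 10_000, 100, 100));    // MAX_TIME
        System.out.println(run(10, 5000, 50, 100_000, 10_000, 1_000, 100)); // MAX_LOOPS
    }
}
```

With a fast allocation rate, Eden reaches 50% after 40 passes (4000 ms), so the loop ends on Eden penetration. With a slow allocation rate, the 5000 ms limit is exceeded first.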

Problem: the abortable preclean may end without a Minor GC having happened. If the young generation then performs a Minor GC (STW) right around the final remark, which is also STW, the two pauses occur back to back. CMS provides the parameter -XX:+CMSScavengeBeforeRemark to force a Minor GC before the final remark. This also produces back-to-back pauses (Minor GC followed by Remark), but the remark then has far fewer young-generation objects to scan.

Phase 5: Final remark

The final remark is the second (and last) STW pause of the CMS cycle. Target: rescan heap objects. This is necessary because the preceding preclean phases ran concurrently, and the GC threads may not have kept up with the application's changes. Scan range: young-generation objects + GC Roots + objects on cards marked as “dirty”. If the preclean phases did not do their job well, this step spends a lot of time scanning the young generation.

Phase 6: Concurrent Sweep

This phase executes concurrently with the application without STW pauses. At this stage, the JVM removes objects that are no longer used and reclaims the memory they occupy. Because phase 5 has marked all the objects that are still in use, this phase can be executed concurrently with the application thread.
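The sweep can be pictured as a single walk over the old generation in address order. Every dead (unmarked) block is returned to the free list, and live blocks stay where they are. This sketch additionally coalesces adjacent dead blocks into one chunk. It is a simplified model with hypothetical names, and it assumes the heap is a contiguous sequence of blocks.

```java
import java.util.ArrayList;
import java.util.List;

public final class SweepSketch {

    // A heap block in address order; 'marked' means the marking phases found it live.
    static final class Block {
        final long start;
        final long size;
        final boolean marked;
        Block(long start, long size, boolean marked) {
            this.start = start;
            this.size = size;
            this.marked = marked;
        }
    }

    // Walks the heap once; every run of adjacent unmarked blocks becomes one free chunk
    // {start, size}. Live blocks are never moved.
    static List<long[]> sweep(List<Block> heap) {
        List<long[]> freeList = new ArrayList<>();
        long runStart = -1;
        long runSize = 0;
        for (Block b : heap) {
            if (!b.marked) {
                if (runStart < 0) runStart = b.start;   // a new run of dead blocks begins
                runSize += b.size;
            } else if (runStart >= 0) {
                freeList.add(new long[] {runStart, runSize});
                runStart = -1;
                runSize = 0;
            }
        }
        if (runStart >= 0) freeList.add(new long[] {runStart, runSize});
        return freeList;
    }

    public static void main(String[] args) {
        List<Block> heap = List.of(
                new Block(0, 16, true),
                new Block(16, 32, false),
                new Block(48, 16, false),  // adjacent to the previous dead block: coalesced
                new Block(64, 16, true),
                new Block(80, 8, false));
        for (long[] chunk : sweep(heap)) {
            System.out.println("free chunk at " + chunk[0] + ", size " + chunk[1]);
        }
        // free chunk at 16, size 48
        // free chunk at 80, size 8
    }
}
```

Because live blocks are never moved, the free chunks stay scattered between them. This is the source of the fragmentation discussed under the disadvantages.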

Phase 7: Concurrent Reset

This phase runs concurrently with the application. It resets the internal data structures of the CMS algorithm to prepare for the next GC cycle. In summary, CMS does a lot of complex work to reduce pause times. For most of the cycle, its garbage collection threads run while the application threads keep running.

Dynamic detection mechanism

CMS uses historical statistics to predict how long it will take for the old generation to fill up and how long a collection cycle needs. It then uses these estimates to decide when to start a cycle. This prediction can be turned off with -XX:+UseCMSInitiatingOccupancyOnly. With that flag enabled, the configured threshold -XX:CMSInitiatingOccupancyFraction=N is used for every cycle. Otherwise, it only takes effect for the first cycle.

disadvantages

Abnormal situation

  • Concurrent mode failure: most CMS phases run concurrently with user threads. User threads may create objects that must be allocated in the old generation while a collection is in progress. If there is not enough space for them, a concurrent mode failure is reported, and the JVM falls back to a stop-the-world Serial Old collection.
  • Promotion failure: when the young generation performs a Minor GC, CMS checks whether the old generation has enough space for the promoted objects. If it does not, a concurrent mode failure is reported. If enough space exists in total but the allocation cannot be satisfied because of memory fragmentation, a promotion failure is reported.
  • Permanent generation (metaspace in Java 8) exhausted: CMS does not collect the permanent generation by default, so a Full GC is triggered once its space runs out.

tuning

  1. Hardware: increase the number of CPU cores. CMS is a multi-threaded garbage collector, and by default it starts (CPU cores + 3) / 4 threads. The more CPU cores there are, the smaller its impact on user threads.
  2. Tuning overly long pauses
    1. First, determine which stage is slow. The pauses associated with CMS are:
      1. Young-generation Minor GC pauses
      2. The pauses at the start and end of the old-generation cycle (initial mark and remark)
      3. Serial Old pauses (after a concurrent mode failure)
      4. Full GC pauses
  3. Tuning for concurrent mode failures
    1. Enlarge the old generation, or increase the size of the whole heap
    2. Run CMS collections more often by lowering the CMS collection threshold
      1. Use -XX:CMSInitiatingOccupancyFraction=value together with -XX:+UseCMSInitiatingOccupancyOnly. Make -XX:CMSInitiatingOccupancyFraction smaller, but not too small: a value that is too low leads to many ineffective collections and wastes resources.
        1. Java Performance: The Definitive Guide suggests deriving a good value of this flag for a particular application from the GC log. First, look for concurrent mode failures in the log. Next, find the start record of the most recent CMS cycle before the failure and compute the old-generation occupancy at that point. Finally, set the flag to a value smaller than that occupancy.
    3. Increase the number of concurrent CMS threads (-XX:ConcGCThreads; default (CPU cores + 3) / 4)
  4. Permanent generation tuning: if the permanent generation has to be collected, a Full GC is performed, because CMS does not collect garbage in the permanent generation by default. Enabling -XX:+CMSPermGenSweepingEnabled turns on collection of the method area, with a set of threads dedicated to collecting the permanent generation. Another parameter, -XX:+CMSClassUnloadingEnabled, must also be enabled so that unused classes can be unloaded during garbage collection.
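The log-based procedure described above for choosing CMSInitiatingOccupancyFraction can be reduced to simple arithmetic. The capacity and occupancy values below are hypothetical, as is the 10-point safety margin. The sketch only shows the calculation, not how to parse a real GC log, and the names are illustrative.

```java
public final class OccupancyFractionSketch {

    // Old-generation occupancy in percent (rounded down), as read off the GC log at the
    // start of the most recent CMS cycle before a concurrent mode failure.
    static int occupancyPercent(long oldUsedKb, long oldCapacityKb) {
        return (int) (oldUsedKb * 100 / oldCapacityKb);
    }

    // Suggested -XX:CMSInitiatingOccupancyFraction: somewhat below the observed occupancy.
    // The safety margin is a judgment call, not a documented rule.
    static int suggestFraction(long oldUsedKb, long oldCapacityKb, int marginPct) {
        return Math.max(1, occupancyPercent(oldUsedKb, oldCapacityKb) - marginPct);
    }

    // Old-generation usage at which a cycle starts when
    // -XX:+UseCMSInitiatingOccupancyOnly makes the fraction the only trigger.
    static long triggerKb(long oldCapacityKb, int fractionPct) {
        return oldCapacityKb * fractionPct / 100;
    }

    public static void main(String[] args) {
        long capacityKb = 1_048_576; // hypothetical 1 GB old generation
        long usedKb = 716_800;       // hypothetical occupancy taken from the log
        int fraction = suggestFraction(usedKb, capacityKb, 10);
        System.out.println("observed " + occupancyPercent(usedKb, capacityKb) + "%");
        System.out.println("-XX:CMSInitiatingOccupancyFraction=" + fraction);
        System.out.println("cycle starts at ~" + triggerKb(capacityKb, fraction) + "K");
    }
}
```

With these numbers, the observed occupancy is 68% and the suggested flag value is 58. With -XX:+UseCMSInitiatingOccupancyOnly, a cycle would start at about 608174K of old-generation usage.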

conclusion

If your system needs low latency, CMS is a candidate: its STW pauses are shorter, although the total GC time is relatively longer. If the system needs high throughput, Parallel GC can be chosen. Its STW pauses are longer, but outside of GC the application threads have all of the system's resources to themselves.
