1. The background

The LongGC alarm is generated for applications on multiple service lines

Recently, Long GC alarms (longer than 3 seconds) are frequently received from CAT.

2. Knowledge review

2.1 JVM heap memory partition

  • Young Generation

The Cenozoic era is divided into three zones: Eden, from survivor, and to survivor. Most objects are created in the new generation. The Minor GC is aimed at the new generation of garbage collection.

  • The Old Generation

Objects that survive a few Minor GC’s in the new generation are relegated to the old generation. The Major GC is for old age garbage collection. This paper focuses on the analysis of CMS is a garbage collection algorithm for the old. In addition, Full GC does garbage collection for the whole heap (both new generation and old generation).

  • Permanent generation (Perm)

It stores data such as class information, constants, and static variables that have been loaded by VMS. This area has little impact on garbage collection and will not be covered too much in this article.

2.2 six important phases of CMS garbage collection

1. Initial-mark (the first STW phase of CMS), which marks objects referenced directly by GC Root.

Concurrent-mark stage, in which all reachable objects are marked from the objects marked in the first stage.

Concurrent-preclean Concurrent preclean is also a phase of concurrent execution. In this phase, objects that were promoted from the new generation or newly assigned or updated during the implementation of the previous phase are found. By rescanning these objects concurrently, the pre-clean phase can reduce the amount of work done in the next stop-the-world re-marking phase.

Concurrent-abortable-preclean Indicates the preclean phase of concurrent abortable. This phase actually does the same thing as the previous phase to reduce the workload of the next STW relabeling phase. This phase is added so that we can control the end time of this phase, such as the scanning time (default: 5 seconds) or the end of this phase when the usage ratio of Eden area reaches the desired ratio (default: 50%).

Remark remarking phase (second STW phase of CMS), suspends all user threads, rescan the heap from GC Root, and marks alive objects. Note that although the CMS only collects old garbage objects, this phase still needs to scan the new generation because many GC roots are in the new generation, and these GC roots point to objects in the old generation. This is called “cross-generation references”.

Concurrent-sweep, concurrent-sweep, concurrent-sweep

3. The analysis

Let’s take a look at what happens when LongGC appears.

Analyze the GC logs of one of the applications and find that the LongGC occurred during the collection phase of the CMS.

Arrow 1 shows that the abortable-preclean phase takes 4.04 seconds. Arrow 2 shows the remark phase, which takes 0.11 seconds.

Although the abortable-preclean phase is concurrent, other user threads are not suspended. Even if it’s not optimized, it probably doesn’t matter. However, receiving gc alarms from various business lines every day is not a good thing in the long run.

Take a look at the application’s GC statistics, including the number of GC counts, before tuning:

During the statistical period (18 days), CMS GC occurred 69 times, of which the average time of abortable Preclean phase was 2.45 seconds, the average time of final remark phase was 112ms, and the maximum time was 170ms.

4. Optimize your goals

Reduce abortable preclean time without increasing final remark time (because remark is STW).

5. Tune JVM parameters

5.1 First Tuning

Try lowering the abortable preclean stage to see the effect.

There are two parameters that control when this phase ends:

  • -XX:CMSMaxAbortablePrecleanTime=5000

The default value is 5s, which indicates the maximum duration of this phase

  • -XX:CMSScheduleRemarkEdenPenetration=50

The default value is 50%, which indicates that if the usage ratio in Eden exceeds 50%, the phase ends and enters remark

The maximum duration is 1s, and the Eden area occupies 10%, as follows:

-XX:CMSMaxAbortablePrecleanTime=1000

-XX:CMSScheduleRemarkEdenPenetration=10

Why we adjust these two values is considered as follows: Firstly, each CMS occurs when the usage ratio reaches 80% in the old years, because this is determined by the following two parameters:

-XX:CMSInitiatingOccupancyFraction=80

-XX:+UseCMSInitiatingOccupancyOnly

The increase in the old generation is due to the fact that some of the objects that survived the Minor GC are promoted to the old generation, resulting in an increase in the old generation usage ratio, i.e., each Minor GC occurs just before the CMS GC, so the usage ratio of the new generation at that point is very low. Then we expect to end the abortable preclean stage as soon as possible at this time, so there will be no need to scan too many objects in Eden area during remark, and the time of remark STW will not be too long.

The idea of adjustment is so, how is the effect?

The result of the first adjustment

During the statistical period (17 hours or so), two CMS GCS occurred. Abortable Preclean takes 835ms on average, which is expected. However, the average time of Final Remark is 495ms (112ms before adjustment), one time is 80ms, the other time is 910ms! Nearly a second! Remark is STW! This is unacceptable for applications that require low latency!

Comparing the detailed GC logs from the two CMS GC’s, we found something very useful for analyzing the problem.

Remark remark the GC log that took 80ms

[YG occupancy: 181274 K (1887488 K)] – Current occupancy and total capacity of young occupancy

When the remark, which took 80ms, occurred (9am, non-peak time), the Cenozoic generation (YG) occupied 181.274m.

Remark Remark The GC log that took 910ms

[YG occupancy: 773427 K (1887488 K)]

When the 910ms remark occurred (around 10pm, peak time), the Cenozoic generation (YG) occupied 773.427M. Because at the peak of this time, the Cenozoic usage rose very fast, almost the same time, the non-peak time only rose to 181M, but the peak time rose to 773M.

A useful conclusion can be drawn here: if the abortale preclean phase is too short, then in remark, the larger the Cenozoic generation occupation, the longer the remark duration (STW).

This is a dilemma. If the abortale preclean time is not shortened, longGC will be reported. If the remark phase is shortened, the remark phase will be longer, and it is STW, which is unacceptable.

In this case, CMS provides the CMSScavengeBeforeRemark parameter to try to perform a Minor GC before the remark phase to reduce the occupation of the Cenozoic generation.

-XX:+CMSScavengeBeforeRemark

Enables scavenging attempts before the CMS remark step. By default, this option is disabled.

5.2 Second Tuning

Considerations before tuning:

Adding -xx :+CMSScavengeBeforeRemark is not costless, as it adds a Minor GC pause. Therefore, the criterion of whether this scheme is good or not is: if the minor GC pause time after adding CMSScavengeBeforeRemark parameter + remark GC pause time is smaller than the remark GC pause time before adding CMSScavengeBeforeRemark parameter, it is a good scheme.

Results of the second adjustment

During the statistical period (about 20 hours), three CMS GCS occurred. Abortable Preclean takes 693ms on average. The average time of Final remark is 50ms, and the maximum time is 60ms. The Final remark time is lower than the average time before tuning (112ms).

What about the Minor GC pause times before the CMS GC? Take a look at the verbose GC logs.

Analyze Minor GC logs before CMS GC remarks three times

In the first non-peak period, the Minor GC takes 0.01s + remark takes 0.06s = 0.07s = 70ms, as follows

The second time is the peak period, and the Minor GC takes 0.01s + remark takes 0.05s = 0.06s = 60ms, as follows

In the third non-peak period, the Minor GC takes 0.00s + remark takes 0.04s = 0.04s = 40ms, as follows

As a result, the average time of the three Minor GC + remark times is < 60ms, which is much better than the average time of 495ms for the first tuning.

6. Optimization results

So far, our initial goal of reducing abortable preclean time without increasing final remark time has been achieved. Even remark time has been shortened.

7. Summary

The solution to the long abortable preclean time can be summed up in two steps:

  • Shorten the aborTable preclean duration by adjusting these two parameters:

-XX:CMSMaxAbortablePrecleanTime=xxx

-XX:CMSScheduleRemarkEdenPenetration=xxx

One criterion for how much to adjust is: at the end of the abortable preclean phase, the space occupied by the new generation cannot exceed a certain reference value. After the first tuning, Cenozoic generation (YG) occupies 181.274m, and remark takes 80ms. When Cenozoic generation (YG) occupies 773.427M, remark takes 910ms. So this reference could be 300M. If the Cenozoic generation grows too fast, like this tuning application that uses 2 gb of Cenozoic heap space in 2 seconds, you can only do a Minor GC through CMSScavengeBeforeRemark.

  • Added the CMSScavengeBeforeRemark parameter to attempt Minor GC before enabling remark

Although it is officially stated that this is an attempt to perform Minor GC, it may not happen. In practice, however, almost every time remark is preceded by a Minor GC.

8. To summarize

  1. Define your goals before tuning
  2. During the tuning process, GC indicators were statistically analyzed (in this paper, gCeasy. IO online analysis tool was used) to verify the effect
  3. You need to be able to read GC logs
  4. GC tuning is not an overnight affair, it is a fine-sight-fine-tuning process. Therefore, you need to understand some of the basics of GC in order to avoid detours.