This is the 25th day of my participation in Gwen Challenge
Study background
- There are many articles introducing CMS GC and its tuning, but most of their claims are unverified. Because CMS was still widely used up to Java 9 (and G1 still needs further investigation), this post collects the most important CMS knowledge and tuning lessons learned.
- The JVM source referenced here is openjdk-8-src-b132-03_mar_2014 (openjdk/hotspot/src/share/vm). My personal advice is that openjdk7 may still be the better choice, because it is the industry standard.
Trust nothing but the OpenJDK source code and R大 (RednaxelaFX).
Some important points of CMS
Three parameters are required to use the CMS GC
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=n
-XX:+UseCMSInitiatingOccupancyOnly
- -XX:CMSInitiatingOccupancyFraction=n: the old generation occupancy percentage at which a CMS collection is triggered. In JDK 8 the default works out to 92%.
- -XX:+UseCMSInitiatingOccupancyOnly: whether every CMS collection is triggered by the fixed occupancy fraction above, instead of the JVM adjusting the trigger point on its own.
NewRatio Parameter Reference
- The default NewRatio is 2, meaning the ratio of young generation to old generation is 1:2, so the young generation is 1/3 of the heap
However, when -Xmx and -Xms are set together with CMS, the size of the young generation is not what you would expect
Reason: see runtime/arguments.cpp
else if (UseConcMarkSweepGC) {
set_cms_and_parnew_gc_flags();
}
const size_t preferred_max_new_size_unaligned =
MIN2(max_heap/(NewRatio+1), ScaleForWordSize(young_gen_per_worker * parallel_gc_threads));
With CMS, the preferred maximum young generation size is computed by the formula above, so it can end up much smaller than heap/(NewRatio+1).
Therefore, when using CMS, it is recommended to specify the young generation size manually
(-XX:NewRatio, or -Xmn, or -XX:NewSize / -XX:MaxNewSize)
In addition, see JDK-6862534: -XX:NewRatio completely ignored when combined with -XX:+UseConcMarkSweepGC
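To make the formula above concrete, here is a minimal Java sketch of the same calculation. It rests on assumptions that are not stated in this article: that CMSYoungGenPerWorker defaults to 64 MB, and that ScaleForWordSize on a 64-bit JVM scales by 1.3 and aligns down to the 8-byte heap word size. The class and method names are made up for illustration, and the real JVM applies further alignment and limits afterwards.

```java
// Illustrative sketch only: mirrors the quoted preferred_max_new_size_unaligned
// formula under assumed JDK 8 / 64-bit constants.
final class CmsYoungGenEstimate {
    static final long MB = 1024L * 1024L;
    // Assumption: default value of CMSYoungGenPerWorker.
    static final long YOUNG_GEN_PER_WORKER = 64 * MB;
    // Assumption: heap word size on a 64-bit JVM.
    static final long HEAP_WORD_SIZE = 8;

    // Assumed 64-bit behaviour of ScaleForWordSize: multiply by 1.3, align down to a word.
    static long scaleForWordSize(long bytes) {
        long scaled = bytes * 13 / 10;
        return scaled - (scaled % HEAP_WORD_SIZE);
    }

    // MIN2(max_heap / (NewRatio + 1), ScaleForWordSize(young_gen_per_worker * parallel_gc_threads))
    static long preferredMaxNewSize(long maxHeapBytes, int newRatio, int parallelGcThreads) {
        long byRatio = maxHeapBytes / (newRatio + 1);
        long byWorkers = scaleForWordSize(YOUNG_GEN_PER_WORKER * parallelGcThreads);
        return Math.min(byRatio, byWorkers);
    }

    public static void main(String[] args) {
        long heap = 12 * 1024 * MB; // -Xmx12G
        long preferred = preferredMaxNewSize(heap, 2, 8);
        System.out.println("heap/(NewRatio+1)   = " + heap / 3 / MB + " MB");
        System.out.println("preferred max young = " + preferred / MB + " MB");
    }
}
```

With a large heap and few GC threads, the per-worker term is the smaller one, which is why the young generation ends up far below one third of the heap unless it is sized explicitly.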
When watching CMS with jstat -gccause <pid>, the FGC count jumps by 2 every time a threshold-triggered collection happens
- CMS has two phases in a concurrent cycle, initial mark and final remark, both of which "stop the world" with short pause times
- jstat's FGC counter counts the number of times the application was paused, not the number of collections. Note that here the STW pauses are caused by the CMS GC itself. See references on jstat's full GC counter for details
- If you watch the CMS FGC and suddenly see a very long STW time, a few seconds or more, something abnormal has happened. These conditions are very expensive and should be avoided as much as possible when tuning CMS
concurrent mode failure
- This error is raised when application threads request more space from the old generation during a CMS concurrent cycle than is available
- Sometimes the "out of space" condition is only temporary, caused by too much floating garbage (or fragmentation) at the time of the CMS GC. Floating garbage is garbage produced by application threads while the CMS cycle is running, which cannot be reclaimed until the next cycle. This error can lead to one of two outcomes
- CMS foreground mode (the default CMS GC runs in background mode): this is CMS's own mark-sweep run as a non-concurrent (serial) old generation GC, with some phases omitted.
- The foreground collector algorithm of CMS is plain mark-sweep. It collects only the CMS old generation, not the other generations (this is debatable: marking uses the young generation as roots, and a scavenge, i.e. a minor GC, may be triggered beforehand). Therefore it is not called a full GC in HotSpot VM
Serial Old GC
- Mark-sweep-compact algorithm
- It collects the entire GC heap, including the young and old generations of the Java heap and the permanent generation outside the Java heap. Hence the name full GC
Why the former (foreground mode) exists
A STW foreground collection can pick up where a concurrent background collection left off to try to avoid a full GC. This is nice but normally it has worse performance than a full GC.
This is done to avoid FGC, but often performs even worse than FGC.
To get the first outcome (foreground mode), you must either set -XX:-UseCMSCompactAtFullCollection or set -XX:CMSFullGCsBeforeCompaction to a value greater than 0
- By default UseCMSCompactAtFullCollection is true and CMSFullGCsBeforeCompaction is 0 (compact on every full GC!), so with default settings it is the second outcome, Serial Old GC, that gets triggered
Reference:
- Bugs.openjdk.java.net/browse/JDK-…
- Bugs.openjdk.java.net/browse/JDK-…
- Bugs.openjdk.java.net/browse/JDK-…
All of these reports propose deprecating the foreground collector in Java 8 and removing it in Java 9, together with the two parameters UseCMSCompactAtFullCollection and CMSFullGCsBeforeCompaction
- Setting these two parameters is generally not recommended; otherwise the foreground collector may be triggered in Java 8, which may be slower (it is single-threaded). So a Serial Old GC is what usually runs when a concurrent mode failure occurs
- Source code for the deprecation warnings on UseCMSCompactAtFullCollection and CMSFullGCsBeforeCompaction:
runtime\arguments.cpp
if (FLAG_IS_CMDLINE(UseCMSCompactAtFullCollection)) {
warning("UseCMSCompactAtFullCollection is deprecated and will likely be removed in a future release.");
}
if (FLAG_IS_CMDLINE(CMSFullGCsBeforeCompaction)) {
warning("CMSFullGCsBeforeCompaction is deprecated and will likely be removed in a future release.");
}
- Source code that decides which of the two collections is used: gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
void CMSCollector::acquire_control_and_collect(...) {
  ...
  bool should_compact = false;
  decide_foreground_collection_type(clear_all_soft_refs, &should_compact, &should_start_over);
  ...
  if (should_compact) {
    ...
    // This is the mark-sweep-compact full GC
    do_compaction_work(clear_all_soft_refs);
    ...
  } else {
    // mark-sweep
    do_mark_sweep_work(clear_all_soft_refs, first_state,
                       should_start_over);
  }
  ...
}

// Set inside decide_foreground_collection_type:
*should_compact =
  UseCMSCompactAtFullCollection &&
  ((_full_gcs_since_conc_gc >= CMSFullGCsBeforeCompaction) ||
   GCCause::is_user_requested_gc(gch->gc_cause()) ||
   gch->incremental_collection_will_fail(true /* consult_young */));
The choice hinges on should_compact, which is determined mainly by the two parameters UseCMSCompactAtFullCollection and CMSFullGCsBeforeCompaction
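The should_compact expression can be restated as a small Java predicate to see how the defaults behave. This is only a sketch of the boolean logic quoted above; the class, method and parameter names are invented for illustration and do not exist in the JVM.

```java
// Illustrative restatement of the should_compact condition from
// concurrentMarkSweepGeneration.cpp; names here are hypothetical.
final class ForegroundCollectionChoice {
    static boolean shouldCompact(boolean useCmsCompactAtFullCollection,
                                 int cmsFullGcsBeforeCompaction,
                                 int fullGcsSinceConcGc,
                                 boolean userRequestedGc,
                                 boolean incrementalCollectionWillFail) {
        return useCmsCompactAtFullCollection
                && (fullGcsSinceConcGc >= cmsFullGcsBeforeCompaction
                    || userRequestedGc
                    || incrementalCollectionWillFail);
    }

    public static void main(String[] args) {
        // Defaults: flag on, CMSFullGCsBeforeCompaction=0, so 0 >= 0 always holds.
        System.out.println("defaults compact: "
                + shouldCompact(true, 0, 0, false, false));
        // A positive CMSFullGCsBeforeCompaction leaves room for the mark-sweep path.
        System.out.println("CMSFullGCsBeforeCompaction=2 compact: "
                + shouldCompact(true, 2, 0, false, false));
    }
}
```

Because the counter comparison is always satisfied when CMSFullGCsBeforeCompaction is 0, the default configuration always takes the compacting path.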
concurrent promotion failed
Java Performance: The Definitive Guide says:
The original:
Here, CMS started a young collection and assumed that there was enough free space to hold all the promoted objects (otherwise, it would have declared a concurrent mode failure). That assumption proved incorrect: CMS couldn’t promote the objects because the old generation was fragmented (or, much less likely, because the amount of memory to be promoted was bigger than CMS expected).
In other words:
CMS started a young collection on the assumption that the old generation had enough free space to hold all promoted objects (otherwise it would have reported a concurrent mode failure). The assumption turned out to be wrong: CMS could not promote the objects because the old generation was fragmented or, much less likely, because promotion needed more memory than CMS had estimated.
The original:
Sometimes we see these promotion failures even when the logs show that there is enough free space in tenured generation. The reason is 'fragmentation' - the free space available in tenured generation is not contiguous, and promotions from young generation require a contiguous free block to be available in tenured generation. CMS collector is a non-compacting collector, so can cause fragmentation of space for some type of applications.
In other words:
- The CMS collector does not compact the old generation, so the old generation becomes fragmented as the application runs. Heavy fragmentation hurts the allocation of large objects: plenty of space may remain in the old generation, yet no contiguous block is large enough for a large object
- Promotion failed is reported when CMS indicated that promotion would succeed as ParNew was about to collect, but ParNew then hit a promotion failure once the collection had started
- Put differently, promotion failed means the guarantee check said the old generation had enough room for the promoted objects, but the actual allocation could not find a suitable contiguous block because of fragmentation. Concurrent mode failure, by contrast, occurs when application threads request more space than has been reserved before the concurrent cycle finishes, i.e. the background thread does not collect as fast as the application allocates
- Promotion failed triggers an FGC in the same way as described above, usually a Serial Old GC
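The fragmentation effect described above can be illustrated with a toy model: a list of free chunk sizes where the total is large but no single chunk fits the object. This is a deliberately simplified sketch, not how the CMS free lists are implemented; the class and its methods are hypothetical.

```java
import java.util.Arrays;

// Toy model of a fragmented old generation: only the sizes of the free chunks
// are tracked. Names are hypothetical and unrelated to HotSpot internals.
final class FragmentedFreeList {
    private final long[] freeChunks;

    FragmentedFreeList(long... freeChunks) {
        this.freeChunks = freeChunks.clone();
    }

    // Sum of all free space, which is what a "total free" check would see.
    long totalFree() {
        return Arrays.stream(freeChunks).sum();
    }

    long largestChunk() {
        return Arrays.stream(freeChunks).max().orElse(0L);
    }

    // A single object needs one chunk that is at least as large as the object.
    boolean canPlace(long objectSize) {
        return largestChunk() >= objectSize;
    }

    public static void main(String[] args) {
        FragmentedFreeList oldGen = new FragmentedFreeList(4, 4, 4, 4);
        long objectSize = 6;
        System.out.println("total free = " + oldGen.totalFree()
                + ", largest chunk = " + oldGen.largestChunk()
                + ", can place " + objectSize + "? " + oldGen.canPlace(objectSize));
    }
}
```

The total free space exceeds the object size, yet the placement fails, which is the situation a promotion failure reports.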
permgen (or the metaspace) fills up
- In Java 8 this is mostly triggered when Metaspace expands
- If CMS is configured for the old generation, the FGC caused by Metaspace expansion is turned into a CMS collection
- By default, the collector in Java 8 also unloads classes in Metaspace that are no longer in use
Right after the application starts, jstat -gccause <pid> shows FGC even though OU (old generation usage) is still empty
- This is usually the result of the Metaspace expansion mentioned above. It can also be seen in the LGCC column: Metadata GC Threshold means the collection was triggered because the Metaspace size reached its GC threshold
- MetaspaceSize mainly controls the initial (and minimum) threshold at which a Metaspace GC occurs; the threshold that actually triggers a Metaspace GC keeps changing at run time
jstat -gccause 23270 1000
  S0     S1     E      O      M     CCS    YGC   YGCT   FGC   FGCT    GCT    LGCC                    GCC
  0.00  25.87  82.46   0.00  97.47  94.80     1  0.124     2  0.096  0.220   Metadata GC Threshold   No GC
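A data line like the one above can be checked mechanically. The following Java sketch parses the eleven numeric columns in the order shown in the sample and flags the "FGC counted while the old generation is still empty" pattern. The parser class, the column list and the 1% cutoff are assumptions for illustration, not part of jstat.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for reading one data line of `jstat -gccause` output.
final class JstatGccauseParser {
    // Numeric columns in the order shown in the sample above (JDK 8 layout assumed).
    private static final String[] NUMERIC_COLUMNS =
            {"S0", "S1", "E", "O", "M", "CCS", "YGC", "YGCT", "FGC", "FGCT", "GCT"};

    static Map<String, Double> parseNumericColumns(String dataLine) {
        String[] tokens = dataLine.trim().split("\\s+");
        if (tokens.length < NUMERIC_COLUMNS.length) {
            throw new IllegalArgumentException("expected at least "
                    + NUMERIC_COLUMNS.length + " columns: " + dataLine);
        }
        Map<String, Double> values = new LinkedHashMap<>();
        for (int i = 0; i < NUMERIC_COLUMNS.length; i++) {
            values.put(NUMERIC_COLUMNS[i], Double.parseDouble(tokens[i]));
        }
        return values; // the trailing LGCC / GCC text columns are ignored
    }

    // Arbitrary illustrative cutoff: FGC already counted while O is below 1%.
    static boolean fgcWithEmptyOldGen(Map<String, Double> row) {
        return row.get("FGC") > 0 && row.get("O") < 1.0;
    }

    public static void main(String[] args) {
        String line = "0.00 25.87 82.46 0.00 97.47 94.80 1 0.124 2 0.096 0.220 "
                + "Metadata GC Threshold No GC";
        Map<String, Double> row = parseNumericColumns(line);
        System.out.println(row);
        System.out.println("FGC with empty old gen: " + fgcWithEmptyOldGen(row));
    }
}
```

When this check fires, the LGCC column is the place to confirm whether Metadata GC Threshold was the cause.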
CMS failure patterns that show up in the GC logs
[ParNew (promotion failed): ... (concurrent mode failure):...
In this case, promotion failed occurs first and an FGC is about to be triggered.
At that moment CMS is in the middle of a concurrent collection, so the cycle is interrupted and the log shows it as concurrent mode failure
The relevant source is again in concurrentMarkSweepGeneration.cpp
if (first_state > Idling) {
report_concurrent_mode_interruption();
}
[ParNew (promotion failed): ...
Concurrent mode failure
The JVM has a promotion guarantee check, roughly: is the available space in the old generation larger than the total size of the objects in the young generation? Promotion failed occurs when this check passes but the promotion still fails. If the check looked at the largest contiguous free block, why would promotion fail afterwards? The answer, from 5.0 on, is "because a single contiguous chunk of space is not required for promotions": since JDK 5 the guarantee only asks whether the old generation has enough total space to hold the promoted objects, not whether that space is contiguous. So the failure is a fragmentation problem
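The log patterns discussed in this section can be told apart with simple string matching. The sketch below classifies a GC log line by the two markers shown in the log snippets above; the class and enum names are made up for illustration, and real log lines may be split or interleaved differently depending on JVM flags.

```java
// Hypothetical classifier for the CMS failure markers quoted in this section.
final class CmsLogClassifier {
    enum Failure {
        NONE,
        PROMOTION_FAILED,
        CONCURRENT_MODE_FAILURE,
        // Promotion failed while a concurrent cycle was running, so the
        // interrupted cycle is also logged as concurrent mode failure.
        PROMOTION_FAILED_DURING_CYCLE
    }

    static Failure classify(String logLine) {
        boolean promotionFailed = logLine.contains("(promotion failed)");
        boolean concurrentModeFailure = logLine.contains("(concurrent mode failure)");
        if (promotionFailed && concurrentModeFailure) {
            return Failure.PROMOTION_FAILED_DURING_CYCLE;
        }
        if (promotionFailed) {
            return Failure.PROMOTION_FAILED;
        }
        if (concurrentModeFailure) {
            return Failure.CONCURRENT_MODE_FAILURE;
        }
        return Failure.NONE;
    }

    public static void main(String[] args) {
        System.out.println(classify("[ParNew (promotion failed): ... (concurrent mode failure):..."));
        System.out.println(classify("[ParNew (promotion failed): ..."));
    }
}
```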
CMS Optimization Direction
Principle
CMS's advantage is low latency, so a long STW pause has a significant impact on the application.
Concurrent mode failures and promotion failures are very expensive and should be avoided as much as possible.
Optimizations for Concurrent mode failures
The main reason for this failure is that CMS cannot clean up the old generation fast enough
A concurrent collection starts when old generation usage reaches a certain threshold. From that moment a race is on: a CMS background thread starts scanning the old generation for garbage, and the collector must finish scanning and reclaiming before the remaining old generation space is exhausted. If it loses this race, the concurrent mode failure occurs
During the concurrent sweep phase application threads are still running, so space must be set aside for them, and the garbage they produce during the cycle becomes 'floating garbage'.
Conventional optimization approaches are as follows
- Run the background collection thread more often, that is, increase the frequency of CMS concurrent cycles
- This is mainly done by lowering the CMSInitiatingOccupancyFraction value
- But do not set it too low, because that leads to overly frequent GC, which costs more CPU and more pauses
First work out how much of the old generation is resident (long-lived) data. If it is 60%, the threshold can be set to about 70%; a threshold at or below the resident level makes GC run far more often
The guarantee mechanism can also be taken into account: keep the space reserved in the old generation larger than the young generation. For example, with a young to old ratio of 1:4 the young generation is 25% of the old generation, so the threshold can be set to 70, which reserves 30% of the old generation
Note that this does not help if there is too much floating garbage. During a concurrent CMS collection floating garbage keeps growing and eats into the reserved space; if several YGCs happen within one cycle, the reserved space may fill up, although the probability is low
Taking both conditions together: if the threshold is set to 70 but the resident data in the old generation is large, or even exceeds 70%, then increase the heap, enlarge the old generation, or shrink the young generation
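The two conditions above (resident data below the threshold, and reserved space at least as large as the young generation) are simple arithmetic and can be checked before settling on a value. This is a minimal sketch of that arithmetic only; the class and method names are hypothetical, and the sizes are inputs you have to measure yourself.

```java
// Hypothetical sanity check for a CMSInitiatingOccupancyFraction candidate.
final class CmsThresholdCheck {
    static final long GB = 1024L * 1024L * 1024L;

    // Condition 1: resident (long-lived) data stays below the trigger threshold.
    static boolean residentBelowThreshold(long oldGenBytes, long residentBytes, int thresholdPercent) {
        return residentBytes * 100 < oldGenBytes * thresholdPercent;
    }

    // Space left in the old generation at the moment the threshold is reached.
    static long reservedBytes(long oldGenBytes, int thresholdPercent) {
        return oldGenBytes * (100 - thresholdPercent) / 100;
    }

    // Condition 2: the reserved space can absorb a full young generation.
    static boolean reserveCoversYoungGen(long oldGenBytes, long youngGenBytes, int thresholdPercent) {
        return reservedBytes(oldGenBytes, thresholdPercent) >= youngGenBytes;
    }

    public static void main(String[] args) {
        long young = 2 * GB;
        long old = 8 * GB;                   // young:old = 1:4
        long resident = old * 60 / 100;      // 60% resident data
        int threshold = 70;
        System.out.println("resident below threshold: "
                + residentBelowThreshold(old, resident, threshold));
        System.out.println("reserve covers young gen: "
                + reserveCoversYoungGen(old, young, threshold));
    }
}
```

If either check fails for the chosen threshold, the text above suggests changing the heap layout rather than pushing the threshold further.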
Optimization for promotion failed
This is the most serious 'fragmentation problem' in CMS, and the goal is to avoid the FGC that follows when it happens
So optimizing for it can also be described as 'how to solve the fragmentation problem'.
Conventional optimization approaches are as follows
Increase the heap and the old generation size. Keep in mind that the HotSpot JVM compresses object pointers only when the heap is smaller than about 32 GB.
Run CMS GC earlier by setting a reasonable CMSInitiatingOccupancyFraction; the sweep merges adjacent free chunks, so the old generation can accommodate larger objects
As above, keep more free space in the old generation than the size of the young generation
When the threshold is reached the CMS GC is triggered, but floating garbage plus fragmentation can still occur as described above
Another, rather "bad" solution is to run an FGC in the early hours of the morning, when traffic is low, to compact the fragments.
For example via System.gc(), but pay attention to whether -XX:+ExplicitGCInvokesConcurrent or -XX:+DisableExplicitGC is enabled
So the recommended way is to use jmap -histo:live
Promotion failed can also be caused by a To space that is too small; consider enlarging the Survivor spaces if that is the case
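The reason the two flags matter for a scheduled System.gc() can be summarized as a small decision function. This sketch encodes my understanding of the documented flag semantics (DisableExplicitGC turns the call into a no-op, ExplicitGCInvokesConcurrent turns it into a concurrent cycle that does not compact); the class and enum are hypothetical.

```java
// Hypothetical summary of what System.gc() does under the two flags with CMS.
final class ExplicitGcBehavior {
    enum Outcome {
        IGNORED,             // -XX:+DisableExplicitGC: the call does nothing
        CONCURRENT_CYCLE,    // -XX:+ExplicitGCInvokesConcurrent: CMS cycle, no compaction
        FULL_COMPACTING_GC   // neither flag: stop-the-world full GC
    }

    static Outcome onSystemGc(boolean disableExplicitGc, boolean explicitGcInvokesConcurrent) {
        if (disableExplicitGc) {
            return Outcome.IGNORED;
        }
        if (explicitGcInvokesConcurrent) {
            return Outcome.CONCURRENT_CYCLE;
        }
        return Outcome.FULL_COMPACTING_GC;
    }

    // Only a full compacting GC removes fragmentation.
    static boolean defragments(Outcome outcome) {
        return outcome == Outcome.FULL_COMPACTING_GC;
    }

    public static void main(String[] args) {
        for (boolean disable : new boolean[] {false, true}) {
            for (boolean concurrent : new boolean[] {false, true}) {
                Outcome outcome = onSystemGc(disable, concurrent);
                System.out.println("DisableExplicitGC=" + disable
                        + " ExplicitGCInvokesConcurrent=" + concurrent
                        + " -> " + outcome + ", defragments=" + defragments(outcome));
            }
        }
    }
}
```

With either flag enabled, System.gc() no longer compacts, which is the motivation for preferring jmap -histo:live as the trigger.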
CMS Parameters in Practice
CMS logs are used to troubleshoot CMS problems
Basic parameters:
-Xloggc:gc_%t.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps
Optional debugging parameters:
-XX:+PrintGCApplicationStoppedTime   # print application pause times
-XX:+PrintTenuringDistribution       # print the tenuring (promotion) distribution
-XX:+PrintPromotionFailure           # print promotion failures
-XX:+PrintHeapAtGC                   # print the heap around each GC
-XX:PrintFLSStatistics=1             # rarely used, mainly for free-list statistics
CMS-related settings
- Dedicated server memory: 16 GB
- Estimate the resident objects in the old generation: for example, 3000 players at about 2 MB each is roughly 6 GB, so 10 GB is recommended for the old generation
- -Xms12G -Xmx12G
- Set 2 GB for the young generation and 10 GB for the old generation
- Set CMSInitiatingOccupancyFraction to 70, which leaves 3 GB free in the old generation, more than the size of the young generation
- Optional: -XX:+CMSScavengeBeforeRemark
A simple rule of thumb:
-XX:NewRatio=4, that is, young generation to old generation is 1:4
Then set CMSInitiatingOccupancyFraction to 70, so that the space left in the old generation is larger than the young generation. Make sure 70% is comfortably above the resident data in the old generation, otherwise CMS GC may run frequently
It is also recommended to add a script that manually triggers an FGC to defragment,
for example every day at 3 a.m.
jstat -gccause pid >> cms.log
jmap -histo pid >> cms.log
jstat -gccause pid >> cms.log
jmap -histo:live pid >> cms.log
metaspace
-XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=512m. Note that if the values are too small, FGC or even a Metaspace OOM will occur