Background
One day, a group member sent a screenshot of jmap -heap output showing the heap memory usage.
It showed that the Survivor space usage was always above 98%.
If you look closely at this chart, it contains several important pieces of information:
- The From and To spaces are relatively small, only 10 MB. Because their capacity is small, the usage ratio merely appears high.
- The Old generation is relatively large (2 GB) and its usage is relatively high.
In addition, you can also see that the ratio between Eden, From, and To is not the default 8:1:1.
So AdaptiveSizePolicy immediately comes to mind.
The group member confirmed that the default garbage collector of JDK 1.8 is used.
JVM parameters are configured as follows:
No GC algorithm is configured in the parameters, which means the default UseParallelGC is used.
Start a JDK 1.8-based application with the default parameters, then use jinfo -flags <pid> to see the GC algorithm configured by default.
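As a cross-check, the running collectors can also be listed from inside the application via the standard GarbageCollectorMXBean API. A minimal sketch (the class name is just for illustration):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class ShowCollectors {
    public static void main(String[] args) {
        // On JDK 1.8 started with default flags this typically prints
        // "PS Scavenge" and "PS MarkSweep", i.e. the collectors behind UseParallelGC.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName());
        }
    }
}
```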
AdaptiveSizePolicy is enabled by default
AdaptiveSizePolicy is part of the JVM GC Ergonomics.
If AdaptiveSizePolicy is enabled, the Eden, From, and To areas are recalculated after each GC based on the GC time, throughput, and memory usage.
JDK 1.8 uses the UseParallelGC garbage collector by default, which has AdaptiveSizePolicy enabled by default.
AdaptiveSizePolicy has three goals:
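The flag itself can also be inspected at runtime through the HotSpotDiagnosticMXBean (available since JDK 7). A minimal sketch, with an illustrative class name:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class CheckAdaptiveSizePolicy {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // With the default UseParallelGC on JDK 1.8 this prints "true".
        System.out.println(diag.getVMOption("UseAdaptiveSizePolicy").getValue());
    }
}
```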
- Pause goal: The application reaches the expected GC Pause time.
- Throughput goal: The application reaches the expected throughput, defined as application run time / (run time + GC time); a worked example follows this list.
- Minimum footprint: the smallest possible memory footprint.
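To make the throughput goal concrete: in HotSpot the goal is derived from the GCTimeRatio flag, whose default for the parallel collector is 99 as far as I know, giving a target of roughly 99% application time. A small sketch of the arithmetic, assuming that default:

```java
public class ThroughputGoal {
    public static void main(String[] args) {
        // Assumes HotSpot's default GCTimeRatio of 99.
        double gcTimeRatio = 99.0;
        // throughput goal = 1 - 1 / (1 + GCTimeRatio) = target for uptime / (uptime + GC time)
        double goal = 1.0 - 1.0 / (1.0 + gcTimeRatio);
        System.out.println(goal); // 0.99 -> the application should run 99% of the time
    }
}
```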
AdaptiveSizePolicy takes the following actions to reach these three goals:
- If the GC pause time exceeds the expected value, the memory size is reduced. In theory, with less memory, work such as garbage marking takes less time, so the desired pause time can be reached.
- If the application throughput is less than expected, the memory size is increased. In theory, increasing memory can reduce the frequency of GC to achieve the desired throughput.
- If the application meets the first two goals, it tries to shrink memory to reduce the footprint.
Note: AdaptiveSizePolicy covers a wide range of behavior; this article focuses mainly on its impact on the size of the young generation and the problems that follow from it.
AdaptiveSizePolicy looks smart, but sometimes it can be naughty and cause GC problems.
Even though the default value of SurvivorRatio is 8, the ratio between the three areas of the young generation can still change.
For this question, please refer to the answer from R大 (RednaxelaFX):
Hllvm.group.iteye.com/group/topic…
In the HotSpot VM, the default value of SurvivorRatio does not take effect for the ParallelScavenge family of collectors (UseParallelGC / UseParallelOldGC) unless it is set explicitly. Explicitly setting it to the same value as the default does take effect.
ParallelScavenge was originally designed to have AdaptiveSizePolicy enabled by default, automatically and adaptively adjusting the various sizes.
In the group member's screenshot, the From space is only 10 MB, while the Eden space occupies more than 80% of the young generation.
The reason is that AdaptiveSizePolicy has adjusted the sizes in order to reach its goals.
After locating the reason for the small Survivor space, another question remains:
Why are the capacity and usage of the old generation relatively high?
The group member then used jmap -histo to view the instances in the heap.
It can be seen that there are a large number of instances of two classes:
- LinkedHashMap$Entry
- ExpiringCache$Entry
Therefore, search for the key class ExpiringCache.
You can see that a LinkedHashMap is initialized in the ExpiringCache constructor, so the large number of LinkedHashMap$Entry instances is presumably directly related to it.
```java
ExpiringCache(long millisUntilExpiration) {
    this.millisUntilExpiration = millisUntilExpiration;
    map = new LinkedHashMap<String,Entry>() {
        protected boolean removeEldestEntry(Map.Entry<String,Entry> eldest) {
            return size() > MAX_ENTRIES;
        }
    };
}
```

Note: This map stores the cached data and sets up an eviction mechanism: when the map size exceeds MAX_ENTRIES = 200, eviction begins.
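To make the eviction behaviour concrete, here is a minimal standalone sketch (not the JDK class itself, and with a deliberately tiny MAX_ENTRIES) showing how removeEldestEntry discards the eldest entries once the size limit is exceeded:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EvictionDemo {
    static final int MAX_ENTRIES = 3; // tiny limit just for the demo

    public static void main(String[] args) {
        Map<String, String> cache = new LinkedHashMap<String, String>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Same pattern as ExpiringCache: evict once the map grows past the limit.
                return size() > MAX_ENTRIES;
            }
        };
        for (int i = 1; i <= 5; i++) {
            cache.put("k" + i, "v" + i);
        }
        // Prints {k3=v3, k4=v4, k5=v5}: the two eldest entries were dropped and became garbage.
        System.out.println(cache);
    }
}
```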
Then look at the ExpiringCache$Entry class.
The main fields of this class are a timestamp and a value; the timestamp is used for timeout-based eviction, a common caching practice.

```java
static class Entry {
    private long timestamp;
    private String val;
    // ...
}
```
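A hedged sketch of the timestamp-expiry pattern described above (not the JDK implementation; names and the expiration value are illustrative):

```java
public class TimestampExpiryDemo {
    static final long MILLIS_UNTIL_EXPIRATION = 30_000; // e.g. 30 seconds

    static class Entry {
        final long timestamp;  // when the value was cached
        final String val;

        Entry(long timestamp, String val) {
            this.timestamp = timestamp;
            this.val = val;
        }

        boolean isStillValid(long now) {
            // The entry is usable only if it was written recently enough.
            return now - timestamp < MILLIS_UNTIL_EXPIRATION;
        }
    }

    public static void main(String[] args) {
        Entry e = new Entry(System.currentTimeMillis(), "/tmp/data/file.txt");
        System.out.println(e.isStillValid(System.currentTimeMillis())); // true right after caching
    }
}
```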
Then look at where the cache is used.
So I found the get method and located that only one class uses this cache.
Looking up, I see a familiar class: File, whose getCanonicalPath() method uses this cache.
This method is used to get the file path.
I asked the group whether the getCanonicalPath() method is used in their project.
The answer was yes.
When the getCanonicalPath() method is used in a project to get the file path, the following things happen:
- First, the path is read from the cache; if it is not there, a cache entry needs to be generated.
- To generate a cache entry, an ExpiringCache$Entry object is created to store the cached value. These new objects are allocated in the Eden space.
- When getCanonicalPath() is used extensively and the number of cached entries exceeds MAX_ENTRIES = 200, the eviction policy kicks in. The evicted ExpiringCache$Entry objects in the map become garbage, and only about 200 entries actually survive.
- When a YGC occurs, the roughly 200 surviving entries should be copied to the To space, and the evicted garbage entries are reclaimed.
- However, because AdaptiveSizePolicy has shrunk the To space to only 10 MB, there is not enough room for the objects that should be moved to the To space, so they can only be promoted to the old generation.
- As a result, at each YGC close to 200 live ExpiringCache$Entry objects are promoted to the old generation. As cache eviction continues, these promoted Entry objects soon become garbage again.
- Once an object has been promoted to the old generation, even if it becomes garbage it has to wait for an old-generation GC or a Full GC to be reclaimed, so each YGC effectively leaks about 200 ExpiringCache$Entry objects into the old generation.
- As a result, old generation usage gradually increases.
The cause of the high old-generation memory usage has also been located.
Since only about 200 instances are promoted per YGC, the problem builds up slowly.
FGC is only triggered once in a while, so the application appears to work just fine.
Then use jstat -gcutil to check the GC status.
You can see that 15,654 YGCs have occurred since the application started.
Each YGC promotes about 200 ExpiringCache$Entry objects, so roughly 15,654 × 200 ≈ 3,130,800 ExpiringCache$Entry objects have been promoted to the old generation.
As you can see from jmap -histo, the total number of ExpiringCache$Entry objects is 6,118,824.
Both numbers are in the millions; the remaining roughly 3 million instances should be sitting in the Eden space.
After every YGC, a large number of ExpiringCache$Entry objects are reclaimed.
From the GC logs captured by the group member, a YGC occurs roughly once every 23 seconds.
Assume that the jmap -histo command was run just before a YGC was triggered.
Then the application generates roughly 3 million ExpiringCache$Entry instances in about 20 seconds, i.e. about 150,000 instances per second.
With a single-machine QPS of about 300, that is roughly 500 ExpiringCache$Entry instances per request.
The guess is that the getCanonicalPath() method is called inside a loop body.
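A hypothetical sketch of that suspected pattern (paths, class and method names are made up for illustration): with many distinct paths per request, most calls miss the 200-entry cache, so each call allocates an ExpiringCache$Entry that is soon evicted and becomes garbage.

```java
import java.io.File;
import java.io.IOException;

public class CanonicalPathInLoop {
    // Imagined per-request handling: resolve many distinct files.
    static void handleRequest(int requestId) throws IOException {
        for (int i = 0; i < 500; i++) { // ~500 getCanonicalPath() calls per request
            File f = new File("/tmp/data/req-" + requestId, "file-" + i + ".txt");
            // Each distinct path not present in the 200-entry cache creates a new
            // ExpiringCache$Entry, which is evicted again shortly afterwards.
            f.getCanonicalPath();
        }
    }

    public static void main(String[] args) throws IOException {
        handleRequest(1);
    }
}
```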
Thus, the reasons why the Survivor space became smaller and old-generation usage became higher can be summarized:
- With the default SurvivorRatio = 8 and the throughput goal not being met, AdaptiveSizePolicy enlarged the Eden space, and the From and To spaces were squeezed down to only 10 MB.
- The getCanonicalPath() method is used extensively in the project, generating a large number of ExpiringCache$Entry instances.
- When a YGC occurs, the surviving Entry objects are promoted directly to the old generation because the To space is too small, so old-generation usage gradually grows.
As can be seen from the group member's jstat -gcutil screenshot, from application start to the time the command was run, the application triggered 19 FGCs taking 9.933 s in total, i.e. about 520 ms per FGC on average.
This pause time is intolerable for a high QPS application.
The cause of the problem is identified and the solution is relatively simple.
There are two directions for a fix:
- Without the cache, a large number of ExpiringCache$Entry instances would not be generated.
- Prevent AdaptiveSizePolicy from shrinking the To space, so that ExpiringCache$Entry objects that survive a YGC stay in the young generation's To space instead of being promoted to the old generation.
Solution 1:
Do not use the cache.
Use the -Dsun.io.useCanonCaches=false parameter to turn the cache off.
This solution is convenient, but the parameter is not an officially documented one, so use it with caution.
Solution 2:
Keep UseParallelGC and explicitly set -XX:SurvivorRatio=8.
Configure parameters to test:
You can see that by default the ratio between the three areas is not 8:1:1.
After adding the -Xmn100m -XX:SurvivorRatio=8 parameters, the ratio between Eden and the Survivor spaces is fixed.
Solution 3:
Use the CMS garbage collector.
CMS disables AdaptiveSizePolicy by default.
With -XX:+UseConcMarkSweepGC configured, the jinfo command shows that CMS does not use AdaptiveSizePolicy by default.
The group member also adopted this approach:
It can be seen that the ratio between Eden and Survivor is fixed, the To space is no longer shrunk, and the old generation capacity and usage are back to normal.
Three, understanding AdaptiveSizePolicy at the source code level
Note: The following source code is mainly based on OpenJDK 8; there are differences between JDK versions.
My understanding of the source code is limited and still a work in progress.
Please point out any mistakes, thank you.
First, let's explain why explicitly configuring SurvivorRatio fixes the ratio between the three areas of the young generation when using the UseParallelGC collector.
There is a set_parallel_gc_flags() method in arguments.cpp.
As the name suggests, this method sets the parameters for the parallel collector.
```cpp
// If InitialSurvivorRatio or MinSurvivorRatio were not specified, but the
// SurvivorRatio has been set, reset their default values to SurvivorRatio +
// 2. By doing this we make SurvivorRatio also work for Parallel Scavenger.
// See CR 6362902 for details.
if (!FLAG_IS_DEFAULT(SurvivorRatio)) {
  if (FLAG_IS_DEFAULT(InitialSurvivorRatio)) {
    FLAG_SET_DEFAULT(InitialSurvivorRatio, SurvivorRatio + 2);
  }
  if (FLAG_IS_DEFAULT(MinSurvivorRatio)) {
    FLAG_SET_DEFAULT(MinSurvivorRatio, SurvivorRatio + 2);
  }
}
```
When SurvivorRatio is explicitly set, i.e. !FLAG_IS_DEFAULT(SurvivorRatio) is true, the related parameters InitialSurvivorRatio and MinSurvivorRatio are also set.
The comment explains it: "By doing this we make SurvivorRatio also work for Parallel Scavenger", i.e. explicitly setting SurvivorRatio makes it take effect for the Parallel Scavenge collector as well.
Exactly why it works this way remains to be studied; by default, the sizes are adjusted by AdaptiveSizePolicy.
Then look at the code for dynamically resizing memory using AdaptiveSizePolicy.
UseParallelGC is the default collector in JDK 1.8, and Parallel Scavenge is its young generation collection algorithm.
GC can be triggered for a variety of reasons, the most common of which is a failure to allocate memory in the young generation.
For UseParallelGC, the entry point of a GC triggered by an allocation failure is the VM_ParallelGCFailedAllocation::doit() method in vmPSOperations.cpp.
The following methods are then called in order:
- the failed_mem_allocate(size_t size) method in parallelScavengeHeap.cpp;
- the invoke() and invoke_no_policy() methods in psScavenge.cpp.
The invoke_no_policy() method has a piece of code that refers to AdaptiveSizePolicy.
```cpp
if (UseAdaptiveSizePolicy) {
  ...
  size_policy->compute_eden_space_size(young_live,
                                       eden_live,
                                       cur_eden,
                                       max_eden_size,
                                       false /* not full gc*/);
  ...
}
```

After the main GC process is complete, if UseAdaptiveSizePolicy is enabled, the Eden space size is recalculated.
In the compute_eden_space_size method there are several branches, corresponding to the three AdaptiveSizePolicy goals:
- Compare to expected GC pause times.
- Compared to expected throughput.
- If both goals are met, shrink the memory footprint.
```cpp
if ((_avg_minor_pause->padded_average() > gc_pause_goal_sec()) ||
    (_avg_major_pause->padded_average() > gc_pause_goal_sec())) {
  adjust_eden_for_pause_time(is_full_gc, &desired_promo_size, &desired_eden_size);
} else if (_avg_minor_pause->padded_average() > gc_minor_pause_goal_sec()) {
  adjust_eden_for_minor_pause_time(is_full_gc, &desired_eden_size);
} else if (adjusted_mutator_cost() < _throughput_goal) {
  assert(major_cost >= 0.0, "major cost is < 0.0");
  assert(minor_cost >= 0.0, "minor cost is < 0.0");
  adjust_eden_for_throughput(is_full_gc, &desired_eden_size);
} else {
  if (UseAdaptiveSizePolicyFootprintGoal &&
      young_gen_policy_is_ready() &&
      avg_major_gc_cost()->average() >= 0.0 &&
      avg_minor_gc_cost()->average() >= 0.0) {
    size_t desired_sum = desired_eden_size + desired_promo_size;
    desired_eden_size = adjust_eden_for_footprint(desired_eden_size, desired_sum);
  }
}
```

Let's look at one of these branches in detail.
In the first branch, if (_avg_minor_pause->padded_average() > gc_pause_goal_sec()) || (_avg_major_pause->padded_average() > gc_pause_goal_sec()), the adjust_eden_for_pause_time() method is called to adjust the Eden size based on pause time.
The gc_pause_goal_sec() method returns the expected pause time, which is set in ParallelScavengeHeap::initialize() by reading the JVM parameter MaxGCPauseMillis.
Next, look at the CMS collector.
For CMS, the generations are initialized in the initialize_generations() method in cmsCollectorPolicy.cpp.
```cpp
if (UseParNewGC) {
  if (UseAdaptiveSizePolicy) {
    _generations[0] = new GenerationSpec(Generation::ASParNew,
                                         _initial_gen0_size, _max_gen0_size);
  } else {
    _generations[0] = new GenerationSpec(Generation::ParNew,
                                         _initial_gen0_size, _max_gen0_size);
  }
} else {
  _generations[0] = new GenerationSpec(Generation::DefNew,
                                       _initial_gen0_size, _max_gen0_size);
}
if (UseAdaptiveSizePolicy) {
  _generations[1] = new GenerationSpec(Generation::ASConcurrentMarkSweep,
                                       _initial_gen1_size, _max_gen1_size);
} else {
  _generations[1] = new GenerationSpec(Generation::ConcurrentMarkSweep,
                                       _initial_gen1_size, _max_gen1_size);
}
```
Here _generations[0] describes the young generation and _generations[1] describes the old generation.
Depending on the UseParNewGC and UseAdaptiveSizePolicy settings, different generation types are used for the young generation and the old generation.
The CMS garbage collection entry point is the do_collection method in genCollectedHeap.cpp.
In the do_collection method, the size of each generation is recomputed after the main GC process completes.
```cpp
for (int j = max_level_collected; j >= 0; j -= 1) {
  // Adjust generation sizes.
  _gens[j]->compute_new_size();
}
```
This article mainly discusses the effect of AdaptiveSizePolicy on the young generation, so we mainly look at the ASParNewGeneration class, where the AS prefix stands for AdaptiveSizePolicy.
If -XX:+UseAdaptiveSizePolicy is set, the young generation corresponds to ASParNewGeneration; otherwise it corresponds to ParNewGeneration.
In the compute_new_size() method of the ASParNewGeneration class, another method is called to adjust the Eden space size.
```cpp
size_policy->compute_eden_space_size(eden()->capacity(), max_gen_size());
```

This method is similar to the Parallel Scavenge compute_eden_space_size method and adjusts the memory size in three ways:
- adjust_eden_for_pause_time
- adjust_eden_for_throughput
- adjust_eden_for_footprint
To test this, set the parameters -XX:+UseAdaptiveSizePolicy and -XX:+UseConcMarkSweepGC.
CMS would be expected to enable AdaptiveSizePolicy, but according to the jmap -heap results it is not enabled, and the ratio between the three areas of the young generation stays 8:1:1.
The jinfo command shows that AdaptiveSizePolicy is disabled even though -XX:+UseAdaptiveSizePolicy is set.
This is because in JDK 1.8, if you use CMS, UseAdaptiveSizePolicy is set to false regardless of how it is configured.
Look at the set_cms_and_parnew_gc_flags method in arguments.cpp: it calls the disable_adaptive_size_policy method to set UseAdaptiveSizePolicy to false.
```cpp
static void disable_adaptive_size_policy(const char* collector_name) {
  if (UseAdaptiveSizePolicy) {
    if (FLAG_IS_CMDLINE(UseAdaptiveSizePolicy)) {
      warning("disabling UseAdaptiveSizePolicy; it is incompatible with %s.",
              collector_name);
    }
    FLAG_SET_DEFAULT(UseAdaptiveSizePolicy, false);
  }
}
```

If the flag is set on the command line, a warning is printed.
However, in JDK 1.6 and 1.7, the logic of the set_cms_and_parnew_gc_flags method differs from that in 1.8.
If the UseAdaptiveSizePolicy parameter is left at its default, it is forcibly set to false; if it is set explicitly, it is left unchanged.
```cpp
// Turn off AdaptiveSizePolicy by default for cms until it is
// complete.
if (FLAG_IS_DEFAULT(UseAdaptiveSizePolicy)) {
  FLAG_SET_DEFAULT(UseAdaptiveSizePolicy, false);
}
```
Therefore, we tried building a web application on JDK 1.6 with the two parameters -XX:+UseAdaptiveSizePolicy and -XX:+UseConcMarkSweepGC.
Checking with jinfo -flag shows that both parameters are set to true.
Next, we use jmap -heap to look at the heap memory usage and find that no information is displayed.
This is actually a Bug in earlier JDK versions.
This issue has been confirmed in versions from 1.6.30 to 1.7 and fixed in JDK 8.
Four, problem summary
- Most applications today use JDK 1.8, whose default collector is Parallel Scavenge (UseParallelGC) with AdaptiveSizePolicy enabled by default.
- AdaptiveSizePolicy dynamically adjusts the sizes of the Eden and Survivor spaces and may shrink the Survivor spaces. When the Survivor space is reduced, some objects that survive a YGC go directly into the old generation; old-generation usage gradually grows until FGC is triggered, leading to longer STW pauses.
- The CMS garbage collector is recommended; it disables AdaptiveSizePolicy by default.
- It is suggested to add -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution to the JVM parameters to make GC logs more detailed, which helps locate problems.