This is the third article in the JVM optimization series:
- JVM optimization – garbage collection
- JVM optimization – Monitoring tools
JVM performance tuning involves many trade-offs, often several at once, and requires considering every aspect of the system. Still, some basic theory and principles, once understood and followed, make performance tuning much easier. To follow this article, you should meet these prerequisites:
1. You understand the JVM garbage collectors
2. You are familiar with the common JVM performance monitoring tools
3. You can read GC logs
4. You do not tune for tuning's sake: JVM tuning cannot solve every performance problem
These topics were covered in the previous two articles. If you are not familiar with them, review those articles via the links above before reading this one.
This article covers JVM performance tuning, that is, tuning an application through its JVM parameters. It is organized as follows:
1. General flow of JVM tuning
2. Performance metrics to focus on for JVM tuning
3. Principles for JVM tuning
4. Tuning Strategies & Examples
First, levels of performance tuning
To improve system performance, the system has to be optimized from several angles and at several levels. The following are the levels that need to be optimized.
As the figure shows, several layers besides the JVM need attention, so tuning a system is not just JVM tuning but tuning the system as a whole. This article focuses only on JVM tuning; the other aspects will be covered later.
Before JVM tuning, we assume that the project's architecture and code have already been tuned, or are already optimal for the current project. These two are the foundation of JVM tuning, and architecture has the greatest impact on the system. Do not expect JVM tuning to bring a qualitative leap to an application whose architecture is flawed or whose code has not been thoroughly optimized; it will not happen.
In addition, before tuning you must have a clear performance goal and then find the performance bottleneck. Optimizing a bottleneck requires stress testing and benchmarking the application, and using monitoring and statistics tools to verify that the optimized application meets its goals.
Second, JVM tuning process
The ultimate goal of tuning is to let an application handle more throughput with minimal hardware. The same applies to JVM tuning, which mainly optimizes garbage collector performance so that applications running on the JVM achieve higher throughput with less memory and lower latency. Note that "minimal" here means the best trade-off, not "the less the better".
1. Performance definition
To find and evaluate a performance bottleneck, we first need to define performance. For JVM tuning, the following three attributes serve as the basis for evaluation:
- Throughput: one of the most important metrics; the highest application performance the garbage collector can sustain, without regard to the pause times or memory consumed by garbage collection.
- Latency: how well the pauses caused by garbage collection are reduced or eliminated, so that the application does not jitter at runtime.
- Memory footprint: the amount of memory the garbage collector needs to run smoothly.
Improving one of the three attributes almost always costs performance in one or both of the others. Which one or two attributes matter most should be decided by the application's business requirements.
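HotSpot's GC tuning documentation expresses throughput as the percentage of total time not spent in garbage collection. The sketch below, a hypothetical helper rather than part of the article's method, reads the standard `java.lang.management` collector beans and computes that share for the running JVM. For stop-the-world collectors such as the Parallel collector, `getCollectionTime()` roughly tracks pause time; for concurrent collectors it may also include concurrent work, so treat the result as an approximation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {
    /** Sums the approximate accumulated collection time (ms) of all collectors; -1 (unsupported) is skipped. */
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of elapsed time not spent in GC, used here as a simple throughput figure. */
    static double throughput(long gcTimeMs, long elapsedMs) {
        if (elapsedMs <= 0) {
            return 1.0; // nothing measured yet
        }
        return 1.0 - (double) gcTimeMs / elapsedMs;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("Time outside GC: %.2f%%%n", 100 * throughput(totalGcTimeMs(), uptimeMs));
    }
}
```

With the Parallel collector the beans are typically named `PS Scavenge` and `PS MarkSweep`; other collectors report other names, which also makes this a quick way to confirm which collector is active.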
2. Performance tuning principles
Keep three principles in mind during tuning; they make it easier to tune garbage collection to meet your application's performance requirements.
1. Minor GC collection principle: each Minor GC should collect as many garbage objects as possible, to reduce how often Full GC occurs in your application.
2. GC memory maximization principle: when dealing with throughput and latency, the more memory the garbage collector can use, the more effective garbage collection is and the smoother the application runs.
3. "Pick two of three" rule: of the three performance attributes (throughput, latency and memory footprint), you can tune for only two, never all three.
3. Performance tuning process
As the figure shows, this is the basic flow of JVM tuning for an application: an iterative process in which the configuration is adjusted based on performance test results. Each step may go through several iterations before every system requirement is met. Sometimes, to meet one metric, earlier parameters have to be adjusted many times, after which all earlier steps must be re-tested.
In addition, tuning generally starts by meeting the program's memory usage requirements, then its latency requirements, and finally its throughput requirements. Optimization proceeds in this order; each step is the basis for the next, and the order cannot be reversed. Detailed examples of each step follow.
For the JVM running mode, we use server mode directly, which has been officially recommended since JDK 1.6.
For the garbage collector, we use the parallel collector directly, the default in JDK 1.6 to 1.8 (ParallelGC for the new generation and ParallelOldGC for the old generation).
Third, determine memory usage
There are two things we need to know before we can determine our memory footprint:
- The running phase of the application
- JVM memory allocation
1. Application running phases
An application's run can be divided into three phases:
1. Initialization phase: the JVM loads the application and initializes its main modules and data.
2. Stable phase: the application spends most of its time here. Under stress testing, all performance parameters are stable, and the core functions are executing and have been warmed up by JIT compilation.
3. Summary phase: in this final phase, some benchmarks are run and a corresponding strategy report is generated. We don't need to focus on this phase.
Memory footprint and active data size should be determined during the application's stable phase, not when it first starts. To do this, first look at how the JVM allocates memory.
2. JVM memory allocation & parameters
As shown above, the JVM heap consists mainly of the new generation, the old generation and the permanent generation: total heap size = new generation size + old generation size + permanent generation size. Without going into detail here, let's look at the JVM parameters that set these sizes. If they are not specified, the JVM chooses appropriate values automatically and adjusts them based on system overhead.
Generation | Parameter | Description
Heap | -Xms | Initial heap size; defaults to 1/64 of physical memory (<1 GB)
Heap | -Xmx | Maximum heap size. By default (adjustable with MaxHeapFreeRatio), when more than 70% of the heap is free, the JVM shrinks the heap down to the -Xms minimum
New generation | -XX:NewSize | Initial size of the new generation
New generation | -XX:MaxNewSize | Maximum size of the new generation
New generation | -Xmn | Size of the new generation (Eden + 2 survivor spaces)
Permanent generation | -XX:PermSize | Initial (and minimum) size of the permanent generation
Permanent generation | -XX:MaxPermSize | Maximum size of the permanent generation
Old generation | (none) | Set implicitly from the new generation size: initial = -Xmx minus -XX:NewSize; minimum = -Xmx minus -XX:MaxNewSize
If performance overhead is a concern, set the initial and maximum sizes of the permanent generation to the same value, because resizing the permanent generation requires a Full GC.
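To check which sizes the JVM actually applied from -Xms, -Xmx and -Xmn, the heap memory pools can be inspected at runtime. A minimal sketch using the standard `MemoryPoolMXBean` API; pool names depend on the collector (for example `PS Eden Space`, `PS Survivor Space` and `PS Old Gen` under the Parallel collector). Note that on JDK 8 and later the permanent generation no longer exists (class metadata moved to the non-heap Metaspace), so -XX:PermSize and -XX:MaxPermSize only apply to older JVMs like the ones this article targets.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class HeapPools {
    private static final long MB = 1024 * 1024;

    /** Formats a byte count in MB; the management API uses -1 for "undefined". */
    static String mb(long bytes) {
        return bytes < 0 ? "undefined" : (bytes / MB) + " MB";
    }

    /** One line per heap pool with its initial and maximum size. */
    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache or Metaspace
            }
            MemoryUsage usage = pool.getUsage();
            lines.add(pool.getName() + ": init=" + mb(usage.getInit()) + ", max=" + mb(usage.getMax()));
        }
        return lines;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("Heap committed now: " + mb(rt.totalMemory()));
        System.out.println("Heap maximum (about -Xmx): " + mb(rt.maxMemory()));
        describeHeapPools().forEach(System.out::println);
    }
}
```

Running it once with the defaults and once with explicit -Xms/-Xmx/-Xmn values shows how each flag maps onto the individual pools.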
3. Calculate the active data size
Calculating the active data size follows the process below:
As mentioned earlier, active data is the amount of Java heap space occupied by long-lived objects during the application's stable phase.
When computing active data, make sure that:
1. During the test, the JVM runs with its default startup parameters rather than manually set ones.
2. The application is in its stable phase when the Full GC occurs.
Starting with the JVM defaults lets us observe how much memory the application needs during the stable phase.
What is a stable phase?
Enough pressure must be generated to reach a load similar to the application's peak load in production, after which the application settles into a steady state. Stress testing is therefore essential for reaching the stable phase. How to run stress tests is not covered here; it will get a dedicated article later.
Once you are sure the application is in its stable phase, examine its GC logs, especially the Full GC entries.
GC logging flags: -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:<filename>
GC logs are the best way to gather the information needed for tuning. GC logging can be enabled even in production to locate problems: the performance impact is minimal and the data is rich.
The logs must contain Full GC entries. If there are none, force a Full GC with a monitoring tool or trigger one with the following command:
jmap -histo:live pid
When a Full GC is triggered during the stable phase, we usually get the following information:
From this GC log we can roughly work out the heap usage and GC time of the whole application at the time of the Full GC. For better accuracy, collect several Full GCs and average them, or use the longest Full GC for the estimate.
In the figure above, after the Full GC the old generation occupies 93168 KB (about 93 MB), which is taken as the old generation's active data.
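Besides reading the figure from the GC log, the old generation's occupancy after a collection can be read from inside the JVM. The sketch below is an illustrative alternative, not the article's procedure. It assumes the old generation pool can be found by name (the HotSpot names such as `PS Old Gen`, `Tenured Gen`, `CMS Old Gen` or `G1 Old Gen` contain "Old" or "Tenured") and reads `getCollectionUsage()`, which reports the pool's usage right after the JVM last collected it. `System.gc()` only requests a Full GC and is ignored under -XX:+DisableExplicitGC.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class OldGenAfterGc {
    /** Finds the heap pool that holds the old generation by its name, or returns null. */
    static MemoryPoolMXBean oldGenPool() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (pool.getType() == MemoryType.HEAP && (name.contains("Old") || name.contains("Tenured"))) {
                return pool;
            }
        }
        return null;
    }

    /** Old generation usage (KB) right after the JVM last collected that pool, or -1 if unavailable. */
    static long oldGenUsedAfterGcKb() {
        MemoryPoolMXBean pool = oldGenPool();
        if (pool == null) {
            return -1;
        }
        MemoryUsage afterGc = pool.getCollectionUsage(); // null if the pool does not support it
        return afterGc == null ? -1 : afterGc.getUsed() / 1024;
    }

    public static void main(String[] args) {
        System.gc(); // only a request; ignored when -XX:+DisableExplicitGC is set
        MemoryPoolMXBean pool = oldGenPool();
        System.out.println("Old generation pool: " + (pool == null ? "not found" : pool.getName()));
        System.out.println("Used after last collection: " + oldGenUsedAfterGcKb() + " KB");
    }
}
```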
The other heap spaces are sized according to the following rules:
Space | Command-line parameter | Suggested size
Java heap | -Xms and -Xmx | 3-4 × old generation occupancy after Full GC
Permanent generation | -XX:PermSize -XX:MaxPermSize | 1.2-1.5 × permanent generation occupancy after Full GC
New generation | -Xmn | 1-1.5 × old generation occupancy after Full GC
Old generation | (implicit: Java heap minus new generation) | 2-3 × old generation occupancy after Full GC
Based on these rules and the Full GC data in the figure above, the application's heap can be planned as follows:
Java heap: 373 MB (= 93168 KB × 4)
New generation: 140 MB (= old generation occupancy 93168 KB × 1.5)
Permanent generation: 5 MB (= permanent generation occupancy 3135 KB × 1.5)
Old generation: 233 MB (= Java heap - new generation = 373 MB - 140 MB)
The corresponding application startup parameters are:
java -Xms373m -Xmx373m -Xmn140m -XX:PermSize=5m -XX:MaxPermSize=5m
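The sizing arithmetic above can be written down as a small calculator. This sketch hard-codes the multiples the article picked from its rule table (4× for the heap, 1.5× for the new and permanent generations). Like the article, it converts KB to MB by dividing by 1000 and rounding up, even though the JVM reads `m` as 1024 × 1024 bytes, which simply leaves a little headroom.

```java
public class HeapSizingPlan {
    // Multiples taken from the article's rule table (the values it actually used).
    static final long HEAP_FACTOR = 4;       // Java heap: 3-4x old generation live data
    static final double YOUNG_FACTOR = 1.5;  // new generation: 1-1.5x old generation live data
    static final double PERM_FACTOR = 1.5;   // permanent generation: 1.2-1.5x perm live data

    /** Converts KB to "MB" the way the article does (1 MB = 1000 KB), rounding up. */
    static long toMb(double kb) {
        return (long) Math.ceil(kb / 1000.0);
    }

    /** Startup flags derived from the live data measured after a Full GC. */
    static String flags(long oldLiveKb, long permLiveKb) {
        long heapMb = toMb(oldLiveKb * HEAP_FACTOR);
        long youngMb = toMb(oldLiveKb * YOUNG_FACTOR);
        long permMb = toMb(permLiveKb * PERM_FACTOR);
        return String.format("-Xms%dm -Xmx%dm -Xmn%dm -XX:PermSize=%dm -XX:MaxPermSize=%dm",
                heapMb, heapMb, youngMb, permMb, permMb);
    }

    /** Old generation size implied by the plan: Java heap minus new generation. */
    static long oldGenMb(long oldLiveKb) {
        return toMb(oldLiveKb * HEAP_FACTOR) - toMb(oldLiveKb * YOUNG_FACTOR);
    }

    public static void main(String[] args) {
        // Article's measurements: 93168 KB old generation, 3135 KB permanent generation after Full GC
        System.out.println("java " + flags(93168, 3135));
        System.out.println("old generation: " + oldGenMb(93168) + " MB");
    }
}
```

With the article's measurements this reproduces `-Xms373m -Xmx373m -Xmn140m -XX:PermSize=5m -XX:MaxPermSize=5m` and a 233 MB old generation.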
Fourth, latency tuning
After determining the application's active data size, we move on to latency tuning: with the heap sized as above, the application may still fall short of its latency requirements, so further adjustment based on the application's behavior is needed.
During this step we may adjust the heap size configuration again, evaluate the duration and frequency of GC, and decide whether to switch to a different garbage collector.
1. System latency requirements
Before tuning, we need to know the system's latency requirements and the corresponding tunable latency metrics:
- Acceptable average latency: compared against the measured Minor GC duration.
- Acceptable Minor GC frequency: the measured Minor GC frequency is compared against the tolerable value.
- Maximum acceptable pause time: compared against the duration of the worst-case Full GC.
- Maximum acceptable pause frequency: essentially the Full GC frequency.
Of these, average and maximum pause times matter most to the user experience, so focus on them.
Based on these requirements, we need to collect the following data:
- Minor GC duration;
- Minor GC count;
- Worst-case Full GC duration;
- Worst-case Full GC frequency.
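The measurements listed above can be summarized from parsed GC events with a few lines of code. This sketch assumes the events have already been extracted from the log as (timestamp, duration) pairs; the `GcEvent` type and the sample values are hypothetical, not the article's log data. The same functions apply to Full GC events to get the worst-case duration and frequency.

```java
import java.util.Arrays;
import java.util.List;

public class GcLatencyStats {
    /** One collection from a GC log: timestamp since JVM start and pause duration, both in seconds. */
    static final class GcEvent {
        final double timestampSec;
        final double durationSec;

        GcEvent(double timestampSec, double durationSec) {
            this.timestampSec = timestampSec;
            this.durationSec = durationSec;
        }
    }

    static double averageDurationSec(List<GcEvent> events) {
        return events.stream().mapToDouble(e -> e.durationSec).average().orElse(0);
    }

    static double worstDurationSec(List<GcEvent> events) {
        return events.stream().mapToDouble(e -> e.durationSec).max().orElse(0);
    }

    /** Average gap between consecutive collections (the inverse of their frequency); needs 2+ events. */
    static double averageIntervalSec(List<GcEvent> events) {
        if (events.size() < 2) {
            return Double.NaN;
        }
        double span = events.get(events.size() - 1).timestampSec - events.get(0).timestampSec;
        return span / (events.size() - 1);
    }

    public static void main(String[] args) {
        // Hypothetical Minor GC events
        List<GcEvent> minor = Arrays.asList(
                new GcEvent(1.0, 0.060), new GcEvent(1.4, 0.075), new GcEvent(1.8, 0.072));
        double avgMs = averageDurationSec(minor) * 1000;
        double targetMs = 50; // average pause target from the article's example
        System.out.printf("Minor GC avg %.0f ms, worst %.0f ms, every %.2f s%n",
                avgMs, worstDurationSec(minor) * 1000, averageIntervalSec(minor));
        System.out.println(avgMs > targetMs ? "over target: consider a smaller new generation" : "within target");
    }
}
```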
2. Optimize the size of the new generation
For example, the GC log above shows an average Minor GC duration of 0.069 seconds, with a Minor GC occurring on average every 0.389 seconds.
If the system's average pause target is 50 ms, the current 69 ms is clearly too long and needs adjusting.
We know that the larger the new generation, the longer each Minor GC takes and the less often Minor GCs occur.
To shorten Minor GCs, reduce the size of the new generation.
To make Minor GCs less frequent, increase the size of the new generation.
To minimize the impact on other regions, keep the old generation size unchanged while resizing the new generation.
For example, here the new generation is reduced by 10% while the old and permanent generations are kept unchanged. After this first step, the parameters are:
java -Xms359m -Xmx359m -Xmn126m -XX:PermSize=5m -XX:MaxPermSize=5m
The new generation shrinks from 140 MB to 126 MB and the heap shrinks with it, while the old generation stays the same.
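The rule of resizing the new generation while holding the old generation fixed can be sketched as follows (the method name is illustrative, and a factor of 0.9 stands for the 10% reduction):

```java
public class YoungGenResize {
    /** Returns {heapMb, youngMb, oldMb} after scaling the new generation by factor, keeping the old generation fixed. */
    static long[] resizeYoung(long heapMb, long youngMb, double factor) {
        long oldMb = heapMb - youngMb;                  // old generation stays the same
        long newYoungMb = Math.round(youngMb * factor); // e.g. 0.9 for a 10% reduction
        return new long[] {oldMb + newYoungMb, newYoungMb, oldMb};
    }

    public static void main(String[] args) {
        long[] s = resizeYoung(373, 140, 0.9);
        System.out.printf("java -Xms%dm -Xmx%dm -Xmn%dm -XX:PermSize=5m -XX:MaxPermSize=5m%n", s[0], s[0], s[1]);
        System.out.println("old generation: " + s[2] + " MB");
    }
}
```

Starting from the first-step plan (373 MB heap, 140 MB new generation) this yields the 359 MB heap and 126 MB new generation above, with the old generation still at 233 MB.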
3. Optimize the size of the old generation
As in the previous step, GC log data must be collected before optimizing. This time we focus on the duration and frequency of Full GC.
In the figure above, we can see that:
Average Full GC interval = 5.8 s (one Full GC every 5.8 s), average Full GC duration = 0.14 s
What if no Full GC logs are available? Is there another way to evaluate it?
We can estimate it from the object promotion rate.
Object promotion rate
For example, with the startup parameters above, the old generation is 233 MB.
How long it takes to fill the 233 MB of old generation space depends on the rate at which objects are promoted from the new generation to the old generation.
Old generation usage after each promotion = Java heap usage after the Minor GC - new generation usage after the Minor GC
Object promotion rate = average (old generation growth per promotion) ÷ old generation size
From the object promotion rate we can work out how many Minor GCs it takes to fill the old generation, and therefore roughly how long until a Full GC occurs.
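The formula for old generation usage after each Minor GC can be applied directly to log lines. This sketch assumes the JDK 8 Parallel collector format produced by -XX:+PrintGCDetails, where a Minor GC entry contains `[PSYoungGen: before->after(capacity)] heapBefore->heapAfter(heapCapacity)`; other collectors and JVM versions print different formats, and the sample line below is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PromotionFromLog {
    // New generation and whole-heap "before->after(capacity)" pairs of a Parallel-collector Minor GC entry.
    private static final Pattern MINOR_GC = Pattern.compile(
            "\\[PSYoungGen: (\\d+)K->(\\d+)K\\(\\d+K\\)\\] (\\d+)K->(\\d+)K\\(\\d+K\\)");

    /** Old generation usage after the Minor GC = heap used after - new generation used after; -1 if no match. */
    static long oldGenAfterMinorGcKb(String logLine) {
        Matcher m = MINOR_GC.matcher(logLine);
        if (!m.find()) {
            return -1;
        }
        long youngAfterKb = Long.parseLong(m.group(2));
        long heapAfterKb = Long.parseLong(m.group(4));
        return heapAfterKb - youngAfterKb;
    }

    /** Promotion per Minor GC: growth of old generation usage between consecutive Minor GCs. */
    static long[] promotionsKb(long[] oldGenAfterEachMinorGcKb) {
        int n = Math.max(0, oldGenAfterEachMinorGcKb.length - 1);
        long[] promoted = new long[n];
        for (int i = 0; i < n; i++) {
            promoted[i] = oldGenAfterEachMinorGcKb[i + 1] - oldGenAfterEachMinorGcKb[i];
        }
        return promoted;
    }

    public static void main(String[] args) {
        // Hypothetical log line in the JDK 8 -XX:+PrintGCDetails format
        String line = "0.286: [GC (Allocation Failure) [PSYoungGen: 33280K->5104K(38400K)] "
                + "33280K->5120K(125952K), 0.0058260 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]";
        System.out.println("old generation after this Minor GC: " + oldGenAfterMinorGcKb(line) + " KB");
    }
}
```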
For example, from the log above:
After the 1st Minor GC, old generation usage: 13740 KB - 13732 KB = 8 KB
After the 4th Minor GC, old generation usage: 48143 KB - 17913 KB = 30230 KB
After the 5th Minor GC, old generation usage: 62112 KB - 17917 KB = 44195 KB
Old generation growth (promotion) per Minor GC:
4481 KB between the 1st and 2nd Minor GC
12333 KB between the 2nd and 3rd Minor GC
13408 KB between the 3rd and 4th Minor GC
13965 KB between the 4th and 5th Minor GC
We can calculate:
The average promotion per Minor GC is 12211 KB, about 12 MB.
Minor GCs occur on average every 213 ms, so the promotion rate = 12211 KB / 213 ms ≈ 57 KB/ms.
Filling the 233 MB old generation therefore takes about 233 × 1024 / 57 ≈ 4185 ms, or about 4.185 s.
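The estimate above can be reproduced in code. A minimal sketch using the article's figures; like the article, it treats the whole 233 MB old generation as free space, although the roughly 93 MB of active data measured earlier already occupies part of it, so the real time to a Full GC would likely be shorter. `freeOldGenNeededKb` is an illustrative helper for the inverse question: how much free old generation space a target Full GC interval requires.

```java
public class OldGenFillTime {
    /** Promotion rate (KB/ms) = average promotion per Minor GC / average Minor GC interval. */
    static double promotionRateKbPerMs(double avgPromotionKb, double minorGcIntervalMs) {
        return avgPromotionKb / minorGcIntervalMs;
    }

    /** Estimated time (ms) until promoted objects fill the given free old generation space. */
    static double timeToFillMs(double freeOldGenKb, double rateKbPerMs) {
        return freeOldGenKb / rateKbPerMs;
    }

    /** Free old generation space (KB) needed for Full GCs to be at least targetIntervalMs apart;
     *  add the active data measured after Full GC on top to get the old generation size. */
    static double freeOldGenNeededKb(double rateKbPerMs, double targetIntervalMs) {
        return rateKbPerMs * targetIntervalMs;
    }

    public static void main(String[] args) {
        double rate = promotionRateKbPerMs(12211, 213); // article: 12211 KB promoted every 213 ms
        double fillMs = timeToFillMs(233 * 1024, rate);   // article: 233 MB old generation
        System.out.printf("promotion rate %.1f KB/ms, old generation fills in about %.0f ms%n", rate, fillMs);
    }
}
```

Without rounding the rate to 57 KB/ms this gives about 4162 ms instead of 4185 ms; both are roughly 4.2 s.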
The expected worst-case Full GC frequency can be estimated with either of the two methods above, and the Full GC frequency can be adjusted by resizing the old generation. If Full GC pauses are too long to meet the application's worst-case latency requirement, you need to switch garbage collectors. How to switch, for example to CMS, will be covered in the next article; tuning CMS is slightly different.
Fifth, throughput tuning
After this long tuning process comes the final step: testing the throughput of the configuration above and fine-tuning it.
Throughput tuning is driven by the application's throughput requirements: the application should have an overall throughput metric derived from its requirements and from testing of the real application. Tuning is complete when the application's throughput meets or exceeds the target.
If the throughput goal still cannot be met after tuning, review the throughput requirements and assess the gap between the current and target throughput. If the gap is around 20%, you can adjust parameters, add memory and tune again from the beginning; if it is much larger, look at the application as a whole, check whether its design is consistent with the goals, and reassess the throughput goal.
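The decision rule above boils down to comparing the relative gap with a threshold. A sketch in which the 20% cutoff comes from the article's rule of thumb, while the units (requests per second) and the sample numbers are hypothetical:

```java
public class ThroughputGap {
    /** Relative shortfall of measured throughput versus the target (0.2 means 20% below target). */
    static double gap(double measured, double target) {
        return (target - measured) / target;
    }

    /** Next step following the article's rule of thumb (a gap of about 20% is still tunable). */
    static String nextStep(double measured, double target) {
        double g = gap(measured, target);
        if (g <= 0) {
            return "goal met";
        }
        if (g <= 0.20) {
            return "adjust parameters or add memory, then re-test";
        }
        return "revisit the application design and the throughput goal";
    }

    public static void main(String[] args) {
        // Hypothetical figures in requests per second
        System.out.println(nextStep(850, 1000)); // 15% short
        System.out.println(nextStep(500, 1000)); // 50% short
    }
}
```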
For the garbage collector, improving throughput means minimizing or avoiding Full GCs and stop-the-world compacting collections (as in CMS), both of which reduce application throughput. Reclaim as many objects as possible during Minor GC to keep objects from being promoted to the old generation too quickly.
Sixth, final thoughts
Plumbr surveyed garbage collector usage across 84,936 cases. Among the 13% of cases where a garbage collector was explicitly specified, the concurrent collector (CMS) was the most used; but in most cases, about 87%, the best garbage collector was not chosen.
JVM tuning is systematic and complex work, and the JVM's own automatic tuning already does a good job: a few basic initial parameters are enough to keep ordinary applications running stably. For some teams application performance is not a high priority, and the default garbage collector is sufficient. Whether and how to tune depends on your situation.