This is a reprint of an article on common JVM tuning strategies, together with some examples of JVM tuning encountered in real work.
This article is not original. The original is very hands-on and practical, so it is reproduced here for future reference. Original address: juejin.cn/post/694980…
This article is really worth reading if you need to tune the JVM, or if you have a JVM-related problem that you don’t know how to solve.
Preface
JVM tuning sounds fancy, but you should realize that it is the last bullet in Java performance tuning.
I agree with Liao Xuefeng's point of view: JVM tuning is not a routine measure. For performance problems, optimizing the program itself is generally the first choice, and JVM tuning is the last.
Common tuning strategies
Again, it is important to note that when you decide to tune the JVM, do not fall into the trap of reaching for JVM tuning when the performance problem could just as well be solved by optimizing the program itself.
Choose the appropriate garbage collector
- Single-core CPU: the Serial garbage collector is your only real choice.
- Multi-core CPU, throughput is the priority: choose the PS + PO combination (Parallel Scavenge + Parallel Old).
- Multi-core CPU, pause time is the priority, JDK 1.6 or 1.7: choose CMS.
- Multi-core CPU, pause time is the priority, JDK 1.8 or higher, and the JVM has more than 6 GB of memory available: choose G1.
Parameter configuration:
// Enable the Serial garbage collector (young generation)
-XX:+UseSerialGC
// Enable PS+PO: the young generation uses Parallel Scavenge and the old generation uses the Parallel Old collector
-XX:+UseParallelOldGC
// Enable the CMS garbage collector (old generation)
-XX:+UseConcMarkSweepGC
// Enable the G1 garbage collector
-XX:+UseG1GC
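As a concrete illustration, choosing a collector is just a matter of adding the corresponding flag to the startup command. A minimal sketch, assuming an application packaged as app.jar and illustrative heap sizes:

# G1 with a fixed 4 GB heap
java -XX:+UseG1GC -Xms4g -Xmx4g -jar app.jar
# PS + PO, throughput-oriented
java -XX:+UseParallelOldGC -Xms4g -Xmx4g -jar app.jar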
Adjust the memory size
Symptom: Garbage collection is very frequent.
Reason: if the heap is too small, garbage collection has to run frequently to free enough space for new objects, so increasing the heap size has a very noticeable effect.
Note: If the number of garbage collections is very high and the number of objects collected at a time is very small, then it is not that the memory is too small, but that memory leaks are causing the objects not to be collected, resulting in frequent GC.
Parameter configuration:
// Set the initial heap size
Option 1: -Xms2g
Option 2: -XX:InitialHeapSize=2048m
// Set the maximum heap size
Option 1: -Xmx2g
Option 2: -XX:MaxHeapSize=2048m
// Set the young generation size
Option 1: -Xmn512m
Option 2: -XX:MaxNewSize=512m
Set a pause time that matches your expectations
Symptom: the program lags intermittently.
Reason: if no expected pause time is set and a throughput-oriented garbage collector is used, the pause caused by each garbage collection can vary wildly.
Note: do not set an unrealistic pause time; a shorter single pause means more GC cycles are needed to collect the same amount of garbage.
Parameter configuration:
// The target GC pause time, which the garbage collector tries to meet by various means
-XX:MaxGCPauseMillis
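A usage sketch (the 200 ms target is an illustrative value; the flag is a soft goal honored by the throughput collector and G1, not a hard guarantee):

# ask G1 to aim for pauses of at most 200 ms
java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -jar app.jar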
Adjust the memory area size ratio
Symptom: GC is frequent in one area, normal in others.
Reason: the corresponding region does not have enough space, so frequent GC is needed to free space there. If the total JVM heap size cannot be increased, you can adjust the size ratio of that region.
Note: It may not be a lack of space, but a memory leak that causes memory not to be reclaimed. This leads to frequent GC.
Parameter configuration:
// Size ratio of the Eden region to a Survivor region
-XX:SurvivorRatio=6   // one Survivor region to Eden is 1:6, i.e. the two Survivor regions together to Eden is 2:6
// Size ratio of the young generation to the old generation
-XX:NewRatio=4        // young generation : old generation = 1:4, i.e. the old generation takes 4/5 of the whole heap; default = 2
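A quick worked example of what these ratios mean in practice (the young-generation and heap sizes are assumed values for illustration):

-Xmn512m -XX:SurvivorRatio=6   →  Eden = 512m × 6/8 = 384m, each Survivor region = 512m × 1/8 = 64m
-Xmx2g   -XX:NewRatio=4        →  young generation = 2g × 1/5 ≈ 410m, old generation = 2g × 4/5 ≈ 1638m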
Adjust the promotion age for objects entering the old generation
Symptom: GC in the old generation is frequent, and many objects are collected each time.
Reason: if the promotion age is too low, objects from the young generation enter the old generation too quickly, so the old generation accumulates many objects that could actually have been reclaimed shortly afterwards. In that case you can raise the promotion age so that objects do not enter the old generation so easily, which relieves the shortage of old-generation space and the frequent GC it causes.
Note: keeping these objects in the young generation longer may increase the frequency of young-generation GC, and copying them back and forth repeatedly may also make young-generation GC take longer.
Configuration parameters:
// Minimum age at which young-generation objects are promoted to the old generation; default value 7
-XX:InitialTenuringThreshold=7
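The related flag -XX:MaxTenuringThreshold caps how old an object can get before it must be promoted (the upper limit is 15); raising or keeping it high works together with the setting above. A sketch:

// let objects stay in the young generation up to the maximum age of 15
-XX:MaxTenuringThreshold=15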
Adjust the criteria for large objects
Symptom: GC in the old generation is frequent, many objects are reclaimed each time, and individual objects are fairly large.
Reason: a large number of large objects are allocated directly in the old generation, so the old generation fills up easily and triggers frequent GC. In that case you can adjust the size threshold at which objects are allocated directly in the old generation.
Note: letting these large objects enter the young generation instead may increase the frequency and duration of young-generation GC.
Configuration parameters:
// Objects larger than this size (in bytes) are allocated directly in the old generation; 0 means no limit
-XX:PretenureSizeThreshold=1000000
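A usage sketch. As far as I know this flag only takes effect with the Serial and ParNew young-generation collectors (not with Parallel Scavenge), and the 1 MB value below is purely illustrative:

# objects larger than 1 MB go straight to the old generation (ParNew young collector + CMS)
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:PretenureSizeThreshold=1048576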
Adjust the timing of GC triggering
Symptom: CMS or G1 triggers Full GC frequently, and the program lags badly.
Reason: during the concurrent phase of G1 and CMS, the business threads run alongside the garbage collection threads, which means new objects keep being created while garbage collection is in progress. The GC therefore has to reserve part of the heap to accommodate these new objects. If that reserved space is not enough, the JVM abandons the concurrent collection and pauses all business threads (STW) so the collection can finish. In this case you can make GC trigger earlier (for example when the old generation is 60% full) so that enough space is reserved for the objects created by the business threads.
Note: Triggering GC early increases the frequency of old GC.
Configuration parameters:
// Old-generation occupancy at which CMS starts a collection; the default is 68%. If Serial Old stalls happen frequently, lower this value
-XX:CMSInitiatingOccupancyFraction
// Occupancy threshold for an old region to be included in a G1 mixed garbage collection cycle; the default is 65%
-XX:G1MixedGCLiveThresholdPercent=65
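For CMS, one detail worth keeping in mind: by default the JVM only uses the fraction above as a starting hint and then adjusts the trigger point on its own; to make the threshold apply on every cycle it is usually combined with -XX:+UseCMSInitiatingOccupancyOnly. A sketch, with an illustrative 70% threshold:

// start CMS when the old generation is 70% full, and always honor that threshold
-XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly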
Adjust the JVM native memory size
Symptom: GC counts, GC times, and the objects collected are all normal, and heap memory is sufficient, but an OOM is still reported.
Reason: besides the heap, the JVM also uses off-heap memory, also called native (or direct) memory. Running out of native memory does not trigger GC by itself; native memory is only reclaimed in passing when the heap is collected.
Note: besides the symptoms above, the exception message may be OutOfMemoryError: Direct buffer memory. Apart from increasing the native memory size, you can also catch this exception and manually trigger a GC (System.gc()).
Configuration parameters:
// Maximum size of direct (off-heap) memory
-XX:MaxDirectMemorySize
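A usage sketch, with an illustrative 512 MB cap; Native Memory Tracking (if enabled at startup) is one way to inspect the JVM's native memory usage:

# cap direct (off-heap) memory at 512 MB
-XX:MaxDirectMemorySize=512m
# optionally enable Native Memory Tracking at startup, then inspect it at runtime
-XX:NativeMemoryTracking=summary
jcmd <pid> VM.native_memory summary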
JVM tuning examples
Here are some JVM tuning examples collected from around the web:
The site responded slowly to a surge in traffic
1. Initial guess: the site is fast in the test environment but slows down once it reaches production, so the guess is that business threads are being paused by garbage collection.
2. Locating the problem: to confirm the guess, the jstat -gc command was used (a command sketch follows this case); it showed that the JVM performs GC very frequently and that GC takes a long time, so the basic inference is that the high GC frequency causes frequent pauses of the business threads and slow page responses.
3. Solution: because page traffic is high, objects are created very quickly, so the heap fills up easily and GC becomes frequent. The problem here is that the young-generation memory is too small, so the JVM memory was increased, initially from 2 GB to 16 GB.
4. A second problem: after increasing the memory, ordinary requests did become faster, but a new problem appeared: irregular, intermittent stalls, with a single stall lasting much longer than before.
5. Guess: the previous optimization increased the memory, so the suspicion is that the larger heap makes a single GC take longer, which causes the intermittent stalls.
6. Locating the problem: the jstat -gc command shows that the number of Full GCs is indeed not very high, but the time spent in Full GC is very high; according to the GC log, a single Full GC takes tens of seconds.
7. Solution: the JVM uses the PS + PO combination by default, and both its marking and collection phases are fully STW, so the larger the heap, the longer a single garbage collection takes. To avoid long single GC pauses, the collector had to be changed to a concurrent one; since the JDK version was 1.7, the CMS garbage collector was chosen, and an expected pause time was set according to the earlier garbage collection behavior. After going live, the lag problem on the site was gone.
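For reference, the checks mentioned in steps 2 and 6 boil down to a couple of commands and flags. A sketch (interval, count, and log path are illustrative; the columns to watch in the jstat output are YGC/YGCT and FGC/FGCT, the collection counts and their accumulated times):

jstat -gc <pid> 1000 10    # print GC statistics every second, 10 times
# GC logging has to be enabled up front; on JDK 7/8 it is typically done like this
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/app/gc.log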
OOM when exporting data in the back-office system
Problem description: the back-office system of a certain company occasionally throws an OOM exception caused by heap memory overflow.
1. Because the problem was only occasional, it was at first simply assumed to be caused by insufficient heap memory, so the heap was increased from 4 GB to 8 GB.
2. But the problem was still not solved, so the only option was to start from the heap memory itself: the -XX:+HeapDumpOnOutOfMemoryError parameter was enabled (a flag sketch follows this case) to obtain a heap dump file when the OOM occurs.
3. The heap dump was analyzed with VisualVM, which showed that String objects occupied the most memory. The original plan was to trace the String objects back to where they were referenced, but the dump file was too large and the tool kept getting stuck during the tracing. Besides, the String usage looked fairly normal, so the problem was not identified there at first, and the thread information was used as the breakthrough point instead.
4. By analyzing the threads, several running business threads were found first; following them one by one through the code, one method stood out: exporting order information.
5. The order-export method may involve tens of thousands of records: it first queries the order data from the database and then writes it into Excel, and this process generates a large number of String objects.
6. To verify this conjecture, the back office was logged into for testing. During testing it turned out that the front end did not gray out the export button after it was clicked, so the button could be clicked again and again. Because exporting order data is inherently slow, users who saw no response after clicking kept on clicking, so a large number of requests hit the back end, the heap filled with order objects and Excel objects, and because the method executed so slowly none of these objects could be collected for quite a while, so memory eventually ran out.
7. Once the cause was known, the problem was easy to solve. In the end no JVM parameters were adjusted at all: the export button on the front end was given a disabled (gray) state until the back end responds, and the order query was trimmed to only the necessary fields to reduce the size of the generated objects. That solved the problem.
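For reference, the heap-dump flags mentioned in step 2 are typically set like this (the dump path is an assumed example; HeapDumpPath is optional):

# write a heap dump automatically when an OutOfMemoryError occurs
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/dumps/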
A single cache entry is too large, causing the system CPU to spike
1. After the system was released, CPU usage was found to have soared to 600%. The first thing to do after discovering this problem was to locate which application was using the most CPU.
2. When an application's CPU is too high, it is usually due to either lock contention or frequent GC.
3. So first check whether frequent GC is the cause; if GC is normal, investigate from the thread angle instead. GC information was printed with the jstat -gc PID command; the resulting GC statistics were obviously abnormal, so it was clear that frequent GC was causing the CPU spike.
4. The next step is to find the cause of the frequent GC: either find where objects are being created too frequently, or find out whether there is a memory leak.
5. The heap memory information was then dumped with the jmap -dump command (for example jmap -dump:format=b,file=heap.hprof PID).
6. With the heap dump in hand, VisualVM was used for offline analysis, starting from the objects occupying the most memory. A business VO ranked third, taking up about 10% of the heap space, so this object was clearly suspicious.
7. The corresponding business code was found through this business object. Analysis of the code showed that the suspicious object is generated when news information is viewed: to improve query efficiency, the news information is kept in a Redis cache, and every call to the news interface fetches it from the cache.
8. Caching the news in Redis this way is not a problem in itself; the problem is that more than 50,000 news records are stored under a single key, so every call to the news query interface pulls all 50,000+ records out of Redis and then filters and pages them down to the 10 records returned to the front end. More than 50,000 records means more than 50,000 objects are created; at roughly 280 bytes each, that is about 13.3 MB, so at least 13.3 MB of objects are generated every time the news information is queried. Such a large object is allocated directly in the old generation, so a 2 GB old generation can fill up in a matter of seconds and trigger GC.
9. Once the cause was known, the problem was easy to solve: it came from a single cache entry being too large, so the cache entries just had to be made smaller. The caching was switched to page granularity, with each key caching one page of 10 records as returned to the front end. After that, each news query only takes 10 records out of the cache, and the problem is avoided.
Locating the problem when the CPU is often at 100%
Problem analysis: a high CPU means some program is occupying CPU resources for a long time.
1. First, find out which process is using the most CPU.
top    Lists the resource usage of each process on the system.
2. Then, within that process, find which thread occupies the most CPU.
top -Hp <process ID>    Lists the resources occupied by each thread in the process.
3. Find the thread ID and print the stack information of the thread
printf "%x\n" <thread ID>    Converts the thread ID to hexadecimal.
jstack <process ID>    Prints the stack information of all threads in the process; in it, find the thread whose ID matches the hexadecimal value from the previous step.
4. Finally, locate the specific business method according to the stack information of the thread and find the problem from the code logic.
If a thread stays in the WAITING state for a long time, pay attention to the "waiting on XXXXXX" line: it means the thread is waiting for a lock, and you can then locate the thread holding that lock by the lock's address.
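Putting the steps above together, a typical session looks roughly like this (the process and thread IDs are placeholders, and the grep pattern is just one convenient way to find the thread in the dump):

top                                       # 1. find the Java process with the highest CPU usage
top -Hp 12345                             # 2. find the hottest thread inside that process
printf "%x\n" 12378                       # 3. convert the thread ID to hex, e.g. 305a
jstack 12345 | grep -A 30 "nid=0x305a"    # 4. locate that thread's stack in the thread dump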
Locate the memory overload problem
Problem analysis: the Java process is creating a large number of objects and garbage collection cannot keep up with the creation rate, or objects cannot be reclaimed because of a memory leak.
1. First, observe the garbage collection behavior.
jstat -gc <pid> 1000    Prints the GC counts and times every second.
jmap -histo <pid> | head -20    Shows the 20 object types occupying the most heap memory, to get a first idea of which objects are using the memory.
If GC is frequent but the amount of memory reclaimed each time is normal, objects are simply being created too fast and memory usage stays high; if very little memory is reclaimed each time, a memory leak is very likely preventing objects from being collected (a command that helps tell the two apart is sketched at the end of this section).
2. Export the heap memory file snapshot
jmap -dump:live,format=b,file=/home/myheapdump.hprof <pid>    Dumps the heap memory information to a file.
3. Analyze the dump file offline with VisualVM, find the objects occupying the most memory, then find the business code that creates those objects, and locate the specific problem from the code and the business scenario.
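As for distinguishing fast object creation from a leak (step 1 above), the live variant of the histogram forces a full GC before counting, so comparing it with the plain histogram shows how much of the occupied memory is actually reclaimable. A sketch:

jmap -histo:live <pid> | head -20    # histogram of the objects that survive a forced full GC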
Data analysis platform frequently triggers Full GC
The platform mainly performs scheduled analysis and statistics on user behavior in the app, supports report export, and uses the CMS GC algorithm.
Data analysts found that pages often lagged when opened. The jstat command showed that after each Young GC roughly 10% of the surviving objects entered the old generation.
This was because the Survivor space was set too small: after each Young GC the surviving objects could not fit into the Survivor region and were promoted to the old generation prematurely.
The Survivor space was enlarged so that it could hold the objects surviving a Young GC; objects then pass through the Survivor region several times until they reach the age threshold before entering the old generation.
After the adjustment, the live objects entering the old generation after each Young GC stabilized at only a few hundred KB, and the Full GC frequency dropped significantly.
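A sketch of this kind of adjustment, with assumed values (lowering SurvivorRatio gives the Survivor regions a larger share of the young generation):

# before (assumed): -Xmn1g -XX:SurvivorRatio=8   →  each Survivor region = 1g / 10 ≈ 102m
# after  (assumed): -Xmn1g -XX:SurvivorRatio=4   →  each Survivor region = 1g / 6  ≈ 170m
-Xmn1g -XX:SurvivorRatio=4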
Service interconnection gateway OOM
The gateway mainly consumes Kafka data, performs processing and computation on it, and forwards the data to another Kafka queue. The system had to be restarted after running for only a few hours.
The heap memory was exported with jmap and analyzed with the Eclipse MAT tool. The cause: the code asynchronously logs the data of one Kafka topic, and because that topic's data volume is large, a large number of objects piled up in memory waiting to be logged, which eventually led to OOM.
An authentication system frequently performs long Full GCs
The system provides various account authentication services externally, but the service was often found to be unavailable during use. The Zabbix monitoring platform showed that the system frequently performed long Full GCs, and the old generation was usually not full when they were triggered. It turned out that System.gc() was being called in the business code.
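Two flags commonly come up in this situation; whether either is appropriate depends on the application, since direct-buffer cleanup can rely on System.gc(), so ignoring it outright has side effects. A sketch:

# ignore System.gc() calls from application code entirely
-XX:+DisableExplicitGC
# or: let System.gc() trigger a concurrent cycle instead of a stop-the-world Full GC (CMS/G1)
-XX:+ExplicitGCInvokesConcurrent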
Feel free to give this post a like. For more articles, follow the WeChat official account "Lou Zai advanced road": hit follow and don't get lost~~