Series: JVM column
Series of articles:
- JVM Performance Tuning (1) — JVM Memory Model and Class Loading Mechanism
- JVM Performance Tuning (2) — Garbage Collection Algorithms and Garbage Collectors
- JVM Performance Tuning (3) — Analyzing Garbage Collection Policies Through GC Logs
Memory tuning objectives
YoungGC is triggered when Eden is full and new objects can no longer be allocated. The young generation uses the copying algorithm, which is highly efficient, and very few young-generation objects survive a collection: the few survivors are quickly marked and moved to a Survivor space, and Eden is reclaimed quickly. A YoungGC usually takes a few milliseconds to tens of milliseconds, so young-generation GC does not significantly affect the system.
This is not the case with old-generation GCs, which are often time-consuming, especially when old-generation collections (FullGC/OldGC) are triggered frequently. Both the CMS and G1 collectors, for example, go through initial marking, concurrent marking, re-marking, concurrent cleaning, and defragmentation. The process is complex and the STW pauses are longer. A FullGC is generally at least 10 times slower than a YoungGC.
There are four ways young-generation objects are promoted to the old generation: the object's age exceeds the threshold, large objects are allocated directly in the old generation, the dynamic age judgment rule, and too many objects surviving a young-generation GC to fit into the Survivor space. Promotion because an object is old enough is inevitable: such objects are generally long-lived and belong in the old generation. The other three cases are usually caused by improper memory allocation or unreasonable parameter settings, and the promoted objects are mostly short-lived; they then occupy the old generation and trigger old-generation GCs.
Therefore, the biggest problem for JVM-based systems is that, due to improper memory allocation and parameter settings, objects frequently enter the old generation and FullGC is frequently triggered. As a result, the system freezes for hundreds of milliseconds or even seconds at intervals, which is very bad for user experience.
So the most important goal of JVM tuning is to tune memory allocation: properly size the young generation, old generation, Eden, and Survivor regions, and then adjust parameters so that young-generation objects do not enter the old generation, objects are reclaimed in the young generation as much as possible, and ideally no FullGC occurs at all.
Estimate the memory running model
There is no fixed standard or fixed set of parameters when configuring JVM memory. A more general approach is to analyze and estimate based on the actual business: estimate the traffic, calculate the system's requests per second, work out how much memory each second of requests occupies, and from that derive the JVM memory running model of the whole system. Then tune the parameters so that garbage objects are collected in the young generation as much as possible and frequent Full GCs are avoided.
Let’s imagine a payment system with a million transactions per day and see how to estimate a reasonable memory model.
Step 1: Analyze the core business and pressure of the system
First, analyze where the core pressure of the system is concentrated. In a payment system with a million transactions per day, the core business is the payment flow. Each payment request creates at least one order object, which contains the paying user, channel, amount, goods, time, and other information.
A payment system faces pressure in many areas, including highly concurrent requests, high-performance request processing, and storage of large volumes of order data, but at the JVM level the biggest pressure is the frequent creation and destruction of 1 million payment order objects per day.
Step 2: Estimate how many requests will be processed per second
To set a reasonable JVM memory size, start by estimating how many requests per second the core business handles. Assuming 1 million payment orders per day, with user transactions concentrated in a daily peak of three to four hours around midday or in the evening, the peak averages close to 100 transactions per second.
Assuming the payment system is deployed on three machines, that is an average of about 30 payment requests per second per machine.
Step 3: Estimate how long a request takes
When a user initiates a payment request, the back end creates an order object, performs some associated validation, writes to the database, and carries out other operations such as calling a third-party payment platform. Suppose a payment request takes 1 second: then 30 order objects are generated every second on each machine, and 1 second later those 30 objects become garbage.
Step 4: Estimate how much memory is being requested per second
We can estimate based on the types of the order class's instance fields, such as Integer (4 bytes), Long (8 bytes), and String (depends on length). Suppose the order class has about 20 fields; a generous rough estimate is 500 bytes per order. So 30 payment requests per second is 30 * 500 B ≈ 15 KB.
In reality, however, each request creates many objects besides the order object, such as objects for associated queries and objects created by the Spring framework. It is therefore usual to multiply the per-request estimate by 10-20 times.
The payment system also includes other business functions, such as transaction records, reconciliation management, and settlement management, so expand the estimate by another five to ten times. This works out to roughly 1 MB of objects allocated per second.
These multipliers are not absolute. For some special systems, such as reporting or data-computing systems, a single request may create more than 10 MB of objects; in that case the auxiliary objects are relatively insignificant and can be ignored.
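To make the arithmetic in Steps 2-4 concrete, here is a minimal sketch that multiplies the figures out. All numbers are the assumptions above (30 requests per second per machine, roughly 500 bytes per order, 10-20x per-request amplification, 5-10x business amplification), not measurements:
public class AllocationEstimate {
    public static void main(String[] args) {
        int requestsPerSecond = 30;     // per machine, from Steps 2 and 3
        int bytesPerOrder = 500;        // rough size of one order object (Step 4)

        // Amplification factors assumed above: 10-20x for auxiliary objects per
        // request, 5-10x for other business functions (transactions, reconciliation...).
        long low  = (long) requestsPerSecond * bytesPerOrder * 10 * 5;
        long high = (long) requestsPerSecond * bytesPerOrder * 20 * 10;

        // Prints roughly 0.7 - 2.9 MB/s, i.e. on the order of 1 MB per second.
        System.out.printf("Estimated allocation rate: %.1f - %.1f MB/s%n",
                low / (1024.0 * 1024.0), high / (1024.0 * 1024.0));
    }
}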
Step 5: Estimate the size of the meta-space
Metaspace mainly stores class metadata (type information). There is not much to tune here; a few hundred MB is generally enough, for example 256 MB.
Step 6: Estimate stack memory size
The thread stack mainly stores method parameters, local variables, and other runtime information. Setting it to 1 MB per thread is generally enough. For example, if the system has 100 threads, the virtual machine stacks will consume at least 100 MB of memory.
Step 7: Memory allocation
The million-transaction-per-day payment system is deployed on three machines, each handling 30 requests per second. Assume each machine has 2 cores and 4 GB of RAM; the machine itself needs some memory to run, so only about 2 GB can go to the JVM. After reserving space for Metaspace and the thread stacks, assume the heap gets only 1 GB: 500 MB for the young generation and 500 MB for the old generation, which gives Eden 400 MB and each of the two Survivor spaces 50 MB.
The memory parameter settings are then as follows:
-Xms1G -Xmx1G -Xmn500M -Xss1M -XX:MetaspaceSize=256M -XX:MaxMetaspaceSize=256M -XX:SurvivorRatio=8
Step 8: System operation model
After the above analysis, combined with the machine configuration, we can roughly estimate the system's memory running model. With the memory settings above, 30 requests arrive per second and create 30 order-related objects in Eden; about 1 MB of new objects is allocated each second and becomes about 1 MB of garbage 1 second later, once request processing completes. After roughly 400 seconds, just a few minutes, Eden fills up and a YoungGC is triggered. The YoungGC copies the surviving objects into the From Survivor space and then reclaims the young-generation garbage, and the cycle repeats. You can also estimate how often FullGC/OldGC would be triggered if the Survivor spaces were sized improperly or surviving objects aged out. The key is to estimate the GC frequency, and then you can tune the memory accordingly.
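A minimal sketch of that frequency estimate, using the Eden size and allocation rates assumed in this article (400 MB Eden, about 1 MB/s under normal load, about 10 MB/s in the spike scenario discussed in Step 9 below):
public class GcFrequencyEstimate {
    // Roughly how long it takes to fill Eden at a given allocation rate,
    // i.e. the approximate interval between YoungGCs.
    static double secondsBetweenYoungGc(double edenMb, double allocationMbPerSecond) {
        return edenMb / allocationMbPerSecond;
    }

    public static void main(String[] args) {
        double edenMb = 400;  // Eden size from Step 7

        // Normal load: ~1 MB/s  ->  ~400 seconds between YoungGCs.
        System.out.println(secondsBetweenYoungGc(edenMb, 1.0));

        // Spike load (Step 9): ~10 MB/s  ->  ~40 seconds between YoungGCs.
        System.out.println(secondsBetweenYoungGc(edenMb, 10.0));
    }
}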
Step 9: Model estimation for instantaneous pressure increase
If there is a flash-sale event or sudden performance jitter, the pressure can suddenly increase by a factor of 10 or more, reaching thousands of payment requests per second across the system, with each machine allocating at least 10 MB of memory per second. In addition, each payment request may no longer complete within one second: because the pressure has suddenly increased, memory, threads, and CPU become saturated and system performance degrades, so some payment requests may take several seconds, and dozens of MB of objects may occupy heap memory for several seconds.
Still deploying on 2-core 4 GB machines with a 1 GB heap, a 500 MB young generation, 400 MB Eden, and 50 MB Survivor spaces, Eden now fills within tens of seconds and triggers a YoungGC. However, because the pressure has increased and some requests take several seconds, dozens of MB of objects are still alive when the copy happens and cannot be reclaimed.
Several things can happen in this case. First, the tens of MB of surviving objects may exceed the 50 MB Survivor space, in which case they are copied directly to the old generation. Second, even if they fit within the Survivor space but exceed 50% of it, the dynamic age judgment rule may promote some of these objects to the old generation at the next collection.
Then, after roughly 10 YoungGCs, only a few hundred seconds later, the old generation also fills up, which may trigger a FullGC. During a FullGC the system stops and cannot process any requests; moreover, in this situation most of the old generation is garbage, so collection efficiency is very poor.
YoungGC tuning
Properly allocate memory to reduce YoungGC frequency
According to the previous calculation, under normal circumstances with only 1 GB of heap, YoungGC is triggered fairly frequently (every few minutes). Young-generation collection is efficient, but it is still Stop The World, so if YoungGC happens often the system is paused often.
We can consider enlarging the young generation by using a machine with more memory. For example, on a 4-core 8 GB machine we can give the JVM about 4 GB, of which roughly 3 GB is heap: 1.5 GB for the young generation and 1.5 GB for the old generation, which makes Eden about 1.2 GB and each Survivor space about 150 MB. Eden then takes around twenty minutes to half an hour to fill before a YoungGC is triggered, by which time 99% of the objects in it are garbage; the copying algorithm handles such a YoungGC well, and the YoungGC frequency is greatly reduced.
If your business volume is larger, you can also scale out horizontally with more machines, so that each machine receives fewer requests and less pressure.
Ensure sufficient Survivor space
If there is a big promotion and the instantaneous pressure spikes, more than 10 MB of objects are generated every second, and tens or even hundreds of MB of objects survive for a few seconds. Based on the memory model above, Eden (about 1.2 GB) fills in roughly 2 minutes, and the surviving tens of MB of objects are copied to a Survivor space. If the survivors exceed 150 MB, they go straight to the old generation; if they are under 150 MB but over 75 MB, the dynamic age judgment rule may frequently push short-lived objects into the old generation. If the old generation fills up quickly, FullGC is triggered frequently.
One of the most important aspects of young-generation tuning is to ensure, as far as possible, that the Survivor spaces are large enough, so that objects do not pour into the old generation during YoungGC simply because Survivor space is insufficient. Done well, this can greatly reduce or even eliminate FullGC.
In fact, most objects in this kind of business system have very short life cycles, and the objects that do live long do not occupy much memory, so we should try to keep objects in the young generation. We can therefore shift the ratio: give the young generation 2 GB and the old generation 1 GB, so that Eden gets 1.6 GB and each Survivor space 200 MB, which basically ensures that the objects surviving each YoungGC fit into a Survivor space. Alternatively, the -XX:SurvivorRatio parameter can be used to adjust the ratio between Eden and the Survivor spaces so that the Survivor spaces can hold as many surviving objects as possible after each YoungGC.
Optimize the age threshold of the object
Another way young-generation objects enter the old generation is by age: an object that survives 15 collections is promoted. We can adjust this based on the actual business model. For example, in the big-promotion scenario, with a 2 GB young generation and 1.6 GB Eden, a YoungGC is triggered roughly every 3 minutes, so 15 rounds of copying back and forth in the young generation means an object takes about 45 minutes to reach the old generation. Most objects have very short life cycles; objects that survive more than a few minutes are typically long-lived core components such as Controllers, Services, and Repositories.
For this type of system, it is better to let genuinely long-lived objects enter the old generation sooner rather than being copied back and forth in the young generation 15 times. You can lower the age threshold with the -XX:MaxTenuringThreshold parameter, for example to 5.
Optimize the large object threshold
Another case is large objects entering the old generation directly. Setting the large-object threshold to 1 MB is generally enough, since objects larger than 1 MB are rare. If we find that the system frequently creates short-lived objects larger than the threshold, we can increase the threshold appropriately to keep them out of the old generation.
The large-object threshold is set with the -XX:PretenureSizeThreshold=1M parameter.
Select the garbage collector
The young-generation collectors are Serial, ParNew, and Parallel Scavenge; when CMS is used for the old generation, the young-generation collector paired with it is ParNew.
With the ParNew collector, the tuning approach is basically the four points above: allocate young-generation memory properly, make sure surviving objects fit into the Survivor spaces, and keep objects out of the old generation; YoungGC is then basically not a problem.
JVM parameters
The JVM parameters are tuned as follows:
-Xms3G
-Xmx3G
-Xmn2G
-Xss1M
-XX:SurvivorRatio=8
-XX:MetaspaceSize=256M
-XX:MaxMetaspaceSize=256M
-XX:MaxTenuringThreshold=5
-XX:PretenureSizeThreshold=1M
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
FullGC tuning
The CMS garbage collector is mainly used for the old generation. Let's look at how to tune the various CMS parameters based on the business model above.
How often is FullGC triggered
On top of the previous optimizations for the younger generation, we also need to estimate how often the system will trigger Full GC, which will determine whether we should focus on optimizations for the older generation. For example, if a Full GC is estimated to be performed every one or two hours or more after the peak hour has passed, the impact of performing Full GC on the system is minimal.
Let’s first look at the conditions that trigger the Full GC:
- Before JDK 6 there was an allocation-guarantee parameter, -XX:HandlePromotionFailure. Before each YoungGC, the JVM would check whether the old generation's available space was larger than the total size of all young-generation objects; with the configuration above, the young generation can hold up to about 1.8 GB of objects while the old generation is only 1 GB, so wouldn't the guarantee fail before every YoungGC? In any case, since JDK 1.6 this parameter no longer exists and this check is no longer made.
- Before each YoungGC, the JVM checks whether the old generation's available space is larger than the average size of objects promoted to the old generation in previous YoungGCs. With the configuration above, objects are basically reclaimed in the young generation and the average promotion size is very small, so this condition will not trigger a Full GC.
- A Full GC can be triggered if the objects surviving a YoungGC do not fit in the Survivor space and must be moved to the old generation, but the old generation does not have enough room. After the young-generation optimizations above, the probability of this is very small.
- CMS has a default occupancy threshold of 92%: when old-generation usage exceeds 92%, an old-generation collection is triggered automatically. This threshold can be set with -XX:CMSInitiatingOccupancyFraction.
While the system runs, some objects will still slowly make their way into the old generation, but after the young-generation optimizations the promotion rate is slow, and it may take several hours to trigger a FullGC. As long as it does not land in a peak period, the FullGC will not have much impact.
CMS concurrent mode failure
An old-generation collection is triggered when the old generation is almost full; with the CMS threshold of 92%, that leaves only about 100 MB of free space in the old generation. If, during the concurrent collection, the objects promoted into the old generation exceed that remaining space, a Concurrent Mode Failure occurs. The JVM then enters a Stop The World state and falls back to the Serial Old collector, which is single-threaded and very inefficient.
However, after the young-generation tuning, objects are promoted to the old generation slowly and the average promotion size is small, so the probability of promoting more than 100 MB during a concurrent collection is also very small. In this case we usually do not need to adjust the -XX:CMSInitiatingOccupancyFraction parameter.
Frequency of defragmentation after CMS recovery
By default, CMS defragments the memory once after every FullGC, and this also stops the world. Based on the analysis above, we generally do not need to adjust these parameters.
CMS controls whether memory is compacted after a Full GC with the -XX:+UseCMSCompactAtFullCollection parameter, and -XX:CMSFullGCsBeforeCompaction sets how many FullGCs must occur before a defragmentation; the default is 0, meaning defragmentation after every FullGC.
There is usually no need to change CMSFullGCsBeforeCompaction. Raising it means defragmenting only after several FullGCs, so the earlier FullGCs leave a lot of fragmentation behind; that fragmentation in turn causes FullGC to be triggered more often, because even though plenty of space is free after a FullGC, little of it is contiguous. So it is usually left at 0, defragmenting after every FullGC.
CMS improves FullGC performance
CMS also has two parameters that can further optimize FullGC performance and reduce FullGC time.
- -XX:+CMSParallelInitialMarkEnabled: enables multi-threaded execution in the CMS "initial mark" phase, shortening that STW pause and further reducing FullGC time.
- -XX:+CMSScavengeBeforeRemark: makes the JVM try to run a YoungGC before the CMS remark phase. The remark phase is also STW, so running a YoungGC first reclaims unreferenced objects in the young generation, leaving fewer objects to scan during remark, which improves remark performance and reduces the time of this phase. (Note: both concurrent marking and remark scan the whole heap, because even an old-generation object may be referenced by a young-generation object.)
Disabling System.gc()
In code, System.gc() suggests that the JVM perform a FullGC, but the JVM is not obliged to comply. This method should not be called casually; explicit GC is basically prohibited in most projects, because careless use can trigger FullGC frequently.
For this reason we generally disable explicit GC by adding the -XX:+DisableExplicitGC parameter, so that System.gc() can no longer trigger a GC.
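A trivial sketch of the point (the class name is arbitrary): the call below may trigger a Full GC, and becomes a no-op when the process is started with -XX:+DisableExplicitGC:
public class ExplicitGcDemo {
    public static void main(String[] args) {
        // Allocate some data and drop the reference so there is garbage to collect.
        byte[] data = new byte[10 * 1024 * 1024];
        data = null;

        // Only a *suggestion* to run a Full GC; with -XX:+DisableExplicitGC this does nothing.
        System.gc();
    }
}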
Meta-space GC optimization
FullGC is triggered not only when the old generation fills up; it can also be triggered frequently when Metaspace is configured too small or too many classes are loaded dynamically.
Classes generated dynamically at runtime are placed in Metaspace. Common sources include:
- Bytecode frameworks such as ASM, CGLib, and Javassist used to create proxy classes.
- Reflection calls such as Method method = XXX.class.getDeclaredMethod(...); method.invoke(target, args); — after a certain number of reflection calls, classes are dynamically generated (see the sketch below).
If FullGC is suspected to be caused by Metaspace, you can add -XX:+TraceClassLoading and -XX:+TraceClassUnloading to see which classes are being loaded and unloaded frequently.
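A minimal sketch of the reflection case mentioned above, assuming a HotSpot JVM (JDK 8 era): after repeated Method.invoke calls, HotSpot may switch from native reflection to dynamically generated accessor classes, which can be observed with -XX:+TraceClassLoading (the class and method names here are arbitrary):
import java.lang.reflect.Method;

public class ReflectionClassGenDemo {
    public static String hello() {
        return "hello";
    }

    public static void main(String[] args) throws Exception {
        Method method = ReflectionClassGenDemo.class.getDeclaredMethod("hello");
        // After enough invocations HotSpot may generate an accessor class for this
        // method; run with -XX:+TraceClassLoading to watch classes being loaded.
        for (int i = 0; i < 1000; i++) {
            method.invoke(null);
        }
    }
}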
There are two parameters that control the size of Metaspace:
- -XX:MaxMetaspaceSize: sets the maximum size of Metaspace. The default is -1, meaning it is limited only by available native memory.
- -XX:MetaspaceSize: sets the initial size of Metaspace. When usage reaches this value, a garbage collection is triggered to unload classes, and the collector then adjusts the value: if a large amount of space is freed, the value is lowered appropriately; if very little space is freed, the value is raised appropriately, without exceeding -XX:MaxMetaspaceSize.
JVM parameters
-Xms3G
-Xmx3G
-Xmn2G
-Xss1M
-XX:SurvivorRatio=8
-XX:MetaspaceSize=256M
-XX:MaxMetaspaceSize=256M
-XX:MaxTenuringThreshold=5
-XX:PretenureSizeThreshold=1M
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=92
-XX:CMSWaitDuration=2000
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseCMSCompactAtFullCollection
-XX:CMSFullGCsBeforeCompaction=0
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+DisableExplicitGC
Large memory machine GC tuning
Scenarios with large memory machines
With the optimizations above, the payment system's YoungGC happens only every few minutes and FullGC essentially never happens. However, a big promotion such as Double 11 may push the usual pressure up by dozens or even hundreds of times during the first few minutes after midnight. If the deployment is still based on 4-core 8 GB machines, it may take hundreds of machines to handle it. In that case, consider upgrading the machine configuration, for example to 16 cores and 32 GB: each machine can then carry thousands of requests per second, and a dozen or so machines may be enough.
There are also systems such as reporting systems, BI systems, and data-computing systems whose core business, such as generating a data report, may load and compute tens or hundreds of MB of data in memory for a single request. On small-memory machines, Eden fills quickly and YoungGC is triggered constantly, and as concurrency grows, ever more machines are needed. In this case we can usually upgrade the configuration and deploy on large-memory machines.
In general, large-memory machines suit scenarios where high concurrency or a high per-request memory footprint causes frequent YoungGCs and would otherwise require adding many machines; using large-memory machines reduces the number of machines to deploy.
Problems with large memory machines
For example, on a 16-core 32 GB machine, if the young generation is given 20 GB, Eden is 16 GB and each Survivor space is 2 GB. At 50 MB of objects per second, a YoungGC is triggered roughly every 5 minutes. The memory is about 10 times larger than before, and with the ParNew + CMS collector combination each YoungGC pause is several hundred milliseconds or even a second, so the system pauses for several hundred milliseconds every few minutes. Such long pauses cause requests to back up and, in severe cases, some requests to time out. If the configuration is raised further, say to 32 cores and 64 GB, each YoungGC may pause for several seconds, which has a significant impact on the system.
At this point, the G1 collector can be used to solve the problem of slow YoungGC on large heaps. We can give G1 an expected GC pause time of, say, 100 milliseconds, and G1 will try to keep each YoungGC pause within 100 milliseconds, avoiding an impact on user experience.
But for backend systems that do not serve users directly, even GC pauses of a second or a few seconds do not really matter, and there is no need to use the G1 collector.
G1 collector tuning
G1 Memory Layout
G1 uses -XX:G1NewSizePercent to set the initial proportion of the heap given to young-generation Regions (default 5%) and -XX:G1MaxNewSizePercent to set the maximum proportion (default 60%). These two parameters generally do not need to be changed; the defaults are fine.
By default, the G1 Region size is the heap size divided by 2048, rounded to a power of 2 (for example, a 32 GB heap gives 16 MB Regions). You can also set the Region size explicitly with the -XX:G1HeapRegionSize parameter.
GC pause time
The most important parameter affecting G1 performance is -XX:MaxGCPauseMillis, which sets the target maximum pause time for a GC. This parameter generally needs to be set with reference to load-testing tools, GC logs, and memory analysis tools, aiming for a value at which GC is neither too frequent nor too long per pause.
G1 assigns Regions to the young generation as the system runs, but it does not wait for the young generation to reach 60% of the heap before triggering a YoungGC. There is no fixed answer to how many Regions G1 assigns to the young generation, how often it triggers YoungGC, or how long each one takes: G1 sizes the young generation according to the configured pause target and then triggers YoungGC so that the collection stays within that target, avoiding both collecting too many Regions at once (pauses longer than expected) and too few at once (overly frequent GCs).
MixedGC optimization
By default, G1 triggers a MixedGC when old-generation usage exceeds 45% of the heap. The most important thing in optimizing MixedGC is actually optimizing memory allocation so that objects do not enter the old generation and MixedGC is not triggered too often.
Then there is the core -XX:MaxGCPauseMillis parameter. If it is set too high, the system runs for a long time before collecting and the young generation may grow to its 60% cap; by then the surviving objects may no longer fit into the Survivor Regions, or may trigger the dynamic age judgment rule, causing objects to be promoted and MixedGC to be triggered. The parameter needs to be set so that YoungGC is not too frequent while also keeping the amount of surviving objects per GC small enough that promotions do not trigger frequent MixedGCs.
JVM parameters
-Xms24G
-Xmx24G
-Xmn20G
-Xss1M
-XX:SurvivorRatio=8
-XX:MetaspaceSize=256M
-XX:MaxMetaspaceSize=256M
-XX:MaxTenuringThreshold=5
-XX:PretenureSizeThreshold=1M
-XX:+UseG1GC
-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=5
-XX:G1MaxNewSizePercent=60
-XX:G1HeapRegionSize=4M
-XX:MaxGCPauseMillis=200
-XX:ParallelGCThreads=4
OOM memory overflow problem
According to the Java Virtual Machine Specification, an OutOfMemoryError (OOM) can occur in every run-time memory area of the virtual machine except the program counter. An out-of-memory problem is generally devastating: it means the VM no longer has enough memory to support the program, so once it happens the system may stop serving requests or the VM process may even crash. OOM is a very serious problem; this section looks at its common causes.
Metaspace overflow
Cause of metaspace overflow
The Metaspace area rarely runs out of memory. If it does, it usually does so for two reasons:
- Improper Metaspace parameters: if Metaspace is set too small, it is easy to exhaust it
- The code uses CGLib, ASM, Javassist, and other dynamic bytecode techniques to create classes on the fly. If the code is poorly written, it can result in too many classes and fill up Metaspace
Simulate metaspace overflow
Let’s use CGLib to create classes to simulate stuffing Metaspace.
First add the cglib dependency to pom.xml:
<dependency>
<groupId>cglib</groupId>
<artifactId>cglib</artifactId>
<version>3.2.4</version>
</dependency>
The following program continuously creates proxy classes using CGLib:
import java.lang.reflect.Method;

import net.sf.cglib.proxy.Enhancer;
import net.sf.cglib.proxy.MethodInterceptor;
import net.sf.cglib.proxy.MethodProxy;

public class GCMain {
    public static void main(String[] args) {
        while (true) {
            // With caching disabled, every Enhancer generates a new proxy class,
            // so class metadata keeps piling up in Metaspace.
            Enhancer enhancer = new Enhancer();
            enhancer.setSuperclass(IService.class);
            enhancer.setUseCache(false);
            enhancer.setCallback(new MethodInterceptor() {
                @Override
                public Object intercept(Object o, Method method, Object[] objects, MethodProxy methodProxy) throws Throwable {
                    return methodProxy.invokeSuper(o, objects);
                }
            });
            enhancer.create();
        }
    }

    static class IService {
    }
}
Set the following JVM parameters: Metaspace fixed at 10 MB, plus parameters to trace class loading and unloading:
-Xms200M
-Xmx200M
-Xmn150M
-XX:SurvivorRatio=8
-XX:MetaspaceSize=10M
-XX:MaxMetaspaceSize=10M
-XX:+UseConcMarkSweepGC
-XX:+TraceClassLoading
-XX:+TraceClassUnloading
-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:./gc.log
After running for a while, the program reports an OOM error and then exits.
The "Caused by: java.lang.OutOfMemoryError: Metaspace" line shows the OOM is caused by Metaspace. And as the class-loading trace shows, the program keeps loading the proxy classes that CGLib dynamically creates.
Looking at the GC log, you can see that a FullGC was triggered because Metaspace was full.
Stack overflow
Stack overflow cause
As covered in the previous two articles, each thread has a thread stack of fixed size, such as 1 MB. Each time the thread calls a method, it pushes that method's stack frame onto the thread stack, and the frame is popped when the method returns. A stack frame stores the method's local variables, exception table, method address, and other information, and therefore occupies a certain amount of memory.
If the thread keeps calling methods and pushing frames without popping them, such as a recursive call with no end condition, the thread's stack will fill up sooner or later and run out of memory. Generally speaking, stack overflow is caused by bugs in the code and rarely happens under normal circumstances.
For the virtual machine stacks and native method stacks, the Java Virtual Machine Specification describes two types of exceptions: StackOverflowError and OutOfMemoryError.
1. StackOverflowError
StackOverflowError is thrown if the stack depth requested by the thread is greater than the maximum depth allowed by the virtual machine. Stack depths of 1000~2000 are perfectly fine in most cases, and should be sufficient for normal method calls.
2. OutOfMemoryError
If the virtual machine's stack can be dynamically expanded, an OutOfMemoryError is raised when enough memory cannot be allocated to expand the stack. The HotSpot virtual machine does not support dynamic stack expansion; the stack size is fixed, so setting a smaller thread stack size (-Xss) simply reduces the maximum achievable stack depth.
So on HotSpot, a stack overflow only produces a StackOverflowError, because the stack cannot accommodate a new frame; it does not produce an OutOfMemoryError.
Simulate a stack overflow
The following program recurses with no termination condition, so it is bound to cause a stack overflow:
public class GCMain {

    public static void main(String[] args) {
        recursion(1);
    }

    // Recursion with no termination condition: each call pushes another frame
    // until the thread stack overflows.
    public static void recursion(int count) {
        System.out.println("times: " + count++);
        recursion(count);
    }
}
Set the following JVM parameters, with the thread stack at 256 KB:
-Xms200M
-Xmx200M
-Xmn150M
-Xss256K
-XX:SurvivorRatio=8
-XX:MetaspaceSize=10M
-XX:MaxMetaspaceSize=10M
Running the program throws a StackOverflowError:
Heap overflow
Heap overflow cause
The main cause of a heap overflow is that there are too many objects in a limited amount of memory and most of them are still alive even after GC, so the heap cannot accommodate any new objects.
Generally speaking, there are two main scenarios for a heap overflow:
- The system is overloaded with too many requests, so a large number of objects are alive at the same time and new objects can no longer be allocated; the system crashes with OOM
- The system has a memory leak: many objects are inexplicably created, stay alive, and cannot be collected by GC, eventually resulting in OOM
Simulated heap overflow
Run the following code: it constantly creates strings that remain referenced by the datas set and can never be reclaimed, which inevitably results in OOM.
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class GCMain {
    public static void main(String[] args) {
        Set<String> datas = new HashSet<>();
        // Every string stays reachable through the set, so the heap eventually overflows.
        while (true) {
            datas.add(UUID.randomUUID().toString());
        }
    }
}
Set the following JVM parameters, with 100 MB each for the young and old generations:
-Xms200M
-Xmx200M
-Xmn100M
-Xss1M
-XX:SurvivorRatio=8
-XX:MetaspaceSize=10M
-XX:MaxMetaspaceSize=10M
-XX:+UseParNewGC
Running the program throws an OutOfMemoryError: an OOM caused by insufficient Java heap space.
Out-of-heap memory overflow
Out-of-heap memory overflow cause
The size of direct memory can be specified with the -XX:MaxDirectMemorySize parameter; if not specified, it defaults to the same value as the maximum Java heap size (-Xmx).
To allocate a block of off-heap memory in Java code, you can use the DirectByteBuffer class. The DirectByteBuffer object itself lives in the JVM heap, but constructing it allocates a chunk of off-heap memory associated with it. When the DirectByteBuffer is no longer referenced and becomes garbage, it is reclaimed during a YoungGC or FullGC, and only then can the off-heap memory associated with it be freed.
Simulate an out-of-heap memory overflow
If you create many DirectByteBuffer objects that consume a lot of off-heap memory, and those objects become garbage but are never collected by GC, the off-heap memory is never released; over time this can cause an off-heap memory overflow.
NIO does, however, have a safeguard: when off-heap memory runs low, it calls System.gc() to suggest that the JVM perform a GC and reclaim the garbage DirectByteBuffer objects, and with them the off-heap memory.
Run the following code: it repeatedly allocates 1 MB of off-heap memory through ByteBuffer.allocateDirect, which internally builds a DirectByteBuffer object.
import java.nio.ByteBuffer;

public class GCMain {

    private static final int _1M = 1024 * 1024;

    public static void main(String[] args) {
        ByteBuffer byteBuffer;
        // Each iteration allocates 1 MB of off-heap memory via a new DirectByteBuffer,
        // 40 MB in total, against a 20 MB MaxDirectMemorySize limit.
        for (int i = 0; i < 40; i++) {
            byteBuffer = ByteBuffer.allocateDirect(_1M);
        }
    }
}
Set the following JVM parameters: 300 MB for the young generation and at most 20 MB of off-heap memory; with these settings no YoungGC will be triggered.
-Xms500M
-Xmx500M
-Xmn300M
-XX:MaxDirectMemorySize=20M
-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:./gc.log
After running the program, look at the GC log: you can see that NIO called System.gc() twice because off-heap memory was running low, and as a result no OOM occurred.
If we add the -XX:+DisableExplicitGC parameter to prevent System.gc() from triggering a GC:
-Xms500M
-Xmx500M
-Xmn300M
-XX:MaxDirectMemorySize=20M
-XX:+DisableExplicitGC
-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:./gc.log
An out-of-heap memory overflow exception is thrown:
So as a general rule, to be on the safe side, do not set the -XX:+DisableExplicitGC parameter if your program uses off-heap memory.
How to solve OOM problem
OOM Analysis
Generally speaking, the approach to solving OOM problems is similar in each case. When an OOM occurs, first determine from the log which memory area overflowed, then analyze the thread stack at the time of the OOM. If the code involved is your own, you can usually spot the problem from the stack trace.
Improper memory allocation can also cause the young and old generations to fill up quickly, or leave large numbers of objects alive for a long time, so memory fills up fast and may likewise lead to OOM.
Finally, the heap dump snapshot can be analyzed with a tool such as MAT. A heap dump contains a full picture of the heap at the time of the crash plus thread stack information, so you can see which objects exist in excessive numbers and caused the OOM, then trace their references to find the code responsible for the overflow and the root cause of the problem.
Analyzing an OOM is generally more involved than this, though. In production, the OOM is often not caused by code we wrote ourselves but by an open-source framework, container, or similar; in that case you need to understand the framework and dig into its source code to understand the root cause.
Heap dump snapshot
To automatically dump memory snapshots in OOM, add the following startup parameters:
- -XX:+HeapDumpOnOutOfMemoryError: automatically dump a memory snapshot when an OOM occurs
- -XX:HeapDumpPath=dump.hprof: the location where the snapshot file is stored
Once you have a memory snapshot, you can use a tool like MAT to analyze which objects were created in large numbers. Note that for overflows of off-heap memory (such as NIO direct memory), the heap snapshot will not show any obvious anomaly.
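You do not have to wait for an OOM to obtain a snapshot: the jmap tool shipped with the JDK can dump a running process, and a dump can also be captured programmatically. A minimal sketch of the latter, assuming a HotSpot JVM where the com.sun.management diagnostic MXBean is available (the class and file names are arbitrary):
import java.lang.management.ManagementFactory;

import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean diagnosticBean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);

        // live = true dumps only reachable objects, which keeps the file smaller.
        diagnosticBean.dumpHeap("manual-dump.hprof", true);
        System.out.println("Heap dump written to manual-dump.hprof");
    }
}
The resulting .hprof file can then be opened in MAT just like a dump produced on OOM.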
Summary of performance tuning
Summary of tuning process
In general, the lower the GC frequency the better. YoungGC is very fast, while FullGC is at least 10 times slower, so objects should be reclaimed in the young generation as much as possible to reduce FullGC frequency. A few FullGCs per day, one every few days, or none at all generally indicates good JVM performance.
It can be summarized from the previous tuning process that the premise of tuning in the old generation is tuning in the young generation, the premise of tuning in the young generation is allocating memory space reasonably, and the premise of allocating memory space reasonably is estimating the memory usage model.
Therefore, the general approach to JVM tuning is: first estimate the memory usage model, then allocate the memory sizes and ratios of each generation reasonably so that surviving young-generation objects fit into the Survivor spaces and garbage is reclaimed in the young generation rather than entering the old generation, reducing FullGC frequency; finally, choose an appropriate garbage collector.
Several manifestations of frequent FullGC
We should suspect frequent FullGC when:
- The CPU is overloaded
- Frequent FullGC alarms are raised
- The system cannot process requests, or processes them too slowly
There are two scenarios when the CPU is overloaded:
- The system has created a large number of threads that run concurrently with heavy workloads; too many concurrently running threads drive the machine's CPU load too high.
- The JVM on the machine is performing frequent FullGCs, which are CPU-intensive; frequent FullGC also causes the system to freeze intermittently.
Several common reasons for frequent FullGC
① The system handles highly concurrent requests or processes too much data, causing frequent YoungGCs; too many objects survive each YoungGC and, because memory allocation is unreasonable and the Survivor spaces are too small, objects frequently enter the old generation and FullGC is triggered frequently
② The system loads too much data into memory at once and creates many large objects, so large objects frequently enter the old generation and FullGC is triggered frequently
③ The system has a memory leak: a large number of objects are created that cannot be reclaimed and keep occupying the old generation, inevitably triggering frequent FullGC
④ Metaspace triggers FullGC because too many classes are loaded
⑤ FullGC is triggered by mistaken calls to System.gc()
JVM Parameter Templates
From the analysis above, although there is no fixed standard for JVM parameters, for a typical system we can summarize a generic JVM parameter template that basically guarantees reasonable JVM performance for systems that have not been individually tuned; when a particular system shows performance problems, it can then be tuned specifically.
For a typical system deployed on a 4-core 8 GB machine, the template is summarized as follows:
- Heap memory: 4 GB, with 3 GB for the young generation and 1 GB for the old generation, which gives a 2.4 GB Eden and 300 MB for each Survivor space. Generally, as long as the objects surviving each YoungGC stay under 150 MB there is no big problem
- Metaspace: 512 MB is generally sufficient, but you can increase this value if the system creates many classes at runtime
- -XX:MaxTenuringThreshold: the promotion age threshold is lowered to 5 so that long-lived objects enter the old generation sooner
- -XX:PretenureSizeThreshold: the large-object threshold is set to 1 MB; if the system has larger objects, adjust the threshold accordingly
- -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC: the garbage collectors are the ParNew + CMS combination
- -XX:CMSFullGCsBeforeCompaction: set to 0 to defragment memory after every FullGC
- -XX:+CMSParallelInitialMarkEnabled: multi-threaded execution in the CMS initial marking phase to reduce FullGC time
- -XX:+CMSScavengeBeforeRemark: perform a YoungGC before the CMS remark phase where possible
- -XX:+DisableExplicitGC: disable explicit (manual) GC
- -XX:+HeapDumpOnOutOfMemoryError: export a heap snapshot on OOM for troubleshooting
- -XX:+PrintGC: print GC logs for troubleshooting
-Xms4G
-Xmx4G
-Xmn3G
-Xss1M
-XX:SurvivorRatio=8
-XX:MetaspaceSize=512M
-XX:MaxMetaspaceSize=512M
-XX:MaxTenuringThreshold=5
-XX:PretenureSizeThreshold=1M
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=92
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSWaitDuration=2000
-XX:+UseCMSCompactAtFullCollection
-XX:CMSFullGCsBeforeCompaction=0
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+DisableExplicitGC
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=dump.hprof
-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:./gc.log
JVM parameters
We’ve already covered a lot of JVM parameters, but this section briefly summarizes them, as well as some of the less common ones.
Java startup parameters fall into three categories:
- Standard parameters (-): every JVM implementation must support these parameters and keep them backward compatible, e.g. -version, -classpath
- Non-standard parameters (-X): the default JVM implements these parameters, but not all JVM implementations are guaranteed to support them, and backward compatibility is not guaranteed, e.g. -Xms, -Xmx
- Non-stable parameters (-XX): these vary between JVM implementations and may be removed at any time in the future, so use them with caution, e.g. -XX:+UseParNewGC, -XX:MetaspaceSize
JVM standard parameters (-)
You can list the JVM's standard parameters with the java -help command.
JVM non-standard parameters (-X)
The non-standard parameters can be listed with the java -X command.
Common parameters:
JVM non-stable parameters (-XX)
JVM non-stable parameters fall into three categories:
- Functional switch parameters: switches that enable or disable features and change the underlying behavior of the JVM
- Performance tuning parameters: used to tune JVM performance
- Debug parameters: generally used to turn on tracing, printing, and other output so the JVM displays more detailed information
Note: parameters containing a plus sign "+" or minus sign "-" are switch parameters: plus means enabled, minus means disabled, for example -XX:+/-UseAdaptiveSizePolicy. Parameters without a plus or minus sign take a value after an equals sign "=", for example -XX:SurvivorRatio=8.
You can print all JVM parameters and their values at startup with -XX:+PrintFlagsFinal.
Functional switch parameter
1. Parameters related to garbage collector
2. Some other parameters
Performance tuning parameters
Debug parameter
Just-in-time compilation tuning parameters
After a class is loaded and initialized, the execution engine turns its bytecode into machine code as methods are called, so it can run on the operating system. This translation from bytecode to machine code inside the virtual machine is just-in-time compilation. Initially, bytecode is executed by the interpreter; when the virtual machine detects that a method or block of code runs particularly frequently, it marks that code as "hot code." To make hot code run efficiently, the just-in-time compiler (JIT) compiles it at run time into platform-specific machine code, applies optimizations at various levels, and caches the result in memory. Without JIT compilation, the same code would be interpreted every time it runs.
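A minimal sketch to observe this, assuming a HotSpot JVM: the loop below calls a small method far more often than a typical hot-spot threshold, and running the program with -XX:+PrintCompilation prints the methods the JIT compiles (the class and method names here are arbitrary):
public class JitDemo {

    // A tiny method; once it has been invoked often enough, HotSpot should
    // recognize it as hot code and JIT-compile it to machine code.
    static long square(long x) {
        return x * x;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += square(i);
        }
        // Prevents the loop from being optimized away entirely.
        System.out.println(sum);
    }
}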
Compilation optimizations are related to just-in-time compiler selection, optimization of hot-spot detection count thresholds, method inlining, escape analysis, lock elimination, scalar replacement, etc. Compilation optimizations generally do not need to be tuned.