I. The problem

Recently, users reported that system response was getting slower and slower, and it was no coincidence: the backend logs showed OutOfMemoryError (OOM) messages.

The JDK's built-in JConsole tool lets you watch the system while it is stalled: the CPU is saturated, the number of active threads keeps climbing, and heap memory is near its ceiling.

II. Analyzing the situation

Analysis using JConsole:

Find the JDK installation directory and run jconsole.exe from its bin directory.
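Instead of double-clicking the executable, JConsole can also be launched from the command line and attached directly to a running JVM. A minimal sketch, assuming the JDK's bin directory is on the PATH and that <pid> is a placeholder for the target process id:

    # List running JVMs and their main classes to find the target pid
    jps -l

    # Attach JConsole directly to that JVM
    jconsole <pid>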

At that time, the production picture was: heap usage around 7 GB, close to its limit; about 80 active threads; CPU usage around 80%. The system could go down at any moment.

According to user feedback, the system froze at intervals, and heap memory would climb for a while, drop suddenly, then climb again. My first reaction: could the system be running frequent Full GCs, making it unavailable for stretches of time?

Let's start with what a Full GC is (some knowledge of the JVM's memory layout is required):

A Full GC is a garbage collection that works on the old generation and the permanent generation (Metaspace since JDK 8).

Trigger condition: the old generation is running out of space.

Characteristics: it takes a long time to run, and application threads are paused (the system appears unavailable) while it does.
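To confirm how often Full GCs happen and how long they pause the application, GC logging can be enabled when the service starts. A minimal sketch, assuming a JDK 8 style HotSpot JVM and a hypothetical app.jar; the heap sizes are illustrative, and JDK 9+ would use -Xlog:gc* instead:

    # JDK 8 GC logging flags
    java -Xms8g -Xmx8g \
         -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
         -Xloggc:/var/log/app/gc.log \
         -jar app.jar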

So check the usage of the old generation:

The old generation's peak usage had reached about 7 GB, close to its upper limit. Something smelled wrong.

jstat -gc <pid>

At the time of the incident, Full GC had run 28 times, for a total of 27 seconds. This confirms that frequent Full GCs were blocking the system's threads.
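Those figures can be read straight from jstat. A sketch of the command, assuming <pid> is the Tomcat process id; the column names below are from the standard jstat -gc output:

    # Print GC statistics every 1000 ms
    jstat -gc <pid> 1000

    # Columns of interest:
    #   OC   - old generation capacity (KB)
    #   OU   - old generation usage (KB)
    #   FGC  - number of Full GC events so far
    #   FGCT - total time spent in Full GC (seconds)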

As mentioned above, a Full GC is triggered when the old generation runs out of memory. So in which cases do objects end up in the old generation?

(1) Objects that survive a certain number of minor GCs in the young generation are promoted to the old generation.

The threshold can be set with -XX:MaxTenuringThreshold; the default value is 15.

(2) After a minor GC in the young generation, if the surviving objects are larger than the Survivor (to) space can hold, they go into the old generation.

(3) Large objects go directly into the old generation.

Configured via -XX:PretenureSizeThreshold (see the sketch below).
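For reference, this is how those two thresholds look on the command line. The values are illustrative only, not what was used in this incident, app.jar is a placeholder, and note that -XX:PretenureSizeThreshold is only honored by the Serial and ParNew young-generation collectors:

    # Promote objects after 10 minor GCs (default 15), and allocate
    # objects larger than 1 MB directly in the old generation
    java -XX:MaxTenuringThreshold=10 \
         -XX:PretenureSizeThreshold=1048576 \
         -jar app.jar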

Analysis using JVisualVM:

In the JDK's bin directory, run jvisualvm.exe.

Here you can clearly see how the heap contents are distributed. At the time, byte[] and InternalAprOutputBuffer occupied the most memory, about 2.9 GB each.
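If a GUI is not available on the server, a similar class-level breakdown can be pulled from the command line with jmap. A sketch, assuming <pid> is the Tomcat process id:

    # Print a histogram of live objects, sorted by total size
    # (":live" forces a full GC first; byte[] appears as "[B")
    jmap -histo:live <pid>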

To analyze the exact cause, the Eclipse MAT (Memory Analyzer Tool) plug-in is recommended.

MAT plugin download and installation tutorial: mp.csdn.net/postedit/10…
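MAT works on a heap dump (.hprof) file. One way to capture it is with jmap (the file path here is a placeholder); alternatively the JVM can be started with -XX:+HeapDumpOnOutOfMemoryError so a dump is written automatically on the next OOM:

    # Dump only live objects in binary format for MAT to open
    jmap -dump:live,format=b,file=/tmp/heap.hprof <pid>

    # Or have the JVM write a dump automatically when an OOM occurs
    java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -jar app.jar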

System Dump:

The problem is that these buffers take up far too much heap memory.

Then we go to the Dominator Tree to see the largest objects that are being kept alive.

At least half of them are InternalAprOutputBuffer objects. Shallow Heap is the memory taken by the object itself, while Retained Heap is the total memory that would be freed if the object were collected, i.e. the object plus everything it keeps alive; both are expressed in bytes (1 KB = 1024 bytes). Each of these buffers retains about 47.6 MB of heap (47.6 × 1024 × 1024 ≈ 49.9 million bytes, matching the 50,000,000-element byte[] we are about to find), which is hard to believe.

So what’s in these objects? Why so many?

Clicking through the tree, we find the contents of this 50,000,000-element byte array.

Looking at the Attributes panel on the left, the array really does have a length of 50 million. What data does it hold?

Clicking the plus sign in the lower right corner of the picture to expand the data shows that only the first few hundred elements contain data; everything after that is 0, all the way up to 50,000,000. In other words, only 311 bytes of the array hold valid data, and the remaining tens of millions of slots are zero-filled padding, which is what blows up the memory.

Since org.apache.coyote.http11.InternalAprOutputBuffer is a Tomcat class, the guess is that this is related to the Tomcat configuration, so let's open Tomcat's server.xml and check.

Does this 50,000,000 look familiar? It is exactly the length of that byte[].

So why set this parameter?

This parameter had been raised to work around GET requests whose URL query strings were too long.

However, all major browsers limit URL length anyway, and Tomcat's default for this value is only about 4 KB, whereas here it had been raised to roughly 50 MB. So we changed it back down to 8 KB (8192), and modified the interface so that queries with very large parameters are sent via POST instead.
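For reference, this is roughly what the fix looks like in server.xml. The attribute is most likely maxHttpHeaderSize, which sizes the per-connection buffers backing InternalAprOutputBuffer, but the original screenshot is not reproduced here, so treat the attribute name and the surrounding values as assumptions rather than the exact incident configuration:

    <!-- Before: the value had been raised to ~50,000,000 bytes, so every
         connection pre-allocated a ~47.6 MB buffer.
         After: back to 8 KB; oversized queries go through POST instead. -->
    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               maxHttpHeaderSize="8192"
               redirectPort="8443" />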

With that, the problem was solved: the server runs smoothly and no longer freezes.