1. Background of this case

This article walks you through a special JVM optimization case. The case arose because a novice engineer, with only a half-baked understanding of JVM tuning, made a very unusual JVM parameter setting mistake that caused frequent Full GC problems in an online system.

Many of our later optimization cases likewise involve all kinds of strange scenarios, because it is precisely by working through one strange scenario after another that you gradually accumulate rich JVM optimization experience.

The more scenarios you understand, the better equipped you will be to deal with JVM performance issues in the future.

2. How the problem arose

The scenario unfolded roughly like this: one day, a novice engineer on the team, probably on a whim, saw some JVM parameter on the Internet and thought he had discovered a secret martial-arts manual, so that very day he set that JVM parameter on the online system on his own initiative.

What is this parameter?

Don't worry about that for now; just follow the case study below. For the moment it is enough to know that he set a strange parameter, and then the accident happened.

Most medium and large companies are hooked up to monitoring systems such as Zabbix, Open-Falcon, or something developed in house. These monitoring systems are usually quite good: once your system is connected, you can directly see the CPU, disk, memory, and network load of every machine.

You can also see a line chart of your JVM's memory usage over time, as well as a line chart of how frequently GC occurs in your JVM. And if you report a business metric, it shows up in the monitoring system too.

For example, you can set up an alert that fires if a system's JVM has more than 3 Full GCs within 10 minutes, and have the alert delivered to you by SMS, email, or an IM tool such as DingTalk.

This kind of monitoring system is beyond the scope of this column, so please look it up on your own. In fact, using the command-line tools we have already covered, a few Linux commands are enough to have jstat monitor the JVM automatically and write the results to a file on the machine.

Then the next day you can open that file and see the GC statistics for that machine's JVM.

So a simple command-line tool, even without any visualization, works just as well.
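For example, a minimal sketch (the process id and output path are illustrative):

nohup jstat -gc <pid> 60000 >> /tmp/jvm-gc-stats.log 2>&1 &

This samples the target JVM's GC statistics every 60 seconds and appends each line to a log file that you can review the next day.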

So, the day that engineer set his JVM parameter, it immediately led to frequent Full GC alarms for the online JVM. We were very surprised and started investigating that system.

3. View the GC log

We have already shown how to make the system output GC logs at startup, so as soon as the alarm fired, we logged directly into the online machine and looked at the corresponding GC log.
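On JDK 1.8, the startup flags that enable this output look roughly like the following (the log path is illustrative):

-Xloggc:/path/to/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps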

At this point we see a large number of Full GC records in the GC log.

So what causes Full GC?

The records all carried the same cause: "Metadata GC Threshold". The entries looked like this:

[Full GC (Metadata GC Threshold) XXXXX, XXXXX]

This kind of frequent Full GC is actually caused by the Metadata area (Metaspace) in JDK 1.8 and later, which is similar to the permanent generation we mentioned earlier.

This Metadata area typically holds classes that are loaded into the JVM.

So why does the Metadata area keep filling up and triggering Full GCs? And as we all know, a Full GC drives the CMS collector to reclaim the old generation, and it also collects the Metadata area itself.

Let's start with the following picture:

[Figure: the Metadata area fills up and triggers a Full GC that also collects the Metadata area]


4. Check Metaspace memory usage

Naturally, the next step was to look at the memory usage of the Metaspace area, which you can easily do with jstat. If you have a monitoring system, it will show you a fluctuation graph of Metaspace memory usage, similar to the following.

[Figure: Metaspace memory usage fluctuation graph]


You can see that the Metaspace area's memory usage fluctuates in waves: it keeps increasing until it reaches a peak and fills the Metaspace area, which naturally triggers a Full GC that also collects garbage in the Metaspace area, after which the Metaspace memory footprint becomes small again.
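From the command line, a hedged example (the pid is illustrative; the column names are as printed by JDK 1.8's jstat):

jstat -gc <pid> 1000

In the output, the MC and MU columns show the Metaspace capacity and the Metaspace memory actually used, so you can watch MU climb toward MC before each Full GC.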

5. An overall analysis of the problem

Clearly, new classes keep being loaded into the Metaspace area; the area keeps filling up, and then a Full GC is triggered to reclaim some of the classes in it.

This process then repeats itself: the Metaspace area fills up again and again, each time triggering another Full GC, as shown in the figure below.

[Figure: the Metaspace area repeatedly fills up and triggers Full GC]


6. What classes are being loaded all the time?

So we wondered: what classes keep getting loaded into the JVM's Metaspace area?

At this point, you need to add the following two parameters to the JVM startup parameters:

-XX:+TraceClassLoading -XX:+TraceClassUnloading

These two parameters, as their names suggest, trace class loading and class unloading: the JVM prints log lines telling you which classes it has loaded and which it has unloaded.
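With Tomcat, for instance, they can be appended to the startup options, roughly like this (a sketch; how you pass JVM options depends on your deployment):

JAVA_OPTS="$JAVA_OPTS -XX:+TraceClassLoading -XX:+TraceClassUnloading"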

After adding these two parameters, we could see Tomcat's catalina.out log file outputting a stream of lines like the following:

[Loaded sun.reflect.GeneratedSerializationConstructorAccessor from __JVM_Defined_Class]

As you can clearly see, during operation the JVM kept loading large numbers of these so-called "GeneratedSerializationConstructorAccessor" classes into the Metaspace area, as shown in the figure below.

[Figure: GeneratedSerializationConstructorAccessor classes continuously loaded into the Metaspace area]


By this point it was clear that the JVM was constantly loading these strange classes, filling up the Metaspace area and causing Full GCs to be executed continuously.

So frequent Full GCs are triggered not only by the old generation, but also by the Metadata area being filled with too many classes.

So far, we’re getting closer to the truth.

7. Why are strange classes loaded frequently?

When you run into a problem like this, you should turn to Google or Baidu (Google is recommended, of course). Look up the class that keeps getting loaded: what is it? Is it something you wrote, or a class built into the JDK?

For example, if you search for the class above, you will easily find that it is typically loaded when you use reflection in Java. Reflection code looks something like this:

Method method = XXX.class.getDeclaredMethod(xx, xx);
method.invoke(target, params);

A friendly reminder: reflection is one of the most basic concepts in Java, so if you are not familiar with it, please look it up on your own.

In simple terms, you get a Class from XXX.class, and then get a Method from that Class via getDeclaredMethod.

That Method is a Method object, and method.invoke then calls this method on an object of the class, roughly speaking.

When you execute such reflection code, the JVM dynamically generates some classes after a certain number of reflective calls; these are exactly the puzzling classes we saw earlier.

On subsequent reflective calls, the methods of these generated classes are invoked directly; this is an underlying optimization mechanism of the JVM.
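If you want to observe this yourself, here is a minimal sketch (the class and method names are made up for illustration). Running it on JDK 1.8 with -XX:+TraceClassLoading should, after enough iterations, print lines showing classes like sun.reflect.GeneratedMethodAccessor being loaded:

import java.lang.reflect.Method;

public class ReflectionDemo {

    public static String hello() {
        return "hello";
    }

    public static void main(String[] args) throws Exception {
        Method method = ReflectionDemo.class.getDeclaredMethod("hello");
        // on JDK 1.8 the first ~15 calls (sun.reflect.inflationThreshold) go through
        // a native accessor; after that the JVM generates a bytecode accessor class
        for (int i = 0; i < 100; i++) {
            method.invoke(null);
        }
    }
}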

Reading this, are some of you a little confused?

It doesn't really matter; this detail doesn't affect your JVM optimization work at all.

Just keep one conclusion in mind: if you use a lot of reflection in your code, the JVM will dynamically generate classes and put them into the Metaspace area.

So the weird classes you see above are the result of executing reflection code over and over again, as shown below.

[Figure: repeated reflection calls cause the JVM to generate classes in the Metaspace area]


8. What's wrong with the weird classes created by the JVM?

The next thing to wonder is why the JVM keeps creating weird classes and putting them in Metaspace.

Here we come to a key point: the Class objects of those weird classes created by the JVM are all held by SoftReference, that is, they are softly referenced.

Perhaps you have never heard of Class objects. In simple terms, every class is itself represented by an object, namely a Class object; a Class object represents one class, and from the class it represents you can create many instance objects.

For example, the class Student is a class, and the class itself is represented by an object of type Class.

But if you write Student student = new Student(), that instantiates the Student class, and what you get is an instance object of type Student.
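To make the distinction concrete, a small sketch (Student is a made-up class for illustration):

Class<Student> clazz = Student.class;   // the Class object that represents the Student class
Student student = new Student();        // an instance object of the Student class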

So: the Class objects of the classes that the JVM dynamically generates at runtime are held by soft references.

As we mentioned in a previous article, soft references are normally not collected, but they will be collected when memory is tight.
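As a quick refresher, a minimal sketch of how a soft reference behaves (the payload is arbitrary):

import java.lang.ref.SoftReference;

public class SoftRefDemo {

    public static void main(String[] args) {
        // the byte array is only softly reachable through this reference
        SoftReference<byte[]> ref = new SoftReference<>(new byte[1024 * 1024]);
        // get() keeps returning the array until the GC clears it under memory pressure
        System.out.println(ref.get() == null ? "collected" : "still alive");
    }
}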

So during GC, what formula does the JVM use to decide whether a SoftReference object should be collected?

The formula is as follows:

clock - timestamp <= freespace * SoftRefLRUPolicyMSPerMB

Here, clock - timestamp is how long it has been since the soft reference object was last accessed; freespace is the JVM's free memory space in MB; and SoftRefLRUPolicyMSPerMB is how many milliseconds a SoftReference object is allowed to survive for each MB of free memory. As long as the inequality holds, the object is kept alive; once it has gone unused for longer than freespace * SoftRefLRUPolicyMSPerMB, it becomes eligible for collection.

For example, let's say the JVM has created a bunch of weird classes whose Class objects are held by SoftReference.

Now suppose the JVM has 3000 MB of free memory space and SoftRefLRUPolicyMSPerMB is at its default of 1000 milliseconds. That means the softly referenced Class objects of those weird classes can survive 3000 * 1000 = 3,000,000 milliseconds, i.e. 3000 seconds, which is about 50 minutes.

Of course, these numbers are just an example, but the point is that there is usually plenty of free memory in the JVM during GC, so under normal circumstances soft references are not collected; they would only be collected if memory were nearly exhausted and an OOM were imminent.

So under normal circumstances, the JVM dynamically creates these weird classes as the reflection code executes; their Class objects are softly referenced and are normally not collected, which also means the JVM can keep reusing them, so their number should not grow rapidly.

9. Why does the JVM create more and more weird classes?

So why does the JVM keep creating more and more weird classes?

The reason is simple: the novice engineer from the beginning of this article somehow picked up the SoftRefLRUPolicyMSPerMB JVM startup parameter and set it to 0.

His idea was presumably that setting this parameter to 0 would let any soft reference object be freed as quickly as possible, so that it wouldn't keep occupying memory.

A very naive idea indeed.

In fact, once this parameter is set to 0, the right-hand side of the formula clock - timestamp <= freespace * SoftRefLRUPolicyMSPerMB becomes 0. That causes all soft reference objects, such as those weird Class objects generated by the JVM, to become eligible for collection the moment they are created, so the very next Young GC can reclaim them.

As shown in the figure below.

[Figure: with SoftRefLRUPolicyMSPerMB set to 0, soft-referenced Class objects are collected almost as soon as they are created]


Say the JVM has finally generated 100 of these weird classes for you, and then a GC suddenly reclaims dozens of them, all because someone blindly set the soft reference parameter.

The JVM then has to keep creating these strange classes as the reflection code continues to execute, and under this mechanism the number of such classes keeps growing.

Perhaps the next GC reclaims some of the strange classes, but the JVM soon generates more of them, and eventually the Metaspace area fills up. Once the Metaspace area is full, a Full GC is triggered, many classes are reclaimed, and then the whole cycle starts over, as shown in the figure below.

[Figure: the cycle of class generation, Metaspace filling up, and Full GC]


Many people wonder: if the softly referenced classes are quickly collected because of the bad parameter setting, why does the JVM keep creating more of them?

We don't need to dig into the details here; a large amount of underlying JDK source code is involved, and it is extremely complex. Explaining it thoroughly would take several articles.

You just have to remember this conclusion and understand this truth.

10. How to solve this problem?

Although we haven't analyzed all the implementation details of the underlying JDK, we now have a general idea of what the problem is and why it happens.

The solution is simple. In scenarios with a lot of reflection code, just make sure the parameter

-XX:SoftRefLRUPolicyMSPerMB

is set to a reasonably large value: 1000 ms, 2000 ms, 3000 ms, or 5000 ms all work. Just don't let some novice student set it to 0.
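For example, the relevant part of the startup command might look like this (the other flags and the jar name are illustrative):

java -XX:SoftRefLRUPolicyMSPerMB=2000 -Xloggc:/path/to/gc.log -XX:+PrintGCDetails -jar app.jar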

By increasing this value, the soft-referenced Class objects the JVM creates for reflection can survive long enough to be reused instead of being collected immediately, and the system becomes stable.

Basically, the memory usage of the Metaspace area then stabilizes and no longer fluctuates wildly.

Today’s question

So, given everything we have seen in this case, what was the real nature of this online incident?

It’s not the JVM, it’s the people.

Consider this: if you are an architect at your company, shouldn't you rigorously audit the production-environment JVM parameters of every system?

For example, you could perfectly well establish a set of JVM parameter templates. And if someone wants to do custom JVM tuning, shouldn't they try it out in the test environment first and then have it reviewed by a senior architect like you?

If someone had reviewed that change, there wouldn't have been a disaster like this.