The author | education infrastructure department r&d engineers juei-shen wang supremo source | alibaba cloud native public number

background

After receiving the development feedback from the business department of the company, the UAT (pre-production) environmental interface performance pressure test failed to reach the standard after the application was applied to upgrade the internal framework of the company. Pre-upgrade pressure test report:

Pressure test report after upgrade:

In the same machine configuration (1C4G), throughput decreased from 53.9/s to 6.4/s, and CPU load was high.

In addition, according to the link information queried from SkyWalking, the whole link monitoring system of the company, the development feedback shows that the time of most requests for Feign invocation is not normal (390ms), while the actual response speed of the downstream service called is very fast (3ms).

Location problem

As soon as I received the feedback, I applied for permission for the appropriate machine and uploaded Arthas (Version 3.4.3) to the appropriate machine.

Ask the business side to maintain pressure measurement and begin problem location.

1. Run profiler to analyze CPU performance

[arthas@17962]$ profiler start -d 30 -f /tmp/arthas/1.txt
Copy the code

After 30 seconds, open the 1. TXT file to view the CPU performance analysis result. The following is an example of the beginning:

-- 1630160766 ns (4.24%), 141 samples...... [14] org.springframework.boot.loader.LaunchedURLClassLoader.definePackageIfNecessary [15] org.springframework.boot.loader.LaunchedURLClassLoader.loadClass [16] java.lang.ClassLoader.loadClass [17] java.lang.Class.forName0 [18] java.lang.Class.forName [19] org.springframework.util.ClassUtils.forName [20] org.springframework.http.converter.json.Jackson2ObjectMapperBuilder.registerWellKnownModulesIfAvailable [21] org.springframework.http.converter.json.Jackson2ObjectMapperBuilder.configure [22] org.springframework.http.converter.json.Jackson2ObjectMapperBuilder.build [23] org.springframework.web.servlet.config.annotation.WebMvcConfigurationSupport.addDefaultHttpMessageConverters [24] org.springframework.web.servlet.config.annotation.WebMvcConfigurationSupport.getMessageConverters [25] org.springframework.boot.autoconfigure.http.HttpMessageConverters$1.defaultMessageConverters [26] org.springframework.boot.autoconfigure.http.HttpMessageConverters.getDefaultConverters [27] org.springframework.boot.autoconfigure.http.HttpMessageConverters.<init> [28] org.springframework.boot.autoconfigure.http.HttpMessageConverters.<init> [29] org.springframework.boot.autoconfigure.http.HttpMessageConverters.<init> [30] com.zhangmen.xxx.DefaultFeignConfig.lambda$feignDecoder$0 [31] Com. Zhangmen. XXX. DefaultFeignConfig $$Lambda $704.256909008. GetObject [32] org.springframework.cloud.openfeign.support.SpringDecoder.decode [33] org.springframework.cloud.openfeign.support.ResponseEntityDecoder.decode ......Copy the code

2. Run the trace command to trace the time of each node in the method path

Analyzing the CPU performance analysis results obtained in the previous step, you can see that there are indeed Feign related stack frames among the stacks that occupy the most CPU.

And com.zhangmen related stack frames are found around Feign related stack frames: com.zhangmen.xxx.DefaultFeignConfig$$Lambda 704.256909008. GetObject and com. Zhangmen. XXX. DefaultFeignConfig. Lambda704.256909008. GetObject and Com. Zhangmen. XXX. DefaultFeignConfig. Lambda704.256909008. GetObject and com zhangmen. XXX. DefaultFeignConfig. LambdafeignDecoder $0.

In 1. TXT search com. Zhangmen. XXX. DefaultFeignConfig, found 340 times hit, so think this is a very dubious methods.

Run the trace command to output the time spent on each node on the method path:

[arthas @ 17962] $trace com. Zhangmen. XXX. DefaultFeignConfig * '# cost > 200' - n 3 ` - [603.999323] ms Com. Zhangmen. XXX. DefaultFeignConfig: lambda $feignEncoder $1 () ` - [603.856565] ms org.springframework.boot.autoconfigure.http.HttpMessageConverters:<init>() #42Copy the code

Find org. Springframework. Boot. Autoconfigure. HTTP. HttpMessageConverters: () takes time, and continue to trace track layers:

[arthas@17962]$ trace org.springframework.boot.autoconfigure.http.HttpMessageConverters <init> '#cost>200' -n 3
......
[arthas@17962]$ trace org.springframework.http.converter.json.Jackson2ObjectMapperBuilder registerWellKnownModulesIfAvailable '#cost>200' -n 3
Copy the code

Eventually found org. Springframework. Util. ClassUtils: class.forname () more time-consuming, and throw the exception.

Use the watch command to view specific exceptions:

[arthas@17962]$ watch org.springframework.util.ClassUtils forName -e "throwExp" -n 
Copy the code

To solve the problem

Report the problems identified to the relevant business development and recommend the introduction of jackson-datatype-joda dependencies.

Introduction of dependent post pressure measurement reports:

Throughput increased from 6.4/s to 69.3/s, higher than 53.9/s before the framework upgrade.

According to the business development feedback, the problem was caused by the Feign codec (as shown below) that was customized in the code, and that the codec was there before the framework was upgraded.

So, squashed the code before upgrading the framework and used Arthas to execute the following command during the squashed process:

I found the same anomaly. The jackson-datatype-joda dependency is introduced, and the pressure measurement report is as follows:

Summary of the previous pressure test results:

A new problem can be found: why does the old version and the new version introduce no dependencies at the same time, resulting in a nearly 8-fold throughput difference, while the old version and the new version introduce dependencies at the same time, resulting in a nearly 1-fold throughput difference?

Further locate the problem

Based on the new issues identified in the previous step, the unupgraded framework with introduced dependencies and the upgraded framework with introduced dependencies were then crimped, and Arthas profiler was used to sample CPU performance analysis data during the crimping process to obtain samples 1 and 2. And find similar stacks from sample 1 and sample 2 for comparison:

By comparison, the first 17 rows of the similarity stack for the two samples are different. And trace suspicious stack frames in sample 2:

[arthas@10561]$ trace org.apache.catalina.loader.WebappClassLoaderBase$CombinedEnumeration * '#cost>100' -n 3 ` - [171.744137] ms org. Apache. Catalina. Loader. WebappClassLoaderBase $CombinedEnumeration: hasMoreElements () ` - [171.736943] ms org. Apache. Catalina. Loader. ` WebappClassLoaderBase $CombinedEnumeration: inc () # 2685 - [171.724546] ms org.apache.catalina.loader.WebappClassLoaderBase$CombinedEnumeration:inc()Copy the code

After discovering the upgrade framework, there was a time consuming situation in the classloader section.

However, no trace tracing of this part in sample 1 takes more than 100ms.

The profiler command was further used to generate flame maps of the two versions respectively in the pressure measurement scenario and find similar stacks for comparison:

[arthas@10561]$ profiler start -d 30 -f /tmp/arthas/1.svg
Copy the code

Found version upgrade framework and introduce to rely on More than a few org/springframework/boot loader/stack.

Further problem solving

Feedback new findings to relevant business development.

They report that in addition to the framework upgrade, there are adjustments to the Spring Boot War to JAR deployment. Switch from a standalone Tomcat WAR deployment to a Tomcat Java-JAR deployment embedded in Spring Boot. Therefore, it is suspected that there are performance differences between the two deployment modes in class loaders. Related business development during the period of I step on the positioning problem, based on my initial orientation to the problem, in the Google search feign com. Fasterxml. Jackson. Datatype. Joda. JodaModule, I found a related article “loadClass causes Online Service Lag analysis”.

The author of this article has a similar problem to ours.

After reading this article, I also Debug part of the source code, and finally understand the root cause of the problem is: SpringEncoder/SpringDecoder calls ObjectFactory.getobject ()).getConverters() to get HttpMessageConverters each time it encodes/decodes. The implementation of our custom ObjectFactory configuration in DefaultFeignConfig is to new a new HttpMessageConverters object each time.

By default, the constructor of HttpMessageConverters executes the getDefaultConverters method to get the default Collection of HttpMessageConverters and initializes those default HttpMessageConverters. The MappingJackson2HttpMessageConverter (there are two, See image below) each initialization time loading com. Not in the classpath fasterxml. Jackson. The datatype. Joda. JodaModule and Com. Fasterxml. Jackson. Datatype. Joda $JodaModule (org. Springframework. Util. ClassUtils load less than class, will try to reload the inner class), ClassNotFoundException is thrown and the exception is swallowed alive.

And some related to XML default HttpMessageConverter, SourceHttpMessageConverter and Jaxb2RootElementHttpMessageConverter (see below) each have two, every time initialization, Performs TransformerFactory. NewInstance (), the execution will use SPI in the process of scanning the meta-inf/services directory under the classpath to obtain specific implementation, and each finished scanning also have no access to the specific implementation, Finally use the default specified com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl as implementation.

As a result, every Feign call (encoding and decoding), Will be loaded for four com. Not in the classpath fasterxml. Jackson. Datatype. Joda. JodaModule and Com. Fasterxml. Jackson. Datatype. Joda $JodaModule (8), and 8 times using SPI scanning the meta-inf/services directory under the classpath ascertained implementation, However, after war to JAR, the performance of class loader in frequently searching and loading resources deteriorates, which ultimately seriously affects interface performance.

Default HttpMessageConverter collection:

Some key codes are as follows.

Org/springframework/boot/autoconfigure/HTTP/HttpMessageConverters. :

Org/springframework/HTTP/converter/json/Jackson2ObjectMapperBuilder registerWellKnownModulesIfAvailable:

Org/springframework/util. ClassUtils. Class.forname:

Org/springframework/HTTP/converter/XML/SourceHttpMessageConverter:

Javax.mail/XML/transform/FactoryFinder. Find:

The article also offers two solutions to this problem:

The first approach, which I originally suggested introducing jackson-datatyp-joda dependencies, To avoid every time initialize the default MappingJackson2HttpMessageConverter this repeated loading is not in the classpath Com. Fasterxml. Jackson. Datatype. Joda. JodaModule and com fasterxml. Jackson. Datatype. $JodaModule joda.

The second method does not initialize the default HttpMessageConverter. Because we only need to use custom FastJsonHttpMessageConverter here to perform the codec, can completely avoid the execution getDefaultConverters method, Repeatedly initialize many unused default HttpMessageConverter. So in the new HttpMessageConverters object, you can set the addDefaultConverters parameter to false.

ObjectFactory<HttpMessageConverters> objectFactory = () -> new HttpMessageConverters(false, new HttpMessageConverter[] { (HttpMessageConverter)fastJsonHttpMessageConverter });
Copy the code

In fact, we can also modify the ObjectFactory implementation in DefaultFeignConfig to avoid creating a new HttpMessageConverters object every time (repeatedly initializing HttpMessageConverters), Achieve further optimization. Therefore, it is recommended to modify DefaultFeignConfig into the following code for relevant business development:

After the relevant business development improved DefaultFeignConfig in both the old and new versions of the code and deployed it to the FAT (test) environment, I conducted a simulated pressure test on the FAT environment with JMeter on my own computer.

Pressure test results after improvement of the old version:

New improved pressure test results:

The interface performance of the two versions is basically the same.

Finally, the tester pressed the code after updating the framework and improving DefaultFeignConfig in the UAT environment again, and the results were as follows:

Throughput increased to 160.4/s from an initial sub-par 6.4/s.

So why do tweaks to war to JAR deployments cause class loaders to perform worse when they frequently find and load resources?

After doing some understanding of the principle of SpringBoot executable JAR. In order to be able to Boot with a fat JAR, Spring Boot extends the JDK’s JarFile URL protocol, and customizes its own ClassLoader and JAR file protocol Hander. Jar in jar, jar in directory.

For those interested in the principle of SpringBoot executable JARS, please refer to the Executable JAR Package.

The root cause of War2Jar class loader performance degradation

To verify my guess, I built a simple Demo on my own computer.

There are two services in the Demo, A and B. Both A and B are registered in the Eureka registry, and A calls B via Feign.

The various scenarios were then crimped using Jmeter in the same configuration and Arthas profiler command was used to generate fire maps for each scenario during the crimping process.

The pressure test results are as follows: -xMS512m -XMX512m

By comparing Table 3 and Table 4, it can be seen that the introduction of dependencies after code optimization has almost no effect on throughput.

It can be seen from Table 3 and Table 4 that after code optimization, the throughput of the three deployment modes is basically the same when non-existent resources are not frequently searched and loaded.

As you can see from Table 2, Tomcat WAR deployments perform better when SPI is frequently used to get implementations that cannot be found under clasSPath.

It can be seen from Table 1 that when non-existent classes are frequently loaded, the performance of JarLauncher is better after the JAR package is unpacked.

Compare the flame diagram of similar stacks ③ and ② in Table 1:

Can be found in both org/springframework/boot/loader/LaunchedURLClassLoader loadClass loading class, there are differences.

(2) not only can implement Java/lang/this loadClass, Will also implement org/springframework/boot/loader/LaunchedURLClassLoader definePackageIfNecessary.

Check the org/springframework/boot/loader/LaunchedURLClassLoader loadClass source code:

A conditional branch is found.

Check the org/springframework/boot loader/Launcher. CreateArchive source code:

Find that the value of this condition depends on whether the application is an executable JAR file or a file directory.

(2) again for pressure measurement and the trace org/springframework/boot/loader/LaunchedURLClassLoader definePackageIfNecessary:

` - [165.770773] ms org. Springframework.. The boot loader. LaunchedURLClassLoader: definePackageIfNecessary (+) - [0.00347] ms . Org. Springframework. The boot loader. LaunchedURLClassLoader: getPackage () # 197 ` - [165.761244] ms org.springframework.boot.loader.LaunchedURLClassLoader:definePackage() #199Copy the code

It turns out that this place does have some time consuming situations.

DefinePackageIfNecessary is mainly to try to define the package of a class by its name before calling findClass. Make sure that the MANIFEST in the JAR file nested jar package is associated with the package.

Debug definePackageIfNecessary: If definePackage is defined, all jar packages under boot-INF /lib/ and boot-INF /classes/ are iterated. If the specified class is found in these resources, the definePackage method is continued, otherwise null is returned.

As I said, Every Feign call four loading com. Not in the classpath fasterxml. Jackson. The datatype. Joda. JodaModule and Com. Fasterxml. Jackson. Datatype. Joda $JodaModule (8). My simple Demo application relies on 117 jars (actual enterprise projects will have more). For each Feign call, 8 * (117 + 1) is executed, totaling 944 loops of logic. . The logic of the org. Springframework. The boot loader. The jar. The Handler. The openConnection method in the process of execution will involve more time-consuming IO operations, eventually lead to the serious influence the interface performance. You can also see this part of the processing logic in the generated flame diagram.

At this point, we can confirm that adjustments to the WAR to JAR deployment are at the root of the class loader’s performance degradation when it frequently finds and loads resources: In order to enable Spring Boot to start with a FAT JAR, some customized processing logic has been added, and this part of customized processing logic will affect the performance of the program when it is frequently executed.

Why does unpacking the JAR and launching it through JarLauncher perform better than Deploying Tomcat WAR when frequently loading classes that don’t exist? [When SPI is frequently used to get implementations that can’t be found in your classpath, Tomcat WAR deployment performs better than unpacking the JAR and launching it through JarLauncher?] Due to space limitations, I will not continue to expand in this article. Interested friends can in accordance with the method introduced in this article, and then combined with the relevant source code to further explore.

conclusion

When customizing Feign codecs and using SpringEncoder/SpringDecoder, avoid repeated initialization of HttpMessageConverters. If you do not need those default HttpMessageConverters, you can initialize HttpMessageConverters by setting the first input parameter to false.

In addition, you should be aware that different deployment methods can have performance differences when class loaders frequently find and load resources.

When writing code, we should also avoid repeated initialization, and repeatedly finding and loading resources that do not exist.

Finally, using SkyWalking and Arthas will help us identify programming errors and performance bottlenecks more efficiently.

eggs

If your application uses SkyWalking Agent and Arthas then you may encounter problems with some Arthas commands (trace, watch, etc., which enhance classes) not working properly.

Solution: github.com/apache/skyw…

Once Arthas is working properly, it is a good idea to add an * to the method name for fuzzy matching when executing commands like trace on SkyWalking Agent’s enhanced class methods. Arthas will eventually display a summary of the trace tracing results of all the matching methods.

Method name trace without * :

The method name is traced by * :

You can see that after the method name is added with *, the result traced is the desired result.

This is due to SkyWalking Agent using ByteBuddy for bytecode enhancement. While ByteBuddy every increase a method, for the method to generate a secondary inner class (HelloController auxiliaryauxiliaryauxiliaryjiu2bTqU), And for the current class (HelloController) in the original method (test1) rename (test1 originaloriginaloriginallyu0XDob), And generate a with the original method with the same name (test1) and a different name but only for the method called secondary inner class (test1 originaloriginaloriginallyu0XDob accessoraccessoraccessor8F82ImAF).

Using a Java decompiler developed by a colleague, you can see the code visually:

Also, when using Arthas, it is recommended to choose the latest version. For example, large methods traced in versions prior to 3.4.2 May result in the JVM Metaspace OOM. See: Remembering a Metaspace OOM Problem caused by Arthas.

If you want to build an enterprise-level online diagnostic platform based on Arthas, check out icbc’s Exploration and Practice in Building an Online Diagnostic Platform.

Author’s brief introduction

Wang Ruixian, an open source enthusiast, is currently a R&D engineer in the department of Education Infrastructure, mainly responsible for the development of the company’s full-link monitoring system and application diagnosis platform.

Arthas 57 hands-on experiments at start.aliyun.com:Start.aliyun.com/handson-lab…

Arthas Experimental preview

In an effort to get more developers to start using Arthas as a Java diagnostic tool, the Arthas community has teamed up with JetBrains to launch an Arthas essay campaign: ** Talk about your relationship with Arthas over the years. ** activity is still in progress, click to participate, welcome everyone to contribute, participation is likely to win!

Spring Boot Microservice performance down 90%! Use Arthas to locate the root cause