A note on originality: all of my articles are original work. Any reference to another article will be explicitly cited; if I have missed one, I welcome your corrections. If you find this article plagiarized elsewhere, please report it and submit an issue to the GitHub repository. Thank you for your support!

This article is the second installment of my “Why do I suggest” series. In this series, I explain and analyze the reasoning behind some of the development suggestions and coding conventions our team follows for backend development under high-concurrency scenarios. I believe it can help you avoid some pitfalls when facing high-concurrency business. Previously in this series:

  • Why do I recommend adding Force Index to all queries on complex but performance-critical tables

When the service first went live, our online log level was INFO, and the code location was included in the log output, in a format like this:

2022-03-02 19:57:59.425  INFO [service-apiGateway,,] [35800] [main][org.springframework.cloud.gateway.route.RouteDefinitionRouteLocator:88]: Loaded RoutePredicateFactory [Query]

We use Log4j2 asynchronous logging, and the Disruptor’s WaitStrategy is set to Sleep, which keeps CPU usage well balanced; that is, we configured AsyncLoggerConfig.WaitStrategy=Sleep. As our business grew, we found that some instances frequently showed very high CPU usage (especially those producing a large volume of logs in a short period of time), so we dumped JFR recordings to investigate further:
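
For reference, here is one way this property can be set, a minimal sketch assuming the legacy property name mentioned above and that it runs before Log4j2 initializes; in practice you would more likely put it in log4j2.component.properties or pass it as a -D JVM argument:

// Hypothetical bootstrap snippet: set the Disruptor WaitStrategy for Log4j2's
// async logger config before any logger is created. "Sleep" trades a little
// latency for much lower CPU usage when the ring buffer is idle.
public class LoggingBootstrap {

    public static void main(String[] args) {
        // Must run before the first call to LogManager.getLogger(...)
        System.setProperty("AsyncLoggerConfig.WaitStrategy", "Sleep");
        // ... start the application here ...
    }
}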

First, let’s look at GC. Our GC algorithm is G1, which we examine mainly through the G1 Garbage Collection event:

We found that all collections were Young GCs, with fairly normal pause times and no obvious anomaly in frequency.

Next, CPU usage. We checked the CPU usage of each thread through the Thread CPU Load event, and found that the reactor-http-epoll threads had high CPU usage, adding up to nearly 100% in total.

These are the reactor-netty threads that handle business logic. Looking at other instances, this CPU load is normally nowhere near as high. So why was the load so high here? We took a Thread Dump to look at the thread stacks.

Looking at the stack dumps of several of these threads, we found that they were basically in the Runnable state, executing native methods related to StackWalker, for example (this also matches the Method Runnable events with the highest share collected by JFR, which basically confirms that these threads’ CPU is being consumed mainly by the methods their current stacks point to):

It has to do with these two native methods:

  • java.lang.StackStreamFactory$AbstractStackWalker.callStackWalk
  • java.lang.StackStreamFactory$AbstractStackWalker.fetchStackFrames

Also note that thread stacks in microservices can be very deep (around 150 frames), especially for reactive code (up to 300 frames). This is mainly because Servlets and Filters are designed around the chain-of-responsibility pattern, so each filter keeps adding frames to the stack; reactive code makes it even worse, with layer upon layer of wrapped operators and observation points. The stack listed above is a reactive stack.

So why do these threads end up in those two native methods? What this code is actually doing is finding, so it can be printed in the log, the location of the code that called the logging method: the class name, method name, line number and so on. In the thread stack example above, the code location of the logging call is this line: at com.xxx.apigateway.filter.AccessCheckFilter.filter(AccessCheckFilter.java:144), where we called info() to output some log.

As you can see, Log4j2 obtains the location of the code that called the logging method by capturing the current thread’s stack. And it is not the top of the stack that gives the caller’s location: the caller is the first stack element that appears after the Log4j2 stack elements.
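
The idea can be illustrated with a simplified sketch (my own illustration, not Log4j2’s actual source): scan the captured stack from the top, skip the logger’s own frames, and return the first frame that belongs to someone else.

// Simplified sketch of caller-location lookup: not Log4j2's real implementation.
public final class CallerLocator {

    // loggerFqcn is the fully qualified class name of the logger facade
    public static StackTraceElement calcLocation(String loggerFqcn) {
        StackTraceElement[] stack = new Exception().getStackTrace();
        boolean loggerFrameSeen = false;
        for (StackTraceElement frame : stack) {
            boolean isLoggerFrame = loggerFqcn.equals(frame.getClassName());
            if (isLoggerFrame) {
                loggerFrameSeen = true;
            } else if (loggerFrameSeen) {
                // First frame after the logger's own frames: the calling code
                return frame;
            }
        }
        return null;
    }
}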

How does Log4j2 get the stack?

Before Java 9, the stack of the current thread (we only need the current thread’s stack, not another thread’s) could be obtained like this:
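
A minimal sketch of the two classic ways (the surrounding class is purely illustrative):

public class StackBeforeJava9 {

    public static void main(String[] args) {
        // Option 1: via the current Thread
        StackTraceElement[] viaThread = Thread.currentThread().getStackTrace();
        // Option 2: via an Exception that is created but never thrown
        StackTraceElement[] viaException = new Exception().getStackTrace();

        System.out.println(viaThread.length + " frames via Thread");
        System.out.println(viaException.length + " frames via Exception");
    }
}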

Among them, Thread.currentThread().getStackTrace() is actually new Exception().getStackTrace() under the hood, so the two are essentially the same thing.

Java 9 added the new StackWalker API, which, combined with the Stream API, lets you read the stack more elegantly:
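
A small sketch of reading the current thread’s stack with StackWalker (the class here is illustrative, not taken from the article):

import java.util.List;
import java.util.stream.Collectors;

public class StackWalkerExample {

    public static void main(String[] args) {
        StackWalker walker = StackWalker.getInstance();
        // Frames are materialized lazily: only what the stream actually consumes
        List<String> frames = walker.walk(stream -> stream
                .map(frame -> frame.getClassName() + "." + frame.getMethodName()
                        + ":" + frame.getLineNumber())
                .collect(Collectors.toList()));
        frames.forEach(System.out::println);
    }
}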

So how does new Exception().getStackTrace() get the stack under the hood? The underlying JVM source is:

javaClasses.cpp

Then there is StackWalker, whose core underlying source code is:

As you can see, both cores fill in stack details; the difference is that one fills in everything directly, while the other reduces how much stack information it fills in. Filling in the stack means accessing the SymbolTable, the StringTable and so on, because what we want are the class name and method name as strings, not the addresses of the class and the method, nor the addresses of the class name and the method name. It should then be obvious that fetching the stack through an Exception requires more visits to the SymbolTable and StringTable than StackWalker does, simply because more frames need to be filled in.

Let’s verify this by simulating, at different stack depths, how much performance degradation fetching the stack adds on top of the original code.

Simulating the performance gap between the two ways of getting the caller’s code location, and not getting it at all

The following code is based on the unit test and benchmark code in the official Log4j2 project. First, some code that simulates a call stack of a given depth:
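
Here is a minimal sketch of that idea, my own simplified version rather than the exact Log4j2 test code:

import java.util.function.Supplier;

// Helper that recursively deepens the current call stack to the given depth,
// then runs the measured action at the bottom, so any stack fetch performed
// inside the action sees a deep stack.
public class StackDriver {

    public static <T> T deepCall(int depth, Supplier<T> action) {
        if (depth <= 0) {
            return action.get();
        }
        return deepCall(depth - 1, action);
    }
}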

Then we write the benchmark code to see how much slower things get when, on top of simply executing the code, we also fetch the stack.
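
Below is a hedged JMH sketch of such a benchmark (class and method names are my own, and it reuses the StackDriver helper above; the article’s original benchmark differs in detail). It compares simply running an action at a given depth against additionally fetching the full stack via an Exception, or walking it lazily via StackWalker.

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class CodeLocationBenchmark {

    // Simulated stack depth: roughly plain code, a servlet microservice, a reactive stack
    @Param({"10", "150", "300"})
    private int depth;

    private static final StackWalker WALKER = StackWalker.getInstance();

    // Just run the action at the given depth, with no stack fetch at all
    @Benchmark
    public void baseline(Blackhole bh) {
        StackDriver.deepCall(depth, () -> {
            bh.consume(System.nanoTime());
            return null;
        });
    }

    // Fetch and materialize the full stack the pre-Java 9 way
    @Benchmark
    public void throwableStack(Blackhole bh) {
        StackDriver.deepCall(depth, () -> {
            bh.consume(new Exception().getStackTrace());
            return null;
        });
    }

    // Walk lazily and stop at the first frame, the way a logger stops at the caller,
    // so far fewer frames get materialized
    @Benchmark
    public void stackWalker(Blackhole bh) {
        StackDriver.deepCall(depth, () -> {
            bh.consume(WALKER.walk(frames -> frames.findFirst().orElse(null)));
            return null;
        });
    }
}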

Run the benchmark and view the results:

As you can see from the results, fetching the code execution location, i.e. fetching the stack, causes a significant performance loss. Moreover, this loss is tied to stack filling: the more frames that have to be filled in, the greater the loss. You can also see that StackWalker performs better than fetching the stack through an Exception, and the gap widens as the stack depth increases.

Why is it slow? Testing the performance degradation caused by String::intern

This performance degradation, as seen from the earlier look at the underlying JVM source, comes from accessing the StringTable and SymbolTable. Let’s simulate that access. At the Java level, the StringTable is accessed through the String::intern method; the test code is as follows:
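
A hedged JMH sketch of such a test follows; I reconstructed it to match the benchmark names referenced in the results below, so the exact original code may differ. baseline only consumes the raw timestamp, toString builds a string from it, and intern/intern3 additionally push one or three strings through the StringTable.

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class StackWalkBenchmark {

    // Consume the raw long: no string, no StringTable access
    @Benchmark
    public void baseline(Blackhole bh) {
        long time = System.currentTimeMillis();
        bh.consume(time);
    }

    // Build the string but do not intern it
    @Benchmark
    public void toString(Blackhole bh) {
        long time = System.currentTimeMillis();
        bh.consume(Long.toString(time));
    }

    // One StringTable lookup/insert per invocation
    @Benchmark
    public void intern(Blackhole bh) {
        long time = System.currentTimeMillis();
        bh.consume(Long.toString(time).intern());
    }

    // Three StringTable lookups/inserts per invocation
    @Benchmark
    public void intern3(Blackhole bh) {
        long time = System.currentTimeMillis();
        bh.consume(Long.toString(time).intern());
        bh.consume(Long.toString(time + 1).intern());
        bh.consume(Long.toString(time + 2).intern());
    }
}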

Test results:

Comparing the results of StackWalkBenchmark.baseline and StackWalkBenchmark.toString, we can see that bh.consume(time) itself carries no performance penalty. But comparing both of them against StackWalkBenchmark.intern and StackWalkBenchmark.intern3, we find that the performance degradation is severe, and the more such accesses there are, the worse it gets (analogous to fetching deeper stacks in the earlier test).

Conclusions and Recommendations

Therefore, we can draw the following intuitive conclusions:

  1. Outputting the code location (line number, class, method) in logs requires fetching the stack: via an Exception before Java 9, and via StackWalker from Java 9 onward.
  2. Both approaches need to access the SymbolTable as well as the StringTable; StackWalker reduces those accesses by reducing the number of frames it has to fill in.
  3. Either way, the performance degradation is significant.

Therefore, my suggestion is: in microservice environments, and especially in reactive microservice environments where stack depths are very large, if you output a large volume of logs, the logs should not include the code location, otherwise the performance degradation is severe (for Log4j2 this means removing location-related patterns such as %C/%M/%L/%l from the layout and leaving includeLocation off for async loggers).

After we turned off code-location output, CPU usage was nowhere near as high under the same load, and overall throughput improved significantly.
