Summary: Building on the long list of metrics introduced earlier, Qi Guang shares some common tuning analysis approaches: how to single out the most important of many abnormal performance indicators, locate the performance bottleneck, and then carry out the tuning. The article is organized along the code, CPU, memory, network, and disk dimensions, and for each optimization point it summarizes a reusable "routine" so the ideas can be carried over to other scenarios.
1. Code-related issues
When you encounter a performance problem, the first step is to check whether it is related to the business code — not by reading all the code to solve the problem directly, but by using logs and the code to rule out low-level mistakes in the business logic. The best place for performance optimization is inside the application itself.
For example, check whether the service logs report a large number of errors. Most performance problems at the application and framework layers can be found in the logs (an improperly set log level can also lead to runaway online logging). Beyond that, review the main code paths for common problems such as misuse of for loops, NPEs, regular expressions, and mathematical calculations — many of these can be fixed with a simple code change.
Don’t immediately equate performance tuning with caching, asynchrony, JVM tuning, and so on. Complex problems may have simple solutions, and the 80/20 rule still holds for performance tuning. Of course, knowing some common code pitfalls speeds up problem analysis, and the bottleneck-optimization ideas derived from CPU, memory, and JVM analysis may themselves point back to the code.
Here are some high-frequency coding points that can cause performance problems.
1) Regular expressions are CPU-intensive (greedy matching may cause backtracking), so be careful with string split(), replaceAll(), and similar methods. Regular expressions should be precompiled (see the precompilation sketch after this list).
2) String.intern() on older JDKs (Java 1.6 and earlier) may cause the method area (permanent generation) to overflow. On those JDKs, if the string pool is sized too small and too many strings are cached, there is also a significant performance overhead.
3) When logging an exception, if the stack information is already clear, consider skipping the detailed stack output, since constructing an exception stack is costly (see the stackless-exception sketch after this list). Note: when the same exception is thrown repeatedly at the same location, the JIT optimizes it into a pre-allocated, type-matched exception, and the stack trace will no longer be visible.
4) Avoid unnecessary boxing and unboxing between reference types and primitive types, and try to keep types consistent. Frequent autoboxing seriously hurts performance.
5) Choose the Stream API appropriately. For complex and parallelizable operations, the Stream API is recommended: it simplifies code while taking advantage of multiple CPU cores. For simple operations, or on single-CPU machines, explicit iteration is recommended.
6) Create thread pools manually with ThreadPoolExecutor, specifying the number of threads and the queue size according to the workload, to avoid the risk of resource exhaustion. Uniform thread naming also makes later troubleshooting easier (see the pool sketch after this list).
7) Select concurrent containers based on the business scenario. For example, if data consistency must be guaranteed when choosing a Map, use Hashtable or a Map plus a lock. When reads far outnumber writes, use CopyOnWriteArrayList. Use ConcurrentHashMap when the amount of data is small, strong consistency is not required, and changes are infrequent; use ConcurrentSkipListMap when the amount of data is large, reads and writes are frequent, and strong consistency is not required.
8) Lock optimization ideas include reducing lock granularity, coarsening locks around loops, and shortening lock hold times (for example by choosing read-write locks). Also consider the concurrency classes the JDK has already optimized, such as LongAdder instead of AtomicLong for counters in statistical scenarios with relaxed consistency requirements, or ThreadLocalRandom instead of Random (see the counter sketch after this list).
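As a minimal sketch of point 1) (class and pattern names are mine, not from the original article), precompiling a Pattern once avoids the implicit recompilation that String.replaceAll(), and String.split() with non-trivial patterns, perform on every call:

```java
import java.util.regex.Pattern;

public class RegexPrecompile {
    // Compile once and reuse across calls.
    private static final Pattern COMMA = Pattern.compile(",");
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    public static void main(String[] args) {
        String line = "a,1,b,22,c,333";
        String[] parts = COMMA.split(line);                    // instead of line.split(",")
        String masked = DIGITS.matcher(line).replaceAll("*");  // instead of line.replaceAll("\\d+", "*")
        System.out.println(parts.length + " " + masked);
    }
}
```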
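The stack-construction cost mentioned in point 3) can be avoided for "expected" exceptions on hot paths by overriding fillInStackTrace(); the class below is a hypothetical illustration, not something from the original text. (The JIT behavior described in point 3) corresponds to the OmitStackTraceInFastThrow optimization, which can be disabled with -XX:-OmitStackTraceInFastThrow when full traces are needed.)

```java
// Hypothetical "marker" exception for hot paths where the stack trace adds no value.
public class FastBusinessException extends RuntimeException {
    public FastBusinessException(String message) {
        super(message);
    }

    @Override
    public synchronized Throwable fillInStackTrace() {
        return this; // skip the (relatively expensive) stack capture
    }
}
```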
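A possible shape for the manually created pool in point 6); the pool sizes, queue capacity, and thread-name prefix are placeholders to be adapted to the actual workload:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class OrderThreadPool {
    private static final AtomicInteger SEQ = new AtomicInteger();

    public static ThreadPoolExecutor create() {
        // Named threads make jstack output and logs much easier to read.
        ThreadFactory factory = r -> new Thread(r, "order-worker-" + SEQ.incrementAndGet());
        return new ThreadPoolExecutor(
                4,                                          // core pool size
                8,                                          // maximum pool size
                60L, TimeUnit.SECONDS,                      // idle timeout for non-core threads
                new ArrayBlockingQueue<>(200),              // bounded queue avoids unbounded memory growth
                factory,
                new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure instead of silent drops
    }
}
```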
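A small sketch of point 8), using LongAdder for a low-consistency counter and ThreadLocalRandom instead of a shared Random; the class and method names are illustrative only:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;

public class StatsCounter {
    // LongAdder spreads contention across internal cells; sum() is only eventually
    // consistent, which is fine for metrics but not for exact accounting.
    private final LongAdder requests = new LongAdder();

    public void onRequest() {
        requests.increment();
    }

    public long approximateCount() {
        return requests.sum();
    }

    public static int sampleJitterMillis() {
        // ThreadLocalRandom avoids the shared-seed contention of a global Random instance.
        return ThreadLocalRandom.current().nextInt(0, 50);
    }
}
```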
Beyond these code-level optimizations there are many more that cannot all be listed here. Some common optimization ideas can be distilled from these points, for example:
Trade space for time: use memory or disk in exchange for the more valuable CPU or network, as with caching. Trade time for space: sacrifice some CPU to save memory or network resources, for example by splitting one large network transfer into several smaller ones. Other techniques include parallelization, asynchrony, pooling, and so on.
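As a deliberately minimal illustration of the "space for time" idea, the hypothetical class below memoizes the results of an expensive lookup in memory; a production cache would also bound its size and apply a TTL:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class MemoizingLookup<K, V> {
    // Trades memory for CPU/network: results of an expensive lookup are kept in a map
    // so repeated keys skip the computation. No eviction here; real caches need one.
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    public MemoizingLookup(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }
}
```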
2. CPU
As mentioned earlier, we should pay more attention to CPU load; high CPU utilization alone is generally not a problem. CPU load is the key basis for judging the health of the system's compute resources.
2.1 High CPU usage && High load average
This is common in CPU-intensive applications, where a large number of threads are runnable and I/O is light. Common application scenarios that consume CPU resources include:
Regular expression operations; mathematical calculations; serialization/deserialization; reflection; infinite or unreasonably large loops; defects in base libraries or third-party components.
The common way to investigate high CPU usage is as follows: use jstack to print the thread stack multiple times (more than five times is recommended) and locate the thread stacks that consume the most CPU. Alternatively, use profiling (event-based sampling or instrumentation) to build an on-CPU flame graph over a period of time and quickly locate the problem.
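Besides jstack and flame graphs, per-thread CPU time can also be sampled in-process through the standard management API; the sketch below simply prints the five threads with the highest accumulated CPU time (class name and output format are mine):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Comparator;
import java.util.stream.LongStream;

public class TopCpuThreads {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            System.out.println("Thread CPU time not supported on this JVM");
            return;
        }
        // Sort live threads by accumulated CPU time (descending) and print the top 5.
        LongStream.of(mx.getAllThreadIds())
                .boxed()
                .sorted(Comparator.comparingLong(mx::getThreadCpuTime).reversed())
                .limit(5)
                .forEach(id -> {
                    ThreadInfo info = mx.getThreadInfo(id);
                    if (info != null) {
                        System.out.printf("%-40s cpu=%d ms%n",
                                info.getThreadName(), mx.getThreadCpuTime(id) / 1_000_000);
                    }
                });
    }
}
```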
It is also possible that the application performs frequent GC (young GC, old GC, full GC), which also raises CPU utilization and load. Run jstat -gcutil to continuously print the GC counts and times of the current application. Frequent GC increases the load and is usually accompanied by a shortage of available memory; use commands such as free or top to check the available memory on the machine.
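As a complement to jstat -gcutil, cumulative GC counts and times can also be read from inside the JVM via GarbageCollectorMXBean; a minimal sketch:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStatsPrinter {
    public static void main(String[] args) {
        // Each collector (e.g. young- and old-generation) reports its cumulative count and time.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%-25s collections=%d time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```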
Could a CPU performance bottleneck itself be the cause of high CPU utilization? Possibly. You can view detailed CPU utilization with vmstat. A high user-mode share (us) indicates that user-mode processes are consuming a lot of CPU; if it stays above 50% for a long time, look into the application's own performance. A high kernel-mode share (sy) indicates that the kernel is consuming a lot of CPU, so check the performance of kernel threads and system calls. If us + sy stays above 80%, the CPU may simply be insufficient.
2.2 Low CPU usage && High load average
If CPU utilization is not high, the application is not busy computing but doing something else. Low CPU utilization with a high load average is typical of I/O-intensive processes. This is easy to understand: the load average counts both processes in the R (runnable) state and processes in the D (uninterruptible sleep) state. If the former are few, the load must come from D-state processes, i.e. processes waiting on disk I/O, network I/O, and so on.
Investigation && verification ideas: check the %wa (iowait) column, which indicates the percentage of CPU time spent waiting for disk I/O. If it exceeds 30%, disk I/O wait is serious. This may be caused by heavy random disk access or direct disk access (bypassing the system cache), or the disk itself may be the bottleneck. Verify it against the output of iostat or dstat: for example, if %wa (iowait) rises while the volume of disk read requests is large, the disk reads may be the problem.
In addition, long-running network requests (i.e. network I/O), such as slow MySQL queries or fetching data through RPC interfaces, also increase the CPU load average. Troubleshooting this usually requires a comprehensive analysis of the application's upstream and downstream dependencies together with the trace logs from middleware instrumentation.
2.3 High CPU context switch count
Use vmstat to check the number of context switches on the system, and pidstat to check the voluntary (cswch) and involuntary (nvcswch) context switches of the process. Voluntary context switches are caused by thread state transitions inside the application, such as calling sleep(), join(), wait(), or using Lock or synchronized; involuntary context switches are caused by threads exhausting their time slice or being preempted by the scheduler for higher-priority work.
If the number of voluntary context switches is high, it means the CPU is waiting to acquire resources, for example because system resources such as I/O or memory are insufficient. If the number of involuntary context switches is high, the likely cause is too many threads in the application, leading to fierce competition for CPU time slices and frequent forced scheduling by the system; the thread count and thread state distribution can serve as supporting evidence.
3. Memory
As mentioned above, memory is divided into system memory and process memory (including Java application processes). Most of the memory problems we encounter are in process memory; bottlenecks caused by system resources are relatively rare. For Java processes, the built-in memory management automatically solves two problems: how to allocate memory to objects and how to reclaim the memory allocated to them, with the garbage collection mechanism at its core.
Although garbage collection can effectively prevent memory leaks and make good use of memory, it is not a panacea: improper parameter configuration and flawed code logic can still cause a series of memory problems. In addition, the early garbage collectors were limited in functionality and collection efficiency, and the large number of GC parameters relied heavily on the developer's tuning experience. For example, an improper maximum heap setting may cause problems such as heap overflow or heap thrashing.
Let’s look at some common memory problem analysis ideas.