1. Background
In addition to routine functional tests, a system must pass strict performance tests and meet its expected performance indicators (commonly response time, TPS, and so on) before it is allowed into the production environment.
Performance testing in the broad sense generally includes load tests (which measure system capacity: how many concurrent users the system can serve while still guaranteeing the response time), stress tests (which measure system stability: whether the system stays stable under sustained pressure), and concurrency tests (which simulate multiple users accessing the same application at once to find concurrency problems such as thread locks, resource contention, and database deadlocks).
Performance testing lets us find system bottlenecks as early as possible. If the expected business goals are not met, performance tuning is required. The need for tuning sometimes arises from prototype verification and sometimes from actual production problems. Whatever the trigger, we generally follow the same steps: performance monitoring, performance analysis, and performance optimization. Each step is examined in detail in the following sections.
2. Performance monitoring
Performance monitoring is the first step in performance tuning. Its main purpose is to understand the current state of the system (server resource usage, JVM memory usage, thread usage, and so on) so that bottlenecks can be found early.
2.1 Viewing Server Configurations
To better evaluate server performance, you should first understand the configuration of the current host server. The following are common commands for viewing it on Linux servers.
2.1.1 CPU configuration
For the CPU, what matters most is the total number of logical cores. You can use mpstat to view it.
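From inside a Java process, the logical core count can also be read directly. The following is a minimal sketch; the class name CpuInfo is hypothetical, and inside a container the JVM may report the container's CPU limit rather than the host's core count.

```java
// Minimal sketch: report the number of logical cores visible to the JVM.
final class CpuInfo {

    /** Number of logical processors the JVM is allowed to use (always at least 1). */
    static int logicalCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Logical cores visible to the JVM: " + logicalCores());
    }
}
```

This value is a convenient input for the thread pool sizing rules discussed later in the article.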
You can run the cat /proc/cpuinfo command to view the CPU model:
2.1.2 Memory Configuration
Using the free command, you can see the total memory and how it is being used.
2.1.3 Disk Configuration
You can run the fdisk -l command to view the configuration of all disks, and the df -h command to view where the current disks are mounted.
Sometimes you need to confirm whether a disk is an SSD. Check the value returned by cat /sys/block/<device>/queue/rotational, where <device> is the device name of your disk, such as sda. If it returns 1, the disk rotates, so it is an HDD. If it returns 0, the disk does not rotate and is probably an SSD. As shown in the following figure, sda is an SSD.
2.1.4 Network Configuration
You can run the ifconfig command to view the NIC configuration. To check whether a NIC is 1000 Mb/s or 10000 Mb/s, run the ethtool command and check the reported speed. As shown in the following figure, eth4 is a 1000 Mb/s NIC.
2.2 Server Monitoring
To follow resource usage in real time while the system is running, you need to monitor server system resources. The following lists common commands and monitoring items.
2.2.1 CPU monitoring
Run the vmstat command. vmstat 2 collects statistics every 2 seconds.
Key items to observe:
ø The r value under procs is the length of the scheduler run queue. If it stays above the number of logical CPU cores for a long time, pay attention; if it exceeds 3 to 4 times the number of logical CPU cores, take immediate action.
ø If the in (interrupts) and cs (context switches) values under system are high, the kernel is consuming more CPU.
ø If us (user mode) in the cpu column stays above 50% for a long time, consider optimizing the algorithms. As a rule of thumb, us + sy at around 80% is a reference value.
You can use pidstat -w -I -p PID 2 to monitor application lock contention.
If voluntary context switches (cswch/s) account for 3% to 5% of clock cycles, the Java application is probably facing lock contention. A high involuntary context switch rate (nvcswch/s) indicates that more threads are ready to run than there are available virtual processors.
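The run-queue rule of thumb above can be written down as a small check. This is only a sketch: the class, enum, and method names are hypothetical, and the 3x cut-off is an assumption chosen from the "3 to 4 times" range given in the text.

```java
// Sketch: classify a sustained vmstat "r" value against the logical core count.
final class RunQueueCheck {

    enum Level { OK, WATCH, ACT }

    /**
     * OK    : run queue is at most the number of logical cores
     * WATCH : run queue is above the core count
     * ACT   : run queue is above 3x the core count (assumed cut-off, the text says 3 to 4 times)
     */
    static Level classify(int runQueueLength, int logicalCores) {
        if (logicalCores <= 0) {
            throw new IllegalArgumentException("logicalCores must be positive");
        }
        if (runQueueLength > 3 * logicalCores) {
            return Level.ACT;
        }
        if (runQueueLength > logicalCores) {
            return Level.WATCH;
        }
        return Level.OK;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Example: a run queue twice the core count deserves attention.
        System.out.println(classify(2 * cores, cores));
    }
}
```

The check only makes sense for values that persist over many samples; a single spike in r is not a signal by itself.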
2.2.2 Memory Monitoring
You can also use the vmstat command above to view memory page swapping.
Focus on the free, si, and so columns. If free keeps shrinking while si and so keep changing, memory is insufficient and pages are being swapped to and from disk, so consider adding memory.
2.2.3 Network monitoring
The third-party tool iptraf provides a visual interface through which network traffic can be monitored in real time.
2.2.4 Disk monitoring
Use iostat for monitoring.
CPU attribute values:
ø A high %iowait value indicates a disk I/O bottleneck. A high %idle value means the CPU is idle. If %idle is high but the system still responds slowly, the CPU may be waiting on memory allocation. If %idle stays below 10 for a long time, the system is short of CPU processing capacity, and the CPU is the resource that most needs attention.
Disk property values:
ø If %util is close to 100%, too many I/O requests are being generated, the I/O system is fully loaded, and the disk may be a bottleneck. If svctm is close to await, I/O has almost no wait time. If await is much larger than svctm, the I/O queue is too long and I/O responses are too slow, so optimization is needed. A relatively large avgqu-sz also indicates that a considerable amount of I/O is waiting.
2.3 JVM monitoring
To monitor with the JVisualVM tool bundled with the JDK, add the JMX configuration when starting the remote Java process, as follows:
This allows JVisualVM to monitor the remote JVM through port 1111 on its IP address.
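JVisualVM obtains its figures from the JVM's platform MXBeans over JMX. As a rough illustration of the kind of data involved, the sketch below reads heap usage from the same beans inside the process. The class and method names are hypothetical, and this is not a substitute for the remote JMX configuration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Sketch: read heap figures in-process from the platform MemoryMXBean.
final class JvmSnapshot {

    /** Bytes of heap currently in use. */
    static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Maximum heap in bytes, or -1 if the JVM does not define a maximum. */
    static long heapMaxBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getMax();
    }

    public static void main(String[] args) {
        System.out.println("Heap used (bytes): " + heapUsedBytes());
        System.out.println("Heap max  (bytes): " + heapMaxBytes());
    }
}
```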
2.4 Connection Pool Monitoring
2.4.1 Viewing the Number of Database Connection pools
Run netstat -an | grep 'db IP' | wc -l to see how many connections the pool has established with the database. Compare this value with the configured minimum and maximum sizes of the database connection pool. If the maximum is constantly being reached, consider raising the maximum number of connections.
2.4.2 Viewing the Number of Working Threads
Method 1: Use the JVisualVM tool for remote monitoring.
Method 2: Run commands to view information
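A third option, when you can run code inside the application, is to count threads by state from within the JVM. The following is a minimal sketch with a hypothetical class name. It counts every live thread in the JVM, not only pool workers; filtering by a thread-name prefix would depend on how your pool names its threads.

```java
import java.util.EnumMap;
import java.util.Map;

// Sketch: count live JVM threads grouped by their current state.
final class ThreadStates {

    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // getAllStackTraces() returns a snapshot keyed by every live thread.
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            counts.merge(thread.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByState().forEach((state, count) -> System.out.println(state + ": " + count));
    }
}
```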
2.5 Oracle monitoring
2.5.1 Viewing Oracle Configurations
Log in to the Oracle server and switch to the oracle user (su - oracle).
Start sqlplus in command-line mode (sqlplus / as sysdba).
ø show parameter sga;
2.5.2 Performance Monitoring
ø Use the SQLplus command line mode
ø Run the snapshot command when the test starts, and run the snapshot command again when it stops
Note: the snapshot command is exec dbms_workload_repository.create_snapshot();
ø After the snapshots are taken, generate the report (@?/rdbms/admin/awrrpt)
ø Analyze the report (focus on the Top 5 Timed Events)
3. Performance Analysis
3.1 JVM analysis
3.1.1 Heap analysis
To limit the impact on online performance, you can take a heap dump and analyze it separately, as follows:
jmap -dump:live,format=b,file=heap_dump.hprof pid
The generated .hprof file can then be imported into MAT or JVisualVM to see which objects are consuming memory. These tools also provide a quick and convenient histogram view for identifying memory problems caused by creating too many objects of a particular type.
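The same kind of dump can also be triggered from inside the application. The sketch below is an alternative to the jmap command, not the method described above: it assumes a HotSpot-based JVM (the HotSpotDiagnosticMXBean is HotSpot-specific), and the class name and temp-directory location are hypothetical choices.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: write a heap dump of live objects from inside a HotSpot JVM.
final class HeapDumper {

    /** Writes heap_dump.hprof into a fresh temp directory and returns its path. */
    static Path dumpLiveObjects() {
        try {
            // dumpHeap refuses to overwrite an existing file, so use a new directory.
            Path file = Files.createTempDirectory("heap").resolve("heap_dump.hprof");
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            // true = dump only live (reachable) objects, like jmap -dump:live
            bean.dumpHeap(file.toString(), true);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Heap dump written to " + dumpLiveObjects());
    }
}
```

Like jmap, this pauses the application while the dump is written, so the same caution about production systems applies.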
3.1.2 Garbage collection analysis
When the JVM starts, GC log collection can be enabled with parameters such as -Xloggc and -XX:+PrintGCDetails. You can also use jstat for monitoring and analysis; for example, jstat -gcutil PID 2000 prints the current Java heap and GC status every two seconds (the interval is in milliseconds unless a unit is given).
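GC counts and accumulated GC time are also available inside the process through the standard GarbageCollectorMXBean, which can be handy for periodic logging. A minimal sketch follows; the class and method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: sum collection counts and times across all collectors in this JVM.
final class GcStats {

    /** Total collections so far; collectors that report -1 (undefined) are counted as 0. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds, same -1 handling. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
    }
}
```

Sampling these two totals at a fixed interval and logging the difference gives a coarse view of GC frequency and cost without attaching an external tool.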
3.1.3 Thread Analysis
Using the JMC and jstack tools that ship with the JDK, you can view blocked threads. JFR, which is integrated into JMC, makes it easy to retrieve the events that cause thread blocking. jstack, for its part, can show to some extent which resources threads are blocked on. The following describes a troubleshooting approach with jstack:
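As a complement to reading jstack output by hand, blocked threads can also be listed from inside the JVM with the standard ThreadMXBean. The following is a minimal sketch under that assumption; the class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Sketch: list threads that are currently BLOCKED waiting for a monitor.
final class BlockedThreads {

    /** Returns one descriptive line per BLOCKED thread. */
    static List<String> describeBlocked() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // false, false: skip locked-monitor and synchronizer details to keep the dump cheap.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                lines.add(info.getThreadName() + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        describeBlocked().forEach(System.out::println);
    }
}
```

Each line names the blocked thread, the monitor it is waiting for, and the thread that currently owns that monitor, which is the same information you would look for in a jstack dump.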
4. Performance Optimization
Before tuning your system in depth, you should first understand why CPU utilization is low. The goal of optimizing code is to raise CPU utilization over a shorter period of time, not to lower it.
4.1 Optimization of JVM startup parameters
4.1.1 Optimization of native memory
Native memory optimizations can improve performance. They include using compressed oops (the -XX:+UseCompressedOops JVM startup parameter) and tuning large memory pages (through both Linux configuration and the -XX:LargePageSizeInBytes JVM startup parameter).
4.1.2 Optimization of garbage collection mechanism
ø Set the heap size and the generation sizes appropriately: if the heap is too small, GC runs frequently; if it is too large, GC pauses become too long. Also, to avoid falling back on virtual memory, where page swapping slows the application down, keep at least 1 GB of physical memory in reserve.
ø How you size each generation should depend on the distribution of object lifetimes in the application. If the application creates a large number of short-lived objects, choose a larger young generation. If it has a relatively large number of long-lived objects, enlarge the old generation accordingly.
ø Stable versus oscillating heap sizes: set -Xms and -Xmx to the same value so that the heap does not resize, which reduces the work done by garbage collection.
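Whether a running JVM was actually started with equal -Xms and -Xmx values can be checked from its input arguments. The sketch below is an illustration only: the class and method names are hypothetical, it understands the k, m, and g suffixes, and it does not look at equivalent flags such as -XX:MaxHeapSize.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Sketch: check whether -Xms and -Xmx are both present and equal.
final class HeapFlags {

    /** Parses a JVM size such as "512m" or "2g" into bytes. */
    static long parseSize(String text) {
        String s = text.trim().toLowerCase();
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        long factor;
        switch (s.charAt(s.length() - 1)) {
            case 'k': factor = 1L << 10; break;
            case 'm': factor = 1L << 20; break;
            case 'g': factor = 1L << 30; break;
            default: return Long.parseLong(s); // no suffix: plain bytes
        }
        return Long.parseLong(s.substring(0, s.length() - 1)) * factor;
    }

    /** True only when both flags are present and resolve to the same number of bytes. */
    static boolean hasFixedHeap(List<String> jvmArgs) {
        Long xms = null;
        Long xmx = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xms")) {
                xms = parseSize(arg.substring(4));
            } else if (arg.startsWith("-Xmx")) {
                xmx = parseSize(arg.substring(4));
            }
        }
        return xms != null && xms.equals(xmx);
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Fixed heap size configured: " + hasFixedHeap(jvmArgs));
    }
}
```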
4.1.3 Optimization of Large Object Allocation
ø Objects should be allocated inside TLABs as far as possible. If a large number of allocations happen outside TLABs, consider adjusting the TLAB parameters or reducing the size of the allocated objects. You can check this with the -XX:+PrintTLAB flag.
ø Allocate large objects directly in the old generation: this keeps the structure of young-generation objects intact and improves GC efficiency. Use -XX:PretenureSizeThreshold to set the size threshold above which objects go straight to the old generation.
4.2 Java Programming Optimization
Because there are many points related to performance optimization in actual programming, the following are just some common optimization items for reference.
4.2.1 Thread pool optimization
ø Set the maximum number of threads, the minimum number of threads, and the size of the thread pool's task queue according to the number of CPUs on the server. For CPU-intensive tasks, configure as few threads as possible, for example a pool of Ncpu+1 threads. For I/O-intensive tasks, threads are not executing all the time, so configure more threads, for example 2*Ncpu.
ø Bounded queues are recommended because they improve system stability and early-warning capability.
ø Tasks with different priorities can be processed using a PriorityBlockingQueue.
ø Tasks with different execution times can be assigned to thread pools of different sizes, or a priority queue can be used so that shorter tasks run first. Thread priorities can also be set.
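The sizing and bounded-queue recommendations above might be combined as in the following sketch. The class and method names are hypothetical, and the CallerRunsPolicy rejection handler and the 60-second keep-alive are assumptions chosen for illustration, not settings taken from the text.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: thread pools sized from the CPU count, each with a bounded task queue.
final class Pools {

    private static ThreadPoolExecutor fixedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,                            // core size == max size
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),     // bounded queue
                new ThreadPoolExecutor.CallerRunsPolicy());  // assumed policy: caller runs the task when full
    }

    /** CPU-intensive work: Ncpu + 1 threads. */
    static ThreadPoolExecutor cpuBound(int queueCapacity) {
        return fixedPool(Runtime.getRuntime().availableProcessors() + 1, queueCapacity);
    }

    /** I/O-intensive work: 2 * Ncpu threads. */
    static ThreadPoolExecutor ioBound(int queueCapacity) {
        return fixedPool(2 * Runtime.getRuntime().availableProcessors(), queueCapacity);
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = cpuBound(100);
        try {
            int answer = CompletableFuture.supplyAsync(() -> 6 * 7, pool).join();
            System.out.println("Computed on the pool: " + answer);
        } finally {
            pool.shutdown();
        }
    }
}
```

With a bounded queue, a full queue becomes visible immediately (here as work running on the submitting thread) instead of silently growing until memory runs out, which is the early-warning effect mentioned above.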
4.2.2 Other programming details
ø Minimize memory usage: reduce object size, choose the smallest type that fits, and remove unused attributes and instance variables.
ø Increase object reuse by using object pools and thread-local variables. Object reuse works against the GC to some extent, so reserve it for objects that are expensive to initialize (that is, whose initialization takes a long time).
ø Thread-local variables reduce synchronization contention for objects that do not need to be shared or passed between threads.
ø In some scenarios, Java 8 parallel streams can be used to optimize processing.
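The last three points can be illustrated together: an object that is costly to create and not thread-safe is kept in a thread-local variable and reused from a parallel stream. This is a minimal sketch; the class name, the date pattern, and the choice of SimpleDateFormat as the expensive object are assumptions made for the example.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;
import java.util.TimeZone;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Sketch: reuse one formatter per thread inside a parallel stream.
final class PerThreadFormatter {

    // SimpleDateFormat is not thread-safe. One instance per thread avoids
    // both locking and creating a new formatter for every element.
    private static final ThreadLocal<SimpleDateFormat> FORMAT = ThreadLocal.withInitial(() -> {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format;
    });

    /** Formats count consecutive days, starting at the given day number since 1970-01-01 (UTC). */
    static List<String> formatDays(long firstEpochDay, int count) {
        return LongStream.range(firstEpochDay, firstEpochDay + count)
                .parallel()
                .mapToObj(day -> FORMAT.get().format(new Date(day * 86_400_000L)))
                .collect(Collectors.toList()); // encounter order is preserved
    }

    public static void main(String[] args) {
        System.out.println(formatDays(0, 3));
    }
}
```

Parallel streams pay off only when the per-element work is large enough to outweigh the cost of splitting and merging, so it is worth measuring before and after.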
4.3 Database Optimization
4.3.1 Using precompilation
Using PreparedStatement objects and reusing a pool of prepared statements can greatly improve performance, while keeping an eye on the GC issues that pooling large objects can cause.
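A minimal JDBC sketch of precompiled statements follows. The class name, the table demo_user, and its name column are hypothetical, and batching is added as an illustration; the caller is assumed to supply an open Connection, for example one borrowed from a connection pool.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Sketch: one precompiled statement reused for many rows, sent as a single batch.
final class BatchInsert {

    // Hypothetical table and column, for illustration only.
    static final String SQL = "INSERT INTO demo_user (name) VALUES (?)";

    /** Returns the number of statements executed in the batch. */
    static int insertNames(Connection connection, List<String> names) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(SQL)) {
            for (String name : names) {
                statement.setString(1, name); // bind the value instead of concatenating SQL
                statement.addBatch();
            }
            return statement.executeBatch().length;
        }
    }
}
```

Because the SQL text is identical for every row, the database can parse and plan it once, and bound parameters also rule out SQL injection through the name values.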
4.3.2 Using connection Pools
Introduce the Hikari connection pool, which is configured at startup.
ø Creating a connection is very expensive. Obtaining a connection from a JDBC connection pool saves the time of creating one. Set the connection pool size appropriately.
ø Set reasonable values: for example, the batch size used for retrieval, a suitable prefetch value, and the fetch size of the ResultSet can all improve retrieval performance.
ø Transaction optimization: transaction commits and the locking mechanisms tied to transactions affect system performance. Choose the transaction isolation level sensibly and consider a batch commit strategy.
5. Summary of practical performance tuning experience
5.1 Poor concurrent performance in clearing
5.1.1 Symptom of the problem
When Java 8 parallel streams were used for computation, concurrent performance did not improve, and performance deteriorated over time.
5.1.2 Optimization points
ø Introduce the Hikari connection pool, which reduced per-transaction latency to 5 ms
ø Disable logging
ø Change SQL to precompiled mode
ø Increase the memory allocated to Oracle on the Oracle server
5.2 There are many FIN_WAIT2 connections between Hsiar and middle stage
5.3 Check whether Server_name can be configured randomly
6. Summary
6.1 Performance Toolbox
6.1.1 Load testing tool JMeter
JMeter is an open-source load testing tool that is easy to use. Its usage is not covered here; the following are some points to note:
ø Its real-time graphs depend on the server's responses. If the client machine's clock is not synchronized with the server's, breaks appear in the displayed graph. For a more accurate performance curve, command-line mode is recommended.
ø Sometimes the measured performance is poor because of the client rather than the server. The main things to check: the client CPU may not be sufficient to support the required number of client threads, or the client may spend a significant amount of time processing each response before sending the next request.
6.1.2 JVM
In addition to the JDK's own tools mentioned above, IBM's Memory Analyzer and the commercial tool JProfiler are powerful.
6.1.3 Database Related
For database performance monitoring, you can use Spotlight, which comes in separate versions for MySQL and Oracle.
6.1.4 Network related
Use the tcpdump command on Linux to capture packets and export them, then analyze the captured packets with Wireshark.
This article summarizes general ideas for Java performance tuning and shares some common best practices for performance tuning. I hope it’s helpful.