Summary of system optimization
A while ago, one of our team leads shared some practical material on system performance optimization. I have written it up here and added some tools and techniques I use regularly. Because system performance optimization covers a lot of ground, I will split it across several articles. This one covers common methods for locating system-level problems.
System Performance Definition
- Throughput (the number of requests the system can process per second)
- Latency (the delay in the system processing a request)
- Resource usage
Relationship between throughput and latency
- The higher the throughput, the greater the latency. With a large volume of requests the system is busy, so responses slow down.
- The lower the latency, the higher the throughput that can be supported. Shorter latency means faster processing, so more requests can be processed.
Asynchronous processing can make the system's throughput more elastic, but it does not make individual responses faster.
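One way to make this relationship concrete is Little's Law. The original talk did not mention it, so treat it as my own illustration: with a fixed number of in-flight requests, throughput is roughly concurrency divided by average latency. The function name and the sample numbers are made up.

```python
def max_throughput(concurrency: int, avg_latency_s: float) -> float:
    """Upper bound on requests/sec for a fixed number of in-flight requests.

    Little's Law: concurrency = throughput * latency,
    so throughput = concurrency / latency.
    """
    if avg_latency_s <= 0:
        raise ValueError("latency must be positive")
    return concurrency / avg_latency_s


if __name__ == "__main__":
    # Hypothetical service: 100 workers, each handling one request at a time.
    for latency_s in (0.01, 0.05, 0.25):
        rate = max_throughput(100, latency_s)
        print(f"latency={latency_s * 1000:.0f}ms -> at most {rate:.0f} req/s")
```

Under this model, halving latency doubles the throughput the same workers can sustain, which is the second bullet above.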
Commonly used load-testing tools
tcpdump
1. Common options:
-s: sets the capture length per packet. Older tcpdump versions capture only 68 bytes by default; add -s 0 to capture complete packets.
-w: writes the captured packets to the specified file.
2. Examples:
tcpdump -i eth1 host 10.1.1.1 // packets whose source or destination address is 10.1.1.1
tcpdump -i eth1 src host 10.1.1.1 // source address
tcpdump -i eth1 dst host 10.1.1.1 // destination address
To analyze tcpdump captures with Wireshark, add -s 0 so that complete packets are saved, and -w to write them to a file:
tcpdump -i eth0 tcp and port 80 -s 0 -w traffic.pcap
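To check what capture length a file was recorded with, you can read the snapshot length from the header of the capture file. This is a minimal sketch that assumes the classic pcap format (24-byte global header, magic number 0xa1b2c3d4); pcapng and nanosecond-resolution files are not handled, and the function name is my own.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4
# magic, version major, version minor, thiszone, sigfigs, snaplen, link type
HEADER_FIELDS = "IHHiIII"


def read_snaplen(header: bytes) -> int:
    """Return the snapshot length stored in a classic pcap global header."""
    if len(header) < 24:
        raise ValueError("pcap global header is 24 bytes")
    # The file is written in the byte order of the capturing machine,
    # so try little-endian first and then big-endian.
    for byte_order in ("<", ">"):
        fields = struct.unpack(byte_order + HEADER_FIELDS, header[:24])
        if fields[0] == PCAP_MAGIC:
            return fields[5]
    raise ValueError("not a classic pcap file")


if __name__ == "__main__":
    # Build a header in memory instead of reading a real capture file.
    sample = struct.pack("<" + HEADER_FIELDS, PCAP_MAGIC, 2, 4, 0, 0, 68, 1)
    print(read_snaplen(sample))
```

For a real file, pass the first 24 bytes, for example `open("traffic.pcap", "rb").read(24)`. A small value such as 68 means packet payloads were cut off at capture time.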
tcpcopy: load testing with replayed production traffic
tcpcopy is a request replication tool that supports both real-time and offline replay. It copies online traffic to a test machine and reproduces the production environment in real time, so a program can be tested against real online traffic before it goes live. It is an essential tool for pre-launch drills.
A. Record a pcap file with tcpdump
tcpdump -i eth0 -w online.pcap tcp and port 80
B. Offline replay
tcpcopy -x 80-10.1.x.x:80 -i traffic.pcap
tcpcopy -x 80-10.1.x.x:80 -a 2 -i traffic.pcap // offline replay at 2x speed
C. Traffic diversion mode
tcpcopy -x 80-10.1.x.x:80 -r 20 // divert 20% of the traffic
tcpcopy -x 80-10.1.x.x:80 -n 3 // divert traffic at 3x volume
wrk & ApacheBench & JMeter & webbench
I highly recommend wrk. It is lightweight and accurate, and combined with Lua scripts it supports more complex test scenarios.
Load test example: 4 threads simulate 1000 concurrent connections, the test lasts 30 seconds with a 30-second timeout, and request latency statistics are printed.
> wrk -t4 -c1000 -d30s -T30s --latency http://www.baidu.com
Running 30s test @ http://www.baidu.com
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.71s 3.19s 26.51s 89.38%
Req/Sec 15.83 10.59 60.00 66.32%
Latency Distribution
50% 434.52ms
75% 1.70s
90% 5.66s
99% 14.38s
1572 requests in 30.09s, 26.36MB read
Requests/sec: 52.24
Transfer/sec: 0.88MB
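The last two lines of the report are derived from the totals line, which is handy when you want to sanity-check or post-process results. A rough sketch, assuming the totals are reported in seconds and MB as in the run above (wrk can also print other units, which this does not handle); the function name is my own.

```python
import re

# Matches the totals line, e.g. "1572 requests in 30.09s, 26.36MB read".
TOTALS = re.compile(r"(\d+)\s+requests\s+in\s*([\d.]+)\s*s,\s*([\d.]+)\s*MB\s*read")


def derive_rates(wrk_output: str) -> dict:
    """Recompute Requests/sec and Transfer/sec from the wrk totals line."""
    match = TOTALS.search(wrk_output)
    if match is None:
        raise ValueError("totals line not found")
    requests = int(match.group(1))
    seconds = float(match.group(2))
    megabytes = float(match.group(3))
    return {
        "requests_per_sec": round(requests / seconds, 2),
        "transfer_mb_per_sec": round(megabytes / seconds, 2),
    }


if __name__ == "__main__":
    print(derive_rates("1572 requests in 30.09s, 26.36MB read"))
```

With the numbers from the run above this gives 52.24 requests/sec and 0.88 MB/sec, matching what wrk printed.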
For more options, see the help output:
> wrk --help
Usage: wrk <options> <url>
Options:
-c, --connections <N> Connections to keep open
-d, --duration <T> Duration of test
-t, --threads <N> Number of threads to use
-s, --script <S> Load Lua script file
-H, --header <H> Add header to request
--latency Print latency statistics
--timeout <T> Socket/request timeout
-v, --version Print version details
Numeric arguments may include a SI unit (1k, 1M, 1G)
Time arguments may include a time unit (2s, 2m, 2h)
Locating performance bottlenecks
System performance can be examined at the following levels:
- Application level
- System level
- JVM level
- Profiler
Application level
Application-level performance indicators:
- QPS
- Response time (95th and 99th percentile lines, etc.)
- Success rate
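The three indicators above can be computed from a list of request records. This is only a sketch: the record layout (latency in milliseconds plus a success flag) and the nearest-rank percentile definition are my assumptions, and monitoring systems often use other percentile methods.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile: the smallest value with pct% of samples at or below it."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def app_metrics(requests, window_s):
    """requests: list of (latency_ms, succeeded) tuples seen during window_s seconds."""
    latencies = sorted(latency for latency, _ in requests)
    succeeded = sum(1 for _, ok in requests if ok)
    return {
        "qps": len(requests) / window_s,
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "success_rate": succeeded / len(requests),
    }


if __name__ == "__main__":
    # Hypothetical data: 100 requests in 10 seconds, every tenth one failed.
    sample = [(i, i % 10 != 0) for i in range(1, 101)]
    print(app_metrics(sample, window_s=10))
```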
System level
System-level indicators include CPU, memory, disk, and network. A very handy command for checking system performance is:
dstat -lcdngy
dstat is a powerful tool for monitoring CPU, disk, network, I/O, and memory usage in real time.
- Installation
yum install -y dstat
- Options
-c: displays CPU usage: user, system, idle, wait, hardware interrupt, and software interrupt.
-C: selects which CPUs to display when there are several. For example, -C 0,1 displays cpu0 and cpu1.
-d: displays disk read and write volume.
-D hda,total: includes hda and total.
-n: displays network status.
-N eth1,total: selects the network adapters to display when there are several.
-l: displays the system load.
-m: displays memory usage.
-g: displays paging activity.
-p: displays process status.
-s: displays swap usage.
-S: similar to -D/-N.
-r: displays I/O request status.
-y: displays system status.
--ipc: displays IPC message queue, semaphore, and shared memory status.
--socket: displays TCP and UDP socket status.
-a: the default option, equivalent to -cdngy.
-v: equivalent to -pmgdsc -D total.
--output file: writes the statistics to the specified file in CSV format for later review. Example: dstat --output /root/dstat.csv & runs the program quietly in the background and writes the results to /root/dstat.csv.
CPU
- CPU usage: the CPU is the most important resource. Time the CPU spends waiting can also show up as high CPU usage.
CPU utilization = 1 - CPU idle time / total elapsed time
- User time/kernel time: roughly indicates whether the application is compute-intensive or I/O-intensive.
The time the CPU spends executing user-mode code is called user time, and the time it spends executing kernel-mode code is called kernel time. Kernel time mainly covers system calls, kernel threads, and interrupts. Measured system-wide, the ratio of user time to kernel time reveals the type of load being run. Compute-intensive applications spend most of their time in user-mode code, with a user/kernel ratio approaching 99/1; image processing and data analysis are examples. I/O-intensive applications make system calls frequently and perform I/O by executing kernel code. A web server doing network I/O has a user/kernel ratio of about 70/30.
- Load: the average number of processes in the run queue over a given interval. Each CPU has a run queue holding threads that are ready to be executed by that CPU. Ideally, the load average should be less than or equal to the number of CPU cores.
The difference between load and CPU usage:
- The load average measures the trend of CPU demand, not the state at a single point in time.
- The load average includes all demand for the CPU, not just what is active at the moment of measurement.
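On Linux, the numbers behind CPU utilization and the user/kernel ratio come from the aggregate cpu line of /proc/stat. The sketch below takes two samples of that line and computes both. It follows the formula above literally (only idle counts as idle, I/O wait is reported separately); the function names and the sample counters are made up.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(value) for value in parts[1:9])))


def cpu_usage(before: dict, after: dict) -> dict:
    """Compute utilization and the user/kernel split between two samples."""
    delta = {name: after[name] - before[name] for name in before}
    total = sum(delta.values())
    user = delta["user"] + delta["nice"]
    kernel = delta["system"] + delta["irq"] + delta["softirq"]
    busy = user + kernel
    return {
        "utilization": 1 - delta["idle"] / total,
        "iowait_share": delta["iowait"] / total,
        "user_share_of_busy": user / busy if busy else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical samples taken a few seconds apart.
    first = parse_cpu_line("cpu  100 0 50 800 50 0 0 0 0 0")
    second = parse_cpu_line("cpu  400 0 150 1300 150 0 0 0 0 0")
    print(cpu_usage(first, second))
```

To compare load against the core count, `os.getloadavg()` and `os.cpu_count()` from the standard library give the two numbers on Unix systems.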
Disk
Disk space: when the disk is full, programs fail to start or report errors.
du -sh // total size of the files under the current directory
df -hl // file system usage by disk partition
Sometimes disk usage gets too high because of the large number of system log files on a Linux server. You can clear log files in either of the following ways:
sudo truncate -s 0 /var/log/**.log // empty the specified large log files; fast
sudo find /var/log/ -type f -mtime +30 -exec rm -f {} \; // delete log files last modified more than 30 days ago
Disk permissions: without the required permissions, programs fail to start or report errors.
ll /yourdir
Disk Performance Test
dd if=/dev/zero of=output.file bs=10M count=1
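If dd is not convenient, the same kind of sequential write test can be approximated in a few lines. This is a rough sketch, not a benchmark tool: it writes zero-filled blocks to a temporary file, forces them to disk with fsync, and reports MB/s. The function name and default sizes are my own choices.

```python
import os
import tempfile
import time


def write_throughput_mb_s(total_mb: int = 10, block_mb: int = 1) -> float:
    """Write zero-filled blocks to a temp file and return the rate in MB/s."""
    block = b"\0" * (block_mb * 1024 * 1024)
    blocks = total_mb // block_mb
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as out:
            for _ in range(blocks):
                out.write(block)
            out.flush()
            os.fsync(out.fileno())  # make sure the data really reached the disk
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return blocks * block_mb / elapsed


if __name__ == "__main__":
    print(f"{write_throughput_mb_s():.1f} MB/s")
```

The temporary file lands in the default temp directory, so the result describes whichever disk backs that directory.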
I/O throughput and iowait
Keep an eye on these two metrics. Heavy disk reads and writes combined with high iowait often mean the disk may be a bottleneck. However, iowait by itself does not prove that the disk is the bottleneck, because it actually measures CPU time:
%iowait = (CPU idle time while I/O is outstanding)/(all CPU time)
So the only direct way to confirm that a disk is the bottleneck is to look at its read/write times. The steps below show how to locate I/O problems.
A. Check at a high level whether there is an I/O problem. Run top and look at the Cpu(s) line to see the percentage of CPU time spent in I/O wait (wa). The higher the value, the more CPU time is spent waiting for I/O.
B. Identify the problem disk: iostat
%util shows directly how busy each device is, and therefore which disk is being hit. Merged read/write requests per second (rrqm/s, wrqm/s) and read/write requests per second (r/s, w/s) also provide useful information for troubleshooting.
C. Identify the specific processes: the simple, blunt iotop shows which processes are responsible for the I/O.
D. ps is just as powerful for checking whether a process is waiting for I/O
The ps command reports memory, CPU, and process state. From the process state it is easy to find the processes that are waiting for I/O.
Here are a few states of a Linux process:
- R (TASK_RUNNING): runnable.
- S (TASK_INTERRUPTIBLE): interruptible sleep.
- D (TASK_UNINTERRUPTIBLE): uninterruptible sleep.
- T (TASK_STOPPED or TASK_TRACED): stopped or being traced.
- Z (TASK_DEAD, EXIT_ZOMBIE): the process has exited and become a zombie.
- X (TASK_DEAD, EXIT_DEAD): the process has exited and is about to be destroyed.
A process waiting for I/O is generally in the uninterruptible sleep state. Processes in the D state and the R state are both considered to be in the run queue.
View processes in the D state:
> for x in `seq 1 1 10`; do ps -eo state,pid,cmd | grep "^D"; echo "-- -- -- -- -- -- -- --"; sleep 5; done
D 13389 /usr/libexec/gcc/x86_64-redhat-linux/4.4.7/cc1 -quiet -I../../include/cat -I../ -I. -dD message_sender.c -quiet -dumpbase message_sender.c -mtune=generic -auxbase message_sender -ggdb3 -O2 -O0 -o /tmp/ccivsNPE.s
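The same filter can be applied to saved ps output, which is useful when you collect samples first and analyze them later. A small sketch that assumes the `state,pid,cmd` column order used above; the function name and the sample text are made up.

```python
def uninterruptible(ps_output: str) -> list:
    """Return (pid, cmd) pairs for processes in the D state.

    Expects the output of `ps -eo state,pid,cmd`.
    """
    found = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        # Skip the header and malformed lines: the pid column must be numeric.
        if len(parts) >= 2 and parts[0].startswith("D") and parts[1].isdigit():
            cmd = parts[2] if len(parts) > 2 else ""
            found.append((int(parts[1]), cmd))
    return found


if __name__ == "__main__":
    sample = """S   PID CMD
S     1 /sbin/init
D 13389 /usr/libexec/gcc/x86_64-redhat-linux/4.4.7/cc1 -quiet message_sender.c
R 14001 ps -eo state,pid,cmd
"""
    for pid, cmd in uninterruptible(sample):
        print(pid, cmd)
```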
I/O counters in the /proc pseudo-file system
> cat /proc/pid/io
rchar: 548875497
wchar: 270446556
syscr: 452342
syscw: 143986
read_bytes: 253100032
write_bytes: 24645632
cancelled_write_bytes: 3801088
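These counters are cumulative, so a single reading says little; the difference between two readings gives the actual I/O rate of the process. A minimal sketch, with function names of my own, that parses the text format shown above and computes bytes per second between two samples.

```python
def parse_proc_io(text: str) -> dict:
    """Parse the 'name: value' lines of /proc/<pid>/io into integers."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def io_rates(before: dict, after: dict, interval_s: float) -> dict:
    """Per-second rate of each counter between two samples."""
    return {name: (after[name] - before[name]) / interval_s for name in before}


if __name__ == "__main__":
    first = parse_proc_io("read_bytes: 253100032\nwrite_bytes: 24645632\n")
    # Hypothetical second sample taken 5 seconds later.
    second = parse_proc_io("read_bytes: 263585792\nwrite_bytes: 24645632\n")
    print(io_rates(first, second, interval_s=5))
```

For a live process, feed it `open(f"/proc/{pid}/io").read()`; reading another user's process usually requires root.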
E. Identify the files that are read and written frequently: lsof -p pid
Network
1. netstat
netstat -nt displays TCP connection states, connection counts, and send and receive queues.
Understanding the TCP states requires familiarity with the three-way handshake and the four-way teardown. The TCP states are listed here:
Client: SYN_SENT, FIN_WAIT1, FIN_WAIT2, CLOSING, TIME_WAIT
Server: LISTEN, SYN_RCVD, CLOSE_WAIT, LAST_ACK
Common: ESTABLISHED, CLOSED
TCP state transition diagram (from the Internet):
A few notes on TCP states:
- A normal connection should be in the ESTABLISHED state. If there are a large number of SYN_SENT connections, check the firewall rules.
- If Recv-Q or Send-Q stays non-zero for a long time, the connection has a bottleneck or the program has a bug.
2. Some other common techniques
There are many other useful techniques for netstat. Here are some of the most common:
netstat -nap | grep port // show the IDs of all processes using the port
netstat -nat | awk '{print $6}' | sort | uniq -c | sort -rn // count connections by state, sorted
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -10 // analyze access.log to get the top N IP addresses with the most requests
netstat -nat | grep "10.1.1.1:8080" | awk '{print $5}' | awk -F: '{print $1}' | sort | uniq -c | sort -nr | head -20 // top N IP addresses with the most connections to a given server
netstat -s // if the number of retransmitted packets keeps growing, the network adapter is very likely at fault
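The state-counting pipeline can also be written as a short script, which is easier to extend, for example to raise an alert when one state dominates. A sketch that assumes the usual `netstat -nat` column layout with the state in the sixth column; the function name and the sample output are made up.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> list:
    """Count TCP connections per state, most frequent first."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with tcp/tcp6; header lines are skipped.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states.most_common()


if __name__ == "__main__":
    sample = """Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN
tcp        0      0 10.1.1.1:8080           10.1.1.2:51234          ESTABLISHED
tcp        0      0 10.1.1.1:8080           10.1.1.3:40022          ESTABLISHED
tcp        0      0 10.1.1.1:8080           10.1.1.4:39001          TIME_WAIT
"""
    for state, count in count_tcp_states(sample):
        print(count, state)
```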
JVM
A killer tool for locating problems: the thread stack
1. Steps to get the thread stack:
ps -ef | grep java
sudo -u nobody jstack <pid> > /tmp/jstack.<pid>
Tip: jstack output is a snapshot of the stacks at one moment. A single dump is sometimes not enough to analyze a problem; take several dumps and compare them.
2. How to find a native thread's ID in the thread stack
nid = native thread ID. Note that nid is written in hexadecimal, while the native thread ID shown by the operating system is in decimal, so the two can be matched by converting between bases.
Converting between hexadecimal and decimal:
printf %d 0x1b40
printf "0x%x" 6976
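The conversion and the lookup can be combined: given a decimal thread ID, convert it to the hexadecimal nid and pull the matching stack out of a saved jstack dump. This is a sketch that assumes threads in the dump are separated by blank lines and that the nid appears on the first line of each block; the function name and the sample dump are made up.

```python
import re


def find_thread(jstack_dump: str, native_tid: int):
    """Return the stack block whose nid matches the decimal thread ID, or None."""
    nid = hex(native_tid)  # e.g. 6976 -> '0x1b40'
    pattern = re.compile(r"\bnid=" + re.escape(nid) + r"\b")
    for block in re.split(r"\n\s*\n", jstack_dump.strip()):
        lines = block.splitlines()
        if lines and pattern.search(lines[0]):
            return block
    return None


if __name__ == "__main__":
    sample = '''"worker-1" #12 prio=5 os_prio=0 tid=0x00007f2a4c0d1000 nid=0x1b40 runnable [0x00007f2a3b7f6000]
   java.lang.Thread.State: RUNNABLE
        at com.example.Worker.run(Worker.java:42)

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f2a4c01f000 nid=0x1b41 runnable
'''
    print(find_thread(sample, 6976))
```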
3. Analyzing high CPU consumption
A. Find the PID of the Java process:
ps -ef | grep java
B. Find the thread that consumes the most CPU in the Java process:
top -H -p <pid>
- Convert the thread ID found to hexadecimal.
- Get the Java thread stacks with jstack.
- Find the matching stack in the jstack output using the hexadecimal ID.
Note: the thread stack shows whether the thread is executing Java code or a native method.
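The first part of this procedure can be scripted when top is run in batch mode (`top -H -b -n 1 -p <pid>`; the -b and -n flags are my addition to the command above). The sketch below picks the busiest thread from that output and returns its ID in the hexadecimal form used by jstack. It assumes top's default column layout with %CPU in the ninth column; the function name and the sample output are made up.

```python
def hottest_thread(top_output: str):
    """Return (tid, cpu_percent, nid) of the busiest thread, or None."""
    best = None
    for line in top_output.splitlines():
        fields = line.split()
        # Thread rows start with a numeric ID and have at least 12 columns.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        try:
            cpu = float(fields[8])
        except ValueError:
            continue
        if best is None or cpu > best[1]:
            tid = int(fields[0])
            best = (tid, cpu, hex(tid))
    return best


if __name__ == "__main__":
    sample = """  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 6977 app       20   0 4613504 512300  14200 S  1.0  6.3   0:01.02 java
 6976 app       20   0 4613504 512300  14200 R 97.3  6.3   1:23.45 java
"""
    print(hottest_thread(sample))
```

The returned nid is the value to search for in the jstack dump.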
Can't find the corresponding thread stack? Possible causes:
- The thread was created inside a native method.
- The code has a bug that exhausts the heap, so the JVM keeps running full GC.
- A JVM bug 😂.
Garbage collection statistics: viewing GC causes
jstat -gccause shows garbage collection statistics. Compared with -gcutil, it also displays the cause of the last garbage collection and, if a collection is in progress, the cause of the current one.
jstat -gccause pid 1234
Please credit the source when reposting. You are welcome to follow my public account: Yap technology wheel