The uptime command

This command gives you a quick look at the load on your machine. On a Linux system, the load figures count the processes waiting for CPU resources plus the processes blocked in uninterruptible I/O (process state D). They give a high-level view of system resource usage.

The output shows the load averages over the last 1, 5, and 15 minutes. Comparing the three values tells you whether the load is rising or easing. If the 1-minute load average is high and the 15-minute load average is low, the server is currently under high load and you should check where CPU resources are being consumed. Conversely, if the 15-minute average is high and the 1-minute average is low, the period of CPU pressure may already have passed.
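As a rough illustration of that comparison, here is a minimal Python sketch. The function name load_trend and the 10% margin are assumptions made for this example, not standard values; os.getloadavg() returns the same three numbers that uptime prints.

```python
import os


def load_trend(one_min, fifteen_min, margin=0.1):
    """Compare the 1-minute and 15-minute load averages.

    margin is an assumed relative tolerance below which the two
    averages are treated as equal; it is not a standard value.
    """
    if one_min > fifteen_min * (1 + margin):
        return "rising"
    if one_min < fifteen_min * (1 - margin):
        return "easing"
    return "steady"


if __name__ == "__main__":
    # Available on Unix-like systems only.
    one, five, fifteen = os.getloadavg()
    print(f"{one:.2f} {five:.2f} {fifteen:.2f} -> {load_trend(one, fifteen)}")
```

A "rising" result only says that the load is higher now than it was over the last 15 minutes; it does not say whether the load comes from CPU demand or from blocked I/O.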

In the output of the example above, the load average over the last minute is very high, and much higher than the 15-minute average, so we need to keep looking for the processes that are currently consuming a lot of resources. The vmstat and mpstat commands described in the following sections can help narrow down the cause.

The dmesg command

Piped through tail (dmesg | tail), this command outputs the last 10 lines of the kernel log. In the sample output, you can see an out-of-memory (OOM) kill by the kernel and dropped TCP packets. Messages like these can help when troubleshooting performance problems, so don’t skip this step.

The vmstat command

The vmstat(8) command prints one line of key system indicators per interval, which helps us understand the system status in more detail. The argument 1 tells it to print statistics every second. The header row names each column. The columns most relevant to performance tuning are:

  • r: the number of processes running on or waiting for a CPU. This figure is a better indicator of CPU load than the load average because it does not include processes waiting for I/O. If the value is greater than the number of CPU cores on the machine, the CPU is saturated.

  • free: the free memory of the system, in kilobytes. Insufficient memory may cause performance problems. The free command, described in a later section, gives a more detailed view of memory usage.

  • si, so: the amount of memory swapped in from and out to the swap area per second. If these values are not 0, the system is using swap and the machine is short of physical memory.

  • us, sy, id, wa, st: the breakdown of CPU time into user time, system (kernel) time, idle time, I/O wait time, and stolen time (usually consumed by other VMs), respectively.

These CPU times quickly tell us whether the CPU is busy. In general, if the sum of user time and system time is very large, the CPU is busy executing instructions. If the I/O wait time is high, the system bottleneck may be disk I/O.

As you can see from the sample output, a significant amount of CPU time is spent in user mode, which means that user applications are consuming the CPU. This is not necessarily a performance problem; it needs to be analyzed together with the r column.
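The rules of thumb in this section can be sketched in Python as follows. The header string is the standard vmstat column header, but the data line is made up for illustration (it is not the sample output referred to above), and the function names and the busy_pct and iowait_pct thresholds are assumptions of this sketch.

```python
# Standard vmstat column header; the data line is invented for illustration.
HEADER = "r b swpd free buff cache si so bi bo in cs us sy id wa st"
SAMPLE = "34 0 0 2008897 73708 591828 0 0 0 5 6 10 96 1 3 0 0"


def parse_vmstat(header, line):
    """Map each vmstat column name to its integer value."""
    return dict(zip(header.split(), map(int, line.split())))


def vmstat_findings(sample, cpu_count, busy_pct=90, iowait_pct=20):
    """Apply the rules of thumb from the text to one vmstat sample.

    busy_pct and iowait_pct are assumed thresholds, not standard values.
    """
    findings = []
    if sample["r"] > cpu_count:
        findings.append("run queue exceeds CPU count: CPU saturated")
    if sample["si"] > 0 or sample["so"] > 0:
        findings.append("swap activity: physical memory may be insufficient")
    if sample["us"] + sample["sy"] >= busy_pct:
        findings.append("CPU busy executing instructions")
    if sample["wa"] >= iowait_pct:
        findings.append("high I/O wait: disk may be the bottleneck")
    return findings


if __name__ == "__main__":
    parsed = parse_vmstat(HEADER, SAMPLE)
    for finding in vmstat_findings(parsed, cpu_count=32):
        print(finding)
```

With the invented sample and an assumed 32 cores, the sketch reports both a saturated run queue and a busy CPU; with 64 cores, the same user time would be reported without the saturation warning, which is why user time has to be read together with the r column.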

The mpstat command

This command displays the usage of each CPU. If one CPU shows particularly high usage while the others do not, a single-threaded application may be the cause.
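A minimal Python sketch of that check is shown below. The function names, the 90% threshold, and the per-CPU values are all hypothetical; real values would come from the mpstat output (for example, 100 minus the idle percentage of each CPU).

```python
def hot_cpus(busy_percent, threshold=90.0):
    """Return the indexes of CPUs whose busy percentage meets the threshold."""
    return [cpu for cpu, busy in enumerate(busy_percent) if busy >= threshold]


def suggests_single_thread(busy_percent, threshold=90.0):
    """True when exactly one of several CPUs is hot."""
    return len(busy_percent) > 1 and len(hot_cpus(busy_percent, threshold)) == 1


if __name__ == "__main__":
    # Hypothetical per-CPU busy percentages, not real mpstat output.
    sample = [4.0, 99.0, 3.5, 5.0]
    print(hot_cpus(sample), suggests_single_thread(sample))
```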

The pidstat command

The pidstat command displays CPU usage per process. It prints its output continuously without overwriting earlier data, which makes it easier to observe how the system changes over time. In the output above, you can see that two Java processes consume approximately 1600% of CPU time, which corresponds to about 16 CPU cores of computing resources (100% equals one fully busy core).
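The conversion from a CPU percentage to a number of cores is simple arithmetic. A small Python sketch, with a hypothetical helper name cores_used:

```python
def cores_used(cpu_percent):
    """Convert a per-process %CPU value to an equivalent number of busy cores."""
    return cpu_percent / 100.0


if __name__ == "__main__":
    print(cores_used(1600))  # 16.0
```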

The iostat command

  • r/s, w/s, rkB/s, and wkB/s: the number of reads and writes per second and the amount of data read and written per second (in kilobytes), respectively. Excessive read/write volume may cause performance problems.

  • await: the average time of an I/O operation, in milliseconds. This is the time the application waits on the disk, including both the time a request spends queued and the time it takes to be serviced. If this number is too high, it may indicate a hardware bottleneck or failure.

  • avgqu-sz: the average number of requests issued to the device. If the value is greater than 1, the hardware may be saturated (although some hardware, such as a front end to several disks, can process requests in parallel).

  • %util: the device utilization, that is, how busy the device is. As a rule of thumb, if the value exceeds 60%, I/O performance may suffer (check the await column). If it reaches 100%, the device is saturated.

If the statistics are for a logical device, the utilization does not necessarily mean that the physical devices behind it are saturated. Note also that poor I/O performance does not necessarily mean poor application performance: strategies such as prefetching and write caching can improve application performance.
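The thresholds mentioned in the list above can be collected into one check, sketched here in Python. The 60% and 100% utilization levels and the queue size of 1 come from the text; the function name and the await_limit_ms default of 10 ms are assumptions of this sketch, since the text only says "too high".

```python
def iostat_findings(await_ms, avgqu_sz, util_pct, await_limit_ms=10.0):
    """Apply the rules of thumb from the text to one device's iostat values.

    await_limit_ms is an assumed limit; a sensible value depends on
    the type of device.
    """
    findings = []
    if await_ms > await_limit_ms:
        findings.append("high await: possible hardware bottleneck or failure")
    if avgqu_sz > 1:
        findings.append("queue above 1: device may be saturated")
    if util_pct >= 100:
        findings.append("utilization at 100%: device saturated")
    elif util_pct > 60:
        findings.append("utilization above 60%: I/O performance may suffer")
    return findings


if __name__ == "__main__":
    # Hypothetical values, not real iostat output.
    for finding in iostat_findings(await_ms=25.0, avgqu_sz=3.0, util_pct=75.0):
        print(finding)
```

As the text notes, these results are less meaningful for a logical device, so a finding here is a prompt to look further rather than a diagnosis.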

The free command

The free command displays the memory usage of the system. The -m option shows the values in megabytes. The last two columns show the amount of memory used for I/O buffers and for the file system page cache, respectively. Note the second line, -/+ buffers/cache: buffers and cache may appear to occupy a lot of memory.

This is by design: Linux uses as much memory as possible for caching, and when an application needs memory, the cache is reclaimed immediately and given to the application. This portion of memory is therefore generally counted as free.

If very little memory is available, the system may use the swap area (if configured), which adds I/O overhead (visible in the iostat output) and reduces system performance.
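The arithmetic behind the -/+ buffers/cache line can be sketched in Python. This assumes the older free layout in which the used column includes buffers and cache; the function name and the numbers are hypothetical.

```python
def adjust_for_cache(used, free, buffers, cached):
    """Recompute used and free memory with buffers and cache counted as free.

    All arguments are in the same unit, for example megabytes.
    """
    return {
        "used_by_apps": used - buffers - cached,
        "effectively_free": free + buffers + cached,
    }


if __name__ == "__main__":
    # Hypothetical values in megabytes, not real free -m output.
    print(adjust_for_cache(used=15000, free=1000, buffers=500, cached=9500))
```

In this made-up case only 1000 MB looks free at first glance, but 11000 MB can be handed to applications once the cache is reclaimed.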

The sar command

You can run the sar command to view the throughput of network devices. When troubleshooting performance problems, this throughput shows whether a network device is saturated. For example, the eth0 NIC has a throughput of 22 MB/s (176 Mbit/s), which is below the upper limit of 1 Gbit/s.
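The unit conversion in that example can be checked with a short Python sketch. The function name is hypothetical, and the sketch assumes decimal units (1 Gbit/s = 1000 Mbit/s) and a link speed of 1 Gbit/s.

```python
def link_utilization(mbytes_per_sec, link_mbits_per_sec=1000.0):
    """Return throughput in Mbit/s and as a percentage of the link speed."""
    mbits = mbytes_per_sec * 8  # 8 bits per byte
    return mbits, mbits * 100.0 / link_mbits_per_sec


if __name__ == "__main__":
    mbits, pct = link_utilization(22)
    print(f"{mbits:.0f} Mbit/s, {pct:.1f}% of a 1 Gbit/s link")
```

For 22 MB/s this gives 176 Mbit/s, or about 17.6% of the link, so the interface in the example is far from saturated.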

The sar command can also show TCP connection statistics, including:

  • active/s: the number of TCP connections initiated locally per second.

  • passive/s: the number of TCP connections initiated remotely per second, that is, connections created through the accept call.

  • retrans/s: the number of TCP retransmissions per second.

The TCP connection rate can help determine whether a performance problem is caused by too many connections being established, and whether those connections are initiated locally or accepted from remote hosts. TCP retransmissions may indicate packet loss caused by a poor network environment or an overloaded server.

The top command

The top command covers much of what the previous commands show, such as system load (uptime), memory usage (free), and CPU usage (vmstat), so you can use it to look for the source of system load. It also supports sorting: you can sort by column to find the processes using the most memory or the most CPU.

However, unlike the previous commands, top shows only a snapshot that is overwritten at each refresh, so if you are not watching closely you might miss some clues. You may need to pause the refresh to record and compare data.

(This article comes from the Internet; the author is unknown and copyright belongs to the original author.)