Original: Coding Diary (WeChat official account ID: Codelogs). Sharing is welcome; when reprinting, please credit the source.

Introduction

This is the fourth installment of the Linux command gleanings series. It focuses on the Linux commands for observing hardware resources, such as top, vmstat, pidstat, iostat, and sar.

Series index: Linux command gleanings – Getting started; Linux command gleanings – Text processing; Linux command gleanings – Software resource observations

CPU and memory observations

vmstat

vmstat is named for virtual memory statistics, which suggests that it only measures memory. In fact, it reports on CPU, memory, and I/O resources.

$ vmstat -w 1
procs -----------------------memory---------------------- ---swap-- -----io---- -system-- --------cpu--------
 r  b         swpd         free         buff        cache   si   so    bi    bo   in   cs  us  sy  id  wa  st
 4  0            0     12531512       102680       274940    0    0     0     3    0    3   0   0 100   0   0
 2  0            0     12531512       102680       274940    0    0     0     0  106   55  25   0  75   0   0
 2  0            0     12531512       102680       274940    0    0     0     0  105   58  25   0  75   0   0
 2  0            0     12531512       102680       274940    0    0     0     0  105   56  25   0  75   0   0

The first line shows averages since the system booted. You can generally ignore it and read from the second line onward.

  • r: CPU run queue length, i.e. the number of runnable threads (running or waiting for a CPU). It serves as a CPU saturation indicator; a value that stays high for a long time usually signals a problem.
  • b: number of threads in uninterruptible sleep, generally threads blocked on I/O
  • swpd: amount of memory swapped out to disk (kB)
  • free: amount of free memory (kB)
  • buff: amount of memory used as buffers (kB)
  • cache: amount of memory used as file page cache (kB)
  • si: rate at which memory is swapped in from disk (kB/s)
  • so: rate at which memory is swapped out to disk (kB/s)
  • bi: blocks read from block devices per second (blocks/s)
  • bo: blocks written to block devices per second (blocks/s)
  • in: interrupts per second
  • cs: context switches per second
  • us: percentage of CPU time spent in user mode
  • sy: percentage of CPU time spent in kernel mode
  • id: percentage of CPU time spent idle
  • wa: percentage of CPU time spent idle while threads are blocked waiting for disk I/O
  • st: steal, the percentage of CPU time taken by other tenants on the hypervisor
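
As a rough illustration of using the r column as a saturation signal, the sketch below filters vmstat output for samples whose run queue exceeds the number of CPUs. The function name saturated_samples is hypothetical, and the sketch assumes the standard vmstat layout in which r is the first column.

```shell
# saturated_samples NCPU
# Reads vmstat output on stdin and prints the r value of every sample
# whose run queue is longer than NCPU. Header lines are skipped because
# their first field is not a number.
saturated_samples() {
    awk -v ncpu="$1" '$1 ~ /^[0-9]+$/ && $1 + 0 > ncpu + 0 { print $1 }'
}

# Possible usage (drops the two header lines and the since-boot line):
#   vmstat 1 10 | tail -n +4 | saturated_samples "$(nproc)"
```

An occasional hit is normal; it is a run queue that stays above the CPU count across many samples that points to saturation.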

mpstat

mpstat shows the usage of each CPU core, as follows:

$ mpstat -P ALL 1
Linux 4.19.128-microsoft-standard (desktop-gc9llHC)  10/24/21  _x86_64_  (8 CPU)

12:39:37     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
12:39:38     all   24.57    0.00    0.00    0.00    0.00    1.72    0.00    0.00    0.00   73.71
12:39:38       0    0.00    0.00    0.00    0.00    0.00   12.28    0.00    0.00    0.00   87.72
12:39:38       1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
12:39:38       2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
12:39:38       3  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
12:39:38       4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
12:39:38       5  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
12:39:38       6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
12:39:38       7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

As you can see, cores 3 and 5 are fully utilized while the other cores are idle. This pattern usually comes from a program's threading design, where some threads are busy and others sit idle; mpstat is the tool for spotting it. The imbalance above was produced with the stress command, which here fully loads 2 cores:

$ stress -c 2

The same imbalance can appear when a program is pinned to certain cores with the Linux CPU affinity (core binding) mechanism but the binding is misconfigured. Here is how I bind the already-running stress processes to cores 1 and 2:

# Query the stress process ID
$ pgrep stress
5477
5478
5479

# Use taskset to bind cores
$ taskset -pc 1,2 5478
pid 5478 current affinity list: 0-7
pid 5478 new affinity list: 1,2
$ taskset -pc 1,2 5479
pid 5479 current affinity list: 0-7
pid 5479 new affinity list: 1,2
# Check the binding condition
$ taskset -pc 5479
pid 5479 current affinity list: 1,2

# Use mpstat to check CPU usage; cores 1 and 2 are now at 100%
$ mpstat -P ALL 1 1
Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 74.23
Average: 0.00 0.00 0.00 0.00 7.48 0.00 0.00 0.00 92.52
Average: 0.00 0.00 0.00 0.00 0.00 0.00
Average: 0.00 0.00 0.00 0.00 0.00 0.00
Average: 0.00 0.00 0.00 0.00 0.00 0.00
Average: 0.00 0.00 0.00 0.00 0.00 0.00
Average: 0.00 0.00 0.00 0.00 0.00 0.00
Average: 0.00 0.00 0.00 0.00 0.00 0.00
Average: 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Note: core binding has pros and cons. Pinning a program to specific cores avoids thread migration between cores and so improves CPU cache utilization. In general, though, most programs today should not bind cores themselves; this is better left to infrastructure such as Docker.

top

The vmstat and mpstat commands above only show the state of the whole system, while top and pidstat show the state of each process, as follows:

$ top
top - 13:14:07 up 2 days,  6:38,  0 users,  load average: 1.65, 0.59, 0.27
Tasks:  17 total,   3 running,  14 sleeping,   0 stopped,   0 zombie
%Cpu(s): 25.0 us,  0.0 sy,  0.0 ni, 74.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  12693.4 total,  12052.8 free,    271.6 used,    368.9 buff/cache
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  12171.8 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 3174 work      20   0    3860    104      0 R 100.0   0.0   1:40.75 stress
 3175 work      20   0    3860    104      0 R 100.0   0.0   1:40.76 stress
    1 root      20   0     900    492    428 S   0.0   0.0   0:00.11 init
   10 work      20   0   10044   5140   3424 S   0.0   0.0   0:00.12 bash
 3051 work      20   0 6393208 204364  20116 S   0.0   1.6   0:07.51 java
 3173 work      20   0    3860    980    896 S   0.0   0.0   0:00.00 stress
 3176 work      20   0   10888   3932   3348 R   0.0   0.0   0:00.02 top

  • The first line is the system summary: current time, uptime, number of logged-in users, and the 1-, 5-, and 15-minute load averages
  • The second line is the task summary: total tasks and the number of running, sleeping, stopped, and zombie tasks
  • The third line is CPU usage:
      • us: percentage of CPU time used by un-niced user processes
      • sy: percentage of CPU time spent in kernel mode
      • ni: percentage of CPU time used by niced user processes
      • id: percentage of CPU time spent idle. The CPU never truly stops; "idle" means it is running the kernel's idle task.
      • wa: percentage of CPU time spent waiting for disk I/O to complete
      • hi: percentage of CPU time spent handling hardware interrupts
      • si: percentage of CPU time spent handling software interrupts
      • st: percentage of CPU time stolen by other VMs
  • The fourth line is memory usage (MiB): total is the total memory size, free the unused memory, used the memory in use, and buff/cache the memory used for buffers and the file cache
  • The fifth line is swap usage: total is the swap size, free the unused swap, and used the swap in use. avail Mem is the memory available to applications; it has nothing to do with swap and is roughly free + buff/cache from the previous line

top is also interactive: you can type commands directly in its interface, as follows:

Key   Description
1     Show per-core CPU usage, similar to mpstat
M     Sort processes by memory usage, descending (Shift+m)
P     Sort processes by CPU usage, descending (Shift+p)
H     Show threads (Shift+h)
c     Show the full command line of each process
k     Kill the process with the specified PID
h     Show help
q     Quit top

Note that many of these keys are toggles. For example, pressing 1 shows per-core CPU usage, and pressing 1 again returns to the overall CPU usage. Also, on an 8-core machine the %CPU of a process in top can reach 800%, which is easy to be surprised by the first time you see it!

pidstat

pidstat is broadly similar to top, except that it is non-interactive. It is usually used as a complement to top, as follows:

# By default, pidstat shows the CPU usage of active processes
$ pidstat 1
13:32:45      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
13:32:46     1000      3051    0.00    1.00    0.00    0.00    1.00     1  java
13:32:46     1000      3241  100.00    0.00    0.00    0.00  100.00     7  stress
13:32:46     1000      3242  100.00    0.00    0.00    0.00  100.00     5  stress

Average:      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
Average:     1000      3051    0.00    0.33    0.00    0.00    0.33     -  java
Average:     1000      3241  100.00    0.00    0.00    0.00  100.00     -  stress
Average:     1000      3242  100.00    0.00    0.00    0.00  100.00     -  stress

# -w shows context switches
# cswch/s: voluntary context switches, e.g. when waiting for I/O or a lock
# nvcswch/s: involuntary context switches, e.g. when the time slice runs out. A high value deserves attention, because most applications today are I/O-intensive and rarely use up their time slices
$ pidstat -w 1
13:37:57      UID       PID   cswch/s nvcswch/s  Command
13:37:58     1000      3299      1.00      0.00  pidstat

13:37:58      UID       PID   cswch/s nvcswch/s  Command
13:37:59        0         8      1.00      0.00  init
13:37:59     1000         9      1.00      0.00  wsltermd
13:37:59     1000      3299      1.00      0.00  pidstat

# -v shows the number of threads and file descriptors of each process
$ pidstat -v 1
01:41:34 PM   UID       PID threads   fd-nr  Command
01:41:35 PM  1000       876      95     177  java

# -r shows the memory usage and page faults of running processes
# minflt/s: minor page faults
# majflt/s: major page faults. This usually means swapping has occurred
$ pidstat -r 1
02:07:24 PM   UID       PID  minflt/s  majflt/s     VSZ     RSS   %MEM  Command
02:07:25 PM   999      2786      2.00      0.00   52792    3140   0.08  redis-server
02:07:25 PM  1000    601098      1.00      0.00   13976    6296   0.16  sshd

# -d shows the I/O usage of each process
$ pidstat -d 1
14:12:06      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
14:12:07     1000      3051      0.00     80.00      0.00       0  java
14:12:07     1000      3404      0.00      0.00      0.00      79  stress

free

Memory usage is also visible in vmstat and top, but free is the dedicated command for it, as follows:

# -m displays values in MB, -g in GB
$ free -m
              total        used        free      shared  buff/cache   available
Mem:           3907        1117         778           3        2012        2503
Swap:          1897         708        1189

Pay special attention to free, buff/cache, and available:

  • free: memory that is completely unused. On Linux this value generally shrinks as the system runs, because the kernel caches accessed file data in memory as much as possible so that the next read can be served quickly
  • buff/cache: memory used to cache file data
  • available: memory actually available to applications, roughly free + buff/cache. To judge whether the system has enough memory, look at available rather than free.
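
To make the point about available concrete, here is a small sketch that turns free output into a single "percent available" figure. The function name avail_percent is hypothetical, and the sketch assumes the column layout shown above, where available is the seventh field of the Mem: row.

```shell
# avail_percent
# Reads `free` output on stdin and prints available memory as a
# percentage of total memory, using the Mem: row (total is field 2,
# available is field 7 in the layout shown above).
avail_percent() {
    awk '$1 == "Mem:" && $2 > 0 { printf "%.1f\n", 100 * $7 / $2 }'
}

# Possible usage:
#   free -m | avail_percent
```

With the sample output above (3907 total, 2503 available) this prints 64.1, even though the free column alone would suggest only about 20% is unused.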

slabtop

Slab is a memory allocation mechanism in the Linux kernel. The slab allocator manages memory by object type (process descriptors, for example, are one type). Each time an object of a given type is requested, the allocator hands out a unit of that size from a slab list; when the object is released, it goes back onto the list, so that a later request for a new object can be served directly from the list without repeated initialization.

You can think of it as the kernel's implementation of object pooling. Its memory is counted in the buff/cache column and can be observed as follows:

$ sudo slabtop
 Active / Total Objects (% used)    : 1641750 / 1772440 (92.6%)
 Active / Total Slabs (% used)      : 35906 / 35906 (100.0%)
 Active / Total Caches (% used)     : 107 / 158 (67.7%)
 Active / Total Size (% used)       : 512123.18K / 553465.8K (92.5%)
 Minimum / Average / Maximum Object : 0.01K / 0.31K / 16.75K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
282324 246579  87%    0.19K   6722       42     53776K dentry
173824 159910  91%    0.03K   1358      128      5432K kmalloc-32
173784 164054  94%    0.10K   4456       39     17824K buffer_head
167580 159572  95%    0.13K   2793       60     22344K kernfs_node_cache
100839  89862  89%    1.07K   3479       29    111328K ext4_inode_cache
 91260  86183  94%    0.81K   2340       39     74880K fuse_inode
 65084  62708  96%    0.59K   1228       53     39296K inode_cache
 64576  64401  99%    0.50K   1009       64     32288K kmalloc-512
 53120  51516  96%    0.06K    830       64      3320K anon_vma_chain

Disk observation

df

The df command gives a quick view of file system space usage, as follows:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            1.9G     0  1.9G   0% /dev
tmpfs           391M  2.7M  389M   1% /run
/dev/sda1       276G  150G  115G  57% /
tmpfs           2.0G     0  2.0G   0% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock

The Use% column for /dev/sda1 shows that the disk is 57% used.
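
Reading Use% by eye works for a handful of mounts; for a quick check across many, a filter along these lines may help. The function name full_filesystems and the threshold argument are hypothetical, and the sketch assumes df's POSIX layout (df -P), where Use% is the fifth field and the mount point the sixth, and that mount points contain no spaces.

```shell
# full_filesystems LIMIT
# Reads `df -P` (or `df -h`) output on stdin and prints the mount point
# and Use% of every file system at or above LIMIT percent. Adding 0
# to "57%" makes awk take the leading number, 57.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 && $5 + 0 >= limit + 0 { print $6, $5 }'
}

# Possible usage:
#   df -P | full_filesystems 80
```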

iostat

The iostat command shows the current I/O status of each disk, as follows:

$ iostat -xz 1
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.06    0.00    0.00    0.00    0.00   99.94

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               1.87 17854.96 3799.10 14930.26 42642.19 208548.03    26.82     7.10    0.37    1.04    0.20   0.28 522.73

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.36    0.00    0.00    0.00    0.00   95.64

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda 0.00 0.00 606.00 0.00 0.04 0.06 0.00 0.06 0.06 3.40

Note that, as with vmstat, the first report shows statistics since boot and can generally be ignored. The key columns are:

  • %util: disk utilization, the percentage of time the disk was busy. Linux assumes that a disk can handle only one request at a time, but SSDs and RAID arrays can actually handle several concurrently, so 100% does not necessarily mean full load.
  • avgqu-sz: average length of the disk request queue. If it exceeds the concurrency the disk can handle, the disk is saturated.
  • svctm: average service time, excluding the time spent waiting in the disk queue.
  • r_await, w_await: read and write latency (ms), i.e. disk queue wait time + svctm. Large values indicate that the disk is saturated.
  • r/s + w/s: the current IOPS.
  • avgrq-sz: average size of each I/O request, in sectors (512 B).
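
The "r/s + w/s" rule above can be applied mechanically. The following is a minimal sketch, with the hypothetical function name iops_per_device, that sums the two columns for every device line. Because the column order differs between sysstat versions, it locates r/s and w/s by name in the Device header rather than by position.

```shell
# iops_per_device
# Reads `iostat -x` output on stdin and prints "<device> <IOPS>" for each
# device line, where IOPS = r/s + w/s. The columns are located by name
# from the most recent "Device" header line.
iops_per_device() {
    awk '
        $1 ~ /^Device/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "r/s") r = i
                if ($i == "w/s") w = i
            }
            next
        }
        # Device rows start with a name, not a number; this also skips
        # the numeric line that follows avg-cpu.
        r && w && $1 !~ /^[0-9.]+$/ && $r ~ /^[0-9.]+$/ && $w ~ /^[0-9.]+$/ {
            printf "%s %.2f\n", $1, $r + $w
        }
    '
}

# Possible usage:
#   iostat -xz 1 | iops_per_device
```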

Note: the output above is from CentOS. The output of iostat on Ubuntu is slightly different.

iotop

iotop shows the I/O status of each process, as follows:

# -P shows processes only; without it, iotop lists every thread
# -o shows only processes that are actually doing I/O; without it, iotop lists all processes
$ sudo iotop -P -o
Total DISK READ:         3.84 K/s | Total DISK WRITE:       138.97 M/s
Current DISK READ:       3.84 K/s | Current DISK WRITE:      80.63 M/s
    PID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO>    COMMAND
 737183 be/4 root        3.84 K/s    0.00 B/s  0.00 % 88.89 % [kworker/u256:1+flush-8:0]
 761496 be/4 work 0.00 B/s 0.00 % stress -d 1
    876 be/4 work 0.00 B/s K/s 0.00 % 0.00 % java -Xms256m -Xmx1g -Xss1m -XX:MaxMetaspaceSize=1g...

This shows the current read/write rates of the whole disk and each process's share of them.

hcache and vmtouch

When we access a file, the kernel caches the file data in memory for us; this is the buff/cache item in the free output above. hcache and vmtouch show the details of the file cache, as follows:

# Show the top 6 files by page cache usage
$ sudo ./hcache -top 6
+---------------------------------------------------------------------+----------------+------------+-----------+---------+
| Name                                                                | Size (bytes)   | Pages      | Cached    | Percent |
|---------------------------------------------------------------------+----------------+------------+-----------+---------|
| /var/log/journal/ae4a8a4eec6f418a9596826e2f4f6891/system.journal    | 33554432       | 8192       | 7415      | 090.515 |
| /snap/core/11993/usr/lib/snapd/snapd                                | 24113872       | 5888       | 3777      | 064.147 |
| /usr/bin/dockerd                                                    | 104770064      | 25579      | 2692      | 010.524 |
| /usr/local/mysql/bin/mysqld                                         | 227306584      | 55495      | 1774      | 003.197 |
| /var/log/journal/ae4a8a4eec6f418a9596826e2f4f6891/user-1000.journal | 8388608        | 2048       | 1590      | 077.637 |
| /usr/bin/python3.8                                                  | 5482296        | 1339       | 1072      | 080.060 |
+---------------------------------------------------------------------+----------------+------------+-----------+---------+

# Show page cache details for the specified file
$ vmtouch -v /usr/local/mysql/bin/mysqld
/usr/local/mysql/bin/mysqld
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO] 55495/55495

           Files: 1
     Directories: 0
  Resident Pages:
         Elapsed: 0.000275 seconds

# Evict the page cache of the specified file
$ vmtouch -ve /var/log/journal/ae4a8a4eec6f418a9596826e2f4f6891/system.journal
Evicting /var/log/journal/ae4a8a4eec6f418a9596826e2f4f6891/system.journal

           Files: 1
     Directories: 0
   Evicted Pages: 8192 (32M)
         Elapsed: 0.008735 seconds

# Drop the page cache and slab caches (requires root). Don't do this unless you know exactly what you're doing
$ sync                               # Write dirty page cache pages back to disk
$ echo 1 > /proc/sys/vm/drop_caches  # Drop the page cache
$ echo 2 > /proc/sys/vm/drop_caches  # Drop reclaimable slab objects (dentries and inodes)
$ echo 3 > /proc/sys/vm/drop_caches  # Drop both the page cache and slab objects

hcache download address: github.com/silenceshel…

Network observation

nicstat

nicstat shows network interface usage, as follows:

$ nicstat -z 1
    Time      Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs %Util    Sat
22:35:22    ens33   38.09    7.13   32.03    6.77  1217.8  1078.4  0.03   0.00
22:35:22       lo    0.07    0.07    0.36    0.36   207.6   207.6  0.00   0.00
    Time      Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs %Util    Sat
22:35:23    ens33    0.27    0.56    3.99    4.99   69.50   114.0  0.00   0.00
    Time      Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs %Util    Sat
22:35:24    ens33    0.21    0.34    3.00    3.00   72.67   116.7  0.00   0.00
    Time      Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs %Util    Sat
22:35:26    ens33    0.34    0.34    5.00    3.00   69.20   116.7  0.00   0.00
    Time      Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs %Util    Sat
22:35:27    ens33    0.28    0.33    4.00    3.00   70.50   111.3  0.00   0.00

%Util is the bandwidth utilization of the network interface.

iftop

iftop shows the current network speed of the whole interface and of each connection, as follows:

$ sudo iftop -B -nNP
                    244KB               488KB               732KB               977KB          1.19MB
└───────────────────┴───────────────────┴───────────────────┴───────────────────┴───────────────────
10.134.60.10:10100          => 10.134.92.10:29318          85.8KB   103KB  94.3KB
                            <=                             2.95KB  3.54KB  3.25KB
10.134.60.10:10100          => 10.134.93.9:30981            170KB   103KB  94.3KB
                            <=                             5.41KB  3.49KB  3.21KB
10.134.60.10:35172          => 10.134.24.54:3961           13.3KB  9.88KB  5.25KB
                            <=                             58.6KB  60.6KB  32.4KB
10.134.60.10:43240          => 10.134.24.55:3960           9.83KB  5.52KB  3.09KB
                            <=                              101KB  53.6KB  31.1KB
10.134.60.10:60932          => 10.134.24.55:3961           4.45KB  5.07KB  6.04KB
                            <=                             35.0KB  39.8KB  47.4KB
10.134.60.10:58990          => 10.134.24.5:80              22.0KB  19.2KB  22.5KB
────────────────────────────────────────────────────────────────────────────────────────────────────
TX:             cum:   10.9MB   peak:   1.75MB   rates:    611KB   438KB   557KB
RX:                    6.49MB            453KB             360KB   296KB   332KB
TOTAL:                 17.4MB           2.19MB             972KB   735KB   889KB

If neither nicstat nor iftop is available, you can also use ifconfig + awk to view the network speed (unit: B/s). The one-liner relies on gawk extensions (the array form of match and arrays of arrays):

$ while sleep 1;do ifconfig;done|awk -v RS= 'match($0,/^(\w+):.*RX.*bytes ([0-9]+).*TX.*bytes ([0-9]+)/,a){eth=a[1]; if(s[eth][1])print a[1],a[2]-s[eth][2],a[3]-s[eth][3]; for(k in a)s[eth][k]=a[k]}'
eth0 294873 353037
lo 2229 2229
eth0 613730 666086
lo 17981 17981
eth0 317336 544921
lo 5544 5544
eth0 237694 516947
lo 2256 2256
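
Where gawk or ifconfig is missing as well, the same byte counters can be read from /proc/net/dev. The sketch below is one possible alternative, not the author's method: the function name net_delta and the snapshot-file approach are assumptions made for illustration. It compares two saved copies of /proc/net/dev and prints the received and transmitted byte deltas per interface, using only POSIX awk features.

```shell
# net_delta OLD NEW
# OLD and NEW are two saved copies of /proc/net/dev. Prints
# "<iface> <rx bytes delta> <tx bytes delta>" for each interface present
# in both. After the interface name, the receive byte counter is the 1st
# value and the transmit byte counter is the 9th.
net_delta() {
    awk '
        { sub(/:/, " ") }              # "eth0:123" or "eth0: 123" -> "eth0 123"
        $2 !~ /^[0-9]+$/ { next }      # skip the two header lines
        FNR == NR { rx[$1] = $2; tx[$1] = $10; next }
        $1 in rx { print $1, $2 - rx[$1], $10 - tx[$1] }
    ' "$1" "$2"
}

# Possible usage (one-second interval, so the deltas are in B/s):
#   cat /proc/net/dev > /tmp/net.old; sleep 1; cat /proc/net/dev > /tmp/net.new
#   net_delta /tmp/net.old /tmp/net.new
```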

The all-purpose observation tool sar

sar is an almost universal observation tool that covers CPU, memory, disk, network, and more, unlike the commands above, each of which focuses on a single area. Because it is so powerful, it is also much harder to master. Common uses of sar are as follows:

# CPU usage
sar -u ALL 1
# Run queue and load
sar -q 1
# Interrupt counts
sar -I SUM 1
# Process creation rate and context switch rate
sar -w 1
# Memory usage, dirty pages, and slab
sar -r ALL 1
# Page faults and memory page scanning
sar -B 1
# Swap usage
sar -S 1 1
sar -W 1
# Disk IOPS
sar -dp 1
# File descriptors and number of open terminals
sar -v 1 1
# Network interface usage
sar -n DEV 1
# TCP-layer packets received and sent
sar -n TCP,ETCP 1
# Socket usage
sar -n SOCK 1

This is only a partial list of what sar can do. It can observe much more; see man sar for details.

The USE method

Brendan Gregg, a performance optimization expert, formulated the USE method, which tells us which metrics to focus on for hardware resources:

  1. Utilization: the percentage of a resource that is in use, such as CPU utilization or memory utilization.
  2. Saturation: how overloaded a resource is. Most resources queue work when they are busy, so saturation is the number of tasks waiting in the queue.
  3. Errors: the errors that occur when accessing the resource.

Note: utilization means different things for different resources. For the CPU it is time-based: if code actually executes for 800 ms out of 1 s, CPU utilization is 80%. For memory it is capacity-based: 50% means that half of the capacity is in use.
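
The time-based definition of CPU utilization can be checked directly against the kernel's counters. The sketch below uses a hypothetical function name, cpu_util, and takes two samples of the first line of /proc/stat. It treats idle and iowait time as "not busy", which is one common convention and an assumption here.

```shell
# cpu_util OLD NEW
# OLD and NEW are two samples of the aggregate "cpu ..." line from
# /proc/stat. Fields 2-9 are user, nice, system, idle, iowait, irq,
# softirq and steal ticks. Prints the busy percentage over the interval.
cpu_util() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total = 0; for (i = 2; i <= 9; i++) total += $i; idle = $5 + $6 }
        NR == 1 { t0 = total; i0 = idle }
        NR == 2 {
            dt = total - t0
            if (dt > 0) printf "%.1f\n", 100 * (dt - (idle - i0)) / dt
        }
    '
}

# Possible usage:
#   a=$(head -n 1 /proc/stat); sleep 1; b=$(head -n 1 /proc/stat)
#   cpu_util "$a" "$b"
```

If 800 of the 1000 ticks in an interval are spent running code, the function prints 80.0, matching the 800 ms out of 1 s example.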

These three metrics can be understood using thread pools in Java, as follows:

  1. A thread pool is a software resource.
  2. Thread pool utilization: the percentage of the pool's threads that are currently running tasks.
  3. Thread pool saturation: the number of tasks currently waiting in the pool's task queue.
  4. Thread pool errors: the number of times the pool's rejection policy was triggered.

The four common resources (CPU, memory, disk, and network) map onto the USE method as follows:

Resource  Utilization                            Saturation                                           Errors
CPU       vmstat 1: "us" + "sy" + "st"           vmstat 1: "r" greater than the number of vCPUs
          sar -u: likewise                       sar -q: likewise
Memory    free -m: "available"                   vmstat 1: "si" and "so"
          vmstat 1: "free" and "swap"            sar -B: "pgscank" and "pgscand"
          sar -r: "%memused"                     dmesg | grep killed
Disk      iostat -xz 1: "%util"                  iostat -xz 1: "avgqu-sz" > 1 or "await" stays high
          sar -d 1: "%util"                      sar -d 1: likewise
Network   iftop: rates                           ifconfig: "overruns" and "dropped"                   ifconfig: "errors" and "dropped"
          nicstat: "%Util"                       netstat -s|grep "segments retransmited"              sar -n EDEV 1: "rxerr/s" and "txerr/s"
          sar -n DEV 1: "rxkB/s" and "txkB/s"    sar -n EDEV 1: "*drop" and "*fifo"

All-purpose monitoring tools

There are also a number of comprehensive observation tools similar to sar. Most of them look quite flashy and see more use on personal laptops than on servers:

dstat

bpytop

glances

nmon

Past articles

  • Awk is really a magic tool
  • Linux text command tips (Part 1)
  • Linux text command tips (Part 2)
  • Character encoding solution