“This is day 27 of my participation in the November Writing Challenge. See the event details: The Last Writing Challenge of 2021”

I. Foreword

The Master said: “If one reviews the old and learns the new, one can become a teacher.”

(The Analects of Confucius)

II. top

Execute command:

top [-] [d delay] [p pid] [q] [c] [C] [S] [s] [i] [n iterations]

Parameter Description:

  • d: Specifies the interval between screen refreshes. The user can also change it with the s interactive command.
  • p: Monitors only the process with the specified process ID.
  • q: Makes top refresh continuously without any delay. If the caller has superuser privileges, top runs at the highest possible priority.
  • S: Specifies cumulative mode.
  • s: Runs top in secure mode, which disables potentially dangerous interactive commands.
  • i: Makes top hide idle and zombie processes.
  • c: Displays the full command line instead of just the command name.

Command description:

  1. System uptime and load average:
top - 20:20:16 up 16:18,  4 users,  load average: 0.00, 0.01, 0.04

This first line is the same as the output of the uptime command.

These fields show:

  • The current time
  • Time the system has been running
  • Number of current logged-in users
  • Average load in the last 1, 5, and 15 minutes.

You can use the ‘l’ command to toggle the display of this uptime/load average line.
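
The load averages are easiest to judge relative to the number of CPUs. The following is a minimal Python sketch, not part of top itself: the helper names `parse_load_average` and `load_per_cpu` are hypothetical, and the regular expression assumes the header line looks like the sample above.

```python
import os
import re

# Matches e.g. "top - 20:20:16 up 16:18,  4 users,  load average: 0.00, 0.01, 0.04"
LOAD_RE = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")

def parse_load_average(line):
    """Return the 1-, 5- and 15-minute load averages as floats."""
    match = LOAD_RE.search(line)
    if match is None:
        raise ValueError("no load average found in: " + line)
    return tuple(float(v) for v in match.groups())

def load_per_cpu(load, cpus=None):
    """Divide a load value by the CPU count (defaults to this machine's count)."""
    cpus = cpus or os.cpu_count() or 1
    return load / cpus

if __name__ == "__main__":
    header = "top - 20:20:16 up 16:18,  4 users,  load average: 0.00, 0.01, 0.04"
    print(parse_load_average(header))  # (0.0, 0.01, 0.04)
    # On Linux/Unix the same three numbers are also available directly:
    print(os.getloadavg())
```

On a 4-CPU machine, for example, a load of 4.0 means roughly one runnable process per CPU.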

  2. Tasks
Tasks: 112 total,   1 running, 104 sleeping,   7 stopped,   0 zombie

Tasks – There are currently 112 processes in the system: 1 running, 104 sleeping, 7 stopped, and 0 zombie. The display of this task summary can be toggled with ‘t’.
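
To check the task line automatically (for example, to alert on zombie processes), it can be split into per-state counts. A small Python sketch; `parse_tasks` is a hypothetical helper that assumes the "number name" layout shown above.

```python
import re

def parse_tasks(line):
    """Turn 'Tasks: 112 total, 1 running, ...' into {'total': 112, 'running': 1, ...}."""
    return {name: int(count) for count, name in re.findall(r"(\d+)\s+(\w+)", line)}

if __name__ == "__main__":
    tasks = parse_tasks("Tasks: 112 total,   1 running, 104 sleeping,   7 stopped,   0 zombie")
    # The per-state counts add up to the total: 1 + 104 + 7 + 0 = 112
    states = tasks["running"] + tasks["sleeping"] + tasks["stopped"] + tasks["zombie"]
    print(states == tasks["total"])  # True
    if tasks["zombie"] > 0:
        print("zombie processes present:", tasks["zombie"])
```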

  3. CPU states
%Cpu(s):  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

Here is the percentage of CPU time used in different modes. These different CPU times are represented as:

  • us, user: CPU time spent running un-niced user processes
  • sy, system: CPU time spent running kernel code
  • ni, nice: CPU time spent running user processes whose priority (nice value) has been adjusted
  • id, idle: idle CPU time
  • wa, IO wait: CPU time spent waiting for I/O to complete
  • hi: CPU time spent handling hardware interrupts
  • si: CPU time spent handling software interrupts
  • st: CPU time stolen from this virtual machine by the hypervisor

The display can be switched using the ‘t’ command.

  • 0.0 us – percentage of CPU time used in user space
  • 0.3 sy – percentage of CPU time used in kernel space
  • 0.0 ni – percentage of CPU time used by processes whose priority was changed
  • 99.7 id – percentage of idle CPU time
  • 0.0 wa – percentage of CPU time spent waiting for I/O
  • 0.0 hi – percentage of CPU time used by hardware interrupts (IRQ)
  • 0.0 si – percentage of CPU time used by software interrupts
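
top derives these percentages from the cumulative CPU counters in /proc/stat: it takes two samples and divides the growth of each counter by the growth of the total. Below is a Python sketch of that calculation, assuming the Linux /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal). The two sample lines are made up so that the result matches the %Cpu(s) line above, and the function names are hypothetical.

```python
import os
import time

# Order of the first eight counters on the "cpu" line of /proc/stat
MODES = ["us", "ni", "sy", "id", "wa", "hi", "si", "st"]

def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat (guest time is already in us/ni)."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(MODES, (int(v) for v in fields[1:9])))

def cpu_percentages(before, after):
    """Share of CPU time spent in each mode between two samples, in percent."""
    delta = {mode: after[mode] - before[mode] for mode in MODES}
    total = sum(delta.values()) or 1  # avoid division by zero
    return {mode: round(100.0 * ticks / total, 1) for mode, ticks in delta.items()}

def read_cpu_sample(path="/proc/stat"):
    with open(path) as f:
        return parse_cpu_line(f.readline())

if __name__ == "__main__":
    # Made-up samples: 1000 ticks elapsed, 3 in system mode, 997 idle
    first = parse_cpu_line("cpu 100 0 10 900 0 0 0 0 0 0")
    second = parse_cpu_line("cpu 100 0 13 1897 0 0 0 0 0 0")
    print(cpu_percentages(first, second))  # sy 0.3, id 99.7, everything else 0.0
    if os.path.exists("/proc/stat"):
        live = read_cpu_sample()
        time.sleep(1)
        print(cpu_percentages(live, read_cpu_sample()))
```
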
  4. Memory usage
KiB Mem :   995896 total,   432992 free,   168912 used,   393992 buff/cache
KiB Swap:  2097148 total,  2084084 free,    13064 used.   621592 avail Mem

The next two lines show memory usage, somewhat like the ‘free’ command.

  • The first line is physical memory usage
    • The physical memory line shows: total memory, free memory, used memory, and buffer/cache memory.
  • The second line is virtual memory usage (swap space)
    • The swap line shows: total, free, and used swap space, plus the estimated available memory (avail Mem).

Memory display can be toggled with the ‘m’ command.

  • 995896 total – total physical memory (KiB)
  • 168912 used – memory in use
  • 432992 free – free memory
  • 393992 buff/cache – memory used for buffers and cache

Swap partition:

  • 2097148 total – total swap space
  • 13064 used – swap space in use
  • 2084084 free – free swap space
  • 621592 avail Mem – estimated memory available for starting new applications without swapping

On line 4, free is memory the kernel is not using for anything. Not all of the memory the kernel manages is actively in use: memory that was used in the past for buffers and cache can be reused, but the kernel does not return it to free. As a result, free memory on Linux tends to shrink over time. This is normal and nothing to worry about.

If you are used to calculating free memory, here is an approximate formula: free + buffers + cached = memory available to the server. In the output above, buffers and cached are combined into the buff/cache field on line 4, so the estimate is 432992 + 393992 = 826984 KiB.
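
The arithmetic can be checked against the sample output. The Python sketch below (`approx_available_kib` is a hypothetical name) compares the free + buff/cache rule of thumb with top's own avail Mem figure. The kernel's estimate is lower because not every cached page can be reclaimed right away, so the rule of thumb is usually optimistic.

```python
def approx_available_kib(free_kib, buff_cache_kib):
    """Rule of thumb from the text: free + buffers + cached (shown together as buff/cache)."""
    return free_kib + buff_cache_kib

if __name__ == "__main__":
    # Figures taken from the "KiB Mem" and "KiB Swap" lines above (KiB)
    free_kib, buff_cache_kib, avail_mem_kib = 432992, 393992, 621592
    estimate = approx_available_kib(free_kib, buff_cache_kib)
    print(estimate)                  # 826984
    print(estimate - avail_mem_kib)  # 205392 KiB more than the kernel's own estimate
```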

For memory monitoring, the value to watch in top is used on the swap line (line 5). If this value keeps changing, the kernel is constantly moving data between memory and swap, which means memory is genuinely insufficient.

  5. Status of each process (task)

Column description:

  • PID: the unique identifier of the process.
  • USER: the user name of the process owner.
  • PR: the scheduling priority of the process. Some processes show ‘rt’ in this field, which means they run with real-time priority.
  • NI: the nice value of the process. A smaller value means a higher priority: negative values mean high priority and positive values mean low priority.
  • VIRT: the total virtual memory used by the process, in KB. VIRT = SWAP + RES
  • RES: resident memory size, i.e., the physical memory used by the process that has not been swapped out, in KB. RES = CODE + DATA
  • SHR: the shared memory used by the process, in KB.
  • S: This is the state of the process. It has the following different values:
    • D – uninterruptible sleep
    • R – running
    • S – sleeping
    • T – traced or stopped
    • Z – zombie
  • %CPU: Percentage of CPU time used by the task since the last update.
  • %MEM: percentage of physical memory used by the process.
  • TIME+: total CPU time used since the process started, accurate to a hundredth of a second.
  • COMMAND: the command used to start the process (command name or full command line).
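
On Linux, top reads these per-process values from /proc. The rough Python sketch below rebuilds VIRT, RES and SHR from /proc/<pid>/status. The mapping (VmSize for VIRT, VmRSS for RES, RssFile + RssShmem for SHR) is an approximation, RssFile and RssShmem only exist on Linux 4.5 and later, and the helper names are hypothetical.

```python
import os

def parse_status(text):
    """Collect the kB-valued fields of a /proc/<pid>/status file."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if len(parts) == 2 and parts[1] == "kB":
            fields[key] = int(parts[0])
    return fields

def top_like_memory(fields):
    """Approximate top's VIRT, RES and SHR columns (kB) from status fields."""
    return {
        "VIRT": fields.get("VmSize", 0),
        "RES": fields.get("VmRSS", 0),
        "SHR": fields.get("RssFile", 0) + fields.get("RssShmem", 0),
    }

if __name__ == "__main__":
    path = f"/proc/{os.getpid()}/status"  # this Python process itself
    if os.path.exists(path):
        with open(path) as f:
            print(top_like_memory(parse_status(f.read())))
```
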
  6. Interactive commands: press ‘h’ for help

In the basic top view, press the number key 1 to show the status of each logical CPU (this VM has only one CPU).

Count Java threads (the '[j]ava' pattern keeps the grep process itself out of the count):

ps -eLf | grep '[j]ava' | wc -l

Count network client connections (replace <listening port> with the port your service listens on):

netstat -n | grep tcp | grep <listening port> | wc -l
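
A plain grep for the port also matches rows where that number appears on the remote side, for example outgoing connections to another server's port 8080. The minimal Python sketch below counts only rows whose local address uses the port, optionally filtered by state. It assumes the usual netstat -n column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The sample output and port 8080 are invented for illustration, and `count_connections` is a hypothetical helper.

```python
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:8080           10.0.0.9:51234          ESTABLISHED
tcp        0      0 10.0.0.5:8080           10.0.0.7:40112          TIME_WAIT
tcp        0      0 10.0.0.5:52000          10.0.0.8:8080           ESTABLISHED
"""

def count_connections(netstat_output, port, state=None):
    """Count TCP rows whose *local* address ends in :port (optionally only one state)."""
    count = 0
    for line in netstat_output.splitlines():
        parts = line.split()
        # Skip headers and anything that is not a tcp/tcp6 row
        if len(parts) < 6 or not parts[0].startswith("tcp"):
            continue
        local_address, conn_state = parts[3], parts[5]
        if local_address.rsplit(":", 1)[-1] != str(port):
            continue
        if state is None or conn_state == state:
            count += 1
    return count

if __name__ == "__main__":
    print(count_connections(SAMPLE, 8080))                       # 2 (grep 8080 would say 3)
    print(count_connections(SAMPLE, 8080, state="ESTABLISHED"))  # 1
```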

III. vmstat

Execute command:

vmstat 2 1

  • 2 means the server status is sampled every two seconds.
  • 1 means data is collected only once.

Field description:

  • r: the run queue, i.e., the number of processes running on or waiting for a CPU. The server I tested currently has an idle CPU. When this value exceeds the number of CPUs, there is a CPU bottleneck. It is also related to the load shown by top: as a rule of thumb, a load above 3 is relatively high, above 5 is high, and above 10 is abnormal, meaning the server is in a dangerous state. top's load is essentially this run queue averaged over time. If the run queue is too long, the CPU is busy, which generally shows up as high CPU utilization.
  • b: the number of blocked processes.
  • swpd: the amount of swap space in use. If it is greater than 0, the machine has been short of physical memory.
  • free: the amount of free physical memory. My machine has 8 GB in total, with 3415 MB free.
  • buff: memory Linux/Unix uses to cache directory contents, permissions, and similar metadata. On my machine it is a little over 300 MB.
  • cache: memory used to cache the contents of files we have opened, acting as a file buffer. On my machine it is about 300 MB. (This is where Linux/Unix is clever: part of the free physical memory is used as a file and directory cache to improve performance, and when programs need memory, buffer/cache is reclaimed quickly.)
  • si: the amount of memory swapped in from disk per second. If this value is greater than 0, physical memory is insufficient or there is a memory leak. My machine has plenty of memory and everything is fine.
  • so: the amount of memory swapped out to disk per second. If it is greater than 0, the same applies.
  • bi: the number of blocks read from block devices (all disks and other block devices on the system) per second; the default block size is 1024 bytes. My machine does little I/O, so it stays at 0, but I have seen it reach 140,000/s on a machine copying a lot of data (2-3 TB), which is roughly 140 MB/s of disk throughput.
  • bo: the number of blocks written to block devices per second. For example, when we write a file, bo becomes greater than 0. bi and bo should generally be close to 0; otherwise I/O is too frequent and needs tuning.
  • in: the number of interrupts per second, including clock interrupts.
  • cs: the number of context switches per second. System calls, thread switches, and process switches all cause context switches, so keep this number as low as possible; if it is too high, consider reducing the number of threads or processes. For web servers such as Apache and Nginx, for example, we usually run performance tests with thousands or even tens of thousands of concurrent connections. Under such a load test you can lower the peak number of processes or threads step by step until cs drops to a relatively small value; that process/thread count is then about right. The same applies to system calls: every call enters kernel space and causes a context switch, which is expensive, so frequent system calls should be avoided. Too many context switches mean the CPU spends much of its time switching instead of doing useful work, which is undesirable.
  • us: user CPU time. On a server doing heavy encryption and decryption, I have seen us approach 100 and the r run queue reach 80 (the machine was under stress testing and performing poorly). A high us means user processes are consuming a lot of CPU time; if it stays above 50% for a long time, consider optimizing the program's algorithms or otherwise speeding it up.
  • sy: system CPU time. If it is too high, the system is spending a long time in system calls, for example because of frequent I/O.
  • id: idle CPU time. Together, us + sy + id + wa + st = 100, i.e., the user, system, idle, I/O wait, and stolen CPU percentages add up to 100.
  • wa: CPU time spent waiting for I/O. A high wa indicates serious I/O waiting, which may be caused by heavy random disk access or a disk bottleneck (blocking operations).
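
The rules of thumb above can be applied to vmstat output mechanically. The Python sketch below assumes the standard two header lines followed by numeric rows. The sample rows, the 20% wa threshold, and the helper names (`parse_vmstat`, `warnings_for`, `blocks_to_mib_per_s`) are hypothetical. It also converts bi/bo to MiB/s using the 1024-byte block size mentioned above.

```python
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 3497000 312000 305000    0    0     0     0   50  100  1  0 99  0  0
 3  1  13064 120000  20000  80000   40   12   900   300  800 2400 60 10 10 20  0
"""

def parse_vmstat(output):
    """Return one dict per data row, keyed by the column names (r, b, swpd, ...)."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    header = next(cols for cols in rows if cols[0] == "r")
    return [dict(zip(header, map(int, cols))) for cols in rows if cols[0].isdigit()]

def warnings_for(row, cpus):
    """Apply the rules of thumb from the text to one vmstat row."""
    notes = []
    if row["r"] > cpus:
        notes.append("run queue longer than CPU count")
    if row["si"] > 0 or row["so"] > 0:
        notes.append("swapping in/out")
    if row["us"] > 50:
        notes.append("user CPU above 50%")
    if row["wa"] > 20:  # hypothetical threshold
        notes.append("high I/O wait")
    return notes

def blocks_to_mib_per_s(blocks_per_s, block_size=1024):
    """Convert vmstat bi/bo (blocks per second) to MiB per second."""
    return blocks_per_s * block_size / (1024 * 1024)

if __name__ == "__main__":
    for row in parse_vmstat(SAMPLE):
        print(row["r"], warnings_for(row, cpus=2))
    print(blocks_to_mib_per_s(140000))  # 136.71875, i.e. roughly 140 MB/s
```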

IV. iostat

Installation:

yum install sysstat

Usage:

iostat [options] [interval] [count]

Parameter Description:

  • -c: displays CPU usage
  • -d: displays disk usage
  • -k: displays data in KB
  • -m: displays data in MB
  • -N: displays device-mapper (LVM) device names
  • -n: displays NFS usage
  • -p: displays usage for each partition of each disk
  • -t: prints the time for each report
  • -x: displays extended statistics

Fields of the extended report (iostat -x):

  • rrqm/s: read requests merged per second. (When a system call needs to read data, the VFS passes the request to the file system; if the file system finds that different read requests target the same block, it merges them.)
  • wrqm/s: write requests merged per second.
  • rsec/s: sectors read per second.
  • wsec/s: sectors written per second.
  • rkB/s: kilobytes read from the device per second.
  • wkB/s: kilobytes written to the device per second.
  • avgrq-sz: average request size, in sectors.
  • avgqu-sz: average request queue length. Naturally, the shorter the queue, the better.
  • await: average time per I/O request, in milliseconds (ms). In general, system I/O response time should be below 5 ms; above 10 ms is relatively large. This time includes both queueing time and service time, so await is usually larger than svctm. The smaller the difference between them, the shorter the queueing time; the larger the difference, the longer the queueing time, which indicates a problem.
  • svctm: average service time per device I/O, in milliseconds. If svctm is very close to await, there is almost no I/O waiting and the disk is performing well. If await is much higher than svctm, the I/O queue is too long and applications on the system will be slow.
  • %util: time spent processing I/O divided by the elapsed time. For example, if the sampling interval is 1 second and the device spends 0.8 seconds processing I/O and 0.2 seconds idle, then %util = 0.8/1 = 80%. This parameter therefore shows how busy the device is.
  • In general, a %util of 100% means the device is running at close to full capacity. (For a multi-disk device, however, even 100% %util does not necessarily mean the disks are a bottleneck, because they can serve requests concurrently.)
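
iostat computes these columns from the cumulative counters in /proc/diskstats. The Python sketch below shows the arithmetic for a few of them. It assumes the Linux /proc/diskstats field order and 512-byte sectors, uses two invented samples one second apart, and is an illustration rather than sysstat's actual code. In the sample the disk was busy for 800 ms of the 1000 ms interval, so %util comes out at 80%, as in the example above. await is 1000 ms of I/O time spread over 200 requests, which gives 5 ms.

```python
from dataclasses import dataclass

@dataclass
class DiskSample:
    reads: int          # reads completed
    read_merges: int    # reads merged
    read_sectors: int   # sectors read (512 bytes each)
    read_ms: int        # time spent reading (ms)
    writes: int         # writes completed
    write_merges: int   # writes merged
    write_sectors: int  # sectors written
    write_ms: int       # time spent writing (ms)
    io_ms: int          # time the device was busy doing I/O (ms)
    queue_ms: int       # weighted time spent doing I/O (ms)

def parse_diskstats_line(line):
    """Return (device name, DiskSample) for one line of /proc/diskstats."""
    f = line.split()
    v = [int(x) for x in f[3:14]]
    # v[8] is the number of I/Os currently in flight, which is not needed below
    return f[2], DiskSample(*v[0:8], io_ms=v[9], queue_ms=v[10])

def iostat_metrics(before, after, interval_s):
    """Compute a few `iostat -x` style columns from two samples interval_s apart."""
    def d(name):
        return getattr(after, name) - getattr(before, name)

    ios = d("reads") + d("writes")
    interval_ms = interval_s * 1000
    return {
        "rrqm/s": d("read_merges") / interval_s,
        "wrqm/s": d("write_merges") / interval_s,
        "rkB/s": d("read_sectors") * 512 / 1024 / interval_s,
        "wkB/s": d("write_sectors") * 512 / 1024 / interval_s,
        "await": (d("read_ms") + d("write_ms")) / ios if ios else 0.0,
        "avgqu-sz": d("queue_ms") / interval_ms,
        "%util": 100.0 * d("io_ms") / interval_ms,
    }

if __name__ == "__main__":
    # Two made-up samples of the same disk, taken one second apart
    _, first = parse_diskstats_line("8 0 sda 1000 10 80000 2000 500 50 40000 3000 0 4000 5000")
    _, second = parse_diskstats_line("8 0 sda 1100 20 96000 2400 600 70 48000 3600 0 4800 6000")
    print(iostat_metrics(first, second, 1.0))
```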

Common usage:

iostat -d -k 1 10      # check TPS and throughput (read/write speed in KB)
iostat -d -m 2         # check TPS and throughput (read/write speed in MB)
iostat -d -x -k 1 10   # view device utilization (%util) and response time (await)
iostat -c 1 10         # check CPU status

Note:

  • High network-card throughput can lead to higher CPU usage.
  • Heavy CPU overhead in turn increases the number of memory requests.
  • Large numbers of memory and disk requests can in turn cause more CPU and I/O problems.

V. free

  • Mem: the second line shows memory usage.
  • Swap: the third line shows swap space usage.
  • total: total physical memory and swap space available to the system.
  • used: physical memory and swap space already in use.
  • free: physical memory and swap space not in use.
  • shared: physical memory being shared.
  • buff/cache: physical memory used by buffers and cache.
  • available: physical memory that can still be made available to applications.
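
A sketch of how these columns relate, in Python, assuming the /proc/meminfo field names of current Linux kernels. buff/cache is taken as Buffers + Cached + SReclaimable, and used as total - free - buff/cache. That reproduces the used figure from the top example earlier (995896 - 432992 - 393992 = 168912), but the exact formula has varied between procps versions, so treat it as an approximation. The split of buff/cache in the sample dictionary is invented, and the helper names are hypothetical.

```python
import os

def free_columns(meminfo):
    """Approximate the Mem: row of `free` (KiB) from /proc/meminfo values."""
    buff_cache = meminfo["Buffers"] + meminfo["Cached"] + meminfo.get("SReclaimable", 0)
    return {
        "total": meminfo["MemTotal"],
        "used": meminfo["MemTotal"] - meminfo["MemFree"] - buff_cache,
        "free": meminfo["MemFree"],
        "shared": meminfo.get("Shmem", 0),
        "buff/cache": buff_cache,
        "available": meminfo.get("MemAvailable", meminfo["MemFree"] + buff_cache),
    }

def read_meminfo(path="/proc/meminfo"):
    """Parse /proc/meminfo into {field: value}; most values are in kB."""
    values = {}
    with open(path) as f:
        for line in f:
            key, rest = line.split(":", 1)
            values[key] = int(rest.split()[0])
    return values

if __name__ == "__main__":
    # Made-up split of buff/cache, chosen to reproduce the top output shown earlier
    sample = {"MemTotal": 995896, "MemFree": 432992, "Buffers": 2000,
              "Cached": 350000, "SReclaimable": 41992, "Shmem": 0, "MemAvailable": 621592}
    print(free_columns(sample))  # used = 995896 - 432992 - 393992 = 168912
    if os.path.exists("/proc/meminfo"):
        print(free_columns(read_meminfo()))
```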

VI. iftop

Interface description:

  • The top of the screen shows a scale, like a ruler, which sets the range of the traffic bar graphs.

  • The left and right arrows in the middle indicate the direction of traffic.

  • TX: sends traffic

  • RX: receives traffic

  • TOTAL: indicates the TOTAL traffic

  • cum: cumulative traffic since iftop started

  • peak: peak traffic

  • rates: average traffic over the past 2 s, 10 s, and 40 s, respectively
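
The three rates columns can be thought of as moving averages over different windows. The minimal Python sketch below illustrates that idea, assuming one byte count is recorded per second. It is not iftop's actual implementation, and `RateTracker` is a hypothetical class. It reports bits per second, iftop's default unit.

```python
from collections import deque

class RateTracker:
    """Keep per-second byte counts and report averages over several windows."""

    def __init__(self, windows=(2, 10, 40)):
        self.windows = windows
        self.samples = deque(maxlen=max(windows))  # only the longest window is kept

    def add_second(self, nbytes):
        self.samples.append(nbytes)

    def rates_bits_per_s(self):
        history = list(self.samples)
        rates = {}
        for w in self.windows:
            recent = history[-w:]  # fewer than w seconds if the history is short
            rates[f"{w}s"] = 8 * sum(recent) / len(recent) if recent else 0.0
        return rates

if __name__ == "__main__":
    tracker = RateTracker()
    for nbytes in [1000] * 38 + [0, 0]:  # steady traffic, then two idle seconds
        tracker.add_second(nbytes)
    print(tracker.rates_bits_per_s())  # {'2s': 0.0, '10s': 6400.0, '40s': 7600.0}
```

The short window reacts immediately to the idle seconds, while the 40-second average barely moves. This is why the three columns together show both the current rate and the longer-term trend.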

Common parameters:

  • -i sets the network interface to monitor, e.g. # iftop -i eth1

  • -B displays traffic in bytes (the default is bits), e.g. # iftop -B

  • -n displays host information as IP addresses instead of resolving host names, e.g. # iftop -n

  • -N displays port information as port numbers instead of service names, e.g. # iftop -N

  • -F displays incoming and outgoing traffic for a specific network segment, e.g. # iftop -F 100.100.30.25 or # iftop -F 100.100.30.25/255.255.255.0

  • -h displays the help message

  • -p runs in promiscuous mode, so the list in the middle also shows traffic between hosts other than the local host.

  • -b hides the traffic bar graphs.

  • -f sets a filter expression that selects which packets are counted.

  • -P turns on port display, so both host and port information are shown.

  • -m sets the maximum value of the scale at the top of the screen; the scale is shown in five major segments, e.g. # iftop -m 100M
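
The net/mask form accepted by -F describes a whole segment, not just one address. Python's standard ipaddress module understands the same notation, which makes it a quick way to check which hosts fall inside such a segment. The sketch below only mirrors the idea of the filter, not iftop's code, and `in_segment` is a hypothetical helper.

```python
import ipaddress

def in_segment(ip, spec):
    """True if ip lies in the network written as 'address/netmask' (host bits ignored)."""
    network = ipaddress.ip_network(spec, strict=False)
    return ipaddress.ip_address(ip) in network

if __name__ == "__main__":
    spec = "100.100.30.25/255.255.255.0"
    print(ipaddress.ip_network(spec, strict=False))  # 100.100.30.0/24
    print(in_segment("100.100.30.200", spec))       # True
    print(in_segment("100.100.31.7", spec))         # False
```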

Keyboard commands inside the iftop screen (note that they are case-sensitive).

Common operations:

  • Press h to switch whether help is displayed;

  • Press n to switch the IP address or host name of the host.

  • Press s to toggle display of the local host information.

  • Press d to toggle display of the remote target host information.

  • Press t to cycle the display format: 2 lines / 1 line / sent traffic only / received traffic only.

  • Press N to display the port number or port service name.

  • Press S to switch whether the port information of the host is displayed.

  • Press D to switch whether the port information of the remote target host is displayed.

  • Press p to toggle display of port information.

  • Press P to pause/continue display;

  • Press b to toggle display of the average-traffic bar graphs.

  • Press B to switch the bar graphs between the 2-second, 10-second, and 40-second averages;

  • Press T to switch whether to display the total traffic of each connection;

  • Press l to enable screen filtering: type the text to filter on, such as an IP address, and press Enter to show only traffic related to that address.

  • Press L to switch the scale at the top of the screen (linear or logarithmic); the traffic bars change with the scale.

  • Press J or K to scroll up or down the connection record displayed on the screen;

  • Press 1, 2 or 3 to sort traffic data according to the three columns displayed on the right.

  • Press < to sort by the local host name or IP address;

  • Press > to sort by the host name or IP address of the remote target host.

  • Press o to freeze the current order of the connection list.

  • Press f to edit the filter code (I haven't used this one yet).

  • Press ! to run a shell command (I haven't used this one either, so I'm not sure which commands work here).

  • Press q to exit monitoring.