This is my first article for the beginners' introduction event

Thanks to the Nuggets campaign, which gave me a chance to settle down and start something I had always wanted to do but never did: writing

Background

I believe most of your programs run on servers, and most of those servers run Linux. During day-to-day development, deployment, debugging, and investigation of production issues, there are always problems that need to be tracked down. This article lists some common scenarios and shell commands from daily operations and maintenance. I hope it helps you with operations work on Linux

Analysis methods

When a program on the server misbehaves, troubleshooting breaks down into the following three steps

  • Program checks
  • Checking the Service Log
  • Checking System Resources

Program checks

When a problem shows up, the first step is to check whether the program itself is running normally, mainly with the following commands

The ps command

ps is a powerful process-viewing command that reports the current status of processes on the system. It comes up often in daily work and is frequently combined with pipes. It takes many options; the most common combination is aux: ps aux lists every process on the machine in a user-oriented format

– [Scenario 1] View information about a specified process

ps aux | grep nginx

View information about nginx processes
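One wrinkle with this pipeline is that the grep process itself can show up in the results, because its own command line contains the pattern. A common workaround is to wrap one character of the pattern in brackets. The sketch below demonstrates the idea on hypothetical sample text (not real ps output) so the result is reproducible:

```shell
# Hypothetical `ps aux` excerpt. On a live system the last line would be the
# grep process started by the pipeline itself.
sample='root   812  0.0  0.1 nginx: master process /usr/sbin/nginx
www    813  0.0  0.2 nginx: worker process
admin  990  0.0  0.0 grep nginx'

# Plain pattern: the grep line is counted too (prints 3).
printf '%s\n' "$sample" | grep -c 'nginx'

# Bracket trick: the regex [n]ginx still matches "nginx", but a grep started
# as `grep "[n]ginx"` carries the literal text "[n]ginx" in its command line,
# which the regex does not match (prints 2).
sample_bracket='root   812  0.0  0.1 nginx: master process /usr/sbin/nginx
www    813  0.0  0.2 nginx: worker process
admin  990  0.0  0.0 grep [n]ginx'
printf '%s\n' "$sample_bracket" | grep -c '[n]ginx'

# Live usage: ps aux | grep '[n]ginx'
```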

– [Scenario 2] View the number of processes run by the admin user

ps -ef | awk '{if ($1 == "admin") count++} END {print count}'

Combining pipes with grep and awk lets you do a great deal more
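As a small illustration, the per-user count can be wrapped in a reusable function. This is only a sketch: count_user_procs is a hypothetical helper name, and the sample text stands in for real ps -ef output so the result is reproducible. Printing count + 0 makes the function print 0 rather than an empty line when nothing matches.

```shell
# count_user_procs USER: read `ps -ef` style output on stdin and print how
# many lines belong to USER (always prints a number, even 0).
count_user_procs() {
    awk -v user="$1" '$1 == user { count++ } END { print count + 0 }'
}

# Hypothetical sample of `ps -ef` output.
sample='UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 09:00 ?        00:00:01 /sbin/init
admin      812     1  0 09:01 ?        00:00:03 nginx: master process
admin      813   812  0 09:01 ?        00:00:00 nginx: worker process
www        901     1  0 09:02 ?        00:00:10 java -jar app.jar'

printf '%s\n' "$sample" | count_user_procs admin     # prints 2
printf '%s\n' "$sample" | count_user_procs nobody    # prints 0

# Live usage: ps -ef | count_user_procs admin
```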

dmesg

dmesg prints the kernel's message buffer to the terminal, including messages about kernel and hardware interaction. It can reveal TCP or hard disk failures as well as program memory problems

– [Scenario 1] A Java program exits abnormally. How do I find the cause?

Use dmesg to check whether the process ran out of memory (for example, because of a memory leak) and was killed by the kernel's OOM killer
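A minimal sketch of that check follows. The helper name find_oom and the search patterns are assumptions based on commonly seen kernel message wording, which varies between kernel versions; the sample lines are illustrative, not output from a real machine.

```shell
# find_oom: read kernel log text on stdin and keep the lines that point to
# the OOM killer. Patterns are an assumption and may need adjusting.
find_oom() {
    grep -iE 'out of memory|oom-killer|killed process'
}

# Illustrative sample of kernel log lines.
sample='[ 1001.100000] eth0: link up
[ 5432.100000] java invoked oom-killer: gfp_mask=0x201da, order=0
[ 5432.200000] Out of memory: Kill process 4321 (java) score 912 or sacrifice child
[ 5432.300000] Killed process 4321 (java) total-vm:8123456kB, anon-rss:7012345kB
[ 6000.000000] EXT4-fs (sda1): mounted filesystem'

# Prints the three OOM-related lines and drops the other two.
printf '%s\n' "$sample" | find_oom

# Live usage (may require root): dmesg | find_oom
```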

  • The dmesg timestamp problem

This is a problem I have actually run into, and one many people ask about: when no timestamps are displayed, it is hard to tell historical events from current ones

You can enable timestamps as follows

RedHat5: echo 1 > /sys/module/printk/parameters/printk_time

RedHat6: echo Y > /sys/module/printk/parameters/time

With timestamps in place, I can also build time-based monitoring on top of the dmesg output and catch problems promptly
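The kernel timestamps are seconds since boot, so a monitor usually wants them as wall-clock time. The sketch below shows one way to convert them. dmesg_wallclock is a hypothetical helper, it relies on `date -d @N` (available in GNU date, an assumption about the target system), and the result can drift on machines that have been suspended. Some dmesg versions can print readable times themselves with -T, where that option is available.

```shell
# dmesg_wallclock BOOT_EPOCH
# Reads "[seconds.micros] message" lines on stdin and prefixes each with a UTC
# wall-clock time computed as BOOT_EPOCH + seconds. Lines without a timestamp
# pass through unchanged.
dmesg_wallclock() {
    boot_epoch=$1
    while IFS= read -r line; do
        # Extract the whole seconds from a leading "[   123.456789]" stamp.
        secs=$(printf '%s\n' "$line" | sed -n 's/^\[ *\([0-9][0-9]*\)\..*/\1/p')
        if [ -n "$secs" ]; then
            stamp=$(date -u -d "@$((boot_epoch + secs))" '+%Y-%m-%d %H:%M:%S')
            printf '%s %s\n' "$stamp" "${line#*] }"
        else
            printf '%s\n' "$line"
        fi
    done
}

# Reproducible demo with a made-up boot time of 86400 (1970-01-02 00:00:00 UTC).
printf '%s\n' '[   90.500000] Out of memory: Kill process 4321 (java)' |
    dmesg_wallclock 86400
# Prints: 1970-01-02 00:01:30 Out of memory: Kill process 4321 (java)

# Live usage:
#   boot=$(( $(date +%s) - $(cut -d. -f1 /proc/uptime) ))
#   dmesg | dmesg_wallclock "$boot"
```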

Checking the Service Log

If the service is running properly but the results are not as expected, we need to check the service log for errors or warnings, which means working with files. The main Linux text-processing tools for this are grep, sed, and awk

The grep command

grep searches text for lines that match a pattern

  • [Scenario 1] View the context of the matching text

grep -B 10 "error" test.log

-A n: also prints the n lines after each match
-B n: also prints the n lines before each match
-C n: also prints the n lines before and after each match
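To see the difference between the context options -A (after), -B (before), and -C (both sides), here is a small demonstration on a made-up five-line log. sample_log is a hypothetical helper that prints the sample so nothing has to be written to disk.

```shell
# sample_log: print a tiny made-up log with one error line in the middle.
sample_log() {
    printf '%s\n' 'line1 start' 'line2 connect' 'line3 error timeout' \
        'line4 retry' 'line5 done'
}

sample_log | grep -B 1 'error'   # line2 and line3: the match plus 1 line before
echo '---'
sample_log | grep -A 1 'error'   # line3 and line4: the match plus 1 line after
echo '---'
sample_log | grep -C 1 'error'   # line2 to line4: 1 line on each side
```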

  • [Scenario 2] Count the number of matches

grep -c "warning" test.log

-c prints the number of matching lines

  • [Scenario 3] Recursively find subdirectories

grep -nr error ./log/

-r searches recursively, so every file under ./log/ and its subdirectories is checked for matching text. -n prints the line number of each match
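The following sketch runs both options against a throwaway directory tree. The file names and the search_logs helper are made up for illustration. It also shows that -c counts matching lines per file rather than individual occurrences.

```shell
# Build a throwaway directory tree (hypothetical file names).
demo=$(mktemp -d)
mkdir -p "$demo/log/sub"
printf '%s\n' 'start' 'error: disk full' > "$demo/log/app.log"
printf '%s\n' 'error: timeout' 'ok' 'error then error again' > "$demo/log/sub/worker.log"

# search_logs DIR PATTERN: recursive search with file names and line numbers.
# The output is sorted because the traversal order of -r is not guaranteed.
search_logs() {
    (cd "$1" && grep -rn "$2" . | LC_ALL=C sort)
}

found=$(search_logs "$demo/log" 'error')
printf '%s\n' "$found"
# ./app.log:2:error: disk full
# ./sub/worker.log:1:error: timeout
# ./sub/worker.log:3:error then error again

# Two lines match in worker.log, even though "error" occurs three times.
count=$(grep -c 'error' "$demo/log/sub/worker.log")
echo "$count"   # 2

rm -rf "$demo"
```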

The sed command

sed is a stream editor that can edit one or more files automatically, which cuts down on repetitive work

  • [Scenario 1] Replace all matching text in the file

sed -i 's/delete/deleted/g' demo.txt

s/delete/deleted/ replaces delete with deleted. The trailing g replaces every occurrence on each line; without it, only the first occurrence on each line is replaced
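Before editing a file in place, it can help to preview the substitution. Without -i, sed writes the result to stdout and leaves its input untouched. The short demonstration below uses made-up input strings and also shows one caveat of this particular pattern.

```shell
# Without the g flag: only the first match on the line is replaced.
printf '%s\n' 'delete a; delete b' | sed 's/delete/deleted/'
# deleted a; delete b

# With the g flag: every match on the line is replaced.
printf '%s\n' 'delete a; delete b' | sed 's/delete/deleted/g'
# deleted a; deleted b

# Caveat: the pattern also matches inside longer words, so text that already
# says "deleted" gains an extra "d".
printf '%s\n' 'already deleted' | sed 's/delete/deleted/g'
# already deletedd
```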

  • [Scenario 2] Delete the specified content

Delete every line containing "test"

sed -i '/test/d' demo.txt
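Because an in-place delete cannot be undone, one cautious approach is to preview the affected lines first and keep a backup. The sketch below uses a throwaway file; the .bak suffix is an arbitrary choice.

```shell
# Throwaway file with one line that contains "test".
demo=$(mktemp)
printf '%s\n' 'keep this' 'a test line' 'keep that' > "$demo"

# Preview which lines would be removed, with line numbers.
grep -n 'test' "$demo"          # 2:a test line

# Delete in place, keeping the original content as "$demo.bak".
sed -i.bak '/test/d' "$demo"

after=$(cat "$demo")            # the two "keep" lines remain
backup_lines=$(wc -l < "$demo.bak")   # the backup still has all 3 lines
printf '%s\n' "$after"

rm -f "$demo" "$demo.bak"
```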

The awk command

awk is a programming language and one of Linux's standard tools for processing text and data. It works well with pipes when investigating problems.

There are many more scenarios for sed and awk than can be covered here. If you are interested, I can write a separate article that discusses them in detail
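As one small example of awk in a pipeline, the sketch below counts log lines per level. The log format (level in the third whitespace-separated field) and the helper name count_levels are assumptions made for illustration.

```shell
# count_levels: read log lines on stdin and print "LEVEL COUNT" pairs.
# Assumes the level is the third whitespace-separated field.
count_levels() {
    awk '{ count[$3]++ } END { for (lvl in count) print lvl, count[lvl] }' |
        LC_ALL=C sort
}

printf '%s\n' \
    '2024-01-01 10:00:00 INFO service started' \
    '2024-01-01 10:00:05 ERROR connection refused' \
    '2024-01-01 10:00:06 WARN retrying' \
    '2024-01-01 10:00:09 ERROR connection refused' | count_levels
# ERROR 2
# INFO 1
# WARN 1
```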

Checking System Resources

If the steps above do not locate the problem, the next step is to check system resources for a system-level problem

CPU

Commonly used commands:

# Check the CPU usage of a specific process, sampled every second
pidstat -u 1 -p $pid
# Check system-wide CPU usage and the run queue, sampled every second
vmstat 1

  • [Scenario] View the three processes that use the most CPU

ps auxw|head -1; ps auxw|sort -rn -k3|head -3
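The same "header plus top N rows" pattern can be written as a single pass over the ps output, so ps only runs once. This is a sketch: top_by_column is a hypothetical helper, and the sample rows are made up so the result is reproducible.

```shell
# top_by_column COL N: read a table with one header line on stdin, print the
# header, then the N rows with the largest numeric value in column COL.
top_by_column() {
    col=$1
    n=$2
    # Consume and re-print the header, then sort whatever is left on stdin.
    IFS= read -r header && printf '%s\n' "$header"
    sort -rn -k"$col","$col" | head -n "$n"
}

# Made-up rows in a simplified `ps aux` layout.
sample='USER PID %CPU %MEM COMMAND
root 1 0.1 0.2 init
admin 812 35.0 1.5 nginx
www 901 80.5 20.1 java
db 950 12.3 40.0 mysqld
redis 970 2.0 3.0 redis-server'

printf '%s\n' "$sample" | top_by_column 3 3   # by %CPU: www, admin, db
printf '%s\n' "$sample" | top_by_column 4 3   # by %MEM: db, www, redis

# Live usage: ps auxw | top_by_column 3 3
```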

Disk

# Check I/O usage per process
iotop
# Check disk mounts and usage
df -h
# Check device I/O statistics (every 1 second, 10 reports)
iostat -d -x -k 1 10
# Check the total size of the current directory
du -sh ./
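For monitoring, the df output can be filtered down to filesystems that are nearly full. The sketch below assumes the POSIX output format of df -P and mount points without spaces; over_threshold is a hypothetical helper and the sample table is made up.

```shell
# over_threshold LIMIT: read `df -P` output on stdin and print the mount point
# and usage of every filesystem at or above LIMIT percent.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                      # "95%" -> "95"
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Made-up sample in `df -P` layout.
sample='Filesystem     1024-blocks     Used Available Capacity Mounted on
/dev/sda1         51475068 46327561   2509348      95% /
/dev/sdb1        103081248 20616249  77205735      22% /data
tmpfs              8133268  6913277   1219991      85% /dev/shm'

printf '%s\n' "$sample" | over_threshold 80
# / 95%
# /dev/shm 85%

# Live usage: df -P | over_threshold 80
```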

Memory

# Check memory usage
free -m
# View the three processes that use the most memory
ps auxw | head -1; ps auxw | sort -rn -k4 | head -3
# Sample memory, swap, and CPU statistics every 3 seconds, 3 times
vmstat 3 3
# Show the memory map of a process
pmap -d $pid
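When a single number is easier to alert on, the used share of memory can be computed from /proc/meminfo. This is a sketch: mem_used_percent is a hypothetical helper, the sample values are made up, and the MemAvailable field is only reported by reasonably recent kernels (an assumption about the target system).

```shell
# mem_used_percent: read /proc/meminfo-style text on stdin and print the share
# of memory in use, computed as (MemTotal - MemAvailable) / MemTotal.
mem_used_percent() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%.1f\n", (total - avail) * 100 / total }'
}

# Made-up sample values.
sample='MemTotal:        8000000 kB
MemFree:          500000 kB
MemAvailable:    2000000 kB'

printf '%s\n' "$sample" | mem_used_percent   # 75.0

# Live usage: mem_used_percent < /proc/meminfo
```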

Network

Network problems are hard to pin down on Linux because so many factors can interfere. This section lists only a few common commands

# View network statistics
netstat -s
# Find the process using a given port (helps with port conflicts or a service that cannot be reached)
netstat -anp | grep ":22"
# Capture 5 packets of one protocol on eth0 (icmp, tcp, or udp; for HTTP, filter by port, e.g. tcp port 80)
tcpdump -nn -c 5 -i eth0 icmp
# Monitor packets sent to the specified host and port
tcpdump -c 2 -q -XX -vvv -nn -i eth0 tcp and dst host X.X.X.X and dst port 22
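A quick summary of TCP connection states is often a useful first look, for example to spot a pile-up of TIME_WAIT connections. The sketch below parses netstat-style output; count_tcp_states is a hypothetical helper and the sample table is made up.

```shell
# count_tcp_states: read `netstat -ant` style output on stdin and print
# "STATE COUNT" pairs for TCP sockets (the state is the sixth column).
count_tcp_states() {
    awk '$1 ~ /^tcp/ { states[$6]++ }
         END { for (s in states) print s, states[s] }' | LC_ALL=C sort
}

# Made-up sample in `netstat -an` layout.
sample='Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:22             10.0.0.9:51234          ESTABLISHED
tcp        0      0 10.0.0.5:8080           10.0.0.7:40112          TIME_WAIT
tcp        0      0 10.0.0.5:8080           10.0.0.8:40555          ESTABLISHED
udp        0      0 0.0.0.0:68              0.0.0.0:*'

printf '%s\n' "$sample" | count_tcp_states
# ESTABLISHED 2
# LISTEN 1
# TIME_WAIT 1

# Live usage: netstat -ant | count_tcp_states
```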

The steps above are enough to locate most problems. Problems in the program itself need further debugging and investigation; for system problems, ask the relevant operations (OP) staff to resolve them

Finally, here is a website for looking up Linux commands

man.linuxde.net/

Time was short, so many details may not be fully clear. If you are interested, I can write a detailed explanation of each sub-topic. Thank you for your understanding ~


I am Wu Liu, a programmer who loves basketball and coding

Follow me and I will bring you more of what you want to see ❤