This is my first article for the beginners' introduction campaign.
Thanks to the Nuggets campaign, I finally had a chance to settle down and start something I “wanted to do but never did”: writing.
I believe most of your programs run on servers, and most of those servers run Linux. During daily development, deployment, debugging, and online troubleshooting, there are always problems that need to be located. This article lists common scenarios and shell commands from day-to-day operations and maintenance. I hope it helps you with your own Linux operations work.
Analysis methods
When a program on the server misbehaves, troubleshooting is mainly divided into the following three steps
- Program checks
- Checking Service Logs
- Checking System Resources
Program checks
The first step is to check whether the program itself is running normally, mainly using the following commands
The ps command
ps is a powerful process viewing command that reports the current status of processes on the system. It comes up often in daily work and has many uses in combination with pipes. It takes many options; the most common form is ps aux, which displays every process on the machine in a user-oriented format
– [Scenario 1] View information about a specified process
ps aux | grep nginx
View information about nginx processes
– [Scenario 2] View the number of processes run by the admin user
ps -ef | awk '$1 == "admin" {count++} END {print count + 0}'
Pipes combined with grep and awk can do much more than this
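As one more illustration of combining ps with pipes and awk, here is a minimal sketch that counts processes per user instead of for a single user. The function name count_procs_by_user and the sample user names are made up for this example; the live command in the final comment assumes a ps that accepts -eo user=.

```shell
# Count processes per user, busiest user first.
# Reads one user name per line on stdin (the format of `ps -eo user=`).
count_procs_by_user() {
    awk '{count[$1]++} END {for (u in count) print count[u], u}' | sort -k1,1nr -k2,2
}

# Sample input standing in for `ps -eo user=` output (made-up users).
printf '%s\n' root admin admin nginx admin root | count_procs_by_user

# On a live system:
#   ps -eo user= | count_procs_by_user
```

Reading from stdin keeps the counting logic separate from ps, so the same function can be tried on saved output.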
The dmesg command
dmesg prints kernel messages, including information about kernel/hardware interaction, to the terminal. It can help detect TCP or hard disk failures, as well as program memory problems
– [Scenario 1] A Java program exits abnormally. How do I troubleshoot it?
Use dmesg to check whether the kernel's OOM killer terminated the process because memory ran out, for example due to a memory leak
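A minimal sketch of that check, assuming the kernel log uses the usual OOM killer wording ("Out of memory", "oom-killer", "Killed process"). The function name find_oom_events is hypothetical and the sample log lines are illustrative, not taken from a real machine.

```shell
# Print kernel log lines that indicate the OOM killer was involved.
# Reads kernel log text on stdin, so it works with `dmesg` or a saved log.
# grep exits non-zero when nothing matches; `|| true` treats that as "no events".
find_oom_events() {
    grep -i -E 'out of memory|oom-killer|killed process' || true
}

# Illustrative sample lines (not from a real machine).
find_oom_events <<'EOF'
[12345.678901] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[12345.679000] Out of memory: Kill process 4321 (java) score 901 or sacrifice child
[12345.679100] Killed process 4321 (java) total-vm:8123456kB, anon-rss:7012345kB, file-rss:0kB
[12400.000000] eth0: link up
EOF

# On a live system (may require root):
#   dmesg | find_oom_events
```

If the Java process name and PID appear in a "Killed process" line, the exit was most likely caused by the system running out of memory rather than by the program's own logic.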
- dmesg timestamp problem
This is a problem I have actually run into, and many colleagues ask about it: dmesg output does not show the time, so it is hard to distinguish historical events from recent ones
You can enable timestamps as follows
RedHat 5: echo 1 > /sys/module/printk/parameters/printk_time
RedHat 6: echo Y > /sys/module/printk/parameters/time
With timestamps enabled, I can also build time-based monitoring on top of the dmesg output and spot problems promptly
Checking Service Logs
If the service is running properly but the results are not as expected, we need to check the service log for errors or warnings. That involves working with text files, and the main Linux tools for this are grep, sed, and awk
The grep command
grep is a text search command used to find matching lines in files
- [Scenario 1] View the context of the matching text
grep -B 10 "error" test.log
-A n: also shows the n lines after each match
-B n: also shows the n lines before each match
-C n: also shows the n lines before and after each match
- [Scenario 2] Count the matches
grep -c "warning" test.log
-c displays the number of matching lines
- [Scenario 3] Search subdirectories recursively
grep -nr error ./log/
-r searches recursively for files containing the match in the specified directory and its subdirectories. -n shows the line number of each match
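The grep options above can be combined into a quick first pass over a log. The following is a minimal sketch; the function name summarize_log and the sample log contents are made up for illustration.

```shell
# Print how many lines of a log file mention errors and warnings.
# grep -c prints 0 but exits non-zero when nothing matches, hence `|| true`.
summarize_log() {
    errors=$(grep -c -i "error" "$1" || true)
    warnings=$(grep -c -i "warning" "$1" || true)
    printf '%s: errors=%s warnings=%s\n' "$1" "$errors" "$warnings"
}

# Build a small sample log (contents are made up for illustration).
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
10:00:01 INFO  service started
10:00:05 WARNING slow response from cache
10:00:09 ERROR connection refused
10:00:10 INFO  retrying
EOF

summarize_log "$sample_log"
# Show each error with one line of context before and after, with line numbers.
grep -n -C 1 "ERROR" "$sample_log"
rm -f "$sample_log"
```

Counting first and then pulling context only for the interesting matches keeps the output manageable on large logs.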
The sed command
sed is a stream editor that can edit one or more files automatically, reducing repetitive manual work
- [Scenario 1] Replace all matching text in the file
sed -i 's/delete/deleted/g' demo.txt
s/delete/deleted/ replaces delete with deleted. The trailing g replaces every occurrence on each line; without it, only the first occurrence on each line is replaced
- [Scenario 2] Delete the specified content
Delete the lines containing “test”
sed -i '/test/d' demo.txt
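Because sed -i rewrites the file in place, it can be worth previewing the change and keeping a backup. Note also that a plain s/delete/deleted/g turns an existing "deleted" into "deletedd". The sketch below addresses both points; it assumes GNU sed (the \b word boundary is a GNU extension), and the function name replace_delete_word is hypothetical.

```shell
# Replace the whole word "delete" with "deleted", keeping a .bak backup.
# \b (word boundary) is a GNU sed extension; this sketch assumes GNU sed.
replace_delete_word() {
    sed -i.bak 's/\bdelete\b/deleted/g' "$1"
}

demo=$(mktemp)
printf '%s\n' 'delete the row' 'row already deleted' > "$demo"

# Preview: -n with the p flag prints only the lines that would change.
sed -n 's/\bdelete\b/deleted/gp' "$demo"

replace_delete_word "$demo"
cat "$demo"
rm -f "$demo" "$demo.bak"
```

The backup file makes it easy to undo the edit if the pattern matched more than intended.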
The awk command
awk is both a programming language and a standard Linux tool for processing text and data. It works well at the end of a pipeline when we investigate problems.
There are many scenarios for sed and awk, so I will not expand on them here. If you are interested, I can write a separate article to discuss them in detail
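To give a flavor of awk in log investigation, here is a minimal sketch that counts log lines per level. It assumes a hypothetical "date time LEVEL message" layout where the level is the third field; the function name count_by_level and the sample lines are made up.

```shell
# Count log lines per level, assuming the level is the third
# whitespace-separated field (a hypothetical "date time LEVEL message" layout).
count_by_level() {
    awk '{count[$3]++} END {for (level in count) print level, count[level]}' | sort
}

# Made-up sample log lines.
count_by_level <<'EOF'
2024-01-01 10:00:01 INFO service started
2024-01-01 10:00:05 WARN slow response
2024-01-01 10:00:09 ERROR connection refused
2024-01-01 10:00:12 ERROR connection refused
EOF
```

For a real log, adjust the field number to wherever the level appears in each line.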
Checking System Resources
If the steps above do not locate the problem, the next step is to check system resources for a system-level problem
Commonly used commands:
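# Check the CPU usage of the specified process, sampled every second
pidstat -u 1 -p $pid
# Check system-wide CPU usage and load (the r column is the run queue length), sampled every second
vmstat 1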
- [Scenario] View the three processes that use the most CPU
ps auxw|head -1; ps auxw|sort -rn -k3|head -3
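The same header-plus-sorted-rows pattern works for any numeric column of ps output. Below is a minimal sketch that wraps it in a function; the name top_by_column and the sample rows are made up for illustration.

```shell
# Print the header line plus the top N rows of `ps aux`-style text,
# sorted in descending numeric order by the given column.
# usage: ps auxw | top_by_column COLUMN N
top_by_column() {
    col=$1
    n=$2
    input=$(cat)
    printf '%s\n' "$input" | head -n 1
    printf '%s\n' "$input" | tail -n +2 | sort -rn -k "$col,$col" | head -n "$n"
}

# Made-up sample in a simplified `ps aux` layout (column 3 is %CPU).
top_by_column 3 2 <<'EOF'
USER PID %CPU %MEM COMMAND
root 1 0.0 0.1 init
admin 200 42.5 3.0 java
nginx 300 7.1 0.5 nginx
EOF

# On a live system (column 3 is %CPU, column 4 is %MEM):
#   ps auxw | top_by_column 3 3
```

Reading ps output once avoids the small inconsistency of running ps twice, where the header and the rows come from different snapshots.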
# Check I/O usage per process
iotop
# Check disk mount information
df -h
# Check device I/O statistics
iostat -d -x -k 1 10
# Check the size of the current directory
du -sh ./
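When there are many mounts, it can help to filter the df output down to the filesystems that are nearly full. This is a minimal sketch under the assumption that mount points contain no spaces; the function name disks_over and the sample figures are made up.

```shell
# List filesystems whose usage is at or above a threshold percentage.
# Reads `df -P` output on stdin; -P keeps each filesystem on one line.
disks_over() {
    awk -v limit="$1" 'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= limit) print $6, $5 }'
}

# Made-up sample in `df -P` layout.
disks_over 80 <<'EOF'
Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 51475068 46327561 5147507 90% /
/dev/sdb1 103081248 20616249 82464999 20% /data
EOF

# On a live system:
#   df -P | disks_over 80
```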
# Check memory usage in MB
free -m
# View the top 3 processes by memory usage
ps auxw|head -1; ps auxw|sort -rn -k4|head -3
# Sample memory, swap, and CPU statistics 3 times at 3-second intervals
vmstat 3 3
# Show the memory map of the specified process
pmap -d $pid
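For monitoring scripts, a single number is often easier to work with than the full free output. The sketch below computes the percentage of memory still available from /proc/meminfo text; it assumes a kernel that reports MemAvailable (3.14 and later), and the function name mem_available_pct and the sample values are made up.

```shell
# Print the percentage of memory still available, from /proc/meminfo text.
# MemAvailable is only reported by newer kernels (3.14 and later).
mem_available_pct() {
    awk '/^MemTotal:/ {total = $2} /^MemAvailable:/ {avail = $2}
         END { if (total > 0) printf "%d\n", avail * 100 / total }'
}

# Made-up sample values in /proc/meminfo layout.
mem_available_pct <<'EOF'
MemTotal:       16000000 kB
MemFree:         1000000 kB
MemAvailable:    4000000 kB
EOF

# On a live system:
#   mem_available_pct < /proc/meminfo
```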
Network problems are among the hardest to locate on Linux because so many factors are involved. This section lists only a few common commands
# Check network statistics
netstat -s
# Find the process using a specified port (helps resolve port conflicts and services that cannot be reached)
netstat -anp | grep ":22"
# Capture 5 packets of one protocol on eth0 (icmp can be replaced with tcp or udp; for HTTP, filter by port, for example tcp port 80)
tcpdump -nn -c 5 -i eth0 icmp
# Monitor packets sent to the specified host and port
tcpdump -c 2 -q -XX -vvv -nn -i eth0 tcp and dst host X.X.X.X and dst port 22
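One pitfall with grepping for ":22" is that it also matches port 2222 and remote addresses that use port 22. The following minimal sketch matches the local port exactly; it assumes the `netstat -ant` column layout where the fourth column is the local address, and the function name lines_for_local_port and the sample lines are made up.

```shell
# Print `netstat -ant`-style lines whose local address ends in the exact port.
# Column 4 is the local address, e.g. 0.0.0.0:22 or :::22.
lines_for_local_port() {
    awk -v port="$1" '$4 ~ (":" port "$")'
}

# Made-up sample lines in `netstat -ant` layout.
lines_for_local_port 22 <<'EOF'
tcp        0      0 0.0.0.0:22      0.0.0.0:*       LISTEN
tcp        0      0 0.0.0.0:2222    0.0.0.0:*       LISTEN
tcp        0      0 10.0.0.5:22     10.0.0.9:51234  ESTABLISHED
tcp        0      0 10.0.0.5:51000  10.0.0.7:22     ESTABLISHED
EOF

# On a live system (-p needs root to show other users' processes):
#   netstat -anp | lines_for_local_port 22
```

In the sample, only the first and third lines are printed, because only they have 22 as the local port.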
With the steps above, most problems can be located. Problems in the program itself need to be debugged, while system problems should be handed to the relevant operations (OP) staff
Finally, I am attaching a website for looking up Linux commands
Time was short, so many details may not be explained clearly. If you are interested, I can write a detailed article on each topic. Thank you for your understanding ~
I am Wu Liu, a programmer who loves basketball and coding
Follow me and I will bring you more of what you want to see