This article was originally published on Alibaba's internal technology forum, where it was highly praised. The author later made it available on the Alibaba Cloud community so that it can be read outside the company. Hollis trimmed the article, removing the introductions to tools that are only available inside Alibaba and the links that can only be opened from Alibaba's internal network.
Preface
In my daily work I run into a lot of problems, and while solving them some tools have proved extremely useful. I'm writing them down here for two reasons: first, as notes, so that I can quickly look things up later when I've forgotten them; second, to share them. I hope readers will also share the tools they find most helpful in their own work, so that we can all improve together.
Enough small talk, let's get started.
Linux commands
tail
The most commonly used form is tail -f:
tail -300f shopbase.log # print the last 300 lines, then keep following whatever is written to the file
grep
grep forest f.txt # file search
grep forest f.txt cpf.txt # search multiple files
grep 'log' /home/admin -r -n # recursively search all files under the directory, showing line numbers
cat f.txt | grep -i shopbase # case-insensitive match
grep 'shopbase' /home/admin -r -n --include='*.vm' --include='*.java' # only search files with these suffixes
grep 'shopbase' /home/admin -r -n --exclude='*.vm' --exclude='*.java' # skip files with these suffixes
seq 10 | grep 5 -A 3 # the match plus 3 lines after it
seq 10 | grep 5 -B 3 # the match plus 3 lines before it
seq 10 | grep 5 -C 3 # 3 lines before and after the match; the one I use most
cat f.txt | grep -c 'SHOPBASE' # count the matching lines
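The -A, -B and -C options only decide which neighbouring lines are printed around each match. As a rough illustration (a sketch, not how grep is implemented: plain substring matching only, no regexes and no "--" separators between groups), the selection logic can be written in Java like this; GrepContext is a made-up name:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of grep's context selection (-A/-B/-C) for substring matches.
class GrepContext {
    /** Returns every matching line plus up to `before` lines above and `after` lines below it. */
    static List<String> grep(List<String> lines, String needle, int before, int after) {
        boolean[] keep = new boolean[lines.size()];
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).contains(needle)) {
                int from = Math.max(0, i - before);               // clamp at the first line
                int to = Math.min(lines.size() - 1, i + after);   // clamp at the last line
                for (int j = from; j <= to; j++) {
                    keep[j] = true;                               // overlapping windows merge naturally
                }
            }
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (keep[i]) {
                out.add(lines.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> seq = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            seq.add(String.valueOf(i));                           // like `seq 10`
        }
        System.out.println(grep(seq, "5", 0, 3));                 // -A 3 -> [5, 6, 7, 8]
        System.out.println(grep(seq, "5", 3, 0));                 // -B 3 -> [2, 3, 4, 5]
        System.out.println(grep(seq, "5", 3, 3));                 // -C 3 -> [2, 3, 4, 5, 6, 7, 8]
    }
}
```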
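awk
1. Basic commands
awk '{print $4,$6}' f.txt # print the 4th and 6th fields of each line
awk '{print NR,$0}' f.txt cpf.txt # prefix each line with NR (keeps counting across files)
awk '{print FNR,$0}' f.txt cpf.txt # prefix each line with FNR (restarts at 1 for each file)
awk '{print FNR,FILENAME,$0}' f.txt cpf.txt # also print the current file name
awk '{print FILENAME,"NR="NR,"FNR="FNR,"$"NF"="$NF}' f.txt cpf.txt # print the counters and the last field
echo 1:2:3:4 | awk -F: '{print $1,$2,$3,$4}' # use ":" as the field separator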
2. Matching
awk '/ldb/ {print}' f.txt # lines matching ldb
awk '!/ldb/ {print}' f.txt # lines not matching ldb
awk '/ldb/ && /LISTEN/ {print}' f.txt # lines matching both ldb and LISTEN
awk '$5 ~ /ldb/ {print}' f.txt # lines whose 5th field matches ldb
3. Built-in variables
NR: the number of records awk has read since it started, where records are split by the record separator. The default record separator is a newline, so by default NR is the number of lines read so far. Think of NR as "Number of Record".
FNR: when awk processes several input files, NR does not restart at 1 after the first file is finished; it keeps counting. FNR is therefore provided as the record number within the current file, and it resets to 1 for each new file.
NF: the number of fields the current record was split into. Think of NF as "Number of Fields"; $NF is the last field.
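To make the difference between NR, FNR and NF concrete, here is a small Java sketch that mimics the bookkeeping awk does for awk '{print FILENAME, NR, FNR, NF}' f.txt cpf.txt. It assumes the default newline record separator and whitespace field splitting and ignores custom RS/FS values; the file contents are made up and AwkCounters is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of awk's NR / FNR / NF bookkeeping with default RS (newline) and FS (blanks).
// Each input "file" is modeled as an in-memory list of lines.
class AwkCounters {
    /** One "FILENAME NR FNR NF" string per record. */
    static List<String> run(List<String> fileNames, List<List<String>> files) {
        List<String> out = new ArrayList<>();
        int nr = 0;                                       // NR: keeps counting across all files
        for (int f = 0; f < files.size(); f++) {
            int fnr = 0;                                  // FNR: restarts for every new file
            for (String record : files.get(f)) {
                nr++;
                fnr++;
                String trimmed = record.trim();           // leading/trailing blanks never form fields
                int nf = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;  // NF
                out.add(fileNames.get(f) + " " + nr + " " + fnr + " " + nf);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("f.txt", "cpf.txt");
        List<List<String>> files = Arrays.asList(
                Arrays.asList("a b c", "d"),              // made-up contents of f.txt
                Arrays.asList("e  f"));                   // made-up contents of cpf.txt
        run(names, files).forEach(System.out::println);
        // f.txt 1 1 3
        // f.txt 2 2 1
        // cpf.txt 3 1 2
    }
}
```

Note how NR reaches 3 on the first line of cpf.txt while FNR is back at 1.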
find
sudo -u admin find /home/admin /tmp /usr -name \*.log # search several directories at once
find . -iname \*.txt # case-insensitive name match
find /usr -type l # all symbolic links under /usr
find /usr -type l -name "z*" -ls # symlink details, e.g. inode and directory
find /home/admin -size +250000k # files larger than 250000 KB (use - instead of + for smaller files)
find /home/admin -type f -perm 777 -exec ls -l {} \; # files with permission 777
find /home/admin -atime -1 # files accessed within the last day
find /home/admin -ctime -1 # files whose status changed within the last day
find /home/admin -mtime -1 # files modified within the last day
find /home/admin -amin -1 # files accessed within the last minute
find /home/admin -cmin -1 # files whose status changed within the last minute
find /home/admin -mmin -1 # files modified within the last minute
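find /home/admin -mtime -1 keeps entries modified less than 24 hours ago. The same filter can be sketched in Java with java.nio.file, here restricted to regular files as if -type f were added. RecentFiles is a made-up name, and unlike find, this version stops with an UncheckedIOException if the walk hits a directory it cannot read.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch of `find <root> -type f -mtime -1` using java.nio.file.
class RecentFiles {
    /** Regular files under root whose last-modified time is newer than now - maxAge. */
    static List<Path> modifiedWithin(Path root, Duration maxAge) throws IOException {
        Instant cutoff = Instant.now().minus(maxAge);
        try (Stream<Path> paths = Files.walk(root)) {     // closes the underlying directory handles
            return paths
                    .filter(Files::isRegularFile)          // like adding -type f
                    .filter(p -> isModifiedAfter(p, cutoff))
                    .collect(Collectors.toList());
        }
    }

    private static boolean isModifiedAfter(Path p, Instant cutoff) {
        try {
            return Files.getLastModifiedTime(p).toInstant().isAfter(cutoff);
        } catch (IOException e) {
            return false;                                  // file vanished or is unreadable: skip it
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        modifiedWithin(root, Duration.ofDays(1)).forEach(System.out::println);
    }
}
```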
pgm
Run a query in batch across the vm-shopbase machines, for example to grep the logs for a condition:
pgm -A -f vm-shopbase 'cat /home/admin/shopbase/logs/shopbase.log.2017-01-17|grep 2069861630'
tsar
tsar is Alibaba's own data collection tool. It persists historical data to disk, which makes it quick to look up the system's history, and it can show real-time data as well. It is installed on most machines.
tsar ### view the metrics for the most recent day
tsar --live ### view real-time metrics, refreshed every five seconds by default
tsar -d 20161218 ### view the data for a specific day; it seems only about the last four months are available
tsar --mem
tsar --load
tsar --cpu
These can also be combined with -d to check a single metric on a given day
top
Apart from showing basic information, top is mostly used together with other tools to investigate JVM problems:
ps -ef | grep java # find the pid of the Java process
top -H -p pid # show per-thread CPU usage for that process
Convert the ID of the busy thread from decimal to hexadecimal, then look it up in the jstack output to see what that thread is doing.
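In the shell the conversion is a one-liner (printf '%x\n' 2815 prints aff), and the hex value is what jstack shows as nid=0x... in each thread header. A Java sketch of the same lookup; NidLookup and the sample dump lines are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: top -H shows thread IDs in decimal; jstack prints the same native ID in hex as "nid=0x...".
class NidLookup {
    static String toNid(long tid) {
        return "nid=0x" + Long.toHexString(tid);          // e.g. 2815 -> "nid=0xaff"
    }

    /** jstack thread header lines whose nid matches the decimal thread ID from top -H. */
    static List<String> findThread(List<String> jstackLines, long tid) {
        String needle = toNid(tid) + " ";                 // trailing space so 0xaff does not match 0xaff0
        List<String> hits = new ArrayList<>();
        for (String line : jstackLines) {
            if (line.contains(needle)) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up jstack header lines, for illustration only.
        List<String> dump = Arrays.asList(
                "\"main\" #1 prio=5 os_prio=0 tid=0x00007f2c8c00a000 nid=0xaf0 waiting on condition [0x00007f2c95d1e000]",
                "\"pool-1-thread-3\" #15 prio=5 os_prio=0 tid=0x00007f2c8c0b1800 nid=0xaff runnable [0x00007f2c5d7f6000]");
        System.out.println(toNid(2815));                  // nid=0xaff
        findThread(dump, 2815).forEach(System.out::println);
    }
}
```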
other
netstat -nat|awk '{print $6}'|sort|uniq -c|sort -rn
# Count connections by TCP state; watch out for a large number of CLOSE_WAIT connections
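The pipeline keeps the sixth column (the TCP state), and sort | uniq -c | sort -rn turns it into a count per state, largest first. A Java sketch of the same aggregation, assuming the netstat -nat output is available as lines of text; TcpStateCount is a made-up name. Like the awk version it does not filter netstat's header lines, whose sixth words get counted too; lines with fewer than six columns are skipped here, whereas awk would print an empty line for them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of: netstat -nat | awk '{print $6}' | sort | uniq -c | sort -rn
class TcpStateCount {
    /** Counts column 6 of each line, ordered from most to least frequent. */
    static LinkedHashMap<String, Long> count(List<String> netstatLines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length >= 6) {
                counts.merge(cols[5], 1L, Long::sum);      // awk's $6, then uniq -c
            }
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());  // sort -rn
        LinkedHashMap<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : entries) {
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up netstat -nat lines, for illustration only.
        List<String> lines = Arrays.asList(
                "tcp        0      0 0.0.0.0:8080        0.0.0.0:*           LISTEN",
                "tcp        0      0 10.0.0.5:8080       10.0.0.9:51234      ESTABLISHED",
                "tcp        1      0 10.0.0.5:43210      10.0.0.7:3306       CLOSE_WAIT");
        System.out.println(count(lines));
    }
}
```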
Troubleshooting tools
BTrace
The first tool to mention is BTrace, a real lifesaver for troubleshooting in production and pre-release environments. I'll skip the introduction and go straight to the code.
1. Find out who is calling ArrayList.add, and print the thread stack only when the list already holds more than 500 elements
import com.sun.btrace.annotations.*;
import static com.sun.btrace.BTraceUtils.*;

@BTrace
public class ArrayListAddTrace { // class name is arbitrary
    @OnMethod(clazz = "java.util.ArrayList", method = "add", location = @Location(value = Kind.CALL, clazz = "/.*/", method = "/.*/"))
    public static void m(@ProbeClassName String probeClass, @ProbeMethodName String probeMethod, @TargetInstance Object instance, @TargetMethodOrField String method) {
        // only report lists that already hold more than 500 elements
        if (getInt(field("java.util.ArrayList", "size"), instance) > 500) {
            println("check who ArrayList.add method:" + probeClass + "#" + probeMethod + ", method:" + method + ", size:" + getInt(field("java.util.ArrayList", "size"), instance));
            jstack(); // the stack shows who made the call
            println();
            println("===========================");
            println();
        }
    }
}
2. Monitor the request parameters and the return value each time the service method is called
@OnMethod(clazz = "com.taobao.sellerhome.transfer.biz.impl.C2CApplyerServiceImpl", method="nav", location = @Location(value = Kind.RETURN))
public static void mt(long userId, int current, int relation, String check, String redirectUrl, @Return AnyType result) {
println("parameter# userId:" + userId + ", current:" + current + ", relation:" + relation + ", check:" + check + ", redirectUrl:" + redirectUrl + ", result:" + result);
}
For more, see https://github.com/btraceio/btrace
Notes:
- In my experience the output of release 1.3.9 is unstable; sometimes the correct result only appears after the probe has been triggered several times.
- Keep the range of classes matched by the regular expressions tight; otherwise there is a good chance the application will freeze from CPU overload.
- Because BTrace works by injecting bytecode, the application has to be restarted to return to its normal state.
Greys
Greys has several cool features (some of which overlap with BTrace):
sc -df xxx: prints details of the given class, including its code source location and classloader hierarchy
trace class method: I really like this one! I first saw this feature in JProfiler a long time ago. It prints the time spent in the current method call, broken down by each method it calls.
javOSize
I'll mention just one feature, classes: it changes a class's content by modifying its bytecode on the fly, so you can quickly add a log statement somewhere and look at the output. The downside is that it is very intrusive to the code, but it's great if you know what you're doing.
Greys and BTrace can easily handle everything else, so I'll skip the rest.
JProfiler
I used to diagnose many problems with JProfiler, but now Greys and BTrace can handle almost all of them. Besides, most problems happen in production environments (which are network-isolated), so I rarely use it anymore, but it is still worth mentioning. Website: https://www.ej-technologies.com/products/jprofiler/overview.html
The big guns
Eclipse MAT can be used as an Eclipse plug-in or as a standalone program. For details, see http://www.eclipse.org/mat/
Java's three axes, no, make that seven
jps
I only ever use one command:
sudo -u admin /opt/taobao/java/bin/jps -mlvV
jstack
Common usage:
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstack 2815
Native + Java stack:
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstack -m 2815
jinfo
View the JVM flags the process was started with:
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jinfo -flags 2815
jmap
Common uses:
1. Check the heap
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -heap 2815
2. Dump the heap (the live option keeps only reachable objects and forces a full GC first)
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -dump:live,format=b,file=/tmp/heap2.bin 2815
or
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -dump:format=b,file=/tmp/heap3.bin 2815
3. Who is taking up the heap? Combined with Zprofiler and BTrace, this makes troubleshooting far more powerful
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -histo 2815 | head -10
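Both jmap -dump variants can also be triggered from inside the process through the HotSpot diagnostic MXBean, which is occasionally handy when you want the application to dump itself. A minimal sketch, assuming a HotSpot-based JDK (com.sun.management is not part of the Java SE standard); HeapDumper and the /tmp path are illustrative only. The resulting .hprof file can be opened in Eclipse MAT like any jmap dump.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Sketch (HotSpot-specific): write an hprof dump of this JVM's own heap.
class HeapDumper {
    /**
     * live = true keeps only reachable objects (like jmap -dump:live,...);
     * live = false also keeps unreachable ones (like jmap -dump:format=b,...).
     * The target file must not exist yet; some JDK versions also insist on the ".hprof" suffix.
     */
    static File dump(String path, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, live);
        return new File(path);
    }

    public static void main(String[] args) throws IOException {
        File f = dump("/tmp/heap-self.hprof", true);
        System.out.println(f + " written, " + f.length() + " bytes");
    }
}
```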
jstat
jstat has many options, but one is enough: -gcutil prints GC utilization for process 2815, sampled every 1000 ms
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstat -gcutil 2815 1000
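jstat -gcutil samples these numbers from outside the process. For comparison, a rough in-process sketch using the standard java.lang.management API prints per-pool usage as a percentage of current (committed) capacity, plus GC counts and accumulated times. Pool and collector names vary by JDK version and garbage collector, so the output will not line up one-to-one with jstat's S0/S1/E/O/M/YGC/FGC columns; GcUtil is a made-up name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Sketch: an in-process view of roughly what jstat -gcutil reports from outside.
class GcUtil {
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();               // null if the pool is no longer valid
            if (u == null || u.getCommitted() <= 0) {
                continue;                                  // nothing committed yet, no meaningful ratio
            }
            double pct = 100.0 * u.getUsed() / u.getCommitted();
            sb.append(String.format("%-35s %6.2f%%%n", pool.getName(), pct));
        }
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // collection count and time roughly correspond to jstat's YGC/FGC and YGCT/FGCT
            sb.append(String.format("%-35s count=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 3; i++) {                      // three samples, 1000 ms apart
            System.out.print(snapshot());
            System.out.println();
            Thread.sleep(1000);
        }
    }
}
```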
jdb
jdb is still something I use regularly.
jdb can be used to debug in the pre-release environment. Assuming the pre-release JAVA_HOME is /opt/taobao/java/ and the remote debug port is 8000 (the target JVM must have been started with a JDWP agent listening on that port, e.g. -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000), run:
sudo -u admin /opt/taobao/java/bin/jdb -attach 8000
Once jdb has attached successfully, you can set breakpoints and debug.
For the full list of options, see Oracle's official documentation: http://docs.oracle.com/javase/7/docs/technotes/tools/windows/jdb.html
CLHSDB
In many cases CLHSDB lets you see more interesting things. I've heard that tools like jstack and jmap are built on it.
sudo -u admin /opt/taobao/java/bin/java -classpath /opt/taobao/java/lib/sa-jdi.jar sun.jvm.hotspot.CLHSDB
For more detail, see this post by RednaxelaFX: http://rednaxelafx.iteye.com/blog/1847971
IntelliJ IDEA plugins
Key Promoter
You may not remember a shortcut the first time, but after being reminded a few times you will, right?
Maven Helper
A handy helper for analyzing Maven dependencies.
VM options
1. Which file was your class loaded from? (On JDK 9 and later, -Xlog:class+load=info replaces this flag.)
-XX:+TraceClassLoading # output looks like [Loaded java.lang.invoke.MethodHandleImpl$Lazy from D:\programme\jdk\jdk8U74\jre\lib\rt.jar]
2. Write a heap dump file when an OutOfMemoryError occurs
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/admin/logs/java.hprof
Jar conflicts
Is it overkill to give this its own heading? Everyone has run into this annoying problem at some point. Here are the ways I track it down:
mvn dependency:tree > ~/dependency.txt
Writes the full dependency tree to a file.
mvn dependency:tree -Dverbose -Dincludes=groupId:artifactId
Prints only the dependencies matching the given groupId and artifactId.
-XX:+TraceClassLoading
Add this to the VM startup script (for example Tomcat's startup script) to see details of every loaded class, including where it was loaded from.
-verbose
Also added to the VM startup script (for example Tomcat's); it prints details of the loaded classes as well.
greys:sc
Greys' sc command also shows clearly where a class was loaded from.
tomcat-classloader-locate
The following URL tells you where a class was loaded from:
curl http://localhost:8006/classloader/locate?class=org.apache.xerces.xs.XSObjec
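All of these answer the same question: which jar or directory did this class actually come from? When you can run code inside the JVM (a unit test or a small debug endpoint, for example), the standard ProtectionDomain/CodeSource API answers it directly. A small sketch; ClassLocator is a hypothetical helper name:

```java
import java.net.URL;
import java.security.CodeSource;

// Sketch: report the jar, directory or runtime image a class was loaded from.
class ClassLocator {
    static String locate(Class<?> clazz) {
        CodeSource cs = clazz.getProtectionDomain().getCodeSource();
        if (cs != null && cs.getLocation() != null) {
            return cs.getLocation().toString();            // the jar file or classes directory
        }
        // Bootstrap classes such as java.lang.String have no CodeSource,
        // so fall back to looking up the .class file as a resource.
        String resource = clazz.getName().replace('.', '/') + ".class";
        ClassLoader loader = clazz.getClassLoader();
        URL url = (loader != null) ? loader.getResource(resource)
                                   : ClassLoader.getSystemResource(resource);
        return url != null ? url.toString() : null;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " -> " + locate(Class.forName(name)));
    }
}
```

If two jars on the classpath contain the same class, the printed location tells you which copy won.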
other
dmesg
If your Java process quietly disappears without leaving any clues, dmesg may have what you're looking for.
sudo dmesg|grep -i kill|less
Look for the keyword oom-killer. The output looks something like this:
[6710782.021013] java invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
[6710782.070639] [<ffffff81118898>] ? oom_kill_process+0x68/0x140
[6710782.257588] Task in /LXC011175068174 killed as a result of limit of /LXC011175068174
[6710784.698347] cgroup out of memory: Kill process 215701 (java) score 854 or sacrifice child
[6710784.707978] Kill process 215701, UID 679, (java) total-vm:11017300kB, anon-rss:7152432kB, file-rss:1232kB
The Java process was killed by the OOM killer with a score of 854. A quick word on the OOM killer (out-of-memory killer): it monitors memory usage, and when the machine (or, as in this log, a memory cgroup) runs out of memory, it scores every process according to certain rules (memory usage, run time and so on), picks the one with the highest score and kills it to protect the machine.
To convert a dmesg timestamp to wall-clock time: actual time = 1970-01-01 00:00:00 UTC + (current time in seconds since the epoch - seconds since the system booted + the timestamp printed by dmesg) seconds:
date -d "1970-01-01 UTC `echo "$(date +%s)-$(cat /proc/uptime|cut -f 1 -d' ') + 12288812.926194"|bc ` seconds"
All that remains is to find out why memory usage grew large enough to trigger the OOM killer.
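The dmesg time conversion is simple arithmetic: boot time = now - uptime, and event time = boot time + dmesg timestamp. A minimal Java sketch of the same calculation, working in milliseconds; DmesgTime is a made-up name. Like the shell one-liner, it assumes the system clock has not been changed and the machine has not been suspended since boot. Newer util-linux versions of dmesg can also print human-readable times directly with dmesg -T.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Instant;

// Sketch of: wall-clock time = now - uptime + dmesg timestamp.
class DmesgTime {
    static Instant toWallClock(double dmesgSeconds, double uptimeSeconds, Instant now) {
        long bootMillis = now.toEpochMilli() - Math.round(uptimeSeconds * 1000);  // when the system booted
        return Instant.ofEpochMilli(bootMillis + Math.round(dmesgSeconds * 1000)); // boot + offset from dmesg
    }

    public static void main(String[] args) throws IOException {
        // The first field of /proc/uptime is the number of seconds since boot (Linux only).
        String first = new String(Files.readAllBytes(Paths.get("/proc/uptime"))).trim().split("\\s+")[0];
        double uptime = Double.parseDouble(first);
        double stamp = args.length > 0 ? Double.parseDouble(args[0]) : 12288812.926194;
        System.out.println(toWallClock(stamp, uptime, Instant.now()));
    }
}
```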
New skill unlocked
RateLimiter
Want fine-grained control over QPS? Say you call an interface whose owners explicitly require you to keep your QPS at or below 400: how do you enforce that? That's exactly what Guava's RateLimiter is for. Details: http://ifeve.com/guava-ratelimite
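With Guava itself, the usage is RateLimiter.create(400.0) followed by limiter.acquire() before each call. To show the idea without the Guava dependency, here is a minimal sketch (not Guava's implementation; SimpleRateLimiter is a made-up name) that hands out permits at a fixed spacing of 1/QPS seconds, so at 400 QPS consecutive callers are admitted 2.5 ms apart. Guava's version is more elaborate, for example it can store up unused permits for short bursts.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Minimal QPS limiter sketch (NOT Guava's implementation): one permit every 1/qps seconds.
class SimpleRateLimiter {
    private final long intervalNanos;      // spacing between two permits
    private final LongSupplier clock;      // nanosecond clock; System::nanoTime in real use
    private long nextFreeNanos;            // earliest time the next permit may be used

    SimpleRateLimiter(double permitsPerSecond, LongSupplier clock) {
        if (!(permitsPerSecond > 0)) {
            throw new IllegalArgumentException("permitsPerSecond must be positive");
        }
        this.intervalNanos = (long) (TimeUnit.SECONDS.toNanos(1) / permitsPerSecond);
        this.clock = clock;
        this.nextFreeNanos = clock.getAsLong();
    }

    /** Books the next permit and returns how many nanoseconds the caller must wait (0 = go now). */
    synchronized long reserve() {
        long now = clock.getAsLong();
        long wait = Math.max(0L, nextFreeNanos - now);
        nextFreeNanos = Math.max(nextFreeNanos, now) + intervalNanos;
        return wait;
    }

    /** Blocks until the caller's permit is due; the sleep happens outside the lock. */
    void acquire() throws InterruptedException {
        long wait = reserve();
        if (wait > 0) {
            TimeUnit.NANOSECONDS.sleep(wait);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleRateLimiter limiter = new SimpleRateLimiter(400.0, System::nanoTime);
        long start = System.nanoTime();
        for (int i = 0; i < 10; i++) {
            limiter.acquire();             // the rate-limited remote call would go here
        }
        System.out.printf("10 permits took %.1f ms%n", (System.nanoTime() - start) / 1e6);
    }
}
```

Booking the slot under the lock and sleeping outside it means a waiting caller never blocks the bookkeeping for other threads; the injectable clock exists only to make the behavior easy to test.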
Author: Red Devil Number seven
Link: https://yq.aliyun.com/articles/69520?utm_content=m_10360