Author: the reds no. 7
Link: https://yq.aliyun.com/articles/69520?utm_content=m_10360
This article comes from Alibaba's internal technology forum, where it was well received. The author has made it available on the cloud community so that it can be read from outside the company. Hollis trimmed the article, removing the introductions of tools that exist only inside Alibaba and the links that can only be reached from Alibaba's internal network.
I run into plenty of problems in day-to-day work, and some tools have played a considerable role in solving them. I am writing them down here for two reasons. First, as notes, so that I can pick things up again quickly when I forget. Second, to share them, in the hope that readers will in turn show the tools they find most helpful, so that we all improve together.
Enough chit-chat, let's get started.
tail
tail -f is the form used most often.
tail -300f shopbase.log # show the last 300 lines, then keep following the file as new lines are written
grep
grep forest f.txt # search in a file
grep forest f.txt cpf.txt # search in several files
grep 'log' /home/admin -r -n # search every file under a directory for the keyword
cat f.txt | grep -i shopbase # case-insensitive match
grep 'shopbase' /home/admin -r -n --include=*.{vm,java} # only files with the given suffixes
grep 'shopbase' /home/admin -r -n --exclude=*.{vm,java} # skip files with the given suffixes
seq 10 | grep 5 -A 3 # the match plus the 3 lines after it
seq 10 | grep 5 -B 3 # the match plus the 3 lines before it
seq 10 | grep 5 -C 3 # the match plus 3 lines on each side; this is the one to use day to day
cat f.txt | grep -c 'SHOPBASE' # count the matching lines
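The three context options are easiest to compare on a tiny input. The sketch below is only an illustration: the helper name context_demo is made up for this example, and it assumes a grep that supports -A, -B and -C (GNU grep and BSD grep both do).

```shell
# context_demo FLAG: run grep with one context flag (-A, -B or -C) and
# 3 lines of context over the numbers 1..10, matching the line "5".
# (context_demo is a hypothetical helper, not part of grep.)
context_demo() {
  seq 10 | grep "$1" 3 5
}

context_demo -A   # the match and the 3 lines after it:  5 6 7 8
context_demo -B   # the match and the 3 lines before it: 2 3 4 5
context_demo -C   # 3 lines on each side of the match:   2 3 4 5 6 7 8
```

Each number is printed on its own line; the comments list them in order.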
awk
1 Basic Commands
awk '{print $4,$6}' f.txt
awk '{print NR,$0}' f.txt cpf.txt
awk '{print FNR,$0}' f.txt cpf.txt
awk '{print FNR,FILENAME,$0}' f.txt cpf.txt
awk '{print FILENAME,"NR="NR,"FNR="FNR,"$"NF"="$NF}' f.txt cpf.txt
echo 1:2:3:4 | awk -F: '{print $1,$2,$3,$4}'
2 Matching
awk '/ldb/ {print}' f.txt # lines that match ldb
awk '!/ldb/ {print}' f.txt # lines that do not match ldb
awk '/ldb/ && /LISTEN/ {print}' f.txt # lines that match both ldb and LISTEN
awk '$5 ~ /ldb/ {print}' f.txt # lines whose fifth column matches ldb
3 Built-in variables
NR: the number of records awk has read since it started, counted by the record separator. The default record separator is a newline, so by default NR is the number of lines read so far. NR can be read as an abbreviation of Number of Record.
FNR: when awk processes several input files, NR does not restart from 1 after the first file has been processed; it keeps counting up. FNR exists for that reason: it restarts from 1 for every new file. FNR can be read as File Number of Record.
NF: the number of fields the current record is split into. NF can be read as Number of Field.
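The difference between NR and FNR only shows up with more than one input file. A minimal sketch, assuming two small sample files created on the spot (their contents and the helper name show_counters are made up for this example):

```shell
# Create two small sample files in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
printf 'a b c\nd e\n' > "$tmpdir/f.txt"
printf 'x y\n' > "$tmpdir/cpf.txt"

# For every line, print the file name, the global record number (NR),
# the per-file record number (FNR) and the field count (NF).
show_counters() {
  awk '{ print FILENAME, "NR=" NR, "FNR=" FNR, "NF=" NF }' "$@"
}

# Run in a subshell so the caller's working directory is left untouched.
(cd "$tmpdir" && show_counters f.txt cpf.txt)
# f.txt NR=1 FNR=1 NF=3
# f.txt NR=2 FNR=2 NF=2
# cpf.txt NR=3 FNR=1 NF=2
```

On the first line of cpf.txt, NR has reached 3 while FNR is back at 1.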
find
sudo -u admin find /home/admin /tmp /usr -name \*.log # search several directories
find . -iname \*.txt # match the name case-insensitively
find /usr -type l # all symbolic links under /usr
find /usr -type l -name "z*" -ls # details of the symbolic links, e.g. inode, directory
find /home/admin -size +N # files larger than N, where N is a size such as 100M; use -N for smaller
find /home/admin -type f -perm 777 -exec ls -l {} \; # files with permission 777
find /home/admin -atime -1 # files accessed within 1 day
find /home/admin -ctime -1 # files whose status changed within 1 day
find /home/admin -mtime -1 # files modified within 1 day
find /home/admin -amin -1 # files accessed within 1 minute
find /home/admin -cmin -1 # files whose status changed within 1 minute
find /home/admin -mmin -1 # files modified within 1 minute
pgm
Batch-query the vm-shopbase logs that meet a condition:
pgm -A -f vm-shopbase 'cat /home/admin/shopbase/logs/shopbase.log.2017-01-17|grep 2069861630'
tsar
tsar is our company's own collection tool. It is very handy: it persists the historical data it collects to disk, so we can quickly query past system data. Current, real-time status can of course be queried too. It is installed on most machines.
tsar # view the metrics for the most recent day
tsar --live # view real-time metrics, refreshed every five seconds by default
tsar -d 20161218 # view the data for a given day; it seems that at most four months of data can be viewed
tsar --mem
tsar --load
tsar --cpu # these can also be combined with -d to query a single metric for a given day
top
Besides showing some basic information, top is mostly used together with other tools to track down various JVM problems.
ps -ef | grep java
top -H -p pid
Once you have a thread ID, convert it from decimal to hexadecimal and look for it in the jstack output to see what that thread is actually doing.
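The decimal-to-hexadecimal step can be done with printf. A minimal sketch, assuming the jstack output labels each thread with a nid=0x... field (as JDK 8 does); the helper name tid_to_nid and the thread ID 2815 are only illustrations:

```shell
# tid_to_nid TID: convert a decimal thread ID, as shown by top -H, into
# the 0x-prefixed hexadecimal form. (tid_to_nid is a hypothetical helper.)
tid_to_nid() {
  printf '0x%x\n' "$1"
}

tid_to_nid 2815   # prints 0xaff

# Assumed usage on a live machine (pid and tid are placeholders):
#   jstack <pid> | grep -A 20 "nid=$(tid_to_nid <tid>)"
```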
other
netstat -nat | awk '{print $6}' | sort | uniq -c | sort -rn # count the current connections by state; watch out for a high CLOSE_WAIT count
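To see what the pipeline produces without a live machine, it can be fed a few netstat-style lines. This is only a sketch: the sample connections and the helper name count_states are invented for the example, and the state is assumed to be in the sixth column, as in netstat -nat output on Linux.

```shell
# count_states: read netstat-style lines on stdin and count them by the
# state in column 6, most frequent first. (Hypothetical helper.)
count_states() {
  awk '{print $6}' | sort | uniq -c | sort -rn
}

# Made-up sample lines in the same column layout as netstat -nat.
sample='tcp 0 0 10.0.0.1:8080 10.0.0.2:51000 ESTABLISHED
tcp 0 0 10.0.0.1:8080 10.0.0.3:51001 CLOSE_WAIT
tcp 0 0 10.0.0.1:8080 10.0.0.4:51002 CLOSE_WAIT'

printf '%s\n' "$sample" | count_states
```

With this sample, CLOSE_WAIT comes first with a count of 2, followed by ESTABLISHED with a count of 1.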
btrace
The first one to mention is BTrace. It is a killer tool for troubleshooting in production and pre-release environments. I will skip the introduction and go straight to the code.
1. Check who is currently calling the add method of ArrayList, and print the call stack only for threads whose ArrayList size is greater than 500
@OnMethod(clazz = "java.util.ArrayList", method = "add", location = @Location(value = Kind.CALL, clazz = "/.*/", method = "/.*/"))
public static void m(@ProbeClassName String probeClass, @ProbeMethodName String probeMethod, @TargetInstance Object instance, @TargetMethodOrField String method) {
    // Only report lists that have already grown past 500 elements.
    if (getInt(field("java.util.ArrayList", "size"), instance) > 500) {
        println("check who ArrayList.add method:" + probeClass + "#" + probeMethod + ", method:" + method + ", size:" + getInt(field("java.util.ArrayList", "size"), instance));
        jstack();
        println();
        println("===========================");
        println();
    }
}
2. Monitor the return value and the request parameters when the current service method is called
@OnMethod(clazz = "com.taobao.sellerhome.transfer.biz.impl.C2CApplyerServiceImpl", method = "nav", location = @Location(value = Kind.RETURN))
public static void mt(long userId, int current, int relation, String check, String redirectUrl, @Return AnyType result) {
    println("parameter# userId:" + userId + ", current:" + current + ", relation:" + relation + ", check:" + check + ", redirectUrl:" + redirectUrl + ", result:" + result);
}
For more, see https://github.com/btraceio/btrace
Notes:
- From observation, the output of release 1.3.9 is unstable: the correct result may only show up after the probe has been triggered several times.
- Keep the range of classes matched by the trace regular expression under control; otherwise the application is very likely to hang because the CPU is maxed out.
- Because BTrace works by bytecode injection, the application has to be restarted to return to its normal state.
Greys
Here are a few cool features (some of which overlap with BTrace):
sc -df xxx: outputs details of the given class, including where its source was loaded from and its classloader structure.
trace class method: I really like this one! I saw this feature in JProfiler long ago. It prints the time spent in the current method call, broken down by each method it calls.
javOSize
Take its classes feature, for example: it changes the content of a class by modifying its bytecode, and the change takes effect immediately. That lets you quickly add a log statement somewhere and look at the output. The drawback is that it is very intrusive to the code. But if you know what you are doing, it is a really good tool.
Its other features can easily be covered by Greys and BTrace, so I will skip them.
JProfiler
I used to rely on JProfiler to diagnose many problems, but now Greys and BTrace can handle almost all of them. Besides, most problems occur in production environments (which are network-isolated), so I rarely use it any more. It still deserves a mention, though. The official site is https://www.ej-technologies.com/products/jprofiler/overview.html
eclipseMAT
It can be used as an Eclipse plug-in or run as a standalone program. For details, see http://www.eclipse.org/mat/
jps
I use only one command:
sudo -u admin /opt/taobao/java/bin/jps -mlvV
jstack
Common usage:
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstack 2815
Native + Java stack:
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstack -m 2815
jinfo
It shows the system's startup parameters. Usage:
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jinfo -flags 2815
jmap
Three uses
1. Check the heap
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -heap 2815
2. Dump the heap
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -dump:live,format=b,file=/tmp/heap2.bin 2815
or
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -dump:format=b,file=/tmp/heap3.bin 2815
3. See who is taking up the heap. Combined with zprofiler and BTrace, this makes troubleshooting far more powerful.
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -histo 2815 | head -10
jstat
jstat has many options, but this one is enough:
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstat -gcutil 2815 1000
jdb
I still use jdb regularly. It can be used for debugging in the pre-release environment. Assuming JAVA_HOME in pre-release is /opt/taobao/java/ and the remote debug port is 8000, run:
sudo -u admin /opt/taobao/java/bin/jdb -attach 8000
Once jdb has attached and shows its prompt, it has started successfully and you can set breakpoints and debug. For the available options, see the official Oracle documentation: http://docs.oracle.com/javase/7/docs/technotes/tools/windows/jdb.html
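jdb -attach can only connect if the target JVM was started with a debug agent listening on that port. The original does not say how the port was opened, so the following is only a sketch under assumptions: it supposes that the application's launch script reads a JAVA_OPTS variable (hypothetical here) and uses the standard JDWP agent option.

```shell
# Hypothetical setup: append the standard JDWP agent option so the JVM
# listens for a debugger on port 8000 without pausing at startup.
# JAVA_OPTS is an assumed variable name; use whatever your launch script reads.
DEBUG_PORT=8000
JAVA_OPTS="${JAVA_OPTS:-} -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=${DEBUG_PORT}"
echo "$JAVA_OPTS"
```

With suspend=n the application starts normally and waits for a debugger in the background; suspend=y would hold it at startup until jdb attaches.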
CLHSDB
I feel that CLHSDB lets you see more interesting things in many situations; I will not go into detail here. Tools such as jstack and jmap are said to be built on it.
sudo -u admin /opt/taobao/java/bin/java -classpath /opt/taobao/java/lib/sa-jdi.jar sun.jvm.hotspot.CLHSDB
For more detail, see this post by RednaxelaFX (known as R大): http://rednaxelafx.iteye.com/blog/1847971
key promoter
If you cannot remember a shortcut after seeing it once, surely you will after seeing it several times?
maven helper
A good helper for analyzing Maven dependencies.
1. Which file was your class actually loaded from?
-XX:+TraceClassLoading
The output looks like [Loaded java.lang.invoke.MethodHandleImpl$Lazy from D:\programme\jdk\jdk8U74\jre\lib\rt.jar]
2. Write a dump file when the application runs out of memory
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/admin/logs/java.hprof
Is it too much to give dependency (jar) conflicts a headline of their own? Almost everyone has had to deal with this annoying kind of case at some point. With all the approaches below, I refuse to believe it cannot be cracked.
mvn dependency:tree > ~/dependency.txt
Prints all dependencies.
mvn dependency:tree -Dverbose -Dincludes=groupId:artifactId
Prints only the dependencies with the specified groupId and artifactId.
-XX:+TraceClassLoading
Add this to the VM startup script. The details of each loaded class then appear in the Tomcat startup output.
-verbose
Add this to the VM startup script. The details of each loaded class then appear in the Tomcat startup output.
greys:sc
The sc command of Greys also shows clearly where the current class was loaded from.
tomcat-classloader-locate
The following URL tells you where the current class was loaded from:
curl 'http://localhost:8006/classloader/locate?class=org.apache.xerces.xs.XSObjec'
dmesg
If you find that your Java process has quietly disappeared, leaving hardly any clues, dmesg may well have what you are looking for.
sudo dmesg|grep -i kill|less
Search for the keyword oom_killer. The results look something like this:
[6710782.021013] java invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
[6710782.070639] [<ffffff81118898>] ? oom_kill_process+0x68/0x140
[6710782.257588] Task in /LXC011175068174 killed as a result of limit of /LXC011175068174
[6710784.698347] Memory cgroup out of memory: Kill process 215701 (java) score 854 or sacrifice child
[6710784.707978] Killed process 215701, UID 679, (java) total-vm:11017300kB, anon-rss:7152432kB, file-rss:1232kB
As you can see, the Java process was killed by the OOM killer, with a score of 854. A word on the OOM killer (Out-Of-Memory killer): this mechanism monitors the machine's memory consumption. Before the machine runs out of memory, it scans all processes, scores them by certain rules (memory usage, running time and so on), picks the process with the highest score and kills it to protect the machine.
Formula for converting a dmesg timestamp: actual log time = 1970-01-01 UTC + (current time in seconds - seconds since the system booted + the timestamp printed by dmesg) seconds:
date -d "1970-01-01 UTC $(echo "$(date +%s)-$(cat /proc/uptime|cut -f 1 -d' ')+12288812.926194"|bc) seconds"
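The same arithmetic can be wrapped in a small function so that each term of the formula is explicit. This is a sketch, not part of the original: the name dmesg_to_epoch is made up, the inputs in the example are invented, and the result is truncated to whole seconds.

```shell
# dmesg_to_epoch NOW UPTIME STAMP   (hypothetical helper)
#   NOW    - current time in seconds since the epoch (date +%s)
#   UPTIME - seconds since boot (first field of /proc/uptime)
#   STAMP  - the number inside [...] at the start of a dmesg line
# Prints the event time as whole seconds since 1970-01-01 UTC.
dmesg_to_epoch() {
  awk -v now="$1" -v up="$2" -v ts="$3" 'BEGIN { printf "%d\n", now - up + ts }'
}

# Worked example with made-up values for NOW and UPTIME.
dmesg_to_epoch 1500000000 12300000.5 12288812.926194
```

On a live Linux machine with GNU date, the result can be turned back into a readable time with something like date -d "@$(dmesg_to_epoch "$(date +%s)" "$(cut -d' ' -f1 /proc/uptime)" 12288812.926194)".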
All that remains is to find out why memory usage grew so large that it triggered the OOM killer.
RateLimiter
Want fine-grained control over QPS? Say you call an interface and its owner explicitly requires you to keep your QPS within 400: how do you enforce that? This is where RateLimiter comes in. Details can be found at http://ifeve.com/guava-ratelimite