This article comes from Alibaba's internal technology forum, where the original was well received. The author has made it available to the cloud community so that it can be read from outside the company. Hollis made some cuts to the article, including the introductions of tools that are only available inside Alibaba and links that can only be reached from Alibaba's internal network.

Preface

I run into many problems in day-to-day work, and some tools play a considerable role in solving them. I am writing them down here for two reasons: first, as notes, so that I can look things up quickly when I forget; second, to share. Readers are welcome to share the tools that help them most in their own daily work, so that we can all improve together.

Enough chatter, let's get started.

Linux commands

tail

The most commonly used form is tail -f

tail -300f shopbase.log # show the last 300 lines, then keep following the file as it is written

grep

grep forest f.txt     # search a single file
grep forest f.txt cpf.txt # search multiple files
grep 'log' /home/admin -r -n # find every file under the directory that matches the keyword
cat f.txt | grep -i shopbase    # case-insensitive match
grep 'shopbase' /home/admin -r -n --include=*.{vm,java} # only search files with these suffixes
grep 'shopbase' /home/admin -r -n --exclude=*.{vm,java} # skip files with these suffixes
seq 10 | grep 5 -A 3    # the match plus the 3 lines after it
seq 10 | grep 5 -B 3    # the match plus the 3 lines before it
seq 10 | grep 5 -C 3    # the match plus 3 lines on each side; this is the one you will usually use
cat f.txt | grep -c 'SHOPBASE' # count the matching lines
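The three context flags are easy to mix up. A small sketch using only seq, grep and paste shows which lines each flag adds around the match; the helper name context_demo is made up for illustration.

```shell
# Hypothetical helper: print, on one line, what grep selects around "5"
# in the numbers 1..10 for a given context flag (-A, -B or -C).
context_demo() {
  seq 10 | grep "$1" 3 5 | paste -sd ' ' -
}

context_demo -A   # prints: 5 6 7 8
context_demo -B   # prints: 2 3 4 5
context_demo -C   # prints: 2 3 4 5 6 7 8
```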

awk

1 Basic commands

awk '{print $4,$6}' f.txt
awk '{print NR,$0}' f.txt cpf.txt    
awk '{print FNR,$0}' f.txt cpf.txt
awk '{print FNR,FILENAME,$0}' f.txt cpf.txt
awk '{print FILENAME,"NR="NR,"FNR="FNR,"$"NF"="$NF}' f.txt cpf.txt
echo 1:2:3:4 | awk -F: '{print $1,$2,$3,$4}'

2 Matching

awk '/ldb/ {print}' f.txt   # lines that match ldb
awk '!/ldb/ {print}' f.txt  # lines that do not match ldb
awk '/ldb/ && /LISTEN/ {print}' f.txt   # lines that match both ldb and LISTEN
awk '$5 ~ /ldb/ {print}' f.txt   # lines whose fifth column matches ldb

3 Built-in variables

NR: the number of records awk has read since it started, split by the record separator. The default record separator is a newline, so by default NR is the number of lines read so far. NR can be read as an abbreviation of Number of Records.

FNR: when awk processes multiple input files, NR does not restart at 1 after the first file is finished; it keeps accumulating. That is why FNR exists: it restarts at 1 for every file. FNR can be read as File Number of Records.

NF: the number of fields the current record was split into. NF can be read as Number of Fields.
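To see the three variables side by side, here is a minimal sketch that runs awk over two throwaway files. The helper name nr_fnr_nf and the file contents are made up for the demo.

```shell
# Create two small throwaway files (names and contents are made up).
demo_dir=$(mktemp -d)
printf 'a b c\nd e\n' > "$demo_dir/f.txt"
printf 'x y z w\n'    > "$demo_dir/cpf.txt"

# NR keeps counting across files, FNR restarts at 1 for each file,
# and NF is the number of fields in the current record.
nr_fnr_nf() {
  awk '{print "NR=" NR, "FNR=" FNR, "NF=" NF}' "$@"
}

nr_fnr_nf "$demo_dir/f.txt" "$demo_dir/cpf.txt"
# NR=1 FNR=1 NF=3
# NR=2 FNR=2 NF=2
# NR=3 FNR=1 NF=4
```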

find

sudo -u admin find /home/admin /tmp /usr -name \*.log   # search several directories
find . -iname \*.txt   # case-insensitive name match
find /usr -type l   # all symbolic links under /usr
find /usr -type l -name "z*" -ls   # symbolic link details, e.g. inode and directory
find /home/admin -size +250000k   # files larger than 250000k
find /home/admin -type f -perm 777 -exec ls -l {} \;   # files with permission 777
find /home/admin -atime -1   # files accessed within one day
find /home/admin -ctime -1   # files whose status changed within one day
find /home/admin -mtime -1   # files modified within one day
find /home/admin -amin -1   # files accessed within one minute
find /home/admin -cmin -1   # files whose status changed within one minute
find /home/admin -mmin -1   # files modified within one minute
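The sign in front of the number decides the direction of the time tests. The sketch below tries this out in a throwaway directory; the file names and the helper names recent_files and stale_files are made up, and -mmin is assumed to be available (GNU and BSD find both provide it).

```shell
# Build a throwaway directory with one fresh file and one backdated file.
find_demo_dir=$(mktemp -d)
touch "$find_demo_dir/new.log"
touch -t 200001010000 "$find_demo_dir/old.log"   # mtime set to 2000-01-01 00:00

# -mmin -1: modified less than one minute ago
recent_files() {
  find "$1" -type f -name '*.log' -mmin -1
}

# -mtime +1: the "+" flips the test to "older than"; find counts whole days,
# so this matches files modified at least two days ago
stale_files() {
  find "$1" -type f -name '*.log' -mtime +1
}

recent_files "$find_demo_dir"   # lists only new.log
stale_files "$find_demo_dir"    # lists only old.log
```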

pgm

Batch-query the logs of the vm-shopbase machines that match a condition

pgm -A -f vm-shopbase 'cat /home/admin/shopbase/logs/shopbase.log.2017-01-17|grep 2069861630'

tsar

tsar is our company's own data collection tool. It is very handy: it persists historical data to disk, so we can quickly query historical system data. Real-time data can of course also be queried. It is installed on most machines.

tsar ### view the metrics for the most recent day

tsar --live ### view real-time metrics, refreshed every five seconds by default

tsar -d 20161218 ### view the data for a given day; it seems that only about the last four months can be queried

tsar --mem
tsar --load
tsar --cpu
### each of these can also be combined with -d to query a single metric for a given day







top

Besides showing some basic information, top is mostly used together with other tools to investigate VM problems

ps -ef | grep java
top -H -p pid

Take the thread ID reported by top, convert it from decimal to hexadecimal, and then look it up in the jstack output to see what that thread is actually doing
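The conversion itself is a one-liner with printf. A minimal sketch follows; the helper name tid_to_nid is made up, and the jstack usage in the trailing comment uses placeholders rather than real IDs.

```shell
# Convert a decimal thread ID (the PID column of `top -H`) into the
# hexadecimal form that jstack prints in its nid=0x... field.
tid_to_nid() {
  printf '0x%x\n' "$1"
}

tid_to_nid 2815   # prints: 0xaff

# Assumed usage against a live JVM (<pid> and <tid> are placeholders):
#   jstack <pid> | grep -A 20 "nid=$(tid_to_nid <tid>)"
```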

Others

netstat -nat|awk  '{print $6}'|sort|uniq -c|sort -rn 
# Count the current connections by state; watch out for a high CLOSE_WAIT count
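The netstat pipeline also counts words from netstat's header lines. A slightly stricter variant, sketched here with made-up sample input, only looks at lines whose first field starts with tcp; the helper name count_tcp_states is hypothetical.

```shell
# Count TCP connections per state from `netstat -nat`-style input on stdin.
# Keeping only lines whose first field starts with "tcp" skips the headers.
count_tcp_states() {
  awk '$1 ~ /^tcp/ {print $6}' | sort | uniq -c | sort -rn
}

# Made-up sample standing in for real netstat output.
printf '%s\n' \
  'Proto Recv-Q Send-Q Local Address Foreign Address State' \
  'tcp 0 0 10.0.0.1:8080 10.0.0.2:51000 CLOSE_WAIT' \
  'tcp 0 0 10.0.0.1:8080 10.0.0.3:51001 CLOSE_WAIT' \
  'tcp 0 0 10.0.0.1:8080 10.0.0.4:51002 ESTABLISHED' \
  | count_tcp_states
```

On a real machine the same function would be fed with netstat -nat | count_tcp_states; the most frequent state comes first.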



Troubleshooting tools

BTrace

The first tool to mention is BTrace. It is a real lifesaver when troubleshooting in production and pre-release environments. I will skip the introduction and go straight to the code.

1. See who is currently calling the add method of ArrayList, printing the call stack only for threads whose ArrayList has a size greater than 500

// Probe every call site of ArrayList.add, whatever the calling class or method
@OnMethod(clazz = "java.util.ArrayList", method = "add", location = @Location(value = Kind.CALL, clazz = "/.*/", method = "/.*/"))
public static void m(@ProbeClassName String probeClass, @ProbeMethodName String probeMethod, @TargetInstance Object instance, @TargetMethodOrField String method) {
   // Only report lists that have already grown beyond 500 elements
   if (getInt(field("java.util.ArrayList", "size"), instance) > 500) {
       println("check who ArrayList.add method:" + probeClass + "#" + probeMethod  + ", method:" + method + ", size:" + getInt(field("java.util.ArrayList", "size"), instance));
       jstack();
       println();
       println("===========================");
       println();
   }
}

2. Monitor the return value and the request parameters each time the service method is called

@OnMethod(clazz = "com.taobao.sellerhome.transfer.biz.impl.C2CApplyerServiceImpl", method="nav", location = @Location(value = Kind.RETURN))
public static void mt(long userId, int current, int relation, String check, String redirectUrl, @Return AnyType result) {
   println("parameter# userId:" + userId + ", current:" + current + ", relation:" + relation + ", check:" + check + ", redirectUrl:" + redirectUrl + ", result:" + result);
}

For more, see: https://github.com/btraceio/btrace

Notes:

  • From what I have observed, the output of release 1.3.9 is unstable; the probe has to be triggered several times before the correct result appears
  • Keep the range matched by the regular expression for traced classes tightly scoped, otherwise the application is very likely to hang because the CPU is saturated
  • Because BTrace works by bytecode injection, you need to restart the application to bring it back to its normal state.

Greys

Here are a few of its great features (some of them overlap with BTrace):

sc -df xxx: outputs the details of the class, including its source location and classloader structure

trace class method: I really like this feature! I first saw it in JProfiler a long time ago. It prints the elapsed time of the current method call, broken down by each method it calls.

javOSize

I will mention just one feature, classes: it changes the content of a class by modifying its bytecode, and the change takes effect immediately. That lets you quickly add a log statement somewhere to see the output. The downside is that it is very intrusive to the code, but if you know what you are doing it is a great tool.

Greys and BTrace can easily do everything else it offers, so I will not cover that.

JProfiler

I used to diagnose many problems with JProfiler, but Greys and BTrace can now handle most of them. Besides, most problems occur in production (which is network-isolated), so I rarely use it anymore. It still deserves a mention. Website: https://www.ej-technologies.com/products/jprofiler/overview.html

The big guns

Eclipse MAT

It can be used as an Eclipse plug-in or run as a standalone program. For details, see http://www.eclipse.org/mat/

Java's three trusty axes, no wait, seven

jps

I only use one command:

sudo -u admin /opt/taobao/java/bin/jps -mlvV



jstack

Common usage:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstack 2815



Native + Java stack:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstack -m 2815



jinfo

Shows the JVM startup flags, as follows:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jinfo -flags 2815



jmap

Three uses

1. View the heap status

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -heap 2815





2. Dump the heap

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -dump:live,format=b,file=/tmp/heap2.bin 2815   # live: dump only live objects

or

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -dump:format=b,file=/tmp/heap3.bin 2815   # dump all objects in the heap

3. See who is occupying the heap. Combined with zprofiler and BTrace, this makes troubleshooting far more powerful

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -histo 2815 | head -10



jstat

jstat has many options, but this one is enough:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstat -gcutil 2815 1000   # print GC utilization for pid 2815 every 1000 ms



jdb

jdb is still used regularly these days.

jdb can be used to debug in the pre-release environment. Assuming that JAVA_HOME in pre-release is /opt/taobao/java/ and the remote debug port is 8000, then:

sudo -u admin /opt/taobao/java/bin/jdb -attach 8000



Once jdb reports that it has attached, it has started successfully, and you can set breakpoints and debug.

For the specific parameters, see the official Oracle documentation: http://docs.oracle.com/javase/7/docs/technotes/tools/windows/jdb.html

CLHSDB

With CLHSDB I feel that in many cases you can see more interesting things. I've heard that tools like jstack and jmap are built on top of it.

sudo -u admin /opt/taobao/java/bin/java -classpath /opt/taobao/java/lib/sa-jdi.jar sun.jvm.hotspot.CLHSDB

For more detail, see this post by R大: http://rednaxelafx.iteye.com/blog/1847971

IntelliJ IDEA plugins

key promoter

You may not remember a shortcut after seeing it once, but after seeing it several times you will, right?

maven helper

A good helper for analyzing Maven dependencies.

VM options

1. Which file was your class loaded from?

-XX:+TraceClassLoading
The output looks like: [Loaded java.lang.invoke.MethodHandleImpl$Lazy from D:\programme\jdk\jdk8U74\jre\lib\rt.jar]

2. Write a heap dump file when an OutOfMemoryError occurs

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/admin/logs/java.hprof

Jar conflicts

Is it too much to give this its own heading? Everyone has dealt with this annoying kind of case at one point or another. With this many approaches available, I refuse to believe it cannot be sorted out.

mvn dependency:tree > ~/dependency.txt

Prints all dependencies

mvn dependency:tree -Dverbose -Dincludes=groupId:artifactId

Prints only the dependencies with the specified groupId and artifactId
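Once the verbose tree is saved to a file, the losing side of each version conflict can be pulled out with grep. This is a sketch under an assumption: that your version of the dependency plugin marks such entries with the text "omitted for conflict with" in verbose mode. The helper name conflict_lines and the sample coordinates are made up.

```shell
# List the entries that Maven dropped because another version won.
# Argument: a file holding saved `mvn dependency:tree -Dverbose` output.
conflict_lines() {
  grep 'omitted for conflict with' "$1"
}

# Made-up sample standing in for a saved dependency tree.
tree_file=$(mktemp)
printf '%s\n' \
  '[INFO] +- com.example:app-core:jar:1.0:compile' \
  '[INFO] |  \- (com.example:util:jar:1.1:compile - omitted for conflict with 1.0)' \
  '[INFO] \- com.example:util:jar:1.0:compile' > "$tree_file"

conflict_lines "$tree_file"
```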

-XX:+TraceClassLoading

Add this to the VM startup script. The details of the loaded classes are then visible when Tomcat starts

-verbose

Add this to the VM startup script. The details of the loaded classes are then visible when Tomcat starts

greys:sc

The sc command of Greys also shows clearly where the current class was loaded from

tomcat-classloader-locate

The following URL tells you where the current class was loaded from:

curl http://localhost:8006/classloader/locate?class=org.apache.xerces.xs.XSObjec

Others

dmesg

If you find that your Java process has quietly disappeared, leaving hardly any clues, then dmesg may well have what you are looking for.

sudo dmesg|grep -i kill|less

Look for the keyword oom_killer. The results look similar to the following:

[6710782.021013] java invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_scoe_adj=0
[6710782.070639] [<ffffff81118898>] ? oom_kill_process+0x68/0x140
[6710782.257588] Task in /LXC011175068174 killed as a result of limit of /LXC011175068174
[6710784.698347] cgroup out of memory: Kill process 215701 (java) score 854 or sacrifice child
[6710784.707978] Kill process 215701, UID 679, (java) total-vm:11017300kB, anon-rss:7152432kB, file-rss:1232kB

The Java process was killed by the OOM killer with a score of 854. To explain: the OOM killer (Out-Of-Memory killer) monitors the machine's memory consumption. When the machine is about to run out of memory, it scans all processes, scores them according to certain rules (memory usage, running time and so on), picks the process with the highest score, and kills it to protect the machine.

dmesg timestamp conversion: actual log time = 1970-01-01 UTC + (current time in seconds - seconds since the system booted + timestamp printed by dmesg) seconds:

date -d "1970-01-01 UTC `echo "$(date +%s)-$(cat /proc/uptime|cut -f 1 -d' ') + 12288812.926194"|bc ` seconds"
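The same formula can be written as a small function that takes its three inputs as arguments, which makes the arithmetic easy to check. This is a sketch: the name dmesg_to_epoch is made up, the numbers in the example are invented, and fractions of a second are simply dropped.

```shell
# Convert a dmesg timestamp (seconds since boot) to epoch seconds.
# Arguments: current epoch seconds, uptime in seconds, dmesg timestamp.
# Any fractional part is stripped, so the result is whole seconds.
dmesg_to_epoch() {
  echo $(( ${1%.*} - ${2%.*} + ${3%.*} ))
}

# Invented numbers: booted 100000 s ago, log line written 60000 s after boot.
dmesg_to_epoch 1484640000 100000.50 60000.25   # prints: 1484600000

# Assumed usage on a Linux machine with GNU date:
#   date -d "@$(dmesg_to_epoch "$(date +%s)" "$(cut -d' ' -f1 /proc/uptime)" 12288812.926194)"
```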

What remains is to find out why memory usage grew so large that it triggered the OOM killer.

Newly acquired skill

RateLimiter

Want fine-grained control over QPS? For example, suppose you call an interface and the other side explicitly asks you to keep your QPS within 400. How do you enforce that? This is where RateLimiter comes in. Details can be found at http://ifeve.com/guava-ratelimite

Author: Red Devil Number seven

Link: https://yq.aliyun.com/articles/69520?utm_content=m_10360


