Arthas – The ultimate tool for locating online problems in Java
Transfer: mp.weixin.qq.com/s?subscene=…
preface
Before Arthas, when you had Java online problems, such as CPU spikes, load spikes, memory overruns, etc., you had to look at the commands, look at the network, then do JPS, JStack, JMAP, JHAT, Jstat, hprof, etc. You end up in a mess, and you don’t necessarily know what the problem is. Today, you can use Arthas to locate most common problems easily, solve them quickly, stop your losses on time, and leave work on time.
1. Introduction to Arthas
Arthas is a Java diagnostic tool that Alibaba opened as an open source in September 2018. Support JDK6+, use the command line interactive mode, provide Tab automatic incomplete, can conveniently locate and diagnose online program running problems. As of this writing, Star 17000+ has been harvested.
The Arthas official documentation is very detailed, and this article also references the official documentation. In addition, there are not only problems feedback but also a large number of use cases in the Issues on Github, an open source project.
data
Open source address: github.com/alibaba/art…
Official documentation: alibaba.github. IO /arthas
WIKI: arthas.aliyun.com/doc
2. Usage scenarios of Arthas
Thanks to its powerful and rich capabilities, Arthas is able to do more than you could ever imagine. The following are just a few common usage scenarios, but more can be explored as you become familiar with Arthas.
Is there a global view of the health of the system?
Why does the CPU go up again, and what is taking up the CPU?
Are there deadlocks in running multiple threads? Is there a blockage?
It takes a long time to run. What takes a long time to run? How do you monitor it?
Which JAR is this class loaded from? Why are exceptions related to various classes reported?
Why is the code I changed not executed? Did I not commit? Wrong branch?
You can’t debug online when you encounter a problem, can you only add logs and republish?
Is there any way to monitor the real-time state of the JVM?
3. How does Arthas work
As mentioned earlier, Arthas is a Command-line Java diagnostic tool, written in Java, so you can download the corresponding JAR package and run it.
3.1 installation
It can be downloaded from the official Github, or if it is slow, try Gitee, the domestic code cloud.
Arthas is downloaded and installed
Making the download
Wget alibaba. Making. IO/arthas/arth…
Or Gitee download
Wget arthas. Gitee. IO/arthas – the boot…
3.2 run
Arthas is just a Java program, so you can run it directly with java-jar. Select the Java process to monitor at runtime or after run.
Printing Help Information
java -jar arthas-boot.jar -h
EXAMPLES: Jar Java -jar arthas-boot.jar Java -jar arthas-boot.jar –target-ip 0.0.0.0 Java -jar arthas-boot.jar — teln-port 9999 –http-port -1 java -jar arthas-boot.jar –username admin –password java -jar arthas-boot.jar –tunnel-server ‘ws: / / 192.168.10.11:7777 / ws’ – app -name demoapp Java jar arthas – boot. The jar – tunnel – server’ ws: / / 192.168.10.11:7777 / ws’ – agent – id bvDOe8XbTM2pQWjF4cfw Java – jar arthas – boot. The jar – stat – url ‘http://192.168.10.11:8080/api/stat’ Java – jar arthas-boot.jar -c ‘sysprop; As java-jar arthas-boot.jar –use-version 3.5.2 java-jar arthas-boot.jar –versions java -jar arthas-boot.jar –select math-game java -jar arthas-boot.jar –session-timeout 3600 java -jar arthas-boot.jar –attach-only java -jar arthas-boot.jar –disabled-commands stop,dump java -jar arthas-boot.jar –repo-mirror aliyun –use-http
Running mode 1: Run the Java process and select PID
java -jar arthas-boot.jar
Select process (enter [] internal number (not PID) and press Enter)
[INFO] arthas-boot version: 3.1.4 [INFO] Found existing java process, please choose one and hit RETURN.
-
[1]: 11616 com.Arthas
[2], 8676
[3]: 16200 org.jetbrains.jps.cmdline.Launcher [4]: 21032 org.jetbrains.idea.maven.server.RemoteMavenServer
Runtime mode 2: Select Java process PID at runtime
Java -jar arthas-boot.jar [PID] You can run the ps command or the JPS command provided by the JDK to view the PID.
View information about running Java processes
$ jps -mlvV
Filter Java process information
$JPS – mlvV | grep [XXX] the JPS screening to process.
Once the Arthas Logo is present, you can use the command to troubleshoot the problem. More on this below.
# $ps - ef | grep Java to find specific Java pid $Java - jar arthas - boot. Jar 15601 [INFO] arthas - boot version: 3.5.2[INFO] Start Download arthas from remote Server: https://arthas.aliyun.com/download/3.5.3?mirror=aliyun [INFO] Download arthas success. [INFO] arthas home: / home/haochezhu - dev /. Arthas/lib / 3.5.3 arthas [INFO] Try to attach the process 15601 [INFO] attach process 15601 success. [INFO] Arthas - client connect 127.0.0.1 3658, -.,. -- -- -- -- -- -, -- -- -- -- -- -- -- -.,.,.,. -, -. / O \ | -. '-.. -' | '-' | / O \ '-' |. -. | | '-', '| |. | -. | |. -. | `. ` - | | | | | | \ \ | | | | | | | | | |. -' | ` - ` '` -' - '-' ` - '` -' `--'`--' `--'`-----'wiki https://arthas.aliyun.com/doctutorials https://arthas.aliyun.com/doc/arthas-tutorials.htmlversion 3.5.3 main_classpid 15601 time 2021-08-11 11:29:41[arthas@15601]$Copy the code
For more startup modes, see the help help command.
Other USES
EXAMPLES: Jar Java -jar arthas-boot.jar Java -jar arthas-boot.jar –target-ip 0.0.0.0 Java -jar arthas-boot.jar — teln-port 9999 Java – – – HTTP port – 1 jar arthas – boot. The jar – tunnel – server ‘ws: / / 192.168.10.11:7777 / ws’ Java – jar arthas – boot. The jar — tunnel – server ‘ws: / / 192.168.10.11:7777 / ws’ – agent – id bvDOe8XbTM2pQWjF4cfw Java – jar arthas – boot. The jar – stat – url Java – jar arthas’ http://192.168.10.11:8080/api/stat ‘- the boot. The jar – c’ sysprop; As java-jar arthas-boot.jar –use-version 3.1.4 java-jar arthas-boot.jar –versions java -jar arthas-boot.jar –session-timeout 3600 java -jar arthas-boot.jar –attach-only java -jar arthas-boot.jar –repo-mirror aliyun –use-http
3.3 web console
Arthas currently supports a Web Console, which is automatically started upon successful startup of the connection process and can be accessed directly at http://127.0.0.1:8563/, where the page operates in exactly the same mode as the Console.
3.4 Common Commands
Here are some common Arthas commands that you may not know how to use. Don’t worry, they will be covered later.
Dashboard Real-time data panel of the current system Thread Displays thread stack information of the current JVM watch Method Performs data observation – method debug debugging, method variable values, etc. The trace method calls the path internally and prints the time on each node on the method path. Tp99 Monitoring time stack output the call path of the current method is called tt method execution data space-time tunnel, record the input parameters and return information of each call of the specified method, And can be called to observe these different times of the monitor method to perform monitoring JVM to view the current JVM information vmOption view, Update parameters related to JVM diagnostics sc View information about classes loaded by the JVM SM View information about methods loaded by the JVM JAD Decompile the source code of the specified loaded class Classloader View the inheritance tree of classloader, urls, Class loading information Heapdump Is similar to the heapdump function of the jmap command
3.5 out of shutdown
When you exit with shutdown, Arthas automatically resets all enhanced classes at the same time.
4. Common Arthas operations
Now that you’ve seen what Arthas is and how it can be started, the following details how Arthas can be used, depending on some circumstances. If there is a problem when using a command, you can run the -h command to view the help information.
Write a test class to run and use Arthas to locate the problem.
import java.util.HashSet; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; import lombok.extern.slf4j.Slf4j; /** * <p> * Arthas Demo * * * @author niujinpeng */ @slf4j public class Arthas {private static HashSet HashSet = new HashSet(); / * * thread pool, the size of 1 * / private static ExecutorService ExecutorService = Executors. NewFixedThreadPool (1); Public static void main(String[] args) {// the simulated CPU is too high. // Simulate thread blocking thread(); // Simulate deadThread(); // Add data to the hashSet. AddHashSetThread (); Public static void addHashSetThread() {// Initialize constant new Thread(() -> {int count = 0; while (true) { try { hashSet.add("count" + count); Thread.sleep(10000); count++; } catch (InterruptedException e) { e.printStackTrace(); } } }).start(); } public static void cpu() { cpuHigh(); cpuNormal(); } private static void cpuHigh() {Thread Thread = new Thread(() -> {while (true) {log.info(" CPU ") start 100"); }}); // Add to thread executorService.submit(thread); } private static void cpuNormal() {for (int I = 0; i < 10; i++) { new Thread(() -> { while (true) { log.info("cpu start"); try { Thread.sleep(3000); } catch (InterruptedException e) { e.printStackTrace(); } } }).start(); }} /** * Private static void thread() {thread thread = new thread() -> {while (true) { log.debug("thread start"); try { Thread.sleep(3000); } catch (InterruptedException e) { e.printStackTrace(); }}}); // Add to thread executorService.submit(thread); } /** * deadThread */ private static void deadThread() {/** Create resource */ Object resourceA = new Object(); Object resourceB = new Object(); Thread threadA = new Thread(() -> {synchronized (resourceA) {log.info(thread.currentThread () + "get" ResourceA"); try { Thread.sleep(1000); } catch (InterruptedException e) { e.printStackTrace(); } log.info(Thread.currentThread() + "waiting get resourceB"); synchronized (resourceB) { log.info(Thread.currentThread() + " get resourceB"); }}}); Thread threadB = new Thread(() -> { synchronized (resourceB) { log.info(Thread.currentThread() + " get ResourceB"); try { Thread.sleep(1000); } catch (InterruptedException e) { e.printStackTrace(); } log.info(Thread.currentThread() + "waiting get resourceA"); synchronized (resourceA) { log.info(Thread.currentThread() + " get resourceA"); }}}); threadA.start(); threadB.start(); }}Copy the code
4.1 Monitoring Dashboards globally
Use the dashboard command to get an overview of your program’s threading, memory, GC, and runtime environment.
4.2 Why did CPU take off
The above code example has a CPU idling infinite loop, which is very CPU consuming, so how to find out?
Using thread, you can see that the CPU usage of each thread is 100%.
Using the command thread 12 to view the high CPU usage of thread 12, you can see the high CPU usage method and the number of lines (the number of lines may differ from the number of lines in the above code, which was rearranged as I wrote this article).
The above is by looking at the overall thread information, and then see the specific thread running. If you want to find the threads with high CPU usage, you can run the thread -n [Number of threads displayed] command to list the threads with Top N CPU usage.
Locate to the method with the highest CPU usage.
4.3 Thread Pool Thread status
Before locating thread problems, let’s review some common thread states:
In the operation of the RUNNABLE
TIMED_WAITIN A thread that calls the following method will enter TIMED_WAITING:
Thread#sleep()
Object#wait() with a timeout parameter
Thread#join() with a timeout parameter
LockSupport#parkNanos()
LockSupport#parkUntil()
WAITING A thread enters the WAITING state when it calls the following method:
Object#wait() with no timeout argument
Thread#join() with no timeout parameter
LockSupport#park()
BLOCKED BLOCKED, waiting for the lock
In the above simulation, we define a pool of thread size 1, submit a thread in the cpuHigh method, and submit another thread in the thread method. The next thread is blocked because the pool is full.
Using thread | grep pool in order to check the thread pool thread information.
You can see that the thread pool has WAITING threads.
4.4 Thread Deadlock
In the above simulation code, the deadThread method implements a deadlock. Use the thread-b command to view the information directly locating the deadlock.
/** Deadlock */ private static void deadThread() {/** Create resource */Object resourceA = new Object(); Object resourceB = new Object(); Thread threadA = new Thread(() -> {synchronized (resourceA) {log.info(thread.currentThread () + "get" ResourceA"); try { Thread.sleep(1000); } catch (InterruptedException e) { e.printStackTrace(); } log.info(Thread.currentThread() + "waiting get resourceB"); synchronized (resourceB) { log.info(Thread.currentThread() + " get resourceB"); }}}); Thread threadB = new Thread(() -> { synchronized (resourceB) { log.info(Thread.currentThread() + " get ResourceB"); try { Thread.sleep(1000); } catch (InterruptedException e) { e.printStackTrace(); } log.info(Thread.currentThread() + "waiting get resourceA"); synchronized (resourceA) { log.info(Thread.currentThread() + " get resourceA"); }}}); threadA.start(); threadB.start(); }Copy the code
The checked deadlock information.
4.5 Decompile jad
The above code is placed under package COM. Assuming this is a threaded environment, you can either decompile the code directly or optionally view the field or method information of the class if you suspect that the code currently running is not what you want.
If you suspect it is not your code, you can use the jad command to decompile the class directly.
Jad The jad command also provides some other parameters:
Decompile shows only the source code
jad –source-only com.Arthas
Decompile a method of a class
jad –source-only com.Arthas mysql
4.6 Viewing Field Information
Run the sc -d -f command to view the field information about the class.
[arthas@20252]$ sc -d -f com.Arthassc -d -f com.Arthas class-info com.Arthas code-source /C:/Users/Niu/Desktop/arthas/target/classes/ name com.Arthas isInterface false isAnnotation false isEnum false isAnonymousClass false isArray false isLocalClass false isMemberClass false isPrimitive false isSynthetic false simple-name Arthas modifier public annotation interfaces super-class +-java.lang.Object class-loader +-sun.misc.Launcher$AppClassLoader@18b4aac2 +-sun.misc.Launcher$ExtClassLoader@2ef1e4fa classLoaderHash 18b4aac2 fields modifierfinal,private,static type org.slf4j.Logger name log value Logger[com.Arthas] modifierprivate,static type java.util.HashSet name hashSet value [count1, count2] modifierprivate,static type java.util.concurrent.ExecutorService name executorService value java.util.concurrent.ThreadPoolExecutor@71c03156[Ru nning, pool size = 1, active threads = 1, queued ta sks = 0, completed tasks = 0]Affect(row-cnt:1) cost in 9 ms.
Copy the code
4.7 Viewing Method Information
Use the sm command to view the method information of the class.
[arthas@22180]$ sm com.Arthascom.Arthas <init>()Vcom.Arthas start()Vcom.Arthas thread()Vcom.Arthas deadThread()Vcom.Arthas lambda$cpuHigh$1()Vcom.Arthas cpuHigh()Vcom.Arthas lambda$thread$3()Vcom.Arthas addHashSetThread()Vcom.Arthas cpuNormal()Vcom.Arthas cpu()Vcom.Arthas lambda$addHashSetThread$0()Vcom.Arthas lambda$deadThread$4(Ljava/lang/Object; Ljava/lang/Object;) Vcom.Arthas lambda$deadThread$5(Ljava/lang/Object; Ljava/lang/Object;) Vcom.Arthas lambda$cpuNormal$2()VAffect(row-cnt:16) cost in 6 ms.Copy the code
4.8 Is curious about the value of the variable
With the ognl command, ogNL expressions can easily manipulate the desired information.
In the same code as the example above, let’s look at the data in the variable hashSet:
View information about the static variable hashSet.
[arthas@19856]$ ognl '@com.Arthas@hashSet'@HashSet[ @String[count1], @String[count2], @String[count29], @String[count28], @String[count0], @String[count27], @String[count5], @String[count26], @String[count6], @String[count25], @String[count3], @String[count24],
Copy the code
View the size of the static variable hashSet.
[arthas@19856]$ognl ‘@[email protected]()’ @Integer[57] can even operate.
[arthas@19856] ognl ‘@[email protected](“test”)’ @Boolean\[true\] \[arthas@19856\]
View the added characters
[arthas @ 19856] ognl ‘@ com. Arthas @ hashSet | grep test @ String \ [test \], \ [arthas @ 19856 \] ognl can do many things, See the special usage of ognl expressions (github.com/alibaba/art…) .
4.9 Is there any problem with the program
4.9.1 Slow Operation and Long Time Consumption
You can run the trace command to trace the statistical method time
This time, I’m going to use another simulation code. The getUser method on the control layer calls userService.get(uid); the getUser method on the control layer calls userService.get(uid). Check, service, redis, mysql;
@RestController@Slf4jpublic class UserController { @Autowired private UserServiceImpl userService; @GetMapping(value = "/user") public HashMap<String, Object> getUser(Integer UID) throws Exception {// Simulate user query userService.get(uid); HashMap<String, Object> hashMap = new HashMap<>(); hashMap.put("uid", uid); hashMap.put("name", "name" + uid); return hashMap; }}Copy the code
Mock code Service:
@Service@Slf4jpublic class UserServiceImpl { public void get(Integer uid) throws Exception { check(uid); service(uid); redis(uid); mysql(uid); } public void service(Integer uid) throws Exception { int count = 0; for (int i = 0; i < 10; i++) { count += i; } log.info("service end {}", count); } public void redis(Integer uid) throws Exception { int count = 0; for (int i = 0; i < 10000; i++) { count += i; } log.info("redis end {}", count); } public void mysql(Integer uid) throws Exception { long count = 0; for (int i = 0; i < 10000000; i++) { count += i; } log.info("mysql end {}", count); } public Boolean check (uid) Integer throws the Exception {the if (uid = = null | | uid < 0) {log. The error (" uid is not correct, the uid: {} ", uid); Throw new Exception("uid is incorrect "); } return true; }}Copy the code
After running Springboot, use the trace== command to start detecting the time.
[arthas@6592]$trace com.userController getUser Access interface /getUser. The com.userServiceImpl :get() method takes a lot of time.
Continue to trace the time-consuming method, and then access it again.
[arthas@6592]$ trace com.UserServiceImpl get
It is clear that the mysql method com.UserServiceImpl is the most time-consuming.
Affect(class-cnt:1 , method-cnt:1) cost in 31 ms. —ts=2019-10-16 14:40:10; thread_name=http-nio-8080-exec-8; id=1f; is_daemon=true; priority=5; TCCL = org. Springframework. Boot. Web. Embedded. Tomcat. 23 a918c7 TomcatEmbeddedWebappClassLoader @ – [6.792201] ms Com. UserServiceImpl: get () + – [0.008] ms com. UserServiceImpl: check () # 17 + – [0.076] ms com. UserServiceImpl: service () # 18 + – [0.1089] ms com. UserServiceImpl: redis () # 19 ` – [6.528899] ms com. UserServiceImpl: mysql (# 20)
4.9.2 Time consuming of statistical methods
Run the monitor command to monitor the execution of statistical methods.
Count the get method execution of the com.UserServiceImpl class every 5 seconds.
monitor -c 5 com.UserServiceImpl get
4.10 Want to observe method information
The following example uses the first two mocks in this article.
4.10.1 Observing the Input and output parameters of a method
Use the watch command to easily view input and output parameters and exceptions.
USAGE: watch [-b] [-e] [-x <value>] [-f] [-h] [-n <value>] [-E] [-M <value>] [-s] class-pattern method-pattern express [condition-express] SUMMARY: Display the input/output parameter, return object, and thrown exception of specified method invocation The express may be one of the following expression (evaluated dynamically): target : the object clazz : the object's class method : the constructor or method params : the parameters array of method params[0..n] : the element of parameters array returnObj : the returned object of method throwExp : the throw exception of method isReturn : the method ended by return isThrow : the method ended by throwing exception #cost : the execution time in ms of method invocation Examples: watch -b org.apache.commons.lang.StringUtils isBlank params watch -f org.apache.commons.lang.StringUtils isBlank returnObj watch org.apache.commons.lang.StringUtils isBlank '{params, target, returnObj}' -x 2 watch -bf *StringUtils isBlank params watch *StringUtils isBlank params[0] watch *StringUtils isBlank params[0] params[0].length==1 watch *StringUtils isBlank params '#cost>100' watch -E -b org\.apache\.commons\.lang\.StringUtils isBlank params[0]
Copy the code
WIKI: alibaba. Making. IO/arthas/watc… Common operations:
View the incoming and outgoing parameters
$ watch com.Arthas addHashSet ‘{params[0],returnObj}’
View the incoming and outgoing parameter sizes
$ watch com.Arthas addHashSet ‘{params[0],returnObj.size}’
Check to see if the incoming and outgoing arguments contain ‘count10’
$ watch com.Arthas addHashSet ‘{params[0],returnObj.contains(“count10”)}’
Look at the incoming and outbound parameters. The outbound parameter toString
$watch com.arthas addHashSet ‘{params[0], returnobj.toString ()}’
View the returned exception information.
4.10.2 Observe the call path of the method
Run the stack command to view method invocation information.
Observe the path of the mysql method call for class com.UserServiceImpl
Stack com.userServiceImpl mysql can see the call path as shown in figure.
4.10.3 Method call space-time tunnel
Use the tt command to record the details of the method execution.
The tt command method executes the space-time tunnel of data, recording the incoming parameters and return information of each call to the specified method, and observing these different calls at different times.
Common operations:
Start recording method invocation information: tt -t com.userServiceImpl check
You can see that the record with INDEX=1001 has is-exp = true, indicating that this call IS abnormal.
View the recorded method call information: tt-l
To view the details of the call record (-i specifies INDEX) : tt -i 1001
You can view the exception information recorded with INDEX=1001.
Re-invoke, using the specified record, and re-invoke with -p.
The result of tt-i 1001-p is shown in the figure.
The code has been uploaded to Github.
Github.com/niumoo/lab-…