Arthas – The ultimate tool for locating online problems in Java

Transfer: mp.weixin.qq.com/s?subscene=…

preface

Before Arthas, when you had Java online problems, such as CPU spikes, load spikes, memory overruns, etc., you had to look at the commands, look at the network, then do JPS, JStack, JMAP, JHAT, Jstat, hprof, etc. You end up in a mess, and you don’t necessarily know what the problem is. Today, you can use Arthas to locate most common problems easily, solve them quickly, stop your losses on time, and leave work on time.

1. Introduction to Arthas

Arthas is a Java diagnostic tool that Alibaba opened as an open source in September 2018. Support JDK6+, use the command line interactive mode, provide Tab automatic incomplete, can conveniently locate and diagnose online program running problems. As of this writing, Star 17000+ has been harvested.

The Arthas official documentation is very detailed, and this article also references the official documentation. In addition, there are not only problems feedback but also a large number of use cases in the Issues on Github, an open source project.

data

Open source address: github.com/alibaba/art…

Official documentation: alibaba.github. IO /arthas

WIKI: arthas.aliyun.com/doc

2. Usage scenarios of Arthas

Thanks to its powerful and rich capabilities, Arthas is able to do more than you could ever imagine. The following are just a few common usage scenarios, but more can be explored as you become familiar with Arthas.

Is there a global view of the health of the system?

Why does the CPU go up again, and what is taking up the CPU?

Are there deadlocks in running multiple threads? Is there a blockage?

It takes a long time to run. What takes a long time to run? How do you monitor it?

Which JAR is this class loaded from? Why are exceptions related to various classes reported?

Why is the code I changed not executed? Did I not commit? Wrong branch?

You can’t debug online when you encounter a problem, can you only add logs and republish?

Is there any way to monitor the real-time state of the JVM?

3. How does Arthas work

As mentioned earlier, Arthas is a Command-line Java diagnostic tool, written in Java, so you can download the corresponding JAR package and run it.

3.1 installation

It can be downloaded from the official Github, or if it is slow, try Gitee, the domestic code cloud.

Arthas is downloaded and installed

Making the download

Wget alibaba. Making. IO/arthas/arth…

Or Gitee download

Wget arthas. Gitee. IO/arthas – the boot…

3.2 run

Arthas is just a Java program, so you can run it directly with java-jar. Select the Java process to monitor at runtime or after run.

Printing Help Information

java -jar arthas-boot.jar -h

EXAMPLES: Jar Java -jar arthas-boot.jar Java -jar arthas-boot.jar –target-ip 0.0.0.0 Java -jar arthas-boot.jar — teln-port 9999 –http-port -1 java -jar arthas-boot.jar –username admin –password java -jar arthas-boot.jar –tunnel-server ‘ws: / / 192.168.10.11:7777 / ws’ – app -name demoapp Java jar arthas – boot. The jar – tunnel – server’ ws: / / 192.168.10.11:7777 / ws’ – agent – id bvDOe8XbTM2pQWjF4cfw Java – jar arthas – boot. The jar – stat – url ‘http://192.168.10.11:8080/api/stat’ Java – jar arthas-boot.jar -c ‘sysprop; As java-jar arthas-boot.jar –use-version 3.5.2 java-jar arthas-boot.jar –versions java -jar arthas-boot.jar –select math-game java -jar arthas-boot.jar –session-timeout 3600 java -jar arthas-boot.jar –attach-only java -jar arthas-boot.jar –disabled-commands stop,dump java -jar arthas-boot.jar –repo-mirror aliyun –use-http

Running mode 1: Run the Java process and select PID

java -jar arthas-boot.jar

Select process (enter [] internal number (not PID) and press Enter)

[INFO] arthas-boot version: 3.1.4 [INFO] Found existing java process, please choose one and hit RETURN.

  • [1]: 11616 com.Arthas

    [2], 8676

    [3]: 16200 org.jetbrains.jps.cmdline.Launcher [4]: 21032 org.jetbrains.idea.maven.server.RemoteMavenServer

Runtime mode 2: Select Java process PID at runtime

Java -jar arthas-boot.jar [PID] You can run the ps command or the JPS command provided by the JDK to view the PID.

View information about running Java processes

$ jps -mlvV

Filter Java process information

$JPS – mlvV | grep [XXX] the JPS screening to process.

Once the Arthas Logo is present, you can use the command to troubleshoot the problem. More on this below.

# $ps - ef | grep Java to find specific Java pid $Java - jar arthas - boot. Jar 15601 [INFO] arthas - boot version: 3.5.2[INFO] Start Download arthas from remote Server: https://arthas.aliyun.com/download/3.5.3?mirror=aliyun [INFO] Download arthas success. [INFO] arthas home: / home/haochezhu - dev /. Arthas/lib / 3.5.3 arthas [INFO] Try to attach the process 15601 [INFO] attach process 15601 success. [INFO] Arthas - client connect 127.0.0.1 3658, -.,. -- -- -- -- -- -, -- -- -- -- -- -- -- -.,.,.,. -, -. / O \ | -. '-.. -' | '-' | / O \ '-' |. -. | | '-', '| |. | -. | |. -. | `. ` - | | | | | | \ \ | | | | | | | | | |. -' | ` - ` '` -' - '-' ` - '` -' `--'`--' `--'`-----'wiki https://arthas.aliyun.com/doctutorials https://arthas.aliyun.com/doc/arthas-tutorials.htmlversion 3.5.3 main_classpid 15601 time 2021-08-11 11:29:41[arthas@15601]$Copy the code

For more startup modes, see the help help command.

Other USES

EXAMPLES: Jar Java -jar arthas-boot.jar Java -jar arthas-boot.jar –target-ip 0.0.0.0 Java -jar arthas-boot.jar — teln-port 9999 Java – – – HTTP port – 1 jar arthas – boot. The jar – tunnel – server ‘ws: / / 192.168.10.11:7777 / ws’ Java – jar arthas – boot. The jar — tunnel – server ‘ws: / / 192.168.10.11:7777 / ws’ – agent – id bvDOe8XbTM2pQWjF4cfw Java – jar arthas – boot. The jar – stat – url Java – jar arthas’ http://192.168.10.11:8080/api/stat ‘- the boot. The jar – c’ sysprop; As java-jar arthas-boot.jar –use-version 3.1.4 java-jar arthas-boot.jar –versions java -jar arthas-boot.jar –session-timeout 3600 java -jar arthas-boot.jar –attach-only java -jar arthas-boot.jar –repo-mirror aliyun –use-http

3.3 web console

Arthas currently supports a Web Console, which is automatically started upon successful startup of the connection process and can be accessed directly at http://127.0.0.1:8563/, where the page operates in exactly the same mode as the Console.

3.4 Common Commands

Here are some common Arthas commands that you may not know how to use. Don’t worry, they will be covered later.

Dashboard Real-time data panel of the current system Thread Displays thread stack information of the current JVM watch Method Performs data observation – method debug debugging, method variable values, etc. The trace method calls the path internally and prints the time on each node on the method path. Tp99 Monitoring time stack output the call path of the current method is called tt method execution data space-time tunnel, record the input parameters and return information of each call of the specified method, And can be called to observe these different times of the monitor method to perform monitoring JVM to view the current JVM information vmOption view, Update parameters related to JVM diagnostics sc View information about classes loaded by the JVM SM View information about methods loaded by the JVM JAD Decompile the source code of the specified loaded class Classloader View the inheritance tree of classloader, urls, Class loading information Heapdump Is similar to the heapdump function of the jmap command

3.5 out of shutdown

When you exit with shutdown, Arthas automatically resets all enhanced classes at the same time.

4. Common Arthas operations

Now that you’ve seen what Arthas is and how it can be started, the following details how Arthas can be used, depending on some circumstances. If there is a problem when using a command, you can run the -h command to view the help information.

Write a test class to run and use Arthas to locate the problem.

import java.util.HashSet; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; import lombok.extern.slf4j.Slf4j; /** * <p> * Arthas Demo * * * @author niujinpeng */ @slf4j public class Arthas {private static HashSet HashSet = new HashSet(); / * * thread pool, the size of 1 * / private static ExecutorService ExecutorService = Executors. NewFixedThreadPool (1); Public static void main(String[] args) {// the simulated CPU is too high. // Simulate thread blocking thread(); // Simulate deadThread(); // Add data to the hashSet. AddHashSetThread (); Public static void addHashSetThread() {// Initialize constant new Thread(() -> {int count = 0; while (true) { try { hashSet.add("count" + count); Thread.sleep(10000); count++; } catch (InterruptedException e) { e.printStackTrace(); } } }).start(); } public static void cpu() { cpuHigh(); cpuNormal(); } private static void cpuHigh() {Thread Thread = new Thread(() -> {while (true) {log.info(" CPU ") start 100"); }}); // Add to thread executorService.submit(thread); } private static void cpuNormal() {for (int I = 0; i < 10; i++) { new Thread(() -> { while (true) { log.info("cpu start"); try { Thread.sleep(3000); } catch (InterruptedException e) { e.printStackTrace(); } } }).start(); }} /** * Private static void thread() {thread thread = new thread() -> {while (true) { log.debug("thread start"); try { Thread.sleep(3000); } catch (InterruptedException e) { e.printStackTrace(); }}}); // Add to thread executorService.submit(thread); } /** * deadThread */ private static void deadThread() {/** Create resource */ Object resourceA = new Object(); Object resourceB = new Object(); Thread threadA = new Thread(() -> {synchronized (resourceA) {log.info(thread.currentThread () + "get" ResourceA"); try { Thread.sleep(1000); } catch (InterruptedException e) { e.printStackTrace(); } log.info(Thread.currentThread() + "waiting get resourceB"); synchronized (resourceB) { log.info(Thread.currentThread() + " get resourceB"); }}}); Thread threadB = new Thread(() -> { synchronized (resourceB) { log.info(Thread.currentThread() + " get ResourceB"); try { Thread.sleep(1000); } catch (InterruptedException e) { e.printStackTrace(); } log.info(Thread.currentThread() + "waiting get resourceA"); synchronized (resourceA) { log.info(Thread.currentThread() + " get resourceA"); }}}); threadA.start(); threadB.start(); }}Copy the code

4.1 Monitoring Dashboards globally

Use the dashboard command to get an overview of your program’s threading, memory, GC, and runtime environment.

4.2 Why did CPU take off

The above code example has a CPU idling infinite loop, which is very CPU consuming, so how to find out?

Using thread, you can see that the CPU usage of each thread is 100%.

Using the command thread 12 to view the high CPU usage of thread 12, you can see the high CPU usage method and the number of lines (the number of lines may differ from the number of lines in the above code, which was rearranged as I wrote this article).

The above is by looking at the overall thread information, and then see the specific thread running. If you want to find the threads with high CPU usage, you can run the thread -n [Number of threads displayed] command to list the threads with Top N CPU usage.

Locate to the method with the highest CPU usage.

4.3 Thread Pool Thread status

Before locating thread problems, let’s review some common thread states:

In the operation of the RUNNABLE

TIMED_WAITIN A thread that calls the following method will enter TIMED_WAITING:

Thread#sleep()

Object#wait() with a timeout parameter

Thread#join() with a timeout parameter

LockSupport#parkNanos()

LockSupport#parkUntil()

WAITING A thread enters the WAITING state when it calls the following method:

Object#wait() with no timeout argument

Thread#join() with no timeout parameter

LockSupport#park()

BLOCKED BLOCKED, waiting for the lock

In the above simulation, we define a pool of thread size 1, submit a thread in the cpuHigh method, and submit another thread in the thread method. The next thread is blocked because the pool is full.

Using thread | grep pool in order to check the thread pool thread information.

You can see that the thread pool has WAITING threads.

4.4 Thread Deadlock

In the above simulation code, the deadThread method implements a deadlock. Use the thread-b command to view the information directly locating the deadlock.

/** Deadlock */ private static void deadThread() {/** Create resource */Object resourceA = new Object(); Object resourceB = new Object(); Thread threadA = new Thread(() -> {synchronized (resourceA) {log.info(thread.currentThread () + "get" ResourceA"); try { Thread.sleep(1000); } catch (InterruptedException e) { e.printStackTrace(); } log.info(Thread.currentThread() + "waiting get resourceB"); synchronized (resourceB) { log.info(Thread.currentThread() + " get resourceB"); }}}); Thread threadB = new Thread(() -> { synchronized (resourceB) { log.info(Thread.currentThread() + " get ResourceB"); try { Thread.sleep(1000); } catch (InterruptedException e) { e.printStackTrace(); } log.info(Thread.currentThread() + "waiting get resourceA"); synchronized (resourceA) { log.info(Thread.currentThread() + " get resourceA"); }}}); threadA.start(); threadB.start(); }Copy the code

The checked deadlock information.

4.5 Decompile jad

The above code is placed under package COM. Assuming this is a threaded environment, you can either decompile the code directly or optionally view the field or method information of the class if you suspect that the code currently running is not what you want.

If you suspect it is not your code, you can use the jad command to decompile the class directly.

Jad The jad command also provides some other parameters:

Decompile shows only the source code

jad –source-only com.Arthas

Decompile a method of a class

jad –source-only com.Arthas mysql

4.6 Viewing Field Information

Run the sc -d -f command to view the field information about the class.

[arthas@20252]$ sc -d -f com.Arthassc -d -f com.Arthas class-info        com.Arthas code-source       /C:/Users/Niu/Desktop/arthas/target/classes/ name              com.Arthas isInterface       false isAnnotation      false isEnum            false isAnonymousClass  false isArray           false isLocalClass      false isMemberClass     false isPrimitive       false isSynthetic       false simple-name       Arthas modifier          public annotation interfaces super-class       +-java.lang.Object class-loader      +-sun.misc.Launcher$AppClassLoader@18b4aac2                     +-sun.misc.Launcher$ExtClassLoader@2ef1e4fa classLoaderHash   18b4aac2 fields            modifierfinal,private,static                   type    org.slf4j.Logger                   name    log                   value   Logger[com.Arthas]​                   modifierprivate,static                   type    java.util.HashSet                   name    hashSet                   value   [count1, count2]                       modifierprivate,static                   type    java.util.concurrent.ExecutorService                   name    executorService                   value   java.util.concurrent.ThreadPoolExecutor@71c03156[Ru                           nning, pool size = 1, active threads = 1, queued ta                           sks = 0, completed tasks = 0]​​Affect(row-cnt:1) cost in 9 ms.
Copy the code

4.7 Viewing Method Information

Use the sm command to view the method information of the class.

[arthas@22180]$ sm com.Arthascom.Arthas <init>()Vcom.Arthas start()Vcom.Arthas thread()Vcom.Arthas deadThread()Vcom.Arthas lambda$cpuHigh$1()Vcom.Arthas cpuHigh()Vcom.Arthas lambda$thread$3()Vcom.Arthas addHashSetThread()Vcom.Arthas cpuNormal()Vcom.Arthas cpu()Vcom.Arthas lambda$addHashSetThread$0()Vcom.Arthas lambda$deadThread$4(Ljava/lang/Object; Ljava/lang/Object;) Vcom.Arthas lambda$deadThread$5(Ljava/lang/Object; Ljava/lang/Object;) Vcom.Arthas lambda$cpuNormal$2()VAffect(row-cnt:16) cost in 6 ms.Copy the code

4.8 Is curious about the value of the variable

With the ognl command, ogNL expressions can easily manipulate the desired information.

In the same code as the example above, let’s look at the data in the variable hashSet:

View information about the static variable hashSet.

[arthas@19856]$ ognl '@com.Arthas@hashSet'@HashSet[    @String[count1],    @String[count2],    @String[count29],    @String[count28],    @String[count0],    @String[count27],    @String[count5],    @String[count26],    @String[count6],    @String[count25],    @String[count3],    @String[count24],
Copy the code

View the size of the static variable hashSet.

[arthas@19856]$ognl ‘@[email protected]()’ @Integer[57] can even operate.

[arthas@19856] ognl ‘@[email protected](“test”)’ @Boolean\[true\] \[arthas@19856\]

View the added characters

[arthas @ 19856] ognl ‘@ com. Arthas @ hashSet | grep test @ String \ [test \], \ [arthas @ 19856 \] ognl can do many things, See the special usage of ognl expressions (github.com/alibaba/art…) .

4.9 Is there any problem with the program

4.9.1 Slow Operation and Long Time Consumption

You can run the trace command to trace the statistical method time

This time, I’m going to use another simulation code. The getUser method on the control layer calls userService.get(uid); the getUser method on the control layer calls userService.get(uid). Check, service, redis, mysql;

@RestController@Slf4jpublic class UserController { @Autowired private UserServiceImpl userService; @GetMapping(value = "/user") public HashMap<String, Object> getUser(Integer UID) throws Exception {// Simulate user query userService.get(uid); HashMap<String, Object> hashMap = new HashMap<>(); hashMap.put("uid", uid); hashMap.put("name", "name" + uid); return hashMap; }}Copy the code

Mock code Service:

@Service@Slf4jpublic class UserServiceImpl { public void get(Integer uid) throws Exception { check(uid); service(uid); redis(uid); mysql(uid); } public void service(Integer uid) throws Exception { int count = 0; for (int i = 0; i < 10; i++) { count += i; } log.info("service end {}", count); } public void redis(Integer uid) throws Exception { int count = 0; for (int i = 0; i < 10000; i++) { count += i; } log.info("redis end {}", count); } public void mysql(Integer uid) throws Exception { long count = 0; for (int i = 0; i < 10000000; i++) { count += i; } log.info("mysql end {}", count); } public Boolean check (uid) Integer throws the Exception {the if (uid = = null | | uid < 0) {log. The error (" uid is not correct, the uid: {} ", uid); Throw new Exception("uid is incorrect "); } return true; }}Copy the code

After running Springboot, use the trace== command to start detecting the time.

[arthas@6592]$trace com.userController getUser Access interface /getUser. The com.userServiceImpl :get() method takes a lot of time.

Continue to trace the time-consuming method, and then access it again.

[arthas@6592]$ trace com.UserServiceImpl get

It is clear that the mysql method com.UserServiceImpl is the most time-consuming.

Affect(class-cnt:1 , method-cnt:1) cost in 31 ms. —ts=2019-10-16 14:40:10; thread_name=http-nio-8080-exec-8; id=1f; is_daemon=true; priority=5; TCCL = org. Springframework. Boot. Web. Embedded. Tomcat. 23 a918c7 TomcatEmbeddedWebappClassLoader @ – [6.792201] ms Com. UserServiceImpl: get () + – [0.008] ms com. UserServiceImpl: check () # 17 + – [0.076] ms com. UserServiceImpl: service () # 18 + – [0.1089] ms com. UserServiceImpl: redis () # 19 ` – [6.528899] ms com. UserServiceImpl: mysql (# 20)

4.9.2 Time consuming of statistical methods

Run the monitor command to monitor the execution of statistical methods.

Count the get method execution of the com.UserServiceImpl class every 5 seconds.

monitor -c 5 com.UserServiceImpl get

4.10 Want to observe method information

The following example uses the first two mocks in this article.

4.10.1 Observing the Input and output parameters of a method

Use the watch command to easily view input and output parameters and exceptions.

 USAGE:   watch [-b] [-e] [-x <value>] [-f] [-h] [-n <value>] [-E] [-M <value>] [-s] class-pattern method-pattern express [condition-express]​ SUMMARY:   Display the input/output parameter, return object, and thrown exception of specified method invocation   The express may be one of the following expression (evaluated dynamically):           target : the object            clazz : the object's class           method : the constructor or method           params : the parameters array of method     params[0..n] : the element of parameters array        returnObj : the returned object of method         throwExp : the throw exception of method         isReturn : the method ended by return          isThrow : the method ended by throwing exception            #cost : the execution time in ms of method invocation Examples:   watch -b org.apache.commons.lang.StringUtils isBlank params   watch -f org.apache.commons.lang.StringUtils isBlank returnObj   watch org.apache.commons.lang.StringUtils isBlank '{params, target, returnObj}' -x 2   watch -bf *StringUtils isBlank params   watch *StringUtils isBlank params[0]   watch *StringUtils isBlank params[0] params[0].length==1   watch *StringUtils isBlank params '#cost>100'   watch -E -b org\.apache\.commons\.lang\.StringUtils isBlank params[0]
Copy the code

WIKI: alibaba. Making. IO/arthas/watc… Common operations:

View the incoming and outgoing parameters

$ watch com.Arthas addHashSet ‘{params[0],returnObj}’

View the incoming and outgoing parameter sizes

$ watch com.Arthas addHashSet ‘{params[0],returnObj.size}’

Check to see if the incoming and outgoing arguments contain ‘count10’

$ watch com.Arthas addHashSet ‘{params[0],returnObj.contains(“count10”)}’

Look at the incoming and outbound parameters. The outbound parameter toString

$watch com.arthas addHashSet ‘{params[0], returnobj.toString ()}’

View the returned exception information.

4.10.2 Observe the call path of the method

Run the stack command to view method invocation information.

Observe the path of the mysql method call for class com.UserServiceImpl

Stack com.userServiceImpl mysql can see the call path as shown in figure.

4.10.3 Method call space-time tunnel

Use the tt command to record the details of the method execution.

The tt command method executes the space-time tunnel of data, recording the incoming parameters and return information of each call to the specified method, and observing these different calls at different times.

Common operations:

Start recording method invocation information: tt -t com.userServiceImpl check

You can see that the record with INDEX=1001 has is-exp = true, indicating that this call IS abnormal.

View the recorded method call information: tt-l

To view the details of the call record (-i specifies INDEX) : tt -i 1001

You can view the exception information recorded with INDEX=1001.

Re-invoke, using the specified record, and re-invoke with -p.

The result of tt-i 1001-p is shown in the figure.

The code has been uploaded to Github.

Github.com/niumoo/lab-…