Arthas – Java’s ultimate tool for locating and handling online problems

Due to the incompatibility of typora documents, the documents are always unable to be saved and published. Please refer to the new typesetting documents

Juejin. Cn/post / 699501…

preface

Before Arthas, when you had Java online problems, such as CPU spikes, load spikes, memory overruns, etc., you had to look at the commands, look at the network, then do JPS, JStack, JMAP, JHAT, Jstat, hprof, etc. You end up in a mess, and you don’t necessarily know what the problem is. Today, you can use Arthas to locate most common problems easily, solve them quickly, stop your losses on time, and leave work on time.

1. Introduction to Arthas

Arthas is a Java diagnostic tool that Alibaba opened as an open source in September 2018. Support JDK6+, use the command line interactive mode, provide Tab automatic incomplete, can conveniently locate and diagnose online program running problems. As of this writing, Star 17000+ has been harvested.

The Arthas official documentation is very detailed, and this article also references the official documentation. In addition, there are not only problems feedback but also a large number of use cases in the Issues on Github, an open source project.

Open source address: github.com/alibaba/art…

Official documentation: alibaba.github. IO /arthas

2. Usage scenarios of Arthas

Thanks to its powerful and rich capabilities, Arthas is able to do more than you could ever imagine. The following are just a few common usage scenarios, but more can be explored as you become familiar with Arthas.

Is there a global view of the health of the system?

Why does the CPU go up again, and what is taking up the CPU?

Are there deadlocks in running multiple threads? Is there a blockage?

It takes a long time to run. What takes a long time to run? How do you monitor it?

Which JAR is this class loaded from? Why are exceptions related to various classes reported?

Why is the code I changed not executed? Did I not commit? Wrong branch?

You can’t debug online when you encounter a problem, can you only add logs and republish?

Is there any way to monitor the real-time state of the JVM?

3. How does Arthas work

As mentioned earlier, Arthas is a Command-line Java diagnostic tool, written in Java, so you can download the corresponding JAR package and run it.

3.1 Installation Can be downloaded from the official Github, if the speed is slow, you can try the domestic code cloud Gitee download.

Making the download

Wget alibaba. Making. IO/arthas/arth…

Or Gitee download

Wget arthas. Gitee. IO/arthas – the boot…

Printing Help Information

Java-jar arthas-boot.jar -h 3.2 Running Arthas is only a Java program, so you can run it directly with java-jar. Select the Java process to monitor at runtime or after run.

In operation mode 1, run the Java process PID first

java -jar arthas-boot.jar

Select process (enter [] internal number (not PID) and press Enter)

[INFO] arthas-boot version: 3.1.4 [INFO] Found existing java process, please choose one and hit RETURN.

  • [1]: 11616 com.Arthas [2]: 8676 [3]: 16200 org.jetbrains.jps.cmdline.Launcher [4]: 21032 org.jetbrains.idea.maven.server.RemoteMavenServer

Runtime mode 2: Select Java process PID at runtime

Java -jar arthas-boot.jar [PID] You can run the ps command or the JPS command provided by the JDK to view the PID.

View information about running Java processes

$ jps -mlvV

Filter Java process information

$JPS – mlvV | grep [XXX] the JPS screening to process.

Once the Arthas Logo is present, you can use the command to troubleshoot the problem. More on this below.

For more startup modes, see the help help command.

Other USES

EXAMPLES: Jar Java -jar arthas-boot.jar Java -jar arthas-boot.jar –target-ip 0.0.0.0 Java -jar arthas-boot.jar — teln-port 9999 Java – – – HTTP port – 1 jar arthas – boot. The jar – tunnel – server ‘ws: / / 192.168.10.11:7777 / ws’ Java – jar arthas – boot. The jar — tunnel – server ‘ws: / / 192.168.10.11:7777 / ws’ – agent – id bvDOe8XbTM2pQWjF4cfw Java – jar arthas – boot. The jar – stat – url Java – jar arthas’ http://192.168.10.11:8080/api/stat ‘- the boot. The jar – c’ sysprop; As java-jar arthas-boot.jar –use-version 3.1.4 java-jar arthas-boot.jar –versions java -jar arthas-boot.jar –session-timeout 3600 java -jar arthas-boot.jar –attach-only java -jar Arthas -boot.jar –repo-mirror aliyun — use-HTTP 3.3 Web Console Arthas currently supports the Web Console, which automatically starts after successfully starting the connection process. You can go directly to http://127.0.0.1:8563/, and the mode of action on the page is exactly the same as on the console.

3.4 Common Commands

Here are some common Arthas commands that you may not know how to use. Don’t worry, they will be covered later.

This command introduces dashboard real-time data panel of the current system. Thread Displays the thread stack information of the current JVM. Stack outputs the call path through which the current method was called. Tt method executes data through the space-time tunnel, recording the input parameters and return information of each call to the specified method. And can be called to observe these different times of the monitor method to perform monitoring JVM to view the current JVM information vmOption view, Update parameters related to JVM diagnostics sc View information about classes loaded by the JVM SM View information about methods loaded by the JVM JAD Decompile the source code of the specified loaded class Classloader View the inheritance tree of classloader, urls, Class load information Heapdump Heap dump functionality similar to the jmap command 3.5 Exit Arthas automatically resets all enhanced classes when you exit using shutdown.

4. Common Arthas operations

Now that you’ve seen what Arthas is and how it can be started, the following details how Arthas can be used, depending on some circumstances. If there is a problem when using a command, you can run the -h command to view the help information.

Write a test class to run and use Arthas to locate the problem.

import java.util.HashSet; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; import lombok.extern.slf4j.Slf4j;

/ * *

  • Arthas Demo
  • Official account: unread code
  • @Author niujinpeng

*/ @Slf4j public class Arthas {

private static HashSet hashSet = new HashSet(); / * * thread pool, the size of 1 * / private static ExecutorService ExecutorService = Executors. NewFixedThreadPool (1); Public static void main(String[] args) {// the simulated CPU is too high. // Simulate thread blocking thread(); // Simulate deadThread(); // Add data to the hashSet. AddHashSetThread (); Public static void addHashSetThread() {// Initialize constant new Thread(() -> {int count = 0; while (true) { try { hashSet.add("count" + count); Thread.sleep(10000); count++; } catch (InterruptedException e) { e.printStackTrace(); } } }).start(); } public static void cpu() { cpuHigh(); cpuNormal(); } private static void cpuHigh() {Thread Thread = new Thread(() -> {while (true) {log.info(" CPU ") start 100"); }}); // Add to thread executorService.submit(thread); } private static void cpuNormal() {for (int I = 0; i < 10; i++) { new Thread(() -> { while (true) { log.info("cpu start"); try { Thread.sleep(3000); } catch (InterruptedException e) { e.printStackTrace(); } } }).start(); }} /** * Private static void thread() {thread thread = new thread() -> {while (true) { log.debug("thread start"); try { Thread.sleep(3000); } catch (InterruptedException e) { e.printStackTrace(); }}}); // Add to thread executorService.submit(thread); } /** * deadThread */ private static void deadThread() {/** Create resource */ Object resourceA = new Object(); Object resourceB = new Object(); Thread threadA = new Thread(() -> {synchronized (resourceA) {log.info(thread.currentThread () + "get" ResourceA"); try { Thread.sleep(1000); } catch (InterruptedException e) { e.printStackTrace(); } log.info(Thread.currentThread() + "waiting get resourceB"); synchronized (resourceB) { log.info(Thread.currentThread() + " get resourceB"); }}}); Thread threadB = new Thread(() -> { synchronized (resourceB) { log.info(Thread.currentThread() + " get ResourceB"); try { Thread.sleep(1000); } catch (InterruptedException e) { e.printStackTrace(); } log.info(Thread.currentThread() + "waiting get resourceA"); synchronized (resourceA) { log.info(Thread.currentThread() + " get resourceA"); }}}); threadA.start(); threadB.start(); }Copy the code

} 4.1 Global Monitoring Use the dashboard command to view the thread, memory, GC, and running environment information of a program.

The above code example has an infinite loop in which the CPU is idling, which is very performance consuming, so how do you find out?

Using thread, you can see that the CPU usage of each thread is 100%.

Using the command thread 12 to view the high CPU usage of thread 12, you can see the high CPU usage method and the number of lines (the number of lines may differ from the number of lines in the above code, which was rearranged as I wrote this article).

The above is by looking at the overall thread information, and then see the specific thread running. If you want to find the threads with high CPU usage, you can run the thread -n [Number of threads displayed] command to list the threads with Top N CPU usage.

Locate to the method with the highest CPU usage.

4.3 Thread Pool Thread State Before locating thread issues, let’s review some common thread states:

In the operation of the RUNNABLE

TIMED_WAITIN A thread that calls the following method will enter TIMED_WAITING:

Thread#sleep()

Object#wait() with a timeout parameter

Thread#join() with a timeout parameter

LockSupport#parkNanos()

LockSupport#parkUntil()

WAITING A thread enters the WAITING state when it calls the following method:

Object#wait() with no timeout argument

Thread#join() with no timeout parameter

LockSupport#park()

BLOCKED BLOCKED, waiting for the lock

In the above simulation, we define a pool of thread size 1, submit a thread in the cpuHigh method, and submit another thread in the thread method. The next thread is blocked because the pool is full.

Using thread | grep pool in order to check the thread pool thread information.

You can see that the thread pool has WAITING threads.

In the simulation code above, the deadThread method implements a deadlock. Use the thread-b command to view the information directly locating the deadlock.

/ * *

  • A deadlock

/ private static void deadThread() {/* Create resource */ Object resourceA = new Object(); Object resourceB = new Object(); Thread threadA = new Thread(() -> {synchronized (resourceA) {log.info(thread.currentThread () + “get” ResourceA”); try { Thread.sleep(1000); } catch (InterruptedException e) { e.printStackTrace(); } log.info(Thread.currentThread() + “waiting get resourceB”); synchronized (resourceB) { log.info(Thread.currentThread() + ” get resourceB”); }}});

Thread threadB = new Thread(() -> { synchronized (resourceB) { log.info(Thread.currentThread() + " get ResourceB"); try { Thread.sleep(1000); } catch (InterruptedException e) { e.printStackTrace(); } log.info(Thread.currentThread() + "waiting get resourceA"); synchronized (resourceA) { log.info(Thread.currentThread() + " get resourceA"); }}}); threadA.start(); threadB.start();Copy the code

} check the deadlock information.

The above code is put under the com package, assuming that this is a threaded environment, when you suspect that the current running code is not the code you want, you can directly decompile the code, or optionally view the field or method information of the class.

If you suspect it is not your code, you can use the jad command to decompile the class directly.

Jad The jad command also provides some other parameters:

Decompile shows only the source code

jad –source-only com.Arthas

Decompile a method of a class

Jad –source-only com.Arthas mysql 4.6 Viewing field information Run the **sc -d -f ** command to view the field information of the class.

[arthas@20252] Sc – d – fcom. Arthassc – d – fcom. Arthasclass – infocom. Arthascode – source/C: / Users/Niu/Desktop/arthas/target/classes/namecom Artha sisInterfacefalseisAnnotationfalseisEnumfalseisAnonymousClassfalseisArrayfalseisLocalClassfalseisMemberClassfalseisPrimi TivefalseisSyntheticfalsesimple – nameArthasmodifierpublicannotationinterfacessuper – class + – Java. Lang. Objectclass – loader + – s un.misc.Launcher sc -d -f com.Arthas sc -d -f com.Arthas class-info com.Arthas code-source /C:/Users/Niu/Desktop/arthas/target/classes/ name com.Arthas isInterface false isAnnotation false isEnum false isAnonymousClass false isArray false isLocalClass false isMemberClass false isPrimitive false isSynthetic false simple-name Arthas modifier public annotation interfaces super-class +-java.lang.Object class-loader + – sun. Misc. Launchersc – d – fcom. Arthassc – d – fcom. Arthasclass – infocom. Arthascode – source/C: / Users/Niu/Desktop/arthas/target/cl asses/namecom.ArthasisInterfacefalseisAnnotationfalseisEnumfalseisAnonymousClassfalseisArrayfalseisLocalClassfalseisMemb ErClassfalseisPrimitivefalseisSyntheticfalsesimple – nameArthasmodifierpublicannotationinterfacessuper – class + – Java. Lang, Ob Jectclass – loader + – sun. Misc. 18 b4aac2 LauncherAppClassLoader @ + – sun. Misc. The Launcher $ExtClassLoader @ 2 ef1e4fa classLoaderHash 18b4aac2 fields modifierfinal,private,static type org.slf4j.Logger name log value Logger[com.Arthas]

               modifierprivate,static
               type    java.util.HashSet
               name    hashSet
               value   [count1, count2]

               modifierprivate,static
               type    java.util.concurrent.ExecutorService
               name    executorService
               value   java.util.concurrent.ThreadPoolExecutor@71c03156[Ru
                       nning, pool size = 1, active threads = 1, queued ta
                       sks = 0, completed tasks = 0]
Copy the code

Affect(row-cnt:1) cost in 9 ms.4.7 Querying method Information Run the sm command to query method information of a class.

[arthas@22180] smcom.Arthascom.Arthas

()Vcom.Arthasstart()Vcom.Arthasthread()Vcom.ArthasdeadThread()Vcom.Arthaslambda sm com.Arthas com.Arthas

()V com.Arthas start()V com.Arthas thread()V com.Arthas deadThread()V com.Arthas lambdasmcom.Arthascom.Arthas

()Vcom.Arthasstart()Vcom.Arthasthread()Vcom.ArthasdeadThread()Vcom.ArthaslambdacpuHigh 1()Vcom.ArthascpuHigh()Vcom.Arthaslambda1()V com.Arthas cpuHigh()V com.Arthas lambda1()Vcom.ArthascpuHigh()Vcom.Arthaslambdathread 3()Vcom.ArthasaddHashSetThread()Vcom.ArthascpuNormal()Vcom.Arthascpu()Vcom.Arthaslambda3()V com.Arthas addHashSetThread()V com.Arthas cpuNormal()V com.Arthas cpu()V com.Arthas lambda3()Vcom.ArthasaddHashSetThread()Vcom.ArthascpuNormal()Vcom.Arthascpu()Vcom.ArthaslambdaaddHashSetThread 0()Vcom.Arthaslambda0()V com.Arthas lambda0()Vcom.ArthaslambdadeadThread 4(Ljava/lang/Object; Ljava/lang/Object;) Vcom.Arthaslambda4(Ljava/lang/Object; Ljava/lang/Object;) V com.Arthas lambda4(Ljava/lang/Object; Ljava/lang/Object;) Vcom.ArthaslambdadeadThread 5(Ljava/lang/Object; Ljava/lang/Object;) Vcom.Arthaslambda5(Ljava/lang/Object; Ljava/lang/Object;) V com.Arthas lambda5(Ljava/lang/Object; Ljava/lang/Object;) Vcom. ArthaslambdacpuNormal $2 (V) Affect (row – CNT: 16) cost in 6 ms. 4.8 use of the value of the variable is very curious ognl command, ognl expressions can easily operate want information.

In the same code as the example above, let’s look at the data in the variable hashSet:

View information about the static variable hashSet.

[arthas@19856]$ ognl ‘@com.Arthas@hashSet’ @HashSet[ @String[count1], @String[count2], @String[count29], @String[count28], @String[count0], @String[count27], @String[count5], @String[count26], @String[count6], @string [count25], @string [count3], @string [count24] check the size of the static variable hashSet.

[arthas@19856]$ognl ‘@[email protected]()’ @Integer[57] can even operate.

[arthas @ 19856] ognl ‘@ [email protected] (” test “)’ @ Boolean [true] [arthas @ 19856] ognl ‘@ [email protected] (” test “)’ @ Boolean [true] [arthas @ 19856] ognl ‘@ [email protected] (” test “)’ @ Boolean [true] [arthas @ 19856]

View the added characters

[arthas @ 19856] ognl ‘@ com. Arthas @ hashSet’ ∣ greptest @ String [test], [arthas @ 19856] ognl ‘@ com. Arthas @ hashSet | grep test @string [test],[arthas@19856] ognl ‘@com.Arthas@hashSet’ ∣greptest@String[test],[arthas@19856] ognl can do a lot of things, See the special usage of ognl expressions (github.com/alibaba/art…) .

4.9 Is there any Problem with the Program? 4.9.1 Slow Running and Long Time Consuming Run the trace command to trace the statistical method time

This time, I’m going to use another simulation code. The getUser method on the control layer calls userService.get(uid); the getUser method on the control layer calls userService.get(uid). Check, service, redis, mysql;

@RestController @Slf4j public class UserController {

@Autowired private UserServiceImpl userService; @GetMapping(value = "/user") public HashMap<String, Object> getUser(Integer UID) throws Exception {// Simulate user query userService.get(uid); HashMap<String, Object> hashMap = new HashMap<>(); hashMap.put("uid", uid); hashMap.put("name", "name" + uid); return hashMap; }Copy the code

} mock code Service:

@Service @Slf4j public class UserServiceImpl {

public void get(Integer uid) throws Exception { check(uid); service(uid); redis(uid); mysql(uid); } public void service(Integer uid) throws Exception { int count = 0; for (int i = 0; i < 10; i++) { count += i; } log.info("service end {}", count); } public void redis(Integer uid) throws Exception { int count = 0; for (int i = 0; i < 10000; i++) { count += i; } log.info("redis end {}", count); } public void mysql(Integer uid) throws Exception { long count = 0; for (int i = 0; i < 10000000; i++) { count += i; } log.info("mysql end {}", count); } public Boolean check (uid) Integer throws the Exception {the if (uid = = null | | uid < 0) {log. The error (" uid is not correct, the uid: {} ", uid); Throw new Exception("uid is incorrect "); } return true; }Copy the code

} After running Springboot, use the **trace== ** command to start detecting the time.

[arthas@6592]$trace com.userController getUser Access interface /getUser. The com.userServiceImpl :get() method takes a lot of time.

Continue to trace the time-consuming method, and then access it again.

[arthas@6592]$ trace com.UserServiceImpl get

It is clear that the mysql method com.UserServiceImpl is the most time-consuming.

Affect(class-cnt:1 , method-cnt:1) cost in 31 ms. —ts=2019-10-16 14:40:10; thread_name=http-nio-8080-exec-8; id=1f; is_daemon=true; priority=5; TCCL = org. Springframework. Boot. Web. Embedded. Tomcat. 23 a918c7 TomcatEmbeddedWebappClassLoader @ – [6.792201] ms Com. UserServiceImpl: get () + – [0.008] ms com. UserServiceImpl: check () # 17 + – [0.076] ms com. UserServiceImpl: service () # 18 + – [0.1089] ms com. UserServiceImpl: redis () # 19 ` – [6.528899] ms com. UserServiceImpl: mysql () # 20 4.9.2 statistical method take use of the monitor Command to monitor the execution of statistical methods.

Count the get method execution of the com.UserServiceImpl class every 5 seconds.

monitor -c 5 com.UserServiceImpl get

The following example uses the first two mocks in this article.

4.10.1 Viewing Input and Output Parameter Information You can run the watch command to view input and output parameters and exception information.

USAGE: watch [-b] [-e] [-x ] [-f] [-h] [-n ] [-E] [-M ] [-s] class-pattern method-pattern express [condition-express]

SUMMARY: Display the input/output parameter, return object, and thrown exception of specified method invocation The express may be one of the following expression (evaluated dynamically): target : the object clazz : the object’s class method : the constructor or method params : the parameters array of method params[0..n] : the element of parameters array returnObj : the returned object of method throwExp : the throw exception of method isReturn : the method ended by return isThrow : the method ended by throwing exception #cost : the execution time in ms of method invocation Examples: watch -b org.apache.commons.lang.StringUtils isBlank params watch -f org.apache.commons.lang.StringUtils isBlank returnObj watch org.apache.commons.lang.StringUtils isBlank ‘{params, target, returnObj}’ -x 2 watch -bf *StringUtils isBlank params watch *StringUtils isBlank params[0] watch *StringUtils isBlank params[0] params[0].length==1 watch *StringUtils isBlank params ‘#cost>100’ watch -E -b org.apache.commons.lang.StringUtils isBlank params[0]

WIKI: alibaba. Making. IO/arthas/watc… Common operations:

View the incoming and outgoing parameters

$ watch com.Arthas addHashSet ‘{params[0],returnObj}’

View the incoming and outgoing parameter sizes

$ watch com.Arthas addHashSet ‘{params[0],returnObj.size}’

Check to see if the incoming and outgoing arguments contain ‘count10’

$ watch com.Arthas addHashSet ‘{params[0],returnObj.contains(“count10”)}’

Look at the incoming and outbound parameters. The outbound parameter toString

$watch com.arthas addHashSet ‘{params[0], returnobj.toString ()}’

View the returned exception information.

4.10.2 Viewing Method Invocation Paths Run the stack command to view method invocation information.

Observe the path of the mysql method call for class com.UserServiceImpl

Stack com.userServiceImpl mysql can see the call path as shown in figure.

Use the TT command to record the details of method execution.

The tt command method executes the space-time tunnel of data, recording the incoming parameters and return information of each call to the specified method, and observing these different calls at different times.

Common operations:

Start recording method invocation information: tt -t com.userServiceImpl check

You can see that the record with INDEX=1001 has is-exp = true, indicating that this call IS abnormal.

View the recorded method call information: tt-l

To view the details of the call record (-i specifies INDEX) : tt -i 1001

You can view the exception information recorded with INDEX=1001.

Re-invoke, using the specified record, and re-invoke with -p.

The result of tt-i 1001-p is shown in the figure.

The code has been uploaded to Github.

Github.com/niumoo/lab-…