1. Introduction to Arthas
Arthas is a cross-platform, open-source diagnostic tool for Java applications, developed and maintained by Alibaba. It combines JVM monitoring, decompiling, hot updating, hot loading, and method-level execution tracing, making it easier for developers to locate problems in production and tune performance.
- Official documentation: Arthas official documentation
- GitHub source: Arthas code repository
The official documentation lists the following scenarios where Arthas can help:
- Which JAR was this class loaded from? Why are all kinds of class-related exceptions being thrown?
- Why didn't the code I changed take effect? Did I forget to commit? Did I use the wrong branch?
- I've hit a problem that I can't debug in production. Is adding logs and redeploying the only option?
- A particular user's data is processed incorrectly in production, but I can't debug it online and can't reproduce it offline!
- Is there a global view of the system's health?
- Is there a way to monitor the real-time state of the JVM?
- How can I quickly locate application hot spots and generate a flame graph?
2. Installation and basic use
2.1 Download and Installation
Release downloads: github.com/alibaba/art…
- Quick installation
# Download the jar package
curl -O https://arthas.aliyun.com/arthas-boot.jar
java -jar arthas-boot.jar
# On first start, arthas-boot downloads its dependent packages. If the download is slow, use the Aliyun mirror
java -jar arthas-boot.jar --repo-mirror aliyun --use-http
- Installation via shell script
curl -L https://arthas.aliyun.com/install.sh | sh
This command downloads the startup script as.sh to the current directory. You can add it to your PATH to start Arthas quickly.
Quick installation is recommended: the jar can be started flexibly and behaves consistently across platforms.
2.2 Basic Usage
Arthas works on one Java process at a time. When Arthas starts, it lists all running Java processes. After you select one by its number, an Arthas server is started inside that process, and multiple sessions can connect to this server. You can then diagnose the Java process using the commands Arthas provides.
2.2.1 Starting and Connecting to a Java process
# Start Arthas
cd arthasPath
java -jar arthas-boot.jar
Once Arthas has started, it lists all Java processes, each with a number:
Arthas Script Version: 3.4.6
[INFO] JAVA_HOME: /Library/Java/JavaVirtualMachines/jdk1.8.0_271.jdk/Contents/Home
[INFO] Process 3983 already using port 3658
[INFO] Process 3983 already using port 8563
Found existing java process, please choose one and input the serial number of the process, eg : 1. Then hit ENTER.
* [1]: 1795
  [2]: 2484 com.intellij.database.remote.RemoteJdbcServer
  [3]: 1828 org.jetbrains.jps.cmdline.Launcher
  [4]: 533
  [5]: 1290 com.intellij.database.remote.RemoteJdbcServer
  [6]: 3982 org.jetbrains.jps.cmdline.Launcher
  [7]: 3983 top.learningwang.arthasdemo.ArthasDemoApplication
Enter the number of the Java process you want to diagnose and press Enter, then wait for the Arthas server to be created. Once the session is established, you can run Arthas commands:
Attaching to 3983 using version /Users/wangjingbiao/software/arthas...
real 0m0.202s
user 0m0.369s
sys 0m0.045s
Attach success.
Current timestamp is 1613983174
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
  ,---.  ,------. ,--------.,--.  ,--.  ,---.   ,---.
 /  O  \ |  .--. ''--.  .--'|  '--'  | /  O  \ '   .-'
|  .-.  ||  '--'.'   |  |   |  .--.  ||  .-.  |`.  `-.
|  | |  ||  |\  \    |  |   |  |  |  ||  | |  |.-'    |
`--' `--'`--' '--'   `--'   `--'  `--'`--' `--'`-----'
wiki https://arthas.aliyun.com/doc
tutorials https://arthas.aliyun.com/doc/arthas-tutorials.html
version 3.4.6
pid 3983
time 2021-02-22 16:15:16
[arthas@3983]$
2.2.2 Running Arthas commands
Once the Arthas session has started, you can enter any command that Arthas supports. The current list of supported commands: arthas.aliyun.com/doc/command…
The following sections describe common commands and how to use them in specific scenarios.
2.2.3 Exiting Arthas
- quit
Exits the current Arthas client; other Arthas clients are unaffected
- exit
Equivalent to quit
- stop
Shuts down the Arthas server and disconnects all Arthas clients
3. Common Arthas commands
dashboard
Official documentation of the dashboard command
- function
Displays real-time data (threads, GC, JVM memory usage, and so on) for the current Java process
- Command Example
# Display thread CPU usage, GC information, JVM information, etc. for the current Java process, refreshing every 5s
dashboard
# Stop after 10 refreshes; the default refresh interval is 5 seconds
dashboard -n 10
# Refresh every 2s
dashboard -i 2000
# Refresh every 2s, 10 times in total
dashboard -i 2000 -n 10
# Refresh every 2s, 10 times in total, and save the output to a file for later analysis
dashboard -i 2000 -n 10 | tee /data/dashback_2021_02_21.txt
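To get a feel for the kind of data dashboard shows, the sketch below reads a much smaller snapshot from inside a JVM using the standard java.lang.management API. It is only an illustration under stated assumptions: the class name JvmSnapshot is made up for this example, and it says nothing about how Arthas itself collects its data.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration only; not part of Arthas.
final class JvmSnapshot {

    /** Builds a short text summary of thread counts, heap usage and GC activity. */
    static String snapshot() {
        StringBuilder out = new StringBuilder();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        out.append("threads: live=").append(threads.getThreadCount())
           .append(" daemon=").append(threads.getDaemonThreadCount()).append('\n');

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mb = 1024L * 1024L;
        out.append("heap: used=").append(heap.getUsed() / mb).append("MB")
           .append(" committed=").append(heap.getCommitted() / mb).append("MB\n");

        // One line per collector: how often it ran and how long it took in total.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append("gc: ").append(gc.getName())
               .append(" count=").append(gc.getCollectionCount())
               .append(" timeMs=").append(gc.getCollectionTime()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```

The difference in practice is that dashboard gives you this view of an already running process without adding any code to it.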
sysprop, sysenv, jvm
Official documentation of the sysprop, sysenv, and jvm commands
- function
sysprop: displays the system properties of the current JVM. sysenv: displays the system environment variables of the current JVM. jvm: displays information about the current JVM
- Command Example
# View JVM information
jvm
# View all system properties of the current JVM
sysprop
# View a specific system property
sysprop java.class.version
# Filter system properties by name
sysprop | grep java.class.
# sysenv is used in the same way as sysprop
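The values sysprop prints are the same system properties that application code sees through System.getProperties(). As a rough in-process counterpart to `sysprop | grep java.class.`, here is a minimal sketch; SysPropFilter is a hypothetical name, and unlike grep it matches the fragment literally rather than as a pattern.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Arthas.
final class SysPropFilter {

    /** Returns all system properties whose name contains the given fragment, sorted by name. */
    static Map<String, String> propertiesContaining(String fragment) {
        Properties props = System.getProperties();
        Map<String, String> matches = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.contains(fragment)) {
                matches.put(name, props.getProperty(name));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : propertiesContaining("java.class.").entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```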
thread
Official documentation of the thread command
- function
Displays detailed thread information for the current Java process
- Command Example
# Display the thread list; --all shows every thread
thread
thread --all
# Change the CPU usage sampling interval to 2s; the default is 200ms
thread -i 2000
# Show details of the five threads with the highest CPU usage, that is, the busiest threads. For how CPU usage is measured, see the official documentation
thread -n 5
# Print the details of the thread whose ID is 5
thread 5
# Filter threads by state (NEW, RUNNABLE, TIMED_WAITING, WAITING, BLOCKED, TERMINATED)
thread --state RUNNABLE
thread | grep RUNNABLE
# Find the thread that holds a lock and blocks the most other threads; useful when checking for deadlocks
thread -b
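The states used by `thread --state` are the values of the JDK's Thread.State enum. The following sketch shows the same kind of filtering done from inside a JVM; ThreadStateFilter is a hypothetical name chosen for this example, and the code only sees threads of the JVM it runs in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Arthas.
final class ThreadStateFilter {

    /** Returns the names of all live threads that are currently in the given state. */
    static List<String> threadNamesInState(Thread.State wanted) {
        List<String> names = new ArrayList<>();
        // getAllStackTraces() returns a map keyed by every live thread, including the caller.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getState() == wanted) {
                names.add(t.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (Thread.State state : Thread.State.values()) {
            System.out.println(state + ": " + threadNamesInState(state));
        }
    }
}
```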
classloader
Official documentation of the classloader command
- function
Views class loader information
- Command Example
# View class loaders and their loading statistics
classloader
# View each class loader's hash and parent
classloader -l
# View the inheritance tree of the class loaders
classloader -t
# List all class loaders and the classes they have loaded
classloader -a
# View the actual URLs of a URLClassLoader, identified by its hash
classloader -c hashcode
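The question "which JAR was this class loaded from?" can also be answered in plain Java, which helps when reading the classloader output. The sketch below walks the parent chain of a class's loader and looks up its code location; ClassOrigin is a hypothetical name, and printing the hash in hexadecimal is an assumption made here so the values look similar to the hashes Arthas displays.

```java
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Arthas.
final class ClassOrigin {

    /** Returns the class loader chain from the defining loader up to the bootstrap loader. */
    static List<String> loaderChain(Class<?> type) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = type.getClassLoader(); cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName() + "@" + Integer.toHexString(cl.hashCode()));
        }
        chain.add("bootstrap"); // the bootstrap loader is represented as null in the Java API
        return chain;
    }

    /** Returns the JAR or directory a class was loaded from, or "unknown" if the JVM does not expose it. */
    static String codeLocation(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "unknown";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println(loaderChain(ClassOrigin.class));
        System.out.println(codeLocation(ClassOrigin.class));
    }
}
```

With Arthas, `sc -d` gives the same kind of answer for a running process without touching its code.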
sc
Official documentation of the sc command
- function
Views information about classes loaded in the JVM
- Command Example
# Fuzzy-match classes by name (wildcards by default, regular expressions with -E); if the match is an interface, all implementation classes are listed too
sc java.lang.String*
# Fuzzy-match classes and limit the maximum number of matches (default 100)
sc java.lang.String* -n 2
# View class details
sc -d java.lang.String
# View class information using the specified class loader
sc -c hashcode java.lang.String*
# View class details, including field information
sc -d -f java.lang.String
sm
Official documentation of the sm command
- function
Views method information for classes loaded in the JVM
- Command Example
# View all methods of a class
sm top.learningwang.arthasdemo.controller.VipUserController
# View one method of a class
sm top.learningwang.arthasdemo.controller.VipUserController dropVipUser
# View method details
sm -d top.learningwang.arthasdemo.controller.VipUserController dropVipUser
# View the details of a method using the specified class loader
sm -c hashcode -d top.learningwang.arthasdemo.controller.VipUserController dropVipUser
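What sm reports is close to what Java reflection exposes for a class. Here is a minimal sketch of listing the declared methods of a class; MethodLister and the nested DemoController are hypothetical stand-ins for the demo project's VipUserController, whose source is not shown in this article.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of Arthas.
final class MethodLister {

    // Stand-in for the demo controller; the method bodies are invented.
    static class DemoController {
        public String helloUser(String name) { return "Hello " + name; }
        public void dropVipUser(long id) { /* no-op in this sketch */ }
    }

    /** Returns the signatures of all methods declared directly by the class, sorted alphabetically. */
    static List<String> declaredMethodSignatures(Class<?> type) {
        List<String> signatures = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (!m.isSynthetic()) { // skip compiler-generated methods
                signatures.add(m.toGenericString());
            }
        }
        Collections.sort(signatures);
        return signatures;
    }

    public static void main(String[] args) {
        for (String signature : declaredMethodSignatures(DemoController.class)) {
            System.out.println(signature);
        }
    }
}
```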
monitor
Official documentation of the monitor command
- function
Collects statistics on the execution of a method over a period of time
- Command Example
# Collect statistics for a method; the statistics period is 60 seconds by default
monitor top.learningwang.arthasdemo.controller.VipUserController helloUser
# Collect statistics for a method with a 10s statistics period, three times in total
monitor top.learningwang.arthasdemo.controller.VipUserController helloUser -c 10 -n 3
# Collect statistics only for calls that satisfy a condition on the arguments, with a 5s statistics period
monitor top.learningwang.arthasdemo.controller.VipUserController helloUser "params[0].length<5" -c 5
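For each statistics period, monitor reports figures such as the total number of calls, successes, failures, average response time, and failure rate. The sketch below shows how such per-period figures can be derived from individual call records; MonitorWindow is a hypothetical class written for this article, not Arthas code.

```java
// Hypothetical accumulator for illustration only; not part of Arthas.
final class MonitorWindow {
    private long total;
    private long failed;
    private long totalCostNanos;

    /** Records one finished call: how long it took and whether it ended with an exception. */
    synchronized void record(long costNanos, boolean threw) {
        total++;
        if (threw) {
            failed++;
        }
        totalCostNanos += costNanos;
    }

    synchronized long total() { return total; }

    synchronized long success() { return total - failed; }

    synchronized long fail() { return failed; }

    /** Average response time in milliseconds, or 0 when nothing was recorded. */
    synchronized double avgRtMillis() {
        return total == 0 ? 0.0 : (totalCostNanos / 1_000_000.0) / total;
    }

    /** Fraction of calls that threw, between 0 and 1. */
    synchronized double failRate() {
        return total == 0 ? 0.0 : (double) failed / total;
    }

    /** Starts a new statistics period. */
    synchronized void reset() {
        total = 0;
        failed = 0;
        totalCostNanos = 0;
    }

    public static void main(String[] args) {
        MonitorWindow window = new MonitorWindow();
        window.record(10_000_000L, false); // a 10ms call that returned normally
        window.record(30_000_000L, true);  // a 30ms call that threw
        System.out.printf("total=%d success=%d fail=%d avg-rt(ms)=%.2f fail-rate=%.2f%%%n",
                window.total(), window.success(), window.fail(),
                window.avgRtMillis(), window.failRate() * 100);
    }
}
```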
watch
Official documentation of the watch command
- function
Observes the details of a method's execution
- Command Example
# Observe the execution details of a method; an OGNL expression selects what is printed
watch *VipUserController helloUser '{clazz,method,isReturn,isThrow,params,target,returnObj,throwExp}'
# Variables available in the expression:
# clazz: the object's class
# method: the constructor or method
# params: the parameters array of method
# params[0..n]: the element of parameters array
# returnObj: the returned object of method
# throwExp: the throw exception of method
# isReturn: the method ended by return
# isThrow: the method ended by throwing exception
# #cost: the execution time in ms of method invocation
# Limit the number of observations
watch *VipUserController helloUser -n 2
# Set the traversal depth of the printed result
watch *VipUserController helloUser -x 2
# Observe only successful executions
watch *VipUserController helloUser -s
# Observe only failed executions
watch *VipUserController helloUser -e
# Observe both before the method is invoked and after it finishes
watch *VipUserController helloUser -b -f
# Observe the detailed exception stack when the method throws
watch *VipUserController helloUser '{throwExp}' -e -x 2
# Observe only calls whose execution time exceeds 100ms (the condition follows the observation expression)
watch *VipUserController helloUser '{params,returnObj}' '#cost>100'
trace
Official documentation of the trace command
- function
Traces the internal call path of a method and prints the time spent at each node on the path (only the first level of calls is expanded)
- Command Example
# Observe the order and duration of the calls inside a method
trace top.learningwang.arthasdemo.controller.VipUserController helloUser
# Observe the order and duration of the calls inside a method, for 3 invocations only
trace top.learningwang.arthasdemo.controller.VipUserController helloUser -n 3
# Observe the order and duration of the calls inside a method, including JDK methods
trace top.learningwang.arthasdemo.controller.VipUserController helloUser --skipJDKMethod false
# Limit the observation to calls whose first parameter has a length of at least 10 (quote the condition so that > is not treated as a redirection)
trace top.learningwang.arthasdemo.controller.VipUserController helloUser 'params[0].length>=10'
# Limit the observation to calls that take more than 100ms
trace top.learningwang.arthasdemo.controller.VipUserController helloUser '#cost>100'
# Trace multiple methods of multiple classes at the same time
trace -E com.test.ClassA|org.test.ClassB method1|method2|method3
stack
Official documentation of the stack command
- function
Prints the call path through which the current method is invoked
- Command Example
# Observe the call stack of a method
stack top.learningwang.arthasdemo.service.VipUserService getVipUserByCardNo
# Observe the call stack of a method, 3 times
stack top.learningwang.arthasdemo.service.VipUserService getVipUserByCardNo -n 3
# Observe the call stack only for invocations that take more than 100ms
stack top.learningwang.arthasdemo.service.VipUserService getVipUserByCardNo '#cost>100'
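The call path that stack prints is the same information a thread's stack trace carries. As an illustration, the sketch below captures the callers of a method from inside the method itself; CallPath and its two demo methods are hypothetical names (getVipUserByCardNo is borrowed from the commands above, but the body is invented).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Arthas.
final class CallPath {

    /** Returns "Class.method" for every frame that called this helper, innermost first. */
    static List<String> currentCallPath() {
        List<String> path = new ArrayList<>();
        boolean pastHelper = false;
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            if (pastHelper) {
                path.add(frame.getClassName() + "." + frame.getMethodName());
            } else if (frame.getClassName().equals(CallPath.class.getName())
                    && frame.getMethodName().equals("currentCallPath")) {
                // Everything after this frame belongs to the callers.
                pastHelper = true;
            }
        }
        return path;
    }

    // Invented stand-in for the service method observed with the stack command.
    static List<String> getVipUserByCardNo() {
        return currentCallPath();
    }

    // Invented stand-in for a controller method that calls the service.
    static List<String> controllerEntry() {
        return getVipUserByCardNo();
    }

    public static void main(String[] args) {
        for (String frame : controllerEntry()) {
            System.out.println("    @" + frame);
        }
    }
}
```

The advantage of the stack command is that it produces this path for a method in a running application, with no code change and no redeploy.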
heapdump
Official documentation of the heapdump command
- function
Generates a heap dump file for analyzing object memory usage, for example with Eclipse Memory Analyzer (MAT)
- Command Example
# Generate the heap dump file at the specified path
heapdump /tmp/dump.hprof
# Dump only live objects
heapdump --live /tmp/dump.hprof
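HotSpot-based JVMs also expose heap dumping to application code through com.sun.management.HotSpotDiagnosticMXBean. The sketch below is a minimal example under that assumption (it will not work on JVMs that lack this bean); HeapDumper is a hypothetical name, and nothing here is a claim about how Arthas implements heapdump.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; assumes a HotSpot-based JVM.
final class HeapDumper {

    /**
     * Writes a heap dump to the given file. The file must not exist yet.
     * With liveOnly set, only reachable objects are dumped (similar in spirit to --live).
     */
    static Path dumpHeap(Path target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), liveOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump-demo");
        Path dump = dumpHeap(dir.resolve("dump.hprof"), true);
        System.out.println("Wrote " + dump + " (" + Files.size(dump) + " bytes)");
    }
}
```

The resulting .hprof file can be opened in MAT in the same way as a dump produced by the heapdump command.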
4. Hands-on scenarios
4.1 Changing a Log Level
A web service is running with its log level set to ERROR. To troubleshoot a data problem in production, you need to see more detailed log output (INFO or DEBUG). You can change the log level at runtime as follows:
# Find the class and note its classLoaderHash
sc -d *VipUserController
# Use an OGNL expression to inspect the logger field and check the current log level
ognl -c classLoaderHash "@top.learningwang.arthasdemo.controller.VipUserController@logger"
# Use an OGNL expression to change the log level
ognl -c classLoaderHash "@top.learningwang.arthasdemo.controller.VipUserController@logger.setLevel(@ch.qos.logback.classic.Level@DEBUG)"
# Check that the log level was changed
ognl -c classLoaderHash "@top.learningwang.arthasdemo.controller.VipUserController@logger"
# Change the global (root) log level
ognl -c classLoaderHash '@org.slf4j.LoggerFactory@getLogger("root").setLevel(@ch.qos.logback.classic.Level@DEBUG)'
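The OGNL expressions above simply call setLevel on a logger object that already lives in the JVM. To show the idea in runnable form without extra dependencies, the sketch below does the equivalent with java.util.logging, which is swapped in here purely as a stand-in for Logback; RuntimeLogLevel and the logger name are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper for illustration only; uses java.util.logging instead of Logback.
final class RuntimeLogLevel {

    /**
     * Changes the level of the named logger at runtime and returns the logger.
     * The caller should keep the returned reference: java.util.logging holds
     * loggers weakly, so an unreferenced logger (and its level) may be collected.
     */
    static Logger setLevel(String loggerName, Level newLevel) {
        Logger logger = Logger.getLogger(loggerName);
        logger.setLevel(newLevel);
        return logger;
    }

    public static void main(String[] args) {
        String name = "demo.controller.VipUserController"; // invented logger name
        Logger logger = setLevel(name, Level.SEVERE);
        System.out.println("INFO enabled at SEVERE: " + logger.isLoggable(Level.INFO));
        logger = setLevel(name, Level.FINE);
        System.out.println("INFO enabled at FINE: " + logger.isLoggable(Level.INFO));
    }
}
```

With Arthas the same call is made from outside the application code, which is why no restart or redeploy is needed.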
4.2 Updating code and hot loading
When the code running in production has a problem, you can modify it and hot-load the change without restarting the application.
# Decompile the class and save the source to a file
jad --source-only top.learningwang.arthasdemo.controller.VipUserController > /data/VipUserController.java
Edit the code to change Hello to Hello2
# Compile the source file into a class file under the output directory given by -d
mc -c 18b4aac2 /data/VipUserController.java -d /data/
# Reload the compiled class file
redefine -c 18b4aac2 /data/top/learningwang/arthasdemo/controller/VipUserController.class
Call the endpoint to verify the hot-loaded change
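The compile step in this workflow turns one source file into a class file placed under a package directory below the output directory, which is why the redefine path above contains top/learningwang/.... The sketch below reproduces just that step with the JDK's javax.tools API; SourceCompiler and the demo.Hello class are hypothetical, it needs a JDK rather than a JRE, and it does not perform the class redefinition itself.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical helper for illustration only; not part of Arthas.
final class SourceCompiler {

    /** Compiles one source file into outputDir (which must exist); returns true on success. */
    static boolean compile(Path sourceFile, Path outputDir) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; run on a JDK, not a JRE");
        }
        // Passing null streams means the compiler uses System.in/out/err.
        int exitCode = compiler.run(null, null, null,
                "-d", outputDir.toString(), sourceFile.toString());
        return exitCode == 0;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("mc-demo");
        Path source = dir.resolve("Hello.java");
        String code = "package demo; public class Hello { public String hello() { return \"Hello2\"; } }";
        Files.write(source, code.getBytes(StandardCharsets.UTF_8));

        boolean ok = compile(source, dir);
        // The class file lands under the package directory, here demo/Hello.class.
        Path classFile = dir.resolve("demo").resolve("Hello.class");
        System.out.println("compiled=" + ok + ", class file exists=" + Files.exists(classFile));
    }
}
```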
4.3 Troubleshooting method invocation exceptions
Observe the details of a failing method call, such as the exception thrown and the input and output parameters.
# Observe the detailed exception stack when the method throws
watch *VipUserController helloUser '{throwExp}' -e -x 2
4.4 Running diagnostic tasks asynchronously in the background
Commands such as dashboard, watch, and trace can be run in the background with their output written to a file for later analysis, so that they do not block other commands.
# Run dashboard in the background and append its output to a file
dashboard >> /data/dash.log &
# List the background jobs that are running
jobs
# Stop a background job
kill jobid
4.5 Troubleshooting method execution efficiency
An endpoint in production is slow, and it is not clear which code is responsible.
# Trace several methods and print only the calls that take more than 100ms
trace -E com.test.ClassA|org.test.ClassB method1|method2|method3 '#cost>100'
You can then narrow down the investigation based on the execution time of each method.
5. Remote diagnosis
5.1 Web Console
When the Arthas server starts, the Web Console is started by default. Its default HTTP port is 8563 (3658 is the telnet port). Once access to the port is allowed, external users can run Arthas commands remotely from a browser at http://ip:port
5.2 tunnel server
Arthas Tunnel Server official documentation
Arthas provides a Spring Boot starter. Once it is added to a project, Arthas starts together with the application and can be reached through an Arthas Tunnel Server
<!-- Spring Boot dependency -->
<dependency>
    <groupId>com.taobao.arthas</groupId>
    <artifactId>arthas-spring-boot-starter</artifactId>
    <version>${arthas.version}</version>
</dependency>
Spring Boot Arthas configuration items
Once the service is running and the network is reachable, Arthas remote diagnosis is available:
java -jar arthas-boot.jar --tunnel-server 'ws://ip:port/ws'
Attachment 1: Source code
Source of the Spring Boot web service used for the demo commands
Attachment 2: Study materials
- Arthas official documentation
- Alibaba Cloud Arthas online tutorials
- Official guide to OGNL expressions
- Powerful OGNL
- Java dynamic tracking technology exploration