1. Introduction

On December 25, 2018, Matrix, an APM tool developed by wechat team, was officially open source.

Matrix is an APM (Application Performance Manage) developed by wechat and used daily. Currently, it is mainly running on Android platform. Matrix aims to establish a unified application performance access framework, quickly integrate various performance monitoring schemes, collect and analyze abnormal data of performance monitoring items, and output corresponding problem analysis, positioning and optimization suggestions, so as to help developers develop higher-quality applications.

After a close reading of the source code of Matrix, I uttered two voices of praise and ridicule. To its credit, the APM framework has been designed in a way that Android developers can learn a lot from. Understanding it can broaden our programming horizons. Ironically, in terms of documentation richness, code readability, code comments, open source services, etc., they’re not doing very well. As the top development team in China, there is still a big gap between these aspects and the top open source development team abroad.

At present, the integration of Matrix is really convenient. According to the official documents, it can be integrated into the project in about 10 minutes. However, it has a hard defect, unfriendly to developers, it is very convenient to integrate, but it is difficult to analyze and locate problems, developers need to build a set of data collection and analysis system, greatly increasing the threshold for developers to use. It currently has 8.5K stars on Github. If the wechat team can open source the acquisition system and data analysis system, the threshold for users to use will be greatly reduced, and I believe that the amount of STAR can be improved to a higher level.

Ridicule is ridicule. Ridicule is not the end. I hope domestic developers can be more sincere to make more useful open source products. With that said, I’m going to talk about some design details not covered in the official Matrix documentation. It’s really amazing. If you have any prejudice against Matrix, please follow me to break the prejudice and break through the threshold set by the wechat team.

2. Matrix function introduction

The current monitoring scope of Matrix includes: application installation package size, SQLite operation optimization, frame rate variation, lag, startup time, page switching time, slow method, file read and write performance, I/O handle leak, memory leak, etc.

For more details, please go to Matrix introduction

This paper will focus on the realization principle of slow detection method and data format analysis. Data format is the most important part of Matrix, and it is necessary to have a good understanding of data format whether it is generated in the collection process or analyzed in the process. As MY understanding of Matrix deepens, other series of articles will be released on the official account. Please pay attention.

3. Slow method demonstration

The official Demo, TestTraceMainActivity#testJankiess(View View), simulates the scenario of calling methods in the main thread for more than 700ms. The default threshold for slow methods in Matrix is 700ms. It can be configured by users. The corresponding field is

//Constants.java

public static final int DEFAULT_EVIL_METHOD_THRESHOLD_MS = 700;

Clicking the EVIL_METHOD button invokes the testJankiess method. Print the Log as follows

At first glance, the monk is a little confused. If such a log is displayed, the call duration of the main thread exceeds 700ms. Format the JSON corresponding to the content in the log and get the following result:

{ "machine":"UN_KNOW", "cpu_app":0, "mem":1567367168, "mem_free":851992, "detail":"NORMAL", "cost":2262, "Usage" : "0.35%", "scene" : "sample. Tencent. Matrix. Trace. TestTraceMainActivity", "Stack ":"0,1048574,1,2262\n 1,117,1,2254\n 2,121,1,2254\n 3,124,1,384\n 4,125,1,165\n 5,126,1,21\n 5,127,1,21\n 5128,, 4129 3130,1,65,1,24 \ n \ n \ n 4131,1,21,1,6 \ n 4132 4133 3134,1,1004,1,8 \ n \ n \ n ", "stackKey" : "121 |", "tag":"Trace_EvilMethod", "process":"sample.tencent.matrix", "time":1620523013050 }Copy the code

There is also an official article about data formats, github.com/Tencent/mat…

This article focuses on the Stack field. Its function is to report the corresponding stack. But why are there a bunch of Arabic digits in the stack? Let’s start at the beginning.

4. Calculate the time spent on method calls

4.1 Calculate the time a method call takes

So let’s say I have method A. I want to calculate how long it takes. That’s what we do

public void A() {
  long startTime = SystemClock.uptimeMillis()
  SystemClock.sleep(1000);
  long endTime = SystemClock.uptimeMillis()
  System.out.println("cost time " + (endTime-startTime));
}
Copy the code

We can do this for individual methods. But if I want to calculate the call cost for all the methods in the Android project, we need to use bytecode staking. At the beginning and end of all methods, add code to record time. Matrix also uses the peg technique to calculate the time call of the method.

A method of TestTraceMainActivity inA project

Decompilate TestTraceMainActivity in APK using the Jadx tool. We found that method A increased before and afterAppMethoBeat i. (121) and AppMethoBeat. O (121).

What does the I/O method parameter 121 mean?

4.2 121 Description

The Gradle plugin, when processing a method, matches the name of the method with a number incrementing from 1. We open the app/build/outputs/mapping/debug/methodMapping. TXT file. From the picture we can see 121 corresponding method is sample. Tencent. Matrix. The trace. TestTraceMainActivity (V) A

The preceding stack number 0,1048574,1,2262\n the second column,1048574, corresponds to the method name. The advantage of this is that data collection saves traffic.

4.3 Obtaining the Call Stack Time

The method invocation is as follows, assuming that method invocation A takes 1000ms. How can I determine which submethod in the call stack takes the most time to call?

public void A() {
  B();
  C();
  D();
}
Copy the code

Matrix framework has implemented call stack time monitoring, I will explain the specific analysis later. The point is in chapter 6.2

4.4 Obtaining the Main Thread Time

Depends on the main thread Looper to monitor the execution time of each dispatchMessage

public static void loop() { ... for (;;) {... // This must be in a local variable, in case a UI event sets the logger Printer logging = me.mLogging; if (logging ! = null) { logging.println(">>>>> Dispatching to " + msg.target + " " + msg.callback + ": " + msg.what); } msg.target.dispatchMessage(msg); if (logging ! = null) { logging.println("<<<<< Finished to " + msg.target + " " + msg.callback); }... }}Copy the code

All tasks executed by the main thread are distributed and executed in the dispatchMessage method. We set a Printer for the main thread Looper by setMessageLogging. Because dispatchMessage prints the corresponding information before and after execution. We can calculate the time before and after execution.

5. How does Matrix achieve pile insertion

This paper simply introduces the Matrix pile insertion technology. Dabble. Realize plug pile code is com/tencent/matrix trace/MethodTracer. Java, its inner class TraceMethodAdapter implements AppMethoBeat. I () and AppMethoBeat. O () insertion function.

6. Principle of slow method monitoring

6.1 Draw the Call stack tree by hand

The code call for the slow method demonstration mentioned in section 3 is as follows.

public void testJankiess(View view) {
    A();
}

private void A() {
    B();
    H();
    L();
    SystemClock.sleep(800);
}

private void B() {
    C();
    G();
    SystemClock.sleep(200);
}

private void C() {
    D();
    E();
    F();
    SystemClock.sleep(100);
}

private void D() {
    SystemClock.sleep(20);
}

private void E() {
    SystemClock.sleep(20);
}

private void F() {
    SystemClock.sleep(20);
}

private void G() {
    SystemClock.sleep(20);
}

private void H() {
    SystemClock.sleep(20);
    I();
    J();
    K();
}

private void I() {
    SystemClock.sleep(20);
}

private void J() {
    SystemClock.sleep(6);
}

private void K() {
    SystemClock.sleep(10);
}

private void L() {
    SystemClock.sleep(1000);
}
Copy the code

Its corresponding methodMapping file is as follows:

117, 1, sample. Tencent. Matrix. Trace. TestTraceMainActivity testJankiess (Landroid. View. The view) V, 2, 121 sample. Tencent. Matrix. Trace. TestTraceMainActivity A (V), 4, 122 sample. Tencent. Matrix. '. TestBatteryActivity OnDestroy () 123, 9 V, sample. Tencent. Matrix. Sqlitelint. TestSQLiteLintHelper qualityClose (Ljava. IO. Closeable;) V, 2, 124 sample. Tencent. Matrix. Trace. TestTraceMainActivity (V), 2, 125 B sample. Tencent. Matrix. Trace. TestTraceMainActivity C (V), 2, 126 sample. Tencent. Matrix. The trace. TestTraceMainActivity D (V) 127, 2, sample. Tencent. Matrix. Trace. TestTraceMainActivity E (V), 2, 128 sample. Tencent. Matrix. The trace. TestTraceMainActivity F (V), 2, 129 sample. Tencent. Matrix. Trace. TestTraceMainActivity G (V) 130, 2, sample. Tencent. Matrix. Trace. TestTraceMainActivity H (V), 2, 131 sample. Tencent. Matrix. The trace. TestTraceMainActivity I (V), 2, 132 sample. Tencent. Matrix. Trace. TestTraceMainActivity J () V , 2, 133 sample. Tencent. Matrix. Trace. TestTraceMainActivity K (V), 2, 134 sample. Tencent. Matrix. The trace. TestTraceMainActivity L (V), 1, 1048574. The android OS. Handler dispatchMessage (Landroid. OS. The Message) VCopy the code

The above method calls can be summarized into the following tree:

Notice that I drew this tree directly from the order of the calls. How does the program generate the call stack tree from the order of the calls?

6.2 Generating call stack tree

6.2.1 Function calls are recorded in queues

I /o is called before and after each function is executed at runtime. If it is executed on the main thread, the time offset for the current distance from the initialization of the MethodBeat module is obtained before and after the function is executedTo compress the data, store it in a variable of type long.MethodBeat I or O, Mehtod ID, and time offset are currently executed into a long variable. To the index position in a pre-initialized array long[] (the pre-allocated buffer is 100W and occupies about 7.6m of memory). The data store is shown as follows:

//AppMethodBeat.java private static long[] sBuffer = new long[Constants.BUFFER_SIZE]; //Constants.java public static final int BUFFER_SIZE = 100 * 10000; / / 7.6 MCopy the code

The AppMethodBeat. I/O method will eventually be called

//AppMethodBeat.java
private static void mergeData(int methodId, int index, boolean isIn) {
    if (methodId == AppMethodBeat.METHOD_ID_DISPATCH) {
        sCurrentDiffTime = SystemClock.uptimeMillis() - sDiffTime;
    }
    long trueId = 0L;
    if (isIn) {
        trueId |= 1L << 63;
    }
    trueId |= (long) methodId << 43;
    trueId |= sCurrentDiffTime & 0x7FFFFFFFFFFL;
    sBuffer[index] = trueId;
    checkPileup(index);
    sLastIndex = index;
}
Copy the code

The testJankiess method call, through the mergeData method, finally fills the sBuffer as shown below:

6.2.2 Check whether the threshold is exceeded after the main thread is invoked

Evilmethodtracer. Java dispatchEnd indicates that the execution of the main thread ends. If the execution time exceeds the threshold, AnalyseTask is executed in the MatrixHandlerThread to analyze the time of the call stack.

6.2.3 Analyzing the stack

  1. Call TraceDataUtils. StructuredDataToStack () method
  2. Call the TraceDataUtils. TrimStack () method
  3. Call TraceDataUtils. GetTreeKey () method

6.2.4 Converting the call queue to the tree’s sequential traversal result

1. Call the addMethodItem method to push the result of the post-traversal onto the stack

StructuredDataToStack (), if it’s an out method, will go off the stack and push it to the bottom of the stack. The results are as follows:

If we reverse the queue, compared to the hand-drawn tree we know that the result is a back-order traversal.

126 127 128 125 129 124 131 132 133 130 134 121 117 1048574

2. Call stackToTree() to transform the queue into a multi-fork tree.

The result is the same as a hand-drawn tree

3. Call treeToStack() to get the first-order traversal result The results are as follows

6.2.5 Clipping the call stack

Matrix uploads a maximum of 30 stacks by default. If there are more than 30 stack calls, you need to crop the stack. The clipping strategy is as follows:

  1. If the stack size is greater than 30, nodes whose execution time is less than 5* total traversal times are removed

  2. The maximum overall traversal is 60 times, and the comparison time increases by 5ms for each overall traversal

  3. If the stack size is still greater than 30 after 60 iterations, delete the rest

6.2.5 Stack Information Generation Key

If the calling time of a method is 0.3 times longer than the calling time of the whole main thread, the method ID will be recorded in the list, and finally sorted and filtered. Generate traceKey.

6.2.6 Generate the report stack based on the trimmed sequential traversal result

ReportBuilder is the final stack information reported. This parameter is consistent with the log information at the beginning of this article

7. Parse logs

The log parsing result is as follows:

8. The last

If there is any error, please leave a message. It’s not easy to write. It took a whole day. If you like it, share it with your friends. If you need to parse the source of the log file, welcome to pay attention to the “byte station” public number of the same name.