The source of Caton

Through the screen rendering mechanism we know :Android screen rendering is achieved through Vsync, the software layer calculates the data, puts it into the buffer, and the hardware layer reads the data from the buffer and draws it on the screen. The rendering cycle is 16ms, so we can see the constantly changing picture.

If it goes beyond 16ms, a lag will occur, of course, in the software layer (if it happens in the hardware layer, just change the device). So, the computation time of the software layer needs to be less than 16ms, so where does this computation take place?

It’s in the Handler, in the UI Handler, to be more precise.

Android processes interact with the Binder and threads communicate with handlers.

After the software layer receives the vsync signal from the hardware layer, the Java layer will send a message to the UI Handler to calculate the view data, that is, to perform the measurement layout drawing, which is shown in the code layer: Execute the performTraversals() function in ViewRootImpl, which we mentioned in the View measurement layout drawing.

So, the view’s data is being evaluated in the UI Handler, so if something else is being done in the UI Handler, and it takes a long time to execute, it indirectly causes the stackage to occur, and we need to find it and kill it.

So how do you find them?

As we know, the message processing of Handler is distributed through Looper, so we can get the UI Looper first, implant the detection code before and after the message execution, and add the detection logic, then we can analyze and figure out the time spent in the message execution.

// Get the UI Looper
Looper uiLooper = Looper.getMainLooper();

// Record the time before the message is sent
void preHandle(a){
  	time = System.currentTimeMillis();
}

// The processing logic after the message is distributed, calculates the time difference and prompts
void postHandle(a){
  	long delay = System.currentTimeMillis() - time;
  	if(delay > 16) {
      	// If you think you are stuck, you can do something like print the current thread stack}}// Insert the above two methods before and after the uiLooper message distribution (if there is one)
uiLooper.xxxxxxx();

Copy the code

So how do you integrate these two functions into a uiLooper? There are already apis available in Looper.

How to detect application stutter

According to the above, we only need to record the time before message execution, then calculate the time difference, and compare the time difference with the lag threshold we passed in. If the lag threshold is greater than this value, it is considered that the lag has occurred, and then dump the main thread stack and show it to the developer.

So how do you find the pre – and post-message insertion points?

Looper itself provides a method to set the log printing class:

    /**
     * Control logging of messages as they are processed by this Looper.  If
     * enabled, a log message will be written to <var>printer</var>
     * at the beginning and ending of each message dispatch, identifying the
     * target Handler and message contents.
     *
     * @param printer A Printer object that will receive log messages, or
     * null to disable message logging.
     */
    public void setMessageLogging(@Nullable Printer printer) {
        mLogging = printer;
    }
Copy the code

This printer will be used to print logs before and after message execution. The code is in the loop() function of Looper as follows:

// The message is printed before execution
// This must be in a local variable, in case a UI event sets the logger
final Printer logging = me.mLogging;
if(logging ! =null) {
    logging.println(">>>>> Dispatching to " + msg.target + "" +
            msg.callback + ":" + msg.what);
}

// The message is printed after execution
if(logging ! =null) {
    logging.println("<<<<< Finished to " + msg.target + "" + msg.callback);
}
Copy the code

Using this principle, we can pass in a custom Printer, then duplicate the println() method, and calculate the time difference ** between before and after the execution of **message. If it is greater than the target value (say, 500ms), we consider that a lag has occurred.

So, how do you tell if println() is called before or after message?

We can judge by using the message contents printed in Println, for example, >>>>> Dispatching to…. is displayed before the command is executed <<<<< Finished to

public void println(String msg) {
  	if(msg.startsWith(">>>>> Dispatching to")) {
      	// This is before execution
    }else {
      	// This is after execution}}Copy the code

But this is too low, and string matching is inherently inefficient, so we can use another method.

Since the log is printed in pairs before and after message execution, we can define a Boolean value that indicates whether the log is printed before message execution and changes the value each time the log is printed. Such as:

// Is printed before message execution, since the first print must be done before message execution, so the initial value is true
private boolean isPre = true;

public void println(String msg) {
  	if(isPre) {
      	// The main thread stack is dumped before message is executed
    }else {
      	// We can stop the stack of the dump thread by printing after message execution
    }
  	// Change the value once, this time before message execution, the next time must be after execution; This time is after the execution, the next time must be before the executionisPre = ! isPre; }Copy the code

Ok, now that we know the core principles, let’s look at the implementation of the existing project code BlockCanary.

BlockCanary

Simple to use

  • 1 Adding a Dependency
dependencies {
  	// Use it in both debug and release versions
    compile 'com. Making. Markzhai: blockcanary - android: 1.5.0'

  	// Only for debug
    / / debugCompile 'com. Making. Markzhai: blockcanary - android: 1.5.0'
    / / releaseCompile 'com. Making. Markzhai: blockcanary - no - op: 1.5.0'
}
Copy the code
  • 2 Code Integration

First, define an AppBlockCanaryContext that inherits BlockCanaryContext. We need to override several methods in the AppBlockCanaryContext.

public class AppBlockCanaryContext extends BlockCanaryContext {

    private static final String TAG = "AppBlockCanaryContext";

    /** * returns an identifier that can be passed in as app_name, version number, channel, etc. */
    public String provideQualifier(a) {
        return "my_app" + BuildConfig.VERSION_CODE;
    }

    /** * returns a user ID to identify */
    public String provideUid(a) {
				return "10086";
    }

    /** * Returns the network type, such as :2G, 3G, 4G, wifi, etc. */
    public String provideNetworkType(a) {
        return "wifi";
    }

    /**
     * Config monitor duration, after this time BlockCanary will stop, use
     * with {@code BlockCanary}'s isMonitorDurationEnd
     *
     * @return monitor last duration (in hour)
     */
    public int provideMonitorDuration(a) {
        return -1;
    }

    /** * returns the threshold you think is stuck in milliseconds, and should pass in a value of different sizes */ according to the performance of different devices
    public int provideBlockThreshold(a) {
        return 500;
    }

    /** ** Dump interval of the thread. When stall occurs, the main thread is dumped at intervals
    public int provideDumpInterval(a) {
        return provideBlockThreshold();
    }

    /** * Path for saving logs */
    public String providePath(a) {
        return "/blockcanary_log/"
    }

    /** * Whether a notification is displayed immediately */
    public boolean displayNotification(a) {
        return true;
    }
		
    /** * the card will be called immediately, you can print the log here, or upload to your own server */
    public void onBlock(Context context, BlockInfo blockInfo) {
        Log.d(TAG, "onBlock: "+ blockInfo); }}Copy the code

Call onCreate() from Application:

public class MainApplication extends Application {
    @Override
    public void onCreate(a) {
      	// Pass in the AppBlockCanaryContext we created above
        BlockCanary.install(this.newAppBlockCanaryContext()).start(); }}Copy the code

When caton occurs, we are notified and can see our own printed log in Logcat.

This article focuses on source code analysis. The full use can be found on Github.

Source code analysis

BlockCanary. Install ():

public static BlockCanary install(Context context, BlockCanaryContext blockCanaryContext) {
  	// Initialize BlockCanaryContext
    BlockCanaryContext.init(context, blockCanaryContext);
  	Blockcanarycontext.get () is the argument we passed in, and displayNotification() is the argument we defined above whether to display the notification
    setEnabled(context, DisplayActivity.class, BlockCanaryContext.get().displayNotification());
  	// Create a singleton and return it
    return get();
}

// Implementation of the get() function
public static BlockCanary get(a) {
    if (sInstance == null) {
        synchronized (BlockCanary.class) {
            if (sInstance == null) {
                sInstance = newBlockCanary(); }}}return sInstance;
}

// BlockCanary construction
private BlockCanary(a) {
    BlockCanaryInternals.setContext(BlockCanaryContext.get());
  	
  	// Create a core class, which includes log analysis, stack dump, and CPU collection
    mBlockCanaryCore = BlockCanaryInternals.getInstance();
  	
  	// Core code to add the interceptor, which is the AppBlockCanaryContext we passed in
  	// When caton is detected, its onBlock() function is called
    mBlockCanaryCore.addBlockInterceptor(BlockCanaryContext.get());
  	
  	// Return if no notification is displayed
    if(! BlockCanaryContext.get().displayNotification()) {return;
    }
  	// Otherwise add a service to pop up notifications
    mBlockCanaryCore.addBlockInterceptor(new DisplayService());
}
Copy the code

Key methods in BlockCanaryContext:

static void init(Context context, BlockCanaryContext blockCanaryContext) {
  	/ / save the Context
    sApplicationContext = context;
  	// Save the BlockCanaryContext argument, which is our custom AppBlockCanaryContext
    sInstance = blockCanaryContext;
}

public static BlockCanaryContext get(a) {
    if (sInstance == null) {
        throw new RuntimeException("BlockCanaryContext null");
    } else {
      	// Returns the arguments saved above
        returnsInstance; }}Copy the code

Now, let’s go back to the main line logic and look at blockCanary. Start ():

/ / testing
public void start(a) {
  	// Add a Boolean value to prevent repeated processing
    if(! mMonitorStarted) { mMonitorStarted =true;
      	// Set the value of this parameterLooper.getMainLooper().setMessageLogging(mBlockCanaryCore.monitor); }}// Stop the test
 public void stop(a) {
      if (mMonitorStarted) {
          mMonitorStarted = false;
        	/ / remove the Printer
          Looper.getMainLooper().setMessageLogging(null);
        	// Stop stack dump
          mBlockCanaryCore.stackSampler.stop();
        	// Stop CPU collectionmBlockCanaryCore.cpuSampler.stop(); }}Copy the code

MBlockCanaryCore: mBlockCanaryCore: mBlockCanaryCore: mBlockCanaryCore: mBlockCanaryCore: mBlockCanaryCore: mBlockCanaryCore: mBlockCanaryCore

mBlockCanaryCore = BlockCanaryInternals.getInstance();
Copy the code
// this is a singleton
static BlockCanaryInternals getInstance(a) {
    if (sInstance == null) {
        synchronized (BlockCanaryInternals.class) {
            if (sInstance == null) {
                sInstance = newBlockCanaryInternals(); }}}return sInstance;
}

// look at the constructor
public BlockCanaryInternals(a) {
    // stack dump, the first argument is the UI thread, and the second argument is the dump interval we set
    stackSampler = new StackSampler(Looper.getMainLooper().getThread(),sContext.provideDumpInterval());
    // CPU collector, the parameter is the dump interval we set
    cpuSampler = new CpuSampler(sContext.provideDumpInterval());
    // Set log printing class Printer
    setMonitor(new LooperMonitor(new LooperMonitor.BlockListener() {
        // This method is executed when caton is detected
        @Override
        public void onBlockEvent(long realTimeStart, long realTimeEnd, long threadTimeStart, long threadTimeEnd) {
          	// Get stack information
            ArrayList<String> threadStackEntries = stackSampler.getThreadStackEntries(realTimeStart, realTimeEnd);
            if(! threadStackEntries.isEmpty()) {// Construct BlockInfo and call it back to the interceptor, thus calling it inside onBlock() in our AppBlockCanaryCotnext
                BlockInfo blockInfo = BlockInfo.newInstance()
                        .setMainThreadTimeCost(realTimeStart, realTimeEnd, threadTimeStart, threadTimeEnd)
                        .setCpuBusyFlag(cpuSampler.isCpuBusy(realTimeStart, realTimeEnd)) // Pass in the CPU information dumped to
                        .setRecentCpuRate(cpuSampler.getCpuRateInfo()) // Pass in the CPU information dumped to
                        .setThreadStackEntries(threadStackEntries) // Pass in the stack information dumped to
                        .flushString();
              // Save the delay information
              LogWriter.save(blockInfo.toString());
                // If there is an interceptor, its onBlock() method is executed. Remember the interceptor we added earlier
                if(mInterceptorChain.size() ! =0) {
                    for (BlockInterceptor interceptor : mInterceptorChain) {
                        interceptor.onBlock(getContext().provideContext(), blockInfo);
                    }
                }
            }
        }
    }, 
    getContext().provideBlockThreshold(), // We set the holdup threshold
    getContext().stopWhenDebugging())); // Whether to stop in debug mode, the default return true, because debug mode is common

    LogWriter.cleanObsolete();
}
Copy the code

Next we’ll look at the LooperMonitor class:

Println () = println()
class LooperMonitor implements Printer {
 		// The parameters are: the immediate delay callback, the delay threshold, and whether to stop in debug mode
    public LooperMonitor(BlockListener blockListener, long blockThresholdMillis, boolean stopWhenDebugging) {
        if (blockListener == null) {
            throw new IllegalArgumentException("blockListener should not be null.");
        }
        mBlockListener = blockListener;
        mBlockThresholdMillis = blockThresholdMillis;
        mStopWhenDebugging = stopWhenDebugging;
    }
  	
  	// The core function
  	@Override
    public void println(String x) {
      	// Stops in debug mode
        if (mStopWhenDebugging && Debug.isDebuggerConnected()) {
            return;
        }
      	// Use a Boolean value to determine whether it is before or after execution
        if(! mPrintingStarted) {// Record the start time
            mStartTimestamp = System.currentTimeMillis();
            mStartThreadTimestamp = SystemClock.currentThreadTimeMillis();
            mPrintingStarted = true;
          	// Start to dump stack and CPU information
            startDump();
        } else {
          	// Record the end time
            final long endTime = System.currentTimeMillis();
            mPrintingStarted = false;
          	// Detect stalling and callback
            if (isBlock(endTime)) {
                notifyBlockEvent(endTime);
            }
          	/ / stop the dumpstopDump(); }}// Whether it is stuck
		private boolean isBlock(long endTime) {
      	// The time difference is greater than the threshold we pass in
    		return endTime - mStartTimestamp > mBlockThresholdMillis;
  	}
	
  	// Caton callback
    private void notifyBlockEvent(final long endTime) {
        final long startTime = mStartTimestamp;
        final long startThreadTime = mStartThreadTimestamp;
        final long endThreadTime = SystemClock.currentThreadTimeMillis();
        HandlerThreadFactory.getWriteLogThreadHandler().post(new Runnable() {
            @Override
            public void run(a) {
							// Here is the callback tomBlockListener.onBlockEvent(startTime, endTime, startThreadTime, endThreadTime); }}); }/ / the dump
    private void startDump(a) {
      	// dump stack information
        if (null! = BlockCanaryInternals.getInstance().stackSampler) { BlockCanaryInternals.getInstance().stackSampler.start(); }// Dump CPU information
        if (null != BlockCanaryInternals.getInstance().cpuSampler) {
            BlockCanaryInternals.getInstance().cpuSampler.start();
        }
    }
	
  	/ / end dump
    private void stopDump(a) {
        if (null! = BlockCanaryInternals.getInstance().stackSampler) { BlockCanaryInternals.getInstance().stackSampler.stop(); }if (null! = BlockCanaryInternals.getInstance().cpuSampler) { BlockCanaryInternals.getInstance().cpuSampler.stop(); }}}Copy the code

BlockCanary. Start () creates LooperMonitor and stackSampler and cpuSampler classes to grab stack and CPU information. Dump will start and record the start time. When message is finished, dump will stop and record the end time. Then the difference between the end time and the start time will be used. BlockInfo is constructed from the dumped stack information and CPU information and passed to the developer via callbacks.

Now let’s take a look at the code that dumps stack and CPU information, starting with their parent class AbstractSampler, and the entry function start() inside:

abstract class AbstractSampler {
    protected AtomicBoolean mShouldSample = new AtomicBoolean(false);
  	// This is the entry function
    public void start(a) {
      	// Use an atomic variable to avoid repeated startup
        if (mShouldSample.get()) {
            return;
        }
        mShouldSample.set(true);
				
      	// Remove the previous one
        HandlerThreadFactory.getTimerThreadHandler().removeCallbacks(mRunnable);
      	// Post new. Note that the second parameter is 0.8 times the dump interval we set in AppBlockCanaryContextHandlerThreadFactory.getTimerThreadHandler().postDelayed(mRunnable,BlockCanaryInternals.getInstance().getSampleDelay());  }// Corresponding stop function
    public void stop(a) {
        if(! mShouldSample.get()) {return;
        }
        mShouldSample.set(false); HandlerThreadFactory.getTimerThreadHandler().removeCallbacks(mRunnable); }}Copy the code

We see that it does this by Posting a runnable, and then we look at the runnable:

protected long mSampleInterval;

private Runnable mRunnable = new Runnable() {
    @Override
    public void run(a) {
      	// Core function: doSample();
        doSample();
        if (mShouldSample.get()) {
          	// Post out againHandlerThreadFactory.getTimerThreadHandler().postDelayed(mRunnable, mSampleInterval); }}};Copy the code

As you can see, doSample() is called in a loop, depending on the dump interval we set inside AppBlockCanaryContext.

So let’s look at the core function doSample(), which is an overloaded function, and look at the implementation in StackSampler first

@Override
protected void doSample(a) {
   	private static final int DEFAULT_MAX_ENTRY_COUNT = 100;
    private int mMaxEntryCount = DEFAULT_MAX_ENTRY_COUNT;
  
    private static final LinkedHashMap<Long, String> sStackMap = new LinkedHashMap<>();

    StringBuilder stringBuilder = new StringBuilder();

  	// Iterate over the current thread's StackTrace to generate a String
    for (StackTraceElement stackTraceElement : mCurrentThread.getStackTrace()) {
        stringBuilder
          	.append(stackTraceElement.toString())
            .append(BlockInfo.SEPARATOR);
    }
		
  	// Add each StackTrace to the sStackMap using lRU
    synchronized (sStackMap) {
      	// mMaxEntryCount defaults to 100
        if (sStackMap.size() == mMaxEntryCount && mMaxEntryCount > 0) {
            sStackMap.remove(sStackMap.keySet().iterator().next());
        }
      	// Key is the current time value, and value is the StackTrace to which this dump is basedsStackMap.put(System.currentTimeMillis(), stringBuilder.toString()); }}Copy the code

The core logic is to get the current thread’s StackTrace and save it to the map, up to the last 100, where key is the time value and value is StackTrace.

Remember how we got the stack information in onBlockEvent()? Yes, yes

stackSampler.getThreadStackEntries(realTimeStart, realTimeEnd)
Copy the code

It is implemented in StackSampler as follows:

// Find the result between startTime and endTime in the sStackMap we saved above and return it in the List.
public ArrayList<String> getThreadStackEntries(long startTime, long endTime) {
    ArrayList<String> result = new ArrayList<>();
    synchronized (sStackMap) {
        for (Long entryTime : sStackMap.keySet()) {
            if(startTime < entryTime && entryTime < endTime) { result.add(BlockInfo.TIME_FORMATTER.format(entryTime) + BlockInfo.SEPARATOR + BlockInfo.SEPARATOR + sStackMap.get(entryTime)); }}}return result;
}
Copy the code

The implementation is as simple as looking in a sStackMap for results between startTime and endTime, and storing the results as a List to return.

Now let’s look at the implementation of doSample() in CpuSampler:

@Override
protected void doSample(a) {
    BufferedReader cpuReader = null;
    BufferedReader pidReader = null;

    try {
      	// Read the "/proc/stat" file
        cpuReader = new BufferedReader(new InputStreamReader(
                new FileInputStream("/proc/stat")), BUFFER_SIZE);
      	// Get the CPU rate from the "/proc/stat" file
        String cpuRate = cpuReader.readLine();
        if (cpuRate == null) {
            cpuRate = "";
        }

      	// Get the process ID
        if (mPid == 0) {
            mPid = android.os.Process.myPid();
        }
      	// Obtain the /proc/mpid/stat file corresponding to the process based on the process ID
        pidReader = new BufferedReader(new InputStreamReader(
                new FileInputStream("/proc/" + mPid + "/stat")), BUFFER_SIZE);
      	// To get the CPU rate of the process
        String pidCpuRate = pidReader.readLine();
        if (pidCpuRate == null) {
            pidCpuRate = "";
        }
				
      	// Parse the data
        parse(cpuRate, pidCpuRate);
    } catch (Throwable throwable) {
        Log.e(TAG, "doSample: ", throwable);
    } finally {
        try {
            if(cpuReader ! =null) {
                cpuReader.close();
            }
            if(pidReader ! =null) { pidReader.close(); }}catch (IOException exception) {
            Log.e(TAG, "doSample: ", exception); }}}Copy the code

Parse () : /proc/mpid/stat () : /proc/mpid/stat () : /mpid/stat ()

// Save a maximum of 10 data items
private static final int MAX_ENTRY_COUNT = 10;
// Used to store CPU information
private final LinkedHashMap<Long, String> mCpuInfoEntries = new LinkedHashMap<>();

private void parse(String cpuRate, String pidCpuRate) {
  	// Convert to an array
    String[] cpuInfoArray = cpuRate.split("");
    if (cpuInfoArray.length < 9) {
        return;
    }

  	// Parse the subscripts one by one
    long user = Long.parseLong(cpuInfoArray[2]);
    long nice = Long.parseLong(cpuInfoArray[3]);
    long system = Long.parseLong(cpuInfoArray[4]);
    long idle = Long.parseLong(cpuInfoArray[5]);
    long ioWait = Long.parseLong(cpuInfoArray[6]);
    long total = user + nice + system + idle + ioWait
            + Long.parseLong(cpuInfoArray[7])
            + Long.parseLong(cpuInfoArray[8]);

    String[] pidCpuInfoList = pidCpuRate.split("");
    if (pidCpuInfoList.length < 17) {
        return;
    }

    long appCpuTime = Long.parseLong(pidCpuInfoList[13])
            + Long.parseLong(pidCpuInfoList[14])
            + Long.parseLong(pidCpuInfoList[15])
            + Long.parseLong(pidCpuInfoList[16]);

  	// Convert data to String and save
    if(mTotalLast ! =0) {
        StringBuilder stringBuilder = new StringBuilder();
        long idleTime = idle - mIdleLast;
        long totalTime = total - mTotalLast;

        stringBuilder
                .append("cpu:")
                .append((totalTime - idleTime) * 100L / totalTime)
                .append("%")
                .append("app:")
                .append((appCpuTime - mAppCpuTimeLast) * 100L / totalTime)
                .append("%")
                .append("[")
                .append("user:").append((user - mUserLast) * 100L / totalTime)
                .append("%")
                .append("system:").append((system - mSystemLast) * 100L / totalTime)
                .append("%")
                .append("ioWait:").append((ioWait - mIoWaitLast) * 100L / totalTime)
                .append("%]");

      	// Store data in mCpuInfoEntries. Key is the current time value and is the LRU policy used
        synchronized (mCpuInfoEntries) {
            mCpuInfoEntries.put(System.currentTimeMillis(), stringBuilder.toString());
            if (mCpuInfoEntries.size() > MAX_ENTRY_COUNT) {
                for (Map.Entry<Long, String> entry : mCpuInfoEntries.entrySet()) {
                    Long key = entry.getKey();
                    mCpuInfoEntries.remove(key);
                    break; }}}}// Update the data for the next round
    mUserLast = user;
    mSystemLast = system;
    mIdleLast = idle;
    mIoWaitLast = ioWait;
    mTotalLast = total;

    mAppCpuTimeLast = appCpuTime;
}
Copy the code

The logic here is similar to StackSample: obtain CPU information and store it in mCpuInfoEntries. Key is the current time value and value is the String corresponding to CPU information, which is also the Lru policy.

Remember how we got the CPU information in onBlockEvent()? Yes, through cpuSampler. GetCpuRateInfo (), its implementation is as follows:

// Get CPU rate information
public String getCpuRateInfo(a) {
    StringBuilder sb = new StringBuilder();
    synchronized (mCpuInfoEntries) {
      	// Returns by directly iterating through mCpuInfoEntries and writing to String
        for (Map.Entry<Long, String> entry : mCpuInfoEntries.entrySet()) {
            long time = entry.getKey();
            sb.append(BlockInfo.TIME_FORMATTER.format(time))
                    .append(' ') .append(entry.getValue()) .append(BlockInfo.SEPARATOR); }}return sb.toString();
}
Copy the code

This directly converts the CPU information we have stored in mCpuInfoEntries to a String.

IsCpuBusy () is the key logic of isCpuBusy().

conclusion

The core logic of BlockCanary is simple:

  • 1 throughLooperTo provide thesetMessageLogging(Printer)The function passes in a customLooperMonitor.
  • 2 in theMessageStart before executiondumpThread stack sumcpuInformation.
  • 3 inMessageStop after executiondumpAnd use the time difference to judge whether there is a lag.
  • 4 If there is a deadlock, it willdumpThe data is parsed and passed to the developer via callbacks.
  • 5 developers can use this data to analyze the causes of caton.