Here are some tips from SALON:

Building the Hummingbird team's mobile exception monitoring system

The second speaker was Pan Wankun, Android lead of Ele.me's Hummingbird team. He has six years of Android development experience at Guevara, Ele.me, and other companies. He currently serves as Android lead of the Ele.me Logistics Mobile team, focusing on mobile code architecture and performance optimization, and is responsible for keeping the app highly stable in production. His talk covered the mobile exception monitoring system that Ele.me is putting into use.

Problem background

The conventional APM system and exception reporting used at the company have three problems:

  • Exception types such as crashes and ANRs are too coarse-grained and cannot be classified further.
  • The online environment is complex. Most cases cannot be covered by testing and development in daily business processes, so problems are hard to reproduce and are time-sensitive.
  • Beyond passively defending against exceptions, improving the quality of each release is a more proactive way to solve problems.

In response, the Ele.me Hummingbird team developed a mobile exception monitoring system to address these problems. Here is the structure diagram of the whole system:

TimeBomb (custom exception monitoring system)

We call it TimeBomb because it watches for N exceptions in a row within a period of time. Many scenarios in Hummingbird products exhibit this kind of problem, such as login anomalies, iteration anomalies, positioning anomalies, large image allocations, and jank.
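To make the "N exceptions within a period of time" rule concrete, here is a minimal sketch in Java. It is not TimeBomb's actual implementation: the class name `TimeBombDetector`, its parameters, and the sliding-window approach are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: fires when `threshold` exceptions of one type occur within `windowMillis`. */
class TimeBombDetector {
    private final int threshold;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    TimeBombDetector(int threshold, long windowMillis) {
        this.threshold = threshold;
        this.windowMillis = windowMillis;
    }

    /** Records one exception at `nowMillis`; returns true when a report should be sent. */
    synchronized boolean record(long nowMillis) {
        timestamps.addLast(nowMillis);
        // Drop events that have fallen out of the time window.
        while (!timestamps.isEmpty() && nowMillis - timestamps.peekFirst() >= windowMillis) {
            timestamps.removeFirst();
        }
        if (timestamps.size() >= threshold) {
            timestamps.clear(); // reset so that one burst produces one report
            return true;
        }
        return false;
    }
}
```

One detector instance per scenario (login, positioning, and so on) would keep the counts separate. Passing the timestamp in as a parameter keeps the rule easy to unit test.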

By monitoring the data in real time, we can spot problems that are reported in bursts. In the chart, the horizontal axis is the time segment and the vertical axis is the number of reports. Many filter options control which data curves are displayed, which helps us focus on just the reported logs we care about.

In addition, for items that need daily attention, a back-end configuration controls which reported curves the front end displays. You can also query by a field, or drill into a single point for deeper exploration.

T salon editor's note: this is very similar to Kibana. The advantage is that the data we need can be added through configuration, which is friendly to PMs and data analysts and lets them follow the data in real time.

Dogger logging system

The Dogger logging system is basically the same as an everyday logging system. It has four main concerns: lifecycle (Life), clicks (Click), network requests (HTTP), and custom events (Custom). On top of that, Dogger has some further advantages:

  • Writes files through mmap, which makes event logging more efficient;
  • Uses a cloud control platform for remote configuration, real-time configuration, and timely upload;
  • Has a high compression ratio, saving user traffic;
  • Encrypts sensitive information with a strong encryption algorithm to ensure security;
  • Can generate automation scripts, which makes it powerful;
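The first advantage in the list, writing through mmap, can be illustrated with a small Java sketch. This is not Dogger's code: the class `MmapLog`, its fixed capacity, and its method names are assumptions, and a real logger would also handle file rollover, compression, and encryption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of an mmap-backed, append-only log file. */
class MmapLog {
    private final MappedByteBuffer buffer;

    MmapLog(Path file, int capacityBytes) {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, capacityBytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Appends one line; returns false when the mapped region is full. */
    synchronized boolean append(String line) {
        byte[] bytes = (line + "\n").getBytes(StandardCharsets.UTF_8);
        if (bytes.length > buffer.remaining()) {
            return false; // a real logger would roll over to a new file here
        }
        buffer.put(bytes); // a plain memory write; the OS writes the pages back to disk
        return true;
    }

    /** Returns everything written so far, decoded as UTF-8. */
    synchronized String contents() {
        ByteBuffer view = buffer.duplicate();
        view.flip();
        return StandardCharsets.UTF_8.decode(view).toString();
    }

    /** Forces dirty pages to disk, for example before an upload. */
    void flush() {
        buffer.force();
    }
}
```

Each `append` is a memory copy rather than a write system call, which is where the efficiency comes from.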

Talk is cheap, so the Hummingbird team has open-sourced the Dogger logging system; you can find it on GitHub.

(Project will be renamed from Trojan to Dogger)

DoggerService

To complement the Dogger logging system on the client, the server side provides DoggerService, which displays and classifies the log information collected by Dogger. DoggerService is still under construction. Here are some of its goals:

  • Display log content elegantly;
  • Display the event frequency of tracking points intuitively;
  • Provide some data analysis capability for specific business modules;

Most of DoggerService is now complete. Here are some renderings of the completed parts:

DoggerMonitor

Both the Dogger logging system and DoggerService, the log and data summary system, are troubleshooting tools used after a problem has been exposed. For developers who want to be more proactive, ensuring quality during development and packaging is a more positive way to solve problems. DoggerMonitor, a development-time monitor, was built to meet this need.

DoggerMonitor is aimed at the development experience of business developers. Once it is integrated into the app, a floating window shows the most frequently watched metrics, such as FPS, CPU, and memory usage, in real time. Tapping the floating window opens detailed monitoring for each metric along with its historical curve. The stack during debugging can also be recorded in real time on the client. This lets QA testers locate a problem when they find it and report it to the R&D team for debugging.
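As an illustration of how an FPS figure like the one in the floating window can be derived, here is a hedged Java sketch. It is not DoggerMonitor's code: `FpsMeter` is a hypothetical class, and it assumes that frame timestamps in nanoseconds are fed in from something like a `Choreographer` frame callback.

```java
/** Hypothetical sketch: derives frames per second from a stream of frame timestamps. */
class FpsMeter {
    private static final long ONE_SECOND_NANOS = 1_000_000_000L;

    private long windowStartNanos = -1;
    private int framesInWindow = 0;
    private double lastFps = 0;

    /** Feed every frame timestamp; returns the most recently published FPS value. */
    double onFrame(long frameTimeNanos) {
        if (windowStartNanos < 0) {
            windowStartNanos = frameTimeNanos; // the first frame only starts the window
            return lastFps;
        }
        framesInWindow++;
        long elapsed = frameTimeNanos - windowStartNanos;
        if (elapsed >= ONE_SECOND_NANOS) {
            // Publish roughly once per second, then start a new window.
            lastFps = framesInWindow * 1e9 / elapsed;
            windowStartNanos = frameTimeNanos;
            framesInWindow = 0;
        }
        return lastFps;
    }
}
```

Publishing once per second keeps the displayed number stable instead of jittering on every frame.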

Based on the needs of the developers and QA engineers in the group, we designed an architecture for DoggerMonitor to meet them:

DoggerMonitor
Dogger-Support

Finally, all the data is stored in the front-end database.

Method timing – Mirror

The Mirror library measures how long a target method takes and records its class and parameter information.

void method() {
    long startTime = System.currentTimeMillis();
    // ... original method body ...
    long endTime = System.currentTimeMillis();
    // Report the elapsed time together with the class, method, and parameter info
    MirrorUtils.mirror(endTime - startTime, className + methodName + params);
}
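The same measurement can be written as a reusable wrapper. The sketch below is an assumption-laden illustration, not the Mirror library's API: `MethodTimer`, `Reporter`, and `timed` are hypothetical names, with `Reporter` standing in for `MirrorUtils.mirror`.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of a timing wrapper. */
class MethodTimer {
    /** Receives the elapsed time and a label such as "Class.method(params)". */
    interface Reporter {
        void report(long elapsedMillis, String label);
    }

    static <T> T timed(String label, Reporter reporter, Supplier<T> body) {
        long start = System.nanoTime(); // monotonic clock, unaffected by wall-clock changes
        try {
            return body.get();
        } finally {
            // Runs on normal return and on exceptions, so every exit path is measured.
            reporter.report((System.nanoTime() - start) / 1_000_000, label);
        }
    }
}
```

A call might look like `MethodTimer.timed("OrderRepo.load(42)", reporter, () -> repo.load(42))`, where `repo` and `reporter` are placeholders. The `try`/`finally` matters because an inline end-of-method measurement is skipped on early returns and thrown exceptions.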

Jank – BlockCanary

For jank detection, the team uses BlockCanary, an open source solution. BlockCanary transparently monitors main-thread operations and outputs useful information that helps developers analyze, locate, and fix problems quickly. The scheme has the following features:

  • Non-intrusive. Two simple lines turn on monitoring, with no need to scatter instrumentation through the code and spoil its elegance.
  • Precise. The output helps locate the problem (down to the line), rather than leaving you to dig through Logcat.

Android has a 16-millisecond rule: at a 60 Hz refresh rate, each frame must be drawn within about 16 ms (1000 ms / 60 ≈ 16.7 ms) for the UI to feel smooth rather than stuck.

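Choreographer.getInstance()
    .postFrameCallback(new Choreographer.FrameCallback() {
        @RequiresApi(api = Build.VERSION_CODES.JELLY_BEAN)
        @Override
        public void doFrame(long frameTimeNanos) {
            // Cancel the pending check and arm a new one TIME_BLOCK from now
            BlockCanaryManager.getInstance().stop(runnable);
            BlockCanaryManager.getInstance().start(runnable, TIME_BLOCK);
            // Frame callbacks are one-shot, so register again for the next frame
            Choreographer.getInstance().postFrameCallback(this);
        }
    });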
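The stop-then-start pattern in the frame callback is a watchdog: every frame re-arms a timer, and the timer only fires if the next frame arrives late. Here is a plain-Java sketch of that idea. It is not BlockCanary's or the Hummingbird team's implementation; `StallWatchdog` and its methods are hypothetical, and a scheduled executor stands in for Android's handler mechanism.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical watchdog: a missed heartbeat means the watched thread has stalled. */
class StallWatchdog {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "stall-watchdog");
                t.setDaemon(true);
                return t;
            });
    private final Thread watched;
    private final long thresholdMillis;
    private final AtomicInteger stalls = new AtomicInteger();
    private ScheduledFuture<?> pending;

    StallWatchdog(Thread watched, long thresholdMillis) {
        this.watched = watched;
        this.thresholdMillis = thresholdMillis;
    }

    /** Call once per frame from the watched thread: cancel the old check, arm a new one. */
    synchronized void heartbeat() {
        if (pending != null) {
            pending.cancel(false);
        }
        pending = scheduler.schedule(this::onStall, thresholdMillis, TimeUnit.MILLISECONDS);
    }

    private void onStall() {
        stalls.incrementAndGet();
        // The watched thread's current stack shows what it is stuck on.
        for (StackTraceElement frame : watched.getStackTrace()) {
            System.err.println("    at " + frame);
        }
    }

    int stallCount() {
        return stalls.get();
    }

    synchronized void stop() {
        if (pending != null) {
            pending.cancel(false);
        }
        scheduler.shutdownNow();
    }
}
```

Capturing the stack from a second thread at the moment the deadline passes is what allows the report to point at the slow line.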

Traffic Statistics – TNet

Phase I of TNet monitors HTTP interfaces, so initially we only hook HTTP requests to record the length of each request and response.

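import java.io.IOException;
import okhttp3.Interceptor;
import okhttp3.Request;
import okhttp3.Response;

public class TNetInterceptor implements Interceptor {
    @Override
    public Response intercept(Chain chain) throws IOException {
        Request request = chain.request();
        Response response = chain.proceed(request);

        // recordHttp is the team's own helper (not shown) that records the lengths
        recordHttp(request);
        recordHttp(response);

        return response;
    }
}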
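The article does not show `recordHttp`, so the following is only a guess at the kind of bookkeeping such a helper could do. `TrafficRecorder` and its methods are hypothetical names. The check for negative lengths reflects the fact that OkHttp reports a content length of -1 when a body's size is unknown.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: thread-safe per-host byte counters for requests and responses. */
class TrafficRecorder {
    private final ConcurrentHashMap<String, LongAdder> sent = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, LongAdder> received = new ConcurrentHashMap<>();

    void recordRequest(String host, long bytes) {
        add(sent, host, bytes);
    }

    void recordResponse(String host, long bytes) {
        add(received, host, bytes);
    }

    long totalBytes(String host) {
        return sum(sent, host) + sum(received, host);
    }

    private static void add(ConcurrentHashMap<String, LongAdder> map, String host, long bytes) {
        if (bytes < 0) {
            return; // unknown length: skip it rather than corrupt the total
        }
        map.computeIfAbsent(host, h -> new LongAdder()).add(bytes);
    }

    private static long sum(ConcurrentHashMap<String, LongAdder> map, String host) {
        LongAdder adder = map.get(host);
        return adder == null ? 0 : adder.sum();
    }
}
```

`LongAdder` is used because interceptors run on many threads at once and the counters are written far more often than they are read.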

For network requests made by third-party libraries, we use an AOP approach: we hook the networkInterceptors method of okhttp3.OkHttpClient and inject our interceptor, so that all HTTP requests are counted.

@TargetClass("okhttp3.OkHttpClient")
@NameRegex("okhttp3/RealCall")
@Proxy("networkInterceptors")
public List<Interceptor> networkInterceptors() {
    // Call the original OkHttpClient.networkInterceptors()
    List<Interceptor> interceptors = (List<Interceptor>) Origin.call();
    // Copy before appending, since the original list may be unmodifiable
    List<Interceptor> newList = new ArrayList<>(interceptors.size() + 1);
    newList.addAll(interceptors);

    newList.add(new TNetInterceptor());

    return newList;
}

In Hummingbird's apps, there is traffic other than HTTP requests. Therefore, in the second phase we plan to collect complete traffic statistics. The querySummary method queries traffic by network type and by start and end time, but this requires the user's permission.

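if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M) {
    // API 23 and above: requires the user's permission.
    // querySummary takes a subscriber ID rather than a UID; filter the returned
    // buckets with bucket.getUid() == uid to get this app's traffic.
    NetworkStats stats = networkStatsManager.querySummary(
            ConnectivityManager.TYPE_MOBILE,
            subscriberId,
            startTime,
            endTime);
} else {
    // Below API 23: cumulative byte counters for the UID since device boot
    long bytes = TrafficStats.getUidRxBytes(uid) + TrafficStats.getUidTxBytes(uid);
}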
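Because the `TrafficStats` counters are cumulative since boot, per-interval usage has to be computed as the difference between two samples. The sketch below shows one way to do that in plain Java. `TrafficSampler` is a hypothetical helper, not part of TNet, and the counter values are passed in so the logic can run without Android.

```java
/** Hypothetical sketch: turns cumulative byte counters into per-interval deltas. */
class TrafficSampler {
    /** Same value as TrafficStats.UNSUPPORTED, returned when a device lacks the counters. */
    static final long UNSUPPORTED = -1;

    private long lastTotal = UNSUPPORTED;

    /**
     * @param rxBytes cumulative received bytes, as returned by getUidRxBytes
     * @param txBytes cumulative sent bytes, as returned by getUidTxBytes
     * @return bytes used since the previous valid sample (0 for the first sample)
     */
    long sample(long rxBytes, long txBytes) {
        if (rxBytes == UNSUPPORTED || txBytes == UNSUPPORTED) {
            return 0; // no data on this device: report nothing and keep the old baseline
        }
        long total = rxBytes + txBytes;
        // The first sample only sets the baseline; a counter that went backwards resets it.
        long delta = (lastTotal == UNSUPPORTED || total < lastTotal) ? 0 : total - lastTotal;
        lastTotal = total;
        return delta;
    }
}
```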

Large image detection – ImageWatcher

During packaging, ImageWatcher checks all images in the project. Based on the resource directory an image lives in, the computeSize formula estimates the size of the Bitmap it produces when loaded on phones of various densities. If the Bitmap size exceeds a threshold during packaging, an error is thrown to alert the developer to the problematic image.

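public int computeSize(int targetDensity) {
    if (density == Image.DEFAULT) {
        // Default density: the size is computed without scaling
        return width * height * 4;
    } else {
        // Scale each dimension to the target density, rounding to the nearest pixel
        int scaledWidth = (int) (width * targetDensity / (float) density + 0.5f);
        int scaledHeight = (int) (height * targetDensity / (float) density + 0.5f);
        return scaledWidth * scaledHeight * 4; // 4 bytes per pixel (ARGB_8888)
    }
}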
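A worked example shows why this check matters. The standalone sketch below restates the formula with explicit parameters. `BitmapSizeEstimator` is a hypothetical name, and the 4 bytes per pixel assumes the ARGB_8888 configuration.

```java
/** Hypothetical standalone version of the size formula, for a worked example. */
class BitmapSizeEstimator {
    static final int BYTES_PER_PIXEL = 4; // assumes ARGB_8888

    static long estimateBytes(int width, int height, int sourceDensity, int targetDensity) {
        float scale = (float) targetDensity / sourceDensity;
        // Round each scaled dimension to the nearest whole pixel.
        long scaledWidth = (long) (width * scale + 0.5f);
        long scaledHeight = (long) (height * scale + 0.5f);
        return scaledWidth * scaledHeight * BYTES_PER_PIXEL;
    }

    public static void main(String[] args) {
        // A 1000x1000 image in an mdpi (160 dpi) directory, loaded on a 480 dpi screen,
        // is scaled 3x in each dimension: 3000 * 3000 * 4 = 36,000,000 bytes.
        System.out.println(estimateBytes(1000, 1000, 160, 480));
        // The same image in a 480 dpi directory is not scaled: 4,000,000 bytes.
        System.out.println(estimateBytes(1000, 1000, 480, 480));
    }
}
```

The same file costs nine times more memory when it sits in the wrong density directory, which is why the check takes the directory into account rather than the file size alone.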

In addition, by hooking the Bitmap creation methods of BitmapFactory, we log each Bitmap's information and the memory it occupies at run time. If the value exceeds the threshold, an exception is thrown.

Conclusion

Building the Hummingbird team's mobile exception monitoring system mainly addresses the following three pain points in the business development process:

  1. Report online exceptions in a timely manner;
  2. Improve the efficiency of problem tracing;
  3. Ensure the quality of development releases;

The team plans to open-source the entire system, not just Dogger. The whole monitoring system is still under construction, and we hope developers will file issues and pull requests to help build a better monitoring platform together. In addition, the vision for DoggerService is to work with QA staff to complete an automated testing process and achieve a true BDD (behavior-driven development) model. DoggerService also needs stronger server-side support, which the Hummingbird team will improve in the future.