This article is reprinted from the public account "58 Architect" with the author's permission.

As the largest classified-information and local life service platform in China, 58 Group invests considerable manpower in every product to analyze user behavior and improve operational efficiency. The analysis needs, however, are largely the same across products. Against this background, we built WMDA, an in-house codeless (no manual tracking) user behavior analysis platform that supports PC, mobile web (M), and native apps (APP), to help each business line better understand how users actually behave.

A business team using the SDK does not need to add tracking points manually: a few lines of code enable full data collection. For the mobile SDK, the accuracy, timeliness, and completeness of the collected data directly determine the quality of the subsequent user behavior analysis. This article describes codeless data collection on Android in detail, from technology selection to implementation.

1. Technology Selection

Technology serves requirements. WMDA is positioned as a codeless tool for analyzing user behavior, and it also aims to solve the pain points of manual tracking, which is hard to maintain and prone to wrong or missing tracking points. Therefore, while collecting user behavior data, the SDK must meet high standards for development efficiency, collection performance, accuracy, and timeliness, and it must support data traceability.

Based on a survey of existing tracking technologies on the market, current solutions fall roughly into three categories:

  1. Traditional code-based tracking

    Implementation: tracking code is added manually during the coding phase.

    Representative solutions: Umeng, Baidu Statistics.

    Advantages: flexible, accurate, and customizable.

    Disadvantages: the number of business tracking points is very large, development cost is high, and maintenance is difficult. Adding or modifying a tracking point requires releasing a new app version.

  2. Dynamic tracking

    Implementation: an AccessibilityDelegate is set as a proxy on each View instance to listen for control click events.

    Representative solution: the open-source Mixpanel SDK on GitHub.

    Advantages: no manual tracking is needed; configuration can be delivered dynamically through visual circling (selecting elements on screen) to listen to specified controls.

    Disadvantages: it does not support data traceability (analyzing historical data retroactively), cannot collect Fragment page data, and only supports API 14 and above. This listening approach also seriously hurts app performance: every control must be bound dynamically, and the view tree must be traversed again whenever the UI changes, which is inefficient.

  3. Compile-time bytecode instrumentation

    Implementation: a Gradle plugin inserts tracking code into the bytecode at compile time to collect data.

    Representative solutions: GrowingIO, and Meituan's scheme of replacing UI controls.

    Advantages: high development efficiency, no manual tracking, code inserted at compile time for high runtime performance, and support for data traceability.

    Disadvantages: tracking points are less flexible.

As this brief analysis shows, each of the three schemes has clear strengths and weaknesses. We ultimately chose a solution that relies mainly on a Gradle plugin to inject tracking code automatically, supplemented by manual tracking to fill in custom data.

Note: according to GrowingIO's official documentation, GrowingIO now also supports manual tracking points.

2. Technical Implementation

The overall architecture of the WMDA Android SDK consists of three parts: the circling module, event collection and reporting, and configuration management, as shown in the following figure.

The following sections follow the event collection and reporting flow: how events are collected, processed, stored, and reported, and how circling works.

2.1 Event Collection

Data collected by WMDA on mobile falls into three types: page view events, control click events, and custom events. As a codeless solution, the SDK's core job is collecting events without any tracking code from the developer. Each of the three event types is collected by a different technique, after which all events are processed in a unified way, then stored and reported.

2.1.1 Instrumentation Entry Point

Event collection is the core of codeless tracking. WMDA intercepts Fragment and control click events with its self-developed WMDA Plugin, which uses ASM at compile time to inject code through bytecode instrumentation.

To intercept events, we first have to decide when to instrument and which bytecode files to modify. We use the Transform API as the instrumentation entry point: it modifies the class files after the Java compiler produces them and before they are packaged into dex files. Because the Transform API was introduced in Android Gradle plugin 1.5.0, the project must use plugin version 1.5.0 or higher:

classpath 'com.android.tools.build:gradle:1.5.0'

We then traverse and modify the bytecode files in transform. Because many large projects are now split into multiple library modules, we must scan the bytecode of third-party libraries (jar inputs) in addition to the application's own source code (directory inputs):

void transform(Context context, Collection<TransformInput> inputs,
               Collection<TransformInput> referencedInputs,
               TransformOutputProvider outputProvider,
               boolean isIncremental) throws IOException, TransformException, InterruptedException {
    inputs.each { TransformInput input ->
        input.directoryInputs.each { DirectoryInput directoryInput ->
            // Classes compiled from the project's own source code
        }
        input.jarInputs.each { JarInput jarInput ->
            // Classes from third-party libraries
        }
    }
}

We also introduced an injection whitelist. If the app does not want classes under certain packages to be instrumented (a third-party SDK, for example), create a wmda-rules.properties file in the project's root directory and list the class paths that should not be injected. Multiple entries are allowed, one per line, each ending with a slash (/):

com/wuba/sdk/
com/wuba/test/
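
As a rough illustration of how such a rule file could be applied, the following Java sketch parses the file content and tests a class's internal (slash-separated) name against each prefix. The names InjectionWhitelist, parse, and shouldInject are hypothetical and not part of the WMDA plugin; the article does not describe the plugin's actual matching logic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of the wmda-rules.properties whitelist: one slash-terminated
 * class-path prefix per line; matching classes are skipped by the plugin.
 */
final class InjectionWhitelist {
    private final List<String> excludedPrefixes;

    private InjectionWhitelist(List<String> excludedPrefixes) {
        this.excludedPrefixes = Collections.unmodifiableList(excludedPrefixes);
    }

    /** Parses the file content; blank lines are ignored. */
    static InjectionWhitelist parse(String fileContent) {
        List<String> prefixes = new ArrayList<>();
        for (String line : fileContent.split("\\R")) {
            String prefix = line.trim();
            if (!prefix.isEmpty()) {
                prefixes.add(prefix);
            }
        }
        return new InjectionWhitelist(prefixes);
    }

    /** @param internalName slash-separated class name as in bytecode, e.g. "com/wuba/sdk/Tracker" */
    boolean shouldInject(String internalName) {
        for (String prefix : excludedPrefixes) {
            if (internalName.startsWith(prefix)) {
                return false; // class lives under an excluded package
            }
        }
        return true;
    }
}
```

The trailing slash matters for plain prefix matching: com/wuba/sdk/ excludes com/wuba/sdk/Tracker but not com/wuba/sdkext/Foo. In a plugin, the file would typically be read once per build, before the transform visits any classes.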

2.1.2 Page Events

WMDA collects page view events in two different ways.

2.1.2.1 Activity Collection

For Activities, WMDA uses Application.ActivityLifecycleCallbacks to listen for pages opening and closing. When a page starts, its onResume lifecycle callback is intercepted; the event processing module then formats it into an event structure and stores it for reporting.

@Override
public void onActivityResumed(Activity activity) {
    // Collect the page view event
    PageManager.getInstance().onActivityResumed(activity);
}

However, this approach has a compatibility problem: below Android 4.0 (API 14), the system does not provide this callback.

Note: currently, our apps support Android 4.0 and above.

On earlier versions, we can intercept with a hook instead: replacing the main thread's Instrumentation instance lets us monitor pages on older versions. We also have to account for third-party plugins that hook the same instance, so the corresponding method of the pre-hook instance is always invoked, ensuring that other plugins in the app are not affected. The downside is that if another SDK uses the same technique, it may break our interception.

@Override
public void callActivityOnResume(Activity activity) {
    // Collect and process the page view event
    PageManager.getInstance().onActivityResumed(activity);
    // Call onResume on the Instrumentation instance that existed before the hook
    oldInstrumentation.callActivityOnResume(activity);
}
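
The essential property of this hook is that it decorates, rather than replaces, the previous instance. Because android.app.Instrumentation is not available outside Android, the sketch below uses a hypothetical stand-in class (FakeInstrumentation) containing only the intercepted method; installing the replacement on a device (typically via reflection on the main thread's ActivityThread) is not shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for android.app.Instrumentation with only the intercepted method. */
class FakeInstrumentation {
    void callActivityOnResume(Object activity) {
        // The real Instrumentation would call activity.onResume() here.
    }
}

/** Hypothetical hook: record the page view, then delegate to the replaced instance. */
final class TrackingInstrumentation extends FakeInstrumentation {
    private final FakeInstrumentation oldInstrumentation;
    final List<String> resumedPages = new ArrayList<>();

    TrackingInstrumentation(FakeInstrumentation oldInstrumentation) {
        this.oldInstrumentation = oldInstrumentation;
    }

    @Override
    void callActivityOnResume(Object activity) {
        // 1. Collect the page view event (WMDA hands this to PageManager)
        resumedPages.add(activity.getClass().getName());
        // 2. Always invoke the instance that was installed before the hook
        oldInstrumentation.callActivityOnResume(activity);
    }
}
```

If another plugin installed its own Instrumentation first, it becomes oldInstrumentation and still receives every callback. The reverse case, a later SDK replacing WMDA's instance without delegating, is the risk described above and cannot be prevented from this side.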

2.1.2.2 Fragment Collection

For Fragments, the Android system provides no lifecycle callbacks, so WMDA uses its Gradle plugin and the ASM library to manipulate bytecode at compile time, injecting WmdaAgent's page collection methods into Fragments to collect events. As for the injection strategy, we only need to inject the collection code into pages whose Fragment parent class is one of the following two:

android/app/Fragment
android/support/v4/app/Fragment

Example of the code that injects into Fragment methods:

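import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

// Inside the plugin's ClassVisitor subclass (cv is the next visitor in the chain)
@Override
public MethodVisitor visitMethod(int access, String name, String desc, String signature, String[] exceptions) {
    MethodVisitor mv = cv.visitMethod(access, name, desc, signature, exceptions);
    // Insert WmdaAgent.onFragmentResumed(this) at the start of onResume()
    if (mv != null && "onResume".equals(name) && "()V".equals(desc)) {
        return new MethodVisitor(Opcodes.ASM5, mv) {
            @Override
            public void visitCode() {
                super.visitCode();
                super.visitVarInsn(Opcodes.ALOAD, 0); // push "this" (the Fragment)
                // A support-library Fragment would need a matching
                // (Landroid/support/v4/app/Fragment;)V overload in WmdaAgent.
                super.visitMethodInsn(Opcodes.INVOKESTATIC, "com/wuba/wmda/autobury/WmdaAgent",
                        "onFragmentResumed", "(Landroid/app/Fragment;)V", false);
            }
        };
    }
    return mv;
}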

WmdaAgent code:

public static void onFragmentResumed(Fragment fragment) {
    // Collect and process the page view event
    PageManager.getInstance().onFragmentResumed(fragment);
}
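
The visitMethod example above shows only the bytecode being emitted; the plugin also has to decide which classes qualify. A minimal sketch of that decision, assuming the rule is "the direct superclass is one of the two Fragment classes and the method is onResume()V", could look like this. FragmentInjectionRule is a hypothetical name; superName stands for the value ASM passes to ClassVisitor.visit.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the plugin's selection logic, without the ASM plumbing. */
final class FragmentInjectionRule {
    // Internal names of the two Fragment base classes listed above
    private static final Set<String> FRAGMENT_SUPERCLASSES = new HashSet<>(Arrays.asList(
            "android/app/Fragment",
            "android/support/v4/app/Fragment"));

    /**
     * @param superName  internal name of the class's direct superclass
     * @param methodName method name passed to visitMethod
     * @param desc       method descriptor passed to visitMethod
     */
    static boolean shouldInjectResume(String superName, String methodName, String desc) {
        return FRAGMENT_SUPERCLASSES.contains(superName)
                && "onResume".equals(methodName)
                && "()V".equals(desc);
    }
}
```

Checking only the direct superclass is a simplification: a Fragment that extends an app-level BaseFragment would be missed unless the plugin resolves the full class hierarchy, and a Fragment that does not override onResume has no method body to instrument.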

2.1.3 Control Click Events

For click event collection, WMDA adopted the open-source Mixpanel approach early in development. As mentioned above, this approach offers good development efficiency, but it carries bottlenecks and risks: performance problems, no support for Fragment pages, and version compatibility issues.

As we continued exploring, we found that compile-time instrumentation with a Gradle plugin keeps the benefits of the Mixpanel approach while avoiding its performance, data accuracy, and version compatibility issues. We therefore changed how control click events are collected, moving from dynamically setting a proxy on each View to inserting tracking code at compile time.

WMDA supports click event interception for common third-party frameworks, such as:

ButterKnife, DataBinding, AndroidAnnotations, and RxBinding

The technique is the same as for Fragment instrumentation: code is injected into onClick methods at compile time, and events are intercepted and handled in an AOP fashion. The core idea is as follows.

Plugin code:

import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

// Inside the plugin's ClassVisitor subclass (cv is the next visitor in the chain)
@Override
public MethodVisitor visitMethod(int access, String name, String desc, String signature, String[] exceptions) {
    MethodVisitor mv = cv.visitMethod(access, name, desc, signature, exceptions);
    // Find methods named onClick that take a single View argument and
    // insert WmdaAgent.onViewClick(view) at the start
    if (mv != null && "onClick".equals(name) && "(Landroid/view/View;)V".equals(desc)) {
        return new MethodVisitor(Opcodes.ASM5, mv) {
            @Override
            public void visitCode() {
                super.visitCode();
                super.visitVarInsn(Opcodes.ALOAD, 1); // slot 1 = the View argument (slot 0 is "this")
                super.visitMethodInsn(Opcodes.INVOKESTATIC, "com/wuba/wmda/autobury/WmdaAgent",
                        "onViewClick", "(Landroid/view/View;)V", false);
            }
        };
    }
    return mv;
}

Corresponding WmdaAgent code:

public static void onViewClick(View view) {
    // Collect and process the control click event
    AutoEventManager.getInstance().onEvent(view);
}
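
At the source level, the effect of the injected bytecode is that every matching onClick(View) body begins with a call to WmdaAgent.onViewClick. The sketch below models that result with hypothetical stand-in types (FakeView, FakeOnClickListener, FakeWmdaAgent) so it runs without the Android SDK; it illustrates the AOP idea and is not WMDA code.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for android.view.View. */
final class FakeView {
    final String name;

    FakeView(String name) {
        this.name = name;
    }
}

/** Stand-in for View.OnClickListener. */
interface FakeOnClickListener {
    void onClick(FakeView v);
}

/** Hypothetical stand-in for WmdaAgent: records which views were clicked. */
final class FakeWmdaAgent {
    static final List<String> clicked = new ArrayList<>();

    static void onViewClick(FakeView view) {
        clicked.add(view.name);
    }
}

/** A business listener as it behaves after instrumentation. */
final class LoginButtonListener implements FakeOnClickListener {
    int handled;

    @Override
    public void onClick(FakeView v) {
        FakeWmdaAgent.onViewClick(v); // inserted by the plugin at method entry
        handled++;                    // original business logic follows unchanged
    }
}
```

Because the call is placed before the original statements, the click is recorded even if the business logic later returns early or throws.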

2.1.4 Custom Events

Codeless tracking is WMDA's core feature, but because of the particularities of some business scenarios, it cannot satisfy every requirement. WMDA therefore also supports manual tracking on mobile, which makes it more flexible in practice and its statistics more comprehensive.

This part involves no key technical points, only ordinary code-based tracking, so it is not covered further here.

2.2 Event Processing

Once collected, an event is sent to the event processing thread, which processes the raw event so that the server can analyze it more easily. For page events, we use the page's fully qualified class name as its feature value (APP_PAGE). APP_PAGE example:

com.wuba.wmdademo.MainActivity

After a page event is collected, it is handed to a worker thread, which extracts the page ID and custom page attributes that the business developer set in the page's onCreate method, formats this data uniformly into a page view event, and passes the event to the storage module.
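
A page view event of this kind could be modeled as a small immutable value object. In this hypothetical sketch (the field names are illustrative, not WMDA's actual event schema), APP_PAGE is derived from the page's fully qualified class name, and the custom attributes are copied defensively because the event crosses from the collecting thread to a worker thread.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical page view event; field names are illustrative, not WMDA's schema. */
final class PageViewEvent {
    final String appPage;                 // fully qualified class name of the page
    final String pageId;                  // page ID set by the business developer (may be null)
    final Map<String, String> attributes; // custom page attributes, read-only copy
    final long timestampMillis;

    private PageViewEvent(String appPage, String pageId, Map<String, String> attributes, long timestampMillis) {
        this.appPage = appPage;
        this.pageId = pageId;
        this.attributes = attributes;
        this.timestampMillis = timestampMillis;
    }

    /** Builds the event from the page object (an Activity or Fragment on Android). */
    static PageViewEvent of(Object page, String pageId, Map<String, String> customAttributes, long timestampMillis) {
        Map<String, String> copy = Collections.unmodifiableMap(new LinkedHashMap<>(customAttributes));
        return new PageViewEvent(page.getClass().getName(), pageId, copy, timestampMillis);
    }
}
```

Copying the attributes means that a later change to the business code's map cannot alter an event that is already queued for processing.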

For control events, the biggest problem we faced was how to distinguish controls, that is, how to define a control's feature value. Following Mixpanel, we collect the class name and index of the View itself and, walking up level by level, the class name and index of every parent View, then assemble all of this information into the View's unique feature value on the current page. The unique control identifier is: APP_PAGE + ViewPath + index.

ViewPath example:

/MainWindow/LinearLayout[0]/FrameLayout[1]/ActionBarOverlayLayout[0]#decor_content_parent/ContentFrameLayout[0]/LinearLayout[0]/ScrollView[0]/LinearLayout[0]/AppCompatButton
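
The path construction can be sketched as a walk from the clicked view up to the window root. The Java below uses a hypothetical ViewNode stand-in instead of android.view.View and makes assumptions the article does not spell out: the root is rendered as MainWindow, each ancestor segment is its simple class name plus its position among its parent's children, a resource entry name is appended as #name, and the clicked view's own index is kept out of the path and appended separately.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for a View in the hierarchy. */
final class ViewNode {
    final String className; // simple class name, e.g. "FrameLayout"
    final String idName;    // resource entry name, e.g. "decor_content_parent", or null
    final List<ViewNode> children = new ArrayList<>();
    ViewNode parent;

    ViewNode(String className, String idName) {
        this.className = className;
        this.idName = idName;
    }

    /** Attaches a child and returns it, to keep tree construction concise. */
    ViewNode add(ViewNode child) {
        child.parent = this;
        children.add(child);
        return child;
    }
}

final class ViewPathBuilder {
    static String viewPath(ViewNode view) {
        Deque<String> segments = new ArrayDeque<>();
        segments.push(view.className + idSuffix(view)); // target view: its index is kept separately
        for (ViewNode n = view.parent; n != null; n = n.parent) {
            if (n.parent == null) {
                segments.push("MainWindow"); // the window's root view
            } else {
                segments.push(n.className + "[" + indexInParent(n) + "]" + idSuffix(n));
            }
        }
        return "/" + String.join("/", segments); // deque head is the root
    }

    /** APP_PAGE + ViewPath + index; the "[index]" formatting is an assumption. */
    static String identifier(String appPage, ViewNode view) {
        return appPage + viewPath(view) + "[" + indexInParent(view) + "]";
    }

    private static int indexInParent(ViewNode n) {
        return n.parent == null ? 0 : n.parent.children.indexOf(n);
    }

    private static String idSuffix(ViewNode n) {
        return n.idName == null ? "" : "#" + n.idName;
    }
}
```

Two sibling buttons share the same ViewPath and differ only in the trailing index, which is why the identifier combines APP_PAGE, ViewPath, and index.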

We further optimized the UI performance of event processing. Because events are intercepted through bytecode instrumentation, the most time-consuming step in processing is generating the View feature value. On Android, however, letting a worker thread hold and manipulate Views can cause memory leaks.

WMDA therefore splits feature value construction in two. On the UI thread, we only extract View data, essentially a targeted copy of the relevant part of the View tree, and do nothing else time-consuming. The copied ViewData is then passed to a worker thread, which builds the feature value and assembles the formatted click event. Finally, the event is passed to the storage module.

The sequence diagram for click event processing is as follows:
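
The split between the UI thread and the worker thread can be sketched with plain java.util.concurrent types. In this hypothetical example, LiveView stands in for android.view.View, snapshot() is the cheap copy made on the UI thread, and the worker only ever sees immutable ViewData objects, so no View reference leaves the UI thread. The path format is simplified and is not the exact ViewPath format.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Stand-in for android.view.View: must only be read on the UI thread. */
final class LiveView {
    final String className;
    final LiveView parent;
    final int indexInParent;

    LiveView(String className, LiveView parent, int indexInParent) {
        this.className = className;
        this.parent = parent;
        this.indexInParent = indexInParent;
    }
}

/** Immutable copy of one level of the hierarchy; holds no View reference. */
final class ViewData {
    final String className;
    final int index;

    ViewData(String className, int index) {
        this.className = className;
        this.index = index;
    }
}

final class ClickPipeline {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** UI thread: cheap, directed copy from the clicked view up to the root. */
    static List<ViewData> snapshot(LiveView clicked) {
        List<ViewData> copy = new ArrayList<>();
        for (LiveView v = clicked; v != null; v = v.parent) {
            copy.add(new ViewData(v.className, v.indexInParent));
        }
        Collections.reverse(copy); // root first
        return Collections.unmodifiableList(copy);
    }

    /** Worker thread: the comparatively expensive feature-value assembly. */
    CompletableFuture<String> buildFeatureValue(List<ViewData> data) {
        return CompletableFuture.supplyAsync(() -> {
            StringBuilder sb = new StringBuilder();
            for (ViewData d : data) {
                sb.append('/').append(d.className).append('[').append(d.index).append(']');
            }
            return sb.toString();
        }, worker);
    }

    void shutdown() {
        worker.shutdown();
    }
}
```

On Android, the snapshot step would run inside the click callback on the main thread, and the executor plays the role of the event processing thread.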

2.3 Event Storage

Once processing is complete, the event is passed to the storage module for local persistence. Before storing, the SDK checks the storage policy and stores data only once the policy conditions are met.

For storage, the native SQLite database holds the serialized binary of each Protobuf instance, which is encrypted with AES-256.
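
The article specifies AES-256 but not the cipher mode or key management. The sketch below assumes AES-256 in GCM mode with a random 12-byte IV stored in front of the ciphertext, applied to the bytes a Protobuf message would serialize to (a plain byte array stands in for the message). EventCipher is a hypothetical name; on Android, the key would normally come from secure storage such as the Android Keystore rather than being generated in process.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical AES-256-GCM wrapper for serialized event bytes. */
final class EventCipher {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;
    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    EventCipher(SecretKey key) {
        this.key = key;
    }

    static SecretKey newAes256Key() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(256);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns IV || ciphertext || tag, ready to store in a BLOB column. */
    byte[] encrypt(byte[] serializedEvent) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv); // fresh IV for every row
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(serializedEvent);
            return ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    byte[] decrypt(byte[] stored) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

GCM also authenticates the data, so a tampered row fails to decrypt instead of yielding corrupted events.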

2.4 Event Reporting

Once an event is stored, a reporting request is triggered. Before reporting, WMDA checks the reporting policy and reports only once the policy conditions are met.

To keep the WMDA package small, network operations use only HttpURLConnection. Data is reported as GZIP-compressed Protobuf to reduce traffic consumption, which helps ensure data gets delivered and improves the user experience.
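
A minimal sketch of the reporting step using only JDK classes, assuming the batch has already been serialized to Protobuf bytes. EventUploader is a hypothetical name, the Content-Type value is an assumption, and the server must be prepared to accept a gzip-encoded request body, since HttpURLConnection does not compress request bodies on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical uploader: GZIP the serialized batch and POST it. */
final class EventUploader {
    static byte[] gzip(byte[] payload) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // read only after the GZIP stream is closed
    }

    static byte[] gunzip(byte[] compressed) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** POSTs an already-serialized batch; returns the HTTP status code. */
    static int send(URL endpoint, byte[] serializedBatch) throws IOException {
        byte[] body = gzip(serializedBatch);
        HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setFixedLengthStreamingMode(body.length);
            conn.setRequestProperty("Content-Type", "application/x-protobuf"); // assumed media type
            conn.setRequestProperty("Content-Encoding", "gzip");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Event batches tend to be highly repetitive (the same page names, view paths, and attribute keys), which is what makes GZIP worthwhile on top of the already compact Protobuf encoding.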

2.5 Circling Module

So far, only the data collection scheme has been described. Collected and reported data is not analyzed directly: metrics first have to be defined through circling (selecting elements on screen). For an introduction to circling, see the circling section of "Data-driven growth: 58 Practical Paths to Unburied User Behavior Analysis", which is not repeated here.

When a page is circled for a long time, there is usually no need to keep sending snapshots of the current page to the server, because the page has not changed. We optimized this step accordingly: the SDK generates a fingerprint from the current screen snapshot and sends page snapshot data to the user analysis platform only when the screen has changed.
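
The article does not say how the fingerprint is computed. One simple option, assumed here, is a SHA-256 digest of the serialized snapshot, compared with the digest of the last snapshot actually sent. SnapshotDeduplicator is a hypothetical name.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical sketch: send a circling snapshot only when its fingerprint changes. */
final class SnapshotDeduplicator {
    private byte[] lastFingerprint; // null until the first snapshot is sent

    /** Returns true if this snapshot differs from the last one sent and should be uploaded. */
    boolean shouldSend(byte[] snapshot) {
        byte[] fingerprint = sha256(snapshot);
        if (Arrays.equals(fingerprint, lastFingerprint)) {
            return false; // screen unchanged, skip the upload
        }
        lastFingerprint = fingerprint;
        return true;
    }

    private static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must provide SHA-256
        }
    }
}
```

A cryptographic hash only detects byte-level changes, so a snapshot that differs by a timestamp or rendering noise would still be sent; hashing a normalized description of the view tree instead would be more tolerant.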

2.6 Other technical points

2.6.1 Multi-process Data Collection

Child processes contain only the event collection and event processing modules. To keep events continuous, data storage and reporting are handled centrally in the main process. This avoids database synchronization problems, improves data accuracy, and improves system performance.

Because event collection is trigger-based, inter-process communication uses in-app broadcasts. Broadcasts offer low coupling, little impact on child processes, and acceptable performance. Since the application scenario should drive the technology choice, we did not use Sockets or AIDL for this inter-process communication.

2.6.2 Multi-process Circling

Because circling is a real-time, continuous process, the SDK uses a Socket for inter-process communication here: every child process sends its page snapshot information to the main process, and the main process interacts with the server.

3. Existing Problems

Of course, the bytecode instrumentation approach we currently use for codeless tracking still has some shortcomings that need to be explored and solved in the future:

  • When android:onClick="xxxMethod" is used in a layout, the click cannot be captured, because WMDA intercepts click events through the onClick method of OnClickListener.

  • The popular React Native (RN) framework is not supported, because tracking code is currently inserted at compile time.

  • For the same reason (compile-time insertion), support for hot updates may also be inadequate.

4. Summary

This article introduced 58's practice of codeless data collection on Android, including the use of bytecode instrumentation for codeless tracking and the processing of collected events. Some problems remain to be explored and solved at this stage, and we welcome interested readers to get in touch with us.