Author: LXJ, member of the ordering terminal team

Preface

Business teams will find this scenario familiar:

  • Buried point requirements: Whenever a new feature is added, the data product team adds buried point (event tracking) requirements along with it. Once the buried points have been online for a while, the data and business product teams analyze the conversion data and adjust the business requirements accordingly;
  • Buried point verification: When the conversion rate changes significantly on a given day and falls short of expectations, the data or business product team asks the developers to confirm whether a buried point was placed incorrectly;
  • Online troubleshooting: A problem is reported from production, but the developers cannot reproduce it and need data to analyze the case further. Without data, the problem is likely to become a “cold case”, the team faces more questioning, and business partners’ confidence in the system’s stability drops sharply.

Data therefore matters a great deal. This article discusses data collection in a WeChat mini program in detail, covering the importance of data collection, the classification of data, the collection methods and the buried point scheme.

I. The importance of data collection

This article focuses on data collection and does not discuss how data is used in detail. It first summarizes the role data plays in performance optimization, business growth and online troubleshooting, which is why buried points are needed.

The role of data in online troubleshooting:

  • Recorded user behavior data reconstructs what happened on the user's side, which helps analyze and locate problems more efficiently
  • Provides strong evidence for problem analysis

The role of data in performance optimization:

  • Helps identify and monitor the key success indicators of an online business
  • Helps identify and prioritize the areas most in need of optimization
  • Provides information that helps identify challenges and improve decisions
  • Supports better analysis for application testing and optimization

The role of data in business growth:

  • Helps measure marketing effectiveness
  • Helps evaluate the effect of activation and conversion strategies
  • Supports retention and activity analysis
  • Supports analysis of product revenue and monetization

II. Classification of collected data

The previous section summarized why data matters. Different business projects place different emphasis on it, so which data should be collected?

First, closed-loop data includes:

  1. User behavior
  2. Customer (user) information and CRM (customer relationship management) data
  3. Transaction data and server log data

Only these three together form a complete closed loop of data flow. Data can of course be divided further for specific business scenarios, but the key categories rarely go beyond these three. For front-end data collection, the first two items are mainly reported by the client, while the third is mainly recorded by the server with the client assisting, because a transaction only completes the loop once the request has reached the server and been processed.

User behavior data covers five elements: when, where, who, how and what, similar to the five elements of a news story.

Some user information is sensitive or private and requires authorization, so the exact dimensions of user information depend on the business scenario. The most basic requirement is to identify each user uniquely.

CRM and transaction data are similar to user information in that the required details are determined by the business scenario. The basic CRM requirements are login information and membership information. Transaction data includes the transaction time, object, content, amount and status.
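As an illustration of the five elements of user behavior data, here is a minimal sketch of what a single behavior record might look like. The function name createBehaviorEvent and all field names are assumptions made for this example; the article does not specify the SDK's actual data structure.

```javascript
// Hypothetical shape of a single behavior record. All field names are
// illustrative assumptions, not the SDK's actual schema.
function createBehaviorEvent({ userId, page, action, target, detail = {} }, now = Date.now()) {
  if (!userId) {
    // The most basic requirement: every record must identify the user.
    throw new Error('userId is required to uniquely identify the user');
  }
  return {
    when: now,     // timestamp in milliseconds
    where: page,   // page route where the behavior occurred
    who: userId,   // unique user identifier
    how: action,   // interaction type, for example "tap"
    what: target,  // the element or method that was acted on
    detail,        // extra business-specific fields
  };
}

const event = createBehaviorEvent(
  { userId: 'u-1001', page: 'pages/menu/index', action: 'tap', target: 'addToCart' },
  1700000000000
);
console.log(event.who, event.how, event.what);
```

Rejecting records without a user identifier reflects the requirement that users be uniquely identifiable; which further user fields are attached would depend on the business scenario and on authorization.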

III. Data reporting methods

With the data defined, the next step is to work out how to obtain the data we actually need. Data reporting methods fall broadly into three categories:

  1. Code buried points: the interface for uploading buried point data is called manually at each node that needs tracking. Most third-party analytics providers, such as Umeng and Baidu Statistics, adopt this scheme;

  2. Visual buried points: the nodes to collect are configured through a visualization tool, and the front end parses the configuration and reports the data automatically, achieving so-called “traceless buried points”. The open-source Mixpanel is a representative example;

  3. “No buried point”: this does not mean that no tracking takes place. Instead, the front end automatically collects all events and reports them, and the back end filters out the useful data during computation. GrowingIO, from China, is a representative example.

This article focuses on the no-buried-point approach. Visual buried points can be regarded as a derivative of it and are not discussed here; the comparison below is between code buried points and no buried points.
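To make the first category concrete, the following sketch shows a code buried point placed by hand inside a business handler. The track function and the event name submit_order are hypothetical and do not belong to any particular vendor's API; here track only collects records in memory.

```javascript
// Hypothetical reporter: collects records in memory instead of uploading them.
const reported = [];
function track(name, payload) {
  reported.push({ name, payload, time: Date.now() });
}

// Business handler with a code buried point mixed into the business logic.
function onSubmitOrder(order) {
  const total = order.items.reduce((sum, item) => sum + item.price * item.count, 0);
  // Manual buried point: the developer decides what to report and where.
  track('submit_order', { itemCount: order.items.length, total });
  return total;
}

const total = onSubmitOrder({ items: [{ price: 12, count: 2 }, { price: 5, count: 1 }] });
console.log(total, reported.length);
```

Note how the tracking call sits in the middle of the business function: any change to the data requirement means editing and re-releasing this handler.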

3.1 Disadvantages of code buried points (or Capture mode)

For data products:

  1. Reliance on experience and intuition. Business-related buried points depend on the subjective judgment of data or business product staff, and technology-related buried points depend on the subjective judgment of technical staff.
  2. High communication cost. Data product staff decide what data is needed, then have to raise the requirement and discuss it with developers. Because they are usually not very familiar with the technology, they also need to confirm with developers whether the information can actually be reported.
  3. Data cleaning cost. As the business changes, the data chosen by subjective judgment may change too. Data from previously placed buried points then has to be cleaned manually, which is a considerable amount of work.

For development:

  1. Developer burnout. Developers involved in business projects often complain about buried point work. They cannot focus on technology alone and have to spend attention on highly repetitive, mechanical tasks such as adding buried points.
  2. Intrusive code. Buried point code is highly intrusive and harms system design and code maintainability. Most business-related buried points have to be added by hand, and the code is coupled with the business code. Even though no-buried-point SDKs exist in the industry, the business-specific points that data products care about still cannot avoid manual work. As data requirements shift with the business, the buried point code has to change as well, further raising development and maintenance costs.
  3. Error-prone and easy to miss. Because manual placement depends on each person's judgment, the accuracy of buried point locations is hard to control and data is easily missed.
  4. Process cost. When data is missing or collected incorrectly, the change has to go through the development and release process again, which is inefficient.

3.2 Advantages of no buried points

Compared with manual buried points, the advantages of no buried points are clear:

  1. Higher efficiency
  2. More comprehensive data that can be extracted on demand
  3. Less code intrusion

IV. No-buried-point SDK scheme for WeChat mini programs

4.1 Data requirements for no-buried-point collection

  • Mini program initialization and execution reporting
  • Interface request reporting
  • Error reporting
  • User behavior reporting



Unlike Web services, mini programs do not load JS/CSS resources, so more attention goes to recording and capturing the initialization status and execution status of the mini program. In the figure, the resource integrity check corresponds to this initialization completion check. Request domain names in an online mini program must use HTTPS, so the probability of DNS hijacking is greatly reduced, and monitoring DNS hijacking from the client is hardly feasible (a paradox exists). DNS hijacking is therefore not considered for now.

4.2 Difficulties and key points of developing a no-buried-point SDK for WeChat mini programs

  • The request module is encapsulated by WeChat, and the mini program runtime has no browser objects, so it cannot be rewritten and wrapped as easily as in a Web application.
  • Monitoring must be compatible with three runtime environments:

    • On Android, the JS runtime is the X5 kernel
    • On iOS, the JS runtime is JavaScriptCore
    • In the developer tools, the JS runtime is NW.js (Chrome kernel)


  • User behavior cannot be monitored directly
  • The SDK needs to be highly extensible so that it fits mini programs with different architectures
  • Each mini program package is limited to 2 MB, and mini programs do not support importing NPM packages in code, so the SDK itself counts toward the 2 MB limit. Subpackage loading is in closed beta but not yet fully open, and in any case an overly large SDK is unreasonable.
  • A large amount of data is collected, so performance overhead must be minimized
  • No impact on the business (a basic requirement)
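Since user behavior cannot be monitored directly, one possible approach is to wrap the methods of a page definition so that every method call is reported automatically. The sketch below is an assumption about how this could be done and is not the SDK's actual implementation; wrapPageOptions and the report callback are hypothetical names. It runs stand-alone with a fake page definition.

```javascript
// Wrap every function in a page definition so that each call is reported
// automatically. `report` is a hypothetical callback supplied by the SDK.
function wrapPageOptions(options, report) {
  const wrapped = {};
  Object.keys(options).forEach((key) => {
    const value = options[key];
    if (typeof value !== 'function') {
      wrapped[key] = value; // data and other non-function fields pass through
      return;
    }
    wrapped[key] = function (...args) {
      try {
        report({ method: key, time: Date.now() });
      } catch (err) {
        // A reporting failure must never break the business method.
      }
      return value.apply(this, args);
    };
  });
  return wrapped;
}

// Stand-alone demonstration with a fake page definition.
const calls = [];
const page = wrapPageOptions(
  {
    data: { count: 0 },
    onLoad() { return 'loaded'; },
    onTapAdd() { this.data.count += 1; return this.data.count; },
  },
  (record) => calls.push(record.method)
);
page.onLoad();
page.onTapAdd();
console.log(calls); // the captured method names, in call order
```

In a mini program, the wrapped definition could then be passed to the page registration function, for example Page(wrapPageOptions(options, report)). Because the wrapper only depends on plain JavaScript objects and functions, it does not rely on browser objects and should behave the same across the three runtime environments.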

4.3 Design of the no-buried-point SDK for WeChat mini programs

Data layer design:



Data flow direction design:





Collection mode design:

Access mode:

The SDK's NPM package code is imported before the mini program's initialization code, so the SDK is bundled into the project when the mini program is packaged, and data is collected automatically once the SDK has been initialized. An initialization example follows:

import Prajna from './lib/prajna-wxapp-sdk.js';

Prajna.init({
  channel: 'channel',
  env: config.IS_PRODUCION ? 'product' : 'beta',
  project: 'yourProjectName',
  methodConfg: {} // Business-specific method execution and custom buried point names
});


Combining no buried points with manual buried points:

The no-buried-point approach yields a large amount of data, generally enough to reconstruct how users actually used the mini program with high fidelity. The SDK's granularity is the execution of a method. When the business cares about something finer-grained than that, the no-buried-point SDK alone cannot cover it, but combining it with manual buried points can. Our SDK therefore also provides an API for manual buried points to make the data more complete and, in turn, to solve more of the problems described in the section on the importance of data.

V. Problems encountered with the no-buried-point SDK for mini programs

Besides the difficulties and key points of developing a no-buried-point SDK for WeChat mini programs described above, we also ran into some new problems.

  1. The SDK itself has some impact on business performance. Data is buffered in the mini program's local storage, and frequent reads and writes to it cause noticeable lag when the business side has high performance requirements. Solution: reduce local storage reads and writes, and save only the data that has not yet been uploaded to local storage when the page is closed.
  2. Full no-buried-point collection produces a huge amount of data, and during the gray release we encountered reduced server availability caused by server overload. Solution: control the amount of reported data. Only key nodes report automatically; other nodes the business cares about can be configured at initialization during integration, which avoids excessive redundant data. The structure of the reported data also deserves special attention: it should be clear, concise and easy to query and distinguish.
  3. We originally wanted a “switch” that controls whether the SDK is used during the gray release, so that a mini program rollback could be avoided. Because the switch would be controlled by a server interface and the request is asynchronous, the mini program's initialization and startup would have to wait for that interface to return; otherwise the switch would have no effect. Since the SDK must not affect business performance, we dropped the switch and wrapped the SDK internals in try/catch to avoid affecting business availability.
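The first and third solutions above can be sketched roughly as follows: records are buffered in memory, only the records not yet uploaded are written to storage when the page closes, and every SDK step runs inside try/catch. This is a minimal sketch under stated assumptions, not the SDK's code. The names createReporter, storage, upload and batchSize are hypothetical, and storage and upload stand in for the mini program's local storage and the reporting request.

```javascript
// `storage` and `upload` are hypothetical interfaces injected by the caller.
function createReporter({ storage, upload, batchSize = 10 }) {
  let queue = [];

  // Run an SDK step so that any error is swallowed instead of reaching business code.
  function safely(fn) {
    try {
      return fn();
    } catch (err) {
      return undefined;
    }
  }

  return {
    // Buffer in memory; upload a batch once enough records have accumulated.
    add(record) {
      safely(() => {
        queue.push(record);
        if (queue.length >= batchSize) {
          const batch = queue;
          queue = [];
          upload(batch);
        }
      });
    },
    // Call when the page closes: persist only what has not been uploaded yet.
    persist() {
      safely(() => {
        if (queue.length > 0) {
          storage.set('pending', queue);
          queue = [];
        }
      });
    },
    pendingCount() {
      return queue.length;
    },
  };
}

const saved = {};
const uploaded = [];
const reporter = createReporter({
  storage: { set: (key, value) => { saved[key] = value; } },
  upload: (batch) => uploaded.push(batch),
  batchSize: 2,
});
reporter.add({ method: 'onLoad' });
reporter.add({ method: 'onShow' });   // the second record triggers an upload
reporter.add({ method: 'onTapAdd' }); // stays in memory
reporter.persist();                   // written to storage exactly once
console.log(uploaded.length, Object.keys(saved));
```

With this shape, storage is touched once per page close instead of once per record. A fuller version would also read the pending records back on the next launch and retry failed uploads, which this sketch leaves out.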

With no-buried-point reporting in place, the collected data can be used to solve many problems. How the data is used will be covered in the next part, on data applications.


