1 Overview

The scheme has two parts: the technical implementation of buried points and the business design of buried points. It aims to build the data collection technology and the business design, to instrument the App and mini program with buried points for user behavior data collection in cooperation with their system suppliers, and to build online user behavior tags and profiles from the buried point data.

1.1 Architecture design approach for data buried points

A “buried point” (also called a tracking point) is a term from data collection, especially user behavior data collection. It refers to the technology and implementation process for capturing, processing, and sending specific user behaviors or events, for example the number of times a user clicks an icon or how long a user watches a video.

The technical essence of a buried point is to first monitor events while the application is running, and then to detect and capture the events of interest when they occur. Two approaches are commonly used: full buried points and code buried points.

Full buried point: The access terminal only needs to import the SDK and apply a global configuration. The SDK then automatically collects and reports certain user behaviors, such as App startup, App exit, page browsing, and control clicks, without developers adding any extra code.

Code buried point: Custom event reporting code is embedded in each event function that needs to collect information. For key business events (such as purchase, payment, or course playback), this approach can be combined with full buried points so that the business and behavior data to collect are defined according to business requirements.
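The two approaches can be contrasted in a minimal Python sketch. The `Tracker` class, its `track` method, the `auto_track` decorator, and the event names are all hypothetical stand-ins for an SDK, not the API of any particular product, and events are kept in a list instead of being sent.

```python
import functools
import time


class Tracker:
    """Hypothetical stand-in for a tracking SDK; stores events in memory."""

    def __init__(self):
        self.events = []

    def track(self, event_name, **properties):
        # Capture the event with a timestamp; a real SDK would queue and send it.
        self.events.append(
            {"event": event_name, "time": time.time(), "properties": properties}
        )


tracker = Tracker()


def auto_track(event_name):
    """Full buried point (assumed form): wrap a handler so it reports by itself."""

    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            tracker.track(event_name, handler=handler.__name__)
            return handler(*args, **kwargs)

        return wrapper

    return decorator


@auto_track("control_click")
def on_button_click(button_id):
    # No tracking code inside the handler: the wrapper reports the click.
    return f"clicked {button_id}"


def on_payment_success(order_id, amount):
    # Code buried point: the developer reports a business event explicitly
    # and chooses which business properties to attach.
    tracker.track("payment", order_id=order_id, amount=amount)


on_button_click("buy_now")
on_payment_success("order-001", 99.0)
print([e["event"] for e in tracker.events])
```

The automatic path needs no change to the handler body, while the explicit path lets the business decide exactly which fields accompany a key event.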

1.2 Business design approach for buried points

Buried point business design starts from business analysis: first clarify which target behaviors to collect, then work out where buried points should be placed and what kind they should be. It is recommended to use an “Event model” to describe the various behaviors of users. The Event model includes two core entities, Event and User.

Describing user behavior with the 4W1H model captures the whole behavior clearly. The key points are who did it, when, where, in what way, and what was done. Together, the Event and User entities provide a clear description of user behavior.
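As an illustration, the Event model and the 4W1H description could be sketched as two Python dataclasses. All field names, and the reading of "where" as a page and "how" as a platform, are assumptions made for this example; the text does not prescribe a schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class User:
    """Who: the visitor, identified by an anonymous or login ID."""
    user_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Event:
    """One user behavior described along the 4W1H dimensions."""
    who: str          # user_id of the acting User
    when: datetime    # time the behavior occurred
    where: str        # place of the behavior, e.g. a page (assumed meaning)
    how: str          # way of access, e.g. platform or device (assumed meaning)
    what: str         # the behavior itself, e.g. an event name
    properties: dict = field(default_factory=dict)


user = User(user_id="u-1001", properties={"city": "Shanghai"})
event = Event(
    who=user.user_id,
    when=datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc),
    where="course_detail_page",
    how="iOS App",
    what="page_view",
    properties={"course_id": "c-42"},
)
print(event)
```

Keeping only the user ID on the Event lets user attributes change without rewriting historical events.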

Take browsing an App page as an example of defining the behavior and dimensions of a buried point:

1.3 Glossary of terms used in the scheme

Dimension

Dimensions describe the characteristics or attributes of a thing. For example, a person's gender, city of residence, and favorite color are all attributes of that person.

In website analysis, dimensions are used to describe and break down indicators. For example, the number of visits on its own tells you little. Once the source dimension is added, it immediately becomes meaningful.

Indicators

Indicators (metrics) are specific measured values. Visitors, page views, and stay time are common indicators.

Indicators can be divided into counting indicators and composite indicators. Counting indicators include visitors, visits, page views, and stay time. Composite indicators include bounce rate, interaction depth, and conversion rate. Indicators are usually more meaningful when analyzed together with dimensions.
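A small Python sketch shows how a dimension gives meaning to an indicator. The visit records and the source values are hypothetical sample data.

```python
from collections import Counter

# Hypothetical visit records; "source" is the dimension.
visits = [
    {"visitor": "a", "source": "search"},
    {"visitor": "b", "source": "ad"},
    {"visitor": "c", "source": "search"},
    {"visitor": "d", "source": "direct"},
    {"visitor": "e", "source": "search"},
]

# The indicator alone: one number with little context.
total_visits = len(visits)

# The same indicator broken down by the source dimension.
visits_by_source = Counter(v["source"] for v in visits)

print(total_visits)
print(dict(visits_by_source))
```

The total says only that there were five visits, while the breakdown shows that most of them came from search.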

Impressions and clicks

Impressions are the number of times an element on the page is exposed. Clicks are the number of times a page element is clicked by users.

These two indicators mainly apply to online advertising, for example assessing how many times a brand advertisement is displayed and clicked on Sina's home page.

Visitors

A visitor is a person who visits a website or App, commonly counted as a unique visitor (UV).

Data statistics tools generally mark visitors with anonymous IDs: cookies (a short piece of text that the website server places in the user's browser) on the web side, and device IDs on the App side.

Visit

A visit, a common concept in web products, is a series of continuous page browsing actions by a user and is synonymous with a Session. With the rise of the mobile Internet and the growing use of Apps, Session has gradually replaced Visit as the main term.

The industry sets a timeout on the interval between activities within a Session: when the gap between two consecutive activities exceeds the timeout, a new Session begins. The timeout is 30 minutes for web products and 1 minute for App products.
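The timeout rule can be sketched in Python as follows. The function name `split_sessions` and the sample timestamps are hypothetical, and the sketch assumes that a gap longer than the timeout starts a new Session.

```python
from datetime import datetime, timedelta


def split_sessions(timestamps, timeout):
    """Group one visitor's event times into sessions by inactivity timeout."""
    sessions = []
    for ts in sorted(timestamps):
        # Continue the current session while the gap since the previous
        # event is within the timeout; otherwise start a new session.
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


base = datetime(2024, 1, 1, 9, 0, 0)
event_times = [
    base,
    base + timedelta(minutes=10),
    base + timedelta(minutes=45),  # 35 minutes after the previous event
    base + timedelta(minutes=50),
]

web_sessions = split_sessions(event_times, timedelta(minutes=30))
app_sessions = split_sessions(event_times, timedelta(minutes=1))
print(len(web_sessions), len(app_sessions))
```

The same four events form two Sessions under the 30-minute web timeout and four Sessions under the 1-minute App timeout, which is why the timeout must be fixed before Session-based indicators are compared.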

Page views

A page view (PageView, or PV) is the number of times a page is viewed by users. Strictly defined, one page view is one completed request from the user to the website to download a page.

The page view concept mainly applies to web products. App analysis mainly uses screen views (ScreenView) instead.

Stay time

Stay time is an indicator tied to the user Session. It mainly measures the depth of interaction between users and a website or App: the deeper the interaction, the longer the stay.

Common variants include page stay time, session duration, and average stay time. The core principle of the calculation is to record a timestamp when each user behavior occurs and then apply the corresponding formula afterwards.
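A minimal Python sketch of timestamp-based calculation follows. The formulas are assumptions chosen for illustration (page stay time as the gap to the next page view, session duration as last timestamp minus first); the text does not fix them, and analysis tools differ on how they treat the last page of a Session.

```python
from datetime import datetime

# Hypothetical page-view events of one Session, in time order.
page_views = [
    ("home", datetime(2024, 1, 1, 9, 0, 0)),
    ("course_list", datetime(2024, 1, 1, 9, 0, 40)),
    ("course_detail", datetime(2024, 1, 1, 9, 2, 10)),
]

# Page stay time: time until the next page view (assumed formula).
# The last page has no following event, so its stay time is unknown here.
page_stay_seconds = {
    page: (next_ts - ts).total_seconds()
    for (page, ts), (_, next_ts) in zip(page_views, page_views[1:])
}

# Session duration: last timestamp minus first timestamp (assumed formula).
session_seconds = (page_views[-1][1] - page_views[0][1]).total_seconds()

# Average stay time over the pages whose stay time is known.
average_stay_seconds = sum(page_stay_seconds.values()) / len(page_stay_seconds)

print(page_stay_seconds, session_seconds, average_stay_seconds)
```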

Bounce rate

Bounce rate (BounceRate) is an important indicator of landing page quality. A bounce means that the user leaves after only one interaction in a visit. Bounce rate is defined both for a single page and for the whole site.

The page bounce rate is the number of bounces from a page divided by the total number of visits to that page. The site-wide bounce rate is the number of bounced visits divided by the total number of visits.
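Both rates can be sketched in Python. The sample visits are hypothetical, and the sketch assumes that "one interaction" means a single page view and that a page's denominator is every visit that includes the page; some analysis tools count only visits that start on the page.

```python
# Each visit is the ordered list of pages viewed; hypothetical sample data.
visits = [
    ["landing"],
    ["landing", "pricing"],
    ["blog"],
    ["landing"],
    ["blog", "landing", "pricing"],
]


def site_bounce_rate(visits):
    """Bounced visits (exactly one page view) divided by all visits."""
    if not visits:
        return 0.0
    bounced = sum(1 for pages in visits if len(pages) == 1)
    return bounced / len(visits)


def page_bounce_rate(visits, page):
    """Bounces from `page` divided by visits that include `page`."""
    accesses = [pages for pages in visits if page in pages]
    if not accesses:
        return 0.0
    bounced = sum(1 for pages in accesses if pages == [page])
    return bounced / len(accesses)


print(site_bounce_rate(visits))
print(page_bounce_rate(visits, "landing"))
```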

Interaction depth

Interaction depth refers to how many pages a user visits during a single visit to a website or App. The more pages a user visits in a single browse, the deeper the interaction. The depth of interaction can reflect the attractiveness of a website or App to users.

The average user interaction depth can be calculated on a per-Session basis.
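A short Python sketch of the Session-based calculation, using hypothetical sample data and assuming that the average is total page views divided by the number of Sessions:

```python
# Pages viewed in each Session; hypothetical sample data.
sessions = {
    "s1": ["home", "course_list", "course_detail"],
    "s2": ["home"],
    "s3": ["home", "search"],
}

# Interaction depth of one Session = number of pages viewed in it.
depth_per_session = {sid: len(pages) for sid, pages in sessions.items()}

# Average depth = total page views / number of Sessions (assumed formula).
average_depth = sum(depth_per_session.values()) / len(depth_per_session)
print(average_depth)
```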

Conversion rate

Conversion rate is a core metric that any product needs to focus on. It measures how effectively the product turns traffic into actual goals.

It is commonly calculated by dividing the number of visits or users that complete the target conversion by the number of visits or users that enter the target conversion funnel. Because the target behavior can vary, conversion rate is a very flexible metric. For example, you can define conversion rates for registration, login, purchase, or search success.
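A minimal Python sketch of the calculation, assuming the rate is the number of users who complete the target behavior divided by the number who enter the funnel. The function name and the counts are hypothetical.

```python
def conversion_rate(entered, converted):
    """Share of funnel entrants who completed the target behavior."""
    if entered == 0:
        return 0.0  # nobody entered the funnel; avoid dividing by zero
    return converted / entered


# Hypothetical counts for a registration funnel and a purchase funnel.
registration_rate = conversion_rate(entered=2000, converted=300)
purchase_rate = conversion_rate(entered=300, converted=45)
print(f"{registration_rate:.1%} {purchase_rate:.1%}")
```

The same function serves any target behavior; only the definition of "entered" and "converted" changes from funnel to funnel.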

Technical design of buried points

The SDK collects behavior data from terminals including iOS, Android, Web, H5, and WeChat mini programs. Each terminal uses an SDK built for its platform in that platform's mainstream language. Data collected at buried points is submitted to the server API as JSON via HTTP POST.
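The reporting step could look like the following Python sketch, which builds the request without sending it. The endpoint URL, the function name, and the payload fields are hypothetical; the text does not specify the API path or the schema.

```python
import json
import urllib.request

# Hypothetical endpoint; the text does not specify the real API address.
COLLECT_URL = "https://collect.example.com/track"


def build_report_request(event):
    """Build (but do not send) an HTTP POST request carrying one event as JSON."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        COLLECT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_report_request(
    {"event": "page_view", "user_id": "u-1001", "platform": "iOS", "page": "home"}
)
print(request.get_method(), request.full_url)
# Sending would be: urllib.request.urlopen(request, timeout=5)
```

A production SDK would typically also batch events and retry failed sends, which this sketch leaves out.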

The server API is backed by a data access system in which Nginx receives the data sent through the API and writes it to a log file. Nginx is used for its high reliability and high scalability.

Flume then picks up the logs that Nginx writes to file: the Source module reads the Nginx logs in real time, the Channel module buffers the events, and the Sink module publishes them to Kafka.
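The Source, Channel, and Sink roles can be modelled in a few lines of Python. This is an in-memory illustration of the data flow only, not Flume's API (Flume itself is set up through agent configuration rather than code like this). The log format of one JSON event per line is an assumption, and a plain list stands in for a Kafka topic.

```python
import json
import queue


def source(log_lines, channel):
    """Source: read log lines and put the non-empty ones on the channel."""
    for line in log_lines:
        line = line.strip()
        if line:
            channel.put(line)


def sink(channel, topic):
    """Sink: drain the channel and publish each parsed event to the topic."""
    while not channel.empty():
        topic.append(json.loads(channel.get()))


# Hypothetical log lines: one JSON event per line (assumed format).
nginx_log = [
    '{"event": "app_start", "user_id": "u-1"}',
    "",
    '{"event": "page_view", "user_id": "u-1", "page": "home"}',
]

channel = queue.Queue()  # stands in for the Flume Channel (a buffer)
kafka_topic = []         # stands in for a Kafka topic

source(nginx_log, channel)
sink(channel, kafka_topic)
print([e["event"] for e in kafka_topic])
```

The buffer between the two stages is what lets the reader and the publisher run at different speeds without losing events.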

Kafka is a widely used, highly available distributed message queue. It serves as a buffer between the data access and data processing stages and as a backup of recent data. Through external access APIs, the data center can pull data directly from Kafka into the data warehouse to build metrics.