Preface

Event tracking (the "buried point" technique) is a common way to collect data for website analytics. We mainly use it to collect user behavior data, such as page access paths and clicked elements, so that the operations team can plan its work on the basis of data. Many third-party tracking services are on the market, including Baidu Statistics, Youmeng, and GrowingIO. Most of us are familiar with them, but in most cases we simply use them. I recently studied how web event tracking works, so let's go through it together.

Three types of event tracking

User behavior analysis is a large system and a typical data platform. It consists of several modules: user data collection, user behavior modeling and analysis, and visual reporting. Existing collection schemes fall roughly into three types: manual tracking, visual tracking, and no-code tracking (automatic collection).

  1. Manual tracking. Manually coded tracking is the most common approach. The business side calls the tracking method wherever data needs to be collected. The advantage is that traffic is controllable: the business side can collect data in any scenario and at any point it needs, and it fully controls what is collected. The disadvantage is that the business side has to write the tracking calls itself, and whenever the collection scheme changes it has to modify the code and release again.
  2. Visual tracking. Visual tracking is this year's trend, and the data tracking teams at many large companies have started building it. The advantage is that the business side has less work to do. The disadvantage is that it is technically hard to roll out and implement (a consistent front-end code standard on the business side is a major prerequisite). Many of Alibaba's campaign pages are assembled by drag-and-drop in a visual interface, and each of those campaign components has a unique identifier. In the tracking configuration console, you associate elements with the events to collect, and the tracking code to embed in the page is generated automatically.
  3. No-code tracking. With no-code tracking, the front end automatically collects all events and reports them, and the back end filters out and computes the useful data. The advantage is that the front end only needs to load the tracking script. The disadvantage is that the volume of traffic and collected data is very large, which puts heavy pressure on server performance. The mainstream GrowingIO uses this approach.
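To make the third approach (collection without hand-written tracking calls) more concrete, here is a minimal sketch of how automatic click collection could work: a single capturing listener on the document describes every clicked element and hands it to a reporter. The names `describeElement`, `installAutoTracking`, and `report` are hypothetical and are not taken from GrowingIO or any other product mentioned here.

```javascript
// Hypothetical helper: summarize a clicked element as a plain object.
function describeElement(el) {
  return {
    tag: String(el.tagName || '').toLowerCase(),
    id: el.id || '',
    className: el.className || '',
    text: String(el.textContent || '').trim().slice(0, 50) // cap the text length
  };
}

// Hypothetical wiring: one capturing listener on the document sees every click,
// so the business side writes no per-element tracking code.
function installAutoTracking(doc, report) {
  doc.addEventListener('click', function (event) {
    report({
      type: 'click',
      element: describeElement(event.target),
      time: Date.now()
    });
  }, true);
}

// Assumed browser usage:
// installAutoTracking(document, function (e) { console.log(e); });
```

Because everything is reported, the filtering cost moves to the back end, which is the trade-off described in item 3.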

We set visual tracking aside for now and try our hand at manual tracking and no-code tracking. For convenience, I will call the collection script the SDK from here on.

Questions to think through

Building event tracking involves many considerations. On the principle of not rushing into code, we first worked through the following questions before development.

  1. What are we going to collect, and what are the conventions of the collection interface?
  2. How does the business side invoke our collection script?
  3. Manual tracking: the SDK needs to expose a method for the business side to call, and the business side controls which parameters are passed.
  4. No-code tracking: given the pressure that data volume puts on the server, we need a switch for no-code tracking and a way to configure which elements are collected automatically.
  5. User identification: how do we distinguish and associate the data collected from visitors and from logged-in users?
  6. Device ID: when a user accesses web pages through a browser, the device ID must be stored in the browser, and it must stay the same when the user visits the websites of different business sides.
  7. Single-page applications: does data collection differ between the now popular single-page applications and ordinary web pages?
  8. Hybrid applications: how does H5 communicate with the app in an app-plus-H5 hybrid?

What we collect and the conventions of the collection interface

In the first phase we collect the basic metrics: PV (page views), UV (unique visitors; repeat visits by the same visitor within a day count once), clicks, and the user's access path. Fine-grained conversion analysis depends on the business and has to be agreed with the data analysis team, so we reserve an extension field for it. Our collection interface therefore follows the conventions below.

    {
      "header": { // HTTP header
        "x-device-id": "550E8400-E29B-41D4-A716-446655440000", // Device ID, used to distinguish user devices
        "x-source-url": "https://www.baidu.com/", // Source address, links the user's whole flow for behavior path analysis, e.g. log in, go to the home page, open a product detail page, exit
        "x-current-url": "", // Current address, the page where the user action occurred
        "x-user-id": "" // User ID
      },
      "body": [{ // HTTP body
        "PageSessionID": "", // Page ID, used to group page events (for example, loading and leaving send two events)
        "PageTitle": "buried test page", // Page title
        "CurrentTime": "1517798922201", // Event occurrence time
        "ExtraInfo": {} // Extension field, carries parameters for business-specific analysis
      }]
    }

This is the general event collection interface we have agreed on so far. The parameters vary with the event being collected, but the user's device does not change during a visit. To collect device information, you can agree on a separate interface and send the device information once, before collection starts, which avoids resending fixed data with every event.

    {
      "header": { // HTTP header
        "x-device-id": "550e8400-e29b-41d4-a716-446655440000" // Device ID
      },
      "body": { // HTTP body
        "DeviceType": "web", // Device type
        "ScreenWide": 768, // Screen width
        "ScreenHigh": 1366, // Screen height
        "Language": "zh-cn" // Language
      }
    }
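As an illustration of the device interface, the sketch below assembles that request from browser-like inputs. The function name `buildDeviceRequest` and the `env` parameter are assumptions made for this example; `env` stands in for `window.screen` and `window.navigator` so the function can also run outside a browser.

```javascript
// Hypothetical builder for the device-information request.
// `env` mimics the browser globals: { screen: {width, height}, navigator: {language} }.
function buildDeviceRequest(deviceId, env) {
  return {
    header: {
      'x-device-id': deviceId // same header name as in the convention
    },
    body: {
      DeviceType: 'web',
      ScreenWide: env.screen.width,
      ScreenHigh: env.screen.height,
      Language: env.navigator.language
    }
  };
}

// Assumed browser usage:
// buildDeviceRequest(deviceId, { screen: window.screen, navigator: window.navigator });
```

Since none of these values change during a visit, the request only needs to be sent once, before the first event.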

How the business side invokes our collection script

Tracking should require as little work as possible from the business side, ideally none 😁, although that is hard to achieve. Our solution is to have the business side load our SDK through a script tag. The business side only needs to set a few parameters to customize tracking (the traffic control for no-code tracking mentioned above 👆), and the basic data is then collected with no further work.

    (function() {
      // Load the SDK asynchronously
      var collect = document.createElement('script');
      collect.type = 'text/javascript';
      collect.async = true;
      collect.src = 'http://collect.trc.com/index.js';
      var s = document.getElementsByTagName('script')[0];
      s.parentNode.insertBefore(collect, s);
    })();
    // Tracking configuration queue
    var _XT = [];
    _XT.push(['Target','div']);
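The article does not show how the SDK consumes the `_XT` queue, so the following is only a sketch under assumptions: each entry is a `[key, value]` pair, and the `Target` value names the tag (or a comma-separated list of tags) that may be collected automatically. `readConfig` and `isTargetElement` are hypothetical names.

```javascript
// Hypothetical reader for the _XT queue: each entry is [key, value];
// later entries override earlier ones with the same key.
function readConfig(queue) {
  var config = {};
  (queue || []).forEach(function (entry) {
    if (Array.isArray(entry) && entry.length >= 2) {
      config[entry[0]] = entry[1];
    }
  });
  return config;
}

// Decide whether an element may be collected automatically.
// Assumption: 'Target' holds one tag name or a comma-separated list of tag names.
function isTargetElement(config, tagName) {
  if (!config.Target) {
    return false; // nothing configured: collect nothing automatically
  }
  var tags = String(config.Target).toLowerCase().split(',').map(function (t) {
    return t.trim();
  });
  return tags.indexOf(String(tagName).toLowerCase()) !== -1;
}

// Assumed usage inside a click handler:
// if (isTargetElement(readConfig(window._XT), event.target.tagName)) { /* collect */ }
```

Restricting collection to configured tags is one way to keep the data volume of automatic collection under control.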

Manual tracking: methods exposed by the SDK

If the business side needs to collect more business-specific data, it can call the methods we expose.

    // Dispatch a custom event
    sdk.dispatch('customEvent', {extraInfo: 'additional information for customEvent'})
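The body of `dispatch` is not shown in this post. As a rough sketch of what such a method might do, the example below wraps the business data into the agreed event body and passes it to an injected `send` function. `createDispatcher`, `pageInfo`, `send`, and the `EventName` field are all assumptions for illustration; the agreed interface shown earlier does not define an event-name field.

```javascript
// Hypothetical shape of dispatch: wrap business data into the agreed event body.
// `send` is injected so the sketch does not depend on any real transport.
function createDispatcher(pageInfo, send) {
  return function dispatch(eventName, data) {
    var extra = (data && data.extraInfo !== undefined) ? data.extraInfo : {};
    var body = {
      PageSessionID: pageInfo.pageSessionId,
      PageTitle: pageInfo.title,
      CurrentTime: String(Date.now()), // timestamp as a string, as in the convention
      EventName: eventName,            // assumed field, not part of the agreed interface
      ExtraInfo: extra                 // extension field for business-specific data
    };
    send([body]); // the agreed body is an array of events
    return body;
  };
}

// Assumed usage:
// var dispatch = createDispatcher({ pageSessionId: 'p1', title: document.title }, sendToServer);
// dispatch('customEvent', { extraInfo: 'additional information for customEvent' });
```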

Associating visitors with users

We use the userId to identify logged-in users. When someone on the same device goes from visitor to logged-in user, we need a device ID to link the two sets of records.
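To show why the device ID is enough to link the two identities, here is a small sketch of the association step as it might run on the analysis side. It assumes every event carries a `deviceId` and an optional `userId` (empty for visitors); the function name and the event shape are hypothetical.

```javascript
// Sketch: attribute visitor events to the user who logged in on the same device.
// Assumption: each event has deviceId and an optional userId ('' for visitors).
function associateVisitors(events) {
  var userByDevice = {};
  events.forEach(function (e) {
    if (e.userId && !userByDevice[e.deviceId]) {
      userByDevice[e.deviceId] = e.userId; // first logged-in user seen on this device
    }
  });
  return events.map(function (e) {
    // Copy the event and fill in the userId when the device is known
    return Object.assign({}, e, {
      userId: e.userId || userByDevice[e.deviceId] || ''
    });
  });
}

// Example input: a visitor browses, then logs in on the same device.
var linked = associateVisitors([
  { deviceId: 'd1', userId: '', page: '/home' },
  { deviceId: 'd1', userId: 'u42', page: '/cart' },
  { deviceId: 'd2', userId: '', page: '/home' }
]);
```

This simple version attributes a device to the first user seen on it; a shared device with several accounts would need a finer rule, for example one based on time windows.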

Web device ID

When users access web pages through a browser, the device ID has to be stored in the browser, and it must be the same when the same user visits the websites of different business sides. For storage on the web we immediately think of cookies, sessionStorage, and localStorage, but all three are tied to the domain the page is served from. We cannot create a new device fingerprint every time the user visits another website, so we need a way to share the device fingerprint across domains.

Our solution is to embed an iframe that loads a static page, store the device ID under the domain that the iframe loads, and read the device ID across domains by messaging the iframe's contentWindow. The event state arrives through postMessage, and the SDK then calls its wrapped callback to process the data. The concrete implementation:

    // Web application: cross-domain communication through an embedded iframe, used to set the device ID
    collect.setIframe = function () {
      var that = this
      var iframe = document.createElement('iframe')
      iframe.id = "frame"
      iframe.src = 'http://collectiframe.trc.com' // proxy domain configured for the iframe
      iframe.style.display = 'none' // the iframe only provides a fixed device ID, so it is not displayed
      document.body.appendChild(iframe)
      iframe.onload = function () {
        iframe.contentWindow.postMessage('loaded', '*');
      }
      // Listen for the message event: once the iframe has loaded, read the device ID
      helper.on(window, "message", function (event) {
        if (event.data && event.data.type == 'loaded') {
          that.deviceId = event.data.deviceId
          that.sendDevice(that.getDevice(), that.deviceUrl);
          setTimeout(function () {
            that.send(that.beforeload)
            that.send(that.loaded)
          }, 1000)
        }
      })
    }

How the iframe communicates with the SDK

    function receiveMessageFromIndex(event) {
      getDeviceInfo() // Obtain device information
      var data = {
        deviceId: _deviceId,
        type: event.data
      }
      event.source.postMessage(data, '*');
    }
    // Listen for messages from the parent page (attachEvent covers old IE)
    if (window.addEventListener) {
      window.addEventListener("message", receiveMessageFromIndex, false);
    } else {
      window.attachEvent("onmessage", receiveMessageFromIndex)
    }
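The iframe page has to return the same device ID on every visit, whichever site embeds it. The post does not show how the ID is created or stored, so the following is a sketch under assumptions: the ID is kept in the iframe domain's own storage under a hypothetical `deviceId` key, and a random identifier in UUID v4 layout is generated on first use.

```javascript
// Random identifier in UUID v4 layout. Math.random is enough for a sketch,
// but it is not suitable for anything security-sensitive.
function generateId() {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function (c) {
    var r = Math.floor(Math.random() * 16);
    var v = c === 'x' ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

// Hypothetical iframe-side helper: reuse the stored device ID or create one.
// `storage` is any object with getItem/setItem, such as localStorage in the iframe page.
function getOrCreateDeviceId(storage) {
  var id = storage.getItem('deviceId'); // 'deviceId' is an assumed key name
  if (!id) {
    id = generateId();
    storage.setItem('deviceId', id);
  }
  return id;
}

// Assumed usage inside the iframe page:
// var _deviceId = getOrCreateDeviceId(window.localStorage);
```

Because the storage belongs to the iframe's domain rather than to the embedding site, every business site that loads the same iframe sees the same ID.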

For more detail, see my other blog post on sharing web browser fingerprints across domains.

Single-page applications: does data collection differ from ordinary web pages?

Single-page applications navigate without reloading the page, so we have to handle page jumps differently from ordinary pages. Single-page routing plugins rely on the pushState and replaceState methods of window.history to change the user's browsing history without a refresh.

The window's history object provides these two methods. The difference between them is that pushState appends a new entry to the browsing history, while replaceState replaces the current entry. So we only need to override these history methods and run our collection method before the native method executes, and we can collect page jump events in a single-page application.

    // Approach: keep a copy of the window's native replaceState function, then override
    // history.replaceState so that our collection behavior runs inside the method
    var collect = {};
    collect.onPushStateCallback = function () {}; // our collection method
    (function (history) {
      var replaceState = history.replaceState; // keep the native replaceState
      history.replaceState = function (state, param) { // override replaceState
        var url = arguments[2];
        if (typeof collect.onPushStateCallback == "function") {
          collect.onPushStateCallback({state: state, param: param, url: url});
        }
        return replaceState.apply(history, arguments); // call the native replaceState
      };
    })(window.history);
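The same wrapping applies to pushState. As a sketch, the hypothetical helper below wraps any method on a history-like object so that a callback runs before the native method. `wrapHistoryMethod` and `onRouteChange` are illustrative names, not part of the SDK.

```javascript
// Sketch: wrap a history method so a callback runs before the native one.
function wrapHistoryMethod(history, name, callback) {
  var original = history[name]; // keep the native method
  history[name] = function (state, title, url) {
    if (typeof callback === 'function') {
      callback({ method: name, state: state, url: url });
    }
    return original.apply(history, arguments); // call the native method
  };
}

// Assumed browser usage:
// wrapHistoryMethod(window.history, 'pushState', onRouteChange);
// wrapHistoryMethod(window.history, 'replaceState', onRouteChange);
```

Passing the history object in as a parameter keeps the helper testable with a stub and avoids repeating the wrapper once per method.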

To learn more, see my other blog post on what you need to know about implementing single-page routing.

Hybrid applications: how H5 communicates with the app

Most applications today are not purely native, and hybrids that mix an app with H5 pages are now mainstream.

For pure web collection, data stored on the front end is easily lost, so we send the collected data through the collection interface each time an event fires. In a hybrid app the situation is different. Many users' phones run traffic-monitoring manager software, and if H5 pages inside the app keep sending data to the server, the manager software may well detect it and warn the user, who will then stop trusting your app. So when the user acts, we pass the data to the app, which stores it locally. When the user switches the app to the background, the app's own SDK packages the data and uploads it to the server. On our side we wrap the methods the app provides in an adapter:

    // Hybrid app (app plus H5): hand the event over to the native app
    collect.saveEvent = function (jsonString) {
      collect.dcpDeviceType && setTimeout(function () {
        if (collect.dcpDeviceType == 'android') {
          android.saveEvent(jsonString)
        } else {
          window.webkit && window.webkit.messageHandlers
            ? window.webkit.messageHandlers.nativeBridge.postMessage(jsonString)
            : window.postBridgeMessage(jsonString)
        }
      }, 1000)
    }

Implementation approach

Having thought through the questions above, we have a rough idea of how to implement tracking. We used a mind map to lay out what we are going to build. Remember to enlarge the picture, because it is too small to read otherwise.

  1. We need to expose the methods invoked by the business side

Let’s look at the implementation of some core code

Utility methods

We defined a few helper methods to make development more pleasant 😝

    var helper = {};
    // Generate a UUID
    helper.uuid = function(){}
    // Adapter: bind an event listener to an element
    helper.on = function(){}
    // Adapter: remove an event listener from an element
    helper.remove = function(){}
    // Convert JSON to a query string, the parameter format used when sending events
    helper.changeJSON2Query = function(){}
    // Resolve a relative path to a full document path
    helper.normalize = function(){}
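The helper bodies are omitted in this post. As one possible implementation of the JSON-to-query-string helper, the sketch below encodes each key and value and, as an assumption of this example, serializes nested objects such as `ExtraInfo` as JSON before encoding.

```javascript
// One possible implementation of the JSON-to-query-string helper.
// Assumption: nested objects are serialized as JSON before URL encoding.
function changeJSON2Query(json) {
  return Object.keys(json).map(function (key) {
    var value = json[key];
    if (value !== null && typeof value === 'object') {
      value = JSON.stringify(value);
    }
    return encodeURIComponent(key) + '=' + encodeURIComponent(String(value));
  }).join('&');
}

// Example: a flat event body turned into a query string.
var query = changeJSON2Query({ PageTitle: 'test page', CurrentTime: '1517798922201' });
```

A query string like this can be appended to a collection URL, which is useful when the event is sent as a simple GET request.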

Collecting logic

    var collect = {
      deviceUrl: 'http://collect.trc.com/rest/collect/device/h5/v1',
      eventUrl: 'http://collect.trc.com/rest/collect/event/h5/v1',
      isuploadUrl: 'http://collect.trc.com/rest/collect/isupload/app/v1',
      parmas: { ExtraInfo: {} },
      device: {}
    };
    // Set event parameters
    collect.setParames = function(){}
    // Update the access path and page information
    collect.updatePageInfo = function(){}
    // Get event parameters
    collect.getParames = function(){}
    // Get device information
    collect.getDevice = function(){}
    // Send a collected event
    collect.send = function(){}
    // Send device information
    collect.sendDevice = function(){}
    // Check whether to collect
    collect.isupload = function(){
      // 1. Determine whether to collect; if not, remove the event listeners
      //    (in the project this check runs twice, to distinguish collection
      //    for visitor identity from collection for user identity)
      // 2. If collecting:
      //    a. Web application: call collect.setIframe to set up the iframe
      //    b. Hybrid application: send the start-loading and load-complete events to the app
    }
    // Click event handler
    collect.clickHandler = function(){}
    // Leave-page event handler
    collect.beforeUnloadHandler = function(){}
    // Page back event handler
    collect.onPopStateHandler = function(){}
    // System event initialization: register the leave-page and other events
    collect.event = function(){}
    // Get the recorded start-loading data
    collect.getBeforeload = function(){}
    // Store the load-complete data and get the device type
    collect.onload = function(){
      // 1. Check whether the cookie contains device type information; if so, this is a hybrid application
      // 2. Call collect.isupload to determine whether to collect
    }
    // Web application: cross-domain cookie communication through an embedded iframe, used to set the device ID
    collect.setIframe = function(){}
    collect.saveEvent = function(){}
    // Collect user-defined event types
    collect.dispatch = function(){}
    // Store the userId parameter in sessionStorage
    collect.storeUserId = function(){}
    collect.saveEventInfo = function(){}
    // Page initialization
    collect.init = function(){
      // 1.
      // 2. Obtain the SDK configuration and device information
      // 3. Override the two history methods so that our own method runs before a single-page application navigates
      // 4. Call collect.onload
    }
    collect.init(); // Initialize
    // Methods exposed to the business side
    return {
      dispatch: collect.dispatch,
      storeUserId: collect.storeUserId,
    }

Further reading

👆 That is the result of my research over this period. The code is fairly long, so I will not post all of it on the blog. If you are interested, add me on WeChat to discuss, or leave a comment below the article. Suggestions for improvement are also welcome 😝.

Sharing web browser fingerprints across domains

What you need to know about single-page routing

What is data event tracking, and why set up tracking points?

Data collection and event tracking

Meituan-Dianping front-end no-trace tracking in practice

How to explain "UV and PV" clearly and simply