Buried point data collection scheme
What is a buried point
A buried point, formally known as event tracking, refers to the technology and implementation process of capturing, processing, and sending specific user behaviors or business processes. "Buried point" is a technical term in the data field and also a common name across the Internet industry.
Buried points are the basis of product data analysis. They are generally used for recommendation system feedback, monitoring and analysis of user behavior, and statistical analysis of the effect of new features or operational campaigns.
Buried points contain two important concepts: events and attributes.
- Event: something that happens in the application, such as a user action, a system event, or a system error. For example, a product might define the following events: enter_page (the user enters a page) and leave_page (the user leaves a page).
- Attribute (param): data attached to an event that describes it and can be used to segment users, such as language preference or geographic location. For example, the "Enter practice" event contains the following event attributes: enter_from (the page the user came from), class_id (course ID), and so on.
- Attribute value: the concrete value of an attribute at the moment the behavior is triggered. For example, enter_from can be home, system, and so on.
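The relationship between events, attributes, and attribute values can be sketched as a small data structure. This is only an illustration: the type names, the `createEvent` helper, and the `enter_practice` event name are assumptions made for the example, not part of any particular SDK.

```typescript
// Attribute values are kept to simple scalar types in this sketch.
type AttributeValue = string | number | boolean;

// One buried point: an event name plus its attributes.
interface BuriedPointEvent {
  event: string; // event name, e.g. "enter_page"
  params: Record<string, AttributeValue>; // attribute name -> attribute value
}

// Hypothetical helper that builds an event record from a name and attributes.
function createEvent(
  event: string,
  params: Record<string, AttributeValue> = {},
): BuriedPointEvent {
  // Copy the attributes so later changes by the caller do not leak in.
  return { event, params: { ...params } };
}

// "enter_from" and "class_id" are attributes; "home" and 1024 are attribute values.
const enterPractice = createEvent("enter_practice", {
  enter_from: "home",
  class_id: 1024,
});
```

Keeping the attributes in a flat name-to-value map makes each record easy to serialize and to filter by attribute later.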
Mainstream schemes
- Traceless buried points (full buried points): use the monitoring capabilities built into the browser or app to collect page views, clicks, and other user behaviors. Generally used for coarse-grained data analysis, such as the company's SLARDAR.
  - Advantages:
    - Easy to integrate, almost non-intrusive, and requires no additional development cost
    - User actions are collected so completely that there are few omissions
  - Disadvantages:
    - The data is noisy, because everything is collected whether it is useful or not
    - Buried points cannot be customized, so specified events and business attributes cannot be collected
    - Less information is available to data analysts (DA)
- Code buried points: front-end developers write custom listening and collection logic in the code.
  - Advantages:
    - Buried points can be placed precisely, with clear event identifiers
    - Business attributes are very rich
    - The trigger mode of each buried point can be defined flexibly
    - The data is more convenient and accurate for DA to use
  - Disadvantages:
    - The workload is large, the code is highly intrusive, and later maintenance is inconvenient
- Buried point SDK: the SDK exposes an interface for reporting buried points, and the developer does not need to be aware of the monitoring and collection process.
  - Advantages:
    - Business development only needs to focus on event identifiers, business attributes, and so on
    - Combines the advantages of traceless buried points and code buried points
  - Disadvantages:
    - None for the moment
Common buried point attributes
The front end usually reports buried points per page. The common event attributes are as follows:
Attribute | Description |
---|---|
uid | User ID; if the user is not logged in, a specific ID is used instead |
url | The URL of the page on which the current event was triggered |
eventTime | Timestamp when the buried point was triggered |
localTime | The user's local time when the buried point was triggered, in the standard YYYY-MM-DD HH:MM:SS format, which makes later string-based queries easier |
deviceType | The type of device the current user is using, such as Apple, Samsung, Chrome, etc. |
deviceId | ID of the device used by the current user |
osType | The operating system of the current user, such as Windows, macOS, iOS, or Android |
osVersion | Operating system version used by the current user |
appVersion | Current app version |
appId | Current app ID |
extra | Custom data, usually a serialized string; its data structure should remain stable |
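As a rough sketch of how these common attributes might be assembled, the helper below fills in `eventTime`, `localTime`, and `extra`, and takes everything else from the caller. The names `CommonAttributes`, `EnvironmentInfo`, `formatLocalTime`, and `buildCommonAttributes` are hypothetical, and how the device and app values are obtained is left to the host application.

```typescript
// Shape of the common attributes listed in the table above.
interface CommonAttributes {
  uid: string;
  url: string;
  eventTime: number;
  localTime: string;
  deviceType: string;
  deviceId: string;
  osType: string;
  osVersion: string;
  appVersion: string;
  appId: string;
  extra: string;
}

// Values the host page or app is assumed to supply.
type EnvironmentInfo = Omit<CommonAttributes, "eventTime" | "localTime" | "extra">;

// Format a Date as "YYYY-MM-DD HH:MM:SS" in the user's local time zone.
function formatLocalTime(date: Date): string {
  const pad = (n: number): string => ("0" + n).slice(-2);
  const day = `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`;
  const time = `${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}`;
  return `${day} ${time}`;
}

// Combine the environment values with the time fields and the serialized extra data.
function buildCommonAttributes(
  env: EnvironmentInfo,
  extra: Record<string, unknown> = {},
  now: Date = new Date(),
): CommonAttributes {
  return {
    ...env,
    eventTime: now.getTime(),
    localTime: formatLocalTime(now),
    extra: JSON.stringify(extra), // serialized so the field type stays stable
  };
}
```

Passing `now` as a parameter keeps the helper deterministic, which makes it easier to check the formatting in isolation.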
Common buried point events
Event | When reported | Description |
---|---|---|
Page stay | When the current page is switched or unloaded | Records how long the previous page was viewed |
pv | When entering the page | Page view count; UV can be derived by deduplicating on deviceId |
Interaction events | When a user interaction event is triggered | Such as a click or a long press |
Logical events | When the logical conditions are met | Such as login or page navigation |
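The pv row notes that UV only needs deduplication on deviceId. A minimal sketch of that aggregation, assuming the reported records have already been collected into an array (the `PvRecord` shape and the `countPvUv` name are illustrative):

```typescript
// Minimal record shape assumed for this sketch.
interface PvRecord {
  event: string; // "pv" marks a page view
  url: string;
  deviceId: string;
}

interface PageStats {
  pv: number; // total page views
  uv: number; // distinct devices
}

// Count page views per URL and derive UV by deduplicating on deviceId.
function countPvUv(records: PvRecord[]): Map<string, PageStats> {
  const devicesByUrl = new Map<string, Set<string>>();
  const statsByUrl = new Map<string, PageStats>();
  for (const record of records) {
    if (record.event !== "pv") continue; // ignore non-pv events
    const devices = devicesByUrl.get(record.url) || new Set<string>();
    devices.add(record.deviceId);
    devicesByUrl.set(record.url, devices);
    const current = statsByUrl.get(record.url) || { pv: 0, uv: 0 };
    statsByUrl.set(record.url, { pv: current.pv + 1, uv: devices.size });
  }
  return statsByUrl;
}
```

In practice this kind of aggregation usually runs on the analysis side rather than in the browser; the sketch only shows the deduplication idea.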
Performance data collection solution
Currently, most performance indicator data comes from the window.performance API.
performance.timing
Attribute | Description |
---|---|
connectEnd | Timestamp when the HTTP (TCP) connection between the browser and the server is established. If a persistent connection is used, the value equals fetchStart. The connection is considered established once all handshakes and authentication have completed. |
connectStart | Timestamp when the browser starts establishing the HTTP (TCP) connection to the server. If a persistent connection is used, or the information is read from a cache or local resource, the value equals fetchStart. |
domComplete | Timestamp when the current document finishes parsing, that is, when Document.readyState becomes "complete" and the corresponding readystatechange event is fired. |
domContentLoadedEventEnd | Timestamp when all scripts that need to be executed as soon as possible have finished executing, regardless of the order in which they ran. |
domContentLoadedEventStart | Timestamp just before the parser fires the DOMContentLoaded event, that is, right after all scripts that need to be executed after parsing have run. |
domInteractive | Timestamp when the DOM structure of the current page finishes parsing and embedded resources start loading, that is, when Document.readyState becomes "interactive" and the corresponding readystatechange event is fired. |
domLoading | Timestamp when the DOM structure of the current page starts parsing, that is, when Document.readyState becomes "loading" and the corresponding readystatechange event is fired. |
domainLookupEnd | Timestamp when the DNS lookup completes. Equals fetchStart if a local cache is used (no DNS lookup) or a persistent connection is used. |
domainLookupStart | Timestamp when the DNS lookup starts. If a persistent connection is used, or the information is read from a cache or local resource, the value equals fetchStart. |
fetchStart | Timestamp when the browser is ready to fetch the document with an HTTP request. This moment occurs before any application cache is checked. |
loadEventEnd | Timestamp when the load event handler completes. The value is 0 if the event has not been fired or has not completed. |
loadEventStart | Timestamp when the load event is fired. The value is 0 if the event has not been fired yet. |
navigationStart | Timestamp right after the previous page in the same browsing context finishes unloading. If there is no previous page, the value equals fetchStart. |
redirectEnd | Timestamp when the last HTTP redirect completes, that is, when the last byte of the redirect response is received. The value is 0 if there is no redirect or if one of the redirects is from a different origin. |
redirectStart | Timestamp when the first HTTP redirect starts. The value is 0 if there is no redirect or if one of the redirects is from a different origin. |
requestStart | Timestamp when the browser sends the HTTP request to the server (or starts reading from the local cache). |
responseEnd | Timestamp when the browser receives the last byte from the server (or reads it from the local cache or a local resource), or when the HTTP connection is closed, whichever comes first. |
responseStart | Timestamp when the browser receives the first byte from the server (or reads it from the local cache). If the transport layer fails after the request is sent and the connection is reopened, the value is the start time of the corresponding new request. |
secureConnectionStart | Timestamp when the browser starts the handshake for a secure (HTTPS) connection with the server. The value is 0 if the current page does not use a secure connection. |
unloadEventEnd | Counterpart of unloadEventStart: timestamp when the unload event handler of the previous page completes. The value is 0 if there is no previous page. |
unloadEventStart | Timestamp when the unload event of the previous page is fired. The value is 0 if there is no previous page. |
Common Performance Indicators
Indicator | Description |
---|---|
FP | First Paint: the time when the page first draws anything |
FCP | First Contentful Paint: the time when the page first draws content |
FMP | First Meaningful Paint: the time when the page first draws meaningful content; FMP >= FCP |
TTI | Time to Interactive: the time when the page becomes fully interactive |
FID | First Input Delay: the delay of the user's first interaction during page loading |
MPFID | Max Potential First Input Delay: the maximum possible delay of a user interaction during page loading |
LOAD | The time when the page is fully loaded (when the load event fires) |
FP
The First Paint (FP) indicator usually reflects the white screen time of a web page, which in turn reflects the network loading performance of the page. The better the loading performance, the shorter the white screen time, the less time users wait for content, and the lower the probability that they leave.
This indicator can be obtained with performance.getEntriesByType('paint'), which returns the entries provided by the PerformancePaintTiming API. The entry whose name is first-paint describes the FP indicator.
FCP
First Contentful Paint (FCP) is the time when content is rendered for the first time. In performance statistics, the period from when the user opens the page until FCP is treated as the time during which the page shows no content. Generally, FCP >= FP.
This indicator can be obtained with performance.getEntriesByType('paint'), which returns the entries provided by the PerformancePaintTiming API. The entry whose name is first-contentful-paint describes the FCP indicator.
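Reading FP and FCP from the paint entries might look like the sketch below. The `PaintEntryLike` shape and the `pickPaintMetrics` name are illustrative; in a browser the input would be the result of `performance.getEntriesByType('paint')`, whose entries carry `name` and `startTime`.

```typescript
// Minimal shape of a paint entry used by this sketch.
interface PaintEntryLike {
  name: string;
  startTime: number; // milliseconds since the start of navigation
}

interface PaintMetrics {
  fp: number | null;
  fcp: number | null;
}

// Pick the first-paint and first-contentful-paint times, or null when absent.
function pickPaintMetrics(entries: PaintEntryLike[]): PaintMetrics {
  const find = (name: string): number | null => {
    const entry = entries.filter((e) => e.name === name)[0];
    return entry ? entry.startTime : null;
  };
  return { fp: find("first-paint"), fcp: find("first-contentful-paint") };
}

// In a browser: pickPaintMetrics(performance.getEntriesByType("paint"))
```

Returning null for a missing entry lets the caller distinguish "not painted yet" from a real value of 0.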
FMP
First Meaningful Paint (FMP) is the time when meaningful content is drawn for the first time. Meaningful content is considered drawn once the overall page layout and text content are fully rendered. FMP therefore measures how long users wait to see the main content of a web page, which makes it an important indicator of user experience.
One of the most widely accepted ways of calculating FMP in the front-end industry is "the time of the paint that follows the biggest layout change during loading and rendering." A MutationObserver can be used to monitor DOM changes across the page. Each time its callback fires, it calculates a score for the current DOM tree, and the moment at which the score changes most sharply is taken as the FMP time.
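The "largest score change" step can be sketched as a pure function. This assumes the MutationObserver callback has already recorded one sample per invocation; how the DOM tree is scored is not specified here, and the `DomScoreSample` and `estimateFmp` names are hypothetical.

```typescript
// One sample per MutationObserver callback: when it fired and the
// score of the DOM tree at that moment (the scoring itself is not shown).
interface DomScoreSample {
  time: number; // milliseconds since the start of navigation
  score: number;
}

// Return the time of the sample with the largest score increase over
// the previous sample, or null when there are no samples.
function estimateFmp(samples: DomScoreSample[]): number | null {
  let bestTime: number | null = null;
  let bestDelta = -Infinity;
  let previousScore = 0; // assumption: the empty document scores 0
  for (const sample of samples) {
    const delta = sample.score - previousScore;
    if (delta > bestDelta) {
      bestDelta = delta;
      bestTime = sample.time;
    }
    previousScore = sample.score;
  }
  return bestTime;
}
```

Separating the sampling from the selection keeps the selection logic independent of the browser, so different scoring strategies can be compared on recorded samples.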
TTI
TTI (Time to Interactive) is the time from when the page starts loading until it is fully interactive. A page is fully interactive when the following three conditions are met:
- The page already displays useful content.
- The event handlers for the visible elements on the page have been registered.
- Event handlers can start executing within 50 ms of the event occurring.
Resource loading indicators
window.performance.getEntriesByType('resource') returns the timing entries of all resources loaded by the current page (JS, CSS, images, and so on). It can be used to collect static resource performance data.
The main initiator types are script, link, img, css, xmlhttprequest, beacon, fetch, and other. For details, see PerformanceResourceTiming in the MDN Web APIs documentation.
Attribute | Description |
---|---|
connectEnd | A DOMHighResTimeStamp representing the time immediately after the browser finishes establishing the connection to the server to retrieve the resource. |
connectStart | A DOMHighResTimeStamp representing the time immediately before the browser starts establishing the connection to the server to retrieve the resource. |
decodedBodySize | A number representing the size (in octets) of the message body received from the fetch (HTTP or cache), after removing any applied content encodings. |
domainLookupEnd | A DOMHighResTimeStamp representing the time immediately after the browser finishes the domain name lookup for the resource. |
domainLookupStart | A DOMHighResTimeStamp representing the time immediately before the browser starts the domain name lookup for the resource. |
duration | A timestamp that is the difference between the responseEnd and startTime properties. |
encodedBodySize | A number representing the size (in octets) of the payload body received from the fetch (HTTP or cache), before removing any applied content encodings. |
entryType | Returns "resource". |
fetchStart | A DOMHighResTimeStamp representing the time immediately before the browser starts to fetch the resource. |
initiatorType | A string representing the type of resource that initiated the performance entry. |
name | Returns the resource URL. |
nextHopProtocol | A string representing the network protocol used to fetch the resource, as identified by the ALPN Protocol ID (RFC 7301). |
redirectEnd | A DOMHighResTimeStamp representing the time immediately after the last byte of the response of the last redirect is received. |
redirectStart | A DOMHighResTimeStamp representing the start time of the fetch that initiates the redirect. |
requestStart | A DOMHighResTimeStamp representing the time immediately before the browser starts requesting the resource from the server. |
responseEnd | A DOMHighResTimeStamp representing the time immediately after the browser receives the last byte of the resource or immediately before the transport connection is closed, whichever comes first. |
responseStart | A DOMHighResTimeStamp representing the time immediately after the browser receives the first byte of the response from the server. |
secureConnectionStart | A DOMHighResTimeStamp representing the time immediately before the browser starts the handshake process to secure the current connection. |
serverTiming | An array of PerformanceServerTiming entries containing server timing metrics. |
startTime | A timestamp representing the time when the resource fetch started. This value is equivalent to fetchStart. |
transferSize | A number representing the size (in octets) of the fetched resource. The size includes the response header fields as well as the response payload body. |
workerStart | A DOMHighResTimeStamp. If a Service Worker thread is already running, it is the time immediately before the FetchEvent is dispatched; otherwise it is the time immediately before the Service Worker thread is started. If the resource is not intercepted by a Service Worker, the value is always 0. |
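A minimal sketch of turning resource entries into a compact report follows. The `ResourceEntryLike` and `ResourceReport` shapes, the `summarizeResources` name, and the default list of initiator types are assumptions made for the example; in a browser the input would be `performance.getEntriesByType('resource')`.

```typescript
// Subset of the resource timing fields used by this sketch.
interface ResourceEntryLike {
  name: string;
  initiatorType: string;
  duration: number;
  transferSize: number;
}

// Compact record suitable for reporting.
interface ResourceReport {
  url: string;
  type: string;
  duration: number; // rounded to whole milliseconds
  transferSize: number;
}

// Keep only the chosen initiator types and map each entry to a report record.
function summarizeResources(
  entries: ResourceEntryLike[],
  types: string[] = ["script", "link", "img", "css"],
): ResourceReport[] {
  return entries
    .filter((entry) => types.indexOf(entry.initiatorType) !== -1)
    .map((entry) => ({
      url: entry.name,
      type: entry.initiatorType,
      duration: Math.round(entry.duration),
      transferSize: entry.transferSize,
    }));
}
```

Filtering by initiator type keeps the reporting requests themselves (for example beacon entries) out of the static resource statistics.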
Calculation methods of other indicators
Indicator | Description | Calculation |
---|---|---|
DNS lookup | Time spent in the DNS phase | domainLookupEnd - domainLookupStart |
TCP connection | Time spent in the TCP phase | connectEnd - connectStart |
SSL connection | Time spent establishing the SSL connection | connectEnd - secureConnectionStart |
First-byte network request | Time to first byte (TTFB) | responseStart - requestStart |
Content transfer | Time spent in the response phase transferring content | responseEnd - responseStart |
DOM parsing | DOM parsing time | domInteractive - responseEnd |
Resource loading | Resource loading time | loadEventStart - domContentLoadedEventEnd |
First byte | Time to first byte, measured from fetchStart | responseStart - fetchStart |
DOM ready | Time until the DOM is ready | domContentLoadedEventEnd - fetchStart |
Redirect | Redirect time | redirectEnd - redirectStart |
DOM render | DOM rendering time | domComplete - domLoading |
load | Page load time | loadEventEnd - navigationStart |
unload | Page unload time | unloadEventEnd - unloadEventStart |
Request | Request time | responseEnd - requestStart |
White screen | White screen time | domLoading - navigationStart |
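Several of the formulas in the table can be computed from a single timing object, as in the sketch below. The `TimingLike` shape and the `computeTimingMetrics` name are illustrative. In a browser the input could be `performance.timing`, which is deprecated in favor of PerformanceNavigationTiming but still widely available. The guard on `secureConnectionStart` is an addition of this sketch, since that field is 0 when no secure connection is used.

```typescript
// Subset of the performance.timing fields used by the formulas above.
interface TimingLike {
  navigationStart: number;
  fetchStart: number;
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  connectEnd: number;
  secureConnectionStart: number;
  requestStart: number;
  responseStart: number;
  responseEnd: number;
  domInteractive: number;
  domContentLoadedEventEnd: number;
  loadEventStart: number;
  loadEventEnd: number;
}

// Apply the formulas from the table; all results are in milliseconds.
function computeTimingMetrics(t: TimingLike) {
  return {
    dns: t.domainLookupEnd - t.domainLookupStart,
    tcp: t.connectEnd - t.connectStart,
    // Report 0 instead of a huge value when there was no secure handshake.
    ssl: t.secureConnectionStart > 0 ? t.connectEnd - t.secureConnectionStart : 0,
    ttfb: t.responseStart - t.requestStart,
    contentTransfer: t.responseEnd - t.responseStart,
    domParse: t.domInteractive - t.responseEnd,
    resourceLoad: t.loadEventStart - t.domContentLoadedEventEnd,
    domReady: t.domContentLoadedEventEnd - t.fetchStart,
    load: t.loadEventEnd - t.navigationStart,
  };
}
```

These values are only meaningful after the load event has finished, because loadEventEnd is 0 until then.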
Error data collection scheme
There are three types of errors that can be caught:
- Resource loading errors: use addEventListener('error', callback, true) to catch resource load failures in the capture phase.
- JS execution errors: use window.onerror to catch JS errors.
  - Cross-origin scripts only report "Script error." and do not expose the specific error message or stack information. To get the details, add the crossorigin="anonymous" attribute to the script tag, and configure CORS on the resource server, for example Access-Control-Allow-Origin: *
- Promise errors: use addEventListener('unhandledrejection', callback) to catch unhandled promise rejections. Line and column numbers are not available, so you can only rely on the error message that was thrown manually.
// Collector used by the listeners below (assumed to exist in the original setup)
const monitor = { errors: [] }
// Catch resource load failures during the capture phase
addEventListener('error', e => {
  const target = e.target
  // Errors whose target is window are JS errors, handled by window.onerror
  if (target != window) {
    monitor.errors.push({
      type: target.localName,
      url: target.src || target.href,
      msg: (target.src || target.href) + ' is load error',
      time: Date.now()
    })
  }
}, true)
// Listen for JS errors
window.onerror = function (msg, url, row, col, error) {
  monitor.errors.push({
    type: 'javascript',
    row: row,
    col: col,
    // Prefer the stack trace when the browser provides an Error object
    msg: error && error.stack ? error.stack : msg,
    url: url,
    time: Date.now()
  })
}
// Listen for unhandled promise rejections
addEventListener('unhandledrejection', e => {
  monitor.errors.push({
    type: 'promise',
    // Error objects expose their text as "message"
    msg: (e.reason && e.reason.message) || e.reason || '',
    time: Date.now()
  })
})
Data reporting scheme
Two issues need to be considered when reporting data:
- If the data reporting interface and the business system share the same domain name, reporting requests may compete with business requests for network resources, because browsers limit the number of concurrent requests per domain.
- Browsers typically ignore asynchronous Ajax requests issued during page unload. If data must be sent at that point, the usual workaround is to create a synchronous Ajax request in the unload or beforeunload event, which delays the page unload. For the user, this means page navigation feels slow.
Beacon
Except for Internet Explorer, mainstream modern browsers support Beacon very well. See the Beacon documentation on MDN.
The Beacon interface is used to schedule asynchronous, non-blocking requests to a web server.
- Beacon requests use the HTTP POST method and do not require a response.
- Beacon requests are guaranteed to be initiated before the page is unloaded.
In plain terms, Beacon sends data to the server asynchronously and guarantees that the request is sent before the page finishes unloading, which solves the problem of Ajax requests being terminated when the page unloads. It is used as follows:
navigator.sendBeacon(url, data);
The data argument is optional and can be an ArrayBufferView, Blob, DOMString, or FormData. The method returns true if the browser successfully queued the Beacon request for delivery, and false otherwise.
When using Beacon, the back end must accept the parameters through the POST method. For cross-origin reporting, the back-end interface also needs to be configured for CORS. In addition, the request headers must be CORS-safelisted request headers, which means Content-Type must be application/x-www-form-urlencoded, multipart/form-data, or text/plain.
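type ContentType = 'application/x-www-form-urlencoded' | 'multipart/form-data' | 'text/plain';
// Serialize the params as base64-encoded JSON.
// Note: btoa only accepts Latin-1 characters, so other text must be escaped first.
const serilizeParams = (params: object) => {
  return window.btoa(JSON.stringify(params))
}
function sendBeacon(url: string, params: object) {
  // FormData is sent as multipart/form-data, a CORS-safelisted Content-Type
  const formData = new FormData()
  formData.append('params', serilizeParams(params))
  // true if the browser queued the request, false otherwise
  return navigator.sendBeacon(url, formData)
}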
Image
Compatibility issues with sendBeacon are unavoidable. As a fallback, you can take advantage of the fact that most browsers finish loading images before the page is unloaded, and report data by adding an img element to the page.
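function sendImage(url: string, params: object) {
  const img = new Image()
  img.style.display = 'none'
  // Remove the element once the request has finished, whether it succeeded or not
  const removeImage = function() {
    img.remove()
  }
  img.onload = removeImage
  img.onerror = removeImage
  // Base64 output may contain "+", "/" and "=", so encode it for use in a query string
  img.src = `${url}?params=${encodeURIComponent(serilizeParams(params))}`
  document.body.appendChild(img)
}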
Because an image is loaded with a GET request, and servers limit the length of URIs, an HTTP 414 error occurs when the URL exceeds the server's limit. It is therefore necessary to pay attention to the reporting frequency and to reduce the number of attributes uploaded at one time.
HTTP 1.1 defines status code 414 Request-URI Too Long for cases where a server-defined limit is reached; see RFC 2616 for further details. For client-defined limits, there is no point in the server returning anything, because the server won't receive the request at all.
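One way to avoid 414 errors is to check the URL length before issuing the image request. The sketch below is only an illustration: the `buildReportUrl` name is hypothetical, and the 2048-character budget is an assumed placeholder, since real limits differ between servers and proxies.

```typescript
// Assumed budget; real limits differ by server and proxy.
const MAX_URL_LENGTH = 2048;

// Build the reporting URL, or return null when it would exceed the budget.
function buildReportUrl(
  base: string,
  payload: string,
  maxLength: number = MAX_URL_LENGTH,
): string | null {
  // Append to an existing query string if the base URL already has one.
  const separator = base.indexOf("?") === -1 ? "?" : "&";
  const url = `${base}${separator}params=${encodeURIComponent(payload)}`;
  return url.length <= maxLength ? url : null;
}
```

When the function returns null, the caller could split the payload into smaller batches or switch to a POST-based transport instead of sending the image request.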
Compatibility scheme
sendBeacon is preferred, and Image is used as the fallback.
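function sendLog(url: string, params: object) {
  // Prefer sendBeacon; fall back to an image request where it is unavailable
  if (typeof navigator.sendBeacon === 'function') {
    sendBeacon(url, params)
  } else {
    sendImage(url, params)
  }
}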