1. The current state of front-end monitoring
Front-end monitoring has become increasingly popular in recent years, and there are now many mature products to choose from, as shown in the following figure.
With so many monitoring platforms available, why learn to build front-end monitoring yourself?
- On the one hand, commercial platforms cost money.
- On the other hand, my own project needs customized functionality.
2. The purpose of front-end monitoring
- Improve user experience
- Discover, locate, and resolve exceptions faster
- Understand business data to guide product upgrades: data-driven thinking
3. The front-end monitoring process
3.1 Collection
The first step of front-end monitoring is data collection. The collected data includes environment information, performance information, exception information, and business information.
3.1.1 Environment Information
Environment information is essential for every monitoring system. When troubleshooting, we need to know which page the problem came from, which browser was in use, and who the user was, so that the problem can be located and solved quickly. Common environment information includes:
- url: the page being monitored, which may have performance or exception problems. The value can be obtained from window.location.href.
- ua: the user's userAgent string, which includes the type and version of the operating system and browser. The value can be obtained from window.navigator.userAgent.
- token: identifies the current user. On the one hand, it links all of that user's monitoring records together, which makes data analysis easier; on the other hand, it lets you look up everything the user did, which makes the problem easier to reproduce.
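The three fields above could be gathered in one place roughly as follows. This is a minimal sketch, not a definitive implementation: `collectEnvironment` and the `getToken` callback are hypothetical names, and the window object is passed in as a parameter only so the function can also be exercised outside a browser.

```javascript
// Sketch: gather the environment fields described above.
// `win` is a window-like object; `getToken` is a hypothetical callback
// that returns whatever identifier the project uses for the current user.
function collectEnvironment(win, getToken) {
  return {
    url: win.location.href,       // the page being monitored
    ua: win.navigator.userAgent,  // OS and browser type/version
    token: getToken ? getToken() : null, // null when no user is known
  };
}
```

In a page this might be called as `collectEnvironment(window, () => currentUserId)`, where `currentUserId` stands for however the application stores the user identifier.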
3.1.2 Performance information
Page performance has a direct impact on user retention. Google DoubleClick research found that 53% of mobile visits are abandoned if a page takes longer than three seconds to load. The BBC found that it loses a further 10% of users for every additional second a page takes to load. So our goal is to improve page performance. Which metrics do we need to monitor to do that?
3.1.2.1 Metric categories
There are many metrics. I group them into two layers: the network layer and the page display layer.
1. Network layer
At the network layer, the metrics include redirect time, DNS resolution time, TCP connection time, SSL connection time, TTFB (time to first byte), data transfer time, and resource loading time. Each metric is explained in the following table:
Metric | Description |
---|---|
Redirect time | Time spent on redirects |
DNS resolution time | Time spent resolving the domain name after the address is entered in the browser; it can also indicate whether the server is working properly |
TCP connection time | Time spent establishing the TCP connection |
SSL connection time | Time spent on the SSL handshake that secures the connection and protects data integrity |
TTFB network request time | Time from sending the request until the browser receives the first byte of the response |
Data transfer time | Time the browser takes to receive the response content |
Resource loading time | Time between the DOM being built and the page finishing loading |
2. Page display layer
The metrics at the page display layer are designed around user experience and include FP, FCP, LCP, FMP, DCL, L, and others. These are the metrics shown in the Performance panel in Chrome (as shown in the figure).
Each metric is explained in the following table.
Metric | Description |
---|---|
FP (First Paint) | First paint: the point at which the browser renders anything visually different from what was on the screen before the navigation. |
FCP (First Contentful Paint) | First contentful paint: the point at which the browser renders the first piece of content from the DOM, which could be text, an image, SVG, or even a canvas element. |
LCP (Largest Contentful Paint) | Largest contentful paint: the point at which the largest content element visible in the viewport appears on the screen. |
FMP (First Meaningful Paint) | First meaningful paint: the point at which the main content of the page begins to appear on the screen. It is a primary measure of the user's loading experience. |
DCL (DOMContentLoaded) | The DOMContentLoaded event fires when the HTML document has been fully loaded and parsed, without waiting for stylesheets, images, and subframes to finish loading. |
L (onLoad) | Fires when the page and all of its dependent resources have finished loading. |
TTI (Time to Interactive) | Time to interactive: the point at which the application is visually rendered and can reliably respond to user input. |
FID (First Input Delay) | First input delay: the time between the user's first interaction with the page (clicking a link, clicking a button, etc.) and the moment the browser can begin responding to that interaction. |
3.1.2.2 Calculating the metrics
How do we obtain all these metrics? The browser provides a dedicated interface for this: window.performance. Through it you can read the performance-related timestamps. The following takes Baidu.com as an example to show the values behind these metrics:
The timing property of window.performance holds the values needed to calculate the metrics above. Compare those attribute values with the following flow chart of the page-loading process, and the whole sequence becomes clear.
With these values, let's calculate the metrics one by one:
1. Network layer
Metric | Calculation |
---|---|
Redirect time | redirectEnd - redirectStart |
DNS resolution time | domainLookupEnd - domainLookupStart |
TCP connection time | connectEnd - connectStart |
SSL connection time | connectEnd - secureConnectionStart |
TTFB network request time | responseStart - requestStart |
Data transfer time | responseEnd - responseStart |
Resource loading time | loadEventStart - domContentLoadedEventEnd |
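The network-layer formulas can be wrapped in a single helper. The sketch below assumes it is given an object shaped like window.performance.timing; the function name `networkMetrics` and the result keys are illustrative choices, not part of any API. The one addition beyond the table is a guard for plain HTTP pages, where secureConnectionStart is 0 and the subtraction would otherwise produce a meaningless value.

```javascript
// Sketch: compute the network-layer metrics from a timing object
// shaped like window.performance.timing (all values in milliseconds).
function networkMetrics(timing) {
  // secureConnectionStart is 0 when the page is not served over HTTPS.
  const ssl = timing.secureConnectionStart > 0
    ? timing.connectEnd - timing.secureConnectionStart
    : 0;
  return {
    redirect: timing.redirectEnd - timing.redirectStart,
    dns: timing.domainLookupEnd - timing.domainLookupStart,
    tcp: timing.connectEnd - timing.connectStart,
    ssl,
    ttfb: timing.responseStart - timing.requestStart,
    transfer: timing.responseEnd - timing.responseStart,
    resourceLoad: timing.loadEventStart - timing.domContentLoadedEventEnd,
  };
}
```

In a browser it would typically be called as `networkMetrics(window.performance.timing)` after the load event has fired, so that every timestamp has been filled in.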
2. Page display layer
Google engineers have been pushing for user-centric performance metrics, so the display layer has changed a lot and each metric is obtained in a slightly different way:
- FP and FCP
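Obtained through window.performance.getEntriesByType('paint'):
const paint = window.performance.getEntriesByType('paint');
// Look the entries up by name rather than by index; either one may be missing.
const FP = paint.find((entry) => entry.name === 'first-paint')?.startTime;
const FCP = paint.find((entry) => entry.name === 'first-contentful-paint')?.startTime;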
- LCP
function getLCP() {
  new PerformanceObserver((entryList, observer) => {
    const entries = entryList.getEntries();
    // The most recent entry is the current LCP candidate.
    const lastEntry = entries[entries.length - 1];
    observer.disconnect();
    console.log('LCP', lastEntry.renderTime || lastEntry.loadTime);
  }).observe({ entryTypes: ['largest-contentful-paint'] });
}
- FMP
function getFMP() {
  // 'element' entries come from the Element Timing API, which only reports
  // elements marked with the elementtiming attribute.
  new PerformanceObserver((entryList, observer) => {
    const entries = entryList.getEntries();
    observer.disconnect();
    console.log('FMP', entries);
  }).observe({ entryTypes: ['element'] });
}
- DCL
domContentLoadedEventEnd - fetchStart
- L
loadEventStart - fetchStart
- TTI
domInteractive - fetchStart
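The DCL, L, and TTI formulas share the same starting point, fetchStart, so they can be computed together. The following sketch assumes a timing object shaped like window.performance.timing; `pageMilestones` is a hypothetical name. Timestamps in that object stay at 0 until the corresponding event has happened, so the helper returns null instead of a negative number in that case.

```javascript
// Sketch: DCL, L and TTI relative to fetchStart, using the formulas above.
function pageMilestones(timing) {
  // A value of 0 means the event has not fired yet, so report null.
  const sinceFetchStart = (value) =>
    (value > 0 ? value - timing.fetchStart : null);
  return {
    dcl: sinceFetchStart(timing.domContentLoadedEventEnd),
    load: sinceFetchStart(timing.loadEventStart),
    tti: sinceFetchStart(timing.domInteractive),
  };
}
```

Note that domInteractive is used here only because the article's formula uses it; it is an approximation of TTI rather than a full measurement of it.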
- FID
function getFID() {
new PerformanceObserver((entryList, observer) => {
let firstInput = entryList.getEntries()[0];
if (firstInput) {
const FID = firstInput.processingStart - firstInput.startTime;
console.log('FID', FID);
}
observer.disconnect();
}).observe({type: 'first-input', buffered: true});
}
3.1.3 Exception Information
For a website, exceptions are the most damaging problem and have the greatest impact on user experience, so they need focused monitoring. Exception information falls into two types: runtime errors and interface errors. Let's look at each in turn.
1. Runtime errors
Seven types of errors can occur when JavaScript runs: SyntaxError, TypeError, RangeError, ReferenceError, EvalError, URIError, and resource load errors. To catch them you need to consider two scenarios, non-Promise and Promise, because each needs a different capture strategy.
1. Non-Promise scenarios
In non-Promise scenarios, errors can be caught by listening for the error event. The errors it catches fall into two categories: resource errors and code errors. A resource error means that a JS, CSS, or image file failed to load. It can only be caught in the capture phase, and event.target.localName has a value for resource errors, which is how they are distinguished from code errors. A code error is a syntax error, type error, and so on; for these you can get the error message, stack, and other details needed for troubleshooting.
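export function listenerError() {
  window.addEventListener('error', (event) => {
    // Resource errors target an element, so localName is set; code errors target window.
    if (event.target.localName) {
      console.log('This is a resource error', event);
    } else {
      console.log('This is a code error', event);
    }
  }, true); // true: listen in the capture phase, because resource errors do not bubble
}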
2. Promise scenarios
Promise scenarios are handled differently. When a Promise is rejected and has no rejection handler, an unhandledrejection event is fired, so you can catch these errors by listening for the unhandledrejection event.
export function listenerPromiseError() {
  window.addEventListener('unhandledrejection', (event) => {
    console.log('This is an error in the Promise scenario', event);
  });
}
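Logging the raw event is enough for a demo, but before reporting it helps to reduce each event to a small, uniform record. The sketch below shows one possible shape covering both the error and unhandledrejection listeners; `toErrorRecord` and the record field names (`kind`, `tag`, `source`, and so on) are assumptions made for illustration, while the event properties it reads (message, filename, lineno, colno, error, reason) are the standard ones.

```javascript
// Sketch: turn an error or unhandledrejection event into a plain record.
function toErrorRecord(event) {
  if (event.type === 'unhandledrejection') {
    const reason = event.reason;
    return {
      kind: 'promise',
      message: reason && reason.message ? reason.message : String(reason),
      stack: reason && reason.stack ? reason.stack : null,
    };
  }
  const target = event.target;
  if (target && target.localName) {
    // Resource error: the target is the element that failed to load.
    return {
      kind: 'resource',
      tag: target.localName,
      source: target.src || target.href || null,
    };
  }
  // Code error: the ErrorEvent itself carries the details.
  return {
    kind: 'code',
    message: event.message,
    source: event.filename,
    line: event.lineno,
    column: event.colno,
    stack: event.error && event.error.stack ? event.error.stack : null,
  };
}
```

Each listener would then call `toErrorRecord(event)` and hand the result to whatever reporting function the project uses.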
2. Interface errors
In the browser, every interface call goes through either XHR or Fetch. To catch interface errors, you can override their methods and then judge the state of each call from the information it returns. The following XHR example shows how the wrapping is done.
function newXHR() {
  const XMLHttpRequest = window.XMLHttpRequest;
  const oldXHROpen = XMLHttpRequest.prototype.open;
  // Use regular functions, not arrow functions, so that `this` is the XHR
  // instance and `arguments` holds the caller's arguments.
  XMLHttpRequest.prototype.open = function (method, url, async) {
    // do some data reporting here
    return oldXHROpen.apply(this, arguments);
  };
  const oldXHRSend = XMLHttpRequest.prototype.send;
  XMLHttpRequest.prototype.send = function (body) {
    // do some data reporting here
    return oldXHRSend.apply(this, arguments);
  };
}
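The same wrapping idea applies to Fetch. The following is a sketch under stated assumptions rather than a definitive implementation: `wrapFetch` and the `report` callback are hypothetical names, and the shape of the reported object is illustrative. The wrapper reports non-2xx responses and network failures, and otherwise passes the original result or error through unchanged.

```javascript
// Sketch: wrap a fetch function so failed calls are reported.
// `originalFetch` is the function to wrap; `report` is a hypothetical
// callback that receives one plain object per failed call.
function wrapFetch(originalFetch, report) {
  return function monitoredFetch(input, init) {
    const url = typeof input === 'string' ? input : input.url || String(input);
    const start = Date.now();
    return originalFetch(input, init).then(
      (response) => {
        if (!response.ok) {
          report({
            type: 'http',
            url,
            status: response.status,
            duration: Date.now() - start,
          });
        }
        return response; // callers still get the untouched response
      },
      (error) => {
        report({
          type: 'network',
          url,
          message: String(error),
          duration: Date.now() - start,
        });
        throw error; // keep the rejection visible to the caller
      }
    );
  };
}
```

In a page it could be installed with `window.fetch = wrapFetch(window.fetch.bind(window), report)`, where `report` is whatever reporting function the project provides.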
3.1.4 Business information
Every product has its own business information, such as how long users stay online, PV, UV, and user distribution. Only with this information can we understand the current state of the product clearly, so that the product manager can better plan its future direction. Because this information differs so much from product to product, the collection code has to be written to fit each case, so I won't go into it here.
3.2 Reporting
There are essentially two ways to report data: with Ajax, or with an Image. Many large companies currently report with a 1×1 pixel GIF image. Since they all use this strategy, let's talk through the following two questions.
- Why report with an Image?
  - No cross-domain problems. The data server and the back-end server are likely to have different domain names. With Ajax, cross-domain access has to be handled explicitly, otherwise the browser blocks the request.
  - It does not block page loading; all it takes is creating a new Image object.
- There are many image formats. Why report with a GIF? It boils down to one word: small. For a 1×1 px image, a BMP file needs 74 bytes, a PNG file needs 67 bytes, and a GIF file needs only 43 bytes. For the same response, GIF saves about 41% of the traffic compared with BMP and about 35% compared with PNG, so GIF is chosen for reporting.
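Image reporting comes down to encoding the data into the query string of an image URL and assigning that URL to a new Image object. A minimal sketch follows; the function names and the endpoint used in the usage note are hypothetical, and the Image constructor is passed as a parameter only so the code can be exercised outside a browser.

```javascript
// Sketch: encode the report data as query parameters on the image URL.
function buildBeaconUrl(endpoint, data) {
  const query = Object.keys(data)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(String(data[key]))}`)
    .join('&');
  return `${endpoint}?${query}`;
}

// Sketch: send the report by requesting the image.
// ImageCtor defaults to the browser's Image constructor.
function reportByImage(endpoint, data, ImageCtor = globalThis.Image) {
  const img = new ImageCtor();
  img.src = buildBeaconUrl(endpoint, data); // assigning src triggers the GET request
  return img;
}
```

A call might look like `reportByImage('https://log.example.com/r.gif', { kind: 'code', message: 'x is not defined' })`, with the server answering with the 1×1 GIF. Because everything travels in the URL, this approach suits small payloads.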
3.3 Analysis
After the logs are reported, they need to be cleaned, the required content extracted, and the results stored. Depending on the data volume, there are two approaches: a single machine or a cluster.
1. Single machine
Websites with little traffic and few logs can analyze the data on a single machine. For example, Node.js can read the log files, extract the required information, and store the processed results in a database.
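As a rough illustration of the single-machine approach, the sketch below counts log records by type. It assumes a log format the article does not specify: one JSON object per line with a `kind` field. Both the format and the name `summarizeLog` are assumptions made for the example.

```javascript
// Sketch: count log records per kind.
// Assumed format (not from the article): one JSON object per line,
// each with a `kind` field such as "code", "resource" or "promise".
function summarizeLog(text) {
  const counts = {};
  for (const line of text.split('\n')) {
    if (!line.trim()) continue; // skip blank lines
    let record;
    try {
      record = JSON.parse(line);
    } catch (err) {
      continue; // skip malformed lines rather than abort the whole run
    }
    const key = (record && record.kind) || 'unknown';
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}
```

In Node.js the input could come from `require('node:fs').readFileSync(path, 'utf8')`, and the resulting counts would then be written to the database of your choice.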
2. Cluster
Many products have heavy traffic and a large volume of logs, so a distributed system such as Hadoop should be used to process them. The processing flowchart is as follows:
Choose the analysis method that fits your log volume. What fits is best; there is no need to blindly pursue the most advanced processing method.
3.4 Alerting
When the number of exceptions of a given type exceeds a threshold, an alert must be sent so that the responsible staff can handle the problem and limit the damage in time. Different alert levels call for different channels.
- Email: general alerts
- SMS: serious alerts, where some services are already affected
- Phone call: particularly serious alerts, for example when the system is down
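The level-to-channel mapping above can be expressed as a small function. In this sketch the alert level is derived from the exception count alone, which is a simplifying assumption; the threshold values and the names `DEFAULT_THRESHOLDS` and `chooseAlertChannel` are hypothetical and would need tuning for a real project.

```javascript
// Hypothetical thresholds: exception counts within one monitoring window.
const DEFAULT_THRESHOLDS = { email: 10, sms: 50, phone: 200 };

// Sketch: pick the alert channel for a given exception count.
function chooseAlertChannel(errorCount, thresholds = DEFAULT_THRESHOLDS) {
  if (errorCount > thresholds.phone) return 'phone'; // particularly serious
  if (errorCount > thresholds.sms) return 'sms';     // serious
  if (errorCount > thresholds.email) return 'email'; // general
  return null; // at or below every threshold: no alert
}
```

A real system would likely also weigh which services are affected, as the list above suggests, rather than relying on the count alone.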