Synthetic monitoring in practice at Zhuanzhuan

Why you need to monitor

As we all know, performance plays an important role in site retention and conversion rates, especially for e-commerce sites like Zhuanzhuan, where performance indirectly affects the company’s revenue.

But if you were asked to describe the performance of your web application, what would you say? You might answer: better than most apps on the market. But what if you had to say exactly what is good about it? This is where site performance monitoring comes in: let the data speak. But what should we monitor, and how?

What to monitor

First we need to be clear about what to monitor; in other words, how should the performance of a site be measured?

Google first proposed the RAIL model for measuring application performance. RAIL stands for Response, Animation, Idle, and Load, four distinct aspects of a web application’s life cycle. The model sets the following targets: respond to user input as quickly as possible, preferably within 100 ms; when displaying animations, render each frame within 16 ms to keep motion consistent and avoid stutter; maximize idle time by splitting work on the JS main thread into tasks of less than 50 ms each, so the thread stays free for user interaction; and load the site in less than 1 s where possible, 5 s at most.

Google has also defined a number of performance metrics based on user experience:

FCP (First Contentful Paint): the time at which the browser first renders any content from the DOM. Content includes text, images, non-white canvas, and SVG, as well as text that is still waiting on a web font. This is the first moment the user sees anything on the page.

LCP (Largest Contentful Paint): the time at which the largest content element in the visible area is rendered on the screen, used to estimate when the main content of the page becomes visible to the user. Candidate elements include img images, the cover image of video elements, backgrounds loaded via URL, and text nodes. To provide a good user experience, the site should render the largest content within 2.5 s.

FID (First Input Delay): the time from the user’s first interaction with the page to the moment the browser is actually able to respond to that interaction. Input is delayed because the browser’s main thread is busy doing other work and cannot respond to the user. A common cause is the browser being busy parsing and executing the JavaScript loaded by the application.

TTI (Time to Interactive): the point at which the page first becomes fully interactive and the browser can respond to user input continuously. It is marked at the end of the last long task that is followed by 5 seconds in which both the network and the main thread are idle. Given this definition, a name such as "time to sustained interaction" or "time to smooth interaction" would describe it more accurately.

TBT (Total Blocking Time): the total time between FCP and TTI during which the main thread is blocked long enough to prevent a response to input. The main thread is considered blocked whenever a task runs on it for more than 50 ms (a long task); the portion of each long task beyond 50 ms counts as blocking time.
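The TBT definition can be made concrete with a small calculation. The following is a minimal sketch, not Lighthouse’s implementation: the `MainThreadTask` shape and the function name are hypothetical, and it simplifies by counting only tasks that start inside the FCP-to-TTI window rather than clipping tasks that straddle either boundary.

```typescript
// Hypothetical shape for a main-thread task; all times are in milliseconds.
interface MainThreadTask {
  start: number;
  duration: number;
}

// A task longer than this is a "long task".
const LONG_TASK_THRESHOLD_MS = 50;

// Sum the blocking portion (time beyond 50 ms) of every long task
// that starts between FCP and TTI.
// Simplification: tasks straddling FCP or TTI are not clipped.
function totalBlockingTime(
  tasks: MainThreadTask[],
  fcp: number,
  tti: number
): number {
  let total = 0;
  for (const task of tasks) {
    const inWindow = task.start >= fcp && task.start < tti;
    if (inWindow && task.duration > LONG_TASK_THRESHOLD_MS) {
      total += task.duration - LONG_TASK_THRESHOLD_MS;
    }
  }
  return total;
}
```

For tasks of 250 ms, 90 ms, 35 ms, and 155 ms inside the window, only the three long tasks contribute: 200 + 40 + 105 = 345 ms.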

CLS (Cumulative Layout Shift): the cumulative score of all unexpected layout shifts that occur over the entire life cycle of the page. It measures the visual stability of the page, which is an important part of the user experience.

Google later decided that this was too many metrics and, in 2020, narrowed them down to three Core Web Vitals. The idea is that a site performs well enough as long as it does well on loading (LCP), interactivity (FID), and visual stability (CLS).
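To let the data speak, each of the three metrics can be mapped to a rating. The sketch below uses the thresholds Google published for these metrics (LCP 2.5 s and 4 s, FID 100 ms and 300 ms, CLS 0.1 and 0.25); the type and function names are illustrative assumptions, not part of any library.

```typescript
type Rating = "good" | "needs-improvement" | "poor";
type Metric = "LCP" | "FID" | "CLS";

// [upper bound of "good", upper bound of "needs-improvement"].
// LCP and FID are in milliseconds; CLS is a unitless score.
const THRESHOLDS: Record<Metric, [number, number]> = {
  LCP: [2500, 4000],
  FID: [100, 300],
  CLS: [0.1, 0.25],
};

// Classify a measured value against the thresholds for its metric.
function rate(metric: Metric, value: number): Rating {
  const [good, needsImprovement] = THRESHOLDS[metric];
  if (value <= good) return "good";
  if (value <= needsImprovement) return "needs-improvement";
  return "poor";
}
```

A page with an LCP of 2400 ms would be rated "good", while a CLS of 0.3 would be rated "poor".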

Zhuanzhuan has also built up its own set of performance metrics over time, including a self-developed FMP metric calculated from DOM weights, which is used together with white-screen time, the rate of pages that open within one second, and DOM loading time to evaluate the performance of a site.

How to monitor

With metrics defined, the next step is to monitor them. Monitoring shows the performance status and trend of a web application, which helps locate bottlenecks and improve service stability. It also reveals the impact of each release on performance and on the probability of service errors.

Mainstream monitoring today falls into two categories: synthetic monitoring and real user monitoring.

Synthetic monitoring runs your page in a simulated environment and extracts performance metrics to produce an audit report. Real user monitoring is an application service: the monitored web application integrates it through an SDK or similar means, and performance and interaction data from real user visits are collected and reported to our logs. The data is cleaned and processed on the server into performance analysis reports, which are then displayed on our monitoring platform.

A comparison of the two types of monitoring

Comparing the advantages and disadvantages of the two shows that synthetic monitoring is better suited to qualitative analysis of specific scenarios, or to monitoring small amounts of data as part of CI, while real user monitoring is better suited to quantitative analysis that combines large volumes of data for in-depth mining. We currently have an implementation of each, our internal detection platform and our performance platform, which complement each other to cover the monitoring of performance metrics.

The performance platform for real user monitoring has already been covered in an earlier article on our official account, so the rest of this article focuses on how to carry out synthetic monitoring.

Synthetic monitoring: Lighthouse

The most popular synthetic monitoring tool is Google’s Lighthouse, an open source automated tool for analyzing and improving the quality of web applications. There are four ways to run Lighthouse: Chrome DevTools, the Chrome extension, the Node CLI, and the Node module.

Running Lighthouse from Chrome DevTools, the most commonly used and convenient of these, produces the following report.

Although the report covers most of the indicators, this approach still has some limitations:

It cannot audit pages that require a login
There are too many miscellaneous metrics, which need to be customized and differentiated
There is no platform from which to see the overall picture

Most companies in the industry therefore build their own synthetic monitoring platform and solve these problems programmatically through the Node module.

How Lighthouse runs

Running Lighthouse programmatically requires a thorough understanding of how Lighthouse runs.

The architecture diagram on the official website breaks Lighthouse into four modules: Driver, Gatherers, Audits, and Report.

The Lighthouse Driver interacts with the browser through the Chrome DevTools Protocol and executes a series of commands. The Gatherers module collects information while the page loads and produces artifacts. In the auditing phase, these artifacts serve as the input to each audit’s logic, which applies a set of defined audit criteria and outputs scores, optimization suggestions, details, descriptions, explanations, display modes, errors, and other information. The final result is the LHR data, from which the UI report is generated.

Solving the problems

With that in mind, how can the way Lighthouse runs help us solve these problems?

The login problem

Pointing Lighthouse directly at a page that requires a login state ends up auditing the login page instead, which is obviously not what we want.

The most convenient and flexible solution is to bring in Puppeteer, which uses the Chrome DevTools Protocol, to simulate a user login.

There are two ways to combine Puppeteer with Lighthouse. One is to launch the browser with Puppeteer and then hand control to Lighthouse; the other is to launch the browser with Lighthouse (chrome-launcher) and then hand control to Puppeteer. Here we take the first approach.
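A minimal sketch of the first approach follows. To keep it self-contained, the browser and Lighthouse are passed in through small interfaces (`PageLike`, `BrowserLike`, `Deps`, all hypothetical names) that mirror the Puppeteer and Lighthouse calls used; in a real project `launch` would typically be `puppeteer.launch` and `lighthouse` the function exported by the `lighthouse` package. The selectors `#username`, `#password`, and `#submit` are assumptions about the login form.

```typescript
// Structural stand-ins for the parts of Puppeteer and Lighthouse used here.
interface PageLike {
  goto(url: string): Promise<unknown>;
  type(selector: string, text: string): Promise<void>;
  click(selector: string): Promise<void>;
  waitForNavigation(): Promise<unknown>;
}
interface BrowserLike {
  newPage(): Promise<PageLike>;
  wsEndpoint(): string;
  close(): Promise<void>;
}
interface LighthouseFlags {
  port: number;
  output: string;
  disableStorageReset: boolean;
}
interface Deps {
  launch(): Promise<BrowserLike>;
  lighthouse(url: string, flags: LighthouseFlags): Promise<{ lhr: unknown }>;
}

// Log in with the Puppeteer-controlled browser, then let Lighthouse
// audit the target page in that same browser.
async function auditBehindLogin(
  deps: Deps,
  loginUrl: string,
  targetUrl: string,
  user: string,
  password: string
): Promise<unknown> {
  const browser = await deps.launch();
  try {
    const page = await browser.newPage();
    await page.goto(loginUrl);
    await page.type("#username", user); // assumed selector
    await page.type("#password", password); // assumed selector
    // Start waiting for navigation before clicking to avoid a race.
    await Promise.all([page.waitForNavigation(), page.click("#submit")]);

    // Lighthouse attaches to the already running browser through its debugging port.
    const port = Number(new URL(browser.wsEndpoint()).port);
    const { lhr } = await deps.lighthouse(targetUrl, {
      port,
      output: "json",
      // Keep cookies and storage, otherwise the login state would be cleared.
      disableStorageReset: true,
    });
    return lhr;
  } finally {
    await browser.close();
  }
}
```

Injecting the dependencies is a design choice for this sketch only: it keeps the login flow readable and lets it be exercised with fakes, without claiming anything about how the internal platform is wired.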

Customization

The default audit metrics are only a general-purpose detection model. In practice, we need to develop different detection models for different kinds of business. For example, on PC we may write many nested components because the logic is complex, so the depth of the DOM tree is a metric worth watching; on mobile, where we care more about performance and experience, the presence of a horizontal scrollbar is a good metric to check.

How to collect these indicators? The answer is in Lighthouse’s Gatherers module.

Every gatherer inherits from the same parent class, Gatherer, which defines three template methods; a subclass only needs to implement the template methods it is interested in.

For example, to collect the title of a page, we can write a Gatherer.
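A minimal sketch of such a title gatherer is shown here. It does not import Lighthouse: the `Gatherer` base class, `DriverLike`, `PassContext`, and `runGatherer` are simplified stand-ins, with the three template methods modeled on `beforePass`, `pass`, and `afterPass` from Lighthouse’s legacy gatherer API. The exact names and driver interface vary between Lighthouse versions, so treat them as assumptions.

```typescript
// Stand-in for the driver that Lighthouse hands to gatherers.
interface DriverLike {
  evaluateAsync(expression: string): Promise<unknown>;
}
interface PassContext {
  driver: DriverLike;
}

// Simplified base class: three template methods with no-op defaults.
class Gatherer {
  beforePass(_ctx: PassContext): unknown { return undefined; }
  pass(_ctx: PassContext): unknown { return undefined; }
  afterPass(_ctx: PassContext): unknown { return undefined; }
}

// Collects the page title once the page has loaded,
// so only afterPass is overridden.
class PageTitle extends Gatherer {
  async afterPass(ctx: PassContext): Promise<string> {
    const title = await ctx.driver.evaluateAsync("document.title");
    return String(title);
  }
}

// Simplified runner: call the three phases in order and keep the
// last defined result as the gatherer's artifact.
async function runGatherer(gatherer: Gatherer, ctx: PassContext): Promise<unknown> {
  const results = [
    await gatherer.beforePass(ctx),
    await gatherer.pass(ctx),
    await gatherer.afterPass(ctx),
  ];
  const defined = results.filter((result) => result !== undefined);
  return defined[defined.length - 1];
}
```

With the real package, a class like this would extend the `Gatherer` exported by Lighthouse and be registered in a custom config; the stand-in runner only illustrates the template-method flow.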

After all the gatherers have run, they produce an intermediate product, the artifacts, which Lighthouse then uses for the subsequent analysis.
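To illustrate how an audit consumes artifacts, here is a self-contained sketch of a DOM depth audit. The `meta` and static `audit` layout is modeled on Lighthouse’s Audit classes, but nothing is imported from Lighthouse; the `DomDepth` artifact, the audit id, and the budget of 32 levels are assumptions chosen for illustration.

```typescript
// Hypothetical artifact produced by a custom gatherer: the deepest DOM nesting level.
interface Artifacts {
  DomDepth: number;
}
interface AuditResult {
  score: number; // 1 = pass, 0 = fail
  numericValue: number;
  displayValue: string;
}

// Assumed budget for this sketch; tune it per business scenario.
const MAX_DOM_DEPTH = 32;

class DomDepthAudit {
  // Describes the audit and declares which artifacts it needs.
  static get meta() {
    return {
      id: "dom-depth",
      title: "DOM tree depth is within budget",
      failureTitle: "DOM tree is nested too deeply",
      description: "Deeply nested components make layout and style work slower.",
      requiredArtifacts: ["DomDepth"],
    };
  }

  // Turns the artifact into a score and a human-readable value.
  static audit(artifacts: Artifacts): AuditResult {
    const depth = artifacts.DomDepth;
    return {
      score: depth <= MAX_DOM_DEPTH ? 1 : 0,
      numericValue: depth,
      displayValue: `${depth} levels`,
    };
  }
}
```

A PC model and a mobile model could then differ only in which audits they enable and what budgets they use.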

The initial detection model is as follows

Platform

The platform can be integrated with your company’s CI/CD tools: call the detection interface when a service is deployed, then store the JSON data output by the report.
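Before storing the JSON, it is often convenient to reduce it to the few fields a dashboard needs. The sketch below assumes the stored JSON is a Lighthouse result (LHR) and reads `requestedUrl`, `fetchTime`, `categories.performance.score`, and the `numericValue` of a few standard audits; the `LhrLike` and `AuditSummary` types and the `summarize` function are hypothetical names for illustration.

```typescript
// The subset of the Lighthouse result (LHR) used by this sketch.
interface LhrLike {
  requestedUrl: string;
  fetchTime: string;
  categories: { performance: { score: number | null } };
  audits: Record<string, { numericValue?: number }>;
}

// Hypothetical record shape for storage.
interface AuditSummary {
  url: string;
  fetchedAt: string;
  performanceScore: number | null; // 0 to 100
  fcpMs: number | null;
  lcpMs: number | null;
  tbtMs: number | null;
  cls: number | null;
}

// Reduce a full LHR to a compact record suitable for a database row.
function summarize(lhr: LhrLike): AuditSummary {
  const metric = (auditId: string): number | null => {
    const audit = lhr.audits[auditId];
    return audit && audit.numericValue !== undefined ? audit.numericValue : null;
  };
  const score = lhr.categories.performance.score;
  return {
    url: lhr.requestedUrl,
    fetchedAt: lhr.fetchTime,
    performanceScore: score === null ? null : Math.round(score * 100),
    fcpMs: metric("first-contentful-paint"),
    lcpMs: metric("largest-contentful-paint"),
    tbtMs: metric("total-blocking-time"),
    cls: metric("cumulative-layout-shift"),
  };
}
```

Storing one such record per deployment makes it straightforward to chart trends and compare releases.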

Applying it to the business

Once the detection platform was built, the first question was how to put it to work for the business. At the time, the company was running its 618 promotion, and operations staff had a lot of demand for building campaign pages with the company’s internal Rubik’s Cube system, so we developed a detection model specifically for Rubik’s Cube pages.

Rubik’s Cube pages consist mainly of images and text, so the model focuses on images: layout shift, size, count, loading errors, and similar dimensions.

Areas found to need optimization (the parts in red) are flagged in the list page of the Rubik’s Cube system, prompting the people responsible to adjust them.

Conclusion

The detection platform is an auxiliary detection system: once the relevant detection models have been developed, it helps us locate performance problems quickly. We are still at the exploratory stage, and interested readers are welcome to exchange ideas and learn together.
