Preface

In recent years, with the explosion of mobile business and the shrinking demographic dividend of the Internet, users gravitate toward sites that are fast and pleasant to use. How quickly a page "arrives" directly shapes the user experience, so "performance" matters more and more. According to Google's large-scale statistics, users on mobile devices are far less tolerant of slow page loads than users on PCs: a first-screen load time going from 2s to 3s causes a 9.4% drop in page views, an 8.3% rise in bounce rate, and a 3.5% fall in conversion rate. The commercial value of performance optimization is self-evident. We can't just watch the users our marketing colleagues worked so hard to win slip through our fingers, can we? No self-respecting front-end engineer would allow that. Let's get to it!

Relationship between performance optimization and monitoring

Some teams treat "optimization" as the first step: they search the web for lists of "best practices" and apply them wholesale. Performance does improve, but rarely in the most efficient way, and sometimes the effort puts the cart before the horse; other people's problems are not necessarily your own, and correlation does not imply causation. To tune performance properly, first gather detailed performance metrics; then, following the bucket principle, find the "shortest stave" among the lagging metrics and focus optimization there, minimizing marginal cost. So the first thing to do is collect and analyze performance metrics.

Three methods of data collection

Indicators are collected in three ways:

1. Local simulation (Lab)

Most notably, Lighthouse from Google is an open-source automation tool that can be installed as a Chrome extension or run directly from the command line. It runs a series of tests against the target page and then outputs a score report on the page’s performance.

Based on this report, you can target optimizations one by one.

Advantages:

1. The score report is comprehensive and authoritative.
2. It surfaces the major performance issues.

Disadvantages:

1. Results "fluctuate": different runs at different moments differ, and the data only reflects the "effect" at the moment of testing.
2. The test environment is a single configuration, while real users' environments vary widely, so results cannot be generalized.

2. Offline collection

You "delegate" the target page's URL to a third-party service, which performs real access against it, collects performance metrics, and generates a report.

This model arose against the background of monetizing technology: the excellent tools are mostly paid, though there are also many excellent open-source and free ones; alicom was one of them, now offline. We can't speculate on the cause of the shutdown, but hats off to everyone doing open source and providing free services! I won't recommend specific tools here, to avoid any appearance of advertorial.

3. Real user tracking (Field, also known as RUM)

Inject a script into the target page, collect performance metrics at agreed times, report the data to a data center, then aggregate it there to generate reports and analyze performance from them.
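As a minimal sketch of this loop, assuming a hypothetical /collect endpoint and an example payload (a real monitoring script collects far more fields):

```typescript
// Minimal RUM sketch: a script injected into the page collects timing data
// after load and reports it. The "/collect" endpoint and the payload fields
// are made up for this example.
function report(data: object): void {
  navigator.sendBeacon('/collect', JSON.stringify(data));
}

window.addEventListener('load', () => {
  // Defer collection past the load event so monitoring never delays the page.
  setTimeout(() => {
    const t = performance.timing;
    report({
      url: location.href,
      ua: navigator.userAgent,
      load: t.loadEventStart - t.fetchStart, // full page load time, in ms
    });
  }, 0);
});
```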

Advantages:

1. Comprehensive data: it captures the performance of all users across all environments and yields an intuitive distribution map.
2. Authentic data, coming from real users.
3. Timely feedback.

Disadvantages:

The data center and the collection scripts must be developed in-house, so the cost is high.

The "monitoring" discussed here is really this kind of real-user performance tracking. Although it has many dependencies, its feedback on performance metrics is the most authentic and effective, so the rest of this article focuses on it.

Performance monitoring

How to evaluate the performance of a page?

How fast does the page load? Loading time X.X seconds?

In fact, "good" and "bad", "fast" and "slow" are vague concepts. We often hear things like, "Our white-screen time went from 3s to 2s." The problem with that statement isn't that it's untrue; it's that it's distorted. White-screen time varies greatly across users, depending on device performance and network conditions. We can't reduce white-screen time to a single number and ignore the users who take much longer to load. What's more, the collected data suffers from survivorship bias.

The following figure shows a histogram of users' white-screen times on a certain page over a certain period; it reflects the page's "performance" intuitively.

With it we can say, "90% of users see the page within 3s." A "mean" or "median", by contrast, reflects only a single point in the distribution, and no single number can represent the whole.
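For instance, to make that "90% of users" claim from raw samples, compute a percentile instead of a mean; a quick sketch (the sample data is made up):

```typescript
// Summarize white-screen time samples with a percentile rather than a mean,
// so slow-loading users are not averaged away.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.max(0, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.min(idx, sorted.length - 1)];
}

const fptSamples = [800, 950, 1200, 1500, 2100, 2800, 3400]; // ms, made-up data
console.log(`p90 white-screen time: ${percentile(fptSamples, 90)}ms`); // 3400ms
```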

The goal of performance optimization might be to increase the proportion of users within 1s, or to pull the remaining users into the 1s–2s range as much as possible; this should be adjusted flexibly according to the product's characteristics and goals.

Indicators that need to be monitored

The metrics we need to collect and focus on should be metrics that accurately reflect the user experience.

| Experience | Question | Performance metric |
| --- | --- | --- |
| Is it happening? | Did navigation start successfully? Has the server responded? | First Paint (FP) / First Contentful Paint (FCP) |
| Is it useful? | Has enough content rendered for users to engage with it? | First Meaningful Paint (FMP) / Hero Element Timing |
| Is it usable? | Can users interact with the page, or is it still busy loading? | Time to Interactive (TTI) |
| Is it delightful? | Are interactions smooth and natural, free of lag and jank? | Long Tasks (technically, the absence of long tasks) |

The following filmstrip screenshots visually show which user-experience moment each metric corresponds to.

FP and FCP can be obtained through browser instrumentation APIs, while FMP and TTI have no standardized definitions, so there are no standardized APIs that report them; this is partly because it is hard to define "meaningful" in a general way.

Besides the key metrics, we also need to watch basic metrics, such as DNS resolution, TCP connection, network request, and resource loading times, so that we can trace what caused "bad" values in the key metrics.

Metrics with standardized definitions

The standardized definitions are as follows.

PerformanceTiming

performance.timing is a read-only property that returns a PerformanceTiming object containing latency-related performance information about the page.

  • navigationStart: timestamp at which the previous page (not necessarily same-origin with the current page) in the same browsing context was unloaded; equal to fetchStart if there was no previous page to unload

  • unloadEventStart: timestamp at which the previous page (same-origin with the current page) fired its unload event; 0 if there was no previous page to unload or the previous page is on a different origin

  • unloadEventEnd: paired with unloadEventStart; the timestamp at which the unload event handlers bound on the previous page finished executing

  • redirectStart: time at which the first HTTP redirect starts; non-zero only when the redirect happens within the same origin, otherwise 0

  • redirectEnd: time at which the last HTTP redirect completes; non-zero only when the redirect happens within the same origin, otherwise 0

  • fetchStart: time at which the browser is ready to fetch the document via an HTTP request, recorded before the local cache is checked

  • domainLookupStart: time at which the DNS lookup starts; equal to fetchStart if a local cache (no DNS query) or a persistent connection is used

  • domainLookupEnd: time at which the DNS lookup completes; equal to fetchStart if a local cache (no DNS query) or a persistent connection is used

  • connectStart: time at which the HTTP (TCP) connection starts being established; equal to fetchStart for a persistent connection. If a transport-layer error occurs and the connection is re-established, this is the start time of the new connection

  • connectEnd: time at which the HTTP (TCP) connection is established (handshake complete); equal to fetchStart for a persistent connection. If a transport-layer error occurs and the connection is re-established, this is the completion time of the new connection

    Note: "handshake complete" includes establishing the secure connection and SOCKS authorization
  • secureConnectionStart: time at which the HTTPS connection starts; 0 if the connection is not secure

  • requestStart: time at which the HTTP request starts reading the actual document (connection established), including reading from the local cache; if the connection errors and is re-established, this is the time after the new connection is ready

  • responseStart: time at which the HTTP response starts arriving (first byte received), including reading from the local cache

  • responseEnd: time at which the HTTP response has been fully received (last byte received), including reading from the local cache

  • domLoading: time at which parsing of the DOM tree starts; document.readyState becomes "loading" and the readystatechange event fires

  • domInteractive: time at which parsing of the DOM tree completes; document.readyState becomes "interactive" and the readystatechange event fires

    Note: only the DOM tree has finished parsing; the page's resources have not yet loaded
  • domContentLoadedEventStart: after DOM parsing completes, the time at which loading of in-page resources begins; the moment the document's DOMContentLoaded event fires

  • domContentLoadedEventEnd: after DOM parsing completes, the time at which in-page resources (JS scripts) have finished loading and executing; the moment the document's DOMContentLoaded event handlers finish

  • domComplete: time at which the DOM tree is parsed and all resources are ready; document.readyState becomes "complete" and the readystatechange event fires

  • loadEventStart: time immediately before the load event is sent to the document, i.e. when the load callbacks start executing; 0 if the load event has not yet fired

  • loadEventEnd: time at which the load event handlers finish executing; 0 if the load event has not yet fired or its handlers have not yet finished

See W3C Recommendation – NavigationTiming or W3C Editor’s Draft for more explanation.

Using the API above, we can compute fine-grained basic performance metrics:

| Metric | Description | Calculation | Note |
| --- | --- | --- | --- |
| rs | Time preparing the new page | fetchStart - navigationStart | |
| rdc | Redirect time | redirectEnd - redirectStart | |
| dns | DNS resolution time | domainLookupEnd - domainLookupStart | |
| tcp | TCP connection time | connectEnd - connectStart | |
| ssl | SSL handshake time | connectEnd - secureConnectionStart | Meaningful only over HTTPS |
| ttfb | Time to First Byte (network request time) | responseStart - requestStart | TTFB can be calculated in several ways; ARMS follows the Google Developers definition |
| trans | Content transfer time | responseEnd - responseStart | |
| dom | DOM parsing time | domInteractive - responseEnd | |
| res | Resource loading time | loadEventStart - domContentLoadedEventEnd | Covers the page's synchronously loaded resources |
| fbt | First byte time | responseStart - domainLookupStart | |
| fpt | First Paint Time (first render / white-screen time) | responseEnd - fetchStart | From the start of the request to when the browser starts parsing the first bytes of the HTML document |
| tti | Time to Interactive (first interactable moment) | domInteractive - fetchStart | HTML parsing is done and the DOM is built; the browser then starts loading resources |
| load | Full page load time | loadEventStart - fetchStart | load = first render time + DOM parsing time + synchronous JS execution + resource loading time |
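A sketch of how the basic metrics in this table can be computed from performance.timing (run it after the load event so loadEventStart is populated):

```typescript
// Derive the basic metrics in the table above from performance.timing.
function collectBasicMetrics() {
  const t = performance.timing;
  return {
    rs: t.fetchStart - t.navigationStart,               // preparing the new page
    rdc: t.redirectEnd - t.redirectStart,               // redirect time
    dns: t.domainLookupEnd - t.domainLookupStart,       // DNS resolution
    tcp: t.connectEnd - t.connectStart,                 // TCP connection
    ssl: t.secureConnectionStart > 0                    // SSL handshake, HTTPS only
      ? t.connectEnd - t.secureConnectionStart
      : 0,
    ttfb: t.responseStart - t.requestStart,             // time to first byte
    trans: t.responseEnd - t.responseStart,             // content transfer
    dom: t.domInteractive - t.responseEnd,              // DOM parsing
    res: t.loadEventStart - t.domContentLoadedEventEnd, // sync resource loading
    fbt: t.responseStart - t.domainLookupStart,         // first byte time
    fpt: t.responseEnd - t.fetchStart,                  // first paint / white screen
    tti: t.domInteractive - t.fetchStart,               // time to interactive
    load: t.loadEventStart - t.fetchStart,              // full page load
  };
}

window.addEventListener('load', () => {
  setTimeout(() => console.log(collectBasicMetrics()), 0);
});
```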

For compatibility, see the figure below: most browsers already support this API, and it can be used on mobile devices with confidence.

Paint Timing

Relative to the above basic metrics, the ones that are most relevant to user experience are probably the “Paint” moments, FP (First Paint), FCP (First Contentful Paint), and FMP (First Meaningful Paint).

With the popularity of SPAs (single-page applications), it is hard to time individual paints accurately with PerformanceTiming alone. Thankfully, Chrome 60+ brings us a new API, Paint Timing, which can capture timings for the "page" and its "resources". The API is still experimental and not yet part of the W3C standard, so only browsers with newer WebKit/Blink engines support it; still, better than nothing.

The performance.getEntries() method returns, as an array, timing entries for every object on the page (script files, style sheets, image files, and so on), paint entries included. With performance.getEntriesByType('paint') we can easily obtain the two PerformancePaintTiming objects, corresponding to FP and FCP respectively.
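A minimal sketch of both ways to read these entries (synchronous query and observer):

```typescript
// Read FP and FCP from the Paint Timing API (Chrome 60+).
for (const entry of performance.getEntriesByType('paint')) {
  // entry.name is 'first-paint' (FP) or 'first-contentful-paint' (FCP).
  console.log(`${entry.name}: ${entry.startTime}ms`);
}

// Or observe the entries as they are recorded, which also works if the
// script runs before the paints happen:
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log(`${entry.name}: ${entry.startTime}ms`);
  }
}).observe({ entryTypes: ['paint'] });
```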

For more about FP and FCP, see this article: www.w3cplus.com/performance…

As for FMP: Chrome's Performance panel does mark the FMP moment, but there is still no API for it, partly because FMP is "unstandardized".

About FMP

Do you know FMP?

Execution timing

Thanks to the Performance API, we don't need to load and execute our monitoring script at the very "top" of the page; even a small file would block first-screen rendering. However, if the script has dependencies, such as algorithms that rely on observing DOM changes, that part must be initialized as early as possible, while the timing data itself can still be read around the load event. Adjust the timing to the algorithm's needs; there is no one-size-fits-all scheme.
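For example, an FMP-style algorithm based on DOM observation has to start watching before the body renders, even though the scoring can wait until after load; a rough sketch (the scoring logic itself is omitted):

```typescript
// Start observing DOM mutations as early as possible; an FMP-style algorithm
// needs the full mutation history, so this part cannot wait until load.
const mutationTimes: number[] = [];
const observer = new MutationObserver(() => {
  mutationTimes.push(performance.now()); // record when the DOM changed
});
observer.observe(document, { childList: true, subtree: true });

// The analysis itself can safely run after load.
window.addEventListener('load', () => {
  observer.disconnect();
  // A real FMP algorithm would weight each mutation by visible area, etc.
  console.log('DOM mutation timestamps:', mutationTimes);
});
```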

Data processing

Once the data is collected, the next steps are reporting and processing.

There are many mature solutions for data reporting, such as "active submission" and "reverse proxy"; any approach works as long as the data reaches the data center in full without affecting business functionality or performance.
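Expanding the earlier report() sketch, here is the "active submission" channel with a classic image-beacon fallback (the "/collect" endpoint is hypothetical):

```typescript
// Report collected metrics to the data center without blocking the page.
function reportMetrics(data: Record<string, unknown>): void {
  const body = JSON.stringify(data);
  if (navigator.sendBeacon) {
    // sendBeacon queues the request so it survives page unload.
    navigator.sendBeacon('/collect', body);
  } else {
    // Fallback: encode the payload into a GET request for a tiny image.
    new Image().src = '/collect.gif?data=' + encodeURIComponent(body);
  }
}
```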

With the data in hand, we can purposefully process it into the forms we need.

You can analyze the distribution of users.

You can compare horizontally, examine the causes of differences in detail, and sum up the lessons.

You can also expand by dimension and time to sniff out the impact of time or traffic on performance, or spot outliers and filter out abnormal logs for special attention.

In short, limited data offers infinite possibilities!

These are some of the basics of performance monitoring that I hope will help you.


[1] User-centric performance metrics

[2] github.com/fengzilong/…

Please indicate the source for reprinting

By Zwwill Kiba

Zwwill/Blog# 31