Preface

I’ve read a lot of articles about front-end optimization recently. Front-end optimization covers a very broad range, and most articles go deep on one specific point. So I’m taking the opportunity to organize a basic outline of front-end optimization: on the one hand to consolidate this knowledge for myself, and on the other hand to have a checklist to compare against when doing optimization work later.

Front-end optimization scheme

Here’s an interview question: “How do you do front-end optimization?” So what does a relatively complete front-end optimization solution look like?

In my experience, you first need to know which pages need to be optimized and how, and then optimize them in a targeted way. So I divide front-end optimization into the following steps:

  • Identify which pages need to be optimized
    • Establish monitoring system
    • Determine monitoring indicators
    • Analyze the page based on monitoring information
  • Targeted optimization
    • Resource optimization
    • Build optimization
    • Transmission optimization
    • Network optimization

Identify which pages need to be optimized

Establish monitoring system

The first step is to establish a front-end performance monitoring system. There are two front-end performance monitoring schemes:

  1. Mature third-party services: such as Alibaba Cloud ARMS, TingYun, Jiankongbao, etc.
  2. Building your own monitoring platform

There is no need to describe the various third-party services here; instead, the focus is on the ideas behind building a front-end performance monitoring platform yourself.

Determine the metrics to collect

In terms of front-end monitoring, we generally focus on two directions, performance and stability. So we need to collect about four indicators.

  • RUM (Real User Monitoring) indicators: including FP, TTI, FCP, FMP, FID, MPFID.

  • Navigation Timing: includes indicators of DNS, TCP, DOM parsing, etc.

  • JS Error: After parsing, it can be broken down into runtime exceptions and static resource exceptions.

  • Request exceptions: Collect Ajax request exceptions.

Let’s go through these metrics one by one and see how to collect them.

RUM indicators

Core Web Vitals are the subset of Web Vitals that apply to all web pages, should be measured by every web developer, and are surfaced in all Google tools. Each Core Web Vital represents a distinct facet of the user experience, is measurable in the field, and reflects the real-world experience of a critical user-centric outcome.

The composition of core Web metrics will evolve over time. The current metrics structure focuses on three aspects of user experience: loading performance, interactivity, and visual stability

It mainly includes the following indicators (and corresponding thresholds of each indicator) :

LCP, FID, CLS

Largest Contentful Paint (LCP) measures loading performance. To provide a good user experience, the LCP should occur within 2.5 seconds after the page first starts loading.

First Input Delay (FID) measures interactivity. It records the delay of the first user interaction while the page is loading and shapes the user’s first impression of the page’s interactivity and responsiveness. To provide a good user experience, FID should be 100 milliseconds or less.

The Cumulative Layout Shift (CLS) measures visual stability. To provide a good user experience, a page’s CLS should be 0.1 or less.

For LCP, FID and CLS we can use the web-vitals library directly. The collection code is as follows:

import { getLCP, getFID, getCLS } from 'web-vitals';

getLCP((data) => console.log('LCP', data))
getFID((data) => console.log('FID', data))
getCLS((data) => console.log('CLS', data))

In addition to the above three indicators, we can also monitor through the following indicators

FP, FCP

First Paint (FP): the point in time at which the first rendering takes place.

First Contentful Paint (FCP): the point in time at which content is first rendered.

These two metrics look similar, but FP always occurs no later than FCP. FP refers to any pixels being painted: for example, if the page’s background color is gray, FP is recorded as soon as the gray background is displayed, even though no DOM content has been drawn yet (files may still be downloading and parsing). FCP is only recorded once DOM content changes, such as when a piece of text is rendered. Both metrics are therefore related to the blank-screen time, and the earlier they occur the better.

According to the official recommendation, we should keep FP and FCP within 2 seconds.

The collection code is as follows

window.performance.getEntriesByType('paint')

TTI

Time to Interactive (TTI): records the time from when the page starts loading until it is fully interactive.

The rules for obtaining TTI are as follows:

  • Start from First Contentful Paint (FCP)

  • Search forward in time for a quiet window of at least 5 seconds that contains no long tasks and no more than 2 in-flight GET requests

  • From that quiet window, search backward for the last long task (a task that took more than 50 milliseconds); if no long task is found, stop at FCP

  • TTI is the end time of that last long task, or the FCP time if there is no long task

Google would like to standardize TTI metrics and expose them in browsers through PerformanceObserver, but they are not currently supported.

For now we can use a polyfill to detect TTI; it works in all browsers that support the Long Tasks API.

According to the official recommended time, we should keep the TTI under 3.8 seconds

The collection code is as follows

import ttiPolyfill from 'tti-polyfill';

// Collect long tasks so the polyfill can compute TTI
if (window.PerformanceLongTaskTiming) {
  window.__tti = { e: [] };
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      // observe long tasks to get the Time to Interactive (TTI)
      if (entry.entryType === 'longtask') {
        window.__tti.e.push(entry);
      }
    }
  });
  observer.observe({ entryTypes: ['longtask'] });
}

ttiPolyfill.getFirstConsistentlyInteractive().then((tti) => {
  console.log('TTI', tti);
});

Navigation Timing

The main metrics we care about here are:

  • DNS query time
  • TCP connection time
  • Request time
  • DOM tree parsing time
  • White screen (blank screen) time
  • DomReady time
  • onload time

Getting them is simple: we can read them directly from window.performance.timing.

window.addEventListener('load', () => {
  setTimeout(() => {
    const timing = window.performance.timing;
    console.log('DNS query time: ', timing.domainLookupEnd - timing.domainLookupStart)
    console.log('TCP connection time: ', timing.connectEnd - timing.connectStart)
    console.log('Request time: ', timing.responseEnd - timing.responseStart)
    console.log('DOM tree parsing time: ', timing.domComplete - timing.domInteractive)
    console.log('White screen time: ', timing.domLoading - timing.fetchStart)
    console.log('DomReady time: ', timing.domContentLoadedEventEnd - timing.fetchStart)
    console.log('onload time: ', timing.loadEventEnd - timing.fetchStart)
  }, 0);
})

JS Error

To collect JS errors, we can use the error and unhandledrejection events to catch runtime errors and unhandled Promise rejections.

Here is a simple example:

window.onerror = (errorMsg, url, lineNumber, columnNumber, errorObj) => {
  let errorStack = errorObj ? errorObj.stack : null;
  // Report here
  console.log(errorMsg, url, lineNumber, columnNumber, errorStack)
};

window.onunhandledrejection = (e) => {
  let errorMsg = "", errorStack = "";
  if (typeof e.reason === "object") {
    errorMsg = e.reason.message;
    errorStack = e.reason.stack;
  } else {
    errorMsg = e.reason;
  }
  // Report here
  console.log(errorMsg, errorStack)
}

Request exceptions

To catch request exceptions, we can override window.fetch and XMLHttpRequest.
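
A minimal sketch of this wrapping approach (the reporting calls are just console.log placeholders; a real system would send the data to a collection endpoint):

const originalFetch = window.fetch;
window.fetch = function (...args) {
  return originalFetch.apply(this, args).then(
    (response) => {
      if (!response.ok) {
        // report HTTP errors (4xx / 5xx) here
        console.log('fetch error', response.status, args[0]);
      }
      return response;
    },
    (error) => {
      // report network failures here
      console.log('fetch failure', String(error), args[0]);
      throw error;
    }
  );
};

const originalSend = XMLHttpRequest.prototype.send;
XMLHttpRequest.prototype.send = function (...args) {
  this.addEventListener('error', () => console.log('xhr failure', this.responseURL));
  this.addEventListener('load', () => {
    if (this.status >= 400) console.log('xhr error', this.status, this.responseURL);
  });
  return originalSend.apply(this, args);
};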

Analyze the page based on monitoring information

The above is just an introduction to how these metrics are collected. In practice, building a front-end monitoring system is a huge project that goes far beyond data collection: we also need data classification, data cleaning, data presentation and so on. For example, because the volume of collected data can be very large, we may store it locally first so it can be compressed and merged before reporting; a message queue can be used to prevent reports from being lost; and navigator.sendBeacon lets the browser deliver data even after the page has closed.
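
A minimal sketch of the reporting side (the /collect endpoint is hypothetical): metrics are buffered locally and flushed with navigator.sendBeacon when the page is hidden, so the browser can still deliver the payload after the page closes.

const buffer = [];

function report(metric) {
  buffer.push(metric);
}

window.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden' && buffer.length > 0) {
    // the browser delivers the beacon even while the page is being unloaded
    navigator.sendBeacon('/collect', JSON.stringify(buffer));
    buffer.length = 0;
  }
});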

Once we have a complete monitoring system, we can then optimize the page based on the monitoring indicators given above. There is no single answer to which page needs to be optimized, it all depends on your business and your input/output ratio.

Here’s a suggested optimization goal: be at least 20% faster than your fastest competitor

According to user psychology research, if you want your website to be perceived as faster than your competitors, you have to be at least 20 percent faster. Study your main competitors, collect their performance metrics on mobile and desktop, and set thresholds that help you outperform them. However, to get accurate metrics and goals, first understand your users’ experience through research and analysis; then you can run synthetic tests that reflect the devices and network conditions of the bulk (around the 90th percentile) of your users.

Targeted optimization

Resource optimization

Use Brotli for plain text compression

In 2015, Google launched Brotli, a new open source lossless data format supported by all modern browsers.

Brotli compresses better than Gzip and Deflate, but compressing at high quality levels is slower, so compressing on the fly for every request is not a good idea. Instead, we can pre-compress static files and serve them directly: this sidesteps Brotli’s slow compression and lets us use the highest quality level to minimize file size.

In addition, not all browsers support Brotli, so the server needs to keep two variants of each file: one compressed with Brotli and one original. For browsers that do not support Brotli, we can serve the original file compressed with gzip instead.

Brotli can be used for any plain text content such as HTML, CSS, SVG, JavaScript, etc.

Pre-compress static resources with Brotli + Gzip at the highest compression level, and compress (dynamic) HTML on the fly with Brotli at level 3–5 so it stays fast. Make sure the server handles content negotiation for Brotli or Gzip correctly.
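
As a rough sketch of the pre-compression step, Node’s built-in zlib module (which supports Brotli since Node 11.7) can generate .br and .gz variants of a build artifact at build time; the file path here is hypothetical:

const fs = require('fs');
const zlib = require('zlib');

const file = 'dist/app.js'; // hypothetical build output
const source = fs.readFileSync(file);

// Brotli at maximum quality: slow, but it only runs once at build time
fs.writeFileSync(
  `${file}.br`,
  zlib.brotliCompressSync(source, {
    params: { [zlib.constants.BROTLI_PARAM_QUALITY]: 11 },
  })
);

// Gzip fallback for browsers that don't send "br" in Accept-Encoding
fs.writeFileSync(`${file}.gz`, zlib.gzipSync(source, { level: 9 }));

The server can then pick the .br or .gz variant based on the request’s Accept-Encoding header.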

Use responsive images and WebP

WebP is a newer image format developed by Google. Compared with PNG and JPG, WebP images are roughly 30% smaller at the same visual quality. WebP also supports lossy compression, lossless compression, transparency, and animation, so in theory it could completely replace PNG, JPG, GIF and other formats. Browser support is not yet universal, however, so we still need a fallback when using it.

Whenever possible, use responsive images with srcset, sizes and the picture element. You can also serve the WebP format through the picture element with a JPEG fallback.

But WebP is not without its drawbacks: it does not support progressive rendering the way JPEG does, so users may actually see a usable image sooner with JPEG even though the WebP file finishes downloading faster. With progressive JPEG we can give the user a “decent” picture in half or even a quarter of the time and load the rest of the data later, instead of the half-empty image you get with WebP. So it depends on what we want: WebP reduces transfer size, while progressive JPEG improves perceived speed.

Is the image properly optimized?

Loading images quickly is very important in our project, so be careful when using images:

  • Make sure JPEG is rendered incrementally and compressed using mozJPEG or Guetzli.

  • Use Pingo to compress PNG

  • Compress SVG using SVGO or SVGOMG

  • Lazy loading of images or iframes

  • Use looping video or WebP instead of GIFs if possible

[Performance gains after converting GIF to MP4 with ffmpeg]

Is the web font optimized?

Most of the time, the font files we use don’t need to cover every glyph, and Chinese fonts in particular can easily exceed 10 MB. So:

  • If possible, subset the font (the subfont tool can help analyze which glyphs a project actually uses)
  • Use preload to preload fonts
  • If necessary, cache fonts in a Service Worker (a minimal sketch follows this list)
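
A minimal Service Worker sketch of the third point, assuming the fonts live under a /fonts/ path (the path is hypothetical):

// sw.js
const FONT_CACHE = 'font-cache-v1';

self.addEventListener('fetch', (event) => {
  const url = new URL(event.request.url);
  if (url.pathname.startsWith('/fonts/')) {
    event.respondWith(
      caches.open(FONT_CACHE).then((cache) =>
        cache.match(event.request).then((cached) => {
          if (cached) return cached; // serve the cached font when available
          return fetch(event.request).then((response) => {
            cache.put(event.request, response.clone()); // cache a copy for next time
            return response;
          });
        })
      )
    );
  }
});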

Build optimization

Are you using tree-shaking, scope hoisting, and code-splitting?

  • Tree-shaking is a way of cleaning up your build so that the bundle only contains code that is actually used in production, eliminating unused imports (Webpack supports this). With Webpack and Rollup we can also use scope hoisting, which detects where the import chain can be flattened and turns modules into a single inline function without breaking the code. With Webpack you can also use JSON Tree Shaking.

  • Code-splitting is another Webpack feature that splits your code into chunks loaded on demand. Not all JavaScript has to be downloaded, parsed, and compiled immediately: once split points are defined in the code, Webpack takes care of the dependencies and output files, letting the browser keep the initial download small and request code on demand as the application needs it (see the dynamic import sketch after this list).

  • Consider using the preload-webpack-plugin, which takes the chunks produced by code splitting and prompts the browser to preload or prefetch them, depending on how your code is split. Webpack inline directives also give some control over preload/prefetch (but be aware of priority issues).
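
A minimal code-splitting sketch with a dynamic import(): the chart module and the element selectors are hypothetical, but this is the pattern Webpack turns into a separate, on-demand chunk.

// The "chart" module is only downloaded when the user opens the report tab.
document.querySelector('#report-tab').addEventListener('click', async () => {
  const { renderChart } = await import(/* webpackChunkName: "chart" */ './chart');
  renderChart(document.querySelector('#report'));
});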

Can you offload JavaScript into a Web Worker?

To shorten Time to Interactive, it is best to move computation-heavy JavaScript into a Web Worker, or cache it with a Service Worker. DOM manipulation runs on the main thread alongside JavaScript, so a Web Worker lets you move expensive operations onto a background thread. You can also use a Web Worker to pre-load and store data for later use. Comlink can be used to simplify communication with Web Workers.
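
A minimal sketch of offloading a computation to a Web Worker (the file names are hypothetical; Comlink would remove the manual postMessage plumbing):

// main.js
const worker = new Worker('heavy-worker.js');
worker.postMessage({ numbers: Array.from({ length: 1e6 }, (_, i) => i) });
worker.onmessage = (event) => {
  console.log('sum computed off the main thread:', event.data);
};

// heavy-worker.js
// self.onmessage = (event) => {
//   const sum = event.data.numbers.reduce((acc, n) => acc + n, 0);
//   self.postMessage(sum);
// };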

Can you move frequently executed functions into WebAssembly?

We can move heavy computation into WebAssembly (WASM), a binary instruction format designed as a portable compilation target for high-level languages such as C/C++/Rust. Most modern browsers already support WebAssembly, and as function calls between JavaScript and WASM get faster, this approach becomes more practical. The purpose of WebAssembly is not to replace JavaScript but to complement it: JavaScript remains better suited for most web applications, while WebAssembly is best for CPU-intensive ones such as web games.

Here is a comparison of JavaScript and WebAssembly processing:

Do you use module/nomodule in JavaScript?

We only want to send the necessary JavaScript over the network, which means paying much more attention to how these resources are delivered. The idea of module/nomodule is to compile and serve two separate JavaScript bundles: a “legacy” build that contains Babel transforms and polyfills and is served only to older browsers that actually need them, and a modern bundle (with the same functionality) that contains neither.

JS modules (also known as ES modules, or ECMAScript modules) are a major new feature, or rather a series of new features. You may have used third-party module systems before: CommonJS in Node.js, AMD with RequireJS, and so on. They all have one thing in common: they let you import and export code.

Browsers that understand the type=module syntax ignore scripts with the nomodule attribute. That is, we can serve module scripts to browsers that support the syntax and provide a nomodule fallback for browsers that don’t.

Bundles built for type=module are typically 30 to 50 percent smaller than legacy builds, and you can also expect browser performance optimizations for the newer syntax.

Identify and remove unused CSS/JS

The Coverage tool in Chrome DevTools shows which CSS and JavaScript code actually executes or applies and which doesn’t. Run a coverage check and inspect the results; once unused code is found, identify those modules and lazy-load them with import(). Then re-run the coverage check to confirm that less code is loaded during initialization.

You can use Puppeteer to collect code coverage, and there are many other uses for Puppeteer, such as monitoring unused CSS at every build.
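
A minimal Puppeteer sketch for collecting coverage (the URL is a placeholder):

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // start collecting JS and CSS coverage before navigating
  await Promise.all([page.coverage.startJSCoverage(), page.coverage.startCSSCoverage()]);
  await page.goto('https://example.com');
  const [jsCoverage, cssCoverage] = await Promise.all([
    page.coverage.stopJSCoverage(),
    page.coverage.stopCSSCoverage(),
  ]);

  // compare used bytes against total downloaded bytes
  let total = 0, used = 0;
  for (const entry of [...jsCoverage, ...cssCoverage]) {
    total += entry.text.length;
    for (const range of entry.ranges) used += range.end - range.start;
  }
  console.log(`Used ${((used / total) * 100).toFixed(1)}% of downloaded JS/CSS`);
  await browser.close();
})();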

In addition, Purgecss, UnCSS and Helium can help you remove unused styles from your CSS.

Trim the JavaScript package size

Add dependency package reviews to your daily workflow. Replace some of the larger libraries you might have added years ago with smaller, lighter ones, such as day.js instead of moment.js

Using a tool like Bundlephobia can help you understand the costs of adding an NPM package. Size-limit not only checks the size of the package, but also shows how long the JavaScript is running.

Use optimizations for the target JavaScript engine.

Look at which JavaScript engines dominate your user base and explore ways to optimize for them. For example, when optimizing for V8, which powers Blink-based browsers, the Node.js runtime, and Electron, you can take advantage of script streaming for monolithic scripts.

Script streaming optimizes the parsing of JavaScript files. Older versions of Chrome downloaded a script in its entirety before starting to parse it, leaving the CPU idle until the download finished. Starting with version 41, Chrome parses async and deferred scripts on a separate thread as soon as the download begins, which means parsing can finish within milliseconds of the download completing and improves page load times by up to 10%. This is especially effective for large scripts and slow network connections.

Furthermore, using script defer in the head lets the browser discover the resource earlier and then parse it on the background thread.

Warning: Opera Mini does not support script deferral, so if a meaningful share of your users are on Opera Mini, defer will be ignored and rendering will block until the script completes.

Client-side rendering or server-side rendering?

Client-side rendering or server-side rendering? It all depends on the performance of the application. The best approach is to set up some kind of progressive booting: use server-side rendering to get a fast First Contentful Paint (FCP), include the minimal amount of JavaScript required, and keep Time to Interactive (TTI) as close as possible to FCP. If JavaScript executes too late after FCP, the browser locks the main thread while parsing, compiling, and executing it, reducing the interactivity of the site or application.

To avoid this, it is important to break function execution into separate asynchronous tasks and use requestIdleCallback whenever possible. Use Webpack’s dynamic import() support to lazily load parts of the UI, avoiding the cost of loading, parsing, and compiling them until the user actually needs them.
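
A minimal sketch of the requestIdleCallback pattern, with hypothetical low-priority tasks:

const tasks = [
  () => console.log('warm up analytics'),
  () => console.log('hydrate below-the-fold widgets'),
  () => console.log('prefetch the next route'),
];

function processTasks(deadline) {
  // only run queued work while the browser reports spare time in this idle period
  while (deadline.timeRemaining() > 0 && tasks.length > 0) {
    tasks.shift()();
  }
  if (tasks.length > 0) {
    requestIdleCallback(processTasks); // continue when the browser is idle again
  }
}

requestIdleCallback(processTasks);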

Once in the interactive state, we can start non-essential parts of the application on demand or as time permits. However, frameworks generally do not provide developers with a simple concept of priority, so implementing step-by-step startup is not easy for most libraries and frameworks.

Let’s take a look at some of the current rendering mechanisms:

  • Full Client Rendering (CSR)

    All the logic, rendering, and startup is done on the client. The result is usually an increased interval between TTI and FCP. Because the entire application must start on the client to render anything, the application feels sluggish. SSR is generally faster than CSR. But for many applications, CSR is the most common implementation.

    Attached is the link diagram of traditional CSR:

  • Full Server-side Rendering (SSR)

    Server-side rendering responds to navigation by generating the complete HTML for a page on the server. This avoids extra round trips for data fetching and templating on the client, because everything is handled before the browser receives the response.

    The gap between FP and FCP is usually small, and running the page logic and rendering on the server avoids sending lots of JavaScript to the client, which helps achieve a fast Time to Interactive (TTI). HTML can also be streamed to the browser and rendered immediately. However, generating the page on the server takes longer, which delays Time to First Byte (TTFB), and we don’t get to take advantage of the responsiveness and other rich features of modern client-side applications.

  • Static Site Generation (SSG)

    Static site generation is similar to server-side rendering, except that pages are rendered at build time rather than on request. Unlike server rendering, it also delivers a consistently fast Time to First Byte (TTFB), because HTML does not have to be generated on the fly. Typically, static rendering means generating a separate HTML file for each URL ahead of time. With pre-generated HTML responses, static sites can be deployed to multiple CDNs to take advantage of edge caching, so we can show the page quickly and then prefetch the SPA framework for subsequent pages. However, this approach only works when page generation does not depend on user input.

  • Server-side rendering with (re)hydration (SSR + CSR)

    “Hydration” may sound confusing at first. In plain terms, the process of re-rendering, on the client, HTML that was already rendered on the server is called hydration.

    Navigation requests, such as full page loads or reloads, are handled by the server, which renders the application to HTML and embeds the JavaScript and the data used for rendering into the generated document. Ideally this achieves a fast FCP like server-side rendering, after which the page is “patched up” on the client by re-rendering it, the technique called (re)hydration.

    React lets you use the ReactDOMServer module on Node and then call renderToString to generate the top-level components as static HTML strings. With Vue, we can use vue-server-renderer and call renderToString to render the Vue instance into HTML.

    This approach has its downsides too. We keep full client-side flexibility while delivering faster server-side rendering, but the gap between FCP and TTI grows and FID increases. Hydration is expensive: SSR pages with hydration often look ready and interactive, but cannot actually respond to input until the client-side JS has executed and event handlers have been attached.

    Note that bundle.js is still the full CSR code, and the page is not truly interactive until that code executes. So while FP (First Paint) improves in this mode, TTI (Time to Interactive) may get worse, because the page cannot respond to user input (it is blocked by JS execution) until the client finishes its second render.

    For the problem of the second render blocking interaction, possible optimization directions are incremental rendering (e.g. React Fiber) and progressive rendering / partial hydration.

  • Streaming server-side rendering with progressive (re)hydration (SSR + CSR)

    To minimize the gap between TTI and FCP, we can issue multiple requests and send content in chunks as it is generated (the response body is a stream). That way we don’t have to wait for the full HTML string before sending content to the browser, and Time to First Byte (TTFB) is shortened.

    In React, we can use renderToNodeStream instead of renderToString to pipe back responses and send HTML in chunks. In Vue, we can use renderToStream to implement pipes and streams. With the advent of React Suspense, asynchronous rendering can be used for the same purpose.

    On the client side, instead of booting the entire application at once, we hydrate components step by step. The application is broken into pieces, each in its own script, and then “activated” gradually in priority order: critical components first, the rest later. Rendering can be defined per component as client-side or server-side, and the activation of certain components can be delayed until they become visible, are needed for user interaction, or the browser is idle.

    For Vue, hydrating on user interaction, or using vue-lazy-hydration, which lets you hydrate components when they become visible or when the user interacts with them, reduces the time to interactive of SSR applications. Partial hydration can also be implemented with Preact and Next.js.

    The ideal streaming server rendering flow is as follows:

    1. Request the HTML: the server first returns the skeleton-screen HTML, then streams the required data (or HTML with the data baked in), and finally closes the request.
    2. Request the JS: once the JS has been returned and executed, the page becomes interactive.

  • Client prerender

    Similar to server-side prerendering, but instead of rendering pages dynamically on the server, the application is rendered as static HTML at build time.

    During the build, use renderToStaticMarkup instead of renderToString to generate static pages without attributes like data-reactid; the main JS and any routes that might be used later are then preloaded. In other words, the pre-rendered page is exactly what the page looked like at build time. Once the JS downloads and executes, the page re-renders if any of its data has changed, which can create the impression of a data delay.

    The result is less TTFB (first byte arrival time) and FCP time, and a shorter interval between TTI and FCP. You can’t use this method if you expect the content to change a lot. In addition, all urls must be known in advance to generate all pages.

  • Tripartite isomorphic rendering

    Tripartite isomorphic rendering may also come in handy if a Service Worker is available. This technique involves using a streaming server to render the initial page and then taking over the HTML rendering after the Service Worker loads. This keeps cached components and templates up to date, and enables navigation like a single page application to pre-render new views in the same session. This works best when the same template and routing code can be shared between the server, the client page, and the Service Worker.

Tripartite isomorphic rendering using the same code in three locations: on the server, in the DOM, or in the service worker.

The technical spectrum of server-side rendering to client-side rendering.

As for how to choose, here are some half-baked suggestions:

  1. For projects with low SEO requirements but heavy interaction, such as admin/back-office systems, CSR is recommended. The page only becomes interactive after the bundle executes anyway, so merely seeing elements without being able to interact adds little value, and SSR carries extra development and maintenance costs.
  2. If a page has no data, or is purely static, SSG is recommended: pages are pre-built at packaging time, so it puts no burden on the server.
  3. For pages with high SEO requirements and lots of data requests, SSR is recommended.

Are HTTP cache headers set correctly?

Double-check that Expires, Cache-Control (max-age), and other HTTP cache headers are set correctly. In general, resources should be cacheable either for a short time or indefinitely, with a version identifier in the URL that changes when needed.

Cache-Control: immutable indicates that the response body will not change over time. As long as the resource has not expired it will not change on the server, so the browser does not send revalidation requests (such as If-None-Match or If-Modified-Since) to check for updates, even when the user refreshes the page.

We can specify the cache lifetime with the Cache-Control response header, for example Cache-Control: max-age=60. After 60 seconds the browser has to fetch the resource again, which slows the page down. We can mitigate this with stale-while-revalidate, for example Cache-Control: max-age=60, stale-while-revalidate=3600: for the first 60 seconds the resource is fresh; for the next 3600 seconds it is stale but may still be served while it is revalidated in the background; only after 3660 seconds is it completely expired and fetched synchronously in the traditional way.

In June–July 2019, Chrome and Firefox shipped support for stale-while-revalidate in HTTP Cache-Control, which speeds up subsequent page loads because stale assets no longer block rendering. Effect: RTT is effectively zero for repeat views.

RTT: Short for Round Trip Time.

In addition, make sure no unnecessary headers are sent (e.g. X-Powered-By, Pragma, X-UA-Compatible, Expires) and that responses do include useful security- and performance-related headers (e.g. Content-Security-Policy, X-XSS-Protection, X-Content-Type-Options). Finally, be aware of the performance cost of CORS requests in single-page applications.

Transmission optimization

Are JavaScript libraries loaded asynchronously?

When a user requests a page, the browser fetches the HTML and builds the DOM, fetches the CSS and builds the CSSOM, and then combines the two into a render tree. But whenever JavaScript has to be parsed, the browser delays rendering. So as developers we must explicitly tell the browser not to wait; we can do this by adding the defer and async attributes to script tags.

However, we should prefer defer over async. Scripts with async execute as soon as they arrive; if an async script arrives very quickly (for example, straight from cache), it actually blocks HTML parsing while it runs. With defer, the browser does not execute the script until the HTML has been parsed. So unless the JavaScript must run before rendering starts, defer is the better choice.

Also, limit the impact of third-party libraries and scripts, especially social share button SDKs and iframe embeds such as maps. You can prevent JavaScript bloat with the size-limit library: if you accidentally add a large dependency, the tool will notify you and throw an error.

Use IntersectionObserver and image lazy loading

As a general rule, we should lazily load all the performance-wasting components, such as large JavaScript, videos, iframes, widgets, and potentially loaded images. For example, Native lazy-loading helps us lazily load images and iframes.

Native lazy loading means the browser’s built-in support for lazy loading on img and iframe tags.

Note from testing: you need to set width and height on the img tag for lazy loading to work properly.

The most efficient way to lazy-load resources with scripts is the Intersection Observer API, which asynchronously observes changes in the intersection between a target element and an ancestor element or the document’s viewport. We create an IntersectionObserver object, passing it a callback function and an options object, and then register a target to observe, as follows:

let options = {
  root: document.querySelector('#scrollArea'),
  rootMargin: '0px',
  threshold: 1.0
}

// the callback receives the list of intersection changes for the observed targets
let callback = (entries, observer) => {
  entries.forEach((entry) => console.log('intersecting:', entry.isIntersecting));
};

let observer = new IntersectionObserver(callback, options);

let target = document.querySelector('#listItem');
observer.observe(target);

// Threshold = 1.0 means that the callback will be executed when the target element is fully visible in the element specified by the root option.

The callback is executed whenever the target becomes visible or invisible, so we can do work just before an element scrolls into the viewport. We get fine-grained control over when the observer callback fires through rootMargin (a margin around the root) and threshold (a number or array of numbers representing what percentage of the target must be visible).
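
Applied to images, a minimal sketch looks like this (each image keeps its real URL in data-src):

const lazyImages = document.querySelectorAll('img[data-src]');

const imageObserver = new IntersectionObserver((entries, observer) => {
  for (const entry of entries) {
    if (entry.isIntersecting) {
      const img = entry.target;
      img.src = img.dataset.src;   // trigger the real download
      observer.unobserve(img);     // each image only needs to load once
    }
  }
}, { rootMargin: '200px 0px' });   // start loading a bit before the image is visible

lazyImages.forEach((img) => imageObserver.observe(img));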

Progressive loading of images

We can take lazy loading to a new level by using progressive image loading in pages. Similar to Facebook, Pinterest, and Medium, we can load low-quality or even blurry images and then replace them with the full, high-quality version using LQIP (low-quality image placeholder) technology as the page continues to load.

With lazy loading, we can use the off-the-shelf library: lozad.js

To give the simplest demonstration code:

<img data-src="https://assets.imgix.net/unsplash/jellyfish.jpg?w=800&h=400&fit=crop&crop=entropy"
          src="https://assets.imgix.net/unsplash/jellyfish.jpg?w=800&h=400&fit=crop&crop=entropy&px=16&blur=200&fm=webp"
>
<script>
    function init() {
        var imgDefer = document.getElementsByTagName('img');
        for (var i = 0; i < imgDefer.length; i++) {
            if (imgDefer[i].getAttribute('data-src')) {
                imgDefer[i].setAttribute('src', imgDefer[i].getAttribute('data-src'));
            }
        }
    }
    window.onload = init;
</script>

Are you sending critical CSS?

To make the browser start rendering as soon as possible, it is common to collect only the CSS needed to render the visible, above-the-fold part of the page (“critical CSS”) and inline it in a style tag in the page’s head, saving a round trip. Because of the limited amount of data TCP can exchange during the slow start phase, critical CSS should not exceed roughly 14 KB. (This particular limit does not apply with TCP BBR, but prioritizing critical resources and loading them early is always a good idea.) Beyond that limit, the browser needs additional round trips to fetch the rest.

Since slow start and TCP BBR were mentioned above, here is a brief overview of several TCP congestion control algorithms.

Early TCP implementations would send large amounts of data as soon as a connection was established. If the network was congested, those packets would pile up in router buffers, exhausting them and making the congestion worse. TCP therefore no longer sends a large burst of data on a newly established connection; instead it starts small, estimates the peer’s receiving capacity from the acknowledgments that come back, and gradually increases the amount sent each round until it reaches a steady value and enters the high-speed transmission phase. During this ramp-up the connection is in a low-speed phase; this strategy is called slow start.

  1. Slow start algorithm

The idea of the slow start algorithm is to add a Congestion Window to the sender, denoted as CWND.

The congestion window refers to the maximum number of MSS (maximum packet segment size) that can be sent without receiving an ACK from the peer end. The congestion window is a value maintained by the sender and is not advertised to the peer like the receiver window (RWND). The size of the sender window is the minimum of CWND and RWND. The current Linux congestion window starts at 10 MSS.

With slow start, cwnd doubles after every RTT. The sender starts by sending initcwnd segments (assuming the receiver window is not a constraint) and waits for the ACKs (acknowledgments from the receiver indicating that the data was received correctly). When those ACKs arrive, the congestion window grows to initcwnd*2, so initcwnd*2 segments can be sent; when the ACKs for those arrive, it grows to initcwnd*4, and so on, an exponential increase.

  2. Congestion avoidance algorithm

Congestion avoidance and slow start are two different algorithms, but both are designed to deal with congestion, and in practice they are usually implemented together. Compared with slow start, congestion avoidance maintains one additional variable: the slow start threshold, ssthresh.

When cwnd < ssthresh, the congestion window grows exponentially using slow start. Once cwnd exceeds ssthresh, the congestion window uses the congestion avoidance algorithm and grows linearly.

With congestion avoidance, the congestion window grows by about initcwnd per RTT.
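
To make the two growth curves concrete, here is a simplified model of the description above (real TCP stacks are more involved; the ssthresh value is arbitrary):

let cwnd = 10;        // initial congestion window (initcwnd), in MSS
const ssthresh = 64;  // hypothetical slow start threshold

for (let rtt = 1; rtt <= 10; rtt++) {
  if (cwnd < ssthresh) {
    cwnd *= 2;        // slow start: doubles every RTT
  } else {
    cwnd += 10;       // congestion avoidance: grows by about initcwnd per RTT
  }
  console.log(`after RTT ${rtt}: cwnd = ${cwnd} MSS`);
}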

When congestion occurs (a timeout, or duplicate ACKs are received), RFC 5681 says ssthresh should be set to half of the outstanding (unacknowledged) data, but no less than two MSS. In addition, if the congestion was detected by a timeout, cwnd is reset back to its initial small value.

Timeout retransmission hurts transmission performance badly, for two reasons: first, during the RTO (the retransmission timeout, measured from when the data was sent; the data is retransmitted once it expires) no data can be transferred, which wastes time; second, the congestion window shrinks sharply, so subsequent transmission is much slower.

  3. Fast retransmission algorithm

Sometimes congestion is mild, only a few packets are lost, and subsequent packets arrive normally. When subsequent packets arrive at the receiver, the receiver will find that the Seq number is larger than expected, so it will Ack the expected Seq number every time it receives a packet to remind the sender of retransmission. When the sender receives three or more repeated acknowledgments (Dup Acks), it realizes that the corresponding packet has been lost and immediately retransmits it. This process is called fast retransmission.

Why three? Packets are sometimes reordered in the network, and reordered packets also trigger duplicate ACKs, but retransmitting because of reordering is unnecessary. Since reordering distances are usually small (packet #2 may arrive after packet #4, but is unlikely to arrive after packet #6), requiring three or more duplicates largely avoids fast retransmissions triggered by reordering.

There is another problem as well. Suppose packets #2 and #3 are lost, but #4, #5, #6 and #7 arrive normally, each triggering a duplicate ACK for #2. If the sender, having concluded that loss occurred, retransmits everything from the last acknowledged packet onward, packets that were already delivered correctly get sent again, degrading TCP performance. To improve this, SACK (Selective Acknowledgment) was developed: the SACK option tells the sender exactly which data has been received, so the sender knows which data is missing and immediately retransmits only the missing part.

  4. Fast recovery algorithm

If fast retransmission happens during the congestion-avoidance phase, there is no need to treat the congestion window as harshly as a timeout does. Since multiple duplicate acknowledgments would not arrive if the network were badly congested, the sender assumes the congestion is mild. So instead of running slow start again, it sets cwnd to the halved ssthresh and then runs congestion avoidance so that cwnd grows slowly. This process is called fast recovery.

Conclusion:

  • Timeout retransmission has the greatest impact on performance, because no data can be transferred during the RTO and the congestion window is drastically reduced. Timeout retransmissions should be avoided as much as possible.

  • Packet loss affects very small files more severely than large files, because small files may not be able to trigger three repeated ACKS, resulting in fast retransmission.

  • With fast recovery in place, slow start is only used when a TCP connection is first established or when a timeout occurs.

TCP BBR is a new TCP congestion control algorithm published by Google, which has been widely used inside Google and released with Linux 4.9.

BBR stands for Bottleneck Bandwidth and Round-trip propagation time. It implements congestion control by measuring bandwidth and RTT. The main characteristics of the BBR algorithm are:

  1. BBR doesn’t take packet loss into account because packet loss (in this day and age) is not necessarily a sign of network congestion
  2. BBR depends on real-time detection bandwidth and RTT to determine the size of the congestion window: Window size = bandwidth * RTT

Try regrouping your CSS rules

According to the research in “CSS and Network Performance”, splitting CSS files by media query can improve page performance to some extent: the browser fetches the CSS needed for the current conditions with high priority and everything else with low priority.

Such as:

<link rel="stylesheet" href="all.css" />

With all the CSS in one file like this, the browser downloads and parses the whole stylesheet as render-blocking, regardless of the current viewport. We can instead split it by media query:

<link rel="stylesheet" href="all.css" media="all" />
<link rel="stylesheet" href="small.css" media="(min-width: 20em)" />
<link rel="stylesheet" href="medium.css" media="(min-width: 64em)" />
<link rel="stylesheet" href="large.css" media="(min-width: 90em)" />
<link rel="stylesheet" href="extra-large.css" media="(min-width: 120em)" />
<link rel="stylesheet" href="print.css" media="print" />

When the CSS is split by media query like this, the browser only treats the stylesheet whose media query matches the current environment as render-blocking and fetches the rest with low priority.

Avoid @import in plain CSS files, because the way it works prevents the browser from downloading stylesheets in parallel. (With preprocessors like SCSS and Less, @import-ed files are inlined into the compiled CSS at build time and don’t generate extra HTTP requests.)

Also, do not place stylesheet links before async script snippets. If the JavaScript does not depend on the styles, consider placing the async scripts above the styles; if it does, you can split the JavaScript in two and load the pieces on either side of the CSS.

Dynamic styling can also be costly, although since React itself performs well this usually only becomes a problem when a large number of composed components are rendered in parallel. According to “The unseen performance costs of modern CSS-in-JS libraries in React apps”, components created with CSS-in-JS can take twice as long to render as regular React components. So when using CSS-in-JS, the following can improve application performance:

  1. Don’t over-compose nested styled components: this leaves React with fewer components to manage, so rendering finishes faster.
  2. Prefer “static” components: some CSS-in-JS libraries optimize execution when your CSS has no dependency on the theme or on props. The more “static” your tagged templates are, the faster your CSS-in-JS runtime is likely to execute.
  3. Avoid unnecessary React re-renders: make sure you only render when needed, which avoids work for both React and the CSS-in-JS library’s runtime.
  4. Check whether a zero-runtime CSS-in-JS library would work for your project: sometimes we choose to write CSS in JS because it offers a great developer experience, yet we don’t actually need the extra runtime JS API. If your application doesn’t need theme support or lots of complex CSS props, a zero-runtime CSS-in-JS library may be a good choice. Using one can shave around 12 KB off your bundle, since most CSS-in-JS libraries weigh 10–15 KB while zero-runtime libraries (e.g. linaria) are under 1 KB.

Warm up connections to speed up delivery

Save time by using resource hints:

dns-prefetch: hints that the browser should perform the DNS lookup for a domain in the background, before the user clicks a link to it.

Preconnect: Provides a prompt to the browser, suggesting that the browser opens the connection to the linked site in advance, without giving away any private information or downloading any content, so that the content of the link can be retrieved more quickly when following the link.

Prefetch: prompts the browser that the user may need to load target resources in the future. Therefore, the browser may obtain and cache corresponding resources in advance to optimize the user experience.

Preload: Tells the browser to download the resource because it will be needed later during the current navigation.

You might notice that prerender is missing from this list. (prerender suggested that the browser fetch the linked page ahead of time and render it off-screen so it could be shown instantly if needed.)

But prerender turned out to be really hard to use in practice, because it makes the browser not only fetch the resource but also preprocess it, along with the other resources the HTML page depends on, such as scripts and stylesheets.

So, unsurprisingly, it was abandoned, and the Chrome team built the NoState Prefetch mechanism on top of the idea. Since Chrome 63, Chrome has handled prerender as a NoState Prefetch: like prerender, NoState Prefetch fetches resources ahead of time, but unlike prerender it does not execute JavaScript and does not pre-render any part of the page. NoState Prefetch uses only about 45 MiB of memory, and its sub-resources are processed at IDLE priority. Since Chrome 69, NoState Prefetch adds the Purpose: Prefetch header to all its requests so they can be distinguished from normal browsing.

In practice, use at least preconnect and dns-prefetch, and be careful with prefetch, preload, and prerender. One caveat: only use prerender when you are confident about what the user will do next (e.g. a purchase or registration flow).

Even with preconnect and DNS-Prefetch, the browser limits the number of hosts that can be connected in parallel, so it’s better to sort them by priority.

Using Resource Hint is probably the easiest way to improve performance, and it does give you a performance boost.

The request priority comparison table of resources at each stage is attached below:

Since font files are generally important resources on a page, it is sometimes a good idea to use preload to prompt the browser to download important fonts. However, check whether this actually improves performance in your case, because there is a priority puzzle when preloading fonts: since preload is treated as very important, it can jump ahead of even more critical resources, such as critical CSS.

You can also load JavaScript dynamically, effectively deferring its execution. In addition, since link preload accepts a media attribute, resources can be selectively prioritized based on media queries.
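
A minimal sketch of dynamic loading (the script URL is a placeholder); a dynamically injected script never blocks HTML parsing:

function loadScript(src) {
  return new Promise((resolve, reject) => {
    const script = document.createElement('script');
    script.src = src;
    script.async = true;   // dynamically inserted scripts are async by default
    script.onload = resolve;
    script.onerror = reject;
    document.head.appendChild(script);
  });
}

// e.g. pull in a heavy third-party SDK only after the page is interactive
loadScript('https://example.com/analytics.js').then(() => console.log('loaded'));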

Finally, there are a few things to be aware of:

Preload helps bring the start of a resource download closer to the time of the initial request, but preloading the resource takes up the memory cache of the page.

Preload handles HTTP caching well: if the resource is already in the HTTP cache, a network request is never sent.

Therefore, preload is useful for late-discovered resources, such as images loaded via background-image, or when inlining critical CSS (or JavaScript) and preloading the rest. Note that a preload tag can only start the download after the browser has received the HTML from the server and the parser has found the tag.

There are also newer proposals, such as Early Hints, which let preloading start even before the full HTML response, and Priority Hints, which give developers a way to indicate resource priorities to the browser, allowing finer control over the loading order.

Note: when using preload you must specify the as attribute, or the resource will not load; and fonts preloaded without the crossorigin attribute will be fetched twice.

<link rel="preload" href="style.css" as="style">
<link rel="preload" href="main.js" as="script">
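<!-- hypothetical font file; fonts also need crossorigin or the preloaded copy is fetched a second time -->
<link rel="preload" href="font.woff2" as="font" type="font/woff2" crossorigin>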

Optimize rendering performance

Use CSS will-change to tell the browser which elements and attributes will change.

will-change: auto
will-change: scroll-position
will-change: contents
will-change: transform
will-change: opacity
will-change: left, top

will-change: unset
will-change: initial
will-change: inherit

Most CSS styles are computed on the CPU, but some 3D and animation-related styles involve a lot of repetitive, heavy computation that can be handed over to the GPU.

The browser will use GPU rendering for the following CSS:

  • transform
  • opacity
  • filter
  • will-change

Note that GPU hardware acceleration requires creating a new layer, and moving an element to a new layer is a time-consuming operation that may cause a visible flicker, so it is best done in advance. will-change tells the browser ahead of time to put the element on its own layer from the start, so later GPU-rendered changes don’t have to create a new layer on the fly.
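
A minimal sketch of that “do it in advance” advice, toggling the hint from JavaScript around an animation (the .box element is hypothetical):

const box = document.querySelector('.box');

box.addEventListener('mouseenter', () => {
  // give the browser time to promote the element to its own layer before animating
  box.style.willChange = 'transform';
});

box.addEventListener('animationend', () => {
  // remove the hint afterwards so the extra layer (and its memory) can be released
  box.style.willChange = 'auto';
});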

Do you know how to avoid reflow and repaint?

Speaking of reflow and repaint, let’s review the browser rendering process:

  1. Build a DOM tree

    • The HTML is lexically analyzed and turned into the corresponding AST (the DOM tree)
  2. Style calculation

    • Normalize style values, e.g. rem -> px, white -> #FFFFFF, etc.

    • Compute the style properties of each node: match CSS selectors against the DOM tree to build the render tree

  3. Generate layout tree

    • Elements hidden with display: none are dropped here, since they do not appear in the render tree
  4. Create layer tree

    • Mainly divided into “explicit compositing” and “implicit compositing”
      • When repainting, only the current layer needs to be repainted
  5. Generate draw list

    • Convert the layer tree into a list of instructions to draw
  6. Generate tiles and bitmaps

    • The paint list is handed to the compositor thread, which divides each layer into tiles

    • The renderer process maintains a dedicated rasterization thread pool that rasterizes the tiles on the GPU

    • After GPU rasterization, the bitmap data is passed back to the compositor thread, which puts it on the display

  7. Display content

The fourth step, building the layer tree, is particularly important, so let’s focus on it. To go from the DOM tree to graphics on the screen, the browser has to transform a tree structure into a layer structure. There are four concepts to understand:

  1. RenderObject

    A DOM node corresponds to a render object, which maintains the tree structure of the DOM tree. The render object knows how to draw the content of the DOM node by issuing the necessary draw instructions to a GraphicsContext.

  2. RenderLayer

    The render layer is the first layer model constructed during rendering: render objects at the same level in the same coordinate space are merged into the same render layer, while render objects at different levels or in different coordinate spaces form separate render layers according to the stacking context, reflecting how they stack on top of each other. The browser therefore automatically creates a new render layer for render objects that establish a stacking context. Common situations that cause the browser to create a new render layer include:

    • The document element
    • position: relative | fixed | sticky | absolute
    • opacity < 1
    • will-change | filter | mask | transform != none | overflow != visible
  3. GraphicsLayer

    The graphics layer is the layer model responsible for generating the content that is finally ready to be rendered. It holds a GraphicsContext that is responsible for outputting the layer’s bitmap. The bitmap is stored in shared memory and uploaded to the GPU as a texture (think of it as a bitmap moved from main memory to video memory); finally the GPU composites multiple bitmaps and draws them on the screen, at which point our page appears.

    So the graphics layer is an important rendering vehicle and tool, but it doesn’t deal directly with the rendering layer, it deals with the composition layer.

  4. CompositingLayer

    Render layers that meet certain special conditions are automatically promoted to compositing layers by the browser. The compositing layer has a separate graphics layer, while other rendering layers that are not compositing layers share one with the first parent layer that has a graphics layer.

    So what special conditions does a rendering layer meet to be promoted to a compositing layer? Here are some common cases:

    • 3D transforms
    • video,canvas,iframe
    • Animated changes to opacity
    • position: fixed
    • will-change
    • animation or transition applied to opacity, transform, filter, or backdrop-filter

As mentioned above, render layers that meet these special conditions are promoted by the browser to compositing layers; this is called explicit compositing. There is also implicit compositing during the browser’s compositing phase. Consider an example:

  • Say we have two absolutely positioned divs overlapping on screen; according to their z-index, one div sits on top of the other.

  • Now suppose we give the lower div (z-index: 3) transform: translateZ(0), asking the browser to promote it to a compositing layer. Once promoted, that layer is rendered above the document, so the z-index: 3 element would visually end up above the z-index: 5 element, breaking the overlap order we defined with z-index.

  • To keep the stacking order correct, the browser must also promote the render layer that is supposed to sit on top into a compositing layer. This is called implicit compositing.

Upgrading the rendering layer to the compositing layer gives us a number of benefits:

  1. The compositing layer’s bitmap is composited by the GPU, which is much faster than processing on the CPU;
  2. When a repaint is needed, only that layer is repainted; other layers are unaffected;
  3. Once an element is promoted to a compositing layer, transform and opacity changes no longer trigger repaints; without a compositing layer they still would.

Of course, abuse of anything has side effects, such as:

  1. The drawn layers must be transferred to the GPU; if there are many layers or they are large, the transfer can be very slow, which may cause flickering on low- and mid-range devices;
  2. Implicit compositing tends to produce an excessive number of compositing layers, each taking extra memory, resulting in a “layer explosion” that consumes GPU and memory resources and seriously hurts page performance. Memory is precious on mobile devices, and using too much can crash the browser, making the performance “optimization” counterproductive.

OK, now that we know how browsers render, let’s look at how to reduce reflow and repaint in practice:

  1. Always set width and height on images: the browser can allocate the box and reserve space up front, so no reflow is needed once the image loads.
  2. Avoid multiple separate changes: if we need to change a DOM node’s height, width, and margin, we can apply them in one go via cssText instead of setting dom.style.height and the others one at a time (see the sketch after this list).
  3. Batch DOM modifications: hide the DOM node, or work on a clone and swap it back in. Modern browsers already queue multiple changes and optimize them, though, so this technique has a narrower range of application nowadays.
  4. Take it out of the document flow: for DOM nodes that change frequently, such as animations, use absolute positioning to remove them from normal flow so the parent element doesn’t reflow constantly.
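
A minimal sketch of points 2 and 4 (the .ball element is hypothetical): style writes are batched through cssText, and the animated node is absolutely positioned and moved with transform so the parent never reflows.

const el = document.querySelector('.ball');

// one style recalculation instead of three separate writes
el.style.cssText += 'width: 100px; height: 100px; margin: 10px;';

// take the animated node out of normal flow, then animate with transform (no reflow)
el.style.position = 'absolute';
requestAnimationFrame(function step(time) {
  el.style.transform = `translateX(${(time / 10) % 300}px)`;
  requestAnimationFrame(step);
});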

Network optimization

Is OCSP Stapling enabled?

You can speed up TLS handshakes by enabling OCSP Stapling on the server. The Online Certificate Status Protocol (OCSP) is a replacement for the Certificate Revocation List (CRL) protocol. Both are used to check whether an SSL certificate has been revoked, but OCSP does not require the browser to spend time downloading and searching through a list of certificates, which reduces handshake time.

  • Certificate Revocation List (CRL): CRLs are published in publicly available repositories, and the browser can fetch a CA's latest CRL during certificate verification. One drawback is that the timeliness of revocation is limited by the CRL publishing period: the browser only learns of a revocation after the CA has updated its CRLs, and CAs vary in how often they do this, from hours to days or even weeks.

  • Online Certificate Status Protocol (OCSP): to solve the problems of large single files and high latency, OCSP was introduced. The browser asks an online OCSP server (also called an OCSP responder) for the revocation status of the certificate and the server responds, avoiding the CRL update-delay problem. Its drawbacks are:

    1. Every HTTPS connection the browser creates must contact the CA's OCSP server for verification, and the network path between the browser and that server may be slow or unreachable. The OCSP check also adds a round of network I/O, which takes time and noticeably hurts the user experience of reaching the site.
    2. When the browser sends the certificate's serial number to the CA's OCSP server, it also reveals which site the user is visiting, exposing the user's privacy to the CA.
  • OCSP Stapling: OCSP stapling fixes the drawbacks of both CRL and OCSP: the web server itself queries the OCSP server for the certificate's revocation status and "staples" the signed response onto the TLS handshake, so the browser does not have to contact the OCSP server at all. This sidesteps network-access restrictions and the physical distance between users and the OCSP server, and the web server can cache the response for other clients to reuse. Since the OCSP response is signed with the CA's private key, forgery is not a concern. As a result, OCSP stapling:

    1. Solves the slow-access problem

    2. Solves the user-privacy leak
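If you want to verify whether a server actually staples OCSP responses, one quick option is a check from Node.js: open a TLS connection with the OCSP status-request extension enabled and listen for the stapled response. A minimal sketch, with a placeholder host name:

```javascript
// Check whether a server staples an OCSP response ("www.example.com" is a placeholder).
const tls = require('tls');

let stapled = false;

const socket = tls.connect(
  { host: 'www.example.com', port: 443, servername: 'www.example.com', requestOCSP: true },
  () => {
    console.log(stapled ? 'OCSP response stapled' : 'No stapled OCSP response');
    socket.end();
  }
);

// Emitted before the secure connection is established, if the server stapled a response.
socket.on('OCSPResponse', (response) => {
  stapled = true;
  console.log(`Stapled OCSP response: ${response.length} bytes`);
});

socket.on('error', (err) => console.error('TLS error:', err.message));
```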

Is the HTTP protocol properly optimized?

With the popularity of HTTPS and HTTP/2, many optimization strategies from the HTTP/1.1 era are no longer effective, and some are even counterproductive.

Here we take a brief look at the various versions of the HTTP protocol:

  • HTTP/1.0

    • The browser and server keep only short-lived connections: every request has to establish a new TCP connection with the server (which is expensive, since it requires a three-way handshake between client and server), and the server tears the connection down as soon as it finishes handling the request. The server keeps no state about clients and remembers nothing about previous requests.
  • HTTP/1.1

    • Persistent connections and pipelining: HTTP/1.1 introduced Keep-Alive, which lets the server delay closing a connection, and proposed pipelining to reduce connection latency. Browsers, however, cap the number of concurrent connections per domain (Chrome allows 6 at a time), so more parallelism can only be gained through multiple domains, which is why static resources are often put on a CDN or another domain to speed up loading. Pipelining also needs support on both ends, and most HTTP proxies handle it poorly.
    • Only GET/HEAD can be pipelined: pipelining works only for GET and HEAD requests; POST and other methods are not supported.
    • Redundant headers: HTTP is stateless, so client and server can only exchange state through headers; every request therefore carries a large amount of redundant header information, including cookies.
    • Text-based protocol: HTTP/1.x transfers data as plain text. The receiver has to locate where each message begins and ends and strip redundant whitespace, which is inefficient to parse; with HTTPS the data is additionally encrypted, which costs some transmission speed.
    • Head-of-line blocking: even with pipelining over a persistent connection, multiple requests can be sent to the server at once, but the server must still return responses in FIFO (first-in, first-out) order. For example, if the client sends requests 1, 2, 3 and 4 and response 1 has not come back yet, then 2, 3 and 4 cannot be returned either. This is head-of-line blocking, and the congestion is obvious under high concurrency and high latency.
  • HTTP/2.0

    • Multiplexing: only a single TCP connection is needed to achieve truly concurrent requests, reducing latency and improving bandwidth utilization.
    • Header compression: client and server maintain incrementally updated header tables and use HPACK compression, saving the traffic taken up by headers.
      1. Header fields identical to those of a previous request are not resent; the previously transmitted values are reused.
      2. New or modified header fields are added to the header table, and both ends update it incrementally.
    • Request prioritization: each stream has its own priority level, which the client can specify, and flow control is supported.
    • Server push: for example, when we load index.html we will usually also need index.js, index.css and so on. Traditionally the browser only requests those resources after it has received index.html and parsed the references to index.js/index.css inside it. With server push, the server can push those resources to the client in advance, so the client can use them directly when it needs them without sending extra requests (a Node.js sketch appears after this list).
    • Binary protocol: HTTP/2 uses binary framing instead of the text format of HTTP/1.x. Frames from different streams can be sent and received in any order, and the receiver reassembles them by stream ID. Binary framing is more efficient to parse, more compact on the wire and, most importantly, less error-prone.

    It should be added that HTTP/2 is not better than HTTP/1.1 in every respect: because HTTP/2 bundles multiple HTTP streams into a single TCP connection, they all share the same TCP flow and congestion control. If one packet is lost, TCP holds back every stream on the connection until it is retransmitted, so head-of-line blocking simply moves down to the TCP layer.

  • HTTP/3.0

    HTTP/3 uses the QUIC protocol, built on top of UDP, to avoid some of TCP's shortcomings, and uses TLS 1.3 to reduce the extra RTTs required by HTTPS to as little as zero.

    • Deficiencies of TCP

      • TCP may suspend data transfer intermittently: if a segment with a lower sequence number has not arrived, TCP's receiver sliding window will not hand over later segments even though they have already been received. This can stall the whole TCP stream or, worse, cause the connection to be closed over a single missing segment. This is TCP's head-of-line (HoL) blocking problem.

      • TCP has no notion of streams: although the application layer can multiplex several logical streams over one connection, TCP itself serializes everything into a single byte stream. With HTTP/2 the browser opens only one TCP connection to the server and requests multiple objects over it, such as CSS and JavaScript files; once received, all of them are serialized into the same stream, so TCP has no idea where one object ends and the next begins within its segments.

      • TCP creates redundant communication: the TCP connection handshake involves redundant message exchanges, even for connections to hosts that are already known.

    • Advantages of the QUIC protocol

      • UDP as the underlying transport protocol: any new transport mechanism built on top of TCP would inherit all of TCP's shortcomings, so UDP is the wiser choice. In addition, QUIC is implemented in user space, so protocol upgrades do not require kernel changes.

      • Stream multiplexing and flow control: QUIC multiplexes independent streams over a single connection and applies flow control per stream, so one blocked stream no longer clogs the entire connection.

      • Flexible congestion control: TCP's congestion control is rigid; each time congestion is detected the protocol halves the congestion window. QUIC's congestion control is designed to be more flexible, making more efficient use of the available network bandwidth for better throughput.

      • Better error handling: QUIC uses enhanced loss-recovery mechanisms and forward error correction to handle bad packets better. This is a boon for users who can only reach the Internet over slow wireless networks, which often have high error rates.

      • Faster handshake: QUIC uses the same TLS module for secure connections, but unlike TCP its handshake is optimized to avoid redundant protocol exchanges each time two known peers communicate.
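To make the HTTP/2 server-push idea mentioned above concrete, here is a minimal Node.js sketch using the built-in http2 module; the certificate paths and pushed file contents are hypothetical placeholders:

```javascript
// Push /index.css alongside /index.html over a single HTTP/2 connection.
const http2 = require('http2');
const fs = require('fs');

const server = http2.createSecureServer({
  key: fs.readFileSync('server-key.pem'),   // placeholder key/cert paths
  cert: fs.readFileSync('server-cert.pem'),
});

server.on('stream', (stream, headers) => {
  if (headers[':path'] === '/index.html') {
    // Push the stylesheet before the browser has even parsed the HTML.
    stream.pushStream({ ':path': '/index.css' }, (err, pushStream) => {
      if (err) return;
      pushStream.respond({ ':status': 200, 'content-type': 'text/css' });
      pushStream.end('body { margin: 0; }');
    });
    stream.respond({ ':status': 200, 'content-type': 'text/html' });
    stream.end('<link rel="stylesheet" href="/index.css"><h1>hello</h1>');
  } else {
    stream.respond({ ':status': 404 });
    stream.end();
  }
});

server.listen(8443);
```

Loading https://localhost:8443/index.html from this sketch delivers index.css in the same round trip, without waiting for the browser to request it.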

Here are two back-of-the-envelope formulas comparing HTTP/2 and HTTP/3 in combination with HTTPS, to give you a more intuitive feel; the details are beyond the scope of this article and can be covered separately if there is interest.

HTTP/2: Total HTTPS communication time = TCP connection time + TLS connection time + HTTP transaction time = 1.5RTT + 1.5RTT + 1RTT = 4 RTT

HTTP/3: on the first connection, QUIC uses TLS 1.3, which needs 1 RTT, so a complete HTTP request takes 2 RTT in total. On reconnection the session ID is reused and TLS authentication is not repeated, so only 1 RTT is needed.

Now that HTTP/2 + HTTPS is mainstream, which optimization strategies have become outdated or even counterproductive?

  1. Reducing the number of requests: under HTTP/1.1, because of head-of-line blocking, we usually merged resources and bundled files (CSS sprites, etc.) to cut the number of requests. Under HTTP/2 it pays instead to focus on cache tuning and to ship lightweight, fine-grained resources that can be cached independently and transferred in parallel.
  2. Domain sharding: under HTTP/1.1 the browser caps the number of connections per domain, so we spread resources across different domains to raise the total. Under HTTP/2 a single connection per domain is multiplexed, so sharding is unnecessary, and extra domains even add extra TLS handshake overhead.

Reduce the size of the request header

Reduce the size of the request header, most commonly by trimming cookies. For example, suppose our main site (www.test.com) stores a lot of cookies scoped to .test.com and our CDN domain (cdn.test.com) sits under the same parent domain. Every request to the CDN then carries those .test.com cookies, which are useless to the CDN and only inflate the request size. It is therefore better to keep the CDN domain separate from the main domain; for example, Taobao (https://www.taobao.com) serves static assets from https://img.alicdn.com.
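As a minimal illustration (using the hypothetical domains above), a cookie scoped to the parent domain rides along on every asset request to a sub-domain CDN, while an unrelated CDN domain receives nothing:

```javascript
// Runs on https://www.test.com. The Domain attribute widens the cookie's scope
// to every sub-domain of test.com, including cdn.test.com.
document.cookie = 'session=abc123; domain=.test.com; path=/';

// This image request to the sub-domain CDN carries "Cookie: session=abc123" —
// wasted bytes the CDN never looks at.
const img = document.createElement('img');
img.src = 'https://cdn.test.com/logo.png';
document.body.appendChild(img);

// The same asset served from an unrelated domain (as Taobao does with
// img.alicdn.com) is requested without any Cookie header at all.
```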

conclusion

Optimization tools

PageSpeed Insights and WebPageTest are two tools worth highlighting here.

PageSpeed Insights lets us view a site's RUM data, points out where the site falls short, and offers optimization suggestions.

WebPageTest offers free site speed tests from multiple locations around the world. It also provides rich diagnostic information based on the test results, including resource-loading waterfall charts, page-speed optimization checks and improvement suggestions, and gives each check a final grade.

Quick scheme

The list is huge, and it could take a long time to complete all the optimizations. So, if you have a limited amount of time to optimize, what do you do? Let’s boil it down to 15 easy points.

  1. Set appropriate goals based on practical experience. A reasonable target is: above-the-fold rendering < 1s, full page render < 3s, Time to Interactive < 5s on a slow 3G network, and TTI < 2s for repeat visits.
  2. Extract the critical above-the-fold CSS and inline it in the page. For CSS/JS, keep the critical file size within 170 KB compressed.
  3. Extract, optimize and lazy-load as many scripts as possible, choose lightweight alternatives (e.g. Day.js instead of Moment.js), and limit the impact of third-party scripts.
  4. Serve legacy code only to legacy browsers via the module/nomodule pattern (<script type="module"> for modern browsers, nomodule fallbacks for older ones).
  5. Experiment with regrouping your CSS rules.
  6. Add resource hints to speed up page loading, such as dns-prefetch, preconnect, prefetch, preload and prerender.
  7. Subset web fonts and load them asynchronously, and use the CSS font-display property for a fast first render.
  8. Optimize images with mozjpeg, guetzli, pingo and SVGOMG, and consider an image CDN with WebP support.
  9. Check that HTTP cache headers and security headers are set correctly.
  10. Enable Brotli compression on the server (if that is not possible, enable Gzip compression).
  11. If the server runs on Linux kernel 4.9+, enable TCP BBR congestion control.
  12. Enable OCSP stapling if possible.
  13. If you are on HTTP/2, make sure HPACK compression is in use; if you are more aggressive, try enabling HTTP/3.
  14. Cache resources such as fonts, styles, JavaScript and images in a Service Worker (a minimal sketch follows this list).
  15. Consider progressive hydration and streaming server-side rendering for your single-page application.
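For point 14, a minimal cache-first Service Worker sketch (with assumed asset paths) might look like this:

```javascript
// sw.js — cache a handful of static assets at install time and serve them
// cache-first afterwards, falling back to the network for anything else.
const CACHE_NAME = 'static-v1';
const STATIC_ASSETS = ['/index.css', '/index.js', '/fonts/main.woff2', '/images/logo.png'];

self.addEventListener('install', (event) => {
  event.waitUntil(
    caches.open(CACHE_NAME).then((cache) => cache.addAll(STATIC_ASSETS))
  );
});

self.addEventListener('fetch', (event) => {
  event.respondWith(
    caches.match(event.request).then((cached) => cached || fetch(event.request))
  );
});
```

Register it from the page with navigator.serviceWorker.register('/sw.js').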

Reference articles

www.smashingmagazine.com/2021/01/fro…

juejin.cn/post/701064…

juejin.cn/post/684490…

segmentfault.com/a/119000002…

www.jianshu.com/p/1ad439279…

zhuanlan.zhihu.com/p/364991981

zhuanlan.zhihu.com/p/102382380

web.dev/vitals/