This post is a translation of The Cost of JavaScript in 2019 by Addy Osmani (@addyosmani) on the official V8 blog. It covers, among other things, using script streaming to improve page loading performance.

It was first published on the Maoyan front-end team's official account: My-fee

The dramatic improvements in JavaScript performance over the past few years have largely depended on how quickly browsers can parse and compile JavaScript. In 2019, the main costs of handling JavaScript are download time and CPU execution time.

User interaction is delayed whenever the browser's main thread is busy executing JavaScript, so optimizing bottlenecks in script execution and network delivery is especially important.

Actionable high-level guidance


What does this mean for web developers? It means that parsing and compilation are no longer as slow as we once thought. The three things to focus on now are:

  • Improve download speed
    • Keep your JavaScript bundles small, especially on mobile devices. Smaller bundles download faster, occupy less memory, and reduce CPU costs.
    • Avoid packing all your code into one large file. If a bundle exceeds 50-100 kB, split it into smaller bundles; see the sketch after this list for one way to do so. (Thanks to HTTP/2 multiplexing, multiple requests and responses can be in flight at the same time, reducing the overhead of extra requests.)
    • Given network speeds on mobile devices, you should transfer far fewer bytes, and you also need to keep memory usage low.
  • Improve execution speed
    • Avoid long tasks that keep the main thread busy and delay how soon the page becomes interactive. Script execution time is now a major cost.
  • Avoid large inline scripts (they are still parsed and compiled on the main thread). A good rule of thumb: if a script is larger than 1 kB, don't inline it (1 kB is also the minimum size at which the bytecode cache kicks in for external scripts).
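For example, instead of shipping one monolithic bundle, non-critical features can be split out and fetched on demand with a dynamic import(), which bundlers such as webpack and Rollup turn into a separately requested chunk. A minimal sketch (the element IDs and the ./chart.js module are hypothetical):

// Load the heavy charting module only when the user actually asks for it,
// keeping the initial bundle small.
const button = document.querySelector('#show-chart'); // hypothetical element
button.addEventListener('click', async () => {
  const { renderChart } = await import('./chart.js'); // hypothetical module
  renderChart(document.querySelector('#chart-container'));
});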

Why is optimizing download and execution times important?


Download time is critical for low-end networks. Despite the rapid growth of 4G (and even 5G) around the world, effective connection speeds often fall short of the hype, and much of the time they feel like 3G (or worse).

JavaScript execution time matters on phones with slow CPUs. Because of differences in CPU, GPU, and thermal throttling, performance varies enormously between phones. This affects JavaScript directly, since JavaScript execution is CPU-bound.

In fact, in a browser like Chrome, as much as 30% of total page load time can be spent executing JavaScript. Here is the task breakdown for a typical site (Reddit.com) on a high-end desktop device:

JavaScript processing in V8 takes up 10-30% of page load time.

On mobile devices, JavaScript execution time on a mid-range phone (Moto G4) is 3-4 times that of a high-end phone (Pixel 3), and on a low-end phone (the sub-$100 Alcatel 1X) the difference is more than 6 times:

Reddit's JavaScript processing cost across device classes (low-end, mid-range, and high-end)

Note: Reddit is a completely different experience on desktop and mobile, so results on the MacBook Pro are not directly comparable to results on other devices.

When you optimize JavaScript execution time, watch out for long tasks that monopolize the UI thread for extended periods. Long tasks can block critical work even after the page looks visually ready. Break them up into smaller tasks (a sketch follows below). By splitting your code and prioritizing the order in which it loads, you can make pages interactive sooner and hopefully reduce input latency.

Long tasks monopolize the main thread and should be split.
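One common way to break a long task up is to process work in small batches and yield back to the main thread between batches. A minimal sketch using setTimeout to yield (items and processItem are placeholders for your own data and logic):

// Process a large array in small batches, yielding to the main thread
// between batches so input handling and rendering are not blocked.
function processInChunks(items, processItem, chunkSize = 50) {
  let index = 0;
  function doChunk() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      processItem(items[index]);
    }
    if (index < items.length) {
      setTimeout(doChunk, 0); // yield so the browser can handle input
    }
  }
  doChunk();
}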

What has V8 done to speed up parsing and compiling?


Since Chrome 60+, V8 has doubled the speed at which it parses JavaScript. At the same time, the cost of initial parsing and compilation has fallen, thanks to other parallelization work in Chrome.

V8 has reduced parsing and compilation work on the main thread by an average of 40% (for example, 46% on Facebook and 62% on Pinterest) and by up to 81% (YouTube), by moving parsing and compilation onto worker threads. This comes on top of streaming parsing/compilation.

The image below shows CPU parse times across different versions of Chrome/V8: in the time it took Chrome 61 to parse Facebook's JS, Chrome 75 can now parse both Facebook's JS and six times Twitter's JS.

In the time it took Chrome 61 to parse Facebook's JS, Chrome 75 can parse both Facebook's JS and six times Twitter's JS.

Let's zoom in on these changes. In short, scripts can now be parsed and compiled by streaming on worker threads, which means:

  • V8 can parse and compile JavaScript without blocking the main thread.
  • Streaming parsing starts as soon as the HTML parser encounters a <script> tag. For parser-blocking scripts, the HTML parser pauses until the script has run; for asynchronous scripts, it simply continues.
  • V8 parsing is faster than downloading for most real-world network connection speeds, so V8 completes parsing and compiling very quickly after the script is downloaded.

To explain a little more: very old versions of Chrome would parse a script only after it had fully downloaded, which was straightforward but left the CPU idle. From Chrome 41 through 68, Chrome parsed async and defer scripts on a separate thread as soon as their download began.

The scripts on a page are split into chunks. Once a chunk exceeds 30 kB, V8 starts streaming parsing.

In Chrome 71, we moved to a task-based setup in which the scheduler could parse multiple async/defer scripts at once. This change cut main-thread parse time by about 20% and improved TTI/FID by roughly 2% on real-world sites.

First Input Delay (FID) measures the time from when a user first interacts with your site (say, clicking a link, tapping a button, or using a custom JavaScript-driven control) to when the browser is actually able to begin responding to that interaction. Time to Interactive (TTI) measures how long it takes a page to load and become reliably responsive to user input.
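As a rough illustration, FID can be observed in the field with a PerformanceObserver for first-input entries from the Event Timing API (a sketch; in practice you would likely use a library such as web-vitals):

// Log First Input Delay: the gap between when the user interacted and
// when the browser was able to start running the event handler.
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    const fid = entry.processingStart - entry.startTime;
    console.log('FID:', fid, 'ms');
  }
}).observe({ type: 'first-input', buffered: true });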

In Chrome 72, we switched to streaming as the primary way to parse: regular synchronous scripts are now parsed this way too (though not inline scripts). We also stopped canceling task-based parsing when the main thread needs the script, since doing so just pointlessly duplicates work that has already been done.

Earlier versions of Chrome supported streaming parsing and compilation, but the script source data arriving from the network had to reach Chrome's main thread before it was forwarded to the streamer.

This often left the streaming parser waiting for data that had already arrived over the network but had not yet been forwarded to the streaming task, because it was held up by other work on the main thread (such as HTML parsing, layout, or JavaScript execution).

We are now experimenting with starting parsing at preload time; previously, the bounce through the main thread blocked this.

Leszek Swirski's BlinkOn talk has more details:

How do I see these changes in DevTools?


In addition to the above, there is a long-standing DevTools issue in which the entire parser task is rendered as if it were using CPU the whole time. In reality, the parser blocks whenever it is starved of data (which has to be routed through the main thread). Since we moved from a single streamer thread to streaming tasks, this has become really obvious. Here's what you would often see in Chrome 69:

The "Parse script" task shown above takes 1.08 seconds. Yet parsing JavaScript isn't actually that slow! Most of that time is spent doing nothing but waiting for data to move across the main thread.

Chrome 76 behaves quite differently:

In Chrome 76, parsing is broken up into multiple smaller streaming tasks.

In general, the DevTools Performance panel is a great place to get a high-level view of what's happening on a page. For more detailed V8-specific metrics, such as JavaScript parse and compile times, we recommend using Chrome Tracing with Runtime Call Stats (RCS). In RCS results, Parse-Background and Compile-Background track the time spent parsing and compiling JavaScript off the main thread, whereas Parse and Compile capture the main-thread metrics.

What is the real-world impact of these changes?


Let's look at some real-world sites and how script streaming applies to them.

Main-thread versus worker-thread time spent parsing and compiling Reddit's JS on a MacBook Pro

Reddit.com has several 100 kB+ bundles wrapped in outer functions, which causes a lot of lazy compilation on the main thread. In the chart above, the main-thread time is what really matters, because a busy main thread delays interactivity. Reddit spends most of its time on the main thread, while the worker/background thread is barely used.

Sites like this would benefit from splitting their large bundles into smaller ones (say, 50 kB each) without the wrapping, to maximize parallelization: each bundle could then be stream-parsed and compiled independently, easing the load on the main thread during startup.
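With webpack, for example, one way to approximate this is the splitChunks option with a maxSize hint, which asks the bundler to try to keep emitted chunks under a target size (a sketch, not any particular site's actual build config):

// webpack.config.js (sketch): split shared code out of entry bundles and
// hint that chunks should stay at or under roughly 50 kB, so each one can
// be stream-parsed and compiled independently.
module.exports = {
  optimization: {
    splitChunks: {
      chunks: 'all',
      maxSize: 50 * 1024, // a hint, not a hard limit
    },
  },
};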

Comparison of main-thread and worker-thread parse and compile times for Facebook on a MacBook Pro

Or consider Facebook.com. Facebook loads about 6 MB of compressed JS across roughly 292 requests, some async, some preloaded, and some fetched at lower priority. Much of its JavaScript is very small and granular, which helps overall parallelization on background/worker threads, since many small scripts can be stream-parsed and compiled at the same time.

Note that you're probably not Facebook, and you likely don't have a long-lived application like Facebook or Gmail where this much script may be justifiable on desktop. In general, though, keep your bundles coarse-grained and load only what you need.

Although most JavaScript parsing and compilation work can happen in a streaming fashion on a background thread, some of it must still happen on the main thread, and the page cannot respond to user input while the main thread is busy. Keep an eye on how both downloading and executing code affect your user experience.

Note: not all JavaScript engines and browsers currently implement script streaming as a loading optimization. But we believe others will join in for the sake of a good user experience.

The performance cost of parsing JSON


Because the JSON grammar is much simpler than the JavaScript grammar, JSON can be parsed faster. This fact can be used to improve the startup performance of web apps that ship large JSON-like configuration object literals (such as inline Redux stores). Instead of inlining the data as a JavaScript object literal, like this:

const data = { foo: 42, bar: 1337 }; // 🐌

it can be shipped in JSON-stringified form and JSON-parsed at runtime:

const data = JSON.parse('{"foo":42,"bar":1337}'); // 🚀

As long as the JSON string is evaluated only once, especially during the cold-start phase, the JSON.parse approach is much faster than the equivalent JavaScript object literal. This technique works best for objects of 10 kB or larger, but as always, test it first.

There is a further risk when using plain object literals for large amounts of data: they may be parsed twice!

  1. The first pass happens when the literal is preparsed.
  2. The second pass happens when the literal is lazily parsed.

The first parse is unavoidable. Fortunately, the second one can be avoided by moving the object literal to the top level, or into a PIFE (a "possibly-invoked function expression", which V8 compiles eagerly).
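For illustration, here is a sketch of both options (the tiny literals stand in for much larger data):

// A large literal nested inside a normal function may be parsed twice:
// once by the preparser, and again when the function is lazily compiled
// on its first call.
function getData() {
  return { foo: 42, bar: 1337 }; // imagine a much larger literal here
}

// Option 1: move the literal to the top level, so it is parsed only once.
const data = { foo: 42, bar: 1337 };

// Option 2: wrap the function in parentheses, making it a PIFE that V8
// compiles eagerly instead of preparsing it and reparsing it later.
const getDataEagerly = (function() {
  return { foo: 42, bar: 1337 };
});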

What about parsing/compiling on repeat visits?


V8's bytecode cache helps a lot here. The first time JavaScript is requested, Chrome downloads it and gives it to V8 to compile, and also stores the file in the browser's on-disk cache. When the JS file is requested a second time, Chrome takes it from the browser cache and again gives it to V8 to compile. This time, however, the compiled code is serialized and attached to the cached script file as metadata.

A diagram of how bytecode caching works in V8

The third time, Chrome takes both the file and the file's metadata from the cache and hands them to V8. V8 deserializes the metadata and can skip compilation entirely. The bytecode cache kicks in if the first two visits happen within 72 hours of each other. Chrome's bytecode cache also works better when a service worker is used to cache JavaScript. You can learn more in the code caching for developers article.
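As a minimal sketch of the service-worker angle, scripts can be precached during install with the Cache API so repeat visits are served from the cache (the file names and cache name are placeholders):

// sw.js (sketch): precache the app's scripts at install time so repeat
// visits serve them from the cache, letting the bytecode cache kick in.
self.addEventListener('install', (event) => {
  event.waitUntil(
    caches.open('app-shell-v1').then((cache) =>
      cache.addAll(['/app.js', '/vendor.js']) // placeholder file names
    )
  );
});

// Serve cached responses when available, falling back to the network.
self.addEventListener('fetch', (event) => {
  event.respondWith(
    caches.match(event.request).then((hit) => hit || fetch(event.request))
  );
});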

Conclusion


Download and execution times are the main bottlenecks for loading scripts in 2019. Aim for a small bundle of synchronous (inline) JavaScript for your above-the-fold content, with one or more deferred scripts for the rest of the page. Break down your large bundles so code is loaded only when it's needed. This maximizes parallel parsing in V8.

On mobile devices you should ship far less script, taking into account the network, memory usage, and execution time on slower CPUs. Balance latency against cacheability to maximize the amount of parse and compile work that can happen off the main thread.

Further reading

  • Blazingly fast parsing, part 1: optimizing the scanner
  • Blazingly fast parsing, part 2: lazy parsing