• This browser tweak saved 60% of requests to Facebook
  • Nate Schloss, Ben Maurer
  • The Nuggets translation Project
  • Translator: vuuihc
  • Proofread by: lorinlee, Airmacho

Over the past two years, we at Facebook have been working with browser vendors to improve browser caching. As a result, Chrome and Firefox recently introduced features that make their caches much more efficient for us and for the web as a whole. With these improvements, the number of static resource requests sent to our servers has dropped by 60%, which greatly improves page load times. (Static resources are files that the server reads from disk and serves as-is, without running any extra code.) This article explains in detail what we did with Chrome and Firefox to achieve this. First, though, we need to define some concepts and context that help explain the problem we set out to solve. The first of these is revalidation.

Each revalidation means another request

When you browse the web, browsers often reuse the same resources, such as the same logo or JavaScript code in different pages. If the browser needs to download these resources repeatedly, it is very wasteful.

To avoid repeated downloads, an HTTP server can specify an expiration time and a validation mechanism for each response, which tell the browser that it does not need to download the resource again until it expires. The expiration time is sent in the Cache-Control HTTP header, which tells the browser how long it can reuse the latest response. The validation mechanism lets the browser keep reusing a response even after it has expired: the browser asks the server whether the resource is still valid and, if it is, reuses the previous response. The validation mechanism is defined by the Last-Modified or ETag HTTP header.
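The expiration check described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular browser implements it: the helper names `parse_max_age` and `is_fresh` are made up for this example, and only the `max-age` directive is considered.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds, or None if the header has none."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_fresh(stored_at: datetime, cache_control: str, now: datetime) -> bool:
    """True while the cached response can be reused without asking the server."""
    max_age = parse_max_age(cache_control)
    if max_age is None:
        # Simplification for this sketch: without max-age, always revalidate.
        return False
    return now - stored_at < timedelta(seconds=max_age)


stored = datetime(2016, 10, 17, 12, 0, tzinfo=timezone.utc)
print(is_fresh(stored, "max-age=3600", stored + timedelta(minutes=30)))  # True
print(is_fresh(stored, "max-age=3600", stored + timedelta(minutes=90)))  # False
```

Once `is_fresh` returns False, the browser falls back to the validation mechanism rather than downloading the resource outright.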

The resource in the following example expires after an hour and uses Last-Modified validation.

    $ curl https://example.com/foo.png
    > GET /foo.png

    < 200 OK
    < last-modified: Mon, 17 Oct 2016 00:00:00 GMT
    < cache-control: max-age=3600
    <image data>

In this example, the browser that receives this response can reuse it for the next hour without having to send a request to example.com. The browser must then revalidate the resource by sending a conditional request to confirm that the image is still up to date:

    $ curl https://example.com/foo.png -H 'if-modified-since: Mon, 17 Oct 2016 00:00:00 GMT'
    > GET /foo.png
    > if-modified-since: Mon, 17 Oct 2016 00:00:00 GMT

If the image has not been modified, the server returns:

    < 304 Not Modified
    < last-modified: Mon, 17 Oct 2016 00:00:00 GMT
    < cache-control: max-age=3600

If the image has been modified, the server returns:

    < 200 OK
    < last-modified: Tue, 18 Oct 2016 00:00:00 GMT
    < cache-control: max-age=3600
    <image data>

If the resource has not been modified, the server sends a 304 Not Modified response. This is better than transferring the entire resource again because there is less data to transfer, but it does not eliminate the latency of a round trip between browser and server. And every time the server returns 304 Not Modified, the browser already had the resource it needed. We want to avoid these wasteful revalidations by letting clients cache resources for longer.
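On the server side, the decision between a full response and 304 Not Modified can be sketched as below. This is a minimal illustration that assumes Last-Modified validation only. The function name `conditional_status` is hypothetical, and a real server would also handle ETag and If-None-Match.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: str, if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's copy is still current, otherwise 200."""
    if if_modified_since is None:
        return 200  # unconditional request: send the full resource
    try:
        unchanged = (
            parsedate_to_datetime(last_modified)
            <= parsedate_to_datetime(if_modified_since)
        )
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the condition
    return 304 if unchanged else 200


same = "Mon, 17 Oct 2016 00:00:00 GMT"
newer = "Tue, 18 Oct 2016 00:00:00 GMT"
print(conditional_status(same, same))   # 304
print(conditional_status(newer, same))  # 200
```

Even in the 304 case the browser has paid for a full round trip, which is the cost the rest of this article tries to remove.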

Avoiding re-downloads with long expiration times

Revalidation leaves us with a thorny question: how long should the expiration time be? With an expiration time of one hour, the browser has to contact the server every hour to check whether the resource has been modified. Many resources, such as logos or JavaScript code, rarely change, so hourly checks are unnecessary. On the other hand, with a long expiration time the browser keeps serving the resource from its cache and may show a stale version.

To solve this problem, Facebook uses content-addressed URLs. Instead of describing a logical resource (e.g. “logo.png”, “library.js”), our URLs are a hash of the content. Each static resource is hashed every time the site is released. We maintain a database that stores these hashes and maps each hash to its content. When we serve a resource, we create a URL from its hash rather than from its name. For example, if the hash of the logo PNG is abc123, we use the URL www.facebook.com/rsrc.php/ab…

Because this scheme uses the hash of the file’s content as the URL, it provides an important guarantee: the content that a content-addressed URL points to never changes. We can therefore set a very long expiration time (currently one year) on all content-addressed URLs. Also, because the content behind a URL never changes, our servers always answer conditional requests for static resources with 304 Not Modified. This saves CPU cycles and lets us respond to such requests more quickly.
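The idea of a content-addressed URL can be illustrated with a short sketch. Everything specific here is an assumption made for the example: the text does not say which hash function Facebook uses or how its URLs are laid out, so SHA-256, the 16-character truncation, and the `static.example.com` base URL are placeholders.

```python
import hashlib

ONE_YEAR_SECONDS = 365 * 24 * 3600  # 31536000


def content_addressed_url(
    content: bytes,
    extension: str,
    base: str = "https://static.example.com/rsrc/",  # hypothetical base URL
) -> str:
    """Build a URL from a hash of the content instead of a logical name."""
    digest = hashlib.sha256(content).hexdigest()[:16]  # assumed hash and length
    return f"{base}{digest}.{extension}"


def cache_headers() -> dict:
    """The content behind the URL never changes, so it can be cached for a year."""
    return {"Cache-Control": f"max-age={ONE_YEAR_SECONDS}"}


url_v1 = content_addressed_url(b"logo bytes, version 1", "png")
url_v2 = content_addressed_url(b"logo bytes, version 2", "png")
print(url_v1 == content_addressed_url(b"logo bytes, version 1", "png"))  # True
print(url_v1 != url_v2)  # True
print(cache_headers())   # {'Cache-Control': 'max-age=31536000'}
```

Changing a file changes its hash and therefore its URL, so a new release never has to invalidate anything already sitting in a browser cache.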

The problem with refresh

The refresh button lets the user fetch a newer version of the current page. When you hit refresh, the browser revalidates the current page even if it has not expired. In addition, it revalidates every subresource on the page, such as images and JavaScript files.

Revalidating subresources means that each one costs a request to the server, even if the user has already visited the site they are refreshing. On sites that use content-addressed URLs, such as Facebook, these revalidation requests are futile. The content behind a content-addressed URL never changes, so revalidation always yields a 304 Not Modified response. In other words, the revalidation, the request, and the resources spent on the whole process were never necessary in the first place.

Too many conditional requests

In 2014, we found that 60% of requests for static resources received a 304 response. Since the content behind content-addressed URLs never changes, this meant there was an opportunity to optimize away 60% of static resource requests. With Scuba’s help, we started looking at the data on conditional requests and noticed a huge difference between browsers.

After finding that Chrome had the most 304 responses, we started working with the Chrome team to figure out why it was sending so many conditional requests.

Chrome

A line of Chrome source code answered our question. It lists several reasons why Chrome might revalidate the resources on a page, one of which is the user hitting refresh. For example, we found that Chrome revalidated all resources on pages that were loaded in response to a POST request. The reason, the Chrome team told us, was that POST requests tend to happen when a user changes something (such as making a purchase or sending an email) and wants to see the most up-to-date page. However, sites like Facebook use a POST request during login. Every time a user logged in to Facebook, the browser ignored its cache and revalidated all previously downloaded resources. We worked with Chrome product managers and engineers and determined that this behavior was unique to Chrome and unnecessary. After it was fixed, conditional requests from Chrome fell from 63% to 24% of all requests.

Our work with Chrome on the login issue is a great example of how Facebook can work with a browser team to quickly fix a bug. In general, when we look at data, we often break it down by browser. If one browser shows an anomaly, that suggests something in that browser can be optimized, and we can then work with the vendor to solve the problem.

Despite these gains, the percentage of conditional requests from Chrome was still higher than from other browsers, which indicated that there was still room for improvement. We started looking into the refresh process and found that Chrome treated same-URL navigations as a refresh, whereas other browsers did not. A same-URL navigation is when the user enters the URL of the currently loaded page in the address bar and navigates to it. Chrome fixed the same-URL issue, but we did not see much improvement. We then started talking to the Chrome team about changing the behavior of the refresh button.

Changing how the refresh button revalidates means changing a long-standing design of the web. However, as we discussed the issue, we realized that developers are unlikely to rely on this behavior. End users of a site do not know what expiration times and conditional requests are. While some users may press the refresh button when they want a newer version of a page, Facebook’s statistics show that most users do not use the refresh button. Therefore, a developer who changes a resource with an expiration time of X must either accept that users may see the old data until it expires or change the resource’s URL. If developers already handle changes this way, there is no reason to revalidate subresources.

There was some debate in the industry about how to handle this. We proposed a compromise in which resources with a long max-age would never be revalidated, while resources with a short max-age would keep the old behavior. The Chrome team considered this and decided to apply the new behavior to all resources. You can read about their process here. Thanks to Chrome’s blanket approach, all developers and websites benefit from this improvement without making any changes themselves.
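The three reload behaviors discussed above can be compared in a small sketch. It models only what the text describes and is not Chrome’s actual code. The names are hypothetical, the one-week threshold for a "long" max-age is a placeholder (the text gives no number), and the sketch assumes that an expired resource is always revalidated.

```python
from enum import Enum


class ReloadPolicy(Enum):
    LEGACY = "legacy"          # revalidate every subresource on reload
    COMPROMISE = "compromise"  # skip revalidation only for long-lived resources
    CHROME = "chrome"          # treat subresources as on a normal page load


LONG_MAX_AGE = 7 * 24 * 3600  # hypothetical cutoff for a "long" max-age


def revalidates_on_reload(policy: ReloadPolicy, max_age: int, age: int) -> bool:
    """Decide whether a cached subresource triggers a conditional request."""
    if age >= max_age:
        return True  # assumption: expired resources are always revalidated
    if policy is ReloadPolicy.LEGACY:
        return True
    if policy is ReloadPolicy.COMPROMISE:
        return max_age < LONG_MAX_AGE
    return False  # ReloadPolicy.CHROME: fresh resources come from the cache


one_hour, one_year = 3600, 365 * 24 * 3600
for policy in ReloadPolicy:
    print(
        policy.value,
        revalidates_on_reload(policy, one_hour, age=60),
        revalidates_on_reload(policy, one_year, age=60),
    )
# legacy True True
# compromise True False
# chrome False False
```

The last row is the behavior the Chrome team chose: a resource that is still fresh costs no network request on reload, whatever its max-age.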

As you can see in this example, instead of requiring a network request for each child resource on the refreshed page, each file can now be read directly from the cache without being blocked by network requests.

After Chrome released this final improvement, the percentage of conditional requests from Chrome dropped dramatically. This is a win for both Facebook and its users: our servers have fewer 304 Not Modified responses to send, and users can refresh pages more quickly.

Firefox

With Chrome taken care of, we started talking to other browser vendors about the behavior of the refresh button. We filed a bug with Firefox, but the Firefox team chose not to change the long-standing default behavior of the refresh button. Instead, they implemented a proposal from our engineers: a new Cache-Control directive that can be added to some resources to tell the browser that the resource never needs to be revalidated. The idea is that the directive is an additional promise from the developer to the browser that the resource will never change during its max-age lifetime. Firefox chose to implement this as the Cache-Control: immutable header.

With this added header, a request to Facebook for resources will now receive a response similar to the following:

    $ curl https://example.com/foo.png
    > GET /foo.png

    < 200 OK
    < last-modified: Mon, 17 Oct 2016 00:00:00 GMT
    < cache-control: max-age=3600, immutable
    <image data>

Firefox quickly implemented Cache-Control: immutable and shipped it around the time Chrome fully released its final change to refresh behavior. You can read more about Firefox’s improvements here.
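The effect of the immutable directive on a reload can be sketched as follows. This is an illustration of the decision as the text describes it, not Firefox’s implementation. The function names are hypothetical, and the sketch assumes that, apart from immutable, a reload still revalidates every subresource.

```python
def has_directive(cache_control: str, name: str) -> bool:
    """True if the Cache-Control header contains the named directive."""
    return any(
        directive.strip().split("=")[0].lower() == name
        for directive in cache_control.split(",")
    )


def reload_revalidates(cache_control: str, is_fresh: bool) -> bool:
    """Decide whether a reload sends a conditional request for a subresource."""
    if is_fresh and has_directive(cache_control, "immutable"):
        return False  # the server promised the content will not change
    return True  # otherwise keep the long-standing reload behavior


print(reload_revalidates("max-age=3600, immutable", is_fresh=True))   # False
print(reload_revalidates("max-age=3600", is_fresh=True))              # True
print(reload_revalidates("max-age=3600, immutable", is_fresh=False))  # True
```

Because the promise only covers the freshness lifetime, an expired resource is still revalidated even when it carries the directive.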

Supporting Firefox cost us some development work, but once we modified our server code to add the immutable directive, we saw good results.

After the improvement

Improvements in Chrome and Firefox have resulted in significantly fewer revalidation requests from these modern browsers. This reduces the bandwidth strain on our servers and, more importantly, improves load times for people visiting Facebook.

Unfortunately, this kind of change is hard to measure exactly: a new browser version contains so many improvements that it is almost impossible to isolate the impact of one of them. However, when testing this change, the Chrome team ran an A/B test and found that, for mobile users on 3G networks, the 90th percentile reload time was 1.6 seconds faster.

Conclusion

This was tricky work because we were asking to change long-standing behavior of the web. But it shows that browser vendors can work, and have worked, with web developers to make the web better for everyone. We are glad to have built such a good relationship with the Chrome and Firefox teams, and we are excited to keep working together to improve the web for everyone.

