• Cache-Control for Civilians
  • Originally written by: Harry
  • The Nuggets translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: sunui
  • Proofreader: Portandbridge, YZW7489757

The best web requests are those that never need to reach the server: in the race for website speed, avoiding the network altogether beats using it. To that end, a solid caching strategy can make a huge difference to your visitors’ experience.

That being said, in my work I see more and more caching opportunities being unconsciously missed or even completely ignored, perhaps because of an excessive focus on the first visit, or simply through a lack of awareness and knowledge. Whatever the reason, a little refresher is in order.

Cache-Control

One of the most common and effective ways to manage the caching of static resources is the Cache-Control HTTP header. This header applies to each resource individually, which means that everything on our page can have its own very customized, granular caching policy. This gives us a great deal of control and lets us build incredibly complex and powerful caching strategies.

A Cache-Control header might look like this:

Cache-Control: public, max-age=31536000

Cache-Control is the header field name; public and max-age=31536000 are directives. The Cache-Control header accepts one or more directives, and what I want to cover in this article is what these directives really mean and how they are best used.
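To make the anatomy of the header concrete, here is a minimal Python sketch that splits a header value into its parts. The function name parse_cache_control is my own invention, and the parser is deliberately naive: it assumes that no argument contains a quoted comma, which a production parser would have to handle.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into a {name: argument} mapping.

    Simplification (assumption): arguments never contain commas.
    """
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Names are case-insensitive; an argument may be wrapped in quotes.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


print(parse_cache_control("public, max-age=31536000"))
# {'public': None, 'max-age': '31536000'}
```

Entries without an argument (such as public) map to None, which keeps them distinguishable from entries that carry a value.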

public and private

public means that any cache, including CDNs, proxy servers, and so on, may store a copy of the response. The public directive is often redundant, because the presence of other directives (such as max-age) already implicitly marks the response as cacheable.

In contrast, private is an explicit directive indicating that only the final recipient of the response (the client, i.e. the browser) may cache the file. While private is not a security feature in itself, it is intended to stop public caches (such as CDNs) from storing responses that contain a user’s personal information.

max-age

max-age defines how long (in seconds, relative to the time of the request) the response is considered “fresh.”

Cache-Control: max-age=60

Responses can be cached and reused for the next 60 seconds.

This Cache-Control header tells the browser that it can use the file from the cache for the next 60 seconds without worrying about revalidation. After 60 seconds, the browser goes back to the server to revalidate the file.

If there is a new file available for the browser to download, the server returns 200, the browser downloads the new file, the old file is removed from the HTTP cache, the new file takes its place, and the new cache header is applied.

If there is no new copy to download, the server returns 304; no file needs to be downloaded, and the cached copy is updated with the new headers. That is, if the Cache-Control: max-age=60 header is still present, the cached file’s 60 seconds start again, giving the file a total cache time of 120 seconds.
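The 60-plus-60 arithmetic above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any browser implements its cache: CachedResponse and on_revalidation are hypothetical names, time is a plain number of seconds, and age is measured from the moment the entry was stored or last revalidated (real caches also take the Date and Age headers into account).

```python
from dataclasses import dataclass


@dataclass
class CachedResponse:
    stored_at: float  # second at which the entry was stored or last revalidated
    max_age: int      # freshness lifetime in seconds

    def is_fresh(self, now: float) -> bool:
        # Fresh only while the entry's age is strictly below max-age.
        return now - self.stored_at < self.max_age


def on_revalidation(status: int, now: float, new_max_age: int) -> CachedResponse:
    """Return the cache entry that exists after a revalidation round trip."""
    if status not in (200, 304):
        raise ValueError("this sketch only models 200 and 304")
    # Both outcomes restart the freshness clock with the new headers;
    # a 200 would also replace the stored body, which is not modelled here.
    return CachedResponse(stored_at=now, max_age=new_max_age)


entry = CachedResponse(stored_at=0, max_age=60)
print(entry.is_fresh(59))   # True
print(entry.is_fresh(60))   # False: time to revalidate
entry = on_revalidation(304, now=60, new_max_age=60)
print(entry.is_fresh(119))  # True: the 60 seconds started again
print(entry.is_fresh(120))  # False
```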

Note: max-age has a big gotcha of its own. It tells the browser when the resource becomes stale, but not that the stale version must never be used. The browser may apply its own heuristics to decide whether to serve a stale copy without revalidating it. This behavior is somewhat nondeterministic, and it is hard to know exactly what a browser will do. For that reason, there is a set of more explicit directives we can use to strengthen max-age. Thanks to Andy Davies for clarifying this for me.

s-maxage

s-maxage (note that there is no hyphen between max and age) overrides the max-age directive, but only in public caches. Using max-age and s-maxage together lets you set different freshness lifetimes for private caches and public caches (such as proxies and CDNs).
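As a rough illustration of that precedence, the following Python sketch picks the freshness lifetime a given cache would apply. The function name freshness_lifetime and the shared_cache flag are assumptions made for this example, and the header parsing is intentionally simplistic.

```python
from typing import Optional


def freshness_lifetime(cache_control: str, shared_cache: bool) -> Optional[int]:
    """Return the lifetime in seconds that applies to this kind of cache."""
    directives = {}
    for part in cache_control.split(","):
        name, _, arg = part.strip().partition("=")
        directives[name.lower()] = arg
    # s-maxage wins over max-age, but only for shared (public) caches.
    if shared_cache and "s-maxage" in directives:
        return int(directives["s-maxage"])
    if "max-age" in directives:
        return int(directives["max-age"])
    return None  # no explicit lifetime in the header


header = "max-age=60, s-maxage=3600"
print(freshness_lifetime(header, shared_cache=False))  # 60
print(freshness_lifetime(header, shared_cache=True))   # 3600
```

With a header like this, browsers would revalidate after a minute while a CDN could keep serving its copy for an hour.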

no-store

Cache-Control: no-store

What if we don’t want to cache a file at all? What if it contains sensitive information, like an HTML page showing your bank account details? Or something time-sensitive, like a page of real-time stock prices? We don’t want to store such responses in a cache or serve them from one at all: we want sensitive information thrown away and real-time information fetched fresh every time. In this case we need no-store.

no-store is a very strong directive: it says that no information, private or otherwise, may be persisted to any cache. Any resource served with no-store will always hit the network, without exception.

no-cache

Cache-Control: no-cache

This one confuses most people... no-cache does not mean “do not cache”. It means “do not use the cached copy until you have checked with the server and the server has told you that you may”. Yes, that sounds like it should be called must-revalidate! But it’s not as simple as it sounds.

In fact, no-cache is a very smart way of keeping content as fresh as possible while still using the cached copy whenever we can. no-cache always hits the network, because the browser must revalidate with the server before it may release its cached copy (unless the server responds with an updated file). But if the server’s response allows the cached copy to be used, only the headers travel over the network: the file’s body is taken from the cache rather than downloaded again.

So, as I said, this is a smart way to balance freshness against the chance of getting the file from the cache; the downside is that it costs at least one network round trip for the HTTP response headers.

A good use case for no-cache is almost any dynamic HTML page. Think of the front page of a news site: it’s not real-time, and it doesn’t contain any sensitive information, but ideally we want the page to always be up to date. We can use Cache-Control: no-cache to make the browser check back with the server first, and reuse the cached version if the server has nothing fresher to offer (304). If the server does have fresher content, it responds with 200 and sends the latest file.
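The article does not say how the server decides between 304 and 200. One standard HTTP mechanism is a validator such as ETag, which the browser echoes back in an If-None-Match request header. Below is a minimal, framework-free Python sketch of that decision; make_etag and respond are hypothetical helper names, and deriving the ETag from a truncated SHA-256 hash is just one possible choice.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # Assumption: the validator is derived from a hash of the body.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a page served with no-cache."""
    etag = make_etag(body)
    headers = {"Cache-Control": "no-cache", "ETag": etag}
    if if_none_match == etag:
        # The cached copy is still current: send headers only.
        return 304, headers, b""
    return 200, headers, body


status, headers, _ = respond(b"<h1>Morning news</h1>", None)
print(status)  # 200 on the first visit
status, _, payload = respond(b"<h1>Morning news</h1>", headers["ETag"])
print(status, len(payload))  # 304 0
status, _, _ = respond(b"<h1>Evening news</h1>", headers["ETag"])
print(status)  # 200: the content changed
```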

Tip: sending max-age together with no-cache is pointless, because the time limit for revalidation is zero seconds.

must-revalidate

To make matters even more confusing, although the directive above sounds like it should be called must-revalidate, must-revalidate turns out to be something different (though similar).

Cache-Control: must-revalidate, max-age=600

must-revalidate needs an accompanying max-age directive; above, we set it to ten minutes.

Where no-cache revalidates the cached copy with the server immediately, before it may be used, must-revalidate is more like no-cache with a grace period. For the first ten minutes the browser won’t (I know, I know...) revalidate with the server, but the moment those ten minutes are up, it goes back to the server. If the server has nothing new, it returns 304 and the new Cache-Control header is applied to the cached file, so our ten minutes start again. If there is a new file on the server after ten minutes, we get a 200 response with its body, and the local cache is updated.

A good scenario for this is a blog (like mine): static pages that rarely change. Sure, we want the latest content to be available, but given how rarely my site changes, we don’t need something as heavy-handed as no-cache. Instead, we assume everything is fine for ten minutes and check again after that.

proxy-revalidate

In the same vein as s-maxage, proxy-revalidate is the public cache version of must-revalidate. It is simply ignored by the private cache.

immutable

immutable is a very new and neat directive that tells the browser a little more about the kind of file we are sending: is its content mutable or immutable? Before we look at what immutable does, let’s look at the problem it solves:

A user refresh causes the browser to revalidate a file regardless of whether it is fresh or not, because a user refresh usually means one of two things:

  1. the page looks broken, or
  2. the content appears to be out of date...

...so we go and check whether there is anything fresher on the server.

If fresher content is available on the server, of course we want to download it: we get a 200 response, a new file, and, hopefully, the problem is fixed. If there is no new file on the server, we get a 304 header back and no new file, just the latency of a full round trip. If we revalidate a large number of files and they all return 304, that adds up to hundreds of milliseconds of unnecessary overhead.

immutable is a way of telling the browser that a file will never change (it is immutable), so it shouldn’t bother revalidating it. We can cut out the latency-inducing round trip entirely. So what do we mean by a mutable or an immutable file?

  • style.css: When we change the contents of a file, we do not change its name. This file always exists and its contents can always be changed. This file is mutable.
  • style.ae3f66.css: This file is unique — its name carries a fingerprint based on the contents of the file, so we get a brand new file every time the file is modified. This file is immutable.

We will discuss this in detail in the Cache Busting section.

If we could somehow tell the browser that our file is immutable, that its content never changes, then we could also let the browser know that it needn’t check for a newer version: there will never be a newer version, because the moment the content changes, the old file ceases to exist and a new one takes its place.

This is exactly what the immutable directive does:

Cache-Control: max-age=31536000, immutable

In browsers that support immutable, a user refresh does not trigger revalidation as long as the file is within its 31,536,000-second freshness lifetime. This means we avoid round trips that only end in a 304, which can save a lot of latency on the critical path (CSS blocks rendering). In high-latency scenarios, the saving is noticeable.

Note: never apply immutable to a file that is not immutable. You should also have a well-thought-out cache-busting strategy in place, in case you accidentally cache a file too aggressively by marking it immutable.

stale-while-revalidate

I really, really wish stale-while-revalidate was better supported.

We’ve talked a lot about revalidation: the browser’s trip back to the server to check whether a newer file is available. In high-latency scenarios, revalidation is noticeable, and the time spent waiting for the server to tell us whether we can serve the cached copy (304) or must download a new file (200) is simply dead time.

stale-while-revalidate provides a grace period (set by us) during which the browser may use a stale resource while it checks for a newer version.

Cache-Control: max-age=31536000, stale-while-revalidate=86400

This tells the browser, “This file is good for a year, but once that year is up, you have one extra day in which you may keep serving the stale resource while you revalidate it in the background.”

stale-while-revalidate is a great directive for non-critical resources: we would certainly like the fresher version, but we know that nothing will break if we keep using the stale one while we check for updates.
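The three phases implied by max-age=31536000, stale-while-revalidate=86400 can be written down as a small decision function. This is only a sketch of the idea: swr_action and the strings it returns are labels invented for this example, and age is assumed to be the number of seconds since the response was stored.

```python
YEAR = 31_536_000  # 365 days in seconds
DAY = 86_400       # one day in seconds


def swr_action(age: int, max_age: int, stale_while_revalidate: int) -> str:
    """Decide what a cache may do with an entry of the given age."""
    if age < max_age:
        return "serve-from-cache"
    if age < max_age + stale_while_revalidate:
        # Grace period: answer immediately, refresh the entry afterwards.
        return "serve-stale-and-revalidate-in-background"
    return "revalidate-before-serving"


print(swr_action(10, YEAR, DAY))          # serve-from-cache
print(swr_action(YEAR + 60, YEAR, DAY))   # serve-stale-and-revalidate-in-background
print(swr_action(YEAR + DAY, YEAR, DAY))  # revalidate-before-serving
```

Only the last phase makes the user wait for the network; in the middle phase the round trip happens off the critical path.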

stale-if-error

Similar to stale-while-revalidate, stale-if-error gives the browser a grace period to use the old response if an error such as 5xx is returned when a resource is revalidated.

Cache-Control: max-age=2419200, stale-if-error=86400

Here we make the cache valid for 28 days (2,419,200 seconds); after that, if we hit an internal error, we allow the stale version to be served for one additional day (86,400 seconds).
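Sketched in Python, the fallback might look like the following. The function pick_response and its return labels are hypothetical, and the model assumes the origin's status code is already known once the entry has gone stale.

```python
TWENTY_EIGHT_DAYS = 28 * 24 * 60 * 60  # 2,419,200 seconds
ONE_DAY = 24 * 60 * 60                 # 86,400 seconds


def pick_response(age: int, max_age: int, stale_if_error: int, origin_status: int) -> str:
    """Choose which response the user gets for an entry of the given age."""
    if age < max_age:
        return "fresh-from-cache"  # the origin is not contacted at all
    server_error = 500 <= origin_status <= 599
    if server_error and age < max_age + stale_if_error:
        return "stale-from-cache"  # hide the 5xx behind the old copy
    return "origin-response"       # whatever the origin said, good or bad


age = TWENTY_EIGHT_DAYS + 3600  # one hour past max-age
print(pick_response(age, TWENTY_EIGHT_DAYS, ONE_DAY, 503))  # stale-from-cache
print(pick_response(age, TWENTY_EIGHT_DAYS, ONE_DAY, 200))  # origin-response
```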

no-transform

no-transform has nothing to do with storing, serving, or revalidating for freshness; instead, it tells intermediaries not to alter or transform the resource in any way.

A common case of an intermediary altering responses is a telecoms provider optimizing on the developer’s behalf: the provider may proxy image requests through its own stack and optimize the images before passing them on to the end user over its mobile network.

The problem here is that the developer starts to lose control over how their resources are presented: the provider’s image optimization may be too aggressive, even unacceptably so, or we may already have optimized our images to the point where any further optimization is unnecessary.

Here, we are telling the middleman: do not transform our content.

Cache-Control: no-transform

no-transform can be combined with any other directive and works independently of them.

Beware: some transformations are a good idea. A CDN choosing between Gzip and Brotli encoding depending on what the user’s browser supports, or an image transformation service automatically converting to WebP, are two examples.

Beware: if you’re serving over HTTPS, intermediaries and proxies can’t alter your data anyway, so no-transform is redundant.

Cache Busting

It would be irresponsible to talk about caching without talking about cache busting. I always recommend settling your cache-busting strategy before you even think about your caching strategy. Doing it the other way round is asking for trouble.

Cache busting solves problems like: “I just told the browser to use this file for the next year, but then I changed it, and I don’t want users to wait a whole year before they get the new copy! What do I do?!”

No cache busting: style.css

This is the least advisable option: no cache busting at all. This is a mutable file, and it is really hard for us to bust its cache.

You have to be very careful when caching files like this, because once they are on the user’s device, we lose almost all control over them.

Although this example is a stylesheet, HTML pages fall into this camp as well. We can’t change the file name of a web page. Imagine the damage! This is why we tend never to cache them.

Query string: style.css?v=1.2.14

This is still a mutable file, but we have added a query string to its path. It is better than nothing, but it is not perfect. If anything strips the query string, we are right back where we started, with no cache busting at all. Many proxy servers and CDNs will not cache anything with a query string, either by configuration (for example, Cloudflare’s documentation says that, when serving from cache, a request for “style.css?something” will be normalized to just “style.css”) or defensively (the query string may contain information that applies to one specific response only).
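To see why a stripped query string defeats this approach, consider a cache that builds its key without the query. The Python sketch below imitates that normalization; cache_key_ignoring_query is a made-up name, example.com is a placeholder host, and this does not reproduce the behavior of any particular CDN.

```python
from urllib.parse import urlsplit, urlunsplit


def cache_key_ignoring_query(url: str) -> str:
    """Build a cache key the way a query-stripping cache might."""
    parts = urlsplit(url)
    # Keep scheme, host and path; drop the query string and the fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


old = cache_key_ignoring_query("https://example.com/style.css?v=1.2.14")
new = cache_key_ignoring_query("https://example.com/style.css?v=1.2.15")
print(old)         # https://example.com/style.css
print(old == new)  # True: both versions collide on the same key
```

Because both versions map to the same key, such a cache keeps serving whichever copy it stored first.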

Fingerprint: style.ae3f66.css

Fingerprinting is by far the preferred way of busting a file’s cache. Because the file name changes every time the content changes, technically we never bust anything: we simply get a brand new file! This is very robust, and it lets you use immutable. If you can do this for your static resources, do it! Once you have this very reliable cache-busting strategy in place, you can use the most aggressive form of caching:

Cache-Control: max-age=31536000, immutable

Implementation details

The point of this approach is to change the file name, but it doesn’t have to be a fingerprint. The following examples all have the same effect:

  1. /assets/style.ae3f66.css: busts the cache with a hash of the file’s contents.
  2. /assets/style.1.2.14.css: busts the cache with a release version number.
  3. /assets/1.2.14/style.css: busts the cache by changing a directory in the URL.

However, the last example implies that we are versioning each release rather than each individual file. That in turn means that if we only need to bust the cache for our stylesheet, we end up busting it for every static file in that release. This can be wasteful, so prefer option (1) or (2).
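Option (1) is usually the job of a build tool, but the idea fits in a few lines. The following Python sketch derives a fingerprinted name from a file's contents. The helper name fingerprinted_name, the choice of SHA-256, and the six-character length are assumptions made for illustration, not something the article prescribes.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(path: str, content: bytes, length: int = 6) -> str:
    """Insert a short content hash before the file extension."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:length]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


v1 = fingerprinted_name("/assets/style.css", b"body { color: red; }")
v2 = fingerprinted_name("/assets/style.css", b"body { color: blue; }")
print(v1)  # /assets/style.<6 hex characters>.css
print(v1 != v2)
```

Identical content always yields the same name, so an unchanged file keeps its long-lived cache entry across releases, while any edit produces a URL that no cache has seen before.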

Clear-Site-Data

Cache invalidation is hard (famously one of the hard problems in computer science), so there is a specification that helps developers explicitly clear the entire cache for a site’s origin in one go: Clear-Site-Data.

I don’t want to go too deep into Clear-Site-Data in this article. After all, it is not a Cache-Control directive; it is a brand new HTTP header.

Clear-Site-Data: "cache"

Applying this header to any static file on your domain clears the cache for the entire domain, not just for the file it is attached to. That is, if you need to clear the cache of every visitor to your entire site, you can simply add the header above to your HTML responses.

As of this writing, browser support is limited to Chrome, Android WebView, Firefox, and Opera.

Tip: Clear-Site-Data can take several other directives: “cookies”, “storage”, “executionContexts”, and “*” (which, unsurprisingly, means “all of the above”).
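Each value in this header is a quoted string, and several can be combined with commas. As a small sketch, the Python helper below builds the header value from the values named in this section. The name clear_site_data_value is hypothetical, and the list of accepted values is taken from this article rather than checked against the current specification.

```python
KNOWN_TYPES = ("cache", "cookies", "storage", "executionContexts", "*")


def clear_site_data_value(*types: str) -> str:
    """Build a Clear-Site-Data header value from one or more data types."""
    for t in types:
        if t not in KNOWN_TYPES:
            raise ValueError(f"unknown Clear-Site-Data type: {t!r}")
    # Every value must be wrapped in double quotes.
    return ", ".join(f'"{t}"' for t in types)


print(clear_site_data_value("cache"))             # "cache"
print(clear_site_data_value("cache", "cookies"))  # "cache", "cookies"
```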

Examples and recipes

Okay, let’s look at some scenarios and the kinds of Cache-Control headers we might use for them.

Online Banking Webpage

The pages of an application like online banking list your recent transactions, your current balance, and sensitive account details. All of this needs to be up to date (imagine being shown a balance that is a week old!), and it also needs to be kept strictly private (you don’t want your bank details stored in a shared cache, or ideally in any cache at all).

To do this, we do:

Request URL: /account/
Cache-Control: no-store

According to the specification, this is enough to prevent responses from being persisted to disk in any private or shared cache:

The no-store response directive requires that nothing about client requests and server responses be stored in the cache. This directive applies to both private and shared caches. “No store” in the above context means that the cache must not intentionally store information in non-volatile memory and must use its best efforts to remove information from volatile memory as soon as possible after transfer.

But if you’re still worried, maybe you can choose this:

Request URL: /account/
Cache-Control: private, no-cache, no-store

This explicitly instructs caches not to store anything in public caches (such as CDNs), to always serve an up-to-date copy, and not to persist anything.

Real time train schedule page

If we want to create a page that displays quasi-real-time information, we want to ensure that users always see the most accurate and real-time information possible. We use:

Request URL: /live-updates/
Cache-Control: no-cache

This simple directive tells the browser not to display a response from the cache without first validating it with the server. This means users will never be shown stale information, yet they still benefit from getting the file from the cache whenever the server has nothing newer than the cached copy.

This is a smart choice for almost any site: give us as much up-to-date content as possible, while allowing us to enjoy the speed of caching as much as possible.

The FAQ page

A page like a FAQ may be rarely updated, and its content is unlikely to be time-sensitive. It’s certainly not as important as real-time sports scores or flight status. We can cache such HTML pages for a period of time and force the browser to check for new content periodically rather than every time it is accessed. We set it like this:

Request URL: /faqs/
Cache-Control: max-age=604800, must-revalidate

This lets the browser cache the HTML page for a week (604,800 seconds); once that week is up, it must check with the server for updates.

Beware: applying different caching strategies to different pages of the same site can cause problems. Your no-cache home page will always request the latest style.f4fa2b.css that it references, while a still-cached copy of your FAQ page will keep pointing at style.ae3f66.css. The consequences may be minor, but it is a scenario to be aware of.

Static JS (or CSS) App bundles

Take our app.[fingerprint].js: it changes very frequently, potentially with every release, but we have put in the work to fingerprint the file every time it changes, so we can use this:

Request URL: /static/app.1be87a.js
Cache-Control: max-age=31536000, immutable

It doesn’t matter how often we update our JS: because we can bust its cache reliably, we can cache it for as long as we like. In this case we chose one year. A year, firstly, because that is a long time, and secondly because a browser is very unlikely to hold on to a file for that long anyway (browsers have a limited amount of storage for the HTTP cache and periodically empty parts of it, and users may clear the cache themselves). Configuring anything longer than a year is most likely pointless.

Furthermore, since the contents of this file never change, we can tell the browser that the file is immutable. It does not need to be revalidated for the whole year, even when the user refreshes the page. Not only do we get the speed benefit of the cache, we also avoid the latency penalty of revalidation.

Decorative picture

Imagine a purely decorative photo accompanying the article. It’s not an infographic, and it doesn’t contain key content that affects the rest of the page. Users won’t even notice if it’s completely gone.

Images tend to be a heavy resource to download, so we want to cache it; Since it’s not as critical on the page, we don’t need to download the latest version; We can even continue to use this photo after it’s a little out of date. See how:

Request URL: /content/masthead.jpg
Cache-Control: max-age=2419200, must-revalidate, stale-while-revalidate=86400

Here we tell the browser to cache the image for 28 days (2,419,200 seconds) and that, after those 28 days, we do want it to check with the server for updates. If the image is less than one day (86,400 seconds) stale, the browser can keep using it while it fetches the latest version in the background.
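Pulling the recipes in this section together, a server might pick the header from the request path. The Python sketch below is one way to express that mapping. The function cache_control_for, the fingerprint pattern, and the no-cache fallback for everything else are my assumptions; the paths and header values are the ones used in the examples above.

```python
import re

# Assumption: a fingerprint is six or more hex characters before .js or .css.
FINGERPRINTED = re.compile(r"\.[0-9a-f]{6,}\.(?:js|css)$")


def cache_control_for(path: str) -> str:
    """Return the Cache-Control value for a request path."""
    if path.startswith("/account/"):
        return "no-store"  # online banking
    if path.startswith("/live-updates/"):
        return "no-cache"  # live train timetable
    if path.startswith("/faqs/"):
        return "max-age=604800, must-revalidate"  # FAQ page, one week
    if FINGERPRINTED.search(path):
        return "max-age=31536000, immutable"  # fingerprinted bundles
    if path.startswith("/content/") and path.endswith(".jpg"):
        # Decorative images, exactly as in the example above.
        return "max-age=2419200, must-revalidate, stale-while-revalidate=86400"
    return "no-cache"  # assumed default for anything else


print(cache_control_for("/static/app.1be87a.js"))  # max-age=31536000, immutable
print(cache_control_for("/account/"))              # no-store
```

Keeping the policy in one function makes it easier to spot pages whose strategies could drift out of sync with each other.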

Key points to remember

  • Cache busting is extremely, extremely, extremely important. Settle your cache-busting strategy before you start work on your caching strategy.
  • In general, caching HTML content is a bad idea. An HTML URL cannot be busted, and because an HTML page is usually the entry point to the rest of a page’s subresources, caching it also caches its references to your static files. This will cause you (and your users)... well, it’s a long story.
  • If you do cache HTML, be aware that using different strategies for different types of page (one type never cached, others sometimes cached) can lead to inconsistencies.
  • If you can reliably cache-bust your static resources (using fingerprints), then you are best off caching everything for years at a time.
  • Non-critical content can be given a stale grace period with directives such as stale-while-revalidate.
  • immutable and stale-while-revalidate not only give us the traditional benefits of caching, they also let us cut the latency cost of revalidation.

Avoiding the network wherever possible gives users a faster experience (and puts less load on our infrastructure, too). With a detailed understanding of our resources and an overview of what is available to us, we can start to design granular, customized, and effective caching strategies for our applications.

Cache in hand, everything under control.

References and related readings

  • Best Practices & Max-Age Gotchas — Jake Archibald, 2016
  • Cache-Control: immutable —— Patrick McManus, 2016
  • Stale-While-Revalidate, Stale-If-Error Available Today —— Steve Souders, 2014
  • A Tale of Four Caches —— Yoav Weiss, 2016
  • Clear-Site-Data (MDN)
  • RFC 7234 — HTTP/1.1 Caching —— 2014

Do as I say, not as I do

Before anyone calls me a hypocrite, it is worth mentioning that my own blog’s caching strategy is so poor that I can hardly bear to look at it.
