This article originally appeared on my public account [Sun Wukong, Don't Talk Nonsense].
The content is based on the booklet "Front-End Performance Optimization: Principles and Practice", with my own descriptions and understanding added on top.
If this chapter interests you, head over to that booklet for more details.
Getting to know the browser cache
When something goes wrong with our browser, phone, or computer, the first thing we try, on our own initiative or at someone's suggestion, is to "clear the cache". "Clearing the cache" seems to be a one-size-fits-all remedy, a bit like "have you tried rebooting?" when your computer misbehaves.
Sometimes it solves the problem, sometimes it doesn't. The curious thing is that, whether or not it works, most of us can't say exactly what this "cache" actually contains.
Today, let's take a proper look at the browser cache.
Why do we need a browser cache
Before we dive into the various caching mechanisms, let’s talk about why browser caching is needed.
Simply put, the browser's designers concluded that the network is not fast enough: getting a resource from a cache in memory or on disk is far quicker than fetching it over the Internet. They also observed that on most websites the same resource shows up on many pages. The natural question follows: why should the browser download the same resource again and again for different pages? Why not download it once and reuse it everywhere, skipping a lot of downloads? Doing so also makes the browsing experience noticeably better.
Browser Cache Summary
The notion of "browser cache" means different things to different people.
Sometimes it is stretched too far: HTTP caches, memory caches, cookies, Web Storage, and IndexedDB data all get called "cache", on the grounds that they are all data stored on the client side, so why distinguish them?
Sometimes it is taken too narrowly: some people equate the browser cache with the HTTP cache alone.
In fact, the browser caching mechanism has four layers. Listed in the order in which the browser checks them when requesting a resource, they are:
- Memory Cache
- Service Worker Cache
- HTTP Cache
- Push Cache
You are probably familiar with the HTTP cache (Cache-Control, Expires, and so on), but the other types may be less familiar. Let's look at a screenshot from the developer tools:
Then zoom in on the size column:
Here we see descriptions like "(from XXX)", which mark resources obtained from a cache: "from memory cache" corresponds to the memory cache, and "from ServiceWorker" corresponds to the Service Worker cache. As for the Push Cache, it is a special case introduced with HTTP/2.
This article focuses on two of them: the Memory Cache and the HTTP Cache. The other two are far less common, so I will set them aside here.
Since HTTP caching is the most important and most representative caching strategy, the following sections concentrate on the HTTP caching mechanism.
An exploration of HTTP caching mechanism
HTTP caching is the caching mechanism we deal with most in day-to-day development. It works through HTTP header fields, which can appear in either request headers or response headers. In general, "the HTTP caching mechanism" refers to the caching decisions the browser and the server make based on specific fields in the response headers.
More precisely, HTTP caching is divided into strong caching and negotiated caching. Strong caching has the higher priority; negotiated caching only kicks in when no strong cache is matched.
Strong cache
Strong caching is controlled by the Expires and Cache-Control fields in the HTTP headers. With strong caching, when a resource is requested again, the browser decides on its own, based on Expires and Cache-Control, whether the target resource "hits" the strong cache. If it does, the browser serves the resource straight from its cache without talking to the server at all.
Implementation of strong caching: from Expires to Cache-Control
Before HTTP/1.1, strong caching was implemented with Expires: when the server returns a response, it writes an expiration time into the response headers. For example:
Look at the Expires field: Mon, 10 Feb 2020 03:00:35 GMT.
As you can see, Expires is a timestamp. If we request the same resource again, the browser compares the local time with this timestamp; if the local time is earlier than the expiration time set by Expires, it simply takes the resource from the cache.
It is not hard to guess from this description where Expires falls short: its biggest problem is its reliance on "local time". If the server's clock and the client's clock are out of sync, or if the user simply changes the client's clock by hand, Expires will not behave as expected.
To address the limitations of Expires, HTTP/1.1 added the Cache-Control field to take over its job. Cache-Control can do everything Expires does, plus things Expires cannot, so it can be regarded as a complete replacement for Expires.
Take cache-control: max-age=600 as an example. Cache-Control controls the resource's validity period by setting max-age, whose unit is seconds: the resource stays valid for 600 seconds after the response is received. Because the lifetime is relative rather than an absolute clock time, this neatly sidesteps the "local time" problem of Expires.
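To make this concrete, here is a minimal sketch of what setting these two headers might look like on the server side, using Node's built-in http module. The /logo.png path and the 600-second lifetime are made-up values for illustration, not recommendations:

```ts
// A minimal sketch (not production code): a Node.js server marking one resource
// as strongly cacheable. The 600-second lifetime and the /logo.png path are
// arbitrary values chosen for illustration.
import * as http from "node:http";

http
  .createServer((req, res) => {
    if (req.url === "/logo.png") {
      // Legacy strong-cache header: an absolute expiry time, so it depends on the client's clock.
      res.setHeader("Expires", new Date(Date.now() + 600 * 1000).toUTCString());
      // Modern strong-cache header: a relative lifetime in seconds; it takes precedence over Expires.
      res.setHeader("Cache-Control", "max-age=600");
      res.setHeader("Content-Type", "image/png");
      res.end(); // the image bytes would go here
      return;
    }
    res.statusCode = 404;
    res.end();
  })
  .listen(3000);
```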
A closer look at Cache-Control
As a complete replacement for Expires, Cache-Control is more powerful than you might think.
To better analyze cache-control, let’s take a look at some new concepts:
max-age and s-maxage
s-maxage is less well known than max-age. It only affects proxy servers, not the client. Does that make it unimportant? Not really.
For projects that are not particularly large, max-age is usually enough. But in large architectures that sit behind various proxies, we also have to think about caching on those proxy servers. s-maxage specifies how long the resource stays valid on a caching server (such as a CDN node), and it applies only to public caches.
Let’s look at public.
public and private
public and private are opposing settings that determine whether a resource may be cached by proxy servers.
If we mark a resource public, both the browser and proxy servers may cache it; if we mark it private, only the browser may cache it.
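For example, a header like `cache-control: public, max-age=600, s-maxage=31536000` (the numbers are purely illustrative) would let the browser reuse the resource for 600 seconds while a CDN could keep serving its cached copy for up to a year; for shared caches, s-maxage takes precedence over max-age.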
Since there is a way to keep proxy servers from caching a resource, is there also a way to keep the browser from caching it? There is indeed. Let's look at the next pair of settings.
no-cache and no-store
no-cache skips the browser's own cache check: for a resource marked no-cache, the browser no longer decides by itself whether the cached copy is still fresh; instead, every request goes to the server to confirm whether the resource has expired (following the negotiated-cache route explained later).
no-store is more drastic: as the name implies, it refuses any caching strategy whatsoever. It skips even the server-side freshness confirmation that no-cache relies on, and only allows sending the request straight to the server and downloading the complete response.
Negotiated cache
Negotiated caching is a strategy in which the browser and the server cooperate; it relies on communication between the two.
As noted above, the negotiated cache is consulted only after the strong cache has missed. The browser then needs to ask the server for information about the cached copy, to decide whether to re-issue the request and download the complete response, or to fetch the resource from its local cache instead.
If the server indicates that the cached resource is Not Modified, the request is redirected to the browser's cache, and the network request shows a status code of 304 (see the figure below).
Implementation of negotiated caching: from Last-Modified to ETag
Last-Modified is a timestamp. If negotiated caching is enabled, it comes back in the response headers of the first request:
last-modified: Mon, 06 Jan 2020 08:00:19 GMT
On every subsequent request for the same resource, the request carries a timestamp field called If-Modified-Since, whose value is the Last-Modified value from the previous response:
If-Modified-Since: Mon, 06 Jan 2020 08:00:19 GMT
On receiving this timestamp, the server compares it with the time the resource was last modified on the server to decide whether the resource has changed.
If it has changed, a complete response is returned with a fresh Last-Modified value in the response headers; otherwise, a 304 response is returned and the response headers do not include Last-Modified.
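Here is a rough sketch, using Node's built-in modules, of what this Last-Modified / If-Modified-Since handshake could look like on the server. The ./static/index.html path is an assumption for illustration; a real server would resolve the file from the request URL:

```ts
// A rough sketch of the Last-Modified / If-Modified-Since handshake.
// The file path is a made-up example; real servers resolve it from the request URL.
import * as http from "node:http";
import * as fs from "node:fs";

const FILE = "./static/index.html"; // hypothetical resource

http
  .createServer((req, res) => {
    const stat = fs.statSync(FILE);
    // HTTP dates only have one-second precision, so truncate the milliseconds before comparing.
    const lastModified = new Date(Math.floor(stat.mtimeMs / 1000) * 1000);

    const since = req.headers["if-modified-since"];
    if (typeof since === "string" && new Date(since).getTime() >= lastModified.getTime()) {
      // The browser's copy is still considered fresh: answer 304 with no body.
      res.statusCode = 304;
      res.end();
      return;
    }

    // Otherwise send the complete response and (re)stamp Last-Modified.
    res.setHeader("Last-Modified", lastModified.toUTCString());
    res.setHeader("Content-Type", "text/html");
    res.end(fs.readFileSync(FILE));
  })
  .listen(3000);
```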
Last-Modified has some drawbacks, and two scenarios show up most often:
- We edited the file, but its content did not actually change (for example, we re-saved the file without modifying it, or made a change and then undid it). The server has no way of knowing whether we really changed anything; it still judges by the last-edit time. The resource is therefore treated as new on the next request and triggers a complete response when it should not.
- We modified the file too quickly (say, the change took only 100ms). If-Modified-Since can only detect differences at one-second granularity, so it fails to perceive the change, and a new response is not triggered when it should be.
Both scenarios point to the same flaw: the server cannot correctly perceive changes to the file. To solve this, ETag emerged as a supplement to Last-Modified.
An ETag is a unique identifier string the server generates for each resource, derived from the file's content: as long as the content differs, the corresponding ETag differs, and vice versa. ETag can therefore perceive file changes accurately.
ETag works much like Last-Modified, with a similar validation flow, so I won't repeat it here.
ETag is not without drawbacks, though:
- Generating it costs the server extra work, which can hurt server performance.
- With a distributed back end, different servers may generate different identifiers for the same resource, so extra handling logic is needed to make ETag work correctly.
Enabling ETag therefore calls for case-by-case judgment. As just mentioned, ETag does not replace Last-Modified; it exists only as a supplement to and enhancement of it. ETag perceives file changes more accurately than Last-Modified and has higher priority: when both are present, ETag wins.
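For completeness, here is a hedged sketch of ETag-based validation using If-None-Match. The choice of MD5 and the ./static/app.js path are illustrative assumptions; note that hashing the file on every request is exactly the kind of extra server-side cost mentioned above:

```ts
// A hedged sketch of ETag-based validation with If-None-Match.
// MD5 and the file path are illustrative choices, not requirements.
import * as http from "node:http";
import * as fs from "node:fs";
import * as crypto from "node:crypto";

const FILE = "./static/app.js"; // hypothetical resource

http
  .createServer((req, res) => {
    const body = fs.readFileSync(FILE);
    // The identifier is derived from the content, so it changes exactly
    // when the content changes; computing it costs CPU on every request.
    const etag = `"${crypto.createHash("md5").update(body).digest("hex")}"`;

    if (req.headers["if-none-match"] === etag) {
      res.statusCode = 304;
      res.end();
      return;
    }

    res.setHeader("ETag", etag);
    res.setHeader("Content-Type", "application/javascript");
    res.end(body);
  })
  .listen(3000);
```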
HTTP cache decision recommendations
By now we have a clear picture of HTTP caching. So what do we do when facing a real caching requirement? Are there any general recommendations?
Let's take a look at the authoritative guide from Chrome:
Walking through this flowchart:
If the resource's content cannot be reused at all, set Cache-Control to no-store to reject every form of caching. Otherwise, consider whether the resource must be revalidated with the server on every request; if so, set Cache-Control to no-cache. If not, consider whether the resource may be cached by proxy servers and set private or public accordingly. Then think about the resource's expiration time and set suitable max-age and s-maxage values. Finally, configure the ETag and Last-Modified fields that negotiated caching needs.
I hope this chart (and the guide behind it) proves useful in day-to-day work.
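To tie the flow together, here is a hypothetical helper that walks those decisions and produces a Cache-Control value. The CachePolicy field names are invented for this sketch and do not come from any library:

```ts
// A hypothetical helper that walks the decision flow above and produces a
// Cache-Control value. The CachePolicy field names are invented for this sketch.
interface CachePolicy {
  reusable: boolean;            // can the content be reused at all?
  revalidateEveryTime: boolean; // must the server confirm freshness on every request?
  sharedCacheAllowed: boolean;  // may proxies / CDNs cache it?
  maxAgeSeconds: number;        // how long the browser may reuse it
  sMaxAgeSeconds?: number;      // how long shared caches may reuse it
}

function buildCacheControl(p: CachePolicy): string {
  if (!p.reusable) return "no-store";           // reject all caching
  if (p.revalidateEveryTime) return "no-cache"; // always revalidate with the server

  const parts = [
    p.sharedCacheAllowed ? "public" : "private",
    `max-age=${p.maxAgeSeconds}`,
  ];
  if (p.sharedCacheAllowed && p.sMaxAgeSeconds !== undefined) {
    parts.push(`s-maxage=${p.sMaxAgeSeconds}`);
  }
  return parts.join(", ");
}

// buildCacheControl({ reusable: true, revalidateEveryTime: false,
//   sharedCacheAllowed: true, maxAgeSeconds: 600, sMaxAgeSeconds: 31536000 })
// => "public, max-age=600, s-maxage=31536000"
// ETag / Last-Modified would still be configured separately for negotiated caching.
```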
MemoryCache introduction
A MemoryCache is a cache that exists in memory. In terms of priority, it is the first cache that the browser tries to hit. In terms of efficiency, it is the fastest type of cache.
Memory caching is fast but short-lived. It "lives and dies" with the rendering process: once the process ends (the tab is closed), the data in memory is gone.
So which files end up in memory?
In fact, there has never been a definitive rule for this split. What you can keep in mind is that memory is limited: the browser first has to reserve enough memory for what needs to be rendered right now, and only then decides, case by case, how to split resources between memory and disk. To some extent, where a resource lands is random.
So if, during development, you find the same resource served sometimes "from disk cache" and sometimes "from memory cache", don't be surprised; that is exactly this randomness at work.
Summary
In this article we talked about why browsers need caches and looked at two common ones: the HTTP Cache and the Memory Cache. I hope you got something out of it!
Front-end performance optimization series:
(1) Start with the TCP three-way handshake
(2) Blocking in the TCP transmission process
(3) Optimization of the HTTP protocol
(4) Image optimization
(5) Browser cache strategy
(6) How does the browser work?
(7) Webpack performance optimization