Preface
When you visit a web page, the client downloads the resources it needs from the server. Some of these resources rarely change, such as HTML, JS, CSS, images, and font files. If they are downloaded from the origin server every time a page loads, retrieval takes longer and the server carries unnecessary load. It therefore makes sense to reuse resources that have already been fetched: a copy of the requested resource is stored, and the next time the same resource is requested, the stored copy is used instead of downloading it from the origin server again. This is what we call caching.
There are many types of caches: browser cache, gateway cache, CDN cache, proxy server cache, etc. These caches fall roughly into two categories: shared caches and private caches. Shared caches can be used by multiple users, while private caches can only be used by a single user. The browser cache exists only for each individual client, so it is a private cache.
This article focuses on private (browser) caches. You will learn:
- How browser caches are classified
- How to enable and disable caching
- Where the cache is stored
- How to set the expiration time of the cache
- What happens after the cache expires
- How to develop an appropriate caching strategy for your application
- Debugging
  - How to tell whether a site has caching enabled
  - How to disable browser caching
The cache storage
Enable the cache
Cache-Control
The browser decides whether to cache a resource based on certain fields in the HTTP response headers. Caching can be enabled by setting Cache-Control or Expires in the response headers so that the resource is cached on the client.
Cache-Control enables caching through directives such as private, public, max-age, and no-cache:
Cache-Control: private/public
Cache-Control: max-age=300
Cache-Control: no-cache
- private: the resource may be cached only by the browser.
- public: the resource may be cached by the browser and by any intermediary (such as a proxy server or CDN).
- max-age: the maximum time, in seconds, that the resource may be cached. With max-age=0 the resource is still cached by the browser, but it expires immediately.
- no-cache: the resource is cached, but the browser must check with the server whether it has changed before each reuse, and the cached copy is used only if it has not changed. In practice this behaves much like max-age=0.
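To make the directive syntax concrete, here is a minimal sketch of how a Cache-Control value could be split into its directives. The function name parse_cache_control is made up for illustration, and the parsing is deliberately simplified: it assumes no directive value contains a quoted comma.

```python
from typing import Dict, Optional


def parse_cache_control(header_value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into a {directive: value} mapping.

    Directives without a value (such as "public") map to None.
    Simplification: quoted values that contain commas are not handled.
    """
    directives: Dict[str, Optional[str]] = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        # Directive names are case-insensitive, so normalize to lower case.
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


print(parse_cache_control("public, max-age=300"))
```

A caller could then check, for example, whether "no-cache" or "max-age" is among the returned keys.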
Expires
Expires controls when a resource expires by specifying the point in time at which the cached copy becomes stale. Caching can also be enabled by setting Expires. Note that the Expires value is in Greenwich Mean Time (GMT), not local time.
Expires: Thu, 08 Mar 2029 08:05:59 GMT
Expires: 0 // Expires: 0 still enables caching, but the resource expires immediately.
Priority
Since both Cache-Control and Expires can enable caching, how should a browser cache a resource when Cache-Control: max-age=600 and Expires: 0 are both set? Only Cache-Control: max-age=600 takes effect. Cache-Control takes precedence over Expires, so when both are set, Cache-Control prevails.
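The precedence rule can be sketched as a small function that works out how long a response stays fresh. This is only an illustration under stated assumptions: the name freshness_lifetime is hypothetical, the headers are passed in as a plain mapping, and the Expires lifetime is measured from a caller-supplied "now" rather than from the response's Date header.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def freshness_lifetime(headers: Mapping[str, str], now: datetime) -> Optional[float]:
    """Return the number of seconds the response stays fresh.

    Returns None when neither max-age nor Expires gives an explicit lifetime.
    `now` must be a timezone-aware datetime.
    """
    # 1. Cache-Control: max-age wins whenever it is present.
    for part in headers.get("Cache-Control", "").split(","):
        name, sep, value = part.strip().partition("=")
        if name.lower() == "max-age" and sep and value.strip().isdigit():
            return float(value)

    # 2. Otherwise fall back to Expires.
    expires = headers.get("Expires")
    if expires is not None:
        try:
            expires_at = parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            # An invalid date such as "0" is treated as already expired.
            return 0.0
        return max(0.0, (expires_at - now).total_seconds())

    return None


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(freshness_lifetime({"Cache-Control": "max-age=600", "Expires": "0"}, now))
```

With both headers set as in the question above, the max-age value is returned and Expires: 0 is never consulted.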
Default behavior of the browser
After Cache-Control is set, you can see in the developer tools that the browser does use the cache (from disk cache). For example:
Cache-Control: max-age=604800, must-revalidate, public
However, I found that even if neither Cache-Control nor Expires is set in the response headers, browsers still cache certain resources. Why is that?
It turns out that when a response has Last-Modified but no Cache-Control or Expires, the browser uses its own heuristic to decide how long to cache the resource. This is an optimization browsers make to improve performance. It may behave differently from browser to browser, and some browsers may not apply it at all. Therefore, if you want to enable caching, set Cache-Control and Expires explicitly rather than relying on the browser's own caching algorithm. And if you are debugging a file that should have been updated, remember to check whether the browser has cached it.
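As a rough illustration of such a heuristic: the HTTP caching specification mentions 10% of the time elapsed since Last-Modified as a typical choice. The sketch below assumes that fraction. Real browsers may use a different rule, and the function name is hypothetical.

```python
from email.utils import parsedate_to_datetime

# Assumption: 10% is the "typical" fraction mentioned in the HTTP caching
# specification; individual browsers are free to choose something else.
HEURISTIC_FRACTION = 0.1


def heuristic_freshness_seconds(date_header: str, last_modified_header: str,
                                fraction: float = HEURISTIC_FRACTION) -> float:
    """Estimate a freshness lifetime from the Date and Last-Modified headers."""
    response_time = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    # The longer a resource has gone unchanged, the longer it is assumed
    # to stay unchanged.
    unchanged_for = (response_time - last_modified).total_seconds()
    return max(0.0, unchanged_for * fraction)


lifetime = heuristic_freshness_seconds(
    "Mon, 14 Jan 2019 14:00:21 GMT",   # Date: when the response was generated
    "Fri, 04 Jan 2019 14:00:21 GMT",   # Last-Modified
)
print(lifetime)  # about one day, in seconds, for a file unchanged for ten days
```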
Disable the cache
Setting no-store in Cache-Control prevents browsers and intermediaries from caching the resource. This is useful for resources that contain private personal data or banking data.
Cache-Control: no-store
Cache target object
In general, browser caches store only responses to GET requests, typically static resources such as HTML, JS, CSS, and images. Because these resources do not change very often, caching them speeds up retrieval. Requests such as POST or DELETE, by contrast, generally differ each time, so caching their responses has little value.
The cache location
The browser can set aside space in memory or on the hard disk to hold copies of requested resources. In the developer tools we often see Memory Cache and Disk Cache, which indicate where the cached copy is located. When a resource is requested, the browser searches the caches in priority order (Service Worker -> Memory Cache -> Disk Cache -> Push Cache). If a match is found, the cached copy is used; otherwise, a network request is made. This section covers only the common Memory Cache and Disk Cache.
200 from Memory Cache
The resource is read directly from memory without contacting the server. Reads are fast because the copy is held in memory, but cached resources are destroyed when the process exits. Since the system generally does not allocate large amounts of memory to the cache, the memory cache is used for small files. Memory caching also suits scenarios where cached data only needs to live briefly (such as the browser's private browsing mode).
200 from Disk Cache
The resource is read directly from the hard disk without contacting the server. The hard disk is slower to read than memory, but a disk cache lasts longer and cached resources survive after the process exits. Because hard disks have large capacity, the disk cache is generally used for large files.
In summary:
Memory cache: fast reads, short lifetime, small capacity
Disk cache: slower reads, long lifetime, large capacity
Classification of cache
Browser caches generally fall into two categories: strong caches (also called local caches) and negotiated caches (also known as weak caches). The decision process is as follows:
- Before sending a request, the browser checks the cache to see if the strong cache is hit. If so, the browser reads the resource directly from the cache and does not send the request to the server. Otherwise, go to the next step.
- When a strong cache is not hit, the browser must make a request to the server. The server determines whether the negotiated cache is hit based on some of the fields in the Request Header. If it hits, the server returns a response, but without any response entity, just telling the browser that it can fetch the resource directly from the cache. Otherwise, go to the next step.
- If the first two steps fail, the resource is loaded directly from the server.
What strong and negotiated caches have in common is that, on a hit, the resource is loaded from the client cache rather than from the server. The difference is that the strong cache sends no request to the server, whereas the negotiated cache sends a request to the server to check whether the resource has changed. A normal refresh uses the negotiated cache and bypasses the strong cache. The strong cache is used only when you enter a URL in the address bar, open a bookmark, follow a link to a resource, and so on.
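The three-step decision process above can be sketched as a small simulation. Everything here is a hypothetical stand-in rather than real browser code: CacheEntry, load, and fake_origin are made-up names, time is a plain number of seconds, and the "server" is a callback that answers 304 when the ETag it receives still matches.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class CacheEntry:
    body: str
    etag: str
    stored_at: float  # time the copy was stored or last validated, in seconds
    max_age: float    # freshness lifetime, in seconds


# A fetch callback takes (url, etag or None) and returns (status, body, etag).
Fetch = Callable[[str, Optional[str]], Tuple[int, Optional[str], str]]


def load(url: str, cache: Dict[str, CacheEntry], now: float, fetch: Fetch,
         max_age: float = 300.0) -> Tuple[str, str]:
    """Return (where the resource came from, its body)."""
    entry = cache.get(url)

    # Step 1: strong cache. A fresh copy is used without any request.
    if entry is not None and now - entry.stored_at < entry.max_age:
        return "strong cache", entry.body

    # Step 2: negotiated cache. Ask the server, sending the validator if any.
    status, body, etag = fetch(url, entry.etag if entry else None)
    if status == 304 and entry is not None:
        entry.stored_at = now  # the copy is fresh again after validation
        return "negotiated cache", entry.body

    # Step 3: load from the server and store the new copy.
    cache[url] = CacheEntry(body=body or "", etag=etag, stored_at=now, max_age=max_age)
    return "network", body or ""


def fake_origin(url: str, etag: Optional[str]) -> Tuple[int, Optional[str], str]:
    """Stand-in server whose resource never changes (its ETag is always "v1")."""
    if etag == '"v1"':
        return 304, None, '"v1"'
    return 200, "console.log('hello')", '"v1"'
```

Calling load repeatedly with an increasing "now" walks through all three outcomes: the first call goes to the network, a call within 300 seconds hits the strong cache, and a call after expiry is answered with 304 and served from the negotiated cache.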
Cache expiration policy
When the cache expires, the browser makes an HTTP request to the server to determine whether the resource has changed. If the resource has not changed, the browser continues to use the locally cached copy. If it has changed, the browser discards the old copy and caches the new resource locally.
Expiration time
Cache-Control: max-age=xxx and Expires in the HTTP response headers can both set the expiration time of the cache, but they differ:
- Expires: identifies the point in time at which the resource expires. It is an absolute value; after that time, the cached resource is stale.
- max-age: the maximum time the resource may be cached. It is a relative value, counted in seconds from the time the response was generated.
Although Cache-Control was introduced in HTTP/1.1, that does not make max-age superior to Expires. Each has its own use cases, and the choice depends on business needs. For example, use Expires when a resource must expire at a specific point in time. If the goal is simply to enable caching, max-age is probably the better choice, because Cache-Control takes precedence over Expires.
For files that rarely change in your application, such as images, CSS, JS, and other static resources, you can usually set a long expiration time to keep the cache valid.
Cache validation
As mentioned in the previous section, when a browser requests a resource and finds that the cached copy has expired, it makes an HTTP request to the server to check whether the resource has changed.
Cache validation timing
When will cache validation take place?
- The page is refreshed. To ensure that the user gets the latest data, most browsers do not use cached data directly when refreshing a page, but send a request to the server for validation.
- Cache-Control: must-revalidate is set in the response headers. Once the cached resource expires, the cache must revalidate it with the origin server before it can be used again.
Cache validator
How does the server determine whether a resource has changed? When the server returns the response, it also sets validators in the response headers. When the cached resource expires, the browser sends a request to the server carrying those validators, and the server compares them against the current resource to decide whether the cached copy is still valid.
There are two main pairs of validation fields in the headers: ETag with If-None-Match, and Last-Modified with If-Modified-Since. A request that carries a header field of the form If-Xxx is called a conditional request: for example, a file is returned or uploaded only if certain conditions are met, which saves bandwidth.
Last-Modified
Last-Modified is a validator. When the server returns a resource to the client, it adds the resource's last modification time to the response headers as Last-Modified. The browser stores this value with the resource, and when the cache expires, it sends the value back in the If-Modified-Since request header.
If the resource on the server has not been modified since the time given in If-Modified-Since, the server returns a 304 status code with no response body, which reduces the amount of data transferred. Otherwise, the server returns a 200 status code together with the response body and the validator, just as for the first HTTP request.
Last-Modified: Fri, 04 Jan 2019 14:00:21 GMT
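On the server side, the Last-Modified check might look roughly like the following sketch. The function name respond_if_modified_since and the (status, headers, body) return shape are assumptions made for illustration; a real server framework would normally handle this for you.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def respond_if_modified_since(last_modified: datetime,
                              if_modified_since: Optional[str],
                              body: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET with optional If-Modified-Since.

    `last_modified` must be a datetime in UTC.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # an unparseable date is ignored
        # HTTP dates have one-second resolution, so drop microseconds.
        if (since is not None and since.tzinfo is not None
                and last_modified.replace(microsecond=0) <= since):
            return 304, headers, b""  # not modified: no response body
    return 200, headers, body


modified = datetime(2019, 1, 4, 14, 0, 21, tzinfo=timezone.utc)
status, headers, _ = respond_if_modified_since(
    modified, "Fri, 04 Jan 2019 14:00:21 GMT", b"file contents")
print(status, headers)
```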
ETag
The server uses an algorithm to calculate a unique identifier for the resource. When returning the response to the client, the server adds this identifier to the response headers as ETag.
ETag: "952d03d8561454120b550f0a5679a172c4822ce8"
The client saves the ETag and sends it back to the server as the value of the If-None-Match request header on subsequent requests. By comparing the ETag sent by the client with the ETag of the current resource on the server, the server can tell whether the resource has changed. If the resource has not changed, the server returns 304 and the client continues to use the cache. If the resource has been modified, the server returns 200 with the new content.
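A minimal sketch of the ETag exchange follows. It assumes the identifier is a SHA-1 hash of the response body, which is only one possible choice (the article does not say which algorithm the server uses), and the function names are hypothetical.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the content (assumption: SHA-1 of the body)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def respond_if_none_match(body: bytes,
                          if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET with optional If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # The header may list several ETags, optionally marked weak with "W/".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        candidates = [tag[2:] if tag.startswith("W/") else tag for tag in candidates]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # unchanged: the client keeps its copy
    return 200, headers, body


first_status, first_headers, _ = respond_if_none_match(b"body { color: red }", None)
second_status, _, _ = respond_if_none_match(b"body { color: red }", first_headers["ETag"])
print(first_status, second_status)
```

The first request has no validator and gets the full response. The second sends the saved ETag back, and because the content is unchanged the server can answer without a body.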
Develop a cache strategy
Caching is something we both love and hate. During development, we often ask: why is my change not taking effect when I have already modified the file? Ah, the file is cached... But once we go live, we want files to be cached by the browser as much as possible to improve performance. Therefore, it is important to have an appropriate caching strategy for your application.
Set long cache times for static resources.
Some resources do not change for a long time, such as third-party libraries, images, and font files. You can set a long expiration time for them, such as one year.
Ensure that file changes take effect by giving a unique identifier to the file name.
Sometimes, to fix bugs, we modify files such as the application's CSS or JS. If these files are already cached, users will not receive the new versions until the cache expires, unless they force a page refresh. How do you get the browser to download the new file instead of using the cache? One way is to put a unique identifier in the file name, such as a hash or a version number, that changes whenever the file is modified. Because the file name (and therefore the URL) changes, the browser treats it as a new resource and does not use the old cached copy.
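One possible way to generate such a name is to embed a short hash of the file's content, as in the sketch below. The helper name fingerprinted_name, the choice of SHA-256, and the 8-character digest length are all assumptions for illustration. Build tools typically do this for you.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, digest_length: int = 8) -> str:
    """Insert a short content hash before the file extension.

    The hash changes whenever the content changes, so a modified file gets
    a new name and therefore a new URL.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_length]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


old_name = fingerprinted_name("css/app.css", b"body { color: red }")
new_name = fingerprinted_name("css/app.css", b"body { color: blue }")
print(old_name, new_name)  # two different names of the form css/app.<hash>.css
```

Because the HTML references the new name after a change, the long-lived cached copy of the old file is simply never requested again.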
Determine if any resources cannot be cached.
For example, sensitive data should not be cached by the browser. For such resources, set Cache-Control: no-store in the response headers.
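Put together, the strategy could be expressed as a small function that picks a Cache-Control value per resource. This is a sketch under assumptions: the function name, the extension list, and the no-cache default for everything else (such as HTML pages) are illustrative choices, not rules from this article.

```python
# Assumption: these extensions mark long-lived static resources.
STATIC_EXTENSIONS = (".js", ".css", ".png", ".jpg", ".woff2")
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def cache_control_for(path: str, contains_sensitive_data: bool = False) -> str:
    """Choose a Cache-Control header value for a resource."""
    if contains_sensitive_data:
        return "no-store"  # never cached by browsers or intermediaries
    if path.lower().endswith(STATIC_EXTENSIONS):
        # Safe to cache for a long time when the file name carries a hash.
        return f"public, max-age={ONE_YEAR_SECONDS}"
    # Assumed default: cache, but revalidate with the server before each reuse.
    return "no-cache"


print(cache_control_for("/static/app.3f2a9c1b.js"))
print(cache_control_for("/account/statement", contains_sensitive_data=True))
```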
Debugging
How do I know whether a requested resource was served from the cache? Open Chrome's Developer Tools and look at the Size column in the Network panel. If the actual size of the file is displayed, the file was not served from the cache. If from XXX cache is displayed, the request used a cached file.
To disable the browser cache during debugging, select Disable cache in the developer tools.
Summary
Finally, review what we have covered by looking at this picture.