1. Browser caching process
Communication between the browser and the server follows a request-response model: the browser makes a request and the server returns a result based on the parameters in the request. Caching reduces data transfer and improves access efficiency. After receiving the result of a request from the server for the first time, the browser decides whether to cache it based on the cache identifiers in the HTTP headers of the response. The specific process is as follows:
- Each time the browser initiates a request, it first looks in the browser cache for the result of that request and its cache identifier.
- Each time the browser receives the result of a request, it stores the result and its cache identifier in the browser cache.
These two points ensure that every request goes through a cache read and a cache write. Once you understand the rules of browser caching, the rest follows. Depending on whether an HTTP request needs to be sent to the server again, the caching process can be divided into two parts: strong caching and negotiated caching.
2. Browser cache mechanism
Browser cache policies are divided into strong cache and negotiated cache. The process of requesting cache is as follows:
- When loading a resource, the browser first uses the Expires and Cache-Control headers stored with the cached response to check whether the strong cache is hit. If it is, the resource is read directly from the cache without sending a request to the server.
- If the strong cache is not hit, a request is sent to the server, carrying the validators taken from Last-Modified and ETag to verify whether the negotiated cache is hit. If it is, the server returns a response without the resource data, and the browser still reads the resource from the cache.
- If the negotiated cache is not hit either, the resource is loaded directly from the server.
The similarity between the two caches is that if a hit is made, the resource is loaded from the client, not from the server. The difference is that the strong cache does not send a request to the server, whereas the negotiated cache sends a request to the server.
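The lookup order described above can be sketched as a small model. This is a simplified illustration and not how any browser is implemented: the names CacheEntry, ServerReply, load and the send callback are hypothetical, the validator is reduced to an ETag, and freshness is reduced to a single expiry timestamp.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class CacheEntry:
    body: bytes
    etag: Optional[str]   # validator stored with the cached response
    expires_at: float     # freshness deadline in seconds (hypothetical clock)


@dataclass
class ServerReply:
    status: int           # 200 (full response) or 304 (not modified)
    body: bytes = b""
    etag: Optional[str] = None
    max_age: float = 0.0  # freshness lifetime granted by the server


def load(url: str,
         cache: Dict[str, CacheEntry],
         now: float,
         send: Callable[[str, Optional[str]], ServerReply]) -> Tuple[bytes, str]:
    """Return (body, source) following strong cache -> negotiated cache -> network."""
    entry = cache.get(url)

    # 1. Strong cache: a fresh entry is used without contacting the server.
    if entry is not None and now < entry.expires_at:
        return entry.body, "strong cache"

    # 2. Negotiated cache: send the stored validator and let the server decide.
    validator = entry.etag if entry is not None else None
    reply = send(url, validator)
    if reply.status == 304 and entry is not None:
        entry.expires_at = now + reply.max_age  # the entry becomes fresh again
        return entry.body, "negotiated cache"

    # 3. Miss: store the full response and use it.
    cache[url] = CacheEntry(reply.body, reply.etag, now + reply.max_age)
    return reply.body, "network"
```

In this sketch only step 1 avoids the network entirely; step 2 still costs a round trip but skips the response body.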
The parameters of the response header are explained in detail:
2.1 Strong cache
The difference between Expires and Cache-Control is that the former is an HTTP/1.0 header and the latter was introduced in HTTP/1.1. When both are present, Cache-Control takes precedence over Expires.
2.1.1 Expires
Expires is a response header that indicates when a resource expires. Its value is an absolute time returned by the server.
Expires: Fri, 11 May 2018 07:20:00 GMT
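As an illustration of how an absolute expiry time can be evaluated, here is a minimal sketch using Python's standard library. The function name is_fresh_by_expires is hypothetical. The sketch treats an unparsable value as already expired, which is how HTTP caches are expected to handle invalid Expires values such as "0".

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh_by_expires(expires_value: str, now: datetime) -> bool:
    """Return True while `now` is still before the absolute Expires time."""
    try:
        expires_at = parsedate_to_datetime(expires_value)
    except (TypeError, ValueError):
        # Invalid dates (for example "0") count as already expired.
        return False
    if expires_at.tzinfo is None:
        # HTTP dates are always GMT; make naive results timezone-aware.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return now < expires_at
```

Because the comparison uses the client's own clock, a wrong local time can make a resource look fresh or expired incorrectly, which is one reason relative lifetimes such as max-age are preferred.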
2.1.2 Cache-Control
The Cache-Control header is optional and can be used in both requests and responses.
Cache-Control is the most important caching rule in HTTP/1.1. It controls how web resources are cached through the following directives:
- public: all content may be cached (by both the client and proxy servers).
- private (default): all content may be cached only by the client, not by shared caches such as proxy servers.
- no-cache: the resource is still cached, but the cached copy MUST be validated with the server each time before it is used. In other words, negotiated caching is used.
- no-store: none of the content is cached; neither strong caching nor negotiated caching is used.
- max-age=xxx (xxx is a number): the cached content expires after xxx seconds. If the age of the cached resource is less than the specified number of seconds, the client uses the resource directly from the cache. If 0 is specified, the server usually has to be contacted on every request.
- min-fresh=60 (in seconds): requires the cache server to return only cached resources that will remain fresh for at least the specified time. For example, with a value of 60, only resources that will not expire within the next 60 seconds may be returned.
- s-maxage (in seconds): the same as max-age, but valid only on proxy servers (such as a CDN cache). For example, with s-maxage=60, the CDN serves its cached copy for 60 seconds without contacting the origin server, even if the content has been updated there. max-age is used for ordinary caches, while s-maxage is used for proxy caches. s-maxage has a higher priority than max-age: if s-maxage is present, it overrides max-age and the Expires header.
- must-revalidate: while the age of the resource is less than max-age the cache is used; once it has expired, the resource must be validated with the server before it is used again.
- max-stale=3600 (in seconds): indicates that the client accepts cached resources even if they have expired, as long as they expired no more than the specified time ago.
- only-if-cached: indicates that the client wants the target resource only if the cache server already has it cached. In other words, the directive requires that the cache server neither reload the response nor revalidate the resource.
- proxy-revalidate: requires all cache servers to validate an expired cached response with the origin server before returning it.
- no-transform: specifies that a cache must not change the media type of the entity body, in either the request or the response.
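To make the priority between s-maxage and max-age concrete, here is a minimal sketch that parses a Cache-Control value and derives a freshness lifetime. The function names are hypothetical and the parser is deliberately naive: it splits on commas and does not handle quoted arguments that themselves contain commas. Treating no-store and no-cache as a lifetime of 0 is also a simplification, since no-store really means "do not store at all".

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def _seconds(directives: Dict[str, Optional[str]], name: str) -> Optional[int]:
    """Return the directive's argument as an int, or None if absent or invalid."""
    raw = directives.get(name)
    return int(raw) if raw is not None and raw.isdigit() else None


def freshness_lifetime(value: str, shared_cache: bool) -> Optional[int]:
    """Seconds the response may be served without revalidation.

    Returns None when the header gives no explicit lifetime, in which
    case a cache falls back to Expires or to heuristic caching.
    """
    directives = parse_cache_control(value)
    if "no-store" in directives or "no-cache" in directives:
        return 0  # simplification: never serve without contacting the server
    if shared_cache:
        s_maxage = _seconds(directives, "s-maxage")
        if s_maxage is not None:
            return s_maxage  # s-maxage wins over max-age in shared caches
    return _seconds(directives, "max-age")
```

For the value "public, max-age=600, s-maxage=60", this sketch gives a CDN (shared_cache=True) a lifetime of 60 seconds and a browser (shared_cache=False) a lifetime of 600 seconds.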
2.1.3 Flow diagram of strong cache
2.2 Negotiated Cache
When the strong cache is not hit, the browser sends a request to the server to verify whether the negotiated cache is hit. If it is, the response has HTTP status 304 with the reason phrase Not Modified.
The negotiated cache is managed with two pairs of headers: Last-Modified with If-Modified-Since, and ETag with If-None-Match.
2.2.1 Last-Modified, If-Modified-Since
Last-Modified indicates the date and time at which the resource was last modified. On later requests the browser adds If-Modified-Since to the request header, asking the server whether the resource has been updated since that date. If it has, the server sends the new resource back; if not, it answers with 304. A drawback is that the timestamp can change without the content changing: merely opening the cached file locally can cause Last-Modified to be modified. Hence the ETag introduced in HTTP/1.1.
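The server side of this exchange can be sketched as follows. This is an illustrative model only: the function name respond_if_modified_since is hypothetical, and last_modified is assumed to be a timezone-aware datetime in UTC.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def respond_if_modified_since(last_modified: datetime,
                              if_modified_since: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers): 304 if unchanged since the given date, else 200."""
    # HTTP dates have one-second resolution, so drop sub-second precision.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # an unparsable date is ignored
        if since is not None and since.tzinfo is not None and last_modified <= since:
            return 304, headers  # no body is sent; the browser reuses its copy
    return 200, headers
```

The truncation to whole seconds in the first step is exactly the granularity limit that makes Last-Modified unable to distinguish several changes made within the same second.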
2.2.2 ETag, If-None-Match, If-Match
ETag is a value (typically a hash) generated by the server for a specific resource and used as a unique identifier for that version of the resource. Any change to the resource changes the ETag, regardless of the last modification time.
The If-None-Match request header sends the most recently returned ETag back to the server, asking whether the resource's ETag has changed. If it has, the server sends the new resource back.
If-Match asks the server to compare the given entity tag with that of the resource, and weak ETag values cannot be used for this comparison. The server executes the request only when the If-Match field value is consistent with the ETag of the resource; otherwise it returns 412 Precondition Failed. The value "*" can be used to match any ETag.
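A minimal sketch of how a server might generate an ETag and evaluate the two conditional headers is shown below. The names make_etag, handle_get and handle_put are hypothetical, and hashing the body with SHA-256 is only one possible way to derive a tag; real servers use a variety of schemes.

```python
import hashlib
from typing import List, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the content (assumption: truncated SHA-256)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _tags(header_value: str) -> List[str]:
    """Split a comma-separated list of entity tags."""
    return [tag.strip() for tag in header_value.split(",")]


def handle_get(body: bytes, if_none_match: Optional[str]) -> Tuple[int, bytes]:
    """If-None-Match: answer 304 without a body when the client's tag still matches."""
    etag = make_etag(body)
    if if_none_match is not None:
        # Weak comparison: a W/ prefix on the client's tag is ignored.
        candidates = [t[2:] if t.startswith("W/") else t for t in _tags(if_none_match)]
        if "*" in candidates or etag in candidates:
            return 304, b""
    return 200, body


def handle_put(current_body: bytes, if_match: Optional[str]) -> int:
    """If-Match: run the request only if the client's tag matches, else 412."""
    if if_match is not None:
        candidates = _tags(if_match)
        # Strong comparison: a weak tag such as W/"..." never equals a strong one.
        if "*" not in candidates and make_etag(current_body) not in candidates:
            return 412  # Precondition Failed
    return 204
```

In this sketch If-None-Match protects cache reads, while If-Match guards state-changing requests against overwriting a version the client has not seen.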
ETag has a higher priority than Last-Modified for several reasons:
- Some files are rewritten periodically so that only the modification time changes while the content stays the same. In that case the client should still be able to read them from the cache.
- Last-Modified can only track modifications at a granularity of one second. If a file is modified N times within 1s, those modifications cannot be told apart.
- Some servers cannot determine exactly when a file was last modified.
2.2.3 Strong ETag and Weak ETag
- Strong ETag: the value changes whenever the entity changes, no matter how slight the change.
- Weak ETag: only indicates whether the resources are equivalent. The value changes only when the resource changes fundamentally. A weak ETag is marked by prefixing the field value with W/.
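The difference between the two kinds of tag shows up in how they are compared. The following sketch implements the strong and weak comparison functions as defined by the HTTP specification; the function names themselves are hypothetical.

```python
from typing import Tuple


def _split(tag: str) -> Tuple[bool, str]:
    """Return (is_weak, opaque_value) for an entity tag such as W/"abc"."""
    if tag.startswith("W/"):
        return True, tag[2:]
    return False, tag


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: both tags must be strong and identical."""
    weak_a, value_a = _split(a)
    weak_b, value_b = _split(b)
    return not weak_a and not weak_b and value_a == value_b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: the opaque values must match; W/ prefixes are ignored."""
    return _split(a)[1] == _split(b)[1]
```

If-Match uses the strong comparison, which is why a weak ETag can never satisfy it, while If-None-Match uses the weak comparison.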
2.3 Browser Status Code
- 200: the strong cache (Expires/Cache-Control) is no longer valid, so the server returns a new copy of the resource.
- 200 (from disk cache / from memory cache): strong cache. Expires and Cache-Control are both present and the resource has not expired, with Cache-Control taking priority over Expires. The browser obtains the resource from the local cache.
- 304 (Not Modified): negotiated cache. Last-Modified/ETag show that the cached copy is still valid, so the server returns status code 304.
2.4 Heuristic caching
- Heuristic caching applies when the response contains no caching fields, that is, when no caching policy has been set.
- Typically, 10% of the difference between the Date and Last-Modified values in the response headers is used as the cache time.
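The 10% rule can be worked through with a short sketch. The function name heuristic_lifetime and the percent parameter are hypothetical; browsers are free to choose their own heuristic, and 10% is only the typical value mentioned above.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def heuristic_lifetime(date_header: str,
                       last_modified_header: str,
                       percent: int = 10) -> timedelta:
    """Estimate a freshness lifetime as a percentage of (Date - Last-Modified)."""
    date = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    unchanged_for = date - last_modified
    if unchanged_for < timedelta(0):
        # A Last-Modified in the future gives no usable estimate.
        return timedelta(0)
    return unchanged_for * percent / 100
```

For example, a response with Date "Mon, 21 May 2018 00:00:00 GMT" and Last-Modified "Fri, 11 May 2018 00:00:00 GMT" had been unchanged for 10 days, so this sketch treats it as fresh for 1 day.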
2.5 Actual Scenarios
The general order is as follows:
- Cache-Control: checked before requesting the server
- Expires: checked before requesting the server
- If-None-Match (ETag): sent with the request to the server
- If-Modified-Since (Last-Modified): sent with the request to the server
Negotiated caching needs to be used together with strong caching. If strong caching is not enabled, negotiated caching is meaningless.
Most web servers enable negotiated caching by default, with both [Last-Modified, If-Modified-Since] and [ETag, If-None-Match] enabled.
However, the following scenario needs attention:
In a distributed system, the Last-Modified value of a file must be consistent across all machines, so that load balancing to different machines does not cause the comparison to fail. Distributed systems should also turn off ETags where possible, because each machine may generate a different ETag for the same file.
2.6 Summary
The overall cache request flow is as follows:
3. Position of the browser cache
The previous section mentioned that a response with status code 200 can be read from different cache locations. This section sorts out the related points.
In terms of cache location there are four kinds of cache, each with its own priority. The caches are searched in order, and the network is requested only when none of them is hit. They are:
- Service Worker
- Memory Cache
- Disk Cache
- Push Cache
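The priority order of the four locations can be modelled as an ordered lookup. This is a conceptual sketch only: each cache is represented by a plain dictionary, and the lookup function and layer names are hypothetical rather than a browser API.

```python
from typing import Dict, List, Optional, Tuple

# A cache layer is modelled as (name, {url: cached body}).
CacheLayer = Tuple[str, Dict[str, bytes]]


def lookup(url: str, layers: List[CacheLayer]) -> Tuple[Optional[bytes], str]:
    """Search the layers in priority order; fall back to the network on a full miss."""
    for name, store in layers:
        if url in store:
            return store[url], name
    return None, "network"


# Priority order as listed above.
layers: List[CacheLayer] = [
    ("service worker", {}),
    ("memory cache", {"/a.js": b"from memory"}),
    ("disk cache", {"/a.js": b"from disk", "/b.css": b"from disk"}),
    ("push cache", {}),
]
```

With these example layers, "/a.js" is served from the memory cache even though the disk cache also holds a copy, and a URL held by no layer results in a network request.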
3.1 Service Worker
Service workers essentially act as proxy servers that sit between web applications, the browser, and the network (when available). They are intended, among other things, to enable the creation of effective offline experiences, intercept network requests and take appropriate action based on whether the network is available, and update assets residing on the server. They also allow access to push notifications and background sync APIs. (MDN docs)
A detailed introduction to Service Workers, covering their use and principles, deserves a separate article. Only a few points are mentioned here.
Because Service Workers can intercept requests, they must be served over HTTPS to ensure security. A Service Worker lets us freely control which files to cache, how to match the cache, and how to read it, and the cache is persistent.
Service Worker implementation caching can be roughly divided into three steps:
- Register the Service Worker.
- Listen for the install event.
- Cache the required files. The next time the user visits, requests can be intercepted to check whether a cached copy exists. If it does, the cached file is read directly; otherwise the data is requested.
When the Service Worker does not hit its cache, we need to call the fetch function to get the data. In other words, if the Service Worker has no cached copy, the data is looked up in the other caches according to cache priority. However, no matter whether the data then comes from the memory cache or from a network request, the browser shows it as fetched from the Service Worker.
3.2 Memory Cache
Memory Cache is the cache held in memory. It mainly contains resources already fetched by the current page, such as downloaded CSS, JS, and images. The memory cache is fast to read but short-lived: it is released together with the process. Once we close the tab, the cache in memory is freed.
An important part of the memory cache is the resources downloaded by preload-related instructions. This is a common page optimization technique, in which the browser can parse JS/CSS files while requesting the next resource.
Note that the memory cache does not consider the Cache-Control value of the returned resource when caching it, and resources are matched not only by URL but also by Content-Type, CORS, and other characteristics.
3.3 Disk Cache
Disk Cache is the cache stored on the hard disk. It is slower to read than the memory cache, but it offers larger capacity and longer retention.
This cache is the most widely used of all caches and is the primary storage location for our cache policy in Section 2. What files are stored in the Memory Cache and what files are stored in the Disk Cache depends on the browser policy. The general logic may be:
- Large files are unlikely to be stored in memory, whereas small files are preferentially kept there.
- When the memory usage is high, files are preferentially stored on hard disks.
3.4 Push Cache
Push Cache is part of HTTP/2 and is used only when the other three caches all miss. It exists only within the session and is released once the session ends, its cache time is short (about 5 minutes in Chrome), and it does not strictly follow the cache directives in the HTTP headers.
Here are a few general conclusions for understanding (source):
- All resources can be pushed and cached, but Edge and Safari support is relatively poor
- You can push no-cache and no-store resources
- Once the connection is closed, the Push Cache is released
- Multiple pages can use the same HTTP/2 connection and thus use the same Push Cache. This depends on the browser implementation. For performance reasons, some browsers may use the same HTTP connection for different tabs with the same domain name.
- The Cache in a Push Cache can only be used once
- The browser can refuse to accept an existing resource push
- You can push resources to other domains
4. Actual scenarios and user behaviors
- Frequently changing resources:
Cache-Control: no-cache
- Resources that rarely change:
Cache-Control: max-age=31536000
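As a small illustration of the two policies, the sketch below picks a Cache-Control value per request path. The rule that HTML pages count as frequently changing while other static files rarely change is an assumption made for this example only, and the names cache_headers and FREQUENTLY_CHANGING_SUFFIXES are hypothetical.

```python
from typing import Dict

# 365 days expressed in seconds, which is where 31536000 comes from.
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60

# Assumption for this sketch: HTML documents change often, other files rarely.
FREQUENTLY_CHANGING_SUFFIXES = (".html",)


def cache_headers(path: str) -> Dict[str, str]:
    """Choose a caching policy based on how often the resource is expected to change."""
    if path.endswith("/") or path.endswith(FREQUENTLY_CHANGING_SUFFIXES):
        # Cached, but revalidated with the server before every use.
        return {"Cache-Control": "no-cache"}
    # Served from the strong cache for up to one year.
    return {"Cache-Control": f"max-age={ONE_YEAR_SECONDS}"}
```

A one-year max-age is only safe when a changed file is published under a new URL, because the browser will not ask the server about the old URL again until the year has passed.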
There are three main types of user behavior:
- Opening a web page: the browser checks whether the disk cache has a match. If it does, the cached copy is used; if not, a network request is sent.
- Normal refresh: because the tab has not been closed, the memory cache is available and is used first (if it has a match), followed by the disk cache.
- Forced refresh: the browser does not use the cache, so the request is sent with Cache-Control: no-cache (plus Pragma: no-cache for compatibility) in its headers, and the server returns 200 with the latest content.
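The three behaviors can be summarized as a small lookup table in code. This only restates the list above in executable form: the function name plan_request and the behavior labels are hypothetical, and real browsers differ in the details.

```python
from typing import Dict, List, Tuple


def plan_request(behavior: str) -> Tuple[List[str], Dict[str, str]]:
    """Return (caches consulted in order, extra request headers) for a user action."""
    if behavior == "open":
        return ["disk cache"], {}
    if behavior == "refresh":
        return ["memory cache", "disk cache"], {}
    if behavior == "force-refresh":
        # No cache is consulted; the headers ask caches along the way to revalidate.
        return [], {"Cache-Control": "no-cache", "Pragma": "no-cache"}
    raise ValueError(f"unknown behavior: {behavior!r}")
```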