Introduction to Browser Caching
Browser caching means keeping copies of web resources that have already been requested and returned (HTML pages, images, JS files, CSS files, data, etc.) in the browser cache so that they can be reused later.
Benefits of caching
- Reduce network latency and speed up page opening
- Reduce network bandwidth consumption
- Reduce server load
HTTP caching mechanism
What rules does the cache follow?
- Freshness (expiration mechanism): the validity period of the cached copy. The browser treats a cached copy as valid and fresh enough if it meets either of the following conditions:
  - It carries complete expiration control information (HTTP headers) and is still within its validity period;
  - The browser has already used this cached copy and has already checked its freshness during the current session.
- Validation (check mechanism): when the server returns a resource, it sometimes includes the resource's entity tag (ETag) in the response headers. The browser can send this value back as a validator on later requests. If the validator does not match, the resource has been modified or has expired, and the browser must fetch the content again.
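The two rules above can be sketched in a few lines of Python. This is only an illustration: the `CachedCopy` class, its field names, and the choice of a truncated SHA-256 digest as the ETag are assumptions made for this example, and real servers pick their own ETag scheme.

```python
import hashlib
from dataclasses import dataclass


def make_etag(body: bytes) -> str:
    """Derive a validator from the content (one possible scheme, assumed here)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


@dataclass
class CachedCopy:
    body: bytes
    etag: str
    stored_at: float  # seconds since the epoch when the copy was stored
    max_age: float    # validity period in seconds

    def is_fresh(self, now: float) -> bool:
        # Freshness rule: the copy is still within its validity period.
        return now - self.stored_at < self.max_age

    def is_valid_against(self, server_etag: str) -> bool:
        # Validation rule: the stored validator matches the server's current one.
        return self.etag == server_etag


copy = CachedCopy(b"hello", make_etag(b"hello"), stored_at=1000.0, max_age=60.0)
```

With these values, `copy.is_fresh(1030.0)` holds, while `copy.is_fresh(1060.0)` does not, and changing even one byte of the body produces a different validator.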
The two caching phases of HTTP
Browser caching can be divided into two categories: the strong cache (also known as the local cache) and the negotiated cache (also known as the weak cache).
Local cache phase
Before sending a request, the browser checks its cache to see whether the strong cache is hit. If it is, the browser reads the resource directly from the cache and sends no request to the server. Otherwise, it moves on to the next phase.
Negotiated cache phase
When the strong cache misses, the browser sends a request to the server. The server decides whether the negotiated cache is hit based on certain fields in the request headers. On a hit, the server returns a 304 response with no response body, which simply tells the browser that it can use the copy in its own cache. If neither the strong cache nor the negotiated cache is hit, the resource is loaded in full from the server.
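The two phases can be simulated with a toy browser and a toy server. Everything in this sketch is hypothetical (the `ToyServer` and `ToyBrowser` classes, the fixed 60-second lifetime, passing the clock in as `now`); it only shows the order of the checks, not how a real browser is implemented.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Entry:
    body: str
    etag: str
    expires_at: float  # time after which the strong cache no longer applies


class ToyServer:
    """Hypothetical origin server holding one version of each resource."""

    def __init__(self) -> None:
        self.resources: Dict[str, Tuple[str, str]] = {}  # url -> (body, etag)
        self.requests = 0

    def get(self, url: str, if_none_match: Optional[str] = None):
        self.requests += 1
        body, etag = self.resources[url]
        if if_none_match == etag:
            return 304, None, etag  # negotiated cache hit: no body is sent
        return 200, body, etag      # full response


class ToyBrowser:
    def __init__(self, server: ToyServer, max_age: float = 60.0) -> None:
        self.server = server
        self.max_age = max_age
        self.cache: Dict[str, Entry] = {}

    def fetch(self, url: str, now: float) -> Tuple[str, str]:
        entry = self.cache.get(url)
        # Phase 1: strong (local) cache. No request leaves the browser.
        if entry and now < entry.expires_at:
            return "strong-cache", entry.body
        # Phase 2: negotiation. Send the stored validator, if there is one.
        status, body, etag = self.server.get(url, entry.etag if entry else None)
        if status == 304 and entry:
            entry.expires_at = now + self.max_age  # the copy is fresh again
            return "304", entry.body
        self.cache[url] = Entry(body, etag, now + self.max_age)
        return "200", body


server = ToyServer()
server.resources["/a.css"] = ("v1", '"1"')
browser = ToyBrowser(server)
first = browser.fetch("/a.css", now=0)     # nothing cached yet
second = browser.fetch("/a.css", now=10)   # still within 60 seconds
third = browser.fetch("/a.css", now=100)   # expired, but unchanged on the server
```

Here the first call goes to the server and gets a full response, the second is answered from the strong cache without any request, and the third is revalidated with a 304. Only two requests reach the server in total.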
Common cache-related headers
- Pragma: no-cache (a legacy HTTP/1.0 header)
  - Tells the browser to ignore the cached copy of the resource and fetch it from the server on every access
  - HTTP/1.1 uses Cache-Control instead
- Cache-Control: the header that controls caching
  - no-cache: the browser may still store a copy, but it must revalidate the copy with the server before every use
  - no-store: never keep a cached copy under any circumstances (caching is disabled for the requested resource)
  - max-age: the validity period of the cached copy, as the number of seconds from the time of the request until expiration
  - public: the response may be cached by proxy servers and other intermediate caches
  - private: only the user's own browser may cache the response; shared proxy servers may not
- Expires: (GMT)
  - Introduced in HTTP/1.0, it marks the point in time at which the resource expires. It is an absolute time in Greenwich Mean Time (GMT), after which the cached resource is considered expired.
  - Starting with HTTP/1.1, use Cache-Control: max-age=1024 (in seconds) instead. When both are present, max-age takes precedence.
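To make the header list concrete, here is a minimal sketch that turns these headers into a validity period. The function names are made up for this example, the header dictionary is assumed to use the exact key spellings `Cache-Control` and `Expires`, and treating `no-store` and `no-cache` as a lifetime of zero is a simplification.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def freshness_lifetime(headers: Dict[str, str], response_time: datetime) -> Optional[float]:
    """Return the validity period in seconds, or None if no expiry info exists."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if "no-store" in cc or "no-cache" in cc:
        return 0.0  # simplification: never reusable without contacting the server
    max_age = cc.get("max-age")
    if max_age is not None:
        return float(max_age)  # max-age takes precedence over Expires
    if "Expires" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        return max(0.0, (expires - response_time).total_seconds())
    return None


received = datetime(2015, 10, 21, 7, 0, 0, tzinfo=timezone.utc)
lifetime = freshness_lifetime({"Cache-Control": "public, max-age=1024"}, received)
```

The standard-library `email.utils.parsedate_to_datetime` is used here because HTTP dates share the format of email dates; `response_time` must be a timezone-aware datetime so that the subtraction works.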
ETag and Last-Modified
The overall request flow is as follows.
When a browser requests a resource, it first determines whether its cached copy of the resource has expired.
- If the cached copy has not expired, the browser reads the data from the cache (from disk cache) -> renders it.
- If the cached copy has expired, the browser checks whether the cached response carried an ETag or Last-Modified header.
  - If neither header is available, the browser sends a plain request to the server: request data -> server returns the response -> browser caches it -> render.
  - If either header is available, the browser sends the request with the ETag value in an If-None-Match header and the Last-Modified value in an If-Modified-Since header. The server uses these headers to check whether the resource has changed.
    - If it has not changed, the server returns 304, telling the browser that the copy it already has is still good: read cache -> render.
    - If it has changed (the ETag no longer matches, or the resource was modified after the If-Modified-Since time), the server sends the new data with a 200 response -> render.
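The server's side of this flow can be sketched as a single function. The name `respond` and its parameters are assumptions for illustration only. The sketch compares ETags as plain strings (ignoring weak `W/` validators) and assumes `last_modified` is a UTC datetime with whole-second precision, since HTTP dates carry no fractions of a second.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def respond(
    request_headers: Dict[str, str],
    body: bytes,
    etag: str,
    last_modified: datetime,
) -> Tuple[int, Dict[str, str], Optional[bytes]]:
    """Return (status, response headers, body) for a conditional GET."""
    validators = {
        "ETag": etag,
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")

    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since when both are sent.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        unchanged = etag in candidates or "*" in candidates
    elif if_modified_since is not None:
        unchanged = last_modified <= parsedate_to_datetime(if_modified_since)
    else:
        unchanged = False  # no validators sent: always return the full resource

    if unchanged:
        return 304, validators, None  # no body: the browser reuses its cached copy
    return 200, validators, body


modified = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
status, headers, payload = respond({"If-None-Match": '"v1"'}, b"data", '"v1"', modified)
```

In this call the validator matches, so `status` is 304 and `payload` is `None`. Sending a different ETag, or an If-Modified-Since time earlier than `modified`, yields a 200 with the full body.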
* Reference article – The Heart of the Machine