What is caching
Caching is a technique for saving a copy of a resource and using it directly on the next request. (MDN)
In this article, the browser cache refers primarily to the HTTP cache.
The role of caching
- Reduce network latency and speed up page opening
- Reduce network bandwidth consumption
- Reduce server pressure
Classification of cache
By permission
- Private cache: can only be used by a single user
- Shared cache: stores responses that can be reused by multiple users. For example, an ISP or your company might set up a web proxy as part of the local network infrastructure to serve its users. Hot resources can then be reused, reducing network congestion and latency.
By storage location
Browser local cache
From memory cache
The memory cache has two characteristics: fast reads and a short lifetime. Fast reads: the memory cache stores the compiled and parsed files directly in the memory of the process, which takes up some of the process's memory but makes the files quick to read the next time they are needed. Short lifetime: once the process is shut down, its memory is cleared and the cache is gone.
From disk cache
The disk cache writes cached resources directly to files on disk. Reading from it requires I/O on those files, after which the contents must be parsed again, so reads are more involved and slower than from the memory cache.
The browser's developer tools label cached responses as follows:
- From memory cache: the cached copy in memory was used
- From disk cache: the cached copy on the hard disk was used
- From prefetch cache: resources loaded through preload or prefetch also go through the HTTP cache. Once loaded, a cacheable resource is stored in the HTTP cache for future use; a resource that cannot be cached is kept in the memory cache until it is used.
Browser resource cache access priority
When a browser requests a resource, it searches the caches in order of priority (Service Worker -> Memory Cache -> Disk Cache -> Push Cache). If a cache holds a match, the cached copy is used; otherwise, the browser sends the request.
Take a visit without a Service Worker as an example: open baidu.com -> 200 -> close the tab -> open baidu.com again -> 200 (from disk cache) -> refresh -> 200 (from memory cache).
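The lookup order above can be pictured as a loop over cache layers. The following is only a minimal sketch: the function name `lookupInOrder`, the `layers` array, and the use of a `Map` per layer are assumptions made for illustration, not how a browser actually exposes its caches.

```javascript
// Hypothetical model: each cache layer is a Map from URL to a cached response.
// The order of the array mirrors the priority described in the text.
function lookupInOrder(url, layers) {
  for (const { name, store } of layers) {
    if (store.has(url)) {
      return { source: name, response: store.get(url) };
    }
  }
  // No layer matched, so the browser would send a network request.
  return { source: 'network', response: null };
}

const layers = [
  { name: 'service worker', store: new Map() },
  { name: 'memory cache', store: new Map() },
  { name: 'disk cache', store: new Map([['/app.css', 'body{}']]) },
  { name: 'push cache', store: new Map() },
];

console.log(lookupInOrder('/app.css', layers).source);    // disk cache
console.log(lookupInOrder('/missing.js', layers).source); // network
```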
Proxy/server cache
Gateway caches, CDNs, reverse proxy caches, load balancers, and other caches deployed on the server side.
The Tencent CDN is used as an example. In the response header, X-Cache-Lookup: Hit From MemCache indicates a hit in the memory of the CDN node. X-Cache-Lookup: Hit From Disktank indicates a hit on the disk of the CDN node. X-Cache-Lookup: Hit From Upstream indicates that the CDN cache was not hit.
Process for accessing browser resources
The key steps are as follows:
- Before sending a request, the browser checks the strong cache for a match. If a match is found, the browser reads the resource from the cache and does not send a request to the server. Otherwise, it goes to the next step.
- When the strong cache is not hit, the browser sends a request to the server.
- The server determines whether the negotiated cache is hit based on certain fields in the request header. On a hit, the server returns a 304 response without a response body, which simply tells the browser that it can take the resource directly from its own cache.
- If neither the strong cache nor the negotiated cache is hit, the resource is loaded from the server, which returns the caching rules to the browser in the HTTP response headers along with the requested result.
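The steps above can be condensed into a small decision function. This is a simplified sketch under stated assumptions: `resolveRequest`, the `entry` record shape (`expiresAt`, `etag`), and the single `serverEtag` argument are hypothetical stand-ins for what the browser and server each know.

```javascript
// entry: hypothetical cached record { expiresAt: <ms timestamp>, etag: <string> }, or undefined
// serverEtag: the validator the server currently holds for the resource
// now: current time in milliseconds
function resolveRequest(entry, serverEtag, now) {
  // Step 1: strong cache hit, no request leaves the browser.
  if (entry && now < entry.expiresAt) {
    return { status: 200, from: 'strong cache', networkRequest: false };
  }
  // Steps 2 and 3: the request is sent; the server compares validators.
  if (entry && entry.etag === serverEtag) {
    return { status: 304, from: 'negotiated cache', networkRequest: true };
  }
  // Step 4: nothing usable is cached, so the full resource is returned.
  return { status: 200, from: 'server', networkRequest: true };
}

const entry = { expiresAt: 1000, etag: '"v1"' };
console.log(resolveRequest(entry, '"v1"', 500).from);  // strong cache
console.log(resolveRequest(entry, '"v1"', 2000).from); // negotiated cache
console.log(resolveRequest(entry, '"v2"', 2000).from); // server
```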
HTTP header cache-related fields
Strong cache (local cache) control: Expires & Cache-Control
The fields that control the strong (forced) cache are Expires and Cache-Control in the response header, where Cache-Control takes precedence over Expires.
Expires (response header) + Date
Expires is the response header field that controls web caching in HTTP/1.0. Its value is the expiration time of the cached result, as returned by the server. When the client makes the request again, the cached result is used directly if the current time is earlier than the Expires value.
Expires is an HTTP/1.0 field, but browsers now use HTTP/1.1 by default. Is web caching still controlled by Expires in HTTP/1.1?
In HTTP/1.1, Expires was superseded by Cache-Control. The reason is that Expires works by comparing the client's time with the absolute time returned by the server. If either the client or the server has an incorrect time, the strong cache expires at the wrong moment and loses its purpose.
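A small sketch makes the clock problem concrete. The helper `isFreshByExpires` and the sample timestamps are assumptions chosen for illustration; the point is only that the same Expires value gives different answers depending on the client's clock.

```javascript
// Returns true while the client's clock is still before the Expires time.
function isFreshByExpires(expiresHeader, clientNow) {
  const expiresAt = Date.parse(expiresHeader);
  // An unparseable date is treated as already expired.
  if (Number.isNaN(expiresAt)) return false;
  return clientNow < expiresAt;
}

const expires = 'Fri, 15 Jul 2016 05:00:00 GMT';
const accurateClock = Date.parse('Fri, 15 Jul 2016 04:30:00 GMT');
const fastClock = Date.parse('Fri, 15 Jul 2016 06:30:00 GMT'); // client clock two hours ahead

console.log(isFreshByExpires(expires, accurateClock)); // true
console.log(isFreshByExpires(expires, fastClock));     // false, the cache is skipped too early
```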
Cache-Control (supported by both request and response headers)
The Cache-Control header defined in HTTP/1.1 specifies the caching mechanism to use and is supported by both request and response headers. Caching policies are defined by the different directives it provides.
- Cache-Control: no-store // Nothing about the client request or the server response may be stored in the cache. The complete response is downloaded for every request the client makes.
- Cache-Control: no-cache // Forces the cache to submit the request to the origin server for validation (negotiated cache validation). Each time a request is made, the cache sends the request to the server (carrying the validation fields related to the local copy), and the server checks whether the cached copy described in the request has expired. If it has not, the cache uses the local copy.
- Cache-Control: private // "private" indicates that the response is intended for a single user. Intermediaries cannot cache the response; it can only be stored in the browser's private cache.
- Cache-Control: public // The "public" directive indicates that the response can be cached by any intermediary (intermediate proxy, CDN, etc.). If "public" is specified, pages that intermediaries would not normally cache (such as pages with HTTP authentication information (account and password) or certain status codes) will be cached by them.
- Cache-Control: max-age=31536000 // Indicates the maximum time that a resource can be cached (kept fresh). In contrast to Expires, max-age is a relative number of seconds rather than an absolute time. For files that do not change within an application, such as images, CSS, and JS, you can set a long duration by hand so that the cache stays valid. It is used together with the Age header.
- Cache-Control: must-revalidate // Before using a stale resource, the cache must first verify its state with the server. Expired copies will not be used without validation.
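To show how these directives combine, here is a rough sketch of parsing a Cache-Control value and computing how many seconds of freshness remain. `parseCacheControl` and `remainingFreshness` are hypothetical helper names, and the logic is deliberately simplified (it ignores s-maxage, Expires fallback, and heuristic freshness).

```javascript
// Splits a Cache-Control value into { directive: value | true }.
function parseCacheControl(value) {
  const directives = {};
  for (const part of value.split(',')) {
    const [rawName, rawArg] = part.trim().split('=');
    if (!rawName) continue;
    const name = rawName.toLowerCase();
    directives[name] = rawArg === undefined ? true : rawArg.replace(/^"|"$/g, '');
  }
  return directives;
}

// Seconds the response may still be served without contacting the server.
// ageSeconds models the Age header: time already spent in caches.
function remainingFreshness(cacheControl, ageSeconds) {
  const d = parseCacheControl(cacheControl);
  // no-store forbids storing; no-cache requires validation before every reuse.
  if (d['no-store'] || d['no-cache'] || d['max-age'] === undefined) return 0;
  return Math.max(0, Number(d['max-age']) - ageSeconds);
}

console.log(remainingFreshness('public, max-age=120', 30));  // 90
console.log(remainingFreshness('public, max-age=120', 300)); // 0
console.log(remainingFreshness('no-cache', 0));              // 0
```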
Pragma (request header)
Pragma is a header defined in the HTTP/1.0 standard. In a request, Pragma: no-cache has the same effect as Cache-Control: no-cache. However, its behavior as an HTTP response header is not explicitly defined, so it is not a complete replacement for the Cache-Control header defined in HTTP/1.1. Pragma is generally used only for backward compatibility with HTTP/1.0 clients.
Negotiated cache control
The identifiers of the negotiated cache are also returned to the browser in the HTTP response headers together with the requested result. The field pairs that control the negotiated cache are Last-Modified/If-Modified-Since and ETag/If-None-Match, where ETag/If-None-Match has a higher priority than Last-Modified/If-Modified-Since.
Last-Modified and If-Modified-Since
Belongs to HTTP/1.0. When a request carries an If-Modified-Since header, the server compares it with the resource's Last-Modified time and returns a 304 response with no body if Last-Modified is earlier than or equal to If-Modified-Since. Otherwise, the resource is returned.
Last-Modified is only accurate to the second, so it serves as a weak validator.
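The server-side comparison can be sketched as follows. `checkModifiedSince` is a hypothetical helper, and the sample date is only an illustration; the example also shows why one-second resolution makes this a weak validator.

```javascript
// Returns the status code the server would answer with.
// lastModifiedMs: the resource's modification time in milliseconds.
function checkModifiedSince(ifModifiedSince, lastModifiedMs) {
  const since = Date.parse(ifModifiedSince);
  // A missing or unparseable header means a normal, full response.
  if (Number.isNaN(since)) return 200;
  // HTTP dates have one-second resolution, so milliseconds are dropped.
  const lastModifiedSec = Math.floor(lastModifiedMs / 1000);
  return lastModifiedSec <= Math.floor(since / 1000) ? 304 : 200;
}

const header = 'Fri, 15 Jul 2016 04:11:51 GMT';
const base = Date.parse(header);

console.log(checkModifiedSince(header, base));        // 304
console.log(checkModifiedSince(header, base + 900));  // 304, a change within the same second goes unnoticed
console.log(checkModifiedSince(header, base + 1000)); // 200
```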
ETag (response header) and If-None-Match (request header)
Belongs to HTTP/1.1. ETag is a response header field that can serve as a strong validator. It is generated by the server, typically as a hash string computed from the entity content, and identifies the state of the resource. If-None-Match is a conditional request header. When it is added to a request with the ETag value previously returned by the server, the server returns a 200 response with the requested resource if and only if none of the ETag values listed in this header matches the server's current resource. Otherwise, the server returns a 304 response with no body.
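As a rough illustration of this exchange, the sketch below derives an ETag from a hash of the body and evaluates If-None-Match against it. The function names `makeEtag` and `checkIfNoneMatch` are made up for this example, and hashing the body with SHA-1 is just one possible way to produce a validator.

```javascript
const crypto = require('crypto');

// Builds a quoted ETag from a hash of the response body.
function makeEtag(body) {
  const hash = crypto.createHash('sha1').update(body).digest('base64');
  return '"' + hash + '"';
}

// Returns 304 if any listed ETag matches the current one, otherwise 200.
function checkIfNoneMatch(ifNoneMatch, currentEtag) {
  if (!ifNoneMatch) return 200;
  // The weak prefix W/ is ignored when comparing for If-None-Match.
  const strip = (tag) => tag.trim().replace(/^W\//, '');
  const candidates = ifNoneMatch.split(',').map(strip);
  const matched = candidates.includes('*') || candidates.includes(strip(currentEtag));
  return matched ? 304 : 200;
}

const etag = makeEtag('Hello World');
console.log(checkIfNoneMatch(etag, etag));                // 304
console.log(checkIfNoneMatch('"stale", ' + etag, etag));  // 304
console.log(checkIfNoneMatch('"stale"', etag));           // 200
```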
ETag vs. Last-Modified
- Last-Modified is only accurate to the second. If a file is modified more than once within a second, Last-Modified cannot accurately reflect its freshness.
- Some files are rewritten periodically without their contents changing (only the modification time changes). The new Last-Modified value then forces the file to be downloaded again even though the cached copy is still correct.
- The server may not be able to obtain the correct file modification time, or its clock may be inconsistent with that of a proxy server.
ETag has a higher priority than Last-Modified: when both are present, ETag is used.
Vary (response header)
Vary is an HTTP response header that determines whether a cached response can be used for a future request or whether a new response must be requested from the origin server. The server uses it to indicate which request headers were used to select a representation of the resource in the content negotiation algorithm. In other words, it says that the response differs depending on certain request headers.
Vary: Accept, for example, means that the response varies with the Accept request header, so a cache knows that resources served from the same URI may differ in content format depending on that header.
The same URL can provide multiple different documents, which requires a mechanism between the server and the client to select the most appropriate version. This mechanism is called content negotiation. The server automatically sends the most appropriate version based on certain fields in the request header sent by the client. Two types of request header fields can be used for this mechanism: fields dedicated to content negotiation (the Accept fields) and other fields.
For example, the Accept-Encoding field is dedicated to content negotiation. The server only needs to add the Content-Encoding field to the response header to specify the compression format of the content; omitting Content-Encoding indicates that the content is not compressed. The cache server stores different content for different Content-Encoding values and returns the most appropriate version based on the Accept-Encoding field of each request. Adding the Vary: Accept-Encoding response header explicitly tells the cache server to keep separate versions according to the Accept-Encoding field.
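One way to picture what Vary asks of a cache is as an extra component of the cache key. The sketch below is an assumption-laden simplification: `cacheKey` is a hypothetical helper, request header names are assumed to be lower-cased already, and real caches normalize header values more carefully.

```javascript
// Builds a cache key from the URL plus the request headers named in Vary.
// requestHeaders: object whose keys are lower-case header names (assumed).
function cacheKey(url, varyHeader, requestHeaders) {
  const names = varyHeader
    .split(',')
    .map((name) => name.trim().toLowerCase())
    .filter(Boolean)
    .sort();
  const parts = names.map((name) => name + '=' + (requestHeaders[name] || ''));
  return [url, ...parts].join('|');
}

const gzipKey = cacheKey('/app.js', 'Accept-Encoding', { 'accept-encoding': 'gzip' });
const brKey = cacheKey('/app.js', 'Accept-Encoding', { 'accept-encoding': 'br' });

console.log(gzipKey);           // /app.js|accept-encoding=gzip
console.log(gzipKey === brKey); // false, so each encoding is cached separately
```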
Different resource caching policies
- Different resources may have different update requirements. Review each resource and determine an appropriate max-age for it.
- Some resources are updated more frequently than others. If a particular part of a resource (such as a JS function or a set of CSS styles) is updated frequently, consider serving that code as a separate file. This way, the rest of the content (such as library code that rarely changes) can still be taken from the cache whenever an update is fetched, keeping the amount of downloaded content to a minimum.
- Combining resource URLs that contain a content signature with a short or no-cache lifetime for HTML documents lets you control how quickly the client receives updates. When a rarely updated resource (JS/CSS) changes, its URL changes, and only the reference in the frequently updated entry file (HTML) has to be updated.
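The content-signature idea in the last point can be sketched like this. `fingerprintedName` and `cacheControlFor` are hypothetical helpers, and the choice of SHA-256 truncated to eight hex characters is an arbitrary assumption; build tools usually do this step for you.

```javascript
const crypto = require('crypto');

// Inserts a short content hash before the file extension: app.js -> app.<hash>.js
function fingerprintedName(fileName, content) {
  const digest = crypto.createHash('sha256').update(content).digest('hex').slice(0, 8);
  const dot = fileName.lastIndexOf('.');
  if (dot <= 0) return fileName + '.' + digest;
  return fileName.slice(0, dot) + '.' + digest + fileName.slice(dot);
}

// HTML entry files are always revalidated; fingerprinted assets are cached for a year.
function cacheControlFor(fileName) {
  return /\.html?$/.test(fileName) ? 'no-cache' : 'max-age=31536000';
}

const oldName = fingerprintedName('app.js', 'console.log(1)');
const newName = fingerprintedName('app.js', 'console.log(2)');

console.log(oldName !== newName);           // true, changed content yields a new URL
console.log(cacheControlFor('index.html')); // no-cache
console.log(cacheControlFor(newName));      // max-age=31536000
```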
Special instructions
F5 / clicking the refresh button in the toolbar / Reload in the right-click menu
F5 works differently from typing the URI in the address bar and pressing Enter. F5 makes the browser send an HTTP request to the server regardless, even if the previous response carried an Expires header. So when F5 is pressed on a web page, the browser sends an HTTP request to the server with headers like these:
Cache-Control: max-age=0
If-Modified-Since: Fri, 15 Jul 2016 04:11:51 GMT
Cache-Control: max-age=0 is added by the browser (Chrome in this case), and If-Modified-Since carries the value of the Last-Modified header from the earlier response. The browser sends that time back in the If-Modified-Since header to ask whether the resource needs to be sent again. In this case the server had not modified the index.css file, so it returned 304 (Not Modified), which is a small response that costs little round-trip time, and the page refreshes quickly.
There is no ETag in the above example. If the response had contained an ETag, the HTTP request triggered by F5 would also contain If-None-Match.
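The headers of such a refresh request can be modelled with a few lines. This is only a sketch of the behavior described above; `refreshRequestHeaders` and the shape of the `cached` record are assumptions, not a browser API.

```javascript
// cached: hypothetical record of the validators stored with the earlier response
function refreshRequestHeaders(cached) {
  const headers = { 'Cache-Control': 'max-age=0' };
  if (cached.lastModified) headers['If-Modified-Since'] = cached.lastModified;
  if (cached.etag) headers['If-None-Match'] = cached.etag;
  return headers;
}

const withoutEtag = refreshRequestHeaders({ lastModified: 'Fri, 15 Jul 2016 04:11:51 GMT' });
const withEtag = refreshRequestHeaders({
  lastModified: 'Fri, 15 Jul 2016 04:11:51 GMT',
  etag: '"abc123"',
});

console.log('If-None-Match' in withoutEtag); // false
console.log(withEtag['If-None-Match']);      // "abc123"
```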
Cache rule implementation
The browser cache described above lives in the client, but the properties that control it are actually set on the server, on the resources it serves. There are usually two ways to set them: declaratively through server directives, or programmatically.
Nginx directive settings
Example 1:
# Set the cache lifetime for JS and CSS
location ~* \.(js|css)$ { expires 1y; }
Example 2:
location ~ .*\.(css|js|swf|php|htm|html)$ { add_header Cache-Control no-store; }
Example 3:
http { etag off; }
Example 4:
Configure Last-Modified (enabled by default)
Programmatic setting
Server frameworks in other languages are not covered in detail here; see the documentation of the relevant module. A Koa implementation is used as an example:
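// Koa example (Koa 2 style async middleware)
const Koa = require('koa');
const crypto = require('crypto');
const app = new Koa();

app.use(async (ctx) => {
  ctx.body = 'Hello World';
  // The validator sent back by the browser arrives in the If-None-Match request header
  const previousEtag = ctx.get('If-None-Match');
  console.log('If-None-Match: ' + previousEtag);
  // Derive the ETag from a hash of the response body
  const hash = crypto.createHash('sha1').update(ctx.body).digest('base64');
  ctx.set({
    'Cache-Control': 'max-age=120',
    'ETag': '"' + hash + '"',
    // Demo only: a real server would send the resource's actual modification time
    'Last-Modified': new Date().toUTCString()
  });
});

app.listen(3000);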
Reference documentation
- MDN, HTTP cache: developer.mozilla.org/zh-CN/docs/…
- HTTP cache control: imweb.io/topic/5795d…
- web.dev/http-cache/
- mp.weixin.qq.com/s/d2zeGhUpt…
- www.jiqizhixin.com/articles/20…