Browser cache

why:

The term “cache” comes from a paper published in 1967 and is used throughout computer engineering. When the CPU processes data, it looks in the cache first and only then in main memory. Browser caching addresses a similar mismatch: loading data over the network is much slower than the browser can process it. Caching avoids downloading the same resources from the server every time.

what:

  • A Web cache is a copy of a Web resource (such as an HTML page, image, JS file, or data) that is kept somewhere between the Web server and the client (browser).
  • The cache stores a copy of the response to an incoming request. When the next request arrives for the same URL, the cache decides, according to the caching mechanism, whether to answer directly with the stored copy or to send the request to the server again (the server itself may or may not use a cache).

how:

Caching mechanism: expiration mechanism & validation mechanism

  1. Freshness (expiration mechanism): the validity period of a cached copy. The browser considers a cached copy valid and fresh enough when it satisfies the following conditions:
    • It carries complete expiration control headers (HTTP headers) and is still within its validity period
    • The browser has already used this cached copy and has already checked its freshness in the current session
  2. Validation (validation mechanism): the server sometimes returns a resource together with its entity tag in the ETag response header, which serves as a validator when the browser requests the resource again. If the validator does not match, the resource has been modified or has expired, and the browser must fetch the content again

Cache phases: local (strong) cache & negotiated (weak) cache

  1. Before sending a request, the browser checks whether the strong cache is hit. If it is, the browser reads the resource directly from the cache and sends no request to the server. Otherwise it goes on to the next step
  2. When the strong cache is not hit, the browser must send a request to the server, and the server uses the request headers to determine whether the negotiated cache is hit. If it is, the server returns a 304 response that carries no response body and simply tells the browser that the resource can be read from the browser cache. If neither the local cache nor the negotiated cache is hit, the resource is loaded from the server
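
The two steps above can be sketched as a small decision function. This is only an illustration of the flow, not how any particular browser implements it; the `CachedEntry` record and the names `decide` and `resolve` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedEntry:
    """Hypothetical record of what the browser stored for one URL."""
    body: bytes
    stored_at: float              # seconds since the epoch when the copy was stored
    max_age: int                  # freshness lifetime in seconds
    etag: Optional[str] = None    # validator returned by the server, if any


def decide(entry: Optional[CachedEntry], now: float) -> str:
    """Return which step of the flow applies to this request."""
    if entry is None:
        return "fetch"            # nothing cached: ordinary request
    if now - entry.stored_at < entry.max_age:
        return "use-cache"        # strong cache hit: no request is sent
    if entry.etag is not None:
        return "revalidate"       # conditional request, the server may answer 304
    return "fetch"                # expired and no validator available


def resolve(entry: CachedEntry, status: int, response_body: bytes) -> bytes:
    """After revalidation: 304 reuses the cached body, 200 replaces it."""
    return entry.body if status == 304 else response_body
```

A fresh entry never reaches the network, while an expired entry with a validator costs one round trip but possibly no body transfer.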

Local cache: Cache-Control, Expires

Negotiated cache: Last-Modified, If-Modified-Since, ETag, If-None-Match

  • Control through an HTML meta tag

    <meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate">
  • HTTP headers

    | Rule       | Header            | Value / example               | Type     | Role |
    |------------|-------------------|-------------------------------|----------|------|
    | Freshness  | Expires           | Sat, 16 Oct 2021 05:00:00 GMT | Response | Tells the browser it may use the cached copy until the expiration time |
    |            | Pragma            | no-cache                      | Response | Tells the browser to ignore the cached copy of the resource |
    |            | Cache-Control     | no-cache                      | Response | Tells the browser not to use the cached copy without checking with the server, so every request goes to the server |
    |            | Cache-Control     | no-store                      | Response | The cache must not keep any copy under any circumstances |
    |            | Cache-Control     | max-age=[seconds]             | Response | Lifetime of the cached copy: the number of seconds between the request time and the expiration time |
    |            | Cache-Control     | public                        | Response | Any cache may store the resource unconditionally |
    |            | Cache-Control     | private                       | Response | The resource may be cached only for a single user or entity |
    |            | Last-Modified     | Sat, 16 Oct 2021 05:00:00 GMT | Response | Tells the browser when the resource was last modified |
    |            | If-Modified-Since | Sat, 16 Oct 2021 05:00:00 GMT | Request  | If the response to the first request contained Last-Modified, the browser sends that value in this header on the next request for the same resource |
    | Validation | ETag              | 50b1c2c24c433cr:df3           | Response | Tells the browser the unique identifier of the resource on the server |
    |            | If-None-Match     | 50b1c2c24c433cr:df3           | Request  | If the response to the first request contained ETag, the browser sends that value in this header on the next request for the same resource |
    | Auxiliary  | Vary              |                               | Response | Helps select the suitable version among multiple cached copies |

Expires:

An HTTP/1.0 feature that gives the point in time at which the resource expires. It is an absolute time, and the cached resource is considered expired once that time has passed. Because the comparison uses the client's clock, changing the local time may invalidate the cache

Expires: Sat, 01 May 2021 00:00:00 GMT
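
As a rough sketch of that comparison, the header value can be parsed with Python's standard `email.utils.parsedate_to_datetime` and compared against the current time. The function name `is_expired` is made up for this example, and the caller passes in "now" explicitly to make the dependence on the local clock visible.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_expired(expires_header: str, now: datetime) -> bool:
    """True once the absolute time in the Expires header has been reached."""
    expires_at = parsedate_to_datetime(expires_header)
    return now >= expires_at


header = "Sat, 01 May 2021 00:00:00 GMT"
# The same cached copy is fresh or expired depending only on what the clock says.
clock_before = datetime(2021, 4, 30, 23, 59, 59, tzinfo=timezone.utc)
clock_after = datetime(2021, 5, 1, 0, 0, 0, tzinfo=timezone.utc)
print(is_expired(header, clock_before), is_expired(header, clock_after))
```
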
Cache-Control:

A feature introduced in HTTP/1.1 to make up for the shortcomings of Expires. It provides a finer-grained caching mechanism for HTTP requests and responses through directives.

  • public: the response may be cached by any cache, even if it would not normally be cacheable
  • private: the response may be cached only for a single user and must not be stored by a shared cache
  • no-cache: the cache must submit the request to the origin server for validation before serving a cached copy
  • no-store: the cache must not store anything about the client request or the server response, that is, no cache is used
  • max-age=<seconds>: the maximum time the cached copy stays fresh. Beyond this period the copy is considered expired
  • s-maxage=<seconds>: overrides max-age or the Expires header, but only for shared caches (such as proxies); private caches ignore it
  • max-stale[=<seconds>]: the client is willing to accept an expired resource. An optional number of seconds limits how far past its expiration the response may be
  • min-fresh=<seconds>: the client wants a response that will stay fresh for at least the specified number of seconds
  • must-revalidate: once a resource has expired (for example, it is older than max-age), the cache must not use it to answer subsequent requests until it has been successfully validated with the origin server
  • proxy-revalidate: has the same effect as must-revalidate, but applies only to shared caches (such as proxies) and is ignored by private caches
  • no-transform: the resource must not be converted or transformed. HTTP headers such as Content-Encoding, Content-Range, and Content-Type must not be modified by a proxy
  • only-if-cached: the client accepts only cached responses, and the cache should not check with the origin server for a newer copy

Priority: Cache-Control has a higher priority than Expires
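
A minimal sketch of how a cache might apply this priority: split the Cache-Control value into directives, prefer max-age when it is present, and fall back to Expires otherwise. The function names are hypothetical, the headers are assumed to arrive as a plain dict with canonical capitalization, and shared-cache details such as s-maxage are left out.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split 'public, max-age=600' into {'public': None, 'max-age': '600'}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def freshness_lifetime(headers: Dict[str, str], response_time: datetime) -> Optional[float]:
    """Seconds the copy may be used without contacting the server, or None if unknown."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if "no-store" in cc or "no-cache" in cc:
        return 0.0                              # never served without asking the server
    max_age = cc.get("max-age")
    if max_age is not None:
        return float(max_age)                   # Cache-Control wins over Expires
    if "Expires" in headers:
        expires_at = parsedate_to_datetime(headers["Expires"])
        return max(0.0, (expires_at - response_time).total_seconds())
    return None
```

When both headers are present, the Expires value is never consulted, which is the priority rule stated above.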

Last-Modified & If-Modified-Since

Last-Modified (response header) and If-Modified-Since (request header) are a pair of headers that date back to HTTP/1.0.

If-Modified-Since is a request header field and can only be used in GET or HEAD requests. Last-Modified is a response header field that contains the date and time at which, as determined by the server, the resource was last modified. When a request arrives with the If-Modified-Since header, the server compares it with the resource's Last-Modified time. If the Last-Modified time is earlier than or equal to If-Modified-Since, the server returns 304; otherwise it returns the resource again
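
The server-side comparison might look roughly like the sketch below. The function `conditional_get` is hypothetical and ignores the request method; it relies only on the standard-library helpers `format_datetime` and `parsedate_to_datetime` and expects a timezone-aware UTC modification time.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def conditional_get(last_modified: datetime,
                    if_modified_since: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a resource last modified at `last_modified` (UTC)."""
    # HTTP dates have one-second resolution, so drop the microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since is not None:
        try:
            if last_modified <= parsedate_to_datetime(if_modified_since):
                return 304, headers          # not modified: no body is sent
        except (TypeError, ValueError):
            pass                             # unusable date: fall through to a full response
    return 200, headers
```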

ETag & If-None-Match

ETag is a response header field generated by the server, typically a hash string computed from the entity content that represents the state of the resource. If-None-Match is a conditional request header. When a request carries this header with the ETag value from the server's previous response, the server returns 200 with the requested resource if and only if no resource on the server has an ETag that matches the value in the header; otherwise it returns 304

Priority: ETag has a higher priority than Last-Modified; when both are present, ETag is used
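
To make the exchange concrete, here is a small sketch of a server that derives the ETag from the body and answers a conditional request. Hashing the content with SHA-256 is just one possible choice made for this example (servers are free to build ETags differently), and the names `make_etag` and `respond` are hypothetical.

```python
import hashlib
from typing import Optional, Tuple


def make_etag(body: bytes) -> str:
    """Quoted validator derived from the content (an assumption of this sketch)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, bytes, str]:
    """Return (status, body, etag) for the current version of the resource."""
    etag = make_etag(body)
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # A weak validator (W/"...") is compared by its quoted part here.
        candidates = [tag[2:] if tag.startswith("W/") else tag for tag in candidates]
        if "*" in candidates or etag in candidates:
            return 304, b"", etag            # validator matches: send no body
    return 200, body, etag                   # no match: send the full resource
```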

What problems does ETag solve:

A. Last-Modified is only accurate to the second. If a file is modified more than once within one second, Last-Modified cannot accurately reflect the freshness of the file.

B. Some files are rewritten periodically without their contents changing (only the modification time changes), yet the changed Last-Modified prevents the cached copy from being used;

C. The server may be unable to obtain the correct modification time of a file, or the time on a proxy server may differ from the time on the origin server.

Why ETag?

  • Some files are rewritten periodically without their contents changing (only the modification time changes). In that case we do not want the client to treat the file as modified and GET it again;
  • Some files are modified very frequently, for example several times within one second (say N times in 1 s). If-Modified-Since can only check at one-second granularity, so such changes cannot be detected (the UNIX mtime record is likewise only accurate to the second);
  • Some servers cannot determine exactly when a file was last modified.
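
The first two points can be reproduced in a few lines. The sketch assumes a content-derived ETag (SHA-256 here, chosen only for illustration) and uses made-up helper names; it shows two versions written in the same second sharing one Last-Modified value, and an untouched body keeping its ETag when only the modification time moves.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime


def last_modified_header(mtime: datetime) -> str:
    """Format a UTC modification time as an HTTP date (whole seconds only)."""
    return format_datetime(mtime.replace(microsecond=0), usegmt=True)


def content_etag(body: bytes) -> str:
    """Validator that depends only on the bytes of the resource."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


# Two different versions written 0.8 s apart within the same second.
t1 = datetime(2021, 10, 16, 5, 0, 0, 100000, tzinfo=timezone.utc)
t2 = datetime(2021, 10, 16, 5, 0, 0, 900000, tzinfo=timezone.utc)
same_date = last_modified_header(t1) == last_modified_header(t2)
same_tag = content_etag(b"version 1") == content_etag(b"version 2")
print(same_date, same_tag)
```

With Last-Modified alone, a client holding version 1 would receive 304 and keep the outdated copy; the differing ETags expose the change.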

How to choose the right cache:

Rough order

  • Cache-Control: checked before a request is sent to the server
  • Expires: checked before a request is sent to the server
  • If-None-Match (ETag): sent with the request to the server
  • If-Modified-Since (Last-Modified): sent with the request to the server

Negotiated caching needs to be used in conjunction with strong caching. If strong caching is not enabled, negotiated caching is meaningless

Most Web servers enable negotiated caching by default, with both the Last-Modified / If-Modified-Since pair and the ETag / If-None-Match pair turned on.

But the following scenarios need attention:

  • In a distributed system, the Last-Modified value of a file must be identical on every machine; otherwise load balancing across machines can make the comparison fail.
  • In a distributed system, turn off ETags where possible (each machine generates a different ETag);
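
Why ETags can differ between machines depends on how the server builds them. The sketch below assumes a scheme that mixes file metadata such as the inode number into the tag, which is an assumption for illustration and not a description of any specific server, and contrasts it with a tag computed from the content alone. `FileInfo` and both functions are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical view of one file as stored on one machine."""
    inode: int
    size: int
    mtime: int
    body: bytes


def metadata_etag(f: FileInfo) -> str:
    # Includes the inode, which is specific to the machine's file system.
    return f'"{f.inode:x}-{f.size:x}-{f.mtime:x}"'


def content_etag(f: FileInfo) -> str:
    # Depends only on the bytes, so identical files yield identical tags.
    return '"' + hashlib.sha256(f.body).hexdigest()[:16] + '"'


machine_a = FileInfo(inode=1001, size=5, mtime=1634360400, body=b"hello")
machine_b = FileInfo(inode=2002, size=5, mtime=1634360400, body=b"hello")
print(metadata_etag(machine_a) == metadata_etag(machine_b))
print(content_etag(machine_a) == content_etag(machine_b))
```

Under the metadata-based scheme, a client revalidating against a different machine than the one that served the original response would miss the negotiated cache even though the file is unchanged.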

Cookie

(to be written)

LocalStorage

(to be written)

SessionStorage

(to be written)

IndexedDB

(to be written)

Service Worker

(to be written)
