Without further ado, let's start with a flow chart that shows what happens when a browser sends a request and caching is involved.

Main text

The browser cache mechanism is also known as the HTTP cache mechanism, because it is driven by the cache identifiers carried in HTTP messages. Before analyzing the browser cache mechanism, we therefore briefly introduce HTTP messages, with pictures.

HTTP messages

1. HTTP request messages

The message format is: request line - HTTP headers (general headers, request headers, entity headers) - request message body (normally present only for methods that send data, such as POST), as shown in the following figure

2. HTTP response messages

The message format is: status line - HTTP headers (general headers, response headers, entity headers) - response message body, as shown in the following figure
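The two message layouts above can be illustrated with a small parser. This is only a minimal sketch: the function name `parse_message` and the sample messages are made up for illustration, and real parsers also handle repeated headers, binary bodies, and chunked encoding.

```python
def parse_message(raw: str) -> dict:
    """Split a raw HTTP message into start line, headers, and body."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    # Request line: method, target, version.
    # Status line: version, status code, reason phrase.
    return {"start_line": start_line.split(" ", 2), "headers": headers, "body": body}


request = parse_message(
    "POST /login HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Cache-Control: no-cache\r\n"
    "Content-Length: 9\r\n"
    "\r\n"
    "user=demo"
)
response = parse_message(
    "HTTP/1.1 200 OK\r\n"
    "Cache-Control: max-age=600\r\n"
    'ETag: "abc123"\r\n'
    "Content-Length: 5\r\n"
    "\r\n"
    "hello"
)
print(request["start_line"], response["start_line"])
```

The same function works for both directions because requests and responses differ only in their first line.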

Note: General headers are the header fields supported by both request and response messages: Cache-Control, Connection, Date, Pragma, Transfer-Encoding, Upgrade, and Via. Entity headers describe the entity (the message body): Allow, Content-Base, Content-Encoding, Content-Language, Content-Length, Content-Location, Content-MD5, Content-Range, Content-Type, ETag, Expires, Last-Modified, and extension-header. For ease of understanding, general headers, request/response headers, and entity headers are all referred to below as HTTP headers.
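As a rough illustration of this grouping, the two lists from the note can be written as lookup sets. The names `GENERAL_HEADERS`, `ENTITY_HEADERS`, and `classify_header` are hypothetical, and the sets contain only the fields listed in the note.

```python
# Header groups exactly as listed in the note above (lowercased for lookup).
GENERAL_HEADERS = {
    "cache-control", "connection", "date", "pragma",
    "transfer-encoding", "upgrade", "via",
}
ENTITY_HEADERS = {
    "allow", "content-base", "content-encoding", "content-language",
    "content-length", "content-location", "content-md5", "content-range",
    "content-type", "etag", "expires", "last-modified",
}


def classify_header(name: str) -> str:
    """Return the group a header field belongs to."""
    key = name.lower()
    if key in GENERAL_HEADERS:
        return "general"
    if key in ENTITY_HEADERS:
        return "entity"
    # Everything else is a request header, a response header, or an extension.
    return "request/response"


for field in ("Cache-Control", "ETag", "If-None-Match"):
    print(field, "->", classify_header(field))
```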

Second, cache process analysis

Initiate a request for the first time

  • Each time the browser initiates a request, it first looks up the result of the request and the cache identifier in the browser cache
  • Each time the browser receives the result of a request, it stores the result and its cache identifier in the browser cache
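The two rules above amount to a lookup before every request and a store after every response. The sketch below assumes a cache keyed by URL only, and the class names `CacheEntry` and `BrowserCache` are invented for illustration. Real browser caches also take the request method and other factors into account.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    body: bytes
    headers: dict  # the cache identifiers (Cache-Control, ETag, ...) live here


class BrowserCache:
    """Toy in-memory cache keyed by URL (an assumption of this sketch)."""

    def __init__(self):
        self._entries = {}

    def lookup(self, url: str):
        # Rule 1: look in the cache first; None means nothing is stored.
        return self._entries.get(url)

    def store(self, url: str, body: bytes, headers: dict) -> None:
        # Rule 2: keep the result together with its cache identifiers.
        self._entries[url] = CacheEntry(body, dict(headers))


cache = BrowserCache()
url = "https://example.com/app.js"
first_lookup = cache.lookup(url)          # first request: nothing cached yet
cache.store(url, b"console.log(1)", {"cache-control": "max-age=600"})
second_lookup = cache.lookup(url)         # later request: entry is found
print(first_lookup, second_lookup)
```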

Here we divide the caching process into two parts, forced caching (also called strong caching) and negotiated caching, depending on whether an HTTP request needs to be sent to the server again.

Forced caching

Forced caching is the process of looking up the result of a request in the browser cache and deciding, based on the caching rules stored with that result, whether the cached result can be used.

3.1. Pragma  

Pragma is a general header field left over from HTTP/1.0 and kept in HTTP/1.1 only for backward compatibility. Although it is a general header, its behavior in a response is not specified and depends on the browser implementation. In the RFC, this field has only one defined value, no-cache. It tells the browser not to use the cache directly and to send a request to the server to check freshness. Because it has the highest priority, the strong cache is never hit when it is present.
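A minimal sketch of that check, assuming the headers are held in a dict with lowercase keys. The function name `pragma_forbids_strong_cache` is hypothetical.

```python
def pragma_forbids_strong_cache(headers: dict) -> bool:
    """Return True when a Pragma: no-cache header is present."""
    value = headers.get("pragma", "")
    # Tokens are compared case-insensitively.
    tokens = [token.strip().lower() for token in value.split(",")]
    return "no-cache" in tokens


print(pragma_forbids_strong_cache({"pragma": "no-cache"}))
print(pragma_forbids_strong_cache({"cache-control": "max-age=60"}))
```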

3.2. Cache-Control    

Cache-Control is a general header field and is the main field HTTP/1.1 uses to control the browser cache. The following response directives are related to the browser cache:

Directive | Parameter | Description
private | Optional | The response can be cached only for a single user, not by a shared cache (that is, a proxy server cannot cache it)
public | None | The response can be cached by any cache, including the client that sent the request, proxy servers, and so on
no-cache | Optional | The cached response must be validated with the server before it is used
no-store | None | Nothing from the request or the response is cached
max-age=[s] | Required | The maximum age of the response, in seconds
  • private means the response may be stored only in a private cache (the browser) and cannot be cached by a CDN. A response to a request that requires HTTP authentication is treated as private by default.

  • public means the response can be cached by the browser, a CDN, and so on.

  • no-cache means the cache must confirm the validity of the stored response with the server before using it. If the stored response is still valid, it can be used (negotiated cache). The strong cache is never hit when this directive appears in the response header or the request header. A Chrome hard reload (Command+Shift+R) adds Pragma: no-cache and Cache-Control: no-cache to the request headers.

  • no-store forbids the browser and all intermediate caches from storing any version of the response. Neither the strong cache nor the negotiated cache can be used. It is suitable for private personal data or financial data.

  • Expires is a response header field that specifies the date/time after which the response is considered stale. An invalid date such as 0 indicates that the resource has already expired. If the Cache-Control response header also sets max-age, Expires is ignored. It is a header field left over from HTTP/1.0 and is kept only for backward compatibility.
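How these fields combine can be sketched as follows. The helper names `parse_cache_control` and `freshness_lifetime` are hypothetical, headers are assumed to be a dict with lowercase keys, and splitting on commas is a simplification that does not handle quoted arguments containing commas.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_cache_control(value: str) -> dict:
    """Turn 'public, max-age=600' into {'public': None, 'max-age': '600'}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def freshness_lifetime(headers: dict):
    """Seconds the response may be served from the strong cache, or None."""
    directives = parse_cache_control(headers.get("cache-control", ""))
    if "max-age" in directives:
        # max-age takes precedence, so Expires is ignored when both are set.
        try:
            return float(int(directives["max-age"]))
        except (TypeError, ValueError):
            return 0.0
    if "expires" in headers:
        try:
            expires = parsedate_to_datetime(headers["expires"])
            date = (parsedate_to_datetime(headers["date"])
                    if "date" in headers else datetime.now(timezone.utc))
            return max((expires - date).total_seconds(), 0.0)
        except (TypeError, ValueError):
            return 0.0  # an invalid date such as "0" means already expired
    return None  # no explicit freshness information


both = freshness_lifetime({
    "cache-control": "public, max-age=600",
    "expires": "Wed, 21 Oct 2015 07:28:00 GMT",
})
expires_only = freshness_lifetime({
    "date": "Wed, 21 Oct 2015 07:28:00 GMT",
    "expires": "Wed, 21 Oct 2015 07:38:00 GMT",
})
invalid = freshness_lifetime({"expires": "0"})
print(both, expires_only, invalid)
```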

There are three main cases of forced caching (the negotiated cache process is not analyzed here), as follows:

1. Neither the cached result nor the cache identifier exists, so forced caching fails and the request is sent directly to the server (the same as a first request), as shown below:

2. The cached result and cache identifier exist, but the result has expired, so forced caching fails and the negotiated cache is used (not analyzed for now), as shown below:

3. The cached result and cache identifier exist and the result has not expired, so forced caching takes effect and the cached result is returned directly, as shown in the following figure:
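The three cases above can be sketched as one decision function. The function name, the return labels, and the entry layout (a dict with `stored_at` and `max_age` in seconds) are assumptions made for this example.

```python
def forced_cache_decision(entry, now: float) -> str:
    """Decide what to do before any request is sent to the server.

    entry is None when nothing is cached, otherwise a dict with
    'stored_at' (timestamp in seconds) and 'max_age' (seconds).
    """
    if entry is None:
        return "request-server"   # case 1: no cached result or identifier
    age = now - entry["stored_at"]
    if age >= entry["max_age"]:
        return "negotiate"        # case 2: cached but expired
    return "use-cache"            # case 3: cached and still fresh


entry = {"stored_at": 1000.0, "max_age": 600}
print(forced_cache_decision(None, now=1000.0))
print(forced_cache_decision(entry, now=1700.0))
print(forced_cache_decision(entry, now=1200.0))
```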

Negotiated caching

Negotiated caching is the process in which, after forced caching has failed, the browser sends a request to the server carrying the cache identifier, and the server decides based on that identifier whether the cached result can still be used.

  • The negotiated cache takes effect and the server returns 304, as follows

  • The negotiated cache fails and the server returns 200 with the requested result, as follows
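From the browser's side, the two outcomes above can be sketched like this. The helper names `conditional_headers` and `apply_validation_response` are hypothetical, and cached headers are assumed to be a dict with lowercase keys.

```python
def conditional_headers(cached_headers: dict) -> dict:
    """Build the request headers that carry the cache identifiers."""
    headers = {}
    if "etag" in cached_headers:
        headers["If-None-Match"] = cached_headers["etag"]
    if "last-modified" in cached_headers:
        headers["If-Modified-Since"] = cached_headers["last-modified"]
    return headers


def apply_validation_response(cached_body: bytes, status: int, body: bytes) -> bytes:
    """Pick the body to use after the server has answered."""
    if status == 304:
        return cached_body   # negotiated cache takes effect: reuse the cache
    if status == 200:
        return body          # negotiated cache fails: use the new result
    raise ValueError(f"status {status} is outside the scope of this sketch")


sent = conditional_headers({
    "etag": '"abc123"',
    "last-modified": "Wed, 21 Oct 2015 07:28:00 GMT",
})
print(sent)
print(apply_validation_response(b"old", 304, b""))
print(apply_validation_response(b"old", 200, b"new"))
```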

1. Last-Modified/If-Modified-Since

If-Modified-Since is a request header field and can only be used in GET or HEAD requests. Last-Modified is a response header field that contains the date and time at which, as far as the server knows, the resource was last modified. When a resource is requested with an If-Modified-Since header, the server compares it with the resource's Last-Modified value. If Last-Modified is earlier than or equal to If-Modified-Since, the server returns a 304 response with no body; otherwise it returns the resource.

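If-Modified-Since: <day-name>, <day> <month> <year> <hour>:<minute>:<second> GMT
Last-Modified: <day-name>, <day> <month> <year> <hour>:<minute>:<second> GMT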
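The server-side comparison can be sketched with the standard library's HTTP date parser. The function name `not_modified_since` is hypothetical, and treating an unparsable date as "send the full resource" is an assumption of this sketch.

```python
from email.utils import parsedate_to_datetime


def not_modified_since(last_modified: str, if_modified_since: str) -> bool:
    """True when the server may answer 304 instead of sending the body."""
    try:
        return (parsedate_to_datetime(last_modified)
                <= parsedate_to_datetime(if_modified_since))
    except (TypeError, ValueError):
        # Unparsable date: ignore the condition and send the full resource.
        return False


unchanged = not_modified_since(
    "Wed, 21 Oct 2015 07:28:00 GMT", "Wed, 21 Oct 2015 07:28:00 GMT")
changed = not_modified_since(
    "Thu, 22 Oct 2015 09:00:00 GMT", "Wed, 21 Oct 2015 07:28:00 GMT")
print(304 if unchanged else 200, 304 if changed else 200)
```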

2. ETag/If-None-Match     

ETag is a response header field. It is a string generated by the server, typically a hash of the entity content, that identifies the current state of the resource. If-None-Match is a conditional request header. When the browser requests a resource again, it sets this field to the ETag value that the server returned earlier. The server returns a 200 response with the requested resource if and only if none of the listed values matches the resource's current ETag; otherwise it returns a 304 response with no body. ETag has a higher priority than Last-Modified, so when both are present, ETag is used.

If-None-Match: <etag_value>
If-None-Match: <etag_value>, <etag_value>, …
If-None-Match: *
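A minimal sketch of how a server might evaluate the three forms above. The function name `etag_matches` is hypothetical, the resource is assumed to exist (so `*` always matches), and splitting on commas is a simplification. The comparison ignores the `W/` weak prefix, which is how If-None-Match is compared.

```python
def etag_matches(if_none_match: str, current_etag: str) -> bool:
    """True when the server may answer 304 for this If-None-Match value."""
    if if_none_match.strip() == "*":
        return True  # "*" matches any existing resource

    def strip_weak(tag: str) -> str:
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag

    candidates = [strip_weak(t) for t in if_none_match.split(",") if t.strip()]
    return strip_weak(current_etag) in candidates


for header in ('"abc"', '"xyz", "abc"', 'W/"abc"', '"xyz"', "*"):
    status = 304 if etag_matches(header, '"abc"') else 200
    print(header, "->", status)
```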

Conclusion

1. The process

2. Negotiated cache and strong cache

3. Cache-Control

Demo link

Github.com/songpengyua…
