Original link:

Segmentfault.com/a/119000001…

1. The HTTP cache

The first time a browser makes an HTTP request to a Web server, the server returns the requested resource and adds cache-related fields such as Cache-Control, Expires, Last-Modified, ETag, and Date to the response header. When the browser later requests the same resource again, it can use strong caching or negotiated caching as appropriate.

  • Strong cache: The browser fetches data directly from the local cache without interacting with the server.
  • Negotiated cache: The browser sends a request to the server, and the server determines whether the local cache can be used.

Similarities and differences:

  • When the cache is usable, both end up serving the resource from the local cache;
  • Strong caching does not contact the server, whereas negotiated caching does.

2. Strong cache

The process shown by the red line in the figure above represents strong caching. After a user initiates an HTTP request, the browser finds that the requested resource is cached locally and checks whether the cache has expired. Two HTTP header fields control cache expiration: Expires and Cache-Control. The browser determines whether a cached response has expired in two steps:

  1. Check whether the Cache-Control header of the cached response has an s-maxage or max-age directive. If it does, the expiration time is the response's generation time (the Date header) plus the s-maxage/max-age value in seconds; compare it with the current time. (s-maxage applies to shared caches, such as a proxy cache server used by many users.)
  2. If Cache-Control has no s-maxage or max-age directive, compare the time in Expires with the current time. Expires is an absolute time.

Note:

In HTTP/1.1, when the Cache-Control header has an s-maxage or max-age directive, that directive takes precedence over the Expires header.
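The two-step expiration check can be sketched in Python as follows. This is a simplified model, assuming response headers are held in a plain dict with canonical names (`Date`, `Cache-Control`, `Expires`); the function names are hypothetical. Because s-maxage targets shared caches, the sketch only honors it when `shared=True`, so a browser's private cache falls back to max-age. It also ignores the Age header and other refinements that real caches apply.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def max_age_seconds(cache_control: str, shared: bool) -> Optional[int]:
    """Return s-maxage (shared caches only) or max-age from a Cache-Control value."""
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')
    if shared and "s-maxage" in directives:
        return int(directives["s-maxage"])
    if "max-age" in directives:
        return int(directives["max-age"])
    return None


def expiration_time(headers: Dict[str, str], shared: bool = False) -> Optional[datetime]:
    """Step 1: Date + s-maxage/max-age. Step 2: fall back to the absolute Expires time."""
    seconds = max_age_seconds(headers.get("Cache-Control", ""), shared)
    if seconds is not None:
        # max-age / s-maxage take precedence over Expires
        return parsedate_to_datetime(headers["Date"]) + timedelta(seconds=seconds)
    if "Expires" in headers:
        return parsedate_to_datetime(headers["Expires"])
    return None


def is_expired(headers: Dict[str, str], now: datetime, shared: bool = False) -> bool:
    expires_at = expiration_time(headers, shared)
    # No expiration information at all: treat as expired (no heuristics in this sketch)
    return expires_at is None or now >= expires_at


headers = {
    "Date": "Tue, 01 Oct 2024 10:00:00 GMT",
    "Cache-Control": "public, max-age=3600",
    "Expires": "Tue, 01 Oct 2024 09:00:00 GMT",  # ignored because max-age is present
}
now = datetime(2024, 10, 1, 10, 30, tzinfo=timezone.utc)
print(is_expired(headers, now))  # False: the entry expires at 11:00 GMT
```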

Common Cache-Control directives:

  • no-cache: the cached copy must not be used without first confirming it with the server; that is, negotiated caching is used instead of strong caching.
  • no-store: caching is disabled; the response must not be stored at all.
  • public: the response may be stored by any cache, including a shared cache server used by multiple users.
  • private: the response is intended for a single user and may only be stored by that user's own cache (such as the browser), not by a shared cache server.
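How these directives affect whether and how a response is stored can be sketched as below. The function names and the returned strings are hypothetical, and the model is deliberately simplified: `public` needs no special branch here because it only widens what shared caches may store (real caches apply further rules, for example to requests that carry `Authorization`).

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') if sep else None
    return directives


def storage_policy(cache_control: str, shared_cache: bool) -> str:
    d = parse_cache_control(cache_control)
    if "no-store" in d:
        return "do not store"  # caching disabled entirely
    if shared_cache and "private" in d:
        return "do not store"  # only the user's own (browser) cache may keep it
    if "no-cache" in d:
        return "store, revalidate before each use"  # always go through negotiation
    # public, or no restricting directive: may be served from cache while fresh
    return "store, serve while fresh"


print(parse_cache_control("private, max-age=60"))  # {'private': None, 'max-age': '60'}
print(storage_policy("private, max-age=60", shared_cache=True))   # do not store
print(storage_policy("private, max-age=60", shared_cache=False))  # store, serve while fresh
```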

If these two steps show that the cache has not expired, the browser reads the resource directly from the local cache and the request completes with status code 200, which ends the strong caching process. If the cache has expired, the browser moves on to negotiated caching, or the server returns a new resource.
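The decision that follows the expiration check can be sketched like this. `CacheEntry` and `next_step` are hypothetical names; `expires_at` is assumed to be precomputed from max-age or Expires, and `no_cache` is assumed to reflect a `Cache-Control: no-cache` directive, which skips the strong cache even for a fresh entry.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class CacheEntry:
    body: bytes
    expires_at: Optional[datetime]  # from Date + max-age, or from Expires
    no_cache: bool = False          # Cache-Control: no-cache forces revalidation


def next_step(entry: Optional[CacheEntry], now: datetime) -> str:
    if entry is None:
        return "request from server"  # nothing cached yet
    fresh = entry.expires_at is not None and now < entry.expires_at
    if fresh and not entry.no_cache:
        return "strong cache: read local copy"  # no server contact, status 200
    # Expired (or no-cache): let the server decide whether the copy is still usable
    return "negotiated cache: revalidate with server"


now = datetime(2024, 10, 1, 10, 30, tzinfo=timezone.utc)
entry = CacheEntry(b"...", expires_at=datetime(2024, 10, 1, 11, 0, tzinfo=timezone.utc))
print(next_step(entry, now))  # strong cache: read local copy
```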

3. Negotiated cache

An expired cache entry is not necessarily unusable, because the resource on the server may not have changed. The browser therefore negotiates with the server and lets the server decide whether the local copy is still usable.

The browser checks whether the cached response has an ETag or Last-Modified field. If it has neither, the browser sends an ordinary HTTP request and the server returns the resource. Otherwise, it adds an If-None-Match field (if ETag is present) and an If-Modified-Since field (if Last-Modified is present) to the request header.
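Building the revalidation request from the stored validators can be sketched as follows. The function name is hypothetical, and the cached response headers are assumed to be a plain dict with canonical header names.

```python
from typing import Dict


def conditional_headers(cached: Dict[str, str]) -> Dict[str, str]:
    """Build revalidation headers from the validators stored with a cached response."""
    headers: Dict[str, str] = {}
    if "ETag" in cached:
        headers["If-None-Match"] = cached["ETag"]
    if "Last-Modified" in cached:
        headers["If-Modified-Since"] = cached["Last-Modified"]
    # An empty result means an ordinary request: the server returns the full resource
    return headers


cached = {"ETag": '"abc123"', "Last-Modified": "Tue, 01 Oct 2024 10:00:00 GMT"}
print(conditional_headers(cached))
# {'If-None-Match': '"abc123"', 'If-Modified-Since': 'Tue, 01 Oct 2024 10:00:00 GMT'}
```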

Note: if If-None-Match and If-Modified-Since are sent together, the server only needs to compare If-None-Match with the ETag. If they match, the server considers the cache still usable and returns status code 304; the browser then reads the resource from its local cache, completing the negotiated caching process shown by the blue line. If they do not match, the server returns an appropriate status code (typically 200) together with the requested resource.
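The server's side of this rule can be sketched as a function that returns a status code. The name `revalidate` and its parameters are hypothetical; `etag` and `last_modified` describe the current version of the resource. The tag comparison uses the weak comparison that the HTTP specification prescribes for If-None-Match (a `W/` prefix is ignored), and list values are split naively on commas.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict


def _opaque(tag: str) -> str:
    # Weak comparison: W/"v1" and "v1" count as the same tag
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def revalidate(request: Dict[str, str], etag: str, last_modified: datetime) -> int:
    if "If-None-Match" in request:
        # If-None-Match wins: If-Modified-Since is not consulted at all
        tags = [_opaque(t) for t in request["If-None-Match"].split(",")]
        return 304 if "*" in tags or _opaque(etag) in tags else 200
    if "If-Modified-Since" in request:
        since = parsedate_to_datetime(request["If-Modified-Since"])
        return 304 if last_modified <= since else 200
    return 200  # unconditional request: send the full resource


lm = datetime(2024, 10, 1, 10, 0, tzinfo=timezone.utc)
req = {"If-None-Match": '"v2"', "If-Modified-Since": "Tue, 01 Oct 2024 10:00:00 GMT"}
print(revalidate(req, '"v2"', lm))  # 304
print(revalidate(req, '"v3"', lm))  # 200, even though the date would still match
```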

Here is a detailed explanation of the process:

1. ETag and If-None-Match

Both carry an identifier string that the server assigns to a specific version of a resource.

  1. When the browser requests a resource, the server adds an ETag field to the response header. When the resource is updated, its ETag value on the server changes as well.
  2. When the browser requests the resource again, it adds an If-None-Match field to the request header, whose value is the ETag from the previous response.
  3. The server compares If-None-Match with the resource's current ETag. If they differ, the server processes the request and returns the updated resource. If they match, the resource has not been updated: the server returns a response with status code 304, and the browser continues to use its local cache. Note that the 304 response still includes the ETag field, even though its value has not changed.
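Steps 1 to 3 can be sketched as a tiny server handler. How an ETag is generated is up to the server; hashing the body, as the hypothetical `make_etag` does here, is only one possible scheme (servers may derive it from other file attributes instead). The sketch compares tags by plain string equality.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # Hypothetical scheme: a quoted prefix of the body's SHA-256 digest,
    # so the tag changes whenever the content changes
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    etag = make_etag(body)
    headers = {"ETag": etag}  # included in both 200 and 304 responses
    if if_none_match == etag:
        return 304, headers, b""  # unchanged: the browser reuses its cached copy
    return 200, headers, body     # first request, or the resource has changed


status, headers, _ = respond(b"v1", None)        # step 1: 200 plus ETag
status2, _, _ = respond(b"v1", headers["ETag"])  # steps 2-3: resource unchanged
status3, _, _ = respond(b"v2", headers["ETag"])  # resource updated on the server
print(status, status2, status3)  # 200 304 200
```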

2. Last-Modified and If-Modified-Since

Both values are time strings in GMT format.

  1. When the browser first requests a resource, the server adds a Last-Modified field to the response header, indicating when the resource was last modified.
  2. When the browser requests the resource again, it adds an If-Modified-Since field to the request header, whose value is the Last-Modified value from the server's previous response.
  3. The server compares the resource's Last-Modified time with If-Modified-Since. If the resource has been modified since that time, the server processes the request and returns the updated resource. If not, the resource has not been updated: the server returns a response with status code 304, and the local cache can continue to be used. Unlike ETag, the Last-Modified field does not have to be included in the 304 response header.
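The Last-Modified exchange can be sketched with Python's standard `email.utils` helpers, which format and parse the GMT date strings HTTP uses. The function names are hypothetical. The sketch returns 304 when the resource has not been modified after the If-Modified-Since time, and because HTTP dates only have one-second resolution, it drops sub-second precision before comparing.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def http_date(dt: datetime) -> str:
    """Format a datetime as an HTTP date string in GMT."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def check_if_modified_since(last_modified: datetime, if_modified_since: str) -> int:
    since = parsedate_to_datetime(if_modified_since)
    # HTTP dates have one-second resolution, so drop microseconds before comparing
    if last_modified.replace(microsecond=0) <= since:
        return 304  # not modified since the client's copy
    return 200      # modified: send the new resource


mtime = datetime(2024, 10, 1, 10, 0, 0, 500000, tzinfo=timezone.utc)
header = http_date(mtime)  # sent as Last-Modified, echoed back as If-Modified-Since
print(header)  # Tue, 01 Oct 2024 10:00:00 GMT
print(check_if_modified_since(mtime, header))  # 304
print(check_if_modified_since(datetime(2024, 10, 1, 10, 5, tzinfo=timezone.utc), header))  # 200
```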

3. Advantages of ETag over Last-Modified

You might think that Last-Modified alone is enough to let the browser know whether its locally cached copy is up to date, so why is ETag needed?

ETag was introduced in HTTP/1.1 to solve several problems with Last-Modified:

  • Some files are modified periodically without their content changing (only the modification time changes). In that case, we do not want the client to treat the file as modified and GET it again;

  • Some files are modified more often than once per second (say, N times within 1 s). If-Modified-Since can only detect changes at one-second granularity, so such changes cannot be distinguished (in addition, some UNIX systems record mtime only to the nearest second).

  • Some servers do not know exactly when a file was last modified.

In these cases, ETag gives more precise control over the cache, because it is an identifier that the server generates for each version of the resource: every time the resource changes, a new ETag value is generated. Last-Modified can be used together with ETag, but the server validates the ETag first.
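The first two problems can be reproduced by comparing the two validators directly. Both helper functions are hypothetical, and the content-hash ETag is only one possible way for a server to generate tags; the timestamps are made-up example values.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime


def last_modified_tag(mtime: datetime) -> str:
    # What Last-Modified can express: a GMT date with whole-second resolution
    return format_datetime(mtime.astimezone(timezone.utc), usegmt=True)


def content_etag(body: bytes) -> str:
    # Hypothetical ETag scheme derived from the content itself
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


t0 = datetime(2024, 10, 1, 10, 0, 0, 100000, tzinfo=timezone.utc)

# Problem 1: the file is touched (new mtime) but its content is unchanged
t1 = datetime(2024, 10, 1, 12, 0, 0, tzinfo=timezone.utc)
touched_same_lm = last_modified_tag(t0) == last_modified_tag(t1)
touched_same_etag = content_etag(b"v1") == content_etag(b"v1")

# Problem 2: the content changes within the same second
t2 = t0.replace(microsecond=900000)
fast_edit_same_lm = last_modified_tag(t0) == last_modified_tag(t2)
fast_edit_same_etag = content_etag(b"v1") == content_etag(b"v2")

print(touched_same_lm, touched_same_etag)      # False True: Last-Modified forces a needless re-download
print(fast_edit_same_lm, fast_edit_same_etag)  # True False: Last-Modified misses the change
```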

4. The impact of user behavior

Figure: how user behavior affects the browser cache