HTTP cache

Preface

An HTTP cache is a copy of an HTTP resource (such as an HTML page, image, JS file, or data) that sits between the HTTP server and the client (browser). The cache saves a copy of the response to an incoming request. When a later request arrives for the same URL, the caching mechanism decides whether to answer directly from the copy or to send the request on to the origin server.

The benefits of HTTP caching are clear:

Reduced network bandwidth consumption. When a cached copy is used, little or no network traffic is generated, which can effectively reduce operating costs.
Reduced server load. Once a validity period is set for a resource, users can reuse the local cache, which cuts the number of requests reaching the origin server.
Reduced network latency. For end users, caching can significantly speed up page loading and give a better experience.

Browser Cache Classification

The browser cache is divided into the strong cache and the negotiated cache. A simplified version of how the browser loads a page is as follows:

HTTP caching takes effect from the second request onward. The first time a resource is requested, the server returns the resource together with its cache parameters in the response headers.
On the second request, the browser checks these cache parameters. If the strong cache is hit, the status shown is 200, no request is sent to the server, and the browser loads the resource from the cache.
Otherwise, the browser adds the validation parameters to the request headers and sends them to the server to check whether the negotiated cache is hit. If it is, the server returns 304; otherwise, the server returns the new resource.
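
The decision the browser makes on a repeat request can be sketched roughly as follows. This is only an illustration of the flow described here, not how any real browser is implemented; the function name planRequest and the shape of the cache entry (storedAt, maxAgeSeconds, etag, lastModified) are assumptions made for the example.

```javascript
// Hypothetical cache entry: { storedAt, maxAgeSeconds, etag, lastModified }.
// storedAt and nowMs are timestamps in milliseconds.
function planRequest(entry, nowMs) {
  // First request: nothing cached, fetch the resource normally.
  if (!entry) {
    return { action: 'fetch', headers: {} };
  }

  // Strong cache: the copy is still within its validity period.
  const ageSeconds = (nowMs - entry.storedAt) / 1000;
  if (entry.maxAgeSeconds !== undefined && ageSeconds < entry.maxAgeSeconds) {
    return { action: 'use-cache', headers: {} };
  }

  // Negotiated cache: send the stored validators back to the server.
  const headers = {};
  if (entry.etag) headers['If-None-Match'] = entry.etag;
  if (entry.lastModified) headers['If-Modified-Since'] = entry.lastModified;

  // Without any validator there is nothing to negotiate with.
  if (Object.keys(headers).length === 0) {
    return { action: 'fetch', headers: {} };
  }
  return { action: 'revalidate', headers };
}
```

With a 600-second validity period, a request made 599 seconds after storage yields 'use-cache', while one made at 600 seconds yields 'revalidate' with the conditional headers attached.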

Strong cache

When the strong cache is hit, the browser does not send a request to the server. In Chrome’s developer tools, the HTTP status code is shown as 200, and the Size column shows a value such as (from cache), or in newer versions (from memory cache) or (from disk cache).

The strong cache is controlled by the Expires or Cache-Control fields in the HTTP response headers, which indicate how long the resource may be cached.

Expires

Expires is a response header field from the web server that specifies when the resource expires, as an absolute point in time on the server. It tells the browser that, until that time, the resource can be served directly from the browser cache without being requested again. In effect, the Expires time corresponds to the request time plus max-age. When both Expires and Cache-Control: max-age are present, Cache-Control takes priority.

The field carries a time, such as Expires: Mon, 22 Feb 2021 15:02:11 GMT, meaning the resource is valid until 15:02:11 GMT on February 22, 2021. One obvious disadvantage is that the expiration time is absolute: if the client's local clock is changed, or the server and client clocks drift apart, the cache can expire too early or too late. This led to the introduction of Cache-Control.
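
The clock problem can be seen in a small sketch. The helper name isFreshByExpires is made up for this example, and the check is a simplification of what a real cache does; it only compares the Expires time against whatever the client believes the current time to be.

```javascript
// Returns true if, according to the client's own clock, the Expires
// time has not been reached yet. An unparseable date counts as expired.
function isFreshByExpires(expiresHeader, clientNowMs) {
  const expiresMs = Date.parse(expiresHeader);
  if (Number.isNaN(expiresMs)) return false;
  return clientNowMs < expiresMs;
}

const expires = 'Mon, 22 Feb 2021 15:02:11 GMT';

// Actual time: 15:30 GMT, so the resource has really expired.
const actualNow = Date.parse('Mon, 22 Feb 2021 15:30:00 GMT');
const correctClient = isFreshByExpires(expires, actualNow);

// A client whose clock runs one hour slow believes it is 14:30
// and keeps using the expired copy.
const ONE_HOUR_MS = 60 * 60 * 1000;
const slowClient = isFreshByExpires(expires, actualNow - ONE_HOUR_MS);
```

Here correctClient is false and slowClient is true, even though both clients hold the same response.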

Cache-Control

Cache-Control uses a relative time, such as Cache-Control: max-age=3600, which indicates that the resource is valid for 3600 seconds. Because the time is relative and is measured entirely on the client, clock differences between server and client do not cause problems. Both Cache-Control and Expires can be set on the server at the same time; Cache-Control takes precedence when both are present.

Cache-Control can be composed of multiple directives. Common values are:

max-age: specifies how long the cache is valid, in seconds. For example, Cache-Control: max-age=600 gives a validity period of 600 seconds. If caching is not disabled and the validity period has not expired, the next request for the resource hits the cache: the resource is taken directly from the browser cache instead of being requested from the server.
no-cache: the response can be stored locally or on a proxy server, but the cached copy must be revalidated with the server before each use.
no-store: disables caching completely. Neither the browser nor proxy servers store the response, and it is fetched from the server every time.
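
As a rough illustration of how these directives combine, the sketch below parses a Cache-Control value and summarizes what a cache may do with the response. Both function names are assumptions for this example, and the parser is deliberately naive: it does not handle quoted values that contain commas or equals signs.

```javascript
// Split a Cache-Control value into a { directive: value } map.
// Directives without a value (such as no-cache) map to true.
function parseCacheControl(value) {
  const directives = {};
  for (const part of value.split(',')) {
    const [rawName, rawValue] = part.trim().split('=');
    if (!rawName) continue;
    directives[rawName.toLowerCase()] =
      rawValue === undefined ? true : rawValue.replace(/^"|"$/g, '');
  }
  return directives;
}

// Summarize the three directives described in this section.
function cachePolicy(value) {
  const d = parseCacheControl(value);
  if (d['no-store']) {
    // Nothing may be stored at all.
    return { store: false, reuseWithoutValidation: false, maxAgeSeconds: 0 };
  }
  const parsed = Number.parseInt(d['max-age'], 10);
  const maxAgeSeconds = Number.isNaN(parsed) ? 0 : parsed;
  return {
    store: true,
    // no-cache allows storing but forces revalidation before each use.
    reuseWithoutValidation: !d['no-cache'] && maxAgeSeconds > 0,
    maxAgeSeconds,
  };
}
```

For example, cachePolicy('max-age=600') allows reuse for 600 seconds, cachePolicy('no-cache') allows storage but never reuse without validation, and cachePolicy('no-store') forbids storage.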

Negotiated cache

If the strong cache is not hit, the browser sends the request to the server. The server determines whether the negotiated cache is hit based on Last-Modified/If-Modified-Since or ETag/If-None-Match in the HTTP headers. If it is, the HTTP status code is 304 and the browser loads the resource from the cache.

Last-Modified/If-Modified-Since

When the browser requests a resource for the first time, the server returns a Last-Modified header. Last-Modified is a time that indicates when the resource was last modified, for example Mon, 22 Feb 2021 13:33:55 GMT.

When the browser requests the resource again, the request headers contain If-Modified-Since, whose value is the Last-Modified value received earlier and stored with the cached copy. After receiving If-Modified-Since, the server determines whether the cache is hit based on the resource's last modification time.

If the cache is hit, the server returns HTTP 304 without the resource body. Because the comparison uses only server times, a time difference between client and server does not cause problems. However, the last modification time is not always a reliable indicator of whether the resource has changed, because it is only accurate to the second, not the millisecond: a resource can change while its recorded modification time stays the same. This is why ETag/If-None-Match was introduced.
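
The one-second limitation can be demonstrated with a short sketch of the server-side check. The helper names lastModifiedHeader and isNotModified are hypothetical, and the modification times are made-up values chosen to fall within the same second.

```javascript
// Format a modification time (in milliseconds) as an HTTP date.
// HTTP dates have no sub-second part, so milliseconds are dropped.
function lastModifiedHeader(mtimeMs) {
  return new Date(mtimeMs).toUTCString();
}

// Server-side check: true means "answer 304 Not Modified".
function isNotModified(ifModifiedSince, mtimeMs) {
  const sinceMs = Date.parse(ifModifiedSince);
  if (Number.isNaN(sinceMs)) return false;
  // Compare at one-second resolution, like the header itself.
  const mtimeWholeSeconds = Math.floor(mtimeMs / 1000) * 1000;
  return mtimeWholeSeconds <= sinceMs;
}

const base = Date.parse('Mon, 22 Feb 2021 13:33:55 GMT');

// The client cached the version written 100 ms into the second.
const cachedHeader = lastModifiedHeader(base + 100);

// The file is modified again 900 ms into the same second:
// the change is invisible to the check.
const missedChange = isNotModified(cachedHeader, base + 900);

// A modification one full second later is detected.
const detectedChange = !isNotModified(cachedHeader, base + 1000);
```

Both missedChange and detectedChange are true here: the second write within the same second goes unnoticed, while the later one is caught.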

ETag/If-None-Match

Unlike Last-Modified/If-Modified-Since, which compare times, ETag/If-None-Match use a validation token called an ETag (entity tag). The ETag identifies a specific version of a resource, and a change to the resource results in a change to its ETag. If the ETag value has changed, the resource has been modified. The server determines whether the cache is hit based on the If-None-Match value sent by the browser.

ETag addresses two problems with Last-Modified:

Last-Modified is only accurate to the second. If a file is modified more than once within one second, Last-Modified cannot accurately reflect the change.
Some files are rewritten periodically without their contents changing (only the modification time changes). In that case we do not want the client to think the file has changed and GET it again.

An ETag is an identifier for a version of the resource that is generated automatically by the server or defined by the developer, and it allows more precise cache control than Last-Modified. Last-Modified and ETag can be used together. The server validates the ETag first; if the ETag matches, a server may go on to compare Last-Modified before deciding whether to return 304. Note that the HTTP specification gives If-None-Match precedence, so a server that follows it strictly ignores If-Modified-Since when both headers are present.
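
One way to get an ETag that changes only when the content changes is to derive it from a hash of the body. The sketch below assumes that approach; it is one possible scheme, not what any particular server does, and the function names are made up for the example. The If-None-Match comparison ignores the W/ weak prefix, and the comma split is naive (it assumes tags do not contain commas).

```javascript
const nodeCrypto = require('crypto');

// Build a quoted ETag from a SHA-1 hash of the response body.
// Identical content always gives the same tag, whatever the file's mtime.
function contentEtag(body) {
  const hash = nodeCrypto.createHash('sha1').update(body).digest('hex');
  return `"${hash}"`;
}

// Server-side check: true means "answer 304 Not Modified".
// If-None-Match may hold "*" or a comma-separated list of tags.
function matchesIfNoneMatch(ifNoneMatch, currentEtag) {
  if (ifNoneMatch.trim() === '*') return true;
  const strip = (tag) => tag.trim().replace(/^W\//, '');
  return ifNoneMatch
    .split(',')
    .some((tag) => strip(tag) === strip(currentEtag));
}

const tag = contentEtag('body { color: red; }');
const sameContent = contentEtag('body { color: red; }') === tag;
const changedContent = contentEtag('body { color: blue; }') !== tag;
```

A file that is rewritten with the same bytes keeps the same tag, so the client still gets 304, which covers the periodic-rewrite case described above.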

How to set up strong cache and negotiated cache

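Node.js settings:

// Strong cache: valid for 3600 seconds, cacheable by browsers and proxies
res.setHeader('Cache-Control', 'public, max-age=3600')
// Negotiated cache: validators the browser sends back on later requests
res.setHeader('ETag', '"5c20abbd-e2e8"')
res.setHeader('Last-Modified', 'Mon, 24 Dec 2018 09:49:49 GMT')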
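
To show where such headers fit in a complete request handler, here is a minimal sketch using Node's built-in http module conventions. The body text, the handler name, and port 3000 are assumptions for illustration; the ETag and modification date reuse the values from this section. The handler gives If-None-Match precedence over If-Modified-Since.

```javascript
// Illustrative values only; a real server would derive them from the file.
const BODY = 'hello cache';
const ETAG = '"5c20abbd-e2e8"';
const LAST_MODIFIED = 'Mon, 24 Dec 2018 09:49:49 GMT';

function handler(req, res) {
  // Sent on every response so the browser can cache and revalidate.
  res.setHeader('Cache-Control', 'public, max-age=3600');
  res.setHeader('ETag', ETAG);
  res.setHeader('Last-Modified', LAST_MODIFIED);

  // Node lowercases incoming header names.
  const ifNoneMatch = req.headers['if-none-match'];
  const ifModifiedSince = req.headers['if-modified-since'];

  // Prefer the ETag check; fall back to the date only if no ETag was sent.
  const notModified = ifNoneMatch !== undefined
    ? ifNoneMatch === ETAG
    : ifModifiedSince !== undefined &&
      Date.parse(ifModifiedSince) >= Date.parse(LAST_MODIFIED);

  if (notModified) {
    res.statusCode = 304; // negotiated cache hit: no body
    res.end();
    return;
  }

  res.statusCode = 200;
  res.setHeader('Content-Type', 'text/plain');
  res.end(BODY);
}

// To try it locally (port 3000 is an arbitrary choice):
// require('http').createServer(handler).listen(3000);
```

Requesting the page twice in a browser should show a 200 first; within the hour the copy is served from the strong cache, and after that the browser revalidates and receives 304.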

Nginx settings:

Although you can configure the negotiated cache yourself, most web servers enable it by default, sending Last-Modified and ETag and honoring If-Modified-Since and If-None-Match. The strong cache, however, is something we can configure ourselves to control how long files are cached.

Conclusion

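First request: the browser has no cached copy, so the server returns the resource with status 200 along with its cache headers (Expires or Cache-Control, plus Last-Modified and ETag), and the browser stores them.

Second request: the browser first checks the strong cache and, if the copy is still valid, uses it without contacting the server. Otherwise it sends If-Modified-Since and If-None-Match to the server, which returns 304 if the copy is still valid or 200 with the new resource if it is not.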