Writing is not easy: reprinting this article in any form without the author's permission is prohibited. If you find it useful, please follow, like, and share!

A brief analysis of HTTP caching

Why caching is needed

  • With a caching mechanism, previously fetched resources can be reused in suitable scenarios.
  • It significantly improves site performance and responsiveness.
  • It reduces network traffic and the time spent waiting for the page to render.
  • It reduces server load.

HTTP cache types

  • Strong cache
  • Negotiated cache

Strong cache

With strong caching, the server sets a mandatory cache lifetime in the response headers of a static resource. Within that lifetime, the browser uses its cached copy whenever it needs the same resource again and does not contact the server. Once the cached copy has expired, the browser falls back to the negotiated cache policy.
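The strong-cache decision can be sketched in JavaScript (Node.js). This is a minimal sketch that assumes the freshness lifetime has already been derived from the response headers. `checkStrongCache` and its parameters are hypothetical names. Real browsers also take other details into account, such as the `Age` header.

```javascript
// Hypothetical helper: decide whether a stored response can be reused
// directly (strong cache hit) or must be revalidated (negotiated cache).
function checkStrongCache(storedAtMs, freshnessLifetimeSec, nowMs = Date.now()) {
  // How long the copy has been sitting in the cache, in seconds
  const ageSec = (nowMs - storedAtMs) / 1000;
  // Still fresh: serve from the cache without contacting the server
  if (ageSec < freshnessLifetimeSec) {
    return "use-cache";
  }
  // Expired: fall back to the negotiated cache (a conditional request)
  return "revalidate";
}

const storedAt = Date.parse("2021-08-06T14:35:18Z");
console.log(checkStrongCache(storedAt, 60, storedAt + 30 * 1000)); // use-cache
console.log(checkStrongCache(storedAt, 60, storedAt + 90 * 1000)); // revalidate
```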

The HTTP header fields associated with strong caching are:

| Field | Purpose | Example | Priority | HTTP version |
| --- | --- | --- | --- | --- |
| Expires | Expiration time of the strong cache | Expires: Fri, 06 Aug 2021 14:36:18 GMT | Low | 1.0 |
| Cache-Control | Directives that control the caching mechanism | Cache-Control: max-age=60 | High | 1.1 |

Expires

  • The Expires response header holds the absolute date and time at which the strongly cached resource expires.
  • An invalid value such as 0 means the resource is already expired. In that case it is not served from the strong cache.

Cache-Control

A general header field that controls caching through directives. Only the two directives that are easily confused are explained here. For the others, consult a complete reference of Cache-Control directives.

  • no-cache

The cache must send the request to the origin server for validation (negotiated cache validation) before it may release a cached copy. Despite the name, the response can still be stored.

  • no-store

The cache must not store anything about the client request or the server response. In other words, caching is not used at all.
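The difference between the two directives can be made concrete with a minimal JavaScript sketch. It parses a Cache-Control value and classifies the response. `parseCacheControl` and `storagePolicy` are hypothetical helpers. The naive comma split assumes that no directive value contains a quoted comma.

```javascript
// Hypothetical helper: split a Cache-Control value into a map of lowercase
// directive names to their values (true for directives without a value).
function parseCacheControl(value) {
  const directives = {};
  for (const part of value.split(",")) {
    const trimmed = part.trim();
    if (!trimmed) continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) {
      directives[trimmed.toLowerCase()] = true;
    } else {
      const name = trimmed.slice(0, eq).trim().toLowerCase();
      // Strip optional surrounding quotes, e.g. max-age="60"
      directives[name] = trimmed.slice(eq + 1).trim().replace(/^"(.*)"$/, "$1");
    }
  }
  return directives;
}

// Classify a response by the two directives discussed above
function storagePolicy(cacheControl) {
  const d = parseCacheControl(cacheControl);
  if (d["no-store"]) return "do-not-store"; // nothing is cached at all
  if (d["no-cache"]) return "store-but-revalidate"; // cached, validated before every use
  return "store";
}

console.log(storagePolicy("no-store")); // do-not-store
console.log(storagePolicy("no-cache")); // store-but-revalidate
console.log(storagePolicy("public, max-age=60")); // store
```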

The difference between Expires and Cache-Control

  • Time semantics
    • Expires specifies an absolute point in time at which the cached copy expires.
    • Cache-Control: max-age specifies a relative time, counted from when the response was generated. For example, max-age=60 means the cached copy expires 60 seconds later.
  • Priority
    • Expires has lower priority than Cache-Control.
    • If both Cache-Control (with max-age) and Expires are present, the Cache-Control directive prevails.
  • HTTP version
    • Expires was introduced in HTTP/1.0 and is supported even by very old clients.
    • Cache-Control was introduced in HTTP/1.1. Some legacy HTTP/1.0 clients and caches do not understand it, so Expires and Cache-Control are often sent together. Clients that don't support Cache-Control fall back to Expires.
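The precedence rule can be sketched as a small function that computes how long a response stays fresh. This is a simplified sketch with several assumptions:

- Header names are lowercase keys in a plain object.
- Expires is measured against the response's Date header.
- Heuristic freshness, which applies when neither header is present, is left out.

`freshnessLifetime` is a hypothetical name.

```javascript
// Hypothetical helper: freshness lifetime in seconds for a response whose
// headers are given as a plain object with lowercase names
function freshnessLifetime(headers) {
  const cc = headers["cache-control"] || "";
  // Cache-Control: max-age is relative and takes precedence over Expires
  const maxAge = /(?:^|,)\s*max-age\s*=\s*"?(\d+)"?/i.exec(cc);
  if (maxAge) {
    return parseInt(maxAge[1], 10);
  }
  if (headers["expires"] !== undefined) {
    const expiresMs = Date.parse(headers["expires"]);
    // Expires is absolute, so measure it against the server's Date header
    const dateMs = headers["date"] !== undefined ? Date.parse(headers["date"]) : Date.now();
    // Values that fail to parse, or that land in the past, yield 0 (already expired)
    if (Number.isNaN(expiresMs) || Number.isNaN(dateMs)) return 0;
    return Math.max(0, Math.floor((expiresMs - dateMs) / 1000));
  }
  // Neither header: no explicit lifetime (heuristics are out of scope here)
  return 0;
}

const responseDate = "Fri, 06 Aug 2021 14:35:18 GMT";
const expiresAt = "Fri, 06 Aug 2021 14:36:18 GMT";
console.log(freshnessLifetime({ date: responseDate, expires: expiresAt })); // 60
console.log(freshnessLifetime({ date: responseDate, expires: expiresAt, "cache-control": "max-age=10" })); // 10
console.log(freshnessLifetime({ date: responseDate, expires: "0" })); // 0
```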

Negotiated cache

  • With the negotiated cache, the client asks the server whether its cached copy can still be used. The server responds in one of two ways: it sends the resource again (200), or it answers with HTTP status code 304 Not Modified, which tells the client to use its cached copy.
  • The following fields force the negotiated cache instead of the strong cache:

| Field | Value that forces negotiation | HTTP version |
| --- | --- | --- |
| Pragma | Pragma: no-cache | 1.0 |
| Cache-Control | Cache-Control: no-cache / Cache-Control: max-age=0 | 1.1 |

Pragma

  • Pragma is a general header defined in HTTP/1.0. When Cache-Control is absent, Pragma: no-cache behaves like Cache-Control: no-cache: the cache must submit the request to the origin server for validation before returning its cached copy.
  • Pragma has only one standard value, no-cache. It exists for backward compatibility with HTTP/1.0 caches. According to RFC 7234, a cache ignores Pragma when the request also carries a Cache-Control header that the cache understands.

Cache-Control

  • Cache-Control was introduced above. Its directives drive both the strong cache and the negotiated cache policies.
  • Cache-Control: no-cache and Cache-Control: max-age=0 force the client (or an intermediate cache) to send the request to the server for validation (negotiated resource validation).
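A cache's decision about whether a request forces negotiation can be sketched as follows. The sketch follows RFC 7234, under which Pragma only applies when Cache-Control is absent. `mustNegotiate` is a hypothetical helper. It assumes lowercase header names and only recognizes the exact directive spellings shown.

```javascript
// Hypothetical helper: does this request force the negotiated cache?
// `headers` uses lowercase header names.
function mustNegotiate(headers) {
  const cc = headers["cache-control"];
  if (cc !== undefined) {
    // Cache-Control is present, so Pragma is ignored
    const directives = cc.split(",").map((d) => d.trim().toLowerCase());
    return directives.includes("no-cache") || directives.includes("max-age=0");
  }
  // HTTP/1.0 fallback: Pragma: no-cache acts like Cache-Control: no-cache
  return (headers["pragma"] || "").toLowerCase().includes("no-cache");
}

console.log(mustNegotiate({ "cache-control": "max-age=0" })); // true
console.log(mustNegotiate({ pragma: "no-cache" })); // true
console.log(mustNegotiate({ pragma: "no-cache", "cache-control": "max-age=60" })); // false
```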

Negotiation strategy

The negotiated cache applies once the strong cache has expired, or when Pragma: no-cache or Cache-Control: no-cache requires it. There are two common pairs of negotiation headers:

  • ETag / If-None-Match
  • Last-Modified / If-Modified-Since

| Field | Paired field | Value | Description | Priority |
| --- | --- | --- | --- | --- |
| Last-Modified | If-Modified-Since | GMT time | Response header: the time the server resource was last modified | Low |
| If-Modified-Since | Last-Modified | GMT time | Request header: the Last-Modified value from the previous response. The server checks whether the resource has been modified since then and returns 304 or 200 | Low |
| ETag | If-None-Match | Content hash / file information | Response header: a tag generated from the file information or from a hash of the content of the cached resource | High |
| If-None-Match | ETag | Content hash / file information | Request header: the ETag from the previous response. The server checks whether the resource has changed and returns 304 or 200 | High |
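The priority rule in the table can be illustrated with a minimal server-side sketch of a conditional GET. The server checks If-None-Match first and only consults If-Modified-Since when If-None-Match is absent. If-None-Match uses weak comparison (RFC 7232), so the sketch strips any W/ prefix. `conditionalGet` and the shape of `resource` are assumptions for illustration. Splitting If-None-Match on commas is also a simplification.

```javascript
// Hypothetical helper: status code for a conditional GET.
// `req` holds lowercase request headers; `resource` holds the current
// validators of the file on the server.
function conditionalGet(req, resource) {
  const inm = req["if-none-match"];
  if (inm !== undefined) {
    // If-None-Match takes priority: If-Modified-Since is then ignored.
    // It uses weak comparison, so a W/ prefix is stripped before comparing.
    const opaque = (tag) => tag.trim().replace(/^W\//, "");
    const matched = inm.trim() === "*" || inm.split(",").map(opaque).includes(opaque(resource.etag));
    return matched ? 304 : 200;
  }
  const ims = req["if-modified-since"];
  if (ims !== undefined) {
    const sinceMs = Date.parse(ims);
    // Last-Modified has one-second resolution, so compare whole seconds
    const modifiedMs = Math.floor(resource.lastModifiedMs / 1000) * 1000;
    if (!Number.isNaN(sinceMs) && modifiedMs <= sinceMs) return 304;
  }
  return 200;
}

// Example validators for a resource on the server
const resource = { etag: '"5fc4907d-f48"', lastModifiedMs: Date.parse("2020-11-30T06:26:05Z") };
console.log(conditionalGet({ "if-none-match": '"5fc4907d-f48"' }, resource)); // 304
console.log(conditionalGet({ "if-none-match": '"other"', "if-modified-since": "Mon, 30 Nov 2020 06:26:05 GMT" }, resource)); // 200
console.log(conditionalGet({ "if-modified-since": "Mon, 30 Nov 2020 06:26:05 GMT" }, resource)); // 304
```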

Advantages and disadvantages

  • Suppose someone on the server edits a file and then reverts the change:
    • The file's modification time has changed.
    • The file's content, however, is exactly the same as before.
    • With Last-Modified, the cache is invalidated anyway, causing an unnecessary transfer.
  • An ETag generated from a hash of the content avoids this. As long as the content of the resource is unchanged, the client's cached copy keeps being used, which saves needless transfers.
  • A content-based ETag is therefore more accurate and more bandwidth-efficient than Last-Modified.
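The edit-and-revert scenario can be reproduced in a few lines of Node.js. `contentEtag` uses a SHA-1 content hash. That choice is an assumption made for illustration, not a method prescribed by the article. The file records are hypothetical in-memory stand-ins for real files.

```javascript
const { createHash } = require("crypto");

// Hypothetical content-based ETag: a SHA-1 hash of the content, so the same
// bytes always yield the same tag, whatever the modification time
function contentEtag(body) {
  return `"${createHash("sha1").update(body).digest("hex")}"`;
}

// Simulate "edit, then revert": the content ends up as it was,
// but the modification time (Last-Modified) has moved forward
const original = { body: "console.log('v1');", mtimeSec: 1606717565 };
const edited = { body: "console.log('v2');", mtimeSec: 1606717600 };
const reverted = { body: "console.log('v1');", mtimeSec: 1606717660 };

// Last-Modified differs, so If-Modified-Since would get a full 200 response
console.log(original.mtimeSec === reverted.mtimeSec); // false
// The content hash is unchanged, so If-None-Match would still get a 304
console.log(contentEtag(original.body) === contentEtag(reverted.body)); // true
// A real content change does produce a new ETag
console.log(contentEtag(original.body) === contentEtag(edited.body)); // false
```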

ETag

What is ETag?

An ETag (entity tag) identifies a specific version of a resource on the server. It is sent to the client in an HTTP response header, for example: ETag: W/"50b1c1d4f775c61:df3"

ETag format

  • ETag: W/"xxxxxxxx"
  • ETag: "xxxxxxx"

Strong validation

  • Every byte of the resource is identical.

Weak validation (the W/ prefix)

  • The bytes need not all be identical. For example, two versions of a page may differ only in a timestamp in the footer or in the advertisement shown. Designing an ETag scheme with weak validation can be complicated, because it requires ranking the importance of the different elements on the page. It can still help cache performance a great deal.
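The difference between strong and weak validators shows up when two ETags are compared. The sketch below implements the strong and weak comparison functions as defined in RFC 7232 (section 2.3.2). It runs them on that RFC's example pairs. The helper names are hypothetical.

```javascript
// Split an entity tag such as W/"abc" into its weak flag and opaque part
function parseEtag(tag) {
  const weak = tag.startsWith("W/");
  return { weak, opaque: weak ? tag.slice(2) : tag };
}

// Strong comparison: both tags must be strong and their opaque parts equal
function strongMatch(a, b) {
  const x = parseEtag(a);
  const y = parseEtag(b);
  return !x.weak && !y.weak && x.opaque === y.opaque;
}

// Weak comparison: only the opaque parts need to be equal
function weakMatch(a, b) {
  return parseEtag(a).opaque === parseEtag(b).opaque;
}

// The comparison examples from RFC 7232, section 2.3.2
const pairs = [['W/"1"', 'W/"1"'], ['W/"1"', 'W/"2"'], ['W/"1"', '"1"'], ['"1"', '"1"']];
for (const [a, b] of pairs) {
  console.log(a, b, "strong:", strongMatch(a, b), "weak:", weakMatch(a, b));
}
```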

What are the requirements for generating an ETag?

  1. When the file changes, the ETag value must change.
  2. It should be as cheap to compute as possible, without consuming much CPU.
    1. Generating it with a digest algorithm (MD5, SHA-1, SHA-256) should be considered carefully, since these are CPU-intensive operations.
    2. That does not mean they cannot be used: there is no best algorithm, only the one that fits the scenario.
  3. In a distributed deployment, the ETag values generated on different server nodes must be identical.
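Requirement 3 is where ETags built from file metadata can go wrong. If the same build is copied to two nodes at slightly different times, their mtimes differ. The sketch below compares a metadata-based tag with a content hash across two hypothetical nodes. The node records, the `mtime-size` tag format, and the choice of MD5 are all assumptions for illustration.

```javascript
const { createHash } = require("crypto");

// Hypothetical metadata-based tag: mtime (seconds) and size
function metadataEtag(file) {
  return `"${file.mtimeSec}-${file.body.length}"`;
}

// Content-based tag: depends only on the bytes of the file
function hashEtag(file) {
  return `"${createHash("md5").update(file.body).digest("hex")}"`;
}

// Hypothetical: the same build copied to two nodes two seconds apart
const build = Buffer.from("export default 42;\n");
const nodeA = { mtimeSec: 1606717565, body: build };
const nodeB = { mtimeSec: 1606717567, body: build };

// Metadata-based tags disagree, so a client routed to different nodes
// behind a load balancer would get needless 200 responses
console.log(metadataEtag(nodeA) === metadataEtag(nodeB)); // false
// Content-based tags agree on every node
console.log(hashEtag(nodeA) === hashEtag(nodeB)); // true
```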

How ETag is generated (Nginx)

In the Nginx source, the ETag is built by concatenating last_modified_time and content_length_n:

```c
/* Write "<mtime in hex>-<length in hex>" (with quotes) into the ETag
   buffer; ngx_sprintf returns a pointer just past the last byte written,
   so subtracting the buffer start gives the string length. */
etag->value.len = ngx_sprintf(etag->value.data, "\"%xT-%xO\"",
                              r->headers_out.last_modified_time,
                              r->headers_out.content_length_n)
                  - etag->value.data;
```

  • Translated into pseudocode:

```
etag = '"' + hex(last_modified_time) + "-" + hex(content_length) + '"'
```

  • Summary: Nginx's ETag consists of the response's Last-Modified time (in seconds) and its Content-Length, each written in hexadecimal and joined by a hyphen.
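The same format can be reproduced in JavaScript (Node.js) as a minimal sketch. `nginxEtag` and `etagForFile` are hypothetical helpers. The file version assumes that the file's mtime, truncated to whole seconds, is what ends up as Last-Modified, as in the Nginx source.

```javascript
const fs = require("fs");

// Reproduce Nginx's "%xT-%xO" format: mtime in seconds as hex, a hyphen,
// then the content length as hex, all wrapped in double quotes
function nginxEtag(lastModifiedSec, contentLength) {
  return `"${lastModifiedSec.toString(16)}-${contentLength.toString(16)}"`;
}

// Hypothetical helper: derive the same tag from a file on disk
function etagForFile(filePath) {
  const stat = fs.statSync(filePath);
  return nginxEtag(Math.floor(stat.mtimeMs / 1000), stat.size);
}

console.log(nginxEtag(1606717565, 3912)); // "5fc4907d-f48"
```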

Inspecting a Lodash request

Decoding the two hexadecimal parts of the ETag returned for a Lodash file:

```javascript
// Decode the two hex parts of the ETag: last-modified time and content length
const LAST_MODIFIED = new Date(parseInt('5fc4907d', 16) * 1000).toJSON()
const CONTENT_LENGTH = parseInt('f48', 16)

console.log(LAST_MODIFIED) // 2020-11-30T06:26:05.000Z
console.log(CONTENT_LENGTH) // 3912
```

  • Since Nginx's ETag is made up of Last-Modified and Content-Length, it is essentially an enhanced Last-Modified. Where does the enhancement lie?
  • Last-Modified only has one-second resolution, so it cannot detect multiple changes within the same second. Nginx's ETag adds the file size as a further condition. It therefore depends on the content to some extent, not only on the modification time, which makes it more precise.

How Last-Modified is generated

On Linux:

  • mtime (modified time): the timestamp of the last change to the file's content.
  • ctime (change time): the timestamp of the last change to the file's attributes (metadata), which includes changes to mtime. On Windows, ctime instead means the creation time.
  • HTTP servers therefore usually use mtime for Last-Modified, because it means the content modification time on both Windows and Linux.
  • The corresponding Nginx source code:

```c
/* of holds information about the opened file; its size and mtime become
   the Content-Length and Last-Modified of the response */
r->headers_out.status = NGX_HTTP_OK;
r->headers_out.content_length_n = of.size;
r->headers_out.last_modified_time = of.mtime;
```
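A minimal Node.js sketch of the same idea reads the file's mtime and formats it as an HTTP date. `lastModifiedFor` is a hypothetical helper. It drops milliseconds because HTTP dates only have one-second resolution.

```javascript
const fs = require("fs");

// Hypothetical helper: a Last-Modified value from the file's mtime
// (the content modification time). HTTP dates have one-second resolution,
// so milliseconds are dropped before formatting.
function lastModifiedFor(filePath) {
  const { mtimeMs } = fs.statSync(filePath);
  return new Date(Math.floor(mtimeMs / 1000) * 1000).toUTCString();
}
```

The returned string, for example "Mon, 30 Nov 2020 06:26:05 GMT", is what the server sends as Last-Modified. The client echoes it back in If-Modified-Since.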

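If the ETag value in the HTTP response header changes, does that mean the contents of the file must have changed?

  • Not necessarily. With Nginx's algorithm, touching or re-saving a file changes its mtime, and hence the ETag, even though the content is the same.
  • Conversely, if a file is modified within the same second and its size stays the same, the ETag does not change even though the content did.
  • Such cases are extremely rare.
  • It is therefore reasonable to tolerate an imperfect but efficient algorithm.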
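Both edge cases can be demonstrated with an Nginx-style ETag built from metadata alone. This sketch uses hypothetical timestamps and file contents. `nginxEtag` mirrors the `"%xT-%xO"` format from the Nginx source above.

```javascript
// Nginx-style ETag from metadata only: hex mtime (seconds) + hex size
function nginxEtag(mtimeSec, size) {
  return `"${mtimeSec.toString(16)}-${size.toString(16)}"`;
}

// Case 1: `touch` moves mtime forward but leaves the content alone
const content = "const a = 1;";
const beforeTouch = nginxEtag(1606717565, Buffer.byteLength(content));
const afterTouch = nginxEtag(1606717570, Buffer.byteLength(content));
console.log(beforeTouch === afterTouch); // false: ETag changed, content did not

// Case 2: an edit within the same second that keeps the byte length
const v1 = "const a = 1;";
const v2 = "const a = 2;";
const tag1 = nginxEtag(1606717565, Buffer.byteLength(v1));
const tag2 = nginxEtag(1606717565, Buffer.byteLength(v2));
console.log(v1 === v2, tag1 === tag2); // false true: content changed, ETag did not
```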
