Preface

Caching plays an important role in Web development, especially in the development of high-traffic Web systems.

For more on performance optimization, see the articles on HTML, CSS, and JS optimization; page load speed optimization; and network transport layer optimization.

Recommended reading time: 5 minutes

In this article

  • Caching concepts
  • The role of cache
  • Caching mechanisms
  • Cache policy details

Caching concepts

  • Classification of caches: server caches (proxy server caches, CDN caches), third-party caches, browser caches, and so on.
  • Terms related to caching:
    • Cache hit ratio: the ratio of the number of requests served from the cache to the total number of requests. The higher, the better.
    • Expired content: content marked as “stale” once its validity period has passed. Expired content usually cannot be used to answer client requests; the cache must either request fresh content from the origin server or verify that the cached content is still valid.
    • Validation: checking whether expired content in the cache is still valid and, if so, refreshing its expiration time or policy.
    • Invalidation: removing content from the cache. Cached content must be invalidated when the underlying content changes.
  • Note: the browser cache is the cheapest kind, because it lives on the client and consumes almost no server resources (in the extreme case, it is equivalent to serving purely static pages).
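The cache hit ratio defined above is simple arithmetic. A minimal sketch follows; the function name and the counter values are hypothetical and chosen only for illustration.

```python
def cache_hit_ratio(hits: int, total_requests: int) -> float:
    """Return the fraction of requests that were answered from the cache."""
    if total_requests == 0:
        return 0.0  # no traffic yet; report 0 by convention
    return hits / total_requests


# Hypothetical counters: 940 of 1000 requests were served from the cache.
ratio = cache_hit_ratio(940, 1000)
print(f"hit ratio: {ratio:.1%}")
```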

The role of cache

  • Reduce network bandwidth consumption
  • Reduce server load
  • Reduce network latency and speed up page loads

Caching mechanisms

  • The strong cache takes precedence over the negotiated cache. If the strong cache is still valid, it is used directly; if it has expired, the negotiated cache is tried.
  • With the negotiated cache, the server decides whether the cached copy can be used. If the copy is still valid, the server returns 304 and the browser keeps using its cache. If it is not, the cached copy is discarded, the resource is fetched again, and the new response is stored in the browser cache.
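The two rules above can be sketched as a small decision function. This is a simplified model, not how any particular browser is implemented; `CachedEntry`, `choose_action`, and `after_revalidation` are hypothetical names, and freshness is reduced to a single `max_age` value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedEntry:
    body: bytes
    stored_at: float        # time the response was stored, in epoch seconds
    max_age: float          # freshness lifetime granted by the server, in seconds
    etag: Optional[str]     # validator to send back when revalidating, if any


def choose_action(entry: Optional[CachedEntry], now: float) -> str:
    """Decide what the browser does before touching the network."""
    if entry is None:
        return "fetch"                  # nothing cached: plain request
    if now - entry.stored_at < entry.max_age:
        return "use-strong-cache"       # still fresh: no request at all
    if entry.etag is not None:
        return "revalidate"             # stale: ask the server (negotiated cache)
    return "fetch"                      # stale and no validator: plain request


def after_revalidation(entry: CachedEntry, status: int,
                       new_body: bytes, now: float) -> CachedEntry:
    """Apply the server's answer to a revalidation request."""
    if status == 304:
        # Copy is still valid: keep the body and restart the freshness clock.
        return CachedEntry(entry.body, now, entry.max_age, entry.etag)
    # Copy is invalid: store the new response (its validator is omitted here).
    return CachedEntry(new_body, now, entry.max_age, None)
```

Note that the strong-cache check never contacts the server, which is why it is tried first.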

Figure: cache policy. Image source: IMWeb front end

HTTP headers involved in caching

Expires (strong caching mechanism)

  • Value: an absolute time in GMT format. The Expires date and time must be in Greenwich Mean Time (GMT), not local time. For example: Expires: Fri, 30 Oct 1998 14:19:41 GMT
  • What it does: tells the cache how long the copy stays fresh. After that time, the cache sends a request to the origin server to check whether the document has been modified.
  • Compatibility: the Expires header is supported by almost all cache servers.
  • Rule: the expiration time can be set relative to the last time the client fetched the copy (last access time) or the last time the document was modified on the server.
  • Applications:
    • Especially useful for caching static image files such as navigation bars and image buttons. Because these images are rarely modified, you can give them a very long expiration time, which makes your site feel much more responsive to users.
    • It is also useful for pages that change on a regular schedule. For example, if you update a news page at 6 a.m. every day, you can set the copy to expire at that same time, so that the cache knows when to fetch the updated version without the user having to press the browser’s “refresh” button.
    • The Expires header value must be a date and time in HTTP date format. Anything else is interpreted as a time in the past, so the copy is treated as already expired.
  • Limitations: although Expires is useful, it has some limitations.
    • First, because a date is involved, the clocks of the Web server and the cache must be synchronized. If they are not, cached content may expire too early, or expired content may not be refreshed.
    • Second, if you set the expiration time to a fixed point in time and do not move it forward once that time has passed, every subsequent request goes to the origin Web server, which increases load and response time.
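As an illustration of the Expires rules above, here is a minimal Python sketch that builds an Expires value in HTTP date format and checks freshness against it. The helper names `make_expires` and `is_fresh` are hypothetical; the date handling relies on the standard library's `email.utils`, and the caller is assumed to pass timezone-aware UTC datetimes.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def make_expires(now: datetime, lifetime: timedelta) -> str:
    """Build an Expires value; `now` must be timezone-aware UTC."""
    return format_datetime(now + lifetime, usegmt=True)


def is_fresh(expires_value: str, now: datetime) -> bool:
    """Return True while the copy has not yet reached its Expires time."""
    try:
        expires_at = parsedate_to_datetime(expires_value)
    except (TypeError, ValueError):
        # Not a valid HTTP date (for example "0"): treat as already expired.
        return False
    if expires_at.tzinfo is None:
        # No zone given: interpret the value as GMT.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return now < expires_at


served_at = datetime(1998, 10, 30, 13, 19, 41, tzinfo=timezone.utc)
header = make_expires(served_at, timedelta(hours=1))
print("Expires:", header)
```

Because the comparison uses the cache's own clock (`now`), a clock that is out of sync with the server shifts the effective expiration time, which is the first limitation listed above.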

Cache-Control (strong caching mechanism)

  • Value: max-age=[seconds], the maximum amount of time for which the cached copy is considered fresh.
    • A relative time, not an absolute time
    • In seconds: the number of seconds from the time of the request until the copy expires.
  • What it does: gives Web publishers more control over their content. It was introduced in HTTP 1.1 to address the limitations of Expires.
  • Related directives:
    • s-maxage=[seconds]: similar to max-age, except that it applies only to shared caches (for example, proxy servers)
    • public: marks an authenticated response as cacheable. Normally, content that can be accessed only after HTTP authentication is not cached automatically.
    • no-cache: forces caches to submit the request to the origin server for validation every time before a cached copy is released. This is useful for making sure authentication is respected (in combination with public), or for applications that need strictly up-to-date data, without giving up all the benefits of caching.
    • no-store: instructs caches not to keep a copy under any circumstances
    • must-revalidate: tells caches that they must obey the freshness information you give for a copy
    • proxy-revalidate: similar to must-revalidate, except that it applies only to proxy caches
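To make the directives above concrete, the following sketch parses a Cache-Control value and derives a freshness lifetime from it. It is deliberately simplified: `parse_cache_control` and `freshness_lifetime` are hypothetical helpers, quoted arguments containing commas are not handled, and only the directives listed above are considered.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        name, sep, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"') if sep else None
    return directives


def freshness_lifetime(header: str, shared: bool) -> Optional[int]:
    """Return the lifetime in seconds, or None if there is no usable one
    (either no-store was given or no max-age style directive is present)."""
    d = parse_cache_control(header)
    if "no-store" in d:
        return None              # must not keep a copy at all
    if "no-cache" in d:
        return 0                 # may keep a copy, but must validate every time
    # For shared caches (proxies), s-maxage is consulted before max-age.
    names = ("s-maxage", "max-age") if shared else ("max-age",)
    for name in names:
        value = d.get(name)
        if value is not None and value.isdigit():
            return int(value)
    return None


print(freshness_lifetime("public, max-age=3600, s-maxage=600", shared=True))
```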

Last-Modified/If-Modified-Since (negotiated caching mechanism)

  • The server usually knows when the data you requested was last modified, and HTTP provides a way for the server to send this last-modified date along with the data.
  • When you request the same data a second (or third, or fourth) time, you can tell the server the last-modified date you received: send an If-Modified-Since header in the request, containing the date you got from the server last time.
  • If the data has not changed since then, the server returns the special HTTP status code 304, which means “this data has not changed since the last time you asked for it.”
  • When the server sends status code 304, it does not resend the data, so you do not download the same data over and over when it has not changed.
  • Compatibility: all modern browsers support Last-Modified.
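The exchange described above can be sketched from the server's side. This is an illustrative model rather than a real server: `respond` is a hypothetical function that returns a `(status, headers, body)` tuple, and `last_modified` is assumed to be a timezone-aware UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def respond(last_modified: datetime, if_modified_since: Optional[str],
            body: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a GET, honoring If-Modified-Since when it is present."""
    # HTTP dates have one-second resolution, so drop the microseconds.
    modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(modified, usegmt=True)}
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None         # unparseable date: ignore the header
        if since is not None and since.tzinfo is not None and modified <= since:
            return 304, headers, b""     # unchanged: no body is resent
    return 200, headers, body


modified_at = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
status, headers, _ = respond(modified_at, None, b"<html>...</html>")
# Second request: echo the date back in If-Modified-Since.
status_again, _, body_again = respond(modified_at, headers["Last-Modified"], b"<html>...</html>")
```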

ETag/If-None-Match (negotiated caching mechanism)

  • Function: avoid downloading the data again if it has not changed
  • How it works
    1. ETag is a response header returned by the server when the resource was last loaded; it is a unique identifier for the resource. Whenever the resource changes, the ETag is regenerated.
    2. The next time the browser requests the resource, it sends the ETag value from the previous response in the If-None-Match request header. The server compares the If-None-Match value sent by the client with the ETag of the resource on the server.
    3. If the server finds that the ETag does not match, it responds with a normal GET 200 and sends the new resource (including, of course, the new ETag) to the client. If the ETag matches, it returns 304, telling the client to use its local cache.
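The three steps above can be sketched as follows. Deriving the ETag from a SHA-256 hash of the body is an assumption made for this example (servers are free to generate ETags in other ways), and `make_etag` and `handle_get` are hypothetical names.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the content, so it changes when the content does."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(body: bytes,
               if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET of the current resource."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # The header may carry several comma-separated tags.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates:
            return 304, headers, b""     # match: client keeps its local copy
    return 200, headers, body            # no match: send the new resource and ETag


# First load: the server returns the resource together with its ETag.
status, headers, _ = handle_get(b"version 1")
# Next load: the browser sends the stored ETag back in If-None-Match.
status_same, _, _ = handle_get(b"version 1", headers["ETag"])
status_changed, new_headers, _ = handle_get(b"version 2", headers["ETag"])
```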

Comparison of several caching strategies

Comparison of the two strong caching mechanisms: Expires vs. Cache-Control

  • There is little difference between the two. The main one is that Expires is a product of HTTP 1.0, while Cache-Control is a product of HTTP 1.1.
  • Priority: if both are present, Cache-Control takes precedence over Expires. Expires is more of a fallback for environments that do not support Cache-Control.
  • A shortcoming shared by both: a strong caching mechanism only checks whether the cache has passed its expiration time, not whether the resource on the server has been updated. Using only these two strategies can therefore leave the client with resources that are out of date.
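The precedence rule above can be illustrated with a short sketch that computes a freshness lifetime from response headers, preferring max-age and falling back to Expires. The function name `freshness_lifetime_seconds` and the plain-dict header representation are assumptions made for this example.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict


def freshness_lifetime_seconds(headers: Dict[str, str], response_time: datetime) -> float:
    """Seconds the response stays fresh, counted from `response_time` (aware UTC)."""
    # 1. Cache-Control: max-age wins whenever it is present.
    match = re.search(r"\bmax-age\s*=\s*(\d+)", headers.get("Cache-Control", ""),
                      re.IGNORECASE)
    if match:
        return float(match.group(1))
    # 2. Otherwise fall back to the absolute Expires time.
    expires = headers.get("Expires")
    if expires:
        try:
            expires_at = parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            return 0.0           # invalid date: treat as already expired
        if expires_at.tzinfo is None:
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        return max(0.0, (expires_at - response_time).total_seconds())
    return 0.0                   # no explicit freshness information


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
both = {"Cache-Control": "max-age=60", "Expires": "Mon, 01 Jan 2024 13:00:00 GMT"}
lifetime = freshness_lifetime_seconds(both, now)   # max-age is used, Expires is ignored
```

Neither branch asks the server anything, which is the shared shortcoming noted above: the copy is trusted until the clock runs out, whether or not the resource has changed.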

Comparison of the two negotiated caching mechanisms: Last-Modified/If-Modified-Since vs. ETag/If-None-Match

  • Accuracy: ETag is clearly better. Last-Modified/If-Modified-Since works in units of seconds, so changes made within the same second cannot be detected, whereas the ETag changes every time the resource changes. In addition, on load-balanced servers, the Last-Modified values generated by the individual servers may differ.
  • Performance: ETag is worse than Last-Modified/If-Modified-Since. The latter only records a time, while ETag requires the server to compute a hash.
  • Priority: when both are present, the server gives ETag precedence.
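The accuracy and priority points above can be seen in a small sketch of the server-side check. `is_not_modified` is a hypothetical helper; it assumes headers arrive as a plain dict and that `last_modified` is a timezone-aware UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict


def is_not_modified(request_headers: Dict[str, str], current_etag: str,
                    last_modified: datetime) -> bool:
    """Return True if the server may answer 304."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # ETag takes precedence: If-Modified-Since is not consulted at all.
        tags = [tag.strip() for tag in if_none_match.split(",")]
        return current_etag in tags
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False
        if since.tzinfo is None:
            return False
        # Only whole seconds survive the trip through an HTTP date.
        return last_modified.replace(microsecond=0) <= since
    return False


# The resource was edited again half a second after the client fetched it.
edited_at = datetime(2024, 1, 1, 12, 0, 0, 500000, tzinfo=timezone.utc)
client_date = "Mon, 01 Jan 2024 12:00:00 GMT"
# Date only: the same-second edit goes unnoticed and the stale copy is kept.
missed = is_not_modified({"If-Modified-Since": client_date}, '"v2"', edited_at)
# ETag present: the change is detected even though the dates look identical.
caught = is_not_modified({"If-None-Match": '"v1"', "If-Modified-Since": client_date},
                         '"v2"', edited_at)
```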

The impact of user behavior on caching policies

The normal caching flow does not apply to every operation; certain user actions bypass part or all of it.

  1. Entering a URL in the address bar or following a link is normal user behavior and triggers the browser’s normal caching mechanism.
  2. F5 refresh: the browser sets max-age=0, so the strong cache check is skipped and the negotiated cache check is performed.
  3. Ctrl+F5 refresh: both the strong cache and the negotiated cache are skipped, and the resource is pulled directly from the server.
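The three cases above amount to a small lookup. The sketch below only restates them as code; `UserAction` and `cache_checks` are hypothetical names, and real browsers differ in the details.

```python
from enum import Enum
from typing import Tuple


class UserAction(Enum):
    NAVIGATE = "address bar or link"
    REFRESH = "F5"
    HARD_REFRESH = "Ctrl+F5"


def cache_checks(action: UserAction) -> Tuple[bool, bool]:
    """Return (strong cache consulted, negotiated cache consulted)."""
    if action is UserAction.HARD_REFRESH:
        return (False, False)    # always pull from the server
    if action is UserAction.REFRESH:
        return (False, True)     # max-age=0: skip strong cache, still revalidate
    return (True, True)          # normal navigation: full caching flow


for action in UserAction:
    print(action.value, cache_checks(action))
```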