Classification of HTTP caches

HTTP caches fall into two main categories: mandatory caches (also known as strong caches) and negotiated caches. The two follow different rules. A mandatory cache does not contact the server at all as long as the cached data has not expired. A negotiated cache, as the name suggests, must check with the server whether the cached copy is still valid before using it.

The two types of cache rules can coexist, and the mandatory cache takes priority over the negotiated cache: if the mandatory cache rule applies, the cached copy is used directly and the negotiated cache rule is never evaluated.
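
The priority rule can be illustrated with a small model of the choice a browser makes for a cached resource. This is a simplified sketch, not browser source code: the `decideCacheAction` name and the `entry` fields (`storedAtMs`, `freshnessSeconds`, `etag`, `lastModified`) are hypothetical.

```javascript
// Hypothetical model of a cached response: when it was stored, how long it
// stays fresh (from Expires or Cache-Control), and any validators it carries.
function decideCacheAction(entry, nowMs) {
  if (!entry) {
    return 'fetch'; // nothing cached: normal request
  }
  const ageSeconds = (nowMs - entry.storedAtMs) / 1000;
  // Mandatory (strong) cache first: while fresh, no request is sent at all.
  if (ageSeconds < entry.freshnessSeconds) {
    return 'use-cache';
  }
  // Stale: fall back to the negotiated cache if the entry has a validator.
  if (entry.etag || entry.lastModified) {
    return 'revalidate'; // conditional request; a 304 means reuse the copy
  }
  return 'fetch'; // no validator: download the resource again
}

const entry = { storedAtMs: 0, freshnessSeconds: 120, etag: '"abc"' };
console.log(decideCacheAction(entry, 60 * 1000));  // use-cache
console.log(decideCacheAction(entry, 180 * 1000)); // revalidate
console.log(decideCacheAction(null, 0));           // fetch
```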

The original model

// Set up an Express server without any cache headers.
const express = require('express');
const app = express();
const port = 8080;
const fs = require('fs');
const path = require('path');

// The page loads /demo.js, so every visit triggers a request for it.
app.get('/', (req, res) => {
    res.send(`<!DOCTYPE html>
<html>
  <head><title>Document</title></head>
  <body>
    <h1>Http Cache Demo</h1>
    <script src="/demo.js"></script>
  </body>
</html>`)
})

// Serve the script by reading it from disk on every request.
app.get('/demo.js', (req, res) => {
    let jsPath = path.resolve(__dirname, './static/js/demo.js');
    let cont = fs.readFileSync(jsPath);
    res.end(cont)
})

app.listen(port, () => {
    console.log(`listen on ${port}`)
})

The request process is as follows:

  • The browser requests the static resource demo.js
  • The server reads demo.js from disk and sends it back to the browser
  • The browser requests it again, and the server again reads demo.js from disk and sends it back
  • This repeats for every request

With this approach, traffic grows in proportion to the number of requests, and the drawbacks are obvious:

  • It wastes the user's bandwidth
  • It wastes server resources: the server reads the file from disk and sends it to the browser every time
  • The browser must wait for the JS to download and execute before rendering the page, which hurts the user experience

Mandatory cache

Mandatory caching is controlled by two response headers: Expires and Cache-Control.

Expires

The Expires value is the cache expiration time (a GMT date) that the server sends to the browser. On the next request, if the browser's current time has not yet reached that expiration time, the cached data is used directly. Let's set the Expires response header in the Express server.

// Other code...
const moment = require('moment');

app.get('/demo.js', (req, res) => {
    let jsPath = path.resolve(__dirname, './static/js/demo.js');
    let cont = fs.readFileSync(jsPath);
    res.setHeader('Expires', getGLNZ()) // 2 minutes
    res.end(cont)
})

// Returns the time 2 minutes from now as a GMT date string,
// e.g. "Mon, 01 Jan 2024 00:02:00 GMT".
function getGLNZ() {
    return moment().utc().add(2, 'm').format('ddd, DD MMM YYYY HH:mm:ss') + ' GMT';
}
// Other code...

We added an Expires response header to demo.js. Because the value must be a GMT date string, we use Moment.js to format it. The first request still goes to the server, and the expiration time comes back along with the file. When we refresh, though, the file is read straight from the memory cache and no request is sent. The expiration time here is two minutes; refresh the page after two minutes and you will see the browser send the request again.
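
Moment.js is not strictly required for this header. A minimal alternative, assuming only built-in JavaScript, uses `Date.prototype.toUTCString()`, which already produces the same `ddd, DD MMM YYYY HH:mm:ss GMT` layout; `getExpiresHeader` is a hypothetical name.

```javascript
// Hypothetical helper: an Expires value `seconds` after `now`, no Moment.js.
// Date.prototype.toUTCString() returns the HTTP date format (IMF-fixdate),
// e.g. "Mon, 01 Jan 2024 00:02:00 GMT".
function getExpiresHeader(seconds, now = new Date()) {
  return new Date(now.getTime() + seconds * 1000).toUTCString();
}

// In the handler above it could replace getGLNZ():
//   res.setHeader('Expires', getExpiresHeader(120)); // 2 minutes
console.log(getExpiresHeader(120, new Date(Date.UTC(2024, 0, 1))));
// Mon, 01 Jan 2024 00:02:00 GMT
```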

Although this method adds cache control and saves traffic, it still has the following problems:

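  • Expires is an absolute time checked against the browser's clock, which is not synchronized with the server's. If the client clock runs ahead of the server's, the cached copy is treated as expired too early (or immediately); if it runs behind, the copy is used longer than intended
  • After the cache expires, the server reads the file and returns it to the browser again, whether or not the file has changed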
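
The first problem can be made concrete with a sketch of the freshness check a browser performs with Expires. The function `isFreshByExpires` is hypothetical, and the client clock is passed in as a parameter so that skew can be simulated.

```javascript
// Browser-side check for Expires: compare the absolute expiry date with the
// *client's* clock. `clientNowMs` is a parameter so clock skew can be simulated.
function isFreshByExpires(expiresHeader, clientNowMs) {
  const expiresMs = Date.parse(expiresHeader);
  if (Number.isNaN(expiresMs)) {
    return false; // an unparseable Expires value is treated as already expired
  }
  return clientNowMs < expiresMs;
}

// Server time 00:00:00; the response should stay fresh until 00:02:00.
const expires = 'Mon, 01 Jan 2024 00:02:00 GMT';
const serverNowMs = Date.UTC(2024, 0, 1, 0, 0, 0);

console.log(isFreshByExpires(expires, serverNowMs));             // true
// Client clock 5 minutes ahead: the copy is stale immediately.
console.log(isFreshByExpires(expires, serverNowMs + 5 * 60e3));  // false
// Client clock 1 hour behind: the copy stays "fresh" for over an hour.
console.log(isFreshByExpires(expires, serverNowMs - 60 * 60e3)); // true
```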

In addition, Expires comes from HTTP/1.0. HTTP/1.1 added Cache-Control, and when a response carries both, the max-age directive of Cache-Control takes precedence and Expires is ignored.

Cache-Control

HTTP/1.1 added a new caching scheme to deal with the clock mismatch between browser and server. Instead of telling the browser an absolute expiration date, the server sends a relative lifetime, for example Cache-Control: max-age=10, meaning the browser uses its cached copy directly for 10 seconds.

app.get('/demo.js', (req, res) => {
  let jsPath = path.resolve(__dirname, './static/js/demo.js');
  let cont = fs.readFileSync(jsPath);
  res.setHeader('Cache-Control', 'public,max-age=120') // 2 minutes
  res.end(cont)
})
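
When a response carries both headers, HTTP/1.1 (RFC 7234, now RFC 9111) tells the cache to use max-age and ignore Expires. The sketch below computes a response's freshness lifetime in that order. `freshnessLifetime` and the lowercase `headers` object are assumptions; `s-maxage` and heuristic freshness are left out, and a missing `Date` header is simply treated as stale.

```javascript
// Freshness lifetime of a response, in seconds, for a private (browser) cache.
// max-age in Cache-Control wins over Expires. Expires is measured against the
// response's own Date header, so both times come from the server's clock.
function freshnessLifetime(headers) {
  const cacheControl = headers['cache-control'] || '';
  const match = /(?:^|,)\s*max-age\s*=\s*"?(\d+)"?/i.exec(cacheControl);
  if (match) {
    return Number(match[1]);
  }
  if (headers.expires) {
    const expiresMs = Date.parse(headers.expires);
    const dateMs = headers.date ? Date.parse(headers.date) : NaN;
    if (Number.isNaN(expiresMs) || Number.isNaN(dateMs)) {
      return 0; // simplification: treat as already stale
    }
    return Math.max(0, (expiresMs - dateMs) / 1000);
  }
  return 0; // no explicit lifetime
}

console.log(freshnessLifetime({ 'cache-control': 'public,max-age=120' })); // 120
console.log(freshnessLifetime({
  'cache-control': 'public, max-age=10',
  expires: 'Mon, 01 Jan 2024 00:02:00 GMT',
  date: 'Mon, 01 Jan 2024 00:00:00 GMT',
})); // 10 (max-age beats Expires)
```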

Negotiated cache

The downside of mandatory caching is that once the cached copy expires, the browser downloads the file again in full. If the file has not changed, fetching it again wastes server resources and bandwidth. The negotiated cache uses two pairs of headers:

  1. Last-Modified and If-Modified-Since
  2. ETag and If-None-Match
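
The pairing can be shown from the browser side of the negotiation: the validators saved from the previous response are echoed back in the matching request headers. The `conditionalHeaders` function and the `cached` object shape are illustrative assumptions.

```javascript
// Given the validators saved from a previous 200 response, build the headers
// of the conditional request sent once the cached copy is stale.
// Each request header echoes its response counterpart:
//   Last-Modified -> If-Modified-Since,  ETag -> If-None-Match
function conditionalHeaders(cached) {
  const headers = {};
  if (cached.lastModified) {
    headers['If-Modified-Since'] = cached.lastModified;
  }
  if (cached.etag) {
    headers['If-None-Match'] = cached.etag;
  }
  return headers;
}

console.log(conditionalHeaders({
  lastModified: 'Mon, 01 Jan 2024 00:00:00 GMT',
  etag: '"5d41402abc4b2a76b9719d911017c592"',
}));
```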

Last-Modified

To save server resources, the scheme is improved again: the browser negotiates with the server. Each time the server returns the file, it also tells the browser when the file was last modified on the server. The request process is as follows:

  • The browser requests the static resource demo.js
  • The server reads demo.js from disk and returns it to the browser with a Last-Modified header (a GMT date)
  • When the browser's cached copy expires, the browser sends the request with an If-Modified-Since header equal to the Last-Modified value of the previous response
  • The server compares If-Modified-Since in the request with the file's last modification time. If they match, it responds with 304 Not Modified and the browser keeps using its local copy; if not, it returns the file contents and a new Last-Modified
  • This repeats for every request

The code implementation process is as follows:

app.get('/demo.js', (req, res) => {
  let jsPath = path.resolve(__dirname, './static/js/demo.js')
  let status = fs.statSync(jsPath)

  // GMT date with one-second precision, e.g. "Mon, 01 Jan 2024 00:00:00 GMT"
  let lastModified = status.mtime.toUTCString()
  if (lastModified === req.headers['if-modified-since']) {
      // Unchanged: send no body, the browser reuses its cached copy
      res.writeHead(304, 'Not Modified')
      res.end()
  } else {
      // Changed (or first request): only now read the file and send it
      let cont = fs.readFileSync(jsPath);
      res.setHeader('Cache-Control', 'public,max-age=5')
      res.setHeader('Last-Modified', lastModified)
      res.writeHead(200, 'OK')
      res.end(cont)
  }
})

Compared with the previous approaches, this scheme goes a step further: the server checks whether the file has been modified and, if it has not, no longer sends the file. It still has the following drawbacks:

  • Last-Modified is a GMT date accurate only to the second. If the file is modified several times within one second, the server cannot tell that it changed, and the browser may not get the latest version
  • If the file on the server is modified (for example, re-saved) but its contents stay the same, its modification time still changes, so the server sends the whole file again
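
The first drawback follows from the date format itself. This sketch uses a hypothetical `lastModifiedHeader` helper that mirrors the `mtime.toUTCString()` call in the server code, and shows two modification times 300 ms apart producing identical headers.

```javascript
// Last-Modified is an HTTP date with one-second resolution:
// toUTCString() drops the milliseconds, like the server code above.
function lastModifiedHeader(mtime) {
  return mtime.toUTCString();
}

// Two edits 300 ms apart within the same second...
const firstSave = new Date(Date.UTC(2024, 0, 1, 0, 0, 0, 100));
const secondSave = new Date(Date.UTC(2024, 0, 1, 0, 0, 0, 400));

// ...produce the same header, so a browser holding the first version sends an
// If-Modified-Since equal to the new Last-Modified and receives a 304.
console.log(lastModifiedHeader(firstSave) === lastModifiedHeader(secondSave)); // true
```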

ETag

To solve the problems caused by imprecise modification times, the server and browser negotiate again. This time the server returns not a time but an ETag, a unique identifier of the file that changes only when the file's contents change. The request process is as follows:

  • The browser requests the static resource demo.js
  • The server reads demo.js from disk and sends it back to the browser with the ETag that uniquely identifies the file
  • When the browser's cached copy expires, the browser sends the request with an If-None-Match header equal to the ETag of the previous response
  • The server compares If-None-Match in the request with the file's current ETag. If they match, it responds with 304 Not Modified and the browser keeps using its local copy; if not, it returns the file contents and the new ETag
  • This repeats for every request

const md5 = require('md5');

app.get('/demo.js', (req, res) => {
    let jsPath = path.resolve(__dirname, './static/js/demo.js');
    let cont = fs.readFileSync(jsPath);
    // Hash of the file contents; HTTP expects the ETag value in double quotes
    let etag = `"${md5(cont)}"`;

    if (req.headers['if-none-match'] === etag) {
        res.writeHead(304, 'Not Modified');
        res.end();
    } else {
        res.setHeader('ETag', etag);
        res.writeHead(200, 'OK');
        res.end(cont);
    }
})
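
Two refinements to the ETag example, offered as a sketch rather than the author's code: Node's built-in `crypto` module can compute the hash without the `md5` package, and a more tolerant server would not compare If-None-Match with `===`, because the header may hold `*` or several comma-separated, possibly weak (`W/`), tags. `computeEtag` and `ifNoneMatchHits` are hypothetical names.

```javascript
const crypto = require('crypto');

// Strong ETag from the file contents, using Node's built-in crypto module
// instead of the md5 package. The value is quoted, as HTTP requires.
function computeEtag(content) {
  return '"' + crypto.createHash('md5').update(content).digest('hex') + '"';
}

// If-None-Match may be "*" or a comma-separated list of tags, some weak
// (W/"..."). It uses weak comparison, so the W/ prefix is ignored.
// Assumes tags contain no commas (true for hex hashes).
function ifNoneMatchHits(headerValue, etag) {
  if (!headerValue) return false;
  if (headerValue.trim() === '*') return true;
  const strip = (tag) => tag.trim().replace(/^W\//, '');
  return headerValue.split(',').some((tag) => strip(tag) === strip(etag));
}

const etag = computeEtag(Buffer.from('hello'));
console.log(etag); // "5d41402abc4b2a76b9719d911017c592"
console.log(ifNoneMatchHits(`W/${etag}, "other"`, etag)); // true
```

In the handler, `ifNoneMatchHits(req.headers['if-none-match'], etag)` would replace the `===` check.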

Other

In the “distant” HTTP/1.0 era, two fields, Pragma and Expires, controlled client caching. Although both have long been superseded by Cache-Control, many sites still send them for backward compatibility with HTTP/1.0.

About Pragma

When the value of this field is no-cache, the browser is told not to cache the resource, that is, it must send a request to the server each time.

res.setHeader('Pragma', 'no-cache') // Disable caching
res.setHeader('Cache-Control', 'public,max-age=120') // 2 minutes

Here Pragma disables caching while Cache-Control sets a two-minute lifetime. Yet when we revisit the page, the browser sends the request again, which indicates that in this test Pragma takes precedence over Cache-Control.

About Cache-Control

  • private: only the client may cache the response
  • public: both the client and proxy servers may cache the response
  • max-age=xxx: the cached content expires after xxx seconds
  • no-cache: the response may be stored, but it must be validated with the server (negotiated cache) before each use
  • no-store: nothing is cached; neither the mandatory cache nor the negotiated cache is used
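
The directives above can be read mechanically. Below is a simplified parser, assuming well-formed input and no quoted values containing commas, plus a hypothetical `browserCacheBehavior` summary that mirrors the list for a private (browser) cache.

```javascript
// Parse a Cache-Control header into an object of directives, e.g.
// "public, max-age=120" -> { public: true, 'max-age': 120 }.
// Simplified: quoted values containing commas or "=" are not handled.
function parseCacheControl(value) {
  const directives = {};
  for (const part of (value || '').split(',')) {
    const [rawName, rawArg] = part.split('=');
    const name = rawName.trim().toLowerCase();
    if (!name) continue;
    if (rawArg === undefined) {
      directives[name] = true;
    } else {
      const arg = rawArg.trim().replace(/^"|"$/g, '');
      directives[name] = /^\d+$/.test(arg) ? Number(arg) : arg;
    }
  }
  return directives;
}

// How the directives listed above affect the browser cache.
function browserCacheBehavior(value) {
  const d = parseCacheControl(value);
  if (d['no-store']) return 'do not store';
  if (d['no-cache']) return 'store, but revalidate before every use';
  if (typeof d['max-age'] === 'number') return `use without asking for ${d['max-age']}s`;
  return 'no explicit lifetime';
}

console.log(browserCacheBehavior('public, max-age=120'));  // use without asking for 120s
console.log(browserCacheBehavior('no-cache'));             // store, but revalidate before every use
console.log(browserCacheBehavior('no-store, max-age=60')); // do not store
```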

Pragma: no-cache has the same effect as Cache-Control: no-cache.

Priority of the cache

The following order is commonly given in online sources:

Pragma > Cache-Control > Expires > ETag > Last-Modified
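
The last step of this order (ETag over Last-Modified) matches an explicit rule in the HTTP specification (RFC 7232, now RFC 9110): when a request carries both If-None-Match and If-Modified-Since, the server evaluates If-None-Match and ignores If-Modified-Since. A server-side sketch of that rule, with a hypothetical `isNotModified` helper:

```javascript
// Server-side decision for a conditional GET. If-None-Match (ETag) is checked
// first; If-Modified-Since is only consulted when If-None-Match is absent.
// `currentEtag` and `mtime` describe the file as it is on the server now.
function isNotModified(reqHeaders, currentEtag, mtime) {
  const ifNoneMatch = reqHeaders['if-none-match'];
  if (ifNoneMatch !== undefined) {
    const tags = ifNoneMatch.split(',').map((t) => t.trim().replace(/^W\//, ''));
    return tags.includes('*') || tags.includes(currentEtag.replace(/^W\//, ''));
  }
  const ifModifiedSince = reqHeaders['if-modified-since'];
  if (ifModifiedSince !== undefined) {
    const sinceMs = Date.parse(ifModifiedSince);
    // Compare at one-second resolution, since HTTP dates have no milliseconds.
    const mtimeSeconds = Math.floor(mtime.getTime() / 1000);
    return !Number.isNaN(sinceMs) && mtimeSeconds <= sinceMs / 1000;
  }
  return false; // unconditional request: send the full response
}

const fileMtime = new Date(Date.UTC(2024, 0, 1, 0, 0, 0));
const lastModified = fileMtime.toUTCString();
// Same date but a changed ETag: If-None-Match wins, so the file is resent.
console.log(isNotModified({ 'if-none-match': '"old"', 'if-modified-since': lastModified }, '"new"', fileMtime)); // false
console.log(isNotModified({ 'if-modified-since': lastModified }, '"new"', fileMtime)); // true
```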

References:

  • HTTP cache priority
  • Thoroughly understand the mechanism and principle of HTTP cache
  • HTTP cache control summary
  • Browser HTTP cache mechanism
  • Simple practice of several HTTP cache control settings with the Express framework