Across several campus recruiting interviews, HTTP caching kept coming up as a question. The first time I was asked about it, I was dumbstruck, so I put together the following summary.

In any front-end project, it is common to request data from the server. If the same data is requested more than once, the redundant requests waste network bandwidth and delay the browser's rendering of the content, which hurts the user experience. If the user is on a metered data plan, the extra requests also add to their bill. Using a cache to reuse resources that have already been fetched is therefore an effective way to improve website performance and user experience.

The principle of caching is to save a copy of the response after a resource is requested for the first time. When the user issues the same request again, the request is intercepted, and if the cache matches it, the stored copy is returned to the user. In this way, no new request for the resource has to be sent to the server.
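The idea above can be sketched with a small in-memory cache. This is only an illustration: `fetchFromOrigin`, `cachedGet`, and the `method + URL` cache key are hypothetical names chosen for the example, not part of any browser or library API.

```javascript
// Minimal in-memory response cache keyed by method + URL (illustrative only).
const cache = new Map();
let originHits = 0;

// Hypothetical stand-in for the real server; counts how often it is reached.
function fetchFromOrigin(url) {
  originHits += 1;
  return { status: 200, body: `content of ${url}` };
}

function cachedGet(url) {
  const key = `GET ${url}`;
  if (cache.has(key)) {
    return cache.get(key); // cache hit: the origin is not contacted
  }
  const response = fetchFromOrigin(url); // cache miss: go to the origin
  cache.set(key, response);              // keep a copy for next time
  return response;
}

cachedGet('/img/03.jpg');
cachedGet('/img/03.jpg');
console.log(originHits); // the origin was reached only once
```

A real HTTP cache also has to decide how long a stored copy stays usable, which is what the strong cache and negotiation cache mechanisms below are about.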

There are many kinds of caches, such as proxy caches, browser caches, gateway caches, load balancers, and content delivery networks (CDNs). They fall roughly into two categories: shared caches and private caches. A shared cache stores content that can be reused by multiple users, for example a web proxy set up within a company. A private cache can only be used by a single user, for example a browser cache.

HTTP caching is probably the caching mechanism that front-end developers deal with most often. It can be divided into strong caching and negotiation caching. The biggest difference between the two is what happens when checking for a cache hit: with strong caching the browser decides on its own, while with negotiation caching the browser must first ask the server whether its cached copy is still valid, and the server's answer determines whether a full response body needs to be sent. Let's look at the mechanism of HTTP caching and its decision strategy.

Strong cache

// By setting Expires
res.writeHead(200, {      // Disadvantage: client and server clocks may be out of sync
    Expires: new Date('2021-05-27T00:00:00Z').toUTCString()
})
// Alternatively, by setting Cache-Control
res.writeHead(200, {
    'Cache-Control': 'max-age=5' // Relative lifetime, in seconds
})
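To make the client side of strong caching concrete, here is a rough sketch of how a cache might decide whether a stored response is still fresh. The `isFresh` function and the `{ headers, storedAt }` entry shape are assumptions made for this example. It only looks at `max-age` and `Expires` and ignores every other directive. When both are present, `max-age` takes precedence, which is also why it avoids the clock-skew problem of `Expires`.

```javascript
// Returns true when a stored response may be reused without contacting the server.
// `stored` is a hypothetical shape: { headers, storedAt }, with storedAt in milliseconds.
function isFresh(stored, now = Date.now()) {
  const cacheControl = stored.headers['cache-control'] || '';
  const match = /max-age=(\d+)/.exec(cacheControl);
  if (match) {
    // max-age is relative to the time the response was stored
    const ageSeconds = (now - stored.storedAt) / 1000;
    return ageSeconds < Number(match[1]);
  }
  if (stored.headers.expires) {
    // Expires is an absolute time, compared against the local clock
    return now < Date.parse(stored.headers.expires);
  }
  return false; // no freshness information: treat as stale
}

const storedAt = Date.parse('2021-05-27T00:00:00Z');
const entry = { headers: { 'cache-control': 'max-age=5' }, storedAt };
console.log(isFresh(entry, storedAt + 3000)); // true: 3 seconds old
console.log(isFresh(entry, storedAt + 6000)); // false: older than 5 seconds
```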

Negotiation cache

// Negotiation cache based on the file's last-modified time
// (runs inside an http request handler, with req and res in scope)
const fs = require('fs')
const { mtime } = fs.statSync('./img/03.jpg')
const ifModifiedSince = req.headers['if-modified-since']     
if (ifModifiedSince === mtime.toUTCString()) {      // The cache takes effect
    res.statusCode = 304      
    res.end()      
    return   
}     
const data = fs.readFileSync('./img/03.jpg')     
// Tell the client to use the negotiated cache for the resource
// The client asks the server if the cache is valid before using it
// Server:
// Valid: 304 is returned, and the client uses the local cache resource
// Invalid: returns the new resource data directly, the client can use it directly
res.setHeader('Cache-Control', 'no-cache')
// The server issues a field to tell the client when the resource is updated
res.setHeader('last-modified', mtime.toUTCString())    
res.end(data)
// Drawbacks of Last-Modified: see below


// Negotiation cache based on whether the file content has changed
const fs = require('fs')
const etag = require('etag') // assumption: the `etag` npm package
const data = fs.readFileSync('./img/04.jpg')
// Generate a fingerprint from the file contents
const etagContent = etag(data)
const ifNoneMatch = req.headers['if-none-match']     
if (ifNoneMatch === etagContent) {      
    res.statusCode = 304      
    res.end()      
    return    
}     
// Tell the client to revalidate with the server before using its cached copy
res.setHeader('Cache-Control', 'no-cache')
// Send the fingerprint of the resource content to the client
res.setHeader('etag', etagContent)
res.end(data)


The negotiated cache implemented through Last-Modified satisfies most usage scenarios, but there are two obvious drawbacks:

  • First, it judges only by the resource's last-modified timestamp. If the file is opened and saved without its content actually changing, the timestamp is still updated. The validation then fails even though the cached copy is still correct, and a full response has to be sent again. This wastes network bandwidth and lengthens the time the user waits for the resource.
  • Second, the timestamp that identifies a modification has a resolution of one second. If a file is modified very quickly, say within a few hundred milliseconds, the timestamp-based validation cannot detect that the resource has been updated.
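Both drawbacks can be reproduced in a few lines of Node.js. The sketch below is only a demonstration; the temporary file, the `cache-demo-` directory prefix, and the variable names are made up for the example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Drawback 1: "touching" a file changes its mtime even though the bytes are identical.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cache-demo-'));
const file = path.join(dir, 'a.txt');
fs.writeFileSync(file, 'hello');
const before = fs.statSync(file).mtime.toUTCString();

const later = new Date(Date.now() + 60000); // pretend the file was saved a minute later
fs.utimesSync(file, later, later);
const after = fs.statSync(file).mtime.toUTCString();

const touchedLooksModified = before !== after;
const contentUnchanged = fs.readFileSync(file, 'utf8') === 'hello';
console.log(touchedLooksModified, contentUnchanged); // true true

// Drawback 2: two modifications within the same second produce the same header value.
const first = new Date('2021-05-27T10:00:00.100Z');
const second = new Date('2021-05-27T10:00:00.900Z');
const sameHeader = first.toUTCString() === second.toUTCString();
console.log(first.toUTCString()); // Thu, 27 May 2021 10:00:00 GMT
console.log(sameHeader);          // true
```

In the first case the server would resend an unchanged file. In the second it would answer 304 for a file that has in fact changed.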

In fact, both drawbacks share one root cause: the server cannot recognize a real update from the modification timestamp alone. The result is either an unnecessary full re-request or, worse, a bug in which a stale cached copy is used.

Negotiation cache based on ETag

To make up for the shortcomings of timestamps, the HTTP/1.1 specification added the ETag header, short for Entity Tag.

Its value is a string that the server generates for each resource, typically by hashing the content. The string works like a file fingerprint: as long as the file content differs, the corresponding ETag value differs too.
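As a rough illustration of the fingerprint idea, the sketch below derives an ETag from a SHA-1 hash of the body using Node's built-in `crypto` module. The `makeETag` helper is hypothetical, and this is not necessarily the algorithm used by the `etag()` call in the earlier snippet; real servers choose their own scheme.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derives an ETag from the response body bytes.
function makeETag(body) {
  const hash = crypto.createHash('sha1').update(body).digest('base64');
  return `"${hash}"`; // ETag values are sent as quoted strings
}

const original = makeETag('hello');
const resaved = makeETag('hello');  // same bytes, e.g. the file was saved without edits
const edited = makeETag('hello!');  // content actually changed

console.log(original === resaved); // true: identical content keeps its tag
console.log(original === edited);  // false: any content change yields a new tag
```

Because the tag depends only on the content, it avoids both Last-Modified drawbacks: resaving an unchanged file keeps the tag, and a fast edit within the same second still changes it.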

Cache decision tree
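
The decision flow described in the sections above can be summarized in code. This is a simplified sketch, not the exact algorithm any browser implements: the `decide` and `afterRevalidation` functions and the `{ fresh, etag, lastModified }` entry shape are assumptions made for illustration.

```javascript
// Decide what to do for a request, given the stored cache entry (or null).
// Hypothetical entry shape: { fresh: boolean, etag?: string, lastModified?: string }
function decide(entry) {
  if (!entry) return { action: 'fetch' };          // nothing cached: normal request
  if (entry.fresh) return { action: 'use-cache' }; // strong cache hit: no request at all

  // Stale entry: try the negotiation cache with whatever validators were stored.
  const headers = {};
  if (entry.etag) headers['If-None-Match'] = entry.etag;
  if (entry.lastModified) headers['If-Modified-Since'] = entry.lastModified;
  if (Object.keys(headers).length === 0) return { action: 'fetch' };
  return { action: 'revalidate', headers };
}

// After a conditional request, the status code settles the outcome.
function afterRevalidation(status) {
  return status === 304 ? 'use-cache' : 'replace-cache';
}

console.log(decide(null).action);                         // fetch
console.log(decide({ fresh: true }).action);              // use-cache
console.log(decide({ fresh: false, etag: '"abc"' }).action); // revalidate
console.log(afterRevalidation(304));                      // use-cache
```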