Caching mechanisms in HTTP

Overview

In front-end performance optimization, the browser cache has long been one of the most important tools. A good caching strategy shortens the distance a page has to travel to fetch its resources and reduces latency. Because cached files can be reused, it also saves bandwidth and lowers network load. Understanding how the browser cache works is therefore essential for front-end engineers.

Browser Cache classification

Browser caching falls into two main categories: strong caching and cache negotiation. Here is the simplified process of loading a page:

The browser uses the HTTP header information to determine whether a resource request hits the strong cache. If it does, the cached resource is loaded directly and no request is sent to the server.
If the strong cache is missed, the browser sends a request for the resource to the server, and the server determines whether the browser's local copy is still valid. If the browser can keep using its copy, the server does not return the resource content and the browser loads the resource from its cache. This is cache negotiation.
If cache negotiation also misses, the server returns the complete resource to the browser, which loads the new resource and updates its local cache.

What strong caching and cache negotiation have in common is that, on a hit, the resource is loaded from the browser cache rather than from the server. The difference is that strong caching sends no request to the server, whereas cache negotiation does.
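The three steps above can be sketched as a small decision function. This is only an illustration under assumptions: `loadResource`, the shape of the `entry` object and the `askServer` callback are hypothetical names chosen for this sketch, not browser APIs.

```javascript
// Hypothetical sketch of the lookup order described above; not a real browser API.
// entry: { body, expiresAt } or undefined
// askServer: (entry) => { status, body } stands in for the network request
function loadResource(entry, now, askServer) {
  // 1. Strong cache: a fresh local copy is used without contacting the server.
  if (entry && now < entry.expiresAt) {
    return { source: 'strong cache', body: entry.body, requestSent: false };
  }
  // 2. Otherwise a request is sent and the server decides whether the copy is still valid.
  const reply = askServer(entry);
  if (entry && reply.status === 304) {
    return { source: 'cache negotiation', body: entry.body, requestSent: true };
  }
  // 3. Neither kind of cache hit: use the full response from the server.
  return { source: 'server', body: reply.body, requestSent: true };
}
```

Only the first branch avoids the network entirely; the second still costs a round trip but skips the response body.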

Cache location

There are four cache locations, each with a different priority. They are searched in order, and the resource is requested from the server only if none of them has a match.

Service Worker
Memory Cache
Disk Cache
Push Cache

Service Worker

A Service Worker is an independent thread that runs in the background of the browser and is commonly used to implement offline caching. In short, a Service Worker acts like a middleman: requests sent by the pages it controls can be intercepted by the Service Worker, which is why HTTPS is required for security. It gives us the freedom to control which files are cached and how the cache is matched and read, and the cache is persistent.

All resources cached by the Service Worker can be seen under Cache Storage in the Application tab of the browser's developer tools.
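As a rough illustration of the "middleman" role, here is a minimal cache-first sketch. The function name `cacheFirst` and the cache name `demo-cache-v1` are assumptions made for this example; the wiring at the bottom uses the standard `fetch` event, `caches.open` and `event.respondWith`, and only runs where those Service Worker globals exist.

```javascript
// Cache-first lookup, written as a plain function so the logic is easy to follow.
// `cache` needs match(request) and put(request, response); `fetchFn` performs the network request.
async function cacheFirst(request, cache, fetchFn) {
  // Only GET requests can be stored with cache.put, so pass everything else through.
  if (request.method !== 'GET') return fetchFn(request);
  const cached = await cache.match(request);
  if (cached) return cached;                   // hit: no network request
  const response = await fetchFn(request);     // miss: go to the network
  await cache.put(request, response.clone());  // keep a copy for next time
  return response;
}

// Wiring inside a service worker file. 'demo-cache-v1' is a hypothetical cache name.
if (typeof self !== 'undefined' && typeof caches !== 'undefined') {
  self.addEventListener('fetch', (event) => {
    event.respondWith(
      caches.open('demo-cache-v1').then((cache) => cacheFirst(event.request, cache, fetch))
    );
  });
}
```

Cache-first is only one possible strategy; because the Service Worker controls the lookup, it could just as well try the network first and fall back to the cache.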

Memory Cache

As the name implies, the Memory Cache is read directly from memory. It is typically hit when the page is refreshed and mainly holds resources the current page has already fetched, such as downloaded styles, scripts and images. When the page is closed, the memory is released.

Disk Cache

Compared with the Memory Cache, the Disk Cache holds most cached resources. Based on the fields in the HTTP headers, it determines which resources need to be cached, which can be used directly from the cache, and which have expired and must be requested again. These fields are discussed later in this article.

Which files do browsers put in memory, and which on the hard drive?

Large files are unlikely to be stored in memory. This is easy to understand: memory is much smaller than the hard disk, so the operating system uses it sparingly. As a result, the resources served from memory after a refresh are generally small ones, such as styles, scripts and images already downloaded on the page. In addition, when system memory usage is high, files are preferentially stored on the hard disk.

Push Cache

Push Cache belongs to HTTP/2 and is used only when the other three caches miss. It exists only within a session and is released as soon as the session ends, so its entries are usually kept for a short time only, as noted in "HTTP/2 Push Is Demander".

Storage priority

Service Worker > Memory Cache > Disk Cache > Push Cache
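The priority order can be modelled as a loop over the four locations. The sketch below is a toy model: `findInCaches` and the `layers` array of Maps are hypothetical stand-ins, since real browser caches are not exposed this way.

```javascript
// Hypothetical model: each layer is { name, store: Map }, listed in priority order.
function findInCaches(url, layers) {
  for (const layer of layers) {
    if (layer.store.has(url)) {
      return { from: layer.name, body: layer.store.get(url) };
    }
  }
  return null; // no layer matched: the resource must be requested from the server
}

const layers = [
  { name: 'Service Worker', store: new Map() },
  { name: 'Memory Cache', store: new Map([['/app.js', 'memory copy']]) },
  { name: 'Disk Cache', store: new Map([['/app.js', 'disk copy'], ['/logo.png', 'disk image']]) },
  { name: 'Push Cache', store: new Map() },
];

console.log(findInCaches('/app.js', layers).from);   // Memory Cache
console.log(findInCaches('/logo.png', layers).from); // Disk Cache
console.log(findInCaches('/missing.css', layers));   // null
```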

Strong cache

If a request hits the strong cache, the browser does not send a request to the server. As mentioned above, whether the strong cache is hit is determined from HTTP header information. Specifically, strong caching is controlled by the Expires and Cache-Control fields in the HTTP response header, both of which describe how long the resource remains valid in the browser cache. Expires comes from HTTP/1.0 and Cache-Control from HTTP/1.1. If both are present, Cache-Control takes precedence.

Expires

Expires specifies the expiration time of a resource. It is an absolute point in time returned by the server in the response, telling the browser that it may load the resource from its cache until that time without making another request. For example, if the server returns Expires: Sat, 11 Sep 2021 06:48:30 GMT, the resource expires at 2021-09-11 14:48:30 in UTC+8. The obvious flaw in this approach is that the server returns an absolute time, so changing the client's local clock can affect whether the cache is hit. The Cache-Control field was introduced to address this, which is why Expires and Cache-Control can appear together, with the latter taking precedence. Cache-Control is described next:
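The clock problem can be reproduced in a few lines. In this sketch `isFreshByExpires` is a hypothetical helper, and the two "current times" are made-up values chosen to show the effect of a client clock that is one day behind.

```javascript
// Returns true if the cached copy may still be used, judged by the client's own clock.
function isFreshByExpires(expiresHeader, clientNowMs) {
  const expiresAt = Date.parse(expiresHeader); // absolute time chosen by the server
  if (Number.isNaN(expiresAt)) return false;   // unparseable dates count as already expired
  return clientNowMs < expiresAt;
}

const expires = 'Sat, 11 Sep 2021 06:48:30 GMT';
const realNow = Date.UTC(2021, 8, 11, 7, 0, 0);    // 07:00:00 GMT, after the expiry
const wrongClock = Date.UTC(2021, 8, 10, 7, 0, 0); // client clock set one day back

console.log(isFreshByExpires(expires, realNow));    // false
console.log(isFreshByExpires(expires, wrongClock)); // true
```

With the wrong clock, a copy that has already expired is still treated as fresh.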

Cache-Control

Cache-Control expresses expiration as a relative time, so changing the client clock does not affect cache hits the way it does with Expires. Cache-Control can combine multiple directives, including the following:

max-age: the length of time, in seconds, for which the cached copy is valid. For example, max-age=31536000 means the copy is valid for 31536000 s (365 days). As long as caching is not disabled and the validity period has not elapsed, requesting the resource again hits the cache.
public: Cache-Control: public indicates that the response may be cached by anyone (the client, proxy servers, and so on).
private: Cache-Control: private indicates that only the browser that initiated the request may cache the response.
s-maxage: same as max-age, but applies only to shared caches such as proxy servers.
no-cache: this does not literally mean "do not cache". The response may be stored, but the browser must validate it with the server before each use, and the local copy may be used only if the server confirms it is still valid.
no-store: this truly disables caching. Nothing is stored, and the data is fetched from the server on every request.
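To see how these directives combine, here is a simplified sketch of how a browser cache might interpret a Cache-Control value. `parseCacheControl` and `browserDecision` are hypothetical helpers; the parser ignores quoted arguments, and s-maxage is left out because it concerns shared caches rather than the browser.

```javascript
// Parse a Cache-Control header value into an object, e.g. { 'max-age': 200, public: true }.
function parseCacheControl(header) {
  const directives = {};
  for (const part of header.split(',')) {
    const [rawName, rawValue] = part.trim().split('=');
    if (!rawName) continue;
    const name = rawName.toLowerCase(); // directive names are case-insensitive
    directives[name] = rawValue === undefined ? true : Number(rawValue);
  }
  return directives;
}

// Decide what a browser cache may do with a stored response of the given age (in seconds).
function browserDecision(header, ageSeconds) {
  const cc = parseCacheControl(header);
  if (cc['no-store']) return 'do not cache';
  if (cc['no-cache']) return 'revalidate with server';
  if (typeof cc['max-age'] === 'number' && ageSeconds < cc['max-age']) return 'use cache';
  return 'revalidate with server';
}

console.log(browserDecision('max-age=200', 100));               // use cache
console.log(browserDecision('max-age=200', 300));               // revalidate with server
console.log(browserDecision('max-age=2000000, no-cache', 1));   // revalidate with server
```

The last call shows that no-cache wins over max-age: the copy is stored but never used without asking the server first.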

How are `cache-control` directives used?

Setting Cache-Control with a large max-age gives a resource a long cache lifetime. The following Node example shows how Cache-Control is set.

// server.js
const http = require('http')
const fs = require('fs')

http.createServer(function (request, response) {
  console.log('request come', request.url)

  if (request.url === '/') {
    // test.html is expected to sit next to server.js
    const html = fs.readFileSync('test.html', 'utf8')
    response.writeHead(200, {
      'Content-Type': 'text/html'
    })
    response.end(html)
  }

  if (request.url === '/script.js') {
    response.writeHead(200, {
      'Content-Type': 'text/javascript',
      'Cache-Control': 'max-age=200' // Browser cache time, in seconds
    })
    response.end('console.log("script loaded twice")')
  }
}).listen(8888)

console.log('server listening on 8888')

If the client sends a request to the server, does the entire body of the resource have to be transferred again?

Imagine that a resource in the client cache has passed its expiration time, but the server has not updated it. If the resource is large and the client asks the server to send it again, resending content that has not changed wastes both bandwidth and time. Is there a way for the server to learn that the client's cached file is still valid and simply tell the client: "You can use your cached copy, I haven't updated it, so I won't send it again"? To let the client and server verify whether a cached file is up to date and to improve cache reuse, HTTP added several fields for this purpose, which are described below.

Cache negotiation

Note: cache negotiation works together with the copy stored by the browser's cache. If the response is never stored, there is nothing to validate and cache negotiation is meaningless.

When a request for a resource misses the strong cache, the browser sends a request to the server to check whether cache negotiation hits. If it does, the server returns HTTP status code 304 (Not Modified), and on receiving this response the browser continues to load the resource from its cache.

200 and 304

C: Hello, how old are you?
S: I'm 18 years old.
(half a year later)
C: How old are you? I guess you're 18.
S: You already know that, so why ask me? (304)
(a year later)
C: How old are you? I guess you're 18.
S: I'm 19 years old. (200)

Cache negotiation is controlled by two pairs of headers: Last-Modified with If-Modified-Since, and ETag with If-None-Match.

Last-Modified, If-Modified-Since

Last-Modified represents the last time the resource was modified on the server. The server returns this field when the browser first requests the resource.
When the browser requests the resource again, it adds an If-Modified-Since header to the request, whose value is the Last-Modified value returned by the previous response.
When the server receives the request, it compares the If-Modified-Since value with the time the resource was last modified on the server. If nothing has changed it returns 304 Not Modified; otherwise it returns the resource content as normal.

If cache negotiation also misses, the browser loads the resource from the server. The response carries an updated Last-Modified value, and the next request sends that value in If-Modified-Since.
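The exchange described above can be sketched as a server-side check. `checkModifiedSince` is a hypothetical helper written for this illustration; it takes the request header value and the resource's modification time and returns the status and the Last-Modified header to send.

```javascript
// Decide between 304 and 200 from the If-Modified-Since request header.
// lastModifiedMs: the resource's last modification time on the server, in milliseconds.
function checkModifiedSince(ifModifiedSince, lastModifiedMs) {
  const lastModified = new Date(lastModifiedMs).toUTCString();
  const since = Date.parse(ifModifiedSince || ''); // NaN when the header is missing or invalid
  // HTTP dates have one-second resolution, so compare whole seconds.
  const unchanged = !Number.isNaN(since) && Math.floor(lastModifiedMs / 1000) * 1000 <= since;
  return unchanged
    ? { status: 304, headers: { 'Last-Modified': lastModified } }
    : { status: 200, headers: { 'Last-Modified': lastModified } };
}

const mtime = Date.UTC(2021, 8, 11, 6, 48, 30);
const first = checkModifiedSince(undefined, mtime);                       // first visit
const second = checkModifiedSince(first.headers['Last-Modified'], mtime); // file unchanged
const third = checkModifiedSince(first.headers['Last-Modified'], mtime + 60000); // file edited

console.log(first.status, second.status, third.status); // 200 304 200
```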

This method has a defect as well. For example, if a file on the server is modified very frequently, the resource may change while its last modified time, which has a resolution of one second, stays the same, and the cache is then hit incorrectly. This is why another pair, ETag and If-None-Match, was introduced to manage cache negotiation. It is described next:
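The defect is easy to demonstrate, because the date format used in HTTP headers carries no fraction of a second. The timestamps below are made-up values for illustration.

```javascript
// Two edits made within the same second produce the same Last-Modified header value.
const firstEdit = Date.UTC(2021, 8, 11, 6, 48, 30, 100);  // 06:48:30.100
const secondEdit = Date.UTC(2021, 8, 11, 6, 48, 30, 900); // 06:48:30.900, content changed again

const headerAfterFirst = new Date(firstEdit).toUTCString();
const headerAfterSecond = new Date(secondEdit).toUTCString();

console.log(headerAfterFirst === headerAfterSecond); // true: the validator cannot tell them apart
```

A browser that cached the file after the first edit would receive 304 for the second one and keep showing stale content.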

ETag, If-None-Match

Unlike Last-Modified and If-Modified-Since, which carry a time, ETag and If-None-Match carry a check code. An ETag uniquely identifies a version of a resource, so a change to the resource changes its ETag. The server decides whether the cache is hit based on the If-None-Match value sent by the browser, which is the ETag the server returned when the browser previously requested the resource. This solves the Last-Modified problem described above.

Last-Modified and ETag can also be used together, and the server validates the ETag first. Some server implementations go on to compare Last-Modified before returning 304, although the HTTP specification states that If-Modified-Since must be ignored when the request contains If-None-Match.
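A server-side comparison of the two values might look like the sketch below. `checkNoneMatch` is a hypothetical helper; it splits the header on commas, which is a simplification, and compares tags without their weak prefix.

```javascript
// Compare the If-None-Match request header with the resource's current ETag.
function checkNoneMatch(ifNoneMatch, currentEtag) {
  if (!ifNoneMatch) return 200;               // first request: nothing to validate
  if (ifNoneMatch.trim() === '*') return 304; // "*" matches any current version
  // The header may list several tags; weak tags (W/"...") are compared without the prefix.
  const strip = (tag) => tag.trim().replace(/^W\//, '');
  const candidates = ifNoneMatch.split(',').map(strip);
  return candidates.includes(strip(currentEtag)) ? 304 : 200;
}

console.log(checkNoneMatch(undefined, '"777"'));        // 200
console.log(checkNoneMatch('"777"', '"777"'));          // 304
console.log(checkNoneMatch('"123", W/"777"', '"777"')); // 304
console.log(checkNoneMatch('"778"', '"777"'));          // 200
```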

How is `ETag` generated?

In Apache, ETag generation depends on the following factors:

The inode number of the file, which Linux/Unix uses to identify the file and which can be seen by running 'ls -i'.
The time when the file was last modified.
The file size.

When generating an ETag, one or more of these factors can be combined, using a collision-resistant hash function.

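Here is a question I came across while looking into ETag generation.

If the ETag value in the HTTP response header changes, does that mean the contents of the file must have changed? Not necessarily: when the ETag is derived from factors such as the inode number or the modification time, re-saving the file unchanged or serving it from another machine can produce a different ETag even though the content is identical.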
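To make the factors listed above concrete, and to explore the question just raised, here is a sketch that builds an ETag from file metadata. It imitates the idea only and is not Apache's actual algorithm: `makeEtag`, the SHA-1 choice and the sample metadata are assumptions, while the field names `ino`, `mtimeMs` and `size` match what Node's `fs.statSync` returns.

```javascript
const crypto = require('crypto');

// Build an ETag from file metadata. `parts` selects which factors are included.
// This imitates the idea described above; it is not Apache's actual algorithm.
function makeEtag(stat, parts = ['ino', 'mtimeMs', 'size']) {
  const input = parts.map((key) => `${key}=${stat[key]}`).join(';');
  const digest = crypto.createHash('sha1').update(input).digest('hex');
  return `"${digest}"`; // entity tags are quoted strings
}

// Made-up metadata: the same bytes saved again one minute later.
const original = { ino: 1001, mtimeMs: 1631342910000, size: 42 };
const resaved = { ...original, mtimeMs: 1631342970000 };

console.log(makeEtag(original) === makeEtag(resaved));                     // false
console.log(makeEtag(original, ['size']) === makeEtag(resaved, ['size'])); // true
```

With a real file, `makeEtag(fs.statSync(path))` would work the same way. The first comparison shows an ETag changing although the content did not; the second shows that choosing fewer factors avoids this, at the cost of missing edits that keep the size the same.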

Using `ETag`

Here’s a simple implementation using Node:

// server.js
const http = require('http')
const fs = require('fs')

http.createServer(function (request, response) {
  console.log('request come', request.url)

  if (request.url === '/') {
    const html = fs.readFileSync('test.html', 'utf8')
    response.writeHead(200, {
      'Content-Type': 'text/html'
    })
    response.end(html)
  }

  if (request.url === '/script.js') {
    console.log(request.headers)
    const etag = request.headers['if-none-match']
    if (etag === '777') {
      // The browser's copy is still valid: answer 304 without a body
      response.writeHead(304, {
        'Content-Type': 'text/javascript',
        'Cache-Control': 'max-age=2000000, no-cache',
        'Last-Modified': '123',
        'Etag': '777'
      })
      response.end() // A 304 response has no body; the browser would not read one anyway
    } else {
      response.writeHead(200, {
        'Content-Type': 'text/javascript',
        // With no-cache, the browser validates with the server before using its copy, even if it has not expired.
        'Cache-Control': 'max-age=2000000, no-cache',
        'Last-Modified': '123', // An arbitrary value
        'Etag': '777'
      })
      response.end('console.log("script loaded twice")')
    }
  }
}).listen(8888)

console.log('server listening on 8888')
