Overview
In front-end performance optimization, browser caching has always been one of the most important topics. A good caching strategy shortens the distance a page has to travel to fetch its resources and reduces latency, and because cached files can be reused, it also saves bandwidth and lightens network load. Understanding how the browser cache works is therefore essential for front-end engineers.
Browser Cache classification
Browser caching falls into two main categories: strong caching and cache negotiation. Here is the simplified process of loading a page:
- The browser uses the HTTP header information to determine whether a resource request hits the strong cache. On a hit, the resource is loaded directly from the cache and no request is sent to the server.
- If the strong cache is missed, the browser sends the resource request to the server, and the server determines whether the browser's local copy is still valid. If it is, the server does not return the resource body, and the browser continues to load the resource from its cache. This is cache negotiation.
- If cache negotiation is also missed, the server returns the complete resource to the browser, which loads the new resource and updates its local cache.
Strong caching and cache negotiation have one thing in common: on a hit, the resource is loaded from the browser cache rather than from the server. The difference is that strong caching sends no request to the server, whereas cache negotiation does.
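The three outcomes described above can be sketched as a small decision function. This is only a simplified model, not a real browser API: the names loadResource and toyServer, the shape of the cache entry, and the validate callback that stands in for a conditional request are all assumptions made for illustration.

```javascript
// Hypothetical model of the lookup flow; none of these names are browser APIs.
// entry:    { body, expiresAt, etag } as stored locally, or undefined.
// validate: stands in for a conditional request to the server; it returns
//           { status: 304 } or { status: 200, body, etag }.
function loadResource(entry, now, validate) {
  // 1. Strong cache: the entry is still fresh, so no request is sent.
  if (entry && now < entry.expiresAt) {
    return { source: 'strong cache', body: entry.body, requests: 0 };
  }

  // 2. Cache negotiation: ask the server whether the stored copy is still valid.
  const reply = validate(entry ? entry.etag : undefined);
  if (entry && reply.status === 304) {
    return { source: 'negotiated cache', body: entry.body, requests: 1 };
  }

  // 3. Miss: the server returned the full resource.
  return { source: 'server', body: reply.body, requests: 1 };
}

// Toy server that currently holds version "v2" of the resource.
const toyServer = (etag) =>
  etag === 'v2' ? { status: 304 } : { status: 200, body: 'new', etag: 'v2' };

const stale = { body: 'old', expiresAt: 100, etag: 'v2' };
console.log(loadResource(stale, 50, toyServer).source);  // strong cache
console.log(loadResource(stale, 150, toyServer).source); // negotiated cache
```

Note that only the first branch avoids the network round trip entirely; the second still costs one request but skips the response body.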
Cache locations
There are four cache locations, each with a different priority. They are searched in order, and the resource is requested from the server only if none of them has a match.
- Service Worker
- Memory Cache
- Disk Cache
- Push Cache
Service Worker
A Service Worker is an independent thread that runs in the background of the browser and is generally used to implement offline caching. In short, a Service Worker acts like a middleman: it can intercept the requests sent by the pages it controls, which is why HTTPS is required for security. It gives us the freedom to control which files are cached and how the cache is matched and read, and the cache is persistent.
All resources cached by the Service Worker can be seen under Cache Storage in the Application tab of the browser's developer tools.
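To make the "middleman" idea concrete, here is a minimal sketch of a cache-first strategy written as a plain function so that it can be tried in Node. The names cacheFirst and createMemoryCache are hypothetical, and the in-memory cache is only a stand-in for the browser's Cache object; the comment at the end shows the commonly used wiring inside a real service worker.

```javascript
// Cache-first strategy as a plain function so it can run outside a browser.
// `cache` needs async match(key) and put(key, value); `fetchFn` performs the
// network call. Both are stand-ins (assumptions) for the browser's Cache
// object and fetch().
async function cacheFirst(request, cache, fetchFn) {
  const cached = await cache.match(request);
  if (cached !== undefined) {
    return cached; // served from the cache, no network access
  }
  const response = await fetchFn(request);
  // A real worker would store response.clone() here, because the body of a
  // Response object can be read only once.
  await cache.put(request, response);
  return response;
}

// In-memory stand-in for a Cache, for trying the function in Node.
function createMemoryCache() {
  const store = new Map();
  return {
    match: async (key) => store.get(key),
    put: async (key, value) => { store.set(key, value); },
  };
}

// Inside a real service worker, a read-only cache-first handler is often
// written like this:
//   self.addEventListener('fetch', (event) => {
//     event.respondWith(
//       caches.match(event.request).then((hit) => hit || fetch(event.request))
//     );
//   });
```

With this strategy the second request for the same URL never reaches fetchFn, which is the behavior that makes offline use possible.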
Memory Cache
As the name implies, this cache is read directly from memory. It mainly holds resources that the current page has already fetched, such as downloaded styles, scripts, and images, and it is typically what serves them when the page is refreshed. When the page is closed, the memory is released.
Disk Cache
Compared with the Memory Cache, most cached content comes from the Disk Cache. Based on the fields in the HTTP header, it determines which resources need to be cached, which can be used directly without a request, and which have expired and must be requested again. These fields are discussed later in this article.
What files do browsers put in memory? Which ones are on the hard drive?
Large files are unlikely to be stored in memory. This is easy to understand: memory is much smaller than the hard disk, and the operating system has to budget memory carefully. So the resources found in the memory cache after a refresh are generally small, such as styles, scripts, and images already downloaded by the page. In addition, when system memory usage is high, files are preferentially stored on the hard disk.
Push Cache
Push Cache belongs to HTTP/2 and is used only when the three caches above all miss. It exists only for the duration of the session and is released as soon as the session ends, and its entries are kept only for a short time, as discussed in the article "HTTP/2 push is tougher than I thought".
Storage priority
Service Worker > Memory Cache > Disk Cache > Push Cache
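The priority order can be pictured as a lookup that walks the layers one by one and falls back to the network only when every layer misses. The sketch below models each layer as a Map, which is purely an assumption for illustration; real browser caches are not exposed this way, and the lookup function is hypothetical.

```javascript
// Each layer is modelled as a Map from URL to cached body (illustrative only).
function lookup(url, layers) {
  for (const [name, store] of layers) {
    if (store.has(url)) {
      return { from: name, body: store.get(url) }; // first hit wins
    }
  }
  return { from: 'network', body: undefined }; // every layer missed
}

// Layers listed from highest to lowest priority.
const layers = [
  ['Service Worker', new Map()],
  ['Memory Cache', new Map([['/logo.png', 'png-bytes']])],
  ['Disk Cache', new Map([['/logo.png', 'png-bytes'], ['/app.js', 'js-source']])],
  ['Push Cache', new Map()],
];

console.log(lookup('/logo.png', layers).from); // Memory Cache
console.log(lookup('/app.js', layers).from);   // Disk Cache
console.log(lookup('/new.css', layers).from);  // network
```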
Strong cache
The browser does not send a request to the server when it hits the strong cache. As mentioned above, whether the strong cache is hit is determined from the HTTP header information of the resource. Specifically, strong caching is controlled by the Expires and Cache-Control fields. Both are set in the HTTP response header (not the request header) and describe how long the resource stays valid in the browser cache. Expires comes from HTTP/1.0 and Cache-Control from HTTP/1.1. If both are present, Cache-Control takes precedence.
Expires
Expires specifies the expiration time of a resource as a specific point in time returned by the server. It tells the browser that, until that time, the resource can be loaded from the cache without making another request. For example, if the server returns Expires: Sat, 11 Sep 2021 06:48:30 GMT, the resource expires at 2021-09-11 14:48:30 in the UTC+8 time zone. The obvious flaw in this approach is that the server returns an absolute time, so changing the client's local time affects whether the cache is hit. The Cache-Control field was introduced to address this, which is why Expires and Cache-Control exist side by side, with the latter taking precedence.
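Before moving on, the clock problem with Expires can be illustrated with a short sketch. The function name isFreshByExpires and the two sample clock values are assumptions chosen for the example; the check itself is just a comparison between the header's absolute time and whatever the client believes the current time to be.

```javascript
// Returns true while the cached copy may be used without contacting the server.
// expiresHeader: value of the Expires response header (an HTTP date string).
// clientNow:     the client's idea of the current time, as a Date.
function isFreshByExpires(expiresHeader, clientNow) {
  const expiresAt = Date.parse(expiresHeader);
  if (Number.isNaN(expiresAt)) {
    return false; // an unparseable date is treated as already expired
  }
  return clientNow.getTime() < expiresAt;
}

const expires = 'Sat, 11 Sep 2021 06:48:30 GMT';
const realNow = new Date('2021-09-11T07:00:00Z');   // after the expiry time
const skewedNow = new Date('2021-09-11T06:00:00Z'); // client clock set one hour back

console.log(isFreshByExpires(expires, realNow));   // false
console.log(isFreshByExpires(expires, skewedNow)); // true: the stale copy is reused
```

The second call shows the weakness: the same response is judged fresh or expired depending only on the client's clock.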
Cache-Control
Cache-Control expresses expiration as a relative time, so it does not suffer from the problem of a changed client clock affecting cache hits. Cache-Control can combine multiple directives, including the following:
- max-age: the length of time, in seconds, for which the cache is valid. For example, max-age=31536000 means the cache is valid for 31536000 s (365 days). As long as caching is not disabled and the validity period has not passed, accessing the resource again hits the cache.
- public: Cache-Control: public indicates that the response can be cached by any party (the client, proxy servers, and so on).
- private: Cache-Control: private indicates that only the browser that initiated the request may cache the response.
- s-maxage: same as max-age, but applies only to shared caches such as proxy servers.
- no-cache: this does not literally mean "no cache". The response may be stored, but the browser must validate it with the server before each use, and the local copy can be used only if the server confirms it.
- no-store: this truly disables caching. Nothing is stored, and data is retrieved from the server on every request.
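As a rough illustration of how these directives combine, the sketch below parses a Cache-Control value and decides what a browser cache may do with a stored response. The function names parseCacheControl and decide are hypothetical, and the logic is deliberately simplified: it ignores Expires, heuristic freshness, and quoted directive values that contain commas.

```javascript
// Parses a Cache-Control header value into an object, e.g.
// 'public, max-age=31536000' -> { public: true, 'max-age': 31536000 }.
function parseCacheControl(headerValue) {
  const directives = {};
  for (const part of headerValue.split(',')) {
    const [rawName, rawValue] = part.trim().split('=');
    if (!rawName) continue;             // skip empty segments
    const name = rawName.toLowerCase(); // directive names are case-insensitive
    if (rawValue === undefined) {
      directives[name] = true;
    } else {
      const value = rawValue.replace(/^"|"$/g, '');
      directives[name] = /^\d+$/.test(value) ? Number(value) : value;
    }
  }
  return directives;
}

// Decides what a private (browser) cache may do with a stored response.
// ageSeconds: how long ago the response was stored.
function decide(headerValue, ageSeconds) {
  const cc = parseCacheControl(headerValue);
  if (cc['no-store']) return 'do not store';
  if (cc['no-cache']) return 'revalidate';
  if (typeof cc['max-age'] === 'number' && ageSeconds < cc['max-age']) {
    return 'use cache';
  }
  return 'revalidate';
}

console.log(decide('max-age=200', 100));               // use cache
console.log(decide('max-age=200', 300));               // revalidate
console.log(decide('max-age=2000000, no-cache', 100)); // revalidate
console.log(decide('no-store', 0));                    // do not store
```

The third call shows that no-cache wins over a long max-age: the copy is stored but is validated with the server before every use.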
How are Cache-Control directives used?
Setting Cache-Control with a max-age value gives a resource a long-lived cache. The following Node server shows how Cache-Control is set:
// server.js
const http = require('http')
const fs = require('fs')

http.createServer(function (request, response) {
  console.log('request come', request.url)

  if (request.url === '/') {
    // test.html is read from the current working directory
    const html = fs.readFileSync('test.html', 'utf8')
    response.writeHead(200, {
      'Content-Type': 'text/html'
    })
    response.end(html)
  } else if (request.url === '/script.js') {
    response.writeHead(200, {
      'Content-Type': 'text/javascript',
      'Cache-Control': 'max-age=200' // Browser cache time, in seconds
    })
    response.end('console.log("script loaded twice")')
  } else {
    // Any other path: answer explicitly so the request does not hang
    response.writeHead(404)
    response.end()
  }
}).listen(8888)

console.log('server listening on 8888')
If the client does send a request to the server, must the entire content of the resource be transferred again?
Imagine that a resource in the client cache has passed its expiration time, but the server has not updated it. If the resource is large and the client asks the server to send it again, the server would be reluctant: the resource has not changed, so sending it again wastes bandwidth and time. Is there a way for the server to know that the client's cached file is still valid and simply tell the client: "You can use your cached copy directly. I haven't updated it, so I won't send it again"? To let the client and server verify whether a cached file is up to date, and so improve the cache reuse rate, HTTP added several fields for this purpose, described below.
Cache negotiation
Note: Cache negotiation must be accompanied by the use of strong caching. If strong caching is not enabled, cache negotiation is meaningless.
When a requested resource misses the strong cache, the browser sends a request to the server to check the negotiated cache. If the negotiated cache is hit, the server returns HTTP status code 304 (Not Modified) without the resource body, and the browser, on receiving this response, continues to load the resource from its cache.
200 and 304
C: Hello, how old are you?
S: I'm 18 years old.
=====================================
(half a year later)
C: How old are you? I guess you're 18.
S: Damn, you know that and still ask me? (304)
=====================================
(a year later)
C: How old are you? I guess you're 18.
S: I'm 19 years old. (200)
Cache negotiation is managed by two pairs of header fields: Last-Modified with If-Modified-Since, and ETag with If-None-Match.
Last-Modified and If-Modified-Since
- Last-Modified represents the last time the resource was modified on the server. The server returns this field when the browser first requests the resource.
- When the browser requests the resource again, it adds an If-Modified-Since header to the request, whose value is the Last-Modified value returned by the previous response.
- When the server receives the request, it compares the If-Modified-Since value with the time the resource was last modified on the server. If nothing has changed, it returns 304 Not Modified; otherwise, it returns the resource content as normal.
If cache negotiation also misses, the browser loads the resource from the server. The Last-Modified value is updated by that response, and on the next request If-Modified-Since carries the Last-Modified value returned last time.
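The server side of this exchange can be sketched as a single comparison. The function name respondByLastModified and the shape of its return value are assumptions for illustration; a real server would take the modification time from the file system and attach the body to the 200 response.

```javascript
// lastModifiedMs:  the resource's modification time on the server (ms since epoch).
// ifModifiedSince: the request header value, or undefined on a first visit.
function respondByLastModified(lastModifiedMs, ifModifiedSince) {
  // HTTP dates have one-second resolution, so drop the milliseconds first.
  const lastModifiedSec = Math.floor(lastModifiedMs / 1000) * 1000;
  const headers = { 'Last-Modified': new Date(lastModifiedSec).toUTCString() };

  const since = ifModifiedSince ? Date.parse(ifModifiedSince) : NaN;
  if (!Number.isNaN(since) && lastModifiedSec <= since) {
    return { status: 304, headers }; // unchanged: no body is sent
  }
  return { status: 200, headers };   // changed or first request: send the body
}

const mtime = Date.UTC(2021, 8, 11, 6, 48, 30, 250);
const first = respondByLastModified(mtime, undefined);
console.log(first.status, first.headers['Last-Modified']);
// 200 Sat, 11 Sep 2021 06:48:30 GMT

const again = respondByLastModified(mtime, first.headers['Last-Modified']);
console.log(again.status); // 304
```

Because the comparison works at one-second resolution, a second edit made within the same second as the first would still produce a 304 in this sketch.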
This method has a defect. For example, if a file on the server is modified very frequently, the resource may have changed while the last modified time, which is recorded only to the second, has not. A cache hit is then incorrect. This gave rise to another pair of fields, ETag and If-None-Match, to manage cache negotiation, introduced below.
ETag and If-None-Match
Unlike Last-Modified and If-Modified-Since, which carry a timestamp, ETag and If-None-Match carry a check code. An ETag identifies a specific version of a resource, so a change to the resource changes its ETag. The server decides whether the cache is hit according to the If-None-Match value sent by the browser, which is the ETag value the server returned when the browser last received the resource. This solves the Last-Modified problem described above.
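Last-Modified and ETag can also be used together. In that case the server validates the ETag first. Some descriptions have the server go on to compare Last-Modified before returning 304 when the ETag matches, but the HTTP specification states that a recipient must ignore If-Modified-Since when the request contains If-None-Match.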
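The ETag comparison itself can be sketched as follows. The function name etagMatches is hypothetical. The sketch uses weak comparison, in which a W/ prefix is ignored, and it assumes that no ETag in the header contains a comma.

```javascript
// Returns true when the If-None-Match request header matches the current
// ETag, meaning the server can answer 304 Not Modified.
function etagMatches(ifNoneMatch, currentEtag) {
  if (!ifNoneMatch) return false;              // header absent: nothing to compare
  if (ifNoneMatch.trim() === '*') return true; // "*" matches any existing resource
  const strip = (tag) => tag.trim().replace(/^W\//, ''); // ignore the weak prefix
  const current = strip(currentEtag);
  // The header may list several ETags separated by commas.
  return ifNoneMatch.split(',').some((tag) => strip(tag) === current);
}

console.log(etagMatches('"777"', '"777"'));        // true
console.log(etagMatches('"abc", "777"', '"777"')); // true
console.log(etagMatches('"778"', '"777"'));        // false
```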
How is ETag generated?
In Apache, ETag generation depends on the following factors:
- The inode number of the file, the number Linux/Unix uses to identify a file, which can be viewed by running 'ls -i'.
- The time when the file was last modified.
- The file size.
When generating an ETag, one or more of these factors can be used, combined through a collision-resistant hash function.
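As a rough sketch of this idea, and not Apache's actual algorithm, the function below hashes a chosen subset of the three factors into a quoted ETag string. The name makeEtag, the parts parameter, and the choice of SHA-1 truncated to 16 hex characters are all assumptions for illustration; the fileInfo argument mimics the ino, mtimeMs, and size fields of Node's fs.Stats.

```javascript
const crypto = require('crypto');

// fileInfo mimics the fields of fs.Stats used here: ino, mtimeMs, size.
// `parts` selects which factors take part in the ETag.
function makeEtag(fileInfo, parts = ['ino', 'mtimeMs', 'size']) {
  const material = parts.map((key) => `${key}=${fileInfo[key]}`).join(';');
  const digest = crypto.createHash('sha1').update(material).digest('hex');
  return `"${digest.slice(0, 16)}"`; // ETag values are quoted strings
}

const original = { ino: 4242, mtimeMs: 1631342910000, size: 1024 };
const resaved = { ...original, mtimeMs: 1631342999000 }; // same bytes, newer mtime

console.log(makeEtag(original) === makeEtag(resaved));                     // false
console.log(makeEtag(original, ['size']) === makeEtag(resaved, ['size'])); // true
```

Which factors are included determines how sensitive the ETag is: including the modification time makes it change on every save, while using the size alone misses edits that keep the length the same.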
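Here is a question I came across while looking into ETag generation.
If the ETag value in the HTTP response header changes, does that mean the contents of the file must have changed? Not necessarily: because factors such as the inode number and the modification time can feed into the value, the ETag can change (for example, when the file is saved again without edits or is served from a different machine) even though its contents are identical.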
Using ETag
Here’s a simple implementation using Node:
// server.js
const http = require('http')
const fs = require('fs')

http.createServer(function (request, response) {
  console.log('request come', request.url)

  if (request.url === '/') {
    const html = fs.readFileSync('test.html', 'utf8')
    response.writeHead(200, {
      'Content-Type': 'text/html'
    })
    response.end(html)
  } else if (request.url === '/script.js') {
    console.log(request.headers)
    // Sent by the browser once it holds a cached copy that came with an ETag
    const etag = request.headers['if-none-match']
    const headers = {
      'Content-Type': 'text/javascript',
      // With no-cache, the browser validates with the server before reusing
      // its cached copy, even if the copy has not expired.
      'Cache-Control': 'max-age=2000000, no-cache',
      'Last-Modified': '123', // An arbitrary value (not a valid HTTP date)
      'Etag': '777'
    }
    if (etag === '777') {
      response.writeHead(304, headers)
      response.end() // A 304 response carries no body; the browser uses its cache
    } else {
      response.writeHead(200, headers)
      response.end('console.log("script loaded twice")')
    }
  } else {
    // Any other path: answer explicitly so the request does not hang
    response.writeHead(404)
    response.end()
  }
}).listen(8888)

console.log('server listening on 8888')