Background
In a recent resource refactoring, all static resource URLs are now delivered by the server, with parameters such as authentication and expiration time appended. As a result, the HTTP cache, one of the most powerful tools for optimizing static resources, no longer works: as soon as a delivered URL expires, the resource is downloaded again, even though the resource itself has not changed. To understand caching better, this article explores offline/long-term caching from three angles:
- HTTP cache
- Application Cache
- Service Worker
HTTP cache
The HTTP cache stores requested resources locally and reuses them for subsequent requests as much as possible, which reduces the number of HTTP requests and significantly improves the performance of websites and applications. So when are resources cached locally? When does a cached resource expire? And when is it used?
Caching mechanism flow
In the caching flow, after the browser initiates a resource request, processing falls roughly into three parts: the strong cache check, the negotiated cache check, and the resource request. This article mainly covers the strong cache and negotiated cache; the resource request itself is a normal HTTP exchange. Note that generally only GET requests are cached, so "resource request" here means an ordinary GET request.
Strong cache
With the strong cache, no request is sent to the server; the local cache is used directly. In Chrome, the local strong cache comes in two kinds: the disk cache and the memory cache. In the Network panel of DevTools, such requests show status 200 followed by "from disk cache" or "from memory cache", which means the strong cache was used. I have not yet worked out how Chrome decides between the two kinds of strong cache, so I will not go into detail to avoid misleading readers; corrections from experts are welcome. Here is what I found in the official Chrome documentation: the two caches are tied to the life cycle of the render process, which roughly corresponds to a tab:
Chrome employs two caches — an on-disk cache and a very fast in-memory cache. The lifetime of an in-memory cache is attached to the lifetime of a render process, which roughly corresponds to a tab. Requests that are answered from the in-memory cache are invisible to the web request API.
Whether the strong cache is used is controlled by three HTTP header fields: Expires, Pragma, and Cache-Control.
Expires
Expires is an HTTP/1.0 field and has the lowest priority of the three cache control fields. Its value in the response header is a timestamp. When a request is made, if the local system time is earlier than this timestamp, the cache is valid; otherwise the cache has expired and the negotiated cache takes over. If the response sets Expires to an invalid date such as 0, it is treated as a date in the past, i.e. the resource has already expired.
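The Expires rule can be sketched as a small function. This is a minimal illustration, not browser code: the name `isFreshByExpires` is hypothetical, and the sketch only accepts the IMF-fixdate form of an HTTP date (e.g. `Wed, 21 Oct 2015 07:28:00 GMT`), treating anything else, including `0`, as already expired.

```javascript
// Preferred HTTP-date format (IMF-fixdate), e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
// Legacy HTTP date formats are deliberately not handled in this sketch.
const IMF_FIXDATE = /^[A-Z][a-z]{2}, \d{2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}:\d{2} GMT$/;

// Returns true only if Expires holds a valid date that is still in the future.
function isFreshByExpires(expiresHeader, now = Date.now()) {
  if (typeof expiresHeader !== "string" || !IMF_FIXDATE.test(expiresHeader.trim())) {
    return false; // missing or invalid value (such as "0"): treat as expired
  }
  return now < Date.parse(expiresHeader);
}

const requestTime = Date.parse("Wed, 21 Oct 2015 07:00:00 GMT");
console.log(isFreshByExpires("Wed, 21 Oct 2015 07:28:00 GMT", requestTime)); // true
console.log(isFreshByExpires("0", requestTime)); // false
```

The sketch compares against the local clock, as described above. RFC 7234 computes the freshness lifetime as Expires minus the response's Date header, which softens the effect of a skewed client clock.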
Cache-Control
Cache-Control is a general header field defined in HTTP/1.1. Common directives:
- no-store: caching is disabled; every request goes to the server for the latest resource.
- no-cache: skips the strong cache and goes straight to the negotiated cache, asking the server whether the resource is still fresh.
- private: private cache; intermediate proxy servers must not cache the resource.
- public: public cache; intermediate proxy servers may cache the resource.
- max-age: the maximum freshness lifetime, in seconds, counted from the Date field of the cached response. A request made after responseDate + max-age finds the cache expired.
- must-revalidate: once the cache has expired, it must be revalidated with the server before it is used.
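To make the directive meanings concrete, here is a hedged sketch of how a client might turn a Cache-Control value into a decision. The function names and the three return values (`"network"`, `"revalidate"`, `"fresh"`) are illustrative assumptions; the sketch ignores Expires, heuristic freshness, and the proxy-oriented directives public and private.

```javascript
// Parse a Cache-Control header into a { directive: value } map.
// Directive names are case-insensitive; valueless directives map to true.
function parseCacheControl(header) {
  const directives = {};
  if (!header) return directives;
  for (const part of header.split(",")) {
    const [rawName, ...rest] = part.trim().split("=");
    if (!rawName) continue;
    const value = rest.length ? rest.join("=").replace(/^"|"$/g, "") : true;
    directives[rawName.toLowerCase()] = value;
  }
  return directives;
}

// Decide how to use a cached response. `responseDate` is the Date header in ms.
function strongCacheDecision(cacheControl, responseDate, now = Date.now()) {
  const cc = parseCacheControl(cacheControl);
  if (cc["no-store"]) return "network"; // nothing may be stored or reused
  if (cc["no-cache"]) return "revalidate"; // always go to the negotiated cache
  const maxAge = Number.parseInt(cc["max-age"], 10);
  if (Number.isInteger(maxAge) && now < responseDate + maxAge * 1000) {
    return "fresh"; // strong cache hit: no request is sent
  }
  return "revalidate"; // expired (or no max-age): fall through to the negotiated cache
}

const responseDate = Date.parse("Wed, 21 Oct 2015 07:28:00 GMT");
console.log(strongCacheDecision("public, max-age=600", responseDate, responseDate + 60 * 1000)); // fresh
console.log(strongCacheDecision("max-age=600", responseDate, responseDate + 700 * 1000)); // revalidate
```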
Pragma
Pragma is a general header field from HTTP/1.0, kept for backward compatibility with cache servers that only support HTTP/1.0. Its only value is no-cache, which behaves like Cache-Control: no-cache. However, its behavior in HTTP responses is not specified, so it cannot fully replace the HTTP/1.1 Cache-Control header.
When both Pragma and Cache-Control are present, an HTTP/1.1 cache follows Cache-Control: RFC 7234 (section 5.4) states that Pragma is ignored in a request that also contains a Cache-Control header the cache understands. Pragma therefore mainly matters for HTTP/1.0 caches.
Negotiated cache
When the strong cache has expired, or the headers rule out the strong cache (for example Cache-Control: no-cache or Pragma: no-cache), the negotiated cache takes over. It involves two pairs of header fields: Last-Modified/If-Modified-Since and ETag/If-None-Match. The browser sends a request carrying If-Modified-Since and/or If-None-Match so that the server can check whether the resource has changed. If it has changed, the server returns 200 with the new resource, which the browser uses and caches. If it has not changed, the cache is hit: the server returns 304, and the browser updates the cached headers from the response headers, extending the validity period, and uses the cached copy directly.
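The browser side of this exchange can be sketched with plain objects. This is a simplified model, not a browser API: a cached entry is assumed to be `{ headers, body }` with lower-case header names, and `conditionalHeaders` and `applyRevalidation` are hypothetical helper names.

```javascript
// Build the validator headers for a revalidation request from a cached entry.
function conditionalHeaders(cached) {
  const headers = {};
  if (cached.headers["etag"]) {
    headers["if-none-match"] = cached.headers["etag"];
  }
  if (cached.headers["last-modified"]) {
    headers["if-modified-since"] = cached.headers["last-modified"];
  }
  return headers;
}

// Apply the server's answer: 304 keeps the cached body but refreshes its headers
// (e.g. a new Cache-Control extends validity); 200 replaces the entry entirely.
function applyRevalidation(cached, response) {
  if (response.status === 304) {
    return { headers: { ...cached.headers, ...response.headers }, body: cached.body };
  }
  if (response.status === 200) {
    return { headers: { ...response.headers }, body: response.body };
  }
  return cached; // other statuses: this sketch leaves the cache untouched
}

const cached = { headers: { etag: '"v1"', "cache-control": "max-age=60" }, body: "old body" };
const revalidated = applyRevalidation(cached, { status: 304, headers: { "cache-control": "max-age=3600" } });
console.log(revalidated.body, revalidated.headers["cache-control"]); // old body max-age=3600
```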
Last-Modified/If-Modified-Since
Last-Modified/If-Modified-Since carry the resource's modification time. On the first request, the server puts the resource's last modification time in the Last-Modified response header. On the next request, the browser automatically copies the Last-Modified value from the previous response into the If-Modified-Since request header. The server compares the resource's current modification time with If-Modified-Since and returns 304 if they are equal (cache hit), or 200 if they are not.
ETag/If-None-Match
The ETag/If-None-Match value is a string that identifies a particular version of the resource, usually a hash of its content (the hash algorithm varies by server). When the content changes, the ETag changes too. The process is the same as for Last-Modified, except that the server compares the resource's current ETag with the If-None-Match value in the request header. How they are compared depends on which of the two kinds of ETag is used:
- Strong validation: the ETag identifies the exact content, so any change to the resource changes the ETag.
- Weak validation: the ETag starts with W/ and only indicates semantic equivalence, so a slightly changed resource may keep the same ETag and still match the cache.
For example:
ETag: "33a64df551425fcc55e4d42a148795d9f25f89d4"
ETag: W/"0815"
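The two kinds of comparison follow the strong and weak comparison functions of RFC 7232 (section 2.3.2); the sketch below is a minimal illustration with hypothetical helper names. RFC 7232 also requires weak comparison for If-None-Match, which may hold `*` or a list of tags.

```javascript
// Split an entity tag into its weak flag and opaque tag: W/"0815" -> { weak: true, tag: '"0815"' }.
function parseETag(etag) {
  const s = etag.trim();
  return s.startsWith("W/") ? { weak: true, tag: s.slice(2) } : { weak: false, tag: s };
}

// Strong comparison: both tags must be strong and byte-for-byte identical.
function strongMatch(a, b) {
  const x = parseETag(a);
  const y = parseETag(b);
  return !x.weak && !y.weak && x.tag === y.tag;
}

// Weak comparison: only the opaque tags must match; the W/ prefix is ignored.
function weakMatch(a, b) {
  return parseETag(a).tag === parseETag(b).tag;
}

// If-None-Match holds "*" or a list of entity tags and uses weak comparison.
function ifNoneMatchHits(ifNoneMatch, currentETag) {
  if (ifNoneMatch.trim() === "*") return true;
  const candidates = ifNoneMatch.match(/(?:W\/)?"[^"]*"/g) || [];
  return candidates.some((candidate) => weakMatch(candidate, currentETag));
}

console.log(weakMatch('W/"0815"', '"0815"')); // true
console.log(strongMatch('W/"0815"', '"0815"')); // false
```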
Differences between the two
- ETag/If-None-Match takes precedence over Last-Modified/If-Modified-Since.
- Last-Modified/If-Modified-Since only has one-second resolution: if the file is modified again within the same second, Last-Modified does not change, so a later request may incorrectly receive a 304.
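Both points show up in a server-side sketch of the 304/200 decision. Assumptions, all hypothetical: a resource is modeled as `{ etag, lastModified }` with `lastModified` in milliseconds, and the helper names are illustrative. Following RFC 7232, If-Modified-Since is ignored when If-None-Match is present, and "not modified" means the modification time is earlier than or equal to the given date.

```javascript
// Strip a weak prefix so that weak comparison can be used for If-None-Match.
function opaqueTag(etag) {
  const s = etag.trim();
  return s.startsWith("W/") ? s.slice(2) : s;
}

// Decide 304 vs 200 for a GET, given lower-case request header names.
function conditionalStatus(resource, reqHeaders) {
  const inm = reqHeaders["if-none-match"];
  if (inm !== undefined) {
    // If-None-Match wins; If-Modified-Since is not consulted at all.
    if (inm.trim() === "*") return 304;
    const tags = inm.match(/(?:W\/)?"[^"]*"/g) || [];
    return tags.some((t) => opaqueTag(t) === opaqueTag(resource.etag)) ? 304 : 200;
  }
  const ims = reqHeaders["if-modified-since"];
  if (ims !== undefined) {
    const since = Date.parse(ims);
    // HTTP dates only have 1-second resolution, so the mtime is truncated to seconds.
    const modified = Math.floor(resource.lastModified / 1000) * 1000;
    if (!Number.isNaN(since) && modified <= since) return 304;
  }
  return 200;
}

// The file was edited twice within 07:28:00; the client holds the first version.
const edited = { etag: '"v2"', lastModified: Date.parse("Wed, 21 Oct 2015 07:28:00 GMT") + 900 };
const ims = "Wed, 21 Oct 2015 07:28:00 GMT";
console.log(conditionalStatus(edited, { "if-modified-since": ims })); // 304 (stale copy reused)
console.log(conditionalStatus(edited, { "if-none-match": '"v1"', "if-modified-since": ims })); // 200
```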
Proxy server cache
Vary is an HTTP/1.1 response header whose value lists request header names, such as Accept-Encoding (multiple names are separated by commas). It records which request headers the returned resource depends on. When a proxy server receives a response from the origin server, it caches a separate version of the resource for each combination of the request header values listed in Vary. When the resource is requested again, the proxy checks those request headers and returns the matching version.
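A proxy can implement this by folding the Vary-listed request headers into the cache key. The sketch below is a simplified, hypothetical in-memory cache (`varyKey` and `VaryingCache` are illustrative names); it assumes response header names are lower-case and does not handle `Vary: *`, expiry, or eviction.

```javascript
// Build a cache key from the URL plus the request headers named in Vary.
function varyKey(url, varyHeader, requestHeaders) {
  const lower = {};
  for (const [name, value] of Object.entries(requestHeaders)) {
    lower[name.toLowerCase()] = value; // header names are case-insensitive
  }
  const names = (varyHeader || "")
    .split(",")
    .map((n) => n.trim().toLowerCase())
    .filter(Boolean)
    .sort();
  return [url, ...names.map((n) => `${n}=${lower[n] ?? ""}`)].join("|");
}

// One stored variant per combination of the Vary-listed request header values.
class VaryingCache {
  constructor() {
    this.entries = new Map();
    this.varyByUrl = new Map();
  }
  put(url, requestHeaders, response) {
    const vary = response.headers["vary"] || "";
    this.varyByUrl.set(url, vary);
    this.entries.set(varyKey(url, vary, requestHeaders), response);
  }
  match(url, requestHeaders) {
    if (!this.varyByUrl.has(url)) return undefined;
    return this.entries.get(varyKey(url, this.varyByUrl.get(url), requestHeaders));
  }
}

const proxy = new VaryingCache();
proxy.put("/app.js", { "Accept-Encoding": "gzip" }, { headers: { vary: "Accept-Encoding" }, body: "gzip bytes" });
proxy.put("/app.js", { "Accept-Encoding": "br" }, { headers: { vary: "Accept-Encoding" }, body: "brotli bytes" });
console.log(proxy.match("/app.js", { "accept-encoding": "br" }).body); // brotli bytes
```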
Application Cache (deprecated)
Although some browsers still support it, the W3C has deprecated the solution and recommends that developers use the Service Worker.
Introduction
HTML5 Application Cache is a caching mechanism (not a storage technology) based on a manifest file (the cache manifest file, commonly given the .appcache extension). The manifest defines which files need to be cached; browsers that support it store those files locally according to the manifest's rules, and when the network is offline, the browser displays the data stored offline. It suits content that changes little and stays relatively fixed. It has the following advantages:
- Offline browsing – users can keep using the application while offline.
- Faster speed – Cached resources load faster.
- Reduce server load – The browser will only download updated or changed resources from the server.
Configuration file
A typical manifest file structure looks like this:
```
CACHE MANIFEST
#version 1.0
CACHE:
/static/img/dalizhineng.c66247e.png
http://localhost:8080/static/img/setting-icon-hover413.c0d7.png
NETWORK:
*
FALLBACK:
/html5/ /404.html
```
The first line, CACHE MANIFEST, is fixed and must come first. The second line is usually a comment starting with #; when cached files need to be updated, change this comment, for example a version number, timestamp, or MD5 hash. The rest is divided into three sections, which can appear in any order, and each section can appear more than once in the same manifest:
- CACHE (default section)
Identifies which files need to be cached, using relative or absolute paths. Entries listed directly after CACHE MANIFEST, before any section header, also belong to this section.
- NETWORK (optional)
Identifies resources that must always be requested over the network, using relative or absolute paths. The wildcard * means that every resource not listed under CACHE requires a network request. In the following example, index.css is never cached and must always be requested over the network.
```
NETWORK:
index.css
```
- FALLBACK (optional)
Specifies the fallback resource the browser uses when a resource is unreachable. Each entry lists two URIs: the first is the resource and the second is the fallback. Both URIs must be relative paths with the same origin as the manifest file. Wildcards can be used; in the following example, 404.html is shown when a page cannot be accessed.
```
FALLBACK:
*.html /404.html
```
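The section rules above are simple enough to express as a parser. This is a hedged sketch, not the browser's algorithm: `parseManifest` is a hypothetical name, it only recognizes the three section headers described here, and it keeps FALLBACK entries as raw `{ resource, fallback }` pairs without resolving URLs.

```javascript
// Parse an appcache manifest into its three sections.
// Lines before any section header belong to CACHE; lines starting with # are comments.
function parseManifest(text) {
  const lines = text.split(/\r?\n/).map((l) => l.trim());
  if (!lines[0] || !lines[0].startsWith("CACHE MANIFEST")) {
    throw new Error("manifest must start with CACHE MANIFEST");
  }
  const result = { cache: [], network: [], fallback: [] };
  let section = "cache";
  for (const line of lines.slice(1)) {
    if (line === "" || line.startsWith("#")) continue;
    if (line === "CACHE:") { section = "cache"; continue; }
    if (line === "NETWORK:") { section = "network"; continue; }
    if (line === "FALLBACK:") { section = "fallback"; continue; }
    if (section === "fallback") {
      const [resource, fallback] = line.split(/\s+/); // "resource fallback" pair
      result.fallback.push({ resource, fallback });
    } else {
      result[section].push(line);
    }
  }
  return result;
}

const manifest = parseManifest([
  "CACHE MANIFEST",
  "#version 1.0",
  "CACHE:",
  "/static/img/dalizhineng.c66247e.png",
  "NETWORK:",
  "*",
  "FALLBACK:",
  "/html5/ /404.html",
].join("\n"));
console.log(manifest.network); // [ '*' ]
```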
Usage
Set the manifest attribute on the document's html tag to reference the manifest file. The attribute can be an absolute URL or a relative path, but an absolute URL must have the same origin as the web application, and the server must serve the manifest with the correct MIME type (text/cache-manifest).
```html
<html lang="en" manifest="manifest.appcache">
```
Accessing and manipulating the cache
Browsers that support Application Cache provide the window.applicationCache object for accessing and manipulating the offline cache.
- Cache state
The window.applicationCache.status property holds the current cache state.
State | Value | Description |
---|---|---|
UNCACHED | 0 | No cache: the page has no associated application cache |
IDLE | 1 | Idle: the application cache is not currently being updated |
CHECKING | 2 | Checking: the manifest is being downloaded and checked for updates |
DOWNLOADING | 3 | Downloading: the resources listed in the manifest are being downloaded |
UPDATEREADY | 4 | Update ready: a new version has been fully downloaded |
OBSOLETE | 5 | Obsolete: the manifest no longer exists, so the page can no longer use the application cache |
- Cache events
Event | Description |
---|---|
cached | Fired when the download completes and the application has been cached for the first time |
checking | Fired first every time the application loads, while the manifest is being checked |
downloading | Fired when the browser starts downloading and caching all resources in the manifest, because the application is not yet cached or the manifest has changed |
error | Fired when checking the manifest fails (for example, because the browser is offline), or when an uncached application references a nonexistent manifest |
noupdate | Fired when the application is already cached and the manifest has not changed; the process then ends |
obsolete | Fired when a cached application references a manifest that no longer exists; the application is removed from the cache and resources are loaded from the network instead |
progress | Fired periodically during the download, usually each time a file finishes downloading |
updateready | Fired when a new version has been downloaded and the cached application is ready to be updated |
- Cache methods
Method | Description |
---|---|
abort | Cancels the resource download in progress |
swapCache | Swaps the old cache for the new one; reloading with location.reload() is often more convenient |
update | Starts an update check of the cache |
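Combining the updateready event, the UPDATEREADY status, and swapCache() gives the usual "apply the update" pattern. In this sketch, `watchForUpdates` is a hypothetical helper; `appCache` stands for window.applicationCache and `reload` for a function that reloads the page, both passed in so the logic can run outside a browser.

```javascript
// Switch to a freshly downloaded cache and reload so the page uses the new files.
function watchForUpdates(appCache, reload) {
  appCache.addEventListener("updateready", () => {
    if (appCache.status === appCache.UPDATEREADY) {
      appCache.swapCache(); // make the new cache the active one
      reload(); // already-loaded resources only change after a reload
    }
  });
}

// In a page that still supports Application Cache:
// watchForUpdates(window.applicationCache, () => location.reload());
```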
Notes
- Updating a file listed in the manifest does not mean that the browser will re-cache the resource; the manifest itself must be changed.
- Browsers may have different limits on how much data they can cache (some browsers set a limit of 5MB per site).
- If the manifest file, or one of the internally listed files, does not download properly, the entire update process fails and the browser continues to use the old cache entirely.
- The HTML page that references the manifest must have the same origin as the manifest file, and resources in FALLBACK must also have the same origin as the manifest file.
- The browser automatically caches the HTML file that references the manifest, so if the HTML content changes, the manifest version must be updated so that the application cache is updated too.
Service Worker
Introduction
A Service Worker is a type of web worker with the additional capability of persistent offline caching. The host environment runs its script in a separate thread, so time-consuming and resource-intensive operations do not block the main JS thread. Browser support is broad, except for IE.
Characteristics
- It runs in the background, independently of the JS main thread, so it does not affect page rendering.
- Once installed, it persists until it is manually unregistered. To unregister it manually:
```javascript
if ('serviceWorker' in navigator) {
  navigator.serviceWorker.getRegistrations().then(function (registrations) {
    for (const registration of registrations) {
      // Find the SW to remove; registration.scope is a full URL ending in '/'
      if (registration.scope === 'https://xxx.com/') {
        registration.unregister();
      }
    }
  });
}
```
- It can intercept requests and responses and cache files. A SW uses the Fetch API to intercept and handle network requests, uses CacheStorage to manage the page's cache, and communicates with the page through postMessage.
- It cannot manipulate the DOM directly: because a SW runs independently of the web page, window and the DOM are not accessible in its environment.
- In production it only works over HTTPS; for local debugging, http://localhost and http://127.0.0.1 also work. While debugging, enable "Bypass for network" in DevTools; otherwise static resources (without hashes in their names) are served from the cache and debugging fails.
- It is asynchronous: SW APIs rely heavily on Promises.
- According to the documentation, CacheStorage capacity is not limited at the SW level, but it is still subject to the host environment's QuotaManager.
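To see how much of that quota an origin is using, pages and service workers can call navigator.storage.estimate() (the StorageManager API), which resolves to approximate `usage` and `quota` values in bytes. The helper name `describeStorage` and its output format are hypothetical; the storage object is passed in so the sketch can run outside a browser.

```javascript
// `storage` is expected to be navigator.storage (a StorageManager).
async function describeStorage(storage) {
  const { usage = 0, quota = 0 } = await storage.estimate();
  const toMB = (bytes) => (bytes / (1024 * 1024)).toFixed(1);
  const percent = quota > 0 ? ((usage / quota) * 100).toFixed(1) : "0.0";
  return `${toMB(usage)} MB of ${toMB(quota)} MB used (${percent}%)`;
}

// In a page or service worker:
// describeStorage(navigator.storage).then(console.log);
```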
Scope
The scope of a SW is a URL path that defines the range of pages the SW can control. For example, a SW whose scope is http://localhost:8080/ehx-room/ controls all pages in that directory. By default, the scope is the directory of the script path used at registration; registering /ehx-room/sw.js gives the scope /ehx-room/. You can also pass { scope: '/xxx/yyyy/' } to navigator.serviceWorker.register() to specify the scope, but the specified scope must lie within the directory of the SW script: for the scope '/xxx/yyyy/', the script must be located in /xxx/yyyy/ or one of its parent directories.
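The scope rules reduce to URL prefix checks. This sketch uses hypothetical helper names and resolves URLs against the page URL, as register() does. It does not model the Service-Worker-Allowed response header, which lets a server widen the permitted scope, nor the browser's choice of the longest matching scope when several registrations match a page.

```javascript
// Default scope: the directory that contains the SW script.
function defaultScope(scriptURL, pageURL) {
  return new URL("./", new URL(scriptURL, pageURL)).href;
}

// Without Service-Worker-Allowed, a requested scope must stay inside the script's directory.
function isScopeAllowed(scope, scriptURL, pageURL) {
  return new URL(scope, pageURL).href.startsWith(defaultScope(scriptURL, pageURL));
}

// A registration controls pages whose URL starts with its scope URL.
function controlsPage(scopeURL, pageURL) {
  return pageURL.startsWith(scopeURL);
}

const page = "http://localhost:8080/ehx-room/index.html";
console.log(defaultScope("sw.js", page)); // http://localhost:8080/ehx-room/
console.log(isScopeAllowed("/ehx-room/detail/", "sw.js", page)); // true
console.log(isScopeAllowed("/", "sw.js", page)); // false
```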
Life cycle
After a Service Worker is registered, it goes through the stages of its life cycle and fires events along the way. The full life cycle is installing -> installed -> activating -> activated -> redundant. The install event fires while the worker is installing, and the activate event fires while it is activating.
- Installing
This state begins after the Service Worker is registered and marks the start of installation. The install event fires during this state, and it is the usual place to cache resources for offline use.
- In the install event handler, you can call event.waitUntil() with a promise; installation does not finish until the promise settles, and it fails if the promise rejects.
- You can also call self.skipWaiting() to go straight to the activating state without waiting for the currently active Service Worker to stop controlling its pages.
- Installed
The SW is installed and waiting: it cannot take over until the currently active Service Worker no longer controls any page.
- Activating
In this state, no client is controlled by the previous SW any more, so the current worker can take over. The activate event fires, which is the usual place to delete old workers' cache resources; the state lasts until activation completes.
- Activated
In this state, the activate handler has completed and the SW can handle functional events: fetch, sync, and push.
Like install, the activate handler supports event.waitUntil(); you can also call self.clients.claim() in the activate handler so that the new SW controls the currently open pages without a reload.
- Redundant
This state means the SW's life cycle has ended, for example because it has been replaced by a newer SW or its installation failed.
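The Installing stage can be sketched as an install handler that precaches files inside event.waitUntil() and then calls skipWaiting(). The cache name, URL list, and `registerInstallHandler` are hypothetical; the global scope is passed in as `sw` (it would be `self` in service-worker.js) so the sketch can be exercised outside a browser.

```javascript
// Hypothetical precache name and file list, for illustration only.
const PRECACHE_NAME = "precache-v1";
const PRECACHE_URLS = ["/", "/index.html", "/static/app.js"];

// `sw` is the ServiceWorkerGlobalScope (`self` inside service-worker.js).
function registerInstallHandler(sw) {
  sw.addEventListener("install", (event) => {
    // Installation only finishes when this promise settles, and fails if it rejects.
    event.waitUntil(
      sw.caches
        .open(PRECACHE_NAME)
        .then((cache) => cache.addAll(PRECACHE_URLS)) // rejects if any URL fails to download
        .then(() => sw.skipWaiting()) // optional: skip the installed/waiting phase
    );
  });
}

// In service-worker.js: registerInstallHandler(self);
```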
Working process
- After the main thread successfully registers the Service Worker, the browser downloads and parses the SW script and starts installing it; during installation, the worker's install event fires.
- If the install handler succeeds (it usually reads and writes the cache, which can fail), the SW is activated and the worker's activate event fires. If the install handler fails, the life cycle ends with an error.
- After activation, the SW controls resource requests from pages within its scope and can listen for fetch events.
- If the SW is unregistered, or is replaced by a newer version after activation, its life cycle ends and it becomes redundant.
Example
```javascript
// In the page's load event callback, register the SW
if ('serviceWorker' in navigator) {
  window.addEventListener('load', () => {
    navigator.serviceWorker.register('service-worker.js')
      .then(registration => {
        // Registration succeeded
        console.log('SW registered with scope:', registration.scope);
      })
      .catch(err => {
        // Registration failed
        console.error('SW registration failed:', err);
      });
  });
}
```
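```javascript
// service-worker.js
const CACHE_VERSION = 'unique_v1';

// Hypothetical configuration: the original snippet uses DEFAULT_CONFIG without
// defining it, so these values are placeholders to adapt to your project.
const DEFAULT_CONFIG = {
  cacheId: CACHE_VERSION, // keep equal to CACHE_VERSION, or activate will delete this cache
  ignoreURLParametersMatching: true, // ignore the query string when matching the cache
  include: [/\/static\//], // URLs whose GET responses may be cached
  exclude: [/\/api\//], // URLs that always go to the network
};

// Hypothetical helper: true if the URL matches any of the given RegExp patterns
function matchOne(url, patterns) {
  return patterns.some(pattern => pattern.test(url));
}

// Listen for the activate event and clear other caches after activation
self.addEventListener('activate', event => {
  const cachePromise = caches.keys().then(keys =>
    Promise.all(keys.filter(key => key !== CACHE_VERSION).map(key => caches.delete(key)))
  );
  // waitUntil() returns undefined, so chain clients.claim() onto the promise itself;
  // clients.claim() gives the new SW control of the currently open pages
  event.waitUntil(cachePromise.then(() => self.clients.claim()));
});

self.addEventListener('fetch', event => {
  event.respondWith(
    caches
      .match(event.request, {
        // Ignore the query part of the URL
        ignoreSearch: DEFAULT_CONFIG.ignoreURLParametersMatching,
      })
      .then(response => {
        // If it matches a resource in the cache, return it directly
        if (response) {
          return response;
        }
        // Otherwise, clone the original request and go to the network
        const request = event.request.clone();
        const url = request.url;
        if (matchOne(url, DEFAULT_CONFIG.exclude)) {
          return fetch(request);
        }
        if (request.method === 'GET' && matchOne(url, DEFAULT_CONFIG.include)) {
          return fetch(request).then(httpRes => {
            // Cache only complete 200 responses (a 304 carries no body to store)
            if (httpRes && httpRes.status === 200) {
              const responseClone = httpRes.clone();
              caches.open(DEFAULT_CONFIG.cacheId).then(cache => {
                cache.put(event.request, responseClone);
              });
            }
            return httpRes;
          });
        }
        return fetch(request);
      })
  );
});
```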
Conclusion
Method | Granularity | Requires network? | Can be updated proactively? | Size limit |
---|---|---|---|---|
HTTP cache | Single resource | Strong-cache resources can be used offline | No | Limited by the browser's QuotaManager |
Application Cache | Whole application | No | Yes | Typically 5 MB |
Service Worker | Single resource | No | No | Limited by the browser's QuotaManager |