Background

In a recent resource overhaul, every static resource URL is now issued by the server with extra parameters such as an authentication signature and an expiration time. As a result, HTTP caching, one of the most powerful tools for optimizing static resources, no longer works: whenever an issued URL expires, the browser downloads the resource again even though its content has not changed. To understand the caching issues involved, this article looks at offline and long-term caching from three angles:

  • HTTP cache
  • Application Cache
  • Service Worker

HTTP cache

HTTP caching stores requested resources locally and reuses them for later requests wherever possible, which reduces the number of HTTP requests and significantly improves the performance of websites and applications. So when is a resource cached locally? When does a cached resource expire? And when is a cached resource actually used?

Caching mechanism flow

After the browser initiates a resource request, the flow has roughly three parts: the strong cache check, the negotiated cache check, and the resource request itself. This article focuses on the strong cache and the negotiated cache; the resource request is an ordinary HTTP exchange. Note that, in general, only GET requests are cached, so "resource request" here means an ordinary GET request.
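The three-part flow can be sketched as a small decision function. This is a simplified model for illustration, not how any browser implements it; the function name and the shape of `entry` (`expiresAt`, `etag`, `lastModified`) are assumptions.

```javascript
// Simplified model of the decision a browser makes for a GET request.
// `entry` is a hypothetical cache record; null or undefined means nothing is cached.
function decideCacheStep(entry, now = Date.now()) {
  if (!entry) return 'request';                 // nothing cached: ordinary HTTP request
  if (now < entry.expiresAt) return 'strong';   // still fresh: use the local copy, no request
  if (entry.etag || entry.lastModified) {
    return 'negotiate';                         // stale but has a validator: ask the server
  }
  return 'request';                             // stale and no validator: full request
}
```

For example, `decideCacheStep({ expiresAt: 500, etag: '"a"' }, 1000)` yields `'negotiate'`, because the entry is stale but carries a validator.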

Strong cache

With a strong cache hit, the browser uses the local copy directly and sends no request to the server. In Chrome, the strong cache comes in two kinds: disk cache and memory cache. In the Network panel of DevTools, such requests show status 200 followed by "from disk cache" or "from memory cache".

I have not fully worked out how Chrome chooses between the two, so I will not go into detail here for fear of misleading readers; corrections are welcome. What I did find, in the official Chrome documentation, is that the two caches are tied to the life cycle of the render process, which roughly corresponds to a tab:

Chrome employs two caches — an on-disk cache and a very fast in-memory cache. The lifetime of an in-memory cache is attached to the lifetime of a render process, which roughly corresponds to a tab. Requests that are answered from the in-memory cache are invisible to the web request API.

Whether the strong cache is used is controlled by three HTTP header fields: Expires, Pragma, and Cache-Control.

Expires

Expires is an HTTP/1.0 field and has the lowest priority of the three cache control fields. Its value in the response header is a timestamp. When a request is made, the cache is valid if the local system time is earlier than this timestamp; otherwise the cache is stale and the negotiated cache step begins. If the response sets Expires to an invalid date such as 0, the value is treated as a date in the past, that is, the resource has already expired.
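The Expires check can be sketched as follows. The function name is made up for illustration, and the strict format check is a simplification: it accepts only the common HTTP date form (for example `Wed, 21 Oct 2015 07:28:00 GMT`) and treats everything else, including `0`, as already expired.

```javascript
// Common HTTP date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
const HTTP_DATE = /^[A-Z][a-z]{2}, \d{2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}:\d{2} GMT$/;

// Returns true while the cached response is still fresh according to Expires.
// Values that are not a valid HTTP date (such as "0") count as expired.
function isFreshByExpires(expiresHeader, now = Date.now()) {
  if (!HTTP_DATE.test(expiresHeader)) return false;
  return now < Date.parse(expiresHeader);
}
```

Because the comparison uses the local clock (`now`), a wrong system time on the client changes the result, which is one reason max-age is preferred over Expires.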

Cache-Control

Cache-Control is a general header field defined in HTTP/1.1. Common directives are:

  • no-store: caching is disabled; every request goes to the server for the latest resource.
  • no-cache: skips the strong cache and goes straight to the negotiated cache, asking the server whether the cached resource is still fresh.
  • private: the response may be stored only by a private cache such as the browser; intermediate proxy servers must not cache it.
  • public: the response may be stored by shared caches, including intermediate proxy servers.
  • max-age: the maximum time the cache stays fresh, in seconds, counted from the Date field of the cached response. The cache is fresh until responseDate + max-age; a request made after that point finds it expired.
  • must-revalidate: once the cache has expired, it must be revalidated with the server before it is used again.
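A minimal sketch of how the directives above feed the strong cache check. Both function names are assumptions, and the parser is deliberately naive: it splits on commas and does not handle quoted directive values that contain commas.

```javascript
// Parse a Cache-Control value into a Map of lowercase directive -> value (or true).
function parseCacheControl(value) {
  const directives = new Map();
  for (const part of value.split(',')) {
    const [name, arg] = part.trim().split('=');
    if (name) {
      directives.set(name.toLowerCase(), arg === undefined ? true : arg.replace(/"/g, ''));
    }
  }
  return directives;
}

// Strong cache check: fresh while now < Date + max-age.
// no-store and no-cache both rule out using the local copy without asking the server.
function isFreshByMaxAge(cacheControl, dateHeader, now = Date.now()) {
  const directives = parseCacheControl(cacheControl);
  if (directives.has('no-store') || directives.has('no-cache')) return false;
  const maxAge = parseInt(directives.get('max-age'), 10);
  if (!Number.isFinite(maxAge)) return false;
  return now < Date.parse(dateHeader) + maxAge * 1000;
}
```

With `Cache-Control: public, max-age=60` and a Date of 07:28:00, the copy is fresh until 07:29:00 and stale from then on.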

Pragma

Pragma is a general header field defined in HTTP/1.0, kept for backward compatibility with caches that only support HTTP/1.0. It has a single value, no-cache, which behaves like Cache-Control: no-cache. Its behavior in HTTP responses is not specified, however, so it cannot fully replace the Cache-Control header defined in HTTP/1.1.

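If both Pragma and Cache-Control are present, Pragma takes precedence over Cache-Control in the behavior described here. Note that RFC 7234 says a cache should ignore Pragma in a request when Cache-Control is also present and understood, so do not rely on this ordering across all caches.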

Negotiated cache

When the strong cache has expired, or the header fields rule out the strong cache (for example Cache-Control: no-cache or Pragma: no-cache), the negotiated cache step begins. It involves two pairs of header fields: Last-Modified/If-Modified-Since and ETag/If-None-Match. If the request carries If-Modified-Since or If-None-Match, the server uses it to check whether the resource has changed. If it has changed, the server returns 200 with the new resource, and the browser uses it and decides whether to cache it. If it has not changed, the cache is hit and the server returns 304; the browser updates the cached headers from the response, extends the validity period, and uses the cached copy directly.

Last-Modified/If-Modified-Since

Last-Modified/If-Modified-Since carry the resource's modification time. On the first request, the server puts the resource's last modification time in the Last-Modified field of the response header. On the next request, the browser automatically copies the Last-Modified value from the previous response into the If-Modified-Since field of the request header. The server compares the resource's current last modification time with the If-Modified-Since value and returns 304 if they are equal (a cache hit), or 200 otherwise.
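On the server side, the comparison might look like the sketch below. The function name and the `mtimeMs` parameter (the file's modification time in milliseconds) are assumptions. The sketch uses "not modified since" rather than strict equality, which also covers the equal case described above.

```javascript
// Decide the status code for a conditional GET based on If-Modified-Since.
// `mtimeMs` is the resource's last modification time in milliseconds.
function statusForIfModifiedSince(ifModifiedSince, mtimeMs) {
  const since = Date.parse(ifModifiedSince);
  if (Number.isNaN(since)) return 200;   // header missing or invalid: send the full body
  // HTTP dates have one-second resolution, so drop the milliseconds before comparing.
  const lastModified = Math.floor(mtimeMs / 1000) * 1000;
  return lastModified <= since ? 304 : 200;
}
```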

ETag/If-None-Match

The value of ETag/If-None-Match is an identifier for the resource, typically a hash string (servers use different hash algorithms). When the resource content changes, the value changes too. The process is similar to the one above, except that the server compares the resource's current ETag with the If-None-Match value in the request header. How they are compared depends on the type of ETag, of which there are two:

  • Strong validator: the value is unique to the exact content and changes whenever the content changes at all.
  • Weak validator: the value starts with W/. It may still match the cache after a minor change to the resource.

For example:

ETag: "33a64df551425fcc55e4d42a148795d9f25f89d4"
ETag: W/"0815"
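The two kinds of tag lead to two comparison functions, sketched below with made-up function names. In a strong comparison both tags must be strong and identical; in a weak comparison the W/ prefix is ignored and only the opaque values are compared.

```javascript
// Split an entity tag into its weak flag and opaque value.
function parseETag(tag) {
  const weak = tag.startsWith('W/');
  return { weak, value: weak ? tag.slice(2) : tag };
}

// Strong comparison: both tags must be strong and identical.
function strongMatch(a, b) {
  const x = parseETag(a);
  const y = parseETag(b);
  return !x.weak && !y.weak && x.value === y.value;
}

// Weak comparison: the opaque values must be identical; the W/ prefix is ignored.
function weakMatch(a, b) {
  return parseETag(a).value === parseETag(b).value;
}
```

So `W/"0815"` and `"0815"` match under the weak comparison but not under the strong one.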

Differences between the two

  1. ETag/If-None-Match has a higher priority than Last-Modified/If-Modified-Since.
  2. Last-Modified/If-Modified-Since has a one-second problem: its resolution is one second, so if the server modifies the file again within the same second, the next request wrongly gets a 304.
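Both points can be shown in a few lines. This is an illustrative sketch: the header object with lowercase keys, the `resource` shape, and the function names are assumptions, and If-Modified-Since is compared by equality as in the description above.

```javascript
// Format a timestamp the way Last-Modified is sent (one-second resolution).
const toHttpDate = (ms) => new Date(ms).toUTCString();

// Server-side choice of validator: If-None-Match wins when it is present.
function isNotModified(reqHeaders, resource) {
  if (reqHeaders['if-none-match'] !== undefined) {
    return reqHeaders['if-none-match'] === resource.etag;
  }
  if (reqHeaders['if-modified-since'] !== undefined) {
    return reqHeaders['if-modified-since'] === toHttpDate(resource.mtimeMs);
  }
  return false;
}

// Two versions of a file written 300 ms apart, within the same second.
const base = Date.UTC(2015, 9, 21, 7, 28, 0);
const v1 = { etag: '"v1"', mtimeMs: base + 100 };
const v2 = { etag: '"v2"', mtimeMs: base + 400 };

// Time-based validation cannot tell v1 and v2 apart and answers "not modified".
const byTime = isNotModified({ 'if-modified-since': toHttpDate(v1.mtimeMs) }, v2);
// With an ETag in the request, the change is detected.
const byTag = isNotModified(
  { 'if-none-match': v1.etag, 'if-modified-since': toHttpDate(v1.mtimeMs) },
  v2
);
```

Here `byTime` is `true` (an incorrect 304) while `byTag` is `false` (a correct 200).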

Proxy server cache

Vary is an HTTP/1.1 response header field whose value names one or more request header fields, such as Accept-Encoding, separated by commas. It records which request headers the returned resource depends on. When the proxy server receives a response from the origin server, it caches a separate version of the resource for each combination of values of the fields listed in Vary. When the resource is requested again, the proxy inspects those request header fields and returns the matching version.
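One way to picture this is a cache key built from the URL plus the request headers that Vary names. The sketch below is a simplified model, not any proxy's actual implementation; the function name is hypothetical and request header names are assumed to be lowercase.

```javascript
// Build a cache key from the URL plus the request headers listed in Vary.
// `requestHeaders` is assumed to use lowercase header names.
function varyCacheKey(url, varyHeader, requestHeaders) {
  const names = varyHeader
    .split(',')
    .map((name) => name.trim().toLowerCase())
    .filter(Boolean)
    .sort();                                   // order in Vary should not matter
  const parts = names.map((name) => `${name}=${requestHeaders[name] ?? ''}`);
  return [url, ...parts].join('|');
}
```

A gzip request and a br request for the same URL get different keys under `Vary: Accept-Encoding`, so each receives the version encoded for it.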

Application Cache (deprecated)

Although some browsers still support it, the W3C has deprecated this solution and recommends that developers use Service Worker instead.

Introduction

HTML5's Application Cache is a caching mechanism (not a storage technology) driven by a cache manifest file, commonly given the .appcache extension. The manifest lists the files to be cached, and browsers that support it store those files locally according to its rules. When the network is unavailable, the browser serves the locally stored data. The mechanism suits content that changes little and is relatively fixed. It has the following advantages:

  • Offline browsing – Users can use the application while offline.
  • Faster loading – Cached resources load faster.
  • Reduced server load – The browser only downloads updated or changed resources from the server.

Configuration file

A typical manifest file structure looks like this:

CACHE MANIFEST 
#version 1.0 
 
CACHE: 
/static/img/dalizhineng.c66247e.png 
http://localhost:8080/static/img/setting-icon-hover413.c0d7.png 
 
NETWORK: 
* 
 
FALLBACK: 
/html5/ /404.html 

The first line, CACHE MANIFEST, is fixed and must come first. The second line is usually a comment starting with #. When cached files need to be updated, change the comment; it can hold a version number, a timestamp, or an MD5 hash. The rest of the file is divided into three sections, which can appear in any order, and each section can appear more than once in the same manifest:

  • CACHE (required)

Lists the files to cache, as relative or absolute paths.

  • NETWORK (optional)

Lists the resources that must always be requested over the network, as relative or absolute paths. The wildcard * means that every resource not listed under CACHE requires a network request. In the following example, index.css is never cached and must be requested over the network.

NETWORK: 
index.css 
  • FALLBACK (optional)

Specifies the fallback resource the browser uses when a resource cannot be reached. Each entry lists two URIs: the first is the resource and the second is its fallback. Both URIs must be relative and must have the same origin as the manifest file. Wildcards may be used; in the following example, 404.html is served when a page cannot be reached.

FALLBACK: 
*.html /404.html 
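To make the section rules concrete, here is a small parser sketch that splits a manifest into its three sections. It is an illustration only and not the browser's algorithm; the function name and the shape of the returned object are assumptions.

```javascript
// Parse a cache manifest into its CACHE, NETWORK and FALLBACK entries.
function parseManifest(text) {
  const lines = text.split(/\r?\n/).map((line) => line.trim());
  if (lines[0] !== 'CACHE MANIFEST') {
    throw new Error('the first line must be CACHE MANIFEST');
  }
  const result = { cache: [], network: [], fallback: [] };
  let section = 'cache';                       // entries before any header are cached
  for (const line of lines.slice(1)) {
    if (line === '' || line.startsWith('#')) continue;   // skip blanks and comments
    const header = /^(CACHE|NETWORK|FALLBACK):$/.exec(line);
    if (header) {
      section = header[1].toLowerCase();       // sections may repeat, in any order
      continue;
    }
    if (section === 'fallback') {
      const [from, to] = line.split(/\s+/);    // "resource fallback" pair
      result.fallback.push({ from, to });
    } else {
      result[section].push(line);
    }
  }
  return result;
}
```

Fed the manifest shown earlier, the result has the two image URLs under `cache`, `*` under `network`, and the pair `/html5/` to `/404.html` under `fallback`.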

Usage

Set the manifest attribute on the document's html tag to reference the manifest file. The attribute can hold an absolute URL or a relative path, but an absolute URL must have the same origin as the web application, and the server must serve the manifest with the correct MIME type (text/cache-manifest).

<html lang="en" manifest="manifest.appcache">    

Access and manipulate the cache

Some browsers provide the window.applicationCache object for accessing and manipulating the offline cache.

  • Cache state

The window.applicationCache.status property indicates the current cache state.

| State | Value | Description |
| --- | --- | --- |
| UNCACHED | 0 | No cache: no application cache is associated with the page |
| IDLE | 1 | Idle: the application cache is up to date and no update is in progress |
| CHECKING | 2 | Checking: the manifest file is being downloaded and checked for updates |
| DOWNLOADING | 3 | Downloading: the resources listed in the manifest are being downloaded into the application cache |
| UPDATEREADY | 4 | Update ready: all resources of the new version have been downloaded |
| OBSOLETE | 5 | Obsolete: the manifest file no longer exists, so the page can no longer use the application cache |
  • Cache events

| Event | Description |
| --- | --- |
| cached | Fired when the download is complete and the application has been stored in the cache for the first time |
| checking | Always fired first; the manifest file is checked every time the application loads |
| downloading | Fired when the download starts, that is, when the application is not yet cached or the manifest has changed and the browser downloads and caches every resource in the manifest |
| error | Fired when the browser is offline and the manifest check fails, or when an uncached application references a manifest file that does not exist |
| noupdate | Fired when the manifest has not changed and the application is already cached; the process ends here |
| obsolete | Fired when a cached application references a manifest file that no longer exists; the application is removed from the cache and its resources are then loaded from the network |
| progress | Fired intermittently during the download, usually as each file finishes downloading |
| updateready | Fired when the download is complete and a new version of the cached application is available |
  • Cache methods

| Method | Description |
| --- | --- |
| abort | Cancels the resource download |
| swapCache | Replaces the old cache with the new one, although calling location.reload() is more convenient |
| update | Updates the cache |
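A typical use of the state, event, and method together is to switch to a new version as soon as it has been downloaded. The sketch below takes the cache object and a reload callback as parameters so that it can be exercised outside a browser; the function name and that parameterization are assumptions, while `status`, `UPDATEREADY`, `swapCache()` and the `updateready` event are the API members listed above.

```javascript
// Reload the page once a newly downloaded cache version is ready.
// In a browser, pass window.applicationCache and () => window.location.reload().
function reloadOnUpdate(appCache, reload) {
  appCache.addEventListener('updateready', () => {
    if (appCache.status === appCache.UPDATEREADY) {
      appCache.swapCache();   // switch to the newly downloaded cache
      reload();               // reload so the page uses the new resources
    }
  });
}
```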

Notes

  • Updating a file listed in the manifest does not mean that the browser will re-cache the resource; the manifest itself must be changed.
  • Browsers may have different limits on how much data they can cache (some browsers set a limit of 5MB per site).
  • If the manifest file, or any file listed in it, fails to download, the entire update fails and the browser keeps using the old cache.
  • The HTML page that references the manifest must have the same origin as the manifest file. Resources in FALLBACK must also have the same origin as the manifest file.
  • The browser automatically caches the HTML file that references the manifest, so when the HTML content changes, either the manifest version must be changed or the application cache must be updated manually.

Service Worker

Introduction

A service worker is a kind of web worker with the additional ability to provide persistent offline caching. The host environment runs its script in a separate thread, which avoids the performance problems caused by time-consuming, resource-intensive work in page JavaScript. Browser support is high, with the exception of IE.

Characteristics

  • Runs independently of the JS engine's main thread; the script runs in the background and does not affect page rendering.
  • Once installed, it persists until it is manually unregistered. To unregister manually:
if ('serviceWorker' in navigator) {
  navigator.serviceWorker.getRegistrations()
    .then(function (registrations) {
      for (let registration of registrations) {
        // Find the SW to remove. registration.scope is a full URL that
        // includes the path, so a root scope ends with a slash.
        if (registration && registration.scope === 'https://xxx.com/') {
          registration.unregister();
        }
      }
    });
}
  • Can intercept requests and responses and cache files. The SW uses the Fetch API to intercept and handle network requests, works with CacheStorage to manage the page's cache, and communicates with the page through postMessage.

  • Cannot manipulate the DOM directly: because the SW script runs independently of the page, it has no access to window or the DOM.
  • Requires HTTPS in production. For local debugging, http://localhost and http://127.0.0.1 also work, but "Bypass for network" must be selected in DevTools; otherwise static resources without hashes in their names are served from the cache and changes do not show up while debugging.

  • Asynchronous by design: the SW makes heavy use of Promises.
  • According to the documentation, the SW itself places no limit on CacheStorage capacity, but it is still subject to the host environment's QuotaManager.

Scope

The scope of a SW is a URL path that defines the range of pages the SW can control; for example, a SW can control all pages under http://localhost:8080/ehx-room/. The default scope is the directory of the script path used at registration, for example the directory of /ehx-room/sw.js. You can also pass {scope: '/xxx/yyyy/'} to navigator.serviceWorker.register() to set the scope explicitly, but the specified scope must lie within the directory of the registered SW script.
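The scope rule can be checked with plain URL arithmetic. The helper names below are hypothetical, and the sketch covers only the default rule described above; it ignores the Service-Worker-Allowed response header, which a server can send to permit a wider scope.

```javascript
// Widest scope a worker script may claim by default: the directory containing the script.
function maxScope(scriptURL) {
  return new URL('./', scriptURL).href;
}

// True when `scope` lies inside the script's directory and is therefore allowed by default.
function isScopeAllowed(scriptURL, scope) {
  const resolved = new URL(scope, scriptURL).href;
  return resolved.startsWith(maxScope(scriptURL));
}

// True when the page at `pageURL` falls under the absolute scope URL `scope`.
function isControlled(pageURL, scope) {
  return pageURL.startsWith(scope);
}
```

For the script http://localhost:8080/ehx-room/sw.js, the scope `/ehx-room/sub/` is allowed, while `/` is not because it lies above the script's directory.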

Life cycle

Once registered, a Service Worker moves through the stages of its life cycle and fires events along the way. The full life cycle is installing -> installed -> activating -> activated -> redundant. The install event fires while the Service Worker is being installed, and the activate event fires when it is activated.

  • Installing

The SW enters this state after it is registered, marking the start of installation. The install event fires during this stage, and resources can be cached for offline use.

  1. Inside the install event callback, you can call event.waitUntil() with a promise; installation does not finish until the promise settles.
  2. You can also call self.skipWaiting() to move straight to the activating state without waiting for other Service Workers to shut down.
  • Installed

The SW has been installed and is waiting for other Service Workers to shut down.

  • Activating

In this state no client is controlled by another SW, so the current worker can finish taking over: other workers and the old cache resources associated with them are cleared, and the new Service Worker waits to be activated.

  • Activated

In this state the activate event callback has been handled, and the SW can handle functional events: fetch, sync, and push.

Besides event.waitUntil(), the activate event callback can call self.clients.claim() to take control of pages that are already open, without a refresh.

  • Redundant

This state indicates that the SW has reached the end of its life cycle and is being replaced by another SW.
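The install stage described above might be handled as in the following sketch. The worker global and the cache storage are passed in as parameters so the function can be tried outside a browser; the function name, the cache name, and the URL list are assumptions. Inside a real service worker you would call it as `attachInstallHandler(self, caches, 'unique_v1', [...])`.

```javascript
// Attach an install handler that precaches `urls` and then skips the waiting phase.
// `sw` is the worker global (self) and `cacheStorage` is the global `caches` object.
function attachInstallHandler(sw, cacheStorage, cacheName, urls) {
  sw.addEventListener('install', (event) => {
    // Installation does not finish until the promise passed to waitUntil() settles.
    event.waitUntil(
      cacheStorage
        .open(cacheName)
        .then((cache) => cache.addAll(urls))   // cache resources for offline use
        .then(() => sw.skipWaiting())          // move on without waiting for old workers
    );
  });
}
```

If any URL fails to download, `addAll()` rejects, the promise passed to `waitUntil()` rejects, and the installation fails.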

How it works

  1. After the main thread registers the Service Worker, the browser downloads and parses the Service Worker file and, as the script executes, starts installing it. The worker thread's install event fires during this process.
  2. If the install event callback succeeds (it usually performs cache reads and writes, which can fail), the Service Worker is activated, and the worker thread's activate event fires. If the install event callback fails, the life cycle enters the error state and ends.
  3. After activation, the Service Worker controls resource requests from pages within its scope and can listen for fetch events.
  4. If the Service Worker is unregistered after activation, or a new version replaces it, it ends its life cycle and enters the terminated state.

Example

// Register the SW in the page's load event callback
if ('serviceWorker' in navigator) {
  window.addEventListener('load', () => {
    navigator.serviceWorker.register('service-worker.js')
      .then(registration => {
        // Registration succeeded
      })
      .catch(err => {
        // Registration failed
      });
  });
}
// service-worker.js
// DEFAULT_CONFIG and matchOne are not defined in this excerpt;
// they are assumed to be declared elsewhere in the project.
const CACHE_VERSION = 'unique_v1';

// Listen for the activate event and clear other caches after activation
self.addEventListener('activate', event => {
  const cachePromise = caches.keys().then(keys => {
    return Promise.all(
      keys.map(key => {
        if (key !== CACHE_VERSION) {
          return caches.delete(key);
        }
      })
    );
  });
  // waitUntil() returns undefined, so chain on the promise that is passed to it
  event.waitUntil(
    cachePromise.then(() => {
      // Give the new SW control of the current page through the clients.claim method
      return self.clients.claim();
    })
  );
});

self.addEventListener('fetch', event => {
  event.respondWith(
    caches
      .match(event.request, {
        // Ignore the query part of the URL
        ignoreSearch: DEFAULT_CONFIG.ignoreURLParametersMatching,
      })
      .then(response => {
        // If the request matches a resource in the cache, return it directly
        if (response) {
          return response;
        }
        // Copy the original request if the match fails
        const request = event.request.clone();
        const url = request.url;
        if (matchOne(url, DEFAULT_CONFIG.exclude)) {
          return fetch(request);
        } else if (request.method === 'GET' && matchOne(url, DEFAULT_CONFIG.include)) {
          return fetch(request).then(httpRes => {
            // Only successful responses are cached
            if (httpRes && [200, 304].includes(httpRes.status)) {
              // Cache a copy, because a response body can only be read once
              const responseClone = httpRes.clone();
              caches.open(DEFAULT_CONFIG.cacheId).then(cache => {
                cache.put(event.request, responseClone);
              });
            }
            return httpRes;
          });
        } else {
          return fetch(request);
        }
      })
  );
});
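The sample refers to `DEFAULT_CONFIG` and `matchOne`, neither of which is shown. The following is a guess at what they might look like, for illustration only: every value in the configuration is hypothetical, and the real project may define them differently. `cacheId` is set to the same string as `CACHE_VERSION` so that the activate handler does not delete the cache the fetch handler writes to.

```javascript
// Hypothetical configuration; real values are project-specific.
const DEFAULT_CONFIG = {
  cacheId: 'unique_v1',                        // same value as CACHE_VERSION in the sample
  include: [/\.(js|css|png|jpg)(\?.*)?$/],     // static resources worth caching
  exclude: [/\/api\//],                        // never cache API calls
  ignoreURLParametersMatching: true,           // passed to ignoreSearch in caches.match()
};

// True when `url` matches at least one pattern; a pattern is a RegExp or a substring.
function matchOne(url, patterns = []) {
  return patterns.some((pattern) =>
    pattern instanceof RegExp ? pattern.test(url) : url.includes(pattern)
  );
}
```

With `ignoreSearch` enabled, a URL whose query string carries a new signature or expiration time still matches the cached entry, which addresses the problem described in the background section.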

Conclusion

| Method | Granularity | Needs a network connection | Can be actively updated | Size limit |
| --- | --- | --- | --- | --- |
| HTTP cache | Single resource | Strong-cached resources can be used offline | No | Limited by the browser's QuotaManager |
| Application Cache | Whole application | No | Yes | Generally 5 MB |
| Service Worker | Single resource | No | No | Limited by the browser's QuotaManager |
