Caching is a simple and efficient way to optimize performance. A good caching strategy shortens the distance a page has to go to fetch its resources and reduces latency. Because cached files can be reused, it also reduces bandwidth usage and network load.

A data request can be divided into three steps: sending the network request, back-end processing, and the browser handling the response. Browser caching helps optimize the first and third steps. If the cache can be used directly, no request is sent at all. If a request is sent but the back end finds that the front end already holds the same data, the data does not need to be sent back, which shrinks the response.

Cache locations

By location, the browser cache falls into four types, and each has its own priority. The browser checks them in the order below. It sends a network request only if none of them has a match.

  • Service Worker
  • Memory Cache
  • Disk Cache
  • Push Cache

Service Worker

A Service Worker is a separate thread that runs in the background of the browser and is typically used for caching. Service Workers can intercept requests, so for security reasons they are only available over HTTPS (localhost is also allowed during development). The Service Worker cache differs from the browser's other built-in caching mechanisms in three ways: it lets us control which files are cached, how requests are matched against the cache, and how cached files are read. The cache is also persistent.

Implementing caching with a Service Worker takes three steps. First, register the Service Worker. Second, listen for the install event and cache the required files. Third, on the user's next visit, intercept each request and check whether it is cached. If it is, read the cached file directly; otherwise, request the data.
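The three steps above could look roughly like the following minimal cache-first sketch of a Service Worker script. Several names are hypothetical: the file name sw.js, the cache name static-v1, and the precached URLs. In the page, the worker would be registered with navigator.serviceWorker.register('/sw.js'). The listeners are attached only when the script actually runs inside a Service Worker.

```javascript
// sw.js: a minimal cache-first Service Worker sketch.
// CACHE_NAME and PRECACHE_URLS are hypothetical values chosen for illustration.
const CACHE_NAME = 'static-v1';
const PRECACHE_URLS = ['/', '/main.js', '/custom.css'];

// Step 2: when the install event fires, store the required files.
function onInstall(event) {
  event.waitUntil(
    caches.open(CACHE_NAME).then((cache) => cache.addAll(PRECACHE_URLS))
  );
}

// Step 3: intercept each request; answer from the cache on a hit,
// otherwise fall through to fetch().
function onFetch(event) {
  event.respondWith(
    caches.match(event.request).then((cached) => cached || fetch(event.request))
  );
}

// Only attach the listeners when this file actually runs as a Service Worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' && self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('install', onInstall);
  self.addEventListener('fetch', onFetch);
}

// Step 1 happens in the page, not in sw.js:
// if ('serviceWorker' in navigator) navigator.serviceWorker.register('/sw.js');
```

A cache-first strategy suits static assets. Frequently changing data would more likely use a network-first variant.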

If the Service Worker misses its cache, we call fetch() to get the data. That request then goes through the remaining cache locations in priority order. Whether the data ends up coming from the Memory Cache or from the network, the browser reports the resource as fetched from the Service Worker.

There is much more to Service Workers, which will be covered separately.

Caching and offline development with Service Worker and cacheStorage

Browser cache, CacheStorage, Web Worker, and Service Worker

Memory Cache and Disk Cache

Memory Cache is the cache held in memory. It is the fastest to read from but has the shortest lifetime: once the tab is closed, its memory cache is released.

Disk Cache is the cache stored on disk. It is slower to access than the memory cache, but it has the advantage in storage capacity and in how long it keeps data.

OK, so here is the question: since each has its pros and cons, how does the browser decide whether to put a resource in memory or on disk? The main strategies are:

  • Large JS and CSS files go straight to the disk cache; smaller ones go into the memory cache
  • When memory usage is high, files are preferentially stored on disk

Some concrete examples of this behavior can be found in this answer on Zhihu:

How does the browser decide between “From Disk cache” and “from Memory cache”?

MemoryCache and diskCache flow details

Push Cache

Push Cache belongs to HTTP/2 (server push) and is the browser cache's last line of defense. It exists only for the duration of the session and is released once the session ends. Its cache lifetime is also very short, about 5 minutes in Chrome, and it does not strictly follow the caching directives in HTTP headers. Note that Chrome has since removed support for HTTP/2 server push.

HTTP/2 push is tougher than I thought

Caching strategies

There are two types of browser cache policies: strong cache and negotiated cache, and both are implemented by setting HTTP headers.

Strong cache

The browser checks the strong cache first.

On a strong cache hit, no HTTP request is sent and the resource is read directly from the cache. In the DevTools Network panel, the status code is 200 and the Size column shows "from disk cache" or "from memory cache".

Strong caching can be implemented by setting either of two HTTP headers: Expires and Cache-Control.

Expires (HTTP/1.0)

Expires is a response header returned by the server that tells the browser to retrieve data directly from the cache before the expiration date, without having to request it again. Like this:

Expires: Fri, 22 Nov 2019 08:41:00 GMT

This means the resource expires at 08:41 GMT on November 22, 2019. Once it has expired, the browser must send a request to the server.

This seems fine and reasonable, but there is a potential pitfall. The server's clock and the browser's clock may differ, so the expiration time the server returns may be wrong from the client's point of view. For this reason, HTTP/1.1 introduced Cache-Control. Its max-age directive takes precedence over Expires when both are present.
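A small sketch makes the clock problem concrete. isFreshByExpires is a hypothetical helper, not a browser API. It compares the Expires time with the client's own clock, so the same header gives a different answer when that clock is off.

```javascript
// Hypothetical helper: is a cached response still fresh according to Expires?
function isFreshByExpires(expiresHeader, clientNowMs) {
  const expiresAt = Date.parse(expiresHeader);
  // Per HTTP, an Expires value that cannot be parsed counts as already expired.
  if (Number.isNaN(expiresAt)) return false;
  // The comparison uses the client's clock, which is where skew creeps in.
  return clientNowMs < expiresAt;
}

const expires = 'Fri, 22 Nov 2019 08:41:00 GMT';
const serverNow = Date.parse('Fri, 22 Nov 2019 08:00:00 GMT');
const skewMs = 2 * 60 * 60 * 1000; // the client's clock runs two hours fast

console.log(isFreshByExpires(expires, serverNow));          // true
console.log(isFreshByExpires(expires, serverNow + skewMs)); // false
```

The resource still had 41 minutes of validity by the server's clock. However, the client with the fast clock already treats it as expired and goes back to the server.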

Cache-Control (HTTP/1.1)

In HTTP/1.1, Cache-Control is the most important header for controlling web page caching.

Cache-Control can be set in either request headers or response headers, and multiple directives can be combined:

cache-control: public, max-age=31536000
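To show how a directive such as max-age decides freshness, here is a simplified sketch. parseCacheControl and isFresh are hypothetical helpers. They handle only max-age, no-cache, and no-store, and ignore Expires and every other directive.

```javascript
// Hypothetical helper: turn "public, max-age=31536000" into { public: true, 'max-age': '31536000' }.
function parseCacheControl(header) {
  const directives = {};
  for (const part of header.split(',')) {
    const [rawName, rawValue] = part.trim().split('=');
    if (!rawName) continue;
    // Directive names are case-insensitive; directives without a value map to true.
    directives[rawName.toLowerCase()] = rawValue === undefined ? true : rawValue.replace(/^"|"$/g, '');
  }
  return directives;
}

// ageSeconds: how long the response has been sitting in the cache.
function isFresh(cacheControl, ageSeconds) {
  const d = parseCacheControl(cacheControl);
  // no-store: never kept; no-cache: must be revalidated before every use.
  if (d['no-store'] || d['no-cache']) return false;
  // Without max-age this sketch treats the response as stale.
  if (d['max-age'] === undefined) return false;
  return ageSeconds < Number(d['max-age']);
}

console.log(isFresh('public, max-age=31536000', 3600));     // true
console.log(isFresh('public, max-age=31536000', 31536000)); // false
console.log(isFresh('no-cache', 0));                        // false
```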

The mechanism of strong caching can be summarized in the following diagram

Negotiated cache

When the strong cache has expired, the browser enters the negotiated cache phase.

The browser sends the cache identifier in the request headers, and the server uses it to decide whether the cache can be used. If the negotiated cache is valid, the server returns 304 Not Modified. If it is invalid, the server returns 200 along with the requested resource.

Negotiated caching can be implemented by setting either of two HTTP headers: Last-Modified and ETag.

Last-Modified and If-Modified-Since

When the browser requests a resource for the first time, the response headers carry Last-Modified, and the browser stores it. This is the time the resource was last modified on the server, with a precision of one second. On the next request for that resource, the browser sends the value back in the If-Modified-Since request header. If the modification time has not changed, the server returns 304 and the cache is used. Otherwise, it returns 200 with the new resource.

Last-Modified: Tue, 30 Mar 2021 03:30:52 GMT
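On the server side, the If-Modified-Since check might look like the following sketch. handleConditionalGet and its response shape are hypothetical. The file's modification time is truncated to whole seconds, the same precision as the header, which is exactly why an edit within the same second goes unnoticed.

```javascript
// Hypothetical server-side helper: choose 304 or 200 using Last-Modified.
// reqHeaders uses lower-case names; lastModifiedMs is the file's mtime in milliseconds.
function handleConditionalGet(reqHeaders, lastModifiedMs) {
  // HTTP dates only have one-second precision, so drop the milliseconds.
  const lastModifiedSec = Math.floor(lastModifiedMs / 1000);
  const since = reqHeaders['if-modified-since'];
  if (since !== undefined) {
    const sinceMs = Date.parse(since);
    // Not modified after the client's copy: tell the client to reuse its cache.
    if (!Number.isNaN(sinceMs) && lastModifiedSec <= Math.floor(sinceMs / 1000)) {
      return { status: 304, headers: {} };
    }
  }
  // First request, or the file changed: send it with a fresh Last-Modified.
  return {
    status: 200,
    headers: { 'Last-Modified': new Date(lastModifiedSec * 1000).toUTCString() },
  };
}

const mtime = Date.parse('Tue, 30 Mar 2021 03:30:52 GMT');
const first = handleConditionalGet({}, mtime);
console.log(first.status, first.headers['Last-Modified']); // 200 Tue, 30 Mar 2021 03:30:52 GMT
const again = { 'if-modified-since': first.headers['Last-Modified'] };
console.log(handleConditionalGet(again, mtime).status);        // 304
console.log(handleConditionalGet(again, mtime + 500).status);  // 304: an edit in the same second is missed
console.log(handleConditionalGet(again, mtime + 2000).status); // 200
```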

Last-Modified has some drawbacks:

  • If a file is opened and saved without any real change to its content, Last-Modified is still updated. The server can then no longer match the cache and sends the same resource again
  • Because Last-Modified has one-second precision, a modification made within the same second goes undetected. The server still treats the cached copy as valid and does not return the updated resource

ETag and If-None-Match

An ETag is a unique identifier for the current version of a resource. The server generates it and returns it in the response, and a new ETag is generated whenever the resource changes. The next time the browser requests the resource, it sends the last ETag it received in the If-None-Match request header. The server compares If-None-Match with the ETag of the resource on the server to tell whether the resource has changed since the client cached it.

If the ETags do not match, the server returns 200 and the new resource (including the new ETag) to the client.

If the ETags match, the server returns 304 to tell the client to use its local cache directly.
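The ETag round trip can be sketched as follows. Real servers derive ETags in different ways; some use the file size and modification time. This sketch, with the hypothetical helpers makeEtag and handleEtagRequest, hashes the content with 32-bit FNV-1a only to stay dependency-free. If-None-Match may list several tags or be *, and it uses weak comparison, so a W/ prefix is ignored.

```javascript
// 32-bit FNV-1a over UTF-16 code units: a simple, non-cryptographic hash.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// Hypothetical ETag scheme: a quoted hash of the body, so changed content gets a new tag.
function makeEtag(body) {
  return `"${fnv1a(body)}"`;
}

// If-None-Match uses weak comparison: "W/" prefixes are ignored and "*" matches anything.
function ifNoneMatchHits(header, etag) {
  if (!header) return false;
  if (header.trim() === '*') return true;
  const opaque = (tag) => tag.trim().replace(/^W\//, '');
  return header.split(',').some((tag) => opaque(tag) === opaque(etag));
}

function handleEtagRequest(reqHeaders, body) {
  const etag = makeEtag(body);
  if (ifNoneMatchHits(reqHeaders['if-none-match'], etag)) {
    return { status: 304, headers: { ETag: etag }, body: '' }; // client keeps its copy
  }
  return { status: 200, headers: { ETag: etag }, body };
}

const first = handleEtagRequest({}, 'console.log("v1")');
const revalidate = { 'if-none-match': first.headers.ETag };
console.log(handleEtagRequest(revalidate, 'console.log("v1")').status); // 304
console.log(handleEtagRequest(revalidate, 'console.log("v2")').status); // 200
```

Because the tag depends only on content, an edit within the same second still produces a new ETag. This is the Last-Modified blind spot described above.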

Caching mechanisms

Strong caching takes precedence over negotiated caching. If the strong cache (Expires and Cache-Control) is still valid, the cache is used directly. If not, negotiated caching (Last-Modified/If-Modified-Since and ETag/If-None-Match) comes into play, and the server decides whether the cache can be used. If the negotiated cache is invalid, the server returns 200 together with the resource and a new cache identifier, which the browser stores. If it is still valid, the server returns 304 and the browser keeps using the cache. The specific flow chart is as follows:
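The flow above can be condensed into a sketch of the client side. Several parts are hypothetical: the cache-entry shape { body, storedAtMs, maxAgeSeconds, etag, lastModified } and the functions planRequest and applyResponse. Expires is left out for brevity.

```javascript
// Hypothetical cache entry: { body, storedAtMs, maxAgeSeconds, etag, lastModified }.
function planRequest(entry, nowMs) {
  // Nothing cached yet: a plain network request.
  if (!entry) return { action: 'fetch', headers: {} };

  // 1. Strong cache: still within max-age, so no request is sent at all.
  const ageSeconds = (nowMs - entry.storedAtMs) / 1000;
  if (entry.maxAgeSeconds !== undefined && ageSeconds < entry.maxAgeSeconds) {
    return { action: 'use-cache', headers: {} };
  }

  // 2. Negotiated cache: send the stored validators and let the server decide.
  const headers = {};
  if (entry.etag) headers['If-None-Match'] = entry.etag;
  if (entry.lastModified) headers['If-Modified-Since'] = entry.lastModified;
  return { action: 'revalidate', headers };
}

// After revalidating: 304 keeps the cached body, 200 replaces the whole entry.
function applyResponse(entry, response, nowMs) {
  if (response.status === 304) return { ...entry, storedAtMs: nowMs };
  return {
    body: response.body,
    storedAtMs: nowMs,
    maxAgeSeconds: response.maxAgeSeconds,
    etag: response.etag,
    lastModified: response.lastModified,
  };
}

const cached = { body: 'old', storedAtMs: 0, maxAgeSeconds: 60, etag: '"abc"' };
console.log(planRequest(cached, 30 * 1000).action);   // use-cache
console.log(planRequest(cached, 120 * 1000).headers); // { 'If-None-Match': '"abc"' }
```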

You may wonder: what does the browser do if no caching policy is set at all?

In that case, the browser falls back to a heuristic. It typically caches the resource for 10% of the difference between the Date and Last-Modified values in the response headers.
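As a worked example of that heuristic, if Date is 10 days after Last-Modified, the resource is treated as fresh for about 1 day. heuristicFreshnessMs is a hypothetical helper. The 10% factor is the typical value mentioned above, not a fixed rule.

```javascript
// Hypothetical helper: heuristic freshness = 10% of (Date - Last-Modified).
function heuristicFreshnessMs(dateHeader, lastModifiedHeader) {
  const date = Date.parse(dateHeader);
  const lastModified = Date.parse(lastModifiedHeader);
  // Without two valid dates in the right order there is nothing to base a guess on.
  if (Number.isNaN(date) || Number.isNaN(lastModified) || date <= lastModified) return 0;
  return Math.floor((date - lastModified) / 10);
}

const ms = heuristicFreshnessMs('Fri, 09 Apr 2021 03:30:52 GMT', 'Tue, 30 Mar 2021 03:30:52 GMT');
console.log(ms / (24 * 60 * 60 * 1000)); // 1 (day)
```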

Applying cache policies in real scenarios

Resources that should not be cached

1. Disable caching with meta tags

To stop the browser from reading the cache, add the following to the HTML head tag:

<meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate" />
<meta http-equiv="Pragma" content="no-cache" />
<meta http-equiv="Expires" content="0" />

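Cache-Control is for HTTP/1.1, Pragma is for HTTP/1.0, and Expires is for proxies. Keep in mind that modern browsers largely ignore cache directives in meta tags, so sending them as real HTTP response headers is the reliable approach.

The downside is that the browser cannot use the cache even when a resource has not changed, which hurts the user experience.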

2. Add a version number to JS and CSS URLs

When requesting JS/CSS, append a version number to the end of the URL. When the version changes, the URL changes, so the browser treats the file as a new resource and does not read the cached old version. For example, with v=1.0 and v=2.0:

custom.css?v=1.0
main.js?v=2.0

The version number can also be a random number, but that defeats the purpose of a version number. The browser can then never use the cache, even when the resource has not changed, which hurts the experience. A random version number comes from a random function, so it can only be added by a JS statement that writes out the tag, for example:

// "</s" + "cript>" is split so the string does not close the enclosing <script> tag
document.write("<script src='test.js?v=" + Math.random() + "'></s" + "cript>")

Or:

var js = document.createElement('script')
js.src = 'test.js' + '?v=' + Math.random()
document.body.appendChild(js)

3. Add an MD5 hash

An MD5 hash works like a file's ID number. In practice, files with different content get different MD5 values, and modifying a file changes its MD5. We can therefore use the MD5 to tell whether a resource has changed. Of course, we can't add these by hand one by one. This is a job for the server side, so we won't go into more detail here.
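On the server or in a build step, the MD5 idea usually turns into fingerprinted file names. The helper hashedFileName and the 8-character digest length are hypothetical choices. The sketch uses Node's built-in crypto module.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: insert a short MD5 digest of the content into the file name,
// e.g. "main.js" -> "main.<digest>.js". New content gives a new name and thus a new URL.
function hashedFileName(fileName, content, digestLength = 8) {
  const digest = createHash('md5').update(content).digest('hex').slice(0, digestLength);
  const dot = fileName.lastIndexOf('.');
  if (dot === -1) return `${fileName}.${digest}`;
  return `${fileName.slice(0, dot)}.${digest}${fileName.slice(dot)}`;
}

console.log(hashedFileName('main.js', '')); // main.d41d8cd9.js (MD5 of empty content)
```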

Frequently changing resources

Cache-Control: no-cache

For resources that change frequently, Cache-Control: no-cache makes the browser check with the server on every request. ETag or Last-Modified is then used to verify whether the cached resource is still valid. This does not reduce the number of requests, but it can significantly reduce the size of the response data.

Resources that rarely change

Cache-Control: max-age=31536000

Such resources are typically given a large max-age in Cache-Control, such as max-age=31536000 (one year), so that later browser requests for the same URL hit the strong cache. To handle updates, a dynamic part such as a hash or version number is added to the file name (or path). Changing that part changes the referenced URL, so the previous strong-cache entry stops being used. It is not invalidated immediately; it is simply no longer requested. Libraries hosted online (jquery-3.3.1.min.js, lodash.min.js, etc.) use this pattern.
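A server could choose between the two policies above based on whether a URL carries a fingerprint. cacheControlFor and its naming rules are hypothetical conventions for this sketch: an 8+ character hex hash between dots, or a -x.y.z version before .js.

```javascript
// Hypothetical convention: "main.d41d8cd9.js" (content hash) or "jquery-3.3.1.min.js" (version)
// means the URL changes whenever the content does, so the file can be cached for a year.
const FINGERPRINT = /\.[0-9a-f]{8,}\./;
const VERSIONED = /-\d+\.\d+\.\d+(\.min)?\.js$/;

function cacheControlFor(path) {
  if (FINGERPRINT.test(path) || VERSIONED.test(path)) {
    return 'max-age=31536000'; // rarely changing: strong cache for one year
  }
  return 'no-cache'; // frequently changing (e.g. HTML): revalidate every time
}

console.log(cacheControlFor('/index.html'));             // no-cache
console.log(cacheControlFor('/static/main.d41d8cd9.js')); // max-age=31536000
console.log(cacheControlFor('/lib/jquery-3.3.1.min.js')); // max-age=31536000
```

The HTML page itself stays on no-cache so that it always points at the latest fingerprinted URLs.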

The impact of user behavior on browser caching

User behavior affects which cache policies the browser applies. There are three main cases:

  • Opening a page by entering its address in the address bar: the browser checks the disk cache for a match and uses it if found; otherwise, it sends a network request.
  • Normal refresh (F5): since the tab has not been closed, the memory cache is available and is used first (if it matches), followed by the disk cache.
  • Forced refresh (Ctrl + F5): the browser bypasses the cache, so requests are sent with the header Cache-Control: no-cache (plus Pragma: no-cache for compatibility), and the server returns 200 with the latest content.
