Front-end advanced: Best practices and considerations for web site cache control strategies

For a website, performance is a core part of user experience: the faster the site opens, the more users you retain. If a page takes 10 seconds to open, even well-designed interaction is wasted.

Cache control is one of the most common and important parts of site performance optimization. Besides improving performance, it also pays off financially: a better cache policy means fewer requests, less traffic, and less bandwidth, which saves a good deal of server or CDN cost.

The cache control policy here means the HTTP caching policy. The most effective policy is usually very simple. At the simplest level, all you need to know about HTTP caching is one header: Cache-Control.

A good caching policy needs only two parts, both controlled by Cache-Control alone:

Fingerprinted resources: permanent cache
Resources without fingerprints: Check the freshness each time

The diagram is as follows:

Fingerprinted resources: permanent cache

Cache-Control: max-age=31536000

As the martial-arts saying goes, every technique can be beaten except speed. The fastest way to request a resource is not to send a request to the server at all.

Static resources carry a hash value in their URL; this is the fingerprint
Setting an expiration time of one year for a resource, i.e. 31536000 seconds, is generally considered a permanent cache
While a resource is permanently cached, the browser does not need to send a request to the server for it

Why can resources with hash values be cached permanently?

Because when the contents of a file change, a URL with a new hash value is generated, and the front end requests the new URL instead. The content behind the old URL never changes, so its cached copy can never be stale.
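
A minimal sketch of that idea, assuming the fingerprint is a short hash of the file contents inserted before the extension. The helper names (fnv1aHex, fingerprintedName) are hypothetical, and the small FNV-1a-style hash is only a stand-in for whatever hash function a real bundler uses.

```typescript
// Illustrative 32-bit FNV-1a-style hash over UTF-16 code units.
// Assumption: a stand-in for the hash a real build tool would compute.
function fnv1aHex(content: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < content.length; i++) {
    hash ^= content.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts, then wrap to 32 bits.
    hash =
      (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  // Left-pad to a fixed width of 8 hex characters.
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Inserts the content hash before the file extension: main.js -> main.<hash>.js
function fingerprintedName(fileName: string, content: string): string {
  const dot = fileName.lastIndexOf(".");
  const base = dot === -1 ? fileName : fileName.slice(0, dot);
  const ext = dot === -1 ? "" : fileName.slice(dot);
  return base + "." + fnv1aHex(content) + ext;
}

// Two versions of the same source file (hypothetical contents).
const fingerprintV1 = fingerprintedName("main.js", "console.log(1)");
const fingerprintV2 = fingerprintedName("main.js", "console.log(2)");
// The names differ, so a browser that cached the first URL must fetch the second as a new resource.
```

Unchanged content always maps to the same name, which is what makes the one-year max-age safe.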

Resources without fingerprints: Check the freshness each time

Cache-Control: no-cache

Since there is no fingerprint, the freshness of the resource must be verified each time (otherwise the copy taken from the cache may be stale)
If it is validated as the latest version, the resource is loaded from the browser’s cache

index.html is a resource without a fingerprint. If it is put in the cache, how can you ensure that the browser gets the fresh version after the server updates it?

Therefore, with Cache-Control: no-cache, the client checks freshness with the server on every request.
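
The two-part policy can be sketched as a single function that picks the header value from the URL. This is only an illustration: the function name cacheControlFor and the rule used to recognize a fingerprint (8 or more hex characters between dots) are assumptions, not something the policy itself prescribes.

```typescript
// Assumption: a fingerprint looks like ".3f2a9c1b." (8+ lowercase hex chars between dots).
const FINGERPRINT_PATTERN = /\.[0-9a-f]{8,}\./;

function cacheControlFor(urlPath: string): string {
  if (FINGERPRINT_PATTERN.test(urlPath)) {
    // Fingerprinted resource: cache for one year, effectively permanent.
    return "max-age=31536000";
  }
  // No fingerprint: the response may be stored, but must be revalidated before each use.
  return "no-cache";
}
```

With this rule, a path such as /static/app.3f2a9c1b.js gets the one-year policy, while /index.html is revalidated on every request.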

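PS: What is the difference between no-cache and no-store? In short, no-cache allows a cache to store the response but requires it to be revalidated with the server before each reuse, while no-store forbids storing the response at all.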

Even though freshness is checked every time, the resource does not have to be downloaded from the server every time: if the copy in the browser or CDN cache is still up to date, the server answers with HTTP status code 304 (Not Modified) and the cached copy is reused. This is called a negotiated cache.

Fortunately, you don’t need to manage or configure negotiated caches yourself, as nginx and some OSS (object storage services) configure them automatically.

They have their own algorithms for the negotiated cache, based on the Last-Modified/ETag response headers. Each time the browser requests a resource, it sends the ETag/Last-Modified from the server’s last response as a validator, and the server compares it with the resource’s current ETag/Last-Modified to determine whether the content has changed.
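
The comparison can be sketched as follows. This is a simplified model, not how nginx or any particular server implements it: the type and function names are hypothetical, the validators arrive in the standard If-None-Match and If-Modified-Since request headers, and details such as weak ETags or multiple ETags in one header are ignored.

```typescript
interface StoredFile {
  etag: string;         // current ETag of the resource, e.g. '"v2"'
  lastModified: string; // current Last-Modified, as an HTTP date string
  body: string;
}

interface ConditionalHeaders {
  ifNoneMatch?: string;     // ETag the client received last time
  ifModifiedSince?: string; // Last-Modified the client received last time
}

interface RevalidationResult {
  statusCode: number;
  body: string | null;
}

function revalidate(req: ConditionalHeaders, file: StoredFile): RevalidationResult {
  let unchanged = false;
  if (req.ifNoneMatch !== undefined) {
    // When an ETag is supplied it takes precedence over the date check.
    unchanged = req.ifNoneMatch === file.etag;
  } else if (req.ifModifiedSince !== undefined) {
    unchanged = Date.parse(file.lastModified) <= Date.parse(req.ifModifiedSince);
  }
  // 304 carries no body: the client reuses its cached copy.
  return unchanged
    ? { statusCode: 304, body: null }
    : { statusCode: 200, body: file.body };
}
```

A request without any validator, or with an outdated one, falls through to a normal 200 response with the full body.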

How is the ETag value generated in the HTTP response header?

Under the hood, Last-Modified is usually derived from the mtime attribute in the file system. ETag offers finer granularity than Last-Modified and is generated either from a hash of the file contents or from the file’s mtime and size. That is a topic for later, though.
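
As a rough illustration of the mtime/size variant, both validators can be derived from file metadata alone. The function names and the exact ETag format below (hex mtime, a hyphen, hex size, wrapped in quotes) are assumptions made for this sketch; real servers choose their own formats.

```typescript
// Assumption: mtime is given in whole seconds since the epoch, size in bytes.
function etagFromStat(mtimeSeconds: number, sizeBytes: number): string {
  // Both values are rendered in lowercase hexadecimal and quoted, as ETag values are.
  return '"' + mtimeSeconds.toString(16) + "-" + sizeBytes.toString(16) + '"';
}

// Last-Modified is simply the mtime rendered as an HTTP date.
function lastModifiedFromStat(mtimeSeconds: number): string {
  return new Date(mtimeSeconds * 1000).toUTCString();
}
```

Because the size is part of the ETag, a file rewritten within the same second but with a different length still gets a new validator, which Last-Modified alone could not express.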

Be sure to add a Cache-Control response header to your resources

I often come across sites whose resource files do not have Cache-Control headers. The reason is that ownership of the cache policy configuration is unclear, and it sometimes requires coordination between front-end and operations teams.

So what happens if you don’t add a Cache-Control response header?

Does the browser automatically check freshness with the server every time? Unfortunately, no. In this case the resource is still forcibly cached for a default period, so stale copies of resources without fingerprints are likely to be served. If the stale resource sits in the browser, a forced refresh can fetch the latest version. If it sits on a CDN edge node, however, refreshing the CDN is much more complicated and may require cooperation from several parties.

What is the default forced cache time?

First, clarify what two response headers represent:

Date: the time at which the origin server generated the response message, roughly the time the request reached the origin server
Last-Modified: the time at which the static resource was last modified (its mtime)

Under the LM-factor algorithm, if a response has no Cache-Control header, the longer ago the resource was last modified (relative to Date), the longer the forced cache time derived for it.

The formula is expressed as follows, where factor is between 0 and 1:

MaxAge = (Date - LastModified) * factor
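
A worked version of the formula, as a sketch. The function name is hypothetical and the default factor of 0.1 is an assumption chosen for the example; the formula itself only requires a value between 0 and 1.

```typescript
// Returns the heuristic cache lifetime in seconds: (Date - LastModified) * factor.
// Assumption: factor defaults to 0.1 for illustration.
function heuristicMaxAgeSeconds(date: Date, lastModified: Date, factor: number = 0.1): number {
  const elapsedMs = date.getTime() - lastModified.getTime();
  if (elapsedMs <= 0) {
    // A Last-Modified in the future gives no basis for heuristic caching.
    return 0;
  }
  return Math.floor((elapsedMs / 1000) * factor);
}

// Example: the response was generated 10 days after the file was last modified.
const exampleDate = new Date(Date.UTC(2024, 0, 11));
const exampleLastModified = new Date(Date.UTC(2024, 0, 1));
const exampleMaxAge = heuristicMaxAgeSeconds(exampleDate, exampleLastModified);
// 10 days * 0.1 = 1 day, i.e. 86400 seconds of forced caching.
```

So a file that has not changed for a year could be cached for weeks without any Cache-Control header, which is exactly the risk described above.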

Bundle Splitting: Minimize resource changes

Thanks to the development of single-page applications and front-end engineering, almost all resources are packaged with fingerprint information, which means that all resources can be permanently cached. The packaging strategy is shown below:

But is that all?

If all your JS resources are packaged into one file, that file does enjoy permanent caching. But when a single line of code is modified, the fingerprint of the whole large bundle changes and its permanent cache is invalidated.

So the goal now is to invalidate as little of the cache as possible when files change. Bundlers such as webpack build many performance optimizations into their optimization settings, but they don’t do this for you; you need to configure it yourself.
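
To make the cost concrete, here is a small calculation with made-up numbers. The chunk names, module lists, and sizes are all hypothetical, and the model is simplified: it only counts chunks that directly contain the changed module.

```typescript
interface BundleChunk {
  chunkName: string;
  sizeKb: number;
  modules: string[];
}

// Total size of the chunks whose fingerprint changes when `changedModule` is edited.
function invalidatedKb(chunks: BundleChunk[], changedModule: string): number {
  let total = 0;
  for (let i = 0; i < chunks.length; i++) {
    if (chunks[i].modules.indexOf(changedModule) !== -1) {
      total += chunks[i].sizeKb;
    }
  }
  return total;
}

// Hypothetical layouts for the same application code.
const singleBundle: BundleChunk[] = [
  { chunkName: "main", sizeKb: 1000, modules: ["react", "lodash", "pageA", "pageB"] },
];
const layeredBundles: BundleChunk[] = [
  { chunkName: "react", sizeKb: 150, modules: ["react"] },
  { chunkName: "vendor", sizeKb: 250, modules: ["lodash"] },
  { chunkName: "pageA", sizeKb: 300, modules: ["pageA"] },
  { chunkName: "pageB", sizeKb: 300, modules: ["pageB"] },
];

// Editing page A invalidates the whole 1000 KB bundle, but only 300 KB of the layered build.
const singleBundleCost = invalidatedKb(singleBundle, "pageA");
const layeredCost = invalidatedKb(layeredBundles, "pageA");
```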

To achieve this, resources can be bundled in layers according to how often they change. One suggested scheme:

webpack-runtime: the webpack version used by an application is fairly stable, so the runtime is split out to stay permanently cached for a long time
react/react-dom: react versions are also updated infrequently
vendor: commonly used third-party modules such as lodash and classnames are bundled together; almost every page references them, but they are updated more frequently. Rarely used third-party modules should not be bundled here
pageA: page A; when the components of page A change, its cache is invalidated
pageB: page B
echarts: a rarely used, oversized third-party module, bundled separately
mathjax: a rarely used, oversized third-party module, bundled separately
jspdf: a rarely used, oversized third-party module, bundled separately
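
The layers above could be expressed roughly like this in a webpack configuration. Treat it as a sketch under assumptions rather than a drop-in config: the priorities and the variable name are illustrative, the page chunks are assumed to come from separate entries or dynamic imports (not shown), and a real project would export this object from its webpack config file.

```typescript
// Sketch of the relevant part of a webpack configuration object.
const layeredCacheConfig = {
  output: {
    // [contenthash] puts a content-based fingerprint in every emitted file name.
    filename: "[name].[contenthash:8].js",
  },
  optimization: {
    // Emit the webpack runtime as its own small chunk.
    runtimeChunk: { name: "webpack-runtime" },
    splitChunks: {
      chunks: "all",
      cacheGroups: {
        // Rarely updated framework code.
        react: {
          test: /[\\/]node_modules[\\/](react|react-dom)[\\/]/,
          name: "react",
          priority: 30,
        },
        // Large, rarely used libraries each get their own chunk.
        echarts: { test: /[\\/]node_modules[\\/]echarts[\\/]/, name: "echarts", priority: 20 },
        mathjax: { test: /[\\/]node_modules[\\/]mathjax[\\/]/, name: "mathjax", priority: 20 },
        jspdf: { test: /[\\/]node_modules[\\/]jspdf[\\/]/, name: "jspdf", priority: 20 },
        // Common third-party modules referenced by almost every page.
        vendor: {
          test: /[\\/]node_modules[\\/](lodash|classnames)[\\/]/,
          name: "vendor",
          priority: 10,
        },
      },
    },
  },
};
```

The higher priority on the react group ensures those modules land in their own chunk even if a broader group would also match them.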

With the spread of HTTP/2, and multiplexing in particular, loading the initial page’s static resources is far less sensitive to the number of files. Therefore, for better caching and on-demand loading, many solutions also suggest bundling each third-party module separately, one chunk per module.

Summary
