1. Basic understanding of browser cache

Browser caching is divided into two kinds: the strong cache and the negotiated cache.

1. When loading a resource, the browser first uses certain HTTP headers to determine whether the resource hits the strong cache. On a hit, the browser reads the resource directly from its own cache and sends no request to the server. For example, if a CSS file is configured for strong caching and the cache is hit when the browser loads the page that references it, the browser loads the CSS straight from the cache without contacting the server.

2. When the strong cache is missed, the browser sends a request to the server, and the server uses a different set of HTTP headers to check whether the resource hits the negotiated cache. On a hit, the server does not send back the resource content; instead it tells the client that the resource can be loaded from the cache, and the browser then loads it from its own cache.

Strong caching and negotiated caching have one thing in common: if a hit is made, the resource is loaded from the client cache, not from the server. The difference is that the strong cache does not send requests to the server, whereas the negotiated cache does.

When the negotiated cache is missed as well, the browser loads the resource data from the server.
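The decision order described above can be sketched as a small function. This is only an illustrative model, not how any browser is implemented: the CachedEntry type, its two fields, and load_resource are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CachedEntry:
    """Hypothetical record of a resource already stored in the browser cache."""
    strong_cache_valid: bool   # the strong cache has not expired yet
    server_unchanged: bool     # the server would confirm the copy is still current


def load_resource(entry: Optional[CachedEntry]) -> Tuple[str, bool]:
    """Return (where the content comes from, whether a request reaches the server)."""
    if entry is None:
        return ("server", True)    # nothing cached yet: normal download
    if entry.strong_cache_valid:
        return ("cache", False)    # strong cache hit: no request is sent
    if entry.server_unchanged:
        return ("cache", True)     # negotiated cache hit: request sent, content read from cache
    return ("server", True)        # both missed: full content comes from the server
```

Both kinds of hit return "cache" as the source; the only difference is the second value, which says whether the server was contacted.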

2. The principle of the strong cache

2.1 Introduction

When a browser request for a resource hits the strong cache, the HTTP status shown is 200, and the Size column in the Network panel of Chrome developer tools shows from cache (newer Chrome versions show (memory cache) or (disk cache)). For example, the Jingdong homepage has many static resources configured for strong caching. Open the developer tools with F12 and look at the Network panel, and you can see that many requests are loaded from the cache.

Strong caching is implemented with the Expires or Cache-Control HTTP response header, both of which indicate how long a resource remains valid in the client cache.

Expires is a header introduced in HTTP/1.0 that represents a resource's expiration time. It describes an absolute time, returned by the server as a string in GMT format, for example: Expires: Thu, 31 Dec 2037 23:55:55 GMT

2.2 Expires Caching Principle

1. The first time the browser requests a resource from the server, the server returns the resource and adds an Expires header to the response headers.

2. When the browser receives the resource, it caches the resource along with all the response headers.

3. When the browser requests the resource again, it first looks in the cache. Once it finds the resource, it compares the resource's Expires time with the current request time. If the request time is earlier than the time given by Expires, the cache is hit; otherwise it is not.

4. If the cache is not hit, the browser loads the resource directly from the server, and the Expires header is updated on that reload.
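The comparison in step 3 can be sketched with the Python standard library. The function name expires_hit is an assumption made for this example, and treating an unparseable Expires value as a miss is a choice made here, not something the steps above specify.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def expires_hit(expires_header: str, request_time: datetime) -> bool:
    """Return True when the request time is earlier than the Expires time."""
    try:
        expires_at = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        return False  # assumption: a value that cannot be parsed counts as a miss
    return request_time < expires_at


# Example with the header value quoted earlier in this section.
header = "Thu, 31 Dec 2037 23:55:55 GMT"
hit_in_2030 = expires_hit(header, datetime(2030, 1, 1, tzinfo=timezone.utc))
hit_in_2038 = expires_hit(header, datetime(2038, 1, 1, tzinfo=timezone.utc))
```

Because request_time comes from the client clock while the Expires value comes from the server, a client clock set far in the past or future changes the outcome.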

Expires is an older strong cache header. Because it is an absolute time returned by the server, cache management is prone to problems when the server time and the client time differ significantly; for example, arbitrarily changing the client clock can affect whether the cache is hit. For this reason HTTP/1.1 introduced a new header, Cache-Control, which describes a relative time expressed in seconds, for example: Cache-Control: max-age=315360000

2.3 Cache-Control caching mechanism

1. The first time the browser requests a resource from the server, the server returns the resource and adds a Cache-Control header to the response headers.

2. When the browser receives the resource, it caches it along with all the response headers.

3. When the browser requests the resource again, it first looks in the cache. Once it finds the resource, it calculates an expiration time from the time of the first request and the validity period set in Cache-Control, then compares that expiration time with the current request time. If the request time is earlier than the expiration time, the cache is hit; otherwise it is not.

4. If the cache is not hit, the browser loads the resource directly from the server, and the Cache-Control header is updated on that reload.

Cache-Control describes a relative time, and the cache-hit check is made entirely against the client clock, so a difference between server time and client time no longer matters. Cache-Control is therefore a more effective and safer way to manage the cache than Expires.
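As a rough illustration of the max-age calculation, the sketch below extracts the number of seconds from a Cache-Control value and adds it to the time of the first request. The helper names are hypothetical, and only the max-age directive is handled; other Cache-Control directives are ignored.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Extract the max-age value in seconds, or None when it is absent."""
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def cache_control_hit(cache_control: str, first_request_time: datetime,
                      request_time: datetime) -> bool:
    """Hit when the request time is earlier than first request time + max-age."""
    seconds = max_age_seconds(cache_control)
    if seconds is None:
        return False
    return request_time < first_request_time + timedelta(seconds=seconds)


first = datetime(2024, 1, 1, tzinfo=timezone.utc)
still_valid = cache_control_hit("max-age=60", first, first + timedelta(seconds=30))
expired = cache_control_hit("max-age=60", first, first + timedelta(seconds=61))
```

Both timestamps in the comparison are taken on the client, which is why clock differences between server and client do not affect the result.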

When both Expires and Cache-Control are present in the response headers, Cache-Control takes precedence over Expires.
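The precedence rule can be expressed as a single lookup that consults Expires only when Cache-Control carries no max-age. This is a simplified sketch: expiration_time is a hypothetical helper, headers are assumed to arrive as a plain dictionary, and directives other than max-age are not considered.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def expiration_time(headers: Dict[str, str],
                    first_request_time: datetime) -> Optional[datetime]:
    """Compute when a cached response expires, preferring Cache-Control."""
    cache_control = headers.get("Cache-Control", "")
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    if match:
        # Cache-Control wins: relative lifetime counted from the first request.
        return first_request_time + timedelta(seconds=int(match.group(1)))
    expires = headers.get("Expires")
    if expires:
        try:
            return parsedate_to_datetime(expires)  # absolute time set by the server
        except (TypeError, ValueError):
            return None
    return None  # no strong cache information at all
```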

3. Strong cache management

The principle of the strong cache was introduced above. In practice, some scenarios call for strong caching and others do not. There are usually two ways to control whether it is enabled:

1. Add the Expires and Cache-Control headers in code to the response returned by the web server.

2. Configure the web server to add the Expires and Cache-Control headers when it serves resources.

In a Java web application, for example, code can set these headers to enable strong caching.

Java code can likewise be used to disable strong caching.
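The headers involved are the same whatever the server language. As a language-neutral illustration (written in Python rather than Java, with hypothetical helper names), the sketch below builds the header values for both cases. The exact values used to disable caching are a common convention assumed here, not something specified in this article.

```python
import time
from email.utils import formatdate
from typing import Dict, Optional


def strong_cache_headers(max_age_seconds: int,
                         now: Optional[float] = None) -> Dict[str, str]:
    """Headers that let the browser reuse the response for max_age_seconds."""
    now = time.time() if now is None else now
    return {
        "Cache-Control": f"max-age={max_age_seconds}",
        # Absolute counterpart of max-age, formatted as an HTTP date in GMT.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


def no_cache_headers() -> Dict[str, str]:
    """Headers commonly sent to switch strong caching off (assumed convention)."""
    return {
        "Cache-Control": "no-cache, no-store, must-revalidate",
        "Pragma": "no-cache",
        "Expires": "0",
    }
```

A server handler would copy the returned dictionary into its response headers before writing the body.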

Nginx and Apache, as dedicated web servers, have configuration files in which Expires and Cache-Control can be set. If you are interested in operations, search Baidu for "nginx setting Expires Cache-Control" or "Apache setting Expires Cache-Control" to find plenty of relevant articles.

Strong caching is usually not specially configured during development, yet browsers cache static resources such as images, CSS, and JS by default. As a result, the latest changes often do not show up in the development environment because strongly cached resources are not refreshed in time. There are many ways to deal with this; the common ones are listed below.

Dealing with caching issues

1. Press Ctrl + F5. This solves the update problem for resources referenced directly by the page.

2. Use the browser's private browsing mode for development.

3. In Chrome, open the developer tools with F12 and disable the cache in the Network panel (this is a very effective method).

4. During development, append a dynamic parameter to the resource URL, such as css/index.css?v=0.0001. Every time the resource changes, the reference has to be updated with a new parameter value, so this is not very convenient unless you are developing in dynamic pages such as JSP, where a server variable can be used (v=${sysRnd}), or you use a front-end build tool to handle the parameter change.

5. If the page that references the resource is embedded in an iframe, right-click inside the iframe area and reload the frame (taking Chrome as an example).

6. If caching problems occur with Ajax requests, the most effective solution is to append a random number to the Ajax request URL.

7. If the src of an iframe is set dynamically, you may not see the latest result because of caching. In this case, appending a random number to the src being set also solves the problem.
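Items 4, 6, and 7 all come down to changing the query string so that the browser treats the URL as a new resource. A minimal sketch follows; the function names and the "_" parameter used for the random value are assumptions made for illustration.

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_version(url: str, version: object, param: str = "v") -> str:
    """Return url with param set to version, replacing any earlier value."""
    parts = urlsplit(url)
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != param]
    query.append((param, str(version)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def with_random(url: str) -> str:
    """Append a random token so every request uses a different URL."""
    return with_version(url, secrets.token_hex(4), param="_")


css_url = with_version("css/index.css", "0.0001")
api_url = with_version("/api/list?page=2&v=1", 7)
```

A fixed version still allows caching between releases, whereas a random value defeats the cache on every request, so the random form is better kept to Ajax calls and development.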

If you develop with front-end tools such as Grunt, gulp, or webpack and start a static server through one of their plugins, such as grunt-contrib-connect, you don't need to worry about resource updates during development, because every response returned by that static server has Cache-Control set to not cache.

4. The application of the strong cache

Strong caching is the most powerful tool for front-end performance optimization, bar none. A web page with a large number of static resources should use it to improve response speed. A common approach is to set a far-future Expires or Cache-Control lifetime on all of these static resources, so that a user requests them from the server only on the first load; after that, as long as the cache has not expired and the user does not force a refresh, they are loaded from the user's own cache. For example, as mentioned above, the cache expiration time on the Jingdong homepage is set to 2026.

However, this cache configuration brings a new problem: updating resources at release time. Take an image: once a user has visited the first version of the site, the image is cached on the user's computer. When the site releases a new version that replaces the image, a user who has already visited the first version will, because of the cache settings, not request the latest image from the server by default, and will not see the new image unless they clear or disable the cache or force a refresh.

The solutions discussed are theoretical, but many front-end tools solve this problem in practice. Each tool involves too many details to cover in this article. If you are interested, look into Grunt, gulp, webpack, FIS, and EDP, all of which can handle it. FIS and EDP are front-end development platforms launched by Baidu, with documentation available for reference:

fis.baidu.com/fis3/api/in…

ecomfe.github.io/edp/doc/ini…

Besides server-side pages, which can be regarded as dynamic resources, the HTML that references static resources can also be regarded as a dynamic resource. If this HTML were cached too, there might be no mechanism for telling the browser that it has been updated, especially in applications with separated front end and back end, where the pages are pure HTML and each URL may point directly at an HTML file. Such pages are therefore usually not cached, so that the browser always requests the latest version from the server when visiting them.
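One way to picture this split is a policy function that picks a Cache-Control value from the file type. The extension list, the function name, and the choice of no-cache for pages are assumptions made for this sketch; a real site would adjust all three.

```python
import posixpath

# Assumed set of static resource types that receive a long lifetime.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".gif"}


def cache_control_for(path: str) -> str:
    """Long-lived strong cache for static files, always check the server for pages."""
    extension = posixpath.splitext(path)[1].lower()
    if extension in STATIC_EXTENSIONS:
        return "max-age=315360000"
    # HTML and server pages: the browser must go back to the server each time.
    return "no-cache"
```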

5. The principle of the negotiated cache

5.1 Introduction

When a browser's request for a resource does not hit the strong cache, the browser sends a request to the server to check whether the negotiated cache is hit. If it is, the response has HTTP status 304 and the status text Not Modified. For example, open the Jingdong homepage, press F12 to open the developer tools, press F5 to refresh the page, and look at the Network panel. You can see that many requests hit the negotiated cache.

If you look at the response headers of a single request, you can also see the 304 status code and the Not Modified text, which indicate that the resource hit the negotiated cache and was loaded from the client cache rather than fetched fresh from the server.

5.2 Last-Modified and If-Modified-Since control the negotiated cache

1. The first time the browser requests a resource from the server, the server returns the resource and adds a Last-Modified header to the response headers. This header indicates the time the resource was last modified on the server.

2. When the browser requests the resource again, it adds an If-Modified-Since header to the request headers. Its value is the Last-Modified value returned with the previous response.

3. When the server receives the request again, it uses the If-Modified-Since value sent by the browser and the resource's last modification time on the server to decide whether the resource has changed. If it has not changed, the server returns 304 Not Modified without the resource content; if it has changed, the server returns the resource content as normal. When the server returns 304 Not Modified, it does not add the Last-Modified header to the response headers: since the resource has not changed, Last-Modified has not changed either. These are the response headers when the server returns 304.

4. When the browser receives the 304 response, it loads the resource from the cache.

5. If the negotiated cache is not hit, the browser loads the resource directly from the server and the Last-Modified header is updated on that reload. On the next request, If-Modified-Since uses the Last-Modified value returned the previous time.
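The server side of these steps can be sketched as follows. This is a simplified model rather than the behavior of any particular web server: respond_last_modified is a hypothetical function, request headers are assumed to be a plain dictionary, and modified_at is assumed to be a UTC datetime.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Tuple


def respond_last_modified(request_headers: Dict[str, str], body: bytes,
                          modified_at: datetime) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, response headers, body) for one request."""
    modified_at = modified_at.replace(microsecond=0)  # HTTP dates have 1 s resolution
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            if modified_at <= parsedate_to_datetime(since):
                # Unchanged: 304 with no content and, as described above, no Last-Modified.
                return 304, {}, b""
        except (TypeError, ValueError):
            pass  # unusable header value: fall through to a full response
    headers = {"Last-Modified": format_datetime(modified_at, usegmt=True)}
    return 200, headers, body


# First request, then a repeat request that echoes the Last-Modified value back.
changed_at = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
first = respond_last_modified({}, b"body", changed_at)
repeat = respond_last_modified({"If-Modified-Since": first[1]["Last-Modified"]},
                               b"body", changed_at)
```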

Last-Modified and If-Modified-Since are both based on server time. In general, this pair of headers manages the negotiated cache very reliably, provided the server time is not adjusted and the client cache is not tampered with. Sometimes, however, a resource on the server really has changed but its last modification time has not. This problem is hard to track down, and when it occurs it undermines the reliability of the negotiated cache. For that reason there is a second pair of headers for managing the negotiated cache: ETag and If-None-Match. They manage the cache as follows.

5.3 ETag and If-None-Match control the negotiated cache

1. The first time the browser requests a resource from the server, the server returns the resource and adds an ETag header to the response headers. This header is a unique identifier that the server generates from the requested resource. The identifier is a string, and it changes whenever the resource changes, regardless of the last modification time, so it complements Last-Modified well.

2. When the browser requests the resource from the server again, it adds an If-None-Match header to the request headers. Its value is the ETag value returned with the previous response.

3. When the server receives the request again, it generates a new ETag for the resource and compares it with the If-None-Match value sent by the browser. If the two values are the same, the resource has not changed; if they differ, it has changed. If there is no change, the server returns 304 Not Modified without the resource content; if there is a change, it returns the resource content as normal. Unlike Last-Modified, when the server returns 304 Not Modified the response headers still include the ETag, because it has been regenerated, even though its value is the same as before.

4. When the browser receives the 304 response, it loads the resource from the cache.
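The same exchange can be sketched for ETag. Deriving the identifier from a SHA-256 hash of the content is an assumption made for this example; real servers use various schemes, and the function names are hypothetical.

```python
import hashlib
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    """Build an identifier from the content, so any change to the content changes it."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond_etag(request_headers: Dict[str, str],
                 body: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, response headers, body) for one request."""
    etag = make_etag(body)  # regenerated on every request
    if request_headers.get("If-None-Match") == etag:
        # Unchanged: no content, but the regenerated ETag is still returned.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


first = respond_etag({}, b"body { color: red }")
repeat = respond_etag({"If-None-Match": first[1]["ETag"]}, b"body { color: red }")
changed = respond_etag({"If-None-Match": first[1]["ETag"]}, b"body { color: blue }")
```

Because the identifier depends only on the content here, it does not depend on the file's modification time.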

6. Management of the negotiated cache

The negotiated cache differs from the strong cache. With the strong cache no request is sent to the server, so the browser sometimes does not know that a resource has been updated. With the negotiated cache a request is always sent, so the server always knows whether the resource has been updated. Most web servers enable the negotiated cache by default, with both Last-Modified/If-Modified-Since and ETag/If-None-Match turned on, for example Apache:

Without the negotiated cache, every request to the server would have to return the resource content, and server performance would suffer badly.

Last-Modified/If-Modified-Since and ETag/If-None-Match are generally enabled together, to cover cases where Last-Modified is unreliable.

There is one scenario to be aware of:

In a distributed system, the Last-Modified time of a file must be consistent across machines, so that load balancing to different machines does not cause the comparison to fail.

Distributed systems should turn ETag off where possible, because each machine generates a different ETag.

In that case the returned response headers contain only Last-Modified, and no ETag:

The negotiated cache is used together with the strong cache. As the previous screenshot shows, besides the Last-Modified header there are also strong cache headers in the response, because the negotiated cache is meaningless if the strong cache is not enabled.

7. The impact of browser behavior on caching

If a resource has been cached by the browser, then until the cache expires, each new request by default first checks whether the strong cache is hit. On a hit the cache is read directly. On a miss a request is sent to the server to check the negotiated cache; if that is hit, the server tells the browser it can still read from the cache, and otherwise the server returns the latest resource. This is the default behavior, and it can be changed by browser actions:

1. Ctrl + F5 forces a refresh: the page is loaded directly from the server, skipping both the strong cache and the negotiated cache.

2. F5 refreshes the page: the strong cache is skipped, but the negotiated cache is still checked.
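The three behaviors can be summarized in a small lookup table. This merely restates the rules above in code; the action names and the function are labels invented for this sketch, and actual browser behavior can differ between browsers and versions.

```python
from typing import Dict, Tuple

# action -> (strong cache is checked, negotiated cache is checked)
CACHE_RULES: Dict[str, Tuple[bool, bool]] = {
    "normal": (True, True),      # default: strong cache first, then negotiated cache
    "f5": (False, True),         # refresh: skip strong cache, still ask the server
    "ctrl+f5": (False, False),   # forced refresh: always load from the server
}


def caches_checked(action: str) -> Tuple[bool, bool]:
    """Return which caches are consulted for the given user action."""
    return CACHE_RULES[action.lower()]
```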