The browser cache, that is, the client-side cache, is a powerful tool for optimizing static resources in web performance work, but it is also an unavoidable source of trouble for countless web developers. During development we keep trying to bypass the cache, and at release time we want a caching strategy that speeds up page loads. Understanding how browser cache hits work is fundamental to developing web applications. This article focuses on that: it covers the basics of the browser cache, summarizes ways to avoid and manage the cache, and explains cache-related problems with concrete scenarios. I hope it helps someone in need.
1. Browser cache basics
The browser cache is divided into the strong cache and the negotiated cache (data cached over HTTP is stored in the browser's cache database):
1) When loading a resource, the browser determines whether the resource matches the strong cache according to some HTTP headers of the resource. If the strong cache matches, the browser directly reads the resource from its cache without sending a request to the server. For example, if the cache configuration of a CSS file matches the strong cache when the browser loads the page, the browser loads the CSS directly from the cache without sending the request to the server where the page is located.
2) When the strong cache is not hit, the browser must send a request to the server. The server uses other HTTP headers of the request to check whether the resource hits the negotiated cache. If it does, the server returns a response without the resource data. Instead, it tells the client to load the resource from its cache, and the browser does so.
3) Strong caching and negotiated caching have one thing in common: if hit, the resource is loaded from the client cache, not from the server. The difference is that the strong cache does not send requests to the server, whereas the negotiated cache does.
4) When the negotiated cache is not hit either, the browser loads the resource data directly from the server.
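The four steps above can be sketched as a small decision function. This is only an illustration of the order of the checks, not how any real browser is implemented; `CachedEntry`, `load_resource`, `revalidate` and `fetch` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class CachedEntry:
    body: bytes
    fresh: bool  # True while the strong-cache lifetime has not expired


def load_resource(
    cached: Optional[CachedEntry],
    revalidate: Callable[[], bool],
    fetch: Callable[[], bytes],
) -> Tuple[str, bytes]:
    """Return (source, body), checking the caches in the order described above."""
    # Step 1: strong cache hit, no request is sent to the server.
    if cached is not None and cached.fresh:
        return "strong cache", cached.body
    # Step 2: ask the server; True stands for a "not modified" answer.
    if cached is not None and revalidate():
        return "negotiated cache", cached.body
    # Step 4: both caches missed, so download the full resource.
    return "server", fetch()
```

In both hit cases the body comes from `cached`; only the strong-cache branch avoids calling the server at all, which is the difference described in step 3.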
2. Two strategies for browser cache
1. Strong cache: the browser does not send a request to the server and reads the resource directly from its local cache. In the Network panel of Chrome developer tools the request shows status 200, and the resource is marked as from disk cache or from memory cache.
2. Negotiated cache: the browser sends a request to the server, and the server uses the request headers to decide whether the resource has changed. If it has not, the server returns status code 304 and tells the browser to read its local cache.
3. Principle of strong cache
When a browser request for a resource hits the strong cache, the HTTP status shown is 200, and the Size column in the Network panel of Chrome developer tools reads from cache. For example, the JD.com home page configures many static resources for strong caching. If you press F12 and look at the Network panel, you can see that many requests are loaded from the cache:
Strong caching is implemented with the Cache-Control or Expires HTTP response headers, both of which indicate how long a resource stays valid in the client cache. When both Cache-Control (a time period) and Expires (a future point in time) are present, Cache-Control takes precedence.
Expires is a header introduced in HTTP/1.0 that gives the expiration time of a resource. It is an absolute time returned by the server as a string in GMT format, such as Expires: Thu, 31 Dec 2037 23:55:55 GMT. It works as follows:
1) The first time the browser requests a resource from the server, the server returns the resource with an Expires header in the response.
2) After the browser receives the resource, it caches the resource along with all the response headers (so the header returned by the cache hit request is not from the server, but from the previous cache header);
3) When the browser requests the resource again, it first looks for the resource in the cache. Once it finds the resource, it compares its Expires to the current request time. If the request time is before the specified Expires time, it hits the cache; otherwise, it doesn’t.
4) If the cache is not hit, the browser loads the resource from the server and the Expires header is updated from the new response.
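The comparison in step 3 can be sketched with the Python standard library. `is_fresh_by_expires` is a hypothetical helper name, and treating an unparsable date as expired is an assumption of this sketch.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh_by_expires(expires_header: str, now: datetime) -> bool:
    """True if `now` (timezone-aware) is still before the Expires time."""
    try:
        expires_at = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        return False  # assumption: an unparsable date counts as expired
    if expires_at.tzinfo is None:
        # Dates without zone information are interpreted as UTC here.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return now < expires_at
```

Because `now` comes from the client clock while the header value comes from the server, a skewed client clock changes the result of this comparison.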
Expires is the older strong-cache header. Because it is an absolute time returned by the server, cache management is prone to problems when the server and client clocks differ significantly. For example, arbitrarily changing the client time can change whether the cache is hit. HTTP/1.1 therefore introduced Cache-Control, which expresses the lifetime as a relative time in seconds, for example Cache-Control: max-age=315360000. It works as follows:
1) The first time the browser requests a resource from the server, the server returns the resource with a Cache-Control header in the response, as shown in the following figure:
2) After receiving the resource, the browser caches it along with all the response headers;
3) When the browser requests the resource again, it first looks in the cache. Once it finds the resource, it calculates an expiration time from the time of the first request and the lifetime set by Cache-Control, then compares that expiration time with the current request time. If the request time is before the expiration time, the cache is hit; otherwise it is not.
4) If the cache is not hit, the browser loads the resource from the server and the Cache-Control header is updated from the new response.
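A minimal sketch of the max-age calculation in step 3, assuming the cache stored the time at which the response was received. `parse_max_age` and `is_fresh_by_max_age` are hypothetical helper names, and other Cache-Control directives are ignored.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_max_age(cache_control: str) -> Optional[int]:
    """Extract the max-age value in seconds, or None if it is absent."""
    match = re.search(r"(?:^|[,\s])max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def is_fresh_by_max_age(cache_control: str, stored_at: datetime, now: datetime) -> bool:
    """True if `now` is before stored_at + max-age."""
    max_age = parse_max_age(cache_control)
    if max_age is None:
        return False  # no lifetime given, so this sketch reports a miss
    return now < stored_at + timedelta(seconds=max_age)
```

Both timestamps come from the client clock, so a difference between server and client time no longer affects the result.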
4. Strong cache management
How to enable strong caching:
1) In application code, add the Expires and Cache-Control headers to the response returned by the web server;
2) Configure the web server to add both the Expires and Cache-Control headers when it responds with a resource.
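As an illustration of the first option, the sketch below builds both headers in code and attaches them in a handler based on Python's built-in `http.server`. `strong_cache_headers` and `CachingHandler` are hypothetical names, and the 300-second lifetime is an arbitrary example value.

```python
import time
from email.utils import formatdate
from http.server import SimpleHTTPRequestHandler


def strong_cache_headers(max_age_seconds: int, now_epoch: float) -> dict:
    """Build matching Cache-Control and Expires values."""
    return {
        "Cache-Control": f"max-age={max_age_seconds}",
        # Expires is the same lifetime expressed as an absolute GMT time.
        "Expires": formatdate(now_epoch + max_age_seconds, usegmt=True),
    }


class CachingHandler(SimpleHTTPRequestHandler):
    """Serves files and adds the strong-cache headers to every response."""

    def end_headers(self) -> None:
        for name, value in strong_cache_headers(300, time.time()).items():
            self.send_header(name, value)
        super().end_headers()
```

Sending both headers keeps HTTP/1.0 clients covered by Expires while newer clients follow Cache-Control. A production setup would more likely do this in the web server configuration, as in the second option.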
Strong caching is mainly determined by the following response headers.
HTTP response header | Description |
---|---|
Cache-Control | HTTP/1.1. The primary header, specifying the caching mechanism |
Pragma | HTTP/1.0. Specifies the caching mechanism; Pragma: no-cache is equivalent to Cache-Control: no-cache |
Expires | HTTP/1.0. Specifies the expiration time of the cached copy. If it appears together with Cache-Control, the Cache-Control value takes precedence |
1. Cache-Control specifies the lifetime of the cached copy in seconds, for example:
Cache-Control: max-age=300
2. Pragma is generally used for debugging, and it is now rarely set by hand in responses.
3. Expires is similar to max-age, except that Expires is a fixed point in time on the server's clock.
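The precedence between the two headers can be sketched as follows. `freshness_lifetime` is a hypothetical helper, `headers` is assumed to be a plain dictionary of response headers, and Pragma and the other Cache-Control directives are deliberately left out.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def freshness_lifetime(headers: Dict[str, str], response_time: datetime) -> Optional[timedelta]:
    """How long the response may be reused, preferring max-age over Expires."""
    match = re.search(r"max-age\s*=\s*(\d+)", headers.get("Cache-Control", ""), re.IGNORECASE)
    if match:
        # Cache-Control wins whenever it carries max-age.
        return timedelta(seconds=int(match.group(1)))
    expires = headers.get("Expires")
    if expires:
        try:
            expires_at = parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            return timedelta(0)  # assumption: invalid date means already expired
        if expires_at.tzinfo is None:
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        return expires_at - response_time
    return None  # neither header gives an explicit lifetime
```

When both headers are present the Expires value is never read, which matches the rule in the table.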
5. Principle of negotiated cache
When the browser's request for a resource does not hit the strong cache, it sends a request to the server to check whether the negotiated cache is hit. If it is, the response has HTTP status 304 with the reason phrase Not Modified. For example, open the JD.com home page, press F12 to open developer tools, press F5 to refresh the page, and check the Network panel. You can see that many requests hit the negotiated cache:
The negotiated cache is managed with two pairs of headers: Last-Modified / If-Modified-Since and ETag / If-None-Match.
The Last-Modified / If-Modified-Since pair works as follows:
1) The first time the browser requests a resource, the server returns it with a Last-Modified header, which gives the last time the resource was modified on the server:
2) When the browser requests the resource again, it adds an If-Modified-Since header to the request, whose value is the Last-Modified value returned with the previous response:
3) When the server receives the request again, it compares the If-Modified-Since value with the time the resource was last modified on the server to decide whether the resource has changed. If it has not, the server returns 304 Not Modified without the resource content. If it has, the resource content is returned as normal. When the server returns 304 Not Modified, the Last-Modified header is usually not added to the response, because the resource has not changed and so the value would be the same. This is the response header when the server returns 304:
4) When the browser receives the 304 response, it loads the resource from the cache.
5) If the negotiated cache is not hit, the browser loads the resource from the server and the Last-Modified header is updated. The next request then sends that new value in If-Modified-Since.
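The server side of steps 1 to 3 can be sketched as follows. `respond_with_last_modified` is a hypothetical function that returns a (status, headers, body) tuple, and `modified_at` is assumed to be a timezone-aware UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Tuple


def respond_with_last_modified(
    request_headers: Dict[str, str], body: bytes, modified_at: datetime
) -> Tuple[int, Dict[str, str], bytes]:
    """Answer 304 if the resource is unchanged since If-Modified-Since."""
    # HTTP dates have one-second resolution, so drop the microseconds.
    modified_at = modified_at.replace(microsecond=0)
    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            since_dt = parsedate_to_datetime(since)
            if since_dt.tzinfo is None:
                since_dt = since_dt.replace(tzinfo=timezone.utc)
        except (TypeError, ValueError):
            since_dt = None  # ignore an unparsable header
        if since_dt is not None and modified_at <= since_dt:
            return 304, {}, b""  # not modified: no body is sent
    # First request or changed resource: send the content and its timestamp.
    return 200, {"Last-Modified": format_datetime(modified_at, usegmt=True)}, body
```

The 304 branch sends no headers here only to mirror the description in step 3; real servers differ in which headers they repeat on a 304.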
Last-Modified and If-Modified-Since both carry values based on server time. In general, this pair is very reliable for managing the negotiated cache, as long as nobody adjusts the server time or tampers with the client cache. However, sometimes a resource on the server does change while its last modification time does not. This problem is hard to track down, and when it happens it undermines the reliability of the negotiated cache. So there is another pair of headers for managing the negotiated cache: ETag and If-None-Match. They work as follows:
1) The first time the browser requests a resource, the server returns it with an ETag header. This header is a unique identifier, a string, that the server generates from the requested resource. The string is different whenever the resource changes, and it has nothing to do with the last modification time, so it is a good complement to Last-Modified:
2) When the browser requests the resource again, it adds an If-None-Match header to the request, whose value is the ETag returned with the previous response:
3) When the server receives the request again, it generates a new ETag for the resource and compares it with the If-None-Match value. If the two values are the same, the resource has not changed; if they differ, it has changed. If there is no change, 304 Not Modified is returned without the resource content. If there are changes, the resource content is returned as normal. Unlike with Last-Modified, when the server returns 304 Not Modified the response still includes the ETag header, because the ETag was regenerated for this request even though its value is unchanged:
4) When the browser receives the 304 response, it loads the resource from the cache.
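A minimal sketch of the ETag comparison on the server. Using a truncated SHA-256 hash of the content as the identifier is an assumption of this example, since real servers build ETags in different ways, and `make_etag` and `respond_with_etag` are hypothetical names.

```python
import hashlib
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    """Assumed scheme: a quoted, truncated content hash."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond_with_etag(
    request_headers: Dict[str, str], body: bytes
) -> Tuple[int, Dict[str, str], bytes]:
    """Answer 304 if If-None-Match equals the freshly generated ETag."""
    etag = make_etag(body)  # regenerated on every request
    if request_headers.get("If-None-Match") == etag:
        # Unchanged: the ETag is still returned, but no body is sent.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

The identifier depends only on the content, so touching the file without changing it still yields a 304.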
6. Negotiated cache management
The negotiated cache differs from the strong cache. With the strong cache no request is sent to the server, so the browser sometimes does not know that a resource has been updated. With the negotiated cache a request is always sent, so the server can tell the browser whether the resource has been updated. Most web servers enable the negotiated cache by default, with both Last-Modified / If-Modified-Since and ETag / If-None-Match turned on, such as Apache:
The negotiated cache is used together with the strong cache. As the previous screenshot shows, the response carries strong-cache headers in addition to the Last-Modified header, because the negotiated cache is of little use on its own when strong caching is not enabled.
7. Impact of browser behavior on cache
If a resource has already been cached by the browser, then before the cache expires and the resource is requested again, the browser by default first checks the strong cache. On a hit it reads directly from the cache. On a miss it sends a request to the server to check the negotiated cache. If that is hit, the server tells the browser it can still read from the cache; otherwise the server returns the latest resource. This is the default behavior, and it can be changed by what the user does in the browser:
1) Ctrl+F5 forces a refresh of the page: resources are loaded directly from the server, skipping both the strong cache and the negotiated cache;
2) F5 refreshes the page: the strong cache is skipped, but the negotiated cache is still checked;
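The effect of these two actions can be summarized in a small sketch. The action names and the functions `cache_checks` and `source_for` are hypothetical, and the sketch models only the behavior described above, not the exact behavior of any particular browser.

```python
from typing import Tuple


def cache_checks(action: str) -> Tuple[bool, bool]:
    """Return (use_strong_cache, use_negotiated_cache) for a user action."""
    if action == "ctrl+f5":
        return False, False  # forced refresh: go straight to the server
    if action == "f5":
        return False, True   # refresh: skip strong cache, still revalidate
    return True, True        # normal navigation: default behavior


def source_for(action: str, fresh: bool, not_modified: bool) -> str:
    """Where the resource comes from, given the state of both caches."""
    use_strong, use_negotiated = cache_checks(action)
    if use_strong and fresh:
        return "strong cache"
    if use_negotiated and not_modified:
        return "negotiated cache"
    return "server"
```

For a resource that is both fresh and unmodified, the three actions give three different sources, which is why forcing a refresh is a common way to avoid the cache during development.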
8. Summary
Typically, the strong cache and the negotiated cache are used together.
For the strong cache, the server tells the browser a cache lifetime. If the next request is made within that time, the cache is used directly.
For the negotiated cache, the ETag and Last-Modified values stored with the cached copy are sent to the server in the request, and the server validates them. When status code 304 is returned, the browser uses the cache directly.
First request from browser:
When the browser requests again:
9. Why use both Last-Modified and ETag?
ETag solves several problems that Last-Modified cannot:
1. Some files are rewritten periodically without their contents changing (only the modification time changes). In that case we do not want the client to think the file has been modified and GET it again;
2. Some files are modified very frequently, for example several times within one second. If-Modified-Since can only check at second granularity, so such modifications cannot be detected;
3. Some servers cannot determine exactly when a file was last modified.
4. ETag can compensate for the weaknesses of Last-Modified, and Last-Modified can in turn compensate for a weakness of ETag. For example, for static files such as images, generating an ETag from the file content on every request is obviously much slower than directly comparing the modification time. So the two checks are complementary.
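Points 1 and 2 can be demonstrated with a short sketch. The timestamps and file contents are made-up values, and the content-hash ETag is an assumed scheme used only for illustration.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def last_modified(ts: datetime) -> str:
    """Format a UTC timestamp as an HTTP date, which has no sub-second part."""
    return format_datetime(ts.replace(microsecond=0), usegmt=True)


def etag(body: bytes) -> str:
    """Assumed scheme: a hash of the content."""
    return hashlib.sha256(body).hexdigest()


# Point 2: two writes within the same second.
first_write = datetime(2024, 1, 1, 12, 0, 0, 100000, tzinfo=timezone.utc)
second_write = first_write + timedelta(milliseconds=500)
same_date = last_modified(first_write) == last_modified(second_write)
same_etag = etag(b"version 1") == etag(b"version 2")

# Point 1: the file is rewritten a day later with identical content.
rewritten = first_write + timedelta(days=1)
date_changed = last_modified(first_write) != last_modified(rewritten)
etag_unchanged = etag(b"version 1") == etag(b"version 1")
```

Here `same_date` is true while `same_etag` is false, so only the ETag notices the second write. In the rewrite case the date changes while the ETag stays the same, so only the ETag avoids an unnecessary download.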