Preface
It’s no secret that caching is important and useful in Web development. But it’s also complicated.
This article gives a comprehensive overview of caching in five parts:
- The cache decision strategy
- Essential caching basics
- The pros and cons of the various caches
- Best practices for caching
- Test yourself
1. Cache decision strategy
Browsers have a complete mechanism for caching requested resources, including the following three policies: storage policy, expiration policy, and negotiation policy.
The storage policy applies after a response is received and decides whether the resource should be cached. The expiration policy applies before a request is sent and determines whether the cached copy has expired. The negotiation policy applies during a request and determines whether the cached resource has been updated on the server.
When a browser applies a cache policy, the procedure is as follows:
The figure above shows the complete decision process the browser goes through when it applies the cache. However, the way a resource is accessed also affects this process: depending on the access method, some steps are skipped.
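As a rough illustration of how the expiration and negotiation policies follow one another, here is a minimal sketch in JavaScript. The entry shape (storedAt, lifetimeMs, validator) and the function name lookup are assumptions made for this example and do not describe the internals of any real browser.

```javascript
// Hypothetical model of a cache lookup; not how any real browser is implemented.
function lookup(entry, nowMs) {
  // Nothing was stored (first visit, or the storage policy said no): go to the network.
  if (!entry) return 'fetch';
  // Expiration policy: checked before any request is sent.
  if (nowMs - entry.storedAt < entry.lifetimeMs) return 'use-cache';
  // Negotiation policy: ask the server whether the stored copy is still current.
  if (entry.validator) return 'revalidate';
  return 'fetch';
}

const sample = { storedAt: 0, lifetimeMs: 60000, validator: '"abc"' };
console.log(lookup(sample, 30000)); // use-cache
console.log(lookup(sample, 90000)); // revalidate
console.log(lookup(null, 0));       // fetch
```

The storage policy is not modeled here; it would run when the response arrives and decide whether an entry exists at all.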
You can access resources in the following ways:
- Press Enter in the address bar (new tab)
- Follow a link
- Go forward or back
- Open a link from the favorites bar
- Open a new window (window.open)
- Refresh (Command + R/F5)
- Force refresh (Command + Shift + R/Ctrl + F5)
The caching strategy applies slightly differently to each of these seven ways of accessing a resource, as shown in the figure below: each one goes through a different set of decision steps. This is not verified here; after reading the rest of the article, you should be able to verify it yourself.
Note that in Chrome, pressing Enter in the address bar without changing its content is equivalent to refreshing the current page, whereas Firefox treats it like any other address-bar entry. This special case needs to be kept in mind.
This article is accompanied by test scripts, and the code is on GitHub. The test scripts are described below, and instructions can be found at the download link. To verify the preceding information, run node cache-etag+max-age.js to enable ETag and max-age at the same time, and then trigger the corresponding action. Use the Network panel and the Node logs to check the result.
In addition, WebKit divides resources into main resources and derived resources. The main resource is the resource returned for the URL entered in the address bar, and the derived resources are the JS, CSS, and image resources referenced by the main resource.
When Chrome refreshes, only the main resource follows the refresh flow shown in the figure above; derived resources are handled as if they were opened in a new tab, that is, by checking whether the cache has expired. The only difference when the strong cache takes effect is that a new tab shows from disk cache, whereas a refresh of the current page shows from memory cache.
In Firefox, a refresh of the current page applies the flow shown above to all resources. The verifications in this article take advantage of Chrome's behavior: they refresh the current page and observe how derived resources use the cache, which avoids the tedium of opening a new tab every time.
2. Essential caching basics
The following table lists 10 cache-related fields in HTTP. To clarify their functions and usage, the table marks each field by storage policy, expiration policy, negotiation policy, request header, and response header.
Note: the table contains one half check mark. Last-Modified gets a half check mark because it can trigger heuristic caching, which also causes the file to be cached. See below for details.
Caches are divided into strong caches and weak caches (also known as negotiated caches). Strong caches include Expires and Cache-Control and are mainly used when the expiration policy takes effect. Weak caches include Last-Modified and ETag and are applied after the negotiation policy. The main difference between the two is whether a request is sent to the server when the resource is obtained.
2.1 Expires
As mentioned above, Expires specifies the expiration time of the cache as an absolute time, that is, a specific moment. The browser compares it with the local time, and the cache expires once that moment has passed. RFC 2616 recommends a value no more than one year in the future.
The Expires header field is a response header whose value is an HTTP-date, for example Expires: Fri, 19 Oct 2018 16:00:00 GMT, which is 2018-10-20 00:00:00 at GMT+0800 (CST).
Try the following steps to verify:
- Run node cache-Expires.js. The script sets the Expires header of the requested resource to "2018-10-20 00:00:00".
- Visit http://localhost:1030/, open the Network tab, and check the Expires value of avatar.jpg.
- Refresh again. The resource has been cached, and the Size column shows (from memory cache).
- Change the local time to 2018-10-15 00:00:00 and refresh. The cache is still valid.
- Change the local time to 2018-10-25 00:00:00 and refresh again. The image is no longer served from the cache but fetched again, because the local time is past the Expires value.
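The comparison performed in these steps can be sketched as follows. This is a minimal illustration, not the code of the test script; the function names expiresHeader and isFreshByExpires are made up for this example.

```javascript
// Format a moment in time as an HTTP-date suitable for the Expires header.
function expiresHeader(date) {
  return date.toUTCString();
}

// The cached copy is fresh while the (local) clock has not reached Expires.
function isFreshByExpires(expiresValue, nowMs) {
  const expiresMs = Date.parse(expiresValue);
  // An unparsable date is treated as already expired.
  return !Number.isNaN(expiresMs) && nowMs < expiresMs;
}

const expires = expiresHeader(new Date('2018-10-20T00:00:00+08:00'));
console.log(expires); // Fri, 19 Oct 2018 16:00:00 GMT
console.log(isFreshByExpires(expires, Date.parse('2018-10-15T00:00:00+08:00'))); // true
console.log(isFreshByExpires(expires, Date.parse('2018-10-25T00:00:00+08:00'))); // false
```

Because the check depends on the client's clock, changing the local time is enough to flip the result, which is exactly what the steps above exploit.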
2.2 Cache-Control
Cache-Control specifies the caching mechanism for a resource. It can be set in both request headers and response headers, and it involves two of the three policies above: the storage policy and the expiration policy.
The syntax is as follows: Cache-Control: cache-directive[,cache-directive]. A cache-directive is a caching directive and is case-insensitive. The HTTP caching standard defines 12 directives, as shown in the following table: 7 can be used in requests and 9 in responses. Multiple directives can be set in one Cache-Control header, separated by commas.
2.2.1 Cache directives are case-insensitive
As noted above, cache-directive names are case-insensitive, so the directives in Cache-Control can be written in any case. Lowercase is recommended, however. The verification is as follows:
- Run node cache-directive-case-insensitive.js. The server writes max-age in capitals, as follows: Cache-Control: MAX-AGE=86400.
- Request the page again in the browser. The cache still takes effect.
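The comma-separated syntax and the case-insensitivity rule can be illustrated with a small parser. This is a simplified sketch: parseCacheControl is a name chosen for this example, and quoted arguments that themselves contain commas are not handled.

```javascript
// Parse a Cache-Control value into a map keyed by lowercase directive name.
// Simplification: quoted arguments that contain commas are not supported.
function parseCacheControl(value) {
  const directives = {};
  for (const part of value.split(',')) {
    const [name, ...rest] = part.trim().split('=');
    if (!name) continue;
    // Directives without an argument (for example no-cache) are stored as true.
    directives[name.toLowerCase()] = rest.length
      ? rest.join('=').replace(/^"|"$/g, '')
      : true;
  }
  return directives;
}

console.log(parseCacheControl('MAX-AGE=86400, Public'));
// { 'max-age': '86400', public: true }
```

Normalizing names to lowercase at parse time means the rest of the code only ever compares against one spelling.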
2.2.2 max-age in the request header
In the request header, max-age is mainly used as max-age=0, which means the strong cache is not used. In Chrome and Firefox, the refresh operation (Command + R/F5) adds max-age=0 to the request header; the strong cache is skipped, but the negotiated cache is still allowed (you can check this yourself after reading about the Last-Modified and ETag negotiated caches below).
Cache-Control: max-age=0
- Access the image resource directly at http://localhost:1030/avatar.jpg and open the Network panel.
- Refresh. The header above appears in the request header, as shown in the figure below. (Firefox behaves the same and is not verified separately; as mentioned at the beginning, the two browsers differ in how they treat main and derived resources.)
In addition, verification shows that neither Chrome nor Firefox handles max-age > 0 in the request header well.
- In Chrome, use the Modify Headers extension (similar extensions exist for both Chrome and Firefox) to add max-age=7200 to the request.
- Run node cache-max-age.js and visit http://localhost:1030. Force-refresh first to make sure the resources are up to date.
- Open the Network panel and check avatar.jpg. The resource is still served from the cache, although according to the specification this should not be the case.
2.2.3 max-age and Expires
The max-age directive in Cache-Control specifies a relative time, in seconds, after which the cache expires. Its function is similar to Expires, but it takes precedence: if both max-age and Expires are set, max-age takes effect and Expires is ignored. The verification is as follows:
- Run node cache-max-age+Expires.js, which sets Cache-Control: max-age=86400 and Expires: Sat Oct 20 2018 00:00:00 GMT+0800 (CST) at the same time, as shown below.
- Refresh, then change the local time to two hours from now (but not past the 20th). The cache still takes effect. (Screenshots are omitted for this step and the next; they are similar to the examples above.)
- Change the local time to two days from now (assuming the 20th is more than two days away; otherwise the opposite is true). The cache no longer takes effect, because the max-age limit has been exceeded.
Conversely, try it the other way round: if max-age gives a longer lifetime than Expires, you will see that max-age still takes effect.
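The precedence rule can be written down as a short function. This is a sketch under two assumptions: headers are passed as a plain object with lowercase names, and, following RFC 7234, the lifetime derived from Expires is measured relative to the response's Date header. The name freshnessLifetime is made up for this example.

```javascript
// Freshness lifetime in seconds: max-age wins over Expires when both are present.
// headers: plain object with lowercase header names (an assumption of this sketch).
function freshnessLifetime(headers) {
  const match = /(?:^|,)\s*max-age\s*=\s*(\d+)/i.exec(headers['cache-control'] || '');
  if (match) return Number(match[1]);
  if (headers.expires && headers.date) {
    // Expires is absolute, so the lifetime is counted from the response Date.
    return Math.max(0, (Date.parse(headers.expires) - Date.parse(headers.date)) / 1000);
  }
  return null; // no explicit lifetime was given
}

const both = {
  date: 'Tue, 16 Oct 2018 00:00:00 GMT',
  'cache-control': 'max-age=86400',
  expires: 'Fri, 19 Oct 2018 16:00:00 GMT',
};
console.log(freshnessLifetime(both)); // 86400, even though Expires is later
```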
2.2.4 no-cache and no-store
Note that no-cache does not mean that nothing is cached; that is what no-store means. no-cache only means that the strong cache is skipped and the negotiation policy is enforced.
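The difference between the two directives can be made explicit in code. The function name cachePermissions and the shape of its return value are assumptions made for this sketch.

```javascript
// Translate no-store and no-cache into what a cache is allowed to do.
function cachePermissions(cacheControl) {
  const names = cacheControl
    .toLowerCase()
    .split(',')
    .map((directive) => directive.trim().split('=')[0]);
  return {
    mayStore: !names.includes('no-store'),      // no-store: keep no copy at all
    mustRevalidate: names.includes('no-cache'), // no-cache: keep a copy, but check with the server before using it
  };
}

console.log(cachePermissions('no-cache')); // { mayStore: true, mustRevalidate: true }
console.log(cachePermissions('no-store')); // { mayStore: false, mustRevalidate: false }
```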
2.3 Pragma
Pragma is an HTTP/1.0 field, usually set to Pragma: no-cache, which has the same effect as Cache-Control: no-cache. When you force-refresh (Command + Shift + R/Ctrl + F5) or check Disable cache in the Network panel, the browser automatically adds Pragma: no-cache and Cache-Control: no-cache to the request and omits the negotiation headers (If-Modified-Since/If-None-Match, described below). The resource is therefore retrieved without using any cache, as shown in the figure below.
2.4 Last-Modified/If-Modified-Since/If-Unmodified-Since
Last-Modified indicates when the requested resource was last modified. The syntax is as follows: Last-Modified: <day-name>, <day> <month> <year> <hour>:<minute>:<second> GMT (Greenwich Mean Time). Use new Date().toUTCString() to get the current time in this format (toGMTString() is a deprecated alias). Since Last-Modified is only accurate to the second, it is not suitable for resources that change several times within one second.
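The one-second precision is easy to demonstrate. In this small sketch (lastModifiedHeader is a name chosen for the example), two modifications within the same second produce the same header value:

```javascript
// Build a Last-Modified value from a modification time in milliseconds.
function lastModifiedHeader(mtimeMs) {
  return new Date(mtimeMs).toUTCString();
}

const firstEdit = Date.UTC(2018, 9, 16, 11, 42, 28, 100);  // 11:42:28.100
const secondEdit = Date.UTC(2018, 9, 16, 11, 42, 28, 900); // 11:42:28.900
console.log(lastModifiedHeader(firstEdit)); // Tue, 16 Oct 2018 11:42:28 GMT
console.log(lastModifiedHeader(firstEdit) === lastModifiedHeader(secondEdit)); // true
```

A client holding the first version would therefore be told that nothing has changed.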
If the response header contains none of Expires, Cache-Control: max-age, or Cache-Control: s-maxage, and Last-Modified is set, the browser falls back to a heuristic algorithm to decide how long the resource stays fresh. This is called heuristic caching. Typically, 10% of the difference between the Date value and the Last-Modified value of the response header is used as the cache lifetime. The verification is as follows:
- Run node cache-Last-Modified.js. The server gets the last modification time of the resource and sets it as the value of Last-Modified. Visit localhost:1030 and check avatar.jpg, as shown in the figure below.
- Refresh the browser. The image is fetched from the cache.
- Calculate the cache lifetime with the heuristic formula, change the local time to a moment beyond that lifetime, and refresh. The cache is no longer valid.
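The heuristic can be computed directly from the two header values. This sketch uses the typical 10% factor mentioned above; actual browsers may apply additional limits. The function name heuristicLifetime and the sample dates are hypothetical.

```javascript
// Heuristic lifetime in seconds: 10% of (Date - Last-Modified).
function heuristicLifetime(dateValue, lastModifiedValue) {
  const ageAtResponse = (Date.parse(dateValue) - Date.parse(lastModifiedValue)) / 1000;
  // A Last-Modified in the future yields no heuristic lifetime.
  return Math.max(0, ageAtResponse) / 10;
}

// Hypothetical values: the resource was last modified 10 days before it was served.
console.log(heuristicLifetime('Sat, 20 Oct 2018 00:00:00 GMT', 'Wed, 10 Oct 2018 00:00:00 GMT'));
// 86400 (one day)
```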
2.4.1 If-Modified-Since
If a resource was returned with Last-Modified, the browser automatically sends If-Modified-Since on later requests, set to the Last-Modified value it received. When the request reaches the server, the server checks whether the resource has been updated since that time: if not, it returns 304; if so, it returns 200 with the updated resource. The verification is as follows:
- Run node cache-Last-Modified.js. The server gets the last modification time of the resource and sets it as the value of Last-Modified, as shown below. Note the size of the resource.
- Refresh the page and look at the Network panel again. The request header now contains If-Modified-Since. The server determines that the resource has not changed and returns 304, so the resource is taken from the cache and the transferred size is smaller, as shown below.
- Modify the contents of index.html and refresh again. The response is 200, the HTML content is updated, a new Last-Modified is returned, and the resource size changes accordingly.
A 304 response can also trigger the storage policy, as shown in the decision flow at the beginning of this article. You can verify this by adding the corresponding headers to the 304 response.
Note that If-Modified-Since can only be used with GET and HEAD requests.
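The server-side check described in this section can be sketched as a pure function. This is an illustration under assumptions, not the code of cache-Last-Modified.js; the name checkIfModifiedSince and its parameters are made up for this example.

```javascript
// Decide between 304 and 200.
// mtimeMs: modification time of the resource in milliseconds.
// ifModifiedSince: value of the request header, or undefined if absent.
function checkIfModifiedSince(method, ifModifiedSince, mtimeMs) {
  // The header only applies to GET and HEAD requests.
  if (method !== 'GET' && method !== 'HEAD') return 200;
  const since = Date.parse(ifModifiedSince || '');
  if (Number.isNaN(since)) return 200; // missing or invalid date
  // Last-Modified has one-second precision, so drop the milliseconds first.
  const mtimeSeconds = Math.floor(mtimeMs / 1000) * 1000;
  return mtimeSeconds <= since ? 304 : 200;
}

const modifiedAt = Date.UTC(2018, 9, 16, 11, 42, 28, 500);
console.log(checkIfModifiedSince('GET', 'Tue, 16 Oct 2018 11:42:28 GMT', modifiedAt));  // 304
console.log(checkIfModifiedSince('GET', 'Mon, 15 Oct 2018 11:42:28 GMT', modifiedAt));  // 200
console.log(checkIfModifiedSince('POST', 'Tue, 16 Oct 2018 11:42:28 GMT', modifiedAt)); // 200
```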
2.4.2 If-Unmodified-Since
If-Unmodified-Since means that the request is processed normally only if the resource has not been modified since the given time; otherwise a 412 (Precondition Failed) response is returned. There are two main scenarios.
- Used with non-safe methods (such as POST) to make the request conditional. For example, an update to a wiki document is applied only if the stored document has not been modified in the meantime.
- Used together with the If-Range field, it can ensure that a newly requested fragment comes from an unmodified document.
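The precondition check for the first scenario might look like the following sketch. The function name preconditionStatus is hypothetical; it returns 412 when the update must be rejected and null when the request may proceed.

```javascript
// Returns 412 if the resource changed after the date supplied by the client,
// or null if the request may proceed.
function preconditionStatus(ifUnmodifiedSince, mtimeMs) {
  const since = Date.parse(ifUnmodifiedSince || '');
  if (Number.isNaN(since)) return null; // no usable precondition was sent
  // HTTP dates have one-second precision, so compare whole seconds.
  const mtimeSeconds = Math.floor(mtimeMs / 1000) * 1000;
  return mtimeSeconds <= since ? null : 412;
}

const docModifiedAt = Date.UTC(2018, 9, 16, 11, 42, 28);
console.log(preconditionStatus('Tue, 16 Oct 2018 11:42:28 GMT', docModifiedAt)); // null (update allowed)
console.log(preconditionStatus('Tue, 16 Oct 2018 10:00:00 GMT', docModifiedAt)); // 412
```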
2.5 ETag/If-Match/If-None-Match
The ETag is the unique identifier of the requested resource on the server, and the browser can cache data based on the ETag value. On the next request, If-None-Match carries the last ETag value: the server returns 304 if the value is unchanged, or the new resource if it has changed.
Note that both ETag and If-None-Match values are enclosed in double quotes.
The verification steps are similar to those for Last-Modified. Run node cache-etag.js. I won't go into details here.
The logic of If-Match is the opposite of If-None-Match.
Finally, ETag takes precedence over Last-Modified. When both are present, ETag takes precedence, but Last-Modified is not ignored automatically: the browser still sends both validators, and the precedence has to be implemented by the server. The verification is as follows, with the server deciding the priority:
- Run node cache-ETag+Last-Modified.js. The server sets both ETag and Last-Modified in the resource's response header, as shown in the figure below.
- Refresh the browser. The request for index.html returns 304, and the Node log shows that the ETag took effect.
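One way a server can implement this precedence is sketched below. This is not the code of cache-ETag+Last-Modified.js: the function names makeETag and revalidate are assumptions, and the size-and-mtime ETag is just one common scheme for generating a weak validator.

```javascript
// Weak validator derived from size and modification time (one common scheme, assumed here).
function makeETag(size, mtimeMs) {
  return `W/"${size.toString(16)}-${Math.floor(mtimeMs).toString(16)}"`;
}

// ETag takes precedence: If-Modified-Since is consulted only when If-None-Match is absent.
// reqHeaders: plain object with lowercase header names.
function revalidate(reqHeaders, etag, mtimeMs) {
  const noneMatch = reqHeaders['if-none-match'];
  if (noneMatch !== undefined) {
    // If-None-Match uses weak comparison, so the W/ prefix is ignored.
    const tags = noneMatch.split(',').map((tag) => tag.trim().replace(/^W\//, ''));
    const current = etag.replace(/^W\//, '');
    return noneMatch.trim() === '*' || tags.includes(current) ? 304 : 200;
  }
  const since = Date.parse(reqHeaders['if-modified-since'] || '');
  if (!Number.isNaN(since) && Math.floor(mtimeMs / 1000) * 1000 <= since) return 304;
  return 200;
}

const fileTime = Date.UTC(2018, 9, 16, 11, 42, 28);
const fileTag = makeETag(1024, fileTime);
// The date matches but the ETag is stale: the ETag decides, so the answer is 200.
console.log(revalidate(
  { 'if-none-match': '"old"', 'if-modified-since': 'Tue, 16 Oct 2018 11:42:28 GMT' },
  fileTag,
  fileTime
)); // 200
console.log(revalidate({ 'if-none-match': fileTag }, fileTag, fileTime)); // 304
```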
3. Pros and cons of the various caches
That concludes the long second part, which covered the basics of HTTP caching. The pros and cons of each type of cache are summarized in the following table:
4. Best practices
As you can see from the pros and cons of the various caches above, none of them are perfect. So it’s recommended to do something like this:
- Do not cache HTML, so that users always get updated content in a timely manner.
- Use Cache-Control and ETag to control the caching of the static resources used in the HTML. The usual approach is to set the max-age of Cache-Control to a large value and then use ETag for validation.
- Use signatures or versions to distinguish static resources. A modified resource then gets a different URL, so a change never goes unnoticed because of a stale cached copy.
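These recommendations could translate into header choices like the following sketch. The function name cachingHeaders, the one-year max-age, and the file-naming convention are assumptions. no-cache is used for HTML so that it is always revalidated; no-store would be the stricter reading of "do not cache".

```javascript
// Pick caching headers by resource type.
// Assumption: static assets carry a version or signature in the file name, e.g. app.3f2a1c.js.
function cachingHeaders(path, etag) {
  if (path.endsWith('.html') || path.endsWith('/')) {
    // HTML is always revalidated so that users see updates immediately.
    return { 'Cache-Control': 'no-cache', ETag: etag };
  }
  // Versioned assets can stay fresh for a long time; a new version gets a new URL.
  return { 'Cache-Control': 'max-age=31536000', ETag: etag };
}

console.log(cachingHeaders('/index.html', '"v1"'));
// { 'Cache-Control': 'no-cache', ETag: '"v1"' }
console.log(cachingHeaders('/static/app.3f2a1c.js', '"v1"'));
// { 'Cache-Control': 'max-age=31536000', ETag: '"v1"' }
```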
Two other approaches are not covered in this article and are not recommended:
- Use HTML meta tags to specify cache behavior
- Use query strings to avoid caching. Because of known issues, some proxy servers do not cache resources whose URLs contain a query string.
5. Test yourself
With all this content, it’s time to see the results. So let’s look at the following questions.
Suppose you access localhost:1030 for the first time and the response header of avatar.png is as follows:
HTTP/1.1 200 OK
Cache-Control: no-cache
Content-Type: image/png
Last-Modified: Tue, 16 Oct 2018 11:42:28 GMT
Accept-Ranges: bytes
Date: Tue, 16 Oct 2018 15:57:21 GMT
Question 1: How is avatar.png loaded when the page is refreshed?
Question 2: What happens if Cache-Control in the response above is set to private instead?
Take a moment to think about these first.
With thanks to the article "Thoroughly understand the Http caching mechanism – cache strategy based three-factor decomposition method".
All right, here are the answers.
Answer 1: The browser revalidates with the server using If-Modified-Since. The server returns 304 if the resource is unchanged and 200 if it has changed.
Answer 2: When Cache-Control is set to private, heuristic caching is triggered. When the page is refreshed again, avatar.png hits the strong cache and is read from the cache.
Conclusion
That is the end of the article. I hope it is helpful to you.
Thank you
Thanks to Liu Bowen, author of Vue.js, for his valuable suggestions.
References
- MDN | Cache-Control
- Thoroughly understand the Http caching mechanism – cache strategy based three-factor decomposition method
- Thoughts on browser caching mechanisms generated by memoryCache and diskCache
- A Web Developer’s Guide to Browser Caching
- Browser cache mechanism analysis
- HTTP cache
- Are Your Cache-Control Directives Doing What They Are Supposed to Do?
- Hypertext Transfer Protocol