Fetching resources over the network is relatively expensive, because the client may need several round trips to the server. Effective use of caching can greatly improve the performance of a web application, so it is worth understanding caching in detail.
To avoid ambiguity, in this article "caching" refers to what happens when a user closes a page or application and then opens it again.
What happens when the user hits refresh or force refresh (Ctrl+F5) is discussed at the end.
Some concepts
Browser caches come in two types: non-validation caches and validation caches.
Non-validation cache: the browser decides whether to use the cache based on the expiration time. If the resource has not expired, it is read directly from the browser cache and no HTTP request is made. The header fields involved are Cache-Control, Expires, and Pragma.
Validation cache: the browser sends a request to the server with conditions attached in the headers, and the server evaluates those conditions when processing the request. If the cached copy is still valid, the server returns a 304 status code with an empty body; otherwise it returns a 200 status code together with the resource. The header fields involved are ETag and Last-Modified.
As the above shows, the non-validation cache is the best case because the resource is read locally without any network request. The validation cache comes second: it does generate a network request, but if the cached copy is still valid the response has an empty body and very little data is transferred.
The browser checks the caches in this order:
Non-validation cache > Validation cache
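The order above can be sketched as a small decision function. This is an illustrative sketch, not how any browser is actually implemented: `CacheEntry`, `load`, and the `revalidate` callback are hypothetical names, and freshness is reduced to a single max-age comparison.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class CacheEntry:
    body: bytes
    stored_at: float            # seconds since epoch when the response was stored
    max_age: float              # freshness lifetime in seconds
    etag: Optional[str] = None  # validator, if the server sent one


def load(entry: CacheEntry, now: float,
         revalidate: Callable[[CacheEntry], Tuple[int, bytes]]) -> Tuple[str, bytes]:
    """Return (source, body) following the two-stage order described above."""
    # Stage 1: non-validation cache. No network request while the entry is fresh.
    if now - entry.stored_at < entry.max_age:
        return "local cache (no request)", entry.body
    # Stage 2: validation cache. Ask the server whether the copy is still valid.
    status, body = revalidate(entry)
    if status == 304:
        return "local cache (after 304)", entry.body
    return "network (200)", body
```

The `revalidate` callback stands in for the conditional request; it only runs once the entry is no longer fresh.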
If you are not familiar with nginx, consider reading this article first: Learning Nginx 101 for Front-end Engineers
Non-validation cache
Non-validation caches are controlled by the Cache-Control, Expires, and Pragma headers. Pragma is an HTTP/1.0 header whose behavior in responses is not specified, and it can be overridden by Cache-Control, so we will only look at Cache-Control and Expires.
Cache-Control
If the nginx configuration file has the following configuration:
# nginx.conf
add_header Cache-Control max-age=20;
This directive sets the expiration time of the resource to 20 seconds. If the browser requests the resource again within 20 seconds, it sends no HTTP request and reads directly from the browser cache. The response headers are as follows:
HTTP/1.1 200 OK
Server: nginx/1.12.2
Date: Fri, 28 Sep 2018 14:10:36 GMT
Content-Type: text/html
Content-Length: 755
Last-Modified: Thu, 27 Sep 2018 22:44:02 GMT
Connection: keep-alive
Cache-Control: max-age=20
Accept-Ranges: bytes
max-age=20 sets the expiration time to 20 seconds. You can confirm in the browser and in the server log that no HTTP request is made during that time.
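The freshness check behind max-age can be sketched in a few lines. This is a simplified sketch: it measures age from the response's Date header only, whereas real browsers also take into account things like the Age header and request timing. The function name `is_fresh` is hypothetical.

```python
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime


def is_fresh(date_header: str, max_age: int, now: datetime) -> bool:
    """True while the response is younger than max_age seconds."""
    stored = parsedate_to_datetime(date_header)
    return now - stored < timedelta(seconds=max_age)


date_header = "Fri, 28 Sep 2018 14:10:36 GMT"  # Date header from the response above
stored = parsedate_to_datetime(date_header)

print(is_fresh(date_header, 20, stored + timedelta(seconds=10)))  # within the 20 s window
print(is_fresh(date_header, 20, stored + timedelta(seconds=25)))  # after the 20 s window
```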
The most commonly used Cache-Control directive is max-age, but there are many others with different meanings:
Cache-Control: must-revalidate
Cache-Control: no-cache
Cache-Control: no-store
Cache-Control: no-transform
Cache-Control: public
Cache-Control: private
Cache-Control: proxy-revalidate
Cache-Control: max-age=<seconds>
Cache-Control: s-maxage=<seconds>
For details, see the Cache-Control documentation on MDN.
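A Cache-Control value is a comma-separated list of directives, some of which take an argument. The sketch below splits such a value into a dictionary. It is deliberately minimal and assumes no quoted arguments containing commas; `parse_cache_control` is a hypothetical helper, not a standard library function.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive, so normalize them to lowercase.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


print(parse_cache_control("public, max-age=20"))
print(parse_cache_control("no-cache, no-store, must-revalidate"))
```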
Expires
Now reconfigure nginx as follows:
# nginx.conf
#add_header Cache-Control max-age=20;
add_header expires 'Thu, 27 Sep 2019 22:44:02 GMT';
Comment out Cache-Control and add an Expires header with a future date. When you visit the page again, the response headers contain the following:
HTTP/1.1 200 OK
Server: nginx/1.12.2
Date: Fri, 28 Sep 2018 14:45:15 GMT
Content-Type: text/html
Content-Length: 755
Last-Modified: Thu, 27 Sep 2018 22:44:02 GMT
Connection: keep-alive
expires: Thu, 27 Sep 2019 22:44:02 GMT
Accept-Ranges: bytes
The expiration time is clearly visible in the response headers. Until then, no new HTTP request is made and the resource is read directly from the browser cache.
Cache-Control takes precedence over Expires
If we re-edit the nginx configuration file as follows:
# nginx.conf
add_header Cache-Control max-age=20;
add_header expires 'Thu, 27 Sep 2019 22:44:02 GMT';
The following response header information can be seen:
HTTP/1.1 200 OK
Server: nginx/1.12.2
Date: Fri, 28 Sep 2018 14:49:50 GMT
Content-Type: text/html
Content-Length: 755
Last-Modified: Thu, 27 Sep 2018 22:44:02 GMT
Connection: keep-alive
Cache-Control: max-age=20
expires: Thu, 27 Sep 2019 22:44:02 GMT
Accept-Ranges: bytes
Because Cache-Control has a higher priority than Expires, actual testing shows that the cache expires after 20 seconds.
Expires is an HTTP/1.0 header and Cache-Control is an HTTP/1.1 header. If both are present, Cache-Control (max-age) overrides Expires. Expires is also an absolute time generated by the server, so if the server and client clocks disagree the error can be large. Since HTTP/1.1 is supported by all browsers, prefer Cache-Control.
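The precedence rule can be expressed as a function that computes how long a response stays fresh. This is a sketch under simplifying assumptions: it only looks at max-age, Expires, and Date, and the name `freshness_lifetime` is hypothetical. Using Expires minus Date (both server-side values) is one way to sidestep the clock-skew problem mentioned above.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def freshness_lifetime(headers: Mapping[str, str]) -> Optional[float]:
    """Seconds the response may be reused without revalidation, or None."""
    # Header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    for part in h.get("cache-control", "").split(","):
        name, sep, arg = part.strip().partition("=")
        if name.lower() == "max-age" and sep:
            return float(arg)  # max-age wins whenever it is present
    if "expires" in h and "date" in h:
        expires = parsedate_to_datetime(h["expires"])
        date = parsedate_to_datetime(h["date"])
        return (expires - date).total_seconds()  # fall back to Expires - Date
    return None


both = {
    "Date": "Fri, 28 Sep 2018 14:49:50 GMT",
    "Cache-Control": "max-age=20",
    "expires": "Thu, 27 Sep 2019 22:44:02 GMT",
}
print(freshness_lifetime(both))  # max-age is used, Expires is ignored
```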
Validation cache
As the previous section shows, the non-validation cache is very important for static resources: it can save a great deal of bandwidth and improve the performance of web applications. However, some resources, such as HTML pages and API responses, are time-sensitive and need to be checked against the server frequently. This is where the validation cache comes in.
As mentioned earlier, the validation cache requires sending a request to the server. If the server determines that the resource has not changed, it returns a 304 status code with an empty body, and the browser reads the resource from its local cache. If the resource has changed, the server returns a 200 status code with the latest resource.
The validation cache is mainly controlled by the Last-Modified and ETag headers, which we look at next.
Last-Modified
Last-Modified is enabled by default in nginx, so you do not need to configure it manually.
Because the non-validation cache takes priority over the validation cache, it has to be disabled for testing; otherwise the validation cache never comes into play:
# nginx.conf
add_header Cache-Control max-age=0;
#add_header expires 'Thu, 27 Sep 2019 22:44:02 GMT'; # Cache-Control has a higher priority
Now, if you visit again, the request header will have fields like the following:
...
If-Modified-Since: Thu, 27 Sep 2018 22:37:45 GMT
...
If the resource is not updated, the response header is as follows:
HTTP/1.1 304 Not Modified
Server: nginx/1.12.2
Date: Fri, 28 Sep 2018 15:29:06 GMT
Last-Modified: Thu, 27 Sep 2018 22:37:45 GMT # Specify the last modification date
Connection: keep-alive
Cache-Control: max-age=0
You can also see that the browser side is fetching the content directly from the cache.
If the resource has been updated, the response headers are as follows:
HTTP/1.1 200 OK
Server: nginx/1.12.2
Date: Fri, 28 Sep 2018 15:29:06 GMT
Content-Type: application/javascript
Content-Length: 4770
Last-Modified: Fri, 28 Sep 2018 15:29:03 GMT # Specify the last modification date
Connection: keep-alive
Cache-Control: max-age=0
Accept-Ranges: bytes
Last-Modified and If-Modified-Since come in pairs. Their roles are:
- Last-Modified: sent in the response header. The server tells the browser when the resource was last modified.
- If-Modified-Since: sent in the request header. The browser tells the server the last modification time of its cached copy. The server compares this value with the resource's current modification time: if they match (the resource has not changed), it returns 304 with an empty body; if they differ (the resource has been updated), it returns 200 with the latest resource.
Based on these two times, the server and browser can decide whether the resource is up to date and whether the local cache can be used.
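On the server side, the exchange described above might look like the sketch below. It is not nginx's actual implementation: the function name `respond_if_modified_since` is hypothetical, and it assumes a "not modified since" comparison (`<=`), whereas a server may instead require the two dates to match exactly.

```python
from email.utils import parsedate_to_datetime
from typing import Optional, Tuple


def respond_if_modified_since(last_modified: str,
                              if_modified_since: Optional[str],
                              body: bytes) -> Tuple[int, bytes]:
    """Return (status, body) for a GET that may carry If-Modified-Since."""
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # ignore a date that cannot be parsed
        if since is not None and parsedate_to_datetime(last_modified) <= since:
            return 304, b""  # unchanged: empty body, client reuses its copy
    return 200, body  # changed, or an unconditional request: full resource


# Values taken from the request and response headers shown above.
print(respond_if_modified_since("Thu, 27 Sep 2018 22:37:45 GMT",
                                "Thu, 27 Sep 2018 22:37:45 GMT", b"..."))
print(respond_if_modified_since("Fri, 28 Sep 2018 15:29:03 GMT",
                                "Thu, 27 Sep 2018 22:37:45 GMT", b"..."))
```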
ETag
The ETag HTTP response header is an identifier for a specific version of a resource. Like Last-Modified, it is used for the validation cache, but ETag is more precise (Last-Modified is only accurate to the second), and ETag also helps avoid mid-air collisions. A detailed explanation can be found in the ETag documentation on MDN.
Let's go straight to how it is configured:
# nginx.conf
etag on; # Enable ETag explicitly
add_header Cache-Control max-age=0;
#add_header expires 'Thu, 27 Sep 2019 22:44:02 GMT';
add_header Last-Modified ' '; # To test the effect of ETag, disable Last-Modified
Now, if you visit again, the request header will have the following fields:
...
If-None-Match: "5bad5bb9-13a3f"
...
If the resource is not updated, the response header is as follows:
HTTP/1.1 304 Not Modified
Server: nginx/1.12.2
Date: Fri, 28 Sep 2018 22:43:11 GMT
Connection: keep-alive
ETag: "5bad5bb9-13a3f"
Cache-Control: max-age=0
You can see that the browser side is fetching the content directly from the cache.
If the resource has been updated, the response headers are as follows:
HTTP/1.1 200 OK
Server: nginx/1.12.2
Date: Fri, 28 Sep 2018 22:43:11 GMT
Content-Type: application/javascript
Content-Length: 4770
Connection: keep-alive
ETag: "5baeae7a-12a2"
Cache-Control: max-age=0
Accept-Ranges: bytes
You can see that the server sends the latest content to the browser with a 200 status code.
ETag and If-None-Match come in pairs:
- ETag is a resource "fingerprint" that the server generates according to certain rules and passes to the client, which stores it along with the cached resource.
- If-None-Match is a request header that carries the locally stored ETag value when the client requests the resource. The server compares it with the ETag of the current version of the resource. If the two values match (that is, the resource has not changed), the server returns a 304 Not Modified status without a body, telling the client that the cached version can be used. If the values do not match, it returns 200 together with the resource content.
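The ETag comparison can be sketched in the same way. The ETag values in the responses above look like a hexadecimal modification time followed by a hexadecimal size ("12a2" is 4770, the Content-Length of that response); the helper `nginx_style_etag` assumes that format for illustration. Both function names are hypothetical, and weak validators (the `W/` prefix) are not handled.

```python
from typing import Optional, Tuple


def nginx_style_etag(mtime: int, size: int) -> str:
    """Build a quoted ETag of the form "<hex mtime>-<hex size>" (assumed format)."""
    return f'"{mtime:x}-{size:x}"'


def respond_if_none_match(current_etag: str,
                          if_none_match: Optional[str],
                          body: bytes) -> Tuple[int, bytes]:
    """Return (status, body) for a GET that may carry If-None-Match."""
    if if_none_match is not None:
        # The header may list several ETags separated by commas, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or current_etag in candidates:
            return 304, b""  # fingerprint matches: the cached copy is still valid
    return 200, body  # no match: send the current resource


etag = nginx_style_etag(1538174586, 4770)  # mtime 2018-09-28 22:43:06 GMT, 4770 bytes
print(etag)
print(respond_if_none_match(etag, '"5bad5bb9-13a3f"', b"..."))  # stale fingerprint
print(respond_if_none_match(etag, etag, b"..."))                # matching fingerprint
```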
User active refresh behavior
When a user actively hits refresh or force refresh, the browser attaches different fields to the request header to tell the server how to handle the action.
The user hits Refresh
When the user hits Refresh, the browser adds the following fields to the request header:
If-Modified-Since: Fri, 28 Sep 2018 22:43:06 GMT # If Last-Modified is enabled
If-None-Match: "5baeae7a-12a2" # If ETag is enabled
Cache-Control: max-age=0
Even if Cache-Control set a large max-age, the browser does not read directly from the local cache. Instead, it sends a new request to the server to check whether the resource has been updated. In other words, it skips the first stage (the non-validation cache) and goes straight to the validation cache.
The user clicks force refresh
When the user hits Force Refresh, the browser places the following fields in the request header:
Pragma: no-cache
Cache-Control: no-cache
As you can see, even if Cache-Control set a large max-age, the resource is not read directly from the cache. The browser also does not send If-Modified-Since or If-None-Match, so the server never receives the last modification time or ETag of the cached copy and returns the latest resource regardless.
Therefore, when the user forces a refresh, the browser skips both the non-validation cache and the validation cache and fetches the latest resource directly from the server.
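The two refresh behaviors can be summarized as a function that builds the cache-related request headers. This sketch simply mirrors the headers listed in this section; actual behavior varies between browsers and versions, and `request_headers` and its `action` values are hypothetical names.

```python
from typing import Dict, Optional


def request_headers(action: str, etag: Optional[str] = None,
                    last_modified: Optional[str] = None) -> Dict[str, str]:
    """Cache-related request headers for 'refresh' or 'force_refresh'."""
    headers: Dict[str, str] = {}
    if action == "refresh":
        # Skip the freshness check but keep the validators, so a 304 is possible.
        headers["Cache-Control"] = "max-age=0"
        if last_modified:
            headers["If-Modified-Since"] = last_modified
        if etag:
            headers["If-None-Match"] = etag
    elif action == "force_refresh":
        # No validators are sent, so the server answers with the full resource.
        headers["Pragma"] = "no-cache"
        headers["Cache-Control"] = "no-cache"
    else:
        raise ValueError(f"unknown action: {action}")
    return headers


print(request_headers("refresh", etag='"5baeae7a-12a2"',
                      last_modified="Fri, 28 Sep 2018 22:43:06 GMT"))
print(request_headers("force_refresh", etag='"5baeae7a-12a2"'))
```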
This is why, when someone on the product side comes to us with a problem, we always like to ask them to force refresh first…
Reference documentation
- developer.mozilla.org/zh-CN/docs/…
- developer.mozilla.org/zh-CN/docs/…
- developer.mozilla.org/zh-CN/docs/…
- developer.mozilla.org/zh-CN/docs/…
- blog.csdn.net/eroswang/ar…
- www.v2ex.com/t/356353
- developers.google.com/web/fundame…
- imweb.io/topic/5795d…