1. What is caching
A Web cache is an HTTP device that can hold a copy of a document.
There are two types of HTTP caches: local caches and proxy caches. A local cache lives on the client device; a proxy cache lives on a caching proxy server, the most familiar example being a CDN.
2. Cache mechanism
1. Cache mechanism
Caching involves three parties: the client, the cache device, and the origin server. The processing flow is as follows:
In general, Cache-Control / Expires, also known as mandatory caching, is used to determine whether a cached copy is fresh, as shown in the figure above. Server revalidation is typically done with a "conditional GET" request that carries If-None-Match (paired with ETag) or If-Modified-Since (paired with Last-Modified); this is also known as comparison caching.
2. Special features of the local cache
When there are multiple caches, such as the one in the client device and the one in a CDN, the client's local cache is still a Web cache in the usual sense; the only difference is that the distance between the client and its cache is almost zero.
The concept of a cache is quite important. A cache is a device. All caching logic is based on three roles: client, cache, and origin server. When both a client cache and a proxy cache are present, keep the following in mind:
- The special feature of the local cache is that the client can consult it directly and determine that the cached copy is still valid without sending a request.
Therefore, when making an HTTP request, the client first checks its local cache for a stored copy. If there is one, Cache-Control: max-age=XXX is used to determine whether it is fresh. If it is fresh enough, that is, the second request falls within the Cache-Control lifetime, the locally saved response and data are used directly and no request is sent.
If it is not fresh enough, the local copy cannot be used as-is and a real request must be made. When the request reaches the cache server, the cache server sends a conditional revalidation request to the origin using a header such as If-Modified-Since. If validation succeeds, meaning the cached copy is still current, the cache updates the Cache-Control / Expires values, restarts the freshness clock, assembles the response, and returns it to the client; the response carries no body and the status code is 304. If the cached copy is no longer valid, status code 200 is returned and the response carries the full body.
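The flow just described can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `Response`, `CacheEntry`, and `get_resource` are hypothetical names, the origin is represented by a `fetch` callback, and Last-Modified is the only validator considered.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Response:
    status: int                          # 200 or 304 in this sketch
    body: bytes = b""
    max_age: float = 0.0                 # taken from Cache-Control: max-age
    last_modified: Optional[str] = None  # taken from Last-Modified


@dataclass
class CacheEntry:
    body: bytes
    stored_at: float                     # when the freshness clock started
    max_age: float
    last_modified: Optional[str] = None


def get_resource(
    entry: Optional[CacheEntry],
    now: float,
    fetch: Callable[[Dict[str, str]], Response],
) -> Tuple[bytes, CacheEntry, str]:
    """Return (body, updated cache entry, how the body was obtained)."""
    # 1. Fresh copy: use it directly, no request is sent at all.
    if entry is not None and now - entry.stored_at < entry.max_age:
        return entry.body, entry, "local cache"

    # 2. Stale or missing copy: go to the network, conditionally if possible.
    headers: Dict[str, str] = {}
    if entry is not None and entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    resp = fetch(headers)

    # 3. 304 carries no body: keep the stored body, restart the clock.
    if resp.status == 304 and entry is not None:
        refreshed = CacheEntry(entry.body, now,
                               resp.max_age or entry.max_age,
                               entry.last_modified)
        return refreshed.body, refreshed, "revalidated (304)"

    # 4. 200 carries the full body: replace the stored copy.
    new_entry = CacheEntry(resp.body, now, resp.max_age, resp.last_modified)
    return resp.body, new_entry, "full response (200)"
```

Calling `get_resource` three times with the same entry (first with no entry, then within max-age, then after it has elapsed) walks through the 200, local-cache, and 304 paths in that order.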
The sequence diagram is as follows:
3. The cache expires
Meaning: When a cached copy exists, an expiration check determines whether it can still be used. Many people also call this mechanism forced caching.
The check is usually performed in the local or proxy cache, based on Cache-Control or Expires.
1. Expires
The old HTTP/1.0 protocol used the Expires field to indicate the expiration date of a document, such as:

    Expires: Thu, 15 Apr 2010 20:00:00 GMT

**Meaning:** The current copy of the resource can be used until the specified time.
Drawbacks:
- The clocks on the client and server must agree.
- Once the time has passed, the server must be configured with a new expiration date.
So here’s the second way:
2. Cache-Control: max-age
Cache-Control: max-age is an improvement on Expires, such as:

    Cache-Control: max-age=315360000

**Meaning:** The cached copy can be used for max-age seconds, counted from the time of the request; once that time has passed, a new request is needed.
Because max-age is a relative duration rather than an absolute date, it removes the Expires requirement that client and server clocks agree.
The general form is Cache-Control: max-age=XXX
Note: Cache-Control has many other optional values, which are described later.
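To make the difference between the two headers concrete, here is a minimal Python sketch that derives a freshness lifetime in seconds from a response's headers. The function name `freshness_lifetime` is hypothetical, and the sketch assumes the HTTP/1.1 rule that max-age takes precedence when both headers are present.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def freshness_lifetime(headers: Mapping[str, str],
                       response_time: datetime) -> Optional[float]:
    """Seconds the copy may be reused, or None if neither header is present."""
    # max-age is a relative duration, so no clock comparison is needed.
    match = re.search(r"max-age\s*=\s*(\d+)",
                      headers.get("Cache-Control", ""), re.IGNORECASE)
    if match:
        return float(match.group(1))

    # Expires is an absolute date, so it is compared against a clock reading.
    expires = headers.get("Expires")
    if expires:
        try:
            expires_at = parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            return 0.0  # an unparseable date is treated as already expired
        if expires_at.tzinfo is None:
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        return max(0.0, (expires_at - response_time).total_seconds())

    return None
```

`response_time` must be a timezone-aware datetime. If the client clock that produces it is wrong, only the Expires branch gives a wrong answer, which is exactly the drawback listed above.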
4. Server revalidation
Meaning: Even if a cached copy has expired, that does not mean it differs from the file on the origin server; it only means that it is time to check whether the copy is still usable. This check is called server revalidation.
**Validation mechanism:** HTTP allows the client to send a "conditional GET" to the server. The server includes the full content in the response body only when its document differs from the cached copy; otherwise it returns 304 and the response carries no resource data.
There are several conditional headers; the two most common are:
1. If-Modified-Since and Last-Modified
If-Modified-Since is sent by the client in the request headers. Last-Modified is sent by the server in the response headers. The two are used together to check whether the resource has really changed. If it has changed, the status code is 200 and the response body contains the full content. If it has not changed, the status code is 304 and the response contains only headers, with no body.
For example:
The response to the first request is as follows:
As can be seen from the figure, the response contains Cache-Control: max-age=10, indicating that the cached copy can be used directly for 10 seconds and must be revalidated after that. The response also carries a Last-Modified header:
Request and response headers for the second request:
The figure above indicates that cache revalidation was performed after 10 seconds and the validation was successful, so 304 is returned.
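On the server side, the same exchange could look roughly like the following Python sketch. `respond_if_modified_since` is a hypothetical helper, not part of any framework; it assumes the resource's modification time is known as a UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Mapping, Tuple


def respond_if_modified_since(
    request_headers: Mapping[str, str],
    last_modified: datetime,   # must be timezone-aware UTC
    body: bytes,
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, response headers, body) for a GET on one resource."""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            since_dt = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            since_dt = None  # ignore a malformed date and send the full body
        # HTTP dates have one-second resolution, so drop microseconds.
        if (since_dt is not None and since_dt.tzinfo is not None
                and last_modified.replace(microsecond=0) <= since_dt):
            return 304, headers, b""   # unchanged: headers only, no body

    return 200, headers, body          # changed or unconditional: full body
```

A client that echoes the Last-Modified value it received gets a 304 with an empty body for as long as the resource is not touched again.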
2. If-None-Match and ETag
**Meaning:** Some resources are periodically rewritten without their content changing. In that case If-Modified-Since is not sufficient, and an entity tag (ETag) is required.
The client records the ETag from the server's response headers and submits it to the server in the next request using the If-None-Match field. If the ETags match, the resource has not changed at the origin, so 304 is returned, indicating that the cached copy is usable. Otherwise, 200 is returned with the full content in the body, and the response carries the latest ETag.
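A small Python sketch shows why an entity tag survives a rewrite that leaves the content untouched. Deriving the tag from a SHA-256 hash of the body is an assumption made for illustration only; servers are free to generate ETags in any way, and `make_etag` and `respond_if_none_match` are hypothetical names.

```python
import hashlib
from typing import Dict, Mapping, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted entity tag from the content itself (an assumption)."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def respond_if_none_match(
    request_headers: Mapping[str, str],
    body: bytes,
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, response headers, body) for a GET on one resource."""
    etag = make_etag(body)
    headers = {"ETag": etag}          # the latest ETag is always returned
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""      # same content: the cached copy is usable
    return 200, headers, body         # different content: send everything
```

Because the tag depends only on the bytes, rewriting the file with identical content changes its modification time but not its ETag, so the second request still ends in a 304.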
For example, cache revalidation passes:
4. Strong and weak validation
**Meaning:** Some documents are modified in ways that do not matter, such as changes to comments. In this case, strong and weak tags are needed to tell the client when the cached copy can continue to be used.
This is similar to managing code in Git. Every commit produces a new commit ID on the branch, but the version number does not change with every commit, let alone the major version number.
The same applies to caching: some innocuous changes to a resource, such as comments, do not prevent the original copy from being used. Therefore, there is strong and weak validation:
**Weak validation:** A weak validator changes only when the main meaning of the content changes.
**Strong validation:** A strong validator changes whenever the content changes.
A validator prefixed with W/ is weak, such as:

    ETag: W/"2.6"
    If-None-Match: W/"2.6"

Without W/ the validator is strong, for example:

    ETag: "2.6"
    If-None-Match: "2.6"
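The two comparison rules can be written down directly. The following Python sketch uses hypothetical function names and follows the comparison defined for HTTP/1.1 entity tags: a strong comparison requires both tags to be strong and identical, while a weak comparison ignores the W/ prefix.

```python
def is_weak(etag: str) -> bool:
    """A weak validator carries the case-sensitive W/ prefix."""
    return etag.startswith("W/")


def opaque_tag(etag: str) -> str:
    """Strip the W/ prefix, leaving only the quoted opaque value."""
    return etag[2:] if is_weak(etag) else etag


def strong_match(a: str, b: str) -> bool:
    """Both tags must be strong and character-for-character identical."""
    return not is_weak(a) and not is_weak(b) and a == b


def weak_match(a: str, b: str) -> bool:
    """The opaque values must be identical; weakness is ignored."""
    return opaque_tag(a) == opaque_tag(b)
```

Conditional GETs with If-None-Match use the weak comparison, so `W/"2.6"` and `"2.6"` count as a match there, whereas operations that need byte-exact equality, such as range requests, rely on the strong one.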
5. Multiple values of If-None-Match
If-None-Match can carry multiple values to indicate that copies of all these versions exist in the cache, as shown in the figure:
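As a rough sketch of how a server might evaluate such a list, the Python function below checks the current ETag against every tag in the header. The name `if_none_match_hits` and the tag values in the usage note are made up for illustration; the special value `*` is treated as matching any existing representation.

```python
import re

# One entity tag: an optional W/ prefix followed by a quoted string.
ETAG_PATTERN = re.compile(r'(?:W/)?"[^"]*"')


def if_none_match_hits(header_value: str, current_etag: str) -> bool:
    """True if the current ETag matches any tag listed in If-None-Match."""
    if header_value.strip() == "*":
        return True  # "*" matches whatever representation currently exists

    def opaque(tag: str) -> str:
        # If-None-Match uses weak comparison, so the W/ prefix is ignored.
        return tag[2:] if tag.startswith("W/") else tag

    listed = ETAG_PATTERN.findall(header_value)
    return any(opaque(tag) == opaque(current_etag) for tag in listed)
```

With a header value of `"v2.4", "v2.5", "v2.6"` and a current tag of `"v2.6"` the function returns True, so the server would answer 304 and echo that ETag to tell the cache which stored copy to use.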
6. Priorities
Cache control creates many different cases for cache use. For example, if Cache-Control is set to no-cache, the cached copy can be used only after revalidation succeeds. When Cache-Control is max-age=XXX, the cached copy can be used within that time after the response is received.
Besides Cache-Control, there is also a heuristic expiration mechanism, so whether to use the cached copy is not a simple yes or no but a combined judgment based on several conditions. The common logic with and without Cache-Control is presented later.
Therefore, there is no fixed rule for how If-Modified-Since / If-None-Match are paired with Last-Modified / ETag. A common mechanism is for the server to return the Last-Modified field and for the client to cache it; if the cached copy needs revalidation (for example, when max-age has expired), the stored value is added to the request headers and sent to the server as the If-Modified-Since value.
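The client side of this mechanism amounts to copying stored validators into the next request. A minimal Python sketch, with the hypothetical name `conditional_headers`, assuming the cache keeps the headers of the stored response in a dictionary:

```python
from typing import Dict, Mapping


def conditional_headers(stored_headers: Mapping[str, str]) -> Dict[str, str]:
    """Build revalidation headers from the validators of a stored response."""
    request_headers: Dict[str, str] = {}

    etag = stored_headers.get("ETag")
    if etag:
        # The stored ETag is sent back unchanged as If-None-Match.
        request_headers["If-None-Match"] = etag

    last_modified = stored_headers.get("Last-Modified")
    if last_modified:
        # The stored date is sent back unchanged as If-Modified-Since.
        request_headers["If-Modified-Since"] = last_modified

    return request_headers
```

If the stored response had neither validator, the result is an empty dictionary and the request becomes an ordinary unconditional GET.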
7. Points to note
- If the server returns an entity tag (ETag), an HTTP/1.1 client must use the entity tag validator;
- If the server returns only a Last-Modified value, the client can use If-Modified-Since validation, but it is not required;
- If both are provided, the client should use both validation schemes so that HTTP/1.0 and HTTP/1.1 caches can both respond, but this is not required;
- If the request headers contain both the entity tag and the last-modified date, the server returns 304 only if both conditions are satisfied.
As shown in figure:
This is where the heuristic expiration caching strategy comes in.
5. Cache control
1. Cache-Control for proxy caches
- no-store
The cache must not store anything about the client request or the server response. The full response is downloaded for every request the client makes.
- no-cache
Caches may store the response, but they must validate it with the origin before providing it to the client.
- private
The response can be stored only by private caches (client caches). Intermediaries must not cache it.
- public
The response can be stored by shared caches, that is, by intermediaries (proxy caches, CDNs);
- must-revalidate
If the copy has expired, it must be validated before it can be used or provided to the client. This is slightly more lenient than no-cache, which requires validation regardless of whether the copy has expired.
- max-age
Expiration time: the number of seconds the resource stays fresh, counted from when it is sent by the server;
- Pragma
A header defined in the HTTP/1.0 standard. Including Pragma: no-cache in a request has the same effect as Cache-Control: no-cache, but the specification does not define Pragma for HTTP responses, so it cannot fully replace the Cache-Control header defined in HTTP/1.1. Pragma is generally sent for backward compatibility with HTTP/1.0-based clients.
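The directives above can be turned into two simple decisions: may this cache store the response, and must it validate before reuse. The Python sketch below is a simplification with hypothetical names; it splits the header on commas, so it does not handle quoted arguments that themselves contain commas.

```python
from typing import Dict, Union

Directives = Dict[str, Union[str, bool]]


def parse_cache_control(value: str) -> Directives:
    """Split a Cache-Control value into {directive: argument or True}."""
    directives: Directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else True
    return directives


def may_store(directives: Directives, shared: bool) -> bool:
    """no-store forbids storing; private forbids it for shared caches only."""
    if "no-store" in directives:
        return False
    if shared and "private" in directives:
        return False
    return True


def needs_validation(directives: Directives, is_stale: bool) -> bool:
    """no-cache always validates; otherwise only a stale copy is validated."""
    return "no-cache" in directives or is_stale
```

For a proxy or CDN, `shared` would be True; for the cache inside a client device it would be False, which is why a `private` response can still be reused locally.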
2. Cache-Control for the client
A client can also add a Cache-Control header to the request. The full meaning is as follows:
Two of the most common are:
- Cache-Control: no-cache + Pragma: no-cache
Indicates that the proxy cache must revalidate its copy; the copy can be served only after revalidation succeeds. Pragma is included to support HTTP/1.0;
- Cache-Control: no-store
Indicates that the proxy cache must not serve a cached copy and should delete the resource from its cache.
6. Heuristic expiration
1. Definition
If the response contains neither Cache-Control nor Expires, the cache can calculate a heuristic maximum age.
Any algorithm can be used to calculate the maximum age, but if the result is greater than 24 hours, a heuristic expiration warning should be added to the response headers. This is rarely done in practice, and the LM-Factor algorithm is the one commonly used.
2. LM-Factor algorithm
Conditions of use:
- There is no Cache-Control or Expires in the response;
- Last-Modified exists in the response;
Calculation method:
Heuristic max age = factor * (request time - last-modified time)
For example:
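As a rough numerical illustration, the Python sketch below applies the formula with a factor of 0.1. The factor is an assumption chosen for the example, since each cache picks its own value, and `heuristic_max_age` is a hypothetical name. The dates are borrowed from the response headers shown later in this article, with the Last-Modified value read as GMT.

```python
from datetime import datetime, timezone


def heuristic_max_age(request_time: datetime,
                      last_modified: datetime,
                      factor: float = 0.1) -> int:
    """LM-Factor: a fraction of the time since the last modification."""
    elapsed = (request_time - last_modified).total_seconds()
    # A Last-Modified date in the future gives no usable lifetime.
    return int(factor * max(0.0, elapsed))


request_time = datetime(2020, 2, 19, 8, 9, 14, tzinfo=timezone.utc)

# Modified about a week ago: the copy stays usable for many hours.
old = datetime(2020, 2, 12, 12, 1, 36, tzinfo=timezone.utc)
print(heuristic_max_age(request_time, old))           # 59085 seconds

# Modified at the moment of the request: the copy is stale immediately.
print(heuristic_max_age(request_time, request_time))  # 0 seconds
```

The intuition is that a resource left unchanged for a long time is unlikely to change in the next few hours, while one that changed a moment ago may well change again.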
3. Special tips
HTTP: The Definitive Guide specifically states:
In fact, Safari implements heuristic expiration.
7. Cache policies implemented by the iOS system
NSURLSession on iOS uses NSURLRequestCachePolicy. This enumeration shows that Apple treats the iPhone as a local cache device under the HTTP protocol and implements the corresponding caching logic. Its logic also relies on heuristic expiration, and the decision process is roughly as follows:
Validation:
iOS behaves as follows: if the response carries an ETag, If-None-Match is added to the request headers automatically. If the response carries Last-Modified, however, If-Modified-Since is not proactively added to the request headers; the response is simply stored and reused. On the second use, the cached copy is used directly without going to the network. At this point, heuristic expiration is in effect and the cached copy has not expired.
The original response headers (abridged):

    HTTP/1.1 200 OK
    Date: Wed, 19 Feb 2020 08:09:14 GMT
    Content-Type: image/jpeg
    Server: openresty/1.11.2.5
    Content-MD5: c6090671ef82012e7e71b6dc938dc706
    ETag: 51cf999237cf860b7fd92e6986fc4767
    Last-Modified: Wed, 12 Feb 2020 12:01:36 Asia/Shanghai

On the second request, no packet is captured, yet the response is 200 and contains data. Heuristic expiration is in effect here: the local response and data are used directly and no request is made.
In fact, there are several operations that can be performed on this Response using a Charles breakpoint:
- Remove Last-Modified
- Change Last-Modified to be close to the request time
- Add Cache-Control
1. Remove Last-Modified
Meaning: With only an ETag in the response, the client triggers neither heuristic expiration nor mandatory caching on the next request; it goes straight to cache revalidation.
The operation is as follows:
Use a Charles breakpoint to modify the response headers and remove the Last-Modified field. The final response headers look like this:

    HTTP/1.1 200 OK
    Date: Wed, 19 Feb 2020 08:09:14 GMT
    Content-Type: image/jpeg
    Server: openresty/1.11.2.5
    Content-MD5: c6090671ef82012e7e71b6dc938dc706
    ETag: 51cf999237cf860b7fd92e6986fc4767

On the next request, Charles captures the packet, which shows that neither mandatory caching nor heuristic expiration was triggered. The request and response headers are as follows:
As can be seen from the figure, the client sends a conditional GET for cache validation, and the validation succeeds.
2. Change Last-Modified to be close to the request time
Because Cache-Control is absent but Last-Modified is present, iOS applies heuristic expiration.
Before the modification, the Last-Modified value is relatively old. With heuristic expiration, the cached copy therefore does not expire for a while, so no request is sent and the cached copy is used directly.
However, if the Last-Modified value is close to the request time, heuristic expiration concludes that the resource changes frequently and the cached copy will expire soon, so iOS sends a conditional request to revalidate it.
The modified response is as follows:

    HTTP/1.1 200 OK
    Date: Wed, 19 Feb 2020 09:06:27 GMT
    Content-Type: image/jpeg
    Content-Length: 3444
    Connection: keep-alive
    Server: openresty/1.11.2.5
    Content-MD5: c6090671ef82012e7e71b6dc938dc706
    ETag: 51cf999237cf860b7fd92e6986fc4767
    Last-Modified: Wed, 19 Feb 2020 09:06:27 GMT
The request and response headers for the second request are as follows:
3. Add Cache-Control
**Meaning:** Once Cache-Control is added, the normal logic applies, that is, heuristic expiration is not triggered.
Since Cache-Control is relatively complex, max-age=5 and max-age=36500 are used as examples.
The modified response headers are:

    HTTP/1.1 200 OK
    Date: Wed, 19 Feb 2020 09:13:05 GMT
    Content-Type: image/jpeg
    Content-Length: 3444
    Connection: keep-alive
    Server: openresty/1.11.2.5
    Content-MD5: c6090671ef82012e7e71b6dc938dc706
    Cache-Control: max-age=5
    ETag: 51cf999237cf860b7fd92e6986fc4767
    Last-Modified: Wed, 12 Feb 2020 12:01:36 Asia/Shanghai

A request made more than 5 seconds later is not served by the mandatory cache. Since the mandatory cache has expired, normal cache revalidation is performed:
Similarly, with max-age=36500 the mandatory cache remains in effect over the short term, so the local cache is used directly and no request is made.
The test code is attached below:

    #import <Foundation/Foundation.h>

    // Image
    NSURL *url = [NSURL URLWithString:@"http://cms-bucket.ws.126.net/2020/0212/51cf9992j00q5kluo00bmc000tj00tjc.jpg?imageView&thumbnail=140y88"];
    NSMutableURLRequest *request = [NSMutableURLRequest new];
    request.HTTPMethod = @"GET";
    request.URL = url;

    // Query whether there is a local cache
    NSCachedURLResponse *cacheResponse = [[NSURLCache sharedURLCache] cachedResponseForRequest:request];
    if (cacheResponse) {
        NSLog(@"local cache");
    } else {
        NSLog(@"no local cache");
    }

    NSURLSessionDataTask *task = [[NSURLSession sharedSession] dataTaskWithRequest:request completionHandler:^(NSData * _Nullable data, NSURLResponse * _Nullable response, NSError * _Nullable error) {
        // Log the request headers, then the status code of the response
        NSLog(@"%@", request.allHTTPHeaderFields);
        NSHTTPURLResponse *httpResponse = (NSHTTPURLResponse *)response;
        if (data) {
            NSLog(@"statusCode:%li", (long)httpResponse.statusCode);
        } else {
            NSLog(@"response has no data");
        }
    }];
    [task resume];
Conclusion
A few knowledge points to summarize:
Closing remarks
Normally, iOS uses its default caching mechanism, and the server configures both Expires / Cache-Control and the Last-Modified / ETag validators (which the client sends back as If-Modified-Since / If-None-Match) according to HTTP/1.1. This meets most requirements and leaves the decision of whether to update the cache to the server; that is, an H5 page can control whether the page shown in the app is updated, without releasing a new app package.