Preface
You have probably read plenty of articles about negotiated caching. Most of the time, though, the key points are presented as bare statements, which are not only hard to remember but also easily distorted as they are passed from one article to the next. Below, I'll break down the mechanism of negotiated caching from the perspective of the nginx source code.
The cause
Most articles on negotiated caching talk about ETag and Last-Modified. Simply put, ETag is an identifier generated from the content of the file. Some articles describe it as a content hash, similar to MD5, and the server checks whether the identifier has changed to decide whether to return the resource file. Last-Modified is the time the file was last modified, and a change in this time is used to decide whether to return the resource file.
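To make the two validators concrete, here is a minimal Python sketch of how a server might derive them. The function names are hypothetical, and the content hash (SHA-256 here, standing in for the MD5-like hash those articles describe) is an assumption about one possible scheme, not what any particular server does.

```python
import hashlib
from email.utils import formatdate


def content_etag(body: bytes) -> str:
    """Hypothetical content-hash ETag: changes whenever the bytes change."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def last_modified_header(mtime: float) -> str:
    """Last-Modified value as an HTTP date, which only has one-second resolution."""
    return formatdate(int(mtime), usegmt=True)


# Two edits that land in the same second share a Last-Modified value,
# while a content hash still tells the two versions apart.
same_second = last_modified_header(0.2) == last_modified_header(0.9)
different_tags = content_etag(b"version 1") != content_etag(b"version 2")
```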
Typically, servers enable both by default, so the first question is which one takes priority. Most articles say ETag has the higher priority, which seemed fine to me at first: an identifier is obviously more precise, the cache must not be used once the file changes, and the content can change several times within a very short period without the modification time changing. The logic made sense until I read an article that spelled out the actual process.
Other articles had only mentioned ETag's higher priority. This was the first time I had seen a specific process, and it was very counterintuitive: if the ETag is a content hash and it matches, why would the server go on to check Last-Modified?
At that point I felt that such a detailed question would be hard to answer through a search engine. Then I thought: nginx is quite lightweight, so why not look for the answer directly in the source?
Checking the nginx source code
I won't go into the details of the search. I simply looked through the project for relevant keywords such as 304 and the If-None-Match header field.
The key source code is in:
nginx/src/http/modules/ngx_http_not_modified_filter_module.c
The first two ifs are unsurprising. What stands out is that Last-Modified is clearly checked first, and the idea is that as soon as one check fails, the resource is returned. 304 is returned only if both checks pass.
Analysis
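The order described above can be modeled in a few lines of Python. This is a sketch of the decision only, not a transcription of the nginx source: the function and parameter names are hypothetical, and both checks are simplified to exact string comparison (an assumption).

```python
from typing import Optional


def is_not_modified(
    if_modified_since: Optional[str],
    if_none_match: Optional[str],
    last_modified: str,
    etag: str,
) -> bool:
    """Return True when a 304 should be sent, False when the resource is sent."""
    if if_modified_since is None and if_none_match is None:
        return False  # not a conditional request at all

    # Last-Modified is checked first; a mismatch ends the decision immediately.
    if if_modified_since is not None and if_modified_since != last_modified:
        return False

    # Only then is the ETag checked; a mismatch also returns the resource.
    if if_none_match is not None and if_none_match != etag:
        return False

    # Every validator the client sent matched.
    return True
```

With this model, a request whose If-Modified-Since matches but whose If-None-Match does not still gets the full resource, which is the behavior the rest of the article tries to explain.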
So far we have the answer, but we don't know why nginx does it this way. The reason for our confusion is that we have been thinking as users. Suppose instead that a product requirement asked us to implement negotiated caching for nginx with both of these validators. We would have two ideas, giving four combinations (pseudocode):
/** Idea 1: return 304 as soon as either validator matches */
// Combination 1: modification time first
if (header_in.if_modified_since && checkTime(header_in.if_modified_since)) return response(304)
if (header_in.if_none_match && checkEtag(header_in.if_none_match)) return response(304)
return newResource()
// Combination 2: ETag first
if (header_in.if_none_match && checkEtag(header_in.if_none_match)) return response(304)
if (header_in.if_modified_since && checkTime(header_in.if_modified_since)) return response(304)
return newResource()
/** Idea 2: return the new resource as soon as either validator fails to match */
// Combination 3: modification time first
if (header_in.if_modified_since && !checkTime(header_in.if_modified_since)) return newResource()
if (header_in.if_none_match && !checkEtag(header_in.if_none_match)) return newResource()
return response(304)
// Combination 4: ETag first
if (header_in.if_none_match && !checkEtag(header_in.if_none_match)) return newResource()
if (header_in.if_modified_since && !checkTime(header_in.if_modified_since)) return newResource()
return response(304)
The first combination is definitely not acceptable. If the modification time is checked first, its precision is only one second, so once the file is changed repeatedly within one second the check gives the wrong answer and the new file is never returned.
The second combination checks the ETag first. If the ETag matches, 304 is returned, which is fine. But if the ETag does not match, going on to check the modification time is also wrong: the ETag mismatch already tells us the content has changed, yet if the modification time happens to be the same, 304 is returned, which is still functionally incorrect.
So idea number one does not work. On to idea number two.
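The failure of idea one can be reproduced with a small Python sketch. The two functions mirror the first two combinations from the pseudocode, with checkTime and checkEtag reduced to string equality (an assumption); the ETag values "v1" and "v2" are placeholders for the old and new content identifiers.

```python
from typing import Optional


def time_first_any_match(ims: Optional[str], inm: Optional[str],
                         last_modified: str, etag: str) -> int:
    """First combination: 304 as soon as the time matches, else try the ETag."""
    if ims is not None and ims == last_modified:
        return 304
    if inm is not None and inm == etag:
        return 304
    return 200


def etag_first_any_match(ims: Optional[str], inm: Optional[str],
                         last_modified: str, etag: str) -> int:
    """Second combination: 304 as soon as the ETag matches, else try the time."""
    if inm is not None and inm == etag:
        return 304
    if ims is not None and ims == last_modified:
        return 304
    return 200


# The file was rewritten within the same second: the timestamp is unchanged,
# but the content identifier went from "v1" (cached by the client) to "v2".
STAMP = "Thu, 01 Jan 2015 00:00:00 GMT"
result_time_first = time_first_any_match(STAMP, '"v1"', STAMP, '"v2"')
result_etag_first = etag_first_any_match(STAMP, '"v1"', STAMP, '"v2"')
# Both results are 304, so the client keeps the stale copy either way.
```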
With these two checks, the order does not affect the result. Either order is reasonable: if either validator fails to match, a new file should be returned. What is left to consider is efficiency. checkEtag looks like just one method call, but to check the ETag, doesn't the server need to compute an ETag from the file and compare it with the one sent in the header? Computing a file's ETag is not cheap, and for a large file it is bound to be expensive. Checking the modification time, by contrast, just compares a ready-made string, which obviously costs very little.
Conclusion
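The efficiency argument can be illustrated with a short Python sketch in which the ETag comparison is passed in as a callable, so it only runs when it is actually needed. The names are hypothetical, and the "expensive" check is simulated with a call counter rather than real file hashing.

```python
from typing import Callable, List


def respond(time_matches: bool, etag_matches: Callable[[], bool]) -> int:
    """Cheap time check first; the costly ETag check runs only if the time matched."""
    if not time_matches:
        return 200
    if not etag_matches():
        return 200
    return 304


calls: List[int] = []


def expensive_etag_check() -> bool:
    """Stand-in for computing an ETag from the file; records each invocation."""
    calls.append(1)
    return True


status_when_time_differs = respond(False, expensive_etag_check)
calls_after_time_mismatch = len(calls)   # the ETag check was skipped
status_when_time_matches = respond(True, expensive_etag_check)
calls_after_time_match = len(calls)      # the ETag check ran once
```

Swapping the two checks would give the same status codes but would run the costly check on every conditional request.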
If the modification time does not match, the new file is returned right away, which avoids the cost of computing the file's ETag. If the modification time matches, the ETag is checked next, and only if it also matches is 304 returned. This is the nginx scheme.
One more thing
When I tried to run a controlled experiment, I found that nginx's ETag is extremely coarse. It is built only from the length of the file content combined with the modification time (see the method ngx_http_set_etag), so there was no way to vary the two validators independently. I used a script to modify the file several times within one second without changing its length, and found that nginx did misjudge the file as unchanged. This drove home a simple truth: real knowledge comes from practice. Let's keep encouraging one another.
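The collision described above is easy to reproduce in a Python sketch. It assumes the ETag is the modification time and the content length, both in hexadecimal and joined by a hyphen; the function name is hypothetical and the sketch imitates the idea rather than quoting nginx's code.

```python
def length_and_time_etag(mtime: int, size: int) -> str:
    """ETag built only from the whole-second mtime and the byte length (assumed format)."""
    return '"%x-%x"' % (mtime, size)


# Two different versions of a file, same length, written within the same second.
version_a = b"color: red;!"
version_b = b"color: blue;"
same_second = 1700000000

etag_a = length_and_time_etag(same_second, len(version_a))
etag_b = length_and_time_etag(same_second, len(version_b))

# The content differs but the validators do not, so a conditional request
# carrying etag_a would still be answered with 304 after the second write.
collision = version_a != version_b and etag_a == etag_b
```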
Note: The ETag algorithm is not the same on every server. Apache, for example, builds it from file attributes selected with the FileETag directive. If you are interested, you can explore it further.