What is the browser caching policy? What exactly do request and response headers contain? (Juejin essay)

HTTP request headers and response headers

General-header

Request URL: https://*****.com.cn/index/****List?keyid=100000032&key1Id=2863&key2Id=150600 (the request URL)
Request Method: GET (the request method)
Status Code: 200 OK (the status code)
Remote Address: 192.168.0.181:443 (the remote address of the request)
Referrer Policy: no-referrer-when-downgrade (the referrer policy)

Request headers (client -> server)

GET /landingpage/getDealerList?keyid=1000&key1Id=286&key2Id=150600 HTTP/1.1 (request method, path, and the protocol and version of the request)
Host: localhost:8080 (the destination host and port number)
Accept: application/json, text/javascript, */*; q=0.01 (the resource types the client can accept; another example: image/webp, image/apng, image/*, */*; q=0.8)
Accept-Language: zh-CN,zh; q=0.8 (the languages the client accepts)
Accept-Encoding: gzip, deflate (the compression encodings the client accepts)
Connection: keep-alive (keep the connection between client and server open)
Referer: http://localhost/links.asp (where the request came from)
User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1
If-Modified-Since: Tue, 22 May 2017 18:23:51 GMT (cache-related; used for validation)
Cookie: sensorsdasdkcro11ss=%7B%22distin322ct_id%22%3A%22NaN%22%2C%22first_id%22%3A%2217254cc2025400-3686400-17254cc2026450%22%2 C%22props%22%3A%7B%22%24latest_traffic2_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%C%22%24latest_search_keywo rd%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A593%E5%BC%80%22%2C%22%24latest_referrer%22%3A%22%22%7DA % 2217254CC202541-0649a2DD8253DD-30667d00-3686400-17254CC2026450 %22%7D (data the server asked the client to store temporarily)
Date: Tue, 11 Jul 2000 18:23:51 GMT (the time at which the client sent the request)
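To make the layout of a request concrete, here is a minimal sketch that splits a raw request head into its request line and headers. The function name `parse_request_head` and the shortened sample request are illustrative assumptions, not part of any library, and a real server would use a full HTTP parser.

```python
def parse_request_head(raw: str):
    """Split a raw HTTP request head into (method, target, version, headers)."""
    lines = raw.strip().splitlines()
    # The first line is the request line: method, target, protocol version.
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them to lower case.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


# Hypothetical, shortened version of the request shown above.
raw_request = (
    "GET /landingpage/getDealerList?keyid=1000&key1Id=286&key2Id=150600 HTTP/1.1\r\n"
    "Host: localhost:8080\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)
method, target, version, headers = parse_request_head(raw_request)
print(method, version, headers["host"])
```

Only the first colon separates a header name from its value, which is why `Host: localhost:8080` keeps its port number intact.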

Response headers (server -> client)

HTTP/1.1 200 OK (the protocol and version of the response, the status code, and its description)

Other status codes that can appear here:
302 (the server does not hold the requested resource and tells the client to request another URL, so the client sends a second request; this is a redirect, and two requests are sent in total)
307 (a temporary redirect like 302, except that the client must repeat the request with the same method and body; a server-side forward, where the server itself fetches the resource from elsewhere, is invisible to the client and has no status code of its own)
304 (the client already holds a valid cached copy, so there is no need to download the content again; the server tells the client to use its own cache, which is an optimization)
500 (the requested resource exists on the server, but an error occurred while processing it)

Location: http://www.baidu.com (the URL the server wants the client to visit)
Server: openresty (the server software; OpenResty is the platform our company uses)
Refresh: 1; url=http://www.baidu.com (the server asks the client to wait 1 second and then load the given URL)
Transfer-Encoding: chunked (the data is sent to the client in chunks)
Set-Cookie: SS=Q0=5Lb_nQ; path=/search (the server sends temporary data for the client to store)
Last-Modified: Mon, 22 May 2017 09:41:07 GMT (cache-related; another example: Mon, 28 Sep 1970 05:00:00 GMT)
Cache-Control: no-cache, no-store, must-revalidate (cache-related)
Pragma: no-cache (the server prevents the client from caching the page data)
ETag: W/"5f017c5c-a8e" (cache-related)
Connection: close (HTTP/1.0) / keep-alive (HTTP/1.1) (whether the connection between client and server is kept open)
Content-Language: zh-cn
Content-Type: application/json; charset=utf-8 (another example: image/gif)
Content-Encoding: gzip (the compression encoding the server applied to the body)
Content-Length: 888 (the length of the compressed data sent by the server)
Content-Disposition: attachment; filename=aaA.zip (the server asks the client to treat the file as a download)
Date: Mon, 22 May 2017 18:41:07 GMT

A closer look at Content-Type

A media type, short for Internet media type, is also known as a MIME type. In HTTP headers, Content-Type indicates the media type of the body carried by a particular request or response. For example: Content-Type: text/html; charset=utf-8
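As a small illustration of how such a header value breaks down into a media type and parameters, the sketch below uses Python's standard `email.message.Message` class, which understands this syntax. The helper name `parse_content_type` is an assumption made for this example.

```python
from email.message import Message


def parse_content_type(value: str):
    """Return (media_type, charset) for a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_type() returns the lower-cased "type/subtype" part;
    # get_content_charset() returns the charset parameter, or None if absent.
    return msg.get_content_type(), msg.get_content_charset()


print(parse_content_type("text/html; charset=utf-8"))
print(parse_content_type("application/json"))
```

Note that the parameter is written with an equals sign (`charset=utf-8`), not a colon.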

Common media format types are as follows:

text/html: HTML format
text/plain: plain text format
text/xml: XML format
image/gif: GIF image format
image/jpeg: JPEG image format
image/png: PNG image format

Media types beginning with application:

application/xhtml+xml: XHTML format
application/xml: XML data format
application/atom+xml: Atom XML feed format
application/json: JSON data format
application/pdf: PDF format
application/msword: Word document format
application/octet-stream: binary stream data (for example, ordinary file downloads)
application/x-www-form-urlencoded: the default enctype of <form enctype="">; the form data is encoded as key/value pairs and sent to the server (the default format for form submission)

Another common media type is used when uploading files:

multipart/form-data: this format is required when a form uploads files
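To show what the key/value encoding of application/x-www-form-urlencoded looks like in practice, here is a short sketch using Python's standard `urllib.parse` module. The field names and values in `form` are made up for the example.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields; note the space and the "&" inside the values.
form = {"keyid": "1000", "name": "zh cn", "q": "a&b"}

# Encode the fields the way a browser does for a default form submission.
body = urlencode(form)
print(body)

# Decoding on the server side recovers the original values (as lists,
# because a key may legitimately appear more than once).
print(parse_qs(body))
```

Spaces become `+` and reserved characters such as `&` are percent-encoded, so the separators between pairs stay unambiguous.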

A brief look at the status codes in the response

1xx: Informational; the request was received and processing continues
200: Success; data is returned
201: The request succeeded and a resource was created
202: The request was accepted but has not finished processing
203: Success, but the returned information is non-authoritative (for example, it was modified by a proxy)
204: Success, but there is no content to return
205: Success; the client should reset the content
206: Success; partial content is returned
301: Moved permanently (redirect)
302: Temporary redirect; the original URL remains valid
304: The resource has not changed; the cached copy can still be used
305: The resource must be accessed through a proxy
400: The request has a syntax error
401: Authentication is required
403: The server refuses the request
404: The resource does not exist
500: Server error
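The first digit of a status code gives its class, which the following sketch makes explicit using the standard `http.HTTPStatus` enum for the official reason phrases. The `CLASSES` table and the `describe` helper are names chosen for this example.

```python
from http import HTTPStatus

# The leading digit of a status code identifies its class.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return the code, its standard reason phrase, and its class."""
    status = HTTPStatus(code)
    return f"{code} {status.phrase} ({CLASSES[code // 100]})"


for code in (200, 203, 304, 404, 500):
    print(describe(code))
```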

Query String Parameters (the request parameters, formatted)

adtag: 123456
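The formatted view that the developer tools show is simply the query string of the URL split into key/value pairs. A minimal sketch with Python's standard `urllib.parse`, reusing the request path from the request header example above:

```python
from urllib.parse import parse_qsl, urlsplit

url = "/landingpage/getDealerList?keyid=1000&key1Id=286&key2Id=150600"

parts = urlsplit(url)
# parse_qsl returns (key, value) pairs in the order they appear in the URL.
params = dict(parse_qsl(parts.query))

print(parts.path)
for key, value in params.items():
    print(f"{key}: {value}")
```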

What is browser caching?

Simply put, the browser cache is a copy of requested web resources (such as HTML, images, JS, and data) that is stored in the browser. The cache saves a copy of each response as requests come in. When the next request arrives for the same URL, the caching mechanism decides whether to answer it directly from the copy or to send the request to the origin server again. The most common case is that the browser caches the pages of a site it has visited. When the browser visits the same URL again and the page has not been updated, it does not download the page a second time but uses the locally cached copy instead. Only when the site clearly indicates that the resource has changed does the browser download it again. The mechanism by which browsers and web servers decide whether a page is up to date is described in the sections that follow.

Opening the page for the first time (all data is requested from the server)

Opening the page for the second time (from disk cache, from memory cache)

Why use browser caching?

1. Reduce bandwidth consumption. Traffic and bandwidth cost money, especially for large Internet companies, for which the savings could run to hundreds of thousands of dollars. When cached copies are used, only a small amount of network traffic is needed to request resources from the server, which reduces operating costs.
2. Reduce the pressure on the server. Once a validity period is set on network resources, users reuse their local cache, which reduces requests to the origin server and indirectly reduces the load on it. Search engine crawlers can also lower their crawl frequency according to the expiration mechanism, which further reduces server load.
3. Reduce network latency and speed up page loading. Bandwidth matters a great deal to operators of personal websites, while large Internet companies may sometimes have enough money not to care. Is web caching still useful then? The answer is yes: for the end user, caching significantly speeds up page loading and gives a better experience.

Browser cache rules?

For browser-side caches, the rules are defined in HTTP headers and in the meta tags of HTML pages. They use freshness and validation to decide whether the browser can use the cached copy directly or needs to go to the origin server for a newer version.

Freshness (the expiration mechanism): this is the validity period of the cached copy. The browser considers a cached copy valid and fresh enough if one of the following conditions holds:
1. It carries complete expiration control headers (HTTP headers) and is still within its validity period.
2. The browser has already used the cached copy and has checked its freshness within the current session.
If either condition holds, the browser takes the copy straight from the cache and renders it.

Validation (the verification mechanism): when the server returns a resource, it sometimes includes the resource's entity tag (ETag) in the control headers, which the browser can use as a validator when it requests the resource again. If the tags do not match, the resource has been modified or has expired, and the browser needs to fetch the content again.
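The interplay of freshness and validation described above can be sketched as a small decision function. This is a simplified model under stated assumptions: times are plain seconds, the cached entry is described only by when it was stored, its max-age, and an optional ETag, and the names `is_fresh` and `next_step` are invented for the example.

```python
from typing import Optional


def is_fresh(stored_at: float, max_age: int, now: float) -> bool:
    """A copy is fresh while its age is below its freshness lifetime."""
    age = now - stored_at
    return age < max_age


def next_step(stored_at: float, max_age: int, now: float, etag: Optional[str]) -> str:
    """Decide what a cache does with a stored copy."""
    if is_fresh(stored_at, max_age, now):
        return "use cached copy"
    if etag is not None:
        # Stale, but a validator is available: ask the server whether
        # the copy is still good instead of downloading it again.
        return f"revalidate with If-None-Match: {etag}"
    return "fetch again"


print(next_step(stored_at=0, max_age=60, now=30, etag='"abc"'))
print(next_step(stored_at=0, max_age=60, now=90, etag='"abc"'))
print(next_step(stored_at=0, max_age=60, now=90, etag=None))
```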

Browser cache control?

Expires policy: Expires is a response header field set by the web server. It tells the browser that, until the stated expiry time, it may read the data straight from its cache without sending another request. However, Expires dates from HTTP/1.0, and browsers now use HTTP/1.1 by default, so its role has largely been taken over. Its main drawback is that the time it returns is the server's time: if the client's clock is out of sync with the server's (for example, because one of the clocks or its time zone is set wrongly), the error can be large. For this reason, HTTP/1.1 introduced Cache-Control: max-age=<seconds> as a replacement.
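The clock problem with Expires can be demonstrated in a few lines. In this sketch the dates, the 60-second lifetime, and the five-minute clock skew are invented values, and the two helper functions are illustrative names; date handling uses the standard `email.utils` module.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# The server says: "this response is valid for 60 seconds from my clock".
server_now = datetime(2017, 5, 22, 18, 41, 7, tzinfo=timezone.utc)
expires_header = format_datetime(server_now + timedelta(seconds=60), usegmt=True)


def fresh_by_expires(expires: str, client_now: datetime) -> bool:
    """Expires compares an absolute server time with the client's clock."""
    return client_now < parsedate_to_datetime(expires)


def fresh_by_max_age(max_age: int, age_seconds: float) -> bool:
    """max-age compares a relative lifetime with the age of the copy."""
    return age_seconds < max_age


# Assume the client's clock runs five minutes ahead of the server's.
skewed_client_now = server_now + timedelta(minutes=5)

print(expires_header)
print(fresh_by_expires(expires_header, skewed_client_now))  # judged by absolute time
print(fresh_by_max_age(60, age_seconds=0))                  # judged by elapsed time
```

With the skewed clock, the response looks expired the moment it arrives when judged by Expires, while the relative max-age check is unaffected.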

Cache-Control policy (important): Cache-Control works much like Expires: it states how long the current resource stays valid and so controls whether the browser reads the data straight from its cache or requests it from the server again. Cache-Control simply offers more options and finer-grained control, and when both are set it takes precedence over Expires.

Its value can be public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, or max-age.

public: the response may be stored by any cache.
private: the response is intended for a single user and must not be stored by a shared cache. This lets the server state that all or part of a response is valid for one user only and must not be used to answer other users' requests.
no-cache: the cached copy must not be used without first being revalidated with the server. Despite its name, this option does not mean "do not cache".
no-store: neither the request nor the response may be stored at all; it is used to prevent important information from being released unintentionally.
max-age: the client accepts a response whose age is no greater than the specified number of seconds.
min-fresh: the client accepts a response that will remain fresh for at least the specified number of seconds from now.
max-stale: the client accepts a response that has passed its expiration time. If a value is given, the response may be stale by no more than that many seconds.
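Because a Cache-Control value is a comma-separated list of directives, some of which take an argument, it is easy to turn into a dictionary. The parser below is a minimal sketch with an assumed name, `parse_cache_control`; it does not handle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control header value into {directive: argument}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        arg = arg.strip().strip('"')
        if sep and arg.isdigit():
            directives[name] = int(arg)  # e.g. max-age=3600
        elif sep:
            directives[name] = arg
        else:
            directives[name] = True      # flag directives such as no-store
    return directives


print(parse_cache_control("no-cache, no-store, must-revalidate"))
print(parse_cache_control("public, max-age=3600"))
```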

Last-Modified / If-Modified-Since: these two headers are used together with Cache-Control.

Last-Modified: indicates when the resource in the response was last modified. When the web server responds to a request, it tells the browser when the resource was last modified.

If-Modified-Since: when a cached resource has expired (according to the max-age given by Cache-Control) and was stored with a Last-Modified value, the browser requests it from the web server again with an If-Modified-Since header that carries that Last-Modified value. When the web server receives the request and finds the If-Modified-Since header, it compares it with the last modification time of the requested resource. If the resource's last modification time is later, the resource has changed, and the server responds with the whole resource (in the response body) and HTTP 200. If it is not later, the resource has not been modified, and the server responds with HTTP 304 to tell the browser to keep using its saved copy.
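The server-side comparison can be sketched as follows. This is an illustrative model, not any particular server's implementation: `status_for_if_modified_since` is an invented name, and the sample dates are taken from the response header example earlier in the article.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def status_for_if_modified_since(last_modified: datetime,
                                 if_modified_since: Optional[str]) -> int:
    """Return 200 (send the full resource) or 304 (use the cached copy)."""
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an unparseable date is ignored
    # HTTP dates have one-second resolution, so drop the microseconds.
    if last_modified.replace(microsecond=0) > since:
        return 200  # modified after the client's copy was stored
    return 304


resource_mtime = datetime(2017, 5, 22, 9, 41, 7, tzinfo=timezone.utc)
print(status_for_if_modified_since(resource_mtime, "Mon, 22 May 2017 09:41:07 GMT"))
print(status_for_if_modified_since(resource_mtime, "Sun, 21 May 2017 09:41:07 GMT"))
```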

ETag / If-None-Match: these two headers are also used together with Cache-Control.

ETag: when the web server responds to a request, it tells the browser the unique identifier of the current resource on the server (the generation rule is decided by the server). In Apache, for example, the ETag value has traditionally been derived by hashing the file's inode, size, and last modification time.

If-None-Match: when a cached resource has expired (according to the max-age given by Cache-Control) and was stored with an ETag, the browser requests it from the web server again with an If-None-Match header that carries the ETag value. When the web server receives the request and finds If-None-Match, it compares the value with the current validator string of the requested resource and returns 200 or 304 accordingly.
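A content-based ETag and the matching If-None-Match check might look like the sketch below. The generation rule (a truncated SHA-256 of the body) is an assumption chosen for the example, since each server decides its own rule, and the function names are illustrative. Splitting on commas is a simplification that assumes the tags themselves contain none.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the content (one possible rule of many)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def strip_weak(tag: str) -> str:
    """If-None-Match uses weak comparison, so ignore a leading W/ prefix."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def status_for_if_none_match(current_etag: str, if_none_match: Optional[str]) -> int:
    """Return 304 if any tag sent by the client matches the current one."""
    if if_none_match is None:
        return 200
    if if_none_match.strip() == "*":
        return 304  # "*" matches any existing representation
    candidates = [strip_weak(t) for t in if_none_match.split(",")]
    return 304 if strip_weak(current_etag) in candidates else 200


etag = make_etag(b"hello")
print(status_for_if_none_match(etag, etag))
print(status_for_if_none_match(etag, make_etag(b"changed")))
```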

If Last-Modified already exists, why was ETag needed? You might feel that Last-Modified is enough to let the browser know whether its locally cached copy is fresh enough, so why do we also need ETag (the entity tag)? ETag was introduced in HTTP/1.1 mainly to solve several problems that are hard to solve with Last-Modified:

1. Last-Modified is only accurate to the second. If a file is modified more than once within a second, Last-Modified cannot record the exact modification time.
2. If a file is regenerated periodically, its content may sometimes be unchanged while its Last-Modified value changes, which makes the cached copy unusable.
3. The server may fail to obtain the correct modification time of the file, or its time may be inconsistent with that of a proxy server.

An ETag is a unique identifier for the resource on the server, generated automatically by the server or by the developer, and it allows more precise control of the cache. When Last-Modified is used together with ETag, the server validates the ETag first.
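The one-second limit of Last-Modified is easy to reproduce. In this sketch two versions of a file are written within the same second; the timestamps, contents, and helper names are invented for the example, and the content-hash ETag is only one possible generation rule.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime

# Two modifications that happen within the same second.
t1 = datetime(2017, 5, 22, 9, 41, 7, 100_000, tzinfo=timezone.utc)
t2 = datetime(2017, 5, 22, 9, 41, 7, 900_000, tzinfo=timezone.utc)
v1, v2 = b"version 1", b"version 2"


def http_date(dt: datetime) -> str:
    """Format a timestamp as an HTTP date (whole seconds only)."""
    return format_datetime(dt, usegmt=True)


def content_etag(body: bytes) -> str:
    """A content-hash ETag, assumed here for illustration."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


# The Last-Modified values are identical, so the change is invisible...
print(http_date(t1) == http_date(t2))
# ...while the ETags differ, so the change is detected.
print(content_etag(v1) == content_etag(v2))
```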

Yahoo's YSlow rules advise setting ETag with care. In a distributed system, the Last-Modified time of a file must be kept identical across machines; otherwise, when the load balancer sends requests to different machines, the comparison fails. Yahoo recommends that distributed systems turn ETags off where possible, because the ETag generated on each machine will differ: besides Last-Modified, the inode is also hard to keep consistent.

The Pragma header exists for HTTP/1.0 compatibility; Pragma: no-cache is used in the same way as Cache-Control: no-cache.

Browser HTTP request flow

At the first request

On a repeat request

Requests that cannot be cached:

Of course, not every request can be cached:

1. Requests whose HTTP headers tell the browser not to use the cache, such as Cache-Control: no-cache, Pragma: no-cache (HTTP/1.0), or Cache-Control: max-age=0.
2. Dynamic requests whose output depends on cookies, authentication information, and the like.
3. Requests whose HTTP response headers contain neither Last-Modified/ETag nor Cache-Control/Expires.
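The header-based rules above can be written down as a rough checker. This is a sketch of the article's rules of thumb only, not of the full caching rules that real browsers apply, and `may_be_cached` is a name invented for the example. The cookie and authentication rule is left out because it cannot be judged from response headers alone.

```python
def may_be_cached(response_headers: dict) -> bool:
    """Apply the header-based rules of thumb listed above."""
    h = {name.lower(): value.lower() for name, value in response_headers.items()}
    directives = [d.strip() for d in h.get("cache-control", "").split(",") if d.strip()]

    # Headers that tell the browser not to rely on the cache.
    if "no-cache" in directives or "max-age=0" in directives:
        return False
    if "no-cache" in h.get("pragma", ""):
        return False

    # Without a validator or any expiration information there is
    # nothing for the browser to base a caching decision on.
    has_validator = "last-modified" in h or "etag" in h
    has_expiry = "cache-control" in h or "expires" in h
    return has_validator or has_expiry


print(may_be_cached({"Cache-Control": "max-age=3600"}))
print(may_be_cached({"Cache-Control": "no-cache"}))
print(may_be_cached({"Content-Type": "text/html"}))
```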

How does the browser decide between “From Disk cache” and “from Memory cache”?

Looking on Zhihu, where many people are puzzling over this question, I found a couple of reliable answers:

1. Check memory first. If the resource is there, load it directly.
2. If it is not in memory, look on the hard disk. If it is there, load it directly.
3. If it is not on the hard disk either, make a network request.
4. Cache the loaded resource to the hard disk and to memory.
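The lookup order above can be modelled with two dictionaries standing in for memory and disk. This is a toy sketch, not how any browser is actually implemented: the class name, the `fetch` callback, and the choice to copy disk hits into memory are all assumptions made for illustration.

```python
class TwoLevelCache:
    """Toy model of the memory -> disk -> network lookup order."""

    def __init__(self, fetch):
        self.memory = {}    # fast, lost when the process exits
        self.disk = {}      # slower, survives a restart
        self.fetch = fetch  # callable that performs the network request

    def load(self, url):
        if url in self.memory:
            return self.memory[url], "memory cache"
        if url in self.disk:
            body = self.disk[url]
            self.memory[url] = body  # assumed: promote disk hits to memory
            return body, "disk cache"
        body = self.fetch(url)
        # Store the freshly loaded resource on disk and in memory.
        self.disk[url] = body
        self.memory[url] = body
        return body, "network"

    def restart(self):
        """Simulate closing the browser: memory is cleared, disk is kept."""
        self.memory.clear()


cache = TwoLevelCache(fetch=lambda url: b"image bytes")
print(cache.load("/logo.png")[1])  # first visit
cache.restart()
print(cache.load("/logo.png")[1])  # after reopening the browser
print(cache.load("/logo.png")[1])  # after a refresh
```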

Let's start with the characteristics of the memory cache: it is fast (quick to read) and short-lived (when the process dies, the cache dies with it).

The first phenomenon: access -> 200 -> close the browser and reopen it -> 200 (from disk cache) -> refresh -> 200 (from memory cache)

The second phenomenon (taking images as an example): whenever the image is base64-encoded, I see "from memory cache".

The third phenomenon (taking JS and CSS as an example): large JS and CSS files come straight from the disk cache.

The fourth phenomenon: in private browsing mode, almost everything comes from the memory cache.