Every front-end developer is familiar with browser caching, and it is one of the most important optimization techniques in day-to-day development. It saves bandwidth, speeds up loading and rendering, reduces network congestion, and improves the user experience.
The browser cache process is as follows:
- Start loading: domain name resolution (DNS cache)
- Local cache (memory cache)
- HTTP caching (strong and negotiated caching)
- Server-side cache (CDN cache)
DNS cache
Usually, when we enter a web address, it contains a domain name and, optionally, a port. The domain name maps to an IP address, which the browser needs in order to establish a connection and communicate. The process of finding the IP address for a domain name is DNS resolution.
www.dnscache.com (domain name) - DNS resolution -> 11.222.33.444 (IP address)
Resolution adds latency to a network request, so the browser caches the IP address the first time it obtains it. The next time a request is made to the same domain name, the browser checks its local cache first. If the cached entry is still valid, the browser uses that IP address directly. Otherwise, it resolves the domain name again in the following order:
- First, search the browser's own DNS cache. If an entry is present, domain name resolution is complete.
- If the browser does not find an entry in its cache, it attempts to read the operating system's hosts file to check whether a mapping exists. If the mapping exists, domain name resolution is complete.
- If no mapping exists in the local hosts file, query the local DNS server (the ISP's server or a manually configured DNS server). If it has the record, the domain name is resolved.
- If the local DNS server has no record, it resolves the name on the client's behalf, starting with a request to the root servers.
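The lookup order above can be sketched as a chain of fallbacks. This is only an illustration, not how any particular browser implements it: the function name resolve, the dictionaries standing in for each cache layer, and the query_upstream callback are all hypothetical, and real resolvers also honor record lifetimes (TTL), which this sketch leaves out.

```python
from typing import Callable, Dict, Tuple


def resolve(
    domain: str,
    browser_cache: Dict[str, str],
    hosts_file: Dict[str, str],
    local_dns_cache: Dict[str, str],
    query_upstream: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (ip, source), trying each layer in the order described above."""
    layers = (
        ("browser DNS cache", browser_cache),
        ("hosts file", hosts_file),
        ("local DNS server", local_dns_cache),
    )
    for source, table in layers:
        if domain in table:
            return table[domain], source

    # Nothing cached anywhere: the local DNS server asks upstream (hypothetical callback).
    ip = query_upstream(domain)
    # Remember the answer so the next lookup stops at an earlier layer.
    local_dns_cache[domain] = ip
    browser_cache[domain] = ip
    return ip, "upstream query"
```

The second lookup for the same domain stops at the browser's own cache, which is why only the first request to a domain pays the full resolution cost.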
Memory cache
The memory cache is an optimization the browser applies on its own to make cached reads faster. It is outside the developer's control and is not governed by HTTP headers. Once a resource is stored in memory, the next request for it does not go through the network but is served directly from memory. When the page is closed, the resource is released from memory, so when the same page is opened again the resource will not be shown as "from memory cache".
So one might ask: when is a resource put into the memory cache?
The answer is that almost all network requests are automatically added to the memory cache by the browser according to its own policies. But the memory cache is destined to be "short-term storage", both because the volume of resources is large and because the browser cannot take up unlimited memory. When the amount of data grows too large, cached entries are dropped even if the page is not closed.
The memory cache ensures that two identical requests on a page (for example, two elements with the same src or the same href) go to the network at most once, avoiding waste.
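A minimal sketch of the two properties just described: repeated requests for the same URL reach the network at most once, and entries are dropped once the cache is full. Browsers do not document their eviction policy, so the least-recently-used rule here is an assumption chosen for illustration; the MemoryCache class and its fetch callback are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable


class MemoryCache:
    """Bounded in-memory cache; evicts the least recently used entry (an assumed policy)."""

    def __init__(self, capacity: int, fetch: Callable[[str], str]) -> None:
        self._capacity = capacity
        self._fetch = fetch  # hypothetical stand-in for a real network request
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, url: str) -> str:
        if url in self._entries:
            # Cache hit: no network request; mark the entry as recently used.
            self._entries.move_to_end(url)
            return self._entries[url]

        body = self._fetch(url)
        self._entries[url] = body
        if len(self._entries) > self._capacity:
            # Memory is limited, so the oldest entry goes even though the page is still open.
            self._entries.popitem(last=False)
        return body
```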
Disk cache (HTTP cache)
The hard disk cache, also known as the HTTP cache, is the most important part of browser caching. As we have seen, the DNS cache only handles IP address lookups and works autonomously, and the memory cache is outside our control, something of a black box. The hard disk cache is the part we can control, so its importance is self-evident, and most cache optimization targets it.
HTTP caching is divided into mandatory (strong) caching and negotiated caching.
Mandatory cache (also called strong cache)
Strong caching is controlled by two fields, Expires and Cache-Control, where Cache-Control takes precedence over Expires.
When a client makes a request and the server wants the client to cache the resource, the server adds the following to the response headers:
- Cache-Control: max-age=3600 (please cache this resource for 3600 seconds, i.e. 1 hour)
- Expires: Thu, 10 Nov 2020 08:45:11 GMT (the point in time at which the cache expires)
- Date: Thu, 30 Apr 2020 12:39:56 GMT (the server's response time)
- ETag: W/"121-171ca289ebf" (an identifier for this version of the resource)
- Last-Modified: Thu, 30 Apr 2020 08:16:31 GMT (the time this resource was last modified)
Cache-Control and Expires come from HTTP/1.1 and HTTP/1.0, respectively. To stay compatible with both versions, real projects usually set both fields.
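As a rough sketch of setting both fields together, the helper below derives Expires from the same max-age so the two headers agree. The function name build_cache_headers is hypothetical and not tied to any web framework; it only produces the header values.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict


def build_cache_headers(max_age_seconds: int, now: datetime) -> Dict[str, str]:
    """Build strong-cache headers for both HTTP/1.1 and HTTP/1.0 clients.

    `now` must be a timezone-aware UTC datetime.
    """
    expires_at = now + timedelta(seconds=max_age_seconds)
    return {
        "Cache-Control": f"max-age={max_age_seconds}",      # HTTP/1.1
        "Expires": format_datetime(expires_at, usegmt=True),  # HTTP/1.0 fallback
        "Date": format_datetime(now, usegmt=True),            # server response time
    }


headers = build_cache_headers(3600, datetime.now(timezone.utc))
```

Deriving Expires from max-age, rather than setting the two independently, avoids the situation where the two headers describe different lifetimes.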
After the browser receives the response, it does the following:
- Caches the response body of the request in a local file
- Records the request method and the request path of this request
- Marks this cache entry as valid for 3600 seconds
- Records the server's response time: 2020-04-30 12:39:56 GMT
This record is important because it provides a basis for future browser requests to the server.
Later, when the client is about to request the same address again, it stops to ask: is what I need already in the cache?
At this point, the client searches the cache for a stored copy of the resource.
To check whether the cached copy is still valid, the browser adds max-age to Date to obtain an expiration time and compares it with the current time. If the expiration time is later than the current time, the cache is still valid; otherwise, it has expired.
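The validity check above is simple date arithmetic. Here is a small sketch using the response time recorded earlier (2020-04-30 12:39:56 GMT) and a max-age of 3600; the function name is_fresh is hypothetical, and real browsers also account for details such as the Age header that are omitted here.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_fresh(date_header: str, max_age_seconds: int, now: datetime) -> bool:
    """Return True while Date + max-age is still later than the current time."""
    expires_at = parsedate_to_datetime(date_header) + timedelta(seconds=max_age_seconds)
    return expires_at > now


response_date = "Thu, 30 Apr 2020 12:39:56 GMT"
twenty_minutes_later = datetime(2020, 4, 30, 13, 0, 0, tzinfo=timezone.utc)
still_valid = is_fresh(response_date, 3600, twenty_minutes_later)
```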
Negotiated cache
Once the browser finds that the cached copy has expired, it does not simply delete it. Instead, it hopefully asks the server, "Can I still use this cache?"
The browser then issues a conditional request to the server.
A conditional request includes the following headers:
- If-Modified-Since: Thu, 30 Apr 2020 08:16:31 GMT (the copy I have was last modified at this time; has the resource changed since then?)
- If-None-Match: W/"121-171ca289ebf" (the copy I have carries this ETag; does it still match?)
Both headers are sent for compatibility with different servers: some servers only recognize If-Modified-Since, some only recognize If-None-Match, and some recognize both. When both are present, If-None-Match takes precedence over If-Modified-Since.
There are two possible outcomes:
- **Cache invalid:** The server simply gives a normal response again (status code 200 with a response body), optionally accompanied by new caching directives, and the browser caches the new content.
- **Cache valid:** The server returns 304 Not Modified with no response body. The response headers may carry new caching directives, which the browser applies to the cached copy it already has.
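On the server side, the decision between the two outcomes can be sketched roughly as follows. The function revalidate and its parameters are hypothetical, and the ETag handling is deliberately simplified (entity tags are split on commas and compared with the weak W/ prefix ignored); a production server would rely on its framework's implementation.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping


def _opaque(tag: str) -> str:
    """Strip whitespace and the weak-validator prefix so W/"x" matches "x"."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def revalidate(request_headers: Mapping[str, str], current_etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still valid, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence: If-Modified-Since is ignored when it is present.
        candidates = {_opaque(tag) for tag in if_none_match.split(",")}
        matches = "*" in candidates or _opaque(current_etag) in candidates
        return 304 if matches else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        unchanged = parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since)
        return 304 if unchanged else 200

    # No validators were sent, so this is an ordinary request.
    return 200
```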
Supplement (key field values)
Cache-Control
In the walkthrough above, Cache-Control is a response header the server sends to the client, with max-age specifying how long to cache the resource.
In fact, Cache-Control can carry one or more of the following values:
- public: indicates that the server resource is public, for example a page that looks the same to everyone. This value means little to the browser itself, but it is useful in some scenarios: shared caches such as proxies and CDNs are allowed to store the response. HTTP often works on this principle: the client or server tells the other side the details, and it is up to the other side to decide whether to act on them.
- private: indicates that the server resource is private, for example a page that each user sees differently. This value also means little to the browser itself, but it tells shared caches not to store the response. The same principle applies.
- no-cache: tells the client that it may cache this resource but must not use it directly. After caching, each subsequent request must carry the cache validation headers so the server can say whether the resource has changed.
- no-store: tells the client not to cache this resource at all. The browser stores nothing, and each subsequent request proceeds as a normal request.
- max-age: covered above; the number of seconds the resource may be cached.
For example, Cache-Control: public, max-age=3600 indicates that this is a public resource that should be cached for 1 hour.
Cache-Control is not only a response header. In HTTP/1.1, a client can send Cache-Control: no-cache in the request headers to tell the server: ignore any caches and give me a normal result. This has the same effect as the HTTP/1.0 header Pragma: no-cache.
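To make the directive values concrete, here is a small sketch that splits a Cache-Control value into its directives and summarizes what a client may do with the response. Both function names are hypothetical, and the parser is simplified: it does not handle quoted values that themselves contain commas.

```python
from typing import Dict, Union

Directive = Union[bool, int, str]


def parse_cache_control(value: str) -> Dict[str, Directive]:
    """Split a Cache-Control header value into a {directive: argument} mapping."""
    directives: Dict[str, Directive] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, separator, argument = part.partition("=")
        name = name.strip().lower()
        argument = argument.strip().strip('"')
        if not separator:
            directives[name] = True           # flag directive such as no-cache
        elif argument.isdigit():
            directives[name] = int(argument)  # numeric directive such as max-age
        else:
            directives[name] = argument
    return directives


def storage_policy(directives: Dict[str, Directive]) -> str:
    """Summarize the directives in the order a client would check them."""
    if "no-store" in directives:
        return "do not store"
    if "no-cache" in directives:
        return "store, but revalidate before every use"
    if "max-age" in directives:
        return f"store and reuse for {directives['max-age']} seconds"
    return "no explicit policy"
```

Checking no-store before no-cache before max-age reflects that the more restrictive directive wins when several are combined.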
Expires
In HTTP/1.0, the expiration time is specified via the Expires response header, for example:
Expires: Thu, 30 Apr 2020 23:38:38 GMT
In HTTP/1.1, this is superseded by max-age in Cache-Control.
Conclusion
When the browser accesses a resource it has already accessed before, it does this:
1. Determine whether the strong cache is hit based on the relevant fields. If so, use the cached copy directly.
2. If the strong cache is not hit, send a request to the server to check whether the negotiated cache is hit.
3. If the negotiated cache is hit, the server returns 304, telling the browser to use its local cache.
4. Otherwise, the server returns the latest resource.
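The four steps can be put together in one short sketch. Everything here is illustrative: CacheEntry, ServerResponse, load_resource, and the send callback are hypothetical names, times are plain seconds rather than HTTP dates, and only the ETag validator is used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class CacheEntry:
    body: str
    etag: str
    stored_at: float  # time the entry was stored or last revalidated, in seconds
    max_age: float    # freshness lifetime, in seconds


@dataclass
class ServerResponse:
    status: int
    max_age: float
    etag: str = ""
    body: str = ""


def load_resource(
    entry: Optional[CacheEntry],
    now: float,
    send: Callable[[Dict[str, str]], ServerResponse],
) -> Tuple[str, str, CacheEntry]:
    """Return (body, source, updated cache entry) following the four steps above."""
    # Step 1: strong cache hit, no request is sent at all.
    if entry is not None and now < entry.stored_at + entry.max_age:
        return entry.body, "strong cache", entry

    # Step 2: ask the server, attaching the validator if we have a stale copy.
    headers = {"If-None-Match": entry.etag} if entry is not None else {}
    response = send(headers)

    # Step 3: 304 means the stale copy is still good; refresh its lifetime.
    if response.status == 304 and entry is not None:
        refreshed = CacheEntry(entry.body, entry.etag, now, response.max_age)
        return refreshed.body, "negotiated cache", refreshed

    # Step 4: the server sent the latest resource; cache it.
    fresh = CacheEntry(response.body, response.etag, now, response.max_age)
    return fresh.body, "network", fresh
```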