Why is it faster to open the page the second time? Five steps to understand the front-end cache and make pages fly


How to make the first screen load faster?

Why is it so much faster to open the page the second time?

How to keep the data from being cleared after refreshing or closing the browser?

This is mainly because some data is cached the first time the page is loaded. Later loads read it directly from the cache without having to request the server, so the page opens faster and the server is under less load

Whether for interviews or performance tuning, caching is an important and essential part of the front end. This article is a detailed summary of the topic. If it helps you, please give it a like

Caching involves two main areas

The first is network caching, which has three parts: the DNS cache, the HTTP cache, and the CDN cache. Some people call the HTTP cache here the browser cache, but the idea is the same

The second is local: the browser's local storage and offline storage, which make the first screen load faster and make the page fly

DNS cache

When a page is opened, the system first performs a DNS query to find the IP address of the server for the domain name, and then sends the request

There are a lot of flowcharts on the web, and I’ve taken two from them

From the client's side, the DNS lookup is a recursive query, as shown in the figure

A hit at any step ends the lookup, and the client issues only one query request during the whole process

If no forwarder is configured on the DNS server, the DNS server sends the resolution request to the 13 root servers. This part is an iterative query, as shown in the figure

The 13 roots: there are 13 root server IP addresses in the world, not 13 physical servers. With anycast, mirrors of these IP addresses are deployed around the world, so a query does not always reach the same single host

Obviously, multiple query requests are made throughout this process

After the page is opened for the first time, the resolved address record is cached on the client, so later visits at least do not need to start the iterative queries again, which is faster

HTTP cache

The idea is to store the page resources obtained by HTTP requests locally and then load them later, fetching them directly from the cache instead of requesting the server, resulting in a faster response. Look at the picture first:

Strong cache

On the first request, the server tells the browser when the resource expires through the Expires and Cache-Control fields in the response header. When the browser requests the resource again, it first checks whether the resource has expired. If it has not, the cached copy is used directly without sending a request to the server. This is called strong caching

Expires

Specifies the absolute time at which the resource expires and is added to the response header when the server responds.

expires: Mon, 22 Nov 2021 08:41:00 GMT

Note: the cache can misbehave if the server and browser clocks disagree. For example, if the current time is August 1 and Expires is August 2, and the user changes the computer's clock to August 3, the cache will no longer be used

Cache-Control

Specifies how long the resource stays valid, in seconds. The example below means the resource can be used for 300 seconds after this response is returned successfully; after that it expires

cache-control: max-age=300

Why do you need two fields to specify the cache expiration time?

Some browsers only recognize Cache-Control and some do not. A browser that does not recognize Cache-Control falls back to Expires

The difference between Expires and Cache-Control

Expires comes from HTTP/1.0, whereas Cache-Control comes from HTTP/1.1;
Expires is kept for compatibility, for cases where HTTP/1.1 is not supported;
Cache-Control has a higher priority than Expires if both are present.
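
The freshness check described above can be sketched roughly as follows. This is a minimal illustration, not how any particular browser implements it: the function name `isFresh` and its parameters are made up for this example, header names are assumed to be lower-case, and timestamps are in milliseconds.

```javascript
// Decide whether a stored response can be reused without contacting the server.
// `headers` uses lower-case names; `storedAt` and `now` are millisecond timestamps.
function isFresh(headers, storedAt, now) {
  const cacheControl = headers['cache-control'] || '';
  const match = /(?:^|,)\s*max-age=(\d+)/i.exec(cacheControl);
  if (match) {
    // Cache-Control: max-age is relative to when the response was received,
    // and it takes priority over Expires when both are present.
    return now - storedAt < Number(match[1]) * 1000;
  }
  if (headers['expires']) {
    // Expires is an absolute date, compared against the client clock.
    const expiresAt = Date.parse(headers['expires']);
    return !Number.isNaN(expiresAt) && now < expiresAt;
  }
  return false; // no explicit lifetime given
}
```

Because max-age is measured from the moment the response was stored, it does not depend on the server and client clocks agreeing, which is the weakness of Expires noted earlier.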

Common Cache-Control request header directives

Directive (values in seconds)	Description
max-age=300	Do not accept a cached response older than 300 seconds. A value of 0 means always fetch the latest resource
max-stale=100	A stale cached response is still accepted for up to 100 seconds after it expires
min-fresh=50	Only accept a cached response that will stay fresh for at least another 50 seconds
no-cache	Revalidate with the server before using the cache (negotiated cache)
no-store	Do not cache anything
only-if-cached	Use only the cache; if nothing is cached, a 504 error is returned
no-transform	Resources must not be converted or transformed. Headers such as Content-Encoding, Content-Range, and Content-Type cannot be modified by a proxy. In practice it is rarely useful

The numbers of seconds are up to you; they are hard-coded here only to make the table easier to follow

Common Cache-Control response header directives

Directive (values in seconds)	Description
max-age=300	The cache is valid for 300 seconds
s-maxage=500	Valid for 500 seconds; takes priority over max-age, and applies only to shared caches (such as a CDN)
public	Can be cached by any cache, including proxy servers, CDNs, etc
private	Can only be cached by the user's browser (private cache)
no-cache	Check with the server whether the resource has changed before using the cache
no-store	Do not cache
no-transform	Same as the request directive above
must-revalidate	Once the cache has expired, it must be revalidated with the origin server before it is used
proxy-revalidate	Same as must-revalidate, but applies only to shared caches such as proxies
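
To make the response directives concrete, here is a small sketch that parses a Cache-Control value and picks the lifetime a cache would use. The function names are invented for this example, and the parser is deliberately simplified (it ignores quoted values, for instance).

```javascript
// Parse a Cache-Control header value into a plain object.
// Valueless directives map to true, numeric ones to numbers.
function parseCacheControl(value) {
  const directives = {};
  for (const part of value.split(',')) {
    const [rawName, rawValue] = part.trim().split('=');
    if (!rawName) continue;
    const name = rawName.toLowerCase();
    directives[name] = rawValue === undefined ? true : Number(rawValue);
  }
  return directives;
}

// Freshness lifetime in seconds, or null when the response must not be stored
// or no lifetime is given. `shared` is true for caches such as a CDN or proxy.
function freshnessLifetime(directives, shared) {
  if (directives['no-store']) return null;
  if (shared && directives['private']) return null;
  if (shared && 's-maxage' in directives) return directives['s-maxage'];
  return 'max-age' in directives ? directives['max-age'] : null;
}
```

For `public, max-age=300, s-maxage=500`, a shared cache gets 500 seconds while the browser's private cache gets 300, which is the priority rule from the table.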

Disadvantages of strong caching

After the cache expires, the resource is fetched again whether or not it has actually changed

What we want is that, if the resource file has not been updated, the old copy keeps being used even after it expires, without downloading it again

This is where the negotiated cache comes in: once the strong cache has expired, the browser goes through the negotiated cache process to determine whether the file has been updated

Negotiated cache

When the resource is requested for the first time, the server returns the expiration time described above and also adds a Last-Modified field to the response header, telling the browser when the resource was last modified

last-modified: Wed, 27 Oct 2021 08:35:57 GMT

When the browser requests the resource again, it sends that time back to the server in another field, If-Modified-Since

if-modified-since: Wed, 27 Oct 2021 08:35:57 GMT

The server then compares this time with the resource's current last-modified time. If they are the same, the file has not been updated, and the server returns status code 304 with an empty response body; the browser keeps using the expired copy it already has. If they differ, the resource has been updated, and status code 200 and the new resource are returned, as shown in the figure

Last-Modified/If-Modified-Since
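
On the server side, the exchange above could look roughly like this. It is only a sketch: `respondWithLastModified` and the `{ body, mtimeMs }` resource record are hypothetical names for this example, and it treats "not newer than the client's copy" as unmodified rather than requiring exact equality.

```javascript
// `resource` is a hypothetical { body, mtimeMs } record for a file on the server.
function respondWithLastModified(requestHeaders, resource) {
  // HTTP dates carry whole seconds only, so drop the milliseconds before comparing.
  const lastModifiedMs = Math.floor(resource.mtimeMs / 1000) * 1000;
  const lastModified = new Date(lastModifiedMs).toUTCString();

  const since = Date.parse(requestHeaders['if-modified-since'] || '');
  if (!Number.isNaN(since) && lastModifiedMs <= since) {
    // Unchanged since the client's copy: no body, the browser reuses its cache.
    return { status: 304, headers: { 'last-modified': lastModified }, body: '' };
  }
  return { status: 200, headers: { 'last-modified': lastModified }, body: resource.body };
}
```

The first request carries no If-Modified-Since and gets a 200 with the body; a repeat request that echoes the Last-Modified value gets a 304 as long as the file's modification time has not moved forward.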

Disadvantages

If the cached file is opened locally, Last-Modified changes even though the file content has not been modified, so the server cannot match the cache and sends the same resource again
Because Last-Modified is only accurate to the second, if a file is modified again within that undetectable interval, the server still considers it a hit and does not return the correct resource
If the resource changes periodically, for example it is modified and then changed back to its original content within one cycle, we would consider the cache from before the cycle still usable, but Last-Modified does not

Because of these shortcomings, there is an additional pair, ETag/If-None-Match, which compares the content of the file

ETag/If-None-Match

When a resource is requested for the first time, the server returns an ETag field in addition to the Expires, Cache-Control, and Last-Modified headers. It is a unique identifier for the current resource file, generated by the server from the content of the file, so it senses changes accurately: whenever the file content differs, a new ETag is generated

etag: W/"132489-1627839023000"

When the browser requests the resource again, it sends this identifier to the server in another field, If-None-Match

if-none-match: W/"132489-1627839023000"

If the server finds that the two values are the same, the file has not been updated. It returns status code 304 with an empty response body, and the browser keeps using the expired copy it already has. If they differ, the resource has been updated, and status code 200 and the new resource are returned

The difference between Last-Modified and ETag

ETag detects file changes more accurately than Last-Modified
When both are used, the server checks ETag/If-None-Match first
Last-Modified performs better than ETag, because generating an ETag costs the server extra work and affects its performance. ETag is therefore not a complete replacement for Last-Modified; it can only serve as a supplement and reinforcement
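
A rough sketch of the two points above, content-based ETags and the ETag-first check, assuming Node.js on the server. The SHA-1 hash is just one possible way to derive a tag from content, and the comparison is simplified (a real If-None-Match header may list several tags or use weak comparison). `makeETag` and `isNotModified` are names invented for this example.

```javascript
const { createHash } = require('node:crypto');

// Derive an ETag from the file content, so any content change yields a new tag.
function makeETag(content) {
  return '"' + createHash('sha1').update(content).digest('hex') + '"';
}

// Validate ETag first and fall back to Last-Modified only when no If-None-Match is sent.
// `current` is a hypothetical { etag, lastModifiedMs } record for the resource.
function isNotModified(requestHeaders, current) {
  const ifNoneMatch = requestHeaders['if-none-match'];
  if (ifNoneMatch !== undefined) {
    return ifNoneMatch === current.etag;
  }
  const since = Date.parse(requestHeaders['if-modified-since'] || '');
  return !Number.isNaN(since) && current.lastModifiedMs <= since;
}
```

Hashing the whole file on every request is where the extra server cost comes from, which is why real servers usually cache the tag or derive it from cheaper metadata.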

The difference between strong cache and negotiated cache

The strong cache is checked first. If it misses, the negotiated cache is checked
The strong cache does not send a request to the server, so the browser sometimes does not know that a resource has been updated; the negotiated cache does send a request, so the server can tell it whether the resource has been updated
The caching scheme most projects use today
1. Negotiated cache is generally used for: HTML
2. Strong cache is generally used for: css, image, js, with a hash in the file name

Heuristic cache

If the response has no Expires, Cache-Control: max-age, or Cache-Control: s-maxage, and contains no other cache restrictions, the cache can use a heuristic to calculate how long the response stays fresh

The lifetime is typically 10% of the difference between the Date field in the response header (the time the message was created) and the Last-Modified value

max(0, Date - Last-Modified) * 10%
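
As a worked example of the heuristic, the sketch below computes the lifetime from the two headers. The function name is made up, and 10% is the typical factor mentioned above rather than a fixed rule.

```javascript
// Heuristic freshness lifetime in seconds: 10% of the resource's age at response time.
function heuristicLifetimeSeconds(headers) {
  const date = Date.parse(headers['date'] || '');
  const lastModified = Date.parse(headers['last-modified'] || '');
  if (Number.isNaN(date) || Number.isNaN(lastModified)) return 0;
  return Math.max(0, date - lastModified) / 1000 / 10;
}
```

A file last modified 10 hours before the response was generated would be treated as fresh for 1 hour.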

Caching policies in practice

For frequently changing resources:

Use Cache-Control: no-cache so that the browser asks the server every time, and the server uses ETag or Last-Modified to verify whether the cached copy is still valid. This does not reduce the number of requests, but it can significantly reduce the size of the responses

For infrequently changing resources:

Set Cache-Control to a large value such as max-age=31536000 (one year), so that later requests for the same URL hit the strong cache. To handle updates, put a hash, version number, or other changing string in the file name (or path) and change it when the content changes. The new reference URL means the old strong cache entry is no longer used (it is not invalidated immediately, it is simply never requested again)
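
The two policies can be combined into one small rule on the server. The sketch below assumes build output named like `app.3f9a1c2b.js` (a hex hash before the extension); that naming scheme, the regular expression, and the function name are assumptions for illustration and would need to match your own build tool.

```javascript
// Pick a Cache-Control value from the request path.
// Assumes hashed file names such as "app.3f9a1c2b.js".
const HASHED_ASSET = /\.[0-9a-f]{8,}\.(?:js|css|png|jpe?g|gif|svg|woff2?)$/i;

function cacheControlFor(path) {
  if (HASHED_ASSET.test(path)) {
    // The URL changes whenever the content changes, so a year-long strong cache is safe.
    return 'max-age=31536000';
  }
  // HTML and other unhashed files: always revalidate (negotiated cache).
  return 'no-cache';
}
```

Since index.html is always revalidated, a new deployment that references new hashed file names takes effect on the next visit.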

Cache location, and read priority

The priorities are in the following order

1. Service Worker

See another of my articles for more details

2. Memory Cache

The resource is stored in memory and the next access reads it directly from memory. For example, when a page is refreshed, much of the data comes from the memory cache. It generally holds scripts, fonts, and images.

The advantage is fast reads. The disadvantage is that the cache is released as soon as the tab is closed, so both capacity and retention time are limited

3. Disk Cache (hard disk)

The resource is stored on the hard disk and the next access reads it directly from disk. Based on the fields in the HTTP headers, the browser decides which resources need to be cached, which can be used without a request, and which have expired and need to be requested again. Even across different sites, a resource at the same address is not requested again once it is in the disk cache.

The advantage is large capacity and longer retention, since it lives on the hard disk. The disadvantage is slower reads

4. Push Cache

This is the push cache, which is part of HTTP/2 and is used only when none of the three caches above is hit. It exists only within a session and is released when the session ends, so its retention time is short, and an entry in the Push Cache can be used only once

CDN cache

When we send a request and the browser's local cache is no longer valid, the CDN works out the shortest and fastest path to the content.

For example, a request from Guangzhou to a server in Guangzhou gets a much faster response than one to a server in Xinjiang, so the request goes to the nearest CDN node

The CDN node checks whether its cached data has expired. If not, it returns the cached data to the client directly, which speeds up the response. If the cache has expired, the node sends a back-to-origin request to the origin server, pulls the latest data, updates its local cache, and returns the latest data to the client.

CDN not only solves cross-carrier and cross-region access and greatly reduces access latency, it also spreads traffic and reduces the load on the origin server
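
The expired-or-not decision a CDN node makes can be reduced to a few lines. This is only a toy model under stated assumptions: `createEdgeNode` and `fetchFromOrigin` are hypothetical names, the origin call is synchronous to keep the example short, and a real CDN handles far more (routing, revalidation, purging).

```javascript
// One CDN node, reduced to its caching decision. `fetchFromOrigin` is a
// hypothetical function (url -> { body, maxAgeSeconds }) standing in for the origin server.
function createEdgeNode(fetchFromOrigin, now = () => Date.now()) {
  const store = new Map(); // url -> { body, expiresAt }

  return function handle(url) {
    const cached = store.get(url);
    if (cached && cached.expiresAt > now()) {
      return { body: cached.body, source: 'edge-cache' };
    }
    // Expired or never seen: go back to the origin, refresh the local copy, then answer.
    const { body, maxAgeSeconds } = fetchFromOrigin(url);
    store.set(url, { body, expiresAt: now() + maxAgeSeconds * 1000 });
    return { body, source: 'origin' };
  };
}
```

Only the first request and requests after expiry reach the origin; everything in between is answered by the node, which is how the origin's load drops.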

The CDN you can’t help knowing

Several differences between refreshing and pressing Enter

With Ctrl+F5 (force refresh), the local cache file is treated as expired, both the strong cache and the negotiated cache are skipped, and the request goes straight to the server
With the Refresh button or F5, the local cache file is treated as expired, and the request carries If-Modified-Since and If-None-Match so that the negotiated cache can validate freshness
When a URL is typed into the address bar and Enter is pressed, the browser looks in the Disk Cache; if there is a hit it is used, otherwise a network request is sent

Local storage

Cookie

The earliest local storage mechanism. Cookies are carried in every HTTP request so the server can tell whether multiple requests come from the same user. Its characteristics are as follows:

There are security problems: if a Cookie is intercepted, the attacker obtains all the Session information and can replay the Cookie to impersonate the user. (See my other post about attacks and defenses to understand browser security (same origin restriction/XSS/CSRF/man-in-the-middle attack))
Each domain name can have no more than 20 cookies, and each cookie cannot exceed 4KB
Cookies are sent whenever a new page is requested
A Cookie's name cannot be modified after it has been created
Cookies cannot be shared across domain names

There are two ways to share cookies across domain names

Use the Nginx reverse proxy
After logging in to one site, write cookies to other sites. The Session on the server is stored on a node, and the Cookie stores the Session ID

Cookie usage scenarios

The most common use of Cookie and Session is to store the SessionId in a Cookie. Each request will carry the SessionId so that the server knows who made the request
Can be used to count the number of clicks on the page

What are the fields in a Cookie

Name, Size: as the names imply
Value: stores the user's login state. This value should be encrypted, not stored in plaintext
Path: the path under which this Cookie can be accessed. For example, for juejin.cn/editor the path is /editor, and only pages under /editor can read the Cookie
HttpOnly: the Cookie cannot be accessed through JS, which reduces XSS attacks.
Secure: the Cookie is only carried in HTTPS requests
SameSite: restricts the browser from carrying the Cookie in cross-site requests, which reduces CSRF attacks
Domain: the domain name, used for cross-domain access or as a Cookie whitelist; it allows a subdomain to read or modify the parent domain's Cookie, which is very useful for implementing single sign-on
Expires/Max-Age: the expiration time, or the number of seconds until expiration. If not set, the Cookie behaves like a Session and expires when the browser is closed
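
To see how these fields fit together, here is a small sketch that assembles a Set-Cookie header value on the server. `buildSetCookie` and its option names are invented for this example; the attribute names in the output are the standard ones.

```javascript
// Serialize one Set-Cookie header value from the fields described above.
function buildSetCookie(name, value, options = {}) {
  const parts = [`${name}=${encodeURIComponent(value)}`];
  if (options.domain) parts.push(`Domain=${options.domain}`);
  if (options.path) parts.push(`Path=${options.path}`);
  if (options.maxAge !== undefined) parts.push(`Max-Age=${options.maxAge}`);
  if (options.expires) parts.push(`Expires=${options.expires.toUTCString()}`);
  if (options.httpOnly) parts.push('HttpOnly');
  if (options.secure) parts.push('Secure');
  if (options.sameSite) parts.push(`SameSite=${options.sameSite}`);
  return parts.join('; ');
}
```

A session ID cookie would typically set `httpOnly`, `secure`, and `sameSite` together, so that page scripts cannot read it and it is not sent over plain HTTP or with cross-site requests.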

LocalStorage

A feature introduced with HTML5 that stores information locally. Its capacity, about 5MB, is much larger than a Cookie's, and the storage is permanent: data stays until it is actively cleared

It is restricted by the same-origin policy, meaning pages that differ in port, protocol, or host cannot access each other's data, and LocalStorage cannot be read when the browser is set to private mode

It suits many scenarios, such as storing the website theme or user information, and works well for data that is fairly large or rarely changes
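
Since LocalStorage only stores strings and can fail (for example when storage is unavailable or full), a thin wrapper is a common pattern. The helper names `saveJSON` and `loadJSON` are made up for this sketch; in a browser you would pass `window.localStorage` as the `storage` argument.

```javascript
// Save and load JSON values through any Storage-like object (localStorage in a browser).
function saveJSON(storage, key, value) {
  try {
    storage.setItem(key, JSON.stringify(value));
    return true;
  } catch (err) {
    // setItem can throw, for example when the quota is exceeded or storage is unavailable.
    return false;
  }
}

function loadJSON(storage, key, fallback = null) {
  try {
    const raw = storage.getItem(key);
    return raw === null ? fallback : JSON.parse(raw);
  } catch (err) {
    return fallback;
  }
}
```

For example, `saveJSON(localStorage, 'theme', { mode: 'dark' })` stores the site theme, and `loadJSON(localStorage, 'theme', { mode: 'light' })` reads it back with a default.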

SessionStorage

SessionStorage is also a feature introduced with HTML5. It is mainly used to hold data temporarily for a single window or tab. The data survives a page refresh but is deleted when the window or tab is closed

Like LocalStorage, SessionStorage is local storage, cannot be crawled by crawlers, and is restricted by the same-origin policy, but SessionStorage is stricter: data is shared only within the same window of the same browser

Its API is the same as LocalStorage's: getItem, setItem, removeItem, clear, key

It suits time-sensitive scenarios, such as a site's guest login information or temporary browsing history

IndexedDB

A local database in the browser, with the following features

Key-value storage: data is kept in object stores. All types of data can be stored directly, including JS objects, as key-value pairs. Every record has a primary key, and the primary key is unique
Asynchronous: while IndexedDB is working, the user can still perform other operations. The asynchronous design prevents large reads and writes from slowing the page down
Transactions: for example, if an error occurs halfway through modifying a whole table, all the data is rolled back to its state before the modification
Same-origin restriction: each database belongs to the domain that created it. Web pages can only access databases under their own domain name
Large storage space: generally no less than 250MB, possibly with no upper limit
Binary storage is supported, such as ArrayBuffer objects and Blob objects
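
The features above (object stores, primary keys, asynchronous requests, transactions) show up directly in the API. The following is a minimal browser-side sketch; the database name `demo-db`, the store name `articles`, and the helper names are made up, and `factory` is expected to be the browser's `indexedDB` object.

```javascript
// Open (or create) a database with one object store keyed by `id`.
// `factory` is the browser's `indexedDB`; the names are for illustration only.
function openDb(factory, name = 'demo-db', storeName = 'articles') {
  return new Promise((resolve, reject) => {
    const request = factory.open(name, 1);
    request.onupgradeneeded = () => {
      // Runs only when the database is created or its version number increases.
      request.result.createObjectStore(storeName, { keyPath: 'id' });
    };
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Write several records in one transaction: either all are stored or none are.
function putAll(db, storeName, records) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction(storeName, 'readwrite');
    const store = tx.objectStore(storeName);
    records.forEach((record) => store.put(record));
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
    tx.onabort = () => reject(tx.error);
  });
}
```

In a page this would be used as `openDb(indexedDB).then(db => putAll(db, 'articles', [{ id: 1, title: 'Cache' }]))`. Every call returns immediately and completes later, which is the asynchronous behaviour described above.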

In addition to the four front-end storage methods above, there is WebSQL. Similar to SQLite, it is a relational database in the true sense and can be operated with SQL, but conversion is needed when using it from JS, which is cumbersome

Differences between the four above

	Cookie	SessionStorage	LocalStorage	IndexedDB
Storage size	4KB	5MB or more	5MB or more	Unlimited
Lifetime	Can be set; if not set, expires when the window is closed	Expires when the browser window is closed	Permanent	Permanent
Scope	Same browser, all same-origin tabs	Current tab	Same browser, all same-origin tabs	
Lives in	Sent back and forth in requests	Local client	Local client	Local client
Same-origin policy	Same browser; accessible only to same-origin pages with the same path	Only the page itself	Same browser; shared only among same-origin pages	

Offline storage

Service Worker

A Service Worker is an independent thread that runs in the browser background, outside the main JS thread, so it naturally cannot access the DOM. It acts like a proxy server: it can intercept the user's requests, modify them, or respond directly without contacting the server, for example when loading JS and images, which lets us use web applications offline

It is used for offline caching (to speed up first-screen loading), message push, and network proxying. A Service Worker must be served over HTTPS: because it intercepts requests, HTTPS is required for security

Caching with a Service Worker involves three steps:

First, register the Service Worker
Then listen for the install event and cache the files
On the next visit, intercept the request and return the cached data directly

// index.js: register the Service Worker
if (navigator.serviceWorker) {
    navigator.serviceWorker.register('sw.js').then(registration => {
        console.log('Service worker registered successfully')
    }).catch(err => {
        console.log('Service worker registration failed')
    })
}

// sw.js: listen for the 'install' event and cache the required files in the callback
self.addEventListener('install', e => {
    // Open the cache with the given name
    e.waitUntil(caches.open('my-cache').then(cache => {
        // Add files to the cache
        return cache.addAll(['./index.html', './index.css'])
    }))
})

// sw.js: intercept every request in the 'fetch' event
self.addEventListener('fetch', e => {
    // Look in the cache for a response matching the request
    e.respondWith(caches.match(e.request).then(response => {
        if (response) {
            return response
        }
        // Nothing cached: fall back to the network
        console.log('fetch source')
        return fetch(e.request)
    }))
})

Conclusion

If this article helped you, a like is appreciated

Thanks for reading this far. Keep it up!
