The whole process of HTTP

Enter the URL and extract the domain name from it -> DNS maps the domain name to an IP address -> TCP three-way handshake (followed by the TLS/SSL handshake for HTTPS) -> construct the HTTP request and populate its headers -> send the HTTP request -> the server processes the request -> the server returns an HTTP response (the browser follows the redirect address if there is one) -> release the TCP connection as appropriate -> the browser engine parses the HTML response and renders the body to the user interface -> send HTTP requests for the resources referenced in the page.

DNS cache

What is DNS?

The full name is Domain Name System.

  • DNS is a distributed database that maps domain names to IP addresses. It makes the Internet easier to use, because users do not have to remember the numeric IP strings that machines read directly.
  • DNS runs mainly on top of UDP and uses port 53; TCP on the same port is used for responses too large for a single UDP message.

DNS resolution

The process of obtaining the IP address from a domain name is called domain name resolution (or hostname resolution).

www.dnscache.com (domain name) – DNS resolution -> 11.222.33.444 (IP address)

DNS cache

Browsers, operating systems, and local DNS servers all cache DNS results to some extent. The DNS query process is:

  • 1. Search the browser's DNS cache. If a mapping exists there, domain name resolution is complete.
  • 2. If the browser's own cache has no mapping, the operating system's hosts file is checked. If a mapping exists there, domain name resolution is complete.
  • 3. If the local hosts file has no mapping, the local DNS server (the ISP's server or a manually configured DNS server) is queried. If it has the record, the domain name is resolved.
  • 4. If the local DNS server has no record either, it sends a request to the root server and follows the referrals down to the server that holds the record, then returns the result.
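The lookup order above can be sketched as a chain of caches. This is a minimal illustration, not how a real resolver is implemented: the layer objects, the `writable` flag, the stubbed `queryUpstream` function and the addresses are all assumptions, and TTL handling is left out.

```javascript
// Hypothetical layered resolver: each layer is a Map from hostname to IP.
function resolve(hostname, layers, queryUpstream) {
  for (const layer of layers) {
    if (layer.cache.has(hostname)) {
      return { ip: layer.cache.get(hostname), source: layer.name };
    }
  }
  // No layer had a mapping: fall back to the (stubbed) upstream query
  const ip = queryUpstream(hostname);
  // Remember the answer in every layer that is a real cache
  for (const layer of layers) {
    if (layer.writable) layer.cache.set(hostname, ip);
  }
  return { ip, source: 'upstream' };
}

const layers = [
  { name: 'browser cache', writable: true, cache: new Map() },
  { name: 'hosts file', writable: false, cache: new Map([['local.test', '127.0.0.1']]) },
  { name: 'local DNS server', writable: true, cache: new Map() },
];
const queryUpstream = () => '203.0.113.10'; // stub standing in for the root-server walk

const first = resolve('www.dnscache.com', layers, queryUpstream);
const second = resolve('www.dnscache.com', layers, queryUpstream);
const fromHosts = resolve('local.test', layers, queryUpstream);

console.log(first.source);     // upstream
console.log(second.source);    // browser cache
console.log(fromHosts.source); // hosts file
```

The second lookup stops at the first step because the first lookup filled the browser cache.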

CDN cache

What is the CDN?

Full name: Content Delivery Network. When a user browses a website, the CDN selects the edge node nearest to the user to respond to the request. This greatly reduces access latency, diverts traffic away from the origin, and reduces the load on the origin server.

CDN cache

  • When the browser's local cache is invalid, the browser sends a request to the CDN edge node.

  • Like the browser cache, a CDN edge node has its own caching mechanism. Edge cache policies vary between service providers, but they generally follow the HTTP standard and set the edge node's cache time through the Cache-Control: max-age field in the HTTP response header.

  • When the browser requests data from a CDN node, the node checks whether its cached data has expired. If it has not expired, the node returns the cached data directly to the client.

  • Otherwise, the CDN node issues a back-to-source request to the origin server, pulls the latest data, updates its local cache, and returns the latest data to the client. CDN providers generally let you specify cache times along several dimensions, such as file suffix and directory, for finer-grained cache management.
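The edge node behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: `createEdgeNode`, the fake origin function and the injected clock are hypothetical names, and a real CDN applies many more rules.

```javascript
// Hypothetical edge node that caches origin responses by URL.
function createEdgeNode(fetchFromOrigin, now) {
  const store = new Map(); // url -> { body, expiresAt }

  return function handle(url) {
    const entry = store.get(url);
    if (entry && now() < entry.expiresAt) {
      return { body: entry.body, from: 'edge cache' };
    }
    // Missing or expired: back-to-source request, then refresh the local copy
    const { body, maxAgeSeconds } = fetchFromOrigin(url);
    store.set(url, { body, expiresAt: now() + maxAgeSeconds * 1000 });
    return { body, from: 'origin' };
  };
}

// Usage with a fake clock and a fake origin that versions its content
let clock = 0;
let originHits = 0;
const edge = createEdgeNode(
  () => ({ body: `v${++originHits}`, maxAgeSeconds: 60 }),
  () => clock
);

const a = edge('/logo.png'); // first request goes to the origin
clock = 30000;
const b = edge('/logo.png'); // 30 s later: still fresh, served from the edge
clock = 61000;
const c = edge('/logo.png'); // 61 s later: expired, back to the origin

console.log(a.from, b.from, c.from); // origin edge cache origin
```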

CDN advantages

CDN nodes solve the problem of cross-operator and cross-region access, so access latency is greatly reduced. Most requests are completed at the CDN edge, which diverts traffic and relieves the load on the origin server.

Browser cache (HTTP cache)

The browser communicates with the server in request-response mode: the browser initiates an HTTP request and the server responds to it.

  • So how does the browser decide whether a resource should be cached, and how to cache it?
  • After the browser sends its first request to the server and receives the result, it stores the result together with a cache identifier in the browser cache. How the browser handles the cache is determined by the response headers returned on that first request. The process is as follows:

Each time the browser initiates a request, it first searches the browser cache for the result of that request and its cache identifier. Each time the browser receives a response, it stores the result and the cache identifier in the browser cache.

Advantages of browser caching

1. Reduces redundant data transfer: on a cache hit the resource is used directly and no request is sent.

2. Reduces the burden on the server and greatly improves website performance: when a request is sent but the cached copy is still consistent with the server, the server does not need to send the body again.

3. Speeds up page loading on the client. A data request can be divided into three steps: the browser sends the request, the server responds, and the browser parses the response. Browser caching optimizes the first and second steps.

Cache location classification

By cache location there are four kinds of cache, each with its own priority. The caches are searched in order, and the network is requested only when none of them is hit.

  • Service Worker
  • Memory Cache
  • Disk Cache
  • Push Cache

1. Service Worker

  • A Service Worker runs on a separate thread behind the browser and is generally used to implement caching.
  • Transport protocol HTTPS: because a Service Worker intercepts requests, it must be served over HTTPS to ensure security.
  • You control which files are cached, how the cache is matched, and how it is read, and the cache is persistent.

There are three steps to implement caching with a Service Worker:

1. Register the Service Worker.

2. Listen for the install event and cache the required files. On the next visit, intercepted requests are checked against the cache: if a cached copy exists it is read directly, otherwise the data is requested.

3. When the Service Worker does not hit its cache, call fetch to get the data; the lookup then continues through the remaining caches in priority order.

Whether the fetched data actually came from the Memory Cache or from a network request, the browser still displays it as coming from the Service Worker.
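A cache-first strategy along these lines might look like the sketch below. The cache name, the precache list and the `/sw.js` path mentioned in the comment are placeholder assumptions; the strategy function takes the cache and the network call as parameters so it can also be exercised outside a browser.

```javascript
// Cache-first lookup. `cache` needs match(request) and put(request, response).
async function cacheFirst(request, cache, fetchFn) {
  const cached = await cache.match(request);
  if (cached !== undefined) return cached;  // hit: answer from the cache
  const response = await fetchFn(request);  // miss: go to the network
  await cache.put(request, response);
  return response;
}

// Placeholder values, not taken from the article
const CACHE_NAME = 'demo-v1';
const PRECACHE_URLS = ['/index.html', '/main.js'];

// Step 1 happens in the page: navigator.serviceWorker.register('/sw.js')
// Steps 2 and 3 run inside the Service Worker script itself:
if (typeof self !== 'undefined' && typeof caches !== 'undefined' &&
    typeof self.addEventListener === 'function') {
  self.addEventListener('install', (event) => {
    // Step 2: cache the required files during install
    event.waitUntil(caches.open(CACHE_NAME).then((c) => c.addAll(PRECACHE_URLS)));
  });

  self.addEventListener('fetch', (event) => {
    if (event.request.method !== 'GET') return; // only GET responses are cached here
    event.respondWith(
      caches.open(CACHE_NAME).then((c) =>
        cacheFirst(
          event.request,
          // Store a clone so the original response body can still be returned
          { match: (req) => c.match(req), put: (req, res) => c.put(req, res.clone()) },
          (req) => fetch(req) // Step 3: fall through to the network on a miss
        )
      )
    );
  });
}
```

Injecting the cache keeps the strategy itself free of browser globals; the guard around the event listeners simply skips the wiring when the script is not running as a Service Worker.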

2. Memory Cache

  • Reading from memory is faster and more efficient than reading from disk, but the cache is short-lived: it is released together with the process (for example, when the tab is closed).
  • Capacity is small, because the operating system has to be careful about memory usage.
  • When a page is refreshed, most of the data comes from the memory cache.

WebKit resources are currently divided into two categories:

  • Main resources, such as HTML pages or downloaded items.
  • Derived resources, such as images or scripts embedded in HTML pages.
  • They correspond to two classes in the code: MainResourceLoader and SubresourceLoader.

WebKit supports a memory cache, but only for derived resources. It corresponds to the CachedResource class, which holds raw data (CSS, JS, etc.) as well as decoded image data.

1. An important group of resources in the memory cache is those downloaded by preloader-related instructions (e.g. <link rel="prefetch">). Preloading is one of the most common page optimization methods: the browser can parse JS/CSS files while the network requests the next resource.

2. When caching a resource, the memory cache pays little attention to the HTTP cache header Cache-Control. Matching is also not limited to the URL: other features such as Content-Type and CORS may be checked as well.

3. Disk Cache

Disk Cache:

  • Stored on the hard disk. It reads more slowly than the Memory Cache but can store anything, so it is less efficient but has a larger capacity.
  • Large coverage.
  • Based on the fields in the HTTP headers, it determines which resources need to be cached, which can be used directly without a request, and which have expired and must be requested again. Its direct operation object is CurlCacheManager.
  • Even across sites, once a resource with the same address has been cached on disk, it is not requested again.
  • Most cached data comes from the Disk Cache.

Files are preferentially stored on the hard disk. Relatively small resources with high system priority and high usage are stored in memory.

A CSS file only needs to be loaded once to render the page and is not read frequently afterwards, so it is not a good fit for the memory cache. A JS script, however, may be executed at any time. If the script lived on disk, it would have to be read from disk into memory every time it ran, which causes heavy I/O overhead and could make the browser unresponsive.

4. Push Cache

  • Push Cache belongs to HTTP/2 (server push) and is used only when the three caches above all miss.

  • It exists only within a session and is released once the session ends. Its cache time is very short, about five minutes in Chrome.

  • Caching directives in HTTP headers are not strictly enforced: resources marked no-cache and no-store can still be pushed. Once the connection is closed, the Push Cache is released.

  • All resources can be pushed and cached.

  • Support in Edge and Safari is relatively poor.

  • Multiple pages can share one HTTP/2 connection and therefore the same Push Cache. This depends on the browser implementation: for performance reasons, some browsers use the same HTTP connection for different tabs on the same domain name.

  • An entry in the Push Cache can be used only once.

  • The browser can refuse a push for a resource it already has, and resources can be pushed for other domains.

If none of the above caches is hit, a network request has to be made to fetch the resource.

Three-level caching principle (cache access priority)

  • First look in memory; if the resource is there, load it directly.
  • If it is not in memory, look on the hard disk; if it is there, load it directly.
  • If it is not on the hard disk either, make a network request.
  • The requested resource is then cached to both hard disk and memory.
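The lookup order above can be sketched with two Maps standing in for the memory and disk caches. This is a toy model, not browser code: `createLoader`, `closeTab` and the stubbed network function are hypothetical, and copying a disk hit back into memory is an assumption of the sketch.

```javascript
// Toy three-level loader: memory -> disk -> network.
function createLoader(requestNetwork) {
  const memory = new Map();
  const disk = new Map();

  function load(url) {
    if (memory.has(url)) {
      return { data: memory.get(url), from: 'memory cache' };
    }
    if (disk.has(url)) {
      const data = disk.get(url);
      memory.set(url, data); // assumption: a disk hit is copied back into memory
      return { data, from: 'disk cache' };
    }
    const data = requestNetwork(url);
    disk.set(url, data);   // the requested resource is cached to disk...
    memory.set(url, data); // ...and to memory
    return { data, from: 'network' };
  }

  // Closing the tab releases memory but leaves the disk copy in place
  function closeTab() {
    memory.clear();
  }

  return { load, closeTab };
}

const loader = createLoader((url) => `contents of ${url}`);
const firstLoad = loader.load('/img.png');   // network
const afterRefresh = loader.load('/img.png'); // memory cache
loader.closeTab();
const afterReopen = loader.load('/img.png'); // disk cache

console.log(firstLoad.from, '|', afterRefresh.from, '|', afterReopen.from);
```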

Cache policy classification

For the sake of performance, most interfaces should choose a cache policy. Browser cache policies are generally divided into two types: strong cache and negotiated cache. Both are implemented by setting HTTP headers. When the browser requests a resource, it first checks whether the strong cache is hit, and only then whether the negotiated cache is hit.

Strong cache

When loading a resource, the browser decides whether the strong cache is hit from the header information stored with the locally cached resource. On a hit, the browser uses the cached resource directly and sends no request to the server. In the Network panel of Chrome DevTools, the request shows status code 200 and the Size column shows from disk cache or from memory cache.

The header information here refers to Expires and Cache-Control.

Expires

Expires is a response header field from the HTTP/1.0 specification. Its value is an absolute time in GMT format, such as Expires: Mon, 18 Oct 2066 23:59:59 GMT. This time is the expiration time of the resource; until then, the cache is hit and the browser can use its cached copy without requesting the resource again.

  • The expiration time is a specific point in time on the server.

  • Expires = max-age + request time. It is usually used in combination with Last-Modified.

  • Because the browser compares the expiration time with its local clock, the cache becomes unreliable when the local time is changed or when the server and client clocks diverge significantly.
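The clock problem with Expires can be illustrated with a small check. The function name and the five-minute skew are assumptions chosen for the example.

```javascript
// True while the clock used for the check is still before the Expires time.
function isFreshByExpires(expiresHeader, nowMs) {
  const expiresAt = Date.parse(expiresHeader);
  if (Number.isNaN(expiresAt)) return false; // unparsable date: treat as expired
  return nowMs < expiresAt;
}

const expires = 'Mon, 18 Oct 2066 23:59:59 GMT';

// The server sent the response 59 seconds before the resource expires
const serverNow = Date.parse('Mon, 18 Oct 2066 23:59:00 GMT');
// A client whose clock runs 5 minutes fast sees the same moment differently
const skewedClientNow = serverNow + 5 * 60 * 1000;

console.log(isFreshByExpires(expires, serverNow));       // true
console.log(isFreshByExpires(expires, skewedClientNow)); // false
```

With a correct clock the resource is still fresh, while the skewed client already treats it as expired; this is the kind of inconsistency that the relative max-age value avoids.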

Cache-Control

  • Cache-Control is a header introduced in HTTP/1.1 and is used to control web caching.

  • For example, Cache-Control: max-age=3600 means that the resource is valid for 3600 seconds (the browser also records when it received the resource). If the resource is loaded again within this time, the strong cache is hit.

  • Cache-Control can be set in either the request header or the response header, and several directives can be combined:

public: the content can be cached by everyone (both the client and proxy servers). That is, the response can be cached by any intermediate node; in Browser <- proxy1 <- proxy2 <- Server, the proxies in the middle may cache the resource.

private: the content can be cached only by the client. In Browser <- proxy1 <- proxy2 <- Server, the proxies only forward the data returned by the server and do not cache it.

no-cache: the client may store the content, but whether it can be used must be decided through the negotiated cache. The strong cache check is skipped, and the ETag or Last-Modified field is used for validation instead: before using cached data, the browser must confirm with the server that the data is still consistent.

no-store: nothing is cached; neither the strong cache nor the negotiated cache is used.

max-age: max-age=xxx (xxx is a number) means that the cached content expires after xxx seconds.

s-maxage (unit: seconds): the same as max-age, but it applies only to proxy servers (such as a CDN cache). For such caches, s-maxage has a higher priority than max-age: if s-maxage is present, it overrides max-age and the Expires header.

max-stale: the maximum staleness the client can tolerate. The client is willing to accept a response that has expired by no more than the given number of seconds. If no value is given, a stale response of any age is accepted (age is the difference between the current time and the time the response was generated or validated by the origin).

min-fresh: the minimum freshness the client can tolerate. The client wants a response that will remain fresh for at least the given number of seconds, that is, the age of the response plus min-fresh must not exceed its freshness lifetime.

Cache-Control and Expires can both be enabled in the server configuration. When both are present, Cache-Control takes precedence.
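As a rough sketch of how these directives combine, the helper below parses a Cache-Control value and works out how long a response stays fresh. The function names and option names are assumptions, only the directives discussed above are handled, and real caches apply more rules than this.

```javascript
// Parse "public, max-age=3600" into { public: true, 'max-age': '3600' }.
function parseCacheControl(header) {
  const directives = {};
  for (const part of header.split(',')) {
    const [rawName, rawValue] = part.trim().split('=');
    if (!rawName) continue;
    const name = rawName.toLowerCase();
    directives[name] = rawValue === undefined ? true : rawValue.replace(/^"|"$/g, '');
  }
  return directives;
}

// Freshness lifetime in seconds for a response; 0 means "do not reuse without asking".
function freshnessLifetime(headers, { sharedCache = false, responseTimeMs = Date.now() } = {}) {
  const cc = parseCacheControl(headers['cache-control'] || '');
  if (cc['no-store'] || cc['no-cache']) return 0;
  if (sharedCache && cc['private']) return 0; // proxies must not reuse private responses
  // s-maxage applies to shared caches only and wins over max-age and Expires
  if (sharedCache && cc['s-maxage'] !== undefined) return Number(cc['s-maxage']);
  // max-age wins over Expires
  if (cc['max-age'] !== undefined) return Number(cc['max-age']);
  if (headers['expires'] !== undefined) {
    const expiresAt = Date.parse(headers['expires']);
    return Number.isNaN(expiresAt) ? 0 : Math.max(0, (expiresAt - responseTimeMs) / 1000);
  }
  return 0;
}

const both = { 'cache-control': 'public, max-age=3600, s-maxage=60' };
console.log(freshnessLifetime(both));                        // 3600 (browser)
console.log(freshnessLifetime(both, { sharedCache: true })); // 60 (proxy or CDN)

const withExpires = {
  'cache-control': 'max-age=300',
  'expires': 'Mon, 18 Oct 2066 23:59:59 GMT',
};
console.log(freshnessLifetime(withExpires)); // 300, Cache-Control takes precedence
```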

Negotiated cache

When the strong cache is not hit, the browser sends a request to the server, and the server decides whether the cache is hit based on certain fields in the request header. On a hit, the server returns 304 Not Modified, telling the browser that the resource has not been updated and that the local cache can be used.

If the negotiated cache is not hit, the server returns 200 together with the requested resource.

The header information here refers to Last-Modified/If-Modified-Since and ETag/If-None-Match.

Last-Modified/If-Modified-Since

When the browser requests a resource for the first time, the server adds Last-Modified to the response header. Last-Modified is the time at which the resource was last modified, for example Last-Modified: Fri, 22 Jul 2016 01:47:00 GMT.

When the browser requests the resource again, the request header contains If-Modified-Since, whose value is the Last-Modified returned with the cached response. On receiving If-Modified-Since, the server compares it with the resource's last modification time to decide whether the cache is hit.

If the cache is hit, the server returns 304 without the resource content and without Last-Modified. If the If-Modified-Since value is earlier than the resource's last modification time on the server, the file has been updated, so the server returns 200 with the new resource.

Disadvantages:

Last-Modified has a resolution of one second. If a resource changes again within the same second, Last-Modified does not change, so the server still treats the cache as hit and the client does not receive the updated resource.

Periodic changes. If a resource is modified and then changed back to its original content, using the cache would be acceptable, but Last-Modified has changed, so the cache is treated as a miss. Hence the ETag.
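The one-second resolution problem can be reproduced with a small server-side check. This is a sketch of the comparison only; the function name and the timestamps are assumptions made up for the example.

```javascript
// Status a server would send for a conditional GET based on If-Modified-Since.
function checkIfModifiedSince(ifModifiedSinceHeader, fileMtimeMs) {
  const since = Date.parse(ifModifiedSinceHeader);
  if (Number.isNaN(since)) return 200; // unusable header: send the full resource
  // HTTP dates carry whole seconds, so drop the milliseconds before comparing
  const mtimeSeconds = Math.floor(fileMtimeMs / 1000);
  return mtimeSeconds <= since / 1000 ? 304 : 200;
}

// The file is first written at 01:47:00.100 and served with this Last-Modified
const firstWrite = Date.parse('2016-07-22T01:47:00.100Z');
const lastModified = new Date(Math.floor(firstWrite / 1000) * 1000).toUTCString();
console.log(lastModified); // Fri, 22 Jul 2016 01:47:00 GMT

// It is written again 800 ms later, still inside the same second
const secondWrite = Date.parse('2016-07-22T01:47:00.900Z');
console.log(checkIfModifiedSince(lastModified, secondWrite)); // 304, although the content changed

// A write five seconds later is detected as expected
const laterWrite = Date.parse('2016-07-22T01:47:05.000Z');
console.log(checkIfModifiedSince(lastModified, laterWrite)); // 200
```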

ETag/If-None-Match

  • An ETag is a unique identifier for the current resource file, generated by the server and returned when the server responds to a request. The ETag is regenerated whenever the resource changes.

  • The next time the browser loads the resource and sends a request to the server, it places the previously returned ETag value in the If-None-Match request header.

  • The server determines whether the resource has changed relative to the client's copy by comparing If-None-Match with the ETag of the resource on the server.

  • If the ETag does not match, the server sends the new resource (including the new ETag) to the client in a normal GET 200 response. If the ETag matches, the server returns 304 to tell the client to use its local cache directly.

Last-Modified and ETag can be used together. The server verifies the ETag first; the HTTP specification tells servers to ignore If-Modified-Since when If-None-Match is present, so Last-Modified serves as the fallback when no ETag is sent.

  • First, ETag is more accurate than Last-Modified. Last-Modified is measured in seconds, so if a file changes several times within one second, its Last-Modified value does not reflect the change.

  • The ETag, by contrast, changes with every modification, which guarantees accuracy. In a load-balanced deployment, the Last-Modified values generated by different servers may also be inconsistent.

  • Second, in terms of performance, ETag is inferior to Last-Modified: Last-Modified only records a time, whereas ETag requires the server to compute a hash value.

  • Third, server verification gives priority to the ETag.
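A content-based ETag and the If-None-Match comparison might be sketched as follows. Hashing the body with SHA-1 is just one possible way to build the tag and is an assumption here, as are the function names; servers are free to generate ETags differently.

```javascript
const crypto = require('crypto');

// Validator derived from the content: same bytes, same tag.
function makeETag(body) {
  const hash = crypto.createHash('sha1').update(body).digest('hex');
  return `"${hash}"`;
}

// Status a server would send for a GET carrying If-None-Match.
function checkIfNoneMatch(ifNoneMatchHeader, currentETag) {
  if (ifNoneMatchHeader === undefined) return 200;
  if (ifNoneMatchHeader.trim() === '*') return 304;
  // If-None-Match uses weak comparison, so a leading W/ is ignored
  const strip = (tag) => tag.trim().replace(/^W\//, '');
  const candidates = ifNoneMatchHeader.split(',').map(strip);
  return candidates.includes(strip(currentETag)) ? 304 : 200;
}

const tagV1 = makeETag('body v1');

// Unchanged resource: the client's tag still matches
console.log(checkIfNoneMatch(tagV1, makeETag('body v1'))); // 304

// Changed resource: a new tag is generated, so the full response is sent
console.log(checkIfNoneMatch(tagV1, makeETag('body v2'))); // 200

// Content changed and then reverted: the tag matches again, unlike Last-Modified
console.log(checkIfNoneMatch(tagV1, makeETag('body v1'))); // 304
```

Because the tag depends only on the content, a file that is modified and then restored produces the original tag again, which covers the periodic-change case that Last-Modified cannot.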

Conclusion

When the browser requests a resource it has already accessed, it does the following:

1. Check whether the strong cache is hit. If so, use the cache directly.

2. If the strong cache is not hit, send a request to the server to check whether the negotiated cache is hit.

3. If the negotiated cache is hit, the server returns 304, telling the browser to use the local cache.

4. Otherwise, the server returns 200 with the latest resource and its cache identifier.
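The four steps can be put together in one small simulation. Everything here is a simplified model made up for illustration: the shape of the cache entry, the fake `server` function and the timestamps are assumptions, and only max-age and ETag are considered.

```javascript
// cached: { body, etag, storedAtMs, maxAgeSeconds } or undefined
// server(requestHeaders) -> { status, body, etag, maxAgeSeconds }
function loadResource(cached, server, nowMs) {
  // 1. Strong cache: still fresh, so no request is sent at all
  if (cached && nowMs - cached.storedAtMs < cached.maxAgeSeconds * 1000) {
    return { body: cached.body, via: 'strong cache', entry: cached };
  }
  // 2. Negotiated cache: ask the server, sending the validator if there is one
  const headers = cached ? { 'if-none-match': cached.etag } : {};
  const res = server(headers);
  if (res.status === 304) {
    // 3. Not modified: keep the body and restart the freshness period
    const entry = { ...cached, storedAtMs: nowMs, maxAgeSeconds: res.maxAgeSeconds };
    return { body: cached.body, via: '304 negotiated cache', entry };
  }
  // 4. Modified (or nothing cached): store the new body and identifier
  const entry = { body: res.body, etag: res.etag, storedAtMs: nowMs, maxAgeSeconds: res.maxAgeSeconds };
  return { body: res.body, via: '200 from server', entry };
}

// Fake server holding one resource that is cacheable for 300 seconds
let current = { body: 'v1', etag: '"v1"' };
const server = (headers) =>
  headers['if-none-match'] === current.etag
    ? { status: 304, maxAgeSeconds: 300 }
    : { status: 200, body: current.body, etag: current.etag, maxAgeSeconds: 300 };

const r1 = loadResource(undefined, server, 0);     // nothing cached yet
const r2 = loadResource(r1.entry, server, 100000); // 100 s later: still fresh
const r3 = loadResource(r2.entry, server, 301000); // 301 s later: expired, unchanged
current = { body: 'v2', etag: '"v2"' };            // the resource changes on the server
const r4 = loadResource(r3.entry, server, 700000); // expired again, now different

console.log([r1.via, r2.via, r3.via, r4.via].join(' -> '));
// 200 from server -> strong cache -> 304 negotiated cache -> 200 from server
```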

The impact of user behavior on browser caching

  • Opening a page by entering the address in the address bar: the browser checks the disk cache for a match. It is used if available; if not, a network request is sent.

  • Normal refresh (F5): because the tab has not been closed, the memory cache is available and is used first (if it matches), followed by the disk cache.

  • Forced refresh (Ctrl + F5): the browser does not use the cache. Requests are sent with Cache-Control: no-cache in the header (plus Pragma: no-cache for compatibility), and the server returns 200 with the latest content.

Practice

Strong cache with Cache-Control

index.js

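    const Koa = require('koa')
    const path = require('path')
    // static middleware (imported as `serve` because `static` is a reserved word in strict mode)
    const serve = require('koa-static')

    const app = new Koa()

    // Directory of static resources, relative to index.js
    const staticPath = './static'

    // Set Cache-Control to 300 seconds before the static middleware runs
    app.use(async (ctx, next) => {
      ctx.set({ 'Cache-Control': 'max-age=300' })
      await next()
    })

    app.use(serve(path.join(__dirname, staticPath)))

    app.listen(3000, () => {
      console.log('[demo] static-use-middleware is starting at port 3000')
    })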

After refreshing the page, the Cache-Control response header has changed to max-age=300.

Verify the principle of three-level caching

After the network request, the browser saves the image to both disk and memory. According to the three-level caching principle, memory is searched first, so when we refresh the page we can see that the resource is returned from the memory cache.

Closing the page releases the resources in memory, but resources on disk are persistent, so they still exist. According to the three-level caching principle, when the resource is not found in memory, the disk is searched next. You can see that the resource is returned from the disk cache.

The resource is set to expire after 300 seconds, so we can also verify that the cache becomes invalid. 300 seconds later, the cache is invalid.

Verify the negotiated cache

index.js

    const Koa = require('koa')
    const path = require('path')
    // static middleware (imported as `serve` because `static` is a reserved word in strict mode)
    const serve = require('koa-static')
    const conditional = require('koa-conditional-get')
    const etag = require('koa-etag')

    const app = new Koa()

    // Directory of static resources, relative to index.js
    const staticPath = './static'

    // conditional() is registered before etag() so that it can answer 304
    // once the ETag has been set on the response
    app.use(conditional())
    app.use(etag())

    // Strong cache disabled for this test:
    // app.use(async (ctx, next) => {
    //   // set Cache-Control to 300 seconds
    //   ctx.set({ 'Cache-Control': 'max-age=300' })
    //   await next()
    // })

    app.use(serve(path.join(__dirname, staticPath)))

    app.listen(3000, () => {
      console.log('[demo] static-use-middleware is starting at port 3000')
    })

First request

We find that the response already carries an ETag value.

Second request

The browser sends the If-None-Match request header, set to the ETag value from the previous response, and the server compares it with the ETag of the current resource. If they match, the negotiated cache is hit and 304 Not Modified is returned. You can see that the cache is hit.

Negotiated cache failure

Modify the image content and check again.

When we refresh the page a second time to load the resource, the request to the server carries the previously returned ETag value in the If-None-Match request header. The ETag is a unique identifier for the current resource file, generated by the server when it responds to the request. You can see that the ETag is regenerated after the image changes.

Because the If-None-Match value sent by the browser no longer matches the ETag of the resource on the server, the new resource (including the new ETag) is sent to the client in a normal GET 200 response.

You’re done

Github practice code address: github.com/LHDIYU/Koa/…