

Preface

HTTP caching is an important means of optimizing Web application performance. For anyone working in Web development it should be a basic part of their knowledge, and it is an essential skill for anyone who wants to become a front-end architect.


The role of caching

We use caching because it brings the following benefits to a Web project, improving both performance and user experience.

  • It speeds up page loading in the browser;
  • It reduces redundant data transmission, saving network traffic and bandwidth;
  • It reduces the load on the server and improves the performance of the site.

Because static resources are read from the local cache, pages load faster in the browser and less data is transferred. The effect on server load may not be obvious when only one or two users visit, but when the site is under high concurrency, caching makes a qualitative difference in relieving pressure on the server and improving the overall performance of the site.


Introduction to Cache Rules

To make this easier to understand, think of the browser as having a cache database that stores cached information (in reality, static resources are cached in memory and on disk). When the browser requests data for the first time, the cache database has no corresponding entry, so the request goes to the server. The server returns the caching rules together with the data, and the browser stores both in the cache database.
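
The "cache database" picture can be sketched as a plain in-memory map. This is only an illustrative model, not how any browser actually stores its cache; the names cacheDb, store, and lookup are hypothetical.

```javascript
// Illustrative model only: a Map standing in for the browser's cache database.
const cacheDb = new Map();

// Save the caching rules (response headers) together with the data.
function store(url, headers, body) {
    cacheDb.set(url, { headers, body, storedAt: Date.now() });
}

// Return the cached entry, or undefined when nothing has been stored yet.
function lookup(url) {
    return cacheDb.get(url);
}

// First request: nothing is cached, so the server has to be asked.
console.log(lookup("/app.js")); // undefined

// The server's rules and data are stored for later requests.
store("/app.js", { "cache-control": "max-age=30" }, "console.log('hi')");
console.log(lookup("/app.js").headers["cache-control"]); // max-age=30
```

Keeping the rules next to the data is what lets later requests decide, without contacting the server, whether the stored copy can still be used.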

The index.html requested when an address is entered in the browser's address bar is not cached, but the other resources requested from within index.html follow the caching policy. HTTP caching has a variety of rules, which fall into two main categories according to whether a request needs to be sent to the server: the mandatory cache and the negotiated cache.


Mandatory cache

1. The mandatory cache process

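With the mandatory cache, after the browser has accessed the server once to obtain the data, it does not request the server again within the validity period. The browser looks in the cache database first: if the cached data has not expired it is used directly, and if it has expired the browser requests the server again and stores the returned data and caching rules.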

2. How the mandatory cache determines the expiration time

So how is it decided whether the cache has expired? The decision is based on the response headers the server sent on the first access, and the headers used differ between HTTP 1.0 and HTTP 1.1.

In HTTP 1.0, the server uses a response header named Expires, whose value is an absolute time in the future. If the time of the browser's request is later than the Expires time, the cache is invalid and the request has to be sent to the server again; otherwise the data is taken directly from the cache database.
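
The Expires comparison can be sketched as a small function. This is a minimal illustration of the rule just described, not a browser's real implementation; the function name isFreshByExpires and the sample dates are made up for the example.

```javascript
// Returns true while the cached copy is still valid according to Expires.
// `expiresHeader` is the raw header value; `now` is a millisecond timestamp.
function isFreshByExpires(expiresHeader, now = Date.now()) {
    const expiresAt = Date.parse(expiresHeader);
    // An unparseable date is treated as already expired.
    if (Number.isNaN(expiresAt)) return false;
    // The cache is invalid only once the current time exceeds Expires.
    return now <= expiresAt;
}

const expires = "Wed, 21 Oct 2015 07:28:00 GMT";
const before = Date.parse("Wed, 21 Oct 2015 07:27:00 GMT");
const after = Date.parse("Wed, 21 Oct 2015 07:29:00 GMT");

console.log(isFreshByExpires(expires, before)); // true
console.log(isFreshByExpires(expires, after));  // false
```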

In HTTP 1.1, the server uses the response header Cache-Control, which accepts several values with different meanings.

  • private: only the client can cache the response;
  • public: both the client and proxy servers can cache the response (from the front end's point of view, this can be considered to have the same effect as private);
  • max-age=xxx: the cached content expires after xxx seconds (a relative time, in seconds);
  • no-cache: the negotiated cache (described below) is used to verify whether the data is out of date;
  • no-store: nothing is cached; neither the mandatory cache nor the negotiated cache is triggered.

The most commonly used Cache-Control value is max-age=xxx. Caching exists to optimize data transfer and performance, so no-store is rarely used.
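
Since a Cache-Control header can carry several of these values at once, it helps to see how one might be split into directives. The parser below is a simplified sketch (it ignores quoted values), and parseCacheControl is a hypothetical helper name.

```javascript
// Parse a Cache-Control header value into an object of directives.
// Directives without a value (e.g. no-cache) are stored as `true`.
function parseCacheControl(header) {
    const directives = {};
    for (const part of header.split(",")) {
        const [name, value] = part.trim().split("=");
        if (!name) continue;
        directives[name.toLowerCase()] = value === undefined ? true : value;
    }
    return directives;
}

const parsed = parseCacheControl("public, max-age=30");
console.log(parsed.public);             // true
console.log(Number(parsed["max-age"])); // 30
```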

Note: in HTTP 1.0, the absolute time in the Expires field comes from the server. Because the request takes time, and the browser's clock may differ from the server's, there is an error between the browser's request time and the server's time, which leads to errors in cache hits. In HTTP 1.1, the xxx in the Cache-Control value max-age=xxx is a relative time in seconds, so the browser starts counting down once it receives the resource, which avoids the HTTP 1.0 problem. To stay compatible with older versions of the HTTP protocol, both response headers are normally set in development; when both are present, the HTTP 1.1 implementation takes priority over the HTTP 1.0 one.
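
The priority rule in this note can be illustrated with a function that works out how long a response may be served from the mandatory cache. It is a sketch under simplifying assumptions: header names are lower-cased, only max-age and Expires are considered, and freshnessLifetimeMs is a hypothetical name.

```javascript
// How long (in ms) a response may be served from the mandatory cache.
// `receivedAt` is the time at which the browser received the response.
function freshnessLifetimeMs(headers, receivedAt) {
    const match = /(?:^|,)\s*max-age=(\d+)/i.exec(headers["cache-control"] || "");
    // Cache-Control: max-age (HTTP 1.1) wins when both headers are present.
    if (match) return Number(match[1]) * 1000;

    // Fall back to Expires (HTTP 1.0), an absolute time set by the server.
    const expiresAt = Date.parse(headers["expires"] || "");
    if (!Number.isNaN(expiresAt)) return Math.max(0, expiresAt - receivedAt);

    return 0; // no mandatory cache rule at all
}

const receivedAt = Date.parse("Wed, 21 Oct 2015 07:28:00 GMT");
const both = {
    "cache-control": "max-age=30",
    "expires": "Wed, 21 Oct 2015 08:28:00 GMT", // one hour later
};

console.log(freshnessLifetimeMs(both, receivedAt));                      // 30000
console.log(freshnessLifetimeMs({ expires: both.expires }, receivedAt)); // 3600000
```

With both headers present the 30-second max-age is used and the one-hour Expires value is ignored, which matches the priority described in the note.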


3. Viewing the mandatory cache in the Network panel

We open the Network panel in the Chrome Developer Tools to view the information about the mandatory cache.

We can clearly see that the response is compatible with both HTTP 1.0 and HTTP 1.1 and is stored in the mandatory cache for 10 years.

Let's look at how cached data differs from other resources in the Network panel.

In fact, the cache is stored in two locations: memory and disk. Which one is used is decided by the browser's current policy, so it can appear arbitrary. In the Network panel, data taken from the memory cache is marked as coming from the memory cache, and data taken from the disk cache is marked as coming from the disk cache.


4. Implementing the mandatory cache in a Node.js server

// Mandatory cache
const http = require("http");
const url = require("url");
const path = require("path");
const mime = require("mime");
const fs = require("fs");

let server = http.createServer((req, res) => {
    let { pathname } = url.parse(req.url, true);
    pathname = pathname !== "/" ? pathname : "/index.html";

    // Get the absolute path of the file to read
    let p = path.join(__dirname, pathname);

    // Check whether the path is valid
    fs.access(p, err => {
        // If the path is invalid, respond with 404 and stop
        if (err) {
            res.statusCode = 404;
            return res.end("Not Found");
        }

        // Set the mandatory cache: Expires for HTTP 1.0, Cache-Control for HTTP 1.1
        res.setHeader("Expires", new Date(Date.now() + 30000).toUTCString());
        res.setHeader("Cache-Control", "max-age=30");

        // Set the file type and respond to the browser
        res.setHeader("Content-Type", `${mime.getType(p)}; charset=utf8`);
        fs.createReadStream(p).pipe(res);
    });
});

server.listen(3000, () => {
    console.log("server start 3000");
});

The getType method of the mime module returns the MIME type of the file at the given path, such as text/html or application/javascript. mime is a third-party module and needs to be installed before use.

npm install mime


Negotiated cache

1. The negotiated cache process

The negotiated cache is also called the comparison cache. With the negotiated cache configured, when the browser accesses the server for the first time, the server returns the data together with a cache identifier, and the browser stores both in the cache database. On the next request, the browser takes the identifier out of the cache and sends it to the server. The server updates the identifier whenever the data changes, so it compares its own identifier with the one sent by the browser. If they are the same, the data has not changed; the server tells the browser so, and the browser takes the data from the cache. If they differ, the data on the server has changed; the server returns the new data and the new identifier, and the browser stores both in the cache.

The negotiated cache differs from the mandatory cache in that the browser communicates with the server on every request, and when the cache is hit the status code returned by the server is no longer 200 but 304.
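
From the browser's side, the difference between 200 and 304 comes down to where the body is taken from. The sketch below models that choice with plain objects; the entry and response shapes and the name resolveBody are hypothetical and stand in for the browser's internal logic.

```javascript
// `entry` is what the browser cached earlier; `response` is the server's
// reply to the follow-up request.
function resolveBody(entry, response) {
    if (response.status === 304) {
        // Cache hit: the server sent no body, so reuse the cached data.
        return { body: entry.body, entry };
    }
    // Cache miss: keep the new data together with its new identifier.
    const fresh = { id: response.id, body: response.body };
    return { body: fresh.body, entry: fresh };
}

const cached = { id: "v1", body: "old content" };

console.log(resolveBody(cached, { status: 304 }).body);
// old content
console.log(resolveBody(cached, { status: 200, id: "v2", body: "new content" }).body);
// new content
```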

2. Identifiers used by the negotiated cache

The mandatory cache uses the expiration time to decide whether to access the server, whereas the negotiated cache contacts the server every time to compare cache identifiers. As with the mandatory cache, the implementation differs between HTTP 1.0 and HTTP 1.1.

In HTTP 1.0, the server sets the cache identifier through the Last-Modified response header, usually using the last modified time (an absolute time) of the requested data as its value, and the browser caches the returned data together with the identifier. On the next request, the browser automatically sends the If-Modified-Since request header, whose value is the last modified time (the identifier) returned earlier. The server compares the value of If-Modified-Since with the data's last modified time. If the last modified time is later than the value of If-Modified-Since, the server returns the new data along with the new last modified time in the Last-Modified response header. If the last modified time has not changed, the server returns status code 304 to tell the browser that the cache was hit.
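
The server-side comparison for If-Modified-Since might look like the sketch below. One detail worth noting is that HTTP dates only have one-second resolution, so the file time is truncated to whole seconds before comparing. The function name isNotModifiedSince and the sample times are assumptions made for the example.

```javascript
// Server-side check for If-Modified-Since.
// `lastModifiedMs` is the data's last modified time in milliseconds.
function isNotModifiedSince(ifModifiedSince, lastModifiedMs) {
    const since = Date.parse(ifModifiedSince || "");
    if (Number.isNaN(since)) return false; // no usable header: send the data
    // HTTP dates carry whole seconds only, so drop the milliseconds.
    const lastModified = Math.floor(lastModifiedMs / 1000) * 1000;
    return lastModified <= since;
}

const mtime = Date.parse("2015-10-21T07:28:00.450Z");

console.log(new Date(mtime).toUTCString());
// Wed, 21 Oct 2015 07:28:00 GMT
console.log(isNotModifiedSince("Wed, 21 Oct 2015 07:28:00 GMT", mtime)); // true, respond 304
console.log(isNotModifiedSince("Wed, 21 Oct 2015 07:27:59 GMT", mtime)); // false, respond 200
```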

In HTTP 1.1, the server sets the cache identifier through the ETag response header (a unique identifier, like a fingerprint, whose generation rule is determined by the server). The browser receives the data and the unique identifier and stores them in the cache. On the next request, the browser sends the identifier to the server in the If-None-Match request header. The server compares it with its current identifier. If they differ, the server returns the new identifier and the new data; if they are the same, it returns status code 304 to tell the browser that the cache was hit.
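
The If-None-Match comparison can be sketched in the same way. The header may list several identifiers or contain "*", so this example handles both; it is a simplification that assumes identifiers contain no commas, and etagMatches is a hypothetical helper name.

```javascript
// Server-side check for If-None-Match.
// The header may carry several comma-separated ETags, or "*" for "any".
function etagMatches(ifNoneMatch, currentEtag) {
    if (!ifNoneMatch) return false;
    if (ifNoneMatch.trim() === "*") return true;
    // Ignore the weak-validator prefix (W/) when comparing.
    const strip = tag => tag.trim().replace(/^W\//, "");
    return ifNoneMatch.split(",").some(tag => strip(tag) === strip(currentEtag));
}

console.log(etagMatches('"abc123"', '"abc123"'));        // true, respond 304
console.log(etagMatches('"old", "abc123"', '"abc123"')); // true, respond 304
console.log(etagMatches('"old"', '"abc123"'));           // false, respond 200 with new data
```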

The flow chart of HTTP negotiation cache policy is as follows:

Note: when the negotiated cache uses the HTTP 1.0 headers, if you add a character to a file and then delete it, the file content has not changed but its last modified time has. The file is treated as modified, so a request that should have hit the cache makes the server resend the data. That is why HTTP 1.1 uses ETag: the unique identifier is generated from the file content or a summary of it, which ensures the cache is hit as long as the file content stays the same. For compatibility with older versions of the HTTP protocol, both response headers are set in development, and again the HTTP 1.1 implementation takes priority over the HTTP 1.0 one.
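
The add-then-delete case is easy to reproduce with a content-based identifier. The sketch below hashes strings rather than real files to keep it self-contained; contentEtag is a hypothetical helper, and MD5 is used only because the article's server code uses it.

```javascript
const crypto = require("crypto");

// Derive an ETag from the content only; timestamps play no part.
function contentEtag(content) {
    return `"${crypto.createHash("md5").update(content).digest("hex")}"`;
}

const original = contentEtag("hello");
const afterEdit = contentEtag("hello!".slice(0, -1)); // add a character, then delete it
const changed = contentEtag("hello!");

console.log(original === afterEdit); // true: same content, same identifier
console.log(original === changed);   // false: the content really changed
```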

3. Viewing the negotiated cache in the Network panel

Again we open the Network panel in the Chrome Developer Tools to view the information about the negotiated cache.

Request headers sent when requesting the server again:

Response headers that hit the negotiated cache:

Let's look at how data fetched through the negotiated cache differs from the first load in the Network panel.

First request:

Cached request:

Comparing the two figures, we can see that when the negotiated cache takes effect the status code is 304, and both the message size and the request time are greatly reduced. The reason is that, after comparing the identifiers, the server returns only the headers and uses the status code to tell the browser to use its cache; the message body no longer has to be sent back to the browser.

4. Implementing the negotiated cache in a Node.js server

// Negotiated cache
const http = require("http");
const url = require("url");
const path = require("path");
const mime = require("mime");
const fs = require("fs");
const crypto = require("crypto");

let server = http.createServer((req, res) => {
    let { pathname } = url.parse(req.url, true);
    pathname = pathname !== "/" ? pathname : "/index.html";

    // Get the absolute path of the file to read
    let p = path.join(__dirname, pathname);

    // Check whether the path is valid
    fs.stat(p, (err, statObj) => {
        // If the path is invalid, respond with 404 and stop
        if (err) {
            res.statusCode = 404;
            return res.end("Not Found");
        }

        let md5 = crypto.createHash("md5"); // Create an MD5 hash object
        let rs = fs.createReadStream(p); // Create a readable stream

        // Read the file content and feed it into the hash
        rs.on("data", data => md5.update(data));

        rs.on("end", () => {
            let mtime = statObj.mtime.toUTCString(); // Last modification time of the file
            // Hex digest used as the unique identifier (ETag values are quoted strings)
            let flag = `"${md5.digest("hex")}"`;

            // Get the request headers for the negotiated cache
            let ifModifiedSince = req.headers["if-modified-since"];
            let ifNoneMatch = req.headers["if-none-match"];

            if (ifModifiedSince === mtime || ifNoneMatch === flag) {
                res.statusCode = 304;
                res.end();
            } else {
                // Set the negotiated cache
                res.setHeader("Last-Modified", mtime);
                res.setHeader("Etag", flag);

                // Set the file type and respond to the browser
                res.setHeader("Content-Type", `${mime.getType(p)}; charset=utf8`);
                // rs was already consumed for hashing, so open a new stream for the body
                fs.createReadStream(p).pipe(res);
            }
        });
    });
});

server.listen(3000, () => {
    console.log("server start 3000");
});

In the code above, the file content is read through a readable stream and its MD5 digest, computed with the crypto module, is used as the unique identifier, which ensures the cache is hit as long as the file content is unchanged. The code is compatible with both HTTP 1.0 and HTTP 1.1: if either condition is met, 304 is returned to tell the browser that the cache was hit.

Note: reading the whole file and hashing it, as done here, is not really advisable. If the file is large, reading its content and computing the MD5 digest can be time-consuming. In practice, generate the unique identifier in a way that suits the business and preserves server performance, for example from a summary of the file.
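
One cheaper option, shown here only as a sketch, is to build the identifier from file metadata instead of the full content. It assumes that a change in size or modification time is an acceptable stand-in for a change in content, which means it brings back the add-then-delete false miss described earlier. The helper name statEtag is hypothetical; size and mtimeMs are fields of the object returned by fs.stat.

```javascript
// Build a weak ETag from file metadata instead of hashing the whole file.
// `stat` only needs the `size` and `mtimeMs` fields that fs.stat provides.
function statEtag(stat) {
    const size = stat.size.toString(16);
    const mtime = Math.floor(stat.mtimeMs).toString(16);
    // The W/ prefix marks the identifier as a weak validator.
    return `W/"${size}-${mtime}"`;
}

console.log(statEtag({ size: 255, mtimeMs: 4096 })); // W/"ff-1000"
```

Because nothing is read from disk beyond the metadata, the cost stays constant however large the file is.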


Conclusion

To make the caching policy more robust and flexible, the HTTP 1.0 and HTTP 1.1 caching strategies are used at the same time, and the mandatory cache and the negotiated cache are even used together. With the mandatory cache, the server tells the browser a cache lifetime; within that lifetime, the next request uses the cache directly, and once the lifetime has passed, the negotiated cache policy is applied. With the negotiated cache, the ETag and Last-Modified values in the cache information are sent to the server through the If-None-Match and If-Modified-Since request headers, and the server validates them and sets a new mandatory cache. If validation succeeds, the server returns status code 304 and the browser uses the cache directly. If the negotiated cache misses, the server sets new negotiated cache identifiers.
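
The combined policy can be summed up as a single decision made by the browser before each repeat request. The sketch below models it with a hypothetical cache record (storedAt, maxAge, etag, lastModified) and a hypothetical function planRequest; it illustrates the flow described here and is not browser source code.

```javascript
// Decide what the browser does for a repeat request.
// `entry` records when the response was stored, its max-age in seconds,
// and the identifiers the server sent with it.
function planRequest(entry, now) {
    const ageSeconds = (now - entry.storedAt) / 1000;
    if (ageSeconds < entry.maxAge) {
        return { action: "use-cache", headers: {} }; // mandatory cache hit
    }
    // Expired: ask the server, sending the stored identifiers for comparison.
    const headers = {};
    if (entry.etag) headers["If-None-Match"] = entry.etag;
    if (entry.lastModified) headers["If-Modified-Since"] = entry.lastModified;
    return { action: "revalidate", headers };
}

const entry = {
    storedAt: 0,
    maxAge: 30,
    etag: '"abc123"',
    lastModified: "Wed, 21 Oct 2015 07:28:00 GMT",
};

console.log(planRequest(entry, 10000).action);                   // use-cache
console.log(planRequest(entry, 45000).action);                   // revalidate
console.log(planRequest(entry, 45000).headers["If-None-Match"]); // "abc123"
```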