Gzip compression is a technique that often comes up in interview questions about front-end performance optimization. I recently gained a new understanding of it while learning about Node file streams and the zlib module. Today I'd like to share what gzip is, and what happens between the browser sending a request and receiving the compressed data.
The previous title, "Do you know how front-end performance optimization with gzip works?", was a little ambiguous. By "how it works" I meant what happens during this process, while many readers expected the compression algorithm itself. So I changed the title to "A Preliminary Study on Front-end Performance Optimization with gzip", which may be followed by some superficial notes on how gzip compression is implemented. Your humble author is a little embarrassed; please forgive me.
What is gzip
Dude, have you heard of WinRAR? Have you heard of 360 Compression, KuaiZip, HaoZip? Have you heard of GNU zip?
Gzip is short for GNU zip. It is a file compression program that packs files into .gz archives. Front-end gzip optimization simply means running resources through the gzip program, so the requested files are smaller.
How common is gzip compression in the industry? Open basically any website and look at its HTML, JS, and CSS files: they are served gzip-compressed (and even when the JS and CSS have already been minified, gzip still reduces their size significantly).
Tips: gzip typically compresses plain text to around 40% of its original size. However, PNG, GIF, JPG, JPEG, and other image files should generally not be gzip-compressed (SVG is an exception). First, already-compressed image files leave gzip very little room to save space; in fact, adding the gzip header, compression dictionary, and checksum to the response body may make them larger.
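As a quick sanity check of that tip, here is a small sketch; logo.png is just a placeholder name for any already-compressed image:

const fs = require("fs");
const zlib = require("zlib");

const png = fs.readFileSync("logo.png"); // placeholder: any PNG/JPG
const gzipped = zlib.gzipSync(png);
// the gzip wrapper (header, dictionary, checksum) can outweigh the tiny savings
console.log(png.length, gzipped.length);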
For example, if you open the Nuggets site with the debug tools, pick a JS or CSS request from the Network panel, and find Content-Encoding: gzip among the Response Headers, that key-value pair indicates the file was served gzip-compressed.
Gzip compression process
As you can see above, this is a GrowingIO analytics file loaded by the Nuggets site, served gzip-compressed at 25.3 K. Now let's download the file, set up a local server without gzip enabled, and see how big it is then (you can also just check the downloaded file: it is 88.73 K).
Here we write a service with plain Node, to keep things easy to follow. The directory and code are as follows:
const http = require("http");
const fs = require("fs");

const server = http.createServer((req, res) => {
  const rs = fs.createReadStream(`static${req.url}`); // read the file as a stream
  rs.pipe(res); // return the data as a stream
  rs.on("error", err => {
    // return 404 Not Found
    console.log(err);
    res.writeHead(404);
    res.end("Not Found");
  });
});

// listen on port 8080
server.listen(8080, () => {
  console.log("listen port:8080");
});
Start the service with node server.js and visit http://localhost:8080/vds.js. The page displays the content of vds.js, and the Network panel shows the vds.js request at 88.73 K, the same as the original resource file. Content-Encoding: gzip is also absent from the Response Headers, confirming that it is not gzip-compressed.
To enable gzip, Node provides the zlib module, which we can use directly.
const http = require("http");
const fs = require("fs");
const zlib = require("zlib"); // <-- require the zlib module

const server = http.createServer((req, res) => {
  const rs = fs.createReadStream(`static${req.url}`);
  const gz = zlib.createGzip(); // <-- create a gzip compression stream
  rs.pipe(gz).pipe(res); // <-- gzip the data before returning it
  rs.on("error", err => {
    console.log(err);
    res.writeHead(404);
    res.end("Not Found");
  });
});

server.listen(8080, () => {
  console.log("listen port:8080");
});
Run this code and visit http://localhost:8080/vds.js again. This time the page does not display the contents of vds.js; instead the browser downloads a vds.js file of about 25 K, which looks like the compressed size. But if you try to open that file in an editor, it either fails to open or shows up as binary rather than text. Quick-thinking readers may have the same idea I did: change the .js suffix to .gz. After all, as mentioned above, gzip is a zip program that compresses files into .gz archives. What if this file is actually a gzip archive?
To cut the suspense short: change the suffix to .gz, decompress it, and out comes the 88.73 K vds.js file.
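If you would rather not rename files, a small zlib sketch verifies the same thing: the ~25 K response body gunzips back into the original source.

const fs = require("fs");
const zlib = require("zlib");

const body = fs.readFileSync("vds.js"); // the ~25 K file the browser downloaded
const original = zlib.gunzipSync(body); // throws if this is not valid gzip data
console.log(original.length); // ≈ 88.73 K of readable JS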
At this point things should suddenly make sense: gzip compresses the resource file into an archive. The only remaining problem is how to use that archive. If I request a file and the server hands me an archive, I can't interpret it.
The solution is simple: when returning the archive, the server tells the browser that this is actually a gzip archive and that the browser should decompress it before use. It does so with the response header Content-Encoding: gzip.
Let's make a final change to the code and add that response header:
const http = require("http");
const fs = require("fs");
const zlib = require("zlib");

const server = http.createServer((req, res) => {
  const rs = fs.createReadStream(`static${req.url}`);
  const gz = zlib.createGzip();
  res.setHeader("Content-Encoding", "gzip"); // <-- add the Content-Encoding: gzip response header
  rs.pipe(gz).pipe(res);
  rs.on("error", err => {
    console.log(err);
    res.removeHeader("Content-Encoding"); // the error text below is sent uncompressed
    res.writeHead(404);
    res.end("Not Found");
  });
});

server.listen(8080, () => {
  console.log("listen port:8080");
});
When the browser receives the gzip-compressed file, it decompresses it before use. This is completely transparent to us as users; the browser does it silently behind the scenes. All we see is that the file size in the network request is much smaller than the actual resource on the server.
It took a while to explain how gzip works, but it really is easy to understand, and the next time you are asked about front-end performance optimization, gzip should not be forgotten.
Gzip considerations
One point of caution is which files are suitable for gzip compression and which are not, as discussed in the tips above.
Another point to note is who performs the gzip compression. In our example the Node server compresses on each request; this is the same approach as using the compression middleware in Express, the koa-compress middleware in Koa, or enabling gzip in Nginx or Tomcat, which is also the common way: the server compresses the response.
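For comparison with the hand-written server above, the same behavior with Express and the compression middleware is only a few lines (a sketch assuming npm install express compression):

const express = require("express");
const compression = require("compression");

const app = express();
app.use(compression()); // gzip responses for clients that accept it
app.use(express.static("static")); // serve files from the static directory

app.listen(8080, () => {
  console.log("listen port:8080");
});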
The server sees that we accept gzip compression and fires up its CPU to do it for us. The compression itself takes time; you can think of it as trading server CPU and compression time (plus the browser's decompression time) for a shorter transfer.
Alternatively, we can package the resource files into .gz archives directly at build time, which also works; it saves the server its compression time and reduces server load.
For example, with webpack we can use compression-webpack-plugin to emit gzip archives when building the project, as shown in the sketch below. For detailed configuration, refer to the plugin's documentation, which is very simple.
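A minimal sketch of such a config (option names follow the compression-webpack-plugin documentation; check them against the version you install):

// webpack.config.js
const CompressionWebpackPlugin = require("compression-webpack-plugin");

module.exports = {
  // ...the rest of your build config
  plugins: [
    new CompressionWebpackPlugin({
      test: /\.(js|css|html|svg)$/, // only compress text assets
      threshold: 10240, // skip files smaller than 10 KB
      minRatio: 0.8 // keep the .gz file only if it is under 80% of the original
    })
  ]
};

The server then has to serve these pre-built .gz files with the Content-Encoding: gzip header instead of compressing on the fly; Nginx, for example, does this with the gzip_static directive.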
Added: Gzip file analysis
As mentioned at the beginning, gzip is a compression program, not an algorithm. The format of a gzip-compressed file is .gz.
Using Node's fs module to read a .gz archive, we can see the following Buffer:

const fs = require("fs");

fs.readFile("vds.gz", (err, data) => {
  console.log(data); // <Buffer 1f 8b 08 ... >
});
In general, a .gz archive consists of a file header, the file body, and a file trailer. The header and trailer store metadata about the file. For example, in the Buffer above, the first two bytes are 1f 8b (hexadecimal); these magic bytes give a preliminary indication that this is a gzip archive, though to be sure you would check that the whole file matches the gz format. The third byte ranges from 0 to 8, and currently only 8 is used, meaning the Deflate compression algorithm. The header also carries other information, such as the modification time and the file system on which the compression was performed.
The trailer records information such as the size of the original data, and the compressed data itself sits in the file body in between.
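Here is a small sketch that reads those fields directly from the vds.gz archive above (the layout is specified in RFC 1952):

const fs = require("fs");

const buf = fs.readFileSync("vds.gz");

// header: magic bytes and compression method
console.log(buf[0] === 0x1f && buf[1] === 0x8b); // true for gzip data
console.log(buf[2]); // 8 = Deflate

// trailer: the last 4 bytes store the original size (mod 2^32)
console.log(buf.readUInt32LE(buf.length - 4)); // the uncompressed size of vds.js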
As mentioned above, this is why enabling gzip for already-compressed images may actually make them bigger: the body in the middle barely shrinks, while the header and trailer metadata are added on top.
Added: Gzip compression algorithm
The file body in the middle of a gzip archive uses the Deflate algorithm, a lossless compression and decompression algorithm. Deflate is the default algorithm of the zip format, and tools such as 7-Zip can use it too. Deflate actually compresses a data stream, so it can be used anywhere streaming compression is needed.
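Node exposes the raw Deflate stream in the same zlib module, so a quick in-memory round trip looks like this:

const zlib = require("zlib");

const input = Buffer.from("http://www.baidu.com https://www.taobao.com");
const deflated = zlib.deflateSync(input); // compress
const inflated = zlib.inflateSync(deflated); // decompress
console.log(input.length, deflated.length); // compare the sizes
console.log(inflated.equals(input)); // true: the round trip is lossless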
Deflate compression generally works in two steps: first LZ77 compression, then Huffman coding.
The principle of LZ77 is that when two chunks of a file have the same content, the later chunk can be replaced by a pair of values: the distance back to the earlier occurrence, and the length of the matching content. Because this (distance, length) pair is smaller than the chunk it replaces, the file shrinks.
Here’s an example:
http://www.baidu.com
https://www.taobao.com
As you can see, parts of the text repeat. We can replace the repeated content in the later text with the distance to the same content earlier and the length of the matching characters:
http://www.baidu.com (21, 12) taobao (23, 4)
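To make the (distance, length) idea concrete, here is a toy greedy LZ77 matcher. It is only an illustration, far simpler than Deflate's real matcher (no sliding-window limit, no overlapping matches, greedy search):

function lz77(input, minMatch = 3) {
  const out = [];
  let i = 0;
  while (i < input.length) {
    let best = { dist: 0, len: 0 };
    // find the longest earlier occurrence of the text at position i
    for (let j = 0; j < i; j++) {
      let len = 0;
      while (j + len < i && i + len < input.length && input[j + len] === input[i + len]) {
        len++;
      }
      if (len > best.len) best = { dist: i - j, len };
    }
    if (best.len >= minMatch) {
      out.push(`(${best.dist},${best.len})`); // replace the repeat with a pair
      i += best.len;
    } else {
      out.push(input[i]); // no match worth encoding: emit the literal
      i += 1;
    }
  }
  return out.join("");
}

// the repeated "://www." and ".com" collapse into (distance,length) pairs
console.log(lz77("http://www.baidu.com\nhttps://www.taobao.com"));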
Deflate uses an improved version of LZ77 that only encodes repeated strings of three bytes or more. It also uses a hash table to speed up match searching: a head array records the most recent match position for each hash value, and a prev chain records earlier positions that share the same hash.
As for Huffman coding, my understanding is not deep enough to say much here, but the general idea is that, based on how frequently each character occurs, high-frequency characters are given shorter bit codes, which compresses the string.
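A toy Huffman coder makes the idea concrete: count frequencies, repeatedly merge the two rarest nodes into a tree, then read codes off the tree. (Deflate actually uses canonical Huffman tables, so this is only an illustration.)

function huffmanCodes(text) {
  // count character frequencies
  const freq = {};
  for (const ch of text) freq[ch] = (freq[ch] || 0) + 1;

  // one leaf node per character
  let nodes = Object.entries(freq).map(([ch, f]) => ({ ch, f }));

  // repeatedly merge the two least frequent nodes into a parent
  while (nodes.length > 1) {
    nodes.sort((a, b) => a.f - b.f);
    const [a, b] = nodes.splice(0, 2);
    nodes.push({ f: a.f + b.f, left: a, right: b });
  }

  // walk the tree: going left appends "0", going right appends "1"
  const codes = {};
  (function walk(node, code) {
    if (node.ch !== undefined) {
      codes[node.ch] = code || "0"; // single-character edge case
      return;
    }
    walk(node.left, code + "0");
    walk(node.right, code + "1");
  })(nodes[0], "");
  return codes;
}

console.log(huffmanCodes("aaaabbc")); // 'a' gets the shortest code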
In fact, having glanced at these algorithms, we can roughly see why JS and CSS files still compress well under gzip even after being minified by build tools.
For more information on the GZIP algorithm, read the following article:
GZIP compression principle analysis series
Added: Brotli compression
Thanks to Wangyjx1 for the comment suggesting a look at Brotli compression; I'm sharing some of what I learned here.
Brotli was launched by Google in 2015 for offline compression of web fonts; a later release generalized it into universal lossless data compression. Brotli is based on a modern variant of LZ77, Huffman coding, and second-order context modeling.
Unlike common general-purpose compression algorithms, Brotli ships a predefined 120-kilobyte dictionary containing more than 13,000 common words, phrases, and other substrings drawn from a large corpus of text and HTML documents. This predefined dictionary improves compression density for small files. Compressing text files with Brotli instead of Deflate typically increases compression density by about 20%, while compression and decompression speeds remain roughly the same.
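To see the difference locally: Node 11.7+ exposes Brotli alongside gzip in the zlib module, so a quick comparison sketch looks like this (results vary with the input):

const fs = require("fs");
const zlib = require("zlib");

const input = fs.readFileSync("vds.js"); // any sizeable text file
console.log("gzip:  ", zlib.gzipSync(input).length);
console.log("brotli:", zlib.brotliCompressSync(input).length); // usually smaller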
At present, recent versions of most browsers (including mobile) support this compression method well; see caniuse for detailed support.
Browsers that support Brotli advertise the content encoding token br. For example, here is the Accept-Encoding value in a Chrome request header:
Accept-Encoding: gzip, deflate, sdch, br
If the server supports the Brotli algorithm, the following response header is returned:
Content-Encoding: br
Brotli compression only works over HTTPS: for plain HTTP requests, the Accept-Encoding header contains only gzip, deflate, with no br.
As for real-world adoption, checking the network requests of several major websites: abroad, Google, Facebook, and Bing all use Brotli compression. At home, Taobao, Baidu, Tencent, JD, and Bilibili basically do not. The only domestic site I found using Brotli compression, can you guess which? It's Zhihu, classy as expected; Nuggets might consider keeping up. The good news is that CDN providers such as Tencent Cloud, Alibaba Cloud, and Upyun already support Brotli compression.
At the time this was written, Node had no built-in Brotli support, so third-party libraries such as iltorb were needed (newer Node versions, from 11.7 on, expose Brotli through the zlib module). If you are interested, try it yourself.
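For completeness, here is a sketch of the earlier static server negotiating between br and gzip via the Accept-Encoding header (assuming Node 11.7+ for the built-in Brotli stream, and remembering that browsers only advertise br over HTTPS):

const http = require("http");
const fs = require("fs");
const zlib = require("zlib");

const server = http.createServer((req, res) => {
  const rs = fs.createReadStream(`static${req.url}`);
  const accepts = req.headers["accept-encoding"] || "";

  if (/\bbr\b/.test(accepts)) {
    res.setHeader("Content-Encoding", "br"); // client supports Brotli: prefer it
    rs.pipe(zlib.createBrotliCompress()).pipe(res);
  } else if (/\bgzip\b/.test(accepts)) {
    res.setHeader("Content-Encoding", "gzip");
    rs.pipe(zlib.createGzip()).pipe(res);
  } else {
    rs.pipe(res); // client accepts neither: send the raw file
  }

  rs.on("error", err => {
    console.log(err);
    res.removeHeader("Content-Encoding");
    res.writeHead(404);
    res.end("Not Found");
  });
});

server.listen(8080, () => {
  console.log("listen port:8080");
});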
Finally
I'm sorry if the title and content gave you the wrong impression; Nuggets really is strict about this. I wrote this article simply because, while learning Node, I came to understand how gzip works and wanted to share it. I didn't dare to go deep into the compression algorithms, because I'm still learning them myself, and for front-end developers they are not must-know material; interviews shouldn't go that deep, though the curious can of course dig in on their own. But since they came up, I've shared what I know, and criticism is welcome.
Article list
- From modularization to NPM private warehouse construction
- 10 minutes for Thanos to snap his fingers and disappear
- It’s time to develop your own vscode extension
- A quick guide to using Babel
- Front-end work and learning related website collection
- Why is the video link address of video website blob?