- Of SVG, Minification and Gzip
- Anton Khlynovskiy
- The Nuggets translation Project
- Permanent link to this article: github.com/xitu/gold-m…
- Translator: lsvih
- Proofread by: HuskyDoge, Atuooo
How to reduce the size of resource files
Smaller files mean faster downloads. It is therefore beneficial to make resource files smaller before sending them to the client.
Minifying and compressing resource files is not only worthwhile but something every modern developer should do. However, minifiers don't always do a perfect job, and how well a compressor performs depends on the data you feed it. Here are some tips and tricks for tuning both to get the best results.
Preparation
We’ll take a simple SVG file as an example:
The <svg> image is a 10×10 pixel area (the viewBox) containing two 6×6 squares (<rect> elements). The original file is 176 bytes, or 138 bytes after gzip compression.
Admittedly, this image is no work of art, but it serves the purpose of this article and keeps it from getting too long.
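To take measurements like the raw and gzip sizes above yourself, a few lines of Python are enough. This is a minimal sketch using only the standard library's gzip module; the SVG markup in it is a hypothetical stand-in for the article's image (the exact attributes and the second square's color are assumptions), so its numbers will not match the article's. Also note that the gzip command-line tool stores the original file name in its header by default, so its output can be a few bytes larger than gzip.compress() reports.

```python
import gzip

# Hypothetical stand-in for the article's image: a 10x10 viewBox with two
# 6x6 squares. The exact markup and the second color are assumptions.
SAMPLE_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10" viewBox="0 0 10 10">'
    '<rect width="6" height="6" fill="#f00"/>'
    '<rect x="4" y="4" width="6" height="6" fill="#00f"/>'
    "</svg>"
)


def measure(text: str) -> tuple[int, int]:
    """Return (raw size, gzip size) in bytes for a UTF-8 string."""
    raw = text.encode("utf-8")
    # mtime=0 keeps the gzip header deterministic between runs;
    # compresslevel=9 is the maximum level.
    compressed = gzip.compress(raw, compresslevel=9, mtime=0)
    return len(raw), len(compressed)


raw_size, gz_size = measure(SAMPLE_SVG)
print(f"raw: {raw_size} bytes, gzip: {gz_size} bytes")
```

Running the same function on each intermediate version of a file is a quick way to check whether a tweak actually helps.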
Step 0: Svgo
Run svgo image.svg to minify the file directly:
(Line breaks and indentation have been added for readability.)
As you can see, each <rect> has been replaced with a <path>. A path's shape is defined by its d attribute, a string of commands, similar to Canvas drawing calls, that move a virtual pen. A command can be absolute (move to x, y) or relative (move by x, y from the current position); uppercase letters denote absolute commands and lowercase letters relative ones. Let's take a closer look at one of these paths:
- M 0 0: start the path at the point (0, 0)
- h 6: move 6 px to the right
- v 6: move 6 px down
- H 0: move horizontally to x = 0
- z: close the path, i.e. return to its starting point
That is exactly our square, and it is far more compact than the <rect> element.
In addition, #f00 has been changed to red, which saves another byte!
The file size is now 135 bytes, or 126 bytes after gzip compression.
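The absolute/relative distinction is easy to check with a tiny pen tracer. The Python sketch below is a hypothetical helper, not part of svgo: it understands only the M/m, H/h, V/v and Z/z commands used in this article (no curves, no implicit repeated coordinates, no exponents) and returns the points the pen visits.

```python
import re

# Command letters or plain decimal numbers (no exponents).
TOKEN = re.compile(r"[MmHhVvZz]|-?\d+(?:\.\d+)?")


def trace(d: str) -> list[tuple[float, float]]:
    """Return the points visited by the pen for a simple path string."""
    tokens = TOKEN.findall(d)
    x = y = 0.0          # the pen starts at the origin
    start = (x, y)       # where z/Z returns to
    points = []
    i = 0
    while i < len(tokens):
        cmd = tokens[i]
        i += 1
        if cmd in ("M", "m"):
            dx, dy = float(tokens[i]), float(tokens[i + 1])
            i += 2
            # Uppercase: absolute coordinates; lowercase: relative offset.
            x, y = (dx, dy) if cmd == "M" else (x + dx, y + dy)
            start = (x, y)
        elif cmd in ("H", "h"):
            value = float(tokens[i])
            i += 1
            x = value if cmd == "H" else x + value
        elif cmd in ("V", "v"):
            value = float(tokens[i])
            i += 1
            y = value if cmd == "V" else y + value
        elif cmd in ("Z", "z"):
            x, y = start
        else:
            raise ValueError(f"unsupported token {cmd!r}")
        points.append((x, y))
    return points


print(trace("M0 0h6v6H0z"))
# [(0.0, 0.0), (6.0, 0.0), (6.0, 6.0), (0.0, 6.0), (0.0, 0.0)]
```

The same helper can confirm that two spellings of a path are equivalent, e.g. trace("m2 2h3v3h-3") == trace("M2 2h3v3H2").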
Step 1: Scale as a whole
You may have noticed that all coordinates in both paths are even. Can we divide them all by 2?
The image looks the same as before, but at half the size. To restore its original size, we halve the viewBox as well.
The file size is now 133 bytes, or 124 bytes after gzip compression.
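Halving every coordinate by hand is error-prone, so here is a hypothetical Python helper that multiplies every number in a path or viewBox string by a given factor. It uses fractions.Fraction so that 6 becomes 3 rather than 3.0, and it refuses to produce non-integers. It assumes plain decimal numbers as in svgo's output; compact forms such as 1.5.5 or exponents are not handled. The inputs are the first square's path, M0 0h6v6H0z, and the 10×10 viewBox.

```python
import re
from fractions import Fraction

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def scale(s: str, factor: Fraction) -> str:
    """Multiply every number in s by factor, insisting on integer results."""

    def repl(match: re.Match) -> str:
        value = Fraction(match.group()) * factor
        if value.denominator != 1:
            raise ValueError(f"{match.group()} * {factor} is not an integer")
        return str(value.numerator)

    return NUMBER.sub(repl, s)


half = Fraction(1, 2)
print(scale("M0 0h6v6H0z", half))  # path coordinates -> M0 0h3v3H0z
print(scale("0 0 10 10", half))    # viewBox -> 0 0 5 5
```

Raising an error on odd coordinates mirrors the manual check in this step: the trick only pays off when every number divides evenly.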
Step 2: Use a non-closed path
Let's go back to the paths. The last command in each of them is z, "close path". But a filled shape is closed implicitly, so we can drop these commands.
That saves another 2 bytes: the file is now 131 bytes, or 122 bytes after gzip compression. As a rule of thumb, fewer raw bytes means fewer compressed bytes. So far we have saved 4 gzip bytes compared with plain svgo.
You may be wondering why svgo doesn't do these optimizations automatically. The reason is that scaling the image and removing the trailing z are not safe in general. Consider the following example:
Here are some shapes drawn with a stroke. From left to right: the original, the unclosed path, and the unclosed and scaled path.
The strokes are clearly broken. Fortunately, we know our image doesn't use strokes. svgo doesn't assume that, so it plays it safe and avoids transformations that could change the result.
At this point it looks as if nothing more can be removed: XML syntax is strict, every remaining attribute is required, and attribute values must be quoted.
Think we're done? Not at all; this is just the beginning.
Step 3: Reduce the number of letters
Now let me introduce a very handy tool: gzthermal. It analyzes a gzip-compressed file and colors each original byte by how well it was compressed: well-compressed bytes are green and poorly compressed bytes are red. Simple as that.
Look at the d attributes again, especially the M command marked in red. We can't delete it, but we can replace the absolute move with a relative one, for example M2 2 with m2 2.
The pen starts at the origin (0, 0), so moving to (2, 2) is the same as moving by (2, 2) from the origin. Let's try it:
The uncompressed file is still 131 bytes, but after gzip it is only 121 bytes. What happened? The answer is…
Huffman Trees
Gzip uses the DEFLATE compression algorithm, which relies on Huffman coding.
The core idea of Huffman coding is to spend fewer bits on symbols that occur often and more bits on symbols that occur rarely.
That's right: bits, not bytes. DEFLATE treats each byte as a symbol and may encode it with fewer or more than 8 bits, depending on how often it occurs.
For example, to encode the string "Test" we assign a code to each distinct letter in it:
- 00: T
- 01: e
- 10: s
- 11: t
Encoded symbol by symbol, "Test" becomes 00011011, a total of 8 bits.
Now change the leading "T" to lowercase, giving "test", and try again:
- 0: t
- 10: e
- 11: s
The letter t now occurs twice, so its code shrinks to a single bit. The string encodes as 010110, only 6 bits!
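The bit counts above can be checked with a textbook Huffman construction. The Python sketch below computes only the code length of each symbol, which is what determines the encoded size; it is not DEFLATE's actual encoder, which additionally uses canonical codes, a 15-bit length limit, and a combined literal/length alphabet.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text: str) -> dict[str, int]:
    """Return the Huffman code length in bits for each distinct character."""
    freq = Counter(text)
    if len(freq) == 1:
        # Degenerate case: one symbol still needs 1 bit per occurrence.
        return {ch: 1 for ch in freq}
    # Heap entries: (total weight, unique tie-breaker, {symbol: depth}).
    heap = [(count, i, {ch: 0}) for i, (ch, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol in them gets 1 bit deeper.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {ch: depth + 1 for ch, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encoded_bits(text: str) -> int:
    """Total bits needed to encode text with its own Huffman code."""
    lengths = huffman_code_lengths(text)
    return sum(lengths[ch] for ch in text)


for word in ("Test", "test"):
    print(word, huffman_code_lengths(word), encoded_bits(word), "bits")
```

For "Test" every letter gets a 2-bit code (8 bits in total); for "test" the letter t gets 1 bit and the whole string takes 6 bits, matching the hand-built codes above.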
The same goes for the M in our SVG. Once it is lowercase, the file contains no uppercase M at all, so that symbol drops out of the Huffman tree and the average code length gets shorter.
So when writing gzip-friendly code, prefer characters that are already frequent. Even if the code doesn't get shorter, it will take fewer bits once compressed.
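One way to quantify "fewer distinct characters" is zero-order Shannon entropy, H = -Σ p·log2(p) bits per character, which is a lower bound on the average code length of any per-character prefix code such as Huffman's. The sketch below uses a hypothetical SVG snippet (not the article's exact file) that already contains a lowercase m in xmlns; lowercasing the M commands merges two symbols into one, which always lowers this entropy.

```python
import math
from collections import Counter

# Hypothetical snippet modeled on the article's paths; "xmlns" already
# contains a lowercase "m".
BEFORE = ('<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 5 5">'
          '<path d="M0 0h3v3H0"/><path d="M2 2h3v3H2"/></svg>')
# Each path starts at the origin, so M -> m does not change the shapes here.
AFTER = BEFORE.replace("M", "m")


def entropy_bits_per_char(text: str) -> float:
    """Zero-order Shannon entropy of the character distribution."""
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in Counter(text).values())


for label, text in (("M", BEFORE), ("m", AFTER)):
    print(label, len(set(text)), "distinct chars,",
          f"{entropy_bits_per_char(text):.4f} bits/char")
```

Zero-order entropy ignores backreferences, so treat it as a rough indicator of the Huffman stage only, not a prediction of the final gzip size.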
Step 4: Backreferences
DEFLATE has another feature: backreferences. Instead of encoding bytes directly, some codes tell the decoder to copy a run of recently decoded bytes.
So rather than encoding the same bytes over and over, the encoder can emit a reference of the form <distance, length>: go back distance bytes and copy length bytes. For example:
Hey diddle diddle, the cat and the fiddle.
Hey diddle<7,7>, the cat and<12,5>f<24,5>.
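The <distance, length> notation can be made concrete with a toy LZ77 encoder and decoder. This is a simplified sketch, not DEFLATE itself: it greedily takes the longest match (preferring the nearest one on ties), uses DEFLATE's limits of 3 to 258 bytes per match and a 32 KiB window, and skips the Huffman stage entirely. On the nursery rhyme it reproduces the three references shown above.

```python
from typing import Union

# A token is either a run of literal characters or a (distance, length) copy.
Token = Union[str, tuple[int, int]]

MIN_MATCH, MAX_MATCH, WINDOW = 3, 258, 32 * 1024  # DEFLATE's limits


def lz77_encode(text: str) -> list[Token]:
    tokens: list[Token] = []
    pos = 0
    while pos < len(text):
        best_len = best_dist = 0
        # Scan candidate starts from nearest to farthest, so on equal lengths
        # the closest match wins.
        for start in range(pos - 1, max(pos - WINDOW, 0) - 1, -1):
            length = 0
            while (length < MAX_MATCH and pos + length < len(text)
                   and text[start + length] == text[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - start
        if best_len >= MIN_MATCH:
            tokens.append((best_dist, best_len))
            pos += best_len
        else:
            # Merge consecutive literals into one string for readability.
            if tokens and isinstance(tokens[-1], str):
                tokens[-1] += text[pos]
            else:
                tokens.append(text[pos])
            pos += 1
    return tokens


def lz77_decode(tokens: list[Token]) -> str:
    out: list[str] = []
    for token in tokens:
        if isinstance(token, str):
            out.extend(token)
        else:
            distance, length = token
            # Copy one character at a time so overlapping copies
            # (distance < length) work, as in real LZ77 decoders.
            for _ in range(length):
                out.append(out[-distance])
    return "".join(out)


rhyme = "Hey diddle diddle, the cat and the fiddle."
print(lz77_encode(rhyme))
# ['Hey diddle', (7, 7), ', the cat and', (12, 5), 'f', (24, 5), '.']
```

The byte-by-byte copy in the decoder is what lets a reference overlap its own output, so a short pattern such as "ab" followed by (2, 6) expands to "abababab".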
Conveniently, gzthermal also has a mode that shows only backreferences. Running gzthermal -z produces the following image:
Literal bytes are orange and backreferenced bytes are blue. The following animation makes this more intuitive:
The second path is encoded almost entirely with backreferences, except for the fill value, the move command, and the final H command. There's nothing we can do about the fill and the move, because the second square really does have a different color and position.
But the shape is the same, and we now understand gzip much better. So we can replace both absolute commands, H0 and H2, with the same relative command: h-3.
Now the two separate backreferences merge into one. The file is 133 bytes, or 119 bytes after gzip. Although we added 2 bytes before compression, the gzip output is 2 bytes smaller!
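A rough way to see why the references merge is to look for the longest run that the second path's d value shares with the first. The Python sketch below does this with a standard dynamic-programming longest-common-substring search. The d strings are reconstructed from the commands quoted in the steps above (m0 0 and m2 2, h3, v3, then H0/H2 or h-3) and may differ in detail from the author's file.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest string that occurs in both a and b (first one found)."""
    best_len = best_end = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run ending at a[i-1] / b[j-1].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


# Step 3: the absolute H commands differ between the two squares.
print(longest_common_substring("m0 0h3v3H0", "m2 2h3v3H2"))    # h3v3H
# Step 4: the same relative h-3 closes both squares.
print(longest_common_substring("m0 0h3v3h-3", "m2 2h3v3h-3"))  # h3v3h-3
```

With H0/H2 the shared run is cut short by the differing coordinate; with h-3 the tails are identical, so a single longer backreference can cover them.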
We only care about the compressed size: 99.9% of clients use gzip or Brotli when transferring resources. Which brings us to Brotli.
Brotli compression algorithm
Brotli is an algorithm introduced in 2015 to replace gzip (from 1992) in browsers. It is similar to gzip in many ways: it is also based on Huffman coding and backreferences, so the adjustments we made for gzip should benefit Brotli as well. Let's apply Brotli to each of the previous steps:
- Original file: 106 bytes
- After step 0 (svgo): 104 bytes
- After step 1 (viewBox): 105 bytes
- After step 2 (unclosed paths): 113 bytes
- After step 3 (lowercase m): 116 bytes
- After step 4 (relative commands): 102 bytes
As you can see, the final file is smaller than the svgo output, which shows that the work we did for gzip pays off with Brotli as well.
However, the intermediate results are erratic: after steps 1 through 3 the Brotli output actually grew. Brotli isn't gzip; it is an algorithm in its own right, and despite the similarities there are differences.
The biggest difference is that Brotli has a predefined dictionary built in and uses context modeling when encoding. In addition, Brotli's backreferences can be as short as 2 bytes (gzip can only create backreferences of 3 bytes or more).
Brotli is arguably less predictable than gzip. I'd like to explain what causes this "compression degradation", but Brotli has no tools like gzip's gzthermal and defdb, so I had to rely on its specification and on trial and error.
Trial and error
Let's try again. This time we'll change the color in the fill attribute from red back to #f00. red is obviously shorter, but perhaps Brotli can encode #f00 with a longer backreference.
After gzip the file is 120 bytes, and after Brotli it is 100 bytes: the gzip stream is 1 byte longer and the Brotli stream is 2 bytes shorter.
So this version does better with Brotli and worse with gzip. And that's fine! It is almost impossible to optimize data for every compressor at once. Tuning data for compressors is like solving a Rubik's cube that never quite lines up.
Conclusion
None of the tweaks above are specific to SVG or gzip.
Here are some guidelines for writing more compressible code:
- Smaller source data may compress to smaller output.
- Fewer distinct characters means lower entropy, and the lower the entropy, the better the compression.
- Frequently occurring characters are encoded with fewer bits. Removing rare characters and making common characters more common can improve compression.
- Long repeated chunks of code can compress to just a few bytes. DRY ("Don't Repeat Yourself") is not always the best option; sometimes repeating yourself gives better results.
- Sometimes larger source data compresses to smaller output. Reducing entropy lets the compressor remove more redundancy.
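The point about repetition is easy to test with Python's zlib module, which produces DEFLATE data in a zlib wrapper. In this sketch a block of pseudo-random, essentially incompressible bytes is compressed once and then twice in a row; the second copy lies within DEFLATE's 32 KiB window, so it should cost only a handful of backreferences rather than another full block. The block size and seed are arbitrary choices.

```python
import random
import zlib

rng = random.Random(42)  # fixed seed so the run is reproducible
block = bytes(rng.getrandbits(8) for _ in range(1000))  # ~incompressible

once = len(zlib.compress(block, 9))
twice = len(zlib.compress(block * 2, 9))

print(f"block once:  {len(block)} raw -> {once} compressed")
print(f"block twice: {2 * len(block)} raw -> {twice} compressed")
```

Expect the second figure to be only modestly larger than the first, not double: the repeat is almost free, while the first copy pays full price.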
You can find all of these resources, compressed images, and more in this GitHub repo.
I hope you enjoyed this article. Next time we'll talk about compressing regular JavaScript code and the JavaScript in webpack bundles.