In-depth understanding of Webpack packaging chunking (Part 1)

Preface

As front-end applications grow heavier, one problem we have to deal with is that the front-end code keeps getting bigger. This means long waits for builds to finish, both during development and when going live, and users have to spend extra time and bandwidth downloading larger script files.

However, if you think about it, this is entirely avoidable. Should changing one line of code during development require the entire script to be rebuilt? Does a user need to download the script for the whole site just to skim one page? The trend is clearly to break code up and deliver it to users strategically, on demand. The recent popularity of micro-frontends follows this principle to some extent (though that is not the only reason for it).

Fortunately, today's tools give us the power to do this. For example, Webpack lets us split scripts into chunks when bundling, and the browser cache lets us load resources selectively.

In searching for best practices, the biggest question for me was not whether we could do it, but how we should do it. By what criteria should we split scripts? What caching strategy should we use? Do lazy loading and chunking work the same way? How much of a performance boost can we expect from splitting? Most importantly, with so many options, tools and uncertainties, where do we start? This article tries to sort out and answer these questions. It covers two aspects: first, formulating a module-splitting strategy; second, implementing that strategy technically.

The article "The 100% correct way to split your chunks with Webpack" walks developers through splitting and optimizing code step by step, so it serves as the thread running through this article. Building on it, I will go deeper into Webpack and related topics, and into how each scheme is implemented.

Now, on to the main text.

According to the Webpack glossary, there are two distinct ways of splitting files. The terms sound interchangeable, but they are not:

Bundle splitting: creating more, smaller files for better caching (though all of them are still loaded on each visit)
Code splitting: dynamically loading code, so that users only download the part of the site they are currently viewing

The second strategy sounds more appealing, doesn't it? In fact, many articles assume that this is the only scenario in which splitting JavaScript into smaller files is worthwhile.

But I'm here to tell you that for many sites the first strategy is more valuable, and it should be the first thing you do.

Let’s dig deeper

Bundle VS Chunk VS Module

Before we start coding, we need to clarify some concepts. In particular, this article uses the word "chunk" throughout, so we should be clear about how it differs from the words "bundle" and "module" that we also use often.

Unfortunately, even after consulting a lot of material, I couldn't find a definitive answer. So I'll share the definitions I personally agree with. Above all, I hope they give us a shared vocabulary.

First, the concept of "module" is uncontroversial. It refers to the pieces of code we deliberately encapsulate and organize as we write. In the narrow sense, the word brings to mind a React component, a CommonJS module or an ES6 module. For Webpack and its loaders, however, modules in the broad sense also include styles, images and even other types of files.

A bundle, on the other hand, is a single file into which all the related code is packaged. If you don't want to put all your code into one package, you can split it into multiple packages, known as "chunks." From this perspective, a chunk is also a bundle: another layer of organizing and encapsulating code. If we must draw a distinction, "bundle" usually refers to the single file into which all modules are packaged. "Chunk" refers to a set of modules grouped according to some rule. A chunk is larger than a single module and smaller than the whole bundle.

(Strictly speaking, "chunk" is a technical term Webpack uses to manage the bundling process, and chunks even come in different types. I don't think we need to go that deep here. Just remember the definition in the previous paragraph.)

Bundle splitting

The idea behind bundle splitting is very simple. If you have one huge file and change a single line of code, users still have to download the entire file again. If you split it into two files, users only need to download the file that changed, and the browser can load the other one from its cache.

Note that because bundle splitting works through caching, it makes no difference to first-time visitors.

I think too much of the performance discussion focuses on the first visit. Maybe that's partly because first impressions matter, and partly because first visits are easy and neat to measure.

Quantifying performance gains is a little trickier when it comes to frequent visitors, but we must!

This calls for a table that records the results for every combination of scenario and strategy.

Let’s assume a scenario:

Alice visited the site once a week for 10 weeks
We update the site weekly
We update the Product List page weekly
We also have a “Product Details” page, but we don’t need to update it at this time
In week 5 we added an NPM package to the site
In week 8 we updated an existing NPM package

Some people, myself included, will want to make this scenario as realistic as possible. It doesn't really matter, and we'll explain why later.

Performance baseline

Let's say our JavaScript comes to 400KB in total, bundled into a single file named main.js.

Our Webpack configuration looks something like this (I have removed the irrelevant options):

const path = require('path');

module.exports = {
  entry: path.resolve(__dirname, 'src/index.js'),
  output: {
    path: path.resolve(__dirname, 'dist'),
    // [contenthash] changes whenever the file's content changes
    filename: '[name].[contenthash].js',
  },
};

When there is only a single entry, Webpack automatically names the result main.js

(For those new to caching: whenever I refer to main.js, I actually mean something like main.xmepwxho.js. The string in the middle is a hash of the file's contents. This means a new file name is generated whenever your application code changes, which forces the browser to download the new file.)

So every week, when I publish new changes to the site, the bundle's contenthash changes. As a result, each weekly visit forces Alice to download a whole new 400KB file.

Over ten weeks, that adds up to 4.12MB.

We can do better

Hash and performance

The statement above may not be entirely obvious, so a few points need clarifying:

Why do file names with hash strings affect the browser cache?
Why does the file name use contenthash? What would happen if we replaced contenthash with hash or chunkhash?

To keep the browser from redownloading the same file on every visit, we typically set the Cache-Control header in the file's HTTP response to max-age=31536000, which is one year (in seconds). Users can then load the file straight from the cache for up to a year without sending a request to the server, unless the cache is cleared.
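As a sketch of that policy, the hypothetical helper below picks a Cache-Control value from a file name. Files whose names carry a content hash get the one-year max-age described above. Everything else, such as the HTML page that references them, is marked no-cache so it is revalidated on each visit. The naming pattern it checks for (name, dot, hex hash, extension) is an assumption for illustration; Webpack does not provide this function.

```javascript
// Hypothetical helper: choose a Cache-Control header based on the file name.
// Assumes hashed assets are named like "main.1a2b3c4d5e.js" (name, dot, hex hash, extension).
const HASHED_ASSET = /\.[0-9a-f]{8,}\.(js|css)$/;

function cacheControlFor(fileName) {
  if (HASHED_ASSET.test(fileName)) {
    // The name changes whenever the content changes, so caching for a year is safe.
    return 'public, max-age=31536000';
  }
  // Un-hashed files (e.g. index.html) must be revalidated, or users would never
  // see the new hashed asset names they point to.
  return 'no-cache';
}

console.log(cacheControlFor('main.1a2b3c4d5e.js')); // public, max-age=31536000
console.log(cacheControlFor('index.html'));         // no-cache
```

The HTML page is the one file that must never be cached long-term, because it is what tells the browser which hashed file names to request.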

What if I change the file's contents midway and need users to download it again? Just change the file name. The browser caches by URL, so a different file name means a different cache entry. A hash string is a "signature" generated from the file's contents. Whenever the contents change, the hash changes, and the file name changes with it. The cached copy of the old version is no longer used, and the browser downloads the new version. Of course, this is only the most basic caching strategy. For more complex scenarios, see my previous article, "Designing a Watertight Browser Caching Solution: Ideas, Details, ServiceWorker, and HTTP/2."
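The idea of a content "signature" can be shown in a few lines of Node.js. This is an illustrative sketch only. The function name hashedFileName is hypothetical, and it uses SHA-256 truncated to 8 hex characters, whereas Webpack's own hash function and digest length are different. The point is the property it demonstrates: identical content always yields the same name, and changed content yields a new one.

```javascript
const crypto = require('crypto');

// Illustrative only: derive a file name from the file's content, in the spirit of
// Webpack's [name].[contenthash].js. Webpack's real hash function and length differ.
function hashedFileName(name, content, ext = 'js') {
  const hash = crypto.createHash('sha256').update(content).digest('hex').slice(0, 8);
  return `${name}.${hash}.${ext}`;
}

const v1 = hashedFileName('main', 'console.log("v1");');
const v1Again = hashedFileName('main', 'console.log("v1");');
const v2 = hashedFileName('main', 'console.log("v2");');

console.log(v1 === v1Again); // true: same content, same name, cached copy stays valid
console.log(v1 === v2);      // false: new content, new name, browser must fetch it
```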

That is why we configure filename: '[name].[contenthash].js' in Webpack. Every time we publish changed code, a new file name is generated automatically.

However, if you know Webpack, you'll know that it also offers two other hash placeholders: hash and chunkhash. So why not use them instead of contenthash? To answer that, we need to look at how they differ. In principle they serve different purposes, but in practice they are sometimes interchangeable.

To illustrate, let's start with this very simple Webpack configuration. It has two entry points and extracts CSS into separate files, producing three files. We use hash in the output filename and contenthash in MiniCssExtractPlugin. The reason for this will be explained later.

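// clean-webpack-plugin v3+ exports a named class
const { CleanWebpackPlugin } = require("clean-webpack-plugin");
const MiniCssExtractPlugin = require("mini-css-extract-plugin");

module.exports = {
  entry: {
    module_a: "./src/module_a.js",
    module_b: "./src/module_b.js"
  },
  output: {
    filename: "[name].[hash].js"
  },
  module: {
    rules: [
      // CSS imported by the entries is pulled out into separate .css files
      { test: /\.css$/, use: [MiniCssExtractPlugin.loader, "css-loader"] }
    ]
  },
  plugins: [
    new CleanWebpackPlugin(), // empties the output directory before each build
    new MiniCssExtractPlugin({
      filename: "[name].[contenthash].css"
    })
  ]
};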

hash

hash is computed per build, so every file emitted by the same build shares the same hash. It reflects changes to the project as a whole. If the contents of any file change, the hashes of all the other output files change too after the next build.

Clearly this is not what we need. If the contents of module_a change, the hash of module_a's output file should change, but module_b's should not. Otherwise users have to redownload module_b's output file even though it hasn't changed.

chunkhash

chunkhash is computed from the contents of each chunk. If a chunk's contents change, only the hashes of that chunk's output files change, and other files are unaffected. That sounds like exactly what we need.

Earlier we defined a chunk as a unit of aggregated code. In the preceding example, chunks are defined by entry: each entry corresponds to one chunk. We'll see more complex cases later.
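To make the difference between hash and chunkhash concrete, here is a toy model in Node.js. It is not Webpack's real algorithm. It simply hashes either all chunks together (standing in for [hash]) or each chunk on its own (standing in for [chunkhash]). It then checks whether module_b's file name survives a change that touches only module_a.

```javascript
const crypto = require('crypto');

// Simplified model of Webpack's [hash] vs [chunkhash]; not Webpack's actual algorithm.
const digest = (text) => crypto.createHash('sha256').update(text).digest('hex').slice(0, 8);

function outputNames(chunks, mode) {
  // [hash]: one hash for the whole build, computed from every chunk.
  const buildHash = digest(Object.keys(chunks).sort().map((n) => n + chunks[n]).join('|'));
  const names = {};
  for (const [name, source] of Object.entries(chunks)) {
    // [chunkhash]: each chunk's hash depends only on that chunk's own content.
    const h = mode === 'hash' ? buildHash : digest(source);
    names[name] = `${name}.${h}.js`;
  }
  return names;
}

const week1 = { module_a: 'export const a = 1;', module_b: 'export const b = 1;' };
const week2 = { ...week1, module_a: 'export const a = 2;' }; // only module_a changed

for (const mode of ['hash', 'chunkhash']) {
  const before = outputNames(week1, mode);
  const after = outputNames(week2, mode);
  console.log(mode, 'module_b renamed?', before.module_b !== after.module_b);
}
// hash module_b renamed? true
// chunkhash module_b renamed? false
```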

contenthash

As the name implies, contenthash is computed from the contents of the file. In this sense, chunkhash and contenthash can often substitute for each other, which is why the original author used contenthash in the "performance baseline" code.

What sets contenthash apart, at least according to the explanations I've read, is its use with the file names produced by ExtractTextWebpackPlugin or MiniCssExtractPlugin: if you want a hash there, you should use contenthash. Yet in my own tests, hash and chunkhash also worked fine. (Perhaps because the extract plugins hash strictly by content? But isn't a chunk also content?)
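A commonly given explanation for the contenthash advice can be shown with a similar toy model. Again, this is not Webpack's real hashing. The CSS extracted from a chunk belongs to the same chunk as the JavaScript that imports it. A chunk-level hash therefore changes when only the JavaScript changes, which renames the CSS file too. A content-level hash depends on the CSS text alone.

```javascript
const crypto = require('crypto');

// Simplified model: a chunk emits one JS file and one extracted CSS file.
const digest = (text) => crypto.createHash('sha256').update(text).digest('hex').slice(0, 8);

function emittedNames(chunk, mode) {
  const chunkHash = digest(chunk.js + chunk.css); // covers every module in the chunk
  // contenthash in this model: hash of the CSS text only
  const cssHash = mode === 'chunkhash' ? chunkHash : digest(chunk.css);
  return { js: `${chunk.name}.${chunkHash}.js`, css: `${chunk.name}.${cssHash}.css` };
}

const before = { name: 'module_a', js: 'render(1);', css: '.a { color: red; }' };
const after = { ...before, js: 'render(2);' }; // JS edited, CSS untouched

for (const mode of ['chunkhash', 'contenthash']) {
  const changed = emittedNames(before, mode).css !== emittedNames(after, mode).css;
  console.log(`${mode}: CSS file renamed = ${changed}`);
}
// chunkhash: CSS file renamed = true
// contenthash: CSS file renamed = false
```

In this model chunkhash still produces valid, working file names, which may be why it seemed fine in testing. The cost is that the CSS cache is invalidated more often than necessary.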

Separate the vendor libraries

Let's split the bundle into main.js and vendor.js.

It’s simple, something like:

const path = require('path');

module.exports = {
  entry: path.resolve(__dirname, 'src/index.js'),
  output: {
    path: path.resolve(__dirname, 'dist'),
    filename: '[name].[contenthash].js',
  },
  optimization: {
    splitChunks: {
      // split shared and node_modules code out of both initial and async chunks
      chunks: 'all',
    },
  },
};

Without being told how you want the bundle split, Webpack 4 does its best to split it for you.

This leads to some voices saying, “Amazing, Webpack is doing a great job!”

While others say, "What have you done to my bundle?"

Either way, adding optimization.splitChunks.chunks = 'all' amounts to saying: "put everything from node_modules into a file called vendors~main.js."

With this basic bundle splitting in place, Alice still downloads a new 200KB main.js on every visit. She only downloads the 200KB vendors.js in weeks 1, 5 and 8.

That comes to 2.64MB.

That is a 36% reduction. Not bad for five new lines of configuration. You can try it right away before you read on. If you need to upgrade from Webpack 3 to 4, don't worry: the upgrade is painless (and free!).
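The totals above follow a simple rule: a file is downloaded on the first visit and again in every week its hash changes. The sketch below encodes that rule. It assumes fixed file sizes, so its totals (4000KB for the baseline, 2600KB with the vendor split, about 35% less) do not exactly match the article's 4.12MB and 2.64MB, though the relative saving is similar. The function name totalDownloadedKB and the data shapes are hypothetical.

```javascript
// Simplified model of Alice's ten weekly visits, assuming constant file sizes.
// Each file is (re)downloaded on the first visit and in every week its hash changes.
function totalDownloadedKB(files, weeks = 10) {
  let total = 0;
  for (let week = 1; week <= weeks; week++) {
    for (const file of files) {
      if (week === 1 || file.changedInWeeks.includes(week)) total += file.sizeKB;
    }
  }
  return total;
}

const everyWeek = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const baseline = [{ name: 'main.js', sizeKB: 400, changedInWeeks: everyWeek }];
const vendorSplit = [
  { name: 'main.js', sizeKB: 200, changedInWeeks: everyWeek },
  { name: 'vendors.js', sizeKB: 200, changedInWeeks: [5, 8] }, // npm package added / updated
];

console.log(totalDownloadedKB(baseline));    // 4000
console.log(totalDownloadedKB(vendorSplit)); // 2600
```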

Separate each NPM package

Our vendors.js now suffers from the same problem as the original main.js: changing any part of it means redownloading the whole file.

So why not split each npm package into its own file? It's very simple to do.

Let's split React, Lodash, Redux, Moment and so on into separate files:


const path = require('path');
const webpack = require('webpack');

module.exports = {
  entry: path.resolve(__dirname, 'src/index.js'),
  plugins: [
    new webpack.HashedModuleIdsPlugin(), // so that file hashes don't change unexpectedly
  ],
  output: {
    path: path.resolve(__dirname, 'dist'),
    filename: '[name].[contenthash].js',
  },
  optimization: {
    runtimeChunk: 'single',
    splitChunks: {
      chunks: 'all',
      maxInitialRequests: Infinity,
      minSize: 0,
      cacheGroups: {
        vendor: {
          test: /[\\/]node_modules[\\/]/,
          name(module) {
            // get the name. E.g. node_modules/packageName/not/this/part.js
            // or node_modules/packageName
            const packageName = module.context.match(/[\\/]node_modules[\\/](.*?)([\\/]|$)/)[1];

            // npm package names are URL-safe, but some servers don't like @ symbols
            return `npm.${packageName.replace('@', '')}`;
          },
        },
      },
    },
  },
};

The documentation explains what's going on here pretty well, but I still want to point out a few subtleties, because they took me quite a while to figure out:

Webpack has some not-so-smart "smart" defaults. For example, it allows a maximum of 3 initial files when splitting the output and requires a minimum chunk size of 30KB (anything smaller gets merged together). So I have overridden these.
cacheGroups is where we define rules for how Webpack should group chunks into output files. Here I have a single rule, named "vendor", for every module loaded from node_modules. Normally you would just define name as a string for the output file. Instead, I defined name as a function, called for each module as it is processed, that returns the package name based on the module's path. As a result, I get a separate file for each package, for example npm.react-dom.899sadfhj4.js.
npm package names must be URL-safe in order to be published, so we don't need to encodeURI the package name. But I ran into a problem: a .NET server would not serve files with an @ in the name, so I stripped it out in the snippet.
The whole setup needs no maintenance: we never have to refer to any library by name.
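To see what the name function actually returns, the sketch below pulls the same regular expression out into a standalone function. The name npmChunkName is hypothetical, and module.context is replaced by a plain path string. Two consequences of the lazy match are worth noting. First, a scoped package such as @babel/runtime yields only the scope, so all @babel/* packages would end up sharing one npm.babel file. Second, a nested node_modules path resolves to the outermost package.

```javascript
// Mirrors the name() callback from the config above, pulled out as a plain function
// so it can be run outside Webpack. "context" stands in for module.context.
function npmChunkName(context) {
  const match = context.match(/[\\/]node_modules[\\/](.*?)([\\/]|$)/);
  if (!match) return null; // hypothetical guard; the config's `test` already filters these out
  const packageName = match[1];
  return `npm.${packageName.replace('@', '')}`;
}

console.log(npmChunkName('/app/node_modules/react-dom/cjs'));          // npm.react-dom
console.log(npmChunkName('C:\\app\\node_modules\\lodash'));            // npm.lodash
console.log(npmChunkName('/app/node_modules/@babel/runtime/helpers')); // npm.babel
```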

Alice still redownloads the 200KB main.js every week, and on her first visit she still downloads 200KB of npm packages. However, she never downloads the same package twice.

That comes to 2.24MB.

That is a 44% reduction from the baseline, which is pretty cool for some code you can copy and paste from an article.

I wonder if we can get past 50%?

Wouldn't that be great?

Wait, what is that Webpack configuration actually doing?

At this point you may be asking: how do the settings under the optimization option split out the vendor code?

The next section explains Webpack's optimization option. I am not a Webpack expert, and I have not verified every option and description below. It also does not cover everything. Please forgive any mistakes.

The optimization option, as its name suggests, exists to optimize code. Look closely and you'll see that most of the configuration sits under the splitChunks field, because it indirectly configures the SplitChunksPlugin to split chunks (a mechanism introduced in Webpack 4). Webpack 3 used the CommonsChunkPlugin, which is no longer used in 4. In short, SplitChunksPlugin does one thing: it splits code. That means breaking a single large file into several smaller ones instead of packing all modules into one file.

When we first split out the vendor code, we used only one option:

splitChunks: {
  chunks: 'all',
},

The chunks option takes one of three values: initial, async or all. It determines which chunks are selected for splitting: those loaded synchronously (initial), those loaded asynchronously (async) or all of them. Asynchronous here means modules loaded dynamically via import().

The key question is which modules the plugin considers. Take async as an example. Suppose two modules, a.js and b.js, both reference jQuery, and a.js also loads Lodash dynamically. In async mode, the plugin splits Lodash into its own chunk for a.js. However, jQuery, the module common to a.js and b.js, is not split out, so it may end up in both a.bundle.js and b.bundle.js. That is because async tells the plugin to consider only dynamically loaded modules.
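A webpack.config.js for the scenario just described might look like the sketch below. The entry names and source files are hypothetical: it assumes src/a.js and src/b.js both statically import jquery, and that src/a.js also calls import('lodash'). Only the chunks value matters here.

```javascript
// webpack.config.js sketch for the async example (entry file names are hypothetical).
// Assumes src/a.js and src/b.js both import jquery statically,
// and src/a.js additionally calls import('lodash').
const path = require('path');

module.exports = {
  entry: {
    a: './src/a.js',
    b: './src/b.js',
  },
  output: {
    path: path.resolve(__dirname, 'dist'),
    filename: '[name].bundle.js',
  },
  optimization: {
    splitChunks: {
      // 'async': only chunks created by import() are considered, so jQuery stays in both bundles.
      // Switching to 'initial' or 'all' would let the shared jQuery be split out as well.
      chunks: 'async',
    },
  },
};
```

Changing only the chunks value and comparing the emitted files is a quick way to see the three modes side by side.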

Next, let's focus on the second configuration, the one that splits out each npm package.

maxInitialRequests and minSize are where the plugin tries to be too clever on its own. By default, a chunk smaller than 30KB will not be split out, and at most 3 chunks are requested in parallel for an entry point. You can override both defaults with the minSize and maxInitialRequests options. Note that maxInitialRequests and minSize sit at the root of splitChunks. For now, we'll call this the global configuration.
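For reference, the configuration fragment below lists the splitChunks defaults as given in the Webpack 4 documentation, including the two values discussed above. Later major versions changed several of these values, so check the docs for the version you use. The per-package configuration earlier overrides minSize with 0 and maxInitialRequests with Infinity.

```javascript
// webpack.config.js fragment: splitChunks defaults as documented for Webpack 4.
// Later major versions changed several of these values.
module.exports = {
  optimization: {
    splitChunks: {
      chunks: 'async',
      minSize: 30000,        // bytes; smaller candidates are not split out
      minChunks: 1,
      maxAsyncRequests: 5,
      maxInitialRequests: 3, // the "maximum of three files" mentioned earlier
      automaticNameDelimiter: '~',
      name: true,
      cacheGroups: {
        vendors: {
          test: /[\\/]node_modules[\\/]/,
          priority: -10,
        },
        default: {
          minChunks: 2,
          priority: -20,
          reuseExistingChunk: true,
        },
      },
    },
  },
};
```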

The cacheGroups option is the most important one, because it lets you define custom rules for splitting chunks. Each cacheGroups rule can also set the chunks and minSize fields mentioned above to override the global configuration. Alternatively, you can ignore the global configuration altogether by setting enforce to true.

The default vendors cache group, which splits out the library modules in node_modules, looks like this:

cacheGroups: {
  vendors: {
    test: /[\\/]node_modules[\\/]/,
    priority: -10
  }
},

If you don't want to use this configuration, you can either set it to false or override it. Here I chose to override it, adding the name and enforce options:

vendors: {
  test: /[\\/]node_modules[\\/]/,
  name: 'vendors',
  enforce: true,
},

Finally, two options that haven't appeared above are still commonly used: priority and reuseExistingChunk.

reuseExistingChunk: this option only appears in cacheGroups rules and means that existing chunks are reused. For example, suppose chunk 1 contains modules A, B and C, and chunk 2 contains modules B and C. If reuseExistingChunk is false, the plugin creates a new chunk at build time, named common~for~1~2, containing the shared modules B and C. If it is true, the plugin does not create a new chunk, because chunk 2 already contains exactly the shared modules B and C.
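priority: it's easy to imagine configuring several splitting rules in cacheGroups. What if one module matches several rules at once? priority resolves this: the module goes to the matching group with the highest priority. Note that the default cache groups have negative priorities (-10 for vendors and -20 for default), while a custom cache group defaults to 0. Custom rules therefore take precedence over the defaults unless you give them a lower value.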
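As a rough illustration of how priority settles a tie, the sketch below picks the matching cache group with the highest priority for a given module path. It is a deliberately simplified model. The real SplitChunksPlugin also weighs size and request limits. The react group is a hypothetical custom rule, and pickCacheGroup is a hypothetical helper name.

```javascript
// Simplified model: for each module, pick the matching cache group with the highest priority.
// The real SplitChunksPlugin also considers sizes, chunk counts, etc.; this only shows the tie-break.
const cacheGroups = {
  react: { test: /[\\/]node_modules[\\/](react|react-dom)[\\/]/, priority: 10 }, // hypothetical custom group
  vendors: { test: /[\\/]node_modules[\\/]/, priority: -10 },                     // Webpack 4 default
};

function pickCacheGroup(modulePath, groups) {
  let best = null;
  for (const [name, group] of Object.entries(groups)) {
    if (!group.test.test(modulePath)) continue; // rule does not match this module
    if (best === null || group.priority > groups[best].priority) best = name;
  }
  return best;
}

console.log(pickCacheGroup('/app/node_modules/react-dom/index.js', cacheGroups)); // react
console.log(pickCacheGroup('/app/node_modules/lodash/lodash.js', cacheGroups));   // vendors
console.log(pickCacheGroup('/app/src/index.js', cacheGroups));                    // null
```

Both groups match react-dom, but the custom group's priority of 10 beats the default -10, so react-dom lands in its own file while other packages fall through to vendors.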

Summary

So far, we've seen a pattern for splitting code:

First, decide what problem we want to solve (keeping users from downloading redundant code on every visit);
Then, decide which approach to use (splitting out code that changes infrequently and pairing it with an appropriate caching strategy);
Finally, decide how to implement it (splitting the code by configuring Webpack).

This article is also published in my Zhihu column, which you are welcome to follow.

References

Bundle VS Chunk

What are module, chunk and bundle in webpack?
Concepts – Bundle vs Chunk
SurviveJS: Glossary

Hash

What is the purpose of webpack hash and chunkhash?
Hash vs chunkhash vs ContentHash
Adding Hashes to Filenames

SplitChunksPlugin

Webpack 4 — Mysterious SplitChunks Plugin
Webpack (v4) Code Splitting using SplitChunksPlugin
Reduce JavaScript Payloads with Code Splitting
Webpack v4 chunk splitting deep dive
what reuseExistingChunk: true means, can give a sample?