Preface

Webpack can almost always be found in modern mainstream front-end project development, and it seems to have become an indispensable part of front-end development today.

The figure below is the homepage of webpack's official website, which vividly shows webpack's core function: packing a bunch of modules with complex dependencies into orderly static resources.

The advent of Webpack, along with the support of ready-made scaffolding, allowed us to focus on project development without having to worry too much about the packaging process and results.

However, have you ever wondered how JS code packaged by webpack manages to load and execute in an orderly way?

Recall that in projects using webpack, any resource can be treated as a module (as long as a corresponding loader can parse it), and the carrier of a module is a file. After the project is packaged, however, the carrier of a module becomes a function, known as a module function, and files become the carriers of chunks. A chunk is a webpack concept: a collection of several modules. A chunk usually corresponds to one file.

To answer the question above, we need to understand how, after packaging, modules relate to other modules, chunks to other chunks, and modules to chunks. This article therefore approaches the topic from two dimensions: module and chunk.

The module

Since modules are the smallest unit of resource loading, we start with the simplest module loading.

Here is a basic Webpack4 configuration file.

// webpack.config.js
const path = require('path')

module.exports = {
  mode: 'production',
  entry: {
    main: './src/main.js'
  },
  output: {
    filename: '[name].js',
    chunkFilename: '[name].[contenthash:8].js',
    path: path.resolve(__dirname, 'dist')
  },
  optimization: {
    // To make it easier to read and understand the packaged code, turn off code compression and module merging
    minimize: false,
    concatenateModules: false
  }
}

After running the build command npm run build, the project generates a dist folder containing a main.js file. Here is what the file contains. (All the sample code in this article omits some temporarily irrelevant code to reduce the reader's burden.)

(function(modules) { // webpackBootstrap
    // A place to cache loaded modules
    var installedModules = {};

    // The require function used to load the module
    function __webpack_require__(moduleId) { ... }

    // Load the entry module (assuming the entry module id is 0)
    return __webpack_require__(0)
})([ ... ])

In essence, the code above is an immediately invoked function expression (IIFE). The function part is called webpackBootstrap, and the argument passed in is an array or object containing the module functions.

webpackBootstrap

The following is the English meaning of bootstrap:

A technique of loading a program into a computer by means of a few initial instructions which enable the introduction of the rest of the program from an input device.

If that is still unclear, think of it as a control center responsible for starting, scheduling, and executing everything.

In webpackBootstrap, we define a module cache object (installedModules) to store loaded modules and a module loading function (__webpack_require__) to retrieve the module with the corresponding ID. Finally, the entry module is loaded to start the entire program.

Now let’s focus on module loading.

// Load the module
function __webpack_require__(moduleId) {
  // Check whether the module exists in the cache, if so, return directly
  if (installedModules[moduleId]) {
    return installedModules[moduleId].exports
  }

  // Initializes a new module and saves it in the cache
  var module = installedModules[moduleId] = {
    i: moduleId, // Module ID
    l: false, // A Boolean value indicating whether the module is loaded
    exports: {} // The output object of a module, which contains each interface of the module output
  }

  // Execute the module function, passing in three arguments: the module itself, the module's output object, and the loading function. Define this as the module's output object
  modules[moduleId].call(
    module.exports,
    module,
    module.exports,
    __webpack_require__
  )

  // mark the module as loaded
  module.l = true

  // Returns the output object of the module
  return module.exports
}

The code above shows that compiled modules are loaded according to the CommonJS specification. But doesn't CommonJS load modules synchronously, which makes it unsuitable for the browser? It works here because the code compiled by webpack guarantees that a module has already been downloaded from the server by the time it is loaded, so no synchronous request blocks execution. How this is achieved is explained in the chunk section below.

The following is a module loading flowchart.

There are a few things worth noting, though:

  • If the same module is loaded more than once, it will be executed only once, so you need to cache the loaded modules.
  • The new module is saved to the cache immediately after initialization, not after the module is loaded. This is actually to solve the problem of circular dependency between modules, that is, module A depends on module B, and module B depends on module A. This way, when an unfinished module is reloaded, the output object of the module (which may not contain the entire output interface of the module) is returned directly when the cache is checked to avoid an infinite loop.
  • In CommonJS modules, the top-level value of this is module.exports, so the call method is used to set the module function's this value to module.exports. In an ES6 module, however, the top-level this is undefined, so this is converted to undefined at compile time.

Module functions

So what exactly happens when a module function is executed? Simply put, the module's output interfaces are added to its output object.

Let’s look at a simple example of module functions (import/export from ES6 is used in this example because WebPack officially recommends using ES6 module syntax).

// src/lib.js
export let counter = 0

export function plusOne() {
  counter++
}

// src/main.js (entry module)
import { counter, plusOne } from './lib'

console.log(counter)

plusOne()
console.log(counter)

Here are the module functions packaged and compiled by Webpack.

(function(modules) {
  ...
  // Step 1: Load the entry module
  return __webpack_require__(1)
})([
    /* moduleId: 0 */
    function(module, __webpack_exports__, __webpack_require__) {
      // The ES6 module uses strict mode by default
      'use strict'

      // Step 1.1.1: Define the output interfaces on the output object
      __webpack_require__.d(__webpack_exports__, 'a', function() {
        return counter
      })
      __webpack_require__.d(__webpack_exports__, 'b', function() {
        return plusOne
      })

      // Step 1.1.2: Declare the values that define the output interface
      let counter = 0

      function plusOne() {
        counter++
      }
    },
    /* moduleId: 1 */
    function(module, __webpack_exports__, __webpack_require__) {
      'use strict'
      // Step 1.1: Load the lib.js module and return its output object
      var _lib__WEBPACK_IMPORTED_MODULE_0__ = __webpack_require__(0)
      // _lib__WEBPACK_IMPORTED_MODULE_0__ = {
      // get a() { return counter },
      // get b() { return plusOne }
      // }

      // Step 1.2: Call the output interface on the output object
      console.log(_lib__WEBPACK_IMPORTED_MODULE_0__[/* counter */ 'a'])

      Object(_lib__WEBPACK_IMPORTED_MODULE_0__[/* plusOne */ 'b'])()
      console.log(_lib__WEBPACK_IMPORTED_MODULE_0__[/* counter */ 'a'])
    }
])

In the code above, the ES6 module files are compiled into module functions that follow the CommonJS specification. To preserve the characteristics of ES6 module syntax, the compiled code is somewhat obscure. A few points may be puzzling:

  1. What is the __webpack_require__.d function for?

    This function defines the output interfaces on the output object. But isn't simple property assignment enough for that? It is not, because an ES6 module exports read-only references to values.

    Here is an implementation of __webpack_require__.d.

    __webpack_require__.d = function(exports, name, getter) {
      // __webpack_require__.o is used to determine whether an output interface with the same name already exists on the output object
      if (!__webpack_require__.o(exports, name)) {
        Object.defineProperty(exports, name, { enumerable: true, get: getter })
      }
    }

    The code above shows that when an ES6 module's interface is exported, Object.defineProperty is used to define the property on the output object, and only a getter is defined, which makes the output interface read-only. The reference to the value is implemented through the property's getter together with a closure.

  2. Why is the output interface always defined at the top of the module function (except in some special cases of export default, such as export default 1, where the output interface name is not explicitly specified)?

    This is because ES6 modules expose their interfaces at compile time, whereas CommonJS modules are loaded at run time. The difference shows up when modules load each other cyclically. To emulate this feature of ES6 modules, the output interface names must be defined before the module loads its dependencies or performs any other operation.

  3. Why must the output interface of the lib module be read from the output object every time it is used (as in Step 1.2), rather than used as a standalone variable as in the source code (such as the counter variable)?

    This is because the attribute on the output object is actually a getter function. If the attribute value is taken out and a variable is declared separately, the effect of closure will be lost. The change of the output interface value in the loaded module will not be tracked, and the ES6 module will lose the feature of reference to the value of the output interface. Using the example code above, the console normally prints 0 and 1 in sequence, but if you assign the compiled output interface value to a new variable, the console prints 0 twice.
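The three answers above can be checked with a short stand-alone sketch. It assumes a helper named defineExport that plays the role of __webpack_require__.d, and it simulates the lib module with an inline function; both are illustrative stand-ins rather than webpack code.

```javascript
// Hypothetical stand-in for __webpack_require__.d: getter-only export.
function defineExport(exportsObject, name, getter) {
  if (!Object.prototype.hasOwnProperty.call(exportsObject, name)) {
    Object.defineProperty(exportsObject, name, { enumerable: true, get: getter });
  }
}

// Simulated lib module: the getters close over the module's local bindings,
// and they are defined before the bindings receive their values.
var lib = {};
(function (exportsObject) {
  defineExport(exportsObject, 'counter', function () { return counter; });
  defineExport(exportsObject, 'plusOne', function () { return plusOne; });
  var counter = 0;
  function plusOne() { counter++; }
})(lib);

var copied = lib.counter; // value copied once; the link to the closure is lost
lib.plusOne();
var live = lib.counter;   // read through the getter again

var assignmentThrew = false;
try {
  // No setter was defined, so a strict-mode assignment is rejected.
  (function () { 'use strict'; lib.counter = 42; })();
} catch (e) {
  assignmentThrew = e instanceof TypeError;
}

console.log(copied, live, assignmentThrew, lib.counter);
```

Reading lib.counter on every use keeps following the module's internal variable, while the copied variable stays frozen at the value it had when it was read.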

The chunk (block)

If a project's code keeps growing, packing all the JS code into a single file is bound to hit performance bottlenecks and cause long resource loading times. This is where webpack's SplitChunks technique comes in handy: modules can be split into different chunk files according to different code-splitting strategies.

Async Chunk

For some routes with low access frequency or components with low use frequency, they can be split into asynchronous chunks through lazy loading.

Asynchronous chunks can be obtained by dynamically loading modules by calling the import() method. Now let’s modify the main.js file to lazily load the lib module.

// src/main.js
import('./lib').then((lib) => {
  console.log(lib.counter)

  lib.plusOne()
  console.log(lib.counter)
})

Here is the repackaged main.js file (showing only the new and changed code).

// dist/main.js
(function(modules) {
  // The function to execute after chunk completes the download
  function webpackJsonpCallback(data) { ... }

  // An object used to mark the loading status of each chunk
  // undefined: chunk is not loaded
  // null: chunk preloaded/prefetched
  // Promise: Chunk is loading
  // 0: Chunk has been loaded
  var installedChunks = {
    0: 0
  }

  // The url of the chunk request containing the chunk name and the chunk hash
  function jsonpScriptSrc(chunkId) {
    return __webpack_require__.p + "" + ({}[chunkId]||chunkId) + "." + {"1":"3215c03a"}[chunkId] + ".js"
  }

  // Get the chunk
  __webpack_require__.e = function requireEnsure(chunkId) { ... }

  // Chunk's public path, that is, output.publicPath in webpack configuration
  __webpack_require__.p = "";

  // A series of operations around webpackJsonp are described in detail below
  var jsonpArray = window["webpackJsonp"] = window["webpackJsonp"] || [];
  var oldJsonpFunction = jsonpArray.push.bind(jsonpArray);
  jsonpArray.push = webpackJsonpCallback;
  jsonpArray = jsonpArray.slice();
  for(var i = 0; i < jsonpArray.length; i++) webpackJsonpCallback(jsonpArray[i]);
  var parentJsonpFunction = oldJsonpFunction;

  return __webpack_require__(0);
})([
  /* 0 */
  (function(module, exports, __webpack_require__) {
    __webpack_require__.e(/* import() */ 1).then(__webpack_require__.bind(null, 1)).then((lib) => {
      console.log(lib.counter)

      lib.plusOne()
      console.log(lib.counter)
    })
  })
])

__webpack_require__.e

In the code above, import('./lib') in the entry module is compiled to __webpack_require__.e(1).then(__webpack_require__.bind(null, 1)), which is equivalent to the code below.

__webpack_require__.e(1)
  .then(function() {
    return __webpack_require__(1)
  })

The code above consists of two parts: the first part, __webpack_require__.e(1), loads the chunk asynchronously, and the second part, the then callback, loads the lib module synchronously.

This solves the problem of CommonJS-style synchronous module loading blocking execution in the browser.

// Get the chunk
__webpack_require__.e = function requireEnsure(chunkId) {
  var promises = [];

  // Download JS chunk using JSONP
  var installedChunkData = installedChunks[chunkId];
  // 0 indicates that the chunk has been loaded
  if(installedChunkData !== 0) {

    if(installedChunkData) { // Chunk is loading
      promises.push(installedChunkData[2]);
    } else {
      // Update the chunk status to loading in the chunk cache, and cache resolve, reject, and promise
      var promise = new Promise(function(resolve, reject) {
        installedChunkData = installedChunks[chunkId] = [resolve, reject];
      });
      promises.push(installedChunkData[2] = promise);

      // Start preparing to download Chunk
      var script = document.createElement('script');
      var onScriptComplete;

      script.charset = 'utf-8';
      script.timeout = 120;
      if (__webpack_require__.nc) {
        script.setAttribute("nonce", __webpack_require__.nc);
      }
      script.src = jsonpScriptSrc(chunkId);

      // Create an error before the stack is unwound to get a useful stack trace later
      var error = new Error();
      // A callback function after the chunk download is complete (successful or abnormal)
      onScriptComplete = function (event) {
        // Prevent memory leaks in Internet Explorer
        script.onerror = script.onload = null;
        clearTimeout(timeout);
        var chunk = installedChunks[chunkId];
        if(chunk !== 0) {
          if(chunk) {
            var errorType = event && (event.type === 'load' ? 'missing' : event.type);
            var realSrc = event && event.target && event.target.src;
            error.message = 'Loading chunk ' + chunkId + ' failed.\n(' + errorType + ': ' + realSrc + ')';
            error.name = 'ChunkLoadError';
            error.type = errorType;
            error.request = realSrc;
            // reject(error)
            chunk[1](error);
          }
          installedChunks[chunkId] = undefined;
        }
      };
      // Request processing timed out
      var timeout = setTimeout(function(){
        onScriptComplete({ type: 'timeout', target: script });
      }, 120000);
      // Handle request success and failure
      script.onerror = script.onload = onScriptComplete;
      // Initiate a request
      document.head.appendChild(script);
    }
  }
  return Promise.all(promises);
};

Here’s how __webpack_require__.e executes:

  1. Declare a promises variable to handle the asynchronous loading of chunks. Although the example above only needs to handle one JS chunk, some JS modules rely on CSS files, so loading a JS chunk also loads the CSS chunks it depends on. Multiple chunks may need to be loaded asynchronously, so this variable is an array.
  2. Check whether the JS chunk has been loaded. If it has, skip to step 6; otherwise, continue.
  3. Check whether the chunk is loading. If so, add the promise instance from the chunk cache (installedChunks) to the promises array, then skip to step 6; otherwise, continue.
  4. Initialize a Promise instance to handle the chunk's asynchronous loading, save an array made up of the promise's resolve and reject functions and the Promise instance itself (i.e. [resolve, reject, promise]) to the chunk cache, and add the Promise instance to the promises array.
  5. Prepare to download the chunk: create a script tag, set its src, timeout and other properties, handle the success, failure and timeout events of the script request, and finally add the script to the document to send the request.
  6. Execute and return Promise.all(promises). After all asynchronous chunks are loaded successfully, the callback in the then method (which loads the modules contained in the chunk) is triggered.
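The state handling in these steps can be exercised without a browser. The sketch below is a simplified, DOM-free approximation: ensureChunk, chunkExecuted, fakeRequest and requestCount are hypothetical names, and the script-tag request is replaced by a function that merely records the chunk id.

```javascript
// Hypothetical, DOM-free sketch of the chunk states used by __webpack_require__.e.
// undefined: not loaded, [resolve, reject, promise]: loading, 0: loaded
var installedChunks = {};
var requestCount = 0;

// startRequest is an assumed stand-in for injecting a <script> tag.
function ensureChunk(chunkId, startRequest) {
  var promises = [];
  var chunkData = installedChunks[chunkId];
  if (chunkData !== 0) {
    if (chunkData) {
      // Already loading: reuse the pending promise, send no new request.
      promises.push(chunkData[2]);
    } else {
      var promise = new Promise(function (resolve, reject) {
        chunkData = installedChunks[chunkId] = [resolve, reject];
      });
      promises.push((chunkData[2] = promise));
      requestCount++;
      startRequest(chunkId);
    }
  }
  return Promise.all(promises);
}

// Called once the chunk's own code has executed (the role of webpackJsonpCallback).
function chunkExecuted(chunkId) {
  var chunkData = installedChunks[chunkId];
  installedChunks[chunkId] = 0;
  if (chunkData) chunkData[0]();
}

var pendingRequests = [];
function fakeRequest(chunkId) { pendingRequests.push(chunkId); }

var first = ensureChunk(1, fakeRequest);
var second = ensureChunk(1, fakeRequest); // same chunk while still loading
var stateWhileLoading = Array.isArray(installedChunks[1]);

chunkExecuted(pendingRequests.shift());   // the "script" arrives and runs
var stateAfterLoad = installedChunks[1];
var third = ensureChunk(1, fakeRequest);  // already loaded: empty promise list

Promise.all([first, second, third]).then(function () {
  console.log('requests sent:', requestCount, 'final state:', stateAfterLoad);
});
```

Requesting the same chunk twice while it is loading sends a single request, and a chunk that is already loaded resolves immediately because Promise.all receives an empty array.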

One might wonder why resolve is not executed when chunk === 0 in onScriptComplete. Like this:

onScriptComplete = function (event) {
  ...
  var chunk = installedChunks[chunkId];
  if(chunk !== 0) {
    if(chunk) {
      ...
      // reject(error)
      chunk[1](error);
    }
    installedChunks[chunkId] = undefined;
  } else { // chunk === 0
    // resolve()
    chunk[0]()
  }
};

The question is when does the asynchronous loading of chunk end? Is it over after the chunk download is complete? In fact, loading JS Chunk consists of two parts: downloading the chunk file and executing the chunk code. The chunk is not loaded until both are completed. Therefore, resolve is stored in the chunk cache and executed after the chunk code completes to end the asynchronous loading process. Although a script’s load event is triggered after it has been downloaded and executed, the load event is only concerned with the download itself and can be fired even if the script throws an exception during execution.

webpackJsonpCallback

When js chunk is downloaded successfully, it starts executing the code. Here is the chunk packaged by the lib.js module.

// dist/1.3215c03a.js
(window["webpackJsonp"] = window["webpackJsonp"] || []).push([[1], [
  /* 0 */,
  /* 1 */
  (function(module, __webpack_exports__, __webpack_require__) {

    "use strict";
    __webpack_require__.d(__webpack_exports__, "counter", function() { return counter; });
    __webpack_require__.d(__webpack_exports__, "plusOne", function() { return plusOne; });
    let counter = 0

    function plusOne() {
      counter++
    }

  })
]]);

This code does two things. One is to initialize window["webpackJsonp"] as an array (if it has not been initialized before). The other is to add an array to the window["webpackJsonp"] array through a push operation (this wording is not entirely precise, see below). The pushed array is made up of two arrays: the first is a collection of chunk IDs (normally it contains only the current chunk ID, but with a poorly chosen code-splitting strategy it may contain several), and the second is a collection of module functions.

However, the native push operation simply adds the chunk's data to the array. So where does webpack actually process the data, and how?

If you remember, webpackBootstrap above contains a series of operations around the window["webpackJsonp"] array.

var jsonpArray = window["webpackJsonp"] = window["webpackJsonp"] || [];
var oldJsonpFunction = jsonpArray.push.bind(jsonpArray);
// Replace the window["webpackJsonp"]. Push method with webpackJsonpCallback
jsonpArray.push = webpackJsonpCallback;
jsonpArray = jsonpArray.slice();
// Call webpackJsonpCallback for all previously loaded data in the initial chunk
for(var i = 0; i < jsonpArray.length; i++) webpackJsonpCallback(jsonpArray[i]);
// Assign the window["webpackJsonp"] array's native push method to the parentJsonpFunction variable
var parentJsonpFunction = oldJsonpFunction;

As the code above shows, the data of every chunk (except the chunk where webpackBootstrap resides) is loaded via JSONP and processed by calling webpackJsonpCallback. However, webpackJsonpCallback is not declared until the chunk where webpackBootstrap resides has loaded, so chunk data that arrives earlier is temporarily stored in the window["webpackJsonp"] array. Once webpackBootstrap runs, it replaces the push method of the window["webpackJsonp"] array with the webpackJsonpCallback function, so that chunks loaded afterwards call webpackJsonpCallback directly to process their data, and it then calls webpackJsonpCallback on the data previously stored in the window["webpackJsonp"] array.

function webpackJsonpCallback(data) {
  var chunkIds = data[0];
  var moreModules = data[1];

  var moduleId, chunkId, i = 0, resolves = [];
  // Fetch the resolve function of the promise corresponding to each asynchronous chunk and mark the chunk state in the chunk cache as loaded
  for(; i < chunkIds.length; i++) {
    chunkId = chunkIds[i];
    if(installedChunks[chunkId]) {
      resolves.push(installedChunks[chunkId][0]);
    }
    installedChunks[chunkId] = 0;
  }
  // Add modules contained in chunk to the Modules object of webpackBootstrap
  for(moduleId in moreModules) {
    if(Object.prototype.hasOwnProperty.call(moreModules, moduleId)) {
      modules[moduleId] = moreModules[moduleId];
    }
  }
  // Add chunk data to window["webpackJsonp"] using the native push method of the window["webpackJsonp"] array
  if(parentJsonpFunction) parentJsonpFunction(data);

  // Execute resolve to fulfill the promise corresponding to each chunk and trigger the callback function in then
  while(resolves.length) {
    resolves.shift()();
  }
};

The webpackJsonpCallback function mainly performs two processes on the chunk data: caching and ending the asynchronous loading process.

The cache consists of two layers: one is the cache of the chunk loading status to avoid sending multiple requests to the same chunk, and the other is the cache of module functions to facilitate the later loading of modules.

To end the asynchronous loading process of chunk, execute the resolve function in the chunk cache.
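The hand-off around window["webpackJsonp"] described above (store early data with the native push, then swap push for the callback and replay what was stored) can be reproduced in isolation. In this sketch globalScope stands in for window, and chunkQueue, chunkCallback and processed are illustrative names; the callback only records chunk IDs instead of registering modules.

```javascript
// Hypothetical stand-in for window; real bundles use window["webpackJsonp"].
var globalScope = {};
var processed = [];

// A chunk that executes BEFORE the bootstrap: the native push just stores its data.
(globalScope.chunkQueue = globalScope.chunkQueue || []).push([[1], ['module-of-chunk-1']]);

// --- bootstrap starts here ---
function chunkCallback(data) {
  processed.push(data[0][0]);       // stand-in for caching modules and resolving promises
  if (parentPush) parentPush(data); // also keep the raw data in the shared array
}

var queue = (globalScope.chunkQueue = globalScope.chunkQueue || []);
var oldPush = queue.push.bind(queue);
queue.push = chunkCallback;         // later chunks now call the callback directly
queue = queue.slice();              // snapshot of the data stored before the bootstrap
for (var i = 0; i < queue.length; i++) chunkCallback(queue[i]);
var parentPush = oldPush;           // assigned after the replay, so replayed data is not stored twice
// --- bootstrap ends here ---

// A chunk that executes AFTER the bootstrap: "push" is really chunkCallback.
(globalScope.chunkQueue = globalScope.chunkQueue || []).push([[2], ['module-of-chunk-2']]);

console.log(processed);                     // chunk ids handled by the callback
console.log(globalScope.chunkQueue.length); // raw entries kept in the shared array
```

Whichever side runs first, every chunk's data ends up passing through the callback exactly once.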

Initial Chunk

For the modules that need to be loaded in the initial stage of the website, they can be divided into several initial chunks such as core basic class library, UI component library and business code according to the size, sharing rate and updating frequency of the modules.

To get multiple initial chunks, tweak the main.js file and the webpack.config.js configuration.

// src/main.js
import * as _ from 'lodash'

const arr = [1, 2]
console.log(_.concat(arr, 3, [4]))

// webpack.config.js (based on the webpack configuration above)
module.exports = {
  ...
  optimization: {
    ...
    splitChunks: {
      cacheGroups: {
        vendors: {
          test: /[\\/]node_modules[\\/]/,
          name: 'vendors',
          chunks: 'all',
          priority: 10,
        },
      }
    }
  }
}

In the code above, the main.js module depends on the lodash library. To split lodash into a separate chunk, splitChunks is added to the optimization object of the webpack configuration, which splits the lodash library into a chunk named vendors.

Here is the code added to webpackBootstrap after packaging.

(function(modules) { // webpackBootstrap
  function webpackJsonpCallback(data) {
    ...
    var executeModules = data[2];
    // If the loaded chunk has an entry module, add it to the deferredModules array
    deferredModules.push.apply(deferredModules, executeModules || []);

    return checkDeferredModules();
  }

  // Check whether the chunk on which the entry module depends is loaded. If so, load the entry module. Otherwise, no operation is performed
  function checkDeferredModules() {
    var result;
    // Iterate over all entry modules
    for(var i = 0; i < deferredModules.length; i++) {
      var deferredModule = deferredModules[i];
      var fulfilled = true;
      // Check whether all chunks on which the entry module depends are loaded
      for(var j = 1; j < deferredModule.length; j++) {
        var depId = deferredModule[j];
        if(installedChunks[depId] !== 0) fulfilled = false;
      }
      // If all chunks dependent on the entry module are loaded, the entry module is loaded
      if(fulfilled) {
        deferredModules.splice(i--, 1);
        result = __webpack_require__(deferredModule[0]);
      }
    }
    return result;
  }

  var deferredModules = [];

  // Add the entry module to the deferredModules array
  // The first element in the array is the entry module ID, and the following elements are the ids of the initial chunk that the entry module depends on
  deferredModules.push([1, 1]);

  return checkDeferredModules();
})(...)

When there was only one initial chunk, the chunk contained all the modules required for the initial phase, so when it was downloaded, the entry module could be loaded directly. However, when the module is split into multiple initial chunks, the entry module can be loaded only after all initial chunks are loaded and all modules required for the initial stage are ready. Therefore, the only difference is that the loading time of the entry module has been deferred.

So in the code above, both webpackBootstrap and the webpackJsonpCallback function call checkDeferredModules at the end. This ensures that after each chunk is loaded, a check is made for entry modules whose requirements are met (that is, all the initial chunks they depend on have been loaded). Any entry module that meets the requirements is then loaded.
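The deferral logic can be run on its own to watch the entry module wait for its chunk. This is a reduced sketch: runModule stands in for __webpack_require__, chunkArrived stands in for the relevant part of webpackJsonpCallback, and the string 'entry' is a made-up module id.

```javascript
// Hypothetical sketch: run an entry module only when all its chunks are loaded.
var installedChunks = { 0: 0 }; // chunk 0 (holding the bootstrap) is loaded
var deferredModules = [];
var started = [];

function runModule(moduleId) {  // assumed stand-in for __webpack_require__
  started.push(moduleId);
  return moduleId;
}

function checkDeferredModules() {
  var result;
  for (var i = 0; i < deferredModules.length; i++) {
    var deferredModule = deferredModules[i];
    var fulfilled = true;
    // Index 0 is the entry module id; the rest are chunk ids it waits for.
    for (var j = 1; j < deferredModule.length; j++) {
      if (installedChunks[deferredModule[j]] !== 0) fulfilled = false;
    }
    if (fulfilled) {
      deferredModules.splice(i--, 1);
      result = runModule(deferredModule[0]);
    }
  }
  return result;
}

function chunkArrived(chunkId) { // the part of webpackJsonpCallback that matters here
  installedChunks[chunkId] = 0;
  return checkDeferredModules();
}

deferredModules.push(['entry', 1]); // entry module waits for chunk 1 (e.g. vendors)
var beforeVendors = checkDeferredModules();
var startedBefore = started.length;
var afterVendors = chunkArrived(1);

console.log(beforeVendors, startedBefore, afterVendors, started);
```

The first check finds chunk 1 missing and does nothing; the check triggered by the chunk's arrival starts the entry module and removes it from the deferred list.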

Summary

This article answers one question: how does JS code packaged by webpack load and execute? At the heart of the answer are two things: module loading and chunk loading. The former is synchronous and blocking; the latter is asynchronous and non-blocking. Once you understand how the two work together, you are close to the complete answer.