This article looks at the CommonJS module mechanism from a historical perspective.

1. The CommonJS specification

1.1 The starting point of CommonJS

In its early days, JavaScript was used mainly in the browser. Because the ES specification was standardized early, it covered a relatively small scope, and in practice what JavaScript could do depended on how well the host environment supported the ES specification. As the web matured, the ES standard gained more and more powerful APIs, and browsers exposed more and more powerful APIs to JavaScript, thanks to support from the major browser vendors. However, these browser updates and API upgrades benefited only the front end; the specification for back-end JavaScript lagged far behind. JavaScript's own specification was still weak and had some serious defects, such as the lack of a module standard.

The CommonJS specification was proposed to make up for JavaScript's lack of a module standard, so that JavaScript could be used to develop large-scale applications like other languages (such as Java and Python) rather than remain a scripting tool. Its authors hoped that applications written against the CommonJS specification could run across host environments, not only in the browser, so that JavaScript could be used to write not only web applications but also servers, command-line tools, and even desktop applications.

Theory and practice always influence and promote each other. Node was able to arrive in a mature form thanks to the CommonJS specification; likewise, CommonJS found its way into server-side projects at many companies thanks to Node's excellent performance. The relationship between CommonJS components and ES specifications:

Node uses the CommonJS module specification to implement a module system that is very easy to use.

1.2 CommonJS module specification

A CommonJS module definition is very simple and consists of three parts: module reference, module definition, and module identifier.

1.2.1 Module Reference

Example code for a module reference:

const fs = require('fs');

In the specification, there is the require() method, which takes the module identifier and uses it to import a module’s API into the current context.

1.2.2 Module Definition

Inside a module, the module object represents the module itself, and exports is a property of module. In Node, a file is a module. You export methods by attaching them to the exports object as properties:

exports.add = function () {
    // ...
};

In another file, after importing the module through the require() method, we can call methods or properties:

const math = require('math');
const result = math.add(10, 20);

1.2.3 Module Identification

The module identifier is simply the argument passed to require(). It must be a string in lower camel case, or a relative path beginning with . or .., or an absolute path. The .js file extension may be omitted.

The module definition is very simple, and so is the interface. Its significance is that it confines methods and variables to a private scope, while the import and export mechanisms connect different modules (files) smoothly. Each module has its own space, modules do not interfere with one another, and references stay clean.

2. Module implementation of Node

While exports, require, and module in the specification sound straightforward, it is worth knowing what Node actually goes through to implement them:

Introducing a module in Node goes through three steps: path analysis, file location, and compilation and execution.

Note that Node divides modules into two types. One is the modules built into Node, called core modules; the other is modules written by users, called file modules.

  • Core modules are compiled into the binary executable when Node's source code is built. When the Node process starts, some core modules are loaded directly into memory, so importing them skips the file location and compilation steps. They are also checked first during path analysis, so they load the fastest.
  • File modules are loaded dynamically at runtime and require the complete path analysis, file location, and compilation and execution process, so they load more slowly than core modules.
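As a quick way to see which identifiers count as core modules, the following sketch uses the builtinModules list from Node's module API and shows that resolving a core module yields its bare name rather than a file path:

```javascript
const { builtinModules } = require('module');

// Core modules ship inside the Node binary, so their names are known up front.
console.log(builtinModules.includes('fs'));   // true
console.log(builtinModules.includes('http')); // true

// For a core module, resolution returns the bare name instead of a file path.
console.log(require.resolve('fs')); // 'fs'
```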

Next, let’s analyze the module loading process in detail:

2.1 Preferentially load from cache

Before going further, it is important to know that just as browsers cache static files to improve performance, Node caches imported modules to reduce the overhead of importing them again. The difference is that the browser caches only files, whereas Node caches the objects produced by compiling and executing them.

For any repeat load of the same module, require() uses the cache first, whether it is a core module or a file module. The cache check for core modules takes precedence over that for file modules.
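A minimal sketch of this behavior, assuming a hypothetical module once.js written to a temporary directory and a hypothetical global counter named __runCount that records how often the module body runs:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-cache-'));
const file = path.join(dir, 'once.js');
// The module body bumps a global counter each time it is actually executed.
fs.writeFileSync(file, `
global.__runCount = (global.__runCount || 0) + 1;
module.exports = { createdAt: Date.now() };
`);

const first = require(file);
const second = require(file);

console.log(first === second);   // true: the second call returns the cached exports
console.log(global.__runCount);  // 1: the module body ran only once
```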

2.2 Path Analysis and File Location

Because module identifiers come in several forms, module lookup and location differ to some degree from one form to another.

2.2.1 Module Identifier Analysis

As mentioned earlier, the require() method takes an identifier as an argument. Identifiers in Node fall into the following categories:

  • Core modules (built-in modules), such as http, fs, and path.
  • File modules given as a relative path starting with . or .., or as an absolute path starting with /.
  • File modules that are not given as a path, such as a custom module.

2.2.1.1 Core modules

Core modules are second in priority only to the cache. They are compiled into binary code when Node's source is compiled, so they load the fastest.

If you try to load a custom module with the same identifier as a core module, it will not succeed. If you write your own http module and want it to load successfully, you must choose a different identifier or load it through a path instead.
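The following sketch illustrates the priority rule. It builds a hypothetical user package named http under a temporary node_modules directory, then asks Node to resolve the bare identifier with that directory as the lookup base (the paths option of require.resolve):

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Build <tmp>/node_modules/http/index.js, a hypothetical user module named "http".
const root = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-core-')));
const pkgDir = path.join(root, 'node_modules', 'http');
fs.mkdirSync(pkgDir, { recursive: true });
fs.writeFileSync(path.join(pkgDir, 'index.js'), 'module.exports = "user http";');

// The bare identifier still resolves to the core module...
console.log(require.resolve('http', { paths: [root] })); // 'http'

// ...while an explicit path reaches the user module.
console.log(require(pkgDir)); // 'user http'
```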

2.2.1.2 File module

Identifiers beginning with ., .., or / are treated as file modules. When analyzing a file module, the require() method converts the path to a real path and uses that real path as the index under which the compiled result is stored in the cache, which makes repeat loads faster.

Because the identifier gives Node the exact location of the file, the lookup saves a lot of time, and loading speed is second only to that of core modules.
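To see the real path used as the cache index, the sketch below requires a hypothetical file greet.js through an un-normalized path and then inspects require.cache. The temporary directory is passed through fs.realpathSync first, since the system temp directory may itself be a symlink:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const dir = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-key-')));
const file = path.join(dir, 'greet.js');
fs.writeFileSync(file, 'module.exports = "hello";');

// An un-normalized spelling of the same location: <dir>/./greet.js
const messy = dir + path.sep + '.' + path.sep + 'greet.js';

require(messy);
console.log(require.resolve(messy) === file);            // true
console.log(Object.keys(require.cache).includes(file));  // true
console.log(require.cache[file].exports);                // 'hello'
```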

2.2.1.3 User-defined modules

A custom module is a non-core module whose identifier is not a path. It is a special kind of file module that may take the form of a file or a package. Looking up this type of module takes the most time, which makes it the slowest of all to load.

Module paths: create a JS file in any directory and print module.paths:

console.log(module.paths);

Then execute the code and get the following result:

It can be seen that the content of the module path is represented as an array of paths. The generation rules of the array are as follows:

  • The node_modules directory in the current file's directory.
  • The node_modules directory in the parent directory.
  • The node_modules directory in the parent directory's parent directory.
  • And so on up the path, level by level, until the node_modules directory in the root directory.

It is generated in a very similar way to the lookup of JS’s prototype chain or scope chain. During loading, Node tries each path in the module path until it finds the target file. As you can see, the deeper the current file path, the more time it takes to find a module, which is why custom modules are the slowest to load.
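The generation rules above can be sketched as a small function. This is a simplified illustration for POSIX-style paths, not Node's internal implementation (Node, for example, does not append node_modules to a directory that is itself named node_modules); the name nodeModulePaths and the sample directory are hypothetical:

```javascript
const path = require('path');

// Hypothetical helper: list candidate node_modules directories for a
// POSIX-style directory, from the directory itself up to the root.
function nodeModulePaths(fromDir) {
    const paths = [];
    let current = path.posix.resolve('/', fromDir);
    while (true) {
        paths.push(path.posix.join(current, 'node_modules'));
        const parent = path.posix.dirname(current);
        if (parent === current) break; // reached the root
        current = parent;
    }
    return paths;
}

console.log(nodeModulePaths('/home/user/project/src'));
// [
//   '/home/user/project/src/node_modules',
//   '/home/user/project/node_modules',
//   '/home/user/node_modules',
//   '/home/node_modules',
//   '/node_modules'
// ]
```

The array grows by one entry per directory level, which is why a deeply nested file has more candidate paths to try.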

2.2.2 Locating files

Loading from the cache removes the need for path analysis, file location, and compilation and execution on a repeat load, which greatly improves the efficiency of loading a module again. The file location step, however, has some details worth noting, mainly file extension analysis and the handling of directories and packages:

2.2.2.1 Suffix Analysis:

  • The identifier passed to require() may omit the file extension. The CommonJS module specification also allows the identifier to have no extension, in which case Node tries to complete it in the order .js, .json, and .node.

  • During these attempts, Node has to call the fs module synchronously, which blocks, to check whether each file exists. Because Node is single-threaded, this can cause performance problems. A quick tip: for .node and .json files, including the extension in the identifier passed to require() speeds things up a little. Another point: combining synchronous calls with caching greatly reduces the cost of these blocking calls in Node's single thread.
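The extension order can be sketched as follows. The helper tryExtensions is a hypothetical simplification of the idea, not Node's internal function, and the config files are created only for this illustration; the last lines compare it against what require() actually does:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper mirroring the order described above.
function tryExtensions(basePath, exts = ['.js', '.json', '.node']) {
    for (const ext of exts) {
        const candidate = basePath + ext;
        if (fs.existsSync(candidate)) return candidate; // synchronous check
    }
    return null;
}

const dir = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-ext-')));
fs.writeFileSync(path.join(dir, 'config.json'), '{"port": 8080}');
fs.writeFileSync(path.join(dir, 'config.js'), 'module.exports = { port: 3000 };');

const base = path.join(dir, 'config');
console.log(path.basename(tryExtensions(base)));   // 'config.js'
console.log(path.basename(require.resolve(base))); // 'config.js'
console.log(require(base).port);                   // 3000
console.log(require(base + '.json').port);         // 8080
```

When both config.js and config.json exist, the extensionless identifier picks the .js file; naming the .json file explicitly avoids the trial-and-error step.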

2.2.2.2 Directory Analysis:

  • When analyzing an identifier, require() may find no matching file after trying the extensions but find a directory instead. This often happens when importing custom modules and searching the module paths one by one; in that case Node treats the directory as a package.

  • In this process, Node supports the CommonJS package specification to some extent. First, Node looks for package.json in that directory, parses it into a package description object with JSON.parse(), and takes the file name given by the main property to locate the file. If the file name lacks an extension, the suffix analysis step is performed.

  • If the main property specifies the wrong file name, or if there is no package.json file at all, Node uses index as the default file name and looks for index.js, index.json, and index.node in sequence.

  • If no file is located during directory analysis, Node moves on to the next module path for a custom module. If the whole module path array has been traversed and the target file still has not been found, a lookup failure exception is thrown.
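The directory rules can be observed with three hypothetical directories created in a temporary location: pkg-a has a package.json whose main lacks an extension, pkg-b has only index.js, and empty has nothing loadable:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const root = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-dir-')));

// Package A: package.json points at lib/entry (no extension).
const pkgA = path.join(root, 'pkg-a');
fs.mkdirSync(path.join(pkgA, 'lib'), { recursive: true });
fs.writeFileSync(path.join(pkgA, 'package.json'), JSON.stringify({ main: 'lib/entry' }));
fs.writeFileSync(path.join(pkgA, 'lib', 'entry.js'), 'module.exports = "from main";');

// Package B: no package.json, so index.js is used.
const pkgB = path.join(root, 'pkg-b');
fs.mkdirSync(pkgB);
fs.writeFileSync(path.join(pkgB, 'index.js'), 'module.exports = "from index";');

console.log(require(pkgA)); // 'from main'
console.log(require(pkgB)); // 'from index'

// A directory with nothing loadable ends in a lookup failure.
const empty = path.join(root, 'empty');
fs.mkdirSync(empty);
let failureCode = null;
try {
    require(empty);
} catch (err) {
    failureCode = err.code;
}
console.log(failureCode); // 'MODULE_NOT_FOUND'
```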

2.3 Module Compilation

In Node, each file module is an object, defined as follows:

function Module(id, parent) {
    this.id = id;
    this.exports = {};
    this.parent = parent;
    if (parent && parent.children) {
        parent.children.push(this);
    }
    this.filename = null;
    this.loaded = false;
    this.children = [];
}

Compilation and execution are the final stages in introducing a file module. Once the file is located, Node creates a new object, loads it and compiles it according to the path. The loading method varies for different file extensions, as shown below.

  • .js files. Node reads the file synchronously through the fs module, then compiles and executes it.
  • .node files. These are extension files written in C/C++. Node loads the file produced by compiling them through the dlopen() method.
  • .json files. Node reads the file synchronously through the fs module and parses the content with JSON.parse() to produce the result.
  • Files with any other extension. They are all loaded as .js files.
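Two of these cases are easy to try out. The sketch below uses hypothetical files data.json and answer.txt in a temporary directory; the .txt case relies on the fallback described in the last bullet:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-load-'));

// .json: the file content is parsed as JSON and becomes the exports.
const jsonFile = path.join(dir, 'data.json');
fs.writeFileSync(jsonFile, '{"name": "demo", "tags": ["a", "b"]}');
const data = require(jsonFile);
console.log(data.name, data.tags.length); // demo 2

// Unknown extension: the content is treated as JavaScript source.
const txtFile = path.join(dir, 'answer.txt');
fs.writeFileSync(txtFile, 'module.exports = 6 * 7;');
console.log(require(txtFile)); // 42
```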

Each successfully compiled module is cached on the Module._cache object, indexed by its file path, to speed up repeat imports. Node calls a different loading method depending on the file extension. You can see which extensions already have loaders registered by accessing require.extensions in your code. Write the following code to test it:

console.log(require.extensions);

The results are as follows:

As you can see, there are three handlers that we can convert to a string and print:

console.log(Object.values(require.extensions).toString());

The results are as follows:

// Handle .js files
function (module, filename) {
    if (filename.endsWith('.js')) {
        const pkg = readPackageScope(filename);
        if (pkg && pkg.data && pkg.data.type === 'module') {
            const parentPath = module.parent && module.parent.filename;
            const packageJsonPath = path.resolve(pkg.path, 'package.json');
            throw new ERR_REQUIRE_ESM(filename, parentPath, packageJsonPath);
        }
    }
    const content = fs.readFileSync(filename, 'utf8');
    module._compile(content, filename);
},
// Handle the .json file
function (module, filename) {
    const content = fs.readFileSync(filename, 'utf8');
    if (manifest) {
        const moduleURL = pathToFileURL(filename);
        manifest.assertIntegrity(moduleURL, content);
    }
    try {
        module.exports = JSONParse(stripBOM(content));
    } catch (err) {
        err.message = filename + ':' + err.message;
        throw err;
    }
},
// Handle the .node file
function (module, filename) {
    if (manifest) {
        const content = fs.readFileSync(filename);
        const moduleURL = pathToFileURL(filename);
        manifest.assertIntegrity(moduleURL, content);
    }
    return process.dlopen(module, path.toNamespacedPath(filename));
}

The above three functions are how Node handles .js files, .json files, and .node files respectively during module compilation.
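To check the registered extensions without printing the handler bodies, a short sketch like the following lists only the keys. Newer Node versions may register additional extensions, so the check only confirms that the three discussed here are present:

```javascript
// List the registered extensions rather than the handler bodies.
const registered = Object.keys(require.extensions);
console.log(['.js', '.json', '.node'].every((ext) => registered.includes(ext))); // true

// Each entry is a function taking (module, filename).
console.log(typeof require.extensions['.json']); // 'function'
```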