Preface

We use NPM or Yarn all the time. But do we really understand NPM's installation mechanism and design philosophy, or what lies behind the NPM problems and operations we run into during development?

In a project, NPM or Yarn does more than install and maintain dependencies: NPM scripts can also connect the separate functional parts of a workflow so that they run automatically.

Both NPM and Yarn are large systems, and you are likely to run into the following questions when using them:

  • Is it risky to delete node_modules and lockfiles and then install again when project dependencies fail?

  • Is it a problem to install everything into dependencies without separating out devDependencies?

  • Our application relies on public libraries A and B, and public library A also relies on public library B. Will public library B be installed or bundled multiple times?

  • In a project, both NPM and Yarn are used. What problems can this cause?

  • Should we commit lockfiles to the project repository?

Let’s start with the installation mechanism of NPM.

The design philosophy

The installation mechanism of NPM follows a different design philosophy than many package management tools.

It prioritises installing dependencies into the current project directory. This lets each application project keep its own self-contained set of dependencies and reduces the API compatibility burden on package authors, but the drawback is obvious: if our projects A and B both rely on the same public library C, then library C will typically be installed once in project A and once in project B. The same dependency package can therefore be installed multiple times on one computer.

The installation mechanism

Overall flow chart:

Check the config (.npmrc)

After npm install is executed, NPM first checks and loads its configuration. The priority is: project-level .npmrc file > user-level .npmrc file > global .npmrc file > NPM built-in .npmrc file.
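The priority order above can be sketched as a layered lookup. This is a minimal Python illustration, not NPM's own code: the four dictionaries and their values are hypothetical stand-ins for already-parsed .npmrc files, and only the four levels named in the text are modelled.

```python
from collections import ChainMap

# Hypothetical, already-parsed contents of each .npmrc level (illustrative values).
builtin_config = {"registry": "https://registry.npmjs.org/", "save-exact": "false"}
global_config = {"save-exact": "true"}
user_config = {"registry": "https://registry.example.com/"}
project_config = {"registry": "https://registry.internal.example/"}

# ChainMap searches its maps from left to right, so the leftmost level wins.
effective = ChainMap(project_config, user_config, global_config, builtin_config)


def resolve(key):
    """Return the value from the highest-priority level that defines `key`."""
    return effective.get(key)


print(resolve("registry"))    # defined at project level, so that value is used
print(resolve("save-exact"))  # not set by project or user, falls through to global
```

A key that no level defines resolves to None in this sketch.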

Check package-lock.json

  1. If the project has a package-lock.json file, check whether package-lock.json and package.json declare the same dependencies:

    • Consistent: use the information in package-lock.json directly and load the dependencies from the cache or from the network.
    • Inconsistent: the handling depends on the NPM version (different NPM versions behave differently, as shown in the figure).
  2. If there is no package-lock.json: build the dependency tree recursively from package.json, then download the complete dependency resources according to that tree. At download time, check whether a cached copy of the resource exists:

    • If yes, decompress the cached content into node_modules.
    • Otherwise, download the package from the NPM remote registry, verify its integrity, add it to the cache, and decompress it into node_modules.
    • Finally, generate package-lock.json.
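The decision flow in the list above can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not NPM's implementation: plan_install and its step names are hypothetical, dependencies are modelled as exact name-to-version mappings (real package.json files use version ranges), and the declared dependencies stand in for the recursively built tree.

```python
def plan_install(declared, locked, cached):
    """Return the ordered steps of a simplified install.

    declared: dict of name -> version from package.json
    locked:   dict of name -> version from package-lock.json, or None if absent
    cached:   set of "name@version" strings already in the local cache
    """
    if locked is not None:
        if locked != declared:
            # The handling of this case differs between NPM versions,
            # so the sketch deliberately does not pick a behaviour.
            return [("version-dependent", None)]
        resolved, write_lock = locked, False
    else:
        # No lockfile: the declared dependencies stand in for the built tree.
        resolved, write_lock = declared, True

    steps = []
    for name, version in sorted(resolved.items()):
        spec = f"{name}@{version}"
        if spec in cached:
            steps.append(("extract-from-cache", spec))
        else:
            steps.append(("download-verify-cache-extract", spec))
    if write_lock:
        steps.append(("write-package-lock", None))
    return steps


declared = {"a": "1.0.0", "b": "2.0.0"}
for step in plan_install(declared, None, {"a@1.0.0"}):
    print(step)
```

With no lockfile and only a@1.0.0 cached, the plan extracts a from the cache, downloads b, and ends by writing the lockfile.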

Building a dependency tree

Under the flattening principle, every dependency, whether direct or transitive, is placed in the root node_modules directory first (the latest version of the NPM specification). During this process, when NPM encounters a module that is already in the dependency tree, it checks whether the version already placed there satisfies the version range required by the new occurrence. If it does, the module is skipped; if not, the module is placed under the node_modules of the module that depends on it (the latest version of the NPM specification).

Note: Because different versions of NPM apply different rules, everyone on the same project team should use the same version of NPM.

In front-end engineering, dependencies are nested, and the node_modules directory of even a small or medium-sized project can contain a large number of packages. If every install downloaded everything from the network, it would inevitably cost time. Caching is a good way to address this download cost.
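The flattening rule can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not NPM's actual algorithm: REGISTRY is a hypothetical in-memory registry, version ranges are limited to exact versions and caret ranges with a major version of at least 1, and a conflict is only checked against the root level.

```python
# Hypothetical registry: package name -> available version -> dependency ranges.
REGISTRY = {
    "a": {"1.0.0": {"c": "^1.0.0"}},
    "b": {"1.0.0": {"c": "^2.0.0"}},
    "c": {"1.2.0": {}, "2.1.0": {}},
}


def parse(version):
    """Turn "1.2.3" into (1, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, version_range):
    """Simplified check: exact versions and caret ranges (major >= 1) only."""
    if version_range.startswith("^"):
        wanted = parse(version_range[1:])
        actual = parse(version)
        return actual[0] == wanted[0] and actual >= wanted
    return parse(version) == parse(version_range)


def pick_version(name, version_range):
    """Choose the highest available version that satisfies the range."""
    candidates = [v for v in REGISTRY[name] if satisfies(v, version_range)]
    return max(candidates, key=parse)


def install(dependencies, root, parent_path=""):
    """Place each dependency at the root if possible, otherwise under its parent.

    `root` maps a path relative to the top-level node_modules to a version.
    """
    placed = []
    for name, version_range in dependencies.items():
        hoisted = root.get(name)
        if hoisted is not None and satisfies(hoisted, version_range):
            continue  # a compatible version is already at the root: skip
        version = pick_version(name, version_range)
        # Root slot free: flatten. Root slot taken by an incompatible version: nest.
        path = name if hoisted is None else f"{parent_path}/node_modules/{name}"
        root[path] = version
        placed.append((name, version, path))
    # Recurse only after the whole level is placed, so direct dependencies go first.
    for name, version, path in placed:
        install(REGISTRY[name][version], root, path)
    return root


tree = install({"a": "^1.0.0", "b": "^1.0.0"}, {})
for path, version in sorted(tree.items()):
    print(path, version)
```

Here c@1.2.0 (needed by a) is hoisted to the root, while c@2.1.0 (needed by b) conflicts with it and ends up under b/node_modules/c.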

Caching mechanisms

Caching the same version of a dependency locally is a common design in contemporary dependency management tools. To find where NPM keeps its cache, run the following command:

npm config get cache

This prints the cache root directory, in this case /Users/cehou/.npm (the default cache location for NPM on macOS). Since NPM v5, cached data is stored in the _cacache folder under that root directory, so the folder can be found at /Users/cehou/.npm/_cacache.

(Figure: the _cacache folder)

We can use the following command to clear the files in /Users/cehou/.npm/_cacache:

npm cache clean --force

Next, open the _cacache folder to see what NPM caches. There are three directories:

  • content-v2

  • index-v5

  • tmp

The content-v2 directory basically holds a set of binary files. To make one of them readable, we can change its extension to .tgz and unpack it; the result is the NPM package resource itself.

Doing the same in the index-v5 directory yields descriptive files, which are in fact the index for the files in content-v2.
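The unpacking step described for content-v2 can be tried without touching the real cache. The Python sketch below is only an illustration: it builds a tiny gzip-compressed tarball in memory as a stand-in for a cached file and lists its contents. The helper names and the sample payload are hypothetical, and the package/ prefix is an assumption about how NPM package tarballs are usually laid out.

```python
import io
import tarfile


def build_sample_tarball():
    """Create a tiny .tgz in memory (a stand-in for a file from content-v2)."""
    payload = b'{"name": "demo", "version": "1.0.0"}'
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="package/package.json")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def list_tarball_files(data):
    """Return the member names of a gzip-compressed tar archive held in memory."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        return archive.getnames()


print(list_tarball_files(build_sample_tarball()))
```

To inspect a real cached file, the same list_tarball_files helper could be given the bytes read from that file.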

How are these caches stored and exploited?

This ties back to the npm install mechanism. When npm install is executed, NPM first downloads dependencies into the cache and then uses pacote to extract them into the project's node_modules. pacote relies on npm-registry-fetch to download packages, and npm-registry-fetch, when its cache option is set, writes cache data to the given path following IETF RFC 7234.

Then, each time a resource is installed, a unique key is generated from the integrity, version, and name information stored in package-lock.json. This key corresponds to a cache record in the index-v5 directory. If the cached resource is found, NPM reads the hash of the tar package from that record, locates the cached tar package by the hash, and uses pacote again to extract it into the project's node_modules, which saves the cost of downloading the resource from the network.

Note that the caching strategy described here starts with NPM v5. Prior to NPM v5, each cached module was stored directly under its module name in the ~/.npm folder: {cache}/{name}/{version}.
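The two-step lookup (key to index record, then hash to content) can be sketched in Python as below. This is an in-memory simplification under stated assumptions: the index and content dictionaries stand in for the index-v5 and content-v2 directories, cache_key uses a hypothetical key format, and the on-disk layout NPM actually uses is not reproduced.

```python
import base64
import hashlib

index = {}    # stands in for index-v5: cache key -> content hash
content = {}  # stands in for content-v2: content hash -> tarball bytes


def integrity_of(data):
    """Build a "sha512-<base64>" integrity string for the given bytes."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def cache_key(name, version, integrity):
    """Hypothetical key format combining name, version and integrity."""
    return f"{name}@{version}|{integrity}"


def put(name, version, tarball):
    """Store a tarball by content hash and record it in the index."""
    integrity = integrity_of(tarball)
    content_hash = hashlib.sha512(tarball).hexdigest()
    content[content_hash] = tarball
    index[cache_key(name, version, integrity)] = content_hash
    return integrity


def get(name, version, integrity):
    """Return the cached tarball bytes, or None on a cache miss."""
    content_hash = index.get(cache_key(name, version, integrity))
    if content_hash is None:
        return None
    tarball = content[content_hash]
    # Re-check integrity before handing the bytes to the extraction step.
    if integrity_of(tarball) != integrity:
        return None
    return tarball


lock_integrity = put("demo-pkg", "1.0.0", b"fake tarball bytes")
print(get("demo-pkg", "1.0.0", lock_integrity) is not None)  # cache hit
print(get("demo-pkg", "1.0.1", lock_integrity) is None)      # different key: miss
```

Keying the index by package identity while storing content by hash means identical tarballs are stored once, whatever key points at them.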

Conclusion

Understanding these relatively basic topics directly helps developers troubleshoot NPM-related problems. For more information about NPM, see the official NPM website.

Reference: Front-end Infrastructure and Architecture – LucasHC (Hou Ce)