This article is translated

Title: Phantom Dependencies

Rush Official Doc

IO /pages/advan…

Some history and theory

It is well known that packages can depend on other packages, and in computer science terms the resulting dependency graph is a directed acyclic graph. Unlike a tree, a directed acyclic graph allows branches to rejoin in a diamond shape. For example, library A might import libraries B and C, and both B and C might import D, creating a “diamond dependency” between the four packages. Traditionally, a programming language’s module resolver looks up imported packages by traversing the edges of the graph, and (in other ecosystems) the packages themselves are found in a centralized store so that they can be shared across multiple projects.

For historical reasons, NodeJS and NPM took a different approach to physically representing the dependency graph on disk: NPM uses actual copies of package folders as the vertices of the graph and subfolder relationships as its edges. But the branches of a folder tree cannot rejoin to complete the diamond. To solve this problem, NodeJS adds a special resolution rule that has the effect of introducing additional graph edges (pointing to the immediate subfolders of the node_modules folder in every parent folder). From a computer science perspective, this rule relaxes the tree data structure of the file system in two ways:

  1. It can now represent some, but not all, directed acyclic graphs.
  2. We pick up some extra edges that do not correspond to any dependency declared by the package. These extra edges are called “phantom dependencies” (sometimes translated as “ghost dependencies”).
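
The lookup rule described above can be sketched roughly as follows. This is a simplified illustration, not NodeJS's actual implementation: the function name candidateNodeModulesDirs is made up for this example, and the real resolver also handles core modules, NODE_PATH, and other cases that are ignored here.

```javascript
const path = require("path");

// Lists the node_modules folders that would be probed, in order, for a bare
// import such as require("semver") issued from a file located in `fromDir`.
// Hypothetical helper for illustration only.
function candidateNodeModulesDirs(fromDir) {
  const dirs = [];
  let current = path.resolve(fromDir);
  for (;;) {
    // The resolver does not probe "node_modules/node_modules".
    if (path.basename(current) !== "node_modules") {
      dirs.push(path.join(current, "node_modules"));
    }
    const parent = path.dirname(current);
    if (parent === current) break; // reached the filesystem root
    current = parent;
  }
  return dirs;
}

console.log(candidateNodeModulesDirs("/my-monorepo/my-library/lib"));
```

On a POSIX system, the call above lists /my-monorepo/my-library/lib/node_modules, /my-monorepo/my-library/node_modules, /my-monorepo/node_modules, and /node_modules, in that order. Every folder on that list beyond the package's own node_modules is a potential source of extra graph edges.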

The solution adopted by NPM contains a number of unique features that differentiate it from traditional package managers:

  • Each (top-level) project has its own node_modules tree, which contains a large number of copies of package folders. Even a very small NodeJS project may contain more than 10,000 files under its node_modules folder.
  • In NPM 2.x, the node_modules folder tree is deep and repetitive, but phantom dependencies are kept to a minimum. The installation algorithm introduced by NPM 3.x flattens the tree, which eliminates a lot of duplicate folders at the cost of introducing more phantom dependencies (extra graph edges). In some cases, the algorithm also selects slightly older package versions (while still satisfying SemVer) to further reduce package folder duplication.
  • The installed node_modules tree is not unique. There are many possible ways to arrange package folders in a tree so that it approximates a directed acyclic graph, and there is no unique “normal form” for this arrangement. The tree you end up with depends on the heuristics your package manager uses. NPM’s own heuristics are even sensitive to the order in which packages are added.
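
To make the last point concrete, here is a toy model of flattening. It is an assumption-laden sketch and not NPM's real algorithm: the function toyFlatten, the package names B, C, and D (borrowed from the diamond example earlier), and the "first claimant wins the top-level slot" rule are all invented for illustration.

```javascript
// Toy model only. Each direct dependency is placed at the top level, then
// each of its own requirements is hoisted to the top level if the slot is
// free, or kept as a private nested copy if a different version is there.
function toyFlatten(directDeps) {
  const root = {};   // package name -> version in the top-level node_modules
  const nested = {}; // parent name -> { name: version } in its own node_modules
  for (const dep of directDeps) {
    root[dep.name] = dep.version;
  }
  for (const dep of directDeps) {
    for (const [name, version] of Object.entries(dep.requires)) {
      if (!(name in root)) {
        root[name] = version; // first claimant wins the top-level slot
      } else if (root[name] !== version) {
        // Version conflict: keep a private copy under the parent package.
        nested[dep.name] = { ...(nested[dep.name] || {}), [name]: version };
      }
    }
  }
  return { root, nested };
}

// Hypothetical packages: B and C need different versions of D.
const B = { name: "B", version: "1.0.0", requires: { D: "1.0.0" } };
const C = { name: "C", version: "1.0.0", requires: { D: "2.0.0" } };

console.log(toyFlatten([B, C])); // D 1.0.0 at top level, C keeps a private D 2.0.0
console.log(toyFlatten([C, B])); // D 2.0.0 at top level, B keeps a private D 1.0.0
```

Both layouts satisfy the same declared dependencies, yet the version of D that is visible from the top level differs depending on processing order alone.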

The node_modules tree is an unusual and theoretically interesting data structure. But let’s focus first on three practical consequences that can cause real trouble, especially for large and active monorepos. We’ll also show how Rush improves the situation. Alleviating these issues was one of the original motivations for creating the Rush tool!

Phantom dependencies

A phantom dependency occurs when a project uses a package that is not declared in its package.json file. Consider the following example:

my-library/package.json

{
  "name": "my-library",
  "version": "1.0.0",
  "main": "lib/index.js",
  "dependencies": {
    "minimatch": "^3.0.4"
  },
  "devDependencies": {
    "rimraf": "^2.6.2"
  }
}

But suppose the code looks like this:

my-library/lib/index.js

var minimatch = require("minimatch");
var expand = require("brace-expansion");  // ???
var glob = require("glob");  // ???

// (more code using those libraries)

Wait a second… two of these libraries are not declared as dependencies in the package.json file at all. So how does this work? It turns out that brace-expansion is a dependency of minimatch, and glob is a dependency of rimraf. At installation time, NPM flattens their folders into my-library/node_modules. NodeJS’s require() function finds them there because require() does not consult package.json files at all when it probes for folders. This may seem counterintuitive, but it works fine. Maybe this is a feature rather than a bug?

Unfortunately, the missing declarations in this project are best regarded as a bug, because they can lead to unexpected failures:

  • Incompatible versions: Although our library’s package.json states that it requires version 3 of minimatch, we have no say in the version of brace-expansion. As long as minimatch’s own API signature is unaffected, SemVer makes it perfectly legal for a PATCH release of minimatch to adopt a MAJOR upgrade of brace-expansion. In practice, as the developers of my-library, we would probably never encounter this problem ourselves. It would instead be discovered by some unlucky victim who installs our published library into a different node_modules arrangement, one with newer (or older) versions than the environment we normally test in.
  • Missing dependencies: The glob package comes from our devDependencies, which means it is installed only for developers working on my-library. For everyone else, require("glob") should fail immediately because glob is never installed for them. Surely we would find out as soon as we publish my-library, right? Not necessarily. In practice, most users will probably have glob available for some other reason (for example, they depend on rimraf themselves), so require("glob") appears to succeed. Only a small percentage of users experience import errors, which makes the problems they report seem strange and hard to reproduce.

How Rush can help: Rush’s symlinking strategy ensures that the node_modules folder of each project contains only its own declared direct dependencies. This catches phantom dependencies immediately at build time. If you are using the PNPM package manager, the same protection is applied to all indirect dependencies as well (with the ability to work around any “bad” packages by using pnpmfile.js).
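
Independently of any package manager, the mismatch between source code and package.json can also be illustrated with a small check. The sketch below is a rough approximation under stated assumptions: findUndeclaredImports is a hypothetical helper, it only recognizes literal require("...") calls via a regular expression (no import statements, no dynamic requires), and it treats only dependencies and peerDependencies as declared, since devDependencies are not installed for consumers of a published library.

```javascript
const { builtinModules } = require("module");

// Returns the package names that `sourceText` requires but that
// `packageJson` does not declare for runtime use. Hypothetical helper.
function findUndeclaredImports(sourceText, packageJson) {
  const declared = new Set([
    ...Object.keys(packageJson.dependencies || {}),
    ...Object.keys(packageJson.peerDependencies || {}),
  ]);
  const undeclared = new Set();
  const pattern = /require\(\s*["']([^"']+)["']\s*\)/g;
  for (const match of sourceText.matchAll(pattern)) {
    const request = match[1];
    // Skip local files and explicitly prefixed core modules.
    if (request.startsWith(".") || request.startsWith("/")) continue;
    if (request.startsWith("node:")) continue;
    // "@scope/name/sub/path" -> "@scope/name"; "name/sub/path" -> "name"
    const parts = request.split("/");
    const packageName = request.startsWith("@")
      ? parts.slice(0, 2).join("/")
      : parts[0];
    if (builtinModules.includes(packageName)) continue; // e.g. "fs", "path"
    if (!declared.has(packageName)) undeclared.add(packageName);
  }
  return [...undeclared];
}

// The example files from this section.
const examplePackageJson = {
  name: "my-library",
  dependencies: { minimatch: "^3.0.4" },
  devDependencies: { rimraf: "^2.6.2" },
};
const exampleSource = `
var minimatch = require("minimatch");
var expand = require("brace-expansion");
var glob = require("glob");
`;

console.log(findUndeclaredImports(exampleSource, examplePackageJson));
```

For the example above, the check reports brace-expansion and glob, the two phantom dependencies discussed in this section.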

Phantom node_modules folders

Suppose we have a monorepo and someone adds a root-level package.json file like this:

my-monorepo/package.json

{
  "name": "my-monorepo",
  "version": "0.0.0",
  "scripts": {
    "deploy-app": "node ./deploy-app.js"
  },
  "devDependencies": {
    "semver": "~5.6.0"
  }
}

This way, running npm run deploy-app makes our script deploy all the projects in the monorepo. (If you’re using Rush, please don’t do this! You should define a custom command instead.) Note that this hypothetical script needs the semver library, which is why it was added to the devDependencies list. People will be expected to run npm install in the repo root folder before running npm run deploy-app.

The final installation directory structure will look like this:

- my-monorepo/
  - package.json
  - node_modules/
    - semver/
    - ...
  - my-library/
    - package.json
    - lib/
      - index.js
    - node_modules/
      - brace-expansion/
      - minimatch/
      - ...

But recall that NodeJS’s module resolver also looks for dependencies in parent folders. This means that our my-library/lib/index.js can call require("semver") and find the semver package, even though it does not appear in my-library/node_modules. This is an even more insidious way to accidentally pick up phantom dependencies: sometimes the node_modules folder that gets found is not even in your Git working directory!

How Rush helps: Rush guards against this. The rush install command scans all possible parent folders and issues a warning if any phantom node_modules folders are found.
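
The idea of scanning parent folders can be sketched in a few lines. This is only an illustration of the general technique, not Rush's actual implementation; the function name findAncestorNodeModules is an assumption made for this example.

```javascript
const fs = require("fs");
const path = require("path");

// Returns every node_modules folder that exists in a strict ancestor of
// `projectDir`. Packages in those folders are reachable through require()
// even though the project itself never installed them. Hypothetical helper.
function findAncestorNodeModules(projectDir) {
  const found = [];
  let current = path.dirname(path.resolve(projectDir));
  for (;;) {
    const candidate = path.join(current, "node_modules");
    if (fs.existsSync(candidate) && fs.statSync(candidate).isDirectory()) {
      found.push(candidate);
    }
    const parent = path.dirname(current);
    if (parent === current) break; // reached the filesystem root
    current = parent;
  }
  return found;
}

// Warn about any node_modules folders above the current working directory.
for (const folder of findAncestorNodeModules(process.cwd())) {
  console.warn("Phantom node_modules folder found: " + folder);
}
```

Run from my-monorepo/my-library in the layout shown earlier, a scan like this would flag my-monorepo/node_modules, which is where the stray semver package lives.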