This article is translated

NPM Doppelgangers

Rush Official Doc

IO /pages/advan…

This article follows on from the “Phantom dependencies” chapter. It is advisable to read that chapter before this one.

How NPM doppelgangers arise

The node_modules data structure is sometimes forced to install two copies of the same version of the same package. Really? How does this happen?

Suppose we have a main project A that looks something like this:

{
  "name": "library-a",
  "version": "1.0.0",
  "dependencies": {
    "library-b": "^1.0.0",
    "library-c": "^1.0.0",
    "library-d": "^1.0.0",
    "library-e": "^1.0.0"
  }
}

While B and C both depend on F1:

{
  "name": "library-b",
  "version": "1.0.0",
  "dependencies": {
    "library-f": "^1.0.0"
  }
}

{
  "name": "library-c",
  "version": "1.0.0",
  "dependencies": {
    "library-f": "^1.0.0"
  }
}

But D and E depend on F2:

{
  "name": "library-d",
  "version": "1.0.0",
  "dependencies": {
    "library-f": "^2.0.0"
  }
}

{
  "name": "library-e",
  "version": "1.0.0",
  "dependencies": {
    "library-f": "^2.0.0"
  }
}

The node_modules tree can share F1 by placing it at the top of the tree, but then F2 must be duplicated in subdirectories:

- library-a/
  - package.json
  - node_modules/
    - library-b/
      - package.json
    - library-c/
      - package.json
    - library-d/
      - package.json
      - node_modules/
        - library-f/
          - package.json  <-- library-f@2.0.0
    - library-e/
      - package.json
      - node_modules/
        - library-f/
          - package.json  <-- library-f@2.0.0
    - library-f/
      - package.json  <-- library-f@1.0.0

Alternatively, the package manager can choose to place F2 at the top and F1 will be copied:

- library-a/
  - package.json
  - node_modules/
    - library-b/
      - package.json
      - node_modules/
        - library-f/
          - package.json  <-- library-f@1.0.0
    - library-c/
      - package.json
      - node_modules/
        - library-f/
          - package.json  <-- library-f@1.0.0
    - library-d/
      - package.json
    - library-e/
      - package.json
    - library-f/
      - package.json  <-- library-f@2.0.0

Either way, we cannot arrange the dependency tree without ending up with two copies of the same version of library-f. We call these copies “doppelgangers.” Traditional package managers for other programming languages do not face this problem; it is peculiar to the NPM node_modules tree and an inherent, unavoidable consequence of its design.
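The two layouts above can be checked with a small model. The following is only a sketch under simplifying assumptions: the `directDeps` table, `layoutFor`, and `countCopies` are hypothetical names introduced for illustration, version ranges are reduced to a single exact version, and real package managers use far more elaborate hoisting logic.

```javascript
// Hypothetical model of the example: each direct dependency of library-a
// and the version of library-f it ends up needing.
const directDeps = {
  "library-b": "1.0.0",
  "library-c": "1.0.0",
  "library-d": "2.0.0",
  "library-e": "2.0.0",
};

// Place one chosen version of library-f at the top of node_modules and
// nest a private copy under every package that needs a different version.
function layoutFor(hoistedVersion, deps) {
  const nested = Object.entries(deps)
    .filter(([, wanted]) => wanted !== hoistedVersion)
    .map(([pkg, wanted]) => `node_modules/${pkg}/node_modules/library-f@${wanted}`);
  return [`node_modules/library-f@${hoistedVersion}`, ...nested];
}

// Count physical copies per version; a version with more than one copy
// has doppelgangers.
function countCopies(layout) {
  const counts = {};
  for (const entry of layout) {
    const version = entry.split("@")[1];
    counts[version] = (counts[version] || 0) + 1;
  }
  return counts;
}

const hoistF1 = countCopies(layoutFor("1.0.0", directDeps));
const hoistF2 = countCopies(layoutFor("2.0.0", directDeps));
console.log(hoistF1); // F1 hoisted: one copy of 1.0.0, two copies of 2.0.0
console.log(hoistF2); // F2 hoisted: one copy of 2.0.0, two copies of 1.0.0
```

Whichever version is hoisted, the other version is installed twice, which matches the two trees shown above.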

The consequences of doppelgangers

Small projects rarely run into doppelgangers, but they are very common in large monorepos. Here are some of the possible consequences:

  • Slower installation: Disk space is cheap these days, but imagine that you have 20 libraries that depend on F1, resulting in 20 duplicated copies. Or suppose there is a post-install script that downloads and unzips a large archive (PhantomJS, for example) and runs separately for each doppelganger. This can seriously increase the time it takes to install dependencies.
  • Bundle size bloat: Web projects often use a bundler such as webpack, which statically analyzes require() statements and consolidates the code into a single bundle file for deployment. This file should be as small as possible, because its size directly affects the performance of your web application. When a doppelganger unexpectedly appears (for example, because an npm install operation rebalanced the node_modules tree), two copies of the library are embedded in the bundle, greatly increasing its size.
  • Singletons that are not single: Suppose library-f exposes an API with a cache object that is meant to be shared as a singleton by all users of the library. When two different components call require("library-f"), they might get two different library instances, which means that two instances of the singleton might appear (in other words, the underlying “global” variable is allocated in two different closures). This can lead to very strange behavior that is difficult to debug.
  • Duplicate types: Suppose library-f is a TypeScript library. The compiler encounters duplicated copies of all the library's *.d.ts files. For example, each class has two copies of its declaration, and because they are separate physical files, there is no symlink to collapse the duplication. In general, two identical class declarations are not interchangeable in TypeScript, and mixing them can cause compilation errors. TypeScript 2.x introduced a heuristic to detect and unify these declarations, but this involves additional complexity and processing, and it makes the build harder to reason about.
  • Semantically different doppelgangers: Suppose F has a dependency G, and G is also used by other packages in the dependency tree. In the tree, the first copy of F1 starts searching for G under B, and the second copy of F1 starts searching under C. The require() algorithm may find different versions of G from the two different starting points. This means that the runtime behavior of the two F1 instances may differ. Or at compile time, if F exports a TypeScript class that inherits from a base class in G, the same class from the same version of the same package can end up with different function signatures. This can lead to highly confusing compiler errors.
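To make the singleton and semantic cases in the list above more concrete, here is a minimal sketch that models module lookup in memory, without touching the disk. The `installed` paths, `resolve`, and `requireFrom` are hypothetical names for illustration, and the lookup is deliberately simplified: real Node.js resolution also considers file extensions, package.json fields, and more. The key assumption it illustrates is that module instances are cached per resolved path, not per package name and version.

```javascript
// Hypothetical installed packages, modelled as a set of paths.
const installed = new Set([
  "/a/node_modules/library-b/node_modules/library-f", // first copy of F1
  "/a/node_modules/library-b/node_modules/library-g", // a G private to B
  "/a/node_modules/library-c/node_modules/library-f", // second copy of F1
  "/a/node_modules/library-f",                        // the hoisted F2
  "/a/node_modules/library-g",                        // the hoisted G
]);

// Simplified lookup: walk up from the requiring package and return the
// first "<dir>/node_modules/<name>" that exists.
function resolve(fromDir, name) {
  let dir = fromDir;
  while (true) {
    const candidate = `${dir === "/" ? "" : dir}/node_modules/${name}`;
    if (installed.has(candidate)) return candidate;
    if (dir === "/") throw new Error(`Cannot find module '${name}'`);
    dir = dir.slice(0, dir.lastIndexOf("/")) || "/";
  }
}

// Cache module instances by resolved path, so each physical copy gets its
// own closure and therefore its own "singleton".
const moduleCache = new Map();
function requireFrom(fromDir, name) {
  const path = resolve(fromDir, name);
  if (!moduleCache.has(path)) {
    moduleCache.set(path, { cache: new Map() });
  }
  return moduleCache.get(path);
}

// Singleton case: B and C each see their own instance of library-f.
const fromB = requireFrom("/a/node_modules/library-b", "library-f");
const fromC = requireFrom("/a/node_modules/library-c", "library-f");
fromB.cache.set("user", "alice");
console.log(fromB === fromC);         // false: two separate instances
console.log(fromC.cache.has("user")); // false: the cache is not shared

// Semantic case: the two F1 copies find G in different places.
const gForB = resolve("/a/node_modules/library-b/node_modules/library-f", "library-g");
const gForC = resolve("/a/node_modules/library-c/node_modules/library-f", "library-g");
console.log(gForB); // the G nested under library-b
console.log(gForC); // the hoisted G
```

If the two G directories hold different versions, the two copies of F1 behave differently even though they are the same version of the same package.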

How Rush can help: Rush’s symlinking strategy eliminates doppelgangers for the local projects in a monorepo. If you use NPM or Yarn as your package manager, indirect dependencies can unfortunately still be duplicated. However, if you use PNPM together with Rush, the doppelganger problem is completely solved (because PNPM’s installation model accurately simulates a true directed acyclic graph).
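As a rough illustration of why a symlink-based layout avoids duplicates, the sketch below keeps one physical directory per name and version and points every consumer at it. This is a simplified model, not PNPM's actual implementation: `symlinkedLayout` and the `resolved` table are hypothetical, and the `node_modules/.pnpm/...` path shape should be treated as an assumption about PNPM's layout rather than a specification.

```javascript
// Hypothetical input: which version of library-f each package resolved to.
const resolved = [
  { consumer: "library-b", dep: "library-f", version: "1.0.0" },
  { consumer: "library-c", dep: "library-f", version: "1.0.0" },
  { consumer: "library-d", dep: "library-f", version: "2.0.0" },
  { consumer: "library-e", dep: "library-f", version: "2.0.0" },
];

// Build a layout with one physical directory per name@version and one
// symlink per consumer pointing at that directory.
function symlinkedLayout(edges) {
  const store = new Set(); // physical copies
  const links = new Map(); // "consumer -> dep" mapped to its physical copy
  for (const { consumer, dep, version } of edges) {
    const target = `node_modules/.pnpm/${dep}@${version}/node_modules/${dep}`;
    store.add(target);
    links.set(`${consumer} -> ${dep}`, target);
  }
  return { store, links };
}

const { store, links } = symlinkedLayout(resolved);
console.log(store.size); // 2: one physical copy each of F1 and F2
console.log(links.get("library-b -> library-f") === links.get("library-c -> library-f")); // true
```

Because B and C link to the same physical directory, they share one instance of F1, so none of the consequences listed earlier apply to it.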