One of Ryan Dahl's ten regrets about Node.js is its support for node_modules. Although node_modules meets most needs, it still has various defects, especially in front-end engineering, where it causes a lot of problems. This article walks through those problems and some possible improvements.
Terminology
- Package: anything containing a package.json; a package is defined by its package.json. It usually corresponds to a module, but it may contain no module at all, for example when bin points to a shell script, or it may even be an arbitrary file (served from the registry as an HTTP server, or from unpkg as a CDN). A package can also be a tarball, a local file: protocol path, or even a git repository address.
- Module: anything that can be loaded by require is a module; a module counts as a package only if it also contains a package.json. A module is any of:
  - A folder containing a package.json with a main field
  - A folder containing an index.js
  - Any JS file

In short: a module is not necessarily a package, and a package is not necessarily a module.
Dependency Hell
Suppose a project has two dependencies, A and C, and A and C depend on different versions of B. How should this be handled?

There are two problems here:
- First, does B support multiple coexisting versions at all? As long as B has no side effects, this comes naturally; but many libraries that pollute the global environment, such as core-js, cannot coexist in multiple versions, so we need to surface the conflict as early as possible (a conflict warning at install time and a conflict check at runtime).
- If B does support multiple coexisting versions, we need to ensure that A correctly loads B v1.0 and C correctly loads B v2.0.
Let’s focus on the second question
The npm solution
Node's solution relies on two things working together: node's module path lookup algorithm and the node_modules directory structure.
How packages are loaded from node_modules
The core is a recursive lookup through node_modules directories. If require('bar.js') is called from the file /home/ry/projects/foo.js, Node.js looks it up in the following order:
```
/home/ry/projects/node_modules/bar.js
/home/ry/node_modules/bar.js
/home/node_modules/bar.js
/node_modules/bar.js
```
The algorithm has two key properties:
- Dependencies in the nearest node_modules are read first
- node_modules directories are searched recursively upwards
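A minimal sketch of the lookup, collecting only the candidate node_modules directories (the real resolver also handles file extensions, package.json main fields, and so on):

```js
const path = require('path');

// collect the node_modules candidates for a require() issued from
// `fromDir`, nearest directory first
function nodeModulesPaths(fromDir) {
  const dirs = [];
  let dir = path.resolve(fromDir);
  while (true) {
    dirs.push(path.join(dir, 'node_modules'));
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return dirs;
}

console.log(nodeModulesPaths('/home/ry/projects'));
// [ '/home/ry/projects/node_modules',
//   '/home/ry/node_modules',
//   '/home/node_modules',
//   '/node_modules' ]
```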
This algorithm neatly mitigates dependency hell, but it also brings a lot of problems.
The node_modules directory structure
nest mode
Using the fact that require reads dependencies from the nearest node_modules first, we can come up with a very simple approach: maintain the dependency topology directly inside node_modules.
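A sketch of such a nested layout:

```
node_modules
├── mod-a
│   └── node_modules
│       └── mod-b        # v1.0
└── mod-c
    └── node_modules
        └── mod-b        # v2.0
```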
Now mod-a uses mod-b v1.0 and mod-c uses mod-b v2.0. But this creates another problem: if we also depend on a mod-d that relies on mod-b v2.0 as well, node_modules ends up looking like the sketch below.
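```
node_modules
├── mod-a
│   └── node_modules
│       └── mod-b        # v1.0
├── mod-c
│   └── node_modules
│       └── mod-b        # v2.0
└── mod-d
    └── node_modules
        └── mod-b        # v2.0 again
```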
We see the problem: although mod-c and mod-d depend on the same version of mod-b, mod-b gets installed twice. If your application depends on many third-party libraries that in turn share base libraries like lodash, node_modules fills up with duplicate copies of lodash, wasting a huge amount of space and making npm install slow. This is the notorious node_modules hell.
flat mode
We can also take advantage of the recursive lookup and hoist common dependencies into a shared top-level node_modules.
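A sketch of the flattened layout:

```
node_modules
├── mod-b                # v1.0, hoisted to the top level
├── mod-a
├── mod-d
└── mod-c
    └── node_modules
        └── mod-b        # v2.0, kept local because it conflicts
```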
According to require's search algorithm:

- A and D first look for B in their own node_modules, do not find it, recurse upwards, and find B v1.0 at the top level, as expected
- C first finds B v2.0 in its own node_modules, as expected
At this point dependency hell is handled, and the duplication caused by npm2's nest mode is avoided.
doppelgangers
But the problems didn't end there. If D depends on B v2.0 and E depends on B v1.0, then whichever of B v2.0 or B v1.0 is placed at the top level, the other version still has to be duplicated, as happens to B v2.0 in the sketch below.
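```
node_modules
├── mod-b                # v1.0 wins the top level
├── mod-a
├── mod-e                # finds v1.0 by recursive lookup
├── mod-c
│   └── node_modules
│       └── mod-b        # v2.0, first copy
└── mod-d
    └── node_modules
        └── mod-b        # v2.0, second copy: the doppelganger
```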
Is there a problem with duplicate versions?
You might say duplicate versions merely waste space, and that this only happens when versions conflict, so it hardly seems like a problem. Mostly true, but it can still bite in several cases.
Global type conflicts
Although the code of each package does not pollute other packages, their types can still affect each other. Many third-party libraries modify global type definitions, @types/react being a typical offender. A common error follows.
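With two copies of @types/react in the tree, TypeScript typically reports something along these lines (exact wording varies by version):

```
node_modules/@types/react/index.d.ts: error TS2717: Subsequent property
declarations must have the same type. Property 'a' must be of type
'DetailedHTMLProps<AnchorHTMLAttributes<HTMLAnchorElement>, HTMLAnchorElement>',
but here has type '...'.
```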
The root cause is a naming conflict in the global types, so duplicate versions can break global type checking. The usual mitigation is to control which @types/xxx packages get loaded.
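A minimal sketch of doing that with TypeScript's types compiler option, which restricts which @types packages contribute global declarations:

```json
// tsconfig.json
{
  "compilerOptions": {
    // only these @types packages are loaded into the global scope;
    // any other @types/* found in node_modules is ignored
    "types": ["react", "node"]
  }
}
```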
Breaking the singleton pattern
Require’s caching mechanism
The first time a module is loaded, node caches the result, and subsequent require calls return the cached module. However, node's require cache is keyed not by module name but by the resolved file path, case-sensitively. So even if your code appears to load the same version of the same module, two different resolved paths mean two different module instances, which causes trouble if you rely on shared state or side effects in that module.
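A small sketch of the behavior, with hypothetical paths:

```js
// require.cache is keyed by resolved absolute file path
const a = require('react-loadable');
// the same package installed at a second physical location resolves
// to a different key, so node instantiates a second, separate copy
const b = require('/app/packages/web/node_modules/react-loadable');

console.log(a === b);                    // false: two instances
console.log(Object.keys(require.cache)); // absolute paths, not names
```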
Take react-loadable as an example; it is used in both the browser and the Node layer.
Usage differs between the browser and the Node layer, roughly as sketched below.
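A sketch of both sides, based on react-loadable's documented preloadReady/preloadAll API:

```js
// browser entry: later compiled by webpack into bundle.js
import Loadable from 'react-loadable';

Loadable.preloadReady().then(() => {
  // hydrate the app once all loadable components are ready
});
```

```js
// node layer: server-side rendering against the compiled bundle
const Loadable = require('react-loadable');

Loadable.preloadAll().then(() => {
  // render and serve
});
```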
We then bundle the browser code into bundle.js and load that compiled bundle.js in the Node layer. Both the Node layer and the browser bundle access 'react-loadable', but if webpack compilation rewrites the module path, node and the bundle will not share the same react-loadable export object, even though the versions are identical. Unfortunately, react-loadable depends heavily on node and the browser bundle exporting the same object: the Node layer reads the READY_INITIALIZERS that the browser side sets, and if the two sides hold different objects, that read fails.
A related problem comes from git submodules, which easily lead to multiple versions, such as multiple React versions, coexisting in one environment.
Phantom dependency
We saw that flat mode saves a lot of space compared with nest mode, but it also introduces a new problem: phantom dependencies. Consider a project like the following.
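A sketch of such a project, whose only declared dependency is rimraf:

```json
// package.json
{
  "name": "my-lib",
  "devDependencies": {
    "rimraf": "^3.0.0"
  }
}
```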
We then write the following code.
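Something like:

```js
// src/index.js
// neither glob nor brace-expansion appears in our package.json;
// they are reachable only because rimraf depends on them
const glob = require('glob');
const braceExpansion = require('brace-expansion');
```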
Both glob and brace-expansion are absent from our dependencies, yet they work fine in development and at runtime, because they are dependencies of rimraf. But once the library is published, the devDependencies of our library are not installed on the user's machine, so rimraf's dependencies are nowhere to be found and the user hits a runtime error. When a library uses packages that do not belong to its declared dependencies, we call them phantom dependencies. Phantom dependencies exist not only in libraries; the problem gets worse when we manage projects as a monorepo: a package can pick up phantom dependencies not only from devDependencies but also from the dependencies of other packages, which can break deployment or runtime.
Doppelgangers and phantom dependencies seem unavoidable with node_modules as built by yarn or npm. In essence, npm and yarn simulate the original dependency graph through the node resolve algorithm and the node_modules tree structure. Is there a more faithful simulation that avoids these problems?
Semver: when ideal meets reality
npm packages use semantic versioning (semver), and semver itself was proposed as a cure for dependency hell. As a project accumulates more and more third-party dependencies, you face a dilemma:
- If you pin every version exactly, then when an underlying dependency needs a bugfix or an upgrade, it becomes very hard to assess the impact of that upgrade. The effect cascades: any package that collaborates with it may break, the whole system needs full regression testing, and the end result of thoroughly pinned versions may be an application that never dares to upgrade anything again
Semver was proposed chiefly to bound the impact of each package upgrade and allow the system to upgrade smoothly. On every install, npm installs the newest versions that satisfy the semver constraints.
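Suppose, for illustration, our package.json declares:

```json
{
  "dependencies": {
    "webpack": "^4.0.0"
  }
}
```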
Then every npm install fetches the latest version satisfying the "^4.0.0" constraint, perhaps 4.42.0. It would be world peace if every library complied perfectly with semantic versioning, but in reality many libraries break semver for all sorts of reasons:
- Unforeseen bugs: a release believed to be a pure bugfix patch turns out to introduce an unexpected breaking change, violating semver
- Semver's design is idealistic: even the tiniest bugfix becomes a breaking change once some consumer has inadvertently relied on the buggy behavior
- Some projects treat semver as meaningless. TypeScript, for example, officially admits it never follows semver semantics, and in practice regularly introduces breaking changes in minor versions. github.com/microsoft/T…
Locking everything
So how do you deal with this in the real world? Nobody wants code that works locally but dies the moment it goes live.
If, in the window between finishing your tests and launching the business, one of your dependencies violates semver with a breaking change, you may find yourself debugging online in the middle of the night. The root of the problem: how do we guarantee that the code we tested is exactly the code we ship?
Pinning versions directly
A natural thought is to simply pin all of my third-party dependencies to exact versions.
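For example, pinning webpack to an exact version:

```json
{
  "dependencies": {
    "webpack": "4.42.0"
  }
}
```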
However, the problem is not that simple. Although you pin webpack's version, webpack's own dependencies are not pinned. If one of webpack's dependencies ships a breaking change in violation of semver, our application still breaks. The only escape would be ensuring all third-party packages and all their transitive dependencies are pinned, which amounts to the whole community abandoning semver, and that is obviously impossible.
yarn lock vs npm lock
A more reliable way is to lock the project's dependencies and their transitive third-party dependencies at the same time; both yarn's and npm's lock files support this. A typical lock file looks as follows.
For example, suppose our project has express installed.
The lock file is as follows
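An abbreviated yarn.lock excerpt (hashes and most entries elided):

```
express@^4.17.0:
  version "4.17.1"
  resolved "https://registry.yarnpkg.com/express/-/express-4.17.1.tgz#…"
  integrity sha512-…
  dependencies:
    accepts "~1.3.7"
    body-parser "1.19.0"

accepts@~1.3.7:
  version "1.3.7"
  resolved "https://registry.yarnpkg.com/accepts/-/accepts-1.3.7.tgz#…"
  dependencies:
    mime-types "~2.1.24"
    negotiator "0.6.2"
```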
All of express's dependencies, and their dependencies' versions, are locked in the lock file, so another user or environment can use it to reproduce the exact version of every library in node_modules.
Lock files do not cover everything, though. When we create a project or install a dependency for the first time, npm install (or yarn install) does not read the lock file shipped inside a third-party package, so a user can still hit bugs on first install. This is especially common with globally installed CLIs: last month's global install worked fine, yet reinstalling the same version today breaks, most likely because one of the CLI's upstream dependencies shipped a breaking change. With no global lock, there is currently no good solution to this.
Resolutions to the rescue
Suppose you install a fresh webpack-cli one day and find it broken, and you trace the bug to a recent release of portfinder, a dependency of the CLI, whose author is on vacation and cannot ship a fix in time. With your project launch looming, what do you do? Yarn provides selective dependency resolutions (classic.yarnpkg.com/en/docs/sel…), a mechanism that lets you override dependency constraints and pin portfinder to a bug-free version as an emergency measure.
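A sketch (the pinned version number is illustrative):

```json
// package.json
{
  "resolutions": {
    "portfinder": "1.0.25"
  }
}
```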
npm itself does not provide resolutions, but a similar mechanism is available through the npm-force-resolutions library.
Should libraries commit the lock file?
As noted above, npm and yarn do not read lock files inside third-party libraries. So should you provide a lock file when you write a library?
Have you ever found a bug in a third-party library, cloned it to fix it and open a MR, run npm install && npm run build, and then watched a pile of baffling compile errors scroll past? Those errors are most likely caused by a breaking change somewhere upstream of the library's build toolchain, and after hours of Google plus StackOverflow they are still not fixed. This problem could largely be avoided if library developers committed the lock file for their build environment.
Determinism!!!
Determinism means that, given the same package.json and lock file, every install produces the same node_modules topology. In fact yarn only guarantees determinism within one yarn version, not across different yarn versions, whereas npm guarantees determinism even across npm versions.
Version determinism !== topology determinism
We said earlier that yarn.lock locks the versions of all third-party libraries and their dependencies. But while versions are guaranteed, yarn.lock contains no information about the node_modules topology.
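Take this abbreviated excerpt:

```
has-flag@^4.0.0:
  version "4.0.0"
  resolved "https://registry.yarnpkg.com/has-flag/-/has-flag-4.0.0.tgz#…"

supports-color@^7.0.0:
  version "7.1.0"
  resolved "https://registry.yarnpkg.com/supports-color/-/supports-color-7.1.0.tgz#…"
  dependencies:
    has-flag "^4.0.0"
```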
The lock file only pins the versions of has-flag and supports-color; it does not say whether has-flag sits at the top level or inside supports-color's node_modules. Both of the following topologies are valid.
The first:
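```
node_modules
├── has-flag
└── supports-color
```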
The second:
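```
node_modules
└── supports-color
    └── node_modules
        └── has-flag
```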
In contrast, npm's lock does encode topology information.
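An abbreviated package-lock.json sketch:

```json
{
  "dependencies": {
    "has-flag": {
      "version": "4.0.0"
    },
    "supports-color": {
      "version": "7.1.0",
      "requires": {
        "has-flag": "^4.0.0"
      }
    }
  }
}
```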
This structure says that has-flag and supports-color sit at the same level,
while a nested entry says that, for example, define-property and is-accessor-descriptor are placed inside base's own node_modules.
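A sketch of that nested case (fields elided, versions illustrative):

```json
{
  "dependencies": {
    "base": {
      "version": "0.11.2",
      "requires": {
        "define-property": "^1.0.0"
      },
      "dependencies": {
        "define-property": { "version": "1.0.0" },
        "is-accessor-descriptor": { "version": "1.0.0" }
      }
    }
  }
}
```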
Relative paths are harmful
Topology matters
Locked versions plus an unspecified dependency topology are fine in most cases; even if the node_modules topology differs between installs, the code still works. But in some cases it matters. Code like the following makes a strong assumption about the node_modules topology and breaks as soon as @types lands in a different position.
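A sketch of that kind of assumption (the path is hypothetical):

```js
// fragile: hard-codes one particular node_modules layout and breaks
// whenever @types/react is hoisted to a different level
const reactTypes = require('../../node_modules/@types/react/package.json');
```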
This demands that we never use relative paths to reach third-party dependencies; instead, resolve the module's location via require.resolve and work from the resolved path. A related question is: relative to which directory should resolution happen? Take Babel as an example; when we compile code with Babel, three directories are usually involved:
- The current working directory: the return value of process.cwd(), here my-project
- The location of the current code: my-project/build.js
- The location of the compilation tool: xxx/node_modules/@babel/core
The question is which of these @babel/preset-env is resolved relative to, and the answer depends entirely on the internal implementation of @babel/core.
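One way to stay robust, assuming Node >= 8.9 where require.resolve accepts a paths option, is to resolve the preset from @babel/core's own location instead of the cwd:

```js
const path = require('path');

// resolve @babel/preset-env as @babel/core itself would see it
const babelCoreDir = path.dirname(require.resolve('@babel/core/package.json'));
const presetPath = require.resolve('@babel/preset-env', {
  paths: [babelCoreDir],
});
```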
Monorepo: Link is hard
If dependency issues in third-party libraries are still somewhat manageable, they get magnified once we enter monorepo territory. Managing multiple packages in a single repository brings two serious problems:
- Repeated installation of third-party dependencies: if packageA and packageB both use the same version of lodash, without optimization that lodash version gets installed twice
- Link hell: if A depends on B, and B depends on C and D, then during development we have to link C and D into B; with a complex topology, doing these links by hand is unacceptable
The core working mechanism of both lerna and yarn workspaces is:
- Install every package's dependencies into the root-level node_modules in flat mode as far as possible (the hoist), so that packages avoid installing the same third-party dependency repeatedly; dependencies whose versions conflict are installed into the package's own node_modules to resolve the conflict
- Symlink each package into the root-level node_modules, so that packages can import one another through node's recursive lookup without manual linking
- Symlink the binaries in each package's node_modules into the root-level node_modules, so that every package's npm scripts run correctly
While this approach solves the core problems of dependency duplication and link hell, it introduces new ones:
- packageA can freely import packageB even if packageA never declares packageB as a dependency, and it can just as freely import packageB's third-party dependencies; this amplifies the phantom dependency problem
- Third-party dependencies of packageA are more likely to conflict with those of packageB (say packageA uses webpack 3 while packageB uses webpack 4), which exacerbates the doppelganger problem
Worse, hoisting itself is not safe. Consider the following configuration.
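An illustrative layout after hoisting (package names are real, versions and placement are illustrative):

```
monorepo
├── node_modules                # root level, after hoisting
│   ├── webpack                 # webpack@1, hoisted from another package
│   ├── html-webpack-plugin     # hoisted out of react-scripts; now its
│   │                           #   nearest webpack is the root webpack@1
│   └── react-scripts
│       └── node_modules
│           └── webpack         # webpack@2: conflicts with root, stays local
└── packages
    └── app                     # depends on react-scripts
```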
html-webpack-plugin requires webpack at runtime.
react-scripts calls html-webpack-plugin, which in turn requires webpack. By node's resolve algorithm the plugin should pick up the webpack in the nearest node_modules (webpack@2), but once the plugin has been hoisted, the nearest webpack is the root-level webpack@1, which causes a runtime error. With yarn and npm, hoisting happens whenever possible and a copy stays local only on conflict, and when multiple versions conflict you cannot even determine which one ends up hoisted to the root. Nor is the problem unique to webpack: eslint, jest, babel, and anything else built as a core plus plugins can be affected. create-react-app ships a preflight check (github.com/facebook/cr…) that verifies the babel and webpack versions in react-scripts' node_modules are consistent with those in its ancestors' node_modules.
create-react-app dropped monorepo support because of these flaws in hoisting (github.com/facebook/cr…). Yarn offers an even more radical option, --flat mode, in which every package in node_modules may exist in only one version; when versions conflict you must pick one yourself (via resolutions). For large projects this is clearly unworkable, since third-party version conflicts abound (webpack alone brings 160+), which only illustrates how serious the doppelganger problem is: forcing a single version everywhere does not solve it.
PNPM: Explicit is better than implicit.
Setting aside cyclic dependencies, the real dependency graph is a directed acyclic graph (DAG), but what npm and yarn simulate through the directory tree plus node's resolve algorithm is a superset of that DAG, with spurious edges to ancestors' and siblings' dependencies. This is the source of many of the problems above, so we need a simulation closer to the true DAG. pnpm takes exactly this approach: by modeling the DAG more faithfully it solves the problems that plague yarn and npm.
phantom dependency
pnpm writes only the explicitly declared dependencies into the root-level node_modules, which prevents business code from importing packages it never declared. It resolves phantom dependencies as the following example shows.
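Say our project declares only express:

```json
// package.json
{
  "dependencies": {
    "express": "^4.0.0"
  }
}
```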
Under npm or yarn, however, we could still use the debug module in our code even though we never declared it, because express pulls it in:
```js
// src/index.js
const debug = require('debug')
```
If express one day decided to replace the debug module with a (hypothetical) better-debug module, our code would break.
The structure npm produces looks roughly like this:
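```
node_modules
├── express
├── debug          # hoisted transitive dependency: reachable though undeclared
├── send
└── …              # every other transitive dependency, flattened
```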
The structure pnpm produces looks roughly like this (modern pnpm keeps the real packages in a .pnpm virtual store and wires them up with symlinks):
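```
node_modules
├── express -> .pnpm/express@4.17.1/node_modules/express
└── .pnpm
    ├── express@4.17.1
    │   └── node_modules
    │       ├── express              # the real package
    │       └── debug -> ../../debug@2.6.9/node_modules/debug
    └── debug@2.6.9
        └── node_modules
            └── debug                # the real package
```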
At the top level, node_modules contains only express and no debug, so business code cannot mistakenly import debug. Meanwhile every third-party library gets its own node_modules directory in which its dependencies are symlinked, which guarantees express still loads the correct debug version.
doppelgangers
Besides phantom dependencies, pnpm also solves the doppelganger problem. Look at the following configuration:
```json
// package.json
{
  "dependencies": {
    "debug": "3",
    "express": "4.0.0",
    "koa": "^2.11.0"
  }
}
```
After installing the dependencies with pnpm, we find two versions of debug in the project:
```
dependencies:
├─┬ express 4.0.0
│ ├─┬ send …
│ │ └── debug 2.6.9
│ └─┬ serve-static …
│   └── debug 2.6.9
└─┬ koa 2.11.0
  └── debug 3.1.0
```
Checking where the versions live in node_modules: unlike yarn, pnpm places the different versions side by side in one flat store and selects among them via symlinks, while yarn places different versions at different nesting levels and relies on the recursive lookup algorithm to select them.
Here pnpm's store ends up holding three debug versions (the two above plus the directly declared debug 3), with each dependent module symlinked to the version it needs.
This way, even when version conflicts occur, each module only needs to be linked, not installed a second time.
pnpm thus avoids building on node_modules' recursive lookup at all, solving phantom dependencies and doppelgangers directly with symlinks. Because duplicated packages are avoided entirely, it also saves a lot of space and speeds up installation. Taking one monorepo project as an example:
- pnpm: node_modules 359M, install time 20s
- yarn: node_modules 1.2G, install time 173s

The difference is significant.
global store
pnpm not only guarantees that every version of every package in a project is installed exactly once, it even shares those unique versions across different projects (through the common store), which can save a great deal of disk space. The key is that pnpm no longer relies on node's recursive node_modules lookup: that algorithm is tightly coupled to the physical topology of node_modules, which is the root reason different projects cannot share node_modules. (Another way out is to write the dependency version directly in the import path, which is deno's approach.)
Cargo: package management with a global store
In fact, outside of node and npm, few languages require every project to maintain its own node_modules-style dependency directory (ever heard of node_modules hell?), and few have this recursive lookup behavior; most languages use a globally shared store. Let's take a look at how Rust does package management. Creating a new Rust project is as simple as running:
```sh
$ cargo new hello-cargo        # create a project with an executable binary
$ cargo new hello-lib --lib    # create a library
```
Its generated directory structure is as follows
```
.
├── Cargo.toml
└── src
    └── main.rs
```
Cargo.toml plays almost exactly the role of package.json (with the advantage that TOML supports comments, unlike JSON), containing information like the following:
[package] name = "hello_cargo" version = "0.1.0" // authors = ["Your name <[email protected]>"] edition = "2018" [dependencies]Copy the code
src/main.rs is the project's main entry, similar to index.js:
```rust
// src/main.rs
fn main() {
    println!("Hello, world!");
}
```
Cargo has compilation built in (compared with the JS ecosystem's profusion of build tools, cargo's built-in rustc integration is simple: third-party libraries only ship source code, and cargo recursively compiles everything itself):
```sh
$ cargo run    # build and run the binary
```
Similar to npm, cargo supports git and file dependencies:
```toml
[dependencies]
time = "0.1.12"
rand = { git = "https://github.com/rust-lang-nursery/rand.git" }
```
After running cargo build to install the dependencies, a Cargo.lock file appears; like yarn.lock, it records deterministic versions of all third-party libraries and their dependencies:
```toml
# This file is automatically @generated by Cargo.
# It is not intended for manual editing.
[[package]]
name = "hello_cargo"
version = "0.1.0"
dependencies = [
 "time",
]

[[package]]
name = "libc"
version = "0.2.69"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "99e85c08494b21a9054e7fe1374a732aeadaff3980b6990b94bfd3a70f690005"

[[package]]
name = "time"
version = "0.1.43"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ca8a50ef2360fbd1eeb0ecd46795a87a19024eb4b53c5dc916ca1fd95fe62438"
dependencies = [
 "libc",
 "winapi",
]

[[package]]
name = "winapi"
version = "0.3.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8093091eeb260906a183e6ae1abdba2ef5ef2257a21801128899c3fc699229c6"
dependencies = [
 "winapi-i686-pc-windows-gnu",
 "winapi-x86_64-pc-windows-gnu",
]

[[package]]
name = "winapi-i686-pc-windows-gnu"
version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ac3b87c63620426dd9b991e5ce0329eff545bccbbb34f3be09ff6fb6ab51b7b6"

[[package]]
name = "winapi-x86_64-pc-windows-gnu"
version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "712e227841d057c1ee1cd2fb22fa7e5a5461ae8e48fa2ca79ec42cfc1931183f"
```
The directory structure is as follows
```
.
├── Cargo.lock
├── Cargo.toml
└── src
    └── main.rs
```
We find that the project contains nothing like node_modules to hold its dependencies. So where are they stored?
cargo home
Cargo stores all third-party dependency code in a directory called cargo home, which defaults to ~/.cargo and contains three main directories:
```
~/.cargo/registry
├── cache    # downloaded .crate archives
├── index    # metadata index of the registry
└── src      # unpacked source code used for compilation
```
Monorepo support
Cargo itself supports monorepos and, much like yarn, does so through the workspace concept:
```toml
# Cargo.toml
[workspace]
members = [
  "adder",
  "hardfist-add-one"
]
```
This is almost equivalent to the yarn workspaces configuration below:
```json
// package.json
{
  "workspaces": ["adder", "hardfist-add-one"]
}
```
Let's take a look at the workspace directory structure.
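Roughly, it looks like this:

```
.
├── Cargo.lock
├── Cargo.toml
├── adder
│   ├── Cargo.toml
│   └── src
│       └── main.rs
└── hardfist-add-one
    ├── Cargo.toml
    └── src
        └── lib.rs
```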
Like yarn workspaces, the whole workspace shares a single Cargo.lock, and we can link packages to each other through local file paths. Suppose adder in the monorepo depends on hardfist-add-one:
```toml
# adder/Cargo.toml
[package]
name = "hardfist-adder"
version = "0.1.0"
authors = ["hardfist <[email protected]>"]
edition = "2018"
description = "test publish"
license = "MIT"

# See more keys and their definitions at
# https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
hardfist-add-one = { path = "../hardfist-add-one", version = "0.1.0" }
rand = "0.7"
```
We point to hardfist-add-one locally via the path field. When we want to publish adder, cargo will not accept a dependency that has only a path, so we must also specify a version for hardfist-add-one in order to publish.
Disable implicit dependencies
Even within the same workspace, dependencies must be explicit: if hardfist-add-one depends on rand, and hardfist-adder depends on hardfist-add-one but uses rand without declaring rand as its own dependency, cargo reports an error.
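A minimal sketch of the failure (error text abbreviated):

```rust
// adder/src/main.rs
// rand is declared only in hardfist-add-one's [dependencies], not in
// adder's, so this fails to compile with something like:
//
//   error[E0432]: unresolved import `rand`
//
use rand::Rng;

fn main() {
    let n: u32 = rand::thread_rng().gen_range(0, 10);
    println!("{}", n);
}
```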
Each cargo build automatically and recursively compiles all dependencies, with no extra tooling required. And since most cargo applications are ultimately packaged into a single binary, cargo does not suffer from the monorepo vendor problems described below. Cargo also supports installing dependencies offline.
vendor: for serverless
One of the bigger problems with a monorepo is how to deploy each package independently, which is especially pressing in serverless scenarios where the size of the deployed artifact is limited. When we manage an application as a monorepo, deployment runs into two problems:
- Third-party dependencies are installed at the root level, so a package's own node_modules does not contain all of its dependencies; we can only ship the entire root-level node_modules along with the package
- Since packages reference each other through symlinks, even if we pack up a package's node_modules it still contains only symlinks to the packages it depends on, so problems remain
There are basically two solutions to this problem
- Bundle the code with a packaging tool so that it does not rely on node_modules at runtime
- Extract the hoisted third-party dependencies and the linked packages, and place real copies into the package's own node_modules
The biggest problem with both approaches is implicit dependencies.
bundle
Bundling is common for front-end applications but less so on the server. Yet many server-side languages effectively ship bundles: deno, Rust, Go, and others all deploy a single bundle file, whether a binary or some other format.
The Node ecosystem actually has some mature server-side bundling schemes, such as github.com/zeit/ncc, which intelligently bundles server-side code into a single JS file. You can even package the runtime together with the business code into one binary, as github.com/zeit/pkg does.
One of the biggest problems with server-side bundling is file I/O and dynamic imports: the bundler cannot know at compile time which files will be read, written, or imported, so bundling is hard to apply to convention-over-configuration frameworks (such as Egg and Gulu). Frameworks that require explicit dependencies, such as express and koa, are fine.