What is a Monorepo?

This article explains what a Monorepo is and how to manage one with Yarn and npm as dependency management tools, and finally introduces pnpm.

Monorepo is a code management pattern in which multiple modules/packages are managed in a single project repository (repo). The opposite of Monorepo is Multirepo (or Polyrepo), the familiar one-repository-per-module approach. Google, Facebook, Microsoft and others have used monorepos for many years, and well-known projects such as Vue 3 and Yarn 2 have switched to the Monorepo approach.

A Monorepo project directory might look like this:

├── Changelog.md
├── Readme.md
├── package.json
└── packages
    ├── package-1
    │   └── package.json
    └── package-2
        └── package.json

Why use Monorepo?

Using a Monorepo has many benefits when you maintain multiple projects that depend on each other and share the same infrastructure (build tools, lint configuration):

  • Package managers such as Yarn and npm can hoist shared dependencies to the root, so a Monorepo reduces dependency installation time and disk usage.
  • Debugging across projects is convenient: an application picks up changes in the packages it depends on immediately, so those packages are easy to modify and debug.
  • Infrastructure is shared by all projects instead of being configured separately in each one.

Workspaces

Workspaces are a way to set up a multi-package architecture. Their goal is to make Monorepos easier to use: multiple projects live in the same repository and reference each other, and code changes in a project are reflected in real time in the projects that depend on it. Each subproject in a Monorepo is called a workspace, so a Monorepo is made up of multiple workspaces.

Benefits of using workspaces (taking Yarn as an example):

  • Packages can be linked together: your workspaces can depend on each other and always use each other's latest code. This is better than "yarn link" because it only affects the workspace tree, not the whole system.
  • Dependencies for all projects are installed together, which gives Yarn more room to optimize the installation.
  • Yarn uses a single lock file for the whole repository rather than one per subproject, which means fewer conflicts.
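For reference, workspaces are switched on in the root package.json: Yarn classic and npm (v7 and later) both read a "workspaces" array of folder globs, and Yarn classic also requires the root to be "private": true. The TypeScript below is a minimal sketch, not either tool's implementation: matchWorkspaces is a hypothetical helper that only understands exact paths and a trailing "/*", which is enough to show how "packages/*" selects each subproject folder.

```typescript
// Root manifest as it might appear in a Monorepo (illustrative values).
const rootManifest = {
  name: "monorepo",
  private: true,
  workspaces: ["packages/*"],
};

// Hypothetical helper: return the folders selected by workspace patterns.
// Only exact paths and a single trailing "/*" are supported here; real
// package managers accept full glob syntax.
function matchWorkspaces(patterns: string[], folders: string[]): string[] {
  return folders.filter((folder) =>
    patterns.some((pattern) => {
      if (pattern.endsWith("/*")) {
        const parent = pattern.slice(0, -1); // "packages/*" -> "packages/"
        const rest = folder.slice(parent.length);
        // Direct children only: "packages/x" matches, "packages/x/y" does not.
        return folder.startsWith(parent) && rest.length > 0 && !rest.includes("/");
      }
      return folder === pattern;
    })
  );
}

const folders = ["packages/package-1", "packages/package-2", "docs", "packages/package-1/src"];
console.log(matchWorkspaces(rootManifest.workspaces, folders));
// [ 'packages/package-1', 'packages/package-2' ]
```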

Hoisting

Why does a Monorepo reduce dependency installation time and disk space? Let's start by reviewing how dependencies are installed in a standalone project.

Before npm v3, the installation rule was simple: dependencies were placed in the project's node_modules folder, and if a direct dependency A itself depended on another module B, then B, as an indirect dependency, was downloaded into A's own node_modules folder. Applied recursively, this eventually produced a huge, deeply nested dependency tree.

This node_modules structure is straightforward and works as expected, but large projects can end up with many duplicate packages. For example, if a project depends directly on A and B, and both A and B depend on the same version of module C, then C is installed twice: once in A's node_modules and once in B's.

This duplication wastes a lot of disk space and slows installation down, so starting with npm v3, node_modules is flattened.
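The pre-v3 rule is easy to model. The sketch below is a simplified illustration, not npm's code: packages are plain objects, the versions are hypothetical, and every installed copy is written as path@version. Installing A and B, which both depend on C 1.0, yields two copies of C.

```typescript
// A package and the packages it depends on (simplified model).
interface Pkg {
  name: string;
  version: string;
  deps: Pkg[];
}

// Pre-npm-v3 rule: each dependency is installed inside the node_modules
// folder of the package that needs it, recursively.
function nestedInstall(deps: Pkg[], prefix = ""): string[] {
  const paths: string[] = [];
  for (const dep of deps) {
    const dir = `${prefix}node_modules/${dep.name}`;
    paths.push(`${dir}@${dep.version}`);
    // Recurse into this package's own node_modules folder.
    paths.push(...nestedInstall(dep.deps, `${dir}/`));
  }
  return paths;
}

const C: Pkg = { name: "C", version: "1.0", deps: [] };
const A: Pkg = { name: "A", version: "1.0", deps: [C] };
const B: Pkg = { name: "B", version: "1.0", deps: [C] };

console.log(nestedInstall([A, B]).join("\n"));
// node_modules/A@1.0
// node_modules/A/node_modules/C@1.0
// node_modules/B@1.0
// node_modules/B/node_modules/C@1.0
```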

Flat installation

Assume that the project directly depends on A and C, which depend on B v1.0 and B v2.0 respectively. Their dependency tree is shown below:

The flattening process is shown in the figure above.
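A simplified version of this flattening can be written as one pass over the direct dependencies: each transitive package goes to the top-level node_modules if that slot is free, is reused if the same version already sits there, and is nested under its parent otherwise. This is a sketch under stated assumptions (transitive packages have no dependencies of their own and never share a name with a direct dependency; versions are hypothetical), not npm's actual algorithm.

```typescript
// A package and its dependencies (simplified model).
interface Pkg {
  name: string;
  version: string;
  deps: Pkg[];
}

// Simplified npm v3-style flat install. Assumes transitive packages have no
// dependencies of their own and never share a name with a direct dependency.
function flatInstall(direct: Pkg[]): string[] {
  const top = new Map<string, string>(); // name -> version in top-level node_modules
  const paths: string[] = [];
  for (const dep of direct) {
    top.set(dep.name, dep.version);
    paths.push(`node_modules/${dep.name}@${dep.version}`);
    for (const child of dep.deps) {
      const hoisted = top.get(child.name);
      if (hoisted === undefined) {
        // Free slot: hoist the child to the top level.
        top.set(child.name, child.version);
        paths.push(`node_modules/${child.name}@${child.version}`);
      } else if (hoisted !== child.version) {
        // Slot taken by another version: nest the child under its parent.
        paths.push(`node_modules/${dep.name}/node_modules/${child.name}@${child.version}`);
      }
      // Same version already hoisted: reuse it, install nothing.
    }
  }
  return paths;
}

const B1: Pkg = { name: "B", version: "1.0", deps: [] };
const B2: Pkg = { name: "B", version: "2.0", deps: [] };
const A: Pkg = { name: "A", version: "1.0", deps: [B1] };
const C: Pkg = { name: "C", version: "1.0", deps: [B2] };

console.log(flatInstall([A, C]).join("\n"));
// node_modules/A@1.0
// node_modules/B@1.0
// node_modules/C@1.0
// node_modules/C/node_modules/B@2.0
```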

A question to ponder:

If the original structure of the project is:

Which of the following flattened structures does it produce?

Figure 3-1

Figure 3-2

Both are possible: the result depends on the order in which A and C are installed. If A is installed first, its dependency B v1.0 is placed in the top-level node_modules. When C and D are installed afterwards, the top-level slot for B is already taken, so B v2.0 is nested inside the node_modules of each of them.

If A is installed in the project first and C and D are added later, the project looks like Figure 3-1, in which B v2.0 is installed more than once. At this point we can delete node_modules and reinstall to get the cleaner structure shown in Figure 3-2, or run the npm dedupe command. Yarn performs this deduplication automatically when installing dependencies.
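The self-contained sketch below models both effects: with A installed first, B v2.0 ends up nested twice (as in Figure 3-1), while installing C and D first gives the cleaner layout (as in Figure 3-2). dedupedInstall is a hypothetical stand-in for deduplication that hoists whichever version the most parents need; npm dedupe itself operates on the whole installed tree. As before, transitive packages are assumed to have no dependencies of their own, and the versions are hypothetical.

```typescript
interface Pkg {
  name: string;
  version: string;
  deps: Pkg[];
}

// Order-sensitive placement: the first version installed claims the top level.
function flatInstall(direct: Pkg[]): string[] {
  const top = new Map<string, string>();
  const paths: string[] = [];
  for (const dep of direct) {
    top.set(dep.name, dep.version);
    paths.push(`node_modules/${dep.name}@${dep.version}`);
    for (const child of dep.deps) {
      const hoisted = top.get(child.name);
      if (hoisted === undefined) {
        top.set(child.name, child.version);
        paths.push(`node_modules/${child.name}@${child.version}`);
      } else if (hoisted !== child.version) {
        paths.push(`node_modules/${dep.name}/node_modules/${child.name}@${child.version}`);
      }
    }
  }
  return paths;
}

// Dedupe-style placement (hypothetical): hoist the version needed by the most
// parents, regardless of install order, and nest every other version.
function dedupedInstall(direct: Pkg[]): string[] {
  const demand = new Map<string, Map<string, number>>(); // name -> version -> parents
  for (const dep of direct) {
    for (const child of dep.deps) {
      const byVersion = demand.get(child.name) ?? new Map<string, number>();
      byVersion.set(child.version, (byVersion.get(child.version) ?? 0) + 1);
      demand.set(child.name, byVersion);
    }
  }
  const top = new Map<string, string>();
  demand.forEach((byVersion, name) => {
    let best = "";
    let bestCount = 0;
    byVersion.forEach((count, version) => {
      if (count > bestCount) {
        best = version;
        bestCount = count;
      }
    });
    top.set(name, best);
  });
  const paths = direct.map((dep) => `node_modules/${dep.name}@${dep.version}`);
  top.forEach((version, name) => paths.push(`node_modules/${name}@${version}`));
  for (const dep of direct) {
    for (const child of dep.deps) {
      if (top.get(child.name) !== child.version) {
        paths.push(`node_modules/${dep.name}/node_modules/${child.name}@${child.version}`);
      }
    }
  }
  return paths;
}

// Counts installed copies of B, whatever the version.
const copiesOfB = (paths: string[]) => paths.filter((p) => p.includes("/B@")).length;

const B1: Pkg = { name: "B", version: "1.0", deps: [] };
const B2: Pkg = { name: "B", version: "2.0", deps: [] };
const A: Pkg = { name: "A", version: "1.0", deps: [B1] };
const C: Pkg = { name: "C", version: "1.0", deps: [B2] };
const D: Pkg = { name: "D", version: "1.0", deps: [B2] };

console.log(copiesOfB(flatInstall([A, C, D]))); // 3 (B v2.0 nested twice, like Figure 3-1)
console.log(copiesOfB(flatInstall([C, D, A]))); // 2 (like Figure 3-2)
console.log(copiesOfB(dedupedInstall([A, C, D]))); // 2
```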

Hoisting in workspaces

A Monorepo project introduces a new level in the hierarchy, so dependencies between modules no longer rely solely on each module's own node_modules, as shown below:

By hoisting the subprojects' dependencies to the parent project's node_modules (monorepo/node_modules), we reduce the number of times B v1.0 is installed.

package-1 and package-2 are also hoisted to the root node_modules, which is why Monorepo workspaces let you use other subprojects just like normal npm packages. The entries for subprojects in node_modules are symlinks to the packages/ directory, so code changes in a subproject are immediately visible to the subprojects that depend on it. For example, to debug package-2 from package-1, you only need to add package-2 as a dependency in package-1's package.json. No npm link, no yalc, and no copying package-2 into node_modules: debugging becomes very smooth.

The package name package-2 is the value of the name field in packages/package-2/package.json, not the folder name.
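To illustrate how the name field, rather than the folder, determines the link, here is a sketch of the root node_modules symlinks a workspace-aware package manager creates. workspaceLinks and the scoped name @demo/utils are hypothetical, and the exact link form (relative or absolute target) varies between tools and platforms.

```typescript
// Hypothetical description of one workspace: where it lives and the
// "name" field from its package.json.
interface Workspace {
  folder: string; // e.g. "packages/package-2"
  manifestName: string; // package.json#name, e.g. "package-2" or "@scope/utils"
}

// Sketch: compute the symlinks placed in the root node_modules. Link paths use
// the manifest name; targets are relative to the folder containing the link.
function workspaceLinks(workspaces: Workspace[]): Map<string, string> {
  const links = new Map<string, string>();
  for (const ws of workspaces) {
    const linkPath = `node_modules/${ws.manifestName}`;
    if (links.has(linkPath)) {
      throw new Error(`duplicate workspace name: ${ws.manifestName}`);
    }
    // "@scope/utils" sits one folder deeper than "utils", so it needs one more "../".
    const depth = ws.manifestName.split("/").length;
    links.set(linkPath, "../".repeat(depth) + ws.folder);
  }
  return links;
}

const links = workspaceLinks([
  { folder: "packages/package-1", manifestName: "package-1" },
  { folder: "packages/package-2", manifestName: "@demo/utils" },
]);
links.forEach((target, link) => console.log(`${link} -> ${target}`));
// node_modules/package-1 -> ../packages/package-1
// node_modules/@demo/utils -> ../../packages/package-2
```

With the second workspace, package-1 would import it as "@demo/utils"; the folder name package-2 plays no role in resolution.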

Possible problems

Module XXX could not be found

This problem usually occurs for two reasons:

  1. The dependency is not declared in the subproject's package.json

Suppose subprojects A and B both use axios, but axios is declared only in B's package.json, not in A's. Local development works because axios is hoisted to the root node_modules, where A can still find it. However, the build fails when deploying: at build time the package manager's workspace setup is not in place, and each subproject's dependencies are installed from npm according to its own package.json. Because A's dependency list is incomplete, the build reports that the dependency cannot be found.

This problem can be caught by configuring lint plug-ins that check for undeclared dependencies.
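Lint rules such as import/no-extraneous-dependencies from eslint-plugin-import report exactly this situation. The core check can be sketched as follows; undeclaredImports is a hypothetical helper that uses a regular expression where a real rule would parse the source, and it does not know about bare Node built-ins such as fs (only the node: prefix is skipped).

```typescript
// Matches the module specifier in `from "x"`, `import "x"` and `require("x")`.
// A regex is only a rough approximation of real parsing.
const SPECIFIER = /(?:\bfrom\s*|\bimport\s*|\brequire\(\s*)["']([^"']+)["']/;

// "lodash/get" -> "lodash", "@scope/pkg/sub" -> "@scope/pkg"
function packageName(specifier: string): string {
  const parts = specifier.split("/");
  return specifier.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0];
}

// Return package names imported by `source` but missing from `declared`
// (the dependency names listed in the subproject's package.json).
function undeclaredImports(source: string, declared: string[]): string[] {
  const allowed = new Set(declared);
  const missing: string[] = [];
  const re = new RegExp(SPECIFIER.source, "g"); // fresh regex: no shared lastIndex
  let match: RegExpExecArray | null;
  while ((match = re.exec(source)) !== null) {
    const spec = match[1];
    // Relative/absolute paths and node: built-ins are not packages.
    if (spec.startsWith(".") || spec.startsWith("/") || spec.startsWith("node:")) continue;
    const name = packageName(spec);
    if (!allowed.has(name) && missing.indexOf(name) === -1) missing.push(name);
  }
  return missing;
}

const sourceOfA = `
import axios from "axios";
import { helper } from "./helper";
const get = require("lodash/get");
`;
// Subproject A declares lodash but forgot axios.
console.log(undeclaredImports(sourceOfA, ["lodash"])); // [ 'axios' ]
```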

  2. A dependency does not support the Monorepo layout

This is less common. Node resolves modules by recursive lookup: when package A is required, the lookup order is:

node_modules/A -> ../node_modules/A -> ../../node_modules/A -> … until the global node_modules directory is reached.
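The candidate folders can be generated mechanically. This sketch follows the documented Node.js lookup (nearest node_modules first, skipping folders that are themselves named node_modules) for POSIX paths, and leaves out the global folders Node checks afterwards. resolvePackage is a hypothetical helper that checks candidates against a set of existing folders instead of the real file system; it shows why a subproject can find a dependency hoisted to the repository root.

```typescript
// Candidate node_modules folders for code in directory `start`, nearest first.
// POSIX paths assumed. Node also checks a few global folders afterwards
// (e.g. $HOME/.node_modules); those are omitted here.
function nodeModulesPaths(start: string): string[] {
  const parts = start.split("/").filter((part) => part.length > 0);
  const dirs: string[] = [];
  for (let i = parts.length; i >= 0; i--) {
    // Never look inside node_modules/node_modules.
    if (i > 0 && parts[i - 1] === "node_modules") continue;
    dirs.push(`/${parts.slice(0, i).concat("node_modules").join("/")}`);
  }
  return dirs;
}

// Hypothetical resolver over a set of existing package folders.
function resolvePackage(fromDir: string, name: string, existing: Set<string>): string | undefined {
  for (const dir of nodeModulesPaths(fromDir)) {
    const candidate = `${dir}/${name}`;
    if (existing.has(candidate)) return candidate;
  }
  return undefined;
}

// In a hoisted Monorepo, axios lives only in the root node_modules.
const onDisk = new Set(["/repo/node_modules/axios"]);
console.log(resolvePackage("/repo/packages/package-1/src", "axios", onDisk));
// /repo/node_modules/axios
```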

With hoisted dependencies, tools that follow this resolution rule keep working. However, some tools do not follow it and assume that dependencies are in the node_modules of the current project. One possible workaround is to symlink all dependencies from the root node_modules into the subproject's node_modules. Package managers do not do this automatically, and the recommended fix is for tool maintainers to make their tools compatible.

The hoisting problem and its solution: pnpm

If you want to know more about pnpm, check out this article: "Deep Thoughts on Modern Package Managers: Why Do I Now Recommend pnpm over npm/Yarn?"

As mentioned above, dependency hoisting can make code that works in local development fail to build for production. Our solution at the time was a lint plugin that reminded us to declare dependencies promptly. But relying on plug-ins is never reliable enough. Is there another way to solve this problem?

The answer is pnpm. pnpm uses a dependency management mechanism that is completely different from those of npm v7 and Yarn, which is why it is covered separately here.

pnpm saves more disk space and installs faster

A Monorepo hoists subprojects' dependencies to the root directory, which avoids some repeated installations. However, if there are multiple Monorepo projects and the same dependency appears in several of them, it is still installed once per Monorepo. So there is still room to optimize, and pnpm takes this to the extreme by keeping dependencies in a content-addressable store. As a result:

  1. If you use different versions of the same dependency, only the files that differ are added to the store. For example, if express has 100 files and a version update changes only one of them, pnpm update adds only that new file to the store instead of a full copy of the package.
  2. All files are kept in a single location on disk. When dependencies are installed, the files are hard-linked from that location into the project, so they consume no additional disk space. This lets different projects reuse the same version of a dependency.
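A content-addressable store can be modeled in a few lines: files are keyed by a hash of their contents, so identical files are stored once however many packages or versions contain them. This is a sketch, not pnpm's implementation: pnpm uses a cryptographic hash and real hard links, whereas contentHash here is a small FNV-1a hash and the per-package index (file path -> hash) stands in for the hard links. The package files are hypothetical.

```typescript
// Stand-in content hash (32-bit FNV-1a), used only to keep the sketch
// dependency-free; a real store uses a cryptographic hash.
function contentHash(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Minimal content-addressable store: each distinct file content is stored once.
class ContentStore {
  private readonly files = new Map<string, string>(); // hash -> content

  // Adds a package's files and returns its index (file path -> hash).
  // In pnpm, this mapping is realized as hard links into the project.
  addPackage(files: Record<string, string>): Record<string, string> {
    const index: Record<string, string> = {};
    Object.keys(files).forEach((path) => {
      const hash = contentHash(files[path]);
      if (!this.files.has(hash)) this.files.set(hash, files[path]);
      index[path] = hash;
    });
    return index;
  }

  get fileCount(): number {
    return this.files.size;
  }
}

const store = new ContentStore();
// Two hypothetical versions of a package that differ in a single file.
const v1 = { "index.js": "module.exports = 1;", "lib/a.js": "a", "lib/b.js": "b" };
const v2 = { ...v1, "index.js": "module.exports = 2;" };
store.addPackage(v1);
console.log(store.fileCount); // 3
store.addPackage(v2);
console.log(store.fileCount); // 4: only the changed file was added
store.addPackage(v2); // another project installing the same version
console.log(store.fileCount); // 4
```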

pnpm's new dependency management mechanism

Let's initialize a project with pnpm and install a dependency:

pnpm init -y

pnpm install express

Take a look at node_modules at this point:

node_modules
├── .pnpm
├── express
└── .modules.yaml

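node_modules contains only the .pnpm and express folders (plus the .modules.yaml metadata file). express is a symlink into .pnpm, where each package is stored together with its own dependencies.

Because only packages declared in package.json get an entry at the top level of node_modules, the project can access express but not the packages express depends on; an undeclared dependency can never be picked up by accident.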
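The layout can be expressed as a list of symlinks. In this sketch (hypothetical packages A and B, unscoped names, no peer dependencies, which also affect folder names in the real layout), only declared dependencies get a top-level link, while each package's own dependencies are linked as siblings inside its .pnpm folder, where the package itself can resolve them but the project root cannot. pnpmLayout is illustrative, not pnpm's code.

```typescript
// Simplified package graph (hypothetical names and versions).
interface Pkg {
  name: string;
  version: string;
  deps: Pkg[];
}

// Sketch of pnpm's isolated layout as symlinks: link path -> target,
// with targets relative to the folder that contains the link.
function pnpmLayout(direct: Pkg[]): Map<string, string> {
  const links = new Map<string, string>();
  const visited = new Set<string>();

  function visit(pkg: Pkg): void {
    const id = `${pkg.name}@${pkg.version}`;
    if (visited.has(id)) return;
    visited.add(id);
    // A package's dependencies are siblings in its .pnpm folder, so the
    // package can resolve them, but the project root cannot.
    for (const child of pkg.deps) {
      links.set(
        `node_modules/.pnpm/${id}/node_modules/${child.name}`,
        `../../${child.name}@${child.version}/node_modules/${child.name}`
      );
      visit(child);
    }
  }

  for (const dep of direct) {
    // Only declared dependencies get a top-level entry.
    links.set(`node_modules/${dep.name}`, `.pnpm/${dep.name}@${dep.version}/node_modules/${dep.name}`);
    visit(dep);
  }
  return links;
}

const B: Pkg = { name: "B", version: "1.0", deps: [] };
const A: Pkg = { name: "A", version: "1.0", deps: [B] };
pnpmLayout([A]).forEach((target, link) => console.log(`${link} -> ${target}`));
// node_modules/A -> .pnpm/A@1.0/node_modules/A
// node_modules/.pnpm/A@1.0/node_modules/B -> ../../B@1.0/node_modules/B
```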

pnpm and Monorepo

  1. Consistent node_modules structure: dependencies declared by a project exist only in that project's node_modules, and peerDependencies are resolved correctly.
  2. Lock entries for different projects are independent. pnpm records, for each project, which concrete version every dependency pattern resolves to; for the react@^16.13.1 problem mentioned earlier, pnpm maps that pattern to a version per project. If a project's package.json has not changed, its dependencies will not change either. (PS: Rush's lock file also depends on the underlying tool, so if Rush is used with Yarn, phantom dependencies can cause similar problems.)
  3. pnpm natively supports installing dependencies for specific subprojects only, while keeping the node_modules structure consistent.
  4. pnpm is two to three times faster than Yarn.
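Installing for specific subprojects largely comes down to selecting the right workspaces: the target plus every workspace it depends on. pnpm exposes this through its --filter option (for example, the <name>... form also selects the package's dependencies; check the pnpm filtering documentation for the exact syntax). selectWithDependencies is a hypothetical sketch of that selection over a made-up workspace graph, not pnpm's code.

```typescript
// Hypothetical workspace graph: workspace name -> workspaces it depends on.
type WorkspaceGraph = Record<string, string[]>;

// Select a workspace plus every workspace it depends on, directly or
// transitively. Already-selected names are skipped, so cycles terminate.
function selectWithDependencies(graph: WorkspaceGraph, target: string): string[] {
  const selected: string[] = [];
  const stack = [target];
  while (stack.length > 0) {
    const name = stack.pop()!;
    if (selected.indexOf(name) !== -1) continue;
    if (!(name in graph)) throw new Error(`unknown workspace: ${name}`);
    selected.push(name);
    graph[name].forEach((dep) => stack.push(dep));
  }
  return selected;
}

const graph: WorkspaceGraph = {
  "package-1": ["package-2"],
  "package-2": ["package-3"],
  "package-3": [],
  "package-4": ["package-1"], // a dependent of package-1, so not selected
};
console.log(selectWithDependencies(graph, "package-1"));
// [ 'package-1', 'package-2', 'package-3' ]
```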
