This article was first published on GitHub, where more articles can be found.
What is Yarn Duplicate
If you use Yarn as your package manager, you may notice that a build bundles several versions of the same package, even when those versions are compatible with each other.
For example, assume the following dependencies:
When (p)npm installs a module with the same name as one that is already installed, it checks whether the installed version falls within the version range requested by the new module. If it does, the installation is skipped; if not, the module is installed under the node_modules of the current module. That is, lib-a will reuse [email protected], which the app depends on.
However, with Yarn v1 as the package manager, lib-a will install a separate [email protected].
- difference between npm and yarn behavior with nested dependencies #3951
- Yarn installing multiple versions of the same package
- Yarn v2 supports package deduplication natively
🤔 Think about it: is everything fine if your app project depends on lib-b@^1.1.0?
When the app installs lib-b@^1.1.0 and the latest version of lib-b at that moment is 1.1.0, lib-b@1.1.0 is locked in yarn.lock.
If the latest version of lib-b later becomes 1.2.0, the Yarn duplicate problem still appears. The problem is therefore quite common.
Although our company's monorepo has been migrated to Rush and pnpm, many projects still use Yarn as their package manager and have no migration plan.
For such projects, the yarn-deduplicate command-line tool can rewrite yarn.lock to remove the duplicates.
Yarn-deduplicate – The Hero We Need
Basic usage
Modify yarn.lock using the default strategy:
npx yarn-deduplicate yarn.lock
Handling strategy
--strategy <strategy>
Highest strategy
This is the default strategy. It tries to use the highest version that is already installed.
For example, given the following yarn.lock:
library@^1.0.0:
  version "1.0.0"
library@^1.1.0:
  version "1.1.0"
library@^1.0.0:
  version "1.3.0"
After modification, the result is as follows:
library@^1.0.0, library@^1.1.0:
  version "1.3.0"
Both library@^1.0.0 and library@^1.1.0 are locked to 1.3.0, the highest version currently installed.
Example 2:
Change library@^1.1.0 to library@1.1.0:
library@^1.0.0:
  version "1.0.0"
library@1.1.0:
  version "1.1.0"
library@^1.0.0:
  version "1.3.0"
After modification, the result is as follows:
library@1.1.0:
  version "1.1.0"
library@^1.0.0:
  version "1.3.0"
library@1.1.0 remains unchanged, and library@^1.0.0 is unified to 1.3.0, the highest version currently installed.
Fewer strategy
This strategy tries to end up with the smallest number of installed versions. That means the fewest versions, not the lowest version; when candidates tie on count, the highest version wins.
Example 1:
library@^1.0.0:
  version "1.0.0"
library@^1.1.0:
  version "1.1.0"
library@^1.0.0:
  version "1.3.0"
After modification, the result is as follows:
library@^1.0.0, library@^1.1.0:
  version "1.3.0"
Note: This is the same result as with the highest strategy.
Example 2:
Change library@^1.1.0 to library@1.1.0:
library@^1.0.0:
  version "1.0.0"
library@1.1.0:
  version "1.1.0"
library@^1.0.0:
  version "1.3.0"
After modification, the result is as follows:
library@^1.0.0, library@1.1.0:
  version "1.1.0"
Version 1.1.0 is chosen because it satisfies both entries, which minimizes the number of installed versions.
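The difference between the two strategies can be reproduced with a small sketch. It is not the code of yarn-deduplicate: `satisfies` is a deliberately simplified stand-in for the semver library (it only understands exact versions and caret ranges with a major version of 1 or higher), and `dedupe`, `coverage`, and the entry shape are hypothetical names chosen for illustration.

```javascript
// Simplified stand-in for semver (assumption): only "x.y.z" and "^x.y.z" with major >= 1.
function parse(version) {
  return version.split(".").map(Number);
}

function compareVersions(a, b) {
  const pa = parse(a);
  const pb = parse(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

function satisfies(version, range) {
  if (!range.startsWith("^")) return version === range;
  const base = range.slice(1);
  return parse(version)[0] === parse(base)[0] && compareVersions(version, base) >= 0;
}

// entries: [{ range, installed }], one item per yarn.lock entry (hypothetical shape).
function dedupe(entries, strategy) {
  const installed = [...new Set(entries.map((e) => e.installed))];
  // Number of lock entries that each installed version could serve.
  const coverage = new Map(
    installed.map((v) => [v, entries.filter((e) => satisfies(v, e.range)).length])
  );
  const result = {};
  for (const { range } of entries) {
    const candidates = installed.filter((v) => satisfies(v, range));
    candidates.sort((a, b) => {
      // "fewer": prefer the version that serves the most entries.
      if (strategy === "fewer" && coverage.get(a) !== coverage.get(b)) {
        return coverage.get(b) - coverage.get(a);
      }
      // "highest", or a tie under "fewer": prefer the higher version.
      return compareVersions(b, a);
    });
    result[range] = candidates[0];
  }
  return result;
}

// Example 2 from the text.
const example2 = [
  { range: "^1.0.0", installed: "1.0.0" },
  { range: "1.1.0", installed: "1.1.0" },
  { range: "^1.0.0", installed: "1.3.0" },
];
console.log(dedupe(example2, "highest")); // ^1.0.0 -> 1.3.0, 1.1.0 -> 1.1.0
console.log(dedupe(example2, "fewer")); // ^1.0.0 -> 1.1.0, 1.1.0 -> 1.1.0
```

Under fewer, 1.1.0 wins for ^1.0.0 because it is the only installed version that can serve every entry, even though 1.3.0 is higher.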
Incremental change
Deduplicating everything in one go is fast, but it can be risky, so the tool also supports incremental changes.
--packages <package1> <package2> <packageN>
Only process the specified packages.
--scopes <scope1> <scope2> <scopeN>
Only process packages under the specified scopes.
Diagnostic information
--list
Only print diagnostic information, without changing yarn.lock.
How yarn-deduplicate works
Basic flow
Looking at the package.json of yarn-deduplicate, you can see that it depends on the following packages:
- commander: a complete Node.js command-line solution;
- @yarnpkg/lockfile: parses and writes yarn.lock;
- semver: the semantic versioner for npm, used to determine whether an installed version meets the version required in package.json.
There are two main files in the source code:
- cli.js: command-line handling. It parses the arguments and calls the methods in index.js accordingly.
- index.js: the main logic.
The key point can be found in getDuplicatedPackages.
Get Duplicated Packages
First, clarify the implementation idea of getDuplicatedPackages.
Assume the following yarn.lock exists; the goal is to find the bestVersion of lodash@^4.17.15.
lodash@^4.17.15:
  version "4.17.21"
lodash@4.17.16:
  version "4.17.16"
- From yarn.lock, work out that lodash@^4.17.15 has the requestedVersion ^4.17.15 and the installedVersion 4.17.21;
- Collect every installedVersion that satisfies the requestedVersion (^4.17.15), i.e. 4.17.21 and 4.17.16;
- From those installedVersions, pick the bestVersion that fits the current strategy (if the current strategy is fewer, the bestVersion of lodash@^4.17.15 is 4.17.16; otherwise it is 4.17.21).
👆🏻 This process is important and serves as a guiding principle for subsequent code.
Type definitions
const getDuplicatedPackages = (
  json: YarnLock,
  options: Options
): DuplicatedPackages => {
  // todo
};

// The object obtained by parsing yarn.lock
interface YarnLock {
  [key: string]: YarnLockVal;
}

interface YarnLockVal {
  version: string; // installedVersion
  resolved: string;
  integrity: string;
  dependencies: {
    [key: string]: string;
  };
}

// It looks something like this
const yarnLockInstanceExample = {
  // ...
  "lodash@^4.17.15": {
    version: "4.17.21",
    resolved:
      "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz#679591c564c3bffaae8454cf0b3df370c3d6911c",
    integrity:
      "sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg==",
    dependencies: {
      "fake-lib-x": "^1.0.0", // lodash does not actually have dependencies
    },
  },
  // ...
};

// Parsed from the command-line arguments
interface Options {
  includeScopes: string[]; // Scopes to process; defaults to []
  includePackages: string[]; // Packages to process; defaults to []
  excludePackages: string[]; // Packages to skip; defaults to []
  useMostCommon: boolean; // true when the strategy is fewer
  includePrerelease: boolean; // Whether to consider prerelease versions; defaults to false
}

type DuplicatedPackages = PackageInstance[];

interface PackageInstance {
  name: string; // Package name, e.g. lodash
  bestVersion: string; // The best version under the current strategy
  requestedVersion: string; // Requested version, e.g. ^15.6.2
  installedVersion: string; // Installed version, e.g. 15.7.2
}
The ultimate goal is to get the PackageInstance.
Reading the yarn.lock data
const fs = require("fs");
const lockfile = require("@yarnpkg/lockfile");

const parseYarnLock = (file) => lockfile.parse(file).object;

// The file path is obtained from the command-line arguments via commander
const yarnLock = fs.readFileSync(file, "utf8");
const json = parseYarnLock(yarnLock);
Reshaping the yarn.lock object
We need to filter the packages down to the range specified by the Options parameter.
In addition, the keys of the yarn.lock object take the form lodash@^4.17.15, which makes it awkward to look up data by package name.
Instead, we can use the package name (lodash) as the key and an array as the value, where each array item describes one version entry. So we convert the yarn.lock object into the following ExtractedPackages structure.
interface ExtractedPackages {
  [key: string]: ExtractedPackage[];
}

interface ExtractedPackage {
  pkg: YarnLockVal;
  name: string;
  requestedVersion: string;
  installedVersion: string;
  satisfiedBy: Set<string>;
}
satisfiedBy stores every installedVersion that satisfies the requestedVersion of the entry. Its initial value is new Set().
The installedVersion in this set that best fits the strategy is the bestVersion.
The implementation is as follows:
const extractPackages = (
  json,
  includeScopes = [],
  includePackages = [],
  excludePackages = []
) => {
  const packages = {};
  // Regular expression that matches a key of the yarn.lock object
  const re = /^(.*)@([^@]*?)$/;

  Object.keys(json).forEach((name) => {
    const pkg = json[name];
    const match = name.match(re);
    let packageName, requestedVersion;
    if (match) {
      [, packageName, requestedVersion] = match;
    } else {
      // No match means that no version was specified, which is equivalent to "*"
      // (https://docs.npmjs.com/files/package.json#dependencies)
      packageName = name;
      requestedVersion = "*";
    }

    // Filter the packages according to the options
    // If scopes are specified, only process packages under those scopes
    if (
      includeScopes.length > 0 &&
      !includeScopes.find((scope) => packageName.startsWith(`${scope}/`))
    ) {
      return;
    }
    // If packages are specified, only process those packages
    if (includePackages.length > 0 && !includePackages.includes(packageName))
      return;
    if (excludePackages.length > 0 && excludePackages.includes(packageName))
      return;

    packages[packageName] = packages[packageName] || [];
    packages[packageName].push({
      pkg,
      name: packageName,
      requestedVersion,
      installedVersion: pkg.version,
      satisfiedBy: new Set(),
    });
  });

  return packages;
};
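The key-matching regular expression can be tried in isolation. The following is a minimal sketch; `parseLockKey` is a hypothetical helper name, and the scoped key `@babel/core@^7.0.0` is only an illustrative input. Because the first group is greedy and the second group excludes `@`, the split happens at the last `@`, so the leading `@` of a scoped package stays in the name.

```javascript
// Same pattern as in extractPackages: split a lock key at its last "@".
const KEY_PATTERN = /^(.*)@([^@]*?)$/;

// Hypothetical helper: turn a yarn.lock key into { name, requestedVersion }.
function parseLockKey(key) {
  const match = key.match(KEY_PATTERN);
  if (!match) {
    // A key without "@" carries no version, which is treated as "*".
    return { name: key, requestedVersion: "*" };
  }
  const [, name, requestedVersion] = match;
  return { name, requestedVersion };
}

console.log(parseLockKey("lodash@^4.17.15"));
console.log(parseLockKey("@babel/core@^7.0.0"));
console.log(parseLockKey("lodash"));
```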
After extracting the packages, the information about the different versions of the same package is grouped together.
{
  // ...
  "lodash": [
    {
      "pkg": YarnLockVal,
      "name": "lodash",
      "requestedVersion": "^4.17.15",
      "installedVersion": "4.17.21",
      "satisfiedBy": new Set()
    },
    {
      "pkg": YarnLockVal,
      "name": "lodash",
      "requestedVersion": "4.17.16",
      "installedVersion": "4.17.16",
      "satisfiedBy": new Set()
    }
  ]
}
We now need to fill in the satisfiedBy field of each array item and use it to compute the bestVersion for that item's requestedVersion. This step is called computePackageInstances.
Compute Package Instances
Related types are defined as follows:
const computePackageInstances = (
  packages: ExtractedPackages,
  name: string,
  useMostCommon: boolean,
  includePrerelease = false
): PackageInstance[] => {
  // todo
};

// The final goal
interface PackageInstance {
  name: string; // Package name, e.g. lodash
  bestVersion: string; // The best version under the current strategy
  requestedVersion: string; // Requested version, e.g. ^15.6.2
  installedVersion: string; // Installed version, e.g. 15.7.2
}
computePackageInstances can be implemented in three steps:
- Get every installedVersion of the current package;
- Fill in the satisfiedBy field;
- Compute the bestVersion from satisfiedBy.
Get all installedVersions
/**
 * versions records every installedVersion of the current package.
 * The satisfies field stores the requestedVersions that this installedVersion
 * satisfies; its initial value is new Set().
 * The size of this set tells how many requestedVersions an installedVersion
 * satisfies, which is what the fewer strategy relies on.
 */
interface Versions {
  [key: string]: { pkg: YarnLockVal; satisfies: Set<string> };
}

// The entries that belong to the current package name
const packageInstances = packages[name];
const versions = packageInstances.reduce((versions, packageInstance) => {
  if (packageInstance.installedVersion in versions) return versions;
  versions[packageInstance.installedVersion] = {
    pkg: packageInstance.pkg,
    satisfies: new Set(),
  };
  return versions;
}, {} as Versions);
The satisfies field of each version stores all the requestedVersions that this installedVersion satisfies, with an initial value of new Set(). The size of this set shows which installedVersion satisfies the most requestedVersions, which is what the fewer strategy needs.
Fill in the satisfiedBy and satisfies fields
// Iterate over all installedVersions
Object.keys(versions).forEach((version) => {
  const satisfies = versions[version].satisfies;
  // Iterate over the packageInstances one by one
  packageInstances.forEach((packageInstance) => {
    // A packageInstance's own installedVersion always satisfies its requestedVersion
    packageInstance.satisfiedBy.add(packageInstance.installedVersion);
    if (
      semver.satisfies(version, packageInstance.requestedVersion, {
        includePrerelease,
      })
    ) {
      satisfies.add(packageInstance);
      packageInstance.satisfiedBy.add(version);
    }
  });
});
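To see what the two nested loops produce, here is a minimal trace for the lodash example. It is a sketch under assumptions: `satisfiesRange` is a simplified replacement for semver.satisfies that only handles exact versions and caret ranges with a major version of 1 or higher, and the pkg field is left out for brevity.

```javascript
// Simplified matcher (assumption): exact "x.y.z" or "^x.y.z" with major >= 1.
function satisfiesRange(version, range) {
  if (!range.startsWith("^")) return version === range;
  const [major, minor, patch] = version.split(".").map(Number);
  const [baseMajor, baseMinor, basePatch] = range.slice(1).split(".").map(Number);
  if (major !== baseMajor) return false;
  if (minor !== baseMinor) return minor > baseMinor;
  return patch >= basePatch;
}

// The two lodash entries from the example, without the pkg field.
const instances = [
  { requestedVersion: "^4.17.15", installedVersion: "4.17.21", satisfiedBy: new Set() },
  { requestedVersion: "4.17.16", installedVersion: "4.17.16", satisfiedBy: new Set() },
];
const versions = {
  "4.17.21": { satisfies: new Set() },
  "4.17.16": { satisfies: new Set() },
};

// Same double loop as above, with the simplified matcher swapped in.
for (const version of Object.keys(versions)) {
  for (const instance of instances) {
    instance.satisfiedBy.add(instance.installedVersion);
    if (satisfiesRange(version, instance.requestedVersion)) {
      versions[version].satisfies.add(instance);
      instance.satisfiedBy.add(version);
    }
  }
}

console.log(versions["4.17.21"].satisfies.size); // 1: only ^4.17.15
console.log(versions["4.17.16"].satisfies.size); // 2: ^4.17.15 and 4.17.16
console.log([...instances[0].satisfiedBy]); // 4.17.21 and 4.17.16
console.log([...instances[1].satisfiedBy]); // 4.17.16 only
```

Version 4.17.16 serves both requests while 4.17.21 serves only one, which is why the fewer strategy prefers 4.17.16 for lodash@^4.17.15.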
Compute bestVersion from satisfiedBy and satisfies
packageInstances.forEach((packageInstance) => {
  const candidateVersions = Array.from(packageInstance.satisfiedBy);
  // Sort the candidates
  candidateVersions.sort((versionA, versionB) => {
    // With the fewer strategy, first sort by the size of each version's satisfies set
    if (useMostCommon) {
      if (versions[versionB].satisfies.size > versions[versionA].satisfies.size)
        return 1;
      if (versions[versionB].satisfies.size < versions[versionA].satisfies.size)
        return -1;
    }
    // With the highest strategy (or on a tie), prefer the highest version
    return semver.rcompare(versionA, versionB, { includePrerelease });
  });
  packageInstance.satisfiedBy = candidateVersions;
  packageInstance.bestVersion = candidateVersions[0];
});

return packageInstances;
This gives us, for each requested version of the same package, both its installedVersion and its bestVersion.
Complete getDuplicatedPackages
const getDuplicatedPackages = (
  json,
  {
    includeScopes,
    includePackages,
    excludePackages,
    useMostCommon,
    includePrerelease = false,
  }
) => {
  const packages = extractPackages(
    json,
    includeScopes,
    includePackages,
    excludePackages
  );

  return Object.keys(packages)
    .reduce(
      (acc, name) =>
        acc.concat(
          computePackageInstances(
            packages,
            name,
            useMostCommon,
            includePrerelease
          )
        ),
      []
    )
    .filter(
      ({ bestVersion, installedVersion }) => bestVersion !== installedVersion
    );
};
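The article stops at finding the duplicates. As a rough illustration of how such a result could be applied to the parsed lock object, the sketch below points each duplicated key at an existing entry that is already installed at bestVersion. This is an assumption made for illustration, not necessarily how yarn-deduplicate rewrites the file; `applyBestVersions` is a hypothetical name.

```javascript
// lock: parsed yarn.lock object keyed by "name@range".
// duplicates: items shaped like PackageInstance (name, requestedVersion, bestVersion).
function applyBestVersions(lock, duplicates) {
  const updated = { ...lock };
  for (const { name, requestedVersion, bestVersion } of duplicates) {
    // Look for an entry of the same package that is already installed at bestVersion.
    const donorKey = Object.keys(lock).find(
      (key) => key.startsWith(`${name}@`) && lock[key].version === bestVersion
    );
    // Reuse that entry for the duplicated key; leave the key alone if none is found.
    if (donorKey) updated[`${name}@${requestedVersion}`] = lock[donorKey];
  }
  return updated;
}

// The lodash example under the fewer strategy.
const lock = {
  "lodash@^4.17.15": { version: "4.17.21" },
  "lodash@4.17.16": { version: "4.17.16" },
};
const duplicates = [
  {
    name: "lodash",
    requestedVersion: "^4.17.15",
    installedVersion: "4.17.21",
    bestVersion: "4.17.16",
  },
];
const fixed = applyBestVersions(lock, duplicates);
console.log(fixed["lodash@^4.17.15"].version); // 4.17.16
```

The input object is copied rather than mutated, so the original lock data stays available for comparison.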
Conclusion
This article described the Yarn duplicate problem, introduced yarn-deduplicate as a solution, and walked through its internal implementation. Here's looking forward to the arrival of Yarn v2.