This article was first published on GitHub, where more articles are available.

What is Yarn Duplicate

If you use Yarn as your package manager, you may notice that a build bundles several versions of the same package, even when those versions are compatible with each other.

For example, assume the app depends on both lib-a and lib-b, and lib-a in turn depends on a compatible range of lib-b.

When (p)npm installs a module that is already present, it checks whether the installed version satisfies the version range requested by the new dependent. If it does, the installation is skipped; if not, the module is installed under the node_modules of the dependent module. In this example, lib-a reuses the version of lib-b that the app depends on.

With Yarn v1 as the package manager, however, lib-a gets its own separate copy of lib-b.

  • difference between npm and yarn behavior with nested dependencies #3951
  • Yarn installing multiple versions of the same package
  • Yarn v2 supports package deduplication natively

🤔 Think about it: what if the app depends on lib-b@^1.1.0, the very same range? Is everything fine then?

Suppose the latest version of lib-b is 1.1.0 when the app installs lib-b@^1.1.0. In that case, lib-b@1.1.0 is locked in yarn.lock.

Some time later, when the latest version of lib-b has become 1.2.0, a Yarn duplicate can still appear. The problem is therefore quite common.

Although our company's monorepo projects have been migrated to Rush and pnpm, many projects still use Yarn as the underlying package manager and have no migration plan.

For such projects, the yarn-deduplicate command-line tool can rewrite yarn.lock to remove the duplicates.

yarn-deduplicate – The Hero We Need

Basic usage

Modify yarn.lock using the default strategy:

npx yarn-deduplicate yarn.lock

Handling strategy

--strategy <strategy>

Highest strategy

This is the default strategy. It tries to use the highest installed version.

For example, given the following yarn.lock:

library@^1.0.0:
  version "1.0.0"

library@^1.1.0:
  version "1.1.0"

library@^1.0.0:
  version "1.3.0"

After deduplication, the result is:

library@^1.0.0, library@^1.1.0:
  version "1.3.0"

Both library@^1.0.0 and library@^1.1.0 are locked to 1.3.0, the highest version currently installed.

Example 2:

Change library@^1.1.0 to library@1.1.0:

library@^1.0.0:
  version "1.0.0"

library@1.1.0:
  version "1.1.0"

library@^1.0.0:
  version "1.3.0"

After deduplication, the result is:

library@1.1.0:
  version "1.1.0"

library@^1.0.0:
  version "1.3.0"

library@1.1.0 stays unchanged, while library@^1.0.0 is unified to 1.3.0, the highest version currently installed.

Fewer strategy

This strategy tries to end up with the fewest installed packages. Note that this means the smallest number of copies, not the lowest version; when several versions lead to the same number of copies, the highest one is used.

Example 1:

library@^1.0.0:
  version "1.0.0"

library@^1.1.0:
  version "1.1.0"

library@^1.0.0:
  version "1.3.0"

After deduplication, the result is:

library@^1.0.0, library@^1.1.0:
  version "1.3.0"

Note: this is the same result as with the highest strategy.

Example 2:

Change library@^1.1.0 to library@1.1.0:

library@^1.0.0:
  version "1.0.0"

library@1.1.0:
  version "1.1.0"

library@^1.0.0:
  version "1.3.0"

After deduplication, the result is:

library@^1.0.0, library@1.1.0:
  version "1.1.0"

Version 1.1.0 is chosen because it keeps the number of installed versions to a minimum.
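The difference between the two strategies in Example 2 can be sketched as a sort over the candidate versions. This is only an illustration: `pickBest`, `compareVersions`, and the hand-counted `satisfiesCount` table are assumptions made for this example and are not part of yarn-deduplicate's API. The version comparison handles plain x.y.z versions only.

```javascript
// Example 2: how many distinct requested ranges each installed version satisfies.
// Counted by hand: 1.1.0 satisfies both ^1.0.0 and the exact 1.1.0 request,
// while 1.0.0 and 1.3.0 satisfy only ^1.0.0.
const satisfiesCount = { "1.0.0": 1, "1.1.0": 2, "1.3.0": 1 };

// Compare plain "x.y.z" versions numerically (no prerelease handling).
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Hypothetical helper: choose the best candidate for one requested range.
function pickBest(candidates, counts, strategy) {
  const sorted = [...candidates].sort((a, b) => {
    if (strategy === "fewer" && counts[a] !== counts[b]) {
      return counts[b] - counts[a]; // most widely shared version first
    }
    return compareVersions(b, a); // otherwise highest version first
  });
  return sorted[0];
}

// Installed versions that satisfy library@^1.0.0
const candidates = ["1.0.0", "1.1.0", "1.3.0"];

console.log(pickBest(candidates, satisfiesCount, "highest")); // 1.3.0
console.log(pickBest(candidates, satisfiesCount, "fewer")); // 1.1.0
```

With the highest strategy the count table is ignored; with fewer it takes priority and the version number only breaks ties.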

Incremental change

Rewriting the whole lockfile in one go is fast, but it can be risky, so the tool also supports incremental changes.

--packages <package1> <package2> <packageN>

Deduplicate only the specified packages.

--scopes <scope1> <scope2> <scopeN>

Deduplicate only the packages under the specified scopes.

Diagnostic information

--list

Only display diagnostic information.

How yarn-deduplicate works

Basic flow

The package.json of yarn-deduplicate shows that it depends on the following packages:

  • commander: a complete Node.js command-line solution;
  • @yarnpkg/lockfile: parses and writes yarn.lock;
  • semver: the semantic versioner for npm, used here to check whether an installed version satisfies the requested version range.

There are two main files in the source code:

  1. cli.js: the command-line layer. It parses the arguments and calls the corresponding methods in index.js.
  2. index.js: the main logic.

The key point can be found in getDuplicatedPackages.

Get Duplicated Packages

First, clarify the implementation idea of getDuplicatedPackages.

Assume the following yarn.lock; the goal is to find the bestVersion for lodash@^4.17.15.

lodash@^4.17.15:
  version "4.17.21"

lodash@4.17.16:
  version "4.17.16"

  1. From yarn.lock, determine that for lodash@^4.17.15 the requestedVersion is ^4.17.15 and the installedVersion is 4.17.21;
  2. Collect every installedVersion that satisfies the requestedVersion (^4.17.15), namely 4.17.21 and 4.17.16;
  3. From these installedVersions, pick the bestVersion under the current strategy (if the strategy is fewer, the bestVersion for lodash@^4.17.15 is 4.17.16; otherwise it is 4.17.21).
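
Steps 1 and 2 can be sketched as follows for the lodash example. The `satisfies` function here is a simplified stand-in for `semver.satisfies`, written for illustration only: it understands just exact versions and plain `^x.y.z` ranges with a major version of 1 or higher. `findSatisfyingVersions` is a hypothetical name.

```javascript
// The two lock entries from the example, as { requestedVersion, installedVersion }.
const entries = [
  { requestedVersion: "^4.17.15", installedVersion: "4.17.21" },
  { requestedVersion: "4.17.16", installedVersion: "4.17.16" },
];

// Simplified stand-in for semver.satisfies: exact versions and "^x.y.z" only.
function satisfies(version, range) {
  if (!range.startsWith("^")) return version === range;
  const [major, minor, patch] = version.split(".").map(Number);
  const [rMajor, rMinor, rPatch] = range.slice(1).split(".").map(Number);
  if (major !== rMajor) return false; // a caret never crosses a major version
  if (minor !== rMinor) return minor > rMinor;
  return patch >= rPatch;
}

// Step 2: every installed version that satisfies the requested range.
function findSatisfyingVersions(requestedVersion, allEntries) {
  const installed = new Set(allEntries.map((e) => e.installedVersion));
  return [...installed].filter((v) => satisfies(v, requestedVersion));
}

console.log(findSatisfyingVersions("^4.17.15", entries)); // [ '4.17.21', '4.17.16' ]
console.log(findSatisfyingVersions("4.17.16", entries)); // [ '4.17.16' ]
```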

👆🏻 This process is important and serves as a guiding principle for subsequent code.

Type definitions

const getDuplicatedPackages = (
  json: YarnLock,
  options: Options
): DuplicatedPackages => {
  // todo
};

// The object obtained by parsing yarn.lock
interface YarnLock {
  [key: string]: YarnLockVal;
}

interface YarnLockVal {
  version: string; // installedVersion
  resolved: string;
  integrity: string;
  dependencies: {
    [key: string]: string;
  };
}

// Something like this
const yarnLockInstanceExample = {
  // ...
  "lodash@^4.17.15": {
    version: "4.17.21",
    resolved:
      "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz#679591c564c3bffaae8454cf0b3df370c3d6911c",
    integrity:
      "sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg==",
    dependencies: {
      "fake-lib-x": "^1.0.0", // for illustration only: lodash has no dependencies
    },
  },
  // ...
};

// Parsed from the command-line arguments
interface Options {
  includeScopes: string[]; // scopes to process; defaults to []
  includePackages: string[]; // packages to process; defaults to []
  excludePackages: string[]; // packages to skip; defaults to []
  useMostCommon: boolean; // true when the strategy is fewer
  includePrerelease: boolean; // whether to consider prerelease versions; defaults to false
}

type DuplicatedPackages = PackageInstance[];

interface PackageInstance {
  name: string; // package name, e.g. lodash
  bestVersion: string; // best version under the current strategy
  requestedVersion: string; // requested version, e.g. ^15.6.2
  installedVersion: string; // installed version, e.g. 15.7.2
}

The ultimate goal is to obtain the list of PackageInstance objects.

Reading the yarn.lock data

const fs = require("fs");
const lockfile = require("@yarnpkg/lockfile");

const parseYarnLock = (file) => lockfile.parse(file).object;

// `file` is the lockfile path, obtained from the command-line arguments via commander
const yarnLock = fs.readFileSync(file, "utf8");
const json = parseYarnLock(yarnLock);

Reshaping the yarn.lock object

We need to filter the packages according to the ranges specified in the Options parameter.

In addition, the keys of the yarn.lock object have the form lodash@^4.17.15, which makes it awkward to look up data by package name.

Instead, we can use the package name (lodash) as the key and an array as the value, where each array item describes one version entry. This makes later processing easier. In the end, the yarn.lock object is converted into the following ExtractedPackages structure.

interface ExtractedPackages {
  [key: string]: ExtractedPackage[];
}

interface ExtractedPackage {
  pkg: YarnLockVal;
  name: string;
  requestedVersion: string;
  installedVersion: string;
  satisfiedBy: Set<string>;
}

satisfiedBy stores every installedVersion that satisfies the package's requestedVersion. Its default value is new Set().

From this set, the installedVersion that best fits the current strategy is picked; that version is the bestVersion.

The specific implementation is as follows:

const extractPackages = (
  json,
  includeScopes = [],
  includePackages = [],
  excludePackages = []
) => {
  const packages = {};
  // Regular expression matching the keys of the yarn.lock object
  const re = /^(.*)@([^@]*?)$/;

  Object.keys(json).forEach((name) => {
    const pkg = json[name];
    const match = name.match(re);

    let packageName, requestedVersion;
    if (match) {
      [, packageName, requestedVersion] = match;
    } else {
      // No match means no version was specified, which is treated as "*"
      // (https://docs.npmjs.com/files/package.json#dependencies)
      packageName = name;
      requestedVersion = "*";
    }

    // Filter packages according to the ranges given in the parameters

    // If scopes are specified, only packages under those scopes are processed
    if (
      includeScopes.length > 0 &&
      !includeScopes.find((scope) => packageName.startsWith(`${scope}/`))
    ) {
      return;
    }

    // If packages are specified, only those packages are processed
    if (includePackages.length > 0 && !includePackages.includes(packageName))
      return;

    if (excludePackages.length > 0 && excludePackages.includes(packageName))
      return;

    packages[packageName] = packages[packageName] || [];
    packages[packageName].push({
      pkg,
      name: packageName,
      requestedVersion,
      installedVersion: pkg.version,
      satisfiedBy: new Set(),
    });
  });

  return packages;
};
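
To see what the key regular expression does, here is a small check. `splitKey` is a hypothetical wrapper around the same pattern, and the sample keys are made up for illustration. Because the first group is greedy, the split happens at the last `@`, so scoped package names keep their leading `@`.

```javascript
// Same pattern as in extractPackages: name, "@", then a range without "@".
const re = /^(.*)@([^@]*?)$/;

// Hypothetical helper mirroring the match / fallback logic of extractPackages.
function splitKey(key) {
  const match = key.match(re);
  if (!match) return { packageName: key, requestedVersion: "*" };
  const [, packageName, requestedVersion] = match;
  return { packageName, requestedVersion };
}

console.log(splitKey("lodash@^4.17.15"));
// { packageName: 'lodash', requestedVersion: '^4.17.15' }
console.log(splitKey("@babel/core@^7.0.0"));
// { packageName: '@babel/core', requestedVersion: '^7.0.0' }
console.log(splitKey("lodash"));
// { packageName: 'lodash', requestedVersion: '*' }
```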

After extraction, the entries for the different versions of the same package are grouped together:

{
  // ...
  "lodash": [
    {
      "pkg": YarnLockVal,
      "name": "lodash",
      "requestedVersion": "^4.17.15",
      "installedVersion": "4.17.21",
      "satisfiedBy": new Set()
    },
    {
      "pkg": YarnLockVal,
      "name": "lodash",
      "requestedVersion": "4.17.16",
      "installedVersion": "4.17.16",
      "satisfiedBy": new Set()
    }
  ]
}

We now need to fill in the satisfiedBy field of each array item and compute from it the bestVersion that satisfies the item's requestedVersion. This step is called computePackageInstances.

Compute Package Instances

Related types are defined as follows:

const computePackageInstances = (
  packages: ExtractedPackages,
  name: string,
  useMostCommon: boolean,
  includePrerelease = false
): PackageInstance[] => {
  // todo
};

// The final goal
interface PackageInstance {
  name: string; // package name, e.g. lodash
  bestVersion: string; // best version under the current strategy
  requestedVersion: string; // requested version, e.g. ^15.6.2
  installedVersion: string; // installed version, e.g. 15.7.2
}

computePackageInstances can be implemented in three steps:

  1. Get all installedVersions of the current package;
  2. Fill in the satisfiedBy field;
  3. Compute the bestVersion from satisfiedBy.

Get all installedVersions

/**
 * Versions records every installedVersion of the current package.
 * The satisfies field stores the package instances (one per requestedVersion)
 * that the installedVersion satisfies; its initial value is new Set().
 * The size of this set tells how many requestedVersions an installedVersion
 * satisfies, which is what the fewer strategy needs.
 */
interface Versions {
  [key: string]: { pkg: YarnLockVal; satisfies: Set<ExtractedPackage> };
}

// yarn.lock entries for the current package name
const packageInstances = packages[name];

const versions = packageInstances.reduce((versions, packageInstance) => {
  if (packageInstance.installedVersion in versions) return versions;
  versions[packageInstance.installedVersion] = {
    pkg: packageInstance.pkg,
    satisfies: new Set(),
  };
  return versions;
}, {} as Versions);

The satisfies field of each version collects every request that the installedVersion satisfies, starting from new Set(). The size of this set shows which installedVersion satisfies the most requestedVersions, and the fewer strategy relies on that number.

Fill in the satisfiedBy and satisfies fields

// Iterate over all installedVersions
Object.keys(versions).forEach((version) => {
  const satisfies = versions[version].satisfies;
  // Check each packageInstance in turn
  packageInstances.forEach((packageInstance) => {
    // A packageInstance's installedVersion always satisfies its own requestedVersion
    packageInstance.satisfiedBy.add(packageInstance.installedVersion);
    if (
      semver.satisfies(version, packageInstance.requestedVersion, {
        includePrerelease,
      })
    ) {
      satisfies.add(packageInstance);
      packageInstance.satisfiedBy.add(version);
    }
  });
});

Compute bestVersion from satisfiedBy and satisfies

packageInstances.forEach((packageInstance) => {
  const candidateVersions = Array.from(packageInstance.satisfiedBy);
  // Sort the candidates so that the best one comes first
  candidateVersions.sort((versionA, versionB) => {
    // With the fewer strategy, sort by the size of each version's satisfies set first
    if (useMostCommon) {
      if (versions[versionB].satisfies.size > versions[versionA].satisfies.size)
        return 1;
      if (versions[versionB].satisfies.size < versions[versionA].satisfies.size)
        return -1;
    }
    // With the highest strategy (or on a tie), the highest version comes first
    return semver.rcompare(versionA, versionB, { includePrerelease });
  });
  packageInstance.satisfiedBy = candidateVersions;
  packageInstance.bestVersion = candidateVersions[0];
});

return packageInstances;

This gives us, for each requested version of a package, both its installedVersion and its bestVersion.

Completing getDuplicatedPackages

const getDuplicatedPackages = (
  json,
  {
    includeScopes,
    includePackages,
    excludePackages,
    useMostCommon,
    includePrerelease = false,
  }
) => {
  const packages = extractPackages(
    json,
    includeScopes,
    includePackages,
    excludePackages
  );
  return Object.keys(packages)
    .reduce(
      (acc, name) =>
        acc.concat(
          computePackageInstances(
            packages,
            name,
            useMostCommon,
            includePrerelease
          )
        ),
      []
    )
    .filter(
      ({ bestVersion, installedVersion }) => bestVersion !== installedVersion
    );
};
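
As a small illustration of that final filter, assume computePackageInstances returned the two lodash instances below under the fewer strategy. The values come from the earlier lodash example, but the array itself is hand-written for this sketch. Only the entry whose bestVersion differs from its installedVersion is reported as a duplicate.

```javascript
// Hand-written result for the lodash example under the "fewer" strategy.
const instances = [
  {
    name: "lodash",
    requestedVersion: "^4.17.15",
    installedVersion: "4.17.21",
    bestVersion: "4.17.16",
  },
  {
    name: "lodash",
    requestedVersion: "4.17.16",
    installedVersion: "4.17.16",
    bestVersion: "4.17.16",
  },
];

// Keep only the instances that can be moved to a better version.
const duplicated = instances.filter(
  ({ bestVersion, installedVersion }) => bestVersion !== installedVersion
);

console.log(duplicated.map((d) => `${d.name}@${d.requestedVersion}`));
// [ 'lodash@^4.17.15' ]
```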

Conclusion

This article introduced the Yarn duplicate problem, presented yarn-deduplicate as a solution, and analyzed its internal implementation. We look forward to the arrival of Yarn v2.