Task scheduling mechanism in Monorepo

preface

A project in Monorepo is called a project, and specific operations performed on the project are called tasks, such as build or test, which can be narrowly understood as operations registered in NPM scripts. The Monorepo administration tool should be able to schedule these tasks.

Take a look at 🌰 first. As shown in the figure above, there is a Monorepo with complex dependencies, and a task, such as build, needs to be executed. How to ensure the task execution sequence and task execution efficiency at the same time (assuming the maximum task parallelism is N)?

Next comes the tedious process of doing the problem. Let’s first abstract the above project dependency diagram into code.

The problem

interface Project {
  name: string;
  actions: { name: string; fn: () = > Promise<void> }[];
  dependencyProjects: Project[];
}

const sleep = (s: number) :Promise<void> = >new Promise((r) = > setTimeout(r, s));

// All projects registered in Monorepo
const projects: Project[] = [
  "@monorepo/a"."@monorepo/b"."@monorepo/c"."@monorepo/d"."@monorepo/x"."@monorepo/y"."@monorepo/z",
].map((name) = > ({
  name,
  actions: [{ name: "build".fn: () = > sleep(Math.random() * 1000)},dependencyProjects: [],}));const [A, B, C, D, X, Y, Z] = projects;

A.dependencyProjects = [B];
B.dependencyProjects = [D];
C.dependencyProjects = [D, X, Y];
X.dependencyProjects = [Y, Z];

/** * Implement this method so that build actions are executed in the correct order and efficiently *@param Projects A collection of projects that need to perform tasks *@param ActionName Operation name *@param Limit Maximum number of parallel tasks */
function run(projects: Project[], actionName: string, limit: number) {
  // todo
}

run(projects, "build".12);
Copy the code

The problem solving

Obviously, there is a dependency relationship between projects, so there is also a dependency relationship between tasks, so the following conclusions can be drawn:

If the current task is a downstream task, the dependent tasks of the upstream task need to be updated and removed from the downstream task after the current task is completed.
If the current task is an upstream task, the current task can be executed only when all downstream tasks of the current task are cleared (completed).

So task is defined as follows:

interface Task {
  // Task name '${projectName}:{actionName}
  name: string;
  This dependenciesSet is cleared, indicating that the current task can be executed
  dependencies: Set<Task>;
  DependenciesSet (removes current task from current task) - dependenciesSet (removes current task from current task)
  dependents: Set<Task>;
  // Execute the function for specific tasks
  fn: () = > Promise<void>;
}
Copy the code

Initialization Task

According to the projects parameter, the corresponding tasks of the project are constructed.

function run(projects: Project[], actionName: string, limit: number) {
  // Mapping between task name and task
  const tasks = new Map<string, Task>();
  projects.forEach((project) = >
    tasks.set(getTaskName(project, actionName), {
      name: getTaskName(project, actionName),
      dependencies: new Set(),
      dependents: new Set(),
      fn: project.actions.find((a) = > a.name === actionName)?.fn ?? noop,
    })
  );
}

// Get the task name
function getTaskName(project: Project, actionName: string) {
  return `${project.name}:${actionName}`;
}

function noop() :Promise<void> {
  return new Promise((r) = > r());
}
Copy the code

Add dependencies and dependents

Assuming project1 exists, do the following for it:

Retrieves the task task1 corresponding to the current project
Gets the current task downstream of the corresponding task dependencyTaskNames (based on project1. DependencyProjects)
Traversal downstream task name dependencyTaskName
Fetch the downstream task (initialized from the previous step) dependencyTask
Add The Dependencies of Task1
Add dependencyTask dependents

function run(projects: Project[], actionName: string, limit: number) {
  // ...
  // project Indicates the downstream task name of the task corresponding to project
  function getDependencyTaskNames(project: Project) :Set<string> {
    const dependencyTaskNames: Set<string> = new Set(a);// Iterate through downstream projects
    for (const dep of project.dependencyProjects) {
      // Collect downstream task names
      dependencyTaskNames.add(getTaskName(dep, actionName));
    }

    return dependencyTaskNames;
  }

  projects.forEach((project) = > {
    // 1. Obtain the task corresponding to the current project
    consttask = tasks.get(getTaskName(project, actionName))! ;// 2. Obtain the downstream task name corresponding to the current task
    const dependencyTaskNames = getDependencyTaskNames(project);
    // 3. Iterate through the downstream task name
    for (const dependencyName of dependencyTaskNames) {
      // 4. Fetch the downstream task (initialized from the previous step)
      constdependency: Task = tasks.get(dependencyName)! ;// 5. Add the dependencies of the current task
      task.dependencies.add(dependency);
      // 6. Update the downstream task dependentsdependency.dependents.add(task); }}); }Copy the code

Parallel task execution

function run(projects: Project[], actionName: string, limit: number) {
  // ...
  const taskQueue: Task[] = [];
  for (const [, task] of tasks) {
    taskQueue.push(task);
  }
  runTasks(taskQueue, limit);
}

async function runTasks(taskQueue: Task[], limit: number) {
  let currentActiveTasks = 0;
  function getNextTask() {
    for (let i = 0; i < taskQueue.length; i++) {
      const task: Task = taskQueue[i];
      // Return to the task ready to execute
      if (task.dependencies.size === 0) {
        return taskQueue.splice(i, 1) [0]; }}return null;
  }

  function _run(task: Task) :Promise<void> {
    return task.fn().then(() = > {
      console.log("act success", task.name);
      currentActiveTasks--;
      // Remove the current task from its dependencies of upstream tasks
      task.dependents.forEach((dependent: Task) = > {
        dependent.dependencies.delete(task);
      });
      // Continue execution
      start();
    });
  }

  async function start() {
    let ctask: Task | null = null;
    const taskPromises: Promise<void= > [] [];while (currentActiveTasks < limit && (ctask = getNextTask())) {
      currentActiveTasks++;
      const task: Task = ctask;
      taskPromises.push(_run(task));
    }

    await Promise.all(taskPromises);
  }

  start();
}
Copy the code

Run (projects, “build”, 12) outputs the results in the correct order.

act success @monorepo/z:build
act success @monorepo/y:build
act success @monorepo/x:build
act success @monorepo/d:build
act success @monorepo/b:build
act success @monorepo/a:build
act success @monorepo/c:build
Copy the code

Critical path length

The implementation above enables tasks to be executed in the correct order. However, in the actual task execution process, the longest task chain limits the execution speed of the whole task tree, and the efficiency cannot be guaranteed.

Critical path length: the distance of the root node with the farthest task distance.

interface Task {
  name: string;
  dependencies: Set<Task>;
  dependents: Set<Task>;
  // The associated path lengthcriticalPathLength? :number;
  fn: () = > Promise<void>;
}

function run(projects: Project[], actionName: string, limit: number) {
  // ...
  const taskQueue: Task[] = [];
  for (const [, task] of tasks) {
    // Calculate the critical path length
    task.criticalPathLength = calculateCriticalPaths(task);
    taskQueue.push(task);
  }
  // Sort tasks in descending order based on critical path length
  taskQueue.sort((a, b) = >b.criticalPathLength! - a.criticalPathLength!) ; runTasks(taskQueue, limit); }// Calculate the critical path length
function calculateCriticalPaths(task: Task) :number {
  // Repeat steps to a task directly returns the value
  if(task.criticalPathLength ! = =undefined) {
    return task.criticalPathLength;
  }

  // If there is no dependents, we are "root", i.e. an undependent project such as app
  if (task.dependents.size === 0) {
    task.criticalPathLength = 0;
    return task.criticalPathLength;
  }

  // recursively Max up +1 each time
  const depsLengths: number[] = [];
  task.dependents.forEach((dep) = >
    depsLengths.push(calculateCriticalPaths(dep))
  );
  task.criticalPathLength = Math.max(... depsLengths) +1;
  return task.criticalPathLength;
}
Copy the code

Selecting subsets of projects

In practical business development, it is generally not necessary to build all the projects in Monorepo. In the article of application-level Monorepo optimization scheme, some pits and related solutions that may be encountered by using Monorepo to manage business projects are introduced, among which there is such a problem:

Slow release

If app1 needs to be published, all packages will be built, not just the package1 and Package2 build scripts that App1 depends on.

Finally, the Selecting Subsets of Projects capability provided by Rush solved the above problems.

Specific commands are as follows:

#Only build scripts of @monorepo/app1 and its dependencies on package are executed
$ rush build -t @monorepo/app1
Copy the code

-t PROJECT is –to PROJECT, the following PROJECT parameter is the name of the end PROJECT package of the task. If you do not want to include the end PROJECT, you can change it to -t parameter, namely –to-except PROJECT. Similarly, parameters that can select a subset of the PROJECT include:

-f PROJECT.--from PROJECT
-F PROJECT.--from-except PROJECT
-i PROJECT.--impacted-by PROJECT
-I PROJECT.--impacted-by-except PROJECT
-o PROJECT.--only PROJECT

If these parameters are not specified, the NPM scripts for all projects registered in rush.json will be executed by default.

Of course, it is possible to specify more than one parameter, and in the worst case, the subsets obtained are projects themselves, which behave the same as if no parameter was specified (usually not).

Practical application scenario: in the CI stage, all the projects that have been changed and the projects that will be affected will be screened out for a build check. For example, if the source code of @monorepo/ A and @monorepo/b is changed, CI will execute the following command:

$ rush build -t @monorepo/a -f @monorepo/a -t @monorepo/b -f @monorepo/b
Copy the code

There is no need to worry that some tasks of projects will be repeatedly executed. Such tasks are just a point with non-zero entry degree in the graph. After subsets are selected, tasks can be executed according to the previous task scheduling mechanism.

In addition to the built-in rush build commands that support the above parameters (the build script in package.json is executed by default), Rush also opens up this capability to custom commands, which means you can customize a rush foo command, This is used to invoke the Foo script in the package.json project of the specified scope, and in conjunction with Rush’s task scheduling mechanism, tasks can be executed in a guaranteed order (if required), as described in the custom_commands section.

Specific implementations of Selecting subsets of projects are beyond the scope of this article.

conclusion

Topological sort and critical path, we did a problem. 🌚

Task scheduling mechanism in Monorepo

preface

The problem

The problem solving

Initialization Task

Add dependencies and dependents

Parallel task execution

Critical path length

Selecting subsets of projects

Slow release

conclusion

Related Posts

Img tag cover+ amplification

JavaScript introduction

Vue + Elemental-UI Table tag (slot)