Design and Practice of Sigmax, Doodle’s Distributed Scheduled Task Scheduling System

1. Introduction

Sigmax is a high-performance distributed task scheduling engine developed in Golang by the Doodle Intelligence Middleware team. It provides a unified, stable, and accurate scheduling platform for the complex and diverse scheduling scenarios of the IoT field, so that every line of business in the company can implement its scheduling scenarios on it.

At present, Sigmax has been running stably in the company for one year. A single cluster schedules and triggers tens of millions of tasks per day.

2. Background

As the world’s leading AI+IoT platform, Doodle Intelligence is connected to a large number of smart devices. Every day, many users control their smart devices to build intelligent scenes, such as:

User A automatically opens the curtains at 7:00 every morning;
User B revokes a visitor’s permission at a scheduled time after the visitor passes the smart access control;
User C runs a linkage scene after a delay following sunrise or sunset.

In addition, there are many scheduled tasks among our internal services, such as:

BI colleagues generate reports on a schedule every day;
DevOps checks the machines due for maintenance every day;
To-do lists are checked periodically, and so on.

Finally, we want to implement some delayed message scenarios.

Before Sigmax was launched, these requirements were scattered across separate systems, each implemented with its own technical solution, which led to inconsistent technology stacks and repeated reinvention of the wheel.

We also investigated open source projects. Quartz, a leader in open source task scheduling, couples its scheduling logic and task processing logic in one project, which increases the project’s complexity and limits the scheduling system’s ability to handle business processing. In our scenario, however, the processing logic of a task varies with the business scenario. In our design, we can therefore leave the interpretation of tasks to each business system and focus on building a task scheduling engine that is accurate and high-performance.

In addition, none of the other projects we compared, such as Machinery, Cron, and Kala, fully meets our business needs in terms of the scheduled task model, system scalability, low-latency task scheduling, and high system throughput. Extending any of them to meet our functional, performance, and stability requirements at the same time would incur high modification and maintenance costs.

Therefore, we want a unified platform that makes it easy to onboard scheduled task scenarios in the IoT domain, such as timing and delay, while remaining stable, accurate, and high-throughput, with a feature set that is “just right” for the business.

3. System design

In the traditional approach to scheduled task scheduling, tasks are stored centrally in a database and are retrieved and executed by querying the database periodically. As the business grows, this design runs into three problems.

First, as the number of users or devices grows geometrically, horizontal scaling of the system hits limits. Second, such systems rely heavily on a relational database; as the business grows, pressure on the database increases and complex database and table sharding becomes necessary. Third, the business scenarios mean that scheduled tasks have obvious peaks and troughs; when a large number of tasks are concentrated at one point in time, the pressure on a single node is too high and task delay rises, which directly affects the user experience.

In addition, because the service is deployed across regions, the system needs to support scheduled tasks in different time zones and under the daylight saving time (DST) rules of different countries.

3.1 Service Abstraction

After fully understanding the business timing scenarios and the pain points of the existing systems, we summarized all the timing scenarios and abstracted three scheduled task models:

CronJob: a permanent periodic scheduled task;
DelayJob: a delayed task, such as a countdown or a delayed message;
SimpleJob: a scheduled task executed at a certain frequency within a specific period of time.
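The three models can be pictured as one record type with a kind tag. The following is a minimal sketch under stated assumptions: the `Job` struct, its field names, and the `Validate` method are hypothetical and are not taken from the Sigmax code base; only the three kind names come from the text above.

```go
package main

import (
	"fmt"
	"time"
)

// JobKind names the three task models described above.
type JobKind string

const (
	CronJob   JobKind = "CronJob"   // permanent periodic task
	DelayJob  JobKind = "DelayJob"  // one-shot task fired after a delay
	SimpleJob JobKind = "SimpleJob" // fixed-frequency task within a time window
)

// Job is a hypothetical record; every field name here is an assumption.
type Job struct {
	ID       string
	Kind     JobKind
	CronExpr string        // CronJob only, e.g. "0 7 * * *"
	Delay    time.Duration // DelayJob only
	Start    time.Time     // SimpleJob only: window start
	End      time.Time     // SimpleJob only: window end
	Interval time.Duration // SimpleJob only: firing interval inside the window
}

// Validate checks that the fields required by the job's kind are set.
func (j Job) Validate() error {
	switch j.Kind {
	case CronJob:
		if j.CronExpr == "" {
			return fmt.Errorf("job %s: CronJob needs a cron expression", j.ID)
		}
	case DelayJob:
		if j.Delay <= 0 {
			return fmt.Errorf("job %s: DelayJob needs a positive delay", j.ID)
		}
	case SimpleJob:
		if j.Interval <= 0 || !j.End.After(j.Start) {
			return fmt.Errorf("job %s: SimpleJob needs a positive interval and End after Start", j.ID)
		}
	default:
		return fmt.Errorf("job %s: unknown kind %q", j.ID, j.Kind)
	}
	return nil
}

func main() {
	curtain := Job{ID: "open-curtain", Kind: CronJob, CronExpr: "0 7 * * *"}
	countdown := Job{ID: "countdown", Kind: DelayJob}
	fmt.Println(curtain.Validate())   // valid job: no error
	fmt.Println(countdown.Validate()) // missing delay: returns an error
}
```

Keeping the kind-specific fields in one struct is only one option; separate types per kind would work equally well for this illustration.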

After summarizing and abstracting the business scenarios, we focused on how to solve the bottlenecks and pain points of the current system. Next, we look at the overall architecture design of the system.

3.2 System Architecture

The core of the Sigmax scheduling engine borrows the idea of the timewheel and abstracts several modules (task management, scheduling, task storage, and distributed cluster management) to strengthen the scheduling capability and reliability of the system. The overall structure has three layers: the APIServer access layer, the distributed scheduling cluster (Scheduler), and the scheduled task data storage layer. APIServer and Scheduler are stateless and support horizontal scaling. Scheduler decouples task scheduling from storage and supports distributed parallel scheduling. Finally, Sigmax is decoupled from the business systems: when a scheduled task triggers, Sigmax notifies the corresponding business system through a message queue.

The components corresponding to the above architecture are as follows:

Sigmax-Apiserver: the unified API entry point for service access and task management;
Sigmax-master: the cluster management controller, responsible for managing the scheduling engine cluster and for cluster load balancing;
Sigmad: the scheduled task scheduling engine based on the timewheel;
Sigmax-console: a web console for task management and cluster management.

The overall system architecture is shown in the figure below:

Cluster data flow and control flow are shown below:

3.3 Detailed system design

APIServer is the set of standard task management interfaces provided by Sigmax. It offers RESTful APIs and RPC APIs to manage the life cycles of the different types of scheduled tasks.

3.3.1 Interface List

Adding a Scheduled Task
Deleting a Scheduled Task
Modifying a Scheduled Task
Querying a Scheduled Task
Suspending a Scheduled Task
Restoring a Scheduled Task
Resetting a Scheduled Task

3.3.2 Detailed design of Scheduler based on Timewheel

Sigmad is the core component of Sigmax. Based on the idea of the timewheel, Sigmad is responsible for scheduling and triggering scheduled tasks in parallel. Sigmad is stateless, so it can be scaled out or in horizontally according to the load of the cluster.

Sigmad brings the concept of a real-life clock into the system design by defining a time period and a step size. The time period can generally be set to one day (24 hours) or one week (7 x 24 hours). The step can be adjusted to the time accuracy that scheduled tasks require; the default is 1 minute. With a one-week period and the default step, each round of the time wheel is therefore evenly divided into 7 * 24 * 60 = 10080 time slices, each called a Step in the system.
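The step arithmetic can be illustrated with a short sketch. It assumes a weekly wheel whose step 0 is Sunday 00:00; that origin and the function names `StepIndex` and `StepCount` are assumptions made for illustration, not details given by Sigmax.

```go
package main

import (
	"fmt"
	"time"
)

const minutesPerWeek = 7 * 24 * 60 // 10080 one-minute steps in a weekly wheel

// StepIndex maps a weekday and time of day to a step on a weekly time wheel.
// stepMinutes is the step size; step 0 is Sunday 00:00 (an assumption).
func StepIndex(weekday time.Weekday, hour, minute, stepMinutes int) int {
	minuteOfWeek := int(weekday)*24*60 + hour*60 + minute
	return minuteOfWeek / stepMinutes
}

// StepCount returns how many steps one weekly round contains.
func StepCount(stepMinutes int) int {
	return minutesPerWeek / stepMinutes
}

func main() {
	fmt.Println(StepCount(1)) // 10080

	t := time.Date(2024, time.January, 1, 7, 0, 0, 0, time.UTC) // a Monday, 07:00
	fmt.Println(StepIndex(t.Weekday(), t.Hour(), t.Minute(), 1)) // 1860
}
```

A task due on Monday at 07:00 lands on step 1 * 1440 + 420 = 1860 of the 10080 steps.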

When the pointer passes a step, the system fetches the list of all tasks on that step, loads it into memory, computes the schedule, and triggers the tasks. Each time wheel is driven by a Sigmad node, so running more time wheels lets the cluster carry more scheduled tasks at the same time, increasing the system’s throughput and parallel computing capability.

Note: Each Sigmad node corresponds to one or more timewheels, but a given timewheel can only be scheduled by one Sigmad at a time.
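The pointer movement can be sketched with a small in-memory wheel. This is a simplified illustration, not the Sigmad implementation: the `Wheel` type and its methods are hypothetical, and tasks are kept in memory here, whereas the text above describes loading each step’s task list from storage.

```go
package main

import "fmt"

// Wheel is a minimal in-memory time wheel (hypothetical type).
type Wheel struct {
	steps   [][]string // job IDs registered on each step
	pointer int        // step the pointer currently sits on
}

// NewWheel creates a wheel with stepCount empty steps.
func NewWheel(stepCount int) *Wheel {
	return &Wheel{steps: make([][]string, stepCount)}
}

// Add registers a job ID on a non-negative step, wrapped into the wheel's range.
func (w *Wheel) Add(step int, jobID string) {
	i := step % len(w.steps)
	w.steps[i] = append(w.steps[i], jobID)
}

// Tick returns the jobs due on the current step, then advances the pointer,
// wrapping back to step 0 at the end of a round.
func (w *Wheel) Tick() []string {
	due := w.steps[w.pointer]
	w.pointer = (w.pointer + 1) % len(w.steps)
	return due
}

func main() {
	w := NewWheel(3)
	w.Add(0, "a")
	w.Add(2, "b")
	w.Add(2, "c")
	for i := 0; i < 4; i++ {
		fmt.Println(w.Tick()) // the fourth tick starts a new round and returns step 0 again
	}
}
```

In a real deployment a ticker with the step duration would call `Tick`, and the returned jobs would be handed to workers for triggering.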

The time wheel is shown in the figure below:

Multi-partition time wheel

To improve system throughput and the concurrent scheduling capability of Sigmax, and to keep a large number of tasks at one hot point in time from overloading the system and degrading scheduling accuracy, we designed two levels of concurrency. At the first level, parallel computing capacity is increased by scaling out Sigmad nodes. At the second level, we introduce multiple partitions: each time slice, which would otherwise be a single Partition, is divided into multiple Partitions to increase parallel computing capability.

Note: A Step and a Partition together are called a Slot.
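One way to picture a Slot is as a (Step, Partition) pair, with the partition chosen by hashing the job ID. The text does not say how Sigmax assigns jobs to partitions, so the FNV-1a hash below, and the names `Slot`, `PartitionOf`, and `SlotOf`, are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Slot identifies one unit of scheduling work: a Step plus a Partition.
type Slot struct {
	Step      int
	Partition int
}

// PartitionOf hashes a job ID onto one of n partitions.
// Using FNV-1a here is an assumption, not the documented Sigmax strategy.
func PartitionOf(jobID string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(jobID)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(n))
}

// SlotOf places a job that fires at the given step into its slot.
func SlotOf(jobID string, step, partitions int) Slot {
	return Slot{Step: step, Partition: PartitionOf(jobID, partitions)}
}

func main() {
	// Jobs due on the same step are spread over several partitions,
	// so each partition can be processed in parallel.
	bySlot := map[Slot][]string{}
	for _, id := range []string{"job-1", "job-2", "job-3", "job-4", "job-5", "job-6"} {
		s := SlotOf(id, 420, 4)
		bySlot[s] = append(bySlot[s], id)
	}
	for s, ids := range bySlot {
		fmt.Printf("step=%d partition=%d jobs=%v\n", s.Step, s.Partition, ids)
	}
}
```

Because the mapping is deterministic, the same job always lands in the same Slot, which keeps lookups and updates of a task cheap.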

Time wheel multi-partition is shown in the figure below:

3.3.3 Distributed Cluster Management

To ensure high availability and parallel computing capability, the Sigmax components are deployed in cluster mode to avoid single points of failure:

Sigmax-master runs in Master/Slave HA mode. Only one Master is working at a time; the others act as standby Slaves.
Sigmad is deployed statelessly and supports dynamic horizontal scale-out and scale-in.

Sigmax-master is responsible for managing the Sigmad cluster, including scale-out, scale-in, node scoring, timewheel partition rebalancing, and so on.

Cluster management is shown in the following figure:

3.3.4 Cluster Load

Sigmad is responsible for fetching Jobs from the corresponding timewheel Slots and triggering their execution. When a Sigmad in the cluster is overloaded, Sigmax-master “migrates” tasks from that node to other Sigmad nodes to keep the cluster balanced overall. Its working principle is as follows:
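As a rough sketch of the migration idea, the code below moves whole timewheels from the busiest node to the idlest node until the two differ by at most one wheel. It assumes that the number of wheels stands in for load; the real Sigmax-master scoring is not described in the text, and the `Node` type and `Rebalance` function are hypothetical.

```go
package main

import "fmt"

// Node is a hypothetical view of one Sigmad instance as the master sees it.
type Node struct {
	Name   string
	Wheels []string // IDs of the timewheels this node currently schedules
}

// Rebalance moves timewheels from the busiest node to the idlest node until
// their wheel counts differ by at most one. It returns the number of moves.
// Each wheel stays on exactly one node at any time.
func Rebalance(nodes []*Node) (moves int) {
	if len(nodes) < 2 {
		return 0
	}
	for {
		busiest, idlest := nodes[0], nodes[0]
		for _, n := range nodes {
			if len(n.Wheels) > len(busiest.Wheels) {
				busiest = n
			}
			if len(n.Wheels) < len(idlest.Wheels) {
				idlest = n
			}
		}
		if len(busiest.Wheels)-len(idlest.Wheels) <= 1 {
			return moves
		}
		// Hand the last wheel of the busiest node to the idlest node.
		last := len(busiest.Wheels) - 1
		idlest.Wheels = append(idlest.Wheels, busiest.Wheels[last])
		busiest.Wheels = busiest.Wheels[:last]
		moves++
	}
}

func main() {
	a := &Node{Name: "sigmad-a", Wheels: []string{"w1", "w2", "w3", "w4"}}
	b := &Node{Name: "sigmad-b"}
	c := &Node{Name: "sigmad-c", Wheels: []string{"w5"}}
	fmt.Println(Rebalance([]*Node{a, b, c}))                 // 2
	fmt.Println(len(a.Wheels), len(b.Wheels), len(c.Wheels)) // 2 2 1
}
```

Moving whole timewheels matches the timewheel-level balancing that Sigmax currently implements; a finer-grained policy would move individual time slices instead.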

4. Cluster monitoring

After Sigmax went online, we set up monitoring and alerting for tasks and the cluster so that service anomalies, cluster fluctuations, and scheduling anomalies can be detected in time. Health monitoring of cluster nodes and services relies on the company’s basic monitoring. This section mainly describes how we monitor cluster task scheduling along several dimensions.

4.1 Task Monitoring

4.1.1 Task Life cycle Management

To reflect the load of the whole cluster, we monitor the life cycle of all tasks in the cluster, so the state of the tasks can be seen at a glance, as shown in the following figure:

4.1.2 Scheduling accuracy

Because it is a key factor in service stability and user experience, we monitor the accuracy of task scheduling. Scheduling accuracy measures the deviation between the actual trigger time of a task and its expected trigger time. It is divided into several levels:

Ontime: scheduled without delay; the task is triggered on time;
Delay: [1ms-200ms], (200ms-600ms], (600ms-1000ms], (1000ms-2000ms], (2000ms-∞).

Monitoring shows that most tasks in the current system are scheduled without deviation.
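The accuracy levels above amount to bucketing the trigger deviation. A minimal sketch follows, assuming that deviations below 1ms (including early triggers) count as Ontime; the function names and bucket labels are hypothetical, while the boundaries are the ones listed above.

```go
package main

import (
	"fmt"
	"time"
)

// DelayBucketMillis labels a trigger deviation given in milliseconds.
// Deviations below 1ms are treated as on time (an assumption).
func DelayBucketMillis(ms int64) string {
	switch {
	case ms < 1:
		return "ontime"
	case ms <= 200:
		return "1ms-200ms"
	case ms <= 600:
		return "200ms-600ms"
	case ms <= 1000:
		return "600ms-1000ms"
	case ms <= 2000:
		return "1000ms-2000ms"
	default:
		return ">2000ms"
	}
}

// DelayBucket labels the deviation between expected and actual trigger time.
func DelayBucket(expected, actual time.Time) string {
	return DelayBucketMillis(actual.Sub(expected).Milliseconds())
}

func main() {
	expected := time.Date(2024, time.January, 1, 7, 0, 0, 0, time.UTC)
	actual := expected.Add(350 * time.Millisecond)
	fmt.Println(DelayBucket(expected, actual)) // 200ms-600ms
}
```

Counting triggers per bucket over time gives the kind of distribution that the accuracy monitoring reports.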

4.1.3 Task Triggering Success Rate

The task triggering success rate is the share of scheduled tasks that are successfully scheduled and triggered for each system. It is used to detect tasks that fail to notify the business system. Under normal circumstances, when all tasks are delivered, the ratio is 1.

4.1.4 Cluster Management monitoring

To make cluster management and control easier, we monitor the Master and Slave nodes and the timewheels scheduled by each Sigmad, so that dynamic changes in the cluster can be observed directly.

5. Roadmap

Sigmax will continue to optimize and refine the details to meet the needs of different business systems and customers.

5.1 Task Priority

Sigmax will let users set priorities for scheduled tasks according to their service scenarios. Tasks with higher priority are scheduled and executed first, while tasks with lower priority may be scheduled within a certain time window.

5.2 Fine Cluster Load Balancing

Sigmax currently implements load balancing at the timewheel level, but each timewheel still contains a large number of time slices and tasks, so the granularity is not fine enough. The next step is to balance cluster load at the time-slice level, reducing the load fluctuation on Sigmad nodes that load balancing causes.

5.3 Optimization of scheduling accuracy

As the scheduling accuracy monitoring shows, a small number of tasks are delayed during peak periods in some large regions. We will optimize the system design for higher loads to avoid scheduling delay during peak periods.

5.4 Visual Task Management

Visual task management will add more functions to the existing console so that users can add, modify, and delete tasks more conveniently. It will also support task query, statistics, and display along more dimensions.

5.5 Visualized Cluster Management

Visualized cluster management is aimed at the operation and maintenance personnel of Sigmax. It will provide a management console for cluster scale-out and scale-in, node operation and maintenance, and cluster load balancing.

6. Summary

This article introduced the design and implementation of Sigmax, the scheduled task scheduling system of Doodle Intelligence, based on Doodle Intelligence’s specific scheduled task scenarios in the AIoT field. It described the optimized design based on the time wheel idea, which meets the business’s requirements for large scale and high performance. It also described the distributed, highly available design of the Sigmax components, which provides system HA. Finally, it covered part of the monitoring now in place for tasks and clusters, and the roadmap.

We will keep iterating to support more business scenarios and to further optimize stability and performance. Interested readers are welcome to get in touch.

This post was originally published on the Doodle Smart Tech blog

Tech.tuya.com/tuya-schedu…

Please credit the source when reprinting.