This article is intended for those with basic Java knowledge
Author: HelloGitHub – Salieri
This is HelloGitHub's series on explaining open source projects.
Note: at long last, the much-anticipated explanation of how the scheduling layer works. I have wanted to write this part for a long time, because so many people keep asking. The first thing most newcomers say when they join the user group is: "Piggy, how is lockless scheduling actually implemented?", and the sharper ones go straight for: "Piggy, where exactly is that 'unlimited performance' you claim reflected?"
Unfortunately, I had scheduled a thrilling and exciting trip to the northwest for early July, and every day I was either in a car or on my way to one. I did get to experience the motherland's vast territory, beautiful scenery, and splendid culture, but I was also half dead from exhaustion, so this article has been dragged out until now…
Well, it’s time to show some real skills
I. Scheduling layer overview
PowerJob currently supports four timed-execution strategies: CRON, fixed frequency, fixed delay, and API. The API strategy triggers tasks directly through the client interface PowerJob provides, with no server-side scheduling involved. The remaining three strategies can be divided into regular tasks and second-level tasks according to their execution frequency. Let's start with regular tasks.
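(Conceptually, the four strategies map to something like the following enum; this is just an illustrative sketch for readers, not PowerJob's actual definition.)

```java
// Illustrative only: the four timing strategies described above.
public enum TimeExpressionType {
    API,        // triggered directly through the client interface, no server-side scheduling
    CRON,       // regular tasks driven by a CRON expression
    FIX_RATE,   // second-level: run at a fixed frequency
    FIX_DELAY   // second-level: run a fixed delay after the previous run finishes
}
```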
Regular tasks refer to tasks whose periodic policies are specified by CRON expressions. This type of task is characterized by low execution frequency. PowerJob schedules such tasks based on database polling. The schematic diagram is as follows.
PowerJob's task table stores each task's basic metadata, such as the task name, timing strategy, and executor information. On top of that it adds a next_trigger_time field. When a task is created successfully, the system uses the CRON expression to initialize this field, so every CRON task always has a valid next trigger time.
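As a side note, computing next_trigger_time from a CRON expression is a solved problem; any CRON library will do. Here is a minimal sketch using Spring's CronExpression (just one possible library, not necessarily the one PowerJob itself uses):

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import org.springframework.scheduling.support.CronExpression;

public class NextTriggerTimeDemo {
    public static void main(String[] args) {
        // "run every 5 minutes", in 6-field cron syntax (sec min hour day month weekday)
        CronExpression cron = CronExpression.parse("0 0/5 * * * *");

        // next_trigger_time = the first fire time strictly after now, stored as epoch millis
        LocalDateTime next = cron.next(LocalDateTime.now());
        long nextTriggerTime = next.atZone(ZoneId.systemDefault()).toInstant().toEpochMilli();
        System.out.println("next_trigger_time = " + nextTriggerTime);
    }
}
```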
With this field in place, the actual scheduling is easy. powerjob-server runs a background thread that periodically scans the task table for tasks scheduled by the local machine that are about to fire (that is, tasks whose next trigger time is within a system-defined threshold of the current time).
(A small hint here: "each task is scheduled by a specific machine" is actually the key to lockless scheduling, which will be revealed in the next article. This article focuses on the scheduling flow itself, so a single server is used as the example.)
Whenever it finds tasks that need to run in the near future, it generates execution records for them and pushes those records into the time wheel, and the scheduling of those tasks is done.
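To make the flow easier to picture, here is a minimal sketch of such a polling loop. Every name in it (the repository, the time wheel facade, the thresholds) is an illustrative assumption of mine, not PowerJob's actual code:

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative database-polling scheduler: scan, push into the time wheel, refresh next_trigger_time.
public class CronJobPoller {

    interface JobRepository {                              // stand-in for the task table access layer
        List<Job> findJobsToRunBefore(long deadlineMs);    // next_trigger_time < deadline
        void saveInstanceAndRefreshNextTriggerTime(Job job);
    }
    interface TimeWheel { void schedule(Runnable task, long delayMs); }
    record Job(long id, long nextTriggerTime) {}

    private static final long SCAN_RATE_MS = 15_000;   // how often the background thread scans
    private static final long THRESHOLD_MS = 30_000;   // the "about to run" window

    private final JobRepository repo;
    private final TimeWheel timeWheel;

    CronJobPoller(JobRepository repo, TimeWheel timeWheel) {
        this.repo = repo;
        this.timeWheel = timeWheel;
    }

    void start() {
        Executors.newSingleThreadScheduledExecutor()
                 .scheduleAtFixedRate(this::scanAndPush, 0, SCAN_RATE_MS, TimeUnit.MILLISECONDS);
    }

    private void scanAndPush() {
        long deadline = System.currentTimeMillis() + THRESHOLD_MS;
        for (Job job : repo.findJobsToRunBefore(deadline)) {
            long delay = Math.max(job.nextTriggerTime() - System.currentTimeMillis(), 0);
            timeWheel.schedule(() -> dispatch(job), delay);     // push the execution into the time wheel
            repo.saveInstanceAndRefreshNextTriggerTime(job);    // record the instance, compute the next fire time
        }
    }

    private void dispatch(Job job) { /* hand the task instance over to the worker cluster */ }
}
```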
What clever designs and implementations hide behind a process that sounds so plain and simple? Let me break it down for you ~
II. High-performance scheduling: the time wheel
If you were handed a task and asked to execute it 2 seconds from now, how would you do it?
The simplest solution is sleep: the task needs to run 2 seconds later, so just make the current thread sleep for 2 seconds and then run it. Indeed, thread sleeping lets you implement the simplest timed executor in just a few lines of code, but its performance… is quite a drag. Since every task needs its own dedicated thread, this approach consumes enormous resources as soon as the system has a large number of tasks.
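To be fair to that approach, the naive version really does fit in a few lines; the problem is purely the one-thread-per-task cost:

```java
// Naive sleep-based scheduling: trivially simple, but one thread per task does not scale.
Runnable task = () -> System.out.println("executed at " + System.currentTimeMillis());
new Thread(() -> {
    try {
        Thread.sleep(2_000);   // wait until the desired trigger time (2 seconds from now)
        task.run();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}).start();
```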
So how to achieve efficient scheduling?
Perhaps, just like Newton being struck by the apple, the inventor of the time wheel algorithm glanced down at his Rolex while racking his brains for an efficient scheduling scheme. The watch looked utterly ordinary, and yet it seemed to offer a flicker of inspiration…
As analyzed above, the thread-sleep scheduler is inefficient because it needs a huge number of threads, wasting a great deal of CPU and memory. Is there a way to avoid that cost? Staring at the watch face, someone found the answer.
The time wheel is a scheduling model that batch-schedules a large number of tasks while using thread resources efficiently. All the scheduled tasks are bound to the same scheduler (a single thread), which manages, triggers, and runs them, so delayed tasks, periodic tasks, notification tasks, and so on can all be handled efficiently.
The algorithm model of the time wheel is shown in the figure above. Each time wheel has N slots, with a fixed interval between adjacent slots. At every interval, the pointer advances one slot and then processes all tasks in the slot it lands on; it keeps looping around until no tasks remain in the wheel.
When a task is added, a time slot can be calculated from the task's scheduled execution time and the current time. To place the task into that slot with O(1) time complexity, the slots need random access, so this part is implemented as a circular array. The task queue hanging off each slot has an unpredictable length and only needs sequential access, so it is implemented as a singly linked list.
Each time wheel has two required parameters: tickDuration, which is how often the pointer advances, and ticksPerWheel, which is the number of slots on the dial. In the example below, tickDuration is 1 second and ticksPerWheel is 12.
Enough theory; here is a concrete example to help you understand the time wheel (the concept is actually very easy to grasp, and the implementation is not hard either; it is a very cost-effective data structure ~).
Suppose I have a time wheel with a 1-second tick and 12 slots, and I need to schedule three tasks to run after 1 second, 6 seconds, and 13 seconds. What does the time wheel actually do?
The first step is task insertion. Since the dial is a circular array, the slot index for each task can be calculated with the formula (expected execution time - time wheel start time) % number of slots, so the three tasks are inserted into the linked lists of slots 0, 5, and 0 respectively.
Once the tasks are inserted, the scheduling thread takes over: it repeatedly sleeps for one tickDuration, advances to the next slot, and runs the tasks in that slot's linked list. Because a task in the list may not belong to the current round (the 13-second task, for example, should only run on the next trip around the wheel), the thread also checks each task's expected execution time; only tasks that are actually due are executed and removed from the list.
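To tie the slot formula and the tick loop together, here is a deliberately simplified, single-threaded time wheel of my own (a teaching sketch only; it rounds the slot index up to the next tick so nothing fires early, and it is far cruder than PowerJob's real HashedWheelTimer):

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentLinkedQueue;

// Simplified time wheel: a circular array of task queues plus one ticking thread.
public class SimpleTimeWheel {

    static final class TimerTask {
        final long executeAtMs;          // expected execution time
        final Runnable action;
        TimerTask(long executeAtMs, Runnable action) { this.executeAtMs = executeAtMs; this.action = action; }
    }

    private final long tickDurationMs;                        // how often the pointer advances
    private final ConcurrentLinkedQueue<TimerTask>[] slots;   // ticksPerWheel slots on the dial
    private final long startTimeMs = System.currentTimeMillis();

    @SuppressWarnings("unchecked")
    public SimpleTimeWheel(long tickDurationMs, int ticksPerWheel) {
        this.tickDurationMs = tickDurationMs;
        this.slots = new ConcurrentLinkedQueue[ticksPerWheel];
        for (int i = 0; i < ticksPerWheel; i++) slots[i] = new ConcurrentLinkedQueue<>();
        Thread ticker = new Thread(this::run, "time-wheel-ticker");
        ticker.setDaemon(true);
        ticker.start();
    }

    /** Slot index: ticks until the execution time (rounded up), modulo the number of slots. */
    public void schedule(Runnable action, long delayMs) {
        long executeAt = System.currentTimeMillis() + delayMs;
        long ticksAway = (executeAt - startTimeMs + tickDurationMs - 1) / tickDurationMs;
        int slot = (int) (ticksAway % slots.length);
        slots[slot].add(new TimerTask(executeAt, action));
    }

    private void run() {
        for (long tick = 1; ; tick++) {
            long wakeUpAt = startTimeMs + tick * tickDurationMs;
            long sleepMs = wakeUpAt - System.currentTimeMillis();
            if (sleepMs > 0) {
                try { Thread.sleep(sleepMs); } catch (InterruptedException e) { return; }
            }
            long now = System.currentTimeMillis();
            Iterator<TimerTask> it = slots[(int) (tick % slots.length)].iterator();
            while (it.hasNext()) {
                TimerTask t = it.next();
                if (t.executeAtMs <= now) {   // tasks belonging to a later round stay in the list
                    it.remove();
                    t.action.run();           // a real implementation hands this off to a worker pool
                }
            }
        }
    }
}
```

A quick check against the example above: new SimpleTimeWheel(1000, 12) with tasks scheduled at 1, 6, and 13 seconds fires the first two on their own ticks and keeps the 13-second task in its slot until the wheel comes around again.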
In this way, a single thread can drive a huge number of tasks with both good performance and good efficiency. The only drawback is that, because everything advances in steps of tickDuration, the scheduling carries some inherent error. If you are a stickler for timing precision, the time wheel may not be your thing; otherwise, grab it and go.
With the concept of the time wheel out of the way, let's come back to the framework itself. Because the work PowerJob performs after a task is scheduled is relatively heavy (it involves database operations), an extra processing thread pool is introduced to keep the scheduling accurate. The source is only 326 lines in total; if you are interested, go read it, the class name is ready for you:
com.github.kfcfans.powerjob.server.common.utils.timewheel.HashedWheelTimer
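If you just want to get a feel for a hashed wheel timer before diving into that source, Netty ships a well-known implementation with the same two knobs (tickDuration and ticksPerWheel). Purely as a usage illustration, and to be clear this is Netty's io.netty.util.HashedWheelTimer, not PowerJob's class:

```java
import io.netty.util.HashedWheelTimer;
import java.util.concurrent.TimeUnit;

public class TimeWheelUsageDemo {
    public static void main(String[] args) throws InterruptedException {
        // 100 ms per tick, 512 slots on the dial (requires the io.netty:netty-common dependency)
        HashedWheelTimer timer = new HashedWheelTimer(100, TimeUnit.MILLISECONDS, 512);

        // schedule a one-shot task to fire roughly 2 seconds from now
        timer.newTimeout(timeout -> System.out.println("fired at " + System.currentTimeMillis()),
                2, TimeUnit.SECONDS);

        Thread.sleep(3_000);   // give the task time to fire in this demo
        timer.stop();          // release the worker thread when the timer is no longer needed
    }
}
```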
III. Reliable scheduling: write-ahead logging (WAL)
Reliable scheduling is another issue everyone cares about. Some readers have even left comments in GitHub Issues describing how their home-grown scheduling systems missed schedules in production:
Does PowerJob have the missed-schedule problem? The answer is clearly no. (For production-grade scheduling middleware that prides itself on extremely high availability and stability, how embarrassing would it be if it couldn't even get that right?)
So the question is, how does this work?
WAL (Write-Ahead Logging) is a key technique that mainstream relational databases (SQL Server, MySQL, Oracle) use to guarantee the atomicity and durability of transactions. Its core idea is that changes are written to a log before they are written to the actual data files. With the write-ahead log, the storage system can replay the log after a crash and recover to the pre-crash state, avoiding data loss, as long as the disk data itself is not damaged.
PowerJob borrows this idea to achieve reliable task scheduling. Each time a task is about to be scheduled, the system generates a record for it containing the expected trigger time of the task instance (one run of a task is called a task instance) and persists that record to the database. Only after the record is persisted successfully is the task pushed into the time wheel for scheduling.
If a server goes down and a task therefore misses its scheduled time, other servers can pick up the instance records already written to the database and re-run them, which is exactly what makes the scheduling reliable ~
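Here is a hedged sketch of this "log first, schedule second" flow, where every name is illustrative rather than PowerJob's actual code:

```java
import java.util.List;

// Illustrative WAL-style scheduling: persist the instance record before pushing it into the time wheel.
public class ReliableScheduler {

    interface InstanceRepository {
        long persistInstance(long jobId, long expectedTriggerTime);        // the durable "log" record
        List<Long> findUndispatchedInstances(String deadServerAddress);    // records left behind by a dead server
    }
    interface TimeWheel { void schedule(Runnable task, long delayMs); }

    private final InstanceRepository repo;
    private final TimeWheel timeWheel;

    ReliableScheduler(InstanceRepository repo, TimeWheel timeWheel) {
        this.repo = repo;
        this.timeWheel = timeWheel;
    }

    /** Normal path: only after the record is safely in the database does the task enter the time wheel. */
    void schedule(long jobId, long expectedTriggerTime) {
        long instanceId = repo.persistInstance(jobId, expectedTriggerTime);
        long delay = Math.max(expectedTriggerTime - System.currentTimeMillis(), 0);
        timeWheel.schedule(() -> dispatch(instanceId), delay);
    }

    /** Recovery path: a surviving server replays instance records the crashed server never dispatched. */
    void recover(String deadServerAddress) {
        for (long instanceId : repo.findUndispatchedInstances(deadServerAddress)) {
            timeWheel.schedule(() -> dispatch(instanceId), 0);   // run them as soon as possible
        }
    }

    private void dispatch(long instanceId) { /* send the task instance to the worker cluster */ }
}
```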
In other words, as long as at least one powerjob-server in your system is still alive, no schedule will be missed.
IV. Second-level tasks
Enough about regular tasks; now let's talk about second-level tasks.
Second-level tasks are characterized by an extremely high execution frequency. So, can the same approach that supports regular task scheduling also support second-level tasks?
First comes task acquisition. Emmm… "scan the task table at fixed intervals for tasks about to run"? By the time the task was fetched from the database it would be long overdue… So with the traditional scheduling approach, step one already fails. (I expected it to be hard, just not this hard!)
Some sharper readers may have already thought of a fix: since second-level tasks run so frequently, the server could fetch the task once and keep it in memory, so subsequent schedules no longer need to query the database and can simply traverse memory, which is as fast as it gets. That does seem to solve the problem.
However, this approach is still not perfect. Second-level tasks run so frequently that, in most cases, missing one or two executions hardly matters, because the next run follows immediately. The reliable-scheduling mechanism used for regular tasks is therefore a poor fit: if every second-level run were persisted the same way, the database would take a heavy beating and PowerJob's overall performance would drop sharply. So what's the way out?
This brings us to the ultimate answer to so many problems in computing: divide and conquer. Since each individual execution does not need to be highly reliable, powerjob-server can delegate the work.
Every second-level task is handed over directly to one powerjob-worker in the cluster, and that worker takes full responsibility for running it; powerjob-server is only responsible for failure recovery.
This further reduces the pressure on the server. Meanwhile, because both the scheduling and the execution of second-level tasks happen on the worker, scheduling accuracy improves as well (at the very least, the network latency of server-worker communication is saved), which makes it a genuinely win-win design.
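As a rough illustration of "the worker owns the whole loop", a fixed-delay second-level task could be driven locally with a plain ScheduledExecutorService while periodically reporting back to the server; again, a sketch under my own assumptions, not PowerJob's worker code:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: once a second-level (fixed-delay) task is delegated to a worker,
// the worker drives every run locally, with no server round-trip per execution.
public class SecondLevelTaskRunner {

    interface ServerClient { void report(long instanceId, String status); }   // hypothetical reporting hook

    private final ScheduledExecutorService localScheduler = Executors.newScheduledThreadPool(2);

    /** Fixed delay: the next run starts delayMs after the previous run finishes. */
    public void start(Runnable processor, long delayMs) {
        localScheduler.scheduleWithFixedDelay(processor, 0, delayMs, TimeUnit.MILLISECONDS);
    }

    /** Heartbeat so the server can notice a dead worker and re-dispatch the task (failure recovery). */
    public void reportStatusTo(ServerClient server, long instanceId) {
        localScheduler.scheduleAtFixedRate(
                () -> server.report(instanceId, "RUNNING"), 10, 10, TimeUnit.SECONDS);
    }
}
```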
V. Final words
That’s all for this article
This article described the implementation of the PowerJob scheduling layer and some of its clever designs. Due to length limits, though, the scheduling layer has not been fully unveiled and is still in a half-revealed state ~ The questions you care about most, such as how duplicate scheduling is avoided when multiple servers are running and how the servers scale horizontally, were only touched on in passing. The details will come in the next article ~ As a small spoiler, the core is four characters: group isolation. If you really can't wait, go dig it out of the code yourself, boy ~
Project Address:
https://github.com/KFCFans/PowerJob
Follow the official account to join the chat group (the author is in the Java group)