This article is intended for readers with basic Java knowledge.

Author: HelloGitHub – Salieri

HelloGitHub presents its open source project series. After a good deal of effort and back-and-forth, I finally invited Salieri, author of the distributed task scheduling and computing framework PowerJob, to join HG’s open source lecture series and start his PowerJob series 🎉. A new installment will appear every Wednesday. Please keep following along, and hopefully you will learn something from this series.

Project address: https://github.com/KFCFans/PowerJob

I. The origin

Hi, I’m Salieri, the author of PowerJob. The story about PowerJob started a year ago.

A year ago, I went to Alibaba Group to start my summer internship. As luck would have it, the first serious development assignment I received was closely related to distributed task scheduling and computing.

At that time, a new task scheduling middleware had been developed internally (SchedulerX 2.0, the reference framework mentioned in the README), and our application needed to be migrated from the old DTS to SchedulerX 2.0. This glorious and weighty task was naturally handed to me by my senior colleague. From then on, I began working with this kind of distributed task scheduling and computing middleware.

After the migration, SchedulerX and I enjoyed a long honeymoon period. I have to say that SchedulerX’s design is very advanced. For example, passing runtime parameters dynamically through the console or OpenAPI makes traditional scheduled tasks very flexible: different behavior can be achieved without changing the code. Another example is the MapReduce processor, which lets developers implement distributed computing over large amounts of data with just a few lines of code. However, the good times did not last. Around Double 11 (Alibaba’s November 11 shopping festival), two sad stories happened.

As Double 11 approached, the volume of data to process surged, and offline tasks that had previously run perfectly on SchedulerX began to fail frequently. In the run-up to Double 11, I received alarm calls more often than WeChat notifications (well, partly because nobody was messaging me T_T). After investigating with the relevant developers, we preliminarily concluded that our application occupied too much memory, leaving SchedulerX without enough memory to do its necessary work, which caused the tasks to fail. SchedulerX’s position was clear and entirely reasonable: we did not meet its minimum operating requirements. It is like buying a MacBook Air, installing Windows to play PUBG, and finding that the game cannot even reach its welcome screen. What can you say? You can only blame yourself for not meeting the stated minimum requirements and accept it. In the end there was nothing for it but to rob Peter to pay Paul and barely scrape through Double 11.

The other story is about rate limiting. To monitor the running status of tasks, I had written separate logic in another application that polled SchedulerX for task status, and it worked perfectly. Then one day, after releasing a minor change, I logged on to the online logging platform to check the application’s runtime logs, as production safety requires. The RuntimeExceptions that filled the screen made me wonder whether I had accidentally deleted a module, dropped a database, or released the wrong branch. Once the panic passed, I calmed down and read the exception messages. Only then did I discover that the SchedulerX interface I was calling to query task status was returning errors because it had been rate limited. The reason given was Double 11 stability. Well, to guarantee stability during Double 11, the first to be rate limited were applications like ours, which were not on the Double 11 critical path but at best standing at its edge. Communication went nowhere, so I could only hack the code and implement task status monitoring myself.

The SchedulerX team was not at fault in either case. After all, it is impossible to serve every business line in the whole group without imposing some restrictions. Still, it is true that some individual needs cannot be met under the middle-platform model. For most users, it is just a matter of depending on a JAR, writing some code, and configuring the task in the console, and the experience is great. After all, not every user has our outrageous requirement of millions of sub-tasks……

After Double 11, my internship ended. I left Alibaba, went home, and switched into loafing mode: every day I was either playing games or thinking about how to play games, all the while telling myself that tomorrow I would definitely study hard.

After N months in a daze, I finally remembered my graduation thesis. There was no way around it: for the sake of my humble degree, I had to put the games down for a while and throw myself into writing. By the time I finished the thesis, the epidemic was almost over, and all the friends who used to feed kills alongside me had gone back to work. The precondition for my wanting to play games (number of players == 5) no longer held, and I was completely free. So I picked up my traditional skill again: reading.

After reading a lot of odd books (including a romance novel), I finally remembered something I had long wanted to do but, being lazy, had kept putting off. And so PowerJob (originally named OhMyScheduler) was born.


II. First look

Now that the chit-chat is over, it is time to raise the banner of “the next-generation distributed task scheduling and computing framework” (there is still a long way to go) and get to the main content.

2.1 Task scheduling framework

Everyone is familiar with scheduled tasks, such as the classic Linux crontab. Scheduled triggering and execution have gradually become middleware that almost every system depends on. In the Java world, there are many excellent task scheduling frameworks.

Popular job scheduling frameworks today include the veteran Quartz, the Quartz-based Elastic-Job, and xxl-job, which was originally based on Quartz. Here are some of their drawbacks.

Quartz can be regarded as the first generation of task scheduling frameworks, and is basically the “ancestor” of all existing distributed scheduling frameworks. For historical reasons, it does not provide a Web interface, and tasks can only be configured through APIs, so it is neither convenient nor flexible to use. It also only supports executing a task on a single machine and cannot effectively use the computing capacity of the whole cluster. In addition, Quartz requires scheduling and execution to be coupled in the same application, so it cannot be offered as a platform service.

xxl-job can be regarded as the second generation of task scheduling frameworks, and it addresses the shortcomings of Quartz to a certain extent. For the past few years xxl-job has been an excellent scheduling framework, but today it still has some shortcomings:

  • Limited database support: only MySQL is supported; using any other database requires modifying the code yourself
  • Limited distributed computing capability: only static sharding is supported, so complex tasks cannot be handled well
  • No workflow support: dependencies between tasks cannot be configured, so it does not suit scenarios with complex dependencies between tasks
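
To make “static sharding” in the list above concrete, here is a minimal sketch. It assumes the common scheme in which each worker is handed a fixed shard index and a shard total, and picks its share of the data with a modulo rule. The class and method names are hypothetical and are not xxl-job’s actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of static sharding; not xxl-job's actual API.
final class StaticShardingSketch {

    /** Returns the item ids that the worker with the given shard index should process. */
    static List<Integer> itemsForShard(List<Integer> itemIds, int shardIndex, int shardTotal) {
        if (shardTotal <= 0 || shardIndex < 0 || shardIndex >= shardTotal) {
            throw new IllegalArgumentException("invalid shard parameters");
        }
        List<Integer> mine = new ArrayList<>();
        for (int id : itemIds) {
            // The assignment is fixed by the modulo rule and cannot adapt while the task runs.
            if (Math.floorMod(id, shardTotal) == shardIndex) {
                mine.add(id);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 4, 5, 6, 7);
        for (int shard = 0; shard < 3; shard++) {
            System.out.println("shard " + shard + " -> " + itemsForShard(ids, shard, 3));
        }
    }
}
```

Because the split is decided up front, a worker that receives the expensive items cannot hand part of its share to an idle worker, which is one way to read “complex tasks cannot be handled well”.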

As the saying goes, in the Yangtze River the waves behind push on the waves ahead. In an era of ever-growing data volumes and increasingly complex business, there is an urgent need for a more powerful task scheduling framework to solve the problems above, and so PowerJob came into being.

2.2 PowerJob debuts

PowerJob can be regarded as the third generation task scheduling framework. On the basis of task scheduling, it also provides additional distributed computing and workflow functions. Its main features are as follows:

  • Easy to use: A Web interface allows developers to visually manage scheduling tasks (adding, deleting, modifying, and querying tasks), monitor task running status, and view run logs.
  • Complete timing policies: supports CRON expressions, fixed frequency, fixed delay, and API-triggered timing policies.
  • Rich execution modes: supports single machine, broadcast, Map, and MapReduce execution modes. The Map/MapReduce processor enables developers to obtain cluster distributed computing capability with only a few lines of code.
  • Workflow support: supports online configuration of task dependencies, visual orchestration of tasks, and data transfer between upstream and downstream tasks.
  • Broad executor support: Spring Beans, built-in/external Java classes, Shell, Python, and other processors are supported.
  • Convenient operation and maintenance: supports online logs. Logs generated by executors can be displayed on the front-end console in real time, reducing debugging costs and greatly improving development efficiency.
  • Minimal dependencies: at minimum, only a relational database is required (MySQL, PostgreSQL, Oracle, MS SQL Server, etc.); any relational database supported by Spring Data JPA can be used.
  • High availability & high performance: the scheduling server is carefully designed to achieve lock-free scheduling, departing from the database-lock strategy used by other scheduling frameworks. Deploying multiple scheduling servers provides both high availability and higher performance (with unlimited horizontal scaling).
  • Failover and recovery: if a task fails, it is retried according to the configured retry policy. The task can complete successfully as long as the cluster has enough compute nodes.
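
The difference between the fixed frequency and fixed delay policies in the list above is easy to mix up, so here is a small sketch that computes start times on paper. It assumes the conventional meaning of the two terms (fixed frequency counts from the schedule, fixed delay counts from the end of the previous run) and that every run finishes within one period. PowerJob’s exact semantics are not stated in this article, and all names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: computes start times instead of really scheduling anything.
final class TimingPolicySketch {

    /** Fixed frequency: run k is due at k * periodMs, independent of how long each run takes. */
    static List<Long> fixedRateStarts(long periodMs, int runs) {
        List<Long> starts = new ArrayList<>();
        for (int k = 0; k < runs; k++) {
            starts.add(k * periodMs);
        }
        return starts;
    }

    /** Fixed delay: each run starts delayMs after the previous run has finished. */
    static List<Long> fixedDelayStarts(long delayMs, long[] runDurationsMs) {
        List<Long> starts = new ArrayList<>();
        long nextStart = 0;
        for (long duration : runDurationsMs) {
            starts.add(nextStart);
            nextStart += duration + delayMs;
        }
        return starts;
    }

    public static void main(String[] args) {
        long[] durations = {300, 500, 200};
        System.out.println("fixed rate : " + fixedRateStarts(1000, 3));          // [0, 1000, 2000]
        System.out.println("fixed delay: " + fixedDelayStarts(1000, durations)); // [0, 1300, 2800]
    }
}
```

With a 1000 ms setting and runs lasting 300, 500, and 200 ms, the fixed frequency starts stay on a regular grid, while the fixed delay starts drift by the duration of each run.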

2.3 Application Scenarios of PowerJob

To sum up, PowerJob is a new generation of distributed scheduling and computing framework that makes it easy to schedule tasks and to run complex tasks as distributed computations. It suits any enterprise with task scheduling requirements: a centrally deployed server can act as the company-wide scheduling platform, serving as distributed scheduling middleware.

  • Service scenarios that require scheduled execution: for example, fully synchronizing data in the early hours of every day and generating service reports.
  • Service scenarios that require every machine to run the task: for example, clearing cluster logs in broadcast mode.
  • Service scenarios that require distributed processing: for example, if a large amount of data needs to be updated and a single machine would take a long time, you can use the Map or MapReduce processor to distribute the work and mobilize the entire cluster to speed up computing.
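
The idea behind the last scenario can be sketched in plain Java: split the workload into batches (map), process the batches in parallel, then combine the results (reduce). This is only an illustration under stated assumptions: a local thread pool stands in for the worker nodes of a cluster, the per-batch work is a placeholder, and none of the names below come from PowerJob’s processor API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the map/reduce idea; a thread pool stands in for cluster workers.
final class MapReduceSketch {

    /** Map step: split the full workload into batches of at most batchSize items. */
    static List<List<Integer>> split(List<Integer> items, int batchSize) {
        List<List<Integer>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    /** Placeholder for the real per-batch work; here it only counts the records in the batch. */
    static int processBatch(List<Integer> batch) {
        return batch.size();
    }

    /** Runs every batch on the pool, then reduces the per-batch results into one total. */
    static int run(List<Integer> items, int batchSize, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<Integer> batch : split(items, batchSize)) {
                Callable<Integer> subTask = () -> processBatch(batch);
                futures.add(pool.submit(subTask));
            }
            int total = 0;
            for (Future<Integer> future : futures) {
                total += future.get(); // reduce step: sum the sub-task results
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for sub-tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a sub-task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println("processed records: " + run(ids, 3, 2));
    }
}
```

Unlike a fixed sharding rule, the batches here are created while the task runs, so the batch size and the number of sub-tasks can follow the size of the data.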

III. The outline

The series will progress step by step from getting started, through usage, to an analysis of the core technology. I hope you gain something from it, and code contributions are very welcome! The full outline is too long (10+ chapters), so only part of it is listed here:

  • Quick start
  • Overview of PowerJob technology
  • Technical analysis: Akka framework
    • The Actor model
    • Akka-remote simplifies communication code
    • Introduction to the Akka API
  • Technical analysis: task scheduling and distribution
    • Time wheel algorithm
    • Scheduling layer: OmsSchedulerService
    • Distribution layer: DispatchService
  • Technical analysis: The application of Spring AOP technology
    • intercept
    • exclude
  • etc.

IV. Summary and preview

This chapter told the story of how PowerJob came to be, briefly introduced the framework’s features and application scenarios, and outlined this series. In the next installment, I will walk you through a quick start with PowerJob, a powerful distributed task scheduling and computing framework.
