What is distributed scheduling

What is distributed task scheduling? There are two meanings:

  1. Scheduling tasks that run in a distributed cluster environment (when multiple copies of the same scheduled task are deployed, only one of them should actually execute)
  2. Distributed scheduling of scheduled tasks, i.e. splitting a scheduled task: a large task is split into several small tasks that execute simultaneously

Introducing Elastic-Job

Elastic-Job is an open-source distributed scheduling solution from Dangdang, developed on top of Quartz. It consists of two independent sub-projects: Elastic-Job-Lite and Elastic-Job-Cloud. The focus here is Elastic-Job-Lite, which is positioned as a lightweight, decentralized solution that provides coordination services for distributed tasks in the form of a jar package.

Main functions:

  • Distributed scheduling coordination: in a distributed environment, tasks are executed according to the specified scheduling policy, and repeated execution of the same task by multiple instances is avoided
  • Rich scheduling policies: scheduled tasks are triggered by cron expressions, based on the mature scheduled-task framework Quartz
  • Elastic scaling: when an instance is added to the cluster, it can be elected to execute tasks; when an instance is removed from the cluster, the tasks it was executing are transferred to other instances
  • Failover: if a task fails on one instance, it is transferred to another instance for execution
  • Missed-execution job re-triggering: if a job run is missed for some reason, the missed run is recorded and automatically triggered after the previous run completes
  • Parallel scheduling: task sharding is supported, meaning a task is divided into multiple small tasks that execute on multiple instances at the same time
  • Job sharding consistency: after a task is sharded, each shard has exactly one execution instance in the distributed environment

Elastic-Job-Lite's lightweight, decentralized features

Lightweight:

  1. Used in the form of a jar reference; the only required external dependency is ZooKeeper
  2. No independent deployment is required

Decentralization:

  1. Execution nodes are peers: every node runs the same program and jar; the only difference between nodes is which shards they execute
  2. Automatic scheduling: no central scheduling node is assigned
  3. Service self-discovery
  4. The primary (leader) node is not fixed

Elastic-Job-Lite application

Jar package (API) + installed ZooKeeper (ZK) software

ZooKeeper (version 3.4.6 or later) must be installed. In this architecture, ZooKeeper serves two purposes: storage and notification.

Task sharding

Consider a large, time-consuming job, for example processing 100 million database records in one run. Handling those records on a single job node takes far too long, which is unacceptable in the Internet field, where the preference is to add machines and scale processing power horizontally. ElasticJob therefore divides a job into multiple tasks (each task corresponding to one shard item) and assigns each task to a specific machine instance (one instance can handle several tasks), while we supply the logic that each task executes.
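To make the split concrete, here is a self-contained sketch (plain Java, no Elastic-Job dependency; the record ids, shard count, and class name are illustrative) in which each task selects only its own subset of records, for example by taking `id % shardingTotalCount`:

```java
import java.util.ArrayList;
import java.util.List;

public class ShardSelection {
    // Each shard processes only the records whose id maps to its shard item.
    static List<Integer> recordsForShard(List<Integer> ids, int shardingTotalCount, int shardingItem) {
        List<Integer> mine = new ArrayList<>();
        for (int id : ids) {
            if (id % shardingTotalCount == shardingItem) {
                mine.add(id);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 0; id < 10; id++) ids.add(id);
        // With 3 shards in total, shard item 1 handles ids 1, 4, 7.
        System.out.println(recordsForShard(ids, 3, 1));
    }
}
```

Run on three instances with shard items 0, 1, and 2, every record is processed exactly once, and the three subsets are processed in parallel.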

A sharding strategy (Strategy) defines how shard items are allocated to machines. By default, shard items are distributed evenly across the machines, and the strategy can be customized. Sharding and the jobs themselves are coordinated through a registry center, because in a distributed environment, state data must be gathered at a central point before it can be communicated across the cluster.
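A simplified illustration of even allocation (plain Java; this approximates the idea of the default average-allocation strategy but is not its exact algorithm, and the instance names are placeholders):

```java
import java.util.*;

public class AverageSharding {
    // Assign shard items 0..shardingTotalCount-1 to instances in order,
    // giving the first (shardingTotalCount % instances) instances one extra item.
    static Map<String, List<Integer>> assign(List<String> instances, int shardingTotalCount) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int base = shardingTotalCount / instances.size();
        int extra = shardingTotalCount % instances.size();
        int item = 0;
        for (int i = 0; i < instances.size(); i++) {
            int count = base + (i < extra ? 1 : 0);
            List<Integer> items = new ArrayList<>();
            for (int k = 0; k < count; k++) items.add(item++);
            result.put(instances.get(i), items);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two instances, three shard items: app1 gets [0, 1], app2 gets [2].
        System.out.println(assign(Arrays.asList("app1", "app2"), 3));
    }
}
```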

Code snippet:
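The snippet itself is not preserved in this copy of the article, so what follows is a minimal sketch, assuming the Elastic-Job-Lite 2.x (`com.dangdang`) API; the job name, cron expression, shard count, and ZooKeeper address are placeholder values:

```java
import com.dangdang.ddframe.job.api.ShardingContext;
import com.dangdang.ddframe.job.api.simple.SimpleJob;
import com.dangdang.ddframe.job.config.JobCoreConfiguration;
import com.dangdang.ddframe.job.config.simple.SimpleJobConfiguration;
import com.dangdang.ddframe.job.lite.api.JobScheduler;
import com.dangdang.ddframe.job.lite.config.LiteJobConfiguration;
import com.dangdang.ddframe.job.reg.zookeeper.ZookeeperConfiguration;
import com.dangdang.ddframe.job.reg.zookeeper.ZookeeperRegistryCenter;

public class MyElasticJob implements SimpleJob {
    @Override
    public void execute(ShardingContext context) {
        // Each running instance receives only its own shard item(s).
        System.out.println("processing shard " + context.getShardingItem()
                + " of " + context.getShardingTotalCount());
    }

    public static void main(String[] args) {
        // Registry center: ZooKeeper address plus a namespace for this app.
        ZookeeperRegistryCenter regCenter = new ZookeeperRegistryCenter(
                new ZookeeperConfiguration("localhost:2181", "elastic-job-demo"));
        regCenter.init();
        // Job config: name, cron expression, total shard count.
        JobCoreConfiguration coreConfig = JobCoreConfiguration
                .newBuilder("myElasticJob", "0/10 * * * * ?", 3).build();
        SimpleJobConfiguration simpleConfig = new SimpleJobConfiguration(
                coreConfig, MyElasticJob.class.getCanonicalName());
        new JobScheduler(regCenter,
                LiteJobConfiguration.newBuilder(simpleConfig).build()).init();
    }
}
```

Starting a second process with the same code and namespace would register a second instance, and the three shard items would be split between the two.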

Elastic scaling

When a new running instance app3 is added, it automatically registers with the registry center. When the registry center detects the new service, it notifies ElasticJob to re-shard the job.

1. The shard count is also part of the job configuration. After the configuration is modified, the sharding algorithm is invoked again before the next scheduled run, and the mapping of which machine runs which shard is stored in ZK. The leader node writes the sharding result to the registry center, and the execution nodes then fetch that information from the registry center (each execution node picks up its own shards when the scheduled task starts).

2. If all nodes but one fail, every shard is assigned to the remaining node; this is also how ElasticJob achieves high availability.
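The two points above can be simulated with a small sketch (plain Java; round-robin allocation stands in for the real strategy, and instance names are placeholders): re-running the allocation over the currently live instances redistributes the shards, and with a single survivor every shard lands on it:

```java
import java.util.*;

public class Resharding {
    // Assign shard items round-robin over the currently live instances.
    static Map<String, List<Integer>> reshard(List<String> live, int shardingTotalCount) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String instance : live) result.put(instance, new ArrayList<>());
        for (int item = 0; item < shardingTotalCount; item++) {
            result.get(live.get(item % live.size())).add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        // Three live instances: the three shards are spread across them.
        System.out.println(reshard(Arrays.asList("app1", "app2", "app3"), 3));
        // Only app3 survives: it receives every shard (high availability).
        System.out.println(reshard(Arrays.asList("app3"), 3));
    }
}
```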

Notes

Article content source: Lagou Education Java high-salary boot camp course summary