As Ele.me deepens its use of big data, it has to deal with a rapidly growing number of tasks, diverse task types, complex task dependencies, low task execution efficiency, and hard-to-control task failures.


Current scale of the Ele.me big data platform: more than 54,000 big data tasks are completed every day, and the cluster has 85 nodes.

Open-source solutions

Oozie

Oozie is Yahoo's open-source workflow scheduling engine, implemented as a Java web application. It consists of the Oozie Client and the Oozie Server.

The Oozie Server is a web application that runs in a Java servlet container (Tomcat). A workflow must be a directed acyclic graph (DAG), and Oozie is essentially a Hadoop client.

When you need to execute multiple related MapReduce (MR) jobs, you only need to describe their execution order in a workflow.xml file and submit it with Oozie; Oozie then hosts the task flow.
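For a concrete feel, here is a minimal sketch of submitting such a workflow from Java with the standard OozieClient API; the Oozie server URL, HDFS paths, and cluster addresses are placeholders, not values from the article.

```java
import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class OozieSubmitExample {
    public static void main(String[] args) throws Exception {
        // Connect to the Oozie server (URL is a placeholder).
        OozieClient client = new OozieClient("http://oozie-host:11000/oozie");

        // Job properties point at the workflow.xml that defines the MR job sequence.
        Properties conf = client.createConfiguration();
        conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode/user/demo/workflow"); // dir containing workflow.xml
        conf.setProperty("nameNode", "hdfs://namenode:8020");
        conf.setProperty("jobTracker", "resourcemanager:8032");

        // Submit and start the workflow; Oozie manages the task flow from here on.
        String jobId = client.run(conf);

        // Poll the job status until it leaves RUNNING.
        while (client.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
            Thread.sleep(10_000);
        }
        System.out.println("Workflow finished: " + client.getJobInfo(jobId).getStatus());
    }
}
```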

Azkaban

Azkaban is a simple task scheduling service developed in Java by LinkedIn. It consists of a Web Server, a DB Server, and Executor Servers.

It runs a set of jobs and processes in a defined order within a workflow, uses a key-value (KV) file format to declare dependencies between tasks, and provides an easy-to-use web UI for maintaining and tracking workflows.

Airflow

Airflow is a platform for orchestrating, scheduling, and monitoring workflows. It was open-sourced by Airbnb and later incubated at the Apache Software Foundation.

Airflow expresses workflows programmatically as DAGs of tasks, and its scheduler executes the tasks on a set of workers while respecting the specified dependencies.

In addition, Airflow provides rich command-line tools and an easy-to-use user interface for viewing and operating workflows, as well as monitoring and alerting.

Ele.me scheduling system features

The Ele.me scheduling system has the following characteristics:

Task creation is simple, and the execution frequency can be specified with a cron expression (a small example appears at the end of this section).

Tasks are divided into categories, with 19 task types supported (calculation, push, extraction, and detection).

Task dependency configuration is simple: it supports matching dependencies across different scheduling cycles, recommended dependencies, and a DAG view.

Scheduling and execution support HA, smooth releases, downtime recovery, load balancing, monitoring and alerting, troubleshooting, rapid capacity expansion, and resource isolation.

Supported task types:

Calculation: Hive, Spark, PySpark, MR, Kylin.

Push: MySQL push, HBase push, Redis push, Cassandra push, HiveToX push.

Extraction: Data extraction.

Detection: Dal-slave detection, data quality detection, Edsink detection, extracted-data detection, data validity period checks, and import/export verification.

Others: Scheduled email tasks.
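As a small illustration of the cron-based execution frequency mentioned in the first characteristic above, the sketch below uses the Quartz CronExpression class (an assumption; the article does not say which cron library Ele.me uses) to compute when a task is next due.

```java
import java.util.Date;
import org.quartz.CronExpression;

public class CronFrequencyExample {
    public static void main(String[] args) throws Exception {
        // "Run at 02:30 every day" -- Quartz cron format: sec min hour day-of-month month day-of-week
        CronExpression cron = new CronExpression("0 30 2 * * ?");

        // A scheduler can use the next valid time to decide when an instance becomes due.
        Date next = cron.getNextValidTimeAfter(new Date());
        System.out.println("Next scheduled run: " + next);

        // It can also check whether a given timestamp matches the expression.
        System.out.println("Due now? " + cron.isSatisfiedBy(new Date()));
    }
}
```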

Overall architecture of the Ele.me scheduling system

The overall architecture of the Ele.me scheduling system consists of the following five parts:

Web service: mainly provides task creation, instance management, task dependency management, Worker control, task monitoring and alerting, and so on.

Scheduling and execution: composed of active and standby schedulers plus multiple Worker nodes, responsible for task scheduling and execution.

Basic services: Eless for self-service releases, ELK for troubleshooting, the Huskar configuration center, Etrace instrumentation for monitoring, DOG for alerting, and so on.

Underlying services: Hive, Spark, Presto, Kylin, and Hadoop are supported.

Shared infrastructure: MySQL, Redis, and Zookeeper.

The task running process is as follows:

Tasks and their dependencies are created through the API provided by the web service, and the task information is stored in MySQL.

The Scheduler periodically generates all task instances for the next day, and periodically polls each instance to move it to the Ready state once its scheduled time has arrived and its dependencies have completed.

When a Worker starts, it registers itself in Zookeeper and periodically reports its machine status to the Scheduler (a registration sketch follows these steps).

The Scheduler's ZkWorkerManager watches Zookeeper to obtain Worker registration information.

The TaskPacketFactory wraps each task into a TaskPacket and delivers it to a Worker according to the corresponding SubmitPolicy.

The Worker receives the task over Thrift, parses it into an InterpreterContext, and hands it to the corresponding Interpreter for execution; the task ultimately runs inside Docker.

The Docker execution result is returned to the Worker, which calls back to the Scheduler to write the state to MySQL.
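To make the Worker registration and ZkWorkerManager discovery steps above concrete, here is a minimal sketch using Apache Curator: the Worker creates an ephemeral znode, and the scheduler side watches the parent path for joins and losses. The znode layout, payload, and class names are illustrative assumptions, not Ele.me's actual implementation.

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.cache.PathChildrenCache;
import org.apache.curator.framework.recipes.cache.PathChildrenCacheEvent;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class WorkerRegistrationSketch {
    private static final String WORKERS_PATH = "/scheduler/workers"; // assumed znode layout

    public static void main(String[] args) throws Exception {
        CuratorFramework zk = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
        zk.start();

        // Worker side: register an ephemeral node; it disappears automatically if the Worker dies.
        String workerId = "worker-10.0.0.12";
        zk.create().creatingParentsIfNeeded()
          .withMode(CreateMode.EPHEMERAL)
          .forPath(WORKERS_PATH + "/" + workerId, "cpu=8,mem=32g".getBytes());

        // Scheduler side (conceptually what ZkWorkerManager does): watch the workers path
        // so newly registered or lost Workers are picked up immediately.
        PathChildrenCache cache = new PathChildrenCache(zk, WORKERS_PATH, true);
        cache.getListenable().addListener((client, event) -> {
            if (event.getType() == PathChildrenCacheEvent.Type.CHILD_ADDED) {
                System.out.println("Worker joined: " + event.getData().getPath());
            } else if (event.getType() == PathChildrenCacheEvent.Type.CHILD_REMOVED) {
                System.out.println("Worker lost: " + event.getData().getPath());
            }
        });
        cache.start();

        Thread.sleep(Long.MAX_VALUE); // keep the sketch running
    }
}
```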

Ele.me scheduling system functions

Task dependencies

Task dependencies can be configured in either of the following ways:

Recommended dependencies: after a task runs, its table and column information is stored in MySQL, and the Ele.me data lineage system recommends dependencies based on the relationships between tables.

Manual dependencies: table dependencies are set manually in the UI. Dependencies between tasks of different scheduling periods are supported, and offsets can be expressed with [,] and [~].
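As a rough sketch of how such dependencies might gate the Ready transition mentioned in the task running process, the snippet below marks an instance Ready only when its scheduled time has passed and every upstream instance it depends on has succeeded. The status model, record fields, and resolver interface are hypothetical, not Ele.me's actual schema.

```java
import java.time.LocalDateTime;
import java.util.List;

public class ReadinessCheckSketch {

    enum Status { NEW, READY, RUNNING, SUCCESS, FAILED }

    // Hypothetical view of one scheduled task instance.
    record TaskInstance(long taskId, LocalDateTime scheduleTime, Status status) {}

    // Hypothetical lookup of the upstream instances an instance depends on.
    // For cross-period dependencies (e.g. a daily task depending on an hourly one),
    // this would return all upstream instances in the mapped time range.
    interface DependencyResolver {
        List<TaskInstance> upstreamInstances(TaskInstance instance);
    }

    /** An instance becomes Ready once its scheduled time has passed
     *  and every upstream instance it depends on has succeeded. */
    static boolean isReady(TaskInstance instance, DependencyResolver resolver, LocalDateTime now) {
        if (instance.status() != Status.NEW) return false;
        if (instance.scheduleTime().isAfter(now)) return false;          // execution time not reached
        return resolver.upstreamInstances(instance).stream()
                       .allMatch(up -> up.status() == Status.SUCCESS);    // all dependencies completed
    }
}
```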

Fast automatic retry on failure

If a task fails to execute, the system automatically retries it, three times by default. If a node rejects a task because of resource shortage, the scheduler attempts to deliver the task to another machine according to the load-balancing policy.
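A minimal sketch of this behavior, with a hypothetical dispatch interface standing in for the real scheduler: up to three attempts, and when a node rejects the task for lack of resources the next attempt goes to another worker chosen round-robin.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RetrySketch {

    /** Hypothetical stand-in for delivering a task to a worker. */
    interface Dispatcher {
        void dispatch(String workerHost, long taskId) throws Exception;
    }

    /** Thrown when a worker rejects a task because it is short on resources. */
    static class ResourceRejectedException extends Exception {}

    private static final int MAX_ATTEMPTS = 3;            // default per the article
    private final AtomicInteger rr = new AtomicInteger(); // simple round-robin load balancing

    boolean submitWithRetry(long taskId, List<String> workers, Dispatcher dispatcher) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            String worker = workers.get(rr.getAndIncrement() % workers.size());
            try {
                dispatcher.dispatch(worker, taskId);
                return true;                               // delivered successfully
            } catch (ResourceRejectedException e) {
                // Node refused the task: try another machine on the next loop iteration.
            } catch (Exception e) {
                // Execution failure: also retried, up to MAX_ATTEMPTS in total.
            }
        }
        return false;                                      // give up and alert after three attempts
    }
}
```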

Self-service Troubleshooting

Task execution errors: each node runs an HTTP service that returns task execution logs to the web service over HTTP, where they are displayed so users can troubleshoot on their own. Alternatively, a link on the page opens the Ele.me error analysis platform (Grace) for automatic analysis.

Non-execution errors: Flume collects the task scheduling and execution logs, and you can query the scheduling and execution status by searching for the task's global ID in ELK.

Monitoring and alerting

Task monitoring and alerting: based on user-specified alert rules and frequency, alerts are sent by phone, email, or DingTalk when a task runs past its expected time or fails.

Fault monitoring and alerting: scheduling and execution nodes are instrumented with Etrace at key points such as task receipt, execution, and callback. When a node's metric falls below the average of the other nodes within a time window, an alert is raised.
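That detection rule can be illustrated with a small sketch: compare one node's metric over a time window with the average of its peers and flag it when it falls below that average by some margin. The margin and the metric source are assumptions for illustration.

```java
import java.util.Map;

public class NodeAnomalySketch {

    /**
     * Returns true if the given node's metric (e.g. tasks received in the last window)
     * is lower than the average of the other nodes by more than the given ratio.
     */
    static boolean isAnomalous(String node, Map<String, Double> windowMetrics, double ratio) {
        double peerAverage = windowMetrics.entrySet().stream()
                .filter(e -> !e.getKey().equals(node))
                .mapToDouble(Map.Entry::getValue)
                .average()
                .orElse(0.0);
        double own = windowMetrics.getOrDefault(node, 0.0);
        return own < peerAverage * (1.0 - ratio);  // e.g. ratio = 0.5: alert when below half the peer average
    }

    public static void main(String[] args) {
        Map<String, Double> received = Map.of("worker-1", 120.0, "worker-2", 115.0, "worker-3", 20.0);
        System.out.println(isAnomalous("worker-3", received, 0.5)); // true: far below its peers
    }
}
```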

Scheduler

Automatic active/standby switchover of the scheduler

Schedulers register with Zookeeper and elect one instance as Leader to provide the scheduling service. Non-Leader instances watch the Leader's status and stand by; when the Leader fails, one of them immediately takes over the Leader role and continues to provide the service.
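A minimal sketch of this active/standby pattern using Curator's LeaderLatch recipe (the article does not say which election mechanism Ele.me uses, so this is an assumption): every scheduler instance starts a latch on the same path, only the current leader runs the scheduling loops, and a standby is promoted automatically if the leader's session is lost.

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.framework.recipes.leader.LeaderLatchListener;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class SchedulerLeaderElectionSketch {
    public static void main(String[] args) throws Exception {
        CuratorFramework zk = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
        zk.start();

        // Each scheduler instance creates a latch on the same path; Zookeeper picks one leader.
        LeaderLatch latch = new LeaderLatch(zk, "/scheduler/leader", "scheduler-1");
        latch.addListener(new LeaderLatchListener() {
            @Override public void isLeader() {
                // This instance just became the active scheduler: start scheduling loops here.
                System.out.println("Became leader, starting scheduling service");
            }
            @Override public void notLeader() {
                // Leadership lost (e.g. session expired): stop scheduling and stand by.
                System.out.println("Lost leadership, standing by");
            }
        });
        latch.start();

        Thread.sleep(Long.MAX_VALUE); // standby instances simply wait until they are promoted
    }
}
```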

Downtime recovery and self-repair

If all schedulers go down and the scheduling service has not yet recovered, the Workers' callbacks to the scheduler fail.

In that case, the task status is saved to a local file database and the callback is retried periodically; once the scheduling service recovers, the task status is restored.

When a Worker execution node goes down, tasks on it may still be in the running state. When the node restarts, the Worker self-repairs the running tasks: it resets tasks that had not yet started, and it restores running tasks by reading the status files that the Docker execution wrote to the local disk.
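The callback-recovery behavior can be sketched roughly as follows: when a callback to the scheduler fails, the Worker writes the task state to a local file, and a background loop replays the pending files until the scheduler is reachable again. The file layout, paths, and callback interface here are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CallbackRecoverySketch {

    /** Hypothetical stand-in for the Worker -> Scheduler state callback. */
    interface SchedulerCallback {
        void reportState(long taskId, String state) throws IOException;
    }

    private static final Path PENDING_DIR = Paths.get("/data/worker/pending-callbacks"); // assumed location
    private final ScheduledExecutorService retryLoop = Executors.newSingleThreadScheduledExecutor();

    /** Try to call back immediately; on failure, persist the state locally for later retry. */
    void reportOrPersist(SchedulerCallback callback, long taskId, String state) throws IOException {
        try {
            callback.reportState(taskId, state);
        } catch (IOException schedulerDown) {
            Files.createDirectories(PENDING_DIR);
            Files.write(PENDING_DIR.resolve(taskId + ".state"), state.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Periodically replay persisted states once the scheduling service is back. */
    void startRetryLoop(SchedulerCallback callback) {
        retryLoop.scheduleWithFixedDelay(() -> {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(PENDING_DIR, "*.state")) {
                for (Path file : files) {
                    long taskId = Long.parseLong(file.getFileName().toString().replace(".state", ""));
                    String state = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                    callback.reportState(taskId, state); // throws if the scheduler is still down
                    Files.delete(file);                  // callback succeeded: forget the local copy
                }
            } catch (Exception stillDown) {
                // Scheduler not reachable yet (or nothing pending); try again on the next tick.
            }
        }, 30, 30, TimeUnit.SECONDS);
    }
}
```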

Smooth release

When a Worker node is upgraded to a new version, running tasks are repaired in the same way as described above.

Resource isolation and rapid capacity expansion

Docker is used to limit the memory and CPU usage of each task, and the underlying services the tasks depend on are packaged as images, so the required environment can be built easily when scaling out.
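Per-task resource limits with Docker can be illustrated by launching the task container with memory and CPU caps; the image name, command, and limit values below are placeholders.

```java
import java.util.List;

public class DockerLimitSketch {
    public static void main(String[] args) throws Exception {
        // Run one task inside Docker with hard memory and CPU limits.
        // "task-runtime:latest" and the spark-submit command are placeholders.
        ProcessBuilder docker = new ProcessBuilder(List.of(
                "docker", "run", "--rm",
                "--memory", "4g",        // cap the task at 4 GB of RAM
                "--cpus", "2",           // and at 2 CPU cores
                "task-runtime:latest",
                "spark-submit", "/jobs/daily_report.py"));
        docker.inheritIO();              // stream container output to the Worker's log

        int exitCode = docker.start().waitFor();
        System.out.println("Task container exited with code " + exitCode);
    }
}
```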

Node Fault Maintenance

When a node is faulty or needs maintenance, the Worker execution node can be taken online or offline through the web interface. After a node is taken offline, it no longer receives new tasks, but tasks already running on it are not affected.