Execution graph and task chains

Programs and Data Flows (DataFlow)

Every Flink program can be roughly divided into three parts:

  • Source reads data from the data source

  • Transformation processes the data with various operators

  • Sink writes out the results

At run time, a program running on Flink is mapped to a “dataflow” that contains all three parts. Each dataflow starts with one or more sources and ends with one or more sinks. A dataflow resembles an arbitrary directed acyclic graph (DAG). In most cases, the transformations in a program correspond one-to-one to the operators in the dataflow, but sometimes a single transformation corresponds to more than one operator.
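The DAG shape described above can be sketched in plain Python. This is only an illustration, not Flink API: the `dataflow` dictionary and the helper names `sources_and_sinks` and `execution_order` are hypothetical, and the operator names are made up for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical dataflow: each key is an operator, each value lists its downstream operators.
dataflow = {
    "source": ["map"],
    "map": ["keyBy/window"],
    "keyBy/window": ["sink"],
    "sink": [],
}


def sources_and_sinks(graph):
    """Return (sources, sinks): operators with no inputs and operators with no outputs."""
    has_input = {down for downs in graph.values() for down in downs}
    sources = sorted(node for node in graph if node not in has_input)
    sinks = sorted(node for node, downs in graph.items() if not downs)
    return sources, sinks


def execution_order(graph):
    """Topologically sort the operators; raises graphlib.CycleError if the graph has a cycle."""
    # TopologicalSorter expects node -> predecessors, so invert the edges first.
    predecessors = {node: set() for node in graph}
    for node, downs in graph.items():
        for down in downs:
            predecessors[down].add(node)
    return list(TopologicalSorter(predecessors).static_order())


print(sources_and_sinks(dataflow))
print(execution_order(dataflow))
```

Adding an edge from "sink" back to "source" would make `execution_order` raise, which is the acyclic property the text refers to.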

ExecutionGraph

  • The execution graph in Flink can be divided into four layers: StreamGraph -> JobGraph -> ExecutionGraph -> physical execution graph
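One step in this layering can be illustrated with a small plain-Python sketch: the ExecutionGraph is the parallelized form of the JobGraph, so each job vertex is expanded into one subtask per unit of parallelism. The vertex names, the parallelism values, and the helper `expand_to_subtasks` are assumptions chosen for the example, not Flink API.

```python
# Hypothetical JobGraph: (vertex name, parallelism) pairs in pipeline order.
job_vertices = [("Source -> Map", 2), ("Window", 2), ("Sink", 1)]


def expand_to_subtasks(vertices):
    """Expand each job vertex into one entry per parallel subtask."""
    return [
        f"{name} ({index + 1}/{parallelism})"
        for name, parallelism in vertices
        for index in range(parallelism)
    ]


for subtask in expand_to_subtasks(job_vertices):
    print(subtask)
```

With these assumed values, three job vertices become five subtasks, which is roughly what the physical execution graph then deploys.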

Data transmission form

  • One-to-one: The stream preserves the partitioning and the order of elements (for example, between source and map). This means that a subtask of the map operator sees the same elements, in the same order, as the corresponding subtask of the source operator produced. Operators such as map, filter, and flatMap are one-to-one.
  • Redistributing: The partitioning of the stream changes. Each operator subtask sends data to different target subtasks depending on the selected transformation. For example, keyBy repartitions by the hash of the key, broadcast sends every element to all downstream subtasks, and rebalance distributes elements round-robin. All of these operators trigger redistribution, which is similar to the shuffle process in Spark.
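The difference between the two forms comes down to how a record picks its target subtask. The sketch below is a plain-Python simulation, not Flink code. It assumes that rebalance cycles through targets round-robin and that broadcast copies each record to every target. The CRC32 hash is a stand-in: Flink's real keyBy hashes the key into key groups, which this example does not reproduce. All function and class names are hypothetical.

```python
import zlib


def forward_targets(upstream_index):
    """One-to-one: upstream subtask i sends only to downstream subtask i."""
    return [upstream_index]


def key_by_targets(key, parallelism):
    """Hash partitioning: the same key always lands on the same subtask."""
    # Stand-in hash (CRC32 of the key's text), NOT Flink's real key-group hashing.
    return [zlib.crc32(str(key).encode("utf-8")) % parallelism]


def broadcast_targets(parallelism):
    """Broadcast: every record goes to every downstream subtask."""
    return list(range(parallelism))


class Rebalance:
    """Round-robin: consecutive records go to consecutive downstream subtasks."""

    def __init__(self, parallelism):
        self.parallelism = parallelism
        self.next_index = 0

    def targets(self):
        target = self.next_index
        self.next_index = (self.next_index + 1) % self.parallelism
        return [target]


rebalance = Rebalance(3)
print([rebalance.targets()[0] for _ in range(5)])
print(broadcast_targets(3))
```

Only `forward_targets` keeps a record on the same subtask index, which is why it preserves both partitioning and order.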

Task Chains (Operator Chains)

Conditions for chaining two adjacent operators:

  • Both operators have the same parallelism
  • The data transmission between them is one-to-one
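The two conditions can be expressed as a small check. This is a minimal sketch in plain Python that covers only the conditions listed here. Flink itself may take further settings into account, and the `Operator` class, `can_chain`, `build_chains`, and the sample pipeline are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    parallelism: int
    # True if the edge from the previous operator is one-to-one (no repartitioning).
    input_is_one_to_one: bool


def can_chain(upstream, downstream):
    """Two operators chain only if the edge is one-to-one and parallelism matches."""
    return downstream.input_is_one_to_one and upstream.parallelism == downstream.parallelism


def build_chains(pipeline):
    """Greedily merge adjacent chainable operators of a linear pipeline into tasks."""
    chains = []
    for op in pipeline:
        if chains and can_chain(chains[-1][-1], op):
            chains[-1].append(op)
        else:
            chains.append([op])
    return [" -> ".join(op.name for op in chain) for chain in chains]


pipeline = [
    Operator("Source", 2, False),
    Operator("Map", 2, True),           # one-to-one, same parallelism: chained
    Operator("KeyedWindow", 2, False),  # keyBy redistributes: starts a new task
    Operator("Sink", 1, True),          # parallelism differs: not chained
]

print(build_chains(pipeline))
```

In this assumed pipeline, only Source and Map end up in the same task. The other two operators each fail one of the conditions.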

This article is reproduced from my personal blog post, Flink’s operating architecture (ii), under the CC BY-SA 4.0 license.
