“This is the 42nd day of my participation in the November Gwen Challenge. See details: The Last Gwen Challenge 2021”.
I. Flink operation architecture
1. Flink runtime components
The Flink runtime architecture consists of four distinct components that work together when running flow processing applications: JobManager, ResourceManager, TaskManager, and Dispatcher. Because Flink is implemented in Java and Scala, all components run on the Java virtual machine. The responsibilities of each component are as follows:
-
JobManager
The main process that controls the execution of an application, that is, each application is executed by a different JobManager. The JobManager first receives the application to execute, which consists of a JobGraph, a Logical Dataflow Graph, and a JAR package containing all the classes, libraries, and other resources. JobManager converts JobGraph into a physical data flow graph, called an ExecutionGraph, that contains all tasks that can be executed concurrently. The JobManager requests resources necessary for performing tasks from ResourceManager, that is, slots on the TaskManager. Once it has acquired enough resources, it distributes the execution diagrams to the TaskManager that actually runs them. During the runtime, JobManager takes charge of any operations that require central coordination, such as checkpoints (which I capture).
-
ResourceManager (ResourceManager)
The TaskManager slot is a processing resource unit defined in Flink. Flink provides different resource managers for different environment and resource management tools, such as YARN, Mesos, K8s, and standalone deployment. When JobManager applies for slot resources, ResourceManager allocates TaskManager with free slot to JobManager. If ResourceManager does not have enough slots to satisfy JobManager requests, it can also initiate a session to the resource provider platform to provide a container to start the TaskManager process. In addition, ResourceManager terminates idle TaskManager to release computing resources.
-
TaskManager
Work process in Flink. There are usually multiple TaskManagers running in Flink, and each TaskManager has a number of slots. The number of slots limits the number of tasks a TaskManager can perform. Once started, TaskManager registers its slots with the resource manager; Upon receiving the resource manager’s instruction, The TaskManager provides one or more slots to the JobManager call. JobManager can then assign tasks to the slot to execute. During execution, a TaskManager can exchange data with other TaskManagers running the same application.
-
Dispatcher
It can be run across jobs and provides a REST interface for application submission. When an application is submitted for execution, the distributor starts and hands the application over to a JobManager. Because it is a REST interface, the Dispatcher can act as an HTTP access point to the cluster, which is not blocked by the firewall. The Dispatcher also launches a Web UI to easily display and monitor information about job execution. Dispatcher may not be necessary in the architecture, depending on how the application submission runs.
2. Task submission process
Let’s look at how Flink’s components interact when an application commits:
The figure above is a high-level view of how components interact in an application. If the cluster environment deployed is different (e.g. YARN, Mesos, Kubernetes, standalone, etc.), some of these steps can be omitted, or some of the components will run in the same JVM process.
Specifically, if we deploy the Flink cluster to YARN, we have the following commit flow:
After the Flink task is submitted, the Client uploads the Flink Jar package and configuration to the HDFS, and submits the Flink task to Yarn ResourceManager. ResourceManager allocates Container resources and notifies NodeManager to start ApplicationMaster. ApplicationMaster loads Flink Jar packages and configures the build environment. Then JobManager is started. ApplicationMaster applies for resources from ResourceManager to start TaskManager. ResourceManager allocates Container resources. ApplicationMaster starts TaskManager with NodeManager, which loads Flink jars, configates the build environment, and starts TaskManager. After TaskManager starts, it sends heartbeat packets to JobManager and waits for JobManager to assign tasks to it.
3. Principle of task scheduling
When Flink clusters are started, a JobManger and one or more TaskManagers are first started. The Client submits a task to the JobManager, which then dispatches the task to each TaskManager for execution. The TaskManager then reports heartbeat and statistics to the JobManager. Data is transferred between taskManagers in the form of streams. All three are independent JVM processes.
Client is the Client that submits jobs. It can run on any machine (connected to the JobManager environment). After the Job is submitted, the Client can end the process (the Streaming task) or wait for the result to return.
JobManager is mainly responsible for checking jobs and coordinating tasks to checkpoint, similar to Storm’s Nimbus. After receiving resources such as jobs and JAR packages from clients, the system generates optimized execution plans and allocates them to each TaskManager based on Task units.
TaskManager starts with a number of slots. Each Slot can start a Task, which is a thread. The system receives tasks to be deployed from the JobManager. After the deployment starts, the system establishes a Netty connection with the upstream device to receive data and process the data.
3.1. TaskManger and Slots
Each worker(TaskManager) in Flink is a JVM process that may execute one or more subtasks on separate threads. In order to control how many tasks a worker can receive, the worker controls them through task slot (each worker has at least one task slot).
Each Task slot represents a fixed-size subset of resources owned by the TaskManager. If a TaskManager has three slots, it triples the memory it manages into each slot. Resource slotting means that a subtask does not have to compete with subtasks from other jobs for managed memory, but instead has a certain amount of memory reserved. Note that there is no CPU isolation involved; slot is currently only used to isolate the managed memory of tasks.
Adjusting the number of Task slots allows users to define how subtasks are isolated from each other. If a TaskManager has one slot, it means that each Task group runs in a separate JVM (which might be started from a specific container), A TaskManager with multiple slots means that more subtasks can share the same JVM. Tasks within the same JVM process share TCP connections (based on multiplexing) and heartbeat messages. They may also share data sets and data structures, so this reduces the load on each task.
By default, Flink allows subtasks to share slots, even if they are subtasks of different tasks (provided they come from the same job). The result is that a single slot can hold the entire pipe of a job.
Task Slot is a static concept, refers to the concurrent execution ability, TaskManager has can pass parameters TaskManager. NumberOfTaskSlots configured; Parallelism is a dynamic concept, that is, the actual concurrency used by TaskManager to run programs. It can be configured using the parameter Parallelism. Default.
In other words, suppose there are 3 taskManagers, and each TaskManager is assigned 3 taskslots, that is, each TaskManager can receive 3 tasks, a total of 9 taskslots. If we set parallelism. Default =1, the default parallelism for running the program is 1.
3.2 program and DataFlow (DataFlow)
All Flink programs are composed of three parts: Source, Transformation, and Sink.
Source reads data sources, Transformation uses various operators for processing, and Sink is responsible for output.
At runtime, programs running on Flink are mapped to “logical dataflows,” which contain all three parts. Each dataflow begins with one or more sources and ends with one or more sinks. A Dataflow is similar to any directed acyclic graph (DAG). In most cases, transformations in a program correspond directly to operators in a Dataflow, but sometimes a transformation may correspond to multiple operators.
3.3. ExecutionGraph
The data flow graphs directly mapped by the Flink program are Streamgraphs, also known as logical flow graphs because they represent a high-level view of computational logic. To execute a flow handler, Flink needs to convert a logical flow diagram into a physical data flow diagram (also called an execution diagram) detailing how the program executes.
The ExecutionGraph in Flink can be divided into four layers: StreamGraph -> JobGraph -> ExecutionGraph -> physical ExecutionGraph.
StreamGraph: Is the initial graph generated from code written by the user through the Stream API. Used to represent the topology of a program.
JobGraph: The StreamGraph is optimized to give rise to JobGraph, the data structure submitted to JobManager. The main optimization is to combine multiple qualified node chains as one node, which can reduce the serialization/deserialization/transmission consumption of data flowing between nodes.
ExecutionGraph: JobManager root generates ExecutionGraph from JobGraph. ExecutionGraph, a parallel version of JobGraph, is the core data structure of the scheduling layer.
Physical ExecutionGraph: the graph is not a specific data structure after jobs are scheduled by the JobManager according to the ExecutionGraph and tasks are deployed on each TaskManager.
3.4. Parallelism
The execution of Flink program is parallel and distributed.
During execution, a stream contains one or more stream partitions, and each operator may contain one or more operator subtasks. These subtasks are executed independently of each other on different threads, on different physical machines, or in different containers.
The number of subtasks of a particular operator is called parallelism. In general, the parallelism of a stream program can be considered as the maximum parallelism of all its operators. In a program, different operators may have different degrees of parallelism.
Two, friendship links
Big data Flink-Java learning journey ii
Big data Flink- The first chapter of Java learning journey