Preface

This article starts with an overview of Flink's components, then briefly covers the simple Local and Standalone modes, and mainly elaborates on the three deployment modes under YARN. Kubernetes mode also exists but is not covered here.

Components

Before we look at the submission modes, let's review how the Flink components collaborate.

 

Resource Manager

(1) Manages the slots of TaskManagers (a slot is Flink's unit of resource scheduling).

(2) When a JobManager (JM) applies for slot resources, the RM assigns it TaskManagers (TMs) that have free slots.

(3) If the RM does not have enough free slots to satisfy the request, it can ask the resource provider platform (e.g. YARN) for containers in which to start new TM processes.
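The slot bookkeeping described above can be condensed into a toy model (purely illustrative; the class and method names here are invented for the sketch and are not Flink APIs):

```python
# Toy model of ResourceManager slot bookkeeping (illustrative, not Flink's API).
class ResourceManager:
    def __init__(self):
        self.free_slots = {}          # task_manager_id -> number of free slots
        self.pending_containers = 0   # containers requested from the platform (e.g. YARN)

    def register_task_manager(self, tm_id, num_slots):
        # (1) TaskManagers register their slots with the RM on startup.
        self.free_slots[tm_id] = num_slots

    def request_slots(self, needed):
        # (2) A JobManager asks for slots; the RM hands out free ones.
        granted = []
        for tm_id in list(self.free_slots):
            while self.free_slots[tm_id] > 0 and len(granted) < needed:
                self.free_slots[tm_id] -= 1
                granted.append(tm_id)
        # (3) Not enough free slots: ask the resource provider platform
        # for containers in which to start new TaskManagers.
        shortfall = needed - len(granted)
        if shortfall > 0:
            self.pending_containers += shortfall
        return granted

rm = ResourceManager()
rm.register_task_manager("tm-1", 2)
print(rm.request_slots(3))  # 2 slots granted from tm-1, 1 new container requested
```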

JobManager

(1) The main process that controls the execution of a single application; each application is executed by its own JM.

(2) The JM first receives the application to be executed, which includes the JobGraph (a logical dataflow graph) and a JAR package containing all the classes, libraries, and other resources.

(3) The JM converts the JobGraph into a physical dataflow graph, called the ExecutionGraph, which contains all tasks that can be executed concurrently. The JM then requests the resources necessary for performing these tasks from the ResourceManager, namely slots on TaskManagers. Once it has acquired enough slots, it distributes the ExecutionGraph to the TMs that actually run the tasks. At runtime, the JM takes charge of everything that requires central coordination, such as checkpoints.
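The JobGraph-to-ExecutionGraph conversion can be pictured as expanding each logical operator by its parallelism into subtasks (a simplified, hypothetical sketch; the real ExecutionGraph also tracks edges, intermediate results, and scheduling state):

```python
# Simplified sketch: expand a logical JobGraph into parallel execution tasks.
# (Illustrative only; Flink's real ExecutionGraph is far richer.)
def to_execution_graph(job_graph):
    """job_graph: list of (operator_name, parallelism) pairs."""
    tasks = []
    for name, parallelism in job_graph:
        # One execution vertex ("subtask") per parallel instance.
        tasks.extend(f"{name}[{i}/{parallelism}]" for i in range(1, parallelism + 1))
    return tasks

job_graph = [("Source", 2), ("Map", 2), ("Sink", 1)]
print(to_execution_graph(job_graph))
# ['Source[1/2]', 'Source[2/2]', 'Map[1/2]', 'Map[2/2]', 'Sink[1/1]']
```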

Taskmanager

(1) The working processes in Flink. There are usually multiple TMs running in a Flink deployment, each containing a certain number of slots. The number of slots limits the number of tasks a TM can execute concurrently.

(2) After startup, a TM registers its slots with the ResourceManager; upon receiving instructions from the ResourceManager, the TM offers one or more slots to the JM. The JM can then assign tasks to those slots for execution.

(3) During execution, a TM can exchange data with other TMs running the same application.

Dispatcher

(1) It runs across jobs and provides a REST interface for application submission.

(2) When an application is submitted for execution, the Dispatcher starts a JM and hands the application over to it.

(3) The Dispatcher also starts a web UI (WebUI) to conveniently display and monitor job execution information.

Local Mode

The JobManager and TaskManager share the same JVM and require only a JDK. They run on a single node and are mainly used for debugging.

Standalone Mode

 

Standalone mode runs Flink as a self-contained distributed cluster; it does not depend on an external resource scheduling framework such as YARN.

The Master role is played by the JobManager.

The Slave/Worker role is played by the TaskManager.

Configuration and Startup

(1) The conf directory contains two files, masters and workers, which specify the node addresses.

(2) Configure conf/flink-conf.yaml as needed.

(3) Distribute the configuration to each machine.

(4) Start the cluster: bin/start-cluster.sh

(5) Submit a task: bin/flink run
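A minimal configuration sketch for steps (1) and (2) above (host names are placeholders; the keys shown are commonly adjusted options in conf/flink-conf.yaml):

```yaml
# conf/masters  -- JobManager host(s), e.g.:
#   master-host:8081
# conf/workers  -- TaskManager hosts, one per line, e.g.:
#   worker-host-1
#   worker-host-2

# conf/flink-conf.yaml (excerpt)
jobmanager.rpc.address: master-host       # address the TaskManagers connect to
taskmanager.numberOfTaskSlots: 2          # slots offered by each TaskManager
parallelism.default: 2                    # default job parallelism
```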

YARN Mode

First, look at the submission process.

 

(1) Before submitting the application, upload the Flink JAR package and configuration to HDFS, so that the JobManager and TaskManager can share them.

(2) The client submits a job to the ResourceManager. After receiving the request, the ResourceManager allocates container resources and notifies a NodeManager to start the ApplicationMaster.

(3) The ApplicationMaster loads the configuration from HDFS and starts the corresponding JobManager. The JobManager then analyzes the current JobGraph and converts it into an ExecutionGraph (which contains all tasks that can be executed concurrently), so it knows exactly which resources are required.

(4) The JobManager applies to the ResourceManager for resources. After receiving the request, the ResourceManager allocates container resources and tells the ApplicationMaster to start more TaskManagers (containers are allocated before the TaskManagers start). When a container starts a TaskManager, it also loads data from HDFS.

(5) After a TaskManager starts, it sends heartbeat packets to the JobManager, and the JobManager assigns tasks to it.
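The five steps above can be condensed into a toy trace (illustrative only; the real components talk over RPC, and the function here is invented for the sketch):

```python
# Toy trace of the YARN startup handshake described in steps (1)-(5) above.
# (Illustrative only; real YARN/Flink components communicate via RPC.)
def submit_job():
    events = []
    events.append("client: upload Flink jars + config to HDFS")              # step 1
    events.append("client -> ResourceManager: submit job")                   # step 2
    events.append("ResourceManager -> NodeManager: start ApplicationMaster")
    events.append("ApplicationMaster: load config from HDFS, start JobManager")  # step 3
    events.append("JobManager -> ResourceManager: request containers")       # step 4
    events.append("ApplicationMaster: start TaskManagers in containers")
    events.append("TaskManager -> JobManager: heartbeat; receive tasks")     # step 5
    return events

for event in submit_job():
    print(event)
```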

Session Mode

 

Session mode pre-initializes a cluster (typically started with bin/yarn-session.sh) and then submits applications to it. All applications run in the same cluster and share its resources, and there is only one JobManager. Jobs submitted to this cluster can run directly.

Session mode: Dispatcher and ResourceManager are shared, and jobs share cluster resources.

Session jobs are not isolated from each other, and if a TaskManager fails, all jobs hosted on it will fail as well. Similarly, the more jobs that are started, the greater the load on JobManager.

Therefore, the Session mode is suitable for scenarios with short life cycles and low resource consumption.

Submit

./bin/flink run -t yarn-session \
-Dyarn.application.id=application_XXXX_YY \
./examples/streaming/TopSpeedWindowing.jar

Per-Job Cluster Mode

 

In per-job mode, each job submitted to YARN has an independent Flink cluster with its own JobManager and TaskManagers. That is, one job is one cluster, and jobs are isolated from each other.

Per-job submission can have high startup latency precisely because the cluster is not shared: the PipelineExecutor must first create a cluster, then submit the JobGraph and the required files to YARN and perform a series of initialization operations, all of which takes time. In addition, when a task is submitted, all local Flink JAR packages are uploaded to a temporary directory on HDFS, which also brings considerable network overhead.

The advantage is that resources are completely isolated between jobs: the failure of one job's TaskManager does not affect other jobs, and the JobManager load is spread out, with no single point of failure. When a job completes, its cluster is destroyed and the resources are freed.

Therefore, the per-job pattern is typically used to deploy jobs that run for a long time.

Submit

./bin/flink run -t yarn-per-job --detached ./examples/streaming/TopSpeedWindowing.jar

Other Operations

# List running jobs on the cluster

./bin/flink list -t yarn-per-job -Dyarn.application.id=application_XXXX_YY

# Cancel a running job

./bin/flink cancel -t yarn-per-job -Dyarn.application.id=application_XXXX_YY

Application Mode

 

Application mode attempts to combine the resource isolation of per-job mode with a lightweight, scalable application submission process. To achieve this, it creates a cluster per application, but the application's main() method is executed on the JobManager.

Application mode creates a cluster for each submitted application, which can be thought of as a session cluster shared only by the jobs of that particular application and torn down when the application completes. With this architecture, Application mode provides resource isolation and load balancing at the granularity of applications.

Executing the main() method on the JobManager saves the CPU cycles the client would otherwise spend. As an added benefit, because each application has its own JobManager, the network load is spread more evenly.

Submit

./bin/flink run-application -t yarn-application ./examples/streaming/TopSpeedWindowing.jar

Other Operations

# List running jobs on the cluster

./bin/flink list -t yarn-application -Dyarn.application.id=application_XXXX_YY

# Cancel a running job

./bin/flink cancel -t yarn-application -Dyarn.application.id=application_XXXX_YY

Running multiple jobs in Application mode simply means calling the execute()/executeAsync() methods multiple times within one application. However, execute() is blocking: the next job can start only after the previous one completes. Non-blocking execution can be achieved with executeAsync(), which returns as soon as the job is submitted.
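By analogy, the blocking/non-blocking difference can be sketched with Python futures (this is not Flink's API; run_job is an invented stand-in for a submitted job):

```python
# Analogy for execute() vs executeAsync() (not Flink's API).
from concurrent.futures import ThreadPoolExecutor
import time

def run_job(name):
    time.sleep(0.1)  # pretend the job takes a while
    return f"{name} done"

pool = ThreadPoolExecutor(max_workers=2)

# Like execute(): blocking -- the next job starts only after job-A finishes.
result_a = pool.submit(run_job, "job-A").result()

# Like executeAsync(): non-blocking -- both jobs are submitted immediately,
# and the client collects the results later.
fut_b = pool.submit(run_job, "job-B")
fut_c = pool.submit(run_job, "job-C")
print(result_a, fut_b.result(), fut_c.result())
```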

YARN Mode Summary

In short: Session mode shares one long-running cluster across all jobs and suits short, resource-light jobs; per-job mode creates an isolated cluster per job and suits long-running jobs; Application mode creates a cluster per application and additionally runs the application's main() on the JobManager.

This article is from Big Data and Machine Learning Abstracts.