Flink Parallelism and How to Set It

TaskManager and Slot

Each TaskManager has one or more slots. The number of slots is usually proportional to the number of CPU cores available on each TaskManager node; as a rule of thumb, set the slot count to the number of CPU cores per node.

A TaskManager is a JVM process, and each slot is a fixed share of that process's resources in which subtasks run as threads. If Flink on YARN mode is used, this resource configuration is not required.

Parallelism

A Flink program consists of multiple tasks (source, transformation, and sink). Each task is executed by several parallel instances (threads), and the number of parallel instances of a task is called the parallelism of that task.
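To make the idea of "one task, several parallel instances" concrete, here is a minimal plain-Java sketch. It is not Flink code: the class ParallelInstancesSketch, the method parallelSum, and the round-robin partitioning are all assumptions made for illustration. It only shows one task being run by a chosen number of threads, each handling its own share of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java illustration only; none of these names come from Flink.
final class ParallelInstancesSketch {

    // Runs one "task" (summing numbers) as `parallelism` instances,
    // each on its own thread and each reading its own partition.
    static long parallelSum(List<Integer> input, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (int instance = 0; instance < parallelism; instance++) {
                final int index = instance;
                partials.add(pool.submit(() -> {
                    long sum = 0;
                    // Round-robin partitioning: instance i takes elements i, i+p, i+2p, ...
                    for (int k = index; k < input.size(); k += parallelism) {
                        sum += input.get(k);
                    }
                    return sum;
                }));
            }
            // Combine the partial results of all instances.
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel instance failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(parallelSum(numbers, 3)); // prints 55
    }
}
```

Changing the second argument changes how many threads share the work, but not the result, which is the property a parallelism setting is expected to have.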

Setting the parallelism

The parallelism of a task can be specified at several levels:

  1. Operator Level
  2. Execution Environment Level
  3. Client Level
  4. System Level

Operator Level

The parallelism of an individual operator, data source, or sink can be specified by calling setParallelism() on it.

Execution Environment Level

The default parallelism of an execution environment (and therefore of the job built on it) can be specified by calling setParallelism() on the environment.

When it is specified this way, all operators, data sources, and data sinks run with that same parallelism.

The execution environment's parallelism can still be overridden by explicitly setting the parallelism of an individual operator.

Client Level

The parallelism can also be set when the client submits the job to Flink. For the CLI client, specify it with the -p parameter:

  ./bin/flink run -p 10 WordCount-java.jar

System Level (avoid where possible)

At the system level, the default parallelism for all execution environments can be specified by setting the parallelism.default property in the flink-conf.yaml file.
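The four levels can be read as a lookup order. The sketch below assumes the precedence usually given for Flink: operator first, then execution environment, then client, then system default. The text above states only the operator-over-environment part explicitly, so treat the rest of the order as an assumption. ParallelismResolver and resolve are hypothetical names, not part of the Flink API.

```java
import java.util.OptionalInt;

// Hypothetical helper for illustration; not part of the Flink API.
final class ParallelismResolver {

    // Returns the first value that is set, checking the most specific level first.
    // Assumed order: operator > execution environment > client (-p) > system default.
    static int resolve(OptionalInt operator,
                       OptionalInt environment,
                       OptionalInt client,
                       int systemDefault) {
        if (operator.isPresent()) {
            return operator.getAsInt();
        }
        if (environment.isPresent()) {
            return environment.getAsInt();
        }
        if (client.isPresent()) {
            return client.getAsInt();
        }
        return systemDefault;
    }

    public static void main(String[] args) {
        OptionalInt unset = OptionalInt.empty();

        // Nothing set in code, job submitted with -p 10: the client value applies.
        System.out.println(resolve(unset, unset, OptionalInt.of(10), 1)); // 10

        // Environment set to 9, sink operator set to 1: the operator value applies.
        System.out.println(resolve(OptionalInt.of(1), OptionalInt.of(9), OptionalInt.of(10), 1)); // 1

        // Nothing set anywhere: fall back to the system default.
        System.out.println(resolve(unset, unset, unset, 1)); // 1
    }
}
```

Read this way, the advice to avoid the system level follows naturally: it is the value that applies only when nothing more specific was set, so it affects every job that leaves its parallelism unspecified.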

Parallelism diagram

Example 1:

Explanation:

  1. In flink-conf.yaml, taskmanager.numberOfTaskSlots defaults to 1, that is, each TaskManager has only one slot; here it is set to 3.
  2. In Example 1, the WordCount program sets the parallelism to 1, so Source, Reduce, and Sink each have only one instance and together occupy only one slot.

Example 2: after the parallelism is set to 2, two slots are occupied.

Example 3: setting the parallelism to 9 occupies nine slots.

Example 4:

Explanation:

  1. With the parallelism set to 9 and the sink's parallelism set to 1, Source and Reduce each have nine instances, but Sink has only one.

Conclusion

From the examples above we can see:

  • An operator's parallelism cannot exceed the number of slots in the cluster
  • A slot can hold instances of different operators
  • Parallel instances of the same operator must run in different slots
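These three observations can be reproduced with a small plain-Java model of Example 4. It is a simplified sketch, not Flink's actual scheduler: it assumes all operators may share slots, that instance i of every operator is placed in slot i, and that the cluster offers nine slots. SlotSharingSketch and assign are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model for illustration; not Flink's real scheduling logic.
final class SlotSharingSketch {

    // Places instance i of every operator into slot i, so that
    // two instances of the same operator never share a slot.
    static List<List<String>> assign(Map<String, Integer> operatorParallelism, int availableSlots) {
        // The number of slots needed is the highest parallelism of any operator.
        int needed = 0;
        for (int parallelism : operatorParallelism.values()) {
            needed = Math.max(needed, parallelism);
        }
        if (needed > availableSlots) {
            throw new IllegalArgumentException(
                "parallelism " + needed + " exceeds the " + availableSlots + " available slots");
        }
        List<List<String>> slots = new ArrayList<>();
        for (int i = 0; i < needed; i++) {
            List<String> slot = new ArrayList<>();
            for (Map.Entry<String, Integer> op : operatorParallelism.entrySet()) {
                if (i < op.getValue()) {
                    slot.add(op.getKey() + "[" + i + "]");
                }
            }
            slots.add(slot);
        }
        return slots;
    }

    public static void main(String[] args) {
        // Example 4: Source and Reduce with parallelism 9, Sink with parallelism 1.
        Map<String, Integer> job = new LinkedHashMap<>();
        job.put("Source", 9);
        job.put("Reduce", 9);
        job.put("Sink", 1);

        List<List<String>> slots = assign(job, 9); // nine slots is an assumed cluster size
        for (int i = 0; i < slots.size(); i++) {
            System.out.println("slot " + i + ": " + slots.get(i));
        }
    }
}
```

In this model the first slot holds one instance each of Source, Reduce, and Sink, while the remaining eight slots hold one Source and one Reduce instance each. Asking for a parallelism of 10 against nine slots is rejected.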