Flink problem solution summary

How to set the concurrency of the Source Operator when the Source is Kafka?

If not specified, the number of Source operators is equal to the number of TaskManagers in the cluster. If you set this parameter manually, you are advised to use slot number = Number of Kafka Partitions/Number of TaskManagers. The number of slots must be greater than or equal to 2, because there is a Source Operator. It is also not recommended to enable multithreading in a Slot.

What happens if the Barrier is lost?

Since the Barrier is sent periodically from the Source, the unblocked input channel will receive the next checkpoint Barrier after a period of time. Then Flink will compare the Barrier and find that if the current checkpoint is not completed, But the next checkpoint has already arrived, so Flink will abandon the current checkpoint and move on to the next checkpoint.

Cancel Job on Flink UI. Will all tasks stop?

Answer: No. The Cancel button simply stops the Source, Transform, and Sink operators, and the corresponding thread stops. But the entire TaskManager is still there. So, if there are any objects in the Job that do not initialize the Spring container in Operator, those objects will still exist even after canceling the Job. So, the correct posture is to initialize the Spring container in Operator’s open () method. These resources are released in the close () method.

What if the TaskManager hangs during Job execution?

If the TaskManager hangs, Flink cancels the Job first. Then deploy the Job to the TaskManager that is still alive in the cluster with the same JobID, and if there are enough task slots available, the Job can be restored. However, there is a problem: some TaskManagers will have more tasks deployed on them than before, resulting in heavy load on these taskmanagers, which may still cause problems. You need to recover the TaskManager as soon as possible.

What if a piece of data fails to transfer between Input channels?

An Exception will be thrown, and the Job will restart.

How long is Checkpoint set when Flink reads Kafka?

The snapshot itself is very lightweight, usually in a few meters or dozens of meters. If the snapshot is too large, such as a few hundred megabytes or more, it will affect the performance of the program. The official example is given every few seconds, depending on the Job.

What’s the difference between Checkpoint and Savepoint?

Savepoint is a special checkpoint. A savepoint is a pointer to a checkpoint. It needs to be triggered manually and will not expire or be overwritten unless manually deleted. Savepoint is not required in normal online environments. Test runs are required only when major changes are made to jobs or clusters.

Flink Operator can’t have a member variable?

Flink Operator Function cannot carry member variables that have not implemented Flink serialization. Flink has its own serialization method, and validation will be included when the task is submitted. If a class that does not implement Flink serialization is used as a member variable, an error will be reported when the task is submitted. The current solution is to separate operator Function from the actual business logic. Or make the member static.

How many taskslots are appropriate for each TaskManager?

The number of CPU cores is recommended.

The BufferPool in TaskManager is not enough.

Need to increase the configuration items: taskmanager.net work. NumberOfBuffers values, the value network stack buffer number, it said the size of the flow at the same time the TaskManager can have processing of data exchange on the number of the channel.

OOM is displayed during Job running. Procedure

Reserved space is not enough, need to reduce the middle tier of space at this moment, through configuration to reduce the taskmanager. Memory. The fraction of the value to reduce the middle tier of memory. This value represents the proportion of memory that Flink uses to manage the underlying buffer.

How to set the parallelism of Job?

Set all the Transform operator and Sink Operator parallism to the same, and the Source operator parallism to the source. Flink automatically merges both the Transform operator and the Sink operator into a piple line. In this case, a job has only two operators, the source operator and the merge operator. The pipeline operator has no buffer between it, and the performance is optimal.

Related Posts

Use Java code to obtain the Access token of Sina Weibo application

Configuration files are written like this to switch between multiple environments

Mysql Expression #1 of SELECT list is not in GROUP BY clause and contains non