1. Brief introduction to Flink
Flink is a framework and distributed processing engine for stateful computation over unbounded and bounded data streams. It provides core capabilities such as data distribution, fault tolerance, and resource management.
2. The differences between Flink and Spark Streaming
- Architecture: Spark Streaming's runtime roles include Master, Worker, Driver, and Executor, while Flink's include JobManager, TaskManager, and Slot.
- Job generation: Spark Streaming builds a directed acyclic graph (DAG) from continuously generated micro-batches, creating the DStreamGraph, JobGenerator, and JobScheduler in sequence. Flink generates a StreamGraph from the user-submitted code, optimizes it into a JobGraph, and submits that to the JobManager, which expands it into an ExecutionGraph. The ExecutionGraph is the core data structure of Flink scheduling: the JobManager schedules jobs based on it.
- Time mechanism: Spark Streaming supports only a limited notion of time, namely processing time. Flink supports three definitions of time for stream programs: processing time, event time, and ingestion time. It also supports the watermark mechanism for handling late data.
- Exactly-once semantics: for a Spark Streaming task we can set a checkpoint, and if a failure occurs the job can restart and recover from the last checkpoint. However, this only keeps data from being lost; records may be processed repeatedly, so it cannot guarantee exactly-once semantics. Flink solves this with a two-phase commit protocol.
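The two-phase commit pattern can be sketched as follows. This is an illustrative simplification of the idea behind Flink's `TwoPhaseCommitSinkFunction`, not the real API: writes go into a per-checkpoint pending transaction (phase 1, pre-commit) and only become visible once the whole checkpoint is confirmed (phase 2, commit). All class and method names here are assumptions for illustration.

```python
# Hedged sketch of two-phase commit as used by an exactly-once sink
# (simplified names; not Flink's actual TwoPhaseCommitSinkFunction API).

class TwoPhaseCommitSink:
    def __init__(self):
        self.current_txn = []   # open transaction buffer
        self.pending = {}       # checkpoint_id -> pre-committed transaction
        self.committed = []     # durably visible records

    def write(self, record):
        self.current_txn.append(record)

    def pre_commit(self, checkpoint_id):
        # Phase 1: flush and park the open transaction, start a new one.
        self.pending[checkpoint_id] = self.current_txn
        self.current_txn = []

    def on_checkpoint_complete(self, checkpoint_id):
        # Phase 2: the coordinator confirmed the global checkpoint -> commit.
        self.committed.extend(self.pending.pop(checkpoint_id))

    def on_failure(self):
        # Abort anything not globally confirmed; the job replays from
        # the last completed checkpoint, so nothing is lost or duplicated.
        self.current_txn = []
        self.pending.clear()

sink = TwoPhaseCommitSink()
sink.write("a"); sink.write("b")
sink.pre_commit(checkpoint_id=1)
sink.write("c")                    # belongs to the next checkpoint
sink.on_checkpoint_complete(1)
print(sink.committed)  # ['a', 'b'] -- 'c' is not visible yet
```

The key point is that a record is never externally visible before the checkpoint that covers it has fully succeeded, which is what upgrades at-least-once recovery to exactly-once output.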
3. Flink's checkpoint mechanism
To achieve fault tolerance and exactly-once semantics, Flink regularly persists state. This process is called a checkpoint: a snapshot of the global state of a Flink job at a certain moment. Checkpoint process:
Each Flink job has a JobManager, which runs a checkpoint coordinator to manage the checkpoint process. At a configured interval, the coordinator injects a checkpoint barrier into each source task. When a source operator receives the barrier, it pauses its data processing, snapshots its current state to the configured persistent store, asynchronously sends an acknowledgment (ACK) to the checkpoint coordinator, and broadcasts the barrier to all of its downstream operators before resuming processing. Each downstream operator repeats the same steps and forwards the barrier until it reaches the sink operators, at which point the snapshot is complete.

Note that an operator may have multiple upstream inputs; the barrier must arrive on all of them before that operator's snapshot is triggered (barrier alignment). A long checkpoint time is therefore often caused by the time spent waiting for alignment.
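Barrier alignment for an operator with two input channels can be sketched like this. This is an illustrative simulation, not Flink code: once a barrier arrives on one channel, further records from that channel are buffered until the barrier has arrived on every channel; only then is the snapshot taken and processing resumed.

```python
# Illustrative sketch of checkpoint barrier alignment (not Flink code).
# 'BARRIER' marks the checkpoint barrier in each channel's stream.

def align_and_snapshot(channels):
    """channels: dict of channel name -> ordered list of incoming items."""
    processed, buffered = [], []
    seen_barrier = set()
    queues = {name: list(items) for name, items in channels.items()}
    while any(queues.values()):
        for name, q in queues.items():     # round-robin over the channels
            if not q:
                continue
            item = q.pop(0)
            if item == "BARRIER":
                seen_barrier.add(name)
            elif name in seen_barrier:
                buffered.append(item)      # channel blocked until alignment
            else:
                processed.append(item)     # pre-barrier data: process normally
    assert seen_barrier == set(channels)   # all barriers arrived -> snapshot
    snapshot_state = list(processed)       # snapshot covers only pre-barrier data
    processed.extend(buffered)             # resume: drain the buffered records
    return snapshot_state, processed

snap, all_processed = align_and_snapshot({
    "ch1": ["a1", "BARRIER", "a2"],
    "ch2": ["b1", "b2", "BARRIER"],
})
print(snap)  # ['a1', 'b1', 'b2'] -- exactly the pre-barrier records
```

The buffering step is also where the cost of slow alignment shows up: the longer one channel's barrier lags behind, the more data from the faster channels must wait.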
4. What are the functions of each component of Flink
- JobManager
The JobManager acts as the Master in the cluster. It is the coordinator of the whole cluster: it receives Flink jobs, coordinates checkpoints, performs failover recovery, and manages the TaskManagers on the slave nodes of the Flink cluster.
- TaskManager
A TaskManager is the Worker actually responsible for computation; it executes the tasks of Flink jobs. Each TaskManager manages the resources on its node, such as memory, disk, and network, and reports its resource status to the JobManager when it starts.
- Client
The Client is the client through which Flink programs are submitted. When a user submits a Flink program, a Client is created first; it preprocesses the submitted program and sends it to the Flink cluster for execution. To do so, the Client reads the JobManager address from the submitted program's configuration, establishes a connection to the JobManager, and submits the Flink job to it.
5. Slot concept
As mentioned in the Flink architecture roles above, the TaskManager is the Worker actually responsible for performing computation. A TaskManager is a JVM process that executes one or more subtasks in separate threads. To control how many tasks a TaskManager can accept, Flink introduces the concept of the task slot.
In simple terms, a TaskManager divides the resources it manages on its node into slots, each a fixed-size subset of those resources. This prevents tasks of different jobs from competing with each other for memory. Note, however, that slots only provide memory isolation; CPU is not isolated.
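The resource split can be illustrated with a back-of-the-envelope calculation. This is a sketch under the simplifying assumption of an even split of managed memory across slots, not a reproduction of Flink's exact accounting:

```python
# Sketch (assumed even split): a TaskManager's managed memory divided
# across its task slots. Memory is isolated per slot; CPU is not.

def slot_memory_mb(taskmanager_managed_mb, num_slots):
    return taskmanager_managed_mb / num_slots

# A TaskManager with 4096 MB of managed memory and 4 slots:
print(slot_memory_mb(4096, 4))  # 1024.0 MB per slot; all 4 slots still
                                # share the TaskManager's CPU cores
```

This is also why slot count is often set relative to the number of cores: the slot guards memory, while CPU time is shared among all subtasks in the process.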
6. Back pressure
If the producer uses an unbounded buffer and produces much faster than the consumer can consume, data accumulates on the producer side because of the consumer's limited capacity, eventually causing an OOM.
- Static flow control: when the producer's TPS exceeds the consumer's, wrap the data into batches and send batch by batch, sleeping for a period after each send; the sleep time is computed from the remaining data and the consumer's TPS. The drawback is that the consumer's speed must be estimated up front and cannot be adjusted dynamically.
- Dynamic flow control: Flink data exchange can be roughly divided into three types: data exchange within the same task; data exchange between different tasks in the same JVM; and data exchange between tasks in different JVMs.
- The first type uses the forward strategy: data is exchanged within the same task, mainly to avoid the serialization and network overhead that would otherwise waste resources.
- In the second type, data is serialized by a RecordWriter and written to a ResultPartition, then transferred through a local channel to the InputGate of another task, where it is deserialized and pushed to the RecordReader for processing.
- The third type crosses JVMs, so it incurs network overhead; data is pushed to the remote task via Netty. The backlog exists so that the consumer is aware of what is happening on the production side: for example, event1 is pushed to TaskB with backlog = 1. After TaskB receives event1, it returns an ACK to TaskA together with credit = 3, which tells TaskA how many more records it can accept. Through this exchange of information, producers and consumers keep each other aware of their status (credit-based flow control).
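The backlog/credit exchange described above can be sketched as a toy simulation. This is illustrative only; the real mechanism lives in Flink's Netty network stack, and the class names here are invented for the example:

```python
# Toy simulation of credit-based flow control (illustrative, not Flink code).
# The sender attaches its backlog to each record; the receiver's ACK carries
# a credit saying how many more records it can buffer, and the sender never
# sends beyond its remaining credit.

class Receiver:
    def __init__(self, free_buffers):
        self.free_buffers = free_buffers
        self.received = []

    def on_record(self, record, backlog):
        self.received.append(record)
        self.free_buffers -= 1
        return self.free_buffers   # ACK carries the credit back to the sender

class Sender:
    def __init__(self, receiver, initial_credit):
        self.receiver = receiver
        self.credit = initial_credit
        self.queue = []

    def produce(self, record):
        self.queue.append(record)

    def flush(self):
        sent = []
        while self.queue and self.credit > 0:
            record = self.queue.pop(0)
            backlog = len(self.queue)   # tell the receiver what is piling up
            self.credit = self.receiver.on_record(record, backlog)
            sent.append(record)
        return sent   # records still queued wait for more credit

rx = Receiver(free_buffers=2)
tx = Sender(rx, initial_credit=2)
for e in ["event1", "event2", "event3"]:
    tx.produce(e)
print(tx.flush())  # ['event1', 'event2'] -- event3 waits for new credit
```

When the receiver frees buffers (e.g. downstream catches up), it grants new credit and the blocked sender resumes, which is how back pressure propagates hop by hop instead of overflowing any single buffer.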
Blog.csdn.net/weixin_3875…
7. Memory composition of Flink
The JVM is a very complex system, and when it runs out of memory it causes an OOM and crashes. After obtaining the memory we allocate to it, Flink first sets aside a cut-off reserved portion to keep the system safe. Network buffers form a bounded buffer pool. The Memory Manager is a memory pool that can be configured as on-heap or off-heap memory; in streaming jobs this part of the buffers can be set to off-heap. The Free part is the memory block available to user code.
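The breakdown can be made concrete with a rough calculation. The ratios below are assumed illustrative values, not Flink's exact defaults for every version; the point is the order of deductions, cut-off first, then network buffers, then the Memory Manager pool:

```python
# Rough sketch of the memory breakdown described above.
# Ratios are assumed for illustration, not authoritative Flink defaults.

def container_memory_breakdown(total_mb, cutoff_ratio=0.25,
                               network_ratio=0.1, managed_ratio=0.7):
    cutoff = total_mb * cutoff_ratio                  # reserved safety margin
    network = (total_mb - cutoff) * network_ratio     # bounded network buffers
    managed = (total_mb - cutoff - network) * managed_ratio  # Memory Manager pool
    free = total_mb - cutoff - network - managed      # left for user code
    return {"cutoff": cutoff, "network": network,
            "managed": managed, "free": free}

print(container_memory_breakdown(4096))
```

Because the cut-off is deducted before anything else, a generous safety margin directly shrinks everything downstream of it, which is the trade-off between stability and usable memory.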