What is Flink?

Apache Flink is an open-source computing platform for distributed stream and batch data processing. A single runtime supports both streaming and batch applications.

Flink characteristics

  • Existing open-source computing solutions treat stream processing and batch processing as two different application types: stream processing generally needs low latency and an exactly-once guarantee, while batch processing needs high throughput and efficient processing
  • Flink fully supports stream processing: the input stream of a streaming application is unbounded, while batch processing is treated as a special kind of streaming whose input stream happens to be bounded
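The "batch is a bounded stream" idea above can be sketched in plain Java (this is an illustrative model, not Flink API): the same per-element processing loop serves both modes, and only the termination of the source differs.

```java
import java.util.Iterator;
import java.util.List;

// Illustrative sketch: batch processing modeled as the bounded special
// case of stream processing. The processing logic is identical; a
// bounded source simply runs out of elements, while an unbounded source
// never would.
public class BoundedVsUnbounded {

    // One processing core shared by both modes: consume elements until
    // the iterator is exhausted (bounded input) or forever (unbounded input).
    static long sum(Iterator<Integer> stream) {
        long total = 0;
        while (stream.hasNext()) { // for an unbounded source this loop never ends
            total += stream.next();
        }
        return total;
    }

    public static void main(String[] args) {
        // "Batch" = the special case of a bounded input stream.
        System.out.println(sum(List.of(1, 2, 3, 4, 5).iterator())); // 15
    }
}
```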

Flink component stack

(figure: Flink component stack)

  • Deployment layer:

    Flink supports several deployment modes: local, cluster (standalone or YARN), and cloud (GCE/EC2)

  • The Runtime layer:

    The Runtime layer provides the core implementations that support Flink's computation, such as distributed stream processing, the mapping from JobGraph to ExecutionGraph, and scheduling, and it supplies the base services for the API layer above

  • The API layer:

    This layer implements the stream-processing and batch-processing APIs: the DataStream API for stream processing and the DataSet API for batch processing

  • Libraries layer:

    Application-specific computing frameworks built on top of the API layer, again split into stream-oriented and batch-oriented libraries

Flink's advantages

  • Supports high-throughput, low-latency, high-performance stream processing

    (For the same business logic, Flink runs faster than Spark Streaming and Storm, and it supports millisecond-level latency)

  • Supports highly flexible Window operations

  • Supports exactly-once semantics for stateful computation

|  | Apache Flink | Spark Streaming | Storm |
| --- | --- | --- | --- |
| Architecture | Sits between Spark and Storm: its master-slave structure resembles Spark Streaming, while its dataflow graph resembles Storm. The data stream is represented as a directed graph in which each vertex is a user-defined operation and each edge is a flow of data. | Depends on Spark; each batch is scheduled by the driver. Can be understood as a Spark DAG extended along the time dimension (micro-batch). | Master-slave mode, depends on ZooKeeper; little dependence on the master during processing. |
| Fault tolerance | Checkpoints based on Chandy-Lamport distributed snapshots. Medium overhead. | WAL and RDD lineage mechanism. High overhead. | Record-level ACKs. Medium overhead. |
| Processing model and latency | One event at a time; sub-second low latency. | All events in a time window; second-level high latency. | One event at a time; sub-second low latency. |
| Delivery guarantee | Exactly-once (implemented with the Chandy-Lamport algorithm, i.e. marker-based checkpoints). | Exactly-once. | At-least-once via record-level ACKs. |
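The marker-based (Chandy-Lamport style) checkpointing mentioned in the table can be sketched in plain Java. This is a simplified illustration, not Flink's implementation: an operator with two input channels snapshots its state only once it has seen the checkpoint barrier on both channels, buffering records from the already-aligned channel in the meantime so the snapshot stays consistent.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative sketch of barrier alignment for a two-input operator
// whose state is a running sum. Records that arrive on a channel after
// its barrier are held back until the other channel's barrier arrives,
// so the snapshot reflects exactly the records before the checkpoint.
public class BarrierAlignment {
    long state = 0;                            // running sum = operator state
    boolean[] aligned = {false, false};        // barrier seen per channel?
    Queue<Long> buffered = new ArrayDeque<>(); // post-barrier records held back
    Long snapshot = null;                      // state captured at the checkpoint

    void onRecord(int channel, long value) {
        if (aligned[channel]) {
            buffered.add(value);               // channel already aligned: hold back
        } else {
            state += value;                    // belongs before the checkpoint
        }
    }

    void onBarrier(int channel) {
        aligned[channel] = true;
        if (aligned[0] && aligned[1]) {        // barriers from all inputs: snapshot
            snapshot = state;
            while (!buffered.isEmpty()) {      // replay the held-back records
                state += buffered.poll();
            }
            aligned[0] = aligned[1] = false;   // ready for the next checkpoint
        }
    }
}
```

Failure recovery then means restoring `snapshot` and replaying the input from the barrier position, which is what yields the exactly-once state guarantee.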

Basic concepts of Flink & programming model

The basic concept

  • The basic building blocks of Flink programs are streams and transformations.
  • Each dataflow starts at one or more sources and ends at one or more sinks.
  • Aggregations on streams must be scoped by windows, such as "the sum over the last 5 minutes" or "the sum of the last 100 elements".
  • Windows come in several types: tumbling windows (no overlap), sliding windows (with overlap), and session windows (punctuated by gaps of inactivity).
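The difference between tumbling and sliding windows comes down to simple window-start arithmetic, sketched below in plain Java (an illustrative model assuming non-negative timestamps, not Flink API): a tumbling window assigns each element to exactly one window, while a sliding window of size `size` and slide `slide` assigns it to `size / slide` windows.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of window assignment for non-negative event
// timestamps. Window starts are aligned to multiples of the size
// (tumbling) or the slide (sliding).
public class WindowAssignment {

    // Start of the single tumbling window of length `size` containing `timestamp`.
    static long tumblingWindowStart(long timestamp, long size) {
        return timestamp - (timestamp % size);
    }

    // Start timestamps of all sliding windows containing `timestamp`,
    // newest first. A window [start, start + size) contains the element
    // iff start <= timestamp < start + size.
    static List<Long> slidingWindowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestamp - (timestamp % slide);
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }
}
```

For example, with a window size of 10 and a slide of 5, an element at timestamp 17 belongs to the two overlapping windows starting at 15 and 10.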

Basic architecture

  • Flink follows a master-slave architecture
  • When a Flink cluster starts, one JobManager process and at least one TaskManager process are launched


TaskManager

  1. The workers that actually perform the computation; each executes a set of tasks of a Flink job.
  2. Manages resources such as memory, disk, and network, and reports its resource status to the JobManager on startup.

JobManager

  1. The coordinator of the Flink system: it receives Flink jobs and schedules the execution of the tasks that make up each job.
  2. Collects job status information and manages the TaskManager slave nodes in the Flink cluster.

Client

  1. When a user submits a Flink program, a Client is created first; it preprocesses the submitted program and submits it to the Flink cluster.
  2. The Client assembles the user's application into a JobGraph and submits that JobGraph to the JobManager.
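Conceptually, the JobGraph the Client builds is just the directed graph described earlier: vertices are operators and edges are data streams. A minimal sketch in plain Java (not Flink's actual JobGraph classes; the names here are illustrative):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch of a job graph: an adjacency map from each operator
// to its downstream operators. The Client derives such a graph from the
// user program's chain of transformations before submission.
public class MiniJobGraph {
    // operator name -> names of its downstream operators
    final Map<String, List<String>> edges = new LinkedHashMap<>();

    void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    public static void main(String[] args) {
        MiniJobGraph graph = new MiniJobGraph();
        graph.addEdge("source", "map"); // source feeds a transformation
        graph.addEdge("map", "sink");   // transformation feeds a sink
        System.out.println(graph.edges);
    }
}
```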

Quick start

Generate a project skeleton with the Maven archetype (use `flink-quickstart-java` for a Java project or `flink-quickstart-scala` for Scala):

```shell
# Java project
mvn archetype:generate \
  -DarchetypeGroupId=org.apache.flink \
  -DarchetypeArtifactId=flink-quickstart-java \
  -DarchetypeVersion=1.7.2 \
  -DarchetypeCatalog=local

# Scala project
mvn archetype:generate \
  -DarchetypeGroupId=org.apache.flink \
  -DarchetypeArtifactId=flink-quickstart-scala \
  -DarchetypeVersion=1.7.2 \
  -DarchetypeCatalog=local
```