1. Download and Install Flink 1.12
1.1 Download
Flink.apache.org/downloads.h…
Select the Scala 2.11 build for download.

```shell
cd /opt
wget https://www.apache.org/dyn/closer.lua/flink/flink-1.12.2/flink-1.12.2-bin-scala_2.11.tgz
```
1.2 Unpack

```shell
tar xf flink-1.12.2-bin-scala_2.11.tgz
```
1.3 Configure the Environment

```shell
vim /etc/profile
# add the following (PATH here is the placeholder for your installation prefix):
export HADOOP_CONF_DIR=PATH/lib/hadoop/etc/hadoop
export HBASE_CONF_DIR=PATH/lib/hbase/conf
export HADOOP_CLASSPATH=`hadoop classpath`

source /etc/profile
```

Run `hadoop classpath`; if it prints the list of lib JARs, the configuration is correct.
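To confirm the variables actually landed, a small check script can help (a sketch: `check_env` is a hypothetical helper, the variable names match those configured above, and the "multiple classpath entries" heuristic is only illustrative):

```shell
#!/bin/sh
# Sanity-check the environment configured in /etc/profile.
# Fails when a required variable is unset, or when HADOOP_CLASSPATH
# does not look like a multi-entry Java classpath.
check_env() {
    for v in HADOOP_CONF_DIR HBASE_CONF_DIR HADOOP_CLASSPATH; do
        eval "val=\$$v"
        if [ -z "$val" ]; then
            echo "ERROR: $v is not set"
            return 1
        fi
    done
    # `hadoop classpath` normally prints a ':'-separated list of paths
    case "$HADOOP_CLASSPATH" in
        *:*) echo "environment OK" ;;
        *)   echo "WARNING: HADOOP_CLASSPATH has only one entry"
             return 1 ;;
    esac
}
```

Run it after `source /etc/profile`; a clean "environment OK" means Flink should be able to find the Hadoop jars.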
1.4 Test Example (on YARN)

```shell
cd flink-1.12.2
./bin/flink run -m yarn-cluster -p 5 -yjm 1024m -ytm 1024m examples/streaming/WordCount.jar
```
1.5 Flink Standalone Mode

```shell
cd flink-1.12.2
./bin/start-cluster.sh   # start the cluster
```

Visit http://ip:8081 to open the web UI; the jobs submitted below can be viewed there.

```shell
./bin/flink run examples/streaming/WordCount.jar   # run a job
```
2. CDH Environment Compatibility
The official example runs fine, but it is only a word count with no reads or writes against Hadoop, ES, Kafka, and so on. Connecting to Hadoop next will run into problems, so some compatibility work is needed.
2.1 Official Website
Ci.apache.org/projects/fl…
What is flink-shaded-hadoop-2-uber?
A project that resolves dependencies against a specific Hadoop version; in this case, CDH is that specific version.
Warning: as of Flink 1.11, the flink-shaded-hadoop-2-uber artifact is no longer officially released with the Flink distribution; the recommended way to provide Hadoop dependencies is through HADOOP_CLASSPATH.
In other words, from Flink 1.11 on, the project no longer ships flink-shaded-hadoop-2-uber, so in this case we have to build it ourselves and place it in Flink's lib directory.
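The two ways of supplying the Hadoop dependency can be sketched as a single helper (an illustration, not Flink code: `resolve_hadoop_deps` is a hypothetical function, and the fallback jar name is the one built in section 2.4):

```shell
#!/bin/sh
# Prefer the HADOOP_CLASSPATH route recommended since Flink 1.11;
# fall back to the self-built uber jar in Flink's lib/ when no
# hadoop CLI is on the PATH.
resolve_hadoop_deps() {
    if command -v hadoop >/dev/null 2>&1; then
        hadoop classpath
    else
        echo "lib/flink-shaded-hadoop-2-uber-3.0.0-10.0.jar"
    fi
}
```

On a node with the CDH client installed, the first branch applies and no uber jar is needed; the sections below cover building the jar for the second case.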
2.2 Configure flink-shaded
What is flink-shaded? The project that packages (shades) the dependencies Flink relies on.
Address: github.com/apache/flin…
The 1.11 and 1.12 releases of this project no longer include a Hadoop module, so we have to pull the release-10.0 branch for packaging.
Download address: codeload.github.com/apache/flin…
2.3 Start the Configuration
Edit the pom files. First set the component versions to match the cluster:

```shell
vim pom.xml
```

```xml
<hadoop.version>3.0.0</hadoop.version>
<zookeeper.version>3.4.5</zookeeper.version>
```

Then add a newer commons-cli dependency, because the one Hadoop ships is version 1.2:

```shell
cd flink-shaded-hadoop-2-uber
vim pom.xml
```

```xml
<dependency>
    <groupId>commons-cli</groupId>
    <artifactId>commons-cli</artifactId>
    <version>1.3.1</version>
</dependency>
```

Finally add the vendor-repos profile to the top-level pom.xml:

```shell
cd ../..
vim pom.xml
```

```xml
<profile>
    <id>vendor-repos</id>
    <activation>
        <property>
            <name>vendor-repos</name>
        </property>
    </activation>
    <!-- Add vendor maven repositories -->
    <repositories>
        <!-- Cloudera -->
        <repository>
            <id>cloudera-releases</id>
            <url>https://maven.aliyun.com/repository/central</url>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
    </repositories>
</profile>
```
2.4 Compiling and Packaging
Maven installation is skipped here (download the official package, unpack it, set the environment variables; the configuration file needs no changes).
```shell
cd flink-shaded-release-10.0
mvn clean install -DskipTests -Drat.skip=true -Pvendor-repos -Dhadoop.version.cdh=3.0.0
```
When the build finishes, find the generated flink-shaded-hadoop-2-uber-3.0.0-10.0.jar and copy it into Flink's lib directory.
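The copy step can be wrapped with a small existence check (a sketch: `install_uber_jar` is a hypothetical helper; the jar name matches the build output above):

```shell
#!/bin/sh
# Copy the shaded uber jar into Flink's lib/ and confirm it arrived.
install_uber_jar() {
    jar="$1"
    flink_home="$2"
    [ -f "$jar" ] || { echo "missing: $jar"; return 1; }
    cp "$jar" "$flink_home/lib/" || return 1
    echo "installed $(basename "$jar") into $flink_home/lib"
}
```

After running it, any `flink run` against YARN will have the CDH-matched Hadoop classes on its classpath.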
2.5 Make the flink Command Available Globally

```shell
ln -sv /opt/flink-1.12.2/bin/flink /usr/bin/flink
```
2.6 Example of a Job Reading and Writing Hadoop and ES

```shell
flink run -m yarn-cluster -p 5 -yjm 1024m -ytm 2048m -ynm MysqlSyncEsStream -yqu root.default \
  -c com.xxxx.xxx.xxxx.task.MysqlSyncEsStream /root/xxx/xxxx-xxx-xx-1.0-SNAPSHOT.jar
```
3. Deployment Modes (Application Mode Was Added in Flink 1.11)
3.1 The Three Flink Deployment Modes
- Session mode: pre-allocates resources by requesting one long-lived Flink cluster that stays up across jobs. With a fixed JobManager and set of TaskManagers, submitted jobs can start immediately, avoiding per-job resource-creation overhead. However, the resources are capped (the more jobs, the heavier the load) and there is no isolation between jobs. Session mode is typically used for latency-sensitive but short-running jobs.
- Per-job mode: creates and requests a dedicated Flink cluster for each submission; this is the flink run -m yarn-cluster mode used in the examples above. Resources are isolated and each job gets its own JobManager and TaskManagers, but startup is slower because the resources are recreated per job. Once a job finishes, its resources are destroyed. Per-job mode is typically used for long-running jobs.
- Application mode: a new deployment mode in Flink 1.11. Unlike the other two, the job's main() method runs on the cluster (JobManager) side instead of on the client, which takes the dependency-upload and job-graph-generation load off the submitting machine.
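To keep the three launch styles straight, here is a helper that prints the CLI invocation for each mode (a sketch: `launch_cmd` is a hypothetical function; `yarn-session.sh`, `flink run -m yarn-cluster`, and `flink run-application -t yarn-application` are the standard Flink 1.12 entry points, while `job.jar` and the absence of resource flags are illustrative):

```shell
#!/bin/sh
# Map each deployment mode to its Flink 1.12 launch command.
launch_cmd() {
    case "$1" in
        session)
            # long-lived cluster first, then jobs attach to it
            echo "./bin/yarn-session.sh -d && ./bin/flink run job.jar" ;;
        per-job)
            # a dedicated cluster per submitted job
            echo "./bin/flink run -m yarn-cluster job.jar" ;;
        application)
            # main() runs on the cluster side (new in 1.11)
            echo "./bin/flink run-application -t yarn-application job.jar" ;;
        *)
            echo "unknown mode: $1" >&2
            return 1 ;;
    esac
}
```

In practice the per-job form is the one used throughout this article's examples.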
Reference links:
www.jianshu.com/p/90d9f1f24…
Blog.csdn.net/weixin\_328…