1/ Flink supports multiple operating modes

<1> Local: Only one machine is used

   1. Download the Flink package.
   2. Upload it to the VM and decompress it.
   3. Configure the environment variables.
   4. Start Flink with ./bin/start-cluster.sh (stop it with ./bin/stop-cluster.sh).
   5. Test Flink.

<2> Standalone: Flink runs on its own built-in cluster; suitable for development and test environments

   1. Decompress the Flink package to a specified directory on the master node.
   2. Edit the flink-conf.yaml file: set the jobmanager.rpc.address parameter and the taskmanager.tmp.dirs parameter. Because it is a YAML file, the colon must be followed by a space.
   3. Configure the masters file and the slaves file (or workers file). The masters file holds the IP address or host name of the master machine. The slaves file holds the IP addresses or host names of the slave machines.
   4. Configure the environment variables (the export lines for FLINK_HOME and PATH).
   5. Distribute the Flink directory to the other nodes. Copy the entire Flink installation directory to the slave nodes with scp so that all machines in the cluster are identical.
   6. Distribute the environment variable file (for example /etc/profile, or .bash_profile in the home directory) to the other nodes.
   7. Reload the environment variables on every node and start the Flink cluster.
   8. Submit the WordCount application as a test.
   9. Check the Flink Web UI.

<3> YARN: computing resources are managed by YARN; suitable for production environments

Flink on YARN can be run in two ways:

yarn-session.sh (create resources) + flink run (submit tasks)

   # Start a long-running Flink cluster on YARN
   ./bin/yarn-session.sh -n 2 -jm 1024 -tm 1024 [-d]

   # Attach to an existing Flink YARN session
   ./bin/yarn-session.sh -id application_1463870264508_0029

   # Submit a job
   ./bin/flink run ./examples/batch/WordCount.jar -input hdfs://hadoop100:9000/LICENSE -output hdfs://hadoop100:9000/wordcount-result.txt

   # Stop the job from the web interface, or from the command line with the cancel command

flink run -m yarn-cluster (create resources + submit task)

   # Start the cluster and run a job
   ./bin/flink run -m yarn-cluster -yn 2 -yjm 1024 -ytm 1024 ./examples/batch/WordCount.jar

Note: the YARN_CONF_DIR, HADOOP_CONF_DIR, or HADOOP_HOME environment variable must be set on the client so that Flink can read the YARN and HDFS configuration. Otherwise, the startup fails.
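The environment requirement in the note can be checked before submitting a job. The following is a minimal sketch, not part of Flink itself: the function name `yarn_env_ok` is hypothetical, and it only tests that at least one of the three variables is non-empty, not that the directory it points to is valid.

```shell
set -eu

# yarn_env_ok (hypothetical helper): succeed if at least one of
# YARN_CONF_DIR, HADOOP_CONF_DIR, or HADOOP_HOME is set and non-empty.
yarn_env_ok() {
  [ -n "${YARN_CONF_DIR:-}" ] || [ -n "${HADOOP_CONF_DIR:-}" ] || [ -n "${HADOOP_HOME:-}" ]
}

if yarn_env_ok; then
  echo "Hadoop/YARN configuration variable found; flink run -m yarn-cluster can locate the cluster config."
else
  echo "Set YARN_CONF_DIR, HADOOP_CONF_DIR, or HADOOP_HOME before submitting to YARN." >&2
fi
```

Running this on the client machine before `flink run -m yarn-cluster` gives an early, readable warning instead of a failed startup.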

<4> Standalone and on YARN are both operating modes of Spark

   Standalone mode: Spark manages resources itself, without depending on other components, and has master and worker nodes. The resource scheduler is not flexible and supports only FIFO scheduling, and resource management is limited to memory.
   On YARN: Spark runs as an application on YARN, so no Spark cluster needs to be deployed and there are no master and worker nodes. YARN supports rich scheduling and management policies.
   The two modes offer similar functionality. Standalone is naturally faster than on YARN, but not by much; the difference comes from the many extra steps of interaction with YARN.

2/ Steps for setting up Flink’s Standalone run mode

<1> Download the Flink zip package

   Download URL: https://flink.apache.org/downloads.html
   # Download a precompiled binary package (bin).
   # The file name generally carries the Flink version as well as the Hadoop version and the Scala version.
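To make the naming convention concrete, the versions can be read straight out of the file name. This is a small sketch using the package name that appears in the extract step of this guide; the variable names are illustrative, and other releases may follow a different naming pattern.

```shell
set -eu

pkg="flink-1.6.0-bin-hadoop26-scala_2.11.tgz"

# Strip the archive extension.
base=${pkg%.tgz}

# Flink version: the text between "flink-" and "-bin-".
flink_version=${base#flink-}
flink_version=${flink_version%%-bin-*}

# Hadoop version tag: the text after "-hadoop" up to the next dash ("26" stands for Hadoop 2.6).
hadoop_version=${base#*-hadoop}
hadoop_version=${hadoop_version%%-*}

# Scala version: everything after "-scala_".
scala_version=${base##*-scala_}

echo "Flink $flink_version, Hadoop tag $hadoop_version, Scala $scala_version"
```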

<2> Upload to the Linux server

Upload the package to your Linux server, to whatever location suits your habits.

<3> Extract

   tar -zxvf flink-1.6.0-bin-hadoop26-scala_2.11.tgz

<4> Move the decompressed folder to /usr/local and rename the directory to flink or flink-1.6.0

   mv flink-1.6.0-bin-hadoop26-scala_2.11 /usr/local/flink   # moves the decompressed folder flink-1.6.0-bin-hadoop26-scala_2.11 to /usr/local and renames it to flink

<5> Modify the configuration file

Edit conf/flink-conf.yaml in the Flink directory. Set jobmanager.rpc.address to the host name or IP address of the master node, and set the temporary directory:

   taskmanager.tmp.dirs: /usr/local/flink/tmp
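The two settings can also be applied from a script. The sketch below is one possible approach, not a Flink tool: `set_conf` is a hypothetical helper, `CONF_FILE` defaults to a temporary file so the script can be tried safely (point it at /usr/local/flink/conf/flink-conf.yaml on a real install), and `node-1` as the master host is an assumption taken from the Web UI step of this guide.

```shell
set -eu

# Assumption: CONF_FILE stands in for /usr/local/flink/conf/flink-conf.yaml.
CONF_FILE="${CONF_FILE:-$(mktemp)}"

# set_conf KEY VALUE (hypothetical helper):
# drop any existing "KEY:" line, then append "KEY: VALUE".
set_conf() {
  key=$1
  value=$2
  tmp=$(mktemp)
  # Keep every line that does not start with "KEY:".
  awk -v k="$key" 'index($0, k ":") != 1' "$CONF_FILE" > "$tmp"
  # YAML needs a space after the colon.
  printf '%s: %s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the original file keeps its permissions.
  cat "$tmp" > "$CONF_FILE"
  rm -f "$tmp"
}

set_conf jobmanager.rpc.address node-1
set_conf taskmanager.tmp.dirs /usr/local/flink/tmp
```

Because the helper removes an existing line for the key before appending, running the script twice does not produce duplicate entries.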

<6> Modify the two files masters and slaves (named workers in some versions) to configure the JobManager and TaskManager hosts

The Flink distributed framework also follows a master-slave architecture. The masters file lists the master node (the management node) and the slaves file lists the slave nodes, which are the worker nodes. If the master node is listed only in the masters file and not in the slaves file, it only manages the cluster and does not take part in computation. If it is listed in both files, it both manages and computes. After the cluster starts, you can verify whether the master node takes part in computation by running the jps command on the master server.
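As an illustration, the two files for a three-node cluster could be generated as follows. This is a sketch under assumptions: the host names node-1, node-2, and node-3 are taken from the later steps of this guide, the master is kept out of the slaves file so that it only manages, and `CONF_DIR` defaults to a temporary directory (on a real install it would be /usr/local/flink/conf).

```shell
set -eu

# Assumption: CONF_DIR stands in for /usr/local/flink/conf.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# Master node (JobManager): one host.
printf '%s\n' "node-1" > "$CONF_DIR/masters"

# Slave nodes (TaskManagers): one host per line.
# Add node-1 here as well if the master should also take part in computation.
printf '%s\n' "node-2" "node-3" > "$CONF_DIR/slaves"

echo "masters:"; cat "$CONF_DIR/masters"
echo "slaves:";  cat "$CONF_DIR/slaves"
```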

<7> Set environment variables

In the /etc/profile file, add FLINK_HOME:

   export FLINK_HOME=/usr/local/flink
   export PATH=$FLINK_HOME/bin:$PATH
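If the profile is edited by a script rather than by hand, it helps to avoid appending the same lines twice. A minimal sketch, assuming a `PROFILE` variable that defaults to a temporary file for safe experimentation (on a real machine it would be /etc/profile and require root):

```shell
set -eu

# Assumption: PROFILE stands in for /etc/profile.
PROFILE="${PROFILE:-$(mktemp)}"

# Append the two export lines only if FLINK_HOME is not configured yet.
if ! grep -q '^export FLINK_HOME=' "$PROFILE"; then
  cat >> "$PROFILE" <<'EOF'
export FLINK_HOME=/usr/local/flink
export PATH=$FLINK_HOME/bin:$PATH
EOF
fi

# Load the variables into the current shell.
. "$PROFILE"

echo "FLINK_HOME=$FLINK_HOME"
```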

<8> Distribute /etc/profile to the other two nodes

   scp /etc/profile node-2:/etc
   scp /etc/profile node-3:/etc
   # Every node in the cluster needs the same environment variables, so scp is used for convenience.

<9> Each node reloads the environment variables so that the profile takes effect immediately

   source /etc/profile

<10> Distribute the Flink directory to the other nodes using the scp command

   scp -r /usr/local/flink  node-2:/usr/local
   scp -r /usr/local/flink  node-3:/usr/local
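With more than two workers, the copy commands for the profile and the Flink directory are easier to manage in a loop. The sketch below is illustrative: `run` and `DRY_RUN` are hypothetical names, and the script defaults to printing the commands instead of executing them, so it can be reviewed before touching any machine. Set `DRY_RUN=0` to copy for real.

```shell
set -eu

NODES="node-2 node-3"        # worker hosts used in this guide
DRY_RUN="${DRY_RUN:-1}"      # assumption: 1 prints the commands, 0 executes them

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

for node in $NODES; do
  run scp /etc/profile "$node:/etc"
  run scp -r /usr/local/flink "$node:/usr/local"
done
```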

<11> Start the Flink cluster

   ./bin/start-cluster.sh   # start the Flink cluster
   ./bin/stop-cluster.sh    # stop the Flink cluster

Once the Flink cluster is up, you can check its processes with jps. Running jps on the master node should show the StandaloneSessionClusterEntrypoint process; if it is missing, something went wrong. Running jps on each slave node should show the TaskManagerRunner process.
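The jps check can be scripted as well. This sketch assumes the usual jps output format of one `<pid> <name>` pair per line; `has_process` is a hypothetical helper that reads such a listing from standard input, and the live check is skipped when jps is not on the PATH.

```shell
set -eu

# has_process NAME (hypothetical helper): succeed if the jps listing on stdin
# contains a line whose second field is exactly NAME.
has_process() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# On the master node, look for the standalone cluster entry point.
if command -v jps >/dev/null 2>&1; then
  if jps | has_process StandaloneSessionClusterEntrypoint; then
    echo "Master process is running."
  else
    echo "StandaloneSessionClusterEntrypoint not found; check the Flink logs." >&2
  fi
fi
```

On a slave node, the same helper can be called with `TaskManagerRunner` as the argument.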

<12> Start the HDFS cluster

If it is already running, you do not need to start it again.

<13> Upload the wordcount.txt file to HDFS (here, to /users/houzhen03/inputs/flink)

   hadoop fs -put /home/hadoop/wordcount.txt /users/houzhen03/inputs/flink

<14> Test the cluster environment

   bin/flink run /export/servers/flink-1.6.0/examples/batch/WordCount.jar --input hdfs://node-1:9000/users/houzhen03/inputs/flink/wordcount.txt --output hdfs://node-1:9000/users/houzhen03/outputs/flink/result.txt

<15> Browse the Flink Web UI

   http://node-1:8081   # node-1 is the host name (or IP address) of the master node