Flink development environment deployment and configuration

Flink is an open source big data project developed in Java and Scala. The code is hosted on GitHub, and Maven is used to compile and build the project. Java, Maven, and Git are essential for most people who develop or use Flink, so install and configure these three tools first. In addition, a powerful IDE makes it easier to read code, develop new features, and fix bugs, so a brief introduction to IDE configuration is also provided here.

1. Download the Flink code

Once Git is installed and configured, you can download the Flink code from GitHub. Flink’s repository on GitHub is github.com/apache/flin…

(Optional) For users in mainland China, downloading code from GitHub may be slow. Adding the following entries to /etc/hosts can significantly increase the download speed:

151.101.72.133 151.101.73.194 192.30.253.113 11.238.159.92 github.com github.global.ssl.fastly.net assets-cdn.github.com  git.node5.mirror.et2sqa

On Windows, the corresponding file is C:\Windows\System32\drivers\etc\hosts.

If you use the Win10 Linux subsystem, it is recommended to edit C:\Windows\System32\drivers\etc\hosts and then restart the Linux subsystem, because the /etc/hosts file in the Linux subsystem is generated from the Windows hosts file.

Alternatively, in the Win10 Linux subsystem you can delete the following line from /etc/hosts so that the modified hosts file is not overwritten when the subsystem starts:

# This file was automatically generated by WSL. To prevent automatic generation of this file, remove this line.

Download the Flink code locally

git clone https://github.com/apache/flink.git

(Optional) After the download, the code is on the master branch by default. For better code quality, it is generally advisable to switch to a suitable release branch, such as release-1.7 or release-1.8:

git checkout release-1.8
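
If the download is slow, fetching only the release branch you need, without the full history, reduces the amount of data transferred. The following is a minimal sketch and not part of the original steps: the function name clone_flink_release and the DRY_RUN variable are hypothetical, while --branch, --single-branch, and --depth are standard git clone options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clone a single Flink release branch without full history.
# Usage: clone_flink_release <version> [target_dir]
# Set DRY_RUN=1 to print the git command instead of running it.
clone_flink_release() {
    local version="${1:-}"
    local target="${2:-flink}"
    local branch="release-${version}"
    local url="https://github.com/apache/flink.git"

    if [ -z "$version" ]; then
        echo "usage: clone_flink_release <version> [target_dir]" >&2
        return 1
    fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only show what would be executed
        echo "git clone --branch ${branch} --single-branch --depth 1 ${url} ${target}"
    else
        git clone --branch "$branch" --single-branch --depth 1 "$url" "$target"
    fi
}
```

For example, `clone_flink_release 1.8` would fetch only the release-1.8 branch into ./flink. A shallow clone omits the commit history, so prefer the full clone shown above if you plan to browse history or contribute commits.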

2. Compile the Flink code

Flink uses Maven to build the project. When compiling, Maven downloads Flink’s dependencies according to the configuration in the current user’s ~/.m2/settings.xml file by default. You can also specify the location of the Maven settings file by adding “--settings=${your_maven_settings_file}” to the mvn command. If you already have a suitable Maven settings configuration, you can use it as-is.

To specify the path of Maven’s local repository, configure the “localRepository” parameter in settings.xml. By default, dependencies are downloaded to ~/.m2/repository/, that is, the .m2/repository directory under the current user’s home directory.
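
As an illustration of the “localRepository” parameter, the sketch below writes a minimal settings file that sets nothing else. The function name write_maven_settings and the example paths are hypothetical; a real settings file would usually also contain the mirror configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal Maven settings file that only sets localRepository.
# Usage: write_maven_settings <settings_file> <local_repo_dir>
write_maven_settings() {
    local settings_file="${1:?settings file path required}"
    local local_repo="${2:?local repository path required}"

    # The heredoc is unquoted so that ${local_repo} is expanded into the XML
    cat > "$settings_file" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <localRepository>${local_repo}</localRepository>
</settings>
EOF
}

# Example (paths are placeholders):
#   write_maven_settings /tmp/flink-settings.xml /data/maven-repo
#   mvn --settings=/tmp/flink-settings.xml clean install -DskipTests
```

Keeping a separate settings file for Flink builds leaves an existing ~/.m2/settings.xml untouched.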

The important configuration fragments are shown below.

<mirror>
  <id>nexus-aliyun</id>
  <mirrorOf>*,!jeecg,!jeecg-snapshots,!mapr-releases</mirrorOf>
  <name>Nexus aliyun</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
<mirror>
  <id>mapr-public</id>
  <mirrorOf>mapr-releases</mirrorOf>
  <name>mapr-releases</name>
  <url>https://maven.aliyun.com/repository/mapr-public</url>
</mirror>

The first mirror uses the Maven repository provided by Aliyun to speed up access to Maven dependencies. You can also configure another Maven mirror in China or build your own repository. The most important part of the fragment above is the handling of mapr-releases. The flink-filesystems/flink-mapr-fs module in Flink depends on jar packages from the MapR repository, which is slow to access from China, and the dependency maprfs-5.2.1-mapr.jar is 48 MB, the largest jar among Flink’s dependencies. As a result, the first compilation of Flink often fails because downloading the MapR-related dependencies times out. Aliyun therefore provides a mirror that proxies the mapr-releases repository, to make it easier to download the MapR-related jars. Meta information about the Aliyun mirror repositories can be viewed at: maven.aliyun.com/mvn/view

After we have configured the previous tools, compiling Flink is as simple as executing the following command:

# Build the Flink binary
# and install it in Maven's local repository (default: ~/.m2/repository)
mvn clean install -DskipTests

# Another build command. Compared with the command above, the main difference is that
# tests, QA plugins, and JavaDocs are skipped, so the build is faster
mvn clean install -DskipTests -Dfast

Alternatively, if you do not want to install the compiled Flink binary in Maven’s local repository, use the following commands:

# Build the Flink binary
mvn clean package -DskipTests

# Another build command. Compared with the command above, the main difference is that
# tests, QA plugins, and JavaDocs are skipped, so the build is faster
mvn clean package -DskipTests -Dfast

To build against a specific Hadoop version, pass -Dhadoop.version, for example:

mvn clean install -DskipTests -Dhadoop.version=2.6.1
# or
mvn clean package -DskipTests -Dhadoop.version=2.6.1

After a successful build, each of the commands above produces a complete Flink binary under the current Flink code path, which can be found in the flink-dist/target/ directory.
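
To check the result from the command line, a small lookup like the following can help. It is only a sketch: the function name find_flink_dist is hypothetical, and the directory naming pattern flink-<version>-bin is an assumption about the build output rather than something stated in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the built distribution directories under flink-dist/target.
# Assumption: the build output directory is named like flink-<version>-bin.
# Usage: find_flink_dist [flink_source_root]
find_flink_dist() {
    local root="${1:-.}"
    local target="${root}/flink-dist/target"

    if [ ! -d "$target" ]; then
        echo "no build output found at ${target}; has the build completed?" >&2
        return 1
    fi

    # Only look at the direct children of flink-dist/target
    find "$target" -maxdepth 1 -type d -name 'flink-*-bin'
}
```

Run `find_flink_dist /path/to/flink` from anywhere, or `find_flink_dist` from the Flink code root.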

Problems that may be encountered during compilation

  • Problem 1: The build fails and the BUILD FAILURE output mentions MapR

This error is usually caused by a failed download of the MapR-related dependencies. In practice, the download can fail even when the Aliyun mapr-releases mirror is configured, probably because the MapR jar is large and its download fails easily. When this happens, simply retry. Before retrying, delete the corresponding directory in the Maven local repository based on the failure message; otherwise you have to wait for Maven’s download timeout to expire before the download starts again.

For example, in one failed build the failure message showed problems with com.mapr.hadoop:maprfs:jar:5.2.1-mapr and org.apache.hadoop:hadoop-auth:jar:2.7.0-mapr-1703, which it depends on. Simply delete these two packages from the Maven local repository and recompile:

rm -rf ~/.m2/repository/com/mapr/hadoop/maprfs/5.2.1-mapr
rm -rf ~/.m2/repository/org/apache/hadoop/hadoop-auth/2.7.0-mapr-1703

I have also hit a similar failure where deleting only the maprfs jar package and retrying was enough:

rm -rf ~/.m2/repository/com/mapr/hadoop/maprfs/5.2.1-mapr

The MapR jars are stored in the Maven local repository directory.
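
The directories deleted above follow Maven’s local repository layout: the dots in the groupId become directory separators, followed by the artifactId and the version. A minimal sketch of that mapping is shown below; the function names maven_artifact_dir and purge_maven_artifact are hypothetical helpers, not Maven commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a Maven coordinate to its directory in the local repository.
# Usage: maven_artifact_dir <groupId> <artifactId> <version> [repo_root]
maven_artifact_dir() {
    local group_id="${1:?groupId required}"
    local artifact_id="${2:?artifactId required}"
    local version="${3:?version required}"
    local repo_root="${4:-$HOME/.m2/repository}"
    # Dots in the groupId become directory separators: com.mapr.hadoop -> com/mapr/hadoop
    local group_path="${group_id//.//}"
    echo "${repo_root}/${group_path}/${artifact_id}/${version}"
}

# Hypothetical helper: delete one artifact version so that Maven downloads it again.
# Usage: purge_maven_artifact <groupId> <artifactId> <version> [repo_root]
purge_maven_artifact() {
    local dir
    # Abort if the coordinate is incomplete, so rm never runs on a partial path
    dir="$(maven_artifact_dir "$@")" || return 1
    if [ -d "$dir" ]; then
        echo "removing ${dir}"
        rm -rf -- "$dir"
    else
        echo "nothing to remove at ${dir}"
    fi
}
```

For the failure described above, `purge_maven_artifact com.mapr.hadoop maprfs 5.2.1-mapr` corresponds to the first rm command.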

  • Problem 2: Compiling Flink in the Win10 Linux subsystem is very time-consuming

When compiling flink-runtime-web, the “ng build --prod --base-href ./” command was very slow. It did complete, but took about an hour.

When the “ng build --prod --base-href ./” command is executed on its own, it stays at “92% chunk asset optimization” for a long time. The cause has not been determined so far; interested readers with a suitable environment are welcome to investigate.

3. Prepare the development environment

A good IDE not only improves developer efficiency, but also helps people who do not write code but want to learn Flink by reading its source.

IntelliJ IDEA is the recommended IDE for Flink. The Eclipse IDE is officially not recommended, mainly because of incompatibilities between Eclipse’s Scala IDE and the Scala used by Flink.

(1) Download and install IntelliJ IDEA. IntelliJ IDEA: www.jetbrains.com/idea/, download the latest version…

(2) Install the Scala plugin. The Flink project is developed in Java and Scala, and IntelliJ supports Java out of the box. Before importing the Flink code, make sure that the Scala plugin for IntelliJ is installed. The installation steps are as follows:

  • IntelliJ IDEA -> Preferences -> Plugins, click “Install Jetbrains Plugin…”
  • Search for “Scala” and click “Install”
  • Restart IntelliJ

(3) Check the Maven configuration in IntelliJ

  • IntelliJ IDEA -> Preferences -> Build, Execution, Deployment -> Build Tools -> Maven

  • Check that “Maven home directory” is as expected. If not, select the correct Maven path and click “Apply”

  • Check that “User settings file” is as expected. The default value is ${your_home_dir}/.m2/settings.xml

  • Check that “Local repository” is as expected. The default value is ${your_home_dir}/.m2/repository

(4) Import Flink code

  • 1. IntelliJ IDEA -> File -> New -> Project from Existing Sources…, then select the root path of the Flink code
  • 2. In “Import project from external model”, select “Maven” and click “Next” all the way to the end
  • 3. IntelliJ IDEA -> File -> Project Structure… -> Project Settings -> Project: check that the Project SDK is as expected (JAVA_HOME was configured in an earlier step). If it is not, click “New” and select the home directory of the JDK installed earlier

PS: After the code is imported, IntelliJ automatically syncs the code and builds an index for code lookup. If the code has not been compiled before, do a full compilation first and then let IntelliJ sync again so that it can recognize all the code.

(5) Add Java Checkstyle

Adding Checkstyle to IntelliJ is important because Flink enforces a code style check at compile time, and the build may fail outright if the code does not conform to the style rules. If you do secondary development on top of the open source code, or want to contribute code to the community, adding Checkstyle early and following the code conventions saves time on unnecessary formatting changes.

Install the Checkstyle-IDEA plugin (IntelliJ IDEA -> Preferences -> Plugins, search for “Checkstyle-IDEA”).

Configure Java Checkstyle:

  • IntelliJ IDEA -> Preferences -> Other Settings -> Checkstyle
  • Set Scan Scope to Only Java Sources (including Tests)
  • Select “8.9” from the “Checkstyle Version” drop-down box
  • Click “+” in “Configuration File” to add a Flink configuration:

    a. Set “Description” to “Flink”
    b. Select “Use a local Checkstyle file” and choose the tools/maven/checkstyle.xml file in this code base
    c. Check “Store relative to project location” and click “Next”
    d. Set the value of “checkstyle.suppressions.file” to “suppressions.xml”, then click “Next” and “Finish”
    e. Check “Flink” as the only active Checkstyle configuration, then click “Apply” and “OK”

  • IntelliJ IDEA -> Preferences -> Editor -> Code Style -> Java, click the ⚙ gear button, select “Import Scheme” -> “Checkstyle Configuration”, and select the checkstyle.xml file. With this configuration, IntelliJ automatically places import statements in the correct position according to the rules.

Note that some Flink modules do not fully pass Checkstyle, including flink-core, flink-optimizer, and flink-runtime. Even so, make sure that any code you add or modify follows the Checkstyle rules.

(6) Add Scala Checkstyle

  • Copy the “tools/maven/scalastyle-config.xml” file to the “.idea” subdirectory of the Flink code root directory
  • IntelliJ IDEA -> Preferences -> Editor -> Inspections, search for “Scala style inspection” and check it

(7) Run an example in IntelliJ

Once the Flink code has been compiled, you can pick an example and run it directly, for example: org.apache.flink.streaming.examples.windowing.WindowWordCount.java