About the author: Wang Zhiwu, senior big data engineer. This article is selected from the Lagou education column “42 Lessons to Easily Master Flink”.
In this lesson we introduce a first Flink program and its SQL-based implementation.
In the previous lesson we covered Flink's common application scenarios and architecture model. In this lesson we start from a simple WordCount example and then implement it in SQL, laying a solid foundation for the hands-on lessons to follow.
Flink development environment
In a real production environment, any big data framework runs as a cluster; for debugging, we usually build a template project locally first, and Flink is no exception.
Flink is an open source big data project developed in Java and Scala. It is generally recommended to use Java as the development language and Maven as the build and dependency management tool. For most developers, the JDK, Maven, and Git are essential tools.
The suggested versions of the JDK, Maven, and Git are as follows:

- JDK: 1.8 or above
- Maven: 3.2.5 is suggested (official site: maven.apache.org/download.cg…)
- Git: no special requirements; follow the latest version on the official website (git-scm.com/downloads). The Flink repository is at github.com/apache/flin…
- Operating system: Linux is recommended
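Before creating the project, it can help to confirm the toolchain is in place. The snippet below is a sketch: each line prints the installed version of one tool, or a notice if it is missing (note that the three tools use slightly different version flags):

```shell
# Print the version of each required tool, or a notice if it is missing
command -v java >/dev/null 2>&1 && java -version 2>&1 | head -n 1 || echo "java not found"
command -v mvn  >/dev/null 2>&1 && mvn -version 2>/dev/null | head -n 1 || echo "mvn not found"
command -v git  >/dev/null 2>&1 && git --version || echo "git not found"
```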
Creating the project
In general, we create the project through an IDE. We can create it ourselves and add the Maven dependencies, or generate the application directly with the mvn command:
```shell
mvn archetype:generate \
  -DarchetypeGroupId=org.apache.flink \
  -DarchetypeArtifactId=flink-quickstart-java \
  -DarchetypeVersion=1.10.0
```
This creates a new project by specifying the three coordinates of a Maven project: GroupId, ArtifactId, and Version. Flink also provides a more convenient way to create a Flink project:
```shell
curl https://flink.apache.org/q/quickstart.sh | bash -s 1.10.0
```
We execute this command directly from the terminal:
When the Build Success message appears, a generated project named quickstart can be found in the local directory.
The main change needed here is to comment out the provided scope of the Flink dependencies in the automatically generated pom.xml file:
```xml
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>${flink.version}</version>
  <!-- <scope>provided</scope> -->
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
  <version>${flink.version}</version>
  <!-- <scope>provided</scope> -->
</dependency>
```
DataSet WordCount
The WordCount program is the classic introduction to any big data processing framework, commonly known as “word counting.” It counts the number of occurrences of each word in a piece of text, and consists of two parts: splitting the text into words, and counting the words by group and printing the result.
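Before looking at the Flink version, the two steps can be sketched in plain Java with no framework involved (the class and method names here are illustrative, not part of Flink):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCountSketch {
    // Split the text into words, then count occurrences per word
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum); // group by word, sum the 1s
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("Flink Spark Flink")); // prints {flink=2, spark=1}
    }
}
```

Flink's flatMap/groupBy/sum pipeline below follows the same split-then-aggregate shape, but distributed across a cluster.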
The overall code implementation is as follows:
```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class WordCountExample {

    public static void main(String[] args) throws Exception {
        // Create the Flink execution context
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Create the input data set
        DataSet<String> text = env.fromElements(
                "Flink Spark Storm",
                "Flink Flink Flink",
                "Spark Spark Spark",
                "Storm Storm Storm");

        // Split each line into words, group by word, and sum the counts
        DataSet<Tuple2<String, Integer>> counts =
                text.flatMap(new LineSplitter())
                    .groupBy(0)
                    .sum(1);

        // Print the result
        counts.printToErr();
    }

    public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            // Split the text into words
            String[] tokens = value.toLowerCase().split("\\W+");
            for (String token : tokens) {
                if (token.length() > 0) {
                    out.collect(new Tuple2<>(token, 1));
                }
            }
        }
    }
}
```
The whole process of implementation is divided into the following steps.
First, we need to create Flink’s runtime context:
```java
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
```
We then use the fromElements method to create a DataSet object containing our input, and transform it with the flatMap, groupBy, and sum operators.
Finally, print the output directly on the console.
Conclusion
This lesson introduced how to create a Flink project and offered a first experience of the power of Flink SQL, giving you an intuitive feel for what comes next.
That's all for this lesson. In the next one, I'll compare Flink's programming model with other frameworks. See you next time.
Copyright notice: The copyright of this article belongs to Lagou Education and the columnist. Without authorization, no media, website, or individual may reproduce, link to, repost, or otherwise copy and publish it; violators will be held accountable.