Preface
The Flink project is a rising star in the field of big data computing. Big data computing engines have evolved through several generations: first MapReduce; then Tez, based on directed acyclic graphs (DAGs); then Spark, based on in-memory computing; and now Flink as the fourth generation. Because Flink can be deployed and run on top of Hadoop, it does not replace Hadoop but integrates closely with it.
Flink's main APIs and libraries include the DataStream API, the DataSet API, the Table API, SQL, the graph API (Gelly), and FlinkML. Together they give Flink its own ecosystem covering offline (batch) data processing, real-time (stream) data processing, SQL queries, graph computing, and machine learning.
Main Content
This book is divided into 11 chapters. The main content of each chapter is as follows:
**Chapter 1: Overview of Flink.** This chapter explains the basic principles of Flink: its architecture, its main components, how stream processing compares with batch processing in Flink, some typical application scenarios, and how Flink differs from other stream computing frameworks.
**Chapter 2: Flink Quick Start.** Chapter 1 analyzed Flink's basic principles, architecture, and components. This chapter gives a quick hands-on introduction to Flink to deepen the understanding of that material.
**Chapter 3: Flink Installation and Deployment.** With a basic understanding of Flink and of the steps for developing a Flink program, we now look at how to install and deploy a Flink cluster and actually run Flink programs on it.
Flink can be installed and deployed in two main ways: local mode and cluster mode. Local mode works as soon as the distribution is decompressed, without changing any parameters, and is generally used for simple tests. Cluster mode includes Standalone mode and Flink on YARN; it is suited to production environments and requires modifying the corresponding configuration parameters.
**Chapter 4: Flink Common APIs in Detail.** This chapter analyzes and explains the commonly used Flink DataStream and DataSet APIs, and also covers some common operations of the Flink Table API and Flink SQL.
**Chapter 5: Using Flink's Advanced Features.** This chapter analyzes Flink's advanced features, including broadcast variables, accumulators, and the distributed cache.
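To make the accumulator idea concrete, here is a minimal, dependency-free Java model; it is not Flink's actual API. It assumes the usual accumulator semantics: each parallel subtask updates its own local copy, and the partial values are merged into one result when the job finishes. In real Flink code you would register a built-in accumulator such as `IntCounter` with `getRuntimeContext().addAccumulator(...)` and read it back from `JobExecutionResult.getAccumulatorResult(...)`; the class names `LocalCounter` and `AccumulatorDemo` below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Flink counter accumulator (no Flink dependency).
class LocalCounter {
    private long localValue = 0L;

    // Called from user code once per record on a single parallel subtask.
    void add(long delta) {
        localValue += delta;
    }

    long getLocalValue() {
        return localValue;
    }

    // Folds another subtask's partial result into this counter.
    void merge(LocalCounter other) {
        localValue += other.localValue;
    }
}

class AccumulatorDemo {
    // Each inner list stands for the records seen by one parallel subtask.
    static long countAcrossSubtasks(List<List<String>> partitions) {
        List<LocalCounter> partials = new ArrayList<>();
        for (List<String> partition : partitions) {
            LocalCounter counter = new LocalCounter();
            for (String record : partition) {
                counter.add(1L);
            }
            partials.add(counter);
        }
        // Merge step: combine the per-subtask values into one job-level result.
        LocalCounter total = new LocalCounter();
        for (LocalCounter partial : partials) {
            total.merge(partial);
        }
        return total.getLocalValue();
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(List.of("a", "b", "c"), List.of("d", "e"));
        System.out.println("records processed: " + countAcrossSubtasks(partitions));
    }
}
```

The design point the model illustrates is that no subtask ever sees the global count while the job runs; only the merged value is meaningful.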
**Chapter 6: Flink State Management and Recovery.** This chapter analyzes Flink state, including state management and recovery, as well as job restart strategies in Flink.
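As an illustration of what a restart strategy decides, here is a small Java model of a fixed-delay strategy: restart the job after a fixed delay, up to a maximum number of attempts, then give up. It is a sketch assuming those semantics, not Flink's implementation; `FixedDelayRestartModel`, `onFailure`, and `usedAttempts` are hypothetical names. In Flink itself a comparable policy is configured with `RestartStrategies.fixedDelayRestart(...)` or the `restart-strategy` options in the cluster configuration.

```java
// Hypothetical model of a fixed-delay restart strategy (not Flink's code).
class FixedDelayRestartModel {
    private final int maxAttempts;
    private final long delayMillis;
    private int attemptsUsed = 0;

    FixedDelayRestartModel(int maxAttempts, long delayMillis) {
        this.maxAttempts = maxAttempts;
        this.delayMillis = delayMillis;
    }

    // Called when the job fails. Returns the delay to wait before restarting,
    // or -1 once all attempts are used up and the job should fail for good.
    long onFailure() {
        if (attemptsUsed >= maxAttempts) {
            return -1L;
        }
        attemptsUsed++;
        return delayMillis;
    }

    int usedAttempts() {
        return attemptsUsed;
    }

    public static void main(String[] args) {
        FixedDelayRestartModel strategy = new FixedDelayRestartModel(3, 10_000L);
        for (int failure = 1; failure <= 4; failure++) {
            long delay = strategy.onFailure();
            System.out.println(delay < 0
                    ? "failure " + failure + ": give up, job fails"
                    : "failure " + failure + ": restart after " + delay + " ms");
        }
    }
}
```

With three attempts and a 10 s delay, the first three failures each schedule a restart and the fourth ends the job.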
**Chapter 7: Flink Windows in Detail.** This chapter analyzes windows in Flink, including the common window types Flink provides and window aggregation operations.
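The arithmetic behind window assignment can be sketched in plain Java. The model below follows the approach Flink's tumbling and sliding event-time window assigners take: a timestamp is mapped to the start of its window by subtracting the remainder modulo the window length. It is a simplified sketch (window offset fixed at 0 in the assigners, millisecond `long` timestamps assumed), and `WindowAssignment` is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, dependency-free model of tumbling and sliding window assignment.
class WindowAssignment {
    // Start of the window containing `timestamp`, for windows of length `size`
    // aligned to `offset` (all in milliseconds). floorMod keeps the result
    // correct for negative timestamps too.
    static long windowStart(long timestamp, long offset, long size) {
        return timestamp - Math.floorMod(timestamp - offset, size);
    }

    // Tumbling window: each element belongs to exactly one [start, start + size).
    static long[] tumbling(long timestamp, long size) {
        long start = windowStart(timestamp, 0L, size);
        return new long[] {start, start + size};
    }

    // Sliding window: each element belongs to several overlapping windows
    // (size / slide of them when size is a multiple of slide).
    static List<long[]> sliding(long timestamp, long size, long slide) {
        List<long[]> windows = new ArrayList<>();
        long lastStart = windowStart(timestamp, 0L, slide);
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add(new long[] {start, start + size});
        }
        return windows;
    }

    public static void main(String[] args) {
        long[] t = tumbling(7_000L, 5_000L);
        System.out.println("tumbling: [" + t[0] + ", " + t[1] + ")");
        for (long[] w : sliding(7_000L, 10_000L, 5_000L)) {
            System.out.println("sliding: [" + w[0] + ", " + w[1] + ")");
        }
    }
}
```

For an event at 7,000 ms, a 5 s tumbling window gives `[5000, 10000)`, while a 10 s window sliding every 5 s places it in both `[5000, 15000)` and `[0, 10000)`.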
**Chapter 8: Flink Time in Detail.** This chapter focuses on the notions of time in Flink (event time, ingestion time, and processing time) and on watermarks.
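A watermark for out-of-order events is typically derived from the largest event timestamp seen so far minus a tolerated delay. The Java model below assumes that bounded-out-of-orderness rule, similar to what `WatermarkStrategy.forBoundedOutOfOrderness(...)` provides in recent Flink versions (which also subtract 1 ms), and shows when an event-time window `[start, end)` could fire. `BoundedOutOfOrdernessModel` is a hypothetical name, not a Flink class.

```java
// Hypothetical model of a bounded-out-of-orderness watermark generator.
class BoundedOutOfOrdernessModel {
    private final long maxOutOfOrderness;
    private long maxTimestamp;

    BoundedOutOfOrdernessModel(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
        // Start low enough that the initial watermark is Long.MIN_VALUE without underflow.
        this.maxTimestamp = Long.MIN_VALUE + maxOutOfOrderness + 1;
    }

    // Called for every event with its event-time timestamp in milliseconds.
    void onEvent(long eventTimestamp) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestamp);
    }

    // Events with timestamps <= this value are no longer expected.
    long currentWatermark() {
        return maxTimestamp - maxOutOfOrderness - 1;
    }

    // An event-time window [start, end) can fire once the watermark reaches end - 1.
    boolean windowCanFire(long windowEnd) {
        return currentWatermark() >= windowEnd - 1;
    }

    public static void main(String[] args) {
        BoundedOutOfOrdernessModel watermarks = new BoundedOutOfOrdernessModel(2_000L);
        for (long ts : new long[] {1_000L, 3_000L, 2_000L}) {
            watermarks.onEvent(ts);
        }
        System.out.println("watermark: " + watermarks.currentWatermark());
        System.out.println("window [0, 1000) can fire: " + watermarks.windowCanFire(1_000L));
        System.out.println("window [0, 5000) can fire: " + watermarks.windowCanFire(5_000L));
    }
}
```

With a 2,000 ms bound, events at 1,000, 3,000, and 2,000 ms yield a watermark of 999 ms, so the window ending at 1,000 ms may fire while the window ending at 5,000 ms keeps waiting. The out-of-order event at 2,000 ms does not move the watermark backwards.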
**Chapter 9: Flink Parallelism in Detail.** This chapter analyzes parallelism in Flink in detail. Flink's degree of parallelism can be set at four levels: operator level, execution environment level, client level, and system level.
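The four levels take precedence in a fixed order: a value set on an operator overrides the execution environment's value, which overrides the client's `-p` option of `flink run`, which overrides the system default `parallelism.default` in `flink-conf.yaml`. The Java function below is a hypothetical model of that lookup order only; `ParallelismResolver` and `effectiveParallelism` are illustrative names, and in a real job the values are set with `setParallelism(...)` on an operator or on the execution environment.

```java
import java.util.OptionalInt;

// Hypothetical model of how Flink's four parallelism levels take precedence.
class ParallelismResolver {
    // An empty OptionalInt means "not set at this level".
    static int effectiveParallelism(OptionalInt operatorLevel,
                                    OptionalInt environmentLevel,
                                    OptionalInt clientLevel,
                                    int systemDefault) {
        if (operatorLevel.isPresent()) {
            return operatorLevel.getAsInt();      // e.g. operator.setParallelism(4)
        }
        if (environmentLevel.isPresent()) {
            return environmentLevel.getAsInt();   // e.g. env.setParallelism(2)
        }
        if (clientLevel.isPresent()) {
            return clientLevel.getAsInt();        // e.g. flink run -p 8
        }
        return systemDefault;                     // parallelism.default in flink-conf.yaml
    }

    public static void main(String[] args) {
        OptionalInt unset = OptionalInt.empty();
        System.out.println(effectiveParallelism(OptionalInt.of(4), OptionalInt.of(2), OptionalInt.of(8), 1));
        System.out.println(effectiveParallelism(unset, unset, OptionalInt.of(8), 1));
        System.out.println(effectiveParallelism(unset, unset, unset, 1));
    }
}
```

Running `main` prints 4, 8, and 1: the most specific level that is actually set wins.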
**Chapter 10: Flink Kafka Connector in Detail.** Flink provides many connectors, and one of the most widely used is the Kafka connector. This chapter analyzes how the Kafka connector is used in Flink.
**Chapter 11: Flink Project Development in Practice.** This chapter analyzes some real-world application scenarios for Flink, covering both architecture design and code implementation. It presents two scenarios: real-time data cleaning, also known as real-time ETL, and real-time data reporting.
The complete Flink introduction and practice document runs to 254 pages.
The spread and continuous upgrading of big data technology have greatly accelerated the realization of an intelligent society, and big data technologies are increasingly becoming basic services. Flink has many features that distinguish it from other big data technologies and is attracting more and more practitioners. The author has worked in the big data field for several years, has rich practical experience, and has an in-depth understanding of big data processing frameworks such as MapReduce, Spark, and Storm. This book introduces key technologies and features of Flink and, drawing on the author's own practical experience, helps readers get started quickly.
Flink is currently a mainstream framework for real-time big data computing. This book explains Flink's design principles and implementation mechanisms in an accessible way, with detailed coverage ranging from API usage and platform operation and maintenance to worked cases. It can serve as an introduction for Flink application developers and as a manual for Flink platform operators.