Five days to go! Flink Forward, the world’s first online conference for a top-level Apache project, takes place on April 25-26.

The Chinese-language highlights of the Flink Forward global online conference will be streamed live. The core content consists of the keynote and the talks the community voted most interesting. The original English talks are translated and explained by Apache Flink core contributors, and you can watch them online for free. This article covers the full afternoon broadcast on April 25.

4/25 Flink Forward live afternoon highlights

  1. Keynote: An update on Cloudera’s Flink integration.
  2. Practice series: Uber’s Flink CEP, Netflix’s autoscaling, the large-scale adoption of StreamSQL at Didi, and worst practices.
  3. Community ecosystem: Hands-on use of PyFlink + Zeppelin, and how to use AI Flow to define a production-grade AI workflow with Flink.
  4. Flink SQL: An in-depth look at Flink SQL and its latest developments in 2020.

Talk 1

Round table | Keynote: Apache Flink – Completing Cloudera’s End to End Streaming Platform

In January, Cloudera’s Arun announced on Twitter that the Cloudera Data Platform had officially integrated Flink as its stream processing product. Apache Flink PMC Chair Stephan responded: “This is significant.” This means that enterprise users worldwide who run CDH distributions will be able to use Flink for streaming data processing.

At Flink Forward, engineers from Cloudera will share the features and technical details of their end-to-end streaming platform.

Speakers:

  • Marton Balassi, Apache Flink PMC, one of the first contributors to the streaming API.
  • Joe Witt, Vice President of Engineering at Cloudera, focusing on Cloudera Data Flow (CDF) products.

Presenter:

Yang Kert (Rooney), Apache Member, Apache Flink PMC, Senior technical expert of Alibaba.

Talk 2

Round table | Flink SQL in 2020: Time to show off!

Four years ago, the Apache Flink community began adding SQL support to simplify and unify the processing of static and streaming data. Today, Flink runs business-critical batch and streaming SQL queries at Alibaba, Huawei, Lyft, Uber, Yelp, and many others. The community has made significant progress over the past few years, but there are still bigger goals on the roadmap, and development is accelerating.

Over the past few months, the community has added important improvements and extensions, including DDL support, a refactoring of the type system and the Catalog interface, and Apache Hive integration. To help you keep up with all the development in Flink SQL and its ecosystem, this session showcases Flink SQL in 2020 with a complete end-to-end example. Based on a realistic use case, we will show:

  • How to define tables backed by various storage systems
  • How to solve common problems with streaming SQL queries
  • How Flink integrates with Hive
  • How to define and use user-defined functions

We will also share upcoming features and our vision for the future.

Speakers:

  • Fabian Hueske, Apache Flink PMC.
  • Timo Walther, Apache Flink PMC.

Presenter:

Chong Wu (Yun Xie), Apache Flink PMC, Alibaba technology expert.

Talk 3

Round table | Apache Flink worst practices

Distributed stream processing is evolving from a technology on the fringe of big data into a critical technology that enables enterprises to provide highly scalable real-time services to their customers. Ververica, the commercial company behind Apache Flink, and the rest of the Flink community have witnessed this development. In working with our users and the wider community, we have seen some successes, but also some problems.

In this talk, I’ll share some anecdotes and lessons from the adoption of distributed stream processing, both specific to Apache Flink and applicable across frameworks. You will learn how to avoid failures and how to keep an eye on your dashboards without worry.

Speaker: Konstantin Knauf, Product Director of Ververica Platform.

Commentary guest: Sun Jincheng (Jinzhu), Apache Member, Apache Flink PMC, Senior technical expert of Alibaba.

Talk 4

Round table | Autoscaling Flink at Netflix

The Keystone Data Pipeline manages thousands of Flink pipelines with variable workloads. These pipelines are simple routers that read from Kafka and write to one of three sinks. To reduce operational overhead, we implemented autoscaling for these routers.

Autoscaling reduced our resource usage by 25-45% (depending on region and time of day) and greatly reduced the operational burden. This talk delves into the mathematics, algorithms, and infrastructure details needed to autoscale simple pipelines at large scale, and discusses future work on autoscaling complex pipelines.
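To give a rough feel for the kind of arithmetic behind autoscaling a simple router, here is a minimal sketch in Python. It is not Netflix’s algorithm, which the talk itself covers. The function names, the headroom factor, the tolerance, and the parallelism bounds are all assumptions made for illustration.

```python
import math


def target_parallelism(records_per_sec, per_task_capacity, headroom=0.25,
                       min_parallelism=1, max_parallelism=512):
    """Return how many parallel tasks are needed for the observed load.

    All parameters are hypothetical: `per_task_capacity` is the number of
    records per second one task can handle, and `headroom` is spare
    capacity kept on top of the observed load.
    """
    if per_task_capacity <= 0:
        raise ValueError("per_task_capacity must be positive")
    required = records_per_sec * (1.0 + headroom) / per_task_capacity
    # Clamp to the allowed range so the job never scales to zero or runs away.
    return max(min_parallelism, min(max_parallelism, math.ceil(required)))


def should_rescale(current, target, tolerance=0.25):
    """Rescale only if the target differs from the current parallelism by
    more than `tolerance` (a fraction of the current value), so that small
    load fluctuations do not trigger restarts."""
    return abs(target - current) > tolerance * current
```

For example, under these assumptions a load of 10,000 records per second with a per-task capacity of 1,000 gives a target of 13 tasks, and a job currently running 10 tasks would be rescaled.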

Speaker: Timothy Farkas, Software Engineer, Netflix.

Commentary guest: Wenlong Lv (Longsan), Alibaba technical expert.

Talk 5

Round table | Uber: Geospatial detection with Flink CEP in practice

Uber operates in a complex physical world, and one of the challenges of providing a reliable service is detecting geospatial and dynamic conditions in real time, such as spatial hot spots or streets with unbalanced demand and supply. The problem is hard to solve at Uber’s global scale, across congested streets and heavy traffic.

To address this issue, Uber engineers built a geospatial condition detection platform powered by Apache Flink and its CEP library. In this talk, Uber engineers will describe how to leverage Apache Flink and derive geospatial semantics through CEP pattern matching, as well as the challenges involved in building the platform and adopting the various technologies.
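As a simplified illustration of what sequence pattern matching over geospatial events can look like, the sketch below detects a run of consecutive "demand exceeds supply" events for the same map cell inside a time window. It is plain Python, not the Flink CEP API, and it is not Uber’s implementation. The event layout `(timestamp, cell, demand, supply)`, the thresholds, and the function name are assumptions, and events are assumed to arrive ordered by timestamp.

```python
from collections import defaultdict, deque


def detect_imbalance(events, min_run=3, window_sec=300):
    """Yield (cell, start_ts, end_ts) whenever a cell shows `min_run`
    consecutive imbalanced events within `window_sec` seconds.

    `events` is an iterable of hypothetical tuples
    (timestamp_sec, cell_id, demand, supply), ordered by timestamp.
    """
    runs = defaultdict(deque)  # cell -> timestamps of the current imbalanced run
    for ts, cell, demand, supply in events:
        run = runs[cell]
        if demand > supply:
            run.append(ts)
            # Drop timestamps that have fallen out of the time window.
            while run and ts - run[0] > window_sec:
                run.popleft()
            if len(run) >= min_run:
                yield (cell, run[0], ts)
                run.clear()  # start a fresh match after reporting
        else:
            run.clear()  # a balanced event breaks the sequence
```

A CEP library expresses the same idea declaratively (a pattern, a condition, a repetition count, and a time bound) and takes care of state, event time, and scale, which this sketch ignores.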

Speaker: Teng (Niel) Hu, Software Engineer at Uber.

Commentary guest: Fu Dian, Apache Flink Committer, Alibaba technical expert.

Talk 6

Speech | A deep dive into Flink SQL

Over the past two major releases (1.9 and 1.10), the Apache Flink community has put a lot of effort into revamping the architecture to better unify stream and batch processing. One example is Flink SQL, which supports multiple SQL planners under a single set of APIs. This talk first discusses the motivation behind these changes and then delves into Flink SQL to explain some of its internal workings.

The presentation introduces the unified stream and batch architecture and explains how Flink translates queries into relational expressions and optimizes them with Calcite to generate efficient runtime code. It also looks in detail at the query lifecycle, how some common optimizations work, how Flink uses binary data formats as its underlying data structure, and how certain operators work. This will give the audience a better understanding of the inner workings of Flink SQL.

Speakers:

  • Yang Kert (Rooney), Apache Member, Apache Flink PMC, Senior technical expert of Alibaba.
  • Chong Wu (Yun Xie), Apache Flink PMC, technical expert of Alibaba.

Talk 7

Speech | Flink’s application at Didi

Didi has a rich set of real-time computing scenarios. Flink is widely used for real-time monitoring, data pipelines, feature extraction, the real-time data warehouse, online business, and other areas. We also built the StreamSQL product on top of the Flink Table API and combined it with a one-stop development platform, lowering the cost of use for our users. StreamSQL coverage currently exceeds 80%. Didi now runs more than 7,000 real-time computing jobs and processes more than 2 trillion records per day.

Speaker: Xue Kang, technical expert and head of real-time computing at Didi. He graduated from Zhejiang University and previously worked as a senior R&D engineer at Baidu. He has extensive experience in building big data ecosystems.

Talk 8

Speech | The wait is over: PyFlink + Zeppelin

Flink has made great strides in its core engine for unified batch and stream processing, but the barriers to entry are still high, especially for data analysts and data scientists who are only familiar with Python and SQL. For years, users have asked for built-in and complete Python support in Apache Flink to be able to use the programming language they are familiar with while taking advantage of Flink’s unique capabilities.

Apache Flink 1.9 added the Python Table API (also known as PyFlink), and 1.10 added support for native Python UDFs, built on the Apache Beam portability framework. We will continue to improve PyFlink. In the next release we will support defining machine learning pipelines in Python, which will enable users to implement complex machine learning applications entirely in PyFlink. In addition, we integrated Flink with Zeppelin Notebook and redesigned Zeppelin’s outdated Flink interpreter to fit three major Flink scenarios:

  • Batch ETL and exploratory data analysis via Flink batch SQL + UDFs + Zeppelin’s built-in visualization
  • Streaming ETL and streaming data analysis via Flink streaming SQL + UDFs + Zeppelin’s built-in visualization
  • Machine learning pipelines written with PyFlink + Alink

Speakers:

  • Sun Jincheng (Jinzhu), Apache Member, Apache Flink PMC, Senior technical expert of Alibaba.
  • Zhang Jianfeng (Jian Feng), Apache Member, Apache Zeppelin PMC, Senior technical expert of Alibaba.

Talk 9

Speech | Flink + AI Flow: Making AI a piece of cake

Many projects currently help users build their AI platforms, such as MLflow, TFX, Metaflow, and SageMaker. Most of these projects focus on offline training and online inference, and some of them are only available on specific engines and platforms.

In this talk, we will introduce a new project called AI Flow, which addresses both online and offline training processes without depending heavily on a particular engine or platform, so that users can easily define an AI workflow in a highly heterogeneous environment. As a unified engine, Flink is one of the few that can implement all the semantics defined in AI Flow. We will demonstrate how users can define a production-grade AI workflow with Flink using AI Flow.

Speaker: Qin Jiangjie, Apache Flink PMC, Senior technical expert of Alibaba.

Flink Forward Global Online Conference

Best way to watch

The live broadcast will be hosted on the Flink Forward official website; see the link below for details. You can reserve the live broadcast after registering and logging in. The community will send an SMS reminder before the event.

Reserve the live broadcast on the conference website: developer.aliyun.com/topic/ffsf2…

The following information is displayed after the reservation is successful:

Full agenda

The Flink Forward global live broadcast highlights are divided into four parts: keynotes, Flink best practices, in-depth technology applications, and community ecosystem. Broadcast live from Beijing, Shanghai, and Hangzhou in turn, the sessions show Flink’s core strengths and future direction through practical cases from diverse scenarios.

■ Live: April 25-26

■ Speakers:

  • Apache Member, Flink PMC
  • Apache Flink core contributor
  • Front-line technical experts from major tech companies

■ Detailed agenda:

April 25-26: Flink Forward global live highlights in Chinese! To learn more about the conference, scan the QR code below to join the DingTalk group and ask questions.

If you are interested in the live broadcast of Flink Forward Virtual Conference 2020, follow the link below to see the full agenda and register.

www.flink-forward.org/sf-2020/con…