Building a data warehouse is an indispensable part of "data intelligence" and an unavoidable challenge in large-scale data applications. In intelligent business, data reflects user feedback, so the timeliness of data acquisition is particularly important. Fast data feedback helps companies make decisions sooner and iterate on products more effectively, and the real-time data warehouse plays an irreplaceable role in this process.





How can a real-time data warehouse be built well, and which production practices are worth learning from?

From November 28 to 30, Flink Forward Asia invited data warehouse experts from Netflix, Meituan Dianping, Xiaomi, OPPO, Cainiao, and other companies to discuss the role of the Flink-based real-time data warehouse in the data pipeline and its value in intelligent business. They will share their practical experience with real-time data warehouses and their exploration of, and thinking about, platform intelligence.

Meituan Dianping's Real-Time Data Warehouse Platform Practice Based on Apache Flink

Lu Hao | Senior Technical Expert, Meituan Dianping

Meituan Dianping runs dozens of business lines. Its data volume is large: peak processing reaches 150 million records per second, and daily data growth exceeds 3 trillion records. Most of these businesses are transaction scenarios with long pipelines and many states, so building a data warehouse for them is very challenging. As timeliness requirements keep rising, for example in instant delivery and real-time marketing, more and more businesses are asking for and exploring real-time data warehouses. The real-time computing team surveyed and summarized the real-time data warehouse experience of multiple business lines and built a one-stop real-time data warehouse development platform to better support business development.

This talk will mainly cover the business applications and scale of real-time computing, how multiple businesses have built their real-time data warehouses, and the real-time computing platform and real-time data warehouse platform based on Flink.

Evolution and Practice of the Xiaomi Streaming Platform Architecture

Garu | Senior R&D Engineer, Xiaomi Streaming Platform

Xiaomi Group has many business lines, covering fields from information feeds, e-commerce, and advertising to finance. The Xiaomi streaming platform provides integrated streaming data solutions for all of Xiaomi Group's businesses and consists of three main modules: data collection, data integration, and stream computing. At present, the platform handles 2 trillion records per day, runs 15,000 real-time synchronization tasks, and computes over 1 trillion records in real time. As Xiaomi's business has grown, the streaming platform has gone through three major upgrades to meet the varied needs of its many businesses.

The latest iteration is based on Apache Flink and completely rebuilds the internal modules of the streaming platform, while Xiaomi's services are gradually switching from Spark Streaming to Flink. This talk mainly covers the evolution of the Xiaomi streaming platform architecture, the design and productization of the new Flink-based architecture, typical business applications at Xiaomi, and future challenges and plans.

Evolving Keystone to an Open Collaborative Real-Time ETL Platform

Zhenzhong Xu | Senior Software Engineer, Netflix

Netflix is committed to bringing joy to its members. We are relentlessly focused on improving the product experience and the quality of our content. In recent years, we have been investing heavily in technology-driven studio and content production. In the process, we have found many unique and interesting challenges in the field of real-time data platforms. For example, in a microservice architecture, domain objects are distributed across different applications and their stateful stores, which makes real-time reporting and entity search and discovery with low latency and high consistency particularly challenging.

In this talk, we will discuss some interesting cases and share the challenges and solutions we have encountered in distributed system infrastructure. We will also discuss what we have learned in development and operations, our vision for an open, self-service real-time data platform, and our latest thinking about real-time ETL infrastructure.

Architecture Evolution and Application Scenarios of the Cainiao Real-Time Data Warehouse in the Supply Chain

Jia Yuanqiao | Senior Data Technology Expert, Cainiao

Mr. Jia Yuanqiao works on Cainiao's supply chain data team, focusing on building the Cainiao supply chain data warehouse, developing data products, and driving data technology innovation.

This talk mainly introduces how the real-time data architecture of the Cainiao supply chain data team has evolved in terms of data models, data computation, and data services, along with typical real-time application scenarios in the supply chain and their Flink implementations.

OPPO's Real-Time Data Warehouse Practice Based on Apache Flink

Zhang Jun | Apache Flink Contributor, Head of R&D, OPPO Big Data Platform

Mr. Zhang Jun led the construction of OPPO's data platform, which covers the full pipeline of data access, data governance, data development, and data application. He previously worked at Morgan Stanley and Tencent and has extensive experience in data system research and development. He currently focuses on data warehouse construction, real-time computing, and OLAP engines, and is also a contributor to the Flink open source community. This talk mainly covers how OPPO built its real-time data warehouse on Flink:

1. Construction background 2. Top-level design 3






This article is original content of the Yunqi Community and may not be reproduced without permission.