
Abstract: At Huawei Developer Conference 2021 (Cloud), Huawei Cloud released GaussDB(for openGauss), a financial-grade distributed database for the core business workloads of governments and enterprises.

This article is shared from the Huawei Cloud community post "How to Implement Minute-Level Analysis of Massive Data Across Lakes and Warehouses with Huawei Cloud FusionInsight MRS". The original author is Hourglass.

Huawei Developer Conference 2021 (Cloud) was held in Shenzhen from April 24 to 26, 2021. Under the theme "#Every Developer is Awesome", the conference brought an ICT technology feast to developers.

During the conference, a series of special lectures titled "Master Lecture Hall", given by a team of Huawei technical experts, discussed the value of technological innovation and shared innovation practices around topics such as cloud native, big data, and artificial intelligence. Wu Wenbo, HetuEngine architect for the Huawei FusionInsight MRS cloud-native data lake, gave a talk titled "How to Implement Minute-Level Analysis of Massive Data in Cross-Lake and Cross-Warehouse Scenarios".

Wu Wenbo, HetuEngine architect for the Huawei FusionInsight MRS cloud-native data lake, delivers his speech

Fusion analysis on traditional big data platforms has three major problems: data walls, data that is hard to connect, and slow data collaboration

As big data technology is applied and developed, data becomes more varied and more widely distributed, and query scenarios grow more complex. Emerging businesses in particular need a single platform that provides offline analysis, real-time analysis, graph analysis, text analysis, and an interactive query engine. Fusing multiple heterogeneous data sources revitalizes the data, lets data mining uncover its value, and gives full play to data as a factor of production. However, traditional big data platforms increasingly struggle with data fusion analysis and show the following problems:

Data walls exist between multiple data sources: there are many data components, such as Hive, HBase, MPPDB, and Oracle, and "data walls" form between them. To meet the requirements of different scenarios, the same data is stored repeatedly in multiple components, such as Hive (historical data), HBase (original data), and MPPDB (thematic data).

Multi-center data is hard to connect: analytics applications can only work on local data. Collision analysis with data from an external center requires first moving that data to the local center, which is complicated and inefficient. Processing remote data also requires deploying and maintaining a processing platform locally, which complicates the architecture.

Multiple data centers struggle to work as a joint force: data is concentrated in the main center, so the main center is abnormally overloaded while the sub-centers sit largely idle. Urgent tasks need fast handling, but they cannot be analyzed because the data in the sub-centers has not been synchronized. The computing and scaling capabilities of multiple DCs and clusters far exceed those of a single DC, yet services can only be supported by a single DC because cross-DC access technology is lacking.

Simplifying data use: with HetuEngine's unified interface, cross-lake, cross-warehouse, and cross-cloud collaborative analysis drops from days to minutes

To make data easier to use, simplify cross-lake collaboration, and solve the three problems above, Huawei launched HetuEngine. It was released in November 2019 and open-sourced in June 2020 under the name openLooKeng. HetuEngine is a unified and efficient data virtualization engine that integrates seamlessly with the big data ecosystem and answers queries over massive data in seconds. It is described as the industry's first engine to offer multi-source heterogeneous collaboration with one-stop SQL fusion analysis.

HetuEngine has the following features:

  • High-performance interactive query: traditional big data platforms use Hive to build ad hoc query tasks, which take a long time. HetuEngine uses heuristic indexing and an execution plan cache to achieve query responses in seconds.

  • Cross-lake, cross-warehouse, cross-cloud integration: traditional data analysis must first unify data formats. HetuEngine can join data in different formats directly, which reduces data relocation and improves efficiency by 30% compared with traditional schemes. Traditional cross-DC analysis relies on manually ferrying data between data centers; HetuEngine connects DCs through the DC Connector, makes data globally visible, and cuts collaboration time from days to minutes.

  • Multi-engine integration: traditional big data development across multiple engines requires customized development for each component. HetuEngine provides a unified SQL interface to big data, lowers the barrier to using data, and improves development efficiency by 2 to 10 times.
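The idea of joining data across separate stores through one SQL interface, without relocating it first, can be sketched with a small analogy. The example below is an illustration only and does not use HetuEngine or openLooKeng: it relies on Python's standard `sqlite3` module, with two independent SQLite files standing in for a "lake" and a "warehouse". The table names, column names, and sample rows are all hypothetical.

```python
import os
import sqlite3
import tempfile


def build_sources(workdir):
    """Create two independent SQLite files standing in for a 'lake' and a 'warehouse'."""
    lake_path = os.path.join(workdir, "lake.db")
    warehouse_path = os.path.join(workdir, "warehouse.db")

    # Hypothetical raw transaction data kept in the "lake".
    lake = sqlite3.connect(lake_path)
    lake.execute("CREATE TABLE transactions (account_id INTEGER, amount REAL)")
    lake.executemany(
        "INSERT INTO transactions VALUES (?, ?)",
        [(1, 120.0), (1, 30.0), (2, 75.5), (3, 10.0)],
    )
    lake.commit()
    lake.close()

    # Hypothetical dimension data kept in the "warehouse".
    warehouse = sqlite3.connect(warehouse_path)
    warehouse.execute("CREATE TABLE accounts (account_id INTEGER, branch TEXT)")
    warehouse.executemany(
        "INSERT INTO accounts VALUES (?, ?)",
        [(1, "Shenzhen"), (2, "Beijing"), (3, "Shenzhen")],
    )
    warehouse.commit()
    warehouse.close()
    return lake_path, warehouse_path


def totals_by_branch(lake_path, warehouse_path):
    """Join across both sources in one SQL statement, without copying rows first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ? AS lake", (lake_path,))
    conn.execute("ATTACH DATABASE ? AS warehouse", (warehouse_path,))
    rows = conn.execute(
        """
        SELECT a.branch, SUM(t.amount) AS total
        FROM lake.transactions AS t
        JOIN warehouse.accounts AS a ON a.account_id = t.account_id
        GROUP BY a.branch
        ORDER BY a.branch
        """
    ).fetchall()
    conn.close()
    return rows


with tempfile.TemporaryDirectory() as workdir:
    lake_path, warehouse_path = build_sources(workdir)
    result = totals_by_branch(lake_path, warehouse_path)
    print(result)  # [('Beijing', 75.5), ('Shenzhen', 160.0)]
```

The caller writes a single query against qualified names (`lake.transactions`, `warehouse.accounts`) and never runs an export or load step. A data virtualization engine applies the same principle to heterogeneous, distributed sources, which is a far harder problem than this single-process analogy suggests.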

At present, the Huawei FusionInsight MRS cloud-native data lake provides governments and enterprises with an integrated lake-and-warehouse solution. One architecture can build three types of data lakes: an offline data lake, a real-time data lake, and a logical data lake. The logical data lake provides unified access across lakes, warehouses, and clouds through HetuEngine, which reduces data relocation and lets data flow efficiently. Minute-level collaborative analysis over data from the whole domain makes bringing services online 10 times faster, shortening it from weeks to days.

HetuEngine has been used in a wide variety of industries, so let’s take a look at HetuEngine in a typical financial scenario.

ICBC builds real-time BI on HetuEngine to accelerate flexible data exploration in its financial data lake

ICBC's financial data lake carries all the original data of the head office and branches for data analysts to explore and analyze. At present it serves 5,000 queries per day; an average query covers 1 billion rows, and the largest covers up to 10 billion rows. As digital transformation enters deep water, diversified business demands place higher requirements on data fusion analysis.

In some scenarios, the financial business needs to process raw data into thematic data with batch processing in the data lake, move it across clusters into a data mart, and then run BI analysis on the data mart. On traditional big data platforms, SAS and other tools perform poorly when accessing data lake data through Hive SQL, with an average response time of 5 minutes to 2 hours and support for fewer than 10 concurrent queries. In addition, lake and warehouse data is fragmented, and data must be processed and loaded into an OLAP data mart, which results in long data pipelines and low analysis and development efficiency.

HetuEngine, provided by the Huawei FusionInsight MRS cloud-native data lake, solves the problem of analyzing data across data lakes and data warehouses and avoids unnecessary ETL.

  • HetuEngine data virtualization interconnects the lake and the warehouse for collaborative analysis;

  • Unnecessary ETL processes are avoided and data migration is reduced.

Introducing the HetuEngine data virtualization engine improves the concurrency of data lake query analysis. With only 1/5 of the resources, it supports 45 concurrent queries, peak throughput reaches 200 QPS, and average latency is reduced to 8 seconds. For collaborative analysis of lake and warehouse, HetuEngine breaks through the data barrier between the data lake and the data warehouse, improves collaborative analysis performance from minutes to seconds, and reduces data relocation and synchronization between systems by 80%, which greatly improves data management efficiency.
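As a rough back-of-the-envelope check, the figures quoted above can be set side by side. The sketch below only does arithmetic on numbers stated in this article. It assumes that the earlier Hive SQL response times and the 8-second average refer to comparable workloads, and that "1/5 of the resources" is measured against the same baseline cluster; the article does not confirm either point. All variable names are illustrative.

```python
# Figures quoted in the article; nothing here is measured.
hive_latency_s = (5 * 60, 2 * 60 * 60)  # 5 minutes to 2 hours via Hive SQL
hetu_latency_s = 8                      # average latency with HetuEngine

# Latency improvement range, assuming comparable workloads.
speedup_low = hive_latency_s[0] / hetu_latency_s
speedup_high = hive_latency_s[1] / hetu_latency_s

# Concurrency per unit of resource, assuming "1/5 of the resources"
# is relative to the same baseline cluster.
baseline_concurrency = 10               # the article says "fewer than 10"
hetu_concurrency = 45
resource_divisor = 5                    # HetuEngine uses 1/5 of the resources
hetu_per_unit = hetu_concurrency * resource_divisor
concurrency_gain = hetu_per_unit / baseline_concurrency

print(speedup_low, speedup_high)        # 37.5 900.0
print(hetu_per_unit, concurrency_gain)  # 225 22.5
```

Because the baseline is "fewer than 10" concurrent queries, the 22.5x per-resource figure is a lower bound under these assumptions.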

Conclusion

As a unified and efficient data virtualization engine, HetuEngine breaks through the data walls between multiple data sources and delivers high-performance data fusion analysis across lakes, warehouses, and clouds. At the same time, HetuEngine provides a single access entry point that hides the traditional, complex access interfaces behind a unified SQL interface, which lowers the barrier to using big data. Data use made simple!

The Huawei FusionInsight MRS cloud-native data lake will continue to innovate and to cultivate the fertile ground of the digital world. Together with more than 800 ISVs, it will provide customers with a continuously evolving, integrated lake-and-warehouse solution. It supports an offline data lake, a real-time data lake, and a logical data lake in one architecture, and aims to build "one enterprise, one lake" and "one city, one lake" across thousands of industries.
