Introduction: On October 19, at the 2021 Apsara Conference, Alibaba Cloud released the DataWorks full-link data governance product suite. Built for data warehouse, data lake, lakehouse, and other big data architectures, it helps enterprises bring their ever-rising internal “hanging river of data” under control and unlock data productivity.
Jia Yangqing, Vice President of Alibaba Group and Senior Researcher in Alibaba Cloud’s Intelligent Computing Platform Business Division, speaking at the event
“As data volumes grow, the value of each unit of data shrinks. Full-link data governance lets data move from low quality and low efficiency to high quality and high efficiency,” said Jia Yangqing, Vice President of Alibaba Group and Senior Researcher in Alibaba Cloud’s Intelligent Computing Platform Business Division, at the event.
The silting of the Yellow River keeps raising its riverbed, forming a “hanging river” that sits above the surrounding ground. In Kaifeng, Henan Province, the hanging river stands as much as 10 meters above ground level, the riverbed rises by 10 centimeters every year, and the dykes on both banks have to be raised along with it. Enterprises in digital transformation face a similar question: as data volumes grow, machines multiply, and teams expand, is the transformation really getting better and better? For a company, the appearance of prosperity does not mean a flood will never come.
At Alibaba, Double 11 scale has become the daily norm. In 2021, the daily data processing volume of MaxCompute, Alibaba’s big data computing service, exceeded its Double 11 peak of 2020. This growing data volume puts great pressure on cost and efficiency.
Machine efficiency + human efficiency = data efficiency
Faced with data that expands year after year, Alibaba’s answer is to make data efficiency a core enterprise metric, building on its integrated big data and AI platform.
In terms of machine efficiency, MaxCompute, as an offline data warehouse, can process 1.7 EB of data in a single day. Beyond sheer volume, what deserves attention is that MaxCompute supported 75% growth in data volume with only 10% growth in machines. MaxCompute continuously pursues extreme optimization of its underlying storage and performance, and has broken the TPCx-BigBench 100 TB performance world record for 5 consecutive years. Meanwhile, Hologres, as a real-time data warehouse, can write 596 million records per second at peak and store up to 2.5 PB in a single table. It provides multi-dimensional analysis and serving over trillions of records, and 99.99% of queries return within 80 ms. Together, Hologres and MaxCompute form a data warehouse that integrates offline, real-time, analysis, and serving, which greatly simplifies big data architecture from the bottom up.
Machine efficiency is easy to measure, but human efficiency is hard to quantify. DataWorks has been Alibaba Group’s unified big data development and governance platform since 2009, and the construction of Alibaba’s data middle platform was completed on it. Users vote with their feet on a platform’s completeness and ease of use. Today, the large-scale collaborative data middle platform built on DataWorks has more than 50,000 daily active users. On average, 1 in every 3 Alibaba employees uses DataWorks; it serves almost every department in Alibaba and has accumulated hundreds of core full-link data governance capabilities. In FY2020, Alibaba’s overall gains from data governance exceeded 1 billion yuan.
It is fair to say that DataWorks, the big data development and governance platform, together with MaxCompute and Hologres, forms a “Wintel alliance” of big data architecture that jointly improves enterprise data efficiency.
Construction experience: from small workshop to large platform to agile manufacturing
Data governance, like the data middle platform, has never been an ivory-tower product; it has been honed over many years. Alibaba’s digital transformation also went through a slash-and-burn era, when each business team maintained multiple Hadoop clusters of its own, like small workshops: teams used whatever was available, added whatever was needed, and stacked technical components up like building blocks. The process was often painful. When the platform released a new feature, some other component would fail for no obvious reason, and engineers would spend a long time tracking down what had gone wrong in it. Once that component was fixed and released, yet another would fail. Problems kept surfacing like gourds in water, where pushing one down makes another pop up, seemingly without end.
Alibaba therefore launched a sweeping platform unification plan: it set up one large platform, replaced the open source architecture with a self-developed one, and gradually migrated data to MaxCompute. Around this time the data middle platform concept began to spread within the group, and the “three ONEs” data middle platform methodology was gradually implemented in DataWorks, completing the construction of Alibaba’s group-wide data middle platform. Today, business teams from the core e-commerce businesses Tmall and Taobao to Ele.me, Youku, and Hema all do one-stop collaborative data development on the same large platform.
But as the large platform became ubiquitous and more people used it, data governance grew more complex. With thousands upon thousands of tables being created, companies have no way of knowing how many poorly written statements are eating away at computing resources like termites, how many tables are being copied again and again to create the illusion of a “data boom”, how much dirty data keeps polluting data quality, or how many tables remain in constant use through permission requests while exposed to data security risks.
All these problems pose severe challenges to large platforms. As a result, the large platform is gradually evolving toward agile manufacturing: full-link data governance capabilities provide control from a global perspective, while data decision-making is decentralized.
DataWorks full-link data governance: new releases
At the Full-Link Data Governance Summit of the 2021 Apsara Conference, DataWorks drew on the hundreds of data development and governance capabilities it has accumulated over 12 years to release a new series of full-link data governance products.
Data Governance Center
For a big data team, data governance is not only a technical issue but also an organizational and management issue. How should the final effect of data governance be measured across the organization? How can the organization’s initiative be put to better use? Some enterprises set up a dedicated data committee to draw up data governance standards, only to find that their platform does not support those standards well; others buy a data platform but do not know how to carry out data governance work on it.
Alibaba often refers to the concept of a health score. In terms of organizational design, the platform team, business teams, risk control, finance, and other collaborating teams sit under the data committee. A business team can set a goal for the year, for example raising its health score from 80 to 90 across computing, storage, and other aspects. The work is not limited to management and optimization on the business and production side: requirements are also passed to the data platform team, which optimizes the engine and evolves the data platform products, so that everyone works toward the same goal. With a measurable yardstick in place, departments can write these numbers into their own objectives. Long-running operational work, such as data governance campaigns and competitions between teams, can also be sustained through the health score, enabling data collaboration across the organization and mobilizing its initiative in data governance.
DataWorks’ newly released Data Governance Center evaluates enterprise data governance health across five dimensions: computing, storage, R&D, quality, and security. Guided by a problem-driven approach, it covers active full-link data governance before, during, and after the event, along with data governance health evaluation. Enterprise data governance is no longer a “phased project” but a “sustainable operation project”.
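To make the idea of a measurable health score concrete, here is a minimal sketch in Python. It is not the Data Governance Center’s actual scoring method, which the article does not describe. The 0 to 100 scale per dimension, the equal default weights, and the function names health_score and weakest_dimensions are all assumptions made for illustration; only the five dimension names come from the text.

```python
from typing import Dict, List, Optional

# The five dimensions named in the article; everything else is assumed.
DIMENSIONS = ("computing", "storage", "rd", "quality", "security")


def health_score(scores: Dict[str, float],
                 weights: Optional[Dict[str, float]] = None) -> float:
    """Combine per-dimension scores (assumed 0-100) into one weighted score."""
    if weights is None:
        # Assumption: equal weighting. A real data committee would set its own.
        weights = {d: 1.0 for d in DIMENSIONS}
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    total_weight = sum(weights[d] for d in DIMENSIONS)
    return sum(scores[d] * weights[d] for d in DIMENSIONS) / total_weight


def weakest_dimensions(scores: Dict[str, float], target: float) -> List[str]:
    """Return the dimensions scoring below the target, worst first."""
    below = [d for d in DIMENSIONS if scores[d] < target]
    return sorted(below, key=lambda d: scores[d])


# Hypothetical team that wants to move from 80 to 90 points.
team = {"computing": 70, "storage": 85, "rd": 80, "quality": 90, "security": 75}
print(health_score(team))            # 80.0
print(weakest_dimensions(team, 90))  # ['computing', 'security', 'rd', 'storage']
```

A single number of this kind is what lets a department write “raise health from 80 to 90” into its goals, while the per-dimension breakdown shows where the governance work should start.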
Intelligent data modeling
Once an enterprise has built a platform and done a great deal of standards-based governance, what is the value to business people? How much cost was saved or how many problems were fixed means little to them. The business side simply wants to get the data it needs faster, so data warehouses used to be built bottom-up in small, quick steps that met the immediate demand first. Today’s full-link data governance instead steers data warehouse construction toward standardization and sustainable development, combining top-down normative modeling from a business perspective with bottom-up data warehouse construction from a development perspective.
DataWorks’ newly released intelligent data modeling distills Alibaba’s data middle platform construction methodology. It interprets business data from a business perspective through four components: data warehouse planning, data standards, dimensional modeling, and data metrics. Intelligent data modeling supports fast modeling, including forward and reverse modeling, and can create models within minutes. Through data development, a data model can be published directly to multiple engines, with quality rules generated in one click, tables published directly, and simple ETL code generated automatically. Business staff can easily see the full picture of the data, quickly obtain the metrics they need, and carry out data analysis and exploration on top of the data model. Every employee can then understand and share data in a common language, so that data decision-making is truly and effectively decentralized.
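The forward-modeling step described above, going from a model definition to a published table and generated quality rules, can be sketched roughly as follows. This is an illustration only and does not use or represent the DataWorks API. The Column and TableModel classes, the to_ddl and not_null_rules functions, the dim_store example table, and the generic SQL-style DDL (not tuned to any engine’s dialect) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    name: str
    dtype: str
    comment: str = ""
    nullable: bool = True


@dataclass
class TableModel:
    name: str
    comment: str
    columns: List[Column] = field(default_factory=list)


def to_ddl(model: TableModel) -> str:
    """Forward modeling: render the model as a generic CREATE TABLE statement."""
    cols = ",\n".join(
        f"  {c.name} {c.dtype} COMMENT '{c.comment}'" for c in model.columns
    )
    return (f"CREATE TABLE IF NOT EXISTS {model.name} (\n{cols}\n) "
            f"COMMENT '{model.comment}';")


def not_null_rules(model: TableModel) -> List[str]:
    """Derive one simple quality rule per column declared as non-nullable."""
    return [f"{model.name}.{c.name} IS NOT NULL"
            for c in model.columns if not c.nullable]


# Hypothetical dimension table for a retail store.
dim_store = TableModel(
    name="dim_store",
    comment="Store dimension",
    columns=[
        Column("store_id", "BIGINT", "Store identifier", nullable=False),
        Column("city", "STRING", "City where the store is located"),
    ],
)

ddl = to_ddl(dim_store)
rules = not_null_rules(dim_store)
print(ddl)
print(rules)
```

Keeping the model as the single source of truth, and deriving both the table definition and the quality rules from it, is what allows the same model to be published to more than one engine without being redefined by hand.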
Hema launched reX-LDM, a data model for the new retail industry, through DataWorks intelligent data modeling
Also released at the event were DataWorks data integration real-time synchronization, intelligent data query, privacy-preserving secure computing, the DataWorks open platform, a data migration tool, cloud migration expert services, and other features.
China’s digital economy reached 5.4 trillion US dollars last year, accounting for nearly one third of its GDP, according to the White Paper on Global Digital Economy released by the China Academy of Information and Communications Technology (CAICT) in September 2021. In the digital economy era, data has become a key factor of production, just as land and labor were in the agricultural and industrial eras. With six full-link data governance capabilities (intelligent data modeling, global data integration, efficient data production, active data management, comprehensive data security, and fast data services), DataWorks supports the digital transformation of thousands of industries. It has been adopted by thousands of customers in digital government, new finance, new retail, energy, industry, transportation, gaming, education, digital marketing, and more.
Through DataWorks, the State Grid Big Data Center manages PB-level data from its headquarters and 27 provincial (municipal) companies in a unified way, and its full-link data middle platform governance, monitoring, and operations system accelerates the overall digital transformation and upgrade of the power grid.
Building on the open source EMR engine, Chuangmengtiandi replaced its self-built scheduling system with DataWorks, freeing its engineers to focus on the business and supporting digital operations in the gaming industry.
Mondelez China uses DataWorks intelligent data modeling to govern data models across the full link, which greatly improves the self-service capability of its data middle platform, lets the enterprise decentralize data decisions, and unlocks the digital power of new retail.
Enterprise digital transformation is entering deep waters, and the “hanging river of data” will gradually become a Sword of Damocles over the enterprise. Alibaba Cloud is working with customers and partners from all walks of life to manage and use data well through full-link data governance, so that data converges toward advanced productivity.
This article is the original content of Aliyun and shall not be reproduced without permission.