Brief introduction: The OneData methodology proposed by Alibaba helps enterprises work out how to manage data across its entire life cycle. It has also been built into Dataphin, Alibaba's intelligent data construction and management product, which is offered to enterprises as a service on Alibaba Cloud.

Dataphin Intelligent Data Construction and Management Platform

Dataphin addresses the big data construction, management and application needs of every industry. It provides one-stop intelligent data construction and management capabilities across the full data pipeline, from data ingestion to data consumption, covering products, technology and methodology. It helps enterprises build an intelligent data system that is standardized and unified, integrated end to end, asset-oriented and service-oriented, and self-optimizing in a closed loop, in order to drive innovation. Dataphin product page: https://www.aliyun.com/produc…

Difficulty is the best coach

Alibaba began building its own big data system in 2008, with the goal of providing data services to its diverse businesses. Along the way, we ran into all kinds of difficulties.

Engineers buried in endless ad hoc data requests: Alibaba built a dedicated "ad hoc data request management system" that allocated each business line a monthly time quota for ad hoc data pulls. The quota was routinely used up before the end of the month, and business colleagues would often chase engineers for data, keeping them working overtime to pull it. To change this, Alibaba launched "SQL skills training for business staff", hoping that business staff would learn to run ad hoc queries themselves; the effort was given the attractive name "enablement". The underlying issue: resources.

Inconsistent metric definitions (data caliber): a difference in metric definitions once nearly caused losses for merchants. The data forecast that merchants saw in their backend showed that they would meet the registration requirements for an event, so they stocked up in advance, ready to make a big push. However, the registration was ultimately rejected: the metric definition used by Alibaba's operations staff differed from the one on the merchant side, and by the operations system's evaluation the merchants did not meet the threshold. The problem was eventually resolved through coordination, but the underlying issue was: standards.

Working overtime on reports, and then being criticized over those reports, was the norm. Pulling the numbers usually took 2-3 hours, and reconciling discrepancies took far more effort, often 1-2 days. In the final report, differences in metric definitions and data quality problems could cause embarrassment, and wrong data could even lead to wrong decisions. The underlying issue: quality.



Beyond the typical scenarios above, Alibaba also saw data volumes explode as the business grew, and because data lacked governance and management, storage and computing costs kept rising. Cost is therefore another major difficulty in the big data field.



Determined to overcome these difficulties, Alibaba built out data for its B2B business, then for its e-commerce business, and then for the business of the whole Alibaba group. Throughout this process we explored, consolidated what we learned, and kept moving forward, aiming to improve data quality through more systematic data construction, reduce the risk of having to rebuild data, and make data services more efficient. After nearly ten years of refinement, Alibaba distilled this practice into the OneData construction methodology (OneModel + OneID + OneService).

OneModel builds and manages data in a unified way through a systematic data architecture, standardized definitions of data elements, and structured decomposition of metrics. OneID turns an enterprise's core business elements into assets by defining entity objects and establishing methods for building object-related behavior data and tags. OneService builds unified subject-oriented data units on top of these data assets and provides configurable data API construction and API services, making data assets easier to consume and increasing their value.
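Structured decomposition of metrics is what prevents two teams from publishing the same metric name with different definitions, the problem behind the merchant registration incident described earlier. The Python sketch below illustrates the idea, assuming the commonly described OneData convention that a derived metric is an atomic measure qualified by a time period and a set of modifiers. The class names `DerivedIndicator` and `IndicatorRegistry`, and all metric and field names, are hypothetical and are not Dataphin's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DerivedIndicator:
    """Hypothetical structured metric: atomic measure + time period + modifiers."""
    atomic: str                          # atomic measure, e.g. "pay_amount"
    period: str                          # statistical time period, e.g. "1d"
    modifiers: frozenset = frozenset()   # business qualifiers, e.g. {"channel=app"}


class IndicatorRegistry:
    """Keeps exactly one definition (caliber) per published metric name."""

    def __init__(self) -> None:
        self._defs: dict[str, DerivedIndicator] = {}

    def register(self, name: str, definition: DerivedIndicator) -> None:
        existing = self._defs.get(name)
        if existing is not None and existing != definition:
            # Same name, different caliber: reject it instead of letting it diverge.
            raise ValueError(f"metric {name!r} is already defined as {existing}")
        self._defs[name] = definition

    def names_for(self, definition: DerivedIndicator) -> list[str]:
        # Several names for one caliber usually signal duplicated construction.
        return [n for n, d in self._defs.items() if d == definition]


if __name__ == "__main__":
    registry = IndicatorRegistry()
    app_gmv_1d = DerivedIndicator("pay_amount", "1d", frozenset({"channel=app"}))
    registry.register("app_gmv_1d", app_gmv_1d)
    registry.register("app_gmv_1d", app_gmv_1d)  # same caliber again: accepted
    try:
        # Same name but built on order amount instead of paid amount: rejected.
        registry.register(
            "app_gmv_1d",
            DerivedIndicator("order_amount", "1d", frozenset({"channel=app"})),
        )
    except ValueError as err:
        print("rejected:", err)
```

Because a definition is a structured value rather than free text in a document, comparing two definitions is a plain equality check, which is what lets the registry catch a same-name, different-caliber metric at registration time.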

Overcoming pain points to build leading big data capabilities

As global digitalization accelerates, enterprises face ever fiercer market competition, and the difficulties they meet in their digital and intelligent transformation are the same pain points Alibaba once faced. Against this backdrop, Alibaba Cloud's data middle platform came into being, working with enterprises across industries to solve their most pressing data problems:

● Data standard problems: siloed ("chimney-style") development that serves only local business needs produces metrics with the same name but different definitions. Historically, business systems were iterated and launched one after another, so the same object attribute is coded inconsistently across systems.

● Data quality problems: repeated construction leads to long task chains, large numbers of tasks, heavy use of computing resources, and poor data timeliness. The documented metric definitions are disconnected from the code that implements them, so the risk of inaccurate data is high.

● Demand response problems: siloed development has long cycles and low efficiency, and application-oriented services are lacking. As a result the business gets slow responses and is dissatisfied, while engineers feel they are neither accumulating anything nor growing. People who understand both the business and data are scarce, and a great deal of communication is needed between understanding a requirement and delivering it, which makes service inefficient.

● Cost and resource problems: repeated construction under siloed development wastes technical resources. Tasks are hard to bring online and even harder to take offline. Changes in source systems or in the business are not reflected in the data in time. On top of that, non-standard data makes development and maintenance harder.

The OneData methodology proposed by Alibaba helps enterprises work out how to manage data across its entire life cycle, and it has been built into Dataphin (intelligent data construction and management), which serves enterprises through Alibaba Cloud. In addition to the data integration, development, release, scheduling, and operations and maintenance capabilities that span the entire big data processing pipeline, Dataphin provides data specification definition, logical model definition, automatic code generation, and subject-oriented data service capabilities, so that good data can be built efficiently.

Dataphin Product Core Module

Since its launch in 2018, Dataphin has grown into a complete product. It has gone through several rounds of major version upgrades, and its core capability modules are now clearly defined.

1. Environment adaptation

The bottom layer is Dataphin's environment adaptation capability. Dataphin supports different cloud environments, giving customers of different sizes and deployment requirements a range of options: multi-tenant public cloud, public cloud VPC, the Enterprise and Agile editions of the proprietary cloud, and on-premises IDC deployment.

2. Engine support

On top of the cloud environment, Dataphin supports different computing engines depending on the environment. Offline computing engines include Alibaba Cloud MaxCompute; Hadoop-ecosystem engines include Alibaba Cloud E-MapReduce, CDH5 and CDH6, with support for FusionInsight, CDP and others coming soon. For real-time computing, it supports Alibaba Cloud Blink and Flink VVP, and support for open-source Flink is also coming soon.

3. Data construction

On top of these cloud environments and computing engines, Dataphin provides the data integration, development, release, scheduling, and operations and maintenance capabilities that span the entire big data processing pipeline, together with data specification definition, logical model definition, automatic code generation, and subject-oriented query capabilities for data construction.

4. Assets

Dataphin provides asset management capabilities such as an asset map, asset lineage, asset quality management and monitoring, and resource cost management and optimization, as well as configurable capabilities for developing and managing asset services, so that data assets can quickly be turned into services that feed back into the business.
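Asset lineage is what makes cost problems such as tasks that are hard to take offline manageable: given a table that is about to change or be retired, lineage shows everything downstream that depends on it. The Python sketch below shows that impact analysis, assuming lineage is stored as a simple adjacency map from each table to the tables built from it. The function name and table names are hypothetical, and this is not a description of how Dataphin stores or queries lineage internally.

```python
from __future__ import annotations

from collections import deque


def downstream_impact(lineage: dict[str, list[str]], changed: str) -> list[str]:
    """Return every table that (directly or indirectly) reads from `changed`.

    `lineage` maps a table to the tables built from it. Traversal is
    breadth-first, so nearer dependents are listed before farther ones.
    """
    seen = {changed}
    impacted: list[str] = []
    queue = deque([changed])
    while queue:
        table = queue.popleft()
        for child in lineage.get(table, []):
            if child not in seen:  # avoids revisiting shared descendants or cycles
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


if __name__ == "__main__":
    # Hypothetical lineage: a source table feeding detail, summary and report tables.
    lineage = {
        "ods_orders": ["dwd_trade_order"],
        "dwd_trade_order": ["dws_trade_shop_1d", "dws_trade_user_1d"],
        "dws_trade_shop_1d": ["ads_shop_report"],
    }
    print(downstream_impact(lineage, "ods_orders"))
```

Run against the sample map, the function lists the detail, summary and report tables in breadth-first order, which is the set of tasks to re-check or re-run when `ods_orders` changes, and the set that would break if it were taken offline.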

This article is original content from Aliyun and may not be reproduced without permission.