Alibaba recently shared a slide deck titled “Alibaba Data Center Practice” (search for the original article). Alibaba originated the data center (data middle platform) concept, so I read the deck carefully and with great respect, hoping to find something new.
Reading a professional deck like this is actually very time-consuming. You have to strip away the polished surface, dig into every word to understand its hidden meaning, and compare it with what you already know to see whether it improves your understanding. Where you don’t understand something, you often have to look up related documents.
Of course, the wording in slides is often not rigorous, and many of the concepts are coined on the spot or unique to the company, so at times I have to speculate and interpret them in light of my own practice. This interpretation runs to more than 6,000 words, so be prepared for some hard thinking. I did not attend the talk, but I hope my own “speech” can still teach you something.
Let’s get started.
1. Title and background
The deck comes from the Alibaba Cloud Intelligence business group, which is actually a little strange. As I recall, Alibaba’s middle platform business group includes the search business division, the shared business platform, and the data technology and product department. Alibaba Cloud is a platform focused on the cloud business, so is it the right party to speak about the data middle platform?
One might ask: what is the difference between a platform and a middle platform, and why would Alibaba Cloud not be well placed to present the middle platform?
My doubts are as follows. In the general sense, a platform is independent of the business, so concentrating on technology is fine. A middle platform, by contrast, is where business converges, and it is closely tied to the business. For a data middle platform, the core competitiveness is not platform-level technology but the understanding, processing, and mining of data. It is not realistic to expect a platform engineer to go to the front line, understand the data demands, and distill what they have in common, yet that is at the heart of how a data middle platform creates value today.
Of course, a deck need not be judged by where it comes from. It is enough if it helps us understand Alibaba’s data center.
2. DT to the left, IT to the right
Traditional IT is a cost center, but with data, IT may become a value center. The value shows up in two ways: in management, IT can provide decision support, and in production, IT can provide intelligent tools that match how the business is managed, which improves the fit between the relations of production and productivity.
This is a good point. Zhejiang Mobile’s big data center, for example, is positioned directly as a profit center.
Data (D) becomes DT by way of IT. For example, the IT channel system used to only handle business transactions, but now it can carry data-driven intelligent recommendations.
DT is an abstraction proposed by Jack Ma to highlight the value of data, and it is not hard to understand. China Mobile now has a “three integrations” concept: fusion, accommodation, and RongZhi. I think IT and DT likewise need stronger integration. Fusion means selling things together; accommodation means sharing capabilities, so that IT has DT capabilities and DT has IT capabilities.
The slide says DT is problem oriented and IT is demand oriented. In my view these are two sides of the same coin rather than a real difference between DT and IT. The claim that DT is something new relative to IT is somewhat reasonable. For example, DT’s intelligent recommendation provides a method, whereas earlier IT recommendations relied on human judgment.
3. What the enterprise organization expects from DT
Senior management team: This is the basic demand of the BI era, so there is not much to add. Big data’s more powerful processing, visualization, and real-time technologies offer a better experience for viewing data than earlier BI did.
Business team: Three changes are mentioned:
First, use data to identify problems rather than deciding on gut feeling.
Second, business personnel should understand both business and data, and even be able to build data sets and models themselves.
Third, data should be embedded in the production process so that it plays a direct role. For example, the label library should supply the target users for marketing campaigns, and risk control models should be embedded in the user operation flow.
Everyone claims to do the first, but in practice decisions are still based on experience, with data serving only as reference and supporting evidence. The second and third are hard for most enterprises to implement.
Technical team: Three key points:
First, “let the data do more of the running” is the core of an intelligent platform. Zhejiang’s “run at most once” initiative relies on data and platform integration to achieve this goal.
Second, IT personnel should think in terms of data, which is a very good point. Those who lack data thinking seldom consider intelligence when designing IT systems. This is more or less why, in many enterprises today, the order-handling system and the recommendation system remain disconnected from each other.
Third, discover new knowledge through data analysis to empower the business, which is the mission of the data technology team.
4. Big middle platform and small front end
This diagram illustrates the engine of Alibaba’s business operating system: a big middle platform and a small front end. The presentation is clear. Two important concepts deserve particular attention: business datafication and data businessification.
Business datafication: all business activities should record the relevant data, which is the mission of the business middle platform.
In fact, business datafication is a big challenge. In the past, business platforms were designed around functions and processes, and only the data needed to implement those functions and processes was recorded; the rest was considered dispensable.
For example, if a carrier records some of its signaling logs incompletely, later network analysis or the realization of data value may suffer, and business datafication has not been achieved.
However, business datafication can require huge investment, and it is easier said than done. In most enterprises, the data on hand is not the result of a deliberate datafication strategy but merely low-hanging fruit picked up along the way.
One of the missions of the data team is to drive business datafication. A lot of good data exists only because someone went to the front line and pushed the business to record it.
Data businessification: in essence, finding value in data and feeding it back to empower the business. This is easy to understand.
The term digital twin is also hot right now. The interconnected world of the future will record all your actions in real time, forming another, digital you. This is the digital twin.
5. Four typical scenarios of data empowerment
(1) Global data monitoring: in essence, this is indicators + reports + visualization. It is aimed at managers, although business staff need to see it too. The following is an example of the Double 11 big screen.
(2) Data-driven operation – intelligent CRM: the slide says to “establish a data connection and extraction management system based on full-link, omnichannel data with ‘people’ at the core, and carry out fine-grained management of the user’s whole life cycle.” With so many adjectives it is a bit dizzying. What does it actually mean?
Full link refers to data that records and tracks the entire business process vertically (including commodity planning, pre-sale and in-sale management, customer service management, order processing, warehousing and logistics, etc.).
Omnichannel refers to the user behavior data from every touch point, such as Tmall, Taobao, and Youku.
A complete customer profile can therefore be formed by gathering full-link and omnichannel data, and the required data can then be easily obtained for analysis through connection and extraction. Taken literally, this is similar to the positioning of our label library.
(3) Data embedded in the business – intelligent recommendation: this one is quite clear. It is closed-loop marketing management, covering user segmentation, personalization (“a thousand faces for a thousand people”), channel recommendation, and marketing evaluation. The following is an example.
(4) Data businessification – Business Advisor: this is one of the few pure-pedigree data products that Alibaba promotes, and a typical example of data being offered as a service and monetized directly. It gives shop owners end-to-end analysis support. There are plenty of introductions online. This slide highlights the past, present, and future of Business Advisor, which is rather interesting.
History: a hundred schools of thought contended. This caused data redundancy, a poor experience, and other problems, but without that contention the integrated product Business Advisor could not have emerged.
Now: Business Advisor unifies the field, relying on the data platform system, including OneData, OneService, and OnePlatForm, which will be explained later.
Future: one Business Advisor is not enough. The plan is to build a product development platform and replicate a Business Advisor for each industry, that is, Advisor X. The ambition is great.
Why do I call it pure pedigree?
When you evaluate the value of data, it is hard to say whether a result came from the business itself being good, the process design being good, or the data-driven recommendation. Pure data products are a better way for data people to show their value.
6. The origin of Alibaba’s data middle platform
The origin of the data middle platform is the same as that of the general data warehouse integration model: the need for sharing and reuse. Originally, for example, each Taobao-related business built its own middle layer of data, and these middle layers contained a lot that was duplicated or similar. Ant has a transaction subject area and Tmall has one too, so can a common transaction subject area be abstracted to serve both businesses?
That is why Alibaba Data Center abstracts a common core subject layer, covering members, goods, transactions, browsing, and advertising, to serve the application layer. Each application used to do much of the common layer’s work itself, but now it can reuse the shared layer, which in theory speeds up application construction.
The following slide compares the data dependency graphs before and after. One is a mesh, representing tangled relationships among nodes, which inevitably means a lot of redundancy. The other is radial: one node serves many downstream nodes, which represents sharing and simplicity.
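The difference between the mesh and the radial graph can be made concrete by counting data pipelines. The sketch below is my own illustration, not something taken from the slide: the application names and the subject areas each one needs are hypothetical, and “pipeline” simply means one processing job that has to be built and maintained.

```python
from typing import Dict, Set

# Hypothetical applications and the subject areas each one consumes.
APP_NEEDS: Dict[str, Set[str]] = {
    "tmall_report": {"member", "goods", "trade", "browse"},
    "ant_risk": {"member", "trade"},
    "ads_dashboard": {"member", "goods", "browse", "ads"},
    "search_ranking": {"member", "goods", "trade", "browse"},
}


def mesh_pipelines(app_needs: Dict[str, Set[str]]) -> int:
    """Mesh: every application builds its own pipeline for every subject it needs."""
    return sum(len(subjects) for subjects in app_needs.values())


def radial_pipelines(app_needs: Dict[str, Set[str]]) -> int:
    """Radial: each subject is processed once into a shared layer,
    and each application has a single link to that shared layer."""
    shared_subjects = set().union(*app_needs.values())
    return len(shared_subjects) + len(app_needs)


if __name__ == "__main__":
    print("mesh pipelines:", mesh_pipelines(APP_NEEDS))
    print("radial pipelines:", radial_pipelines(APP_NEEDS))
```

In this toy setup the mesh needs 14 pipelines and the radial layout 9, and the gap widens as more applications are added, because each new application adds only one link to the shared layer.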
7. Alibaba Data Center Panorama
If you understand this picture, you will understand what Alibaba’s data center does. The parts directly related to the data center are DaaS, the two IPaaS layers (data asset management and the data R&D platform), and IaaS.
As I understand it, the data platform in the broad sense includes DaaS, the IPaaS of data asset management, and the IPaaS of the data R&D platform. In the narrow sense, it includes only the DaaS of data asset management and the IPaaS of the data R&D platform, which in my enterprise are called the Energy Efficiency Medium Platform.
(1) Computing and storage platform IaaS
StreamCompute: this presumably refers to Alibaba’s internal version of Flink.
MaxCompute: the EB-scale data warehouse (formerly ODPS) developed by Alibaba.
Real-time computing ADS: short for AnalyticDB. It mainly provides real-time online analysis and can be regarded as Alibaba’s self-developed OLAP engine.
(2) Data asset management IPaaS
Data asset management is the same thing as metadata management.
Asset map: in essence, this is a graphical data dictionary. How much data Alibaba has, how it is stored, and how to find and use it can all be answered from the asset map, which is quite vivid. Judging from the information available online, its design is worth learning from. The following are some interface screenshots.
Asset analysis: you can think of this as BI for metadata: structure analysis, trend analysis, and so on. The aim is to use metadata analysis to understand the status quo and spot anomalies, such as growth in the payment category, to guide the governance of data assets.
Asset application: you can think of this as using metadata to improve how efficiently data assets are used, for example finding invalid data assets through impact analysis to reduce data redundancy. This work is very valuable.
Asset operation: the word “operation” is overused. Operation is really not a function but a set of actions: taking various measures so that more people use the data and it generates more value, for example recommending new data assets.
The 80/20 rule is very evident in the use of data assets. Most data is never accessed or used by anyone, yet storing it is expensive. Only through operation can silent data reach more users and invalid data be retired, reducing cost and increasing efficiency.
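To show what impact analysis over metadata might look like, here is a minimal sketch. It assumes two pieces of metadata that the article does not spell out: a lineage map from each table to its direct downstream tables, and an access count per table. The table names and the rule itself (a table is a retirement candidate when neither it nor anything downstream of it is ever accessed) are my own hypothetical choices, not Alibaba’s.

```python
from typing import Dict, List, Set

# Hypothetical lineage: table -> tables built directly from it.
LINEAGE: Dict[str, List[str]] = {
    "ods_orders": ["dwd_trade"],
    "dwd_trade": ["ads_sales_report", "tmp_trade_copy"],
    "ods_legacy_log": ["dws_legacy_summary"],
}

# Hypothetical access counts over some observation window.
ACCESS_COUNTS: Dict[str, int] = {
    "ads_sales_report": 120,
    "tmp_trade_copy": 0,
    "dws_legacy_summary": 0,
}


def is_used(table: str, lineage: Dict[str, List[str]], access: Dict[str, int]) -> bool:
    """A table counts as used if it, or any table downstream of it, is accessed."""
    stack, seen = [table], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guards against cycles in the lineage metadata
        seen.add(node)
        if access.get(node, 0) > 0:
            return True
        stack.extend(lineage.get(node, []))
    return False


def retirement_candidates(lineage: Dict[str, List[str]], access: Dict[str, int]) -> Set[str]:
    """Return every known table that nothing ever reads, directly or indirectly."""
    tables = set(lineage) | set(access)
    for downstream in lineage.values():
        tables.update(downstream)
    return {t for t in tables if not is_used(t, lineage, access)}


if __name__ == "__main__":
    print(sorted(retirement_candidates(LINEAGE, ACCESS_COUNTS)))
```

Note that `ods_orders` and `dwd_trade` are kept even though nobody queries them directly, because a report that people do read depends on them. That is the point of doing this through lineage rather than through access logs alone.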
(3) Data research and development platform IPaaS
This platform is the same kind of thing as the DACP mentioned in a previous article. It is responsible for data processing and needs a series of supporting functions, including data planning, exchange, processing, development, scheduling, and monitoring.
(4) Data middle platform DaaS
Vertical data center (OneClick): this is the ETL of a traditional data architecture. It collects data from various channels in both offline and real-time modes.
OneData: this serves the purpose of data warehouse modeling, which is to standardize and unify data definitions and to distill common data. Alibaba adopts dimensional modeling: it abstracts dimensions and indicators by analyzing business processes and finally consolidates them into the required warehouse model.
Extraction data center (OneID): as I understand it, to make it easier to provide data externally, Alibaba has built wide tables keyed by the various IDs of its core business objects, just as a carrier needs a wide-table system built around user ID (mobile phone number), customer ID, account ID, and family ID.
Unified data service middleware (OneService): provides data services externally through interfaces, using the data integrated and computed in the data warehouse as its source.
8. Distillation and accumulation of Alibaba’s data
(1) OneData
Data standardization: unified specifications for data assets, covering domains, subject areas, models, fields, and indicator naming. I have always stressed that data standardization must be solved at the source. If the data assets of Alibaba’s business systems also follow this principle, that is very impressive.
Tool-based technical kernel: my understanding is that specifications can only be enforced through tools. For example, a table can only be created according to the standard template; otherwise it cannot be created at all. Alibaba’s control in this respect is said to be quite strict.
Metadata-driven intelligence: with metadata analysis, resource demand can be calculated scientifically, quickly, and flexibly, which removes the struggle of finding a justification for every capacity expansion. This is similar to the metadata applications described earlier.
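The idea that a specification only works when a tool enforces it can be sketched as a gate in front of table creation. Everything specific here is an assumption of mine for illustration: the `<layer>_<domain>_<entity>` naming pattern, the layer codes, the registered domains, and the `create_table` helper. The article does not describe Alibaba’s actual rules.

```python
import re
from typing import List

# Hypothetical standard: <layer>_<domain>_<entity>, all lowercase.
ALLOWED_LAYERS = {"ods", "dwd", "dws", "ads"}
REGISTERED_DOMAINS = {"trade", "member", "goods"}
NAME_PATTERN = re.compile(r"^([a-z]+)_([a-z]+)_([a-z][a-z0-9_]*)$")


def validate_table_name(name: str) -> List[str]:
    """Return a list of violations; an empty list means the name is compliant."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return ["name must look like <layer>_<domain>_<entity> in lowercase"]
    layer, domain, _entity = match.groups()
    errors = []
    if layer not in ALLOWED_LAYERS:
        errors.append(f"unknown layer '{layer}'")
    if domain not in REGISTERED_DOMAINS:
        errors.append(f"unregistered domain '{domain}'")
    return errors


def create_table(name: str, catalog: List[str]) -> None:
    """Refuse to register a table whose name violates the standard."""
    errors = validate_table_name(name)
    if errors:
        raise ValueError(f"cannot create '{name}': " + "; ".join(errors))
    catalog.append(name)


if __name__ == "__main__":
    catalog: List[str] = []
    create_table("dwd_trade_order_detail", catalog)
    print(catalog)
    print(validate_table_name("tmp_trade_orders"))
```

The design choice worth noting is that validation sits inside the only path that creates a table, so nobody has to be persuaded to follow the standard.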
OneData is the core of Alibaba Data Center. It has an engine called Dataphin, which supports defining data standards and specifications, developing data models automatically, and generating thematic data services in real time.
As shown in the slide below, the closed loop runs from data import through specification definition, data modeling, external data association, and data asset accumulation to data service generation, and most elements of data management are implemented along this chain.
This highly standardized development mode reduces flexibility to some extent, but its economies of scale are very good. Without it, Alibaba’s huge data assets could not be managed well. I know this from experience.
Indicator standardization is something I have tried myself. At the beginning I felt that too many reports were being developed over and over, and indicator standardization solves exactly that problem. It is a natural idea once the number of reports reaches a certain scale. What Alibaba does, shown below, is the same as what I did back then: different roads lead to the same goal.
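One way to picture indicator standardization is to build every indicator from a small set of standard parts, so that two teams asking for the same thing end up with the same definition. The sketch below assumes a decomposition into an atomic indicator, a time period, and an optional modifier. That decomposition, the naming scheme, and the `IndicatorRegistry` class are my own illustration and are not confirmed by the slides.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DerivedIndicator:
    """A standardized indicator assembled from reusable parts (assumed scheme)."""
    atomic: str         # what is measured, e.g. "pay_amt"
    period: str         # time window, e.g. "1d" or "30d"
    modifier: str = ""  # optional business qualifier, e.g. "mobile"

    @property
    def name(self) -> str:
        parts = [self.atomic, self.period]
        if self.modifier:
            parts.append(self.modifier)
        return "_".join(parts)


class IndicatorRegistry:
    """Keeps one canonical definition per indicator and reports who owns it."""

    def __init__(self) -> None:
        self._owners: Dict[DerivedIndicator, str] = {}

    def register(self, indicator: DerivedIndicator, requester: str) -> str:
        # If the indicator already exists, the existing owner is returned
        # and the requester reuses it instead of developing it again.
        return self._owners.setdefault(indicator, requester)

    def __len__(self) -> int:
        return len(self._owners)


if __name__ == "__main__":
    registry = IndicatorRegistry()
    mobile_pay = DerivedIndicator("pay_amt", "1d", "mobile")
    print(mobile_pay.name, registry.register(mobile_pay, "team_a"))
    print(registry.register(DerivedIndicator("pay_amt", "1d", "mobile"), "team_b"))
```

The second request comes back owned by `team_a`, which is the repeated-development problem being caught at definition time rather than after a second report has been built.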
(2) OneID
Suppose a user named Zhang San uses Baidu Map on one mobile phone, watches Baidu’s iQiyi videos on an iPad, uses the Baidu app on a second mobile phone, and uses Baidu Search on a PC. How do we aggregate the information of this one user across these different devices?
Unlike carriers, which have mobile phone numbers as a natural identifier, Internet companies face a very hard problem in linking all kinds of account IDs. ID-mapping is a core technology for Internet companies: it ensures that data collected in different domains can be integrated and analyzed together. Without the support of a unified ID, there is no point in centralizing diverse data for analysis; it is just another form of data silo.
The four user records below, for example, actually identify the same person.
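Linking records like these is commonly modeled as finding connected groups of identifiers: if two records share any identifier, they belong to the same person. The sketch below uses a union-find structure for that. It is a generic technique that I am substituting for illustration, not Alibaba’s actual ID-mapping algorithm, and the identifiers in `RECORDS` are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class IdGraph:
    """Union-find over identifiers; each connected group is treated as one person."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def _find(self, identifier: str) -> str:
        self._parent.setdefault(identifier, identifier)
        root = identifier
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path straight at the root.
        while self._parent[identifier] != root:
            next_identifier = self._parent[identifier]
            self._parent[identifier] = root
            identifier = next_identifier
        return root

    def add_record(self, identifiers: Iterable[str]) -> None:
        """Identifiers observed together in one record are merged into one group."""
        ids = list(identifiers)
        for identifier in ids:
            self._find(identifier)  # make sure every identifier is registered
        for other in ids[1:]:
            root_a, root_b = self._find(ids[0]), self._find(other)
            if root_a != root_b:
                self._parent[root_b] = root_a

    def same_person(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)

    def groups(self) -> List[Set[str]]:
        grouped: Dict[str, Set[str]] = defaultdict(set)
        for identifier in list(self._parent):
            grouped[self._find(identifier)].add(identifier)
        return list(grouped.values())


# Hypothetical records: the first four share identifiers pairwise, the fifth does not.
RECORDS = [
    ("device:phone-1", "account:map-001"),
    ("device:ipad-1", "account:video-777", "phone:p-1"),
    ("device:phone-2", "account:map-001", "phone:p-1"),
    ("device:pc-1", "cookie:abc", "account:video-777"),
    ("device:phone-9", "account:map-999"),
]

if __name__ == "__main__":
    graph = IdGraph()
    for record in RECORDS:
        graph.add_record(record)
    print(graph.same_person("device:phone-1", "device:pc-1"))
    print(len(graph.groups()))
```

In practice, shared identifiers vary a great deal in reliability (a shared Wi-Fi address is much weaker evidence than a shared login), so a real system would need to weight or filter links before merging. This sketch treats every shared identifier as certain.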
(3) OneMeta
“Data asset analysis” and “data lineage tracking” here were already covered under “Data asset management IPaaS” above. They are very basic parts of data management, especially comprehensive data management.
Security: Refers to definitions such as sensitive data classification and access control.
Quality: Refers to the definition of quality rules for data.
Cost: A comprehensive estimate based on the invocation and processing costs of data assets.
Personnel: this probably refers to defining the organization and individual that own each data asset. Our own data dictionary, for example, has such an attribute: the creator and modifier of each asset must be recorded for tracking and accountability.
(4) OneService
Topic-type data service: this should be a simple, metadata-based data service query engine. It is business-oriented and unifies the data export and query logic, hiding the multiple data sources and physical tables behind a business-oriented pseudo-SQL that makes fetching data easier.
Unified and diversified services: general query refers to ordinary SQL queries, and OLAP is multidimensional analysis. Online service is more abstract; my guess is that it means customized services such as data push and scheduled tasks.
Cross-source data services: because of the mix of big data components, different data is often stored in different databases, such as Hadoop, GBase, and Oracle. Ad-hoc queries across heterogeneous databases generally require gathering the data in one place first, but some lightweight cases need the result of a direct cross-source join, hence the demand for this service.
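The way a metadata-driven service can hide physical tables and sources from the caller can be sketched in a few lines. This is only my guess at the shape of such a service, not OneService itself. Two in-memory SQLite databases stand in for heterogeneous sources, and the business field names, table names, and the `query_member` function are all hypothetical.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple


def build_sources() -> Dict[str, sqlite3.Connection]:
    """Create two separate in-memory databases that stand in for different systems."""
    trade_db = sqlite3.connect(":memory:")
    trade_db.execute("CREATE TABLE dws_trade_member (member_id TEXT, pay_amt_30d REAL)")
    trade_db.execute("INSERT INTO dws_trade_member VALUES ('m1', 320.5)")

    profile_db = sqlite3.connect(":memory:")
    profile_db.execute("CREATE TABLE dim_member (member_id TEXT, city TEXT)")
    profile_db.execute("INSERT INTO dim_member VALUES ('m1', 'Hangzhou')")
    return {"trade": trade_db, "profile": profile_db}


# Metadata: business field -> (source, physical table, physical column).
FIELD_CATALOG: Dict[str, Tuple[str, str, str]] = {
    "member.city": ("profile", "dim_member", "city"),
    "member.pay_amount_30d": ("trade", "dws_trade_member", "pay_amt_30d"),
}


def query_member(
    sources: Dict[str, sqlite3.Connection], member_id: str, fields: List[str]
) -> Dict[str, Optional[object]]:
    """Resolve business fields through the catalog and fetch each from its own source."""
    result: Dict[str, Optional[object]] = {}
    for field in fields:
        source, table, column = FIELD_CATALOG[field]  # KeyError for unknown fields
        # Table and column names come from the trusted catalog;
        # the caller-supplied value is bound as a parameter.
        row = sources[source].execute(
            f"SELECT {column} FROM {table} WHERE member_id = ?", (member_id,)
        ).fetchone()
        result[field] = row[0] if row else None
    return result


if __name__ == "__main__":
    print(query_member(build_sources(), "m1", ["member.city", "member.pay_amount_30d"]))
```

The caller asks for `member.city` and `member.pay_amount_30d` without knowing that they live in different databases. If a table is moved or renamed, only the catalog entry changes.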
As for the deck as a whole, my strongest impression is that Alibaba’s data center technology system is huge yet pays great attention to detail. A few words on a slide look simple, but implementing them takes a huge investment and is a gradual process; Dataphin is an example. If you want more technical detail on Alibaba Data Center, I recommend the book “Alibaba Big Data Practice”.
In fact, doing a data center well is not simply a matter of bringing in a few tools. Technology is just technology: you can copy technology, but you cannot copy management and culture, and those are the key to a data center’s success.
The bigger challenge for a data center is whether your organization’s understanding of data has reached the necessary level, and whether you can drive the organization to develop a data management mechanism and process that suits it. This is the hardest part. You have to make your own path.
Author: Fu Yiping
This article is original content of the Yunqi Community and may not be reproduced without permission.