This is Peng Wenhua’s 127th original article

Contents

  • The basis and core of Kuaishou data governance
  • Kuaishou model specification governance practices
  • Kuaishou model specification
  • Kuaishou model governance case
  • Kuaishou data governance system
  • Outlook and Summary




The end of the year really is a season of sharing and a feast for newcomers. There were a lot of conferences today, and I had several remote sessions playing at the same time. I was soon drawn in by Kuaishou’s “Data Governance Technology Exchange Meeting”, which not only explained data governance clearly and systematically but also presented Kuaishou’s own data governance practice. More importantly, Kuaishou was open about it: after the meeting, they shared all the slides without watermarks. I can’t like that enough!

Let’s start with a panoramic overview figure to open the post. This article covers the first topic: Kuaishou’s data governance practice, starting from model specification. Instructions for downloading the PDF are at the end of this article.



This part was presented by Kuaishou data governance expert Sun Wei:

The basis and core of Kuaishou data governance



Kuaishou bases its data governance work on DAMA theory. I have previously covered the relevant data governance frameworks and organizations, both in China and abroad, in “Premise of Data Assets: Construction of a Data Governance System”. The biggest advantage of following a framework is that it helps you consider everything systematically, so that little is left out. Other data governance frameworks for reference include DMM, DMCM, DCMM, etc.



There are many theoretical frameworks for data governance, but these are the core elements. Other things, such as strategy and organization, are left out because they are not relevant to the data model specification.

The logic of this slide is very important: the integrity of the underlying rules determines the stability of the superstructure. So it made sense for the first session to cover the model specification.



Vertically, the data model determines how the metadata above it, and even the data services, are built. Horizontally, both the quality and the security specifications serve the data model. This shows how important the data model specification is.

Kuaishou model specification governance practices



Again: “If you think the cost of data governance is too high, try the cost of data chaos”!

In fact, 99% of all the problems we encounter with data are caused by lack of data governance. The root cause of the lack of data governance lies in the organization’s lack of attention.



Kuaishou’s approach to model specification governance is very clear: first define norms and standards, then set stage goals and a reference methodology, and finally use metadata to drive data governance.



Kuaishou’s situation is that it has a large business ecosystem with dozens of product lines, all iterating very fast. The construction target is therefore defined as an enterprise-level data center that provides data services for the core business lines.

Kuaishou model specification



Kuaishou’s data warehouse is divided into 5 layers: ODS, DWD, DWS, the theme layer, and the APP layer. The session also shared a principle for deciding whether data belongs in the common business layer or the common base layer. If the business is fixed, the two are easy to tell apart. But because products iterate very quickly, tables in the common business layer become more and more important and may even sink into the common base layer. For more on data warehouse layering, see another article I shared earlier: “How many layers should a data warehouse actually have?”



There is now a broad consensus on how to define an indicator system, and most of us do it the same way. Indicators are divided into atomic indicators, derived indicators, and derivative indicators, and their definitions include the statistical period, granularity, and various business restrictions (time, region, business conditions, etc.). Some special indicators also carry special definitions such as a statistical cutoff point (for example, financial data needs to be collected at a specific time). For details, see the indicator system section in “How to Build a Data Warehouse”.



During the livestream, some viewers commented that this part was too dry. In fact, it is very substantive material. The specifications shown are not very detailed, but the basic management framework is laid out. The most important thing in data governance is organizational management. Without the organization shown in this figure, all the other design work is useless.

Many companies don’t have a “decision committee”, which leads to an uncontrolled explosion of metrics. The worst offenders are business people who, unable to hit their targets, change the statistical definition of the KPI instead. An example is the indicator management process defined in this figure, where A1 and A2 are company-level indicators and A3 is a department-level indicator. Department-level indicators go straight to development once a data analyst or data product manager has validated them. A1 and A2, as enterprise-level indicators, must be reviewed by the decision committee before entering the data development stage. Toutiao, Meituan, iQiyi and other companies work this way.
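
The review flow described here can be sketched as a small routing function. This is only an illustrative sketch, not Kuaishou’s actual tooling: the levels A1, A2 and A3 come from the talk, while `Indicator`, `review_steps`, and the step names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Levels from the talk: A1/A2 are company-level, A3 is department-level.
COMPANY_LEVELS = {"A1", "A2"}
DEPARTMENT_LEVELS = {"A3"}


@dataclass(frozen=True)
class Indicator:
    name: str
    level: str  # "A1", "A2" or "A3"


def review_steps(indicator: Indicator) -> list[str]:
    """Return the ordered steps (hypothetical names) before an indicator is built."""
    if indicator.level in COMPANY_LEVELS:
        # Company-level indicators must pass the decision committee first.
        return ["decision_committee_review", "data_development"]
    if indicator.level in DEPARTMENT_LEVELS:
        # Department-level indicators only need analyst or product manager validation.
        return ["analyst_or_pm_validation", "data_development"]
    raise ValueError(f"unknown indicator level: {indicator.level!r}")
```

Rejecting unknown levels, rather than defaulting them to the lighter department path, keeps a mislabeled indicator from slipping past the committee.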

Kuaishou model governance case



The left side is the conventional way of building business by business, which just produces a row of chimneys; this is actually Kimball’s construction logic. The right side is overall planning with unified implementation: push as much as possible down into the lower layers and build layer by layer from the top down, so that when a business team wants to see data, it simply connects to the APP layer. This is actually the Inmon logic of data warehouse construction. One small point here: the “top” and “bottom” in top-down and bottom-up are not the top and bottom of the layer stack as drawn. A lot of people get confused by this.



The near-duplicate indicator requirements on the left are probably what warehouse engineers hate most, but under the existing framework there is no way around them, so the same work gets done again and again.

Unified metadata control is different. Unified facts can be abstracted in the DWD layer and unified atomic indicators in the DWS layer; these are then extended into derived and derivative indicators in the APP layer through domain modeling. In the figure above, the duration indicator can be abstracted into an atomic indicator, and the business 1 duration indicator is a derived or derivative indicator (an atomic indicator plus business constraints is a derived indicator; adding aggregate modifiers and business objectives gives a derivative indicator). For details, see the indicator system section in “How to Build a Data Warehouse”. So even when another business is added, we only need to handle it in the APP layer, without touching the DWD layer.
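
One way to picture this reuse is to define the atomic indicator once and express each business’s indicator as that atomic indicator plus constraints. The following is a minimal sketch under stated assumptions: the class names, field names, and sample rows are all hypothetical, and a real warehouse would express this in SQL over DWS tables rather than in Python.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

Row = dict  # one fact row, e.g. {"business": "biz1", "duration": 30}


@dataclass(frozen=True)
class AtomicIndicator:
    name: str
    measure: str  # column to aggregate

    def compute(self, rows: list[Row]) -> float:
        # Aggregate the measure over whatever rows are passed in.
        return sum(r[self.measure] for r in rows)


@dataclass(frozen=True)
class DerivedIndicator:
    name: str
    base: AtomicIndicator
    constraints: tuple[Callable[[Row], bool], ...] = ()

    def compute(self, rows: list[Row]) -> float:
        # Apply the business constraints, then reuse the atomic definition.
        kept = [r for r in rows if all(c(r) for c in self.constraints)]
        return self.base.compute(kept)


# Hypothetical sample data.
rows = [
    {"business": "biz1", "duration": 30},
    {"business": "biz2", "duration": 50},
    {"business": "biz1", "duration": 15},
]

duration = AtomicIndicator("duration", measure="duration")
biz1_duration = DerivedIndicator(
    "biz1_duration", duration, (lambda r: r["business"] == "biz1",)
)
# Adding another business only adds a new constraint; the atomic definition is untouched.
biz2_duration = DerivedIndicator(
    "biz2_duration", duration, (lambda r: r["business"] == "biz2",)
)
```

Because `biz2_duration` is defined without changing `duration`, the sketch mirrors the point in the text: a new business is handled at the top layer only.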



Rapid model changes driven by business changes are a warehouse engineer’s nightmare. Kuaishou’s method is somewhat similar to the DV model: the core model stays basically unchanged and the extended content is stored separately. Once the two are decoupled, life is much more comfortable: business changes rarely reach the core model, so both changes and operations work are reduced.
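
The decoupling idea can be sketched as a stable core record plus a separate extension keyed by the same ID, so a business change only touches the extension. This is an illustrative sketch only: the table names, columns, and the `with_extension` helper are hypothetical and do not describe Kuaishou’s actual model.

```python
# Core model: stable columns that rarely change.
core_users = {
    1: {"user_id": 1, "register_date": "2020-01-01"},
    2: {"user_id": 2, "register_date": "2020-02-01"},
}

# Extension: fast-changing, business-specific attributes kept apart from the core.
ext_users_biz1 = {
    1: {"biz1_level": "gold"},
}


def with_extension(core: dict, ext: dict) -> list[dict]:
    """Left-join extension attributes onto core rows by key; the core is never modified."""
    return [{**row, **ext.get(key, {})} for key, row in core.items()]


wide_rows = with_extension(core_users, ext_users_biz1)
```

When business 1 adds or renames an attribute, only `ext_users_biz1` changes; consumers that read the core alone are unaffected.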



To be honest, I think I was distracted during this part, so looking back I don’t fully understand it. You wouldn’t normally see the situation on the left. In a layered data warehouse, the point of the DWS layer is further abstraction and decoupling. The DWS layer is dropped only in small companies or where the business is very simple and stable.

Kuaishou data governance system



That’s a great line. If you concentrate on one kind of work for long enough, you arrive at a similar insight. Solving problems one point at a time often leads to endless tasks: you miss the forest for the trees and slowly wear yourself out. Building systematic thinking and an overall framework is the only real way to solve the problem. Relying on the DAMA governance system is undoubtedly a wise choice.



Taking DAMA’s data governance system as a reference, Kuaishou breaks the work down into data, capabilities, products, and support and assurance, which gives the figure above. Take a moment to study it; it is worth a few thousand words.



This data model metric system is interesting: it can serve as a source of KPIs for assessing the warehouse team. The picture is not very clear, so download the PDF for reference. I have redrawn the quantitative health indicator system for data governance as a whole:



This indicator system can be broken down to individuals, who can then be ranked directly. The “Kuaishou big data cost management” talk also ranked people on some of these indicators. It seems we can’t get anywhere without KPIs.
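
To make the ranking idea concrete, here is a minimal sketch of scoring each owner with a weighted average and sorting by score. Everything in it is an assumption for illustration: the metric names, weights, owners, and values are hypothetical and are not taken from Kuaishou’s slides.

```python
from __future__ import annotations


def health_score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of metric values, each expected to lie in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(metrics[name] * w for name, w in weights.items()) / total_weight


def rank_owners(
    per_owner: dict[str, dict[str, float]], weights: dict[str, float]
) -> list[tuple[str, float]]:
    """Return (owner, score) pairs sorted from healthiest to least healthy."""
    scored = [(owner, health_score(m, weights)) for owner, m in per_owner.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical metrics and weights.
weights = {"naming_compliance": 0.5, "comment_coverage": 0.5}
per_owner = {
    "bob": {"naming_compliance": 0.6, "comment_coverage": 0.6},
    "alice": {"naming_compliance": 1.0, "comment_coverage": 0.8},
}

ranking = rank_owners(per_owner, weights)
```

Dividing by the total weight means the weights do not have to sum to 1, which makes it easier to add or drop a metric later.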

Outlook and Summary



This picture looks familiar. It is the earlier figure with systematic governance tools added on the right, plus reinforced security and data value.

Overall, Kuaishou’s data governance is a good place to start. The framework follows DAMA: organization first, then standards, then foundations, strong execution, and service.

The whole effort is targeted, organized, disciplined, methodical, and steady. It can serve as a reference for data governance across the industry.

The PDF “2. Kuaishou Data Governance Practices from Model Specification - Sun Wei” is ready for you. Reply “DAMA” to download the DAMA-DMBOK document.