What is data governance

Data governance is a broad concept whose scope can be large or small depending on the scenario. Sometimes we mean long-term strategic planning for data; sometimes we mean governance aimed at solving a data cost problem (generally divided into storage and computing costs). This is why a talk from NetEase Yanxuan made a special point of “borderless” governance.

Let’s start by looking at three definitions of data governance:

  1. Data governance is the collection of activities (planning, monitoring, and enforcement) that exercise authority and control over the management of data assets. It is a continuous working process that an organization carries out to maximize the value of its data assets: identifying the data stakeholders, coordinating them to reach agreement on responsibilities and interests, and prompting them to take joint action.
  2. IBM believes that data governance refers to a quality control specification that injects rigor and discipline into an organization’s information management, use, improvement, and protection processes. Effective data governance improves the quality, availability, and integrity of an organization’s data by facilitating cross-organizational collaboration and structured decision making.
  3. DGI (the Data Governance Institute) holds that data governance is a system of decision rights and accountabilities for information-related processes. These processes run according to agreed-upon models that describe who can take what action, with what information, at what time and under what circumstances, and using what methods.

I think the DGI definition is the easiest to understand, and it can serve as the consensus definition of data governance.

Now that we know the definition, how should we understand data governance? There are many data governance standards and frameworks on the market. If you start researching without framing the problem or thinking it through in advance, you will be dazzled and get lost. Here I recommend three research methods:

  1. Find the greatest common divisor first. In good system design, a system is split into a stable base model and an outer model that changes with the business and is easy to iterate on; this loose coupling keeps iterative changes from spreading beyond a controllable scope. The same applies to understanding data governance: find the topics that the frameworks and standards have in common, since those are what our own data governance theory must cover, and prune the remaining topics according to whether our business reality requires them in the model. The greatest common divisor of the frameworks and standards is shown below:

  2. Work in stages. Data governance is a large topic with a long cycle. It cannot be accomplished overnight, and some pieces of work depend on others, so we need to divide the topic into stages. Meituan has done relevant research on this, and I think the analysis is very good, so I borrowed it directly:

  3. Go top-down. Data governance runs from the highest level down to the most fine-grained execution details, so understanding it from the top level of strategic planning lets us grasp the “grand narrative” before focusing on execution details. I recommend using a diagram to represent the concept of data governance: a diagram carries the most information, and the simpler the diagram, the more distilled the information. Here a single diagram can represent it:


Using the above three perspectives, what consensus conclusions can we reach?

The framework of data governance can be divided into at least three layers. The bottom layer, the base, is the minimal union of concrete data topics that we should pay attention to:

  1. Data costs (including storage, computing and even scanning costs)
  2. Data quality
  3. Data security
  4. Data modeling
  5. Data value (uncertain, but listed in every framework)
  6. Data services/applications/monetization

It is unclear whether data value belongs in the bottom layer as a topic (appropriate if many systems and efforts contribute directly to increasing data value) or should instead be treated as a results-oriented measure, such as ROI.

The second layer is the intermediate architecture we use to govern the underlying data topics. It can include:

  1. Organization
  2. Standards
  3. System/control
  4. Methodology/techniques (that is, specific practices, such as DQC for quality management and lifecycle management for cost management)

The top layer is our governance objective, namely strategy.
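The three layers can be written down as a small data structure, which makes it easy to check which base topics a set of initiatives does not yet cover. This is only a minimal sketch: the names `Initiative`, `GovernanceFramework`, and `uncovered_topics`, the short topic labels, and the example strategy string are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field

# Bottom layer: the base data topics listed above (labels shortened, hypothetical).
BASE_TOPICS = ("cost", "quality", "security", "modeling", "value", "services")

# Middle layer: the means used to govern the base topics.
MIDDLE_LAYER = ("organization", "standard", "system/control", "methodology")


@dataclass
class Initiative:
    """One concrete piece of governance work, placed in the framework."""
    name: str
    layer: str  # which middle-layer element it belongs to
    topic: str  # which base topic it governs

    def __post_init__(self) -> None:
        if self.layer not in MIDDLE_LAYER:
            raise ValueError(f"unknown middle-layer element: {self.layer}")
        if self.topic not in BASE_TOPICS:
            raise ValueError(f"unknown base topic: {self.topic}")


@dataclass
class GovernanceFramework:
    """Top layer (strategy) plus the initiatives that serve it."""
    strategy: str
    initiatives: list[Initiative] = field(default_factory=list)

    def uncovered_topics(self) -> list[str]:
        """Base topics that no initiative addresses yet."""
        covered = {i.topic for i in self.initiatives}
        return [t for t in BASE_TOPICS if t not in covered]


framework = GovernanceFramework(
    strategy="reduce cost and raise trust in data",  # illustrative only
    initiatives=[
        Initiative("DQC", "methodology", "quality"),
        Initiative("Lifecycle management", "methodology", "cost"),
    ],
)
gaps = framework.uncovered_topics()
```

With only the two initiatives above, `gaps` lists the security, modeling, value, and services topics, which is one way to see where the framework is still empty.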

We can fit a few methodologies into this framework:

| Project name | Topic |
| --- | --- |
| DQC (Data Quality Center) | Data quality |
| Lifecycle management | Data cost |
| Health-score evaluation system | Data standards |

Once we understand the framework of data governance, we need to produce a data evaluation system, that is, define what “good” means first, before discussing the specific work. What is a good data model? What are good governance outcomes? Then begin the work, and finally go back and look at the scores so that there is a quantitative result for evaluating how data governance was implemented.
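The “score first, govern, then score again” loop can be sketched as a weighted score computed before and after a round of governance work. This is a minimal sketch under stated assumptions: the function name `health_score`, the three topics, the weights, and every number below are made up for illustration and do not come from any real evaluation system.

```python
def health_score(scores: dict[str, float], weights: dict[str, int]) -> float:
    """Weighted average of per-topic scores, each on a 0-100 scale."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(scores[topic] * w for topic, w in weights.items()) / total_weight


# Hypothetical definition of "good": which topics count, and how much.
WEIGHTS = {"quality": 4, "cost": 3, "security": 3}

# Hypothetical scores measured before and after the governance work.
before = {"quality": 60.0, "cost": 50.0, "security": 80.0}
after = {"quality": 75.0, "cost": 70.0, "security": 80.0}

score_before = health_score(before, WEIGHTS)
score_after = health_score(after, WEIGHTS)
improvement = score_after - score_before
```

Fixing the weights before the work starts is the point of the design: the definition of “good” cannot drift to flatter whatever result the governance effort happens to produce.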

There are other, more specific points of consensus that we can fill in later. For example: data governance always follows a “pollute first, govern later” pattern; data governance cannot be accomplished overnight and is a medium- to long-term plan, so formulating a strategy is very important; data security is the bottom line; and so on. Once this consensus is established, we can organize the specific governance work.

The last step is to discuss concrete implementation in light of the company's actual situation: which mature methodologies currently exist on the market, which mature solutions exist, and how to choose among them. For example, mature methodologies under the quality topic include data asset tiering schemes, checkpoint validation during data processing, and the DQC data quality center.


Is it useful to understand data governance? Is it worth much? This is the kind of question an old, or rather a mature, developer habitually asks, because his energy and time are no longer unlimited and his career is no longer on the rise; he needs to invest his time and energy selectively and aim for the greatest return. While researching the data governance material that companies have published, I found that Alibaba, the industry benchmark, has made DataWorks general enough to be sold as a data governance platform product. Meituan, NetEase Yanxuan, Youzan, fly, and Mogujie, large or small, are all well-known companies that accumulated data early and have essentially completed the early stage of data governance. By 2022 they had a mature and accurate theory internally, along with a system (a set of tools and platforms) that continued to generate returns, so subsequent work should simply iterate on the existing framework.

So will these companies still need people who understand data governance theory and practice? Is data governance in the Internet industry still in its early stages, and are there many companies that still need this kind of talent? Or do the mature companies already have the talent and platforms in place, so that they only need to recruit junior developers with strong execution ability to carry out the iterations? It is like the Soviet Red Army in 1943-45: during campaigns, frontline soldiers were consumed at high speed to fill the battle lines, but logistics could replenish each army very quickly, so it could be ready for the next battle sooner than the enemy.

Later I realized this line of thinking was mistaken. Data governance is high-level strategic planning. Without understanding the theory, you may do fine as a rank-and-file developer, but as a team leader you may not fully understand the big data strategy handed down from above, may fail to distinguish primary goals from secondary ones, and may be unable to break the work down into key milestones. Moreover, data governance is a long-term process, and most of a company's big data work will be classified, more or less, as governance. I believe governance will become more and more specialized, and the title “data governance engineer” may even become common.

In addition, the work of data governance is endless, so there is no need to worry about having nothing to do, although the benefits may gradually shrink with diminishing marginal utility. If a perfect score for data governance is 100, it is hardly possible that every major company is at 100 now, right? There are bound to be flaws that were not thought through or that were sacrificed for speed; those are the places where optimization is worthwhile and where costs can be further reduced and efficiency increased. Finally, if a data governance platform is already too good to improve, sell it as a solution, and keep iterating to meet customer needs while generating revenue for the company. In terms of trends, after informatization, digitalization is a development direction fully affirmed at the policy level, and the digital transformation of so many traditional enterprises is also a big piece of cake. So why worry about having nothing to do?