In any data application, data acquisition and data governance are the two core levers. Following the earlier article “Methodology + Practice: A Comprehensive Analysis of Data Acquisition Schemes”, author Wang Zhuozhou here sets out his thinking on enterprise data governance through two challenges and three principles. The main contents are as follows:

· Definition and importance of data governance
· Two challenges of data governance
· Three principles of data governance

Definition and importance of data governance

Before discussing data governance, we need a clear definition of what it actually covers. In our view, data governance refers to “an organization’s overall management of data availability, integrity, and security.”

Data availability means that the data itself is usable, trustworthy, and of guaranteed quality, so that poor data quality does not cause problems for downstream data applications.

Data integrity means that the data we collect is complete and covers the needs of the various data applications, so that data assets are not lost because some data was never collected.

Data security means that the process of governing and sharing data is secure and controllable, so that it neither violates user privacy nor leaves security risks for the organization itself.

The core of data governance is to help us find event tracking problems and data problems earlier, faster, and more efficiently, and so ensure that subsequent data applications and the value they deliver are correct. The importance of data governance is therefore beyond doubt. It is the foundation of all data applications, and its quality directly affects how much value those applications can deliver. Data governance is also the basis on which an organization accumulates data assets: it directly determines whether those assets can be accumulated effectively and whether the value of the data can be fully realized in data applications.

Two major challenges of data governance

There are two broad categories of challenges that you typically face in data governance.

The first type of challenge arises from objective technical problems.

Technical challenges are easy to understand: the more complex the business and the more data applications there are, the more data sources must be collected and the more data problems must be handled, so the challenges naturally grow. Specifically, these technical challenges include:

· Data transmission: how to transmit data reliably over the public network;
· Data accuracy: how to ensure that data is neither duplicated nor lost;
· Timestamps: how to deal with inaccurate client time;
· Compatibility: how to handle differences between systems and devices;
· Performance impact: how to minimize the effect of data collection on client performance and on the business;
· Testability: how to debug and self-test easily while developing data governance.
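Two of the challenges above, duplicated data and inaccurate client time, can be illustrated with a short sketch. This is not the author's or any SDK's actual implementation: the `RawEvent` fields (`event_id`, `client_time_ms`, `flush_time_ms`) are hypothetical, and the calibration assumes that network latency is small compared with the client's clock error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawEvent:
    event_id: str        # hypothetical unique ID assigned by the client when the event is created
    client_time_ms: int  # event time according to the (possibly wrong) client clock
    flush_time_ms: int   # client clock reading at the moment the batch was sent


def calibrate_time(event: RawEvent, server_receive_ms: int) -> int:
    """Shift the event time by the observed offset between server and client clocks.

    Assumption: the batch arrives almost immediately after it is flushed,
    so (server receive time - client flush time) approximates the clock error.
    """
    clock_offset_ms = server_receive_ms - event.flush_time_ms
    return event.client_time_ms + clock_offset_ms


def deduplicate(events: list[RawEvent]) -> list[RawEvent]:
    """Keep the first occurrence of each event_id so client retries are not double counted."""
    seen: set[str] = set()
    unique: list[RawEvent] = []
    for event in events:
        if event.event_id not in seen:
            seen.add(event.event_id)
            unique.append(event)
    return unique
```

A client that retries a failed upload sends the same `event_id` again, so the server can safely drop the repeat; a client whose clock is an hour slow has every event shifted forward by roughly that hour.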

In addition, data governance, or at least the collection part of it, faces challenges in user privacy and security: how to meet the requirements of GDPR and of the Ministry of Industry and Information Technology, how to audit sensitive fields within the organization, how to control permissions on individual rows and columns of data, and how to encrypt and decrypt data during transmission and processing.

For this first type of challenge, we suggest using unified data acquisition and data import tools. With a professional data collection SDK, for example, collection across multiple platforms can be unified, specialized, and standardized, which minimizes the chance of problems arising during data collection or import.

The second type of challenge arises from people or organizational structure issues.

These challenges show up mainly as: mismatched authority and responsibility among key roles; lack of coordination between departments; lack of a common language among data governance stakeholders; a disconnect between what different roles contribute and what they gain; and unclear ownership of data quality.

These challenges are very difficult to solve completely, and doing so is a long process. However, certain technical measures can mitigate them. For example, a unified data model such as the Event-User-Item model can, to some extent, give different departments and organizational units a common language and ease some of the friction of working across departments and businesses.
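To make the idea of a shared model concrete, here is a minimal sketch of what an Event-User-Item schema might look like. The field names (`user_id`, `item_type`, `item_id`, `properties`) and the `describe` helper are assumptions chosen for illustration, not the schema of any particular product.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class User:
    user_id: str
    properties: dict[str, Any] = field(default_factory=dict)  # e.g. name, city


@dataclass
class Item:
    item_type: str  # e.g. "book", "video"
    item_id: str
    properties: dict[str, Any] = field(default_factory=dict)  # e.g. title, price


@dataclass
class Event:
    name: str     # what happened, e.g. "ViewItem"
    user_id: str  # who did it
    time_ms: int  # when it happened
    properties: dict[str, Any] = field(default_factory=dict)  # may reference an Item


def describe(event: Event,
             users: dict[str, User],
             items: dict[tuple[str, str], Item]) -> str:
    """Join an event with its user and item to produce a readable sentence."""
    user = users[event.user_id]
    item = items[(event.properties["item_type"], event.properties["item_id"])]
    return f"{user.properties['name']} did {event.name} on {item.properties['title']}"
```

Because every team describes behavior as "a user did an event, possibly on an item", product, engineering, and analysis roles can discuss the same three entities instead of each maintaining its own table layout.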

Data governance spans multiple stages and is a “protracted battle”. It cannot be accomplished overnight: it requires a constant investment of time and energy, and it is an error-prone process. Every stage of data governance therefore needs professionals to consult, support, assist, and coordinate, such as the professional analysts at Shence Data, who specialize in data governance and can help customers solve these problems to some extent.

Principles of data governance

Based on our experience serving more than 1,500 enterprise customers over the past five years, we have distilled three principles of data governance:

1. Control at the source rather than polluting first and cleaning up later

This principle is easy to understand through an analogy with going to the hospital. People generally go to the hospital only once they are already sick. Whatever treatment is chosen at that point does some harm to the body, and even after recovery there may be lasting after-effects. We should therefore see “preventive care doctors” regularly so that we avoid getting sick as far as possible. Even if illness cannot be avoided entirely, we can at least detect abnormalities in time and strengthen our health through appropriate exercise.

Once data is contaminated, finding the problem, planning the fix, and cleaning the data is a long process, and the result may still fall short of expectations. For example, fixes to tracking code have to ship with a new App release, but even after the new version is published users may not upgrade, so part of the data stays contaminated indefinitely.

With a data governance product such as Shence's SDG, verification rules can be set in the product for the fields of reported data at the data access or data verification stage. When imported data fails verification, an alert is raised and shown on the quality inspection dashboard, so that tracking developers and analysts can focus on locating, reviewing, and reporting tracking problems.

The field rules are as follows:

· Required field: if a field is marked as required, it cannot be null or left unreported;
· Enumeration check: sets the allowed enumeration values of the field;
· Regular expression check: sets the regular expression that the field must match;
· Range check: sets the numeric interval for a numeric field;
· Equality check: sets the field to one specific value, which can be understood as a special enumeration with only one value.
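The five rule types above can be sketched as a small validator. This is a minimal illustration under stated assumptions, not the SDG product's implementation: `FieldRule`, `validate`, and the error message wording are all hypothetical names invented here.

```python
import re
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class FieldRule:
    required: bool = False                             # required-field check
    enum: Optional[Sequence[Any]] = None               # enumeration check
    pattern: Optional[str] = None                      # regular expression check
    value_range: Optional[tuple[float, float]] = None  # inclusive range check
    equals: Any = None                                 # equality check (None means "not set")


def validate(record: dict[str, Any], rules: dict[str, FieldRule]) -> list[str]:
    """Return a list of human-readable rule violations for one reported record."""
    errors: list[str] = []
    for name, rule in rules.items():
        value = record.get(name)
        if value is None:
            # Missing and null are treated the same way.
            if rule.required:
                errors.append(f"{name}: required field is missing or null")
            continue
        if rule.enum is not None and value not in rule.enum:
            errors.append(f"{name}: {value!r} is not an allowed value")
        if rule.pattern is not None and not re.fullmatch(rule.pattern, str(value)):
            errors.append(f"{name}: {value!r} does not match {rule.pattern}")
        if rule.value_range is not None:
            low, high = rule.value_range
            if not isinstance(value, (int, float)) or not low <= value <= high:
                errors.append(f"{name}: {value!r} is outside [{low}, {high}]")
        if rule.equals is not None and value != rule.equals:
            errors.append(f"{name}: expected {rule.equals!r}, got {value!r}")
    return errors
```

For instance, with `rules = {"platform": FieldRule(required=True, enum=["iOS", "Android"])}`, a record without a `platform` field produces one violation, which a quality dashboard could then display or forward as an alert.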

Therefore, in data governance, do not pollute first and clean up afterwards; control the data at its source.

2. Data governance should run through the entire business iteration

When an organization first starts on data governance, it usually achieves good results because the work has internal attention. As the business keeps iterating, however, the requirements of data applications and the systems themselves keep changing, and data governance must be updated and adjusted accordingly. At this stage, declining attention to data governance, changes in organizational structure and personnel, and unstable or incomplete processes generally make it hard to keep the results at a high baseline. Instead, they get worse and worse until the data can no longer meet the needs of the final data applications.

Take an online recommendation system as an example: if the imports used for machine learning training (the Item data stream and the exposure and click events) are delayed or fail, the online service is directly affected. For instance, recommendations for new Items may fail.

Data governance products generally provide monitoring with user-defined rules. For example, a rule might flag any Item data stream whose reported volume stays below 100 for three consecutive hours; when the reported volume falls short of expectations within the configured time range, an alert is automatically sent to the alerting platform and displayed.
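The monitoring rule in this example can be sketched in a few lines. The sketch assumes that reported volume is counted per hour and that the rule looks at the most recent hours only; the function names `should_alert` and `streams_to_alert` are hypothetical, and how the alert reaches an alerting platform is left out.

```python
def should_alert(hourly_counts: list[int],
                 threshold: int = 100,
                 consecutive_hours: int = 3) -> bool:
    """Return True if each of the most recent `consecutive_hours` counts is below `threshold`.

    `hourly_counts` is ordered from oldest to newest, one entry per hour.
    """
    if len(hourly_counts) < consecutive_hours:
        return False  # not enough history yet to evaluate the rule
    recent = hourly_counts[-consecutive_hours:]
    return all(count < threshold for count in recent)


def streams_to_alert(counts_by_stream: dict[str, list[int]],
                     threshold: int = 100,
                     consecutive_hours: int = 3) -> list[str]:
    """Return the names of the data streams that currently violate the rule, sorted."""
    return sorted(
        name
        for name, counts in counts_by_stream.items()
        if should_alert(counts, threshold, consecutive_hours)
    )
```

Requiring several consecutive low hours, rather than alerting on a single low reading, avoids noise from brief dips while still catching a stream that has genuinely stalled.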

Therefore, data governance should run through the whole process of business iteration. As the business and the organizational structure change, data governance schemes and processes should change with them.

3. Solve the problem in a productized and componentized way, instead of relying on manual work

Shence Data provides a standard data acquisition SDK that productizes and standardizes general functions such as anonymous ID generation, basic attribute collection, data packaging, compression and encryption, local caching, network transmission, time calibration, and remote control. In the same way, common requirements and common governance solutions that arise in the data governance process can be accumulated as product features.

I think of data governance as playing the role of a Commission for Discipline Inspection: it should not only deal with problems after they are discovered, but also look ahead and keep monitoring and checking continuously. Because no perfect solution exists for every problem encountered during business development, the only thing we can do is adjust the rules of data governance promptly as the business develops and the product iterates.

About the author

Mr. Wang Zhuozhou is the author of “Android Full Buried Point Solution” and “iOS Full Buried Point Solution”, and the head of the R&D department of Shence Data Governance. He has more than 10 years of Android and iOS development experience, was among the first engineers in China to work on Android research and development, and developed and maintains the first commercial open source Android and iOS event tracking SDK in China.

Mr. Wang Zhuozhou previously worked as an Android system engineer at Beijing Tianyu Langtong Communication Equipment Co., Ltd. He graduated from Beijing Institute of Technology with a degree in software engineering.

About Shence Data

Shence Data is a professional big data analysis and marketing technology service provider. Focusing on user-level big data analysis and management needs, the company has launched Shence Analysis, Shence User Portrait, Shence Intelligent Operation, Shence Intelligent Recommendation, Shence Guest View, and other products.

In addition, it provides big data consulting and complete solutions. Shence Data has accumulated service and customer success experience with more than 1,500 paying enterprise customers, including China Union Pay, Xiaomi, China Post Consumer Finance, Haitong Securities, Guangfa Securities, Orient Securities, Central Bank, Baixin Bank, CyTS, Ping An Life Insurance, Sichuan Airlines, VIPKID, Oriental Pearl, China Resources, Youzan, Baixing.com, Goods Lala, Flash delivery, Donkey Mom, Keep, 36 Krypton, Largo, VUE, Chunyu Doctor, Jumei Youpin.com, Edge Edge games, Laogou, and Funenjoy. It offers customers professional consulting, implementation, and technical support services such as comprehensive metric design and data model building. To learn more about Shence Data or to ask data-driven questions, call 4006509827 to speak with a professional consultant.