An enterprise-level data center organizes big data in a systematic way so that it is unified, standardized, secure and shared. It serves front-end data applications in a service-oriented way and improves the efficiency with which data is used. So what problems does the data center solve? It comes down to three main things: efficiency, quality and cost.


First, efficiency

Efficiency can be divided into data development efficiency, data discovery efficiency and data analysis efficiency.

First is the efficiency of data development. In many projects the business model is not yet fixed at the start and changes quickly, so the warehouse often lacks a good subject-domain and layered design, and "chimney" (siloed, topic-by-topic) development becomes the dominant model. As business complexity and scale grow, a large amount of repetitive data development limits how quickly data requirements can be delivered. A requirement often takes a week or more to go online, and business departments frequently criticize the slow response.

The second is the efficiency of data discovery. Data is developed and used by different people, and with tens of thousands of tables, each holding dozens or even hundreds of fields, it is very difficult to understand the meaning of each table accurately. Without a good system, this requires a lot of communication. Data developers often complain that their work is interrupted every day to answer the same questions. Analysts spend a lot of time working out what data is available and finding the data they want. Before the data center was built at NetEase, many businesses used very primitive methods, and each analyst maintained one by himself

Finally, there is the efficiency of data analysis. We hope that more and more people can analyze and make decisions based on data, but data analysis does have a threshold: retrieving the data is a big problem for most non-technical operations staff and analysts. We often see an analyst

Second, quality

Quality is the second problem the data center has to solve. It covers the quality of the warehouse design, the consistency of indicators, and the quality of the data itself.

1. Good data warehouse design is mainly reflected in three aspects: completeness, reusability and standardization. Data warehouse design is generally a subject-oriented, layered design, for

2. Indicators are the results of data processing (or sometimes intermediate results). The core of indicator management is to keep each indicator's business caliber (its business definition), calculation logic and data source consistent, and so eliminate ambiguity. A common situation in data development is that two data products show the same indicator with different values. This may be caused by a different caliber, or by different data sources.
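The consistency rule above (one business caliber, one calculation logic, one data source per indicator) can be checked mechanically once definitions are collected in one place. The following is only a minimal sketch under that assumption; the `IndicatorDefinition` class, the `find_ambiguous_indicators` helper, and the indicator and table names are all hypothetical and not taken from any specific platform.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IndicatorDefinition:
    """One data product's definition of an indicator (hypothetical structure)."""
    name: str     # indicator name, e.g. "dau"
    caliber: str  # business caliber: what the number means in business terms
    logic: str    # calculation logic, e.g. a SQL expression
    source: str   # table the indicator is computed from


def find_ambiguous_indicators(definitions):
    """Return {name: variants} for indicators that are defined in more than one way."""
    by_name = defaultdict(set)
    for d in definitions:
        # Identical definitions collapse into one set entry; only real differences remain.
        by_name[d.name].add((d.caliber, d.logic, d.source))
    return {name: sorted(v) for name, v in by_name.items() if len(v) > 1}


# Hypothetical definitions gathered from two data products.
definitions = [
    IndicatorDefinition("dau", "users who logged in", "COUNT(DISTINCT user_id)", "dwd.login_log"),
    IndicatorDefinition("dau", "users who opened the app", "COUNT(DISTINCT device_id)", "dwd.app_start_log"),
    IndicatorDefinition("gmv", "paid order amount", "SUM(pay_amount)", "dwd.order_paid"),
    IndicatorDefinition("gmv", "paid order amount", "SUM(pay_amount)", "dwd.order_paid"),
]

ambiguous = find_ambiguous_indicators(definitions)
for name, variants in ambiguous.items():
    print(f"{name}: {len(variants)} conflicting definitions")
```

Here "dau" is flagged because the two products disagree on caliber, logic and source, while "gmv" passes because both products use the same definition.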

3. Quality also includes the quality of the data itself: consistency, accuracy, timeliness and completeness. Consistency means that the same indicator agrees across the data mart layer, that dimensions are consistent, that the trends of related indicators agree, and that the same entity has the same value in different data sources. Accuracy means that the numerical calculation logic behaves as expected and that the data format is correct. We once learned a hard lesson in the e-commerce business: due to the business side update after the online part
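The four data quality dimensions named above (consistency, accuracy, timeliness and completeness) can each be expressed as a small check. This is an illustrative sketch only: the function names, the sample order rows, the deadline and the tolerance are assumptions made for the example, not part of any real monitoring system.

```python
from datetime import datetime


def check_completeness(rows, required_fields):
    """Completeness: fraction of rows in which every required field is filled."""
    if not rows:
        return 0.0
    ok = sum(all(r.get(f) not in (None, "") for f in required_fields) for r in rows)
    return ok / len(rows)


def check_accuracy(rows, field, validator):
    """Accuracy: return the rows whose value fails the format or range validator."""
    return [r for r in rows if not validator(r.get(field))]


def check_timeliness(produced_at, deadline):
    """Timeliness: was the data produced by the agreed deadline?"""
    return produced_at <= deadline


def check_consistency(value_a, value_b, tolerance=0.0):
    """Consistency: do two sources agree on the same entity's value?"""
    return abs(value_a - value_b) <= tolerance


# Hypothetical order rows with one empty key, one negative and one missing amount.
rows = [
    {"order_id": "1", "amount": 10.0},
    {"order_id": "2", "amount": -5.0},
    {"order_id": "", "amount": 3.0},
    {"order_id": "4", "amount": None},
]

completeness = check_completeness(rows, ["order_id", "amount"])
bad_amounts = check_accuracy(
    rows, "amount", lambda v: isinstance(v, (int, float)) and v >= 0
)
on_time = check_timeliness(datetime(2024, 1, 2, 7, 30), datetime(2024, 1, 2, 8, 0))
same_value = check_consistency(1000.0, 1000.5, tolerance=1.0)

print(completeness, [r["order_id"] for r in bad_amounts], on_time, same_value)
```

In practice such checks would run after each scheduled task and block downstream jobs or raise an alert on failure; the thresholds are a business decision rather than something the code can choose.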

Third, cost

Cost is the third problem the data center has to solve. It includes the cost of computing resources, the cost of storage resources and the labor cost of R&D.

Data is like the files on your phone: if you don't clean them up regularly, you will always run out of storage space. We often find that big data costs grow faster than the business itself. One reason is chimney-style development, which leads to repeated data processing and wastes computing and storage resources. The other is that useless data and tasks are not cleaned up and taken offline in time, so reports that no one looks at are still computed every day from billions of rows of raw data, wasting a lot of resources. Labor cost, in turn, is tied to efficiency: if efficiency improves, R&D costs are kept under control.
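The regular cleanup described above can start from something as simple as a list of reports with their last view date and daily compute cost. The sketch below assumes such usage records are available; the `ReportUsage` class, the 30-day idle threshold, the cost unit and the report names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ReportUsage:
    """Usage record for one report (hypothetical structure)."""
    name: str
    last_viewed: Optional[date]  # None means the report has never been viewed
    daily_cost: float            # compute cost per day, in an arbitrary unit


def offline_candidates(reports, today, max_idle_days=30):
    """Reports not viewed within max_idle_days, most expensive first."""
    idle = [
        r for r in reports
        if r.last_viewed is None or (today - r.last_viewed).days > max_idle_days
    ]
    return sorted(idle, key=lambda r: r.daily_cost, reverse=True)


today = date(2024, 6, 30)
reports = [
    ReportUsage("report_a", date(2024, 6, 28), 5.0),   # viewed two days ago, keep
    ReportUsage("report_b", date(2024, 3, 1), 40.0),   # idle for months
    ReportUsage("report_c", None, 12.0),               # never viewed
]

candidates = offline_candidates(reports, today)
daily_savings = sum(r.daily_cost for r in candidates)

for r in candidates:
    print(f"candidate for offlining: {r.name} (daily cost {r.daily_cost})")
print(f"estimated daily savings: {daily_savings}")
```

Sorting by cost puts the most expensive idle reports first, so a review with their owners can begin where the savings are largest. The same idea applies to the upstream tables and tasks that feed only those reports.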

Efficiency, quality and cost are all interrelated, and I think these are the three most important issues for the data center to solve.