What are indicators

A statistical report field produced by the data team is the indicator. Such as the number of orders in the last seven days, the purchase conversion rate of a promotion, etc. An index specific to the calculation of implementation, mainly consists of the following parts

  • Index processing logic, such as count,sum, AVG
  • A dimension, for example, collects statistics by department and region, which corresponds to group by in SQL
  • Business qualifiers/modifiers such as different payment channels to calculate the corresponding indicators, wechat pay order refund rate, Alipay order refund rate. Corresponds to where in SQL

In addition, indicators themselves can also be derived, derived from more indicators, based on these characteristics, indicators can be classified

  • Atomic metrics basic business facts, no business qualification, no dimension. For example, the order quantity in the order table and the total amount of the order are atomic indicators
  • Derived metrics such as the total amount of orders over the past five days add a business time limit
  • Derivative metrics, such as the conversion rate of a particular promotion, are neededTarget number of promotion userswithPromotion order number indexCarry out calculations

Status of indicator system before construction

  • Do not know the caliber, the data platform out of the index, the user even data research and development themselves do not know the specific caliber, need to turn over the code
  • Index coupling, inconvenient offline and caliber adjustment, existing reports are demand-driven, a SQL may correspond to the calculation of multiple indicators, leading to indicators offline, logic adjustment must affect each other
  • The same indicator is used in different reports. Therefore, the same logic needs to be written repeatedly, which increases the computing pressure of the cluster. In addition, multiple indicators need to be adjusted once indicators are adjusted
  • Indicators can not be tracked, data team delivered a report for a long time, do not know who put forward, do not know which tables were used, do not know which team was finally used
  • You can’t see the forest for the trees. The data team spent a lot of time on report development and neglected to polish the underlying model

Macro steps of index construction

Organize the index system based on business objectives

Based on the company’s different business modules, business scenarios, different management levels to sort out the indicator system, determine the indicator caliber.

  • The general manager office is concerned about the flow related indicators, corresponding to the senior management
  • The indicators related to order performance concerned by the order team correspond to the business module
  • Advertising focus on the transformation of relevant indicators, corresponding to the business scenario

Input and management based on system support

Through the system construction, support the index system mentioned above, realize the index definition input, index call chain query and tracking, index offline management and other functions

Indicator management position of the data warehouse

The index calculation is generally based on the data warehouse star model

Product module design of index management system

Index definition

It is divided into product configuration part and technical personnel configuration part

  • Select a service module for an indicator category
  • Indicator demand description, fill in the department’s reason for initiating indicators
  • Text definition of indicator, fill in the text description of indicator caliber

Technical configuration defines both atomic and derived indicators as basic indicators, because they can be calculated through a single warehouse model. Derivative indicators are defined as composite indicators, because such indicators need to be processed according to the above basic indicators

Basic indicators

  • You need to select the business fact table, the dimension table, and the relationship between the business fact and the dimension, which is equivalent to selecting the table from and the corresponding group by logic
  • Write index processing logic, such as count, sum, AVG, etc
  • Writing a qualification is equivalent to writing a WHERE condition

The composite index

  • Select basic metrics, such as selectionThe number of people for promotion,Number of single-use households
  • Write processing logic,Number of single users/number of promotional users

Index of the query

Based on the Chinese name of the index, search the corresponding index definition and blood relationship

Index offline

Based on blood statistics, find unused indicators and take them offline

External interface

Provided to an external system, such as BI, to query the definition of an indicator, and the detailed data of this indicator

Index permissions

Control the range of different indicators that can be obtained by different users and different applications

Report configuration

Based on the index pool in the index system, the report can be produced quickly by dragging and dropping

Realization of index management system technology

Index processing logic

Based on the definition of the technical part of the definition of the index, the INDEX processing SQL is automatically generated and deployed to the scheduling system. The auto-generated logic is nothing more than assembling the word table processing logic, dimension information, business qualification, and so on in the metric definition

Index output design

SQL processed indicators can be stored in the following two ways

Stored in DWS as a large wide table

For example, a large and wide table is organized based on the user dimension, which stores all indexes related to the user, such as the order number purchased by the user in the last 5 days and the order amount of the user in the last 30 days

The disadvantage is that:

  1. The report combination is not flexible enough. Some reports require indicators that span multiple large and wide tables and have to fetch data from multiple DWS large and wide tables
  2. A new indicator is not flexible. You need to add fields in the large-width table to add a new indicator
  3. Because each indicator is an independent job, its calculation is completed in different order. Some intermediate storage needs to be performed, and then it is merged into a large-width table

Store all indicator definitions in a unified indicator table

Indicators of T+1 are stored in a unified indicator table. The unified indicator table is defined as follows

Index id The index code The index value The date of the partition

After all indicator jobs are calculated, indicator data is inserted into the corresponding table in the preceding format. If one job is inserted at a time, small files in the table may occur. The solution is to perform a small file merge on the index partition of the day after all index calculations are completed, similar to merging SQL

insert overwrite table chenjun_test PARTITION(dt='20200821') SELECT t.foo,t.bar from chenjun_test t where t.dt = '20200821'.Copy the code

The whole brain figure

The resources

Tech.meituan.com/2020/03/12/… www.infoq.cn/article/K29… Tech.meituan.com/2018/12/27/… Mp.weixin.qq.com/s/-pLpLD_HM… Tech.meituan.com/2018/12/27/… Baijiahao.baidu.com/s?id=163555… Help.aliyun.com/document_de… www.shangyexinzhi.com/article/158…