What are indicators
A statistical report field produced by the data team is the indicator. Such as the number of orders in the last seven days, the purchase conversion rate of a promotion, etc. An index specific to the calculation of implementation, mainly consists of the following parts
- Index processing logic, such as count,sum, AVG
- A dimension, for example, collects statistics by department and region, which corresponds to group by in SQL
- Business qualifiers/modifiers such as different payment channels to calculate the corresponding indicators, wechat pay order refund rate, Alipay order refund rate. Corresponds to where in SQL
In addition, indicators themselves can also be derived, derived from more indicators, based on these characteristics, indicators can be classified
- Atomic metrics basic business facts, no business qualification, no dimension. For example, the order quantity in the order table and the total amount of the order are atomic indicators
- Derived metrics such as the total amount of orders over the past five days add a business time limit
- Derivative metrics, such as the conversion rate of a particular promotion, are needed
Target number of promotion users
withPromotion order number index
Carry out calculations
Status of indicator system before construction
- Do not know the caliber, the data platform out of the index, the user even data research and development themselves do not know the specific caliber, need to turn over the code
- Index coupling, inconvenient offline and caliber adjustment, existing reports are demand-driven, a SQL may correspond to the calculation of multiple indicators, leading to indicators offline, logic adjustment must affect each other
- The same indicator is used in different reports. Therefore, the same logic needs to be written repeatedly, which increases the computing pressure of the cluster. In addition, multiple indicators need to be adjusted once indicators are adjusted
- Indicators can not be tracked, data team delivered a report for a long time, do not know who put forward, do not know which tables were used, do not know which team was finally used
- You can’t see the forest for the trees. The data team spent a lot of time on report development and neglected to polish the underlying model
Macro steps of index construction
Organize the index system based on business objectives
Based on the company’s different business modules, business scenarios, different management levels to sort out the indicator system, determine the indicator caliber.
- The general manager office is concerned about the flow related indicators, corresponding to the senior management
- The indicators related to order performance concerned by the order team correspond to the business module
- Advertising focus on the transformation of relevant indicators, corresponding to the business scenario
Input and management based on system support
Through the system construction, support the index system mentioned above, realize the index definition input, index call chain query and tracking, index offline management and other functions
Indicator management position of the data warehouse
The index calculation is generally based on the data warehouse star model
Product module design of index management system
Index definition
It is divided into product configuration part and technical personnel configuration part
- Select a service module for an indicator category
- Indicator demand description, fill in the department’s reason for initiating indicators
- Text definition of indicator, fill in the text description of indicator caliber
Technical configuration defines both atomic and derived indicators as basic indicators, because they can be calculated through a single warehouse model. Derivative indicators are defined as composite indicators, because such indicators need to be processed according to the above basic indicators
Basic indicators
- You need to select the business fact table, the dimension table, and the relationship between the business fact and the dimension, which is equivalent to selecting the table from and the corresponding group by logic
- Write index processing logic, such as count, sum, AVG, etc
- Writing a qualification is equivalent to writing a WHERE condition
The composite index
- Select basic metrics, such as selection
The number of people for promotion
,Number of single-use households
- Write processing logic,
Number of single users/number of promotional users
Index of the query
Based on the Chinese name of the index, search the corresponding index definition and blood relationship
Index offline
Based on blood statistics, find unused indicators and take them offline
External interface
Provided to an external system, such as BI, to query the definition of an indicator, and the detailed data of this indicator
Index permissions
Control the range of different indicators that can be obtained by different users and different applications
Report configuration
Based on the index pool in the index system, the report can be produced quickly by dragging and dropping
Realization of index management system technology
Index processing logic
Based on the definition of the technical part of the definition of the index, the INDEX processing SQL is automatically generated and deployed to the scheduling system. The auto-generated logic is nothing more than assembling the word table processing logic, dimension information, business qualification, and so on in the metric definition
Index output design
SQL processed indicators can be stored in the following two ways
Stored in DWS as a large wide table
For example, a large and wide table is organized based on the user dimension, which stores all indexes related to the user, such as the order number purchased by the user in the last 5 days and the order amount of the user in the last 30 days
The disadvantage is that:
- The report combination is not flexible enough. Some reports require indicators that span multiple large and wide tables and have to fetch data from multiple DWS large and wide tables
- A new indicator is not flexible. You need to add fields in the large-width table to add a new indicator
- Because each indicator is an independent job, its calculation is completed in different order. Some intermediate storage needs to be performed, and then it is merged into a large-width table
Store all indicator definitions in a unified indicator table
Indicators of T+1 are stored in a unified indicator table. The unified indicator table is defined as follows
Index id | The index code | The index value | The date of the partition |
---|
After all indicator jobs are calculated, indicator data is inserted into the corresponding table in the preceding format. If one job is inserted at a time, small files in the table may occur. The solution is to perform a small file merge on the index partition of the day after all index calculations are completed, similar to merging SQL
insert overwrite table chenjun_test PARTITION(dt='20200821') SELECT t.foo,t.bar from chenjun_test t where t.dt = '20200821'.Copy the code
The whole brain figure
The resources
Tech.meituan.com/2020/03/12/… www.infoq.cn/article/K29… Tech.meituan.com/2018/12/27/… Mp.weixin.qq.com/s/-pLpLD_HM… Tech.meituan.com/2018/12/27/… Baijiahao.baidu.com/s?id=163555… Help.aliyun.com/document_de… www.shangyexinzhi.com/article/158…