This is the 8th day of my participation in the Gwen Challenge in November. See details: The Last Gwen Challenge in 2021.
Summarize with production.
A number of storehouse layering
1. Data Introduction Layer (ODS, Operational Data Store, also known as Data Base Layer)
The original data is stored in the data warehouse system almost without processing, and the structure is basically consistent with the source system, which is the data preparation area of the data warehouse.
2) Mysql statically synchronize data 3)mysql offline synchronize dataCopy the code
2. Data Common Dimension Model (CDM, Common Dimenions Model)
Store detailed fact data, dimension table data and summary data of public indicators.
Among them, detailed fact data and dimension table data are generally processed and generated according to ODS layer data. The summary data of public indicators are generally processed and generated based on dimension table data and detailed fact data.
CDM layer is subdivided into dimension layer (DIM), detail data layer (DWD) and summary data layer (DWS). Using dimension model method as the theoretical basis, the relationship between the primary key of dimension model and the foreign key of fact model can be defined to reduce data redundancy and improve the ease of use of detailed data table.
In the summary data layer, the dimensions in the statistical granularity can also be associated with reuse, and more broad tabulation methods can be adopted to construct the public indicator data layer, so as to improve the reuse of public indicators and reduce repeated processing.
1) Dimension layer (DIM) is modeling-driven by dimensions. Based on the business meaning of each Dimension, the process of attribute definition is completed by adding Dimension attributes, associating dimensions and defining calculation logic, etc., and a consistent data analysis Dimension table is established. In order to avoid redundant correlation of dimension attributes in dimension model, dimension table is constructed based on snowflake model. 2) Detail Data layer (DWD, Data Warehouse Detail) By ODS layer Data access, incremental or full import, the Detail fact table some important attribute fields do appropriate redundancy, that is, wide table processing. 3) Summary Data layer (DWS, Data Warehouse Summary) takes the analyzed subject objects as modeling-driven, and constructs Summary indicator table of common granularity based on the indicator requirements of upper-layer applications and products. Physical model by means of broad table, construct statistical indicators with naming norms and consistent caliber, provide common indicators for the upper level, establish summary broad table and detailed fact table. 4) Data intermediate layer (MID, Data Middle Conversion) Data Middle layer, calculation Middle layer of some complex indicators, generally reflected in row Conversion column, multi-dimensional intermediate storage, etc.Copy the code
3. Data Application Layer (ADS, Application Data Store)
Store personalized statistical index data of data products, which are processed and generated according to CDM layer and ODS layer.
1)BI indicators: basic indicators + custom indicators + funnel model + monthly report 2) Offline label: platform behavior + click behavior + pay + basic attributesCopy the code
Binary naming convention
Ods: platform name/library name or event name/table name
[XXX] mysql [XXX]_[XXXX]Copy the code
2. Dim: Indicates the dimension type
Example: [dim] _ [XXXX]Copy the code
3. DWD: dwd_ platform name/library name/event name/table name _..
Example: dwd_ _ [XXXX] [x]Copy the code
4. DWS: DWS_platform name_Topic name
Example: dws_ _ [XXXX] [x]Copy the code
5. Ads: ADS_ Platform name _ Application name _ Indicator name/model name, separated by platform
Example: the bi application ads_ [x] _ _ _ [XXX] [bi] [XXXX] portrait application ads_ [x] _ _ _ [XXX] [portrayal] [XXXX]Copy the code
Three layered meaning
1. Clear data structures
Each data layer has its scope so that we can locate and understand it more easily when using the table.
2. Data pedigree tracing
Finally, there are many data sources that are presented to the business. If there is a problem with one of the sources, we hope to locate the problem quickly and accurately and know the scope of its harm.
3. Reduce repetitive development
Standardizing the data layering and developing some common middle layer data can reduce the double calculation greatly.
4. Simplify complex problems
A complex task is divided into multiple steps to complete, and each layer only deals with a single step, which is relatively simple and easy to understand, and easy to maintain the accuracy of the data. When there is a problem with the data, it is not necessary to repair all the data, but only need to repair from the step with the problem. Shields original data anomalies and service impacts. Data needs to be re-connected without changing services once.