Pay attention to my public account [Bao Ge big Data], more dry goods waiting for you
1.1. Hierarchical structure of data warehouse
Advantages of layering: simplification of complex problems, clear data structure (easy management), increased data reuse, isolation of raw data (decoupling)
The hierarchy | function |
---|---|
ods | The original data layer stores the original data without processing |
dwd | Detailed data layer to oDS layer data cleaning (remove null value, dirty data, data beyond the limit range) |
dws | The service data layer is lightly aggregated |
ads | Specific requirements for application data layer |
Storehouse in each layer of the table are external table
1.2 Basic formats of buried behavior data (basic fields)
Public fields: fields contained in almost all Android phones Service fields: fields reported by buried points. Specific service types are available. The following is an example of service field uploading. Behavior data startup log/event log table Key fields:
{
"ap":"xxxxx".// Project data source app PC
"cm": { // Public field
"mid": "".// (String) Unique identifier of the device
"uid": "".// (String) User id
"vc": "1".// (String) versionCode, the program version number
"vn": "1.0".// (String) versionName: indicates the program versionName
"l": "zh".// (String) System language
"sr": "".// (String) Channel number, which channel the application comes from.
"os": 7.1.1 "".// (String) Android version
"ar": "CN"./ / (String) area
"md": "BBB100-1".// (String) Mobile phone model
"ba": "blackberry".// (String) Mobile phone brand
"sv": "V2.2.1".// (String) sdkVersion
"g": "".// (String) gmail
"hw": "1620x1080".// (String) heightXwidth, screen width
"t": "1506047606608".// (String) Time when client logs are generated
"nw": "WIFI".// (String) Network mode
"ln": 0.// (double) LNG longitude
"la": 0 // (double) latitude
},
"et": [ / / event
{
"ett": "1506047605364".// Client event generation time
"en": "display".// Event name startup and event log are based on the event name
"kv": { // The event result is self-defined in key-value form
"goodsid": "236"."action": "1"."extend1": "1"."place": "2"."category": "75"}}}]Copy the code
Log tables can be divided into different log tables based on event tags
1.3 table introduction of each layer
1.3.1, ods layer
1) ODs_start_log Starts the log table
- There is only one field line (holding JSON), partitioned by date DT, in the format of the table: LZO
2) ODs_event_log Event log table (same format as startup log table)
- There is only one field line, partitioned by date DT, table format: LZO
1.3.2, DWD layer
Dwd_start_log starts the table
- Key fields: mid_id, user_id, dt(partitioned by date)
- From line in ods_start_log
Get_json_object mid_id (line, '$. Mid)
To get the field
1.3.2.1 user-defined UDF/UDTF (Application in the Project)
- Custom UDF functions (parsing public fields, one in, one out)
- Custom UDTF functions (parsing specific event fields, one in, many out)
- Custom UDF: Inherit UDF and override evaluate method
- Custom UDTF: Inherits GenericUDTF and overrides three methods: Initialize (customize the output column name and type), process (return the result forward(result)), and close
- Why do you need to customize UDF/UDTF? Because customized functions can bury logs and print logs, and make debugging easier.
1.3.2.2 Basic details of event logs
Dwd_base_event_log Details of the event log base
- 1) Key fields
- Public fields: MID_ID, user_id, DT (partition field) and event_name, event_JSON, server_time
- 2) Use UDF to obtain public fields and server_time from ods_event_log line, and use UDTF to obtain event_name and event_json.
1.3.2.3 Commodity Click table
Dwd_display_log Goods click list
- Key fields: Public field + unique field
- Get public fields and server_time directly from DWD_base_event_log, and unique fields from event_JSON in DWD_base_event_log,
where event_name = "display"
get_json_object(event_json,'$.kv.action') action
1.3.2.4 List of other specific events
similar
Show that | Table annotation |
---|---|
dwd_newsdetail_log | List of product details page |
dwd_loading_log | Item list page table |
dwd_ad_log | Advertising table |
dwd_notification_log | Message notification table |
dwd_active_foreground_log | User foreground active table |
dwd_active_background_log | User background active table |
dwd_comment_log | Comments on the table |
dwd_favorites_log | Collect the table |
dwd_praise_log | Thumb up table |
dwd_error_log | Error log table |
A total of 11 event-specific itemized tables can be obtained from a single event-based itemized table dwD_base_event_log
Second, demand analysis
2.1. Users activate themes
2.1.1 Daily activity list of DWS Layer
Daily active device analysis
2.1.2 Weekly activity list of DWS Layer
Weekly active device analysis
2.1.3 Monthly activity list of DWS Layer
Monthly active device analysis
2.1.4 Daily, weekly and monthly active devices table of ADS layer
Active equipment analysis
2.2 Users add themes
2.2.1 Daily addition list of DWS layer
2.2.2 New device table is added in ADS layer every day
2.3 Retention themes
2.3.1 Introduction to user retention
2.3.2 User retention analysis
2.3.3 Daily retention schedule of DWS Layer
2.3.4 Table of retained users of ADS layer
2.3.5 Retention rate table of ADS layer
2.4. Silent users
2.5. Number of backflow users this week
2.6 Lost users
2.7. Active users in the last 3 consecutive weeks
2.8. Active users for three consecutive days in the last seven days
2.9. Requirement logic
2.9.1 How to Analyze User Activity?
The number of occurrences of different device ids is counted in startup logs.
2.9.2 How can I Analyze User Additions?
Use the active user table left Join to add a user. The user is added if its mid is empty.
2.9.3 How To Analyze 1-Day Retention?
Retained users = Join added on the previous day Retention rate of active users today = Retained users/Added on the previous day
2.9.4 How To Analyze Silent Users?
(The login time is 7 days ago and the login time only occurs once.) Group the daily live table based on the device ID. The login times are 1 and the login time is one week ago.
2.9.5 How do I Analyze the Returned Users this week?
Active this week Left JOIN New This week Left Join was active last week, and both the new ID of this week and the active ID of the last week are null
2.9.6 How To Analyze Lost Users?
(Login time: 7 days ago) Groups the daily live table by device ID and does not log in within seven days.
2.9.7 How To Analyze the Number of Active Users in the Latest Three Weeks?
Group weekly activity by device ID. The statistics count is greater than three times.
2.9.8 How To Analyze the Number of Active Users In Three Consecutive Days in the Last Seven Days?
- 1) Query the active users in the last 7 days and rank the active dates of users
- 2) Calculate the difference between active dates and rankings
- 3) Group the same users and difference values, and count the number of difference values
- 4) Take out the data with the same difference value greater than or equal to 3, and then remove the weight (what weight is removed??) , i.e., users who have been active for 3 consecutive days or more