1. Project objectives

Based on the existing data, build a real-time-refreshing BI dashboard themed around the viewing (daikan) business.

2. Introduction to basic data sources

The existing data source consists of three tables: the viewing table, daikan_table; the subscription (deal) table, chengjiao_table; and the personnel structure table. The viewing and subscription tables can each be understood as one record per event, while each row of the personnel structure table records one broker's information. The raw data for all three tables are .csv files downloaded from the background management system. The key fields of each table are listed as follows:
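As a sketch of what these three tables might look like, here is a minimal pandas load with hypothetical sample rows; the field names below are assumptions for illustration, while the real .csv files come from the background management system:

```python
import io
import pandas as pd

# Hypothetical sample rows; the real field names come from the exported .csv files.
daikan_csv = io.StringIO(
    "date,broker_id,viewing_id,project_name\n"
    "2021-06-01,1001,V001,Project A\n"
    "2021-06-01,1002,V002,Project B\n"
)
chengjiao_csv = io.StringIO(
    "date,broker_id,deal_id,performance\n"
    "2021-06-01,1001,D001,50000\n"
)
renyuan_csv = io.StringIO(
    "broker_id,name,division,rank,entry_date\n"
    "1001,Zhang,East,A3,2020-01-01\n"
    "1002,Li,West,A0,2021-05-20\n"
)

daikan = pd.read_csv(daikan_csv)        # one row per viewing record
chengjiao = pd.read_csv(chengjiao_csv)  # one row per deal record
renyuan = pd.read_csv(renyuan_csv)      # one row per broker
```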

3. Project steps

(1) To make the PowerBI dashboard refresh in (near) real time, consider building a wide table: use a Python script to clean daikan_table and chengjiao_table, and save the cleaned data into a MySQL database. Running the script on a schedule keeps the latest data in the database. The general idea is as follows:
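A minimal sketch of that idea, with illustrative column names and an in-memory SQLite engine standing in for the production MySQL connection:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical cleaned frames; real columns come from daikan_table / chengjiao_table.
daikan = pd.DataFrame({"date": ["2021-06-01"], "broker_id": ["1001"], "record_id": ["V001"]})
chengjiao = pd.DataFrame({"date": ["2021-06-01"], "broker_id": ["1001"], "record_id": ["D001"]})

daikan["category"] = "viewing"
chengjiao["category"] = "deal"

# Stack both record types into one wide timeline table.
timeline = pd.concat([daikan, chengjiao], ignore_index=True)

# Production would use create_engine("mysql+pymysql://user:pwd@host:3306/db");
# SQLite in memory keeps the sketch self-contained.
engine = create_engine("sqlite://")
timeline.to_sql("timeline_3", engine, if_exists="replace", index=False)

n = pd.read_sql("select count(*) as n from timeline_3", engine)["n"][0]
```

Re-running this on a schedule with `if_exists="replace"` (or appending only new rows) keeps the database table current.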

(2) Use PowerBI to read data from the database and build the model according to business requirements, for example a date table. The PowerBI dashboard then refreshes as the database data refreshes.

(3) For the personnel structure table, I adopted a simpler approach: embedding a Python script inside PowerBI to process it.
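For reference, a script embedded via PowerBI's Get Data → Python script works much like a standalone one: any DataFrame left in scope at the end becomes a loadable table. A minimal sketch with made-up rows (the real script would read the daily .csv instead):

```python
import pandas as pd

# Stand-in for pd.read_csv(ad_path) on the daily personnel .csv (rows are hypothetical).
df_renyuan = pd.DataFrame({
    "id": ["1001", "1001", "1002"],
    "division": ["East", "East", "West"],
})
# Same dedup step the standalone script applies; PowerBI then loads df_renyuan as a table.
df_renyuan = df_renyuan.drop_duplicates(subset=["id"], keep="first")
```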

4. Business introduction

(1) Understanding the data source. The data sources above describe X Real Estate Company, which generates basic viewing and deal data daily. From the broker's perspective, the organizational hierarchy is: business division – region – store – branch – broker. Every viewing record a broker generates each day is written to the viewing table, daikan_table; every deal record a broker generates each day is written to the subscription table, chengjiao_table. The personnel structure table is straightforward: it records each broker's position in the organization, basic information, rank, and so on.

(2) Metric calculation rules. Viewing count = count of viewing records in daikan_table; penetration rate = number of brokers who made viewings / number of employed brokers; per capita viewings = viewing count / number of employed brokers; per capita performance = total subscription performance / number of employed brokers.
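The metric rules above can be sketched in pandas with made-up rows (column names are assumptions for illustration):

```python
import pandas as pd

# Illustrative wide-table rows and broker roster.
timeline = pd.DataFrame({
    "broker_id": ["1001", "1001", "1002"],
    "category": ["viewing", "viewing", "deal"],
    "performance": [0, 0, 50000],
})
roster = pd.DataFrame({"broker_id": ["1001", "1002", "1003"]})  # employed brokers

viewings = timeline[timeline["category"] == "viewing"]
headcount = roster["broker_id"].nunique()

viewing_count = len(viewings)                              # viewing count
penetration = viewings["broker_id"].nunique() / headcount  # brokers who viewed / headcount
per_capita_viewings = viewing_count / headcount            # viewings / headcount
per_capita_perf = (
    timeline.loc[timeline["category"] == "deal", "performance"].sum() / headcount
)
```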

5. Data source architecture

(1) Integrate the viewing data and the subscription data into one wide timeline table and store it in the database, then read the wide timeline table back from the database. Based on the business scenario, users only need to view roughly the last six months of data, so the SQL statement restricts the date to the last eight months from today.
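The eight-month rollback can be expressed as a MySQL-flavored filter on the wide table; the table name below follows the Timeline_3 name used later in this write-up, and the column name is an assumption:

```python
# MySQL-style date filter: keep only rows from the last eight months.
sql_timeline = (
    "select * from Timeline_3 "
    "where `date` >= date_sub(curdate(), interval 8 month)"
)
```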

(2) Build a date table in PowerBI and relate its date column to the date column of the wide timeline table. The goal is that all date filter fields are drawn from the date table, making the time filter unique. The date table construction is simple and borrowed from a common template:

Date table = ADDCOLUMNS (
    CALENDAR ( DATE ( 2019, 1, 1 ), LASTDATE ( 'YG_ALL Timeline'[Date] ) ),
    "Year", YEAR ( [Date] ),
    "Quarter", "Q" & ROUNDUP ( MONTH ( [Date] ) / 3, 0 ),
    "Month", MONTH ( [Date] ),
    "Day", DAY ( [Date] ),
    "Week", WEEKNUM ( [Date] ),
    "YearQuarter", YEAR ( [Date] ) & "Q" & ROUNDUP ( MONTH ( [Date] ) / 3, 0 ),
    "YearMonth", YEAR ( [Date] ) * 100 + MONTH ( [Date] )
)

(3) Processing and building the personnel structure table. The difficulty with the ad_jg_shishi personnel structure table is that it changes daily (for example, 1,000 brokers yesterday, 1,500 today), so it is effectively real-time. The current approach is to read the day's .csv personnel structure table every day, process it, and import it into PowerBI. First, import the required packages:

import pandas as pd
import datetime
import glob
import os
from sqlalchemy import create_engine
import warnings
import pymysql
pymysql.install_as_MySQLdb()
warnings.filterwarnings("ignore")

Each day, dynamically search the folder to obtain the path of the latest structure table:

today_1 = str(datetime.datetime.now().year) + '.' + str(datetime.datetime.now().month) + '.' + str(datetime.datetime.now().day)
path = os.path.join(r'\\10.8.29.213', '08 daily report', today_1)
ad_path = ''.join(glob.glob(os.path.join(path, 'Details on Employees*')))
ad_path = ad_path.strip()
if not os.path.exists(ad_path):  # fall back to yesterday's archive if today's file is missing
    today_1 = str(datetime.datetime.now().year) + '.' + str(datetime.datetime.now().month) + '.' + str(datetime.datetime.now().day - 1)
    path = os.path.join(r'\\10.8.29.213', '08 daily archive', today_1)
    ad_path = ''.join(glob.glob(os.path.join(path, '*')))
# print(ad_path)

Process the relevant fields according to business needs; for example, some field values need to be shortened, and brokers with few days in service are exempt from assessment:

df_aoding = pd.read_csv(ad_path, header=0)
df_aoding.drop_duplicates(subset=['id'], keep='first', inplace=True)
# keep only the broker roles that are assessed
df_aoding_tichu = df_aoding[df_aoding['permissions'].isin(['comprehensive agent', 'linkage agent', 'business agent', 'new broker'])]
df_aoding_tichu['id'] = df_aoding_tichu['id'].astype(str)
# shorten division names
df_aoding_tichu['division'] = df_aoding_tichu['division'].str.replace('division', '')
# days in service, computed from the entry date
df_aoding_tichu['days in service'] = (datetime.datetime.today() - pd.to_datetime(df_aoding_tichu['entry date'])).astype(str).str.split(' ', expand=True)[0]
df_aoding_tichu['days in service'] = df_aoding_tichu['days in service'].astype(int)
# finally keep only the columns needed downstream

Label brokers according to their ranks;

zhiji_map = {'A0': 'A0~2', 'A0/2': 'A0~2', 'A1': 'A0~2',
             'A2': 'A0~2', 'A3': 'A3~5', 'A4': 'A3~5', 'A5': 'A3~5',
             'A6': 'A6~10', 'A7': 'A6~10', 'A8': 'A6~10', 'A9': 'A6~10', 'A10': 'A6~10',
             'M1': 'M1~M5', 'M2': 'M1~M5', 'M3': 'M1~M5', 'M4': 'M1~M5', 'M5': 'M1~M5',
             'M6': 'M6~M10', 'M7': 'M6~M10', 'M8': 'M6~M10', 'M9': 'M6~M10', 'M10': 'M6~M10'}
df_aoding_tichu['rank classification'] = df_aoding_tichu['rank'].map(zhiji_map)

Join the latest personnel structure table, on broker id and one-to-many, to the timeline (viewing + subscription information), and compute the basic fields. (This could also be done in PowerBI, but it is recommended to do as much as possible in code; code is more stable than PowerBI's DAX functions, and I have been burned by DAX before.)

engine_b = create_engine('mysql+pymysql://root:yunguan@[email protected]:3306/YG_ALL')
sql_b = '''
select aa.`deal broker id`,
       aa.`latest viewing date`,
       bb.`deals this month`,
       bb.`projects dealt this month`,
       bb.`latest deal date this month`
from (select `deal broker id`, max(`date`) as `latest viewing date`
      from Timeline_3
      where category = 'viewing'
      group by `deal broker id`) aa
left join (select `deal broker id`,
                  count(distinct `date`) as `deals this month`,
                  count(distinct `project name`) as `projects dealt this month`,
                  max(`date`) as `latest deal date this month`
           from Timeline_3
           where category = 'deal'
             and `date` >= date_add(curdate(), interval -day(curdate()) + 1 day)
             and `date` <= last_day(curdate())
           group by `deal broker id`) bb
on aa.`deal broker id` = bb.`deal broker id`
'''
df_c = pd.read_sql(sql=sql_b, con=engine_b)
### merge the personnel table with the per-broker aggregates
ad_jg_shishi = pd.merge(df_aoding_tichu, df_c, left_on='id', right_on='deal broker id', how='left')
### days since the broker's last viewing
ad_jg_shishi['days since last viewing'] = (datetime.datetime.today() - pd.to_datetime(ad_jg_shishi['latest viewing date'])).astype(str).str.split(' ', expand=True)[0]
ad_jg_shishi['days since last viewing'] = ad_jg_shishi['days since last viewing'].str.replace('NaT', '100000').astype(int)
### bucket brokers by how long since their last viewing
ad_jg_shishi['viewing gap'] = pd.cut(ad_jg_shishi['days since last viewing'],
                                     bins=[-1, 7, 30, 60, 90, 200, 100003],
                                     labels=['a. within 7 days', 'b. 8~30 days', 'c. 31~60 days',
                                             'd. 61~90 days', 'e. 91~200 days', 'f. more than 200 days'])
6. Visualization of the PowerBI dashboard

(1) Relate the broker id of ad_jg_shishi to the broker id of the Timeline table (one-to-many), and relate the date of the new date table to the date of the Timeline table (one-to-many).

(2) The key indicator measures are calculated as follows.

Viewing count = CALCULATE (
    DISTINCTCOUNT ( 'YG_ALL Timeline'[viewing id] ),
    FILTER ( 'YG_ALL Timeline', 'YG_ALL Timeline'[source number] <> "" ),
    FILTER ( 'YG_ALL Timeline', 'YG_ALL Timeline'[viewing id] <> "" )
) + 0
Viewer count = CALCULATE (
    DISTINCTCOUNT ( 'YG_ALL Timeline'[broker id] ),
    FILTER ( 'YG_ALL Timeline', 'YG_ALL Timeline'[viewing id] <> "" )
)
Headcount = DISTINCTCOUNT ( ad_jg_shishi[id] )
7. Results presentation

Link preview. I have since moved the data source to a local database, so the dashboard will not refresh in real time for now. App.powerbi.com/view?r=eyJr…

(Recorded on 2021.6.7)