People who work in data analysis usually know a few analysis methods, but many lack confidence in their analytical thinking and do not know where to start when a real business problem comes up. If that sounds familiar, this article is meant as a guide. The figure below lists common analysis methodologies and methods; applied flexibly, they can solve more than 80% of the problems encountered at work. Note that a methodology works at the level of thinking, while a method works at the level of execution. The key question is how to apply them to real business. This article takes the RFM model as an example and applies it to a practical case. (The implementation here uses Python; Excel works as well.)

Project background: A fresh food delivery app was launched on January 1, 2018, specializing in fresh vegetables, fruit, seafood, meat and poultry. The launch was followed by a one-year marketing period. Analysis showed that several important customers had been poached by competitors, and that these users contributed 80% of the platform's sales. Until then, the same operating strategy had been applied to all users. To address this, users need to be classified so that the current user stratification is understood and operations can be refined for each group.

I. Overall analysis process

1. Analysis purpose: user classification
2. Data acquisition: Excel data
3. Cleaning and processing: Excel and Python

II. Understanding the RFM model
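
As a minimal sketch of what the three indicators measure, assume a hypothetical order log with one row per order. The table, its column names (`user_id`, `order_date`, `amount`) and the values are made up for illustration and are not the article's data set. R is the number of days since the last order, F is the number of orders, and M is the total amount spent.

```python
import pandas as pd

# Hypothetical order log: one row per order (names and values are illustrative)
orders = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "order_date": pd.to_datetime([
        "2019-06-01", "2019-06-20", "2019-03-15",
        "2019-01-10", "2019-05-05", "2019-06-29",
    ]),
    "amount": [120.0, 80.0, 300.0, 50.0, 60.0, 40.0],
})

# Reference date against which recency is measured
reference_date = pd.Timestamp("2019-06-30")

# One row per user: last order date, order count, total spend
rfm = orders.groupby("user_id").agg(
    last_order=("order_date", "max"),
    F=("order_date", "count"),
    M=("amount", "sum"),
)

# Recency in whole days between the last order and the reference date
rfm["R"] = (reference_date - rfm["last_order"]).dt.days
rfm = rfm[["R", "F", "M"]]
print(rfm)
```

In this toy table, user 3 ordered most recently and most often but spent the least, which is the kind of difference the three indicators are meant to separate.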

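RFM scores each user on three indicators: R (recency, how recently the user last purchased), F (frequency, how often the user purchases) and M (monetary, how much the user spends). The results of the RFM model are then used as user labels, which help operators set campaign rules more precisely, improve user stickiness and improve how users perceive the platform. The final result is as follows: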

III. Implementing RFM user stratification in Python

1. Get data
import pandas as pd
data = pd.read_excel('C:/Users/cherich/Desktop/user information.xlsx')
data.head()

data.info()

Note: The data set contains 5,000 user records. Data cleaning usually covers missing values, duplicate values and data type conversion. Here the missing values do not affect the analysis, so only the data types need attention. One precondition is that R, F and M need a reference date. If the activity were still running, the reference date could simply be today. Our data is historical, however, so we need to look up the date on which the activity ended, which we do by sorting on the last transaction time.

data.sort_values(by='last transaction time', ascending=False)
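
Instead of reading the end date off the sorted table, it can also be taken programmatically. The following is a small sketch on made-up rows; the column name `last transaction time` and the sample values are assumptions, not the real data.

```python
import pandas as pd

# Made-up sample rows; the column name is assumed to match the data set
sample = pd.DataFrame({
    "last transaction time": [
        "2019-06-30 18:02:11",
        "2018-11-03 09:15:40",
        "2019-02-14 12:00:00",
    ],
})

# Parse the strings, take the latest timestamp, and truncate it to midnight
last_seen = pd.to_datetime(sample["last transaction time"])
stop_date = last_seen.max().normalize()
print(stop_date)
```

Truncating to midnight with `normalize()` keeps the later date arithmetic in whole days.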

2. Data processing
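# Make sure the transaction time is a string so that it can be split below
data['last transaction time'] = data['last transaction time'].astype('str')
# Reference date: the end of the activity
stop_date = pd.to_datetime('2019-06-30')
# Drop the columns that are not needed for RFM
datas = data.drop(columns=['registration time', 'member opening time', 'member type', 'city', 'area', 'last login time'])
# Keep only the date part, then convert it to a datetime
datas['last transaction time'] = datas['last transaction time'].apply(lambda x: x.split()[0])
datas['last transaction time'] = pd.to_datetime(datas['last transaction time'])
# R1: interval between the last transaction and the reference date
datas['R1'] = datas['last transaction time'].apply(lambda x: stop_date - x)
# M1 (and presumably F1) are built here as the sum of two source columns each;
# the column names are garbled in the source, so they are left elided:
# datas['M1'] = datas[...] + datas[...]
# Keep only the number of days, e.g. '29 days' -> '29'
datas['R1'] = datas['R1'].astype(str)
datas['R1'] = datas['R1'].apply(lambda x: x.split()[0])
datas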

Note: The steps above convert the R indicator from a time type into a plain number of days, in preparation for building the model and scoring the time intervals.
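
The same conversion can be done without going through strings. This is an alternative sketch rather than the code used above; it relies on the pandas `.dt.days` accessor, and the sample rows are made up.

```python
import pandas as pd

stop_date = pd.to_datetime("2019-06-30")

# Made-up sample rows; the column name is assumed to match the data set
sample = pd.DataFrame({
    "last transaction time": ["2019-06-01 10:30:00", "2018-12-31 08:00:00"],
})

# Parse to datetime and drop the time of day so that differences are whole days
last_seen = pd.to_datetime(sample["last transaction time"]).dt.normalize()

# Subtracting gives a timedelta column; .dt.days extracts the integer day count
sample["R1"] = (stop_date - last_seen).dt.days
print(sample)
```

Because `R1` is already an integer here, a scoring function no longer needs to cast it from a string.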

3. Build a model

To build the model, we need the averages of R, F and M. However, each of the three indicators contains extreme values, and these would distort the averages. The solution is to standardize the indicators first by mapping each one onto a 5-point scale with fixed intervals, where 5 is the best score. (The interval boundaries can be adjusted to the specific business, or replaced with quartiles.)

def R_score(n):
    n = int(n)
    if n<=80:
        r = 5
    elif 80<n<=160:
        r = 4
    elif 160<n<=240:
        r = 3
    elif 240<n<=320:
        r = 2
    else:
        r = 1
    return r

def F_score(n):
    n = int(n)
    if n<=14:
        r = 1
    elif 14<n<=28:
        r = 2
    elif 28<n<=42:
        r = 3
    elif 42<n<=56:
        r = 4
    else:
        r = 5
    return r

def M_score(n):
    n = int(n)
    if n<=1500:
        r = 1
    elif 1500<n<=3000:
        r = 2
    elif 3000<n<=4500:
        r = 3
    elif 4500<n<=6000:
        r = 4
    else:
        r = 5
    return r

datas['M1_score'] =datas['M1'].apply(M_score)
datas['F1_score'] =datas['F1'].apply(F_score)
datas['R1_score'] =datas['R1'].apply(R_score)
datas.head()
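
The fixed intervals above can also be replaced with quantile-based bins, as mentioned earlier. The following is a hedged sketch using `pd.qcut` with quintiles to keep the 5-point scale; the values are made up and this is not the scoring used in the rest of the article.

```python
import pandas as pd

# Made-up total spend per user
spend = pd.Series([120, 950, 1800, 2600, 3300, 4100, 5200, 6400, 7500, 9800])
# Higher spend -> higher score, with one fifth of the users in each bin
m_score = pd.qcut(spend, q=5, labels=[1, 2, 3, 4, 5]).astype(int)

# Made-up days since last transaction
days = pd.Series([3, 15, 40, 75, 110, 150, 200, 260, 310, 350])
# Fewer days -> higher score, so the labels are reversed for R
r_score = pd.qcut(days, q=5, labels=[5, 4, 3, 2, 1]).astype(int)

print(m_score.tolist())
print(r_score.tolist())
```

Quantile bins adapt to the data, so every score level is populated, whereas fixed intervals keep the same meaning across data sets. Note that `pd.qcut` raises an error when many identical values make the bin edges coincide, which is worth checking on real purchase counts.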

Note: Next, the averages of the R, F and M scores are calculated and used as thresholds. If a user's score on an indicator is greater than the average, the indicator is marked 1; otherwise it is marked 0. The three marks are then concatenated into a code such as 101, which determines the final user type.

# Averages of the three scores serve as thresholds
R_mean = datas['R1_score'].mean()
F_mean = datas['F1_score'].mean()
M_mean = datas['M1_score'].mean()

# 1 if the score is above the average, otherwise 0
datas['R'] = datas['R1_score'].apply(lambda x: 1 if x > R_mean else 0)
datas['F'] = datas['F1_score'].apply(lambda x: 1 if x > F_mean else 0)
datas['M'] = datas['M1_score'].apply(lambda x: 1 if x > M_mean else 0)

# Concatenate the three flags into a code such as '101'
datas['RFM'] = datas['R'].apply(str) + datas['F'].apply(str) + datas['M'].apply(str)

def user_tag(rfm):
    if rfm == '000':
        res = 'lost user'
    elif rfm == '010':
        res = 'general retained user'
    elif rfm == '100':
        res = 'new customer'
    elif rfm == '110':
        res = 'potential customer'
    elif rfm == '001':
        res = 'key retention customer'
    # The labels for the remaining codes are garbled in the source; the three
    # below follow common RFM naming and are an assumption
    elif rfm == '101':
        res = 'important development customer'
    elif rfm == '011':
        res = 'important maintenance customer'
    else:
        res = 'important value customer'
    return res

datas['user_tag'] = datas['RFM'].apply(user_tag)
datas
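
To make the thresholding and concatenation concrete, here is a small sketch on four made-up users. It uses a vectorized comparison instead of `apply`; the score values are invented for illustration.

```python
import pandas as pd

# Made-up scores for four users
scores = pd.DataFrame({
    "R1_score": [5, 1, 4, 2],
    "F1_score": [1, 1, 5, 3],
    "M1_score": [2, 1, 5, 5],
})

# Compare every column with its own mean: above average -> 1, otherwise 0
flags = (scores > scores.mean()).astype(int)

# Concatenate the three flags into the RFM code
scores["RFM"] = (
    flags["R1_score"].astype(str)
    + flags["F1_score"].astype(str)
    + flags["M1_score"].astype(str)
)
print(scores)
```

The column means here are 3, 2.5 and 3.25, so the four users receive the codes 100, 000, 111 and 011.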

4. Data visualization
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# SimHei is a font that can render Chinese labels
sns.set(font='SimHei', style='darkgrid')

# Number of users per tag, sorted so that the longest bar is on top
tag_counts = datas.groupby(datas['user_tag']).size()
tag_counts.sort_values(ascending=True, inplace=True)

plt.figure(figsize=(10, 4), dpi=80)
plt.title(label='Fresh food platform user stratification comparison', fontsize=22, color='white', backgroundcolor='#334f65', pad=20)
s = plt.barh(tag_counts.index, tag_counts.values, height=0.8, color=plt.cm.coolwarm_r(np.linspace(0, 1, len(tag_counts))))

# Write the user count next to each bar
for rect in s:
    width = rect.get_width()
    plt.text(width + 40, rect.get_y() + rect.get_height() / 2, str(width), ha='center')

plt.grid(axis='y')
plt.show()

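import numpy as np
import matplotlib.pyplot as plt

groups_b = datas.groupby(by='user_tag').size()
plt.figure(figsize=(10, 6), dpi=80)
# The title text is lost in the source; the label below is a placeholder
plt.title(label='User stratification share', fontsize=22, color='white', backgroundcolor='#334f65', pad=20)
# One offset per slice; the list length must match the number of user tags
explodes = [0.6, 0, 0, 0, 0.4, 0.8]
patches, l_text, p_text = plt.pie(groups_b.values, labels=groups_b.index, shadow=True, colors=plt.cm.coolwarm_r(np.linspace(0, 1, len(groups_b))), autopct='%.2f%%', explode=explodes, startangle=370)
# The legend arguments are garbled in the source; this anchor is a reconstruction
plt.legend(handles=patches, bbox_to_anchor=(1.0, 0.2))
plt.show()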
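
The percentages drawn in the pie chart can also be read as plain numbers. A minimal sketch, assuming a `user_tag` column like the one built earlier; the four tags below are made up.

```python
import pandas as pd

# Made-up tags for four users
tags = pd.Series(
    ["new customer", "lost user", "new customer", "potential customer"],
    name="user_tag",
)

# Share of each tag as a percentage, rounded to two decimals
share = tags.value_counts(normalize=True).mul(100).round(2)
print(share)
```

On the real data, the same call on `datas['user_tag']` would give the proportions that the conclusions refer to.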

5. Conclusions and Suggestions

This essentially completes user stratification with the RFM model. New customers account for about 30% of users and important value customers for about 30%; these two are the platform's dominant user types. The next step is to formulate operating strategies for each group based on the specific business. Finally, analytical thinking is what is most in demand right now. What is analytical thinking? My understanding is that it comes first from understanding the business and second from mastering analysis methods. Analysis methods exist to help us classify scattered business problems, and the habit of classifying problems in this way is what forms analytical thinking. That's all for this article.

The full data of this article can be obtained from RFM data.