“This is the first day of my participation in the First Challenge 2022. For details: First Challenge 2022.”
Hello, I’m Ding Xiaojie. The source of the case in this paper is the book “Data Analysis Practice”, which uses R language. In the following period of time, I will use Python + Tableau to reproduce the case as much as possible for everyone to learn.
Scene description
The monthly sales of a small APP game operated by a certain company rose steadily, but suddenly dropped in July. No matter from the market environment or the environment of the game itself, the sales of this game still has room for continued growth. There may be two main factors affecting the sales of this game:
- New on the game event
- Change in commercial publicity
Come to light
- The game activity has not changed much from last month’s activity
- Publicity has declined because of budget constraints
Next, we will analyze the data to confirm the above conclusions and propose ways to recover sales.
Data description
Daily Active Users
139,112 lines of user data that visits at least once a day.
field | type | meaning |
---|---|---|
log_date | str | Access time |
app_name | str | The application of |
user_id | numpy.int64 | The user ID |
DPU (Daily Payment User)
Data of users who spend at least 1 yen per day (0.056 yuan), 884 lines.
field | type | meaning |
---|---|---|
log_date | str | Access time |
app_name | str | The application of |
user_id | numpy.int64 | The user ID |
payment | numpy.int64 | Consumption amount |
Install
Log the first time each user logs into the game, 29,329 lines.
field | type | meaning |
---|---|---|
install_date | str | First Login Time |
app_name | str | The application of |
user_id | numpy.int64 | The user ID |
The data analysis
Data is read
Read three data sets.
import pandas as pd
DAU_data = pd.read_csv('DAU.csv')
DPU_data = pd.read_csv('DPU.csv')
install_data = pd.read_csv('install.csv')
Copy the code
Display the top five rows of the DAU dataset.
DAU_data.head()
Copy the code
The first five lines of the DPU data set are displayed.
DPU_data.head()
Copy the code
Display the first five lines of the Install dataset.
install_data.head()
Copy the code
Data consolidation
Merge daily active user data DAU with user’s first login data Install, and use user_id and app_name as reference keys. This will give you the user’s first login time.
all_data = pd.merge(DAU_data,
install_data,
on=['user_id'.'app_name'])
all_data.head()
Copy the code
After obtaining the user’s first login time, it is merged with the daily consumption user data DPU. All data in ALL_data is retained in left-connection mode, and the missing value is NaN by default.
all_data = pd.merge(all_data,
DPU_data,
on=['log_date'.'app_name'.'user_id'],
how='left')
all_data.head()
Copy the code
The data processing
Fill the null value in the Payment column with 0.
all_data['payment'] = all_data['payment'].fillna('0')
all_data
Copy the code
Convert the units of the Payment column to units, leaving only months for log_date and install_date.
all_data['payment'] = all_data['payment'].astype(float)
all_data['payment'] = all_data['payment'] * 0.056
all_data['log_date'] = pd.to_datetime(all_data['log_date']).map(lambda x : x.strftime("%m") [1] + 'month')
all_data['install_date'] = pd.to_datetime(all_data['install_date']).map(lambda x : x.strftime("%m") [1] + 'month')
all_data.head()
Copy the code
New and old users are divided
Users whose login month > first login month are defined as old users, and others are defined as new users.
all_data['user'] = all_data['log_date'] > all_data['install_date']
all_data['user'] = all_data['user'].map({False: 'New user'.True: 'Regular users'})
all_data.head()
Copy the code
Payment is summed by log_date and user groups to calculate the monthly sales of old and new users.
user_category = all_data.groupby(['log_date'.'user'[])'payment'].sum().reset_index()
user_category.head()
Copy the code
It can be seen that sales from existing users in June and July are basically the same, but sales from new users in July are significantly less than that in June.
Divide sales into regions to see which tiers of users are spending less.
import numpy as np
sale_df = all_data.loc[all_data['payment'] > 0['log_date'.'payment']].copy()
bins = list(range(0.101.30)) + [np.inf]
labels = [str(n) + The '-' + str(n + 30) for n in bins[:-2]] + ['> 90']
sale_df['payment_cut'] = sale_df['payment'].apply(lambda x : pd.cut([x], bins=bins, labels=labels)[0])
sale_df.head()
Copy the code
As can be seen from the bar chart above, compared with June, the number of users spending less than 60 yuan decreased in July.
Here we can draw some conclusions.
conclusion
The number of new users who consume decreased, especially those who consume small amounts of money. As a result, the company needs to launch a commercial campaign again and return to the previous level in order to increase awareness of the company’s products among potential users and add new users.
Case reference [1] Data Analysis Practice [J] Akinamura Takashi Yohei/translated by Xiao Feng
For those who are new to Python or want to learn Python, you can search “Python New Horizons” on wechat to communicate and learn together. They are all beginners. Sometimes a simple question is stuck for a long time, but others may realize it at a touch. There are also nearly 1,000 resume templates and hundreds of e-books waiting for you to collect!