Redis: storing and analyzing merchant information

  • Process the Meituan merchant information dataset, clean the data, and store it in a Redis database
  • Read and analyze the data to extract its value and support the operations strategy

Install the Redis client module

pip install redis

Data storage

Data initialization

import pandas as pd
# replace 'CSV' with the path to the dataset file
df = pd.read_csv('CSV', encoding='UTF-8')
# drop rows with missing values, then drop the 'id' column
df = df.dropna().drop(['id'], axis=1)
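As a rough illustration of what this cleaning step does, the sketch below runs the same dropna().drop(['id'], axis=1) chain on a tiny inline CSV. The sample rows and the column names (name, score) are made up for the example and are not taken from the Meituan dataset.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real CSV file.
sample_csv = io.StringIO(
    "id,name,score\n"
    "1,Shop A,4.8\n"
    "2,Shop B,\n"      # missing score, so dropna() removes this row
    "3,Shop C,4.2\n"
)

raw = pd.read_csv(sample_csv)
# dropna() removes rows with any missing value; drop(..., axis=1) removes the 'id' column.
cleaned = raw.dropna().drop(['id'], axis=1)
print(cleaned)
```

Only the rows for Shop A and Shop C remain, and the id column is gone.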

Database connection

import redis
pool = redis.ConnectionPool(host='127.0.0.1', port=6379, decode_responses=True, encoding='UTF-8')
r = redis.StrictRedis(connection_pool=pool)  # connect to Redis

If necessary, clear the database before writing. Warning: this operation empties the Redis database!
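The command for this step is not shown here. As a hedged sketch, redis-py exposes flushdb() (current database) and flushall() (every database on the server); the hypothetical wrapper reset_database below calls flushdb() only when explicitly confirmed, so a stray call does not wipe data. The client argument is assumed to be a redis-py client such as the r created above.

```python
def reset_database(client, confirm=False):
    """Empty the currently selected Redis database, but only when confirm is True."""
    if not confirm:
        return False
    client.flushdb()  # redis-py call that deletes every key in the current database
    return True
```

Calling reset_database(r, confirm=True) would then clear the database before the data is written again.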

Write data to the database

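Each store's information is written to Redis as a list, with the store name as the key. Because lpush inserts every value at the head of the list, the fields are read back in reverse order: average, comments, address, score, store ID.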

for a in zip(df['name'], df['store ID'], df['score'], df['address'], df['comments'], df['average']):
    # key: store name; values: store ID, score, address, comments, average
    r.lpush(a[0], a[1], a[2], a[3], a[4], a[5])
    print(r.lrange(a[0], 0, -1))
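Because LPUSH inserts each value at the head of the list, the fields come back from LRANGE in the reverse of the order they were pushed. The sketch below mimics that behaviour with a plain Python list so it runs without a Redis server; simulate_lpush is a hypothetical helper for illustration, not part of redis-py, and the sample values are made up.

```python
def simulate_lpush(existing, *values):
    """Mimic Redis LPUSH: each value is inserted at the head, one after another."""
    result = list(existing)
    for value in values:
        result.insert(0, value)
    return result


# Fields in the order they are pushed: store ID, score, address, comments, average.
pushed = ['1001', '4.8', '1 Example Road', '3200', '85']
stored = simulate_lpush([], *pushed)
print(stored)  # ['85', '3200', '1 Example Road', '4.8', '1001']
```

This is why the store ID ends up at index 4 and the average consumption at index 0 when the list is read back. Using rpush instead would keep the original order.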

Data analysis

Process the data and extract value from it to support the operations strategy.

View all keys

keys = r.keys()
print(keys)

Read all records

df = pd.DataFrame()
keys = r.keys()
id = []
score = []
dir = []
num = []
price = []
for key in keys:
    key_list = r.lrange(key, 0, -1)
    id.append(key_list[4])
    score.append(key_list[3])
    dir.append(key_list[2])
    num.append(key_list[1])
    price.append(key_list[0])
df['name'] = keys    # store name
df['id'] = id        # store ID
df['score'] = score  # score
df['dir'] = dir      # merchant address
df['num'] = num      # number of comments
df['price'] = price  # average consumption
df
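The same read-back can be sketched as a helper that builds one record per key. This is an illustrative variant rather than the article's code: load_merchants and the FIELDS tuple are hypothetical names, and it iterates with scan_iter() instead of keys(), which avoids blocking the server on a large keyspace. It assumes every key in the database is a list written by the lpush call above.

```python
# Field names in the order LRANGE returns them (the reverse of the push order).
FIELDS = ('price', 'num', 'dir', 'score', 'id')


def load_merchants(client):
    """Return one dict per key; `client` is any object with scan_iter() and lrange()."""
    records = []
    for key in client.scan_iter():
        values = client.lrange(key, 0, -1)
        if len(values) != len(FIELDS):
            continue  # skip lists that do not have the expected five fields
        record = dict(zip(FIELDS, values))
        record['name'] = key
        records.append(record)
    return records
```

With the connection created earlier, pd.DataFrame(load_merchants(r)) would give a frame with the same columns as the loop above.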

High-scoring merchants

Merchants with a rating above 4.5

df['num']=df['num'].astype('int')
df['price']=df['price'].astype('double')
df['score']=df['score'].astype('double')
df1=df[df['num']>=0]
df2=df1[df1['price']>0]
df3=df2[df2['score']>4.5]
df4=df3[df3['score']<=5]
df4.index = range(len(df4))
df4
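With decode_responses=True, every value read from Redis is a string, which is why the columns are cast before filtering. astype('int') raises an error on strings such as '3200.0' or on empty values, so a more tolerant conversion can be useful. The sketch below uses pd.to_numeric with errors='coerce' on made-up sample values; whether the real data needs this is an assumption.

```python
import pandas as pd

# Hypothetical values as they might come back from Redis: all strings.
df = pd.DataFrame({
    'num': ['3200', '15.0', ''],
    'price': ['85', '12.5', 'n/a'],
    'score': ['4.8', '5', '4.1'],
})

for column in ['num', 'price', 'score']:
    # Unparseable strings become NaN instead of raising an exception.
    df[column] = pd.to_numeric(df[column], errors='coerce')

# Drop rows where any of the three columns failed to convert.
df = df.dropna(subset=['num', 'price', 'score'])
print(df.dtypes)
print(df)
```

The third row is discarded because its num and price values cannot be parsed; the remaining columns are numeric and ready for the comparisons used in the filters.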

Popular merchants

Merchants with 3000 or more comments

df['num']=df['num'].astype('int')
df['price']=df['price'].astype('double')
df['score']=df['score'].astype('double')
df1=df[df['num']>=3000]
df2=df1[df1['price']>0]
df3=df2[df2['score']>0]
df4=df3[df3['score']<=5]
df4.index = range(len(df4))
df4

Top merchants

Merchants that meet both the popular and the high-scoring conditions

df['num']=df['num'].astype('int')
df['price']=df['price'].astype('double')
df['score']=df['score'].astype('double')
df1=df[df['num']>=3000]
df2=df1[df1['price']>0]
df3=df2[df2['score']>4.5]
df4=df3[df3['score']<=5]
df4.index = range(len(df4))
df4

Inexpensive restaurants

Average consumption per person between 5 and 50 yuan

df['num']=df['num'].astype('int')
df['price']=df['price'].astype('double')
df['score']=df['score'].astype('double')
df1=df[df['num']>=0]
df2=df1[df1['price']>5]
df3=df2[df2['score']>0]
df4=df3[df3['score']<=5]
df5=df4[df4['price']<50]
df5.index = range(len(df5))
df5

Upscale restaurants

Average consumption per person above 200 yuan

df['num']=df['num'].astype('int')
df['price']=df['price'].astype('double')
df['score']=df['score'].astype('double')
df1=df[df['num']>=0]
df2=df1[df1['price']>200]
df3=df2[df2['score']>0]
df4=df3[df3['score']<=5]
df4.index = range(len(df4))
df4

Preferred upscale restaurants

Average consumption above 200 yuan and a score above 4.5

df['num']=df['num'].astype('int')
df['price']=df['price'].astype('double')
df['score']=df['score'].astype('double')
df1=df[df['num']>=0]
df2=df1[df1['price']>200]
df3=df2[df2['score']>4.5]
df4=df3[df3['score']<=5]
df4.index = range(len(df4))
df4

Preferred merchants with a perfect score

Stores with a rating of 5.0 and at least 1000 comments

df['num']=df['num'].astype('int')
df['price']=df['price'].astype('double')
df['score']=df['score'].astype('double')
df1=df[df['num']>=1000]
df2=df1[df1['price']>0]
df3=df2[df2['score']==5]
df3.index = range(len(df3))
df3
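Most of the filters above follow the same pattern: cast the columns, then narrow by comment count, price, and score. As a sketch of how that pattern could be factored out, the hypothetical helper filter_merchants below combines the conditions into a single boolean mask. The function name, its parameters, and the sample rows are illustrative assumptions, not part of the original code.

```python
import pandas as pd


def filter_merchants(df, min_num=0, min_price=0, max_price=None, min_score=0):
    """Keep rows with num >= min_num, price > min_price (and < max_price if given),
    and min_score < score <= 5; the index is renumbered from 0."""
    mask = (
        (df['num'] >= min_num)
        & (df['price'] > min_price)
        & (df['score'] > min_score)
        & (df['score'] <= 5)
    )
    if max_price is not None:
        mask &= df['price'] < max_price
    return df[mask].reset_index(drop=True)


# Made-up rows, already converted to numeric types.
sample = pd.DataFrame({
    'name': ['Shop A', 'Shop B', 'Shop C'],
    'num': [3200, 150, 5000],
    'price': [85.0, 30.0, 260.0],
    'score': [4.8, 4.2, 4.9],
})

top = filter_merchants(sample, min_num=3000, min_score=4.5)  # popular and high-scoring
cheap = filter_merchants(sample, min_price=5, max_price=50)  # inexpensive
print(top['name'].tolist())    # ['Shop A', 'Shop C']
print(cheap['name'].tolist())  # ['Shop B']
```

A single mask avoids the chain of intermediate frames (df1, df2, ...) and makes each category a one-line call; the perfect-score filter would still need its own equality check on score.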