As we all know, China is the main export destination for Chilean cherries, accounting for 95% of Chile's cherry exports.
Nata, commercial counsellor at the Chilean Embassy in China, said: “The 2020-2021 cherry harvest is complete, and about 500,000 tons of cherries are expected to enter the Chinese market this year.” Since mid-December 2020, Chilean cherries shipped by sea have been arriving in China, at a transport cost significantly lower than that of the earlier air freight. This means that domestic consumers will be able to buy cherries at a lower price. Recently, however, several batches of imported cherries in China have tested positive in nucleic acid tests. In this situation, do you still dare to shout “cherry freedom”?
01
Data acquisition
In this article, Python is used to collect sales data for 1,585 cherry listings on Taobao.com, including the product name, price, number of payers, store name, shipping address and other fields. To save space, only the main function of the crawler is shown:
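from selenium import webdriver


def main():
    # search_product() and get_data() belong to the full crawler and are not shown here
    browser.get('https://www.taobao.com/')
    page = search_product(key_word)  # compared with page_num below, so taken to be the total page count
    print(page)
    get_data()  # scrape the page that is currently loaded
    page_num = 70
    while int(page) != page_num:
        print("-" * 100)
        print("Crawling page {}".format(page_num + 1))
        # the s parameter advances by 44 for each result page
        browser.get('https://s.taobao.com/search?q={}&s={}'.format(key_word, page_num * 44))
        browser.implicitly_wait(10)
        get_data()
        page_num += 1
    print("Data fetching completed")


if __name__ == '__main__':
    key_word = "Cherry"
    # Selenium 3 style call; Selenium 4 expects service=Service("./chromedriver") instead
    browser = webdriver.Chrome("./chromedriver")
    main()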
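To make the paging logic easier to see, here is a minimal sketch of how the search URL for each result page could be built. The step of 44 comes from the crawler above; the helper name `build_search_url` and the URL-quoting of the keyword are my own assumptions, not part of the original crawler.

```python
from urllib.parse import quote

ITEMS_PER_PAGE = 44  # step used in the crawler's search URL (page_num * 44)


def build_search_url(key_word, page_index):
    """Return the search URL for a zero-based result page index (hypothetical helper)."""
    offset = page_index * ITEMS_PER_PAGE
    # quote() keeps non-ASCII keywords safe in the URL; the original passes the keyword as-is
    return "https://s.taobao.com/search?q={}&s={}".format(quote(key_word), offset)


for page_index in range(3):
    print(build_search_url("Cherry", page_index))
```

Keeping the offset arithmetic in one place makes it easy to resume a crawl from any page.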
02
Data processing
1. Read and preview data
import pandas as pd
import numpy as np

# The CSV has no header row, so the column names are supplied here
df = pd.read_csv('/J learn Python/taobao/cherries.csv', header=None,
                 names=['Trade Name', 'Commodity price', 'Payer number', 'Shop Name', 'Shipping Address'])
df.sample(5)  # preview five random rows
2. View the data
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1595 entries, 0 to 1674
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   Trade Name        1595 non-null   object
 1   Commodity price   1595 non-null   float64
 2   Payer number      1595 non-null   object
 3   Shop Name         1595 non-null   object
 4   Shipping Address  1585 non-null   object
dtypes: float64(1), object(4)
memory usage: 74.8+ KB
The data has the following problems:
(1) The shipping address has missing values
(2) The number of payers is stored as text, so the number has to be extracted
(3) The shipping address needs to be split into province and city
(4) The rows need to be sorted by price in descending order and the index reset
3. Clean data
import re

# Drop rows that contain any missing value
df.dropna(axis=0, how='any', inplace=True)

# Split province and city out of the shipping address
df["Province"] = df["Shipping Address"].str.split(' ', expand=True)[0]  # expand=True returns a DataFrame
df["City"] = df["Shipping Address"].str.split(' ', expand=True)[1]
df["City"] = df["City"].fillna(df["Province"])  # where the city is missing, fill it with the province

# Extract the number from the payer field with a regular expression
df['digital'] = [re.findall(r'(\d+\.{0,1}\d*)', i)[0] for i in df['Payer number']]
df['digital'] = df['digital'].astype('float')  # convert to a numeric type
# Extract the unit: 万 means 10,000
df['unit'] = [''.join(re.findall(r'(万)', i)) for i in df['Payer number']]
df['unit'] = df['unit'].apply(lambda x: 10000 if x == '万' else 1)
df['Payer number'] = df['digital'] * df['unit']

# Drop the columns that are no longer needed
df.drop(['Shipping Address', 'digital', 'unit'], axis=1, inplace=True)

# Sort by price in descending order and reset the index
df = df.sort_values(by="Commodity price", axis=0, ascending=False)
df = df.reset_index(drop=True)
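To see what these cleaning steps do, here is a small sketch on three made-up rows. The sample strings (such as `1.5万+人付款`) are my assumption about what the raw fields look like, and `parse_payer_count` is a hypothetical helper that wraps the same regular expression used above.

```python
import re
import pandas as pd

# Made-up rows in the shape described above; not real Taobao data
toy = pd.DataFrame({
    "Payer number": ["1.5万+人付款", "600+人付款", "32人付款"],
    "Shipping Address": ["浙江 杭州", "上海", "广东 深圳"],
})


def parse_payer_count(text):
    """Turn a string such as '1.5万+人付款' into a number (万 = 10,000)."""
    number = float(re.findall(r"(\d+\.{0,1}\d*)", text)[0])
    unit = 10000 if "万" in text else 1
    return number * unit


toy["Payer number"] = toy["Payer number"].apply(parse_payer_count)

# Split once, then reuse both parts; a municipality such as 上海 has no city part
parts = toy["Shipping Address"].str.split(" ", expand=True)
toy["Province"] = parts[0]
toy["City"] = parts[1].fillna(parts[0])

print(toy[["Payer number", "Province", "City"]])
```

Splitting the address only once and assigning the filled column back avoids the chained in-place `fillna`, which newer pandas versions warn about.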
After cleaning, the data is previewed as follows:
03
Data visualization
In the past, I have usually drawn charts with Python visualization libraries. This article tries Excel for the cherry data visualization instead, because Excel is every bit a match for Python when it comes to drawing charts!
1. Where in China are cherries most popular?
Plotting the province and payer data on a map shows that Shanghai, Zhejiang and Guangdong have the largest cherry sales, while Xizang, Qinghai and Inner Mongolia have the smallest. With their economic and population advantages, the coastal areas have become the main consumer market for cherries.
As a “star fruit”, cherries are often priced high enough to make working people shy away. According to the latest data released by the National Bureau of Statistics, Shanghai has the highest per capita disposable income, at more than 70,000 yuan, which makes “cherry freedom” easier to achieve there. Beijing's income is also high, but it may have been hit harder by the epidemic, and its cherry sales are not large.
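The charts in this article are drawn in Excel, but the per-province totals behind a map like this one can be prepared in pandas first. The following is only a sketch with made-up numbers, assuming the cleaned `Province` and `Payer number` columns from the previous section.

```python
import pandas as pd

# Made-up figures for illustration only
toy = pd.DataFrame({
    "Province": ["上海", "浙江", "上海", "广东", "浙江"],
    "Payer number": [15000.0, 600.0, 200.0, 3000.0, 400.0],
})

# Total payers per province, largest first
sales_by_province = (
    toy.groupby("Province")["Payer number"]
    .sum()
    .sort_values(ascending=False)
)
print(sales_by_province)
```

The resulting Series can be saved with `to_csv` and opened in Excel for the map itself.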
2. How expensive are cherries?
As the figure above shows, 40% of the cherry listings are priced between 201 and 500 yuan (the listed price on Taobao, not the price per kilogram), while listings below 50 yuan account for less than 4%. I certainly find them expensive; how about you? If you find them expensive too, I can help you find one that may meet your needs.
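A price distribution like this can be computed by cutting the prices into bands. In the sketch below, only the "below 50" and "201-500" bands come from the article; the other bin edges and the sample prices are my assumptions.

```python
import pandas as pd

# Made-up listing prices in yuan
prices = pd.Series([39.9, 88.0, 168.0, 258.0, 399.0, 688.0], name="Commodity price")

# Assumed bin edges; intervals are closed on the right, e.g. (200, 500]
edges = [0, 50, 100, 200, 500, float("inf")]
labels = ["0-50", "51-100", "101-200", "201-500", "500+"]

bands = pd.cut(prices, bins=edges, labels=labels)

# Share of listings in each band, kept in band order
share = bands.value_counts(normalize=True, sort=False)
print(share)
```

Each value in `share` is the fraction of listings in that band, which is what a pie or bar chart of price ranges needs.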
3. Which stores sell the best?
Among the Taobao stores with the highest sales volume, most are flagship stores, so it seems that shoppers care about the store's brand. The Furuida flagship store sells more than 60,000 a month, making it the undisputed cherry sales champion, followed by 100 Orchard.
4. What are the characteristics of the cherries on sale?
To understand the characteristics of the cherries on sale, I ran a text analysis on the product name field and drew a cherry word cloud in the shape of a fruit basket. The main selling points stand out: words such as fresh, Chile, seasonal and extra-large are what the fruit shops promote. I was a little puzzled by how often “pregnant women” is mentioned, so I looked it up on Baidu.
So the question is: in these special times, can we working people still splurge on cherries? So far, according to the Chinese Center for Disease Control and Prevention, no cases of COVID-19 infection have been found from eating imported cold-chain food. So for the average consumer, there is no need to panic too much. Of course, if you are really anxious, you can also opt for domestically produced food during the pandemic.
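The word counts that feed a word cloud can be sketched with the standard library alone. The product names below are made up and already in English, so splitting on whitespace is enough; real Taobao titles are in Chinese and would need a word segmenter (for example jieba) instead, and the article does not say which tool it used.

```python
from collections import Counter

# Made-up product names for illustration only
titles = [
    "fresh Chile cherries extra-large seasonal",
    "Chile cherries fresh gift box",
    "seasonal fresh cherries for pregnant women",
]

stop_words = {"for"}  # assumed stop-word list

# Lowercase every word and skip the stop words
tokens = [
    word.lower()
    for title in titles
    for word in title.split()
    if word.lower() not in stop_words
]

word_counts = Counter(tokens)
print(word_counts.most_common(5))
```

The more often a word appears in `word_counts`, the larger it would be drawn in the cloud.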