Preface

Most of us are already familiar with Taobao. As China’s largest e-commerce platform, it is closely tied to our daily lives. In this post, we’ll use Python to crawl product data from Taobao and run a simple analysis on it.

Development tools

Python version: 3.6.4
Related modules:

NumPy module;

seaborn module;

Requests module;

pyecharts module;

pandas module;

Matplotlib module;

wordcloud module;

SciPy module;

and some modules from Python’s standard library.

Environment setup

Install Python and add it to the PATH environment variable, then use pip to install the required modules.
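As a quick sanity check after installing, the sketch below reports which of the listed third-party modules are still missing and prints a pip command for them. It is a minimal helper of my own, not part of the original project; it assumes the import names match the pip package names, which holds for the modules listed above.

```python
import importlib.util
import sys

# Import names of the third-party modules listed above (they match the pip package names).
REQUIRED = ["numpy", "seaborn", "requests", "pyecharts", "pandas",
            "matplotlib", "wordcloud", "scipy"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        # Print (not run) a pip command targeting the current interpreter.
        print(f"{sys.executable} -m pip install " + " ".join(missing))
    else:
        print("All required modules are installed.")
```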

Data crawling

The goal is to crawl all the product information that Taobao returns after searching for a keyword.

After some testing, I found the request that returns this data:

Ai.taobao.com/search/getI…

By adding a keyword and a page number to this request, we can obtain the corresponding product data. Now we can start writing the code.
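A minimal sketch of the request loop follows, under loud assumptions: the endpoint is truncated in this article, so `SEARCH_URL` is a hypothetical placeholder, and the query parameter names `key` and `page` are guesses to be replaced with the ones the real request sends. It uses the standard library’s `urllib` so it runs without extra installs; with the Requests module, the equivalent call is `requests.get(url, params={...}, headers={...}, timeout=10).json()`. It is not the code from aiTaobao.py.

```python
import json
import time
import urllib.parse
import urllib.request

# HYPOTHETICAL placeholder: the real endpoint is truncated in the article
# ("Ai.taobao.com/search/getI…"); substitute the URL found in your own testing.
SEARCH_URL = "https://example.invalid/search"


def build_search_url(keyword, page, base_url=SEARCH_URL):
    """Build the request URL for one results page.

    "key" and "page" are hypothetical parameter names.
    """
    query = urllib.parse.urlencode({"key": keyword, "page": page})
    return f"{base_url}?{query}"


def fetch_json(url, timeout=10):
    """Download one page and decode it as JSON (assumes the endpoint returns JSON)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def crawl(keyword, max_pages, fetch=fetch_json, delay=1.0):
    """Fetch pages 1..max_pages for a keyword, pausing between requests."""
    pages = []
    for page in range(1, max_pages + 1):
        pages.append(fetch(build_search_url(keyword, page)))
        time.sleep(delay)  # pause between requests to go easy on the server
    return pages
```

The `fetch` parameter lets you swap in a different downloader, or a stub that returns canned data while you work on the parsing code.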

Run the aiTaobao.py file from the command prompt (CMD) to test it.

The crawl results are saved in the data.pkl file.
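Saving and reloading the results can be as simple as the `pickle` round trip below. This is a sketch, not necessarily how aiTaobao.py does it; the item structure (a list of dicts) is an assumption.

```python
import pickle


def save_items(items, path="data.pkl"):
    """Serialize the crawled items (assumed to be a list of dicts) to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(items, f)


def load_items(path="data.pkl"):
    """Load the items back for analysis. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

For the analysis, `pandas.DataFrame(load_items())` turns a list of dicts into a DataFrame. Keep in mind that unpickling can execute arbitrary code, so only load pickle files you created yourself.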

That’s it for the crawling part. The full source code and related files are linked at the end of the article.

Data analysis

Since Christmas is around the corner, let’s analyze data on Santa hats. Using pyecharts for every chart felt a bit lazy, so I switched some of the charts to the seaborn library, and I will gradually introduce other data visualization libraries as well.

First, let’s look at the price distribution of Santa hats.

They seem to be quite cheap.
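Whichever library draws it (seaborn offers `distplot`, replaced by `histplot` from seaborn 0.11 onward), a price histogram boils down to counting prices per bin. The stdlib-only sketch below computes that directly; the sample prices and the 10-unit bin width are made up for illustration.

```python
from collections import Counter
from statistics import median


def price_buckets(prices, width=10):
    """Count prices in fixed-width bins: key 0 covers [0, width), key 10 covers [10, 20), ..."""
    counts = Counter(int(float(p) // width) * width for p in prices)
    return dict(sorted(counts.items()))


# Hypothetical sample prices as strings, the way they might come back from the crawler.
sample = ["9.90", "12.80", "5.00", "29.00", "8.50"]
print(price_buckets(sample))  # {0: 3, 10: 1, 20: 1}
print(median(float(p) for p in sample))  # 9.9
```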

Next, let’s look at where the shops selling Santa hats are located.

Most of them appear to be in Zhejiang.
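Counting locations is a similar aggregation. The sketch assumes each location string has the form "Province City" (a hypothetical format; check what the crawled data actually contains) and counts listings, so a shop with several listings is counted several times unless you deduplicate by shop first.

```python
from collections import Counter


def shops_per_province(locations, top=5):
    """Count listings by province, assuming locations look like "Province City"."""
    provinces = (loc.split()[0] for loc in locations if loc.strip())
    return Counter(provinces).most_common(top)


# Hypothetical sample locations.
sample = ["Zhejiang Jinhua", "Zhejiang Yiwu", "Guangdong Guangzhou", "Shanghai"]
print(shops_per_province(sample))
```

With pandas, `df["location"].str.split().str[0].value_counts()` does the same job on a DataFrame column.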

Now let’s look at the sales ranking of the merchants (some merchant names are very long, so only their first few characters are shown).
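The ranking can be reproduced by summing sales per shop and truncating long names for the chart labels. The field names `shop` and `sales` are hypothetical, and sales are assumed to be numeric strings. Truncation happens only after ranking, because two different shops could end up sharing a truncated label.

```python
from collections import defaultdict


def top_shops(items, n=10, max_len=8):
    """Sum sales per shop and return the n best sellers, names cut to max_len chars."""
    totals = defaultdict(int)
    for item in items:
        totals[item["shop"]] += int(item["sales"])  # hypothetical field names
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return [(name[:max_len], sales) for name, sales in ranked]


# Hypothetical sample items.
items = [
    {"shop": "Christmas Party Supplies Flagship Store", "sales": "120"},
    {"shop": "HatHouse", "sales": "80"},
    {"shop": "Christmas Party Supplies Flagship Store", "sales": "30"},
]
print(top_shops(items, n=2, max_len=9))
```

In pandas, `df.groupby("shop")["sales"].sum().nlargest(10)` gives the same ranking once the sales column is numeric.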

Since the crawled data is limited in both volume and variety, there isn’t much else worth plotting, so to finish up, here are two word clouds just for fun.
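A word cloud needs word frequencies first. The sketch below counts words across product titles. It tokenizes on runs of word characters, which only works for space-separated text, so Chinese titles would need a word segmenter before this step (the article does not say which one, if any, was used). The sample titles are invented.

```python
import re
from collections import Counter


def word_frequencies(titles, stopwords=frozenset(), min_len=2):
    """Count lowercase words across titles, skipping stopwords and very short tokens."""
    counts = Counter()
    for title in titles:
        for word in re.findall(r"\w+", title.lower()):
            if len(word) >= min_len and word not in stopwords:
                counts[word] += 1
    return counts


# Hypothetical sample titles.
freqs = word_frequencies(["Santa hat for kids", "Plush Santa hat", "LED hat"],
                         stopwords={"for"})
print(freqs.most_common(3))
```

The counts can then be rendered with the wordcloud module, for example `WordCloud(font_path=..., background_color="white").generate_from_frequencies(freqs).to_file("cloud.png")`. For Chinese words, `font_path` must point to a font that contains CJK glyphs.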


All done!