Preface
Today we will crawl tourist-attraction data from Ctrip and do a simple data visualization analysis. Let's have some fun!
Development tools
Python version: 3.6.4
Related modules:
bs4 module;
jieba module;
pyecharts module;
wordcloud module;
requests module;
And some modules that come with Python.
Environment setup
Install Python and add it to your environment variables, then use pip to install the required modules.
Data crawling
First, let's be clear about what data we want to crawl. For convenience, we will only crawl the data for tourist attractions in Beijing, as shown in the figure below:
That is, we need to crawl the name, location, score, and other details of every scenic spot in Beijing. Now that we know our crawl goal, we can start writing code.
The code is actually very simple. You can see that the URL of the attraction listing page changes as follows:
https://you.ctrip.com/sight/beijing1/s0-p{page}.html#sightname
where {page} is the page number.
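A minimal sketch of generating these listing URLs (the pattern is inferred from the snippet above, and the page count is something you would determine from the site):

```python
# Listing-page URL pattern observed above; {page} is the page number.
BASE = "https://you.ctrip.com/sight/beijing1/s0-p{page}.html#sightname"

def page_urls(num_pages):
    """Return the listing URLs for pages 1..num_pages."""
    return [BASE.format(page=n) for n in range(1, num_pages + 1)]
```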
All we need to do is request each relevant page in turn and use bs4 to parse out the data we need. To avoid the crawler being blocked, we switch to a new proxy every 10 requests; the proxies come from free proxy lists crawled online.
Specifically, the code implementation is as follows:
Ctrip tourist attraction crawler
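The full crawler isn't reproduced here, but its core loop might look roughly like the sketch below. The CSS selectors are hypothetical placeholders (inspect the real page to find the right ones), and the proxy list is assumed to come from the free-proxy crawl mentioned above:

```python
import time

def pick_proxy(request_count, proxies):
    """Rotate to the next proxy every 10 requests, as described above."""
    if not proxies:
        return None
    return proxies[(request_count // 10) % len(proxies)]

def crawl(urls, proxies):
    """Fetch each listing page and parse out name and score.

    The selectors below are placeholders, not Ctrip's actual markup.
    """
    import requests                # third-party: pip install requests
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4
    results = []
    for i, url in enumerate(urls):
        proxy = pick_proxy(i, proxies)
        resp = requests.get(
            url,
            proxies={"http": proxy, "https": proxy} if proxy else None,
            headers={"User-Agent": "Mozilla/5.0"},
            timeout=10,
        )
        soup = BeautifulSoup(resp.text, "html.parser")
        for card in soup.select(".list_mod2"):  # hypothetical selector
            name = card.select_one("dt a")       # hypothetical selector
            score = card.select_one(".score strong")
            results.append({
                "name": name.get_text(strip=True) if name else None,
                "score": score.get_text(strip=True) if score else None,
            })
        time.sleep(1)  # be polite between requests
    return results
```

The rotation logic is the part worth noting: `request_count // 10` only advances to the next proxy after every tenth request.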
The code runs as follows:
All done~ See my profile or send a private message to get the complete source code and related files.
Data visualization
As usual, let's visualize the crawled data. For convenience, we'll stick to the Beijing attraction data~
First, let's make a word cloud from the location information of all the attractions:
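The frequency-counting step behind the word cloud could be sketched like this (the sample locations are invented; the real post presumably segments the text with jieba first, which is skipped here to keep the sketch dependency-free):

```python
from collections import Counter

def location_frequencies(locations):
    """Count occurrences of each location string for the word cloud."""
    return Counter(loc.strip() for loc in locations if loc and loc.strip())

# Hypothetical sample rows from the crawl
sample = ["Dongcheng District", "Haidian District", "Dongcheng District"]
freqs = location_frequencies(sample)
# wordcloud can render the result directly, e.g.:
#   WordCloud(font_path="simhei.ttf").generate_from_frequencies(freqs)
```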
Next, take a look at the rating distribution of the attractions:
Then let's look at the distribution of scenic-spot grades:
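Tallying attractions per grade is a one-liner with `Counter` (the sample entries other than the Forbidden City are invented for illustration; the real chart was presumably drawn with pyecharts):

```python
from collections import Counter

def grade_distribution(spots):
    """Tally attractions by scenic-spot grade (5A, 4A, ...)."""
    return Counter(s.get("grade", "ungraded") for s in spots)

sample = [
    {"name": "The Forbidden City", "grade": "5A"},
    {"name": "Some Park", "grade": "4A"},     # hypothetical entry
    {"name": "Another Park", "grade": "4A"},  # hypothetical entry
]
dist = grade_distribution(sample)
# Feed into pyecharts, e.g.:
#   Bar().add_xaxis(list(dist)).add_yaxis("count", list(dist.values()))
```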
Among them, 5A scenic spots are:
The Forbidden City
Let’s look at the price distribution:
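One way to bucket ticket prices for a distribution chart (the bin width of 50 yuan is an arbitrary choice, not from the original post):

```python
def price_buckets(prices, width=50):
    """Group ticket prices (yuan) into fixed-width bins for a bar chart."""
    buckets = {}
    for p in prices:
        lo = int(p // width) * width
        label = f"{lo}-{lo + width}"
        buckets[label] = buckets.get(label, 0) + 1
    return buckets
```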
How about taking a look at the top eight most commented scenic spots?
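Selecting the eight most-commented attractions is a simple sort; the field names here are assumptions about the crawled records, not Ctrip's actual schema:

```python
def top_commented(spots, n=8):
    """Return the n attractions with the most comments, most first."""
    return sorted(spots, key=lambda s: s["comments"], reverse=True)[:n]

# Hypothetical sample records
sample = [
    {"name": "A", "comments": 120},
    {"name": "B", "comments": 340},
    {"name": "C", "comments": 90},
]
```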
All done~