Preface

Today we will crawl tourist attraction data from Ctrip and run a simple visualization analysis on it. Let's get started.

Development tools

Python version: 3.6.4
Related modules:

bs4 module;

jieba module;

pyecharts module;

wordcloud module;

requests module;

plus some modules from the Python standard library.

Environment setup

Install Python and add it to the PATH environment variable, then use pip to install the modules listed above.

Data crawl

First, let's be clear about what data we want to crawl. For convenience, we crawl only the tourist attractions in Beijing, as shown in the figure below:

That is, the data we need to crawl is the name, location, score and other information for every scenic spot in Beijing. Now that the goal is clear, we can start writing code.

The code is actually very simple. The URL of the scenic spot listing page follows this pattern, where only the page number changes:

'https://you.ctrip.com/sight/beijing1/s0-p<page number>.html#sightname'
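As a small illustration of this pattern, the listing URLs can be generated from a template. This is only a sketch: the names `URL_TEMPLATE` and `build_page_urls` are made up for this example, and it assumes the page number simply replaces the placeholder after `s0-p`.

```python
# Assumption: the page number is substituted directly after "s0-p".
URL_TEMPLATE = "https://you.ctrip.com/sight/beijing1/s0-p{page}.html#sightname"


def build_page_urls(last_page, first_page=1):
    """Return the listing-page URLs for first_page..last_page (inclusive)."""
    return [
        URL_TEMPLATE.format(page=page)
        for page in range(first_page, last_page + 1)
    ]


if __name__ == "__main__":
    # Print the first three listing URLs.
    for url in build_page_urls(3):
        print(url)
```

The total number of pages is not fixed here; it has to be read from the site or chosen by hand.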

All we need to do is request the relevant pages one by one and use bs4 to parse them and extract the data we need. To keep the crawler from being blocked, we switch to a different proxy after every 10 page requests; the proxies are free ones crawled from the web.
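The request, rotate and parse steps described above could be sketched roughly as follows. This is not the original crawler. The function names, the `User-Agent` header, and above all the CSS classes in `parse_sights` are hypothetical placeholders, and the list of proxies is assumed to be supplied by the caller.

```python
import requests
from bs4 import BeautifulSoup

URL_TEMPLATE = "https://you.ctrip.com/sight/beijing1/s0-p{page}.html#sightname"
HEADERS = {"User-Agent": "Mozilla/5.0"}  # assumption: a browser-like header


def proxy_for_request(request_index, proxies, rotate_every=10):
    """Pick the proxy for the request_index-th request (0-based).

    One proxy serves rotate_every consecutive requests, then the next one
    in the list takes over, wrapping around at the end of the list.
    """
    return proxies[(request_index // rotate_every) % len(proxies)]


def fetch_page(page, request_index, proxies):
    """Download one listing page through the proxy scheduled for it."""
    proxy = proxy_for_request(request_index, proxies)
    response = requests.get(
        URL_TEMPLATE.format(page=page),
        headers=HEADERS,
        proxies={"http": proxy, "https": proxy},
        timeout=10,
    )
    response.raise_for_status()
    return response.text


def parse_sights(html):
    """Extract name, location and score from one listing page.

    The CSS classes below are hypothetical. Inspect the real page and
    replace them with selectors that match its markup.
    """
    soup = BeautifulSoup(html, "html.parser")
    sights = []
    for item in soup.select("div.sight-item"):
        name = item.select_one(".sight-name")
        location = item.select_one(".sight-location")
        score = item.select_one(".sight-score")
        sights.append({
            "name": name.get_text(strip=True) if name else "",
            "location": location.get_text(strip=True) if location else "",
            "score": score.get_text(strip=True) if score else "",
        })
    return sights
```

Keeping the proxy choice in a pure function makes the "every 10 requests" rule easy to check without touching the network. Free proxies fail often, so a real crawler would likely also catch request errors and retry with the next proxy.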

Specifically, the code implementation is as follows:

Ctrip tourist attraction crawler

The code runs as follows:

All done~ For the complete source code, see my profile or send me a private message to get the related files.

Data visualization

As usual, let's visualize the crawled data. For convenience, we again use only the Beijing scenic spot data we just crawled~

First, make a word cloud from the location information of all the attractions:
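One way to build such a word cloud is to segment each location string with jieba, count the words, and hand the counts to wordcloud. The sketch below assumes the locations are already available as a list of strings. The function names and the `min_length` filter are choices made for this example, and `font_path` must point to a font that contains Chinese glyphs.

```python
from collections import Counter


def location_frequencies(locations, tokenize=None, min_length=2):
    """Count how often each word appears across all location strings."""
    if tokenize is None:
        # Imported lazily so that a custom tokenizer can be passed instead.
        import jieba
        tokenize = jieba.lcut
    counts = Counter()
    for location in locations:
        for word in tokenize(location):
            word = word.strip()
            # Skip punctuation and single characters.
            if len(word) >= min_length:
                counts[word] += 1
    return counts


def save_wordcloud(frequencies, out_path, font_path):
    """Render the word frequencies to an image file."""
    from wordcloud import WordCloud

    cloud = WordCloud(
        font_path=font_path,  # needs a font with Chinese glyphs
        width=800,
        height=600,
        background_color="white",
    )
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(out_path)
```

Counting first and calling `generate_from_frequencies` avoids relying on wordcloud's own tokenizer, which splits on whitespace and does not segment Chinese text.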

Take a look at the score distribution of the attractions:
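A distribution chart like this can be produced by grouping the scores into fixed-width bins and plotting the counts. The sketch below assumes the scores are floats between 0 and 5 and that pyecharts 1.x or later is installed (the older 0.5.x API is different). The bin width and function names are illustrative only.

```python
def score_distribution(scores, bin_width=0.5, max_score=5.0):
    """Group scores into fixed-width bins and return (labels, counts)."""
    n_bins = round(max_score / bin_width)
    counts = [0] * n_bins
    for score in scores:
        # The maximum score is counted in the last bin.
        index = min(int(score / bin_width), n_bins - 1)
        counts[index] += 1
    labels = [
        f"{i * bin_width:.1f}-{(i + 1) * bin_width:.1f}"
        for i in range(n_bins)
    ]
    return labels, counts


def render_score_bar(labels, counts, out_path="score_distribution.html"):
    """Draw the binned counts as a bar chart and write it to an HTML file."""
    # Assumption: pyecharts 1.x or later.
    from pyecharts import options as opts
    from pyecharts.charts import Bar

    bar = (
        Bar()
        .add_xaxis(labels)
        .add_yaxis("Number of attractions", counts)
        .set_global_opts(title_opts=opts.TitleOpts(title="Score distribution"))
    )
    return bar.render(out_path)
```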

Next, let's look at the grade (A-level) distribution of the scenic spots:
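Assuming the rating here means the official A-level grade (5A, 4A and so on), the counts per grade can be collected with a `Counter` and drawn as a pie chart. The record key `"grade"`, the `"Ungraded"` label and the function names are hypothetical, and the chart code again assumes pyecharts 1.x or later.

```python
from collections import Counter


def grade_counts(sights):
    """Count sights per grade; records without a grade become 'Ungraded'."""
    return Counter(sight.get("grade") or "Ungraded" for sight in sights)


def render_grade_pie(counts, out_path="grade_distribution.html"):
    """Draw the grade counts as a pie chart and write it to an HTML file."""
    # Assumption: pyecharts 1.x or later.
    from pyecharts import options as opts
    from pyecharts.charts import Pie

    pie = (
        Pie()
        .add("Grade", [[grade, count] for grade, count in counts.most_common()])
        .set_global_opts(title_opts=opts.TitleOpts(title="Scenic spot grades"))
    )
    return pie.render(out_path)
```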

Among them, 5A scenic spots are:

The Forbidden City

Let’s look at the price distribution:

Finally, how about a look at the eight scenic spots with the most comments?
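Picking the most commented spots is a sort on the comment count followed by a slice. In the sketch below each record is assumed to be a dict with hypothetical `"name"` and `"comment_count"` keys, and the chart code assumes pyecharts 1.x or later.

```python
def top_commented(sights, n=8):
    """Return the n sights with the most comments, highest first."""
    return sorted(sights, key=lambda s: s["comment_count"], reverse=True)[:n]


def render_top_commented(sights, n=8, out_path="top_commented.html"):
    """Draw the n most commented sights as a bar chart in an HTML file."""
    # Assumption: pyecharts 1.x or later.
    from pyecharts import options as opts
    from pyecharts.charts import Bar

    top = top_commented(sights, n)
    bar = (
        Bar()
        .add_xaxis([sight["name"] for sight in top])
        .add_yaxis("Comments", [sight["comment_count"] for sight in top])
        .set_global_opts(
            title_opts=opts.TitleOpts(title=f"Top {n} most commented spots")
        )
    )
    return bar.render(out_path)
```

Because Python's sort is stable, spots with equal comment counts keep their original relative order.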
