Xiamen is really more than Gulangyu
Today's article is about Xiamen, because I spent many years there. From graduation through my working years, I spent a wonderful stretch of my youth in Xiamen. Even as someone not originally from the city, I genuinely think Xiamen is a garden city with a beautiful environment and an artistic atmosphere. It was once praised by U.S. President Nixon as the "Oriental Hawaii", and in my opinion the name is well deserved 😃
Xiamen has always been a popular 🔥 tourist city in China, and most visitors head straight for Gulangyu; after all, it is the famous one. By my count I have been to Gulangyu at least 7 times. Whenever family or friends visited, I would take them there or at least write them a simple guide. But Xiamen is really more than Gulangyu.
Gulangyu Island
Let's start with a picture of Gulangyu. I once took a friend up to the top of Sunlight Rock (the highest point on Gulangyu). There were plenty of tourists at the top, and from there you can see the whole of Gulangyu, Xiamen's landmark Shimao Twin Towers, and Xiamen Bay. It really is beautiful 😃
The data source
The data used in this article was crawled from a travel website; the whole process is explained in detail below.
Crawled fields
A total of 6 fields were crawled:
- Chinese name cn_title
- English name en_title
- Number of guides strategy
- Number of comments comment
- Ranking ranking
- Brief introduction abstract
Webpage rules
1. Open the site: travel.qunar.com/p-cs299782-… Each page shows up to 10 attractions, and there are 126 pages in total.
The page URLs can therefore be constructed as:
for i in range(1, 127):
    url = "https://travel.qunar.com/p-cs299782-xiamen-jingdian-1-{}".format(i)  # URL pattern for each page
2. Next, look at where the six fields sit in the page source: right-click, choose "Inspect", and in the Elements panel you can find the 10 spots on each page, where each repeated element block corresponds to one spot.
3. Look at the position of each field
The three screenshots below show where each field is located; only once a field is located can we parse it out.
Let’s parse each field out
Import the required libraries
To crawl and process the data we need to import several libraries. Their main roles are:
- requests sends the network requests
- the re module parses data with regular expressions
- json handles Python dictionary-type data
- csv saves the crawled data
- pandas and numpy process the crawled data
- plotly_express and pyecharts handle the plotting
import pandas as pd
import numpy as np
import re
import csv
import json
import requests
import random
# word segmentation
import jieba
# plotting
# import plotly.express as px
import plotly_express as px
import plotly.graph_objects as go
from wordcloud import WordCloud # word cloud
import matplotlib.pyplot as plt
from plotly.graph_objects import Scatter,Bar
First page data request
Let's request the data for the first page and look at the returned source code.
Part of the webpage data looks like this:
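The request code and the returned page were shown as screenshots in the original post. Here is a minimal sketch of what the request might look like, assuming a plain requests call; the User-Agent header value is only illustrative:

```python
import requests

url = "https://travel.qunar.com/p-cs299782-xiamen-jingdian-1-1"  # page 1
headers = {"User-Agent": "Mozilla/5.0"}  # illustrative browser header

response = requests.get(url, headers=headers)
response.encoding = "utf-8"   # make sure the Chinese text decodes correctly
html = response.text          # raw page source, used for the regex matching below
print(html[:500])             # preview part of the page
```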
Matching the fields
Each field is matched with a regular expression using findall() from the re module:
1. Chinese names of the scenic spots. We also print the length of the matched list to check that each page really contains 10 spots, and it is exactly 10 (see the sketch after this list).
2. English names of scenic spots
3. Number of guides
4. Number of comments
5. Ranking of the scenic spots
The ranking field is a bit special: when I crawled the data, I found that some scenic spots had no ranking at all. For example, many spots on page 16 were unranked and needed special handling.
When a spot has no ranking, the parsing code substitutes 0 (the sketch after this list shows the idea).
6. Introduction to scenic spots
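The per-field regex code appeared only as screenshots, and the exact patterns depend on Qunar's HTML, which is not reproduced here. The snippet below is therefore only a structural sketch with placeholder patterns; it illustrates the findall() call per field, the length check, and the fallback to 0 for missing rankings:

```python
import re

# 'html' is the page source from the request sketch above.
# NOTE: these patterns are illustrative placeholders, not Qunar's real markup.

# Chinese names of the spots on the page (assumed attribute name)
cn_title = re.findall(r'data-cn-title="(.*?)"', html)
print(len(cn_title))  # expect exactly 10 spots per page

# Ranking, with 0 substituted when a spot has no ranking
ranking = []
for block in re.findall(r'<li class="spot-item">.*?</li>', html, re.S):  # one block per spot (assumed)
    m = re.search(r'第(\d+)名', block)  # "ranked No. N" text, if present (assumed)
    ranking.append(int(m.group(1)) if m else 0)
```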
Full crawl
Here is the full crawl of the site, which covers:
- constructing the page URLs
- sending requests to get the page source
- parsing each field, with special-case handling
- saving the results to a file
A sketch of this flow follows below.
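The complete crawler was shown as screenshots in the original post. The following is a minimal sketch of the same structure; the parse_page helper and its internals are assumptions, while the URL pattern, the 126 pages, the six field names, and the CSV output follow the article:

```python
import csv
import requests

HEADERS = {"User-Agent": "Mozilla/5.0"}  # illustrative browser header

def parse_page(html):
    """Parse one page of HTML into rows of the six fields
    (cn_title, en_title, strategy, comment, ranking, abstract).
    Regex details are omitted; see the field-matching sketch above."""
    rows = []
    # ... re.findall() per field, substituting 0 for missing rankings ...
    return rows

with open("Scenic spots in Xiamen.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["cn_title", "en_title", "strategy", "comment", "ranking", "abstract"])
    for i in range(1, 127):  # 126 pages in total
        url = "https://travel.qunar.com/p-cs299782-xiamen-jingdian-1-{}".format(i)
        resp = requests.get(url, headers=HEADERS)
        resp.encoding = "utf-8"
        for row in parse_page(resp.text):
            writer.writerow(row)
```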
Data processing
pandas is used to read the saved CSV file:
df = pd.read_csv("Scenic spots in Xiamen.csv")
df.head()  # show the first 5 rows
Check the length of the data and the field types, and see whether there are any missing values; a sketch of the usual calls follows the list below:
- three fields are strings (object)
- the other three fields are int64
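The check itself appeared only as a screenshot; a minimal sketch of the usual pandas calls, using the df loaded above:

```python
print(df.shape)   # number of rows and columns
print(df.dtypes)  # per the article: three object (string) columns and three int64 columns
```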
df.isnull().sum()  # check for missing values
The result shows that 369 English names are missing and 1121 scenic spots have no introduction.
Chinese name cn_title
First, let's look at the names of the scenic spots. Xiamen is a garden city with many parks on the island; let's use the data to count how many parks there are:
- str.contains(): checks whether a string contains a given substring
- reset_index(): resets the index of a pandas DataFrame
# 1- How many parks
park = df[df["cn_title"].str.contains("Park")].reset_index(drop=True)
park
Conclusion: Data show that there are 107 parks in and around Xiamen Island
Let’s take a look at the parks by ranking field and see which ones are popular:
new_park = park[park["ranking"] != 0].sort_values(by=["ranking"]).reset_index(drop=True)
new_park[:20] # Take out the top 20 parks
From the data, the top three are: Railway Culture Park, Zhongshan Park, and Wuyuanwan Wetland Park.
1. Railway Culture Park: I have been there once. It is near Jinbang Park and follows an old extension line of the Yingxia railway (from Yingtan in Jiangxi to Xiamen), with the old tracks still in place.
2. Zhongshan Park: many Chinese cities have a Zhongshan Park commemorating Dr. Sun Yat-sen, and there are often events held here.
3. Wuyuanwan Wetland Park: Wuyuanwan makes people think of Xiamen's wealthy; Wuyuan Bay could be called the city's "tuhao" (new-money) district.
Bailuzhou Park, Zhonglun Park, and Tianzhushan Forest Park outside the island are also well worth visiting. Next, let's look at some famous streets in Xiamen:
# 2- Famous Street
street = df[df["cn_title"].str.contains("Street")].reset_index(drop=True)
street.head(10)
There are 37 rows in total, and we show the first 10.
Zhongshan Road Pedestrian Street is really popular: all kinds of local southern Fujian snacks, milk tea, Taiwanese specialties, and Xiamen's signature arcade buildings. Every holiday it is jammed with traffic and packed with people.
Ting O Tsai Cat Street is also popular. It is not far from the south gate of Xiamen University, and I have been there several times; there is a cat shop inside that is super popular 🔥. Finally, let's look at the attractions related to universities:
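The filtering code was shown as a screenshot; here is a minimal sketch following the same str.contains pattern used above for parks and streets (the keyword "University" is an assumption, matching the translated keywords "Park" and "Street" in the earlier blocks):

```python
# 3 - University-related attractions
university = df[df["cn_title"].str.contains("University")].reset_index(drop=True)
university
```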
You can see that the 17 university-related scenic spots are basically covered by just 3 universities:
- Xiamen University: The most beautiful university in China
- Jimei University
- Huaqiao University
Xiamen University used to be freely open to tourists. In recent years the number of visitors has been limited and you need to make a reservation to enter, so if you want to visit Xiamen University, please book in advance.
A quiet tip: if you have relatives or friends studying there, I hear they can bring you in 😃. Here is a photo I took of Xiamen University's Shangxian Field.
Scenic spot ranking
Let’s go straight to the ranking to see which scenic spots are popular:
# remove rows where ranking == 0; sort by ranking in ascending order
# take the top 20 attractions
ranking = df[df["ranking"] != 0].sort_values(by=["ranking"], ascending=True)[:20].reset_index(drop=True)
px.bar(ranking,          # the data frame to plot
       x="cn_title",     # x-axis field
       y="ranking",      # y-axis field
       color="ranking"   # field used for color
      )
Top of the list is Gulangyu 😭. Shell 🐚 Dream World, Xiamen Undersea World, Xiamen Dadeji Beach, Sunlight Rock and so on are all scenic spots on Gulangyu Island, so Gulangyu really is very popular.
Xiamen University and the adjacent Nanputuo Temple are also popular with tourists. A few years ago Xiamen built a new landmark, the Shimao Twin Towers, which also attracts a lot of visitors.
Number of guides strategy
Many tourists like to write travel guides for others to refer to after visiting a scenic spot. Let's look at which spots have the most guides:
px.scatter(df,              # the data to plot
           x="cn_title",    # x-axis field
           y="strategy",    # y-axis field
           color="strategy" # field used for color
          )
Conclusion: the data shows that Xiamen University is the scenic spot tourists write the most guides about, followed by Nanputuo Temple and Zhongshan Road Pedestrian Street.
Number of comments comment
Let's look at the number of tourist comments: sort in descending order, take the top 20 scenic spots, and display the first 10 rows:
comment = df[df["comment"] != 0].sort_values(by=["comment"],ascending=False)[:20].reset_index(drop=True)
comment.head(10)
px.scatter(comment,         # the data frame
           x="cn_title",    # x-axis field
           y="comment",     # y-axis field
           color="comment"  # field used for color
          )
Using the number of comments and the ranking, we draw a combined figure with marginal plots and a trend line:
fig = px.scatter(comment,             # the data frame
                 x="ranking",         # x-axis field
                 y="comment",         # y-axis field
                 color="ranking",     # field used for color
                 marginal_y="violin", # marginal plot for the y axis
                 marginal_x="box",    # marginal plot for the x axis
                 trendline="ols",     # OLS trend line
                 template="simple_white")  # plot template
fig.show()
Gulangyu again ranks first 😭, with Xiamen University, Nanputuo Temple, and Zhongshan Road Pedestrian Street following.
Brief introduction abstract
Finally, we analyze the brief introductions of the scenic spots, using WordCloud to draw a word cloud. Start by filling in the empty values in the introduction field.
abstract = df.fillna(value="") # Missing value fill
abstract_list = abstract["abstract"].tolist()  # convert the abstract column to a list
abstract_list[:10]  # show the first 10 introductions
Next we segment the text with jieba and add every token to one big list:
jieba_list = []
for i in range(len(abstract_list)):
    # jieba word segmentation
    seg_list = jieba.cut(str(abstract_list[i]).strip(), cut_all=False)
    for each in list(seg_list):
        jieba_list.append(each)
jieba_list[:10]
First attempt: drawing directly with WordCloud
from wordcloud import WordCloud
import matplotlib.pyplot as plt
text = "".join(i for i in jieba_list) # String to be processed
# Download the simhei. TTF font and place it in one of your own directories
font = r'/Users/peter/Desktop/spider/SimHei.ttf'
wc = WordCloud(collocations=False,
font_path=font, # path
max_words=2000,width=4000,
height=4000, margin=2).generate(text.lower())
plt.imshow(wc)
plt.axis("off")
plt.show()
wc.to_file('xiamen.png') # Save the word cloud
As the word cloud shows, Xiamen and Gulangyu are very prominent in the introductions. Of course, there are also many meaningless words, such as "located", "here", and so on. Next we filter them out with a stop-word list collected online:
Second attempt: using a self-collected stop-word list
# create the list of stop words
def StopWords(filepath):
    stopwords = [line.strip() for line in open(filepath, 'r', encoding='utf-8').readlines()]
    return stopwords

# pass in the path of the stop-word list
stopwords = StopWords('/Users/peter/Desktop/Publish/nlp_stopwords.txt')

stopword_list = []
for word in jieba_list:
    if word not in stopwords:
        if word != "\t" and word != "":
            stopword_list.append(word)
stopword_list[:10]
After applying the stop-word list, commas and other punctuation are gone, and many low-value words have been removed as well. Next, we use a portrait image as the mask to draw the word cloud:
from os import path
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
d = path.dirname('.')  # use this line when running interactively
# d = path.dirname(__file__)

# pass in the new word list
text = " ".join(stopword_list)
# https://www.deviantart.com/jirkavinse/art/Real-Life-Alice-282261010
alice_coloring = np.array(Image.open(path.join(d, "wordcloud.jpg")))
# Set the stop word
stopwords = set(STOPWORDS)
stopwords.add("said")
# font path
font = r'/Users/peter/Desktop/spider/SimHei.ttf'
wc = WordCloud(background_color="white", font_path=font,
max_words=2000, mask=alice_coloring,
height=6000,width=6000,
stopwords=stopwords, max_font_size=40, random_state=42)
wc.generate(text)
image_colors = ImageColorGenerator(alice_coloring)
plt.imshow(wc.recolor(color_func=image_colors), interpolation="bilinear")
plt.axis("off")
plt.show()
wc.to_file('xiamen.png')  # save the word cloud
The final drawing is shown as follows:
Conclusion
Based on data crawled from the web, this article analyzes Xiamen's scenic spots and looks at where people like to go in Xiamen:
- Gulangyu is so popular that it is almost a must for tourists
- Scenic spots in the Xiamen University area: the Siming campus of Xiamen University (Furong Lake, Furong Tunnel, Songen Building, etc.), Nanputuo Temple next to it, and the Five Old Peaks above the temple
- If you like the sea and cycling, go to: Xiamen University Baicheng Beach, Yanwu Bridge, Huandao Road, Hulishan Fortress, Coconut Wind Village
- If you are a food lover, head to: Zhongshan Road Pedestrian Street, Zengcuo'an, Taiwan Snack Street
- If you are the artsy type, Shapowei Art West is not to be missed
Last sentence: Xiamen welcomes you! 😃
Everything that seems to have passed away has never left. The love and warmth you have given me make me guard this place persistently.
A little cabin, a sweet little cabin. Its owner writes code for a living with one hand and wields a cooking spoon to enjoy life with the other. You are welcome here 😃
Welcome to scan the QR code and follow the WeChat public account "You and hut", where I will take you into data analysis and into good cooking 😃