Xiamen is really more than Gulangyu
Today's article is about Xiamen, because I spent many years there. From graduation through my working years, I spent a wonderful stretch of my youth in Xiamen. Even as someone not originally from the city, I genuinely think Xiamen is a garden city with a beautiful environment and an artistic atmosphere. It was once praised by U.S. President Nixon as the "Oriental Hawaii", and in my opinion the name is well deserved 😃
Xiamen has always been a popular 🔥 tourist city in China, and most visitors head straight for Gulangyu; after all, it is the famous one. By my count I have been to Gulangyu at least 7 times. Whenever family or friends visited, I would take them there or at least write them a simple guide. But Xiamen is really more than Gulangyu.
Gulangyu Island
Let's start with a picture of Gulangyu. I once took a friend up to the top of Sunlight Rock (the highest point on Gulangyu). There were plenty of tourists at the top, and from there you can see the whole of Gulangyu, Xiamen's landmark Shimao Twin Towers, and Xiamen Bay. It really is beautiful 😃
The data source
The data used in this article was crawled from a travel website; the whole process is explained in detail below.
Crawled fields
A total of 6 fields were crawled:
- Chinese name cn_title
- English name en_title
- Number of guides strategy
- Number of comments comment
- Ranking ranking
- Brief introduction abstract
Webpage rules
1. Open the site: travel.qunar.com/p-cs299782-… Each page shows up to 10 attractions, and there are 126 pages in total.
The page URLs can therefore be constructed as:
for i in range(1, 127):
    url = "https://travel.qunar.com/p-cs299782-xiamen-jingdian-1-{}".format(i)  # URL pattern for each page
2. Next, look at where the six fields sit in the page source: right-click, choose "Inspect", and in the Elements panel you can find the 10 spots on each page, where each repeated element block corresponds to one spot.
3. Look at the position of each field
The three screenshots below show where each field is located; only once a field is located can we parse it out.
Let’s parse each field out
Import the required libraries
To crawl and process the data we need to import several libraries. Their main roles are:
- requests sends the network requests
- the re module parses data with regular expressions
- json handles Python dictionary-type data
- csv saves the crawled data
- pandas and numpy process the crawled data
- plotly_express and pyecharts handle the plotting
import pandas as pd
import numpy as np
import re
import csv
import json
import requests
import random
# word segmentation
import jieba
# plotting
# import plotly.express as px
import plotly_express as px
import plotly.graph_objects as go
from wordcloud import WordCloud # word cloud
import matplotlib.pyplot as plt
from plotly.graph_objects import Scatter,Bar
First page data request
Let's request the data for the first page and look at the returned source code.
Part of the webpage data looks like this:
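The request code and the returned page were shown as screenshots in the original post. Here is a minimal sketch of what the request might look like, assuming a plain requests call; the User-Agent header value is only illustrative:

```python
import requests

url = "https://travel.qunar.com/p-cs299782-xiamen-jingdian-1-1"  # page 1
headers = {"User-Agent": "Mozilla/5.0"}  # illustrative browser header

response = requests.get(url, headers=headers)
response.encoding = "utf-8"   # make sure the Chinese text decodes correctly
html = response.text          # raw page source, used for the regex matching below
print(html[:500])             # preview part of the page
```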
Matching the fields
Each field is matched with a regular expression using findall() from the re module:
1. Chinese names of the scenic spots. We also print the length of the matched list to check that each page really contains 10 spots, and it is exactly 10 (see the sketch after this list).
2. English names of scenic spots
3. Number of guides
4. Number of comments
5. Ranking of the scenic spots
The ranking field is a bit special: when I crawled the data, I found that some scenic spots had no ranking at all. For example, many spots on page 16 were unranked and needed special handling.
When a spot has no ranking, the parsing code substitutes 0 (the sketch after this list shows the idea).
6. Introduction to scenic spots
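The per-field regex code appeared only as screenshots, and the exact patterns depend on Qunar's HTML, which is not reproduced here. The snippet below is therefore only a structural sketch with placeholder patterns; it illustrates the findall() call per field, the length check, and the fallback to 0 for missing rankings:

```python
import re

# 'html' is the page source from the request sketch above.
# NOTE: these patterns are illustrative placeholders, not Qunar's real markup.

# Chinese names of the spots on the page (assumed attribute name)
cn_title = re.findall(r'data-cn-title="(.*?)"', html)
print(len(cn_title))  # expect exactly 10 spots per page

# Ranking, with 0 substituted when a spot has no ranking
ranking = []
for block in re.findall(r'<li class="spot-item">.*?</li>', html, re.S):  # one block per spot (assumed)
    m = re.search(r'第(\d+)名', block)  # "ranked No. N" text, if present (assumed)
    ranking.append(int(m.group(1)) if m else 0)
```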
Full crawl
Here is the full crawl of the site, which covers:
- constructing the page URLs
- sending requests to get the page source
- parsing each field, with special-case handling
- saving the results to a file
A sketch of this flow follows below.
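The complete crawler was shown as screenshots in the original post. The following is a minimal sketch of the same structure; the parse_page helper and its internals are assumptions, while the URL pattern, the 126 pages, the six field names, and the CSV output follow the article:

```python
import csv
import requests

HEADERS = {"User-Agent": "Mozilla/5.0"}  # illustrative browser header

def parse_page(html):
    """Parse one page of HTML into rows of the six fields
    (cn_title, en_title, strategy, comment, ranking, abstract).
    Regex details are omitted; see the field-matching sketch above."""
    rows = []
    # ... re.findall() per field, substituting 0 for missing rankings ...
    return rows

with open("Scenic spots in Xiamen.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["cn_title", "en_title", "strategy", "comment", "ranking", "abstract"])
    for i in range(1, 127):  # 126 pages in total
        url = "https://travel.qunar.com/p-cs299782-xiamen-jingdian-1-{}".format(i)
        resp = requests.get(url, headers=HEADERS)
        resp.encoding = "utf-8"
        for row in parse_page(resp.text):
            writer.writerow(row)
```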
Data processing
pandas is used to read the saved CSV file:
df = pd.read_csv("Scenic spots in Xiamen.csv")
df.head()  # show the first 5 rows
Check the length of the data and the field types, and see whether there are any missing values; a sketch of the usual calls follows the list below:
- three fields are strings (object)
- the other three fields are int64
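The check itself appeared only as a screenshot; a minimal sketch of the usual pandas calls, using the df loaded above:

```python
print(df.shape)   # number of rows and columns
print(df.dtypes)  # per the article: three object (string) columns and three int64 columns
```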
df.isnull().sum()  # check for missing values
The result shows that 369 English names are missing and 1121 scenic spots have no introduction.
Chinese name cn_title
First, let's look at the names of the scenic spots. Xiamen is a garden city with many parks on the island; let's use the data to count how many parks there are:
- str.contains(): checks whether a string contains a given substring
- reset_index(): resets the index of a pandas DataFrame
# 1- How many parks
park = df[df["cn_title"].str.contains("Park")].reset_index(drop=True)
park
Conclusion: Data show that there are 107 parks in and around Xiamen Island
Let’s take a look at the parks by ranking field and see which ones are popular:
new_park = park[park["ranking"] != 0].sort_values(by=["ranking"]).reset_index(drop=True)
new_park[:20] # Take out the top 20 parks
From the data, the top three are: Railway Culture Park, Zhongshan Park, and Wuyuanwan Wetland Park.
1. Railway Culture Park: I have been there once. It is near Jinbang Park and follows an old extension line of the Yingxia railway (from Yingtan in Jiangxi to Xiamen), with the old tracks still in place.
2. Zhongshan Park: many Chinese cities have a Zhongshan Park commemorating Dr. Sun Yat-sen, and there are often events held here.
3. Wuyuanwan Wetland Park: Wuyuanwan makes people think of Xiamen's wealthy; Wuyuan Bay could be called the city's "tuhao" (new-money) district.
Bailuzhou Park, Zhonglun Park, and Tianzhushan Forest Park outside the island are also well worth visiting. Next, let's look at some famous streets in Xiamen:
# 2- Famous Street
street = df[df["cn_title"].str.contains("Street")].reset_index(drop=True)
street.head(10)
There are 37 rows in total, and we show the first 10.
Zhongshan Road Pedestrian Street is really popular: all kinds of local southern Fujian snacks, milk tea, Taiwanese specialties, and Xiamen's signature arcade buildings. Every holiday it is jammed with traffic and packed with people.
Ting O Tsai Cat Street is also popular. It is not far from the south gate of Xiamen University, and I have been there several times; there is a cat shop inside that is super popular 🔥. Finally, let's look at the attractions related to universities:
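The filtering code was shown as a screenshot; here is a minimal sketch following the same str.contains pattern used above for parks and streets (the keyword "University" is an assumption, matching the translated keywords "Park" and "Street" in the earlier blocks):

```python
# 3 - University-related attractions
university = df[df["cn_title"].str.contains("University")].reset_index(drop=True)
university
```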
You can see that the 17 university-related scenic spots are basically covered by just 3 universities:
- Xiamen University: The most beautiful university in China
- Jimei University
- Huaqiao University
Xiamen University used to be freely open to tourists. In recent years the number of visitors has been limited and you need to make a reservation to enter, so if you want to visit Xiamen University, please book in advance.
A quiet tip: if you have relatives or friends studying there, I hear they can bring you in 😃. Here is a photo I took of Xiamen University's Shangxian Field.
Scenic spot ranking
Let’s go straight to the ranking to see which scenic spots are popular:
# remove rows where ranking == 0; sort by ranking in ascending order
# take the top 20 attractions
ranking = df[df["ranking"] != 0].sort_values(by=["ranking"], ascending=True)[:20].reset_index(drop=True)
px.bar(ranking,          # the data frame to plot
       x="cn_title",     # x-axis field
       y="ranking",      # y-axis field
       color="ranking"   # field used for color
      )
Top of the list is Gulangyu 😭. Shell 🐚 Dream World, Xiamen Undersea World, Xiamen Dadeji Beach, Sunlight Rock and so on are all scenic spots on Gulangyu Island, so Gulangyu really is very popular.
Xiamen University and the adjacent Nanputuo Temple are also popular with tourists. A few years ago Xiamen built a new landmark, the Shimao Twin Towers, which also attracts a lot of visitors.
Number of guides strategy
Many tourists like to write travel guides for others to refer to after visiting a scenic spot. Let's look at which spots have the most guides:
px.scatter(df,              # the data to plot
           x="cn_title",    # x-axis field
           y="strategy",    # y-axis field
           color="strategy" # field used for color
          )
Conclusion: the data shows that Xiamen University is the scenic spot tourists write the most guides about, followed by Nanputuo Temple and Zhongshan Road Pedestrian Street.
Number of comments comment
Let's look at the number of tourist comments: sort in descending order, take the top 20 scenic spots, and display the first 10 rows:
comment = df[df["comment"] != 0].sort_values(by=["comment"],ascending=False)[:20].reset_index(drop=True)
comment.head(10)
px.scatter(comment,         # the data frame
           x="cn_title",    # x-axis field
           y="comment",     # y-axis field
           color="comment"  # field used for color
          )
Using the number of comments and the ranking, we draw a combined figure with marginal plots and a trend line:
fig = px.scatter(comment,             # the data frame
                 x="ranking",         # x-axis field
                 y="comment",         # y-axis field
                 color="ranking",     # field used for color
                 marginal_y="violin", # marginal plot for the y axis
                 marginal_x="box",    # marginal plot for the x axis
                 trendline="ols",     # OLS trend line
                 template="simple_white")  # plot template
fig.show()
Gulangyu again ranks first 😭, with Xiamen University, Nanputuo Temple, and Zhongshan Road Pedestrian Street following.
Brief introduction abstract
Finally, we analyze the brief introductions of the scenic spots, using WordCloud to draw a word cloud. Start by filling in the empty values in the introduction field.
abstract = df.fillna(value="") # Missing value fill
abstract_list = abstract["abstract"].tolist()  # convert the abstract column to a list
abstract_list[:10]  # show the first 10 introductions
Next we segment the text with jieba and add every token to one big list:
jieba_list = []
for i in range(len(abstract_list)):
    # jieba word segmentation
    seg_list = jieba.cut(str(abstract_list[i]).strip(), cut_all=False)
    for each in list(seg_list):
        jieba_list.append(each)
jieba_list[:10]
First attempt: drawing directly with WordCloud
from wordcloud import WordCloud
import matplotlib.pyplot as plt
text = "".join(i for i in jieba_list) # String to be processed
# Download the simhei. TTF font and place it in one of your own directories
font = r'/Users/peter/Desktop/spider/SimHei.ttf'
wc = WordCloud(collocations=False,
font_path=font, # path
max_words=2000,width=4000,
height=4000, margin=2).generate(text.lower())
plt.imshow(wc)
plt.axis("off")
plt.show()
wc.to_file('xiamen.png') # Save the word cloud
As the word cloud shows, Xiamen and Gulangyu are very prominent in the introductions. Of course, there are also many meaningless words, such as "located", "here", and so on. Next we filter them out with a stop-word list collected online:
Second attempt: using a self-collected stop-word list
# create the list of stop words
def StopWords(filepath):
    stopwords = [line.strip() for line in open(filepath, 'r', encoding='utf-8').readlines()]
    return stopwords

# pass in the path of the stop-word list
stopwords = StopWords('/Users/peter/Desktop/Publish/nlp_stopwords.txt')

stopword_list = []
for word in jieba_list:
    if word not in stopwords:
        if word != "\t" and word != "":
            stopword_list.append(word)
stopword_list[:10]
After applying the stop-word list, commas and other punctuation are gone, and many low-value words have been removed as well. Next, we use a portrait image as the mask to draw the word cloud:
from os import path
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
d = path.dirname('.')  # use this line when running interactively
# d = path.dirname(__file__)

# pass in the new word list
text = " ".join(stopword_list)
# https://www.deviantart.com/jirkavinse/art/Real-Life-Alice-282261010
alice_coloring = np.array(Image.open(path.join(d, "wordcloud.jpg")))
# Set the stop word
stopwords = set(STOPWORDS)
stopwords.add("said")
# font path
font = r'/Users/peter/Desktop/spider/SimHei.ttf'
wc = WordCloud(background_color="white", font_path=font,
max_words=2000, mask=alice_coloring,
height=6000,width=6000,
stopwords=stopwords, max_font_size=40, random_state=42)
wc.generate(text)
image_colors = ImageColorGenerator(alice_coloring)
plt.imshow(wc.recolor(color_func=image_colors), interpolation="bilinear")
plt.axis("off")
plt.show()
wc.to_file('xiamen.png')  # save the word cloud
The final drawing is shown as follows:
Conclusion
Based on data crawled from the web, this article analyzes Xiamen's scenic spots and looks at where people like to go in Xiamen:
- Gulangyu is so popular that it is almost a must for tourists
- Scenic spots in the Xiamen University area: the Siming campus of Xiamen University (Furong Lake, Furong Tunnel, Songen Building, etc.), Nanputuo Temple next to it, and the Five Old Peaks above the temple
- If you like the sea and cycling, go to: Xiamen University Baicheng Beach, Yanwu Bridge, Huandao Road, Hulishan Fortress, Coconut Wind Village
- If you are a food lover, head to: Zhongshan Road Pedestrian Street, Zengcuo'an, Taiwan Snack Street
- If you are the artsy type, Shapowei Art West is not to be missed
Last sentence: Xiamen welcomes you! 😃
Everything that seems to have passed away has never left. The love and warmth you have given me make me guard this place persistently.
A little cabin, a sweet little cabin. Its owner writes code for a living with one hand and wields a cooking spoon to enjoy life with the other. You are welcome here 😃
Welcome to scan the QR code and follow the WeChat public account "You and hut", where I will take you into data analysis and into good cooking 😃