This is the 27th day of my participation in the November Writing Challenge. For event details, see: The Last Writing Challenge of 2021.
Preface
This post uses Python to visualize data on China's subway systems. Without further ado,
let's get started.
Development tools
Python version: 3.6.4
Related modules:
requests module;
wordcloud module;
pandas module;
numpy module;
jieba module;
pyecharts module;
matplotlib module;
and some modules from the Python standard library.
Environment setup
Install Python and add it to your PATH environment variable, then use pip to install the required modules.
This post collects subway line data and then visualizes and analyzes how the lines and stations are distributed across cities.
Analysis
The subway information is obtained from Amap.
First, obtain the “ID”, “cityName” and “name” of each city.
These values are spliced into the request URL used to fetch the subway line information for that city.
Then locate the request and retrieve the details of the subway lines and stations in each city.
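The URL-splicing step described above can be sketched as a small helper. This is only a minimal sketch: `build_subway_url` is a hypothetical name, the URL pattern is copied from the request code in this post, and the `'1100'` and `'beijing'` arguments are placeholder inputs for illustration, not values confirmed here.

```python
# Base endpoint copied from the request URL used in this post.
BASE_URL = 'http://map.amap.com/service/subway?_1555502190153&srhdata='


def build_subway_url(city_id, city_name):
    """Splice a city's ID and cityName into the subway-line data URL."""
    return BASE_URL + city_id + '_drw_' + city_name + '.json'


# Placeholder inputs for illustration only.
url = build_subway_url('1100', 'beijing')
print(url)
```

Keeping the splicing in one function makes it easy to loop over a list of cities without repeating the string concatenation.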
Getting the data
Part of the code
import json
import requests
from bs4 import BeautifulSoup

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}


def get_message(ID, cityname, name):
    """Fetch the subway line information for one city and append it to subway.csv."""
    url = 'http://map.amap.com/service/subway?_1555502190153&srhdata=' + ID + '_drw_' + cityname + '.json'
    response = requests.get(url=url, headers=headers)
    html = response.text
    result = json.loads(html)
    # 'l' holds the lines; each line's 'st' holds its stations
    for i in result['l']:
        for j in i['st']:
            # If the line has a non-empty 'la' field, append it in parentheses
            if len(i['la']) > 0:
                print(name, i['ln'] + '(' + i['la'] + ')', j['n'])
                with open('subway.csv', 'a+', encoding='gbk') as f:
                    f.write(name + ',' + i['ln'] + '(' + i['la'] + ')' + ',' + j['n'] + '\n')
            else:
                print(name, i['ln'], j['n'])
                with open('subway.csv', 'a+', encoding='gbk') as f:
                    f.write(name + ',' + i['ln'] + ',' + j['n'] + '\n')
Retrieved data
3,541 subway station records in total (a transfer station is recorded once for each line that serves it)
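The row-building logic in the scraper can also be separated from the network call, which makes it easier to check by hand. The following is a minimal sketch under stated assumptions: `iter_rows` is a hypothetical helper, and `sample` is a hand-written dictionary that only mimics the fields the scraper reads (`l`, `ln`, `la`, `st`, `n`); it is not real response data.

```python
import csv
import io


def iter_rows(city, result):
    """Yield (city, line, station) tuples from a parsed response dict."""
    for line in result['l']:
        # Append the 'la' value in parentheses when it is non-empty.
        line_name = line['ln'] + ('(' + line['la'] + ')' if line['la'] else '')
        for station in line['st']:
            yield city, line_name, station['n']


# Hand-written sample shaped like the fields the scraper reads.
sample = {'l': [
    {'ln': 'Line 1', 'la': '', 'st': [{'n': 'A'}, {'n': 'B'}]},
    {'ln': 'Line 2', 'la': 'East', 'st': [{'n': 'B'}]},
]}

rows = list(iter_rows('Demo City', sample))

# csv.writer takes care of quoting if a name ever contains a comma.
buffer = io.StringIO()
csv.writer(buffer, lineterminator='\n').writerows(rows)
print(buffer.getvalue())
```

Writing through `csv.writer` instead of manual string concatenation is a design choice in this sketch, not something the original code does; it avoids broken rows when a field contains the delimiter.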
Data visualization
First, the data were cleaned to remove the repeated transfer station information.
from wordcloud import WordCloud, ImageColorGenerator
from pyecharts import Line, Bar, Geo  # pyecharts 0.5.x import style; Geo is used by create_map
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import jieba

# Align column names with data
pd.set_option('display.unicode.ambiguous_as_wide', True)
pd.set_option('display.unicode.east_asian_width', True)
# Display at most 10 rows
pd.set_option('display.max_rows', 10)

# Read the data
df = pd.read_csv('subway.csv', header=None, names=['city', 'line', 'station'], encoding='gbk')
# Subway lines in each city
df_line = df.groupby(['city', 'line']).count().reset_index()
print(df_line)
Grouping by city and subway line gives the total number of subway lines in China.
There are 183 subway lines in total.
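To make the grouping step concrete, here is a minimal sketch on a toy table. The city, line, and station names are invented for illustration; only the column layout matches `subway.csv`.

```python
import pandas as pd

# Toy records in the same city/line/station layout as subway.csv.
df = pd.DataFrame(
    [
        ('City A', 'Line 1', 'North'),
        ('City A', 'Line 1', 'Central'),
        ('City A', 'Line 2', 'Central'),
        ('City B', 'Line 1', 'Harbor'),
    ],
    columns=['city', 'line', 'station'],
)

# One row per (city, line); 'station' holds that line's station count.
df_line = df.groupby(['city', 'line']).count().reset_index()
total_lines = len(df_line)

# Number of lines per city.
lines_per_city = df_line.groupby('city')['line'].count()

print(total_lines)
print(lines_per_city)
```

The length of `df_line` is the total number of lines, and a second grouping by city gives the per-city line counts that a map of cities would need.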
def create_map(df):
    # Draw the map
    value = [i for i in df['line']]
    attr = [i for i in df['city']]
    geo = Geo("Distribution of metro cities already opened", title_pos='center', title_top='0', width=800, height=400, title_color="#fff", background_color="#404a59")
    geo.add("", attr, value, is_visualmap=True, visual_range=[0, 25], visual_text_color="#fff", symbol_size=15)
    geo.render("Distribution of open metro cities.html")
The number of subway lines in each city.
32 cities have subways.
Urban distribution
Most of them are provincial capitals, and the rest are cities with strong economies.
Distribution of the number of lines
You can see that most cities are still in the “0-5” range, with at least one line each, of course.
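A distribution like the one described above can be computed by binning the per-city line counts. This is a hedged sketch: the counts below are made up, and the bin edges and labels are assumptions chosen to span the 0 to 25 range used for the map's visual range, not the bins of the original chart.

```python
import pandas as pd

# Made-up line counts per city, for illustration only.
lines_per_city = pd.Series({'City A': 1, 'City B': 4, 'City C': 7, 'City D': 22})

# Right-closed bins: (0, 5], (5, 10], (10, 15], (15, 20], (20, 25].
bins = [0, 5, 10, 15, 20, 25]
labels = ['0-5', '5-10', '10-15', '15-20', '20-25']
bucket = pd.cut(lines_per_city, bins=bins, labels=labels)

# Count the cities in each bucket, keeping the bucket order.
distribution = bucket.value_counts().sort_index()
print(distribution)
```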
# Which city and which line has the most subway stations
print(df_line.sort_values(by='station', ascending=False))
Which city and which line has the most subway stations
Beijing Line 10 ranks first and Chongqing Line 3 ranks second.
Remove duplicate transfer station data
# Remove duplicate records for transfer stations
df_station = df.groupby(['city', 'station']).count().reset_index()
print(df_station)
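After deduplication there are 3,034 subway stations.
That is about 500 fewer than the 3,541 records retrieved earlier (3,541 - 3,034 = 507), because each transfer station is now counted once per city.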
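The effect of grouping by city and station can be seen on a toy table. This is a minimal sketch with invented names; it also shows that the grouped `line` column can be read as the number of lines serving each station, which is an observation about the grouping rather than something the original post uses.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ('City A', 'Line 1', 'Central'),
        ('City A', 'Line 2', 'Central'),  # same station on a second line
        ('City A', 'Line 2', 'West'),
        ('City B', 'Line 1', 'Central'),  # same name, different city
    ],
    columns=['city', 'line', 'station'],
)

# Grouping by (city, station): 'line' becomes the number of lines serving it.
df_station = df.groupby(['city', 'station']).count().reset_index()

# Stations served by more than one line are the transfer stations.
transfers = df_station[df_station['line'] > 1]

print(len(df), len(df_station))
print(transfers)
```

Including the city in the grouping key matters: stations that share a name in different cities stay separate.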
Next, let’s see which city has the most subway stations.
# Count the number of subway stations in each city (duplicate transfer stations removed)
print(df_station.groupby(['city']).count().reset_index().sort_values(by='station', ascending=False))
Wuhan turns out to have a surprisingly large number of subway stations.
Following the approach used in New Weekly, generate a word cloud of subway station names.
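As an alternative to grouping twice, the per-city station count can be computed directly from the raw records with `nunique`. This is a sketch of a different route to the same kind of ranking, using invented data; it is not the method used in this post.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ('City A', 'Line 1', 'Central'),
        ('City A', 'Line 2', 'Central'),
        ('City A', 'Line 2', 'West'),
        ('City B', 'Line 1', 'Harbor'),
    ],
    columns=['city', 'line', 'station'],
)

# Count distinct station names per city, largest first.
stations_per_city = (
    df.groupby('city')['station'].nunique().sort_values(ascending=False)
)
print(stations_per_city)
```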
def create_wordcloud(df):
    """Generate a word cloud of subway station names."""
    # Word segmentation
    text = ''
    for line in df['station']:
        text += ' '.join(jieba.cut(line, cut_all=False))
        text += ' '
    background_image = plt.imread('rocket.jpg')
    wc = WordCloud(
        background_color='white',
        mask=background_image,
        # Path as given in the source; point this at a font that supports Chinese characters
        font_path=r'C:\Windows\Fonts\ W8.ttf ',
        max_words=1000,
        max_font_size=150,
        min_font_size=15,
        prefer_horizontal=1,
        random_state=50,
    )
    wc.generate_from_text(text)
    # Recolor the words using the colors of the background image
    img_colors = ImageColorGenerator(background_image)
    wc.recolor(color_func=img_colors)
    # Look at the word frequency
    process_word = wc.process_text(text)
    word_freq = sorted(process_word.items(), key=lambda e: e[1], reverse=True)
    print(word_freq[:50])
    plt.imshow(wc)
    plt.axis('off')
    wc.to_file("Metro noun cloud.jpg")
    print('Generated word cloud successfully!')


create_wordcloud(df_station)
Show word cloud
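The word-frequency check inside `create_wordcloud` boils down to counting tokens. Here is a minimal sketch of that idea using only the standard library. The token list is hard-coded and made up so the example runs without jieba, and plain counting is simpler than what `WordCloud.process_text` does (which also applies its own filtering), so treat it as an illustration rather than an equivalent.

```python
from collections import Counter

# Tokens as a segmenter might return them; hard-coded here so the
# example runs without jieba. These are made-up name fragments.
tokens = ['Road', 'Park', 'Road', 'Square', 'Road', 'Park']

# Drop whitespace-only tokens, then count occurrences.
freq = Counter(t for t in tokens if t.strip())

# The most frequent fragments are the ones drawn largest in the cloud.
top = freq.most_common(2)
print(top)
```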