
Preface

This post uses Python to visualize subway data for Chinese cities. Without further ado, let's get started.

Development tools

Python version: 3.6.4

Related modules:

requests

wordcloud

pandas

numpy

jieba

pyecharts

matplotlib

and some modules from the Python standard library.

Environment setup

Install Python and add it to the PATH environment variable, then use pip to install the required modules.

This time we collect subway line data and run a visualization analysis of how subways are distributed across cities.

Analysis

The subway information is obtained from Amap.

First, obtain the “ID”, “cityName” and “name” of each city.

These values are used to build the request URL that returns the subway line details for that city.

Then find the corresponding request and get the details of the subway lines and stations in each city.
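As a minimal sketch of the URL-building step, the snippet below joins a city ID and cityName into the request URL. The base URL is copied from the request used in this post; the function name build_subway_url and the sample arguments are made up for illustration and are not real Amap identifiers.

```python
# Base URL copied from the request used in this post.
BASE_URL = 'http://map.amap.com/service/subway?_1555502190153&srhdata='


def build_subway_url(city_id, city_name):
    """Return the per-city subway data URL for a city ID and cityName."""
    return BASE_URL + city_id + '_drw_' + city_name + '.json'


# Placeholder values, not real Amap identifiers.
print(build_subway_url('0000', 'examplecity'))
```

Keeping the URL construction in its own function makes it easy to check the result before sending any request.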

Getting the data

Part of the code:

import json
import requests
from bs4 import BeautifulSoup

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}


def get_message(ID, cityname, name):
    """Fetch the subway lines and stations of one city and append them to subway.csv."""
    url = 'http://map.amap.com/service/subway?_1555502190153&srhdata=' + ID + '_drw_' + cityname + '.json'
    response = requests.get(url=url, headers=headers)
    html = response.text
    result = json.loads(html)
    # result['l'] lists the lines; each line's 'st' lists its stations
    for i in result['l']:
        for j in i['st']:
            # If the line has a non-empty 'la' label, append it in parentheses
            if len(i['la']) > 0:
                print(name, i['ln'] + '(' + i['la'] + ')', j['n'])
                with open('subway.csv', 'a+', encoding='gbk') as f:
                    f.write(name + ',' + i['ln'] + '(' + i['la'] + ')' + ',' + j['n'] + '\n')
            else:
                print(name, i['ln'], j['n'])
                with open('subway.csv', 'a+', encoding='gbk') as f:
                    f.write(name + ',' + i['ln'] + ',' + j['n'] + '\n')

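The flattening logic inside get_message can also be tried offline, without any request. The sketch below is only an illustration: the sample payload is made up and merely mimics the keys the code above reads ('l', 'ln', 'la', 'st', 'n'), and the helper name flatten_lines is hypothetical.

```python
import csv
import io


def flatten_lines(city, payload):
    """Turn one city's payload into (city, line, station) rows."""
    rows = []
    for line in payload['l']:
        # Append the 'la' label in parentheses only when it is non-empty.
        label = line['ln'] + ('(' + line['la'] + ')' if line['la'] else '')
        for station in line['st']:
            rows.append((city, label, station['n']))
    return rows


# Made-up payload, for illustration only.
sample = {'l': [
    {'ln': 'Line 1', 'la': '', 'st': [{'n': 'A'}, {'n': 'B'}]},
    {'ln': 'Line 2', 'la': 'Branch', 'st': [{'n': 'B'}]},
]}

rows = flatten_lines('City X', sample)

# Write to an in-memory buffer; csv.writer quotes any commas inside names.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

Separating the parsing from the file writing makes the row format easy to inspect before anything is appended to subway.csv.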

Result of the data acquisition

A total of 3,541 subway station records were collected.

Data visualization

First, the data were cleaned to remove the repeated transfer station information.

from wordcloud import WordCloud, ImageColorGenerator
from pyecharts import Line, Bar, Geo  # pyecharts 0.5.x import style; Geo is used by create_map
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import jieba

# Align column names with data
pd.set_option('display.unicode.ambiguous_as_wide', True)
pd.set_option('display.unicode.east_asian_width', True)
# Display at most 10 rows
pd.set_option('display.max_rows', 10)
# Read the data
df = pd.read_csv('subway.csv', header=None, names=['city', 'line', 'station'], encoding='gbk')
# Subway lines in each city (the 'station' column now holds the station count per line)
df_line = df.groupby(['city', 'line']).count().reset_index()
print(df_line)

Grouping by city and subway line gives the total number of subway lines in China.

There are 183 subway lines in total.
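To make the grouping concrete, here is a dependency-free sketch of the same counting on a few made-up rows. It uses collections.Counter instead of pandas, so it only illustrates what the groupby computes; the sample data is invented.

```python
from collections import Counter

# Made-up (city, line, station) rows in the same shape as subway.csv.
rows = [
    ('City X', 'Line 1', 'A'), ('City X', 'Line 1', 'B'),
    ('City X', 'Line 2', 'B'), ('City Y', 'Line 1', 'C'),
]

# Stations per (city, line): the analogue of grouping by city and line.
stations_per_line = Counter((city, line) for city, line, _ in rows)

# Each distinct (city, line) key is one subway line.
total_lines = len(stations_per_line)

# Lines per city: count the distinct (city, line) keys by city.
lines_per_city = Counter(city for city, _ in stations_per_line)

print(total_lines)
print(lines_per_city)
```

Note that 'Line 1' appears in both cities but counts as two lines, because the key is the (city, line) pair rather than the line name alone.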

def create_map(df):
    # Draw the cities on a map, using the 'line' column as the value of each city
    value = [i for i in df['line']]
    attr = [i for i in df['city']]
    geo = Geo("Distribution of metro cities already opened", title_pos='center', title_top='0', width=800, height=400, title_color="#fff", background_color="#404a59")
    geo.add("", attr, value, is_visualmap=True, visual_range=[0, 25], visual_text_color="#fff", symbol_size=15)
    geo.render("Distribution of open metro cities.html")


Number of subway lines in each city

32 cities have subways.

Urban distribution

Most of them are provincial capitals; the rest are cities with strong economies.

Distribution of the number of lines

Most cities are still in the “0-5” range, each with at least one line, of course.
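A distribution like this can be produced by bucketing each city's line count into fixed-width ranges. The sketch below is one possible way to do it, not necessarily how the chart in this post was built: the per-city counts are made up, the helper name bin_label is hypothetical, and the choice of which range a count of exactly 5 falls into is an assumption.

```python
from collections import Counter


def bin_label(n_lines, width=5):
    """Return a label such as '0-5' for a positive line count.

    Assumption: a count equal to the upper edge stays in the lower bin,
    so 5 lines falls in '0-5' and 6 lines in '5-10'.
    """
    lower = ((n_lines - 1) // width) * width
    return '{}-{}'.format(lower, lower + width)


# Made-up line counts per city, for illustration only.
lines_per_city = {'City X': 2, 'City Y': 7, 'City Z': 5}

# Count how many cities fall into each range.
distribution = Counter(bin_label(n) for n in lines_per_city.values())
print(distribution)
```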

# Which city and which line has the most subway stations
print(df_line.sort_values(by='station', ascending=False))

The line with the most subway stations

Beijing Line 10 ranks first, and Chongqing Line 3 ranks second.

Remove duplicate transfer station data

# Collapse transfer stations listed on several lines into one row per (city, station)
df_station = df.groupby(['city', 'station']).count().reset_index()
print(df_station)

After deduplication, 3,034 subway stations remain.

That is about 500 fewer than the 3,541 records collected (3,541 − 3,034 = 507).
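The drop in the count comes from transfer stations, which appear once per line they serve. A small dependency-free sketch on made-up rows shows the idea: deduplicating amounts to keeping the unique (city, station) pairs. It uses a set rather than pandas, purely for illustration.

```python
from collections import Counter

# Made-up (city, line, station) rows, for illustration only.
rows = [
    ('City X', 'Line 1', 'A'), ('City X', 'Line 1', 'B'),
    ('City X', 'Line 2', 'B'),   # 'B' is a transfer station, listed twice
    ('City Y', 'Line 1', 'B'),   # same name in another city stays separate
]

# Keep one entry per (city, station) pair.
unique_stations = {(city, station) for city, _, station in rows}

# Stations per city after deduplication.
stations_per_city = Counter(city for city, _ in unique_stations)

print(len(rows), len(unique_stations))
print(stations_per_city)
```

Keying on the (city, station) pair rather than the station name alone avoids merging stations that share a name in different cities.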

Let’s see which city has the most subway stations

# Count the number of subway stations in each city (each transfer station counted once)
print(df_station.groupby(['city']).count().reset_index().sort_values(by='station', ascending=False))

Wuhan turns out to have a surprisingly large number of subway stations.

Next, reproduce what New Weekly did and generate a word cloud of subway station names.

def create_wordcloud(df):
    """Generate a word cloud of subway station names."""
    # Segment each station name into words with jieba
    text = ''
    for line in df['station']:
        text += ' '.join(jieba.cut(line, cut_all=False))
        text += ' '
    # The background image serves as both the mask and the color source
    backgroud_Image = plt.imread('rocket.jpg')
    wc = WordCloud(
        background_color='white',
        mask=backgroud_Image,
        # The font file name is garbled in the source; point this at a font that covers Chinese
        font_path=r'C:\Windows\Fonts\ W8.ttf ',
        max_words=1000,
        max_font_size=150,
        min_font_size=15,
        prefer_horizontal=1,
        random_state=50,
    )
    wc.generate_from_text(text)
    img_colors = ImageColorGenerator(backgroud_Image)
    wc.recolor(color_func=img_colors)
    # Look at the word frequencies
    process_word = wc.process_text(text)
    sort = sorted(process_word.items(), key=lambda e: e[1], reverse=True)
    print(sort[:50])
    plt.imshow(wc)
    plt.axis('off')
    wc.to_file("Metro noun cloud.jpg")
    print('Generated word cloud successfully!')


create_wordcloud(df_station)

The resulting word cloud