Preface
This article uses Python to visualize data from NetEase Cloud Music playlists.
Without further ado, let's get started.
Development tools
Python version: 3.6.4
Related modules:
requests module;
pandas module;
matplotlib module;
bs4 (BeautifulSoup) module, which the scraping code imports;
and some modules that come with Python.
Environment setup
Install Python and add it to the environment variables, then use pip to install the required modules.
This time we collect data on NetEase Cloud Music's Chinese-language playlists and analyze it visually.
The charts are drawn with Matplotlib, a fairly low-level visualization library.
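Before running the scripts, it can help to confirm that the modules are actually importable. The snippet below is a minimal sketch, not part of the original code; the helper name missing_modules is made up for illustration, and the module list is taken from the imports used in this article.

```python
import importlib.util

# Import names used by the scripts in this article (bs4 is the import name of beautifulsoup4).
REQUIRED = ['requests', 'pandas', 'matplotlib', 'bs4']


def missing_modules(names):
    """Return the names that cannot be found in the current environment (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print('Missing modules:', ', '.join(missing))
else:
    print('All required modules are installed.')
```

Anything reported as missing can then be installed with pip.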
Web page analysis
Playlist index page
Select the hot playlists page of the Chinese category.
From it, get each playlist's play count, name, and author, plus the link to the playlist details page.
A total of 1,302 Chinese playlists were collected.
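The index page is paginated, 35 playlists per page, through the limit and offset query parameters. As a rough sketch of how the page URLs could be generated, the snippet below builds them with the standard library only and makes no network requests. The function name index_page_urls and the category value 华语 are assumptions made for illustration.

```python
from urllib.parse import urlencode

BASE_URL = 'https://music.163.com/discover/playlist/'
PAGE_SIZE = 35  # playlists per index page, matching limit=35


def index_page_urls(category, total):
    """Build one index-page URL per block of PAGE_SIZE playlists (hypothetical helper)."""
    urls = []
    for offset in range(0, total, PAGE_SIZE):
        query = urlencode({
            'cat': category,      # assumed category name, percent-encoded by urlencode
            'order': 'hot',
            'limit': PAGE_SIZE,
            'offset': offset,
        })
        urls.append(BASE_URL + '?' + query)
    return urls


urls = index_page_urls('华语', 1302)
print(len(urls))   # 38 pages cover 1,302 playlists
print(urls[-1])    # the last page starts at offset 1295
```

Using urlencode avoids stray spaces in the query string and takes care of encoding the non-ASCII category name.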
Playlist details page
The playlist details page provides more information.
This includes the playlist name, number of favorites, number of comments, tags, introduction, total number of songs, play count, and the titles of the songs included.
The song length, artist, and album information sits inside an iframe on the page.
To get that information, you can use Selenium.
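For reference, reading the iframe with Selenium could look roughly like the sketch below. This is not the code used for this article. The function name fetch_iframe_html is made up, the frame name 'g_iframe' is an assumption about the page markup that should be checked in the browser's developer tools, and a Chrome driver is assumed to be available.

```python
def fetch_iframe_html(url, frame_name='g_iframe'):
    """Open url in Chrome, switch into the named iframe, and return its HTML.

    frame_name is an assumed id/name for the content iframe; verify it on the real page.
    """
    # Imported lazily so that defining this function does not require Selenium.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # After this call, page_source refers to the iframe's document.
        driver.switch_to.frame(frame_name)
        return driver.page_source
    finally:
        driver.quit()
```

The returned HTML can be passed to BeautifulSoup in the same way as response.text in the scripts in this article.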
Getting the data
Playlist index page
from bs4 import BeautifulSoup
import requests
import time

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}

# Each index page holds 35 playlists, so step the offset by 35
for i in range(0, 1330, 35):
    print(i)
    time.sleep(2)
    # cat=欧美 is the Europe and America category; for the Chinese playlists
    # analyzed in this article the category would presumably be cat=华语
    url = 'https://music.163.com/discover/playlist/?cat=欧美&order=hot&limit=35&offset=' + str(i)
    response = requests.get(url=url, headers=headers)
    html = response.text
    soup = BeautifulSoup(html, 'html.parser')
    # Get the tags that contain the playlist details page URL
    ids = soup.select('.dec a')
    # Get the tags that contain the playlist index page information
    lis = soup.select('#m-pl-container li')
    print(len(lis))
    for j in range(len(lis)):
        # Get the playlist details page address
        url = ids[j]['href']
        # Get the playlist title
        title = ids[j]['title']
        # Get the playlist play count
        play = lis[j].select('.nb')[0].get_text()
        # Get the playlist contributor name
        user = lis[j].select('p')[1].select('a')[0].get_text()
        # Output the playlist index page information
        print(url, title, play, user)
        # Append the information to a CSV file
        with open('playlist.csv', 'a+', encoding='utf-8-sig') as f:
            f.write(url + ',' + title + ',' + play + ',' + user + '\n')
The code above collects the information from the playlist index pages.
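Joining fields with commas by hand breaks the column layout whenever a title itself contains a comma, which is presumably why the details script reads the file with error_bad_lines=False. One possible alternative, shown here only as a sketch, is to let the standard csv module quote such fields. The helper name append_playlist_row and the sample values are made up for illustration.

```python
import csv
import io


def append_playlist_row(f, url, title, play, user):
    """Write one playlist row; csv.writer quotes fields that contain commas (hypothetical helper)."""
    writer = csv.writer(f)
    writer.writerow([url, title, play, user])


# Demonstrate with an in-memory buffer instead of playlist.csv.
buffer = io.StringIO()
append_playlist_row(buffer, '/playlist?id=1', 'Rock, pop and more', '1200', 'some_user')

buffer.seek(0)
rows = list(csv.reader(buffer))
print(rows)  # the title survives as a single field despite its comma
```

When writing to a real file with this approach, open it with newline='' as the csv documentation recommends.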
Playlist details page
Part of the code
from bs4 import BeautifulSoup
import pandas as pd
import requests
import time

# error_bad_lines=False skips malformed rows on older pandas;
# on pandas 1.3 and later use on_bad_lines='skip' instead
df = pd.read_csv('playlist.csv', header=None, error_bad_lines=False, names=['url', 'title', 'play', 'user'])

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}

for i in df['url']:
    time.sleep(2)
    url = 'https://music.163.com' + i
    response = requests.get(url=url, headers=headers)
    html = response.text
    soup = BeautifulSoup(html, 'html.parser')
    # Get the playlist title, replacing ASCII commas with full-width ones
    # so that the title does not break the CSV columns
    title = soup.select('h2')[0].get_text().replace(',', '，')
    # Get the tags
    tags = []
    tags_message = soup.select('.u-tag i')
    for p in tags_message:
        tags.append(p.get_text())
    # Format the tags as a single '-'-separated string
    if len(tags) > 1:
        tag = '-'.join(tags)
    else:
        tag = tags[0]
This yields the details of 1,302 Chinese playlists.
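One detail worth noting in the tag formatting step: indexing tags[0] raises an IndexError if a playlist has no tags at all. A small helper along the following lines would cover that case. It is a sketch only, and the name format_tags is made up for illustration.

```python
def format_tags(tags):
    """Join tag names with '-', returning an empty string when there are none (hypothetical helper)."""
    cleaned = [t.strip() for t in tags if t.strip()]
    return '-'.join(cleaned)


print(format_tags(['华语', '流行']))  # 华语-流行
print(format_tags(['华语']))          # 华语
print(repr(format_tags([])))          # ''
```

Because '-'.join already handles lists of length zero and one, the separate if/else branches are not needed with this approach.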
Data visualization
Top 10 songs
Little F has listened to every song on this list many times, except “Mercury”.
Top 10 playlist contributors (uploaders)
Top 10 playlists by play count
The top 10 playlists each have more than 70 million plays.
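The plotting code is not shown in this article. As a rough idea of how a top-10 chart by play count could be produced, the sketch below ranks rows in plain Python and draws a horizontal bar chart with Matplotlib. The function names, the 'title' and 'play' keys, and the sample numbers are all made up for illustration, and play counts are assumed to have been converted to integers already.

```python
def top_playlists(rows, n=10):
    """Return the n rows with the highest play count, largest first (hypothetical helper)."""
    return sorted(rows, key=lambda row: row['play'], reverse=True)[:n]


def plot_top_playlists(rows, path='top10_play.png'):
    """Save a horizontal bar chart of the top playlists to path."""
    # Imported lazily so the ranking helper works without Matplotlib installed.
    import matplotlib
    matplotlib.use('Agg')  # render to a file, no display needed
    import matplotlib.pyplot as plt

    top = top_playlists(rows)
    # Reverse so that the largest bar is drawn at the top of the chart.
    titles = [row['title'] for row in top][::-1]
    plays = [row['play'] for row in top][::-1]

    fig, ax = plt.subplots(figsize=(8, 5))
    ax.barh(titles, plays)
    ax.set_xlabel('Play count')
    ax.set_title('Top 10 playlists by play count')
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


# Made-up sample data, not taken from the scraped results.
sample = [
    {'title': 'Playlist A', 'play': 120},
    {'title': 'Playlist B', 'play': 950},
    {'title': 'Playlist C', 'play': 430},
]
print([row['title'] for row in top_playlists(sample, n=2)])  # ['Playlist B', 'Playlist C']
```

With Chinese playlist titles, Matplotlib usually needs a font that contains CJK glyphs to be configured, otherwise the labels render as empty boxes.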
Top 10 playlists by favorites
These are worth adding to your favorites.
Top 10 playlists by comments
The playlist “Goodbye Warrior: The death of Martial arts novel master Jin Yong” received the most comments.
Distribution of playlist favorites
The counts mainly fall between 0 and 150,000 (ln(150000) ≈ 11.9, roughly 12).
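The logarithm in parentheses suggests the chart uses a natural-log axis for the favorites count, although the plotting code is not shown. The quick check below confirms the arithmetic and shows what raw count a log value of exactly 12 corresponds to.

```python
import math

favorites_upper = 150000

# Natural log of the upper bound quoted above.
log_value = math.log(favorites_upper)
print(round(log_value, 2))    # 11.92

# Going the other way: a value of 12 on a natural-log axis.
print(round(math.exp(12)))    # 162755
```

So 12 on the log scale is a slight overestimate of 150,000, which is close enough for reading a histogram.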
Distribution of playlist play counts
Play counts mainly fall between 0 and 10 million.
Playlist tag chart
Since the playlists were selected from the Chinese category, the “Chinese” tag is bound to appear.
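Because the details script stores each playlist's tags as a single '-'-separated string, counting tags means splitting that string again. The following is a minimal sketch of that step using only the standard library. The function name count_tags and the sample values are made up for illustration.

```python
from collections import Counter


def count_tags(tag_column):
    """Count how often each tag appears across '-'-joined tag strings (hypothetical helper)."""
    counts = Counter()
    for cell in tag_column:
        # Skip empty pieces that appear when a playlist has no tags.
        counts.update(t for t in cell.split('-') if t)
    return counts


# Made-up sample values in the same format as the scraped tag column.
sample = ['华语-流行', '华语-怀旧', '华语', '']
counts = count_tags(sample)
print(counts.most_common(1))  # [('华语', 3)]
```

A tag name that itself contains a hyphen would be split incorrectly by this approach, so a different separator may be safer when writing the data.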
Playlist introduction word cloud
This word cloud is built from the playlist introductions. Hopefully you can find a song you like.