Python 3.x: Crawling a NetEase Cloud Music Playlist

This article is intended strictly as beginner practice. The crawler does not cover paid music or lossless music. The complete code is at the end of the article.

Analysis

1. Obtain the request header

Use a packet capture tool such as Fiddler or Charles to capture the request headers sent when the playlist page loads. The captured headers are already written into the code.
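As a rough illustration of what "written into the code" means, the sketch below attaches the captured headers to a request object. It uses the standard library's urllib.request so it runs without third-party packages; with requests, the same dictionary would be passed as the headers argument. The helper name build_request is made up for this example.

```python
import urllib.request

# Header values as captured for the playlist page (same values as in the
# complete code at the end of the article).
HEADERS = {
    "Referer": "http://music.163.com/",
    "Host": "music.163.com",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
    ),
    "Accept": (
        "text/html,application/xhtml+xml,application/xml;q=0.9,"
        "image/webp,image/apng,*/*;q=0.8"
    ),
}


def build_request(url):
    """Hypothetical helper: wrap a URL in a Request carrying the headers."""
    return urllib.request.Request(url, headers=HEADERS)


# Nothing is sent yet; this only prepares the request.
req = build_request("http://music.163.com/playlist?id=2944697443")
print(req.get_header("Referer"))
```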

2. Analyze playlist link information

Open a playlist; its address is https://music.163.com/#/playlist?id=2944697443. Change the scheme to http and remove the /# part to get http://music.163.com/playlist?id=2944697443. This link points to the same playlist, and it is the address we use.
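The rewrite described above can be sketched as two string replacements. The function name to_crawlable_url is made up for this example, and it assumes the input has the https://music.163.com/#/... shape shown in the text.

```python
def to_crawlable_url(browser_url):
    """Hypothetical helper: switch to http and drop the '/#' part."""
    url = browser_url.replace("https://", "http://", 1)
    # '/#/playlist?id=...' becomes '/playlist?id=...'
    return url.replace("/#/", "/", 1)


print(to_crawlable_url("https://music.163.com/#/playlist?id=2944697443"))
```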

3. Get the ID of each song

Open the playlist link in your browser, right-click a song, and choose Inspect to see the song's ID, as shown in the figure.
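To show how the IDs could be pulled out programmatically, here is a minimal sketch. The complete code uses BeautifulSoup to find the links inside the ul element with class f-hide; this example swaps in the standard library's html.parser so it runs without extra packages. The sample HTML, song names, and IDs are invented, and the real page markup may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs

# Invented sample, modeled on the <ul class="f-hide"> list the full code reads.
SAMPLE_HTML = """
<ul class="f-hide">
  <li><a href="/song?id=101">First Song</a></li>
  <li><a href="/song?id=102">Second Song</a></li>
</ul>
"""


class SongLinkParser(HTMLParser):
    """Collect (song_id, song_name) pairs from links inside ul.f-hide."""

    def __init__(self):
        super().__init__()
        self.in_list = False
        self.current_href = None
        self.current_text = []
        self.songs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul" and "f-hide" in (attrs.get("class") or "").split():
            self.in_list = True
        elif tag == "a" and self.in_list:
            self.current_href = attrs.get("href")
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            # '/song?id=101' -> query 'id=101' -> '101'
            query = parse_qs(urlparse(self.current_href).query)
            song_id = query.get("id", [None])[0]
            self.songs.append((song_id, "".join(self.current_text).strip()))
            self.current_href = None
        elif tag == "ul":
            self.in_list = False


def extract_songs(html):
    parser = SongLinkParser()
    parser.feed(html)
    return parser.songs


print(extract_songs(SAMPLE_HTML))
```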

4. The external link address of a song

NetEase Cloud Music's external link address for MP3 files is music.163.com/song/media/…

Judging from this address, we only need to swap in a different ID to reach a different song.
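Swapping in the ID can be sketched as below. The URL pattern is the one the complete code builds from each song link; the function names and the ID 101 are made up for this example.

```python
OUTER_URL_PREFIX = "http://music.163.com/song/media/outer/url"


def outer_url_from_id(song_id):
    """Hypothetical helper: external MP3 link for a given song ID."""
    return "%s?id=%s.mp3" % (OUTER_URL_PREFIX, song_id)


def outer_url_from_href(href):
    """Same result, starting from a link such as '/song?id=101'.

    href[5:] drops the leading '/song', which is what the complete code does.
    """
    return OUTER_URL_PREFIX + href[5:] + ".mp3"


print(outer_url_from_id("101"))
print(outer_url_from_href("/song?id=101"))
```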

The address appears in the browser as shown below

Note that requesting the external MP3 link triggers a redirect, so the code has to handle that as well.
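One way to find out where a link ends up is to let the HTTP client follow the redirect and then read the final URL. The complete code does this with requests and response.url; the sketch below uses the standard library's urlopen and geturl() instead. To avoid depending on the real service, it starts a throwaway local server that redirects /outer to /real/file.mp3. The server, its paths, and the function names are all invented for the demonstration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def resolve_redirect(url, opener=None):
    """Return the URL reached after following any HTTP redirects."""
    opener = opener or urllib.request.build_opener()
    with opener.open(url, timeout=10) as response:
        return response.geturl()


class RedirectDemoHandler(BaseHTTPRequestHandler):
    """Stand-in server: /outer answers 302, everything else answers 200."""

    def do_GET(self):
        if self.path == "/outer":
            self.send_response(302)
            self.send_header("Location", "/real/file.mp3")
            self.end_headers()
        else:
            body = b"fake mp3 bytes"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo():
    server = HTTPServer(("127.0.0.1", 0), RedirectDemoHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Bypass any system proxy so the request reaches the local server.
        direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        return resolve_redirect("http://127.0.0.1:%d/outer" % port, direct)
    finally:
        server.shutdown()
        server.server_close()


print(demo())
```

For a real song, resolve_redirect would be called with the external link address instead of the local demo address.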

5. Save the MP3 file
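The complete code saves each file with urllib.request.urlretrieve into a folder named wangyi. A minimal sketch of that step follows; the function name save_mp3 is made up, and creating the folder automatically is an addition here, since the original expects the folder to exist already.

```python
import os
import urllib.request


def save_mp3(url, name, folder="wangyi"):
    """Hypothetical helper: download url to '<folder>/<name>.mp3'."""
    # Addition for this sketch: create the folder if it is missing.
    os.makedirs(folder, exist_ok=True)
    target = os.path.join(folder, "%s.mp3" % name)
    urllib.request.urlretrieve(url, target)
    return target
```

Here url should be the address after the redirect has been followed, and name is the song title taken from the playlist page.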

The complete code

    #!python3
    # encoding=utf8
    import requests
    from bs4 import BeautifulSoup
    import urllib.request

    # Request headers captured with the packet capture tool
    headers = {
        'Referer': 'http://music.163.com/',
        'Host': 'music.163.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    }

    # Playlist address
    play_url = 'http://music.163.com/playlist?id=2308326138'

    s = requests.session()
    response = s.get(play_url, headers=headers).content
    # print(response)  # the entire HTML source
    soup = BeautifulSoup(response, 'lxml')
    main = soup.find('ul', {'class': 'f-hide'})
    # print(main)

    lists = []
    for music in main.find_all('a'):
        # print('{} : {}'.format(music.text, music['href']))
        musicName = music.text
        # music['href'] looks like '/song?id=...'; [5:] keeps '?id=...'
        musicUrl = 'http://music.163.com/song/media/outer/url' + music['href'][5:] + '.mp3'
        # Put each song's name and URL into lists
        lists.append([musicName, musicUrl])
    print(lists)


    def get_redirect_url(url):
        # url is the address before redirection; requests follows the redirect
        response = requests.get(url, headers=headers)
        # print(response.status_code)
        # print(response.url)
        return response.url  # the URL after the redirect


    # Create the 'wangyi' folder manually before running
    for i in lists:
        url = i[1]
        name = i[0]
        try:
            print('downloading', name)
            # Resolve the redirect first, then download from the final URL
            urllib.request.urlretrieve(get_redirect_url(url), './wangyi/%s.mp3' % name)
            print('download succeeded')
        except Exception:
            print('download failed')