preface

Recently, I wanted to download some music from the website and put it into my USB disk to listen to. However, when I searched online, all the major music websites needed VIP to download songs (especially the good ones).

For poor people like me, it is certainly not to spend dozens of dollars to download a few songs, and as a programmer, it is impossible to download music with money, so I spent a day, the Internet to find a variety of materials to learn how to not spend money to buy the music on the website.

Of course, there are many ways to do this, but IN the end I chose one of the simplest and most convenient: the Python crawler. Below, I will share with you some of the bugs I encountered while using python crawlers.


Here’s how I learned the Python crawler:

Ideas:

  • Where does music come from? — In the server of the website
  • How do I get music from the url? — Initiate a network request to the website
  • Delete and select music files
  • Downloading music files

The specific implementation


import requests A third-party library that sends network requests
Copy the code

Installation method

pip install requests

  1. Introduce a third-party library for data parsing
from lxml import etree # Third-party library for data parsing
Copy the code

Installation method

pip install lxml

  1. A list of yi Cloud music website url is ‘music.163.com/#/discover/… ‘
url = 'https://music.163.com/#/discover/toplist?id=3778678'
Copy the code
  1. Send a request for page data
response = requests.get(url=url) Request page data
Copy the code
  1. Analytical data
html=etree.HTML(response.text) Parse page data
Copy the code
  1. Get the collection of all song tags (A tag)
id_list = html.xpath('//a[contains(@href,"song?")]')  # Collection of all song ids
Copy the code
  1. Download the song
base_url = 'http://music.163.com/song/media/outer/url?id=' # Download music url prefix
# Download music URL = url prefix + music ID
for data in id_list:
    href = data.xpath('./@href') [0]
    music_id = href.split('=') [1] # music id
    music_url = base_url + music_id Download music url
    music_name = data.xpath('./text()') [0] # Download music name
    music = requests.get(url = music_url)
    Save downloaded music as a file
    with open('./music/%s.mp3' % music_name, 'wb') as file:
         file.write(music.content)
         print('<%s> Download successful ' % music_name)
Copy the code

In the pit of

The above method I learned from a video, that video was six months ago, maybe this method worked, but today I use this method to download music files suddenly error. First of all, the editor error could not find music_name and music_id, I carefully see, obtain id_list set (i.e., tag set) in the id is not id, is a code, estimate music website here also made the corresponding an mechanism. Secondly, I found a music on the website and obtained its ID and assigned the ID to music_id. As a result, when downloading music with external link, the error was 460, indicating that the network was crowded and the website for downloading music was probably not working.

base_url = 'http://music.163.com/song/media/outer/url?id='
music_id = '1804320463.mp3'
music_url = base_url + music_id
music = requests.get(url=music_url)
print(music.text)
Copy the code

{” MSG “:” Network is crowded, please try again later!” “,”code”:-460,”message”:” Network is crowded, please try again later!” }e

Finally, I print out music_URL, click on it, and I can still listen to music and download it

base_url = 'http://music.163.com/song/media/outer/url?id='
music_id = '1804320463.mp3'
music_url = base_url + music_id
# music = requests.get(url=music_url)
print(music_url)
Copy the code

Music.163.com/song/media/…

conclusion

After all, there are some things you can’t just give you. I’m writing this article to share with you my experience in learning python crawlers. At the same time, I also want to ask you some questions. How can I get the music files from this site onto my local computer?