Pitfalls I ran into when scraping music from a website with a Python crawler

Preface

Recently, I wanted to download some music from a website and put it on my USB drive to listen to. However, when I searched online, every major music website required a VIP membership to download songs (especially the good ones).

For someone as broke as I am, spending dozens of dollars to download a few songs was out of the question, and as a programmer I was not about to pay for music downloads. So I spent a day digging through material online to learn how to get the music from the website without paying.

Of course, there are many ways to do this, but in the end I chose one of the simplest and most convenient: a Python crawler. Below, I will share some of the problems I ran into while using Python crawlers.

Here’s how I learned the Python crawler:

Ideas:

Where does the music come from? From the website's server.
How do I get the music from the URL? By sending a network request to the website.
Filter the response and pick out the music files.
Download the music files.

Implementation

import requests  # Third-party library that sends network requests

Installation method

pip install requests

Import a third-party library for data parsing

from lxml import etree # Third-party library for data parsing

Installation method

pip install lxml

The URL of the NetEase Cloud Music chart page is 'music.163.com/#/discover/…'

url = 'https://music.163.com/#/discover/toplist?id=3778678'

Send a request for page data

response = requests.get(url=url)  # Request page data
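One detail worth checking at this step: the part of a URL after `#` is a fragment, which stays on the client side and is not sent to the server as part of the HTTP request. The minimal sketch below, using only the standard library, shows how the chart URL above splits into its components. Reading this as a reason the song list may be missing from `response.text` is an assumption, not something the article confirms.

```python
from urllib.parse import urlsplit

url = 'https://music.163.com/#/discover/toplist?id=3778678'
parts = urlsplit(url)

# Only the scheme, host, path and query take part in the HTTP request.
# Everything after '#' is the fragment, including the '?id=...' here.
print(parts.path)      # prints: /
print(parts.query)     # prints an empty line
print(parts.fragment)  # prints: /discover/toplist?id=3778678
```

If that is what happens here, the server only ever sees a request for the site root, and the chart content would have to be filled in by the browser afterwards.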

Parse the data

html = etree.HTML(response.text)  # Parse page data

Get the collection of all song tags (<a> tags)

id_list = html.xpath('//a[contains(@href,"song?")]')  # Collection of all song <a> tags

Download the song

import os

base_url = 'http://music.163.com/song/media/outer/url?id='  # Download URL prefix
# Download URL = URL prefix + music id
os.makedirs('./music', exist_ok=True)  # open() below fails if the folder is missing
for data in id_list:
    href = data.xpath('./@href')[0]
    music_id = href.split('=')[1]  # Music id
    music_url = base_url + music_id  # Download URL for this song
    music_name = data.xpath('./text()')[0]  # Song name
    music = requests.get(url=music_url)
    # Save the downloaded music as a file
    with open('./music/%s.mp3' % music_name, 'wb') as file:
        file.write(music.content)
        print('<%s> Download successful' % music_name)
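The loop above trusts that every href contains a usable id and that every song name is a valid file name. A slightly more defensive sketch of those two steps is shown below, using only the standard library. The helper names `extract_song_id` and `safe_filename` are hypothetical, and the rule that a song id consists only of digits is an assumption based on the ids that appear in this article.

```python
import re
from urllib.parse import urlsplit, parse_qs


def extract_song_id(href):
    """Return the numeric 'id' query parameter of href, or None.

    Assumption: real song ids are made of digits only.
    """
    values = parse_qs(urlsplit(href).query).get('id')
    if values and values[0].isdigit():
        return values[0]
    return None


def safe_filename(name):
    """Replace characters that are not allowed in file names on common systems."""
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', name).strip()
    return cleaned or 'untitled'


print(extract_song_id('/song?id=1804320463'))  # prints: 1804320463
print(extract_song_id('/song?id=not-a-number'))  # prints: None
print(safe_filename('AC/DC: Live?'))  # prints: AC_DC_ Live_
```

Skipping any link for which `extract_song_id` returns None avoids building a download URL from something that is not a song id.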

The pitfalls

I learned the method above from a video that was made six months ago. It may have worked back then, but when I used it to download music files today, it failed.

First, the editor reported that music_name and music_id could not be found. Looking closely, I saw that the ids in id_list (the collection of <a> tags) were not real ids but some kind of code. My guess is that the music website has added an anti-crawler mechanism here.

Second, I picked a song on the website, looked up its id, and assigned that id to music_id directly. When I then tried to download the song through the external link, the response was error -460, "network is crowded", so the download URL was probably not working for my request.

base_url = 'http://music.163.com/song/media/outer/url?id='
music_id = '1804320463.mp3'
music_url = base_url + music_id
music = requests.get(url=music_url)
print(music.text)

{"msg":"Network is crowded, please try again later!","code":-460,"message":"Network is crowded, please try again later!"}
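A response like this would still be written to disk as an .mp3 by the download loop, because the loop never looks at what came back. The sketch below shows one way to tell an audio response from a JSON error before saving. The function name `describe_download` is hypothetical, and the expectation that a real song arrives with an `audio/...` Content-Type is an assumption, not something the article confirms.

```python
import json


def describe_download(content_type, body):
    """Classify a response as ('audio', None) or ('error', message).

    content_type: value of the Content-Type header (str)
    body: raw response bytes
    """
    # Assumption: a real music file is served with an audio/* content type.
    if content_type.lower().startswith('audio/'):
        return ('audio', None)
    try:
        payload = json.loads(body.decode('utf-8'))
    except ValueError:  # covers both decoding and JSON parsing errors
        return ('error', 'unrecognized response body')
    if isinstance(payload, dict) and 'code' in payload:
        return ('error', '%s: %s' % (payload['code'], payload.get('msg')))
    return ('error', 'unexpected JSON response')


sample = b'{"msg":"Network is crowded","code":-460,"message":"Network is crowded"}'
print(describe_download('application/json', sample))
print(describe_download('audio/mpeg', b'\xff\xfb\x90'))
```

With requests, the two arguments would be `music.headers.get('Content-Type', '')` and `music.content`, and the file would be written only when the first element of the result is 'audio'.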

Finally, I printed music_url and opened it in a browser, where I could still listen to the song and download it.

base_url = 'http://music.163.com/song/media/outer/url?id='
music_id = '1804320463.mp3'
music_url = base_url + music_id
# music = requests.get(url=music_url)
print(music_url)

Music.163.com/song/media/…
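Since the same URL behaves differently in a browser and in a script, one thing to compare is the set of request headers each one sends. A bare `requests.get` call does not send a browser's User-Agent or a Referer. Whether that difference is what triggers the -460 response is not confirmed by this article, so treat the sketch below as an experiment to try. It uses the standard library's `urllib.request` to build a request without sending it, and the header values are placeholders.

```python
from urllib.request import Request

music_url = 'http://music.163.com/song/media/outer/url?id=1804320463.mp3'

# Hypothetical header values: copy the real ones from your own browser's
# developer tools. These strings are placeholders, not known-good values.
headers = {
    'User-Agent': 'Mozilla/5.0 (placeholder; replace with your browser value)',
    'Referer': 'https://music.163.com/',
}

request = Request(music_url, headers=headers)  # built only, nothing is sent

# urllib stores header names capitalized as 'User-agent' and 'Referer'.
print(request.get_header('User-agent'))
print(request.get_header('Referer'))
```

The same dictionary can be passed to requests as `requests.get(music_url, headers=headers)`.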

Conclusion

After all, some things are not simply handed to you. I wrote this article to share my experience of learning Python crawlers, and I also want to ask a question: how can I get the music files from this site onto my local computer?
