Introduction
Do you often want to download music from the major music websites, only to find they force you to install their apps? And once you install the app, they push you to buy VIP... Never mind: today we’ll use a crawler to “sanction” these websites! Let’s start with the simplest one, Kugou Music!
Feature overview
The user enters the name of a song to search for, and the program displays every matching song along with its information. It then asks whether to download any of them. If so, the user enters the ID number(s) of the song(s) to download (batch download is supported).
Working out the approach
First, we need to know how to get the download address of a single song. Only then can we get the information and download addresses of multiple songs.
Go to www.kugou.com, type the name of the song you want in the search bar and press Enter. On the results page, click through to a song’s play page. Press F12 to open the developer tools, select Network, then click All. At first the list is empty, because the page had already loaded all of its files before the developer tools were opened. Just refresh the page with F5. Now every file the page loads appears in the list: js files, PNG files, audio files and more. Find the file whose name starts with index.php? and open its address.
The page at that address is the song’s information (for convenience I’ll call it the song’s information page from here on). It contains a field called play_url, which is the address of the song’s MP3 file. Notice that every / in play_url has been escaped as \/. We don’t have to worry about this in the browser, because the address bar adjusts \/ to / for us. When the crawler uses the URL in code, however, we must replace \/ with / ourselves. Open this play_url in the browser and, sure enough, it plays the song we just listened to.
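In code, undoing the escaped slashes is a one-line string replacement. The sketch below is a minimal illustration that assumes the URL has already been cut out of the information page as raw text. The helper names and the sample URL are hypothetical. Because \/ is a standard JSON escape, the sketch also shows json.loads as an alternative that decodes every escape in one go.

```python
import json

def unescape_url(escaped):
    # Undo JSON-escaped slashes, e.g. http:\/\/host\/a.mp3 -> http://host/a.mp3
    return escaped.replace("\\/", "/")

def unescape_json_string(raw):
    # Alternative: let the JSON parser decode every escape (\/, \uXXXX, ...) at once.
    # raw must be the body of a JSON string value, without the surrounding quotes.
    return json.loads('"' + raw + '"')

raw_play_url = r"http:\/\/fs.example.com\/song.mp3"  # made-up sample value
print(unescape_url(raw_play_url))          # http://fs.example.com/song.mp3
print(unescape_json_string(raw_play_url))  # http://fs.example.com/song.mp3
```

The plain replacement is enough for play_url. json.loads becomes more useful when the same string also contains \uXXXX escapes.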
With that working, the plan for the crawler is clear. We only need to find the information address of each song and use regular expressions to extract each song’s information and audio address. Then the crawler fetches the audio’s binary data and saves it locally.
So how do we get the information address of each song? By splicing it together! Compare the URLs of these two songs and you’ll see how.
Faded – wwwapi.kugou.com/yy/i…
Calories – wwwapi.kugou.com/yy/i…
Apart from the hash value, the two are identical. That means we just need to splice each song’s hash onto wwwapi.kugou.com/yy/i… to build its information address. So where do we find each song’s hash? Go back to Kugou’s search bar, search for any song and press Enter, and a list of songs appears. Repeat F12 → Network → All → F5, and among the loaded files we find the one that holds the search results.
Open that file’s address and you’ll see the hashes of all the songs on the search page. How do we get the address of this hash list for any search? Easy: splice the song name into the URL songsearch.kugou.com/… and that’s it.
The song name comes from the user, so in code we’ll read it with input(). Got the idea?
To sum up: splice the user’s song name into the search URL and extract every song’s hash from the search results with a regular expression. Then splice each hash into a song-information URL and get each song’s play_URL with another regular expression.
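The URL splicing in this plan can be sketched with urllib.parse.urlencode, which also percent-encodes spaces and non-ASCII song names. The real Kugou addresses are elided in this article. The base URLs and the keyword, page and hash parameter names below are therefore hypothetical placeholders. Substitute the ones you see in the developer tools.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: the real Kugou URLs are elided ("…") in the article,
# so these only demonstrate the splicing pattern.
SEARCH_URL = "https://search.example.invalid/song_search"
INFO_URL = "https://api.example.invalid/song_info"

def build_search_url(keyword, page=1):
    # urlencode percent-encodes spaces and non-ASCII characters in the song name
    query = urlencode({"keyword": keyword, "page": page})
    return f"{SEARCH_URL}?{query}"

def build_info_url(song_hash):
    # Per the article, two songs' information URLs differ only in the hash
    return f"{INFO_URL}?{urlencode({'hash': song_hash})}"

print(build_search_url("The Day You Went Away"))
# https://search.example.invalid/song_search?keyword=The+Day+You+Went+Away&page=1
```

In the crawler the keyword would come from the user, e.g. build_search_url(input("Song name: ")).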
Start coding
Start by importing the requests library and the re (regular expression) module. re extracts the song information and download address from the text, and requests fetches the text and downloads the music.
We also need to set some variables, which will come in handy later.
We need a song name to splice into the search URL, so first ask the user to enter one, then build the URL from it.
Now we’re ready to request the text with requests! The URL is fetched with a GET request and we want its text, so we use requests.get(url).text.
Next, try printing the text. It is the same content we saw when we opened the spliced URL in the browser.
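Below is a minimal sketch of the fetch step, assuming requests is installed. Reusing a requests.Session, passing a timeout and calling raise_for_status are assumptions, not taken from the original code. The optional session parameter lets the helper be tried with a stand-in object, which is why requests is imported inside the function.

```python
def fetch_text(url, session=None, timeout=10):
    """GET a page and return its body as text, like requests.get(url).text."""
    if session is None:
        # Imported lazily so the helper also works with a stand-in session object
        import requests
        session = requests.Session()
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # fail early on 4xx/5xx instead of parsing an error page
    return response.text
```

Calling print(fetch_text(url)) on the spliced search URL should show the same text as the browser does.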
This text contains the hash of every song, which we can pull out with a regular expression.
Print song_hashes and you’ll see that it’s a list, so we iterate over it with a for loop.
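Extracting the hashes might look like the sketch below. The field name "hash" in the pattern is an assumption, so check the real search response in the developer tools. SAMPLE is made-up text in a similar shape. re.findall returns every captured hash as a list, which the for loop then walks. Duplicate songs are dealt with later, by name and artist.

```python
import re

# Assumption: each song appears as "hash": "<value>" in the search response
HASH_PATTERN = re.compile(r'"hash"\s*:\s*"(\w+)"')

def extract_hashes(text):
    # findall returns the captured group of every match, in document order
    return HASH_PATTERN.findall(text)

SAMPLE = '{"lists": [{"hash": "AAA111"}, {"hash": "BBB222"}, {"hash": "AAA111"}]}'

song_hashes = extract_hashes(SAMPLE)
for song_hash in song_hashes:
    print(song_hash)  # each hash would be spliced into an information URL here
```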
In this step, we splice each hash into an information URL, then pull the song name, artist and download address out of that song’s information text. The song name and artist come back as ASCII escape sequences (\uXXXX), so we also need to decode them. Search results sometimes contain the same song and artist more than once. Each time we print a song, we record its name and artist in a dictionary, and before printing we check whether they are already there. The dictionary keys come from our counter variable, which is also the ID shown to the user. We also store each song’s download address in the song_urls dictionary.
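The bookkeeping described above can be sketched as follows. This sketch assumes the regular expressions have already produced (name, artist, play_url) strings with their JSON escapes intact. Decoding here is done with json.loads, which handles \uXXXX and \/. The article does not show its exact decoding call. The function name number_songs is hypothetical.

```python
import json

def decode_escaped(raw):
    # Regex captures from JSON keep escapes such as \u5468 or \/; json.loads decodes them
    return json.loads('"' + raw + '"')

def number_songs(entries):
    """entries: iterable of (raw_name, raw_author, raw_play_url) strings.
    Returns (songs, song_urls), both keyed by the ID shown to the user."""
    songs, song_urls = {}, {}
    seen = set()   # (name, author) pairs already printed
    counter = 1    # next ID to hand out
    for raw_name, raw_author, raw_url in entries:
        name, author = decode_escaped(raw_name), decode_escaped(raw_author)
        if (name, author) in seen:  # skip duplicate song/artist pairs
            continue
        seen.add((name, author))
        songs[counter] = (name, author)
        song_urls[counter] = raw_url.replace("\\/", "/")
        print(f"{counter}. {name} - {author}")
        counter += 1
    return songs, song_urls
```

A set makes the duplicate check a constant-time lookup instead of a scan over the dictionary’s values for every song. Only unique songs advance the counter, so the IDs stay consecutive.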
Once the song list is printed, the user is asked which song(s) to download, by ID.
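For batch downloads, the user’s answer has to be turned into a list of IDs. The input format assumed here (IDs separated by commas or spaces) is hypothetical, as is the helper name parse_ids. Unknown IDs, non-numeric tokens and repeats are silently dropped.

```python
import re

def parse_ids(text, valid_ids):
    """Turn input like "1, 3 5" into a list of known IDs, in the order given."""
    chosen = []
    for token in re.split(r"[,\s]+", text.strip()):
        # isdecimal() accepts only characters int() can parse
        if token.isdecimal() and int(token) in valid_ids and int(token) not in chosen:
            chosen.append(int(token))
    return chosen
```

For example, parse_ids("1, 3 5", song_urls.keys()) keeps 5 only if the program actually printed a song with ID 5.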
As before, we call requests.get(), but this time we take .content to get the music as binary data and save it to a file. Before the request, replace the escaped \/ in the URL with /. After that, the song can be saved!
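The download step can be sketched as follows, assuming requests is installed and the play_url still contains escaped slashes. Writing response.content with Path.write_bytes is equivalent to opening the file in "wb" mode. The safe_filename helper is a hypothetical addition: song titles can contain characters such as / or : that are not allowed in file names. The .mp3 extension is likewise an assumption.

```python
import re
from pathlib import Path

def safe_filename(name):
    # Replace characters that are not allowed in Windows file names
    return re.sub(r'[\\/:*?"<>|]', "_", name).strip()

def download_song(play_url, title, folder=".", session=None, timeout=30):
    play_url = play_url.replace("\\/", "/")  # undo JSON-escaped slashes first
    if session is None:
        import requests  # lazy import, as in the fetch sketch
        session = requests.Session()
    response = session.get(play_url, timeout=timeout)
    response.raise_for_status()
    path = Path(folder) / f"{safe_filename(title)}.mp3"
    path.write_bytes(response.content)  # .content is the raw audio bytes
    return path
```

For a batch download, call download_song once per chosen ID, passing that ID’s song_urls entry and title.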
Let’s try a song called “The Day You Went Away”.
Here is the result:
Limitations of the program
Every so often Kugou shows a slider captcha, and when that happens our program can’t get the data. This can be handled with Selenium.
Complete code:
This article is reprinted; copyright belongs to the original author. In case of infringement, please contact the editor to have it removed.
The original address: www.tuicool.com/articles/Vj…