Background: my dad asked me to download thousands of songs for him to play in the car. Downloading them by hand felt tedious, and even batch downloading takes a while, so I decided to write a crawler to download them automatically.
For this small crawler project I chose Node + Koa2. Initialize the project with koa2 projectName (koa-generator must be installed globally), change into the project directory, and run npm install && npm start. The crawler depends on superagent, cheerio and async, plus Node's built-in fs and path modules.
Open the NetEase Cloud Music web page and go to the playlist page. I chose the Chinese category. The listing is rendered inside a frame, so right-click it and view the frame source to get the frame's real URL; in that source, the element with the id m-pl-container holds the playlists to crawl. The async library is used to control how many page requests run at the same time.
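The getPlayList method calls this.getPageUrl(), which the post does not show. A minimal sketch of such a helper, written as a plain function, is below. It assumes the category listing is paginated through cat, limit and offset query parameters with 35 playlists per page; those parameter names, the page size and the example base URL are all assumptions, so check them against the real URL you found in the frame source.

```javascript
// Hypothetical stand-in for getPageUrl: builds the URL of every page of one
// playlist category. The cat/limit/offset parameters and the page size of 35
// are assumptions about the listing's pagination, not taken from the post.
function getPageUrl(baseUrl, category, pageCount, pageSize = 35) {
  const urls = [];
  for (let page = 0; page < pageCount; page++) {
    const url = new URL(baseUrl);
    url.searchParams.set('cat', category);                   // playlist category
    url.searchParams.set('limit', String(pageSize));         // playlists per page
    url.searchParams.set('offset', String(page * pageSize)); // index of the first playlist on this page
    urls.push(url.toString());
  }
  return urls;
}

// Example: URLs for the first three pages of the Chinese (华语) category.
// The base URL here is an assumed example value.
const pages = getPageUrl('https://music.163.com/discover/playlist/', '华语', 3);
console.log(pages);
```

Inside the class, getPlayList would call a helper like this with the category and the number of pages you want to crawl.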
static getPlayList(){
  const pageUrlList = this.getPageUrl();
  return new Promise((resolve, reject) => {
    // Fetch the playlist pages one at a time (concurrency limit of 1)
    asy.mapLimit(pageUrlList, 1, (url, callback) => {
      this.requestPlayList(url, callback);
    }, (err, result) => {
      if (err) return reject(err);
      resolve(result);
    });
  });
}
Here asy is the async library, loaded with const asy = require('async');
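asy.mapLimit(coll, limit, iteratee, callback) runs at most limit iteratee calls at a time and hands the results to the final callback in the same order as the input; with a limit of 1, as in getPlayList, the pages are fetched strictly one after another. To make that behaviour concrete, here is a small Promise-based sketch of the same idea. It is an illustration, not the async library's implementation, and fakeFetch is a hypothetical stand-in for a real page request.

```javascript
// Minimal re-implementation of the mapLimit idea with Promises (illustrative,
// not the async library's code): run at most `limit` tasks at once and keep
// the results in the same order as the input. If any task rejects, the
// returned promise rejects.
async function mapLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  // Each runner keeps claiming the next unprocessed index until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }
  const runners = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results;
}

// Hypothetical stand-in for a page request: resolves after a short delay.
const fakeFetch = (url) =>
  new Promise((resolve) => setTimeout(() => resolve(url.toUpperCase()), 10));

mapLimit(['a', 'b', 'c'], 1, fakeFetch).then((res) => console.log(res));
```

Raising the limit lets several requests overlap; keeping it at 1 is gentler on the server at the cost of speed.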
static requestPlayList(url, callback){
  superagent.get(url).set({
    'Connection': 'keep-alive'
  }).end((err, res) => {
    if (err) {
      // Log the failure and pass an empty list so the batch keeps going
      console.info(err);
      callback(null, []);
      return;
    }
    const $ = cheerio.load(res.text);
    const curList = this.getCurPalyList($);
    callback(null, curList);
  });
}
getCurPalyList extracts the playlist details from the page; it receives the cheerio $ object so it can query the DOM.
static getCurPalyList($){
  let list = [];
  // Each <li> inside #m-pl-container is one playlist
  $('#m-pl-container li').each(function(i, elem){
    let _this = $(elem);
    list.push({
      name: _this.find('.dec a').text(),
      href: _this.find('.dec a').attr('href'),
      number: _this.find('.nb').text()
    });
  });
  return list;
}
Now that the playlists have been crawled, the next step is to crawl the songs in each playlist.
static async getSongList(){
  const urlCollection = await playList.getPlayList();
  let urlList = [];
  // Flatten the per-page playlist arrays into one list of absolute URLs
  for (let item of urlCollection) {
    for (let subItem of item || []) {
      urlList.push(baseUrl + subItem.href);
    }
  }
  return new Promise((resolve, reject) => {
    asy.mapLimit(urlList, 1, (url, callback) => {
      this.requestSongList(url, callback);
    }, (err, result) => {
      if (err) return reject(err);
      resolve(result);
    });
  });
}
requestSongList works much like requestPlayList above, so it is not repeated here. Once the song lists have been collected, the songs need to be downloaded locally.
static async downloadSongList(){
  const songList = await this.getSongList();
  let songUrlList = [];
  for (let item of songList) {
    for (let subItem of item || []) {
      // subItem.url ends with 'id=<songId>', so take the part after '='
      let id = subItem.url.split('=')[1];
      songUrlList.push({ name: subItem.name, downloadUrl: downloadUrl + '?id=' + id + '.mp3' });
    }
  }
  if (!fs.existsSync(dirname)) {
    fs.mkdirSync(dirname);
  }
  return new Promise((resolve, reject) => {
    // One song at a time: wait 5 s, start its download, then move on
    asy.mapSeries(songUrlList, (item, callback) => {
      setTimeout(() => {
        this.requestDownload(item, callback);
        callback(null, item);
      }, 5e3);
    }, (err, result) => {
      if (err) return reject(err);
      resolve(result);
    });
  });
}
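In downloadSongList, callback(null, item) runs right after requestDownload starts, so mapSeries moves on after the 5-second pause whether or not the previous file has finished. If you would rather wait for each download to complete, one option is an async/await loop instead of mapSeries. The sketch below is a hypothetical variant, not the author's code: downloadAll, downloadOne and the example base URL are illustrative names, it reads the id with URLSearchParams instead of split('='), and downloadOne stands for any function that returns a Promise settling once the file is written.

```javascript
// Pause helper: resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical async variant of downloadSongList. `songLists` is the nested
// array of { name, url } items from the crawl; `downloadOne(entry)` is any
// function returning a Promise that settles when the file is written.
async function downloadAll(songLists, downloadBase, downloadOne, delayMs = 5000) {
  const results = [];
  for (const list of songLists) {
    for (const song of list || []) { // a failed page may yield null
      // Dummy base so relative links such as "/song?id=1" can be parsed.
      const id = new URL(song.url, 'https://example.com').searchParams.get('id');
      if (!id) continue; // skip links that carry no id
      const entry = { name: song.name, downloadUrl: `${downloadBase}?id=${id}.mp3` };
      try {
        await downloadOne(entry); // wait for this song before starting the next
        results.push({ ...entry, ok: true });
      } catch (err) {
        console.info(err); // log and carry on, like the original
        results.push({ ...entry, ok: false });
      }
      await sleep(delayMs); // throttle requests to the server
    }
  }
  return results;
}

// Usage with a fake downloader and no delay, for demonstration only.
const fakeDownload = (entry) =>
  entry.name === 'bad' ? Promise.reject(new Error('failed')) : Promise.resolve();
downloadAll(
  [[{ name: 'Song A', url: '/song?id=186016' }], null, [{ name: 'bad', url: '/song?id=2' }]],
  'https://example.com/download',
  fakeDownload,
  0
).then((r) => console.log(r));
```

The trade-off is speed: waiting for each file plus a fixed delay is slower than firing downloads on a timer, but it never has more than one transfer in flight.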
requestDownload requests an item's downloadUrl and saves the response to a local .mp3 file.
static requestDownload(item, callback){
  let stream = fs.createWriteStream(path.join(dirname, item.name + '.mp3'));
  superagent.get(item.downloadUrl).set({
    'Connection': 'keep-alive'
  }).pipe(stream).on('error', (err) => {
    // If a download fails, print the error and carry on with the next song
    console.info(err);
  });
}
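requestDownload never reports when a file has finished writing. A possible Promise-based alternative using only Node's built-in http, https, fs and stream/promises modules is sketched below; downloadToFile is a hypothetical name, and this is not the superagent-based approach the post uses. It resolves once the response body has been fully written, rejects on network errors and non-200 responses, and follows up to five redirects.

```javascript
const fs = require('fs');
const http = require('http');
const https = require('https');
const { pipeline } = require('stream/promises');

// Hypothetical promise-based download: resolves after the whole response body
// has been written to `filePath`; rejects on request, HTTP or stream errors.
function downloadToFile(url, filePath, redirectsLeft = 5) {
  const client = url.startsWith('https:') ? https : http;
  return new Promise((resolve, reject) => {
    const req = client.get(url, (res) => {
      const { statusCode, headers } = res;
      if (statusCode >= 300 && statusCode < 400 && headers.location) {
        res.resume(); // drain the redirect body so the socket is freed
        if (redirectsLeft <= 0) return reject(new Error('Too many redirects'));
        // Relative Location headers are resolved against the current URL.
        const nextUrl = new URL(headers.location, url).toString();
        return resolve(downloadToFile(nextUrl, filePath, redirectsLeft - 1));
      }
      if (statusCode !== 200) {
        res.resume();
        return reject(new Error(`HTTP ${statusCode} for ${url}`));
      }
      // pipeline waits for the file to finish and surfaces errors from either side.
      pipeline(res, fs.createWriteStream(filePath)).then(resolve, reject);
    });
    req.on('error', reject);
  });
}
```

With it, a song could be saved with downloadToFile(item.downloadUrl, path.join(dirname, item.name + '.mp3')), and the async callback could be called from the returned Promise's then handler instead of immediately, so each song is only counted once its file is complete.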
At this point the crawler is complete. It goes from playlists -> song lists -> local downloads. You could also go straight to an artist's home page and pass that URL to the song-list step instead, which downloads that artist's popular songs directly.