Analysis

We will implement a simple crawler with as little code as possible. The example is written in ES6 syntax. First, install the third-party cheerio module by running npm install cheerio --save in the project directory. The script then needs to import the following modules:

Module name    Role
cheerio        jQuery-like API that makes it easy to manipulate DOM content
https          Built-in Node.js module, used the same way as the http module but for HTTPS requests

Code implementation

// Import the modules
const https = require('https');
const cheerio = require('cheerio');

// Target page URL (left empty in the original)
let url = '';

// Request the page and collect the response body chunk by chunk
https.get(url, res => {
  let html = '';
  res.on('data', data => {
    html += data;
  });
  // Once the whole body has arrived, extract and print the titles
  res.on('end', () => {
    let titles = filterData(html);
    console.log(titles);
  });
}).on('error', e => {
  console.log(e.message);
});

// Parse the HTML and build a numbered list of titles
function filterData(html) {
  let $ = cheerio.load(html);
  let oTitles = $('.post-title-link');
  let titles = '';
  oTitles.each((index, item) => {
    let title = $(item).text();
    // No trailing newline after the last title
    let end = index == (oTitles.length - 1) ? '' : '\n';
    titles += '[' + (index + 1) + '] ' + title + end;
  });
  return titles;
}
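The numbering step inside filterData can also be looked at on its own, without any network request or HTML parsing. The following is only a minimal sketch: formatTitles is a hypothetical helper that does not appear in the original code, and the sample titles are made up for illustration. It produces the same "[n] title" lines, joined by newlines, that the each loop builds by hand.

```javascript
// Hypothetical helper (not part of the original article):
// turns an array of title strings into a numbered, newline-separated list.
function formatTitles(titleList) {
  return titleList
    .map((title, index) => `[${index + 1}] ${title}`)
    .join('\n'); // join() adds no trailing newline after the last item
}

// Sample data, invented for illustration only.
const sampleTitles = ['First post', 'Second post', 'Third post'];
console.log(formatTitles(sampleTitles));
```

Using map and join removes the need for the manual check on the last index. This is a matter of style; the loop in filterData gives the same result.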

At this point, a simple crawler that collects the article titles from a page is complete 😄
