I've been working with Node for a little over two months, but an urgent project has kept me from learning its modules properly. I had written crawlers in Python before, so today I tried using Node to write a small crawler. The Node modules used in this article are:


Node modules

const cheerio = require("cheerio"); // jQuery-like HTML parsing
const request = require("request"); // HTTP requests
const fs = require("fs");           // file system operations
const path = require("path");       // path handling
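
The snippets below also reference some module-level state that the excerpts don't show. Judging from how it is used, app.js presumably declares something along these lines (the catalog URL here is just a placeholder):

// Assumed module-level state (not shown in the excerpts below)
const url = 'http://www.example-novel-site.com/book/12345/'; // catalog page; chapter hrefs are appended to it (placeholder)
let booksName = ''; // novel title, filled in by booksQuery
let list = [];      // chapter URLs collected from the catalog page
let count = 0;      // index of the chapter currently being fetched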

steps

Pull the project

npm install
node app.js
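
If you are starting from scratch instead of pulling the project, a minimal package.json along these lines (the versions are assumptions) is enough for npm install to fetch the two third-party modules; fs and path are built into Node:

{
  "name": "node-novel-crawler",
  "version": "1.0.0",
  "main": "app.js",
  "dependencies": {
    "cheerio": "^1.0.0-rc.12",
    "request": "^2.88.2"
  }
}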

implementation

  • Select a novel catalog (table of contents) page
  • Get the URLs of all of the novel's chapter pages
  • Use request to fetch each chapter page
  • Use cheerio to extract the chapter title and chapter content
  • Use fs to save the novel content to a TXT file

Pick a novel site at random
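
The excerpts below don't show the call that kicks the crawl off; presumably it just requests the catalog page (the url above) and hands the HTML to booksQuery, roughly like this:

// Assumed entry point: fetch the catalog page and parse it
request(url, function (err, res, body) {
  if (!err && res.statusCode == 200) {
    booksQuery(body); // collect the title and chapter list, then start fetching chapters
  } else {
    console.log('err:' + err);
  }
});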

Get the URLs of all of the novel's chapter pages

/**
 * Parse the catalog page: grab the novel title and every chapter URL
 * @param {*} body
 */
const booksQuery = function (body) {
  $ = cheerio.load(body);
  booksName = $('.btitle').find('h1').text(); // novel title
  $('.chapterlist').find('a').each(function (i, e) {
    list.push($(e).attr('href')); // collect each chapter URL
  });
  createFolder(path.join(__dirname, `/book/${booksName}.txt`));
  fs.createWriteStream(path.join(__dirname, `/book/${booksName}.txt`)); // create the TXT file
  console.log(`Start writing "${booksName}"......`);
  getBody(); // fetch the chapter contents
};
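
booksQuery calls a createFolder helper that isn't shown in the excerpt; presumably it just makes sure the book directory exists before the file is created. A minimal sketch:

// Assumed helper: make sure the parent folder of the target file exists
const createFolder = function (filePath) {
  const dir = path.dirname(filePath);
  if (!fs.existsSync(dir)) {
    fs.mkdirSync(dir, { recursive: true });
  }
};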

Use request to fetch each chapter page

/**
 * Request the current chapter page and hand the HTML to toQuery
 */
const getBody = function () {
  let primUrl = url + list[count]; // full URL of the current chapter
  // console.log(primUrl)
  request(primUrl, function (err, res, body) {
    if (!err && res.statusCode == 200) {
      toQuery(body);
    } else {
      console.log('err:' + err);
    }
  });
};

Use cheerio to extract the chapter title and content

/**
 * Extract the chapter title and chapter content from the page
 * @param {any} body
 */
const toQuery = function (body) {
  $ = cheerio.load(body);
  const title = $('h1').text();                    // chapter title
  const content = trim($('#content').text(), 'g'); // chapter text with whitespace stripped
  writeFs(title, content);
};
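
toQuery also relies on a trim helper that isn't shown; judging from the comment, it strips the whitespace from the chapter text, with 'g' passed through as the regular-expression flag. A minimal sketch:

// Assumed helper: strip whitespace from the text ('g' = replace globally)
const trim = function (str, flag) {
  return str.replace(new RegExp('\\s+', flag), '');
};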

Use fs to save the novel content to a TXT file

/**
 * Append the chapter title and content to the TXT file
 * @param {*} title
 * @param {*} content
 */
const writeFs = function (title, content) {
  fs.appendFile(path.join(__dirname, `/book/${booksName}.txt`), title, function (err) {
    if (err) throw err;
  });
  fs.appendFile(path.join(__dirname, `/book/${booksName}.txt`), content, function (err) {
    if (err) {
      console.log(err);
    } else {
      console.log(title + '......');
      if (count + 1 < list.length) {
        count = count + 1; // move on to the next chapter
        getBody();
      }
    }
  });
};
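
Note that the two fs.appendFile calls above are asynchronous, so in principle nothing guarantees the title lands in the file before its content. A slightly safer variant (a sketch, not the original code) appends both in a single call and only then moves on to the next chapter:

// Sketch of a variant: append title and content together in one call
const writeFs = function (title, content) {
  fs.appendFile(path.join(__dirname, `/book/${booksName}.txt`), title + '\n' + content + '\n', function (err) {
    if (err) {
      console.log(err);
    } else {
      console.log(title + '......');
      if (count + 1 < list.length) {
        count = count + 1;
        getBody(); // fetch the next chapter
      }
    }
  });
};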

extension

To crawl other novel sites, you only need to change the URL and adjust the DOM selectors in the booksQuery and toQuery methods; the rest of the code can be reused as-is.
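
For example, if a hypothetical site listed its chapters under a #list element and put the chapter text in a .read-content div, only the selectors would change, along these lines:

// Hypothetical selectors for a different site layout (example only)
const booksQuery = function (body) {
  $ = cheerio.load(body);
  booksName = $('#info').find('h1').text(); // novel title on this site
  $('#list').find('a').each(function (i, e) {
    list.push($(e).attr('href'));           // chapter URLs on this site
  });
  // ...the rest (createFolder, createWriteStream, getBody) stays the same
};

const toQuery = function (body) {
  $ = cheerio.load(body);
  const title = $('.bookname').find('h1').text();        // chapter title on this site
  const content = trim($('.read-content').text(), 'g');  // chapter text on this site
  writeFs(title, content);
};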

conclusion

In an era where anything that can be done in JS will eventually be done in JS, Node really is this simple and convenient.