I previously wrote an article about building an information crawler with Node.js, in which the HTML content is fetched and page elements are extracted with Cheerio. Crawling a single website this way is no problem, but crawling several websites produces a lot of code with nearly identical logic, as shown in the screenshot below. So I optimized it.

What it achieves

Given a data structure that names each field to extract and the selector for it, the corresponding data is extracted from the page without site-specific code.

The data structure

let obj = {
  title: { dom: '.title-link', target: 'text' },
  link: { dom: '.title-link', target: 'attr', attrName: 'href' },
  content: { dom: '.content-text', target: 'text' }
}

Data results

[ { title: 'I am the title',
    link: 'https://juejin.cn',
    content: 'I am content' } ]

The implementation code

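  extract() {
    // List elements
    let nodelist = $(this.zonedom).find(this.listdom)
    nodelist.each((i, e) => {
      // Walk every field defined in the data structure
      Object.keys(this.datadoms).forEach(objEle => {
        // per-field extraction omitted here; see extract.js
      })
    })
  }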
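
The loop body is left out of the snippet above, so here is a minimal sketch of how the per-field extraction could work. This is not the code from extract.js. The function name `extractList` and its parameters are hypothetical, and `$` is assumed to be a Cheerio-style function such as the one returned by `cheerio.load(html)`. Only the two `target` values that appear in the data structure above (`text` and `attr`) are handled.

```javascript
// Hypothetical helper, not the author's implementation.
// $        : Cheerio-style function (assumed to come from cheerio.load(html))
// zoneDom  : selector of the container that holds the list
// listDom  : selector of each list item inside the container
// dataDoms : field rules, e.g. { title: { dom: '.title-link', target: 'text' } }
function extractList($, zoneDom, listDom, dataDoms) {
  const results = [];
  $(zoneDom).find(listDom).each((i, el) => {
    const item = {};
    Object.keys(dataDoms).forEach((key) => {
      const rule = dataDoms[key];
      const node = $(el).find(rule.dom);
      if (rule.target === 'text') {
        // Text content of the matched element, with surrounding whitespace removed
        item[key] = node.text().trim();
      } else if (rule.target === 'attr') {
        // Value of the attribute named by rule.attrName
        item[key] = node.attr(rule.attrName);
      }
      // Any other target value is skipped in this sketch
    });
    results.push(item);
  });
  return results;
}
```

With Cheerio installed, a call would look something like `extractList(cheerio.load(html), '.list', '.item', obj)`, where `.list` and `.item` are placeholder selectors. Supporting another website then only means passing different selectors and a different `obj`.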

Full code: extract.js

That’s it 🙂