Writing a crawler is not as complicated as it might seem. The principle is simple: send a request to the target URL, then parse the response into the data format you want. Things only get more involved when token authentication is required.
For a Node.js crawler, two libraries are recommended: request and cheerio.
npm install request
npm install cheerio
request is used to send HTTP requests. cheerio is a fast, lean, and flexible implementation of core jQuery, so you can perform jQuery-style DOM operations directly on the HTML that request returns.
(1) DOM crawling
Here is a simple crawler example: grabbing the username from a Jianshu profile page.
- Open the page you want to crawl and find the DOM node that holds the username.
- With a jQuery selector, that is:
$('.main-top>.title>a').text()
Here is the code:
const request = require('request')
const cheerio = require('cheerio')

request('https://www.jianshu.com/u/5b23cf5114a1', (err, res) => {
  if (err) {
    console.log(err.code)
  } else {
    // Load the returned HTML and query it jQuery-style
    let $ = cheerio.load(res.body)
    console.log($('.main-top>.title>a').text())
  }
})
(2) List crawling
If you want to crawl a list, such as my Jianshu article list, use jQuery's each method, which iterates over every DOM node matched by the selector. As before, locate the DOM node first, then parse it.
Here is the reference code:
const request = require('request')
const cheerio = require('cheerio')

request('https://www.jianshu.com/u/5b23cf5114a1', (err, res) => {
  if (err) {
    console.log(err.code)
  } else {
    let $ = cheerio.load(res.body)
    let data = []
    // Iterate over every matched list item and collect its title text
    $('.note-list>li').each(function (i) {
      data.push($(this).find('.title').text())
    })
    console.log(data)
  }
})
That covers static-page crawling; a follow-up will explain how to crawl dynamic web pages.