preface
Idle at home over the weekend, brush the WeChat, playing with a mobile phone, found himself WeChat head the changed, go online to find the head, looking at pictures, his thought as a code, the agriculture, can make the pictures are climbing down a a WeChat small programs, to start, know about the basic all know how to do it, share a wave to everyone.
directory
- Install Node and download the dependencies
- Set up service
- Request the page we want to climb, return JSON
Install the node
To start installing Node, you can go to the node official website to download nodejs.org/zh-cn/ and run Node after downloading.
node -vCopy the code
The version number you installed will appear after successful installation.
Next we use Node, print hello World, create a new file named index.js and enter it
console.log('hello world')Copy the code
Run this file
node index.js
Copy the code
It prints Hello World to the control panel
Setting up the server
Create a new folder named node.
First you need to download the Express dependency
npm install express
Copy the code
Create a new directory named demo.js as shown below:
Introduce download express in demo.js
const express = require('express');
const app = express();
app.get('/index'.function(req, res) {
res.end('111')
})
var server = app.listen(8081, function() {
var host = server.address().address
var port = server.address().port
console.log("Application instance, access at http://%s:%s", host, port)
})
Copy the code
Run Node Demo.js and the simple service is set up, as shown below:
Request the page we want to climb
Request the page we want to climb
npm install superagent
npm install superagent-charset
npm install cheerio
Copy the code
Superagent is a lightweight, progressive Ajax API that is easy to read and has a low learning curve. It relies on the nodeJS native REQUEST API and can be used in nodeJS environments, as well as HTTP requests
Superagent-charset prevents crawled data from garbled characters and changes the character format
Cheerio for server custom, fast, flexible, implementation of jQuery core implementation. Once the dependencies are installed, you can import them
var superagent = require('superagent');
var charset = require('superagent-charset');
charset(superagent);
const cheerio = require('cheerio');
Copy the code
After introducing the request our address, https://www.qqtn.com/tx/weixintx_1.html, as shown in figure:
Declare address variable:
const baseUrl = 'https://www.qqtn.com/'
Copy the code
Now that this is done, it’s time to send the request. Look at the complete code demo.js
var superagent = require('superagent');
var charset = require('superagent-charset');
charset(superagent);
var express = require('express');
var baseUrl = 'https://www.qqtn.com/'; // Cheerio = require(const cheerio = require('cheerio');
var app = express();
app.get('/index'.function(req, res) {// Set the request header res.header("Access-Control-Allow-Origin"."*");
res.header('Access-Control-Allow-Methods'.'PUT, GET, POST, DELETE, OPTIONS');
res.header("Access-Control-Allow-Headers"."X-Requested-With");
res.header('Access-Control-Allow-Headers'.'Content-Type'); / var/typetype= req.query.type; Var page = req.query.page;type = type || 'weixin';
page = page || '1';
var route = `tx/${type}tx_${page}.html '// the page information is gb2312, so the chaeset should be.charset('gb2312'), general web pages are UTF-8 and can be used directly. Charset ('utf-8')
superagent.get(baseUrl + route)
.charset('gb2312')
.end(function(err, sres) {
var items = [];
if (err) {
console.log('ERR: ' + err);
res.json({ code: 400, msg: err, sets: items });
return;
}
var $ = cheerio.load(sres.text);
$('div.g-main-bg ul.g-gxlist-imgbox li a').each(function(idx, element) {
var $element = $(element);
var $subElement = $element.find('img');
var thumbImgSrc = $subElement.attr('src');
items.push({
title: $(element).attr('title'),
href: $element.attr('href'),
thumbSrc: thumbImgSrc
});
});
res.json({ code: 200, msg: "", data: items });
});
});
var server = app.listen(8081, function() {
var host = server.address().address
var port = server.address().port
console.log("Application instance, access at http://%s:%s", host, port)
})
Copy the code
Running demo.js returns the data we received, as shown in the following figure:
A simple Node crawler is done. I hope you can click a STAR on the project as your recognition and support for this project, thank you.
Project address: github.com/Mr-MengBo/R…