Case 1: Crawling a web page
Create index.js in the project, open a terminal, and generate package.json:
npm init -y
Install the express and requests packages:
npm i express requests
The file is saved with fs.writeFile, whose signature is fs.writeFile(file, data[, options], callback).
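let requests = require('requests')
let fs = require('fs')
let html = '' // the page can arrive in several chunks, so collect them all
requests('https://www.jsdaima.com/js/demo/1358.html')
  .on('data', function (chunk) { html += chunk })
  .on('end', function (err) {
    if (err) return console.log('request failed', err)
    fs.writeFile('index.html', html, function () {
      console.log('save successfully')
    })
  })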
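A note on the 'data' event: a page can arrive in several chunks, and fs.writeFile replaces the whole file every time it is called, so writing inside the 'data' handler keeps only the last chunk. The synchronous sketch below shows the effect with made-up chunks and a temporary file (writeFileSync has the same replace-the-file behavior as writeFile); the file name chunks-demo.html is only an example.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Made-up chunks standing in for what a 'data' handler might receive
const chunks = ['<html>', '<body>hello</body>', '</html>']
const file = path.join(os.tmpdir(), 'chunks-demo.html') // example file name

// Writing once per chunk: each call truncates the file and replaces its contents
for (const chunk of chunks) {
  fs.writeFileSync(file, chunk)
}
const perChunk = fs.readFileSync(file, 'utf8')
console.log(perChunk) // </html>

// Collecting the chunks first and writing once keeps the whole page
fs.writeFileSync(file, chunks.join(''))
const collected = fs.readFileSync(file, 'utf8')
console.log(collected) // <html><body>hello</body></html>
```

Collecting the chunks and writing once when the request ends, or appending each chunk with fs.appendFile, avoids losing data.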
Run node index.js in the terminal, and the crawled index.html looks like this:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, minimum-scale=1.0, maximum-scale=1.0">
<meta name="keywords" content="Up, down, left, right, seamless scrolling, jQuery plugin" />
<meta name="description" content="Up, down, left, and right seamless scrolling jQuery plugin download. Implements automatic, seamless scrolling up, down, left, and right." />
<meta name="author" content="Js code" />
<meta name="copyright" content="Js code" />
<style>
* {
margin: 0px;
padding: 0px;
font-family: Microsoft Yahei;
}
html, iframe, body {
height: 100%;
}
.none {
display: none !important;
}
@media screen and (max-width: 640px) {
#mobileFrame {
display: none !important;
}
}
#hidemobile {
font-size: 14px;
font-weight: bold;
border: 1px solid silver;
position: absolute;
right: 20px;
top: 8px;
width: 15px;
height: 15px;
text-align: center;
padding: 0;
line-height: 15px;
border-radius: 15px;
cursor: pointer;
}
</style>
<script type="text/javascript" src="/static/js/protect.js"></script>
</head>
<body><iframe src="https://www.jsdaima.com/Uploads/js/201803/1522376449/index.html" frameborder="0" width="100%"
height="100%"></iframe></body>
</html>
As you can see, this page only embeds another page through an iframe, so we need to crawl again, this time from https://www.jsdaima.com/Uploads/js/201803/1522376449/index.html
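The iframe address can also be picked out of the crawled HTML in code instead of by eye. A rough sketch, assuming the page has an iframe with a quoted src attribute; findIframeSrc is a hypothetical helper, and a regular expression is a brittle stand-in for a real HTML parser such as cheerio.

```javascript
// Hypothetical helper: return the src of the first <iframe> in an HTML string, or null
function findIframeSrc(html) {
  // [^>] also matches line breaks, so attributes split across lines still work
  const match = /<iframe\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/i.exec(html)
  return match ? match[1] : null
}

// Usage with the body of the page crawled above
const page = '<body><iframe src="https://www.jsdaima.com/Uploads/js/201803/1522376449/index.html" frameborder="0" width="100%"\nheight="100%"></iframe></body>'
console.log(findIframeSrc(page))
// https://www.jsdaima.com/Uploads/js/201803/1522376449/index.html
```

With cheerio, which case 2 uses, the same lookup would be $('iframe').attr('src').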
let requests = require('requests')
let fs = require('fs')
let html = '' // collect every chunk before writing
requests('https://www.jsdaima.com/Uploads/js/201803/1522376449/index.html')
  .on('data', function (chunk) { html += chunk })
  .on('end', function (err) {
    if (err) return console.log('request failed', err)
    fs.writeFile('index.html', html, function () {
      console.log('save successfully')
    })
  })
This time the real index.html is crawled successfully. Its code also shows that the page needs the following files (jQuery can be obtained from BootCDN):
<script type="text/javascript" src="/static/js/jquery-1.10.2.min.js"></script>
<link rel="stylesheet" href="css/demo.css"/>
<script src="js/rollslide.js"></script>
Build each file's URL from its path relative to the iframe page, then crawl it the same way. For example, css/demo.css:
let requests = require('requests')
let fs = require('fs')
let css = '' // collect every chunk before writing
requests('https://www.jsdaima.com/Uploads/js/201803/1522376449/css/demo.css')
  .on('data', function (chunk) { css += chunk })
  .on('end', function (err) {
    if (err) return console.log('request failed', err)
    fs.writeFile('demo.css', css, function () {
      console.log('save successfully')
    })
  })
The JS files are crawled the same way.
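Rather than editing each URL by hand, the relative paths can be resolved against the iframe page's address with Node's built-in URL class. A sketch; assetUrl is a hypothetical helper. Note that the jQuery path starts with /, so it resolves against the site root rather than the iframe's folder.

```javascript
const path = require('path')

// Address of the page that references the files
const pageUrl = 'https://www.jsdaima.com/Uploads/js/201803/1522376449/index.html'

// Hypothetical helper: turn a src/href from the page into an absolute URL
function assetUrl(ref) {
  return new URL(ref, pageUrl).href
}

const refs = ['css/demo.css', 'js/rollslide.js', '/static/js/jquery-1.10.2.min.js']
for (const ref of refs) {
  const url = assetUrl(ref)
  // Local file name to save under, e.g. demo.css
  const fileName = path.posix.basename(new URL(url).pathname)
  console.log(fileName, url)
}
```

Each resolved URL can then be passed to requests exactly as in the demo.css example.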
Case 2: Crawling data from a script tag in a web page's HTML
Goal: crawl the epidemic data published by Lilac Garden (DXY).
Inspecting the Lilac Garden page shows that the data is embedded in script tags in the HTML rather than fetched with Ajax requests.
Each script tag has its own id, so the one we need can be selected with the npm package cheerio (install it with npm i cheerio):
let cheerio = require('cheerio')
const $ = cheerio.load(html) // html: the page source returned by requests
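cheerio is the robust way to select a script tag by id. For illustration only, here is a dependency-free sketch that pulls out the same text with a regular expression instead; scriptById is a hypothetical helper, and it assumes the id attribute is quoted and the script body contains no literal closing script tag.

```javascript
// Hypothetical helper: return the text inside <script id="..."> ... </script>, or null
function scriptById(html, id) {
  // Escape regex metacharacters in the id so it is matched literally
  const safeId = id.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
  const re = new RegExp('<script\\b[^>]*\\bid=["\']' + safeId + '["\'][^>]*>([\\s\\S]*?)</script>', 'i')
  const match = re.exec(html)
  return match ? match[1] : null
}

// Made-up page: two scripts, each with its own id
const sample =
  '<script id="other">var a = 1</script>' +
  '<script id="getAreaStat">window.getAreaStat = [{"name":"demo"}]</script>'
console.log(scriptById(sample, 'getAreaStat'))
// window.getAreaStat = [{"name":"demo"}]
```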
As you can see, the script stores the epidemic data in the getAreaStat property of the window object. Node has no window object, so we create one ourselves; that way, when the script runs and assigns the data to window, it does not throw an error:
let window={}
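Why the stub matters can be shown in isolation. In this sketch (both function names are hypothetical), direct eval runs a made-up script once with a local window object in scope and once without; in Node the second call fails with a ReferenceError because there is no global window.

```javascript
// Run a script that assigns to window, with a local stand-in for the browser global
function evalWithWindow(source) {
  let window = {} // stub object; the evaluated code writes its data onto it
  eval(source) // direct eval can see the local `window` binding
  return window
}

// Same thing without the stub: Node has no global `window`
function evalWithoutWindow(source) {
  try {
    eval(source)
    return 'ok'
  } catch (e) {
    return e.name
  }
}

const script = 'window.getAreaStat = [1, 2, 3]' // made-up stand-in for the page's script
console.log(evalWithWindow(script).getAreaStat.length) // 3
console.log(evalWithoutWindow(script)) // ReferenceError
```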
cheerio returns the script's content as a string, so eval is used to run it as JavaScript:
eval($('#getAreaStat').html())
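eval runs the crawled code with full access to the surrounding scope. One alternative, swapped in here rather than taken from the original approach, is Node's built-in vm module: vm.runInNewContext runs the code in a separate context whose globals are the properties of the object you pass, so a window property on that object plays the role of the stub. Node's documentation states that vm is not a security mechanism, so this still should not be used on untrusted code. runWithWindow is a hypothetical helper name.

```javascript
const vm = require('vm')

// Hypothetical helper: run a script in a fresh context where `window` is a plain object
function runWithWindow(source) {
  const sandbox = { window: {} } // becomes the global object of the new context
  vm.runInNewContext(source, sandbox)
  return sandbox.window
}

const win = runWithWindow('window.getAreaStat = [{ name: "demo" }]') // made-up script
// Values created inside the context use that context's own built-ins,
// but JSON.stringify serializes them normally
console.log(JSON.stringify(win.getAreaStat)) // [{"name":"demo"}]
```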
Finally, convert window.getAreaStat to a JSON string and save it in data.json. The complete code:
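let requests = require('requests')
let fs = require('fs')
let cheerio = require('cheerio')
let html = '' // collect every chunk; the script may be split across chunks
requests('https://ncov.dxy.cn/ncovh5/view/pneumonia_peopleapp?from=timeline&isappinstalled=0')
  .on('data', function (chunk) { html += chunk })
  .on('end', function (err) {
    if (err) return console.log('request failed', err)
    let window = {} // stand-in for the browser's window object
    const $ = cheerio.load(html)
    eval($('#getAreaStat').html()) // the script assigns window.getAreaStat
    // Convert window.getAreaStat to a JSON string and save it in data.json
    fs.writeFile('data.json', JSON.stringify(window.getAreaStat), function () {
      console.log('save successfully')
    })
  })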
Run node index.js, and data.json is saved successfully.
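JSON.stringify(window.getAreaStat) writes the whole array on a single line. If a readable data.json is wanted, JSON.stringify also accepts an indentation argument, and JSON.parse reads the file back. A small sketch with a made-up array in place of the crawled data, written to a temporary directory:

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Made-up stand-in for window.getAreaStat
const areaStat = [{ name: 'demo', count: 1 }]
const file = path.join(os.tmpdir(), 'data.json')

// The third argument (2) indents nested values by two spaces per level
fs.writeFileSync(file, JSON.stringify(areaStat, null, 2))
const text = fs.readFileSync(file, 'utf8')

// JSON.parse turns the saved text back into plain objects
const loaded = JSON.parse(text)
console.log(loaded[0].name) // demo
```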