A magical country
Puppeteer is a Node.js library from Google for driving headless Chrome. It can do a lot of things, such as crawling, running JavaScript, capturing web pages as images, and automated testing. I mainly use it to run a JavaScript script and get an encrypted return value. Why write this article? It came from a business need: our crawler colleagues could not crack a key parameter, so this glorious and great task landed on us front-end developers. No more nonsense; for the year-end bonus, let's get to work. Ps: crawling is risky, so be careful. [API reference](https://www.kancloud.cn/luponu/puppeteer/870133)
Initialize the Node.js project
I used egg.js for this. Why? Because it is foolproof, haha. Run the following commands to generate the project:
```
$ mkdir puppeteer && cd puppeteer
$ npm init egg --type=simple
$ npm i
```
Start the project:
```
$ npm run dev
$ open http://localhost:7001
```
Three: install the packages, no thinking required
Add whitelists and cross-domain configuration
First, we add the missing config files to the config folder; each can be copied directly from the default config file the scaffold generated. They hold the local development settings and the production settings. Since I use a static page to load a script, I configured two addresses for it, one local and one online. The relevant configuration items are given directly below.
config.default.js configuration
const config = exports = {
  // Configure a cross-domain whitelist
  security: {
    csrf: {
      enable: false,
      ignoreJSON: true,
    },
    domainWhiteList: [ '*' ], // Configure the whitelist
  },
  cors: {
    origin: '*', // Allow all cross-domain access
    credentials: true, // Allow cookies to cross domains
    allowMethods: 'GET,HEAD,PUT,POST,DELETE,PATCH',
  },
};
config.local.js configuration
const userConfig = {
  htmlUrl: 'http://127.0.0.1:8899/public/index.html', // Change it according to your own needs
};
config.prod.js configuration
const userConfig = {
  htmlUrl: 'http://xxxxx/public/index.html', // Change it according to your own needs
};
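For readers wondering how the `config` and `userConfig` fragments above fit together: an Egg config file generated by the simple scaffold usually exports a function that returns both objects merged. The following is a minimal sketch under that assumption, not the author's actual file. The name `configFactory` and the `keys` suffix are placeholders, and the `cors` block only takes effect when a CORS plugin such as egg-cors is enabled in config/plugin.js, which this article does not show.

```javascript
// Sketch of one Egg config file. The article splits these settings across
// config.default.js, config.local.js and config.prod.js.
const configFactory = appInfo => {
  const config = {};

  // Cookie signing key; the suffix is a placeholder, not a real secret
  config.keys = appInfo.name + '_replace_me';

  // Settings from the article: CSRF off, every origin whitelisted
  config.security = {
    csrf: { enable: false, ignoreJSON: true },
    domainWhiteList: [ '*' ],
  };
  config.cors = {
    origin: '*',
    credentials: true,
    allowMethods: 'GET,HEAD,PUT,POST,DELETE,PATCH',
  };

  // Custom settings; read later in the helper as that.config.htmlUrl
  const userConfig = {
    htmlUrl: 'http://127.0.0.1:8899/public/index.html',
  };

  return { ...config, ...userConfig };
};

module.exports = configFactory;
```

Because the returned object merges `userConfig` at the top level, `htmlUrl` becomes available on the application config, which is how the helper later reads it.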
Copy our static page and scripts into the app folder
Later we use Node to open this page and call the JS function inside it to run the logic and get what we need.
Install the package
// Run
npm install --save puppeteer
Develop the logic and provide the interface
Analysis: We need to open the static page prepared in advance, then execute the function on the page to obtain the required parameter. Here we first declare the interface in router.js, then write the logic in controller.js.
router.js is as follows:
router.post('/getxb', controller.home.getxb);
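For context, the route line above normally sits inside the function that app/router.js exports, which Egg calls with the application object. A minimal sketch, assuming the scaffold's default layout and a controller file named home.js (implied by `controller.home`); the name `registerRoutes` is only for illustration:

```javascript
// app/router.js (sketch; Egg passes in the application object)
const registerRoutes = app => {
  const { router, controller } = app;
  // POST /getxb is handled by the getxb method of the home controller
  router.post('/getxb', controller.home.getxb);
};

module.exports = registerRoutes;
```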
controller.js code is as follows:
// Get the xB encryption parameter
async getxb() {
  const { ctx } = this;
  const data = await ctx.helper.puppeteer(this);
  if (data) {
    ctx.body = {
      code: 'Success',
      data: data,
      msg: 'success',
    };
    ctx.status = 200;
  } else {
    ctx.body = {
      code: 'error',
      data: null,
      msg: 'System busy',
    };
    ctx.status = 500;
  }
}
I put the core logic in helper.js under the extend folder.
helper.js looks like this:
const puppeteer = require('puppeteer');
// Declared at module level so the browser and page are reused across requests
let browser = null, page = null;
exports.puppeteer = async (that) => {
  const { ctx } = that;
  const { xb } = ctx.request.body;
  // Create the Browser instance once and set its parameters
  if (!browser) {
    browser = await puppeteer.launch({
      headless: true, // Whether to run the browser in headless mode. The default is true
      defaultViewport: null, // Default viewport size for each page. The default is 800x600. If null, the default viewport is disabled.
      // timeout: 0, // The maximum time (in milliseconds) to wait for the browser instance to start. The default is 30000 (30 seconds). Use 0 to disable timeout.
      ignoreHTTPSErrors: true, // Whether to ignore HTTPS errors during navigation. The default is false
      args: [
        '--no-sandbox',
      ],
    });
    // Create a Page instance
    page = await browser.newPage();
  }
  // It's easy to fall into a pit here. You might expect to use the parameter directly
  // inside the page function, but that does not work: this code runs in Node, while the
  // function runs in the static page, so I attached the parameter to the URL.
  // Navigate on every request so the page always sees the current xb value.
  await page.goto(`${that.config.htmlUrl}?xb=${xb}`);
  // Execute the page's native JS method
  return await page.evaluate(() => {
    return window.byted_acrawler.getxb(window.location.href.split('?')[1], null);
  });
};

exports.closepuppeteer = async (that) => {
  // Close the browser when this interface is called
  const { ctx } = that;
  if (browser) {
    await browser.close();
    browser = null;
    page = null;
    return 'success';
  } else {
    return null;
  }
};
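The trick of hanging the parameter on the URL can be illustrated without a browser. The sketch below mirrors its two halves: the Node side appends `xb` to the page address, and the page side takes the text after the `?`, which is what the `split('?')[1]` call in helper.js does. The function names `buildPageUrl` and `readQuery` and the sample value are hypothetical, introduced only for this illustration.

```javascript
// Node side: append the parameter to the static page address
function buildPageUrl(htmlUrl, xb) {
  return `${htmlUrl}?xb=${xb}`;
}

// Page side: take the text that follows the first "?"
function readQuery(href) {
  return href.split('?')[1];
}

const url = buildPageUrl('http://127.0.0.1:8899/public/index.html', 'abc123');
console.log(url);            // http://127.0.0.1:8899/public/index.html?xb=abc123
console.log(readQuery(url)); // xb=abc123
```

Note that `split('?')[1]` returns only the text between the first and second `?`, so a value that itself contains `?` would be cut short. Puppeteer's `page.evaluate` also accepts extra arguments after the function and passes them into the page, which is another way to hand a value over.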
Ps: Why are the variables declared outside the function? For efficiency: opening and closing the browser window frequently takes a lot of time. Since this interface is provided for other people to call, I only open the browser once.
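The open-once idea can be sketched without Puppeteer. One detail worth handling is two requests arriving while the browser is still starting: caching the launch promise rather than the finished instance means both requests share a single launch. This is a sketch under that assumption, not the author's code; `createOnce` and `fakeLaunch` are hypothetical names, and `fakeLaunch` merely stands in for `puppeteer.launch`.

```javascript
// Wrap an async factory so it runs at most once until reset() is called
function createOnce(factory) {
  let pending = null; // cached promise, shared by all callers
  return {
    get() {
      if (!pending) {
        pending = factory().catch(err => {
          pending = null; // allow a retry after a failed launch
          throw err;
        });
      }
      return pending;
    },
    reset() {
      pending = null;
    },
  };
}

// Stand-in for puppeteer.launch(); counts how often it is called
let launches = 0;
const fakeLaunch = async () => ({ id: ++launches });

const browserOnce = createOnce(fakeLaunch);

// Two overlapping requests trigger a single launch
const first = browserOnce.get();
const second = browserOnce.get();
```

In helper.js, a wrapper like this could take the place of the `if` check on `browser`, with the close function calling `reset()` after `browser.close()`.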
Try it with Postman
Let's call the interface with the required parameter; you can see that we get back the value we need.
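The same request Postman sends can also be built in code. A minimal sketch, assuming the dev server from earlier is listening on http://localhost:7001 and the body is JSON with a single `xb` field (which is what helper.js destructures from `ctx.request.body`); `buildGetxbRequest` is a hypothetical helper and the sample value is made up.

```javascript
// Build the URL and fetch options for a POST to the /getxb interface
function buildGetxbRequest(baseUrl, xb) {
  return {
    url: new URL('/getxb', baseUrl).toString(),
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ xb }),
    },
  };
}

const request = buildGetxbRequest('http://localhost:7001', 'sample-value');
console.log(request.url);       // http://localhost:7001/getxb
console.log(request.init.body); // {"xb":"sample-value"}
```

With Node 18 or later, `fetch(request.url, request.init)` sends it, and the parsed JSON response should have the `code`, `data` and `msg` fields set by the controller.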
Ok, come and try this miracle!!