A magical country

Puppeteer is Google's Node.js library for driving headless Chrome. It can do a lot of things, such as crawling, running JavaScript, capturing web pages as images, automated testing, etc. I mainly use it to run a JavaScript script and get an encrypted return value. Why write this article? Business demand: our crawler colleagues could not crack a key parameter, so this glorious and great task landed on us front-end developers. No more nonsense; for the year-end bonus, let's get to work… Ps: crawling is risky, so be careful. [API reference](https://www.kancloud.cn/luponu/puppeteer/870133)

Initialize Nodejs

I built it with egg.js. Why egg.js? Because it is foolproof… Hahaha. Run the following commands to generate the project:

```
$ mkdir puppeteer && cd puppeteer
$ npm init egg --type=simple
$ npm i
```

Start the project:

```
$ npm run dev
$ open http://localhost:7001
```

No-brainer setup

Add whitelists and cross-domain configuration

First we add config.local.js and config.prod.js to the config folder; both can be copied from the generated config.default.js. These files mainly hold the local development settings and the production settings. Because I load the script through a static page, I configured two page addresses, one for local development and one for production. The relevant configuration items are given below.

config.default.js configuration:
const config = exports = {
    // Configure a cross-domain whitelist
    security: {
      csrf: {
        enable: false,
        ignoreJSON: true,
      },
      domainWhiteList: [ '*' ], // Configure the whitelist
    },
    cors: {
      origin: '*', // Allow all cross-domain access
      credentials: true, // Allow cookies to cross domains
      allowMethods: 'GET,HEAD,PUT,POST,DELETE,PATCH',
    },
};
config.local.js configuration:
const userConfig = {
    htmlUrl: 'http://127.0.0.1:8899/public/index.html', // Change it according to your own needs
 };
config.prod.js configuration:
const userConfig = {
    htmlUrl: 'http://xxxxx/public/index.html', // Change it according to your own needs
 };
Copy the static page and its scripts into the app folder

Later we will use Node to open this page and call the JS function inside it to run the logic and get the value we need.

Install the package

// Run
npm install --save puppeteer

Develop the logic and expose an interface

Analysis: We need to open the static page prepared in advance and then call the function on that page to obtain the required parameter. We first register the route in router.js, then write the logic in the controller.

Router.js is as follows:
 router.post('/getxb', controller.home.getxb);
Controller.js code is as follows:
  // Get the xb encryption parameter
  async getxb() {
    const { ctx } = this;
    const data = await ctx.helper.puppeteer(this);
    if (data) {
      ctx.body = {
        code: 'Success',
        data: data,
        msg: 'success',
      };
      ctx.status = 200;
    } else {
      ctx.body = {
        code: 'error',
        data: null,
        msg: 'System busy',
      };
      ctx.status = 500;
    }
  }

I’ve put the core logic in helper.js under the extend folder.

Helper.js looks like this:
const puppeteer = require('puppeteer');
let browser = null, page = null;
exports.puppeteer = async (that) => {
    const { ctx } = that;
    const { xb } = ctx.request.body;
    // Create the Browser instance once and reuse it across requests
    if (!browser) {
        browser = await puppeteer.launch({
            headless: true, // Whether to run the browser in headless mode. The default is true
            defaultViewport: null, // Set a default viewport size for each page. The default is 800x600. If null, the default viewport is disabled.
            // timeout: 0, // The maximum time (in milliseconds) to wait for the browser instance to start. The default is 30000 (30 seconds). Use 0 to disable timeout.
            ignoreHTTPSErrors: true, // Whether to ignore HTTPS errors during navigation. The default is false
            args: [
                '--no-sandbox',
            ],
        });
        // Create a Page instance
        page = await browser.newPage();
    }
    // It's easy to fall into a pit here. Variables from the Node scope are not visible inside the page,
    // so the parameter has to reach the static page some other way; I append it to the URL.
    // Navigate on every request so the URL carries the current xb.
    await page.goto(`${that.config.htmlUrl}?xb=${xb}`);
    // Execute the page's native JS method
    return await page.evaluate(() => {
        return window.byted_acrawler.getxb(window.location.href.split('?')[1], null);
    });
};
exports.closepuppeteer = async (that) => {
    // Close the browser when this interface is called
    const { ctx } = that;
    if (browser) {
        await browser.close();
        browser = null;
        page = null;
        return 'success';
    } else {
        return null;
    }
};
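
Besides hanging the value on the URL, Puppeteer's `page.evaluate` also accepts extra arguments after the page function and hands serialized copies of them to that function inside the page. The sketch below shows that alternative. `getXbViaArgs` is a hypothetical name, `page` is assumed to be a Puppeteer page that has already loaded the static page, and the `window.byted_acrawler.getxb(query, null)` call shape is taken from the helper above:

```javascript
// Pass a value into the page through page.evaluate's extra arguments.
// `page` is a Puppeteer Page that has already loaded the static page.
async function getXbViaArgs(page, xb) {
  // Same string the helper reads from the URL after "?"
  const query = `xb=${xb}`;
  return page.evaluate(
    // This function runs inside the browser page, not in Node;
    // `q` receives a serialized copy of `query`.
    (q) => window.byted_acrawler.getxb(q, null),
    query
  );
}
```

Only serializable values can be passed this way, which is enough for a plain string such as `xb`.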

Ps: Why are the variables declared outside the function? For efficiency: launching and closing the browser on every request takes a lot of time. Since this is an interface for other people to call, I launch the browser only once and reuse it.
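
The "open once, reuse afterwards" idea can also be sketched as a small holder that caches the launch promise, so two requests arriving at the same moment do not start two browsers. This is an illustration, not the code used in this project. `createLazySingleton` is a made-up name, and `factory` stands in for something like `() => puppeteer.launch(options)`:

```javascript
// Generic "create once, reuse afterwards" holder.
// `factory` stands in for something like () => puppeteer.launch(options).
function createLazySingleton(factory) {
  let pending = null; // Promise of the shared instance, or null when nothing is open

  return {
    // Return the shared instance, creating it on first use.
    get() {
      if (!pending) {
        const attempt = Promise.resolve().then(factory);
        pending = attempt;
        // If creation fails, forget the attempt so a later call can retry.
        attempt.catch(() => {
          if (pending === attempt) pending = null;
        });
      }
      return pending;
    },

    // Dispose of the shared instance (if any) so the next get() recreates it.
    async close(dispose) {
      if (!pending) return false;
      const current = pending;
      pending = null;
      await dispose(await current);
      return true;
    },
  };
}
```

With Puppeteer, usage would look roughly like `const holder = createLazySingleton(() => puppeteer.launch({ headless: true }))`, then `await holder.get()` in each request and `await holder.close((b) => b.close())` in the close interface.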

Try it with Postman

Let’s send a request with the parameter the interface expects. The response contains the encrypted value we need.
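
For readers without Postman, the same request can be sent from a short Node script. This is a sketch that assumes Node 18+ (for the global `fetch`), the dev server from `npm run dev` listening on port 7001, and a JSON body carrying `xb`. The helper names `buildGetxbRequest` and `requestXb` are made up for illustration:

```javascript
// Build the POST request for the /getxb route.
function buildGetxbRequest(baseUrl, xb) {
  return {
    url: new URL('/getxb', baseUrl).toString(),
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ xb }),
    },
  };
}

// Send the request and return the parsed JSON body.
// Assumes Node 18+ (global fetch) and a server started with `npm run dev`.
async function requestXb(xb, baseUrl = 'http://localhost:7001') {
  const { url, options } = buildGetxbRequest(baseUrl, xb);
  const res = await fetch(url, options);
  return res.json();
}
```

Calling `requestXb('<your xb value>').then(console.log)` should print the `{ code, data, msg }` object that the controller returns.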

OK, go and try this magic for yourself!!