Puppeteer is an official Google Node library that controls headless Chrome over the DevTools Protocol. Its API drives Chrome directly and can simulate most user operations, so it can be used for UI testing or as a crawler that visits pages to collect data.

Chinese documentation: zhaoqize.github.io/puppeteer-a…

Features:

What can it do?
  • Generate a PDF of a page.
  • Crawl a SPA (single-page application) and generate pre-rendered content.
  • Automate form submission, UI testing, keyboard input, etc.
  • Create an up-to-date automated test environment. Run tests directly in the latest version of Chrome using the latest JavaScript and browser features.
  • Capture a timeline trace of a site to help diagnose performance issues.
  • Test Chrome extensions.
Puppeteer is easy to install as it is an NPM package:

npm i puppeteer

or

yarn add puppeteer

How to use:

// Import the Puppeteer module
const puppeteer = require('puppeteer');

// puppeteer.launch() starts a browser instance
async function test() {
    // launch() accepts an options object, e.g. {headless: false}, which selects headless (no UI) or headed mode
    // Headless mode is faster; headed mode is generally used for debugging during development
    const options = {
        // Set the window width and height
        defaultViewport: {
            width: 1400,
            height: 800
        },
        // false shows the browser window; true runs headless
        headless: false,
        // Slow down each operation by this many milliseconds
        slowMo: 250
    };
    const browser = await puppeteer.launch(options);

    // Open a new page
    const page = await browser.newPage();

    // Navigate to the URL
    await page.goto('http://www.baidu.com');

    // Take a screenshot
    await page.screenshot({path: 'test.png'});

    // Print the page to PDF
    // Note: page.pdf() has historically required headless mode, so it may fail with headless: false
    await page.pdf({path: 'example.pdf', format: 'A4'});

    // Close the browser
    await browser.close();
}
test();
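If any step in test() throws, browser.close() is never reached and the Chrome process keeps running. Below is a minimal sketch of one way to guard against that. The helper name withPage is hypothetical (it is not part of Puppeteer), and it takes the puppeteer module as an argument.

```javascript
// Hypothetical helper (not part of Puppeteer): runs `task` with a fresh page
// and always closes the browser, even if `task` throws.
async function withPage(puppeteer, options, task) {
    const browser = await puppeteer.launch(options);
    try {
        const page = await browser.newPage();
        return await task(page);
    } finally {
        await browser.close();
    }
}

// Usage sketch (needs `npm i puppeteer`; URL and file name are the ones used above):
// const puppeteer = require('puppeteer');
// withPage(puppeteer, {headless: true}, async (page) => {
//     await page.goto('http://www.baidu.com');
//     await page.screenshot({path: 'test.png'});
// });
```

The try/finally keeps the cleanup in one place, so each script only has to describe what it does with the page.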

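// The snippets below run inside an async function, after page.goto()

// Listen for console events raised in the page (register this before the page logs anything)
page.on('console', (msg) => {
    console.log(msg.text());
});

// Get the page content
// The $$eval callback runs inside the browser, so its console.log output goes to the
// browser console and reaches Node only through the 'console' listener above
await page.$$eval('#head #s-top-left a', (res) => {
    res.forEach((item) => {
        // Use the DOM API here; $ is not guaranteed to be jQuery on the page
        console.log(item.getAttribute('href'));
    });
});

// Get the element handles and click the first one
const elementHandles = await page.$$('#head #s-top-left a');
await elementHandles[0].click();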
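Because the $$eval callback runs in the page, a common alternative to logging there is to return plain data and log it from Node. Below is a minimal sketch using the same selector as above. The helper name collectHrefs is hypothetical.

```javascript
// Hypothetical helper: returns the href attribute of every element matching `selector`.
// The callback passed to $$eval executes in the browser; only its serializable
// return value (here an array of strings) is sent back to Node.
async function collectHrefs(page, selector) {
    return page.$$eval(selector, (elements) =>
        elements.map((el) => el.getAttribute('href'))
    );
}

// Usage sketch, inside an async function with an open `page`:
// const hrefs = await collectHrefs(page, '#head #s-top-left a');
// console.log(hrefs);
```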

// Search through form input
    const inputBox = await page.$('#form .s_ipt_wr #kw');
    await inputBox.focus(); // Place the cursor in the input box
    await page.keyboard.type('Node.js'); // Type into the input box
    const search = await page.$('.s_btn_wr input[type=submit]');
    await search.click(); // Click the search button
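Clicking the search button loads new content, so code that reads the results should wait for it first. Below is a minimal sketch, assuming the selectors above still match the page and that the click causes a full navigation. The helper name searchFor is hypothetical.

```javascript
// Hypothetical helper: types `query` into the search box and submits the form.
// Assumes the click causes a full page navigation; if the site updates results
// in place, waitForNavigation would time out and page.waitForSelector on a
// results element would be the better wait.
async function searchFor(page, query) {
    // page.type focuses the element and sends key events for each character
    await page.type('#form .s_ipt_wr #kw', query, {delay: 50});
    // Start waiting before clicking so the navigation is not missed
    await Promise.all([
        page.waitForNavigation(),
        page.click('.s_btn_wr input[type=submit]'),
    ]);
    return page.url();
}

// Usage sketch, inside an async function with an open `page`:
// const resultUrl = await searchFor(page, 'Node.js');
```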

Crawler practice

Many web pages use the user agent to determine the device type. This can be simulated with page.emulate(options). options has two configuration items: userAgent and viewport. viewport can set the width (width), height (height), device scale factor (deviceScaleFactor), whether the device is mobile (isMobile), and whether touch events are supported (hasTouch).

const puppeteer = require('puppeteer');
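// Note: this require path comes from older Puppeteer releases; later versions expose
// the descriptors on the puppeteer module itself (for example as puppeteer.devices)
const devices = require('puppeteer/DeviceDescriptors');
const iPhone = devices['iPhone 6'];
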
puppeteer.launch().then(async browser => {
  const page = await browser.newPage();
  await page.emulate(iPhone);
  await page.goto('https://www.example.com');
  // other actions...
  await browser.close();
});

The above code simulates an iPhone 6 visiting a website; devices holds the emulation parameters for common devices that are built into Puppeteer.
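When no built-in descriptor fits, an options object of the same shape can be written by hand. Below is a minimal sketch. The helper name mobileProfile is hypothetical, and the user-agent string and dimensions are illustrative values, not taken from a real device list.

```javascript
// Hypothetical helper: builds an options object in the shape page.emulate() expects,
// i.e. {userAgent, viewport: {width, height, deviceScaleFactor, isMobile, hasTouch}}.
function mobileProfile(userAgent, width, height, deviceScaleFactor = 2) {
    return {
        userAgent,
        viewport: {
            width,
            height,
            deviceScaleFactor,
            isMobile: true,  // take the page's meta viewport tag into account
            hasTouch: true,  // enable touch events
        },
    };
}

// Illustrative values only
const profile = mobileProfile('Mozilla/5.0 (ExampleDevice) ExampleBrowser/1.0', 375, 667);
console.log(profile.viewport);

// Usage sketch, inside an async function with an open `page`:
// await page.emulate(profile);
```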

Many web pages require login, and there are two solutions:

  • Have Puppeteer enter the account and password. To click, use page.click(selector[, options]), or focus an element with page.focus(selector). To type, use page.type(selector, text[, options]) to enter the given string; setting delay in options slows the typing so it looks more like a real person. You can also use keyboard.down(key[, options]) to enter the text one character at a time.
  • If the login state is determined by cookies, you can use page.setCookie(...cookies). To keep the cookies valid, visit the site periodically.
Tip: Some websites require scanning a QR code to log in, while other pages on the same domain offer a regular login. You can try logging in on a page that allows it and then reuse the cookies to skip the QR-code scan.
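Both approaches can be sketched as small helpers. This is a minimal sketch under stated assumptions: the helper names loginWithForm, saveCookies and loadCookies are hypothetical, the selectors are supplied by the caller, and cookies are kept in a JSON file whose path the caller chooses.

```javascript
const fs = require('fs');

// Hypothetical helper: fills in a login form the way a user would.
// The selectors are supplied by the caller; none are assumed here.
async function loginWithForm(page, selectors, username, password) {
    // `delay` is the pause in milliseconds between key presses
    await page.type(selectors.username, username, {delay: 100});
    await page.type(selectors.password, password, {delay: 100});
    await page.click(selectors.submit);
}

// Hypothetical helper: saves the page's current cookies to a JSON file.
async function saveCookies(page, filePath) {
    const cookies = await page.cookies();
    fs.writeFileSync(filePath, JSON.stringify(cookies, null, 2));
    return cookies.length;
}

// Hypothetical helper: restores cookies saved by saveCookies().
// Call it before page.goto() so the first request already carries them.
async function loadCookies(page, filePath) {
    const cookies = JSON.parse(fs.readFileSync(filePath, 'utf8'));
    await page.setCookie(...cookies);
    return cookies.length;
}
```

A crawler could call loginWithForm once, store the result with saveCookies, and start later runs with loadCookies until the session expires.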
Refer to the official documentation for more powerful features: zhaoqize.github.io/puppeteer-a…

Today's material was quite interesting, but what I learned is still fairly basic and not yet deep enough!

Date: 2021/11/17

Reference video: www.bilibili.com/video/BV1i7…