Updated for Puppeteer v10.0.0 (Chromium 92.0.4512.0, r884014)

This article contains a large number of examples; it works best when read alongside the DEMO repository:

Github.com/lzsheng/pup…

Introduction

Puppeteer is a Node library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol (CDP). Puppeteer runs headless by default, but can be configured to run full (non-headless) Chrome or Chromium.

What can Puppeteer do?

Most of the things you can do manually in a browser can be done using Puppeteer! Here are some examples:

  • Generate screenshots and PDFs of web pages
  • Crawl SPA (single-page application) or SSR sites
  • UI automation testing: simulate form submission, keyboard input, clicks and other interactions
  • Capture a timeline trace of a site to help diagnose performance issues
  • Create an up-to-date automated testing environment: run test cases against the latest version of Chrome using the latest JavaScript features
  • Test Chrome extensions

What is Headless Chrome?

Headless Chrome is Chrome running without a visible UI. It can be driven from the command line or from code without human intervention, which makes it well suited to unattended, automated runs. To start Chrome in headless mode, pass the --headless flag:

# Mac OS X command alias
alias chrome="/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome"
# Enable remote debugging
chrome --headless --remote-debugging-port=9222 --disable-gpu
# Get the page DOM
chrome --headless --disable-gpu --dump-dom https://www.baidu.com
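With --remote-debugging-port set, Chrome also serves a small HTTP endpoint: GET /json/version returns JSON whose webSocketDebuggerUrl field is the browser-level CDP WebSocket URL, and that URL can be passed to puppeteer.connect as browserWSEndpoint. The sketch below assumes Node.js 18+ (for the global fetch) and Chrome listening on 127.0.0.1:9222 as started above; parseWSEndpoint and getBrowserWSEndpoint are hypothetical helper names.

```javascript
// Hypothetical helpers: discover the browser's CDP WebSocket URL.
// Assumes Chrome was started with --remote-debugging-port=9222
// and Node.js 18+ so that fetch() is available globally.

// Pure part: pull the endpoint out of a /json/version response body.
function parseWSEndpoint(versionInfo) {
  const url = versionInfo && versionInfo.webSocketDebuggerUrl;
  if (typeof url !== "string" || !/^wss?:\/\//.test(url)) {
    throw new Error("No webSocketDebuggerUrl in /json/version response");
  }
  return url;
}

// I/O part: query the running browser over HTTP.
async function getBrowserWSEndpoint(host = "127.0.0.1", port = 9222) {
  const res = await fetch(`http://${host}:${port}/json/version`);
  if (!res.ok) throw new Error(`HTTP ${res.status} from /json/version`);
  return parseWSEndpoint(await res.json());
}

// Usage (requires a running Chrome and the puppeteer package):
// const browser = await puppeteer.connect({
//   browserWSEndpoint: await getBrowserWSEndpoint(),
// });
```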

Learn more about Headless Chrome

Chrome DevTools Protocol

Before diving into Puppeteer, let's take a look at the Chrome DevTools Protocol (CDP):

  • CDP is built on WebSocket, which provides a fast data channel to the browser engine.
  • CDP is divided into multiple domains (DOM, Debugger, Network, Profiler, Console…); each domain defines its own Commands and Events.
  • Tools built on CDP can debug and analyze Chrome; Chrome DevTools itself is implemented on top of CDP.
  • Many other useful tools are built on CDP as well, such as chrome-remote-interface and Puppeteer.
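To make the command/event split concrete, here is a simplified, transport-agnostic model of CDP messaging. It is not Puppeteer's implementation: it assumes the raw WebSocket send function is injected, and it ignores sessionId (used when talking to individual targets). Commands go out as {id, method, params}; replies come back as {id, result} or {id, error}; events arrive as {method, params} with no id. CDPRouter is a hypothetical name.

```javascript
// Simplified model of CDP messaging (not Puppeteer's actual code).
class CDPRouter {
  constructor(sendRaw) {
    this.sendRaw = sendRaw; // e.g. (text) => webSocket.send(text)
    this.nextId = 1;
    this.pending = new Map(); // command id -> { resolve, reject }
    this.listeners = new Map(); // event method -> [callbacks]
  }

  // Send a command such as "Page.navigate"; resolves with its result.
  send(method, params = {}) {
    const id = this.nextId++;
    const promise = new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
    });
    this.sendRaw(JSON.stringify({ id, method, params }));
    return promise;
  }

  // Subscribe to an event such as "Page.loadEventFired".
  on(method, callback) {
    if (!this.listeners.has(method)) this.listeners.set(method, []);
    this.listeners.get(method).push(callback);
  }

  // Feed every incoming WebSocket message into this method.
  handleMessage(text) {
    const msg = JSON.parse(text);
    if (msg.id !== undefined) {
      // A reply to one of our commands
      const waiter = this.pending.get(msg.id);
      if (!waiter) return;
      this.pending.delete(msg.id);
      if (msg.error) waiter.reject(new Error(msg.error.message));
      else waiter.resolve(msg.result);
    } else {
      // An event pushed by the browser
      for (const cb of this.listeners.get(msg.method) || []) cb(msg.params);
    }
  }
}
```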

Learn more: getting started with CDP

Puppeteer's API hierarchy

Puppeteer's API hierarchy mirrors the structure of the browser itself. The most commonly used classes are:

  • Puppeteer: uses the DevTools Protocol (CDP) to communicate with the browser.
  • Browser: represents a browser instance; a Browser can contain multiple BrowserContexts.
  • BrowserContext: an independent browser session; a BrowserContext can have multiple Pages.
  • Page: a browser tab; every Page has at least one Frame.
  • Frame: each Page has exactly one main frame (page.mainFrame()) and may have any number of child frames, usually created by <iframe> tags.
  • ExecutionContext: a JavaScript execution environment; each Frame has a default JavaScript execution context.
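The Page and Frame hierarchy can be walked recursively. This sketch is similar to the dumpFrameTree example in Puppeteer's API documentation, but returns lines instead of printing them; it relies only on frame.url() and frame.childFrames(). frameTree is a hypothetical name.

```javascript
// Walk the frame tree of a page, indenting child frames under their parent.
// Works with Puppeteer Frame objects; usage:
//   console.log(frameTree(page.mainFrame()).join("\n"));
function frameTree(frame, indent = "") {
  const lines = [indent + frame.url()];
  for (const child of frame.childFrames()) {
    lines.push(...frameTree(child, indent + "  "));
  }
  return lines;
}
```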

Installation

puppeteer

When you install Puppeteer, it downloads a recent version of Chromium (~170MB on Mac, ~282MB on Linux, ~280MB on Windows) that is guaranteed to work with the API.

# using npm
npm i puppeteer
# using yarn
yarn add puppeteer

puppeteer-core

Since version 1.7.0, the Puppeteer team also publishes a puppeteer-core package, which does not download Chromium. puppeteer-core is a lightweight version of Puppeteer for launching an existing browser installation or connecting to a remote one.

# using npm
npm i puppeteer-core
# using yarn
yarn add puppeteer-core
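With puppeteer-core you have to tell launch() which browser to start, via the executablePath option. A minimal sketch, assuming common default Chrome install locations (they may differ on your machine) and a hypothetical CHROME_PATH environment variable as an override; findChrome is a hypothetical helper.

```javascript
// Hypothetical helper for puppeteer-core: pick a Chrome executable.
// The candidate paths are common defaults (an assumption, not a guarantee).
const fs = require("fs");

const CANDIDATE_PATHS = {
  darwin: ["/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"],
  linux: ["/usr/bin/google-chrome", "/usr/bin/google-chrome-stable", "/usr/bin/chromium-browser", "/usr/bin/chromium"],
  win32: [
    "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
    "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe",
  ],
};

function findChrome(platform = process.platform, exists = fs.existsSync, env = process.env) {
  // An explicit override wins (CHROME_PATH is a hypothetical variable name)
  if (env.CHROME_PATH) return env.CHROME_PATH;
  for (const candidate of CANDIDATE_PATHS[platform] || []) {
    if (exists(candidate)) return candidate;
  }
  return null; // Nothing found: the caller must supply a path
}

// Usage with puppeteer-core:
// const puppeteer = require("puppeteer-core");
// const browser = await puppeteer.launch({ executablePath: findChrome() });
```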

Introduction to common APIs

How do I create a Browser instance?

  • puppeteer.launch([options])

A Browser object is created when Puppeteer connects to a Chromium instance: either puppeteer.launch starts a new browser, or puppeteer.connect attaches to one that is already running.

// launch.js
const puppeteer = require("puppeteer");

// Use puppeteer.launch to launch Chrome
(async () => {
  const browser = await puppeteer.launch({
    headless: false, // Launch with a visible browser window
    slowMo: 100, // Slow down each operation by 100 ms to make it easier to observe
    defaultViewport: { width: 1400, height: 900 },
    args: [
      // Chrome command-line switches, see https://peter.sh/experiments/chromium-command-line-switches/
      "--no-sandbox",
      "--window-size=1400,900",
    ],
  });
  const page = await browser.newPage();
  await page.goto("https://www.baidu.com");
  await page.close();
  await browser.close();
})();
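Most demos in this article repeat the same launch options. A small factory keeps them in one place; launchOptions is a hypothetical helper, and its defaults simply mirror the values used above. Note that --window-size expects "width,height" separated by a comma.

```javascript
// Hypothetical helper: build puppeteer.launch() options from a few settings.
function launchOptions({ width = 1400, height = 900, headless = false, slowMo = 0, extraArgs = [] } = {}) {
  return {
    headless,
    slowMo,
    defaultViewport: { width, height },
    // --window-size takes "width,height" (comma-separated, no spaces)
    args: [`--window-size=${width},${height}`, ...extraArgs],
  };
}

// Usage:
// const browser = await puppeteer.launch(
//   launchOptions({ slowMo: 100, extraArgs: ["--no-sandbox"] })
// );
```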

Page navigation

  • page.goto: navigates to a URL
  • page.goBack: goes back to the previous page in history
  • page.goForward: goes forward to the next page in history
  • page.reload: reloads the page
// navigation.js
const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    headless: false, // Launch with a visible browser window
    slowMo: 200,
    defaultViewport: { width: 1400, height: 900 },
    args: ["--start-fullscreen"], // Open the browser in full screen
  });
  const page = await browser.newPage();
  await page.goto("https://www.baidu.com");
  await page.goto("https://juejin.cn/");
  await page.goBack(); // Go back
  await page.goForward(); // Go forward
  await page.reload(); // Reload
  await page.close();
  await browser.close();
})();
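page.goto resolves to the main resource's response (after any redirects), or to null for navigations such as about:blank or a same-URL hash change. A small sketch that visits several URLs in one tab and records each status code; visitAll is a hypothetical helper, and the page argument only needs a goto method.

```javascript
// Hypothetical helper: visit URLs in sequence and record HTTP status codes.
async function visitAll(page, urls, options = { waitUntil: "load" }) {
  const results = [];
  for (const url of urls) {
    const response = await page.goto(url, options);
    // goto() may resolve to null (e.g. a same-document hash navigation)
    results.push({ url, status: response ? response.status() : null });
  }
  return results;
}

// Usage:
// console.log(await visitAll(page, ["https://www.baidu.com", "https://juejin.cn/"]));
```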

How do I wait for a page to load?

In practice we often need to know when a page has finished loading: when it is safe to take a screenshot, when to click a button, and so on. Puppeteer offers several ways to wait.

Waiting for navigation

  • page.goto(url[, options]): navigates to a URL
  • page.goBack([options]): goes back to the previous page in history
  • page.goForward([options]): goes forward to the next page in history
  • page.reload([options]): reloads the page
  • page.waitForNavigation([options]): waits for the page to navigate
    • options
      • timeout: maximum navigation time in milliseconds; the default is 30 seconds (30000 ms). The default can be changed with page.setDefaultNavigationTimeout(timeout).
      • waitUntil <string|Array<string>>: when to consider the navigation finished; the default is when the load event fires. If an array of events is given, the navigation is considered finished only after all of them have fired. Events:
        • load: the page's load event has fired
        • domcontentloaded: the page's DOMContentLoaded event has fired
        • networkidle0: there have been no network connections for at least 500 ms
        • networkidle2: there have been no more than 2 network connections for at least 500 ms
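The networkidle conditions can be confusing, so here is a simplified model of them (not Puppeteer's actual implementation): given time-ordered samples of open network connections, it finds the first moment the count has stayed at or below a threshold for 500 ms. A threshold of 0 corresponds to networkidle0 and 2 to networkidle2. idleTime is a hypothetical name; in real code you simply pass waitUntil to page.goto.

```javascript
// Simplified model of networkidle0 / networkidle2.
// samples: [{ t: ms, inflight: open connections from t until the next sample }]
// Returns the time at which the page counts as idle, or null if it never does.
function idleTime(samples, maxInflight, idleMs = 500) {
  let idleSince = null;
  for (let i = 0; i < samples.length; i++) {
    const { t, inflight } = samples[i];
    if (inflight <= maxInflight) {
      if (idleSince === null) idleSince = t; // an idle window starts here
    } else {
      idleSince = null; // too many connections: reset the window
    }
    // The state holds until the next sample (or forever after the last one)
    const nextT = i + 1 < samples.length ? samples[i + 1].t : Infinity;
    if (idleSince !== null && idleSince + idleMs <= nextT) {
      return idleSince + idleMs;
    }
  }
  return null;
}

const samples = [
  { t: 0, inflight: 5 },
  { t: 200, inflight: 2 },
  { t: 400, inflight: 3 },
  { t: 600, inflight: 1 },
  { t: 900, inflight: 0 },
];
console.log(idleTime(samples, 2)); // 1100 (like networkidle2)
console.log(idleTime(samples, 0)); // 1400 (like networkidle0)
```

networkidle2 is reached earlier because it tolerates a couple of long-lived connections (analytics beacons, long polling), which is why it is often preferred for pages that never become fully quiet.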

Waiting for elements, requests, and responses

  • page.waitForRequest(urlOrPredicate[, options])

Waits until a request matching the URL or predicate is sent.

  • page.waitForResponse(urlOrPredicate[, options])

Waits until a response matching the URL or predicate is received.

  • page.waitForSelector(selector[, options])

Waits until an element matching the selector appears on the page.

  • page.waitForXPath(xpath[, options])

Waits until an element matching the XPath expression appears on the page.

// navigationWait.js
const puppeteer = require("puppeteer");

// Use puppeteer.launch to launch Chrome
(async () => {
  const browser = await puppeteer.launch({
    headless: false, // Launch with a visible browser window
    defaultViewport: { width: 1400, height: 900 },
    args: ["--start-fullscreen"], // Open the browser in full screen
  });

  let page = null;

  // page.evaluate is explained later; here it just shows an alert in the page
  // so you can see when each waiting strategy has finished
  const pageAlert = async (page, pageMsg) => {
    await page.evaluate((msg) => {
      alert(msg);
    }, pageMsg);
  };

  // Default: navigation is complete when the load event fires
  page = await browser.newPage();
  await page.goto("https://juejin.cn/");
  await pageAlert(page, "default");
  await page.close();

  // Configure networkidle0 with waitUntil
  page = await browser.newPage();
  await page.goto("https://juejin.cn/", {
    // waitUntil: "load", // wait for the load event
    // waitUntil: "domcontentloaded", // wait for the DOMContentLoaded event
    waitUntil: "networkidle0", // no network connections for at least 500 ms
    // waitUntil: "networkidle2", // no more than 2 network connections for at least 500 ms
  });
  await pageAlert(page, "waitUntil: networkidle0");
  await page.close();

  // Using waitForTimeout (a fixed delay)
  page = await browser.newPage();
  await page.goto("https://juejin.cn/");
  await page.waitForTimeout(3000);
  await pageAlert(page, "waitForTimeout");
  await page.close();

  // Using waitForResponse: start waiting before navigating, so the response
  // cannot arrive before we begin listening for it
  page = await browser.newPage();
  await Promise.all([
    page.waitForResponse(
      "https://i.snssdk.com/log/sentry/v2/api/slardar/batch/"
    ),
    page.goto("https://juejin.cn/"),
  ]);
  await pageAlert(page, "waitForResponse");
  await page.close();

  // Using waitForSelector
  page = await browser.newPage();
  await page.goto("https://juejin.cn/");
  await page.waitForSelector(".entry-box");
  await pageAlert(page, "waitForSelector");
  await page.close();

  await browser.close();
})();
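Passing an exact URL to waitForResponse is brittle when the URL carries query parameters. waitForResponse also accepts a predicate; this sketch builds one from a URL fragment, an expected status, and an optional HTTP method. responseMatcher is a hypothetical helper; it relies on response.url(), response.status() and response.request().method().

```javascript
// Hypothetical helper: build a predicate for page.waitForResponse().
function responseMatcher(urlFragment, { status = 200, method } = {}) {
  return (response) =>
    response.url().includes(urlFragment) &&
    response.status() === status &&
    (method === undefined || response.request().method() === method);
}

// Usage: register the wait before triggering the navigation.
// const [response] = await Promise.all([
//   page.waitForResponse(responseMatcher("/slardar/batch/")),
//   page.goto("https://juejin.cn/"),
// ]);
```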

How do I run injected JavaScript code in the browser?

  • page.evaluate(pageFunction[, ...args])
    • pageFunction <function|string>: the function to evaluate in the page context
    • ...args <...Serializable|JSHandle>: arguments to pass to pageFunction
    • Returns: a Promise that resolves to the return value of pageFunction

If pageFunction returns a Promise, page.evaluate waits for it to resolve and returns its value. If pageFunction returns a non-serializable value, page.evaluate resolves to undefined.

const puppeteerVar = 7; // A variable on the Node.js (Puppeteer) side
const result = await page.evaluate((pageVar) => {
  // pageVar receives the value passed in from Node.js
  console.log(8 * pageVar); // Prints 56 in the browser console
  return Promise.resolve(8 * pageVar);
}, puppeteerVar);
console.log(result); // Prints 56 in the Node.js console
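A common pitfall is that pageFunction cannot see Node.js variables from its closure, which is why values must be passed through ...args. Puppeteer sends the function's source text (Function.prototype.toString) to the browser, where it is rebuilt in a fresh scope. The rough simulation below (not Puppeteer's code) uses new Function() to play the role of that fresh scope; simulateEvaluate and makeMultiplier are hypothetical names.

```javascript
// Rough simulation: rebuild a function from its source text in a fresh scope,
// the way page.evaluate effectively does in the browser.
function simulateEvaluate(pageFunction, ...args) {
  const rebuilt = new Function(`return (${pageFunction.toString()});`)();
  return rebuilt(...args);
}

function makeMultiplier() {
  const factor = 8; // Lives only in this Node.js closure
  return (x) => x * factor;
}

// Relying on the closure fails: the rebuilt copy has no `factor`.
try {
  simulateEvaluate(makeMultiplier(), 7);
} catch (err) {
  console.log(err instanceof ReferenceError); // true
}

// Passing the value as an argument works, like page.evaluate(fn, ...args).
console.log(simulateEvaluate((x, factor) => x * factor, 7, 8)); // 56
```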

Feature showcase

Screenshots

page.screenshot([options])

// screenshot.js
const fs = require("fs");
const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    headless: false, // Launch with a visible browser window
    defaultViewport: { width: 1400, height: 900 },
    args: ["--start-fullscreen"], // Open the browser in full screen
  });

  fs.mkdirSync("./temp", { recursive: true }); // Make sure the output directory exists

  const page = await browser.newPage();
  await page.goto("https://juejin.cn/", { waitUntil: "networkidle0" });
  // Take a screenshot of the entire page
  await page.screenshot({
    path: "./temp/capture.png", // Image save path
    type: "png",
    fullPage: true, // Capture the full scrollable page, not just the viewport
    // clip: { x: 0, y: 0, width: 1920, height: 800 }, // Capture only this area (cannot be combined with fullPage)
  });
  // Take a screenshot of a single element on the page
  const element = await page.$(".logo");
  if (element) {
    await element.screenshot({
      path: "./temp/element.png",
    });
  }
  await page.close();
  await browser.close();
})();
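element.screenshot crops tightly to the element. To keep some surrounding context, one option is to compute a padded clip from elementHandle.boundingBox(), which resolves to {x, y, width, height} or null when the element is not visible; page.screenshot({ clip }) accepts the same shape. paddedClip is a hypothetical helper, and it clamps the clip so it never starts at a negative coordinate.

```javascript
// Hypothetical helper: grow a bounding box by `padding` pixels on each side.
function paddedClip(box, padding = 10) {
  if (!box) return null; // Element not visible
  const x = Math.max(0, box.x - padding);
  const y = Math.max(0, box.y - padding);
  return {
    x,
    y,
    width: box.x + box.width + padding - x,
    height: box.y + box.height + padding - y,
  };
}

// Usage:
// const box = await (await page.$(".logo")).boundingBox();
// const clip = paddedClip(box, 20);
// if (clip) await page.screenshot({ path: "./temp/logo-padded.png", clip });
```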

Collecting performance metrics

tracing.start(options) & tracing.stop()

  • options
    • path: file path to write the trace to
    • screenshots: whether to capture screenshots in the trace
    • categories: custom trace categories to use instead of the defaults

Use tracing.start and tracing.stop to create a trace file that can be opened in Chrome DevTools or the Timeline Viewer.

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    args: [],
  });
  const page = await browser.newPage();
  await page.tracing.start({ path: "trace.json" }); // Start recording
  await page.goto("https://www.baidu.com");
  await page.tracing.stop(); // Stop recording and write trace.json
  await browser.close();
})();

To visualize the report, load the generated trace.json into the Performance panel of Chrome DevTools.
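For a quick look without opening DevTools, the trace can also be summarized in Node.js. This sketch assumes the file follows the Chrome Trace Event Format: an object with a traceEvents array (or a bare array of events), where complete events have ph === "X" and a dur in microseconds. topEventsByDuration is a hypothetical helper.

```javascript
// Hypothetical helper: total the duration of complete ("X") events per name.
function topEventsByDuration(trace, limit = 5) {
  const events = Array.isArray(trace) ? trace : trace.traceEvents || [];
  const totals = new Map();
  for (const e of events) {
    if (e.ph !== "X" || typeof e.dur !== "number") continue;
    totals.set(e.name, (totals.get(e.name) || 0) + e.dur);
  }
  return [...totals.entries()]
    .sort((a, b) => b[1] - a[1]) // Longest total first
    .slice(0, limit)
    .map(([name, micros]) => ({ name, ms: micros / 1000 }));
}

// Usage after page.tracing.stop():
// const trace = JSON.parse(require("fs").readFileSync("trace.json", "utf8"));
// console.table(topEventsByDuration(trace));
```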

page.metrics()

  • Timestamp: the timestamp when the metrics sample was taken
  • Documents: number of documents in the page
  • Frames: number of frames in the page
  • JSEventListeners: number of JS event listeners in the page
  • Nodes: number of DOM nodes in the page
  • LayoutCount: total number of full or partial page layouts
  • RecalcStyleCount: total number of page style recalculations
  • LayoutDuration: combined duration of all page layouts
  • RecalcStyleDuration: combined duration of all page style recalculations
  • ScriptDuration: combined duration of JavaScript execution
  • TaskDuration: combined duration of all tasks performed by the browser
  • JSHeapUsedSize: used JavaScript heap size
  • JSHeapTotalSize: total JavaScript heap size
const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
  });
  const page = await browser.newPage();
  await page.goto("https://www.baidu.com", { waitUntil: "networkidle0" });
  const res = await page.metrics();
  console.log(res);
  await browser.close();
})();
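page.metrics() returns a single snapshot, so measuring the cost of one interaction means taking a snapshot before and after it and diffing the two. diffMetrics is a hypothetical helper that subtracts every numeric field.

```javascript
// Hypothetical helper: per-field difference between two page.metrics() snapshots.
function diffMetrics(before, after) {
  const diff = {};
  for (const key of Object.keys(after)) {
    if (typeof after[key] === "number" && typeof before[key] === "number") {
      diff[key] = after[key] - before[key];
    }
  }
  return diff;
}

// Usage:
// const before = await page.metrics();
// await page.click("button"); // Some interaction to measure
// const after = await page.metrics();
// console.log(diffMetrics(before, after)); // e.g. LayoutCount, ScriptDuration deltas
```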

Injecting JavaScript code

Run JavaScript code in the browser

  • page.evaluate(pageFunction[, ...args])
    • pageFunction: the function to run in the context of the page
    • ...args: arguments to pass to pageFunction

page.evaluate is shorthand for page.mainFrame().evaluate(pageFunction, ...args).

// evaluate.js
const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: { width: 1400, height: 900 },
    args: ["--start-fullscreen"],
    devtools: true, // Open DevTools so the page's console output is visible
  });
  const page = await browser.newPage();
  await page.goto("https://www.baidu.com", {
    waitUntil: "networkidle0",
  });
  // Serialize window.performance.timing to JSON in the page, then parse it in Node.js
  const performance = JSON.parse(
    await page.evaluate(() => {
      console.log("Hi, puppeteer"); // Printed in the browser console
      return JSON.stringify(window.performance.timing);
    })
  );
  console.log(performance);

  await page.waitForTimeout(5000);
  await browser.close();
})();
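The raw performance.timing object is a list of absolute timestamps (the legacy Navigation Timing Level 1 interface). Subtracting pairs of them gives the durations people usually care about. timingBreakdown is a hypothetical helper, and the selection of intervals below is one common choice, not a standard.

```javascript
// Hypothetical helper: derive durations (ms) from a performance.timing object.
function timingBreakdown(t) {
  return {
    dns: t.domainLookupEnd - t.domainLookupStart,
    tcp: t.connectEnd - t.connectStart,
    ttfb: t.responseStart - t.requestStart, // Request sent -> first byte received
    download: t.responseEnd - t.responseStart,
    domContentLoaded: t.domContentLoadedEventEnd - t.navigationStart,
    load: t.loadEventEnd - t.navigationStart,
  };
}

// Usage with the parsed object from the demo above:
// console.log(timingBreakdown(performance));
```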

page.evaluateOnNewDocument(pageFunction[, …args])

The function is invoked after the document is created but before any of its scripts run, each time the page navigates (and whenever a child frame is attached or navigated). It is often used to modify the page's JavaScript environment, for example to seed Math.random.

Example: Override the console.log method

// evaluateOnNewDocument.js
const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    devtools: true,
  });
  const page = await browser.newPage();
  // Runs in every new document before the page's own scripts
  await page.evaluateOnNewDocument(function () {
    console.log("page.evaluateOnNewDocument");
    const log = console.log;
    // Prefix every console.log call with a green " log " label
    console.log = function (...par) {
      log("%c log ", "color: green;", ...par);
    };
  });
  await page.goto("https://www.baidu.com");
  await page.evaluate(async () => {
    console.log(window.navigator.userAgent);
  });

  await page.waitForTimeout(20000);
  await browser.close();
})();
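The text mentions seeding Math.random. One way to do that, assuming a deterministic but non-cryptographic generator is acceptable, is to install a small PRNG; the sketch below uses the well-known mulberry32 algorithm. installSeededRandom is a hypothetical name. The function is self-contained and takes the seed as an argument because Puppeteer serializes it with toString(), so it cannot use outside variables.

```javascript
// Intended usage: await page.evaluateOnNewDocument(installSeededRandom, 42);
// Replaces Math.random with a seeded mulberry32 generator (not cryptographically secure).
function installSeededRandom(seed) {
  let state = seed >>> 0;
  Math.random = function () {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // Uniform in [0, 1)
  };
}
```

With the same seed, every page load sees the same sequence of "random" numbers, which makes screenshots and tests of randomized UIs reproducible.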

Screenshot: output in the browser console

Screenshot: the code as seen in the browser

Calling Node.js functions from the browser

page.exposeFunction(name, puppeteerFunction) adds a function called name to the page's window object. When it is called, puppeteerFunction runs in Node.js, and the call returns a Promise that resolves to puppeteerFunction's return value.

Example: call a readfile function, defined in Node.js, from JavaScript in the page

// exposeFunction.js
const puppeteer = require("puppeteer");
const path = require("path");
const fs = require("fs");

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    devtools: true,
  });
  const page = await browser.newPage();

  // Define window.readfile in the page; the function body runs in Node.js
  await page.exposeFunction("readfile", async (filePath) => {
    console.log("fs.readFile"); // Printed in the Node.js terminal, not the browser
    return fs.promises.readFile(filePath, "utf8");
  });

  const filePath = path.resolve("./exposeFunction.js");

  await page.evaluate(async (filePath) => {
    // Read the contents of the file with window.readfile
    console.log(window.readfile);
    const content = await window.readfile(filePath);
    document.querySelector("body").innerText = content;
  }, filePath);
  await page.waitForTimeout(10000);
  await browser.close();
})();
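Any script running in the page, including third-party scripts, can call an exposed function, so an unrestricted readfile can read any file the Node.js process can. A hedged hardening sketch: resolve the requested path against a base directory and refuse anything that escapes it. resolveInside is a hypothetical helper.

```javascript
// Hypothetical helper: only allow paths that stay inside baseDir.
const path = require("path");

function resolveInside(baseDir, requested) {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, requested);
  const rel = path.relative(base, target);
  const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (rel === "" || escapes) {
    throw new Error(`Access denied: ${requested}`);
  }
  return target;
}

// Usage:
// await page.exposeFunction("readfile", (filePath) =>
//   fs.promises.readFile(resolveInside(__dirname, filePath), "utf8"));
```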

In the browser, readfile simply appears as an asynchronous function.

Simulating login

  • page.$(selector)

Runs document.querySelector within the page. If no element matches the selector, it resolves to null.

  • elementHandle.type(text[, options])

Focuses the element, then sends keydown, keypress/input, and keyup events for each character in the text.

  • elementHandle.click([options])

Clicks the element represented by elementHandle, scrolling it into view first if needed.

Simulating a Juejin login

// login.js
const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    headless: false, // Launch with a visible browser window
    slowMo: 100,
    defaultViewport: { width: 1400, height: 900 },
    args: ["--start-fullscreen"], // Open the browser in full screen
  });

  const page = await browser.newPage();
  await page.goto("https://juejin.cn/");

  // Open the login dialog (waitForSelector resolves to the element handle)
  const showLoginModalBtnEle = await page.waitForSelector("button.login-button");
  await showLoginModalBtnEle.click();
  // Wait for the login dialog to render
  await page.waitForSelector("form.auth-form");

  // Switch to the other login method
  const otherLoginTypeEle = await page.$("span.clickable");
  await otherLoginTypeEle.click();

  // Enter the account (replace with your own)
  const accountInputEle = await page.waitForSelector("input[name='loginPhoneOrEmail']");
  await accountInputEle.type("[email protected]", { delay: 20 });

  // Enter the password (replace with your own)
  const pwdInputEle = await page.$("input[name='loginPassword']");
  await pwdInputEle.type("0099 @ the & # 123123", { delay: 20 });

  // Click the login button
  const submitBtnEle = await page.$("form.auth-form .btn");
  await submitBtnEle.click();

  await page.waitForTimeout(2000);

  await page.close();
  await browser.close();
})();
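The login demo repeats the same "wait for the input, then type" pattern and hard-codes credentials. A small helper can fill several fields from a selector-to-value map, so credentials can come from the environment instead. fillForm and the JUEJIN_ACCOUNT / JUEJIN_PASSWORD variable names are hypothetical; the triple click (click({ clickCount: 3 })) selects any existing text so typing replaces it.

```javascript
// Hypothetical helper: fill inputs given a { selector: value } map.
async function fillForm(page, fields, { delay = 20, timeout = 10000 } = {}) {
  for (const [selector, value] of Object.entries(fields)) {
    if (value == null) throw new Error(`No value provided for ${selector}`);
    const input = await page.waitForSelector(selector, { timeout });
    await input.click({ clickCount: 3 }); // Select existing text so typing replaces it
    await input.type(String(value), { delay });
  }
}

// Usage:
// await fillForm(page, {
//   "input[name='loginPhoneOrEmail']": process.env.JUEJIN_ACCOUNT,
//   "input[name='loginPassword']": process.env.JUEJIN_PASSWORD,
// });
```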

Emulating devices

  • page.emulate(options)

Emulating an iPhone 6

// devices.js
const puppeteer = require("puppeteer");
const iPhone = puppeteer.devices["iPhone 6"]; // puppeteer.devices contains presets for many devices

// Use puppeteer.launch to launch Chrome
(async () => {
  const browser = await puppeteer.launch({
    headless: false, // Launch with a visible browser window
  });
  const page = await browser.newPage();
  await page.emulate(iPhone); // Apply the iPhone 6 viewport and user agent
  await page.goto("https://www.taobao.com", { waitUntil: "networkidle0" });

  await browser.close();
})();
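page.emulate takes a descriptor of the shape { userAgent, viewport: { width, height, deviceScaleFactor, isMobile, hasTouch, isLandscape } }, the same shape as the entries in puppeteer.devices. puppeteer.devices already ships landscape variants for many devices (for example "iPhone 6 landscape"); for custom descriptors, a hypothetical toLandscape helper can derive one without mutating the original.

```javascript
// Hypothetical helper: rotate a device descriptor to landscape.
function toLandscape(device) {
  const { width, height } = device.viewport;
  return {
    ...device,
    name: `${device.name} landscape`,
    viewport: { ...device.viewport, width: height, height: width, isLandscape: true },
  };
}

// Usage:
// await page.emulate(toLandscape(myCustomDevice));
```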

Request interception

  • page.setRequestInterception(boolean)

Enabling request interception activates the request.abort, request.continue, and request.respond methods, which make it possible to modify the network requests the page makes. Once interception is enabled, every request stalls until it is continued, responded to, or aborted.

Example: abort all image requests with a request interceptor

const puppeteer = require('puppeteer');

puppeteer.launch().then(async (browser) => {
  const page = await browser.newPage();
  await page.setRequestInterception(true);
  page.on('request', (interceptedRequest) => {
    // Abort .png and .jpg requests; let everything else through
    if (interceptedRequest.url().endsWith('.png') || interceptedRequest.url().endsWith('.jpg'))
      interceptedRequest.abort();
    else
      interceptedRequest.continue();
  });
  await page.goto('https://www.baidu.com');
  await browser.close();
});
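Checking file extensions misses images whose URLs carry query strings or use other formats (for example .webp or .gif). request.resourceType() classifies requests instead, so a handler can block whole categories. makeResourceBlocker is a hypothetical helper; the blocked types are a parameter.

```javascript
// Hypothetical helper: abort requests of the given resource types.
function makeResourceBlocker(blockedTypes = ["image"]) {
  const blocked = new Set(blockedTypes);
  return (request) => {
    if (blocked.has(request.resourceType())) request.abort();
    else request.continue(); // Every intercepted request must be handled
  };
}

// Usage:
// await page.setRequestInterception(true);
// page.on("request", makeResourceBlocker(["image", "font", "media"]));
```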

Example: modifying the response of Juejin's author-list API

// requestInterception.js
const puppeteer = require("puppeteer");

// Mock data returned in place of the real author-list API response
const mock = {
  err_no: 0,
  err_msg: "success",
  data: [
    {
      user_id: "123123",
      user_name: "I'm a mock user",
      got_digg_count: 7314,
      got_view_count: 506897,
      avatar_large:
        "https://sf6-ttcdn-tos.pstatp.com/img/user-avatar/bfc66a5d7055015e8c7f6b7944dfe747~300x300.image",
      company: "aaaaa",
      job_title: "Public Account",
      level: 5,
      description: "https://github.com/newbee-ltd",
      author_desc: "",
      isfollowed: false,
    },
  ],
  cursor: "20",
  count: 99,
  has_more: true,
};

(async () => {
  const browser = await puppeteer.launch({
    headless: false, // Launch with a visible browser window
    defaultViewport: { width: 1400, height: 900 },
    args: [
      "--start-fullscreen", // Open the browser in full screen
      "--disable-web-security", // Disable the same-origin policy
    ],
  });
  const pages = await browser.pages();
  const page = pages[0]; // Reuse the tab that opens with the browser
  // Request interception
  await page.setRequestInterception(true);
  page.on("request", (interceptedRequest) => {
    if (interceptedRequest.url().includes("/user_api/v1/author/recommend")) {
      interceptedRequest.respond({
        status: 200,
        contentType: "application/json; charset=utf-8",
        body: JSON.stringify(mock),
      });
      // A request can be handled only once, so don't also call continue()
      // (see https://github.com/puppeteer/puppeteer/issues/3853)
      return;
    }
    interceptedRequest.continue();
  });

  await page.goto("https://juejin.cn/recommendation/authors/recommended");
  await page.waitForTimeout(5000);
  await browser.close();
})();
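When several endpoints need mocking, the handler above can be generalized into a rule table. This is a sketch; makeMockResponder and the rule shape { urlIncludes, status, body } are hypothetical. Each request is handled exactly once: answered with the first matching rule's mock body, or continued unchanged.

```javascript
// Hypothetical helper: respond to matching requests with JSON mock bodies.
function makeMockResponder(rules) {
  return (request) => {
    const rule = rules.find((r) => request.url().includes(r.urlIncludes));
    if (!rule) {
      request.continue(); // No mock for this URL: let it through
      return;
    }
    request.respond({
      status: rule.status || 200,
      contentType: "application/json; charset=utf-8",
      body: JSON.stringify(rule.body),
    });
  };
}

// Usage:
// page.on("request", makeMockResponder([
//   { urlIncludes: "/user_api/v1/author/recommend", body: mock },
// ]));
```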

Screenshot: the page as displayed after modifying the API response

Going further: combining mock data with DOM-structure checks, or with screenshots plus AI image recognition, makes it possible to build UI tests that cover complex pages with many states.

The Puppeteer ecosystem

  • Chinese API documentation

  • Rize

Rize provides a high-level, fluent, chainable API that makes Puppeteer easier to use.

  • jest-puppeteer

Run Jest tests with Puppeteer.

  • mocha-headless-chrome

Run Mocha tests in headless Chrome via Puppeteer.

  • expect-puppeteer

An assertion library for Puppeteer.

  • headless-chrome-crawler

A distributed crawler built on Puppeteer.

DEMO link

Github.com/lzsheng/pup…

What's next

We are currently using Puppeteer to build an automated web performance testing project; when time allows, I'll write it up and share it with you.