Puppeteer
Puppeteer is an official Google Node library that controls headless Chrome over the DevTools Protocol. Its API gives direct control of Chrome and can simulate most user operations, so it can be used for UI testing or as a crawler that visits pages and collects data.
Chinese documentation: zhaoqize.github.io/puppeteer-a…
Features:
What can Puppeteer do?
- Generate page PDF.
- Crawl an SPA (single-page application) and generate pre-rendered content.
- Automatic form submission, UI testing, keyboard input, etc.
- Create an up-to-date automated test environment. Run tests directly in the latest version of Chrome using the latest JavaScript and browser features.
- Capture the Timeline trace for the site to help analyze performance issues.
- Test browser extensions.
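As one illustration of the list above, capturing a Timeline trace could look roughly like the sketch below. It uses page.tracing.start and page.tracing.stop; the helper name captureTrace and its url and tracePath parameters are made up for this example, and page is assumed to be a page already opened with browser.newPage().

```javascript
// Sketch: record a performance trace while a page loads.
// captureTrace, url and tracePath are illustrative names, not part of Puppeteer.
async function captureTrace(page, url, tracePath) {
  // Start recording; the trace file is written to tracePath when tracing stops.
  await page.tracing.start({ path: tracePath, screenshots: true });
  await page.goto(url);
  await page.tracing.stop();
  return tracePath;
}
```

Inside an async function this might be called as captureTrace(page, 'https://www.example.com', 'trace.json'), and the resulting file can then be loaded into the Performance panel of Chrome DevTools.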
Puppeteer is easy to install as it is an NPM package:
npm i puppeteer
or
yarn add puppeteer
How to use:
// Import the Puppeteer module
const puppeteer = require('puppeteer')
// puppeteer.launch() starts a browser instance
async function test() {
  // launch() accepts an options object, e.g. {headless: false}, to run the browser with or without a UI
  // Headless mode is faster; a visible window is generally used for debugging during development
  const options = {
    // Set the window width and height
    defaultViewport: {
      width: 1400,
      height: 800
    },
    // false shows the browser window; true runs headless
    headless: false,
    // Slow down each step by this many milliseconds
    slowMo: 250
  }
  const browser = await puppeteer.launch(options);
  // Open a new page
  const page = await browser.newPage();
  // Navigate to the target URL
  await page.goto('http://www.baidu.com')
  // Take a screenshot
  await page.screenshot({path: 'test.png'});
  // Print the page to PDF (older Chrome versions only support this when headless is true)
  await page.pdf({path: 'example.pdf', format: 'A4'});
  // Close the browser
  await browser.close();
}
test()
// Get the page content (run inside the async function, after page.goto)
// Listen for console events first, so that output logged in the browser is printed in Node
page.on('console', (msg) => {
  console.log(msg.text());
})
// $$eval runs the callback inside the browser, so its console.log output is produced by the browser
await page.$$eval('#head #s-top-left a', (res) => {
  //console.log(res);
  res.forEach((item, index) => {
    console.log(item.getAttribute('href'));
  })
})
// Get the element handles and click the first one
const links = await page.$$('#head #s-top-left a')
await links[0].click();
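The snippet above logs from inside the browser. Another option, sketched here, is to return the values from the $$eval callback so that they arrive in Node directly. collectHrefs is a hypothetical helper name, and page is assumed to be an open Puppeteer page.

```javascript
// Sketch: collect link targets in the browser and return them to Node.
// collectHrefs is an illustrative name, not part of Puppeteer.
async function collectHrefs(page, selector) {
  // The callback runs in the page; only its serializable return value comes back.
  return page.$$eval(selector, (anchors) =>
    anchors.map((a) => a.getAttribute('href'))
  );
}
```

With the selector used earlier, a call might read: const hrefs = await collectHrefs(page, '#head #s-top-left a').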
// Search through form input
const inputBox = await page.$('#form .s_ipt_wr #kw')
await inputBox.focus() // Place the cursor in the input box
await page.keyboard.type('Node.js') // Type into the input box
const search = await page.$('.s_btn_wr input[type=submit]')
await search.click() // Click the search button
Crawler practice
Many web pages use the User-Agent to determine the device type. This can be simulated with page.emulate(options). options has two configuration items: userAgent and viewport. viewport can set the width (width), height (height), screen scaling (deviceScaleFactor), whether the device is mobile (isMobile), and whether touch events are supported (hasTouch).
const puppeteer = require('puppeteer');
// This import path is version-dependent; other Puppeteer releases expose the device list as puppeteer.devices or puppeteer.KnownDevices
const devices = require('puppeteer/DeviceDescriptors');
const iPhone = devices['iPhone 6'];
puppeteer.launch().then(async browser => {
  const page = await browser.newPage();
  await page.emulate(iPhone);
  await page.goto('https://www.example.com');
  // other actions...
  await browser.close();
});
The code above simulates an iPhone 6 visiting a website; devices holds the emulation parameters for common devices that ship with Puppeteer.
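When no built-in descriptor fits, the options object can also be written by hand. The sketch below assumes a made-up profile called customPhone with illustrative values, and a hypothetical helper openAsDevice that applies it before navigating.

```javascript
// Hypothetical device profile; the values are examples, not a built-in descriptor.
const customPhone = {
  userAgent:
    'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) ' +
    'AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1',
  viewport: {
    width: 375,
    height: 667,
    deviceScaleFactor: 2,
    isMobile: true,
    hasTouch: true,
  },
};

// Apply the profile first, then navigate, so the site sees the emulated device.
async function openAsDevice(page, url, device) {
  await page.emulate(device); // sets both the user agent and the viewport
  await page.goto(url);
}
```

Calling page.emulate before page.goto matters here, because a site that inspects the User-Agent does so on the first request.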
Many web pages require login, and there are two solutions:
- Have Puppeteer fill in the login form. Use page.click(selector[, options]) to click or page.focus(selector) to focus an element, then page.type(selector, text[, options]) to enter the given string. Setting delay in options makes the typing look more like a real person. keyboard.down(key[, options]) can also be used to press keys one at a time.
- If the login state is determined by cookies, use page.setCookie(...cookies). To keep the cookies alive, visit the site periodically.
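The first approach could be sketched as follows. loginWithForm and its selector fields are hypothetical names for this example, the 100 ms delay is an arbitrary choice, and the sketch assumes that submitting the form triggers a page navigation.

```javascript
// Sketch: log in by typing into a form like a user would.
// loginWithForm and the fields of `form` are illustrative, not part of Puppeteer.
async function loginWithForm(page, form) {
  await page.focus(form.userSelector);
  // delay is the pause in milliseconds between key presses
  await page.type(form.userSelector, form.username, { delay: 100 });
  await page.type(form.passSelector, form.password, { delay: 100 });
  // Start waiting for the navigation before clicking, so it is not missed.
  await Promise.all([
    page.waitForNavigation(),
    page.click(form.submitSelector),
  ]);
}
```

If the site logs in through an in-page request instead of a navigation, the waitForNavigation call would need to be replaced by a wait that suits that site.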
Tip: Some websites require scanning a QR code to log in, while other pages under the same domain offer a regular login. You can log in on a page that allows it, then reuse the cookies to skip the QR code scan.
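Reusing cookies between runs could be done along these lines: a minimal sketch built on page.cookies() and page.setCookie(...cookies). The helper names saveCookies and restoreCookies and the idea of keeping the cookies in a JSON file are assumptions made for this example.

```javascript
const fs = require('fs');

// Sketch: save the cookies of a logged-in page to a JSON file.
// saveCookies and restoreCookies are illustrative names, not part of Puppeteer.
async function saveCookies(page, file) {
  const cookies = await page.cookies();
  fs.writeFileSync(file, JSON.stringify(cookies, null, 2));
  return cookies.length;
}

// Load the saved cookies into a page before visiting the protected URL.
async function restoreCookies(page, file) {
  const cookies = JSON.parse(fs.readFileSync(file, 'utf8'));
  await page.setCookie(...cookies);
  return cookies.length;
}
```

A later run would call restoreCookies(page, 'cookies.json') before page.goto, and the login only has to be repeated once the cookies expire.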
Refer to the official documentation for more powerful features: zhaoqize.github.io/puppeteer-a…
Today's material was quite interesting, but what I learned is still fairly basic and not enough for in-depth use!
Date: 2021/11/17
Learning reference video: www.bilibili.com/video/BV1i7…