This article was originally published on the vivo Internet Technology WeChat official account. Link: mp.weixin.qq.com/s/P-YdQPOQ9… Author: Wang Zhenzheng
Puppeteer is a Node.js package that provides a set of APIs for controlling Chrome. By default it drives a headless Chrome browser, though it can also be configured to run with a UI. Puppeteer can be used to crawl page data, take screenshots of pages or generate PDF files, run automated front-end tests (simulating input, click, and keyboard behavior), and capture a site's timeline to analyze performance issues.
I. Background
Although Puppeteer is a Node.js package that the Chrome team released back in 2017, our team rarely used it in daily work. Some time ago, while developing a chat tool, we needed to add emojis. The business side wanted Google's emojis, so we had to save those images from Emojipedia. Saving that many images one by one by hand would be a waste of development time. Our first idea was to call the page's API, read the emoji image URLs from the response, and loop over them to save each one locally.
We then found that the emoji images all sit under a ul element whose class is emoji-grid. So if we collect the URLs of all the img nodes under that ul and save each one locally, wouldn't that give us all the emojis?
That idea led us to Puppeteer. Before introducing Puppeteer itself, here is the simple code that grabs the emojis.
const puppeteer = require('puppeteer')
const request = require('request')
const fs = require('fs')

async function getEmojiImage (url) {
  // puppeteer.launch() returns a Promise that resolves to a Browser
  const browser = await puppeteer.launch()
  // Create a new Page object
  const page = await browser.newPage()
  // Navigate the page to the target URL
  await page.goto(url, { waitUntil: 'networkidle2' })
  // Wait 3000 ms for the page to finish loading
  await page.waitFor(3000)
  // The page.evaluate callback runs in the page context, so it can manipulate the DOM
  const emojis = await page.evaluate(() => {
    let ol = document.getElementsByClassName('emoji-grid')[0]
    let imgs = ol.getElementsByTagName('img')
    let url = []
    for (let i = 0; i < 97; i++) {
      url.push(imgs[i].getAttribute('src'))
    }
    return url
  })
  // Collect the emoji metadata that will be written out as JSON
  let json = []
  for (let i = 0; i < emojis.length; i++) {
    const name = emojis[i].slice(emojis[i].lastIndexOf('/') + 1)
    // Write the emoji image to a local file
    request(emojis[i]).pipe(fs.createWriteStream('./' + (i < 10 ? '0' + i : i) + name))
    json.push({
      name,
      url: `./a/a/${name}` // your URL address
    })
    console.log(`${name} - emoji write successful`)
  }
  // Write the JSON file
  fs.writeFile('./google-emoji.json', JSON.stringify(json), function () {})
  // Close the headless browser
  await browser.close()
}

getEmojiImage('https://emojipedia.org/google/')
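The file-naming step inside the loop above can be pulled out into a small pure function, which makes the zero-padding and name extraction easier to check in isolation. This is only a minimal sketch: `localFileName` is a hypothetical helper that is not part of the original script, and the example URL is made up.

```javascript
// Hypothetical helper: builds the local file name used when saving an emoji.
// It mirrors the logic in the loop: zero-pad the index to two digits, then
// append the last path segment of the image URL.
function localFileName (index, imageUrl) {
  const name = imageUrl.slice(imageUrl.lastIndexOf('/') + 1)
  const prefix = index < 10 ? '0' + index : String(index)
  return prefix + name
}

console.log(localFileName(3, 'https://example.com/img/grinning-face.png'))
```

Keeping the naming logic separate from the download also makes it easy to change the padding width later without touching the Puppeteer code.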
Before we get into Puppeteer, let’s take a look at Headless Chrome.
II. Headless Chrome
Headless Chrome shipped in Chrome 59. It runs the Chrome browser in a headless environment, that is, without the browser's own UI. It brings all the modern web platform features provided by Chromium and the Blink rendering engine to the command line.
Using headless mode from a terminal: as an example, we open vivo's official website with the following command
chrome --headless --disable-gpu --remote-debugging-port=8080 https://vivo.com.cn
Note: on a Mac, it is recommended to set up a chrome alias before running the command
alias chrome="/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome"
At this point, Headless Chrome is running. Open http://127.0.0.1:8080 in a browser and you will see the vivo page, as follows:
In addition, you can also use the command line to perform the following common operations:
1. Print DOM:
chrome --headless --disable-gpu --dump-dom https://vivo.com.cn
2. Create a PDF file:
chrome --headless --disable-gpu --print-to-pdf https://vivo.com.cn
3. Take a screenshot:
chrome --headless --disable-gpu --screenshot https://vivo.com.cn
# Set the screenshot size
chrome --headless --disable-gpu --screenshot --window-size=1280,1696 https://vivo.com.cn
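The same command lines can also be assembled from Node, which is handy when the flags depend on runtime options. The sketch below uses only flags that appear in this article. `headlessArgs` and `runHeadless` are hypothetical helper names, and `chromePath` is assumed to point at a local Chrome binary.

```javascript
const { spawn } = require('child_process')

// Builds the argument list for a headless Chrome run.
// Only flags shown in this article are used.
function headlessArgs (url, options = {}) {
  const args = ['--headless', '--disable-gpu']
  if (options.dumpDom) args.push('--dump-dom')
  if (options.pdf) args.push('--print-to-pdf')
  if (options.screenshot) args.push('--screenshot')
  if (options.windowSize) {
    const { width, height } = options.windowSize
    args.push(`--window-size=${width},${height}`)
  }
  args.push(url)
  return args
}

// Assumption: chromePath is the path to a Chrome executable on this machine.
// Not called here, so the sketch runs even where Chrome is not installed.
function runHeadless (chromePath, url, options) {
  return spawn(chromePath, headlessArgs(url, options), { stdio: 'inherit' })
}

console.log(headlessArgs('https://vivo.com.cn', {
  screenshot: true,
  windowSize: { width: 1280, height: 1696 }
}).join(' '))
```

Passing the arguments as an array to `spawn` avoids shell quoting problems that can occur when the URL contains special characters.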
So what is Puppeteer, and what can it do? Puppeteer is a Node library that provides a set of APIs for controlling Chrome. It runs a headless Chrome browser by default and can also be configured to show a UI, which is off by default.
III. Puppeteer
What can Puppeteer do? As the demo at the beginning of this article shows, Puppeteer can crawl page data. Beyond that, combined with what we just saw of the Headless Chrome command line, Puppeteer can do the following:
- Crawl page data
- Take a screenshot of the page or generate a PDF file
- Front-end automation testing (simulating input/click/keyboard behavior)
- Capture a timeline of the site to analyze site performance issues
1. A first look
This is the API hierarchy diagram provided by Puppeteer
(Image from the Internet)
As you can see, Puppeteer uses the Chrome DevTools Protocol (CDP) to communicate with the browser. Browser corresponds to a browser instance and can own browser contexts; one Browser can contain multiple BrowserContexts. Page represents a tab, and one BrowserContext can contain multiple Pages. Each Page has a main Frame, and ExecutionContext is the JavaScript execution environment that a Frame provides.
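To make the hierarchy concrete, the sketch below walks Browser, BrowserContext, Page, and Frame and collects the frame URLs. It relies on `browser.browserContexts()`, `context.pages()`, `page.frames()`, and `frame.url()` from the Puppeteer API. `describeHierarchy` is a hypothetical helper name, and the stand-in `fakeBrowser` object is an assumption used only so the example runs without launching Chrome.

```javascript
// Walks Browser -> BrowserContext -> Page -> Frame and returns the frame URLs
// as nested arrays: one array per context, containing one array per page.
async function describeHierarchy (browser) {
  const result = []
  for (const context of browser.browserContexts()) {
    const pages = await context.pages()
    result.push(pages.map(page => page.frames().map(frame => frame.url())))
  }
  return result
}

// Stand-in object with the same method names, so the sketch runs without Chrome.
const fakeBrowser = {
  browserContexts: () => [{
    pages: async () => [{
      frames: () => [{ url: () => 'about:blank' }]
    }]
  }]
}

describeHierarchy(fakeBrowser).then(tree => console.log(JSON.stringify(tree)))
```

With a real browser, the same function could be called with the object returned by `puppeteer.launch()`.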
2. Browser
Everything starts with Browser, so let's look at what happens with a Browser instance.
First, create an instance of Browser through puppeteer.launch()
const browser = await puppeteer.launch({
  // Pass the remote debugging port to Chrome
  args: ['--remote-debugging-port=3333']
})
console.log(browser.wsEndpoint())
Printing browser.wsEndpoint() outputs a link like the following:
ws://127.0.0.1:57546/devtools/browser/5d6ee624-6b8c-4b5e-b284-5e4800eac853
This is the address DevTools uses to connect to the page being debugged. The WebSocket connection follows CDP. Let's look at the messages sent over it.
{"id":46,"method":"CSS.getMatchedStylesForNode","params":{"nodeId":5}}
{"id":47,"method":"CSS.getComputedStyleForNode","params":{"nodeId":5}}
Each message carries an incrementing id value, followed by the method and params fields. These messages instruct the page being debugged to perform various actions. In other words, any program that implements CDP can be used to debug pages; the protocol opens up an interface for applications to control page behavior. For example, we can simulate an alert on a page like this.
{"id":190,"method":"Runtime.compileScript","params":{"expression":"alert()","sourceURL":"","persistScript":false,"executionContextId":3}}
Working with raw messages like this is too unfriendly. Puppeteer implements a high-level Node API on top of CDP, so we can invoke the corresponding commands simply and conveniently.
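The message format described in this section (an incrementing id plus method and params) can be illustrated with a small builder. This is a minimal sketch of the message shape only; `createMessageBuilder` is a hypothetical name, and nothing is sent over a WebSocket here.

```javascript
// Creates a function that produces CDP-style messages with an incrementing id.
function createMessageBuilder (startId = 1) {
  let nextId = startId
  return function build (method, params = {}) {
    return JSON.stringify({ id: nextId++, method, params })
  }
}

// Rebuild the two CSS messages shown earlier in this section.
const build = createMessageBuilder(46)
console.log(build('CSS.getMatchedStylesForNode', { nodeId: 5 }))
console.log(build('CSS.getComputedStyleForNode', { nodeId: 5 }))
```

The id lets the sender match each reply to the request that caused it, which is why it has to be unique within one connection.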
3. Page
browser.newPage() creates a Page through the browser's browser context. Let's look at the code behind newPage().
/**
 * @param {?string} contextId
 * @return {!Promise<!Puppeteer.Page>}
 */
async _createPageInContext(contextId) {
  const {targetId} = await this._connection.send('Target.createTarget', {url: 'about:blank', browserContextId: contextId || undefined});
  const target = await this._targets.get(targetId);
  assert(await target._initializedPromise, 'Failed to create target for page');
  const page = await target.page();
  return page;
}
this._connection.send('Target.createTarget', {}) creates a page using Target.createTarget from CDP, and the other API methods work the same way. For example, page.goto() actually executes client.send('Page.navigate', {}). Some Page operations, such as click and simulated input, are delegated to DOMWorld instances, which are managed by FrameManager. The Page object relies on three main managers for common operations:
- **FrameManager:** page behavior management. For example, navigating with goto, clicking with click, simulating input with type, waiting for loading with waitFor, and so on
- **NetworkManager:** network behavior management. For example, ignoring the cache for each request with setCacheEnabled, enabling request interception with setRequestInterception, and so on
- **EmulationManager:** emulation behavior management. It has only one method, emulateViewport, which emulates a device by its viewport dimensions
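The idea that a high-level method such as goto is a thin wrapper around a CDP command can be sketched as follows. `MiniPage` is a hypothetical, heavily simplified class and not Puppeteer's real Page implementation. The client is assumed to be any object with a `send(method, params)` method that returns a Promise.

```javascript
// Hypothetical, simplified wrapper: NOT Puppeteer's real Page class.
class MiniPage {
  constructor (client) {
    // Anything with send(method, params) returning a Promise
    this._client = client
  }

  // High-level call that hides the underlying CDP command.
  goto (url) {
    return this._client.send('Page.navigate', { url })
  }
}

// Stand-in client that records what would be sent over the WebSocket.
const sent = []
const fakeClient = {
  send: async (method, params) => {
    sent.push({ method, params })
    return {}
  }
}

new MiniPage(fakeClient).goto('https://vivo.com.cn').then(() => {
  console.log(JSON.stringify(sent))
})
```

The real library adds lifecycle tracking, timeouts, and error handling on top of this, which is what the managers listed above are responsible for.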
IV. Application
Besides grabbing emojis as shown at the beginning of the article, we tried Puppeteer in a front-end automated testing scenario. When developing and testing back-office management systems, we often deal with form submission. Validating the different fields of a form means simulating many different scenarios; clicking through by hand is inefficient, and retyping the form every time is tedious.
For this scenario, we use Puppeteer to auto-fill the form, save it, print the data returned by the API, and take a screenshot.
STEP 1
Create an instance of the Browser class and initialize it with parameters.
const browser = await puppeteer.launch({
  devtools: true, // Whether to automatically open the DevTools panel for each tab
  headless: false, // Whether to run the browser in headless mode. Defaults to true unless devtools is true
  defaultViewport: { width: 1000, height: 1200 }, // Default viewport size for each page
  ignoreHTTPSErrors: true // Whether to ignore HTTPS errors during navigation
})
STEP 2
Create a Page instance and navigate to a URL
const page = await browser.newPage()
await page.goto(url, {
waitUntil: 'networkidle0'
})
The waitUntil parameter determines which condition must be met before navigation is considered complete. It accepts the following events:
- load: the page's load event has fired
- domcontentloaded: the page's DOMContentLoaded event has fired
- networkidle0: there have been no network connections for at least 500 ms
- networkidle2: there have been no more than 2 network connections for at least 500 ms
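The difference between networkidle0 and networkidle2 can be illustrated with a simplified model. This is a sketch of the rule described in the list, not Puppeteer's implementation. `findIdleTime` and the event format are assumptions made for illustration: each event records the number of in-flight connections from that moment on.

```javascript
// Simplified model, not Puppeteer's implementation.
// events: [{ time, inflight }] sorted by time (ms).
// Returns the time at which the connection count has stayed at or below
// maxInflight for idleMs, or null if that never happens.
function findIdleTime (events, maxInflight, idleMs = 500) {
  let idleStart = null
  for (const { time, inflight } of events) {
    // A quiet period that started earlier may already be long enough.
    if (idleStart !== null && time - idleStart >= idleMs) {
      return idleStart + idleMs
    }
    if (inflight <= maxInflight) {
      if (idleStart === null) idleStart = time
    } else {
      idleStart = null
    }
  }
  // No further events: the quiet period runs to completion.
  return idleStart === null ? null : idleStart + idleMs
}

const events = [
  { time: 0, inflight: 3 },
  { time: 100, inflight: 2 },
  { time: 300, inflight: 3 },
  { time: 400, inflight: 1 },
  { time: 1000, inflight: 0 }
]
console.log(findIdleTime(events, 2)) // threshold used by networkidle2
console.log(findIdleTime(events, 0)) // threshold used by networkidle0
```

In this model the networkidle2 condition is met earlier than networkidle0, because it tolerates the one connection that is still open between 400 ms and 1000 ms.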
Here we use networkidle0, so navigation is considered complete once there are no more network connections. Note that a back-office management system usually verifies a token. There are two ways to handle this: wait for the page to redirect to the login page, simulate the login, and then return; or set the token directly in a cookie. We use the second option, with the following code:
const cookies = [{
  name: 'token',
  value: 'system token' // Replace with the system's token value
}]
await page.setCookie(...cookies)
STEP 3
Simulate input and click events on the page. We show only two examples here rather than listing every operation.
// Type into the first form item; delay is the wait between keystrokes in ms (100 is an illustrative value)
await page.type('.el-form-item:nth-child(1) input', '132', { delay: 100 })
// Click the first option in the second form item
await page.click('.el-form-item:nth-child(2) .el-form-item__content label:nth-child(1)')
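When a form has many fields, the individual type and click calls can be driven from a list of steps. The sketch below assumes a hypothetical `fillForm` helper and step format; only `page.type` and `page.click` come from Puppeteer. A stand-in page object is used so the example runs without a browser.

```javascript
// Hypothetical helper: replays a list of form steps against a Puppeteer page.
// Each step is { action: 'type' | 'click', selector, text? }.
async function fillForm (page, steps) {
  for (const step of steps) {
    if (step.action === 'type') {
      await page.type(step.selector, step.text)
    } else if (step.action === 'click') {
      await page.click(step.selector)
    } else {
      throw new Error(`Unknown action: ${step.action}`)
    }
  }
}

const steps = [
  { action: 'type', selector: '.el-form-item:nth-child(1) input', text: '132' },
  { action: 'click', selector: '.el-form-item:nth-child(2) .el-form-item__content label:nth-child(1)' }
]

// Stand-in page that records calls, so the sketch runs without a browser.
const calls = []
const fakePage = {
  type: async (selector, text) => { calls.push(['type', selector, text]) },
  click: async (selector) => { calls.push(['click', selector]) }
}

fillForm(fakePage, steps).then(() => console.log(calls.length))
```

Describing each validation scenario as data makes it straightforward to run the same page through several sets of inputs.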
STEP 4
Monitor the page for API responses and print the response data to the console.
page.on('response', response => {
  const req = response.request()
  console.log(`Response URL: ${req.url()}, method: ${req.method()}, status: ${response.status()}`)
  let message = response.text()
  message.then(function (result) {
    console.log(`return data: ${result}`)
  })
})
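The listener above fires for every response, including images and stylesheets. If only API calls are of interest, one option is to filter on the request's resource type. The sketch below assumes a hypothetical `isApiResponse` helper; `response.request()` and `request.resourceType()` are Puppeteer methods. Stand-in response objects are used so the example runs without a browser.

```javascript
// Hypothetical filter: keep only responses to XHR / fetch requests.
function isApiResponse (response) {
  const type = response.request().resourceType()
  return type === 'xhr' || type === 'fetch'
}

// Possible use inside the listener (page is a Puppeteer Page):
// page.on('response', response => {
//   if (isApiResponse(response)) { /* log it */ }
// })

// Stand-in response objects so the sketch runs without a browser.
const fakeResponse = type => ({ request: () => ({ resourceType: () => type }) })
console.log(isApiResponse(fakeResponse('xhr')), isApiResponse(fakeResponse('image')))
```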
STEP 5
Save the screenshot after the operation
// Use the path identifier in the URL as the name of the saved image
const testName = decodeURIComponent(url.split('#/')[1]).replace(/\//g, '-')
await page.screenshot({
  path: `${testName}.png`,
  fullPage: true
})
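The naming expression above assumes the URL contains a `#/` hash route. A minimal sketch of the same logic as a standalone function, with a fallback for URLs that have no hash path, might look like this. `screenshotName` and the `index` fallback are hypothetical and not part of the original code, and the example URLs are made up.

```javascript
// Hypothetical helper wrapping the naming logic above, with a fallback for
// URLs that contain no '#/' part (the original expression would produce the
// string "undefined" in that case).
function screenshotName (url, fallback = 'index') {
  const hashPath = url.split('#/')[1]
  if (!hashPath) return `${fallback}.png`
  return decodeURIComponent(hashPath).replace(/\//g, '-') + '.png'
}

console.log(screenshotName('https://example.com/#/user/edit%20form'))
console.log(screenshotName('https://example.com/'))
```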
STEP 6
Close the browser:
await browser.close()
At this point, we have completed an automated validation test of a form. Let's take a look at the result:
1. Front-end validation passes and the request is sent to the server API
2. Front-end validation fails and a screenshot is generated
V. Further uses
- Simulate routine spot-check walkthroughs of the production environment
- Periodically crawl daily and weekly report data, generate screenshots, and send them to the relevant people
VI. References
- developers.google.com/web/updates…
- peter.sh/experiments…
- zhaoqize.github.io/puppeteer-a…
For more content, follow the vivo Internet Technology WeChat official account.
Note: to reprint this article, please contact our WeChat account: Labs2020.