Puppeteer is the official Node library from the Chrome team for controlling headless Chrome. It provides a set of APIs for driving Chrome without a UI, and it suits scenarios such as crawlers and automated processing.

Puppeteer

Puppeteer is a Node library from the Chrome team that drives headless Chrome, that is, Chrome running without a graphical user interface. Its APIs expose Chrome's functionality without a UI, which makes it suitable for crawlers, automated processing, and similar scenarios.

What can it be used for?

Generate page screenshots and PDFs
Automate form submission, UI testing, keyboard input, and more
Create an up-to-date automated test environment: run tests directly in the latest version of Chrome, with the latest JavaScript and browser features
Crawl SPA pages and pre-render them (i.e. "SSR")
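The first use case in the list, screenshots and PDFs, might look like the sketch below. The function name `capturePage` and the output file names are illustrative assumptions; `page` is expected to be a Puppeteer Page obtained from `browser.newPage()`.

```javascript
// Sketch: save a page as both a PNG screenshot and a PDF.
// `capturePage` and `baseName` are hypothetical names, not part of Puppeteer.
async function capturePage(page, url, baseName = 'example') {
  // Wait until the network is mostly quiet so late content is rendered.
  await page.goto(url, { waitUntil: 'networkidle2' });
  // fullPage captures the whole scrollable page, not just the viewport.
  await page.screenshot({ path: `${baseName}.png`, fullPage: true });
  await page.pdf({ path: `${baseName}.pdf`, format: 'A4' });
  return [`${baseName}.png`, `${baseName}.pdf`];
}
```

With a real browser this would be called as `await capturePage(await browser.newPage(), 'https://example.com')`.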

The difference from Cheerio

Cheerio – A library for parsing and manipulating HTML documents with jQuery-style syntax. It can only crawl static HTML and cannot get data loaded through Ajax. It is generally used in combination with axios (axios + Cheerio).
Puppeteer – Simulates a real browser runtime environment, so it can request a site and see what the browser sees. It can simulate user actions (click, swipe, hover, etc.) and even inject scripts to run inside the page.

Puppeteer architecture diagram

Puppeteer – Communicates with the browser through the DevTools protocol
Browser – A browser instance (Chromium) that can have multiple pages
Page – A page that contains at least one Frame
Frame – Has at least one execution context for running JavaScript, and may have additional execution contexts
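The Browser, Page, and Frame hierarchy described above can be walked in code. The following is a minimal sketch; `describeBrowser` is a hypothetical helper name, while `browser.pages()`, `page.frames()`, `page.url()`, and `frame.url()` are existing Puppeteer methods.

```javascript
// Sketch: list every open page of a browser together with the URLs of its frames.
// `describeBrowser` is an illustrative name, not a Puppeteer API.
async function describeBrowser(browser) {
  const pages = await browser.pages();
  return pages.map((page) => ({
    url: page.url(),
    // page.frames() returns the main frame plus any nested iframes.
    frames: page.frames().map((frame) => frame.url()),
  }));
}
```

Each entry in the result corresponds to one Page, and its `frames` array always has at least one item, the main frame.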

Getting started

const puppeteer = require('puppeteer');

(async () => {
  // Placeholder address; replace it with the page you want to capture.
  const targetUrl = 'https://example.com';
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(targetUrl);
  await page.screenshot({ path: 'example.png' });
  await browser.close();
})();

Analysis of the code

1. Import Puppeteer

const puppeteer = require('puppeteer');

2. Create an instance

This launches a browser instance through Puppeteer.

const browser = await puppeteer.launch(options);

options:

executablePath: puppeteer.executablePath() – Path to the browser executable to run; puppeteer.executablePath() returns the default Chrome location
headless: false – Whether to run in headless mode; false opens a visible browser window
slowMo: 250 – Slows down each Puppeteer operation by the specified number of milliseconds
devtools: true – Whether to open the DevTools panel automatically for each tab, so you can use the debugger inside the browser
defaultViewport – The viewport applied to each page. Default is 800 x 600
- width
- height
- deviceScaleFactor – Device scale factor
- isMobile – Whether to take the meta viewport tag into account. Default is false
- hasTouch – Whether the viewport supports touch events. Default is false
- isLandscape – Whether the viewport is in landscape mode
For more parameters, see puppeteer.launch()
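As a rough illustration of how these options combine, the sketch below builds one options object for normal runs and another for debugging. The helper name `buildLaunchOptions` and the 1280 x 720 viewport are assumptions made for the example; the option keys are the ones listed above.

```javascript
// Sketch: choose launch options depending on whether we are debugging.
// `buildLaunchOptions` is a hypothetical helper, not part of Puppeteer.
function buildLaunchOptions(debug = false) {
  const launchOptions = {
    headless: !debug, // show a real window only while debugging
    defaultViewport: { width: 1280, height: 720, deviceScaleFactor: 1 },
  };
  if (debug) {
    launchOptions.slowMo = 250;    // slow each operation down by 250 ms
    launchOptions.devtools = true; // open DevTools for each tab
  }
  return launchOptions;
}
```

The result would be passed straight to `puppeteer.launch(buildLaunchOptions(true))`.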

3. Open a new page

const page = await browser.newPage();


4. Go to the target page

await page.goto(targetUrl);


Note: page.goto accepts an optional second argument, an object with a few simple configuration options:

waitUntil: when to consider the navigation complete

load – Resolves when the load event fires (the default)
domcontentloaded – Resolves when the DOMContentLoaded event fires
networkidle0 – Resolves once there have been no network connections for at least 500 ms
networkidle2 – Resolves once there have been no more than 2 network connections for at least 500 ms

timeout: The maximum navigation time in milliseconds. The default is 30 seconds, and 0 disables the timeout. The default can be changed with page.setDefaultNavigationTimeout(timeout)

referer (rarely used): The Referer header value. If provided, it takes precedence over the referer header value set by page.setExtraHTTPHeaders().
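Put together, the navigation options above could be used as in the following sketch. The helper name `navigate` and the concrete timeout values are assumptions chosen for illustration.

```javascript
// Sketch: navigate with explicit waitUntil, timeout, and an optional referer.
// `navigate` is a hypothetical helper; the numbers are arbitrary examples.
async function navigate(page, url, referer) {
  // Raise the page-wide default from 30 s to 60 s for later navigations.
  page.setDefaultNavigationTimeout(60000);
  const gotoOptions = {
    waitUntil: 'networkidle2',
    timeout: 10000, // per-call value; takes priority over the default above
  };
  if (referer) gotoOptions.referer = referer;
  // Resolves to the main resource response of the navigation.
  return page.goto(url, gotoOptions);
}
```

If the page does not reach the requested state within the timeout, the returned Promise is rejected, so callers may want to wrap the call in try/catch.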

5. Close the browser

await browser.close();

Start

The getting-started section above already covers the functions we use most often. To sum up, crawling a web page takes just a few steps:

Open a browser
Crawl
Close the browser
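These three steps can be captured in a small wrapper that always closes the browser, even when the crawl throws. This is only a sketch: `withBrowser` is a hypothetical helper, and `launch` is any function that returns a browser, for example `() => puppeteer.launch()`.

```javascript
// Sketch: open a browser, run a crawl task, and always close the browser.
// `withBrowser` is an illustrative name, not part of Puppeteer.
async function withBrowser(launch, task) {
  const browser = await launch();         // 1. open a browser
  try {
    return await task(browser);           // 2. crawl
  } finally {
    await browser.close();                // 3. close the browser, success or failure
  }
}
```

A call could look like `await withBrowser(() => puppeteer.launch(), async (browser) => { /* crawl here */ })`.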

Isn't that easy? One question remains: how do you crawl? Do you know jQuery?

If you can use jQuery, you can write a crawler!

Find a video site you like (the following content is for teaching purposes only!):

const puppeteer = require('puppeteer');

const demo = async () => {
  const browser = await puppeteer.launch({
    executablePath: puppeteer.executablePath(),
    headless: false
  })
  const arr = []
  for (let i = 1; i <= 40; i++) {
    console.log(`Fetching episode ${i} of "Full-Time Master"`)
    const targetUrl = `https://goudaitv1.com/play/78727-4-${i}.html`
    console.log(targetUrl)
    const page = await browser.newPage()
    await page.goto(targetUrl, {
      timeout: 0,
      waitUntil: 'domcontentloaded'
    })
    const baseNode = '.row'
    // This callback runs inside the page, so `$` is the jQuery loaded by the site itself.
    const movieUrl = await page.evaluate((sel) => {
      // attr('src') is already a string; return it as-is, or null if the iframe is missing.
      return $(sel).find('iframe#Player').attr('src') || null
    }, baseNode)
    arr.push(movieUrl)
    await page.close()
  }
  console.log(arr)
  await browser.close()
}

page.evaluate(pageFunction[, …args])

pageFunction <function|string> – The function to evaluate in the page context
...args – Arguments to pass to pageFunction
returns: a Promise that resolves to the return value of pageFunction

If pageFunction returns a Promise, page.evaluate waits for the Promise to resolve and returns its value.

If pageFunction returns a value that cannot be serialized, page.evaluate resolves to undefined.
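Because DOM elements themselves cannot be serialized, a common pattern is to map them to plain strings inside the page and return only those. The sketch below uses `page.$$eval`, a Puppeteer method that is not covered in this article; the helper name `getTexts` is an assumption.

```javascript
// Sketch: return the trimmed text of every element matching a selector.
// The callback runs in the page, and only plain strings cross back to Node.
// `getTexts` is a hypothetical helper name.
async function getTexts(page, selector) {
  return page.$$eval(selector, (elements) =>
    elements.map((el) => el.textContent.trim())
  );
}
```

Returning the elements themselves instead of their text would not give usable DOM nodes on the Node side, which is why the mapping happens inside the callback.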

Passing arguments to pageFunction:

const result = await page.evaluate(x => {
  return Promise.resolve(8 * x);
}, 7);
console.log(result); // prints "56"

You can also pass in a string instead of a function:

console.log(await page.evaluate('1 + 2')); // prints "3"
const x = 10;
console.log(await page.evaluate(`1 + ${x}`)); // prints "11"

Storing the data

Done! You can now do whatever you want with that data, such as storing it in a database.
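As a stand-in for a real database, the sketch below writes the collected addresses to a JSON file using Node's built-in `fs` module. The helper name `saveResults`, the record shape, and the file path are all assumptions; a real project would swap in whatever database client it uses.

```javascript
const fs = require('fs');

// Sketch: persist crawled iframe addresses as JSON records.
// `saveResults` and the { episode, src } shape are illustrative assumptions.
function saveResults(filePath, urls) {
  const records = urls
    .map((src, index) => ({ episode: index + 1, src }))
    .filter((record) => record.src); // skip episodes where nothing was found
  fs.writeFileSync(filePath, JSON.stringify(records, null, 2));
  return records.length;
}
```

With the demo above, this could be called as `saveResults('episodes.json', arr)` once the loop has finished.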

Finally

Of course, crawling is only the tip of the iceberg. The demo above is rather lazy: it navigates straight to each page's address. We could also trigger the navigation with a click event; try it if you are interested.

page.click(selector[, options])

selector – A selector for the element to click. If several elements match, the first one is clicked.
options
- button – left, right, or middle. Default is left.
- clickCount – Default is 1. See UIEvent.detail.
- delay – Time to wait between mousedown and mouseup, in milliseconds. Default is 0.
returns: a Promise that resolves once the matching element has been clicked. It is rejected if no element matches the selector.

This method finds an element matching the selector, scrolls it into view if needed, and then clicks it via page.mouse. It throws an error if the selector matches no element.
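Since `page.click` rejects when nothing matches, a caller that treats a missing element as a normal case may want to catch the error. A minimal sketch, in which the helper name `tryClick` and the 50 ms delay are assumptions:

```javascript
// Sketch: click an element and report whether the click succeeded.
// `tryClick` is a hypothetical helper, not part of Puppeteer.
async function tryClick(page, selector, clickOptions = {}) {
  try {
    // Defaults are spelled out here; callers can override any of them.
    await page.click(selector, { button: 'left', clickCount: 1, delay: 50, ...clickOptions });
    return true;
  } catch (err) {
    // page.click rejects, for example, when no element matches the selector.
    return false;
  }
}
```

A double click would then be `await tryClick(page, '#item', { clickCount: 2 })`, where `#item` is a placeholder selector.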

Note that if click() triggers a navigation, there is a separate page.waitForNavigation() Promise to wait on, and waiting for it only after the click can lead to a race condition. The correct way to click and wait for the navigation looks like this:


const [response] = await Promise.all([
  page.waitForNavigation(waitOptions),
  page.click(selector, clickOptions),
]);


page.waitForNavigation([options])

This method resolves when the page navigates to a new URL or reloads. It is useful when your code indirectly causes the page to navigate.

See page.waitForNavigation([options]) for more information
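The click-and-wait pattern can be wrapped in a helper that also reports where the page ended up. This is a sketch under assumptions: `clickAndWait` is a hypothetical name and `domcontentloaded` is just one reasonable default for `waitUntil`.

```javascript
// Sketch: click a link, wait for the resulting navigation, and summarize it.
// `clickAndWait` is an illustrative helper, not part of Puppeteer.
async function clickAndWait(page, selector, waitOptions = { waitUntil: 'domcontentloaded' }) {
  const [response] = await Promise.all([
    page.waitForNavigation(waitOptions), // start waiting before the click fires
    page.click(selector),
  ]);
  return {
    // response is null for History API or same-page anchor navigations.
    status: response ? response.status() : null,
    url: page.url(),
  };
}
```

Starting `waitForNavigation` before `click` inside `Promise.all` is what avoids missing a navigation that completes very quickly.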

