I recently had to learn how to write crawlers in Node for work, so here is a short record of what I learned along the way.
Let's start with the puppeteer library. The word "puppeteer" means someone who operates puppets, which fits: the library was designed mainly for automated testing, and it provides APIs that drive Chrome directly. That makes it useful both for UI tests and as a crawler that retrieves page data.
As promised, we'll start from the very beginning, so let's get into it:
First, Puppeteer is an npm package, so it is easy to install.
After creating the project directory, execute:
$ yarn add puppeteer
or
$ npm install puppeteer
Installing the package also downloads a compatible build of Chromium automatically, and all subsequent operations run directly in that Chromium.
Create a new index.js file in the project directory.
A first look at Puppeteer's features and basic usage
Let's try Puppeteer out first.
In index.js, add the following code:
const puppeteer = require("puppeteer");
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("http://www.baidu.com");
  await page.screenshot({ path: "baidu.png" });
  await browser.close();
})();
Then run:
node index.js
Once the script runs successfully, a screenshot named baidu.png is written to the project root directory.
Skimming the code, the obvious feature is how often async/await appears. async/await was standardized in ES2017 (it is often loosely called an ES7 feature), and nearly every Puppeteer API returns a Promise, so the two fit together well. Node 7.6 or later is recommended, since that is the first release to support async/await without a flag. The code is short and reads almost like a description of what it does:
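To see why async/await suits this kind of step-by-step script, here is a minimal sketch that runs the same sequence twice, once as a Promise chain and once with async/await. It does not use Puppeteer at all: fakeStep is a hypothetical stand-in for an asynchronous browser operation, and the step names are only labels.

```javascript
// Hypothetical stand-in for a browser operation (not a Puppeteer API):
// records its name in `log` and resolves after a short delay.
function fakeStep(name, log) {
  return new Promise((resolve) => {
    setTimeout(() => {
      log.push(name);
      resolve(name);
    }, 10);
  });
}

// Promise-chain style: each step is chained with .then().
function runWithThen() {
  const log = [];
  return fakeStep("launch", log)
    .then(() => fakeStep("newPage", log))
    .then(() => fakeStep("goto", log))
    .then(() => fakeStep("screenshot", log))
    .then(() => fakeStep("close", log))
    .then(() => log);
}

// async/await style: the same steps read like ordinary sequential code.
async function runWithAwait() {
  const log = [];
  await fakeStep("launch", log);
  await fakeStep("newPage", log);
  await fakeStep("goto", log);
  await fakeStep("screenshot", log);
  await fakeStep("close", log);
  return log;
}
```

Both functions resolve to the same ordered list of step names; the difference is only in how the code reads.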
// Import the puppeteer library
const puppeteer = require("puppeteer");
// Use an async IIFE so the code runs immediately
(async () => {
  // Create the browser instance
  const browser = await puppeteer.launch();
  // Create a new page
  const page = await browser.newPage();
  // Open the Baidu URL
  await page.goto("http://www.baidu.com");
  // Take a screenshot and set where the image is saved
  await page.screenshot({ path: "baidu.png" });
  // Close the browser
  await browser.close();
})();
That completes a simple first example with Puppeteer.
Overall it is not complicated to use. The official documentation is good, and there is a Chinese version as well, which deserves praise! I will mainly refer to the documentation as I keep learning.
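One thing the example does not handle is failure: if page.goto or page.screenshot throws, browser.close() is never reached and the Chromium process may keep running. A minimal sketch of one way to guard against that follows. The helper names withPage and screenshotAndTitle are hypothetical (they are not part of Puppeteer), and the launcher is passed in as a parameter, which is an assumption made here so the helper can be exercised without a real browser.

```javascript
// Hypothetical helper: runs `task(page)` against a fresh page and always
// closes the browser, even if navigation or the task itself throws.
// `launcher` is any object with a launch() method, e.g. require("puppeteer").
async function withPage(launcher, task) {
  const browser = await launcher.launch();
  try {
    const page = await browser.newPage();
    return await task(page);
  } finally {
    await browser.close();
  }
}

// Hypothetical example task: open a URL, save a screenshot,
// and return the page title as a small piece of retrieved page data.
async function screenshotAndTitle(page, url, path) {
  await page.goto(url);
  await page.screenshot({ path });
  return page.title();
}

// Usage with the real library (requires puppeteer to be installed):
//   const puppeteer = require("puppeteer");
//   withPage(puppeteer, (page) =>
//     screenshotAndTitle(page, "http://www.baidu.com", "baidu.png")
//   ).then((title) => console.log(title));
```

The try/finally block is the important part: the browser is closed whether the task succeeds or fails, and any error still propagates to the caller.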
The official API documentation is available online at: zhaoqize.github.io/puppeteer-a…