What is a headless browser
A headless browser is a web browser without a graphical user interface. Headless browsers are particularly useful for testing web pages because they render and parse HTML the same way a regular browser does, including styling such as page layout, colors, and font selection, as well as JavaScript and Ajax execution, which are usually not available with other testing methods. [1]
What can a headless browser do
Headless browsers can be used to do many things, including but not limited to:
- Testing web pages
- Taking screenshots of web pages
- Generating PDF files from web pages
- Testing JavaScript libraries
- Automatically submitting forms
- Web crawling
How to use a headless browser
Generate PDF files
Using a headless browser is simple: as long as Node.js and Chrome are installed, you don't need any additional libraries. Chrome accepts a number of command-line startup flags, including --headless, which runs it without a window. To save a web page as a PDF, pass the --print-to-pdf flag, as follows:
const process = require('child_process'); // Note: this name shadows Node's global `process` in this file
const path = require('path');

const chromeUrl = path.join('F:', 'application', 'Chrome', 'Application', 'chrome'); // Path to the Chrome executable
const headLess = '--headless'; // Run Chrome without a window
const disableGpu = '--disable-gpu'; // Disable GPU hardware acceleration
const action = '--print-to-pdf'; // Save the page as a PDF
const outputName = path.resolve(__dirname, 'assets', `${Date.now()}.pdf`); // Output file path
const printUrl = 'https://juejin.cn'; // URL to open in the browser

// Spawn a child process to run Chrome with these flags
const result = process.spawnSync(chromeUrl, [
  headLess,
  disableGpu,
  `${action}=${outputName}`,
  printUrl,
]);
After the script runs, the generated PDF file is saved to the specified directory.
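spawnSync returns an object describing how the child process ended, so it may be worth checking that object before assuming the output file exists. The following is a minimal sketch, not part of the original script: runOrThrow is a hypothetical helper name, and the usage line runs Node itself as a stand-in command so the sketch works without Chrome. In the PDF example you would pass chromeUrl and the flag array instead.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: run a command synchronously and throw if it
// could not be started or exited with a non-zero status.
function runOrThrow(command, args) {
  const result = spawnSync(command, args, { encoding: 'utf8' });
  if (result.error) {
    // The process could not be spawned (for example, a wrong executable path)
    throw result.error;
  }
  if (result.status !== 0) {
    throw new Error(`${command} exited with status ${result.status}: ${result.stderr}`);
  }
  return result;
}

// Stand-in usage: run Node itself instead of Chrome.
const ok = runOrThrow(process.execPath, ['-e', 'console.log("done")']);
console.log(ok.stdout.trim());
```

Throwing on failure keeps a wrong browser path or a crashed Chrome from passing silently.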
Generate web page images
Similarly, to take a snapshot of a web page, just change --print-to-pdf in the code above to --screenshot:
const process = require('child_process'); // Note: this name shadows Node's global `process` in this file
const path = require('path');

const chromeUrl = path.join('F:', 'application', 'Chrome', 'Application', 'chrome'); // Path to the Chrome executable
const headLess = '--headless'; // Run Chrome without a window
const disableGpu = '--disable-gpu'; // Disable GPU hardware acceleration
const action = '--screenshot'; // Save the page as an image
const outputName = path.resolve(__dirname, 'assets', `${Date.now()}.png`); // Output file path
const printUrl = 'https://juejin.cn'; // URL to open in the browser

// Spawn a child process to run Chrome with these flags
const result = process.spawnSync(chromeUrl, [
  headLess,
  disableGpu,
  `${action}=${outputName}`,
  printUrl,
]);
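Because the PDF and screenshot scripts differ only in one flag and the file extension, the shared part can be factored out. This is a sketch under assumptions: buildChromeArgs and its parameter names are hypothetical, and it handles only the flags already used in this article (--headless, --disable-gpu, --print-to-pdf, --screenshot).

```javascript
const path = require('path');

// Hypothetical helper: build the Chrome argument list for saving a page
// as a PDF or a PNG screenshot, using the flags shown in this article.
function buildChromeArgs(kind, outputDir, url) {
  const actions = {
    pdf: { flag: '--print-to-pdf', ext: 'pdf' },
    png: { flag: '--screenshot', ext: 'png' },
  };
  const action = actions[kind];
  if (!action) {
    throw new Error(`Unsupported output kind: ${kind}`);
  }
  const outputName = path.resolve(outputDir, `${Date.now()}.${action.ext}`);
  return ['--headless', '--disable-gpu', `${action.flag}=${outputName}`, url];
}

console.log(buildChromeArgs('png', 'assets', 'https://juejin.cn'));
```

The returned array can be passed as the second argument to spawnSync.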
Grab web page information
const process = require('child_process'); // Note: this name shadows Node's global `process` in this file
const path = require('path');
const fs = require('fs');

const chromeUrl = path.join('F:', 'application', 'Chrome', 'Application', 'chrome'); // Path to the Chrome executable
const headLess = '--headless'; // Run Chrome without a window
const disableGpu = '--disable-gpu'; // Disable GPU hardware acceleration
const action = '--dump-dom'; // Print the page's DOM to stdout
const outputName = path.resolve(__dirname, `${Date.now()}.txt`); // Output file path
const printUrl = 'https://juejin.cn'; // URL to open in the browser

// Spawn a child process to run Chrome with these flags
const result = process.spawnSync(chromeUrl, [
  headLess,
  disableGpu,
  action,
  printUrl,
]);

// Write the dumped DOM to a text file
fs.writeFileSync(outputName, result.stdout.toString());
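Once the DOM has been dumped as text, specific information can be pulled out of it. The sketch below is an illustration, not part of the original script: extractTitle is a hypothetical helper, and the sample HTML string stands in for result.stdout. A regular expression is used only because the target is a single simple tag; a real HTML parser would be more robust for anything more involved.

```javascript
// Hypothetical helper: return the text of the first <title> element in an
// HTML string, or null if there is none.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Sample markup standing in for the output of --dump-dom
const sampleDom = '<html><head><title> Example page </title></head><body></body></html>';
console.log(extractTitle(sampleDom)); // prints "Example page"
```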
Headless browser library: Puppeteer
As the examples above show, you can drive a headless browser with nothing but Node.js and the browser itself; no plugins are required. However, the many and sometimes complex startup flags make a headless browser awkward to use directly, so a headless browser library can simplify things. Puppeteer is a Node.js package released by the Chrome team in 2017 for controlling the Chrome browser programmatically. It is a JavaScript wrapper around the browser's functionality that makes those features easier to call. For example, to take a screenshot of a web page:
const puppeteer = require('puppeteer');
const path = require('path');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage(); // Open a new page
  await page.goto('https://juejin.cn'); // Navigate to the URL
  await page.screenshot({ // Take a screenshot
    path: path.resolve(__dirname, 'assets', `${Date.now()}.png`),
  });
  await browser.close();
})();
Convert web pages to PDF
// Inside the same async function, in place of page.screenshot()
await page.pdf({
  path: path.resolve(__dirname, 'assets', `${Date.now()}.pdf`),
});
Execute scripts to automate form submission
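A form submission with Puppeteer might look like the sketch below. It is only an illustration under assumptions: the URL, the #username, #password, and button[type="submit"] selectors, and the submitLoginForm helper are all hypothetical and must be adapted to the real page. The function takes a Puppeteer page object as a parameter and uses page.goto, page.type, page.click, page.waitForNavigation, and page.url.

```javascript
// Hypothetical helper: fill in and submit a login form on the given
// Puppeteer page. All selectors are assumptions about the target page.
async function submitLoginForm(page, { url, username, password }) {
  await page.goto(url);
  await page.type('#username', username); // Type into the (assumed) username field
  await page.type('#password', password); // Type into the (assumed) password field
  // Start waiting for navigation before clicking, so the page load
  // triggered by the submit is not missed.
  await Promise.all([
    page.waitForNavigation(),
    page.click('button[type="submit"]'),
  ]);
  return page.url(); // URL of the page reached after submitting
}

// Usage with Puppeteer (requires `npm install puppeteer`):
//
// const puppeteer = require('puppeteer');
// (async () => {
//   const browser = await puppeteer.launch({ headless: true });
//   const page = await browser.newPage();
//   await submitLoginForm(page, {
//     url: 'https://example.com/login', // placeholder URL
//     username: 'user',
//     password: 'secret',
//   });
//   await browser.close();
// })();
```

Passing the page in as a parameter keeps the form logic separate from launching and closing the browser.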
For more features, see the Puppeteer documentation.
Conclusion
- A headless browser is a web browser without a graphical user interface.
- Headless browsers can be used for web page testing, taking screenshots of web pages, generating PDFs from web pages, testing JavaScript libraries, automatic form submission, web crawling, and more.
References
- Headless browser, Wikipedia
- A first look at the headless browser Puppeteer
- Chrome startup parameters
If there are any mistakes in this post, feel free to correct them in the comments section