What is a headless browser

A headless browser is a web browser without a graphical user interface. Headless browsers are particularly useful for testing web pages because they render and interpret HTML the same way a regular browser does, including layout, colors, font selection, and JavaScript and Ajax execution, which are usually not available with other testing methods. [1]

What can a headless browser do

Headless browsers can be used to do many things, including but not limited to:

  1. Web page testing
  2. Web page screenshots
  3. Generating a PDF file from a web page
  4. Testing JavaScript libraries
  5. Automatic form submission
  6. Web crawling

How to use a headless browser

Generate PDF files

Using a headless browser is simple: as long as Node and Chrome are installed, you don't need to install any libraries. Chrome accepts a large number of command-line startup parameters, and --headless is one of them. For example, to save a web page as a PDF, use the --print-to-pdf parameter, as follows:

    const childProcess = require('child_process');
    const path = require('path');

    const chromeUrl = path.join('F:', 'application', 'Chrome', 'Application', 'chrome'); // The browser path
    const headLess = '--headless'; // Run Chrome without a UI
    const disableGpu = '--disable-gpu'; // Disable GPU hardware acceleration
    const action = '--print-to-pdf'; // Save the page as a PDF
    const outputName = path.resolve(__dirname, 'assets', `${Date.now()}.pdf`); // Output file path
    const printUrl = 'https://juejin.cn'; // The URL to open in the browser

    // Create a child process to run the command
    const result = childProcess.spawnSync(chromeUrl, [
      headLess,
      disableGpu,
      `${action}=${outputName}`,
      printUrl,
    ]);

After the script runs, the generated PDF file is saved to the specified path.
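Note that spawnSync does not throw when the browser cannot be started or exits with an error, so it can be worth checking the returned object before assuming the file exists. The following is a minimal sketch under that assumption; assertSpawnOk is a hypothetical helper name (not part of Node or Chrome), and the Node binary stands in for Chrome so the example runs on machines without Chrome:

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: turn a failed spawnSync result into an exception.
function assertSpawnOk(result, label) {
  // result.error is set when the process could not be started (e.g. wrong path).
  if (result.error) {
    throw new Error(`${label}: failed to start (${result.error.message})`);
  }
  // result.status is the exit code; it is null if a signal killed the process.
  if (result.status !== 0) {
    const stderr = result.stderr ? result.stderr.toString().trim() : '';
    throw new Error(`${label}: exited with status ${result.status} ${stderr}`.trim());
  }
  return result;
}

// Demonstration: the Node binary stands in for Chrome here.
const ok = assertSpawnOk(spawnSync(process.execPath, ['-e', 'process.exit(0)']), 'node');
console.log('exit status:', ok.status);
```

With a real browser, the same check would wrap the spawnSync call that launches Chrome.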

Generate web page images

Similarly, to generate a snapshot of a web page, just change --print-to-pdf in the code above to --screenshot:

    const childProcess = require('child_process');
    const path = require('path');

    const chromeUrl = path.join('F:', 'application', 'Chrome', 'Application', 'chrome'); // The browser path
    const headLess = '--headless'; // Run Chrome without a UI
    const disableGpu = '--disable-gpu'; // Disable GPU hardware acceleration
    const action = '--screenshot'; // Save the page as an image
    const outputName = path.resolve(__dirname, 'assets', `${Date.now()}.png`); // Output file path
    const printUrl = 'https://juejin.cn'; // The URL to open in the browser

    // Create a child process to run the command
    const result = childProcess.spawnSync(chromeUrl, [
      headLess,
      disableGpu,
      `${action}=${outputName}`,
      printUrl,
    ]);
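The PDF and screenshot scripts differ only in one parameter, so the argument list can be produced by a small function. This is a sketch, not a fixed recipe: buildChromeArgs is a hypothetical helper, and the optional --window-size switch is included on the assumption that your Chrome build honors it in headless mode to set the screenshot viewport:

```javascript
// Hypothetical helper: build the argument list for a headless Chrome run.
function buildChromeArgs({ action, output, url, windowSize }) {
  const args = ['--headless', '--disable-gpu'];
  if (windowSize) {
    // Assumed to control the viewport size used for the screenshot.
    args.push(`--window-size=${windowSize.width},${windowSize.height}`);
  }
  // --print-to-pdf and --screenshot take the output path as a value.
  args.push(output ? `${action}=${output}` : action);
  args.push(url); // URL goes last, as in the scripts above
  return args;
}

const pdfArgs = buildChromeArgs({
  action: '--print-to-pdf',
  output: 'page.pdf',
  url: 'https://juejin.cn',
});
const shotArgs = buildChromeArgs({
  action: '--screenshot',
  output: 'page.png',
  url: 'https://juejin.cn',
  windowSize: { width: 1280, height: 720 },
});
console.log(pdfArgs.join(' '));
console.log(shotArgs.join(' '));
```

Either array can then be passed as the second argument to spawnSync together with the browser path.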

Grab web page information

    const childProcess = require('child_process');
    const path = require('path');
    const fs = require('fs');

    const chromeUrl = path.join('F:', 'application', 'Chrome', 'Application', 'chrome'); // The browser path
    const headLess = '--headless'; // Run Chrome without a UI
    const disableGpu = '--disable-gpu'; // Disable GPU hardware acceleration
    const action = '--dump-dom'; // Print the page's DOM to stdout
    const outputName = path.resolve(__dirname, `${Date.now()}.txt`); // Output file path
    const printUrl = 'https://juejin.cn'; // The URL to open in the browser

    // Create a child process to run the command
    const result = childProcess.spawnSync(chromeUrl, [
      headLess,
      disableGpu,
      action,
      printUrl,
    ]);
    // Save the dumped DOM to a file
    fs.writeFileSync(outputName, result.stdout.toString());
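Because --dump-dom prints the page's DOM as HTML text, the saved output can be post-processed like any other string. The sketch below pulls out the page title and link targets; extractPageInfo is a hypothetical helper, the sample HTML is made up, and the regular expressions are only a rough stand-in for a real HTML parser:

```javascript
// Hypothetical helper: pull the title and link targets out of dumped HTML.
// Regular expressions are a rough stand-in for a real HTML parser here.
function extractPageInfo(html) {
  const titleMatch = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  const links = [];
  const linkPattern = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = linkPattern.exec(html)) !== null) {
    links.push(match[1]);
  }
  return { title: titleMatch ? titleMatch[1].trim() : null, links };
}

// Made-up sample standing in for result.stdout.toString()
const sampleHtml = `<html><head><title> Demo page </title></head>
<body><a href="/posts">Posts</a> <a class="x" href='https://example.com/about'>About</a></body></html>`;

console.log(extractPageInfo(sampleHtml));
```

For anything beyond quick experiments, a proper HTML parser is likely the safer choice, since regular expressions miss many valid HTML forms.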

Headless browser library: Puppeteer

As the examples above show, calling a headless browser requires no plugins, only Node and the browser. However, the many and complex startup parameters make the native headless browser awkward to use, so we can use a headless browser library to simplify things. Puppeteer is a Node.js package released by the Chrome team in 2017 for controlling the Chrome browser programmatically. It wraps the browser's functionality in a JavaScript API, which makes that functionality easier to call. For example, to take a screenshot of a web page:

    const puppeteer = require('puppeteer');
    const path = require('path');

    (async () => {
      const browser = await puppeteer.launch({
        headless: true,
      });
      const page = await browser.newPage(); // Create a new page
      await page.goto('https://juejin.cn'); // Navigate the page to the URL
      await page.screenshot({ // Take the screenshot
        path: path.resolve(__dirname, 'assets', `${Date.now()}.png`),
      });
      await browser.close();
    })();

Convert web pages to PDF

    await page.pdf({
      path: path.resolve(__dirname, 'assets', `${Date.now()}.pdf`),
    });

Execute scripts to automate form submission
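A minimal sketch of this use case, assuming Puppeteer's page.goto, page.type, page.click and page.waitForNavigation methods. The fillAndSubmit helper, the URL and the CSS selectors are hypothetical placeholders rather than a real site:

```javascript
// Hypothetical helper: fill in a form and submit it on a Puppeteer page.
// `fields` maps CSS selectors to the text to type into them.
async function fillAndSubmit(page, { url, fields, submitSelector }) {
  await page.goto(url);
  for (const [selector, value] of Object.entries(fields)) {
    await page.type(selector, value); // Types the text into the matched element
  }
  // Start waiting for the navigation before clicking, so it is not missed.
  await Promise.all([
    page.waitForNavigation(),
    page.click(submitSelector),
  ]);
}

// Example wiring; the URL and selectors are placeholders, not a real site.
async function main() {
  const puppeteer = require('puppeteer'); // Needs `npm install puppeteer`
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await fillAndSubmit(page, {
      url: 'https://example.com/login',
      fields: { '#username': 'demo', '#password': 'secret' },
      submitSelector: 'button[type="submit"]',
    });
  } finally {
    await browser.close(); // Always release the browser process
  }
}
```

Calling main() runs the flow against a real browser. Keeping fillAndSubmit separate from the launch code means it can also be exercised with a stand-in page object.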

For more features, see the Puppeteer documentation.

Conclusion

  • A headless browser is a web browser without a graphical user interface.
  • A headless browser can be used for web page testing, web page screenshots, generating PDFs from web pages, JavaScript library testing, automatic form submission, web crawling, and more.

References

  1. Headless browser, Wikipedia
  2. A first look at the headless browser Puppeteer
  3. Chrome startup parameters

If there are any mistakes in this post, feel free to correct them in the comments section