A preliminary study of the "headless" browser

What is a headless browser

A headless browser is a web browser without a graphical user interface. Headless browsers are particularly useful for testing web pages because they render and parse HTML the same way a regular browser does, including styling such as page layout, colors, and font selection. They also execute JavaScript and Ajax, which other testing methods usually cannot do. [1]

What can a headless browser do

Headless browsers can be used to do many things, including but not limited to:

Web page testing
Taking screenshots of web pages
Generating a PDF file from a web page
Testing JavaScript libraries
Automatically submitting forms
Web crawling

How to use a headless browser

Generate PDF files

Using a headless browser is simple: as long as Node and Chrome are installed, you do not need any additional libraries. Chrome provides a number of command-line startup parameters, including one for headless mode (see the list of Chrome startup parameters in the references). To save a web page as a PDF, use the `--print-to-pdf` parameter, as follows:

    const childProcess = require('child_process');
    const path = require('path');

    const chromeUrl = path.join('F:', 'application', 'Chrome', 'Application', 'chrome'); // Path to the browser executable
    const headLess = '--headless'; // Run Chrome without a graphical interface
    const disableGpu = '--disable-gpu'; // Disable GPU hardware acceleration
    const action = '--print-to-pdf'; // Save the page as a PDF
    const outputName = path.resolve(__dirname, 'assets', `${Date.now()}.pdf`); // Output file path
    const printUrl = 'https://juejin.cn'; // URL to open in the browser

    // Create a child process to run the command
    const result = childProcess.spawnSync(chromeUrl, [
      headLess,
      disableGpu,
      `${action}=${outputName}`,
      printUrl,
    ]);

After the script runs, the generated PDF file is saved to the specified directory.
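Note that `spawnSync` does not throw when the browser cannot be started; it reports the problem on the returned object through the `error`, `status`, and `signal` fields. The following is a minimal sketch of how that result might be checked. The helper name `describeSpawnResult` is hypothetical, and the demonstration launches the Node binary itself rather than Chrome so that it runs on any machine with Node installed.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: turn a spawnSync result into a short status string.
function describeSpawnResult(result) {
  if (result.error) {
    // The process could not be started at all (e.g. wrong executable path).
    return `failed to start: ${result.error.code}`;
  }
  if (result.signal) {
    return `killed by signal ${result.signal}`;
  }
  if (result.status !== 0) {
    return `exited with code ${result.status}`;
  }
  return 'ok';
}

// Demonstration with the Node binary standing in for the browser.
const ok = spawnSync(process.execPath, ['-e', 'process.exit(0)']);
const failed = spawnSync(process.execPath, ['-e', 'process.exit(3)']);
// Assumed not to exist on the machine running this sketch.
const missing = spawnSync('no-such-browser-binary', ['--headless']);

console.log(describeSpawnResult(ok));
console.log(describeSpawnResult(failed));
console.log(describeSpawnResult(missing));
```

In the PDF example, the same check could be applied to `result` before assuming the file was written.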

Generate web page images

Similarly, to generate a screenshot of a web page, just change `--print-to-pdf` in the code above to `--screenshot`:

    const childProcess = require('child_process');
    const path = require('path');

    const chromeUrl = path.join('F:', 'application', 'Chrome', 'Application', 'chrome'); // Path to the browser executable
    const headLess = '--headless'; // Run Chrome without a graphical interface
    const disableGpu = '--disable-gpu'; // Disable GPU hardware acceleration
    const action = '--screenshot'; // Save the page as an image
    const outputName = path.resolve(__dirname, 'assets', `${Date.now()}.png`); // Output file path
    const printUrl = 'https://juejin.cn'; // URL to open in the browser

    // Create a child process to run the command
    const result = childProcess.spawnSync(chromeUrl, [
      headLess,
      disableGpu,
      `${action}=${outputName}`,
      printUrl,
    ]);
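Since the PDF and screenshot examples differ only in the action flag and the output file, the argument list can be built by a small function. This is only a sketch: `buildHeadlessArgs` and its option names are hypothetical and not part of Chrome or Node.

```javascript
// Hypothetical helper: build the Chrome argument list for one headless action.
// `action` is a flag such as '--print-to-pdf' or '--screenshot'.
// `output` is optional because some flags take no value.
function buildHeadlessArgs({ action, url, output }) {
  const actionArg = output ? `${action}=${output}` : action;
  return ['--headless', '--disable-gpu', actionArg, url];
}

console.log(buildHeadlessArgs({
  action: '--screenshot',
  url: 'https://juejin.cn',
  output: 'page.png',
}));
```

The returned array can be passed directly as the second argument of `spawnSync`, in place of the literal array used above.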

Grab web page information

    const childProcess = require('child_process');
    const path = require('path');
    const fs = require('fs');

    const chromeUrl = path.join('F:', 'application', 'Chrome', 'Application', 'chrome'); // Path to the browser executable
    const headLess = '--headless'; // Run Chrome without a graphical interface
    const disableGpu = '--disable-gpu'; // Disable GPU hardware acceleration
    const action = '--dump-dom'; // Print the page's DOM to stdout
    const outputName = path.resolve(__dirname, `${Date.now()}.txt`); // Output file path
    const printUrl = 'https://juejin.cn'; // URL to open in the browser

    // Create a child process to run the command
    const result = childProcess.spawnSync(chromeUrl, [
      headLess,
      disableGpu,
      action,
      printUrl,
    ]);
    // Write the dumped DOM (captured from stdout) to the output file
    fs.writeFileSync(outputName, result.stdout.toString());
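Once the DOM has been dumped, a crawler typically pulls specific information out of it. As a rough sketch, the snippet below collects link targets from an HTML string. The function `extractLinks` and the sample markup are hypothetical, and the regular expression is a shortcut for illustration only; a real crawler would more likely use a proper HTML parser.

```javascript
// Hypothetical helper: collect absolute link targets from dumped HTML.
// A regular expression is enough for a sketch but is not a real HTML parser.
function extractLinks(html, baseUrl) {
  const links = new Set(); // A Set drops duplicate links
  const hrefPattern = /<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(hrefPattern)) {
    try {
      // Resolve relative links against the page URL.
      links.add(new URL(match[1], baseUrl).href);
    } catch {
      // Ignore values that are not valid URLs.
    }
  }
  return [...links];
}

// Made-up markup standing in for the text written by the --dump-dom example.
const sample = '<a href="/post/1">One</a> <a class="x" href="https://example.com/">Two</a> <a href="/post/1">Again</a>';
console.log(extractLinks(sample, 'https://juejin.cn'));
```

In the example above, the `html` argument would be `result.stdout.toString()`.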

Headless browser library: `puppeteer`

As the examples above show, you can drive a headless browser with nothing but Node and the browser itself, with no plugins required. However, the numerous and complex startup parameters make a headless browser hard to use directly, so we can use a headless browser library to simplify things. Puppeteer is a Node.js package released by the Chrome team in 2017 for controlling the Chrome browser programmatically. It is a JavaScript wrapper around the browser's functions that makes them easier to call. For example, to take a screenshot of a web page:

    const puppeteer = require('puppeteer');
    const path = require('path');

    (async () => {
      const browser = await puppeteer.launch({
        headless: true,
      });
      const page = await browser.newPage(); // Create a new page
      await page.goto('https://juejin.cn'); // Navigate the new page to the URL
      await page.screenshot({ // Take the screenshot
        path: path.resolve(__dirname, 'assets', `${Date.now()}.png`),
      });
      await browser.close();
    })();

Convert web pages to PDF

    // Inside the same async function, after page.goto():
    await page.pdf({
      path: path.resolve(__dirname, 'assets', `${Date.now()}.pdf`),
    });

Execute scripts to automate form submission
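The original gives no code for this step, so what follows is only a sketch of how form submission might look with Puppeteer's `page.goto`, `page.type`, `page.click`, and `page.waitForNavigation` methods. The helper `submitForm`, the URL, and every selector are assumptions made for illustration. To keep the sketch runnable without a browser, it is exercised here with a stand-in page object that merely records calls.

```javascript
// Hypothetical helper: fill in and submit a form on a Puppeteer page.
// `fields` maps CSS selectors to the text to type; all selectors are assumptions.
async function submitForm(page, { url, fields, submitSelector }) {
  await page.goto(url);
  for (const [selector, value] of Object.entries(fields)) {
    await page.type(selector, value); // Type into each input in turn
  }
  // Start waiting for the navigation before clicking, so it is not missed.
  await Promise.all([
    page.waitForNavigation(),
    page.click(submitSelector),
  ]);
}

// Stand-in page object that only records calls, so the sketch runs without
// Puppeteer or a browser. With Puppeteer, pass `await browser.newPage()` instead.
function createRecordingPage() {
  const calls = [];
  const record = (name) => async (...args) => { calls.push([name, ...args]); };
  return {
    calls,
    goto: record('goto'),
    type: record('type'),
    click: record('click'),
    waitForNavigation: record('waitForNavigation'),
  };
}

const demoPage = createRecordingPage();
submitForm(demoPage, {
  url: 'https://example.com/login', // Placeholder URL
  fields: { '#username': 'alice', '#password': 'secret' }, // Placeholder selectors and values
  submitSelector: 'button[type="submit"]',
}).then(() => console.log(demoPage.calls));
```

Passing the page in as a parameter keeps the form logic separate from browser start-up, so the same function works with a real Puppeteer page or a test double.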

For more features, see the Puppeteer documentation.

Conclusion

A headless browser is a web browser without a graphical user interface.
Headless browsers can be used for web page testing, taking screenshots of web pages, generating PDFs from web pages, testing JavaScript libraries, automatic form submission, web crawling, and more.

References

[1] Headless browser, Wikipedia
[2] A first look at the headless browser Puppeteer
[3] Chrome startup parameters

If there are any mistakes in this post, feel free to correct them in the comments section
