This is the 8th day of my participation in the August More Text Challenge

Preface

The previous article introduced the basic usage and common APIs of Puppeteer, and one of the usage scenarios it mentioned was “generating PDFs.” Next, let’s get a feel for Puppeteer by putting PDF generation into practice.

Generate PDF for a single HTML

Let's start simple. Take the Vue documentation as an example and save its introduction page as a PDF. Note: PDF generation is only supported in headless mode, so headless must be true. Otherwise, an error is thrown.

const puppeteer = require('puppeteer');

// Launch Chromium
puppeteer.launch({
    headless: true, // PDF generation only works in headless mode! (I set false while debugging and kept getting errors, haha)
    // Specify the viewport size
    defaultViewport: {
        width: 1920,
        height: 1080
    }
}).then(async (browser) => {
    // Open a tab
    const page = await browser.newPage();
    // Go to the specified page. By default, navigation is complete when the load event fires
    await page.goto('https://cn.vuejs.org/v2/guide/index.html');
    // Specify the path to save the generated PDF file (the vueDoc-pdf folder must already exist)
    await page.pdf({path: `./vueDoc-pdf/guide.pdf`});
    // Close the page
    await page.close();
    // Close Chromium
    await browser.close();
})

page.goto supports a waitUntil option. By default, navigation is considered complete when the load event fires. You can also set it to networkidle0, which treats navigation as complete once there have been no network connections for at least 500 milliseconds. For more options, see the page.goto documentation.
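As a rough sketch of the waitUntil option described above, the helper below navigates to a page and waits for the network to go idle instead of waiting for the load event. The name gotoWhenIdle is made up for illustration, and page is assumed to be a Puppeteer Page obtained from browser.newPage().

```javascript
// Hypothetical helper (not part of Puppeteer): navigate and wait for
// 'networkidle0' instead of the default 'load' event.
async function gotoWhenIdle(page, url) {
    // waitUntil is an option of page.goto; 'networkidle0' resolves once there
    // have been no network connections for at least 500 ms.
    return page.goto(url, { waitUntil: 'networkidle0' });
}
```

It would be called as `await gotoWhenIdle(page, 'https://cn.vuejs.org/v2/guide/index.html')` right before `page.pdf(...)`. This can help on pages that keep loading content after the load event, at the cost of a slightly longer wait per page.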

Generate PDF for multiple HTML files

The example above only generates one page. What if you want to save the entire Vue tutorial? Here is a simple, brute-force method: open each page in turn and generate the PDFs one by one. Again taking the Vue documentation as an example, the implementation is as follows:

const puppeteer = require('puppeteer');
const fs = require('fs');

(async () => {
    // Specify the folder to store the PDFs
    const folder = 'vueDoc'
    // Create the folder (no error if it already exists)
    fs.mkdirSync(folder, { recursive: true })
    console.log('Folder created successfully')

    // Start the headless browser
    const browser = await puppeteer.launch({ headless: true }) // PDF generation is only supported in headless mode; remember to set it back to true after debugging
    const page = await browser.newPage();
    await page.goto('https://cn.vuejs.org/v2/guide/index.html'); // By default, it waits for the page load event to fire

    // 1) The left menu structure of the Vue document is: .menu-root>li>a
    // Get all level 1 links (this relies on the page exposing jQuery as $)
    const urls = await page.evaluate(() => {
        return new Promise(resolve => {
            const aNodes = $('.menu-root>li>a')
            const urls = aNodes.map(n => {
                return aNodes[n].href
            }).get() // Convert the jQuery result to a plain array so it can be returned to Node
            resolve(urls);
        })
    })

    // 2) Traverse the urls one by one and generate a PDF for each
    for (let i = 0; i < urls.length; i++) {
        const url = urls[i],
            tmp = url.split('/'),
            fileName = tmp[tmp.length - 1].split('.')[0]
        await page.goto(url); // By default, it waits for the page load event to fire
        await page.pdf({ path: `./${folder}/${i}_${fileName}.pdf` }); // Specify the path to save the generated PDF file
        console.log(`${i}_${fileName}.pdf has been generated`)
    }

    await page.close()
    await browser.close();
})()
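The two fiddly steps in the script above are collecting the menu links and turning a URL into a file name. A minimal sketch of both as small helpers follows. The names collectLinks and pdfPathFor are invented for illustration, and page is assumed to be a Puppeteer Page. collectLinks uses page.$$eval, which runs a callback in the browser over all elements matching a selector, so it does not depend on the page providing jQuery.

```javascript
const path = require('path');

// Hypothetical helper: return the href of every element matching `selector`.
// page.$$eval runs the callback inside the browser page.
async function collectLinks(page, selector) {
    return page.$$eval(selector, (anchors) => anchors.map((a) => a.href));
}

// Hypothetical helper: build "<folder>/<index>_<name>.pdf" from a page URL.
// Using URL.pathname means a query string or #hash never ends up in the name.
function pdfPathFor(folder, index, url) {
    const { pathname } = new URL(url);
    // Strip the directory part and the extension; fall back to 'index'
    // when the path has no last segment (for example "/").
    const name = path.basename(pathname, path.extname(pathname)) || 'index';
    return `${folder}/${index}_${name}.pdf`;
}
```

In the loop, this would look like `const urls = await collectLinks(page, '.menu-root>li>a')` followed by `await page.pdf({ path: pdfPathFor(folder, i, urls[i]) })`.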

Generate PDF for multiple HTML files (plus)

If you ran the code above, you probably hit the same problem I did: a page times out and the whole run aborts.

That's right, I let you trip on purpose. A lesson sticks better once you've tripped over it.

Puppeteer's default navigation timeout is 30 seconds. Sometimes a page fails to load because of network conditions or other factors, and the resulting error blocks the generation of all remaining pages. So we need some fault tolerance, and ideally we also record the failures so they can be retried later:

const puppeteer = require('puppeteer');
const fs = require('fs');

(async () => {
    // Specify the folder to store the PDFs
    const folder = 'vueDoc'
    // Create the folder (no error if it already exists)
    fs.mkdirSync(folder, { recursive: true })
    console.log('Folder created successfully')

    // Start the headless browser
    const browser = await puppeteer.launch({ headless: true }) // PDF generation is only supported in headless mode; remember to set it back to true after debugging
    const page = await browser.newPage();
    await page.goto('https://cn.vuejs.org/v2/guide/index.html'); // By default, it waits for the page load event to fire

    // 1) The left menu structure of the Vue document is: .menu-root>li>a
    // Get all level 1 links (this relies on the page exposing jQuery as $)
    const urls = await page.evaluate(() => {
        return new Promise(resolve => {
            const aNodes = $('.menu-root>li>a')
            const urls = aNodes.map(n => {
                return aNodes[n].href
            }).get() // Convert the jQuery result to a plain array so it can be returned to Node
            resolve(urls);
        })
    })

    // 2) Traverse the urls one by one and generate a PDF for each
    let successUrls = [], failUrls = [] // Used to collect statistics on successes and failures
    for (let i = 0; i < urls.length; i++) {
        const url = urls[i],
            tmp = url.split('/'),
            fileName = tmp[tmp.length - 1].split('.')[0]
        try {
            await page.goto(url); // By default, it waits for the page load event to fire
            await page.pdf({ path: `./${folder}/${i}_${fileName}.pdf` }); // Specify the path to save the generated PDF file
            console.log(`${fileName}.pdf generated!`)
            successUrls.push(url)
        } catch (err) {
            // If the page times out, an error is thrown. Catching it here keeps
            // one failed page from stopping the pages that follow.
            failUrls.push(url)
            console.log(`${fileName}.pdf generation failed!`)
        }
    }

    console.log(`PDF generation completed! Succeeded: ${successUrls.length}, failed: ${failUrls.length}`)
    console.log(`Failure details: ${failUrls}`)

    // TODO: retry the failed pages
    await page.close()
    await browser.close();
})()
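One possible way to fill in the retry TODO is sketched below. It is an illustration and not the author's implementation: pdfWithRetry is an invented name, page is assumed to be a Puppeteer Page, and retries means the number of extra attempts after the first one.

```javascript
// Hypothetical helper: render one URL to a PDF, retrying when navigation
// or PDF generation throws (for example on a timeout).
async function pdfWithRetry(page, url, pdfPath, retries = 2) {
    for (let attempt = 0; attempt <= retries; attempt++) {
        try {
            await page.goto(url);
            await page.pdf({ path: pdfPath });
            return true; // This attempt succeeded
        } catch (err) {
            console.log(`Attempt ${attempt + 1} failed for ${url}: ${err.message}`);
        }
    }
    return false; // Every attempt failed
}
```

Inside the loop, the try/catch could then become `const ok = await pdfWithRetry(page, url, pdfPath)`, pushing the URL to successUrls or failUrls depending on ok. Retrying immediately is the simplest choice. Adding a short pause between attempts may work better when the failures come from a flaky network.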

Finally

This was written in a bit of a hurry. When I have time I'll keep improving it, for example by adding retries and merging the generated PDFs. To wrap up, here is what the result looks like.