This is the 8th day of my participation in the August More Text Challenge
Preface
The previous article introduced the basic usage and common APIs of Puppeteer, and one of the usage scenarios it mentioned was generating PDFs. Next, let's get a feel for Puppeteer by putting PDF generation into practice.
Generate a PDF from a single HTML page
Let's start with a simple case. Taking the Vue documentation as an example, we save its introduction page as a PDF. Note: at the time of writing, PDF generation is only supported in headless mode, so headless must be true; otherwise an error is reported.
const puppeteer = require('puppeteer');

// Launch Chromium
puppeteer.launch({
  // PDF generation only works in headless mode!
  // (I set this to false while debugging and kept getting errors, hahaha)
  headless: true,
  // Specify the viewport size
  defaultViewport: {
    width: 1920,
    height: 1080
  }
}).then(async (browser) => {
  // Open a tab
  const page = await browser.newPage();
  // Navigate to the page; by default this resolves once the load event fires
  await page.goto('https://cn.vuejs.org/v2/guide/index.html');
  // Path where the generated PDF is saved (the vueDoc-pdf folder must already exist)
  await page.pdf({ path: './vueDoc-pdf/guide.pdf' });
  // Close the page
  await page.close();
  // Close Chromium
  await browser.close();
});
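The script above writes to ./vueDoc-pdf/guide.pdf, and writing to a path inside a missing folder would presumably fail. As a minimal sketch, the folder can be created up front with Node's built-in fs and path modules. The helper name ensureParentDir is hypothetical and not part of the original script.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: create the parent folder of an output file if it is missing.
function ensureParentDir(filePath) {
  const dir = path.dirname(filePath);
  // recursive: true creates nested folders and does not throw if the folder already exists
  fs.mkdirSync(dir, { recursive: true });
  return dir;
}

// Usage (before calling page.pdf):
//   ensureParentDir('./vueDoc-pdf/guide.pdf');
```

Calling it right before page.pdf keeps the script runnable on a fresh checkout.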
page.goto accepts a waitUntil option. By default, navigation is considered complete when the load event fires. You can also set it to networkidle0, which considers navigation complete once there have been no network connections for at least 500 milliseconds. For more options, see the page.goto documentation.
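As a minimal sketch of the waitUntil option, the helper below passes networkidle0 to page.goto. The helper name gotoWhenIdle is hypothetical; page is assumed to be a Puppeteer Page, such as the one returned by browser.newPage().

```javascript
// Hypothetical helper: navigate and wait until the network has gone quiet.
// `page` is assumed to be a Puppeteer Page.
async function gotoWhenIdle(page, url) {
  return page.goto(url, {
    // Navigation counts as finished once there have been
    // no network connections for at least 500 ms.
    waitUntil: 'networkidle0',
  });
}

// Usage inside an async function:
//   await gotoWhenIdle(page, 'https://cn.vuejs.org/v2/guide/index.html');
```

This may suit pages that keep loading content after the load event, at the cost of a longer wait per page.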
Generate PDFs from multiple HTML pages
The example above only generates one page. What if you want to save the entire Vue tutorial? Here is a simple, brute-force method: open each page in turn and generate the PDFs one by one. Again taking the Vue documentation as an example, the implementation is as follows:
const puppeteer = require('puppeteer');
const fs = require('fs');

(async () => {
  // Folder that will hold the PDFs
  const folder = 'vueDoc';
  // recursive: true avoids an error if the folder already exists
  fs.mkdirSync(folder, { recursive: true });
  console.log('Folder created successfully');

  // Launch the headless browser.
  // PDF generation only works in headless mode; remember to set this back to true after debugging
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  // By default, this waits for the page load event
  await page.goto('https://cn.vuejs.org/v2/guide/index.html');

  // 1) The left-hand menu of the Vue docs has the structure .menu-root>li>a
  // Collect all first-level links (plain DOM APIs, so nothing depends on jQuery being loaded in the page)
  const urls = await page.evaluate(() =>
    Array.from(document.querySelectorAll('.menu-root>li>a'), (a) => a.href)
  );

  // 2) Visit the URLs one by one and generate a PDF for each
  for (let i = 0; i < urls.length; i++) {
    const url = urls[i];
    const tmp = url.split('/');
    const fileName = tmp[tmp.length - 1].split('.')[0];
    await page.goto(url); // By default, this waits for the page load event
    await page.pdf({ path: `./${folder}/${i}_${fileName}.pdf` }); // Path where the generated PDF is saved
    console.log(`${i}_${fileName}.pdf has been generated`);
  }

  await page.close();
  await browser.close();
})();
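The loop above derives each file name by splitting the URL on "/" and then on ".". The same idea can be sketched with Node's built-in URL class and path module, which also ignores any query string or hash. The helper name pdfNameFor is hypothetical.

```javascript
const path = require('path');

// Hypothetical helper: build "<index>_<page name>.pdf" from a page URL.
function pdfNameFor(url, index) {
  // pathname drops the query string and hash, e.g. "/v2/guide/index.html"
  const { pathname } = new URL(url);
  // Last path segment with its extension stripped, e.g. "index"
  const name = path.posix.basename(pathname, path.posix.extname(pathname));
  return `${index}_${name}.pdf`;
}

console.log(pdfNameFor('https://cn.vuejs.org/v2/guide/index.html', 0)); // prints "0_index.pdf"
```

Keeping the index prefix preserves the menu order when the files are sorted by name.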
Generate PDFs from multiple HTML pages (improved version)
If you ran the code above, you may have hit the same problem I did: a navigation timeout error.
That's right, I let you stumble on purpose. A lesson sticks better once you have tripped over it yourself.
Puppeteer's default navigation timeout is 30 seconds. Sometimes a page fails to load because of network conditions or other factors, and the resulting error stops the script, so the remaining pages are never generated. We therefore need some fault tolerance, ideally recording the failed URLs so they can be retried later:
const puppeteer = require('puppeteer');
const fs = require('fs');

(async () => {
  // Folder that will hold the PDFs
  const folder = 'vueDoc';
  // recursive: true avoids an error if the folder already exists
  fs.mkdirSync(folder, { recursive: true });
  console.log('Folder created successfully');

  // Launch the headless browser.
  // PDF generation only works in headless mode; remember to set this back to true after debugging
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  // By default, this waits for the page load event
  await page.goto('https://cn.vuejs.org/v2/guide/index.html');

  // 1) The left-hand menu of the Vue docs has the structure .menu-root>li>a
  // Collect all first-level links (plain DOM APIs, so nothing depends on jQuery being loaded in the page)
  const urls = await page.evaluate(() =>
    Array.from(document.querySelectorAll('.menu-root>li>a'), (a) => a.href)
  );

  // 2) Visit the URLs one by one and generate a PDF for each
  const successUrls = [];
  const failUrls = []; // Track which URLs succeeded and which failed
  for (let i = 0; i < urls.length; i++) {
    const url = urls[i];
    const tmp = url.split('/');
    const fileName = tmp[tmp.length - 1].split('.')[0];
    try {
      await page.goto(url); // By default, this waits for the page load event
      await page.pdf({ path: `./${folder}/${i}_${fileName}.pdf` }); // Path where the generated PDF is saved
      console.log(`${fileName}.pdf generated!`);
      successUrls.push(url);
    } catch (err) {
      // A navigation that times out throws an error. Catching it here
      // ensures the remaining pages are still generated.
      failUrls.push(url);
      console.log(`${fileName}.pdf generation failed: ${err.message}`);
    }
  }

  console.log(`PDF generation completed! Succeeded: ${successUrls.length}, failed: ${failUrls.length}`);
  console.log(`Failure details: ${failUrls}`);
  // TODO: retry the failed URLs

  await page.close();
  await browser.close();
})();
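The retry step left as a TODO could be filled in along these lines. This is only a sketch, assuming a small fixed number of extra attempts is enough. The names retryFailed and generateOne are hypothetical; generateOne(url) stands in for the page.goto plus page.pdf pair from the loop and is expected to throw on failure.

```javascript
// Hypothetical helper: retry each failed URL up to `maxAttempts` times.
// `generateOne(url)` is an async function that throws when generation fails.
async function retryFailed(failUrls, generateOne, maxAttempts = 2) {
  const stillFailing = [];
  for (const url of failUrls) {
    let done = false;
    for (let attempt = 1; attempt <= maxAttempts && !done; attempt++) {
      try {
        await generateOne(url);
        done = true;
      } catch (err) {
        console.log(`Retry ${attempt}/${maxAttempts} failed for ${url}: ${err.message}`);
      }
    }
    if (!done) stillFailing.push(url);
  }
  // URLs that never succeeded, for reporting or a later run
  return stillFailing;
}

// Usage inside the async function, before closing the page:
//   const remaining = await retryFailed(failUrls, async (url) => {
//     await page.goto(url);
//     await page.pdf({ path: `./${folder}/retry.pdf` }); // output path is a placeholder
//   });
```

Retrying sequentially on the same page keeps the behaviour close to the original loop; the returned list shows which URLs still need attention.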
Finally
This was written a little roughly. When I have time I will keep improving it, for example by adding retries and merging the generated PDFs. To finish, let's look at a picture of the result.