Requirements background

  • Business systems need to preview reports (weekly product reports, physical examination reports, and so on) and generate PDF versions that users can download or that are sent to designated users on a schedule
  • The report format is relatively fixed: text, images and charts, laid out essentially the same way as the front-end page

The solution

The requirement breaks down into two steps: report preview and report generation.

  • The report preview is rendered in the front end. A front-end stack such as React or Vue can reproduce the design, with the data fetched from the server.
  • Report generation converts the HTML produced in the first step into a PDF. There are two ways to do the HTML-to-PDF conversion:
    • Canvas based client generation scheme
    • Server-side generation scheme based on NodeJS + Puppeteer

A complete example

The two schemes are illustrated below with a physical examination report. Its format is relatively fixed and is divided into four pages, among them a personal information page, a suggestion page and a principle page. The data for the personal information page and the suggestion page comes from the server.

Canvas based client generation scheme

Canvas is an element introduced in the HTML5 standard that can be drawn on with JavaScript. It provides the toDataURL and toBlob methods (documented on MDN), which convert the contents of the canvas into an image.
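In code, the two methods can be used roughly as follows. This is only a sketch: the helper names canvasToDataUrl and canvasToBlob are made up for illustration, and only toDataURL(type, quality) and toBlob(callback, type, quality) are the standard canvas methods.

```javascript
// Hypothetical helpers around the two standard canvas export methods.

// Returns the canvas contents as a base64 data URL (PNG by default).
function canvasToDataUrl(canvas, type = "image/png", quality) {
  return canvas.toDataURL(type, quality);
}

// toBlob is callback-based; wrapping it in a Promise allows `await`.
// The callback receives null when the image could not be created.
function canvasToBlob(canvas, type = "image/png", quality) {
  return new Promise((resolve, reject) => {
    canvas.toBlob(
      (blob) => (blob ? resolve(blob) : reject(new Error("Canvas export failed"))),
      type,
      quality
    );
  });
}
```

toDataURL is convenient when the image is handed straight to another library as a string, while toBlob avoids building a large base64 string when the image is uploaded or saved.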

Since HTML documents exist in the browser as DOM trees, we can convert HTML to PDF in three steps:

  • Convert the DOM tree to a canvas object, using html2canvas
  • Convert the canvas to an image, using canvas.toDataURL
  • Convert the image to a PDF, using jsPDF
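The three steps might be wired together as in the sketch below. It is not the linked implementation: the function name exportElementToPdf, the default file name and the A4 portrait layout are assumptions, and the two libraries are passed in as arguments so the snippet does not assume how they are loaded.

```javascript
// Sketch of the three steps. `html2canvas` and `jsPDF` are passed in so the
// snippet does not depend on how the libraries are loaded (bundler, <script>, ...).
async function exportElementToPdf(element, { html2canvas, jsPDF }, fileName = "report.pdf") {
  // Step 1: DOM tree -> canvas
  const canvas = await html2canvas(element);

  // Step 2: canvas -> image (base64 PNG data URL)
  const imageData = canvas.toDataURL("image/png");

  // Step 3: image -> PDF. Scale the image to the page width (assumed A4 portrait)
  // and keep its aspect ratio.
  const pdf = new jsPDF({ orientation: "portrait", unit: "mm", format: "a4" });
  const pageWidth = pdf.internal.pageSize.getWidth();
  const imageHeight = (canvas.height * pageWidth) / canvas.width;
  pdf.addImage(imageData, "PNG", 0, 0, pageWidth, imageHeight);

  // Triggers the browser download.
  pdf.save(fileName);
  return pdf;
}
```

A report taller than one page would be cut off in this sketch. Capturing each report page as its own element and calling pdf.addPage() between images is one way around that.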

Complete code implementation: github.com/simonwoo/di…

Clicking the download button produces the PDF.

This scheme runs entirely on the client and needs no server support. In practice, however, several problems came up:

  • The generated PDF is blurry and of low quality
  • Images that the HTML loads from external links are not rendered in the PDF
  • Because the first step builds the canvas from the DOM, clicking download on a particularly long report before the DOM has finished loading produces a broken report
  • Because it is a client-side scheme, the user has to trigger generation manually, so it cannot be used for reports that are sent to users on a schedule
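Some of these problems can be reduced, though not necessarily eliminated. The sketch below shows mitigations that are often tried with html2canvas: rendering at a higher scale to reduce blur, enabling useCORS so that cross-origin images can be drawn (the image server still has to send CORS headers), and waiting for images to finish loading before capture. The function names and the minimum scale of 2 are assumptions for illustration. The last problem, scheduled sending, cannot be solved on the client.

```javascript
// Hypothetical helper: html2canvas options aimed at the first two problems.
function buildCaptureOptions(devicePixelRatio = 1) {
  return {
    // Render at a higher resolution than the screen to reduce blur.
    scale: Math.max(2, devicePixelRatio),
    // Ask html2canvas to load cross-origin images via CORS.
    useCORS: true,
  };
}

// Hypothetical helper: resolves once every <img> under `root` has finished
// loading (successfully or not), so capture does not start on a half-loaded report.
function waitForImages(root) {
  const images = Array.from(root.querySelectorAll("img"));
  return Promise.all(
    images.map((img) =>
      img.complete
        ? Promise.resolve()
        : new Promise((resolve) => {
            img.addEventListener("load", resolve, { once: true });
            img.addEventListener("error", resolve, { once: true });
          })
    )
  );
}
```

In the browser this could be used as: await waitForImages(reportElement), then html2canvas(reportElement, buildCaptureOptions(window.devicePixelRatio)).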

Server-side generation scheme based on NodeJS + Puppeteer

Puppeteer is a Node.js library from Google that drives a headless Chrome browser: one that has no graphical interface but can still render HTML/JS/CSS like a normal browser and supports the other basic browser functions. You can think of it as a Chrome browser with no interface. Its main use cases are:

  • Generate screenshots and PDF of the page
  • Crawl a SPA and generate pre-rendered content (i.e., "SSR")
  • Crawlers, which scrape the content you need from a website
  • Automated testing: form submission, UI testing, keyboard input, etc.
  • Create an up-to-date automated testing environment and run tests directly in the latest version of Chrome, with the latest JavaScript and browser features

Given these capabilities, we can launch a Puppeteer instance to render the HTML report and then use its built-in PDF conversion to generate the PDF.

Two APIs are important here:

  • page.goto(url, [options]) – navigates to the specified URL, which can be a local file (file://) or a network resource (http://)
  • page.pdf([options]) – converts the current page to a PDF file

A small example is converting the Baidu home page to PDF with Puppeteer.
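Such a conversion might look like the sketch below, which uses only launch, newPage, page.goto and page.pdf. The function name savePageAsPdf, the output file name and the A4 and printBackground settings are assumptions. The third parameter defaults to the real package and exists only so that the function can be exercised with a stand-in object.

```javascript
// Minimal sketch: render a URL in headless Chrome and save it as a PDF.
async function savePageAsPdf(url, outputPath, puppeteer = require("puppeteer")) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until the network is nearly idle so late-loading resources are included.
    await page.goto(url, { waitUntil: "networkidle2" });
    // printBackground keeps CSS backgrounds, which are omitted by default.
    await page.pdf({ path: outputPath, format: "A4", printBackground: true });
  } finally {
    // Always close the browser, even if navigation or rendering fails.
    await browser.close();
  }
}
```

With the puppeteer package installed, savePageAsPdf("https://www.baidu.com", "baidu.pdf") would write the PDF to the current directory.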

The complete code is as follows:

  • Front end: github.com/simonwoo/di…
  • Backend: github.com/simonwoo/di…

To start the project:

  • Go to the WebApp directory and run npm install and npm run start to start the front-end server at localhost:3000
  • Go to the server directory and run npm install and npm run dev to start the Node server at localhost:7001

The overall service architecture is as follows:

The Node server adds a route and a controller for PDF generation. The controller starts a Puppeteer instance, which loads the localhost:3000 page and generates the PDF. The generated PDF can then be opened directly in the browser at http://localhost:7001/pdf.
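The flow of such a /pdf endpoint can be sketched as follows. This is not the linked back-end code: it uses Node's built-in http module instead of a framework controller so that it stays self-contained, and createPdfServer, REPORT_URL and the injectable puppeteer parameter are assumptions for illustration.

```javascript
const http = require("http");

// Assumed address of the front-end report page.
const REPORT_URL = "http://localhost:3000";

// Sketch of a /pdf endpoint: render the report page and stream back the PDF.
function createPdfServer(puppeteer = require("puppeteer"), reportUrl = REPORT_URL) {
  return http.createServer(async (req, res) => {
    if (req.method !== "GET" || req.url !== "/pdf") {
      res.writeHead(404, { "Content-Type": "text/plain" });
      res.end("Not found");
      return;
    }
    let browser;
    try {
      browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto(reportUrl, { waitUntil: "networkidle0" });
      // Without a `path` option, page.pdf() resolves with the PDF bytes.
      const pdf = await page.pdf({ format: "A4", printBackground: true });
      res.writeHead(200, { "Content-Type": "application/pdf" });
      res.end(Buffer.from(pdf));
    } catch (err) {
      res.writeHead(500, { "Content-Type": "text/plain" });
      res.end("PDF generation failed");
    } finally {
      if (browser) await browser.close();
    }
  });
}
```

Calling createPdfServer().listen(7001) would expose the endpoint on the port used above. Launching a browser per request keeps the sketch simple; a long-running service would more likely reuse one browser instance.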

In a real deployment, the front-end pages can be served from an Nginx server or directly from the Node server. Puppeteer also supports setting cookies, so it can load pages that require authentication without going through a login step.
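The cookie approach might be sketched as below, using page.setCookie before navigation. The helper name openAuthenticatedPage and the cookie name session_id are assumptions; the real cookie depends on your own login system. Recent Puppeteer releases may steer you toward browser-level cookie methods instead, so check the documentation for the version you use.

```javascript
// Sketch: give the headless page a session cookie before loading a protected report.
// "session_id" is a placeholder cookie name, not something Puppeteer defines.
async function openAuthenticatedPage(browser, reportUrl, sessionToken) {
  const page = await browser.newPage();
  // `url` tells Puppeteer which site the cookie belongs to.
  await page.setCookie({ name: "session_id", value: sessionToken, url: reportUrl });
  // The cookie must be set before navigation so the first request carries it.
  await page.goto(reportUrl, { waitUntil: "networkidle0" });
  return page;
}
```

The returned page can then be passed to page.pdf() as in the unauthenticated case.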

Compared with the client-side scheme, the PDF generated with Puppeteer is of higher quality and meets production requirements.

In both schemes described in this article, the Ajax requests that fetch data from the back end are omitted; readers can add them as needed.

References

  • html2canvas – html2canvas.hertzen.com/documentati…
  • jsPDF – github.com/MrRio/jsPDF
  • Puppeteer – zhaoqize.github.io/puppeteer-a…
  • Eggjs – eggjs.org/zh-cn/