Recently, a project of mine needed a PDF download feature. Since I had no experience in this area, it took me a long time to find relevant material online. After sorting through it, I found several frameworks that can do the job.

1. Open source framework support

  • iText: generates PDF documents and converts XML and HTML files to PDF.
  • Apache PDFBox: generates and merges PDF documents.
  • docx4j: generates DOCX, PPTX, and XLSX documents and supports converting them to PDF.

Comparison:

  • iText is licensed under the AGPL; the other two frameworks use the Apache License 2.0.
  • Generating a PDF with PDFBox is like drawing a picture: text and images are placed at page coordinates, and text has to be wrapped manually according to the number of characters per line.
  • docx4j generates DOCX documents and can convert Word documents to PDF, but it cannot generate a PDF directly.
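
To make the "drawing by page coordinates" point concrete, here is a minimal sketch in plain Java (no PDFBox dependency) of the arithmetic involved. It assumes the default PDF user space, where the origin is the bottom-left corner of the page and one unit is 1/72 inch. The class and method names are hypothetical helpers, not part of any library.

```java
/** Hypothetical helper: page-coordinate arithmetic for placing text by hand. */
final class PageCoordinates {
    /** PDF points per millimetre: 72 points per inch, 25.4 mm per inch. */
    private static final double POINTS_PER_MM = 72.0 / 25.4;

    private PageCoordinates() { }

    /** Converts millimetres to PDF points. */
    static double mmToPoints(double mm) {
        return mm * POINTS_PER_MM;
    }

    /**
     * Returns the y coordinate of the baseline of a line of text.
     * The value is measured from the bottom of the page, so lines further
     * down the page have smaller y values.
     */
    static double baselineY(double pageHeight, double topMargin, double leading, int lineIndex) {
        return pageHeight - topMargin - leading * lineIndex;
    }

    public static void main(String[] args) {
        double a4Height = mmToPoints(297); // A4 is 210 mm x 297 mm
        for (int line = 0; line < 3; line++) {
            System.out.println("line " + line + " -> y = " + baselineY(a4Height, 40, 18, line));
        }
    }
}
```

With these assumptions an A4 page comes out at roughly 595 x 842 points, which is why a y value around 800 lands near the top of the page.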

2. Implementation scheme

                        Complex format         Simple format
Large amount of data    docx4j + FreeMarker    docx4j or PDFBox
Small amount of data    docx4j                 PDFBox
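
The table above can also be read as a small decision function. The sketch below simply encodes its four cells in plain Java; the class name, method name, and boolean parameters are hypothetical and only illustrate the rule of thumb.

```java
/** Hypothetical helper that encodes the selection table as a function. */
final class PdfToolChooser {
    private PdfToolChooser() { }

    /**
     * @param complexFormat true if the document layout is complex
     * @param largeData     true if the document carries a large amount of data
     * @return the tool combination suggested by the table
     */
    static String choose(boolean complexFormat, boolean largeData) {
        if (complexFormat) {
            return largeData ? "docx4j + FreeMarker" : "docx4j";
        }
        return largeData ? "docx4j or PDFBox" : "PDFBox";
    }
}
```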

2.1 Generating a PDF from pure data

  1. docx4j: suitable for PDF documents with a simple or complex format and a small amount of data.
  2. Apache PDFBox: suitable for PDF documents with a simple format and a small amount of data.

docx4j is an open source Java library for creating and manipulating Microsoft Open XML files (Word DOCX, PowerPoint PPTX, and Excel XLSX). It is similar to Microsoft's OpenXML SDK, but for Java. docx4j uses JAXB to create the in-memory object representation, so programmers need to spend some time learning JAXB and the Open XML file structure.

import org.docx4j.jaxb.Context;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.*;

// Factory for WordprocessingML objects
ObjectFactory objectFactory = Context.getWmlObjectFactory();
// Word package
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
// Document body
MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();
// Line break
Br br = objectFactory.createBr();
// Paragraph
P p = objectFactory.createP();
// Paragraph properties
PPr ppr = objectFactory.createPPr();
// Text alignment (the original value is not shown; CENTER is used here as an example)
Jc jc = new Jc();
jc.setVal(JcEnumeration.CENTER);
ppr.setJc(jc);
// Run properties
RPr rpr = objectFactory.createRPr();
// Font settings
RFonts rFonts = objectFactory.createRFonts();
rFonts.setAscii("Times New Roman");
rFonts.setEastAsia("宋体");
rpr.setRFonts(rFonts);
// Run (a span of text that shares the same properties)
R r = objectFactory.createR();
// Text
Text text = objectFactory.createText();
text.setValue("This is a plain text.");
r.setRPr(rpr);
r.getContent().add(br);
r.getContent().add(text);
p.getContent().add(r);
p.setPPr(ppr);
// Add the paragraph to the body
mainDocumentPart.addObject(p);
// Export
// ...

Apache PDFBox is an open source Java tool for working with PDF documents. It can create new PDF documents, manipulate existing ones, and extract content from them. Apache PDFBox also includes several command-line utilities.

import java.awt.Color;
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

// Path of the PDF file to write
String formTemplate = "/Users/xiaoming/Desktop/test_pdfbox.pdf";
// Create the document object
PDDocument document = new PDDocument();
// Create a page, size A4
PDPage page = new PDPage(PDRectangle.A4);
document.addPage(page);
// Load the font
PDType0Font font = PDType0Font.load(document, new File("/Users/xiaoming/work/tmp/simsun.ttf"));
// Open the content stream of the page
PDPageContentStream stream = new PDPageContentStream(document, page);
// Set the font and text size
stream.setFont(font, 12);
// Set the fill color of the rectangle (light gray, so the black text drawn over it stays visible)
stream.setNonStrokingColor(Color.LIGHT_GRAY);
// Add a rectangle: x, y, width, height
stream.addRect(29, 797, 100, 14);
// Fill the rectangle
stream.fill();
// Switch the fill color to black for the text
stream.setNonStrokingColor(Color.BLACK);
// Begin the text block
stream.beginText();
// Set the line spacing
stream.setLeading(18f);
// Set the text position
stream.newLineAtOffset(30, 800);
// Write the text
stream.showText("Ha ha");
// Move to the next line
stream.newLine();
stream.showText("Ha ha");
stream.newLine();
stream.showText("Hee hee");
// End the text block
stream.endText();
// Close the content stream
stream.close();
// Save the document
document.save(formTemplate);
// Release resources
document.close();
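
Because PDFBox does not wrap text on its own, long strings have to be split into lines before they are written. The following is a minimal sketch in plain Java that wraps by character count. That is only a rough approximation, workable for monospaced or CJK text; for proportional fonts, measuring the real width of each candidate line with the font in use would be more accurate. The class name and the maxChars parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits text into lines of at most maxChars characters. */
final class LineWrapper {
    private LineWrapper() { }

    static List<String> wrap(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            // Hard-split words that are longer than a whole line.
            while (word.length() > maxChars) {
                if (current.length() > 0) {
                    lines.add(current.toString());
                    current.setLength(0);
                }
                lines.add(word.substring(0, maxChars));
                word = word.substring(maxChars);
            }
            if (word.isEmpty()) {
                continue;
            }
            // Start a new line when the next word (plus a space) would not fit.
            if (current.length() > 0 && current.length() + 1 + word.length() > maxChars) {
                lines.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }
}
```

Each returned line can then be written with showText followed by newLine, in the same way as the three lines in the example above.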

2.2 Generating a PDF from a template plus data

FreeMarker + docx4j is well suited to generating PDF documents with a complex format and a large amount of data.

Apache FreeMarker is a template engine for generating text output (HTML web pages, emails, configuration files, source code, etc.) from templates and changing data. Templates are written in the FreeMarker Template Language (FTL), which is a simple, specialized language.

Since Office 2003, Word documents can be saved in an XML text format. The idea is to recreate the target PDF as a Word document, save it as XML text, populate it with data through the template engine, and then convert the result into a PDF document. The overall flow is PDF -> Word -> XML -> Word -> PDF.

Step    Description              Tool
1       Word -> XML              manual
2       XML -> FTL               manual; refer to "Introduction to Common Tags in Word Documents in XML Format"
3       FTL + data object = XML  FreeMarker
4       XML -> PDF               docx4j
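
Step 3 in the table is where placeholders in the XML are replaced by data. The sketch below is not FreeMarker; it is a deliberately simplified, plain-Java stand-in that only handles ${key} placeholders, to show the kind of substitution the template engine performs on the Word XML. The class name is hypothetical, and escaping XML special characters in the values is a choice made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a template engine: replaces ${key} with map values. */
final class PlaceholderFiller {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    private PlaceholderFiller() { }

    static String fill(String template, Map<String, Object> data) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1);
            if (!data.containsKey(key)) {
                throw new IllegalArgumentException("No value for placeholder: " + key);
            }
            // The output is XML, so escape characters that are special in XML text.
            String value = escapeXml(String.valueOf(data.get(key)));
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    private static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        map.put("name", "Xiao Ming");
        String xml = "<w:t>${name}</w:t>"; // fragment of a Word XML text run
        System.out.println(fill(xml, map)); // prints <w:t>Xiao Ming</w:t>
    }
}
```

A real FTL template supports far more than this (conditionals, loops over lists, formatting), which is what makes it practical for data-heavy documents.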
Steps
  • 1. Create a Word (DOCX) document that matches the target PDF
  • 2. Save the Word document as an XML file
  • 3. Turn the XML file into a FreeMarker template (FTL) file
  • 4. Merge the data and the FTL file into XML text
// template is a freemarker.template.Template loaded from the FTL file (loading not shown)
Map<String, Object> map = new HashMap<>();
map.put("name", "Xiao Ming");
map.put("address", "Chaoyang District, Beijing");
map.put("email", "[email protected]");
// Render straight into a StringWriter; a BufferedWriter adds nothing for an in-memory target
StringWriter stringWriter = new StringWriter();
template.process(map, stringWriter);
String xmlStr = stringWriter.toString();
  • 5. Use docx4j to load the XML text into a Word document object
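// Use an explicit charset (assuming the XML is declared as UTF-8) instead of the platform default
ByteArrayInputStream in = new ByteArrayInputStream(xmlStr.getBytes(StandardCharsets.UTF_8));
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(in);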
  • 6. Use docx4j to save the Word document as a PDF document
String outputFilePath = "/Users/xiaoming/Resume.pdf";
FOSettings foSettings = Docx4J.createFOSettings();
foSettings.setWmlPackage(wordMLPackage);
// try-with-resources closes the output stream even if the conversion fails
try (FileOutputStream os = new FileOutputStream(new File(outputFilePath))) {
    Docx4J.toFO(foSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);
    // Alternative: Docx4J.toPDF(wordMLPackage, os);
}

2.3 Converting Word to PDF

docx4j

WordprocessingMLPackage mlPackage = WordprocessingMLPackage.load(new File("abc.docx"));
Mapper fontMapper = new IdentityPlusMapper();
// Optionally map a font name used in the document to a font installed on the machine:
// fontMapper.put("Chinese fonts", PhysicalFonts.get("STXingkai"));
mlPackage.setFontMapper(fontMapper);
FOSettings foSettings = Docx4J.createFOSettings();
foSettings.setWmlPackage(mlPackage);
// try-with-resources closes the output stream even if the conversion fails
try (OutputStream os = new java.io.FileOutputStream("abc.pdf")) {
    Docx4J.toFO(foSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);
}

2.4 Merging multiple PDFs

Apache PDFBox can merge multiple PDF documents into one.

String folderName = "/Users/xiaoming/pdfs";
String destPath = "/Users/xiaoming/all.pdf";
PDFMergerUtility mergePdf = new PDFMergerUtility();
// getFiles is a helper (not shown) that returns the file names in the folder
String[] filesInFolder = getFiles(folderName);
// Sort by file name so that the merge order is deterministic
Arrays.sort(filesInFolder);
for (String fileName : filesInFolder) {
    mergePdf.addSource(folderName + File.separator + fileName);
}
mergePdf.setDestinationFileName(destPath);
mergePdf.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
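
The merge example calls a getFiles helper that is not shown in the article. One possible implementation, using only the JDK, might look like the sketch below. The class name and the decision to keep only regular files ending in .pdf are assumptions, not the original author's code.

```java
import java.io.File;
import java.util.Arrays;

/** Hypothetical implementation of the getFiles helper used in the merge example. */
final class PdfFolder {
    private PdfFolder() { }

    /** Returns the names of the .pdf files directly inside folderName, sorted by name. */
    static String[] getFiles(String folderName) {
        File folder = new File(folderName);
        // Keep regular files whose name ends in .pdf (case-insensitive).
        String[] names = folder.list((dir, name) ->
                name.toLowerCase().endsWith(".pdf") && new File(dir, name).isFile());
        // list() returns null if the path is not a directory or cannot be read.
        if (names == null) {
            throw new IllegalArgumentException("Not a readable directory: " + folderName);
        }
        Arrays.sort(names);
        return names;
    }
}
```

Filtering by extension keeps stray files in the folder from being handed to the merger.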

Sample code

Github.com/brandonbai/…