In recent months I have done a lot of PDF-related work. While I have a couple of quieter days, I am writing down the relevant points.
PDF is short for Portable Document Format, a format created by Adobe in 1992. Its defining feature is that a document renders the same way on any platform, regardless of the operating system.
As for why it is platform-independent: the format is relatively simple, every platform follows the same standard, and the standard is open.
The history of PDF
PDF was a proprietary Adobe format until 2008, when it became an official ISO standard (ISO 32000).
Although PDF is now an open standard, Adobe, its originator, has also developed proprietary features such as XFA (XML Forms Architecture, Adobe’s forms technology) that are not defined by the ISO 32000 standard itself.
Chinese domestic format – OFD
Similar to PDF is OFD (Open Fixed-layout Document), regarded as China’s “domestic PDF”. The standard was officially released by the Standardization Administration of China in 2016.
Compared with PDF, OFD is simpler, easier to implement, and supports China’s national cryptographic (SM) algorithms, although it is still rarely used.
Fonts
In common Office formats, fonts are not embedded by default. Not embedding avoids storing the same font over and over; you only need the font installed on the rendering device. The disadvantage is obvious, however: if the client device does not have the font, the document cannot be rendered as intended, and falling back to a substitute font changes how it looks.
PDF handles fonts differently from Office. PDF files normally embed their fonts, and usually only a subset: just the glyphs for the characters actually used in the file are embedded, not the entire font. This way, embedding fonts does not increase the file size too much.
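To make the idea of a font subset concrete, here is a minimal Java sketch. It only counts which distinct characters a piece of text uses, which is roughly the set of glyphs a subset font has to carry. The class and method names are hypothetical and do not come from any PDF library; real subsetting also rewrites the font’s internal tables.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not taken from any PDF library.
final class FontSubsetSketch {

    // Collects the distinct Unicode code points that appear in the text.
    // A subset font only needs glyphs for these characters.
    static Set<Integer> usedCodePoints(String text) {
        Set<Integer> used = new TreeSet<>();
        text.codePoints().forEach(used::add);
        return used;
    }

    public static void main(String[] args) {
        String documentText = "PDF embeds only the glyphs that the text uses";
        Set<Integer> used = usedCodePoints(documentText);
        System.out.println("Characters in the text: " + documentText.length());
        System.out.println("Glyphs needed in the subset: " + used.size());
    }
}
```

The longer the document, the more the two numbers diverge, which is why a subset stays small even when the full font contains thousands of glyphs.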
Font encryption
Some publishers provide PDF files publicly but, for data security, do not want users to copy the text freely.
This is where embedded fonts pay off: using font obfuscation and encryption, the PDF embeds a scrambled font whose character codes no longer correspond to the glyphs they draw. The page still displays correctly, but text copied out of the file is garbled, which offers a degree of data protection.
However, OCR is now so capable that text drawn with an obfuscated font can still be recovered; it just takes a little more effort.
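The effect can be illustrated with a toy model in Java. This is only a sketch under strong assumptions: the “obfuscation” is a fixed rotation of the letters a to z, and all names are hypothetical. Real schemes use per-document random mappings inside the font itself, but the principle is the same: the codes stored in the file differ from the characters the reader sees.

```java
// Toy model of font obfuscation; the rotation rule is an assumption for illustration.
final class FontObfuscationSketch {

    // Rotates a lowercase letter by `shift` positions; other characters pass through.
    static char rotate(char c, int shift) {
        if (c < 'a' || c > 'z') {
            return c;
        }
        return (char) ('a' + Math.floorMod(c - 'a' + shift, 26));
    }

    // The character codes written into the file: this is what copy and paste returns.
    static String toStoredCodes(String visibleText, int shift) {
        StringBuilder out = new StringBuilder();
        for (char c : visibleText.toCharArray()) {
            out.append(rotate(c, shift));
        }
        return out.toString();
    }

    // What the obfuscated font draws: its glyph table undoes the rotation.
    static String toRenderedGlyphs(String storedCodes, int shift) {
        return toStoredCodes(storedCodes, -shift);
    }

    public static void main(String[] args) {
        String stored = toStoredCodes("secret", 3);
        System.out.println("Copied text:   " + stored);
        System.out.println("Rendered text: " + toRenderedGlyphs(stored, 3));
    }
}
```

OCR defeats this because it reads the rendered glyphs rather than the stored codes.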
Electronic signature & digital signature
Electronic Signature
The US Electronic Signatures in Global and National Commerce Act (ESIGN, 2000) defines an “electronic signature” as “an electronic sound, symbol, or process, attached to or logically associated with a contract or other record and executed or adopted by a person with the intent to sign the record.”
In practice, an electronic signature is often just an image of a handwritten signature attached to an electronic document, backed by some multi-factor proof of identity (PIN, password, email).
Digital Signature
Digital signatures are different from electronic signatures. A digital signature is created with a digital certificate issued by a certificate authority (CA) in a PKI. The basic process is as follows:
- Use a digest (hash) algorithm, such as one from the MD or SHA families, to generate a digest of the content
- Encrypt (sign) the digest with an asymmetric algorithm and the certificate’s private key
- Attach the encrypted digest and the signing certificate (the public key part) to the PDF file
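The first two steps can be sketched with the JDK’s `java.security` API. This is a minimal illustration, not how a PDF library does it: the key pair is generated on the spot instead of coming from a CA-issued certificate, the class name is hypothetical, and the “content” is just a placeholder string. A real PDF signature also packages the result in a signature container inside the file, which is omitted here.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.Signature;
import java.util.Base64;

// Hypothetical sketch of the signing steps; not a PDF library API.
final class SignStepsSketch {

    // Step 1: compute a digest of the content (SHA-256 here).
    static byte[] digest(byte[] content) throws GeneralSecurityException {
        return MessageDigest.getInstance("SHA-256").digest(content);
    }

    // Steps 1 and 2 combined: "SHA256withRSA" hashes the content and then
    // signs the digest with the private key.
    static byte[] sign(byte[] content, PrivateKey privateKey) throws GeneralSecurityException {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(privateKey);
        signer.update(content);
        return signer.sign();
    }

    public static void main(String[] args) throws GeneralSecurityException {
        // Assumption: a throwaway key pair stands in for the certificate's keys.
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair keyPair = generator.generateKeyPair();

        byte[] content = "bytes of the PDF covered by the signature"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println("Digest:    " + Base64.getEncoder().encodeToString(digest(content)));
        // Step 3 would attach this value and the certificate to the PDF.
        System.out.println("Signature: "
                + Base64.getEncoder().encodeToString(sign(content, keyPair.getPrivate())));
    }
}
```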
As the steps above show, a PDF digital signature is different from SSL encryption. A PDF signature “signs” the file to establish the signer’s identity and make tampering detectable, whereas SSL encrypts data packets in transit.
(Figure: the difference between encryption and digital signature under an asymmetric encryption algorithm.)
To sum up, asymmetric algorithms have two mainstream applications: encryption (encrypt with the public key, decrypt with the private key) and digital signatures (sign with the private key, verify with the public key).
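For contrast with signing, the encryption direction (public key encrypts, private key decrypts) might look like this in Java. It is a sketch with a freshly generated RSA key pair and a hypothetical class name; the OAEP padding mode is my choice for the example, not something the article prescribes.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.Cipher;

// Hypothetical sketch: anyone can encrypt with the public key,
// only the private key holder can decrypt.
final class PublicKeyEncryptionSketch {

    // Padding choice is an assumption made for this example.
    private static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    static byte[] encrypt(byte[] plaintext, PublicKey publicKey) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, publicKey);
        return cipher.doFinal(plaintext);
    }

    static byte[] decrypt(byte[] ciphertext, PrivateKey privateKey) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, privateKey);
        return cipher.doFinal(ciphertext);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair keyPair = generator.generateKeyPair();

        byte[] secret = "for the key owner only".getBytes(StandardCharsets.UTF_8);
        byte[] ciphertext = encrypt(secret, keyPair.getPublic());
        byte[] recovered = decrypt(ciphertext, keyPair.getPrivate());
        System.out.println(new String(recovered, StandardCharsets.UTF_8));
    }
}
```

Signing runs the key pair the other way round: the private key produces the signature and the public key checks it.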
PDF digital signatures also have a special feature: the signature information can be “bound” to an image, such as the stamp image on an electronic invoice, which then serves as the appearance of the digital signature.
If you don’t need an appearance, you can of course just sign digitally. But remember one thing: a signature image does not necessarily carry a digital signature, and a digital signature does not necessarily have a signature image. They are not the same thing.
In fact, PDF is not the only format that can be digitally signed: the Microsoft Office suite also supports digital signatures. In practice, though, almost nobody signs Office documents, so nearly all the digital signatures you see in the wild are on PDFs.
Signature verification
The principle of PDF signature verification is also simple:
- Verify that the certificate attached to the PDF signature is trusted
- To do so, check it against the client’s root certificate store (Adobe’s PDF software, for example, uses its own built-in list of root certificates, independent of the operating system)
- Use the public key in the certificate to verify the signature over the digest, which confirms that the content has not changed since it was signed
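The checks above can be sketched in Java as follows. This is a simplification under stated assumptions: `isIssuedBy` only checks the validity period and a single issuer, whereas real validators build a full chain to a root and also check revocation; the class and method names are hypothetical; and `main` uses a throwaway key pair instead of a real certificate.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.Signature;
import java.security.cert.X509Certificate;

// Hypothetical sketch of signature verification; not a PDF library API.
final class SignatureVerificationSketch {

    // Trust check (simplified): the certificate is within its validity period
    // and was signed by the given trusted issuer certificate.
    static boolean isIssuedBy(X509Certificate certificate, X509Certificate trustedIssuer) {
        try {
            certificate.checkValidity();
            certificate.verify(trustedIssuer.getPublicKey());
            return true;
        } catch (GeneralSecurityException e) {
            return false;
        }
    }

    // Signature check: recompute the digest of the content and compare it
    // with the signed digest, using the signer's public key.
    static boolean signatureMatches(byte[] content, byte[] signature, PublicKey publicKey)
            throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(publicKey);
        verifier.update(content);
        return verifier.verify(signature);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair keyPair = generator.generateKeyPair();

        byte[] content = "signed document bytes".getBytes(StandardCharsets.UTF_8);
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(keyPair.getPrivate());
        signer.update(content);
        byte[] signature = signer.sign();

        byte[] tampered = content.clone();
        tampered[0] ^= 1; // flip a single bit to simulate tampering

        System.out.println("Original verifies: "
                + signatureMatches(content, signature, keyPair.getPublic()));
        System.out.println("Tampered verifies: "
                + signatureMatches(tampered, signature, keyPair.getPublic()));
    }
}
```

The untouched content should verify and the modified copy should not, which is the tamper detection that a digital signature provides.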
Certificate & signature algorithm
The certificates used for PDF digital signatures are a different type from SSL certificates. A common SSL certificate validates the owner of a domain name, whereas a PDF signing certificate is generally an organization certificate: it has no concept of a domain name, but the issuer strictly validates the enterprise’s information, such as its business license.
The mainstream asymmetric algorithms for digital certificates today include RSA and DSA (the algorithm specified by DSS), with RSA by far the most widely used. With the trend toward localization in China, however, industries such as finance and insurance are gradually migrating to the national cryptographic (SM) algorithms.
The choice of algorithm does not change the picture, though: all of them are asymmetric and all of them work with digital certificates; only the specific signing, verification, encryption, and decryption algorithms differ.
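In Java this interchangeability is visible in the API: the same sign and verify code runs with a different algorithm name. The sketch below uses RSA and DSA from the standard JDK providers. The class name is hypothetical, and the SM algorithms are left out because, as far as I know, they need a third-party security provider.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

// Hypothetical sketch: one code path, different signature algorithms.
final class AlgorithmChoiceSketch {

    // Generates a key pair, signs a fixed message, and verifies the signature.
    static boolean signAndVerify(String keyAlgorithm, int keySize, String signatureAlgorithm)
            throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance(keyAlgorithm);
        generator.initialize(keySize);
        KeyPair keyPair = generator.generateKeyPair();

        byte[] content = "same document, different algorithm".getBytes(StandardCharsets.UTF_8);

        Signature signer = Signature.getInstance(signatureAlgorithm);
        signer.initSign(keyPair.getPrivate());
        signer.update(content);
        byte[] signature = signer.sign();

        Signature verifier = Signature.getInstance(signatureAlgorithm);
        verifier.initVerify(keyPair.getPublic());
        verifier.update(content);
        return verifier.verify(signature);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        System.out.println("RSA: " + signAndVerify("RSA", 2048, "SHA256withRSA"));
        System.out.println("DSA: " + signAndVerify("DSA", 2048, "SHA256withDSA"));
    }
}
```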
Form fields – AcroForm
A form field belongs to a PDF form, known as an AcroForm. Yes, you read that right: PDF has a form technology similar to HTML forms, which can contain text fields, radio buttons, checkboxes, and so on. Once the form has been designed, a tool or program can fill in the fields of the PDF.
PDF libraries (Java)
PDF technology is still relatively closed, and the open source libraries can be painful to use. For commercial projects, consider buying a commercial SDK: you get rich features and thorough documentation, and you are trading money for time.
Open Source & Free
- iText – versions 4.x and below are free to use; versions 5 and above are open source under the AGPL
- OpenPDF – a fork based on the old iText 2.x code line with fixes, so it is still iText underneath
- PDFBox – an open source PDF library from Apache; it is free, but not as powerful as iText, so it is not recommended
There are also more niche PDF libraries, which are not recommended here. By far the most widely used is iText; PDFBox, while completely open source and free, is nowhere near as feature-rich or well documented as iText.
Commercial
- iText 7 – available in both Java and C# versions, fully featured and well documented; it basically meets all your PDF requirements
- Aspose.PDF – provides SDKs for multiple languages, with powerful features and rich documentation
- Spire.PDF – provides SDKs for multiple languages, with powerful features and rich documentation; it covers not only PDF but the whole Office family, and has distributors in China
- Adobe PDF Library SDK – Adobe’s own PDF SDK, naturally the most comprehensive, with support for multiple languages
- Datalogics PDF Java Toolkit – Datalogics is a distributor of the Adobe PDF Library and also provides a PDF SDK of its own
PDF tools
There are many PDF tools on the market; here are some of the mainstream, full-featured GUI tools (reading, editing, converting, signing):
- Adobe Acrobat – the granddaddy of PDF and the most powerful PDF tool, bar none
- Foxit – PDF software from a Chinese brand
- Wondershare PDF (Wan Xing PDF)
- Swift office
- Smallpdf – a well-regarded online PDF tool that offers editing, conversion, compatibility, and a free daily quota