This is the sixth day of my participation in Gwen Challenge
You can convert PDF to Word in WPS or Office, but only the first 5 pages are converted for free. Here's a Python office trick: batch PDF-to-Word conversion, so you can convert as many pages as you want.
Python's pdfminer3k library is used to extract the PDF content, and the python-docx library is used to save that content to Word.
Here's a look at the result:
01 Environment Preparation
Before writing any code, install the Python libraries we will use:
pip install pdfminer3k
Note:
If you install the Word library with pip install docx, importing it fails with this error:
ModuleNotFoundError: No module named 'exceptions'
The correct command is:
pip install python-docx
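Both libraries install under one name and import under another (pdfminer3k imports as pdfminer, python-docx imports as docx), which is easy to get wrong. As a quick check before going further, the sketch below reports which of the two cannot be found. The helper name missing_packages and the REQUIRED table are illustrative and not part of either library. The check only confirms that a module with that name exists, not which package supplied it.

```python
import importlib.util

# Maps the name used with pip to the name used with import.
# The two differ for both libraries in this article.
REQUIRED = {
    "pdfminer3k": "pdfminer",
    "python-docx": "docx",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    missing = []
    for pip_name, module_name in required.items():
        if importlib.util.find_spec(module_name) is None:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    for name in missing_packages():
        print("Missing, try: pip install " + name)
```

An empty result means both modules were found and the import step can proceed.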
02 Extracting the PDF content
1. Import related libraries
from pdfminer.pdfparser import PDFParser, PDFDocument
Explanation:
2. Read the PDF content
Before you start reading, take a look at the PDF:
Chen Ge has created a new two-page PDF file containing his original articles, sorted by module.
The above code reads the PDF file and makes each page available through doc.get_pages().
The loop extracts the content of each page and prints it out.
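The reading and looping steps described above can be sketched as follows. This is a minimal sketch that assumes the pdfminer3k API (PDFDocument imported from pdfminer.pdfparser, as in the import step); the function names extract_page_texts and page_text are hypothetical, and it returns the page text instead of printing it so the result can be reused later.

```python
def page_text(layout_items, text_box_type):
    """Join the text of every layout item that is a text_box_type."""
    return "".join(
        item.get_text()
        for item in layout_items
        if isinstance(item, text_box_type)
    )


def extract_page_texts(pdf_path):
    """Return a list with one string per page of the PDF at pdf_path."""
    # Imported here so this file still loads if pdfminer3k is not installed.
    from pdfminer.pdfparser import PDFParser, PDFDocument
    from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
    from pdfminer.converter import PDFPageAggregator
    from pdfminer.layout import LAParams, LTTextBoxHorizontal

    pages = []
    with open(pdf_path, "rb") as fp:
        # Link the parser and the document object to each other.
        parser = PDFParser(fp)
        doc = PDFDocument()
        parser.set_document(doc)
        doc.set_parser(parser)
        doc.initialize()

        # The aggregator collects the layout of each processed page.
        resource_manager = PDFResourceManager()
        device = PDFPageAggregator(resource_manager, laparams=LAParams())
        interpreter = PDFPageInterpreter(resource_manager, device)

        for page in doc.get_pages():
            interpreter.process_page(page)
            layout = device.get_result()
            # Keep only horizontal text boxes; figures and lines are skipped.
            pages.append(page_text(layout, LTTextBoxHorizontal))
    return pages
```

Calling extract_page_texts on the two-page file should give a list of two strings, which can be printed in a loop to match the output described above.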
03 Save the file to Word
We have successfully extracted the PDF content above; next, we save that content to Word.
While traversing the PDF content, write it into the document piece by piece. Finally, save the document and name it: Python researcher-cheng.docx
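The write-and-save step might look like the sketch below, which uses python-docx's Document, add_paragraph, and save. Everything else is an assumption for illustration: the helper names, the choice of one paragraph per page, the removal of control characters (which python-docx rejects because they are not valid in XML), and the folder loop as one possible way to handle the batch case. The extract argument stands for any function that takes a PDF path and returns a list of page strings.

```python
import re
from pathlib import Path

# Control characters that XML, and therefore a .docx file, cannot contain.
_XML_UNSAFE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_for_docx(text):
    """Drop control characters that python-docx would reject."""
    return _XML_UNSAFE.sub("", text)


def docx_path_for(pdf_path, out_dir):
    """Return out_dir/<pdf name>.docx for the given PDF path."""
    return Path(out_dir) / (Path(pdf_path).stem + ".docx")


def save_pages_to_docx(page_texts, docx_path):
    """Write each page's text as one paragraph, then save the document."""
    from docx import Document  # provided by the python-docx package

    document = Document()
    for text in page_texts:
        document.add_paragraph(clean_for_docx(text))
    document.save(str(docx_path))


def convert_folder(in_dir, out_dir, extract, save=save_pages_to_docx):
    """Convert every *.pdf in in_dir; extract(path) must return page strings."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(Path(in_dir).glob("*.pdf")):
        target = docx_path_for(pdf_path, out_dir)
        save(extract(pdf_path), target)
        written.append(target)
    return written
```

Keeping the extraction function as a parameter means the same loop works with whichever PDF reader is used, and the list it returns shows which Word files were written.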
04 Summary
To make learning easier, Chen Ge has uploaded the complete source code for this article. To get it, reply in the official account backend with: PDF conversion
In this article, Chen Ge mainly explains how to use Python to batch convert PDF to Word. If anything is unclear, leave a message below and we can discuss it together.