This is day 3 of my participation in the November Gwen Challenge. For event details, see: The Last Gwen Challenge of 2021
0x1. Introduction
After the previous post in this Python series, a simple crawl of a certain "planet", a reader sent me a private message in the background:
"The results of the crawl are saved as Markdown, which is not easy to read on a phone."
Me:
Outrageous... ahem. I might need this myself later, so let's tinker with it. During this afternoon's slacking-off time I searched a few keywords and found two conventional approaches. Let's try them, starting with the library that looks simpler ↓
pandoc
0x2. First try of the pandoc library
It supports a huge, huge, huge range of format conversions, as this tangle shows:
No need to study which file formats are supported; almost anything you can think of is. What we want here is to convert Markdown to PDF.
For installation details, see INSTALL.md. On Windows you can either download the zip package directly or install it with choco.
Then configure the environment variables so that pandoc can be run from anywhere:
Unzip the package → open the folder → select pandoc.exe → hold Shift and right-click → Copy as path → This PC → right-click → Properties → Advanced system settings → Environment Variables → under System variables (S), find Path → Edit → New → paste the path you just copied (the folder part, without pandoc.exe):
After configuring, open CMD and type: pandoc -v. If version information is printed, the configuration worked.
Alternatively, for example if the PATH change does not take effect, you can drop the unzipped files straight into Python's Scripts directory. Run the where command to find the Python installation directory.
Go to that path and drop all the files there.
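The same check can be scripted before starting a long batch conversion. This is a minimal sketch using only the standard library; find_tool is a hypothetical helper name, not something from the original scripts.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool("pandoc")
    if path is None:
        print("pandoc is not on PATH; recheck the environment variable")
    else:
        print("pandoc found at:", path)
```

If it prints the "not on PATH" message in a terminal that was already open before the variable was edited, opening a new terminal is worth trying first.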
Once it is configured, usage is simple. It is just a command-line call:
pandoc <file to convert> -o <output file>
Convert TXT to PDF:
A LaTeX engine has to be specified, with the following options:
That is a bit of a hassle: the LaTeX distribution has to be downloaded separately, and some images are more than 4 GB. So let's convert straight to Word document format (docx) instead:
pandoc 123.txt -o test.docx
Open it to see the result:
OK. The rest is routine: walk the folder, build the command string, and run it with subprocess, as in the following code:
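import os
import subprocess

# output_root_dir, doc_save_dir and cp_file_utils come from the crawler project and are not shown here.
def md_to_doc(file_path):
    # Quote both paths so that spaces in file names do not break the command
    cmd = 'pandoc "{}" -o "{}"'
    sep_split = file_path.split(os.path.sep)
    # Switch to the image directory
    os.chdir(output_root_dir)
    # Check whether the output folder exists; create it if it does not
    cp_file_utils.is_dir_existed(os.path.join(doc_save_dir, sep_split[-2]))
    # splitext drops the extension whatever its length
    doc_file_path = os.path.join(doc_save_dir, sep_split[-2], os.path.splitext(sep_split[-1])[0] + '.docx')
    subprocess.call(cmd.format(file_path, doc_file_path), shell=True)
    print("Generate file:", doc_file_path)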
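The folder traversal mentioned above is not shown in the post, so here is one way it might look. It is only a sketch: build_pandoc_commands and convert_all are hypothetical names, the output tree simply mirrors the source tree, and arguments are passed as a list so no shell quoting is needed.

```python
import os
import subprocess


def build_pandoc_commands(md_root, docx_root):
    """Yield (argument_list, output_path) for every .md file under md_root."""
    for folder, _dirs, files in os.walk(md_root):
        for name in sorted(files):
            if not name.lower().endswith(".md"):
                continue
            src = os.path.join(folder, name)
            # Mirror the source folder layout under docx_root.
            rel_folder = os.path.relpath(folder, md_root)
            out_dir = os.path.normpath(os.path.join(docx_root, rel_folder))
            out = os.path.join(out_dir, os.path.splitext(name)[0] + ".docx")
            yield ["pandoc", src, "-o", out], out


def convert_all(md_root, docx_root):
    """Run pandoc once per Markdown file (assumes pandoc is on PATH)."""
    for args, out in build_pandoc_commands(md_root, docx_root):
        os.makedirs(os.path.dirname(out), exist_ok=True)
        subprocess.call(args)
        print("Generate file:", out)
```

Separating command construction from execution makes it easy to print the commands first and only run them once they look right.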
With the files generated, the next step is a script that merges all these Word documents. It needs the following libraries (install them with pip):
pip install python-docx
pip install docxcompose
Now straight to the code:
from docx import Document
from docxcompose.composer import Composer

def compose_docx(docx_list):
    # The first file becomes the master document
    master = Document(docx_list[0])
    master.add_page_break()  # Force a new page
    composer = Composer(master)
    # Append the remaining files one by one
    for docx in docx_list[1:]:
        print("Current processing file:", docx)
        temp = Document(docx)
        temp.add_page_break()
        composer.append(temp)
    composer.save("result.docx")
    print("File merge completed...")
Run it and wait for it to finish. The files are merged in file-name order by default, and since we named the files by timestamp, there is no need to worry about the order. Take a look at the merged document:
841 MB and 2,927 pages; WPS froze the moment it opened the file:
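The ordering assumption can be made explicit instead of relying on whatever order the directory listing returns. A minimal sketch, assuming all the .docx files sit in one flat folder and that the timestamps in their names have the same number of digits (otherwise plain string sorting is not chronological); list_docx_sorted is a hypothetical helper.

```python
import os


def list_docx_sorted(folder):
    """Return the .docx paths in folder, sorted by file name."""
    names = [n for n in os.listdir(folder) if n.lower().endswith(".docx")]
    return [os.path.join(folder, n) for n in sorted(names)]
```

The returned list can be passed directly as docx_list to a merge function like the one above.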
0x3. A slightly more troublesome approach
Now the second approach: render the Markdown to HTML and then to PDF, using these two libraries:
pip install markdown
pip install pdfkit
You also need wkhtmltopdf: download the zip package and add it to the environment variables the same way:
Run wkhtmltopdf --version to check whether the configuration has taken effect ~
Then write the following test demo:
import pdfkit
from markdown import markdown

def md_to_pdf(file_path):
    html = markdown(cp_file_utils.read_file_text_content(file_path), output_format='html')
    pdfkit.from_string(html, "out.pdf", options={'encoding': 'utf-8'})
Pass in the path of an md file and run it. If this error appears:
OSError: No wkhtmltopdf executable found
then the environment variable above has not taken effect. Open a new terminal window, or specify the path in code:
import pdfkit

config = pdfkit.configuration(wkhtmltopdf=r"D:\xxx\bin\wkhtmltopdf.exe")
# html is an HTML string here, so use from_string (from_url expects a URL)
pdfkit.from_string(html, filename, configuration=config)
Another error was then reported:
It is raised when an external resource is referenced but cannot be found. Catch it with try-except:
def md_to_pdf(file_path):
    html = markdown(cp_file_utils.read_file_text_content(file_path), output_format='html')
    try:
        pdfkit.from_string(html, "out.pdf", options={'encoding': 'utf-8'})
    except IOError as e:
        # Ignore exceptions directly
        pass
    finally:
        print("File generated...")
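Swallowing every IOError also hides the case where no PDF was written at all. As a possible refinement, the sketch below tolerates the error only when a non-empty output file actually exists. render_checked is a hypothetical helper, and render is any callable that takes the output path, so the snippet itself does not depend on pdfkit.

```python
import os


def render_checked(render, out_path):
    """Call render(out_path); tolerate OSError only if a non-empty file was written."""
    # Remove a stale file from an earlier run so it cannot mask a failure.
    if os.path.exists(out_path):
        os.remove(out_path)
    try:
        render(out_path)
    except OSError as e:  # IOError is an alias of OSError in Python 3
        if not (os.path.isfile(out_path) and os.path.getsize(out_path) > 0):
            raise
        print("Rendered with warnings:", e)
    return out_path


# Possible usage with pdfkit (assumes html holds the rendered HTML string):
# render_checked(
#     lambda p: pdfkit.from_string(html, p, options={'encoding': 'utf-8'}),
#     "out.pdf",
# )
```

This keeps the "ignore missing resources" behavior while still failing loudly when wkhtmltopdf produced nothing.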
After running, open the generated PDF and see the result:
This works, but the default rendering does not support annotations, tables, LaTeX, code blocks, flowcharts, sequence diagrams, or Gantt charts. Those need extensions, which can come from the following places:
- Markdown module extension
The following is an example:
# Enable the tables extension
html = markdown(text, output_format='html', extensions=['tables'])
- Third-party extensions
The following is an example:
# Install the math package
pip install python-markdown-math
# Enable the math extension
text = markdown(text, output_format='html', extensions=['mdx_math'])
- Export the HTML with a third-party Markdown rendering tool (such as Cmd Markdown from Zuoyebuluo) and then convert it to PDF
The following is an example:
pdfkit.from_file('test.html', 'test.pdf', options={'encoding': 'utf-8'})
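The markdown() function returns only an HTML fragment, with no head element or charset declaration. If you would rather produce the standalone HTML file yourself, the fragment can be wrapped in a full page with its own CSS before handing it to pdfkit.from_file. A minimal sketch; wrap_html and its parameters are hypothetical names.

```python
from html import escape


def wrap_html(body, title="Document", css=""):
    """Wrap an HTML fragment in a standalone UTF-8 page with optional CSS."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        f"<style>\n{css}\n</style>\n"
        "</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )


if __name__ == "__main__":
    page = wrap_html("<h1>Hello</h1>", title="Demo", css="body { font-family: sans-serif; }")
    # Write with an explicit encoding so it matches the declared charset.
    with open("test.html", "w", encoding="utf-8") as f:
        f.write(page)
```

The css argument is the natural place to start if the default look of the PDF is not good enough.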
For more details, see: Python converts MarkDown to PDF (perfect HTML rendering, LaTeX, tables, etc.)
0x4. Summary
It all looks simple, but if you have deep style-customization requirements you will have to tinker with it over and over. Fortunately I don't; being readable is enough for me. Interested readers can explore further on their own ~
That is +1 to my bag of tricks for saving crawled data, and the reading experience is better too, 23333. That is all for this article. If you have any questions, feel free to point them out in the comments. Thanks ~