This article shares a practical Python office-automation script that batch-translates English Word documents into Chinese while preserving their formatting, and the result is even better than some paid software! First, let's look at the task itself.

I. Requirement description

The working folder contains five English Word documents, including test1.docx and test2.docx.

Basic requirement: translate the entire contents of these documents into Chinese in one batch and write the results to new files. The expected result is shown below:

Advanced requirement: on top of the basic requirement, preserve the formatting of the original documents. The expected result is shown below:

II. Working through the logic

1. Translation API

The core of this task is translation, so the strategy is to call an online translation API. I recommend the Baidu Translate Open Platform: if you don't need high concurrency, the standard edition is enough, and it can be used free of charge with no character limit!

Before using Baidu's General Translation API, you need to complete the following steps:

  1. Log in to the Baidu Translate Open Platform (api.fanyi.baidu.com) with a Baidu account;

  2. Sign up as a developer and get an APPID;

  3. Complete developer verification (this step can be skipped if you only use the standard edition);

  4. Activate the General Translation API service;

  5. Write your code by following the technical documentation and the demo.

When you finish, your APPID and secret key are shown on your personal page, and they are very important! Below is the General Translation API demo with a simple modification to its output; the code works as-is.

Note that the free edition limits how often you can call the API, so if you call it repeatedly, use the time module to sleep for one second between calls.
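Below is a minimal sketch of such a call. It assumes the signing scheme from Baidu's public documentation: `sign` is the MD5 hex digest of appid + q + salt + secret key, and it is posted with the `from`/`to` language codes to the `/api/trans/vip/translate` endpoint. This is not the author's exact demo. It uses only the standard library (`urllib` instead of `requests`). `APP_ID` and `APP_KEY` are placeholders, and the function names are hypothetical.

```python
import hashlib
import json
import random
import time
import urllib.parse
import urllib.request

# Endpoint of the General Translation API (assumed from Baidu's documentation)
API_URL = "https://fanyi-api.baidu.com/api/trans/vip/translate"
# Placeholders: use the APPID and secret key shown on your developer page
APP_ID = "your-appid"
APP_KEY = "your-secret-key"


def make_sign(appid: str, query: str, salt: int, appkey: str) -> str:
    """sign = MD5(appid + q + salt + key) as a lowercase hex string."""
    raw = appid + query + str(salt) + appkey
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def build_payload(query: str, from_lang: str = "en", to_lang: str = "zh",
                  appid: str = APP_ID, appkey: str = APP_KEY) -> dict:
    """Form fields for one request, with a fresh random salt each time."""
    salt = random.randint(32768, 65536)
    return {"appid": appid, "q": query, "from": from_lang, "to": to_lang,
            "salt": salt, "sign": make_sign(appid, query, salt, appkey)}


def parse_result(result: dict) -> str:
    """Join the translated 'dst' lines, or raise on an error response."""
    if "trans_result" not in result:
        raise RuntimeError(f"Baidu API error {result.get('error_code')}: "
                           f"{result.get('error_msg')}")
    return "\n".join(item["dst"] for item in result["trans_result"])


def translate(query: str, delay: float = 1.0) -> str:
    """POST one query and return the Chinese text (needs network access)."""
    data = urllib.parse.urlencode(build_payload(query)).encode("utf-8")
    request = urllib.request.Request(
        API_URL, data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"})
    with urllib.request.urlopen(request, timeout=10) as response:
        result = json.loads(response.read().decode("utf-8"))
    time.sleep(delay)  # pause between calls to respect the free edition's limit
    return parse_result(result)
```

The signing and parsing steps live in separate functions, so you can check them without network access. Only `translate` actually sends a request.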

2. Preserving the format

The hard part of the advanced requirement is preserving the format. Simply put, whatever page and paragraph formatting the original document has, the corresponding part of the translation should have the same.

Given this one-to-one relationship, we only need to read each formatting attribute from the original document and assign it to the new, translated document. (For now, this only keeps page settings and paragraph settings consistent. Reproducing formatting applied to specific words within a paragraph would require NLP to map those words accurately, which is beyond the scope of this article.)

2.1 Page Style

Page style covers margins, orientation, page height and width, and so on. As you can see, the original document uses narrow margins. But we don't need to know the exact values of the four margins; we simply pass each setting from the old document to the new one in code, as follows:
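Here is a minimal sketch of that variable passing. It assumes python-docx's `Section` properties (`orientation`, `page_width`, `page_height`, the four margins, `header_distance`, `footer_distance`, `gutter`) and a document with a single section. Each value is read from the original section and assigned to the new one. `copy_page_style` is a hypothetical helper name.

```python
# Page-level properties of a python-docx Section (assumed single-section docs)
PAGE_ATTRS = (
    "orientation", "page_width", "page_height",
    "top_margin", "bottom_margin", "left_margin", "right_margin",
    "header_distance", "footer_distance", "gutter",
)


def copy_page_style(src_section, dst_section):
    """Assign every page setting of src_section to dst_section."""
    for name in PAGE_ATTRS:
        setattr(dst_section, name, getattr(src_section, name))
```

For example, call `copy_page_style(Document(file).sections[0], new_doc.sections[0])`, where `new_doc = Document()` is the translated document. Copying width and height also overrides the default template's page size.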

2.2 Paragraph Style

Paragraph styles include alignment, indentation, spacing, and so on. In the original document, paragraphs are indented and headings are centered. All of these settings can be carried over with the same kind of variable passing. If an attribute is not set in the original document, its value is None.
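The same pattern works for paragraphs. The sketch below assumes python-docx's `ParagraphFormat` properties, and `copy_paragraph_format` is a hypothetical helper name. Only direct formatting is copied. A value inherited from a style, such as a heading style that is centered by definition, reads as None here.

```python
PARAGRAPH_ATTRS = (
    "alignment",
    "left_indent", "right_indent", "first_line_indent",
    "space_before", "space_after",
    # line_spacing goes before line_spacing_rule: in python-docx the
    # line_spacing setter also rewrites the rule
    "line_spacing", "line_spacing_rule",
)


def copy_paragraph_format(src_paragraph, dst_paragraph):
    """Copy direct paragraph formatting; None values stay inherited."""
    src_fmt = src_paragraph.paragraph_format
    dst_fmt = dst_paragraph.paragraph_format
    for name in PARAGRAPH_ATTRS:
        setattr(dst_fmt, name, getattr(src_fmt, name))
```

Call it once per paragraph pair, for example `copy_paragraph_format(src_paragraph, new_doc.add_paragraph(translated_text))`.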

2.3 Text Run Style

Run-level styles include font size, bold, italic and color. Here is the strategy:

- For each paragraph of the original document, create an empty list.
- Loop over every run (text block) in that paragraph and append its attribute value to the list.
- Assign the most common value to the corresponding paragraph of the translated document.

In other words, if the whole paragraph, or most of it, is bold, the translated paragraph is bold. Readers interested in NLP can try to reproduce word-level style changes in the English documents more faithfully in the translations.
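This sketch of the majority vote assumes python-docx `Run` objects (`bold`, `italic`, `underline`, `font.size`, `font.color.rgb`). It counts one vote per run rather than weighting by text length, matching the per-text-block description above. `majority` and `copy_run_style` are hypothetical names.

```python
from collections import Counter


def majority(values):
    """Most frequent value; ties go to the first value seen. None if empty."""
    return Counter(values).most_common(1)[0][0] if values else None


def copy_run_style(src_paragraph, dst_run):
    """Give dst_run the most common run style of src_paragraph."""
    runs = src_paragraph.runs
    dst_run.bold = majority([r.bold for r in runs])
    dst_run.italic = majority([r.italic for r in runs])
    dst_run.underline = majority([r.underline for r in runs])
    dst_run.font.size = majority([r.font.size for r in runs])
    dst_run.font.color.rgb = majority([r.font.color.rgb for r in runs])
```

For a non-empty paragraph, `add_paragraph(text)` creates a single run. Apply the style to that run with `copy_run_style(src_paragraph, new_paragraph.runs[0])`.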

The code above does not copy font names, because there is no point in carrying English fonts over to a Chinese document. Setting a Chinese font was covered in a previous article. It is a little involved, so here is the code directly:

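```python
from docx import Document
from docx.oxml.ns import qn

doc = Document()
run = doc.add_paragraph().add_run('中文文本')
# Sets the Western font and creates the <w:rFonts> element
run.font.name = 'Microsoft YaHei'
# East Asian text uses its own w:eastAsia attribute on <w:rFonts>
run._element.rPr.rFonts.set(qn('w:eastAsia'), 'Microsoft YaHei')
```

3. Overall implementation steps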

Every individual piece is now in place. Since this example has several documents to translate, the overall logic is as follows:

  1. Use the glob module as a batch-processing framework to get the absolute path of each file;

  2. Open each Word file with python-docx and parse its paragraphs;

  3. Send each paragraph's text to the Baidu General Translation API, parse the returned JSON (the modified demo above already does this), and write the translation to a new document;

  4. Once every paragraph of a file has been parsed, translated and written, save the new document.

III. Code implementation

First, import the required modules. Besides the libraries the translation demo needs, we use glob to collect files in batches, python-docx to read and write Word files, and time to throttle requests. The os module is needed for naming the output files, as explained below:

```python
import requests
import random
import json
from hashlib import md5
import time
from docx import Document
import glob
import os
```

Part of the original demo is kept as-is, while the code that handles the query parameter moves into the loop later on. The retained part:

The result is as follows:

Once you have a paragraph's text, assign it to the query parameter and run the rest of the API demo code. Then write the result into a new document with add_paragraph:
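Here is a minimal sketch of this loop. It assumes `translate` is some function that wraps the API call and returns Chinese text. This is a hypothetical name, passed in so the loop can be tested without network access. Blank paragraphs are copied as blank rather than sent to the API. This keeps the new document's paragraphs aligned one-to-one with the original's, which the paragraph-by-paragraph style copying relies on.

```python
import time
from typing import Callable


def translate_document(src_doc, dst_doc, translate: Callable[[str], str],
                       delay: float = 1.0):
    """Translate src_doc paragraph by paragraph into dst_doc.

    src_doc and dst_doc are python-docx Document objects (anything with
    .paragraphs and .add_paragraph() works the same way).
    """
    for paragraph in src_doc.paragraphs:
        text = paragraph.text.strip()
        if not text:
            dst_doc.add_paragraph('')  # keep blank lines so paragraphs stay paired
            continue
        dst_doc.add_paragraph(translate(text))
        time.sleep(delay)  # throttle calls for the free edition's rate limit
    return dst_doc
```

A typical call is `translate_document(Document(file), Document(), translate)`. Save the returned document afterwards.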

Finally, each result is saved to a new file named after the original with a _translated suffix. os.path.basename gives the original file name; strip its extension, append the suffix, and combine it with the output folder:

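```python
# os.path.join adds the right separator; splitext strips the ".docx" extension
wordfile_new.save(os.path.join(path, os.path.splitext(os.path.basename(file))[0] + '_translated.docx'))
```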

Once this works for a single file, put the code that reads and creates files into the batch framework:
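Here is a sketch of the batch framework. It assumes all source files sit in one folder (a hypothetical `folder` argument). It skips files ending in `_translated.docx`, so re-running the script does not translate its own output. It also skips the `~$` lock files Word creates while a document is open. `process` is a hypothetical callback standing for the single-file code above; it receives the source path and the output path.

```python
import glob
import os


def translated_path(src_file: str, out_dir: str) -> str:
    """test1.docx -> <out_dir>/test1_translated.docx"""
    stem, _ = os.path.splitext(os.path.basename(src_file))
    return os.path.join(out_dir, stem + '_translated.docx')


def find_source_files(folder: str) -> list:
    """Absolute paths of .docx files to translate, minus outputs and lock files."""
    pattern = os.path.join(os.path.abspath(folder), '*.docx')
    return sorted(
        f for f in glob.glob(pattern)
        if not os.path.basename(f).startswith('~$')
        and not f.endswith('_translated.docx')
    )


def batch_translate(folder: str, process) -> list:
    """Call process(src, dst) for every source file; return the output paths."""
    outputs = []
    for src in find_source_files(folder):
        dst = translated_path(src, folder)
        process(src, dst)
        outputs.append(dst)
    return outputs
```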

With that, the basic requirement is met. To keep the formatting, we only need to add the style-adjustment code described earlier. The final complete code is as follows:

Running the code generates five new translated files.

The translated result is shown below. The English has been translated into Chinese, and most of the formatting is preserved!

All the documents have now been translated. This is machine translation, of course, so key passages still need manual review before use. Overall, though, it's a successful Python office-automation experiment!
