Batch image recognition and translation in Python – I used Python to translate cosmetics labels for my girlfriend
Recently, I ran into a survival problem: my girlfriend asked me to translate the English labels on her cosmetics. Her reasoning went: "Programmers work with English every day, so your English must be great — help me translate these makeup ingredients," and "Come look at this mask, how many minutes does it say to leave it on?" Apparently spending a fortune on cosmetics isn't enough; you also have to study all the English product descriptions.
Silently putting away my CET-4 certificate (scored a whole 429), I opened the IDE. I planned to build a demo that could translate images in batches and handle every cosmetic product in the house. Smart as I am, I wasn't going to start by training a model. When I opened Youdao Wisdom Cloud's friendly AI API page, I found an image translation service; after trying it out, it worked really well, so I decided to use it.
Results
The Demo is here. Let’s see what it looks like:
The identification process is as follows:
Let's look at the results one by one. Although Make Up For Ever wasn't translated into its Chinese brand name "Mei Ke Fei", haha, the key phrases like "long-lasting moisturizing" and "setting spray" came through. Great!
This one is more obscure — both the Korean and the English were translated.
The sakura water also performed well.
I also tried recognition on an unopened box; the result is good and isn't affected by tilted text in the picture:
Preparation: generating the application ID and key needed to call the API
According to Youdao Wisdom Cloud's API conventions, you first need to generate the application ID and key on your Youdao Wisdom Cloud personal page; they identify your calls and serve as the basis for billing.
The specific steps: on the Youdao Wisdom Cloud personal page, create an instance, create an application, and bind the application to the instance, then obtain the application's ID and key used to call the interface. For details on registering an account and creating an application, see the development process described in my earlier article on batch document translation.
Development process
1. API introduction
First, the core part of the project: the API of Youdao Wisdom Cloud's image translation service.
API HTTPS address: openapi.youdao.com/ocrtransapi
Interface call mode: POST
Request format: form
Response format: JSON
Request parameters
To call the API, send the following fields to the interface:
Field | Type | Meaning | Required | Notes |
---|---|---|---|---|
type | text | Upload type of the file | true | Currently only Base64 is supported; set this field to 1 |
from | text | Source language | true | See the supported languages below (can be set to auto) |
to | text | Target language | true | See the supported languages below (can be set to auto) |
appKey | text | Application ID | true | Viewable in Application Management |
salt | text | UUID | true | A random UUID, e.g. 1995882C5064805BC30A39829B779D7B |
sign | text | Signature | true | md5(application ID + q + salt + application key) |
ext | text | Audio format of the translated result; only mp3 is supported | false | mp3 |
q | text | The image to recognize | true | Required when type is 1; the Base64 encoding of the image |
docType | text | Server response type; currently only JSON is supported | false | json |
render | text | Whether the server returns the rendered image. 0: no; 1: yes. Default 0 | false | 0 |
nullIsError | text | Whether to return an error if OCR detects no text. false: no; true: yes. Default false | false | Note that this is a string |
The signature is generated as follows: 1. Concatenate the application ID appKey, the Base64-encoded image q, the UUID salt, and the application key, in the order appKey + q + salt + application key, to obtain the string str. 2. Take the MD5 of str to get the 32-character uppercase sign (for a Java MD5 example, see the Java sample in the official docs).
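The two steps above can be sketched in Python like this (the credentials below are placeholders, not real keys):

```python
import hashlib
import uuid

def make_sign(app_key: str, q: str, salt: str, app_secret: str) -> str:
    # Step 1: concatenate appKey + q + salt + appSecret into one string
    sign_str = app_key + q + salt + app_secret
    # Step 2: take the 32-character uppercase MD5 of that string
    return hashlib.md5(sign_str.encode('utf-8')).hexdigest().upper()

salt = str(uuid.uuid1())  # a fresh UUID for each request
sign = make_sign('your-app-key', 'base64-of-image', salt, 'your-app-secret')
print(len(sign))  # 32
```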
The output
The result is returned in JSON format, with the following fields:
Field | Meaning |
---|---|
orientation | Orientation of the image |
lanFrom | Language recognized by OCR in the image |
textAngle | Tilt angle of the image |
errorCode | Error code |
lanTo | Target language |
resRegions | The translated regions of the image |
-boundingBox | Region bounds, four values: the x and y of the top-left corner, the region width, and the region height, e.g. 134,0,1066,249 |
-linesCount | Number of lines (for front-end layout) |
-lineheight | Line height |
-context | Original text of the region |
-linespace | Line spacing |
-tranContent | Translation result |
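For illustration, here is how the translated text can be pulled out of the fields above. The JSON below is a fabricated minimal sample matching the described shape, not real API output:

```python
import json

# Fabricated minimal sample of the response shape described above
sample = json.loads("""
{
  "errorCode": "0",
  "lanFrom": "en",
  "lanTo": "zh-CHS",
  "resRegions": [
    {"boundingBox": "134,0,1066,249",
     "linesCount": 1,
     "context": "LONG LASTING",
     "tranContent": "持久"}
  ]
}
""")

# Collect only the translated text of each region
translations = [region["tranContent"] for region in sample.get("resRegions", [])]
print("\n".join(translations))
```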
2. Detailed development
This demo is developed in Python 3 and consists of maindow.py, transclass.py, and pictranslate.py. maindow.py implements the UI, using Python's tkinter library to select the image files and the directory for storing results. transclass.py implements the logic of reading and processing the images, and finally calls the image translation API through methods in pictranslate.py.
1. Interface
Main elements:
root = tk.Tk()
root.title("netease youdao translation test")
frm = tk.Frame(root)
frm.grid(padx='50', pady='50')
btn_get_file = tk.Button(frm, text='Select picture to be translated', command=get_files)
btn_get_file.grid(row=0, column=0, ipadx='3', ipady='3', padx='10', pady='20')
text1 = tk.Text(frm, width='40', height='10')
text1.grid(row=0, column=1)
btn_get_result_path = tk.Button(frm, text='Select translation result path', command=set_result_path)
btn_get_result_path.grid(row=1, column=0)
text2 = tk.Text(frm, width='40', height='2')
text2.grid(row=1, column=1)
btn_sure = tk.Button(frm, text="Translation", command=translate_files)
btn_sure.grid(row=2, column=1)
root.mainloop()
Method for selecting the image files to translate (only .jpg files are supported here):
def get_files():
    files = filedialog.askopenfilenames(filetypes=[('image files', '.jpg')])
    translate.file_paths = files
    if files:
        for file in files:
            text1.insert(tk.END, file + '\n')
        text1.update()
    else:
        print("You didn't select any files")
Obtain the result storage path:
def set_result_path():
    result_path = filedialog.askdirectory()
    translate.result_root_path = result_path
    text2.insert(tk.END, result_path)
The translate_files() method in this file ultimately calls translate.translate_files():
def translate_files():
    if translate.file_paths:
        translate.translate_files()
        tk.messagebox.showinfo("Tip", "Done")
    else:
        tk.messagebox.showinfo("Tip", "No file")
2. Batch image processing
transclass.py implements the image reading and processing logic. The Translate class is defined as follows:
class Translate():
    def __init__(self, name, file_paths, result_root_path, trans_type):
        self.name = name
        self.file_paths = file_paths  # paths of the files to be translated
        self.result_root_path = result_root_path  # where the results are stored
        self.trans_type = trans_type

    def translate_files(self):
        for file_path in self.file_paths:  # process the batch of images one by one
            file_name = os.path.basename(file_path)
            print('===========' + file_path + '===========')
            trans_result = self.translate_use_netease(file_path)  # call the interface for a single image
            result_file = self.result_root_path + '/result_' + file_name.split('.')[0] + '.txt'
            with open(result_file, 'w') as f:  # write the returned result to a text file
                f.write(trans_result)

    def translate_use_netease(self, file_content):  # call the Youdao interface and return the result
        result = connect(file_content)
        return result
3. Youdao API calls
pictranslate.py encapsulates the methods for calling the Youdao Wisdom Cloud API. The core one is connect(), which assembles the required parameters according to the interface spec, sends the request, and returns the result.
def connect(file_content, fromLan='auto', toLan='auto'):
    f = open(file_content, 'rb')  # open the image file in binary mode
    q = base64.b64encode(f.read()).decode('utf-8')  # read the file and Base64-encode it
    f.close()

    data = {}
    data['from'] = fromLan  # source language
    data['to'] = toLan  # target language
    data['type'] = '1'
    data['q'] = q
    salt = str(uuid.uuid1())
    signStr = APP_KEY + q + salt + APP_SECRET
    sign = encrypt(signStr)
    data['appKey'] = APP_KEY
    data['salt'] = salt
    data['sign'] = sign

    response = do_request(data)
    result = json.loads(str(response.content, encoding="utf-8"))
    print(result)
    translateResults = result['resRegions']
    print(translateResults)
    pictransresult = ""
    for i in translateResults:
        pictransresult = pictransresult + i['tranContent'] + "\n"
    return pictransresult
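The encrypt() and do_request() helpers used by connect() are not shown in the post. A minimal sketch of what they could look like, assuming the third-party requests library (the endpoint URL is the one from the API section above):

```python
import hashlib

import requests  # third-party: pip install requests

YOUDAO_URL = 'https://openapi.youdao.com/ocrtransapi'

def encrypt(sign_str):
    # 32-character uppercase MD5, as required for the sign field
    return hashlib.md5(sign_str.encode('utf-8')).hexdigest().upper()

def do_request(data):
    # POST the form-encoded fields to the image translation endpoint
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    return requests.post(YOUDAO_URL, data=data, headers=headers)
```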
Conclusion
It was a pleasant development experience, and one of my few survival-skill successes :P. With the power of an open platform, image recognition and natural language processing become this easy: as long as you can form the request correctly, you get good translation results. The time saved goes to showing off to my girlfriend. This feels great!
Project address: github.com/LemonQH/Bat…