Batch image recognition and translation in Python – how I used Python to translate cosmetics labels for my girlfriend

Recently I ran into a survival problem: my girlfriend asked me to translate the English labels on her cosmetics. The reasoning goes: "You programmers work in English every day, so your English must be great; help me translate these makeup ingredients," or "Come look at this mask and tell me how many minutes it says to leave it on"… Apparently spending a fortune on cosmetics is not enough; you also have to study the English blurb on every product.

Quietly putting away my CET-4 certificate and its 429 points, I opened the IDE… My plan: build a demo that translates images in batches and can handle every cosmetic product in the house. Smart as I am, I was not about to start by training a model. When I opened Youdao Wisdom Cloud's friendly AI service page, I found an image translation service; after trying it out, it worked really well, so I decided to use it.

Results

Here is the demo in action:

The recognition process is as follows:

Let's look at the results one by one. MAKE UP FOR EVER was not translated into its Chinese brand name, ha ha, but the key phrases like "long-lasting moisturizing" and "setting spray" all came through. Great!

This one is more obscure, yet both the Korean and the English were translated.

The sakura water also performed well.

And here is a recognition run on a more box-like product shot. The result is still good, unaffected by the tilted text in the picture:

Preparing to call the API – generating the application ID and key

Per Youdao Wisdom Cloud's interface conventions, you first need to generate an application ID and key on your Youdao Wisdom Cloud personal page; these serve as your call credentials and billing reference.

The specific steps: on the Youdao Wisdom Cloud personal page, create an instance, create an application, and bind the application to the instance, then obtain the application's ID and key for invoking the interface. For details on registering an account and creating an application, see the development process described in the earlier article on batch document translation.

Development process

1. API interface introduction

First, the core of the project: the call interface of the Youdao Wisdom Cloud image translation service.

API HTTPS address: openapi.youdao.com/ocrtransapi

Interface call mode: POST

Request format: form

Response format: JSON

Interface call parameters

Calling the API requires sending the following fields to access the service.

| Field | Type | Meaning | Required | Notes |
| --- | --- | --- | --- | --- |
| type | text | File upload type | true | Currently only Base64 is supported; set this field to 1 |
| from | text | Source language | true | See the supported-language list (can be set to auto) |
| to | text | Target language | true | See the supported-language list (can be set to auto) |
| appKey | text | Application ID | true | Viewable in the application management console |
| salt | text | UUID | true | e.g. 1995882C5064805BC30A39829B779D7B |
| sign | text | Signature | true | MD5(appKey + q + salt + appSecret), uppercase |
| ext | text | Audio format of the translated result | false | Only mp3 is supported |
| q | text | The image to recognize | true | Required when type is 1; the Base64 encoding of the image |
| docType | text | Server response type | false | Currently only json is supported |
| render | text | Whether the server returns a rendered image (0: no, 1: yes) | false | Default 0 |
| nullIsError | text | Whether to return an error when OCR detects no text ("false": no, "true": yes) | false | Default "false"; note this is a string |

The signature is generated as follows: 1. Concatenate the application ID appKey, the Base64 encoding q of the image, the UUID salt, and the application key, in the order appKey + q + salt + appSecret, to obtain the string str. 2. Take the MD5 of str to get the 32-character uppercase sign (for a Java MD5 example, see the Java sample).
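The two steps above can be sketched in Python, with hashlib's md5 covering step 2 (the credentials below are placeholders, not real keys):

```python
import hashlib
import uuid

def make_sign(app_key, q, salt, app_secret):
    """Step 1: concatenate appKey + q + salt + appSecret; step 2: MD5, uppercase hex."""
    raw = app_key + q + salt + app_secret
    return hashlib.md5(raw.encode('utf-8')).hexdigest().upper()

# Placeholder credentials, for illustration only
salt = str(uuid.uuid1())
sign = make_sign('your-app-key', 'base64-of-image', salt, 'your-app-secret')
print(len(sign))  # MD5 hex digests are always 32 characters
```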

Output

The returned result is in JSON format, described as follows:

| Field | Description |
| --- | --- |
| orientation | Orientation of the image |
| lanFrom | Language recognized by OCR in the image |
| textAngle | Tilt angle of the image |
| errorCode | Error code |
| lanTo | Target language |
| resRegions | Per-region translation results |
| - boundingBox | Region range, four values: x of the top-left corner, y of the top-left corner, region width, region height, e.g. 134,0,1066,249 |
| - linesCount | Number of lines (for front-end layout) |
| - lineheight | Line height |
| - context | Text of the region |
| - linespace | Line spacing |
| - tranContent | Translation result |
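To make these fields concrete, here is a small parsing sketch over a hypothetical response shaped like the table above (the sample values are invented, not real API output):

```python
# Hypothetical response shaped like the field table (invented sample data)
sample = {
    "errorCode": "0",
    "lanFrom": "en",
    "lanTo": "zh-CHS",
    "orientation": "Up",
    "textAngle": "0",
    "resRegions": [
        {"boundingBox": "134,0,1066,249", "linesCount": 2,
         "context": "LONG-LASTING MIST", "tranContent": "translated text 1"},
        {"boundingBox": "10,300,900,120", "linesCount": 1,
         "context": "FIXING SPRAY", "tranContent": "translated text 2"},
    ],
}

def extract_translations(result):
    """Join the tranContent of every recognized region, one per line."""
    if result.get("errorCode") != "0":
        raise RuntimeError("API error " + str(result.get("errorCode")))
    return "\n".join(r["tranContent"] for r in result.get("resRegions", []))

print(extract_translations(sample))
```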

2. Detailed development

This demo is developed in Python 3 and consists of maindow.py, transclass.py, and pictranslate.py. maindow.py implements the interface, using Python's tkinter library to select image files and choose where to store the results. transclass.py implements the logic for reading and processing images, and finally calls the image translation API through methods in pictranslate.py.

1. Interface

Main elements:


root=tk.Tk()
root.title("netease youdao translation test")
frm = tk.Frame(root)
frm.grid(padx='50', pady='50')
btn_get_file = tk.Button(frm, text='Select picture to be translated', command=get_files)
btn_get_file.grid(row=0, column=0, ipadx='3', ipady='3', padx='10', pady='20')
text1 = tk.Text(frm, width='40', height='10')
text1.grid(row=0, column=1)
btn_get_result_path=tk.Button(frm,text='Select translation result path',command=set_result_path)
btn_get_result_path.grid(row=1,column=0)
text2=tk.Text(frm,width='40', height='2')
text2.grid(row=1,column=1)

btn_sure=tk.Button(frm,text="Translation",command=translate_files)
btn_sure.grid(row=2,column=1)


root.mainloop()

The method for obtaining the image files to translate (only .jpg files are supported here):

def get_files():
    files = filedialog.askopenfilenames(filetypes=[('image files', '.jpg')])
    translate.file_paths = files
    if files:
        for file in files:
            text1.insert(tk.END, file + '\n')
            text1.update()
    else:
        print("You didn't select any files")

Obtain the result storage path:

def set_result_path():
    result_path = filedialog.askdirectory()
    translate.result_root_path = result_path
    text2.insert(tk.END, result_path)

The translate_files() method in this file ultimately calls translate.translate_files():

def translate_files():
    if translate.file_paths:
        translate.translate_files()
        tk.messagebox.showinfo("Tip", "Done")
    else:
        tk.messagebox.showinfo("Tip", "No file")
2. Batch image processing

transclass.py implements the image reading and processing logic. The Translate class is defined as follows:

class Translate:
    def __init__(self, name, file_paths, result_root_path, trans_type):
        self.name = name
        self.file_paths = file_paths  # Paths of the files to be translated
        self.result_root_path = result_root_path  # Where the results are stored
        self.trans_type = trans_type

    def translate_files(self):
        for file_path in self.file_paths:  # Process the batch of images one by one
            file_name = os.path.basename(file_path)
            print('===========' + file_path + '===========')
            trans_result = self.translate_use_netease(file_path)  # Call the interface for a single image
            result_file = self.result_root_path + '/result_' + file_name.split('.')[0] + '.txt'
            with open(result_file, 'w') as f:  # Write the returned result
                f.write(trans_result)

    def translate_use_netease(self, file_content):  # Call the Youdao interface and return the result
        result = connect(file_content)
        return result
3. Youdao API calls

pictranslate.py encapsulates the methods that call the Youdao Wisdom Cloud API. The core is the connect() method, which assembles the required parameters per the interface spec, issues the request, and returns the result.

def connect(file_content, fromLan='auto', toLan='auto'):
    f = open(file_content, 'rb')  # Open the image file in binary mode
    q = base64.b64encode(f.read()).decode('utf-8')  # Read the file and Base64-encode it
    f.close()
    data = {}
    data['from'] = fromLan  # Source language
    data['to'] = toLan      # Target language
    data['type'] = '1'
    data['q'] = q
    salt = str(uuid.uuid1())
    signStr = APP_KEY + q + salt + APP_SECRET
    sign = encrypt(signStr)
    data['appKey'] = APP_KEY
    data['salt'] = salt
    data['sign'] = sign

    response = do_request(data)
    result = json.loads(str(response.content, encoding="utf-8"))
    print(result)

    translateResults = result['resRegions']
    print(translateResults)
    pictransresult = ""
    for i in translateResults:
        pictransresult = pictransresult + i['tranContent'] + "\n"
    return pictransresult
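connect() relies on two helpers, encrypt() and do_request(), that are not shown above. A plausible sketch, assuming the third-party requests library and a form-encoded POST to the endpoint given earlier (the Content-Type header is my assumption):

```python
import hashlib

YOUDAO_URL = 'https://openapi.youdao.com/ocrtransapi'

def encrypt(sign_str):
    """MD5 the concatenated signature string and return an uppercase hex digest."""
    m = hashlib.md5()
    m.update(sign_str.encode('utf-8'))
    return m.hexdigest().upper()

def do_request(data):
    """POST the form-encoded fields to the image-translation endpoint."""
    import requests  # third-party: pip install requests
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    return requests.post(YOUDAO_URL, data=data, headers=headers)
```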

Conclusion

This was a pleasant development experience, and one of my few survival successes :P. With the power of an open platform, image recognition and natural language processing become surprisingly easy: as long as you can issue the request correctly, you get a good translation back. The time saved goes to showing off to my girlfriend. Feels great!

Project address: github.com/LemonQH/Bat…