Wow, Python can do real-time translation

Welcome, and thanks for following along. This post continues the promise I made earlier: to finish the series of demo articles below, roughly one a month.

| No. | Planned date | Demo name, features & article content | Finished? | Article links |
| --- | --- | --- | --- | --- |
| 1 | September 3 | Text translation demo: single-text translation and batch translation | Completed | CSDN / WeChat Official Account |
| 2 | September 11 | OCR demo: batch upload and recognition; in the demo you can select different OCR types (handwriting / print / ID card / form / whole question / business card), then call the platform's capabilities, with the specific implementation steps | Completed | CSDN |
| 3 | October 27 | Speech recognition demo: upload a video, extract its audio, and run short-audio speech recognition | | CSDN |
| 4 | September 17 | Intelligent voice evaluation demo | | |
| 5 | September 24 | Essay correction demo | | |
| 6 | September 30 | Speech synthesis demo | | |
| 7 | October 15 | Single-question photo search demo | | |
| 8 | October 20 | Picture translation demo | | |

Recently, amid much anticipation, a certain fruit-branded phone maker held a press conference that did not announce the long-awaited new phone, but did unveil a few other products, including the latest version 14 of the fruit system. A few days later, people who had updated discovered a very interesting feature in the new system: translation, like this:

Strange translation knowledge has increased!

Compared with ordinary translation tools, a simultaneous-interpretation tool has more practical value. Just imagine communicating without barriers with friends whose languages you don't speak: that would be a beautiful thing, so why not implement such a tool as a backup? For a simultaneous translation tool, the logic is to recognize the speech first and then translate it, and recognition accuracy is the key factor in whether the translation succeeds. To keep the difficulty down, I decided to split the tool's development into two sessions. Let's tackle the speech recognition part first.

This demo continues to call the Youdao Zhiyun API, this time to implement real-time speech recognition.

Take a look at the interface and results:

The interface supports a variety of languages, but here I offer just four common ones:

I tested Chinese, Korean, and English separately, and the results look good.

Here, the recognition results are produced in real time, word by word, as the audio plays. Because recognition is fairly fast, there is hardly any perceptible delay.

1. Preparation

First of all, you need to create an instance and an application on your Youdao Zhiyun personal page, bind the application and the instance together, and obtain the application ID and key used to call the interface. For the details of individual registration and application creation, see the development process described in my earlier article on batch file translation.
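In the snippets that follow, the application ID and key are assumed to live in two module-level variables matching the names used later in recognise(); the values here are placeholders, of course:

app_key = 'your-app-id'        # application ID from the Youdao Zhiyun console
app_secret = 'your-app-key'    # application key bound to that ID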

The following describes the specific development process.

The first step is to analyze the interface's inputs and outputs against the real-time speech recognition documentation. The interface is designed to recognize a continuous audio stream in real time, convert it into text, and return the corresponding text stream, so communication runs over WebSocket and a call is divided into two stages: authentication and real-time communication.

During the authentication phase, you need to send the following parameters:

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| appKey | String | Yes | ID of the application you created | ID |
| salt | String | Yes | UUID | UUID |
| curtime | String | Yes | Timestamp (in seconds) | TimeStamp |
| sign | String | Yes | Encrypted digital signature | sha256 |
| signType | String | Yes | Digital signature type | v4 |
| langType | String | Yes | Language selection; see the list of supported languages | zh-CHS |
| format | String | Yes | Audio format; WAV is supported | wav |
| channel | String | Yes | Number of channels; 1 (mono) is supported | 1 |
| version | String | Yes | API version | v1 |
| rate | String | Yes | Sampling rate | 16000 |

The signature sign is generated as follows: with signType=v4, sign = sha256(application ID + salt + curtime + application key).
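As a minimal sketch (assuming the app_key and app_secret variables from the preparation step), the signature can be computed with Python's hashlib; this encrypt() helper is the same one the recognise() method calls later:

import hashlib
import time
import uuid

def encrypt(sign_str):
    # Hex-encoded sha256 digest of the concatenated string.
    return hashlib.sha256(sign_str.encode('utf-8')).hexdigest()

salt = str(uuid.uuid1())           # a fresh UUID
curtime = str(int(time.time()))    # current timestamp in seconds
sign = encrypt(app_key + salt + curtime + app_secret)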

After authentication, the call enters the real-time communication stage: send the audio stream, receive the recognition results, and finally end the communication. One thing to note here: the audio you send should ideally be a clear WAV file with 16-bit depth, mono, at a 16 kHz sampling rate. When I started development, my recording device had problems and the audio quality was poor, so the interface kept returning error code 304 (facepalm).
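A quick sanity check of a recording with Python's built-in wave module can save that kind of debugging; this helper is my own addition, not part of the API:

import wave

def is_valid_for_asr(path):
    # The interface wants mono, 16-bit (2-byte) samples at a 16 kHz rate.
    with wave.open(path, 'rb') as wf:
        return (wf.getnchannels() == 1 and
                wf.getsampwidth() == 2 and
                wf.getframerate() == 16000)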

2. Development

This demo is developed with Python 3 and consists of maindow.py, audioandprocess.py, and recobynetease.py. The interface part uses Python's built-in Tkinter library and offers language selection, recording start, recording stop, and recognition. audioandprocess.py implements the recording and audio-processing logic, and finally calls the real-time speech recognition API through the methods in recobynetease.py.

1. Interface

Main elements:

root = tk.Tk()
root.title("netease youdao translation test")
frm = tk.Frame(root)
frm.grid(padx='80', pady='80')

label = tk.Label(frm, text='Select language type:')
label.grid(row=0, column=0)
combox = ttk.Combobox(frm, textvariable=tk.StringVar(), width=38)
combox["value"] = lang_type_dict
combox.current(0)
combox.bind("<<ComboboxSelected>>", get_lang_type)
combox.grid(row=0, column=1)

btn_start_rec = tk.Button(frm, text='Start recording', command=start_rec)
btn_start_rec.grid(row=2, column=0)
lb_Status = tk.Label(frm, text='Ready', anchor='w', fg='green')
lb_Status.grid(row=2, column=1)

btn_sure = tk.Button(frm, text='End and recognize', command=get_result)
btn_sure.grid(row=3, column=0)
root.mainloop()

After a language type is selected, recording begins; when recording ends, the interface calls the get_result() method to run recognition.

def get_result():
    lb_Status['text']='Ready'
    sr_result=au_model.stop_and_recognise()

2. Development of audio recording

The audio recording part uses the PyAudio library (installed via pip) to access the audio device and record the WAV files required by the interface, and uses the wave library to save the audio files.

(1) Construction of the Audio_model class:

    def __init__(self, audio_path, language_type, is_recording):
        self.audio_path = audio_path                        # directory for recordings
        self.audio_file_name = ''                           # name of the current recording
        self.language_type = language_type                  # index of the selected language
        self.language_dict = ["zh-CHS", "en", "ja", "ko"]   # languages offered in the demo
        self.language = ''
        self.is_recording = is_recording                    # recording-state flag
        self.audio_chunk_size = 1600                        # frames per buffer
        self.audio_channels = 1                             # mono, as the API requires
        self.audio_format = pyaudio.paInt16                 # 16-bit samples
        self.audio_rate = 16000                             # 16 kHz sampling rate
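For reference, a minimal sketch of how the main window might construct the model; the storage path and initial values here are illustrative:

# Store recordings under ./audio/, default to the first language
# (zh-CHS), and start in the not-recording state.
au_model = Audio_model('./audio/', 0, False)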

(2) Development of the record() method

The record() method implements the recording logic: it calls the PyAudio library, reads the audio stream, and writes it to a file.

    def record(self, file_name):
        # Open an input stream on the default recording device.
        p = pyaudio.PyAudio()
        stream = p.open(
            format=self.audio_format,
            channels=self.audio_channels,
            rate=self.audio_rate,
            input=True,
            frames_per_buffer=self.audio_chunk_size
        )
        # The WAV file is written with the parameters the API expects.
        wf = wave.open(file_name, 'wb')
        wf.setnchannels(self.audio_channels)
        wf.setsampwidth(p.get_sample_size(self.audio_format))
        wf.setframerate(self.audio_rate)

        # Keep reading from the stream until the flag is cleared.
        while self.is_recording:
            data = stream.read(self.audio_chunk_size)
            wf.writeframes(data)
        wf.close()
        stream.stop_stream()
        stream.close()
        p.terminate()
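Note that record() blocks in its while loop until is_recording is cleared, so it must not run on the Tkinter main thread. The post doesn't show start_rec(), but a minimal sketch under that assumption might be (the file name is illustrative):

import threading

def start_rec():
    # Set the flag first, then run the blocking record loop off the UI thread.
    au_model.is_recording = True
    au_model.audio_file_name = 'record.wav'
    threading.Thread(target=au_model.record,
                     args=(au_model.audio_file_name,),
                     daemon=True).start()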

(3) Development of the stop_and_recognise() method

The stop_and_recognise() method sets the Audio_model's recording flag to False and then calls the method that invokes the Youdao Zhiyun API.

    def stop_and_recognise(self):
        self.is_recording=False
        recognise(self.audio_file_name,self.language_dict[self.language_type])

3. Development of real-time speech recognition

The real-time speech recognition interface of Youdao Zhiyun communicates over WebSocket. To simplify the presentation logic, a separate window is developed here to display the recognition results, again using Tkinter:

root = tk.Tk()
root.title("result")
frm = tk.Frame(root)
frm.grid(padx='80', pady='80')
text_result = tk.Text(frm, width='40', height='20')
text_result.grid(row=0, column=1)

The recognise() method concatenates the required parameters onto the URI according to the interface documentation and passes the URI to the start() method, which requests the interface:

def recognise(filepath, language_type):
    print('l:' + language_type)
    global file_path
    file_path = filepath
    nonce = str(uuid.uuid1())
    curtime = str(int(time.time()))
    signStr = app_key + nonce + curtime + app_secret
    print(signStr)
    sign = encrypt(signStr)
    uri = "wss://openapi.youdao.com/stream_asropenapi?appKey=" + app_key + "&salt=" + nonce + \
          "&curtime=" + curtime + "&sign=" + sign + \
          "&version=v1&channel=1&format=wav&signType=v4&rate=16000&langType=" + language_type
    print(uri)
    start(uri)

The start() method is the core of the real-time recognition part. It invokes the recognition interface through WebSocket, using the websocket-client library (installed via pip):

def start(uri):
    websocket.enableTrace(True)
    # Register the callbacks that handle the session's messages and events.
    ws = websocket.WebSocketApp(uri,
                                on_message=on_message,
                                on_error=on_error,
                                on_close=on_close)
    ws.on_open = on_open
    ws.run_forever()
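Two names used by the callbacks never appear in the post: the result_arr list that collects sentences and the on_error handler. Minimal stubs, under my own assumptions:

# Sentences collected by on_message below.
result_arr = []

def on_error(ws, error):
    # Surface any transport or protocol errors during the session.
    print(error)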

When requesting the interface, the previously recorded audio file is first read and sent:

def on_open(ws):
    count = 0
    file_object = open(file_path, 'rb')
    # Stream the recorded file in 1600-byte chunks.
    while True:
        chunk_data = file_object.read(1600)
        if not chunk_data:
            break
        ws.send(chunk_data, websocket.ABNF.OPCODE_BINARY)
        time.sleep(0.05)
        count = count + 1
    print(count)
    # Tell the server that the audio stream has ended.
    ws.send('{"end": "true"}', websocket.ABNF.OPCODE_BINARY)

Then, during the communication, the messages returned by the interface are processed and the recognition results are collected:

def on_message(ws, message):
    result = json.loads(message)
    resultmessage = result.get('result')  # absent in pure status messages

    if resultmessage:
        # Each element carries the recognized sentence under ['st']['sentence'].
        resultmessage1 = resultmessage[0]
        resultmessage2 = resultmessage1["st"]['sentence']
        print(resultmessage2)

        result_arr.append(resultmessage2)

Finally, after the communication ends, the recognition results are displayed:

def on_close(ws):
    print_result(result_arr)
    print("### closed ###")

def print_result(arr):
    # Clear the text box, then show the collected sentences line by line.
    text_result.delete('1.0', tk.END)
    for n in arr:
        text_result.insert("insert", n + '\n')

The interfaces provided by Youdao Zhiyun worked as expected. Most of the effort in this development was spent on recognition failures caused by the poor quality of my own recordings; once the audio quality was acceptable, the recognition results were accurate. The next step is the translation part. With the Youdao Zhiyun API, real-time translation really can be this simple!