pzh-py-speech: Text-to-Speech Implementation (pyttsx3, eSpeak 1.48.04)

Hello everyone, I am 痞子衡 (Ruffian Heng), a serious technical ruffian. Today I will introduce the text-to-speech implementation of the speech processing tool pzh-py-speech, as part of the story of how the tool came to be.

Text-to-speech synthesis is a core function of pzh-py-speech, and it is built on pyttsx3 and eSpeak.

1 Introduction to pyttsx3

pyttsx3 is a Python text-to-speech library; on Windows it drives Microsoft's SAPI5 speech synthesis engine. Written by Natesh M Bhat, pyttsx3 is a continuation of the pyTTS and pyttsx projects. It was designed primarily for Python 3 but is also compatible with Python 2. JaysPySPEECH uses pyttsx3 2.7. The official pyttsx3 pages are as follows:

pyttsx3 official homepage: github.com/nateshmbhat…

pyttsx3 installation: pypi.org/project/pyt…

pyttsx3 is simple to use: its official documentation (pyttsx3.readthedocs.io/en/latest/e…) can be read in half an hour. Here is the simplest example:

import pyttsx3

# Create the engine, queue a sentence, then block until it has been spoken
engine = pyttsx3.init()
engine.say("I will speak this text")
engine.runAndWait()
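Beyond say() and runAndWait(), the engine exposes settings through getProperty()/setProperty(); 'rate' (speaking rate in words per minute) and 'volume' (0.0 to 1.0) are standard pyttsx3 properties. The following is a minimal sketch that wraps them in a helper. The name speak_text is hypothetical, and the engine is passed in so that the helper can be exercised without an audio device.

```python
def speak_text(engine, text, rate=None, volume=None):
    """Speak text with an already-initialized pyttsx3 engine.

    speak_text is a hypothetical helper, not part of pyttsx3 itself.
    """
    if rate is not None:
        # 'rate' is the speaking rate in words per minute
        engine.setProperty('rate', rate)
    if volume is not None:
        # 'volume' is a float between 0.0 and 1.0
        if not 0.0 <= volume <= 1.0:
            raise ValueError("volume must be between 0.0 and 1.0")
        engine.setProperty('volume', volume)
    engine.say(text)
    # Block until everything queued with say() has been spoken
    engine.runAndWait()
```

On a real PC this would be called as speak_text(pyttsx3.init(), "I will speak this text", rate=150, volume=0.8).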

1.1 Microsoft Speech API (SAPI5) engine

SAPI5, the TTS engine developed by Microsoft, is at the core of pyttsx3 on Windows. Its official documentation is as follows:

Official documentation of SAPI5: docs.microsoft.com/en-us/previ…

Since pyttsx3 already wraps SAPI5, there is no need to dig into how SAPI5 implements TTS internally.
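pyttsx3 chooses a driver for the platform automatically: 'sapi5' on Windows, 'nsss' on macOS, and 'espeak' elsewhere. If you want to state explicitly that SAPI5 is being used, you can pass driverName to pyttsx3.init(). The following is a minimal sketch. The helper names default_driver and make_engine are hypothetical; default_driver mirrors pyttsx3's own platform mapping.

```python
import sys


def default_driver(platform=sys.platform):
    """Return the pyttsx3 driver name normally used on the given platform."""
    if platform == "win32":
        return "sapi5"    # Microsoft Speech API 5
    if platform == "darwin":
        return "nsss"     # macOS NSSpeechSynthesizer
    return "espeak"       # Linux and other platforms


def make_engine():
    # Imported lazily so default_driver() works without pyttsx3 installed
    import pyttsx3
    return pyttsx3.init(driverName=default_driver())
```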

1.2 Verifying voice packages supported by the PC

When you synthesize text with pyttsx3, the available voices depend on the speech environment of the current PC. Open Control Panel -> Speech Recognition to see the following page:

My PC runs the English edition of Windows 10, so by default only English voice packages are installed (David is a male voice, Zira is a female voice). You can also confirm this with the following pyttsx3 code:

import pyttsx3

ttsObj = pyttsx3.init()
# List every voice installed on this PC
voices = ttsObj.getProperty('voices')
for voice in voices:
    print('id = {} \nname = {} \n'.format(voice.id, voice.name))

The output is as follows:

id = HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-US_DAVID_11.0
name = Microsoft David Desktop - English (United States)

id = HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-US_ZIRA_11.0
name = Microsoft Zira Desktop - English (United States)
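The voice id (the registry token path) already encodes the locale and the speaker. This is what later lets pzh-py-speech match voices to a language. The following is a minimal sketch that extracts both. parse_voice_id is a hypothetical helper, and it assumes the TTS_MS_&lt;locale&gt;_&lt;speaker&gt;_&lt;version&gt; token naming seen in the output above.

```python
import re

# Matches token names such as ...\Tokens\TTS_MS_EN-US_DAVID_11.0
# (assumes Microsoft's TTS_MS_<locale>_<speaker>_<version> naming)
_TOKEN_RE = re.compile(r"TTS_MS_([A-Z]{2}-[A-Z]{2})_([A-Z]+)_", re.IGNORECASE)


def parse_voice_id(voice_id):
    """Return (locale, speaker) in lower case, or None if the id does not match."""
    match = _TOKEN_RE.search(voice_id)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2).lower()
```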

1.3 Adding voice packages to the PC

If you want pzh-py-speech to synthesize both Chinese and English, the PC must have both English and Chinese voice packages. My PC only has English voice packages, so I need to install a Chinese voice package (installing voice packages for other languages works the same way). Many Chinese voice packages are available for Windows: you can use a third-party package (such as NeoSpeech) or a Microsoft one. I chose the classic Huihui voice package (zh-CN_HuiHui). Go to the download pages for Microsoft Speech Platform - Runtime (Version 11) and Microsoft Speech Platform - Runtime Languages (Version 11), and download the selected files. In my own testing, these pages opened only in Google Chrome; surprisingly, IE could not open them:

First, install SpeechPlatformRuntime.msi (double-click it) and restart the computer when the installation finishes. Then install MSSpeech_TTS_zh-CN_HuiHui.msi. After that, you must modify the registry: open Run (Win+R) and enter “regedit” to open the Registry Editor shown below. The default voice packages (David, Zira) are under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices, and the newly installed voice package is under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech Server\v11.0\Voices:

Right-click HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech Server\v11.0\Voices and export it as a .reg file. Open the .reg file in a text editor, replace every “\Speech Server\v11.0” with “\Speech”, and save it. Then import the modified .reg file into the registry.
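You can also script the find-and-replace step. The following is a minimal sketch; the function names are hypothetical. It assumes that the exported file is UTF-16 with a BOM, which is the format regedit uses for "Windows Registry Editor Version 5.00" exports. Opening the file with newline='' preserves the file's CRLF line endings.

```python
OLD_KEY = r"\Speech Server\v11.0"
NEW_KEY = r"\Speech"


def retarget_reg_text(text):
    """Point every Speech Server v11.0 key at the SAPI5 Speech key instead."""
    return text.replace(OLD_KEY, NEW_KEY)


def retarget_reg_file(src_path, dst_path):
    # regedit exports .reg files as UTF-16 with a BOM; newline='' keeps the
    # original CRLF line endings untouched on both read and write
    with open(src_path, "r", encoding="utf-16", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-16", newline="") as dst:
        dst.write(retarget_reg_text(text))
```

You would then import the resulting file with regedit, or with `reg import out.reg` from an administrator command prompt.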

After a successful import, Huihui appears both in the registry and in the Speech Recognition options:

Note: the preceding changes apply only to 32-bit operating systems. On a 64-bit operating system, you need to do the same for the registry path HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Speech Server\v11.0\Voices.
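To check that Huihui ended up where SAPI5 looks, you can list the voice tokens in both registry views. The following is a minimal sketch that uses the standard winreg module; the helper names are hypothetical. The pure path helper runs anywhere, but list_voice_tokens works only on Windows.

```python
SAPI_VOICES = r"SOFTWARE\Microsoft\Speech\Voices\Tokens"
SAPI_VOICES_WOW64 = r"SOFTWARE\WOW6432Node\Microsoft\Speech\Voices\Tokens"


def voice_token_paths(is_64bit_os):
    """Registry paths under HKEY_LOCAL_MACHINE that should list the voices."""
    paths = [SAPI_VOICES]
    if is_64bit_os:
        # 32-bit programs on 64-bit Windows read this redirected view
        paths.append(SAPI_VOICES_WOW64)
    return paths


def list_voice_tokens(path):
    """Enumerate voice token names under HKEY_LOCAL_MACHINE (Windows only)."""
    import winreg  # standard library, available only on Windows
    names = []
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        index = 0
        while True:
            try:
                names.append(winreg.EnumKey(key, index))
            except OSError:
                # EnumKey raises OSError once there are no more subkeys
                break
            index += 1
    return names
```

On Windows, printing list_voice_tokens(p) for each p in voice_token_paths(True) should show a Huihui entry alongside David and Zira once the import has worked.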

2 Introduction to eSpeak

pyttsx3 (version 2.7, as used here) can only speak text aloud; it cannot save the synthesized speech as a WAV file. I therefore needed another TTS engine that would let JaysPySPEECH save WAV files. eSpeak is a compact, open-source speech synthesizer written in C. It supports English and many other languages, also provides a SAPI5 interface, and can export synthesized speech as WAV files. The official eSpeak pages are as follows:

eSpeak official home page: espeak.sourceforge.net/

eSpeak download and installation: espeak.sourceforge.net/download.ht…

eSpeak additional language packs: espeak.sourceforge.net/data/index….

eSpeak reads text from standard input or from a file. Although its output sounds nothing like a human voice, eSpeak is still a quick and convenient tool when a project needs one. The program we call is \eSpeak\command_line\espeak.exe, so add its directory (C:\tools_mcu\eSpeak\command_line in my installation) to the system environment variable Path.

For Chinese, basic Chinese characters are already included in the \eSpeak\espeak-data\zh_dict file. For full Chinese support, download the zh_listx.zip Chinese voice package, unzip it, and put the zh_listx file into the \eSpeak\dictsource directory. Then run the command “espeak --compile=zh”. Once the command succeeds, the \eSpeak\espeak-data\zh_dict file becomes noticeably larger.

Because eSpeak is an external program rather than a Python library, we call espeak.exe through subprocess. Here is the sample code:
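Before you call espeak through subprocess, it is worth checking that the Path change has taken effect. The following is a minimal sketch that uses the standard shutil.which(); find_espeak is a hypothetical helper name.

```python
import shutil


def find_espeak(path=None):
    """Return the full path of the espeak executable, or None if it is not found.

    path defaults to the PATH environment variable. On Windows, shutil.which()
    also tries the extensions listed in PATHEXT, so "espeak" resolves to
    espeak.exe.
    """
    return shutil.which("espeak", path=path)
```

If find_espeak() returns None, eSpeak is either not installed or its command_line directory is not yet on Path. A new console or a fresh login may be needed before a Path change becomes visible.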

import subprocess
import sys
# Python 2 only: make UTF-8 the default encoding so Chinese text can be handled
reload(sys)
sys.setdefaultencoding('utf-8')

enText = "Hello world"
zhText = u"你好世界"
txtFile = "C:/test.txt"  # this file contains Chinese text
wavFile = "C:/test.wav"

# Speak aloud (-v sets the voice: en = English, zh = Chinese; +m3 = male variant 3, +f3 = female variant 3)
subprocess.call(["espeak", "-ven+m3", enText])
subprocess.call(["espeak", "-vzh+f3", zhText])
# Save as a WAV file (the first form can only save English; to save other languages, use the second form, which reads the text from a file)
subprocess.call(["espeak", "-w" + wavFile, enText])
subprocess.call(["espeak", "-vzh+f3", "-f" + txtFile, "-w" + wavFile])
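On Python 3 you can build the same calls without the reload(sys) hack. The following is a minimal sketch; espeak_cmd and run_espeak are hypothetical helpers. It relies only on the -v, -f and -w options used above.

```python
import subprocess


def espeak_cmd(voice="en", variant=None, text=None, text_file=None, wav_file=None):
    """Build an espeak command line as a list of arguments.

    voice:     language voice such as "en" or "zh" (passed as -v)
    variant:   optional voice variant such as "m3" or "f3" (appended as +variant)
    text:      text to speak; ignored when text_file is given
    text_file: read the text from this file instead (-f)
    wav_file:  write a WAV file instead of playing the audio (-w)
    """
    voice_arg = "-v" + voice + ("+" + variant if variant else "")
    cmd = ["espeak", voice_arg]
    if text_file:
        cmd.append("-f" + text_file)
    if wav_file:
        cmd.append("-w" + wav_file)
    if not text_file and text is not None:
        cmd.append(text)
    return cmd


def run_espeak(**kwargs):
    # No setdefaultencoding hack is needed on Python 3; as noted above,
    # non-English text is still best passed through a file (-f)
    return subprocess.call(espeak_cmd(**kwargs))
```

For example, run_espeak(voice="zh", variant="f3", text_file="C:/test.txt", wav_file="C:/test.wav") issues the same command as the last line of the sample above.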

To hear eSpeak's pronunciation quality directly, open the \eSpeak\TTSApp.exe application, which is very simple to use:

3 Implementing Text Synthesis in pzh-py-speech

The text synthesis implementation has two parts: TTS (text-to-speech, which speaks the text aloud) and TTW (text-to-WAV, which saves the speech to a file). TTS imports pyttsx3, and TTW calls eSpeak through subprocess. The two parts are introduced below in turn:

3.1 Text-to-Speech

The TTS code is actually very simple. Currently only the pyttsx3 engine is implemented, and only Chinese and English are supported. In pzh-py-speech, textToSpeech() is the callback for the “TTS” button in the GUI: once the user has selected the configuration parameters (language, speaker, TTS engine) and clicks “TTS”, textToSpeech() runs. refreshVoice() fills the speaker list with the installed voices that match the selected language. The code is as follows:

import sys
# Python 2 only: make UTF-8 the default encoding so Chinese text can be handled
reload(sys)
sys.setdefaultencoding('utf-8')
import pyttsx3
import win  # the project's GUI module that defines speech_win (not shown here)

class mainWin(win.speech_win):

    def __init__(self, parent):
        #...
        self.ttsObj = None

    def refreshVoice(self, event):
        # Fill the speaker list with the installed voices that match the selected language
        languageType, languageName = self.getLanguageSelection()
        engineType = self.m_choice_ttsEngine.GetString(self.m_choice_ttsEngine.GetSelection())
        if engineType == 'pyttsx3 - SAPI5':
            if self.ttsObj is None:
                self.ttsObj = pyttsx3.init()
            voices = self.ttsObj.getProperty('voices')
            voiceItems = []
            for voice in voices:
                voiceId = voice.id.lower()
                voiceName = voice.name.lower()
                if (voiceId.find(languageType.lower()) != -1) or (voiceName.find(languageName.lower()) != -1):
                    voiceItems.append(voice.name)
            self.m_choice_voice.Clear()
            self.m_choice_voice.SetItems(voiceItems)
        else:
            voiceItem = ['N/A']
            self.m_choice_voice.Clear()
            self.m_choice_voice.SetItems(voiceItem)

    def textToSpeech(self, event):
        # Get the speech language type (English/Chinese)
        languageType, languageName = self.getLanguageSelection()
        # Get the text to convert from the asrttsText text box
        lines = self.m_textCtrl_asrttsText.GetNumberOfLines()
        if lines != 0:
            data = ''
            for i in range(0, lines):
                data += self.m_textCtrl_asrttsText.GetLineText(i)
        else:
            return
        ttsEngineType = self.m_choice_ttsEngine.GetString(self.m_choice_ttsEngine.GetSelection())
        if ttsEngineType == 'pyttsx3 - SAPI5':
            # Create the pyttsx3 text synthesis object ttsObj
            if self.ttsObj is None:
                self.ttsObj = pyttsx3.init()
            # Search the current PC for speakers of the specified language type
            hasVoice = False
            voices = self.ttsObj.getProperty('voices')
            voiceSel = self.m_choice_voice.GetString(self.m_choice_voice.GetSelection())
            for voice in voices:
                #print ('id = {} \nname = {} \nlanguages = {} \n'.format(voice.id, voice.name, voice.languages))
                voiceId = voice.id.lower()
                voiceName = voice.name.lower()
                if (voiceId.find(languageType.lower()) != -1) or (voiceName.find(languageName.lower()) != -1):
                    # An empty selection means any speaker of this language is acceptable
                    if (voiceSel == '') or (voiceSel == voice.name):
                        hasVoice = True
                        break
            if hasVoice:
                # Call pyttsx3 say() and runAndWait() to complete the text synthesis
                self.ttsObj.setProperty('voice', voice.id)
                self.ttsObj.say(data)
                self.statusBar.SetStatusText("TTS Conversation Info: Run and Wait")
                self.ttsObj.runAndWait()
                self.statusBar.SetStatusText("TTS Conversation Info: Successfully")
            else:
                self.statusBar.SetStatusText("TTS Conversation Info: Language is not supported by current PC")
            self.textToWav(data, languageType)
        else:
            self.statusBar.SetStatusText("TTS Conversation Info: Unavailable TTS Engine")
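You can pull the voice-matching logic in textToSpeech() out into a plain function, which makes it testable without a GUI or a speech engine. The following is a minimal sketch; pick_voice is a hypothetical name. The example language strings are illustrative only, because getLanguageSelection() is not shown.

```python
def pick_voice(voices, language_type, language_name, selected=""):
    """Return the first voice matching the language (and the selected name, if any).

    voices:        objects with .id and .name, e.g. engine.getProperty('voices')
    language_type: fragment to look for in voice.id, e.g. "zh-cn" (illustrative)
    language_name: fragment to look for in voice.name, e.g. "chinese" (illustrative)
    selected:      speaker chosen in the GUI; "" means any matching speaker
    """
    for voice in voices:
        matches_language = (language_type.lower() in voice.id.lower()
                            or language_name.lower() in voice.name.lower())
        if matches_language and selected in ("", voice.name):
            return voice
    return None
```

In textToSpeech(), this could replace the hasVoice loop: voice = pick_voice(voices, languageType, languageName, voiceSel), followed by setProperty('voice', voice.id) when voice is not None.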

3.2 Text-to-Wav

The TTW code is just as simple. Currently only the eSpeak engine is implemented, and only Chinese and English are supported. In pzh-py-speech, textToWav() is the callback for the “TTW” button in the GUI: once the user has selected the configuration parameters (speaker gender, TTW engine) and clicks “TTW”, textToWav() runs. It is also called at the end of textToSpeech(), as shown above. The code is as follows:

import os
import subprocess
import win  # the project's GUI module that defines speech_win (not shown here)

class mainWin(win.speech_win):

    def textToWav(self, text, language):
        fileName = self.m_textCtrl_ttsFileName.GetLineText(0)
        if fileName == '':
            fileName = 'tts_untitled1.wav'
        ttsFilePath = os.path.join(os.path.dirname(os.path.abspath(os.path.dirname(__file__))), 'conv', 'tts', fileName)
        ttwEngineType = self.m_choice_ttwEngine.GetString(self.m_choice_ttwEngine.GetSelection())
        if ttwEngineType == 'eSpeak TTS':
            # Write the text to a temporary file; in Python 2 the unicode text is
            # encoded with the default (UTF-8) encoding set earlier
            ttsTextFile = os.path.join(os.path.abspath(os.path.dirname(__file__)), 'ttsTextTemp.txt')
            ttsTextFileObj = open(ttsTextFile, 'wb')
            ttsTextFileObj.write(text)
            ttsTextFileObj.close()
            try:
                #espeak_path = "C:/tools_mcu/eSpeak/command_line/espeak.exe"
                #subprocess.call([espeak_path, "-v"+languageType[0:2], text])
                # Map the selected gender to an eSpeak voice variant (m3 = male, f3 = female)
                gender = self.m_choice_gender.GetString(self.m_choice_gender.GetSelection())
                gender = gender.lower()[0] + '3'
                # Call espeak.exe to convert the text file to a WAV file
                subprocess.call(["espeak", "-v" + language[0:2] + '+' + gender, "-f" + ttsTextFile, "-w" + ttsFilePath])
            except OSError:
                self.statusBar.SetStatusText("TTW Conversation Info: eSpeak is not installed or its path is not added into system environment")
            os.remove(ttsTextFile)
        else:
            self.statusBar.SetStatusText("TTW Conversation Info: Unavailable TTW Engine")
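For reference, you can write the same text-to-WAV step for Python 3 with the standard tempfile module. The temporary text file then gets a unique name and is removed even when espeak fails. The following is a minimal sketch, not pzh-py-speech's actual code. text_to_wav and its run parameter are hypothetical. The file is written as UTF-8 to match the default encoding used by the Python 2 version.

```python
import os
import subprocess
import tempfile


def text_to_wav(text, language, gender, wav_path, run=subprocess.call):
    """Write text to a temporary UTF-8 file and ask espeak to render it as WAV.

    language: e.g. "zh" or "en" (only the first two letters are used)
    gender:   e.g. "Male" or "Female"; mapped to the eSpeak variants m3 / f3
    run:      command runner, injectable for testing
    Returns the runner's result, or None if espeak could not be started.
    """
    variant = gender.lower()[0] + "3"
    fd, text_file = tempfile.mkstemp(suffix=".txt")
    try:
        # Close the file before espeak reads it
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        cmd = ["espeak", "-v" + language[:2] + "+" + variant,
               "-f" + text_file, "-w" + wav_path]
        try:
            return run(cmd)
        except OSError:
            # espeak is not installed or its directory is not on Path
            return None
    finally:
        # Always clean up the temporary text file, even if espeak failed
        os.remove(text_file)
```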

That concludes the text synthesis implementation in the story of how pzh-py-speech came to be. Where's the applause~~~

Welcome to subscribe

This article is published simultaneously on my cnblogs homepage, my CSDN homepage, and my WeChat official account.

Search WeChat for “痞子衡嵌入式” (Ruffian Heng Embedded) or scan the QR code below to read new posts on your phone as soon as they are published.
