AI Speech Processing: Text-to-Speech Synthesis

Make writing a habit together! This is the third day of my participation in the Juejin “Daily New Plan · April Writing Challenge”.

1. Introduction

Speech synthesis is increasingly common in everyday life: reading apps, audiobooks, order announcements, smart hardware and voice navigation have all added voice broadcast features. Speech synthesis built on deep neural networks produces highly human-like, smooth and natural speech. It can imitate the voices of different people so that apps and devices can speak, and it can also be trained to produce personalized voices.

This article describes how to use the speech synthesis service provided by Huawei Cloud: you call the Huawei Cloud API and download the synthesized audio.

2. Enabling the Service

Huawei Cloud's speech synthesis service converts text into realistic speech. Users call the API in real time to synthesize their input text into audio, and can choose the voice (timbre) and customize the volume and speed, which gives enterprises and individuals personalized voice services.

2.1 Voice Interaction Service

Address: console.huaweicloud.com/sis/?region…

2.2 Help Documents

Address: support.huaweicloud.com/api-sis/sis…

(1) Request header parameters:

Parameter	Mandatory	Type	Description
X-Auth-Token	Yes	String	User token. With token authentication, the token is added to the request header when the API is called, and the caller is granted permission to use the API after identity authentication. The token is the value of the X-Subject-Token field in the response header of the token request.

How to obtain the X-Auth-Token request header value was covered in a previous article (see its section 2.3): bbs.huaweicloud.com/blogs/31775…

(2) Request body parameters:

Parameter	Mandatory	Type	Description
text	Yes	String	Text to be synthesized; must be fewer than 500 characters.
config	No	TtsConfig object	Speech synthesis configuration.

(3) TtsConfig parameters:

Parameter	Mandatory	Type	Description
audio_format	No	String	Audio format: wav, mp3 or pcm. Default: wav. Parent node: config
sample_rate	No	String	Sampling rate: 16000 or 8000 (Hz). Default: 8000. Parent node: config
property	No	String	Speaker, as a string of the form {language}_{speaker}_{domain}. Speakers are divided into ordinary speakers and premium speakers, and each call has the same price. For premium speakers, every 50 characters count as one call, and a remainder of fewer than 50 characters also counts as one call. For ordinary speakers, every 100 characters count as one call, and a remainder of fewer than 100 characters also counts as one call. One Chinese character, one English letter or one punctuation mark counts as one character. Premium speakers are available only in the cn-north-4 and cn-east-3 regions and do not yet support pitch adjustment. If error SIS.0411 is returned, check that the request complies with these usage restrictions. Default: chinese_xiaoyan_common. Parent node: config
speed	No	Integer	Speech speed. Range: -500 to 500. Default: 0. Parent node: config. A value of 0 corresponds to the normal speaking speed of an adult, about 250 characters per minute; there is no absolute mapping between the value and the actual speed.
pitch	No	Integer	Pitch. Range: -500 to 500. Default: 0. Parent node: config
volume	No	Integer	Volume. Range: 0 to 100. Default: 50. Parent node: config
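The billing rule in the property row is plain arithmetic: count characters, then round up to whole blocks of 50 (premium speakers) or 100 (ordinary speakers). The C++ sketch below illustrates it under two stated assumptions: the text is valid UTF-8, and every Unicode code point counts as one character. The function names are hypothetical, and the service's own metering may treat edge cases such as spaces or emoji differently.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 string by skipping continuation bytes
// (bytes of the form 10xxxxxx). Assumption: the input is valid UTF-8 and each
// code point is billed as one character.
std::size_t count_characters(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Billable calls per the documented rule: premium speakers are billed per
// started block of 50 characters, ordinary speakers per started block of 100.
std::size_t billable_calls(const std::string& text, bool premium_speaker) {
    const std::size_t block = premium_speaker ? 50 : 100;
    const std::size_t chars = count_characters(text);
    return (chars + block - 1) / block;  // integer division rounded up
}
```

For example, 120 English letters come to 2 calls with an ordinary speaker and 3 calls with a premium speaker.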

(4) Property values for ordinary speakers:

property value	Description
chinese_xiaoqi_common	Xiaoqi, standard female voice.
chinese_xiaoyu_common	Xiaoyu, standard male voice.
chinese_xiaoyan_common	Xiaoyan, gentle female voice.
chinese_xiaowang_common	Xiaowang, child's voice.
chinese_xiaowen_common	Xiaowen, soft and pleasant female voice.
chinese_xiaojing_common	Xiaojing, lively female voice.
chinese_xiaosong_common	Xiaosong, passionate male voice.
chinese_xiaoxia_common	Xiaoxia, passionate female voice.
chinese_xiaodai_common	Xiaodai, cute and naive child's voice.
chinese_xiaoqian_common	Xiaoqian, mature female voice.
english_cameal_common	Cameal, gentle female English voice.

(5) Property values for premium speakers:

property value	Description
chinese_huaxiaoxia_common	Hua Xiaoxia, passionate female voice.
chinese_huaxiaogang_common	Hua Xiaogang, lively male voice.
chinese_huaxiaolu_common	Hua Xiaolu, intellectual female voice.
chinese_huaxiaoshu_common	Hua Xiaoshu, soothing female voice.
chinese_huaxiaowei_common	Hua Xiaowei, gentle female voice.
chinese_huaxiaoliang_common	Hua Xiaoliang, loud and clear female voice.
chinese_huaxiaodong_common	Hua Xiaodong, mature male voice.
chinese_huaxiaoyan_common	Hua Xiaoyan, stern female voice.
chinese_huaxiaoxuan_common	Hua Xiaoxuan, Taiwanese-accented female voice.
chinese_huaxiaowen_common	Hua Xiaowen, soft and pleasant female voice.
chinese_huaxiaoyang_common	Hua Xiaoyang, vigorous male voice.
chinese_huaxiaomin_common	Hua Xiaomin, Southern Min (Minnan) female voice.
chinese_huanvxia_literature	Hua Nüxia, wuxia-style heroine voice; supports only the 16000 Hz sampling rate.
chinese_huaxiaoxuan_literature	Hua Xiaoxuan, suspense-story male voice; supports only the 16000 Hz sampling rate.
chinese_huaxiaomei_common	Hua Xiaomei, gentle female voice.
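The restrictions in the property description and in table (5) can be checked on the client before a request is sent: premium speakers only in cn-north-4 and cn-east-3, and the two *_literature speakers only at a 16000 Hz sampling rate. This C++ sketch hard-codes the premium list from table (5); the function name and messages are hypothetical, and because speaker lists can change on the service side, the server-side validation remains authoritative.

```cpp
#include <set>
#include <string>

// Hypothetical pre-flight check. Returns an empty string if the combination
// looks valid, otherwise a human-readable reason. The premium list is copied
// from table (5); the service may add or remove speakers over time.
std::string check_speaker(const std::string& property,
                          const std::string& sample_rate,
                          const std::string& region) {
    static const std::set<std::string> premium = {
        "chinese_huaxiaoxia_common",   "chinese_huaxiaogang_common",
        "chinese_huaxiaolu_common",    "chinese_huaxiaoshu_common",
        "chinese_huaxiaowei_common",   "chinese_huaxiaoliang_common",
        "chinese_huaxiaodong_common",  "chinese_huaxiaoyan_common",
        "chinese_huaxiaoxuan_common",  "chinese_huaxiaowen_common",
        "chinese_huaxiaoyang_common",  "chinese_huaxiaomin_common",
        "chinese_huanvxia_literature", "chinese_huaxiaoxuan_literature",
        "chinese_huaxiaomei_common"};

    // Premium speakers are only offered in two regions.
    if (premium.count(property) != 0 &&
        region != "cn-north-4" && region != "cn-east-3") {
        return "premium speakers are only available in cn-north-4 and cn-east-3";
    }

    // The *_literature speakers only support a 16 kHz sampling rate.
    const std::string suffix = "_literature";
    const bool literature =
        property.size() >= suffix.size() &&
        property.compare(property.size() - suffix.size(), suffix.size(), suffix) == 0;
    if (literature && sample_rate != "16000") {
        return "this speaker only supports sample_rate 16000";
    }
    return "";
}
```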

(6) Response body parameters

Status code: 200

Parameter	Mandatory	Type	Description
trace_id	No	String	Internal service token that can be used to trace the specific processing flow in the logs. It may be absent in some error cases.
result	No	CustomResult object	Synthesis result when the call succeeds; absent when the call fails.

(7) CustomResult parameters

Parameter	Mandatory	Type	Description
data	No	String	Audio data, Base64-encoded. To produce an audio file, decode the Base64 string into a byte array and save it as a file in the format set by audio_format (WAV by default).
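The Qt program in section 3.3 uses QByteArray::fromBase64 for this decoding step. For readers not using Qt, the following standard C++ sketch decodes the data field and writes the bytes to disk. The function names are hypothetical, and it assumes the standard Base64 alphabet with "=" padding; characters outside the alphabet, such as line breaks, are skipped.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Decode standard Base64 (A-Z, a-z, 0-9, '+', '/', '=' padding).
// Characters outside the alphabet (e.g. '\n') are skipped.
std::vector<std::uint8_t> base64_decode(const std::string& input) {
    auto value_of = [](char c) -> int {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 52;
        if (c == '+') return 62;
        if (c == '/') return 63;
        return -1;
    };
    std::vector<std::uint8_t> out;
    std::uint32_t buffer = 0;  // high bits may wrap; only the low bits are used
    int bits = 0;              // number of pending, not yet emitted bits
    for (char c : input) {
        if (c == '=') break;   // padding marks the end of the data
        const int v = value_of(c);
        if (v < 0) continue;   // ignore whitespace and other noise
        buffer = (buffer << 6) | static_cast<std::uint32_t>(v);
        bits += 6;
        if (bits >= 8) {       // a full byte is available
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    return out;
}

// Decode the "data" field and write the audio bytes; returns false on I/O failure.
bool save_audio(const std::string& base64_data, const std::string& path) {
    const std::vector<std::uint8_t> bytes = base64_decode(base64_data);
    std::ofstream file(path, std::ios::binary);
    if (!file) return false;
    file.write(reinterpret_cast<const char*>(bytes.data()),
               static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(file);
}
```

A WAV file starts with the ASCII bytes "RIFF", so decoding the beginning of a WAV response is a quick sanity check.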

2.3 Online Debugging Interface

The online debugging interface lets you quickly check the interface parameters, request method, returned results and other information.

Address: apiexplorer.developer.huaweicloud.com/apiexplorer…

You can also fill in test parameters online to try out the synthesis.

2.4 Summary of Request Interfaces

Request address format: POST /v1/{project_id}/tts

```
POST https://sis-ext.cn-north-4.myhuaweicloud.com/v1/0e5957be8a00f53c2fa7c0045e4d8fbf/tts

Request header:
{"X-Auth-Token": "******", "Content-Type": "application/json;charset=UTF-8"}

Request body:
{
  "text": "Please mind your sitting posture",
  "config": {
    "audio_format": "wav",
    "sample_rate": "16000",
    "property": "chinese_xiaoqi_common",
    "speed": 0,
    "pitch": 0,
    "volume": 0
  }
}

Response body:
{"result": {"data": "XXXXXXXX"}}
```

XXXXXXXX is the Base64-encoded audio data, which can be decoded and saved as a file.
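The JSON above is easy to assemble by hand, but the text field is user input: a double quote or backslash in it breaks the body unless it is escaped. The following standard C++ sketch builds the same body with escaping and rejects values outside the ranges listed in tables (2) and (3). The TtsConfig struct and the function names are hypothetical; counting characters as UTF-8 code points and treating empty text as invalid are assumptions.

```cpp
#include <cstddef>
#include <cstdio>
#include <optional>
#include <string>

// Hypothetical mirror of the documented TtsConfig fields and their defaults.
struct TtsConfig {
    std::string audio_format = "wav";
    std::string sample_rate = "8000";
    std::string property = "chinese_xiaoyan_common";
    int speed = 0;
    int pitch = 0;
    int volume = 50;
};

// Escape a string for use inside a JSON string literal.
std::string json_escape(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\t': out += "\\t"; break;
            default:
                if (c < 0x20) {  // remaining control characters as \u00XX
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);  // UTF-8 bytes pass through unchanged
                }
        }
    }
    return out;
}

// Build the TTS request body, or std::nullopt if a value is out of range.
std::optional<std::string> build_tts_body(const std::string& text, const TtsConfig& cfg) {
    std::size_t chars = 0;
    for (unsigned char c : text) {
        if ((c & 0xC0) != 0x80) ++chars;  // count code points, not bytes
    }
    if (chars == 0 || chars >= 500) return std::nullopt;  // must be under 500 characters
    if (cfg.speed < -500 || cfg.speed > 500) return std::nullopt;
    if (cfg.pitch < -500 || cfg.pitch > 500) return std::nullopt;
    if (cfg.volume < 0 || cfg.volume > 100) return std::nullopt;
    if (cfg.sample_rate != "8000" && cfg.sample_rate != "16000") return std::nullopt;

    // sample_rate is a String in the API, so it is quoted; the integers are not.
    return "{\"text\":\"" + json_escape(text) + "\",\"config\":{" +
           "\"audio_format\":\"" + json_escape(cfg.audio_format) + "\"," +
           "\"sample_rate\":\"" + cfg.sample_rate + "\"," +
           "\"property\":\"" + json_escape(cfg.property) + "\"," +
           "\"speed\":" + std::to_string(cfg.speed) + "," +
           "\"pitch\":" + std::to_string(cfg.pitch) + "," +
           "\"volume\":" + std::to_string(cfg.volume) + "}}";
}
```

For instance, the input Say "hi" ends up in the body as "text":"Say \"hi\"", which stays valid JSON.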

3. Implementation Source Code

The software is built with Qt; its core consists mainly of HTTP request handling.

3.1 Text-to-Speech Source Code

```cpp
#include <QNetworkRequest>
#include <QUrl>
#include <QJsonObject>
#include <QJsonDocument>
#include <QMessageBox>
#include <QDebug>

// Text to speech
void Widget::TextToAudio(QString text)
{
    // Mark the pending request as a text-to-speech request
    function_select = 1;
    QString requestUrl;
    QNetworkRequest request;
    QUrl url;
    // Set the request address
    requestUrl = QString("https://sis-ext.%1.myhuaweicloud.com/v1/%2/tts").arg(SERVER_ID).arg(PROJECT_ID);
    // Set the format of the submitted data and the authentication token
    request.setHeader(QNetworkRequest::ContentTypeHeader, QVariant("application/json"));
    request.setRawHeader("X-Auth-Token", Token);
    url.setUrl(requestUrl);
    request.setUrl(url);

    // Build the body with QJsonObject so that quotes or '%' characters in the
    // input text are escaped correctly instead of corrupting the JSON
    QJsonObject config;
    config["audio_format"] = ui->comboBox_formt->currentText();
    config["sample_rate"] = ui->comboBox_cai_yang_lv->currentText();
    config["property"] = ui->comboBox_fa_yin_ren->currentText();
    config["speed"] = ui->spinBox_audio_speed->value();
    config["pitch"] = 0;
    config["volume"] = ui->spinBox_yin_liang->value();
    QJsonObject body;
    body["text"] = text;
    body["config"] = config;
    manager->post(request, QJsonDocument(body).toJson(QJsonDocument::Compact));
}

// "Convert to audio" button handler
void Widget::on_pushButton_to_audio_clicked()
{
    QString text = ui->lineEdit->text();
    if (text.isEmpty())
    {
        QMessageBox::information(this, "Prompt", "Please enter the text", QMessageBox::Ok, QMessageBox::Ok);
        return;
    }
    qDebug() << "text:" << text;
    TextToAudio(text);
}
```

3.2 Obtaining the Token

```cpp
#include <QNetworkRequest>
#include <QUrl>

/* Function: GetToken */
void Widget::GetToken()
{
    // Mark the pending request as a token request
    function_select = 3;
    QString requestUrl;
    QNetworkRequest request;
    QUrl url;
    // Token request address
    requestUrl = QString("https://iam.%1.myhuaweicloud.com/v3/auth/tokens").arg(SERVER_ID);
    // For testing against a local TCP server:
    // requestUrl = "http://10.0.0.6:8080";
    // Set the format of the submitted data
    request.setHeader(QNetworkRequest::ContentTypeHeader, QVariant("application/json;charset=UTF-8"));
    url.setUrl(requestUrl);
    request.setUrl(url);
    QString text = QString("{\"auth\":{\"identity\":{\"methods\":[\"password\"],\"password\":"
                           "{\"user\":{\"domain\":{"
                           "\"name\":\"%1\"},\"name\":\"%2\",\"password\":\"%3\"}}},"
                           "\"scope\":{\"project\":{\"name\":\"%4\"}}}}")
                       .arg(MAIN_USER)
                       .arg(IAM_USER)
                       .arg(IAM_PASSWORD)
                       .arg(SERVER_ID);
    manager->post(request, text.toUtf8());
}
```

3.3 Parsing the Returned Value

```cpp
#include <QNetworkReply>
#include <QJsonDocument>
#include <QJsonObject>
#include <QJsonParseError>
#include <QMessageBox>
#include <QStandardPaths>
#include <QFileDialog>
#include <QFile>
#include <QDebug>

void Widget::replyFinished(QNetworkReply *reply)
{
    QString displayInfo = "";
    int statusCode = reply->attribute(QNetworkRequest::HttpStatusCodeAttribute).toInt();
    QByteArray replyData = reply->readAll();
    reply->deleteLater();  // The reply must not be deleted directly in this slot
    qDebug() << "statusCode:" << statusCode;
    qDebug() << "Feedback data:" << QString(replyData);

    // Token update
    if (function_select == 3)
    {
        displayInfo = "Token update failed";
        // Read the HTTP response headers
        QList<QNetworkReply::RawHeaderPair> RawHeader = reply->rawHeaderPairs();
        qDebug() << "HTTP response header count:" << RawHeader.size();
        for (int i = 0; i < RawHeader.size(); i++)
        {
            QString first = RawHeader.at(i).first;
            QString second = RawHeader.at(i).second;
            if (first == "X-Subject-Token")
            {
                Token = second.toUtf8();
                displayInfo = "Token update succeeded";
                SaveDataToFile(Token);  // Save to file
                break;
            }
        }
        QMessageBox::information(this, "Tip", displayInfo, QMessageBox::Ok, QMessageBox::Ok);
        return;
    }

    // Check the status code
    if (200 != statusCode)
    {
        QJsonParseError json_error;
        QJsonDocument document = QJsonDocument::fromJson(replyData, &json_error);
        if (json_error.error == QJsonParseError::NoError && document.isObject())
        {
            QString error_str = "";
            QJsonObject obj = document.object();
            if (obj.contains("error_code"))
            {
                error_str += "Error code: " + obj.take("error_code").toString() + "\n";
            }
            if (obj.contains("error_msg"))
            {
                error_str += "Error message: " + obj.take("error_msg").toString() + "\n";
            }
            QMessageBox::information(this, "Tip", error_str, QMessageBox::Ok, QMessageBox::Ok);
        }
        return;
    }
    else if (function_select == 1)  // Text-to-speech result
    {
        QJsonParseError json_error;
        QJsonDocument document = QJsonDocument::fromJson(replyData, &json_error);
        if (json_error.error == QJsonParseError::NoError && document.isObject())
        {
            QJsonObject obj = document.object();
            if (obj.contains("result"))
            {
                QJsonObject obj1 = obj.take("result").toObject();
                if (obj1.contains("data"))
                {
                    QString data = obj1.take("data").toString();
                    QByteArray d2 = QByteArray::fromBase64(data.toUtf8());
                    qDebug() << "Data obtained successfully..";
                    QStringList path_list = QStandardPaths::standardLocations(QStandardPaths::DownloadLocation);
                    // Save to file
                    QString filename = QFileDialog::getSaveFileName(this, "Save the audio file", path_list.at(0), tr("*.wav *.mp3 *.pcm"));
                    if (filename.isEmpty())
                    {
                        // Default name uses the selected audio format (was ".wmv", a video format)
                        filename = path_list.at(0) + "/123." + ui->comboBox_formt->currentText();
                    }
                    QFile::remove(filename);
                    QFile file_2(filename);
                    if (file_2.open(QIODevice::WriteOnly))
                    {
                        file_2.write(d2);  // Write data
                        file_2.close();
                    }
                }
            }
        }
    }
}
```
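replyFinished() finds the token by comparing header names with ==, which is case-sensitive, whereas HTTP header field names are case-insensitive (RFC 9110). The standard C++ sketch below shows a case-insensitive lookup over (name, value) pairs like those returned by rawHeaderPairs(). The HeaderList alias and function names are hypothetical; in Qt itself, QNetworkReply::rawHeader() can also look a single header up by name.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation of raw response headers as (name, value) pairs.
using HeaderList = std::vector<std::pair<std::string, std::string>>;

// Compare two header names ignoring ASCII case.
bool header_name_equals(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
               return std::tolower(static_cast<unsigned char>(x)) ==
                      std::tolower(static_cast<unsigned char>(y));
           });
}

// Return the value of the first header whose name matches, if any.
std::optional<std::string> find_header(const HeaderList& headers, const std::string& name) {
    for (const auto& [key, value] : headers) {
        if (header_name_equals(key, name)) return value;
    }
    return std::nullopt;
}
```

With this lookup, a token delivered as "x-subject-token" (as HTTP/2 transports lowercase names) is still found.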
