1. Introduction

Speech synthesis is increasingly common in everyday life: reading apps, audiobooks, order announcements, smart hardware, and voice navigation have all added voice broadcast features. Built on deep neural networks, speech synthesis services produce highly human-like, smooth, natural speech. They can simulate the voices of different people so that apps and devices can talk, and they support training personalized voices.

This article describes how to use the speech synthesis service provided by Huawei Cloud. The synthesized audio is obtained by calling the APIs that Huawei Cloud provides.

2. Enabling the service

Huawei Cloud provides speech synthesis, a service that converts text into realistic speech. Users call the API in real time to synthesize their input text into audio and receive the result. By choosing the voice and customizing the volume and speed, enterprises and individuals get a personalized speech service.

2.1 Voice Interaction Service

Address: console.huaweicloud.com/sis/?region…

2.2 Help Documents

Address: support.huaweicloud.com/api-sis/sis…

(1) Request header parameters:

| Parameter | Required | Type | Description |
| --- | --- | --- | --- |
| X-Auth-Token | Yes | String | User token. With token authentication, the token is added to the request header when the API is called, so that the caller is authenticated and granted permission to operate the API. The token is the value of X-Subject-Token in the response header. |

How to obtain the X-Auth-Token request header field was covered in a previous article: bbs.huaweicloud.com/blogs/31775… (see section 2.3 there).



(2) Request body parameters:

| Parameter | Required | Type | Description |
| --- | --- | --- | --- |
| text | Yes | String | Text to synthesize. Its length must be less than 500 characters. |
| config | No | JSON object | Speech synthesis configuration. |

(3) TtsConfig configuration parameters (parent node for all of them: config):

| Parameter | Required | Type | Description |
| --- | --- | --- | --- |
| audio_format | No | String | Audio format: wav, mp3, pcm. Default: wav. |
| sample_rate | No | String | Sampling rate: 16000, 8000. Default: 8000. |
| property | No | String | Speech synthesis feature string of the form {language}_{speaker}_{domain}. Voices are divided into standard voices and premium voices. The price per call is the same, but the counting differs. Premium voices: every 50 characters count as one call, and a remainder of fewer than 50 characters also counts as one call. Standard voices: every 100 characters count as one call, and a remainder of fewer than 100 characters also counts as one call. One Chinese character, one English letter, or one punctuation mark counts as one character. Premium voices are available only in the cn-north-4 and cn-east-3 regions and do not currently support pitch adjustment. If error SIS.0411 is returned, check that the request complies with these restrictions. Default: chinese_xiaoyan_common. |
| speed | No | Integer | Speaking speed. Value range: -500 to 500. Default: 0. A value of 0 is the normal speaking speed of an adult, about 250 characters per minute. There is no absolute mapping between the value and the actual speed. |
| pitch | No | Integer | Pitch. Value range: -500 to 500. Default: 0. |
| volume | No | Integer | Volume. Value range: 0 to 100. Default: 50. |
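The ranges in the table above can be checked on the client before a request is sent. The following is a minimal sketch in plain C++ (no Qt), not part of the original project: the TtsConfig struct and the validate function are hypothetical names, and the lowercase format names are an assumption based on the request example later in this article.

```cpp
#include <string>
#include <vector>

// Hypothetical mirror of the TtsConfig table; defaults are the documented ones.
struct TtsConfig {
    std::string audio_format = "wav";
    std::string sample_rate  = "8000";
    std::string property     = "chinese_xiaoyan_common";
    int speed  = 0;
    int pitch  = 0;
    int volume = 50;
};

// Returns one message per violated rule; an empty list means every
// field is inside the documented range.
std::vector<std::string> validate(const TtsConfig& c) {
    std::vector<std::string> errors;
    if (c.audio_format != "wav" && c.audio_format != "mp3" && c.audio_format != "pcm")
        errors.push_back("audio_format must be wav, mp3 or pcm");
    if (c.sample_rate != "16000" && c.sample_rate != "8000")
        errors.push_back("sample_rate must be 16000 or 8000");
    if (c.property.empty())
        errors.push_back("property must not be empty");
    if (c.speed < -500 || c.speed > 500)
        errors.push_back("speed must be in [-500, 500]");
    if (c.pitch < -500 || c.pitch > 500)
        errors.push_back("pitch must be in [-500, 500]");
    if (c.volume < 0 || c.volume > 100)
        errors.push_back("volume must be in [0, 100]");
    return errors;
}
```

A default-constructed TtsConfig passes the check; setting speed to 501, for example, yields one error message.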

(4) property values for standard voices:

| property value | Description |
| --- | --- |
| chinese_xiaoqi_common | Xiaoqi, standard female voice. |
| chinese_xiaoyu_common | Xiaoyu, standard male voice. |
| chinese_xiaoyan_common | Xiaoyan, gentle female voice. |
| chinese_xiaowang_common | Xiaowang, child voice. |
| chinese_xiaowen_common | Xiaowen, soft and beautiful female voice. |
| chinese_xiaojing_common | Xiaojing, nifty female voice. |
| chinese_xiaosong_common | Xiaosong, passionate male voice. |
| chinese_xiaoxia_common | Xiaoxia, passionate female voice. |
| chinese_xiaodai_common | Xiaodai, lovely child voice. |
| chinese_xiaoqian_common | Xiaoqian, mature female voice. |
| english_cameal_common | Cameal, gentle female voice (English). |

(5) property values for premium voices:

| property value | Description |
| --- | --- |
| chinese_huaxiaoxia_common | Hua Xiaoxia, passionate female voice. |
| chinese_huaxiaogang_common | Hua Xiaogang, agile male voice. |
| chinese_huaxiaolu_common | Hua Xiaolu, intellectual female voice. |
| chinese_huaxiaoshu_common | Hua Xiaoshu, soothing female voice. |
| chinese_huaxiaowei_common | Hua Xiaowei, gentle female voice. |
| chinese_huaxiaoliang_common | Hua Xiaoliang, loud and clear female voice. |
| chinese_huaxiaodong_common | Hua Xiaodong, mature male voice. |
| chinese_huaxiaoyan_common | Hua Xiaoyan, strict female voice. |
| chinese_huaxiaoxuan_common | Hua Xiaoxuan, Taiwanese female voice. |
| chinese_huaxiaowen_common | Hua Xiaowen, soft and beautiful female voice. |
| chinese_huaxiaoyang_common | Hua Xiaoyang, vigorous male voice. |
| chinese_huaxiaomin_common | Hua Xiaomin, southern Fujian female voice. |
| chinese_huanvxia_literature | Hua Nu Xia, wuxia-style female voice. Supports only the 16 kHz sampling rate. |
| chinese_huaxiaoxuan_literature | Hua Xiaoxuan, suspense-style male voice. Supports only the 16 kHz sampling rate. |
| chinese_huaxiaomei_common | Hua Xiaomei, gentle female voice. |
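The call-counting rule from the property description (one call per 50 characters for the second group of voices, one per 100 for the common ones) can be sketched as follows. This is an illustration only: countChars and billedCalls are hypothetical helper names, the text is assumed to be UTF-8, and one Unicode code point is assumed to equal one billed character.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping
// continuation bytes (those of the form 10xxxxxx).
std::size_t countChars(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char ch : utf8) {
        if ((ch & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Number of calls counted for one request: every started block of
// 50 characters (fine voice) or 100 characters (common voice) is one call.
std::size_t billedCalls(const std::string& utf8Text, bool fineVoice) {
    const std::size_t perCall = fineVoice ? 50 : 100;
    const std::size_t chars = countChars(utf8Text);
    return (chars + perCall - 1) / perCall;  // integer ceiling
}
```

Under these assumptions, a 120-character text counts as two calls with a common voice and three calls with a voice from the second group.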

(6) Response body parameters

Status code: 200

| Parameter | Required | Type | Description |
| --- | --- | --- | --- |
| trace_id | No | String | Internal service token that can be used to trace the request through the logs. It may be absent in some error cases. |
| result | No | Object | The result when the call succeeds. This field does not exist when the call fails. |

(7) CustomResult parameters

| Parameter | Required | Type | Description |
| --- | --- | --- | --- |
| data | No | String | Audio data, returned Base64-encoded. To produce an audio file, decode the Base64 string into a byte array and save it as an audio file in the format given by audio_format (WAV by default). |
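The decode-and-save step described for the data field can be sketched in plain C++ without Qt. This is a minimal illustration assuming the standard Base64 alphabet (RFC 4648); base64Decode and saveToFile are hypothetical helper names, and the Qt project in section 3 uses QByteArray::fromBase64 instead.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Decodes standard Base64. Stops at '=' padding and skips any
// character outside the alphabet (for example line breaks).
std::vector<std::uint8_t> base64Decode(const std::string& in) {
    std::vector<std::uint8_t> out;
    std::uint32_t buffer = 0;  // holds the most recent 6-bit groups
    int bits = 0;              // number of not yet emitted bits in buffer
    for (unsigned char c : in) {
        std::uint32_t v;
        if (c >= 'A' && c <= 'Z')      v = c - 'A';
        else if (c >= 'a' && c <= 'z') v = c - 'a' + 26;
        else if (c >= '0' && c <= '9') v = c - '0' + 52;
        else if (c == '+')             v = 62;
        else if (c == '/')             v = 63;
        else if (c == '=')             break;
        else                           continue;
        buffer = (buffer << 6) | v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    return out;
}

// Writes the decoded bytes to a binary file; returns false on failure.
bool saveToFile(const std::string& path, const std::vector<std::uint8_t>& bytes) {
    std::ofstream f(path, std::ios::binary);
    if (!f) return false;
    f.write(reinterpret_cast<const char*>(bytes.data()),
            static_cast<std::streamsize>(bytes.size()));
    return f.good();
}
```

The file extension passed to saveToFile should match the audio_format used in the request, since the service returns the raw bytes of that format.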

2.3 Online Debugging Interface

The online debugging interface lets you quickly try out an API's parameters, request method, and returned results.

Address: apiexplorer.developer.huaweicloud.com/apiexplorer…

You can also fill in test parameters online and check the result.

2.4 Summary of Request Interfaces

Request address format: POST /v1/{project_id}/tts
Example: https://sis-ext.cn-north-4.myhuaweicloud.com/v1/0e5957be8a00f53c2fa7c0045e4d8fbf/tts

Request body:
{
    "text": "please note that the sitting position",
    "config": {
        "audio_format": "wav",
        "sample_rate": "16000",
        "property": "chinese_xiaoqi_common",
        "speed": 0,
        "pitch": 0,
        "volume": 0
    }
}

Request header:
{
    "X-Auth-Token": "******",
    "Content-Type": "application/json; charset=UTF-8"
}

Response body:
{
    "result": {
        "data": "XXXXXXXX"
    }
}

XXXXXXXX is the returned Base64-encoded audio data, which can be decoded and saved as a file.
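The request format above can be assembled in code as follows. This is a sketch in plain C++ rather than the Qt code of section 3: ttsUrl, ttsBody and jsonEscape are hypothetical helper names. Escaping the text matters because a quotation mark or backslash typed by the user would otherwise produce invalid JSON.

```cpp
#include <cstdio>
#include <string>

// Escapes a UTF-8 string for use inside a JSON string literal.
std::string jsonEscape(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n";  break;
        case '\r': out += "\\r";  break;
        case '\t': out += "\\t";  break;
        default:
            if (c < 0x20) {
                // Remaining control characters use the \u00XX form.
                char buf[7];
                std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                out += buf;
            } else {
                out += static_cast<char>(c);
            }
        }
    }
    return out;
}

// Builds the request address from the region and the project ID.
std::string ttsUrl(const std::string& region, const std::string& projectId) {
    return "https://sis-ext." + region + ".myhuaweicloud.com/v1/" + projectId + "/tts";
}

// Builds the JSON request body with the fields listed in the tables above.
std::string ttsBody(const std::string& text, const std::string& audioFormat,
                    const std::string& sampleRate, const std::string& property,
                    int speed, int pitch, int volume) {
    return "{\"text\":\"" + jsonEscape(text) + "\",\"config\":{"
           "\"audio_format\":\"" + jsonEscape(audioFormat) + "\","
           "\"sample_rate\":\"" + jsonEscape(sampleRate) + "\","
           "\"property\":\"" + jsonEscape(property) + "\","
           "\"speed\":" + std::to_string(speed) + ","
           "\"pitch\":" + std::to_string(pitch) + ","
           "\"volume\":" + std::to_string(volume) + "}}";
}
```

The resulting string is sent as the POST body, with the X-Auth-Token and Content-Type headers set as in the summary above.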

3. Implementation source code

The software is built with Qt. The core of it is HTTP request handling.

3.1 Text to speech source code

// Text to speech
void Widget::TextToAudio(QString text)
{
    function_select=1;

    QString requestUrl;
    QNetworkRequest request;

    // Set the request address
    QUrl url;
    requestUrl = QString("https://sis-ext.%1.myhuaweicloud.com/v1/%2/tts")
                 .arg(SERVER_ID)
                 .arg(PROJECT_ID);

    // Set the format of the submitted data
    request.setHeader(QNetworkRequest::ContentTypeHeader, QVariant("application/json"));

    // Set the token
    request.setRawHeader("X-Auth-Token", Token);

    // Build the request
    url.setUrl(requestUrl);
    request.setUrl(url);

    // Note: text is inserted verbatim, so quotation marks or backslashes
    // in it would have to be escaped to keep the JSON valid.
    QString post_param=QString
           ("{"
            "\"text\": \"%1\","
            "\"config\": {"
            "\"audio_format\": \"%2\","
            "\"sample_rate\": \"%3\","
            "\"property\": \"%4\","
            "\"speed\": %5,"
            "\"pitch\": 0,"
            "\"volume\": %6"
            "}"
            "}").arg(text)
            .arg(ui->comboBox_formt->currentText())
            .arg(ui->comboBox_cai_yang_lv->currentText())
            .arg(ui->comboBox_fa_yin_ren->currentText())
            .arg(ui->spinBox_audio_speed->value())
            .arg(ui->spinBox_yin_liang->value());

    // Send the request
    manager->post(request, post_param.toUtf8());
}

// The "convert to audio" button was clicked
void Widget::on_pushButton_to_audio_clicked()
{
    QString text=ui->lineEdit->text();
    if(text.isEmpty())
    {
        QMessageBox::information(this,"Tip","Please enter the text",QMessageBox::Ok,QMessageBox::Ok);
        return;
    }
    qDebug()<<"text:"<<text;
    TextToAudio(text);
}

3.2 Obtaining the token

/*
Function: get the token
*/
void Widget::GetToken()
{
    // Indicates that a token is being requested
    function_select=3;

    QString requestUrl;
    QNetworkRequest request;

    // Set the request address
    QUrl url;

    // Token request address
    requestUrl = QString("https://iam.%1.myhuaweicloud.com/v3/auth/tokens")
                 .arg(SERVER_ID);

    // A local TCP server created for testing
    //requestUrl="http://10.0.0.6:8080";

    // Set the format of the submitted data
    request.setHeader(QNetworkRequest::ContentTypeHeader, QVariant("application/json;charset=UTF-8"));

    // Build the request
    url.setUrl(requestUrl);
    request.setUrl(url);

    QString text =QString("{\"auth\":{\"identity\":{\"methods\":[\"password\"],\"password\":"
    "{\"user\":{\"domain\": {"
    "\"name\":\"%1\"},\"name\": \"%2\",\"password\": \"%3\"}}},"
    "\"scope\":{\"project\":{\"name\":\"%4\"}}}}")
            .arg(MAIN_USER)
            .arg(IAM_USER)
            .arg(IAM_PASSWORD)
            .arg(SERVER_ID);

    // Send the request
    manager->post(request, text.toUtf8());
}

3.3 Parsing the Returned Value

// Parse the data returned by the server
void Widget::replyFinished(QNetworkReply *reply)
{
    QString displayInfo="";
    int statusCode = reply->attribute(QNetworkRequest::HttpStatusCodeAttribute).toInt();

    // Read all the data
    QByteArray replyData = reply->readAll();

    qDebug()<<"Status code:"<<statusCode;
    qDebug()<<"Response data:"<<QString(replyData);

    // Update the token
    if(function_select==3)
    {
        displayInfo="Token update failed";
        // Read the HTTP response headers
        QList<QNetworkReply::RawHeaderPair> RawHeader=reply->rawHeaderPairs();
        qDebug()<<"Number of HTTP response headers:"<<RawHeader.size();
        for(int i=0;i<RawHeader.size();i++)
        {
            QString first=RawHeader.at(i).first;
            QString second=RawHeader.at(i).second;
            if(first=="X-Subject-Token")
            {
                Token=second.toUtf8();
                displayInfo="Token update succeeded";
                // Save to file
                SaveDataToFile(Token);
                break;
            }
        }
        QMessageBox::information(this,"Tip",displayInfo,QMessageBox::Ok,QMessageBox::Ok);
        return;
    }

    // Check the status code
    if(200 != statusCode)
    {
        // Parse the error data
        QJsonParseError json_error;
        QJsonDocument document = QJsonDocument::fromJson(replyData, &json_error);
        if(json_error.error == QJsonParseError::NoError)
        {
            // Parse only if the document is an object
            if(document.isObject())
            {
                QString error_str="";
                QJsonObject obj = document.object();
                QString error_code;
                // Parse the error code
                if(obj.contains("error_code"))
                {
                    error_code=obj.take("error_code").toString();
                    error_str+="Error code:";
                    error_str+=error_code;
                    error_str+="\n";
                }
                if(obj.contains("error_msg"))
                {
                    error_str+="Error message:";
                    error_str+=obj.take("error_msg").toString();
                    error_str+="\n";
                }
                // Show the error
                QMessageBox::information(this,"Tip",error_str,QMessageBox::Ok,QMessageBox::Ok);
            }
        }
        return;
    }

    // Text to speech
    if(function_select==1)
    {
        // Parse the data
        QJsonParseError json_error;
        QJsonDocument document = QJsonDocument::fromJson(replyData, &json_error);
        if(json_error.error == QJsonParseError::NoError)
        {
            // Parse only if the document is an object
            if(document.isObject())
            {
                QJsonObject obj = document.object();
                if(obj.contains("result"))
                {
                    QJsonObject obj1=obj.take("result").toObject();
                    if(obj1.contains("data"))
                    {
                        // Decode the Base64 audio data
                        QString data=obj1.take("data").toString();
                        QByteArray d2=QByteArray::fromBase64(data.toUtf8());
                        qDebug()<<"Data obtained successfully..";

                        QStringList path_list=QStandardPaths::standardLocations(QStandardPaths::DownloadLocation);
                        // Ask where to save the file
                        QString filename=QFileDialog::getSaveFileName(this,"Save the audio file",path_list.at(0),tr("*.wav *.mp3 *.pcm"));
                        if(filename.isEmpty())
                        {
                            // Default name; ".wav" matches the default audio format
                            // (the original used ".wmv", which is a video extension)
                            filename=path_list.at(0)+"/123.wav";
                        }
                        QFile::remove(filename);
                        QFile file_2(filename);
                        if(file_2.open(QIODevice::WriteOnly))
                        {
                            file_2.write(d2); // Write the data
                            file_2.close();
                        }
                    }
                }
            }
        }
    }
}