With more and more people becoming vloggers these days, turning text into speech is in strong demand among content creators. Today, using the “Toolkit” WeChat mini program as the running example, this article walks through how to build a text-to-speech feature.

Converting text into speech is known academically as Text To Speech (TTS), and the domestic cloud vendors all provide APIs for it. Every Tencent Cloud user can claim a free package of 8 million characters, valid for two months, for study and research, so this article uses Tencent Cloud as the example.

Server-side development

When invoking Tencent Cloud's TTS interface, the server authenticates with a SecretId and SecretKey. You need to go to the API key management page in the Tencent Cloud console and create a key pair as prompted.

For convenience, Tencent Cloud provides SDKs in various programming languages; you can find the one you need linked at the bottom of the TTS documentation. Taking Node.js as an example, simply declare it in package.json.

"dependencies": {
    "tencentcloud-sdk-nodejs": "4.0.157"
  }
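
For reference, the ttsClient used in the route further below still has to be constructed from the SDK with the key pair. The snippet here is only a minimal sketch: the API version, region, and the environment-variable names for the SecretId/SecretKey are assumptions, so check the tencentcloud-sdk-nodejs documentation for the exact options.

// A minimal sketch of constructing the TTS client used in the route below.
// The API version (v20190823), region, and env var names are assumptions;
// verify them against the tencentcloud-sdk-nodejs docs for your release.
const tencentcloud = require("tencentcloud-sdk-nodejs");

const TtsClient = tencentcloud.tts.v20190823.Client;
const ttsClient = new TtsClient({
    credential: {
        secretId: process.env.TENCENT_SECRET_ID,
        secretKey: process.env.TENCENT_SECRET_KEY
    },
    region: "ap-guangzhou",
    profile: {
        httpProfile: {
            endpoint: "tts.tencentcloudapi.com"
        }
    }
});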

The important parameters for speech synthesis are speed, timbre (voice type), text content, and volume. Tencent Cloud API Explorer provides a visual tool that guides developers through constructing the request parameters; with those in hand, calling the SDK on the server side looks like the following.

router.get("/text-to-voice", async (req, res) => {
    let ret = {
        success: true
    };
    try {
        if (req.query.textValue) {
            // Run the text through content moderation before synthesizing
            ret = await contentCheck(req.query.textValue);
            if (ret.success) {
                let param = {
                    "Text": req.query.textValue,
                    "SessionId": req.query.session,
                    "VoiceType": req.query.voiceType ? parseInt(req.query.voiceType) : 2,
                    "ModelType": 1,
                    "Speed": req.query.voiceSpeed ? parseInt(req.query.voiceSpeed) : 0,
                    "Volume": req.query.volume ? parseInt(req.query.volume) : 0
                };
                let tmp = await ttsClient.TextToVoice(param);
                // The API returns Base64-encoded audio; decode it and write a .wav file
                fs.writeFileSync(
                    "/www/voice/" + (req.query.oriSession ? req.query.session : md5(req.query.session + req.query.randomSand)) + ".wav",
                    Buffer.from(tmp.Audio, "base64")
                );
            }
        } else {
            ret.success = false;
            ret.frontMessage = globalVariable.frontMessage.paramError;
        }
    } catch (error) {
        ret.success = false;
        ret.frontMessage = globalVariable.frontMessage.busy;
        console.log(error);
    }

    res.json(ret);
});

The code above is the core of the “Toolkit” mini program's back end: it decodes the Base64 audio data returned by the TTS interface and writes it to a file in a directory that nginx is configured to serve directly.

location ~ (/images/|/voice/) {
    root /www;
  }
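
One piece not shown above is the contentCheck helper called in the route before synthesis. As a rough sketch only, assuming it wraps WeChat's msg_sec_check text-moderation endpoint (the endpoint choice, the getAccessToken helper, and the message key are all assumptions), it might look like this:

// Hypothetical sketch of contentCheck, assuming WeChat's msg_sec_check
// endpoint handles text moderation; swap in whatever moderation service
// the backend actually uses.
const axios = require("axios");

async function contentCheck(text) {
    // getAccessToken is a hypothetical helper that returns a cached mini program access_token
    const accessToken = await getAccessToken();
    const url = "https://api.weixin.qq.com/wxa/msg_sec_check?access_token=" + accessToken;
    const { data } = await axios.post(url, { content: text });
    if (data.errcode === 0) {
        return { success: true };
    }
    // Any non-zero errcode: treat the text as rejected (the message key is hypothetical)
    return { success: false, frontMessage: globalVariable.frontMessage.contentBlocked };
}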

Mini program development

The mini program side is relatively simple: it provides the controls the user needs to pick options, and once the voice is generated the user can preview it by playing the synthesized file through the mini program's audio API. Because of WeChat mini program restrictions, the voice file can only be downloaded by opening it in a browser, so a pop-up prompts the user and copies the file's address to the clipboard.
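
Before anything can be played, the page has to request synthesis from the backend route shown earlier and work out where the generated file lives. The sketch below is an assumption-heavy illustration: the domain, the handler name generateVoice, and the oriSession/filename convention are placeholders inferred from the server code. Once voiceUrl is set, the playVoice handler that follows takes over.

  // Hypothetical sketch of requesting synthesis; the domain and parameter
  // wiring are assumptions based on the server route above.
  generateVoice: function () {
    const session = this.data.session; // assumed to be generated elsewhere on the page
    wx.request({
      url: "https://example.com/text-to-voice",
      data: {
        textValue: this.data.textValue,
        session: session,
        oriSession: 1, // with oriSession set, the server names the file <session>.wav
        voiceType: this.data.voiceType,
        voiceSpeed: this.data.voiceSpeed,
        volume: this.data.volume
      },
      success: (res) => {
        if (res.data.success) {
          // nginx serves /www/voice under /voice/, so the file is reachable here
          this.setData({ voiceUrl: "https://example.com/voice/" + session + ".wav" });
        } else {
          wx.showToast({ title: res.data.frontMessage, icon: "none" });
        }
      }
    });
  },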

  playVoice: function (evt) {
    if (!this.data.audio) {
      // Lazily create the audio context the first time the user taps play
      this.data.audio = wx.createInnerAudioContext();
      this.data.audio.src = this.data.voiceUrl;
      this.data.audio.onEnded(() => {
        this.setData({
          playing: false
        });
      });
    }
    this.data.audio.play();
    this.setData({
      playing: true
    });
  }
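
The download pop-up mentioned above is not included in that snippet. A minimal sketch, assuming wx.setClipboardData and wx.showModal are used to copy the file address and prompt the user (the handler name and wording are placeholders):

  // Hypothetical sketch of the download prompt: copy the voice file URL to the
  // clipboard and ask the user to open it in a browser, since the mini program
  // cannot save the file directly.
  downloadVoice: function () {
    wx.setClipboardData({
      data: this.data.voiceUrl,
      success: () => {
        wx.showModal({
          title: "Link copied",
          content: "The voice file address has been copied to the clipboard. Paste it into a browser to download the file.",
          showCancel: false
        });
      }
    });
  }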