What is Volute?

Volute is a voice assistant built with a Raspberry Pi and Node.js.

What is the Raspberry Pi?

The Raspberry Pi is a Linux-based single-board computer developed by the Raspberry Pi Foundation in the UK to promote basic computer science education in schools through low-cost hardware and free software.

Every generation of Raspberry Pi uses an ARM-architecture processor produced by Broadcom. Current models ship with between 2 GB and 8 GB of RAM and mainly use SD or TF cards as storage media. Boards are equipped with USB ports, HDMI video output (with audio support) and an RCA composite output, plus built-in Ethernet/WLAN/Bluetooth connectivity (depending on the model), and can run a variety of operating systems. The product line is divided into Model A, Model B, Zero and the Compute Module.

Simply put, it's a computer you can fit in your pocket!

What is Node.js?

JavaScript was originally executed only in the browser. Node.js is an environment that can execute JavaScript outside the browser: an event-driven, server-side JavaScript runtime for non-blocking I/O, built on Google's V8 engine.
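
As a quick illustration (a minimal sketch, not taken from the Volute source), the snippet below shows the event-driven, non-blocking style this enables: the file read is only scheduled, its callback fires later, and the last line prints first.

const fs = require("fs");

// Start an asynchronous read; Node does not block waiting for it
fs.readFile(__filename, "utf8", (err, text) => {
  if (err) throw err;
  console.log("read", text.length, "characters"); // runs second
});

console.log("read scheduled, not yet finished"); // runs first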

What is a human-machine dialogue system?

Human-machine conversation is a technology that enables machines to understand natural language and use it to communicate with people.

A dialogue system can be roughly divided into five basic modules: speech recognition (ASR), natural language understanding (NLU), dialogue management (DM), natural language generation (NLG), and speech synthesis (TTS). A sketch of how they chain together follows the list below.

  • Speech recognition (ASR): converts speech to text, turning the user's spoken audio into a transcript.
  • Natural language understanding (NLU): performs semantic analysis of the text, extracting key information through intent recognition and entity recognition.
  • Dialogue management (DM): responsible for maintaining dialogue state, querying databases, managing context, etc.
  • Natural language generation (NLG): generates the corresponding natural language text.
  • Speech synthesis (TTS): converts the generated text into speech.
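
Below is the minimal sketch of how the five modules chain together. Every stage is a stub standing in for a real service; all names and return values are illustrative, not taken from the Volute source:

// Stub stages: each one stands in for a real ASR/NLU/DM/NLG/TTS service
const asr = async (audio) => "what's the weather";            // speech -> text
const nlu = async (text) => ({ intent: "ask_weather" });      // text -> intent/entities
const dm = async (intent) => ({ answer: "sunny" });           // state tracking -> action
const nlg = async (action) => `It will be ${action.answer}.`; // action -> text
const tts = async (text) => Buffer.from(text);                // text -> audio

async function handleUtterance(audio) {
  return tts(await nlg(await dm(await nlu(await asr(audio)))));
}

handleUtterance(Buffer.alloc(0)).then((out) => console.log(out.toString()));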

Material preparation

  • Raspberry Pi 4B board
  • Raspberry Pi 5V/3A USB Type-C power supply
  • Mini USB microphone
  • Mini speaker
  • 16 GB TF card
  • Chuan Yu card reader
  • DuPont wires, case, heat sinks…

Raspberry Pi system installation and basic configuration

A brand-new Raspberry Pi doesn't work out of the box the way your MacBook does; there's a step-by-step process to get it running.

Burning the operating system

The Raspberry Pi has no hard disk, only a microSD card slot for storage, so the operating system needs to be written to a microSD card.

The Raspberry Pi supports many operating systems, but the officially recommended one is Raspbian, a Debian-based Linux system dedicated to the Raspberry Pi and available for all models.

I used the official Raspberry Pi Imager tool to burn the system image.

Basic configuration

To configure the Pi, you first need to boot the system. You can connect the Pi to a monitor, keyboard and mouse to see the system desktop. I used another method:

  • Use an IP scanner tool to find the Raspberry Pi's IP address

  • Once you have the IP address, use the VNC Viewer tool to connect to the system

  • You can also connect directly over SSH and configure the system with the raspi-config command

  • Configure the network, resolution, language, and audio input/output parameters

Volute implementation approach

Task scheduling service
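
This module is the heart of Volute: it listens to the microphone, hands the stream to Snowboy for hotword detection, records the utterance to input.wav, sends it to the iFlytek dictation service, passes the recognized text to the Turing chatbot, synthesizes the reply with the iFlytek TTS service, and finally plays output.wav through the speaker.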

const fs = require("fs");
const path = require("path");
const Speaker = require("speaker");
const { record } = require("node-record-lpcm16");
const XunFeiIAT = require("./services/xunfeiiat.service");
const XunFeiTTS = require("./services/xunfeitts.service");
const initSnowboy = require("./services/snowboy.service");
const TulingBotService = require("./services/tulingbot.service");
// Task scheduling service
const taskScheduling = {
  // The microphone input stream
  mic: null,
  // The speaker used for playback
  speaker: null,
  // Snowboy hotword detector
  detector: null,
  // Audio input stream
  inputStream: null,
  // Audio output stream
  outputStream: null,
  init() {
    // Initialize snowboy
    this.detector = initSnowboy({
      record: this.recordSound.bind(this),
      stopRecord: this.stopRecord.bind(this),
    });
    // Pipe the microphone stream into snowboy as audio is picked up
    this.mic.pipe(this.detector);
  },
  start() {
    // Listen to the microphone input stream
    this.mic = record({
      sampleRate: 16000, // sampling rate
      threshold: 0.5,
      verbose: true,
      recordProgram: "arecord",
    }).stream();
    this.init();
  },
  // Record the audio input
  recordSound() {
    // Before each recording, stop any output still playing from last time
    this.stopSpeak();
    console.log("start record");
    // Create a writable stream for the recording
    this.inputStream = fs.createWriteStream(
      path.resolve(__dirname, "./assets/input.wav"),
      { encoding: "binary" }
    );
    // Pipe the microphone input into the writable stream
    this.mic.pipe(this.inputStream);
  },
  // Stop the audio input
  stopRecord() {
    if (this.inputStream) {
      console.log("stop record");
      // Unbind the pipes attached to this.mic
      this.mic.unpipe(this.inputStream);
      this.mic.unpipe(this.detector);
      process.nextTick(() => {
        // Destroy the input stream
        this.inputStream.destroy();
        this.inputStream = null;
        // Reinitialize
        this.init();
        // Call the voice dictation service
        this.speech2Text();
      });
    }
  },
  // Speech to text
  speech2Text() {
    // Instantiate the voice dictation service
    const iatService = new XunFeiIAT({
      onReply: (msg) => {
        console.log("msg", msg);
        // Pass the recognized text to the chat function
        this.onChat(msg);
      },
    });
    iatService.init();
  },
  // Chat -> Turing robot
  onChat(text) {
    // Ask the chatbot
    TulingBotService.start(text).then((res) => {
      console.log(res);
      // Receive the chat reply and call the speech synthesis service
      this.text2Speech(res);
    });
  },
  // Text to speech
  text2Speech(text) {
    // Instantiate the speech synthesis service
    const ttsService = new XunFeiTTS({
      text,
      onDone: () => {
        console.log("onDone");
        this.onSpeak();
      },
    });
    ttsService.init();
  },
  // Playback, audio output
  onSpeak() {
    // Instantiate the speaker to play the speech
    this.speaker = new Speaker({
      channels: 1,
      bitDepth: 16,
      sampleRate: 16000,
    });
    // Create a readable stream from the synthesized audio
    this.outputStream = fs.createReadStream(
      path.resolve(__dirname, "./assets/output.wav")
    );
    // Write some bytes first just to activate the speaker (about a 2s delay)
    this.speaker.write(Buffer.alloc(32000, 10));
    // Pipe the output stream to the speaker for playback
    this.outputStream.pipe(this.speaker);
    this.outputStream.on("end", () => {
      this.outputStream = null;
      this.speaker = null;
    });
  },
  // Stop playing
  stopSpeak() {
    this.outputStream && this.outputStream.unpipe(this.speaker);
  },
};

taskScheduling.start();

Hotword wake-up: Snowboy

Like the devices on the market, a voice assistant needs to be woken up: if it kept listening and recognizing everything without a wake-up step, the demand on storage and network resources would be enormous.

Snowboy is a highly customizable hotword detection library that can be used in real-time embedded systems. Once its hotwords are trained, it runs offline with very little power consumption. It currently runs on Raspberry Pi, (Ubuntu) Linux and macOS.
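
The service below wraps Snowboy's detector: the hotword event starts a recording, the sound event resets a silence counter while the user is speaking, and once the silence event has fired more than three times in a row the recording is stopped and handed off for recognition.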

const path = require("path");
const snowboy = require("snowboy");
const models = new snowboy.Models();

// Add the trained hotword model
models.add({
  file: path.resolve(__dirname, "../configs/volute.pmdl"),
  sensitivity: "0.5",
  hotwords: "volute",
});

// Initialize the Detector object
const detector = new snowboy.Detector({
  resource: path.resolve(__dirname, "../configs/common.res"),
  models: models,
  audioGain: 1.0,
  applyFrontend: false,
});

/**
 * Initialize snowboy.
 * 1. When the hotword is detected, start recording
 * 2. While recording, reset silenceCount whenever sound is heard
 * 3. While recording, accumulate silenceCount when nothing is heard;
 *    once it exceeds 3, stop recording
 */
function initSnowboy({ record, stopRecord }) {
  const MAX_SILENCE_COUNT = 3;
  let silenceCount = 0,
    speaking = false;
  /** Silence event callback, triggered when no sound is detected */
  const onSilence = () => {
    console.log("silence");
    if (speaking && ++silenceCount > MAX_SILENCE_COUNT) {
      speaking = false;
      stopRecord && stopRecord();
      detector.off("silence", onSilence);
      detector.off("sound", onSound);
      detector.off("hotword", onHotword);
    }
  };
  /** Sound event callback, triggered when sound is detected */
  const onSound = () => {
    console.log("sound");
    if (speaking) {
      silenceCount = 0;
    }
  };
  /** Hotword event callback, triggered when the hotword is heard */
  const onHotword = (index, hotword, buffer) => {
    if (!speaking) {
      silenceCount = 0;
      speaking = true;
      record && record();
    }
  };
  detector.on("silence", onSilence);
  detector.on("sound", onSound);
  detector.on("hotword", onHotword);
  return detector;
}

module.exports = initSnowboy;

Voice dictation: the iFlytek API

Speech-to-text uses the voice dictation service of the iFlytek Open Platform. It can accurately transcribe short audio clips (≤60 seconds) into text; besides Mandarin and English it supports 25 dialects and 12 languages, and it returns results in real time, so the transcript appears as you speak.

require("dotenv").config();
const fs = require("fs");
const WebSocket = require("ws");
const { resolve } = require("path");
const { createAuthParams } = require("../utils/auth");

class XunFeiIAT {
  constructor({ onReply }) {
    // WebSocket connection
    this.ws = null;
    // The result: text parsed from the returned messages
    this.message = "";
    this.onReply = onReply;
    // The recorded voice file to be transcribed
    this.inputFile = resolve(__dirname, "../assets/input.wav");
    // Interface parameters
    this.params = {
      host: "iat-api.xfyun.cn",
      path: "/v2/iat",
      apiKey: process.env.XUNFEI_API_KEY,
      secret: process.env.XUNFEI_SECRET,
    };
  }
  // Generate the WebSocket connection URL
  generateWsUrl() {
    const { host, path } = this.params;
    // Interface authentication, parameter signing
    const params = createAuthParams(this.params);
    return `ws://${host}${path}?${params}`;
  }
  // Initialization
  init() {
    const reqUrl = this.generateWsUrl();
    this.ws = new WebSocket(reqUrl);
    this.initWsEvent();
  }
  // Initialize the WebSocket events
  initWsEvent() {
    this.ws.on("open", this.onOpen.bind(this));
    this.ws.on("error", this.onError);
    this.ws.on("close", this.onClose);
    this.ws.on("message", this.onMessage.bind(this));
  }
  /** WebSocket open event, triggered when the connection has been established */
  onOpen() {
    console.log("open");
    this.onPush(this.inputFile);
  }
  onPush(file) {
    this.pushAudioFile(file);
  }
  // WebSocket message receive callback
  onMessage(data) {
    const payload = JSON.parse(data);
    if (payload.data && payload.data.result) {
      // Concatenate the partial recognition results
      this.message += payload.data.result.ws.reduce(
        (acc, item) => acc + item.cw.map((cw) => cw.w).join(""),
        ""
      );
      // Status 2 indicates the end of the transcription
      if (payload.data.status === 2) {
        this.onReply(this.message);
      }
    }
  }
  // WebSocket close event
  onClose() {
    console.log("close");
  }
  // WebSocket error event
  onError(error) {
    console.log(error);
  }
  /** Read the voice file and send it to the backend as a binary stream */
  pushAudioFile(audioFile) {
    this.message = "";
    // Build the payload for each frame; common and business are only
    // required on the first frame (status 0)
    const audioPayload = (statusCode, audioBase64) => ({
      common: statusCode === 0 ? { app_id: "5f6cab72" } : undefined,
      business:
        statusCode === 0
          ? { language: "zh_cn", domain: "iat", ptt: 0 }
          : undefined,
      data: {
        status: statusCode,
        format: "audio/L16;rate=16000",
        encoding: "raw",
        audio: audioBase64,
      },
    });
    const chunkSize = 9000;
    // Create a buffer to hold the binary data
    const buffer = Buffer.alloc(chunkSize);
    // Open the voice file
    fs.open(audioFile, "r", (err, fd) => {
      if (err) {
        throw err;
      }

      let i = 0;
      // Send the binary stream chunk by chunk
      const readNextChunk = () => {
        fs.read(fd, buffer, 0, chunkSize, null, (errr, nread) => {
          if (errr) {
            throw errr;
          }
          // nread === 0 means the whole file has been read;
          // send the end-of-transmission frame (status = 2)
          if (nread === 0) {
            this.ws.send(JSON.stringify({ data: { status: 2 } }));
            return fs.close(fd, (err) => {
              if (err) {
                throw err;
              }
            });
          }
          let data;
          if (nread < chunkSize) {
            data = buffer.slice(0, nread);
          } else {
            data = buffer;
          }

          const audioBase64 = data.toString("base64");
          // The first frame has status 0, subsequent frames status 1
          const payload = audioPayload(i >= 1 ? 1 : 0, audioBase64);
          this.ws.send(JSON.stringify(payload));
          i++;
          readNextChunk();
        });
      };

      readNextChunk();
    });
  }
}

module.exports = XunFeiIAT;
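
The services above require a createAuthParams helper from ../utils/auth that the article never shows. Below is a minimal sketch of what it might look like, assuming iFlytek's documented HMAC-SHA256 WebSocket authentication scheme; treat the exact field handling as an assumption rather than the project's actual code.

// Hypothetical sketch of ../utils/auth, assuming iFlytek's documented
// HMAC-SHA256 WebSocket authentication scheme (not the project's actual code)
const crypto = require("crypto");

function createAuthParams({ host, path, apiKey, secret }) {
  // RFC 1123 date, required as part of the signed string
  const date = new Date().toUTCString();
  // The string to sign: host, date and the request line
  const signatureOrigin = `host: ${host}\ndate: ${date}\nGET ${path} HTTP/1.1`;
  const signature = crypto
    .createHmac("sha256", secret)
    .update(signatureOrigin)
    .digest("base64");
  const authorizationOrigin =
    `api_key="${apiKey}", algorithm="hmac-sha256", ` +
    `headers="host date request-line", signature="${signature}"`;
  const authorization = Buffer.from(authorizationOrigin).toString("base64");
  // Serialized as query parameters and appended to the ws:// URL
  return new URLSearchParams({ authorization, date, host }).toString();
}

module.exports = { createAuthParams };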

Chatbot: the Turing Bot API

The Turing Robot API V2.0 is an online service and development interface that the Turing Robot platform provides to developers and enterprises, based on its core technologies such as semantic understanding and deep learning.

At present, the API can access three modules: chat dialogue, corpus, and skills:

Chat dialogue refers to nearly one billion items of public dialogue corpus that the platform provides free of charge to meet users' entertainment chat needs.

Corpus refers to private corpora uploaded by users to the platform, visible and usable only by the uploader, helping users build domain-specific corpora in the most convenient way.

Skills refer to 26 practical service skills packaged by the platform, covering daily life, travel, shopping and other domains, meeting users' needs in one stop.

require("dotenv").config();
const axios = require("axios");

// It's too simple... too lazy to explain 🐶

const TulingBotService = {
  requestUrl: "http://openapi.tuling123.com/openapi/api/v2",
  start(text) {
    return new Promise((resolve) => {
      axios
        .post(this.requestUrl, {
          reqType: 0, // 0 = text request
          perception: {
            inputText: {
              text,
            },
          },
          userInfo: {
            apiKey: process.env.TULING_BOT_API_KEY,
            userId: process.env.TULING_BOT_USER_ID,
          },
        })
        .then((res) => {
          // console.log(JSON.stringify(res.data, null, 2));
          resolve(res.data.results[0].values.text);
        });
    });
  },
};

module.exports = TulingBotService;
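
For reference, a quick standalone smoke test of the service (assuming TULING_BOT_API_KEY and TULING_BOT_USER_ID are set in .env; the require path matches the scheduler above):

const TulingBotService = require("./services/tulingbot.service");

TulingBotService.start("hello").then((reply) => {
  console.log("bot:", reply);
});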

Speech synthesis: the iFlytek API

The streaming speech synthesis interface converts text into audio and provides a large number of distinctive speakers (voice libraries) to choose from.

This voice capability is exposed to developers through a common WebSocket API. The WebSocket API supports streaming data transmission and is suitable for AI scenarios that require it. Compared with an SDK, the API is lightweight and cross-language; compared with the HTTP API, the WebSocket protocol has the advantage of native cross-domain support.

require("dotenv").config();
const fs = require("fs");
const WebSocket = require("ws");
const { resolve } = require("path");
const { createAuthParams } = require("../utils/auth");

class XunFeiTTS {
  constructor({ text, onDone }) {
    // WebSocket connection
    this.ws = null;
    // The text to be converted
    this.text = text;
    this.onDone = onDone;
    // The synthesized voice file
    this.outputFile = resolve(__dirname, "../assets/output.wav");
    // Interface parameters
    this.params = {
      host: "tts-api.xfyun.cn",
      path: "/v2/tts",
      appid: process.env.XUNFEI_APP_ID,
      apiKey: process.env.XUNFEI_API_KEY,
      secret: process.env.XUNFEI_SECRET,
    };
  }
  // Generate the WebSocket connection URL
  generateWsUrl() {
    const { host, path } = this.params;
    const params = createAuthParams(this.params);
    return `ws://${host}${path}?${params}`;
  }
  // Initialization
  init() {
    const reqUrl = this.generateWsUrl();
    console.log(reqUrl);
    this.ws = new WebSocket(reqUrl);
    this.initWsEvent();
  }
  // Initialize the WebSocket events
  initWsEvent() {
    this.ws.on("open", this.onOpen.bind(this));
    this.ws.on("error", this.onError);
    this.ws.on("close", this.onClose);
    this.ws.on("message", this.onMessage.bind(this));
  }
  /** WebSocket open event, triggered when the connection has been established */
  onOpen() {
    console.log("open");
    this.onSend();
    // Remove the previous output file so the new audio is appended fresh
    if (fs.existsSync(this.outputFile)) {
      fs.unlinkSync(this.outputFile);
    }
  }
  // Send the text and synthesis parameters
  onSend() {
    const frame = {
      // Fill in the common section
      common: {
        app_id: this.params.appid,
      },
      // Fill in the business section
      business: {
        aue: "raw",
        auf: "audio/L16;rate=16000",
        vcn: "xiaoyan",
        tte: "UTF8",
      },
      // Fill in the data section
      data: {
        text: Buffer.from(this.text).toString("base64"),
        status: 2,
      },
    };
    this.ws.send(JSON.stringify(frame));
  }
  // Append the synthesized audio to the output file
  onSave(data) {
    fs.writeFileSync(this.outputFile, data, { flag: "a" });
  }
  // WebSocket message receive callback
  onMessage(data, err) {
    if (err) return;
    const res = JSON.parse(data);
    if (res.code !== 0) {
      this.ws.close();
      return;
    }
    // Decode the received audio chunk and save it
    const audio = res.data.audio;
    const audioBuf = Buffer.from(audio, "base64");
    this.onSave(audioBuf);
    // Status 2 means the synthesis is finished
    if (res.code === 0 && res.data.status === 2) {
      this.ws.close();
      this.onDone();
    }
  }
  onClose() {
    console.log("close");
  }
  onError(error) {
    console.log(error);
  }
}

module.exports = XunFeiTTS;
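
One detail worth noting: with aue: "raw" the service returns headerless PCM rather than a real WAV file, so output.wav is raw audio despite its extension. That is why the Speaker in the task scheduler is configured with the matching parameters (1 channel, 16-bit depth, 16000 Hz sample rate) instead of relying on a WAV header.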

Demo

Finch: see the demo at the bottom of the article.

Source code

The source code is on GitHub; if it helps you, please leave a star.