What is Volute?
Volute is a voice assistant built with a Raspberry Pi and Node.js.
What is Raspberry Pi?
The Raspberry Pi is a Linux-based single-board computer developed by the Raspberry Pi Foundation in the UK to promote basic computer science education in schools through low-cost hardware and free software.
Every generation of Raspberry Pi uses an ARM processor produced by Broadcom. Current models ship with between 2 GB and 8 GB of RAM and mainly use SD or TF (microSD) cards as storage. They come with USB ports, HDMI video output (with audio support) and RCA composite output, plus built-in Ethernet/WLAN/Bluetooth connectivity (depending on the model), and can run a variety of operating systems. The product line is divided into Model A, Model B, Zero and Compute Module.
Simply put, this is a computer you can put in your pocket!
What is Node.js?
JavaScript was originally executed only in the browser. Node.js is an environment that can execute JavaScript outside of it: an event-driven, non-blocking I/O, server-side JavaScript runtime built on Google's V8 engine.
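For illustration, a minimal example of that event-driven, non-blocking I/O model (the file name here is hypothetical, not part of Volute):
const fs = require("fs");

// Node registers a callback and keeps executing; the event loop
// invokes the callback once the file read completes.
fs.readFile("hello.txt", "utf8", (err, data) => {
  if (err) return console.error(err);
  console.log(data);
});
console.log("still running while the file is read...");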
What is a human-machine dialogue system?
Human-machine conversation is a technology that enables machines to understand natural language and use it to communicate with humans.
A dialogue system can be roughly divided into five basic modules: speech recognition (ASR), natural language understanding (NLU), dialogue management (DM), natural language generation (NLG) and speech synthesis (TTS); a toy sketch of how these stages chain together follows this list.
- Speech recognition (ASR): converts speech to text, turning the user's spoken audio into words.
- Natural language understanding (NLU): performs semantic analysis of the text, extracting key information via intent recognition and entity recognition.
- Dialogue management (DM): maintains the dialogue state, queries databases, manages context, etc.
- Natural language generation (NLG): generates the corresponding natural language text.
- Speech synthesis (TTS): converts the generated text into speech.
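Volute wires these five stages together later in the article; as a mental model first, here is a toy sketch of the pipeline. Every function name and return value below is an illustrative stub, not Volute's actual implementation:
// Toy pipeline: each stage is a stub standing in for a real service.
const asr = async (audio) => "what's the weather"; // ASR: speech -> text
const nlu = async (text) => ({ intent: "weather" }); // NLU: text -> semantics
const dm = async (sem) => ({ reply: "It's sunny today." }); // DM: decide the response
const nlg = async (act) => act.reply; // NLG: structured result -> text
const tts = async (text) => Buffer.from(text); // TTS: text -> audio bytes

async function converse(audioIn) {
  return tts(await nlg(await dm(await nlu(await asr(audioIn)))));
}

converse(Buffer.alloc(0)).then((audio) => console.log(audio.toString()));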
Material preparation
- Raspberry Pi 4B board
- 5V/3A USB Type-C power supply for the Raspberry Pi
- Mini USB microphone
- Mini speaker
- 16 GB TF (microSD) card
- Chuan Yu card reader
- Dupont wires, case, heat sinks, etc.
Raspberry Pi system installation and basic configuration
A brand-new Raspberry Pi doesn't simply work out of the box the way your MacBook does. Getting it running takes a step-by-step process.
Burning the operating system
The Raspberry Pi has no hard disk, only a microSD card slot for storage, so the operating system needs to be loaded onto the microSD card.
The Raspberry Pi supports many operating systems, but the officially recommended one is Raspbian, a Debian-based Linux system dedicated to all Raspberry Pi models.
I used the official Raspberry Pi Imager tool to burn the system image.
Basic configuration
To configure the Pi, you first need to start the system. You can connect the Pi to a monitor, keyboard and mouse to see the system desktop. I used another method:
- Use the IP Scanner tool to find the Raspberry Pi's IP address
- Once the IP address is found, use the VNC Viewer tool to connect to the system
- Alternatively, connect directly over SSH and configure it with the raspi-config command
- Configure the network, resolution, language, and audio input/output parameters
Volute implementation idea
Task scheduling service
const fs = require("fs");
const path = require("path");
const Speaker = require("speaker");
const { record } = require("node-record-lpcm16");
const XunFeiIAT = require("./services/xunfeiiat.service");
const XunFeiTTS = require("./services/xunfeitts.service");
const initSnowboy = require("./services/snowboy.service");
const TulingBotService = require("./services/tulingbot.service");
// Task scheduling service
const taskScheduling = {
  // The microphone
  mic: null,
  speaker: null,
  detector: null,
  // Audio input stream
  inputStream: null,
  // Audio output stream
  outputStream: null,
  init() {
    // Initialize snowboy
    this.detector = initSnowboy({
      record: this.recordSound.bind(this),
      stopRecord: this.stopRecord.bind(this),
    });
    // Pipe the microphone stream to snowboy as it is captured
    this.mic.pipe(this.detector);
  },
  start() {
    // Listen to the microphone input stream
    this.mic = record({
      sampleRate: 16000, // sampling rate
      threshold: 0.5,
      verbose: true,
      recordProgram: "arecord",
    }).stream();
    this.init();
  },
  // Record the audio input
  recordSound() {
    // Before each recording, stop any output stream left over from last time
    this.stopSpeak();
    console.log("start record");
    // Create a writable stream
    this.inputStream = fs.createWriteStream(
      path.resolve(__dirname, "./assets/input.wav"),
      {
        encoding: "binary",
      }
    );
    // Pipe the input received from the microphone into the writable stream
    this.mic.pipe(this.inputStream);
  },
  // Stop the audio input
  stopRecord() {
    if (this.inputStream) {
      console.log("stop record");
      // Unbind the pipes bound to this.mic
      this.mic.unpipe(this.inputStream);
      this.mic.unpipe(this.detector);
      process.nextTick(() => {
        // Destroy the input stream
        this.inputStream.destroy();
        this.inputStream = null;
        // Reinitialize
        this.init();
        // Call the voice dictation service
        this.speech2Text();
      });
    }
  },
  // Speech to text
  speech2Text() {
    // Instantiate the voice dictation service
    const iatService = new XunFeiIAT({
      onReply: (msg) => {
        console.log("msg", msg);
        // Callback: pass the recognized text to the chat function
        this.onChat(msg);
      },
    });
    iatService.init();
  },
  // Chat -> Turing robot
  onChat(text) {
    // Call the chatbot
    TulingBotService.start(text).then((res) => {
      console.log(res);
      // Receive the chat reply and call the speech synthesis service
      this.text2Speech(res);
    });
  },
  // Text to speech
  text2Speech(text) {
    // Instantiate the speech synthesis service
    const ttsService = new XunFeiTTS({
      text,
      onDone: () => {
        console.log("onDone");
        this.onSpeak();
      },
    });
    ttsService.init();
  },
  // Playback: audio output
  onSpeak() {
    // Instantiate the speaker to play the speech
    this.speaker = new Speaker({
      channels: 1,
      bitDepth: 16,
      sampleRate: 16000,
    });
    // Create a readable stream
    this.outputStream = fs.createReadStream(
      path.resolve(__dirname, "./assets/output.wav")
    );
    // This write just activates the speaker, which has a ~2s startup delay
    this.speaker.write(Buffer.alloc(32000, 10));
    // Pipe the output stream to the speaker for playback
    this.outputStream.pipe(this.speaker);
    this.outputStream.on("end", () => {
      this.outputStream = null;
      this.speaker = null;
    });
  },
  // Stop playing
  stopSpeak() {
    this.outputStream && this.outputStream.unpipe(this.speaker);
  },
};
taskScheduling.start();
Hot word wake-up: Snowboy
Like the devices on the market, a voice assistant needs to be woken up; without a wake-up step it would have to listen continuously, which would place enormous demands on storage, compute and network resources.
Snowboy is a highly customizable hotword detection library that can be used in real-time embedded systems. Once a hotword has been trained, it runs offline with very low power consumption. It currently runs on Raspberry Pi, (Ubuntu) Linux and macOS.
const path = require("path");
const snowboy = require("snowboy");
const models = new snowboy.Models();
// Add the trained model
models.add({
  file: path.resolve(__dirname, "../configs/volute.pmdl"),
  sensitivity: "0.5",
  hotwords: "volute",
});
// Initialize the Detector object
const detector = new snowboy.Detector({
  resource: path.resolve(__dirname, "../configs/common.res"),
  models: models,
  audioGain: 1.0,
  applyFrontend: false,
});
/**
 * initSnowboy:
 * 1. When the hotword is heard, start recording
 * 2. While recording, reset silenceCount whenever sound is detected
 * 3. While recording, accumulate silenceCount while no sound is detected;
 *    once it exceeds 3, stop recording
 */
function initSnowboy({ record, stopRecord }) {
  const MAX_SILENCE_COUNT = 3;
  let silenceCount = 0,
    speaking = false;
  // Silence event callback, triggered when there is no sound
  const onSilence = () => {
    console.log("silence");
    if (speaking && ++silenceCount > MAX_SILENCE_COUNT) {
      speaking = false;
      stopRecord && stopRecord();
      detector.off("silence", onSilence);
      detector.off("sound", onSound);
      detector.off("hotword", onHotword);
    }
  };
  // Sound event callback, fired when there is a sound
  const onSound = () => {
    console.log("sound");
    if (speaking) {
      silenceCount = 0;
    }
  };
  // Hotword event callback, fired when the hotword is detected
  const onHotword = (index, hotword, buffer) => {
    if (!speaking) {
      silenceCount = 0;
      speaking = true;
      record && record();
    }
  };
  detector.on("silence", onSilence);
  detector.on("sound", onSound);
  detector.on("hotword", onHotword);
  return detector;
}
module.exports = initSnowboy;
Voice dictation: iFlytek API
Speech-to-text uses the voice dictation service of the iFlytek Open Platform. It accurately transcribes short audio clips (≤ 60 seconds) into text; besides Mandarin and English, it supports 25 dialects and 12 languages, and returns results in real time, transcribing as you speak.
require("dotenv").config();
const fs = require("fs");
const WebSocket = require("ws");
const { resolve } = require("path");
const { createAuthParams } = require("../utils/auth");

class XunFeiIAT {
  constructor({ onReply }) {
    // The WebSocket connection
    this.ws = null;
    // The result: the parsed text of the message
    this.message = "";
    this.onReply = onReply;
    // The voice file (input stream) that needs to be converted
    this.inputFile = resolve(__dirname, "../assets/input.wav");
    // API parameters
    this.params = {
      host: "iat-api.xfyun.cn",
      path: "/v2/iat",
      apiKey: process.env.XUNFEI_API_KEY,
      secret: process.env.XUNFEI_SECRET,
    };
  }
  // Generate the WebSocket connection URL
  generateWsUrl() {
    const { host, path } = this.params;
    // API authentication, parameter encryption
    const params = createAuthParams(this.params);
    return `ws://${host}${path}?${params}`;
  }
  // Initialization
  init() {
    const reqUrl = this.generateWsUrl();
    this.ws = new WebSocket(reqUrl);
    this.initWsEvent();
  }
  // Initialize the WebSocket events
  initWsEvent() {
    this.ws.on("open", this.onOpen.bind(this));
    this.ws.on("error", this.onError);
    this.ws.on("close", this.onClose);
    this.ws.on("message", this.onMessage.bind(this));
  }
  // WebSocket open event; firing means the connection has been established
  onOpen() {
    console.log("open");
    this.onPush(this.inputFile);
  }
  onPush(file) {
    this.pushAudioFile(file);
  }
  // WebSocket message receive callback
  onMessage(data) {
    const payload = JSON.parse(data);
    if (payload.data && payload.data.result) {
      // Concatenate the recognized words into the message
      this.message += payload.data.result.ws.reduce(
        (acc, item) => acc + item.cw.map((cw) => cw.w).join(""),
        ""
      );
      // Status 2 indicates the end of the result
      if (payload.data.status === 2) {
        this.onReply(this.message);
      }
    }
  }
  // WebSocket close event
  onClose() {
    console.log("close");
  }
  // WebSocket error event
  onError(error) {
    console.log(error);
  }
  // Parse the voice file and send it to the backend as a binary stream
  pushAudioFile(audioFile) {
    this.message = "";
    // Build the payload for each frame;
    // common and business are only sent with the first frame (status 0)
    const audioPayload = (statusCode, audioBase64) => ({
      common:
        statusCode === 0
          ? {
              app_id: "5f6cab72",
            }
          : undefined,
      business:
        statusCode === 0
          ? {
              language: "zh_cn",
              domain: "iat",
              ptt: 0,
            }
          : undefined,
      data: {
        status: statusCode,
        format: "audio/L16;rate=16000",
        encoding: "raw",
        audio: audioBase64,
      },
    });
    const chunkSize = 9000;
    // Create a buffer to hold the binary data
    const buffer = Buffer.alloc(chunkSize);
    // Open the voice file
    fs.open(audioFile, "r", (err, fd) => {
      if (err) {
        throw err;
      }
      let i = 0;
      // Send the binary stream chunk by chunk, recursively
      function readNextChunk() {
        fs.read(fd, buffer, 0, chunkSize, null, (errr, nread) => {
          if (errr) {
            throw errr;
          }
          // nread === 0 means the file has been fully read;
          // send the end-of-transmission frame (status = 2)
          if (nread === 0) {
            this.ws.send(
              JSON.stringify({
                data: { status: 2 },
              })
            );
            return fs.close(fd, (err) => {
              if (err) {
                throw err;
              }
            });
          }
          let data;
          if (nread < chunkSize) {
            data = buffer.slice(0, nread);
          } else {
            data = buffer;
          }
          const audioBase64 = data.toString("base64");
          const payload = audioPayload(i >= 1 ? 1 : 0, audioBase64);
          this.ws.send(JSON.stringify(payload));
          i++;
          readNextChunk.call(this);
        });
      }
      readNextChunk.call(this);
    });
  }
}
module.exports = XunFeiIAT;
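The createAuthParams helper imported from ../utils/auth is not shown in the article. For reference, here is a minimal sketch of what it might look like, following iFlytek's documented HMAC-SHA256 WebSocket authentication; the exact implementation is my assumption, not the article's code:
const crypto = require("crypto");

// Sketch only: signs the host, date and request line with the API secret,
// as described in iFlytek's WebSocket authentication docs.
function createAuthParams({ host, path, apiKey, secret }) {
  const date = new Date().toUTCString();
  const origin = `host: ${host}\ndate: ${date}\nGET ${path} HTTP/1.1`;
  const signature = crypto
    .createHmac("sha256", secret)
    .update(origin)
    .digest("base64");
  const authorization = Buffer.from(
    `api_key="${apiKey}", algorithm="hmac-sha256", headers="host date request-line", signature="${signature}"`
  ).toString("base64");
  // The query parameters expected by the iFlytek WebSocket endpoints
  return new URLSearchParams({ authorization, date, host }).toString();
}

module.exports = { createAuthParams };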
Chatbot: Turing Bot API
Turing Robot API V2.0 is an online service and development interface for developers and enterprises, built on the Turing Robot platform's core technologies such as semantic understanding and deep learning.
At present, the API can call three modules: chat dialogue, corpus and skills.
Chat dialogue refers to the nearly one billion public dialogue corpus entries the platform provides free of charge to meet users' entertainment needs.
Corpus refers to the private corpus users upload to the platform, for personal viewing and use only, helping users build a domain-specific corpus in the most convenient way.
Skills refers to the 26 practical service skills packaged by the platform, covering life, travel, shopping and other fields to meet users' needs in one stop.
require("dotenv").config();
const axios = require("axios");
// It's too simple... too lazy to explain 🐶
const TulingBotService = {
  requestUrl: "http://openapi.tuling123.com/openapi/api/v2",
  start(text) {
    return new Promise((resolve) => {
      axios
        .post(this.requestUrl, {
          reqType: 0,
          perception: {
            inputText: {
              text,
            },
          },
          userInfo: {
            apiKey: process.env.TULING_BOT_API_KEY,
            userId: process.env.TULING_BOT_USER_ID,
          },
        })
        .then((res) => {
          // console.log(JSON.stringify(res.data, null, 2));
          resolve(res.data.results[0].values.text);
        });
    });
  },
};
module.exports = TulingBotService;
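A quick way to smoke-test this service in isolation might look like the following, assuming TULING_BOT_API_KEY and TULING_BOT_USER_ID are set in .env:
const TulingBotService = require("./services/tulingbot.service");

// Send one message and print the bot's reply
TulingBotService.start("hello").then((reply) => console.log(reply));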
Speech synthesis: iFlytek API
The speech synthesis streaming API converts text into speech and offers a large number of distinctive speakers (voice libraries) to choose from.
This voice capability is exposed to developers through a common WebSocket API. The WebSocket API supports streaming data transmission and suits AI service scenarios that require it. Compared to an SDK, the API is lightweight and cross-language; compared to the HTTP API, the WebSocket protocol has the advantage of native cross-domain support.
require("dotenv").config();
const fs = require("fs");
const WebSocket = require("ws");
const { resolve } = require("path");
const { createAuthParams } = require("../utils/auth");

class XunFeiTTS {
  constructor({ text, onDone }) {
    // The WebSocket connection
    this.ws = null;
    // The text to be converted
    this.text = text;
    this.onDone = onDone;
    // The converted voice file
    this.outputFile = resolve(__dirname, "../assets/output.wav");
    // API parameters
    this.params = {
      host: "tts-api.xfyun.cn",
      path: "/v2/tts",
      appid: process.env.XUNFEI_APP_ID,
      apiKey: process.env.XUNFEI_API_KEY,
      secret: process.env.XUNFEI_SECRET,
    };
  }
  // Generate the WebSocket connection URL
  generateWsUrl() {
    const { host, path } = this.params;
    const params = createAuthParams(this.params);
    return `ws://${host}${path}?${params}`;
  }
  // Initialization
  init() {
    const reqUrl = this.generateWsUrl();
    console.log(reqUrl);
    this.ws = new WebSocket(reqUrl);
    this.initWsEvent();
  }
  // Initialize the WebSocket events
  initWsEvent() {
    this.ws.on("open", this.onOpen.bind(this));
    this.ws.on("error", this.onError);
    this.ws.on("close", this.onClose);
    this.ws.on("message", this.onMessage.bind(this));
  }
  // WebSocket open event; firing means the connection has been established
  onOpen() {
    console.log("open");
    this.onSend();
    // Remove the previous output file, if any
    if (fs.existsSync(this.outputFile)) {
      fs.unlinkSync(this.outputFile);
    }
  }
  // Send the text and synthesis parameters
  onSend() {
    const frame = {
      // common parameters
      common: {
        app_id: this.params.appid,
      },
      // business parameters
      business: {
        aue: "raw",
        auf: "audio/L16;rate=16000",
        vcn: "xiaoyan",
        tte: "UTF8",
      },
      // data to synthesize
      data: {
        text: Buffer.from(this.text).toString("base64"),
        status: 2,
      },
    };
    this.ws.send(JSON.stringify(frame));
  }
  // Append the synthesized audio chunk to the output file
  onSave(data) {
    fs.writeFileSync(this.outputFile, data, { flag: "a" });
  }
  // WebSocket message receive callback
  onMessage(data, err) {
    if (err) return;
    const res = JSON.parse(data);
    if (res.code !== 0) {
      this.ws.close();
      return;
    }
    // Receive an audio chunk and save it
    const audio = res.data.audio;
    const audioBuf = Buffer.from(audio, "base64");
    this.onSave(audioBuf);
    // Status 2 means synthesis is complete
    if (res.code == 0 && res.data.status == 2) {
      this.ws.close();
      this.onDone();
    }
  }
  onClose() {
    console.log("close");
  }
  onError(error) {
    console.log(error);
  }
}
module.exports = XunFeiTTS;
Results demonstration
Finch: see the demo at the bottom of the article
Source code
GitHub source address. If it helps you, please leave a star!