Voice commands are becoming ubiquitous today: many mobile phone users rely on voice assistants like Siri and Cortana, and our bedrooms are being invaded by devices like Amazon's Echo and Google Home. All of these systems rely on speech recognition software, and now browsers offer friendly support for the Web Speech API, which lets developers integrate speech capabilities into web applications.
This article will show you how to use the API to create an AI voice chat interface in your browser. The app recognizes the user's voice and responds with a synthesized voice. Because the Web Speech API is still experimental, the application works only on supported browsers: the combination of speech recognition and speech synthesis used in this article is currently available only in Chromium-based browsers, including Chrome 25+ and Opera 27+, while Firefox, Edge and Safari support only speech synthesis at this point.
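Because support varies, it is worth feature-detecting the API before using it. A minimal sketch (the alert text is only a placeholder):

// Check for Web Speech API support, prefixed or unprefixed
if (!('SpeechRecognition' in window) && !('webkitSpeechRecognition' in window)) {
  alert('Sorry, your browser does not support speech recognition.');
}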
Here is a demo video of the app running in Chrome. Now, let's build it!
Building the web application involves three main steps:
1. Use the SpeechRecognition interface of the Web Speech API to capture the user's voice;
2. Send the user's message as a text string to a commercial natural language processing API;
3. Once API.AI returns the response text, use the SpeechSynthesis interface to speak it.
The complete source code used for this article is on GitHub.
Start your Node.js application
First, we'll set up a web app framework with Node.js. Create your app directory with this structure:
.
├── index.js
├── public
│   ├── css
│   │   └── style.css
│   └── js
│       └── script.js
└── views
    └── index.html
Then, execute the following command to initialize your Node.js application:
$ npm init -f
The -f flag accepts the default configuration (you can also remove it and configure your application manually); this generates a package.json file containing some basic information.
Now, install the following dependencies:
$ npm install express socket.io apiai --save
The --save flag automatically adds these dependencies to package.json.
We will use the Express library, a web application server framework for Node.js, to run the server locally. For real-time two-way communication between the browser and the server, we'll use Socket.IO. And we'll install API.AI, a natural language processing service, to build the AI chatbot.
Socket.IO is a library that makes it easy to use WebSockets in Node.js. By establishing a socket connection between the client and server, our chat messages can be passed back and forth between the browser and the server whenever the Web Speech API (voice messages) or the API.AI API (AI messages) returns text data.
Now, let’s create the index.js file and instantiate Express and the listener server:
const express = require('express');
const app = express();

app.use(express.static(__dirname + '/views')); // html
app.use(express.static(__dirname + '/public')); // js, css, images

const server = app.listen(5000);

app.get('/', (req, res) => {
  res.sendFile('index.html');
});
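At this point you can sanity-check the server (assuming views/index.html already exists). The port comes from the listing above:

$ node index.js

Then open http://localhost:5000 in Chrome.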
Next, we'll write the front-end code that integrates the Web Speech API.
Receive speech with the SpeechRecognition interface
The Web Speech API has one main control interface, called SpeechRecognition, for receiving and recognizing a user's speech from the microphone.
Creating a User Interface
The app's UI is simple: a button that turns on speech recognition. Open index.html and include the front-end JavaScript file (script.js) and Socket.IO.
<html lang="en">
<head>...</head>
<body>
…
<script src="https://cdnjs.cloudflare.com/ajax/libs/socket.io/2.0.1/socket.io.js"></script>
<script src="js/script.js"></script>
</body>
</html>
Then, we add a button to the body:
<button>Talk</button>
To make the button look like the one in the demo, reference the style.css file from the source code.
Capture sound with JavaScript
In script.js, call an instance of SpeechRecognition, the control interface of the Web Speech API:
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
We include both prefixed and unprefixed objects because Chrome currently supports the API only with the webkit prefix.
Also, we can use ES6 syntax here, such as the const keyword and arrow functions, because browsers that support the Speech API interfaces, SpeechRecognition and SpeechSynthesis, also support ES6.
You can optionally set some properties to customize speech recognition:
recognition.lang = 'en-US';
recognition.interimResults = false;
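The spec defines a few more optional properties that can be handy; this sketch shows two of them with example values:

recognition.continuous = false;  // stop listening automatically when the user pauses
recognition.maxAlternatives = 1; // number of alternative transcripts to return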
Then, get the button's DOM reference and listen for the click event to start speech recognition:
document.querySelector('button').addEventListener('click', () => {
  recognition.start();
});
Once speech recognition has started, use the result event to retrieve what was said as text:
recognition.addEventListener('result', (e) => {
  let last = e.results.length - 1;
  let text = e.results[last][0].transcript;

  console.log('Confidence: ' + e.results[0][0].confidence);

  // We will use Socket.IO here later...
});
This returns a SpeechRecognitionResultList object, and you can retrieve the recognized text from the array it contains. As the code above shows, it also returns a confidence value for the transcription.
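While developing, it can also help to listen for the error event the spec defines on SpeechRecognition; a small optional sketch (the log text is illustrative):

recognition.addEventListener('error', (e) => {
  console.log('Recognition error: ' + e.error);
});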
Now, it's time to send the text to the server using Socket.IO.
Real-time interaction with Socket.IO
You may be wondering why we don't use plain HTTP or AJAX instead. You could send data to the server via POST, but with Socket.IO we use WebSockets, because sockets are the best solution for two-way communication, especially when pushing events from the server to the browser. With a continuous socket connection, we don't have to reload the browser or keep sending AJAX requests.
Instantiate Socket.IO in script.js:
const socket = io();
Then add this line inside the result event listener you set up for SpeechRecognition:
socket.emit('chat message', text);
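For clarity, the complete result listener with that line in place looks like this (the same code as above, combined):

recognition.addEventListener('result', (e) => {
  let last = e.results.length - 1;
  let text = e.results[last][0].transcript;
  socket.emit('chat message', text); // Send the transcript to the server
});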
Now go back to the Node.js code, receive the text, and use the AI to respond to the user.
Get the answer from the AI
A number of platforms and services let you integrate an AI system with natural language processing into your application, including IBM's Watson, Microsoft's LUIS and Facebook's Wit.ai. To build a dialog interface quickly, we'll use API.AI, which provides a free developer account and lets us build a small-talk dialog system with its web interface and Node.js library.
Setting up API.AI
Once you have created an account, create an "agent." Refer to the first step of the documentation guide.
Next, click “Small Talk” on the left menu and turn on the enable service option.
Customize your small-talk agent with the API.AI interface.
Click the gear icon next to your agent's name in the menu, go to the "Basic Settings" page, and get your API key. You will use the "client access token" with the Node.js SDK.
Using the API.AI Node.js SDK
We will use the Node.js SDK to connect our Node.js application to API.AI. Go back to index.js and initialize API.AI with your access token:
const apiai = require('apiai')(APIAI_TOKEN);
If you only want to run the app locally, you can hard-code your API key here. There are several ways to set environment variables, but I usually declare them in a .env file. In the GitHub source, I keep my credentials file out of version control with .gitignore, but you can refer to the .env-test file to see how it is set up.
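For illustration, one common way to load a .env file is the dotenv package (an assumption here; the source code may load its variables differently). A sketch for the top of index.js:

// Hypothetical setup: load .env into process.env (npm install dotenv --save)
require('dotenv').config();

const APIAI_TOKEN = process.env.APIAI_TOKEN;
const APIAI_SESSION_ID = process.env.APIAI_SESSION_ID;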
Now we need to use Socket.IO on the server to receive the text from the browser.
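Note that none of the snippets so far has created the server-side io object. A minimal way to bind Socket.IO to the Express server (assuming the server variable from the index.js listing earlier) is:

const io = require('socket.io')(server);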
Once a connection is established and a message is received, respond to the user using the API.AI interface:
io.on('connection', function(socket) {
  socket.on('chat message', (text) => {

    // Get a reply from API.AI
    let apiaiReq = apiai.textRequest(text, {
      sessionId: APIAI_SESSION_ID
    });

    apiaiReq.on('response', (response) => {
      let aiText = response.result.fulfillment.speech;
      socket.emit('bot reply', aiText); // Send the result back to the browser!
    });

    apiaiReq.on('error', (error) => {
      console.log(error);
    });

    apiaiReq.end();
  });
});
When API.AI returns a result, it is sent back to the browser with Socket.IO's socket.emit() method.
Give the AI a voice with the SpeechSynthesis interface
Go back to script.js and create a speech synthesis function using the SpeechSynthesis controller interface of the Web Speech API. The function takes a string argument and tells the browser to speak the text:
function synthVoice(text) {
  const synth = window.speechSynthesis;
  const utterance = new SpeechSynthesisUtterance();
  utterance.text = text;
  synth.speak(utterance);
}
The code above first gets a reference to the API's entry point, window.speechSynthesis. You may notice that there is no prefixed property this time: this API is more widely supported than SpeechRecognition, and all browsers that ship it have dropped the prefix for SpeechSynthesis.
It then creates a new SpeechSynthesisUtterance() instance and sets the text to be synthesized. You can set other properties, such as voice, to choose from the voices the browser and operating system support. Finally, it calls speechSynthesis.speak() to make the sound.
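For example, here is an optional sketch using standard SpeechSynthesisUtterance properties to pick a voice and tune the output (note that getVoices() may return an empty array until the browser has loaded its voice list):

// Inside synthVoice(), before synth.speak(utterance)
const voices = window.speechSynthesis.getVoices();
utterance.voice = voices.find((voice) => voice.lang === 'en-US') || voices[0];
utterance.pitch = 1; // range 0 to 2
utterance.rate = 1;  // range 0.1 to 10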
Now, use Socket.IO again to receive the server's response, and call the function above once the message arrives:
socket.on('bot reply', function(replyText) {
  synthVoice(replyText);
});
Let's try out our AI chatbot!
Reference articles:
- "Web Speech API," Mozilla Developer Network
- "Web Speech API Specification," W3C
- "Web Speech API: Speech Synthesis," Microsoft Edge documentation, Microsoft
- "Guide," Node.js
- "Documentation," npm
- "Hello World Example," Express
- "Get Started," Socket.IO
Try different natural language processing tools:
- API.AI, Google
- Wit.ai, Facebook
- LUIS, Microsoft
- Watson, IBM
- Lex, Amazon
Original link:
Building A Simple AI Chatbot With Web Speech API And Node.js