There are two ways to implement recording on the front end: one is to use MediaRecorder, the other is to use WebRTC’s getUserMedia combined with AudioContext. MediaRecorder is the more direct API, but Safari/Edge and some other browsers have not implemented it, so its compatibility is limited, while getUserMedia is supported by all major browsers (Safari since version 11). So we record with WebRTC.

I covered playing sound with an AudioContext in an earlier article about Chrome 66 disabling autoplay; this article will continue to use the AudioContext API.

To implement recording, let’s start by playing music from a local file, since some of the APIs involved are the same.

1. Play a local audio file

You can play audio with an audio tag or with an AudioContext. The audio tag needs a URL, which can be a remote HTTP URL or a local blob URL. How do you create a local URL?

Use the following HTML for illustration:

<input type="file" onchange="playMusic.call(this)" class="select-file">
<audio class="audio-node" autoplay></audio>

Provide a file input control that lets the user select a local file, and an audio tag ready to play it. When the user selects a file, the onchange event fires, and in the onchange callback we can read the contents of the file, as shown in the following code:

function playMusic () {
    if (!this.value) {
        return;
    }
    let fileReader = new FileReader();
    let file = this.files[0];
    fileReader.onload = function () {
        let arrayBuffer = this.result;
        console.log(arrayBuffer);
    }
    fileReader.readAsArrayBuffer(file);
}

Here we use a FileReader to read the file as an ArrayBuffer, the raw binary content, and print it as follows:

You can read its contents with a Uint8Array, whose elements are unsigned 8-bit integers in the range 0 to 255; that is, each byte of raw 0s and 1s is read as an integer. For more on this topic, see Front-End Local File Operations and Uploads.
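
For example, a minimal sketch of inspecting those raw bytes, reusing the arrayBuffer from the onload callback above:

fileReader.onload = function () {
    let arrayBuffer = this.result;
    // View the same memory as unsigned 8-bit integers (0 to 255)
    let bytes = new Uint8Array(arrayBuffer);
    // Print the total number of bytes and the first few values
    console.log(bytes.length, bytes[0], bytes[1], bytes[2]);
};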

This arrayBuffer can be converted to a Blob, and the Blob can then be used to generate a URL, as shown in the following code:

fileReader.onload = function () {
    let arrayBuffer = this.result;
    // Convert to a blob
    let blob = new Blob([new Int8Array(this.result)]);
    // Generate a local blob URL
    let blobUrl = URL.createObjectURL(blob);
    console.log(blobUrl);
    // Assign it to the audio tag's src attribute
    document.querySelector('.audio-node').src = blobUrl;
}

Use the URL.createObjectURL API to generate a blob URL, which prints like this:

blob:null/c2df9f4d-a19d-4016-9fb6-b4899bac630d

Set it as the audio tag’s src and it plays, just like a remote HTTP URL.

When using an ArrayBuffer to create a Blob object, you can specify the file type, or MIME type, as shown in the following code:

let blob = new Blob([new Int8Array(this.result)], {
    type: 'audio/mp3' // files[0].type
});

This MIME type can be obtained from the file input via files[0].type. files[0] is an instance of File; File has a MIME type and so does Blob, because File inherits from Blob, so the two share the same root. In that case, instead of reading an ArrayBuffer and wrapping it in a Blob, we can use the File directly, as shown in the following code:

function playMusic () {
    if (!this.value) {
        return;
    }
    // Use the File object directly to generate the BLOB URL
    let blobUrl = URL.createObjectURL(this.files[0]);
    document.querySelector('.audio-node').src = blobUrl;
}

Using AudioContext requires you to get the contents of the file and then manually decode the audio to play it.

2. Model of AudioContext

How to play a sound using an AudioContext? Let’s analyze its model, as shown in the figure below:

Once we have an ArrayBuffer, we decode it with AudioContext’s decodeAudioData, which produces an AudioBuffer instance; we then set it as the buffer property of an AudioBufferSourceNode object. This node inherits from AudioNode and has connect and start methods. start begins playback, but before playing you need to call connect to connect the node to audioContext.destination, the speaker device. The code looks like this:

function play (arrayBuffer) {
    // Safari requires the webKit prefix
    let AudioContext = window.AudioContext || window.webkitAudioContext,
        audioContext = new AudioContext();
    // Create an AudioBufferSourceNode object using the factory function of AudioContext
    let audioNode = audioContext.createBufferSource();
    // To decode audio, you can use Promise, but older Safari requires callbacks
    audioContext.decodeAudioData(arrayBuffer, function (audioBuffer) {
        console.log(audioBuffer);
        audioNode.buffer = audioBuffer;
        audioNode.connect(audioContext.destination); 
        // Start from 0s
        audioNode.start(0);
    });
}
fileReader.onload = function () {
    let arrayBuffer = this.result;
    play(arrayBuffer);
}

Print out the decoded audioBuffer as shown in the figure below:

It has several properties visible to developers, including the audio duration, the number of channels, and the sample rate. The printed result shows that this audio has 2 channels, a sample rate of 44.1 kHz, and a duration of 196.8 s. The meaning of these properties is explained in the article Chrome Source Code: Audio/Video Streaming Implementation (1).
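
These are standard AudioBuffer properties, so they can be read directly from the audioBuffer in the decode callback above (the printed values are just the ones from this example):

console.log(audioBuffer.numberOfChannels); // 2
console.log(audioBuffer.sampleRate);       // 44100
console.log(audioBuffer.duration);         // 196.8 (seconds)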

As you can see from the code above, the key hub for audio processing with an AudioContext is the AudioNode. We used an AudioBufferSourceNode above, whose data comes from a fully decoded buffer. Other AudioNode subclasses include GainNode (for setting volume), BiquadFilterNode (for filtering), ScriptProcessorNode (which provides an onaudioprocess callback for analyzing and processing audio data), MediaStreamAudioSourceNode (for connecting to a microphone device), and so on. These nodes can be chained together with connect in a decorator-like pattern: for example, the bufferSourceNode in the code above could connect to a gainNode first, and the gainNode then connects to the speakers, letting you adjust the volume.

As shown in the figure below:

These nodes are created with the audioContext’s factory functions, for example calling createGain to create a GainNode.
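
As a small sketch of that chaining, reusing the audioNode and audioContext from the play function above (the 0.5 gain value is just an example):

// Create a GainNode with the factory function
let gainNode = audioContext.createGain();
gainNode.gain.value = 0.5; // Halve the volume
// bufferSourceNode -> gainNode -> speakers
audioNode.connect(gainNode);
gainNode.connect(audioContext.destination);
audioNode.start(0);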

All of this is in preparation for recording, which requires ScriptProcessorNode.

3. Realization of recording

The music played above comes from a local audio file; for recording, the source is the microphone. To access the microphone and get its data, we use WebRTC’s getUserMedia, as shown in the following code:

<button onclick="record()">Start recording</button>
<script>
function record () {
    window.navigator.mediaDevices.getUserMedia({
        audio: true
    }).then(mediaStream => {
        console.log(mediaStream);
        beginRecord(mediaStream);
    }).catch(err => {
        // If the user's computer has no microphone, or the user refuses,
        // or the connection fails, etc., an exception is thrown,
        // and err.name tells you what type of error it is
        console.error(err);
    });
}
</script>

When calling getUserMedia, we specify that we want to record audio. If you also want to record video, add video: true. You can also specify constraints for the recorded audio:

window.navigator.mediaDevices.getUserMedia({
    audio: {
        sampleRate: 44100, // sample rate
        channelCount: 2,   // number of channels
        volume: 1.0        // volume
    }
}).then(mediaStream => {
    console.log(mediaStream);
});

When it is called, the browser pops up a prompt asking whether the user allows use of the microphone:

If the user refuses, an exception is thrown, which can be caught in the catch; if everything is in order, a MediaStream object is returned:

It is an abstraction of the audio stream. The stream is used to initialize a MediaStreamAudioSourceNode object, which is then connected to a ScriptProcessorNode; in that node’s onaudioprocess callback we get the audio data and save it, and that saved data is the recording.

If you want to play the recorded sound directly, you simply connect it to the speaker, as shown in the following code:

function beginRecord (mediaStream) {
    let audioContext = new (window.AudioContext || window.webkitAudioContext);
    let mediaNode = audioContext.createMediaStreamSource(mediaStream);
    // It will play automatically after connect
    mediaNode.connect(audioContext.destination);
}

But if you record and play at the same time without headphones, the sound will feed back into the microphone and echo, so we won’t play it here.

To get the recorded sound data, we instead connect it to a ScriptProcessorNode, so first create one:

function createJSNode (audioContext) {
    const BUFFER_SIZE = 4096;
    const INPUT_CHANNEL_COUNT = 2;
    const OUTPUT_CHANNEL_COUNT = 2;
    // createJavaScriptNode is deprecated
    let creator = audioContext.createScriptProcessor || audioContext.createJavaScriptNode;
    creator = creator.bind(audioContext);
    return creator(BUFFER_SIZE,
                    INPUT_CHANNEL_COUNT, OUTPUT_CHANNEL_COUNT);
}

This object is created with createScriptProcessor, which takes three parameters: the buffer size, typically 4096 sample frames, and the number of input and output channels, both set to 2 here for stereo. The node has two buffers, an inputBuffer and an outputBuffer, both AudioBuffer instances. In the onaudioprocess callback you can take the data out of the inputBuffer, process it, and write it into the outputBuffer, as shown in the figure below:

For example, we could connect the bufferSourceNode used to play audio in step 1 to the jsNode, and connect the jsNode to the speakers, then process the sound data in 4096-frame batches (for example, noise reduction) in the process callback. Each time the speakers have consumed one outputBuffer, the callback fires again, so the process callback is triggered continuously.
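
A rough sketch of that playback-processing setup might look like the following, where the half-volume multiplication is just a stand-in for real processing such as noise reduction:

// bufferSourceNode -> jsNode -> speakers
audioNode.connect(jsNode);
jsNode.connect(audioContext.destination);
jsNode.onaudioprocess = function (event) {
    let inputBuffer = event.inputBuffer,
        outputBuffer = event.outputBuffer;
    for (let channel = 0; channel < outputBuffer.numberOfChannels; channel++) {
        let inputData = inputBuffer.getChannelData(channel),
            outputData = outputBuffer.getChannelData(channel);
        // Process each 4096-frame block before it reaches the speakers
        for (let i = 0; i < inputData.length; i++) {
            outputData[i] = inputData[i] * 0.5;
        }
    }
};
audioNode.start(0);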

In the recording example, we connect the mediaNode to this jsNode, take the recording data in the callback, and keep pushing it into an array until the recording stops. The code looks like this:

function onAudioProcess (event) {
    console.log(event.inputBuffer);
}
function beginRecord (mediaStream) {
    let audioContext = new (window.AudioContext || window.webkitAudioContext);
    let mediaNode = audioContext.createMediaStreamSource(mediaStream);
    // Create a jsNode
    let jsNode = createJSNode(audioContext);
    // The process callback can be triggered only when the outputBuffer is consumed by connecting to the speaker
    // And because the outputBuffer is not set, the speaker will not play any sound
    jsNode.connect(audioContext.destination);
    jsNode.onaudioprocess = onAudioProcess;
    // Connect mediaNode to jsNode
    mediaNode.connect(jsNode);
}

If we print out the inputBuffer, we can see that each chunk is about 0.09 s long:

That is, the callback fires roughly every 0.09 seconds (4096 frames / 44100 Hz ≈ 0.093 s). The next step is to keep saving the recorded data in the process callback; the following code gets the data for the left and right channels separately:

function onAudioProcess (event) {
    let audioBuffer = event.inputBuffer;
    let leftChannelData = audioBuffer.getChannelData(0),
        rightChannelData = audioBuffer.getChannelData(1);
    console.log(leftChannelData, rightChannelData);
}

It prints a Float32Array, where each element is a 32-bit single-precision floating-point number, as shown below:

The question is, what does the recorded data actually represent? It is a sampled record of the sound’s intensity: the microphone converts sound waves into electrical signals of varying strength, and these numbers represent the strength of that signal. Their range is [-1, 1], a relative ratio.
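
As an aside, one way to convince yourself that these are relative amplitudes is to compute a rough loudness value for each block; this little helper is only an illustration and is not part of the recording flow:

function blockVolume (channelData) {
    let sum = 0;
    for (let i = 0; i < channelData.length; i++) {
        sum += channelData[i] * channelData[i];
    }
    // Root mean square of the samples, also in the [0, 1] range
    return Math.sqrt(sum / channelData.length);
}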

Then push the data into two arrays:

let leftDataList = [],
    rightDataList = [];
function onAudioProcess (event) {
    let audioBuffer = event.inputBuffer;
    let leftChannelData = audioBuffer.getChannelData(0),
        rightChannelData = audioBuffer.getChannelData(1);
    // The buffers are reused between callbacks, so clone the data
    leftDataList.push(leftChannelData.slice(0));
    rightDataList.push(rightChannelData.slice(0));
}

Finally, add a stop recording button and respond to the operation:

function stopRecord () {
    // Stop recording
    mediaStream.getAudioTracks()[0].stop();
    mediaNode.disconnect();
    jsNode.disconnect();
    console.log(leftDataList, rightDataList);
}

The saved data is printed like this:

It is an ordinary Array containing many Float32Arrays, which we then merge into a single Float32Array:

function mergeArray (list) {
    let length = list.length * list[0].length;
    let data = new Float32Array(length),
        offset = 0;
    for (let i = 0; i < list.length; i++) {
        data.set(list[i], offset);
        offset += list[i].length;
    }
    return data;
}
function stopRecord () {
    // Stop recording
    let leftData = mergeArray(leftDataList),
        rightData = mergeArray(rightDataList);
}

Why not use a single Float32Array from the start? Because it isn’t easy to grow: we don’t know the total length up front, since we don’t know how long the recording will be, so it’s easier to merge everything once the recording ends.

Then merge the left and right channel data. In WAV format the samples are not stored as all of the left channel followed by all of the right channel; instead, one left-channel sample and one right-channel sample are interleaved, as shown in the following code:

// Cross-merge the left and right channels
function interleaveLeftAndRight (left, right) {
    let totalLength = left.length + right.length;
    let data = new Float32Array(totalLength);
    for (let i = 0; i < left.length; i++) {
        let k = i * 2;
        data[k] = left[i];
        data[k + 1] = right[i];
    }
    return data;
}

Finally, create the WAV file. First write the WAV header, which records the channel count, sample rate, bit depth, and so on, as shown in the following code:

function createWavFile (audioData) {
    const WAV_HEAD_SIZE = 44;
    let buffer = new ArrayBuffer(audioData.length * 2 + WAV_HEAD_SIZE),
        // We need a DataView to manipulate the buffer
        view = new DataView(buffer);
    // Write the wav header
    // RIFF chunk descriptor/identifier
    writeUTFBytes(view, 0, 'RIFF');
    // RIFF chunk length (file size minus the 8 bytes already written)
    view.setUint32(4, 36 + audioData.length * 2, true);
    // RIFF type
    writeUTFBytes(view, 8, 'WAVE');
    // format chunk identifier
    // FMT sub-chunk
    writeUTFBytes(view, 12, 'fmt ');
    // format chunk length
    view.setUint32(16, 16, true);
    // sample format (raw PCM)
    view.setUint16(20, 1, true);
    // stereo (2 channels)
    view.setUint16(22, 2, true);
    // sample rate
    view.setUint32(24, 44100, true);
    // byte rate (sample rate * block align)
    view.setUint32(28, 44100 * 2 * 2, true);
    // block align (channel count * bytes per sample)
    view.setUint16(32, 2 * 2, true);
    // bits per sample
    view.setUint16(34, 16, true);
    // data sub-chunk
    // data chunk identifier
    writeUTFBytes(view, 36, 'data');
    // data chunk length
    view.setUint32(40, audioData.length * 2, true);
}
function writeUTFBytes (view, offset, string) {
    var lng = string.length;
    for (var i = 0; i < lng; i++) {
        view.setUint8(offset + i, string.charCodeAt(i));
    }
}

Next, we write the recording data at 16-bit depth, i.e. 16 binary bits represent the intensity of each sample. The 16-bit signed range is [-32768, 32767], with a maximum of 32767 (0x7FFF), while the recording data is in the range [-1, 1], a relative ratio. Multiplying that ratio by the maximum value gives the actual value to store. The code looks like this:

function createWavFile (audioData) {
    // Write the wav header as above
    // Write PCM data
    let length = audioData.length;
    let index = 44;
    let volume = 1;
    for (let i = 0; i < length; i++) {
        view.setInt16(index, audioData[i] * (0x7FFF * volume), true);
        index += 2;
    }
    return buffer;
}

Finally, you can use the generated blob URL mentioned in point 1 to play the recorded sound, as shown in the following code:

function playRecord (arrayBuffer) {
    let blob = new Blob([new Uint8Array(arrayBuffer)]);
    let blobUrl = URL.createObjectURL(blob);
    document.querySelector('.audio-node').src = blobUrl;
}
function stopRecord () {
    // Stop recording
    let leftData = mergeArray(leftDataList),
        rightData = mergeArray(rightDataList);
    let allData = interleaveLeftAndRight(leftData, rightData);
    let wavBuffer = createWavFile(allData);
    playRecord(wavBuffer);
}

Or upload the Blob using FormData.
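
A minimal upload sketch could look like this; the '/upload' endpoint and the 'audio' field name are placeholders, not part of the original code:

function uploadRecord (wavBuffer) {
    // Wrap the WAV ArrayBuffer in a Blob and attach it to a FormData
    let blob = new Blob([new Uint8Array(wavBuffer)], { type: 'audio/wav' });
    let formData = new FormData();
    formData.append('audio', blob, 'record.wav');
    // Send it with fetch (or any Ajax library)
    fetch('/upload', { method: 'POST', body: formData })
        .then(response => console.log('upload status:', response.status))
        .catch(err => console.error(err));
}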

That basically completes the implementation of recording; the code references the RecordRTC recording library.

4. Summary

To review, the overall process looks like this:

Call WebRTC’s getUserMedia to get an audio stream, use the stream to initialize a mediaNode, connect it to a jsNode, and collect the recording data in the jsNode’s process callback. After stopping the recording, convert that data into 16-bit integers, write the WAV header, and generate an in-memory buffer for the WAV audio file. Wrap the buffer in a Blob, generate a URL for it, and you can play it locally or upload it with FormData. The process isn’t very complicated once you understand it.

This article touched on the WebRTC and AudioContext APIs, focusing on the overall model of AudioContext. We saw that audio data is really a record of how strong or weak the sound is at each sample, stored at 16-bit depth by multiplying each sample by the maximum 16-bit integer. We also used Blob and URL.createObjectURL to create a local link to in-memory data.

At the end of its pipeline, the RecordRTC library uses a Web Worker to merge the left and right channel data and generate the WAV file, which can further improve efficiency and avoid the page freezing when processing a large recording.
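
As a rough sketch of that idea (not RecordRTC’s actual code), the merging and WAV generation could be moved into a worker, assuming mergeArray, interleaveLeftAndRight, and createWavFile are defined in a hypothetical worker.js:

// Main thread
let worker = new Worker('worker.js');
worker.onmessage = function (event) {
    // The finished WAV ArrayBuffer comes back from the worker
    playRecord(event.data);
};
function stopRecord () {
    mediaStream.getAudioTracks()[0].stop();
    mediaNode.disconnect();
    jsNode.disconnect();
    worker.postMessage({ left: leftDataList, right: rightDataList });
}

// worker.js
self.onmessage = function (event) {
    let leftData = mergeArray(event.data.left),
        rightData = mergeArray(event.data.right);
    let wavBuffer = createWavFile(interleaveLeftAndRight(leftData, rightData));
    // Transfer the buffer back to the main thread without copying it
    self.postMessage(wavBuffer, [wavBuffer]);
};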