With the expansion of the company's product line, this year has been a hard-fought battle with browser recording. I ran into plenty of strange problems and picked up a few extra skills along the way. Here is a write-up to share with anyone who has the same headache.

Parsing base64 PCM data for playback

This scenario does come up. When the page talks to the server directly over WebSocket there is usually no problem, but when interacting with a native application only strings can be passed, so base64 is used to keep the data intact.

  1. Parse base64 into an ArrayBuffer.

    function base642ArrayBuffer(base64) {
      const binary_string = window.atob(base64); // decode the base64 string
      const len = binary_string.length;
      const bytes = new Uint8Array(len);
      for (let i = 0; i < len; i++) {
        bytes[i] = binary_string.charCodeAt(i);
      }
      // Without `.buffer`, a Uint8Array is returned
      // A Uint8Array can be silenced with fill(0), but the raw ArrayBuffer cannot
      return bytes.buffer;
    }
  2. Browsers cannot play raw PCM data, and if it is "inconvenient" for the backend server to prepend a WAV header, we have to build the 44-byte WAV header ourselves.

      function buildWaveHeader(opts) {
        const numFrames = opts.numFrames;
        const numChannels = opts.numChannels || 1;
        const sampleRate = opts.sampleRate || 16000; // Sample rate 16000
        const bytesPerSample = opts.bytesPerSample || 2; // Bit depth is 2 bytes
        const blockAlign = numChannels * bytesPerSample;
        const byteRate = sampleRate * blockAlign;
        const dataSize = numFrames * blockAlign;
    
        const buffer = new ArrayBuffer(44);
        const dv = new DataView(buffer);
    
        let p = 0;
    
        p = writeString('RIFF', dv, p); // ChunkID
        p = writeUint32(dataSize + 36, dv, p); // ChunkSize
        p = writeString('WAVE', dv, p); // Format
        p = writeString('fmt ', dv, p); // Subchunk1ID
        p = writeUint32(16, dv, p); // Subchunk1Size
        p = writeUint16(1, dv, p); // AudioFormat (1 = PCM)
        p = writeUint16(numChannels, dv, p); // NumChannels
        p = writeUint32(sampleRate, dv, p); // SampleRate
        p = writeUint32(byteRate, dv, p); // ByteRate
        p = writeUint16(blockAlign, dv, p); // BlockAlign
        p = writeUint16(bytesPerSample * 8, dv, p); // BitsPerSample
        p = writeString('data', dv, p); // Subchunk2ID
        p = writeUint32(dataSize, dv, p); // Subchunk2Size
    
        return buffer;
      }
      function writeString(s, dv, p) {
        for (let i = 0; i < s.length; i++) {
          dv.setUint8(p + i, s.charCodeAt(i));
        }
        p += s.length;
        return p;
      }
      function writeUint32(d, dv, p) {
        dv.setUint32(p, d, true);
        p += 4;
        return p;
      }
      function writeUint16(d, dv, p) {
        dv.setUint16(p, d, true);
        p += 2;
        return p;
      }
  3. Concatenate the header and the PCM data.

    concatenate(header, pcmTTS);
    function concatenate(buffer1, buffer2) {
      const tmp = new Uint8Array(buffer1.byteLength + buffer2.byteLength);
      tmp.set(new Uint8Array(buffer1), 0);
      tmp.set(new Uint8Array(buffer2), buffer1.byteLength);
      return tmp.buffer;
    }
  4. The result can now be decoded into a playable audio buffer, from which the duration can be read; if there are multiple PCM streams they can be concatenated and spliced the same way.

    audioCtx.decodeAudioData(tts, (buffer) => { /* store the buffer for playback */ });
    // buffer.duration can be used to get the playback duration
  5. Play.

    const source = audioCtx.createBufferSource();
    const gainNode = audioCtx.createGain();
    source.buffer = buffer;
    // Ease the gain toward 0.1 starting 2.5s from now (the third argument, the time constant, is required)
    gainNode.gain.setTargetAtTime(0.1, audioCtx.currentTime + 2.5, 0.1);
    source.connect(gainNode);
    gainNode.connect(audioCtx.destination);
    // offset can simply be buffer.duration, or the spliced length you calculated yourself;
    // audioCtx.currentTime must be added
    source.start(offset + audioCtx.currentTime);
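Tying the five steps together, here is a minimal end-to-end sketch, assuming the `base642ArrayBuffer`, `buildWaveHeader` and `concatenate` helpers above (`base64Pcm` and the 16 kHz / 16-bit mono parameters are just illustrative):

    const audioCtx = new (window.AudioContext || window.webkitAudioContext)();
    const pcm = base642ArrayBuffer(base64Pcm);      // 1. base64 -> ArrayBuffer
    const header = buildWaveHeader({                // 2. 44-byte WAV header
      numFrames: pcm.byteLength / 2,                //    16-bit mono => 2 bytes per frame
      numChannels: 1,
      sampleRate: 16000,
      bytesPerSample: 2,
    });
    const wav = concatenate(header, pcm);           // 3. header + PCM
    audioCtx.decodeAudioData(wav, (buffer) => {     // 4. decode into an AudioBuffer
      const source = audioCtx.createBufferSource(); // 5. play
      source.buffer = buffer;
      source.connect(audioCtx.destination);
      source.start(audioCtx.currentTime);
    });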

Serious noise and echo when recording while the phone plays audio out loud

Among the business requirements there is a tricky one. Our product simulates a conversation between a robot and the user: the robot's audio is played back while the user's speech is recorded, and for functional reasons the recording must keep running while the robot is speaking, so that the user can interrupt (barge in on) the robot. This scheme performs well when the user wears headphones, but the customized requirement was that the phone plays the audio out loud through its speaker, with no headset allowed. At that point things fell apart and we spent a lot of time researching how to make it work (there is not much of an outcome here, but I can at least sort out what I learned).

  1. **Using VAD as noise-reduction logic.** VAD is voice activity detection: it detects whether speech is present, which is not the same thing as noise reduction. However, a VAD module can help a little, because its rough assumption is that the human voice is louder than the ambient sound, so it drops low-energy audio (see the sketch after this list). It is not real noise reduction.
  2. How noise reduction is actually achieved. Because noise is acoustically hard to separate from the human voice, pure software algorithms struggle; the usual approach is to filter once in hardware and then again in software, since noise is very hard to remove after it has been captured. In short, noise reduction mainly depends on the hardware (a microphone array), and after testing we found different phones behave very differently: with speaker playback, devices such as the Huawei Mate 20 Pro and iPhone X still produce obvious echo. In the end we were forced to cut the requirement for the sake of the experience (only record when it is the user's turn to speak); we may keep investigating later. The third-party solution we contacted could not be integrated and verified in time, so we cannot say this direction is completely infeasible.
  3. Is there no dedicated noise-reduction algorithm? There certainly is. On native Android and iOS, noise reduction and even echo cancellation can be done locally by calling the low-level APIs, which amounts to shipping the algorithm inside the app as an SDK. On the web, however, there is no such option; you have to send the audio to the cloud and do algorithmic noise reduction there.
  4. Best practice. Despite the limitations of the algorithms, we should still guide customers to wear a microphone. In our scenario the ASR quality has to be very high to get a good dialogue and extract the right information, and physical noise reduction greatly reduces the pressure on the algorithm; after all, the hardware can also do active noise cancellation (like the active noise-cancelling headphones on the market).
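As a rough illustration of the "voice is louder than the environment" assumption behind that VAD-style filtering (not our production code; real VAD modules are far more sophisticated, and the threshold here is an arbitrary assumption), a minimal energy gate over 16-bit PCM frames might look like this:

    // Returns true if the frame's RMS energy exceeds a fixed threshold,
    // i.e. we assume it contains speech rather than background noise.
    function isVoiceFrame(int16Frame, threshold = 0.02) {
      let sum = 0;
      for (let i = 0; i < int16Frame.length; i++) {
        const sample = int16Frame[i] / 32768; // normalize to [-1, 1)
        sum += sample * sample;
      }
      const rms = Math.sqrt(sum / int16Frame.length);
      return rms > threshold;
    }

    // Frames judged as silence/noise could then be dropped or zero-filled.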

WebView

For various reasons, we started to work out how the native app would interact with the web page (the native app records the audio and hands the PCM stream to the page through callback methods). This was my first time working with native apps, and since we don't have dedicated developers for this yet, I probably stumbled over some common-sense questions. Here is a tidy summary:

Native application and WebView interaction

I used to think the interaction would involve fancy things like callbacks. After actually integrating, it turns out that calls between the two sides can only be simple method invocations. This creates a problem: the methods we expose for the native app have to be bound on `window`, and those methods do not have the Vue component's `this` context. So to get the native app's PCM data into the Vue instance for processing, I wrote a simple event subscriber (publish/subscribe) pattern: the component subscribes, and the window-bound method publishes notifications, as sketched below.
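A minimal sketch of that bridge, assuming illustrative names (`emitter`, `onPcmData` and `handlePcm` are not our actual identifiers):

    // A tiny publish/subscribe hub shared by the window-bound bridge and the Vue code
    const emitter = {
      handlers: {},
      on(event, fn) {
        (this.handlers[event] = this.handlers[event] || []).push(fn);
      },
      emit(event, payload) {
        (this.handlers[event] || []).forEach((fn) => fn(payload));
      },
    };

    // The native app can only call a plain method bound on window
    window.onPcmData = (base64Pcm) => emitter.emit('pcm', base64Pcm);

    // Inside the Vue component, `this` is available again:
    // mounted() {
    //   emitter.on('pcm', (base64Pcm) => this.handlePcm(base64Pcm));
    // }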

How to view console logs

Probably the biggest problem after embedding in a WebView is how to see the console logs. The approach we use now is still fairly painful: I installed the iOS and Android development tools, had my colleagues help me set up the environments, and from there the debugging is up to me. One benefit of this approach is that when I run into small issues (even ones involving native changes), I can look into them and tweak small bits of logic myself without depending on others, which improves my efficiency.

There is also a pitfall here: when debugging Android through Chrome remote debugging you may get a 404. You actually need a proxy/VPN to reach the outside Internet before the inspector loads normally; otherwise, nothing you do will help.

Permission-related considerations

Embedding a WebView in a native application raises many permission issues: whether localStorage is allowed, whether invalid security certificates are allowed (local developers forge certificates to simulate HTTPS), whether recording is allowed, whether HTTPS pages may load HTTP resources, even audio playback, and so on. My approach is to describe to my native colleagues as clearly as possible what my page will do, and then let them decide which permissions to grant.

iOS: love and hate

There are a lot of pitfalls on iOS; hopefully this will help you get past them.

WKWebView does not support web recording

This one was a real pain. At the start I sent over a snippet to verify browser compatibility and asked my colleagues (who write the native apps that embed our WebView) to run a simple compatibility test so we could finalize the plan. We apparently did not communicate it well: only when we tried to embed our page close to the launch date did we find that recording was not supported there. Having stepped that far into the hole, there was no choice but a temporary fallback: use native recording inside the embedded iOS WebView, and keep using web recording in every other environment.

**To summarize: on iOS 12 (the latest version), Safari supports web recording, but it is not supported in a WKWebView (a WebView embedded in a native app).** I have seen people on GitHub saying since iOS 11 that they hoped iOS 12 would support this; for us, a solution with such poor compatibility was abandoned without mercy.
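A hedged sketch of the resulting fallback decision: detect whether `getUserMedia` actually exists, otherwise hand recording over to the native side (`window.nativeBridge.startRecord` is an illustrative name for whatever method the native app exposes):

    function canRecordOnWeb() {
      return !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
    }

    function startRecording() {
      if (canRecordOnWeb()) {
        // Regular web recording path
        return navigator.mediaDevices.getUserMedia({ audio: true });
      }
      // Embedded iOS WKWebView: ask the native side to record and
      // push PCM back to the page through the window-bound bridge
      window.nativeBridge && window.nativeBridge.startRecord();
    }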

Safari throws `null is not an object` after audioCtx.xxx is called several times

In our code a new audioContext instance was created every time we recorded or played a robot utterance, and in Chrome that works no matter how many times you do it. In Safari, however, the page could not be operated more than about five times: on the fifth operation an error was thrown, with audioContext.sampleRate coming back as null. The call fails because Safari will not let you create more than roughly six AudioContexts; beyond that, creation effectively returns null. Combined with our count of about five (the number may be slightly off), it was easy to see where the problem was: our audio instances were never destroyed properly. Simply setting `audioCtx = null` does not release them for garbage collection. Going back to the MDN documentation, I found this method:

AudioContext.close();

Closes an audio context, releasing any system audio resources that it uses.

So the fix was to replace `audioContext = null` with `audioContext.close()`.
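A minimal sketch of the pattern, assuming one `audioCtx` reference per recording/playback session:

    let audioCtx = new (window.AudioContext || window.webkitAudioContext)();

    function releaseAudioContext() {
      if (audioCtx && audioCtx.state !== 'closed') {
        // close() releases the underlying system audio resources;
        // simply nulling the reference does not
        audioCtx.close();
      }
      audioCtx = null;
    }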

Audio duration shows Infinity in Safari

In Safari, after the audio file fetched from the remote server is set on an audio tag, the total time is displayed as Infinity, while Chrome has no such problem, so we set out to locate the issue. First I found the article "audio.duration returns Infinity on Safari when MP3 is served from PHP"; the key takeaway is that the problem is most likely caused by the response headers. So I tried dropping the remote recording file into the static directory served by Egg, to see how the headers should look, and was pleasantly surprised to find that Egg's static-file middleware works perfectly in Safari. That basically confirmed the remote server was not handling the headers properly. Looking at the MDN documentation for duration as well, you can tell that Chrome does the work for you (it figures out a usable length), whereas in Safari the server has to provide it.

A double. If the media data is available but the length is unknown, this value is NaN. If the media is streamed and has no predefined length, the value is Inf.

The "predefined length" here essentially means the content length the server reports in its response headers.

The reason behind why Safari returns duration as Infinity is quite interesting: it appears that Safari requests the server twice for playing files. First it sends a range request to the server with a range header like this: (bytes: 0-1). If the server doesn't return the response as partial content and instead returns the entire stream, then Safari will not set audio.duration, which results in the file playing only once and not being playable again.

In other words, Safari sends at least two requests when fetching an audio resource. The first is a range request like `bytes=0-1`; if the server does not answer it with just those bytes (a partial-content response), Safari will not parse the full audio data on the following request and the audio tag loses its functionality. On the server side, we can handle that request in this crude way:

    const fs = require('fs'); // in practice, required at the top of the controller file
    const { ctx } = this;
    const file = fs.readFileSync('./record.mp3');
    ctx.set('Content-Type', 'audio/mpeg');

    if (ctx.headers.range === 'bytes=0-1') {
      ctx.status = 206; // partial content
      ctx.set('Content-Range', `bytes 0-1/${file.length}`);
      ctx.body = file.slice(0, 2); // bytes 0-1 inclusive = the first two bytes
    } else {
      ctx.body = file;
    }

Of course, this is a very rough way to handle it. I looked at the implementation of the koa middleware static-cache, which works fine in Safari but contains nothing like the code above, so in my view the code above is a bit of a hack. We have not found the properly "correct" way to solve this yet.

/deep/ selector is not supported

There is no graceful solution to this for now. The only workaround is to move the styles that need to reach into child components out into a style tag without the scoped attribute; I have not found a smoother compatible approach yet.

Stopping native recording on iOS caused the WKWebView to enter a suspended state (route jumps, requests, etc. stopped working)

This turned out to be a problem in the native recording code, but it cost a lot of time because I assumed it was a front-end problem. Recording it here in case anyone else steps into the same pit.

In our project, ending a session triggers a series of save operations and route jumps. After integrating the iOS native recording, I found that although the page showed the request as sent, the backend never received it. The culprit was the way the iOS recording was stopped, which apparently left the page stuck on some task-queue-related work (even a plain console.log would hang).

Here is the small fix on the iOS side:

// Stop the recording queue, dispose of the buffers, and close the session, regardless of success
AudioQueueStop(_audioQueue, false);
// Dispose of the queue and its buffers: true means do it immediately, false means finish processing the queued buffers first
AudioQueueDispose(_audioQueue, false);

Calling stop() on the buffer source (audioCtx.createBufferSource()) correctly

After embedding in the WebView, when the page is interrupted we need to stop all audio that is currently playing. On iOS this method ends up being called repeatedly for various reasons, and the repeated call throws an error. For this kind of error the simplest `try {} catch {}` was chosen: after testing the various cases, there were no other problems.
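A minimal sketch of that guard (`source` stands for whichever AudioBufferSourceNode is currently playing):

    function safeStop(source) {
      if (!source) return;
      try {
        // Calling stop() on a source that has already stopped (or never started)
        // throws on some platforms, so swallow the error
        source.stop();
      } catch (e) {
        // Ignore: the source is already stopped
      }
    }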


Postscript: I have actually done a lot more during this period, such as WebRTC-related work, but have not had time to write it up. If you are interested, I may sort it out later~