Practice! Pure front-end audio clip processing

Preface

Recently, I was working on a project that needed to process audio recorded with WebRTC, including clipping audio, merging multiple audio clips, and even replacing part of one audio clip with another.

I originally intended to hand this work over to the server side. In practice, though, the work is much the same whether it runs on the front end or the back end, and doing it on the server adds an extra round trip to upload and download the audio, which increases both the load on the server and the cost of network traffic. So an idea came up: why can't the front end do the audio processing itself?

So, half groping and half practicing, I ended up with this article. The result is a front-end audio clipping SDK that works out of the box.

FFmpeg

FFmpeg is the core module that makes front-end audio processing possible. It is not limited to the front end, of course: as a mature, complete solution for recording, converting, and streaming audio and video, FFmpeg is also used on servers, in apps, and in other scenarios. There are plenty of introductions to FFmpeg online, so I won't say much about it here.

Since FFmpeg requires a lot of computation, it is not practical to run it directly on the page's main thread. Instead, we need to start a separate Web Worker and run FFmpeg inside it so that it does not block page interaction.

Happily, developers on GitHub have already provided ffmpeg.js, along with a worker version that you can use directly.

So we have a general idea: once we get the audio file, we convert it into an arrayBuffer and send it to the worker for processing, and the worker returns the result through an event, so that we can do whatever we want with the audio 🙂

Groundwork before starting the journey

Note that the code examples below, and the code in the repository, mainly target MP3, since my project only needs to process the .mp3 format. Of course, the idea is much the same regardless of the format.

Create a worker

Creating a worker is very simple: just call new. Note that, because of the same-origin policy, the worker script must share the same origin as the parent page in order to work properly. Since this is not the key point, I'll skip the details.

function createWorker(workerPath: string) {
  const worker = new Worker(workerPath);
  return worker;
}
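For readers who want to see what "same origin" means in practice, here is a minimal sketch that compares the origin of a worker script with the origin of the page. The helper name isSameOrigin, the file name worker.js, and the example URLs are all made up for illustration; they are not part of the SDK.

```javascript
// Hypothetical helper (not part of the original SDK): checks whether a worker
// script URL shares its origin with the page that wants to load it.
function isSameOrigin(workerPath, pageUrl) {
  // Relative paths are resolved against the page URL, as a browser would do.
  const workerUrl = new URL(workerPath, pageUrl);
  return workerUrl.origin === new URL(pageUrl).origin;
}

const page = "https://example.com/app/index.html";
console.log(isSameOrigin("./worker.js", page)); // true
console.log(isSameOrigin("https://cdn.example.org/worker.js", page)); // false
```

A check like this could run before createWorker to fail early with a readable message, instead of waiting for the browser to refuse the script.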

Turning postMessage into a promise

If you read the ffmpeg.js documentation closely, you will notice that the worker emits events to the parent page at different stages of audio processing, such as stdout, start, and done. If you attach callbacks to these events directly, the logic for separating and handling the audio results ends up inside the callbacks and becomes hard to maintain. I prefer to convert it into a promise:

function pmToPromise(worker, postInfo) {
  return new Promise((resolve, reject) => {
    // Success callback
    const successHandler = function(event) {
      switch (event.data.type) {
        case "stdout":
          console.log("worker stdout: ", event.data.data);
          break;

        case "start":
          console.log("worker receive your command and start to work:)");
          break;

        case "done":
          // Remove both listeners so they do not pile up across calls
          worker.removeEventListener("message", successHandler);
          worker.removeEventListener("error", failHandler);
          resolve(event);
          break;

        default:
          break;
      }
    };

    // Exception catch
    const failHandler = function(error) {
      worker.removeEventListener("message", successHandler);
      worker.removeEventListener("error", failHandler);
      reject(error);
    };

    worker.addEventListener("message", successHandler);
    worker.addEventListener("error", failHandler);
    postInfo && worker.postMessage(postInfo);
  });
}

With this layer of conversion, a postMessage request becomes a promise that we can handle like any other, which makes the code easier to extend.
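To see the conversion at work without loading ffmpeg at all, the sketch below drives a condensed copy of pmToPromise with a stand-in worker. FakeWorker is purely hypothetical: it is an EventTarget that answers every posted message with a "start" event followed by a "done" event. It only illustrates the event flow and says nothing about what the real worker returns.

```javascript
// Condensed copy of pmToPromise so that this sketch is self-contained.
function pmToPromise(worker, postInfo) {
  return new Promise((resolve, reject) => {
    const cleanup = () => {
      worker.removeEventListener("message", onMessage);
      worker.removeEventListener("error", onError);
    };
    const onMessage = (event) => {
      // Only "done" settles the promise; other event types are ignored here.
      if (event.data.type === "done") {
        cleanup();
        resolve(event);
      }
    };
    const onError = (error) => {
      cleanup();
      reject(error);
    };
    worker.addEventListener("message", onMessage);
    worker.addEventListener("error", onError);
    postInfo && worker.postMessage(postInfo);
  });
}

// Hypothetical stand-in for the real worker: replies "start", then "done",
// echoing the posted command back in the payload.
class FakeWorker extends EventTarget {
  postMessage(message) {
    for (const type of ["start", "done"]) {
      const event = new Event("message");
      event.data = { type, data: { echoed: message } };
      this.dispatchEvent(event);
    }
  }
}

const demoResult = pmToPromise(new FakeWorker(), { type: "run" });
demoResult.then((event) => console.log(event.data.type)); // logs "done"
```

Because the promise settles only on "done", callers can simply await the call and read the result from the resolved event.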

Converting between audio, blob, and arrayBuffer

The data format required by ffmpeg-worker is arrayBuffer, whereas what we usually have at hand is an audio file object (blob), an audio element object (audio), or perhaps just a URL. So we need to convert between these formats:

audio to arrayBuffer

import axios from 'axios';

// Note: despite its name, this helper resolves to an arrayBuffer
function audioToBlob(audio) {
  const url = audio.src;
  if (url) {
    return axios({
      url,
      method: 'get',
      responseType: 'arraybuffer',
    }).then(res => res.data);
  } else {
    return Promise.resolve(null);
  }
}

To turn an audio element into an arrayBuffer, I make an Ajax request for its src with the response type set to arraybuffer, and the response data is the arrayBuffer.

blob to arrayBuffer

This is just as simple: read the blob's content with FileReader.

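function blobToArrayBuffer(blob) {
  return new Promise((resolve, reject) => {
    const fileReader = new FileReader();
    fileReader.onload = function() {
      resolve(fileReader.result);
    };
    // Reject instead of hanging forever if the read fails
    fileReader.onerror = function() {
      reject(fileReader.error);
    };
    fileReader.readAsArrayBuffer(blob);
  });
}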
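As an aside, runtimes that implement Blob.prototype.arrayBuffer() can do the same conversion without FileReader. The sketch below assumes such a runtime; the function name blobToArrayBufferModern and the three sample bytes are made up for illustration and are not part of the SDK.

```javascript
// Alternative to the FileReader version, assuming Blob.prototype.arrayBuffer()
// is available in the target runtime.
async function blobToArrayBufferModern(blob) {
  return blob.arrayBuffer();
}

// Three sample bytes spelling "ID3", the marker that opens an ID3v2 tag.
const sampleBytes = new Uint8Array([0x49, 0x44, 0x33]);
const sampleBlob = new Blob([sampleBytes], { type: "audio/mp3" });

// Round trip: bytes -> blob -> arrayBuffer -> plain array of byte values.
const roundTrip = blobToArrayBufferModern(sampleBlob).then((buffer) =>
  Array.from(new Uint8Array(buffer))
);
roundTrip.then((bytes) => console.log(bytes.join(","))); // 73,68,51
```

The FileReader version remains the safer choice when older browsers have to be supported.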

arrayBuffer to blob

Create the blob with File (File inherits from Blob, so a File object can be used wherever a blob is expected).

function audioBufferToBlob(arrayBuffer) {
  const file = new File([arrayBuffer], 'test.mp3', {
    type: 'audio/mp3',
  });
  return file;
}

blob to audio

Converting a blob to audio is very simple: the browser provides a native API, URL.createObjectURL, which turns a blob into a locally accessible link that can be played.

function blobToAudio(blob) {
  const url = URL.createObjectURL(blob);
  return new Audio(url);
}

Now let’s get down to business.

Audio clipping: clip

Clipping means extracting the content of a given audio between a given start time and end time to form a new audio. First, the code:

class Sdk {
  end = "end";

  // other code...

  /**
   * Clip the given audio blob at the specified time positions
   * @param originBlob Audio to be processed
   * @param startSecond Clipping start time (s)
   * @param endSecond Clipping end time (s)
   */
  clip = async (originBlob, startSecond, endSecond) => {
    const ss = startSecond;
    // Get the clipping duration. If endSecond is not passed, clip through to the end of the audio
    const d = isNumber(endSecond) ? endSecond - startSecond : this.end;
    // Convert the blob into an arrayBuffer that can be processed
    const originAb = await blobToArrayBuffer(originBlob);
    let resultArrBuf;

    // Build the command for ffmpeg-worker, send it to the worker, and wait for it to finish clipping
    if (d === this.end) {
      resultArrBuf = (await pmToPromise(
        this.worker,
        getClipCommand(originAb, ss)
      )).data.data.MEMFS[0].data;
    } else {
      resultArrBuf = (await pmToPromise(
        this.worker,
        getClipCommand(originAb, ss, d)
      )).data.data.MEMFS[0].data;
    }

    // Wrap the worker's arrayBuffer as a blob and return it
    return audioBufferToBlob(resultArrBuf);
  };
}

The interface takes three parameters: the audio blob to be clipped, and the start and end times of the clip. Worth noting here is the getClipCommand function, which wraps the incoming arrayBuffer and the times into the data format agreed on with ffmpeg-worker:

/**
 * Convert the clipping data to the format required by the ffmpeg.js documentation
 * @param arrayBuffer Audio buffer to be processed
 * @param st Clipping start time (s)
 * @param duration Clipping duration (s)
 */
function getClipCommand(arrayBuffer, st, duration) {
  return {
    type: "run",
    // Split on spaces so that each argument becomes its own array element
    arguments: `-ss ${st} -i input.mp3 ${
      duration ? `-t ${duration} ` : ""
    }-acodec copy output.mp3`.split(" "),
    MEMFS: [{ data: new Uint8Array(arrayBuffer), name: "input.mp3" }]
  };
}

Merging multiple audio clips: concat

Merging multiple audio clips is straightforward: the clips are joined into a single audio in array order.

class Sdk {
  // other code...

  /**
   * Merge the given audio blobs, in array order, into a single audio blob
   * @param blobs Array of audio blobs to be processed
   */
  concat = async blobs => {
    const arrBufs = [];
  
    for (let i = 0; i < blobs.length; i++) {
      arrBufs.push(await blobToArrayBuffer(blobs[i]));
    }
  
    const result = await pmToPromise(
      this.worker,
      await getCombineCommand(arrBufs),
    );
    return audioBufferToBlob(result.data.data.MEMFS[0].data);
  };
}

In this code, we use a for loop to convert the array of blobs into arrayBuffers one by one. Why not just use the array's own forEach method, since writing a for loop is a bit of a hassle? There is a reason: we use await in the body of the loop and expect every blob to be converted before the code that follows runs. Inside a for loop, await pauses the enclosing async function on each iteration. With forEach, each callback is its own async function whose returned promise forEach ignores, so forEach returns before any conversion has finished and there is nothing for us to await. So forEach would not behave the way we expect.
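The difference is easy to reproduce in isolation. In the sketch below, fakeDecode is a made-up stand-in for blobToArrayBuffer that resolves after a short timer; the two wrapper functions are likewise hypothetical and only exist to contrast the two loop styles.

```javascript
// Hypothetical async step standing in for blobToArrayBuffer.
const fakeDecode = (name) =>
  new Promise((resolve) => setTimeout(() => resolve(`${name}-decoded`), 10));

async function withForLoop(names) {
  const out = [];
  for (let i = 0; i < names.length; i++) {
    out.push(await fakeDecode(names[i])); // pauses here until resolved
  }
  return out; // every item is present
}

async function withForEach(names) {
  const out = [];
  names.forEach(async (name) => {
    out.push(await fakeDecode(name)); // runs later, after forEach has returned
  });
  return out; // still empty at this point
}

const forLoopResult = withForLoop(["a", "b"]);
// Read the length immediately, before the timers get a chance to fire.
const forEachLength = withForEach(["a", "b"]).then((out) => out.length);

forLoopResult.then((out) => console.log(out.join(","))); // a-decoded,b-decoded
forEachLength.then((n) => console.log(n)); // 0
```

If the conversions do not need to run one after another, awaiting Promise.all over blobs.map(blobToArrayBuffer) is another option; its results come back in array order as well.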

The getCombineCommand function plays much the same role as getClipCommand above:

async function getCombineCommand(arrayBuffers) {
  // Convert arrayBuffers to the data format specified by ffmpeg-worker
  const files = arrayBuffers.map((arrayBuffer, index) => ({
    data: new Uint8Array(arrayBuffer),
    name: `input${index}.mp3`,
  }));

  // Create a text file that tells ffmpeg which audio files to merge (a sort of map of these files)
  const txtContent = [files.map(f => `file '${f.name}'`).join('\n')];
  const txtBlob = new Blob(txtContent, { type: 'text/plain' });
  const fileArrayBuffer = await blobToArrayBuffer(txtBlob);

  // Push the text file to the list of files to be sent to ffmpeg-worker
  files.push({
    data: new Uint8Array(fileArrayBuffer),
    name: 'filelist.txt',
  });

  return {
    type: 'run',
    arguments: `-f concat -i filelist.txt -c copy output.mp3`.split(' '),
    MEMFS: files,
  };
}

Unlike clipping, the code above operates on several audio files rather than one, so we need to create a “mapping table” (filelist.txt) that tells ffmpeg-worker which audio files to merge and in what order.
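To make the content of that mapping table concrete, the sketch below builds the same text for two files and encodes it to bytes. It uses TextEncoder instead of the Blob plus FileReader route, which is a swapped-in technique for brevity and not what the SDK code above does; the helper name buildConcatList is made up for illustration.

```javascript
// Hypothetical helper: builds the concat list text and encodes it to UTF-8
// bytes synchronously with TextEncoder.
function buildConcatList(fileNames) {
  const text = fileNames.map((name) => `file '${name}'`).join("\n");
  return { text, data: new TextEncoder().encode(text) };
}

const list = buildConcatList(["input0.mp3", "input1.mp3"]);
console.log(list.text);
// file 'input0.mp3'
// file 'input1.mp3'
console.log(list.data instanceof Uint8Array); // true
```

The order of the lines in the text is the order in which the audio files end up in the merged output.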

Clipping and replacing audio: splice

This is a bit like an upgraded version of clip: we remove a section of audio A at the specified position and insert audio B in its place:

class Sdk {
  end = "end";
  // other code...

  /**
   * Replace the specified section of an audio blob with another audio blob
   * @param originBlob Audio blob to be processed
   * @param startSecond Start time (s)
   * @param endSecond End time (s)
   * @param insertBlob Audio blob to insert in place of the removed section
   */
  splice = async (originBlob, startSecond, endSecond, insertBlob) => {
    const ss = startSecond;
    const es = isNumber(endSecond) ? endSecond : this.end;

    // If insertBlob does not exist, only the specified section of the audio is deleted.
    // If endSecond is not a number, it is treated as the insertBlob argument.
    insertBlob = insertBlob
      ? insertBlob
      : endSecond && !isNumber(endSecond)
      ? endSecond
      : null;

    const originAb = await blobToArrayBuffer(originBlob);
    let leftSideArrBuf, rightSideArrBuf;

    // Clip the audio to the specified position
    if (ss === 0 && es === this.end) {
      // Cut all
      return null;
    } else if (ss === 0) {
      // Cut from scratch
      rightSideArrBuf = (await pmToPromise(
        this.worker,
        getClipCommand(originAb, es)
      )).data.data.MEMFS[0].data;
    } else if (ss !== 0 && es === this.end) {
      // Clipping to the tail
      leftSideArrBuf = (await pmToPromise(
        this.worker,
        getClipCommand(originAb, 0, ss)
      )).data.data.MEMFS[0].data;
    } else {
      // Local clipping
      leftSideArrBuf = (await pmToPromise(
        this.worker,
        getClipCommand(originAb, 0, ss)
      )).data.data.MEMFS[0].data;
      rightSideArrBuf = (await pmToPromise(
        this.worker,
        getClipCommand(originAb, es)
      )).data.data.MEMFS[0].data;
    }

    // Recombine multiple audio files
    const arrBufs = [];
    leftSideArrBuf && arrBufs.push(leftSideArrBuf);
    insertBlob && arrBufs.push(await blobToArrayBuffer(insertBlob));
    rightSideArrBuf && arrBufs.push(rightSideArrBuf);

    const combindResult = await pmToPromise(
      this.worker,
      await getCombineCommand(arrBufs)
    );

    return audioBufferToBlob(combindResult.data.data.MEMFS[0].data);
  };
}

The code above is essentially a combination of clip and concat.
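The four branches of splice are easier to follow when the decision is separated from the worker calls. The sketch below is a made-up pure function, planSplice, that only reports which parts of the original audio survive; it mirrors the branching above under the assumption that a non-numeric end means "to the end of the audio".

```javascript
// Hypothetical helper mirroring the four branches of splice: it returns which
// parts of the original audio survive, as { ss, d } pairs for clipping.
const END = "end";

function planSplice(startSecond, endSecond) {
  const cutsToEnd = typeof endSecond !== "number";
  if (startSecond === 0 && cutsToEnd) {
    return null; // everything is cut
  }
  return {
    // Keep [0, startSecond) unless the cut starts at the very beginning.
    left: startSecond === 0 ? null : { ss: 0, d: startSecond },
    // Keep [endSecond, end) unless the cut runs to the end.
    right: cutsToEnd ? null : { ss: endSecond, d: END },
  };
}

console.log(JSON.stringify(planSplice(0, END))); // null
console.log(JSON.stringify(planSplice(0, 4)));
// {"left":null,"right":{"ss":4,"d":"end"}}
console.log(JSON.stringify(planSplice(2, END)));
// {"left":{"ss":0,"d":2},"right":null}
console.log(JSON.stringify(planSplice(2, 4)));
// {"left":{"ss":0,"d":2},"right":{"ss":4,"d":"end"}}
```

Each non-null entry corresponds to one getClipCommand call, and the inserted audio goes between the left and right parts before they are merged.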

At this point, our requirements are basically met. With nothing more than a worker's help, the front end can process audio on its own. Isn’t that nice?

The code above has been simplified for the sake of explanation. Interested readers can go straight to the source; discussion and criticism are welcome 🙂