AudioFileStream introduction

AudioFileStream is used to read basic information such as sample rate, bit rate, and duration, and to separate audio frames.

According to Apple, AudioFileStream is designed for streamed playback, and it works not only on network streams but also on local files. Its input is raw file data rather than a file path, so reading and supplying the data is up to the caller. (AudioQueue itself is comparatively easy to use; the hard part is dealing with the data.)

The file formats supported by AudioFileStream are:

  • MPEG-1 Audio Layer 3, used for .mp3 files
  • MPEG-2 ADTS, used for the .aac audio data format
  • AIFC
  • AIFF
  • CAF
  • MPEG-4, used for .m4a, .mp4, and .3gp files
  • NeXT
  • WAVE

Initialize AudioFileStream

Initializing AudioFileStream creates an audio stream parser and produces an AudioFileStream instance.

extern OSStatus 
AudioFileStreamOpen (
    void * __nullable     inClientData,
    AudioFileStream_PropertyListenerProc    inPropertyListenerProc,
    AudioFileStream_PacketsProc inPacketsProc,
    AudioFileTypeID  inFileTypeHint,
    AudioFileStreamID __nullable * __nonnull outAudioFileStream) __OSX_AVAILABLE_STARTING(__MAC_10_5,__IPHONE_2_0);
  • inClientData: user-specified data passed into the callbacks; here we pass (__bridge LocalAudioPlayer*)self
  • inPropertyListenerProc: the callback for parsed audio information, invoked once for each piece of information parsed
  • inPacketsProc: the callback for separated frames, invoked whenever audio frames have been parsed out
  • inFileTypeHint: a hint for the format of the audio data; pass 0 if the format is unknown
  • outAudioFileStream: the returned AudioFileStreamID instance, which must be kept for all later calls
//AudioFileTypeID enum
enum {
        kAudioFileAIFFType             = 'AIFF',
        kAudioFileAIFCType             = 'AIFC',
        kAudioFileWAVEType             = 'WAVE',
        kAudioFileSoundDesigner2Type   = 'Sd2f',
        kAudioFileNextType             = 'NeXT',
        kAudioFileMP3Type              = 'MPG3',    // mpeg layer 3
        kAudioFileMP2Type              = 'MPG2',    // mpeg layer 2
        kAudioFileMP1Type              = 'MPG1',    // mpeg layer 1
        kAudioFileAC3Type              = 'ac-3',
        kAudioFileAAC_ADTSType         = 'adts',
        kAudioFileMPEG4Type            = 'mp4f',
        kAudioFileM4AType              = 'm4af',
        kAudioFileM4BType              = 'm4bf',
        kAudioFileCAFType              = 'caff',
        kAudioFile3GPType              = '3gpp',
        kAudioFile3GP2Type             = '3gp2',        
        kAudioFileAMRType              = 'amrf'        
};

This function creates the AudioFileStreamID on which all subsequent operations are performed, and registers the two callbacks, inPropertyListenerProc and inPacketsProc, which are the most important pieces and are described below.

The returned OSStatus indicates whether initialization succeeded (OSStatus == noErr).
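For example, a minimal open call might look like the sketch below; MyPropertyListener and MyPacketsProc stand for your own callback implementations, and LocalAudioPlayer is the assumed surrounding class:

AudioFileStreamID audioFileStreamID = NULL;
OSStatus status = AudioFileStreamOpen((__bridge void *)self,   // inClientData, recovered inside the callbacks
                                      MyPropertyListener,      // hypothetical AudioFileStream_PropertyListenerProc
                                      MyPacketsProc,           // hypothetical AudioFileStream_PacketsProc
                                      0,                       // no file type hint; pass e.g. kAudioFileMP3Type if known
                                      &audioFileStreamID);
if (status != noErr) {
    NSLog(@"AudioFileStreamOpen failed: %d", (int)status);
}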

Parse data

After initialization completes, this function is called to parse the file data:

extern OSStatus AudioFileStreamParseBytes(AudioFileStreamID inAudioFileStream,
                                          UInt32 inDataByteSize,
                                          const void* inData,
                                          UInt32 inFlags);

Parameters are described as follows:

  • inAudioFileStream: the AudioFileStreamID instance returned by AudioFileStreamOpen
  • inDataByteSize: the number of bytes parsed in this call
  • inData: a pointer to the data to be parsed
  • inFlags: parse flags; the only defined value is kAudioFileStreamParseFlag_Discontinuity = 1, which says the data is not contiguous with what was parsed before. For now we can pass 0.

If the returned OSStatus is not noErr, parsing failed. The possible error codes are:

enum
{
  kAudioFileStreamError_UnsupportedFileType        = 'typ?',
  kAudioFileStreamError_UnsupportedDataFormat      = 'fmt?',
  kAudioFileStreamError_UnsupportedProperty        = 'pty?',
  kAudioFileStreamError_BadPropertySize            = '!siz',
  kAudioFileStreamError_NotOptimized               = 'optm',
  kAudioFileStreamError_InvalidPacketOffset        = 'pck?',
  kAudioFileStreamError_InvalidFile                = 'dta?',
  kAudioFileStreamError_ValueUnknown               = 'unk?',
  kAudioFileStreamError_DataUnavailable            = 'more',
  kAudioFileStreamError_IllegalOperation           = 'nope',
  kAudioFileStreamError_UnspecifiedError           = 'wht?',
  kAudioFileStreamError_DiscontinuityCantRecover   = 'dsc!'
};

Check the return value after every call; once an error is returned, there is no point in feeding further data to the parser.
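As a sketch, assuming the surrounding player keeps the stream ID in an _audioFileStreamID ivar, each chunk of arriving data can be fed to the parser like this:

- (void)didReceiveData:(NSData *)data {
    // Both callbacks may fire synchronously inside this call
    OSStatus status = AudioFileStreamParseBytes(_audioFileStreamID,
                                                (UInt32)data.length,
                                                data.bytes,
                                                0);   // pass kAudioFileStreamParseFlag_Discontinuity after a seek
    if (status != noErr) {
        // Stop feeding further data once a parse error occurs
        NSLog(@"parse failed: %d", (int)status);
    }
}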

The callbacks

Parse file format information

AudioFileStream_PropertyListenerProc is the callback for parsed format information. When AudioFileStreamParseBytes is called, the format information is read first, and the AudioFileStream_PropertyListenerProc callback is entered synchronously.

In this callback you can obtain the audio information you want, such as the stream structure (AudioStreamBasicDescription), bit rate (BitRate), MagicCookie, and so on. From this information you can also derive other values, such as the total audio duration.

Let's take a look at its declaration:

typedef void (*AudioFileStream_PropertyListenerProc)(
            void *                          inClientData,
            AudioFileStreamID               inAudioFileStream,
            AudioFileStreamPropertyID       inPropertyID,
            AudioFileStreamPropertyFlags *  ioFlags);

The first parameter is the context object we supplied when we opened the instance.

The second parameter is the instance's ID.

The third parameter is the ID of the property that has just been parsed. It tells us that the information corresponding to this PropertyID (for example the data format, or the data offset) is now complete, and its value can be fetched with AudioFileStreamGetProperty; the size of that value can be queried first with AudioFileStreamGetPropertyInfo:

extern OSStatus
AudioFileStreamGetPropertyInfo(
    AudioFileStreamID               inAudioFileStream,
    AudioFileStreamPropertyID       inPropertyID,
    UInt32 * __nullable             outPropertyDataSize,
    Boolean * __nullable            outWritable)
    __OSX_AVAILABLE_STARTING(__MAC_10_5,__IPHONE_2_0);

The fourth parameter, ioFlags, is an in/out value indicating whether the property should be cached; assign kAudioFileStreamPropertyFlag_PropertyIsCached to it if caching is wanted.

This callback fires multiple times, but not every occurrence needs handling. The list of property IDs is as follows:

CF_ENUM(AudioFileStreamPropertyID)
{
    kAudioFileStreamProperty_ReadyToProducePackets          =   'redy',
    kAudioFileStreamProperty_FileFormat                     =   'ffmt',
    kAudioFileStreamProperty_DataFormat                     =   'dfmt',
    kAudioFileStreamProperty_FormatList                     =   'flst',
    kAudioFileStreamProperty_MagicCookieData                =   'mgic',
    kAudioFileStreamProperty_AudioDataByteCount             =   'bcnt',
    kAudioFileStreamProperty_AudioDataPacketCount           =   'pcnt',
    kAudioFileStreamProperty_MaximumPacketSize              =   'psze',
    kAudioFileStreamProperty_DataOffset                     =   'doff',
    kAudioFileStreamProperty_ChannelLayout                  =   'cmap',
    kAudioFileStreamProperty_PacketToFrame                  =   'pkfr',
    kAudioFileStreamProperty_FrameToPacket                  =   'frpk',
    kAudioFileStreamProperty_PacketToByte                   =   'pkby',
    kAudioFileStreamProperty_ByteToPacket                   =   'bypk',
    kAudioFileStreamProperty_PacketTableInfo                =   'pnfo',
    kAudioFileStreamProperty_PacketSizeUpperBound           =   'pkub',
    kAudioFileStreamProperty_AverageBytesPerPacket          =   'abpp',
    kAudioFileStreamProperty_BitRate                        =   'brat',
    kAudioFileStreamProperty_InfoDictionary                 =   'info'
};
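A property listener usually switches on the property ID and handles only the few it cares about. A minimal sketch, where handleReadyToProducePackets and handleDataFormat are hypothetical methods on our player class:

static void MyPropertyListener(void *inClientData,
                               AudioFileStreamID inAudioFileStream,
                               AudioFileStreamPropertyID inPropertyID,
                               AudioFileStreamPropertyFlags *ioFlags)
{
    LocalAudioPlayer *player = (__bridge LocalAudioPlayer *)inClientData;
    switch (inPropertyID) {
        case kAudioFileStreamProperty_ReadyToProducePackets:
            [player handleReadyToProducePackets];   // frames will start arriving now
            break;
        case kAudioFileStreamProperty_DataFormat:
            [player handleDataFormat];              // fetch the ASBD here
            break;
        default:
            break;
    }
}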

Here are a few of the important property IDs.

1. kAudioFileStreamProperty_ReadyToProducePackets indicates that header parsing is complete and separated frames of audio data are about to be produced.

2. kAudioFileStreamProperty_BitRate is the bit rate of the audio data, needed to compute the audio's total duration. When only a small amount of data has been parsed, ReadyToProducePackets may arrive before a bit rate is available; in that case separate some frames first and estimate an average, e.g. UInt32 averageBitRate = totalPacketByteCount / totalPacketCount; (strictly this is the average packet size in bytes; dividing it by the packet duration and multiplying by 8 yields bits per second).
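Reading the property itself is a plain AudioFileStreamGetProperty call; a sketch, assuming an _audioFileStreamID ivar:

UInt32 bitRate = 0;
UInt32 bitRateSize = sizeof(bitRate);
OSStatus status = AudioFileStreamGetProperty(_audioFileStreamID,
                                             kAudioFileStreamProperty_BitRate,
                                             &bitRateSize, &bitRate);
// bitRate can legitimately come back 0 early in the stream; fall back to the average estimate above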

3. kAudioFileStreamProperty_DataOffset is the offset of the audio data within the whole file, since most audio files start with a header. The value matters mainly for seeking: a seek is expressed as a time (say, seek to 2 minutes 10 seconds), not as a file position, so the seek code computes a byte offset into the audio data from the time and then adds this data offset to get the real offset in the file.
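The offset is delivered as an SInt64; a sketch of fetching it, again assuming _audioFileStreamID:

SInt64 dataOffset = 0;
UInt32 offsetSize = sizeof(dataOffset);
AudioFileStreamGetProperty(_audioFileStreamID,
                           kAudioFileStreamProperty_DataOffset,
                           &offsetSize, &dataOffset);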

4. kAudioFileStreamProperty_DataFormat describes the structure of the audio stream, delivered as an AudioStreamBasicDescription:

struct AudioStreamBasicDescription
{
    Float64             mSampleRate;
    AudioFormatID       mFormatID;
    AudioFormatFlags    mFormatFlags;
    UInt32              mBytesPerPacket;
    UInt32              mFramesPerPacket;
    UInt32              mBytesPerFrame;
    UInt32              mChannelsPerFrame;
    UInt32              mBitsPerChannel;
    UInt32              mReserved;
};
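A sketch of fetching the ASBD and deriving the per-packet duration that the seek logic below relies on (assuming _audioFileStreamID and a _packetDuration ivar):

AudioStreamBasicDescription asbd;
UInt32 asbdSize = sizeof(asbd);
OSStatus status = AudioFileStreamGetProperty(_audioFileStreamID,
                                             kAudioFileStreamProperty_DataFormat,
                                             &asbdSize, &asbd);
if (status == noErr && asbd.mSampleRate > 0) {
    // Duration of one packet = frames per packet / frames per second
    _packetDuration = asbd.mFramesPerPacket / asbd.mSampleRate;
}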

5. kAudioFileStreamProperty_FormatList plays the same role as kAudioFileStreamProperty_DataFormat, except that it returns an array of AudioStreamBasicDescription; it exists to support formats such as AAC SBR, where one file contains audio in more than one format. Since we don't know in advance how many formats there are, we first query the total data size (formatListSize below) with AudioFileStreamGetPropertyInfo:

// First query how many bytes the format list occupies
UInt32 formatListSize;
Boolean outWritable;
OSStatus status = AudioFileStreamGetPropertyInfo(_audioFileStreamID, kAudioFileStreamProperty_FormatList, &formatListSize, &outWritable);
if (status != noErr) {
    return;
}

AudioFormatListItem *formatList = malloc(formatListSize);
status = AudioFileStreamGetProperty(_audioFileStreamID, kAudioFileStreamProperty_FormatList, &formatListSize, formatList);
if (status == noErr) {
    UInt32 supportedFormatsSize;
    status = AudioFormatGetPropertyInfo(kAudioFormatProperty_DecodeFormatIDs, 0, NULL, &supportedFormatsSize);
    if (status != noErr) {
        free(formatList);
        return;
    }

    UInt32 supportedFormatCount = supportedFormatsSize / sizeof(OSType);
    OSType *supportedFormats = (OSType *)malloc(supportedFormatsSize);
    status = AudioFormatGetProperty(kAudioFormatProperty_DecodeFormatIDs, 0, NULL, &supportedFormatsSize, supportedFormats);
    if (status != noErr) {
        free(formatList);
        free(supportedFormats);
        return;
    }

    // Pick the first format in the list that the system can decode
    for (int i = 0; i * sizeof(AudioFormatListItem) < formatListSize; i++) {
        AudioStreamBasicDescription format = formatList[i].mASBD;
        for (UInt32 j = 0; j < supportedFormatCount; j++) {
            if (format.mFormatID == supportedFormats[j]) {
                _format = format;   // store the decodable format (ivar name assumed)
                [self calculatePacketDuration];
                break;
            }
        }
    }
    free(supportedFormats);
}
free(formatList);

6. kAudioFileStreamProperty_AudioDataByteCount is the number of bytes of audio data in the file. It is used to compute the total duration and the byte offset needed when seeking:

UInt64 audioDataByteCount;
UInt32 byteCountSize = sizeof(audioDataByteCount);
OSStatus status = AudioFileStreamGetProperty(_audioFileStreamID, kAudioFileStreamProperty_AudioDataByteCount, &byteCountSize, &audioDataByteCount);
if (status == noErr) {
    NSLog(@"audioDataByteCount: %llu, byteCountSize: %u", audioDataByteCount, byteCountSize);
}

As with bitRate, audioDataByteCount may not be available when only a small amount of data has been parsed, in which case it can be approximated:

UInt64 dataOffset = ...;            // kAudioFileStreamProperty_DataOffset
UInt64 fileLength = ...;            // total size of the audio file
UInt64 audioDataByteCount = fileLength - dataOffset;

Here are two ways to calculate the audio duration:

  • Total duration = total number of frames × duration of a single frame

    Duration of a single frame = number of samples in a frame × duration of each sample

    Duration of each sample = 1 / sample rate

    Sample rate: the number of samples per unit time

  • Total duration = total bytes of audio data × 8 / bit rate

    Bit rate: the number of bits per unit time
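As a concrete check of the first formula: an MPEG-1 Layer 3 frame carries 1152 samples, so at a 44.1 kHz sample rate:

double sampleRate = 44100.0;                              // samples per second
double samplesPerFrame = 1152.0;                          // samples in one MP3 frame
double frameDuration = samplesPerFrame / sampleRate;      // ≈ 0.0261 s per frame
double totalFrameCount = ...;                             // total number of frames, from the packet count
double totalDuration = totalFrameCount * frameDuration;   // first formula above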

With the format information parsed, the next step is to separate the audio frames.

Separate audio frames

Once the format information has been read, we keep calling AudioFileStreamParseBytes; now frames are separated and the AudioFileStream_PacketsProc callback is entered:

typedef void (*AudioFileStream_PacketsProc)(
            void *                          inClientData,
            UInt32                          inNumberBytes,
            UInt32                          inNumberPackets,
            const void *                    inInputData,
            AudioStreamPacketDescription    *inPacketDescriptions);

The first parameter is again the context object.

The second parameter is the number of bytes delivered.

The third parameter is the number of packets (frames) delivered.

The fourth parameter is the delivered data itself.

The fifth parameter is an array of AudioStreamPacketDescription recording, for each frame, the byte offset at which it starts and how many bytes it occupies:

struct  AudioStreamPacketDescription
{
    SInt64  mStartOffset;
    UInt32  mVariableFramesInPacket;
    UInt32  mDataByteSize;
};

Process the separated audio frames

if (_discontinuous) {
    _discontinuous = NO;
}

if (numberOfBytes == 0 || numberOfPackets == 0) {
    return;
}

BOOL deletePackDesc = NO;

if (packetDescriptions == NULL) {
    // No packet descriptions: treat the data as CBR, split it into
    // equal-sized packets, and build the descriptions ourselves
    deletePackDesc = YES;
    UInt32 packetSize = numberOfBytes / numberOfPackets;
    AudioStreamPacketDescription *descriptions =
        (AudioStreamPacketDescription *)malloc(sizeof(AudioStreamPacketDescription) * numberOfPackets);
    for (int i = 0; i < numberOfPackets; i++) {
        UInt32 packetOffset = packetSize * i;
        descriptions[i].mStartOffset = packetOffset;
        descriptions[i].mVariableFramesInPacket = 0;
        if (i == numberOfPackets - 1) {
            // The last packet takes whatever bytes remain
            descriptions[i].mDataByteSize = numberOfBytes - packetOffset;
        } else {
            descriptions[i].mDataByteSize = packetSize;
        }
    }
    packetDescriptions = descriptions;
}

NSMutableArray *parseDataArray = [NSMutableArray array];
for (int i = 0; i < numberOfPackets; i++) {
    SInt64 packetOffset = packetDescriptions[i].mStartOffset;
    // Copy the parsed frame into our own buffer
    NParseAudioData *parsedData = [NParseAudioData parsedAudioDataWithBytes:packets + packetOffset
                                                          packetDescription:packetDescriptions[i]];
    [parseDataArray addObject:parsedData];

    if (_processedPacketsCount < BitRateEstimationMaxPackets) {
        _processedPacketsSizeTotal += parsedData.packetDescription.mDataByteSize;
        _processedPacketsCount += 1;
        [self calculateBitRate];
        [self calculateDuration];
    }
}
...
if (deletePackDesc) {
    free(packetDescriptions);
}

If inPacketDescriptions is NULL, the data has to be handled as CBR. In practice, though, packet descriptions are usually returned even when parsing CBR data, because even CBR frames are not perfectly constant in size: a CBR MP3, for example, may append a 1-byte padding slot after a frame, and since the padding is not always present, frame sizes can float by a byte.

Seek

What this really means is: when we drag the progress bar to some minutes-and-seconds position, we are locating the byte in the file from which audio playback should resume.

For raw PCM data, every PCM frame has a fixed size and a fixed playback duration. Once the audio is compressed, this depends on the encoding. With CBR, each compressed frame contains a constant number of PCM frames, so each frame's playback duration is also constant; with VBR, to get the best quality at the smallest size, the number of PCM frames per compressed frame varies, which makes seeking in a VBR stream much harder during streamed playback. Here we only discuss seeking in CBR data.

Seeking in CBR data is generally implemented as follows:

1. Approximately compute the byte offset the seek should land on:

double seekToTime = ...;            // in seconds
UInt64 audioDataByteCount = ...;    // from kAudioFileStreamProperty_AudioDataByteCount
SInt64 dataOffset = ...;            // from kAudioFileStreamProperty_DataOffset
double duration = ...;              // computed as (audioDataByteCount * 8) / bitRate

// approximate offset = data offset + the time's share of the audio data bytes
SInt64 approximateSeekOffset = dataOffset + (seekToTime / duration) * audioDataByteCount;

2. Compute the packet (frame) index that seekToTime falls in, using a packetDuration derived from the format information obtained during parsing:

// First compute the duration of a single packet from the ASBD
AudioStreamBasicDescription asbd = ...;  // from kAudioFileStreamProperty_DataFormat or kAudioFileStreamProperty_FormatList
double packetDuration = asbd.mFramesPerPacket / asbd.mSampleRate;
SInt64 seekToPacket = floor(seekToTime / packetDuration);

3. AudioFileStreamSeek can then be used to find the exact byte offset corresponding to that packet (a sketch follows the notes below):

  • If ioFlags contains kAudioFileStreamSeekFlag_OffsetIsEstimated, the returned outDataByteOffset is only an estimate and not accurate, so the seek should still use the approximateSeekOffset computed in step 1.

  • If kAudioFileStreamSeekFlag_OffsetIsEstimated is not present in ioFlags, outDataByteOffset is the exact byte offset of seekToPacket within the audio data; the precise seekOffset and seekToTime can then be derived from outDataByteOffset.
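A sketch of step 3, reusing the dataOffset, packetDuration, and approximateSeekOffset values from above:

SInt64 outDataByteOffset = 0;
AudioFileStreamSeekFlags ioFlags = 0;
OSStatus status = AudioFileStreamSeek(_audioFileStreamID, seekToPacket, &outDataByteOffset, &ioFlags);

SInt64 seekByteOffset;
if (status == noErr && !(ioFlags & kAudioFileStreamSeekFlag_OffsetIsEstimated)) {
    // Exact: real file position = packet byte offset + header offset
    seekByteOffset = outDataByteOffset + dataOffset;
    seekToTime = seekToPacket * packetDuration;   // snap the time to the packet boundary
} else {
    // Estimated (or failed): fall back to the step-1 approximation
    seekByteOffset = approximateSeekOffset;
}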

4. Read data starting at seekByteOffset and parse it with AudioFileStreamParseBytes, as sketched below.
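Because the newly read data is not contiguous with what was parsed before, the next parse call should pass the discontinuity flag; a sketch, where newData is whatever was read starting at seekByteOffset:

OSStatus status = AudioFileStreamParseBytes(_audioFileStreamID,
                                            (UInt32)newData.length,
                                            newData.bytes,
                                            kAudioFileStreamParseFlag_Discontinuity);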

Calculate the duration

The most accurate way to get the duration is to read it from the ID3 information. If there is no ID3 information, it has to be calculated from the information in the file header.

The formula for calculating duration is as follows:

double duration = (audioDataByteCount * 8) / bitRate

The total number of audio data bytes, audioDataByteCount, can be obtained via kAudioFileStreamProperty_AudioDataByteCount; bitRate can be obtained via kAudioFileStreamProperty_BitRate, or by parsing part of the data and computing an average bit rate.

Duration computed this way is fairly accurate for CBR data but not for VBR. For VBR data it is therefore better to take the duration from the ID3 information; failing that, the only option left is estimating it from the average bit rate, which is imprecise.

Finally, the AudioFileStream needs to be closed:

extern OSStatus AudioFileStreamClose(AudioFileStreamID inAudioFileStream); 
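A sketch of tearing down, assuming the _audioFileStreamID ivar used throughout:

if (_audioFileStreamID != NULL) {
    AudioFileStreamClose(_audioFileStreamID);
    _audioFileStreamID = NULL;   // prevent further use of the closed stream
}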

Summary

  • To use AudioFileStream, call AudioFileStreamOpen first; supplying a file type hint, if known, helps the parser resolve the format
  • Call AudioFileStreamParseBytes whenever data arrives; any return value other than noErr means a parse error. kAudioFileStreamError_NotOptimized means the file's header information is missing or sits at the end of the file, so the stream cannot be parsed
  • After AudioFileStreamParseBytes is called, AudioFileStream_PropertyListenerProc is entered first; once kAudioFileStreamProperty_ReadyToProducePackets is reported, the packets callback (AudioFileStream_PacketsProc) begins delivering the separated frames
  • Close the AudioFileStream when you are done with it

The code is here: (github.com/Nicholas86/…)
