Preface

Before explaining audio rendering, it is necessary to understand some audio fundamentals, so this article is divided into basic concepts plus AudioTrack and OpenSL ES demo examples, which should make audio rendering on Android easier to understand.

The basic concepts of audio involve quite a few topics, and the first half of this article introduces them in detail. The subsequent articles all touch on audio development in some way, so having this foundation will make the later content easier to follow.

The basics of audio

The physical properties of sound

  • Sound is a wave

    Speaking of sound, anyone with normal hearing has heard it, but how is sound produced? Recall the description from the junior high school physics textbook: sound is produced by the vibration of objects. Sound is in fact a pressure wave. When an object is struck or an instrument is played, its vibration makes the surrounding air vibrate rhythmically, so the air density changes and a longitudinal wave of alternating compressions and rarefactions forms; this is the sound wave, and it continues until the vibration stops.

  • The three elements of sound waves

    The three elements of a sound wave are frequency, amplitude, and waveform. Frequency determines the pitch, amplitude determines the loudness, and waveform determines the timbre.

  • The medium of sound

    Sound can travel through many kinds of media: air, liquids, and solids. The propagation speed depends on the medium: about 340 m/s in air, about 1497 m/s in distilled water, and up to about 5200 m/s in an iron rod. Sound cannot travel in a vacuum.

  • The echo

    When we shout toward a mountain or across an open area, we often hear an echo. The echo occurs because the sound hits an obstacle along the way and is reflected back to us.

    However, if the time difference between the two sounds reaching our ears is less than 80 milliseconds, we cannot tell them apart. In fact, in daily life the human ear is constantly receiving echoes; it is just that, compared with the ambient noise, the echoes are lower in decibels, so we cannot distinguish them, or the brain receives them but cannot tell them apart.

  • Resonance

    Nature has light energy and water energy; everyday life has mechanical energy and electrical energy. Sound can also carry energy: for example, if one of two objects with the same natural frequency is struck, the other will start to vibrate as well. This phenomenon is called resonance, and it shows that sound propagation can set another object vibrating, that is, sound propagation is also a process of energy transfer.

Digital audio

In the previous section we introduced the physical phenomena of sound and the common terms used to describe it; we will keep using the same terminology in the sections that follow. This section introduces the concepts of digital audio.

To digitize an analog signal, three steps are involved: sampling, quantization, and encoding. First the analog signal is sampled, which digitizes it along the time axis. According to the Nyquist theorem (also known as the sampling theorem), the sampling frequency must be more than twice the highest frequency contained in the sound. High-quality audio covers roughly the 20 Hz to 20 kHz range that humans can hear, so a sampling rate of 44.1 kHz is commonly used; it guarantees that frequencies up to 20 kHz can still be represented after digitization, so the perceived sound quality is not degraded. A rate of 44.1 kHz simply means that 44,100 samples are taken per second.

Now, how do we represent each sample? This brings us to the second concept: quantization. Quantization digitizes the signal along the amplitude axis. For example, if each sample is represented by a 16-bit binary number, its range is [-32768, 32767], giving 65536 possible values, so the amplitude of the analog signal is divided into 65536 levels.

Now that each sample is just a number, how are so many samples stored? This brings us to the third concept: encoding. Encoding means recording the sampled and quantized digital data in a certain format, such as sequential (raw) storage or compressed storage.
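
To make sampling, quantization, and encoding concrete, here is a minimal sketch in plain Java (not Android-specific, and not part of the demo project): it samples a 440 Hz sine tone at 44.1 kHz, quantizes each sample to 16 bits, and encodes the result as sequentially stored raw PCM bytes. The output file name tone.pcm is just an illustration.

import java.io.FileOutputStream;
import java.io.IOException;

public class PcmSketch {
    public static void main(String[] args) throws IOException {
        int sampleRate = 44100;      // sampling: 44100 samples per second
        double frequency = 440.0;    // a 440 Hz sine tone stands in for the analog signal
        byte[] pcm = new byte[sampleRate * 2]; // 1 second of 16-bit mono PCM

        for (int i = 0; i < sampleRate; i++) {
            // Sampling: evaluate the signal at discrete points on the time axis
            double t = (double) i / sampleRate;
            double amplitude = Math.sin(2 * Math.PI * frequency * t); // in [-1.0, 1.0]
            // Quantization: map the amplitude to one of 65536 16-bit levels
            short sample = (short) (amplitude * Short.MAX_VALUE);
            // Encoding (sequential storage): little-endian 16-bit raw PCM
            pcm[2 * i] = (byte) (sample & 0xFF);
            pcm[2 * i + 1] = (byte) ((sample >> 8) & 0xFF);
        }

        try (FileOutputStream out = new FileOutputStream("tone.pcm")) {
            out.write(pcm);
        }
    }
}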

Many formats are involved here. The raw audio data we usually talk about is PCM (Pulse Code Modulation) data. Describing PCM data requires three quantities: sample format, sample rate, and channel count. Take CD quality as an example: the quantization format is 16 bit (2 bytes), the sample rate is 44100 Hz, and the number of channels is 2. Another quantity describes the size of an audio format: the data bit rate, i.e. the number of bits per second, which measures the volume of audio data per unit of time. For CD-quality data, the bit rate is calculated as follows:

44100 * 16 * 2 = 1411200 bit/s ≈ 1378.125 kbit/s

So how much storage does this kind of CD quality data take up in a minute? The calculation is as follows:

1378.125 kbit/s * 60 s / 8 / 1024 ≈ 10.09 MB
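
The same arithmetic, written as a small Java snippet for reference (the constants are the CD-quality parameters from the text):

public class BitRateCalc {
    public static void main(String[] args) {
        int sampleRate = 44100;   // samples per second
        int bitsPerSample = 16;   // quantization format
        int channels = 2;         // stereo

        long bitRate = (long) sampleRate * bitsPerSample * channels;   // 1411200 bit/s
        double kbitRate = bitRate / 1024.0;                            // ~1378.125 kbit/s
        double mbPerMinute = bitRate * 60.0 / 8 / 1024 / 1024;         // bits -> bytes -> KB -> MB, ~10.09 MB

        System.out.printf("bit rate: %d bit/s (%.3f kbit/s)%n", bitRate, kbitRate);
        System.out.printf("one minute of CD-quality PCM: %.2f MB%n", mbPerMinute);
    }
}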

Of course, if the sample format is more precise (for example 4 bytes per sample) or the sample rate is higher (for example 48 kHz), the data takes up more storage and can describe the sound in greater detail. Once stored as binary data, the analog signal has been converted into a digital signal, and it can be stored, played back, copied, or processed in other ways.

Audio coding

As mentioned above, CD-quality data takes up about 10.09 MB per minute. If it is only stored on a CD or a hard disk this may be acceptable, but for real-time transmission over a network the amount of data is too large, so compression encoding is needed. A basic metric of compression encoding is the compression ratio, which is usually less than 1. Compression algorithms are divided into lossless and lossy compression. Lossless compression means the decompressed data can be completely restored; lossy compression means it cannot, because some information is discarded, and the smaller the compression ratio, the more information is lost and the greater the distortion after the signal is restored. Among the commonly used formats, lossy compression is the most widely used. Different formats, such as PCM, WAV, AAC, MP3, and Ogg, can be chosen according to the application scenario (storage device, transmission network, and playback device).

  • WAV coding

    WAV encoding simply prepends a 44-byte header to the PCM data; the header stores the sample rate, channel count, data format, and other information.

    Features: Good sound quality, lots of software support.

    Scenario: intermediate files in multimedia development, and saving music and sound material.

  • MP3 encoding

    MP3 has a good compression ratio. An MP3 file encoded using LAME (an implementation of the MP3 encoding format) at a medium to high bit rate sounds very similar to the source WAV file. Of course, the appropriate parameters should be adjusted for different application scenarios to achieve the best results.

    Features: good sound quality at 128 kbit/s and above, high compression ratio, broad software and hardware support, good compatibility.

    Scenario: music playback at relatively high bit rates where broad compatibility is required.

  • AAC encoding

    AAC is a newer generation of lossy audio compression. With additional coding tools such as SBR and PS, it derived three main profiles: LC-AAC, HE-AAC, and HE-AAC v2. LC-AAC is the traditional AAC and is mainly used for medium and high bit rates (>= 80 kbit/s). HE-AAC is equivalent to AAC + SBR and is mainly used for medium and low bit rates (<= 80 kbit/s). The more recent HE-AAC v2 is equivalent to AAC + SBR + PS and is mainly used for low bit rates (<= 48 kbit/s). In practice, most encoders automatically enable PS at <= 48 kbit/s and leave it off above 48 kbit/s, which is then plain HE-AAC.

    Features: excellent performance at bit rates below 128 kbit/s; mostly used for the audio in video.

    Scenario: audio encoding at bit rates below 128 kbit/s, mostly for the audio tracks of video.

  • Ogg coding

    Ogg is a very promising codec with excellent performance at all kinds of bit rates, especially low and medium ones. Besides good sound quality, Ogg is completely free, which lays a good foundation for wider support. Its algorithm is very good and achieves better quality at a smaller bit rate: Ogg at 128 kbit/s can sound better than MP3 at 192 kbit/s or higher. However, because media server software support is currently lacking, digital broadcasting based on Ogg is not yet practical, and Ogg's overall support, in both software and hardware, is still not comparable to MP3's.

    Features: achieves better sound quality than MP3 at a smaller bit rate, with good performance at high, medium, and low bit rates; but compatibility is poor and streaming is not well supported.

    Scenario: audio messages in voice chat.

Audio rendering on Android platform

With the basic concepts of audio covered, we now implement audio rendering on Android, laying the foundation for the later articles on implementing an audio/video player and on audio/video capture and recording.

PCM test file: pan.baidu.com/s/1ISS7bHMr… (password: 5 z1n)

The use of AudioTrack

AudioTrack is the lowest-level audio playback API exposed by the Android SDK, and it only accepts raw PCM data as input. Unlike MediaPlayer, it cannot play compressed audio files (MP3, AAC, etc.) directly; you would have to implement the decoding and buffer management yourself. Since only the audio rendering side is involved here and codecs will be explained later, this section only shows how to render raw PCM data with AudioTrack.

  1. Configure AudioTrack

    public AudioTrack(int streamType, int sampleRateInHz, int channelConfig, int audioFormat,
                int bufferSizeInBytes, int mode)

    streamType: Android provides several audio stream types (audio management policies); when multiple apps or processes play audio at the same time, the stream type determines how the audio is ultimately rendered. The optional values are defined as constants in the AudioManager class, including the following:

        /** Voice call */
        public static final int STREAM_VOICE_CALL = AudioSystem.STREAM_VOICE_CALL;
        /** System sounds */
        public static final int STREAM_SYSTEM = AudioSystem.STREAM_SYSTEM;
        /** Phone ringtone */
        public static final int STREAM_RING = AudioSystem.STREAM_RING;
        /** Music */
        public static final int STREAM_MUSIC = AudioSystem.STREAM_MUSIC;
        /** Alarm */
        public static final int STREAM_ALARM = AudioSystem.STREAM_ALARM;
        /** Notification */
        public static final int STREAM_NOTIFICATION = AudioSystem.STREAM_NOTIFICATION;

    sampleRateInHz: the sample rate, i.e. the number of samples taken per second. Common values include 8000, 16000, 22050, 24000, 32000, 44100, and 48000 Hz; choose one that fits your application scenario.

    channelConfig: the channel configuration, defined as constants in the AudioFormat class. For playback with AudioTrack the commonly used values are CHANNEL_OUT_MONO and CHANNEL_OUT_STEREO (the CHANNEL_IN_* constants are their recording-side counterparts). Most phones only have pseudo-stereo microphones, so on the capture side mono is recommended for performance reasons.

    audioFormat: the sample format ("bit depth") of the data. The values are defined as constants in the AudioFormat class: ENCODING_PCM_16BIT (compatible with all phones) and ENCODING_PCM_8BIT.

    bufferSizeInBytes: the size of AudioTrack's internal audio buffer. The AudioTrack class provides the getMinBufferSize() method to help developers determine a suitable value:

     static public int getMinBufferSize(int sampleRateInHz, int channelConfig, int audioFormat)

    In real development, it is strongly recommended to let this function calculate the buffer size to pass in, rather than computing it manually.

    mode: AudioTrack provides two playback modes, defined as constants in the AudioTrack class. MODE_STATIC requires all of the data to be written into the playback buffer at once; MODE_STREAM writes audio data continuously at regular intervals during playback and can, in theory, handle any playback scenario. A configuration sketch using these parameters follows below.
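
    Putting the six parameters together, a minimal configuration sketch might look like this (an illustrative example rather than the project's exact code; it assumes 44.1 kHz, 16-bit stereo output in MODE_STREAM):

        int sampleRateInHz = 44100;
        int channelConfig = AudioFormat.CHANNEL_OUT_STEREO;
        int audioFormat = AudioFormat.ENCODING_PCM_16BIT;

        // Let the framework compute a safe minimum buffer size for this configuration
        int bufferSizeInBytes = AudioTrack.getMinBufferSize(
                sampleRateInHz, channelConfig, audioFormat);

        AudioTrack audioTrack = new AudioTrack(
                AudioManager.STREAM_MUSIC,   // streamType
                sampleRateInHz,              // sampleRateInHz
                channelConfig,               // channelConfig
                audioFormat,                 // audioFormat
                bufferSizeInBytes,           // bufferSizeInBytes
                AudioTrack.MODE_STREAM);     // mode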

  2. Play

    // If the AudioTrack instance was initialized successfully and is not already playing, start playback
    if (null != mAudioTrack
            && mAudioTrack.getState() != AudioTrack.STATE_UNINITIALIZED
            && mAudioTrack.getPlayState() != AudioTrack.PLAYSTATE_PLAYING) {
        mAudioTrack.play();
    }
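
    In MODE_STREAM, playback only produces sound while PCM data keeps being written. A minimal write loop might look like the following sketch; it assumes the audioTrack and bufferSizeInBytes from the configuration sketch above, the usual java.io imports, and a raw PCM file copied to a hypothetical path on the sdcard:

        byte[] buffer = new byte[bufferSizeInBytes];
        // "/sdcard/test.pcm" is a hypothetical path; use wherever you copied the PCM file
        try (FileInputStream in = new FileInputStream("/sdcard/test.pcm")) {
            int read;
            while ((read = in.read(buffer)) > 0) {
                // write() queues PCM data for rendering; it blocks when the internal buffer is full
                audioTrack.write(buffer, 0, read);
            }
        } catch (IOException e) {
            Log.e(TAG, "writing PCM to AudioTrack failed", e);
        } finally {
            audioTrack.stop();
        }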
  3. Release resources

        public void release() {
            Log.d(TAG, "==release===");
            mStatus = Status.STATUS_NO_READY;
            if (mAudioTrack != null) {
                mAudioTrack.release();
                mAudioTrack = null;
            }
        }
  4. For the complete example, see the AudioTracker section of the AudioPlay project. You need to copy the PCM file from the project's raw directory to the sdcard directory.

Use of OpenSL ES

OpenSL ES official documentation

OpenSL ES (Open Sound Library for Embedded Systems) is a royalty-free, cross-platform audio API carefully optimized for hardware-accelerated audio on embedded systems. It gives developers of native applications on embedded mobile multimedia devices a standardized, high-performance, low-latency way to implement audio functionality, enables direct cross-platform deployment of software and hardware audio capabilities, reduces implementation effort, and promotes the growth of the advanced audio market.

On Android, the Java-layer audio APIs (the high-level audio libraries) handle audio input and output and belong to the high-level API; OpenSL ES is a lower-level C-language API. In ordinary development the high-level APIs are used directly, but for scenarios with performance bottlenecks, such as live voice chat, 3D audio, and certain effects, developers can build audio applications directly on OpenSL ES in C/C++.

Before using the OpenSL ES API, we need to include the OpenSL ES headers as follows:

// This is the standard OpenSL ES library
#include <SLES/OpenSLES.h>
// This is the Android-specific extension; be careful with it if you need to stay cross-platform
#include <SLES/OpenSLES_Android.h>
  1. Create the engine and get the engine interface

    void createEngine() {
        // OpenSL ES engine setup for audio playback
        // Step 1: create the engine and get the engine interface
        // 1.1 Create the engine object: SLObjectItf engineObj
        SLresult result = slCreateEngine(&engineObj, 0, NULL, 0, NULL, NULL);
        if (SL_RESULT_SUCCESS != result) {
            return;
        }

        // 1.2 Realize (initialize) the engine object
        result = (*engineObj)->Realize(engineObj, SL_BOOLEAN_FALSE);
        if (SL_RESULT_SUCCESS != result) {
            return;
        }

        // 1.3 Get the engine interface: SLEngineItf engine
        result = (*engineObj)->GetInterface(engineObj, SL_IID_ENGINE, &engine);
        if (SL_RESULT_SUCCESS != result) {
            return;
        }
    }
  2. Set up the mixer

        // Step 2: set up the output mix (mixer)
        // 2.1 Create the output mix: SLObjectItf outputMixObj
        result = (*engine)->CreateOutputMix(engine, &outputMixObj, 0, NULL, NULL);
        if (SL_RESULT_SUCCESS != result) {
            return;
        }

        // 2.2 Realize (initialize) the output mix
        result = (*outputMixObj)->Realize(outputMixObj, SL_BOOLEAN_FALSE);
        if (SL_RESULT_SUCCESS != result) {
            return;
        }
  3. Create the player

        // Step 3: create the audio player
        // 3.1 Configure the audio source (input)
        // Buffer queue locator with 2 buffers
        SLDataLocator_AndroidSimpleBufferQueue locBufq = {SL_DATALOCATOR_ANDROIDSIMPLEBUFFERQUEUE, 2};
        // PCM data format. The fields of SLDataFormat_PCM are:
        //   formatType:    SL_DATAFORMAT_PCM, i.e. raw PCM data
        //   numChannels:   e.g. 2 for stereo
        //   samplesPerSec: e.g. SL_SAMPLINGRATE_44_1 for 44100 Hz (most widely used, most compatible)
        //   bitsPerSample: e.g. SL_PCMSAMPLEFORMAT_FIXED_16 for 16 bit (2 bytes) per sample
        //   containerSize: usually the same as bitsPerSample
        //   channelMask:   e.g. SL_SPEAKER_FRONT_LEFT | SL_SPEAKER_FRONT_RIGHT for stereo
        //   endianness:    SL_BYTEORDER_LITTLEENDIAN for little-endian data
        SLDataFormat_PCM formatPcm = {SL_DATAFORMAT_PCM, (SLuint32) mChannels, mSampleRate,
                                      (SLuint32) mSampleFormat, (SLuint32) mSampleFormat,
                                      mChannels == 2 ? 0 : SL_SPEAKER_FRONT_CENTER,
                                      SL_BYTEORDER_LITTLEENDIAN};
        /*
         * Enable fast audio when possible: if we request the device's native sample rate,
         * the fast audio path will be triggered.
         */
        if (mSampleRate) {
            formatPcm.samplesPerSec = mSampleRate;
        }

        // Data source: combine the buffer queue locator and the PCM format
        SLDataSource audioSrc = {&locBufq, &formatPcm};

        // 3.2 Configure the audio sink (output): use the output mix created in step 2
        SLDataLocator_OutputMix locOutpuMix = {SL_DATALOCATOR_OUTPUTMIX, mAudioEngine->outputMixObj};
        SLDataSink audioSink = {&locOutpuMix, nullptr};

        /*
         * Create the audio player.
         * Fast audio is not supported when SL_IID_EFFECTSEND is required, so skip that
         * interface in the fast audio case.
         */
        // Interfaces requested from the player (including the buffer queue interface)
        const SLInterfaceID ids[3] = {SL_IID_BUFFERQUEUE, SL_IID_VOLUME, SL_IID_EFFECTSEND};
        const SLboolean req[3] = {SL_BOOLEAN_TRUE, SL_BOOLEAN_TRUE, SL_BOOLEAN_TRUE};

        // 3.3 Create the player
        result = (*mAudioEngine->engine)->CreateAudioPlayer(mAudioEngine->engine, &mPlayerObj,
                                                            &audioSrc, &audioSink,
                                                            mSampleRate ? 2 : 3, ids, req);
        if (result != SL_RESULT_SUCCESS) {
            LOGE("CreateAudioPlayer failed: %d", result);
            return false;
        }

        // 3.4 Realize (initialize) the player: mPlayerObj
        result = (*mPlayerObj)->Realize(mPlayerObj, SL_BOOLEAN_FALSE);
        if (result != SL_RESULT_SUCCESS) {
            LOGE("mPlayerObj Realize failed: %d", result);
            return false;
        }

        // 3.5 Get the play interface: SLPlayItf mPlayer
        result = (*mPlayerObj)->GetInterface(mPlayerObj, SL_IID_PLAY, &mPlayer);
        if (result != SL_RESULT_SUCCESS) {
            LOGE("mPlayerObj GetInterface failed: %d", result);
            return false;
        }
  4. Set the playback callback function

        // Step 4: set up the playback callback
        // 4.1 Get the buffer queue interface of the player: SLAndroidSimpleBufferQueueItf mBufferQueue
        result = (*mPlayerObj)->GetInterface(mPlayerObj, SL_IID_BUFFERQUEUE, &mBufferQueue);
        if (result != SL_RESULT_SUCCESS) {
            LOGE("mPlayerObj GetInterface failed: %d", result);
            return false;
        }

        // 4.2 Register the callback: void playerCallback(SLAndroidSimpleBufferQueueItf bq, void *context)
        result = (*mBufferQueue)->RegisterCallback(mBufferQueue, playerCallback, this);
        if (result != SL_RESULT_SUCCESS) {
            LOGE("mBufferQueue RegisterCallback failed: %d", result);
            return false;
        }

        // 4.3 Get the effect send interface (only when the fast audio path is not used)
        mEffectSend = nullptr;
        if (mSampleRate == 0) {
            result = (*mPlayerObj)->GetInterface(mPlayerObj, SL_IID_EFFECTSEND, &mEffectSend);
            if (result != SL_RESULT_SUCCESS) {
                LOGE("mPlayerObj GetInterface failed: %d", result);
                return false;
            }
        }

        // 4.4 Get the volume interface
        result = (*mPlayerObj)->GetInterface(mPlayerObj, SL_IID_VOLUME, &mVolume);
        if (result != SL_RESULT_SUCCESS) {
            LOGE("mPlayerObj GetInterface failed: %d", result);
            return false;
        }
  5. Set the player state

        // Step 5: set the player state to playing
        result = (*mPlayer)->SetPlayState(mPlayer, SL_PLAYSTATE_PLAYING);
        if (result != SL_RESULT_SUCCESS) {
            LOGE("mPlayerObj SetPlayState failed: %d", result);
            return false;
        }
  6. Manually activate the callback function

    void OpenSLAudioPlay::enqueueSample(void *data, size_t length) {
        // Double buffering: enqueue the next audio frame only after the previous one has finished playing
        pthread_mutex_lock(&mMutex);
        if (mBufSize < length) {
            mBufSize = length;
            if (mBuffers[0]) {
                delete[] mBuffers[0];
            }
            if (mBuffers[1]) {
                delete[] mBuffers[1];
            }
            mBuffers[0] = new uint8_t[mBufSize];
            mBuffers[1] = new uint8_t[mBufSize];
        }
        memcpy(mBuffers[mIndex], data, length);
        // Step 6: enqueue the buffer; once it has been consumed, the registered callback fires
        (*mBufferQueue)->Enqueue(mBufferQueue, mBuffers[mIndex], length);
        mIndex = 1 - mIndex;
    }
  7. Release resources

    extern "C"
    JNIEXPORT void JNICALL
    Java_com_devyk_audioplay_AudioPlayActivity_nativeStopPcm(JNIEnv *env, jclass type) {
        isPlaying = false;
        if (slAudioPlayer) {
            slAudioPlayer->release();
            delete slAudioPlayer;
            slAudioPlayer = nullptr;
        }
        if (pcmFile) {
            fclose(pcmFile);
            pcmFile = nullptr; }}Copy the code

Refer to the OpenSL ES section of the repository for the complete code. Note: The PCM file in raw must be placed in the sdcard root directory.

Conclusion

This article introduced the basics of audio and used AudioTrack and OpenSL ES to render raw PCM audio data. You can deepen your understanding by working through the accompanying source code.

Final page effect:

Thank you

  • Audio and video development guide – Zhan Xiaokai