preface
Nowadays apps rely more and more on audio, video, and image content. This article covers audio and video playback. For playback you could simply use a mature open source player (GSYVideoPlayer is a good open source project to recommend), but using an off-the-shelf player does not teach you much. Instead, this article walks through developing an Android player from 0 to 1 based on FFmpeg 4.2.2 and the librtmp library, with example code along the way.
The first things you need to know to develop a player are:
- Cross-compiling FFmpeg together with librtmp
- C/C++
- NDK and JNI
- Audio and video decoding and synchronization
After working through it, the general effect of our player is as follows:
The playback looks a bit choppy, which is down to the actual network conditions; the player can play RTMP and HTTP streams, URLs, local files, and other sources.
Compiling RTMP together with FFmpeg
RTMP
Introduction:
RTMP stands for Real-Time Messaging Protocol. It is based on TCP and is actually a protocol family that includes RTMP proper as well as RTMPT/RTMPS/RTMPE. RTMP is a network protocol designed for real-time data communication, mainly used for audio, video, and data exchange between the Flash/AIR platform and streaming/interactive servers that support RTMP. Software supporting the protocol includes Adobe Media Server, Ultrant Media Server, Red5, and so on. Like HTTP, RTMP sits in the application layer of the TCP/IP four-layer model.
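To get a feel for what librtmp itself provides before we fold it into FFmpeg, here is a minimal sketch of opening a stream with the library's own C API. This is not part of the player we build later; the URL handling and the rtmp_probe name are illustrative assumptions.

extern "C" {
#include <librtmp/rtmp.h>
}
#include <cstdio>

// Probe an RTMP URL with librtmp directly; returns true if the stream can be opened.
bool rtmp_probe(const char *url) {
    RTMP *rtmp = RTMP_Alloc();
    RTMP_Init(rtmp);
    // RTMP_SetupURL takes a non-const char *, so copy the URL into a writable buffer
    char playUrl[512];
    std::snprintf(playUrl, sizeof(playUrl), "%s", url);
    bool ok = RTMP_SetupURL(rtmp, playUrl)
              && RTMP_Connect(rtmp, nullptr)     // TCP connection + RTMP handshake
              && RTMP_ConnectStream(rtmp, 0);    // start the stream (seek time 0)
    RTMP_Close(rtmp);
    RTMP_Free(rtmp);
    return ok;
}

Once FFmpeg is built with --enable-librtmp we never call these functions directly; avformat_open_input handles rtmp:// URLs through librtmp for us.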
Download:
git clone https://github.com/yixia/librtmp.git
Build script:
#!/bin/bash
# Configure the NDK environment variable
NDK_ROOT=$NDK_HOME
# Specify the CPU
CPU=arm-linux-androideabi
# Specify the Android API
ANDROID_API=17

TOOLCHAIN=$NDK_ROOT/toolchains/$CPU-4.9/prebuilt/linux-x86_64

export XCFLAGS="-isysroot $NDK_ROOT/sysroot -isystem $NDK_ROOT/sysroot/usr/include/arm-linux-androideabi -D__ANDROID_API__=$ANDROID_API"
export XLDFLAGS="--sysroot=${NDK_ROOT}/platforms/android-17/arch-arm"
export CROSS_COMPILE=$TOOLCHAIN/bin/arm-linux-androideabi-

make install SYS=android prefix=`pwd`/result CRYPTO= SHARED= XDEF=-DNO_SSL
The compile is successful if the following result appears:
Hybrid compilation
Comment out the librtmp detection in FFmpeg's configure script, as shown below:
Modify FFmpeg compilation script:
#!/bin/bash
#The NDK_ROOT variable points to the NDK directory
NDK_ROOT=$NDK_HOME
#The TOOLCHAIN variable points to the directory of the cross-compilation GCC inside the NDK
TOOLCHAIN=$NDK_ROOT/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64
#Specify the Android API version
ANDROID_API=17
#This variable is used to specify the directory in which the libraries and header files will be stored after compilation
PREFIX=./android/armeabi-v7a
#RTMP path
RTMP=/root/android/librtmp/result
#Execute the configure script to generate the makefile
#--prefix: installation directory
#--enable-small: optimizes the size
#--disable-programs: do not compile the FFmpeg command-line tools; we only need the static (or shared) libraries
#--disable-avdevice: disables the avdevice module. This module is useless in Android
#--disable-encoders: Disable all encoders (play without encoding)
#--disable-muxers: Disable all multiplexers (wrappers). You don't need to generate files like MP4, so disable
#--disable-filters: Turns off the video filters
#--enable-cross-compile: enables cross-compilation
#--cross-prefix: the prefix of the cross-compiler GCC, e.g. xxx/xxx/xxx-gcc becomes xxx/xxx/xxx-
#--disable-shared --enable-static: these are the defaults, set here explicitly
#--sysroot:
#--extra-cflags: arguments passed to GCC
#--arch --target-os: must be specified
./configure \
--prefix=$PREFIX \
--enable-small \
--disable-programs \
--disable-avdevice \
--disable-encoders \
--disable-muxers \
--disable-filters \
--enable-librtmp \
--enable-cross-compile \
--cross-prefix=$TOOLCHAIN/bin/arm-linux-androideabi- \
--disable-shared \
--enable-static \
--sysroot=$NDK_ROOT/platforms/android-$ANDROID_API/arch-arm \
--extra-cflags="-isysroot $NDK_ROOT/sysroot -isystem $NDK_ROOT/sysroot/usr/include/arm-linux-androideabi -D__ANDROID_API__=$ANDROID_API -U_FILE_OFFSET_BITS -DANDROID -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -march=armv7-a -mfloat-abi=softfp -mfpu=vfpv3-d16 -mthumb -Wa,--noexecstack -Wformat -Werror=format-security -O0 -fPIC -I$RTMP/include" \
--extra-ldflags="-L$RTMP/lib" \
--extra-libs="-lrtmp" \
--arch=arm \
--target-os=android
#After running the script above to generate the makefile, use make to execute the script
make clean
make
make install
If the following appears, the compilation has started:
If the following is displayed, the compilation is successful:
As you can see from the figure above, both the static libraries and the header files have been generated successfully. Now we are ready to write the code.
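Before wiring everything into the Android project, you can sanity-check the build products with a tiny program linked against the generated libraries. This check is my own addition, not part of the article's player code:

extern "C" {
#include <libavutil/avutil.h>
#include <libavcodec/avcodec.h>
}
#include <cstdio>

int main() {
    // Should print the FFmpeg version we compiled, e.g. "4.2.2"
    std::printf("FFmpeg version: %s\n", av_version_info());
    // The libavcodec version packed as an integer
    std::printf("avcodec version: %u\n", avcodec_version());
    return 0;
}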
Player development
The flow chart
To implement a network/local player, we first need to understand its workflow: set the data source, demux it, decode the packets, then render the frames while keeping audio and video in sync, as shown below:
Project preparation
- Create a new Android project and import the required libraries
- CMakeLists.txt build script
cmake_minimum_required(VERSION 3.4.1)

# Define the ffmpeg, rtmp and yk_player directories
set(FFMPEG ${CMAKE_SOURCE_DIR}/ffmpeg)
set(RTMP ${CMAKE_SOURCE_DIR}/librtmp)
set(YK_PLAYER ${CMAKE_SOURCE_DIR}/player)

# Specify the ffmpeg header directory
include_directories(${FFMPEG}/include)

# Specify the ffmpeg static library directory
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -L${FFMPEG}/libs/${CMAKE_ANDROID_ARCH_ABI}")

# Specify the rtmp static library directory
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -L${RTMP}/libs/${CMAKE_ANDROID_ARCH_ABI}")

# Add all of our own .cpp files in batch (do not add .h files)
file(GLOB ALL_CPP ${YK_PLAYER}/*.cpp)

# Build a shared library from our .cpp source files
add_library(YK_PLAYER SHARED ${ALL_CPP})

# Find the NDK log library in the system
find_library(log_lib log)

# Finally link the libraries
target_link_libraries(
        YK_PLAYER
        # --start-group/--end-group avoids crashes caused by the link order of the ffmpeg libs
        -Wl,--start-group
        avcodec avfilter avformat avutil swresample swscale
        -Wl,--end-group
        z
        rtmp
        android
        # Audio playback
        OpenSLES
        ${log_lib}
)
- Define the native functions
/**
 * Current FFmpeg version
 */
public native String getFFmpegVersion();

/**
 * Set the surface
 *
 * @param surface
 */
public native void setSurfaceNative(Surface surface);

/**
 * Do some preparatory work
 *
 * @param mDataSource the playback source
 */
public native void prepareNative(String mDataSource);

/**
 * Start playing
 */
public native void startNative();

/**
 * If playback was stopped, call this function to resume playing
 */
public native void restartNative();

/**
 * Stop playing
 */
public native void stopNative();

/**
 * Release resources
 */
public native void releaseNative();

/**
 * Whether the player is playing
 *
 * @return
 */
public native boolean isPlayerNative();
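The article does not list the native side of these declarations, so here is a minimal sketch of what two of the corresponding JNI entry points might look like. The package and class name are taken from the setSurfaceNative function shown later; the global YKPlayer pointer and the commented-out calls are assumptions, not the article's actual code.

extern "C" {
#include <libavutil/avutil.h>
}
#include <jni.h>

// Assumed global player instance, created in prepareNative
// YKPlayer *ykPlayer = nullptr;

extern "C"
JNIEXPORT jstring JNICALL
Java_com_devyk_player_1common_PlayerManager_getFFmpegVersion(JNIEnv *env, jclass type) {
    // av_version_info() returns the FFmpeg version string, e.g. "4.2.2"
    return env->NewStringUTF(av_version_info());
}

extern "C"
JNIEXPORT void JNICALL
Java_com_devyk_player_1common_PlayerManager_prepareNative(JNIEnv *env, jclass type, jstring dataSource) {
    const char *source = env->GetStringUTFChars(dataSource, nullptr);
    // Hand the data source to the C++ player, which demuxes/decodes on child threads, e.g.:
    // ykPlayer = new YKPlayer(source, callback); ykPlayer->prepare();
    env->ReleaseStringUTFChars(dataSource, source);
}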
decapsulation
From the flow chart above we know that once the data source is set, FFmpeg starts demuxing (think of it as receiving a parcel: we open it, look at what is inside, and sort the contents into their places). Here the data source is split into encoded audio data, video data, subtitles and so on. The FFmpeg API is used to demux the data as follows:
/** * This function does the actual demuxing; it is called on a child thread. */
void YKPlayer::prepare_() {
LOGD("Step 1 Open streaming media address");
//1. Open the streaming media address (file path, live address)
    // If the context passed to avformat_open_input is NULL, avformat_open_input allocates it automatically
// Encapsulates the format information of the media stream
formatContext = avformat_alloc_context();
// dictionary: key-value pairs
AVDictionary *dictionary = 0;
    av_dict_set(&dictionary, "timeout", "5000000", 0); // the unit is microseconds
    /**
     * @param AVFormatContext **ps: the format context, passed as a pointer to pointer
     * @param const char *url: the stream address
     * @param ff_const59 AVInputFormat *fmt: the input container format; FFmpeg can detect it itself, so pass 0
     * @param AVDictionary **options: dictionary of options
     */
int result = avformat_open_input(&formatContext, data_source, 0, &dictionary);
    // result -13 --> no read/write permission
    // result -99 --> NULL was passed as the third parameter
    LOGD("avformat_open_input->%d, %s", result, data_source);
// Release the dictionary
av_dict_free(&dictionary);
    if (result) { // returns 0 on success
// Your file path, or, your file is corrupted, need to tell the user
// Send error messages to the Java layer (callback to Java)
if (pCallback) {
pCallback->onErrorAction(THREAD_CHILD, FFMPEG_CAN_NOT_OPEN_URL);
}
return;
}
// The second step is to find information about audio and video streams in the media
LOGD("Step 2 Find information about audio and video streams in media.");
result = avformat_find_stream_info(formatContext, 0);
if (result < 0) {
if (pCallback) {
pCallback->onErrorAction(THREAD_CHILD, FFMPEG_CAN_NOT_FIND_STREAMS);
            return;
        }
    }

    // Step 3: use the stream info and the number of streams to loop through and find the audio and video streams
LOGD("Step three, based on the stream information, the number of streams, loop search, audio stream video stream.");
//nb_streams = Number of streams
for (int stream_index = 0; stream_index < formatContext->nb_streams; ++stream_index) {
// Step 4 get the media stream
LOGD("Step 4 Get media streams);
AVStream *stream = formatContext->streams[stream_index];
// Step 5: Get the parameter information of decoding stream from stream, distinguish whether it is audio or video
LOGD("The fifth step is to get the parameter information of decoding the stream from the stream and distinguish whether it is audio or video.");
AVCodecParameters *codecParameters = stream->codecpar;
// The sixth step is to obtain the current stream decoder through the codec ID in the stream codec parameter
LOGD("Step 6 Obtain the decoder of the current stream through the codec ID in the stream codec parameter.");
AVCodec *codec = avcodec_find_decoder(codecParameters->codec_id);
// Current decoding may not be supported
// If no decoder is found, recompile ffmpeg --enable-librtmp
        if (!codec) {
            pCallback->onErrorAction(THREAD_CHILD, FFMPEG_FIND_DECODER_FAIL);
            return;
}
// Get the decoder context from the decoder
LOGD("Step seven: Get the decoder context from the decoder you have.");
AVCodecContext *codecContext = avcodec_alloc_context3(codec);
        if (!codecContext) {
            pCallback->onErrorAction(THREAD_CHILD, FFMPEG_ALLOC_CODEC_CONTEXT_FAIL);
            return;
}
// Set parameters for the decoder context
LOGD("Step 8 Set parameters for the decoder context");
result = avcodec_parameters_to_context(codecContext, codecParameters);
if (result < 0) {
pCallback->onErrorAction(THREAD_CHILD, FFMPEG_CODEC_CONTEXT_PARAMETERS_FAIL);
return;
}
// Step 9 open the decoder
LOGD("Step 9 Open the decoder");
result = avcodec_open2(codecContext, codec, 0);
if (result) {
pCallback->onErrorAction(THREAD_CHILD, FFMPEG_OPEN_DECODER_FAIL);
return;
}
// You can get the time base in media stream
AVRational baseTime = stream->time_base;
// Get the stream type codec_type from the encoder parameters
LOGD("Step 10 Get the stream type codec_type from the encoder parameters");
if (codecParameters->codec_type == AVMEDIA_TYPE_AUDIO) {
audioChannel = new AudioChannel(stream_index, codecContext,baseTime);
} else if (codecParameters->codec_type == AVMEDIA_TYPE_VIDEO) {
// Get the video FPS
// Average frame rate == time base
AVRational frame_rate = stream->avg_frame_rate;
int fps_value = av_q2d(frame_rate);
            videoChannel = new VideoChannel(stream_index, codecContext, baseTime, fps_value);
            videoChannel->setRenderCallback(renderCallback);
        }
    } // end for
// Step 11 If there is no audio and video data in the stream
LOGD("Step 11 If there is no audio and video data in the stream");
    if (!audioChannel && !videoChannel) {
        pCallback->onErrorAction(THREAD_CHILD, FFMPEG_NOMEDIA);
        return;
}
// Step 12 Either have audio or video or both
LOGD("Step 12 has either audio or video or both.");
// When ready, notify the Android upper layer to start playing
    if (this->pCallback) {
        pCallback->onPrepared(THREAD_CHILD);
    }
}
The code above mainly distinguishes audio streams from video streams via codecParameters->codec_type and prepares a decoder context for each.
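The comment at the top of prepare_() says it runs on a child thread. The article does not show that part, so the sketch below is an assumption of how a prepare() entry point might start the thread with pthread (task_prepare and pid_prepare are hypothetical names):

#include <pthread.h>

// Thread entry: simply forwards to the member function that does the demuxing
void *task_prepare(void *args) {
    auto *player = static_cast<YKPlayer *>(args);
    player->prepare_();   // the demuxing function shown above
    return nullptr;
}

void YKPlayer::prepare() {
    // pid_prepare is assumed to be a pthread_t member of YKPlayer
    pthread_create(&pid_prepare, nullptr, task_prepare, this);
}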
Obtain data to be decoded (e.g. H264, AAC)
After demuxing, we put the data to be decoded into queues, as shown below:
/** * Read packets (not yet decoded) and put the audio/video packets into their queues */
void YKPlayer::start_() {
// Loop through the audio and video package
while (isPlaying) {
if (isStop) {
av_usleep(2 * 1000);
continue;
}
LOGD("start_");
// Memory leak point 1, solution: control queue size
if (videoChannel && videoChannel->videoPackages.queueSize() > 100) {
// The data in the dormant wait queue is consumed
av_usleep(10 * 1000);
continue;
}
// Memory leak point 2, solution controls queue size
if (audioChannel && audioChannel->audioPackages.queueSize() > 100) {
// The data in the dormant wait queue is consumed
av_usleep(10 * 1000);
continue;
}
//AVPacket may be audio or video, but it is not decoded
AVPacket *packet = av_packet_alloc();
// This line is completed, packet has audio and video data
int ret = av_read_frame(formatContext, packet);
        /* if (ret != 0) { return; } */
        if (!ret) {
            if (videoChannel && videoChannel->stream_index == packet->stream_index) { // video packet
                // Enqueue the undecoded video packet
                videoChannel->videoPackages.push(packet);
            } else if (audioChannel && audioChannel->stream_index == packet->stream_index) { // audio packet
                // Enqueue the audio packet for decoding
                audioChannel->audioPackages.push(packet);
            }
        } else if (ret == AVERROR_EOF) { // all data has been read
//TODO----
LOGD("Unpack completed %s"."Read complete.")
isPlaying = 0;
stop();
release();
break;
} else {
            LOGD("Demuxing: %s", "read failed");
            break; // read failed
        }
    } // end while
// Finally release the work
isPlaying = 0;
isStop = false;
videoChannel->stop();
audioChannel->stop();
}
The code above mainly uses av_packet_alloc() to obtain the AVPacket pointer that will hold the data to be decoded, then pushes it into the corresponding audio or video queue to await decoding.
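The videoPackages/audioPackages members used here are thread-safe queues; the article does not show their implementation, so below is a minimal sketch of what such a queue might look like. The class and member names are assumptions chosen to match the push/pop/queueSize calls used in the player code.

#include <queue>
#include <pthread.h>

template<typename T>
class SafeQueue {
public:
    SafeQueue() {
        pthread_mutex_init(&mutex, nullptr);
        pthread_cond_init(&cond, nullptr);
    }
    ~SafeQueue() {
        pthread_mutex_destroy(&mutex);
        pthread_cond_destroy(&cond);
    }
    // Enqueue one element and wake up a waiting consumer
    void push(T value) {
        pthread_mutex_lock(&mutex);
        if (work) {
            q.push(value);
            pthread_cond_signal(&cond);
        }
        pthread_mutex_unlock(&mutex);
    }
    // Dequeue one element; returns 1 on success, 0 if the queue was drained or stopped
    int pop(T &value) {
        int ret = 0;
        pthread_mutex_lock(&mutex);
        while (work && q.empty()) {
            pthread_cond_wait(&cond, &mutex);  // block until data arrives or work stops
        }
        if (!q.empty()) {
            value = q.front();
            q.pop();
            ret = 1;
        }
        pthread_mutex_unlock(&mutex);
        return ret;
    }
    int queueSize() {
        pthread_mutex_lock(&mutex);
        int size = static_cast<int>(q.size());
        pthread_mutex_unlock(&mutex);
        return size;
    }
    // Switch the queue on/off; switching off wakes up any blocked pop()
    void setWork(bool w) {
        pthread_mutex_lock(&mutex);
        work = w;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&mutex);
    }
private:
    std::queue<T> q;
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool work = true;  // whether the queue is accepting and handing out data
};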
video
decoding
As we saw in the previous step, after demuxing the data was put into queues waiting to be decoded. In this step we pull the data from the queue and decode it, as shown below:
/** ** video decoding */
void VideoChannel::video_decode() {
AVPacket *packet = 0;
while (isPlaying) {
if (isStop) {
            // the thread sleeps briefly (2 ms)
av_usleep(2 * 1000);
continue;
}
// Control the queue size to avoid fast production and full consumption
if (isPlaying && videoFrames.queueSize() > 100) {
// LOGE(" size :%d", videofames.queuesize ());
// Thread sleep waits for the data in the queue to be consumed
            av_usleep(10 * 1000); // 10 ms
continue;
}
int ret = videoPackages.pop(packet);
// If you stop playing, break out of the loop, release
        if (!isPlaying) {
            LOGD("isPlaying %d", isPlaying);
            break;
        }
        if (!ret) {
            continue;
        }
// Start fetching the video packet to be decoded
ret = avcodec_send_packet(pContext, packet);
if (ret) {
LOGD("ret %d", ret);
            break; // failed
}
        // release the packet
releaseAVPacket(&packet);
//AVFrame gets the decoded raw packet
AVFrame *frame = av_frame_alloc();
ret = avcodec_receive_frame(pContext, frame);
if (ret == AVERROR(EAGAIN)) {
            // no frame available yet, try again
            continue;
        } else if (ret != 0) {
LOGD("ret %d", ret);
releaseAVFrame(&frame);// Free memory
break;
}
// The decoded video data YUV is added to queue
videoFrames.push(frame);
}
// release the loop
if (packet)
releaseAVPacket(&packet);
}
As the code above shows, the packet to be decoded is fed to the decoder with avcodec_send_packet, and the decoded frame is then received with avcodec_receive_frame. Finally the decoded YUV data is pushed into the raw-frame queue, ready for conversion.
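For reference, the general send/receive pattern described in the FFmpeg documentation is sketched below: one packet can yield zero or more frames, so frames are received in a loop until AVERROR(EAGAIN), and the decoder is drained by sending a NULL packet at end of stream. This is not the article's code, just the pattern the calls above are based on; decode_packet and on_frame are hypothetical names.

extern "C" {
#include <libavcodec/avcodec.h>
}

static int decode_packet(AVCodecContext *ctx, AVPacket *packet,
                         void (*on_frame)(AVFrame *)) {
    int ret = avcodec_send_packet(ctx, packet);  // packet == NULL flushes (drains) the decoder
    if (ret < 0) {
        return ret;
    }
    AVFrame *frame = av_frame_alloc();
    while (ret >= 0) {
        ret = avcodec_receive_frame(ctx, frame);
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) {
            break;  // need more input, or the decoder has been fully drained
        } else if (ret < 0) {
            break;  // a real decoding error
        }
        on_frame(frame);        // hand the decoded frame to the caller
        av_frame_unref(frame);  // reuse the same AVFrame for the next iteration
    }
    av_frame_free(&frame);
    return ret;
}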
YUV to RGBA
YUV cannot be rendered directly on Android; we need to convert it to RGBA and then render it with the native window (ANativeWindow) or OpenGL ES. Here we call the FFmpeg API directly to do the conversion; the code is as follows:
void VideoChannel::video_player() {
//1. Raw video data YUV --> rGBA
/** * sws_getContext(int srcW, int srcH, enum AVPixelFormat srcFormat, int dstW, int dstH, enum AVPixelFormat dstFormat, int flags, SwsFilter *srcFilter, SwsFilter *dstFilter, const double *param) */
SwsContext *swsContext = sws_getContext(pContext->width, pContext->height,
pContext->pix_fmt,
pContext->width, pContext->height, AV_PIX_FMT_RGBA,
                                             SWS_BILINEAR, NULL, NULL, NULL);
//2. Allocate memory for dst_data
uint8_t *dst_data[4];
int dst_linesize[4];
AVFrame *frame = 0;
    /**
     * pointers[4]: the addresses of the image planes; for RGB the first three pointers point to the R, G and B
     *              planes and the fourth is reserved.
     * linesizes[4]: the memory-aligned stride of each plane, i.e. the aligned width in bytes of one row.
     * w: the width of the image to allocate.
     * h: the height of the image to allocate.
     * pix_fmt: the pixel format of the image to allocate.
     * align: the value used for memory alignment.
     * Return value: the total size of the allocated buffer; a negative value means the allocation failed.
     */
int ret = av_image_alloc(dst_data, dst_linesize, pContext->width, pContext->height,
AV_PIX_FMT_RGBA, 1);
if (ret < 0) {
printf("Could not allocate source image\n");
return;
}
//3. YUV -> RGBA format conversion frame by frame conversion
while (isPlaying) {
if (isStop) {
            // the thread sleeps briefly (2 ms)
av_usleep(2 * 1000);
continue;
}
int ret = videoFrames.pop(frame);
// If it stops playing, it breaks out of the loop and needs to be released
        if (!isPlaying) {
            break;
        }
        if (!ret) {
            continue;
        }
// The actual conversion function,dst_data is rGBA data
sws_scale(swsContext, frame->data, frame->linesize, 0, pContext->height, dst_data,
dst_linesize);
// Start rendering, display screen
        // Render one frame of the image (data, width, height, line size)
renderCallback(dst_data[0], pContext->width, pContext->height, dst_linesize[0]);
releaseAVFrame(&frame);// Frame is released after rendering.
}
releaseAVFrame(&frame);// Frame is released after rendering.
// Stop playing flag
isPlaying = 0;
av_freep(&dst_data[0]);
sws_freeContext(swsContext);
}
The code above performs the YUV to RGBA conversion directly with the sws_scale function.
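The renderCallback used above is set via setRenderCallback() during demuxing; the article does not show its declaration, so a minimal sketch (the typedef name is an assumption) might look like this:

// A plain function pointer that receives one converted RGBA frame
typedef void (*RenderCallback)(uint8_t *src_data, int width, int height, int src_lineSize);

// Inside VideoChannel (sketch):
// void setRenderCallback(RenderCallback callback) { this->renderCallback = callback; }

// In the native layer the callback would be wired to the renderFrame function shown in the next section:
// player->setRenderCallback(renderFrame);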
Rendering RGBA
After the conversion, we call ANativeWindow directly to render. The code is as follows:
/** * Set the surface used for playback */
extern "C"
JNIEXPORT void JNICALL
Java_com_devyk_player_1common_PlayerManager_setSurfaceNative(JNIEnv *env, jclass type, jobject surface) {
LOGD("Java_com_devyk_player_1common_PlayerManager_setSurfaceNative");
pthread_mutex_lock(&mutex);
if (nativeWindow) {
ANativeWindow_release(nativeWindow);
nativeWindow = 0;
}
// Create a new window for video display window
nativeWindow = ANativeWindow_fromSurface(env, surface);
pthread_mutex_unlock(&mutex);
}
Render:
/**
 * Dedicated rendering function
 * @param src_data decoded RGBA video data
 * @param width    video width
 * @param height   video height
 * @param src_size the line size (stride) of the source data
 */
void renderFrame(uint8_t *src_data, int width, int height, int src_size) {
pthread_mutex_lock(&mutex);
    if (!nativeWindow) {
        pthread_mutex_unlock(&mutex);
        return;
    }
    // Set the window properties
ANativeWindow_setBuffersGeometry(nativeWindow, width, height, WINDOW_FORMAT_RGBA_8888);
ANativeWindow_Buffer window_buffer;
if (ANativeWindow_lock(nativeWindow, &window_buffer, 0)) {
ANativeWindow_release(nativeWindow);
nativeWindow = 0;
pthread_mutex_unlock(&mutex);
return;
}
// Add data to buffer
uint8_t *dst_data = static_cast<uint8_t *>(window_buffer.bits);
int lineSize = window_buffer.stride * 4;//RGBA
    // Copy the data line by line
for (int i = 0; i < window_buffer.height; ++i) {
memcpy(dst_data + i * lineSize, src_data + i * src_size, lineSize);
}
ANativeWindow_unlockAndPost(nativeWindow);
pthread_mutex_unlock(&mutex);
}
The video rendering is complete.
audio
decoding
The audio flow is the same as the video flow: after getting the demuxed AAC data we start decoding. The code is as follows:
/** * audio decoding */
void AudioChannel::audio_decode() {
// Packet to be decoded
AVPacket *avPacket = 0;
// Loop data as long as it is playing
while (isPlaying) {
if (isStop) {
            // the thread sleeps briefly (2 ms)
av_usleep(2 * 1000);
continue;
}
        // There is a potential bug here: if production is fast and consumption is slow, the queue grows too large and causes OOM.
// The solution is to control the queue size
if (isPlaying && audioFrames.queueSize() > 100) {
// LOGE(" size :%d", audiofames.queuesize ());
            // the thread sleeps for 10 ms
av_usleep(10 * 1000);
continue;
}
// It can be removed normally
int ret = audioPackages.pop(avPacket);
// Condition to determine whether to continue
        if (!ret) continue;
        if (!isPlaying) break;
// The data to be decoded is sent to the decoder
        ret = avcodec_send_packet(pContext, avPacket); // returns 0 on success, otherwise a negative error code
        if (ret) break; // failed to send to the decoder
// The packet is sent successfully
releaseAVPacket(&avPacket);
// Retrieve the decoded raw packet
AVFrame *avFrame = av_frame_alloc();
// Send raw data to avFrame memory
ret = avcodec_receive_frame(pContext, avFrame);//0:success, a frame was returned
        if (ret == AVERROR(EAGAIN)) {
            continue; // no frame available yet, move on to the next packet
        } else if (ret != 0) { // a real failure
releaseAVFrame(&avFrame);// Release the requested memory
break;
}
// Put the raw data in the queue, i.e. the decoded raw data
audioFrames.push(avFrame);
}
    // release the packet
if (avPacket)
releaseAVPacket(&avPacket);
}
The audio decoding logic is the same as the video decoding logic, so there is no need to elaborate further.
Render the PCM
PCM can be rendered with AudioTrack at the Java layer or with OpenSL ES in the NDK. For performance and easier integration, I implemented it directly in C++; the code is as follows:
/** * PCM data */
void AudioChannel::audio_player() {
//TODO 1. Create the engine and get the engine interface
// 1.1 creating an engineObject: SLObjectItf engineObject
    SLresult result = slCreateEngine(&engineObject, 0, NULL, 0, NULL, NULL);
    if (SL_RESULT_SUCCESS != result) {
        return;
}
    // 1.2 Initialize the engine
result = (*engineObject)->Realize(engineObject, SL_BOOLEAN_FALSE);
    if (SL_BOOLEAN_FALSE != result) {
        return;
}
// 1.3 Obtaining SLEngineItf engineInterface
result = (*engineObject)->GetInterface(engineObject, SL_IID_ENGINE, &engineInterface);
    if (SL_RESULT_SUCCESS != result) {
        return;
}
//TODO 2. Set the mixer
// 2.1 Creating a mixer: SLObjectItf outputMixObject
    result = (*engineInterface)->CreateOutputMix(engineInterface, &outputMixObject, 0, 0, 0);
    if (SL_RESULT_SUCCESS != result) {
        return;
}
// 2.2 Initializing the mixer
result = (*outputMixObject)->Realize(outputMixObject, SL_BOOLEAN_FALSE);
    if (SL_BOOLEAN_FALSE != result) {
        return;
}
    // Without reverb enabled, there is no need to obtain the mixer interface
    // Get the mixer interface
    // result = (*outputMixObject)->GetInterface(outputMixObject, SL_IID_ENVIRONMENTALREVERB,
    //                                           &outputMixEnvironmentalReverb);
    // if (SL_RESULT_SUCCESS == result) {
    //     Set the reverb: default
    //     SL_I3DL2_ENVIRONMENT_PRESET_ROOM: room
    //     SL_I3DL2_ENVIRONMENT_PRESET_AUDITORIUM: auditorium, etc.
    //     const SLEnvironmentalReverbSettings settings = SL_I3DL2_ENVIRONMENT_PRESET_DEFAULT;
    //     (*outputMixEnvironmentalReverb)->SetEnvironmentalReverbProperties(
    //             outputMixEnvironmentalReverb, &settings);
    // }
    //TODO 3. Create the player
    // 3.1 Configure the input audio information
// Create a buffer queue
SLDataLocator_AndroidSimpleBufferQueue loc_bufq = {SL_DATALOCATOR_ANDROIDSIMPLEBUFFERQUEUE,
2};
// PCM data format
// SL_DATAFORMAT_PCM: Data format is PCM
// 2: dual channel
    // SL_SAMPLINGRATE_44_1: sample rate 44100 Hz (44.1 kHz is the most widely used and most compatible)
    // SL_PCMSAMPLEFORMAT_FIXED_16: sample format is 16 bit (2 bytes)
    // SL_PCMSAMPLEFORMAT_FIXED_16: container size is 16 bit (2 bytes)
    // SL_SPEAKER_FRONT_LEFT | SL_SPEAKER_FRONT_RIGHT: left and right channels (stereo)
    // SL_BYTEORDER_LITTLEENDIAN: little-endian byte order
SLDataFormat_PCM format_pcm = {SL_DATAFORMAT_PCM, 2, SL_SAMPLINGRATE_44_1,
SL_PCMSAMPLEFORMAT_FIXED_16,
SL_PCMSAMPLEFORMAT_FIXED_16,
SL_SPEAKER_FRONT_LEFT | SL_SPEAKER_FRONT_RIGHT,
SL_BYTEORDER_LITTLEENDIAN};
// Data source Puts the above configuration information into this data source
SLDataSource audioSrc = {&loc_bufq, &format_pcm};
// 3.2 Configuring audio Tracks (output)
// Set the mixer
SLDataLocator_OutputMix loc_outmix = {SL_DATALOCATOR_OUTPUTMIX, outputMixObject};
SLDataSink audioSnk = {&loc_outmix, NULL};
// Required interface Indicates the interface of the operation queue
const SLInterfaceID ids[1] = {SL_IID_BUFFERQUEUE};
const SLboolean req[1] = {SL_BOOLEAN_TRUE};
// 3.3 Creating a player
result = (*engineInterface)->CreateAudioPlayer(engineInterface, &bqPlayerObject, &audioSrc,
&audioSnk, 1, ids, req);
    if (SL_RESULT_SUCCESS != result) {
        return;
}
// 3.4 Initializing player: SLObjectItf bqPlayerObject
result = (*bqPlayerObject)->Realize(bqPlayerObject, SL_BOOLEAN_FALSE);
    if (SL_RESULT_SUCCESS != result) {
        return;
}
    // 3.5 Get the player interface: SLPlayItf bqPlayerPlay
result = (*bqPlayerObject)->GetInterface(bqPlayerObject, SL_IID_PLAY, &bqPlayerPlay);
    if (SL_RESULT_SUCCESS != result) {
        return;
}
    //TODO 4. Set the player callback function
    // 4.1 Get the player queue interface: SLAndroidSimpleBufferQueueItf bqPlayerBufferQueue
    (*bqPlayerObject)->GetInterface(bqPlayerObject, SL_IID_BUFFERQUEUE, &bqPlayerBufferQueue);
    // 4.2 Set the callback: void bqPlayerCallback(SLAndroidSimpleBufferQueueItf bq, void *context)
(*bqPlayerBufferQueue)->RegisterCallback(bqPlayerBufferQueue, bqPlayerCallback, this);
//TODO 5. Set the playback state
(*bqPlayerPlay)->SetPlayState(bqPlayerPlay, SL_PLAYSTATE_PLAYING);
    //TODO 6. Manually activate the callback function once to kick off playback
bqPlayerCallback(bqPlayerBufferQueue, this);
}
Set render data:
/**
 * Get PCM data
 * @return the size of the converted PCM data in bytes
 */
int AudioChannel::getPCM() {
// Define PCM data size
int pcm_data_size = 0;
// Raw data wrapper class
AVFrame *pcmFrame = 0;
// loop out
while (isPlaying) {
if (isStop) {
            // the thread sleeps briefly (2 ms)
av_usleep(2 * 1000);
continue;
}
int ret = audioFrames.pop(pcmFrame);
        if (!isPlaying) break;
        if (!ret) continue;
//PCM handles logic
pcmFrame->data;
        // The audio player's output format is the one we defined above (16-bit stereo, ...)
        // The raw decoded PCM data may not match that output format,
        // so to fix the mismatch we need to resample.
// Start resampling
int dst_nb_samples = av_rescale_rnd(swr_get_delay(swr_ctx, pcmFrame->sample_rate) +
pcmFrame->nb_samples, out_sample_rate,
pcmFrame->sample_rate, AV_ROUND_UP);
        // resample
        /**
         * @param out_buffers: the output buffer; when the PCM data is packed, only out[0] is filled.
         * @param dst_nb_samples: the number of samples per channel the output buffer can hold.
         * @param pcmFrame->data: the input buffer; when the PCM data is packed, only in[0] needs to be filled.
         * @param pcmFrame->nb_samples: the number of samples per channel available in the input PCM data.
         *
         * @return the number of samples output per channel, or a negative number on error.
         */
ret = swr_convert(swr_ctx, &out_buffers, dst_nb_samples, (const uint8_t **) pcmFrame->data,
pcmFrame->nb_samples);// Return the number of samples output per channel, negative if an error occurs.
if (ret < 0) {
fprintf(stderr."Error while converting\n");
}
pcm_data_size = ret * out_sample_size * out_channels;
// For audio and video synchronization
audio_time = pcmFrame->best_effort_timestamp * av_q2d(this->base_time);
break;
}
// Resources are freed after rendering
releaseAVFrame(&pcmFrame);
return pcm_data_size;
}
/** * Create a callback function for playing audio */
void bqPlayerCallback(SLAndroidSimpleBufferQueueItf bq, void *context) {
AudioChannel *audioChannel = static_cast<AudioChannel *>(context);
// Get PCM audio raw stream
int pcmSize = audioChannel->getPCM();
    if (!pcmSize) return;
(*bq)->Enqueue(bq, audioChannel->out_buffers, pcmSize);
}
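getPCM() relies on swr_ctx, out_buffers, out_sample_rate, out_sample_size and out_channels, which the article does not show being initialized. A minimal sketch, assuming 44.1 kHz 16-bit stereo output to match the OpenSL ES configuration above (initSwr is a hypothetical name), might look like this:

// Assumed initialization, e.g. in the AudioChannel constructor.
// Needed headers: libswresample/swresample.h, libavutil/channel_layout.h, libavutil/samplefmt.h
void AudioChannel::initSwr() {
    out_channels = av_get_channel_layout_nb_channels(AV_CH_LAYOUT_STEREO); // 2 channels
    out_sample_size = av_get_bytes_per_sample(AV_SAMPLE_FMT_S16);          // 2 bytes per sample
    out_sample_rate = 44100;                                               // matches SL_SAMPLINGRATE_44_1

    // A generous output buffer: enough for one second of resampled audio
    out_buffers = static_cast<uint8_t *>(av_malloc(out_sample_rate * out_channels * out_sample_size));

    // Convert from the decoder's sample format/rate/layout to 16-bit stereo at 44100 Hz
    swr_ctx = swr_alloc_set_opts(nullptr,
                                 AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_S16, out_sample_rate,
                                 av_get_default_channel_layout(pContext->channels),
                                 pContext->sample_fmt, pContext->sample_rate,
                                 0, nullptr);
    swr_init(swr_ctx);
}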
The code so far is written and both audio and video render normally, but there is still a problem: as playback goes on, the audio and video drift apart and play out of sync. That is a very poor experience, so we solve it in the next section.
Audio and video synchronization
There are three approaches to audio and video synchronization: synchronizing audio to video, synchronizing video to audio, and synchronizing both audio and video to an external clock. Below is how each of the three alignments is implemented, along with its pros and cons.
- Audio is synchronized to video
Let's look at how this approach works. Synchronizing audio to video means, as the name implies, that the video keeps its own pace: it maintains a fixed refresh rate, or renders each frame for however long that frame needs, so every video frame is rendered in full. When we fill the AudioChannel module with audio data, we compare its timestamp with the timestamp of the video frame currently being rendered. If the difference is within a threshold, the audio frame is filled into the AudioChannel module directly so the user hears the sound. If it is not within the threshold, the audio has to be adjusted to align. Specifically, if the timestamp of the audio frame to be filled is smaller than the timestamp of the video frame currently being rendered, we skip ahead (either by speeding up playback or by discarding some audio frames); if it is larger, we wait, either by filling the AudioChannel module with silence or by slowing the audio down while the video frames keep rendering one by one. Once the video timestamp catches up with the audio timestamp, that audio frame can be filled in. That is how audio-to-video synchronization is implemented.
Advantages: every video frame is presented to the user, so the picture looks the smoothest.
Disadvantages: the audio is sped up or dropped. If only a few frames are dropped the user may barely notice, but when many frames are dropped or silence is inserted, the ear picks it up very easily.
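As an illustration only (this is not the strategy the player ends up using), the decision made when filling an audio frame could be sketched like this, assuming a 50 ms threshold; fabs comes from <cmath>:

// audio_time and video_time are in seconds, derived from the streams' time bases
double diff = audio_time - video_time;
const double threshold = 0.05; // assumed threshold of 50 ms

if (fabs(diff) <= threshold) {
    // within the threshold: fill the audio frame into the AudioChannel as usual
} else if (diff < 0) {
    // audio lags behind the video: skip (or speed up) audio frames to catch up
} else {
    // audio runs ahead of the video: wait, e.g. fill silence or slow the audio down
}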
- Video is synchronized to audio
Now let's look at how video-to-audio synchronization works; it is the opposite of the approach above. No matter which platform's audio engine does the playback, it can guarantee that the time it takes to play a piece of audio matches the duration that audio represents, so the audio playback position gives us a reliable timestamp. When the client code requests a video frame, it first computes the difference between the timestamp of the frame at the head of the video queue and the timestamp of the audio frame currently playing. If the difference is within the threshold, the video frame is rendered; if not, it must be aligned. Specifically, if the timestamp of the video frame at the head of the queue is smaller than the timestamp of the audio currently playing, we skip frames; if it is larger, we wait (sleep, render the previous frame again, or render nothing).
Advantages: the audio plays back continuously.
Disadvantages: the video picture skips frames, but the eye finds it much harder to notice dropped or skipped video frames.
- Audio and video are synchronized to an external clock
This strategy is more like a combination of the two approaches above. Its implementation maintains a separate external clock, which must increase steadily as time passes. Whenever we obtain audio data or a video frame, we align it against this external clock: if the difference does not exceed the threshold, the audio or video frame is returned (played or rendered) directly; if it exceeds the threshold, the alignment operation is performed, using the alignment operations of the two approaches above applied to audio and video respectively.
Advantages: it does the most to ensure that neither audio nor video has to skip frames.
Disadvantages: the external clock is not easy to control and may end up causing frame skipping in both audio and video.
Synchronization summary:
Given the physiology of the human eye and ear, the conclusion is that the ear is far more sensitive than the eye: if the audio skips frames or has blank data filled in, our ears detect it very easily, while our eyes have a hard time noticing a video that skips a frame or renders one twice. Based on this, we also use video-to-audio alignment here.
Based on this conclusion, we need to make a few changes before rendering audio and video, as shown below:
- Get the audio timestamp through the FFmpeg API:
//1. Define variables in BaseChannel for the subclasses to use
//############### The following are needed for audio/video synchronization ###############
//FFmpeg time base: FFmpeg's internal time unit
AVRational base_time;
double audio_time;
double video_time;
//############### The above are needed for audio/video synchronization ###############

//2. After decoding, get the timestamp of the raw audio frame pcmFrame
audio_time = pcmFrame->best_effort_timestamp * av_q2d(this->base_time);
- Align the video to the audio timestamp (handle the video being ahead of or behind the audio):
// Align video to the audio timestamp -- this controls the video playback speed
// Before rendering, pace the video frames according to the FPS
// frame->repeat_pict = how long this picture must be delayed after decoding
double extra_delay = frame->repeat_pict;
// Delay derived from the FPS
double base_delay = 1.0 / this->fpsValue;
// Delay of the current frame
double result_delay = extra_delay + base_delay;

// Playback time of the video frame
video_time = frame->best_effort_timestamp * av_q2d(this->base_time);
// Playback time of the audio
double_t audioTime = this->audio_time;

// Difference between video and audio
double av_time_diff = video_time - audioTime;

// Explanation:
// video_time > audioTime means the video is fast and the audio is slow, so the video waits for the audio
// video_time < audioTime means the video is slow and the audio is fast, so the video must catch up: surplus video frames are dropped
if (av_time_diff > 0) {
    // Wait flexibly by sleeping
    if (av_time_diff > 1) {
        av_usleep((result_delay * 2) * 1000000);
        LOGE("av_time_diff > 1 sleep :%d", (result_delay * 2) * 1000000);
    } else {
        // The difference is small
        av_usleep((av_time_diff + result_delay) * 1000000);
        LOGE("av_time_diff < 1 sleep :%d", (av_time_diff + result_delay) * 1000000);
    }
} else {
    if (av_time_diff < 0) {
        LOGE("av_time_diff < 0 drop frame: %f", av_time_diff);
        // Drop the video frame
        this->videoFrames.deleteVideoFrame();
        continue;
    } else {
        // Perfectly in sync
    }
}
After adding this code, our audio and video stay roughly in sync; I cannot guarantee it 100%.
conclusion
We have implemented a simple audio and video player. We walked through demuxing -> decoding -> audio/video synchronization -> rendering, explaining each step and writing example code along the way. I believe you now have a reasonable understanding of a player's workflow and architecture, so when your company needs one you can design and develop a player yourself.
The full code has been uploaded to GitHub