1. Background
In 2015, when live streaming apps on mobile took off, the dominant live streaming protocol was RTMP. Multimedia servers such as Adobe AMS, Wowza, Red5, CrtmpServer and the Nginx RTMP module became popular, followed later by the RTMP server SRS. On Android, players initially used ExoPlayer to play HLS, but HLS latency is high, so most of us moved to the open-source ijkplayer: it pulls and decodes streams through FFmpeg, supports many audio and video codecs, is cross-platform, and exposes an API consistent with the system player. ijkplayer was later bundled into the live streaming SDKs offered by the major vendors.
At that time I was working on a game SDK that provided game picture and sound capture, audio/video encoding and decoding, live stream pushing, and live stream playback. The SDK offered live streaming features to games, and the off-the-shelf ijkplayer was used for playback. But the SDK ran into a problem in promotion: game vendors found it too bulky (about 3 MB in total, in fact), so we needed a small, high-performance player. Because of development costs there was never time to make the change during work hours, so I spent a month of spare time developing the player and open-sourced it. oarplayer is based on MediaCodec and srs-librtmp, is completely independent of FFmpeg, and is implemented in pure C. This article mainly introduces how this player is implemented.
2. Overall architecture design
The overall playing process of the player is as follows:
srs-librtmp pulls the live stream, separates audio and video packets by packet type, and caches them into packet queues. The decoding threads continuously read packets from the packet queues and feed them to the decoders, and the decoders store decoded frames into frame queues. The OpenSL ES playback thread and the OpenGL ES render thread read frames from the frame queues to play and render them; this is also where audio-video synchronization happens.
The player mainly involves the following threads:
- RTMP pull thread;
- Audio decoding thread;
- Video decoding thread;
- Audio player thread;
- Video rendering thread;
- JNI callback thread.
3. API interface design
To complete RTMP playback, perform the following steps:
- Instantiate OARPlayer:
OARPlayer player = new OARPlayer();
- Set the video source:
player.setDataSource(rtmp_url);
- Set up the surface:
player.setSurface(surfaceView.getHolder());
- Start playing:
player.start();
- Stop playing:
player.stop();
- Release resources:
player.release();
The Java-layer methods wrap JNI-layer methods, and the JNI layer calls into the corresponding native functions.
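As an illustration only, the mapping from a Java method to its JNI entry point might look like the sketch below; the package name com.example.oarplayer and the native function names are assumptions, not oarplayer's actual ones.

#include <jni.h>

// Hypothetical JNI entry points matching the Java API above (names are illustrative).
JNIEXPORT void JNICALL
Java_com_example_oarplayer_OARPlayer_nativeSetDataSource(JNIEnv *env, jobject thiz, jstring url) {
    const char *c_url = (*env)->GetStringUTFChars(env, url, NULL);
    // hand the URL over to the C-layer player instance here
    (*env)->ReleaseStringUTFChars(env, url, c_url);
}

JNIEXPORT void JNICALL
Java_com_example_oarplayer_OARPlayer_nativeStart(JNIEnv *env, jobject thiz) {
    // start the pull / decode / render / audio threads in the C layer
}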
4. RTMP stream-pulling thread
Why srs-librtmp? The SRS author originally created srs-librtmp for the following reasons:
- The rtmpdump/librtmp code is hard to read, while the SRS code is very readable;
- srs-bench is a client tool and needs a client library;
- The belief that since the server side could be done well, the client side would not be a problem.
The srs-librtmp author has since stopped maintaining it; the main reason, in the author's own words:
What decides the fate of an open-source project is not how good the technology is, but how long it keeps running. Strong technology, great performance and clean code style are all good things, but none of them outweighs the cardinal sin of being unmaintained: code that is released but not maintained cannot keep up with the industry's continuous technical evolution. What determines how long a project runs is, first, technical enthusiasm, and then the maintainer's domain background. The SRS maintainers all come from a server background and all work on servers; their client-side experience is too limited to maintain a client library for long. So SRS decided to decisively drop srs-librtmp, stop maintaining the client library, and focus on fast iteration of the server. That is not to say the client is unimportant, but it is better maintained by dedicated client-side open-source projects and friends, such as FFmpeg, which implements librtmp itself.
oarplayer originally chose srs-librtmp precisely because of its readability. oarplayer is highly modular, so it is easy to swap in other RTMP library implementations. Here is the srs-librtmp interface it uses (a sketch of the pull loop follows the list):
- Create the srs_rtmp_t object:
srs_rtmp_create(url);
- Set the read/write timeout:
srs_rtmp_set_timeout;
- Perform the handshake:
srs_rtmp_handshake;
- Connect to the application:
srs_rtmp_connect_app;
- Set playback mode:
srs_rtmp_play_stream;
- Read audio/video packets in a loop:
srs_rtmp_read_packet(rtmp, &type, &timestamp, &data, &size);
- Parse audio packets:
  - Get the encoding type:
  srs_utils_flv_audio_sound_format;
  - Get the audio sampling rate:
  srs_utils_flv_audio_sound_rate;
  - Get the sample bit depth:
  srs_utils_flv_audio_sound_size;
  - Get the channel count:
  srs_utils_flv_audio_sound_type;
  - Get the audio packet type:
  srs_utils_flv_audio_aac_packet_type;
- Parse video packets:
  - Get the encoding type:
  srs_utils_flv_video_codec_id;
  - Check whether the frame is a keyframe:
  srs_utils_flv_video_frame_type;
  - Get the video packet type:
  srs_utils_flv_video_avc_packet_type;
- Parse metadata types;
- Destroy the srs_rtmp_t object:
srs_rtmp_destroy;
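To make the flow concrete, here is a minimal sketch of a pull loop built from the calls above. Error handling, metadata parsing and the packet-queue plumbing are reduced to comments, so treat it as an outline rather than oarplayer's actual code.

#include <stdint.h>
#include <stdlib.h>
#include "srs_librtmp.h"

static void *rtmp_pull_thread(void *arg) {
    const char *url = (const char *)arg;
    srs_rtmp_t rtmp = srs_rtmp_create(url);
    srs_rtmp_set_timeout(rtmp, 3000, 3000);          // recv/send timeout, in ms
    if (srs_rtmp_handshake(rtmp) != 0) goto end;
    if (srs_rtmp_connect_app(rtmp) != 0) goto end;
    if (srs_rtmp_play_stream(rtmp) != 0) goto end;

    for (;;) {
        char type; uint32_t timestamp; char *data; int size;
        if (srs_rtmp_read_packet(rtmp, &type, &timestamp, &data, &size) != 0) break;
        if (type == SRS_RTMP_TYPE_AUDIO) {
            // srs_utils_flv_audio_* here, then copy the payload into an OARPacket
            // and push it into the audio packet queue
        } else if (type == SRS_RTMP_TYPE_VIDEO) {
            // srs_utils_flv_video_* here, then push into the video packet queue
        } else if (type == SRS_RTMP_TYPE_SCRIPT) {
            // parse metadata
        }
        free(data);   // read_packet allocates the buffer; the real player copies it out first
    }
end:
    srs_rtmp_destroy(rtmp);
    return NULL;
}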
A small trick here: in the pull thread we call srs_rtmp_read_packet in a loop. The read timeout can be set with srs_rtmp_set_timeout, but if the timeout is too short the thread wakes up too frequently, and if it is too long, stopping the player has to wait for the timeout to expire. Instead we can use the poll model: put the RTMP TCP socket into a poll set together with a pipe fd; when we need to stop, write a command byte into the pipe to wake the poll and exit the RTMP pull thread immediately.
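A minimal sketch of that wake-up trick is below. srs-librtmp does not directly expose the underlying socket, so rtmp_fd here is an assumption (it has to be obtained from the connection somehow), and the helper names are illustrative.

#include <poll.h>
#include <unistd.h>

static int wakeup_pipe[2];          // created once with pipe(wakeup_pipe)

// Wait until the RTMP socket is readable, the timeout expires, or stop is requested.
static int wait_readable(int rtmp_fd, int timeout_ms) {
    struct pollfd fds[2];
    fds[0].fd = rtmp_fd;        fds[0].events = POLLIN;
    fds[1].fd = wakeup_pipe[0]; fds[1].events = POLLIN;
    int ret = poll(fds, 2, timeout_ms);
    if (ret <= 0) return ret;                    // 0: timeout, <0: error
    if (fds[1].revents & POLLIN) return -2;      // stop requested: exit the pull thread
    return 1;                                    // RTMP data is readable
}

// Called from stop(): one write wakes the poll immediately, no need to wait for a timeout.
static void request_stop(void) {
    char cmd = 'q';
    write(wakeup_pipe[1], &cmd, 1);
}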
5. Main data structures
5.1 Package Structure:
typedef struct OARPacket {
    int size;                // packet size
    PktType_e type;          // packet type
    int64_t dts;             // decoding timestamp
    int64_t pts;             // presentation timestamp
    int isKeyframe;          // whether this is a keyframe
    struct OARPacket *next;  // next packet in the queue
    uint8_t data[0];         // packet payload
} OARPacket;
5.2 Package Queue:
typedef struct oar_packet_queue {
    PktType_e media_type;        // media type of the queue
    pthread_mutex_t *mutex;      // queue lock
    pthread_cond_t *cond;        // condition variable
    OARPacket *cachedPackets;    // head of the queue
    OARPacket *lastPacket;       // last element of the queue
    int count;                   // number of packets
    int total_bytes;             // total cached bytes
    uint64_t max_duration;       // maximum cached duration
    void (*full_cb)(void *);     // queue-full callback
    void (*empty_cb)(void *);    // queue-empty callback
    void *cb_data;               // callback user data
} oar_packet_queue;
5.3 Frame type
typedef struct OARFrame {
    int size;               // frame size
    PktType_e type;         // frame type
    int64_t dts;            // decoding timestamp
    int64_t pts;            // presentation timestamp
    int format;             // format (video)
    int width;              // width (video)
    int height;             // height (video)
    int64_t pkt_pos;
    int sample_rate;        // sample rate (audio)
    struct OARFrame *next;
    uint8_t data[0];
} OARFrame;
5.4 Frame queue
typedef struct oar_frame_queue {
    pthread_mutex_t *mutex;
    pthread_cond_t *cond;
    OARFrame *cachedFrames;
    OARFrame *lastFrame;
    int count;              // number of frames
    unsigned int size;
} oar_frame_queue;
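To show how these structures work together, here is a minimal sketch of a blocking put/get on the packet queue. The function names are illustrative, not necessarily oarplayer's real API, and the full/empty callbacks are left out for brevity.

#include <pthread.h>

// Producer (pull thread): append a packet and wake up a waiting consumer.
static void packet_queue_put(oar_packet_queue *q, OARPacket *pkt) {
    pthread_mutex_lock(q->mutex);
    pkt->next = NULL;
    if (q->lastPacket == NULL) {
        q->cachedPackets = pkt;            // queue was empty
    } else {
        q->lastPacket->next = pkt;
    }
    q->lastPacket = pkt;
    q->count++;
    q->total_bytes += pkt->size;
    pthread_cond_signal(q->cond);
    pthread_mutex_unlock(q->mutex);
}

// Consumer (decoding thread): block until a packet is available, then pop the head.
static OARPacket *packet_queue_get(oar_packet_queue *q) {
    pthread_mutex_lock(q->mutex);
    while (q->count == 0) {
        pthread_cond_wait(q->cond, q->mutex);
    }
    OARPacket *pkt = q->cachedPackets;
    q->cachedPackets = pkt->next;
    if (q->cachedPackets == NULL) q->lastPacket = NULL;
    q->count--;
    q->total_bytes -= pkt->size;
    pthread_mutex_unlock(q->mutex);
    return pkt;
}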
6. Decode threads
Our RTMP stream pulling, decoding, rendering, and audio output are all implemented in the C layer. In the C layer, Android API 21 and above provides the AMediaCodec interface: we simply use find_library(media-ndk mediandk) and include the <media/NdkMediaCodec.h> header. For Android versions before API 21, the Java-layer MediaCodec can be called from the C layer instead. Both implementations are described below:
6.1 Java Layer Proxy Decoding
Java layer MediaCodec decoding steps:
- Create a decoder:
codec = MediaCodec.createDecoderByType(codecName);
- Configure the decoder format:
codec.configure(format, null, null, 0);
- Start the decoder:
codec.start()
- Get an input buffer index:
dequeueInputBuffer
- Get the input buffer:
getInputBuffer
- Get an output buffer index:
dequeueOutputBuffer
- Release the output buffer:
releaseOutputBuffer
- Stop decoder:
codec.stop();
The JNI layer wraps the corresponding calls so that the C layer can drive the Java decoder.
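For illustration, calling the Java MediaCodec from the C layer boils down to ordinary JNI upcalls like the sketch below; the cached codec jobject and how it is obtained are assumptions.

#include <jni.h>

// Proxy MediaCodec.dequeueInputBuffer(long timeoutUs) from C; returns the buffer index.
static jint codec_dequeue_input_buffer(JNIEnv *env, jobject codec, jlong timeout_us) {
    jclass clazz = (*env)->GetObjectClass(env, codec);
    jmethodID mid = (*env)->GetMethodID(env, clazz, "dequeueInputBuffer", "(J)I");
    return (*env)->CallIntMethod(env, codec, mid, timeout_us);
}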
6.2 Use of C-layer decoder
The C-layer interface:
- Create the format:
AMediaFormat_new;
- Create a decoder:
AMediaCodec_createDecoderByType;
- Configure decoding parameters:
AMediaCodec_configure;
- Start the decoder:
AMediaCodec_start;
- Feed audio/video packets:
  - Get an input buffer index:
  AMediaCodec_dequeueInputBuffer;
  - Get the input buffer:
  AMediaCodec_getInputBuffer;
  - Copy the data:
  memcpy;
  - Queue the input buffer into the decoder:
  AMediaCodec_queueInputBuffer;
- Get decoded frames:
  - Get an output buffer index:
  AMediaCodec_dequeueOutputBuffer;
  - Get the output buffer:
  AMediaCodec_getOutputBuffer.
As you can see, the Java-layer and C-layer interfaces follow the same idea and both end up calling the system's decoding framework. We can therefore choose the Java-layer or the C-layer interface depending on the system version; since oarplayer's main code lives in the C layer, we prefer the C-layer interface wherever possible. A sketch of the C-layer flow follows.
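Here is a minimal sketch of the C-layer decoding path on API 21+, assuming an H.264 stream and the OARPacket structure from section 5; the MIME type, timeouts and surface handling are simplified for illustration.

#include <media/NdkMediaCodec.h>
#include <media/NdkMediaFormat.h>
#include <string.h>

static AMediaCodec *open_video_decoder(int width, int height) {
    AMediaFormat *format = AMediaFormat_new();
    AMediaFormat_setString(format, AMEDIAFORMAT_KEY_MIME, "video/avc");
    AMediaFormat_setInt32(format, AMEDIAFORMAT_KEY_WIDTH, width);
    AMediaFormat_setInt32(format, AMEDIAFORMAT_KEY_HEIGHT, height);
    AMediaCodec *codec = AMediaCodec_createDecoderByType("video/avc");
    AMediaCodec_configure(codec, format, NULL /* or an ANativeWindow */, NULL, 0);
    AMediaCodec_start(codec);
    AMediaFormat_delete(format);
    return codec;
}

static void decode_one_packet(AMediaCodec *codec, OARPacket *pkt) {
    ssize_t in_idx = AMediaCodec_dequeueInputBuffer(codec, 10000 /* us */);
    if (in_idx >= 0) {
        size_t cap;
        uint8_t *buf = AMediaCodec_getInputBuffer(codec, (size_t)in_idx, &cap);
        memcpy(buf, pkt->data, (size_t)pkt->size);
        // MediaCodec expects timestamps in microseconds; convert if the packet uses ms
        AMediaCodec_queueInputBuffer(codec, (size_t)in_idx, 0, (size_t)pkt->size,
                                     (uint64_t)pkt->pts, 0);
    }
    AMediaCodecBufferInfo info;
    ssize_t out_idx = AMediaCodec_dequeueOutputBuffer(codec, &info, 10000 /* us */);
    if (out_idx >= 0) {
        // hand the decoded frame to the frame queue, or render it directly
        AMediaCodec_releaseOutputBuffer(codec, (size_t)out_idx, false);
    }
}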
7. Audio output thread
For audio output we use OpenSL ES (an earlier article introduced the Android audio architecture; AAudio or Oboe would also work). Here is a brief walk-through of the OpenSL ES interface.
- Create engine:
slCreateEngine(&engineObject, 0, NULL, 0, NULL, NULL);
- Realize the engine:
(*engineObject)->Realize(engineObject, SL_BOOLEAN_FALSE);
- Get interface:
(*engineObject)->GetInterface(engineObject, SL_IID_ENGINE, &engineEngine);
- Create output mixer:
(*engineEngine)->CreateOutputMix(engineEngine, &outputMixObject, 0, NULL, NULL);
- Realize the output mix:
(*outputMixObject)->Realize(outputMixObject, SL_BOOLEAN_FALSE);
- Configure the audio source:
SLDataLocator_AndroidSimpleBufferQueue loc_bufq = {SL_DATALOCATOR_ANDROIDSIMPLEBUFFERQUEUE, 2};
- Configure the Format:
SLDataFormat_PCM format_pcm = {SL_DATAFORMAT_PCM, channel, SL_SAMPLINGRATE_44_1,SL_PCMSAMPLEFORMAT_FIXED_16, SL_PCMSAMPLEFORMAT_FIXED_16,SL_SPEAKER_FRONT_LEFT | SL_SPEAKER_FRONT_RIGHT, SL_BYTEORDER_LITTLEENDIAN};
- Create player:
(*engineEngine)->CreateAudioPlayer(engineEngine,&bqPlayerObject, &audioSrc, &audioSnk,2, ids, req);
- Realize the player:
(*bqPlayerObject)->Realize(bqPlayerObject, SL_BOOLEAN_FALSE);
- Get the playback interface:
(*bqPlayerObject)->GetInterface(bqPlayerObject, SL_IID_PLAY, &bqPlayerPlay);
- Get buffer interface:
(*bqPlayerObject)->GetInterface(bqPlayerObject, SL_IID_ANDROIDSIMPLEBUFFERQUEUE,&bqPlayerBufferQueue);
- Register the buffer queue callback:
(*bqPlayerBufferQueue)->RegisterCallback(bqPlayerBufferQueue, bqPlayerCallback, oar);
- Get the volume adjuster:
(*bqPlayerObject)->GetInterface(bqPlayerObject, SL_IID_VOLUME, &bqPlayerVolume);
- The buffer callback continuously reads data from the audio frame queue and enqueues it into the buffer queue:
(*bqPlayerBufferQueue)->Enqueue(bqPlayerBufferQueue, ctx->buffer,(SLuint32)ctx->frame_size);
That covers the OpenSL ES interfaces used for audio playback. A sketch of the buffer callback is shown below.
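As an illustration, the buffer callback might look like this sketch; the audio frame queue field, the persistent PCM buffer, and the release helper are assumptions about the player's internals rather than oarplayer's exact code.

#include <SLES/OpenSLES.h>
#include <SLES/OpenSLES_Android.h>
#include <string.h>

// Called by OpenSL ES whenever it needs more PCM data.
static void bqPlayerCallback(SLAndroidSimpleBufferQueueItf bq, void *context) {
    oarplayer *oar = (oarplayer *)context;
    OARFrame *frame = oar_frame_queue_get(oar->audio_frame_queue);   // assumed field name
    if (frame == NULL) return;                                       // e.g. end of stream
    // copy the PCM into the persistent buffer owned by the audio context, then enqueue it
    memcpy(oar->audio_player_ctx->buffer, frame->data, (size_t)frame->size);
    oar->audio_player_ctx->frame_size = frame->size;
    (*bq)->Enqueue(bq, oar->audio_player_ctx->buffer, (SLuint32)oar->audio_player_ctx->frame_size);
    release_audio_frame(oar, frame);                                 // hypothetical release helper
}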
8. Render threads
Compared with audio playback, video rendering is more complex: besides creating the OpenGL engine and the OpenGL thread, oarplayer synchronizes to the audio clock, so the render thread also has to handle audio-video synchronization.
8.1 Creating an OpenGL Engine
- Generate buffer:
glGenBuffers
- Bind the buffer:
glBindBuffer(GL_ARRAY_BUFFER, model->vbos[0])
- Set clear screen color:
glClearColor
- Create a texture object:
glGenTextures
- Create a shader object:
glCreateShader
- Set shader source code:
glShaderSource
- Compile shader source code:
glCompileShader
- Attach the shader:
glAttachShader
- Link the program:
glLinkProgram
OpenGL also needs an EGL environment to interact with the display hardware. The EGL initialization code is shown below:
static void init_egl(oarplayer *oar) {
    oar_video_render_context *ctx = oar->video_render_ctx;
    const EGLint attribs[] = {EGL_SURFACE_TYPE, EGL_WINDOW_BIT,
                              EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
                              EGL_BLUE_SIZE, 8,
                              EGL_GREEN_SIZE, 8,
                              EGL_RED_SIZE, 8,
                              EGL_ALPHA_SIZE, 8,
                              EGL_DEPTH_SIZE, 0,
                              EGL_STENCIL_SIZE, 0,
                              EGL_NONE};
    EGLint numConfigs;
    ctx->display = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    EGLint majorVersion, minorVersion;
    eglInitialize(ctx->display, &majorVersion, &minorVersion);
    eglChooseConfig(ctx->display, attribs, &ctx->config, 1, &numConfigs);
    ctx->surface = eglCreateWindowSurface(ctx->display, ctx->config, ctx->window, NULL);
    EGLint attrs[] = {EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE};
    ctx->context = eglCreateContext(ctx->display, ctx->config, NULL, attrs);
    EGLint err = eglGetError();
    if (err != EGL_SUCCESS) {
        LOGE("egl error");
    }
    if (eglMakeCurrent(ctx->display, ctx->surface, ctx->surface, ctx->context) == EGL_FALSE) {
        LOGE("------EGL-FALSE");
    }
    eglQuerySurface(ctx->display, ctx->surface, EGL_WIDTH, &ctx->width);
    eglQuerySurface(ctx->display, ctx->surface, EGL_HEIGHT, &ctx->height);
    initTexture(oar);

    oar_java_class *jc = oar->jc;
    JNIEnv *jniEnv = oar->video_render_ctx->jniEnv;
    jobject surface_texture = (*jniEnv)->CallStaticObjectMethod(jniEnv, jc->SurfaceTextureBridge,
                                                                jc->texture_getSurface,
                                                                ctx->texture[3]);
    ctx->texture_window = ANativeWindow_fromSurface(jniEnv, surface_texture);
}
8.2 Audio and video synchronization
There are three common types of audio and video synchronization:
- Based on video synchronization;
- Based on audio synchronization;
- Synchronization based on third-party timestamps.
Here we synchronize to the audio clock. When rendering a frame, we compute the difference (diff) between the video frame's timestamp and the audio clock and compare it with the video render period: if diff is greater than or equal to the period, wait; if it is greater than 0 but less than the period, sleep for diff and then draw; if it is less than or equal to 0, draw immediately.
Here is the rendering code:
/**
 * @param oar
 * @return  0 draw
 *         -1 sleep 33ms continue
 *         -2 break
 */
static inline int draw_video_frame(oarplayer *oar) {
    if (oar->video_frame == NULL) {
        oar->video_frame = oar_frame_queue_get(oar->video_frame_queue);
    }
    // buffer empty ==> sleep 10ms, return 0
    // eos          ==> return -2
    if (oar->video_frame == NULL) {
        _LOGD("video_frame is null...");
        usleep(BUFFER_EMPTY_SLEEP_US);
        return 0;
    }
    int64_t time_stamp = oar->video_frame->pts;
    int64_t diff = 0;
    if (oar->metadata->has_audio) {
        diff = time_stamp - (oar->audio_clock->pts + oar->audio_player_ctx->get_delta_time(oar->audio_player_ctx));
    } else {
        diff = time_stamp - oar_clock_get(oar->video_clock);
    }
    _LOGD("time_stamp:%lld, clock:%lld, diff:%lld", time_stamp, oar_clock_get(oar->video_clock), diff);
    oar_model *model = oar->video_render_ctx->model;

    // diff >= 33ms: if draw_mode == wait_frame, return -1
    //               if draw_mode == fixed_frequency, draw previous frame, return 0
    // diff > 0 && diff < 33ms: sleep(diff), draw, return 0
    // diff <= 0: draw, return 0
    if (diff >= WAIT_FRAME_SLEEP_US) {
        if (oar->video_render_ctx->draw_mode == wait_frame) {
            return -1;
        } else {
            draw_now(oar->video_render_ctx);
            return 0;
        }
    } else {
        // use the current frame and release it
        // LOGI("start draw...");
        pthread_mutex_lock(oar->video_render_ctx->lock);
        model->update_frame(model, oar->video_frame);
        pthread_mutex_unlock(oar->video_render_ctx->lock);
        oar_player_release_video_frame(oar, oar->video_frame);

        JNIEnv *jniEnv = oar->video_render_ctx->jniEnv;
        (*jniEnv)->CallStaticVoidMethod(jniEnv, oar->jc->SurfaceTextureBridge, oar->jc->texture_updateTexImage);
        jfloatArray texture_matrix_array = (*jniEnv)->CallStaticObjectMethod(jniEnv, oar->jc->SurfaceTextureBridge,
                                                                             oar->jc->texture_getTransformMatrix);
        (*jniEnv)->GetFloatArrayRegion(jniEnv, texture_matrix_array, 0, 16, model->texture_matrix);
        (*jniEnv)->DeleteLocalRef(jniEnv, texture_matrix_array);

        if (diff > 0) usleep((useconds_t) diff);
        draw_now(oar->video_render_ctx);
        oar_clock_set(oar->video_clock, time_stamp);
        return 0;
    }
}
9. To summarize
Starting from the implementation of this Android RTMP player, this article introduced the RTMP pull-stream library, the Java-layer and C-layer MediaCodec interfaces, the OpenSL ES interface, the OpenGL ES interface, the EGL interface, and the related audio/video knowledge. The full player code can be viewed at the project's official repository: oarplayer
Please feel free to discuss in the comments section.