ExoPlayer is an excellent playback framework open-sourced by Google; YouTube and many other applications use it to play video. ExoPlayer is updated very quickly, releasing a new version roughly every two weeks. Under the hood it uses MediaCodec to decode video, but the pipeline around it is quite involved, so we will walk through it from beginning to end. Some parts we have not yet examined in detail and will come back to in later articles.

Overview

ExoPlayer is designed to make few assumptions about (and hence impose few restrictions on) the type of media being played, how and where it is stored, and how it is rendered. Rather than loading and rendering media directly, ExoPlayer implementations delegate this work to components that are injected when the player is created or when playback is prepared. Components common to all ExoPlayer implementations are:

  • A MediaSource that defines the media to play, loads the media, and from which the loaded media can be read. A MediaSource is injected via prepare(MediaSource) at the start of playback. The library provides default implementations for progressive media files (ProgressiveMediaSource), DASH (DashMediaSource), SmoothStreaming (SsMediaSource), and HLS (HlsMediaSource). These implementations load a single piece of media; composite implementations (MergingMediaSource, ConcatenatingMediaSource, LoopingMediaSource, and ClippingMediaSource) combine or modify other MediaSources.
  • Renderers that render individual components of the media. The library provides default implementations for common media types (MediaCodecVideoRenderer, MediaCodecAudioRenderer, TextRenderer, and MetadataRenderer). Renderers consume media from the MediaSource being played. Renderers are injected when the player is created.
  • A TrackSelector that selects the tracks provided by the MediaSource to be consumed by each of the available Renderers. The library provides a default implementation (DefaultTrackSelector) suitable for most use cases. The TrackSelector is injected when the player is created.
  • A LoadControl that controls when the MediaSource buffers more media, and how much it buffers. The library provides a default implementation (DefaultLoadControl) suitable for most use cases. The LoadControl is injected when the player is created.

ExoPlayer can be built using the default components provided by the library, but custom implementations can be injected where non-standard behavior is required. For example, a custom LoadControl can change the player's buffering policy, and a custom Renderer can add support for video codecs not natively supported by Android. A sketch of the first case follows.
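As a minimal sketch of injecting a custom buffering policy, assuming the ExoPlayer 2.x APIs used throughout this article (DefaultLoadControl.Builder and SimpleExoPlayer.Builder); the buffer durations below are arbitrary example values, not recommendations:

// Sketch: injecting a custom buffering policy (ExoPlayer 2.x APIs assumed;
// the duration values are arbitrary examples).
LoadControl loadControl =
    new DefaultLoadControl.Builder()
        .setBufferDurationsMs(
            /* minBufferMs= */ 15_000,
            /* maxBufferMs= */ 60_000,
            /* bufferForPlaybackMs= */ 2_500,
            /* bufferForPlaybackAfterRebufferMs= */ 5_000)
        .createDefaultLoadControl();
SimpleExoPlayer player =
    new SimpleExoPlayer.Builder(context)
        .setLoadControl(loadControl)
        .build();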

Throughout the library, components that implement pieces of player functionality are injected. The default component implementations listed above themselves delegate work to further injected components, which allows many sub-components to be individually replaced with custom implementations. For example, the default MediaSource implementations require one or more DataSource factories to be injected via their constructors. By providing a custom factory, data can be loaded from non-standard sources or through a different network stack, as sketched below.
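For instance, here is a hedged sketch of routing media loads through OkHttp via the okhttp extension's OkHttpDataSourceFactory (assuming the extension is on the classpath; the user-agent string and URL are placeholders):

// Sketch: swapping the network stack by injecting a custom DataSource.Factory.
// Assumes the ExoPlayer okhttp extension; "myapp/1.0" is a placeholder UA.
OkHttpClient okHttpClient = new OkHttpClient();
DataSource.Factory dataSourceFactory =
    new OkHttpDataSourceFactory(okHttpClient, /* userAgent= */ "myapp/1.0");
MediaSource mediaSource =
    new ProgressiveMediaSource.Factory(dataSourceFactory)
        .createMediaSource(Uri.parse("https://example.com/video.mp4")); // placeholder URL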

Here is the threading model of ExoPlayer:

  • An ExoPlayer instance must be accessed from a single application thread. In most cases this should be the application's main thread, and it must be when using ExoPlayer's UI components or the IMA extension. The thread from which an ExoPlayer instance must be accessed can be specified explicitly by passing a Looper when creating the player. If no Looper is specified, the Looper of the thread that created the player is used, or, if that thread does not have a Looper, the Looper of the application's main thread. In all cases, the Looper of the thread from which the player must be accessed can be queried via Player.getApplicationLooper() (see the sketch after this list).
  • Registered listeners are called on the thread associated with Player.getApplicationLooper(). Note that this means registered listeners are invoked on the same thread that must be used to access the player.
  • The internal playback thread is responsible for playback. The player calls injected player components on this thread, such as Renderer, MediaSources, TrackSelectors, and LoadControls.
  • When the application performs an operation on the player, such as a seek, a message is passed to the internal playback thread via a message queue. The internal playback thread consumes the messages from the queue and performs the corresponding operations. Similarly, when a playback event occurs on the internal playback thread, a message is delivered to the application thread through a second message queue. The application thread consumes the messages from that queue to update the application's visible state and invoke the appropriate listener methods.
  • Injected player components may use other background threads. For example, MediaSource can use background threads to load data.
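A minimal sketch of the single-thread rule in practice: a background thread that needs to touch the player first hops onto the player's application Looper (using the Player.getApplicationLooper() API described above):

// Sketch: the player may only be touched from its application thread, so a
// background thread posts to a Handler built on Player.getApplicationLooper().
Handler playerHandler = new Handler(player.getApplicationLooper());
new Thread(() -> {
    // ... background work ...
    playerHandler.post(() -> player.setPlayWhenReady(true)); // safe: runs on the player's thread
}).start();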

The following is the basic call flow of a simple ExoPlayer demo. Starting from this flow, we can expand into a comprehensive analysis of ExoPlayer's mechanics.
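For reference, a minimal demo along these lines might look like the following sketch (ExoPlayer 2.x style APIs; the playerView variable and stream URL are placeholders):

// Minimal playback demo sketch (ExoPlayer 2.x; URL and view are placeholders).
SimpleExoPlayer player = new SimpleExoPlayer.Builder(context).build();
playerView.setPlayer(player); // PlayerView from the ExoPlayer UI module

DataSource.Factory dataSourceFactory =
    new DefaultHttpDataSourceFactory(/* userAgent= */ "demo");
MediaSource mediaSource =
    new HlsMediaSource.Factory(dataSourceFactory)
        .createMediaSource(Uri.parse("https://example.com/master.m3u8"));

player.prepare(mediaSource);   // inject the MediaSource
player.setPlayWhenReady(true); // start playback once ready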

I. MediaSource class relationships

MediaSource is a very important structure in ExoPlayer. Loaded video resources are encapsulated by a MediaSource, and the MediaSource contains the logic for parsing those resources. Below is the MediaSource class structure diagram.

1.1 DashMediaSource

DashMediaSource is dedicated to parsing video in DASH format. Dynamic Adaptive Streaming over HTTP (DASH) is an adaptive bitrate streaming technique that splits a multimedia file into one or more segments and delivers them to clients over HTTP. Note that, as with the other formats in this section, DASH is a packaging/delivery format, not a codec.

1.2 HlsMediaSource

HlsMediaSource is dedicated to parsing video in HLS format. HTTP Live Streaming (HLS), pioneered by Apple, has become a foundational technology for live streaming. The HLS format has many fine-grained variants; interested readers can consult the HLS protocol specification.

1.3 SsMediaSource

SsMediaSource handles Microsoft's SmoothStreaming technology, which has much in common with DASH and HLS.

1.4 ProgressiveMediaSource

This is the MediaSource for generic progressive video, i.e. non-DASH, non-HLS, non-SmoothStreaming media. Below are the container formats supported by ProgressiveMediaSource:

Container format   Supported   Comment
MP4                YES
M4A                YES
FMP4               YES
WebM               YES
Matroska           YES
MP3                YES         Some streams only seekable using constant bitrate seeking
Ogg                YES         Containing Vorbis, Opus and FLAC
WAV                YES
MPEG-TS            YES
MPEG-PS            YES
FLV                YES         Not seekable
ADTS (AAC)         YES         Only seekable using constant bitrate seeking
FLAC               YES         FLAC extension only
AMR                YES         Only seekable using constant bitrate seeking

Summary: DASH and SmoothStreaming see relatively little use at present, so we will mainly analyze the HLS and ProgressiveMediaSource cases.

1.5 How the MediaSource type is chosen

private MediaSource createLeafMediaSource(
    Uri uri, String extension, DrmSessionManager<ExoMediaCrypto> drmSessionManager) {
  @ContentType int type = Util.inferContentType(uri, extension);
  switch (type) {
    case C.TYPE_DASH:
      return new DashMediaSource.Factory(dataSourceFactory)
          .setDrmSessionManager(drmSessionManager)
          .createMediaSource(uri);
    case C.TYPE_SS:
      return new SsMediaSource.Factory(dataSourceFactory)
          .setDrmSessionManager(drmSessionManager)
          .createMediaSource(uri);
    case C.TYPE_HLS:
      return new HlsMediaSource.Factory(dataSourceFactory)
          .setDrmSessionManager(drmSessionManager)
          .createMediaSource(uri);
    case C.TYPE_OTHER:
      return new ProgressiveMediaSource.Factory(dataSourceFactory)
          .setDrmSessionManager(drmSessionManager)
          .createMediaSource(uri);
    default:
      throw new IllegalStateException("Unsupported type: " + type);
  }
}

The different MediaSource types are chosen according to fixed rules; the code above shows the selection logic. The key is the content type inferred from the URI and file extension.

public static int inferContentType(String fileName) {
  fileName = toLowerInvariant(fileName);
  if (fileName.endsWith(".mpd")) {
    return C.TYPE_DASH;
  } else if (fileName.endsWith(".m3u8")) {
    return C.TYPE_HLS;
  } else if (fileName.matches(".*\\.ism(l)?(/manifest(\\(.+\\))?)?")) {
    return C.TYPE_SS;
  } else {
    return C.TYPE_OTHER;
  }
}

The inference logic here is very simple, which makes the HLS check imprecise: many HLS streams whose URLs do not end in .m3u8 will be classified as non-HLS. We will return to this in a later HLS format analysis. One workaround is sketched below.
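When the application already knows the content type, it can bypass the inference entirely, either by passing an extension override to Util.inferContentType(uri, extension) or by constructing the specific MediaSource directly. A sketch, using a hypothetical extensionless HLS URL and the dataSourceFactory from the snippet above:

// Sketch: forcing HLS for a URL that does not end in ".m3u8" (placeholder URL).
Uri uri = Uri.parse("https://example.com/live/stream?session=abc");

// Option 1: override the extension used for content-type inference.
@ContentType int type = Util.inferContentType(uri, /* overrideExtension= */ "m3u8");

// Option 2: skip inference and build the HLS source directly.
MediaSource source = new HlsMediaSource.Factory(dataSourceFactory).createMediaSource(uri);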

II. MediaSource parsing

ExoPlayer contains a great many Factory classes: the construction work of each module is encapsulated in, and managed through, its corresponding Factory.

In ExoPlayer, the chain DefaultHttpDataSourceFactory -> DefaultHttpDataSource performs the actual loading of HLS resources.

The HlsMediaSource.Factory constructor takes a DataSource.Factory object, in this case a DefaultHttpDataSourceFactory, which is responsible for issuing HTTP requests:

    public Factory(DataSource.Factory dataSourceFactory) {
      this(new DefaultHlsDataSourceFactory(dataSourceFactory));
    }

As noted, nearly every module in ExoPlayer has its own Factory class that handles that module's work. The open() method below (from DefaultDataSource, which wraps the HTTP source) selects the appropriate underlying DataSource based on the URI scheme:

public long open(DataSpec dataSpec) throws IOException {
  Assertions.checkState(dataSource == null);
  // Choose the correct source for the scheme.
  String scheme = dataSpec.uri.getScheme();
  if (Util.isLocalFileUri(dataSpec.uri)) {
    String uriPath = dataSpec.uri.getPath();
    if (uriPath != null && uriPath.startsWith("/android_asset/")) {
      dataSource = getAssetDataSource();
    } else {
      dataSource = getFileDataSource();
    }
  } else if (SCHEME_ASSET.equals(scheme)) {
    dataSource = getAssetDataSource();
  } else if (SCHEME_CONTENT.equals(scheme)) {
    dataSource = getContentDataSource();
  } else if (SCHEME_RTMP.equals(scheme)) {
    dataSource = getRtmpDataSource();
  } else if (SCHEME_UDP.equals(scheme)) {
    dataSource = getUdpDataSource();
  } else if (DataSchemeDataSource.SCHEME_DATA.equals(scheme)) {
    dataSource = getDataSchemeDataSource();
  } else if (SCHEME_RAW.equals(scheme)) {
    dataSource = getRawResourceDataSource();
  } else {
    dataSource = baseDataSource;
  }
  // Open the source and return.
  return dataSource.open(dataSpec);
}

The HTTP DataSource is the most critical one. It is created in DefaultHttpDataSourceFactory.createDataSource(), and HTTP requests are ultimately issued and handled in the DefaultHttpDataSource class. The class diagram above shows how the DataSource classes in ExoPlayer are organized: the main HTTP request handling class is DefaultHttpDataSource, while OkHttpDataSource and CronetDataSource are extension classes that let developers plug in external network loading libraries.
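All of these classes share the same DataSource contract, which is worth making explicit. A minimal consumer sketch, assuming the 2.x DataSource interface (the URL is a placeholder, and the calls may throw IOException):

// Sketch of the DataSource contract: open() resolves the spec and returns the
// length (or C.LENGTH_UNSET), read() streams bytes until C.RESULT_END_OF_INPUT,
// and close() must always be called.
DataSource dataSource = dataSourceFactory.createDataSource();
DataSpec dataSpec = new DataSpec(Uri.parse("https://example.com/video.mp4"));
try {
  long length = dataSource.open(dataSpec);
  byte[] buffer = new byte[64 * 1024];
  int bytesRead;
  while ((bytesRead = dataSource.read(buffer, 0, buffer.length)) != C.RESULT_END_OF_INPUT) {
    // Consume bytesRead bytes of media data here.
  }
} finally {
  dataSource.close();
}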

III. The Renderer

A Renderer reads media from a SampleStream. Internally, the Renderer's lifecycle is managed by the owning ExoPlayer: as the overall playback state and the enabled tracks change, the Renderer transitions through various states. The valid state transitions are shown below, labeled with the methods invoked during each transition.
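The original diagram is not reproduced here; per the Renderer documentation, the transitions it shows can be summarized as follows (all of these calls happen on the internal playback thread):

  STATE_DISABLED --enable()---> STATE_ENABLED --start()----> STATE_STARTED
  STATE_STARTED  --stop()-----> STATE_ENABLED --disable()--> STATE_DISABLED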

Renderer is also open for extension, for example FfmpegAudioRenderer, which is not covered here.

3.1 Renderer initialization

When the SimpleExoPlayer object is constructed, a list of Renderer objects is passed in, containing all of the supported renderers:

  public Renderer[] createRenderers(
      Handler eventHandler,
      VideoRendererEventListener videoRendererEventListener,
      AudioRendererEventListener audioRendererEventListener,
      TextOutput textRendererOutput,
      MetadataOutput metadataRendererOutput,
      @Nullable DrmSessionManager<FrameworkMediaCrypto> drmSessionManager) {
    if (drmSessionManager == null) {
      drmSessionManager = this.drmSessionManager;
    }
    ArrayList<Renderer> renderersList = new ArrayList<>();
    buildVideoRenderers(
        context,
        extensionRendererMode,
        mediaCodecSelector,
        drmSessionManager,
        playClearSamplesWithoutKeys,
        enableDecoderFallback,
        eventHandler,
        videoRendererEventListener,
        allowedVideoJoiningTimeMs,
        renderersList);
    buildAudioRenderers(
        context,
        extensionRendererMode,
        mediaCodecSelector,
        drmSessionManager,
        playClearSamplesWithoutKeys,
        enableDecoderFallback,
        buildAudioProcessors(),
        eventHandler,
        audioRendererEventListener,
        renderersList);
    buildTextRenderers(context, textRendererOutput, eventHandler.getLooper(),
        extensionRendererMode, renderersList);
    buildMetadataRenderers(context, metadataRendererOutput, eventHandler.getLooper(),
        extensionRendererMode, renderersList);
    buildCameraMotionRenderers(context, extensionRendererMode, renderersList);
    buildMiscellaneousRenderers(context, eventHandler, extensionRendererMode, renderersList);
    return renderersList.toArray(new Renderer[0]);
  }

createRenderers delegates to buildVideoRenderers, buildAudioRenderers, buildTextRenderers, buildMetadataRenderers, buildCameraMotionRenderers, and buildMiscellaneousRenderers. Tracing the VideoRenderer and AudioRenderer workflows is of great help in understanding the video decoding process.



protected void buildVideoRenderers(
    Context context,
    @ExtensionRendererMode int extensionRendererMode,
    MediaCodecSelector mediaCodecSelector,
    @Nullable DrmSessionManager<FrameworkMediaCrypto> drmSessionManager,
    boolean playClearSamplesWithoutKeys,
    boolean enableDecoderFallback,
    Handler eventHandler,
    VideoRendererEventListener eventListener,
    long allowedVideoJoiningTimeMs,
    ArrayList<Renderer> out) {
  out.add(
      new MediaCodecVideoRenderer(
          context,
          mediaCodecSelector,
          allowedVideoJoiningTimeMs,
          drmSessionManager,
          playClearSamplesWithoutKeys,
          enableDecoderFallback,
          eventHandler,
          eventListener,
          MAX_DROPPED_VIDEO_FRAME_COUNT_TO_NOTIFY));

  if (extensionRendererMode == EXTENSION_RENDERER_MODE_OFF) {
    return;
  }
  int extensionRendererIndex = out.size();
  if (extensionRendererMode == EXTENSION_RENDERER_MODE_PREFER) {
    extensionRendererIndex--;
  }

  try {
    // Full class names used for constructor args so the LINT rule triggers if any of them move.
    // LINT.IfChange
    Class<?> clazz = Class.forName("com.google.android.exoplayer2.ext.vp9.LibvpxVideoRenderer");
    Constructor<?> constructor =
        clazz.getConstructor(
            long.class,
            android.os.Handler.class,
            com.google.android.exoplayer2.video.VideoRendererEventListener.class,
            int.class);
    // LINT.ThenChange(../../../../../../../proguard-rules.txt)
    Renderer renderer =
        (Renderer)
            constructor.newInstance(
                allowedVideoJoiningTimeMs,
                eventHandler,
                eventListener,
                MAX_DROPPED_VIDEO_FRAME_COUNT_TO_NOTIFY);
    out.add(extensionRendererIndex++, renderer);
    Log.i(TAG, "Loaded LibvpxVideoRenderer.");
  } catch (ClassNotFoundException e) {
    // Expected if the app was built without the extension.
  } catch (Exception e) {
    // The extension is present, but instantiation failed.
    throw new RuntimeException("Error instantiating VP9 extension", e);
  }

  try {
    // Full class names used for constructor args so the LINT rule triggers if any of them move.
    // LINT.IfChange
    Class<?> clazz = Class.forName("com.google.android.exoplayer2.ext.av1.Libgav1VideoRenderer");
    Constructor<?> constructor =
        clazz.getConstructor(
            long.class,
            android.os.Handler.class,
            com.google.android.exoplayer2.video.VideoRendererEventListener.class,
            int.class);
    // LINT.ThenChange(../../../../../../../proguard-rules.txt)
    Renderer renderer =
        (Renderer)
            constructor.newInstance(
                allowedVideoJoiningTimeMs,
                eventHandler,
                eventListener,
                MAX_DROPPED_VIDEO_FRAME_COUNT_TO_NOTIFY);
    out.add(extensionRendererIndex++, renderer);
    Log.i(TAG, "Loaded Libgav1VideoRenderer.");
  } catch (ClassNotFoundException e) {
    // Expected if the app was built without the extension.
  } catch (Exception e) {
    // The extension is present, but instantiation failed.
    throw new RuntimeException("Error instantiating AV1 extension", e);
  }
}

The main renderer is MediaCodecVideoRenderer; the renderers loaded reflectively after it are pluggable software-decoder renderer engines. A sketch of enabling them follows.
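Whether those extension renderers are used at all is controlled by the extension renderer mode. A minimal sketch (the setExtensionRendererMode and Builder APIs are the ExoPlayer 2.x ones assumed throughout this article):

// Sketch: prefer extension (software) renderers over the MediaCodec ones.
DefaultRenderersFactory renderersFactory =
    new DefaultRenderersFactory(context)
        .setExtensionRendererMode(DefaultRenderersFactory.EXTENSION_RENDERER_MODE_PREFER);
SimpleExoPlayer player =
    new SimpleExoPlayer.Builder(context, renderersFactory).build();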




protected void buildAudioRenderers(
    Context context,
    @ExtensionRendererMode int extensionRendererMode,
    MediaCodecSelector mediaCodecSelector,
    @Nullable DrmSessionManager<FrameworkMediaCrypto> drmSessionManager,
    boolean playClearSamplesWithoutKeys,
    boolean enableDecoderFallback,
    AudioProcessor[] audioProcessors,
    Handler eventHandler,
    AudioRendererEventListener eventListener,
    ArrayList<Renderer> out) {
  out.add(
      new MediaCodecAudioRenderer(
          context,
          mediaCodecSelector,
          drmSessionManager,
          playClearSamplesWithoutKeys,
          enableDecoderFallback,
          eventHandler,
          eventListener,
          new DefaultAudioSink(AudioCapabilities.getCapabilities(context), audioProcessors)));

  if (extensionRendererMode == EXTENSION_RENDERER_MODE_OFF) {
    return;
  }
  int extensionRendererIndex = out.size();
  if (extensionRendererMode == EXTENSION_RENDERER_MODE_PREFER) {
    extensionRendererIndex--;
  }

  try {
    // Full class names used for constructor args so the LINT rule triggers if any of them move.
    // LINT.IfChange
    Class<?> clazz = Class.forName("com.google.android.exoplayer2.ext.opus.LibopusAudioRenderer");
    Constructor<?> constructor =
        clazz.getConstructor(
            android.os.Handler.class,
            com.google.android.exoplayer2.audio.AudioRendererEventListener.class,
            com.google.android.exoplayer2.audio.AudioProcessor[].class);
    // LINT.ThenChange(../../../../../../../proguard-rules.txt)
    Renderer renderer = (Renderer) constructor.newInstance(eventHandler, eventListener, audioProcessors);
    out.add(extensionRendererIndex++, renderer);
    Log.i(TAG, "Loaded LibopusAudioRenderer.");
  } catch (ClassNotFoundException e) {
    // Expected if the app was built without the extension.
  } catch (Exception e) {
    // The extension is present, but instantiation failed.
    throw new RuntimeException("Error instantiating Opus extension", e);
  }

  try {
    // Full class names used for constructor args so the LINT rule triggers if any of them move.
    // LINT.IfChange
    Class<?> clazz = Class.forName("com.google.android.exoplayer2.ext.flac.LibflacAudioRenderer");
    Constructor<?> constructor =
        clazz.getConstructor(
            android.os.Handler.class,
            com.google.android.exoplayer2.audio.AudioRendererEventListener.class,
            com.google.android.exoplayer2.audio.AudioProcessor[].class);
    // LINT.ThenChange(../../../../../../../proguard-rules.txt)
    Renderer renderer = (Renderer) constructor.newInstance(eventHandler, eventListener, audioProcessors);
    out.add(extensionRendererIndex++, renderer);
    Log.i(TAG, "Loaded LibflacAudioRenderer.");
  } catch (ClassNotFoundException e) {
    // Expected if the app was built without the extension.
  } catch (Exception e) {
    // The extension is present, but instantiation failed.
    throw new RuntimeException("Error instantiating FLAC extension", e);
  }

  try {
    // Full class names used for constructor args so the LINT rule triggers if any of them move.
    // LINT.IfChange
    Class<?> clazz = Class.forName("com.google.android.exoplayer2.ext.ffmpeg.FfmpegAudioRenderer");
    Constructor<?> constructor =
        clazz.getConstructor(
            android.os.Handler.class,
            com.google.android.exoplayer2.audio.AudioRendererEventListener.class,
            com.google.android.exoplayer2.audio.AudioProcessor[].class);
    // LINT.ThenChange(../../../../../../../proguard-rules.txt)
    Renderer renderer = (Renderer) constructor.newInstance(eventHandler, eventListener, audioProcessors);
    out.add(extensionRendererIndex++, renderer);
    Log.i(TAG, "Loaded FfmpegAudioRenderer.");
  } catch (ClassNotFoundException e) {
    // Expected if the app was built without the extension.
  } catch (Exception e) {
    // The extension is present, but instantiation failed.
    throw new RuntimeException("Error instantiating FFmpeg extension", e);
  }
}

MediaCodecAudioRenderer is the renderer engine for audio. Having seen where the renderers are initialized, the next question is how the Renderer is actually invoked.

3.2 How Renderer works

There are many kinds of Renderer; we choose MediaCodecRenderer to illustrate how a Renderer works. A lot of MediaCodec machinery is involved here; MediaCodec decoding deserves its own dedicated discussion, so we will not go into it in depth. There are a few points to pay attention to (see the sketch after this list):

  • 1. How is audio/video synchronization achieved?
  • 2. How are subtitles rendered?
  • 3. Can a MediaCodec instance be reused?
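At a high level, the internal playback thread drives every enabled Renderer in a loop. The following is a conceptual sketch only, simplified from ExoPlayerImplInternal's work loop rather than the actual code:

// Conceptual sketch (not the real ExoPlayerImplInternal code): the playback
// thread repeatedly asks each enabled renderer to do incremental work.
for (Renderer renderer : enabledRenderers) {
  // render() consumes input from the renderer's SampleStream, feeds the decoder,
  // and (for video) releases output frames to the Surface when they are due.
  renderer.render(positionUs, SystemClock.elapsedRealtime() * 1000);
}

Progress on the video renderer is reported back to the application through the VideoRendererEventListener interface: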
public interface VideoRendererEventListener {
  default void onVideoEnabled(DecoderCounters counters) {}

  default void onVideoDecoderInitialized(
      String decoderName, long initializedTimestampMs, long initializationDurationMs) {}

  default void onVideoInputFormatChanged(Format format) {}

  default void onDroppedFrames(int count, long elapsedMs) {}

  default void onVideoSizeChanged(
      int width, int height, int unappliedRotationDegrees, float pixelWidthHeightRatio) {}

  default void onRenderedFirstFrame(@Nullable Surface surface) {}

  default void onVideoDisabled(DecoderCounters counters) {}
}

Most important to us is the onRenderedFirstFrame callback, which signals the first time a frame is rendered to the surface during a rendering sequence. A sketch of listening for it follows.
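Applications typically observe this through AnalyticsListener, which mirrors the renderer events. A minimal sketch (assuming the 2.x AnalyticsListener API; posterView is a hypothetical view):

// Sketch: reacting to the first rendered frame, e.g. to hide a poster image.
player.addAnalyticsListener(new AnalyticsListener() {
  @Override
  public void onRenderedFirstFrame(EventTime eventTime, @Nullable Surface surface) {
    // The first frame is now visible on the surface.
    posterView.setVisibility(View.GONE); // posterView is a hypothetical view
  }
});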

ExoPlayer contains more important material that, given the length of this article, will be left to a separate piece.