What is VideoLab?

VideoLab is an open-source, high-performance, flexible iOS video editing and effects framework that offers a more After Effects (Adobe After Effects) style way of working. The framework core is based on AVFoundation and Metal. Current features:

  • High-performance real-time editing and export.
  • Freely combine video, images, and audio.
  • Audio pitch setting and volume adjustment.
  • CALayer vector animations, which enable complex text animations.
  • Keyframe animation.
  • Pre-composition, similar to AE.
  • Transitions.
  • Custom effects written in MSL (Metal Shading Language), such as LUT filters, zoom blur, and so on.

Here are GIFs showing some of the features (multi-layer composition, text animation, keyframe animation, pre-composition, and transitions):

Repository: github.com/ruanjx/Vide…

This article will share the AVFoundation video editing process and the design and implementation of the VideoLab framework.

AVFoundation video editing process

Before we start, readers who are new to video editing are encouraged to watch the following WWDC videos:

  • Advanced Editing with AV Foundation
  • Edit and play back HDR video with AVFoundation

Let’s take a look at the overall AVFoundation video editing workflow:

Let’s break down the steps:

  1. Create one or more AVAssets.
  2. Create an AVComposition, an AVVideoComposition, and an AVAudioMix. AVComposition specifies the time alignment of the audio and video tracks, AVVideoComposition specifies the geometric transformation and blending of the video tracks at any given point in time, and AVAudioMix manages the mixing parameters of the audio tracks.
  3. With these three objects we can create an AVPlayerItem, and from it an AVPlayer to preview the editing result.
  4. Alternatively, we can create an AVAssetExportSession from these three objects to write the editing result to a file (a code sketch of both paths follows below).
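Here is a minimal sketch of steps 3 and 4, not the framework's code: it assumes `composition`, `videoComposition`, `audioMix`, and `outputURL` have already been created.

import AVFoundation

// Playback path: wrap the composition in a player item.
let playerItem = AVPlayerItem(asset: composition)
playerItem.videoComposition = videoComposition
playerItem.audioMix = audioMix
let player = AVPlayer(playerItem: playerItem)

// Export path: write the editing result to a file instead.
if let exportSession = AVAssetExportSession(asset: composition,
                                            presetName: AVAssetExportPresetHighestQuality) {
    exportSession.videoComposition = videoComposition
    exportSession.audioMix = audioMix
    exportSession.outputURL = outputURL
    exportSession.outputFileType = .mp4
    exportSession.exportAsynchronously {
        // Check exportSession.status and exportSession.error here.
    }
}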

AVComposition

Let’s start with AVComposition, which is a collection of one or more AVCompositionTracks. An AVCompositionTrack, in turn, can contain AVAssetTracks from multiple AVAssets.

In the example below, AVAssetTracks from two AVAssets are combined into the AVCompositionTracks of a single AVComposition.
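For illustration, here is a minimal sketch (not the framework's code) of inserting the video tracks of several assets back to back into one composition track; the URLs are assumed:

import AVFoundation

// Combine the video tracks of several assets into a single AVCompositionTrack.
func makeComposition(from urls: [URL]) throws -> AVMutableComposition {
    let composition = AVMutableComposition()
    guard let videoTrack = composition.addMutableTrack(withMediaType: .video,
                                                       preferredTrackID: kCMPersistentTrackID_Invalid) else {
        return composition
    }
    var cursor = CMTime.zero
    for url in urls {
        let asset = AVAsset(url: url)
        guard let sourceTrack = asset.tracks(withMediaType: .video).first else { continue }
        // Insert each asset's full video track at the current cursor position.
        try videoTrack.insertTimeRange(CMTimeRange(start: .zero, duration: asset.duration),
                                       of: sourceTrack,
                                       at: cursor)
        cursor = cursor + asset.duration
    }
    return composition
}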

AVVideoComposition

Imagine a scenario like the one below, where the AVComposition contains two AVCompositionTracks and we need to blend the frames of both tracks at time T1. To do this, we need AVVideoComposition.

AVVideoComposition can be used to specify the render size, render scale, and frame rate. It also stores an array of instructions conforming to the AVVideoCompositionInstructionProtocol; these instructions hold the blending parameters. With these blending parameters, AVVideoComposition can blend the corresponding image frames using a compositor that implements the AVVideoCompositing protocol.

The overall workflow is shown in the figure below:

Let’s focus on the compositor: it takes multiple source frames, processes them, and outputs a new frame. The process works as follows:

The process can be decomposed into:

  1. AVAsynchronousVideoCompositionRequest binds the sequence of source frames for the current time together with the instruction for the current time.
  2. The compositor receives the startVideoCompositionRequest: callback along with the request.
  3. Based on the source frames and the blending parameters in the instruction, it renders the composited frame.
  4. It calls finishWithComposedVideoFrame: to deliver the rendered frame (see the sketch below).
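Here is a minimal sketch of a custom compositor conforming to AVVideoCompositing. To keep it short it simply passes the first source frame through instead of blending, so it only illustrates the request/finish flow:

import AVFoundation
import CoreVideo

// A minimal pass-through compositor sketch: it delivers the first source frame
// unchanged instead of blending, to illustrate the callback flow only.
final class PassthroughCompositor: NSObject, AVVideoCompositing {
    var sourcePixelBufferAttributes: [String: Any]? {
        [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA]
    }
    var requiredPixelBufferAttributesForRenderContext: [String: Any] {
        [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA]
    }

    func renderContextChanged(_ newRenderContext: AVVideoCompositionRenderContext) {}

    func startRequest(_ request: AVAsynchronousVideoCompositionRequest) {
        // 1. The request carries the source frames and the instruction for the current time.
        guard let trackID = request.sourceTrackIDs.first?.int32Value,
              let frame = request.sourceFrame(byTrackID: trackID) else {
            request.finish(with: NSError(domain: "PassthroughCompositor", code: -1))
            return
        }
        // 2-3. A real compositor would render a composited frame here (e.g. with Metal).
        // 4. Deliver the rendered frame.
        request.finish(withComposedVideoFrame: frame)
    }
}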

AVAudioMix

With AVAudioMix, you can process the audio on AVComposition’s audio tracks. AVAudioMix contains a set of AVAudioMixInputParameters, each of which corresponds to one audio AVCompositionTrack, as shown below:

AVAudioMixInputParameters contains an MTAudioProcessingTap, which you can use to process audio in real time. For a simple linear volume change, you can of course use the volume ramp interface setVolumeRampFromStartVolume:toEndVolume:timeRange: directly.

AVAudioMixInputParameters also contains an AVAudioTimePitchAlgorithm, which you can use to set the pitch algorithm.
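A minimal sketch of configuring one track's parameters (the `audioTrack` and `fadeOutRange` values are assumed for illustration):

import AVFoundation

// Volume ramp plus pitch algorithm for a single audio track.
// `audioTrack` (an AVCompositionTrack) and `fadeOutRange` are assumed to exist.
let inputParameters = AVMutableAudioMixInputParameters(track: audioTrack)

// Linear fade-out using the built-in volume ramp interface.
inputParameters.setVolumeRamp(fromStartVolume: 1.0, toEndVolume: 0.0, timeRange: fadeOutRange)

// Pitch algorithm applied to the whole track.
inputParameters.audioTimePitchAlgorithm = .spectral

let audioMix = AVMutableAudioMix()
audioMix.inputParameters = [inputParameters]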

Framework design

Having introduced the AVFoundation video editing process, let’s look at the design of the VideoLab framework.

Let’s briefly introduce Adobe After Effects (AE), the motion graphics and visual effects software commonly used by effects designers (see the AE website for more information). AE controls the composition of video, audio, and still images through layers, where each media object (video, audio, still image) has its own independent track.

Below is an example of compositing two videos in After Effects.

Let’s break down this example:

  • In the Project area, there is an item of type Composition named Comp1. A composition can be regarded as a piece of work in After Effects; it can be played back and exported as a video. A composition specifies parameters such as width, height, frame rate, and background color.
  • In the Timeline Control area, there are two layers whose sources are video1.mov and video2.mov. We can freely set layer parameters such as Transform (the example also keyframes Scale) and Audio, and move the layer’s time range in the area on the right. In addition, we can add a set of effects to each layer.

Based on this analysis of AE, we can design a similar set of description objects:

  • RenderComposition corresponds to a Composition in AE. It contains a group of RenderLayers (corresponding to layers in AE). In addition, RenderComposition contains BackgroundColor, FrameDuration, and RenderSize, corresponding to the background color, frame rate, render size, and other editing-related parameters.
  • RenderLayer corresponds to a Layer in AE. It contains Source, TimeRange, Transform, AudioConfiguration, and Operations, corresponding to the source footage, the time range on the timeline, the transform (position, rotation, scale), the audio configuration, and the effect operation group, respectively.
  • RenderLayerGroup corresponds to AE’s pre-composition. RenderLayerGroup inherits from RenderLayer and contains a group of RenderLayers.
  • KeyframeAnimation corresponds to AE’s keyframe animation. It contains KeyPath, Values, KeyTimes, and TimingFunctions, corresponding to the key path, the value array, the key time array, and the easing function array, respectively.

We have now introduced RenderComposition, RenderLayer, RenderLayerGroup, and KeyframeAnimation. As we saw in the AVFoundation section, we need to generate an AVPlayerItem and an AVAssetExportSession for playback and export. Therefore, we need an object that can parse these description objects and generate AVPlayerItem and AVAssetExportSession using AVFoundation. The framework names this object VideoLab, which can be thought of as a laboratory.

The overall workflow is as follows:

Let’s break down the steps:

  1. Create one or more RenderLayers.
  2. Create a RenderComposition and set its BackgroundColor, FrameDuration, RenderSize, and its RenderLayer array.
  3. Create a VideoLab with the RenderComposition.
  4. Use the VideoLab to generate an AVPlayerItem or an AVAssetExportSession (a usage sketch follows below).
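As a rough sketch of these four steps: the exact initializer and property names below are assumptions for illustration and may differ from the framework's actual API.

import AVFoundation
import CoreGraphics

// Hypothetical usage sketch of the four steps above. Names such as
// RenderLayer(timeRange:source:), composition.layers, and videoLab.makePlayerItem()
// are assumptions and may not match the framework's real API.
let asset = AVAsset(url: videoURL)                         // videoURL is assumed
let source = AVAssetSource(asset: asset)
let renderLayer = RenderLayer(timeRange: CMTimeRange(start: .zero, duration: asset.duration),
                              source: source)

let renderComposition = RenderComposition()
renderComposition.renderSize = CGSize(width: 1280, height: 720)
renderComposition.layers = [renderLayer]

let videoLab = VideoLab(renderComposition: renderComposition)
let playerItem = videoLab.makePlayerItem()                 // or an export session for writing to a file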

This chapter has mainly introduced the design ideas behind the framework. Overall, the goal is for the framework to be designed in a flexible way.

Implementation of the framework

Source

From the previous introduction, we know that a RenderLayer may contain a source of footage. The source can be a video, audio, a still image, and so on. The framework abstracts this with the Source protocol. Here is the core code of the Source protocol:

public protocol Source {
    var selectedTimeRange: CMTimeRange { get set }
    
    func tracks(for type: AVMediaType) -> [AVAssetTrack]
    func texture(at time: CMTime) -> Texture?
}
  • selectedTimeRange is the selected time range within the source itself. For example, for a 2-minute video, if we choose the 60s-70s range as editing material, then selectedTimeRange is [60s, 70s) (the actual code uses CMTime).
  • The tracks(for:) method fetches AVAssetTracks for a given AVMediaType.
  • The texture(at:) method fetches the texture (Texture) for a given time.

The framework provides four built-in sources: (1) AVAssetSource, for AVAsset; (2) ImageSource, for still images; (3) PHAssetVideoSource, for Photo Library videos; (4) PHAssetImageSource, for Photo Library images. We can also implement the Source protocol ourselves to provide custom sources of footage, as sketched below.
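Here is a minimal sketch of a custom Source backed by a local video file, conforming to the protocol shown above. Texture is the framework's texture type; this sketch returns nil for it and relies on the asset's tracks.

import AVFoundation

// A minimal custom Source sketch backed by a local video file.
// It exposes the asset's tracks and does not provide a standalone texture.
final class LocalVideoSource: Source {
    private let asset: AVAsset
    var selectedTimeRange: CMTimeRange

    init(url: URL) {
        asset = AVAsset(url: url)
        selectedTimeRange = CMTimeRange(start: .zero, duration: asset.duration)
    }

    func tracks(for type: AVMediaType) -> [AVAssetTrack] {
        asset.tracks(withMediaType: type)
    }

    func texture(at time: CMTime) -> Texture? {
        // Video sources are sampled by the compositor through their tracks,
        // so this sketch does not build a texture here.
        return nil
    }
}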

AVComposition

So far we have covered RenderComposition, RenderLayer, RenderLayerGroup, KeyframeAnimation, and Source. Next, let’s see how the VideoLab class uses these objects to create AVComposition, AVVideoComposition, and AVAudioMix.

Let’s look at AVComposition first. We need to add video and audio tracks to AVComposition.

Let’s illustrate this process with an example, as shown below. The RenderComposition includes RenderLayer1 (video and audio), RenderLayer2 (video only), RenderLayer3 (image), RenderLayer4 (effect operation group only), and one RenderLayerGroup (containing RenderLayer5 and RenderLayer6, both with video and audio).

Let’s talk about adding a video track first. Adding a video track involves the following steps:

1. Convert RenderLayer to VideoRenderLayer

VideoRenderLayer is an internal framework object that wraps a RenderLayer and is responsible for adding the RenderLayer’s video track to the AVComposition. A RenderLayer can be converted to a VideoRenderLayer if any of the following holds: (1) its Source contains a video track; (2) its Source is of image type; (3) its effect operation group is not empty.

VideoRenderLayerGroup is the video-side internal framework object corresponding to RenderLayerGroup, and it wraps a RenderLayerGroup. A RenderLayerGroup can be converted to a VideoRenderLayerGroup as long as at least one of its RenderLayers can be converted to a VideoRenderLayer.

After the conversion, the VideoRenderLayers look like this:

2. Add the VideoRenderLayer video track to AVComposition

For a VideoRenderLayer whose Source contains a video track, get the video AVAssetTrack from the Source and add it to the AVComposition.

For a VideoRenderLayer whose Source is of image type, or which only has an effect operation group (its Source is empty), add a new video track using a blank video (a blank video is a video track with black frames and no audio track).

The AVComposition’s video tracks after this step are shown below:

As shown, VideoRenderLayer1 shares a video track with VideoRenderLayer5. This is because Apple limits the number of video tracks, so we need to reuse video tracks as much as possible (each video track corresponds to a decoder, and when the number of decoders exceeds the system limit, decoding fails).

The framework’s track-reuse rule is: if the VideoRenderLayer to be placed does not overlap in time with the last VideoRenderLayer on an existing video track, that track can be reused; if no existing track can be reused, a new video track is created. A sketch of this rule follows.
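A minimal sketch of this reuse rule; the helper name and the bookkeeping structure are illustrative, not the framework's actual implementation.

import AVFoundation

// Each existing track remembers the time range of the last layer placed on it;
// a new layer reuses a track only if it does not overlap that last range.
func selectVideoTrack(for layerTimeRange: CMTimeRange,
                      composition: AVMutableComposition,
                      lastTimeRanges: inout [(track: AVMutableCompositionTrack, range: CMTimeRange)]) -> AVMutableCompositionTrack? {
    for index in lastTimeRanges.indices {
        let overlap = lastTimeRanges[index].range.intersection(layerTimeRange)
        if overlap.duration == .zero {
            // No overlap with the last layer on this track, so reuse it.
            lastTimeRanges[index].range = layerTimeRange
            return lastTimeRanges[index].track
        }
    }
    // Every existing track overlaps, so create a new video track.
    guard let newTrack = composition.addMutableTrack(withMediaType: .video,
                                                     preferredTrackID: kCMPersistentTrackID_Invalid) else {
        return nil
    }
    lastTimeRanges.append((newTrack, layerTimeRange))
    return newTrack
}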

Let’s move on to adding an audio track. Adding an audio track involves the following steps:

1. Convert RenderLayer to AudioRenderLayer

AudioRenderLayer is an internal framework object that wraps a RenderLayer and is responsible for adding the RenderLayer’s audio track to the AVComposition. A RenderLayer can be converted to an AudioRenderLayer under only one condition: its Source contains an audio track.

AudioRenderLayerGroup is the audio-side internal framework object corresponding to RenderLayerGroup, and it wraps a RenderLayerGroup. A RenderLayerGroup can be converted to an AudioRenderLayerGroup as long as at least one of its RenderLayers can be converted to an AudioRenderLayer.

After the conversion, the AudioRenderLayers look like this:

2. Add the AudioRenderLayer audio track to AVComposition

For an AudioRenderLayer whose Source contains an audio track, get the audio AVAssetTrack from the Source and add it to the AVComposition.

The AVComposition’s audio tracks after this step are shown below:

As shown, unlike video tracks, each AudioRenderLayer gets its own audio track. This is because an AVAudioMixInputParameters corresponds one-to-one with an audio track, and its pitch setting (audioTimePitchAlgorithm) applies to the entire track. If tracks were reused, one audio track could host multiple AudioRenderLayers, and they would all be forced to share the same pitch setting, which is clearly unreasonable.

AVVideoComposition

As we saw in the AVFoundation section, AVVideoComposition can be used to specify the render size, render scale, and frame rate. It also holds a set of instructions that store the blending parameters. With these blending parameters, AVVideoComposition can blend the corresponding image frames using a custom compositor.

This section focuses on generating this set of instructions and creating the AVVideoComposition. We will use the VideoRenderLayers generated in the previous section to build these instructions.

Let’s illustrate this process with a simple example, as shown below. The AVComposition has three VideoRenderLayers: VideoRenderLayer1, VideoRenderLayer2, and VideoRenderLayer3. The conversion process consists of the following steps:

  • Record the start time and end time of each VideoRenderLayer on the timeline (T1-T6 in the figure).
  • Create an Instruction for each interval; the VideoRenderLayers that intersect the interval become the blending parameters of that Instruction (Instruction1-Instruction5 in the figure; see the sketch below).
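A minimal sketch of deriving the instruction intervals from the layers' time ranges; the LayerSlot type and the helper name are illustrative.

import AVFoundation

// Collect all layer start/end times, form adjacent intervals, and attach
// the layers that intersect each interval.
struct LayerSlot {
    let timeRange: CMTimeRange
}

func makeInstructionIntervals(for layers: [LayerSlot]) -> [(timeRange: CMTimeRange, layers: [LayerSlot])] {
    // Collect, sort, and de-duplicate every start and end time.
    var points: [CMTime] = []
    for layer in layers {
        points.append(layer.timeRange.start)
        points.append(layer.timeRange.end)
    }
    var uniquePoints: [CMTime] = []
    for time in points.sorted() where uniquePoints.last != time {
        uniquePoints.append(time)
    }
    guard uniquePoints.count >= 2 else { return [] }

    // Each adjacent pair of time points becomes one instruction interval.
    var intervals: [(timeRange: CMTimeRange, layers: [LayerSlot])] = []
    for index in 0..<(uniquePoints.count - 1) {
        let range = CMTimeRange(start: uniquePoints[index], end: uniquePoints[index + 1])
        let intersecting = layers.filter { $0.timeRange.intersection(range).duration > .zero }
        intervals.append((timeRange: range, layers: intersecting))
    }
    return intervals
}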

Next we create AVVideoComposition and set the frame rate, render size, Instruction group and custom Compositor. The core code is as follows:

let videoComposition = AVMutableVideoComposition()
videoComposition.frameDuration = renderComposition.frameDuration
videoComposition.renderSize = renderComposition.renderSize
videoComposition.instructions = instructions
videoComposition.customVideoCompositorClass = VideoCompositor.self

Now that we have the instructions and the blending parameters required for rendering, let’s look at how the compositor uses them to draw frames. We updated the earlier compositor workflow: the blending parameters are now the group of VideoRenderLayers that intersect the Instruction.

Let’s again use an example to illustrate the video blending rules, as shown below. At time T1, we want to blend the frames of the VideoRenderLayers.

Our rendering blending rules are as follows:

  • Sort the VideoRenderLayers by the layerLevel of the RenderLayer they contain. In the figure above they are ordered vertically from high to low.
  • Traverse the VideoRenderLayers; each one falls into one of the following three blending cases:
    • The current VideoRenderLayer is a VideoRenderLayerGroup, i.e. a pre-composition. Traverse its internal VideoRenderLayers to generate a texture, then blend that texture into the previous texture.
    • The current VideoRenderLayer’s Source contains a video track, or its Source is of image type: fetch its texture, run it through its own effect operation group, then blend it into the previous texture.
    • The current VideoRenderLayer has only an effect operation group: all of its operations are applied directly to the previously blended texture.

To summarize the blending rules: render layer by layer, from bottom to top. If the current layer has its own texture, process that texture first and then blend it into the previous texture; if it has no texture, apply its operations directly to the previous texture. A code sketch of this traversal follows the worked example below.

Let’s apply these rules to the example above, assuming we end up with an Output Texture:

  1. The VideoRenderLayerGroup at the lowest level generates Texture1, which is blended into the Output Texture.
  2. Handle the VideoRenderLayer2 to generate Texture2 and blend Texture2 into the Output Texture.
  3. Handle the VideoRenderLayer3 to generate Texture3 and blend Texture3 into the Output Texture.
  4. Apply VideoRenderLayer4’s effect operation group directly to the Output Texture.
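A minimal, self-contained sketch of this traversal; the protocol and type names are illustrative stand-ins, not the framework's actual API.

import CoreMedia

// Illustrative stand-ins for the framework's internal types.
protocol BlendableTexture {}

protocol CompositingLayer {
    var childLayers: [CompositingLayer]? { get }           // non-nil for a pre-composition group
    func texture(at time: CMTime) -> BlendableTexture?     // nil for an effect-only layer
    func applyOperations(to texture: BlendableTexture, at time: CMTime) -> BlendableTexture
}

// Stand-in for a Metal blend pass.
func blend(_ texture: BlendableTexture, into output: BlendableTexture) {}

// Layers are assumed to be sorted by layerLevel, bottom-most first.
func composeFrame(layers: [CompositingLayer], at time: CMTime, output: BlendableTexture) {
    for layer in layers {
        if let children = layer.childLayers, let groupTexture = layer.texture(at: time) {
            // Pre-composition: render the group into its own texture, then blend it in.
            composeFrame(layers: children, at: time, output: groupTexture)
            blend(layer.applyOperations(to: groupTexture, at: time), into: output)
        } else if let texture = layer.texture(at: time) {
            // Video or image layer: apply its own operation group, then blend into the output.
            blend(layer.applyOperations(to: texture, at: time), into: output)
        } else {
            // Effect-only layer: operations act directly on the previously blended output.
            _ = layer.applyOperations(to: output, at: time)
        }
    }
}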

AVAudioMix

As we saw in the AVFoundation section, AVAudioMix is used to process audio. AVAudioMix contains a set of AVAudioMixInputParameters; on each of them we can set an MTAudioProcessingTap for real-time audio processing and an AVAudioTimePitchAlgorithm to specify the pitch algorithm.

This chapter mainly introduces how to generate this set of AVAudioMixInputParameters and create the AVAudioMix. We will use the AudioRenderLayers generated in the AVComposition chapter to build this set of AVAudioMixInputParameters.

Let’s illustrate this process with a simple example, as shown below. This AVComposition has AudioRenderLayer1, AudioRenderLayer2, and AudioRenderLayer3. The conversion process consists of the following steps:

  • For each AudioRenderLayer, create an AVAudioMixInputParameters.
  • For each AVAudioMixInputParameters, set an MTAudioProcessingTap. The MTAudioProcessingTap processes audio in real time; it reads the audio configuration from the RenderLayer’s AudioConfiguration to compute the volume at the current point in time.
  • For each AVAudioMixInputParameters, set the AVAudioTimePitchAlgorithm. The AVAudioTimePitchAlgorithm configures the pitch algorithm, which is read from the RenderLayer’s AudioConfiguration.

Then we create the AVAudioMix and set its AVAudioMixInputParameters. The code is as follows:

let audioMix = AVMutableAudioMix()
audioMix.inputParameters = inputParameters

The chapters above introduce the implementation of the framework at a high level. I may cover the Metal part in a separate article in the future. The next few chapters cover the framework’s follow-up plans, some findings from reverse-engineering other apps during development, and recommended learning materials.

Follow-up plans

  • Support OpenGL rendering (letting the user decide whether to use Metal or OpenGL as the rendering engine).
  • Continue adding features, such as variable speed and easier ways to use transitions (perhaps by providing a transition layer), and so on.
  • Provide a UI interaction demo.

Reverse-engineering notes

While developing the framework, the author reverse-engineered many video editors, both domestic and international. After comparing their approaches, we chose AVFoundation plus Metal as the core of the framework. Here are some highlights from reverse-engineering Videoleap:

  • Issue as few draw calls as possible, and try to put all of a layer’s operations into a single shader (for example, in Videoleap the YUV conversion, filters, and so on for a video clip are all done in one shader).
  • Use IOSurface for better texture-creation performance (iOS 11 and above); see the sketch after this list.
    • The corresponding Metal method is makeTexture(descriptor:iosurface:plane:).
    • The corresponding OpenGL method is texImageIOSurface(_:target:internalFormat:width:height:format:type:plane:).
  • Use framebuffer fetch wherever possible (it can be used when the fragment only modifies its own pixel’s color, but not when neighboring pixels are read).
    • In Metal, use [[color(0)]].
    • For OpenGL, search for GL_EXT_shader_framebuffer_fetch.
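For the IOSurface point above, here is a minimal sketch of creating an IOSurface-backed MTLTexture; the 1920x1080 size and BGRA pixel format are assumed for illustration.

import CoreVideo
import IOSurface
import Metal

// Create an IOSurface and wrap it in an MTLTexture so the same memory can be
// shared across APIs without an extra copy.
let surfaceProperties: [String: Any] = [
    kIOSurfaceWidth as String: 1920,
    kIOSurfaceHeight as String: 1080,
    kIOSurfaceBytesPerElement as String: 4,
    kIOSurfacePixelFormat as String: kCVPixelFormatType_32BGRA
]

if let device = MTLCreateSystemDefaultDevice(),
   let surface = IOSurfaceCreate(surfaceProperties as CFDictionary) {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .bgra8Unorm,
                                                              width: 1920,
                                                              height: 1080,
                                                              mipmapped: false)
    descriptor.usage = [.shaderRead, .renderTarget]
    let texture = device.makeTexture(descriptor: descriptor, iosurface: surface, plane: 0)
    _ = texture
}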

Recommended resources

AVFoundation

  • WWDC 2012 – Real-Time Media Effects and Processing during Playback
  • WWDC 2013 – Advanced Editing with AV Foundation
  • WWDC 2020 – Edit and play back HDR video with AVFoundation
  • AVFoundation Programming Guide
  • Apple AVFoundation Collection
  • Apple Sample Code – AVCustomEdit
  • Github – Cabbage

Rendering

  • Learn OpenGL – Getting started
  • Apple Metal Collection
  • Metal Best Practices Guide
  • Metal by Tutorials
  • Metal by Example
  • Xiaozhuanlan – iOS image processing
  • Github – GPUImage3

The author

  • Keng Hung Yuen, currently working at RingCentral; former iOS lead
  • Mail: [email protected]