Takeaway:

The business side needed a video-synthesis feature: combine pictures and videos with transitions and BGM, then compose and export the result. Android's native APIs could not meet the requirement, so I studied audio/video technology and spent about two weeks learning FFmpeg. I ran into all kinds of problems along the way but eventually found solutions, so I am recording the learning process here to help others avoid the same pitfalls.

This article contains the following contents:

1. FFmpeg common commands

2. Design ideas and performance optimization for video synthesis and transitions

3. Major problems encountered in the project and their solutions

4. Follow-up optimization plan

I. FFmpeg common commands

FFmpeg website: ffmpeg.org

The official documentation describes every command and parameter in detail.

The simplest command

ffmpeg -i input.mp4 output.avi

ffmpeg [global options] {[input file options] -i input_file}… {[output file options] output_file}…

*-i specifies an input file

Picture to video

ffmpeg -y -f lavfi -i anullsrc=channel_layout=stereo:sample_rate=44100 -loop 1 -i pic.png -c:v libx264 -r 25 -t 1 out.mp4

*-f lavfi -i anullsrc=channel_layout=stereo:sample_rate=44100 generates a silent stereo audio track at a 44.1 kHz sample rate, so the output video carries an audio stream

*-r sets the output video frame rate

*-y overwrites the output file without asking

*-t sets the output video duration, in seconds
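These parameters are easy to get wrong when assembled by hand; as a sketch, the picture-to-video command can be built programmatically and handed to an FFmpeg wrapper as an argv array. The class and method names below are illustrative, not from the project:

```java
// Sketch: builds the picture-to-video command above as an argv array.
// PicToVideoCmd and buildPicToVideo are illustrative names.
public class PicToVideoCmd {

    public static String[] buildPicToVideo(String picPath, String outPath, int seconds) {
        // -loop 1 repeats the single image; -t bounds the output duration;
        // anullsrc supplies a silent stereo track so later audio filters have input
        String cmd = String.format(
            "ffmpeg -y -f lavfi -i anullsrc=channel_layout=stereo:sample_rate=44100 "
          + "-loop 1 -i %s -c:v libx264 -r 25 -t %d %s",
            picPath, seconds, outPath);
        return cmd.split(" ");
    }
}
```

Splitting on spaces only works because no argument here contains a space; commands with filter graphs are safer built directly as arrays.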

Generate a black-background video (with a silent audio track)

ffmpeg -y -f lavfi -i anullsrc=channel_layout=stereo:sample_rate=44100 -f lavfi -r %d -i color=black -t %f -vf scale=1280:720 -vcodec mpeg4 %s

*%d, %f and %s are String.format placeholders filled in by the calling code (frame rate, duration and output path)

Crop the video

ffmpeg -i input.mp4 -filter:v "crop=w:h:x:y" output.mp4

*crop=w:h:x:y crops a w×h region whose top-left corner is at (x, y)

Change the video resolution

ffmpeg -i input.mp4 -filter:v scale=1280:720 -c:a copy output.mp4

*-c:a copy copies the original audio stream without re-encoding

Gaussian blur

ffmpeg -y -i input.mp4 -filter_complex "split[a][b];[a]scale=1280:720,boxblur=30:5[a];[b]scale=1280:720:force_original_aspect_ratio=decrease[b];[a][b]overlay=(W-w)/2[blur];[blur]pad=1280:720:(ow-iw)/2:(oh-ih)/2,setdar=16/9" -vcodec libx264 -r %d -preset superfast out.mp4

split divides the input into two streams a and b; stream a is scaled to fill the frame and blurred, while stream b keeps its original aspect ratio and is overlaid in the center of stream a
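The force_original_aspect_ratio=decrease option shrinks the requested 1280×720 so the result keeps the input's aspect ratio. The equivalent arithmetic, as an illustrative sketch (the class is not from the project; the real filter may additionally round to even dimensions for yuv420p):

```java
// Sketch of the scale=1280:720:force_original_aspect_ratio=decrease arithmetic:
// shrink the requested box so the output preserves the input aspect ratio.
public class FitScale {

    // Returns {width, height} of (w, h) scaled to fit inside (maxW, maxH)
    public static int[] fitInside(int w, int h, int maxW, int maxH) {
        double scale = Math.min((double) maxW / w, (double) maxH / h);
        return new int[]{(int) Math.round(w * scale), (int) Math.round(h * scale)};
    }
}
```

For a 1080×1920 portrait clip on a 1280×720 canvas this yields 405×720, which is why the blurred background is needed to fill the rest of the frame.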

Add a Text Watermark

ffmpeg -i input.mp4 -vf "drawtext=fontfile=%s:fontcolor=white:fontsize=36:text='... ':x=(w-tw)/2:y=(h-text_h)/2,drawtext=fontfile=%s:fontcolor=white:fontsize=36:text='... ':x=(w-tw)/2:y=((h-text_h)/2)+(text_h-(th/4))" -y -vcodec mpeg4 output.mp4

*fontfile is a font file; x=(w-tw)/2:y=(h-text_h)/2 centers the text in the frame, where w is the video width, tw the text width, and text_h (an alias of th) the text height

Add the BGM

ffmpeg -i video.mp4 -i bgm.mp3 -filter_complex "[1:a]aloop=loop=-1[out];[out][0:a]amix" -ss 0 -t %f -y %s

*[1:a] is the audio track of the second input; aloop=loop=-1 loops the BGM indefinitely; amix mixes the video's audio track with the BGM
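Because the project eventually splits the whole command string on spaces, the filter graph must contain no spaces; building the argv array directly sidesteps that fragility. A minimal sketch (BgmCmd is an illustrative name; the aloop size cap mirrors the one used in the project's synthesis code):

```java
// Sketch: the add-BGM command as an argv array, so the filter graph
// never has to survive a split-on-spaces. BgmCmd is an illustrative name.
public class BgmCmd {

    public static String[] buildAddBgm(String videoPath, String bgmPath, float duration, String outPath) {
        return new String[]{
            "ffmpeg", "-i", videoPath, "-i", bgmPath,
            "-filter_complex",
            // [1:a] = audio of the 2nd input; aloop loops the BGM;
            // amix with duration=first stops when the video's audio ends
            "[1:a]aloop=loop=-1:size=2e+09[bgm];[bgm][0:a]amix=inputs=2:duration=first[outa]",
            "-map", "0:v", "-map", "[outa]",
            "-t", String.valueOf(duration), "-y", outPath
        };
    }
}
```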

Many common commands can be found online, and you can usually find one that matches your need. The key, however, is to understand how FFmpeg actually works; otherwise you will be unable to locate problems when they occur.

Take the concat command as documented on the official website: at first, labels like [0:a][3:v] confused me, until I understood that they refer to the audio and video streams of the numbered inputs.

Understanding the input stream

Understanding the Filter principle

II. Design ideas and performance optimization for video synthesis and transitions

My first design used the fade command for transitions, but the effect was poor: the second video only entered after the first had completely faded out, leaving a black gap in between. It looked like a PowerPoint slide change.

So I tried a cross-fade instead, where the next clip fades in while the current one fades out.

The command looks like this (settings for resolution, aspect ratio, frame rate, audio, etc. are omitted):

ffmpeg -y -i a.mp4 -i b.mp4 -filter_complex "[0:v]fade=t=out:st=%f:d=%f:alpha=0:color=black,setpts=PTS-STARTPTS[v0];[1:v]fade=t=in:st=0:d=%f:alpha=1,fade=t=out:st=%f:d=%f:alpha=0:color=black,setpts=PTS-STARTPTS+%f/TB[v1];[v0][v1]overlay[outv]" -vcodec libx264 -map [outv] -f mp4 -r 25 -preset medium out.mp4

*st is the time at which the fade starts; d is its duration; alpha=1 fades only the alpha channel, if the input has one (default 0); setpts=PTS-STARTPTS+%f/TB shifts the second clip so its first frame starts at the given time.
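For two clips the timing values fall out of simple arithmetic. A sketch, with illustrative names, assuming clip durations d0, d1 and a fade length f in seconds:

```java
// Sketch of the cross-fade timing arithmetic for two clips.
// CrossFadeTiming is an illustrative name, not project code.
public class CrossFadeTiming {

    // st of clip 0's fade-out: it begins f seconds before the clip ends
    public static float fadeOutStart(float d0, float f) {
        return Math.max(d0 - f, 0f);
    }

    // offset fed into setpts=PTS-STARTPTS+offset/TB for clip 1:
    // clip 1 starts exactly when clip 0 begins fading out
    public static float secondClipOffset(float d0, float f) {
        return fadeOutStart(d0, f);
    }

    // the overlap region is counted only once in the output length
    public static float totalDuration(float d0, float d1, float f) {
        return d0 + d1 - f;
    }
}
```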

Initial tests looked fine, but real use exposed the problem: every transition is an overlay, and once more than 30 videos were stacked, synthesis slowed down dramatically. On low-end phones it threw an OOM error outright; memory simply could not hold up.

So a third solution emerged: use concat to join the clips instead of overlaying whole videos.

First, cut the two videos into four segments: trim the last second off the first video and the first second off the second. Then cross-fade and overlay only the two short trimmed segments into a new transition clip, and finally concat the three pieces together.
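As a sketch, the trim boundaries for this four-segment approach can be computed as follows (the class is illustrative; with the project's 0.2 s fade, f would simply be 0.2f):

```java
// Sketch: trim boundaries for the "two clips -> four segments" approach.
// Given durations d0, d1 and fade length f (seconds), returns {start, end} pairs:
//   [0] body of clip 0:  0      .. d0 - f
//   [1] tail of clip 0:  d0 - f .. d0   (fades out)
//   [2] head of clip 1:  0      .. f    (fades in, overlaid on [1])
//   [3] body of clip 1:  f      .. d1
public class TransitionSplit {

    public static float[][] splitPoints(float d0, float d1, float f) {
        return new float[][]{
            {0f, d0 - f},
            {d0 - f, d0},
            {0f, f},
            {f, d1}
        };
    }
}
```

Only segments [1] and [2] are overlaid, so the expensive overlay filter runs on fade-length material instead of entire clips.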

/**
 * Concatenates multiple videos with concat (cross-fade transitions); only splices clips and adds BGM, nothing else
 */
public static String[] concatMultiVideoWithFade(List<MediaBean> list, String bgmFilePath, String fontFilePath, String targetPath) {

    final float FADE_DURATION = 0.2f; // 0.2-second fade
    final int FRAME = 25; // frame rate
    final int bit = getFitBitRate(1280 * 720);

    StringBuilder bd = new StringBuilder();
    bd.append("ffmpeg ");
    bd.append("-y ");
    float totalTime = 0;
    for (int i = 0; i < list.size(); i++) {
        MediaBean video = list.get(i);

        // all time units below are in seconds
        if (video.getType() != 1) {
            // image converted to video: use the clip duration directly
            bd.append(String.format("-i %s -r %d ", codeString(video.getPath()), FRAME));

            // accumulate the total duration
            float addTime = (video.getTime() / 1000f - FADE_DURATION);
            totalTime += addTime > 0 ? addTime : FADE_DURATION;
        } else {
            // original video, trimmed according to business rules
            bd.append(String.format("-i %s -r %d ", codeString(video.getPath()), FRAME));
            float addTime = (video.getTime() / 1000f - FADE_DURATION);
            totalTime += addTime > 0 ? addTime : FADE_DURATION;
        }
    }

    // the last clip loses no fade overlap
    totalTime += FADE_DURATION;

    // append the BGM input
    if (!TextUtils.isEmpty(bgmFilePath)) {
        bd.append(String.format("-i %s ", bgmFilePath));
    }
    bd.append("-filter_complex ");

    // plan A: cross-fade transitions
    for (int i = 0; i < list.size(); i++) {
        MediaBean video = list.get(i);
        float t = video.getTime() / 1000f;
        // time at which the fade starts
        float duration;
        if (t > FADE_DURATION) {
            duration = t - FADE_DURATION;
        } else {
            duration = t;
        }

        /*
        // split
        v0 -> [out0][fadeout0]
        v1 -> [fadein0][out1][fadeout1]
        v2 -> [fadein1][out2][fadeout2]
        v3 -> [fadein2][out3]
        // blend
        [fadeout0][fadein0] overlay -> [fade0]
        // concat
        [out0][fade0][out1][fade1]..[fade n-1][outn]
         */
        if (i == 0) {
            // first clip
            bd.append(String.format("[%d:v]split[out0][fadeout0];[out0]trim=end=%f[out0];[fadeout0]trim=start=%f,setpts=PTS-STARTPTS[fadeout0];", i, duration, duration));
        } else if (i == list.size() - 1) {
            // last clip
            bd.append(String.format("[%d:v]split[fadein%d][out%d];", i, i - 1, i));
            bd.append(String.format("[fadein%d]trim=end=%f,fade=t=in:st=0:d=%f:alpha=1[fadein%d];", i - 1, FADE_DURATION, FADE_DURATION, i - 1));
            bd.append(String.format("[out%d]trim=start=%f,setpts=PTS-STARTPTS[out%d];", i, FADE_DURATION, i));
        } else {
            // middle clips
            bd.append(String.format("[%d:v]split[fadein%d][split%d];", i, i - 1, i));
            bd.append(String.format("[split%d]split[out%d][fadeout%d];", i, i, i));
            bd.append(String.format("[fadein%d]trim=end=%f,fade=t=in:st=0:d=%f:alpha=1[fadein%d];", i - 1, FADE_DURATION, FADE_DURATION, i - 1));
            bd.append(String.format("[out%d]trim=start=%f:end=%f,setpts=PTS-STARTPTS[out%d];", i, FADE_DURATION, duration, i));
            bd.append(String.format("[fadeout%d]trim=start=%f,setpts=PTS-STARTPTS[fadeout%d];", i, duration, i));
        }
    }

    // 0...n-1: blend each fade-out tail with the next clip's fade-in head
    for (int i = 0; i < list.size() - 1; i++) {
        bd.append(String.format("[fadeout%d][fadein%d]overlay[fade%d];", i, i, i));
    }

    // 0...n: concatenate in the order [out0][fade0][out1][fade1]..[fade n-1][outn]
    for (int i = 0; i < list.size(); i++) {
        if (i == list.size() - 1) {
            // last clip
            bd.append(String.format("[out%d]", i));
        } else {
            bd.append(String.format("[out%d][fade%d]", i, i));
        }
    }

    bd.append(String.format("concat=n=%d:v=1:a=0[outv];", (list.size() * 2) - 1));

    float currentTime = 0;
    // audio: fade and delay each clip's track
    for (int i = 0; i < list.size(); i++) {
        MediaBean video = list.get(i);
        float t = video.getTime() / 1000f;
        // fade-out start time, which is also the entry time of the next clip
        float duration;
        if (t > FADE_DURATION) {
            duration = t - FADE_DURATION;
        } else {
            duration = t;
        }

        // adelay=1500|0|500
        if (i == 0) {
            bd.append(String.format("[%d:a]afade=t=out:st=%f:d=%f,volume=10dB[a%d];", i, duration, FADE_DURATION, i)); // ,asetpts=PTS-STARTPTS
        } else {
            bd.append(String.format("[%d:a]adelay=%d|%d,afade=t=in:st=0:d=%f,afade=t=out:st=%f:d=%f,volume=10dB[a%d];", i, (int) (currentTime * 1000), (int) (currentTime * 1000), FADE_DURATION, currentTime + duration, FADE_DURATION, i)); // ,asetpts=PTS-STARTPTS+%f/TB
        }

        // bd.append(String.format("[%d:a]atrim=%f[a%d];", i, duration - FADE_DURATION, i)); // ,asetpts=PTS-STARTPTS

        currentTime += duration;
    }

    // mix the audio tracks
    for (int i = 0; i < list.size(); i++) {
        bd.append(String.format("[a%d]", i));
    }
    // bd.append(String.format("concat=n=%d:v=0:a=1[outa]", list.size()));

    bd.append(String.format("amix=inputs=%d:duration=longest[outa]", list.size()));

    // add the BGM background track
    if (!TextUtils.isEmpty(bgmFilePath)) {
        bd.append(String.format(";[%d:a]aloop=loop=-1:size=2e+09,afade=t=out:st=%f:d=%f[bgm];[outa][bgm]amix=inputs=2:duration=first[outbgm]", list.size(), currentTime - 0.8, FADE_DURATION + 0.8));
        // bd.append(String.format(" -vcodec libx264 -map [outv] -acodec aac -map [outbgm] -ar 22050 -ac 2 -ab 128k -r %d -preset medium -crf 18 %s", FRAME, targetPath));
        bd.append(String.format(" -vcodec libx264 -map [outv] -acodec aac -map [outbgm] -ar 22050 -ac 2 -ab 128k -r %d -pix_fmt yuv420p -preset fast %s", FRAME, targetPath));
    } else {
        // no BGM
        bd.append(String.format(" -vcodec libx264 -map [outv] -map [outa] -r %d -pix_fmt yuv420p -preset fast %s", FRAME, targetPath));
    }

    Log.d("ffmpeg--", bd.toString());

    String str = bd.toString();
    String[] result = str.split(" ");
    for (int i = 0; i < result.length; i++) {
        result[i] = decodeString(result[i]);
    }
    return result;
}

In the end it performed slightly better than the previous version… but OOM errors still occurred.

The problem clearly came down to too much input data: with that much input, the generated FFmpeg command could run to hundreds of lines… Up to 38 videos or pictures could be fed in at once, plus subtitles, cropping, Gaussian blur and other effects in between; it would have been a surprise if it had not crashed.

I had to trade synthesis speed for stability, and came up with two ways to reduce the resource requirements:

1. Composite each effect step by step: picture to video -> generate text watermark -> cropping and Gaussian blur -> splicing and transitions -> add BGM

2. Group the videos before merging. Since there are at most 38 videos or pictures, a recursion splits them into groups of 6, merges each group of 6 into a new intermediate video, and repeats until one video remains.

*This step-by-step composition reduces efficiency, but with the release deadline approaching, stability came first; it did successfully synthesize 38 videos and pictures on a Mi 5S.

*Later I plan to study OpenGL ES for transitions and video composition, use Android's native APIs for audio mixing, and keep FFmpeg only for cropping, resolution changes, Gaussian blur and subtitles.
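The grouping idea in point 2 above can be sketched as follows (the class is illustrative; the real project merges each batch with FFmpeg, while here the merge is abstracted to producing one element per batch):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of recursive batch merging: split inputs into batches of at most
// groupSize, merge each batch into one intermediate, recurse until one remains.
// BatchGrouping is an illustrative name, not project code.
public class BatchGrouping {

    // how many merge passes are needed until a single output remains
    public static int mergePasses(int clipCount, int groupSize) {
        int passes = 0;
        while (clipCount > 1) {
            clipCount = (clipCount + groupSize - 1) / groupSize; // ceil division
            passes++;
        }
        return passes;
    }

    // split a list into consecutive batches of at most groupSize items
    public static List<List<Integer>> group(List<Integer> items, int groupSize) {
        List<List<Integer>> groups = new ArrayList<>();
        for (int i = 0; i < items.size(); i += groupSize) {
            groups.add(new ArrayList<>(items.subList(i, Math.min(i + groupSize, items.size()))));
        }
        return groups;
    }
}
```

With 38 clips and groups of 6, this takes three passes (38 -> 7 -> 2 -> 1), bounding each FFmpeg invocation to at most 6 inputs.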

III. Major problems encountered in the project and their solutions

1. Compiling FFmpeg from source with an OpenGL transition plug-in

During the initial investigation I found an OpenGL transition library that met the requirements, but it required patching the transition code into the FFmpeg source and recompiling. Compiling on macOS was no problem, but compiling FFmpeg with the Android NDK (so that Android can call the underlying C/C++ code) ran into one issue after another, with no solutions to be found online. Since it involved architecture- and assembly-level knowledge I did not have, I gave up and implemented the transition effects myself.

2. Videos generated from pictures failed to concatenate with other videos

The videos generated from pictures initially had no audio track, probably a common beginner mistake, so when synthesizing the BGM the expected input audio track could not be found.

3. Audio transitions

The filter system in FFmpeg is very powerful, but video and audio are handled quite differently. For transitions, for example, video delays playback with setpts=PTS-STARTPTS+x/TB, while audio uses adelay to achieve the delay.
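The unit mismatch is the usual trap: setpts works in seconds (via the timebase), while adelay takes one delay per channel in milliseconds, separated by '|'. A sketch of building the adelay argument (illustrative class name; the project's clips are stereo):

```java
// Sketch: builds an adelay filter argument. adelay takes one delay per
// channel, in milliseconds, separated by '|'. Illustrative, not project code.
public class AudioDelay {

    public static String adelayArg(float delaySeconds, int channels) {
        int ms = (int) (delaySeconds * 1000); // seconds -> milliseconds
        StringBuilder sb = new StringBuilder("adelay=");
        for (int c = 0; c < channels; c++) {
            if (c > 0) sb.append('|');
            sb.append(ms);
        }
        return sb.toString();
    }
}
```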

4. Video and audio encoding problems

Concatenated videos must share the same format, such as resolution and aspect ratio, and the audio tracks must match as well: the same encoding, sample rate and bit rate.

5. Adding a text watermark requires a font file path, and the chosen third-party library did not enable that feature…

At first I imported another library for it, but the font files turned out to be too large and maintaining two libraries was awkward, so I switched to an image watermark instead.

The idea is to build a TextView natively on Android, use the drawing-cache screenshot trick to convert the TextView into a Bitmap, export it as an image, and finally overlay that image on the video.

6. An OOM error when there are too many input sources

Batch processing, as described above.

7. A source video had no audio track

This one I really did not see coming… The third-party FFmpeg library reported no useful error, so I copied the command and source files over to macOS and ran them there; it turned out one of the videos had no audio track, which made the synthesis fail.

The problem itself is trivial; the painful part is that the library surfaced no information at all… I really should have compiled FFmpeg from source myself and imported it via the NDK, but after several failed attempts I used someone else's prebuilt library, which cost me a lot of time locating the issue; I even suspected a performance problem and went further down the wrong path. Finally, running the same command with FFmpeg on the Mac showed the [5:a] audio stream was at fault: its audio track encoding was wrong or missing. Well, I wasted most of a day.

The lesson: with a cross-platform tool or framework like this, if something goes wrong on one platform and you are stuck, try running it on another platform; you may learn something useful.

IV. Follow-up optimization

1. Let FFmpeg specialize in picture and video processing; do video stitching and transitions with OpenGL

2. If the aspect ratio of the original video or picture is close to the target aspect ratio, skip the Gaussian blur to improve performance

*-frames:v 1 saves a single frame as an image