1. Problem description
1.1 Background
I had previously built on FFmpeg to implement common video processing functions, with the ffmpeg command line as a fallback. On top of that, I built a transcoding access and scheduling system to offer these functions as a service. One required feature was to quickly cut a sub-video covering a given time range out of a specified video, with two requirements: 1. It must be fast, not as time-consuming as transcoding. 2. It must be precise: the start and end of the clip can be specified to the second.
1.2 The difficulty
ffmpeg makes it easy to cut a short video out of a long one. For example, ffmpeg -i input.mp4 -ss 00:10:03 -t 00:03:00 -vcodec copy -acodec copy output.mp4 cuts a 3-minute clip starting at 10 minutes 3 seconds of input.mp4 and saves it as output.mp4. The parameters -vcodec copy -acodec copy copy the audio and video streams of the original video without decoding or re-encoding. While this method is convenient, it has a fatal flaw: the picture is frozen at the start of the clip (though the sound is normal), and it takes a few seconds before the picture plays properly. Here’s an example.
2. Cause analysis
The reason is that the start time of the clip falls in the middle of a video GOP instead of on its first I frame. Anyone who knows a little about video coding has probably heard of I, P and B frames. In simple terms, an I frame is a complete image, a P frame is encoded as a difference from preceding frames (I or P), and a B frame is encoded as a difference from frames both before and after it. That is, I frames carry complete content while P and B frames do not, so if the I frame is missing, the P and B frames that depend on it cannot be decoded properly. Generally speaking, the first frame in a GOP is an I frame, followed by a number of P and B frames. A GOP can be as long as 10 seconds. The figure below shows the I, B and P frames of a real video. The red ones are I frames. You can see that the two I frames are far apart (actually 10 seconds apart).
From the above analysis, the start time of the clip most likely does not land on an I frame. Without the I frame, the P and B frames that follow cannot be decoded, so the picture freezes until the next I frame arrives. This analysis assumes that the video content is copied directly without decoding. If you instead decode the video into images first and then re-encode only the images within the requested time range, the cut can be very accurate. But doing so takes too long: decoding and re-encoding the whole clip consumes a lot of CPU.
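To make the GOP reasoning concrete, here is a minimal sketch that finds the keyframe starting the GOP that contains a given cut time, and checks whether the cut lands exactly on it. PacketInfo, findGopStart and startsOnKeyframe are hypothetical names used only for illustration, timestamps are assumed to be in milliseconds, and the packets are assumed to be listed in presentation order.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a demuxed video packet.
struct PacketInfo {
    int64_t pts_ms;    // presentation timestamp in milliseconds (assumption)
    bool is_keyframe;  // true for an I frame that starts a GOP
};

// Return the index of the keyframe that starts the GOP containing start_ms,
// or -1 if no keyframe exists at or before that time.
// Assumes packets are sorted by pts_ms.
int findGopStart(const std::vector<PacketInfo> &packets, int64_t start_ms)
{
    int gop_start = -1;
    for (std::size_t i = 0; i < packets.size(); ++i) {
        if (packets[i].pts_ms > start_ms) break;
        if (packets[i].is_keyframe) gop_start = static_cast<int>(i);
    }
    return gop_start;
}

// True when the cut starts exactly on a keyframe, in which case a plain
// stream copy can be decoded from its very first frame.
bool startsOnKeyframe(const std::vector<PacketInfo> &packets, int64_t start_ms)
{
    const int idx = findGopStart(packets, start_ms);
    return idx >= 0 && packets[static_cast<std::size_t>(idx)].pts_ms == start_ms;
}
```

With I frames 10 seconds apart, a cut at an arbitrary second will almost never satisfy startsOnKeyframe, which is why the copied clip starts with frames that cannot be decoded.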
3. Solutions
There is a solution: decode and re-encode only the first GOP that overlaps the requested time range, and copy the content of the subsequent GOPs directly to the target video. First, because the first GOP is re-encoded, it gets a new I frame at the cut point, so it plays correctly. Second, the subsequent GOPs are copied directly, so they cost almost no CPU and performance is excellent. As shown below:
Of course, there are still some pitfalls along the way, so let’s work through them.
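The split between the re-encoded part and the copied part can be sketched as follows. This is only an illustration under simplifying assumptions: Frame, CutPlan and planCut are hypothetical names, timestamps are in milliseconds, and the frames are listed in presentation order.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical simplified frame record.
struct Frame {
    int64_t pts_ms;
    bool is_keyframe;
};

// Indices into the frame list; -1 means "not found".
struct CutPlan {
    int decode_from = -1;    // keyframe at or before the start: decoding begins here
    int reencode_from = -1;  // first frame at or after the start: first frame of the output
    int copy_from = -1;      // first keyframe at or after the start: copying begins here
};

// Decide which frames must be decoded and re-encoded and where direct
// copying can take over.
CutPlan planCut(const std::vector<Frame> &frames, int64_t start_ms)
{
    CutPlan plan;
    const int n = static_cast<int>(frames.size());
    for (int i = 0; i < n; ++i) {
        const Frame &f = frames[static_cast<std::size_t>(i)];
        if (f.is_keyframe && f.pts_ms <= start_ms) plan.decode_from = i;
        if (plan.reencode_from < 0 && f.pts_ms >= start_ms) plan.reencode_from = i;
        if (f.is_keyframe && f.pts_ms >= start_ms) {
            plan.copy_from = i;
            break;
        }
    }
    return plan;
}
```

Frames in [reencode_from, copy_from) are the ones that get re-encoded; if the two indices are equal, the cut already starts on a keyframe and nothing needs re-encoding. If copy_from stays -1, no later keyframe exists and the rest of the clip has to be re-encoded.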
3.1 Splicing
The source video might object: "I was encoded on my own terms, so why would my packets still decode after simply being copied somewhere else?" In general, decoding depends on the SPS and PPS, and the SPS and PPS of the source video and of the re-encoded target video will differ, so directly copied packets will not decode correctly. For MP4 files, the SPS and PPS are generally stored in the file header. A file can have only one header, so it cannot hold two different sets of SPS and PPS. Yet the SPS and PPS of the source video must be available in order to decode the copied part of the target video correctly.
If they can’t go in the header, where can they go? Can they be placed in front of the copied frames? And how? I was stuck until one day I remembered that, while fixing an earlier problem, I had traced through the implementation of h264_mp4toannexb, whose job is to copy the SPS and PPS to the front of a frame (an AVPacket, to be exact).
To review what h264_mp4toannexb does: it adds a start code (0x000001 or 0x00000001) to every AVPacket and inserts the SPS and PPS before I frames. That is, h264_mp4toannexb can correctly insert the SPS and PPS required for decoding into the video stream itself. h264_mp4toannexb is also simple to use, with the following code:
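extern "C" {
#include <libavcodec/avcodec.h>
#include <libavcodec/bsf.h>   // av_bsf_* API; older FFmpeg releases declare it in avcodec.h
#include <libavformat/avformat.h>
}
#include <string>

// Create and initialize a bitstream filter for the given stream parameters.
// Returns nullptr on failure.
AVBSFContext* initBSF(const std::string &filter_name, const AVCodecParameters *codec_par, AVRational tb)
{
    const AVBitStreamFilter *filter = av_bsf_get_by_name(filter_name.c_str());
    if (!filter) return nullptr;
    AVBSFContext *bsf_ctx = nullptr;
    if (av_bsf_alloc(filter, &bsf_ctx) < 0) return nullptr;
    avcodec_parameters_copy(bsf_ctx->par_in, codec_par);
    bsf_ctx->time_base_in = tb;
    if (av_bsf_init(bsf_ctx) < 0) {
        av_bsf_free(&bsf_ctx);
        return nullptr;
    }
    return bsf_ctx;
}

// Push one packet through the filter and return the filtered packet,
// or nullptr if the filter produced no output.
AVPacket* feedPacket(AVBSFContext *bsf_ctx, AVPacket *packet)
{
    if (av_bsf_send_packet(bsf_ctx, packet) < 0) return nullptr;
    AVPacket *dst_packet = av_packet_alloc();
    if (av_bsf_receive_packet(bsf_ctx, dst_packet) < 0) av_packet_free(&dst_packet);
    return dst_packet;
}

void test()
{
    // video_stream and readVideoPacket() come from the surrounding demuxing code.
    AVBSFContext *bsf_ctx = initBSF("h264_mp4toannexb", video_stream->codecpar, video_stream->time_base);
    AVPacket *packet = readVideoPacket();
    AVPacket *dst_packet = feedPacket(bsf_ctx, packet);
    av_packet_free(&dst_packet);
    av_bsf_free(&bsf_ctx);
}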
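To illustrate what the filter does to the bytes, here is a simplified, standalone sketch of the same idea. It is not FFmpeg's implementation: toAnnexB and Bytes are hypothetical names, 4-byte big-endian NAL length prefixes are assumed, and the SPS and PPS are assumed to be passed in as raw NAL units rather than parsed from the MP4 header.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Convert one length-prefixed access unit (MP4 style, 4-byte big-endian
// lengths assumed) to Annex B. sps and pps are raw NAL units without any
// prefix; they are inserted once, before the first IDR slice
// (nal_unit_type 5). Returns an empty buffer on malformed input.
Bytes toAnnexB(const Bytes &in, const Bytes &sps, const Bytes &pps)
{
    static const uint8_t kStartCode[4] = {0, 0, 0, 1};
    Bytes out;
    auto appendNal = [&out](const uint8_t *data, std::size_t n) {
        out.insert(out.end(), kStartCode, kStartCode + 4);
        out.insert(out.end(), data, data + n);
    };

    bool params_written = false;
    std::size_t pos = 0;
    while (pos + 4 <= in.size()) {
        const uint32_t len = (uint32_t(in[pos]) << 24) | (uint32_t(in[pos + 1]) << 16) |
                             (uint32_t(in[pos + 2]) << 8) | uint32_t(in[pos + 3]);
        pos += 4;
        if (len == 0 || len > in.size() - pos) return Bytes();  // truncated or invalid NAL

        const uint8_t nal_type = in[pos] & 0x1F;
        if (nal_type == 5 && !params_written) {
            appendNal(sps.data(), sps.size());
            appendNal(pps.data(), pps.size());
            params_written = true;
        }
        appendNal(in.data() + pos, len);
        pos += len;
    }
    if (pos != in.size()) return Bytes();  // trailing bytes that do not form a NAL
    return out;
}
```

Because the SPS and PPS now travel in front of the I frames, the copied packets stay decodable even though the file header describes the re-encoded part.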
Note: handle the timestamps carefully at the point where the re-encoded first GOP is spliced onto the copied subsequent GOPs of the original video; otherwise the video may jitter during playback.
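One way to keep the timeline continuous is to map every timestamp, re-encoded or copied, through the same function with the same origin. The sketch below shows that idea. Rational, rescale and toOutputTs are hypothetical names: rescale is a simplified analogue of FFmpeg's av_rescale_q that assumes positive time bases and no 64-bit overflow, and the output timeline is assumed to start at zero at the cut point.

```cpp
#include <cstdint>

// Time base expressed as seconds per tick (num / den).
struct Rational {
    int64_t num;
    int64_t den;
};

// Convert value from time base `from` to time base `to`, rounding to the
// nearest tick. Assumes positive time bases and no int64 overflow.
int64_t rescale(int64_t value, Rational from, Rational to)
{
    const int64_t num = from.num * to.den;
    const int64_t den = from.den * to.num;
    const int64_t scaled = value * num;
    return scaled >= 0 ? (scaled + den / 2) / den
                       : -((-scaled + den / 2) / den);
}

// Map a source timestamp (PTS or DTS) onto the output timeline, which
// starts at zero at the cut point. Using the same cut_start_ts for both
// the re-encoded frames and the copied packets keeps them aligned.
int64_t toOutputTs(int64_t src_ts, int64_t cut_start_ts, Rational src_tb, Rational out_tb)
{
    return rescale(src_ts - cut_start_ts, src_tb, out_tb);
}
```

For example, with a source time base of 1/90000 and an output time base of 1/1000, a packet 3600 ticks after the cut point maps to 40 ms on the output timeline. A DTS may come out negative for the first few packets, which is why rescale handles negative values.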
3.2 Corrupted picture
You think that’s it? No! You’ll find that some videos show a corrupted picture in the last second…
It’s not hard to guess why: the last frame is a B frame. A B frame depends on a later frame, and that frame has been cut off, so it cannot be decoded correctly. Since not every clip ends on a B frame, the corruption does not always appear. Knowing that the B frame is the cause, the solution is clear: make sure the last frame is a P frame, even if that shifts the end point slightly (the audio stream should follow the video stream here). However, since you cannot tell directly from an AVPacket whether a frame is a P frame, the last GOP must be decoded (no encoding required). Record the PTS of the first P frame after the end time is exceeded. Later, when copying the GOPs, stop once this PTS has been copied.
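The end-point selection can be sketched like this. DecodedFrame, findStopPts and shouldCopy are hypothetical names, timestamps are assumed to be in milliseconds, and the frames are assumed to be listed in presentation order. As an assumption beyond the text above, the sketch also accepts an I frame as a stop point, since it is just as safe to end on as a P frame.

```cpp
#include <cstdint>
#include <vector>

enum class FrameType { I, P, B };

// Hypothetical record collected while decoding the last GOP.
struct DecodedFrame {
    int64_t pts_ms;
    FrameType type;
};

// Return the PTS of the first non-B frame at or after end_ms,
// or -1 if there is none (meaning: copy to the end of the stream).
int64_t findStopPts(const std::vector<DecodedFrame> &frames, int64_t end_ms)
{
    for (const DecodedFrame &f : frames) {
        if (f.pts_ms >= end_ms && f.type != FrameType::B) return f.pts_ms;
    }
    return -1;
}

// A packet is copied while its PTS does not exceed the stop PTS.
bool shouldCopy(int64_t packet_pts_ms, int64_t stop_pts_ms)
{
    return stop_pts_ms < 0 || packet_pts_ms <= stop_pts_ms;
}
```

Filtering by PTS rather than by position in the file matters here, because packets are stored in decode order: the P frame that closes the clip is stored before the B frames that are displayed ahead of it.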
4. Summary
At first the problem looked hard to solve; after all, even the clips cut by the ffmpeg command line had this problem. But however much things vary, they never stray from their fundamentals: start from the cause of the problem, work toward a solution step by step, and solve the problems you meet along the way one by one. Remember: understanding the principles is what lets you solve the problem.