This is the second section of the FFmpeg player development learning notes: "Soft-decode the video stream, render RGB24".

Most displays use the RGB color standard: on a CRT, color is produced by electron guns striking red, green and blue phosphors on the screen, and a computer can typically display 32-bit color, i.e. more than ten million colors. Every color on the screen is a mixture of red, green and blue in different proportions, and one red/green/blue triplet is the smallest display unit. Any color on the screen can therefore be recorded and represented by a set of RGB values, which is why red, green and blue are called the three primary colors: R (red), G (green) and B (blue).
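
As a small illustration (not from the demo), an RGB24 image simply stores each pixel as three consecutive bytes in R, G, B order:

#include <stdint.h>

// RGB24: 3 bytes per pixel, so a W x H image needs at least W * H * 3 bytes (plus any row padding).
uint8_t red_pixel[3]   = { 255, 0,   0   };   // pure red
uint8_t white_pixel[3] = { 255, 255, 255 };   // red + green + blue at full intensity = white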

A single frame read from the raw video stream may not be decodable on its own; it depends on the codec standard. For example, in the widely used H.264, non-key frames may need to reference preceding or following frames. Thanks to FFmpeg's excellent design and encapsulation, decoding video frames with it is straightforward.

✅ Section 1 – Hello FFmpeg
🔔 Section 2 – Soft-decode the video stream, render RGB24
📗 Section 3 – Understand YUV
📗 Section 4 – Hard decoding, render YUV with OpenGL
📗 Section 5 – Render YUV with Metal
📗 Section 6 – Decode audio, play it with AudioQueue
📗 Section 7 – Audio and video synchronization
📗 Section 8 – Perfect playback control
📗 Section 9 – Double-speed playback
📗 Section 10 – Add video filter effects
📗 Section 11 – Audio changes

The demo for this section is at github.com/czqasngit/f… The example code provides both Objective-C and Swift implementations. This article quotes the Objective-C code for ease of illustration, since pointer-heavy code is less readable in Swift. The final result of this section is shown below:

Goals

  • Understand the FFmpeg video soft-decoding process
  • Read a data frame from FFmpeg and decode it
  • Understand the FFmpeg filter workflow
  • Use an FFmpeg filter to output RGB24 video frames
  • Render video frames in RGB24 format

Understand the FFmpeg video soft-decoding process

The previous section showed the FFmpeg soft-decoding initialization flowchart; now let's look at the complete flowchart from initialization through decoding and rendering the video:

The general logic of the flowchart is as follows:

  • 1. Initialize FFmpeg
  • 2. Read one packet of data from FFmpeg
  • 3. If it is video data, send it to the video decoder context and decode it into one frame in the original format
  • 4. Feed that frame into the FFmpeg filter graph and get back a frame in the target format
  • 5. Render the target-format frame
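
A minimal sketch of this loop in C, with error handling omitted (the names formatContext, codecContext, videoStreamIndex, bufferContext and bufferSinkContext are the ones introduced later in this article):

AVPacket *packet = av_packet_alloc();
AVFrame *frame = av_frame_alloc();
AVFrame *rgbFrame = av_frame_alloc();

while (av_read_frame(formatContext, packet) >= 0) {                 // 2. read one packet
    if (packet->stream_index == videoStreamIndex) {                 // video packets only
        if (avcodec_send_packet(codecContext, packet) == 0 &&       // 3. decode
            avcodec_receive_frame(codecContext, frame) == 0) {
            av_buffersrc_add_frame(bufferContext, frame);           // 4. convert via the filter graph
            if (av_buffersink_get_frame(bufferSinkContext, rgbFrame) >= 0) {
                // 5. render rgbFrame (now in RGB24) here
                av_frame_unref(rgbFrame);
            }
        }
    }
    av_packet_unref(packet);                                        // the packet is reused
}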

Read a data frame from FFmpeg and decode it

For now only video is rendered; reading video frames is driven by a timer whose interval is set to 1.0 / fps.
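
As a rough sketch of such a timer using GCD (an illustration only: the fps value is taken from the stream's avg_frame_rate, and readAndDecodeFrame is a hypothetical method standing in for steps 2–5; the demo may use NSTimer or a different setup):

// Fire every 1.0 / fps seconds and pull one frame each time.
double fps = av_q2d(stream->avg_frame_rate);
dispatch_queue_t queue = dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0);
dispatch_source_t timer = dispatch_source_create(DISPATCH_SOURCE_TYPE_TIMER, 0, 0, queue);
dispatch_source_set_timer(timer, DISPATCH_TIME_NOW,
                          (uint64_t)((1.0 / fps) * NSEC_PER_SEC), 0);
dispatch_source_set_event_handler(timer, ^{
    [self readAndDecodeFrame];   // hypothetical: read, decode, filter and render one frame
});
dispatch_resume(timer);          // keep a strong reference to the timer, e.g. in an ivar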

1. Read video frames

AVPacket *packet = av_packet_alloc();
av_read_frame(formatContext, packet);

The packet is reused, so the previous frame's contents must be released before reading the next one:

av_packet_unref(packet);

After reading, determine whether the packet belongs to the video stream or the audio stream:

if(self->packet->stream_index == videoStreamIndex){ }

videoStreamIndex was obtained from the AVStream when the AVCodecContext was initialized.
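
For reference, a common way to obtain this index during initialization is av_find_best_stream (a sketch; the demo from the previous section may instead iterate over formatContext->streams):

// Locate the index of the best video stream in the container.
int videoStreamIndex = av_find_best_stream(formatContext, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);
if (videoStreamIndex < 0) {
    // no video stream found
}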

2. Decode video frames

int ret = avcodec_send_packet(codecContext, packet);

This call sends the undecoded packet to the video decoder for decoding.

AVFrame *frame = av_frame_alloc();
av_frame_unref(frame);
if(ret != 0) return NULL;
ret = avcodec_receive_frame(codecContext, frame);

This retrieves the decoded video frame; the data is stored in frame.

At this point the video frame has been decoded, but it is still in its original format, such as YUV420P. Since this section renders RGB24, the frame needs to be converted.
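
One thing to be aware of: avcodec_receive_frame can return AVERROR(EAGAIN) when the decoder needs more input before it can produce a frame. A slightly more defensive pattern (a sketch, not the demo's exact code) loops over receive after each send:

// Drain every frame the decoder can produce for this packet.
int ret = avcodec_send_packet(codecContext, packet);
if (ret < 0) return;
while (ret >= 0) {
    ret = avcodec_receive_frame(codecContext, frame);
    if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
        break;   // the decoder needs more packets, or the stream has ended
    if (ret < 0)
        break;   // a real decoding error
    // frame now holds one decoded picture in its native format (e.g. YUV420P)
}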

Understand the FFmpeg filter workflow

An FFmpeg filter can be thought of as a processing node in a pipeline. Each data transformation is defined as a filter node, and data flows through these connected filters like water through pipes: it enters at the inlet (buffer), passes through the filter transformations, and what comes out of the buffersink is the final result we want. The general structure is as follows:

Buffer: a filter provided by FFmpeg that receives the input data; it is the input end of the entire filter graph and has only one output pad. It takes several initialization parameters, one of which is pix_fmt, which specifies the format of the input video.

BufferSink: a filter provided by FFmpeg that outputs the final data; it is the output end of the entire filter graph and has only one input pad. It has an initialization parameter, pix_fmts, that specifies the format of the video frames to output.

Filters: the freely customizable part. Each intermediate filter has an input pad and an output pad that connect it to the filters before and after it: it receives the video frames from the previous filter and outputs processed video frames. Developers can write their own filters to achieve a desired effect, and FFmpeg also provides many off-the-shelf filters.

AVFilterGraph: the manager of the entire filter graph.
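
As a compact preview of how these pieces fit together (a sketch; steps 1–7 below show the real code):

AVFilterGraph *graph = avfilter_graph_alloc();            // the manager of all filters
/* ... create the buffer and buffersink filters, describe their open pads with AVFilterInOut ... */
int ret = avfilter_graph_parse_ptr(graph, "null", &inputs, &outputs, NULL);  // add middle filters ("null" = none)
if (ret >= 0)
    ret = avfilter_graph_config(graph, NULL);             // validate and link the whole graph
/* ... per frame: av_buffersrc_add_frame() in, av_buffersink_get_frame() out ... */
avfilter_graph_free(&graph);                              // frees the graph and the filters it manages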

Use an FFmpeg filter to output RGB24 video frames

1. Create the AVFilterGraph

AVFilterGraph *graph = avfilter_graph_alloc();

2. Create a Buffer Filter

AVRational time_base = stream->time_base;
const AVFilter *buffer = avfilter_get_by_name("buffer");
char args[512];
// The buffer filter takes its initialization parameters as a string.
// Note that options of type AV_OPT_TYPE_BINARY cannot be initialized this way:
// their data points to an address in memory, so they have to be set separately.
snprintf(args, sizeof(args),
         "video_size=%dx%d:pix_fmt=%d:time_base=%d/%d:pixel_aspect=%d/%d",
         codecContext->width, codecContext->height, codecContext->pix_fmt,
         time_base.num, time_base.den,
         codecContext->sample_aspect_ratio.num, codecContext->sample_aspect_ratio.den);
AVFilterContext *bufferContext = NULL;
int ret = avfilter_graph_create_filter(&bufferContext, buffer, "in", args, NULL, graph);

The third parameter of avfilter_graph_create_filter gives the filter instance a name; this name is used in the graph description string to refer to the instance when filters are connected later.

The AVFilter (here, buffer) can be understood as the definition and the AVFilterContext as the concrete instance, just like the relationship between AVCodec and AVCodecContext. The options that can be initialized are defined in buffersrc.c as follows:

pixel_aspect: the aspect ratio of a single pixel. On a computer this is 1:1, i.e. pixels are square; on some devices a pixel is not square. Simply put, it is the ratio of screen width to height occupied by one pixel.

pix_fmt: Raw data format.

A side note: why can initialization be done with a string of key-value pairs?

This is because FFmpeg can look up a property by its string name, a capability implemented through AVClass. AVFilterContext is defined as follows: in FFmpeg, the first member of any structure that supports looking up or setting options by name is an AVClass pointer.

AVClass stores the AVOption table associated with the instance, which is what makes lookup and setting possible. All functions that operate on AVClass, or on objects whose first member is an AVClass pointer, are declared in libavutil/opt.h.
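
As an illustration (not code from the demo), the same helpers work on any object whose first member is an AVClass pointer, for example an AVCodecContext:

#include <libavutil/opt.h>
#include <libavutil/mem.h>

// "threads" is an AVOption of AVCodecContext, so it can be set and read by name.
av_opt_set(codecContext, "threads", "4", 0);

uint8_t *value = NULL;
if (av_opt_get(codecContext, "threads", 0, &value) >= 0) {
    // value holds the option rendered as a string and must be freed by the caller
    av_free(value);
}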

3. Create BufferSink

int ret = avfilter_graph_create_filter(&bufferSinkContext, bufferSink, "out", NULL, NULL, graph);
av_print_obj_all_options(bufferSinkContext);
/* buffersink.c defines an AVFilter named "buffersink" and adds an AVOption for pix_fmts:
   static const AVOption buffersink_options[] = {
       { "pix_fmts", "set the supported pixel formats", OFFSET(pixel_fmts), AV_OPT_TYPE_BINARY, .flags = FLAGS },
       { NULL },
   };
*/
// pix_fmts cannot be initialized as a string because its type is AV_OPT_TYPE_BINARY.
// pix_fmts is defined as: enum AVPixelFormat *pixel_fmts;
// Output RGB24
enum AVPixelFormat format[] = { AV_PIX_FMT_RGB24 };
ret = av_opt_set_bin(bufferSinkContext, "pix_fmts", (uint8_t *)&format, sizeof(format), AV_OPT_SEARCH_CHILDREN);

The process of creating the buffersink is the same as creating the buffer. The only thing to note is that the option defined in libavfilter/buffersink.c, pix_fmts (the target format), has type AV_OPT_TYPE_BINARY, so it cannot be passed as a string to the initialization call. It has to be set separately with av_opt_set_bin, one of the functions declared in opt.h mentioned earlier.
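
For reference, FFmpeg's own filtering examples set the same binary option with the av_opt_set_int_list helper macro from opt.h, which is equivalent but takes a terminated list instead of an explicit byte size:

// Alternative way to set the binary pix_fmts option; the list must end with AV_PIX_FMT_NONE.
enum AVPixelFormat pix_fmts[] = { AV_PIX_FMT_RGB24, AV_PIX_FMT_NONE };
int ret = av_opt_set_int_list(bufferSinkContext, "pix_fmts", pix_fmts,
                              AV_PIX_FMT_NONE, AV_OPT_SEARCH_CHILDREN);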

4. Initialize AVFilterInOut

AVFilterInOut *inputs = avfilter_inout_alloc();
AVFilterInOut *outputs = avfilter_inout_alloc();
inputs->name = av_strdup("out");
inputs->filter_ctx = bufferSinkContext;
inputs->pad_idx = 0;
inputs->next = NULL;

outputs->name = av_strdup("in");
outputs->filter_ctx = bufferContext;
outputs->pad_idx = 0;
outputs->next = NULL;

Note that outputs->name is "in" and inputs->name is "out"; take a look at the picture below. The names describe the open pads from the point of view of the filter chain that avfilter_graph_parse_ptr will parse: outputs describes the unconnected output pad of the buffer filter (its filter_ctx is bufferContext), which connects to the chain's input labeled "in", while inputs describes the unconnected input pad of the buffersink (its filter_ctx is bufferSinkContext), which connects to the chain's output labeled "out". This works because the buffer filter has only an output pad and the buffersink has only an input pad.

5. Connect the inputs and outputs of the AVFilterGraph

// Parse the filter description and add the filters to the graph
int ret = avfilter_graph_parse_ptr(graph, "null", &inputs, &outputs, NULL);

avfilter_graph_parse_ptr parses a string description to add filters to the graph. Here no additional filters are inserted in the middle, so "null" is passed. The whole graph then contains only two filters: buffer (the input end, which receives the decoded frames) and buffersink (the output end, from which the converted frames are fetched). "null" is a special filter that simply passes data through unchanged; it is defined as follows:

AVFilter ff_vf_null = {
   .name        = "null",
   .description = NULL_IF_CONFIG_SMALL("Pass the source unchanged to the output."),
   .inputs      = avfilter_vf_null_inputs,
   .outputs     = avfilter_vf_null_outputs,
};

If other filters were used, the description string would look like this:

const char *filter_descr = "scale=78:24,transpose=cclock";

6. Check and link

int ret = avfilter_graph_config(graph, NULL);

7. Output video frames in RGB24 format

int ret = av_buffersrc_add_frame(bufferContext, frame);
if(ret < 0) {
    NSLog(@"add frame to buffersrc failed.");
    return;
}
ret = av_buffersink_get_frame(bufferSinkContext, outputFrame);

av_buffersrc_add_frame feeds the original data (the frame to be converted) into bufferContext, and av_buffersink_get_frame then retrieves the converted frame from bufferSinkContext.
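
Note that one input frame does not always produce exactly one output frame. A more defensive pattern (a sketch, not the demo's exact code) keeps calling av_buffersink_get_frame until it reports that nothing more is available:

// Drain every frame the filter graph has ready.
if (av_buffersrc_add_frame(bufferContext, frame) >= 0) {
    while (1) {
        int ret = av_buffersink_get_frame(bufferSinkContext, outputFrame);
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
            break;   // no more output frames available right now
        if (ret < 0)
            break;   // error
        // render outputFrame (RGB24) here, then release its buffers
        av_frame_unref(outputFrame);
    }
}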

Render video frames in RGB24 format

The AVFrame is now in RGB24 format, which has a single plane stored in data[0]; linesize[0] holds the number of bytes per row. Because different CPU platforms may use different alignment, this value is not necessarily equal to the width times 3. CoreGraphics is used for rendering; the code is as follows:

- (void)displayWithAVFrame:(AVFrame *)rgbFrame {
    // linesize may include per-row padding, so use it (not width * 3) when copying.
    int linesize = rgbFrame->linesize[0];
    int videoHeight = rgbFrame->height;
    int videoWidth = rgbFrame->width;
    int len = (linesize * videoHeight);
    // Copy the pixel data so the AVFrame can be reused while rendering happens asynchronously.
    UInt8 *bytes = (UInt8 *)malloc(len);
    memcpy(bytes, rgbFrame->data[0], len);
    dispatch_async(display_rgb_queue, ^{
        CFDataRef data = CFDataCreateWithBytesNoCopy(kCFAllocatorDefault, bytes, len, kCFAllocatorNull);
        if(!data) {
            NSLog(@"create CFDataRef failed.");
            free(bytes);
            return;
        }
        if(CFDataGetLength(data) == 0) {
            CFRelease(data);
            free(bytes);
            return;
        }
        CGDataProviderRef provider = CGDataProviderCreateWithCFData(data);
        CGBitmapInfo bitmapInfo = kCGBitmapByteOrderDefault;
        CGColorSpaceRef colorSpaceRef = CGColorSpaceCreateDeviceRGB();
        CGImageRef imageRef = CGImageCreate(videoWidth,
                                            videoHeight,
                                            8,
                                            3 * 8,
                                            linesize,
                                            colorSpaceRef,
                                            bitmapInfo,
                                            provider,
                                            NULL,
                                            YES,
                                            kCGRenderingIntentDefault);
        NSSize size = NSSizeFromCGSize(CGSizeMake(videoWidth, videoHeight));
        NSImage *image = [[NSImage alloc] initWithCGImage:imageRef
                                                     size:size];
        CGImageRelease(imageRef);
        CGColorSpaceRelease(colorSpaceRef);
        CGDataProviderRelease(provider);
        CFRelease(data);
        free(bytes);
        
        dispatch_async(dispatch_get_main_queue(), ^{
            @autoreleasepool {
                self.imageView.image = image;
            }
        });
        
    });
}

At this point, the whole process of decoding a video frame, converting it to RGB24 and rendering it is complete 👏👏👏.

It is worth noting that rendering with CoreGraphics is not very efficient; CPU usage here is about 35%.

Conclusion:

  • Understand the FFmpeg decoding process; it is not complicated 🙌🙌🙌🙌
  • Read a frame of raw data, determine whether it is audio or video, and hand it to the corresponding decoder
  • Understand how filters are used, and use a filter graph to output the target format
  • Render RGB24 using CoreGraphics

For more content, please follow the WeChat official account "Program Ape Moving Bricks".