Preface
This article explains the key knowledge points of H264 video coding 👇 it is roughly divided into three parts, covering the explanation of each concept followed by a hands-on coding section.
I. H264 structure and code stream analysis
1.1 H264 structure diagram
In the H264 structure above, the encoded data of one video image is called a frame. A frame is composed of one or more slices, and a slice is composed of one or more macroblocks (MB). A macroblock consists of 16×16 pixels of YUV data and can be further divided into sub-blocks; the macroblock is the basic unit of H264 coding.
- Field and frame: a field or a frame of video is used to produce one encoded image.
- Slice: within each image, the macroblocks are arranged into slices. Slices are divided into I slices, B slices, P slices and some other types.
  - An I slice contains only I macroblocks, a P slice can contain P and I macroblocks, and a B slice can contain B and I macroblocks.
  - I macroblocks perform intra-frame prediction using already-decoded pixels of the current slice as a reference.
  - P macroblocks perform inter-frame prediction using a previously encoded image as the reference image.
  - B macroblocks perform inter-frame prediction using bidirectional reference images (a preceding and a following frame).
  - The purpose of slices is to limit the spread of bit errors and keep slices independent of each other: the prediction in one slice must not be based on macroblocks in another slice, so that a prediction error in one slice does not propagate to other slices.
- Macroblock: an encoded image is usually divided into several macroblocks. One macroblock consists of a 16×16 block of luma (brightness) pixels together with an 8×8 Cb and an 8×8 Cr chroma block.
1.2 H264 coding layer
The H264 coding architecture is divided into two layers.
- NAL layer (Network Abstraction Layer): when H264 is transmitted over a network, each Ethernet packet carries at most about 1500 bytes, while an H264 frame is often larger than 1500 bytes. The frame therefore has to be unpacked, i.e. split into multiple packets for transmission, and packed back together on the receiving side. All of this unpacking and packing is handled by the NAL layer.
- VCL layer (Video Coding Layer): compresses the raw video data.
1.3 Basic concepts of bit stream
- SODB (String of Data Bits): the raw bit string produced by the VCL layer. Its length is not necessarily a multiple of 8, which makes it awkward to handle.
- RBSP (Raw Byte Sequence Payload, SODB + trailing bits): a trailing bit of 1 is appended to the SODB (a 1 rather than a 0, otherwise the decoder could not tell where the data ends), and then 0 bits are padded until the result is byte aligned.
- EBSP (Encapsulate Byte Sequence Payload): after the compressed stream is generated, a start code is also added before each frame, usually 0x00 00 00 01 in hexadecimal. But two consecutive 0x00 bytes may also appear inside the encoded data itself, which would conflict with the start code. What then? The H264 specification states that whenever two consecutive 0x00 bytes are encountered, an extra 0x03 byte is inserted. This prevents the compressed data from colliding with the start code.
- NALU: NAL Header (1 byte) + EBSP. A NALU is an EBSP with a network header added.
Key points of EBSP decoding
- Each NAL is preceded by a start code 0x00 00 01 (or 0x00 00 00 01). The decoder detects each start code as the start marker of a NAL; when the next start code is detected, the current NAL ends.
- At the same time, H.264 states that detecting 0x00 00 01 also marks the end of the current NAL. So what happens when 0x000001 or 0x000000 occurs inside a NAL? H.264 introduces an emulation-prevention mechanism: if the encoder detects 0x000001 or 0x000000 inside the NAL data, it inserts an extra byte 0x03 in front of the last byte of that pattern, so that when the decoder later detects 0x000003 it discards the 0x03 and restores the original data (the de-escaping operation, sketched in code below).
- When decoding, the decoder first reads the NAL data byte by byte and computes the NAL length, then starts decoding.
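To make the de-escaping operation concrete, here is a minimal sketch of my own (not code from this project): it strips the 0x03 emulation-prevention bytes from an EBSP buffer, assuming the caller provides an output buffer at least as large as the input.

#include <stddef.h>
#include <stdint.h>

// Removes the 0x03 emulation-prevention bytes: every 0x00 0x00 0x03 becomes 0x00 0x00.
// Returns the length of the de-escaped data.
static size_t ebsp_to_rbsp(const uint8_t *ebsp, size_t ebspSize, uint8_t *rbsp) {
    size_t outSize = 0;
    size_t zeroCount = 0;                  // consecutive 0x00 bytes seen so far
    for (size_t i = 0; i < ebspSize; i++) {
        if (zeroCount == 2 && ebsp[i] == 0x03) {
            zeroCount = 0;                 // skip the inserted emulation-prevention byte
            continue;
        }
        zeroCount = (ebsp[i] == 0x00) ? zeroCount + 1 : 0;
        rbsp[outSize++] = ebsp[i];
    }
    return outSize;
}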
1.4 Details on NAL Unit
The detailed structure diagram of NALU is as follows:
- NAL units are composed of a NALU header + a slice.
- Slice can be subdivided into "slice header + slice data".
- Each slice data contains many macro blocks.
- Each macro block contains the type of the macro block, the prediction of the macro block, and the residual data.
H264 code stream layered structure diagram
- Annex-B format data: start code + NAL Unit data
- NAL Unit: NALU header + NALU data
- NALU body: composed of slices; a slice consists of a slice header + slice data
- Slice data: composed of macroblocks
- Macro block: macro block type + PCM data (for PCM macro blocks), or macro block type + prediction mode + residual data
- Residual: the residual block
⚠️ This diagram is important and worth studying carefully.
II. Introduction to VideoToolBox
VideoToolBox is Apple's native hardware-encoding framework, available since iOS 8.0. It uses hardware accelerators and is based on Core Foundation (it is a C-language API).
2.1 Procedure
When using the VideoToolBox framework, the things we need to do include 👇
- Create a session -> set the encoding parameters -> start encoding -> repeatedly feed in source data (YUV data obtained directly from the camera) -> get the encoded H264 data -> end encoding
- Build the H264 file; what is transmitted over the network is essentially this H264 file data.
2.2 Basic data structures
A CMSampleBuffer is different before and after encoding 👇
- Unencoded 👇 the data is stored in a CVPixelBuffer.
- Encoded 👇 the data is stored in a CMBlockBuffer as stream data.
2.3 Coding process
In the figure above, the raw data is encoded to produce an H264 stream. However, that stream cannot be handed to the decoder directly; the decoder can only process data in the H264 file format.
2.4 The H264 file
In the figure above 👇
- SPS and PPS come first: they must be decoded before the data that follows can be parsed.
- Then come the I, B and P frames; refer to 03-video coding ## 7, H264 related concepts.
- No matter which framework you use (VideoToolBox, FFmpeg, hardware or software encoding) and no matter which platform you are on (Mac, Windows or mobile), you must follow the H264 file format.
SPS and PPS:
- SPS (Sequence Parameter Set)
- PPS (Picture Parameter Set)
A general understanding of these is enough for now.
2.5 Determining the frame type: I, B and P
As we know, video is composed of frames, and a frame is composed of one or more slices. During network transmission a slice may be very large, so it has to be split into packets for sending and reassembled after it is received. This raises a question:
How do we identify the type of a frame and distinguish I, B and P frames?
III. NALU unit data in detail
NALU = NAL Header + NAL Body
H264 bit streams are actually transmitted over the network in the form of NALUs. Each NALU consists of a 1-byte header and the RBSP, as shown in the figure below 👇
3.1 NAL Header Parsing
The NAL Header is 1 byte, i.e. 8 bits. What do these 8 bits contain? 👇
- Bit 0: F
- Bits 1-2: NRI
- Bits 3-7: TYPE, which is how the frame type (I frame, B frame, P frame) is determined

F: forbidden_zero_bit. In H264 this first bit must be 0; there is nothing more to it, just remember that.
NRI: indicates the importance of the current NALU. 00 is the least important and 11 the most important, i.e. the larger the value, the more important the NALU. The decoder may discard NALUs of importance 0 when decoding fails. In practice it is of little use.
TYPE: indicates the NAL type. The table below has many entries, but you only need to remember a few common ones 👇 (a small parsing sketch follows after this list)
- 5: IDR image slice (can be understood as an I frame; an I frame is composed of multiple I slices)
- 7: SPS (Sequence Parameter Set)
- 8: PPS (Picture Parameter Set)
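As a quick illustration of the bit layout above, here is a minimal sketch of my own (the function name and the idea of operating on a single header byte are my assumptions, not code from this project):

#include <stdint.h>

// Reads the three fields of the 1-byte NAL header (the byte right after the start code).
static void parse_nal_header(uint8_t nalHeader) {
    uint8_t forbiddenBit = (nalHeader & 0x80) >> 7;  // F: must be 0
    uint8_t nri          = (nalHeader & 0x60) >> 5;  // NRI: importance, 0-3
    uint8_t nalType      =  nalHeader & 0x1F;        // TYPE
    if (nalType == 5) {
        // IDR slice -> this NALU belongs to an I frame
    } else if (nalType == 7) {
        // SPS
    } else if (nalType == 8) {
        // PPS
    }
    (void)forbiddenBit;
    (void)nri;
}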
3.2 NAL Types
- Single NALU type: an RTP packet contains exactly one NALU, i.e. the H264 frame contains only one slice; P and B frames are typically sent this way.
- Aggregation (combination) type: an RTP packet contains multiple NALUs, types 24-27. Units such as SPS and PPS are usually packed this way because they are very small.
- Fragmentation type: one NALU is split across multiple RTP packets, types 28-29.
Single NALU RTP packet
RTP packets that combine NALU
RTP packets of fragmented NALU
First byte: FU Indicator (fragment unit indicator). Second byte: FU Header (fragment unit header). If a NALU is split into multiple fragments, every fragment carries an FU Header.
FU Header:
- S (start bit): marks the start of a fragmented NALU. Packets arrive one by one over the network; we know they are fragments, but how do we tell which packet is the first and which is the last? If S is 1, this packet is the first fragment.
- E (end bit): marks the end of a fragmented NALU.
- R: unused; set it to 0.
- Type: the NAL type of the fragment. Whether the NAL unit is a key frame or a non-key frame, an SPS or a PPS, is judged from this Type.
After the network transmission is complete, the fragments still need to be reassembled into the NALU unit.
Consider: during transmission a frame is cut into multiple fragments. If they arrive out of order, or one of the fragments is lost, how do we judge whether the NALU unit was transmitted completely?
Solution 👇
Use the S/E bits of the FU Header together with the RTP packet header: the RTP header carries a sequence number for every packet. If the packet with the S bit set and the packet with the E bit set have both been received, and the sequence numbers of the packets in between are continuous, the NALU is complete and the fragments can be reassembled; otherwise packets were lost. A small sketch of reading the FU fields follows below.
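To make the FU fields concrete, here is a small sketch of my own (the buffer name and the FU-A layout assumptions are mine, not code from this project) showing how the FU Indicator and FU Header of a fragmented payload could be read:

#include <stdint.h>

// Reads the FU Indicator and FU Header at the start of an FU-A (type 28) RTP payload.
// "payload" is assumed to point at the RTP payload of one fragment.
static void parse_fu_a(const uint8_t *payload) {
    uint8_t fuIndicator = payload[0];
    uint8_t fuHeader    = payload[1];

    uint8_t startBit = (fuHeader & 0x80) >> 7;      // S: 1 on the first fragment
    uint8_t endBit   = (fuHeader & 0x40) >> 6;      // E: 1 on the last fragment
    uint8_t nalType  =  fuHeader & 0x1F;            // original NAL type (5 = IDR, 7 = SPS, 8 = PPS ...)

    if (startBit) {
        // First fragment: the original NAL header can be rebuilt as (fuIndicator & 0xE0) | nalType.
        uint8_t rebuiltNalHeader = (uint8_t)((fuIndicator & 0xE0) | nalType);
        (void)rebuiltNalHeader;                     // a real implementation would start a new NALU here
    }
    if (endBit) {
        // Last fragment: the NALU is complete if every RTP sequence number in between was received.
    }
}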
IV. AVFoundation video data capture (1)
Next we demonstrate in code how to capture video data. You may recall 02-AVFoundation advanced capture, where we implemented video recording based on the system camera without touching video coding, so this demo is different 👇
- Data capture 👇 based on the AVFoundation framework (this should be familiar)
- Video encoding 👇 based on the VideoToolBox framework
The whole process is roughly 👇
Data capture -> encoding -> H264 file -> write to the sandbox / network transmission
4.1 Data Collection
I'm sure you are all familiar with the data capture flow by now, so I will not go through the code in detail here.
- First declare the properties 👇
@interface ViewController () <AVCaptureVideoDataOutputSampleBufferDelegate>

@property (nonatomic, strong) UILabel *cLabel;
@property (nonatomic, strong) AVCaptureSession *cCapturesession;            // capture session
@property (nonatomic, strong) AVCaptureDeviceInput *cCaptureDeviceInput;    // capture input
@property (nonatomic, strong) AVCaptureVideoDataOutput *cCaptureDataOutput; // capture output
@property (nonatomic, strong) AVCaptureVideoPreviewLayer *cPreviewLayer;    // preview layer

@end
Unlike the earlier video-recording example, the output here is an AVCaptureVideoDataOutput, so the delegate protocol to conform to is AVCaptureVideoDataOutputSampleBufferDelegate.
Then we need queues for the two things we do 👇 capture and encoding 👇
@implementation ViewController {
    int frameID;                                // frame ID
    dispatch_queue_t cCaptureQueue;             // capture queue
    dispatch_queue_t cEncodeQueue;              // encode queue
    VTCompressionSessionRef cEncodeingSession;  // encoding session
    CMFormatDescriptionRef format;
    NSFileHandle *fileHandele;                  // file handle for writing to the sandbox
}
viewDidLoad 👇
- (void)viewDidLoad {
    [super viewDidLoad];
    // Do any additional setup after loading the view.
    _cLabel = [[UILabel alloc] initWithFrame:CGRectMake(20, 20, 200, 100)];
    _cLabel.text = @"CC class H.264 hardware encoding";
    _cLabel.textColor = [UIColor redColor];
    [self.view addSubview:_cLabel];

    UIButton *cButton = [[UIButton alloc] initWithFrame:CGRectMake(200, 20, 100, 100)];
    [cButton setTitle:@"play" forState:UIControlStateNormal];
    [cButton setTitleColor:[UIColor whiteColor] forState:UIControlStateNormal];
    [cButton setBackgroundColor:[UIColor orangeColor]];
    [cButton addTarget:self action:@selector(buttonClick:) forControlEvents:UIControlEventTouchUpInside];
    [self.view addSubview:cButton];
}
Next comes the button click event
- (void)buttonClick:(UIButton *)button {
    // Check whether _cCapturesession exists and is currently capturing
    if (!_cCapturesession || !_cCapturesession.isRunning) {
        // Update the button state
        [button setTitle:@"Stop" forState:UIControlStateNormal];
        // Start capturing
        [self startCapture];
    } else {
        [button setTitle:@"Play" forState:UIControlStateNormal];
        // Stop capturing
        [self stopCapture];
    }
}
- Start capturing video 👇
- (void)startCapture {
    self.cCapturesession = [[AVCaptureSession alloc] init];
    // Set the capture resolution to 640x480
    self.cCapturesession.sessionPreset = AVCaptureSessionPreset640x480;
    // Use global queues for capturing and encoding
    cCaptureQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    cEncodeQueue  = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    AVCaptureDevice *inputCamera = nil;
    // Get the iPhone video capture devices, e.g. front and back cameras
    NSArray *devices = [AVCaptureDevice devicesWithMediaType:AVMediaTypeVideo];
    for (AVCaptureDevice *device in devices) {
        // Pick the back camera
        if ([device position] == AVCaptureDevicePositionBack) {
            inputCamera = device;
        }
    }

    // Wrap the capture device in an AVCaptureDeviceInput object
    self.cCaptureDeviceInput = [[AVCaptureDeviceInput alloc] initWithDevice:inputCamera error:nil];
    // Check whether the back camera can be added as an input device
    if ([self.cCapturesession canAddInput:self.cCaptureDeviceInput]) {
        // Add the device to the session
        [self.cCapturesession addInput:self.cCaptureDeviceInput];
    }

    self.cCaptureDataOutput = [[AVCaptureVideoDataOutput alloc] init];
    // Do not discard late video frames
    [self.cCaptureDataOutput setAlwaysDiscardsLateVideoFrames:NO];
    // Set the pixel format to YUV 4:2:0
    [self.cCaptureDataOutput setVideoSettings:[NSDictionary dictionaryWithObject:[NSNumber numberWithInt:kCVPixelFormatType_420YpCbCr8BiPlanarFullRange] forKey:(id)kCVPixelBufferPixelFormatTypeKey]];
}
kCVPixelFormatType_420YpCbCr8BiPlanarFullRange means YUV 4:2:0. If you have not seen this before, let's take a closer look at YUV.
V. YUV color details
The color system we are more familiar with is RGB, where each color channel occupies 1 byte. YUV is the one you constantly meet in audio/video development, and its characteristics are 👇
YUV (also known as YCbCr) is a color encoding method used in television systems. Y represents brightness (luma), i.e. the gray-scale value; it is the basic signal. U and V represent chroma; they describe the color and saturation of the image and are used to specify the color of a pixel.
The relationship between YUV and video: The video recorded by the camera is YUV.
5.1 Common YUV formats
- YUV 4:2:0 (YCbCr 4:2:0) 👇 half the size of RGB
- YUV 4:2:2 (YCbCr 4:2:2) 👇 one third smaller than RGB, saving a lot of space
- YUV 4:4:4 (YCbCr 4:4:4) 👇 can be understood as 1:1:1, i.e. every 4 Y values correspond to 4 U and 4 V values
YUV 4:4:4
In 4:4:4 mode all color information is kept, as shown below 👇
Each of the four adjacent pixels A, B, C and D has its own Y, U and V. During chroma subsampling every pixel keeps its own YUV, which is what 4:4:4 means.
YUV 4:2:2
Take four adjacent pixels A (Y0, U0, V0), B (Y1, U1, V1), C (Y2, U2, V2), D (Y3, U3, V3). After subsampling, A keeps (Y0, U0), B keeps (Y1, V1), C keeps (Y2, U2) and D keeps (Y3, V3). That is, every pixel keeps its own Y (brightness), while the U and V values are kept only every other pixel, which finally becomes 👇
In other words, A borrows B's V1, B borrows A's U0, C borrows D's V3, and D borrows C's U2. This is the legendary 4:2:2. A 1280 * 720 image sampled as YUV 4:2:2 takes 👇
(1280 * 720 * 8 + 1280 * 720 * 0.5 * 8 * 2) / 8 / 1024 / 1024 = 1.76 MB
So a YUV 4:2:2 image saves one third of the storage space compared with an RGB image, and the bandwidth used during transmission is reduced accordingly.
YUV 4:2:0
In the 4:2:2 scheme above, the U and V of two horizontally adjacent pixels are borrowed from each other left and right. Can they also be borrowed between the rows above and below? Of course they can 👇
YUV 4:2:0 sampling does not mean sampling only the U component and never the V component. Instead, each row scans only one chroma component (U or V), while the Y component is sampled at a 2:1 ratio within the row.
For example, the first row samples Y and U at 2:1, while the second row samples Y and V at 2:1. For each chroma component, both its horizontal and vertical sampling rates are 2:1 relative to Y. Assuming the first row scans the U component and the second row scans the V component, two rows are needed to complete one set of UV components.
Looking at the mapped pixels, four Y components share one set of UV components, laid out as 2×2 squares. Compared with YUV 4:2:2, where two Y components share a set of UV components, this saves even more space. The size of a 1280 * 720 image sampled at YUV 4:2:0 is:
(1280 * 720 * 8 + 1280 * 720 * 0.25 * 8 * 2) / 8 / 1024 / 1024 = 1.32 MB, which saves half the space compared with the 2.63 MB RGB image. (A small helper that reproduces these calculations follows below.)
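The two size calculations above can be wrapped in a tiny helper; this is just my own sketch, assuming 8 bits per sample:

// Size in bytes of one raw frame; chromaSamplesPerLuma is the number of chroma samples
// (U plus V together) kept per Y sample: 2.0 for 4:4:4, 1.0 for 4:2:2, 0.5 for 4:2:0.
static double rawFrameSizeBytes(int width, int height, double chromaSamplesPerLuma) {
    double lumaBits   = (double)width * height * 8.0;   // 8 bits per Y sample
    double chromaBits = (double)width * height * chromaSamplesPerLuma * 8.0;
    return (lumaBits + chromaBits) / 8.0;               // bits -> bytes
}

// rawFrameSizeBytes(1280, 720, 1.0) / 1024 / 1024  ~= 1.76 MB  (4:2:2)
// rawFrameSizeBytes(1280, 720, 0.5) / 1024 / 1024  ~= 1.32 MB  (4:2:0)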
5.2 YUV storage formats
- Planar formats: the Y of all pixels is stored consecutively, followed by the U of all pixels and then the V of all pixels, e.g. YYYY YYYY UU VV.
  - I420: YYYYYYYY UU VV -> YUV420P (common on PC)
  - YV12: YYYYYYYY VV UU -> YUV420P
- Packed formats: the Y, U and V of each pixel are stored consecutively and interleaved, e.g. YUV YUV YUV YUV, similar to RGB. (NV12 and NV21 below are, strictly speaking, semi-planar: a full Y plane followed by an interleaved UV plane.)
  - NV12: YYYYYYYY UVUV -> YUV420SP
  - NV21: YYYYYYYY VUVU -> YUV420SP
During development, for example when exchanging video between Android and iOS, you may find after decoding that the image appears upside down or flipped. This is most likely because the YUV formats do not match: I420 is common on PC, Android generally defaults to NV21, and iOS defaults to NV12. If the two sides need to interoperate, you have to make sure the storage format is consistent, as in the conversion sketch below.
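As a rough illustration of how such a mismatch could be bridged, here is a minimal NV12-to-I420 conversion sketch of my own (it assumes tightly packed planes and ignores row strides; it is not code from this project):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Converts one NV12 frame (Y plane + interleaved UV plane) into I420 (Y plane + U plane + V plane).
static void nv12_to_i420(const uint8_t *nv12, uint8_t *i420, int width, int height) {
    size_t ySize  = (size_t)width * height;
    size_t uvSize = ySize / 4;                 // each chroma plane is a quarter of the Y plane

    memcpy(i420, nv12, ySize);                 // the Y plane is identical in both formats

    const uint8_t *uvSrc = nv12 + ySize;       // interleaved UVUVUV...
    uint8_t *uDst = i420 + ySize;              // destination U plane
    uint8_t *vDst = i420 + ySize + uvSize;     // destination V plane
    for (size_t i = 0; i < uvSize; i++) {
        uDst[i] = uvSrc[2 * i];                // even bytes are U
        vDst[i] = uvSrc[2 * i + 1];            // odd bytes are V
    }
}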
VI. AVFoundation video data capture (2)
Now that we understand the YUV color system, let's continue and finish the video capture flow 👇
- (void)startCapture {
    self.cCapturesession = [[AVCaptureSession alloc] init];
    // Set the capture resolution to 640x480
    self.cCapturesession.sessionPreset = AVCaptureSessionPreset640x480;
    // Use global queues for capturing and encoding
    cCaptureQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    cEncodeQueue  = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    AVCaptureDevice *inputCamera = nil;
    // Get the iPhone video capture devices, e.g. front and back cameras
    NSArray *devices = [AVCaptureDevice devicesWithMediaType:AVMediaTypeVideo];
    for (AVCaptureDevice *device in devices) {
        // Pick the back camera
        if ([device position] == AVCaptureDevicePositionBack) {
            inputCamera = device;
        }
    }

    // Wrap the capture device in an AVCaptureDeviceInput object
    self.cCaptureDeviceInput = [[AVCaptureDeviceInput alloc] initWithDevice:inputCamera error:nil];
    // Check whether the back camera can be added as an input device
    if ([self.cCapturesession canAddInput:self.cCaptureDeviceInput]) {
        // Add the device to the session
        [self.cCapturesession addInput:self.cCaptureDeviceInput];
    }

    self.cCaptureDataOutput = [[AVCaptureVideoDataOutput alloc] init];
    // Do not discard late video frames
    [self.cCaptureDataOutput setAlwaysDiscardsLateVideoFrames:NO];
    // Set the pixel format to YUV 4:2:0
    [self.cCaptureDataOutput setVideoSettings:[NSDictionary dictionaryWithObject:[NSNumber numberWithInt:kCVPixelFormatType_420YpCbCr8BiPlanarFullRange] forKey:(id)kCVPixelBufferPixelFormatTypeKey]];
    // Set the capture delegate and the capture queue
    [self.cCaptureDataOutput setSampleBufferDelegate:self queue:cCaptureQueue];
    // Check whether the output can be added
    if ([self.cCapturesession canAddOutput:self.cCaptureDataOutput]) {
        // Add the output
        [self.cCapturesession addOutput:self.cCaptureDataOutput];
    }

    // Create a connection
    AVCaptureConnection *connection = [self.cCaptureDataOutput connectionWithMediaType:AVMediaTypeVideo];
    // Set the video orientation
    [connection setVideoOrientation:AVCaptureVideoOrientationPortrait];

    // Initialize the preview layer
    self.cPreviewLayer = [[AVCaptureVideoPreviewLayer alloc] initWithSession:self.cCapturesession];
    // Set the video gravity
    [self.cPreviewLayer setVideoGravity:AVLayerVideoGravityResizeAspect];
    // Set the layer frame
    [self.cPreviewLayer setFrame:self.view.bounds];
    // Add the preview layer
    [self.view.layer addSublayer:self.cPreviewLayer];

    // File path for writing to the sandbox
    NSString *filePath = [[NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES) lastObject] stringByAppendingPathComponent:@"cc_video.h264"];
    // Remove any existing file first
    [[NSFileManager defaultManager] removeItemAtPath:filePath error:nil];
    // Create a new file
    BOOL createFile = [[NSFileManager defaultManager] createFileAtPath:filePath contents:nil attributes:nil];
    if (!createFile) {
        NSLog(@"create file failed");
    } else {
        NSLog(@"create file success");
    }
    NSLog(@"filePath = %@", filePath);
    fileHandele = [NSFileHandle fileHandleForWritingAtPath:filePath];

    // Initialize VideoToolBox
    [self initVideoToolBox];
    // Start capturing
    [self.cCapturesession startRunning];
}
VII. Configuration of VideoToolBox video coding parameters
Next comes the initialization of VideoToolBox, including the configuration of the video encoding parameters. The things to do are 👇
- Create an encoding session 👇 cEncodeingSession
- Configure the encoding parameters
7.1 Creating an encoding session
The encoding session is created with the C function VTCompressionSessionCreate 👇
The meaning of each parameter 👇
- Parameter 1: the allocator; pass NULL for the default allocator.
- Parameter 2: the width of the resolution, in pixels; if the value is invalid, the system changes it to a reasonable one.
- Parameter 3: the height of the resolution; same as above.
- Parameter 4: the encoding type, e.g. kCMVideoCodecType_H264.
- Parameter 5: the encoding specification; passing NULL lets VideoToolbox choose.
- Parameter 6: the source pixel buffer attributes; pass NULL so that VideoToolbox does not create a pixel buffer pool and you provide the buffers yourself.
- Parameter 7: the compressed data allocator; pass NULL for the default allocator.
- Parameter 8: the callback function; it is called asynchronously after each VTCompressionSessionEncodeFrame call finishes compressing a frame. ⚠️ Note: if you pass NULL here, you need to call VTCompressionSessionEncodeFrameWithOutputHandler to process the compressed frames instead, which requires iOS 9.0 or later.
- Parameter 9: a client-defined reference value passed to the callback; we bridge self here so that the C callback can call OC methods.
- Parameter 10: the encoding session variable (output).
7.2 Configuring the encoding parameters
The encoding parameters are configured with the C function VTSessionSetProperty 👇
This function is simple; its parameters are 👇
- Parameter 1: the object being configured, i.e. cEncodeingSession
- Parameter 2: the property name
- Parameter 3: the property value
7.3 Complete Initialization Code
// Initialize VideoToolBox
- (void)initVideoToolBox {
    dispatch_sync(cEncodeQueue, ^{
        frameID = 0;
        // Resolution: the same as the AVFoundation capture resolution
        int width = 480, height = 640;

        // 1. Call VTCompressionSessionCreate to create the encoding session
        OSStatus status = VTCompressionSessionCreate(NULL, width, height, kCMVideoCodecType_H264, NULL, NULL, NULL, didCompressH264, (__bridge void *)(self), &cEncodeingSession);
        NSLog(@"H264:VTCompressionSessionCreate:%d", (int)status);
        if (status != 0) {
            NSLog(@"H264:Unable to create a H264 session");
            return;
        }

        // 2. Configure the parameters
        // Set real-time encoding output (to avoid delay)
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_RealTime, kCFBooleanTrue);
        // Use the Baseline profile (drops B frames)
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_ProfileLevel, kVTProfileLevel_H264_Baseline_AutoLevel);
        // Do not produce B frames (no frame reordering)
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_AllowFrameReordering, kCFBooleanFalse);

        // Set the key frame (GOP size) interval
        int frameInterval = 10;
        /* CFNumberCreate(CFAllocatorRef allocator, CFNumberType theType, const void *valuePtr)
           allocator: kCFAllocatorDefault for the default allocator
           theType:   the data type
           valuePtr:  pointer to the value */
        CFNumberRef frameIntervalRaf = CFNumberCreate(kCFAllocatorDefault, kCFNumberIntType, &frameInterval);
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_MaxKeyFrameInterval, frameIntervalRaf);

        // Set the expected frame rate (not the actual frame rate)
        int fps = 10;
        CFNumberRef fpsRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberIntType, &fps);
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_ExpectedFrameRate, fpsRef);

        // Bit rate: a high bit rate is clearer, but the file is larger
        int bitRate = width * height * 3 * 4 * 8;
        CFNumberRef bitRateRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt32Type, &bitRate);
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_AverageBitRate, bitRateRef);

        // Set the rate limit, in bytes
        int bigRateLimit = width * height * 3 * 4;
        CFNumberRef bitRateLimitRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt32Type, &bigRateLimit);
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_DataRateLimits, bitRateLimitRef);

        // Start encoding
        VTCompressionSessionPrepareToEncodeFrames(cEncodeingSession);
    });
}
For the bit-rate calculation formula, refer to the figure below 👇
VIII. AVFoundation video data capture (3)
There are two steps left in the video capture process: stopping the capture and preparing for video encoding.
8.1 Stopping Capture
Before using VideoToolBox for video encoding, let's go back to the capture flow. We have already implemented startCapture; stopCapture is still missing 👇
- (void)stopCapture {
    // Stop the capture session
    [self.cCapturesession stopRunning];
    // Remove the preview layer
    [self.cPreviewLayer removeFromSuperlayer];
    // End the VideoToolBox session
    [self endVideoToolBox];
    // Close the file
    [fileHandele closeFile];
    fileHandele = NULL;
}
The code that ends VideoToolBox is 👇
-(void)endVideoToolBox {
    VTCompressionSessionCompleteFrames(cEncodeingSession, kCMTimeInvalid);
    VTCompressionSessionInvalidate(cEncodeingSession);
    CFRelease(cEncodeingSession);
    cEncodeingSession = NULL;
}
8.2 Video coding preparation
The preparation, as you can probably guess, happens in the delegate method of the output. The output we use is AVCaptureVideoDataOutput, its delegate protocol is AVCaptureVideoDataOutputSampleBufferDelegate, and the method that delivers the video stream is 👇
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection {
    // Called for every captured camera frame; hand the unencoded / uncompressed frame to the encode queue
    dispatch_sync(cEncodeQueue, ^{
        [self encode:sampleBuffer];
    });
}
But there is a problem: both the video and the audio data captured by AVFoundation end up in delegate methods like this one. So how do we tell video data apart from audio data? 👇
Through the captureOutput object: check whether it is the AVCaptureVideoDataOutput or the AVCaptureAudioDataOutput, as sketched below.
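A minimal sketch of that check, using this demo's own output property (the audio branch is hypothetical here, since this demo only captures video):

- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection {
    if (captureOutput == self.cCaptureDataOutput) {
        // video frame -> hand it to the video encoder
    } else if ([captureOutput isKindOfClass:[AVCaptureAudioDataOutput class]]) {
        // audio sample -> hand it to an audio encoder (hypothetical in this demo)
    }
}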
IX. VideoToolBox video coding implementation (1)
9.1 Encoding Function
Like session creation, the video encoding function VTCompressionSessionEncodeFrame is a C function 👇
Its parameters are defined as 👇
- Parameter 1: the encoding session variable.
- Parameter 2: the unencoded data.
- Parameter 3: the presentation timestamp of the data obtained from the sample buffer. Every timestamp passed to the session must be greater than the previous one.
- Parameter 4: the duration of the frame; if no time information is available, pass kCMTimeInvalid.
- Parameter 5: frameProperties: properties of this frame. A change here affects subsequent encoded frames.
- Parameter 6: sourceFrameRefCon: a reference of your own that the callback receives for this frame.
- Parameter 7: infoFlagsOut: points to a VTEncodeInfoFlags that receives information about the encode operation: kVTEncodeInfo_Asynchronous is set if the encode ran asynchronously, kVTEncodeInfo_FrameDropped is set if the frame was dropped; pass NULL if you do not want this information.
9.2 Video encoding: encode
- (void)encode:(CMSampleBufferRef)sampleBuffer {
    CVImageBufferRef imageBuffer = (CVImageBufferRef)CMSampleBufferGetImageBuffer(sampleBuffer);
    // Set the frame time; otherwise the timeline will be wrong
    CMTime presentationTimeStamp = CMTimeMake(frameID++, 1000);
    VTEncodeInfoFlags flags;
    // The encoding function
    OSStatus statusCode = VTCompressionSessionEncodeFrame(cEncodeingSession,
                                                          imageBuffer,
                                                          presentationTimeStamp,
                                                          kCMTimeInvalid,
                                                          NULL,
                                                          NULL,
                                                          &flags);
    if (statusCode != noErr) {
        NSLog(@"H.264: VTCompressionSessionEncodeFrame failed with %d", (int)statusCode);
        // End encoding
        VTCompressionSessionInvalidate(cEncodeingSession);
        CFRelease(cEncodeingSession);
        cEncodeingSession = NULL;
        return;
    }
    NSLog(@"H264: VTCompressionSessionEncodeFrame Success");
}
At this point the encoding call is in place, and two questions remain 👇
- Where can I get successfully encoded H264 stream data?
- What do you do after you’ve got the data encoded successfully?
9.3 The encoding-complete callback
To answer question 1: when we created the session cEncodeingSession, we specified the callback function didCompressH264, and that is where the H264 stream data is obtained 👇
void didCompressH264(void *outputCallbackRefCon, void *sourceFrameRefCon, OSStatus status, VTEncodeInfoFlags infoFlags, CMSampleBufferRef sampleBuffer)
Remember the H264 file format we talked about earlier? See below 👇
In the NALU stream, units 0 and 1 are the SPS and PPS, which carry many key parameters, so they have to be handled first; and to obtain the SPS and PPS we first need a key frame. That is exactly question 2: what to do once we have the successfully encoded data.
9.3.1 Judging key frames
It takes roughly three steps 👇
- Get the attachments array from the sampleBuffer:
CFArrayRef array = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, true);
- Get the dictionary at index 0 from the array:
CFDictionaryRef dic = CFArrayGetValueAtIndex(array, 0);
- Determine whether it is a key frame:
bool isKeyFrame = !CFDictionaryContainsKey(dic, kCMSampleAttachmentKey_NotSync);
9.3.2 The C function for obtaining SPS/PPS
- Parameter 1: the format description of the encoded image (CMFormatDescriptionRef)
- Parameter 2: the index, 0 for the SPS and 1 for the PPS
- Parameters 3, 4 and 5: addresses that receive the SPS/PPS parameter data, its size and the parameter set count
- Parameter 6: output information; 0 is passed by default
9.3.3 Generating H264 files
void didCompressH264(void *outputCallbackRefCon, void *sourceFrameRefCon, OSStatus status, VTEncodeInfoFlags infoFlags, CMSampleBufferRef sampleBuffer) {
    NSLog(@"didCompressH264 called with status %d infoFlags %d", (int)status, (int)infoFlags);
    // Status error
    if (status != 0) {
        return;
    }
    // Data not ready yet
    if (!CMSampleBufferDataIsReady(sampleBuffer)) {
        NSLog(@"didCompressH264 data is not ready");
        return;
    }
    // Convert the ref (the self we bridged earlier) back to the ViewController
    ViewController *encoder = (__bridge ViewController *)outputCallbackRefCon;

    // Judge whether the current frame is a key frame
    bool keyFrame = !CFDictionaryContainsKey((CFDictionaryRef)CFArrayGetValueAtIndex(CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, true), 0), kCMSampleAttachmentKey_NotSync);

    // The SPS (Sequence Parameter Set) and PPS only need to be obtained once, from a key frame
    if (keyFrame) {
        // Format description of the encoded data
        CMFormatDescriptionRef format = CMSampleBufferGetFormatDescription(sampleBuffer);

        // Get the SPS (index 0) from the key frame
        size_t sparameterSetSize, sparameterSetCount;
        const uint8_t *sparameterSet;
        OSStatus statusCode = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 0, &sparameterSet, &sparameterSetSize, &sparameterSetCount, 0);
        if (statusCode == noErr) {
            // Get the PPS (index 1) from the key frame
            size_t pparameterSetSize, pparameterSetCount;
            const uint8_t *pparameterSet;
            OSStatus statusCode = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 1, &pparameterSet, &pparameterSetSize, &pparameterSetCount, 0);
            // SPS and PPS obtained successfully; prepare to write them to the file
            if (statusCode == noErr) {
                // SPS & PPS -> NSData
                NSData *sps = [NSData dataWithBytes:sparameterSet length:sparameterSetSize];
                NSData *pps = [NSData dataWithBytes:pparameterSet length:pparameterSetSize];
                if (encoder) {
                    // Write them to the file
                    [encoder gotSpsPps:sps pps:pps];
                }
            }
        }
    }
    // There are other operations...
}
The implementation of gotSpsPps:pps: 👇
Here the start code 00 00 00 01 is added in front of each parameter set (the method signature below is filled in to match the call [encoder gotSpsPps:sps pps:pps] above):
NSLog(@" gotspp %d %d",(int)[SPS length],(int)[PPS length]); // Add start bit 00 00 00 01 const char bytes[] = "\x00\x00\x00\x01"; Size_t length = (sizeof bytes) -1; NSData *ByteHeader = [NSData dataWithBytes:bytes length:length]; [fileHandele writeData:ByteHeader]; [fileHandele writeData:sps]; [fileHandele writeData:ByteHeader]; [fileHandele writeData:pps]; }Copy the code
X. VideoToolBox video coding implementation (2)
The SPS/PPS handling is done. Next comes the NALU stream data, i.e. the CMBlockBuffer shown below 👇
The CMBlockBuffer holds the encoded data stream; we need to extract it and convert it into the H264 file format.
10.1 Getting the CMBlockBuffer
Again a C function, of course 👇
Very simple, just one line 👇
CMBlockBufferRef dataBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
We can think of dataBuffer as an array that we need to traverse to get at the data inside. How do we traverse it? Three things are required 👇
- The length of a single element
- The length of the total data
- The starting address
These are obtained with another C function 👇
CMBlockBufferRef dataBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
size_t length, totalLength;
char *dataPointer;
// With the single element length, the total length of the NALU stream and the start address,
// we can traverse the whole data stream -- think of it as an array
OSStatus statusCodeRet = CMBlockBufferGetDataPointer(dataBuffer, 0, &length, &totalLength, &dataPointer);
if (statusCodeRet == noErr) {
    // traverse the NALU data -- see 10.3 below
}
10.2 Big-endian and little-endian byte order
Before traversing the data there is one thing to consider 👇 big-endian versus little-endian byte order.
In computer hardware, data can be stored in two byte orders: big-endian and little-endian.
- Big-endian byte order: the high-order byte comes first and the low-order byte comes last.
- Little-endian byte order: the low-order byte comes first and the high-order byte comes last.
For example, for the hexadecimal value 0x01234567, the big-endian byte order is 01 23 45 67 and the little-endian byte order is 67 45 23 01.
Why do we have little endian order?
Because computer circuits process the low-order byte first, little-endian is more efficient, so computers work in little-endian order internally, while human reading and writing habits match big-endian. Therefore, apart from the computer's internal representation, data is generally kept in big-endian order.
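A concrete illustration of my own (the length value 25 is made up): if the 4-byte length prefix of a NALU is 00 00 00 19, reading it directly into a uint32_t on a little-endian CPU yields 0x19000000, which is why a byte swap is needed before using it. This is the same CFSwapInt32BigToHost call used in the traversal code in 10.3.

#include <stdint.h>
#include <string.h>
#include <CoreFoundation/CoreFoundation.h>

// Reads the 4-byte big-endian length prefix in front of a NALU.
// For the bytes 00 00 00 19 the swap restores the value 25.
static uint32_t readNaluLength(const uint8_t *lengthPrefix) {
    uint32_t rawValue = 0;
    memcpy(&rawValue, lengthPrefix, sizeof(rawValue));
    return CFSwapInt32BigToHost(rawValue);
}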
10.3 Traversing the NALU data
There are two ways to traverse: advance a pointer with p++, or advance an offset by a step size. We use the latter here; the code is 👇
size_t bufferOffset = 0;
static const int AVCCHeaderLength = 4;
// The first 4 bytes of the returned NALU data are not the 00 00 00 01 start code,
// but the frame length in big-endian byte order.
while (bufferOffset < totalLength - AVCCHeaderLength) {
    uint32_t NALUnitLength = 0;
    // Read the length of this NALU
    memcpy(&NALUnitLength, dataPointer + bufferOffset, AVCCHeaderLength);
    // Convert from big-endian to host (little-endian) byte order
    NALUnitLength = CFSwapInt32BigToHost(NALUnitLength);
    NSData *data = [[NSData alloc] initWithBytes:(dataPointer + bufferOffset + AVCCHeaderLength) length:NALUnitLength];
    // Write the NALU data to the file
    [encoder gotEncodedData:data isKeyFrame:keyFrame];
    // Move to the next NALU
    bufferOffset += AVCCHeaderLength + NALUnitLength;
}
10.4 The full version of didCompressH264
The complete code 👇
void didCompressH264(void *outputCallbackRefCon, void *sourceFrameRefCon, OSStatus status, VTEncodeInfoFlags infoFlags, CMSampleBufferRef sampleBuffer) {
    NSLog(@"didCompressH264 called with status %d infoFlags %d", (int)status, (int)infoFlags);
    // Status error
    if (status != 0) {
        return;
    }
    // Data not ready yet
    if (!CMSampleBufferDataIsReady(sampleBuffer)) {
        NSLog(@"didCompressH264 data is not ready");
        return;
    }
    ViewController *encoder = (__bridge ViewController *)outputCallbackRefCon;

    // Judge whether the current frame is a key frame
    bool keyFrame = !CFDictionaryContainsKey((CFDictionaryRef)CFArrayGetValueAtIndex(CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, true), 0), kCMSampleAttachmentKey_NotSync);

    // The SPS (Sequence Parameter Set) and PPS only need to be obtained once, from a key frame
    if (keyFrame) {
        // Format description of the encoded data
        CMFormatDescriptionRef format = CMSampleBufferGetFormatDescription(sampleBuffer);

        // Get the SPS (index 0) from the key frame
        size_t sparameterSetSize, sparameterSetCount;
        const uint8_t *sparameterSet;
        OSStatus statusCode = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 0, &sparameterSet, &sparameterSetSize, &sparameterSetCount, 0);
        if (statusCode == noErr) {
            // Get the PPS (index 1) from the key frame
            size_t pparameterSetSize, pparameterSetCount;
            const uint8_t *pparameterSet;
            OSStatus statusCode = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 1, &pparameterSet, &pparameterSetSize, &pparameterSetCount, 0);
            // SPS and PPS obtained successfully; prepare to write them to the file
            if (statusCode == noErr) {
                // SPS & PPS -> NSData
                NSData *sps = [NSData dataWithBytes:sparameterSet length:sparameterSetSize];
                NSData *pps = [NSData dataWithBytes:pparameterSet length:pparameterSetSize];
                if (encoder) {
                    // Write them to the file
                    [encoder gotSpsPps:sps pps:pps];
                }
            }
        }
    }

    CMBlockBufferRef dataBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    size_t length, totalLength;
    char *dataPointer;
    // With the single element length, the total length of the NALU stream and the start address,
    // we can traverse the whole data stream -- think of it as an array
    OSStatus statusCodeRet = CMBlockBufferGetDataPointer(dataBuffer, 0, &length, &totalLength, &dataPointer);
    if (statusCodeRet == noErr) {
        size_t bufferOffset = 0;
        static const int AVCCHeaderLength = 4;
        // The first 4 bytes of the returned NALU data are not the 00 00 00 01 start code,
        // but the frame length in big-endian byte order.
        while (bufferOffset < totalLength - AVCCHeaderLength) {
            uint32_t NALUnitLength = 0;
            // Read the length of this NALU
            memcpy(&NALUnitLength, dataPointer + bufferOffset, AVCCHeaderLength);
            // Convert from big-endian to host (little-endian) byte order
            NALUnitLength = CFSwapInt32BigToHost(NALUnitLength);
            NSData *data = [[NSData alloc] initWithBytes:(dataPointer + bufferOffset + AVCCHeaderLength) length:NALUnitLength];
            // Write the NALU data to the file
            [encoder gotEncodedData:data isKeyFrame:keyFrame];
            // Move to the next NALU
            bufferOffset += AVCCHeaderLength + NALUnitLength;
        }
    }
}
Then comes the implementation of gotEncodedData:isKeyFrame: 👇
- (void)gotEncodedData:(NSData *)data isKeyFrame:(BOOL)isKeyFrame {
    NSLog(@"gotEncodedData %d", (int)[data length]);
    if (fileHandele != NULL) {
        /* Each NAL is preceded by the start code 0x00 00 00 01; the decoder detects the next
           start code to know where the current NAL ends.
           To prevent 0x000001 from occurring inside a NAL, H.264 uses the "emulation prevention"
           mechanism: if two consecutive 0x00 bytes are detected inside a NAL, a 0x03 is inserted;
           when the decoder sees 0x000003 inside a NAL, it discards the 0x03 and restores the data.
           Generally speaking, there are two ways to package an H264 stream. One is the Annex-B
           byte stream format, the default output of most encoders: the first 3-4 bytes of each
           frame are the start code 0x00000001 or 0x000001. In the other packaging format, the
           first few bytes (1, 2 or 4) are the length of the NAL instead of a start code; in that
           case we must use some global data (the encoder's profile, level, SPS, PPS and so on)
           before decoding. */
        const char bytes[] = "\x00\x00\x00\x01";
        // sizeof bytes includes the trailing \0 terminator, so subtract 1
        size_t length = (sizeof bytes) - 1;
        NSData *ByteHeader = [NSData dataWithBytes:bytes length:length];
        // Write the start code
        [fileHandele writeData:ByteHeader];
        // Write the H264 data
        [fileHandele writeData:data];
    }
}
Conclusion
- H264 structure and code stream analysis
  - The H264 structure
    - An encoded video image 👇 a frame; one or more slices compose a frame; one or more macroblocks (MB) compose a slice
  - H264 coding layers
    - NAL layer (Network Abstraction Layer)
    - VCL layer (Video Coding Layer)
  - Bit stream
    - SODB (String of Data Bits, the raw data bits)
    - RBSP (Raw Byte Sequence Payload, SODB + trailing bits)
    - EBSP (Encapsulate Byte Sequence Payload)
    - NALU: NAL Header (1B) + EBSP 👇 this is the focus
  - NAL Unit
    - NAL Unit = one NALU header + one slice
    - Slice = slice header + slice data
    - Slice data = macroblock + ... + macroblock
    - Macroblock = type + prediction + residual data
- VideoToolBox
  - A native hardware-encoding framework introduced in iOS 8.0, based on Core Foundation and written in C
  - Basic data structure 👇 CMSampleBuffer
    - Unencoded 👇 CVPixelBuffer
    - Encoded 👇 CMBlockBuffer
  - Coding process 👇 CVPixelBuffer -> video encoder -> CMBlockBuffer -> H264 file format
  - The H264 file
    - The H264 file format is NALU stream data
    - The sequence of frames 👇 SPS + PPS + I/B/P frames
    - To identify I, B and P frames: convert the hexadecimal byte to binary, take bits 4-8 (the last five bits), convert them to decimal, and look the result up in the type table
- NALU unit data in detail
  - NALU = NAL Header (1 byte) + NAL Body
  - NAL Header parsing: 1 byte, i.e. 8 bits
    - Bit 0: F, must be 0
    - Bits 1-2: NRI, the importance 👇 00 is least important, 11 most important
    - Bits 3-7: TYPE, used to determine the frame type (I, B, P)
      - 5 indicates an I frame (IDR slice)
      - 7 indicates the SPS sequence parameter set
      - 8 indicates the PPS picture parameter set
  - NAL types (over RTP)
    - Single type: an RTP packet contains only one NALU, i.e. the H264 frame contains only one slice
    - Aggregation (combination) type: an RTP packet contains multiple NALUs, such as PPS and SPS
    - Fragmentation type: one NALU is split into multiple RTP packets
      - First byte: FU Indicator, the fragment unit indicator
      - Second byte: FU Header, the fragment unit header (one per fragment)
        - S: start bit, marks the start of the fragmented NALU
        - E: end bit, marks the end of the fragmented NALU
        - R: unused, set to 0
        - Type: the NAL type of the fragment; whether it is a key frame or a non-key frame, an SPS or a PPS, is judged from this
  - How to tell that a NALU was transmitted completely
    - The packets carrying the S bit and the E bit have been received
    - The sequence numbers of the packets in between are continuous
- YUV color system
  - Also known as YCbCr, a color encoding method used in television systems. Y is brightness (the gray-scale value), the basic signal; U and V are chroma, describing the color and saturation of the image and specifying the pixel color
  - Common YUV formats
    - YUV 4:2:0 (YCbCr 4:2:0) 👇 half the size of RGB
    - YUV 4:2:2 (YCbCr 4:2:2) 👇 one third smaller than RGB
    - YUV 4:4:4 (YCbCr 4:4:4) 👇 can be understood as 1:1:1
  - YUV storage formats
    - Planar formats
      - I420: YUV420P (common on PC)
      - YV12: YUV420P
    - Packed formats
      - NV12: YUV420SP (iOS default)
      - NV21: YUV420SP (Android default)
- AVFoundation video data capture
  - The whole process 👇 data capture -> encoding -> H264 file -> write to the sandbox / network transmission
  - Data capture 👇 based on the AVFoundation framework
    - The output source is AVCaptureVideoDataOutput, and we conform to AVCaptureVideoDataOutputSampleBufferDelegate
    - Queues: two things are done 👇 capture and encoding
    - The pixel format of the captured video is YUV 4:2:0: kCVPixelBufferPixelFormatTypeKey : kCVPixelFormatType_420YpCbCr8BiPlanarFullRange
  - Video encoding 👇 based on the VideoToolBox framework
    - Initialize VideoToolBox
      - Create an encoding session 👇 VTCompressionSessionCreate
      - Configure the encoding parameters 👇 VTSessionSetProperty
        - Real-time encoding: kVTCompressionPropertyKey_RealTime
        - Profile/level (Baseline drops B frames): kVTCompressionPropertyKey_ProfileLevel
        - Whether to produce B frames (frame reordering): kVTCompressionPropertyKey_AllowFrameReordering
        - Key frame (GOP size) interval: kVTCompressionPropertyKey_MaxKeyFrameInterval
        - Expected frame rate: kVTCompressionPropertyKey_ExpectedFrameRate
        - Rate limit: kVTCompressionPropertyKey_DataRateLimits
        - Average bit rate: kVTCompressionPropertyKey_AverageBitRate
- VideoToolBox video coding
  - Stop capturing
    - Stop the capture session
    - Remove the preview layer
    - End VideoToolBox
    - Close the file
  - Pre-encoding preparation
    - The encoding entry point 👇 the AVCaptureVideoDataOutputSampleBufferDelegate method
      -(void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
  - Encoding
    - Get each unencoded frame 👇 CMSampleBufferGetImageBuffer
    - Encoding function 👇 VTCompressionSessionEncodeFrame
  - Getting the successfully encoded H264 stream data
    - The encoding-complete callback 👇 the callback function specified in VTCompressionSessionCreate
      - Get the attachments array from the sampleBuffer 👇 CMSampleBufferGetSampleAttachmentsArray
      - Get the CFDictionaryRef at index 0 from the array
      - Judge the key frame: !CFDictionaryContainsKey(dic, kCMSampleAttachmentKey_NotSync)
  - Generating the H264 file format
    - Get the SPS/PPS 👇 CMVideoFormatDescriptionGetH264ParameterSetAtIndex
    - Write them to the file
      - Build NSData from the size and the address pointer
      - Configure the header
        - Add the start code "\x00\x00\x00\x01"
        - Drop the trailing \0 terminator
      - Write order 👇 header + spsData + header + ppsData
    - Get the CMBlockBuffer 👇 CMSampleBufferGetDataBuffer
    - Traverse the CMBlockBuffer to obtain the NALU data
      - Single element length + total data length + starting address, traversed by offset
      - Convert from big-endian to the host's little-endian byte order (the Mac is little-endian by default)
    - Write the NALU data to the file
      - Configure the header just as for SPS/PPS
      - Write order 👇 header + NALData
CC teacher _HelloCoder audio and video learning from zero to whole