H264 structure diagram:
After H264 compression, a video is a sequence of frames. The encoded data of one video image is called a frame. A frame consists of one or more slices, a slice consists of one or more macroblocks (MB), and a macroblock can be further divided into sub-blocks. A macroblock covers 16×16 pixels of YUV data and is the basic unit of H264 coding.
- Field and frame: Either a field or a frame of video can be used to produce an encoded image. In interlaced television, each frame is created by scanning the screen twice, with the lines of the second scan filling the gaps left by the first. Each scan is called a field, so a 30 frames per second TV picture is actually 60 fields per second.
- Slice: In each image, macroblocks are grouped into slices. The purpose of slices is to limit the spread and propagation of bit errors, so slices are kept independent of each other. There are five types of slices: I slices (containing only I macroblocks), P slices (P and I macroblocks), B slices (B and I macroblocks), SP slices (for switching between different encoded streams) and SI slices (a special type of intra-coded macroblock).
- Macroblock: An encoded image is first divided into blocks (4×4 pixels) before processing. A macroblock is made up of a whole number of blocks, usually 16×16 pixels. Macroblocks come in I, P and B types. An I macroblock can only use pixels already decoded in the current slice as reference, for intra-frame prediction. A P macroblock uses a previously decoded image as reference, for inter-frame prediction. A B macroblock uses both forward and backward reference images, also for inter-frame prediction.
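To make the frame and macroblock relationship concrete, the following minimal sketch counts how many 16×16 macroblocks are needed to cover a picture. The function name `macroblock_grid` is made up for illustration, and the sketch assumes the picture is simply padded up to a multiple of 16 in each dimension.

```python
MB_SIZE = 16  # a macroblock covers 16x16 pixels


def macroblock_grid(width, height):
    """Return (columns, rows) of macroblocks needed to cover a picture.

    Assumption: dimensions that are not a multiple of 16 are rounded up,
    i.e. the picture is padded to whole macroblocks.
    """
    cols = (width + MB_SIZE - 1) // MB_SIZE
    rows = (height + MB_SIZE - 1) // MB_SIZE
    return cols, rows


cols, rows = macroblock_grid(1920, 1080)
print(cols, rows, cols * rows)  # 120 68 8160
```

Note that 1080 is not a multiple of 16, so the last row of macroblocks extends past the visible picture.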
H264 coding layer
- NAL layer (Network Abstraction Layer): H264 is usually transmitted over a network. An Ethernet packet carries at most 1500 bytes, while H264 frames are often larger than that, so a frame has to be split into multiple packets for transmission. All packet splitting and reassembly is handled by the NAL layer.
- VCL layer (Video Coding Layer): Compresses the raw video data.
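The idea of splitting a large frame into packets can be sketched as plain chunking. This only illustrates the size constraint and is not how a real transport fragments H264 data (RTP, for example, defines its own fragmentation units). The 1400-byte payload budget and the function names are assumptions made for this example.

```python
# Assumed budget: a 1500-byte Ethernet packet minus some room for
# protocol headers. The exact number depends on the transport in use.
MAX_PAYLOAD = 1400


def split_into_packets(frame, max_payload=MAX_PAYLOAD):
    """Cut one encoded frame into chunks of at most max_payload bytes."""
    return [frame[i:i + max_payload] for i in range(0, len(frame), max_payload)]


def reassemble(packets):
    """Receiver side: join the chunks back together in order."""
    return b"".join(packets)


frame = bytes(5000)  # stand-in for a 5000-byte encoded frame
packets = split_into_packets(frame)
print([len(p) for p in packets])  # [1400, 1400, 1400, 800]
```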
Basic concepts of the bitstream
- SODB (String of Data Bits): The raw bit string generated by the VCL layer. Its length is not necessarily a multiple of 8 bits, which makes it awkward to process.
- RBSP (Raw Byte Sequence Payload): SODB plus trailing bits. A stop bit of 1 is appended to the SODB, followed by 0 bits until the length is a multiple of 8.
- EBSP (Encapsulated Byte Sequence Payload): After the encoded data stream is generated, a start code has to be added before each unit, and the developer must add it manually. The start code is usually 0x00000001 (a 3-byte form, 0x000001, is also used). The encoded data itself may contain two consecutive 0x00 bytes, which could be mistaken for a start code. To avoid this, the H264 specification requires that whenever two consecutive 0x00 bytes are followed by a byte in the range 0x00 to 0x03, an extra 0x03 byte is inserted after them. This keeps the compressed data from colliding with the start code.
- NALU: NAL header (1 byte) + EBSP. A NALU is an EBSP with a 1-byte NAL header prepended.
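The chain SODB → RBSP → EBSP → NALU described above can be sketched as follows. This is a minimal illustration, not a full encoder: the SODB is represented as a string of '0'/'1' characters for readability, all function names are made up for this example, and the rarely needed rule for an RBSP that ends in a 0x00 byte is left out.

```python
START_CODE = b"\x00\x00\x00\x01"


def sodb_to_rbsp(bits):
    """Append the stop bit '1', then pad with '0' bits to a byte boundary."""
    bits += "1"
    bits += "0" * (-len(bits) % 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def rbsp_to_ebsp(rbsp):
    """Insert 0x03 after two 0x00 bytes when the next byte is 0x00..0x03."""
    out = bytearray()
    zeros = 0  # number of consecutive 0x00 bytes just written
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def ebsp_to_rbsp(ebsp):
    """Decoder side: drop each 0x03 that follows two 0x00 bytes."""
    out = bytearray()
    zeros = 0
    for b in ebsp:
        if zeros >= 2 and b == 0x03:
            zeros = 0
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def make_nalu(header, rbsp):
    """Start code + 1-byte NAL header + EBSP."""
    return START_CODE + bytes([header]) + rbsp_to_ebsp(rbsp)


print(sodb_to_rbsp("10110").hex())                       # b4
print(make_nalu(0x65, b"\x00\x00\x01\x80").hex(" "))     # 00 00 00 01 65 00 00 03 01 80
```

In the second example the payload contains 00 00 01, which would look like a start code, so a 0x03 byte is inserted before the 01.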
NALU parsing
- NALU header structure: forbidden bit (1 bit), importance indicator (2 bits), NALU type (5 bits). The first bit is the forbidden bit, which is fixed at 0; if a received unit has it set to 1, the unit must be discarded. Bits 2-3 indicate importance: 00 is least important and 11 is most important, so a decoder that falls behind can discard the less important units. Bits 4-8 indicate the NALU type.
- NALU types: Types 1 to 12 are used by H.264, while 24 to 31 are used by applications other than H.264. The following figure shows the NALU types:
- Relationship between slices and macroblocks: Each slice consists of a slice header and slice data, and the slice data contains many macroblocks. Each macroblock includes the macroblock type, the macroblock prediction and the residual data.
- Slice header: Contains information that applies to the slice as a whole, such as the slice type, the position of its first macroblock and the frame it belongs to.
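The NALU parsing steps above can be sketched as follows: split a byte stream at its start codes, then decode the 1-byte header of each unit. The function names and the sample bytes are made up for this example. The field names follow the H.264 syntax element names, and the type table covers only types 1 to 12.

```python
NALU_TYPES = {
    1: "Coded slice of a non-IDR picture",
    2: "Coded slice data partition A",
    3: "Coded slice data partition B",
    4: "Coded slice data partition C",
    5: "Coded slice of an IDR picture",
    6: "Supplemental enhancement information (SEI)",
    7: "Sequence parameter set (SPS)",
    8: "Picture parameter set (PPS)",
    9: "Access unit delimiter",
    10: "End of sequence",
    11: "End of stream",
    12: "Filler data",
}


def parse_nalu_header(byte):
    """Split the 1-byte header into its three fields."""
    return {
        "forbidden_zero_bit": byte >> 7,    # bit 1: must be 0
        "nal_ref_idc": (byte >> 5) & 0x03,  # bits 2-3: importance
        "nal_unit_type": byte & 0x1F,       # bits 4-8: NALU type
    }


def split_nalus(stream):
    """Split a start-code-delimited stream into NALUs (header + EBSP)."""
    units = []
    pos = stream.find(b"\x00\x00\x01")
    while pos != -1:
        start = pos + 3
        nxt = stream.find(b"\x00\x00\x01", start)
        end = len(stream) if nxt == -1 else nxt
        # A 4-byte start code leaves its leading 0x00 at the end of the
        # previous unit, so trailing zero bytes are stripped.
        unit = stream[start:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units


# Hypothetical stream: three tiny units, payload bytes are placeholders.
stream = bytes.fromhex("00000001 6742001f 00000001 68ce 000001 6588")
for nalu in split_nalus(stream):
    h = parse_nalu_header(nalu[0])
    if h["forbidden_zero_bit"]:
        continue  # a unit with the forbidden bit set is discarded
    print(h["nal_ref_idc"], h["nal_unit_type"],
          NALU_TYPES.get(h["nal_unit_type"], "other"))
```

For the sample stream the three header bytes are 0x67, 0x68 and 0x65, which decode to types 7 (SPS), 8 (PPS) and 5 (IDR slice), each with importance 3.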