If you have worked on audio and video projects, you have surely come across the concepts behind H.264. Why am I writing this article? Partly to consolidate what I know, and partly to provide a reference for those who are just getting started with audio and video.
Basic concepts
H.264, also known as MPEG-4 Part 10 or AVC (Advanced Video Coding), is a block-oriented video coding standard based on motion compensation, and currently the most widely used video coding format. On Android you can create a hardware encoder with MediaCodec.createEncoderByType("video/avc"), or a software encoder with FFmpeg via avcodec_find_encoder(AV_CODEC_ID_H264) or avcodec_find_encoder_by_name("libx264"). Since the topic of this article is analyzing the H.264 stream, I won't go into how to encode it here. Let's get straight to some common concepts.
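For reference, here is a minimal sketch of the hardware-encoder path. The resolution, bitrate, frame rate, and I-frame interval below are placeholder values chosen for illustration, not recommendations:

```java
import android.media.MediaCodec;
import android.media.MediaCodecInfo;
import android.media.MediaFormat;
import java.io.IOException;

public class AvcEncoderFactory {
    // Minimal sketch: create and configure an H.264 hardware encoder on Android,
    // assuming a Surface-input pipeline.
    public static MediaCodec createAvcEncoder(int width, int height) throws IOException {
        MediaFormat format = MediaFormat.createVideoFormat(
                MediaFormat.MIMETYPE_VIDEO_AVC, width, height);
        format.setInteger(MediaFormat.KEY_COLOR_FORMAT,
                MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);
        format.setInteger(MediaFormat.KEY_BIT_RATE, 2_000_000); // 2 Mbps, example value
        format.setInteger(MediaFormat.KEY_FRAME_RATE, 30);      // example value
        format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1); // one keyframe per second
        MediaCodec encoder = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_VIDEO_AVC);
        encoder.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
        return encoder;
    }
}
```

Note that KEY_I_FRAME_INTERVAL sets how often a keyframe (I frame) is requested, in seconds, which effectively controls the GOP length discussed below.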
GOP
The set of frames between two I frames forms a GOP (Group Of Pictures). For example, a typical GOP in display order looks like I B B P B B P …: it starts with an I frame and ends just before the next I frame.
I frame
An I frame is also called a keyframe. You can think of it as a complete image of a frame; an I frame can be decoded directly, on its own.
Features:
1. It is an intra-coded frame: the entire frame's image data is compressed and transmitted in full (comparable to JPEG-style compression of a still image)
2. A complete image can be reconstructed from the I frame's data alone during decoding
3. An I frame describes the details of the image background and of the moving subject
4. I frames are generated without reference to other frames
5. An I frame is the reference frame for P/B frames (its quality directly affects the quality of the following frames in the same group)
6. An I frame is the first frame of a GOP, and there is only one I frame per group
7. I frames do not need to consider motion vectors
8. An I frame carries a large amount of data
B frame
A B frame is also known as a bidirectional difference frame: it encodes the differences between this frame and the frames before and after it. In plain terms, to decode a B frame you need not only the previously cached picture but also the decoded picture that follows; the final image is reconstructed by combining the B frame's data with both.
Features:
1. A B frame is predicted from the preceding I or P frame and the following P frame
2. A B frame transmits the prediction errors and motion vectors relative to the preceding I or P frame and the following P frame
3. A B frame is a bidirectionally predicted coded frame
4. B frames have the highest compression ratio, because they only describe the changes of the moving subject between the reference frames, so the prediction is more accurate
5. A B frame is typically not used as a reference frame, so it does not propagate decoding errors
P frame
A P frame is also called a forward-predicted coded frame. It represents the difference between this frame and the nearest preceding I or P frame. When decoding, the difference defined in this frame is applied on top of the cached reference picture to produce the final image.
Features:
1. A P frame is a coded frame that follows an I frame, usually separated from it by one or two frames
2. A P frame uses motion compensation to transmit the difference between itself and the nearest preceding I or P frame, together with the motion vectors
3. During decoding, a complete P frame image can be reconstructed only by summing the prediction from the reference frame and the prediction error carried by the P frame
4. A P frame uses forward-predicted inter-frame coding; it references only the nearest preceding I or P frame
5. A P frame can be a reference frame for the P frames after it, and for the B frames before and after it
6. Because a P frame is a reference frame, it may cause decoding errors to propagate
7. Because it transmits only differences, the compression ratio of P frames is relatively high
Now that the basic frame concepts are clear, let's analyze the bitstream with code.
Bitstream analysis
An H.264 stream (also called a raw or elementary stream) is composed of multiple NALUs (Network Abstraction Layer Units). If the slice carried by a NALU is the start of a frame, the start code before it is 4 bytes, 0x00 00 00 01; otherwise it is 3 bytes, 0x00 00 01. To analyze an H.264 bitstream, first search the stream for these start codes, then split out each NALU, and finally parse its fields. The first byte after the start code is the NALU header, whose low 5 bits give the nal_unit_type:
Type | Description |
---|---|
0 | Reserved |
1 | Slice of a non-IDR picture, without data partitioning |
2 | Slice data partition A |
3 | Slice data partition B |
4 | Slice data partition C |
5 | Slice of an IDR picture |
6 | SEI (Supplemental Enhancement Information) |
7 | SPS (Sequence Parameter Set) |
8 | PPS (Picture Parameter Set) |
9 | Access unit delimiter |
10 | End of sequence |
11 | End of stream |
12 | Filler data |
13 | Sequence parameter set extension |
14 | Prefix NAL unit |
15 | Subset sequence parameter set |
16-18 | Reserved |
19 | Coded slice of an auxiliary coded picture, without data partitioning |
20 | Coded slice extension |
21-23 | Reserved |
24-31 | Unspecified |
In actual development the most common types are 1, 5, 7, and 8. Let's analyze them with code:
Code analysis
The code is very simple: read the file, scan it byte by byte for start codes, and read out each NALU's type. A minimal sketch follows.
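Here is a minimal sketch of such a parser in Java. The input file name test.h264 is a placeholder; a production parser would also need to handle emulation-prevention bytes (0x03) when parsing fields inside a NALU:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class NaluParser {
    // Scans a raw H.264 file for start codes (0x000001 or 0x00000001)
    // and prints the nal_unit_type of every NALU found.
    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get("test.h264")); // placeholder file name
        int i = 0;
        while (i + 3 < data.length) {
            // Match either the 3-byte or the 4-byte start code.
            int startCodeLen = 0;
            if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
                startCodeLen = 3;
            } else if (data[i] == 0 && data[i + 1] == 0
                    && data[i + 2] == 0 && data[i + 3] == 1) {
                startCodeLen = 4;
            }
            if (startCodeLen == 0) {
                i++;
                continue;
            }
            int headerIndex = i + startCodeLen;
            if (headerIndex >= data.length) break;
            int nalType = data[headerIndex] & 0x1F; // low 5 bits of the NALU header
            System.out.printf("offset=%d startCode=%d bytes nal_unit_type=%d (%s)%n",
                    i, startCodeLen, nalType, typeName(nalType));
            i = headerIndex + 1; // continue scanning after this NALU header
        }
    }

    private static String typeName(int type) {
        switch (type) {
            case 1:  return "non-IDR slice";
            case 5:  return "IDR slice";
            case 6:  return "SEI";
            case 7:  return "SPS";
            case 8:  return "PPS";
            default: return "other";
        }
    }
}
```

Running this over a stream produced by an encoder like the one sketched earlier would typically print an SPS (7) and PPS (8) first, then an IDR slice (5), followed by runs of non-IDR slices (1).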
How to learn more
Systematic, entry-level audio and video material is scarce, and a fresh graduate may find it hard to follow, because audio and video involves a great deal of theory that the code has to build on. Understanding the theory behind audio, video, encoding, and decoding is therefore very important.
I have worked on audio and video projects since my internship. I have read many people's articles and, by chance, found an open-source project with 6.8K stars on GitHub. I'd like to share it here so that more students preparing to learn audio and video can get started faster.
Here are some sections of its development documentation:
Stage 1: Android multimedia
Chapter 1: Three ways to draw an image
Chapter 2: AudioRecord captures PCM audio
Chapter 3: AudioTrack plays PCM audio
Chapter 4: Camera video capture
Chapter 5: MediaExtractor and MediaMuxer for video demuxing and muxing
Chapter 6: MediaCodec hardware codec flow and practice
Stage 2: OpenGL ES
Chapter 7: Basic concepts of OpenGL ES
Chapter 8: GLSL and the shader rendering process
Chapter 9: Drawing plane figures with OpenGL ES
Chapter 10: GLSurfaceView source code analysis & the EGL environment
Chapter 11: OpenGL ES matrix transformations and coordinate systems
Chapter 12: OpenGL ES textures
Chapter 13: OpenGL ES filters
Chapter 14: OpenGL ES real-time filters
Chapter 15: OpenGL ES particle system – fountain
Chapter 16: OpenGL ES particle effects – fireworks explosion
Stage 3: JNI & NDK
Chapter 17: Learning and using JNI and NDK
Chapter 18: JNI – reference types, exception handling, function registration
Chapter 19: NDK build methods: ndk-build and CMake
Chapter 20: Pointers, memory model, references
Chapter 21: Operator overloading, inheritance, polymorphism, templates
Chapter 22: STL containers
Algorithm sub-series
Chapter 23: Algorithm series – bubble sort
Chapter 24: Algorithm series – quicksort
Chapter 25: Algorithm series – heap sort
Chapter 26: Algorithm series – selection sort, insertion sort, and the sort implementation in STL
Chapter 27: Algorithm series – binary search tree
Chapter 28: Algorithm series – balanced binary tree
Chapter 29: Algorithm series – hash table
Stage 4: FFmpeg
Chapter 30: Audio and video basics
Chapter 31: Common FFmpeg commands
Chapter 32: FFmpeg + OpenSL ES audio decoding and playback
Chapter 33: FFmpeg + OpenGL ES video decoding and playback
Friends who need it can contact me to get it for free.
Summary
The audio and video industry has been developing for years. With the rise of mobile devices, more and more audio and video apps have appeared, pushing the field to a new high. But the learning cost of audio and video is steep for many developers. To keep pace with the times, friends who need the materials above can get them for free and break through the "high threshold" of audio and video; I hope we can make progress together.
In short, audio and video is on a strong rise, and I believe the next decade will be the decade of audio and video. Combining audio and video technology with computer vision and artificial intelligence will lead the two decades after that.
I will share more related articles in the future; follow me so you don't get lost!
Now is the best time to learn audio and video technology. Seize the opportunity and keep pace with the times, and you will be well positioned to succeed in the future.