1. A summary of H264
H264 is a highly compressed digital video codec standard proposed by the Joint Video Team (JVT), formed jointly by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). H.264/AVC, AVC/H.264, H.264/MPEG-4 AVC, and MPEG-4/H.264 AVC all refer to the same standard.
2. The origin of H264
ITU-T created H261 in 1988, and ISO/IEC MPEG created MPEG-1 in 1991. Both are standard algorithms for video compression, but the two are incompatible; they are different things. As audio and video developed, the two bodies reached a consensus in 2003 and merged their work into one algorithm, H.264/MPEG-4 AVC, commonly known as H264.
3. Why compress video files (that is, why use H264)
Suppose you have 1 second of video at 30 frames per second and 1920×1080 resolution, stored as raw YUV420 (every 8 Y samples share 2 U and 2 V, so 1.5 bytes per pixel on average):
1920 × 1080 × 30 × 1.5 (YUV420 uses 1.5 bytes per pixel) / 1024 / 1024 ≈ 89 MB (the video is YUV, not RGB)
So to watch this video online, we would have to download roughly 89 MB every second. In today's world of short video, that is simply not viable.
4. Which parts of a video can be compressed
- Spatial redundancy: if a frame is, say, entirely white, we only need to store the color of a single pixel
- Temporal redundancy: in slow-moving video, two adjacent frames are nearly identical, so only one frame plus the differences needs to be stored
- Visual redundancy: human vision does not perceive every change in an image, so imperceptible detail can be dropped
- Knowledge redundancy: regular structures can be inferred from prior knowledge and background knowledge
5. H264 compression principles
- Intra-frame compression: intra-frame predictive coding that exploits the correlation between neighboring pixels within a single frame
- Inter-frame compression: macroblock-based motion-compensated predictive coding that finds, in a reference frame, the macroblock that best matches each macroblock of the current frame, and encodes only the difference
5.1 Macroblocks
Take a picture that fades from black on the left to white on the right. We do not have to store every pixel: we can divide the image into 16×16 blocks and, for each block, record only the pixel values along its top edge and left edge, plus the block's width, height, and start and stop positions; all the remaining pixels can then be calculated from those. That takes far less memory, and an image full of such predictable 16×16 blocks compresses into a much smaller volume.
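To make the idea concrete, here is a minimal sketch of the edge-prediction idea described above (illustrative only, not the real H264 algorithm): a 16×16 block is predicted horizontally from the pixels to its left, and only the residual needs to be coded.

```java
// Toy horizontal intra prediction for one 16x16 block (illustrative, not real H264).
static int[][] predictHorizontal(int[] leftColumn) { // the 16 pixels to the left of the block
    int[][] pred = new int[16][16];
    for (int row = 0; row < 16; row++)
        for (int col = 0; col < 16; col++)
            pred[row][col] = leftColumn[row]; // copy the left-edge pixel across the row
    return pred;
}

// The encoder then stores only the residual (actual minus predicted),
// which is tiny for a smooth gradient.
static int[][] residual(int[][] actual, int[][] pred) {
    int[][] diff = new int[16][16];
    for (int r = 0; r < 16; r++)
        for (int c = 0; c < 16; c++)
            diff[r][c] = actual[r][c] - pred[r][c];
    return diff;
}
```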
5.2 Key frames
- I frame: an I frame is a fully compressed picture in its own right; the decoder can decompress it into a single complete image with no other frames. Its compression ratio is roughly 1/6.
- P frame: coded against a preceding I (or P) frame, it stores only motion vectors and differences, compressing to roughly 1/20. A P frame can itself serve as a reference, provided an I frame exists before it. If a viewer has just joined a live stream at, say, ...B P B B I..., that first P frame is useless even as a reference; there must be an I frame first.
- B frame: predicted from both the I or P frame before it and the P frame after it; it stores essentially only motion vectors and compresses to roughly 1/50.
5.3 GOP
The GOP is the number of frames between two I frames. The first frame a player renders must be an I frame; if it is not, the screen stays black for a moment, because only an I frame is a complete picture. In a live broadcast, for example, a viewer who joins on a P frame sees a black screen until the first I frame arrives. For short videos, the fewer the I frames, the larger the GOP and the smaller the compressed file; but in live streaming the GOP cannot be set that large, because a large GOP raises the probability (and duration) of that initial black screen. So live streams use a relatively small GOP.
5.4 SPS and PPS
SPS (sequence parameter set) carries sequence-level configuration (resolution, frame rate, encoding profile, and so on); PPS (picture parameter set) carries picture-level coding parameters. Every I frame must be preceded by SPS and PPS units, each delimited by the start code 0x000001.
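When feeding such a stream to Android's MediaCodec (used later in this article), the SPS and PPS can also be supplied out of band as codec-specific data. A minimal sketch, assuming `sps` and `pps` already hold the two NALUs (start codes included) and `width`/`height` are known:

```java
// csd-0 and csd-1 are the documented MediaFormat keys for H264's SPS and PPS.
// sps, pps, width, and height are assumed to come from parsing the stream.
MediaFormat format = MediaFormat.createVideoFormat("video/avc", width, height);
format.setByteBuffer("csd-0", java.nio.ByteBuffer.wrap(sps)); // SPS NALU
format.setByteBuffer("csd-1", java.nio.ByteBuffer.wrap(pps)); // PPS NALU
```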
5.5 Overall structure of the bitstream
H.264 is divided into two layers: the video coding layer (VCL) and the network abstraction layer (NAL). An encoded H.264 video sequence consists of a series of NAL units (NALUs), each containing an RBSP; a raw H.264 stream is therefore N NALUs in a row. Each NALU is structured as [StartCode] [NALU Header] [NALU Payload], where the StartCode marks the start of the NALU and must be "00 00 00 01" or "00 00 01".
5.6 NAL transmission over networks
The 0x000001 delimiter belongs to the NAL layer. If the slice carried by a NALU begins a new frame, the 4-byte start code 0x00000001 is used; otherwise the 3-byte 0x000001 is used. The NAL header contains forbidden_bit, nal_reference_bit (priority), and nal_unit_type (type). Emulation prevention ("unshelling"): so that the NALU body can never mimic a start code, the encoder inserts a 0x03 byte after every two consecutive zero bytes; when decoding, each such 0x03 is deleted.
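A minimal sketch of the decoder side of that rule, removing the emulation-prevention bytes from a NALU body:

```java
import java.io.ByteArrayOutputStream;

// Strip emulation-prevention bytes: drop every 0x03 that follows two 0x00 bytes.
static byte[] stripEmulationPrevention(byte[] nalu) {
    ByteArrayOutputStream out = new ByteArrayOutputStream(nalu.length);
    int zeros = 0;
    for (byte b : nalu) {
        if (zeros >= 2 && b == 0x03) {
            zeros = 0;   // this 0x03 was inserted by the encoder; skip it
            continue;
        }
        out.write(b);
        zeros = (b == 0x00) ? zeros + 1 : 0;
    }
    return out.toByteArray();
}
```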
6. Android's low-level video codec: the DSP chip
Frames do not leave the codec in display order: incoming frames are cached in the codec's buffer and reordered, because a frame that others depend on must come out first (a B frame, for example, has to wait for the later P frame it references). The order of the H264 bitstream is therefore generally different from the video's playback order.
Why video encoding uses YUV instead of RGB
- RGB: RGB is defined from the physics of colored light. Red, green, and blue lights overlap and mix, and the brightness of the mix equals the sum of the individual brightnesses: the more light mixed in, the brighter the result (additive mixing). RGB24 means R, G, and B get 8 bits each, i.e. 3 bytes per pixel:
1280 × 720 × 3 (3 bytes per pixel) / 1024 / 1024 ≈ 2.64 MB per frame
- YUV works by separating luminance from chrominance, because the human eye is far more sensitive to brightness than to color. Of the three letters, "Y" is luminance (luma), i.e. the grayscale value; "U" and "V" are chrominance (chroma), which describe a pixel's color and saturation. YUV was designed to optimize the transmission of color video signals: compared with RGB, which must transmit three independent signals simultaneously, YUV needs far less bandwidth:
1280 × 720 × 2 (2 bytes per pixel in YUV 4:2:2) / 1024 / 1024 ≈ 1.76 MB per frame, so YUV 4:2:2 already saves 1/3 compared with RGB24.
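The arithmetic above comes down to bytes per pixel: RGB24 uses 3, YUV 4:2:2 uses 2, and YUV 4:2:0 uses 1.5. A small sketch reproducing the figures used in this article:

```java
// Raw (uncompressed) video sizes are just width x height x bytes-per-pixel.
public class RawSize {
    static double frameMB(int w, int h, double bytesPerPixel) {
        return w * h * bytesPerPixel / 1024 / 1024;
    }

    public static void main(String[] args) {
        System.out.printf("720p RGB24 frame:   %.2f MB%n", frameMB(1280, 720, 3)); // ~2.64
        System.out.printf("720p YUV422 frame:  %.2f MB%n", frameMB(1280, 720, 2)); // ~1.76
        // 1 second of 1080p at 30 fps in YUV420, as computed earlier (~89 MB)
        System.out.printf("1080p30 YUV420, 1s: %.0f MB%n", frameMB(1920, 1080, 1.5) * 30);
    }
}
```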
Using MediaCodec to play H264 data on Android
MediaCodec is the interface Android provides for accessing the low-level audio/video codec, which is the DSP chip; all apps share that one chip. We can use FFmpeg to extract the H264 stream from a video with the following command:
ffmpeg -i demo.mp4 -c:v copy -bsf:v h264_mp4toannexb -an out.h264
// Decoder: specify the H264 (AVC) format
MediaCodec mediaCodec = MediaCodec.createDecoderByType("video/avc");
// The format to decode: H264 plus the video's width and height (placeholders here; use the real size)
MediaFormat mediaFormat = MediaFormat.createVideoFormat("video/avc", 100, 100);
// Set the color format
mediaFormat.setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);
// Set the frame rate
mediaFormat.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
// To render straight onto a SurfaceView, pass its Surface; to consume the decoded data yourself, pass null
mediaCodec.configure(mediaFormat, surface, null, 0);
mediaCodec.start();

// Each frame must be fed to the decoder on a worker thread
@Override
public void run() {
    byte[] bytes = getH264Bytes(); // the bytes of the .h264 file (reading helper omitted)
    // The input buffers come from the underlying decoder
    ByteBuffer[] inputBuffers = mediaCodec.getInputBuffers();
    // Index where the current frame starts within the file
    int startFrameIndex = 0;
    // Total size
    int total = bytes.length;
    while (true) {
        if (total == 0 || startFrameIndex >= total) {
            // Past the end of the stream: we are done
            break;
        }
        // Find where the next frame starts; frames are split by 0x000001 / 0x00000001 start codes
        int nextFrameStart = total; // if no further start code is found, feed the rest as the last frame
        for (int i = startFrameIndex + 2; i < total - 4; i++) {
            if ((bytes[i] == 0x00 && bytes[i + 1] == 0x00 && bytes[i + 2] == 0x00 && bytes[i + 3] == 0x01)
                    || (bytes[i] == 0x00 && bytes[i + 1] == 0x00 && bytes[i + 2] == 0x01)) {
                nextFrameStart = i;
                break;
            }
        }
        MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
        int index = mediaCodec.dequeueInputBuffer(10_000);
        if (index >= 0) {
            ByteBuffer byteBuffer = inputBuffers[index];
            byteBuffer.clear();
            byteBuffer.put(bytes, startFrameIndex, nextFrameStart - startFrameIndex);
            mediaCodec.queueInputBuffer(index, 0, nextFrameStart - startFrameIndex, 0, 0);
            startFrameIndex = nextFrameStart;
        } else {
            // No input buffer available yet; try again
            continue;
        }
        int outIndex = mediaCodec.dequeueOutputBuffer(info, 10_000);
        if (outIndex >= 0) {
            try {
                // Crude pacing; a real player schedules frames by presentation timestamp
                Thread.sleep(10);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            // true: render this buffer to the Surface passed into configure()
            mediaCodec.releaseOutputBuffer(outIndex, true);
        }
    }
}
Encoding a screen recording into H264
public class MainActivity2 extends AppCompatActivity {
    public static final String TAG = "MainActivity2";
    private MediaProjectionManager mediaProjectionManager;
    private MediaProjection mediaProjection;
    private MediaCodec mediaCodec;
    private Surface surface;

    // Screen recording through MediaProjection is only available on Android 5.0 and later
    // To edit an MP4 file you ultimately have to edit the H264 data inside it
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main2);
        mediaProjectionManager = (MediaProjectionManager) getSystemService(Context.MEDIA_PROJECTION_SERVICE);
        Intent captureIntent = mediaProjectionManager.createScreenCaptureIntent();
        startActivityForResult(captureIntent, 100);
        checkPermission();
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (requestCode == 100 && resultCode == Activity.RESULT_OK) {
            // Permission granted: get the MediaProjection used to record the screen
            mediaProjection = mediaProjectionManager.getMediaProjection(resultCode, data);
            initMediaCodec();
        }
    }

    // One frame of H264 data is one NALU unit
    private void initMediaCodec() {
        try {
            // createEncoderByType creates an encoder that turns YUV video into H264. The output also
            // uses start codes, and the first output unit is always the SPS and PPS together
            // (e.g. 00000001 67... SPS, 68... PPS, then 000001 and the I frame).
            // createDecoderByType would do the reverse: decode H264 back into YUV.
            mediaCodec = MediaCodec.createEncoderByType("video/avc");
            MediaFormat format = MediaFormat.createVideoFormat(MediaFormat.MIMETYPE_VIDEO_AVC, 540, 960);
            format.setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);
            // 15 frames per second
            format.setInteger(MediaFormat.KEY_FRAME_RATE, 15);
            // One I frame every 2 seconds
            format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 2);
            // The higher the bit rate, the clearer the video; it controls how large encoded frames get
            format.setInteger(MediaFormat.KEY_BIT_RATE, 1_200_000);
            // Create a virtual input Surface: everything drawn onto it is fed to the encoder
            surface = mediaCodec.createInputSurface();
            // null output Surface, null crypto; the flag says we are encoding
            mediaCodec.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
            new Thread() {
                @Override
                public void run() {
                    super.run();
                    mediaCodec.start();
                    // Associate the screen-recording MediaProjection with the Surface, so every
                    // recorded frame lands on the encoder's input Surface
                    // (density 1: 1 dp = 1 pixel; VIRTUAL_DISPLAY_FLAG_PUBLIC: a public virtual display)
                    mediaProjection.createVirtualDisplay("screen-codec", 540, 960, 1,
                            DisplayManager.VIRTUAL_DISPLAY_FLAG_PUBLIC, surface, null, null);
                    MediaCodec.BufferInfo bufferInfo = new MediaCodec.BufferInfo();
                    // The input side is fed by the Surface, so we only need to drain the output
                    while (true) {
                        // Wait up to 100 ms for output; bufferInfo is filled with the data's size
                        int index = mediaCodec.dequeueOutputBuffer(bufferInfo, 100_000);
                        if (index >= 0) {
                            // This buffer holds compressed H264 data: here we are encoding screen
                            // frames; a decoder made with createDecoderByType would output YUV instead
                            ByteBuffer buffer = mediaCodec.getOutputBuffer(index);
                            // The buffer belongs to the codec: read it out, then release it
                            byte[] outData = new byte[bufferInfo.size];
                            buffer.get(outData);
                            // Write a hex dump to a text file so we can inspect the stream
                            writeContent(outData);
                            // Write the raw bytes so the resulting file can be played
                            writeBytes(outData);
                            mediaCodec.releaseOutputBuffer(index, false);
                        } else {
                            // A negative index means no output is ready yet
                            Log.e(TAG, "no output available");
                        }
                    }
                }
            }.start();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public boolean checkPermission() {
        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M
                && checkSelfPermission(Manifest.permission.WRITE_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED) {
            requestPermissions(new String[]{
                    Manifest.permission.READ_EXTERNAL_STORAGE,
                    Manifest.permission.WRITE_EXTERNAL_STORAGE
            }, 1);
        }
        return false;
    }

    public void writeBytes(byte[] array) {
        FileOutputStream writer = null;
        try {
            // Open the file in append mode (the second constructor argument, true)
            writer = new FileOutputStream(Environment.getExternalStorageDirectory() + "/codec.h264", true);
            writer.write(array);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (writer != null) {
                    writer.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public String writeContent(byte[] array) {
        char[] HEX_CHAR_TABLE = {
                '0', '1', '2', '3', '4', '5', '6', '7',
                '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'
        };
        StringBuilder sb = new StringBuilder();
        for (byte b : array) {
            sb.append(HEX_CHAR_TABLE[(b & 0xf0) >> 4]);
            sb.append(HEX_CHAR_TABLE[b & 0x0f]);
        }
        Log.i(TAG, "writeContent: " + sb);
        FileWriter writer = null;
        try {
            // Open the file in append mode (the second constructor argument, true)
            writer = new FileWriter(Environment.getExternalStorageDirectory() + "/codec.txt", true);
            writer.write(sb.toString());
            writer.write("\n");
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (writer != null) {
                    writer.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return sb.toString();
    }
}
Some handy FFmpeg/ffplay commands:
- Extract H265: ffmpeg -i h265.mkv -vcodec hevc output.h265
- Extract H264: ffmpeg -i input.mp4 -c:v copy -bsf:v h264_mp4toannexb -an out.h264
- Play a raw H264 stream: ffplay -stats -f h264 out.h264
- Extract AAC audio: ffmpeg -i input.mp4 -acodec copy -vn output.aac
- Extract MP3 audio: ffmpeg -i input.mp4 -f mp3 -vn apple.mp3
- Play raw PCM: ffplay -ar 48000 -channels 2 -f f32le -i output.pcm
- Reverse video via a filter graph: ffmpeg.exe -i input.mp4 -filter_complex "[0:v]reverse[v]" -map "[v]" -preset superfast reversed.mp4
- Reverse video: ffmpeg.exe -i input.mp4 -vf reverse reversed.mp4
- Reverse audio only: ffmpeg.exe -i input.mp4 -map 0 -c:v copy -af "areverse" reversed_audio.mp4
- Reverse both video and audio: ffmpeg.exe -i input.mp4 -vf reverse -af areverse -preset superfast reversed.mp4
- Cut a video into segments:
  ffmpeg -i ./input.mp4 -vcodec copy -acodec copy -ss 00:00:00 -to 00:05:00 ./cutout1.mp4 -y
  ffmpeg -i ./input.mp4 -vcodec copy -acodec copy -ss 00:05:00 -to 00:10:00 ./cutout2.mp4 -y
  ffmpeg -i ./input.mp4 -vcodec copy -acodec copy -ss 00:10:00 -to 00:14:50 ./cutout3.mp4 -y
H265
H264's weakness is that it compresses fixed-size macroblocks one by one, and its interframe prediction has not fundamentally changed since 2003; nobody then expected resolutions to evolve this fast. At 8K, an H264 stream becomes very large.
With an H264 16×16 macroblock, recording the 16 pixels above and the 15 remaining pixels to the left (31 in total) lets the other 16×16 − 31 = 225 pixels be predicted. But H264 blocks are only ever 16×16, while H265's coding tree units go up to 64×64.
The tree is divided recursively, stopping at 4×4 blocks, which is very smart: where color changes a lot, the encoder keeps subdividing and compares against the pixels above and to the left; where there is no change, a whole 64×64 block can be predicted at once, and there are more prediction directions to choose from. The price is much heavier CPU usage. An H265 I frame is actually larger than an H264 I frame, because tree coding adds complexity and extra side information, but B and P frames come out roughly ten times smaller than H264's.
H265 adds a VPS (video parameter set) at the start of the stream, covering features like glasses-free 3D; ordinary video does not use it, but the decoder still parses it to find the offsets of the SPS and PPS.
The byte right after the start code identifies the NAL type: in an H265 stream, for example, 0x26 marks an I (IDR) frame, 0x02 a P frame, and 0x00/0x01 B frames. Note that H265 also carries patent royalties; its main home today is 4K and 8K TV.
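A hedged sketch of how those type values are read out of the header byte (the bit layouts follow the H264 and H265 NAL header definitions; the example constants are the ones mentioned above):

```java
// H264: the low 5 bits of the byte after the start code are the NAL type
// (0x67 -> 7 = SPS, 0x68 -> 8 = PPS, 0x65 -> 5 = IDR slice).
static int h264NalType(byte first) {
    return first & 0x1F;
}

// H265: bits 1..6 of the first header byte are the NAL type
// (0x26 -> 19 = IDR_W_RADL, the I-frame case mentioned above).
static int h265NalType(byte first) {
    return ((first & 0xFF) >> 1) & 0x3F;
}
```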
The CTU (coding tree unit) plays the same role in H265 that the macroblock plays in H264, except that a CTU is the root of a tree.
The entropy coding algorithm
Pixel values in H264 stay within 0-255, and most YUV and configuration values are small numbers, so H264 uses the Exponential-Golomb code, a variable-length code that gives very short codewords to small values.
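A minimal sketch of the unsigned Exp-Golomb code ue(v); the mapping is the standard one, and the string output here is only for illustration:

```java
// Unsigned Exp-Golomb ue(v): write n+1 in binary, prefixed by one zero per
// bit after its leading 1. Small values get short codes:
// 0 -> "1", 1 -> "010", 2 -> "011", 3 -> "00100", ...
static String expGolomb(int n) {
    String bits = Integer.toBinaryString(n + 1);
    return "0".repeat(bits.length() - 1) + bits; // requires Java 11+ for String.repeat
}
```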
Software for analyzing bitstreams
On macOS: Elecard HEVC Analyzer; on Windows: VideoEye.
How H264 differs from H265
1. Version
H.265 is a newer encoding standard, the upgrade of H.264. H265 demands a lot of CPU and is suited to 4K or 8K movies.
2. Reduced bit rate and storage space
Every macroblock (MB) in H.264 is fixed at 16×16 pixels, while H.265 coding units range from 8×8 up to 64×64, so an H264 stream of the same content needs more bits and more storage.
3. Quadtree block partitioning
The most important difference is that H.265 adopts a quadtree partition structure for its blocks: adaptive splitting from 64×64 down to 8×8 pixels, with a series of adaptive coding techniques such as prediction and transform built on top of that partitioning. For example, an uncomplicated 64×64 area stays one big block, while a complicated one splits into four 32×32 blocks; each of those splits again if it is still complex, and so on, ultimately down to 4×4 for prediction.
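A toy sketch of that recursive decision, assuming a simple variance test stands in for the encoder's real rate-distortion decision:

```java
import java.util.ArrayList;
import java.util.List;

public class QuadTreeDemo {
    static final double THRESHOLD = 100.0; // toy complexity threshold, not a real encoder value

    // Flat regions stay as one block; busy regions split into four quadrants,
    // recursing from 64x64 down to minSize.
    static void split(int[][] luma, int x0, int y0, int size, int minSize, List<int[]> out) {
        if (size > minSize && variance(luma, x0, y0, size) > THRESHOLD) {
            int h = size / 2;
            split(luma, x0, y0, h, minSize, out);
            split(luma, x0 + h, y0, h, minSize, out);
            split(luma, x0, y0 + h, h, minSize, out);
            split(luma, x0 + h, y0 + h, h, minSize, out);
        } else {
            out.add(new int[]{x0, y0, size}); // code this region as one unit
        }
    }

    static double variance(int[][] luma, int x0, int y0, int size) {
        double sum = 0, sumSq = 0;
        int n = size * size;
        for (int r = y0; r < y0 + size; r++)
            for (int c = x0; c < x0 + size; c++) {
                sum += luma[r][c];
                sumSq += (double) luma[r][c] * luma[r][c];
            }
        double mean = sum / n;
        return sumSq / n - mean * mean;
    }
}
```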
4. Frames
An H265 I frame is larger than an H264 I frame, since tree coding adds complexity and extra side information, but B and P frames are much smaller than H264's.
5. Intra prediction
H264 has 9 intra prediction directions; H265 has 35, which makes prediction more accurate.