1. A summary of H264
H264 is a highly compressed digital video codec standard proposed by the Joint Video Team (JVT), formed jointly by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). H.264/AVC, AVC/H.264, H.264/MPEG-4 AVC, and MPEG-4/H.264 AVC all refer to the same standard.
2. The origin of H264
ITU-T created H261 in 1988, and ISO/IEC MPEG created MPEG-1 in 1991. Both are standard algorithms for video compression, but the two are incompatible; they are different things. As audio and video developed, the two bodies reached a consensus in 2003 and merged their work into one algorithm, H.264/MPEG-4 AVC, commonly known as H264.
3. Why compress video files (that is, why use H264)
Suppose you have 1 second of video at 30 frames per second and 1920×1080 resolution, stored as raw YUV420 (every 8 Y samples share 2 U and 2 V, so 1.5 bytes per pixel on average):
1920 × 1080 × 30 × 1.5 (YUV420 uses 1.5 bytes per pixel) / 1024 / 1024 ≈ 89 MB (the video is YUV, not RGB)
So to watch this video online, we would have to download roughly 89 MB every second. In today's world of short video, that is simply not viable.
4. Which parts of a video can be compressed
- Spatial redundancy: if a frame is, say, entirely white, we only need to store the color of a single pixel
- Temporal redundancy: in slow-moving video, two adjacent frames are nearly identical, so only one frame plus the differences needs to be stored
- Visual redundancy: human vision does not perceive every change in an image, so imperceptible detail can be dropped
- Knowledge redundancy: regular structures can be inferred from prior knowledge and background knowledge
5. H264 compression principles
- Intra-frame compression: intra-frame predictive coding that exploits the correlation between neighboring pixels within a single frame
- Inter-frame compression: macroblock-based motion-compensated predictive coding that finds, in a reference frame, the macroblock that best matches each macroblock of the current frame, and encodes only the difference
5.1 Macroblocks
Take a picture that fades from black on the left to white on the right. We do not have to store every pixel: we can divide the image into 16×16 blocks and, for each block, record only the pixel values along its top edge and left edge, plus the block's width, height, and start and stop positions; all the remaining pixels can then be calculated from those. That takes far less memory, and an image full of such predictable 16×16 blocks compresses into a much smaller volume.
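To make the idea concrete, here is a minimal sketch of the edge-prediction idea described above (illustrative only, not the real H264 algorithm): a 16×16 block is predicted horizontally from the pixels to its left, and only the residual needs to be coded.

```java
// Toy horizontal intra prediction for one 16x16 block (illustrative, not real H264).
static int[][] predictHorizontal(int[] leftColumn) { // the 16 pixels to the left of the block
    int[][] pred = new int[16][16];
    for (int row = 0; row < 16; row++)
        for (int col = 0; col < 16; col++)
            pred[row][col] = leftColumn[row]; // copy the left-edge pixel across the row
    return pred;
}

// The encoder then stores only the residual (actual minus predicted),
// which is tiny for a smooth gradient.
static int[][] residual(int[][] actual, int[][] pred) {
    int[][] diff = new int[16][16];
    for (int r = 0; r < 16; r++)
        for (int c = 0; c < 16; c++)
            diff[r][c] = actual[r][c] - pred[r][c];
    return diff;
}
```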
5.2 Key frames
- I frame: an I frame is a fully compressed picture in its own right; the decoder can decompress it into a single complete image with no other frames. Its compression ratio is roughly 1/6.
- P frame: coded against a preceding I (or P) frame, it stores only motion vectors and differences, compressing to roughly 1/20. A P frame can itself serve as a reference, provided an I frame exists before it. If a viewer has just joined a live stream at, say, ...B P B B I..., that first P frame is useless even as a reference; there must be an I frame first.
- B frame: predicted from both the I or P frame before it and the P frame after it; it stores essentially only motion vectors and compresses to roughly 1/50.
5.3 GOP
The GOP is the number of frames between two I frames. The first frame a player renders must be an I frame; if it is not, the screen stays black for a moment, because only an I frame is a complete picture. In a live broadcast, for example, a viewer who joins on a P frame sees a black screen until the first I frame arrives. For short videos, the fewer the I frames, the larger the GOP and the smaller the compressed file; but in live streaming the GOP cannot be set that large, because a large GOP raises the probability (and duration) of that initial black screen. So live streams use a relatively small GOP.
5.4 SPS and PPS
SPS (sequence parameter set) carries sequence-level configuration (resolution, frame rate, encoding profile, and so on); PPS (picture parameter set) carries picture-level coding parameters. Every I frame must be preceded by SPS and PPS units, each delimited by the start code 0x000001.
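When feeding such a stream to Android's MediaCodec (used later in this article), the SPS and PPS can also be supplied out of band as codec-specific data. A minimal sketch, assuming `sps` and `pps` already hold the two NALUs (start codes included) and `width`/`height` are known:

```java
// csd-0 and csd-1 are the documented MediaFormat keys for H264's SPS and PPS.
// sps, pps, width, and height are assumed to come from parsing the stream.
MediaFormat format = MediaFormat.createVideoFormat("video/avc", width, height);
format.setByteBuffer("csd-0", java.nio.ByteBuffer.wrap(sps)); // SPS NALU
format.setByteBuffer("csd-1", java.nio.ByteBuffer.wrap(pps)); // PPS NALU
```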
5.5 Overall structure of the bitstream
H.264 is divided into two layers: the video coding layer (VCL) and the network abstraction layer (NAL). An encoded H.264 video sequence consists of a series of NAL units (NALUs), each containing an RBSP; a raw H.264 stream is therefore N NALUs in a row. Each NALU is structured as [StartCode] [NALU Header] [NALU Payload], where the StartCode marks the start of the NALU and must be "00 00 00 01" or "00 00 01".
5.6 NAL transmission over networks
The 0x000001 delimiter belongs to the NAL layer. If the slice carried by a NALU begins a new frame, the 4-byte start code 0x00000001 is used; otherwise the 3-byte 0x000001 is used. The NAL header contains forbidden_bit, nal_reference_bit (priority), and nal_unit_type (type). Emulation prevention ("unshelling"): so that the NALU body can never mimic a start code, the encoder inserts a 0x03 byte after every two consecutive zero bytes; when decoding, each such 0x03 is deleted.
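A minimal sketch of the decoder side of that rule, removing the emulation-prevention bytes from a NALU body:

```java
import java.io.ByteArrayOutputStream;

// Strip emulation-prevention bytes: drop every 0x03 that follows two 0x00 bytes.
static byte[] stripEmulationPrevention(byte[] nalu) {
    ByteArrayOutputStream out = new ByteArrayOutputStream(nalu.length);
    int zeros = 0;
    for (byte b : nalu) {
        if (zeros >= 2 && b == 0x03) {
            zeros = 0;   // this 0x03 was inserted by the encoder; skip it
            continue;
        }
        out.write(b);
        zeros = (b == 0x00) ? zeros + 1 : 0;
    }
    return out.toByteArray();
}
```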
6. Android's low-level video codec: the DSP chip
Frames do not leave the codec in display order: incoming frames are cached in the codec's buffer and reordered, because a frame that others depend on must come out first (a B frame, for example, has to wait for the later P frame it references). The order of the H264 bitstream is therefore generally different from the video's playback order.
Why video encoding uses YUV instead of RGB
- RGB: RGB is defined from the physics of colored light. Red, green, and blue lights overlap and mix, and the brightness of the mix equals the sum of the individual brightnesses: the more light mixed in, the brighter the result (additive mixing). RGB24 means R, G, and B get 8 bits each, i.e. 3 bytes per pixel:
1280 × 720 × 3 (3 bytes per pixel) / 1024 / 1024 ≈ 2.64 MB per frame
- YUV works by separating luminance from chrominance, because the human eye is far more sensitive to brightness than to color. Of the three letters, "Y" is luminance (luma), i.e. the grayscale value; "U" and "V" are chrominance (chroma), which describe a pixel's color and saturation. YUV was designed to optimize the transmission of color video signals: compared with RGB, which must transmit three independent signals simultaneously, YUV needs far less bandwidth:
1280 × 720 × 2 (2 bytes per pixel in YUV 4:2:2) / 1024 / 1024 ≈ 1.76 MB per frame, so YUV 4:2:2 already saves 1/3 compared with RGB24.
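The arithmetic above comes down to bytes per pixel: RGB24 uses 3, YUV 4:2:2 uses 2, and YUV 4:2:0 uses 1.5. A small sketch reproducing the figures used in this article:

```java
// Raw (uncompressed) video sizes are just width x height x bytes-per-pixel.
public class RawSize {
    static double frameMB(int w, int h, double bytesPerPixel) {
        return w * h * bytesPerPixel / 1024 / 1024;
    }

    public static void main(String[] args) {
        System.out.printf("720p RGB24 frame:   %.2f MB%n", frameMB(1280, 720, 3)); // ~2.64
        System.out.printf("720p YUV422 frame:  %.2f MB%n", frameMB(1280, 720, 2)); // ~1.76
        // 1 second of 1080p at 30 fps in YUV420, as computed earlier (~89 MB)
        System.out.printf("1080p30 YUV420, 1s: %.0f MB%n", frameMB(1920, 1080, 1.5) * 30);
    }
}
```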
Using MediaCodec to play H264 data on Android
MediaCodec is the interface Android provides for accessing the low-level audio/video codec, which is the DSP chip; all apps share that one chip. We can use FFmpeg to extract the H264 stream from a video with the following command:
ffmpeg -i demo.mp4 -c:v copy -bsf:v h264_mp4toannexb -an out.h264
// Decoder: specify the H264 (AVC) format
MediaCodec mediaCodec = MediaCodec.createDecoderByType("video/avc");
// The format to decode: H264 plus the video's width and height (placeholders here; use the real size)
MediaFormat mediaFormat = MediaFormat.createVideoFormat("video/avc", 100, 100);
// Set the color format
mediaFormat.setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);
// Set the frame rate
mediaFormat.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
// To render straight onto a SurfaceView, pass its Surface; to consume the decoded data yourself, pass null
mediaCodec.configure(mediaFormat, surface, null, 0);
mediaCodec.start();

// Each frame must be fed to the decoder on a worker thread
@Override
public void run() {
    byte[] bytes = getH264Bytes(); // the bytes of the .h264 file (reading helper omitted)
    // The input buffers come from the underlying decoder
    ByteBuffer[] inputBuffers = mediaCodec.getInputBuffers();
    // Index where the current frame starts within the file
    int startFrameIndex = 0;
    // Total size
    int total = bytes.length;
    while (true) {
        if (total == 0 || startFrameIndex >= total) {
            // Past the end of the stream: we are done
            break;
        }
        // Find where the next frame starts; frames are split by 0x000001 / 0x00000001 start codes
        int nextFrameStart = total; // if no further start code is found, feed the rest as the last frame
        for (int i = startFrameIndex + 2; i < total - 4; i++) {
            if ((bytes[i] == 0x00 && bytes[i + 1] == 0x00 && bytes[i + 2] == 0x00 && bytes[i + 3] == 0x01)
                    || (bytes[i] == 0x00 && bytes[i + 1] == 0x00 && bytes[i + 2] == 0x01)) {
                nextFrameStart = i;
                break;
            }
        }
        MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
        int index = mediaCodec.dequeueInputBuffer(10_000);
        if (index >= 0) {
            ByteBuffer byteBuffer = inputBuffers[index];
            byteBuffer.clear();
            byteBuffer.put(bytes, startFrameIndex, nextFrameStart - startFrameIndex);
            mediaCodec.queueInputBuffer(index, 0, nextFrameStart - startFrameIndex, 0, 0);
            startFrameIndex = nextFrameStart;
        } else {
            // No input buffer available yet; try again
            continue;
        }
        int outIndex = mediaCodec.dequeueOutputBuffer(info, 10_000);
        if (outIndex >= 0) {
            try {
                // Crude pacing; a real player schedules frames by presentation timestamp
                Thread.sleep(10);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            // true: render this buffer to the Surface passed into configure()
            mediaCodec.releaseOutputBuffer(outIndex, true);
        }
    }
}
Encoding a screen recording into H264
public class MainActivity2 extends AppCompatActivity {
    public static final String TAG = "MainActivity2";
    private MediaProjectionManager mediaProjectionManager;
    private MediaProjection mediaProjection;
    private MediaCodec mediaCodec;
    private Surface surface;

    // Screen recording through MediaProjection is only available on Android 5.0 and later
    // To edit an MP4 file you ultimately have to edit the H264 data inside it
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main2);
        mediaProjectionManager = (MediaProjectionManager) getSystemService(Context.MEDIA_PROJECTION_SERVICE);
        Intent captureIntent = mediaProjectionManager.createScreenCaptureIntent();
        startActivityForResult(captureIntent, 100);
        checkPermission();
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (requestCode == 100 && resultCode == Activity.RESULT_OK) {
            // Permission granted: get the MediaProjection used to record the screen
            mediaProjection = mediaProjectionManager.getMediaProjection(resultCode, data);
            initMediaCodec();
        }
    }

    // One frame of H264 data is one NALU unit
    private void initMediaCodec() {
        try {
            // createEncoderByType creates an encoder that turns YUV video into H264. The output also
            // uses start codes, and the first output unit is always the SPS and PPS together
            // (e.g. 00000001 67... SPS, 68... PPS, then 000001 and the I frame).
            // createDecoderByType would do the reverse: decode H264 back into YUV.
            mediaCodec = MediaCodec.createEncoderByType("video/avc");
            MediaFormat format = MediaFormat.createVideoFormat(MediaFormat.MIMETYPE_VIDEO_AVC, 540, 960);
            format.setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);
            // 15 frames per second
            format.setInteger(MediaFormat.KEY_FRAME_RATE, 15);
            // One I frame every 2 seconds
            format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 2);
            // The higher the bit rate, the clearer the video; it controls how large encoded frames get
            format.setInteger(MediaFormat.KEY_BIT_RATE, 1_200_000);
            // Create a virtual input Surface: everything drawn onto it is fed to the encoder
            surface = mediaCodec.createInputSurface();
            // null output Surface, null crypto; the flag says we are encoding
            mediaCodec.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
            new Thread() {
                @Override
                public void run() {
                    super.run();
                    mediaCodec.start();
                    // Associate the screen-recording MediaProjection with the Surface, so every
                    // recorded frame lands on the encoder's input Surface
                    // (density 1: 1 dp = 1 pixel; VIRTUAL_DISPLAY_FLAG_PUBLIC: a public virtual display)
                    mediaProjection.createVirtualDisplay("screen-codec", 540, 960, 1,
                            DisplayManager.VIRTUAL_DISPLAY_FLAG_PUBLIC, surface, null, null);
                    MediaCodec.BufferInfo bufferInfo = new MediaCodec.BufferInfo();
                    // The input side is fed by the Surface, so we only need to drain the output
                    while (true) {
                        // Wait up to 100 ms for output; bufferInfo is filled with the data's size
                        int index = mediaCodec.dequeueOutputBuffer(bufferInfo, 100_000);
                        if (index >= 0) {
                            // This buffer holds compressed H264 data: here we are encoding screen
                            // frames; a decoder made with createDecoderByType would output YUV instead
                            ByteBuffer buffer = mediaCodec.getOutputBuffer(index);
                            // The buffer belongs to the codec: read it out, then release it
                            byte[] outData = new byte[bufferInfo.size];
                            buffer.get(outData);
                            // Write a hex dump to a text file so we can inspect the stream
                            writeContent(outData);
                            // Write the raw bytes so the resulting file can be played
                            writeBytes(outData);
                            mediaCodec.releaseOutputBuffer(index, false);
                        } else {
                            // A negative index means no output is ready yet
                            Log.e(TAG, "no output available");
                        }
                    }
                }
            }.start();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public boolean checkPermission() {
        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M
                && checkSelfPermission(Manifest.permission.WRITE_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED) {
            requestPermissions(new String[]{
                    Manifest.permission.READ_EXTERNAL_STORAGE,
                    Manifest.permission.WRITE_EXTERNAL_STORAGE
            }, 1);
        }
        return false;
    }

    public void writeBytes(byte[] array) {
        FileOutputStream writer = null;
        try {
            // Open the file in append mode (the second constructor argument, true)
            writer = new FileOutputStream(Environment.getExternalStorageDirectory() + "/codec.h264", true);
            writer.write(array);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (writer != null) {
                    writer.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public String writeContent(byte[] array) {
        char[] HEX_CHAR_TABLE = {
                '0', '1', '2', '3', '4', '5', '6', '7',
                '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'
        };
        StringBuilder sb = new StringBuilder();
        for (byte b : array) {
            sb.append(HEX_CHAR_TABLE[(b & 0xf0) >> 4]);
            sb.append(HEX_CHAR_TABLE[b & 0x0f]);
        }
        Log.i(TAG, "writeContent: " + sb);
        FileWriter writer = null;
        try {
            // Open the file in append mode (the second constructor argument, true)
            writer = new FileWriter(Environment.getExternalStorageDirectory() + "/codec.txt", true);
            writer.write(sb.toString());
            writer.write("\n");
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (writer != null) {
                    writer.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return sb.toString();
    }
}
Some handy FFmpeg/ffplay commands:
- Extract H265: ffmpeg -i h265.mkv -vcodec hevc output.h265
- Extract H264: ffmpeg -i input.mp4 -c:v copy -bsf:v h264_mp4toannexb -an out.h264
- Play a raw H264 stream: ffplay -stats -f h264 out.h264
- Extract AAC audio: ffmpeg -i input.mp4 -acodec copy -vn output.aac
- Extract MP3 audio: ffmpeg -i input.mp4 -f mp3 -vn apple.mp3
- Play raw PCM: ffplay -ar 48000 -channels 2 -f f32le -i output.pcm
- Reverse video via a filter graph: ffmpeg.exe -i input.mp4 -filter_complex "[0:v]reverse[v]" -map "[v]" -preset superfast reversed.mp4
- Reverse video: ffmpeg.exe -i input.mp4 -vf reverse reversed.mp4
- Reverse audio only: ffmpeg.exe -i input.mp4 -map 0 -c:v copy -af "areverse" reversed_audio.mp4
- Reverse both video and audio: ffmpeg.exe -i input.mp4 -vf reverse -af areverse -preset superfast reversed.mp4
- Cut a video into segments:
  ffmpeg -i ./input.mp4 -vcodec copy -acodec copy -ss 00:00:00 -to 00:05:00 ./cutout1.mp4 -y
  ffmpeg -i ./input.mp4 -vcodec copy -acodec copy -ss 00:05:00 -to 00:10:00 ./cutout2.mp4 -y
  ffmpeg -i ./input.mp4 -vcodec copy -acodec copy -ss 00:10:00 -to 00:14:50 ./cutout3.mp4 -y
H265
H264's weakness is that it compresses fixed-size macroblocks one by one, and its interframe prediction has not fundamentally changed since 2003; nobody then expected resolutions to evolve this fast. At 8K, an H264 stream becomes very large.
With an H264 16×16 macroblock, recording the 16 pixels above and the 15 remaining pixels to the left (31 in total) lets the other 16×16 − 31 = 225 pixels be predicted. But H264 blocks are only ever 16×16, while H265's coding tree units go up to 64×64.
The tree is divided recursively, stopping at 4×4 blocks, which is very smart: where color changes a lot, the encoder keeps subdividing and compares against the pixels above and to the left; where there is no change, a whole 64×64 block can be predicted at once, and there are more prediction directions to choose from. The price is much heavier CPU usage. An H265 I frame is actually larger than an H264 I frame, because tree coding adds complexity and extra side information, but B and P frames come out roughly ten times smaller than H264's.
H265 adds a VPS (video parameter set) at the start of the stream, covering features like glasses-free 3D; ordinary video does not use it, but the decoder still parses it to find the offsets of the SPS and PPS.
The byte right after the start code identifies the NAL type: in an H265 stream, for example, 0x26 marks an I (IDR) frame, 0x02 a P frame, and 0x00/0x01 B frames. Note that H265 also carries patent royalties; its main home today is 4K and 8K TV.
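A hedged sketch of how those type values are read out of the header byte (the bit layouts follow the H264 and H265 NAL header definitions; the example constants are the ones mentioned above):

```java
// H264: the low 5 bits of the byte after the start code are the NAL type
// (0x67 -> 7 = SPS, 0x68 -> 8 = PPS, 0x65 -> 5 = IDR slice).
static int h264NalType(byte first) {
    return first & 0x1F;
}

// H265: bits 1..6 of the first header byte are the NAL type
// (0x26 -> 19 = IDR_W_RADL, the I-frame case mentioned above).
static int h265NalType(byte first) {
    return ((first & 0xFF) >> 1) & 0x3F;
}
```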
The CTU (coding tree unit) plays the same role in H265 that the macroblock plays in H264, except that a CTU is the root of a tree.
The entropy coding algorithm
Pixel values in H264 stay within 0-255, and most YUV and configuration values are small numbers, so H264 uses the Exponential-Golomb code, a variable-length code that gives very short codewords to small values.
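A minimal sketch of the unsigned Exp-Golomb code ue(v); the mapping is the standard one, and the string output here is only for illustration:

```java
// Unsigned Exp-Golomb ue(v): write n+1 in binary, prefixed by one zero per
// bit after its leading 1. Small values get short codes:
// 0 -> "1", 1 -> "010", 2 -> "011", 3 -> "00100", ...
static String expGolomb(int n) {
    String bits = Integer.toBinaryString(n + 1);
    return "0".repeat(bits.length() - 1) + bits; // requires Java 11+ for String.repeat
}
```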
Software for analyzing bitstreams
On macOS: Elecard HEVC Analyzer; on Windows: VideoEye.
How H264 differs from H265
1. Version
H.265 is a newer encoding standard, the upgrade of H.264. H265 demands a lot of CPU and is suited to 4K or 8K movies.
2. Reduced bit rate and storage space
Every macroblock (MB) in H.264 is fixed at 16×16 pixels, while H.265 coding units range from 8×8 up to 64×64, so an H264 stream of the same content needs more bits and more storage.
3. Quadtree block partitioning
The most important difference is that H.265 adopts a quadtree partition structure for its blocks: adaptive splitting from 64×64 down to 8×8 pixels, with a series of adaptive coding techniques such as prediction and transform built on top of that partitioning. For example, an uncomplicated 64×64 area stays one big block, while a complicated one splits into four 32×32 blocks; each of those splits again if it is still complex, and so on, ultimately down to 4×4 for prediction.
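A toy sketch of that recursive decision, assuming a simple variance test stands in for the encoder's real rate-distortion decision:

```java
import java.util.ArrayList;
import java.util.List;

public class QuadTreeDemo {
    static final double THRESHOLD = 100.0; // toy complexity threshold, not a real encoder value

    // Flat regions stay as one block; busy regions split into four quadrants,
    // recursing from 64x64 down to minSize.
    static void split(int[][] luma, int x0, int y0, int size, int minSize, List<int[]> out) {
        if (size > minSize && variance(luma, x0, y0, size) > THRESHOLD) {
            int h = size / 2;
            split(luma, x0, y0, h, minSize, out);
            split(luma, x0 + h, y0, h, minSize, out);
            split(luma, x0, y0 + h, h, minSize, out);
            split(luma, x0 + h, y0 + h, h, minSize, out);
        } else {
            out.add(new int[]{x0, y0, size}); // code this region as one unit
        }
    }

    static double variance(int[][] luma, int x0, int y0, int size) {
        double sum = 0, sumSq = 0;
        int n = size * size;
        for (int r = y0; r < y0 + size; r++)
            for (int c = x0; c < x0 + size; c++) {
                sum += luma[r][c];
                sumSq += (double) luma[r][c] * luma[r][c];
            }
        double mean = sum / n;
        return sumSq / n - mean * mean;
    }
}
```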
4. Frames
An H265 I frame is larger than an H264 I frame, since tree coding adds complexity and extra side information, but B and P frames are much smaller than H264's.
5. Intra prediction
H264 has 9 intra prediction directions; H265 has 35, which makes prediction more accurate.