Contents

  1. Principle of the video player
  2. Audio basics
  3. Video basics
  4. References
  5. Takeaways

This article starts a hands-on series on learning FFmpeg. To begin, let’s first go over the basics of audio and video.

1. Principle of the video player

Image source: [Making a video player based on FFmpeg+SDL – Lei Xiaohua]

The diagram gives a very clear picture of the video playback process: demuxing (unpacking the container) -> compressed audio/video data -> audio/video decoding -> raw audio/video data (PCM and YUV) -> audio/video synchronization -> audio playback and video rendering.

2. Audio basics

Converting sound from an analog signal to a digital signal takes three steps: sampling, quantization, and encoding.

The human ear can hear sound in the frequency range of 20 Hz to 20 kHz. According to the Nyquist sampling theorem, a signal must be sampled at more than twice its highest frequency, so the sampling rate is generally 44100 Hz (slightly more than 20 kHz x 2), that is, 44100 samples per second.

Each sample is then quantized to a binary number, generally 8, 16, or 32 bits wide.
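The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1 kHz sine wave stands in for the analog signal, and the function name `sample_and_quantize` is made up for this example.

```python
import math
import struct

def sample_and_quantize(freq_hz, sample_rate, duration_s, bits=16):
    """Sample a sine wave and quantize each sample to a signed integer."""
    max_amp = 2 ** (bits - 1) - 1              # 32767 for 16-bit samples
    n_samples = round(sample_rate * duration_s)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                    # sampling: time of the n-th sample
        value = math.sin(2 * math.pi * freq_hz * t)   # "analog" value in [-1, 1]
        samples.append(round(value * max_amp))        # quantization to an integer
    return samples

# 10 ms of a 1 kHz tone at 44100 Hz, 16 bits per sample
pcm = sample_and_quantize(freq_hz=1000, sample_rate=44100, duration_s=0.01)

# Encoding: pack the integers as little-endian 16-bit values (raw PCM bytes)
pcm_bytes = struct.pack(f"<{len(pcm)}h", *pcm)
print(len(pcm), "samples,", len(pcm_bytes), "bytes")
```

Each 16-bit sample takes 2 bytes, so the byte count is twice the sample count.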

The raw (uncompressed) data format for audio is Pulse Code Modulation (PCM). Describing a piece of PCM data requires the quantization format (sample format), the sampling rate, and the number of channels. Let’s use common values to calculate the bit rate and storage space. With a 16-bit (2-byte) quantization format, a 44100 Hz sampling rate, and 2 channels, the bit rate is 2 x 44100 x 2 x 8 / 1024 = 1378 kbps (taking 1 kb as 1024 bits; with 1 kb = 1000 bits it is 1411.2 kbps). For 4 minutes of audio, the file size is 1378 x 60 x 4 / 8 / 1024 = 40 MB.
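The same arithmetic can be checked with a short script. The helper names below are hypothetical and exist only for this sketch.

```python
def pcm_bitrate_bps(sample_rate, bits_per_sample, channels):
    """Bit rate of uncompressed PCM in bits per second."""
    return sample_rate * bits_per_sample * channels

def pcm_size_bytes(sample_rate, bits_per_sample, channels, seconds):
    """Storage needed for `seconds` of uncompressed PCM, in bytes."""
    return pcm_bitrate_bps(sample_rate, bits_per_sample, channels) * seconds // 8

bps = pcm_bitrate_bps(44100, 16, 2)
size = pcm_size_bytes(44100, 16, 2, 4 * 60)

print(f"bit rate: {bps / 1024:.1f} kbps (1024-based), {bps / 1000:.1f} kbps (1000-based)")
print(f"4 minutes: {size / 1024 / 1024:.1f} MB")
```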

This is still relatively large. To reduce storage space and transmission traffic, the audio needs to be compressed. Common formats include MP3, AAC, and WAV. Of these, WAV simply adds a 44-byte header in front of the PCM data, so it is lossless but not actually compressed. MP3 and AAC are lossy. AAC in particular achieves a higher compression ratio and is generally used for audio encoded at bit rates below 128 kbit/s.
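The 44-byte WAV header can be observed with Python's standard `wave` module. This is a minimal sketch, assuming plain 16-bit stereo PCM; `pcm_to_wav_bytes` is a name invented for the example, and the input is just silence.

```python
import io
import wave

def pcm_to_wav_bytes(pcm_bytes, sample_rate=44100, channels=2, sample_width=2):
    """Wrap raw PCM bytes in a WAV container and return the file contents."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # bytes per sample (2 = 16 bits)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)       # the PCM payload is stored unchanged
    return buf.getvalue()

# 1000 stereo frames of 16-bit silence = 4000 bytes of PCM
pcm = bytes(1000 * 2 * 2)
wav_bytes = pcm_to_wav_bytes(pcm)

header_size = len(wav_bytes) - len(pcm)
print("header size:", header_size)
print("payload unchanged:", wav_bytes[header_size:] == pcm)
```

The payload after the header is byte-for-byte identical to the input, which is why WAV is lossless without saving any space.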

Audio sample data can be viewed in Adobe Audition.

Zooming in on the audio waveform shows that it is made up of individual sample points in time.

3. Basic knowledge of video

3.1 RGB and YUV

Video is made up of images, and an image can be represented as RGB pixels. On a phone with a 1080×1920 screen, a full-screen RGBA8888 image takes 1080 x 1920 x 4 = 8,294,400 bytes, about 7.9 MB. In RGB565 it takes 1080 x 1920 x 2 = 4,147,200 bytes, about 4 MB. This is also the size the bitmap occupies in memory, so a single image of raw data is quite large. Common image compression formats are JPEG and PNG. JPEG can shrink an image by up to about 7 times.
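The memory footprint of an uncompressed bitmap is just width x height x bytes per pixel. A small sketch, with `bitmap_bytes` as a hypothetical helper name:

```python
def bitmap_bytes(width, height, bytes_per_pixel):
    """Memory needed for one uncompressed bitmap, in bytes."""
    return width * height * bytes_per_pixel

rgba8888 = bitmap_bytes(1080, 1920, 4)   # 8 bits each for R, G, B, A
rgb565 = bitmap_bytes(1080, 1920, 2)     # 5 + 6 + 5 bits packed into 2 bytes

for name, size in (("RGBA8888", rgba8888), ("RGB565", rgb565)):
    print(f"{name}: {size} bytes = {size / 1024 / 1024:.2f} MB")
```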

But this kind of compression cannot be applied directly to video, because video has to consider not only intra-frame coding but also inter-frame coding. Raw video data is more often represented as YUV. Compared with RGB, its biggest advantage is that it needs much less bandwidth (for example, YUV420P averages 1.5 bytes per pixel, versus 3 bytes per pixel for RGB24).
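To see where the saving comes from, the sketch below computes the plane sizes of one YUV420P frame: a full-resolution Y plane plus U and V planes subsampled by 2 in both directions. The function name is made up for illustration.

```python
def yuv420p_plane_sizes(width, height):
    """Return the byte sizes of the Y, U and V planes of one YUV420P frame."""
    y_size = width * height                  # one luma byte per pixel
    chroma_w = (width + 1) // 2              # chroma is halved horizontally...
    chroma_h = (height + 1) // 2             # ...and vertically (rounded up)
    uv_size = chroma_w * chroma_h
    return y_size, uv_size, uv_size

y, u, v = yuv420p_plane_sizes(1080, 1920)
total = y + u + v
print("Y, U, V plane sizes:", y, u, v)
print("bytes per pixel:", total / (1080 * 1920))
```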

Early black-and-white TV carried only Y, with no U or V, so the YUV color model stays compatible with the older black-and-white signal. Y is the luminance (brightness), and U and V are the chrominance, which together describe hue and saturation. They are stored as Cb and Cr: Cr is the difference between the red component of the RGB input signal and its luminance, and Cb is the difference between the blue component and the luminance.
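The relationship between RGB and Y, Cb, Cr can be written out directly. The sketch below assumes the full-range BT.601 coefficients used by JPEG/JFIF; video streams often use other matrices or limited-range values, so treat the exact numbers as one common choice rather than the only one.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range BT.601 YCbCr (JFIF style)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b    # luminance: weighted sum of R, G, B
    cb = 128 + 0.564 * (b - y)               # blue-difference chroma
    cr = 128 + 0.713 * (r - y)               # red-difference chroma

    def clamp(x):
        return max(0, min(255, round(x)))

    return clamp(y), clamp(cb), clamp(cr)

gray = rgb_to_ycbcr(128, 128, 128)   # no color: both chroma values sit at the midpoint
red = rgb_to_ycbcr(255, 0, 0)        # red pushes Cr above the midpoint
blue = rgb_to_ycbcr(0, 0, 255)       # blue pushes Cb above the midpoint
print(gray, red, blue)
```

A gray pixel carries all of its information in Y, which is why a black-and-white receiver can ignore the chroma channels.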

Several tools are commonly used to inspect video files. MP4Info gives a very intuitive view of the structure of an MP4 file.

Elecard StreamEye makes it easy to view I, P, and B frame information when analyzing encoded video data.

H264Visa can show the YUV data corresponding to each frame.

FFmpeg itself also provides the very powerful ffprobe, which can analyze both video and audio data. We will cover it in the next article, when we learn the common FFmpeg commands.

3.2 Encapsulation Format

The job of an encapsulation (container) format is to store the video and audio bit streams together in one file according to a defined layout. Common video container formats include MP4, MOV, AVI, and TS. A container can be analyzed with Elecard Format Analyzer or ffprobe.

3.3 IPB frames, GOP, DTS, PTS

An I-frame, also known as a key frame, can be regarded as a single image after compression. Its compression ratio is similar to that of JPEG, because I-frame compression removes only spatial redundancy. P-frames and B-frames also take into account the redundancy between frames in the image sequence (a P-frame is predicted from earlier frames, a B-frame from both earlier and later frames), and the compression ratio can then reach 50:1.

A GOP (Group of Pictures) is the group of frames from one I-frame up to the next. You can specify the number of frames between two I-frames by setting gop_size.
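As a rough illustration of the grouping, the sketch below splits a sequence of frame types into GOPs, starting a new group at every I-frame. The frame sequence and the function name are invented for the example; this is not how FFmpeg represents frames internally.

```python
def split_into_gops(frame_types):
    """Group a sequence of frame types so that each group starts at an I-frame."""
    gops = []
    for frame_type in frame_types:
        if frame_type == "I" or not gops:
            gops.append([])              # an I-frame opens a new GOP
        gops[-1].append(frame_type)
    return gops

# A stream encoded with one I-frame every 4 frames
gops = split_into_gops(list("IPPPIPPPIPPP"))
for index, gop in enumerate(gops):
    print(index, "".join(gop), "length", len(gop))
```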

The Decoding Time Stamp (DTS) tells the decoder when to decode a frame, and the Presentation Time Stamp (PTS) is used to synchronize and output audio and video once frames have been decoded. When there are no B-frames, the DTS order and the PTS order are the same.
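The difference between the two timestamps shows up as soon as B-frames are present. In the sketch below, the frames and their timestamps are made-up values in units of one frame duration; a B-frame can only be decoded after the later P-frame it references, so decode order and presentation order differ.

```python
from collections import namedtuple

Frame = namedtuple("Frame", ["kind", "dts", "pts"])

def presentation_order(frames_in_decode_order):
    """Reorder decoded frames by PTS, as a player does before display."""
    return sorted(frames_in_decode_order, key=lambda frame: frame.pts)

# Decode order (sorted by DTS): the P-frame is decoded before the B-frames
with_b = [
    Frame("I", dts=0, pts=1),
    Frame("P", dts=1, pts=4),
    Frame("B", dts=2, pts=2),
    Frame("B", dts=3, pts=3),
]
shown = [frame.kind for frame in presentation_order(with_b)]
print("decode order: ", [frame.kind for frame in with_b])
print("display order:", shown)

# Without B-frames the two orders are identical
without_b = [Frame("I", 0, 0), Frame("P", 1, 1), Frame("P", 2, 2)]
same_order = presentation_order(without_b) == without_b
print("same order without B-frames:", same_order)
```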

4. References

“Advanced Guide to Audio and Video Development”
“Making a video player based on FFmpeg+SDL”, by Lei Xiaohua

5. Takeaways

  1. Understood the principle and process of video playback
  2. Understood the audio basics: sampling, quantization, encoding, and compression
  3. Understood the video basics: raw data in RGB and YUV, I/P/B frames, GOP, DTS, and PTS

Thank you for reading.

In the next article we will practice the common FFmpeg commands. You are welcome to follow the public account “Audio and video development journey” to learn and grow together.

Comments and discussion are welcome.