When it comes to audio and video, everyone has watched movies (video) and listened to music (audio), and at the very least knows that MP4 is a video file and MP3 is an audio file.
What are the properties of an audio or video file? Taking video as an example, we can view the information of a media file with the `ffmpeg -i` command.
```
$ ffmpeg -i r1ori.mp4
ffmpeg version 4.1 Copyright (c) 2000-2018 the FFmpeg developers
  built with Apple LLVM version 10.0.0 (clang-1000.10.44.4)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.1 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/jdk1.8.0_251.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/jdk1.8.0_251.jdk/Contents/Home/include/darwin' --host-ldflags= --enable-ffplay --enable-gpl --enable-libmp3lame --enable-libopus --enable-libsnappy --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-chromaprint --enable-frei0r --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfdk-aac --enable-libfontconfig --enable-libfreetype --enable-libgme --enable-libgsm --enable-libmodplug --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-librsvg --enable-librtmp --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtesseract --enable-libtwolame --enable-libvidstab --enable-libwavpack --enable-libwebp --enable-libzmq --enable-opencl --enable-openssl --enable-videotoolbox --enable-libopenjpeg --disable-decoder=jpeg2000 --extra-cflags=-I/usr/local/Cellar/openjpeg/2.3.0/include/openjpeg-2.3 --enable-nonfree
  libavutil      56. 22.100 / 56. 22.100
  libavcodec     58. 35.100 / 58. 35.100
  libavformat    58. 20.100 / 58. 20.100
  libavdevice    58.  5.100 / 58.  5.100
  libavfilter     7. 40.101 /  7. 40.101
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  3.100 /  5.  3.100
  libswresample   3.  3.100 /  3.  3.100
  libpostproc    55.  3.100 / 55.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'r1ori.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.20.100
  Duration: 00:00:58.53, start: 0.000000, bitrate: 1870 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 544x960, 1732 kb/s, 29.83 fps, 29.83 tbr, 11456 tbn, 59.67 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 129 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
```
Besides the video's meta information, the output also includes the banner listing the options FFmpeg was originally compiled with. You can hide this banner by adding the -hide_banner parameter. The full command and output are shown below.
```
$ ffmpeg -i r1ori.mp4 -hide_banner
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'r1ori.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.20.100
  Duration: 00:00:58.53, start: 0.000000, bitrate: 1870 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 544x960, 1732 kb/s, 29.83 fps, 29.83 tbr, 11456 tbn, 59.67 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 129 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
At least one output file must be specified
```
Let's focus on a few of these lines:
- Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'r1ori.mp4': Input #0 denotes the first file passed in with the ffmpeg -i parameter. The index starts at 0, which means we can pass in multiple input files; in fact, FFmpeg also supports multiple output files.
- Metadata is the file's meta information.
- The Duration line shows that the video has a playback time of 58.53 seconds, a start time of 0, and an overall file bitrate of 1870 kb/s.
- Stream #0:0(und): Video: h264: this line indicates that the first stream of the file is a video stream, encoded in H264 (wrapped as avc1), with each frame stored in YUV420P format, a resolution of 544×960, a video stream bitrate of 1732 kb/s, and a frame rate of 29.83 frames per second.
- Stream #0:1(und): Audio: aac: this line indicates that the second stream of the file is an audio stream, encoded in AAC (wrapped as mp4a), using the LC profile, with a sample rate of 44.1 kHz, stereo channels, and a bitrate of 129 kb/s. (A scriptable way to read these fields is sketched right after this list.)
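If you want to read these fields in a script rather than parse the banner output, ffprobe (which ships alongside ffmpeg) can print them in a machine-friendly form. A minimal sketch, assuming the same r1ori.mp4 file:

```bash
# Show container-level info (duration, overall bitrate) and all streams
ffprobe -v error -show_format -show_streams r1ori.mp4

# Pick out just a few fields of the first video stream
ffprobe -v error -select_streams v:0 \
  -show_entries stream=codec_name,width,height,bit_rate,avg_frame_rate \
  -of default=noprint_wrappers=1 r1ori.mp4
```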
Some unfamiliar terms have started to appear, so let's introduce them one by one.
Container
In a file like the one above, the different data streams (video stream, audio stream, sometimes a subtitle stream, etc.) are encapsulated together in a single file, which we call a container. The familiar MP4, AVI, RMVB, and so on are all multimedia container formats; in general, a multimedia file's suffix is its container format.
We can think of a container as a bottle or a jar.
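Since the container is only the outer packaging, you can often swap it without touching the streams inside. A quick sketch, assuming a hypothetical input.mp4; -c copy copies the streams as-is into the new container:

```bash
# List the container formats (muxers) this FFmpeg build supports
ffmpeg -hide_banner -muxers

# Repackage an MP4 into MKV without re-encoding: only the container changes
ffmpeg -i input.mp4 -c copy output.mkv
```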
Encoding and decoding (codec)
Encoding: recording and storing video and audio according to a certain format or specification is called encoding, done by a codec. Encoding can be thought of as processing the content that goes into the container.
Common video encoding formats include H264 and H265; common audio encoding formats include MP3 and AAC.
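To see which of these your own FFmpeg build can handle, the following commands list them; the exact output depends on the options the build was compiled with:

```bash
# All encoders this build can use (libx264 for H264, aac, libmp3lame, ...)
ffmpeg -hide_banner -encoders

# All decoders, i.e. formats this build can turn back into raw data
ffmpeg -hide_banner -decoders
```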
Decoding: turning compressed, encoded video and audio data back into uncompressed raw video and audio data. For example, if we want to play back a piece of audio, the audio file must first be decoded.
Software decoding: having the CPU decode the video file, driven purely by software.
Hardware decoding: to reduce the load on the CPU, the GPU takes over part of the video data that would otherwise be processed by the CPU.
Software decoding has to process a huge amount of video data, so it is very CPU-hungry; a single ffmpeg command can max out the CPU.
By comparison, hardware decoding is far more efficient, but its drawbacks are also obvious: it cannot match software decoding in how well it handles subtitles, picture quality, and so on. If I remember correctly, Qiniu Cloud (a fairly professional audio and video platform) does not yet support hardware decoding.
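As an illustration of handing this work to the GPU, FFmpeg exposes hardware decoders and encoders through -hwaccel. A sketch using macOS's VideoToolbox (which the build banner above was compiled with); input.mp4 and the 2M bitrate are placeholders:

```bash
# Decode on the GPU via VideoToolbox, then encode with the hardware H264 encoder
ffmpeg -hwaccel videotoolbox -i input.mp4 \
  -c:v h264_videotoolbox -b:v 2M -c:a copy output.mp4
```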
FFmpeg is the most common open-source software decoding library; it performs software decoding by way of codec algorithms such as H264, H265, and MPEG-4.
In today's audio and video field, FFmpeg supports almost every audio and video codec; it is remarkably powerful.
Transcoding: converting a video from one format to another, for example converting an FLV file to an MP4 file.
```bash
ffmpeg -i input.flv output.mp4
```
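The command above lets FFmpeg pick default encoders for the MP4 output. If you want to control the codecs explicitly, or only swap the container without re-encoding, two variants are sketched below; the codec choices are illustrative, not the only options:

```bash
# Transcode with explicit codecs: H264 video, AAC audio
ffmpeg -i input.flv -c:v libx264 -c:a aac output.mp4

# If the streams are already MP4-compatible, just change the container
ffmpeg -i input.flv -c copy output.mp4
```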
Bit rate
Bitrate, also known as bit rate, is the number of bits output by the encoder per second. The unit is kbps (b for bit). It is one way of measuring the size of media data.
For example, under the same compression algorithm, the higher the bitrate, the higher the quality of the video (we will discuss several different compression algorithms later).
For a compressed file, following the understanding above, a rough calculation is: bitrate = file size / duration.
For example, r1ori.mp4 is 13.7 MB in size and lasts about 59 seconds, so its bitrate is approximately (13.7 × 1024 × 8) / 59 ≈ 1900 kb/s.
Conversion: 1 MB = 8 Mb = 1024 KB = 8192 Kb.
Because the file also carries other data besides the audio and video streams, this calculation only gives an approximate bitrate.
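You can cross-check the estimate against what the container itself reports. A small sketch, reusing the r1ori.mp4 file from earlier; ffprobe prints the overall bitrate in bits per second:

```bash
# Ask the container for its overall bitrate (bits/s; ~1870000 for r1ori.mp4)
ffprobe -v error -show_entries format=bit_rate \
  -of default=noprint_wrappers=1:nokey=1 r1ori.mp4
```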
Fixed and variable bit rates
In the early years, audio was encoded at a constant bitrate (CBR); variable bitrate (VBR) came later. Constant bitrate means the encoder's output rate is fixed, which makes it hard to balance "calm scenes" against "intense scenes". Variable bitrate, by contrast, gives the encoder fine control: it spends more bits where there is more detail and the scene is intense, and fewer bits where the scene is calm. VBR therefore has the advantage when you want consistent output quality, and is the preferred choice for stored media.
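With FFmpeg and the libx264 encoder, the two strategies look roughly like this; the file names and numbers are placeholders, not recommendations:

```bash
# CBR-style: pin the rate with matching min/max rates and a buffer size
ffmpeg -i input.mp4 -c:v libx264 -b:v 1M -minrate 1M -maxrate 1M -bufsize 2M cbr.mp4

# VBR-style: let the rate vary to hold a constant quality level (CRF)
ffmpeg -i input.mp4 -c:v libx264 -crf 23 vbr.mp4
```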
Frame and frame rate
A frame is an image.
Frame rate (FPS, frames per second) is the number of frames displayed per second; you can also think of it as the number of times the picture is refreshed per second.
Anyone who plays games will have felt this: when a game lags, the picture jumps between frames and stops being smooth.
The frame rate affects the smoothness of the picture, and the higher the frame rate, the smoother the picture will be.
Because of persistence of vision (when an object moves rapidly, the image it leaves on the human eye lingers for roughly 1/24 of a second), movie video generally needs a minimum frame rate of 24, i.e. each frame is exposed for 1/24 ≈ 0.042 seconds.
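FFmpeg can resample a video's frame rate directly. A minimal sketch, with input.mp4 as a stand-in file:

```bash
# Re-encode at 24 fps; frames are dropped or duplicated to hit the target rate
ffmpeg -i input.mp4 -r 24 output_24fps.mp4
```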
Resolution
Resolution should be familiar to everyone, for example the tiers on common video sites: Blu-ray 1080P, ultra-clear 720P, and high-definition 540P.
Resolution can be understood as the size of the video frame, that is, the width and height of the picture; 720P means the picture is 720 pixels high.
Having understood bitrate and frame rate, we can see that "higher resolution means a clearer video" is not absolute; what matters more is how you balance bitrate, frame rate, and resolution against one another.
In general, what we want are videos that are both small and high-resolution: they are easy to store and they look good.
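Changing resolution is a matter of rescaling the video stream. A sketch using FFmpeg's scale filter; input.mp4 is a placeholder, and -2 asks FFmpeg to pick a width that keeps the aspect ratio (rounded to an even number):

```bash
# Downscale to 720 pixels high, keep aspect ratio, leave the audio untouched
ffmpeg -i input.mp4 -vf scale=-2:720 -c:a copy output_720p.mp4
```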
Lossy and lossless
First, what is raw audio and video data? Raw data is the data captured by audio and video devices, without any processing. Raw audio data is in PCM format, and raw video data is in YUV format.
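To get a feel for how large raw data is, you can ask FFmpeg to decode a file back out to PCM and YUV. A sketch with a placeholder input.mp4; pcm_s16le and yuv420p are common choices here, not the only ones:

```bash
# Decode the audio stream to raw PCM (16-bit little-endian samples)
ffmpeg -i input.mp4 -vn -f s16le -acodec pcm_s16le audio.pcm

# Decode the video stream to raw YUV frames (no container, no compression)
ffmpeg -i input.mp4 -an -f rawvideo -pix_fmt yuv420p video.yuv
```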
Lossy and lossless refer to whether information is lost; here the terms describe ways of compressing multimedia data. Lossy compression is also called destructive compression, which of course does not mean the data is damaged beyond decompression. For example, our common MP3 and MP4 files use lossy compression.
Take audio encoding as an example. The sound in audio comes from nature; we capture it through some technical scheme and store it according to a certain algorithm.
At the current stage we cannot perfectly reconstruct natural sound, so any audio encoding is lossy.
Some readers may have a question here: I read an article saying that raw audio data is in PCM format, so isn't PCM lossless?
In fact, PCM encoding only approaches lossless; it achieves the highest fidelity the signal allows, and so PCM is conventionally regarded as lossless.
A fair question: if I want to hear the most authentic sounds of nature, why compress at all? Mainly for two reasons:
- Raw data is too large to store easily.
- Even if stored, it is hard to transmit and requires a lot of bandwidth.
Besides, today's video compression ratios are very high; video as we know it now, such as 4K and 8K, already seems to fully meet our needs.
Multiplexer and demultiplexer
For containers (note: this is specific to containers), there are two operations we perform frequently.
Taking the audio and video data out of the container is called demuxing (unpacking), and it is done by a demuxer (demultiplexer).
Packing processed audio and video data into a container is called muxing (encapsulation), and it is done by a muxer (multiplexer).
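A concrete sketch of both operations with FFmpeg; the file names are placeholders, and -c copy means the streams themselves are not re-encoded:

```bash
# Demux: pull the raw H264 and AAC streams out of the container
ffmpeg -i input.mp4 -an -c:v copy video.h264
ffmpeg -i input.mp4 -vn -c:a copy audio.aac

# Mux: wrap the two streams back into a new MP4 container
ffmpeg -i video.h264 -i audio.aac -c copy output.mp4
```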
We will keep adding audio and video concepts to this article. If you find any concept hard to understand, leave me a message and I will collect them and expand on them.