After contact front-end audio and video, need to master a lot of audio and video and multimedia related basic knowledge. When using FFmpeg + WASM for video frame extraction, related concepts such as video frame and color coding are involved. This article will introduce color space in video frames.

Video frame

Video, as we all know, consists of a series of images changing from one image to the next in a short period of time (usually 1/24 or 1/30 of a second). These pictures are called video frames.

For video frames, in modern video technology, RGB color space or YUV color space pixel matrix is usually used to represent. In ffMPEG, we can see that the source code libavutil/pixfmt. H defines a series of pixel formats, most of which are RGB and YUV color space types.

enum AVPixelFormat {
  / /... Types of omission that are less important
  ///< planar YUV 4:2:0, 12bpp, (1 Cr & Cb sample per 2x2 Y samples)
  AV_PIX_FMT_YUV420P,

  ///< packed YUV 4:2:2, 16bpp, Y0 Cb Y1 Cr
  AV_PIX_FMT_YUYV422,

  ///< planar YUV 4:2:2, 16bpp, (1 Cr & Cb sample per 2x1 Y samples)
  AV_PIX_FMT_YUV422P,

  ///< packed YUV 4:2:2, 16bpp, Cb Y0 Cr Y1
  AV_PIX_FMT_UYVY422,

  ///< planar YUV 4:4:4, 24bpp, (1 Cr & Cb sample per 1x1 Y samples)
  AV_PIX_FMT_YUV444P,

  ///< planar YUV 4:4:0 (1 Cr & Cb sample per 1x2 Y samples)
  AV_PIX_FMT_YUV440P,

  ///< packed RGB 8:8:8, 24bpp, RGBRGB...
  AV_PIX_FMT_RGB24,
  ///< packed RGB 8:8:8, 24bpp, BGRBGR...
  AV_PIX_FMT_BGR24,
  
  ///< packed ARGB 8:8:8:8, 32bpp, ARGBARGB...
  AV_PIX_FMT_ARGB,
  ///< packed RGBA 8:8:8:8, 32bpp, RGBARGBA...
  AV_PIX_FMT_RGBA,
  ///< packed ABGR 8:8:8:8, 32bpp, ABGRABGR...
  AV_PIX_FMT_ABGR,
  ///< packed BGRA 8:8:8:8, 32bpp, BGRABGRA...
  AV_PIX_FMT_BGRA,

  ///< packed RGB 5:6:5, 16bpp, (msb) 5R 6G 5B(lsb), big-endian
  AV_PIX_FMT_RGB565BE,
  ///< packed RGB 5:6:5, 16bpp, (msb) 5R 6G 5B(lsb), little-endian
  AV_PIX_FMT_RGB565LE,
  ///< packed RGB 5:5:5, 16bpp, (msb)1X 5R 5G 5B(lsb), big-endian , X=unused/undefined
  AV_PIX_FMT_RGB555BE,
  ///< packed RGB 5:5:5, 16bpp, (msb)1X 5R 5G 5B(lsb), little-endian, X=unused/undefined
  AV_PIX_FMT_RGB555LE,

  ///< packed BGR 5:6:5, 16bpp, (msb) 5B 6G 5R(lsb), big-endian
  AV_PIX_FMT_BGR565BE,
  ///< packed BGR 5:6:5, 16bpp, (msb) 5B 6G 5R(lsb), little-endian
  AV_PIX_FMT_BGR565LE,
  ///< packed BGR 5:5:5, 16bpp, (msb)1X 5B 5G 5R(lsb), big-endian , X=unused/undefined
  AV_PIX_FMT_BGR555BE,
  ///< packed BGR 5:5:5, 16bpp, (msb)1X 5B 5G 5R(lsb), little-endian, X=unused/undefined
  AV_PIX_FMT_BGR555LE,
    
}
Copy the code

Each type of notes begins either packed or planar, followed by YUV with three numbers 4:2:0, 4:2:2, 4:4:4, and so on. What do these represent? With these questions in mind, I began to search materials and study the concepts of RGB and YUV color space correlation and pixel format.

RGB or YUV

RGB and YUV are both types of color space. RGB is one of the most widely used color systems at present, and RGB color standards are basically used on modern displays. The principle of RGB is to divide colors into red, green and blue channels, and each channel is mixed in different proportions to describe a color. YUV describes a color with a luminance component and two chromaticity components, Y for luminance and U and V for chromaticity. The biggest feature of YUV is the separation of brightness information and color information. Without color information, a complete black and white picture can still be displayed.

RGB

For front-end developers, RGB or RGBA color values are often used in CSS. The RGB format is very easy to understand, with R, G, and B representing the values of the red, green, and blue channels respectively. RGB formats can be divided into 16-bit, 24-bit and 32-bit formats, depending on the number of bits stored. You can also see 16bPP, 24BPP and 32BPP annotations in FFmpeg’s source code. (Since memory byte order is big-endian and little-endian, RGB may be expressed as BGR order, which is essentially the same)

The 16-bit format is mainly RGB555 and RGB565. RGB555 is 5 bits per channel component, leaving one unused. RGB565, as the name implies, R and B channels account for 5, G channels account for 6.

# RGB555
XRRR RRGG GGGB BBBB

# RGB565
RRRR RGGG GGGB BBBB
Copy the code

The 24-bit format and the 32-bit format are most commonly used. RGB24 means that each color channel component has 8 bits for a total of 24 bits. RGB32 means that in addition to 8 bits for each color channel component, there are 8 bits for transparent channel, also known as RGBA or ARGB, etc.

# RGB24
RRRR RRRR GGGG GGGG BBBB BBBB

# RGB32
RRRR RRRR GGGG GGGG BBBB BBBB AAAA AAAA
Copy the code

YUV

YUV is a color coding system, mainly used in video, graphics processing pipeline. Compared with RGB color space, YUV is designed to facilitate coding and transmission, reduce bandwidth occupation and information errors.

YUV coding system is the general name of Y ‘UV, YUV, YCbCr, YPbPr and other color Spaces. Due to historical relations, Y ‘UV, YUV is mainly used in color TV, used for analog signal representation. YCbCr is used for digital video, image compression and transmission, such as MPEG, JPEG. Due to the popularity of digital signals, YUV refers to YCbCr most of the time.

Conversion with RGB

For the display, display images are in RGB format, so you need to convert YUV format to RGB first.

Conversion from YUV to RGB has the formula:

R = Y + 1.13983 * V G = y-0.39465 * U-0.58060 * V B = Y + 2.03211 * UCopy the code

The formula for converting RGB to YUV:

Y = 0.299 * R + 0.587 * G + 0.114 * B U = -0.14713 * r-0.28886 * G + 0.436 * B V = 0.615 * R-0.51499 * g-0.10001 *  BCopy the code

The sampling

For a single pixel, pixel data is composed of Y/U/V channel data. However, for a whole picture, the data storage is not necessarily in order of each pixel data. In the transmission process of TV signals, due to the limitation of storage and transmission, part of the information will be reduced in signal processing to reduce the load. Based on the premise that human eyes are less sensitive to chroma than to brightness, chroma can be compressed and the impact on image expression can be minimized. YUV444, YUV422, YUV420 these YUVs are followed by numbers to represent the sampling method of YUV. The main sampling methods of YUV format are YUV 4:4:4, YUV 4:2:2 and YUV 4:2:0. (Sampling here can be simply understood as the process of converting original RGB image into YUV image)

In the sampling system of video systems, it is commonly used to represent A ratio of three: J:A:B (e.g. 4:2:2) to describe A conceptual region J pixels wide and two pixels high.

  • J: horizontal sampling reference (conceptual width of area). It’s usually 4.
  • A: Number of chromaticity samples in the first row of J pixels.
  • B: Number of additional chromaticity samples in the second row of J pixels.
YUV 4:4:4 sampling

YUV 444 sampling, also known as full sampling, means that each Y component uses a UV component, and the resulting image is the same size as the original RGB image.

YUV 4:2:2 sampling

YUV 4:2:2 means that the Y and UV components are sampled in a ratio of 2:1, with each of the two Y components sharing one UV component. So half of the pixels are 1/3 the size of the original, and the entire image is 2/3 the size of the original image.

YUV 4:2:0 sampling

YUV 4:2:0 is the most commonly used video frame format. The Y and UV components of the first row of pixels are sampled in a 2:1 ratio. The second row of pixels does not sample the UV component. Sampling schematic diagram is as follows:

Storage format

In the code comments above, the letters begin either planar or packed. Planar and Packed represent the storage format of picture data.

Packed

Packed format is simply understood that each channel component is stored continuously and alternately. RGB format is basically Packed format, because the data arrangement is RGBRGBRGBRGB… . The common storage formats of Packed mode in YUV are YUYV format and UYVY format, both of which are based on YUV 4:2:2 sampling format.

  • YUYV

    Example Y0U0Y1V0 Y2U2Y3V2. Y0 and Y1 share U0V0 components, and Y2 and Y3 share U2V2 components.

  • UYVU

    For example, the sequence U0Y0V0Y1 U2Y2V2Y3 differs from YUYV in that the UV component is placed in the front.

Planar

In the Planar format, the Y component of all pixels is stored consecutively first, then the U component, and finally the V component. A typical example is the I420 (most commonly used in video), based on the YUV 4:2:0 sampling format. Taking 4 * 4 pixels as an example, the arrangement is as follows:

Each of the four Y components shares a UV component, and the sharing relationship is shown in the figure.

Afterword.

When looking up YUV related data, I found that there are too many format types, but the principle is almost the same. It can be imagined that there is no unified standard in the development process of digital signals and various schemes are flying in the era is how chaotic.

FFmpeg provides a yuV conversion method to RGB, but a review of the source code found that it is based on CPU operations. The conversion formula of YUV and RGB can be expressed as matrix multiplication

According to the principle that all operations that can be written as matrix multiplication can be accelerated by USING GPU, the method of accelerating YUV conversion into RGB will be further studied to improve the performance when it is implemented on the business side.

reference

  1. Wikipedia – Film frame

  2. Wikipedia – pixel format

  3. Wikipedia – Chroma sampling

  4. Wikipedia – YUV

  5. Zhihu – Video and Video Frame: Basic knowledge of video and frame sorting

  6. Audio and video development – a reading of YUV sampling and format