A roundup of YUV formats (I420 / YUV420 / NV12 / NV21 / YUV422)

YUV (Y'CbCr) is a family of pixel formats commonly found in video encoding and still images. Unlike RGB (red-green-blue), YUV represents a pixel with a luminance (luma) component called Y, equivalent to grayscale, and two chrominance (chroma) components called U (blue projection, Cb) and V (red projection, Cr), hence the name.

With only the Y component and no UV information, a complete black-and-white (grayscale) image can still be displayed. This solved the compatibility problem between black-and-white and color analog television.

Sampling

The chroma channels (UV) can be sampled at a lower rate than the luma channel (Y) without significantly reducing perceived quality. A notation of the form “A:B:C” describes how often U and V are sampled relative to Y:

  • 4:4:4 means the chroma (UV) channels are not downsampled. Each Y component has its own set of UV components.
  • 4:2:2 means 2:1 horizontal downsampling and no vertical downsampling. Every two Y components share one set of UV components.
  • 4:2:0 means 2:1 horizontal downsampling and 2:1 vertical downsampling. Every four Y components share one set of UV components.
  • 4:1:1 means 4:1 horizontal downsampling and no vertical downsampling. Every four Y components share one set of UV components. 4:1:1 sampling is less common than the other schemes and is not discussed in detail in this article.
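The effect of each scheme on plane dimensions and storage cost can be worked out with a few lines of arithmetic. The following is a minimal Python sketch, not taken from any library: the names SUBSAMPLING, chroma_size and bits_per_pixel are hypothetical, 8-bit samples are assumed, and odd dimensions are rounded up.

```python
# Horizontal and vertical chroma downsampling factors for each notation.
SUBSAMPLING = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
    "4:1:1": (4, 1),
}


def chroma_size(width, height, scheme):
    """Return (width, height) of one chroma plane, rounding odd sizes up."""
    h_factor, v_factor = SUBSAMPLING[scheme]
    return ((width + h_factor - 1) // h_factor,
            (height + v_factor - 1) // v_factor)


def bits_per_pixel(scheme, bit_depth=8):
    """Average bits per pixel: one Y sample plus a share of one U and one V."""
    h_factor, v_factor = SUBSAMPLING[scheme]
    return bit_depth + 2 * bit_depth / (h_factor * v_factor)
```

For a 1920x1080 frame, chroma_size(1920, 1080, "4:2:0") gives (960, 540), and bits_per_pixel gives 24 for 4:4:4, 16 for 4:2:2, and 12 for both 4:2:0 and 4:1:1.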

The following figure shows how chroma is sampled under each downsampling scheme. Luma samples are represented by crosses and chroma samples by circles.

Storage format

YUV storage layouts are usually divided into planar, semi-planar and packed formats.

Planar scheme

The planar format is sometimes called Triplanar format, that is, Y, U and V components are stored in separate arrays, which is convenient for video coding.

YU12 (I420)

  • 4:2:0 format, 12 bits per pixel, 3 planes

FOURCC I420

YU12, or I420, also known as IYUV, belongs to the YUV420P family. It has three planes, storing the Y, U and V components respectively. Every four Y components share one set of UV components. The stride, width and height of the U and V planes are half those of the Y plane, so a pixel takes 12 bits, as shown below:

As the figure shows, the number of bytes per line (stride) and the height of the U and V planes are half those of the Y plane.

I420 is a common format used in audio and video development.
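The plane arrangement described above translates directly into byte offsets. The sketch below is illustrative only: the helper names i420_layout and i420_pixel are hypothetical, and it assumes an even width and height, 8-bit samples, and tightly packed rows (stride equal to width, no padding).

```python
def i420_layout(width, height):
    """Return {plane: (offset, size)} in bytes for a tightly packed I420 frame."""
    y_size = width * height
    c_size = (width // 2) * (height // 2)
    return {
        "Y": (0, y_size),
        "U": (y_size, c_size),
        "V": (y_size + c_size, c_size),
    }


def i420_pixel(buf, width, height, x, y):
    """Return the (Y, U, V) samples that apply to pixel (x, y)."""
    layout = i420_layout(width, height)
    # Each chroma sample covers a 2x2 block of luma samples.
    c_index = (y // 2) * (width // 2) + (x // 2)
    return (buf[y * width + x],
            buf[layout["U"][0] + c_index],
            buf[layout["V"][0] + c_index])
```

The total buffer size is width * height * 3 / 2 bytes, which is where the 12 bits per pixel come from.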

YV12

  • 4:2:0 format, 12 bits per pixel, 3 planes

FOURCC YV12

YV12 is almost the same as I420, except that the order of the U and V planes is swapped: the Y plane is followed by the V plane and then the U plane. The memory arrangement is as follows:
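Because only the plane order differs, converting between YV12 and I420 amounts to reordering slices of the buffer. A minimal sketch, assuming even dimensions and tightly packed rows; the function name yv12_to_i420 is hypothetical:

```python
def yv12_to_i420(buf, width, height):
    """Reorder a tightly packed YV12 frame (Y, V, U) into I420 (Y, U, V)."""
    y_size = width * height
    c_size = (width // 2) * (height // 2)
    y_plane = buf[:y_size]
    v_plane = buf[y_size:y_size + c_size]
    u_plane = buf[y_size + c_size:y_size + 2 * c_size]
    return bytes(y_plane) + bytes(u_plane) + bytes(v_plane)
```

Applying the same function to an I420 frame yields YV12, since the operation is its own inverse.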

J420

  • 4:2:0 format, 12 bits per pixel, 3 planes

J420 has the same memory layout as I420, but its luma (Y) component uses the full range (0-255) rather than the limited range (16-235, also called video range on iOS). The U and V planes are arranged exactly as in I420.
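To make the range difference concrete, the sketch below rescales a single luma sample between the two ranges. It assumes the conventional 8-bit limited range of 16-235 for luma; the function names are hypothetical, and a real converter would also handle chroma and process whole planes.

```python
def _clamp8(value):
    """Clamp an integer to the 8-bit range 0..255."""
    return max(0, min(255, value))


def luma_limited_to_full(y):
    """Map a limited-range luma sample (16..235) to full range (0..255)."""
    return _clamp8(round((y - 16) * 255 / 219))


def luma_full_to_limited(y):
    """Map a full-range luma sample (0..255) to limited range (16..235)."""
    return _clamp8(round(y * 219 / 255) + 16)
```

Limited-range black (16) maps to 0 and limited-range white (235) maps to 255. Values outside 16-235 are clamped when expanding to full range.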

IMC1

  • 4:2:0 format, 16 bits per pixel, 3 planes

FOURCC IMC1

IMC1 is similar to I420. The U and V planes are half the width and height of the Y plane, but each of their lines is padded to the same number of bytes as a line of the Y plane, so the format takes 16 bits per pixel. As shown in the figure:
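The 16 bits per pixel can be checked with a little arithmetic. The sketch below only counts bytes under the padding rule described above. It deliberately ignores any additional alignment requirements an implementation may impose on where each plane starts, and both function names are hypothetical.

```python
def imc1_plane_sizes(stride, height):
    """Return (Y, chroma, chroma) plane sizes in bytes.

    Chroma planes have half the rows, but each row occupies the full Y stride.
    """
    y_size = stride * height
    c_size = stride * (height // 2)
    return y_size, c_size, c_size


def imc1_bits_per_pixel(width, height, stride=None):
    """Average storage cost per pixel; stride defaults to the width."""
    stride = width if stride is None else stride
    total_bytes = sum(imc1_plane_sizes(stride, height))
    return total_bytes * 8 / (width * height)
```

With stride equal to width, the two chroma planes together take as many bytes as the Y plane, so the total is twice the Y plane, or 16 bits per pixel, compared with 12 for I420.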

IMC3

  • 4:2:0 format, 16 bits per pixel, 3 planes

IMC3 is almost the same as IMC1, only changing the order of the U and V planes. The memory arrangement is as follows:

I422

  • 4:2:2 format, 16 bits per pixel, 3 planes

I422 belongs to the YUV422P family. It has three planes, storing the Y, U and V components respectively. Every two Y components share one set of UV components. The stride and width of the U and V planes are half those of the Y plane, but the height is the same, so a pixel takes 16 bits, as shown in the following figure:

As the figure shows, the stride of the U and V planes is half that of the Y plane, while their height is the same as that of the Y plane.
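Compared with I420, only the chroma row count changes, which shows up in the index arithmetic. A minimal sketch, assuming an even width, 8-bit samples and tightly packed rows; the helper names are hypothetical:

```python
def i422_plane_sizes(width, height):
    """Return (Y, U, V) plane sizes in bytes for a tightly packed I422 frame."""
    y_size = width * height
    c_size = (width // 2) * height  # half the width, full height
    return y_size, c_size, c_size


def i422_pixel(buf, width, height, x, y):
    """Return the (Y, U, V) samples that apply to pixel (x, y)."""
    y_size, c_size, _ = i422_plane_sizes(width, height)
    # Each chroma sample covers two horizontally adjacent luma samples.
    c_index = y * (width // 2) + (x // 2)
    return (buf[y * width + x],
            buf[y_size + c_index],
            buf[y_size + c_size + c_index])
```

The three planes add up to width * height * 2 bytes, which matches 16 bits per pixel.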

J422

  • 4:2:2 format, 16 bits per pixel, 3 planes

J422 has the same memory layout as I422, but its luma (Y) component uses the full range (0-255) rather than the limited range (16-235, also called video range on iOS). The U and V planes are arranged exactly as in I422.

Semi-Planar format

The semi-planar format has two planes instead of three: one stores the luma (Y) component and the other stores both chroma (UV) components. These are sometimes called bi-planar formats.

NV12

  • 4:2:0 format, 12 bits per pixel, 2 planes

FOURCC NV12

NV12 belongs to the YUV420SP family. It has two planes: one stores the Y component, and the other stores the U and V components interleaved in the order U, V, U, V. Every four Y components share one set of UV components.

The UV plane has the same stride and width in bytes as the Y plane, but only half the height, so a pixel takes 12 bits. The memory arrangement is shown in the figure below:

As the figure shows, the UV plane has the same number of bytes per line as the Y plane and half its height.
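The interleaved chroma plane can be addressed directly, or split into separate U and V planes to obtain I420. The following sketch assumes even dimensions, 8-bit samples and tightly packed rows; nv12_pixel and nv12_to_i420 are hypothetical helper names.

```python
def nv12_pixel(buf, width, height, x, y):
    """Return the (Y, U, V) samples that apply to pixel (x, y)."""
    # Each chroma row holds width bytes: width // 2 interleaved (U, V) pairs.
    uv_offset = width * height + (y // 2) * width + (x // 2) * 2
    return buf[y * width + x], buf[uv_offset], buf[uv_offset + 1]


def nv12_to_i420(buf, width, height):
    """Split the interleaved UV plane into separate U and V planes."""
    y_size = width * height
    uv_plane = buf[y_size:y_size + y_size // 2]
    # Even positions hold U samples, odd positions hold V samples.
    return bytes(buf[:y_size]) + bytes(uv_plane[0::2]) + bytes(uv_plane[1::2])
```

Slicing with a step of 2 keeps the example short. A production converter would typically also honor a per-row stride.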

NV12 is one of the two video frame formats that iOS cameras (AVCaptureOutput) can output directly, the other being BGRA32 (kCVPixelFormatType_32BGRA).

On iOS, NV12 comes in two variants: full range (kCVPixelFormatType_420YpCbCr8BiPlanarFullRange, luma 0-255) and video range (kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange, luma 16-235). The two differ in the range of sample values, not in memory layout. Generally speaking, full range suits still images (photos), while video range suits video capture (camera).

NV21

  • 4:2:0 format, 12 bits per pixel, 2 planes

FOURCC NV21

NV21 also belongs to the YUV420SP family and is almost the same as NV12. The difference is that the order of U and V in the chroma plane is reversed, so the samples are interleaved as V, U, V, U. The memory arrangement is shown in the figure:

NV21 is the default output format for Android cameras.
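Since NV21 and NV12 differ only in the order of each chroma pair, one can be turned into the other by swapping adjacent bytes in the chroma plane. A minimal sketch, assuming even dimensions and tightly packed rows; the function name nv21_to_nv12 is hypothetical:

```python
def nv21_to_nv12(buf, width, height):
    """Swap every (V, U) pair in the chroma plane to (U, V)."""
    y_size = width * height
    out = bytearray(buf[:y_size + y_size // 2])
    # Exchange the even-position and odd-position bytes of the chroma plane.
    out[y_size::2], out[y_size + 1::2] = out[y_size + 1::2], out[y_size::2]
    return bytes(out)
```

The same function also converts NV12 to NV21, because swapping the pairs twice restores the original order.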

Packed format

A packed format usually has only one plane, with all luma (Y) and chroma (UV) data interleaved. It is somewhat similar to RGB formats, but in a different color space.

Packed formats are common in webcams. Hardware can be less efficient with multi-planar formats because reading one pixel requires several memory accesses, one per plane. A packed format has a single plane, so memory access is cheaper.

AYUV

  • 4:4:4 format, 32 bits per pixel

FOURCC AYUV

AYUV is a packed format in which each pixel is encoded as four consecutive bytes, arranged in memory in the order V, U, Y, A (A is the alpha channel), as shown below:
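Reading a pixel from such a buffer is a matter of unpacking four bytes in that order. A minimal sketch, assuming tightly packed rows (stride of width * 4 bytes); the function name ayuv_pixel is hypothetical:

```python
def ayuv_pixel(buf, width, x, y):
    """Return (Y, U, V, A) for pixel (x, y) of a tightly packed AYUV frame."""
    offset = (y * width + x) * 4
    # Bytes are stored in the order V, U, Y, A.
    v, u, luma, alpha = buf[offset:offset + 4]
    return luma, u, v, alpha
```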

YUYV (V422 / YUY2 / YUNV)

  • 4:2:2 format, 16 bits per pixel

FOURCC YUY2

YUYV is also commonly known as V422, YUY2 or YUNV.

YUY2 is a packed format in which every two pixels share one set of UV components. The bytes are arranged in memory in the order Y, U, Y, V, as shown in the figure below:
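Each group of four bytes therefore describes two horizontally adjacent pixels. The sketch below expands one row into per-pixel (Y, U, V) tuples. It assumes 8-bit samples and a row whose length is a multiple of four bytes; the function name yuyv_to_pixels is hypothetical.

```python
def yuyv_to_pixels(row):
    """Expand a YUYV row into a list of (Y, U, V) tuples, one per pixel."""
    pixels = []
    for i in range(0, len(row) - 3, 4):
        y0, u, y1, v = row[i:i + 4]
        # Both pixels of the pair reuse the same U and V samples.
        pixels.append((y0, u, v))
        pixels.append((y1, u, v))
    return pixels
```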

UYVY (Y422 / UYNV)

  • 4:2:2 format, 16 bits per pixel

FOURCC UYVY

UYVY is also commonly called Y422 or UYNV.

UYVY is similar to YUYV, except that each chroma byte comes before its luma byte rather than after it, so the memory order is U, Y, V, Y, as shown in the figure below:
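Converting between the two packed 4:2:2 layouts only requires swapping each adjacent pair of bytes. A minimal sketch, assuming a buffer with an even number of bytes; the function name uyvy_to_yuyv is hypothetical:

```python
def uyvy_to_yuyv(buf):
    """Turn U, Y, V, Y byte order into Y, U, Y, V by swapping adjacent bytes."""
    out = bytearray(buf)
    # Even positions (chroma in UYVY) trade places with odd positions (luma).
    out[0::2], out[1::2] = out[1::2], out[0::2]
    return bytes(out)
```

Because the swap is symmetric, the same function also converts YUYV back to UYVY.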

References

  1. Microsoft: Recommended 8-Bit YUV Formats for Video Rendering

  2. VideoLAN’s Wiki: YUV

  3. FOURCC: YUV pixel formats

  4. WWDC2011: Capturing from the Camera using AV Foundation on iOS 5