Transforming RGBA into YUV with a shader, as covered in earlier articles:

  • OpenGL uses a shader to convert RGBA to YUYV
  • OpenGL uses a shader to convert RGBA to NV21

For readers unfamiliar with YUV images, a related link is also posted here:

  • Mastering the basic processing of YUV images

Implementing RGBA to I420 with a shader

I420 images are common in video decoding. As mentioned in the previous article, in engineering a shader is generally used to convert RGBA to YUV, so that when the image is read back with glReadPixels, the amount of transferred data is greatly reduced, which improves performance and keeps good compatibility.

Therefore, when reading OpenGL rendering results, first converting RGBA to YUV with a shader and then reading back is both efficient and convenient.

For example, YUYV is 50% smaller than RGBA, while NV21 or I420 is 62.5% smaller (only 37.5% of the RGBA size).
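
To make those percentages concrete, here is the per-frame arithmetic (an illustration of mine, assuming 8-bit channels and a 1920 x 1080 frame):

#include <cstdio>

int main()
{
    int width = 1920, height = 1080;
    printf("RGBA: %d bytes\n", width * height * 4);     // 4 bytes/pixel   -> 8294400
    printf("YUYV: %d bytes\n", width * height * 2);     // 2 bytes/pixel   -> 4147200 (50% of RGBA)
    printf("I420: %d bytes\n", width * height * 3 / 2); // 1.5 bytes/pixel -> 3110400 (37.5% of RGBA)
    return 0;
}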

Of course, there are many ways to read OpenGL rendering results; which one to use depends on the specific requirements and scenario. See the article: Which way of reading OpenGL-rendered images is strongest?

Readers familiar with the I420 format will know it well: I420 has three planes, one storing the Y component and the other two storing the U and V components respectively.

The width and height of the Y plane equal the width and height of the image, while the width and height of the U plane and the V plane are each half those of the original image. So the size of an I420 image is width * height + width * height / 4 * 2 = width * height * 1.5.
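
As a sketch (the names are my own, mirroring the NativeImage code later in this article), the three planes can be mapped onto one contiguous buffer of width * height * 3 / 2 bytes like this:

#include <cstdint>

struct I420Planes {
    uint8_t *yPlane; // width * height bytes
    uint8_t *uPlane; // (width / 2) * (height / 2) bytes
    uint8_t *vPlane; // (width / 2) * (height / 2) bytes
};

// Map a contiguous buffer of width * height * 3 / 2 bytes to the three planes.
I420Planes MapI420(uint8_t *buffer, int width, int height)
{
    I420Planes planes;
    planes.yPlane = buffer;
    planes.uPlane = buffer + width * height;
    planes.vPlane = planes.uPlane + width * height / 4;
    return planes;
}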

Note that width * height * 1.5 is also the size of the texture allocated later as the color buffer to hold the generated I420 image.

Set the size of the render buffer texture accordingly:

glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, m_RenderImage.width / 4, m_RenderImage.height * 1.5, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

The texture used to hold the generated I420 image can be abstracted into the following structure (the data in the texture is not actually arranged like this):

Why width / 4? Because an RGBA texture is used, each pixel occupies 4 bytes, while each Y value needs only 1 byte, so one RGBA pixel can store 4 Y values.
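
In other words (a sketch of mine, not code from the project), each RGBA texel of the output packs 4 consecutive Y bytes, so one row of width / 4 RGBA texels is exactly width bytes, i.e. one full row of the Y plane:

#include <cstdint>

struct RgbaTexel { uint8_t r, g, b, a; };

// One output texel in the y < 2/3 region carries 4 consecutive Y values; after
// glReadPixels(..., GL_RGBA, GL_UNSIGNED_BYTE, ...) the bytes land in memory as Y0 Y1 Y2 Y3.
RgbaTexel PackFourY(uint8_t y0, uint8_t y1, uint8_t y2, uint8_t y3)
{
    return RgbaTexel{y0, y1, y2, y3};
}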

As the texture coordinates in the figure show, in the range texture coordinate y < (2/3), one complete sampling of the whole texture is needed to generate the image of the Y plane.

In the range texture coordinate y > (2/3) and y < (5/6), the whole texture needs to be sampled again to generate the image of the U plane;

Similarly, in the range texture coordinate y > (5/6), the whole texture is sampled once more to generate the image of the V plane.

The most important thing is to set the viewport correctly: glViewport(0, 0, width / 4, height * 1.5);

Since the viewport width is set to 1/4 of the original, it can be thought of, simplifying (the reality is more involved), as taking one sample for every 4 pixels of the original image. Because every pixel is needed to generate the Y-plane image, three additional offset samples are required.

Similarly, generating the U-plane and V-plane images also requires 3 additional offset samples; the difference is that each of those offsets spans 2 pixels.

The offset needs to be set to the normalized size of one pixel: 1.0 / width. Following the schematic diagram, the sampling process is described below in units of 4 pixels for easier understanding.

In the range texture coordinate y < (2/3), one sample (plus three offset samples) of 4 RGBA pixels (R,G,B,A) generates one (Y0,Y1,Y2,Y3); by the end of sampling this whole range, a buffer of width * height bytes has been filled.

In the range texture coordinate y > (2/3) and y < (5/6), one sample (plus three offset samples, each spanning 2 pixels) covers 8 RGBA pixels (R,G,B,A) and generates one (U0,U1,U2,U3). Since the width and height of the U-plane buffer are 1/2 those of the original image, the U plane is sampled every other pixel both vertically and horizontally; by the end of this range, a buffer of width * height / 4 bytes has been filled.

In the range texture coordinate y > (5/6), one sample (plus three offset samples, each spanning 2 pixels) covers 8 RGBA pixels (R,G,B,A) and generates one (V0,V1,V2,V3). Likewise, since the width and height of the V-plane buffer are 1/2 those of the original image, sampling is again done every other pixel in both directions; by the end of this range, another buffer of width * height / 4 bytes has been filled. The CPU sketch below makes these three passes concrete.
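
Here is a minimal CPU reference of the same RGBA-to-I420 conversion (my own sketch, using the same coefficients as the shader below; the GPU version instead interleaves the work per fragment). Width and height are assumed to be even:

#include <cstdint>
#include <vector>

static uint8_t Clamp8(float v)
{
    return (uint8_t) (v < 0.0f ? 0.0f : (v > 255.0f ? 255.0f : v));
}

// rgba points to width * height * 4 bytes; returns a width * height * 3 / 2 byte I420 buffer.
std::vector<uint8_t> RgbaToI420(const uint8_t *rgba, int width, int height)
{
    std::vector<uint8_t> i420(width * height * 3 / 2);
    uint8_t *yPlane = i420.data();
    uint8_t *uPlane = yPlane + width * height;
    uint8_t *vPlane = uPlane + width * height / 4;

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            const uint8_t *p = rgba + (y * width + x) * 4;
            float r = p[0], g = p[1], b = p[2];
            // Every pixel contributes one Y: width * height bytes in total.
            yPlane[y * width + x] = Clamp8(0.299f * r + 0.587f * g + 0.114f * b);
            // U and V take one sample per 2x2 block: width * height / 4 bytes each.
            if ((y % 2 == 0) && (x % 2 == 0)) {
                int idx = (y / 2) * (width / 2) + (x / 2);
                uPlane[idx] = Clamp8(-0.147f * r - 0.289f * g + 0.436f * b + 128.0f);
                vPlane[idx] = Clamp8( 0.615f * r - 0.515f * g - 0.100f * b + 128.0f);
            }
        }
    }
    return i420;
}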

Finally, we use glReadPixels to read back the generated I420 image (note the width and height):

glReadPixels(0, 0, width / 4, height * 1.5, GL_RGBA, GL_UNSIGNED_BYTE, pBuffer);

Code implementation

The previous section discussed in detail how the shader implements RGBA to I420; below is the key implementation code.

When creating the FBO, pay attention to the size of the texture used as the color buffer: (width / 4, height * 1.5), as explained in detail above.

bool RGB2I420Sample::CreateFrameBufferObj()
{
	// Create and initialize the FBO texture
	glGenTextures(1, &m_FboTextureId);
	glBindTexture(GL_TEXTURE_2D, m_FboTextureId);
	glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
	glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
	glBindTexture(GL_TEXTURE_2D, GL_NONE);

	// Create and initialize the FBO
	glGenFramebuffers(1, &m_FboId);
	glBindFramebuffer(GL_FRAMEBUFFER, m_FboId);
	glBindTexture(GL_TEXTURE_2D, m_FboTextureId);
	glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, m_FboTextureId, 0);
	// Note the size of the texture used as the color buffer: (width / 4, height * 1.5)
	glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, m_RenderImage.width / 4, m_RenderImage.height * 1.5, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
	if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
		LOGCATE("RGB2I420Sample::CreateFrameBufferObj glCheckFramebufferStatus status != GL_FRAMEBUFFER_COMPLETE");
		return false;
	}
	glBindTexture(GL_TEXTURE_2D, GL_NONE);
	glBindFramebuffer(GL_FRAMEBUFFER, GL_NONE);
	return true;
}

The complete RGBA-to-I420 shader script:

#version 300 es
precision mediump float;
in vec2 v_texCoord;
layout(location = 0) out vec4 outColor;
uniform sampler2D s_TextureMap;
uniform float u_Offset;  // Offset of one pixel: 1.0 / width
uniform vec2 u_ImgSize;  // Image size
//Y =  0.299R + 0.587G + 0.114B
//U = -0.147R - 0.289G + 0.436B
//V =  0.615R - 0.515G - 0.100B
const vec3 COEF_Y = vec3( 0.299,  0.587,  0.114);
const vec3 COEF_U = vec3(-0.147, -0.289,  0.436);
const vec3 COEF_V = vec3( 0.615, -0.515, -0.100);
const float U_DIVIDE_LINE = 2.0 / 3.0;
const float V_DIVIDE_LINE = 5.0 / 6.0;
void main()
{
    vec2 texelOffset = vec2(u_Offset, 0.0);
    if (v_texCoord.y <= U_DIVIDE_LINE) {
        // In the range texture coordinate y < (2/3), complete one sampling of the whole texture:
        // each sample of 4 RGBA pixels (R,G,B,A) generates one (Y0,Y1,Y2,Y3);
        // a width*height buffer has been filled by the end of this range.

        vec2 texCoord = vec2(v_texCoord.x, v_texCoord.y * 3.0 / 2.0);
        vec4 color0 = texture(s_TextureMap, texCoord);
        vec4 color1 = texture(s_TextureMap, texCoord + texelOffset);
        vec4 color2 = texture(s_TextureMap, texCoord + texelOffset * 2.0);
        vec4 color3 = texture(s_TextureMap, texCoord + texelOffset * 3.0);

        float y0 = dot(color0.rgb, COEF_Y);
        float y1 = dot(color1.rgb, COEF_Y);
        float y2 = dot(color2.rgb, COEF_Y);
        float y3 = dot(color3.rgb, COEF_Y);
        outColor = vec4(y0, y1, y2, y3);
    }
    else if (v_texCoord.y <= V_DIVIDE_LINE) {
        // In the range y > (2/3) and y < (5/6), one sample (plus three offset samples)
        // covers 8 RGBA pixels (R,G,B,A) and generates one (U0,U1,U2,U3). Since the
        // U-plane buffer is 1/2 the original width and height, U is sampled every other
        // pixel in both directions; a width*height/4 buffer has been filled by the end.

        float offsetY = 1.0 / 3.0 / u_ImgSize.y;
        vec2 texCoord;
        if (v_texCoord.x <= 0.5) {
            texCoord = vec2(v_texCoord.x * 2.0, (v_texCoord.y - U_DIVIDE_LINE) * 2.0 * 3.0);
        }
        else {
            texCoord = vec2((v_texCoord.x - 0.5) * 2.0, ((v_texCoord.y - U_DIVIDE_LINE) * 2.0 + offsetY) * 3.0);
        }

        vec4 color0 = texture(s_TextureMap, texCoord);
        vec4 color1 = texture(s_TextureMap, texCoord + texelOffset * 2.0);
        vec4 color2 = texture(s_TextureMap, texCoord + texelOffset * 4.0);
        vec4 color3 = texture(s_TextureMap, texCoord + texelOffset * 6.0);

        float u0 = dot(color0.rgb, COEF_U) + 0.5;
        float u1 = dot(color1.rgb, COEF_U) + 0.5;
        float u2 = dot(color2.rgb, COEF_U) + 0.5;
        float u3 = dot(color3.rgb, COEF_U) + 0.5;
        outColor = vec4(u0, u1, u2, u3);
    }
    else {
        // In the range y > (5/6), one sample (plus three offset samples) covers
        // 8 RGBA pixels (R,G,B,A) and generates one (V0,V1,V2,V3). Likewise, since the
        // V-plane buffer is 1/2 the original width and height, sampling is every other
        // pixel in both directions; a width*height/4 buffer has been filled by the end.

        float offsetY = 1.0 / 3.0 / u_ImgSize.y;
        vec2 texCoord;
        if (v_texCoord.x <= 0.5) {
            texCoord = vec2(v_texCoord.x * 2.0, (v_texCoord.y - V_DIVIDE_LINE) * 2.0 * 3.0);
        }
        else {
            texCoord = vec2((v_texCoord.x - 0.5) * 2.0, ((v_texCoord.y - V_DIVIDE_LINE) * 2.0 + offsetY) * 3.0);
        }

        vec4 color0 = texture(s_TextureMap, texCoord);
        vec4 color1 = texture(s_TextureMap, texCoord + texelOffset * 2.0);
        vec4 color2 = texture(s_TextureMap, texCoord + texelOffset * 4.0);
        vec4 color3 = texture(s_TextureMap, texCoord + texelOffset * 6.0);

        float v0 = dot(color0.rgb, COEF_V) + 0.5;
        float v1 = dot(color1.rgb, COEF_V) + 0.5;
        float v2 = dot(color2.rgb, COEF_V) + 0.5;
        float v3 = dot(color3.rgb, COEF_V) + 0.5;
        outColor = vec4(v0, v1, v2, v3);
    }
}
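
A quick sanity check of the coordinate mapping (a worked example of mine, not from the project): the Y branch must stretch the output range y in [0, 2/3] back over the whole source texture, and the U and V branches must do the same for their 1/6-high strips:

#include <cassert>
#include <cmath>

int main()
{
    const float U_DIVIDE_LINE = 2.0f / 3.0f;
    const float V_DIVIDE_LINE = 5.0f / 6.0f;

    // Y branch: v_texCoord.y in [0, 2/3], scaled by 3/2 -> [0, 1], the full source texture.
    assert(std::fabs(U_DIVIDE_LINE * 3.0f / 2.0f - 1.0f) < 1e-6f);

    // U branch: v_texCoord.y in [2/3, 5/6]; (y - 2/3) * 2 * 3 -> [0, 1].
    assert(std::fabs((V_DIVIDE_LINE - U_DIVIDE_LINE) * 2.0f * 3.0f - 1.0f) < 1e-6f);

    // V branch: v_texCoord.y in [5/6, 1]; (y - 5/6) * 2 * 3 -> [0, 1].
    assert(std::fabs((1.0f - V_DIVIDE_LINE) * 2.0f * 3.0f - 1.0f) < 1e-6f);
    return 0;
}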

Off-screen rendering and I420 image reading:

void RGB2I420Sample::Draw(int screenW, int screenH)
{
	// Render off-screen
	glBindFramebuffer(GL_FRAMEBUFFER, m_FboId);
    // To render I420, set the viewport to 1/4 of the width and 1.5x the height
    glViewport(0, 0, m_RenderImage.width / 4, m_RenderImage.height * 1.5);
	glUseProgram(m_FboProgramObj);
	glBindVertexArray(m_VaoIds[1]);
	glActiveTexture(GL_TEXTURE0);
	glBindTexture(GL_TEXTURE_2D, m_ImageTextureId);
	glUniform1i(m_FboSamplerLoc, 0);

	float texelOffset = (float) (1.f / (float) m_RenderImage.width);
	GLUtils::setFloat(m_FboProgramObj, "u_Offset", texelOffset);
    GLUtils::setVec2(m_FboProgramObj, "u_ImgSize", m_RenderImage.width, m_RenderImage.height);

	glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, (const void *)0);
	glBindVertexArray(0);
	glBindTexture(GL_TEXTURE_2D, 0);

	//I420 buffer = width * height * 1.5;
	uint8_t *pBuffer = new uint8_t[m_RenderImage.width * m_RenderImage.height * 3 / 2];

	NativeImage nativeImage = m_RenderImage;
	nativeImage.format = IMAGE_FORMAT_I420;
	nativeImage.ppPlane[0] = pBuffer;
	nativeImage.ppPlane[1] = pBuffer + m_RenderImage.width * m_RenderImage.height;
    nativeImage.ppPlane[2] = nativeImage.ppPlane[1] + m_RenderImage.width * m_RenderImage.height / 4;

	// Read the generated I420 image using glReadPixels (note width and height)
	glReadPixels(0, 0, nativeImage.width / 4, nativeImage.height * 1.5, GL_RGBA, GL_UNSIGNED_BYTE, pBuffer);


	// Save the I420 YUV image
	std::string path(DEFAULT_OGL_ASSETS_DIR);
	NativeImageUtil::DumpNativeImage(&nativeImage, path.c_str(), "RGB2I420");
	delete []pBuffer;

	glBindFramebuffer(GL_FRAMEBUFFER, 0);

}
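
DumpNativeImage is a helper from the author's project. A minimal self-contained equivalent (my sketch, assuming the contiguous Y-then-U-then-V buffer layout described above) simply writes the buffer to a raw .yuv file:

#include <cstdio>
#include <cstdint>

// Write the I420 buffer (Y, then U, then V plane, contiguous) to a raw file.
void DumpI420(const char *path, const uint8_t *pBuffer, int width, int height)
{
    FILE *fp = fopen(path, "wb");
    if (fp == nullptr) return;
    fwrite(pBuffer, 1, (size_t) (width * height * 3 / 2), fp);
    fclose(fp);
}

// The dumped file can then be checked with ffplay, for example:
// ffplay -f rawvideo -pixel_format yuv420p -video_size 1920x1080 out.yuv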

Question: why is converting RGBA to I420 with a shader less efficient than converting RGBA to NV21?

Writing all this up is not easy, so please give it a thumbs up! The complete implementation code can be found in the project: github.com/githubhaoha… Select the RGB to I420 Demo in the upper right corner.