After the earlier article on implementing RGBA-to-YUYV conversion with an OpenGL shader, several readers suggested a follow-up on converting RGBA to NV21 with a shader, since NV21 is the format used more often in practice. This article covers exactly that.

For background on the YUV format and viewing tools, refer to the following posts; that material is not repeated here.

Mastering the basics of YUV image processing

10bit YUV (P010) storage structure and processing

Benefits of converting RGBA to NV21 in a shader

In many cases, after OpenGL finishes rendering, the rendered image needs to be read back into memory for further processing. Reading a high-resolution RGBA image directly with glReadPixels often causes performance problems, especially in video-processing or camera-preview scenarios.

At that point you might consider PBO, HardwareBuffer, ImageReader, and so on; see the earlier article comparing OpenGL image-reading methods.

Although those methods can mitigate the performance problem of reading large images to some extent, they also add implementation complexity and compatibility concerns. For example, HardwareBuffer requires Android API level 26 or later.

When extremely large images are not involved, a shader is commonly used to convert RGBA to YUV, which shrinks the amount of data transferred and improves performance when reading the image back with glReadPixels. For example, YUYV reduces the data volume to 50% of the original RGBA, while NV21 reduces it to 37.5%.
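Those percentages follow directly from the bytes-per-pixel of each format; a quick sketch of the arithmetic (illustrative only, not project code):

#include <cstdio>

int main() {
    const int width = 1920, height = 1080;
    const size_t rgba = (size_t) width * height * 4;      // 4 bytes per pixel
    const size_t yuyv = (size_t) width * height * 2;      // 2 bytes per pixel -> 50%
    const size_t nv21 = (size_t) width * height * 3 / 2;  // 1.5 bytes per pixel -> 37.5%
    printf("RGBA: %zu, YUYV: %zu (%.1f%%), NV21: %zu (%.1f%%)\n",
           rgba, yuyv, 100.0 * yuyv / rgba, nv21, 100.0 * nv21 / rgba);
    return 0;
}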

How the shader converts RGBA to NV21

If you are familiar with NV21, you know that it has two planes: one stores the Y components, and the other stores the V and U components interleaved.

The Y plane has the same width and height as the image, while the VU plane has the same width but half the height, so an NV21 image occupies width * height * 1.5 bytes in memory. Keep this size in mind: the texture attached as the color buffer will have the same total size and will hold the generated NV21 image.
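A minimal sketch of how the two planes sit inside one contiguous NV21 buffer (the struct and names are hypothetical, not from the project):

#include <cstddef>
#include <cstdint>

struct Nv21View {
    uint8_t *y;   // width * height bytes
    uint8_t *vu;  // width * height / 2 bytes, V and U interleaved
};

Nv21View mapNv21(uint8_t *buffer, int width, int height) {
    Nv21View v;
    v.y  = buffer;                            // Y plane at the start
    v.vu = buffer + (size_t) width * height;  // VU plane right after it
    return v;                                 // total: width * height * 3 / 2
}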

The texture used to hold the generated NV21 image can be abstracted as the following structure (the data in the texture is not actually laid out this way):

Why is the width divided by 4? Because we use an RGBA texture, each texel occupies 4 bytes, while each Y value needs only 1 byte, so a single RGBA texel can pack four Y values.
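In other words, an RGBA output row of width / 4 texels is byte-identical to one Y row of the NV21 image; a sketch of that packing on the CPU (hypothetical helper, only to illustrate the index math):

#include <cstdint>

// Output texel (tx, ty) carries the Y values of source pixels
// (4*tx + 0, ty) .. (4*tx + 3, ty) in its R, G, B, A channels.
void packYRow(const uint8_t *yRow, uint8_t *rgbaRow, int width) {
    for (int tx = 0; tx < width / 4; ++tx) {
        rgbaRow[4 * tx + 0] = yRow[4 * tx + 0]; // R <- Y0
        rgbaRow[4 * tx + 1] = yRow[4 * tx + 1]; // G <- Y1
        rgbaRow[4 * tx + 2] = yRow[4 * tx + 2]; // B <- Y2
        rgbaRow[4 * tx + 3] = yRow[4 * tx + 3]; // A <- Y3
    }
}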

As the texture coordinates in the figure show, in the range v_texCoord.y < 2/3 the shader must sample the entire input texture once to generate the Y-plane image; in the range v_texCoord.y > 2/3 the entire texture is sampled again to generate the VU-plane image.

The most important step is setting the viewport correctly: glViewport(0, 0, width / 4, height * 1.5);. Because the viewport width is 1/4 of the original, you can think of it, loosely speaking (the reality is more complicated), as sampling every 4th pixel of the original image. Since generating the Y plane requires sampling every pixel, three additional offset samples are needed.
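Concretely, each fragment samples at its own coordinate plus three one-pixel offsets; a sketch of the x coordinates involved, mirroring the shader's texelOffset logic (illustrative, not project code):

#include <array>

// The four normalized x coordinates a fragment samples for its Y values,
// given u_Offset = 1.0 / width (one source pixel in texture space).
std::array<float, 4> ySampleXs(float texCoordX, int width) {
    const float offset = 1.0f / (float) width;
    return { texCoordX,
             texCoordX + offset,
             texCoordX + 2.0f * offset,
             texCoordX + 3.0f * offset };
}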

Similarly, generating the VU-plane image requires the same three additional offset samples.

The offset must be set to the normalized size of one pixel: u_Offset = 1.0 / width. In the range v_texCoord.y < 2/3, each sample (plus three offset samples) reads 4 RGBA pixels and produces one (Y0, Y1, Y2, Y3) output texel; by the end of this range, a width * height Y-plane buffer has been filled.

In the range v_texCoord.y > 2/3, each sample (plus three offset samples) reads 4 RGBA pixels and produces one (V0, U0, V1, U1) output texel. Because the VU-plane buffer is only height / 2 tall, the VU samples are interlaced in the vertical direction, taking every other row; by the end of this range, a width * height / 2 VU buffer has been filled.
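The interlacing falls out of the shader's coordinate math: the bottom third of the output is stretched across the full source height, so consecutive VU rows land two source rows apart. A sketch of that mapping, assumed from the (v_texCoord.y - 2/3) * 3.0 remap in the shader below (not project code):

// Which source row a VU-region output row r samples (0 <= r < height / 2).
int sourceRowForVuRow(int r, int height) {
    // Normalized y at the fragment center within the VU region,
    // then stretched across the whole source texture.
    float v = ((float) r + 0.5f) / ((float) height / 2.0f);
    return (int) (v * (float) height);  // = 2*r + 1: every other source row
}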

Finally, we use glReadPixels to read back the generated NV21 image (note the width and height):

glReadPixels(0, 0, width / 4, height * 1.5, GL_RGBA, GL_UNSIGNED_BYTE, pBuffer);

Code implementation

The previous section covered in detail how the shader converts RGBA to NV21, so this section goes straight to the key implementation code.

When creating the FBO, note the size of the texture used as the color buffer: (width / 4, height * 1.5), as explained in detail above.

bool RGB2NV21Sample::CreateFrameBufferObj()
{
	// Create and initialize the FBO texture
	glGenTextures(1, &m_FboTextureId);
	glBindTexture(GL_TEXTURE_2D, m_FboTextureId);
	glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
	glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
	glBindTexture(GL_TEXTURE_2D, GL_NONE);

	// Create and initialize the FBO
	glGenFramebuffers(1, &m_FboId);
	glBindFramebuffer(GL_FRAMEBUFFER, m_FboId);
	glBindTexture(GL_TEXTURE_2D, m_FboTextureId);
	glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, m_FboTextureId, 0);
	glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, m_RenderImage.width / 4, m_RenderImage.height * 1.5, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
	if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
		LOGCATE("RGB2NV21Sample::CreateFrameBufferObj glCheckFramebufferStatus status != GL_FRAMEBUFFER_COMPLETE");
		return false;
	}
	glBindTexture(GL_TEXTURE_2D, GL_NONE);
	glBindFramebuffer(GL_FRAMEBUFFER, GL_NONE);
	return true;

}

The complete RGBA-to-NV21 fragment shader:

#version 300 es
precision mediump float;
in vec2 v_texCoord;
layout(location = 0) out vec4 outColor;
uniform sampler2D s_TextureMap;
uniform float u_Offset;  // Offset 1.0/width
//Y =  0.299R + 0.587G + 0.114B
//U = -0.147R - 0.289G + 0.436B
//V =  0.615R - 0.515G - 0.100B
const vec3 COEF_Y = vec3( 0.299,  0.587,  0.114);
const vec3 COEF_U = vec3(-0.147, -0.289,  0.436);
const vec3 COEF_V = vec3( 0.615, -0.515, -0.100);
const float UV_DIVIDE_LINE = 2.0 / 3.0;
void main()
{
    vec2 texelOffset = vec2(u_Offset, 0.0);
    if(v_texCoord.y <= UV_DIVIDE_LINE) {
        // In the texture coordinate y < (2/3) range, we need to complete a sample of the entire texture,
        // Generate 1 (Y0,Y1,Y2,Y3) by sampling 4 RGBA pixels (R,G,B,A). Fill the buffer with width*height at the end of sampling;

        vec2 texCoord = vec2(v_texCoord.x, v_texCoord.y * 3.0 / 2.0);
        vec4 color0 = texture(s_TextureMap, texCoord);
        vec4 color1 = texture(s_TextureMap, texCoord + texelOffset);
        vec4 color2 = texture(s_TextureMap, texCoord + texelOffset * 2.0);
        vec4 color3 = texture(s_TextureMap, texCoord + texelOffset * 3.0);

        float y0 = dot(color0.rgb, COEF_Y);
        float y1 = dot(color1.rgb, COEF_Y);
        float y2 = dot(color2.rgb, COEF_Y);
        float y3 = dot(color3.rgb, COEF_Y);
        outColor = vec4(y0, y1, y2, y3);
    }
    else {
        // When texture coordinate y > (2/3), one sample (plus three offset samples) of 4 RGBA pixels (R,G,B,A) generates 1 (V0,U0,V1,U1),
        // Because the VU plane buffer is height/2, the VU plane samples in the vertical direction are interlaced. At the end of the sampling, a buffer of width*height/2 is filled.
        vec2 texCoord = vec2(v_texCoord.x, (v_texCoord.y - UV_DIVIDE_LINE) * 3.0);
        vec4 color0 = texture(s_TextureMap, texCoord);
        vec4 color1 = texture(s_TextureMap, texCoord + texelOffset);
        vec4 color2 = texture(s_TextureMap, texCoord + texelOffset * 2.0);
        vec4 color3 = texture(s_TextureMap, texCoord + texelOffset * 3.0);

        float v0 = dot(color0.rgb, COEF_V) + 0.5;
        float u0 = dot(color1.rgb, COEF_U) + 0.5;
        float v1 = dot(color2.rgb, COEF_V) + 0.5;
        float u1 = dot(color3.rgb, COEF_U) + 0.5;
        outColor = vec4(v0, u0, v1, u1);
    }
}
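The +0.5 recenters U and V from [-0.5, 0.5] into [0, 1] before the color write quantizes them to bytes. For sanity-checking the shader output, the same conversion can be sketched on the CPU with the same coefficients (a hypothetical helper, not part of the project; the +128 matches the shader's +0.5):

#include <algorithm>
#include <cstdint>

static uint8_t clampByte(float v) {
    return (uint8_t) std::min(std::max(v + 0.5f, 0.0f), 255.0f); // round, then clamp
}

// One pixel, same coefficients as the shader.
static void rgbaToYuv(uint8_t r, uint8_t g, uint8_t b,
                      uint8_t &y, uint8_t &u, uint8_t &v) {
    y = clampByte( 0.299f * r + 0.587f * g + 0.114f * b);
    u = clampByte(-0.147f * r - 0.289f * g + 0.436f * b + 128.0f);
    v = clampByte( 0.615f * r - 0.515f * g - 0.100f * b + 128.0f);
}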

Off-screen rendering and NV21 image reading:

void RGB2NV21Sample::Draw(int screenW, int screenH)
{
	// Render off-screen
	glBindFramebuffer(GL_FRAMEBUFFER, m_FboId);
	// The NV21 output is 1/4 of the original width and 1.5x the original height
	glViewport(0, 0, m_RenderImage.width / 4, m_RenderImage.height * 1.5);
	glUseProgram(m_FboProgramObj);
	glBindVertexArray(m_VaoIds[1]);
	glActiveTexture(GL_TEXTURE0);
	glBindTexture(GL_TEXTURE_2D, m_ImageTextureId);
	glUniform1i(m_FboSamplerLoc, 0);
	float texelOffset = (float) (1.f / (float) m_RenderImage.width);
	GLUtils::setFloat(m_FboProgramObj, "u_Offset", texelOffset);// Offset 1.0/width
	glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, (const void *)0);
	glBindVertexArray(0);
	glBindTexture(GL_TEXTURE_2D, 0);

	//NV21 buffer = width * height * 1.5;
	uint8_t *pBuffer = new uint8_t[m_RenderImage.width * m_RenderImage.height * 3 / 2];

	NativeImage nativeImage = m_RenderImage;
	nativeImage.format = IMAGE_FORMAT_NV21;
	nativeImage.ppPlane[0] = pBuffer;
	nativeImage.ppPlane[1] = pBuffer + m_RenderImage.width * m_RenderImage.height;

	// Read the generated NV21 image using glReadPixels (note the width and height)
	glReadPixels(0, 0, nativeImage.width / 4, nativeImage.height * 1.5, GL_RGBA, GL_UNSIGNED_BYTE, pBuffer);

	std::string path(DEFAULT_OGL_ASSETS_DIR);
	NativeImageUtil::DumpNativeImage(&nativeImage, path.c_str(), "RGB2NV21");
	delete [] pBuffer;

	glBindFramebuffer(GL_FRAMEBUFFER, 0);

}

Comparing shader-based RGBA-to-NV21 conversion followed by glReadPixels against reading directly with HardwareBuffer, tests on 5K-resolution images show no significant performance difference between the two approaches.
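To reproduce that comparison, wrapping the readback in a simple timer is enough for a rough measurement; a minimal sketch, assuming the same width, height, and pBuffer as above (glFinish forces pending GPU work to complete so only the read itself is timed):

#include <chrono>

// Rough timing of the NV21 readback path; illustrative only.
glFinish();  // make sure rendering has finished before timing starts
auto t0 = std::chrono::steady_clock::now();
glReadPixels(0, 0, width / 4, height * 1.5, GL_RGBA, GL_UNSIGNED_BYTE, pBuffer);
auto t1 = std::chrono::steady_clock::now();
long long us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
LOGCATE("NV21 glReadPixels took %lld us", us);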

Writing this up took real effort, so a like is appreciated! The complete implementation code is in the project: github.com/githubhaoha… (select the RGB to NV21 demo in the upper right corner).