This article was first published on the WeChat official account Byteflow.

What is a PBO

A PBO (Pixel Buffer Object) in OpenGL is mainly used for asynchronous pixel transfer operations. A PBO only performs pixel transfers; it is not attached to a texture and is independent of the FBO (Frame Buffer Object).

A PBO is similar to a VBO (Vertex Buffer Object): it also allocates a buffer in GPU memory, in this case to store image data.

A PBO can be bound to two targets: GL_PIXEL_UNPACK_BUFFER and GL_PIXEL_PACK_BUFFER.

When a PBO is bound to GL_PIXEL_UNPACK_BUFFER, glTexImage2D() and glTexSubImage2D() unpack pixel data from the PBO and copy it into the texture object.

When a PBO is bound to GL_PIXEL_PACK_BUFFER, glReadPixels() reads pixel data from the frame buffer and packs it into the PBO.

Why use a PBO

In OpenGL development, especially when processing high-resolution images on low-end platforms, copying image data between main memory and video memory often becomes a performance bottleneck. Using PBOs can alleviate this problem to some extent.

PBOs allow pixel data to be transferred quickly to and from GPU memory without consuming CPU clock cycles. In addition, PBOs support asynchronous transfer.

Without a PBO, the image data is first loaded into CPU memory and then copied from CPU memory to the OpenGL texture object (GPU memory) by glTexImage2D. Both transfers (loading and copying) are performed and controlled entirely by the CPU.

With a PBO, as shown above, the image data in the file can be loaded directly into the PBO; this step is controlled by the CPU. The memory address of the GPU buffer that backs the PBO can be obtained through glMapBufferRange.

Once the image data is in the PBO, transferring it from the PBO to the texture object is controlled entirely by the GPU and does not consume CPU clock cycles. So after binding the PBO, a call to glTexImage2D (which transfers the image data from the PBO to the texture object) returns immediately and the CPU does not have to wait.

Comparing the two methods, transferring image data through a PBO saves one CPU-consuming step: copying the image data from CPU memory to the texture object.

How to use a PBO

int imgByteSize = m_Image.width * m_Image.height * 4; // RGBA, 4 bytes per pixel

// PBO used to upload pixel data to a texture
glGenBuffers(1, &uploadPboId);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, uploadPboId);
glBufferData(GL_PIXEL_UNPACK_BUFFER, imgByteSize, nullptr, GL_STREAM_DRAW);

// PBO used to read pixel data back from the frame buffer
glGenBuffers(1, &downloadPboId);
glBindBuffer(GL_PIXEL_PACK_BUFFER, downloadPboId);
glBufferData(GL_PIXEL_PACK_BUFFER, imgByteSize, nullptr, GL_STREAM_READ);

A PBO is created and initialized in much the same way as a VBO. The example above creates two PBOs and allocates a buffer of imgByteSize bytes for each. Binding to GL_PIXEL_UNPACK_BUFFER means the PBO is used to transfer pixel data from the program to OpenGL; binding to GL_PIXEL_PACK_BUFFER means the PBO is used to read pixel data back from OpenGL.
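The buffer size above assumes tightly packed RGBA rows. As a minimal sketch of the more general case, the helper below computes a buffer size that is large enough when each row is padded to the pixel-store alignment (GL_UNPACK_ALIGNMENT or GL_PACK_ALIGNMENT, which default to 4). The names alignedRowBytes and pboByteSize are hypothetical and are not part of the article's sample code.

```cpp
#include <cassert>
#include <cstddef>

// Bytes occupied by one row of pixels once it is padded up to `alignment`.
// `alignment` plays the role of GL_UNPACK_ALIGNMENT / GL_PACK_ALIGNMENT,
// which OpenGL restricts to 1, 2, 4 or 8.
std::size_t alignedRowBytes(std::size_t width, std::size_t bytesPerPixel,
                            std::size_t alignment = 4) {
    assert(alignment == 1 || alignment == 2 || alignment == 4 || alignment == 8);
    const std::size_t raw = width * bytesPerPixel;
    return (raw + alignment - 1) / alignment * alignment;  // round up
}

// Number of bytes that is always enough to hold a width x height image
// whose rows are padded to `alignment` (every row is counted as padded).
std::size_t pboByteSize(std::size_t width, std::size_t height,
                        std::size_t bytesPerPixel, std::size_t alignment = 4) {
    return alignedRowBytes(width, bytesPerPixel, alignment) * height;
}
```

For RGBA data (4 bytes per pixel) the rows are already a multiple of 4 bytes, so pboByteSize(width, height, 4) equals width * height * 4, matching imgByteSize. The padding only matters for formats such as 3-byte RGB with widths that are not a multiple of 4.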

As described above, when loading image data into a texture object, the CPU copies the image data into a PBO and the GPU transfers it from the PBO to the texture object. With multiple PBOs, the two steps can therefore run at the same time: the PBOs swap roles each frame, one being filled by the CPU while the other is read by the GPU.
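The role swap can be sketched without any OpenGL calls. The helper below is an illustration only: PboSlots and slotsForFrame are hypothetical names, and the pboCount parameter generalizes the two-PBO case. It returns which PBO the GPU reads from and which PBO the CPU fills for a given frame.

```cpp
#include <cassert>

struct PboSlots {
    int gpuIndex;  // PBO the GPU transfers from in this frame
    int cpuIndex;  // PBO the CPU maps and fills for the next frame
};

// Picks the two roles for a frame; the roles rotate by one slot every frame.
PboSlots slotsForFrame(int frameIndex, int pboCount = 2) {
    assert(frameIndex >= 0 && pboCount >= 2);
    const int gpu = frameIndex % pboCount;
    return {gpu, (gpu + 1) % pboCount};
}
```

With two PBOs, the buffer the CPU fills in frame N is the one the GPU reads in frame N + 1, so neither side touches the buffer the other is using in the same frame.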

Loading image data into a texture object using two PBOs

As shown in the figure, two PBOs are used to load the image data into the texture object: glTexSubImage2D tells the GPU to transfer the image data from PBO1 to the texture object while the CPU copies new image data into PBO2.

int dataSize = m_RenderImage.width * m_RenderImage.height * 4; // RGBA

// Swap the PBOs: the GPU reads from `index` while the CPU fills `nextIndex`
int index = m_FrameIndex % 2;
int nextIndex = (index + 1) % 2;

// Use glTexSubImage2D to transfer image data from the PBO to the texture object
BEGIN_TIME("PBOSample::UploadPixels Copy Pixels from PBO to Texture Obj")
glBindTexture(GL_TEXTURE_2D, m_ImageTextureId);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, m_UploadPboIds[index]);
// With a PBO bound, the last argument is a byte offset into the PBO.
// glTexSubImage2D returns immediately without consuming CPU clock cycles.
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, m_RenderImage.width, m_RenderImage.height, GL_RGBA, GL_UNSIGNED_BYTE, 0);
END_TIME("PBOSample::UploadPixels Copy Pixels from PBO to Texture Obj")

// Update the image data and copy it to the other PBO
BEGIN_TIME("PBOSample::UploadPixels Update Image data")
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, m_UploadPboIds[nextIndex]);
glBufferData(GL_PIXEL_UNPACK_BUFFER, dataSize, nullptr, GL_STREAM_DRAW);
GLubyte *bufPtr = (GLubyte *) glMapBufferRange(GL_PIXEL_UNPACK_BUFFER, 0,
                                               dataSize,
                                               GL_MAP_WRITE_BIT |
                                               GL_MAP_INVALIDATE_BUFFER_BIT);

LOGCATE("PBOSample::UploadPixels bufPtr=%p", bufPtr);
if (bufPtr)
{
    memcpy(bufPtr, m_RenderImage.ppPlane[0], static_cast<size_t>(dataSize));

    // Update the image data: fill 5 rows, starting at a random row, with the byte value 188
    int randomRow = rand() % (m_RenderImage.height - 5);
    memset(bufPtr + randomRow * m_RenderImage.width * 4, 188, static_cast<size_t>(m_RenderImage.width * 4 * 5));
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
}
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
END_TIME("PBOSample::UploadPixels Update Image data")

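The memset call in the snippet above relies on the image being taller than 5 rows, because rand() % (height - 5) is only meaningful for a positive divisor. As a hedged illustration of the same row arithmetic on an ordinary CPU buffer, the helper below (fillRowBand is a hypothetical name, and the tightly packed RGBA layout is an assumption carried over from the sample) adds an explicit bounds check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Overwrites `bandRows` full rows of a tightly packed RGBA image, starting at
// `firstRow`, with the byte `value`. Returns false (and writes nothing) when
// the band would fall outside the image.
bool fillRowBand(std::vector<std::uint8_t>& pixels, int width, int height,
                 int firstRow, int bandRows, std::uint8_t value) {
    assert(width > 0 && height > 0);
    assert(pixels.size() == static_cast<std::size_t>(width) * height * 4);
    if (firstRow < 0 || bandRows < 0 || firstRow + bandRows > height) {
        return false;
    }
    const std::size_t rowBytes = static_cast<std::size_t>(width) * 4;
    std::memset(pixels.data() + firstRow * rowBytes, value, bandRows * rowBytes);
    return true;
}
```

The same check could be applied to a mapped PBO pointer before writing to it, since writing past the mapped range is undefined behavior.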

Let’s compare the time taken to load image data into a texture object with two PBOs and without a PBO:

Reading back image data from the frame buffer using two PBOs

As shown in the figure above, two PBOs are used to read image data back from the frame buffer: glReadPixels tells the GPU to read image data from the frame buffer into PBO1, while the CPU directly processes the image data already in PBO2.

// Swap the PBOs
int index = m_FrameIndex % 2;
int nextIndex = (index + 1) % 2;

// Read image data from the frame buffer back into the PBO
BEGIN_TIME("DownloadPixels glReadPixels with PBO")
glBindBuffer(GL_PIXEL_PACK_BUFFER, m_DownloadPboIds[index]);
glReadPixels(0, 0, m_RenderImage.width, m_RenderImage.height, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
END_TIME("DownloadPixels glReadPixels with PBO")

// glMapBufferRange gets the pointer to the PBO buffer
BEGIN_TIME("DownloadPixels PBO glMapBufferRange")
glBindBuffer(GL_PIXEL_PACK_BUFFER, m_DownloadPboIds[nextIndex]);
GLubyte *bufPtr = static_cast<GLubyte *>(glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0,
                                                       dataSize,
                                                       GL_MAP_READ_BIT));
if (bufPtr) {
    nativeImage.ppPlane[0] = bufPtr;
    //NativeImageUtil::DumpNativeImage(&nativeImage, "/sdcard/DCIM", "PBO");
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
}
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
END_TIME("DownloadPixels PBO glMapBufferRange")
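One consequence of this scheme is that the CPU always maps the PBO that was filled in the previous frame, so the pixels it sees lag one frame behind, and the very first frame has nothing valid to read. The simulation below sketches this with plain integers in place of pixel data. PingPongReadback is a hypothetical class with no OpenGL involved, and -1 is an arbitrary marker for "no frame stored yet".

```cpp
#include <array>
#include <cassert>

// Stand-in for the two download PBOs: each slot stores the number of the
// frame whose pixels it holds, or -1 while nothing has been read into it.
class PingPongReadback {
public:
    // Simulates one frame: the "GPU" writes frame `frameIndex` into one slot
    // (the glReadPixels step) while the "CPU" inspects the other slot (the
    // glMapBufferRange step). Returns the frame number the CPU sees.
    int processFrame(int frameIndex) {
        assert(frameIndex >= 0);
        const int index = frameIndex % 2;
        const int nextIndex = (index + 1) % 2;
        slots_[index] = frameIndex;  // GPU side: read back into this slot
        return slots_[nextIndex];    // CPU side: process the other slot
    }

private:
    std::array<int, 2> slots_{{-1, -1}};
};
```

Calling processFrame with frames 0, 1, 2 in order yields -1, 0, 1. Code that consumes the mapped data may therefore want to skip the first frame.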

Let’s compare the time taken to read image data back from the frame buffer with and without PBOs:

The performance data shows that reading back through PBOs is significantly faster than calling glReadPixels without a PBO.

Implementation code path: NDK_OpenGLES_3_0

References

www.songho.ca/opengl/gl_p…
