This is the fourth section of the FFmpeg player development learning notes: "Hard Decoding, OpenGL Rendering of YUV".
Hard decoding generally means letting the GPU (or other dedicated hardware) do the decoding. The CPU is designed as a general-purpose processor with high flexibility and portability, while the GPU is built for large amounts of computation on relatively uniform tasks and has strong parallel computing capability. Using the GPU to decode video frames reduces CPU usage. Hard decoding relies on GPU-specific circuit design, so the set of formats a GPU can hardware-decode differs from platform to platform. For example, iOS/macOS supports hardware decoding of H.264 and H.265 through VideoToolbox; QSV is based on Intel chips, and CUDA on NVIDIA GPUs.
✅ section 1 – Hello FFmpeg
✅ section 2 – Soft decode video stream, render RGB24
✅ section 3 – Understand YUV
🔔 section 4 – Hard decoding, OpenGL renders YUV
📗 section 5 – Metal renders YUV
📗 section 6 – Decode audio, play with AudioQueue
📗 section 7 – Audio and video synchronization
📗 section 8 – Perfect playback control
📗 section 9 – Double-speed playback
📗 section 10 – Add video filter effects
📗 section 11 – Audio effects
Demo address for this section: Github.com/czqasngit/f…
The example code provides both Objective-C and Swift implementations. For convenience, the Objective-C code is quoted here, because the Swift pointer code does not read as cleanly.
The final effect of this section is shown below:
The target
- Understand the difference between FFmpeg hard decoding and soft decoding
- Add hard decoding function
- Understand the OpenGL rendering process
- Set up the OpenGL environment
- Use OpenGL to render YUV420P data
Understand the difference between FFmpeg hard decoding and soft decoding
The hardware device types supported by FFmpeg are defined as follows:
enum AVHWDeviceType {
AV_HWDEVICE_TYPE_NONE,
AV_HWDEVICE_TYPE_VDPAU,
AV_HWDEVICE_TYPE_CUDA,
AV_HWDEVICE_TYPE_VAAPI,
AV_HWDEVICE_TYPE_DXVA2,
AV_HWDEVICE_TYPE_QSV,
AV_HWDEVICE_TYPE_VIDEOTOOLBOX,
AV_HWDEVICE_TYPE_D3D11VA,
AV_HWDEVICE_TYPE_DRM,
AV_HWDEVICE_TYPE_OPENCL,
AV_HWDEVICE_TYPE_MEDIACODEC,
AV_HWDEVICE_TYPE_VULKAN,
};
Below is the complete flow chart for hard decoding a video stream:
The general differences are as follows:
- 1. Set the hardware decoding context hw_device_ctx when creating the AVCodecContext.
- 2. Optionally set the AVCodecContext's get_format callback to tell the FFmpeg decoder, at run time, which target format to decode into.
- 3. Copy the decoded data from video memory into main memory (a condensed sketch of the whole flow follows this list).
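A condensed sketch of the whole flow, ignoring error handling (the variable and callback names here are illustrative, not the demo's exact code):

// 1. Bind a hardware device context to the decoder context
av_hwdevice_ctx_create(&hw_device_ctx, AV_HWDEVICE_TYPE_VIDEOTOOLBOX, NULL, NULL, 0);
codecContext->hw_device_ctx = av_buffer_ref(hw_device_ctx);
// 2. Optionally tell the decoder which pixel format to decode into
codecContext->get_format = get_hw_format;
avcodec_open2(codecContext, codec, NULL);

// 3. Decode, then copy the frame from video memory back into main memory
avcodec_send_packet(codecContext, packet);
avcodec_receive_frame(codecContext, hwFrame);   // hwFrame->format == AV_PIX_FMT_VIDEOTOOLBOX
av_hwframe_transfer_data(frame, hwFrame, 0);    // frame now holds the decoded data (NV12) in main memory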
Add hard decoding function
1. Improve hard decoding initialization
On macOS, hard decoding is done with VideoToolbox, so the device type specified here is AV_HWDEVICE_TYPE_VIDEOTOOLBOX. It can also be looked up by name with the following function:
av_hwdevice_find_type_by_name("videotoolbox")
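If you are not sure which device types the linked FFmpeg build actually supports, they can also be enumerated at run time. A minimal sketch using av_hwdevice_iterate_types and av_hwdevice_get_type_name:

#include <stdio.h>
#include <libavutil/hwcontext.h>

/// Print every hardware device type compiled into this FFmpeg build.
/// On macOS the output normally includes "videotoolbox".
static void list_hw_device_types(void) {
    enum AVHWDeviceType type = AV_HWDEVICE_TYPE_NONE;
    while ((type = av_hwdevice_iterate_types(type)) != AV_HWDEVICE_TYPE_NONE) {
        printf("supported hw device type: %s\n", av_hwdevice_get_type_name(type));
    }
}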
The hard decoder information for VideoToolBox is defined as follows:
const AVHWAccel ff_h264_videotoolbox_hwaccel = {
.name = "h264_videotoolbox",
.type = AVMEDIA_TYPE_VIDEO,
.id = AV_CODEC_ID_H264,
.pix_fmt = AV_PIX_FMT_VIDEOTOOLBOX,
.alloc_frame = ff_videotoolbox_alloc_frame,
.start_frame = ff_videotoolbox_h264_start_frame,
.decode_slice = ff_videotoolbox_h264_decode_slice,
.decode_params = videotoolbox_h264_decode_params,
.end_frame = videotoolbox_h264_end_frame,
.frame_params = videotoolbox_frame_params,
.init = videotoolbox_common_init,
.uninit = videotoolbox_uninit,
.priv_data_size = sizeof(VTContext),
};
Determine whether the AVCodec supports AV_HWDEVICE_TYPE_VIDEOTOOLBOX in the current running environment:
int hwConfigIndex = 0;
bool supportVideoToolBox = false;
while (true) {
    const AVCodecHWConfig *config = avcodec_get_hw_config(self->codec, hwConfigIndex);
    if (!config) break;
    if (config->device_type == AV_HWDEVICE_TYPE_VIDEOTOOLBOX) {
        supportVideoToolBox = true;
        break;
    }
    hwConfigIndex++;
}
Call avcodec_get_hw_config to enumerate the hardware decoding configurations supported by the codec and check whether any AVCodecHWConfig supports AV_HWDEVICE_TYPE_VIDEOTOOLBOX.
Then create a hardware device context and set it as AVCodecContext->hw_device_ctx:
AVBufferRef *hw_device_ctx = NULL;
if (supportVideoToolBox) {
    /// Create the hardware device context and specify the hardware decoding type.
    /// AV_HWDEVICE_TYPE_VIDEOTOOLBOX has already been confirmed to be supported in the current environment.
    ret = av_hwdevice_ctx_create(&hw_device_ctx, AV_HWDEVICE_TYPE_VIDEOTOOLBOX, NULL, NULL, 0);
    if (ret != 0) goto fail;
    self->hwDeviceContext = hw_device_ctx;
    self->codecContext->hw_device_ctx = self->hwDeviceContext;
    /// get_format is a callback: when invoked it receives the list of decoding formats the AVCodec
    /// currently supports, ordered from best to worst decoding performance, and the developer
    /// returns the most suitable one. When get_format is not set, the format implied by
    /// av_hwdevice_ctx_create is used. It must not be set to NULL.
    self->hwFrame = av_frame_alloc();
}
Get_format in AVCodecContext is a callback function that tells the decoder (AVCodecContext) what its target format is. Its function signature looks like this:
enum AVPixelFormat get_hw_format(struct AVCodecContext *s, const enum AVPixelFormat *fmt);
Here *fmt is the head of an array: the AVCodec may support a whole list of decoding formats. You can iterate over it like this and find the format you want:
for (p = pix_fmts; *p != AV_PIX_FMT_NONE; p++) {
    enum AVPixelFormat fmt = *p;
    // On iOS/macOS the hardware pixel format is AV_PIX_FMT_VIDEOTOOLBOX
    if (fmt == AV_PIX_FMT_VIDEOTOOLBOX) {
        // ...
    }
}
The point of this callback is that when a decoder supports multiple formats, different formats can be selected as appropriate. Note that AVCodecHWConfig has already confirmed that the current AVCodec supports AV_HWDEVICE_TYPE_VIDEOTOOLBOX in the current running environment, so get_format can also be left unset; FFmpeg will then use the format implied by the device type passed to av_hwdevice_ctx_create.
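For reference, a minimal sketch of such a callback (the function name is arbitrary; it just has to match the signature above):

static enum AVPixelFormat get_hw_format(AVCodecContext *ctx, const enum AVPixelFormat *pix_fmts) {
    /// The list ends with AV_PIX_FMT_NONE and is ordered from best to worst decoding performance
    for (const enum AVPixelFormat *p = pix_fmts; *p != AV_PIX_FMT_NONE; p++) {
        if (*p == AV_PIX_FMT_VIDEOTOOLBOX)
            return *p;              // pick the hardware surface format
    }
    /// No hardware format offered: fall back to the decoder's first (software) choice
    return pix_fmts[0];
}

/// Installed before avcodec_open2():
/// self->codecContext->get_format = get_hw_format;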
At this point, the initialization work for hard decoding is complete.
2. Improve the hard decoding function
ret = avcodec_receive_frame(self->codecContext, self->hwFrame);
if (ret != 0) return NULL;
av_frame_unref(self->frame);
ret = av_hwframe_transfer_data(self->frame, self->hwFrame, 0);
After decoding, the data frame format of hwFrame is AV_PIX_FMT_VIDEOTOOLBOX, while frame holds the transferred, decoded data. The format produced by hard decoding with VideoToolbox on macOS is NV12. NV12 is a close relative of YUV420P: YUV420P stores Y, U and V in three separate planes, whereas NV12 stores Y in one plane and the U/V data interleaved in a second plane.
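A small sketch of what the transferred frame looks like, assuming the decoder produced NV12 (which is what VideoToolbox returns here):

/// self->frame now lives in main memory after av_hwframe_transfer_data()
if (self->frame->format == AV_PIX_FMT_NV12) {
    uint8_t *yPlane  = self->frame->data[0];   // width x height luma samples
    uint8_t *uvPlane = self->frame->data[1];   // interleaved U/V pairs, (width/2) x (height/2)
    int yStride  = self->frame->linesize[0];
    int uvStride = self->frame->linesize[1];
    /// Either render NV12 directly (a Y texture plus an interleaved UV texture), or convert it to
    /// YUV420P with sws_scale() if the three-plane pipeline below is to be reused as-is.
}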
Understand the OpenGL rendering process
Introduction to OpenGL
Rendering pipeline
The following diagram illustrates the flow of the pipeline:
Vertex shader: receives a set of vertex data that describes vertex positions, colors, and so on.
Primitive assembly: takes all vertices output by the vertex shader as input and assembles them into the specified primitive shapes.
Geometry shader: receives the primitives produced by primitive assembly and can generate new geometry through further computation.
Rasterization: maps the primitives to the corresponding pixels on the final screen, generating fragments for the fragment shader to use. Clipping is performed before the fragment shader runs; it discards all pixels outside your view to improve efficiency.
Fragment shader: computes the final color of a pixel; this is where all of OpenGL's advanced effects are produced. Typically the fragment shader has access to data about the 3D scene (such as lighting, shadows and light color) that is used to calculate the final pixel color.
Testing and blending: once all color values have been determined, the final object is passed to the alpha test and blending stage. This stage checks the fragment's depth (and stencil) value to determine whether the pixel is in front of or behind other objects and whether it should be discarded. It also checks the alpha value (which defines an object's transparency) and blends objects accordingly. So even though a pixel's output color was computed in the fragment shader, the final pixel color may be completely different when rendering multiple triangles.
Throughout this process, a developer has to implement at least a vertex shader and a fragment shader.
Set up the OpenGL environment
Apple no longer recommends OpenGL on macOS and suggests Metal instead. Since the code and ideas are portable, OpenGL will still be used on macOS here for convenience.
1. Create NSOpenGLContext
On macOS you can use NSOpenGLView to draw with OpenGL. NSOpenGLView is also an NSView, so it can be added to any view. Create a class that inherits from NSOpenGLView and set its NSOpenGLContext.
- (NSOpenGLContext *)_createOpenGLContext {
    NSOpenGLPixelFormatAttribute attr[] = {
        NSOpenGLPFAOpenGLProfile, NSOpenGLProfileVersion3_2Core,
        NSOpenGLPFANoRecovery,
        NSOpenGLPFAAccelerated,
        NSOpenGLPFADoubleBuffer,
        NSOpenGLPFAColorSize, 24,
        0
    };
    NSOpenGLPixelFormat *pf = [[NSOpenGLPixelFormat alloc] initWithAttributes:attr];
    if (!pf) {
        NSLog(@"No OpenGL pixel format");
    }
    NSOpenGLContext *openGLContext = [[NSOpenGLContext alloc] initWithFormat:pf shareContext:nil];
    return openGLContext;
}

[self setOpenGLContext:[self _createOpenGLContext]];
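For context, a minimal sketch of how such a subclass might wire this together (the class name is an assumption, not the demo's actual name):

@interface FFGLVideoView : NSOpenGLView
@end

@implementation FFGLVideoView

- (instancetype)initWithFrame:(NSRect)frameRect {
    if (self = [super initWithFrame:frameRect]) {
        // Attach the context created above to the view
        [self setOpenGLContext:[self _createOpenGLContext]];
        // Render at full Retina resolution
        [self setWantsBestResolutionOpenGLSurface:YES];
    }
    return self;
}

@end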
2. Draw the image
The general flow of OpenGL drawing looks like this:
// Lock the context (rendering may be driven from a background thread)
CGLLockContext([self.openGLContext CGLContextObj]);
// Set the current OpenGL operation context
[self.openGLContext makeCurrentContext];
// Set the clear color
glClearColor(0.0, 0.0, 0.0, 0.0);
// Clear the color buffer
glClear(GL_COLOR_BUFFER_BIT);
// Enable 2D textures
glEnable(GL_TEXTURE_2D);
// ... draw here ...
// Swap the double buffer to present the frame
[self.openGLContext flushBuffer];
// Unlock
CGLUnlockContext([self.openGLContext CGLContextObj]);
Running the code above, you will see a view cleared to black. These are the steps required every time OpenGL draws a frame.
3. Initialize the OpenGL shader programs
OpenGL's vertex and fragment shaders are written in GLSL, and each shader program has a main function.
Vertex shader:
#version 410
layout (location = 0) in vec3 pos;
layout (location = 1) in vec2 textPos;
out vec2 outTextPos;
void main() {
    gl_Position = vec4(pos, 1.0);
    outTextPos = textPos;
}
layout (location = 0): defines the location of this variable, which the CPU-side code can later use to point at the vertex data.
in: indicates that the variable's data is passed in, either from the CPU or from the previous shader.
vec3: a data object with three components. vec2: a data object with two components.
out: indicates that this variable is output to the next shader program; a variable with the same name in the following shader receives the data passed from the current shader.
gl_Position: a built-in vertex shader variable representing the position of the current vertex.
Fragment shader:
#version 410
out vec4 FragColor;
in vec2 outTextPos;
uniform sampler2D yTexture;
uniform sampler2D uTexture;
uniform sampler2D vTexture;
void main() {
    float y = texture(yTexture, outTextPos).r;
    float cb = texture(uTexture, outTextPos).r;
    float cr = texture(vTexture, outTextPos).r;
    // YUV to RGB conversion
    float r = y + 1.403 * (cr - 0.5);
    float g = y - 0.343 * (cb - 0.5) - 0.714 * (cr - 0.5);
    float b = y + 1.770 * (cb - 0.5);
    // Output the final pixel color
    FragColor = vec4(r, g, b, 1.0);
}
FragColor: the concrete color value of a fragment. In earlier versions of OpenGL, the fragment shader output the color through the built-in gl_FragColor; in current versions you simply define an output variable.
uniform sampler2D: defines a texture variable, which can be thought of as binary data with a shape. sampler2D represents flat data with a width and a height; sampler3D represents 3D data with width, height and depth. A uniform-qualified variable is data passed to the shader by the user; it is constant across all shader stages where it is visible, must be defined as a global variable, and is stored in the program object.
Compiling shaders: the shader source code must be compiled with the compile functions provided by OpenGL.
/// Compile a shader from a .glsl file in the main bundle
- (GLuint)_compileShader:(NSString *)shaderName shaderType:(GLuint)shaderType {
    if (shaderName.length == 0) return -1;
    NSString *shaderPath = [[NSBundle mainBundle] pathForResource:shaderName ofType:@"glsl"];
    NSError *error;
    NSString *source = [NSString stringWithContentsOfFile:shaderPath encoding:NSUTF8StringEncoding error:&error];
    if (error) return -1;
    GLuint shader = glCreateShader(shaderType);
    const char *ss = [source UTF8String];
    glShaderSource(shader, 1, &ss, NULL);
    glCompileShader(shader);
    int success;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &success);
    if (!success) {
        char infoLog[512];
        glGetShaderInfoLog(shader, 512, NULL, infoLog);
        printf("shader error msg: %s \n", infoLog);
    }
    return shader;
}
A compiled shader needs to be managed by a program object provided by OpenGL. Your application may have more than one piece of drawing logic; you can switch the program in the context to the one you want to use, and drawing will then use the vertex and fragment shaders attached to that program (a sketch of switching between programs follows the setup code below).
- (BOOL)_setupOpenGLProgram {
    /// Set the current OpenGL operation context
    [self.openGLContext makeCurrentContext];
    /// Create the OpenGL program
    _glProgram = glCreateProgram();
    /// Compile the vertex shader
    _vertextShader = [self _compileShader:@"vertex" shaderType:GL_VERTEX_SHADER];
    /// Compile the fragment shader
    _fragmentShader = [self _compileShader:@"yuv_fragment" shaderType:GL_FRAGMENT_SHADER];
    /// Attach the vertex shader to the OpenGL program
    glAttachShader(_glProgram, _vertextShader);
    /// Attach the fragment shader to the OpenGL program
    glAttachShader(_glProgram, _fragmentShader);
    /// Link the program
    glLinkProgram(_glProgram);
    GLint success;
    glGetProgramiv(_glProgram, GL_LINK_STATUS, &success);
    if (!success) {
        char infoLog[512];
        glGetProgramInfoLog(_glProgram, 512, NULL, infoLog);
        printf("Link shader error: %s \n", infoLog);
    }
    /// The shader objects are no longer needed once the program is linked
    glDeleteShader(_vertextShader);
    glDeleteShader(_fragmentShader);
    return success;
}
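If the player had more than one program, for example one fragment shader for YUV420P and another for NV12, switching between them at draw time is just a glUseProgram call. A sketch (both program variables are assumptions):

// Both assumed to have been created with _setupOpenGLProgram-style code
GLuint yuv420pProgram, nv12Program;

if (frameIsNV12) {
    glUseProgram(nv12Program);      // fragment shader sampling Y + interleaved UV
} else {
    glUseProgram(yuv420pProgram);   // fragment shader sampling Y, U and V separately
}
// ...then bind textures, set uniforms and call glDrawArrays() as usual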
4. Initialize the OpenGL objects
VBO (Vertex Buffer Object)
OpenGL has many kinds of objects, and the first one you need is the VBO, which is a very important one. The GPU cannot read or write data in main memory directly: the data a shader program uses must first be placed in video memory before the GPU can access it. A VBO reserves an area of video memory; the CPU sends data into it through the VBO, and the vertex shader then reads the vertex data from the VBO. Its setup looks like this:
/// Create the VBO
glGenBuffers(1, &_vbo);
/// Vertex data: 3 position components followed by 2 texture coordinates per vertex
float vertices[] = {
    // positions         // texture coords
     1.0f,  1.0f, 0.0f,   1.0f, 1.0f,   // top right
     1.0f, -1.0f, 0.0f,   1.0f, 0.0f,   // bottom right
    -1.0f, -1.0f, 0.0f,   0.0f, 0.0f,   // bottom left
    -1.0f, -1.0f, 0.0f,   0.0f, 0.0f,   // bottom left
    -1.0f,  1.0f, 0.0f,   0.0f, 1.0f,   // top left
     1.0f,  1.0f, 0.0f,   1.0f, 1.0f    // top right
};
/// Bind the VBO of the current context to _vbo
glBindBuffer(GL_ARRAY_BUFFER, _vbo);
/// GL_STATIC_DRAW tells OpenGL the data will not be modified, so it can be placed
/// in a more suitable area of GPU memory to speed up reads
glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW);
/// 1st argument: attribute location; 2nd: number of components read per vertex; 3rd: data type;
/// 4th: whether to normalize; 5th: stride to the next vertex; 6th: offset of the first component
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 5 * sizeof(float), (void *)0);
/// Enable the attribute at location 0 in the vertex shader
glEnableVertexAttribArray(0);
/// Texture coordinates: attribute location 1, 2 components, offset of 3 floats into each vertex
glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, 5 * sizeof(float), (void *)(3 * sizeof(float)));
glEnableVertexAttribArray(1);
VAO (Vertex Array Object)
The process of configuring a VBO and sending data from the CPU to the GPU is logically fixed. If you need to draw complex geometry with many vertices, there is no need to reconfigure the data every time; OpenGL provides the VAO to simplify this. Create a VAO and bind it to the current context, and subsequent operations on the VBO will be recorded by that VAO object. When drawing, you only need to bind the desired VAO. As an added benefit, if the drawing process switches between multiple VBOs, recording the configuration state in a VAO ahead of time turns the switch into a single bind call. The code looks like this:
/// Create the VAO
glGenVertexArrays(1, &_vao);
/// Bind the VAO of the current context to _vao; subsequent operations on _vbo will be recorded by _vao
glBindVertexArray(_vao);
/// ... configure the VBO and vertex attributes here ...
/// Unbind the VAO
glBindVertexArray(0);
Vertex data in OpenGL is normalized: position values range over [-1, 1], and texture coordinates range over [0, 1]. The positions here are configured with three components, with the z axis always 0. They could also be configured with just two values, x and y, setting the z component to 0 in the vertex shader (see the sketch below). The texture being drawn is 2D, so two values are used for the texture coordinates.
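For illustration, a sketch of the two-component variant of the vertex shader mentioned above (not the shader actually used in the demo):

#version 410
layout (location = 0) in vec2 pos;       // only x and y come from the VBO
layout (location = 1) in vec2 textPos;
out vec2 outTextPos;
void main() {
    gl_Position = vec4(pos, 0.0, 1.0);   // z is fixed to 0 in the shader
    outTextPos = textPos;
}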
Texture object
Creation process:
/// Create the texture object
glGenTextures(1, &_yTexture);
/// Bind the texture object of the current context to _yTexture
glBindTexture(GL_TEXTURE_2D, _yTexture);
/// Set the wrapping behaviour used when texture coordinates fall outside [0, 1]
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
/// Set the sampling (filtering) used when the texture is minified or magnified
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
/// Generate mipmaps for the texture
glGenerateMipmap(GL_TEXTURE_2D);
/// Unbind
glBindTexture(GL_TEXTURE_2D, 0);
• 1. Create the texture object: reserve a block of video memory for it.
• 2. Bind the texture object to the current context: after binding, texture operations made through the OpenGL functions act on the bound texture object.
• 3. Set the wrapping and sampling behaviour used when texture coordinates exceed the image or the image is shrunk.
• 4. Set the texture's mipmap mode. When an image is reduced, OpenGL samples it the way people perceive distance: the farther away something is, the smaller it looks. OpenGL simulates this by sampling pixels at different mipmap levels, which makes a scaled image look more realistic.
• 5. Unbind: other textures may be set up later, so unbind here and bind again at drawing time.
The Y, U and V textures are created the same way.
Texture sampling
Drawing 2D image data is a process of sampling the texture and converting it to RGB for display. Sampling works like this: when drawing a triangle, OpenGL uses the configured correspondence between vertex positions and texture coordinates to decide which points of the 2D image to sample. Once sampled, a triangular piece of the texture image is drawn.
Use OpenGL to render YUV420P data
The VBO data is configured at initialization time and is fixed.
Configuring the texture objects:
Find the texture variables in the fragment shader and upload the data to the GPU:
/// Activate texture unit 0
glActiveTexture(GL_TEXTURE0);
/// Bind the Y texture to the active texture unit
glBindTexture(GL_TEXTURE_2D, _yTexture);
/*
 glTexImage2D parameters:
 1. target: the target texture, here GL_TEXTURE_2D.
 2. level: level of detail; 0 is the base image level, n is the nth mipmap refinement level.
 3. internalformat: the color components of the texture, e.g. GL_ALPHA, GL_RGB, GL_RGBA,
    GL_LUMINANCE, GL_LUMINANCE_ALPHA; it specifies how the texture data is stored in video memory.
 4. width: the width of the texture image.
 5. height: the height of the texture image.
 6. border: the border width, which must be 0.
 7. format: the color format of the source pixel data; it does not have to match internalformat.
 8. type: the data type of the pixel data, e.g. GL_UNSIGNED_BYTE, GL_UNSIGNED_SHORT_5_6_5,
    GL_UNSIGNED_SHORT_4_4_4_4, GL_UNSIGNED_SHORT_5_5_5_1.
 9. pixels: a pointer to the pixel data in memory.
 */
glTexImage2D(GL_TEXTURE_2D, 0, GL_RED, videoWidth, videoHeight, 0, GL_RED, GL_UNSIGNED_BYTE, yuvFrame->data[0]);
/// Bind the yTexture variable in the fragment shader to texture unit 0
glUniform1i(glGetUniformLocation(_glProgram, "yTexture"), 0);
Complete drawing process
int videoWidth = yuvFrame->width;
int videoHeight = yuvFrame->height;
CGLLockContext([self.openGLContext CGLContextObj]);
[self.openGLContext makeCurrentContext];
glClearColor(0.0, 0.0, 0.0, 0.0);
glClear(GL_COLOR_BUFFER_BIT);
glEnable(GL_TEXTURE_2D);
glUseProgram(_glProgram);
/// Y
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, _yTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RED, videoWidth, videoHeight, 0, GL_RED, GL_UNSIGNED_BYTE, yuvFrame->data[0]);
glUniform1i(glGetUniformLocation(_glProgram, "yTexture"), 0);
/// U
glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_2D, _uTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RED, videoWidth / 2, videoHeight / 2, 0, GL_RED, GL_UNSIGNED_BYTE, yuvFrame->data[1]);
glUniform1i(glGetUniformLocation(_glProgram, "uTexture"), 1);
/// V
glActiveTexture(GL_TEXTURE2);
glBindTexture(GL_TEXTURE_2D, _vTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RED, videoWidth / 2, videoHeight / 2, 0, GL_RED, GL_UNSIGNED_BYTE, yuvFrame->data[2]);
glUniform1i(glGetUniformLocation(_glProgram, "vTexture"), 2);
glBindVertexArray(_vao);
glDrawArrays(GL_TRIANGLES, 0, 6);
[self.openGLContext flushBuffer];
CGLUnlockContext([self.openGLContext CGLContextObj]);
If you draw a texture image with the vertex data configured above, the resulting rendered image looks like this:
This is because OpenGL's drawing origin is at the bottom-left corner, while image data starts at the top-left corner. Flip the Y values of the texture coordinates (0 becomes 1 and 1 becomes 0):
float vertices[] = {
    // positions         // texture coords
     1.0f,  1.0f, 0.0f,   1.0f, 0.0f,   // top right
     1.0f, -1.0f, 0.0f,   1.0f, 1.0f,   // bottom right
    -1.0f, -1.0f, 0.0f,   0.0f, 1.0f,   // bottom left
    -1.0f, -1.0f, 0.0f,   0.0f, 1.0f,   // bottom left
    -1.0f,  1.0f, 0.0f,   0.0f, 0.0f,   // top left
     1.0f,  1.0f, 0.0f,   1.0f, 0.0f    // top right
};
At this point, the general process and logic of using the GPU to accelerate decoding and OpenGL to accelerate rendering are complete. With decoding and rendering both done on the GPU, CPU usage drops significantly.
Conclusion:
• Hard decoding essentially means using GPU-specific circuitry to decode specific data formats; different GPUs support different hardware decoding formats.
• The CPU is a strong general-purpose processor with wide coverage, while the GPU has a performance advantage for specific functions such as hardware decoding.
• On top of soft decoding, we improved hardware decoding initialization, decoding, and reading the data back.
• Gained a general understanding of the OpenGL rendering process and its characteristics.
• Used NSOpenGLView and NSOpenGLContext to build an OpenGL drawing environment on macOS.
• Learned the basic syntax of shader programs, wrote small shader programs, and used OpenGL to render YUV420P data.