This article is compiled from the talk given by Li Yang ("Big Cousin"), head of engine evangelism at Cocos, at the ECUG Meetup Phase 1 | 2021 Audio & Video Technology Best Practices event held in Hangzhou on June 26, 2021. The main structure of this article is as follows:
- Introduction to Cocos engine
- Cocos Creator && Engine
- Cocos Creator video player based on FFmpeg
- Cocos audio playback and modification
To get the lecturer's full slides, add the ECUG assistant on WeChat (WeChat ID: ECUGCON) and note "ECUG PPT". Slides from the other lecturers will also be released later, so stay tuned.
The following is the transcript of the talk:
Hello everyone, I'm Li Yang, head of engine evangelism at Cocos. I have been writing code for about seven years, mostly developing mobile games based on the Cocos engine. One game you may all be familiar with is Fishing Master.
Now I am in charge of evangelism at Cocos. In my view, this role is a developer's assistant and technical navigator: it mainly helps developers who use our engine build games more efficiently, helps them solve technical problems, and provides support to the engine team.
1. Introduction to Cocos engine
Cocos takes "technology-driven efficiency improvement for the digital content industry" as its philosophy. Since its founding in 2011, it has accumulated 1.4 million registered developers worldwide and nearly 100,000 registered games. Games and apps made with the Cocos engine have reached about 1.6 billion devices in 203 countries and regions.
The chart above shows some market-share figures for the Cocos engine: about 40 percent of mobile games, 64 percent of mini games, and 90 percent of online education apps. For online education, Cocos has a dedicated education editor that lets you build interactive courseware in a PPT-like way.
On the left is some data from May 2020, and on the right are some example games. You may have played "Sword and Expedition"; "Animal Restaurant" is a creative WeChat mini game whose users and revenue are at the hundred-million level; "World Craft" was developed in a month, though of course with a large team. The last game was developed by a team of only three people, with just one programmer. This shows that our engine and editor let teams develop games very efficiently and quickly.
We cover many areas, including education, automotive, digital twins, online exhibitions, and so on. Tencent, Huawei, NetEase and others are all our partners, and they all use the Cocos engine for development.
Visual editing can be done with the Cocos editor, and games can be assembled from modules in a what-you-see-is-what-you-get way.
The Cocos editor supports both 2D and 3D games. It also provides an animation editor for art workflows, a particle system, and so on, lowering the barrier to game development through very simple modularity.
Above is our cross-platform workflow. 3ds Max and Maya are third-party tools for building models; UI and animation are edited visually in the Cocos editor; logic code is written in a code IDE. What you see is what you get.
2. Cocos Creator && Engine
The diagram above shows the architecture of the Cocos Creator engine. Game Assets are our resources and Game Logic is the code we write. Below them sits the Cocos Creator Engine, including rendering, the scene graph, resources and data, effects, physics, componentization, and UI, and finally publishing to each platform. At the bottom is the rendering backend: Metal for Apple, Vulkan for Android, and WebGL for the Web.
Today the Cocos engine has a larger share in 2D mobile games, but we are also doing very well in 3D. Huawei HiSilicon also helped us implement a deferred rendering pipeline, which I will describe in detail later; its results are fairly realistic.
The engine part is 100% open source; the open source engine is on GitHub. There is also a preferences setting in the editor that points it at a matching engine version. Because the engine is open, you can modify it: set your own paths for the TS engine and the native C++ engine, and use the customized engine to build the version you want. This is how we give all developers more diversity and customization at the interface level.
Let me briefly go over how rendering works. The CPU reads data from memory and sends the vertex data to the vertex shader; after clipping, it goes to the fragment shader, then through blending and testing, and finally onto the screen. That is the whole rendering process.
In the Cocos Creator renderer, RenderScene manages all the nodes in the engine, and FrameGraph describes the data dependencies. We have two pipelines here; one of them is the forward rendering pipeline.
What is the problem with forward rendering? It is fairly linear: each piece of geometry is passed down the pipeline one at a time to produce the final image. But suppose three models overlap, one in front, one in the middle, and one behind, and the camera can only see the first one. Forward rendering still shades all three even though only one is visible, so the work spent on the occluded parts is wasted.
In deferred rendering, shading is postponed until all the geometry has passed through the pipeline and the final image is being produced. When the vertex data goes through the vertex shader and into the fragment shader, the results are stored in the G-buffer. In the next stage, the data from the G-buffer is fed through the pipeline again; the shader programs that process the G-buffer data are different from the ones used in the previous stage. They are two sets of logic, and they process the data differently.
Then it reaches the fragment shader. The fragment shader works per screen pixel and only shades the geometry that will actually be shown at the current pixel. Occluded surfaces are not included in the calculation, so a lot of shading work is saved.
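To make the contrast concrete, here is a toy CPU-side sketch (purely illustrative, no real GPU code): forward shading runs the costly shading step for every fragment it processes, including the occluded ones, while the deferred version first resolves visibility into a G-buffer and then shades each screen pixel exactly once.

```cpp
#include <cstdio>
#include <limits>
#include <vector>

// Hypothetical illustration: a 4-pixel "screen", three overlapping quads at
// different depths. Shade() stands in for the expensive lighting computation.
struct Fragment { int pixel; float depth; int material; };

static int Shade(int material) { /* pretend this is costly lighting */ return material * 10; }

int main() {
    std::vector<Fragment> frags = {
        {0, 0.9f, 1}, {1, 0.9f, 1},                 // back quad
        {0, 0.5f, 2}, {1, 0.5f, 2}, {2, 0.5f, 2},   // middle quad
        {1, 0.1f, 3}, {2, 0.1f, 3}, {3, 0.1f, 3}    // front quad (what the camera sees)
    };
    const int W = 4;

    // Forward: shade every fragment as it arrives, keep the closest result.
    std::vector<float> depth(W, std::numeric_limits<float>::max());
    std::vector<int>   color(W, 0);
    int forwardShades = 0;
    for (const auto& f : frags) {
        int c = Shade(f.material);          // shading happens even if overwritten later
        ++forwardShades;
        if (f.depth < depth[f.pixel]) { depth[f.pixel] = f.depth; color[f.pixel] = c; }
    }

    // Deferred: the geometry pass only resolves visibility into a G-buffer,
    // then the lighting pass shades each visible pixel exactly once.
    std::vector<float> gDepth(W, std::numeric_limits<float>::max());
    std::vector<int>   gMaterial(W, 0);
    for (const auto& f : frags)
        if (f.depth < gDepth[f.pixel]) { gDepth[f.pixel] = f.depth; gMaterial[f.pixel] = f.material; }
    int deferredShades = 0;
    for (int p = 0; p < W; ++p) { Shade(gMaterial[p]); ++deferredShades; }

    std::printf("forward shading calls: %d, deferred shading calls: %d\n",
                forwardShades, deferredShades);   // 8 vs 4 in this toy scene
}
```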
When we talked about rendering interfaces for different platforms, we wrapped them in a separate layer called GFX. It turns the different interfaces into the same API. Developers only ever see GFX: if the API were different on each platform I publish to, it would add to developers' time and workload, so we encapsulate it. When you develop a game, you use the same API regardless of platform, and only when it is time to ship do you build and choose which rendering backend to use.
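As a rough illustration of the idea, the following sketch shows one device interface with per-platform backends; the class and method names are made up for illustration and are not the real GFX API.

```cpp
#include <cstdio>
#include <memory>

// Hypothetical sketch of a unified rendering interface; the real GFX layer is far richer.
class Device {
public:
    virtual ~Device() = default;
    virtual void draw(int vertexCount) = 0;   // the one API game code calls everywhere
};

class MetalDevice  : public Device { void draw(int n) override { std::printf("Metal draw %d vertices\n", n); } };
class VulkanDevice : public Device { void draw(int n) override { std::printf("Vulkan draw %d vertices\n", n); } };
class WebGLDevice  : public Device { void draw(int n) override { std::printf("WebGL draw %d vertices\n", n); } };

enum class Platform { Apple, Android, Web };

// The backend is chosen only when building for a concrete platform.
std::unique_ptr<Device> createDevice(Platform p) {
    switch (p) {
        case Platform::Apple:   return std::make_unique<MetalDevice>();
        case Platform::Android: return std::make_unique<VulkanDevice>();
        default:                return std::make_unique<WebGLDevice>();
    }
}

int main() {
    auto device = createDevice(Platform::Android);
    device->draw(36);   // game logic never changes across platforms
}
```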
This is the native rendering part, and I want to talk about the GFX Agent. The CPU submits rendering commands into a queue; if there are, say, 20 commands in the queue, then each frame we read the queue, execute the commands in it, and pop them off.
Sometimes, though, there are too many commands. If everything is normal we run at 60 frames per second, and that is fine. But if I cannot finish processing within that frame's time budget, the commands the CPU sends to the GPU will cause a stutter. If one frame's work takes two frames to render, I have dropped from 60 to 30 fps; if it takes three frames, I have dropped from 60 to 20 (these are all live frame rates).
To address this problem we built the GFX Agent: the current thread no longer processes GPU commands in real time. At this stage we only collect and package the commands and put them into a queue, and then we open another thread to execute them.
That thread also works off a queue. If the CPU has no time to send instructions to the GPU (as you probably know, most stutters come from the GPU waiting for the CPU), a stall occurs. With the agent, even when the CPU is not sending new instructions, the other thread still has queued work for the GPU to render, so there is no stall.
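Here is a minimal sketch of that record-now, execute-on-another-thread idea using only the standard library; it is illustrative, while the real GFX Agent batches and encodes actual GPU commands.

```cpp
#include <condition_variable>
#include <cstdio>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Toy "agent": the main/logic thread only records commands into a queue;
// a dedicated render thread drains the queue and talks to the GPU.
class RenderAgent {
public:
    RenderAgent() : worker_([this] { run(); }) {}
    ~RenderAgent() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();
    }
    void record(std::function<void()> cmd) {           // called on the main thread
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(cmd)); }
        cv_.notify_one();
    }
private:
    void run() {                                        // render thread
        for (;;) {
            std::function<void()> cmd;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !q_.empty(); });
                if (q_.empty()) return;                 // exit only after draining everything
                cmd = std::move(q_.front());
                q_.pop();
            }
            cmd();                                      // here: submit to the real GPU backend
        }
    }
    std::queue<std::function<void()>> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
    std::thread worker_;
};

int main() {
    RenderAgent agent;
    for (int frame = 0; frame < 3; ++frame)
        agent.record([frame] { std::printf("execute draw commands for frame %d\n", frame); });
    // The main thread goes on with game logic while the render thread drains the queue.
}
```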
Cocos Creator v3 now brings several performance improvements:
- First, a multi-backend GFX layer for modern graphics APIs.
- Second, a load-balanced multithreaded renderer: when commands need to be dispatched to threads, we check which thread currently has the fewest tasks and place new tasks there, so every thread's workload stays roughly equal (a small sketch of this idea follows the next paragraph).
- Third, a highly customizable rendering pipeline.
- Fourth, the high-performance deferred rendering pipeline contributed by Huawei.
- Fifth, a memoryless architecture based on mobile TBR & TBDR GPUs.
As we know, mobile hardware is limited: when the rendering workload is very heavy, the phone may heat up or respond with a delay. TBR divides the image to be rendered into tiles; each tile's list is kept in system memory through an intermediate buffer. One of the keys to TBDR is deferred rendering on top of that tiling.
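Coming back to the load-balanced renderer mentioned in the second point above, here is a tiny illustrative sketch, not engine code, of "push the next task onto the worker with the fewest pending tasks":

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Each worker thread would own one of these queues; here we only compare sizes.
int pickLeastLoaded(const std::vector<std::vector<int>>& queues) {
    auto it = std::min_element(queues.begin(), queues.end(),
        [](const std::vector<int>& a, const std::vector<int>& b) { return a.size() < b.size(); });
    return static_cast<int>(it - queues.begin());
}

int main() {
    std::vector<std::vector<int>> queues(3);
    for (int task = 0; task < 10; ++task) {
        int w = pickLeastLoaded(queues);      // keep every worker's backlog roughly equal
        queues[w].push_back(task);
    }
    for (size_t i = 0; i < queues.size(); ++i)
        std::printf("worker %zu has %zu tasks\n", i, queues[i].size());
}
```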
3. Cocos Creator video player based on FFmpeg
Next I will talk about the FFmpeg-based Cocos Creator video player, taking Happy Monkey as an example. Our educational courseware contains interactive games, and those interactive games involve video, which raises a few problems.
For example, in this picture we use the same design, but the visual result on Android, iOS, and the Web is not consistent. Because the built-in video component is always rendered on top, I cannot use the mask component to limit its shape, and it does not support rounded-corner masking. There were also cases during scene switching where I had exited a scene but remnants of the video component were still there. These were the problems we ran into when developing the game with the engine.
So what is our solution? It has two parts: FFmpeg and the Web video element.
We use FFmpeg for decoding and OpenGL for rendering, and ship that to the phones: Android uses Oboe for audio playback, and iOS uses AudioQueue. On the Web we use the video element to decode, WebGL to render the video, and the Web Audio API to play the audio. That is the idea behind the overall solution.
Here is how the work is divided. First, FFmpeg is used to decode audio and video; we modified ffplay into AVPlayer. After decoding, each platform plugs in its own audio and video output, and Android, iOS, and the Web all differ. Finally, there is a JSB-bound video component interface: our whole engine is in JS, but the video player is written in C++, so we have to use JSB bindings to let JS call objects written in other languages.
Then there is audio playback. As I mentioned, Android, iOS, and the Web use different audio playback paths. Finally, there are optimizations and extensions, such as playing while downloading, precise seek, and using libyuv instead of swscale.
Let me add a little background on why we did this. If you want to play video in a game with the engine's built-in player, the whole platform player is simply laid on top, so there is no way to make it interact with our engine components, it has no place in the node hierarchy, and user actions cannot drive any interaction with the video. That is why we made these changes based on FFmpeg. And since every frame of the video has to be rendered into our engine, efficiency becomes an issue, which is why we used libyuv instead of swscale.
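For the colour-conversion step, here is a minimal sketch of converting one decoded I420 frame to an RGBA buffer with libyuv; buffer names and sizes are illustrative and error handling is omitted.

```cpp
#include <cstdint>
#include <vector>
#include "libyuv.h"   // provides libyuv::I420ToABGR and friends

// Convert a decoded YUV420P (I420) frame into an RGBA buffer that can be
// uploaded to a texture. In libyuv's naming, "ABGR" means the bytes land as
// R,G,B,A in memory on little-endian platforms, which matches a
// GL_RGBA / GL_UNSIGNED_BYTE texture upload.
std::vector<uint8_t> I420ToRgba(const uint8_t* y, int strideY,
                                const uint8_t* u, int strideU,
                                const uint8_t* v, int strideV,
                                int width, int height) {
    std::vector<uint8_t> rgba(static_cast<size_t>(width) * height * 4);
    libyuv::I420ToABGR(y, strideY, u, strideU, v, strideV,
                       rgba.data(), width * 4, width, height);
    return rgba;
}
```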
Let me come back to the AVPlayer modifications. After compiling the FFmpeg source code, you get three executable programs:
- First, ffmpeg, used for audio/video format conversion, video editing, and so on;
- Second, ffplay, used to play audio and video, which depends on SDL;
- Third, ffprobe, used to analyze audio and video stream data.
So what is wrong with ffplay? Although it meets our need to play audio and video, it has some problems for our purposes: texture scaling and pixel format conversion are very inefficient, and it does not support AndroidSL file reading, so we had to modify it.
This is the overall design of the AVPlayer modification. First we call a function to initialize all the state, then we create a read thread and a refresh thread. The read thread creates the audio, video, and subtitle decoders. The refresh thread consumes what they produce: picture sequences for video, sample sequences for audio, and string sequences for subtitles. That is the entire architecture of AVPlayer.
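As a heavily simplified sketch of the read/decode path using the FFmpeg C API (video only, error handling stripped, and assuming the format and codec contexts were already opened): the real AVPlayer splits this into a read thread that fills packet queues and decoder threads that fill the frame queues consumed by the refresh thread.

```cpp
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
}

// Minimal single-threaded skeleton of the demux + decode loop.
void decodeLoop(AVFormatContext* fmt, AVCodecContext* videoDec, int videoStreamIndex) {
    AVPacket* pkt = av_packet_alloc();
    AVFrame*  frame = av_frame_alloc();
    while (av_read_frame(fmt, pkt) >= 0) {                 // the "read thread" part
        if (pkt->stream_index == videoStreamIndex) {
            if (avcodec_send_packet(videoDec, pkt) == 0) {
                while (avcodec_receive_frame(videoDec, frame) == 0) {
                    // Here a picture would be pushed onto the frame queue for
                    // the refresh thread; in our player it ends up in setImage().
                    av_frame_unref(frame);
                }
            }
        }
        av_packet_unref(pkt);
    }
    av_frame_free(&frame);
    av_packet_free(&pkt);
}
```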
This is the JSB-bound video component interface. The reason for the JSB binding is to let JS call C++ objects: the engine's language and the video player's language are not the same, so we need to establish a calling relationship between them, and we do that with a JSB binding. We also made a separate class here called Video; this is its UML.
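On the C++ side, the bound class looks roughly like the following; the method names are reconstructed from the description in this talk rather than copied from the actual source.

```cpp
#include <cstdint>
#include <string>

// Illustrative C++ interface for the Video class exposed to JS through JSB.
// Each public method would get a corresponding binding so the JS video
// component can drive it like a normal JS object.
class Video {
public:
    virtual ~Video() = default;

    virtual bool init(const std::string& url) = 0;   // open the media source
    virtual void play() = 0;
    virtual void pause() = 0;
    virtual void stop() = 0;
    virtual void seek(double seconds) = 0;           // "precise seek" support

    // Called while AVPlayer is playing; hands a converted RGBA frame to the
    // engine side so the material's texture can be updated (see the rendering
    // steps below).
    virtual void setImage(const uint8_t* rgba, int width, int height) = 0;
};
```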
After the JSB binding comes rendering, which is what I said earlier: we render every frame of the video into our engine. It takes three steps:
- First, customize the material, which is responsible for the shader program;
- Second, customize Assembler, which is responsible for passing vertex attributes;
- Third, set the material dynamic parameters, such as texture, transformation, translation, rotation and scaling matrix.
For the custom material, the shader program is written in an Effect file, and the Effect is used by a Material. Every rendering component needs a Material property mounted on it. Since video display can be understood as rendering frame-by-frame animation, the builtin-2d-sprite material that Cocos Creator's CCSprite uses can be reused directly.
As for customizing the Assembler, once you have the material you only need to worry about passing the position coordinates and texture coordinates; to do that, refer to the official documentation on custom Assemblers. Note that updateWorldVerts differs between the native side and the Web side; get this wrong and the content shows up in the wrong position.
As for setting dynamic material parameters, the video player needs to modify the texture data dynamically. On the mobile side, while the modified AVPlayer (based on ffplay) is playing, Video::setImage is called through the ITextureRenderer::render interface to actually update the texture data.
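As a sketch of what such a texture update amounts to on the native side with OpenGL ES (illustrative, assuming the texture already exists with a matching size and RGBA format, and that this runs on the thread that owns the GL context):

```cpp
#include <cstdint>
#include <GLES2/gl2.h>

// Upload one converted RGBA video frame into an existing GL texture.
void uploadVideoFrame(GLuint texture, const uint8_t* rgba, int width, int height) {
    glBindTexture(GL_TEXTURE_2D, texture);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);                    // rows are tightly packed
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_UNSIGNED_BYTE, rgba);
}
```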
That concludes the FFmpeg-based modification, and this is the scene after our changes (see the figure). The video there is actually rendered by us frame by frame, which is what makes it interactive.
4. Cocos audio playback and modification
Finally, let me talk about Cocos audio playback and how we modified it.
Cocos audio playback is relatively simple: iOS uses OpenAL, Android uses OpenSL, and the Web uses WebAudio and DOM Audio; it is likewise split by platform.
This is the audio side of the Cocos engine. It is fairly simple: one kind is looping audio like background music, and the other is sound effects, such as the sound of firing a bullet.
Here, building on the example above, we also made a slight modification to the audio on the mobile side, mainly replacing the SDL audio-related interfaces in the ffplay program: open, close, pause, resume, and so on.
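Conceptually, that replacement boils down to routing ffplay's SDL audio calls (SDL_OpenAudio, SDL_PauseAudio, SDL_CloseAudio) through a small platform interface, roughly like this sketch; the names here are hypothetical, with the Android implementation sitting on Oboe, the iOS one on AudioQueue, and the Web handled by Web Audio on the JS side.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical audio backend interface standing in for ffplay's SDL audio calls.
class AudioDevice {
public:
    virtual ~AudioDevice() = default;
    // The player registers a callback that fills `buffer` with decoded PCM,
    // exactly as the SDL audio callback used to do.
    using FillCallback = int (*)(uint8_t* buffer, int length, void* userData);
    virtual bool open(int sampleRate, int channels, FillCallback cb, void* userData) = 0;
    virtual void pause() = 0;
    virtual void resume() = 0;
    virtual void close() = 0;
};

// Each platform supplies its own implementation:
//   Android -> an Oboe-based device, iOS -> an AudioQueue-based device.
std::unique_ptr<AudioDevice> createPlatformAudioDevice(); // defined per platform
```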
Next, we also plan to add 3D sound to the audio and support some of the major audio platforms. For the education use case I mentioned, we can also use the Cocos engine to build interactive games inside live broadcasts. In the future we will share more game cases from other scenarios with you.
That is all I have to share. Thank you!
About Qiniu Cloud, ECUG and ECUG Meetup
Qiniu Cloud: Founded in 2011, Qiniu Cloud is a well-known domestic cloud computing and data service provider. It keeps deepening its core technologies in massive file storage, CDN content distribution, video on demand, interactive live streaming, and large-scale intelligent analysis and processing of heterogeneous data, and is committed to driving digital transformation with data technology, enabling every industry to fully enter the data age.
ECUG (Effective Cloud User Group): Founded in 2007 as CN Erlounge II and initiated by Xu Shiwei, ECUG is an indispensable high-end frontier group in the technology field. As a window on the industry's technological progress, it brings together many technical people, follows current hot technologies and cutting-edge practices, and works with them to lead technological change in the industry.
ECUG Meetup: A series of technical sharing events jointly organized by ECUG and Qiniu Cloud, aimed at offline gatherings of developers and technology practitioners. The goal is to build a high-quality learning and networking platform for developers, where participants create, build, and influence one another through knowledge sharing, generating new knowledge to advance cognition and technology, promoting the common progress of the industry, and creating a better communication platform and development space for developers and technical practitioners.