Background: our engine is Egret with its native EUI, and the target is a WeChat mini game. After the first version of the project was released, we ran a round of tests with PerfDog. The results revealed many problems. This article is divided into two parts:

  • The first part focuses on finding problems through PerfDog.
  • The second part focuses on locating and resolving problems using PerfDog’s data.

For details on how to operate PerfDog, see the PerfDog instructions.

Part one: data analysis

This case showed up in the first version of our game and is quite common, so I am taking it out for analysis. One point up front: analyzing a problem requires looking at all the data together; looking at a single metric in isolation does not make sense.

Data from the first test

FPS:



Memory:



CPU:



Conclusion:

1. FPS fluctuated greatly during combat

2. Memory continued to rise

3.

First, the explanation for point 3: in the first test I selected the WeChat app itself as the target, but the mini game runs as a sub-process, so I should have selected that sub-process in PerfDog to get more accurate data. The darker processes in the figure below are the top-level processes currently running.

For testing this kind of multi-process application:

On iOS, an app's additional processes are either App Extensions or system XPC services. For example, an esports broadcast app uses an App Extension process (extension process name LABroadcastUpload). An app can also use system XPC service processes; for example, web browsers generally use WebKit.

On Android, large apps and games sometimes run as several cooperating processes (WeChat mini games, Weishi and other apps, and games such as Honor of Kings have many sub-processes), so you can select the target sub-process and test it specifically. The default is the main process. For example, Honor of Kings:





Detailed instructions can be found in the PerfDog instruction manual.

To determine what causes the FPS fluctuation, and whether there is a risk of OOM, we now select the sub-process for the second test.

Second test data

Test setup: to verify some of my conjectures and to better locate the problem, we performed some specific operations during the test:

1. Leave the game idling in combat [to determine whether a memory leak is triggered during combat]

2. Repeatedly open and close UI [to determine whether there is a memory leak during UI creation and destruction]

3. Stay still on a UI page [to distinguish it from other scenes]

4. Idle with the screen off [to determine whether the memory leak is caused by image resources or by code]

FPS data:

CPU data:

Memory data:



GPU is under pressure

FPS and GPU analysis:

The Jank and FPS data show that stuttering during play is severe and that FPS fluctuates far too much, especially when UI is being opened or closed. A game mostly renders images, so the GPU is a likely bottleneck. Checking the GPU data next, we found that GPU usage was unusually high, so the rendering pressure is clearly heavy. This is related to the design of our game: the combat scene is still being rendered while a game UI is open.

Memory analysis:

PerfDog's data shows memory rising continuously; at this rate the process would eventually be killed by the system. It is now clear that there is a memory leak: in 72 minutes, memory went from 726 MB to 956 MB and was still rising.
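To put a number on the trend, the growth rate can be worked out from the two readings above. This is only a back-of-the-envelope sketch; the function name is my own and is not part of PerfDog.

```typescript
// Average memory growth between two PerfDog readings.
// The function name and its shape are illustrative, not part of PerfDog.
function growthRateMBPerMinute(startMB: number, endMB: number, minutes: number): number {
  if (minutes <= 0) {
    throw new Error("minutes must be positive");
  }
  return (endMB - startMB) / minutes;
}

// Figures from the second test: 726 MB -> 956 MB over 72 minutes.
const ratePerMinute = growthRateMBPerMinute(726, 956, 72); // about 3.19 MB per minute
const ratePerHour = ratePerMinute * 60;                    // about 192 MB per hour
```

An average like this hides where the growth happens, so it is a reason to keep digging rather than a diagnosis.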

PSS (the memory metric PerfDog shows by default) is not the only thing to look at when judging whether there will be an OOM; VSS should be checked as well. Here is a brief summary of how the common memory metrics relate:

  • VSS (Virtual Set Size): virtual memory consumed (including the memory used by shared libraries)
  • RSS (Resident Set Size): physical memory actually used (including the memory used by shared libraries)
  • PSS (Proportional Set Size): physical memory actually used, with the memory of shared libraries counted proportionally
  • USS (Unique Set Size): physical memory used by the process alone (excluding the memory used by shared libraries)

Generally, the sizes relate as follows: VSS >= RSS >= PSS >= USS
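The relationship between USS, PSS and RSS is easier to see with numbers. The sketch below uses made-up sizes and my own helper names; it only illustrates how shared memory is counted in each metric.

```typescript
// One shared mapping: its size and how many processes currently map it.
interface SharedMapping {
  sizeKB: number;
  sharers: number; // number of processes sharing it, including this one
}

// USS: private memory only.
function uss(privateKB: number): number {
  return privateKB;
}

// PSS: private memory plus this process's proportional share of each shared mapping.
function pss(privateKB: number, shared: SharedMapping[]): number {
  return shared.reduce((sum, m) => sum + m.sizeKB / m.sharers, privateKB);
}

// RSS: private memory plus the full size of every shared mapping.
function rss(privateKB: number, shared: SharedMapping[]): number {
  return shared.reduce((sum, m) => sum + m.sizeKB, privateKB);
}

// Made-up numbers: 500 KB private, a 300 KB library shared by 3 processes,
// and a 100 KB library shared by 2 processes.
const sharedLibs: SharedMapping[] = [
  { sizeKB: 300, sharers: 3 },
  { sizeKB: 100, sharers: 2 },
];
const exampleUss = uss(500);             // 500
const examplePss = pss(500, sharedLibs); // 500 + 100 + 50 = 650
const exampleRss = rss(500, sharedLibs); // 500 + 300 + 100 = 900
```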

Here is a brief introduction to Android's LMK (Low Memory Killer); I won't go into much detail.

1. The Android system periodically checks memory. When free memory falls to a certain threshold, it kills the corresponding processes and releases their memory.

2. Every program has an oom_adj value. The smaller the value, the more important the program and the less likely it is to be killed.

3. The value of oom_adj depends on the type of process and the order in which processes are scheduled.

4. The threshold table can be configured through /sys/module/lowmemorykiller/parameters/adj and /sys/module/lowmemorykiller/parameters/minfree.
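To make the threshold table concrete, here is a minimal sketch of how such a table could drive the kill decision. The adj and minfree values, the process list and the function names are all made up for illustration, and the selection rule is a simplification rather than the kernel driver's actual code.

```typescript
interface Proc {
  name: string;
  oomAdj: number;   // smaller = more important
  rssPages: number; // memory held by the process
}

// Returns the lowest oom_adj that may be killed at the current free-memory level,
// or null when free memory is above every threshold.
function minAdjToKill(freePages: number, adj: number[], minfree: number[]): number | null {
  const n = Math.min(adj.length, minfree.length);
  for (let i = 0; i < n; i++) {
    if (freePages < minfree[i]) {
      return adj[i];
    }
  }
  return null;
}

// Picks the victim: the least important eligible process, larger memory first on ties.
function pickVictim(procs: Proc[], freePages: number, adj: number[], minfree: number[]): Proc | null {
  const minAdj = minAdjToKill(freePages, adj, minfree);
  if (minAdj === null) return null;
  let victim: Proc | null = null;
  for (const p of procs) {
    if (p.oomAdj < minAdj) continue; // too important to kill at this level
    if (victim === null || p.oomAdj > victim.oomAdj ||
        (p.oomAdj === victim.oomAdj && p.rssPages > victim.rssPages)) {
      victim = p;
    }
  }
  return victim;
}

// Hypothetical threshold table (not read from a real device).
const adjTable = [0, 1, 6, 12];
const minfreeTable = [1536, 2048, 4096, 16384]; // in pages
const allProcs: Proc[] = [
  { name: "foreground-game", oomAdj: 0, rssPages: 200000 },
  { name: "background-app", oomAdj: 12, rssPages: 50000 },
  { name: "cached-service", oomAdj: 6, rssPages: 30000 },
];
const gameOnly: Proc[] = [allProcs[0]];

const victimPlenty = pickVictim(allProcs, 20000, adjTable, minfreeTable);  // null: nothing is killed
const victimLow = pickVictim(allProcs, 10000, adjTable, minfreeTable);     // background-app
const victimGameSafe = pickVictim(gameOnly, 3000, adjTable, minfreeTable); // null: the game is protected
const victimGameDead = pickVictim(gameOnly, 1000, adjTable, minfreeTable); // foreground-game
```

In this model the foreground game is only killed once free memory drops below the lowest threshold, which is where a leak like ours eventually leads.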

Now we combine the data from the two tests.

Conclusion:

1. FPS is too volatile and unstable, especially when UI is being created and closed.

2. There is a memory leak. Because memory keeps rising no matter what operation is performed, it is most likely caused by common components.

3. There are other small problems, but these two come first.

Part two: fault locating

Memory leak analysis

With the PerfDog data above, we can start locating and troubleshooting the problems.

Local project architecture:



1. The infrastructure of our project is a set of shared base classes (legacy code) that all basic functions call, such as the communication classes.

2. Memory keeps rising no matter what environment the character is in, even when the character is off screen, so it is highly likely that something is wrong in the project's base classes.

Then we start looking.

Memory leak detection

First, we need to understand a little about how JS memory management works.

In JS, memory allocation and reclamation are handled automatically by the VM; unlike C/C++, there is no need to write a matching delete/free for every new/malloc. The JS engine stores variables mainly in stack memory and heap memory. The essence of a memory leak is that some objects are accidentally never reclaimed and stay resident in memory. One characteristic of JavaScript virtual machines is that creating objects is much more expensive than computing with them, and object creation leads to garbage collection, which makes the game stutter. Garbage collection finds the objects in the heap that are no longer used and reclaims the memory they occupy. Most GC (Garbage Collection) implementations in browsers use a reachability algorithm. Reachable objects are those that form a connected graph with the GC Roots. When an object no longer has any chain of references to the GC Roots, it becomes a target for the garbage collector, and the system reclaims its memory at an appropriate time.
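The reachability idea can be sketched as a tiny mark-and-sweep over an object graph. This is a toy model with my own class and function names, not how any particular JS engine is implemented; it only shows why an object that is still referenced from a root cannot be collected.

```typescript
// A heap object is modelled as a node that holds references to other nodes.
class HeapNode {
  name: string;
  refs: HeapNode[] = [];
  marked = false;
  constructor(name: string) {
    this.name = name;
  }
}

// Mark phase: everything reachable from the GC roots is flagged as live.
function mark(roots: HeapNode[]): void {
  const stack: HeapNode[] = roots.slice();
  while (stack.length > 0) {
    const node = stack.pop() as HeapNode;
    if (node.marked) continue; // already visited; this also handles reference cycles
    node.marked = true;
    for (const ref of node.refs) stack.push(ref);
  }
}

// Sweep phase: unmarked objects are garbage. Marks are reset for the next cycle.
function sweep(heap: HeapNode[]): string[] {
  const garbage: string[] = [];
  for (const node of heap) {
    if (!node.marked) garbage.push(node.name);
    node.marked = false;
  }
  return garbage;
}

const globalObj = new HeapNode("global");
const listenerTable = new HeapNode("listenerTable");
const closedPanel = new HeapNode("closedPanel");
const orphanA = new HeapNode("orphanA");
const orphanB = new HeapNode("orphanB");

globalObj.refs.push(listenerTable);
listenerTable.refs.push(closedPanel); // a forgotten listener still points at the closed panel
orphanA.refs.push(orphanB);           // a cycle with no path from the roots
orphanB.refs.push(orphanA);

const heapObjects = [globalObj, listenerTable, closedPanel, orphanA, orphanB];
mark([globalObj]);
const garbageNames = sweep(heapObjects); // ["orphanA", "orphanB"]
```

Note that closedPanel survives even though nobody uses it any more: it is still reachable through listenerTable, which is exactly the shape of a leak.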

I'm using Google Chrome's heap profiling here; you can also use the Egret engine's profiler. It is easy to use:

1. Open Google Chrome, open the page to be monitored, and press F12 on Windows to open Developer Tools

2. Switch to Memory, select the Heap snapshot type, and click Take snapshot to start the snapshot

3. The view on the right lists the objects in the heap. Click on an object to see its reference hierarchy

4. After running for a while, take another snapshot

5. Switch the new snapshot to the Comparison view for memory comparison analysis

Note that a GC is automatically performed before each snapshot is taken, which ensures that all objects in the view are reachable from the roots. When GC is triggered depends on the browser, so you cannot tell whether there is a memory leak just by watching occasional memory spikes.

We can take a snapshot every once in a while (I can't show the real project because it belongs to the company, so this is just a teaching example):

Chrome's memory analysis tool offers three options, which we can switch between depending on how we are debugging:

1. Heap snapshot: takes a snapshot of the heap.

2. Allocation instrumentation on timeline: records memory information on the timeline as it changes over time.

3. Allocation sampling: records memory allocations by sampling. This profile type has minimal performance overhead and can be used for long-running operations. It provides a good approximation of allocations broken down by JavaScript execution stack.



Here is an example of analysis using a heap snapshot:



View details on the right



Rect:



The Rect object has a property, rect, that is always referenced, so its memory cannot be freed. Going to the corresponding place in the code, we can quickly locate the cause: we instantiate an object in a custom global event listener, but some of the object's properties remain referenced by the event listener and are never reclaimed.
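The pattern behind this kind of leak, and its fix, can be sketched as follows. I can't show our real code, so GlobalEvents and Panel are hypothetical stand-ins (not Egret's API), and the rect field only imitates the retained data seen in the snapshot.

```typescript
type Listener = (payload: unknown) => void;

// Minimal stand-in for a custom global event dispatcher (hypothetical, not Egret's API).
class GlobalEvents {
  private listeners: { [type: string]: Listener[] } = {};

  on(type: string, fn: Listener): void {
    (this.listeners[type] = this.listeners[type] || []).push(fn);
  }

  off(type: string, fn: Listener): void {
    const list = this.listeners[type];
    if (!list) return;
    const i = list.indexOf(fn);
    if (i >= 0) list.splice(i, 1);
  }

  count(type: string): number {
    const list = this.listeners[type];
    return list ? list.length : 0;
  }
}

const events = new GlobalEvents();

class Panel {
  // Stands in for the retained Rect data seen in the snapshot.
  private rect = { x: 0, y: 0, width: 100, height: 50 };
  // Keep the same function reference so it can be removed later.
  private onResize: Listener = () => { this.rect.width += 1; };

  open(): void {
    events.on("resize", this.onResize);
  }

  // Leaky version: the panel is closed, but the global dispatcher still references
  // onResize, which references `this` and therefore `rect`.
  closeLeaky(): void {}

  // Fixed version: remove the listener so nothing global points at the panel.
  close(): void {
    events.off("resize", this.onResize);
  }
}

for (let i = 0; i < 3; i++) {
  const p = new Panel();
  p.open();
  p.closeLeaky();
}
const leakedListeners = events.count("resize"); // 3: all three closed panels are still reachable

const fixedPanel = new Panel();
fixedPanel.open();
fixedPanel.close();
const afterFixedClose = events.count("resize"); // still 3: the fixed panel left nothing behind
```

The key detail is keeping one stable function reference per listener; removing a freshly created closure would not match the one that was registered.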

Of course, to quickly locate which function is responsible, we can also use



It’s going to look something like this

The HEAP graph in the Overview pane shows the JS heap.

In the Call Stack view, the horizontal axis is time and the vertical axis is the call stack, read from top to bottom as caller to callee. Depth in the vertical direction does not mean much by itself; it only shows that function calls are deeply nested. Width in the horizontal direction is call time, and a call that takes too long needs optimizing. Scroll the mouse wheel to view the call stack for a particular period, and hover over a function in the call stack to see its details in a floating window. This view is mostly a concern for performance optimization; for memory leaks it mainly helps to locate what was being done at the time.

The Counter pane shows memory usage (the same as the HEAP graph in the Overview pane), broken down into JS heap, Documents, DOM Nodes, Listeners, and GPU Memory. Check or uncheck a box to show or hide that series in the chart.

Focus on three of them: JS heap size, the number of DOM nodes, and the number of listeners. Hover over a curve to show the specific values in the lower left corner. If one of these numbers keeps rising with no downward trend, it could be a leak. For the sake of space I won't go into the use of these tools here; there are many tutorials available online.
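Because GC makes memory go up and down, "keeps rising" is best judged by whether the low points keep climbing. Here is one rough heuristic as a sketch; the function names, the window size and the sample values are all my own assumptions, not something the DevTools provide.

```typescript
// Splits the samples into consecutive windows and returns the minimum of each.
// The minimum approximates the level memory falls back to after a GC.
function windowMinima(samples: number[], windowSize: number): number[] {
  const minima: number[] = [];
  for (let start = 0; start + windowSize <= samples.length; start += windowSize) {
    let min = samples[start];
    for (let i = start + 1; i < start + windowSize; i++) {
      if (samples[i] < min) min = samples[i];
    }
    minima.push(min);
  }
  return minima;
}

// True when every window's low point is higher than the previous one.
function keepsRising(samples: number[], windowSize: number): boolean {
  const minima = windowMinima(samples, windowSize);
  if (minima.length < 2) return false; // not enough data to judge
  for (let i = 1; i < minima.length; i++) {
    if (minima[i] <= minima[i - 1]) return false;
  }
  return true;
}

// Made-up JS heap sizes in MB.
const sawtooth = [50, 58, 66, 51, 59, 67, 50, 60, 68]; // spikes, but falls back to about 50
const leaking = [50, 58, 66, 55, 63, 71, 61, 69, 77];  // each low point is higher than the last
const sawtoothRising = keepsRising(sawtooth, 3); // false
const leakingRising = keepsRising(leaking, 3);   // true
```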

Stutter optimization

According to PerfDog's data, the GPU is under a lot of pressure. For games, a long rendering time usually means too many draw calls, or too much time spent on each draw.



In our game, after checking the draw calls, we determined that the cause was too many draw calls while the game was running. This made each frame take a long time to render, which showed up as lag.

Draw calls and similar statistics can be viewed in Egret's own FPS panel; see the Egret Debug document.

Before optimizing, let's look at what Egret does in a rendered frame:



This can be broken down into:

Work content of each frame:

1. EnterFrame is executed once. At this point, the engine executes the game logic and throws an EnterFrame event.

2. The engine executes a clear.

3. The Egret kernel traverses all DisplayObjects in the game scene and recalculates the Transform of every display object.

4. All images are drawn to the canvas.
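The four steps above can be modelled with a toy frame loop. This is a simplified model under my own assumptions (the Node2D type and all function names are hypothetical), not Egret's actual renderer. It follows the behaviour described in this article: hidden objects that stay in the display list are still traversed in step 3.

```typescript
interface Node2D {
  visible: boolean;
  x: number;
  worldX: number;
  children: Node2D[];
}

interface FrameStats {
  transformsUpdated: number;
  drawn: number;
}

function makeNode(x: number, visible: boolean, children: Node2D[]): Node2D {
  return { visible: visible, x: x, worldX: 0, children: children };
}

// Step 3: walk the whole display list and recompute transforms.
// In this model invisible nodes are still visited; only drawing is skipped.
function updateAndDraw(node: Node2D, parentX: number, parentVisible: boolean, stats: FrameStats): void {
  node.worldX = parentX + node.x;
  stats.transformsUpdated++;
  const visible = parentVisible && node.visible;
  if (visible) stats.drawn++; // Step 4: draw to the canvas
  for (const child of node.children) {
    updateAndDraw(child, node.worldX, visible, stats);
  }
}

function runFrame(stage: Node2D, onEnterFrame: () => void): FrameStats {
  onEnterFrame(); // Step 1: game logic runs on the EnterFrame event
  const stats: FrameStats = { transformsUpdated: 0, drawn: 0 }; // Step 2: clear
  updateAndDraw(stage, 0, true, stats); // Steps 3 and 4
  return stats;
}

const spriteA = makeNode(0, true, []);
const spriteB = makeNode(5, true, []);
const hiddenSprite = makeNode(10, false, []);
const stage = makeNode(0, true, [spriteA, spriteB, hiddenSprite]);

// Hidden but still in the display list: 4 transforms, 3 draws.
const withHidden = runFrame(stage, () => { spriteA.x += 1; });

// Removed from the display list: 3 transforms, 3 draws.
stage.children = [spriteA, spriteB];
const afterRemove = runFrame(stage, () => { spriteA.x += 1; });
```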

Now to optimize. First, reduce draw calls:

1. Replace all small pictures with atlases

2. Batch text rendering by using a custom bitmap font instead of native fonts

3. Separate static and dynamic content: put what changes and what does not change in different layers, such as a background layer, an icon layer and a dynamic layer

4. For animation, prefer DragonBones frame animation over Spine animation where possible

5. Use cacheAsBitmap to render vector graphics into a bitmap at run time
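Why atlases and layering help can be seen with a simplified batching model. The rule assumed here, that consecutive sprites using the same texture are merged into one draw call, is a common simplification and not a description of Egret's exact batching logic; the texture names are made up.

```typescript
// Counts draw calls for sprites drawn in order, assuming consecutive sprites
// that use the same texture are merged into one draw call.
function countDrawCalls(textureIds: string[]): number {
  let calls = 0;
  let previous: string | null = null;
  for (const id of textureIds) {
    if (id !== previous) {
      calls++;
      previous = id;
    }
  }
  return calls;
}

// Six icons, each from its own small image: no two neighbours share a texture.
const separateImages = ["a.png", "b.png", "c.png", "d.png", "e.png", "f.png"];
// The same six icons packed into one atlas.
const packedAtlas = ["ui_atlas", "ui_atlas", "ui_atlas", "ui_atlas", "ui_atlas", "ui_atlas"];
// Icons and bitmap-font text interleaved: every switch breaks the batch.
const interleaved = ["ui_atlas", "font_atlas", "ui_atlas", "font_atlas", "ui_atlas", "font_atlas"];
// Same content grouped by layer: icons first, then text.
const layered = ["ui_atlas", "ui_atlas", "ui_atlas", "font_atlas", "font_atlas", "font_atlas"];

const callsSeparate = countDrawCalls(separateImages); // 6
const callsAtlas = countDrawCalls(packedAtlas);       // 1
const callsInterleaved = countDrawCalls(interleaved); // 6
const callsLayered = countDrawCalls(layered);         // 2
```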

Reduce frame event overhead:

1. To remove a DisplayObject, call removeChild on it directly instead of setting its visible property to false; otherwise it will still be processed in step 3 of every frame

2. Do not create objects in the main loop; put all the characters, monsters and special effects in the game into object pools

3. Don't do too much work in the EnterFrame event; use custom events where possible

We can count the number of game objects created with the following function:
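As an illustration of the object pool advice, here is a minimal generic pool. It is a sketch with hypothetical names (ObjectPool, Bullet), not code from our project or from Egret.

```typescript
// Generic object pool: reuse instances instead of creating new ones in the main loop.
class ObjectPool<T> {
  private free: T[] = [];
  private create: () => T;
  private reset: (obj: T) => void;
  created = 0; // how many instances were actually constructed

  constructor(create: () => T, reset: (obj: T) => void) {
    this.create = create;
    this.reset = reset;
  }

  // Hands out a pooled instance if one is available, otherwise constructs a new one.
  acquire(): T {
    const obj = this.free.pop();
    if (obj !== undefined) return obj;
    this.created++;
    return this.create();
  }

  // Resets the instance and puts it back for reuse.
  release(obj: T): void {
    this.reset(obj);
    this.free.push(obj);
  }
}

interface Bullet {
  x: number;
  y: number;
  active: boolean;
}

const bulletPool = new ObjectPool<Bullet>(
  () => ({ x: 0, y: 0, active: false }),
  (b) => { b.x = 0; b.y = 0; b.active = false; }
);

// Simulate 100 frames that each fire and retire two bullets.
for (let frame = 0; frame < 100; frame++) {
  const b1 = bulletPool.acquire();
  const b2 = bulletPool.acquire();
  b1.active = true;
  b2.active = true;
  bulletPool.release(b1);
  bulletPool.release(b2);
}
const bulletsConstructed = bulletPool.created; // 2, not 200
```

The reset callback matters: a pooled object that still carries state from its previous use is a common source of hard-to-find bugs.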



It prints the difference between the current hashCount and the hashCount one second earlier. hashCount is an internal API of the Egret engine that counts the number of engine objects created. If the game is still, the hashCount diff should theoretically be zero; in practice it should be as low as possible, below 120. If it exceeds that, just add a breakpoint to the engine's HashObject constructor and check the call stack at run time.
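The per-second diff described above can be sketched like this. The sampler class is my own illustration; the counter is passed in as a function so the sketch runs on its own, and the assumption that the Egret counter is exposed as egret.$hashCount should be checked against your engine version.

```typescript
// Tracks how many objects were created between two samples of a running counter.
class CreationRateSampler {
  private readCount: () => number;
  private last: number;

  constructor(readCount: () => number) {
    this.readCount = readCount;
    this.last = readCount();
  }

  // Call once per second; returns the number of objects created since the last call.
  sample(): number {
    const current = this.readCount();
    const diff = current - this.last;
    this.last = current;
    return diff;
  }
}

// Stand-in counter. In an Egret project the reader would presumably return the
// engine's internal hash counter (assumed here to be egret.$hashCount).
let fakeHashCount = 1000;
const sampler = new CreationRateSampler(() => fakeHashCount);

fakeHashCount += 90;  // a second with 90 new objects
const firstDiff = sampler.sample();  // 90
fakeHashCount += 300; // a second with 300 new objects
const secondDiff = sampler.sample(); // 300
const overBudget = secondDiff > 120; // true: time to break in the HashObject constructor
```

In a real project, sample() would be driven by a one-second timer and the result shown or logged.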

For more information about PerfDog, visit perfdog.qq.com/?ADTAG=medi…