Preface

First, let’s review the previous question:


In Systrace, as shown above, vsync-app is basically unchanged, but vsync-SF is constantly updated. What are the possible reasons?

To explain this, we need to understand what vsync-app does: through DispSync, the app is notified at the screen refresh rate to start rendering and updating the UI. However, most mobile games are built on engines such as Unity or Unreal, and the engine itself controls the rendering frame rate (there are usually several frame rates to choose from; Peace Elite, for example, offers settings from 20 to 90 fps). A mobile game can therefore ignore the vsync-app rate, which is exactly the pattern seen in the Systrace above.
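As a rough illustration of what "the engine itself controls the frame rate" means, here is a minimal, hypothetical pacing loop (runFrameLoop, targetFps and renderFrame are invented names, not taken from any real engine): the app times itself and sleeps for the rest of the frame budget instead of waiting on vsync-app.

#include <chrono>
#include <functional>
#include <thread>

// Minimal sketch of an engine-controlled frame cap. The game picks targetFps
// itself, which is why its rendering cadence can be decoupled from vsync-app.
void runFrameLoop(int targetFps, const std::function<void()>& renderFrame) {
    using clock = std::chrono::steady_clock;
    const std::chrono::nanoseconds frameBudget(1'000'000'000LL / targetFps);
    for (;;) {
        const auto frameStart = clock::now();
        renderFrame();  // game logic + draw calls + queueBuffer() all happen in here
        // Sleep away whatever is left of the frame budget; this self-pacing is
        // what lets the app ignore the vsync-app rate entirely.
        const auto elapsed = clock::now() - frameStart;
        if (elapsed < frameBudget) {
            std::this_thread::sleep_for(frameBudget - elapsed);
        }
    }
}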

OK, with that leftover question answered, in this article we are going to learn how to read GPU rendering time in Systrace. It is best read together with the earlier article on how FPS is calculated, which will help you understand the role and meaning of each fence.

queueBuffer

First of all, everything you can see in Systrace is on the CPU side: the frequency of each CPU core, C-states, which threads are running, the call relationships between threads, their running time, and so on. That includes queueBuffer(), which is easy to misread.

You have probably seen this diagram of how a Buffer flows through a BufferQueue:

[Figure: BufferQueue buffer flow]

As the figure shows, the producer obtains a Buffer from the BufferQueue through dequeueBuffer(), decides what to draw, and hands the actual rendering to the GPU; the Buffer is then returned to the BufferQueue via queueBuffer(). Because the GPU works asynchronously, the rendering is usually not yet finished at the moment the producer calls queueBuffer(). So can we tell from Systrace when the GPU is actually done rendering? If the GPU rendering time turns out to be long, we can preliminarily place the performance bottleneck on the GPU side.
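To make the buffer hand-offs and the two fences concrete, here is a heavily simplified, hypothetical sketch of one producer iteration against the Surface methods just discussed. submitGpuWork() is an invented placeholder, and real apps go through EGL/Vulkan or ANativeWindow rather than calling these methods directly.

#include <gui/Surface.h>

using android::sp;
using android::Surface;

// Hypothetical helper, not a real API: submits the draw calls to the GPU and
// returns an acquire-fence fd that signals when the GPU work is done.
int submitGpuWork(android_native_buffer_t* buffer, int releaseFenceFd);

// Heavily simplified sketch of one producer iteration; error handling omitted.
void produceOneFrame(const sp<Surface>& surface) {
    android_native_buffer_t* buffer = nullptr;
    int releaseFenceFd = -1;

    // 1. Take a free buffer out of the BufferQueue. The fence returned here is
    //    the release fence: the previous consumer (e.g. HWC) may still be
    //    reading the buffer until that fence signals.
    surface->dequeueBuffer(&buffer, &releaseFenceFd);

    // 2. Hand the drawing work to the GPU. The GPU runs asynchronously, so the
    //    rendering is usually NOT finished at this point.
    int acquireFenceFd = submitGpuWork(buffer, releaseFenceFd);

    // 3. Return the buffer to the BufferQueue along with the acquire fence,
    //    which only signals once the GPU has actually finished rendering.
    surface->queueBuffer(buffer, acquireFenceFd);
}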

FenceMonitor

The answer is yes. In Android Q, libgui introduced a new internal class, FenceMonitor, which shows in Systrace how long a Fence takes from creation to signaling by tracking its lifecycle. Surface.cpp contains two static FenceMonitor variables, one in Surface::dequeueBuffer() and one in Surface::queueBuffer(), which trace the lifecycles of the release fence and the acquire fence respectively:

int Surface::dequeueBuffer(android_native_buffer_t** buffer, int* fenceFd) {
    ...
    static FenceMonitor hwcReleaseThread("HWC release");
    hwcReleaseThread.queueFence(fence);
    ...
}

int Surface::queueBuffer(android_native_buffer_t* buffer, int fenceFd) {
    ...
    static FenceMonitor gpuCompletionThread("GPU completion");
    gpuCompletionThread.queueFence(fence);
    ...
}
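One practical detail worth noting (from memory of the AOSP source, so treat the exact condition as an assumption): these FenceMonitor calls only run when the gfx atrace tag is enabled, so the corresponding slices will only appear when graphics tracing is on. The guard looks roughly like this:

// Approximate shape of the guard around the snippets above; the exact macro
// and condition may differ between AOSP versions (assumption, not verified).
if (CC_UNLIKELY(atrace_is_tag_enabled(ATRACE_TAG_GRAPHICS))) {
    static FenceMonitor gpuCompletionThread("GPU completion");
    gpuCompletionThread.queueFence(fence);
}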

Let’s see how it does this:

Implementation principle

class FenceMonitor {
public:
    explicit FenceMonitor(const char* name) : mName(name), mFencesQueued(0), mFencesSignaled(0) {
        std::thread thread(&FenceMonitor::loop, this);
        pthread_setname_np(thread.native_handle(), mName);
        thread.detach();
    }

    void queueFence(const sp<Fence>& fence) {
        char message[64];

        std::lock_guard<std::mutex> lock(mMutex);
        if (fence->getSignalTime() != Fence::SIGNAL_TIME_PENDING) {
            snprintf(message, sizeof(message), "%s fence %u has signaled", mName, mFencesQueued);
            ATRACE_NAME(message);
            // Need an increment on both to make the trace number correct.
            mFencesQueued++;
            mFencesSignaled++;
            return;
        }
        snprintf(message, sizeof(message), "Trace %s fence %u", mName, mFencesQueued);
        ATRACE_NAME(message);

        // Enqueue the fence whose lifecycle needs to be traced and wake up the
        // blocked loop in threadLoop().
        mQueue.push_back(fence);
        mCondition.notify_one();
        mFencesQueued++;
        ATRACE_INT(mName, int32_t(mQueue.size()));
    }

private:
    void loop() {
        while (true) {
            threadLoop();
        }
    }

    void threadLoop() {
        sp<Fence> fence;
        uint32_t fenceNum;
        {
            std::unique_lock<std::mutex> lock(mMutex);
            // mQueue is empty; block here until queueFence() is executed.
            while (mQueue.empty()) {
                mCondition.wait(lock);
            }
            // Take the fence at the head of the queue and start tracing its lifecycle.
            fence = mQueue[0];
            fenceNum = mFencesSignaled;
        }
        // Note that a block-scope trick is used here, as described below.
        {
            char message[64];
            snprintf(message, sizeof(message), "waiting for %s %u", mName, fenceNum);
            ATRACE_NAME(message);

            status_t result = fence->waitForever(message);
            if (result != OK) {
                ALOGE("Error waiting for fence: %d", result);
            }
        }
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mQueue.pop_front();
            mFencesSignaled++;
            ATRACE_INT(mName, int32_t(mQueue.size()));
        }
    }

    const char* mName;
    uint32_t mFencesQueued;
    uint32_t mFencesSignaled;
    std::deque<sp<Fence>> mQueue;
    std::condition_variable mCondition;
    std::mutex mMutex;
};

The constructor creates a detached thread that runs loop(), an infinite loop that keeps calling threadLoop(). threadLoop() contains three block scopes, and it blocks in the first one until queueFence() is executed.

When queueFence() is executed, the fence passed in is enqueued and the wait in threadLoop() is woken up. The fence at the head of the queue is then fetched, the second block scope is entered, and lifecycle tracking begins.

Tracking the Fence lifecycle relies on a C++ block-scope trick and the destructor behavior behind ATRACE_NAME(). As mentioned in the earlier piece on destructor magic, ATRACE_CALL() is really just a special case of ATRACE_NAME() (the string passed in is the name of the current function), and ATRACE_NAME() itself defines a variable of type ScopedTrace whose constructor calls atrace_begin() and whose destructor calls atrace_end(). So in the second block scope, ATRACE_NAME() constructs a ScopedTrace, which indirectly calls atrace_begin(); waitForever() is then called to wait for the fence to be signaled; and when the second block scope ends, the ScopedTrace destructor indirectly calls atrace_end(). The resulting slice therefore covers the fence's remaining lifetime.
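For reference, the RAII helper behind ATRACE_NAME() looks roughly like the simplified sketch below (based on AOSP's utils/Trace.h, with the unique-variable-name macro plumbing trimmed):

#include <cutils/trace.h>  // atrace_begin() / atrace_end()

// Simplified sketch of the type that ATRACE_NAME(name) instantiates; the real
// macro also pastes __LINE__ into the variable name so each use is unique.
class ScopedTrace {
public:
    ScopedTrace(uint64_t tag, const char* name) : mTag(tag) {
        atrace_begin(mTag, name);   // slice opens when the block is entered
    }
    ~ScopedTrace() {
        atrace_end(mTag);           // slice closes when the block is left
    }
private:
    uint64_t mTag;
};

So inside the second block scope of threadLoop(), the "waiting for %s %u" slice opens just before waitForever() and closes right after the fence signals and the scope ends. With that in mind, we can see something like this in Systrace: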

[Figure: FenceMonitor slices in Systrace]

The length of each "waiting for GPU completion XXX" slice is the GPU rendering time for that frame, i.e. the time until the acquire fence signals, and from it we can judge whether the workload is GPU bound.

The length of each "waiting for HWC release XXX" slice is the time until the release fence signals. Until that happens, the buffer obtained via dequeueBuffer() cannot be read from or written to by the GPU, because it is still owned by HWC, so this slice can be used to judge whether something is wrong on the display side.
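To put rough (made-up, purely illustrative) numbers on this: at a 60 fps target the frame budget is 1000 ms / 60 ≈ 16.7 ms, so if the "waiting for GPU completion N" slices routinely run 12 ms or longer, the GPU alone is eating most of the budget and the workload is very likely GPU bound; if instead the "waiting for HWC release N" slices keep stretching out, the producer is stuck waiting for the display side to hand buffers back before it can even start drawing the next frame.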

The tail

No tail this time.

If you found this article helpful, please like, save, and share it, and most importantly follow me. Your support is my biggest motivation to keep writing.

See you in the next article.
