• The drawing work is too heavy, so rendering a single frame cannot be guaranteed to finish within 16 ms.
  • The main thread is too busy, so the frame data is not ready when the VSync signal arrives, and the frame is dropped.

So why 16ms? How does the system guarantee that?

Android's display process can be summarized simply: the application layer is responsible for drawing, and the system layer is responsible for rendering. The application layer measures, lays out, and draws, then passes the resulting surface buffer data to the system-layer service through inter-process communication. SurfaceFlinger in the system layer renders that data to the display, and the screen is refreshed through Android's VSYNC mechanism.

Cross-process communication uses anonymous shared memory (SharedClient). Each application creates a SharedClient with SurfaceFlinger, and each SharedClient can create up to 31 SharedBufferStacks. Each SharedBufferStack corresponds to one window of that application and contains two or three buffers; the CPU/GPU compute and render frame data into these buffers, and the result is displayed in the corresponding window, as described below.
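SharedBufferStack itself lives in native system code, but the producer/consumer relationship it implements can be sketched in plain Kotlin. The following is a purely illustrative model (the class and method names are mine, not any real Android API): the application side takes a free buffer, draws into it, and queues it for the compositor, which displays it and returns it to the free pool. With a capacity of two this models double buffering; with three, triple buffering.

```kotlin
import java.util.concurrent.ArrayBlockingQueue

// Illustrative model of a buffer stack shared between a producer (the app)
// and a consumer (the compositor). Not the real native implementation.
class BufferStackModel(bufferCount: Int) {
    private val freeBuffers = ArrayBlockingQueue<ByteArray>(bufferCount)
    private val readyBuffers = ArrayBlockingQueue<ByteArray>(bufferCount)

    init {
        // Pre-allocate the fixed set of buffers (placeholder "pixel" storage).
        repeat(bufferCount) { freeBuffers.put(ByteArray(1024)) }
    }

    // Application side: take a free buffer, draw into it, hand it over.
    fun produceFrame(draw: (ByteArray) -> Unit) {
        val buffer = freeBuffers.take()   // blocks if every buffer is still in use
        draw(buffer)
        readyBuffers.put(buffer)
    }

    // Compositor side: take the next finished buffer, display it, recycle it.
    fun consumeFrame(display: (ByteArray) -> Unit) {
        val buffer = readyBuffers.take()
        display(buffer)
        freeBuffers.put(buffer)
    }
}
```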

To summarize: the application layer draws into the buffers, and SurfaceFlinger renders the buffered data to the screen. Since these are two different processes, Android's anonymous shared memory (SharedClient) is used to communicate between them.

This picture should be familiar:

The Android system sends a VSync signal every 16 ms (60 FPS, i.e. 1000 ms / 60 ≈ 16.7 ms per frame) to trigger UI rendering. Once the VSync signal is received, the CPU starts processing the next frame's data. If every frame can be drawn within 16 ms (the CPU computation + GPU rendering in the figure), no lag is perceived. A and B in the figure represent the two buffers in the SharedBufferStack, which are swapped so that the data is displayed correctly, a technique called double buffering.
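On the application side, these VSync pulses can be observed through the framework's Choreographer API. Below is a minimal sketch (the class name FrameObserver is my own) that registers a frame callback, which the system invokes at each VSync, and re-registers itself so the callbacks keep coming every frame.

```kotlin
import android.util.Log
import android.view.Choreographer

// A minimal sketch: observe the VSYNC-driven frame callbacks from the app side.
class FrameObserver : Choreographer.FrameCallback {

    fun start() {
        // Ask to be called back on the next VSYNC-driven frame.
        Choreographer.getInstance().postFrameCallback(this)
    }

    override fun doFrame(frameTimeNanos: Long) {
        // frameTimeNanos is the timestamp of the VSYNC that started this frame.
        Log.d("FrameObserver", "VSYNC frame at $frameTimeNanos ns")
        // Re-register so the callback fires again on the next VSYNC.
        Choreographer.getInstance().postFrameCallback(this)
    }
}
```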

So why not use a single buffer? Because it causes a problem: after the CPU/GPU finishes frame A, the driver layer updates every pixel of the buffered data to the screen. With a single buffer, if frame A has not finished displaying when frame A+1 arrives and overwrites the buffer, the screen ends up showing a mixture of the two frames, a residual-image (tearing) problem.

Is double buffering perfect, then? Let's look at another situation:

When CPU/GPU processing takes longer than 16 ms, the next VSync arrives while buffer B's data is still not ready, so the screen can only keep displaying buffer A's previous data, and a frame is lost. After B is finally ready, the CPU/GPU has to wait for the next VSync signal before it can start processing the next frame, wasting a large amount of time in between.
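From the application's point of view, a dropped frame like this shows up as a gap of more than one VSync period between successive Choreographer callbacks. A small sketch of that arithmetic (the function name and the fixed 60 FPS assumption are mine):

```kotlin
// Approximate VSYNC period at 60 FPS: 1_000_000_000 ns / 60 ≈ 16.67 ms.
const val FRAME_INTERVAL_NANOS = 1_000_000_000L / 60

// Estimates how many frames were dropped between two successive
// Choreographer.doFrame() timestamps; 0 means the 16 ms budget was met.
fun droppedFrames(previousFrameTimeNanos: Long, frameTimeNanos: Long): Long {
    val elapsed = frameTimeNanos - previousFrameTimeNanos
    // Each extra full VSYNC period means the screen re-displayed the old buffer once.
    return (elapsed / FRAME_INTERVAL_NANOS - 1).coerceAtLeast(0L)
}
```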

So here comes triple-buffering. Look at the chart below:

As the figure shows, when buffer B is not ready and the VSync signal arrives, buffer C starts taking the next frame of data, so the CPU/GPU no longer sits idle. This still does not eliminate dropped frames entirely, which is why we must not keep the main thread "too busy" and should avoid anything that blocks it.
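One common way to spot a main thread that is "too busy" is to time each message it dispatches, a technique popularized by libraries such as BlockCanary. Looper prints a ">>>>> Dispatching" line before and a "<<<<< Finished" line after every message, so a custom Printer can measure the gap. A minimal sketch (the object name and the 100 ms threshold are my own):

```kotlin
import android.os.Looper
import android.util.Log
import android.util.Printer

// A minimal sketch of main-thread monitoring; the 100 ms threshold is arbitrary.
object MainThreadMonitor {
    private const val BLOCK_THRESHOLD_MS = 100L
    private var dispatchStart = 0L

    fun install() {
        Looper.getMainLooper().setMessageLogging(Printer { log ->
            if (log.startsWith(">>>>> Dispatching")) {
                dispatchStart = System.currentTimeMillis()
            } else if (log.startsWith("<<<<< Finished")) {
                val cost = System.currentTimeMillis() - dispatchStart
                if (cost > BLOCK_THRESHOLD_MS) {
                    // Anything this slow on the main thread risks missing VSYNC deadlines.
                    Log.w("MainThreadMonitor", "Main thread blocked for $cost ms: $log")
                }
            }
        })
    }
}
```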

The next section introduces some optimization methods and practical monitoring schemes.

  • Reference: Best Practices for Android Application Performance Optimization, China Machine Press