The cause of stuttering (caton)

After the VSync signal arrives, the system's graphics service notifies the app through CADisplayLink and similar mechanisms, and the app's main thread begins computing the display content on the CPU: view creation, layout calculation, image decoding, text drawing, and so on. The CPU then submits the computed content to the GPU, which transforms, composites, and renders it. The GPU submits the rendering result to the frame buffer and waits for the next VSync signal before it is displayed on screen. Because of the VSync mechanism, if the CPU or GPU fails to submit its content within one VSync interval, that frame is discarded and shown at the next opportunity, while the display keeps the previous content unchanged. That is why the interface stutters.

Excessive pressure on either the CPU or the GPU leads to dropped frames, so during development the load on each must be evaluated and optimized separately.

CPU and GPU in iOS devices

CPU

Loading resources, object creation, object adjustment, object destruction, layout calculation, Autolayout, text calculation, text rendering, image decoding, and Core Graphics are all done on the CPU.

GPU

The GPU is a processing unit specifically designed for highly concurrent graphics computation. It completes the same work with less power than the CPU, and its floating-point computing power exceeds the CPU's by a large margin.

GPU rendering is far more efficient than CPU rendering, and it puts less load and power consumption on the system. Therefore, in development we should let the CPU handle the main thread's UI logic and hand graphics display work to the GPU. When rasterization is involved, the CPU also participates; this is described in more detail later.

Compared to the CPU, the GPU does a single kind of thing: it takes the submitted textures and vertex descriptions, applies transforms, blends, renders, and then outputs to the screen. The main things it works with are textures (images) and shapes (vector shapes approximated by triangles).

CPU and GPU collaboration

As can be seen from the figure above, displaying a view on screen requires the CPU and GPU to work together. The CPU computes the display content and submits it to the GPU; after rendering, the GPU puts the result into the frame buffer. The video controller then reads the frame buffer data line by line according to the VSync signal and passes it, possibly through digital-to-analog conversion, to the display.

The buffer mechanism

iOS uses double buffering. That is, the GPU pre-renders a frame into a buffer (the front frame buffer) for the video controller to read. After the next frame is rendered, the GPU points the video controller directly at the second buffer (the back frame buffer). When the video controller has finished reading one frame and is ready to read the next, the GPU waits for the display's VSync signal; the front and back frame buffers then swap instantly: the back buffer becomes the new front buffer, and the old front buffer becomes the new back buffer.

Optimization scheme

TableViewCell reuse

In the cellForRowAtIndexPath: callback, only create the cell instance and return it quickly; do not bind data there. Bind (assign) the data in willDisplayCell:forRowAtIndexPath:.
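A minimal Swift sketch of this split; `Feed` and `FeedCell` are hypothetical names used only for illustration:

```swift
import UIKit

// Hypothetical model and cell types for illustration.
struct Feed { let title: String }

final class FeedCell: UITableViewCell {
    func bind(_ feed: Feed) {
        textLabel?.text = feed.title  // data binding deferred until display time
    }
}

final class FeedListController: UITableViewController {
    var feeds: [Feed] = []

    override func tableView(_ tableView: UITableView,
                            cellForRowAt indexPath: IndexPath) -> UITableViewCell {
        // Only dequeue and return quickly; no data assignment here.
        return tableView.dequeueReusableCell(withIdentifier: "FeedCell", for: indexPath)
    }

    override func tableView(_ tableView: UITableView,
                            willDisplay cell: UITableViewCell,
                            forRowAt indexPath: IndexPath) {
        // Bind data just before the cell actually appears.
        (cell as? FeedCell)?.bind(feeds[indexPath.row])
    }
}
```

The expensive work (data assignment) happens only when a row is about to appear, so cellForRowAtIndexPath: returns as fast as possible.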

Height cache

When the tableView scrolls, heightForRowAtIndexPath: is called repeatedly. When cell heights are self-sizing, recalculating the height on every callback can make the UI stutter. To avoid repeating meaningless calculations, cache the heights.
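One way to cache heights is a small dictionary keyed by IndexPath; the names here are illustrative:

```swift
import Foundation

// A minimal row-height cache: compute each height once, then reuse it.
final class RowHeightCache {
    private var heights: [IndexPath: CGFloat] = [:]

    // Returns the cached height, computing and storing it on first access.
    func height(for indexPath: IndexPath, compute: () -> CGFloat) -> CGFloat {
        if let cached = heights[indexPath] { return cached }
        let h = compute()
        heights[indexPath] = h
        return h
    }

    // Invalidate when the underlying data changes.
    func invalidate(_ indexPath: IndexPath) { heights[indexPath] = nil }
    func invalidateAll() { heights.removeAll() }
}
```

In heightForRowAtIndexPath: you would call `height(for:compute:)` with the expensive sizing closure, which then runs at most once per row.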

View hierarchy optimization

Do not create views dynamically

  • With memory usage under control, cache subviews instead of recreating them.
  • Make good use of the hidden property.

Reduce the view hierarchy

  • Reduce the number of subviews; draw elements with layers instead.
  • Use clearColor, masksToBounds, shadow effects, etc. sparingly.

Reduce unnecessary drawing operations

Images

  • Prefer PNG images over JPEG images.
  • Pre-decode images on a child thread and render directly on the main thread. If an image has not yet been decoded, assigning it to an imageView triggers the decode on the main thread.
  • Optimize image sizes and try not to scale dynamically (contentMode).
  • Combine as many images as possible into one for display.
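Pre-decoding can be sketched as follows, assuming a decode-by-drawing approach with UIGraphicsImageRenderer; the function names are illustrative:

```swift
import UIKit

// Decode a UIImage off the main thread by drawing it into a bitmap context,
// then hand the decoded copy back for rendering.
func decodedImage(_ image: UIImage) -> UIImage {
    let renderer = UIGraphicsImageRenderer(size: image.size)
    return renderer.image { _ in
        image.draw(at: .zero)  // drawing forces decompression
    }
}

func setImageAsync(_ image: UIImage, on imageView: UIImageView) {
    DispatchQueue.global(qos: .userInitiated).async {
        let decoded = decodedImage(image)   // decode on a child thread
        DispatchQueue.main.async {
            imageView.image = decoded       // main thread only assigns
        }
    }
}
```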

Reduce transparent views

Using transparent views gives rise to blending, which in iOS graphics processing refers primarily to the calculation of blended pixel colors. The most intuitive example is overlaying two layers: if the top layer is transparent, the final pixel color calculation must also take the bottom layer into account. This process is called blending. Causes of blending:

  • The UIView's alpha is less than 1.
  • The UIImageView's image contains an alpha channel (even if the UIImageView's alpha is 1, an image with a transparency channel still causes blending).

Why does blending cause performance loss?

The reason is straightforward: if a layer is opaque, the system simply displays that layer's color. If the layer is transparent, more calculation is needed because the layer beneath must be included in the blended-color computation. Setting opaque to YES reduces this cost because the GPU does no compositing and simply copies from the layer.
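As a small illustration of avoiding blending:

```swift
import UIKit

// Hint to the renderer that a view is fully opaque, so no blending is needed.
let label = UILabel()
label.isOpaque = true
label.backgroundColor = .white   // an opaque background, not .clear
```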

Reduce off-screen rendering

Off-screen rendering refers to rendering an image once before drawing it to the current screen.

In OpenGL, the GPU can render in the following two ways:

  • On-Screen Rendering: the GPU performs the rendering in the screen buffer currently used for display.
  • Off-Screen Rendering: the GPU creates a new buffer outside the current screen buffer and renders there.

Why does off-screen rendering stall? It mainly includes two aspects:

  • A new buffer has to be created.
  • Context switching: the whole off-screen rendering process requires multiple context switches (between CPU rendering and the GPU), first from the current screen (on-screen) to off-screen; when off-screen rendering finishes, displaying the off-screen buffer's result on screen requires switching the context back from off-screen to the current screen. Context switching is costly.

Off-screen rendering is triggered when the following properties are set:

  • shouldRasterize (rasterization)
  • layer.mask (masks)
  • allowsGroupOpacity is YES and layer.opacity is less than 1.0
  • layer.cornerRadius combined with layer.masksToBounds set to YES; use pre-clipped images or layer drawing instead.
  • layer shadows (the related properties beginning with shadow); use shadowPath instead.

There are two different ways to draw shadows:

  • Without shadowPath
  • Using shadowPath

Performance difference, as shown below:
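In code, the two approaches differ by a single assignment; the values here are illustrative:

```swift
import UIKit

let badge = UIView(frame: CGRect(x: 0, y: 0, width: 100, height: 100))

// Without shadowPath: the GPU must derive the shadow shape off-screen
// from the layer's contents.
badge.layer.shadowColor = UIColor.black.cgColor
badge.layer.shadowOpacity = 0.4
badge.layer.shadowOffset = CGSize(width: 0, height: 2)

// With shadowPath: the shape is given explicitly, so no off-screen pass
// is needed to compute it.
badge.layer.shadowPath = UIBezierPath(rect: badge.bounds).cgPath
```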

Optimization suggestions for off-screen rendering

  1. Use shadowPath to specify the layer's shadow path.
  2. Use asynchronous layer rendering (e.g. AsyncDisplayKit, Facebook's open-source asynchronous drawing framework).
  3. Set the layer's opaque value to YES to reduce complex layer compositing.
  4. Try to use image resources that do not contain an alpha channel.
  5. Try to set the layer's size to integral values.
  6. The most efficient solution is to have the designer cut the images with rounded corners in advance.
  7. In many cases the images are uploaded by users, and the rounded corners can be applied on the client side.
  8. Manually generate a rounded-corner image in code and set it on the view, using UIBezierPath (Core Graphics framework) to draw it.
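Point 8 can be sketched like this, assuming a render-once helper (the name is illustrative):

```swift
import UIKit

// Render a rounded-corner copy of an image once, instead of paying for
// cornerRadius + masksToBounds off-screen rendering on every frame.
func roundedImage(_ image: UIImage, cornerRadius: CGFloat) -> UIImage {
    let rect = CGRect(origin: .zero, size: image.size)
    let renderer = UIGraphicsImageRenderer(size: image.size)
    return renderer.image { _ in
        UIBezierPath(roundedRect: rect, cornerRadius: cornerRadius).addClip()
        image.draw(in: rect)
    }
}
```

Usage: `imageView.image = roundedImage(avatar, cornerRadius: 8)` leaves the layer itself untouched, so no off-screen rendering is triggered.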

Use shouldRasterize properly

Rasterization transfers GPU operations to the CPU, generates a bitmap cache, and reads and reuses it directly.

Advantages: the CALayer is rasterized into a bitmap, so effects such as shadows and cornerRadius are cached.

Disadvantages:

  • Updating the rasterized layer causes an off-screen rendering.
  • The bitmap is discarded if it is not used within 100 ms.
  • Due to system restrictions, the cache size is limited to 2.5× the screen size.

shouldRasterize suits static pages; on dynamic pages it adds overhead. If shouldRasterize is set to YES, remember to also set rasterizationScale to contentsScale.
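A minimal configuration sketch:

```swift
import UIKit

let card = UIView(frame: CGRect(x: 0, y: 0, width: 200, height: 120))
card.layer.shadowOpacity = 0.3

// Cache the composited layer as a bitmap; appropriate only for
// content that rarely changes.
card.layer.shouldRasterize = true
// Match the screen scale, otherwise the bitmap looks blurry on Retina.
card.layer.rasterizationScale = UIScreen.main.scale
```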

Asynchronous rendering

Draw on a child thread, render on the main thread; see VVeboTableViewDemo for an example.

Use -drawRect: rationally

You might be surprised that many developers blog about using -drawRect: for performance tuning. But I don't recommend reaching for -drawRect: without thinking. Here's why:

When a view is loaded with UIImageView, the view still has a CALayer, but it does not allocate a backing store; instead it uses a cache buffer, takes the CGImageRef as its contents, and lets the render service draw the image data to the frame. When we scroll, the view reloads, wasting performance. So rather than -drawRect:, prefer drawing with CALayer: with CALayer's -drawInContext:, Core Animation allocates a backing store for the layer to hold the bitmaps those methods draw. The code inside those methods runs on the CPU, and the result is uploaded to the GPU. Performance is better this way.

-drawRect: is recommended for static pages, but not for dynamic pages.

Load on demand

  • Do not refresh the whole section or tableView; refresh the smallest element possible.
  • Use the run loop to improve scrolling smoothness: load content only when scrolling stops. During a fast flick there is no need to load content; fill cells with default placeholders instead.
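One common run-loop trick is to schedule the assignment only in the default mode, so it fires after scrolling (tracking mode) ends; the helper name is illustrative:

```swift
import UIKit

// Defer the image assignment until the run loop leaves the tracking mode,
// i.e. until scrolling pauses.
func setImageWhenIdle(_ image: UIImage, on imageView: UIImageView) {
    imageView.perform(#selector(setter: UIImageView.image),
                      with: image,
                      afterDelay: 0,
                      inModes: [.default])   // not fired while scrolling (tracking mode)
}
```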

About Performance Testing

When we run into graphics performance problems, such as scrolling or animations that aren't smooth enough, the first thing to do is locate the problem. This process should not rely only on experience and exhaustive trial and error; we should investigate methodically, in a scientific order.

First, we need to have a model for locating problems. We can follow this order step by step to locate the problem.

  1. To give the user a smooth feel, the frame rate needs to stay around 60 frames per second. When we run into problems, first check whether the frame rate holds at 60.
  2. Locate the bottleneck: is it the CPU or the GPU? We want occupancy as low as possible, both for fluency and to save power.
  3. Check for unnecessary CPU rendering, such as places where we override drawRect: when we don't need to and shouldn't. We want the GPU to do more of the work.
  4. Check whether there is too much off-screen rendering, which drains GPU resources, as already analyzed. Off-screen rendering makes the GPU constantly switch context between on-screen and off-screen. We want less of it.
  5. Check whether there is too much blending; the GPU saves resources by rendering opaque layers.
  6. Check whether the image formats are common ones and the sizes are reasonable. If the GPU does not support an image format, it can only be rendered by the CPU. PNG is the most common format in iOS development, and some of the material we have read indicates that Apple has specifically optimized its rendering and compression algorithms for PNG.
  7. Check for views or effects that cost a lot of resources; use them reasonably and sparingly.
  8. Finally, check whether our view hierarchy is unreasonable; for example, sometimes we keep adding and removing views and inadvertently introduce bugs.

Test tools:

  • Core Animation: a graphics performance testing tool in Instruments.
  • View Debugging: built into Xcode; inspects the view hierarchy.
  • Reveal: view hierarchy inspection.

Original author: Software iOS development

The original address: zhuanlan.zhihu.com/p/35693019