1. Off-screen rendering
On-screen rendering (current-screen rendering)
This refers to the GPU performing its rendering operations in the screen buffer that is currently used for display.
Normally, what we see on the screen is data that the GPU has rendered into the frame buffer, which the display then reads and shows.
Off-screen rendering
This refers to the GPU creating a new buffer, outside the current screen buffer, and performing rendering operations there.
If, due to some limitation, the rendering cannot be written directly into the frame buffer, but instead is held temporarily in another memory area before being written to the frame buffer, the process is called off-screen rendering. In other words, the GPU has to create a new buffer for rendering in addition to the current screen buffer.
To summarize: current-screen rendering means the GPU's rendering operations take place in the current screen buffer. Off-screen rendering, by contrast, is any rendering that does not take place in the current screen buffer; that is, the GPU creates a new buffer outside the current screen buffer for its rendering operations.
A special case is CPU rendering. When we override UIView's drawRect: method and draw with Core Graphics in our code, that is CPU rendering. The whole drawing is done synchronously by the CPU inside the app, and the rendered bitmap is finally submitted to the GPU for display. However, since Core Graphics is thread-safe, CPU rendering can also be performed asynchronously.
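A minimal sketch of this kind of CPU drawing, assuming it lives in a UIView subclass (the shape and color are purely illustrative):

- (void)drawRect:(CGRect)rect {
    // Core Graphics drawing runs on the CPU; the resulting bitmap is later handed to the GPU.
    CGContextRef ctx = UIGraphicsGetCurrentContext();
    CGContextSetFillColorWithColor(ctx, [UIColor orangeColor].CGColor);
    CGContextFillEllipseInRect(ctx, CGRectInset(rect, 10, 10));
}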
How the screen displays an image
Let’s start with the old CRT display principle. The CRT electron gun scans line by line from top to bottom; when the scan is complete the display shows one frame, and the gun then returns to its initial position for the next scan. To keep the display synchronized with the system’s video controller, the display (or other hardware) generates a series of timing signals using a hardware clock. When the gun moves to a new line and is ready to scan, the display emits a horizontal synchronization signal, or HSync; when a frame has been drawn and the gun has returned to its starting position, before the next frame is drawn, the display emits a vertical synchronization signal, or VSync. The display usually refreshes at a fixed rate, which is the frequency at which the VSync signal is generated. Although today’s devices are mostly LCD screens, the principle remains the same.
Generally speaking, the CPU, GPU, and display in a computer system cooperate as follows: the CPU calculates the display content and submits it to the GPU; after the GPU finishes rendering, the result is placed in the frame buffer; the video controller then reads the frame buffer line by line according to the VSync signal and passes the data to the display, possibly after digital-to-analog conversion.
In the simplest case there is only one frame buffer, and reading from and refreshing that buffer can then be inefficient. To solve this efficiency problem, display systems usually introduce two buffers, a double-buffering mechanism. In this case the GPU pre-renders one frame into a buffer for the video controller to read, and once the next frame has been rendered, the GPU points the video controller directly at the second buffer. This greatly improves efficiency.
Double buffering solves the efficiency problem but introduces a new one. If, while the video controller is still reading, that is, when the screen has displayed only part of a frame, the GPU submits a new frame to the frame buffer and swaps the two buffers, the video controller will display the lower part of the new frame on the screen, causing screen tearing.
To solve this problem, GPUs usually have a mechanism called VSync (vertical synchronization). When VSync is enabled, the GPU waits for a VSync signal from the display before rendering a new frame and updating the buffer. This solves the tearing problem and improves smoothness, but it consumes more computing resources and introduces some latency.
What about mainstream mobile devices? From what is publicly known, iOS devices always use double buffering with VSync enabled. On Android, Google did not introduce this mechanism until version 4.1; Android currently uses triple buffering plus VSync.
Off-screen rendering is expensive. To perform off-screen rendering, a new buffer must first be created; there is also the concept of a rendering context, and the whole off-screen rendering process requires switching contexts, first from the current screen to off-screen, and then, when it is finished, back again. This is why it costs performance. Because of the VSync mechanism, if the CPU or GPU fails to submit its content within one VSync period, that frame is discarded and shown at the next opportunity, while the display keeps the previous content. This is why the interface stutters.
2. The performance cost of off-screen rendering
- A new buffer needs to be created
- Throughout the off-screen rendering process the context has to be switched several times: first from the current screen (on-screen) to off-screen; then, when the off-screen rendering is finished and its result is to be shown on the screen, the context has to be switched from off-screen back to the current screen
3. Why is off-screen rendering so performance-intensive, and why does the mechanism exist?
When rounded corners, shadows, or masks are used, the combination of layer properties means the layer cannot be drawn directly to the screen before pre-compositing (before the next HSync signal arrives), so off-screen rendering is required. You can understand it this way: my boss asks me to build an app in a short time. I could do it by myself, but the time is too short, so I have to ask my friends to help (the performance cost is the cost of communicating with my friends, which is wasteful), but I have no choice, because otherwise I could not finish it. Seen this way, it makes a bit more sense.
4. How do I detect off-screen rendering in my project
The right way to check:
In Instruments, choose the Core Animation template and enable “Color Offscreen-Rendered Yellow”; layers that are rendered off-screen are highlighted in yellow, which may indicate a performance problem.
5. When is off-screen rendering triggered
- Setting rounded corners on a layer together with masksToBounds
- Setting a mask on a layer
- Setting shadows
- Turning on shouldRasterize (each of these is illustrated in the sketch below)
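A hedged sketch of these triggers on a plain view (the specific values are purely illustrative):

UIView *view = [[UIView alloc] initWithFrame:CGRectMake(0, 0, 100, 100)];
view.backgroundColor = [UIColor whiteColor];

// Rounded corners combined with masksToBounds
view.layer.cornerRadius = 8;
view.layer.masksToBounds = YES;

// A layer mask
CAShapeLayer *mask = [CAShapeLayer layer];
mask.path = [UIBezierPath bezierPathWithOvalInRect:view.bounds].CGPath;
view.layer.mask = mask;

// A shadow (without a pre-set shadowPath)
view.layer.shadowColor = [UIColor blackColor].CGColor;
view.layer.shadowOpacity = 0.5;
view.layer.shadowOffset = CGSizeMake(0, 2);

// Rasterization
view.layer.shouldRasterize = YES;
view.layer.rasterizationScale = [UIScreen mainScreen].scale;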
6. The pros and cons of off-screen rendering
Disadvantages
- Off-screen rendering requires additional storage space, up to a limit of 2.5 times the screen’s pixel count; beyond that, off-screen rendering cannot be used
- Frames drop easily: if, because of off-screen rendering, submitting the final frame buffer takes longer than 16.67 ms, the frame is dropped and the interface stutters
Advantages
Although off-screen rendering needs a new temporary buffer to hold intermediate state, content that appears on the screen many times can be rendered in advance and reused, so the CPU/GPU does not have to repeat the same calculations.
Some product requirements need multiple layers plus an off-screen cache to preserve intermediate state in order to achieve special effects; in such cases off-screen rendering is necessary. For example, if the product needs Gaussian blur, both a custom Gaussian blur and the system API will trigger off-screen rendering.
7. Setting rounded corners does not necessarily result in off-screen rendering
If the layer is simple, off-screen rendering is not triggered. For example, setting cornerRadius and masksToBounds on a UIImageView does not trigger off-screen rendering, but additionally setting a background color on the UIImageView does.
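A hedged sketch of that difference (the image name is a placeholder):

// Only the image layer has to be clipped, so no off-screen rendering in this simple case
UIImageView *avatar = [[UIImageView alloc] initWithFrame:CGRectMake(0, 0, 60, 60)];
avatar.image = [UIImage imageNamed:@"avatar"]; // placeholder asset name
avatar.layer.cornerRadius = 30;
avatar.layer.masksToBounds = YES;

// Adding a background color means the image and the background must be clipped
// and composited together, which does trigger off-screen rendering
avatar.backgroundColor = [UIColor whiteColor];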
8. Causes of interface lag
Those of you who play games will be familiar with the concept of FPS. Every screen has a refresh rate; the iPhone’s recommended refresh rate, for example, is 60 Hz, which means the GPU refreshes the screen 60 times per second, so the interval between refreshes is 16.67 ms. This interval is one frame. FPS means frames per second, that is, how many frames are displayed per second. For static content we do not need to worry about the refresh rate, but during animation or scrolling the FPS value directly reflects how smooth the interface is.
CPU work per frame:
1. Layout: UI layout, text calculation
2. Display: rendering (drawing)
3. Prepare: image decoding
4. Commit: bitmap submission
GPU rendering pipeline (OpenGL): vertex shading, primitive assembly, rasterization, fragment shading, per-fragment processing
Interface lag is mainly considered from two angles:
CPU limitations
1. Object creation, release, and property adjustment. In particular, adjusting CALayer properties creates implicit animations, which hurts performance.
2. View and text layout calculation. AutoLayout calculations run on the main thread and therefore take up a lot of CPU time.
3. Text rendering: UILabel and UITextView, for example, are rendered on the main thread.
4. Image decoding. Note that the CPU normally decodes a UIImage only at the moment just before it is handed to the GPU.
GPU limitations
- Blending of views. For example, when a dozen or so layers of views are stacked on an interface, the GPU has to calculate, for every pixel, how the overlapping layers blend.
- Off-screen rendering: view masks, rounded corners, shadows.
- Translucency: the GPU has to do the blending math; if a view is opaque, only the topmost layer needs to be taken.
- Non-integral (floating-point) pixel alignment.
9. Rasterization
Rasterization pre-renders a layer into a bitmap and adds it to a cache. If you cache static content that is expensive to render, such as a shadow effect, you can get a significant performance improvement.
In Instruments, the corresponding debugging option is “Color Hits Green and Misses Red”: a cache hit is shown in green and a miss in red, so obviously the more green and the less red, the better.
At the heart of rasterization lies the idea of caching. If we experiment with it ourselves, we can observe the following interesting phenomena:
- The view stays green while scrolling up and down a little
- A newly appearing label is red at first and then turns green
- If you hold still for one second and then start scrolling, the view is red at first
This is because the layer is rasterized into a bitmap stored in the cache. While the screen scrolls we read directly from the cache without re-rendering, so we see green. When a new label appears there is no bitmap for it in the cache yet, so it shows red. The third point is that objects in the cache have a lifetime of 100 ms: if they are not used within 0.1 s they are automatically evicted from the cache. That is why the view is red when you pause and then start scrolling again.
Rasterization caching is a double-edged sword: writing to the cache and then reading it back takes time of its own. Therefore rasterization is only suitable for relatively complex, static content. If Instruments shows that rasterization frequently misses the cache, and there is no special need for it, rasterization should be turned off.
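A hedged sketch of turning rasterization on for a complex, mostly static layer (assuming a variable named view):

// Cache the composited layer (e.g. one with an expensive shadow) as a bitmap
view.layer.shouldRasterize = YES;
// Match the screen scale, otherwise the cached bitmap looks blurry on Retina displays
view.layer.rasterizationScale = [UIScreen mainScreen].scale;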
Frame buffer definition
Frame Buffer: a direct image of the picture displayed on the screen, also known as a bitmap or raster. Each unit of the frame buffer corresponds to one pixel on the screen, and the entire frame buffer corresponds to one frame of the image.
Screen tearing
When the GPU has finished rendering a new frame and refreshes the frame buffer while the screen has displayed only part of the previous frame, screen tearing occurs. To solve this, GPUs usually have a mechanism called VSync (vertical synchronization). When VSync is enabled, the GPU waits for a VSync signal from the display before rendering a new frame and updating the buffer. This solves the tearing problem and improves smoothness, but consumes more computing resources and introduces some latency.
Screen stutter (lag)
After a VSync signal arrives, the system’s graphics service notifies the app through CADisplayLink and similar mechanisms, and the app’s main thread starts computing the display content on the CPU: view creation, layout calculation, image decoding, text drawing, and so on. The CPU then submits the computed content to the GPU, which transforms, composites, and renders it. The GPU submits the rendering result to the frame buffer and waits for the next VSync signal before it is displayed on the screen. Because of the VSync mechanism, if the CPU or GPU does not finish submitting its content within one VSync period, that frame is discarded and displayed at the next opportunity, while the display keeps showing the previous content. That is why the interface stutters.
Whichever of the CPU or GPU blocks the display pipeline will cause frames to drop. Therefore, during development, CPU pressure and GPU pressure need to be evaluated and optimized separately.
Causes and solutions of CPU resource consumption
Object creation
Creating an object allocates memory, adjusts properties, and may even read files, all of which consume CPU resources. You can optimize performance by replacing heavyweight objects with lightweight ones; for example, CALayer is much lighter than UIView, so controls that do not need to respond to touch events are better displayed with CALayer. If an object does not involve UI operations, try to create it on a background thread; unfortunately, controls that contain a CALayer can only be created and manipulated on the main thread. Creating view objects through a Storyboard is much more expensive than creating them directly in code, so Storyboard is not a good choice for performance-sensitive interfaces.
Try to postpone object creation as long as possible and spread it over multiple tasks. This is a bit of a hassle to implement and does not offer a huge advantage, but if it is feasible, try it. If objects can be reused, and the cost of reuse is lower than releasing and creating new ones, such objects should be reused through a cache pool whenever possible.
Object adjustment
Adjusting objects is also a frequent source of CPU consumption. A special note about CALayer: CALayer has no properties of its own internally. When a property method is called, it temporarily adds a method to the object via the runtime’s resolveInstanceMethod and stores the corresponding property value in an internal dictionary, while also notifying the delegate, creating animations, and so on, which is very resource-intensive. UIView’s display-related properties (frame/bounds/transform, for example) are actually mapped from CALayer properties, so adjusting these UIView properties consumes far more resources than ordinary properties. You should minimize unnecessary property changes in your application.
There are a lot of method calls and notifications between UIView and CALayer when view hierarchies are adjusted, so you should avoid adjusting view hierarchies, adding and removing views when optimizing performance.
Object destruction
Destroying objects consumes only a small amount of resources, but it adds up. Usually, when a container class holds a large number of objects, the resource cost of destroying them becomes noticeable. Likewise, if an object can be released on a background thread, move it there. Here is a small tip: capture the object in a block, dispatch it onto a background queue, and send it a trivial message to avoid compiler warnings, so the object is destroyed on the background thread.
NSArray *tmp = self.array; // capture a strong reference in the block below
self.array = nil;
dispatch_async(queue, ^{
    [tmp class]; // trivial message; tmp is released with the block on the background queue
});
Layout calculation
View layout calculation is the most common source of CPU consumption in an app. If the view layout is calculated in advance on a background thread and cached, this rarely causes performance issues.
No matter which technology is used to lay out views, it ultimately comes down to adjusting UIView’s frame/bounds/center properties. As mentioned above, adjusting these properties is very expensive, so try to calculate the layout in advance and, when needed, set the corresponding properties in one go rather than calculating and adjusting them repeatedly.
Autolayout
Autolayout is a technology promoted by Apple itself, and in most cases it greatly improves development efficiency, but for complex views Autolayout often causes serious performance problems. As the number of views grows, the CPU consumption caused by Autolayout grows exponentially. For more information, see pilky.me/36/. If you do not want to adjust frame and related properties manually, you can use helper methods instead (for example, common left/right/top/bottom/width/height convenience properties), or use frameworks such as ComponentKit or AsyncDisplayKit.
Text calculation
If an interface contains a large amount of text (for example a Weibo or WeChat Moments feed), calculating text width and height takes up a large share of resources and is unavoidable. If you have no special requirements for text display, you can follow the internal approach of UILabel: use [NSAttributedString boundingRectWithSize:options:context:] to calculate the text width and height, and -[NSAttributedString drawWithRect:options:context:] to draw the text. Although both methods perform well, they should still be run on background threads to avoid blocking the main thread.
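A hedged sketch of that calculation done off the main thread (the helper and its name are just for illustration):

// Calculate the text size on a background queue, then hand it back on the main queue.
static void MeasureText(NSAttributedString *text, CGFloat maxWidth, void (^completion)(CGSize)) {
    dispatch_async(dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^{
        CGRect rect = [text boundingRectWithSize:CGSizeMake(maxWidth, CGFLOAT_MAX)
                                         options:NSStringDrawingUsesLineFragmentOrigin | NSStringDrawingUsesFontLeading
                                         context:nil];
        dispatch_async(dispatch_get_main_queue(), ^{
            completion(rect.size); // e.g. use this size to set the label or cell height
        });
    });
}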
If you draw text with CoreText, you can create a CoreText layout object, do the calculation yourself, and keep the CoreText object around for the later drawing.
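A hedged sketch of measuring text with a CoreText framesetter (requires CoreText; attributedText and maxWidth are assumed to exist):

// The framesetter does the layout work once; both the suggested size and the
// framesetter itself can be cached and reused for the later drawing pass.
CTFramesetterRef framesetter =
    CTFramesetterCreateWithAttributedString((__bridge CFAttributedStringRef)attributedText);
CGSize suggested = CTFramesetterSuggestFrameSizeWithConstraints(framesetter,
                                                                CFRangeMake(0, 0),
                                                                NULL,
                                                                CGSizeMake(maxWidth, CGFLOAT_MAX),
                                                                NULL);
// ... keep framesetter to create a CTFrame and draw later, then CFRelease it.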
Text rendering
All text controls visible on the screen, including UIWebView, are ultimately laid out and drawn into bitmaps through CoreText at the lowest level. Common text controls (UILabel, UITextView, and so on) do their typesetting and drawing on the main thread, so when a large amount of text is displayed the CPU pressure becomes very high. There is only one solution: a custom text control that draws text asynchronously using TextKit or the lower-level CoreText. Once a CoreText object is created, it can directly provide the text’s width and height, avoiding repeated calculation (once when the UILabel is resized and again when it is drawn). CoreText objects also take up little memory and can be cached for multiple later draws.
Image decoding
When you create an image with UIImage or CGImageSource methods, the image data is not decoded immediately. Only when the image is set on a UIImageView or on CALayer.contents, and the CALayer is about to be submitted to the GPU, is the data in the CGImage decoded; this step happens on the main thread and is unavoidable. If you want to get around this mechanism, the common approach is to draw the image into a CGBitmapContext on a background thread first and then create the image directly from that bitmap. Popular web image libraries all provide this feature.
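A hedged sketch of such a forced decode (the helper name is just for illustration; call it on a background queue and cache the result):

static UIImage *ForceDecodedImage(UIImage *image) {
    CGImageRef cgImage = image.CGImage;
    if (!cgImage) return image;
    size_t width = CGImageGetWidth(cgImage);
    size_t height = CGImageGetHeight(cgImage);
    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    // Redrawing into a bitmap context forces the compressed data to be decoded now,
    // instead of on the main thread when the layer is committed.
    CGContextRef ctx = CGBitmapContextCreate(NULL, width, height, 8, 0, colorSpace,
                                             kCGImageAlphaPremultipliedFirst | kCGBitmapByteOrder32Host);
    CGColorSpaceRelease(colorSpace);
    if (!ctx) return image;
    CGContextDrawImage(ctx, CGRectMake(0, 0, width, height), cgImage);
    CGImageRef decoded = CGBitmapContextCreateImage(ctx);
    CGContextRelease(ctx);
    UIImage *result = [UIImage imageWithCGImage:decoded
                                          scale:image.scale
                                    orientation:image.imageOrientation];
    CGImageRelease(decoded);
    return result;
}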
Image drawing
Drawing an image usually refers to drawing into a canvas with methods that begin with CG and then creating and displaying an image from that canvas. The most common place this happens is inside [UIView drawRect:]. Since Core Graphics methods are generally thread-safe, image drawing can easily be moved to a background thread. The flow of a simple asynchronous drawing looks roughly like this (in practice it is more complicated, but the principle is the same):
- (void)display {
    dispatch_async(backgroundQueue, ^{
        CGContextRef ctx = CGBitmapContextCreate(...);
        // draw in context...
        CGImageRef img = CGBitmapContextCreateImage(ctx);
        CFRelease(ctx);
        dispatch_async(mainQueue, ^{
            layer.contents = (__bridge id)img;
        });
    });
}
Causes and solutions of GPU resource consumption
Compared with the CPU, what the GPU does is relatively simple: it takes the submitted textures and vertex descriptions, applies transforms, blends and renders, and outputs the result to the screen. What you typically see consists of textures (images) and shapes (vector shapes simulated with triangles).
Texture rendering
All bitmaps, including images, text, and rasterized content, are eventually submitted from memory to video memory and bound as GPU textures. Both the submission to video memory and the GPU’s work of transforming and rendering textures consume a lot of GPU resources. When a large number of images is displayed in a short time (for example when a TableView full of images is scrolled quickly), CPU usage stays low while GPU usage is very high, and the interface still drops frames. The only way to avoid this is to minimize the display of large numbers of images in a short time, combining multiple images into one for display where possible.
When an image is too large and exceeds the GPU’s maximum texture size, the CPU has to pre-process it, which adds extra cost on both the CPU and the GPU. On iPhone 4S and later models the texture size limit is 4096×4096; more detailed information is available at iosres.com. So try not to let images or views exceed this size.
Blending of views
When multiple views (or CALayers) are displayed on top of each other, the GPU first blends them together. If the view hierarchy is too complex, this blending consumes a lot of GPU resources. To reduce GPU consumption here, an application should minimize the number and nesting of views, and mark opaque views with the opaque property to avoid useless alpha-channel compositing. This can also be achieved by pre-rendering multiple views into a single image.
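A hedged sketch of marking a fully covered view as opaque (assuming a variable named label):

label.backgroundColor = [UIColor whiteColor]; // a nil or clear background forces blending with the content underneath
label.opaque = YES;                           // tells the compositor there is nothing to blend through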
Graphics generation
CALayer’s border, rounded corners, shadows, and masks, as well as CAShapeLayer’s vector graphics, usually trigger off-screen rendering, which normally happens on the GPU. When a list view shows a large number of CALayers with rounded corners and is swiped quickly, you can observe that the GPU is fully loaded while CPU usage stays low; the interface still scrolls, but the frame rate drops very low. To avoid this you can try turning on the CALayer.shouldRasterize property, but that shifts the off-screen rendering work onto the CPU. For cases where only rounded corners are needed, you can also simulate the same visual effect by overlaying an already-drawn rounded-corner image on the original view. The most radical solution is to draw the graphics to be displayed as images on background threads, avoiding rounded corners, shadows, masks, and similar properties altogether.
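A hedged sketch of that last approach, pre-drawing a rounded image on a background queue instead of setting cornerRadius + masksToBounds (image, size, radius, and imageView are assumed to exist):

dispatch_async(dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^{
    // Draw the image clipped to a rounded rect into an off-screen bitmap once, on the CPU.
    UIGraphicsBeginImageContextWithOptions(size, NO, 0);
    [[UIBezierPath bezierPathWithRoundedRect:(CGRect){CGPointZero, size}
                                cornerRadius:radius] addClip];
    [image drawInRect:(CGRect){CGPointZero, size}];
    UIImage *rounded = UIGraphicsGetImageFromCurrentImageContext();
    UIGraphicsEndImageContext();
    dispatch_async(dispatch_get_main_queue(), ^{
        imageView.image = rounded; // no cornerRadius/masksToBounds needed on the layer
    });
});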