Rendering and off-screen rendering in OC

I. Related basic concepts

1. FrameBuffer

FrameBuffer: short for frame buffer (also called video memory), it is a direct image of the picture displayed on the screen, also known as a bitmap or raster. Each storage unit of the frame buffer corresponds to one pixel on the screen, and the entire frame buffer corresponds to one frame of the image.

2. Context switch

A context is the contents of the CPU registers and program counter at a point in time. Registers are small, high-speed storage units used to temporarily hold instructions, data, and addresses. The program counter is a special register indicating where in the instruction sequence the CPU is executing, either the current instruction or the next one (depending on the system). A context switch (sometimes called a process switch or task switch) moves the CPU from one process/thread/task to another. More concretely, a context switch can be thought of as the kernel (the core of the operating system) doing the following on the CPU for a process (or thread): (1) suspend the process and store its CPU context somewhere in memory; (2) retrieve the context of the next process from memory and restore it into the CPU registers; (3) jump to the position indicated by the program counter (i.e., the line of code where that process was interrupted) to resume it.

Note: the CPU implements multithreading by allocating CPU time slices to each process (and its threads). The CPU uses a time-slice allocation algorithm to execute tasks: after the current task has run for one time slice, it switches to the next task, but the context of the previous task is saved before the switch so that, when the task is scheduled again, its state can be reloaded and it can continue where it left off. The process from saving to reloading is a context switch. Context switching is usually expensive, meaning it consumes a lot of CPU time, so more threads are not always better. Reducing the number of context switches in the system is a key issue in improving multithreaded performance.

3. The painter's algorithm

The painter's algorithm, also known as priority fill, is a solution to the visibility problem in computer graphics. The name refers to a simple-minded painter who first paints the distant parts of a scene and then paints the nearer parts over them, covering the distant portions. The painter's algorithm first sorts the polygons in the scene by depth and then paints them in that order. Parts that are not visible are simply painted over, which solves the visibility problem.

The main rendering operations in iOS are performed by Core Animation's Render Server module by calling the graphics driver interface, OpenGL/Metal.

In iOS, for each layer, the Render Server outputs it to the FrameBuffer in order of depth, with later layers overlaying earlier ones to achieve the final result. The process is irreversible: data that has been covered is permanently lost and cannot be modified.

4. Rasterization

Rasterization is the process of converting vertex data into fragments. It turns a geometric description into an image composed of a grid, where each grid element corresponds to one pixel in the frame buffer.

The iOS rendering process

The iOS rendering process is the pipeline an iOS device goes through from setting up the image metadata to be displayed to rendering the image on the device screen.

Before we start dissecting the iOS Rendering Process, we need to have a basic understanding of iOS Rendering concepts:

1. Tile-based rendering

The screen of an iOS device is divided into tiles of N by N pixels, each of which fits into the SoC cache. Geometry is split across a large number of tiles, and rasterization can only be performed after all the geometry has been submitted.

Note: rasterization here refers to rendering the large amount of split-up geometry as pixels on the screen.

2. The iOS rendering technical framework

In fact, the hierarchy associated with iOS rendering is as follows:

Graphics rendering technology stack

The graphics rendering stack of an iOS app is shown in the figure below. An app uses frameworks such as Core Graphics, Core Animation, and Core Image to draw visual content, and these frameworks also depend on one another. All of them ultimately use OpenGL to drive the GPU for drawing and finally display the content on the screen.

The main functions of each framework are as follows:

UIKit

The user-interaction components we use in daily development all come from the UIKit framework. We draw the interface by setting properties of UIKit components such as layout and backgroundColor. In fact, UIKit itself has no ability to render images on screen; it is mainly responsible for responding to user interaction events, and event responses are dispatched essentially by traversing the view tree layer by layer.

Core Animation

Core Animation is derived from LayerKit, and animation is only the tip of the iceberg. Core Animation is a compositing engine whose job is to combine different pieces of on-screen visual content as quickly as possible. These pieces are broken down into independent layers (CALayers) stored in a system called the layer tree. Essentially, CALayer is the foundation of everything the user sees on screen.

Core Graphics

Core Graphics is based on the Quartz advanced graphics engine and is primarily used to draw images at run time. Developers can use this framework to handle path-based drawing, transformations, color management, off-screen rendering, patterns, gradients and shadows, image data management, image creation and masking, and PDF document creation, display, and parsing. When developers need to create images at run time, they can draw them with Core Graphics. The opposite approach is to create images before run time, for example producing them in advance with Photoshop and importing them directly into the app. When we instead need to compute and draw a series of image frames in real time, for example to animate them, we need Core Graphics.
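
For instance, here is a minimal sketch of run-time drawing with Core Graphics via UIKit's image context; the MakeBadgeImage helper and its parameters are hypothetical, not from the original article:

    #import <UIKit/UIKit.h>

    // Hypothetical helper: draw a rounded "badge" image at run time instead of
    // shipping a pre-made asset.
    static UIImage *MakeBadgeImage(CGSize size, UIColor *color) {
        // opaque = NO, scale = 0.0 means "use the screen scale".
        UIGraphicsBeginImageContextWithOptions(size, NO, 0.0);

        UIBezierPath *path = [UIBezierPath bezierPathWithRoundedRect:(CGRect){CGPointZero, size}
                                                        cornerRadius:size.height / 2];
        [color setFill];
        [path fill];

        UIImage *image = UIGraphicsGetImageFromCurrentImageContext();
        UIGraphicsEndImageContext();
        return image;
    }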

Core Image

Core Image is the opposite of Core Graphics: Core Graphics is used to create images at run time, while Core Image processes images that were created before run time. The Core Image framework provides a series of ready-made image filters to process existing images efficiently.

In most cases, Core Image will do the work on the GPU, but if the GPU is busy, it will use the CPU for processing.
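
As an illustration, a minimal sketch of applying a built-in Core Image filter to an existing image; the ApplySepia helper and the 0.8 intensity value are assumptions, not from the article:

    #import <CoreImage/CoreImage.h>
    #import <UIKit/UIKit.h>

    // Hypothetical helper: run an existing UIImage through a ready-made filter.
    static UIImage *ApplySepia(UIImage *input) {
        CIImage *ciInput = [CIImage imageWithCGImage:input.CGImage];

        CIFilter *filter = [CIFilter filterWithName:@"CISepiaTone"];
        [filter setValue:ciInput forKey:kCIInputImageKey];
        [filter setValue:@0.8 forKey:kCIInputIntensityKey];

        // The CIContext decides internally whether the work runs on the GPU or the CPU.
        CIContext *context = [CIContext contextWithOptions:nil];
        CGImageRef cgOutput = [context createCGImage:filter.outputImage
                                            fromRect:filter.outputImage.extent];
        UIImage *result = [UIImage imageWithCGImage:cgOutput];
        CGImageRelease(cgOutput);
        return result;
    }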

OpenGL ES

OpenGL ES (GLES) is a subset of OpenGL. OpenGL is a third-party standard; the internal implementation of its functions is developed by the corresponding GPU vendors.

Metal

Metal plays a role similar to OpenGL ES, but it is Apple's own standard and implementation. Most developers do not use Metal directly, yet virtually all developers use it indirectly: Core Animation, Core Image, SceneKit, SpriteKit, and other rendering frameworks are all built on top of Metal.

When you debug an OpenGL program on a real device, the console prints a log showing that Metal is enabled. From this you can infer that Apple has implemented a mechanism that seamlessly bridges OpenGL commands to Metal and lets Metal handle the actual hardware interaction.

UIView versus CALayer

As mentioned earlier in the introduction to Core Animation, CALayer is actually the foundation of everything the user sees on screen. So how do UIKit views display visual content? Because every UI control in UIKit has an associated CALayer, called the backing layer, inside it.

Because of this one-to-one correspondence, just as the views are organized into a view tree, the corresponding CALayers are organized into a layer tree. The view's job is to create and manage layers, ensuring that when a subview is added to or removed from the hierarchy, its associated layer performs the same operation in the layer tree, so that the view tree and the layer tree stay structurally consistent. The view layer also manages responses to user interaction.

Q: So why does iOS provide two parallel hierarchies based on UIView and CALayer?

The reason for this is separation of responsibilities, which also avoids a lot of duplicate code. Events and user interaction are different in a lot of ways on iOS and Mac OS X. There’s a fundamental difference between a multi-touch based user interface and a mouse and keyboard based interaction. That’s why iOS has UIKit and UIView. Mac OS X has AppKit and NSView. They are similar in function, but there are significant differences in implementation.

In fact, there are not two hierarchies but four, each playing a different role: in addition to the view tree and the layer tree, there are also the presentation tree and the render tree.

1. CALayer

So why can CALayer present visual content? Because a CALayer is essentially a texture, and textures are an important basis for GPU image rendering.

A texture is essentially an image, so CALayer also has a contents property that points to a cache called the backing store, which holds a bitmap. In iOS, the image stored in this cache is called the backing image. Graphics rendering pipelines support drawing from vertices (the vertices are processed to produce a texture), as well as rendering directly with a texture (image). Correspondingly, there are two ways to draw an interface in practice: one is manual drawing, the other is using an image. In iOS these take two forms:

  • Use an image: Contents Image
  • Manual drawing: Custom Drawing

Contents Image

Contents Image means configuring the image through CALayer's contents property. However, the contents property is of type id; you can assign it any value and the app will still compile, but in practice, if the value is not a CGImage, the resulting layer will be blank. So why is the contents property declared as id instead of CGImage? Because on Mac OS the property works with both CGImage and NSImage values, while on iOS it only works with CGImage. Essentially, the contents property points to an area of cache, called the backing store, where bitmap data can be stored.
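
A minimal sketch of the Contents Image approach; the AttachImageLayer helper and the "avatar" asset name are hypothetical:

    #import <QuartzCore/QuartzCore.h>
    #import <UIKit/UIKit.h>

    // Hypothetical helper: configure a layer's backing image directly via contents.
    static void AttachImageLayer(UIView *hostView) {
        CALayer *layer = [CALayer layer];
        layer.frame = CGRectMake(0, 0, 100, 100);

        UIImage *image = [UIImage imageNamed:@"avatar"];    // hypothetical asset name
        // The property is typed id, but on iOS only a CGImage produces visible content.
        layer.contents = (__bridge id)image.CGImage;
        layer.contentsScale = [UIScreen mainScreen].scale;  // match the screen scale to avoid blurriness
        layer.contentsGravity = kCAGravityResizeAspectFill;

        [hostView.layer addSublayer:layer];
    }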

Custom Drawing

Custom Drawing means using Core Graphics to draw the backing image directly. In practice, you typically customize drawing by subclassing UIView and implementing the -drawRect: method. Although -drawRect: is a UIView method, it is actually the underlying CALayer that performs the redraw and saves the resulting image. The diagram below shows the basic principle of how -drawRect: defines the backing image. UIView has an associated CALayer, and CALayer has an optional delegate property that implements the CALayerDelegate protocol; UIView implements CALayerDelegate and acts as the layer's delegate. When it is time to redraw, CALayer asks its delegate for a backing image to display. CALayer first tries to call the -displayLayer: method, in which the delegate can set the contents property directly.

  • -(void)displayLayer:(CALayer *)layer;

If the delegate does not implement -displayLayer:, CALayer tries to call -drawLayer:inContext: instead. Before calling this method, CALayer creates an empty backing image (its size determined by bounds and contentsScale) and a Core Graphics drawing context, passed in as the ctx argument, in preparation for drawing the backing image.

  • -(void)drawLayer:(CALayer *)layer inContext:(CGContextRef)ctx;

Finally, the backing image generated by Core Graphics is stored in the backing store.
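
Putting the pieces together, a minimal Custom Drawing sketch; the RingView class is a hypothetical example, not from the article. The view acts as its layer's delegate, and when the layer needs a backing image, UIKit routes the request into -drawRect: with a ready-made Core Graphics context:

    #import <UIKit/UIKit.h>

    @interface RingView : UIView   // hypothetical example view
    @end

    @implementation RingView

    // CALayer asks its delegate (the view) for a backing image; UIKit routes
    // -drawLayer:inContext: into -drawRect:, with the context already set up.
    - (void)drawRect:(CGRect)rect {
        CGContextRef ctx = UIGraphicsGetCurrentContext();   // context created by CALayer
        CGContextSetLineWidth(ctx, 4.0);
        CGContextSetStrokeColorWithColor(ctx, [UIColor blueColor].CGColor);
        CGContextStrokeEllipseInRect(ctx, CGRectInset(self.bounds, 2, 2));
        // The resulting bitmap is saved in the layer's backing store and reused
        // until -setNeedsDisplay triggers another redraw.
    }

    @end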

Core Animation pipeline

Now that we understand the nature of CALayer, how does it invoke the GPU and display visual content? This is where the Core Animation pipeline comes in. In fact, the app itself is not responsible for rendering; rendering is handled by a separate process, the Render Server. The app submits the render task and related data to the Render Server via IPC. After the Render Server processes the data, it passes it to the GPU, and finally the GPU drives the iOS display hardware to show the result.

The detailed process of Core Animation assembly line is as follows:

  • First, the app handles events (Events), such as user taps. During this process the app may need to update the view tree, and the layer tree is updated accordingly.
  • Second, the app completes the calculation of the display content on the CPU, such as view creation, layout calculation, image decoding, and text drawing. Once the display content has been calculated, the app packages the layers and sends them to the Render Server on the next RunLoop, completing a Commit Transaction.
  • The Render Server mainly executes OpenGL and Core Graphics related programs and calls the GPU.
  • The GPU completes the rendering of the image at the physical level.
  • Finally, the GPU displays the image on the screen through the frame buffer, video controller, and other related components.

The steps above are executed in cascade and take far more than 16.67 ms in total, so to support a 60 FPS screen refresh rate they have to be broken down and executed as a parallel pipeline, as shown in the figure below.

Commit Transaction

In the Core Animation pipeline, the Commit Transaction, the last step before the app hands off to the Render Server, can itself be divided into four steps:

  • Layout

The Layout stage mainly constructs the views, including -layoutSubviews overrides, filling in subviews with addSubview:, and so on.

  • Display

The Display stage mainly draws the views; only the image metadata to be displayed is set here. Overriding a view's -drawRect: method lets you customize the UIView's display by drawing a backing image inside it, which consumes CPU time and memory.

  • Prepare

The Prepare stage is an additional step that generally handles image decoding and format conversion.

  • Commit

The Commit stage mainly packages the layers and sends them to the Render Server. This is done recursively, because layers and views are organized as trees.

Principles of animation rendering

iOS animation rendering is also based on the Core Animation pipeline described above. Here we focus on the interaction between the app and the Render Server. In daily development, UIView animations are generally used for animations that are not particularly complex. iOS divides their processing into the following three stages, sketched in code after the list:

  • Step 1: Call the animateWithDuration:animations: method.
  • Step 2: Perform the Layout, Display, Prepare, and Commit steps for the contents of the animation block.
  • Step 3: The Render Server renders the animation frame by frame.
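
A minimal sketch of these stages; the FadeOut helper and the view it receives are hypothetical. Only the target values are set in the block; the per-frame interpolation is done by the Render Server, not by the app:

    #import <UIKit/UIKit.h>

    // Hypothetical helper: fade out and shrink an existing view.
    static void FadeOut(UIView *someView) {
        [UIView animateWithDuration:0.25 animations:^{
            // Steps 1-2: only the target values are recorded here; Layout/Display/
            // Prepare/Commit run for them on the next RunLoop.
            someView.alpha = 0.0;
            someView.transform = CGAffineTransformMakeScale(0.8, 0.8);
        }];
        // Step 3: the Render Server interpolates and renders the in-between frames.
    }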

Off-screen rendering

Definition of off-screen rendering

In OpenGL, the GPU has two ways of rendering to the screen:

  • On-screen rendering: the normal case; we render directly into the FrameBuffer, then read the data and display it on the screen.
  • Off-screen rendering: if, due to some limitation, the rendered result cannot be written directly into the frame buffer, it is first stored temporarily in another area of memory and only later written into the frame buffer. This process is called off-screen rendering.

From the Core Animation pipeline diagram above, we know that the main rendering work is performed by Core Animation's Render Server module by calling the OpenGL or Metal interfaces provided by the graphics driver. For each layer, the Render Server follows the painter's algorithm (far to near), outputs the layers to the frame buffer in order, and then draws them to the screen in order; once a layer has been drawn, it is removed from the frame buffer (to save space). As shown below, outputting from left to right produces the final display result. However, in some scenarios, although the painter's algorithm can output layer by layer, it cannot go back and erase or modify part of a layer after that layer has been rendered, because the pixel data of the earlier layers has already been permanently overwritten. This means that each layer must either be renderable in a single pass, or a block of memory must be set aside as a temporary staging area for more complex modification and clipping operations.

To illustrate: rounding and clipping the image view in figure 3 (imageView.clipsToBounds = YES, imageView.layer.cornerRadius = 10) is not a simple layer overlay. After figures 1, 2, and 3 have been rendered, they still need to be clipped, and because the parent view has rounded corners, its subviews' layers also need to be clipped; but you cannot go back and erase or change parts of a layer after it has been rendered. Since the normal process cannot be followed, Apple instead renders each layer into a separate buffer, the off-screen buffer, and only after compositing and processing stores the result in the frame buffer and then draws it on the screen. This is off-screen rendering.

Analysis of clip-triggered off-screen rendering

Examples of common corner-radius settings and whether they trigger off-screen rendering:

  • btn1 sets an image, sets rounded corners, turns on clipsToBounds = YES, and triggers off-screen rendering.
  • btn2 sets a background color, sets rounded corners, turns on clipsToBounds = YES, and does not trigger off-screen rendering.
  • img1 sets an image, sets rounded corners, turns on masksToBounds = YES, and triggers off-screen rendering.
  • img2 sets a background color, sets rounded corners, turns on masksToBounds = YES, and does not trigger off-screen rendering.

For each layer, you must either find an algorithm that can render it in a single pass, or open up a separate block of memory and use that temporary staging area for more complex, multi-step modification and clipping operations. For btn1, because it contains a subview (the image is a subview of the button), clipping btn1 must also remove the overflowing parts of the subview, which triggers off-screen rendering. As for img1, when a backgroundColor is also set, the painter's algorithm has to draw the backgroundColor first and then the image in contents, so rendering likewise cannot be completed in a single pass and off-screen rendering is triggered. btn2 and img2, on the other hand, can each be rendered in a single pass, so they do not trigger off-screen rendering. The four configurations are sketched in code below.
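
A hedged sketch of the four configurations; the views, image names, and the ConfigureCornerExamples helper are hypothetical. Whether off-screen rendering actually fires can be checked with the Simulator's Debug > Color Off-screen Rendered overlay:

    #import <UIKit/UIKit.h>

    static void ConfigureCornerExamples(UIButton *btn1, UIButton *btn2,
                                        UIImageView *img1, UIImageView *img2) {
        // btn1: image + rounded corners + clipping. The image lives in a subview,
        // so clipping cannot finish in a single pass: off-screen rendering is triggered.
        [btn1 setImage:[UIImage imageNamed:@"icon"] forState:UIControlStateNormal];
        btn1.layer.cornerRadius = 10;
        btn1.clipsToBounds = YES;

        // btn2: background color only + rounded corners + clipping: single pass,
        // no off-screen rendering.
        btn2.backgroundColor = [UIColor redColor];
        btn2.layer.cornerRadius = 10;
        btn2.clipsToBounds = YES;

        // img1: image (plus a backgroundColor, as discussed above) + rounded corners +
        // clipping: background and contents must be drawn separately and then clipped,
        // so off-screen rendering is triggered.
        img1.image = [UIImage imageNamed:@"photo"];
        img1.backgroundColor = [UIColor whiteColor];
        img1.layer.cornerRadius = 10;
        img1.layer.masksToBounds = YES;

        // img2: background color only + rounded corners + clipping: single pass,
        // no off-screen rendering.
        img2.backgroundColor = [UIColor redColor];
        img2.layer.cornerRadius = 10;
        img2.layer.masksToBounds = YES;
    }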

Common off-screen rendering scene analysis

  • cornerRadius + clipsToBounds: it is the combination of the two that triggers off-screen rendering. If you only set cornerRadius (you don't need to clip the content, you only want a rounded border), or you only need to clip the content to a rectangular area (this is still clipping, but for a pure rectangle the implementation apparently does not need to open extra memory), off-screen rendering is not triggered. There are several options for optimizing rounded-corner clipping depending on the scenario, and the AsyncDisplayKit documentation is highly recommended reading.

  • Shadow: although the layer itself is a rectangular area, the shadow is by default applied to the layer's "opaque area" and must be displayed below the content of all layers. Since at that point the body of the shadow (the layer and its sublayers) has not yet been composited, you would have to draw, as the first step, a shape that cannot be known until the last step is finished.

In this case, the renderer presumably has to allocate another block of memory, draw the body's content first, add the shadow to the frame buffer according to the shape of that rendered result, and finally draw the content on top (this is just my guess; it could be more complicated). However, if we tell Core Animation the geometry of the shadow in advance (via the shadowPath property), then the shadow can of course be rendered independently first, without depending on the layer body, so off-screen rendering is no longer needed.
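
For example, a minimal sketch of handing Core Animation the shadow geometry up front via shadowPath; the AddCardShadow helper and its shadow values are hypothetical:

    #import <UIKit/UIKit.h>

    // Hypothetical helper: configure a shadow whose geometry is known up front.
    static void AddCardShadow(UIView *cardView) {
        cardView.layer.shadowColor = [UIColor blackColor].CGColor;
        cardView.layer.shadowOpacity = 0.3;
        cardView.layer.shadowOffset = CGSizeMake(0, 2);
        cardView.layer.shadowRadius = 6;
        // Giving Core Animation the shadow's shape lets it render the shadow
        // independently of the layer body, avoiding the off-screen pass described above.
        cardView.layer.shadowPath = [UIBezierPath bezierPathWithRoundedRect:cardView.bounds
                                                               cornerRadius:8].CGPath;
    }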

  • Group opacity: alpha is not applied to each layer separately; instead, the alpha is applied uniformly only after the entire layer tree has been drawn, and the result is then composited with the pixels of the layers below. Obviously, the final result cannot be obtained in a single pass. Place a pair of blue and red layers on top of each other, set opacity = 0.5 on the parent layer, and put a copy next to it for comparison: group opacity is turned off on the left, while the right keeps the default (starting with iOS 7, group opacity is on unless explicitly disabled). When debugging with the off-screen rendering color overlay, the group on the right is indeed rendered off-screen.

  • Mask: as we know, a mask is applied on top of the layer and the combination of all of its sublayers, and it may have transparency. The principle is similar to that of group opacity: it has to be completed with off-screen rendering.

  • Other cases, such as allowsEdgeAntialiasing, can also trigger off-screen rendering, and the principle is the same: if the final result cannot be drawn using only the frame buffer, another block of memory has to be opened to store the intermediate result. There is nothing mysterious about these principles.

Performance impact of GPU off-screen rendering

The impact of off-screen rendering on performance is not only the extra memory it requires, but also the context switching it causes. GPU operation is highly pipelined: all of the computational work is being methodically output to the frame buffer when suddenly an instruction arrives saying the output must go to another block of memory, so everything in flight in the pipeline has to be discarded and the GPU switches to serving only the current "rounding" operation. When that is done, the pipeline is flushed again and the GPU returns to the normal flow of outputting to the frame buffer.

In a tableView or collectionView, every frame of scrolling triggers a re-composition of every cell, so once off-screen rendering is involved, the context switch described above happens 60 times per second, and there may be dozens of images per frame that require it. You can imagine the impact on GPU performance (GPUs are very good at massively parallel computation, but frequent context switching is clearly not what they are designed for).

Use off-screen rendering to improve performance

CALayer provides a remedy for this scenario: shouldRasterize. Once it is set to YES, the Render Server forces the layer's rendering result (including its sublayers, rounded corners, shadows, group opacity, and so on) to be saved in a block of memory, so that the next frame can reuse it without triggering another off-screen rendering. There are a few points to note (a minimal usage sketch follows the list):

  • shouldRasterize aims to reduce the performance penalty, but it always triggers at least one off-screen render. If your layer is not complicated and has no rounded corners, shadows, etc., turning it on adds an unnecessary off-screen render.

  • The off-screen rendering cache has a maximum size of 2.5 times the total pixels of the screen

  • If the cache goes unused for more than 100 ms, it is automatically discarded.

  • The contents of the layer (including its sublayers) must be static, because any change (e.g., a resize or an animation) invalidates the cache you worked so hard to create. If this happens too often, we are back to the "every frame needs an off-screen render" situation, which is exactly what developers need to avoid. Xcode provides the "Color Hits Green and Misses Red" option to help check whether the cache is being used as expected.

  • In addition to addressing the overhead of repeated off-screen rendering, shouldRasterize can also be used in another scenario: if the layer's substructure is very complex and takes a long time to render, you can also turn this switch on, draw the layer into a cache, and reuse the result later so that the entire layer tree does not have to be redrawn every time.
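
A minimal usage sketch; the RasterizeCellLayer helper and the cell it receives are hypothetical:

    #import <UIKit/UIKit.h>

    // Hypothetical helper: cache a complex cell layer's composited result.
    static void RasterizeCellLayer(UITableViewCell *cell) {
        // Triggers one off-screen render, then reuses the cached bitmap on later
        // frames as long as the layer's content stays static.
        cell.layer.shouldRasterize = YES;
        // Without this, the cache is rendered at 1x and looks blurry on Retina screens.
        cell.layer.rasterizationScale = [UIScreen mainScreen].scale;
    }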

When do I need CPU rendering

Tuning rendering performance is always about one thing: balancing the load between the CPU and the GPU so that each does what it does best. In most cases, thanks to the GPU's optimizations for graphics processing, we tend to leave rendering to the GPU, giving the CPU enough time to handle the app's complex logic. Core Animation does a lot of work to convert rendering into GPU-friendly forms (i.e., hardware acceleration, such as layer compositing, setting backgroundColor, and so on).

However, in some cases, such as text (Core Text, which draws with Core Graphics) and images (ImageIO), the GPU is not well suited to the task, so the CPU has to do the processing first and then hand the result to the GPU as a texture. In addition, sometimes the GPU is too busy while the CPU is relatively idle (a GPU bottleneck); in that case the CPU can take over part of the work to improve overall efficiency.
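
As a hedged illustration (not the system's internal mechanism; the SetImageAsync helper and its arguments are hypothetical), the CPU can pre-decode and draw a large image on a background queue with Core Graphics, so that the GPU only has to composite a ready texture:

    #import <UIKit/UIKit.h>

    // Hypothetical helper: decode/draw a large image on the CPU off the main thread,
    // then hand the finished bitmap to UIKit so the GPU only composites a ready texture.
    static void SetImageAsync(UIImageView *imageView, UIImage *bigImage) {
        CGSize targetSize = imageView.bounds.size;   // read UIKit state on the main thread
        dispatch_async(dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^{
            // Drawing forces decoding, and here it happens on a background CPU thread.
            UIGraphicsBeginImageContextWithOptions(targetSize, YES, 0.0);
            [bigImage drawInRect:(CGRect){CGPointZero, targetSize}];
            UIImage *decoded = UIGraphicsGetImageFromCurrentImageContext();
            UIGraphicsEndImageContext();

            dispatch_async(dispatch_get_main_queue(), ^{
                imageView.image = decoded;   // UIKit updates go back to the main thread
            });
        });
    }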
