To understand how graphics and images are displayed, you need knowledge that spans from the underlying hardware up to the software frameworks, because a visualization program is completed through the cooperation of the CPU and the GPU.

Basic knowledge

Let’s look at the basic concepts of the CPU and GPU:

  • The Central Processing Unit (CPU) is the computing and control unit of the system and the final execution unit for information processing and program execution. The internal structure of the CPU gives it a certain degree of parallel computing capability. Its main functions are processing instructions, performing operations, controlling timing, and processing data.
  • The Graphics Processing Unit (GPU) generates 2D and 3D graphics, images, and video, and has superior parallel computing capability. GPUs reduce the graphics card’s dependence on the CPU and take over part of the work the CPU originally did, especially for 3D graphics. Core GPU technologies include hardware T&L (geometric transformation and lighting), cubic environment texture mapping and vertex blending, texture compression and bump mapping, and a dual quad-pixel 256-bit rendering engine. The main GPU manufacturers are NVIDIA and ATI.

The CPU-to-GPU workflow


When the CPU encounters image processing, it calls the GPU. The main process can be divided into the following four steps:

  1. Copy the data to be processed from main memory to video memory
  2. The CPU sends instructions to drive the GPU
  3. The GPU’s computing units process the data in parallel
  4. The GPU sends the results from video memory back to main memory

The principle of screen display

If we want to study how a picture is displayed, we need to start from the principle of the CRT display, as shown in the classic figure below. The CRT electron gun scans the screen line by line from top to bottom, rendering a single frame, and then returns to its starting position for the next scan. To synchronize the display with the system’s video controller, the display generates a series of timing signals using a hardware clock. When the gun moves to a new line, the monitor emits a horizontal synchronization signal, or HSync; when a frame is finished and the gun returns to its starting position, the display sends a vertical synchronization signal, or VSync, before it is ready to draw the next frame. The monitor usually refreshes at a fixed rate, which is the frequency at which the VSync signal is generated. Although today’s displays are mostly LCD screens, the principle is basically the same.

Extension: CRT stands for "Cathode Ray Tube"; this kind of display uses a cathode ray tube as its imaging device.

The following figure shows how the CPU, GPU, and monitor work together. The CPU calculates the display content and submits it to the GPU. After the GPU finishes rendering, the result is stored in the frame buffer. The video controller then reads the data in the frame buffer frame by frame according to the VSync signal and, after data conversion, finally hands it to the display.

In the simplest case there is only one frame buffer, and reading from it and refreshing it create significant efficiency problems. To solve this, the GPU usually introduces two buffers, that is, a double-buffering mechanism: the GPU pre-renders a frame into one buffer for the video controller to read, and after the next frame is rendered, the GPU points the video controller directly at the second buffer.

Double buffering solves the efficiency problem, but it introduces a new one. If the GPU submits a new frame to the frame buffer and swaps the two buffers while the video controller is still reading, that is, while the screen content is only half displayed, the video controller will draw the lower part of the screen from the new frame’s data, causing the screen-tearing phenomenon shown in the figure below.

To solve this problem, GPUs usually have a mechanism called VSync (also known as V-sync). When VSync is enabled, the GPU waits for a VSync signal from the display before rendering a new frame and updating the buffer. This solves the tearing problem and improves the smoothness of the picture, but it consumes more computing resources and introduces some latency.

iOS Rendering Frameworks

iOS provides developers with rich frameworks (UIKit, Core Animation, Core Graphics, OpenGL, etc.) that cover development needs from the top of the stack to the bottom.

As can be seen, the core of iOS view rendering is Core Animation. From bottom to top, the stack is GPU -> (OpenGL, Core Graphics) -> Core Animation -> UIKit.

UIKit

UIKit is the framework iOS developers use most often. Interfaces can be drawn by configuring UIKit controls. However, UIKit itself does not have the ability to image to the screen; its main responsibility is to respond to user interaction events (it inherits from UIResponder).

Core Animation

Core Animation is derived from Layer Kit. It is a compositing engine responsible for drawing the different visual layers, which are organized in a layer tree. Essentially, CALayer is the foundation of everything the user can see on the screen.

Core Graphics

Core Graphics is based on the Quartz graphics engine and is primarily used to draw images at runtime. You can use this framework for drawing, transformations, off-screen rendering, image creation, and the creation, display, and parsing of PDF documents.

Core Image

Core Image is the opposite of Core Graphics: Core Graphics is used to create images at runtime, while Core Image is used to process images that were created before the program runs.

In most cases, Core Image does its work on the GPU; if the GPU is busy, it falls back to the CPU.

OpenGL ES

OpenGL ES is a subset of OpenGL, and the internal implementation of its functions is developed by the GPU manufacturers.

Metal

Metal is Apple’s own graphics processing framework. It is similar to OpenGL ES, except that OpenGL ES is a third-party standard while Metal is implemented by Apple itself. Most developers don’t use Metal directly, but virtually all developers use it indirectly: Core Animation, Core Image, SceneKit, SpriteKit, and other rendering frameworks are built on top of Metal.

Core Animation pipeline

The working principle of Core Animation is as follows:

In fact, the app itself is not responsible for rendering; rendering is handed over to a separate process, the Render Server process.

The app submits the rendering task and related data to the Render Server through IPC. After the Render Server processes the data, it passes the data to the GPU, and finally the GPU drives the iOS display hardware to show the result.

Detailed process of Core Animation pipeline:

  1. First, the app handles events, such as a tap; during this step the app may update the view tree, and the layer tree is updated accordingly
  2. Second, the app computes the display content on the CPU, for example view creation, layout calculation, image decoding, and text drawing. When this is done, the app packages the layers and sends them to the Render Server on the next RunLoop iteration, completing a Commit Transaction
  3. The Render Server mainly executes OpenGL and Core Graphics related programs and calls the GPU
  4. The GPU performs the actual image rendering at the physical layer
  5. The GPU displays the image on the screen through the frame buffer, the video controller, and other related components

Taken together, these steps take far longer than 16.67 ms to execute, so to support a screen with a 60 FPS refresh rate, they need to be broken down and run as a parallel pipeline, as shown in the figure below.

Commit Transaction

Commit Transaction is the last step before the app hands off to the Render Server, and it consists of four stages:

  1. Layout
  2. Display
  3. Prepare
  4. Commit

Layout

The Layout stage mainly constructs views, including overriding layoutSubviews and adding subviews with addSubview.

Display

The drawRect: method can customize the display of a UIView. Internally, drawRect: draws into a backing image (the layer’s contents), which consumes CPU time and memory.
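As a minimal sketch (not from the article; CircleView is an illustrative name), a custom drawRect: might look like this. The drawing runs on the CPU and the result becomes the layer’s backing image:

```objc
#import <UIKit/UIKit.h>

// Illustrative view with custom drawing
@interface CircleView : UIView
@end

@implementation CircleView
- (void)drawRect:(CGRect)rect {
    // Executed on the CPU; the output is stored in the layer's contents
    CGContextRef ctx = UIGraphicsGetCurrentContext();
    CGContextSetFillColorWithColor(ctx, [UIColor orangeColor].CGColor);
    CGContextFillEllipseInRect(ctx, CGRectInset(rect, 4, 4));
}
@end
```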

Prepare

The Prepare stage is an additional step that generally handles image decoding and format conversion.

Commit

The Commit stage packages the layers and sends them to the Render Server. This is done recursively, because layers and views are organized as trees.

Principles of animation rendering

iOS animation rendering is also based on the Core Animation pipeline described above. Here we focus on the execution flow between the app and the Render Server.

If the animation is not particularly complex, UIView animation is generally used. iOS divides its processing into the following three stages:

  • Step 1: call the animateWithDuration:animations: method.
  • Step 2: inside the animation block, go through the Layout, Display, Prepare, and Commit steps.
  • Step 3: the Render Server renders frame by frame according to the animation.
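A minimal sketch of these three stages (avatarView is an illustrative name):

```objc
// Step 1: the call. Step 2: the change inside the block goes through
// Layout/Display/Prepare/Commit. Step 3: the Render Server interpolates
// the intermediate frames without further work from the app.
[UIView animateWithDuration:0.25 animations:^{
    avatarView.center = CGPointMake(200.0, 300.0);
}];
```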

Causes of stuttering and solutions

Why stuttering occurs

FPS (Frames Per Second) is the number of frames rendered per second and is usually used to measure the smoothness of the picture: the more frames per second, the smoother the picture. 60 FPS is ideal, which gives each frame a budget of 1000 ms / 60 ≈ 16.67 ms.

After a VSync signal arrives, the system’s graphics service notifies the app through CADisplayLink and similar mechanisms, and the app’s main thread starts computing the display content on the CPU: view creation, layout calculation, image decoding, text drawing, and so on. The CPU then submits the computed content to the GPU, which transforms, composites, and renders it. The GPU submits the rendering result to the frame buffer and waits for the next VSync signal to display it on the screen. Because of the VSync mechanism, if the CPU or GPU does not finish submitting its content within one VSync period, that frame is discarded and shown at the next opportunity, while the display keeps the previous content unchanged. This is why the interface stutters.

As the figure above shows, whichever of the CPU or GPU blocks the display process will cause dropped frames. Therefore, during development, CPU and GPU pressure need to be evaluated and optimized separately.
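As a hedged sketch of the CADisplayLink mechanism mentioned above (FPSMonitor and tick: are illustrative names, not system APIs), a simple FPS counter can be built like this:

```objc
#import <UIKit/UIKit.h>
#import <QuartzCore/QuartzCore.h>

// Illustrative FPS counter; counts display-link callbacks per second
@interface FPSMonitor : NSObject
@end

@implementation FPSMonitor {
    CADisplayLink *_link;
    NSUInteger _count;
    NSTimeInterval _lastTime;
}

- (instancetype)init {
    if (self = [super init]) {
        // Fires once per screen refresh (i.e., once per VSync)
        _link = [CADisplayLink displayLinkWithTarget:self selector:@selector(tick:)];
        [_link addToRunLoop:[NSRunLoop mainRunLoop] forMode:NSRunLoopCommonModes];
    }
    return self;
}

- (void)tick:(CADisplayLink *)link {
    if (_lastTime == 0) { _lastTime = link.timestamp; return; }
    _count++;
    NSTimeInterval delta = link.timestamp - _lastTime;
    if (delta >= 1.0) {                       // report roughly once per second
        NSLog(@"FPS: %.1f", _count / delta);
        _count = 0;
        _lastTime = link.timestamp;
    }
}

- (void)dealloc {
    [_link invalidate];
}
@end
```

Note that CADisplayLink retains its target, so a production version would route the callback through a weak proxy to avoid a retain cycle.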

CPU optimization

1. Layout calculation

View layout calculation is one of the most CPU-consuming parts of an app. If view layouts are calculated in advance on a background thread and the results are cached, this performance problem can be largely avoided. Set the corresponding properties once, rather than repeatedly computing and adjusting a control’s frame/bounds/center properties.
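A minimal sketch of this idea (titleLabel and bodyView are hypothetical controls): do the layout math once on a background queue, then apply the cached frames on the main thread.

```objc
dispatch_async(dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^{
    // Compute every frame once, off the main thread
    CGRect titleFrame = CGRectMake(12, 8, 280, 20);
    CGRect bodyFrame  = CGRectMake(12, CGRectGetMaxY(titleFrame) + 4, 280, 120);
    dispatch_async(dispatch_get_main_queue(), ^{
        // UIKit properties must be set on the main thread; just assign, no math
        titleLabel.frame = titleFrame;
        bodyView.frame   = bodyFrame;
    });
});
```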

2. Text calculation

If an interface contains a large amount of text (such as Weibo or WeChat Moments), calculating text width and height takes a large share of resources and is unavoidable. If you have no special requirements for text display, you can refer to the internal implementation of UILabel: use -[NSAttributedString boundingRectWithSize:options:context:] to calculate the text width and height, and -[NSAttributedString drawWithRect:options:context:] to draw the text. Although these two methods perform well, they should still be moved to a background thread to avoid blocking the main thread.
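A sketch of that measuring step run off the main thread (attributedText and label are illustrative names):

```objc
dispatch_async(dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^{
    // Measure the text on a background queue
    CGRect bounds = [attributedText boundingRectWithSize:CGSizeMake(300.0, CGFLOAT_MAX)
                                                 options:NSStringDrawingUsesLineFragmentOrigin
                                                 context:nil];
    dispatch_async(dispatch_get_main_queue(), ^{
        // Apply the cached size on the main thread
        label.frame = CGRectMake(12.0, 8.0, bounds.size.width, ceil(bounds.size.height));
    });
});
```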

3. Picture drawing

Drawing an image usually refers to the process of drawing onto a canvas using methods that begin with CG, and then creating and displaying an image from that canvas. The most common place this happens is inside [UIView drawRect:]. Since Core Graphics methods are usually thread-safe, image drawing can easily be moved to a background thread. A simple asynchronous drawing flow looks roughly like this (the real process is more complicated, but the principle is the same):

```objc
- (void)display {
    dispatch_async(backgroundQueue, ^{
        CGContextRef ctx = CGBitmapContextCreate(...);
        // draw in context ...
        CGImageRef img = CGBitmapContextCreateImage(ctx);
        CFRelease(ctx);
        dispatch_async(mainQueue, ^{
            layer.contents = (__bridge id)img; // the layer retains the bitmap
            CGImageRelease(img);               // balance the Create above
        });
    });
}
```

4. Object creation

Creating an object allocates memory, sets properties, and may even read files, all of which consume CPU resources. You can optimize performance by replacing heavyweight objects with lightweight ones. For example, CALayer is much lighter than UIView, so controls that do not need to respond to touch events are better displayed with CALayer. If an object does not involve UI operations, try to create it on a background thread; unfortunately, controls that contain CALayer can only be created and manipulated on the main thread. Creating view objects in a Storyboard is much more expensive than creating them directly in code, so Storyboard is not a good choice for performance-sensitive interfaces.

Delay object creation as long as possible, and spread the work across multiple tasks. Although this is a hassle to implement and the gains are modest, try it if you can. If objects can be reused, and the cost of reuse is lower than releasing and creating new ones, then such objects should be kept in a cache pool and reused whenever possible.
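For example, a purely decorative element that never handles touches can be a bare CALayer instead of a UIView (badge and containerView are illustrative names):

```objc
CALayer *badge = [CALayer layer];        // much lighter than a UIView
badge.frame = CGRectMake(0, 0, 10, 10);
badge.backgroundColor = [UIColor redColor].CGColor;
badge.cornerRadius = 5;                  // fine for one small standalone layer
[containerView.layer addSublayer:badge]; // no UIResponder machinery attached
```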

GPU optimization

Compared with the CPU, what the GPU does is relatively simple: it receives submitted textures and vertex descriptions, applies transforms, blends and renders, and then outputs to the screen. What you usually see consists mainly of textures (images) and shapes (vector graphics simulated with triangles).

1. Texture rendering

All bitmaps, including images and rasterized content, are eventually submitted from memory to video memory. Both the submission to video memory and the rendering of textures consume a lot of GPU time. When a large number of images is displayed in a short period (for example, when a TableView full of images is scrolled quickly), CPU usage stays low while GPU usage is very high, and the interface still drops frames. The only way to mitigate this is to minimize the number of images displayed in a short period and, where possible, to combine multiple images into one for display.

2. View mixing

When multiple views (or CALayers) are displayed on top of each other, the GPU first blends them together. If the view hierarchy is too complex, this blending consumes a lot of GPU resources. To reduce GPU consumption here, applications should minimize the number and depth of views and set the opaque property on opaque views to avoid useless alpha-channel compositing. Alternatively, multiple views can be pre-rendered as a single image.
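A sketch of the opaque hint (view is an illustrative name); the view must actually be fully covered by non-transparent content for the hint to be valid:

```objc
view.opaque = YES;                           // promise: no alpha blending needed
view.backgroundColor = [UIColor whiteColor]; // must not be clear or semi-transparent
```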

3. Graph generation

CALayer borders, rounded corners, shadows, and masks, as well as CAShapeLayer vector graphics, usually trigger off-screen rendering, which happens on the GPU. When a list contains many rounded corners and is scrolled or animated quickly, GPU resources become fully occupied while CPU usage stays low, and the interface drops frames.

The most thorough solution is to pre-draw the graphics you want to display as images in the background, avoiding the cornerRadius, shadow, and mask properties altogether.
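A hedged sketch of this idea, pre-rendering a rounded image on a background queue instead of using cornerRadius plus masksToBounds (sourceImage and imageView are illustrative names):

```objc
dispatch_async(dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^{
    CGSize size = CGSizeMake(60, 60);
    UIGraphicsBeginImageContextWithOptions(size, NO, [UIScreen mainScreen].scale);
    // Clip to a rounded rect, then draw the source image into it
    [[UIBezierPath bezierPathWithRoundedRect:CGRectMake(0, 0, size.width, size.height)
                                cornerRadius:8] addClip];
    [sourceImage drawInRect:CGRectMake(0, 0, size.width, size.height)];
    UIImage *rounded = UIGraphicsGetImageFromCurrentImageContext();
    UIGraphicsEndImageContext();
    dispatch_async(dispatch_get_main_queue(), ^{
        imageView.image = rounded; // no off-screen pass needed at display time
    });
});
```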

As for how to monitor stuttering via RunLoop, there are many similar articles on Juejin, so it will not be covered here.