This is day 29 of my participation in the August writing challenge.
This article introduces the principles behind interface stutter and how to optimize for it.
1. Interface stutter
Generally speaking, a computer displays an image on the screen through the cooperation of the CPU, the GPU, and the display:
- 1. The CPU computes the content to display and submits it to the GPU.
- 2. The GPU renders the content and places the result in the FrameBuffer (frame buffer).
- 3. The video controller then reads the FrameBuffer line by line, in step with the VSync signal.
- 4. The data is passed to the display, possibly after digital-to-analog conversion.
In the beginning there was only one FrameBuffer, and reading from and refreshing a single FrameBuffer was very inefficient. To solve this, double buffering was introduced: the GPU pre-renders one frame into a FrameBuffer for the video controller to read, and once the next frame has been rendered, the GPU points the video controller directly at the second FrameBuffer.
Double buffering solves the efficiency problem but introduces a new one. If the video controller has not finished reading, for example when the screen content is only half displayed, and the GPU submits a new frame and swaps the two FrameBuffers, the video controller then displays the lower half of the new frame on the screen, causing screen tearing.
To solve this, the VSync (vertical synchronization) signal mechanism is used. With VSync enabled, the GPU waits for the display's VSync signal before rendering a new frame and updating the FrameBuffer. iOS devices currently use double buffering + VSync.
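The interplay of the two buffers and the VSync signal can be illustrated with a toy model. This is only a sketch of the idea, not how any real graphics stack is implemented; `DoubleBuffer`, `render(frame:)` and `vsync()` are made-up names, and each "scan line" is just an integer holding the frame number it belongs to.

```swift
// Toy model of double buffering with VSync. All names are illustrative.
struct DoubleBuffer {
    private var buffers: [[Int]]      // two frame buffers, one value per scan line
    private(set) var frontIndex = 0   // buffer the video controller reads from
    private var pendingSwap = false   // a finished frame is waiting in the back buffer

    init(lines: Int) {
        buffers = [Array(repeating: 0, count: lines),
                   Array(repeating: 0, count: lines)]
    }

    /// GPU side: render a whole frame into the back buffer.
    /// Returns false when the previous frame has not been shown yet,
    /// i.e. the GPU has to wait for the next VSync.
    mutating func render(frame: Int) -> Bool {
        guard !pendingSwap else { return false }
        let back = 1 - frontIndex
        for line in buffers[back].indices { buffers[back][line] = frame }
        pendingSwap = true
        return true
    }

    /// Display side: on VSync, swap if a new frame is ready,
    /// then scan out the front buffer.
    mutating func vsync() -> [Int] {
        if pendingSwap {
            frontIndex = 1 - frontIndex
            pendingSwap = false
        }
        return buffers[frontIndex]
    }
}

var screen = DoubleBuffer(lines: 4)
_ = screen.render(frame: 1)
let firstScan = screen.vsync()            // every line belongs to frame 1
_ = screen.render(frame: 2)
let wasAccepted = screen.render(frame: 3) // rejected: frame 2 is not on screen yet
let secondScan = screen.vsync()           // every line belongs to frame 2
```

Because the swap only happens inside `vsync()`, a scan never mixes lines from two frames, which is exactly what prevents tearing in this model.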
2. Screen stutter
Now let's look at why the screen stutters.
When a VSync signal arrives, the system's graphics service notifies the App through CADisplayLink and similar mechanisms, and the App's main thread starts computing the display content on the CPU. The CPU then submits the computed content to the GPU, which transforms, composites, and renders it. The GPU submits the rendering result to the frame buffer, where it waits for the next VSync signal to be shown on the screen. Because of the VSync mechanism, if the CPU or GPU has not finished submitting its content within one VSync period, that frame is discarded and shown at the next opportunity, while the display keeps showing the previous content. A dropped frame can therefore be understood simply as a frame that missed its deadline.
The figure below shows one display pass. The first frame finishes processing before VSync arrives and is displayed normally. The second frame is still being processed when VSync arrives, so the screen keeps showing the first frame: a frame drop. In this case a visible stutter appears during rendering.
As the figure shows, whichever of the CPU or GPU blocks the display pipeline, the result is a frame drop.
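The frame-drop behaviour described above can be reproduced with a small simulation. It is a simplified sketch under stated assumptions: a fixed 60 Hz display, one frame in flight at a time, and work on the next frame starting at the VSync where the previous one was shown. `displayedFrames(workTimes:refreshRate:)` is a hypothetical helper, not a system API.

```swift
/// Given the time in ms each frame needs on CPU + GPU, returns for every VSync
/// the index of the frame that is on screen (-1 means nothing has been shown yet).
func displayedFrames(workTimes: [Double], refreshRate: Double = 60) -> [Int] {
    let interval = 1000.0 / refreshRate   // 16.67 ms at 60 Hz
    var shown: [Int] = []
    var lastShown = -1
    for (index, work) in workTimes.enumerated() {
        // Number of VSync periods this frame occupies (at least one).
        let periods = max(1, Int((work / interval).rounded(.up)))
        // While the frame is still being processed, the old frame is repeated.
        for _ in 1..<periods { shown.append(lastShown) }
        shown.append(index)
        lastShown = index
    }
    return shown
}

// Frame 1 needs 20 ms, which is more than one 16.67 ms period.
let timeline = displayedFrames(workTimes: [10, 20, 12])
// A repeated index in the timeline is a dropped frame.
let droppedCount = zip(timeline, timeline.dropFirst())
    .filter { pair in pair.0 == pair.1 }
    .count
```

Here `timeline` is `[0, 0, 1, 2]`: frame 0 stays on screen for two refreshes because frame 1 missed its deadline, so `droppedCount` is 1.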
So, to give users a better experience, we need to detect stutter during development and optimize accordingly.
3. Stutter monitoring
There are generally two approaches to stutter monitoring:
- FPS monitoring: to keep UI interaction smooth, the App's refresh rate should stay at about 60 fps, because the default refresh rate of iOS devices is 60 times per second. The interval between two refreshes (i.e. between two VSync signals) is 1000 ms / 60 = 16.67 ms, so if the next frame is not ready within 16.67 ms, a stutter occurs.
- Main thread stutter monitoring: a child thread monitors the main thread's RunLoop and checks whether the time spent between the two states kCFRunLoopBeforeSources and kCFRunLoopAfterWaiting reaches a certain threshold.
3.1 FPS monitoring
The FPS monitor follows YYFPSLabel in YYKit and is built on CADisplayLink. The difference between the link's timestamps gives the elapsed time, the refresh rate is then obtained as refresh count / time difference, and the result is classified into ranges, with the text color indicating how severe the stutter is. The implementation is as follows:
    import UIKit

    class CJLFPSLabel: UILabel {
        fileprivate var link: CADisplayLink?
        fileprivate var count: Int = 0
        fileprivate var lastTime: TimeInterval = 0.0
        fileprivate var fpsColor: UIColor = UIColor.green
        fileprivate var fps: Double = 0.0

        override init(frame: CGRect) {
            var f = frame
            if f.size == CGSize.zero {
                f.size = CGSize(width: 80.0, height: 22.0)
            }
            super.init(frame: f)

            self.textColor = UIColor.white
            self.textAlignment = .center
            self.font = UIFont(name: "Menlo", size: 12)
            self.backgroundColor = UIColor.lightGray

            // Go through a weak proxy so the display link does not retain the label.
            // CJLWeakProxy is the author's message-forwarding proxy class (not shown here).
            link = CADisplayLink(target: CJLWeakProxy(target: self), selector: #selector(tick(_:)))
            link?.add(to: RunLoop.current, forMode: RunLoop.Mode.common)
        }

        required init?(coder: NSCoder) {
            fatalError("init(coder:) has not been implemented")
        }

        deinit {
            link?.invalidate()
        }

        @objc func tick(_ link: CADisplayLink) {
            // The first tick only records the reference timestamp.
            guard lastTime != 0 else {
                lastTime = link.timestamp
                return
            }
            count += 1
            // Time difference since the last report
            let delta = link.timestamp - lastTime
            guard delta >= 1.0 else { return }
            lastTime = link.timestamp
            // Refresh count / time difference = refresh rate
            fps = Double(count) / delta
            let fpsText = "\(String(format: "%.2f", fps)) FPS"
            count = 0

            let attrMStr = NSMutableAttributedString(string: fpsText)
            if fps > 55.0 {
                // Smooth
                fpsColor = UIColor.green
            } else if fps >= 50.0 && fps <= 55.0 {
                // Average
                fpsColor = UIColor.yellow
            } else {
                // Stuttering
                fpsColor = UIColor.red
            }
            // Color the number by severity and keep the trailing "FPS" white.
            attrMStr.setAttributes([NSAttributedString.Key.foregroundColor: fpsColor],
                                   range: NSMakeRange(0, attrMStr.length - 3))
            attrMStr.setAttributes([NSAttributedString.Key.foregroundColor: UIColor.white],
                                   range: NSMakeRange(attrMStr.length - 3, 3))
            DispatchQueue.main.async {
                self.attributedText = attrMStr
            }
        }
    }
For simple monitoring, FPS alone is sufficient.
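The counting logic can also be separated from UIKit, which makes it easy to check with plain timestamps. The following is a minimal sketch, assuming the caller feeds in the `timestamp` of every display-link tick; `FPSCounter` is a hypothetical name and is not part of YYKit.

```swift
import Foundation

/// Counts ticks and reports frames per second about once a second.
struct FPSCounter {
    private var count = 0
    private var lastTime: TimeInterval?

    /// Feed the timestamp of every tick. Returns an FPS value once at least
    /// one second has passed since the previous report, otherwise nil.
    mutating func tick(timestamp: TimeInterval) -> Double? {
        guard let last = lastTime else {
            lastTime = timestamp          // first tick only sets the reference time
            return nil
        }
        count += 1
        let delta = timestamp - last
        guard delta >= 1.0 else { return nil }
        let fps = Double(count) / delta   // refresh count / time difference
        count = 0
        lastTime = timestamp
        return fps
    }
}
```

In a label like the one above, `tick(_:)` would call `counter.tick(timestamp: link.timestamp)` and only update the text when the result is non-nil.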
3.2 Main thread lag monitoring
Besides FPS, stutter can also be monitored through the RunLoop, because what stalls is the work being processed, and that work is handled by the main thread's RunLoop.
Implementation idea: measure how long the main thread takes for each pass of its message loop; when that time exceeds a given threshold, record it as a stutter. This is also the principle behind Matrix, WeChat's third-party stutter-monitoring library.
Here is a simple implementation of RunLoop monitoring
    //
    //  CJLBlockMonitor.swift
    //  UIOptimizationDemo
    //
    //  Created by Chen Jia-lin on 2020/12/2.
    //

    import UIKit

    class CJLBlockMonitor: NSObject {

        static let share = CJLBlockMonitor()

        // Created up front so the observer callback can never see a nil semaphore.
        fileprivate let semaphore = DispatchSemaphore(value: 0)
        // Starts at 0 (an uninitialized Int! would crash on the first += 1).
        fileprivate var timeoutCount: Int = 0
        fileprivate var activity: CFRunLoopActivity?

        private override init() {
            super.init()
        }

        public func start() {
            // Observe the main thread's RunLoop
            registerObserver()
            // Start measuring on a child thread
            startMonitor()
        }
    }

    fileprivate extension CJLBlockMonitor {

        func registerObserver() {
            let controllerPointer = Unmanaged<CJLBlockMonitor>.passUnretained(self).toOpaque()
            var context = CFRunLoopObserverContext(version: 0, info: controllerPointer,
                                                   retain: nil, release: nil, copyDescription: nil)
            let observer: CFRunLoopObserver = CFRunLoopObserverCreate(nil, CFRunLoopActivity.allActivities.rawValue, true, 0, { (observer, activity, info) in
                guard let info = info else { return }
                let monitor = Unmanaged<CJLBlockMonitor>.fromOpaque(info).takeUnretainedValue()
                // Record the latest RunLoop state and wake the monitoring thread.
                monitor.activity = activity
                monitor.semaphore.signal()
            }, &context)
            CFRunLoopAddObserver(CFRunLoopGetMain(), observer, CFRunLoopMode.commonModes)
        }

        func startMonitor() {
            // Measure on a child thread
            DispatchQueue.global().async {
                while true {
                    // Wait up to 1 second for a RunLoop state change;
                    // any result other than .success means the wait timed out.
                    let st = self.semaphore.wait(timeout: DispatchTime.now() + 1.0)
                    if st != DispatchTimeoutResult.success {
                        // Only the two states kCFRunLoopBeforeSources and kCFRunLoopAfterWaiting are monitored
                        if self.activity == CFRunLoopActivity.beforeSources || self.activity == CFRunLoopActivity.afterWaiting {
                            self.timeoutCount += 1
                            if self.timeoutCount < 2 {
                                print("timeoutCount = \(self.timeoutCount)")
                                continue
                            }
                            // Requiring consecutive timeouts avoids printing continuously at scale
                            print("More than two consecutive stutters detected")
                        }
                    }
                    self.timeoutCount = 0
                }
            }
        }
    }
To use it, simply call:
    CJLBlockMonitor.share.start()
You can also use a third-party library directly:
- Swift: the stutter detection library ANREye. Its main idea is to create a child thread that monitors in a loop. On each pass it sets a flag to true, then dispatches a task to the main thread that sets the flag to false. The child thread then sleeps for the threshold duration and checks whether the flag is false; if it is not, the main thread has stalled.
- OC: you can use WeChat's Matrix or DiDi's DoraemonKit.
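The flag-based idea described for ANREye can be sketched as follows. This is not ANREye's actual code: `PingWatchdog`, its parameters, and the 0.4 s default threshold are all assumptions made for illustration. The watched queue is injectable so the logic can be tried on any serial queue, not only the main one.

```swift
import Foundation

/// Ping-style watchdog: raise a flag, ask the watched queue to lower it,
/// sleep for `threshold` seconds, then check whether the flag is still raised.
final class PingWatchdog {
    private let watched: DispatchQueue
    private let threshold: TimeInterval
    private let lock = NSLock()
    private var pending = false

    init(watching queue: DispatchQueue = .main, threshold: TimeInterval = 0.4) {
        self.watched = queue
        self.threshold = threshold
    }

    /// Runs one check. Must be called from a thread other than the watched one.
    /// Returns true when the watched queue did not respond within the threshold.
    func isStalled() -> Bool {
        setPending(true)
        watched.async { [weak self] in self?.setPending(false) }
        Thread.sleep(forTimeInterval: threshold)
        lock.lock(); defer { lock.unlock() }
        return pending
    }

    private func setPending(_ value: Bool) {
        lock.lock(); defer { lock.unlock() }
        pending = value
    }
}
```

A monitor would typically call `isStalled()` in a loop on a background queue and capture the main thread's call stack whenever it returns true.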
4. Interface optimization
4.1 CPU Level Optimization
- 1. Use lightweight objects in place of heavyweight ones wherever possible. For example, for controls that do not need to respond to touch events, use CALayer instead of UIView.
- 2. Minimize modifications to UIView and CALayer properties.
  - CALayer has no properties internally. When a property method is called, the runtime's resolveInstanceMethod adds a temporary method to the object, stores the property value in an internal Dictionary, notifies the delegate, creates animations, and so on, all of which is time-consuming.
  - UIView's display-related properties, such as frame, bounds, and transform, are actually mapped from CALayer, so adjusting them costs more than adjusting ordinary properties.
- 3. Releasing a large number of objects is also time-consuming; try to move the release to a background thread.
- 4. Calculate the view layout ahead of time (pre-layout) wherever possible, for example the row height of a cell.
- 5. Autolayout can greatly improve development efficiency on simple pages, but it can cause serious performance problems for complex views: Autolayout's CPU cost rises exponentially as the number of views grows. So use code layout as much as you can. If you don't want to adjust frames by hand, you can also use third-party libraries such as Masonry (OC), SnapKit (Swift), ComponentKit, AsyncDisplayKit, etc.
- 6. Text processing: when an interface contains a large amount of text, computing its line heights and drawing it is also time-consuming.
  - 1. If you have no special requirements for the text, you can use the methods UILabel uses internally and run them on a child thread to avoid blocking the main thread:
    - Calculate text width and height: [NSAttributedString boundingRectWithSize:options:context:]
    - Draw text: [NSAttributedString drawWithRect:options:context:]
  - 2. For custom text controls, use TextKit or the lower-level CoreText to draw text asynchronously. Once a CoreText object has been created, the width and height of the text can be obtained from it directly, which avoids computing them more than once (once when sizing and once when drawing). CoreText uses CoreGraphics directly, with a small memory footprint and high efficiency.
- 7. Image processing (decoding + drawing)
  - 1. When an image is created with UIImage or CGImageSource, the image data is not decoded immediately. It is decoded when the image is set on UIImageView/CALayer.contents, just before the CALayer is submitted to the GPU for rendering. This step is unavoidable, and it happens on the main thread. A common way around this is to first draw the image into a CGBitmapContext on a child thread and then create the image directly from the Bitmap, which is what third-party frameworks such as SDWebImage do when decoding images. This is image pre-decoding.
  - 2. When you draw an image onto a canvas with CG methods and then create an image from the canvas, the drawing can be done on a child thread.
- 8. Image optimization
  - 1. Prefer PNG images over JPEG images.
  - 2. Pre-decode on a child thread and render on the main thread: create the image from a Bitmap on a child thread, then assign the image.
  - 3. Optimize image sizes to avoid dynamic scaling as far as possible.
  - 4. Combine multiple images into one for display wherever possible.
- 9. Avoid transparent views wherever possible. With a transparent view, the GPU must also take the pixels of the layers beneath it into account when computing pixel colors, i.e. it has to perform color blending.
- 10. Load on demand. For example, do not load images while a TableView is scrolling; show a default placeholder and load them once scrolling stops.
- 11. Use addView sparingly for adding views to a cell dynamically.
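As one illustration of item 4 (pre-layout), row heights can be computed once, ideally on a background thread right after the data arrives, and then reused on every layout pass. The sketch below assumes that each row has a stable string identifier and that its height depends only on that identifier and the available width; `RowHeightCache` and its methods are hypothetical names.

```swift
/// Caches row heights so the expensive measurement runs once per (row, width).
struct RowHeightCache {
    private struct Key: Hashable {
        let id: String
        let width: Double
    }

    private var heights: [Key: Double] = [:]
    private(set) var computeCount = 0   // how often the expensive closure actually ran

    /// Returns the cached height for `id` at `width`, computing it on first use.
    mutating func height(for id: String, width: Double, compute: () -> Double) -> Double {
        let key = Key(id: id, width: width)
        if let cached = heights[key] { return cached }
        let value = compute()
        computeCount += 1
        heights[key] = value
        return value
    }

    /// Drops all cached values, e.g. after a font size change.
    mutating func invalidate() {
        heights.removeAll()
    }
}
```

In a table view, the `compute` closure would wrap the text measurement (for example a `boundingRect` call), and the height delegate method would only ever read from the cache.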
4.2 GPU Layer Optimization
Compared with the CPU, the GPU's job is mainly to receive the textures and vertices submitted by the CPU, apply a series of transforms, and finally blend and render them for output to the screen.
- 1. Reduce the number of images displayed within a short period, and combine multiple images into one for display wherever possible. When a large number of images are displayed, both CPU computation and GPU rendering are time-consuming and frames may be dropped.
- 2. Avoid images larger than 4096 x 4096. An image beyond this size is first preprocessed by the CPU and then submitted to the GPU, which costs additional CPU resources.
- 3. Reduce the number of views and the depth of the view hierarchy. When there are too many overlapping views, the GPU has to blend them, and blending is time-consuming.
- 4. Avoid off-screen rendering wherever possible.
- 5. Asynchronous rendering. For example, all the controls and views in a cell can be composited into a single image for display. See the third-party framework Graver.
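For item 2, an oversized image can be scaled down before it is handed to the rendering pipeline. The helper below only does the size arithmetic and is a sketch: `fittedSize(width:height:maxSide:)` is a hypothetical function, and the 4096 default is simply the limit quoted above, not a value queried from the device.

```swift
/// Returns the largest size with the same aspect ratio that fits inside
/// maxSide x maxSide. Sizes already within the limit are returned unchanged.
func fittedSize(width: Double, height: Double,
                maxSide: Double = 4096) -> (width: Double, height: Double) {
    let longest = max(width, height)
    guard longest > maxSide else { return (width, height) }
    let scale = maxSide / longest
    // Round down so the result never exceeds the limit, but keep at least 1 pixel.
    return (max(1, (width * scale).rounded(.down)),
            max(1, (height * scale).rounded(.down)))
}

let resized = fittedSize(width: 8192, height: 4096)   // halved to fit the limit
let untouched = fittedSize(width: 1000, height: 500)  // already small enough
```

The resulting size would then be used as the target when redrawing the image into a smaller bitmap, preferably on a background thread.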
Note: whether to apply each of the optimizations above should be evaluated against your own project; use them where they make sense.