This is the 29th day of my participation in the August Wenwen Challenge.

This article explains why interface stuttering happens and how to optimize it.

1. Interface stuttering

Generally speaking, the display process in a computer is as follows: the CPU, the GPU, and the monitor work together to put an image on the screen.

  • 1. The CPU computes the display content and submits it to the GPU
  • 2. The GPU renders it and puts the rendering result into the FrameBuffer (frame buffer)
  • 3. The video controller then reads the FrameBuffer data line by line according to the VSync signal
  • 4. The data is passed to the display, possibly after a digital-to-analog conversion

In the beginning, there was only one FrameBuffer, and reading from and refreshing a single FrameBuffer was inefficient. To solve this problem, double buffering was introduced: the GPU pre-renders one frame into a FrameBuffer for the video controller to read, and once the next frame is rendered, the GPU points the video controller directly at the second FrameBuffer. Double buffering solves the efficiency problem but brings a new one. If the video controller has not finished reading (for example, the screen content is only half displayed) when the GPU submits a new frame and swaps the two FrameBuffers, the video controller displays the second half of the new frame on the screen, causing screen tearing.

To solve this problem, the VSync (vertical synchronization) signal mechanism is used. When VSync is enabled, the GPU waits for the display's VSync signal before rendering a new frame and updating the FrameBuffer. Currently, iOS devices use double buffering + VSync.
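The swap-on-VSync rule can be sketched as a toy model. This is only an illustration: the DoubleBuffer type and its method names are made up for this example, and frames are represented as plain strings rather than pixel data.

```swift
/// A toy model of double buffering with VSync; not how the hardware is actually driven.
struct DoubleBuffer {
    private(set) var front: String     // buffer the video controller reads from
    private var back: String? = nil    // frame rendered by the GPU, waiting for VSync

    init(initialFrame: String) {
        front = initialFrame
    }

    /// The GPU finishes a frame: it only goes into the back buffer.
    mutating func submit(_ frame: String) {
        back = frame
    }

    /// VSync arrives: swap only now, so one scan-out never mixes two frames.
    /// Returns the frame shown during the next refresh interval.
    mutating func vsync() -> String {
        if let ready = back {
            front = ready
            back = nil
        }
        return front
    }
}

var buffers = DoubleBuffer(initialFrame: "frame 0")
buffers.submit("frame 1")
print(buffers.front)     // frame 0 (no swap in the middle of a scan-out)
print(buffers.vsync())   // frame 1
print(buffers.vsync())   // frame 1 (nothing new was ready, the old frame stays)
```

Because the swap is deferred to vsync(), a frame submitted halfway through a scan-out cannot replace the buffer being read, which is what prevents tearing.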

2. Screen stuttering

Let's look at why the screen stutters.

When a VSync signal arrives, the system's graphics service notifies the App through mechanisms such as CADisplayLink, and the App's main thread starts computing the display content on the CPU. The CPU then submits the computed content to the GPU, which transforms, composites, and renders it. The GPU then submits the rendering result to the frame buffer, where it waits for the next VSync signal to be displayed on the screen. Because of the VSync mechanism, if the CPU or GPU has not finished submitting the content within one VSync period, that frame is discarded and shown at the next opportunity, while the display keeps the previous content unchanged. So a dropped frame can simply be understood as a frame that missed its deadline.

The figure below shows one display process. The first frame finishes processing before VSync arrives and is displayed normally. The second frame is still being processed when VSync arrives, so the frame is dropped and a visible stutter appears during rendering. As the figure shows, whichever of the CPU or GPU blocks the display process, a frame is dropped. So, to give users a better experience, during development we need to do stutter detection and the corresponding optimization.
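As a rough illustration of the deadline described above, the sketch below counts how many frames are dropped for a given CPU and GPU time. It assumes a 60 Hz display and that the CPU and GPU work for one frame runs back to back; the function names are made up for this example.

```swift
/// Refresh interval of a 60 Hz display, in milliseconds (about 16.67 ms).
let vsyncIntervalMs = 1000.0 / 60.0

/// Number of VSync intervals that pass before a frame needing `cpuMs` + `gpuMs` is ready.
/// 1 means the frame met its deadline; anything above 1 means it was late.
func vsyncsNeeded(cpuMs: Double, gpuMs: Double) -> Int {
    let total = cpuMs + gpuMs
    return max(1, Int((total / vsyncIntervalMs).rounded(.up)))
}

/// Refreshes during which the screen keeps showing the previous content.
func droppedFrames(cpuMs: Double, gpuMs: Double) -> Int {
    return vsyncsNeeded(cpuMs: cpuMs, gpuMs: gpuMs) - 1
}

print(droppedFrames(cpuMs: 6, gpuMs: 5))     // 0: 11 ms fits in one interval
print(droppedFrames(cpuMs: 12, gpuMs: 9))    // 1: 21 ms misses one VSync
print(droppedFrames(cpuMs: 30, gpuMs: 10))   // 2: 40 ms misses two VSyncs
```

The model makes the point of the paragraph concrete: it does not matter whether the time is spent on the CPU or the GPU, only whether the total exceeds the VSync interval.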

3. Stutter monitoring

There are generally two approaches to stutter monitoring:

  • FPS monitoring: to keep UI interaction smooth, the App's refresh rate should stay at around 60 fps. The reason is that the default refresh rate of iOS devices is 60 times per second, and the interval between two refreshes (i.e. between two VSync signals) is 1000 ms / 60 = 16.67 ms. If the next frame is not ready within 16.67 ms, a stutter occurs
  • Main thread stutter monitoring: a child thread monitors the main thread's RunLoop and checks whether the time spent between the two states kCFRunLoopBeforeSources and kCFRunLoopAfterWaiting reaches a certain threshold

3.1 FPS monitoring

FPS monitoring follows YYFPSLabel in YYKit and is implemented mainly with CADisplayLink. The difference between link timestamps gives the elapsed time, the refresh rate is then computed as refresh count / time difference, and the result is classified into ranges, with different text colors indicating how severe the stutter is. The code is as follows:

import UIKit

class CJLFPSLabel: UILabel {

    fileprivate var link: CADisplayLink = {
        let link = CADisplayLink.init()
        return link
    }()
    fileprivate var count: Int = 0
    fileprivate var lastTime: TimeInterval = 0.0
    fileprivate var fpsColor: UIColor = {
        return UIColor.green
    }()
    fileprivate var fps: Double = 0.0

    override init(frame: CGRect) {
        // Fall back to a default size when no size is given
        var f = frame
        if f.size == CGSize.zero {
            f.size = CGSize(width: 80.0, height: 22.0)
        }
        super.init(frame: f)

        self.textColor = UIColor.white
        self.textAlignment = .center
        self.font = UIFont.init(name: "Menlo", size: 12)
        self.backgroundColor = UIColor.lightGray
        // Go through a proxy object (CJLWeakProxy, a helper class not shown here)
        // so that the display link does not retain the label
        link = CADisplayLink.init(target: CJLWeakProxy(target: self), selector: #selector(tick(_:)))
        link.add(to: RunLoop.current, forMode: RunLoop.Mode.common)
    }

    required init?(coder: NSCoder) {
        fatalError("init(coder:) has not been implemented")
    }

    deinit {
        link.invalidate()
    }

    @objc func tick(_ link: CADisplayLink) {
        // First callback: only record the starting timestamp
        guard lastTime != 0 else {
            lastTime = link.timestamp
            return
        }
        count += 1
        // Time difference since the last measurement
        let delta = link.timestamp - lastTime
        guard delta >= 1.0 else {
            return
        }
        lastTime = link.timestamp
        // Refresh count / time difference = refresh rate
        fps = Double(count) / delta
        let fpsText = "\(String.init(format: "%.2f", fps)) FPS"
        count = 0

        let attrMStr = NSMutableAttributedString(attributedString: NSAttributedString(string: fpsText))
        if fps > 55.0 {
            // Smooth
            fpsColor = UIColor.green
        } else if fps >= 50.0 && fps <= 55.0 {
            // Average
            fpsColor = UIColor.yellow
        } else {
            // Stuttering
            fpsColor = UIColor.red
        }
        // Color the number; keep the trailing "FPS" white
        attrMStr.setAttributes([NSAttributedString.Key.foregroundColor: fpsColor], range: NSMakeRange(0, attrMStr.length - 3))
        attrMStr.setAttributes([NSAttributedString.Key.foregroundColor: UIColor.white], range: NSMakeRange(attrMStr.length - 3, 3))
        DispatchQueue.main.async {
            self.attributedText = attrMStr
        }
    }
}

For simple monitoring, FPS alone is sufficient.

3.2 Main thread lag monitoring

In addition to FPS, you can also monitor through the RunLoop, because a stutter means that some transaction is taking too long, and transactions are handled by the main thread's RunLoop.

Implementation idea: measure how long each pass of the main thread's message loop takes; when that time exceeds the specified threshold, record it as a stutter. This is also the principle behind Matrix, WeChat's third-party stutter-detection library.

Here is a simple implementation of RunLoop monitoring:

//
//  CJLBlockMonitor.swift
//  UIOptimizationDemo
//
//  Created by Chen Jia-lin on 2020/12/2.
//

import UIKit

class CJLBlockMonitor: NSObject {

    static let share = CJLBlockMonitor.init()

    fileprivate var semaphore: DispatchSemaphore!
    // Start at 0 so that `+= 1` below never touches an uninitialized value
    fileprivate var timeoutCount: Int = 0
    fileprivate var activity: CFRunLoopActivity?

    private override init() {
        super.init()
    }

    public func start() {
        // Create the semaphore before the observer can signal it
        semaphore = DispatchSemaphore(value: 0)
        // Observe the RunLoop states
        registerObserver()
        // Start monitoring
        startMonitor()
    }
}

fileprivate extension CJLBlockMonitor {

    func registerObserver() {
        let controllerPointer = Unmanaged<CJLBlockMonitor>.passUnretained(self).toOpaque()
        var context: CFRunLoopObserverContext = CFRunLoopObserverContext(version: 0, info: controllerPointer, retain: nil, release: nil, copyDescription: nil)
        let observer: CFRunLoopObserver = CFRunLoopObserverCreate(nil, CFRunLoopActivity.allActivities.rawValue, true, 0, { (observer, activity, info) in
            guard info != nil else {
                return
            }
            // Record the latest RunLoop state and wake the monitoring thread
            let monitor: CJLBlockMonitor = Unmanaged<CJLBlockMonitor>.fromOpaque(info!).takeUnretainedValue()
            monitor.activity = activity
            let sem: DispatchSemaphore = monitor.semaphore
            sem.signal()
        }, &context)

        CFRunLoopAddObserver(CFRunLoopGetMain(), observer, CFRunLoopMode.commonModes)
    }

    func startMonitor() {
        // Measure the elapsed time on a child thread
        DispatchQueue.global().async {
            while true {
                // Timeout is 1 second: if no signal arrives in time, the result is not .success,
                // meaning the RunLoop has stayed in the same state for the whole second
                let st = self.semaphore.wait(timeout: DispatchTime.now() + 1.0)
                if st != DispatchTimeoutResult.success {
                    // Only the two states kCFRunLoopBeforeSources and kCFRunLoopAfterWaiting are monitored
                    if self.activity == CFRunLoopActivity.beforeSources || self.activity == CFRunLoopActivity.afterWaiting {
                        self.timeoutCount += 1
                        if self.timeoutCount < 2 {
                            print("timeoutCount = \(self.timeoutCount)")
                            continue
                        }
                        // Timeouts on a scale of about one second are likely to arrive back to back;
                        // report only once to avoid printing in bulk
                        print("More than two consecutive stutters detected")
                    }
                }
                self.timeoutCount = 0
            }
        }
    }
}

To use it, call it directly

CJLBlockMonitor.share.start()

You can also use a third-party library directly:

  • Swift: the third-party stutter-detection library ANREye. Its main idea is to create a child thread that monitors in a loop. On each pass it sets a flag to true, then dispatches a task to the main thread that sets the flag to false. The child thread then sleeps for the threshold duration and checks whether the flag is false; if it is not, the main thread has stalled
  • OC: you can use WeChat's Matrix or DiDi's DoraemonKit
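The flag-based idea described for ANREye can be sketched as follows. This is not ANREye's actual code: FlagWatchdog and all of its members are hypothetical names, and the watched queue is a parameter (defaulting to the main queue) so the sketch can also be tried against any serial queue.

```swift
import Foundation

/// Hypothetical watchdog sketching the flag-based idea; not ANREye's actual code.
final class FlagWatchdog {
    private let target: DispatchQueue     // queue to watch; the main queue in a real app
    private let threshold: TimeInterval   // how long the target may stay unresponsive
    private let onStall: () -> Void       // called on the watchdog thread
    private let lock = NSLock()
    private var flag = false              // true while a ping is still unanswered
    private var running = false

    init(target: DispatchQueue = .main,
         threshold: TimeInterval,
         onStall: @escaping () -> Void) {
        self.target = target
        self.threshold = threshold
        self.onStall = onStall
    }

    func start() {
        locked { running = true }
        Thread.detachNewThread { [weak self] in
            while true {
                guard let watchdog = self, watchdog.locked({ watchdog.running }) else { return }
                // 1. Raise the flag
                watchdog.locked { watchdog.flag = true }
                // 2. Ask the watched queue to lower it
                watchdog.target.async { watchdog.locked { watchdog.flag = false } }
                // 3. Sleep for the threshold duration
                Thread.sleep(forTimeInterval: watchdog.threshold)
                // 4. Flag still raised: the watched queue did not run the task in time
                if watchdog.locked({ watchdog.flag }) { watchdog.onStall() }
            }
        }
    }

    func stop() {
        locked { running = false }
    }

    /// Runs `body` while holding the lock so both threads see a consistent flag.
    private func locked<T>(_ body: () -> T) -> T {
        lock.lock()
        defer { lock.unlock() }
        return body()
    }
}
```

In an app, something like FlagWatchdog(threshold: 0.5) { print("main thread stalled") } followed by start() would report stalls of the main queue. Note that a single flag is a simplification: a late answer to an old ping can lower the flag of a newer one, so a production tool would need to track pings individually.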

4. Interface optimization

4.1 CPU Level Optimization

  • 1. Prefer lightweight objects over heavyweight ones to improve performance. For example, controls that do not need to respond to touch events can use CALayer instead of UIView
  • 2. Minimize modifications to UIView and CALayer properties
    • CALayer has no properties internally; when a property method is called, the runtime's resolveInstanceMethod temporarily adds a method to the object, stores the corresponding property value in an internal Dictionary, notifies the delegate, creates an animation, and so on, all of which is time-consuming
    • UIView's display-related properties, such as frame, bounds, and transform, are actually mapped from CALayer properties, so adjusting them costs more than adjusting ordinary properties
  • 3. Releasing a large number of objects is also time-consuming; try to move the release to a background thread
  • 4. Compute view layout ahead of time (pre-layout) wherever possible, for example the row height of a cell
  • 5. Autolayout can greatly improve development efficiency on simple pages, but it can cause serious performance problems for complex views: as the number of views grows, Autolayout's CPU cost rises exponentially. So use code-based layout where possible. If you do not want to adjust frames by hand, you can also use third-party libraries such as Masonry (OC), SnapKit (Swift), ComponentKit, and AsyncDisplayKit
  • 6. Text processing optimization: when an interface contains a lot of text, computing line heights and drawing the text is also time-consuming
    • 1. If you have no special requirements for the text, you can use the methods that UILabel uses internally and run them on a child thread to avoid blocking the main thread
      • Compute text width and height: [NSAttributedString boundingRectWithSize:options:context:]
      • Draw text: [NSAttributedString drawWithRect:options:context:]
    • 2. For a custom text control, use TextKit or the lower-level CoreText to draw text asynchronously. Once a CoreText object is created, the text's width and height can be obtained from it directly, which avoids repeated computation (otherwise sizing and drawing each require a calculation). CoreText uses CoreGraphics directly, so its memory footprint is small and it is efficient
  • 7. Image processing (decoding + drawing)
    • 1. When an image is created with UIImage or CGImageSource methods, the image data is not decoded immediately. It is decoded only when the image is set on UIImageView / CALayer.contents: before the CALayer is submitted to the GPU for rendering, the data in the CGImage is decoded. This step is unavoidable and happens on the main thread. A common way around this is to first draw the image into a CGBitmapContext on a child thread and then create the image directly from the bitmap, which is how third-party frameworks such as SDWebImage handle image encoding and decoding. This is image pre-decoding
    • 2. When you draw an image into a canvas with CG methods and then create an image from the canvas, the drawing can be done on a child thread
  • 8. Image optimization
    • 1. Prefer PNG images over JPEG images where possible
    • 2. Pre-decode on a child thread and render on the main thread, that is, create the image from a Bitmap on the child thread before assigning it to image
    • 3. Optimize image sizes to avoid dynamic scaling where possible
    • 4. Where possible, combine multiple images into one for display
  • 9. Avoid transparent views where possible. With a transparent view, the GPU must also take the pixels of the layers beneath it into account when computing each pixel, i.e. it has to perform color blending
  • 10. Load on demand. For example, do not load images while a TableView is scrolling; show a default placeholder image and load once scrolling stops
  • 11. Use addView sparingly to add views to a cell dynamically
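The image pre-decoding technique from item 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not SDWebImage's implementation: the function names are made up, and the premultiplied BGRA bitmap layout is an assumption chosen for the example.

```swift
import UIKit

/// Decodes `image` by drawing it into a bitmap context; meant to be called off the main thread.
/// Returns nil if the image has no CGImage backing or the context cannot be created.
func decodedImage(_ image: UIImage) -> UIImage? {
    guard let cgImage = image.cgImage else { return nil }
    let colorSpace = CGColorSpaceCreateDeviceRGB()
    // Assumption: premultiplied BGRA layout for the destination bitmap
    let bitmapInfo = CGBitmapInfo.byteOrder32Little.rawValue
        | CGImageAlphaInfo.premultipliedFirst.rawValue
    guard let context = CGContext(data: nil,
                                  width: cgImage.width,
                                  height: cgImage.height,
                                  bitsPerComponent: 8,
                                  bytesPerRow: 0,   // let CoreGraphics choose the stride
                                  space: colorSpace,
                                  bitmapInfo: bitmapInfo) else { return nil }
    // Drawing forces the compressed data to be decoded into the bitmap
    context.draw(cgImage, in: CGRect(x: 0, y: 0, width: cgImage.width, height: cgImage.height))
    guard let decoded = context.makeImage() else { return nil }
    return UIImage(cgImage: decoded, scale: image.scale, orientation: image.imageOrientation)
}

/// Hypothetical helper: decode on a background queue, assign on the main queue.
func setImage(_ image: UIImage, on imageView: UIImageView) {
    DispatchQueue.global(qos: .userInitiated).async {
        let result = decodedImage(image) ?? image   // fall back to the undecoded image
        DispatchQueue.main.async {
            imageView.image = result
        }
    }
}
```

With this split, the expensive decode happens on the background queue, and the main thread only assigns an image whose bitmap is already available.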

4.2 GPU Level Optimization

Compared with the CPU, the GPU mainly receives the textures + vertices submitted by the CPU, applies a series of transforms, and finally blends and renders them for output to the screen.

  • 1. Reduce the number of images displayed within a short period, and where possible combine multiple images into one for display. When a large number of images are displayed, both CPU computation and GPU rendering are time-consuming, and frames are likely to be dropped
  • 2. Avoid images larger than 4096 x 4096. An image beyond this size is first preprocessed by the CPU and then submitted to the GPU, which costs extra CPU resources
  • 3. Reduce the number of views and the depth of the view hierarchy. When there are too many overlapping views, the GPU has to blend them, and blending is time-consuming
  • 4. Avoid off-screen rendering where possible
  • 5. Render asynchronously. For example, all the controls and views in a cell can be composited into a single image for display. See the third-party framework Graver for reference
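The asynchronous rendering idea in item 5 can be sketched as follows. This is not Graver's API: the function names and the simple icon-plus-title layout are assumptions for illustration, and the sketch relies on UIGraphicsImageRenderer and string drawing being usable off the main thread.

```swift
import UIKit

/// Draws an icon and a title into one bitmap. Hypothetical helper, not Graver's API.
func composedImage(title: NSAttributedString, icon: UIImage?, size: CGSize) -> UIImage {
    let renderer = UIGraphicsImageRenderer(size: size)
    return renderer.image { _ in
        let side = size.height   // square icon on the left
        icon?.draw(in: CGRect(x: 0, y: 0, width: side, height: side))
        let textRect = CGRect(x: side + 8, y: 0,
                              width: max(0, size.width - side - 8), height: size.height)
        title.draw(with: textRect, options: [.usesLineFragmentOrigin], context: nil)
    }
}

/// Composites off the main thread, then hands the finished bitmap to the layer.
func renderAsync(title: NSAttributedString, icon: UIImage?, size: CGSize, into layer: CALayer) {
    DispatchQueue.global(qos: .userInitiated).async {
        let image = composedImage(title: title, icon: icon, size: size)
        DispatchQueue.main.async {
            layer.contentsScale = image.scale
            layer.contents = image.cgImage   // one texture instead of several views
        }
    }
}
```

The cell then shows a single layer whose contents is the finished bitmap, so the GPU composites one texture instead of blending several overlapping views.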

Note: whether and how to apply the optimizations above should be evaluated against your own project; apply them where they make sense.