Performance optimization

The CPU and GPU

In the process of screen imaging, the CPU and GPU each play a crucial role:

  • CPU (Central Processing Unit)
    • Object creation and destruction, object property adjustment, layout calculation, text calculation and typesetting, image format conversion and decoding, image drawing (Core Graphics)
  • GPU (Graphics Processing Unit)
    • Texture rendering

iOS uses a double-buffering mechanism, with a front frame buffer and a back frame buffer.

Principles of screen imaging

Let’s start with the principle of the old CRT display. The CRT electron gun scans line by line from top to bottom; when the scan is complete, the display has shown one frame, and the electron gun returns to its initial position for the next scan. To synchronize the display with the system’s video controller, the display (or other hardware) generates a series of timing signals using a hardware clock. When the gun moves to a new line, ready to scan, the monitor emits a horizontal synchronization signal, or HSync; when a frame has been drawn and the gun has returned to its original position, before it is ready to draw the next frame, the display emits a vertical synchronization signal, or VSync. The monitor usually refreshes at a fixed rate, which is the frequency at which the VSync signal is generated. Although today’s devices are mostly LCD screens, the principle remains the same.

Generally speaking, the CPU, GPU, and display in a computer system cooperate as follows. The CPU calculates the display content and submits it to the GPU. After the GPU finishes rendering, the result is placed in the frame buffer. The video controller then reads the frame buffer line by line according to the VSync signal and transmits the data to the display, possibly after digital-to-analog conversion.

In the simplest case, there is only one frame buffer, and reads from and writes to that buffer can conflict, making it inefficient. To solve this efficiency problem, display systems usually introduce two buffers, i.e. a double-buffering mechanism. Here the GPU pre-renders one frame into a buffer for the video controller to read, and after the next frame is rendered, the GPU points the video controller directly at the second buffer. This greatly improves efficiency.

Double buffering solves the efficiency problem but introduces a new one. If the GPU submits a new frame to the frame buffer and swaps the two buffers while the video controller is still mid-read, that is, while the screen content is only half displayed, the video controller will draw the lower part of the screen from the new frame’s data, causing a visible tear across the picture.

To solve this problem, GPUs usually have a mechanism called VSync (vertical synchronization, also written V-sync). When VSync is enabled, the GPU waits for a VSync signal from the display before rendering a new frame and updating the buffer. This eliminates tearing and increases the smoothness of the picture, but it consumes more computing resources and introduces some latency.

iOS devices always use double buffering and enable VSync.

Causes of stuttering

After a VSync signal arrives, the system’s graphics service notifies the App through CADisplayLink and similar mechanisms, and the App’s main thread starts computing the display content on the CPU, such as view creation, layout calculation, image decoding, and text drawing. The CPU then submits the computed content to the GPU, which transforms, composites, and renders it. The GPU then submits the rendering result to the frame buffer and waits for the next VSync signal to display it on screen. Because of the VSync mechanism, if the CPU or GPU has not finished submitting its content within one VSync period, that frame is discarded and displayed at the next opportunity, while the display keeps showing the previous content. That is why the interface appears to stutter.

In other words, whether the CPU or the GPU blocks the display pipeline, the result is dropped frames. Therefore, CPU and GPU pressure must each be evaluated and optimized during development.
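A common way to observe dropped frames at runtime is to count CADisplayLink callbacks, which fire once per screen refresh. The following is a minimal sketch; `FPSMonitor` is an illustrative class name, not a system API:

```objectivec
#import <UIKit/UIKit.h>

// Counting CADisplayLink callbacks per second approximates the
// frame rate actually reaching the screen.
@interface FPSMonitor : NSObject
- (void)start;
- (void)stop;
@end

@implementation FPSMonitor {
    CADisplayLink *_link;
    NSUInteger _count;
    CFTimeInterval _lastTimestamp;
}

- (void)start {
    _link = [CADisplayLink displayLinkWithTarget:self selector:@selector(tick:)];
    // Common modes so the link keeps firing during scrolling
    [_link addToRunLoop:[NSRunLoop mainRunLoop] forMode:NSRunLoopCommonModes];
}

- (void)tick:(CADisplayLink *)link {
    if (_lastTimestamp == 0) { _lastTimestamp = link.timestamp; return; }
    _count++;
    CFTimeInterval delta = link.timestamp - _lastTimestamp;
    if (delta >= 1.0) {
        double fps = _count / delta;  // frames displayed in the last second
        NSLog(@"FPS: %.0f", fps);     // well below the refresh rate means dropped frames
        _lastTimestamp = link.timestamp;
        _count = 0;
    }
}

- (void)stop {
    [_link invalidate];
    _link = nil;
}
@end
```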

Causes and solutions of CPU resource consumption

Causes of resource consumption

  • Object creation, adjustment, and destruction
  • Layout calculation, Autolayout
  • Text calculation, text rendering
  • Picture decoding and rendering

Solutions

  • Use lightweight objects where possible; for example, use CALayer instead of UIView in places that don’t handle events

  • Don’t make frequent calls to UIView properties such as frame, bounds, and transform, and minimize unnecessary changes

  • Pre-calculate the layout where possible, and when necessary adjust the corresponding properties all at once rather than modifying them repeatedly

  • Autolayout consumes more CPU resources than setting the frame directly

  • The image size should be exactly the same as the UIImageView size

  • Control the maximum number of concurrent threads

  • Move time-consuming operations to background threads where possible

    • Text processing (size calculation, drawing)
    • Image processing (decoding, rendering)
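The last two points can be sketched as follows: text measurement and image decoding run on a background queue, and only the final UI update touches the main thread. The variables `text`, `data`, and `cell` are assumed to exist in the surrounding code:

```objectivec
#import <UIKit/UIKit.h>

// Sketch: offload text layout and image decoding to a background queue.
dispatch_async(dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^{
    // Text measurement (CPU-heavy) off the main thread
    CGRect textRect = [text boundingRectWithSize:CGSizeMake(300, CGFLOAT_MAX)
                                         options:NSStringDrawingUsesLineFragmentOrigin
                                      attributes:@{ NSFontAttributeName: [UIFont systemFontOfSize:15] }
                                         context:nil];

    // Force-decode the image off the main thread by drawing it once,
    // so the main thread doesn't decode it lazily at display time
    UIImage *image = [UIImage imageWithData:data];
    UIGraphicsBeginImageContextWithOptions(image.size, NO, 0);
    [image drawAtPoint:CGPointZero];
    UIImage *decoded = UIGraphicsGetImageFromCurrentImageContext();
    UIGraphicsEndImageContext();

    dispatch_async(dispatch_get_main_queue(), ^{
        // Only UI updates run on the main thread
        cell.textLabel.frame = (CGRect){ cell.textLabel.frame.origin, textRect.size };
        cell.imageView.image = decoded;
    });
});
```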

Causes and solutions of GPU resource consumption

Causes of resource consumption

Compared with the CPU, the GPU does a single kind of work: it takes the submitted textures and vertex descriptions, applies transforms, blends and renders, and then outputs the result to the screen. What you see on screen is mainly textures (images) and shapes (vector shapes approximated with triangles).

Texture rendering

All bitmaps, including images, text, and rasterized content, are eventually submitted from memory to video memory and bound as GPU textures. Both the submission to video memory and the GPU’s adjusting and rendering of textures consume significant GPU resources. When a large number of images is displayed in a short period (for example, when a TableView full of images is scrolled quickly), CPU usage stays low while GPU usage is very high, and the interface still drops frames. The only way to avoid this is to minimize the display of large numbers of images in a short period and, where possible, to composite multiple images into one before displaying.

When an image is too large and exceeds the GPU’s maximum texture size, it must be preprocessed by the CPU, which costs additional CPU and GPU resources. Currently, on the iPhone 4S and later, the maximum texture size is 4096 x 4096; more information can be found at iosres.com. So try not to let images and views exceed this size.

Blending of views

When multiple views (or CALayers) are displayed on top of each other, the GPU first blends them together. If the view hierarchy is too complex, this blending can also consume a lot of GPU resources. To reduce GPU consumption here, applications should minimize the number and nesting of views and set the opaque property on opaque views to avoid useless alpha-channel compositing. Alternatively, multiple views can be pre-rendered into a single image.
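For a fully opaque view, telling the compositor it can skip blending the layers underneath is a small sketch like this (`view` is assumed from surrounding code):

```objectivec
#import <UIKit/UIKit.h>

// An opaque view with a solid background lets the GPU skip
// alpha compositing against whatever is behind it.
view.opaque = YES;
view.backgroundColor = [UIColor whiteColor];  // must be solid, not clearColor
view.alpha = 1.0;                             // alpha < 1 would force blending
```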

Graphics generation

CALayer’s borders, rounded corners, shadows, and masks, as well as CAShapeLayer’s vector graphics, usually trigger offscreen rendering, which usually happens on the GPU. When a list view shows a large number of CALayers with rounded corners and is swiped quickly, you can observe that GPU usage is maxed out while CPU usage stays low; the interface still scrolls normally, but the average frame rate drops very low. To avoid this, try turning on the CALayer.shouldRasterize property, though this shifts the off-screen rendering onto the CPU. For cases where only rounded corners are needed, you can also simulate the same visual effect by overlaying the original view with a pre-drawn rounded-corner image. The most radical solution is to draw the graphics to be displayed as images on background threads, avoiding rounded corners, shadows, masks, and similar properties altogether.
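Pre-drawing the rounded-corner image with Core Graphics might look like the following sketch; `RoundedImage` is a hypothetical helper, meant to be called on a background queue with the result assigned to an image view on the main thread:

```objectivec
#import <UIKit/UIKit.h>

// Sketch: bake the rounded corners into the bitmap itself, so no
// cornerRadius/masksToBounds (and no off-screen pass) is needed at display time.
static UIImage *RoundedImage(UIImage *image, CGFloat radius) {
    UIGraphicsBeginImageContextWithOptions(image.size, NO, 0);
    CGRect rect = (CGRect){ CGPointZero, image.size };
    // Clip the context to a rounded rect, then draw the image through it
    [[UIBezierPath bezierPathWithRoundedRect:rect cornerRadius:radius] addClip];
    [image drawInRect:rect];
    UIImage *rounded = UIGraphicsGetImageFromCurrentImageContext();
    UIGraphicsEndImageContext();
    return rounded;
}
```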

The solution

  • Avoid displaying large numbers of images in a short period; composite multiple images into one for display where possible
  • The maximum texture size the GPU can process is 4096×4096; once a texture exceeds this size, the CPU has to preprocess it, so keep textures within this limit
  • Minimize the number of views and levels
  • Reduce transparent views (alpha<1) and set Opaque to YES for opaque views
  • Try to avoid off-screen rendering

Off-screen rendering

In OpenGL, the GPU has 2 rendering methods

  • On-screen rendering: renders into the screen buffer currently used for display
  • Off-screen rendering: creates a new buffer outside the current screen buffer and renders into it

Off-screen rendering hurts performance because:

  • A new buffer needs to be created
  • Throughout off-screen rendering, the context must be switched several times: first from the current screen (on-screen) to off-screen; then, when off-screen rendering finishes, to display the off-screen buffer’s result, the context must switch back from off-screen to the current screen

What actions trigger an off-screen rendering?

  • shouldRasterize = YES

  • Masks (layer.mask)

  • Rounded corners with layer.masksToBounds = YES and layer.cornerRadius > 0

    • Consider using CoreGraphics to draw rounded corners, or ask an artist to provide rounded images
  • Shadows (layer.shadowXXX)

    • If you set layer.shadowPath, there will be no off-screen rendering
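Setting an explicit shadow path is a one-liner; a sketch (`view` assumed from surrounding code):

```objectivec
#import <UIKit/UIKit.h>

// Giving Core Animation an explicit shadow shape avoids the off-screen
// pass it would otherwise need to derive the shadow from the layer's alpha.
view.layer.shadowColor = [UIColor blackColor].CGColor;
view.layer.shadowOpacity = 0.3;
view.layer.shadowOffset = CGSizeMake(0, 2);
view.layer.shadowPath = [UIBezierPath bezierPathWithRect:view.bounds].CGPath;
```

Note that the path must be updated whenever the view's bounds change.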

Stutter detection

  • A “stutter” usually refers to a time-consuming operation being performed on the main thread

  • You can add an Observer to the main RunLoop and detect stutters by measuring how long the RunLoop takes to change state

Stutter detection utility class: LXDAppFluecyMonitor

#import "LXDAppFluecyMonitor.h"
#import "LXDBacktraceLogger.h"

#define LXD_DEPRECATED_POLLUTE_MAIN_QUEUE

#define SHAREDMONITOR [LXDAppFluecyMonitor sharedMonitor]

@interface LXDAppFluecyMonitor ()

@property (nonatomic, assign) int timeOut;
@property (nonatomic, assign) BOOL isMonitoring;

@property (nonatomic, assign) CFRunLoopObserverRef observer;
@property (nonatomic, assign) CFRunLoopActivity currentActivity;

@property (nonatomic, strong) dispatch_semaphore_t semphore;
@property (nonatomic, strong) dispatch_semaphore_t eventSemphore;

@end


#define LXD_SEMPHORE_SUCCESS 0
static NSTimeInterval lxd_restore_interval = 5;
static NSTimeInterval lxd_time_out_interval = 1;
static int64_t lxd_wait_interval = 200 * NSEC_PER_MSEC;


/*! @brief Queue used to check whether the runloop stays in the before-waiting state */
static inline dispatch_queue_t lxd_event_monitor_queue() {
    static dispatch_queue_t lxd_event_monitor_queue;
    static dispatch_once_t once;
    dispatch_once(&once, ^{
        lxd_event_monitor_queue = dispatch_queue_create("com.sindrilin.lxd_event_monitor_queue", NULL);
    });
    return lxd_event_monitor_queue;
}

/*! @brief Queue used to monitor runloop state changes between after-waiting and before-sources */
static inline dispatch_queue_t lxd_fluecy_monitor_queue() {
    static dispatch_queue_t lxd_fluecy_monitor_queue;
    static dispatch_once_t once;
    dispatch_once(&once, ^{
        lxd_fluecy_monitor_queue = dispatch_queue_create("com.sindrilin.lxd_monitor_queue", NULL);
    });
    return lxd_fluecy_monitor_queue;
}

#define LOG_RUNLOOP_ACTIVITY 0
static void lxdRunLoopObserverCallback(CFRunLoopObserverRef observer, CFRunLoopActivity activity, void * info) {
    SHAREDMONITOR.currentActivity = activity;
    dispatch_semaphore_signal(SHAREDMONITOR.semphore);
#if LOG_RUNLOOP_ACTIVITY
    switch (activity) {
        case kCFRunLoopEntry:
            NSLog(@"runloop entry");
            break;

        case kCFRunLoopExit:
            NSLog(@"runloop exit");
            break;

        case kCFRunLoopAfterWaiting:
            NSLog(@"runloop after waiting");
            break;

        case kCFRunLoopBeforeTimers:
            NSLog(@"runloop before timers");
            break;

        case kCFRunLoopBeforeSources:
            NSLog(@"runloop before sources");
            break;

        case kCFRunLoopBeforeWaiting:
            NSLog(@"runloop before waiting");
            break;

        default:
            break;
    }
#endif
}

@implementation LXDAppFluecyMonitor

#pragma mark - Singleton override
+ (instancetype)sharedMonitor {
    static LXDAppFluecyMonitor * sharedMonitor;
    static dispatch_once_t once;
    dispatch_once(&once, ^{
        sharedMonitor = [[super allocWithZone: NSDefaultMallocZone()] init];
        [sharedMonitor commonInit];
    });
    return sharedMonitor;
}

+ (instancetype)allocWithZone: (struct _NSZone *)zone {
    return [self sharedMonitor];
}

- (void)dealloc {
    [self stopMonitoring];
}

- (void)commonInit {
    self.semphore = dispatch_semaphore_create(0);
    self.eventSemphore = dispatch_semaphore_create(0);
}


#pragma mark - Public
- (void)startMonitoring {
    if (_isMonitoring) { return; }
    _isMonitoring = YES;
    CFRunLoopObserverContext context = {
        0,
        (__bridge void *)self,
        NULL,
        NULL
    };
    _observer = CFRunLoopObserverCreate(kCFAllocatorDefault, kCFRunLoopAllActivities, YES, 0, &lxdRunLoopObserverCallback, &context);
    CFRunLoopAddObserver(CFRunLoopGetMain(), _observer, kCFRunLoopCommonModes);

    dispatch_async(lxd_event_monitor_queue(), ^{
        while (SHAREDMONITOR.isMonitoring) {
            if (SHAREDMONITOR.currentActivity == kCFRunLoopBeforeWaiting) {
                __block BOOL timeOut = YES;
                dispatch_async(dispatch_get_main_queue(), ^{
                    timeOut = NO;
                    dispatch_semaphore_signal(SHAREDMONITOR.eventSemphore);
                });
                [NSThread sleepForTimeInterval: lxd_time_out_interval];
                if (timeOut) {
                    [LXDBacktraceLogger lxd_logMain];
                }
                dispatch_semaphore_wait(SHAREDMONITOR.eventSemphore, DISPATCH_TIME_FOREVER);
            }
        }
    });

    dispatch_async(lxd_fluecy_monitor_queue(), ^{
        while (SHAREDMONITOR.isMonitoring) {
            long waitTime = dispatch_semaphore_wait(self.semphore, dispatch_time(DISPATCH_TIME_NOW, lxd_wait_interval));
            if (waitTime != LXD_SEMPHORE_SUCCESS) {
                if (!SHAREDMONITOR.observer) {
                    SHAREDMONITOR.timeOut = 0;
                    [SHAREDMONITOR stopMonitoring];
                    continue;
                }
                if (SHAREDMONITOR.currentActivity == kCFRunLoopBeforeSources || SHAREDMONITOR.currentActivity == kCFRunLoopAfterWaiting) {
                    if (++SHAREDMONITOR.timeOut < 5) {
                        continue;
                    }
                    [LXDBacktraceLogger lxd_logMain];
                    [NSThread sleepForTimeInterval: lxd_restore_interval];
                }
            }
            SHAREDMONITOR.timeOut = 0;
        }
    });
}

- (void)stopMonitoring {
    if (!_isMonitoring) { return; }
    _isMonitoring = NO;

    CFRunLoopRemoveObserver(CFRunLoopGetMain(), _observer, kCFRunLoopCommonModes);
    CFRelease(_observer);
    _observer = nil;
}


@end

Power consumption optimization

Main sources of power consumption:

  • CPU processing

  • Networking

  • Location

  • Graphics and images

Power consumption optimization:

  • Minimize CPU and GPU power consumption

  • Use timers as little as possible

  • Optimize I/O operations

    • Don’t write small amounts of data frequently; batch writes together instead
    • Consider dispatch_io for reading and writing large amounts of important data; it provides a GCD-based API for asynchronous file I/O, and the system optimizes disk access through it
    • For large amounts of data, it is recommended to use databases (such as SQLite or Core Data)
  • Network optimization

    • Reduce and compress network data
    • If multiple requests would return the same result, use caching where possible
    • Use resumable transfers; otherwise the same content may be transmitted multiple times when the network is unstable
    • Do not attempt network requests when the network is unavailable
    • Let users cancel long-running or slow network operations, and set appropriate timeouts
    • Batch transfers: for example, when downloading video streams, download the whole file or large chunks rather than many small packets; when downloading ads, fetch several at once and display them over time; when downloading emails, fetch several at a time rather than one by one
  • Location optimization

    • If you just need to determine the user’s location quickly, it is best to use CLLocationManager’s requestLocation method; once the fix is obtained, the location hardware automatically powers off
    • Unless it is a navigation app, avoid updating the location in real time, and turn off location services when done
    • Lower the positioning accuracy where possible; for example, avoid the highest accuracy, kCLLocationAccuracyBest
    • If you need background location, try to set pausesLocationUpdatesAutomatically to YES; if the user is unlikely to be moving, the system will automatically pause location updates
    • Try not to use startMonitoringSignificantLocationChanges; prefer startMonitoringForRegion:
  • Hardware detection optimization

    • When the user moves, shakes, or tilts the device, motion events are generated and detected by hardware such as the accelerometer, gyroscope, and magnetometer. Turn this hardware off when detection is not needed
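The dispatch_io recommendation above can be sketched as follows. `WriteLargeData` is a hypothetical helper and the output path is illustrative:

```objectivec
#import <Foundation/Foundation.h>
#include <fcntl.h>

// Sketch: write a large buffer through dispatch_io so GCD can schedule
// and chunk the disk access instead of many small synchronous writes.
static void WriteLargeData(NSData *payload) {
    const char *path = "/tmp/example.bin";  // illustrative path
    dispatch_queue_t queue = dispatch_get_global_queue(QOS_CLASS_UTILITY, 0);
    dispatch_io_t channel =
        dispatch_io_create_with_path(DISPATCH_IO_STREAM, path,
                                     O_WRONLY | O_CREAT, 0644, queue,
                                     ^(int error) { /* channel closed */ });
    dispatch_data_t data = dispatch_data_create(payload.bytes, payload.length,
                                                queue, DISPATCH_DATA_DESTRUCTOR_DEFAULT);
    dispatch_io_write(channel, 0, data, queue,
                      ^(bool done, dispatch_data_t remaining, int error) {
        if (done) { dispatch_io_close(channel, 0); }  // one close after the batch
    });
}
```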

App launch

APP launch can be divided into two types

  • Cold Launch: Start an APP from scratch
  • Warm Launch: the APP is already alive in memory in the background; tapping the icon launches it again

Optimizing APP launch time mainly targets cold launch

  • Print APP startup time analysis by adding environment variables (Edit Scheme -> Run -> Arguments)
    • DYLD_PRINT_STATISTICS is set to 1
    • If you need more detailed information, set DYLD_PRINT_STATISTICS_DETAILS to 1

dyld

dyld (dynamic link editor) is Apple’s dynamic linker; it loads Mach-O files (executables, dynamic libraries, etc.)

When you launch your APP, dyld does a couple of things

  • Load the APP’s executable file and recursively load all dependent dynamic libraries
  • When dyld finishes loading the executable and dynamic libraries, it notifies the Runtime for further processing

Runtime

When you start your APP, the Runtime does the following

  • Call map_images to parse and process the contents of executable files
  • Call call_load_methods in load_images, which calls the +load method of all classes and categories
  • Initialize various objc structures (register objc classes, initialize class objects, and so on)
  • Call C++ static initializers and functions marked __attribute__((constructor))

At this point, all symbols (Class, Protocol, Selector, IMP, …) in the executables and dynamic libraries have been loaded into memory in a format managed by the Runtime.

Main

To summarize

  • APP launch is driven by dyld, which loads the executable into memory along with all its dependent dynamic libraries
  • The Runtime is responsible for loading the structures defined by objc
  • When all initialization is complete, dyld calls the main function
  • Next come the UIApplicationMain function and the AppDelegate’s application:didFinishLaunchingWithOptions: method

Launch optimization

According to different stages

  • dyld

    • Reduce the number of dynamic libraries and merge some of them (periodically purge unnecessary dynamic libraries)
    • Reduce the number of objc classes and categories, and reduce the number of selectors (periodically clean up unnecessary classes and categories)
    • Reduce the number of C++ virtual functions
    • Swift uses structs as much as possible
  • runtime

    • Replace __attribute__((constructor)), C++ static constructors, and ObjC +load with the +initialize method combined with dispatch_once
  • main

    • Defer work as much as possible without affecting the user experience; don’t put everything into the finishLaunching method, and load on demand
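The +load-to-+initialize swap can be sketched as follows; `SomeManager` and `registerSomething` are hypothetical names:

```objectivec
#import <Foundation/Foundation.h>

// +load runs during dyld/runtime setup and adds to cold-launch time;
// +initialize runs lazily the first time the class receives a message.
@interface SomeManager : NSObject
+ (void)registerSomething;
@end

@implementation SomeManager

+ (void)registerSomething {
    // one-time setup work goes here
}

// Avoid this: it runs eagerly at launch for every class that implements it.
// + (void)load { [self registerSomething]; }

+ (void)initialize {
    // dispatch_once guards against repeat calls triggered by subclasses
    // (the runtime invokes +initialize once per class in the hierarchy).
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        [self registerSomething];
    });
}

@end
```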

Slimming the installation package

  • Installation package (IPA) mainly consists of executable files and resources

  • Resources (pictures, audio, video, etc.)

    • Use lossless compression
    • Remove unused resources: github.com/tinymind/LS…
  • Slim the executable file

    • Compiler optimization
      • Set Strip Linked Product, Make Strings Read-Only, and Symbols Hidden by Default to YES
      • Disable exception support: set Enable C++ Exceptions and Enable Objective-C Exceptions to NO, and add -fno-exceptions to Other C Flags
    • Detect unused Code with AppCode: menu bar -> Code -> Inspect Code
    • Write an LLVM plug-in to detect duplicate and uncalled code

LinkMap

Generate a LinkMap file to inspect the specific composition of the executable

You can use third-party tools to parse LinkMap files: github.com/huanxsd/Lin…

Interview questions

  1. How did you optimize memory in your project?

  2. What aspects of optimization do you start with?

  3. What are the possible causes of stutter? How do you normally optimize?

  4. Have you run into tableView stuttering? What might cause it?

Reference article

iOS Tips for Keeping the Interface Smooth