
iOS Interview Investigation (9): Performance Optimization Related Questions


1. iOS Interview Investigation (9): Performance Optimization Related Questions

1.1 Startup Optimization

1.1.1 Startup time

An app's startup time directly shapes the user's first impression and judgment of it. If startup takes too long, not only does the experience plummet, it may also trigger Apple's watchdog mechanism and kill the app — the user sees the app freeze and crash right after launch, can't use it, and long-presses the icon to delete it. (Xcode does not enable the watchdog in debug mode, so we must test on a real device.)

Before measuring the startup time of APP, let’s first understand the startup process of APP:

We divide app startup into two modes:

  • Cold start: when the app launches, its process is not in the system (first launch, or the process was killed), so the system must allocate a new process for the app.
  • Hot start: the app was sent to the background and its process is still alive in the system; relaunching resumes that process.

1.1.2 APP startup process

APP startup can be divided into two phases: before main() and after main(). The summary is as follows:

  1. T (total app startup time) = T1 (loading time before main()) + T2 (loading time after main()).
  2. T1 = the time to load system dylibs (dynamic link libraries) and the app's executable;
  3. T2 = the time from the start of main() until the end of application:didFinishLaunchingWithOptions: in the AppDelegate class.

So we measure and optimize app startup time in these two stages. First, let's look at how to measure the time spent before main() is executed.

1.1.2.1 Before main() is executed

To measure the time spent before main() (T1), Apple provides a built-in option: enable the DYLD_PRINT_STATISTICS environment variable in the scheme (or DYLD_PRINT_STATISTICS_DETAILS for more detailed information). A report looks like this:

Total pre-main time:  34.22 milliseconds (100.0%)
     dylib loading time:  14.43 milliseconds (42.1%)
    rebase/binding time:   1.82 milliseconds (5.3%)
        ObjC setup time:   3.89 milliseconds (11.3%)
       initializer time:  13.99 milliseconds (40.9%)
    slowest intializers:   2.20 milliseconds (6.4%)
 libBacktraceRecording.dylib:  2.90 milliseconds (8.4%)
  libMainThreadChecker.dylib:  6.55 milliseconds (2.0%)
     libswiftCoreImage.dylib:  0.71 milliseconds (2.0%)

System-level dynamic libraries are optimized by Apple, so they don't take much time; most of T1 is spent on our own app code and on linking third-party libraries. So how can we reduce the time before main() is called? The points we can optimize are:

  1. Merge dynamic libraries and reduce unnecessary frameworks, especially third-party ones, because dynamic linking is time-consuming;
  2. Check whether each framework should be set to optional or required: if the framework exists on all iOS versions the app supports, set it to required, otherwise set it to optional, because optional involves extra checks;
  3. Merge or trim Objective-C classes; AppCode's code inspection tool can help find classes unused in the project;
  4. Remove static variables that are unused;
  5. Remove methods that are never called or are obsolete;
  6. Defer to +initialize anything that does not have to be done in the +load method;
  7. Try not to use C++ virtual functions (creating the virtual table has a cost);
  8. Avoid __attribute__((constructor)); instead do the work in an initialization method combined with dispatch_once;
  9. Reduce the number of non-primitive-type C++ static global variables (such globals are usually classes or structs, and heavy work in their constructors slows startup).

In theory, here is what happens before main() executes:

  • Load the executable file: the Mach-O file, i.e. the collection of .o object files produced by compiling all the classes in the app.
  • Load dylibs; this involves the following steps:
  1. Analyze all the dylibs the app depends on.
  2. Find the Mach-O file corresponding to each dylib.
  3. Open and read these Mach-O files and verify their validity.
  4. Register their code signatures with the system kernel.
  5. Call mmap() for each segment of each dylib.

System dylibs are pre-optimized and load quickly; dylibs introduced by the developer take longer.

  • Rebase and bind: because of ASLR, pointer offsets must be corrected during dylib loading to obtain the right resource addresses. Rebase reads the image into memory and fixes up pointers inside it, which costs I/O; bind looks up the symbol table to bind external symbols, which costs a lot of CPU.

  • Objc setup: Initializes Objc, including registering Objc classes, checking selector uniqueness, and inserting classification methods.

  • Initializers: set up the initial application state. This includes executing +load methods, calling C/C++ constructor functions (functions marked __attribute__((constructor))), creating non-primitive-type C++ static global variables, and so on.

1.1.2.2 After main() is executed

Now for the second stage. We define T2 as the interval from the start of main() to the end of application:didFinishLaunchingWithOptions:, so we can measure it by embedding code at both points. An Objective-C project has a main.m file, so we can add the code directly:

// 1. Add the following code to main.m:
CFAbsoluteTime AppStartLaunchTime;

int main(int argc, char * argv[]) {
    AppStartLaunchTime = CFAbsoluteTimeGetCurrent();
    ...
}

// 2. Declare at the top of AppDelegate.m:
extern CFAbsoluteTime AppStartLaunchTime;

// 3. At the end of didFinishLaunchingWithOptions in AppDelegate.m, add:
dispatch_async(dispatch_get_main_queue(), ^{
    NSLog(@"App startup time -- %f", (CFAbsoluteTimeGetCurrent() - AppStartLaunchTime));
});

A Swift project has no main.m file; the @UIApplicationMain attribute generates the program entry point. So if we need to do extra work in main, we can create our own main.swift file, which Apple allows. First remove the @UIApplicationMain attribute from the AppDelegate class;

Then create your own main.swift file and add the program entry:

import UIKit

var appStartLaunchTime: CFAbsoluteTime = CFAbsoluteTimeGetCurrent()

UIApplicationMain(
    CommandLine.argc,
    UnsafeMutableRawPointer(CommandLine.unsafeArgv)
        .bindMemory(
            to: UnsafeMutablePointer<Int8>.self,
            capacity: Int(CommandLine.argc)),
    nil,
    NSStringFromClass(AppDelegate.self)
)

Then, at the end of the AppDelegate's didFinishLaunchingWithOptions: method, add:

// App startup time, from main() to the end of didFinishLaunchingWithOptions
DispatchQueue.main.async {
    print("App startup time, from main() to didFinishLaunchingWithOptions: \(CFAbsoluteTimeGetCurrent() - appStartLaunchTime)")
}

In general, optimizations after main can be done in the following ways:

  1. Use pure code and minimize the use of XIBs;
  2. Move network requests made during the startup phase onto asynchronous paths;
  3. Defer or run asynchronously any time-consuming operations;
  4. Use a simple ad/splash page as a transition, and run the home page's computation and network requests asynchronously while the ad page is showing;
  5. When the page display needs to change (for example, for a Singles' Day event), deliver the data cache in advance;
  6. Build the home controller in pure code rather than XIB/Storyboard to avoid time-consuming layout conversion;
  7. Avoid heavy computation on the main thread; defer calculations irrelevant to the first screen to shorten CPU time;
  8. Avoid large images, and reduce the number and nesting depth of views to lighten the GPU's load;
  9. Optimize network request interfaces (e.g. DNS policy) and request only the data the first screen needs;
  10. Cache the first screen's data locally, and request fresh data after rendering completes.

1.1.3 APP startup optimization

1.1.3.1 APP startup optimization — binary rearrangement

The startup optimizations in 1.1.1 and 1.1.2 above are code-level: design the code so that startup work is as small as possible. There is also an optimization rooted in operating-system fundamentals, again targeting the stage before main() executes.

From operating-system principles we know that memory can be loaded by paging or by segmentation. Because a phone's physical memory is limited, the memory the OS gives a process is virtual memory, which must be mapped to physical memory. If an app needs a lot of memory, only a limited number of its pages can be resident at a time, rather than the whole app being loaded at once. If a page needed during startup is not in memory, a page fault occurs and the missing page is read from disk into memory. A single page fault takes only milliseconds, but many in a row produce a startup delay the user will notice.

Knowing this, we attack the problem at the paging level. The idea of binary rearrangement is to move all the classes and functions needed at startup to the front of the binary at build time, reducing the number of page faults as far as possible and thereby shortening startup.

Let's take a closer look at how memory loading works.

1.1.3.1.1 Principle of Memory Loading

In the early days of computers, there was no concept of virtual memory, and any application that was loaded from disk into running memory was fully loaded and ordered. But there are some problems with using physical memory directly like this:

  1. Security: memory uses real physical addresses and processes sit in memory one after another, so an address offset in process 1 can reach other processes' memory.
  2. Efficiency: as software grew, a single program needed more and more memory, yet users rarely use all of an application's features, wasting a lot of memory; processes opened later often had to queue for space.

Virtual memory was introduced to solve these problems with physical memory. With virtual memory, the large contiguous address space a process thinks it owns — say 0x000000 through 0xFFFFFF — is in fact virtual: each virtual address is translated through a mapping table to obtain the real physical address.

Here’s a diagram of how virtual memory works:

With virtual memory, the offset attack no longer reaches other processes' address space: each process has its own mapping table, so however you form addresses inside your process, the mapping table confines their real physical targets to the range assigned to you.

After virtual memory is introduced, the CPU needs to map to find the real physical address before accessing data through the virtual memory address. The process is as follows:

  1. Based on the virtual memory address, find the mapping table of the corresponding process.
  2. Through the mapping table to find its corresponding real physical address, and then find the data.

Having studied operating systems, we know that there are two ways to address CPU memory: paging and segmentation.

Virtual and physical memory are connected through a mapping table, but the mapping cannot be one-to-one — that would waste memory. Physical memory is one contiguous space; if all of it were handed to a single application, other applications could not run. To address efficiency, the operating system manages memory in two ways: paging and segmentation.

For the multi-user, multi-process scenarios that dominate today, paging is used: the operating system splits memory into many pages of equal size. On Linux, a page is 4KB; on iOS devices, a page is 16KB.

With memory split into many pages — like the many pages of a thick book — the CPU needs an efficient way to get at the right one. CPU time is precious, and reading data from disk into memory is slow I/O, so the operating system uses a cache: pages expected to be needed are loaded into memory ahead of time, and the CPU reads from there rather than going to disk directly, which is vastly faster. But the cache is finite and can hold only a limited number of pages; if a page the CPU needs is not resident, a page fault occurs and the required page is loaded from disk.

The following figure shows the virtual page table mapping between two processes:

When an application is loaded, it is not loaded into memory in its entirety — this is lazy loading: physical memory holds only as much of the application as it actually uses.

When the application accesses an address whose mapping-table entry is empty (i.e. not yet backed by physical memory), the system immediately blocks the process and triggers what is known as a page fault (Page Fault).

When a page fault is triggered, the operating system reads the needed page from disk into physical memory and points the virtual address in the mapping table at it. (If memory is full, the OS evicts a resident page chosen by its page-replacement algorithm — which is why opening many applications does not crash the system, and also why a long-unused application often relaunches from scratch when reopened.)

Paging and page replacement neatly solve the memory-waste and efficiency problems, but plain virtual memory has a weakness: a given function sits at the same virtual address on every run. That is a serious security hole — an attacker can prepare, in advance, a program that hooks or modifies a function at its known fixed address.

For example, if the application has a function at offset 0x00a000 from the start of its address space (0x000000 to 0xFFFFFF), then the real implementation of that function can always be reached through the virtual address 0x00a000.

In order to solve the above security problems, ASLR technology is introduced. Its principle is to add a random offset value each time the virtual address is mapped to the real address.

Android 4.0, iOS 4.3, and OS X Mountain Lion (10.8) introduced ASLR, and since its introduction the bar for attackers has been raised considerably — it is no longer an age when anyone can be a hacker.

From the memory-loading principles above we now understand paging and page faults. Binary rearrangement, the startup optimization we are about to explain, builds on exactly this: reduce the number of page faults at startup, reduce the time they cost, and thus shorten startup time.

1.1.3.1.2 Principle of binary rearrangement technology

We now know that a page fault blocks the process, so it affects performance. Worse, for production (App Store) builds, iOS also verifies the code signature when a page is faulted back in, so a page fault in production is even more expensive.

The TikTok (Douyin) team reported a cost of roughly 0.6-0.8 ms per page fault; in practice it varies by page and by CPU load, ranging from about 0.1 to 1.0 ms.

Startup time is the user's first, most immediate impression of an app. Because startup must load and run a large number of classes, categories, and third-party libraries, the accumulated cost of the resulting page faults cannot be underestimated — which is exactly why binary rearrangement matters for startup optimization.

Suppose at startup we need to call two functions, method1 and method4. A function's position in the Mach-O binary follows the link order of ld (Xcode's linker), not the call order, so the two functions may well end up on different memory pages.

At startup, both Page1 and Page2 must be loaded into physical memory from scratch, triggering two page faults.

Binary rearrangement places method1 and method4 on the same memory page, so startup loads only Page1 and triggers a single page fault — the optimization we were after.

In a real project, the practice is to pack all functions called during startup together (for example, into the first pages of the binary) to minimize page faults. This is binary rearrangement.

1.1.3.1.3 How Do I Check page Fault

To see the real number of page faults, uninstall the app and measure the first launch after a fresh install, or first open many other applications (to evict the app's cached pages).

If you have run the app before, part of it is already in physical memory and present in the mapping table, so relaunching triggers fewer page faults — even if you kill the app first.

The point is to overwrite or flush the physical memory loaded earlier, reducing measurement error.

The procedure is as follows:

  • Open Instruments and select System Trace.

  • Select a real device and your target, click start, and click stop once the first screen has loaded. Note it is best to delete and reinstall the app: because of process caching, killing a backgrounded app and relaunching it is not necessarily a true cold start.

  • Wait for the analysis to complete and check the number of missing pages

The following picture shows the background kill to restart the application:

The following figure shows the first installation and startup of the application:

As a side check, you can add DYLD_PRINT_STATISTICS to check the total time spent in the pre-main phase.

1.1.3.1.4 How do I Perform binary rearrangement

Binary rearrangement is actually quite simple — Xcode already provides the mechanism, and libobjc itself is optimized with a binary rearrangement.

A libobjc.order file is provided in the objc4-750 source code, as follows:
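An order file is simply a plain-text list of symbol names, one per line, in the order you want them laid out; a hypothetical example (these class and method names are illustrative, not taken from libobjc):

```
# App.order — symbols listed first are laid out first in __TEXT
+[AppDelegate load]
_main
-[AppDelegate application:didFinishLaunchingWithOptions:]
-[HomeViewController viewDidLoad]
```

Symbols the linker cannot find are simply ignored, so the file can be refined incrementally.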

We do the binary rearrangement in Xcode by following these steps:

  • Xcode's linker is ld, and ld supports an Order File parameter; we use it to point the build at an order file.

  • In the order file, write the symbols you need, in the desired order.

  • When the project is built, Xcode reads this file and lays out the generated Mach-O according to the symbol order in it.

  • How do we view our project's current symbol order?

Before and after rearranging, we need to verify that the symbol order actually changed; for this we use the Link Map.

1.1.3.1.5 Link Map

The Link Map is generated during compilation and records the layout of the binary. The build setting Write Link Map File controls whether it is output; the default is No.

After enabling it: clean the project, build it, then choose Products → Show in Finder to locate the built Mach-O product, and find the latest .txt Link Map file as shown below and open it.

This file stores the order of all Symbols in the # Symbols: section:


As you can see, this symbol order clearly follows the file order in Compile Sources.

1.2 Lag optimization

1.2.1 Analysis of the causes of lag

Before we look at the causes of lag, let's first review how the screen displays graphics.

First, some concepts about the CPU and GPU:

  • CPU: responsible for object creation and destruction, property adjustment, layout calculation, text measurement and typesetting, image format conversion and decoding, and Core Graphics drawing.
  • GPU: responsible for rendering textures (putting data onto the screen).
  1. The GPU is a processing unit designed for highly parallel graphics computation; it does this work with less power than the CPU, and its floating-point throughput far exceeds the CPU's.
  2. GPU rendering is much more efficient than CPU rendering, and it puts less load on the system, so in development we should keep the CPU on main-thread UI work and hand graphics display to the GPU; when rasterization is involved the CPU also participates (more on this later).
  3. Compared with the CPU, the GPU does one narrow thing: take the submitted textures and vertex descriptions, apply transforms, blend and render, then output to the screen. Its main inputs are textures (images) and shapes (vector geometry approximated with triangles).
  • CPU and GPU collaboration:

    As the figure shows, displaying a view on screen requires CPU and GPU cooperation: the CPU computes the content and submits it to the GPU; the GPU renders the result into the frame buffer; then the video controller, paced by the VSync signal, reads the frame buffer line by line and passes it (possibly through digital-to-analog conversion) to the display.

  • VSync: lets the CPU and GPU start preparing data only after the vertical-sync signal arrives, preventing tearing and frame skipping; in effect it keeps the frames output per second from exceeding what the screen can display.

  • Double buffering: iOS uses two frame buffers, a front buffer and a back buffer. The GPU renders a frame into the back buffer; once the video controller has finished scanning out the front buffer, the two buffers swap on the next VSync signal, and the CPU starts preparing the next frame's data. Android 4.0 added triple buffering — one extra back buffer — which reduces the chance of consecutive dropped frames at the cost of more CPU and GPU use.

  • Principle of screen display image:

    The display of an image can be understood simply: the CPU performs computation/layout/decoding and so on, then hands off to the GPU, which renders into the buffer; when the video controller receives VSync, it reads the rendered frame from the buffer and displays it on screen.

  • Screen drawing principle:

Today's mobile devices basically all use double buffering plus vertical sync (VSync). As shown above, the CPU, GPU, and display cooperate: the CPU computes what to display (view creation, layout calculation, image decoding, text drawing, and so on) and submits it to the GPU, which transforms, composites, and renders. The GPU renders a frame into a buffer for the video controller to read, and once the next frame is rendered it points the video controller at the second buffer (the double-buffering principle). The GPU waits for the display's VSync signal before rendering a new frame and updating the buffer — this eliminates tearing and smooths the picture, at the cost of more computation and some latency.

1.2.1.1 iOS image decompression to render process

  1. CPU: computes each view's frame, decodes the image, and hands the decoded texture to the GPU over the data bus.
  2. GPU: texture blending, vertex transforms and computation, pixel fill computation, and rendering into the frame buffer.
  3. Clock signals: V-Sync / H-Sync.
  4. Double buffering on iOS devices: display systems usually introduce two frame buffers.
  5. Displaying an image on screen is a collaboration between the CPU and GPU.

In summary, the process of rendering an image to the screen:

Read file -> compute frame -> decode image -> decoded bitmap handed to the GPU over the data bus -> GPU obtains the image's frame -> vertex transforms -> rasterization -> fetch each pixel's color from its texture coordinates (multiplying in the alpha value when transparency is present) -> render into the frame buffer -> display on screen.

1.2.1.1.1 Image loading process
  • Image loading workflow:
  1. Suppose we use the +imageWithContentsOfFile: method to load an image from disk; at this point the image is not yet decompressed;
  2. The generated UIImage is assigned to a UIImageView;
  3. An implicit CATransaction then captures the change to the UIImageView's layer tree;
  4. When the next runloop iteration on the main thread arrives, Core Animation commits this implicit transaction, which may copy the image depending on byte alignment and other factors. The copy may involve some or all of the following steps:
     (1) allocate a memory buffer to manage file I/O and decompression;
     (2) read the file data from disk into memory;
     (3) decode the compressed image data into its uncompressed bitmap form — a very CPU-intensive operation;
     (4) Core Animation's CALayer renders the UIImageView's layer with the uncompressed bitmap data;
     (5) the CPU computes the image's frame and hands the decompressed image to the GPU for rendering.
  5. Rendering process:
     (1) the GPU obtains the image's coordinates;
     (2) the coordinates go to the vertex shader (vertex computation);
     (3) the image is rasterized (mapped to its on-screen pixels);
     (4) the fragment shader computes each pixel's final color;
     (5) the frame buffer is rendered to the screen.

1.2.1.1.2 Why images need to be decompressed

Decompressing images consumes a lot of CPU time, so why do it at all — can't we display images on screen without decompressing them? The answer is no. To understand why, we first need to know what a bitmap is.

In essence, a bitmap is an array of pixels, each representing one point in the image. The JPEG and PNG files we commonly use in our applications are (compressed) bitmap graphics formats.

You can try

UIImage *image = [UIImage imageNamed:@"text.png"];
CFDataRef rawData = CGDataProviderCopyData(CGImageGetDataProvider(image.CGImage));

Print rawData, which is the rawData of the image.

In fact, JPEG and PNG are both compressed bitmap formats. PNG is lossless and supports an alpha channel, while JPEG is lossy and lets you specify a compression quality from 0-100%. It's worth noting that Apple's SDK provides two functions specifically for generating PNG and JPEG data:

// return image as PNG. May return nil if image has no CGImageRef or invalid bitmap format
UIKIT_EXTERN NSData * __nullable UIImagePNGRepresentation(UIImage * __nonnull image);

// return image as JPEG. May return nil if image has no CGImageRef or invalid bitmap format. compression is 0(most).. 1(least)
UIKIT_EXTERN NSData * __nullable UIImageJPEGRepresentation(UIImage * __nonnull image, CGFloat compressionQuality);

Therefore, before rendering an image on disk to the screen, the raw pixel data of the image must be obtained before any subsequent drawing operations can be performed, which is why the image needs to be decompressed.

1.2.1.1.3 Principle of Decompressing images

Since image decompression is inevitable, and we don’t want it to run on the main thread, affecting the responsiveness of our application, is there a better solution?

As mentioned earlier, when an undecompressed image is about to be rendered to the screen, the system decompresses it on the main thread (skipping images that are already decompressed). Hence the industry's standard solution: force decompression ahead of time on a background thread.

The principle of forced decompression is to redraw the image and get a new bitmap after decompression. The core function used is CGBitmapContextCreate:

CG_EXTERN CGContextRef __nullable CGBitmapContextCreate(void * __nullable data,
    size_t width, size_t height, size_t bitsPerComponent, size_t bytesPerRow,
    CGColorSpaceRef cg_nullable space, uint32_t bitmapInfo)
    CG_AVAILABLE_STARTING(__MAC_10_0, __IPHONE_2_0);
   

Function parameters:

  • Data: If not NULL, it should point to a block of memory with a size of at least bytesPerRow * height; If the value is NULL, the system automatically allocates and frees the required memory.

  • Width and height: the width and height of the bitmap, which are assigned to the pixel width and pixel height of the image respectively.

  • bitsPerComponent: the number of bits used for each color component of a pixel; specify 8 for RGB color space.

  • bytesPerRow: the number of bytes per row of the bitmap, at least width * bytes-per-pixel; when 0 is specified, the system not only computes it automatically but also optimizes cache-line alignment.

  • Space: is the color space we mentioned before, generally use RGB;

  • bitmapInfo: the layout of the bitmap, e.g. kCGImageAlphaPremultipliedFirst.

  • YYImage's decompression code: the function YYImage uses to decompress images, YYCGImageCreateDecodedCopy, lives in the YYImageCoder class; its core code is as follows:

CGImageRef YYCGImageCreateDecodedCopy(CGImageRef imageRef, BOOL decodeForDisplay) {
    ...

    if (decodeForDisplay) { // decode with redraw (may lose some precision)
        CGImageAlphaInfo alphaInfo = CGImageGetAlphaInfo(imageRef) & kCGBitmapAlphaInfoMask;

        BOOL hasAlpha = NO;
        if (alphaInfo == kCGImageAlphaPremultipliedLast ||
            alphaInfo == kCGImageAlphaPremultipliedFirst ||
            alphaInfo == kCGImageAlphaLast ||
            alphaInfo == kCGImageAlphaFirst) {
            hasAlpha = YES;
        }

        // BGRA8888 (premultiplied) or BGRX8888
        // same as UIGraphicsBeginImageContext() and -[UIView drawRect:]
        CGBitmapInfo bitmapInfo = kCGBitmapByteOrder32Host;
        bitmapInfo |= hasAlpha ? kCGImageAlphaPremultipliedFirst : kCGImageAlphaNoneSkipFirst;

        CGContextRef context = CGBitmapContextCreate(NULL, width, height, 8, 0, YYCGColorSpaceGetDeviceRGB(), bitmapInfo);
        if (!context) return NULL;

        CGContextDrawImage(context, CGRectMake(0, 0, width, height), imageRef); // decode
        CGImageRef newImage = CGBitmapContextCreateImage(context);
        CFRelease(context);

        return newImage;
    } else {
        ...
    }
}

It takes an original bitmap parameter, imageRef, and returns a new uncompressed bitmap, newImage, through the following three steps:

  1. Create a bitmap context using the CGBitmapContextCreate function;
  2. Draw the raw bitmap into the context using the CGContextDrawImage function;
  3. Use the CGBitmapContextCreateImage function to create the new, decompressed bitmap.
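A quick sanity check on the memory cost of these decoded bitmaps, using the bytesPerRow * height rule from the parameter notes above (a minimal sketch; the function name is illustrative, not a YYImage API):

```swift
// Decoded-bitmap memory arithmetic: an RGBA8888 bitmap needs at least
// width * 4 bytes per row, and bytesPerRow * height bytes in total.
func decodedBitmapBytes(width: Int, height: Int, bytesPerPixel: Int = 4) -> Int {
    let bytesPerRow = width * bytesPerPixel
    return bytesPerRow * height
}

let bytes = decodedBitmapBytes(width: 1920, height: 1080)
print(bytes)                          // 8294400
print(Double(bytes) / 1024 / 1024)    // roughly 7.9 MB once decoded
```

This is why on-demand decompression on the main thread is expensive, and why both frameworks decode ahead of time on a child thread.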

In fact, the image decompression process in SDWebImage is essentially the same as above, with minor differences in some of the parameters passed to CGBitmapContextCreate. A performance comparison of SDWebImage and YYImage decompressing images:

  1. For PNG images: SDWebImage > YYImage;
  2. For JPEG images: SDWebImage < YYImage.

The core code of SDWebImage decompression image is as follows:

        CGImageRef imageRef = image.CGImage;
        // device color space
        CGColorSpaceRef colorspaceRef = SDCGColorSpaceGetDeviceRGB();
        BOOL hasAlpha = SDCGImageRefContainsAlpha(imageRef);
        // iOS display alpha info (BRGA8888/BGRX8888)
        CGBitmapInfo bitmapInfo = kCGBitmapByteOrder32Host;
        bitmapInfo |= hasAlpha ? kCGImageAlphaPremultipliedFirst : kCGImageAlphaNoneSkipFirst;
        
        size_t width = CGImageGetWidth(imageRef);
        size_t height = CGImageGetHeight(imageRef);
        
        // kCGImageAlphaNone is not supported in CGBitmapContextCreate.
        // Since the original image here has no alpha info, use kCGImageAlphaNoneSkipLast
        // to create bitmap graphics contexts without alpha info.
        CGContextRef context = CGBitmapContextCreate(NULL,
                                                     width,
                                                     height,
                                                     kBitsPerComponent,
                                                     0,
                                                     colorspaceRef,
                                                     bitmapInfo);
        if (context == NULL) {
            return image;
        }
        
        // Draw the image into the context and retrieve the new bitmap image without alpha
        CGContextDrawImage(context, CGRectMake(0, 0, width, height), imageRef);
        CGImageRef imageRefWithoutAlpha = CGBitmapContextCreateImage(context);
        UIImage *imageWithoutAlpha = [[UIImage alloc] initWithCGImage:imageRefWithoutAlpha scale:image.scale orientation:image.imageOrientation];
        CGContextRelease(context);
        CGImageRelease(imageRefWithoutAlpha);
        
        return imageWithoutAlpha;

1.2.2 Causes and monitoring of stalling

1.2.2.1 Causes of lag

The principles of image display and screen rendering were explained above. There are many causes of lag, but the most important is dropped frames, as shown below:

The display principle above relies on the VSync mechanism used by mobile devices. After a VSync signal arrives, the system's graphics service notifies the App through CADisplayLink and similar mechanisms, and the App's main thread starts computing the display content on the CPU: view creation, layout calculation, image decoding, text drawing, and so on. The CPU then submits the computed content to the GPU, which transforms, composites, and renders it. The GPU submits the rendering result to the frame buffer and waits for the next VSync signal to display it on screen. Because of the VSync mechanism, if the CPU or GPU has not finished submitting its content within one VSync interval, that frame is discarded and displayed at the next opportunity, while the display keeps showing the previous content unchanged. That is why the interface stutters.

During development, excessive pressure on either CPU or GPU will lead to frame drop. Therefore, it is necessary to evaluate and optimize the pressure on CPU and GPU respectively during development.

1.2.2.2 Lag monitoring

There are generally two implementation schemes for lag monitoring:

  1. Main-thread lag monitoring: a child thread monitors the main thread's RunLoop and checks whether the time spent between two state transitions reaches a certain threshold.
  2. FPS monitoring: to keep UI interaction smooth, the App's refresh rate should stay close to 60 fps. The implementation principle of FPS monitoring has been discussed above and is skipped here.

In practice, FPS-based monitoring shows large jitter in the FPS value, which makes lag hard to pin down. To address this, the main thread is monitored by measuring how long each pass of its message loop takes: when that time exceeds a specified threshold, a lag is recorded. This is the Hertz performance-monitoring scheme adopted by Meituan's mobile team; the WeChat team proposed a similar scheme in practice in "WeChat Reading iOS Performance Optimization Summary".

The following figure is the flow chart of Meituan Hertz scheme:

The scheme observes that Sources events (and other interaction events) triggered by scrolling always execute quickly and then enter the kCFRunLoopBeforeWaiting state; if a lag occurs during scrolling, the RunLoop must be stuck in either the kCFRunLoopAfterWaiting or the kCFRunLoopBeforeSources state. Therefore the first scheme for monitoring the main thread is:

Start a child thread and continuously measure whether the time between kCFRunLoopBeforeSources and kCFRunLoopAfterWaiting exceeds a threshold, to determine whether the main thread is lagging. However, since an idle main-thread RunLoop sits mostly in the BeforeWaiting state, this detection can report the main thread as lagging even when nothing has happened.

To solve this, the author of the third-party lag detector ANREye (Nanzhi Qinhan) gave a solution. The general idea: create a child thread that loops, setting a flag to YES on each cycle and then dispatching a task to the main thread that sets the flag to NO. The child thread sleeps for the timeout threshold and checks whether the flag was successfully set to NO; if not, the main thread has lagged. Because the flag-setting task is dispatched to the main thread, this scheme handles lag detection correctly even while the main thread sits in the BeforeWaiting state:

#define lsl_SEMAPHORE_SUCCESS 0
static BOOL lsl_is_monitoring = NO;
static dispatch_semaphore_t lsl_semaphore;
static NSTimeInterval lsl_time_out_interval = 0.05;


@implementation LSLAppFluencyMonitor

static inline dispatch_queue_t __lsl_fluecy_monitor_queue() {
    static dispatch_queue_t lsl_fluecy_monitor_queue;
    static dispatch_once_t once;
    dispatch_once(&once, ^{
        lsl_fluecy_monitor_queue = dispatch_queue_create("com.dream.lsl_monitor_queue", NULL);
    });
    return lsl_fluecy_monitor_queue;
}

static inline void __lsl_monitor_init() {
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        lsl_semaphore = dispatch_semaphore_create(0);
    });
}

#pragma mark - Public
+ (instancetype)monitor {
    return [LSLAppFluencyMonitor new];
}

- (void)startMonitoring {
    if (lsl_is_monitoring) { return; }
    lsl_is_monitoring = YES;
    __lsl_monitor_init();
    dispatch_async(__lsl_fluecy_monitor_queue(), ^{
        while (lsl_is_monitoring) {
            __block BOOL timeOut = YES;
            dispatch_async(dispatch_get_main_queue(), ^{
                timeOut = NO;
                dispatch_semaphore_signal(lsl_semaphore);
            });
            [NSThread sleepForTimeInterval: lsl_time_out_interval];
            if (timeOut) {
                [LSLBacktraceLogger lsl_logMain];       // Prints the main thread call stack
// [LSLBacktraceLogger lsl_logCurrent]; // Prints the call stack for the current thread
// [LSLBacktraceLogger lsl_logAllThread]; // Prints the call stack for all threads
            }
            dispatch_semaphore_wait(lsl_semaphore, DISPATCH_TIME_FOREVER);
        }
    });
}

- (void)stopMonitoring {
    if (!lsl_is_monitoring) { return; }
    lsl_is_monitoring = NO;
}

@end

LSLBacktraceLogger is a class that retrieves stack information; see the code on GitHub for details. The log output looks like this:

2018-08-16 12:36:33.910491+0800 AppPerformance[4802:171145] Backtrace of Thread 771:
======================================================================================
libsystem_kernel.dylib         0x10d089bce __semwait_signal + 10
libsystem_c.dylib              0x10ce55d10 usleep + 53
AppPerformance                 0x108b8b478 $S14AppPerformance25LSLFPSTableViewControllerC05tableD0_12cellForRowAtSo07UITableD4CellCSo0kD0C_10Foundation9IndexPathVtF + 1144
AppPerformance                 0x108b8b60b $S14AppPerformance25LSLFPSTableViewControllerC05tableD0_12cellForRowAtSo07UITableD4CellCSo0kD0C_10Foundation9IndexPathVtF To + 155
UIKitCore                      0x1135b104f -[_UIFilteredDataSource tableView:cellForRowAtIndexPath:] + 95
UIKitCore                      0x1131ed34d -[UITableView _createPreparedCellForGlobalRow:withIndexPath:willDisplay:] + 765
UIKitCore                      0x1131ed8da -[UITableView _createPreparedCellForGlobalRow:willDisplay:] + 73
UIKitCore                      0x1131b4b1e -[UITableView _updateVisibleCellsNow:isRecursive:] + 2863
UIKitCore                      0x1131d57eb -[UITableView layoutSubviews] + 165
UIKitCore                      0x1133921ee -[UIView(CALayerDelegate) layoutSublayersOfLayer:] + 1501
QuartzCore                     0x10ab72eb1 -[CALayer layoutSublayers] + 175
QuartzCore                     0x10ab77d8b _ZN2CA5Layer16layout_if_neededEPNS_11TransactionE + 395
QuartzCore                     0x10aaf3b45 _ZN2CA7Context18commit_transactionEPNS_11TransactionE + 349
QuartzCore                     0x10ab285b0 _ZN2CA11Transaction6commitEv + 576
QuartzCore                     0x10ab29374 _ZN2CA11Transaction17observer_callbackEP19__CFRunLoopObservermPv + 76
CoreFoundation                 0x109dc3757 __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 23
CoreFoundation                 0x109dbdbde __CFRunLoopDoObservers + 430
CoreFoundation                 0x109dbe271 __CFRunLoopRun + 1537
CoreFoundation                 0x109dbd931 CFRunLoopRunSpecific + 625
GraphicsServices               0x10f5981b5 GSEventRunModal + 62
UIKitCore                      0x112c812ce UIApplicationMain + 140
AppPerformance                 0x108b8c1f0 main + 224
libdyld.dylib                  0x10cd4dc9d start + 1
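The ObjC monitor above can be condensed into a small generic sketch in Swift. This is a minimal sketch with made-up names (QueueWatchdog is not a library type), generalized so the watched queue is injectable; in the real monitor the watched queue is the main queue, and the stall callback would log the call stack:

```swift
import Foundation
import Dispatch

// Semaphore-based lag watchdog: ping the watched queue, sleep for the
// threshold, and report a stall if the ping was not serviced in time.
final class QueueWatchdog {
    private let watched: DispatchQueue
    private let monitor = DispatchQueue(label: "com.example.watchdog")
    private let threshold: TimeInterval
    private var running = false   // unsynchronized: fine for a sketch, use atomics in production

    init(watching queue: DispatchQueue, threshold: TimeInterval = 0.05) {
        watched = queue
        self.threshold = threshold
    }

    // Calls `onStall` from the monitor thread whenever the watched queue
    // fails to service a ping within `threshold` seconds.
    func start(onStall: @escaping () -> Void) {
        running = true
        monitor.async {
            let semaphore = DispatchSemaphore(value: 0)
            while self.running {
                var timedOut = true
                self.watched.async {        // the "set the flag to NO" task
                    timedOut = false
                    semaphore.signal()
                }
                Thread.sleep(forTimeInterval: self.threshold)
                if timedOut { onStall() }   // ping not serviced in time: stall
                semaphore.wait()            // block until the ping finally runs
            }
        }
    }

    func stop() { running = false }
}
```

On a stall, the real monitor symbolicates the main thread's call stack (LSLBacktraceLogger); here that is left to the caller's closure.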

Scheme 2: implemented using CADisplayLink.

1.2.2.2.1 FPS

According to Wikipedia, FPS is short for Frames Per Second: the number of frames displayed per second. FPS measures the amount of information used to store and display motion video: the more frames per second, the smoother the motion appears, and the lower the FPS, the choppier it feels, so it is a measure of how well an application performs during rendering. Generally, as long as an APP's FPS stays between 50 and 60, the user experience feels smooth. The iPhone screen normally refreshes 60 times per second, which can be read as an FPS of 60. We know that CADisplayLink fires in step with the screen refresh rate, so can we use it to monitor our FPS?

  • What is CADisplayLink?

CADisplayLink is another timer class, provided by Core Animation, that fires in step with the screen refresh: it is invoked just before each screen update. Its interface is deliberately similar to NSTimer's, so it essentially serves as a drop-in replacement. But unlike timeInterval, which is measured in seconds, CADisplayLink has an integer frameInterval property that specifies how many frames must elapse between invocations. The default is 1, meaning it fires once per screen refresh. If your animation code takes longer than 1/60 of a second to execute, you can set frameInterval to 2, so the animation runs every other frame (30 frames per second).
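The arithmetic is worth making explicit (a minimal sketch; the function names are illustrative): with frameInterval == 2, a 60 Hz display yields 30 callbacks per second, and the FPS label below derives its reading by dividing the frames counted by the elapsed seconds between two link timestamps:

```swift
// Effective callback rate for a given CADisplayLink frameInterval
// on a display with the given refresh rate.
func effectiveFPS(refreshRate: Double = 60, frameInterval: Int) -> Double {
    return refreshRate / Double(frameInterval)
}

// FPS the way the label below measures it: frames counted divided by
// the elapsed time between two CADisplayLink timestamps.
func measuredFPS(frameCount: Int, elapsed: Double) -> Double {
    return Double(frameCount) / elapsed
}

print(effectiveFPS(frameInterval: 2))             // 30.0
print(measuredFPS(frameCount: 58, elapsed: 1.0))  // 58.0
```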

Use CADisplayLink to monitor the FPS value of the interface, refer to YYFPSLabel:

import UIKit

class LSLFPSMonitor: UILabel {

    private var link: CADisplayLink = CADisplayLink()
    private var count: Int = 0
    private var lastTime: TimeInterval = 0.0
    private var fpsColor: UIColor = UIColor.green
    public var fps: Double = 0.0

    // MARK: - init

    override init(frame: CGRect) {
        var f = frame
        if f.size == CGSize.zero {
            f.size = CGSize(width: 55.0, height: 22.0)
        }
        super.init(frame: f)

        self.textColor = UIColor.white
        self.textAlignment = .center
        self.font = UIFont.init(name: "Menlo", size: 12.0)
        self.backgroundColor = UIColor.black

        link = CADisplayLink.init(target: LSLWeakProxy(target: self), selector: #selector(tick))
        link.add(to: RunLoop.current, forMode: RunLoopMode.commonModes)
    }

    deinit {
        link.invalidate()
    }

    required init?(coder aDecoder: NSCoder) {
        fatalError("init(coder:) has not been implemented")
    }

    // MARK: - actions

    @objc func tick(link: CADisplayLink) {
        guard lastTime != 0 else {
            lastTime = link.timestamp
            return
        }

        count += 1
        let delta = link.timestamp - lastTime
        guard delta >= 1.0 else {
            return
        }

        lastTime = link.timestamp
        fps = Double(count) / delta
        let fpsText = "\(String(format: "%.3f", fps)) FPS"
        count = 0

        let attrMStr = NSMutableAttributedString(attributedString: NSAttributedString(string: fpsText))
        if fps > 55.0 {
            fpsColor = UIColor.green
        } else if fps >= 50.0 && fps <= 55.0 {
            fpsColor = UIColor.yellow
        } else {
            fpsColor = UIColor.red
        }
        attrMStr.setAttributes([NSAttributedStringKey.foregroundColor: fpsColor], range: NSMakeRange(0, attrMStr.length - 3))
        attrMStr.setAttributes([NSAttributedStringKey.foregroundColor: UIColor.white], range: NSMakeRange(attrMStr.length - 3, 3))
        DispatchQueue.main.async {
            self.attributedText = attrMStr
        }
    }
}

Tested on real devices, the CADisplayLink implementation does meet the business need of monitoring FPS and goes a long way toward providing a reference for improving user experience, but its values may differ from those of Instruments. Two things to watch with CADisplayLink: (1) it runs in the RunLoop it is added to (usually the main thread's), so it can only detect that RunLoop's frame rate, and the scheduling of tasks managed by a RunLoop is affected by the RunLoopMode and by how busy the CPU is; to pinpoint an exact performance problem, confirm with Instruments. (2) CADisplayLink can create retain cycles.

For example:


let link = CADisplayLink.init(target: self, selector: #selector(tick))

let timer = Timer.init(timeInterval: 1.0, target: self, selector: #selector(tick), userInfo: nil, repeats: true)

Cause: both usages above hold a strong reference to self: the timer holds self and self holds the timer, so when the page is dismissed neither side can be released. Using weak doesn't help either:

weak var weakSelf = self
let link = CADisplayLink.init(target: weakSelf, selector: #selector(tick))

So how do we solve this? One might suggest calling the timer's invalidate method in deinit (or dealloc), but that doesn't work: the retain cycle already exists, so deinit is never reached.

The solution provided by the author of YYKit is YYWeakProxy, which inherits not from NSObject but from NSProxy.

NSProxy is an abstract superclass that defines interfaces for objects, and acts as a proxy for other objects, or for objects that don’t exist.

The modified code is as follows:

let link = CADisplayLink.init(target: LSLWeakProxy(target: self), selector: #selector(tick))
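The reason the proxy works is simply that it holds the target weakly, so nothing keeps self alive. A pure-Swift illustration of the weak-holding part (this sketch does not reproduce NSProxy's message forwarding; the names are made up):

```swift
// A box that, like LSLWeakProxy/YYWeakProxy, refuses to retain its target.
final class WeakTargetBox {
    private(set) weak var target: AnyObject?
    init(target: AnyObject) { self.target = target }
}

final class Page {
    static var deinitCount = 0
    deinit { Page.deinitCount += 1 }   // proves the page can actually deallocate
}

var page: Page? = Page()
let box = WeakTargetBox(target: page!)
page = nil                 // nothing else retains the page...
print(box.target == nil)   // true: the weak reference was zeroed
```

With the strong timer → proxy → (weak) self chain, releasing the controller elsewhere lets deinit run, where invalidate can finally be called.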

1.2.2.3 Lag optimization: CPU-related optimization

From the CPU side, lag can be optimized in the following ways:

  1. Use lightweight objects where possible; for example, use CALayer instead of UIView for elements that don't handle events.
  2. Pre-calculate layout where possible (e.g. cell row heights).
  3. Don't call or modify UIView properties such as frame, bounds and transform too often; minimize unnecessary calls and changes. (A UIView's display properties are actually mapped onto its CALayer, and CALayer itself has none of these properties; they are added on first access via resolveInstanceMethod and stored in a Dictionary, which costs resources.)
  4. Autolayout consumes more CPU than setting frame directly, and the cost grows exponentially with the number of views.
  5. Keep the image's size consistent with the UIImageView's size, to reduce the scaling work during display.
  6. Limit the maximum number of concurrent threads.
  7. Move time-consuming operations onto child threads where possible.
  8. Text processing (sizing, drawing, CoreText and YYText): (1) move text measurement (boundingRectWithSize:options:context:) and text drawing (drawWithRect:options:context:) onto child threads; (2) use CoreText to lay out text yourself: width and height information can be cached when the object is created, avoiding the repeated measurement that UILabel/UITextView needs (once when adjusting, once when drawing), and CoreText uses CoreGraphics directly, with a small memory footprint and high efficiency (see YYText).
  9. Image processing (decoding, rendering): every image must be decoded into a bitmap before it can be rendered to the UI. A UIImage created by iOS is not decoded immediately; it is decoded on the main thread only when it is about to be displayed. Instead, CGBitmapContextCreate and related CoreGraphics operations can force the decompression ahead of time on a child thread to obtain the bitmap.
  10. TableView cell reuse: in the cellForRowAtIndexPath: callback, only create the instance and quickly return the cell; don't bind data there. Bind (assign) the data in willDisplayCell:forRowAtIndexPath:.
  11. Height caching: while a tableView scrolls it calls heightForRowAtIndexPath: continuously. When cell heights are self-sizing, computing the height on every callback stalls the UI; to avoid repeating meaningless calculation, cache the heights.
  12. View-hierarchy optimization: don't create views dynamically; within memory limits, cache subviews. Make good use of hidden.
  13. Reduce the view hierarchy: reduce the number of subviews and draw elements with layer instead. Use clearColor, maskToBounds, shadow effects and the like sparingly.
  14. Reduce unnecessary drawing operations.
  15. Image optimization: (1) don't use JPEG images; use PNG images. (2) pre-decode (Decode) on a child thread and render directly on the main thread, because if an image has not been decoded when it is assigned to an imageView, the decode happens right there. (3) optimize image size and try not to scale dynamically (contentMode). (4) combine multiple images into a single one for display where possible.
  16. Fewer transparent views: using transparent views causes blending. In iOS graphics processing, blending mainly means computing the mixed pixel color. The most intuitive example is overlaying two layers: if the top layer is transparent, the final pixel color calculation must take the lower layer into account. That process is blending.
  17. Use -drawRect: judiciously: when you load a view with UIImageView, the view still has a CALayer, but no backing store is allocated; the layer uses the CGImageRef as its contents, and the render server draws the image data into the frame buffer for display on screen. When we scroll such a view, it is reloaded, wasting performance. So for drawing, prefer implementing -drawRect: / CALayer's drawInContext:, for which Core Animation allocates a backing store to hold the bitmap those methods draw; the code in those methods runs on the CPU, and the result is uploaded to the GPU, which performs better. -drawRect: is recommended for static interfaces, not for dynamic pages.
  18. Load on demand: refresh partially; if refreshing one cell solves the problem, don't refresh the whole section or the whole tableView, and refresh the smallest cell unit. Use the runloop to improve scrolling smoothness: load content when scrolling stops; during a flick (fast scroll) there is no need to load, and default placeholders can fill the content.
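
Item 11 above (height caching) can be sketched in a few lines. A minimal sketch with illustrative names (not from any library) of caching each row's computed height so the expensive sizing runs only once per row:

```swift
import Foundation

// Caches each row's computed height so heightForRowAtIndexPath:-style
// callbacks don't repeat expensive text measurement on every scroll tick.
final class RowHeightCache {
    private var heights: [IndexPath: Double] = [:]
    private(set) var computeCount = 0   // how many times the sizing actually ran

    func height(at indexPath: IndexPath, compute: () -> Double) -> Double {
        if let cached = heights[indexPath] { return cached }
        computeCount += 1
        let h = compute()
        heights[indexPath] = h
        return h
    }

    // Call when a row's content changes (e.g. before reloading it).
    func invalidate(_ indexPath: IndexPath) { heights[indexPath] = nil }
}
```

With this, heightForRowAtIndexPath: becomes a dictionary lookup on all but the first call per row.
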
  • Supplementary notes about Blending:

Causes of blending:

  1. The UIView's alpha < 1.
  2. The UIImageView's image contains an alpha channel (even if the UIImageView's alpha is 1, as long as the image contains a transparency channel, it still causes blending).

Why does blending cause performance loss?

  1. The reason is straightforward: if a layer is opaque, the system simply displays the layer's color; if it is transparent, more computation is needed, because the layer underneath must be included when calculating the blended color.
  2. Setting opaque to YES reduces this cost, because the GPU then does no compositing at all and simply copies from that layer.
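The extra work blending implies can be seen in the per-pixel math: with an opaque top layer the GPU just writes the source color, but with alpha < 1 every channel of every pixel becomes a weighted sum of both layers ("over" compositing). A worked sketch (straight, non-premultiplied alpha for simplicity):

```swift
// Per-channel "over" compositing: the cost an opaque layer avoids.
func blend(src: Double, dst: Double, alpha: Double) -> Double {
    return src * alpha + dst * (1 - alpha)
}

print(blend(src: 1.0, dst: 0.0, alpha: 1.0))   // 1.0  (opaque: just the source)
print(blend(src: 1.0, dst: 0.0, alpha: 0.25))  // 0.25 (translucent: both layers counted)
```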

1.2.2.4 Lag optimization: GPU-related optimization

  • From the GPU side, lag can be optimized in the following ways:
  1. Avoid displaying a large number of images in a short time; display a single composite of several images where possible.
  2. The maximum texture size the GPU can process is 4096 × 4096; anything larger is handed to the CPU to process, so textures should not exceed this size.
  3. The GPU composites multiple views together for display, which consumes resources; minimize the number and nesting levels of views.
  4. Reduce transparent views (alpha < 1), and set opaque to YES so the GPU doesn't composite the alpha channel.
  5. Avoid off-screen rendering where possible.
  6. Use rasterization (shouldRasterize) wisely: rasterization shifts GPU work to the CPU, generating a bitmap cache that is read and reused directly. The rasterized CALayer bitmap, including effects such as shadows and cornerRadius, is cached; updating an already-rasterized layer causes off-screen rendering; the bitmap is evicted if unused for more than 100 ms; and the cache is limited by the system to 2.5 × the screen size. shouldRasterize suits static content; for dynamic pages it adds overhead. If you set shouldRasterize to YES, remember to also set rasterizationScale to contentsScale.
  7. Asynchronous rendering: draw on a child thread and display on the main thread; see e.g. VVeboTableViewDemo.

1.2.3 Off-screen rendering

  • What is off-screen rendering?
  1. In OpenGL, the GPU has two rendering modes: On-Screen Rendering, which renders into the current screen buffer, and Off-Screen Rendering, which creates a new buffer outside the current screen buffer to render into.
  2. Off-screen rendering requires multiple context switches: first from the current screen (on-screen) to off-screen; then, when off-screen rendering finishes, displaying the off-screen buffer's result on screen requires switching the context back from off-screen to the current screen.
  • What actions will cause an off-screen rendering?
  1. Rasterization, layer.shouldRasterize = YES
  2. Masks, layer.mask
  3. Rounded corners, when layer.masksToBounds = YES and a layer.cornerRadius greater than 0 are set together. Consider drawing and clipping the corners with CoreGraphics instead, or asking the designer for pre-rounded images.
  4. Shadows, layer.shadowXXX; but if layer.shadowPath is set, there is no off-screen rendering.
  5. layer.allowsGroupOpacity set to YES with layer.opacity less than 1.0
  • Optimized off-screen rendering
  1. Use shadowPath to specify the layer's shadow path.
  2. Render layers asynchronously (e.g. Facebook's open-source asynchronous drawing framework AsyncDisplayKit).
  3. Set the layer's opaque to YES to reduce complex layer compositing.
  4. Use image resources without transparency (no alpha channel) where possible.
  5. Try to set the layer's size values.
  6. The most efficient solution: have the designer cut the images with rounded corners already applied.
  7. In many cases the displayed images are uploaded by users, and the rounding can be done client-side.
  8. Manually generate a rounded-corner image in code and set it on the displaying view, using UIBezierPath (the Core Graphics framework) to draw the rounded image.

1.2.4 Xcode Performance Analysis Tool

1.2.4.1 Instruments

  • Instruments: Instruments is an underused tool in the Xcode suite; many iOS developers have never used it. Many interviewers also ask about performance tuning with Instruments to judge whether a candidate has genuinely applied their years of development experience.

1.2.4.2 Activity Monitor

  • Activity Monitor: You can view all processes and their memory usage, CPU usage, and other data.

1.2.4.3 Allocations

  • Allocations: managing memory is one of the most important aspects of APP development, and for developers, reducing memory usage at the program-architecture level usually means using Allocations to locate memory and find ways to reduce it. Next, let's look at two kinds of memory problems:
  1. The first: object A allocates memory, is no longer used afterwards, and is never released; this is Leaked Memory.
  2. The second: memory that keeps being allocated and can no longer be released, for example through recursion; this is Abandoned Memory.

Allocations tools give developers a good idea of how much memory each method is consuming, and locate the associated code, as shown below:

Right-click to have Xcode jump straight to the code of the method occupying the memory.

1.2.4.4 Core Animation

  • Core Animation

The number circled in the red box is the FPS value: 60 is the theoretical best, and 59 in practice is fine and indicates smooth rendering. How to use it: slide the list up and down without lifting your finger from the screen, to see what each option in the Debug Display does.

  • Color Blended Layers: When this option is turned on, the screen looks like this:

This option highlights the blended areas of the screen (that is, stacks of translucent layers) from green to red according to rendering cost. Blending hurts GPU performance because of redrawing, and is one of the main culprits behind dropped frames during scrolling and animation. Drawing a full screen of pixels once is not a problem, but if overlapping pixels force the same area to be redrawn over and over, frames drop and the UI stutters. The GPU skips drawing pixels that are completely hidden by other layers, but working out whether a layer is obscured is complicated and consumes CPU resources; likewise, merging transparent overlapping elements of different layers consumes a lot of resources. So, for fast processing, transparent layers should generally be avoided: (1) give the View a fixed, opaque background color; (2) set the opaque property to true, though this alone helps little for performance tuning, since UIView's opaque already defaults to true.

For a UIImageView, not only must the view be opaque, the image itself must also contain no alpha channel; that is why the nine images above show green, and it means the nature of the image itself can affect the result even when your code is fine. The text on screen is highlighted red because no opaque background color was set on the text labels. Also, when a UILabel's content is Chinese, the label's actual rendered area is larger than the label's size (there is a ring of shadow around it), so for Chinese labels we additionally need the following code:

        retweededTextLab?.layer.masksToBounds = true
        retweededTextLab?.backgroundColor = UIColor.groupTableViewBackground
        statusLab.layer.masksToBounds = true
        statusLab.backgroundColor = UIColor.white

Take a look at the rendering:


  • Color Hits Green and Misses Red

This option checks whether we are misusing, or correctly using, the layer's shouldRasterize property. Layers cached successfully are marked green; layers that fail to cache are marked red. Many view layers are expensive to render because of shadows, masks, gradients and the like, so UIKit provides an API to cache them: self.layer.shouldRasterize = true. The system caches such layers into a bitmap for rendering; if the cache is invalidated, the bitmap is discarded and regenerated. The advantage of rasterization is its small impact on refresh rate; the disadvantages are that the rasterized bitmap cache takes memory, and that scaling the layer requires extra computation on the rasterized bitmap. With this option on, a rasterized layer is marked red when its cache is invalid and green when it is valid. If the app under test frequently flashes red-marked layers, rasterization of those layers is not paying off. A layer with rasterization enabled shows red the first time it loads, which is normal because it has not been cached yet; but if in subsequent testing, for example scrolling a TableView back and forth, we still see large red areas, we need to pay attention.

  • Color Copied Images

This option mainly checks whether we are using an image format the GPU cannot work with directly. Since the display works in pixels, the system converts each picture to pixels before showing it: at one byte per channel, RGBA takes four bytes per pixel, so a 1920 × 1080 image occupies about 7.9 MB once decoded, even though a normal JPG or PNG file is much smaller, because the file is (reversibly) compressed. If an image is in a color format the GPU does not support, the system may take extra time converting it: such images are marked cyan and can only be processed by the CPU. Instead of simply pointing at the original image, the CPU is forced to generate a copy and send it to the render server; we do not want the CPU doing this in real time while scrolling, because it can block the main thread.

  • Color Immediately

Usually Core Animation updates the debug color of the layer at a rate of 10 times per second. This may be too slow for some effects. This option can be used to set it to update every frame (it may affect rendering performance, so don’t set it all the time).

  • Color Misaligned Images

This option highlights images that are scaled or stretched, or not properly aligned to pixel boundaries; that is, the image's size does not match the imageView's size, which forces a scaling pass that costs CPU. Make sure the image size matches the imageView when writing code. Shown below: an image of 170 × 220 px.

To see the image highlighted in yellow, change the imageView size:

        let imageView = UIImageView(frame: CGRect(x: 50, y: 100, width: 85, height: 110))
        imageView.image = UIImage(named: "cat")
        view.addSubview(imageView) 

Take a look at the rendering:
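Color Misaligned Images flags two things: scaled images and frames that don't land on pixel boundaries. For the latter, a frame is pixel-aligned when each of its edges, multiplied by the screen scale, is an integer. A minimal sketch of that check (isPixelAligned is a hypothetical helper, not an Instruments API):

```swift
import Foundation

// Returns true when every edge of the rect lands on a pixel boundary
// for the given screen scale (2.0 or 3.0 on Retina displays).
func isPixelAligned(_ rect: CGRect, scale: CGFloat) -> Bool {
    let edges = [rect.minX, rect.minY, rect.width, rect.height]
    return edges.allSatisfy { ($0 * scale).truncatingRemainder(dividingBy: 1) == 0 }
}

// The 85 × 110 pt frame above is pixel-aligned on a 2x screen,
// so its yellow highlight comes from the image scaling, not from
// a misaligned frame; a frame at x = 10.25 would also be flagged.
```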

  • Color Offscreen-Rendered Yellow: layers that need to be rendered off-screen are highlighted in yellow. Typical triggers of off-screen rendering are:

/* Rounded corners (with clipping) */
view.layer.masksToBounds = true
someView.clipsToBounds = true

/* Setting a shadow */
view.layer.shadow..

/* Rasterization */
view.layer.shouldRasterize = true

For rasterization, we need to specify the screen resolution

        // Off-screen rendering - asynchronous drawing, which consumes extra power
        self.layer.drawsAsynchronously = true
        
        // Rasterization - after asynchronous drawing, a single bitmap is generated;
        // when the cell scrolls on screen, it is essentially that one image scrolling.
        // For cell optimization, keep the number of layers as small as possible,
        // ideally a single layer.
        self.layer.shouldRasterize = true
        
        // To use rasterization, the resolution must be specified
        self.layer.rasterizationScale = UIScreen.main.scale

Specifies the shadow path to prevent off-screen rendering

        // Specify the shadow curve to prevent off-screen rendering due to shadow effects
        imageView.layer.shadowPath = UIBezierPath(rect: imageView.bounds).cgPath 

This line of code specifies the shadow path. If it is not specified manually, Core Animation computes it automatically, which triggers an off-screen rendering; specifying the path by hand avoids that computation and thus the off-screen pass. Setting cornerRadius by itself does not cause off-screen rendering, but it often has to be combined with layer.masksToBounds = true, and as summarized earlier, setting masksToBounds does result in off-screen rendering. The solution is to avoid rounded corners while scrolling as much as possible, and if you must have them, use rasterization to cache the rounded-corner result:

// Set rounded corners
label.layer.masksToBounds = true
label.layer.cornerRadius = 8
label.layer.shouldRasterize = true
label.layer.rasterizationScale = layer.contentsScale

If many controls in the interface need rounded corners, for example a tableView displaying more than 25 rounded-corner views at once, implemented with

view.layer.cornerRadius = 10
view.layer.masksToBounds = true

then FPS will drop sharply, especially if some of those controls also have shadow effects, which aggravates the stuttering and frame drops. Different controls call for different treatments: 1). For the label class, use Core Graphics to draw a rounded label. 2). For imageView, clip the image drawn with Core Graphics to form a rounded image, with the following code:

    // This method belongs in a UIImage extension, since it calls draw(in:) on self.
    extension UIImage {
        /// Create a rounded-corner image
        ///
        /// - parameter radius:    radius of the rounded corners
        /// - parameter size:      size of the output image
        /// - parameter backColor: background color, white by default
        /// - parameter lineWidth: width of the corner border line, 1 by default
        /// - parameter lineColor: border line color, darkGray by default
        ///
        /// - returns: the rounded image
        func yw_drawRectWithRoundCornor(radius: CGFloat, size: CGSize, backColor: UIColor = .white, lineWidth: CGFloat = 1, lineColor: UIColor = .darkGray) -> UIImage? {
            let rect = CGRect(origin: .zero, size: size)
            UIGraphicsBeginImageContextWithOptions(rect.size, true, 0)
            let bezier = UIBezierPath(roundedRect: rect, byRoundingCorners: .allCorners, cornerRadii: CGSize(width: radius, height: radius))
            
            backColor.setFill()
            UIRectFill(rect)
            
            bezier.addClip()
            draw(in: rect)
            
            bezier.lineWidth = lineWidth
            lineColor.setStroke()
            bezier.stroke()
            
            let result = UIGraphicsGetImageFromCurrentImageContext()
            UIGraphicsEndImageContext()
            return result
        }
    }
  • Color Compositing Fast-Path Blue

This option highlights in blue any layer that is drawn directly with OpenGL. If you are only using UIKit or Core Animation APIs, it will have no effect.

  • Flash Updated Regions: this option flashes redrawn content in yellow, that is, layers drawn in software with Core Graphics.

1.2.4.5 Leaks

Leaks is mainly used to check for memory leaks. Memory issues fall into two types in Allocations; here we study leaked memory. From the user's point of view a leak causes no immediate damage, and users do not feel its existence at all. The real hazard is that leaks accumulate and eventually exhaust the system's memory.



1.2.4.6 Time Profiler

Time Profiler is a tool that ships with Xcode. It periodically captures the stack of each thread and, by comparing stack states across time intervals, estimates the approximate elapsed time of each method. Its accuracy depends on the sampling interval.
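As a complement to Time Profiler's statistical sampling, a quick wall-clock measurement of a single code path is sometimes all you need. A minimal sketch using Foundation's Date (the measure helper is illustrative, not an Instruments API):

```swift
import Foundation

// Runs `work` once and returns its result together with the wall-clock
// time it took, in seconds. Good for quick spot checks; for statistical
// profiling across the whole app, prefer Time Profiler.
func measure<T>(_ work: () -> T) -> (result: T, seconds: TimeInterval) {
    let start = Date()
    let result = work()
    return (result, Date().timeIntervalSince(start))
}

let (sum, elapsed) = measure { (0..<1_000_000).reduce(0, +) }
// sum is 499999500000; elapsed varies from run to run
```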

Open it via Xcode → Open Developer Tool → Instruments → Time Profiler. Set the project's Debug Information Format to DWARF with dSYM File; otherwise you will only see a bunch of addresses that cannot be resolved to functions.

By default Time Profiler samples every 1 ms, collecting the call stacks of running threads and then summarizing them statistically. As a result, short-lived functions and sleeping threads are not counted. For example, method3 is not hit in any of the five samples shown in the figure below, so method3 does not appear in the aggregated stack.

To get a more accurate call stack, turn up the configuration in File -> Recording Options.

  • We can also use System Trace when the main thread is blocked by another thread, which is not immediately visible in Time Profiler. For example, we deliberately sleep for 10 ms in the dyld callback that fires after a dynamic library is linked:

static void add(const struct mach_header *header, intptr_t imp) {
    usleep(10000);
}

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
{
    dispatch_sync(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        _dyld_register_func_for_add_image(add);
    });
    ...
}

You can see that the whole recording took 7 s, but Time Profiler only shows 1.17 s, and there is a blank period right after launch. Check the status of each thread through System Trace.

Looking at thread 0x5d39c, we find that it performed multiple 10 ms sleep operations while the main thread was blocked.

In the future, when we want to see scheduling between threads more clearly, we can use System Trace. Even so, we still recommend starting with Time Profiler, as it is easier to understand and more efficient for troubleshooting.

1.2.4.7 App Launch

App Launch is a new instrument introduced in Xcode 11; it functions roughly as a combination of Time Profiler and System Trace.

1.2.4.8 Hook objc_msgSend

You can hook objc_msgSend to get the exact elapsed time of every Objective-C method, and then optimize or defer the functions that take a long time during startup. See my previous article on monitoring iOS method time via objc_msgSend.

1.2.5 Analysis of lag-related interview questions

  • Have you ever run into a laggy tableView? What are the possible causes?

A tableView may lag because:

  1. The most common cause is failing to reuse cells. If no reuse identifier is registered, a new cell is created every time a cell appears on screen; with a lot of data, cells pile up and memory climbs. With reuse, the cell is given an identifier, and whenever a cell needs to be displayed the reuse pool is searched for a recyclable cell first; a new cell is created only if none is found.
  2. Avoid re-laying out cells. Layout and content-filling operations are time-consuming, so do the layout once at creation, for example by putting the subviews in a custom cell class and laying them out at initialization.
  3. Compute and cache cell heights and content ahead of time. Without an estimated height, the tableView has to calculate the exact height of every row from its content before it can create the visible cells, and recalculates each time a cell enters the visible area while scrolling. Providing an estimated height up front lets the tableView create the visible cells first and call the exact height calculation only for rows that actually appear, instead of wasting time measuring cells that are never displayed.
  4. Reduce the number of controls in the cell and keep cell layouts roughly uniform; cells with different styles can use different reuse identifiers. Add controls at initialization and simply hide the ones a given row does not use.
  5. Don't use clearColor or leave the background color unset, and don't rely on transparency (alpha below 1); blending translucent layers makes rendering take a long time.
  6. Use local updates: if only one section changed, call reloadSections to update just that part.
  7. Load network data and download images asynchronously, and cache the results.
  8. Use addSubview sparingly for adding views to a cell dynamically.
  9. Load cells on demand: when scrolling fast, load only the cells in the target range.
  10. Don't implement delegate methods you don't need; the tableView only has to adhere to its two protocols (UITableViewDataSource and UITableViewDelegate).
  11. Cache row heights. estimatedHeightForRow cannot coexist with a layoutIfNeeded call inside heightForRow; having both produces a "jumping" bug. So the advice is: whenever the row height is fixed, implement the estimated row height to reduce the number of height calls and improve performance; for dynamic row heights, skip the estimation method and use a row-height cache dictionary instead to reduce the number of calls.
  12. Don't do extra drawing work. In the implementation of drawRect:, the rect parameter is the region that needs drawing, and nothing outside it has to be drawn. Use CGRectIntersectsRect, CGRectIntersection or CGRectContainsRect to decide whether the image and text need drawing before calling the drawing methods.
  13. Pre-render images. There is still a brief pause when a new image first appears on screen. The solution is to draw it into a bitmap context first, obtain a UIImage from that context, and then draw that image to the screen.
  14. Use the right data structures to store your data.
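Item 11's row-height cache can be as simple as a dictionary keyed by IndexPath. A minimal sketch (RowHeightCache is a name made up for illustration):

```swift
import Foundation

// Caches computed row heights so heightForRowAt only does the expensive
// text/layout measurement once per row.
final class RowHeightCache {
    private var heights: [IndexPath: CGFloat] = [:]

    // Returns the cached height, computing and storing it on first request.
    func height(for indexPath: IndexPath, compute: () -> CGFloat) -> CGFloat {
        if let cached = heights[indexPath] { return cached }
        let computed = compute()
        heights[indexPath] = computed
        return computed
    }

    // Invalidate when the underlying data changes.
    func invalidate(at indexPath: IndexPath) { heights[indexPath] = nil }
    func invalidateAll() { heights.removeAll() }
}
```

In tableView(_:heightForRowAt:) you would call cache.height(for: indexPath) { /* expensive measurement */ } and invalidate the entry whenever that row's model changes.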

1.3 Optimization of power consumption

1.4 Network Optimization

1.5 Crash problem optimization

1.6 Memory Optimization

1.7 Installation package size optimization (slimming)

Reference: juejin.cn/post/684490…

www.jianshu.com/p/72dd07472…

juejin.cn/post/684490…

juejin.cn/post/684490…

  • Advanced Graphics and Animations for iOS Apps (Session 419)
  • Use ASDK performance tuning: improve iOS interface rendering capability
  • Designing for iOS
  • Summary of iOS off-screen rendering
  • An in-depth understanding of off-screen rendering on mobile
  • iOS fluency performance optimization: CPU, GPU, off-screen rendering
  • Off-screen rendering optimization details: demo + performance test

iOS performance tuning is an essential skill for any qualified iOS programmer

IOS Official Documentation

This topic covers a lot of ground, so some of the subdivided content below overlaps.

  • The Performance project
  • Core Animation Programming Guide
IOS Performance optimization books
  • Pro iOS Apps Performance Optimization
  • High Performance iOS Apps
  • iOS-Core-Animation-Advanced-Techniques
Instruments tools related
  • Instruments User Guide Chinese Translation -PDF
  • Leaks learning
  • Instruments Allocations of learning
  • Summary of Instrument’s Time Profiler
  • Core Animation learning for Instruments Learning
  • Getting started with The Activity Monitor for Instruments
GMTC-2018 PPT
  • How to Optimize the performance of LinkedIn mobile Apps
  • Meituan client monitoring and troubleshooting practice
  • Iqiyi APP ultimate experience road
  • Best practices for front-end monitoring in the Big front-end era
  • Performance optimization and monitoring PPT
Performance optimization
  • WWDC2012-235-iOS APP Performance:Responsiveness
  • Wechat reading iOS performance optimization
  • Wechat reading iOS quality assurance and performance monitoring
  • An in-depth look at iOS performance optimization
  • Shen Zhe, Vice president of Research and development of Magic Window: the optimization of mobile SDK
  • Sogou input method iOS version development and optimization practicePPT
  • Stability and performance practice of Mogujie AppPPT
  • Mobile iOS performance optimization exploration
  • IOS App stability index and monitoring
Memory optimization
  • Memory Usage Performance Guidelines
  • WWDC-2018-416Chinese translation
  • Explore iOS memory allocation
  • IOS wechat memory monitoring
  • Memory management and optimization (1)-QQ browser
  • Memory management and optimization (bottom)-QQ browser
  • OOM: XNU memory status management
Lag optimization
  • UIKit performance tuning practical demonstration
  • QQ space frame rate optimization combat
  • Realize the 60FPS netease cloud music home page
  • IOS tips for keeping the interface smooth
  • Summary of iOS UI Performance optimization
  • WeChat iOS lag monitoring system
  • iOS lag detection
  • IOS monitoring: Lag detection
  • IOS app UI thread congestion monitoring
Power optimization
  • Guide – Energy Efficiency Guide for iOS Apps
  • WWDC2017 – Writing Energy Efficient Apps
  • Research on common iOS power consumption detection schemes
  • How to develop a Power-saving iOS App
  • Analysis of the characteristics of mobile cellular network and its energy-saving scheme
  • IOS power testing practice
  • IOS Advanced -App power optimization is enough
Start the optimization
  • WWDC2016-406-Optimizing App Startup Time
  • WWDC2017-413-App Startup Time:Past,Present,and Future
  • How to accurately measure the startup time of iOSAPP
  • Optimize App startup time – Yang Xiaoyu
  • IOS client startup speed optimization – Today’s Headlines
  • IOS App startup Performance Optimization -WiFi Butler
  • How to optimize startup time for iOS Apps -Facebook
  • IOS startup speed optimization – Baidu input method
  • An immediate startup time optimization
  • Obj Chinese-Mach-o executable file
  • IOS APP startup speed research practice
  • IOS App cold launch governance: From the practice of Meituan Waimai
Size optimization
  • IOS wechat installation package slimming
  • Toutiao IPA installation package optimization
  • IOS slim removes useless Mach-O files from the FrameWork
  • An iOS package size slimming scheme based on Clang plug-in
  • IOS executable file thin method
  • IOS image optimization solution
  • Didi goes out on the iOS end of the slimming practice in Slides
Network optimization
  • Mobile network optimization practice of Meituan-Dianping
  • Open source HttpDNS solution in detail
  • Ctrip App network performance optimization practice
  • Ctrip App network service channel governance and performance optimization practice in 2016
  • Mogujie App Chromium Network stack practice
  • Mogujie high concurrency multi terminal wireless Gateway practice
  • Overview of mobile APP Network optimization
Compiler optimization
  • Optimizing-Swift-Build-Times
APM
  • Analysis of the technical principle of mobile terminal monitoring system
  • Share netease – NeteaseAPM iOS SDK technology implementation
  • Netease Lede – iOS unburied data SDK practice road
  • Listening cloud – mobile APM product development skills
  • Listen to the Cloud – Mobile App performance monitoring
  • IOS Performance Monitoring SDK — Wedjat development process investigation and collation
  • Reveal the core technology of APM iOS SDK
  • iOS-Monitor-Resources
  • IOS traffic monitoring analysis
  • Xcode Reverse: App memory monitoring principle
  • Apmcon-2016 Lecture Transcript
  • 360 Mobile Terminal Performance Monitoring Practices QDAS-APM (iOS)
  • Mobile terminal performance monitoring scheme Hertz

Debugging & Crash

  • IOS project development process used in the advanced debugging skills, involving tripartite library dynamic debugging, static analysis and decompression and other fields
  • Understanding and Analyzing Application Crash Reports

Related open source libraries

network
  • HTTPDNSLib-for-iOS
  • HTTPDNSLib-for-Andorod
  • NetworkEye
memory
  • FBMemoryProfiler
  • iOS Memory Budget Test
Lag
  • PerformanceMonitor-Runloop
  • GYMonitor-FPS
Slimming
  • LSUnusedResources
  • LinkMap
APM
  • iOS-System-Services
  • System Monitor
  • PerformanceTestingHelper
  • GT
  • GodEye
  • ArgusAPM
  • AppleTrace
  • matrix
  • MTHawkeye