As an App iterates, its business modules multiply, its logic grows more complex, and it integrates more third-party libraries, so it launches more and more slowly. We therefore want to keep launch fast and give users a good experience while the business keeps expanding.

I. Terms and concepts

To understand the App cold start process accurately, we first need to grasp some basic concepts.

1.1. Mach-O

Mach-O (Mach Object file format) is a file format for executables, object code, shared libraries, dynamically loaded code, and core dumps. The binary executable produced by compiling an App is in Mach-O format: every class in an iOS project is compiled into a corresponding .o object file, and the executable is these object files linked together.

Enter the following command in the Xcode debugger (LLDB) console to print all the Mach-O files loaded into the application at run time.

image list -o -f

The Mach-O file consists of three main parts:

  • Mach header: describes the CPU architecture, the file type, and the number and total size of the load commands.
  • Load commands: describe how the data in the file is organized. Different kinds of data use different load commands.
  • Data: holds the contents of each segment. A segment contains zero or more sections, which store the actual code and data. The three main segments are:

    • __TEXT: contains the Mach header, executable code, and read-only constants such as C strings. Read-only and executable (r-x).
    • __DATA: contains global variables, static variables, and so on. Readable and writable (rw-).
    • __LINKEDIT: contains metadata for the loader, such as function names and addresses. Read-only (r--).
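
The three-part layout above (header, load commands, segments) can be sketched in C. This is only an illustration, not dyld's code: the structs are hand-written mirrors of `mach_header_64`, `load_command`, and `segment_command_64` from `<mach-o/loader.h>` so that the sketch compiles without Apple headers, and `segment_protection` and `prot_string` are hypothetical helper names. It assumes a 64-bit, native-endian Mach-O image already read into a buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the 64-bit structures in <mach-o/loader.h>. */
#define MACHO_MAGIC_64      0xfeedfacfu /* MH_MAGIC_64 */
#define MACHO_LC_SEGMENT_64 0x19u       /* LC_SEGMENT_64 */

struct macho_header_64 {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;      /* number of load commands */
    uint32_t sizeofcmds; /* total size of the load commands */
    uint32_t flags;
    uint32_t reserved;
};

struct macho_load_command {
    uint32_t cmd;
    uint32_t cmdsize;
};

struct macho_segment_64 {
    uint32_t cmd;
    uint32_t cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    int32_t  maxprot, initprot;
    uint32_t nsects, flags;
};

/* Walks the load commands that follow the header and returns the initial
 * protection bits of the segment called `name`, or -1 if the buffer is not
 * a 64-bit Mach-O image or has no such segment. */
int segment_protection(const uint8_t *buf, size_t len, const char *name)
{
    struct macho_header_64 hdr;
    if (len < sizeof hdr) return -1;
    memcpy(&hdr, buf, sizeof hdr);
    if (hdr.magic != MACHO_MAGIC_64) return -1;

    size_t off = sizeof hdr;
    for (uint32_t i = 0; i < hdr.ncmds; i++) {
        struct macho_load_command lc;
        if (len - off < sizeof lc) return -1;
        memcpy(&lc, buf + off, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > len - off) return -1;
        if (lc.cmd == MACHO_LC_SEGMENT_64 &&
            lc.cmdsize >= sizeof(struct macho_segment_64)) {
            struct macho_segment_64 seg;
            memcpy(&seg, buf + off, sizeof seg);
            if (strncmp(seg.segname, name, sizeof seg.segname) == 0)
                return seg.initprot;
        }
        off += lc.cmdsize; /* load commands are laid out back to back */
    }
    return -1;
}

/* Formats protection bits (read = 1, write = 2, execute = 4) as "r-x"-style text. */
void prot_string(int prot, char out[4])
{
    out[0] = (prot & 1) ? 'r' : '-';
    out[1] = (prot & 2) ? 'w' : '-';
    out[2] = (prot & 4) ? 'x' : '-';
    out[3] = '\0';
}
```

Passing the protection bits of a read-plus-execute segment (value 5) to `prot_string` yields the `r-x` notation used in the list above.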

1.2. Dylib

A dylib is also a Mach-O file. Files with the .dylib suffix are dynamic libraries (also known as dynamic link libraries). Dynamic libraries are loaded at run time and can be shared by multiple App processes.

If you want to know all the dynamic libraries that TestDemo depends on, you can do this with the following command:

otool -L /TestDemo.app/TestDemo

Dynamic libraries are divided into system dylibs and embedded dylibs (dynamic libraries that developers add manually). System dylibs include:

  • All the system frameworks used in iOS, such as UIKit and Foundation;
  • The system-level libSystem libraries, such as libdispatch (GCD) and libsystem_blocks (Block);
  • libobjc, which provides the Objective-C runtime.

1.2.1. Dyld

dyld (the dynamic link editor) is the dynamic linker. It is itself a Mach-O file, dedicated to loading dylibs. dyld is located at /usr/lib/dyld and can be found on Macs and jailbroken devices. It loads the App's executable and the dynamic libraries the App depends on into memory and runs them.

1.2.2. Dyld shared cache

The dyld shared cache is a shared cache of dynamic libraries. When there are many dynamic libraries to load and many symbols that depend on one another, the dynamic linker on OS X and iOS uses the shared cache to save the time spent resolving and processing symbols. On OS X the shared cache is located in /private/var/db/dyld/, and on iOS in /System/Library/Caches/com.apple.dyld/.

When loading a Mach-O file, dyld first checks whether it is already in the shared cache and, if so, uses the cached copy. Every process maps the shared cache into its own address space. This greatly reduces program launch time on OS X and iOS.

1.2.4. Images

An "image" here is not a picture but a binary image. Every App is loaded as a set of images. The image types include:

  • Executable: the App's binary executable file;
  • Dylib: a dynamic library;
  • Bundle: a resource file; it is a special kind of dylib that cannot be linked against and can only be loaded at run time with dlopen().
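
As a small illustration of how these image types are told apart, the `filetype` field of the Mach header carries a numeric code. The sketch below copies the three relevant values from `<mach-o/loader.h>` into a local enum; the enum names and the `image_kind` function are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Values of the Mach header `filetype` field, as defined in <mach-o/loader.h>. */
enum {
    FILETYPE_EXECUTE = 0x2, /* MH_EXECUTE */
    FILETYPE_DYLIB   = 0x6, /* MH_DYLIB   */
    FILETYPE_BUNDLE  = 0x8  /* MH_BUNDLE  */
};

/* Maps a filetype code to the image type names used in the list above. */
const char *image_kind(uint32_t filetype)
{
    switch (filetype) {
    case FILETYPE_EXECUTE: return "executable";
    case FILETYPE_DYLIB:   return "dylib";
    case FILETYPE_BUNDLE:  return "bundle";
    default:               return "other";
    }
}
```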

1.2.5. Framework

A framework can be a dynamic or a static library. It is a folder that contains the dylib, bundles, and header files.

II. Cold start (native home page)

When the user presses the Home button, an iOS App is not killed immediately. It stays alive for a while, and if the user reopens it during that time, the App essentially returns to the state it was in before going to the background, with no extra work. We call this kind of launch, in which the App process still exists and no new process has to be created, a hot start.

A cold start is a launch in which the App has no process in the system, for example when the user taps the App after the device has restarted, after the App process was killed manually, or after the App has not been opened for a long time. In this case a new process has to be created for the App. We can think of a cold start as the complete App launch process, and this article discusses how to optimize it.

1. Cold start:

1.1 Origin of cold start

App launch optimization first appeared as a topic at WWDC 2016, which mentioned:

  • The ideal App launch time is within 400 ms, because the animation from tapping the App icon to the launch screen appearing and then disappearing takes 400 ms.
  • The App launch must not take longer than 20 seconds, otherwise the system kills the process. (Launch time should be measured on the lowest-end device the App supports.)

1.1.1 Two views of cold start:

View one:

Cold start is the whole process from the user launching the App until the AppDelegate's didFinishLaunchingWithOptions method finishes executing. Taking the moment main() starts executing as the dividing point, it has two stages: pre-main and main().

View two:

Another view is that the cold start phase ends when the viewDidAppear method of the main UI's view controller finishes. Both definitions are acceptable: the former treats cold start as finished once the App has launched and initialized, the latter once the user-visible view is up, that is, once the first screen has loaded.

Note: many articles describe the second stage as "after the main function", which I think is inaccurate and misleading. Keep in mind that main() never returns while the App is running: both the AppDelegate's didFinishLaunchingWithOptions method and a ViewController's viewDidAppear method execute inside main().

1.2. The pre-main stage

The pre-main stage is the process from the user launching the App until main() starts executing.

1.2.1 Viewing the phase time (Xcode 13 is the watershed)

1.2.1.1. Before Xcode 13

We can configure an environment variable in Xcode:

Product -> Edit Scheme -> Run -> Arguments -> Environment Variables -> +

Set DYLD_PRINT_STATISTICS to 1.

Then, when the demo runs on iOS 10 or later, the time spent in the pre-main phase is printed in the console. (Note: my Xcode has been upgraded to 13.3, so this log is no longer printed.)

For more detailed information, set DYLD_PRINT_STATISTICS_DETAILS to 1.

1.2.1.2. From Xcode 13 on, the method above no longer works

The following approach can be used instead. The code is posted below.

AppLaunchTime.h:

#import <Foundation/Foundation.h>

NS_ASSUME_NONNULL_BEGIN

@interface AppLaunchTime : NSObject

+ (void)mark;

@end

NS_ASSUME_NONNULL_END

AppLaunchTime.m:

#import "AppLaunchTime.h"
#import <sys/sysctl.h>
#import <mach/mach.h>

@implementation AppLaunchTime

double __t1; // process creation time, in milliseconds since 1970
double __t2; // time just before main(), in seconds since 1970
double __t3; // time after didFinishLaunching has run, in seconds since 1970

/// Returns the process creation time in milliseconds.
+ (double)processStartTime {
    if (__t1 == 0) {
        struct kinfo_proc procInfo;
        int pid = [[NSProcessInfo processInfo] processIdentifier];
        int cmd[4] = {CTL_KERN, KERN_PROC, KERN_PROC_PID, pid};
        size_t size = sizeof(procInfo);
        if (sysctl(cmd, sizeof(cmd) / sizeof(*cmd), &procInfo, &size, NULL, 0) == 0) {
            __t1 = procInfo.kp_proc.p_un.__p_starttime.tv_sec * 1000.0
                 + procInfo.kp_proc.p_un.__p_starttime.tv_usec / 1000.0;
        }
    }
    return __t1;
}

/// Start recording: call this in didFinishLaunchingWithOptions.
+ (void)mark {
    double t1 = [AppLaunchTime processStartTime];
    // Dispatch to the main queue so this runs after the didFinishLaunching code has finished.
    dispatch_async(dispatch_get_main_queue(), ^{
        if (__t3 == 0) {
            __t3 = CFAbsoluteTimeGetCurrent() + kCFAbsoluteTimeIntervalSince1970;
        }
        // t1 is in milliseconds; convert it to seconds before subtracting.
        double pret = __t2 - t1 / 1000;
        double didfinish = __t3 - __t2;
        double total = __t3 - t1 / 1000;
        NSLog(@"---------- App start ---------- time: pre-main: %f", pret);
        NSLog(@"---------- App start ---------- time: didfinish: %f", didfinish);
        NSLog(@"---------- App start ---------- time: total: %f", total);
    });
}

// A constructor function runs before main(), so it is a relatively easy way to
// get the end time of the pre-main stage. Using the moment the
// __attribute__((constructor)) function is called as that end time (__t2)
// keeps this code decoupled from the rest of the App.
static void __attribute__((constructor)) before_main(void) {
    if (__t2 == 0) {
        __t2 = CFAbsoluteTimeGetCurrent() + kCFAbsoluteTimeIntervalSince1970;
    }
}

@end

Call it from the App delegate to print the timings:

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
  [AppLaunchTime mark];
  return YES;
}

Log output:

Cold start optimization[3454:116391] ---------- App start ---------- time: pre-main: 0.718716
Cold start optimization[3454:116391] ---------- App start ---------- time: didfinish: 0.028895
Cold start optimization[3454:116391] ---------- App start ---------- time: total: 0.747611
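
The arithmetic behind the three logged values can be isolated in plain C. This is a minimal sketch with hypothetical type and function names (`launch_marks`, `launch_phases`, `compute_phases`). The one detail it highlights is that the process start time comes back from sysctl in milliseconds, while the other two marks are in seconds, so the first must be converted before subtracting. In the log above, 0.718716 + 0.028895 = 0.747611, which is the total.

```c
/* The three timestamps captured during launch, all measured since 1970. */
struct launch_marks {
    double process_start_ms; /* t1: process creation, in milliseconds */
    double before_main_s;    /* t2: constructor just before main(), in seconds */
    double did_finish_s;     /* t3: after didFinishLaunching, in seconds */
};

/* Durations of each phase, in seconds. */
struct launch_phases {
    double pre_main_s;
    double did_finish_s;
    double total_s;
};

struct launch_phases compute_phases(struct launch_marks m)
{
    double start_s = m.process_start_ms / 1000.0; /* normalize t1 to seconds */
    struct launch_phases p;
    p.pre_main_s   = m.before_main_s - start_s;
    p.did_finish_s = m.did_finish_s - m.before_main_s;
    p.total_s      = m.did_finish_s - start_s;
    return p;
}
```

By construction, `pre_main_s + did_finish_s` equals `total_s` up to floating-point rounding.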

1.2.2. Startup process analysis and optimization

1.2.2.1. Analysis of the overall startup process

When an App is launched, the system creates a new process with fork(), then calls exec() to replace the process image with the App's executable, and then performs the following steps:

  1. Load the executable into memory; the path of dyld can be read from the executable.
  2. Load dyld into memory;
  3. Starting from the executable's dependencies, dyld recursively loads every dynamic library (dylib) it depends on and performs the corresponding initialization.

Combined with the pre-main output described above, the whole startup process can be roughly understood as the following sequence of steps, each covered in a section below: load dylibs, rebase/binding, ObjC setup, and initializers.

1.2.2.2. Load dylibs

This step loads the dynamic libraries. At this stage, dyld will:

  1. Analyze all the dylibs the App depends on;
  2. Find the Mach-O file for each dylib;
  3. Open, read, and validate these Mach-O files;
  4. Register the code signatures with the kernel;
  5. Call mmap() on every segment of each dylib.

Typically, an iOS App needs to load 100 to 400 dylibs, ranging from system libraries to ones added manually by developers. Most of them are system libraries that have already been optimized, so developers should pay more attention to the embedded dylibs they integrate themselves, which are comparatively expensive to load.

The fewer dylibs your App depends on, the better. Apple's official advice is to keep the number of embedded dylibs in an App to about six.

Optimization scheme:

  • Avoid embedded dylibs where possible;
  • Merge existing embedded dylibs;
  • Check each framework's optional/required setting: if the framework exists on every iOS version the App currently supports, set it to required, because optional incurs additional checks;
  • Use static libraries instead. (Static libraries, however, are linked into the executable at build time, which increases the size of the executable. Each approach has its own trade-offs for developers to weigh.)
  • Load dylibs lazily. (However, using dlopen() hurts performance: the App initially runs single-threaded during launch, so the system can skip locking, but dlopen() turns on multithreading and the system then has to lock. This not only degrades performance but may also cause deadlocks and other unexpected consequences, so it is not recommended.)

1.2.2.3. Rebase/Binding

This step performs pointer fix-ups.

For security, the system applies ASLR (Address Space Layout Randomization) and code signing when loading dylibs. Because of ASLR, an image is loaded at a new actual_address that differs from its preferred_address by an offset called the slide (slide = actual_address - preferred_address), so dyld has to correct for this offset to make pointers refer to the right addresses. It does so in two steps:

Step 1: Rebase. Fix up pointers that point inside the image. The image is read into memory and decrypted and verified page by page to ensure that it has not been tampered with. The cost is mainly I/O.

Step 2: Binding. Fix up pointers that point outside the image: look up the symbol table and set the pointers to the external addresses. The cost is mainly CPU computation.
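
The two fix-up steps can be illustrated with a toy model in C. This is a simplified sketch, not dyld's implementation: real dyld reads compressed rebase and bind opcodes from the Mach-O file, whereas here the pointer slots and the symbol table are plain arrays, and all names (`compute_slide`, `rebase_pointers`, `exported_symbol`, `bind_symbol`) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* slide = actual_address - preferred_address */
uintptr_t compute_slide(uintptr_t actual_address, uintptr_t preferred_address)
{
    return actual_address - preferred_address;
}

/* Rebase: each slot holds an address inside the image that was computed for
 * the preferred base, so adding the slide makes it valid at the actual base. */
void rebase_pointers(uintptr_t *slots, size_t count, uintptr_t slide)
{
    for (size_t i = 0; i < count; i++)
        slots[i] += slide;
}

/* One entry of a (toy) table of symbols exported by other images. */
struct exported_symbol {
    const char *name;
    uintptr_t   address;
};

/* Bind: resolve a pointer to something outside the image by looking the
 * symbol up by name. Returns 0 if the symbol is not found. The string
 * comparisons are why binding is CPU-bound rather than I/O-bound. */
uintptr_t bind_symbol(const struct exported_symbol *table, size_t count,
                      const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].address;
    return 0;
}
```

Both loops run once per pointer, which is why the cost of this phase grows with the number of pointers that need fixing.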

You can run the following command to view the rebase and bind information:

xcrun dyldinfo -rebase -bind -lazy_bind TestDemo.app/TestDemo

The LC_DYLD_INFO_ONLY load command records the offsets and sizes of this information. For a convenient, visual view, the MachOView tool is recommended.

The fewer pointers there are, the less time the fix-ups take. So the key to optimizing this phase is to reduce the number of pointers in the __DATA segment.

Optimization scheme:

  • Reduce the number of ObjC classes, methods, and categories: merge related functionality and remove unused classes, methods, and categories (AppCode's Inspect Code feature can help with slimming);
  • Reduce C++ virtual functions. Virtual functions create vtables, which also produce structures in the __DATA segment;
  • Use Swift structs more. (Swift structs are statically dispatched and their layout is optimized internally, so they produce fewer symbols.)

1.2.2.4. ObjC Setup

After rebasing and binding are complete, the ObjC runtime is told to do the setup that the code needs at run time:

  • dyld registers all declared ObjC classes;
  • Category methods are inserted into the method lists of their classes;
  • Each selector is checked for uniqueness.

Optimization scheme:

Optimizing the Rebase/Binding phase also reduces the time this step takes.

1.2.2.5. Initializers

Rebase and binding are fix-ups that modify the contents of the __DATA segment. Dynamic work begins here, writing to the heap and the stack. Specifically:

  • Call the +load method of every ObjC class and category;
  • Call C/C++ constructor functions (functions marked with __attribute__((constructor)));
  • Construct C++ static global variables of non-primitive types.

Optimization scheme:

  • Avoid doing work in a class's +load method; defer it to +initialize where possible. (One runtime method replacement inside a +load method takes roughly 4 ms.)
  • Avoid explicitly marking functions as initializers with __attribute__((constructor)); instead, run the initialization when it is first needed, for example with dispatch_once(), pthread_once(), or std::call_once(). This amounts to initializing on first use, which defers part of the work.
  • Reduce the number of C++ static global variables of non-primitive types. (Such globals are usually classes or structs, and heavy work in their constructors slows the launch.)
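
The difference between an eager constructor and initialization on first use can be sketched in portable C with pthread_once(). This is an illustration only: the counters stand in for real setup work, and every function name here is hypothetical. It assumes a GCC- or Clang-compatible compiler for `__attribute__((constructor))`.

```c
#include <pthread.h>

static int eager_runs = 0; /* how many times the constructor has run */
static int lazy_runs  = 0; /* how many times the lazy setup has run */

/* Eager: runs during the Initializers step, before main() starts,
 * so its cost is always added to the pre-main time. */
static void __attribute__((constructor)) eager_setup(void)
{
    eager_runs++;
}

static pthread_once_t lazy_once = PTHREAD_ONCE_INIT;

/* Stand-in for expensive initialization work. */
static void lazy_setup(void)
{
    lazy_runs++;
}

/* Lazy: the first caller pays for the setup, later callers do not,
 * and nothing is paid at launch if nobody calls it. */
void ensure_lazy_setup(void)
{
    pthread_once(&lazy_once, lazy_setup);
}

int eager_run_count(void) { return eager_runs; }
int lazy_run_count(void)  { return lazy_runs; }
```

By the time main() starts, the eager setup has already run once while the lazy setup has not run at all; it runs exactly once, on the first call to `ensure_lazy_setup`, no matter how many times or from how many threads it is called afterwards.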

To summarize the possible optimizations for the pre-main phase:

  • Reorganize the architecture to reduce the number of unnecessary embedded dynamic libraries;
  • Slim down the code: merge or remove unused ObjC classes, categories, methods, C++ static global variables, and so on;
  • Defer tasks that do not have to run in +load to +initialize;
  • Reduce C++ virtual functions.

1.3. The main() phase

For the main() phase, what we mainly measure is the time from when main() starts executing until didFinishLaunchingWithOptions finishes.

1.3.1. Viewing the Phase Time

Here are two ways to check the time spent in the main() phase.

Method 1: Manually insert the code to calculate the time consumption.

Step 1: Record the current time with the variable MainStartTime in main()

#import <UIKit/UIKit.h>
#import "AppDelegate.h"

CFAbsoluteTime MainStartTime;

int main(int argc, char * argv[])
{
  NSString * appDelegateClassName;
  MainStartTime = CFAbsoluteTimeGetCurrent();

  @autoreleasepool {
    appDelegateClassName = NSStringFromClass([AppDelegate class]);
  }
  return UIApplicationMain(argc, argv, nil, appDelegateClassName);
}

Step 2: Declare the global variable with extern in the AppDelegate.m file.

Step 3: Just before didFinishLaunchingWithOptions returns, get the current time again. The difference from MainStartTime is the time spent in the main() phase.

#import "AppDelegate.h"
#import "AppLaunchTime.h"

extern CFAbsoluteTime MainStartTime;

@interface AppDelegate ()
@end

@implementation AppDelegate

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
  [AppLaunchTime mark];
  double mainLaunchTime = (CFAbsoluteTimeGetCurrent() - MainStartTime);
  NSLog(@"main() time: %.2fms", mainLaunchTime * 1000);
  return YES;
}

@end

// Log:
// Cold start optimization[3616:126842] main() time: 21.92ms

Method 2: Check the elapsed time with the Time Profiler tool in Instruments.

Open Xcode → Open Developer Tool → Instruments → Time Profiler.

Steps:

  1. Configure the scheme. Click Edit Scheme, find Build Configuration under Profile, and set it to Debug.
  2. Configure the PROJECT. Click PROJECT, find Debug Information Format under Build Options in Build Settings, and change the value for Debug to DWARF with dSYM File.
  3. Start the Time Profiler and click the red record button in the upper-left corner to begin profiling. You can then see the full call paths of the code and the time each one takes.

On the Call Tree, select Separate by Thread and Hide System Libraries to see the execution time of the App's own code and where each code path is located.

1.3.2. Launch optimization

After main() is invoked, the App performs the necessary initialization in the didFinishLaunchingWithOptions phase, and the home page content is loaded and displayed by the time viewDidAppear finishes.

For App initialization, apart from things such as analytics and logging that must be configured at launch, other configuration can be loaded lazily. If didFinishLaunchingWithOptions also involves loading the first screen, consider optimizing from the following angles:

  • Build the home page view in pure code rather than with a xib/Storyboard;
  • Defer loading second-party and third-party libraries that are not needed yet;
  • Defer some business logic and UI configuration;
  • Lazily load some of the views;
  • Avoid heavy local or network data reads while the first screen is loading;
  • Remove NSLog output from release builds;
  • Reduce the size of images on the page as far as is visually acceptable.

III. Optimization when the home page is an H5 page

Sonic is a lightweight, high-performance Hybrid framework developed by a Tencent team that focuses on improving first-screen loading speed.

  • Client-side time

    • WebView preloading: preload a WebView during App launch. Creating an empty WebView starts the web thread in advance and completes some global initialization, which makes later WebView creation hundreds of milliseconds faster.
  • Page time (static pages)

    • Static direct output: the server fetches the data and renders it with Node.js to generate an HTML file that already contains the first-screen data, then publishes it to the CDN. The WebView fetches it directly from the CDN.
    • Offline pre-push: use offline packages.
  • Page time (frequently updated dynamic pages)

    • Parallel loading: open the WebView and request resources in parallel;
    • Dynamic cache: cache dynamic pages on the client. The user sees the cached page first, and the page is then refreshed;
    • Static/dynamic separation: split the page into a static template and dynamic data, and apply different refresh strategies in different launch scenarios;
    • Preloading: fetch the required incremental update data ahead of time.

IV. Summary

Cold start is a complicated process, and there is no single fixed way to optimize it. We need to apply these techniques flexibly according to the business, with the help of performance analysis tools and online monitoring logs.