Contents of this chapter

  1. A brief overview of app startup
  2. Binary rearrangement (startup time optimization)
  3. What dyld is, and the difference between static and dynamic libraries
  4. The dyld process for linking dynamic and static libraries

App launch

As we all know, app startup is divided into two stages, pre-main and post-main, and startup optimization targets these two stages. Optimization after main comes down to two things: 1. remove unnecessary tasks; 2. defer tasks that are necessary but not urgent, such as setting up view controller interfaces. Optimizing the time after main is relatively easy and is not covered in this article.

App launch -> load libSystem -> the runtime registers its callback functions -> load images -> execute map_images and load_images -> call main.

To check the pre-main time, add the environment variable DYLD_PRINT_STATISTICS (commonly set to 1) under Edit Scheme -> Run -> Arguments -> Environment Variables. The pre-main timing is then printed to the console.

Page faults

It is important to understand page faults, because they are part of how a program is loaded and run, and they explain why startup optimization is needed.

We should be clear that a program can only run once it is in physical memory, so running a program ultimately means loading it into physical memory. Loading is the process of mapping virtual memory to physical memory. It is done lazily, through cooperation between the operating system and the CPU (address translation). Because the mapping is lazy, only what is actually used gets loaded, one page at a time. A page is 16 KB on iOS and 4 KB on Intel-based macOS. When the program touches a page that is not yet in physical memory, a page fault is raised and the missing page is loaded into physical memory. A single page fault is very short, typically well under a millisecond, though the exact cost depends on the device and on whether the page has to be read from storage.
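To make the page arithmetic concrete, here is a minimal sketch that maps addresses to page indices and counts how many distinct pages a set of addresses touches. The function names and the sample addresses are hypothetical, chosen only for illustration; the page sizes are the ones quoted in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Page sizes quoted in the text.
constexpr std::uint64_t kPageSize16K = 16 * 1024;  // iOS
constexpr std::uint64_t kPageSize4K  = 4 * 1024;   // macOS, per the text

// Index of the page that contains a given virtual address.
std::uint64_t pageIndex(std::uint64_t address, std::uint64_t pageSize) {
    assert(pageSize > 0);
    return address / pageSize;
}

// Number of distinct pages touched by a list of addresses. On a cold start,
// each distinct page costs at most one page fault the first time it is touched.
std::size_t distinctPages(const std::vector<std::uint64_t>& addresses,
                          std::uint64_t pageSize) {
    std::set<std::uint64_t> pages;
    for (std::uint64_t address : addresses) {
        pages.insert(pageIndex(address, pageSize));
    }
    return pages.size();
}
```

With 16 KB pages, the addresses 0x0 and 0x3FFF share page 0, while 0x4000 already falls on page 1. The larger the page, the more code fits behind a single fault.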

Pre-main (before main)

As you can imagine, the app does a lot of preparatory work before executing main: loading the libraries we need and their dependencies, loading classes into memory, adding methods to each class's method list, and so on. But what exactly is the process, and how is everything loaded? We do not need to be able to write it ourselves; we only need to understand it, because the details are of little practical use unless you are developing a compiler or an IDE.

Binary rearrangement

Binary rearrangement means rearranging the order of the functions in the binary. The ordering logic is to group the methods that must be called at startup together, instead of leaving them scattered across many pages, so that far fewer page faults occur during startup.
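The effect of grouping startup functions can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: functions are laid out back to back from offset 0, and the `Function` struct, the function names, and the sizes are hypothetical. A real linker layout is more involved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

struct Function {
    std::string name;
    std::uint64_t size;   // size of the compiled function in bytes
    bool usedAtStartup;   // true if the function runs during startup
};

// Count the pages that contain at least one byte of a startup function,
// assuming the functions are laid out back to back in the given order.
std::size_t startupPages(const std::vector<Function>& layout,
                         std::uint64_t pageSize) {
    assert(pageSize > 0);
    std::set<std::uint64_t> pages;
    std::uint64_t offset = 0;
    for (const Function& f : layout) {
        if (f.usedAtStartup && f.size > 0) {
            const std::uint64_t first = offset / pageSize;
            const std::uint64_t last = (offset + f.size - 1) / pageSize;
            for (std::uint64_t page = first; page <= last; ++page) {
                pages.insert(page);
            }
        }
        offset += f.size;
    }
    return pages.size();
}

// Move startup functions to the front while keeping their relative order.
std::vector<Function> moveStartupFunctionsFirst(std::vector<Function> layout) {
    std::stable_partition(layout.begin(), layout.end(),
                          [](const Function& f) { return f.usedAtStartup; });
    return layout;
}
```

For three 1 KB startup functions separated by 16 KB of cold code, the original layout touches three 16 KB pages, while the reordered layout fits all three into one.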

Approach

When the app starts, it has to load many pages, normally thousands. Although loading one page takes little time, loading so many at that moment adds up. We can look at the page-ins of the application in the Main Thread track of Instruments' System Trace. Apple uses binary rearrangement in its own code to reduce this time, and so does Douyin. The approach is to find all the functions that run at startup, put them first, and move the functions that are not needed at startup behind them.

The difficulty of binary rearrangement

This is the core of binary rearrangement, and also the hardest part: how to obtain the symbol order for the order file, that is, how to collect the symbols in the order in which they are used at app startup. You can hook calls and trace them at runtime, but can that guarantee 100% coverage? Douyin (TikTok) says it is looking at compile-time instrumentation, hoping to capture 100% of the executed functions. For now we use Clang instrumentation, which is Douyin's scheme and captures about 80%. It is not covered in this article, but will be added later.

Binary rearrangement in Apple's objc

You can download the objc source code to see how Apple applies binary rearrangement.

Binary rearrangement process

  1. By default, code is laid out in the order of the files in Build Phases -> Compile Sources.
  2. In Build Settings, search for Write Link Map File and set it to YES so that a link map is written. Path to Link Map File gives its location.
  3. Find the generated .txt file in the build products (under x86_64 when building for the simulator). It shows the current symbol order.
  4. Open the terminal, go to the project directory, and create the order file, for example: touch test.order
  5. Write the functions you want to sort into the file, then in Build Settings set Order File to test.order and build again.
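However the startup symbols are collected (hooking or instrumentation), the result has to be written to the order file as one symbol per line, in first-use order and without duplicates. The following is a minimal sketch of that last step only. The `StartupSymbolRecorder` class and the sample symbol names are hypothetical, and the collection mechanism itself is not shown.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Records symbols in the order they are first seen and drops repeats.
class StartupSymbolRecorder {
public:
    void record(const std::string& symbol) {
        assert(!symbol.empty());
        // insert() reports whether the symbol was new; only new ones are kept.
        if (seen_.insert(symbol).second) {
            order_.push_back(symbol);
        }
    }

    // Text for the order file: one symbol name per line.
    std::string orderFileContents() const {
        std::string out;
        for (const std::string& symbol : order_) {
            out += symbol;
            out += '\n';
        }
        return out;
    }

private:
    std::unordered_set<std::string> seen_;
    std::vector<std::string> order_;
};
```

The returned string can be saved as test.order and selected under Order File in Build Settings, as described in the steps above.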

dyld, the dynamic linker

dyld is the dynamic linker. It plays an important role in how our libraries are loaded and mapped into memory. So what does dyld do between app launch and main?

Compilation process: source files -> preprocess -> compile (including syntax analysis) -> assemble -> link (this step also links against static and dynamic libraries) -> executable file

App launch -> load libSystem -> the runtime registers its callback functions -> load images -> execute map_images and load_images -> call main.

Dynamic and static libraries

The difference between static and dynamic libraries lies in how they are linked: one is linked statically, the other dynamically.

Static libraries (.a, .lib, etc.) are copied into every binary that uses them at link time. The code is laid out one piece after another, for example A B C B D A, so the same library may appear more than once, which wastes memory.

Dynamic libraries (.dll, .dylib, .framework, etc.) differ in that only one copy of the library exists in memory. If A and C both use B, and C and D both use B, they all share the same copy of B. This is why Apple mostly uses dynamic libraries.

How do we find the call flow

We know that +load methods and C++ static constructors (called CXX below) run before main. So we can set a breakpoint in a +load method or in a C++ constructor and look at its call stack. That gives the general flow: everything starts with _dyld_start. Hopefully you can follow the flow below against the order of the function calls.
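That code can run before main is easy to observe in plain C++. The sketch below uses a global object (a C++ static constructor) and a function marked with `__attribute__((constructor))`, a GCC/Clang extension that is only loosely comparable to +load. All names are hypothetical. The relative order of the two hooks is not relied on here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Log shared by everything that runs before main. A function-local static
// avoids depending on the initialization order of globals.
std::vector<std::string>& startupLog() {
    static std::vector<std::string> log;
    return log;
}

void logStartup(const std::string& entry) {
    assert(!entry.empty());
    startupLog().push_back(entry);
}

// A global object: its constructor is a C++ static initializer ("CXX").
struct GlobalObject {
    GlobalObject() { logStartup("C++ static constructor"); }
};
static GlobalObject gObject;

// GCC/Clang extension: the function is run by the loader before main.
__attribute__((constructor))
static void runsBeforeMain() {
    logStartup("constructor function");
}

// True once both pre-main hooks have run.
bool preMainHooksRan() {
    return startupLog().size() == 2;
}
```

Calling `preMainHooksRan()` as the first statement of main already returns true. Setting a breakpoint inside either hook shows a call stack that starts in the dynamic loader instead of main.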

dyld source code walkthrough

Download the dyld source code. Note that dyld relies on libSystem and libDispatch.

First, we need to be clear about our purpose: we are reading the source to see how dyld links images into memory between app launch and main, what _objc_init does to call into dyld, and how dyld calls back into objc.

1. _dyld_start

The _dyld_start function is written in assembly, and there is not much to see in it. It does the same thing for each architecture: it calls the start method and, once dyld has finished loading, calls our main function. The source is not shown here; you can search the dyld source for this function to read it.

    # call dyldbootstrap::start(app_mh, argc, argv, dyld_mh, &startGlue)

You can search for the start method based on this comment.

2. dyldbootstrap::start

Once we find the start method, there is not much to look at. Remember the function call stack shown earlier: the next step is dyld's _main function. The start function only sits in between, so we do not need to worry about it.

3. _main (important)

Only the written summary matters here; the rest can be skipped.

This is the most important part of dyld. The function is 849 lines long, nearly 1000 lines of code. In fact, the main purpose of the call stack above was to lead us here. This is the process dyld executes, from instantiating the main program to notifying that the program is about to enter main.

A flowchart

Aside (this short passage can be skipped): I will go straight to the summary, because the whole process is complicated. The dyld loading process comes up in many companies' interviews, and some even ask about the difference between dyld2 and dyld3. Knowing this process is of little practical use and does not help much in day-to-day development; some interviewers seem to treat knowing it as a sign of being better than you. You might ask such interviewers whether they can write a compiler, or whether they want you to write one with them. Still, once you know the loading process, you know why +load and the C++ static constructors (CXX) run before main, and you know that the next step is the class loading process.

What _main does (this is the important part):

As you can see, apart from the early preparation, the most important steps are 5 to 11. This is only the general flow; it also involves symbol binding and so on, which is not important here. The most important step is 10, which runs the initializers of our main program.

  1. Set up the conditions: environment variables, platform, version, paths, and host information.
  2. Check whether a shared cache exists and load it (usually the non-simulator case).
  3. Register the GDB debugger notification. (Old and unimportant; it is fine not to know the term.)
  4. Add dyld to the UUID list to enable stack symbolication. (Not needed here.)
  5. Instantiate the main program: instantiateFromLoadedImage. (This is the image loader. It reads the Mach-O header, so the file must be in Mach-O format. If you open an executable in the "rotten apple" Mach-O viewer, you can see this format. To learn more, look at the sniffLoadCommands method; the fields correspond one to one.)
  6. Load any inserted libraries: loadInsertedDylib
  7. Link the main program.
  8. Link the images (the previously inserted libraries).
  9. Perform weak binding for the main program.
  10. Run all initializers: initializeMainExecutable
  11. Notify that the program can now enter main: notifyMonitoringDyldMain

4. initializeMainExecutable (not important, just an intermediate step)

Going from 3 to 4, this method leads to the calls to map_images, load_images, and so on, but it is still an intermediate step. I did not really want to include this source code, since there is not much to it.

The source code

    void initializeMainExecutable()
    {
        // record that we've reached this step
        gLinkContext.startedInitializingMainExecutable = true;

        // run initialzers for any inserted dylibs
        ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
        initializerTimes[0].count = 0;
        const size_t rootCount = sImageRoots.size();
        if ( rootCount > 1 ) {
            for(size_t i=1; i < rootCount; ++i) {
                sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
            }
        }

        // run initializers for main executable and everything it brings up
        sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);

        // register cxa_atexit() handler to run static terminators in all loaded images when this process exits
        if ( gLibSystemHelpers != NULL )
            (*gLibSystemHelpers->cxa_atexit)(&runAllStaticTerminators, NULL, NULL);

        // dump info if requested
        if ( sEnv.DYLD_PRINT_STATISTICS )
            ImageLoader::printStatistics((unsigned int)allImagesCount(), initializerTimes[0]);
        if ( sEnv.DYLD_PRINT_STATISTICS_DETAILS )
            ImageLoaderMachO::printStatisticsDetails((unsigned int)allImagesCount(), initializerTimes[0]);
    }

5. runInitializers (not important, another intermediate step)

I do not know every detail of this method either, but the most important thing it does is call two methods: processInitializers and notifyBatch.

The source code

    void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
    {
        uint64_t t1 = mach_absolute_time();
        mach_port_t thisThread = mach_thread_self();
        ImageLoader::UninitedUpwards up;
        up.count = 1;
        up.imagesAndPaths[0] = { this, this->getPath() };
        // recursively initializes our images; see the implementation of processInitializers
        processInitializers(context, thisThread, timingInfo, up);
        // notifyBatch goes on to call notifyBatchPartial
        context.notifyBatch(dyld_image_state_initialized, false);
        mach_port_deallocate(mach_task_self(), thisThread);
        uint64_t t2 = mach_absolute_time();
        fgTotalInitTime += (t2 - t1);
    }

6. recursiveInitialization

Hopefully, when you read this, you still remember the earlier diagram of the function call stack and the order of the calls. Reaching this point means that the diagram has done its job. You may be wondering why we walk through this whole process. Don't worry, we will get to that later.

The source code

    void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                              InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
    {
        recursive_lock lock_info(this_thread);
        recursiveSpinLock(lock_info);

        if ( fState < dyld_image_state_dependents_initialized-1 ) {
            uint8_t oldState = fState;
            // break cycles
            fState = dyld_image_state_dependents_initialized-1;
            try {
                // initialize lower level libraries first
                for(unsigned int i=0; i < libraryCount(); ++i) {
                    ImageLoader* dependentImage = libImage(i);
                    if ( dependentImage != NULL ) {
                        // don't try to initialize stuff "above" me yet
                        if ( libIsUpward(i) ) {
                            uninitUps.imagesAndPaths[uninitUps.count] = { dependentImage, libPath(i) };
                            uninitUps.count++;
                        }
                        else if ( dependentImage->fDepth >= fDepth ) {
                            dependentImage->recursiveInitialization(context, this_thread, libPath(i), timingInfo, uninitUps);
                        }
                    }
                }

                // record termination order
                if ( this->needsTermination() )
                    context.terminationRecorder(this);

                // let objc know we are about to initialize this image
                uint64_t t1 = mach_absolute_time();
                fState = dyld_image_state_dependents_initialized;
                oldState = fState;
                // notifySingle ends up calling sNotifyObjCInit, which is only a stored
                // function pointer; it is assigned in registerObjCNotifiers (see step 7)
                context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);

                // initialize this image
                // this is where the CXX (C++ initializer) functions are called;
                // here they are objc's own, not those of our project
                bool hasInitializers = this->doInitialization(context);

                // let anyone know we finished initializing this image
                fState = dyld_image_state_initialized;
                oldState = fState;
                context.notifySingle(dyld_image_state_initialized, this, NULL);

                if ( hasInitializers ) {
                    uint64_t t2 = mach_absolute_time();
                    timingInfo.addTime(this->getShortName(), t2-t1);
                }
            }
            catch (const char* msg) {
                // this image is not initialized
                fState = oldState;
                recursiveSpinUnLock();
                throw;
            }
        }

        recursiveSpinUnLock();
    }
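The core idea of recursiveInitialization, "initialize lower level libraries first", is a depth-first walk over the dependency graph. The following is a much simplified sketch of that idea only. The `Image` struct and the function name are hypothetical, and locking, upward dependencies, depth checks and notifications are all left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Image {
    std::string name;
    std::vector<Image*> dependencies;  // libraries this image links against
    bool visited;                      // stands in for the fState check
};

// Initialize every dependency before the image itself and append the
// names to `order` in the sequence in which initializers would run.
void initializeDependenciesFirst(Image& image, std::vector<std::string>& order) {
    if (image.visited) {
        return;                        // already initialized or in progress
    }
    image.visited = true;              // mark before recursing to break cycles
    for (Image* dependency : image.dependencies) {
        assert(dependency != nullptr);
        initializeDependenciesFirst(*dependency, order);
    }
    order.push_back(image.name);       // "run the initializers" of this image
}
```

For an app that depends on libobjc and libSystem, where libobjc itself depends on libSystem, the resulting order is libSystem, libobjc, then the app. This matches the observation that libSystem is initialized first.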

7. registerObjCNotifiers

Searching for the callers of this method leads to an important function, _dyld_objc_notify_register, which calls registerObjCNotifiers.

The source code

    void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
    {
        // record functions to call
        // assignment of map_images
        sNotifyObjCMapped = mapped;
        // assignment of load_images
        sNotifyObjCInit = init;
        sNotifyObjCUnmapped = unmapped;

        // call 'mapped' function with all images mapped so far
        try {
            notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
        }
        catch (const char* msg) {
            // ignore request to abort during registration
        }

        // <rdar://problem/32209809> call 'init' function on all images already init'ed (below libSystem)
        for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
            ImageLoader* image = *it;
            if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
                dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
                (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
            }
        }
    }

Other functions include doInitialization and doModInitFunctions

doInitialization calls doModInitFunctions (which runs the initializers of dependent libraries such as libSystem and libDispatch). For the caller of doInitialization, see step 6.

Conclusion 1

Anyone reading through the dyld flow will probably be confused by the number of function calls in steps 1 to 7. What is the point of all these calls? There is one: the function we want to end up at is _dyld_objc_notify_register. Why that function? Because in the objc source code it is called from _objc_init, objc's initialization function. That looks like objc calling into dyld, but this is not the whole story: it is actually dyld that leads to _objc_init being called. Why is that?

Conclusion 2

The loading chain for our application's core libraries, as shown in the figure above, is dyld -> libSystem -> libDispatch -> objc.

The function calls from frame 13 to frame 4 of the call stack are made inside dyld. Note also that dyld checks whether libSystem has been loaded; if not, it loads it or reports an error. Because libSystem is the lowest-level library that everything else depends on, it is the first library to be loaded.

Supplement

dyld mainly triggers two kinds of functions: +load methods and C++ static initializers (CXX). The CXX functions reached through dyld at this point are those of objc, and objc calls its own CXX functions itself, as can be seen in _objc_init. This article does not go into the details.

When are map_images and load_images invoked

Since we do not know in advance when each image will be loaded, a callback is needed to report that an image has been loaded, together with a state that describes it. This is why the notification goes through notifySingle.

  1. map_images: called when images are mapped; it leads to _read_images. This function is important.
  2. load_images: calls the +load methods.

map_images is called from notifyBatchPartial, which is called immediately when the notifications are registered. load_images is called from notifySingle.
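The register-then-call-back pattern described here can be sketched in a few lines. This is a toy model, not dyld's implementation: the `MiniLoader` class and its method names are hypothetical, images are reduced to a path and a flag, and the callbacks stand in for map_images and load_images.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

class MiniLoader {
public:
    using Callback = std::function<void(const std::string& imagePath)>;

    // An image that was mapped (and possibly initialized) before registration.
    void addImage(const std::string& path, bool initialized) {
        images_.push_back({path, initialized});
    }

    // Plays the role of registerObjCNotifiers: store the callbacks, then
    // immediately report what has already happened.
    void registerNotifiers(Callback mapped, Callback init) {
        assert(mapped && init);
        notifyMapped_ = std::move(mapped);
        notifyInit_ = std::move(init);
        for (const Entry& entry : images_) {
            notifyMapped_(entry.path);            // like notifyBatchPartial
        }
        for (const Entry& entry : images_) {
            if (entry.initialized) {
                notifyInit_(entry.path);          // images already initialized
            }
        }
    }

    // Plays the role of notifySingle: report one image as it is initialized.
    void imageAboutToInitialize(const std::string& path) {
        for (Entry& entry : images_) {
            if (entry.path == path && !entry.initialized) {
                entry.initialized = true;
                if (notifyInit_) {
                    notifyInit_(path);
                }
            }
        }
    }

private:
    struct Entry {
        std::string path;
        bool initialized;
    };
    std::vector<Entry> images_;
    Callback notifyMapped_;
    Callback notifyInit_;
};
```

Registering two lambdas that log their argument shows the ordering: the mapped callback fires for every known image as soon as registration happens, and the init callback fires later, once per image, as each image is initialized.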

Conclusion

For the whole flow from dyld to main, a general understanding is enough. The key point is that dyld triggers two functions: map_images (image loading, which includes class loading) and load_images (which calls the +load methods).