iOS Low-Level Analysis: The Application Loading Process

Study calmly, without anxiety or impatience! I'm your old friend Xiao Qinglong.

In previous articles we looked at alloc, the structure of a class, the underlying process of message sending, and so on. But how does our code get loaded into memory? That's what we're going to explore today.

Preparation

Dyld source
Libsystem source

The whole build process can be roughly divided into:

Preprocessing (done by Xcode)
Compilation (done by Xcode)
Assembly
Linking, which produces the executable file

Preprocessing

This is the work done before compilation proper. Generally speaking, preprocessing covers:

Macro definition
File inclusion
Conditional compilation

A macro definition is also called macro substitution: the preprocessor replaces text but does not evaluate anything. A macro is defined as follows:

#define Identifier string
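A minimal C++ sketch of "substitution without calculation" (the macro names are made up for illustration): the preprocessor pastes the argument text in verbatim, so operator precedence can change the result unless the macro parenthesizes its parameters.

```cpp
#include <cassert>

// Naive macro: the argument text is pasted in as-is, nothing is evaluated first.
#define SQUARE_NAIVE(x) x * x
// Safer macro: the parameter and the whole body are parenthesized.
#define SQUARE_SAFE(x) ((x) * (x))

// SQUARE_NAIVE(1 + 2) expands to 1 + 2 * 1 + 2, which is 5, not 9.
int naive_square_of_three() { return SQUARE_NAIVE(1 + 2); }

// SQUARE_SAFE(1 + 2) expands to ((1 + 2) * (1 + 2)), which is 9.
int safe_square_of_three() { return SQUARE_SAFE(1 + 2); }
```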

File inclusion, as the name suggests, inserts the contents of one file into another. In Objective-C we usually use:

#include: inserts the contents of a source file; including the same file twice inserts it twice
#import: inserts the contents of a source file, but only once, no matter how many times it is imported
@class: forward-declares a class; usually used in the .h file, with the actual #import done in the .m file
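C and C++ have no #import, so headers protect themselves with include guards built from conditional compilation; #import gives the same include-once behavior automatically. The sketch below simulates including a hypothetical Person.h twice in one file (the header name and its contents are illustrative only).

```cpp
#include <cassert>

// Pretend this guarded section is the text of "Person.h", pasted in by #include.
#ifndef PERSON_H
#define PERSON_H
struct Person { int age; };
inline int person_count() { return 1; }
#endif // PERSON_H

// A second #include "Person.h" pastes the same text again, but PERSON_H is now
// defined, so the preprocessor drops it. Without the guard these lines would be
// redefinition errors; #import avoids this without needing a guard.
#ifndef PERSON_H
#define PERSON_H
struct Person { int age; };
inline int person_count() { return 2; }
#endif // PERSON_H
```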

Conditional compilation

Conditional compilation means that, before compilation, the preprocessor evaluates the conditions given in preprocessor directives. If a condition is met, the enclosed code is compiled; otherwise, that code is not compiled at all. The conditional compilation directives are:

1. #if: compiles the following code if the expression evaluates to true (non-zero)
2. #ifdef: checks whether a macro is defined; if it is, the following code is compiled
3. #ifndef: the opposite of #ifdef; checks whether a macro is not defined
4. #elif: if the preceding #if, #ifdef, #ifndef or #elif condition was not satisfied, the code after #elif is compiled when its own condition holds; it works like else-if
5. #else: used with #if, #ifdef or #ifndef; equivalent to else in C
6. #endif: ends an #if, #ifdef or #ifndef block
7. #if vs #ifdef: #if tests the value of an expression, while #ifdef only tests whether a macro has been defined
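The difference between #if and #ifdef is easiest to see with a macro that is defined but has the value 0. In this sketch DEBUG_LEVEL and USE_NEW_PARSER are hypothetical configuration macros:

```cpp
#include <cassert>
#include <cstring>

#define DEBUG_LEVEL 0      // defined, but its value is 0
#define USE_NEW_PARSER     // defined with no value

// #ifdef only asks "is it defined?", so this branch is taken even though the value is 0.
#ifdef DEBUG_LEVEL
static const bool kDebugDefined = true;
#else
static const bool kDebugDefined = false;
#endif

// #if evaluates the value, so the #elif / #else chain decides here.
#if DEBUG_LEVEL > 1
static const char* kLogMode = "verbose";
#elif DEBUG_LEVEL == 1
static const char* kLogMode = "basic";
#else
static const char* kLogMode = "off";
#endif

// #ifndef is the opposite of #ifdef.
#ifndef USE_NEW_PARSER
static const int kParserVersion = 1;
#else
static const int kParserVersion = 2;
#endif
```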

To speed up compilation and avoid processing the same header over and over when many files reference it, Apple provides precompiled headers: the .pch files we commonly use. Headers imported and macros defined in the .pch are available to every source file and are compiled only once, so we can put commonly used declarations there.

Executable file

Dynamic libraries and static libraries

Static library formats: .a, etc.
Dynamic library formats: .framework, .dylib, .tbd, etc.

Loading:

A static library is linked into the executable in its entirety, so the same code can end up duplicated across programs, wasting space; a dynamic library is loaded only when needed and can be shared, which saves a lot of space.

Loading process:

The app is launched
The required libraries are loaded
Callback functions are registered via _dyld_objc_notify_register
The loaded libraries are mapped into memory
map_images and load_images are executed
main is called

Next, let's go through the source code to see what happens before main.

dyld analysis

(dyld is the dynamic linker.)

The dyld flow:

_dyld_start
  -> dyldbootstrap::start
  -> dyld::_main
       prepare the environment: platform, version, host information, etc.
       instantiateFromLoadedImage: instantiate the main program
       link: link the main program
       weakBind: weak-reference binding for the main program
       initializeMainExecutable: run all initializers
       notifyMonitoringDyldMain: signal that main() can now be called

To see what runs before main, we need a breakpoint in a function that executes before it. Let's use +load for now:

The call stack shows that the first frame executed is _dyld_start in dyld. Next, download the dyld source.

Open the source and search for _dyld_start; you will find several __dyld_start: definitions, one for each architecture. Since the device running the app is an iPhone 11, we only need to look at the #if __arm64__ one:

#if __arm64__
.text
.align 2
.globl __dyld_start
__dyld_start:
mov x28, sp
and     sp, x28, #~15 // force 16-byte alignment of stack
mov x0, #0
mov x1, #0
stp x1, x0, [sp, #-16]! // make aligned terminating frame
mov fp, sp // set up fp to point to terminating frame
sub sp, sp, #16             // make room for local variables
...
// call dyldbootstrap::start(app_mh, argc, argv, dyld_mh, &startGlue)
...
#endif // __arm64__

The comment shows that it calls dyldbootstrap::start. Search the dyld project globally for dyldbootstrap and locate the start function:

uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
                const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{
    ...
    // Rebase dyld itself (fix up dyld's own pointers for its slide)
    rebaseDyld(dyldsMachHeader);
    // envp starts right after argv's NULL terminator
    const char** envp = &argv[argc+1];
    // apple starts right after envp's NULL terminator
    const char** apple = envp;
    while(*apple != NULL) { ++apple; }
    ++apple;
    // Set a random value for the stack guard
    __guard_setup(apple);
    ...
    // dyld bootstrap is done; now run dyld's _main
    uintptr_t appsSlide = appsMachHeader->getSlide();
    return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}
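The envp and apple arithmetic in start relies on argv, envp and apple being laid out back to back, each list terminated by a NULL pointer. This sketch (locate_envp_and_apple is a hypothetical helper, not a dyld function) repeats the same arithmetic on a hand-built array so the result can be checked:

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout, as implied by the dyld code above:
//   argv[0] ... argv[argc-1] NULL envp[0] ... NULL apple[0] ... NULL
struct StartPointers {
    const char** envp;
    const char** apple;
};

StartPointers locate_envp_and_apple(int argc, const char* argv[]) {
    // envp begins one slot past argv's NULL terminator
    const char** envp = &argv[argc + 1];
    // walk to envp's NULL terminator, then step past it to reach apple
    const char** apple = envp;
    while (*apple != NULL) { ++apple; }
    ++apple;
    return StartPointers{envp, apple};
}
```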

The _main function is hundreds of lines long, and analyzing every line would be exhausting. Instead, we focus on the core code, guided by its return value and the loading process outlined at the beginning:

uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, 
      int argc, const char* argv[], const char* envp[], const char* apple[], 
      uintptr_t* startGlue)
{
    ...
    // Load the shared cache
    checkSharedRegionDisable((dyld3::MachOLoaded*)mainExecutableMH, mainExecutableSlide);
    if ( gLinkContext.sharedRegionMode != ImageLoader::kDontUseSharedRegion ) {
#if TARGET_OS_SIMULATOR
        if ( sSharedCacheOverrideDir)
            mapSharedCache();
#else
        mapSharedCache();
#endif
    }

    // Instantiate the main program
    /* instantiateFromLoadedImage does three things internally:
       checks whether the Mach-O is compatible, creates an ImageLoader for it,
       and adds the new ImageLoader to the list of loaded images */
    sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
    ...
    // Load the inserted libraries (DYLD_INSERT_LIBRARIES), i.e. extra dynamic libraries
    if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
        for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib) 
            loadInsertedDylib(*lib);
    }
    ...
    // Link the main program
    link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
    ...
    // Link the inserted dynamic libraries
    link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
    ...
    // <rdar://problem/12186933> do weak binding only after all inserted images linked
    // Weak binding is performed only after all inserted images have been linked
    sMainExecutable->weakBind(gLinkContext);
    ...
    // run all initializers
    initializeMainExecutable();
    // Signal that main() is about to be called
    notifyMonitoringDyldMain();
    ...
    // main executable uses LC_UNIXTHREAD, dyld needs to let "start" in program set up for main()
    // i.e. the program's entry point is read from the LC_UNIXTHREAD load command
    result = (uintptr_t)sMainExecutable->getEntryFromLC_UNIXTHREAD();
    ...
    return result;
}

Next, let's explore initializeMainExecutable:

void initializeMainExecutable()
{
    ...
    // run initialzers for any inserted dylibs
    // (initializerTimes only collects timing information)
    ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
    initializerTimes[0].count = 0;
    const size_t rootCount = sImageRoots.size();
    if ( rootCount > 1 ) {
        // Root 0 is the main executable (handled below); run the initializers of roots 1..n-1
        for(size_t i=1; i < rootCount; ++i) {
            sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
        }
    }
    // run initializers for main executable and everything it brings up
    sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);
    ...
}

Step into runInitializers:

void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
    ...
    processInitializers(context, thisThread, timingInfo, up);
    context.notifyBatch(dyld_image_state_initialized, false);
    ...
}

Click into processInitializers:

void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
                                     InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
    ...
    // Call recursiveInitialization on every image in the list
    for (uintptr_t i=0; i < images.count; ++i) {
        images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
    }
    // If any upward dependencies remain, init them.
    if ( ups.count > 0 )
        processInitializers(context, thisThread, timingInfo, ups);
}

Search for recursiveInitialization(const:

void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                          InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
    ...
    try {
        // Initialize the images this image depends on first
        for(unsigned int i=0; i < libraryCount(); ++i) {
            ImageLoader* dependentImage = libImage(i);
            ...
        }
        ...
        // Let objc know we are about to initialize this image
        uint64_t t1 = mach_absolute_time();
        fState = dyld_image_state_dependents_initialized;
        oldState = fState;
        context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
        
        // Initialize the image itself
        bool hasInitializers = this->doInitialization(context);
        // let anyone know we finished initializing this image
        fState = dyld_image_state_initialized;
        oldState = fState;
        // Send the "initialized" notification
        context.notifySingle(dyld_image_state_initialized, this, NULL);
        ...
    }
    ...
}
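The ordering in recursiveInitialization can be illustrated with a toy model. ToyImage and recursive_initialization are hypothetical stand-ins for ImageLoader, and each step is only logged: dependencies are initialized first, then notifySingle(dependents_initialized), doInitialization and notifySingle(initialized) run for the image itself.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an ImageLoader (hypothetical; not dyld's real class).
struct ToyImage {
    std::string name;
    std::vector<ToyImage*> dependencies;
    bool started = false;  // plays the role of "already being initialized"

    explicit ToyImage(std::string n, std::vector<ToyImage*> deps = {})
        : name(std::move(n)), dependencies(std::move(deps)) {}
};

// Mirrors the order of steps in ImageLoader::recursiveInitialization.
void recursive_initialization(ToyImage& image, std::vector<std::string>& events) {
    if (image.started) return;                // each image is initialized only once
    image.started = true;
    for (ToyImage* dep : image.dependencies)  // 1. dependencies first
        recursive_initialization(*dep, events);
    events.push_back(image.name + ": notifySingle(dependents_initialized)");  // 2. objc hooks in here
    events.push_back(image.name + ": doInitialization");                      // 3. own initializers
    events.push_back(image.name + ": notifySingle(initialized)");             // 4. done
}
```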

Clicking on notifySingle doesn't lead to its definition, so search for context.notifySingle instead:

gLinkContext.notifySingle = &notifySingle;

It is assigned the address of the function notifySingle; click into it:

static void notifySingle(dyld_image_states state, const ImageLoader* image, ImageLoader::InitializerTimingList* timingInfo)
{
    ...
    (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
    ...
}

Search for sNotifyObjCInit.

We find that sNotifyObjCInit is assigned from the second argument of the registerObjCNotifiers function. Let's search for registerObjCNotifiers to see where it is called:

void _dyld_objc_notify_register(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
    dyld::registerObjCNotifiers(mapped, init, unmapped);
}

Since nothing in the dyld source calls _dyld_objc_notify_register, let's set a symbolic breakpoint on it to see where it is called from:

Run the project:

The backtrace shows that _dyld_objc_notify_register is called by _objc_init.

With that, the dyld part of the flow in the diagram has been analyzed. The next step is in the libobjc project. Open the objc source and, since we already analyzed _dyld_objc_notify_register above, search the objc project globally for _dyld_objc_notify_register:

void _objc_init(void)
{
    ...
    _dyld_objc_notify_register(&map_images, load_images, unmap_image);
    ...
}

Comparing this with registerObjCNotifiers in dyld: the second argument, load_images, is what gets stored in sNotifyObjCInit, so the (*sNotifyObjCInit)(...) call in notifySingle ends up calling load_images.

In terms of order, _objc_init runs during recursiveInitialization, while _dyld_objc_notify_register works like a block callback: the functions it registers are stored in dyld and invoked later through notifySingle, and the registration itself happens in _objc_init.

After _objc_init finishes its initialization work, it calls _dyld_objc_notify_register to tell dyld that objc has been initialized.
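A simplified sketch of that callback hand-off, assuming a single init callback: the toy_ functions are hypothetical stand-ins for registerObjCNotifiers, notifySingle, load_images and _objc_init, not their real signatures. dyld only stores a function pointer; objc supplies it later, and from then on every notifySingle reaches objc.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified callback type (the real _dyld_objc_notify_init also receives the mach_header).
using ToyObjCInit = void (*)(const char* path);

// ---- "dyld" side ----
static ToyObjCInit sNotifyObjCInit = nullptr;  // empty until objc registers

void toy_registerObjCNotifiers(ToyObjCInit init) { sNotifyObjCInit = init; }

void toy_notifySingle(const char* path) {
    if (sNotifyObjCInit != nullptr)  // nothing reaches objc before registration
        sNotifyObjCInit(path);
}

// ---- "objc" side ----
static std::vector<std::string> gLoadImagesCalls;

void toy_load_images(const char* path) { gLoadImagesCalls.push_back(path); }

void toy_objc_init() {
    // ... objc's own setup would happen here ...
    toy_registerObjCNotifiers(&toy_load_images);  // plays the role of _dyld_objc_notify_register
}
```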

However, between recursiveInitialization and doInitialization we have no way to continue the analysis forward from recursiveInitialization, so here we reason backward instead: start from the result and work back up the call chain.

_objc_init calls _dyld_objc_notify_register, as we saw earlier.

_objc_init is called by the function one level up, _os_object_init, which lives in the libdispatch library.

Open libdispatch and search for _objc_init:

This confirms that _objc_init in the objc project is indeed called by the _os_object_init function in the libdispatch project.

Following the same approach, keep going up and search for libdispatch_init:

DISPATCH_EXPORT DISPATCH_NOTHROW
void
libdispatch_init(void)
{
    ...
    _os_object_init();
    ...
}

libdispatch_init is called by libSystem_initializer. Searching for it shows that libSystem_initializer comes from the Libsystem library.

Download the Libsystem source.

libSystem_initializer:

__attribute__((constructor))
static void
libSystem_initializer(int argc,
                      const char* argv[],
                      const char* envp[],
                      const char* apple[],
                      const struct ProgramVars* vars)
{
    ...
    libdispatch_init();
    ...
}

Looking further up the call stack in the screenshot, we find ImageLoaderMachO::doModInitFunctions from the dyld library, so we go back to the dyld project and search for ImageLoaderMachO::doModInitFunctions:

void ImageLoaderMachO::doModInitFunctions(const LinkContext& context)
{
    ...
    Initializer func = (Initializer)((uint8_t*)this->machHeader() + funcOffset);
    ...
    func(context.argc, context.argv, context.envp, context.apple, &context.programVars);
    ...
}
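Put simply, doModInitFunctions walks the image's table of initializer functions and calls each one with argc, argv, envp and apple, which matches the parameter list of libSystem_initializer shown above. The sketch below is a simplified model: run_mod_init_funcs and the two initializers are hypothetical, and ProgramVars is left out.

```cpp
#include <cassert>
#include <cstddef>

// Simplified initializer signature (the real one also receives ProgramVars).
using Initializer = void (*)(int argc, const char* argv[], const char* envp[], const char* apple[]);

static int gCallOrder[4];
static int gCalls = 0;

// Two hypothetical initializers that just record the order in which they ran.
static void first_initializer(int, const char*[], const char*[], const char*[])  { gCallOrder[gCalls++] = 1; }
static void second_initializer(int, const char*[], const char*[], const char*[]) { gCallOrder[gCalls++] = 2; }

// Model of the loop: call every entry of the image's initializer table in order.
void run_mod_init_funcs(const Initializer* funcs, std::size_t count,
                        int argc, const char* argv[], const char* envp[], const char* apple[]) {
    for (std::size_t i = 0; i < count; ++i)
        funcs[i](argc, argv, envp, apple);
}
```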

Go up one more level and search for ImageLoaderMachO::doInitialization:

bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
    ...
    // Clicking doModInitFunctions jumps to ImageLoaderMachO::doModInitFunctions
    doModInitFunctions(context);
    ...
}

Search for ImageLoader::recursiveInitialization:

void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                          InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
    ...
    // Initialize the image (clicking doInitialization jumps to ImageLoaderMachO::doInitialization)
    bool hasInitializers = this->doInitialization(context);
    ...
}

Next, let’s do a test:

@implementation ViewController
+ (void)load {
    // No [super load] here: the runtime already calls +load on every class that implements it
    NSLog(@"I'm the ViewController Load function");
}
...
@end

In the main.m file:

int main(int argc, char * argv[]) {
    NSLog(@"I'm the main.m main function");
    ...
}

/// Define a C++ function marked as a constructor
__attribute__((constructor)) void SSJFun(){
    printf("\n I'm main.m SSJFun \n");
}

The goal is to see the order in which these three print. The result:

The execution order is: +load > C++ function (SSJFun) > main. Why? Let's analyze it.
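The SSJFun half of this ordering can be reproduced in plain C++ with GCC or Clang. In this sketch SSJFunSketch and mark_main_reached are hypothetical names; the constructor function runs as an image initializer while the image is being initialized, so it records its step before main() gets a chance to.

```cpp
#include <cassert>

// Plain ints are zero-initialized statically, so they are ready before any constructor runs.
static int gStep = 0;
static int gConstructorStep = 0;

// GCC/Clang extension: run this function while the image is initialized, before main().
__attribute__((constructor)) static void SSJFunSketch() {
    gConstructorStep = ++gStep;  // records that it ran first
}

// Call this from main() to record the step at which main is reached.
int mark_main_reached() { return ++gStep; }
```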

Open the objc source and search for load_images. load_images calls all the +load methods:

void
load_images(const char *path __unused, const struct mach_header *mh)
{
    ...
    // Discover load methods
    {
        ...
        prepare_load_methods((const headerType *)mh);
    }
    ...
}

Step into prepare_load_methods:

void prepare_load_methods(const headerType *mhdr)
{
    ...
    classref_t const *classlist = 
        _getObjc2NonlazyClassList(mhdr, &count);
    for (i = 0; i < count; i++) {
        // Schedule the class's +load method
        schedule_class_load(remapClass(classlist[i]));
    }
    category_t * const *categorylist = _getObjc2NonlazyCategoryList(mhdr, &count);
    ...
}

Step into schedule_class_load:

static void schedule_class_load(Class cls)
{
    if (!cls) return;
    ...
    if (cls->data()->flags & RW_LOADED) return;
    
    // Schedule the superclass first
    schedule_class_load(cls->getSuperclass());
    
    add_class_to_loadable_list(cls);
    
    cls->setInfo(RW_LOADED); 
}

/** schedule_class_load is recursive: starting from cls it walks up the superclass chain,
    and add_class_to_loadable_list runs for each class, superclasses before subclasses */

Step into add_class_to_loadable_list:

void add_class_to_loadable_list(Class cls)
{
    IMP method;
    ...
    method = cls->getLoadMethod();
    if (!method) return;  // Don't bother if cls has no +load method
    ...
    // Store cls and its +load IMP at index loadable_classes_used, then bump the count
    loadable_classes[loadable_classes_used].cls = cls;
    loadable_classes[loadable_classes_used].method = method;
    loadable_classes_used++;
}

Step into getLoadMethod:

IMP 
objc_class::getLoadMethod()
{
    ...
    const method_list_t *mlist;
    ...
    mlist = ISA()->data()->ro()->baseMethods();
    if (mlist) {
        for (const auto& meth : *mlist) {
            const char *name = sel_cname(meth.name());
            if (0 == strcmp(name, "load")) {
                return meth.imp(false);
            }
        }
    }
    return nil;
}
/** getLoadMethod looks for a method named "load" in the metaclass's (ISA()) base method list and returns its IMP */
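Putting schedule_class_load, add_class_to_loadable_list and getLoadMethod together, a toy model shows why a superclass's +load is always queued before its subclass's, and why classes without +load are skipped. ToyClass, its fields and the class names in the usage are hypothetical stand-ins for objc4's class structures and the RW_LOADED flag.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an objc class (hypothetical, not objc4's objc_class).
struct ToyClass {
    std::string name;
    ToyClass* superclass;
    std::vector<std::string> classMethodNames;  // stands in for ISA()->data()->ro()->baseMethods()
    bool loaded;                                // stands in for the RW_LOADED flag
};

static std::vector<ToyClass*> loadable_classes;  // stands in for objc4's loadable_classes array

// Like getLoadMethod: look for a class method named "load".
bool has_load_method(const ToyClass* cls) {
    for (const std::string& name : cls->classMethodNames)
        if (name == "load") return true;
    return false;
}

// Like add_class_to_loadable_list: skip classes without +load.
void add_class_to_loadable_list(ToyClass* cls) {
    if (!has_load_method(cls)) return;
    loadable_classes.push_back(cls);
}

// Like schedule_class_load: superclass first, each class at most once.
void schedule_class_load(ToyClass* cls) {
    if (!cls || cls->loaded) return;
    schedule_class_load(cls->superclass);
    add_class_to_loadable_list(cls);
    cls->loaded = true;
}
```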

Back in prepare_load_methods, continue the analysis:

void prepare_load_methods(const headerType *mhdr)
{
    ...
    classref_t const *classlist = 
        _getObjc2NonlazyClassList(mhdr, &count);
    for (i = 0; i < count; i++) {
        // Schedule the class's +load method
        schedule_class_load(remapClass(classlist[i]));
    }
    
    // Next, what happens with categories
    category_t * const *categorylist = _getObjc2NonlazyCategoryList(mhdr, &count);
    ...
    add_category_to_loadable_list(cat);
    ...
}

Step into add_category_to_loadable_list:

void add_category_to_loadable_list(Category cat)
{
    IMP method;
    ...
    method = _category_getLoadMethod(cat);
    if (!method) return;
    ...
    loadable_categories[loadable_categories_used].cat = cat;
    loadable_categories[loadable_categories_used].method = method;
    loadable_categories_used++;
}
// This mirrors the class case: find the IMP of the category's +load method
// and append it to loadable_categories.

Back in load_images:

void
load_images(const char *path __unused, const struct mach_header *mh)
{
    ...
    // Add the classes' +load methods to loadable_classes
    // and the categories' +load methods to loadable_categories
    prepare_load_methods((const headerType *)mh);
    
    // Call +load methods
    call_load_methods();
}

Step into call_load_methods:

void call_load_methods(void)
{
    ...
    do {
        // loadable_classes_used was set in prepare_load_methods; it counts the pending class +load methods
        while (loadable_classes_used > 0) {
            // Call the classes' +load methods
            call_class_loads();
        }
        // Call the categories' +load methods
        more_categories = call_category_loads();
    } while (loadable_classes_used > 0 || more_categories);
    ...
}
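The loop in call_load_methods can be modelled the same way. In this sketch PendingLoad and the vectors are hypothetical stand-ins for loadable_classes and loadable_categories, and only the ordering is simulated: all queued class +load methods are drained before the category +load methods run.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct PendingLoad { std::string owner; };  // hypothetical: who owns the +load

static std::vector<PendingLoad> loadable_classes;
static std::vector<PendingLoad> loadable_categories;
static std::vector<std::string> call_log;

void call_class_loads() {
    std::vector<PendingLoad> classes;
    classes.swap(loadable_classes);  // take everything queued so far
    for (const PendingLoad& p : classes) call_log.push_back("+[" + p.owner + " load]");
}

// Returns true if more categories were queued while these ran.
bool call_category_loads() {
    std::vector<PendingLoad> cats;
    cats.swap(loadable_categories);
    for (const PendingLoad& p : cats) call_log.push_back("+[" + p.owner + " load]");
    return !loadable_categories.empty();
}

void call_load_methods() {
    bool more_categories;
    do {
        while (!loadable_classes.empty()) call_class_loads();  // classes first
        more_categories = call_category_loads();               // then categories
    } while (!loadable_classes.empty() || more_categories);
}
```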

At this point, we can conclude that load_images calls:

the +load methods of all non-lazily loaded classes
the +load methods of all non-lazily loaded categories

Why is the C++ function called automatically, and when?

Set a breakpoint in SSJFun and run bt in the console to view the call stack:

We find that SSJFun is called by doModInitFunctions in dyld_sim (the simulator's dyld), and doModInitFunctions is called by doInitialization.

Open the dyld source and search for doInitialization; we find this code:

void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                          InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
    ...
    context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
    bool hasInitializers = this->doInitialization(context);
    ...
}
/** As analyzed earlier, notifySingle ends up calling load_images, the second argument passed to
    _dyld_objc_notify_register, and load_images calls the +load methods. Because notifySingle runs
    before doInitialization, +load is called before SSJFun. */

Why is main executed last?

Search the dyld source for _dyld_start again:

We find that after dyldbootstrap::start returns, _dyld_start goes on to call the main() function.

Go back to the project and step through it in the debugger:

We see that _dyld_start does call main, which confirms that main runs after dyld has done its work.

Where are the functions registered in registerObjCNotifiers called?

As we know, after a series of initializations in _objc_init, it calls _dyld_objc_notify_register, which then goes to dyld's registerObjCNotifiers:

void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
    // Store the three callbacks passed in from objc
    sNotifyObjCMapped = mapped;
    sNotifyObjCInit = init;
    sNotifyObjCUnmapped = unmapped;
    ...
}

Finding where sNotifyObjCMapped is called

Search globally for sNotifyObjCMapped to find the function that calls it, notifyBatchPartial, then search for notifyBatchPartial to find its caller:

Finding where sNotifyObjCInit is called

Search globally for sNotifyObjCInit to find the function that calls it, notifySingle, then search for notifySingle to find its caller:

void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                          InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
    ...
    /** Suppose image A depends on library B: the recursion initializes B before A.
        Inside notifySingle, sNotifyObjCInit is only called once it has been set, i.e. after
        _objc_init has registered its callbacks, so for the earliest images this first
        notifySingle does not reach objc yet. */
    context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
    // Initialize the image
    bool hasInitializers = this->doInitialization(context);
    ...
    context.notifySingle(dyld_image_state_initialized, this, NULL);
    ...
}

To wrap up, here is the application's flow from launch to _objc_init:

Code:

Link: pan.baidu.com/s/1Bse22q_f… Password: DU3f (includes the demo and the dyld, libdispatch, Libsystem and objc4 sources)