Preface

In previous articles, we looked at the data behind objects and classes: properties, methods, member variables, and so on. All of these are written in code, and that code has to be loaded into memory before we can use it; otherwise it is just files on disk. Today we explore how it gets loaded into an application.

Preparation

  • objc4-818.2 source
  • dyld-852 source
  • libdispatch source
  • libSystem source

I. How an application is loaded

Every application relies on underlying libraries such as UIKit, CoreFoundation, and AVFoundation when it loads. Libraries are binary files containing executable code that the operating system can load into memory. There are two kinds: static libraries and dynamic libraries.

The build process

Executable file

1. The executable file of a project

Create a macOS project:

#import <Foundation/Foundation.h>

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        // insert code here...
        NSLog(@"Hello, World!");
    }
    return 0;
}
  • This is the default template code; nothing has been modified.

Next, build the executable and drag it into Terminal:

  • Dragging the executable into Terminal and pressing Return runs it, and it prints Hello, World!.

2. The executable file of a system library

Find the executable file of the system Foundation library:

  • Using image list, we get the path of the Foundation executable and then locate it on disk.

Static and dynamic linking

  • With dynamic linking, a dynamic library can be shared between processes, which saves memory. That is why Apple’s system libraries are dynamic libraries.

The loading process

Libraries are loaded into memory through dyld (dynamic linker). The overall process can be represented by the following diagram:

II. Tracing our way to dyld

Let’s create an iOS project and add a load method to ViewController.m:

@implementation ViewController
+ (void)load{
    NSLog(@"%s",__func__);
}
@end

Set a breakpoint in main and run the program:

  • The program stops in main. In the call stack we can see that a start function is called before main, so let’s add a symbolic breakpoint on start and debug further.

Add the start symbolic breakpoint and run the program again:

  • The start symbolic breakpoint is never hit and the program still stops in main, which means the breakpoint did not resolve to the start implementation we are looking for. We also know that +[ViewController load] is called before main, so let’s set a breakpoint in the load method instead.

Set a breakpoint in ViewController’s load method and run the program:

  • The program stops in the load method. Printing the stack with bt shows the _dyld_start function, which leads us to dyld. Download the dyld-852 source and continue the exploration there.
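The observation that code can run before main is not specific to Objective-C. As a rough analogue, here is a minimal C++ sketch that relies on the GCC/Clang `__attribute__((constructor))` extension, which asks the loader to run a function before main. The names `events`, `runs_before_main`, and `record_main_entry` are made up for this illustration and are not part of dyld or the Objective-C runtime.

```cpp
#include <string>
#include <vector>

// Hypothetical event log used only for this illustration.
// A function-local static avoids depending on the construction
// order of global objects.
std::vector<std::string>& events() {
    static std::vector<std::string> log;
    return log;
}

// GCC/Clang extension: the loader runs this function before main(),
// in the same spirit as +load being triggered before main().
__attribute__((constructor))
static void runs_before_main() {
    events().push_back("constructor");
}

// Call this as the first statement of main() to record when main starts.
void record_main_entry() {
    events().push_back("main");
}
```

If `record_main_entry()` is the first statement in main, the log should contain the constructor entry first and the main entry second, matching the order we saw with +load and main in the debugger.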

III. The dyld flow

Search globally for _dyld_start in the source:

  • We find the implementation of _dyld_start in the dyldStartup.s file. It contains a call to dyldbootstrap::start. dyldbootstrap is a C++ namespace, and start is a function in that namespace.

Find the start function in the dyldbootstrap namespace:

uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
				const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{
    ...
    return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}
  • The start function returns the result of dyld::_main, so _main is what we analyze next.
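To make the hand-off concrete, here is a small C++ model of the idea: the bootstrap code returns an address, and the caller jumps to it. This is only a sketch of the control flow. `app_main`, `dyld_main_model`, `start_model`, and `launch` are hypothetical names, and the real _dyld_start is assembly rather than C++.

```cpp
#include <cstdint>

namespace sketch {

// Hypothetical stand-in for the application's main().
int app_main(int argc, const char* argv[]) {
    (void)argv;
    return argc;  // return something observable
}

using EntryPoint = int (*)(int, const char*[]);

// Stand-in for dyld::_main: after loading and linking, it returns the
// address of the program's entry point as an integer.
std::uintptr_t dyld_main_model() {
    return reinterpret_cast<std::uintptr_t>(&app_main);
}

// Stand-in for dyldbootstrap::start: it forwards that address.
std::uintptr_t start_model() {
    return dyld_main_model();
}

// Stand-in for the assembly in _dyld_start: take the returned address
// and call it with the program's arguments.
int launch(int argc, const char* argv[]) {
    EntryPoint entry = reinterpret_cast<EntryPoint>(start_model());
    return entry(argc, argv);
}

}  // namespace sketch
```

Calling `sketch::launch(2, args)` ends up inside `app_main`, which is the same shape as _dyld_start ending up in the application's main once dyld has finished its work.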

IV. The _main function: dyld’s main flow

Jump into the _main function:

uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide,
      int argc, const char* argv[], const char* envp[], const char* apple[],
      uintptr_t* startGlue)
{
    ...
    getHostInfo(mainExecutableMH, mainExecutableSlide);
    {
        __block bool platformFound = false;
        ((dyld3::MachOFile*)mainExecutableMH)->forEachSupportedPlatform(^(dyld3::Platform platform, uint32_t minOS, uint32_t sdk) {
            if (platformFound) {
                halt("MH_EXECUTE binaries may only specify one platform");
            }
            gProcessInfo->platform = (uint32_t)platform;
            platformFound = true;
        });
    }
    const char* rootPath = _simple_getenv(envp, "DYLD_ROOT_PATH");
    if ( (rootPath != NULL) ) {
        ...
    }
    else {
        ...
    }
    // load shared cache
    mapSharedCache(mainExecutableSlide);
    // instantiate ImageLoader for main executable
    sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
    // load any inserted libraries
    if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
        for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
            loadInsertedDylib(*lib);
    }
    // link main program
    link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
    // Bind and notify for the inserted images now interposing has been registered
    if ( sInsertedDylibCount > 0 ) {
        for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
            ImageLoader* image = sAllImages[i+1];
            image->recursiveBind(gLinkContext, sEnv.DYLD_BIND_AT_LAUNCH, true, nullptr);
        }
    }
    // <rdar://problem/12186933> do weak binding only after all inserted images linked
    sMainExecutable->weakBind(gLinkContext);
    gLinkContext.linkingMainExecutable = false;
    // run all initializers
    initializeMainExecutable();
    // notify any monitoring processes that this process is about to enter main()
    notifyMonitoringDyldMain();
    result = (uintptr_t)sMainExecutable->getEntryFromLC_UNIXTHREAD();
    return result;
}

V. The initializeMainExecutable flow: running the main program

Enter the initializeMainExecutable function:

void initializeMainExecutable()
{
    // run initializers for any inserted dylibs
    // allImagesCount(): the number of loaded images
    ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
    initializerTimes[0].count = 0;
    const size_t rootCount = sImageRoots.size();
    if ( rootCount > 1 ) {
        for(size_t i=1; i < rootCount; ++i) {
            // run the initializers of each of these images
            sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
        }
    }
    // run initializers for main executable and everything it brings up
    sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);
}
  • Both the initialization of the other images and the initialization of the main program go through runInitializers.
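The ordering in this function can be sketched in a few lines of C++. The model below assumes that index 0 of the root list is the main executable (which the loop starting at index 1 suggests) and that everything after it is an inserted image. `Image` and `initializeAll` are hypothetical names for this illustration, not dyld types.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical image: the only behaviour modelled is "record that my
// initializers ran".
struct Image {
    std::string name;
    void runInitializers(std::vector<std::string>& order) const {
        order.push_back(name);
    }
};

// roots[0] plays the role of the main executable; roots[1..] play the
// role of inserted images (an assumption of this sketch).
std::vector<std::string> initializeAll(const std::vector<Image>& roots) {
    std::vector<std::string> order;
    if (roots.empty()) return order;
    // inserted images first, starting at index 1
    for (std::size_t i = 1; i < roots.size(); ++i)
        roots[i].runInitializers(order);
    // then the main executable and everything it brings up
    roots[0].runInitializers(order);
    return order;
}

}  // namespace sketch
```

With roots App, InsertedA, InsertedB, the recorded order is InsertedA, InsertedB, App: the main program is initialized last.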

runInitializers

Enter runInitializers:

  • The key call in this function is processInitializers.

processInitializers

Enter processInitializers:

void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
                                                                   InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
    uint32_t maxImageCount = context.imageCount()+2;
    ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
    ImageLoader::UninitedUpwards& ups = upsBuffer[0];
    ups.count = 0;
    // Calling recursive init on all images in images list, building a new list of
    // uninitialized upward dependencies.
    for (uintptr_t i=0; i < images.count; ++i) {
        images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
    }
    // If any upward dependencies remain, init them.
    if ( ups.count > 0 )
        processInitializers(context, thisThread, timingInfo, ups);
}
  • The key call in this function is recursiveInitialization.

recursiveInitialization

Enter the recursiveInitialization function:

void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                          InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
    if ( fState < dyld_image_state_dependents_initialized-1 ) {
        uint8_t oldState = fState;
        // break cycles
        fState = dyld_image_state_dependents_initialized-1;
        try {
            // initialize lower level libraries first
            for(unsigned int i=0; i < libraryCount(); ++i) {
                ImageLoader* dependentImage = libImage(i);
                if ( dependentImage != NULL ) {
                    // don't try to initialize stuff "above" me yet
                    if ( libIsUpward(i) ) {
                        uninitUps.imagesAndPaths[uninitUps.count] = { dependentImage, libPath(i) };
                        uninitUps.count++;
                    }
                    else if ( dependentImage->fDepth >= fDepth ) {
                        dependentImage->recursiveInitialization(context, this_thread, libPath(i), timingInfo, uninitUps);
                    }
                }
            }
            // record termination order
            if ( this->needsTermination() )
                context.terminationRecorder(this);
            // let objc know we are about to initialize this image
            uint64_t t1 = mach_absolute_time();
            fState = dyld_image_state_dependents_initialized;
            oldState = fState;
            context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
            // initialize this image
            bool hasInitializers = this->doInitialization(context);
            // let anyone know we finished initializing this image
            fState = dyld_image_state_initialized;
            oldState = fState;
            context.notifySingle(dyld_image_state_initialized, this, NULL);
        }
        catch (const char* msg) {
            ...
        }
    }
}
  • context.notifySingle (first call): sends a single-image notification that this image’s dependents have been initialized.
  • this->doInitialization: runs the image’s initializers.
  • context.notifySingle (second call): announces that the image has finished initializing.
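The core of recursiveInitialization is a depth-first walk: mark the image as in progress to break cycles, initialize its dependencies, then initialize the image itself. The following C++ sketch models just that part. `State`, `Image`, and `recursiveInit` are hypothetical names, and upward dependencies, depth checks, and notifications are deliberately left out.

```cpp
#include <string>
#include <vector>

namespace sketch {

enum class State { Loaded, Initializing, Initialized };

struct Image {
    std::string name;
    std::vector<Image*> dependencies;  // lower-level libraries
    State state = State::Loaded;
};

// Depth-first initialization: dependencies first, then the image itself.
// Setting the state before recursing breaks dependency cycles, which is
// the role of the early fState assignment in the function above.
void recursiveInit(Image& image, std::vector<std::string>& order) {
    if (image.state != State::Loaded) return;  // already visited
    image.state = State::Initializing;
    for (Image* dep : image.dependencies)
        recursiveInit(*dep, order);
    order.push_back(image.name);  // stands in for doInitialization
    image.state = State::Initialized;
}

}  // namespace sketch
```

For an App image that depends on Foundation, which in turn depends on libSystem, the recorded order is libSystem, Foundation, App, so the lowest-level library always goes first.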

notifySingle

A global search for notifySingle finds its implementation:

Jump into notifySingle:

static void notifySingle(dyld_image_states state, const ImageLoader* image, ImageLoader::InitializerTimingList* timingInfo)
{
    //dyld::log("notifySingle(state=%d, image=%s)\n", state, image->getPath());
    std::vector<dyld_image_state_change_handler>* handlers = stateToHandlers(state, sSingleHandlers);
    if ( handlers != NULL ) {
        dyld_image_info info;
        info.imageLoadAddress = image->machHeader();
        info.imageFilePath    = image->getRealPath();
        info.imageFileModDate = image->lastModified();
        for (std::vector<dyld_image_state_change_handler>::iterator it = handlers->begin(); it != handlers->end(); ++it) {
            const char* result = (*it)(state, 1, &info);
            if ( (result != NULL) && (state == dyld_image_state_mapped) ) {
                //fprintf(stderr, "  image rejected by handler=%p\n", *it);
                // make copy of thrown string so that later catch clauses can free it
                const char* str = strdup(result);
                throw str;
            }
        }
    }
    if ( state == dyld_image_state_mapped ) {
        // <rdar://problem/7008875> Save load addr + UUID for images from outside the shared cache
        // <rdar://problem/50432671> Include UUIDs for shared cache dylibs in all image info when using private mapped shared caches
        if ( !image->inSharedCache() || (gLinkContext.sharedRegionMode == ImageLoader::kUsePrivateSharedRegion) ) {
            dyld_uuid_info info;
            if ( image->getUUID(info.imageUUID) ) {
                info.imageLoadAddress = image->machHeader();
                addNonSharedCacheImageUUID(info);
            }
        }
    }
    if ( (state == dyld_image_state_dependents_initialized) && (sNotifyObjCInit != NULL) && image->notifyObjC() ) {
        uint64_t t0 = mach_absolute_time();
        dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
        (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
        uint64_t t1 = mach_absolute_time();
        uint64_t t2 = mach_absolute_time();
        uint64_t timeInObjC = t1-t0;
        uint64_t emptyTime = (t2-t1)*100;
        if ( (timeInObjC > emptyTime) && (timingInfo != NULL) ) {
            timingInfo->addTime(image->getShortName(), timeInObjC);
        }
    }
}
  • The key line is (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());.

sNotifyObjCInit

Search for sNotifyObjCInit to obtain the relevant code:

static _dyld_objc_notify_init		sNotifyObjCInit;

void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
    // record functions to call
    sNotifyObjCMapped	= mapped;
    sNotifyObjCInit		= init;
    sNotifyObjCUnmapped = unmapped;
}
  • sNotifyObjCInit is of type _dyld_objc_notify_init and is assigned in the registerObjCNotifiers function, so the next question is where registerObjCNotifiers is called.

registerObjCNotifiers

Search registerObjCNotifiers globally:

  • registerObjCNotifiers is called in the _dyld_objc_notify_register function, and _dyld_objc_notify_register is something we have seen before.

Look at the implementation of the _objc_init function in the objc4-818.2 source:

  • _dyld_objc_notify_register is called here, so let’s use _objc_init as the entry point and keep exploring.

VI. The image initialization flow

We will follow the process backwards, starting with _objc_init.

Open the objc4-818.2 source, set a breakpoint in _objc_init, and run the program:

  • As you can see, _os_object_init is called before _objc_init, and this function lives in the libdispatch library.

_os_object_init

Download the libdispatch source and search globally for the _os_object_init function:

  • We find the _os_object_init function, and inside it the call to _objc_init(). The flow so far: _os_object_init -> _objc_init().

Let’s see which function calls _os_object_init:

  • It is called by libdispatch_init, which also belongs to the libdispatch library.

libdispatch_init

The libdispatch_init implementation:

void libdispatch_init(void)
{
    ...
    _dispatch_hw_config_init();
    _dispatch_time_init();
    _dispatch_vtable_init();
    _os_object_init();
    _voucher_init();
    _dispatch_introspection_init();
}
  • The implementation of libdispatch_init contains the call to _os_object_init(). The flow so far: libdispatch_init -> _os_object_init -> _objc_init().

Again, look at what calls libdispatch_init:

  • libdispatch_init is called by the libSystem_initializer function, which belongs to the libSystem library. Let’s verify this.

libSystem_initializer

Download the libSystem source and search globally for libSystem_initializer:

  • We find the implementation of libSystem_initializer, and on line 239 the call to libdispatch_init. The flow so far: libSystem_initializer -> libdispatch_init -> _os_object_init -> _objc_init().

Look at the caller of libSystem_initializer:

  • The caller one frame up is the doModInitFunctions function, which brings us back to dyld.

doModInitFunctions

Find doModInitFunctions and enter the function:

void ImageLoaderMachO::doModInitFunctions(const LinkContext& context)
{
    if ( fHasInitializers ) {
        const uint32_t cmd_count = ((macho_header*)fMachOData)->ncmds;
        const struct load_command* const cmds = (struct load_command*)&fMachOData[sizeof(macho_header)];
        const struct load_command* cmd = cmds;
        for (uint32_t i = 0; i < cmd_count; ++i) {
            if ( cmd->cmd == LC_SEGMENT_COMMAND ) {
                const struct macho_segment_command* seg = (struct macho_segment_command*)cmd;
                const struct macho_section* const sectionsStart = (struct macho_section*)((char*)seg + sizeof(struct macho_segment_command));
                const struct macho_section* const sectionsEnd = &sectionsStart[seg->nsects];
                for (const struct macho_section* sect=sectionsStart; sect < sectionsEnd; ++sect) {
                    const uint8_t type = sect->flags & SECTION_TYPE;
                    if ( type == S_MOD_INIT_FUNC_POINTERS ) {
                        Initializer* inits = (Initializer*)(sect->addr + fSlide);
                        const size_t count = sect->size / sizeof(uintptr_t);
                        for (size_t j=0; j < count; ++j) {
                            // the initializers include libSystem_initializer
                            Initializer func = inits[j];
                            if ( ! dyld::gProcessInfo->libSystemInitialized ) {
                                // <rdar://problem/17973316> libSystem initializer must run first
                                const char* installPath = getInstallPath();
                                if ( (installPath == NULL) || (strcmp(installPath, libSystemPath(context)) != 0) )
                                    dyld::throwf("initializer in image (%s) that does not link with libSystem.dylib\n", this->getPath());
                            }
                            bool haveLibSystemHelpersBefore = (dyld::gLibSystemHelpers != NULL);
                            {
                                dyld3::ScopedTimer(DBG_DYLD_TIMING_STATIC_INITIALIZER, (uint64_t)fMachOData, (uint64_t)func, 0);
                                // this call is where libSystem_initializer gets invoked
                                func(context.argc, context.argv, context.envp, context.apple, &context.programVars);
                            }
                            bool haveLibSystemHelpersAfter = (dyld::gLibSystemHelpers != NULL);
                            if ( !haveLibSystemHelpersBefore && haveLibSystemHelpersAfter ) {
                                // now safe to use malloc() and other calls in libSystem.dylib
                                dyld::gProcessInfo->libSystemInitialized = true;
                            }
                        }
                    }
                    else if ( type == S_INIT_FUNC_OFFSETS ) {
                        ...
                    }
                }
            }
            cmd = (const struct load_command*)(((char*)cmd)+cmd->cmdsize);
        }
    }
}
  • From this function we can see that libSystem must be the first image to run its initializers; otherwise dyld throws an error.
  • UIKit, Foundation, and all the other libraries build on the runtime, threading, and environment foundations, so libSystem has to be initialized first.
  • With Initializer func = inits[j];, the first func obtained is libSystem_initializer, and it is invoked through func(context.argc, context.argv, context.envp, context.apple, &context.programVars);. Next, let’s look at where doModInitFunctions is called.
  • Current flow: doModInitFunctions -> libSystem_initializer -> libdispatch_init -> _os_object_init -> _objc_init().
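The "libSystem must go first" rule can be modelled in a short C++ sketch. It is a simplification under stated assumptions: the real code flips libSystemInitialized when gLibSystemHelpers becomes non-NULL, while this model flips it as soon as an initializer from the libSystem image has run. `Image`, `Process`, `runModInitFunctions`, and `countingInitializer` are hypothetical names, and the libSystem path used here is only an example value.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {

using Initializer = void (*)();

struct Image {
    std::string installPath;
    std::vector<Initializer> initializers;  // stands in for the initializer section
};

struct Process {
    bool libSystemInitialized = false;
    std::string libSystemPath = "/usr/lib/libSystem.B.dylib";  // example value
};

// Runs every initializer of one image. Until libSystem has been
// initialized, only libSystem itself may run initializers.
void runModInitFunctions(const Image& image, Process& process) {
    for (Initializer func : image.initializers) {
        if (!process.libSystemInitialized &&
            image.installPath != process.libSystemPath) {
            throw std::runtime_error("initializer in image (" + image.installPath +
                                     ") that does not link with libSystem.dylib");
        }
        func();
        // Simplification: treat the first libSystem initializer as the
        // point at which libSystem becomes usable.
        if (image.installPath == process.libSystemPath)
            process.libSystemInitialized = true;
    }
}

// Hypothetical initializer used to observe how many initializers ran.
static int gInitCount = 0;
void countingInitializer() { ++gInitCount; }
int initCount() { return gInitCount; }

}  // namespace sketch
```

Running an application image before the libSystem image throws, while running libSystem first and the application second succeeds. That is the same constraint the dyld::throwf call enforces above.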

Search for doModInitFunctions to see where it is called:

  • OK! This brings us back to the doInitialization function mentioned above.
  • Current flow: doInitialization -> doModInitFunctions -> libSystem_initializer -> libdispatch_init -> _os_object_init -> _objc_init().

VII. How dyld hooks up and runs the objc functions

From the previous analysis, we have the following flow starting from doInitialization:

doInitialization -> doModInitFunctions -> libSystem_initializer -> libdispatch_init -> _os_object_init -> _objc_init() -> _dyld_objc_notify_register -> registerObjCNotifiers.

Let’s review where _dyld_objc_notify_register and registerObjCNotifiers are called, along with the key code of their implementations:

void _objc_init(void)
{
    ...
    _dyld_objc_notify_register(&map_images, load_images, unmap_image);
    ...
}

void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
    dyld::registerObjCNotifiers(mapped, init, unmapped);
}

void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
    // record functions to call
    sNotifyObjCMapped   = mapped;
    sNotifyObjCInit     = init;
    sNotifyObjCUnmapped = unmapped;
    ...
}
  • Comparing the three functions, we can see that sNotifyObjCMapped is map_images and sNotifyObjCInit is load_images.
  • Let’s find out where map_images() and load_images() are called.

Search globally for sNotifyObjCMapped:

  • In the notifyBatchPartial function we find the call to sNotifyObjCMapped.

Search notifyBatchPartial globally:

  • notifyBatchPartial is called inside registerObjCNotifiers. In other words, right after the function is assigned to sNotifyObjCMapped, it is invoked directly through this call.
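This "register, then immediately call" behaviour can be sketched as a small C++ model. It assumes that images mapped before registration are remembered in a list and replayed to the callback at registration time. All names here (`imageMapped`, `registerMapped`, `mapImagesModel`, `delivered`) are hypothetical and only stand in for the roles of notifyBatchPartial, registerObjCNotifiers, and map_images.

```cpp
#include <string>
#include <vector>

namespace sketch {

using NotifyMapped = void (*)(const std::vector<std::string>& paths);

static NotifyMapped sNotifyMapped = nullptr;     // models sNotifyObjCMapped
static std::vector<std::string> gAlreadyMapped;  // images mapped so far
static std::vector<std::string> gDelivered;      // what the callback has seen

// Models an image being mapped. Before anyone registers, the image is
// only remembered; afterwards the callback is told straight away.
void imageMapped(const std::string& path) {
    gAlreadyMapped.push_back(path);
    if (sNotifyMapped != nullptr) {
        std::vector<std::string> batch(1, path);
        (*sNotifyMapped)(batch);
    }
}

// Models the registration step: store the callback, then immediately
// replay every image that was mapped before registration.
void registerMapped(NotifyMapped mapped) {
    sNotifyMapped = mapped;
    if (!gAlreadyMapped.empty())
        (*sNotifyMapped)(gAlreadyMapped);
}

// Hypothetical callback standing in for map_images.
void mapImagesModel(const std::vector<std::string>& paths) {
    gDelivered.insert(gDelivered.end(), paths.begin(), paths.end());
}

const std::vector<std::string>& delivered() { return gDelivered; }

}  // namespace sketch
```

In this model, two images mapped before registration are delivered in one batch the moment the callback is registered, and any image mapped later is delivered on its own.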

Where is sNotifyObjCInit called? Continue searching:

  • In the notifySingle function we find the call to sNotifyObjCInit.

notifySingle is called in recursiveInitialization, and so is doInitialization:

void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                          InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
    if ( fState < dyld_image_state_dependents_initialized-1 ) {
        try {
            // single-image notification
            context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
            // initialize this image
            bool hasInitializers = this->doInitialization(context);
            // let anyone know we finished initializing this image
            fState = dyld_image_state_initialized;
            oldState = fState;
            context.notifySingle(dyld_image_state_initialized, this, NULL);
        }
        catch (const char* msg) {
            ...
        }
    }
}
  • Looking at the code, context.notifySingle is called before this->doInitialization, yet sNotifyObjCInit is only registered during doInitialization. How can that work?
  • Because recursiveInitialization is a recursive function. The first time notifySingle is called, sNotifyObjCInit has not been set yet; by the time it is called for a later image, sNotifyObjCInit has a value.
  • To summarize: on the first pass, doInitialization registers map_images and load_images, and map_images is called immediately after registration. After that, whenever execution reaches notifySingle, load_images is called.
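The ordering argument is easier to follow with a small trace. The C++ sketch below models only the parts relevant to the question: a callback slot that starts empty, a notifySingle that skips the call while the slot is empty, and a doInitialization that fills the slot for the lowest-level image. Every name is hypothetical, and the real registration step does more work than this model shows.

```cpp
#include <string>
#include <vector>

namespace sketch {

using NotifyInit = void (*)(const std::string& image);

static NotifyInit sNotifyInit = nullptr;  // models sNotifyObjCInit
static std::vector<std::string> gTrace;   // observable event log

// Models load_images: just record which image it was called for.
void loadImagesModel(const std::string& image) {
    gTrace.push_back("load_images(" + image + ")");
}

struct Image {
    std::string name;
    std::vector<Image*> dependencies;
    bool registersRuntime = false;  // true only for the libSystem stand-in
    bool visited = false;
};

// Models notifySingle: the callback is used only if it has been set.
void notifySingle(const Image& image) {
    if (sNotifyInit != nullptr)
        (*sNotifyInit)(image.name);
    else
        gTrace.push_back("skip(" + image.name + ")");
}

// Models doInitialization: for the libSystem stand-in, running its
// initializers is what ends up registering the callback.
void doInitialization(const Image& image) {
    if (image.registersRuntime)
        sNotifyInit = &loadImagesModel;
    gTrace.push_back("init(" + image.name + ")");
}

void recursiveInitialization(Image& image) {
    if (image.visited) return;
    image.visited = true;
    for (Image* dep : image.dependencies)
        recursiveInitialization(*dep);
    notifySingle(image);      // called before doInitialization...
    doInitialization(image);  // ...which is where registration happens
}

const std::vector<std::string>& trace() { return gTrace; }

}  // namespace sketch
```

For an App image that depends on a libSystem stand-in, the trace is skip(libSystem), init(libSystem), load_images(App), init(App). The first notifySingle finds an empty slot, and every later one reaches the load_images stand-in.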

VIII. dyld flow diagram

That covers the overall dyld flow. The next article will explore class loading and other details. If this helped, a like is appreciated! 😄 😄 😄