In previous chapters, we learned about the nature of method calls, message lookup and message forwarding, and today we’ll explore a new topic, application loading.

I was once asked this question in an interview: what happens between hitting Run and main? We will use that question to launch our analysis. After we click Run, the project is first compiled and only then executed. This article does not analyze compilation for now (I am currently reading The Self-Improvement of the Programmer, a book that helps a great deal in understanding compilation, and I will write a summary article later). Instead, this article analyzes the process beginning with the start function, together with the dyld source.

One: Background knowledge

1. What is dyld

dyld (the dynamic link editor) is Apple’s dynamic linker and an important part of Apple’s operating systems. After an application is compiled and packaged into an executable Mach-O file, dyld is responsible for linking and loading the program.

The dyld source code is open source and can be downloaded from Apple’s official open source website

2. The shared cache

On iOS, the dynamic libraries that each program depends on must be loaded into memory one by one by dyld (located at /usr/lib/dyld). If every program loaded them separately, startup would inevitably be slow. To optimize startup speed and improve performance, the shared cache mechanism was created: all the default system dynamic libraries are merged into one large cache file stored in the /System/Library/Caches/com.apple.dyld/ directory, with a separate cache file for each architecture.

Figure: loading without a dynamic library cache

Figure: loading with a dynamic library cache

3. Extension

Since dynamic libraries are loaded into memory at run time, which means their code is not in the app’s Mach-O, how does the system find the address of an external function and call it?

  1. When the project is compiled, a region is set aside in the generated Mach-O executable. This region is essentially a symbol table, and it is stored in the __DATA segment (because the __DATA segment is readable and writable at run time).
  2. Compile time: every reference in the project to a library function in the shared cache is set to a symbol address. For example, if the project calls NSLog, an NSLog symbol is created in the Mach-O at compile time, and the NSLog call in the project simply points to this symbol.
  3. Run time: when dyld loads the application process into memory, it uses the load commands to determine which libraries need to be loaded and then performs binding. Taking NSLog as an example, dyld finds the real address of NSLog in Foundation and writes it into the NSLog symbol in the __DATA segment’s symbol table.

This process is called PIC technology (Position Independent Code)
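The three steps above can be modeled in a few lines of C++. This is only a sketch under simplifying assumptions, not dyld’s implementation: SymbolSlot, ExportTable and bindAll are hypothetical names, and a std::map stands in for the symbols exported by the libraries in the shared cache.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical model of one writable symbol-pointer slot in __DATA.
struct SymbolSlot {
    std::string name;     // symbol the call site refers to, e.g. "NSLog"
    const void* address;  // nullptr until the loader binds it
};

// Stand-in for the symbols exported by the loaded libraries.
using ExportTable = std::map<std::string, const void*>;

// Bind every slot whose symbol is exported and return how many were bound.
// Unresolved slots stay nullptr so the caller can report them.
inline int bindAll(std::vector<SymbolSlot>& slots, const ExportTable& exports) {
    int bound = 0;
    for (SymbolSlot& slot : slots) {
        auto it = exports.find(slot.name);
        if (it != exports.end()) {
            slot.address = it->second;  // the "real address" written at run time
            ++bound;
        }
    }
    return bound;
}
```

The call site never changes: it always goes through the slot, and only the slot’s content is rewritten at run time, which is why the slot has to live in a writable segment.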

Two: The dyld loading process

1. The main function

If we set a breakpoint in the main function, the call stack shows that a start frame, which belongs to libdyld.dylib, runs before main.

That is, the main function of the main program is called by dyld. Below we will walk through the dyld source code

2. The start function

In the dyld source, find the start function in the file dyldInitialization.cpp

uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
				const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{

    // Emit kdebug tracepoint to indicate dyld bootstrap has started <rdar://46878536>
    dyld3::kdebug_trace_dyld_marker(DBG_DYLD_TIMING_BOOTSTRAP_START, 0, 0, 0, 0);

	// if kernel had to slide dyld, we need to fix up load sensitive locations
	// we have to do this before using any global variables
    rebaseDyld(dyldsMachHeader);

	// kernel sets up env pointer to be just past end of agv array
	const char** envp = &argv[argc+1];
	
	// kernel sets up apple pointer to be just past end of envp array
	const char** apple = envp;
	while(*apple != NULL) { ++apple; }
	++apple;

	// set up random value for stack canary
	__guard_setup(apple);

#if DYLD_INITIALIZER_SUPPORT
	// run all C++ initializers inside dyld
	runDyldInitializers(argc, argv, envp, apple);
#endif

	// now that we are done bootstrapping dyld, call dyld's main
	uintptr_t appsSlide = appsMachHeader->getSlide();
	return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}

Two important values in this function are explained here

  1. appsMachHeader: this parameter is the Mach-O header of the application.
  2. slide (appsSlide in the code above): this is the ASLR (Address Space Layout Randomization) offset. In other words, a random value, the slide, is applied at load time so that the image lands at a randomized location in the address space.
  3. Run-time address = ASLR slide + virtual address (offset) recorded in the Mach-O
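The relation in item 3 is plain pointer arithmetic. The sketch below only illustrates that formula; slideAddress and computeSlide are hypothetical helper names, not dyld functions.

```cpp
#include <cstdint>

// Run-time address of something recorded at `vmaddr` in the Mach-O file,
// once the image has been loaded with the given ASLR slide.
constexpr std::uintptr_t slideAddress(std::uintptr_t vmaddr, std::intptr_t slide) {
    return vmaddr + static_cast<std::uintptr_t>(slide);
}

// Inverse direction: recover the slide from the preferred base address
// written in the file and the base address the image was actually given.
constexpr std::intptr_t computeSlide(std::uintptr_t preferredBase, std::uintptr_t actualBase) {
    return static_cast<std::intptr_t>(actualBase - preferredBase);
}
```

For example, an image whose file says it starts at 0x1000 but which was loaded at 0x5000 has a slide of 0x4000, and every address recorded in the file has to be adjusted by that amount before it can be used.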

So what’s going on in this function?

  • Rebase dyld itself according to the ASLR slide that was calculated (rebaseDyld).

  • Initialize so that dyld can use Mach messaging.

  • Set up stack overflow protection (the stack canary, __guard_setup).

  • After initialization, call dyld’s main function, dyld::_main
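The way start locates envp and apple relies on the kernel placing argv, envp and the apple strings back to back, each terminated by NULL. The following sketch repeats the same pointer walk on an ordinary array; LaunchVectors and locateVectors are hypothetical names used only for illustration.

```cpp
// Pointers into a kernel-style vector laid out as:
//   argv[0..argc-1], NULL, envp..., NULL, apple..., NULL
struct LaunchVectors {
    const char** envp;
    const char** apple;
};

// Mirrors the arithmetic in start(): envp begins just past argv's
// terminating NULL, and apple begins just past envp's terminating NULL.
inline LaunchVectors locateVectors(int argc, const char* argv[]) {
    const char** envp = &argv[argc + 1];
    const char** apple = envp;
    while (*apple != nullptr) { ++apple; }  // skip the environment strings
    ++apple;                                // step over envp's NULL
    return LaunchVectors{envp, apple};
}
```

Given the array { "app", "-v", NULL, "HOME=/tmp", NULL, "executable_path=/app", NULL } and argc = 2, envp points at "HOME=/tmp" and apple points at "executable_path=/app".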

3. dyld::_main

//
// Entry point for dyld.  The kernel loads dyld and jumps to __dyld_start which
// sets up some registers and call this function.
//
// Returns address of main() in target program which __dyld_start jumps to
//
uintptr_t _main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide,
		int argc, const char* argv[], const char* envp[], const char* apple[],
		uintptr_t* startGlue)
{
	......
}

To be honest, the code is too long, so I will analyze the key points below.

As usual, read the comments first: this is the entry point for dyld. The kernel loads dyld and jumps to __dyld_start, which sets up some registers and calls this function. The function returns the address of main() in the target program, which __dyld_start then jumps to.

Next, I will select the key points for analysis

3.1 Preparation

3.1.1 Setting Environment Variables

I’m not going to post the code in this section, but you can look at it yourself, and of course we can also set environment variables in Xcode

3.1.2 Setting context information: setContext

setContext(mainExecutableMH, argc, argv, envp, apple);

3.1.3 Detecting whether the process is restricted and handling it accordingly: configureProcessRestrictions

configureProcessRestrictions(mainExecutableMH, envp);

3.1.4 Checking environment variables: checkEnvironmentVariables

{
	checkEnvironmentVariables(envp);
	defaultUninitializedFallbackPaths(envp);
}

3.1.5 Obtaining the program architecture: getHostInfo

{
	getHostInfo(mainExecutableMH, mainExecutableSlide);
}

3.2 Loading the Shared Cache

3.2.1 Checking whether the Shared Cache is Disabled

checkSharedRegionDisable((dyld3::MachOLoaded*)mainExecutableMH, mainExecutableSlide);

3.2.2 Loading the shared cache: mapSharedCache

	if ( gLinkContext.sharedRegionMode != ImageLoader::kDontUseSharedRegion ) {
#if TARGET_OS_SIMULATOR
		if ( sSharedCacheOverrideDir)
			mapSharedCache();
#else
		mapSharedCache();
#endif
	}

3.3 Adding dyld to the UUID list

dyld adds itself to the UUID list by calling addDyldImageToUUIDList

// add dyld itself to UUID list
addDyldImageToUUIDList();

3.4 reloadAllImages

3.4.1 Instantiating the main program: instantiateFromLoadedImage

// instantiate ImageLoader for main executable
sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);

static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
	// try mach-o loader
	if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
		ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
		addImage(image);
		return (ImageLoaderMachO*)image;
	}
	
	throw "main executable not a known format";
}

  1. In the condition, isCompatibleMachO checks the Mach-O header for compatibility.

  2. instantiateMainExecutable calls sniffLoadCommands to load the main program. What this actually does is read a series of load commands from the Mach-O file.

void ImageLoaderMachO::sniffLoadCommands(const macho_header* mh, const char* path, bool inCache, bool* compressed,
											unsigned int* segCount, unsigned int* libCount, const LinkContext& context,
											const linkedit_data_command** codeSigCmd,
											const encryption_info_command** encryptCmd)
{
	*compressed = false;
	*segCount = 0;
	*libCount = 0;
	*codeSigCmd = NULL;
	*encryptCmd = NULL;
	/* ... */

	// fSegmentsArrayCount is only 8-bits
	if ( *segCount > 255 )
		dyld::throwf("malformed mach-o image: more than 255 segments in %s", path);

	// fSegmentsArrayCount is only 8-bits
	if ( *libCount > 4095 )
		dyld::throwf("malformed mach-o image: more than 4095 dependent libraries in %s", path);

	if ( needsAddedLibSystemDepency(*libCount, mh) )
		*libCount = 1;
}

Let’s explain a few parameters here:

  • compressed: set based on the LC_DYLD_INFO_ONLY load command.
  • segCount: the number of segment load commands; it cannot exceed 255.
  • libCount: the number of dependent libraries, i.e. LC_LOAD_DYLIB commands (Foundation, UIKit, ...); at most 4095.
  • codeSigCmd: the application’s code signature.
  • encryptCmd: the application’s encryption information.

  3. After the image is created, it is added to the global image list sAllImages

static void addImage(ImageLoader* image)
{
	// add to master list
	allImagesLock();
	sAllImages.push_back(image);
	allImagesUnlock();
	...
}

After the above steps, the instantiation of the main program is complete
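To make the counting done by sniffLoadCommands more concrete, here is a simplified sketch that walks a buffer of load commands and counts segments and dependent libraries. It is an illustration under stated assumptions, not dyld’s code: LoadCommand copies only the two leading fields of struct load_command from <mach-o/loader.h>, the two constants are the values of LC_LOAD_DYLIB and LC_SEGMENT_64, and Counts, countLoadCommands and appendCommand are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified stand-in for struct load_command: every command starts
// with its type and its total size in bytes.
struct LoadCommand {
    std::uint32_t cmd;
    std::uint32_t cmdsize;
};

constexpr std::uint32_t kLoadDylib = 0xC;   // LC_LOAD_DYLIB
constexpr std::uint32_t kSegment64 = 0x19;  // LC_SEGMENT_64

struct Counts {
    unsigned segCount = 0;
    unsigned libCount = 0;
};

// Walk `ncmds` variable-sized commands stored back to back in `bytes`.
// Returns false if a command is malformed or runs past the buffer.
inline bool countLoadCommands(const std::vector<std::uint8_t>& bytes,
                              std::uint32_t ncmds, Counts& out) {
    std::size_t offset = 0;  // invariant: offset <= bytes.size()
    for (std::uint32_t i = 0; i < ncmds; ++i) {
        if (bytes.size() - offset < sizeof(LoadCommand)) return false;
        LoadCommand lc;
        std::memcpy(&lc, bytes.data() + offset, sizeof lc);  // no unaligned reads
        if (lc.cmdsize < sizeof(LoadCommand) || lc.cmdsize > bytes.size() - offset)
            return false;
        if (lc.cmd == kSegment64) ++out.segCount;
        if (lc.cmd == kLoadDylib) ++out.libCount;
        offset += lc.cmdsize;  // jump to the next command
    }
    return true;
}

// Helper for building sample input: one command followed by `payload`
// zero bytes standing in for the command-specific fields.
inline void appendCommand(std::vector<std::uint8_t>& bytes,
                          std::uint32_t cmd, std::uint32_t payload) {
    LoadCommand lc{cmd, static_cast<std::uint32_t>(sizeof(LoadCommand)) + payload};
    const std::uint8_t* p = reinterpret_cast<const std::uint8_t*>(&lc);
    bytes.insert(bytes.end(), p, p + sizeof lc);
    bytes.insert(bytes.end(), static_cast<std::size_t>(payload), std::uint8_t{0});
}
```

A buffer built with one kSegment64 command and two kLoadDylib commands yields segCount 1 and libCount 2. The real function additionally enforces the limits of 255 segments and 4095 libraries shown above.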

3.4.2 Loading inserted dynamic libraries

// load any inserted libraries
if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
	for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
		loadInsertedDylib(*lib);
}
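The loop above iterates over an array of paths. The DYLD_INSERT_LIBRARIES environment variable itself is a colon-separated list, so it has to be split first. The sketch below shows one way to do that splitting; splitColonList is a hypothetical helper written for this article, not the function dyld uses.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a colon-separated value such as
// "/tmp/libA.dylib:/tmp/libB.dylib" into its entries, skipping empty ones.
inline std::vector<std::string> splitColonList(const std::string& value) {
    std::vector<std::string> entries;
    std::size_t start = 0;
    while (start <= value.size()) {
        std::size_t end = value.find(':', start);
        if (end == std::string::npos) end = value.size();
        if (end > start) entries.push_back(value.substr(start, end - start));
        start = end + 1;  // continue after the colon (or stop past the end)
    }
    return entries;
}
```

Each resulting entry corresponds to one call of loadInsertedDylib in the loop shown above.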

3.4.3 Link the main program

// link main executable
		gLinkContext.linkingMainExecutable = true;
#if SUPPORT_ACCELERATE_TABLES
		if ( mainExcutableAlreadyRebased ) {
			// previous link() on main executable has already adjusted its internal pointers for ASLR
			// work around that by rebasing by inverse amount
			sMainExecutable->rebase(gLinkContext, -mainExecutableSlide);
		}
#endif
		link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
		sMainExecutable->setNeverUnloadRecursive();
		if ( sMainExecutable->forceFlat() ) {
			gLinkContext.bindFlat = true;
			gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
		}

The call link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1) links the main program with each dynamic library it depends on and performs symbol binding

At this point the flow has been: configure the environment variables -> load the shared cache -> instantiate the main program -> load the inserted dynamic libraries -> link the main program and its dynamic libraries

3.5 Running all initializers: initializeMainExecutable()

This step runs the initializers for the main executable and for everything it brings in. The call order is initializeMainExecutable -> runInitializers -> processInitializers -> recursiveInitialization

void initializeMainExecutable()
{
	// record that we've reached this step
	gLinkContext.startedInitializingMainExecutable = true;

	// run initialzers for any inserted dylibs
	ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
	initializerTimes[0].count = 0;
	const size_t rootCount = sImageRoots.size();
	if ( rootCount > 1 ) {
		for(size_t i=1; i < rootCount; ++i) {
			sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
		}
	}

	// run initializers for main executable and everything it brings up
	sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);

	// register cxa_atexit() handler to run static terminators in all loaded images when this process exits
	if ( gLibSystemHelpers != NULL )
		(*gLibSystemHelpers->cxa_atexit)(&runAllStaticTerminators, NULL, NULL);

	// dump info if requested
	if ( sEnv.DYLD_PRINT_STATISTICS )
		ImageLoader::printStatistics((unsigned int)allImagesCount(), initializerTimes[0]);
	if ( sEnv.DYLD_PRINT_STATISTICS_DETAILS )
		ImageLoaderMachO::printStatisticsDetails((unsigned int)allImagesCount(), initializerTimes[0]);
}

3.5.1 Preparing for initialization: runInitializers

void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
	uint64_t t1 = mach_absolute_time();
	mach_port_t thisThread = mach_thread_self();
	ImageLoader::UninitedUpwards up;
	up.count = 1;
	up.images[0] = this;
	processInitializers(context, thisThread, timingInfo, up);
	context.notifyBatch(dyld_image_state_initialized, false);
	mach_port_deallocate(mach_task_self(), thisThread);
	uint64_t t2 = mach_absolute_time();
	fgTotalInitTime += (t2 - t1);
}

3.5.2 processInitializers: iterate over images.count, recursively initializing each image

// <rdar://problem/14412057> upward dylib initializers can be run too soon
// To handle dangling dylibs which are upward linked but not downward, all upward linked dylibs
// have their initialization postponed until after the recursion through downward dylibs
// has completed.
void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
									 InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
	uint32_t maxImageCount = context.imageCount()+2;
	ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
	ImageLoader::UninitedUpwards& ups = upsBuffer[0];
	ups.count = 0;
	// Calling recursive init on all images in images list, building a new list of
	// uninitialized upward dependencies.
	for (uintptr_t i=0; i < images.count; ++i) {
		images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
	}
	// If any upward dependencies remain, init them.
	if ( ups.count > 0 )
		processInitializers(context, thisThread, timingInfo, ups);
}

3.5.3 recursiveInitialization: initializing an image

void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
										  InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
    ...
    uint64_t t1 = mach_absolute_time();
	fState = dyld_image_state_dependents_initialized;
	oldState = fState;
	context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
	// initialize this image
	bool hasInitializers = this->doInitialization(context);

	// let anyone know we finished initializing this image
	fState = dyld_image_state_initialized;
	oldState = fState;
	context.notifySingle(dyld_image_state_initialized, this, NULL);
	...
}
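The overall order that recursiveInitialization produces, with an image’s dependencies initialized before the image itself, can be sketched with a small depth-first traversal. This is a simplified model, not the dyld implementation: Image and recursiveInit are hypothetical names, upward links and notifications are left out, and the log entry stands in for the call to doInitialization.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified image: a name plus the images it depends on.
struct Image {
    std::string name;
    std::vector<Image*> dependencies;
    bool initialized;

    Image(std::string n, std::vector<Image*> deps)
        : name(std::move(n)), dependencies(std::move(deps)), initialized(false) {}
};

// Depth-first: initialize everything this image depends on, then the
// image itself. `log` records the order in which initializers would run.
inline void recursiveInit(Image& image, std::vector<std::string>& log) {
    if (image.initialized) return;  // already handled via another path
    image.initialized = true;       // mark first so that cycles terminate
    for (Image* dep : image.dependencies) {
        recursiveInit(*dep, log);
    }
    log.push_back(image.name);      // stands in for doInitialization()
}
```

With an App that depends on Foundation and libSystem, and a Foundation that depends on libSystem, the log comes out as libSystem, Foundation, App, which matches the rule that lower-level libraries are initialized first.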

notifySingle is the callback that is invoked with the image whose state has changed. It is declared as a function pointer

void (*notifySingle)(dyld_image_states, const ImageLoader* image, InitializerTimingList*);

Based on the call stack, we know that the next step is the call to load_images. That call is made through the function pointer sNotifyObjCInit, which is assigned in registerObjCNotifiers

void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
	// record functions to call
	sNotifyObjCMapped	= mapped;
	sNotifyObjCInit		= init;
	sNotifyObjCUnmapped = unmapped;

	// call 'mapped' function with all images mapped so far
	try {
		notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
	}
	catch (const char* msg) {
		// ignore request to abort during registration
	}

	// <rdar://problem/32209809> call 'init' function on all images already init'ed (below libSystem)
	for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
		ImageLoader* image = *it;
		if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
			dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
			(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
		}
	}
}

A global search shows that registerObjCNotifiers is only called from _dyld_objc_notify_register

void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
   _dyld_objc_notify_init      init,
   _dyld_objc_notify_unmapped  unmapped)
{
	dyld::registerObjCNotifiers(mapped, init, unmapped);
}

Here is a brief explanation of the parameters:

  • map_images: triggered when dyld maps an image into memory.
  • load_images: triggered when dyld initializes an image (the familiar load methods are also called here).
  • unmap_image: triggered when dyld removes an image.
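The registration pattern behind these three callbacks, where images that already exist are replayed to a newly registered observer, can be sketched as follows. This is a hypothetical miniature written for illustration: the Notifier class and its methods do not exist in dyld, and std::function stands in for the plain function pointers used by the real API.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

class Notifier {
public:
    using Callback = std::function<void(const std::string& path)>;

    // Store the three callbacks, then replay what has happened so far:
    // every image is reported as mapped, and every image that is already
    // initialized is reported to the init callback.
    void registerCallbacks(Callback mapped, Callback init, Callback unmapped) {
        mapped_ = std::move(mapped);
        init_ = std::move(init);
        unmapped_ = std::move(unmapped);
        for (const Record& r : images_) { if (mapped_) mapped_(r.path); }
        for (const Record& r : images_) { if (r.initialized && init_) init_(r.path); }
    }

    // Images added after registration are reported immediately.
    void addImage(const std::string& path, bool initialized) {
        images_.push_back(Record{path, initialized});
        if (mapped_) mapped_(path);
        if (initialized && init_) init_(path);
    }

    void removeImage(const std::string& path) {
        for (std::size_t i = 0; i < images_.size(); ++i) {
            if (images_[i].path == path) {
                images_.erase(images_.begin() + static_cast<std::ptrdiff_t>(i));
                if (unmapped_) unmapped_(path);
                return;
            }
        }
    }

private:
    struct Record { std::string path; bool initialized; };
    std::vector<Record> images_;
    Callback mapped_, init_, unmapped_;
};
```

The replay in registerCallbacks corresponds to the two steps in registerObjCNotifiers above: calling the mapped function for all images mapped so far, and calling the init function for images that were initialized before the registration happened.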

By setting a symbolic breakpoint, we find that _dyld_objc_notify_register is called from _objc_init

void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    
    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    lock_init();
    exception_init();

    _dyld_objc_notify_register(&map_images, load_images, unmap_image);
}

3.5.4 doInitialization

This is where an image’s own initializers, such as C++ static constructors, are called.

bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
	CRSetCrashLogMessage2(this->getPath());

	// mach-o has -init and static initializers
	doImageInit(context);
	doModInitFunctions(context);
	
	CRSetCrashLogMessage2(NULL);
	
	return (fHasDashInit || fHasInitializers);
}

3.5.5 doModInitFunctions

void ImageLoaderMachO::doModInitFunctions(const LinkContext& context)
{
	// too much code to list here
}

Here the system’s libSystem is initialized first, followed by libdispatch.
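The kind of code that runs at this stage can be seen from the application side. The sketch below assumes a GCC or Clang toolchain, because it uses the constructor attribute; startupLog, earlyInit and Announcer are hypothetical names. Both the attributed function and the global object’s constructor run before main is entered.

```cpp
#include <string>
#include <vector>

// Records which pieces of pre-main code have run.
inline std::vector<std::string>& startupLog() {
    static std::vector<std::string> log;  // constructed on first use
    return log;
}

// A function marked with the GCC/Clang constructor attribute runs
// before main() is entered.
__attribute__((constructor))
static void earlyInit() {
    startupLog().push_back("constructor attribute");
}

// A global object with a non-trivial constructor is also initialized
// before main() is entered.
struct Announcer {
    Announcer() { startupLog().push_back("global object"); }
};
static Announcer gAnnouncer;
```

By the time main starts, startupLog() already holds two entries. The relative order of the two entries is not something this sketch relies on.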

3.6 notifyMonitoringDyldMain: notifying monitors that the process is about to enter main

3.7 Finding the entry point of the main program

// find entry point for main executable
  result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();

At this point, the dyld loading process ends

Four:

Figure: the process diagram

Figure: process breakdown
