First: a preliminary look at application startup

1. Print order

Look at this code and think about the order in which the statements print:

#import <Foundation/Foundation.h>

@interface Person : NSObject

@end

@implementation Person

+ (void)load {
    printf("----------load-----------: %s\n", __func__);
}

@end

__attribute__((constructor)) void cc_func(void) {
    printf("--------cc_func----------: %s\n", __func__);
}

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        // insert code here...
        NSLog(@"Hello, World!");
    }
    return 0;
}

You guessed it, the output order is as follows:

----------load-----------: +[Person load]
--------cc_func----------: cc_func
Dyld[40374:1115383] Hello, World!

In other words, the order is:

+load --> constructor function --> main()
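The constructor half of that ordering can be reproduced without Objective-C. The following is a minimal C++ sketch, assuming a compiler that supports the GCC/Clang `constructor` attribute; the names `record`, `early_init`, `events_recorded`, and `event_at` are made up for this illustration. The log lives in plain zero-initialized storage so that it is safe to touch before any C++ static constructors have run.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Append-only event log in plain static storage (zero-initialized at load time).
static const int kMaxEvents = 4;
static const char* g_events[kMaxEvents];
static int g_event_count = 0;

static void record(const char* what) {
    assert(g_event_count < kMaxEvents);
    g_events[g_event_count++] = what;
}

// Runs while the image is being initialized, i.e. before main() is entered.
__attribute__((constructor)) static void early_init() {
    record("constructor");
    std::printf("constructor ran: %s\n", __func__);
}

// Number of events recorded so far.
int events_recorded() { return g_event_count; }

// Name of the i-th recorded event, or an empty string if there is none.
std::string event_at(int i) {
    return (i >= 0 && i < g_event_count) ? std::string(g_events[i]) : std::string();
}
```

If `events_recorded()` is called as the very first statement of `main`, one event has already been logged, which is the same effect the `cc_func` output demonstrates.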

2. Before main

Confused?

Isn’t main the entry function? Why wasn’t main executed first?

A whole series of steps normally runs before main is reached.

[Figure: the startup process and its stages]

3. Breakpoint in the load method

Set a breakpoint in the load method and print the stack with bt.

Output:

thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 8.1
  * frame #0: 0x0000000100003e60 Dyld`+[Person load](self=Person, _cmd="load") at main.m:17:5
    frame #1: 0x00007fff203ab4d6 libobjc.A.dylib`load_images + 1556
    frame #2: 0x0000000100016527 dyld`dyld::notifySingle(dyld_image_states, ImageLoader const*, ImageLoader::InitializerTimingList*) + 425
    frame #3: 0x000000010002c794 dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 474
    frame #4: 0x000000010002a55f dyld`ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 191
    frame #5: 0x000000010002a600 dyld`ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&) + 82
    frame #6: 0x00000001000168b7 dyld`dyld::initializeMainExecutable() + 199
    frame #7: 0x000000010001ceb8 dyld`dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*) + 8702
    frame #8: 0x0000000100015224 dyld`dyldbootstrap::start(dyld3::MachOLoaded const*, int, char const**, dyld3::MachOLoaded const*, unsigned long*) + 450
    frame #9: 0x0000000100015025 dyld`_dyld_start + 37

4. Startup process (pre-main)

Let’s first look at what happens before main, to get an overview of what is covered below.

5. Startup process (main)

The main function and the stages after it should already be familiar.

Second: dyld

1. What is dyld?

dyld (the dynamic link editor) is Apple’s dynamic linker. It runs in user mode, is part of Darwin, is maintained by Apple in the dyld source project, and is located at /usr/lib/dyld. Its job is to load dynamic libraries.

2. What dyld does

  • Linking and loading the program. After an application has been compiled and packaged into a Mach-O executable, dyld is responsible at launch for linking it and loading it into memory.
  • Symbol binding. Because almost everything on OS X is dynamically linked, a Mach-O file contains many references to external libraries and symbols. These references have to be filled in at launch, and dyld does that work. The process is known as symbol binding.
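To make symbol binding concrete, here is a minimal C++ sketch of the idea only: every imported symbol name is looked up in a table of exported addresses and the result is written into the import’s slot. The `Import` struct and `bind_symbols` function are hypothetical names for this illustration and do not reflect dyld’s actual data structures or binding opcodes.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One imported symbol: its name and the slot that binding fills in.
struct Import {
    std::string symbol;
    const void* resolved;   // stays nullptr until the symbol is bound
};

// Resolve every import against the exports table.
// Returns how many imports could not be bound.
std::size_t bind_symbols(std::vector<Import>& imports,
                         const std::map<std::string, const void*>& exports) {
    std::size_t unbound = 0;
    for (std::size_t i = 0; i < imports.size(); ++i) {
        std::map<std::string, const void*>::const_iterator it =
            exports.find(imports[i].symbol);
        if (it != exports.end()) {
            imports[i].resolved = it->second;   // fill in the address
        } else {
            ++unbound;                          // leave the slot empty
        }
    }
    return unbound;
}
```

A real linker has to decide what to do with unbound symbols (fail the launch, or tolerate them if they are weak imports); the sketch simply counts them.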

3. dyld loading process

  • How is dyld loaded?
  • How is the program initialized?

In the bt output above, the bottom frame is dyld’s _dyld_start. Looking into it shows that it is implemented in assembly, so let’s take a look.

When any new process starts, the kernel sets the user-mode entry point to __dyld_start.

[Figure: call flow diagram]

4. _dyld_start

dyldStartup.s is assembly code; let’s look at it briefly:

#if __arm64__ && !TARGET_OS_SIMULATOR
	.text
	.align 2
	.globl __dyld_start
__dyld_start:
	mov 	x28, sp
	and     sp, x28, #~15		// force 16-byte alignment of stack
	mov	x0, #0
	mov	x1, #0
	stp	x1, x0, [sp, #-16]!	// make aligned terminating frame
	mov	fp, sp			// set up fp to point to terminating frame
	sub	sp, sp, #16             // make room for local variables
#if __LP64__
	ldr     x0, [x28]               // get app's mh into x0
	ldr     x1, [x28, #8]           // get argc into x1 (kernel passes 32-bit int argc as 64-bits on stack to keep alignment)
	add     x2, x28, #16            // get argv into x2
#else
	ldr     w0, [x28]               // get app's mh into x0
	ldr     w1, [x28, #4]           // get argc into x1 (kernel passes 32-bit int argc as 64-bits on stack to keep alignment)
	add     w2, w28, #8             // get argv into x2
#endif
	adrp	x3,___dso_handle@page
	add 	x3,x3,___dso_handle@pageoff // get dyld's mh in to x4
	mov	x4,sp                   // x5 has &startGlue

	// call dyldbootstrap::start(app_mh, argc, argv, dyld_mh, &startGlue)
	bl	__ZN13dyldbootstrap5startEPKN5dyld311MachOLoadedEiPPKcS3_Pm
	mov	x16,x0                  // save entry point address in x16

The comment shows that dyldbootstrap::start(app_mh, argc, argv, dyld_mh, &startGlue) is called, which matches frame #8 of the backtrace in the previous section.

5. dyldbootstrap::start

This is the start function inside the C++ namespace dyldbootstrap. The code is as follows:

Implementation in dyldInitialization.cpp:

namespace dyldbootstrap {
    ...
//
// This is code to bootstrap dyld. This work in normally done for a program by dyld and crt.
// In dyld we have to do this manually.
//
uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
				const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{

    // Emit kdebug tracepoint to indicate dyld bootstrap has started <rdar://46878536>
    dyld3::kdebug_trace_dyld_marker(DBG_DYLD_TIMING_BOOTSTRAP_START, 0, 0, 0, 0);

	// if kernel had to slide dyld, we need to fix up load sensitive locations
	// we have to do this before using any global variables
    rebaseDyld(dyldsMachHeader);

	// kernel sets up env pointer to be just past end of agv array
	const char** envp = &argv[argc+1];
	
	// kernel sets up apple pointer to be just past end of envp array
	const char** apple = envp;
	while(*apple != NULL) { ++apple; }
	++apple;

	// set up random value for stack canary
	__guard_setup(apple);

#if DYLD_INITIALIZER_SUPPORT
	// run all C++ initializers inside dyld
	runDyldInitializers(argc, argv, envp, apple);
#endif

	_subsystem_init(apple);

	// now that we are done bootstrapping dyld, call dyld's main
	uintptr_t appsSlide = appsMachHeader->getSlide();
	return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}

}

The function ends by calling dyld::_main(), whose first argument is a macho_header.

Anyone who knows the Mach-O structure will recognize it. dyld exists to load Mach-O files, so this already hints at what comes next.

What the start function does

  • Computes the slide from dyldsMachHeader to determine whether dyld itself needs to be relocated (in the rebaseDyld function)
  • Runs the mach_init() initialization (in the rebaseDyld function)
  • Sets up stack overflow protection
  • Computes the slide of appsMachHeader and calls the dyld::_main function
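The pointer arithmetic that start uses to find envp and apple is easier to follow on a concrete array. The sketch below is an illustration under assumptions: `StartupVectors`, `locate_vectors`, and the sample contents of `kSampleStack` are invented here, and the array only imitates the contiguous, NULL-separated layout that the comments in the excerpt describe (argv entries, NULL, envp entries, NULL, apple entries, NULL).

```cpp
#include <cassert>

// The three vectors that the bootstrap code derives from argc/argv.
struct StartupVectors {
    const char** argv;
    const char** envp;
    const char** apple;
};

// Locate envp and apple the same way the excerpt does:
// envp starts just past argv's NULL terminator, apple just past envp's.
StartupVectors locate_vectors(int argc, const char** argv) {
    assert(argc >= 0 && argv != nullptr);
    StartupVectors v;
    v.argv = argv;
    v.envp = &argv[argc + 1];                // skip argv entries plus terminator
    const char** apple = v.envp;
    while (*apple != nullptr) { ++apple; }   // walk to envp's terminator
    ++apple;                                 // step over it
    v.apple = apple;
    return v;
}

// Hypothetical sample layout: 2 arguments, 2 environment strings, 1 apple string.
const char* kSampleStack[] = {
    "app", "-v", nullptr,
    "HOME=/tmp", "LANG=C", nullptr,
    "executable_path=/tmp/app", nullptr
};
```

Calling `locate_vectors(2, kSampleStack)` yields an `envp` that points at the `HOME=/tmp` entry and an `apple` that points at the `executable_path=...` entry, which is the kind of key that `_simple_getenv(apple, ...)` reads in dyld::_main.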

Let’s focus on what dyld::_main does.

6. dyld::_main()

Implementation of dyld::_main:

//
// Entry point for dyld. The kernel loads dyld and jumps to __dyld_start which
// sets up some registers and call this function.
//
// Returns address of main() in target program which __dyld_start jumps to
//
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, 
		int argc, const char* argv[], const char* envp[], const char* apple[], 
		uintptr_t* startGlue)
{
	if (dyld3::kdebug_trace_dyld_enabled(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE)) {
		launchTraceID = dyld3::kdebug_trace_dyld_duration_start(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE, (uint64_t)mainExecutableMH, 0, 0);
	}

	//Check and see if there are any kernel flags
	dyld3::BootArgs::setFlags(hexToUInt64(_simple_getenv(apple, "dyld_flags"), nullptr));

#if __has_feature(ptrauth_calls)
	// Check and see if kernel disabled JOP pointer signing (which lets us load plain arm64 binaries)
	if ( const char* disableStr = _simple_getenv(apple, "ptrauth_disabled") ) {
		if ( strcmp(disableStr, "1") == 0 )
			sKeysDisabled = true;
	}
	else {
		// needed until kernel passes ptrauth_disabled for arm64 main executables
		if ( (mainExecutableMH->cpusubtype == CPU_SUBTYPE_ARM64_V8) || (mainExecutableMH->cpusubtype == CPU_SUBTYPE_ARM64_ALL) )
			sKeysDisabled = true;
	}
#endif

    // Grab the cdHash of the main executable from the environment
	uint8_t mainExecutableCDHashBuffer[20];
	const uint8_t* mainExecutableCDHash = nullptr;
	if ( const char* mainExeCdHashStr = _simple_getenv(apple, "executable_cdhash") ) {
		unsigned bufferLenUsed;
		if ( hexStringToBytes(mainExeCdHashStr, mainExecutableCDHashBuffer, sizeof(mainExecutableCDHashBuffer), bufferLenUsed) )
			mainExecutableCDHash = mainExecutableCDHashBuffer;
	}

	getHostInfo(mainExecutableMH, mainExecutableSlide);

#if !TARGET_OS_SIMULATOR
	// Trace dyld's load
	notifyKernelAboutImage((macho_header*)&__dso_handle, _simple_getenv(apple, "dyld_file"));
	// Trace the main executable's load
	notifyKernelAboutImage(mainExecutableMH, _simple_getenv(apple, "executable_file"));
#endif

	uintptr_t result = 0;
	sMainExecutableMachHeader = mainExecutableMH;
	sMainExecutableSlide = mainExecutableSlide;
	...
	return result;
}

The code is quite long, so let’s set aside everything outside the main flow and analyze only the main flow:

  1. Environment variable configuration
    • Set the corresponding values according to the environment variables and get the current architecture
  2. Shared cache
    • Check whether the shared cache is enabled and whether it is mapped into the shared region
  3. Main program initialization
    • Call the instantiateFromLoadedImage function to instantiate an ImageLoader object
  4. Insert dynamic libraries
    • Iterate over the DYLD_INSERT_LIBRARIES environment variable and call loadInsertedDylib to load each entry
  5. Link the main program
  6. Link the dynamic libraries
  7. Weak symbol binding
  8. Run the initializers
  9. Find the main program’s entry point, i.e. the main function

[Figure: flow of the steps above]

1). dyld environment variables

  • Get the cdHash of the main executable from the environment
  • Get the platform, architecture, and other information from the Mach-O header
  • Check and apply the environment variables: checkEnvironmentVariables(envp)
  • Set default values when the DYLD_FALLBACK paths are empty: defaultUninitializedFallbackPaths(envp)

The relevant code:

// Line: 6366
// Grab the cdHash of the main executable from the environment
uint8_t mainExecutableCDHashBuffer[20];
const uint8_t* mainExecutableCDHash = nullptr;
if ( const char* mainExeCdHashStr = _simple_getenv(apple, "executable_cdhash") ) {
    unsigned bufferLenUsed;
    if ( hexStringToBytes(mainExeCdHashStr, mainExecutableCDHashBuffer, sizeof(mainExecutableCDHashBuffer), bufferLenUsed) )
        mainExecutableCDHash = mainExecutableCDHashBuffer;
}
// Get the current runtime architecture information from the Mach-O header
getHostInfo(mainExecutableMH, mainExecutableSlide);

// Line: 6453
CRSetCrashLogMessage("dyld: launch started");
// Set the context according to the executable header, parameters, etc
setContext(mainExecutableMH, argc, argv, envp, apple);

// Pickup the pointer to the exec path.
// Get the executable file path
sExecPath = _simple_getenv(apple, "executable_path");

// Line: 6535
{
    checkEnvironmentVariables(envp);          // Check setting environment variables
    defaultUninitializedFallbackPaths(envp);  // Set the default value when DYLD_FALLBACK is null
}
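The excerpt only shows the call to checkEnvironmentVariables(envp), not its body. As a rough sketch of what scanning envp for dyld settings can look like, the function below walks a NULL-terminated envp array and collects every KEY=VALUE entry whose key starts with DYLD_. The function name `collect_dyld_variables` and the "collect into a vector" design are assumptions made for this example, not dyld’s implementation.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Collect (key, value) pairs for every "DYLD_..." entry in a
// NULL-terminated envp array. Entries without '=' are skipped.
std::vector<std::pair<std::string, std::string> >
collect_dyld_variables(const char* const* envp) {
    std::vector<std::pair<std::string, std::string> > found;
    if (envp == nullptr) return found;
    for (const char* const* p = envp; *p != nullptr; ++p) {
        const char* entry = *p;
        const char* equals = std::strchr(entry, '=');
        if (equals == nullptr) continue;                      // not KEY=VALUE
        if (std::strncmp(entry, "DYLD_", 5) != 0) continue;   // not a dyld variable
        const std::size_t keyLength = static_cast<std::size_t>(equals - entry);
        found.push_back(std::make_pair(std::string(entry, keyLength),
                                       std::string(equals + 1)));
    }
    return found;
}
```

Passing the process’s own envp to this function returns whichever DYLD_ variables were set for the launch.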

These environment variables can be set in the Xcode scheme; the available ones are declared in dyld2.cpp:

dyld environment variables

struct EnvironmentVariables {
	const char* const *			DYLD_FRAMEWORK_PATH;
	const char* const *			DYLD_FALLBACK_FRAMEWORK_PATH;
	const char* const *			DYLD_LIBRARY_PATH;
	const char* const *			DYLD_FALLBACK_LIBRARY_PATH;
	const char* const *			DYLD_INSERT_LIBRARIES;
	const char* const *			LD_LIBRARY_PATH;			// for unix conformance
	const char* const *			DYLD_VERSIONED_LIBRARY_PATH;
	const char* const *			DYLD_VERSIONED_FRAMEWORK_PATH;
	bool						DYLD_PRINT_LIBRARIES_POST_LAUNCH;
	bool						DYLD_BIND_AT_LAUNCH;
	bool						DYLD_PRINT_STATISTICS;
	bool						DYLD_PRINT_STATISTICS_DETAILS;
	bool						DYLD_PRINT_OPTS;
	bool						DYLD_PRINT_ENV;
	bool						DYLD_DISABLE_DOFS;
	bool						hasOverride;
	...
};

Examples:

  • DYLD_PRINT_OPTS = YES prints the launch arguments
  • DYLD_PRINT_ENV = YES prints all environment variables
  • OBJC_PRINT_LOAD_METHODS logs calls to the +load methods of classes and categories
  • OBJC_PRINT_INITIALIZE_METHODS logs calls to the +initialize methods of classes

2). Shared cache

An app typically uses many system dynamic libraries, such as UIKit and Foundation. Loading each of them separately when its functionality is first needed after launch would be time-consuming. The system therefore merges the dynamic libraries used by iOS ahead of time into one large cache file, stored under /System/Library/Caches/com.apple.dyld/, to improve application launch performance. That is the purpose of the dynamic library shared cache.

Dynamic libraries can be extracted from the shared cache with launch-cache/dsc_extractor.cpp from the dyld source code:

  • Delete the #if 0 and #endif lines in the source
  • Compile dsc_extractor.cpp
clang++ -o dsc_extractor dsc_extractor.cpp
  • Run dsc_extractor
./dsc_extractor <path to the shared cache file> <output folder>

The shared cache code involves:

  • checkSharedRegionDisable checks whether the shared cache is disabled (on iOS the shared cache is mandatory)
  • mapSharedCache loads the shared cache
    • Load into the current process only: mapCachePrivate (the simulator only supports this mode)
    • First load of the shared cache: mapCacheSystemWide
    • Shared cache already loaded: nothing more to do
mapSharedCache --> loadDyldCache --> mapCachePrivate
                                 └--> mapCacheSystemWide

The relevant code:

    // Line: 6584
	// load shared cache
    // Check whether the shared cache is disabled. On iOS the shared cache is required
	checkSharedRegionDisable((dyld3::MachOLoaded*)mainExecutableMH, mainExecutableSlide);
	if ( gLinkContext.sharedRegionMode != ImageLoader::kDontUseSharedRegion ) {
#if TARGET_OS_SIMULATOR
		if ( sSharedCacheOverrideDir)
			mapSharedCache(mainExecutableSlide);
#else
        // Check whether the shared cache maps to the shared region
		mapSharedCache(mainExecutableSlide);
#endif
	}

// Line: 4078
static void mapSharedCache(uintptr_t mainExecutableSlide)
{
	dyld3::SharedCacheOptions opts;
	opts.cacheDirOverride	= sSharedCacheOverrideDir;
	opts.forcePrivate		= (gLinkContext.sharedRegionMode == ImageLoader::kUsePrivateSharedRegion);
#if __x86_64__ && !TARGET_OS_SIMULATOR
	opts.useHaswell			= sHaswell;
#else
	opts.useHaswell			= false;
#endif
	opts.verbose			= gLinkContext.verboseMapping;
    // <rdar://problem/32031197> respect -disable_aslr boot-arg
    // <rdar://problem/56299169> kern.bootargs is now blocked
	opts.disableASLR		= (mainExecutableSlide == 0) && dyld3::internalInstall(); // infer ASLR is off if main executable is not slid
	loadDyldCache(opts, &sSharedCacheLoadInfo);

	// update global state
	if ( sSharedCacheLoadInfo.loadAddress != nullptr ) {
		gLinkContext.dyldCache 								= sSharedCacheLoadInfo.loadAddress;
		dyld::gProcessInfo->processDetachedFromSharedRegion = opts.forcePrivate;
		dyld::gProcessInfo->sharedCacheSlide                = sSharedCacheLoadInfo.slide;
		dyld::gProcessInfo->sharedCacheBaseAddress          = (unsigned long)sSharedCacheLoadInfo.loadAddress;
		sSharedCacheLoadInfo.loadAddress->getUUID(dyld::gProcessInfo->sharedCacheUUID);
		dyld3::kdebug_trace_dyld_image(DBG_DYLD_UUID_SHARED_CACHE_A, sSharedCacheLoadInfo.path, (const uuid_t *)&dyld::gProcessInfo->sharedCacheUUID[0], {0,0}, {{ 0, 0 }}, (const mach_header *)sSharedCacheLoadInfo.loadAddress);
	}
}

// Line: 858
bool loadDyldCache(const SharedCacheOptions& options, SharedCacheLoadInfo* results)
{
    results->loadAddress        = 0;
    results->slide              = 0;
    results->errorMessage       = nullptr;

#if TARGET_OS_SIMULATOR
    // simulator only supports mmap()ing cache privately into process
    return mapCachePrivate(options, results);
#else
    if ( options.forcePrivate ) {
        // mmap cache into this process only
        return mapCachePrivate(options, results);
    }
    else {
        // fast path: when cache is already mapped into shared region
        bool hasError = false;
        if ( reuseExistingCache(options, results) ) {
            hasError = (results->errorMessage != nullptr);      // Already loaded
        } else {
            // slow path: this is first process to load cache
            hasError = mapCacheSystemWide(options, results);    // First load
        }
        return hasError;
    }
#endif
}
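Stripped of the mapping work itself, loadDyldCache is a three-way decision. The sketch below restates only that branch structure in portable C++; `CacheMapping`, `CacheOptions`, and `choose_mapping` are hypothetical names for this illustration, and the compile-time TARGET_OS_SIMULATOR check is modeled as an ordinary boolean so the logic can be exercised anywhere.

```cpp
// Which of the three paths in loadDyldCache would be taken.
enum class CacheMapping { Private, ReusedExisting, SystemWide };

// Trimmed-down, hypothetical stand-in for SharedCacheOptions.
struct CacheOptions {
    bool isSimulator;    // models #if TARGET_OS_SIMULATOR
    bool forcePrivate;   // models options.forcePrivate
};

// Mirrors the branch order of the excerpt above.
CacheMapping choose_mapping(const CacheOptions& opts, bool alreadyMappedInSharedRegion) {
    if (opts.isSimulator || opts.forcePrivate) {
        return CacheMapping::Private;          // mapCachePrivate
    }
    if (alreadyMappedInSharedRegion) {
        return CacheMapping::ReusedExisting;   // reuseExistingCache, the fast path
    }
    return CacheMapping::SystemWide;           // mapCacheSystemWide, the slow path
}
```

Only the first process to need the cache pays for the system-wide mapping; every later process lands on the fast path.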

3). Main program initialization

  • instantiateFromLoadedImage returns the ImageLoader
  • ImageLoaderMachO::instantiateMainExecutable creates the ImageLoader for the main program
  • sniffLoadCommands reads the load commands of the Mach-O file and performs various checks

The relevant code:

        // Line: 6860
		CRSetCrashLogMessage(sLoadingCrashMessage);
		// instantiate ImageLoader for main executable
        // Load the executable to generate an ImageLoader instance
		sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
		gLinkContext.mainExecutable = sMainExecutable;
		gLinkContext.mainExecutableCodeSigned = hasCodeSignatureLoadCommand(mainExecutableMH);

// Line: 3092
// The kernel maps in main executable before dyld gets control. We need to
// make an ImageLoader* for the already mapped in main executable.
static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
	// try mach-o loader
//	if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
		ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
		addImage(image);
		return (ImageLoaderMachO*)image;
//	}
	
//	throw "main executable not a known format";
}

// ImageLoaderMachO.cpp Line: 566
// create image for main executable
ImageLoader* ImageLoaderMachO::instantiateMainExecutable(const macho_header* mh, uintptr_t slide, const char* path, const LinkContext& context)
{
	//dyld::log("ImageLoader=%ld, ImageLoaderMachO=%ld, ImageLoaderMachOClassic=%ld, ImageLoaderMachOCompressed=%ld\n",
	//	sizeof(ImageLoader), sizeof(ImageLoaderMachO), sizeof(ImageLoaderMachOClassic), sizeof(ImageLoaderMachOCompressed));
	bool compressed;
	unsigned int segCount;
	unsigned int libCount;
	const linkedit_data_command* codeSigCmd;
	const encryption_info_command* encryptCmd;
	sniffLoadCommands(mh, path, false, &compressed, &segCount, &libCount, context, &codeSigCmd, &encryptCmd);
	// instantiate concrete class based on content of load commands
	if ( compressed ) 
		return ImageLoaderMachOCompressed::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
	else
#if SUPPORT_CLASSIC_MACHO
		return ImageLoaderMachOClassic::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
#else
		throw "missing LC_DYLD_INFO load command";
#endif
}

4). Insert dynamic library

This step iterates over the libraries listed in DYLD_INSERT_LIBRARIES and calls loadInsertedDylib to load each one, which is why the mechanism is also used for code injection attacks. While loading, the dylib is looked up through paths such as DYLD_ROOT_PATH, LD_LIBRARY_PATH, and DYLD_FRAMEWORK_PATH, and its signature is checked.

The relevant code

        // Line: 6974
		// load any inserted libraries
        // Load all the libraries specified by DYLD_INSERT_LIBRARIES
		if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
			for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
				loadInsertedDylib(*lib);
		}
		// record count of inserted libraries so that a flat search will look at 
		// inserted libraries, then main, then others.
		sInsertedDylibCount = sAllImages.size()-1;

// Line: 5176
static void loadInsertedDylib(const char* path)
{
	unsigned cacheIndex;
	try {
		LoadContext context;
		context.useSearchPaths		= false;
		context.useFallbackPaths	= false;
		context.useLdLibraryPath	= false;
		context.implicitRPath		= false;
		context.matchByInstallName	= false;
		context.dontLoad			= false;
		context.mustBeBundle		= false;
		context.mustBeDylib			= true;
		context.canBePIE			= false;
		context.origin				= NULL;	// can't use @loader_path with DYLD_INSERT_LIBRARIES
		context.rpath				= NULL;
		load(path, context, cacheIndex);
	}
	catch (const char* msg) {
		if ( gLinkContext.allowInsertFailures )
			dyld::log("dyld: warning: could not load inserted library '%s' into hardened process because %s\n", path, msg);
		else
			halt(dyld::mkstringf("could not load inserted library '%s' because %s\n", path, msg));
	}
	catch (...) {
		halt(dyld::mkstringf("could not load inserted library '%s'\n", path));
	}
}
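By the time the loop above runs, sEnv.DYLD_INSERT_LIBRARIES is already an array of paths. The environment variable itself is a single colon-separated string, so something has to split it first. The following sketch shows one way to do that; `split_path_list` is a made-up helper for this example and is not the function dyld uses.

```cpp
#include <string>
#include <vector>

// Split a colon-separated path list into its non-empty components.
std::vector<std::string> split_path_list(const std::string& value) {
    std::vector<std::string> paths;
    std::string::size_type start = 0;
    while (start <= value.size()) {
        std::string::size_type end = value.find(':', start);
        if (end == std::string::npos) end = value.size();
        if (end > start) {
            paths.push_back(value.substr(start, end - start));
        }
        start = end + 1;   // continue after the separator
    }
    return paths;
}
```

Each resulting path would then be handed to the loader one at a time, the way the excerpt calls loadInsertedDylib(*lib) per entry.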

5). Link the main program

The relevant code

        // Line: 6982
		// link main executable
        // Link the main program
		gLinkContext.linkingMainExecutable = true;
#if SUPPORT_ACCELERATE_TABLES
		if ( mainExcutableAlreadyRebased ) {
			// previous link() on main executable has already adjusted its internal pointers for ASLR
			// work around that by rebasing by inverse amount
			sMainExecutable->rebase(gLinkContext, -mainExecutableSlide);
		}
#endif
		link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
		sMainExecutable->setNeverUnloadRecursive();
		if ( sMainExecutable->forceFlat() ) {
			gLinkContext.bindFlat = true;
			gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
		}

6). Link dynamic library

The relevant code

        // Line: 6999
		// link any inserted libraries
		// do this after linking main executable so that any dylibs pulled in by inserted 
		// dylibs (e.g. libSystem) will not be in front of dylibs the program uses
        // Link all inserted dynamic libraries
		if ( sInsertedDylibCount > 0 ) {
			for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
				ImageLoader* image = sAllImages[i+1];
				link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
				image->setNeverUnloadRecursive();
			}
			if ( gLinkContext.allowInterposing ) {
				// only INSERTED libraries can interpose
				// register interposing info after all inserted libraries are bound so chaining works
				for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
					ImageLoader* image = sAllImages[i+1];
					image->registerInterposing(gLinkContext); // Register symbol interposing
				}
			}
		}

7). Weak symbol binding

The relevant code

        // Line: 7060
		// apply interposing to initial set of images
		for(int i=0; i < sImageRoots.size(); ++i) {
			// Apply symbol interposing
			sImageRoots[i]->applyInterposing(gLinkContext);
		}
		ImageLoader::applyInterposingToDyldCache(gLinkContext);

		// Bind and notify for the main executable now that interposing has been registered
		uint64_t bindMainExecutableStartTime = mach_absolute_time();
		// Note this call
		sMainExecutable->recursiveBindWithAccounting(gLinkContext, sEnv.DYLD_BIND_AT_LAUNCH, true);
		uint64_t bindMainExecutableEndTime = mach_absolute_time();
		ImageLoaderMachO::fgTotalBindTime += bindMainExecutableEndTime - bindMainExecutableStartTime;
		gLinkContext.notifyBatch(dyld_image_state_bound, false);

		// Bind and notify for the inserted images now interposing has been registered
		if ( sInsertedDylibCount > 0 ) {
			for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
				ImageLoader* image = sAllImages[i+1];
				image->recursiveBind(gLinkContext, sEnv.DYLD_BIND_AT_LAUNCH, true, nullptr);
			}
		}

		// <rdar://problem/12186933> do weak binding only after all inserted images linked
		// Weak symbol binding
		sMainExecutable->weakBind(gLinkContext);
		gLinkContext.linkingMainExecutable = false;

		sMainExecutable->recursiveMakeDataReadOnly(gLinkContext);

8). Execute the initialization method

The relevant code

    // Line: 7087
		CRSetCrashLogMessage("dyld: launch, running initializers");
	#if SUPPORT_OLD_CRT_INITIALIZATION
		// Old way is to run initializers via a callback from crt1.o
		if ( !gRunInitializersOldWay )
			initializeMainExecutable();
	#else
		// run all initializers
        // Perform initialization
		initializeMainExecutable();
	#endif

// Line: 1636
void initializeMainExecutable()
{
	// record that we've reached this step
	gLinkContext.startedInitializingMainExecutable = true;

	// run initialzers for any inserted dylibs
	ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
	initializerTimes[0].count = 0;
	const size_t rootCount = sImageRoots.size();
	if ( rootCount > 1 ) {
		for(size_t i=1; i < rootCount; ++i) {
			sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
		}
	}

	// run initializers for main executable and everything it brings up 
	sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);
	
	// register cxa_atexit() handler to run static terminators in all loaded images when this process exits
	if ( gLibSystemHelpers != NULL ) 
		(*gLibSystemHelpers->cxa_atexit)(&runAllStaticTerminators, NULL, NULL);

	// dump info if requested
	if ( sEnv.DYLD_PRINT_STATISTICS )
		ImageLoader::printStatistics((unsigned int)allImagesCount(), initializerTimes[0]);
	if ( sEnv.DYLD_PRINT_STATISTICS_DETAILS )
		ImageLoaderMachO::printStatisticsDetails((unsigned int)allImagesCount(), initializerTimes[0]);
}

// ImageLoader.cpp Line: 609
void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
	uint64_t t1 = mach_absolute_time();
	mach_port_t thisThread = mach_thread_self();
	ImageLoader::UninitedUpwards up;
	up.count = 1;
	up.imagesAndPaths[0] = { this, this->getPath() };
	processInitializers(context, thisThread, timingInfo, up);
	context.notifyBatch(dyld_image_state_initialized, false);
	mach_port_deallocate(mach_task_self(), thisThread);
	uint64_t t2 = mach_absolute_time();
	fgTotalInitTime += (t2 - t1);
}

// ImageLoader.cpp Line: 587
// <rdar://problem/14412057> upward dylib initializers can be run too soon
// To handle dangling dylibs which are upward linked but not downward, all upward linked dylibs
// have their initialization postponed until after the recursion through downward dylibs
// has completed.
void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
									 InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
	uint32_t maxImageCount = context.imageCount() +2;
	ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
	ImageLoader::UninitedUpwards& ups = upsBuffer[0];
	ups.count = 0;
	// Calling recursive init on all images in images list, building a new list of
	// uninitialized upward dependencies.
	for (uintptr_t i=0; i < images.count; ++i) {
		images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
	}
	// If any upward dependencies remain, init them.
	if ( ups.count > 0 )
		processInitializers(context, thisThread, timingInfo, ups);
}

// ImageLoader.cpp Line: 1595
// Get the image initialization
void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
										  InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
	recursive_lock lock_info(this_thread);
	recursiveSpinLock(lock_info);

	if ( fState < dyld_image_state_dependents_initialized-1 ) {
		uint8_t oldState = fState;
		// break cycles
		fState = dyld_image_state_dependents_initialized-1;
		try {
			// initialize lower level libraries first
			for(unsigned int i=0; i < libraryCount(); ++i) {
				ImageLoader* dependentImage = libImage(i);
				if ( dependentImage != NULL ) {
					// don't try to initialize stuff "above" me yet
					if ( libIsUpward(i) ) {
						uninitUps.imagesAndPaths[uninitUps.count] = { dependentImage, libPath(i) };
						uninitUps.count++;
					}
					else if ( dependentImage->fDepth >= fDepth ) {
						dependentImage->recursiveInitialization(context, this_thread, libPath(i), timingInfo, uninitUps);
					}
				}
			}

			// record termination order
			if ( this->needsTermination() )
				context.terminationRecorder(this);

			// let objc know we are about to initialize this image
			uint64_t t1 = mach_absolute_time();
			fState = dyld_image_state_dependents_initialized;
			oldState = fState;
			context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
			
			// initialize this image
			bool hasInitializers = this->doInitialization(context);

			// let anyone know we finished initializing this image
			fState = dyld_image_state_initialized;
			oldState = fState;
			context.notifySingle(dyld_image_state_initialized, this, NULL);
			
			if ( hasInitializers ) {
				uint64_t t2 = mach_absolute_time();
				timingInfo.addTime(this->getShortName(), t2-t1);
			}
		}
		catch (const char* msg) {
			// this image is not initialized
			fState = oldState;
			recursiveSpinUnLock();
			throw;
		}
	}
	
	recursiveSpinUnLock();
}
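The ordering that recursiveInitialization enforces can be seen more easily on a toy dependency graph. The sketch below is a simplified model, not dyld’s algorithm in full: `Image`, `recursive_init`, and `demo_init_order` are invented for this example, and upward dependencies, depth checks, locking, and error handling are all left out. What it keeps is the core order per image: dependencies first, then the notification to objc (where +load runs), then the image’s own initializers (where constructor functions run).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical miniature of a loaded image.
struct Image {
    std::string name;
    std::vector<std::size_t> deps;   // indices of the images this one depends on
    bool visited;
};

// Depth-first initialization: dependencies, then objc notification, then initializers.
void recursive_init(std::vector<Image>& images, std::size_t index,
                    std::vector<std::string>& log) {
    assert(index < images.size());
    Image& img = images[index];
    if (img.visited) return;   // already handled; this also breaks cycles
    img.visited = true;
    for (std::size_t i = 0; i < img.deps.size(); ++i) {
        recursive_init(images, img.deps[i], log);
    }
    log.push_back(img.name + ": notify objc (+load)");
    log.push_back(img.name + ": run initializers (constructors)");
}

// Sample graph: app -> {libA, libB}, libA -> libB.
std::vector<std::string> demo_init_order() {
    std::vector<Image> images;
    images.push_back(Image{"app",  {1, 2}, false});
    images.push_back(Image{"libA", {2},    false});
    images.push_back(Image{"libB", {},     false});
    std::vector<std::string> log;
    recursive_init(images, 0, log);
    return log;
}
```

In this model libB is fully initialized before libA, and libA before app, and within each image the +load notification comes before the constructors. That is the same order seen in the printed output at the start of the article.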

notifySingle function

The relevant code

// dyld2.cpp Line: 985
static void notifySingle(dyld_image_states state, const ImageLoader* image, ImageLoader::InitializerTimingList* timingInfo)
{
	...
	if ( state == dyld_image_state_mapped ) {
		// <rdar://problem/7008875> Save load addr + UUID for images from outside the shared cache
		// <rdar://problem/50432671> Include UUIDs for shared cache dylibs in all image info when using private mapped shared caches
		if ( !image->inSharedCache()
			|| (gLinkContext.sharedRegionMode == ImageLoader::kUsePrivateSharedRegion)) {
			dyld_uuid_info info;
			if ( image->getUUID(info.imageUUID) ) {
				info.imageLoadAddress = image->machHeader();
				addNonSharedCacheImageUUID(info);
			}
		}
	}
	if ( (state == dyld_image_state_dependents_initialized) && (sNotifyObjCInit != NULL) && image->notifyObjC() ) {
		uint64_t t0 = mach_absolute_time();
		dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
        // Pay attention to this call
		(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
		uint64_t t1 = mach_absolute_time();
		uint64_t t2 = mach_absolute_time();
		uint64_t timeInObjC = t1-t0;
		uint64_t emptyTime = (t2-t1)*100;
		if ( (timeInObjC > emptyTime) && (timingInfo != NULL) ) {
			timingInfo->addTime(image->getShortName(), timeInObjC);
		}
	}
	...
}

// Line: 4643
void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
	// record functions to call
	sNotifyObjCMapped	= mapped;
	sNotifyObjCInit		= init;     // Assign
	sNotifyObjCUnmapped = unmapped;

	// call 'mapped' function with all images mapped so far
	try {
		notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
	}
	catch (const char* msg) {
		// ignore request to abort during registration
	}

	// <rdar://problem/32209809> call 'init' function on all images already init'ed (below libSystem)
	for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
		ImageLoader* image = *it;
		if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
			dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
			(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
		}
	}
}

// dyldAPIs.cpp line: 2188
// This function is only available to the objc runtime
// Search the libobjc source for calls to _dyld_objc_notify_register
void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
	dyld::registerObjCNotifiers(mapped, init, unmapped);   // it is called here
}

A search for _dyld_objc_notify_register in the objc4 source code shows that it is called in _objc_init, which passes the callbacks as arguments.

So sNotifyObjCInit is set to objc's load_images, and load_images calls all the +load methods. In other words, notifySingle invokes a callback function that objc registered.
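The registration pattern can be sketched in isolation. This is a minimal model and not dyld's code: `Image`, `Registry`, `State`, and `loadImages` are hypothetical names, and only the behaviour visible in `registerObjCNotifiers` and `notifySingle` above is modelled (store the callback, replay it for images that are already initialized, and invoke it when a later image reaches the dependents-initialized state).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins for dyld's image states (assumed subset).
enum class State { Bound, DependentsInitialized, Initialized };

struct Image {
    std::string path;
    State state;
};

using InitCallback = void (*)(const std::string& path);

struct Registry {
    std::vector<Image> images;
    InitCallback onInit = nullptr;   // plays the role of sNotifyObjCInit

    // Models registerObjCNotifiers: remember the callback, then replay it
    // for every image that finished initializing before registration.
    void registerInit(InitCallback cb) {
        onInit = cb;
        for (const Image& img : images) {
            if (img.state == State::Initialized) onInit(img.path);
        }
    }

    // Models the relevant branch of notifySingle: the callback only fires
    // for the dependents-initialized state, and only once it is registered.
    void notifySingle(State state, const Image& img) {
        if (state == State::DependentsInitialized && onInit != nullptr) {
            onInit(img.path);
        }
    }
};

// Records which images the callback saw, in order.
static std::vector<std::string> g_loaded;

// Hypothetical stand-in for objc's load_images.
static void loadImages(const std::string& path) { g_loaded.push_back(path); }

// Runs the scenario and returns the callback log.
std::vector<std::string> runRegistrationDemo() {
    g_loaded.clear();
    Registry r;
    r.images.push_back({"libA.dylib", State::Initialized});
    Image app{"App", State::Bound};

    r.notifySingle(State::DependentsInitialized, app); // nothing registered yet
    assert(g_loaded.empty());                          // so the log is still empty
    r.registerInit(loadImages);                        // replays libA.dylib
    r.notifySingle(State::DependentsInitialized, app); // now reaches loadImages
    return g_loaded;
}
```

Calling `runRegistrationDemo()` returns the image paths in the order the callback received them: first the replayed `libA.dylib`, then `App`.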

The call chain of the initialization process is fairly long. We will focus on it in the next section.

9). Main program entry

The code for the program entry in dyld2.cpp is as follows:

// Line: 7104
#if TARGET_OS_OSX
		if ( gLinkContext.driverKit ) {
			result = (uintptr_t)sEntryOverride;
			if ( result == 0 )
				halt("no entry point registered");
			*startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
		}
		else
#endif
		{
			// find entry point for main executable
			result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();
			if ( result != 0 ) {
				// main executable uses LC_MAIN, we need to use helper in libdyld to call into main()
				if ( (gLibSystemHelpers != NULL) && (gLibSystemHelpers->version >= 9) )
					*startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
				else
					halt("libdyld.dylib support not present for LC_MAIN");
			}
			else {
				// main executable uses LC_UNIXTHREAD, dyld needs to let "start" in program set up for main()
				result = (uintptr_t)sMainExecutable->getEntryFromLC_UNIXTHREAD();
				*startGlue = 0;
			}
		}

How can we prove that the +load method is called before the __attribute__((constructor)) function?

The easiest way to do this is with breakpoints.

You can see the current breakpoint is in the load method

The current backtrace

* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 8.1
  * frame #0: 0x0000000100003e60 Dyld`+[Person load](self=Person, _cmd="load") at main.m:17:5
    frame #1: 0x00007fff203ab4d6 libobjc.A.dylib`load_images + 1556
    frame #2: 0x0000000100016527 dyld`dyld::notifySingle(dyld_image_states, ImageLoader const*, ImageLoader::InitializerTimingList*) + 425
    frame #3: 0x000000010002c794 dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 474
    frame #4: 0x000000010002a55f dyld`ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 191
    frame #5: 0x000000010002a600 dyld`ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&) + 82
    frame #6: 0x00000001000168b7 dyld`dyld::initializeMainExecutable() + 199
    frame #7: 0x000000010001ceb8 dyld`dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*) + 8702
    frame #8: 0x0000000100015224 dyld`dyldbootstrap::start(dyld3::MachOLoaded const*, int, char const**, dyld3::MachOLoaded const*, unsigned long*) + 450
    frame #9: 0x0000000100015025 dyld`_dyld_start + 37

Next comes the constructor function, __attribute__((constructor)) void cc_func()

And finally, main()

The current backtrace

* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 11.1
  * frame #0: 0x0000000100003eb6 Dyld`main(argc=1, argv=0x00007ffeefbff3e8) at main.m:27:22
    frame #1: 0x00007fff20528f3d libdyld.dylib`start + 1
    frame #2: 0x00007fff20528f3d libdyld.dylib`start + 1

As you can see, after the first two steps have run, control returns to _dyld_start, and main() is then called (through libdyld's start, as the backtrace shows).
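The constructor-before-main part of this ordering can be reproduced without Objective-C. The sketch below assumes a GCC- or Clang-compatible compiler, since `__attribute__((constructor))` is a compiler extension. The `record` helper, the event log, and `recordMainEntered` are hypothetical names added for illustration, and `+load` itself is not modelled because it needs the Objective-C runtime.

```cpp
#include <cassert>
#include <cstring>

// Fixed-size log of plain pointers. It is zero-initialized at load time
// (no dynamic initializer), so it is safe to touch before main().
static const int kMaxEvents = 8;
static const char* g_events[kMaxEvents];
static int g_event_count = 0;

static void record(const char* name) {
    assert(g_event_count < kMaxEvents);
    g_events[g_event_count++] = name;
}

// Same attribute as the article's cc_func: runs during image
// initialization, before main() is entered.
__attribute__((constructor)) static void cc_func() {
    record("cc_func");
}

// Call this as the first statement of main(); it logs the entry into main
// and returns the number of events recorded so far.
int recordMainEntered() {
    record("main");
    return g_event_count;
}
```

If `recordMainEntered()` is the first call in `main`, the log holds `cc_func` followed by `main`, which matches the order of the last two lines of output at the start of the article.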

3. Initialization process

We now have a clear picture of how the app is loaded, but is that the whole process?

Of course not. Now that we’ve started digging, let’s look at how the app is loaded and initialized.

1. Review

Review the previous breakpoint approach and explore step by step.

Set the breakpoint in the +load method and run bt:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 8.1
    frame #0: 0x0000000100003e60 Dyld`+[Person load](self=Person, _cmd="load") at main.m:17:5
    frame #1: 0x00007fff203ab4d6 libobjc.A.dylib`load_images + 1556
    frame #2: 0x0000000100016527 dyld`dyld::notifySingle(dyld_image_states, ImageLoader const*, ImageLoader::InitializerTimingList*) + 425
    frame #3: 0x000000010002c794 dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 474
    frame #4: 0x000000010002a55f dyld`ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 191
    frame #5: 0x000000010002a600 dyld`ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&) + 82
  * frame #6: 0x00000001000168b7 dyld`dyld::initializeMainExecutable() + 199
    frame #7: 0x000000010001ceb8 dyld`dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*) + 8702
    frame #8: 0x0000000100015224 dyld`dyldbootstrap::start(dyld3::MachOLoaded const*, int, char const**, dyld3::MachOLoaded const*, unsigned long*) + 450
    frame #9: 0x0000000100015025 dyld`_dyld_start + 37

2. dyld start

From the stack information, we start the trace from _dyld_start

As you can see from the assembly, dyldbootstrap::start comes next, and it calls dyld::_main

After that, dyld::initializeMainExecutable() is called

Next, ImageLoader::runInitializers is called

ImageLoader::processInitializers is then called

Internally, it calls ImageLoader::recursiveInitialization

There you see the call to notifySingle

You can see that the callback it invokes is registered in dyld::registerObjCNotifiers

One level up, you can see that it is called from _dyld_objc_notify_register

At this point, the trail goes cold. There’s no more information

Going back over the chain, we see that we skipped a method: ImageLoader::doInitialization

How does it work internally?

The doInitialization implementation

3. libSystem init

Next, we just have to look at the calls in libSystem

As you can see, it calls libdispatch_init internally

4. dispatch init

libdispatch_init()

Here you see a call to _objc_init

When we set the next symbolic breakpoint, on _objc_init, we discover new ground

This method calls objc's _objc_init

5. objc init

Next we look into objc and find _dyld_objc_notify_register, the function that puzzled us earlier

As you can see, this is where it gets called

Aha! So what notifySingle invokes is a callback function

_objc_init passes load_images as the second argument, so when the callback fires, load_images runs

If you look at the calls inside load_images, you can see why the +[Person load] method was called.
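The whole ordering can be summarized as a small simulation. This is a simplified sketch under stated assumptions, not dyld's implementation: the `Image` struct, `g_notifyInit`, and `simulateLaunch` are hypothetical names, the libSystem, libdispatch, and objc chain is collapsed into a single initializer, and the replay of already-initialized images is left out. It only models the three steps traced above: dependencies are initialized first, then the registered callback is notified, then the image's own initializers run.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Ordered trace of what happened during the simulated launch.
static std::vector<std::string> g_trace;

// Plays the role of dyld's sNotifyObjCInit; empty until "objc" registers.
static std::function<void(const std::string&)> g_notifyInit;

struct Image {
    std::string name;
    std::vector<Image*> dependencies;
    std::function<void()> initializers;  // constructors or library init code
    bool initialized;                    // omitted below, so it starts false
};

// Simplified model of ImageLoader::recursiveInitialization.
void recursiveInitialization(Image& image) {
    if (image.initialized) return;
    image.initialized = true;                      // guard against cycles
    for (Image* dep : image.dependencies) {        // 1. dependencies first
        assert(dep != nullptr);
        recursiveInitialization(*dep);
    }
    if (g_notifyInit) g_notifyInit(image.name);    // 2. notifySingle -> +load
    if (image.initializers) image.initializers();  // 3. doInitialization
}

// Builds a two-image program and returns the resulting trace.
std::vector<std::string> simulateLaunch() {
    g_trace.clear();
    g_notifyInit = nullptr;

    Image libSystem{"libSystem", {}, [] {
        g_trace.push_back("libSystem init");
        // libSystem init -> libdispatch init -> _objc_init registers the callback.
        g_notifyInit = [](const std::string& image) {
            g_trace.push_back("+load in " + image);
        };
    }};
    Image app{"App", {&libSystem}, [] {
        g_trace.push_back("constructors in App");
    }};

    recursiveInitialization(app);
    g_trace.push_back("main");
    return g_trace;
}
```

The trace returned by `simulateLaunch()` lists the libSystem initializer first, then the `+load` notification for `App`, then the constructors in `App`, and finally `main`. Because the callback is registered while a dependency is being initialized, it is already in place when the main image is notified.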

At this point, you should have a better understanding of dyld loading and application initialization.

The relationship between the libraries is shown below:

Let’s take a look at what _objc_init does.