Application loading: from dyld to objc

To understand application loading, we need to answer the following questions:

How does the code we write get loaded into memory?
How are the dynamic and static libraries we use loaded into memory?
How is ObjC started?

All of our programs rely on libraries such as UIKit, CoreFoundation, libSystem, etc. These libraries are actually executable binaries that can be loaded into memory by the operating system. There are two types of libraries: dynamic libraries and static libraries.

The whole compilation process is shown as follows:

First, our source code is preprocessed, then goes through lexical and syntax analysis, and is handed to the compiler, which generates the corresponding assembly files. These are then assembled and linked together with the libraries to produce our executable file.

The difference between dynamic and static libraries lies in how they are linked: one is linked dynamically, the other statically.

Static linking: the library code is copied into every executable that uses it, so the same code can end up in memory several times, which wastes space and performance. Dynamic linking: the library is not copied into the executable; it is mapped into memory at load time, and a single copy in memory is shared. Most of Apple's libraries are dynamic libraries.

How are these libraries loaded into memory? This requires the dynamic linker, dyld. What dyld does is shown below:

Let’s start with dyld by creating a breakpoint at main and executing the program:

In the call stack, a start function runs before main. Selecting that frame shows that start is executed in dyld:

We add a symbolic breakpoint named start, run the program, and find that the breakpoint is never hit.

Why doesn't it stop at start? This suggests that the function actually called underneath is not named start. We know that +load runs before main, so let's add a +load method and set a breakpoint in it:

Look at the call stack and select _dyld_start at the bottom.

The dyld source code can be downloaded from Apple's open source website. Search it for _dyld_start and look at the arm64 implementation:

Following the main logic of this code, the comment points us to the call to the start function:

Search globally for the dyldbootstrap namespace to find the start function:

uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
				const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{
	// Emit kdebug tracepoint to indicate dyld bootstrap has started <rdar://46878536>
	dyld3::kdebug_trace_dyld_marker(DBG_DYLD_TIMING_BOOTSTRAP_START, 0, 0, 0, 0);

	// if kernel had to slide dyld, we need to fix up load sensitive locations
	// we have to do this before using any global variables
	rebaseDyld(dyldsMachHeader);

	// kernel sets up env pointer to be just past end of agv array
	const char** envp = &argv[argc+1];

	// kernel sets up apple pointer to be just past end of envp array
	const char** apple = envp;
	while(*apple != NULL) { ++apple; }
	++apple;

	// set up random value for stack canary
	__guard_setup(apple);

#if DYLD_INITIALIZER_SUPPORT
	// run all C++ initializers inside dyld
	runDyldInitializers(argc, argv, envp, apple);
#endif

	_subsystem_init(apple);

	// now that we are done bootstrapping dyld, call dyld's main
	uintptr_t appsSlide = appsMachHeader->getSlide();
	return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}
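The pointer arithmetic that start uses to find envp and apple can be tried out in isolation. The following is a minimal sketch, assuming the layout described by the comments in start (argv, NULL, envp, NULL, apple, NULL). The helper findApple, the array kFakeVector, and the sample strings are made up for illustration and are not part of dyld.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper mirroring the pointer arithmetic in start():
// given argc/argv, return the first slot after envp's NULL terminator.
const char** findApple(int argc, const char* argv[]) {
    const char** envp = &argv[argc + 1];    // skip argv and its NULL terminator
    const char** apple = envp;
    while (*apple != nullptr) { ++apple; }  // walk to the end of envp
    ++apple;                                // step over envp's NULL terminator
    return apple;
}

// A fake vector laid out the way the comments in start() describe it:
// argv..., NULL, envp..., NULL, apple..., NULL (all strings are made up).
static const char* kFakeVector[] = {
    "MyApp", "-verbose", nullptr,           // argv (argc == 2)
    "HOME=/tmp", "LANG=C", nullptr,         // envp
    "executable_path=/tmp/MyApp", nullptr   // apple
};

// Convenience wrapper: the first string of the apple array in the fake vector.
const char* firstAppleString() {
    return findApple(2, kFakeVector)[0];
}
```

Calling firstAppleString() walks past the two argv entries and the two environment entries and lands on the first apple string, which is the same walk start performs on the real vector handed over by the kernel.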

The most important part of start is its last line, the call to dyld::_main. Stepping into it, _main turns out to be nearly 1,000 lines long, which makes it hard to read from top to bottom. Since the function returns a value, let's start from the return value and see which operations produce it.

The return value is result, where:

result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();
result = (uintptr_t)sMainExecutable->getEntryFromLC_UNIXTHREAD();
result = (uintptr_t)&fake_main;

The result comes from sMainExecutable, so next we look at where sMainExecutable is initialized:

/// instantiate ImageLoader for main executable
sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);

The comment makes it clear: this instantiates the ImageLoader for the main executable, that is, it initializes the image. Stepping in:

static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
	// try mach-o loader
//	if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
		ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
		addImage(image);
		return (ImageLoaderMachO*)image;
//	}
	
//	throw "main executable not a known format";
}

Stepping into instantiateMainExecutable, we see the call sniffLoadCommands(mh, path, false, &compressed, &segCount, &libCount, context, &codeSigCmd, &encryptCmd); which reads the load commands of the Mach-O file in order to load it.

Back in dyld::_main, the remaining steps are as follows. Shared cache handling:

// load shared cache
checkSharedRegionDisable((dyld3::MachOLoaded*)mainExecutableMH, mainExecutableSlide);

Load the inserted dynamic libraries:

// load any inserted libraries
if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
	for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
		loadInsertedDylib(*lib);
}

Link the main program:

link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);

Link the inserted dynamic libraries:

link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);

Weak-bind the main program:

// <rdar://problem/12186933> do weak binding only after all inserted images linked
sMainExecutable->weakBind(gLinkContext);

Initialize the main program:

// run all initializers
initializeMainExecutable(); 

Notify monitoring processes that the process is about to enter main():

// notify any montoring proccesses that this process is about to enter main()
notifyMonitoringDyldMain();

initializeMainExecutable

void initializeMainExecutable()
{
	// record that we've reached this step
	gLinkContext.startedInitializingMainExecutable = true;

	// run initialzers for any inserted dylibs
	ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
	initializerTimes[0].count = 0;
	const size_t rootCount = sImageRoots.size();
	if ( rootCount > 1 ) {
		for(size_t i=1; i < rootCount; ++i) {
			sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
		}
	}

	// run initializers for main executable and everything it brings up
	sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);

	// register cxa_atexit() handler to run static terminators in all loaded images when this process exits
	if ( gLibSystemHelpers != NULL )
		(*gLibSystemHelpers->cxa_atexit)(&runAllStaticTerminators, NULL, NULL);

	// dump info if requested
	if ( sEnv.DYLD_PRINT_STATISTICS )
		ImageLoader::printStatistics((unsigned int)allImagesCount(), initializerTimes[0]);
	if ( sEnv.DYLD_PRINT_STATISTICS_DETAILS )
		ImageLoaderMachO::printStatisticsDetails((unsigned int)allImagesCount(), initializerTimes[0]);
}

It first runs the initializers of any inserted dylibs in a loop, and then those of the main executable and everything it brings up. Step into runInitializers to see how the initialization is done:

void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
	uint64_t t1 = mach_absolute_time();
	mach_port_t thisThread = mach_thread_self();
	ImageLoader::UninitedUpwards up;
	up.count = 1;
	up.imagesAndPaths[0] = { this, this->getPath() };
	processInitializers(context, thisThread, timingInfo, up);
	context.notifyBatch(dyld_image_state_initialized, false);
	mach_port_deallocate(mach_task_self(), thisThread);
	uint64_t t2 = mach_absolute_time();
	fgTotalInitTime += (t2 - t1);
}

The most important call here is processInitializers. Stepping in:

void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
									 InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
	uint32_t maxImageCount = context.imageCount()+2;
	ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
	ImageLoader::UninitedUpwards& ups = upsBuffer[0];
	ups.count = 0;
	// Calling recursive init on all images in images list, building a new list of
	// uninitialized upward dependencies.
	for (uintptr_t i=0; i < images.count; ++i) {
		images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
	}
	// If any upward dependencies remain, init them.
	if ( ups.count > 0 )
		processInitializers(context, thisThread, timingInfo, ups);
}

Each image in the list calls recursiveInitialization. Find its implementation: void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize, InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)

It initializes the images that this image depends on first, and then initializes the image itself. Along the way it calls notifySingle, which in turn calls (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());. Searching globally for sNotifyObjCInit shows that it is assigned in registerObjCNotifiers, and registerObjCNotifiers is called from _dyld_objc_notify_register. A function with the same name, _dyld_objc_notify_register, is called in _objc_init in the objc source code. Inside dyld, however, the trail from dyld to objc goes cold at this point, so we need to find another way.
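The "dependencies first, then the image itself" order can be illustrated with a small toy model. This is only a sketch of the ordering idea: ToyImage, recursiveInit, and the library names are hypothetical and are not dyld's ImageLoader, which also handles upward dependencies, locking, and notifications.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an image; not dyld's ImageLoader.
struct ToyImage {
    std::string name;
    std::vector<ToyImage*> dependencies;
    bool initialized;

    ToyImage(std::string n, std::vector<ToyImage*> deps)
        : name(std::move(n)), dependencies(std::move(deps)), initialized(false) {}
};

// Initialize everything an image depends on first, then the image itself.
// `order` records the sequence in which images finished initializing.
void recursiveInit(ToyImage& image, std::vector<std::string>& order) {
    if (image.initialized) return;   // each image is initialized only once
    image.initialized = true;        // mark early so dependency cycles terminate
    for (ToyImage* dep : image.dependencies) {
        recursiveInit(*dep, order);
    }
    order.push_back(image.name);     // the image's own initializers would run here
}

// Build a small made-up dependency graph and return the resulting order.
std::vector<std::string> demoInitOrder() {
    ToyImage libBase("libBase", {});
    ToyImage libMiddle("libMiddle", {&libBase});
    ToyImage libTop("libTop", {&libMiddle, &libBase});
    ToyImage app("App", {&libTop, &libMiddle});
    std::vector<std::string> order;
    recursiveInit(app, order);
    return order;
}
```

In this model demoInitOrder() yields libBase, libMiddle, libTop, App: the executable at the root of the graph is the last one to be initialized.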

Open the objc source project, set a breakpoint in _objc_init, and look at the call stack:

Reading the stack from the bottom up gives the execution order:

_dyld_start
dyldbootstrap::start
dyld::_main
dyld::initializeMainExecutable()
runInitializers
processInitializers
recursiveInitialization
doInitialization
doModInitFunctions

These frames all belong to dyld. Next in the stack is _os_object_init in libdispatch. Download the libdispatch source code and search globally for _os_object_init:

void
_os_object_init(void)
{
	_objc_init();
	Block_callbacks_RR callbacks = {
		sizeof(Block_callbacks_RR),
		(void (*)(const void *))&objc_retain,
		(void (*)(const void *))&objc_release,
		(void (*)(const void *))&_os_objc_destructInstance
	};
	_Block_use_RR2(&callbacks);
#if DISPATCH_COCOA_COMPAT
	const char *v = getenv("OBJC_DEBUG_MISSING_POOLS");
	if (v) _os_object_debug_missing_pools = _dispatch_parse_bool(v);
	v = getenv("DISPATCH_DEBUG_MISSING_POOLS");
	if (v) _os_object_debug_missing_pools = _dispatch_parse_bool(v);
	v = getenv("LIBDISPATCH_DEBUG_MISSING_POOLS");
	if (v) _os_object_debug_missing_pools = _dispatch_parse_bool(v);
#endif
}

So _os_object_init calls _objc_init directly. According to the call stack, _os_object_init is called by libdispatch_init, and libdispatch_init is called by libSystem_initializer in libSystem. What calls libSystem_initializer? Searching the dyld source code for libSystem_initializer leads to the code related to calling it:

That code is in doModInitFunctions, which is called by doInitialization, and this is where the chain finally connects:

At this point we have traced the whole path from dyld to _objc_init. In _objc_init we can see the call _dyld_objc_notify_register(&map_images, load_images, unmap_image);. map_images and load_images are only passed as arguments here, so when are they actually called? It has to happen somewhere else, so let's keep exploring.

Searching the dyld source code for _dyld_objc_notify_register turns up several definitions. Which one is executed? Create a new project, add a symbolic breakpoint on _dyld_objc_notify_register, and run it:

So the _dyld_objc_notify_register code that runs should be the following:

void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
	if ( gUseDyld3 )
		return dyld3::_dyld_objc_notify_register(mapped, init, unmapped);

	DYLD_LOCK_THIS_BLOCK;
    typedef bool (*funcType)(_dyld_objc_notify_mapped, _dyld_objc_notify_init, _dyld_objc_notify_unmapped);
    static funcType __ptrauth_dyld_function_ptr p = NULL;

	if(p == NULL)
	    dyld_func_lookup_and_resign("__dyld_objc_notify_register", &p);
	p(mapped, init, unmapped);
}

void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
    log_apis("_dyld_objc_notify_register(%p, %p, %p)\n", mapped, init, unmapped);

    gAllImages.setObjCNotifiers(mapped, init, unmapped);
}

To find where map_images and load_images are called on this path, you need to find where _objcNotifyMapped and _objcNotifyInit are called.

Earlier, when analyzing how notifySingle calls sNotifyObjCInit, we looked for where sNotifyObjCInit is assigned:

It is assigned in registerObjCNotifiers, which _dyld_objc_notify_register calls with its three parameters. In other words, registration sets up the functions that notifySingle will later execute, but exactly when they are called is still unknown.

Earlier we also saw this version of _dyld_objc_notify_register:

void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
	dyld::registerObjCNotifiers(mapped, init, unmapped);
}

The three functions passed in from objc, map_images, load_images, and unmap_image, are recorded by registerObjCNotifiers:

void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
	// record functions to call
	sNotifyObjCMapped	= mapped;
	sNotifyObjCInit		= init;
	sNotifyObjCUnmapped = unmapped;

	// call 'mapped' function with all images mapped so far
	try {
		notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
	}
	catch (const char* msg) {
		// ignore request to abort during registration
	}

	// <rdar://problem/32209809> call 'init' function on all images already init'ed (below libSystem)
	for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
		ImageLoader* image = *it;
		if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
			dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
			(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
		}
	}
}

So now we need to find where sNotifyObjCMapped(), sNotifyObjCInit(), and sNotifyObjCUnmapped() are called. These three calls are what execute the functions we care about most: map_images, load_images, and unmap_image.

Search globally for sNotifyObjCMapped and look for its call. The chain is _dyld_objc_notify_register -> registerObjCNotifiers -> notifyBatchPartial -> sNotifyObjCMapped. That is, when _dyld_objc_notify_register is executed, the function passed as its first argument (map_images) is executed.
Search globally for sNotifyObjCInit. There are two call sites. a (dyld_image_state_initialized): _dyld_objc_notify_register -> registerObjCNotifiers -> sNotifyObjCInit(), where load_images is called directly for every image that is already initialized at registration time. b (dyld_image_state_dependents_initialized): notifySingle -> sNotifyObjCInit(), where load_images is invoked through notifySingle once an image's dependencies have been initialized.
Search globally for sNotifyObjCUnmapped. It is called in removeImage when an image is removed. We will not analyze it in depth for now.
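The registration pattern behind these call sites can be sketched with a toy loader. This is a simplified model under stated assumptions: ToyLoader, toyMapImages, toyLoadImages, and the image names are hypothetical, and the real registerObjCNotifiers works on image states rather than on a plain list of names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Callback types for the toy model; they only loosely resemble dyld's.
using MappedFn = void (*)(const std::vector<std::string>& images);
using InitFn   = void (*)(const std::string& image);

struct ToyLoader {
    std::vector<std::string> initializedImages;  // images initialized so far
    MappedFn mapped = nullptr;
    InitFn init = nullptr;

    // Like registerObjCNotifiers: remember the callbacks, then replay them
    // for the images that were already set up before registration.
    void registerNotifiers(MappedFn m, InitFn i) {
        mapped = m;
        init = i;
        mapped(initializedImages);
        for (const std::string& image : initializedImages) init(image);
    }

    // Like the notifySingle path: an image initialized later triggers init.
    void finishInitializing(const std::string& image) {
        initializedImages.push_back(image);
        if (init != nullptr) init(image);
    }
};

static std::vector<std::string> gLog;  // records the order of callback calls

void toyMapImages(const std::vector<std::string>& images) {
    gLog.push_back("map_images(" + std::to_string(images.size()) + ")");
}

void toyLoadImages(const std::string& image) {
    gLog.push_back("load_images(" + image + ")");
}

std::vector<std::string> demoNotifierOrder() {
    gLog.clear();
    ToyLoader loader;
    loader.finishInitializing("libEarly");  // before registration: nothing to call
    loader.registerNotifiers(&toyMapImages, &toyLoadImages);
    loader.finishInitializing("App");       // after registration: callback fires
    return gLog;
}
```

In this model, the image that existed before registration gets its callbacks replayed at registration time (call site a above), while the image initialized afterwards is reported as it finishes (call site b above).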

The following flow chart summarizes how loading goes from dyld to objc and where map_images and load_images are called:

Execution order of load, C++ constructors, and main

Now let's look at another interesting phenomenon. In main we print Hello, World!, and we add a C++ constructor function to the same file:

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        SJLog(@"Hello, World!");
    }
    return 0;
}

__attribute__((constructor)) void sjFunc() {
    printf("C++ comes %s \n", __func__);
}

Add an SJPerson class and implement its load method:

@implementation SJPerson

+ (void)load
{
    SJLog(@"%s", __func__);
}

@end

Run the program and you can see the printed result as follows:

The load method is executed first, then the C++ constructor function, and finally main. To investigate this order, let's first look at how the load method is called:

void load_images(const char *path __unused, const struct mach_header *mh)
{
    if (!didInitialAttachCategories && didCallDyldNotifyRegister) {
        didInitialAttachCategories = true;
        loadAllCategories();
    }

    // Return without taking locks if there are no +load methods here.
    if (!hasLoadMethods((const headerType *)mh)) return;

    recursive_mutex_locker_t lock(loadMethodLock);

    // Discover load methods
    {
        mutex_locker_t lock2(runtimeLock);
        prepare_load_methods((const headerType *)mh);
    }

    // Call +load methods (without runtimeLock - re-entrant)
    call_load_methods();
}

First, prepare_load_methods collects all the load methods:

void prepare_load_methods(const headerType *mhdr)
{
    size_t count, i;

    runtimeLock.assertLocked();

    classref_t const *classlist = _getObjc2NonlazyClassList(mhdr, &count);
    for (i = 0; i < count; i++) {
        schedule_class_load(remapClass(classlist[i]));
    }

    category_t * const *categorylist = _getObjc2NonlazyCategoryList(mhdr, &count);
    for (i = 0; i < count; i++) {
        category_t *cat = categorylist[i];
        Class cls = remapClass(cat->cls);
        if (!cls) continue;  // category for ignored weak-linked class
        if (cls->isSwiftStable()) {
            _objc_fatal("Swift class extensions and categories on Swift "
                        "classes are not allowed to have +load methods");
        }
        realizeClassWithoutSwift(cls, nil);
        ASSERT(cls->ISA()->isRealized());
        add_category_to_loadable_list(cat);
    }
}

Together, prepare_load_methods and call_load_methods collect and then invoke the load methods of classes and categories. Why is the C++ constructor function called next? Set a breakpoint in it and print the call stack:

Check the doInitialization call in the dyld source code:

This is a place we are already familiar with. As explained above, doInitialization is what ends up calling the C++ constructor function we wrote.

To verify this, add a C++ constructor function to objc-os.mm in the objc source and run the program:

__attribute__((constructor)) void sjFunc() {
    printf("objc ++ comes %s \n", __func__);
}

See the results of the execution:

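That is, within the same image, the load methods are executed before the C++ constructor functions. doInitialization runs the initializers of each image as that image is initialized, so the constructor added to objc runs when the objc image is initialized, before our project's image is reached. The C++ constructor function that runs last is the one in our project.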
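The part of this ordering that does not depend on the Objective-C runtime, namely that a constructor function runs before main, can be checked with a few lines of C++. This is a minimal sketch: the names gCounter, gConstructorOrder, demoConstructor, and constructorRanBeforeCaller are made up for illustration, __attribute__((constructor)) is a GCC/Clang extension, and the load step is not modeled here.

```cpp
#include <cassert>

static int gCounter = 0;           // incremented each time something runs
static int gConstructorOrder = 0;  // position at which the constructor ran

// Runs when this image's initializers are executed, i.e. before main().
__attribute__((constructor)) static void demoConstructor() {
    gConstructorOrder = ++gCounter;
}

// Call this from main(): it reports whether the constructor had already run
// by the time the caller got control.
bool constructorRanBeforeCaller() {
    int callerOrder = ++gCounter;
    return gConstructorOrder != 0 && gConstructorOrder < callerOrder;
}
```

When called from main, constructorRanBeforeCaller() returns true, because the loader has already run the image's initializers by the time it jumps to main.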

After dyld has finished, how do we get to main in our project? Set a breakpoint in the C++ constructor function and display the assembly once it is hit:

Step out of this function:

Read the registers:

The assembly finally executes a jump through rax (jmp rax), and the rax register holds the address of our project's main function.
