Preface

Understanding dyld’s loading process helps us understand iOS apps in a more systematic way. dyld matters both in reverse engineering and in low-level research, and a clear walk through the process makes several basic principles easier to grasp. For example, dyld inevitably came up in the earlier articles on the underlying principles of categories and on the call mechanism of the load method.

This article sorts out and analyzes the whole loading process without going into every detail. The process is simply too long for that, so points are introduced as they come up.

Tip: Familiarize yourself with the Mach-O file before reading this article.

1. dyld

1.1 Introduction

dyld stands for “the dynamic link editor”. It is Apple’s dynamic linker and an important part of Apple’s operating systems. After an application is compiled and packaged into a Mach-O executable, dyld is responsible for linking and loading the program.

dyld is open source, so you can download the source code from Apple’s open-source site to read how it works and learn the details of how the system loads dynamic libraries.

I downloaded dyld-635.2 here.

1.2 Shared Cache

One thing is essential when reading dyld: the shared cache.

iOS system libraries such as UIKit and Foundation are loaded into memory by dyld. To save space, Apple places these libraries in one location: the dyld shared cache (macOS has one as well).

Therefore the implementation of a function such as NSLog is not, and cannot be, in our own project’s Mach-O. So how does a call to NSLog in our project find its real implementation address?

The process is as follows:

  • When the project is compiled, the generated Mach-O executable reserves a block of space. It is essentially a symbol table, stored in the __DATA segment (because the __DATA segment is readable and writable at run time).

  • Compile time: every reference in the project to a system library function in the shared cache points to a symbol address. (For example, if the project calls NSLog, an NSLog symbol is created in the Mach-O at compile time, and the call in the project points to this symbol.)

  • Run time: when dyld loads the application into memory, it performs binding based on the libraries listed in the Load Commands. (For NSLog, dyld finds the real address of NSLog in Foundation and writes it into the NSLog entry of the __DATA symbol table described above.)

This technique is known as PIC (position-independent code).
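The three steps above can be pictured with a small simulation. This is only a sketch under stated assumptions: SymbolSlot, bindSlot, callThroughSlot and realLog are hypothetical names invented for illustration, and real dyld patches pointers inside the __DATA segment rather than a C++ struct.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for the real implementation that lives in a system library.
static std::string realLog(const std::string& msg) { return "log: " + msg; }

using LogFn = std::string (*)(const std::string&);

// One writable slot of the "symbol table" reserved in the data segment.
struct SymbolSlot {
    const char* name;   // symbol name recorded at compile time
    LogFn       target; // real address, filled in at run time by the loader
};

// Compile time: the call site only knows the slot, not the real address.
static SymbolSlot gLogSlot = { "NSLog", nullptr };

// Run time: the loader looks the symbol up and writes the real address into the slot.
bool bindSlot(SymbolSlot& slot, const char* exportedName, LogFn exportedAddress) {
    if (std::strcmp(slot.name, exportedName) != 0) return false; // not the symbol we want
    slot.target = exportedAddress;
    return true;
}

// Every call in the "app" goes through the slot.
std::string callThroughSlot(const std::string& msg) {
    assert(gLogSlot.target != nullptr && "symbol must be bound before use");
    return gLogSlot.target(msg);
}
```

Calling bindSlot(gLogSlot, "NSLog", &realLog) once plays the role of the loader; after that, callThroughSlot reaches the real function without the call site ever containing its address.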

Now that we know how system functions are bound, the name of fishhook’s function makes sense:

rebind_symbols: it rebinds symbols, exactly as the name says.

The principle behind fishhook is:

At run time, the symbol that a compiled call to a system library function points to is rebound to a user-specified function address, and the real address of the original system function is stored in a user-specified pointer.
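The rebinding idea can be sketched in the same spirit. This is a toy model, not fishhook’s implementation or API: rebindSlot, gGreetSlot, systemGreet and hookedGreet are hypothetical names, and a plain function pointer stands in for the symbol pointer inside __DATA.

```cpp
#include <cassert>
#include <string>

using GreetFn = std::string (*)(const std::string&);

// Stand-in for the original system function.
static std::string systemGreet(const std::string& who) { return "hello " + who; }

// The slot that call sites jump through; here it is already bound to the original.
static GreetFn gGreetSlot = &systemGreet;

// Rebind: save the current target into *original, then point the slot at the replacement.
void rebindSlot(GreetFn& slot, GreetFn replacement, GreetFn* original) {
    assert(replacement != nullptr);
    if (original != nullptr) *original = slot;
    slot = replacement;
}

// User-specified pointer that receives the real address of the original function.
static GreetFn gOriginalGreet = nullptr;

// Hook that adds behaviour and then forwards to the saved original.
static std::string hookedGreet(const std::string& who) {
    assert(gOriginalGreet != nullptr);
    return "[hooked] " + gOriginalGreet(who);
}
```

After rebindSlot(gGreetSlot, &hookedGreet, &gOriginalGreet), every call through gGreetSlot lands in hookedGreet, while gOriginalGreet still reaches the original function.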

2. dyld loading process

Create a new empty app project and add a load method to the ViewController.

+ (void)load {
    NSLog(@"load is here");
}

Set a breakpoint in the load method, run the program, and look at the function call stack.

Use the LLDB commands bt and up/down to walk to the entry point, _dyld_start.

2.1 _dyld_start

Line 11 above: call is the instruction that calls a function (the same role as bl on ARM). This is where launching our app begins.

When we tap an application, the kernel creates a process and dyld starts loading the executable.

2.1.1 dyldbootstrap::start

dyldbootstrap::start refers to the start function in the dyldbootstrap namespace.

Open the source code, search for dyldbootstrap, and find the start function.

Cmd + Shift + J locates the file.

uintptr_t start(const struct macho_header* appsMachHeader, int argc, const char* argv[], 
				intptr_t slide, const struct macho_header* dyldsMachHeader,
				uintptr_t* startGlue)
{
	// if the kernel slid dyld, rebase dyld itself before touching any globals
	slide = slideOfMainExecutable(dyldsMachHeader);
	bool shouldRebase = slide != 0;
#if __has_feature(ptrauth_calls)
	shouldRebase = true;
#endif
	if ( shouldRebase ) {
		rebaseDyld(dyldsMachHeader, slide);
	}

	// allow dyld to use mach messaging
	mach_init();

	// envp starts just past the end of the argv array
	const char** envp = &argv[argc+1];

	// the apple array starts just past the end of the envp array
	const char** apple = envp;
	while(*apple != NULL) { ++apple; }
	++apple;

	// set up a random value for the stack canary
	__guard_setup(apple);

#if DYLD_INITIALIZER_SUPPORT
	// run all C++ initializers inside dyld
	runDyldInitializers(dyldsMachHeader, slide, argc, argv, envp, apple);
#endif

	// dyld is bootstrapped; now call dyld's main
	uintptr_t appsSlide = slideOfMainExecutable(appsMachHeader);
	return dyld::_main(appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}

Two of this function’s parameters need explaining first:

  • 1. const struct macho_header* appsMachHeader: the header of the app’s Mach-O. The article on the Mach-O file structure describes this header in detail.

  • 2. intptr_t slide: this is ASLR in practice, that is, address space layout randomization, implemented through a random value (the slide).

    • Because the address space a process uses is laid out randomly at run time, attackers cannot know addresses in advance, which makes it much harder to attack by reading functions or memory values at fixed addresses.

    • ASLR support is available for all applications starting with Mac OS X Lion 10.7.

  • 3. Actual address at run time = ASLR slide + virtual address (offset).
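The arithmetic in the last point is easy to check by hand. The sketch below is illustrative only: computeSlide and runtimeAddress are hypothetical helper names, the addresses in the usage note are made-up values, and the sketch models only a non-negative slide.

```cpp
#include <cassert>
#include <cstdint>

// Slide = where the image really landed minus where it asked to be loaded.
uint64_t computeSlide(uint64_t actualLoadAddress, uint64_t preferredLoadAddress) {
    // Assumption of this sketch: the image is never loaded below its preferred address.
    assert(actualLoadAddress >= preferredLoadAddress);
    return actualLoadAddress - preferredLoadAddress;
}

// Run-time address of something whose link-time virtual address is known.
uint64_t runtimeAddress(uint64_t virtualAddress, uint64_t slide) {
    return virtualAddress + slide;
}
```

For example, an image that prefers 0x100000000 but is loaded at 0x104a28000 has a slide of 0x4a28000, so a function recorded at 0x100003f50 in the Mach-O is found at 0x104a2bf50 at run time.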

So what does this function actually do?

The process is as follows:

  • First, rebase dyld’s own Mach-O using the slide computed for ASLR.

  • Initialize (mach_init) so that dyld can use Mach messaging.

  • Set up stack overflow protection (__guard_setup).

  • After initialization, call dyld’s main function, dyld::_main.
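One detail of start that is easy to miss is how it finds envp and apple: the three string arrays sit back to back in memory, each ended by a NULL. The following sketch replays that pointer walk on an ordinary array; KernelVectors and locateVectors are hypothetical names, and the layout is modelled on what the start function above does rather than on any kernel documentation.

```cpp
#include <cassert>

struct KernelVectors {
    const char* const* envp;   // first environment string
    const char* const* apple;  // first "apple" string
};

// Assumed layout: argv[0..argc-1], NULL, envp..., NULL, apple..., NULL
KernelVectors locateVectors(int argc, const char* const argv[]) {
    assert(argc >= 0);
    const char* const* envp = &argv[argc + 1];  // just past argv's NULL terminator
    const char* const* apple = envp;
    while (*apple != nullptr) { ++apple; }       // skip every environment string
    ++apple;                                     // step over envp's NULL terminator
    return { envp, apple };
}
```

Given an array holding two arguments, two environment strings and one apple string, locateVectors(2, array) returns pointers to element 3 (the first environment string) and element 6 (the first apple string).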

2.1.2 dyld::_main

Click through to jump to dyld::_main. This is the main function for loading the app.

uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, 
		int argc, const char* argv[], const char* envp[], const char* apple[], 
		uintptr_t* startGlue)
{
    // the function is too long to be pasted here
}

The main flow of this function is as follows:

2.1.2.1 Preparations
  • 1. Configure the relevant environment variables.
  • 2. Set up the context information: setContext.
  • 3. Check whether the process is restricted and handle that in the context (configureProcessRestrictions), then check the environment variables (checkEnvironmentVariables).
    • Those familiar with jailbreak plugins know that certain environment variables directly decide whether a library gets loaded, and some protection measures are built on this.
  • 4. Configure what is printed according to environment variables such as DYLD_PRINT_OPTS and DYLD_PRINT_ENV; you can set them as in the picture below to try this out.
  • 5. Get the program’s architecture: getHostInfo.
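To make the environment-variable step concrete, here is a minimal sketch of scanning an environment array for the two print switches named above. It is not dyld’s code: PrintOptions, keyIs and scanEnvironment are hypothetical names, and the sketch assumes that the mere presence of a key turns the switch on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, simplified record of the two print switches.
struct PrintOptions {
    bool printOpts = false; // DYLD_PRINT_OPTS present
    bool printEnv  = false; // DYLD_PRINT_ENV present
};

// True when the KEY part of "KEY=VALUE" is exactly `name`.
static bool keyIs(const char* entry, std::size_t keyLength, const char* name) {
    return std::strlen(name) == keyLength && std::strncmp(entry, name, keyLength) == 0;
}

// Walk a NULL-terminated environment array and note which switches are set.
PrintOptions scanEnvironment(const char* const envp[]) {
    assert(envp != nullptr);
    PrintOptions result;
    for (const char* const* p = envp; *p != nullptr; ++p) {
        const char* equals = std::strchr(*p, '=');
        if (equals == nullptr) continue;  // not KEY=VALUE, ignore
        const std::size_t keyLength = static_cast<std::size_t>(equals - *p);
        if (keyIs(*p, keyLength, "DYLD_PRINT_OPTS")) result.printOpts = true;
        if (keyIs(*p, keyLength, "DYLD_PRINT_ENV"))  result.printEnv = true;
    }
    return result;
}
```

Comparing the full key length avoids treating a longer key such as DYLD_PRINT_OPTSX as a match.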
2.1.2.2 Loading a Shared Cache Library

The main steps of the process are as follows:

  • 1. Check whether the shared cache is disabled: checkSharedRegionDisable (it cannot be disabled on iOS).

  • 2. Load the shared cache: mapSharedCache -> loadDyldCache. There are three cases:

    • 1. Load the cache into the current process only: mapCachePrivate (the simulator only supports loading into the current process).
    • 2. The shared cache is being loaded for the first time: mapCacheSystemWide.
    • 3. The shared cache has already been loaded, so nothing needs to be done.
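The three cases can be written down as a small decision function. This is a simplified sketch of the choice described above, not a copy of loadDyldCache: CacheAction and chooseCacheAction are hypothetical names, and the forcePrivate flag is an assumed input that stands for "load into this process only".

```cpp
#include <cassert>

enum class CacheAction { MapPrivate, ReuseExisting, MapSystemWide };

// Decide which of the three cases applies.
CacheAction chooseCacheAction(bool isSimulator, bool forcePrivate, bool alreadyMapped) {
    if (isSimulator || forcePrivate) return CacheAction::MapPrivate;    // this process only
    if (alreadyMapped)               return CacheAction::ReuseExisting; // nothing to do
    return CacheAction::MapSystemWide;                                  // first load
}
```

On a device, the first process to ask gets MapSystemWide and every later process gets ReuseExisting, which is why the cache costs memory only once.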
2.1.2.3 reloadAllImages
sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);

This instantiates the main program and checks that the executable’s format is supported.

static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
	// try mach-o loader
	if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
		ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
		addImage(image);
		return (ImageLoaderMachO*)image;
	}
	
	throw "main executable not a known format";
}

isCompatibleMachO checks compatibility using the magic, cputype, and cpusubtype fields in the header.
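A header check of this kind might look like the sketch below. It is an illustration rather than dyld’s isCompatibleMachO: MiniMachHeader and isCompatibleHeader are hypothetical names, the struct copies only the first three header fields, and the subtype rules are left out.

```cpp
#include <cassert>
#include <cstdint>

// Minimal copy of the first fields of a 64-bit Mach-O header.
struct MiniMachHeader {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
};

constexpr uint32_t kMagic64   = 0xfeedfacf; // MH_MAGIC_64
constexpr int32_t  kCpuArm64  = 0x0100000c; // CPU_TYPE_ARM64
constexpr int32_t  kCpuX86_64 = 0x01000007; // CPU_TYPE_X86_64

// Accept the image only if the format and the architecture match the host.
bool isCompatibleHeader(const MiniMachHeader& mh, int32_t hostCpuType) {
    if (mh.magic != kMagic64) return false;      // not a 64-bit Mach-O
    if (mh.cputype != hostCpuType) return false; // built for another architecture
    return true;                                 // cpusubtype rules omitted in this sketch
}
```

An arm64 header passes on an arm64 host and is rejected on an x86_64 host, which is the same kind of mismatch that leads to the "main executable not a known format" error above.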

instantiateMainExecutable
static std::vector<ImageLoader*> sAllImages;

In instantiateMainExecutable, the real work of instantiating the main program is done by the sniffLoadCommands function. Some readers will already know this function; let’s take a brief look at it.

Again, sniffLoadCommands is a function in the scope of ImageLoaderMachO.

void ImageLoaderMachO::sniffLoadCommands(const macho_header* mh, const char* path, bool inCache, bool* compressed,
											unsigned int* segCount, unsigned int* libCount, const LinkContext& context,
											const linkedit_data_command** codeSigCmd,
											const encryption_info_command** encryptCmd)
{
    *compressed = false;
	*segCount = 0;
	*libCount = 0;
	*codeSigCmd = NULL;
	*encryptCmd = NULL;
	/*
	... omitted ...
	*/
	// fSegmentsArrayCount is only 8-bits
	if ( *segCount > 255 )
		dyld::throwf("malformed mach-o image: more than 255 segments in %s", path);

	// fSegmentsArrayCount is only 8-bits
	if ( *libCount > 4095 )
		dyld::throwf("malformed mach-o image: more than 4095 dependent libraries in %s", path);

	if ( needsAddedLibSystemDepency(*libCount, mh) )
		*libCount = 1;
}

This function examines the main program’s Load Commands so that it can be loaded.

Let’s explain a few of its parameters:

  • compressed: determined by LC_DYLD_INFO_ONLY.
  • segCount: the number of segment commands; it cannot exceed 255.
  • libCount: the number of dependent libraries, that is, LC_LOAD_DYLIB commands (Foundation, UIKit, ...); it cannot exceed 4095.
  • codeSigCmd: the application’s code signature. The article “Application signature principle and re-signing (re-signing the WeChat app in practice)” covers it in great detail; I recommend reading it.
  • encryptCmd: the application’s encryption information, commonly called the app’s “shell”. Re-signing outside a jailbroken environment requires an app whose shell has been removed before it can be debugged; the later jailbreak articles in the reverse-engineering series work through removing the shell in practice.

After the above steps, the instantiation of the main program is complete.
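The counting and limit checks can be sketched over a flat list of load command identifiers. This is a simplification under stated assumptions: SniffResult and sniff are hypothetical names, real load commands are variable-sized records rather than bare integers, and only four command kinds are modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

constexpr uint32_t kSegment64     = 0x19;       // LC_SEGMENT_64
constexpr uint32_t kLoadDylib     = 0x0c;       // LC_LOAD_DYLIB
constexpr uint32_t kDyldInfoOnly  = 0x80000022; // LC_DYLD_INFO_ONLY
constexpr uint32_t kCodeSignature = 0x1d;       // LC_CODE_SIGNATURE

struct SniffResult {
    bool     compressed = false;
    unsigned segCount = 0;
    unsigned libCount = 0;
    bool     hasCodeSignature = false;
};

// Count what the loader needs to know and enforce the two limits.
SniffResult sniff(const std::vector<uint32_t>& commands) {
    SniffResult r;
    for (uint32_t cmd : commands) {
        switch (cmd) {
            case kDyldInfoOnly:  r.compressed = true; break;
            case kSegment64:     ++r.segCount; break;
            case kLoadDylib:     ++r.libCount; break;
            case kCodeSignature: r.hasCodeSignature = true; break;
            default: break; // other commands are ignored in this sketch
        }
    }
    if (r.segCount > 255)  throw std::runtime_error("more than 255 segments");
    if (r.libCount > 4095) throw std::runtime_error("more than 4095 dependent libraries");
    return r;
}
```

A typical small app with a handful of segments and a few dependent libraries passes easily; a list with 256 segment commands throws, mirroring the malformed-image error in the source above.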

2.1.2.4 Loading inserted dynamic libraries

if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
	for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib) 
		loadInsertedDylib(*lib);
}

Those familiar with jailbreak plugins will recognize this mechanism. The DYLD_INSERT_LIBRARIES environment variable determines whether inserted dynamic libraries need to be loaded.

Jailbreak plugins are based on this principle: once a plugin is installed, it can affect the application. Some protection measures also rely on this environment variable. (A later article will write a simple jailbreak plugin and then explain how to protect against plugins in a jailbroken environment.)

sInsertedDylibCount = sAllImages.size()-1;

This records the number of inserted dynamic libraries.
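Two small pieces of this step lend themselves to a sketch: turning a colon-separated variable value into a list of paths, and the "size minus one" count. Both functions below are hypothetical illustrations, not dyld code; skipping empty entries is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split a colon-separated list such as "a.dylib:b.dylib" into its paths.
std::vector<std::string> splitColonList(const std::string& value) {
    std::vector<std::string> paths;
    std::string::size_type start = 0;
    while (start <= value.size()) {
        const std::string::size_type colon = value.find(':', start);
        const std::string::size_type end = (colon == std::string::npos) ? value.size() : colon;
        if (end > start) paths.push_back(value.substr(start, end - start)); // skip empty entries
        if (colon == std::string::npos) break;
        start = colon + 1;
    }
    return paths;
}

// Images are kept in load order: main executable first, inserted libraries after it.
std::size_t insertedCount(const std::vector<std::string>& allImages) {
    assert(!allImages.empty() && "the main executable is always image 0");
    return allImages.size() - 1;
}
```

Because the main executable is added first and the inserted libraries right after it, subtracting one from the image count at this point gives exactly the number of inserted libraries.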

2.1.2.5 Link the main program

// link main executable
gLinkContext.linkingMainExecutable = true;

link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
sMainExecutable->setNeverUnloadRecursive();
if ( sMainExecutable->forceFlat() ) {
	gLinkContext.bindFlat = true;
	gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
}
// link the inserted libraries after the main executable
if ( sInsertedDylibCount > 0 ) {
	for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
		ImageLoader* image = sAllImages[i+1];
		link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
		image->setNeverUnloadRecursive();
	}

	// register interposing once all inserted libraries are linked
	for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
		ImageLoader* image = sAllImages[i+1];
		image->registerInterposing(gLinkContext);
	}
}

Click into the link function. Inside, link calls a series of functions such as recursiveLoadLibraries and recursiveBindWithAccounting -> recursiveBind; symbol binding is a recursive process.

After link completes, dyld::_main calls sMainExecutable->weakBind(gLinkContext) to perform weak binding. Weak binding comes last: it must happen only after all the other libraries have been linked and bound.

The binding process is the one described for the shared cache in Section 1.2 above.

At this point the main program has been instantiated but not yet loaded, while the frameworks have been loaded. As an aside, which framework gets loaded first? It depends on the order in the binary, which can be adjusted freely in Xcode.

Drag the entries to reorder them and the build follows that order; you can also use MachOView to inspect the order in the binary.

At this point, the following steps are done: configure environment variables -> load the shared cache -> instantiate the main program -> load dynamic libraries -> link dynamic libraries.

Continuing down dyld::_main, we see:

initializeMainExecutable();

Now let’s go back to the function call stack.

2.1.3 Run the main program

From the source code and the function call stack, the path is: initializeMainExecutable -> runInitializers -> processInitializers -> recursiveInitialization.

Use Cmd + Shift + O to search for recursiveInitialization, go to its implementation, and find the following code:

// let objc know we are about to initialize this image
uint64_t t1 = mach_absolute_time();
fState = dyld_image_state_dependents_initialized;
oldState = fState;
context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);

// initialize this image
bool hasInitializers = this->doInitialization(context);

// let anyone know we finished initializing this image
fState = dyld_image_state_initialized;
oldState = fState;
context.notifySingle(dyld_image_state_initialized, this, NULL);

This code calls the notifySingle function.
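The order of events in recursiveInitialization (dependencies first, then notify, initialize, notify again) can be replayed on a toy dependency graph. The sketch below is an assumption-laden model, not dyld’s implementation: ImageGraph and recursiveInit are hypothetical names, and the event strings only label the three steps seen in the snippet above.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical image graph: image name -> names of the libraries it depends on.
using ImageGraph = std::map<std::string, std::vector<std::string>>;

// Initialize dependencies first, then the image itself, recording each event.
void recursiveInit(const ImageGraph& graph, const std::string& image,
                   std::set<std::string>& done, std::vector<std::string>& events) {
    if (!done.insert(image).second) return;  // already visited, do not initialize twice
    const auto it = graph.find(image);
    if (it != graph.end()) {
        for (const std::string& dep : it->second) recursiveInit(graph, dep, done, events);
    }
    events.push_back("notify dependents_initialized: " + image);
    events.push_back("doInitialization: " + image);
    events.push_back("notify initialized: " + image);
}
```

With a graph in which App depends on Foundation and libobjc, and Foundation depends on libobjc, the recorded events start with libobjc, continue with Foundation, and end with App, so the main program is always initialized last.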

⚠️ Here is the key point. According to the function call stack, the next call should be load_images, yet notifySingle contains no mention of load_images. What we do see is:

(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());

Since sNotifyObjCInit is called here, it cannot be null, which means it must have been assigned a value. Let’s search for sNotifyObjCInit to see where it is assigned.

Searching this file turns up the following:

void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
	// record functions to call
	sNotifyObjCMapped	= mapped;
	sNotifyObjCInit		= init;
	sNotifyObjCUnmapped = unmapped;

	// call 'mapped' function with all images mapped so far
	try {
		notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
	}
	catch (const char* msg) {
		// ignore request to abort during registration
	}

	// <rdar://problem/32209809> call 'init' function on all images already init'ed (below libSystem)
	for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
		ImageLoader* image = *it;
		if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
			dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
			(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
		}
	}
}

That is, the second argument of registerObjCNotifiers is assigned to sNotifyObjCInit, which is then invoked in notifySingle.
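The register-then-notify pattern can be reduced to a few lines. This is a toy model, not the dyld or objc source: Image, gNotifyInit, recordLoad, registerInitNotifier and this simplified notifySingle are hypothetical stand-ins for the roles played by sNotifyObjCInit, load_images, registerObjCNotifiers and dyld’s notifySingle.

```cpp
#include <cassert>
#include <string>
#include <vector>

using InitCallback = void (*)(const std::string& imagePath);

struct Image { std::string path; bool initialized; };

static InitCallback gNotifyInit = nullptr;  // plays the role of sNotifyObjCInit
static std::vector<std::string> gLoadLog;   // records which images were reported

// Stand-in for load_images: just remember the image it was told about.
static void recordLoad(const std::string& imagePath) { gLoadLog.push_back(imagePath); }

// Store the callback, then replay it for images initialized before registration.
void registerInitNotifier(InitCallback init, const std::vector<Image>& allImages) {
    assert(init != nullptr);
    gNotifyInit = init;
    for (const Image& image : allImages) {
        if (image.initialized) (*gNotifyInit)(image.path);
    }
}

// Called later for each image that is about to be initialized.
void notifySingle(const Image& image) {
    if (gNotifyInit != nullptr) (*gNotifyInit)(image.path);
}
```

Before registration, notifySingle does nothing; after registerInitNotifier(&recordLoad, images), already-initialized images are reported at once and every later notifySingle call reaches the callback.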

So let’s search for registerObjCNotifiers to see where it is called, and we find:

void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
	dyld::registerObjCNotifiers(mapped, init, unmapped);
}

Next, set a breakpoint on _dyld_objc_notify_register in the test project, run, and look at the function call stack at the breakpoint.

The caller is _objc_init, in the runtime (objc 750).

2.1.4 _objc_init

void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    
    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    lock_init();
    exception_init();

    _dyld_objc_notify_register(&map_images, load_images, unmap_image);
}

Here we see _dyld_objc_notify_register being called with three arguments, which were discussed in detail in the article on the underlying principles of categories:

  • map_images: triggered when dyld loads an image into memory.
  • load_images: triggered when dyld initializes an image. (The well-known load method is also called here.)
  • unmap_image: triggered when dyld removes an image.

Of course, you can verify this with LLDB.

So load_images calls each class’s load method (call_load_methods). For details, see the two articles on the underlying principles of categories and on the load method call mechanism.

In other words:

  • 1. When dyld has finished loading and starts linking the main program, the recursiveInitialization function is called recursively.
  • 2. The first time this function runs, it initializes libSystem, going through doInitialization -> doModInitFunctions -> libSystemInitialized.
  • 3. During libSystem’s initialization, libdispatch_init is called; libdispatch’s init calls _os_object_init, and that function calls _objc_init.
  • 4. _objc_init registers and saves the addresses of the map_images, load_images, and unmap_image functions.
  • 5. After registration, control returns to recursiveInitialization, which continues with the next recursive call, for example libobjc. When libobjc reaches recursiveInitialization, the callback that _objc_init registered during libSystem’s initialization is triggered, calling libobjc’s load_images.

This matches exactly the function call stack shown in the screenshot above.

2.1.5 doInitialization

When dyld reaches doInitialization:

bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
	CRSetCrashLogMessage2(this->getPath());

	// mach-o has -init and static initializers
	doImageInit(context);
	doModInitFunctions(context);
	
	CRSetCrashLogMessage2(NULL);
	
	return (fHasDashInit || fHasInitializers);
}

It is worth mentioning that doModInitFunctions is where C++ constructors are called.

The demo is as follows:

This C++ constructor is stored in the __mod_init_func section of the __DATA segment.
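A minimal sketch of such a constructor, written for illustration and not taken from the article’s original demo: a global object whose constructor has a visible side effect. Announcer, gAnnouncer and launchLog are hypothetical names. In practice, compilers construct such globals before main is entered, which is the behaviour this sketch relies on.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Order log; a function-local static avoids depending on global construction order.
std::vector<std::string>& launchLog() {
    static std::vector<std::string> log;
    return log;
}

struct Announcer {
    // Runs as part of the image's static initializers.
    Announcer() { launchLog().push_back("static constructor"); }
};

// Global object with a constructor: it is constructed before main starts.
static Announcer gAnnouncer;
```

By the time main starts, launchLog() already contains one entry, which shows that the constructor ran during the initialization phase rather than in response to any call made from main.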

2.1.6 Find the main program entry

// find entry point for main executable
result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();

This finds the real entry point of main and returns it.

Conclusion:

That is the complete process by which dyld loads an application. I suggest you explore it carefully yourself.