Preface
Understanding dyld's loading process helps us understand the nature of iOS apps more systematically. dyld is essential knowledge both for reverse engineering and for research into low-level internals, and a clear walk through the process makes some basic principles easier to grasp. For example, dyld inevitably came up in the earlier articles on the underlying principles of categories and on the load method call mechanism.
This article sorts out and analyzes the whole loading process, but not in exhaustive detail. The process is too large for that, so each point is introduced only as it becomes relevant.
Tip: Familiarize yourself with the Mach-O file before reading this article.
1. dyld
1.1 Introduction
dyld stands for the dynamic link editor. It is Apple's dynamic linker and an important part of Apple's operating systems. After an application is compiled and packaged into a Mach-O executable, dyld is responsible for linking and loading it.
dyld is open source, so you can download the source code from its official website, read how it works, and learn the details of how the system loads dynamic libraries.
This article uses dyld-635.2.
1.2 Shared Cache
One thing is essential background for reading dyld: the shared cache.
System libraries such as UIKit and Foundation are loaded into memory by dyld. To save space, Apple places these libraries in one location: the dyld shared cache (macOS has one as well).
Therefore, the implementation of a function such as NSLog is not, and cannot be, in our own project's Mach-O. So how does a call to NSLog in our project find the real implementation address?
The process is as follows:
- When the project is compiled, the generated Mach-O executable sets aside a region that is essentially a symbol table. It is stored in the __DATA segment, because the __DATA segment is readable and writable at run time.
- Compile time: every reference in the project to a system library function in the shared cache points to a symbol address. (For example, if the project calls NSLog, an NSLog symbol is created in the Mach-O at compile time, and the call in the project points to this symbol.)
- Run time: when dyld loads the application into memory, it performs binding based on the libraries listed in the Load Commands. (For NSLog, dyld finds the real address of NSLog in Foundation and writes it into the NSLog entry of the __DATA symbol table described above.)
This technique is called PIC (position-independent code).
Now that we know how system functions are bound, the name of fishhook's main function makes sense:
rebind_symbols: it simply rebinds symbols.
The fishhook principle is:
At run time, the symbol that a call to a system library function points to is rebound to the address of a user-specified function, and the real address of the original system function is stored in a user-specified pointer.
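The binding and rebinding described above can be modeled in a few lines of portable C++. This is only a sketch under stated assumptions: symbol_slot stands in for one pointer entry in the __DATA segment, and bind_symbol, rebind_symbol, real_log, my_log, and app_log are hypothetical names chosen for illustration, not dyld's or fishhook's real API.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of one symbol pointer slot in the __DATA segment.
// Call sites go through this pointer instead of a fixed address.
using LogFn = std::string (*)(const std::string&);
static LogFn symbol_slot = nullptr;

// Stand-in for the real implementation that lives in the shared cache.
std::string real_log(const std::string& msg) { return "system: " + msg; }

// What the loader does at run time: write the real address into the slot.
void bind_symbol() { symbol_slot = &real_log; }

// What a fishhook-style rebind does: hand back the old address,
// then install the replacement in the same slot.
void rebind_symbol(LogFn replacement, LogFn* original_out) {
    assert(original_out != nullptr);
    *original_out = symbol_slot;
    symbol_slot = replacement;
}

// User-specified pointer that receives the original address.
static LogFn original_log = nullptr;

// User-specified hook that still forwards to the original function.
std::string my_log(const std::string& msg) {
    return "hooked -> " + original_log(msg);
}

// Every call in the "app" is an indirect call through the slot.
std::string app_log(const std::string& msg) { return symbol_slot(msg); }
```

After bind_symbol(), app_log reaches real_log; after rebind_symbol(&my_log, &original_log), the same call site reaches my_log without the caller changing at all, which is the whole point of rebinding the pointer rather than patching code.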
2. dyld loading process
Create a new empty app project and add a load method to the ViewController.
+ (void)load {
    NSLog(@"load called");
}
Set a breakpoint in the load method, run the program, and look at the function call stack.
Use the LLDB bt and up/down commands to reach the entry point, _dyld_start.
2.1 _dyld_start
On line 11 of the _dyld_start disassembly, call is the instruction that calls a function (the same as bl on ARM). This is where our app starts.
When we tap an application, the kernel creates a process and dyld starts loading the executable file.
2.1.1 dyldbootstrap::start
dyldbootstrap::start refers to the start function in the dyldbootstrap namespace.
Go to the source code, search for dyldbootstrap, and find the start function.
Cmd + Shift + J reveals the file in the project navigator.
uintptr_t start(const struct macho_header* appsMachHeader, int argc, const char* argv[],
                intptr_t slide, const struct macho_header* dyldsMachHeader,
                uintptr_t* startGlue)
{
    // if the kernel had to slide dyld, dyld must rebase itself
    slide = slideOfMainExecutable(dyldsMachHeader);
    bool shouldRebase = slide != 0;
#if __has_feature(ptrauth_calls)
    shouldRebase = true;
#endif
    if ( shouldRebase ) {
        rebaseDyld(dyldsMachHeader, slide);
    }
    // allow dyld to use mach messaging
    mach_init();
    // envp starts just past the end of the argv array
    const char** envp = &argv[argc+1];
    // apple starts just past the end of the envp array
    const char** apple = envp;
    while(*apple != NULL) { ++apple; }
    ++apple;
    // set up stack overflow protection
    __guard_setup(apple);
#if DYLD_INITIALIZER_SUPPORT
    runDyldInitializers(dyldsMachHeader, slide, argc, argv, envp, apple);
#endif
    // bootstrapping is done, call dyld's main
    uintptr_t appsSlide = slideOfMainExecutable(appsMachHeader);
    return dyld::_main(appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}
Two of this function's parameters need explaining:
1. const struct macho_header* appsMachHeader: the Mach-O header of the app. The header is described in detail in the article on the Mach-O file structure.
2. intptr_t slide: this is ASLR (address space layout randomization) in practice. The randomization is implemented through a random value, the slide.
Because the address space a process uses is laid out randomly at run time, an attacker cannot know addresses in advance, which makes it hard to attack by reading functions or memory values at fixed addresses.
ASLR has been available to all applications since Mac OS X Lion 10.7.
3. Address at run time = ASLR slide + virtual address (the offset recorded in the Mach-O file).
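The slide arithmetic is simple enough to write down directly. The sketch below is illustrative only: runtime_address and file_address are hypothetical helper names, and the numbers in the static_assert are made-up example values, not taken from a real process.

```cpp
#include <cassert>
#include <cstdint>

// Address at run time = virtual address recorded in the Mach-O + slide.
constexpr std::uint64_t runtime_address(std::uint64_t vm_address, std::uint64_t slide) {
    return vm_address + slide;
}

// Inverse direction: map an address seen in the debugger back to the
// virtual address recorded in the file.
inline std::uint64_t file_address(std::uint64_t runtime_addr, std::uint64_t slide) {
    assert(runtime_addr >= slide);  // an address inside the image is never below the slide
    return runtime_addr - slide;
}

// Made-up example: a function recorded at 0x100004f20 in an image slid by 0x4a8c000.
static_assert(runtime_address(0x100004f20ULL, 0x4a8c000ULL) == 0x104a90f20ULL,
              "the slide is simply added");
```

The inverse direction is the one used most often in reverse engineering: subtract the slide from an address seen at run time to find the matching address in a static analysis tool.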
So what does this function actually do?
The process is as follows:
- First, dyld rebases itself using the slide computed for ASLR (rebaseDyld).
- It calls mach_init so that dyld can use Mach messaging.
- It sets up stack overflow protection (__guard_setup).
- After this initialization, it calls dyld's main function, dyld::_main.
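One detail of start that is easy to miss is how envp and apple are found: they are not passed in, they are located by walking past the NULL terminators that follow argv. The sketch below repeats that pointer walk on a made-up vector. StartupVectors, locate_vectors, has_key, and sample_stack are hypothetical names, and the sample strings are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Pointers into one contiguous vector laid out as:
// argv..., NULL, envp..., NULL, apple..., NULL
struct StartupVectors {
    const char** envp;
    const char** apple;
};

// The same pointer walk as in dyldbootstrap::start.
StartupVectors locate_vectors(int argc, const char* argv[]) {
    assert(argc >= 0);
    const char** envp = &argv[argc + 1];    // skip argv and its NULL terminator
    const char** apple = envp;
    while (*apple != nullptr) { ++apple; }  // walk to the end of envp
    ++apple;                                // skip envp's NULL terminator
    return {envp, apple};
}

// True if `entry` has the form "<key>=...".
bool has_key(const char* entry, const char* key) {
    std::size_t n = std::strlen(key);
    return std::strncmp(entry, key, n) == 0 && entry[n] == '=';
}

// Made-up sample layout, for illustration only.
static const char* sample_stack[] = {
    "/path/to/App", "-flag", nullptr,            // argv (argc == 2)
    "HOME=/var/mobile", "TMPDIR=/tmp", nullptr,  // envp
    "executable_path=/path/to/App", nullptr      // apple
};
```

Calling locate_vectors(2, sample_stack) yields an envp that starts at the HOME entry and an apple that starts at the executable_path entry.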
2.1.2 dyld::_main
Jump to the dyld::_main function. This is the main function for loading the app.
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide,
      int argc, const char* argv[], const char* envp[], const char* apple[],
      uintptr_t* startGlue)
{
    // the function is too long to be pasted here
}
The main flow of this function is as follows:
2.1.2.1 Preparations
- 1. Configure the relevant environment variables.
- 2. Set the context information: setContext.
- 3. Check whether the process is restricted and adjust the context accordingly (configureProcessRestrictions), then check the environment variables (checkEnvironmentVariables).
  - Those familiar with jailbreak plugins know that certain environment variables directly affect whether a library is loaded, and some protection measures are based on this principle.
- 4. Print information according to the environment variables DYLD_PRINT_OPTS and DYLD_PRINT_ENV. You can set these variables yourself to try it.
- 5. Get the program's architecture information: getHostInfo.
2.1.2.2 Loading a Shared Cache Library
The main steps are as follows:
- 1. Check whether the shared cache is disabled: checkSharedRegionDisable. (On iOS it cannot be disabled.)
- 2. Load the shared cache: mapSharedCache -> loadDyldCache. There are three cases:
  - 1. Load it into the current process only: mapCachePrivate. (The simulator only supports loading into the current process.)
  - 2. The shared cache is being loaded for the first time: mapCacheSystemWide.
  - 3. The shared cache has already been loaded, so nothing is done.
2.1.2.3 reloadAllImages
sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
This instantiates the main program and checks the format of the executable.
static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
    // try mach-o loader
    if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
        ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
        addImage(image);
        return (ImageLoaderMachO*)image;
    }
    throw "main executable not a known format";
}
isCompatibleMachO checks compatibility using the magic, cputype, and cpusubtype fields in the header.
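To make the three fields concrete, here is a simplified sketch of such a check. It is not dyld's implementation: MiniMachHeader is a cut-down stand-in for the start of a 64-bit Mach-O header, is_compatible is a hypothetical function, and requiring an exact subtype match (after masking the capability bits) is an assumption of this sketch. The constant values are the ones defined in the Mach-O and machine headers.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the first fields of a 64-bit Mach-O header.
struct MiniMachHeader {
    std::uint32_t magic;
    std::int32_t  cputype;
    std::int32_t  cpusubtype;
};

constexpr std::uint32_t kMagic64     = 0xfeedfacf;  // MH_MAGIC_64
constexpr std::int32_t  kCpuArm64    = 0x0100000c;  // CPU_TYPE_ARM64
constexpr std::int32_t  kCpuX86_64   = 0x01000007;  // CPU_TYPE_X86_64
constexpr std::int32_t  kSubtypeBits = 0x00ffffff;  // subtype without the capability bits

// Hypothetical check: right file type first, then right CPU, then right subtype.
bool is_compatible(const MiniMachHeader& mh, std::int32_t host_type, std::int32_t host_subtype) {
    if (mh.magic != kMagic64) return false;   // not a 64-bit Mach-O at all
    if (mh.cputype != host_type) return false;
    return (mh.cpusubtype & kSubtypeBits) == (host_subtype & kSubtypeBits);
}
```

The order matters: the magic number is checked first because the other fields only mean something once the buffer is known to be a Mach-O header.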
addImage appends the new image to the global image list:
static std::vector<ImageLoader*> sAllImages;
Inside instantiateMainExecutable, the real work of instantiating the main program is done by sniffLoadCommands, the function of that name in the scope of ImageLoaderMachO. Some of you may already know it; let's take a brief look.
void ImageLoaderMachO::sniffLoadCommands(const macho_header* mh, const char* path, bool inCache, bool* compressed,
                                         unsigned int* segCount, unsigned int* libCount, const LinkContext& context,
                                         const linkedit_data_command** codeSigCmd,
                                         const encryption_info_command** encryptCmd)
{
    *compressed = false;
    *segCount = 0;
    *libCount = 0;
    *codeSigCmd = NULL;
    *encryptCmd = NULL;
    /*
    ... omitted ...
    */
    // fSegmentsArrayCount is only 8-bits
    if ( *segCount > 255 )
        dyld::throwf("malformed mach-o image: more than 255 segments in %s", path);
    // fSegmentsArrayCount is only 8-bits
    if ( *libCount > 4095 )
        dyld::throwf("malformed mach-o image: more than 4095 dependent libraries in %s", path);
    if ( needsAddedLibSystemDepency(*libCount, mh) )
        *libCount = 1;
}
This function loads the main program according to Load Commands.
Let's explain a few of the parameters:
- compressed: determined by LC_DYLD_INFO_ONLY.
- segCount: the number of segment commands; it cannot exceed 255.
- libCount: the number of dependent libraries, that is, LC_LOAD_DYLIB commands (Foundation, UIKit, ...); it cannot exceed 4095.
- codeSigCmd: the application's code signature. The article "Application signature principle and re-signing (re-signing the WeChat app in practice)" covers it in great detail; I recommend reading it.
- encryptCmd: the application's encryption information, commonly known as the app's shell. Re-signing in a non-jailbroken environment requires a decrypted app in order to debug it; removing the shell will be practiced in later reverse engineering articles on jailbreaking.
After the above steps, the instantiation of the main program is complete.
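The counting that sniffLoadCommands performs can be sketched on a plain byte buffer. This is a heavily simplified model, not the dyld code: LoadCommand keeps only the two fields that every load command begins with, sniff and append_command are hypothetical helpers, and std::runtime_error replaces dyld::throwf. The command constants carry the values defined in the Mach-O loader header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Every load command starts with these two fields.
struct LoadCommand {
    std::uint32_t cmd;
    std::uint32_t cmdsize;
};

constexpr std::uint32_t kLcSegment64 = 0x19;  // LC_SEGMENT_64
constexpr std::uint32_t kLcLoadDylib = 0x0c;  // LC_LOAD_DYLIB

struct SniffResult {
    unsigned segCount = 0;
    unsigned libCount = 0;
};

// Walk `ncmds` variable-sized commands stored back to back in `bytes`,
// counting segments and dependent libraries.
SniffResult sniff(const std::vector<std::uint8_t>& bytes, std::uint32_t ncmds) {
    SniffResult r;
    std::size_t offset = 0;
    for (std::uint32_t i = 0; i < ncmds; ++i) {
        if (offset + sizeof(LoadCommand) > bytes.size())
            throw std::runtime_error("load command extends past end of buffer");
        LoadCommand lc;
        std::memcpy(&lc, bytes.data() + offset, sizeof lc);
        if (lc.cmdsize < sizeof(LoadCommand) || offset + lc.cmdsize > bytes.size())
            throw std::runtime_error("bad load command size");
        if (lc.cmd == kLcSegment64) ++r.segCount;
        if (lc.cmd == kLcLoadDylib) ++r.libCount;
        offset += lc.cmdsize;  // cmdsize is the stride to the next command
    }
    if (r.segCount > 255)  throw std::runtime_error("more than 255 segments");
    if (r.libCount > 4095) throw std::runtime_error("more than 4095 dependent libraries");
    return r;
}

// Helper for building sample input: one command of `size` bytes
// (the two header fields followed by zero padding).
void append_command(std::vector<std::uint8_t>& bytes, std::uint32_t cmd, std::uint32_t size) {
    assert(size >= sizeof(LoadCommand));
    LoadCommand lc{cmd, size};
    std::size_t start = bytes.size();
    bytes.resize(start + size, 0);
    std::memcpy(bytes.data() + start, &lc, sizeof lc);
}
```

Because each command declares its own size, the walk never needs to understand a command's payload in order to skip it, which is why one pass is enough to collect the counts.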
2.1.2.4 Loading inserted dynamic libraries
if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
    for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
        loadInsertedDylib(*lib);
}
Those familiar with jailbreak plugins will recognize this mechanism. The DYLD_INSERT_LIBRARIES environment variable determines whether inserted dynamic libraries are loaded.
Jailbreak plugins are based on this principle: once a plugin is installed, it can affect the application. Some protection measures also rely on this environment variable. (Later we will write a simple jailbreak plugin and then explain how to protect against plugins in a jailbroken environment.)
sInsertedDylibCount = sAllImages.size()-1;
Records the number of dynamic libraries inserted.
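DYLD_INSERT_LIBRARIES holds a colon-separated list of library paths, which is why the loop above iterates over an array of strings. A minimal sketch of that parsing and of the count bookkeeping follows. split_colon_list and inserted_count are hypothetical helpers, not dyld functions, and skipping empty entries is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split a colon-separated value such as DYLD_INSERT_LIBRARIES into paths.
// Empty entries are skipped (an assumption of this sketch).
std::vector<std::string> split_colon_list(const std::string& value) {
    std::vector<std::string> paths;
    std::size_t start = 0;
    while (start <= value.size()) {
        std::size_t end = value.find(':', start);
        if (end == std::string::npos) end = value.size();
        if (end > start) paths.push_back(value.substr(start, end - start));
        start = end + 1;
    }
    return paths;
}

// Mirrors `sInsertedDylibCount = sAllImages.size()-1`: at that point the
// image list holds the main executable plus the inserted libraries.
std::size_t inserted_count(std::size_t all_images) {
    assert(all_images >= 1);  // the main executable is always present
    return all_images - 1;
}
```

The subtraction of one is why the later linking loops index sAllImages[i+1]: slot 0 belongs to the main executable.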
2.1.2.5 Link the main program
// link main executable
gLinkContext.linkingMainExecutable = true;
link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
sMainExecutable->setNeverUnloadRecursive();
if ( sMainExecutable->forceFlat() ) {
    gLinkContext.bindFlat = true;
    gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
}
// link any inserted libraries (they follow the main executable in sAllImages)
if ( sInsertedDylibCount > 0 ) {
    for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
        ImageLoader* image = sAllImages[i+1];
        link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
        image->setNeverUnloadRecursive();
    }
    // register interposing info once all inserted libraries are linked
    for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
        ImageLoader* image = sAllImages[i+1];
        image->registerInterposing(gLinkContext);
    }
}
Step into the link function. It calls a series of functions such as recursiveLoadLibraries and recursiveBindWithAccounting -> recursiveBind; the symbol binding process is recursive.
After the link function completes, dyld::_main calls sMainExecutable->weakBind(gLinkContext); to perform weak binding. Weak binding is deferred: it must happen after all the other libraries have been linked and bound.
The binding process is the shared cache binding process described in Section 1.2 above.
At this point the main program has been instantiated but has not run yet, and the frameworks have been loaded. As an aside, which framework is loaded first? It depends on the order in the binary, which you can adjust freely in Xcode.
Drag the libraries to reorder them and the build follows that order. You can also use MachOView to inspect the order in the binary.
So far, the following steps are done: configuring environment variables -> loading the shared cache -> instantiating the main program -> loading dynamic libraries -> linking.
Further down in dyld::_main we see:
initializeMainExecutable();
So let’s go back to the function call stack.
2.1.3 Run the main program
From the source code and the function call stack, the call chain is initializeMainExecutable -> runInitializers -> processInitializers -> recursiveInitialization.
Press Cmd + Shift + O and search for recursiveInitialization. In the function implementation, find the following code:
// let objc know we are about to initialize this image
uint64_t t1 = mach_absolute_time();
fState = dyld_image_state_dependents_initialized;
oldState = fState;
context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
// initialize this image
bool hasInitializers = this->doInitialization(context);
// let anyone know we finished initializing this image
fState = dyld_image_state_initialized;
oldState = fState;
context.notifySingle(dyld_image_state_initialized, this, NULL);
This code calls the notifySingle function.
⚠️ Here comes the key point. According to the function call stack, the next step is a call to load_images, yet notifySingle contains no trace of load_images. What we do see is this:
(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
Since it is called here, sNotifyObjCInit must be non-null, which means it has been assigned a value somewhere. Let's search for sNotifyObjCInit to see where it is assigned.
Searching this file directly, we find the following:
void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
    // record functions to call
    sNotifyObjCMapped = mapped;
    sNotifyObjCInit = init;
    sNotifyObjCUnmapped = unmapped;
    // call 'mapped' function with all images mapped so far
    try {
        notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
    }
    catch (const char* msg) {
        // ignore request to abort during registration
    }
    // <rdar://problem/32209809> call 'init' function on all images already init'ed (below libSystem)
    for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
        ImageLoader* image = *it;
        if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
            dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
            (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
        }
    }
}
That is, the second argument of registerObjCNotifiers is assigned to sNotifyObjCInit, which is then invoked in notifySingle.
So let's search for registerObjCNotifiers to see where it is called, and we find:
void _dyld_objc_notify_register(_dyld_objc_notify_mapped mapped,
                                _dyld_objc_notify_init init,
                                _dyld_objc_notify_unmapped unmapped)
{
    dyld::registerObjCNotifiers(mapped, init, unmapped);
}
Next, set a symbolic breakpoint on _dyld_objc_notify_register in the test project and run it to see the function call stack.
The call stack shows that it is called from _objc_init in the runtime (objc 750 source).
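The pattern here is a plain callback registration: dyld stores a function pointer supplied by the runtime and calls through it later. The sketch below models only that pattern. Image, all_images, notify_init, register_notifier, notify_single, and record_load are hypothetical stand-ins, and the real notifySingle checks more conditions than a non-null pointer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an image tracked by the loader.
struct Image {
    std::string path;
    bool initialized;
};

using InitCallback = void (*)(const std::string& path);

static std::vector<Image> all_images;       // plays the role of sAllImages
static InitCallback notify_init = nullptr;  // plays the role of sNotifyObjCInit
static std::vector<std::string> init_log;   // records every callback invocation

// Like registerObjCNotifiers: store the callback, then replay it for the
// images that were already initialized before registration happened.
void register_notifier(InitCallback init) {
    notify_init = init;
    for (const Image& image : all_images)
        if (image.initialized) (*notify_init)(image.path);
}

// Like the call site in notifySingle: call through the pointer only if it is set.
void notify_single(const Image& image) {
    if (notify_init != nullptr) (*notify_init)(image.path);
}

// Plays the role of load_images on the runtime side.
void record_load(const std::string& path) { init_log.push_back(path); }
```

This explains why searching notifySingle for load_images finds nothing: the loader only ever sees a function pointer, and the name load_images exists only on the side that registered it.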
2.1.4 _objc_init
void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    lock_init();
    exception_init();
    _dyld_objc_notify_register(&map_images, load_images, unmap_image);
}
Here we see _dyld_objc_notify_register being called with three arguments, which were covered in detail in the article on the underlying principles of categories:
- map_images: triggered when dyld maps an image into memory.
- load_images: triggered when dyld initializes an image. (The well-known load method is also called here.)
- unmap_image: triggered when dyld removes an image.
Of course, you can verify this with LLDB.
So load_images calls each class's load method (call_load_methods). For details, see the two articles on the underlying principles of categories and on the load method call mechanism.
To summarize:
- 1. When dyld has finished loading and starts running the main program's initializers, it calls the recursiveInitialization function recursively.
- 2. The first time this function runs, it initializes libSystem, going through doInitialization -> doModInitFunctions -> libSystemInitialized.
- 3. When libSystem initializes, it calls libdispatch_init; libdispatch's init calls _os_object_init, and that function calls _objc_init.
- 4. _objc_init registers and saves the addresses of the map_images, load_images, and unmap_image functions.
- 5. After registration, control returns to recursiveInitialization, which recurses to the next image, for example libobjc. When libobjc reaches recursiveInitialization, the callback that _objc_init registered during libSystem's initialization is invoked, and that callback is load_images.
This looks exactly like the function call stack shown in the screenshot above.
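The ordering in the summary follows from the shape of recursiveInitialization: an image's dependencies are initialized before the image itself. A minimal model of that shape is shown below. Node and recursive_init are hypothetical names, and the dependency graph used in the usage note is made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of an image and the images it links against.
struct Node {
    std::string name;
    std::vector<Node*> dependents;
    bool initialized;
};

// Dependencies first, then the image itself.
void recursive_init(Node& image, std::vector<std::string>& order) {
    if (image.initialized) return;
    image.initialized = true;     // mark early so dependency cycles terminate
    for (Node* dep : image.dependents) {
        assert(dep != nullptr);
        recursive_init(*dep, order);
    }
    order.push_back(image.name);  // this image's own initializers would run here
}
```

With an app that depends on Foundation, Foundation on libobjc, and everything on libSystem, the recorded order is libSystem, libobjc, Foundation, App. That is why libSystem's initialization, and with it _objc_init, runs before any load method in the app.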
2.1.5 doInitialization
When dyld reaches doInitialization:
bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
    CRSetCrashLogMessage2(this->getPath());
    // mach-o has -init and static initializers
    doImageInit(context);
    doModInitFunctions(context);
    CRSetCrashLogMessage2(NULL);
    return (fHasDashInit || fHasInitializers);
}
It is worth mentioning that doModInitFunctions is where C++ constructor functions are called.
These constructors are stored in the __mod_init_func section of the __DATA segment.
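A small demo of code that runs before main, and that this stage of loading would therefore call, might look like the following. It is a sketch rather than the original demo: startup_log, early_init, and Tracer are hypothetical names, and __attribute__((constructor)) is a Clang/GCC extension rather than standard C++.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Log of what ran before main(). A function-local static is used so the
// log exists no matter which initializer touches it first.
static std::vector<std::string>& startup_log() {
    static std::vector<std::string> log;
    return log;
}

// A function marked as a constructor (Clang/GCC extension).
__attribute__((constructor))
static void early_init() {
    startup_log().push_back("constructor function");
}

// A global C++ object: its constructor also runs before main().
struct Tracer {
    Tracer() { startup_log().push_back("global object"); }
};
static Tracer global_tracer;
```

By the time main starts, startup_log() already holds both entries. The relative order of the two is left unspecified here on purpose, since this sketch does not rely on it.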
2.1.6 Find the main program entry
// find entry point for main executable
result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();
dyld finds the real main entry point and returns it.
Conclusion:
The above is the complete process by which dyld loads an application. I suggest exploring it carefully yourself.