In previous chapters, we looked at the nature of method calls, message lookup and message forwarding. Today we'll explore a new topic: application loading.
I was once asked this question in an interview: what happens between hitting Run and main? Let's use this question to launch our analysis. After we click Run, the program goes through compilation and then execution. I won't analyze the compilation part for now (I'm reading The Programmer's Self-Cultivation, a book that helps a great deal in understanding compilation, and I'll summarize it in a later series of articles), so this article analyzes the start function together with dyld.
One: Background knowledge
1. What is Dyld
Dyld (the Dynamic Link Editor) is Apple's dynamic linker and an important part of Apple's operating systems. After an application is compiled and packaged into an executable Mach-O file, dyld is responsible for linking and loading it.
Dyld is open source; its source code can be downloaded from Apple's official open source site.
2. The shared cache
In iOS, the dynamic libraries each program depends on have to be loaded into memory one by one by dyld (located at /usr/lib/dyld). If every program loaded the same libraries over and over again, startup would inevitably be slow. To speed up startup and improve performance, the shared cache mechanism was created: all the default system dynamic libraries are merged into one large cache file in the /System/Library/Caches/com.apple.dyld/ directory, saved separately for each architecture.
(Figure: loading without a dynamic library shared cache)
(Figure: loading with a dynamic library shared cache)
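To make the benefit concrete, here is a toy C++ model, not dyld's real data structures: a hypothetical SharedCache maps each library's install name to an image that was prepared once, so a second request for the same library gets the cached copy instead of paying the load cost again. The names SharedCache, Image and loadCount are illustrative only.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a prepared library image.
struct Image {
    std::string installName;
};

// Toy model of a shared cache: each library is prepared at most once,
// and every later request reuses the same Image object.
class SharedCache {
public:
    std::shared_ptr<Image> get(const std::string& installName) {
        auto it = images_.find(installName);
        if (it != images_.end()) {
            return it->second;  // cache hit: no new load work
        }
        ++loadCount;            // cache miss: "load" the library once
        auto image = std::make_shared<Image>(Image{installName});
        images_[installName] = image;
        return image;
    }

    int loadCount = 0;  // how many times a library actually had to be loaded

private:
    std::map<std::string, std::shared_ptr<Image>> images_;
};
```

Asking twice for Foundation's path returns the same object and leaves loadCount at 1; only a different library triggers another load.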
3. Extension
Since dynamic libraries are loaded into memory at run time, their code is not part of the app's Mach-O file. So how does the system find the address of an external function and call it?
- Compile time: when the project is compiled, space is set aside in the generated Mach-O executable. This space is essentially a symbol table, and it is stored in the __DATA segment (because the __DATA segment is readable and writable at run time). Every call in the project to a library method in the shared cache is pointed at a symbolic address. For example, for NSLog in the project, the compiler creates an NSLog symbol in the Mach-O, and every NSLog call in the project simply points to this symbol.
- Run time: when dyld loads the application process into memory, it reads the load commands to determine which library files need to be loaded, and then performs binding. Taking NSLog as an example, dyld finds the real address of NSLog in Foundation and writes it into the NSLog symbol in the __DATA segment's symbol table.
This technique is called PIC (Position Independent Code).
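The compile-time/run-time split above can be modeled in a few lines of C++. This is a simplified sketch, not dyld's implementation: a writable function pointer plays the role of the NSLog symbol slot in __DATA, app code only ever calls through that slot, and a hypothetical bindSymbols step fills the slot with the real address looked up by name in a map standing in for Foundation's exported symbols. fakeNSLog, kLibraryExports and gNSLogSlot are all made-up names.

```cpp
#include <cassert>
#include <map>
#include <string>

using LogFn = int (*)(const char*);

// Stand-in for the real NSLog implementation living in a library.
int fakeNSLog(const char* msg) {
    return msg != nullptr ? 1 : 0;
}

// Hypothetical "exported symbols" of the library: name -> address.
const std::map<std::string, LogFn> kLibraryExports = {
    {"_NSLog", &fakeNSLog},
};

// The writable slot reserved at compile time (think: a pointer in __DATA).
// It starts out unbound.
LogFn gNSLogSlot = nullptr;

// App code never calls NSLog directly; it calls through the slot.
int callNSLog(const char* msg) {
    return gNSLogSlot(msg);
}

// Run-time binding: look the symbol up by name and write its real address into the slot.
bool bindSymbols() {
    auto it = kLibraryExports.find("_NSLog");
    if (it == kLibraryExports.end()) {
        return false;  // symbol not found
    }
    gNSLogSlot = it->second;
    return true;
}
```

Because the code only references the slot, it stays the same no matter where the library ends up in memory; only the slot's contents change at bind time.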
Two: The dyld loading process
1. The main function
If we set a breakpoint in the main function, the call stack shows that start is called before main:
(Figure: call stack showing start, from libdyld.dylib, before main)
That is, the main program's main function is called by dyld. Below we'll read through the dyld source code.
2. The start function
In the dyld source, the start function is in the file dyldInitialization.cpp:
uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{
// Emit kdebug tracepoint to indicate dyld bootstrap has started <rdar://46878536>
dyld3::kdebug_trace_dyld_marker(DBG_DYLD_TIMING_BOOTSTRAP_START, 0, 0, 0, 0);
// if kernel had to slide dyld, we need to fix up load sensitive locations
// we have to do this before using any global variables
rebaseDyld(dyldsMachHeader);
// kernel sets up env pointer to be just past end of argv array
const char** envp = &argv[argc+1];
// kernel sets up apple pointer to be just past end of envp array
const char** apple = envp;
while(*apple != NULL) { ++apple; }
++apple;
// set up random value for stack canary
__guard_setup(apple);
#if DYLD_INITIALIZER_SUPPORT
// run all C++ initializers inside dyld
runDyldInitializers(argc, argv, envp, apple);
#endif
// now that we are done bootstrapping dyld, call dyld's main
uintptr_t appsSlide = appsMachHeader->getSlide();
return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}
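The two pointer computations at the top of start() rely on the layout the kernel builds: argv is NULL-terminated, envp begins right after that NULL, and the apple array begins right after envp's own NULL. The sketch below reproduces that arithmetic on a hand-built array; the array contents and the names gFakeStack, StartPointers and locatePointers are made up for illustration.

```cpp
#include <cassert>
#include <cstring>

// Hand-built imitation of the layout: argv..., NULL, envp..., NULL, apple..., NULL
const char* gFakeStack[] = {
    "/path/to/App",                   // argv[0]
    nullptr,                          // end of argv
    "HOME=/Users/me",                 // envp[0]
    "DYLD_PRINT_STATISTICS=1",        // envp[1]
    nullptr,                          // end of envp
    "executable_path=/path/to/App",   // apple[0]
    nullptr,                          // end of apple
};

struct StartPointers {
    const char** envp;
    const char** apple;
};

// Same arithmetic as start(): envp is just past argv's NULL,
// apple is just past envp's NULL.
StartPointers locatePointers(int argc, const char* argv[]) {
    const char** envp = &argv[argc + 1];
    const char** apple = envp;
    while (*apple != nullptr) { ++apple; }
    ++apple;
    return {envp, apple};
}
```

With argc = 1, envp starts at "HOME=/Users/me" and apple starts at "executable_path=/path/to/App".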
Two important parameters of the function are explained here
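- const struct macho_header* appsMachHeader: this parameter is the header of the app's Mach-O. (In the version quoted above, its type is const dyld3::MachOLoaded*.)
- intptr_t slide: this is the ASLR (Address Space Layout Randomization) offset. A random value (the slide here) makes the load address of the address space random. In the version quoted above, the slide is not passed in but computed with appsMachHeader->getSlide().
- Actual (runtime) address = ASLR slide + virtual address (offset)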
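As a quick worked example of the formula (the numbers are made up): if a symbol's link-time address in the Mach-O is 0x100003f50 and the kernel chose a slide of 0x4000, the symbol lives at 0x100007f50 at run time. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Runtime address = slide + link-time (virtual) address recorded in the Mach-O.
uint64_t runtimeAddress(uint64_t vmAddress, uint64_t slide) {
    return vmAddress + slide;
}

// The inverse: recover the link-time address from a runtime address,
// which is what you need when mapping a crash address back to the binary.
uint64_t linkTimeAddress(uint64_t runtimeAddr, uint64_t slide) {
    return runtimeAddr - slide;
}
```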
So what does this function do?
- Rebase dyld's own Mach-O according to the slide computed by ASLR (rebaseDyld).
- Allow dyld to use Mach messaging.
- Set up stack overflow protection (the stack canary, via __guard_setup).
- After initialization, call dyld's main function, dyld::_main.
3. dyld::_main
//
// Entry point for dyld. The kernel loads dyld and jumps to __dyld_start which
// sets up some registers and call this function.
//
// Returns address of main() in target program which __dyld_start jumps to
//
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide,
int argc, const char* argv[], const char* envp[], const char* apple[],
uintptr_t* startGlue)
{
......
}
To be honest, the code is far too long to post in full.
As usual, read the comments first: this is dyld's entry point. The kernel loads dyld and jumps to __dyld_start, which sets up some registers and then calls this function. It returns the address of the target program's main() function, which __dyld_start then jumps to.
Next, I'll pick out the key points for analysis.
3.1 Preparation
3.1.1 Setting Environment Variables
I won't post the code for this part, but you can read it yourself. Of course, environment variables can also be set in Xcode (Edit Scheme > Run > Arguments > Environment Variables).
3.1.2 Setting context information: setContext
setContext(mainExecutableMH, argc, argv, envp, apple);
3.1.3 Checking whether the process is restricted: configureProcessRestrictions
configureProcessRestrictions(mainExecutableMH, envp);
3.1.4 Checking environment variables: checkEnvironmentVariables
{
checkEnvironmentVariables(envp);
defaultUninitializedFallbackPaths(envp);
}
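checkEnvironmentVariables walks envp looking for DYLD_* settings. The following is a simplified sketch of that idea, not dyld's code: it scans a NULL-terminated envp array and collects every KEY=VALUE pair whose key starts with DYLD_. The function name collectDyldVariables is hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <map>
#include <string>

// Collect DYLD_* variables from a NULL-terminated envp array (simplified sketch).
std::map<std::string, std::string> collectDyldVariables(const char* envp[]) {
    std::map<std::string, std::string> result;
    for (const char** p = envp; *p != nullptr; ++p) {
        const char* entry = *p;
        if (std::strncmp(entry, "DYLD_", 5) != 0) {
            continue;  // not a dyld setting
        }
        const char* equals = std::strchr(entry, '=');
        if (equals == nullptr) {
            continue;  // malformed entry without a value
        }
        result[std::string(entry, equals - entry)] = std::string(equals + 1);
    }
    return result;
}
```

For example, an environment containing HOME and DYLD_PRINT_STATISTICS=1 yields a map with only the DYLD_PRINT_STATISTICS entry.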
3.1.5 Getting host architecture information: getHostInfo
{
getHostInfo(mainExecutableMH, mainExecutableSlide);
}
3.2 Loading the Shared Cache
3.2.1 Checking whether the Shared Cache is Disabled
checkSharedRegionDisable((dyld3::MachOLoaded*)mainExecutableMH, mainExecutableSlide);
3.2.2 Mapping the shared cache: mapSharedCache
if ( gLinkContext.sharedRegionMode != ImageLoader::kDontUseSharedRegion ) {
#if TARGET_OS_SIMULATOR
if ( sSharedCacheOverrideDir)
mapSharedCache();
#else
mapSharedCache();
#endif
}
3.3 Adding dyld to the UUID list
Add dyld itself to the UUID list: addDyldImageToUUIDList
// add dyld itself to UUID list
addDyldImageToUUIDList();
3.4 reloadAllImages
3.4.1 Instantiating the main program: instantiateFromLoadedImage
// instantiate ImageLoader for main executable
sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
// try mach-o loader
if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
addImage(image);
return (ImageLoaderMachO*)image;
}
throw "main executable not a known format";
}
- In the condition, isCompatibleMachO checks the Mach-O header for compatibility.
- instantiateMainExecutable calls sniffLoadCommands. Loading the main program really means loading the series of segments described by the LoadCommands in the Mach-O file.
void ImageLoaderMachO::sniffLoadCommands(const macho_header* mh, const char* path, bool inCache, bool* compressed,
unsigned int* segCount, unsigned int* libCount, const LinkContext& context,
const linkedit_data_command** codeSigCmd,
const encryption_info_command** encryptCmd)
{
*compressed = false;
*segCount = 0;
*libCount = 0;
*codeSigCmd = NULL;
*encryptCmd = NULL;
/* ... */
// fSegmentsArrayCount is only 8-bits
if ( *segCount > 255 )
dyld::throwf("malformed mach-o image: more than 255 segments in %s", path);
// fSegmentsArrayCount is only 8-bits
if ( *libCount > 4095 )
dyld::throwf("malformed mach-o image: more than 4095 dependent libraries in %s", path);
if ( needsAddedLibSystemDepency(*libCount, mh) )
*libCount = 1;
}
Let’s explain a few parameters here:
- compressed: set according to LC_DYLD_INFO_ONLY.
- segCount: the number of segment load commands, at most 255.
- libCount: the number of dependent libraries (LC_LOAD_DYLIB, e.g. Foundation/UIKit), at most 4095.
- codeSigCmd: the app's code signature.
- encryptCmd: the app's encryption information.
- After the image is created, it is added to the global image list sAllImages:
static void addImage(ImageLoader* image)
{
// add to master list
allImagesLock();
sAllImages.push_back(image);
allImagesUnlock();
...
}
After the above steps, the instantiation of the main program is complete
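The bookkeeping that sniffLoadCommands does can be sketched as a single pass over the load commands. This is a simplified model, not the real function: it takes a vector of already-decoded command IDs (a hypothetical input; the real code walks the raw bytes after the Mach-O header) and uses LC_* values from <mach-o/loader.h>, copied here as constants so the sketch builds on any platform. The sketch treats both LC_DYLD_INFO and LC_DYLD_INFO_ONLY as marking a compressed image, and applies the same 255/4095 limits as the excerpt above.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// LC_* values from <mach-o/loader.h>, copied so the sketch builds anywhere.
constexpr uint32_t kLC_LOAD_DYLIB         = 0xc;
constexpr uint32_t kLC_SEGMENT_64         = 0x19;
constexpr uint32_t kLC_CODE_SIGNATURE     = 0x1d;
constexpr uint32_t kLC_DYLD_INFO          = 0x22;
constexpr uint32_t kLC_DYLD_INFO_ONLY     = 0x80000022;
constexpr uint32_t kLC_ENCRYPTION_INFO_64 = 0x2c;

struct SniffResult {
    bool compressed = false;
    unsigned segCount = 0;
    unsigned libCount = 0;
    bool hasCodeSignature = false;
    bool hasEncryptionInfo = false;
};

// Simplified sniff: one pass over decoded command IDs, then the limit checks.
SniffResult sniff(const std::vector<uint32_t>& commands) {
    SniffResult r;
    for (uint32_t cmd : commands) {
        switch (cmd) {
            case kLC_DYLD_INFO:
            case kLC_DYLD_INFO_ONLY:     r.compressed = true; break;
            case kLC_SEGMENT_64:         ++r.segCount; break;
            case kLC_LOAD_DYLIB:         ++r.libCount; break;
            case kLC_CODE_SIGNATURE:     r.hasCodeSignature = true; break;
            case kLC_ENCRYPTION_INFO_64: r.hasEncryptionInfo = true; break;
            default: break;  // other commands are ignored in this sketch
        }
    }
    if (r.segCount > 255) throw std::runtime_error("more than 255 segments");
    if (r.libCount > 4095) throw std::runtime_error("more than 4095 dependent libraries");
    return r;
}
```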
3.4.2 Loading inserted dynamic libraries
// load any inserted libraries
if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
loadInsertedDylib(*lib);
}
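DYLD_INSERT_LIBRARIES is set as a colon-separated list of dylib paths (as described in the dyld man page), while the loop above walks it as a NULL-terminated array of paths. A minimal sketch of getting from one form to the other and loading in order, assuming a hypothetical loadInsertedDylib that only records which paths were requested:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split a colon-separated list such as "/a.dylib:/b.dylib" into individual paths.
std::vector<std::string> splitColonList(const std::string& value) {
    std::vector<std::string> paths;
    std::stringstream stream(value);
    std::string item;
    while (std::getline(stream, item, ':')) {
        if (!item.empty()) {
            paths.push_back(item);  // skip empty entries like "a::b"
        }
    }
    return paths;
}

// Hypothetical stand-in for loadInsertedDylib: just records the load order.
std::vector<std::string> gLoadOrder;
void loadInsertedDylib(const std::string& path) {
    gLoadOrder.push_back(path);
}

void loadInsertedLibraries(const std::string& envValue) {
    for (const std::string& path : splitColonList(envValue)) {
        loadInsertedDylib(path);
    }
}
```

Libraries are processed in the order they appear in the variable, which is why the order of entries in DYLD_INSERT_LIBRARIES matters.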
3.4.3 Linking the main program
// link main executable
gLinkContext.linkingMainExecutable = true;
#if SUPPORT_ACCELERATE_TABLES
if ( mainExcutableAlreadyRebased ) {
// previous link() on main executable has already adjusted its internal pointers for ASLR
// work around that by rebasing by inverse amount
sMainExecutable->rebase(gLinkContext, -mainExecutableSlide);
}
#endif
link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
sMainExecutable->setNeverUnloadRecursive();
if ( sMainExecutable->forceFlat() ) {
gLinkContext.bindFlat = true;
gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
}
link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1) links the main program and each dynamic library it depends on, performing symbol binding.
So far the flow has been: configure environment variables -> load the shared cache -> instantiate the main program -> load the dynamic libraries -> link the dynamic libraries.
3.5 initializeMainExecutable() runs all initializers
This step runs the initializers for the main executable and everything it brings in. The call order is initializeMainExecutable -> runInitializers -> processInitializers -> recursiveInitialization.
void initializeMainExecutable()
{
// record that we've reached this step
gLinkContext.startedInitializingMainExecutable = true;
// run initializers for any inserted dylibs
ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
initializerTimes[0].count = 0;
const size_t rootCount = sImageRoots.size();
if ( rootCount > 1 ) {
for(size_t i=1; i < rootCount; ++i) {
sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
}
}
// run initializers for main executable and everything it brings up
sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);
// register cxa_atexit() handler to run static terminators in all loaded images when this process exits
if ( gLibSystemHelpers != NULL )
(*gLibSystemHelpers->cxa_atexit)(&runAllStaticTerminators, NULL, NULL);
// dump info if requested
if ( sEnv.DYLD_PRINT_STATISTICS )
ImageLoader::printStatistics((unsigned int)allImagesCount(), initializerTimes[0]);
if ( sEnv.DYLD_PRINT_STATISTICS_DETAILS )
ImageLoaderMachO::printStatisticsDetails((unsigned int)allImagesCount(), initializerTimes[0]);
}
3.5.1 Preparing for initialization: runInitializers
void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
uint64_t t1 = mach_absolute_time();
mach_port_t thisThread = mach_thread_self();
ImageLoader::UninitedUpwards up;
up.count = 1;
up.images[0] = this;
processInitializers(context, thisThread, timingInfo, up);
context.notifyBatch(dyld_image_state_initialized, false);
mach_port_deallocate(mach_task_self(), thisThread);
uint64_t t2 = mach_absolute_time();
fgTotalInitTime += (t2 - t1);
}
3.5.2 processInitializers
It iterates over images.count and recursively initializes each image:
// <rdar://problem/14412057> upward dylib initializers can be run too soon
// To handle dangling dylibs which are upward linked but not downward, all upward linked dylibs
// have their initialization postponed until after the recursion through downward dylibs
// has completed.
void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
uint32_t maxImageCount = context.imageCount()+2;
ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
ImageLoader::UninitedUpwards& ups = upsBuffer[0];
ups.count = 0;
// Calling recursive init on all images in images list, building a new list of
// uninitialized upward dependencies.
for (uintptr_t i=0; i < images.count; ++i) {
images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
}
// If any upward dependencies remain, init them.
if ( ups.count > 0 )
processInitializers(context, thisThread, timingInfo, ups);
}
3.5.3 recursiveInitialization: initializing the image
void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
...
uint64_t t1 = mach_absolute_time();
fState = dyld_image_state_dependents_initialized;
oldState = fState;
context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
// initialize this image
bool hasInitializers = this->doInitialization(context);
// let anyone know we finished initializing this image
fState = dyld_image_state_initialized;
oldState = fState;
context.notifySingle(dyld_image_state_initialized, this, NULL);
...
}
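The essential ordering idea behind recursiveInitialization, where an image's dependencies are initialized before the image itself and each image is initialized only once, can be modeled with a depth-first walk. This toy version ignores upward links, locking, timing and notifications; ToyImage and recursiveInitialize are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, much-simplified image: a name, its dependencies, and a visited flag.
struct ToyImage {
    std::string name;
    std::vector<ToyImage*> dependencies;
    bool initialized = false;
};

// Depth-first: initialize every dependency first, then the image itself.
// The flag keeps shared dependencies from running twice.
void recursiveInitialize(ToyImage* image, std::vector<std::string>& order) {
    if (image->initialized) {
        return;
    }
    image->initialized = true;  // mark before recursing so a dependency cycle cannot recurse forever
    for (ToyImage* dep : image->dependencies) {
        recursiveInitialize(dep, order);
    }
    order.push_back(image->name);  // stands in for doInitialization + notifySingle
}
```

For an App that depends on UIKit and Foundation, where UIKit depends on Foundation and libSystem, the resulting order is libSystem, Foundation, UIKit, App: the lowest-level library runs first and the app runs last.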
notifySingle delivers the callback for an image's state change. It is a function pointer:
void (*notifySingle)(dyld_image_states, const ImageLoader* image, InitializerTimingList*);
From the call stack, we know the next step calls load_images. Inside notifySingle, the callback that gets invoked is sNotifyObjCInit, and a search shows that sNotifyObjCInit is assigned in registerObjCNotifiers:
void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
// record functions to call
sNotifyObjCMapped = mapped;
sNotifyObjCInit = init;
sNotifyObjCUnmapped = unmapped;
// call 'mapped' function with all images mapped so far
try {
notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
}
catch (const char* msg) {
// ignore request to abort during registration
}
// <rdar://problem/32209809> call 'init' function on all images already init'ed (below libSystem)
for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
ImageLoader* image = *it;
if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
}
}
}
A global search shows that registerObjCNotifiers is only called from _dyld_objc_notify_register:
void _dyld_objc_notify_register(_dyld_objc_notify_mapped mapped,
_dyld_objc_notify_init init,
_dyld_objc_notify_unmapped unmapped)
{
dyld::registerObjCNotifiers(mapped, init, unmapped);
}
Here is a brief explanation of the parameters:
- map_images: triggered when dyld maps an image into memory.
- load_images: triggered when dyld initializes an image (this is also where the familiar load methods are called).
- unmap_image: triggered when dyld removes an image.
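The pattern in registerObjCNotifiers, storing the callbacks and then immediately replaying them for images that are already mapped or already initialized so that a late registrant does not miss anything, can be sketched like this. It is an illustrative model under simplified assumptions (an image is just a path plus one of two states); every type and name here is hypothetical, not a dyld or libobjc API.

```cpp
#include <cassert>
#include <string>
#include <vector>

using MappedFn = void (*)(const std::string& path);
using InitFn = void (*)(const std::string& path);

enum class ToyState { Mapped, Initialized };

struct ToyLoadedImage {
    std::string path;
    ToyState state;
};

std::vector<ToyLoadedImage> gAllImages;  // plays the role of sAllImages
MappedFn gNotifyMapped = nullptr;        // plays the role of sNotifyObjCMapped
InitFn gNotifyInit = nullptr;            // plays the role of sNotifyObjCInit

void registerNotifiers(MappedFn mapped, InitFn init) {
    // record functions to call later
    gNotifyMapped = mapped;
    gNotifyInit = init;
    // replay "mapped" for every image loaded so far
    for (const ToyLoadedImage& image : gAllImages) {
        gNotifyMapped(image.path);
    }
    // replay "init" only for images that were already initialized
    for (const ToyLoadedImage& image : gAllImages) {
        if (image.state == ToyState::Initialized) {
            gNotifyInit(image.path);
        }
    }
}

// Sample callbacks that just record what they were told.
std::vector<std::string> gMappedLog;
std::vector<std::string> gInitLog;
void recordMapped(const std::string& path) { gMappedLog.push_back(path); }
void recordInit(const std::string& path) { gInitLog.push_back(path); }
```

This replay step matters because the runtime registers its callbacks after some low-level images are already loaded and initialized.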
Using a symbolic breakpoint, we find that _dyld_objc_notify_register is called from _objc_init:
void _objc_init(void)
{
static bool initialized = false;
if (initialized) return;
initialized = true;
// fixme defer initialization until an objc-using image is found?
environ_init();
tls_init();
static_init();
lock_init();
exception_init();
_dyld_objc_notify_register(&map_images, load_images, unmap_image);
}
3.5.4 doInitialization
This step runs the image's own initializers: the -init routine and static initializers such as C++ constructors.
bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
CRSetCrashLogMessage2(this->getPath());
// mach-o has -init and static initializers
doImageInit(context);
doModInitFunctions(context);
CRSetCrashLogMessage2(NULL);
return (fHasDashInit || fHasInitializers);
}
3.5.5 doModInitFunctions
void ImageLoaderMachO::doModInitFunctions(const LinkContext& context)
{
// ... too much code to post here
}
Here the system's libSystem is initialized first, followed by libdispatch.
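You can observe static initializers running before main in your own code. Functions marked __attribute__((constructor)) and C++ globals with dynamic initializers are emitted as static initializers by the compiler, and in the dyld version discussed here those are what doModInitFunctions runs. The snippet below relies only on the GCC/Clang attribute and standard C++, so it also builds outside Apple platforms; events, earlySetup and Registrar are illustrative names. The relative order of the two initializers within one file is not something to rely on, so only their presence is checked.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Records what ran before main().
std::vector<std::string>& events() {
    static std::vector<std::string> log;  // function-local static avoids init-order problems
    return log;
}

// Runs before main(): the compiler emits it as a static initializer.
__attribute__((constructor))
static void earlySetup() {
    events().push_back("constructor");
}

// A C++ global with a dynamic initializer also runs before main().
struct Registrar {
    Registrar() { events().push_back("global object"); }
};
Registrar gRegistrar;
```

By the time main() starts, both entries are already in the log, which is the same reason +load and C++ static constructors show up in a breakpoint's call stack underneath dyld rather than under main.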
3.6 notifyMonitoringDyldMain: notifying monitors of dyld's main
3.7 Finding the main program's entry point
// find entry point for main executable
result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();
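LC_MAIN stores main()'s location as entryoff, an offset from the start of the Mach-O image (the entry_point_command structure in <mach-o/loader.h>). Turning it into a callable address is then simple arithmetic on the loaded header address. In this sketch the header address and offset are made-up numbers, and EntryPointCommand is a local copy of the loader.h layout so the code builds anywhere.

```cpp
#include <cassert>
#include <cstdint>

// Local copy of entry_point_command from <mach-o/loader.h>.
struct EntryPointCommand {
    uint32_t cmd;        // LC_MAIN
    uint32_t cmdsize;    // sizeof(EntryPointCommand)
    uint64_t entryoff;   // file (__TEXT) offset of main()
    uint64_t stacksize;  // initial stack size, if not zero
};

constexpr uint32_t kLC_MAIN = 0x80000028;  // (0x28 | LC_REQ_DYLD)

// The Mach-O header sits at the start of __TEXT, and its load address already
// includes the ASLR slide, so main() lives at header address + entryoff.
uint64_t entryAddress(uint64_t loadedHeaderAddress, const EntryPointCommand& mainCmd) {
    assert(mainCmd.cmd == kLC_MAIN);
    return loadedHeaderAddress + mainCmd.entryoff;
}
```

With the header loaded at 0x100004000 and entryoff = 0x3f50, main() is at 0x100007f50.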
At this point, the dyld loading process ends: _main returns the address of the main program's main(), and __dyld_start jumps to it.
Four: Process diagram
(Figure: the process diagram)
(Figure: process breakdown)