Preface
In today’s information age, mobile applications have become ubiquitous, solving many problems and making daily life more convenient. But when you open an App you use every day, how is it loaded and run? What happens between tapping the icon and the first screen appearing? This article focuses on the application loading process and how the application is loaded into memory.
Application loading principles
The compilation process
A source file is first preprocessed. The compiler then performs lexical and syntax analysis on the preprocessed output and compiles it into an assembly file. The assembly file is assembled into an object file, and the linker finally combines the object files into an executable file.
Static and dynamic libraries
Libraries are executable binaries that are loaded into memory by the operating system.
- Forms of libraries
  - Static library
  - Dynamic library
- Differences between static and dynamic libraries
  - Static libraries generally use formats such as .a and .lib, while dynamic libraries generally use formats such as .so, .dll, .framework, and .dylib
  - Static libraries are loaded through static linking, and dynamic libraries are loaded through dynamic linking
- Link structure diagram for static and dynamic libraries
  - Each program that links a static library gets its own copy, so duplicate copies of the library may exist
  - A dynamic library is loaded as a shared library file, which saves space
How are libraries loaded into memory
Loading flow chart
The loading process of the dynamic linker dyld is roughly as follows:
- When the App starts, dyld loads libSystem
- The Runtime registers its callback functions with dyld
- dyld loads new images (image files), mapping the library files into memory
- map_images() and load_images() are executed
- The main function is called
LLDB debugging
Following the process above, we now use LLDB in a real project to analyze the application loading process: what happens between App startup and the main function?
- First create an empty project, then set a breakpoint at the entry of the main function and run it
We find that a start function runs before main. Looking at start, the stack shows libdyld.dylib start, so start comes from the libdyld.dylib dynamic library. How it was loaded and called cannot be determined from this result alone, so next we try a symbolic breakpoint.
- Add a symbolic breakpoint on start to the project and run again
The breakpoint is not hit; execution goes straight to the main function. As the flow above shows, the load method runs before main, so next we set a breakpoint in the +load method of ViewController.
- Set a breakpoint in the +load method of ViewController and run again
The load method is indeed reached before the main function.
- Enter bt to print the stack information
You can see that the call chain starts from the _dyld_start function in dyld. You can download the dyld open-source code (dyld-852) from Apple’s open source site to see how dyld does its loading.
Source code analysis (forward derivation)
The analysis above gives us the _dyld_start function, so next we search the dyld source code for it.
A global search leads to the assembly implementation of _dyld_start, which calls the C++ function dyldbootstrap::start.
dyldbootstrap::start
//
// This is code to bootstrap dyld. This work in normally done for a program by dyld and crt.
// In dyld we have to do this manually.
//
uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
                const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{
    // Omit some code......

    // now that we are done bootstrapping dyld, call dyld's main
    uintptr_t appsSlide = appsMachHeader->getSlide();
    return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}
The core of this function is its return statement: the call to dyld::_main.
dyld::_main
The dyld::_main function contains more than 800 lines of code, so it is not reproduced in full here. It mainly covers the following steps:
- Condition setup (environment, platform, version, path, host information…)
- instantiateFromLoadedImage: instantiate the main program
- loadInsertedDylib: load the inserted dynamic libraries
- mapSharedCache: load the shared cache
- link: link the main program
- link: link the inserted dynamic libraries
- weakBind: perform weak binding for the main program
- initializeMainExecutable: run the initializers
- notifyMonitoringDyldMain: notify monitoring processes that this process is about to enter main
Looking at the return value of dyld::_main, result is obtained from sMainExecutable, the image loader object for the main executable: it is the entry point of the program.
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide,
      int argc, const char* argv[], const char* envp[], const char* apple[],
      uintptr_t* startGlue)
{
    ......
    // <rdar://problem/12186933> do weak binding only after all inserted images linked
    sMainExecutable->weakBind(gLinkContext);   // weak binding
    gLinkContext.linkingMainExecutable = false;

    sMainExecutable->recursiveMakeDataReadOnly(gLinkContext);
    ......
#if SUPPORT_OLD_CRT_INITIALIZATION
    // Old way is to run initializers via a callback from crt1.o
    if ( ! gRunInitializersOldWay )
        initializeMainExecutable();
#else
    // run all initializers
    initializeMainExecutable();
#endif

    // notify any monitoring processes that this process is about to enter main()
    notifyMonitoringDyldMain();
    ......
    {
        // find entry point for main executable
        result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();
        if ( result != 0 ) {
            ......
        }
        else {
            // main executable uses LC_UNIXTHREAD, dyld needs to let "start" in program set up for main()
            result = (uintptr_t)sMainExecutable->getEntryFromLC_UNIXTHREAD();
            *startGlue = 0;
        }
    }
    }   // closes an enclosing block whose opening is omitted above

    if (sSkipMain) {
        notifyMonitoringDyldMain();
        ......
        result = (uintptr_t)&fake_main;
        *startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
    }
    return result;
}
initializeMainExecutable
One important call in dyld::_main is initializeMainExecutable(), which runs all the initializers. Let’s see how this function is implemented.
void initializeMainExecutable()
{
    ......
    const size_t rootCount = sImageRoots.size();
    if ( rootCount > 1 ) {
        for(size_t i=1; i < rootCount; ++i) {
            sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
        }
    }

    // run initializers for main executable and everything it brings up
    sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);
    ......
}
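The ordering in this function can be sketched as follows. This is a minimal illustration, not dyld's code: initializer_order is a hypothetical helper, images are reduced to plain names, and it assumes that index 0 of the root list holds the main executable, which the loop starting at 1 suggests.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the order in which the root images would
// have runInitializers called on them.
// Assumption: roots[0] is the main executable, the rest are other roots
// such as inserted libraries.
std::vector<std::string> initializer_order(const std::vector<std::string>& roots) {
    std::vector<std::string> order;
    // Every root other than the main executable goes first.
    for (std::size_t i = 1; i < roots.size(); ++i) {
        order.push_back(roots[i]);
    }
    // The main executable (and everything it brings up) goes last.
    if (!roots.empty()) {
        order.push_back(roots[0]);
    }
    return order;
}
```

Given the roots app, inserted1, inserted2 (in that order), the helper returns the two inserted images first and app last.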
initializeMainExecutable calls runInitializers. Search globally for the definition of runInitializers.
runInitializers
void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
    uint64_t t1 = mach_absolute_time();
    mach_port_t thisThread = mach_thread_self();
    ImageLoader::UninitedUpwards up;
    up.count = 1;
    up.imagesAndPaths[0] = { this, this->getPath() };
    processInitializers(context, thisThread, timingInfo, up);
    context.notifyBatch(dyld_image_state_initialized, false);
    mach_port_deallocate(mach_task_self(), thisThread);
    uint64_t t2 = mach_absolute_time();
    fgTotalInitTime += (t2 - t1);
}
The function above calls processInitializers, so look at its definition next.
processInitializers
void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
                                      InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
    uint32_t maxImageCount = context.imageCount()+2;
    ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
    ImageLoader::UninitedUpwards& ups = upsBuffer[0];
    ups.count = 0;
    // Calling recursive init on all images in images list, building a new list of
    // uninitialized upward dependencies.
    for (uintptr_t i=0; i < images.count; ++i) {
        images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
    }
    // If any upward dependencies remain, init them.
    if ( ups.count > 0 )
        processInitializers(context, thisThread, timingInfo, ups);
}
processInitializers calls recursiveInitialization on each image. Look at the definition of recursiveInitialization next.
recursiveInitialization
void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                          InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
    ......
    if ( fState < dyld_image_state_dependents_initialized-1 ) {
        uint8_t oldState = fState;
        // break cycles
        fState = dyld_image_state_dependents_initialized-1;
        try {
            ......
            context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);

            // initialize this image
            bool hasInitializers = this->doInitialization(context);

            // let anyone know we finished initializing this image
            fState = dyld_image_state_initialized;
            oldState = fState;
            context.notifySingle(dyld_image_state_initialized, this, NULL);
            ......
        }
        ......
    }
    recursiveSpinUnLock();
}
The recursiveInitialization function works roughly as follows:
- Take the image (and its file path) and begin initialization
- Initialize the images it depends on first
- Then initialize the image itself
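The steps above can be sketched as a small depth-first traversal. This is a simplified model under stated assumptions, not dyld's implementation: the Image struct, its visiting flag (standing in for the intermediate fState value that breaks cycles), and the image names libBase, libMid, and app are all hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified image model; real dyld tracks many more states.
struct Image {
    std::string name;
    std::vector<Image*> dependencies;  // images this one depends on
    bool visiting;                     // set before recursing, to break cycles
    bool initialized;
    Image(const std::string& n, const std::vector<Image*>& deps)
        : name(n), dependencies(deps), visiting(false), initialized(false) {}
};

// Appends image names to `order` in the order their initializers would run.
void recursive_initialization(Image& image, std::vector<std::string>& order) {
    if (image.visiting || image.initialized) return;
    image.visiting = true;                         // break cycles
    for (Image* dep : image.dependencies) {        // dependencies first
        recursive_initialization(*dep, order);
    }
    order.push_back(image.name);                   // then the image itself
    image.initialized = true;
}

// Builds app -> {libMid, libBase}, libMid -> {libBase} and returns the order.
std::vector<std::string> demo_init_order() {
    Image libBase("libBase", {});
    Image libMid("libMid", {&libBase});
    Image app("app", {&libMid, &libBase});
    std::vector<std::string> order;
    recursive_initialization(app, order);
    return order;
}
```

In this model an image that is reachable through several paths is still initialized only once, and always before anything that depends on it.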
Next, search globally for the definition of notifySingle.
notifySingle
static void notifySingle(dyld_image_states state, const ImageLoader* image, ImageLoader::InitializerTimingList* timingInfo)
{
    ......
    if ( (state == dyld_image_state_dependents_initialized) && (sNotifyObjCInit != NULL) && image->notifyObjC() ) {
        uint64_t t0 = mach_absolute_time();
        dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
        (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
        ......
    }
    ......
}
notifySingle invokes sNotifyObjCInit. Now let’s see where sNotifyObjCInit is defined and assigned.
sNotifyObjCInit
A global search gives the following definition:
static _dyld_objc_notify_init sNotifyObjCInit;
Searching for _dyld_objc_notify_init shows that it is the type of the second parameter of registerObjCNotifiers.
void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
    // record functions to call
    sNotifyObjCMapped = mapped;
    sNotifyObjCInit = init;
    sNotifyObjCUnmapped = unmapped;
    ......
}
Here sNotifyObjCInit is assigned from init. Next, find where registerObjCNotifiers is called.
registerObjCNotifiers
void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
    dyld::registerObjCNotifiers(mapped, init, unmapped);
}
Step by step, tracing from the loading of a single image leads to the function _dyld_objc_notify_register, which is called from _objc_init, the initialization function of libobjc.dylib. Looking at the implementation of _objc_init ties the whole process together.
void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;

    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    runtime_init();
    exception_init();
#if __OBJC2__
    cache_t::init();
#endif
    _imp_implementationWithBlock_init();

    _dyld_objc_notify_register(&map_images, load_images, unmap_image);

#if __OBJC2__
    didCallDyldNotifyRegister = true;
#endif
}
LLDB debugging
With the derivation and source code analysis above, we now run a real project to verify that the flow matches. Set a breakpoint on _objc_init, run, and enter bt in LLDB to print the current stack.
The frames marked with the red box are the flow derived in the forward analysis above. The calls above the red box are still unknown, so next we analyze them by working backward. The frame that calls _objc_init is _os_object_init; select that frame to see the details.
The _os_object_init function comes from libdispatch.dylib. You can download the libdispatch source from Apple’s open source site.
Source code analysis (backward derivation)
Stack flow information
The stack output above points to the _os_object_init function. Next, search the libdispatch source code for _os_object_init.
_os_object_init
void
_os_object_init(void)
{
    _objc_init();
    Block_callbacks_RR callbacks = {
        sizeof(Block_callbacks_RR),
        (void (*)(const void *))&objc_retain,
        (void (*)(const void *))&objc_release,
        (void (*)(const void *))&_os_objc_destructInstance
    };
    _Block_use_RR2(&callbacks);
    ......
}
_os_object_init calls _objc_init. Search for _objc_init in libdispatch.
You can see that _objc_init comes from libobjc. Combined with the stack output above, the flow so far looks like this: libSystem_initializer ~> libdispatch_init ~> _os_object_init ~> _objc_init
libdispatch_init
Search for libdispatch_init in the libdispatch source
void
libdispatch_init(void)
{
    ......
    _dispatch_hw_config_init();
    _dispatch_time_init();
    _dispatch_vtable_init();
    _os_object_init();
    _voucher_init();
    _dispatch_introspection_init();
}
You can see that _os_object_init is called, which confirms the flow above. To examine libSystem_initializer, you also need to download the libSystem source code.
libSystem_initializer
// libsyscall_initializer() initializes all of libSystem.dylib
// <rdar://problem/4892197>
__attribute__((constructor))
static void
libSystem_initializer(int argc, const char* argv[], const char* envp[], const char* apple[], const struct ProgramVars* vars)
{
    ......
    libdispatch_init();
    _libSystem_ktrace_init_func(LIBDISPATCH);
    ......
}
Searching the libSystem source confirms that libSystem_initializer calls libdispatch_init. In the stack, the frame before libSystem_initializer is dyld’s doModInitFunctions, so we are back in dyld. The call flow can now be updated to:
doModInitFunctions
~> libSystem_initializer
~> libdispatch_init
~> _os_object_init
~> _objc_init
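libSystem_initializer is marked with __attribute__((constructor)), which is why it runs during loading, before main. A minimal sketch of the same mechanism, assuming a GCC- or Clang-compatible compiler; demo_initializer, startup_log, and initializer_ran_before_main are hypothetical names used only for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical log for this illustration. A function-local static is used
// so the vector is guaranteed to exist when the constructor function runs.
static std::vector<std::string>& startup_log() {
    static std::vector<std::string> log;
    return log;
}

// Marked like libSystem_initializer: the runtime calls it before main().
__attribute__((constructor))
static void demo_initializer() {
    startup_log().push_back("demo_initializer");
}

// Call this from main() to check what already ran by the time main started.
bool initializer_ran_before_main() {
    return !startup_log().empty() && startup_log().front() == "demo_initializer";
}
```

Calling initializer_ran_before_main() as the first statement of main should return true, since nothing in main itself ever calls demo_initializer.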
doModInitFunctions
void ImageLoaderMachO::doModInitFunctions(const LinkContext& context)
{
    ......
    Initializer* inits = (Initializer*)(sect->addr + fSlide);
    ......
    Initializer func = inits[j];
    ......
    // run the initializer; for libSystem this is how libSystem_initializer is reached
    func(context.argc, context.argv, context.envp, context.apple, &context.programVars);
    ......
}
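The core of doModInitFunctions is walking an array of function pointers and calling each one. A minimal sketch of that pattern, assuming a simplified Initializer signature (only argc and argv) and a plain static array standing in for the section of initializer pointers; init_a, init_b, g_inits, g_calls, and run_mod_init_functions are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for dyld's Initializer type (assumption: the real one
// also receives envp, apple and ProgramVars, as the excerpt above shows).
typedef void (*Initializer)(int argc, const char* argv[]);

static std::vector<std::string> g_calls;  // hypothetical call log

static void init_a(int, const char*[]) { g_calls.push_back("init_a"); }
static void init_b(int, const char*[]) { g_calls.push_back("init_b"); }

// Stand-in for the table of initializer pointers found in an image.
static Initializer g_inits[] = { init_a, init_b };

// Calls every initializer in table order, passing the same arguments to each.
void run_mod_init_functions(int argc, const char* argv[]) {
    const std::size_t count = sizeof(g_inits) / sizeof(g_inits[0]);
    for (std::size_t j = 0; j < count; ++j) {
        Initializer func = g_inits[j];
        func(argc, argv);
    }
}
```

After one call to run_mod_init_functions, g_calls holds init_a followed by init_b, mirroring how each image's initializers run in the order they are stored.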
Next, find where doModInitFunctions is called.
doInitialization
bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
    CRSetCrashLogMessage2(this->getPath());
    // mach-o has -init and static initializers
    doImageInit(context);
    doModInitFunctions(context);
    CRSetCrashLogMessage2(NULL);
    return (fHasDashInit || fHasInitializers);
}
doModInitFunctions is called from doInitialization, and doInitialization is called from recursiveInitialization, the function we reached in the forward analysis above. The loop is now closed.
void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                          InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
    ......
    if ( fState < dyld_image_state_dependents_initialized-1 ) {
        ......
        try {
            ......
            // let objc know we are about to initialize this image
            uint64_t t1 = mach_absolute_time();
            fState = dyld_image_state_dependents_initialized;
            oldState = fState;
            context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);

            // initialize this image
            bool hasInitializers = this->doInitialization(context);

            // let anyone know we finished initializing this image
            fState = dyld_image_state_initialized;
            oldState = fState;
            context.notifySingle(dyld_image_state_initialized, this, NULL);
            ......
        }
        ......
    }
}
Conclusion
Combining the source code analysis with debugging in a real project, we get the following chain of function calls:
_dyld_start
~> dyld::_main
~> initializeMainExecutable
~> runInitializers
~> processInitializers
~> recursiveInitialization
~> doInitialization
~> doModInitFunctions
~> libSystem_initializer
~> libdispatch_init
~> _os_object_init
~> _objc_init
notifySingle is called in recursiveInitialization, but registerObjCNotifiers, which we traced from it, appeared to do nothing but assignments. So when are the registered callbacks actually invoked?
void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init,
                           _dyld_objc_notify_unmapped unmapped)
{
    // record functions to call
    sNotifyObjCMapped = mapped;
    sNotifyObjCInit = init;
    sNotifyObjCUnmapped = unmapped;
    // call 'mapped' function with all images mapped so far
    try {
        notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
    }
    catch (const char* msg) {
        // ignore request to abort during registration
    }
}
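sNotifyObjCMapped is called inside notifyBatchPartial, and notifyBatchPartial is called from registerObjCNotifiers, so the mapped callback (map_images) runs as soon as the callbacks are registered.
sNotifyObjCInit is called in notifySingle, and notifySingle is executed in recursiveInitialization above. Because recursiveInitialization is recursive, dependencies such as libSystem are initialized first, which is what registers the callbacks through _objc_init; after that, notifySingle invokes sNotifyObjCInit (load_images) for each image as it is initialized.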
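The two call paths described above can be modeled in a few lines. This is a simplified sketch, not dyld's code: the callback types take plain strings instead of Mach-O headers, and register_notifiers, notify_single, demo_map_images, demo_load_images, and the image names are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified callback types.
typedef void (*MappedCallback)(const std::vector<std::string>& paths);
typedef void (*InitCallback)(const std::string& path);

static MappedCallback sNotifyMapped = nullptr;
static InitCallback   sNotifyInit   = nullptr;
static std::vector<std::string> sMappedImages;  // images mapped so far
static std::vector<std::string> sEvents;        // log for this illustration

// Like registerObjCNotifiers: record the callbacks, then immediately
// call 'mapped' with all images mapped so far.
void register_notifiers(MappedCallback mapped, InitCallback init) {
    sNotifyMapped = mapped;
    sNotifyInit = init;
    if (sNotifyMapped != nullptr) sNotifyMapped(sMappedImages);
}

// Like notifySingle: the init callback only runs once something is registered.
void notify_single(const std::string& path) {
    if (sNotifyInit != nullptr) sNotifyInit(path);
}

static void demo_map_images(const std::vector<std::string>& paths) {
    for (const std::string& p : paths) sEvents.push_back("map " + p);
}

static void demo_load_images(const std::string& path) {
    sEvents.push_back("load " + path);
}

// Replays the sequence: notify before registration, register, notify again.
std::vector<std::string> demo_notifier_flow() {
    sEvents.clear();
    sMappedImages = {"libBase", "app"};
    notify_single("libBase");                               // nothing registered yet
    register_notifiers(demo_map_images, demo_load_images);  // 'mapped' fires here
    notify_single("app");                                   // 'init' fires here
    return sEvents;
}
```

The first notification is silently skipped because no callback exists yet, registration triggers the mapped callback for both images, and only the later notification reaches the init callback.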
Refer to the video
For an introduction to dyld2 and dyld3, see Apple’s WWDC video.