In the previous chapter we walked through the compilation journey: our app was compiled successfully into a Mach-O executable file. Now we look at what happens when that executable is launched.
We usually treat the main function as the starting point of our code, but a number of operations run before main is ever reached, most notably the work done by dyld. This chapter explores them in detail.
Loading and dynamic linking
First, a book recommendation: "Programmer's Self-Cultivation: Linking, Loading and Libraries" is a refreshing read on this topic.
To go from an executable file to running code, an app goes through two steps: loading and dynamic library linking.
Loading
An executable (program) is a static concept: it is just a file on disk until it is run. A process is a dynamic concept: it is the program in execution. Every running program gets its own independent virtual address space, and the upper limit on the size of that address space is determined by the hardware (the CPU's bit width).
The virtual address space of a process is controlled by the operating system. Multiple processes run in the operating system at the same time, and their virtual address spaces are isolated from one another.
Loading is the process of mapping the executable file on disk into the process's virtual memory. Memory is expensive and scarce, so loading all of the instructions and data a program needs up front is not practical. Programs exhibit locality when they run, so only the most frequently used parts need to stay in memory while less frequently used data stays on disk. This is the basic principle of dynamic loading.
Loading can also be understood as the process of creating a process. The operating system only needs to do three things:
- Create a separate virtual address space
- Read the executable's header and establish the mapping between the virtual address space and the executable file
- Set the CPU's instruction register to the executable's entry address and start running
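The second step, mapping a file into the address space instead of reading it all up front, can be illustrated with POSIX mmap. This is a minimal sketch of the idea, not what the kernel does when it loads a Mach-O file, and the helper name readMagicViaMmap is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: map a file read-only and return its first 4 bytes.
// Returns false if the file cannot be opened, is too small, or cannot be mapped.
bool readMagicViaMmap(const char* path, uint32_t* magicOut)
{
    assert(path != nullptr && magicOut != nullptr);

    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size < (off_t)sizeof(uint32_t)) {
        close(fd);
        return false;
    }

    // Establish the mapping. Typically no file data is read at this point;
    // the OS brings pages in when they are first touched.
    void* base = mmap(nullptr, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (base == MAP_FAILED) return false;

    // Touching the first page is what causes that page to be loaded.
    memcpy(magicOut, base, sizeof(uint32_t));

    munmap(base, (size_t)st.st_size);
    return true;
}
```

Called on a thin 64-bit Mach-O file on a little-endian machine, the value read this way would be the MH_MAGIC_64 constant, 0xfeedfacf. Only the pages that are actually touched need to occupy physical memory, which is the locality idea described above.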
Dynamic library linking
Concept
Linked libraries fall into two kinds: static libraries and dynamic libraries. A static library is linked at build time and its code is copied into your Mach-O file, so updating it means building again. A dynamic library is linked at run time and is loaded by dyld.
In real iOS development, many features are already provided for you, and they are shared with other apps as well: GUI frameworks, I/O, networking, and so on. Linking these shared libraries to your Mach-O file is also the job of a linker.
All system frameworks used in iOS (UIKit, Foundation, etc.) are dynamically linked. Think of plugs and sockets: with static linking, every plug is inserted into its socket during the static link step after compilation, and the binary simply runs. With dynamic linking, the "plugging in" happens at program startup, so the dynamic linker has to finish its work before any of the code we wrote executes.
Shared cache
To save space, Apple keeps these system libraries in one place: the dyld shared cache.
The Mach-O file is the result of compilation, whereas a dynamic library is linked at run time and is not compiled or linked into the Mach-O file, so the Mach-O file does not contain the definitions of the dynamic library's symbols.
That is, those symbols are marked as undefined, but their names and the paths of the libraries that provide them are recorded. When a dynamic library is brought in at run time through dlopen and dlsym, the library is first located using the recorded path, and the address to bind is then found using the recorded symbol name.
dlopen loads the shared library into the address space of the running process. The loaded library may have undefined symbols of its own, which triggers the loading of more shared libraries. dlopen can either resolve all references at once or defer that work until later. It opens the dynamic library and returns a handle; dlsym takes that handle and a symbol name and returns the address of the function, which can then be called.
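The dlopen/dlsym pair described above can be sketched in a few lines. This is an illustrative helper under stated assumptions, not how dyld itself binds symbols: the name lookupSymbol is hypothetical, and only the standard dlopen, dlsym, RTLD_LAZY and RTLD_NOW from <dlfcn.h> are used.

```cpp
#include <cassert>
#include <dlfcn.h>

// Hypothetical helper: open a library (or the running program itself when
// libraryPath is nullptr) and look up one symbol in it.
// Returns nullptr if either step fails.
void* lookupSymbol(const char* libraryPath, const char* symbolName)
{
    assert(symbolName != nullptr);

    // RTLD_LAZY defers the binding of function references until first use;
    // RTLD_NOW would resolve all of them before dlopen returns.
    void* handle = dlopen(libraryPath, RTLD_LAZY);
    if (handle == nullptr) return nullptr;

    // The handle is deliberately not closed, so the returned address stays valid.
    return dlsym(handle, symbolName);
}
```

The returned address has to be cast to the correct function pointer type before it is called, for example `reinterpret_cast<size_t (*)(const char*)>(lookupSymbol(nullptr, "strlen"))`.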
Advantages
The benefits of dynamic library linking are as follows:
- Code sharing: many programs dynamically link these libraries, but only one copy of each exists in memory and on disk
- Easy to maintain: libSystem.dylib, for example, is an alias for libSystem.B.dylib. To upgrade, provide libSystem.C.dylib and repoint the alias
- Smaller executables: unlike static linking, dynamic linking does not copy library code into the executable at build time, so the executable is much smaller
Program startup as seen from dyld
Introduction
dyld (the dynamic link editor) is Apple's dynamic linker and an important part of Apple's operating systems. After an app is compiled and packaged into an executable Mach-O file, dyld is responsible for linking and loading it.
The dyld source code is open source.
Startup process
Create an empty project. We know that +load runs before main, so set a breakpoint in a +load method and look at the call stack.
_dyld_start
dyldbootstrap::start
dyldbootstrap::start refers to the start function in the dyldbootstrap namespace. Open the source code, search for dyldbootstrap, and find the start function.
//
// This is code to bootstrap dyld. This work in normally done for a program by dyld and crt.
// In dyld we have to do this manually.
//
uintptr_t start(const struct macho_header* appsMachHeader, int argc, const char* argv[],
                intptr_t slide, const struct macho_header* dyldsMachHeader,
                uintptr_t* startGlue)
{
    // if kernel had to slide dyld, we need to fix up load sensitive locations
    // we have to do this before using any global variables
    slide = slideOfMainExecutable(dyldsMachHeader);
    bool shouldRebase = slide != 0;
#if __has_feature(ptrauth_calls)
    shouldRebase = true;
#endif
    if ( shouldRebase ) {
        rebaseDyld(dyldsMachHeader, slide);
    }
    // allow dyld to use mach messaging
    mach_init();
    // kernel sets up env pointer to be just past end of agv array
    const char** envp = &argv[argc+1];
    // kernel sets up apple pointer to be just past end of envp array
    const char** apple = envp;
    while(*apple != NULL) { ++apple; }
    ++apple;
    // set up random value for stack canary
    __guard_setup(apple);
#if DYLD_INITIALIZER_SUPPORT
    // run all C++ initializers inside dyld
    runDyldInitializers(dyldsMachHeader, slide, argc, argv, envp, apple);
#endif
    // now that we are done bootstrapping dyld, call dylds main
    uintptr_t appsSlide = slideOfMainExecutable(appsMachHeader);
    return dyld::_main(appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}
The main steps of the start function are as follows:
1. Bootstrap dyld itself. dyld is itself a dynamically loaded image, but because its job is to link every other dynamic library, it cannot depend on any other library, and it has to perform the relocation of its own global and static variables.
   - const struct macho_header refers to the header of the Mach-O file
   - intptr_t slide is essentially ASLR (address space layout randomization): the image is loaded at a random offset (the slide) to make attacks harder
   - rebaseDyld relocates dyld itself
2. Enable Mach messaging: mach_init()
3. Set up stack protection: __guard_setup
4. Start linking shared objects: dyld::_main
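The pointer arithmetic that start uses to find envp and apple relies on the kernel placing argv, envp and the apple strings back to back in one vector, each part terminated by NULL. A minimal sketch of that walk over a synthetic vector follows; the struct StartupVectors and the function locateVectors are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Pointers into one NULL-separated vector laid out as:
// argv[0..argc-1], NULL, envp..., NULL, apple..., NULL
struct StartupVectors {
    const char** envp;
    const char** apple;
};

// Hypothetical helper mirroring the arithmetic in dyldbootstrap::start.
StartupVectors locateVectors(int argc, const char** argv)
{
    assert(argc >= 0 && argv != nullptr);

    StartupVectors v;
    // envp starts one slot past the NULL that terminates argv
    v.envp = &argv[argc + 1];

    // apple starts one slot past the NULL that terminates envp
    const char** p = v.envp;
    while (*p != nullptr) { ++p; }
    v.apple = p + 1;
    return v;
}
```

Given the synthetic vector `{ "app", "-v", NULL, "HOME=/tmp", NULL, "key=value", NULL }` and argc equal to 2, envp points at index 3 and apple at index 5.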
dyld::_main
uintptr_t _main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide,
                int argc, const char* argv[], const char* envp[], const char* apple[],
                uintptr_t* startGlue)
{
    ...
    // This is dyld's main linking function. The code is too long to quote in full,
    // so it is analyzed step by step below.
    ...
}
1. Configure environment variables
1.1 Get the main executable's cdHash from the environment. Environment variables are defined by the system and can be configured in Xcode
1.2 setContext
1.3 configureProcessRestrictions
1.4 checkEnvironmentVariables
1.5 getHostInfo
2. Load the shared cache
2.1 Check whether the shared cache is disabled: checkSharedRegionDisable
2.2 mapSharedCache
3. Add dyld itself to the UUID list: addDyldImageToUUIDList
4. reloadAllImages
4.1 Instantiate the main program: instantiateFromLoadedImage
sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
The kernel maps the main executable before dyld gets control, so dyld needs to create an ImageLoader for the main executable that is already mapped.
// The kernel maps in main executable before dyld gets control. We need to
// make an ImageLoader* for the already mapped in main executable.
static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
    // try mach-o loader
    if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
        ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
        addImage(image);
        return (ImageLoaderMachO*)image;
    }
    throw "main executable not a known format";
}
instantiateMainExecutable calls sniffLoadCommands, which does the real loading work for the main program: it walks the load commands of the Mach-O file. It also enforces two limits:
- The maximum number of segments is 256
- The maximum number of dynamic libraries (system and custom ones combined) is 4096
void ImageLoaderMachO::sniffLoadCommands(const macho_header* mh, const char* path, bool inCache, bool* compressed,
                                         unsigned int* segCount, unsigned int* libCount, const LinkContext& context,
                                         const linkedit_data_command** codeSigCmd,
                                         const encryption_info_command** encryptCmd)
{
    ...
    for (uint32_t i = 0; i < cmd_count; ++i) {
        ...
    }
    ...
}
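The loop over load commands can be sketched without any Mach-O headers. Every load command begins with a type and a total size, and the size is the stride to the next command. The sketch below counts segments and dylib references in a synthetic buffer. The struct and function names are hypothetical stand-ins rather than dyld's own code, and the two command values are copied from <mach-o/loader.h>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Simplified stand-in for the generic load_command header.
struct LoadCommandHeader {
    uint32_t cmd;      // command type
    uint32_t cmdsize;  // total size of this command in bytes
};

// Command values as defined in <mach-o/loader.h>.
const uint32_t kLoadDylib = 0xc;   // LC_LOAD_DYLIB
const uint32_t kSegment64 = 0x19;  // LC_SEGMENT_64

struct SniffResult {
    unsigned segCount;
    unsigned libCount;
};

// Hypothetical sketch: walk `count` load commands stored back to back in
// `bytes` and count segments and dylib references. Returns false if a
// command is malformed or runs past the end of the buffer.
bool sniff(const uint8_t* bytes, size_t size, uint32_t count, SniffResult* out)
{
    assert(bytes != nullptr && out != nullptr);
    out->segCount = 0;
    out->libCount = 0;

    size_t offset = 0;
    for (uint32_t i = 0; i < count; ++i) {
        if (size - offset < sizeof(LoadCommandHeader)) return false;

        LoadCommandHeader lc;
        memcpy(&lc, bytes + offset, sizeof(lc));  // avoids unaligned reads
        if (lc.cmdsize < sizeof(lc) || lc.cmdsize > size - offset) return false;

        if (lc.cmd == kSegment64) ++out->segCount;
        else if (lc.cmd == kLoadDylib) ++out->libCount;

        offset += lc.cmdsize;  // cmdsize is the stride to the next command
    }
    return true;
}
```

A caller could compare the resulting counts against upper limits such as the ones listed above and reject the image if either is exceeded.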
After the image is created, it is added to the global image list sAllImages. The main program is always the first element of sAllImages.
static void addImage(ImageLoader* image)
{
    // add to master list
    allImagesLock();
    sAllImages.push_back(image);
    allImagesUnlock();
    ...
}
4.2 Load inserted dynamic libraries: loadInsertedDylib
// load any inserted libraries
if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
    for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
        loadInsertedDylib(*lib);
}
4.3 Link the main program: link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
Link the main program against its dynamic libraries and bind its symbols.
// link main executable
gLinkContext.linkingMainExecutable = true;
#if SUPPORT_ACCELERATE_TABLES
if ( mainExcutableAlreadyRebased ) {
    // previous link() on main executable has already adjusted its internal pointers for ASLR
    // work around that by rebasing by inverse amount
    sMainExecutable->rebase(gLinkContext, -mainExecutableSlide);
}
#endif
link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
sMainExecutable->setNeverUnloadRecursive();
if ( sMainExecutable->forceFlat() ) {
    gLinkContext.bindFlat = true;
    gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
}
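The "rebasing by inverse amount" in the comment above is easier to see with numbers. Rebasing adds the slide to every stored absolute address, so applying the negated slide undoes it. The following is a minimal sketch of that idea only. The function rebasePointers is hypothetical, and real rebasing walks rebase information stored in the Mach-O file rather than a plain array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of rebasing: every stored absolute address is adjusted
// by the slide, the difference between the actual and preferred load address.
void rebasePointers(uintptr_t* pointers, size_t count, intptr_t slide)
{
    assert(pointers != nullptr || count == 0);
    for (size_t i = 0; i < count; ++i) {
        // unsigned wrap-around makes a negative slide subtract
        pointers[i] += static_cast<uintptr_t>(slide);
    }
}
```

Rebasing `{ 0x4000, 0x4010 }` by 0x1000 gives `{ 0x5000, 0x5010 }`, and rebasing the result by -0x1000 restores the original values.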
At this point, dyld has finished configuring environment variables -> loading the shared cache -> instantiating the main program -> loading dynamic libraries -> linking dynamic libraries.
5. Run all initializers
The function called is initializeMainExecutable(). It runs the initializers for the main executable and everything it brings in.
5.1 runInitializers -> processInitializers: prepare for initialization
void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
    uint64_t t1 = mach_absolute_time();
    mach_port_t thisThread = mach_thread_self();
    ImageLoader::UninitedUpwards up;
    up.count = 1;
    up.images[0] = this;
    processInitializers(context, thisThread, timingInfo, up);
    context.notifyBatch(dyld_image_state_initialized, false);
    mach_port_deallocate(mach_task_self(), thisThread);
    uint64_t t2 = mach_absolute_time();
    fgTotalInitTime += (t2 - t1);
}
5.2 Iterate over images.count and recursively initialize each image
void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
                                      InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
    uint32_t maxImageCount = context.imageCount()+2;
    ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
    ImageLoader::UninitedUpwards& ups = upsBuffer[0];
    ups.count = 0;
    // Calling recursive init on all images in images list, building a new list of
    // uninitialized upward dependencies.
    for (uintptr_t i=0; i < images.count; ++i) {
        images.images[i]->recursiveInitialization(context, thisThread, images.images[i]->getPath(), timingInfo, ups);
    }
    // If any upward dependencies remain, init them.
    if ( ups.count > 0 )
        processInitializers(context, thisThread, timingInfo, ups);
}
5.3 recursiveInitialization: initialize an image
void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                          InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
    ...
    uint64_t t1 = mach_absolute_time();
    fState = dyld_image_state_dependents_initialized;
    oldState = fState;
    context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
    // initialize this image
    bool hasInitializers = this->doInitialization(context);
    // let anyone know we finished initializing this image
    fState = dyld_image_state_initialized;
    oldState = fState;
    context.notifySingle(dyld_image_state_initialized, this, NULL);
    ...
}
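The state name dyld_image_state_dependents_initialized hints at the ordering: an image's dependencies are initialized before the image itself. The following is a simplified sketch of that depth-first ordering, assuming a plain dependency graph. The Image struct and recursiveInit function are hypothetical, and the real code additionally handles upward dependencies, locking, and state notifications.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an image: a name, the images it depends on, and
// a flag recording whether its initializers have already run.
struct Image {
    std::string name;
    std::vector<Image*> dependents;
    bool initialized = false;
};

// Initialize everything `image` depends on first, then `image` itself.
// `order` records the sequence in which initializers would run.
void recursiveInit(Image* image, std::vector<std::string>& order)
{
    assert(image != nullptr);
    if (image->initialized) return;  // each image is initialized at most once
    image->initialized = true;       // mark early so dependency cycles terminate

    for (Image* dep : image->dependents) {
        recursiveInit(dep, order);
    }
    order.push_back(image->name);    // the image's own initializers run last
}
```

With an app that depends on Foundation, which depends on libobjc, which depends on libSystem, the recorded order is libSystem, libobjc, Foundation, App. This is why low-level libraries are initialized before any app code runs.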
5.3.1 notifySingle: notifies callbacks about a single image
static void notifySingle(dyld_image_states state, const ImageLoader* image, ImageLoader::InitializerTimingList* timingInfo)
{ ... }
The next frame in the call stack is load_images, but notifySingle contains no direct call to load_images: it is reached through the callback pointer sNotifyObjCInit.
5.3.2 sNotifyObjCInit is assigned in registerObjCNotifiers
void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
    // record functions to call
    sNotifyObjCMapped = mapped;
    sNotifyObjCInit = init;
    sNotifyObjCUnmapped = unmapped;
    // call 'mapped' function with all images mapped so far
    try {
        notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
    }
    catch (const char* msg) {
        // ignore request to abort during registration
    }
    // <rdar://problem/32209809> call 'init' function on all images already init'ed (below libSystem)
    for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
        ImageLoader* image = *it;
        if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
            dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
            (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
        }
    }
}
5.3.3 registerObjCNotifiers is called from _dyld_objc_notify_register
This function is provided for other shared libraries to call, for example the objc runtime library, which needs to be told when images are loaded.
void _dyld_objc_notify_register(_dyld_objc_notify_mapped mapped,
                                _dyld_objc_notify_init init,
                                _dyld_objc_notify_unmapped unmapped)
{
    dyld::registerObjCNotifiers(mapped, init, unmapped);
}
In the objc runtime source we can see that _dyld_objc_notify_register is called from _objc_init.
The three parameters mean the following:
- map_images: called when dyld maps an image into memory
- load_images: called when dyld initializes an image (the familiar +load methods are also called here)
- unmap_image: called when dyld removes an image
void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    lock_init();
    exception_init();
    _dyld_objc_notify_register(&map_images, load_images, unmap_image);
}
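The registration pattern between dyld and the runtime can be reduced to a stored function pointer plus a replay for images that were initialized before anyone registered. The sketch below illustrates that pattern with hypothetical names (sInitCallback, notifyInitialized, registerInitCallback, runtimeLoadImages). It uses a single callback instead of three and is not the actual dyld or objc code.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef void (*InitCallback)(const char* imagePath);

// Loader side: hypothetical stand-ins for sNotifyObjCInit and sAllImages.
static InitCallback sInitCallback = nullptr;
static std::vector<std::string> sInitializedImages;

// Runtime side: hypothetical stand-in for load_images, which only records the path.
static std::vector<std::string> sSeenByRuntime;
void runtimeLoadImages(const char* imagePath)
{
    sSeenByRuntime.push_back(imagePath);
}

// Loader side: called for each image once it has finished initializing.
void notifyInitialized(const char* imagePath)
{
    assert(imagePath != nullptr);
    sInitializedImages.push_back(imagePath);
    if (sInitCallback != nullptr) {
        (*sInitCallback)(imagePath);  // indirect call, no direct reference to the runtime
    }
}

// Loader side: store the callback, then replay it for images that were
// already initialized before the runtime registered.
void registerInitCallback(InitCallback callback)
{
    assert(callback != nullptr);
    sInitCallback = callback;
    for (const std::string& path : sInitializedImages) {
        (*sInitCallback)(path.c_str());
    }
}
```

Because the loader only ever calls through the stored pointer, a search of the loader code for the runtime function's name finds nothing, which matches what we observed for load_images in notifySingle.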
Next, doInitialization runs the image's own initializers: its -init function and its static initializers, which include C++ constructors.
bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
    CRSetCrashLogMessage2(this->getPath());
    // mach-o has -init and static initializers
    doImageInit(context);
    doModInitFunctions(context);
    CRSetCrashLogMessage2(NULL);
    return (fHasDashInit || fHasInitializers);
}
A constructor of this kind is written in a specific way, and the corresponding function can then be found in the Mach-O file. For example:
#include <stdio.h>
// Runs before main, as one of the image's static initializers
__attribute__((constructor)) void CPFunc() {
    printf("C++Func1");
}
6. notifyMonitoringDyldMain
Notifies any monitoring processes that this process is about to enter main.
7. Find the entry point of main
Find the real entry point of main and return it.
// find entry point for main executable
result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();
Summary
At this point, the entire startup process is complete.
The runtime is loaded as follows:
- dyld starts and initializes the program binary
- ImageLoader reads the images, which contain our classes, methods, and other symbols
- Because the runtime has registered callbacks with dyld, dyld notifies the runtime when an image is loaded into memory
- When the runtime takes over, it calls map_images to parse and process the image. load_images then calls call_load_methods, which iterates over all loaded classes and calls each class's +load method, followed by its categories' +load methods, in inheritance order
Conclusion
The process diagram
dyld call order
1. Bootstrap itself from the initial call stack left by the kernel
2. Recursively load the dynamic libraries the program depends on into memory, using the caching mechanism where possible
3. Bind non-lazy symbols immediately and record lazy symbols in a table
4. Run the static initializers for the executable
5. Locate the main function of the executable, prepare its arguments, and call it
6. While the program runs, bind lazy symbols on demand, provide runtime dynamic loading services, and provide an interface for the debugger
7. Run static terminators after main returns
8. In some scenarios, call libSystem's _exit function after main finishes
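The difference between lazy and non-lazy binding in steps 3 and 6 comes down to when a symbol name is turned into an address. A minimal sketch of lazy binding follows: a symbol is resolved the first time it is requested and the result is cached for later calls. The LazyBinder class and its two exported functions are hypothetical, and real lazy binding works through stub code and pointer tables in the Mach-O file rather than a map.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef int (*IntFn)(int);

// Hypothetical "library" functions that symbols can be bound to.
static int addOne(int x) { return x + 1; }
static int twice(int x)  { return x * 2; }

// Hypothetical binder: resolves a symbol on first use and caches the result.
class LazyBinder {
public:
    LazyBinder() : resolveCount_(0) {
        exports_["addOne"] = &addOne;
        exports_["twice"] = &twice;
    }

    // Returns the bound function, resolving it only on the first request.
    // Returns nullptr for a symbol that no library provides.
    IntFn bind(const std::string& symbol) {
        assert(!symbol.empty());
        std::map<std::string, IntFn>::iterator cached = bound_.find(symbol);
        if (cached != bound_.end()) return cached->second;  // already bound

        std::map<std::string, IntFn>::iterator found = exports_.find(symbol);
        if (found == exports_.end()) return nullptr;        // undefined symbol
        ++resolveCount_;
        bound_[symbol] = found->second;
        return found->second;
    }

    // Number of symbols that actually had to be resolved so far.
    int resolveCount() const { return resolveCount_; }

private:
    std::map<std::string, IntFn> exports_;  // what the libraries provide
    std::map<std::string, IntFn> bound_;    // symbols bound so far
    int resolveCount_;
};
```

Requesting "addOne" twice resolves it only once, and a symbol that is never requested is never resolved. That is the saving lazy binding offers at launch, whereas non-lazy symbols pay the resolution cost up front.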
Hierarchical sequence diagram