This article explores how an iOS application is loaded. We usually think of main as the entry point of an application, but is that really the case?
1. Case study
As an example, add a +load method to ViewController.m and a C++ constructor function (marked with __attribute__((constructor))) to main.m. The ViewController code is as follows:
#import "ViewController.h"
@interface ViewController ()
@end
@implementation ViewController
+ (void)load {
    NSLog(@"%s", __func__);
}
- (void)viewDidLoad {
    [super viewDidLoad];
    // Do any additional setup after loading the view.
    NSLog(@"viewDidLoad --- ");
}
@end
The code of main.m is as follows:
#import <UIKit/UIKit.h>
#import "AppDelegate.h"
// Constructor function: called by the loader before main()
__attribute__((constructor)) void kcFunc(void) {
    printf("C... %s \n", __func__);
}
int main(int argc, char * argv[]) {
    @autoreleasepool {
        NSLog(@"main ...");
        return UIApplicationMain(argc, argv, nil, NSStringFromClass([AppDelegate class]));
    }
}
Run the program. The log shows that the program first runs the +load method, then calls the C++ function, and only then enters the main function.
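The "constructor before main" part of this ordering can be reproduced outside iOS. The following is a minimal C++ sketch, assuming GCC or Clang (both support __attribute__((constructor))). It does not involve the Objective-C runtime, so it illustrates only the constructor step, not +load. The names startup_log, kc_func and note_main_started are hypothetical.

```cpp
#include <string>
#include <vector>

// Function-local static: created on first use, so it is safe to touch
// from a constructor function that runs before main().
std::vector<std::string>& startup_log() {
    static std::vector<std::string> log;
    return log;
}

// Plays the role of kcFunc in main.m: the loader calls every function
// marked constructor while initializing the image, before main() runs.
__attribute__((constructor)) void kc_func() {
    startup_log().push_back("kcFunc");
}

// Intended to be the first call inside main(); records that main started.
void note_main_started() {
    startup_log().push_back("main");
}
```

If main() calls note_main_started() first, the log holds "kcFunc" followed by "main": the constructor has already run by the time control reaches main.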
So the order is load -> C++ -> main. Why? Isn't main the entry point of the application? What exactly happens before main? Let's go step by step.
Before analyzing the startup process, let's review a few concepts.
2. Application compilation process
1. Static library
In the linking phase, the object files produced by the assembler are linked and packaged into an executable together with the referenced libraries. A static library's code is copied directly into the target program at build time, so it does not change afterward. Examples: .a, .lib.
Advantages: once the program is built, the library file is no longer needed, and the target program runs without external dependencies.
Disadvantages: the same static library may end up copied into several programs, which increases the size of each target program and costs memory, performance, and speed.
2. Dynamic library
A dynamic library is not copied into the target program at compile time. The target program only stores a reference to the library, which is loaded when the program runs. Examples: .so, .dylib, .framework, .dll.
Advantages: reduces the size of the packaged app; memory is shared, which saves resources; updating a dynamic library updates the programs that use it.
Disadvantages: dynamic loading has a performance cost. Dynamic libraries also make the program depend on its external environment: if a library is missing or has the wrong version, the program will not run.
3. The build process
Source files -> preprocessing -> compilation -> assembly -> linking (with static and dynamic libraries) -> executable.
- Source files: .h, .m, .cpp, and so on.
- Preprocessing: replace macros, remove comments, and expand header files, producing .i files.
- Compilation: convert the .i files to assembly language, producing .s files.
- Assembly: convert the assembly files to machine code, producing .o files.
- Linking: resolve the references in the .o files to other libraries and generate the final executable.
4. Dynamic linker dyld
dyld (the dynamic link editor) is an important part of Apple's operating systems. After an app is compiled and packaged into an executable Mach-O file, dyld is responsible for linking and loading the program.
What dyld does:
- Loads the individual libraries, i.e. the image files.
- Reads them into memory and loads the main program.
- Links the dependent dynamic libraries and initializes the main program.
3. Exploring the dyld entry point
How do we find the entry point? With bt! Add a breakpoint in ViewController's +load method and run the program, then enter bt in the console to print the function call stack. See below:
The call stack on the left shows the program's entry point, _dyld_start. In fact, it already gives us some clues.
Call sequence: _dyld_start -> dyldbootstrap::start -> dyld::_main -> dyld::initializeMainExecutable -> ImageLoader::runInitializers -> ImageLoader::processInitializers -> ImageLoader::recursiveInitialization -> dyld::notifySingle -> load_images -> +[ViewController load].
Download the dyld source code; at the time of writing, the latest version is dyld-852.7. Note that this source code cannot be compiled. Open it, search globally for _dyld_start, and start exploring the loading process. The assembly implementation is in the dyldStartup.s file. See below:
The file provides different implementations for different architectures; here we look at the arm64 implementation for real devices. Even without reading the assembly in detail, we can see that every implementation eventually calls dyldbootstrap::start. This matches the call stack we saw earlier with bt: _dyld_start -> dyldbootstrap::start. Next, search globally for dyldbootstrap. See below:
The dyldbootstrap namespace is defined in dyldInitialization.cpp. Its start function is the one to focus on: according to its comments, this is the code that bootstraps dyld. It completes the bootstrap of dyld and then calls dyld's main function.
One more thing: what is macho_header? A program is compiled into an executable Mach-O file, which dyld links and loads. In other words, dyld loads the Mach-O executable, and macho_header is the header of that executable. The MachOView tool can be used to inspect an executable's contents.
A typical Mach-O file contains three regions:
- Header: stores basic information about the Mach-O file, including the platform, the file type, the number of load commands, the total size of the load commands, and the flags used by dyld.
- Load Commands: follow the Header and describe how to load the Mach-O file. This data determines the memory layout and guides the kernel loader and the dynamic linker.
- Data: the actual contents of each segment, including code, data, and so on.
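To make the Header and Load Commands regions concrete, here is a C++ sketch that reads a 64-bit Mach-O header from a byte buffer and walks the load commands that immediately follow it. The struct layouts and constants mirror those in <mach-o/loader.h> (MH_MAGIC_64 = 0xfeedfacf, MH_EXECUTE = 0x2) but are redeclared under hypothetical names so the code builds on any platform. It assumes a thin (non-fat) file whose byte order matches the host's, and it ignores the Data region.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Mirror of mach_header_64 (Header region).
struct MachHeader64 {
    uint32_t magic;      // MH_MAGIC_64 for a 64-bit Mach-O
    int32_t  cputype;    // platform / CPU type
    int32_t  cpusubtype;
    uint32_t filetype;   // e.g. MH_EXECUTE
    uint32_t ncmds;      // number of load commands
    uint32_t sizeofcmds; // total size of all load commands in bytes
    uint32_t flags;      // flags consumed by dyld
    uint32_t reserved;
};

// Mirror of load_command: every command starts with these two fields.
struct LoadCommand {
    uint32_t cmd;     // command type, e.g. LC_SEGMENT_64
    uint32_t cmdsize; // size of this command including its payload
};

constexpr uint32_t kMhMagic64 = 0xfeedfacf;
constexpr uint32_t kMhExecute = 0x2;

// Returns the type of every load command, or an empty vector if the buffer
// is not a well-formed 64-bit Mach-O.
std::vector<uint32_t> list_load_commands(const std::vector<uint8_t>& file) {
    std::vector<uint32_t> cmds;
    if (file.size() < sizeof(MachHeader64)) return cmds;
    MachHeader64 header;
    std::memcpy(&header, file.data(), sizeof header);
    if (header.magic != kMhMagic64) return cmds;

    size_t offset = sizeof(MachHeader64);           // Load Commands follow the Header
    const size_t end = offset + header.sizeofcmds;
    if (end > file.size()) return cmds;
    for (uint32_t i = 0; i < header.ncmds; ++i) {
        if (offset + sizeof(LoadCommand) > end) return {};
        LoadCommand lc;
        std::memcpy(&lc, file.data() + offset, sizeof lc);
        if (lc.cmdsize < sizeof(LoadCommand) || offset + lc.cmdsize > end) return {};
        cmds.push_back(lc.cmd);
        offset += lc.cmdsize;                       // jump to the next command
    }
    return cmds;
}
```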
4. Analysis of the dyld main function
Now let's enter dyld's main function. It is about 600 lines long, so where should we start? Work backward from its final return result; statement, trace how the main program is initialized, and summarize the main work of dyld's main:
- Configure environment variables
Set values based on the environment variables and get the architecture of the current process.
- Shared cache
Check whether the shared cache is enabled and whether it is mapped into the shared region; it holds system libraries such as UIKit and CoreFoundation.
- Main program initialization
Call instantiateFromLoadedImage to instantiate an ImageLoader object for the main executable.
- Load inserted dynamic libraries
Iterate over the DYLD_INSERT_LIBRARIES environment variable and call loadInsertedDylib for each entry.
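The inserted-library step starts from a string: DYLD_INSERT_LIBRARIES holds a colon-separated list of dylib paths. Here is a minimal C++ sketch of only the splitting step; the function name is hypothetical, skipping empty entries is a choice of this sketch rather than a documented dyld behavior, and the actual loading is left out.

```cpp
#include <string>
#include <vector>

// Splits a colon-separated DYLD_INSERT_LIBRARIES-style value into paths.
// Empty entries (e.g. from "a::b" or a trailing ':') are skipped.
std::vector<std::string> split_insert_libraries(const std::string& value) {
    std::vector<std::string> paths;
    std::string current;
    for (char c : value) {
        if (c == ':') {
            if (!current.empty()) paths.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) paths.push_back(current);
    return paths;
}
```

A loader would then handle each path in order, which corresponds to dyld calling loadInsertedDylib once per entry.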
- Link main program
- Link dynamic library
- Execute the initialization method
- Look for the main program entry (the main function)
Read the LC_MAIN entry from the load commands; if there is none, read LC_UNIXTHREAD. This leads to the main function familiar from daily development.
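The entry-point lookup can be sketched the same way: scan the load commands for LC_MAIN, and otherwise note whether LC_UNIXTHREAD is present. The constants mirror <mach-o/loader.h> (LC_REQ_DYLD = 0x80000000, LC_MAIN = 0x28 | LC_REQ_DYLD, LC_UNIXTHREAD = 0x5); the type and function names are hypothetical, the buffer is assumed to start at the first load command, and decoding the thread state of LC_UNIXTHREAD is omitted.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

struct LoadCommand { uint32_t cmd; uint32_t cmdsize; };

// Mirror of entry_point_command, the payload of LC_MAIN (24 bytes).
struct EntryPointCommand {
    uint32_t cmd;
    uint32_t cmdsize;
    uint64_t entryoff;   // offset of main() from the start of the image
    uint64_t stacksize;  // initial stack size, 0 for the default
};

constexpr uint32_t kLcReqDyld    = 0x80000000;
constexpr uint32_t kLcMain       = 0x28 | kLcReqDyld;
constexpr uint32_t kLcUnixThread = 0x5;

enum class EntryKind { None, Main, UnixThread };

struct EntryInfo {
    EntryKind kind = EntryKind::None;
    uint64_t entryoff = 0;  // only meaningful for EntryKind::Main
};

// `cmds` holds the raw load commands; `ncmds` is the count from the header.
EntryInfo find_entry(const std::vector<uint8_t>& cmds, uint32_t ncmds) {
    EntryInfo info;
    size_t offset = 0;
    for (uint32_t i = 0; i < ncmds; ++i) {
        if (offset + sizeof(LoadCommand) > cmds.size()) break;
        LoadCommand lc;
        std::memcpy(&lc, cmds.data() + offset, sizeof lc);
        if (lc.cmdsize < sizeof(LoadCommand) || offset + lc.cmdsize > cmds.size()) break;
        if (lc.cmd == kLcMain && lc.cmdsize >= sizeof(EntryPointCommand)) {
            EntryPointCommand ep;
            std::memcpy(&ep, cmds.data() + offset, sizeof ep);
            info.kind = EntryKind::Main;  // LC_MAIN takes priority
            info.entryoff = ep.entryoff;
            return info;
        }
        if (lc.cmd == kLcUnixThread) info.kind = EntryKind::UnixThread;  // fallback
        offset += lc.cmdsize;
    }
    return info;
}
```

With LC_MAIN, entryoff is relative to the start of the Mach-O image, so the address of main is the image's load address plus entryoff.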
The following sections look at main program initialization and the main program execution flow in detail.
5. Main program initialization
The main program is stored in the variable sMainExecutable, which is initialized by the instantiateFromLoadedImage function. The source code is as follows:
// The kernel maps in main executable before dyld gets control. We need to
// make an ImageLoader* for the already mapped in main executable.
static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
    // try mach-o loader
    // if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
        ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
        addImage(image);
        return (ImageLoaderMachO*)image;
    // }
    // throw "main executable not a known format";
}
This function creates an ImageLoader instance by calling instantiateMainExecutable. The source of instantiateMainExecutable:
// create image for main executable
ImageLoader* ImageLoaderMachO::instantiateMainExecutable(const macho_header* mh, uintptr_t slide, const char* path, const LinkContext& context)
{
    //dyld::log("ImageLoader=%ld, ImageLoaderMachO=%ld, ImageLoaderMachOClassic=%ld, ImageLoaderMachOCompressed=%ld\n",
    //    sizeof(ImageLoader), sizeof(ImageLoaderMachO), sizeof(ImageLoaderMachOClassic), sizeof(ImageLoaderMachOCompressed));
    bool compressed;
    unsigned int segCount;
    unsigned int libCount;
    const linkedit_data_command* codeSigCmd;
    const encryption_info_command* encryptCmd;
    sniffLoadCommands(mh, path, false, &compressed, &segCount, &libCount, context, &codeSigCmd, &encryptCmd);
    // instantiate concrete class based on content of load commands
    if ( compressed )
        return ImageLoaderMachOCompressed::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
    else
#if SUPPORT_CLASSIC_MACHO
        return ImageLoaderMachOClassic::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
#else
        throw "missing LC_DYLD_INFO load command";
#endif
}
It creates the image for the main program and returns it as an ImageLoader object. The sniffLoadCommands function collects information about the load commands in the Mach-O file and performs various checks on them; its compressed result decides which concrete ImageLoader subclass is instantiated.
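The branch at the end of instantiateMainExecutable can be read as a small decision function. This C++ sketch is a simplification: it assumes "compressed" just means that an LC_DYLD_INFO or LC_DYLD_INFO_ONLY command is present (0x22 and 0x22 | LC_REQ_DYLD in <mach-o/loader.h>), whereas the real sniffLoadCommands performs many more checks. It also throws std::runtime_error where dyld throws a C string. The name choose_loader is hypothetical.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

constexpr uint32_t kLcReqDyld      = 0x80000000;
constexpr uint32_t kLcDyldInfo     = 0x22;
constexpr uint32_t kLcDyldInfoOnly = 0x22 | kLcReqDyld;

// Returns the name of the ImageLoader subclass the image would get.
std::string choose_loader(const std::vector<uint32_t>& cmd_types, bool support_classic) {
    bool compressed = false;
    for (uint32_t cmd : cmd_types) {
        if (cmd == kLcDyldInfo || cmd == kLcDyldInfoOnly) compressed = true;
    }
    if (compressed) return "ImageLoaderMachOCompressed";
    if (support_classic) return "ImageLoaderMachOClassic";   // SUPPORT_CLASSIC_MACHO
    throw std::runtime_error("missing LC_DYLD_INFO load command");
}
```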
6. Main program execution process
1. Process analysis
Through the above analysis, we have traced the following flow: _dyld_start -> dyldbootstrap::start -> dyld::_main. To continue tracing the main program execution flow, enter the initializeMainExecutable function:
void initializeMainExecutable()
{
    // record that we've reached this step
    gLinkContext.startedInitializingMainExecutable = true;
    // run initialzers for any inserted dylibs
    ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
    initializerTimes[0].count = 0;
    const size_t rootCount = sImageRoots.size();
    if ( rootCount > 1 ) {
        for(size_t i=1; i < rootCount; ++i) {
            sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
        }
    }
    // run initializers for main executable and everything it brings up
    sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);
    // register cxa_atexit() handler to run static terminators in all loaded images when this process exits
    if ( gLibSystemHelpers != NULL )
        (*gLibSystemHelpers->cxa_atexit)(&runAllStaticTerminators, NULL, NULL);
    // dump info if requested
    if ( sEnv.DYLD_PRINT_STATISTICS )
        ImageLoader::printStatistics((unsigned int)allImagesCount(), initializerTimes[0]);
    if ( sEnv.DYLD_PRINT_STATISTICS_DETAILS )
        ImageLoaderMachO::printStatisticsDetails((unsigned int)allImagesCount(), initializerTimes[0]);
}
This function first runs the initializers for any inserted dylibs, then calls runInitializers on the main executable. The source of runInitializers, and of processInitializers which it calls, is as follows:
void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
                                      InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
    uint32_t maxImageCount = context.imageCount()+2;
    ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
    ImageLoader::UninitedUpwards& ups = upsBuffer[0];
    ups.count = 0;
    // Calling recursive init on all images in images list, building a new list of
    // uninitialized upward dependencies.
    for (uintptr_t i=0; i < images.count; ++i) {
        images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
    }
    // If any upward dependencies remain, init them.
    if ( ups.count > 0 )
        processInitializers(context, thisThread, timingInfo, ups);
}
void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
    uint64_t t1 = mach_absolute_time();
    mach_port_t thisThread = mach_thread_self();
    ImageLoader::UninitedUpwards up;
    up.count = 1;
    up.imagesAndPaths[0] = { this, this->getPath() };
    processInitializers(context, thisThread, timingInfo, up);
    context.notifyBatch(dyld_image_state_initialized, false);
    mach_port_deallocate(mach_task_self(), thisThread);
    uint64_t t2 = mach_absolute_time();
    fgTotalInitTime += (t2 - t1);
}
The core of runInitializers is processInitializers, which calls recursiveInitialization on every image in the list, and again on any remaining upward dependencies. The recursiveInitialization source is as follows:
void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                          InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
    recursive_lock lock_info(this_thread);
    recursiveSpinLock(lock_info);
    if ( fState < dyld_image_state_dependents_initialized-1 ) {
        uint8_t oldState = fState;
        // break cycles
        fState = dyld_image_state_dependents_initialized-1;
        try {
            // initialize lower level libraries first
            for(unsigned int i=0; i < libraryCount(); ++i) {
                ImageLoader* dependentImage = libImage(i);
                if ( dependentImage != NULL ) {
                    // don't try to initialize stuff "above" me yet
                    if ( libIsUpward(i) ) {
                        uninitUps.imagesAndPaths[uninitUps.count] = { dependentImage, libPath(i) };
                        uninitUps.count++;
                    }
                    else if ( dependentImage->fDepth >= fDepth ) {
                        dependentImage->recursiveInitialization(context, this_thread, libPath(i), timingInfo, uninitUps);
                    }
                }
            }
            // record termination order
            if ( this->needsTermination() )
                context.terminationRecorder(this);
            // let objc know we are about to initialize this image
            uint64_t t1 = mach_absolute_time();
            fState = dyld_image_state_dependents_initialized;
            oldState = fState;
            context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
            // initialize this image
            bool hasInitializers = this->doInitialization(context);
            // let anyone know we finished initializing this image
            fState = dyld_image_state_initialized;
            oldState = fState;
            context.notifySingle(dyld_image_state_initialized, this, NULL);
            if ( hasInitializers ) {
                uint64_t t2 = mach_absolute_time();
                timingInfo.addTime(this->getShortName(), t2-t1);
            }
        }
        catch (const char* msg) {
            // this image is not initialized
            fState = oldState;
            recursiveSpinUnLock();
            throw;
        }
    }
    recursiveSpinUnLock();
}
So far, the program loading process is as follows: _dyld_start -> dyldbootstrap::start -> dyld::_main -> initializeMainExecutable -> runInitializers -> processInitializers -> recursiveInitialization
Let's focus on the recursiveInitialization function. Its comments make it easy to identify the core part of the code.
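Following the comments, the core consists of three successive steps:
- Notify that the image is about to be initialized: notifySingle with dyld_image_state_dependents_initialized ("let objc know we are about to initialize this image").
- Initialize the image: doInitialization.
- Notify that the image has been initialized: notifySingle with dyld_image_state_initialized.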
There are two functions we’ll focus on here: notifySingle and doInitialization.
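These three steps already explain the order seen in the case study. Below is a heavily simplified C++ model of recursiveInitialization, assuming each image is just a name plus its dependencies. Locking, upward dependencies, timing and error rollback are omitted, and the "+load" and "C++ initializers" events are labels standing in for notifySingle (which libobjc answers with load_images) and doInitialization. All names are hypothetical.

```cpp
#include <string>
#include <vector>

enum class InitState { Uninitialized, Initializing, Initialized };

// Hypothetical, minimal image: a name plus the images it links against.
struct Image {
    std::string name;
    std::vector<Image*> deps;
    InitState state = InitState::Uninitialized;
};

void recursive_initialization(Image& image, std::vector<std::string>& events) {
    if (image.state != InitState::Uninitialized) return;  // done already, or a cycle
    image.state = InitState::Initializing;                 // break cycles (dyld sets fState to ...-1)
    for (Image* dep : image.deps)                           // initialize lower level libraries first
        recursive_initialization(*dep, events);
    events.push_back(image.name + ": +load");               // notifySingle(dependents_initialized)
    events.push_back(image.name + ": C++ initializers");    // doInitialization
    image.state = InitState::Initialized;                   // notifySingle(initialized)
}
```

For an app that depends on one library, the events are the library's +load and initializers, then App's +load, then App's C++ initializers; appending "main" afterwards reproduces the load -> C++ -> main order from the log.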
2. Exploring notifySingle
Search globally for notifySingle to find its implementation. See below:
The key code is the call through sNotifyObjCInit at line 1112, which passes the image file's path and its Mach-O header. Next, search globally for sNotifyObjCInit to find where it is assigned: in registerObjCNotifiers, sNotifyObjCInit is set to the init argument. See below:
Where is registerObjCNotifiers called? Search globally for registerObjCNotifiers. See below:
registerObjCNotifiers is called from the _dyld_objc_notify_register function. A global search for _dyld_objc_notify_register finds no caller inside dyld, but many functions carry a comment like this:
// Also note, this function must be called after _dyld_objc_notify_register.
In other words, _dyld_objc_notify_register must be called before these functions can run. Literally, the name means "objc notify register", which suggests a connection to libobjc.A.dylib. Searching the libobjc.A.dylib source for _dyld_objc_notify_register finds the call in _objc_init(). See below:
When _objc_init calls _dyld_objc_notify_register, it passes three arguments:
- the address of the map_images function
- the load_images function
- the unmap_image function
Back in the dyld source code, it follows that:
sNotifyObjCMapped = mapped = &map_images
sNotifyObjCInit = init = load_images
sNotifyObjCUnmapped = unmapped = unmap_image
In other words, _objc_init() registers three functions with dyld, and dyld calls them when the corresponding conditions are met while it loads images.
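The registration described above amounts to storing function pointers and calling one of them later. Here is a C++ sketch under simplified, assumed signatures (the real callbacks also receive mach_header pointers, and the mapped callback receives C arrays): register_objc_notifiers plays the role of registerObjCNotifiers, and notify_single plays the part of notifySingle that calls the init callback (load_images) only when an image reaches the dependents-initialized state and a callback has been registered. All names are hypothetical.

```cpp
#include <string>
#include <vector>

// Simplified, hypothetical callback types.
using MappedFn   = void (*)(const std::vector<std::string>& paths);
using InitFn     = void (*)(const std::string& path);
using UnmappedFn = void (*)(const std::string& path);

// Stand-ins for sNotifyObjCMapped, sNotifyObjCInit and sNotifyObjCUnmapped.
MappedFn   g_notify_mapped   = nullptr;
InitFn     g_notify_init     = nullptr;
UnmappedFn g_notify_unmapped = nullptr;

// Like registerObjCNotifiers: remember the three function pointers.
void register_objc_notifiers(MappedFn mapped, InitFn init, UnmappedFn unmapped) {
    g_notify_mapped   = mapped;
    g_notify_init     = init;
    g_notify_unmapped = unmapped;
}

enum class ImageState { DependentsInitialized, Initialized };

// Like the relevant part of notifySingle: only the dependents-initialized
// state triggers the init callback, and only after registration.
void notify_single(ImageState state, const std::string& path) {
    if (state == ImageState::DependentsInitialized && g_notify_init != nullptr)
        g_notify_init(path);
}
```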
2. Call timing of map_images and load_images
When the main program is initialized, libobjc.A.dylib is loaded, and the objc library registers three functions with dyld. So when are these three functions called?
Since dyld cannot be compiled and run, we start from libobjc.A.dylib and again use bt to inspect the call stack. The third function, unmap_image, is only called when an image is unmapped, so we look at the first two for now. Add breakpoints in map_images and load_images, run the program, and see which one is hit first:
map_images is hit first; its call stack is:
_dyld_objc_notify_register --> registerObjCNotifiers --> notifyBatchPartial --> map_images
Continue running the program until the breakpoint in load_images is hit:
_dyld_objc_notify_register --> registerObjCNotifiers --> load_images
This shows that map_images is called before load_images, which makes sense: a +load method can only be called after its class has been realized.
Still, let's verify this in the source. _dyld_objc_notify_register calls registerObjCNotifiers, which calls sNotifyObjCInit, that is load_images, in a loop over the images that are already initialized.
That is where load_images is called; where is map_images called? Inside notifyBatchPartial we find sNotifyObjCMapped, which is map_images. registerObjCNotifiers calls notifyBatchPartial before that loop.
Conclusion: while dyld initializes the main program, libobjc.A.dylib is loaded, and objc registers three functions with dyld: map_images, load_images, and unmap_image. map_images is called before load_images.
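The order inside registerObjCNotifiers can be modeled like this: one batched "mapped" call covering every image mapped so far (map_images, via notifyBatchPartial), then an "init" call (load_images) for each image that is already initialized. This C++ sketch assumes a simplified image record and simplified callback signatures, and all names are hypothetical; the real code also checks whether each image needs Objective-C notifications before calling init.

```cpp
#include <string>
#include <vector>

// Hypothetical image record: its path and whether its initializers already ran.
struct LoadedImage {
    std::string path;
    bool initialized;
};

using MappedFn = void (*)(const std::vector<std::string>& paths);
using InitFn   = void (*)(const std::string& path);

// Sketch of the work done right after the callbacks are stored.
void register_and_replay(const std::vector<LoadedImage>& all_images, MappedFn mapped, InitFn init) {
    std::vector<std::string> paths;
    for (const LoadedImage& img : all_images) paths.push_back(img.path);
    mapped(paths);                                  // first: one batched map_images call
    for (const LoadedImage& img : all_images)
        if (img.initialized) init(img.path);        // then: load_images per initialized image
}
```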
So far we have covered the main work of dyld::_main, the main program execution flow, and how libobjc.A.dylib registers its functions with dyld through _dyld_objc_notify_register and when they are executed. Some questions remain, however: when is the C++ function in the example called, and how does dyld end up calling _objc_init()?
3. Analyze the doInitialization function
We now know how the functions are registered, but not yet when _objc_init() is called. In recursiveInitialization, besides notifySingle, there is doInitialization, which we have not explored yet. See the following figure for the doInitialization source:
As the dynamic linker, dyld loads dynamic libraries, and libobjc.A.dylib is one of them. First look at doImageInit, called from doInitialization. See below:
How should this be understood? The comment says it: "libSystem initializer must run first", i.e. libSystem must be initialized before any other image. Next, look at doModInitFunctions, also called from doInitialization. See below:
doModInitFunctions calls all the C++ initializer functions. How can we verify that? With bt again: add a breakpoint in the C++ function and inspect the call stack, as shown in the following figure:
So the C++ function is called through: _dyld_start -> dyldbootstrap::start -> dyld::_main -> dyld::initializeMainExecutable -> ImageLoader::runInitializers -> ImageLoader::processInitializers -> ImageLoader::recursiveInitialization -> doInitialization -> doModInitFunctions -> C++ function
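The two observations, that libSystem must be initialized first and that doModInitFunctions runs the C++ initializers, can be combined in a small C++ sketch. It assumes each image carries a list of initializer function pointers (what the linker collects in the __mod_init_func section) and that a single flag records whether libSystem has run. The guard only models the "libSystem initializer must run first" comment; the exact check in dyld differs, and all names here are hypothetical.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

using Initializer = void (*)();

// Hypothetical per-image record for doInitialization.
struct ImageInits {
    std::string name;
    bool is_libsystem;
    std::vector<Initializer> mod_init_funcs;  // like the pointers in __mod_init_func
};

// Stand-in for dyld's "libSystem has been initialized" bookkeeping.
bool g_libsystem_initialized = false;

// Refuse to run another image's initializers before libSystem's,
// then call each initializer pointer in order (doModInitFunctions).
void do_initialization(const ImageInits& image) {
    if (!image.is_libsystem && !g_libsystem_initialized)
        throw std::runtime_error("libSystem initializer must run first");
    for (Initializer init : image.mod_init_funcs)
        init();  // e.g. kcFunc in the example
    if (image.is_libsystem)
        g_libsystem_initialized = true;
}
```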
4. When _objc_init() is called
We still haven't found what triggers the callback registration. From the analysis above, however, we know it is related to libSystem, so the path we are looking for must go through the libSystem library. Download the Libsystem-1292.60.1 source code. Searching blindly is like looking for a needle in a haystack; we need an entry point first. Back in the familiar libobjc.A.dylib source, add a breakpoint in _objc_init(), inspect the call stack with bt, and work backward. See below:
As the figure above shows, _os_object_init calls _objc_init, and _os_object_init comes from libdispatch.dylib. The stack also confirms what we saw earlier: libSystem is initialized first. Next, search the libSystem source globally for libSystem_initializer.
According to the call stack, libSystem_initializer then calls libdispatch_init.
extern void libdispatch_init(void); // from libdispatch.dylib
The declaration shows that libdispatch_init comes from libdispatch.dylib. So download the libdispatch source code, open it, and search for libdispatch_init. It leads to the following flow:
A global search for _os_object_init() finds the following source code; its first line calls _objc_init().
The loop is closed, and the path of the callback registration is clear. To summarize, _objc_init() is called as follows:
_dyld_start --> dyldbootstrap::start --> dyld::_main --> dyld::initializeMainExecutable --> ImageLoader::runInitializers --> ImageLoader::processInitializers --> ImageLoader::recursiveInitialization --> doInitialization --> libSystem_initializer (libSystem.B.dylib) --> _os_object_init (libdispatch.dylib) --> _objc_init (libobjc.A.dylib)
This completes the registration of the three callback functions. A simple way to think about it is the notification pattern: _objc_init calling _dyld_objc_notify_register is like adding an observer (addObserver), notifySingle is like posting the notification, and sNotifyObjCInit (load_images) is the notification handler, the selector.
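Both halves of this analogy, plus the call chain that reaches _objc_init, fit in a short C++ sketch. ImageNotifier is a hypothetical observer registry: add_observer corresponds to _dyld_objc_notify_register, post to notifySingle, and the stored handler to load_images. The functions ending in _sketch only mirror the names in the bt output and forward to the next step; they are not the real implementations.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical observer registry standing in for dyld's notifier slots.
class ImageNotifier {
public:
    using Handler = std::function<void(const std::string&)>;

    // Like _dyld_objc_notify_register: libobjc adds its observer once.
    void add_observer(Handler handler) { handlers_.push_back(std::move(handler)); }

    // Like notifySingle: posted for an image as it reaches a state.
    void post(const std::string& image_path) const {
        for (const Handler& handler : handlers_) handler(image_path);
    }

private:
    std::vector<Handler> handlers_;
};

using Trace = std::vector<std::string>;

void objc_init_sketch(ImageNotifier& dyld, Trace& trace, Trace& load_images_calls) {
    trace.push_back("_objc_init");                       // libobjc.A.dylib
    // The registered handler plays the role of load_images (sNotifyObjCInit).
    dyld.add_observer([&load_images_calls](const std::string& path) {
        load_images_calls.push_back(path);
    });
}

void os_object_init_sketch(ImageNotifier& dyld, Trace& trace, Trace& calls) {
    trace.push_back("_os_object_init");                  // libdispatch.dylib
    objc_init_sketch(dyld, trace, calls);
}

void libdispatch_init_sketch(ImageNotifier& dyld, Trace& trace, Trace& calls) {
    trace.push_back("libdispatch_init");                 // libdispatch.dylib
    os_object_init_sketch(dyld, trace, calls);
}

void libsystem_initializer_sketch(ImageNotifier& dyld, Trace& trace, Trace& calls) {
    trace.push_back("libSystem_initializer");            // libSystem.B.dylib
    libdispatch_init_sketch(dyld, trace, calls);
}
```

A post made before libsystem_initializer_sketch runs reaches no handler; once the chain has registered the observer, every later post invokes the load_images stand-in.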
7. The main function
Back to the example: after +load and the C++ function have run, keep stepping past the breakpoint. Execution returns from dyldbootstrap::start to _dyld_start, which then jumps to the main function.
Reading the registers in LLDB with register read shows rax = main, i.e. dyldbootstrap::start has returned the address of main.
Note: the name main is fixed. The entry point is written into the executable at link time and read by dyld, so if the main function is renamed, building the program fails with an error.