1. Introduction to the Xcode compilation process
1.1 Importing Source Files
The code we write, taking Objective-C (OC) as an example, generally lives in class files such as a.h and a.m. How is the order in which these files are compiled determined? In Xcode, the Compile Sources list (under Build Phases) shows all of our .m files, and they are compiled in that order from top to bottom. So where did our .h files go? If you look closely while writing code, you will notice that each .m file contains an #import of its .h file by default, which means the contents of the .h file are pulled into the .m file when it is compiled.
1.2 Preprocessing
In the preprocessing phase (also called pre-compilation), the preprocessor mainly handles the directives that begin with “#”, and does the following for us:
- Remove all “#define” and expand all macro definitions.
- Handle all conditional preprocessor directives, such as “#if”, “#ifdef”, “#elif”, “#else”, “#endif”.
- Process the “#include” directive by inserting the included file at the location of the directive. Note that this process is recursive, because the included file may itself include other files.
- Delete all comments, both “//” and “/* */”.
- Add line number and file name markers so that the compiler can generate line number information for debugging and show line numbers in compile errors or warnings.
- Keep all #pragma compiler directives because the compiler needs to use them.
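The steps in this list can be seen in a few lines of plain C, which goes through the same preprocessor as Objective-C. This is only a minimal sketch: the macros SQUARE and USE_FAST_PATH and the three functions are hypothetical names invented for illustration.

```c
/* Hypothetical macros, for illustration only. */
#define SQUARE(x) ((x) * (x))
#define USE_FAST_PATH 1

/* Macro expansion: after preprocessing, the body reads ((n) * (n)) + 1. */
int square_plus_one(int n) {
    return SQUARE(n) + 1;
}

/* Conditional compilation: only one branch survives preprocessing. */
const char *selected_path(void) {
#if USE_FAST_PATH
    return "fast";
#else
    return "slow";
#endif
}

/* __LINE__ is replaced with the current source line number, which is the
 * same bookkeeping that lets errors and warnings point at a line. */
int line_of_this_statement(void) {
    return __LINE__;
}
```

To look at the real preprocessed output for a file, clang can be asked to stop after this phase with the -E option, for example `clang -E a.m`.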
1.3 Compilation
In the compilation stage, the preprocessed file goes through lexical analysis, syntax analysis, semantic analysis, and optimization, and the corresponding assembly code file is generated. This stage is what we usually mean by “compilation”, and it is the most important part of building the program.
1.4 Assembly
In the assembly stage, the assembly code produced by compilation is transformed into instructions that the machine can execute. Almost every assembly statement corresponds to one machine instruction. In short, the code is translated by looking it up in a dictionary of machine instructions.
1.5 Linking
In the linking phase, the linker has more to do. When we write code, we usually only write the relevant business code. In fact, every .m file we write is compiled into a corresponding .o file, and the key point is that these files cannot be executed independently. We may also reference a number of third-party libraries, and the system's dynamic libraries have to be added as well, before a complete executable can be formed. I won't go into too much detail here; if you are interested, you can read the book Self-Cultivation of Programmer. To summarize:
- Merge all the .o files we wrote ourselves into one output file
- Merge in the code of the static libraries we use, and record the dynamic libraries we depend on so that they can be loaded when the app starts
- Once the files are combined, there are other operations such as address and space allocation, symbol resolution, relocation, and so on
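To give a feel for address allocation and symbol resolution, here is a toy model in C. It is a sketch under heavy assumptions: ToySymbol, ToyObject, and toy_resolve are hypothetical, the objects are simply laid out back to back, and nothing here reflects the real Mach-O layout or the real linker's algorithm.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of an object file (not the Mach-O layout). */
typedef struct {
    const char *name;    /* symbol name, e.g. "_main" */
    size_t      offset;  /* offset of the symbol inside its own object */
} ToySymbol;

typedef struct {
    size_t           size;      /* size of the object's code in bytes */
    const ToySymbol *symbols;   /* symbols this object defines */
    size_t           nsymbols;
} ToyObject;

/* Address allocation plus symbol resolution in one pass:
 * objects are placed one after another, so a symbol's final address is
 * (total size of the earlier objects) + (its offset in its own object).
 * Returns 0 and stores the address on success, -1 if the symbol is undefined. */
int toy_resolve(const ToyObject *objs, size_t nobjs,
                const char *name, size_t *address) {
    size_t base = 0;
    for (size_t i = 0; i < nobjs; i++) {
        for (size_t j = 0; j < objs[i].nsymbols; j++) {
            if (strcmp(objs[i].symbols[j].name, name) == 0) {
                *address = base + objs[i].symbols[j].offset;
                return 0;
            }
        }
        base += objs[i].size;
    }
    return -1;  /* a real linker would report an undefined symbol here */
}
```

With one 64-byte object defining _main and a second object defining _helper at offset 16, toy_resolve places _helper at address 80. Relocation is then the step of patching every reference to _helper with that final address.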
1.6 Generating an Executable File
After the linking phase, an executable Mach-O file is generated (this is the case for OC; it may not be the case for other languages).
2. App loading process
Since the executables are all Mach-O binaries, how do these binaries divide up the work at launch so that their separate parts cooperate? I will first describe the process in plain language, and then we will trace it through the actual source code.
- libSystem: the underlying dependency library that all our code runs on
- libdispatch (GCD): the underlying support for multithreading on iOS
- libobjc: preparing all the classes, which is a prerequisite for our code to execute correctly
- The main() function entry: the starting point of the program once the preparation is complete, where the code we write begins to run
2.1 App launch
2.1.1 The activities from startup to main()
First, I prepared a debugging project and collected several copies of the relevant source code. The details are attached at the end, and interested readers can walk through the whole process themselves. I set a breakpoint just before main() and took a screenshot of the stack, but found nothing there to study. Isn't there a function that executes earlier than main()?
The answer is yes. If you look at the log, the +load method of ViewController is executed before main(). Let's see whether anything runs even earlier than +load.
You don't know until you look, and then you get a shock: there is a lot going on before +load.
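The idea of code running before main() is not specific to +load. As a loose analogy in plain C (an assumption of this sketch, not something taken from the debugging project), Clang and GCC support a constructor attribute whose functions are run during the initializer phase, before main() is entered. The names below are hypothetical.

```c
/* Set by the initializer below; stays 0 if it never runs. */
static int g_initializer_ran = 0;

/* Clang/GCC extension: a function marked constructor is run as part of
 * the image's initializers, before main() is entered. */
__attribute__((constructor))
static void run_before_main(void) {
    g_initializer_ran = 1;
}

/* Call this from main() to check that the initializer has already run. */
int initializer_ran_before_main(void) {
    return g_initializer_ran;
}
```

Calling initializer_ran_before_main() as the first statement of main() returns 1. Note that +load itself is called by the Objective-C runtime rather than through this attribute, so this only illustrates the timing, not the mechanism.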
2.1.2 The story before the +load method
Let's reconstruct this process by tracing the main flow in the source code (fully understanding the source is a big project, so I will only show the reader the key steps that verify the flow).
We start in the dyld source at _dyld_start. (Searching the dyld source, we find it as a piece of arm64 assembly code.) From there the call chain is:
- dyldbootstrap::start(dyld3::MachOLoaded const*, int, char const**, dyld3::MachOLoaded const*, unsigned long*) (found by searching for start())
- dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*) (if the project has already been built, you can click to jump to this function)
- dyld::initializeMainExecutable() (click to jump to it)
- ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&), the key call
- ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&), the next function in the call stack
- ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&)
- dyld::notifySingle(dyld_image_states, ImageLoader const*, ImageLoader::InitializerTimingList*)
At the implementation of notifySingle, however, the trail seems to go cold: all we can see is a call through the function pointer sNotifyObjCInit. A global search of the dyld source for where this pointer is assigned finds the location, but raises a new question: who calls the registerObjCNotifiers function?
Searching dyld for the place where registerObjCNotifiers is called, we find Apple's usual routine of putting a middle-layer API in between, here _dyld_objc_notify_register. But we cannot find any call to _dyld_objc_notify_register inside dyld, so the only possibility is that another library calls it, and the name gives a hint. Looking back at the earlier stack screenshot, I guessed that _dyld_objc_notify_register is called by the libobjc library.
Going to libobjc to look for the call site, we do find it, along with load_images. Opening the source of load_images, we find call_load_methods (which calls the +load methods). Finally, we have arrived at the +load method.
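The pattern uncovered here is callback registration: the loader only holds a function pointer, and another library fills it in through a small API. The sketch below models that shape in C. Every name in it (toy_notify_register, toy_notify_single, toy_load_images, and so on) is hypothetical, and it is not dyld's or libobjc's actual code.

```c
#include <stddef.h>

/* Hypothetical names modelled on the flow described above. */
typedef void (*ToyInitNotifier)(const char *image_path);

/* Loader side: only a function pointer is stored, in the role that
 * sNotifyObjCInit plays in dyld. */
static ToyInitNotifier s_notify_init = NULL;

/* Middle-layer API that the runtime calls to hand over its callback. */
void toy_notify_register(ToyInitNotifier init) {
    s_notify_init = init;
}

/* Called by the loader for each image; a no-op until a callback is registered. */
void toy_notify_single(const char *image_path) {
    if (s_notify_init != NULL) {
        s_notify_init(image_path);
    }
}

/* Runtime side: stands in for load_images, which would call the +load methods. */
static int g_images_seen = 0;

static void toy_load_images(const char *image_path) {
    (void)image_path;  /* unused in this sketch */
    g_images_seen++;
}

/* Stands in for the libobjc code that performs the registration. */
void toy_runtime_init(void) {
    toy_notify_register(toy_load_images);
}

/* Number of images the runtime callback has been notified about. */
int toy_images_seen(void) {
    return g_images_seen;
}
```

Calling toy_notify_single before toy_runtime_init does nothing, while calling it afterwards reaches toy_load_images. This is why searching the loader alone cannot show who ends up being called: the link between the two libraries only exists at run time.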
2.1.3 Exploring how _objc_init is invoked
Everything seems to have fallen into place, but there are new questions.
load_images does not get called on its own; it has to be registered through _dyld_objc_notify_register first.
_dyld_objc_notify_register is called in the _objc_init function.
So where does the _objc_init call come from?
With these questions in mind, let's go back and see whether we missed anything in the flow that started with _dyld_start, beginning with _objc_init.
Returning to the recursiveInitialization step, we find that doInitialization is performed before the initialization notification is sent. Stepping through doInitialization and on into doModInitFunctions, we see that libSystem must be initialized first, or the program will crash.
Next we go to the libSystem library to find the next step in the flow, which is indeed libdispatch_init. Then, in the libdispatch library, we find the next step: clicking into _os_object_init shows the code it executes, and there we find what we were looking for, _objc_init.
Now that you have the general idea, let me summarize it with a mind map.
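The ordering described here, where a library's dependencies are initialized before the library itself, can be sketched as a short recursive function. This is a simplified model with hypothetical names (ToyImage, toy_recursive_init), not the real ImageLoader code, and it assumes the caller passes an order array large enough for every image.

```c
#include <stddef.h>

#define TOY_MAX_DEPS 4

/* Hypothetical, simplified image record (not dyld's ImageLoader). */
typedef struct ToyImage {
    const char      *name;
    struct ToyImage *deps[TOY_MAX_DEPS];  /* libraries this image links against */
    size_t           ndeps;
    int              initialized;
} ToyImage;

/* Initialize the dependencies first, then the image itself, recording the order.
 * Returns the new number of entries written to order[]. */
size_t toy_recursive_init(ToyImage *image, const char **order, size_t count) {
    if (image->initialized) {
        return count;            /* each image is initialized only once */
    }
    image->initialized = 1;      /* mark early so dependency cycles terminate */
    for (size_t i = 0; i < image->ndeps; i++) {
        count = toy_recursive_init(image->deps[i], order, count);
    }
    order[count] = image->name;  /* the image's own initializers would run here */
    return count + 1;
}
```

For an app that depends on libobjc, which in turn depends on libSystem, starting from the app yields the order libSystem, libobjc, App. That matches the observation that libSystem has to go first.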
2.1.4 How is main() invoked
Finally, you may be wondering how main() itself gets called, even though we already know how _objc_init and +load are reached. Don't worry, we won't stop halfway after getting this far, so let's go through one more round and look at the stack again with a breakpoint in main(). From the stack screenshot, we find that main() is invoked directly from the _start method.
🤦 It looks like _start has something we haven't looked at yet. Going back to the _start method, we find something surprising: the value returned by the _main function is used directly, and this happens right after the flow we just explored. Let's see how this main entry is obtained. It is read from the Mach-O file. Really? Let's not take that on faith, and check whether the Mach-O file actually has it.
Dragging the Mach-O executable into MachOView (nicknamed “rotten apple”), I find the corresponding LC_MAIN entry. But what about the main() function itself? Looking at the function segment, we find exactly the same address, which completes the whole exploration up to main(). Congratulations, readers, on your new discovery.
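What the tool displays for LC_MAIN can also be read programmatically by walking the load commands. The sketch below redeclares the relevant constant and struct layouts from <mach-o/loader.h> under Toy names so that it compiles on any platform. The function toy_find_entry_offset is hypothetical and only handles a buffer of load commands, not a whole Mach-O file.

```c
#include <stdint.h>
#include <string.h>

/* Values and layouts mirror <mach-o/loader.h>, redeclared under Toy names. */
#define TOY_LC_REQ_DYLD 0x80000000u
#define TOY_LC_MAIN     (0x28u | TOY_LC_REQ_DYLD)

typedef struct {
    uint32_t cmd;       /* load command type */
    uint32_t cmdsize;   /* total size of this command in bytes */
} ToyLoadCommand;

typedef struct {
    uint32_t cmd;       /* TOY_LC_MAIN */
    uint32_t cmdsize;   /* 24 */
    uint64_t entryoff;  /* file (__TEXT) offset of main() */
    uint64_t stacksize; /* initial stack size, 0 for the default */
} ToyEntryPointCommand;

/* Walk up to ncmds load commands stored back to back in buf (len bytes).
 * Returns 0 and stores entryoff if LC_MAIN is found, -1 otherwise. */
int toy_find_entry_offset(const uint8_t *buf, size_t len,
                          uint32_t ncmds, uint64_t *entryoff) {
    size_t pos = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        ToyLoadCommand lc;
        if (len - pos < sizeof lc) return -1;   /* truncated input */
        memcpy(&lc, buf + pos, sizeof lc);      /* memcpy avoids unaligned reads */
        if (lc.cmdsize < sizeof lc || lc.cmdsize > len - pos) return -1;
        if (lc.cmd == TOY_LC_MAIN) {
            ToyEntryPointCommand ep;
            if (lc.cmdsize < sizeof ep) return -1;
            memcpy(&ep, buf + pos, sizeof ep);
            *entryoff = ep.entryoff;
            return 0;
        }
        pos += lc.cmdsize;                      /* advance to the next command */
    }
    return -1;                                  /* no LC_MAIN present */
}
```

The entryoff value is an offset rather than a final address. Comparing it with the location of main() inside the file is the same check that was done by eye in the tool.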
3. Source information
Github.com/KClichen/ap… All the source code and tools used in the analysis are here, for interested readers to explore on their own.