iOS underlying principles + reverse engineering article summary
This article analyzes the dyld loading process to see what happens under the hood before main runs.
Primer
- Create a project, override the load method in ViewController, and add a C++ function, kcFunc, in the file that contains main. In what order are they printed?
- Run the program and check the print order of load, kcFunc, and main
The printed result shows the order load --> C++ function --> main.
Why this order? Conventionally, main is the entry function, so why does main not execute first?
With that question in mind, let's explore what happens before execution reaches main.
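Part of this experiment can be reproduced in plain C++. The following is a minimal sketch, assuming a Clang or GCC toolchain (the constructor attribute is a compiler extension). The +load method belongs to Objective-C and is not modeled here; launch_log and record_main_entry are hypothetical helpers added only for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: a log of launch events. A function-local static is
// used so the vector is constructed before the first event is recorded.
std::vector<std::string>& launch_log() {
    static std::vector<std::string> log;
    return log;
}

// Marked as a constructor (Clang/GCC extension), so it runs before main().
__attribute__((constructor)) static void kcFunc() {
    launch_log().push_back("kcFunc");
}

// Call this as the first statement of main() to record when main starts.
void record_main_entry() {
    launch_log().push_back("main");
}
```

If record_main_entry() is called at the top of main and the log is then printed, kcFunc should appear before main, which matches the C++ function --> main part of the order observed above.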
Compilation process and libraries
Before analyzing app launch, we need to understand how an iOS app's code is compiled, and what dynamic and static libraries are.
The compilation process
The compilation process is shown in the following figure and is mainly divided into the following steps:
- Source files: load the .h, .m, .cpp, and other source files
- Preprocessing: replace macros, remove comments, expand header files, and produce .i files
- Compilation: convert the .i files into assembly language, producing .s files
- Assembly: convert the assembly files into machine code, producing .o files
- Linking: resolve the references to other libraries in the .o files and generate the final executable
Static and dynamic libraries
Static library: in the link phase, the object files generated by the assembler are linked and packaged into the executable together with the libraries they reference. The static library no longer changes at that point, because it was copied directly into the target program at compile time.
- Advantage: once compilation is complete, the library file is no longer needed; the target program has no external dependencies and can run directly
- Disadvantage: the static library ends up existing in two copies, which increases the size of the target program and costs memory, performance, and speed
Dynamic library: the library is not linked into the target program at compile time. The target program only stores a reference to the dynamic library, which is loaded when the program runs.
- Advantages:
  - Smaller packaged app: nothing is copied into the target program, so its size is unaffected, and the app is smaller than with a static library
  - Shared memory, saving resources: the same library can be used by more than one program
  - The program can be updated by updating the dynamic library: because the library is loaded at runtime, it can be replaced at any time without recompiling the code
- Disadvantages: dynamic loading brings some performance loss, and using dynamic libraries makes the program depend on its external environment. If the environment lacks a dynamic library, or the library version is wrong, the program cannot run
The following figure illustrates static and dynamic libraries.
dyld loading process analysis
The analysis is based on the dyld source code, together with the libobjc, libSystem, and libdispatch sources.
What is dyld?
dyld (the dynamic link editor) is Apple's dynamic linker and an important part of Apple's operating systems. After an app is compiled and packaged into an executable in Mach-O format, dyld is responsible for linking and loading the program.
The app's startup flow is shown in the following figure.
The starting point of app launch
- In the demo above, set a breakpoint in the load method and use bt to view the stack and see where app launch starts.
[App starting point]: Running the program shows that launch starts from _dyld_start in dyld, so we need to download a copy of the dyld source from Apple's open source site to analyze it
- The entry point can also be found through the stack information on the left side of Xcode
dyld::_main function source analysis
- In the dyld-750.6 source, search for _dyld_start and find the arm64 implementation. It is implemented in assembly, and the assembly comments show that it calls dyldbootstrap::start(app_mh, argc, argv, dyld_mh, &startGlue), which is a C++ function (the arm64 architecture is used as the example here)
- Search the source for dyldbootstrap to find the namespace, then look for the start function in that file. Its core is to return the result of calling dyld's _main function. Here macho_header is the Mach-O header, and the files dyld loads are Mach-O files; in other words, Mach-O is the executable file format. A Mach-O file consists of four parts: the Mach-O header, the load commands, the sections, and other data. You can inspect an executable's contents with MachOView
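To make the header and load command layout more concrete, here is a minimal sketch that walks the load commands of a 64-bit Mach-O image held in memory. The two structs are local mirrors of the layouts declared in <mach-o/loader.h> (fields simplified to unsigned 32-bit integers), and find_entry_command and make_fake_image are hypothetical helpers, not dyld functions. The sketch looks for LC_MAIN and falls back to LC_UNIXTHREAD, the two commands that Step 9 of dyld::_main relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Local mirrors of the 64-bit Mach-O header and the generic load command.
struct MachHeader64 {
    uint32_t magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved;
};
struct LoadCommand {
    uint32_t cmd, cmdsize;
};

const uint32_t kMagic64 = 0xfeedfacf;      // MH_MAGIC_64
const uint32_t kLcUnixThread = 0x5;        // LC_UNIXTHREAD
const uint32_t kLcMain = 0x80000028;       // LC_MAIN (0x28 | LC_REQ_DYLD)

// Returns kLcMain if an LC_MAIN command exists, otherwise kLcUnixThread if
// that exists, otherwise 0 (also 0 for truncated or non-Mach-O input).
uint32_t find_entry_command(const std::vector<uint8_t>& image) {
    MachHeader64 header;
    if (image.size() < sizeof(header)) return 0;
    std::memcpy(&header, image.data(), sizeof(header));
    if (header.magic != kMagic64) return 0;

    bool has_unix_thread = false;
    std::size_t offset = sizeof(header);   // load commands follow the header
    for (uint32_t i = 0; i < header.ncmds; ++i) {
        LoadCommand lc;
        if (offset + sizeof(lc) > image.size()) return 0;
        std::memcpy(&lc, image.data() + offset, sizeof(lc));
        if (lc.cmdsize < sizeof(lc)) return 0;   // malformed command
        if (lc.cmd == kLcMain) return kLcMain;
        if (lc.cmd == kLcUnixThread) has_unix_thread = true;
        offset += lc.cmdsize;                    // cmdsize gives the stride
    }
    return has_unix_thread ? kLcUnixThread : 0;
}

// Hypothetical helper: builds a fake image that contains only a header and
// bare load commands with the given cmd values (no segments or sections).
std::vector<uint8_t> make_fake_image(const std::vector<uint32_t>& cmds) {
    MachHeader64 header = {kMagic64, 0, 0, 2 /* MH_EXECUTE */,
                           static_cast<uint32_t>(cmds.size()),
                           static_cast<uint32_t>(cmds.size() * sizeof(LoadCommand)),
                           0, 0};
    std::vector<uint8_t> image(sizeof(header) + cmds.size() * sizeof(LoadCommand));
    std::memcpy(image.data(), &header, sizeof(header));
    for (std::size_t i = 0; i < cmds.size(); ++i) {
        LoadCommand lc = {cmds[i], static_cast<uint32_t>(sizeof(LoadCommand))};
        std::memcpy(image.data() + sizeof(header) + i * sizeof(lc), &lc, sizeof(lc));
    }
    return image;
}
```

A real image carries much larger load commands (segments, symbol tables, dylib references); the fake image only exercises the walking logic.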
- Enter the dyld::_main source. It is very long, about 600 lines; if you are not familiar with dyld's loading process, you can work backward from the return value of _main, which is not covered in detail here. The _main function mainly does the following:
- [Step 1: Environment variable configuration]: Set the corresponding values from the environment variables and get the current running architecture
- [Step 2: Shared cache]: Check whether the shared cache is enabled and whether it is mapped into the shared region, for example UIKit, CoreFoundation, etc.
- [Step 3: Main program initialization]: Call instantiateFromLoadedImage to instantiate an ImageLoader object
- [Step 4: Insert dynamic libraries]: Iterate over the DYLD_INSERT_LIBRARIES environment variable and call loadInsertedDylib to load each library
- [Step 5: Link the main program]
- [Step 6: Link the dynamic libraries]
- [Step 7: Weak symbol binding]
- [Step 8: Execute the initialization methods]
- [Step 9: Find the main program entry, i.e. the main function]: Read the LC_MAIN entry from the load commands; if there is none, read LC_UNIXTHREAD. This brings us to the main function familiar from everyday development
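Step 4 above can be sketched as follows. DYLD_INSERT_LIBRARIES holds a colon-separated list of library paths; this sketch splits such a list and hands each path to a loader callback. split_colon_list and load_inserted_libraries are hypothetical names, the callback only plays the role of loadInsertedDylib, and skipping empty entries is a choice made for this sketch rather than a statement about dyld's behavior.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits a colon-separated list of paths, skipping empty entries.
std::vector<std::string> split_colon_list(const std::string& value) {
    std::vector<std::string> paths;
    std::string::size_type start = 0;
    while (start <= value.size()) {
        std::string::size_type end = value.find(':', start);
        if (end == std::string::npos) end = value.size();
        if (end > start) paths.push_back(value.substr(start, end - start));
        start = end + 1;
    }
    return paths;
}

// Visits every path in the list; load_one stands in for loadInsertedDylib.
// A null env_value models the variable not being set.
template <typename Loader>
void load_inserted_libraries(const char* env_value, Loader load_one) {
    if (env_value == nullptr) return;
    const std::vector<std::string> paths = split_colon_list(env_value);
    for (const std::string& path : paths) load_one(path);
}
```

In a real process the value would come from the environment, for example through getenv("DYLD_INSERT_LIBRARIES").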
The following mainly analyzes [Step 3] and [Step 8].
Step 3: Main program initialization
sMainExecutable is the variable that represents the main program. Looking at where it is assigned shows that it is initialized by the instantiateFromLoadedImage function.
- Enter the instantiateFromLoadedImage source: it creates an ImageLoader instance by calling instantiateMainExecutable
- Enter the instantiateMainExecutable source: its role is to create the image for the main executable and return an image object of type ImageLoader, i.e. the main program. Within it, the sniffLoadCommands function reads the relevant information from the load commands of the Mach-O file and performs various checks on it
Step 8: Execute the initialization method
- Enter the initializeMainExecutable source: it mainly loops over the images and runs runInitializers on each of them
- Search globally for runInitializers(cons to find the following source; its core is the call to processInitializers
- Enter the processInitializers function, which calls recursiveInitialization on the image list to initialize the images recursively
- Search globally for recursiveInitialization(cons; its source implementation is as follows
Two functions called here need to be explored: notifySingle and doInitialization. We start with notifySingle.
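The recursion can be pictured with a small model. The following sketch is not dyld's code: Image and recursive_initialization are hypothetical stand-ins that only show the ordering idea, namely that everything an image depends on is initialized before the image itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a loaded image and the images it depends on.
struct Image {
    std::string name;
    std::vector<Image*> dependencies;
    bool initialized = false;
};

// Depth-first: initialize the dependencies, then the image itself.
// The names are appended to `order` in the order initialization happens.
void recursive_initialization(Image& image, std::vector<std::string>& order) {
    if (image.initialized) return;
    image.initialized = true;   // mark first so dependency cycles terminate
    for (Image* dep : image.dependencies) {
        recursive_initialization(*dep, order);
    }
    // In dyld, this is roughly the point where notifySingle and
    // doInitialization run for the image.
    order.push_back(image.name);
}
```

With an app that depends on libA, which in turn depends on libB, the recorded order is libB, libA, app: the lowest-level library is initialized first.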
notifySingle function
- Search globally for notifySingle(; the key line is (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
- Search globally for sNotifyObjCInit: there is no implementation to be found, only an assignment
- Search for where registerObjCNotifiers is called: it is called from _dyld_objc_notify_register
Note: the caller of _dyld_objc_notify_register has to be looked up in the libobjc source
- Search the objc4-781 source for _dyld_objc_notify_register: it is called in _objc_init with arguments passed in, so the value assigned to sNotifyObjCInit is load_images in objc, and load_images calls all the +load methods. To sum up, notifySingle is a callback function
load function loading
Let's go into the load_images source and look at its implementation to verify that load_images calls all the load functions.
- From the _objc_init source in objc, enter the load_images source
- Enter the call_load_methods source: its core is a do-while loop that calls the +load methods
- Enter the call_class_loads source: this is where the load method is called, which confirms the class load method mentioned earlier
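The shape of that do-while loop can be sketched with plain function queues. This is a simplified model, not the objc4 source: LoadQueues is a hypothetical structure standing in for the runtime's lists of pending class and category +load methods. The loop drains class loads first, then runs category loads once, and repeats while new work has been added.

```cpp
#include <cassert>
#include <functional>
#include <vector>

using LoadFn = std::function<void()>;

// Hypothetical queues of pending +load calls.
struct LoadQueues {
    std::vector<LoadFn> classes;
    std::vector<LoadFn> categories;
};

// Calls every pending class load. The list is detached first because a
// load function may enqueue more work while it runs.
static void call_class_loads(LoadQueues& q) {
    std::vector<LoadFn> pending;
    pending.swap(q.classes);
    for (LoadFn& fn : pending) fn();
}

// Calls every pending category load; returns true if new categories were
// enqueued while they ran.
static bool call_category_loads(LoadQueues& q) {
    std::vector<LoadFn> pending;
    pending.swap(q.categories);
    for (LoadFn& fn : pending) fn();
    return !q.categories.empty();
}

void call_load_methods(LoadQueues& q) {
    bool more_categories;
    do {
        // Repeatedly call class loads until there are none left.
        while (!q.classes.empty()) call_class_loads(q);
        // Then call category loads once.
        more_categories = call_category_loads(q);
        // Run again if anything new was enqueued in the meantime.
    } while (!q.classes.empty() || more_categories);
}
```

The consequence of this shape is that a class's load runs before the load of any category, even if the category was enqueued first.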
So load_images calls all the load functions, and the source analysis above corresponds exactly to the stack printout.
[Summary] The load source chain is: _dyld_start --> dyldbootstrap::start --> dyld::_main --> dyld::initializeMainExecutable --> ImageLoader::runInitializers --> ImageLoader::processInitializers --> ImageLoader::recursiveInitialization --> dyld::notifySingle (this is a callback) --> sNotifyObjCInit --> load_images (libobjc.A.dylib)
So the question is: when is _objc_init called? Read on.
doInitialization function
- The trail goes cold at the _objc_init function in objc, so go back to the source of the recursive function recursiveInitialization, where we overlooked a function: doInitialization
- Enter the doInitialization source
It is also divided into two parts: the doImageInit function and the doModInitFunctions function.
- Enter the doImageInit source: its core is a for loop that calls the initialization methods. Note that libSystem's initializer must run first
- Enter the doModInitFunctions source: this function loads everything Cxx-related. You can verify this by setting a breakpoint in the C++ function and checking the program's stack information
At this point we still have not found the call to _objc_init. Give up? Of course not. We can use a symbolic breakpoint to look at the stack that leads to _objc_init:
- Add a symbolic breakpoint on _objc_init, run the program, and look at the stack once it breaks in _objc_init
- In the libSystem source, find libSystem_initializer and look at its implementation
- According to the stack information above, libSystem_initializer calls the libdispatch_init function, whose source is in the libdispatch open source library; search for libdispatch_init in libdispatch
- Enter the _os_object_init source: it calls the _objc_init function
Combining the analysis above: _objc_init registers through _dyld_objc_notify_register, whose parameter 2 is load_images; dyld stores it (sNotifyObjCInit = parameter 2), and notifySingle later calls sNotifyObjCInit(). This forms a closed loop.
A simple way to think about it: notifySingle is like adding a notification observer (addObserver), the call to _dyld_objc_notify_register in _objc_init is like sending the notification (push), and sNotifyObjCInit is like the notification's handler (the selector).
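The closed loop can be modeled in a few lines. This is a sketch under simplifying assumptions: the real _dyld_objc_notify_register takes three callbacks and the real init callback receives a path and a Mach-O header, whereas here only the second callback is modeled and it receives just a path. register_objc_notifier, notify_single, objc_init_sketch, and loaded_images are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified callback type: the real callback also receives a mach_header*.
using InitCallback = void (*)(const char* path);

// dyld side: the stored callback, empty until the runtime registers one.
static InitCallback sNotifyObjCInit = nullptr;

// dyld side: stands in for _dyld_objc_notify_register / registerObjCNotifiers.
void register_objc_notifier(InitCallback init) {
    sNotifyObjCInit = init;
}

// dyld side: stands in for notifySingle; calls back only once registered.
void notify_single(const char* path) {
    if (sNotifyObjCInit != nullptr) {
        (*sNotifyObjCInit)(path);
    }
}

// objc side: records which images were handed to load_images.
std::vector<std::string>& loaded_images() {
    static std::vector<std::string> images;
    return images;
}

// objc side: stands in for load_images, where the +load methods get called.
void load_images(const char* path) {
    loaded_images().push_back(path);
}

// objc side: stands in for _objc_init, which performs the registration.
void objc_init_sketch() {
    register_objc_notifier(&load_images);
}
```

Before objc_init_sketch() runs, notify_single does nothing; afterwards, each notify_single call reaches load_images. That is the sense in which registration has to happen first, and why the initialization of libSystem (which leads to _objc_init) matters.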
[Summary] The _objc_init source chain is: _dyld_start --> dyldbootstrap::start --> dyld::_main --> dyld::initializeMainExecutable --> ImageLoader::runInitializers --> ImageLoader::processInitializers --> ImageLoader::recursiveInitialization --> doInitialization --> libSystem_initializer (libSystem.B.dylib) --> _os_object_init (libdispatch.dylib) --> _objc_init (libobjc.A.dylib)
Step 9: Find the main entry function
- With assembly debugging, you can see execution arrive at the +[ViewController load] method
- Continue, and execution arrives at the C++ function kcFunc
- Click step over to keep going. Once the whole flow has run, execution returns to _dyld_start, which then calls the main() function; the assignment of main's arguments and similar operations are done in assembly
The dyld assembly source implementation
Note: main is a fixed function name. It is written into memory and read by dyld, so if you change the name of the main function, an error is reported.
To sum up, the complete dyld loading process is shown in the figure below, which also answers the opening question: why the call order is load --> Cxx --> main.