Static and dynamic libraries

Under TARGETS -> Build Phases -> Link Binary With Libraries -> (Add/Add Other…), we can add multiple libraries, either system libraries or our own; these can be static or dynamic libraries.

Static libraries usually end with .a, .lib, or .framework, and dynamic libraries end with .dylib, .tbd, .so, or .framework. (A framework may be either a static or a dynamic library; we will come back to this in detail.) At link time, a static library is copied in full into the executable, so a library used by many programs ends up as many redundant copies. System dynamic libraries are not copied: the system loads a dynamic library into memory when the program runs, loads it only once, and shares it among multiple programs, which saves memory.

Shift + Command + N creates a new project. Under Framework & Library, the Framework option creates a dynamic library by default and the Static Library option creates a static library by default; the Mach-O Type value in the created target's Build Settings shows which type each one corresponds to. It is also possible to switch between Mach-O types, such as Static Library and Dynamic Library. By default, dynamic libraries need to be signed, while static libraries do not.

If the framework we create is a dynamic library and is only linked, the app fails at launch with the dyld error Reason: image not found. To fix it, find the framework under General -> Frameworks, Libraries, and Embedded Content and set its Embed option to Embed & Sign.

The dynamic library we create cannot really be shared with other applications. It can only be shared between an app and its App Extensions, all inside the app's own bundle, which is why Apple calls this kind of framework an Embedded Framework. I call such a dynamic library a pseudo-dynamic library.

Here we continue with Test_ipa_Simple as the example. Above, we built the dynamic library DYLIB and the static library STATICLIB and added them to Test_ipa_Simple: DYLIB.framework has Embed set to Embed & Sign, and the correct paths to STATICLIB and its .h files are set in Library Search Paths and Header Search Paths in Build Settings. (For comparison, we also add WebKit.framework in Build Phases -> Link Binary With Libraries.) If DYLIB.framework is linked without being embedded, launch fails with the following dyld error:

dyld: Library not loaded: @rpath/DYLIB.framework/DYLIB
  Referenced from: /Users/hmc/Library/Developer/CoreSimulator/Devices/4E072E27-E586-4E81-A693-A02A3ED83DEC/data/Containers/Bundle/Application/1208BD23-B788-4BF7-A4CE-49FBA99BA330/Test_ipa_Simple.app/Test_ipa_Simple
  Reason: image not found

hmc@bogon Test_ipa_Simple.app % file Test_ipa_Simple 
Test_ipa_Simple: Mach-O 64-bit executable arm64

hmc@bogon DYLIB.framework % file DYLIB 
DYLIB: Mach-O 64-bit dynamically linked shared library arm64

hmc@bogon Debug-iphoneos % file libSTATICLIB.a 
libSTATICLIB.a: current ar archive random library

What is the difference between our dynamic library and the system’s dynamic library?

  1. The dynamic library we create and import into the project lives in our own app's .app directory and can only be used by our app and its App Extensions.
  2. The system dynamic libraries we import into the project live in system directories and can be used by all programs.

(The path of the .app directory can be obtained with NSString *bundlePath = [[NSBundle mainBundle] bundlePath]; the .ipa and .app directories were explained in detail in the first article, so we won't expand on them here.)

Archive Test_ipa_Simple, export and unpack Test_ipa_Simple.ipa, and go to the Test_ipa_Simple.app folder:

We can now use MachOView to inspect the Test_ipa_Simple executable inside the Test_ipa_Simple.app folder and verify the link paths of the dynamic libraries (WebKit and DYLIB). (@rpath stands for the Frameworks folder inside the .app.)

When loading a dynamic library, the system checks the framework's signature. The signature must contain a Team Identifier, and the Team Identifier of the framework and of the host app must be the same. This can be checked with codesign -dv Test_ipa_Simple.app and codesign -dv DYLIB.framework.

  • Why can a framework be either a static or a dynamic library?

System .framework bundles are dynamic libraries, while the .framework bundles we build ourselves have generally been static libraries. However, when you create a framework with Xcode now, the default is a dynamic library (Mach-O Type defaults to Dynamic Library). An SDK packaged for others to use is usually a static library; to build one, change Build Settings -> Mach-O Type to Static Library.

  • What is a framework?

A framework is a way of packaging resources used in Cocoa/Cocoa Touch applications. It bundles code, header files, resource files, documentation, and so on for easy use by developers. Generally, if the framework is static, the resources packaged into it cannot be read: a static framework, like a .a file, is compiled into the executable. Only a dynamic framework appears in the Frameworks folder under the .app, and only then can the resource files in the .framework be read.

Cocoa/Cocoa Touch itself provides a large number of frameworks, such as Foundation.framework, UIKit.framework, and AppKit.framework. It is important to note that all of these are dynamic libraries.

Usually, the third-party SDK frameworks we use are static libraries; real dynamic libraries could not be shipped to the App Store. (Since iOS 8 they can, because App Extensions were introduced and require dynamic library support.)

When we use use_frameworks!, the Pods project generates a target for each pod. For example, a pod called AFNetworking gets a target called AFNetworking, and that target produces AFNetworking.framework.

About use_frameworks!

When use_frameworks! is added to the Podfile, each source pod gets a dynamic framework target in the Pods project. In that target's Build Settings -> Mach-O Type, the default is Dynamic Library, so a dynamic framework is generated; the generated dynamic library for each pod is listed under Products.

These generated dynamic libraries are linked into the main project for its use. However, we do not see them under General -> Frameworks, Libraries, and Embedded Content with Embed set to Embed & Sign, as we would for a framework embedded by hand. Instead, CocoaPods uses a script to embed them in the Frameworks directory of the .app. The script runs in the main target's Build Phases -> [CP] Embed Pods Frameworks ("${PODS_ROOT}/Target Support Files/Pods-Test_ipa_Simple/Pods-Test_ipa_Simple-frameworks.sh").

So by default CocoaPods generates dynamic libraries and embeds them in the Frameworks folder under the .app. If we go to a Pods project target and set Build Settings -> Mach-O Type to Static Library, the generated library is static, but CocoaPods still embeds it in the Frameworks directory of the .app, and because it is static, an error is reported at runtime: unrecognized selector sent to instance.

Let's leave dynamic and static libraries here and move on to the linker.

The order in which a set of functions are executed

// The main.m code is as follows:

#import <UIKit/UIKit.h>

__attribute__((constructor)) void main_front(void) {
    printf("🦁🦁🦁 %s execute \n", __func__);
}

__attribute__((destructor)) void main_back(void) {
    printf("🦁🦁🦁 %s execute \n", __func__);
}

int main(int argc, char * argv[]) {
    NSLog(@"🦁🦁🦁 %s execute", __func__);
    
    // NSString * appDelegateClassName;
    // @autoreleasepool {
    //     // Setup code that might create autoreleased objects goes here.
    //     appDelegateClassName = NSStringFromClass([AppDelegate class]);
    // }
    // return UIApplicationMain(argc, argv, nil, appDelegateClassName);
    
    return 0;
}

// ViewController.m

#import "ViewController.h"

@implementation ViewController

+ (void)load {
    NSLog(@"🦁🦁🦁 %s execute", __func__);
}

- (void)viewDidLoad {
    [super viewDidLoad];
    // Do any additional setup after loading the view.
}

@end

// After running, the console prints the following:
🦁🦁🦁 +[ViewController load] execute
🦁🦁🦁 main_front execute
🦁🦁🦁 main execute
🦁🦁🦁 main_back execute

From the console output, you can see that the load method runs first, followed by main_front, which is marked with the constructor attribute, then main, and finally main_back, which is marked with the destructor attribute.

__attribute__ can set function attributes, variable attributes, and type attributes. The keyword has two underscores before and after it and is followed by a pair of double parentheses containing the attribute list: __attribute__((attribute-list)).

A function with the constructor attribute is executed automatically before main. Similarly, a function with the destructor attribute is executed automatically after main returns or exit is called.
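To isolate the ordering from UIKit and Objective-C, here is a minimal C++ sketch of the same two attributes (GCC/Clang syntax). The names before_main, after_main, g_step, and constructor_step are made up for this illustration and are not part of Test_ipa_Simple.

```cpp
#include <cstdio>

// Hypothetical counters used only for this illustration.
// Both are constant-initialized, so they hold these values before any
// constructor function runs.
static int g_step = 0;              // incremented each time a tracked function runs
static int g_constructor_step = -1; // step at which before_main() ran, -1 if never

// Runs automatically before main() because of the constructor attribute.
__attribute__((constructor))
static void before_main() {
    g_constructor_step = g_step++;
    std::printf("%s runs before main\n", __func__);
}

// Runs automatically after main() returns or exit() is called.
__attribute__((destructor))
static void after_main() {
    std::printf("%s runs after main\n", __func__);
}

// Returns the step recorded by before_main(); 0 means it ran first.
int constructor_step() {
    return g_constructor_step;
}
```

Calling constructor_step() from main should return 0, because before_main has already run by the time main starts.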

As we know, when a program is built, its .h and .m files are preprocessed, then compiled, then assembled, and finally linked (all the code is linked into our program) to produce an executable file.

We have seen that an app needs to load its dependency libraries along with the code compiled from its .h and .m files, so who decides the order in which these are loaded? That is the job of dyld (the dynamic linker), the subject of this section.

App launch: load images -> dyld: read them into memory, start the main program, link, and initialize the necessary objects (runtime, libSystem init, OS init).

Let’s focus on two points: the linker itself and the interpretation of the linking process.

Exploring dyld

On macOS, the dyld program is located at /usr/lib/dyld.

hmc@bogon Simple % file dyld
dyld: Mach-O universal binary with 3 architectures: [x86_64:Mach-O 64-bit dynamic linker x86_64] [i386:Mach-O dynamic linker i386] [arm64e]
dyld (for architecture x86_64):    Mach-O 64-bit dynamic linker x86_64
dyld (for architecture i386):    Mach-O dynamic linker i386
dyld (for architecture arm64e):    Mach-O 64-bit dynamic linker arm64e

The dyld on my computer is a fat (universal) Mach-O file containing three architectures: x86_64, i386, and arm64e.
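As a rough illustration of what file is reading here, the sketch below walks the fat header at the start of a universal binary. It assumes the classic 32-bit layout from <mach-o/fat.h> (a big-endian magic 0xcafebabe, an architecture count, then one 20-byte fat_arch record per slice). list_fat_architectures, read_be32, and arch_name are hypothetical helper names, and only the three architectures mentioned above are recognized.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Values as defined in <mach-o/fat.h> and <mach/machine.h>.
constexpr uint32_t kFatMagic         = 0xcafebabe; // FAT_MAGIC, big-endian on disk
constexpr uint32_t kCpuTypeI386      = 0x00000007; // CPU_TYPE_X86
constexpr uint32_t kCpuTypeX86_64    = 0x01000007; // CPU_TYPE_X86 | CPU_ARCH_ABI64
constexpr uint32_t kCpuTypeArm64     = 0x0100000c; // CPU_TYPE_ARM | CPU_ARCH_ABI64
constexpr uint32_t kCpuSubtypeMask   = 0x00ffffff; // strips the capability bits
constexpr uint32_t kCpuSubtypeArm64e = 2;          // CPU_SUBTYPE_ARM64E

// Reads a big-endian 32-bit value at `offset`; the caller guarantees bounds.
static uint32_t read_be32(const std::vector<uint8_t>& bytes, std::size_t offset) {
    return (uint32_t(bytes[offset]) << 24) | (uint32_t(bytes[offset + 1]) << 16) |
           (uint32_t(bytes[offset + 2]) << 8) | uint32_t(bytes[offset + 3]);
}

// Maps a cputype/cpusubtype pair to a display name (illustrative subset only).
static std::string arch_name(uint32_t cputype, uint32_t cpusubtype) {
    if (cputype == kCpuTypeI386)   return "i386";
    if (cputype == kCpuTypeX86_64) return "x86_64";
    if (cputype == kCpuTypeArm64) {
        return (cpusubtype & kCpuSubtypeMask) == kCpuSubtypeArm64e ? "arm64e" : "arm64";
    }
    return "unknown";
}

// Returns the architecture names listed in a fat header, or an empty
// vector if the buffer is not a 32-bit fat file or the table is truncated.
std::vector<std::string> list_fat_architectures(const std::vector<uint8_t>& bytes) {
    std::vector<std::string> names;
    if (bytes.size() < 8 || read_be32(bytes, 0) != kFatMagic) return names;
    const uint32_t count = read_be32(bytes, 4);
    const std::size_t kArchSize = 20; // cputype, cpusubtype, offset, size, align
    if (count > (bytes.size() - 8) / kArchSize) return names;
    for (uint32_t i = 0; i < count; ++i) {
        const std::size_t base = 8 + i * kArchSize;
        names.push_back(arch_name(read_be32(bytes, base), read_be32(bytes, base + 4)));
    }
    return names;
}
```

Reading the first few dozen bytes of /usr/lib/dyld into a byte vector and passing it to list_fat_architectures would be one way to try this on a real file; a thin (single-architecture) Mach-O yields an empty list because its magic differs.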

dyld is short for the dynamic link editor, that is, the dynamic linker, and it is an important part of Apple's operating systems. On iOS/macOS, only a very small number of processes can be fully loaded by the kernel alone; almost all processes are dynamically linked. A Mach-O image therefore contains many references to external libraries and symbols that cannot be used directly: at startup, the actual addresses behind these references must be filled in. This filling-in, known as symbol binding, is done by the dynamic linker dyld. After the kernel loads the Mach-O file, dyld links the program and loads what it needs into memory.
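The idea of symbol binding can be sketched with a toy model. This is not how dyld stores or resolves symbols internally (real Mach-O images use load commands, export tries, and two-level namespaces); ToyLibrary, ToyImage, and bind_symbols are hypothetical names, and the addresses are arbitrary numbers chosen for the example.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// A toy "dynamic library": a name plus the symbols it exports.
struct ToyLibrary {
    std::string name;
    std::map<std::string, uintptr_t> exports; // symbol name -> address
};

// A toy "image": every imported symbol has a slot that starts at 0 (unbound).
struct ToyImage {
    std::map<std::string, uintptr_t> imports; // symbol name -> bound address
};

// Fills each import slot from the first library that exports the symbol.
// Returns the names of the symbols that no library could provide.
std::vector<std::string> bind_symbols(ToyImage& image,
                                      const std::vector<ToyLibrary>& libraries) {
    std::vector<std::string> missing;
    for (auto& slot : image.imports) {
        bool bound = false;
        for (const ToyLibrary& lib : libraries) {
            auto found = lib.exports.find(slot.first);
            if (found != lib.exports.end()) {
                slot.second = found->second; // the "filling" step
                bound = true;
                break;
            }
        }
        if (!bound) missing.push_back(slot.first);
    }
    return missing;
}
```

In this model, a non-empty return value plays the role of a launch failure such as the image not found error seen earlier: the program refers to something that none of the loaded libraries provides.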

When we write a project, the first executable code we think of is main and load. If we have not overridden load in any class, we assume main is the entry point of our app; once we override load in a class, we know that load runs before main. C functions marked with __attribute__((constructor)) also run before main. So the app has clearly already done some loading before main actually executes. What does that loading consist of? We can find some clues by setting a breakpoint in load and printing the call stack, as shown below:

In the simulator (the sim in dyld_sim indicates the TARGET_OS_SIMULATOR environment):

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100a769c7 Test_ipa_Simple`+[ViewController load](self=ViewController, _cmd="load") at ViewController.m:17:5
    frame #1: 0x00007fff201804e3 libobjc.A.dylib`load_images + 1442
    frame #2: 0x0000000108cb5e54 dyld_sim`dyld::notifySingle(dyld_image_states, ImageLoader const*, ImageLoader::InitializerTimingList*) + 425
    frame #3: 0x0000000108cc4887 dyld_sim`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 437
    frame #4: 0x0000000108cc2bb0 dyld_sim`ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 188
    frame #5: 0x0000000108cc2c50 dyld_sim`ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&) + 82
    frame #6: 0x0000000108cb62a9 dyld_sim`dyld::initializeMainExecutable() + 199
    frame #7: 0x0000000108cbad50 dyld_sim`dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*) + 4431
    frame #8: 0x0000000108cb51c7 dyld_sim`start_sim + 122
    frame #9: 0x0000000200dea57a dyld`dyld::useSimulatorDyld(int, macho_header const*, char const*, int, char const**, char const**, char const**, unsigned long*, unsigned long*) + 2093
    frame #10: 0x0000000200de7df3 dyld`dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*) + 1199
    frame #11: 0x0000000200de222b dyld`dyldbootstrap::start(dyld3::MachOLoaded const*, int, char const**, dyld3::MachOLoaded const*, unsigned long*) + 457
  * frame #12: 0x0000000200de2025 dyld`_dyld_start + 37
(lldb) 

On a real device, compared with the simulator, the backtrace lacks the dyld`dyld::useSimulatorDyld and dyld_sim`start_sim calls (which switch to the simulator environment). Apart from the runtime environment (dyld_sim vs. dyld), the sequence of function calls is basically the same.

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x00000001043f19c0 Test_ipa_Simple`+[ViewController load](self=ViewController, _cmd="load") at ViewController.m:17:5
    frame #1: 0x00000001a2bc925c libobjc.A.dylib`load_images + 944
    frame #2: 0x00000001046ea21c dyld`dyld::notifySingle(dyld_image_states, ImageLoader const*, ImageLoader::InitializerTimingList*) + 464
    frame #3: 0x00000001046fb5e8 dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 512
    frame #4: 0x00000001046f9878 dyld`ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 184
    frame #5: 0x00000001046f9940 dyld`ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&) + 92
    frame #6: 0x00000001046ea6d8 dyld`dyld::initializeMainExecutable() + 216
    frame #7: 0x00000001046ef928 dyld`dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*) + 5216
    frame #8: 0x00000001046e9208 dyld`dyldbootstrap::start(dyld3::MachOLoaded const*, int, char const**, dyld3::MachOLoaded const*, unsigned long*) + 396
    frame #9: 0x00000001046e9038 dyld`_dyld_start + 56
(lldb) 

From _dyld_start to +[ViewController load], the call stack is concentrated in dyld/dyld_sim. (The last step calls libobjc.A.dylib`load_images, which we will examine in detail later.) Below we analyze the functions that appear in the call stack above using the dyld source code.

_dyld_start

_dyld_start is an assembly function. Here we only look at the __arm64__ && !TARGET_OS_SIMULATOR version. (Although the contents of __dyld_start differ between platforms and architectures, the comments show that they all call dyldbootstrap::start.)

#if __arm64__ && !TARGET_OS_SIMULATOR
    .text
    .align 2
    .globl __dyld_start
__dyld_start:
    mov     x28, sp              // mov copies sp into x28, saving the original stack pointer
    and     sp, x28, #~15        // force 16-byte alignment of stack: sp = x28 & ~15
    mov    x0, #0
    mov    x1, #0
    
    // stp (store pair, the two-register variant of str) pre-decrements sp by 16 bytes and stores x1 and x0 there
    stp    x1, x0, [sp, #-16]!    // make aligned terminating frame
    
    mov    fp, sp            // set up fp to point to terminating frame
    
    // sub subtracts 16 from sp and writes the result back to sp
    sub    sp, sp, #16             // make room for local variables
    
#if __LP64__

    // ldr loads a value from memory into a register, here [x28] into x0
    ldr     x0, [x28]               // get app's mh into x0
    
    ldr     x1, [x28, #8]           // get argc into x1 (kernel passes 32-bit int argc as 64-bits on stack to keep alignment)
    add     x2, x28, #16            // get argv into x2
#else
    ldr     w0, [x28]               // get app's mh into x0
    ldr     w1, [x28, #4]           // get argc into x1 (kernel passes 32-bit int argc as 64-bits on stack to keep alignment)
    add     w2, w28, #8             // get argv into x2
#endif
    
    // adrp computes the PC-relative address of the 4 KB page containing a symbol. Because ASLR randomizes code and data addresses, the data is located relative to the PC.
    adrp    x3,___dso_handle@page
    
    add     x3,x3,___dso_handle@pageoff // get dyld's mh in to x4
    mov    x4,sp                   // x5 has &startGlue
    
    // ⬇️⬇️⬇️⬇️⬇️ dyldbootstrap::start is an entry
    // call dyldbootstrap::start(app_mh, argc, argv, dyld_mh, &startGlue)
    
    // bl saves the address of the next instruction in the link register lr (x30) before jumping.
    // (Note: this is the return address, not the return value of the called function.)
    // It is generally used for direct function calls.
    bl    __ZN13dyldbootstrap5startEPKN5dyld311MachOLoadedEiPPKcS3_Pm
    
    // x0 holds the value returned by dyldbootstrap::start (that is, by dyld::_main): the entry point of the target program.
    mov    x16,x0                  // save entry point address in x16
    
#if __LP64__
    ldr     x1, [sp]
#else
    ldr     w1, [sp]
#endif
    
    // cmp is an alias of subs that discards the result and only updates the condition flags (NZCV)
    cmp    x1, #0
    
    b.ne    Lnew

    // LC_UNIXTHREAD way, clean up stack and jump to result
#if __LP64__
    add    sp, x28, #8             // restore unaligned stack pointer without app mh
#else
    add    sp, x28, #4             // restore unaligned stack pointer without app mh
#endif

#if __arm64e__
    braaz   x16                     // jump to the program's entry point
#else
    br      x16                     // jump to the program's entry point
#endif

    // LC_MAIN case, set up stack for call to main()
Lnew:    mov    lr, x1            // simulate return address into _start in libdyld.dylib

#if __LP64__
    ldr    x0, [x28, #8]       // main param1 = argc
    add    x1, x28, #16        // main param2 = argv
    add    x2, x1, x0, lsl #3
    add    x2, x2, #8          // main param3 = &env[0]
    mov    x3, x2
Lapple:    ldr    x4, [x3]
    add    x3, x3, #8
#else
    ldr    w0, [x28, #4]       // main param1 = argc
    add    x1, x28, #8         // main param2 = argv
    add    x2, x1, x0, lsl #2
    add    x2, x2, #4          // main param3 = &env[0]
    mov    x3, x2
Lapple:    ldr    w4, [x3]
    add    x3, x3, #4
#endif

    cmp    x4, #0
    b.ne    Lapple            // main param4 = apple
    
#if __arm64e__
    braaz   x16
#else
    br      x16
#endif

#endif // __arm64__ && !TARGET_OS_SIMULATOR
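Two pieces of arithmetic in the listing above are easy to check in isolation: the stack alignment performed by and sp, x28, #~15, and the 64-bit Lnew computation of envp from argv and argc (add x2, x1, x0, lsl #3 followed by add x2, x2, #8). The sketch below restates them in C++; align_down_16 and envp_address are hypothetical names, and the addresses are treated as plain integers.

```cpp
#include <cstdint>

// Rounds an address down to the nearest multiple of 16 by clearing its
// low four bits, which is what `and sp, x28, #~15` does to the stack pointer.
constexpr uint64_t align_down_16(uint64_t address) {
    return address & ~uint64_t(15);
}

// Mirrors the 64-bit Lnew arithmetic: envp begins after argc pointers of
// 8 bytes each (argc << 3) plus the 8-byte NULL terminator of argv.
constexpr uint64_t envp_address(uint64_t argv_address, uint64_t argc) {
    return argv_address + (argc << 3) + 8;
}
```

For instance, an argv array at 0x1010 with argc equal to 2 occupies 0x1010 through 0x1027 including its terminator, so envp starts at 0x1028.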

dyldbootstrap::start

The call is dyldbootstrap::start(app_mh, argc, argv, dyld_mh, &startGlue). The namespace dyldbootstrap is defined in dyldInitialization.cpp, and the start and rebaseDyld functions are defined inside it. We can already guess their purpose from the namespace name: they initialize dyld, that is, they are the code that bootstraps dyld into a runnable state. Let's take a look at the start function.

//
// This is code to bootstrap dyld. This work in normally done for a program by dyld and crt.
// In dyld we have to do this manually.
//
uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
                const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{

    // Emit kdebug tracepoint to indicate dyld bootstrap has started <rdar://46878536>
    dyld3::kdebug_trace_dyld_marker(DBG_DYLD_TIMING_BOOTSTRAP_START, 0, 0, 0, 0);

    // if kernel had to slide dyld, we need to fix up load sensitive locations
    // we have to do this before using any global variables
    rebaseDyld(dyldsMachHeader); // Applies dyld's own slide (virtual address offset) to fix up its pointers.

    // kernel sets up env pointer to be just past end of agv array
    const char** envp = &argv[argc+1];
    
    // kernel sets up apple pointer to be just past end of envp array
    const char** apple = envp;
    while(*apple != NULL) { ++apple; }
    ++apple;

    // set up random value for stack canary
    // (the global variable long __stack_chk_guard = 0; is set to a random value here)
    __guard_setup(apple);

    // The DYLD_INITIALIZER_SUPPORT macro is defined as 0 earlier in the file, so the contents of this #if are not compiled.
    // (runDyldInitializers runs the initializer functions stored in the __mod_init_func section of the __DATA segment.)
    // (For ordinary images, initializer functions are run by void ImageLoaderMachO::doModInitFunctions(const LinkContext& context), analyzed in detail later.)
    // (For now, just remember the terms initializer and __mod_init_func.)
#if DYLD_INITIALIZER_SUPPORT
    // run all C++ initializers inside dyld
    // (see "Hook static initializers": https://blog.csdn.net/majiakun1/article/details/99413403
    //  for background on C++ static initializers)
    runDyldInitializers(argc, argv, envp, apple);
#endif
    
    // from libc.a; its internal implementation cannot be viewed at present
    _subsystem_init(apple);

    // now that we are done bootstrapping dyld, call dyld's main
    uintptr_t appsSlide = appsMachHeader->getSlide();
    return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}
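The pointer arithmetic that start uses to find envp and apple relies on the kernel laying out argv, envp, and the apple strings back to back, each terminated by a NULL pointer. The sketch below repeats that walk on an ordinary array so it can be tried anywhere; StartupPointers and locate_envp_and_apple are hypothetical names, not dyld APIs.

```cpp
#include <cstddef>

// Result of walking the kernel-style argument block.
struct StartupPointers {
    const char** envp;  // first environment string pointer
    const char** apple; // first "apple" string pointer
};

// Same steps as in dyldbootstrap::start, applied to a caller-supplied block
// laid out as: argv[0..argc-1], NULL, envp..., NULL, apple..., NULL.
StartupPointers locate_envp_and_apple(int argc, const char** argv) {
    const char** envp = &argv[argc + 1];   // skip the argv entries and their NULL terminator
    const char** apple = envp;
    while (*apple != nullptr) { ++apple; } // walk to the NULL that terminates envp
    ++apple;                               // step past that NULL
    StartupPointers result = { envp, apple };
    return result;
}
```

With a block such as {"prog", "-v", NULL, "HOME=/root", "LANG=C", NULL, "executable_path=/prog", NULL} and argc equal to 2, envp points at the fourth slot and apple at the seventh.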

The appsMachHeader and dyldsMachHeader parameters of start are both of type const dyld3::MachOLoaded*. (They can be understood as the header addresses of the executable we are currently running and of the dyld program, respectively.) In dyld/dyld3/MachOLoaded.h we can see struct VIS_HIDDEN MachOLoaded : public MachOFile defined in namespace dyld3, that is, the MachOLoaded struct publicly inherits from MachOFile. In turn, struct VIS_HIDDEN MachOFile : mach_header shows that the MachOFile struct inherits from the mach_header struct.

MachOLoaded declaration:

#ifndef MachOLoaded_h
#define MachOLoaded_h

#include <stdint.h>

#include "Array.h" // declares the Array template class in namespace dyld3
#include "MachOFile.h"


class SharedCacheBuilder;

namespace dyld3 {

// A mach-o mapped into memory with zero-fill expansion
// Can be used in dyld at runtime or during closure building
struct VIS_HIDDEN MachOLoaded : public MachOFile
{
...
};

} // namespace dyld3

#endif /* MachOLoaded_h */

MachOFile declaration:

namespace dyld3 {

...

// A mach-o file read/mapped into memory
// Only info from mach_header or load commands is accessible (no LINKEDIT info)
struct VIS_HIDDEN MachOFile : mach_header
{
...
};

} // namespace dyld3

VIS_HIDDEN is defined as #define VIS_HIDDEN __attribute__((visibility("hidden"))). It is a GCC extension attribute that prevents a symbol from being exported, so the symbol is not visible to programs that link against the library.

In the call dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue); we see that appsMachHeader is cast to macho_header*, so let's look at the definition of macho_header. In dyld/src/ImageLoader.h you can see that under __LP64__ macho_header publicly inherits from mach_header_64, and on other platforms from mach_header (the names macho_header and mach_header differ by a single o). It only inherits; its own body is empty ({}).

#if __LP64__
    struct macho_header                : public mach_header_64  {};
    struct macho_nlist                : public nlist_64  {};    
#else
    struct macho_header                : public mach_header  {};
    struct macho_nlist                : public nlist  {};    
#endif

mach_header was discussed in detail in the previous post, iOS App Startup Optimization (Part 1): The ipa (iPhone Application Archive) Package and an Overview of Mach-O (Mach Object File Format), so we won't expand on it here.

The header part of a Mach-O file corresponds to a data structure defined in darwin-xnu/EXTERNAL_HEADERS/mach-o/loader.h. struct mach_header and struct mach_header_64 correspond to 32-bit and 64-bit architectures respectively. (For 32/64-bit architectures, the 32/64-bit Mach header is at the beginning of the Mach-O file.)

struct mach_header_64 {
    uint32_t    magic;        /* mach magic number identifier */
    cpu_type_t    cputype;    /* cpu specifier */
    cpu_subtype_t    cpusubtype;    /* machine specifier */
    uint32_t    filetype;    /* type of file */
    uint32_t    ncmds;        /* number of load commands */
    uint32_t    sizeofcmds;    /* the size of all the load commands */
    uint32_t    flags;        /* flags */
    uint32_t    reserved;    /* reserved */
};

In summary: MachOLoaded -> MachOFile -> mach_header. MachOFile inherits from mach_header, so it has all the member variables of the mach_header struct, and its definition declares a large set of functions that query the Mach-O header, such as getting the architecture name and CPU type. MachOLoaded inherits from MachOFile, and its definition declares a set of functions for working with a Mach-O file that has been loaded into memory.
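The inheritance chain can be imitated in a few lines. The sketch below is not the dyld3 code: ToyMachHeader copies the field layout of mach_header_64 so the example compiles without Apple's headers, and ToyMachOFile, ToyMachOLoaded, slideFrom, and make_toy_image are hypothetical names. The point it illustrates is that the subclasses add only functions, so an object of the most derived type has exactly the layout of the raw header.

```cpp
#include <cstdint>
#include <string>

// Same field layout as mach_header_64 in <mach-o/loader.h>.
struct ToyMachHeader {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

constexpr uint32_t kMhMagic64     = 0xfeedfacf; // MH_MAGIC_64
constexpr uint32_t kMhExecute     = 0x2;        // MH_EXECUTE
constexpr uint32_t kMhDylib       = 0x6;        // MH_DYLIB
constexpr int32_t  kCpuTypeX86_64 = 0x01000007; // CPU_TYPE_X86_64
constexpr int32_t  kCpuTypeArm64  = 0x0100000c; // CPU_TYPE_ARM64

// Plays the role of MachOFile: header queries only, no new data members.
struct ToyMachOFile : ToyMachHeader {
    bool is64() const { return magic == kMhMagic64; }
    bool isMainExecutable() const { return filetype == kMhExecute; }
    bool isDylib() const { return filetype == kMhDylib; }
    std::string archName() const {
        if (cputype == kCpuTypeArm64)  return "arm64";
        if (cputype == kCpuTypeX86_64) return "x86_64";
        return "unknown";
    }
};

// Plays the role of MachOLoaded: queries that make sense once mapped in memory.
struct ToyMachOLoaded : public ToyMachOFile {
    // Hypothetical helper: slide = actual load address minus preferred address.
    uintptr_t slideFrom(uintptr_t preferredAddress) const {
        return reinterpret_cast<uintptr_t>(this) - preferredAddress;
    }
};

static_assert(sizeof(ToyMachOLoaded) == sizeof(ToyMachHeader),
              "subclasses add functions only, so the layout matches the raw header");

// Builds a zeroed 64-bit header with the given file type and CPU type.
ToyMachOLoaded make_toy_image(uint32_t filetype, int32_t cputype) {
    ToyMachOLoaded image = ToyMachOLoaded(); // value-initialization zeroes every field
    image.magic = kMhMagic64;
    image.cputype = cputype;
    image.filetype = filetype;
    return image;
}
```

Because the sizes match, code like dyld's can treat the address of a mapped header as a pointer to the richer type and call its member functions directly, which is what the (macho_header*)appsMachHeader cast earlier relies on.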

dyld::_main

Next we look at the dyld::_main function. In dyld/src/dyld2.cpp you can see the definition of namespace dyld, which contains uintptr_t _main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, int argc, const char* argv[], const char* envp[], const char* apple[], uintptr_t* startGlue). (The _main function has seven parameters, and the parameter names are long, so the declaration is really long.)

First, a comment on the _main function:

Entry point for dyld. The kernel loads dyld and jumps to __dyld_start, which sets up some registers and calls this function. It returns the address of main() in the target program, which __dyld_start then jumps to.

Here we follow the definition of _main and analyze the related code. Because the function handles different platforms and architectures differently, its definition is extremely long, more than 800 lines in total, so we only analyze the necessary parts. The most important thing is to follow the assignments to uintptr_t result inside the function.

Inside the _main function we see the following two lines of code:

...
// find entry point for main executable
result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();
...
// main executable uses LC_UNIXTHREAD, dyld needs to let "start" in program set up for main()
result = (uintptr_t)sMainExecutable->getEntryFromLC_UNIXTHREAD();
...
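As far as I can tell from reading the dyld source, these two assignments sit in an if/else: the LC_MAIN entry is tried first, and LC_UNIXTHREAD is the fallback. The value written to *startGlue is what the cmp x1, #0 / b.ne Lnew test in __dyld_start later checks. The sketch below models only that decision. ToyMainExecutable, choose_entry_point, and libdyldStart are hypothetical names, and the real code also verifies that the libdyld helper exists, which is omitted here.

```cpp
#include <cstdint>

// Hypothetical stand-in for the main executable's ImageLoader.
struct ToyMainExecutable {
    uintptr_t lcMainEntry;       // 0 when the binary has no LC_MAIN load command
    uintptr_t lcUnixThreadEntry; // entry taken from LC_UNIXTHREAD, else 0
};

// Simplified model of how `result` and `*startGlue` are chosen.
// `libdyldStart` stands for the address of the start helper in libdyld.dylib.
uintptr_t choose_entry_point(const ToyMainExecutable& exe,
                             uintptr_t libdyldStart,
                             uintptr_t* startGlue) {
    uintptr_t result = exe.lcMainEntry;
    if (result != 0) {
        // LC_MAIN case: a non-zero glue makes __dyld_start take the Lnew path
        *startGlue = libdyldStart;
    } else {
        // LC_UNIXTHREAD case: zero glue, __dyld_start jumps straight to the entry
        result = exe.lcUnixThreadEntry;
        *startGlue = 0;
    }
    return result;
}
```

Read together with the assembly, a zero glue value corresponds to the "LC_UNIXTHREAD way" branch and a non-zero one to the "LC_MAIN case" branch.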

sMainExecutable is a global variable, static ImageLoaderMachO* sMainExecutable = NULL;, and it is assigned in the "instantiate ImageLoader for main executable" part of _main:

// instantiate ImageLoader for main executable
sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
gLinkContext.mainExecutable = sMainExecutable;
gLinkContext.mainExecutableCodeSigned = hasCodeSignatureLoadCommand(mainExecutableMH);

ImageLoaderMachO

ImageLoaderMachO is a subclass of ImageLoader which loads Mach-O format files.

instantiateFromLoadedImage returns a pointer to ImageLoaderMachO. In dyld/src/ImageLoaderMachO.h you can see the definition class ImageLoaderMachO : public ImageLoader, that is, the ImageLoaderMachO class publicly inherits from ImageLoader.

ImageLoader is an abstract base class that creates a concrete subclass of ImageLoader to support loading a specific executable file format. For each dynamic shared object in use, an ImageLoader is instantiated.

The ImageLoader base class is responsible for linking images together, but it doesn’t know anything about any particular file format and is mostly done by its specific subclasses.

For example, ImageLoaderMachO is a concrete subclass of ImageLoader that can load files in Mach-O format. (Another example is class ImageLoaderMegaDylib : public ImageLoader; ImageLoaderMegaDylib is the concrete subclass of ImageLoader that represents all dylibs in the shared cache.)

instantiateFromLoadedImage

instantiateFromLoadedImage is a static function defined in dyld2.cpp. Given its const macho_header* mh parameter, it directly calls ImageLoaderMachO::instantiateMainExecutable to instantiate the main executable (that is, to create an ImageLoader object), adds the resulting image to the image list (so the first entry shown by the image list command in Xcode's lldb console is our Mach-O), and returns the address of the ImageLoader object, which becomes sMainExecutable. Later, an image object is likewise created for each dependent and inserted library the program needs; these images are linked, and each image's initializers are called, including the initialization of the runtime.

// The kernel maps in main executable before dyld gets control. We need to
// make an ImageLoader* for the already mapped in main executable.
static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
    // try mach-o loader
    // isCompatibleMachO checks whether the Mach-O's cputype and cpusubtype are supported by the current CPU.
    // (This check is commented out in the source.)
// if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
    
        // Create an ImageLoader object from our main executable
        ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
        
        // addImage appends the image to the global static std::vector<ImageLoader*> sAllImages,
        // so the first image listed by the lldb `image list` command in Xcode is our Mach-O.
        addImage(image);
        
        return (ImageLoaderMachO*)image;
// }
    
// throw "main executable not a known format";
}

ImageLoaderMachO::instantiateMainExecutable

Let’s look at the definition of ImageLoaderMachO::instantiateMainExecutable, whose job is to instantiate the main executable. It contains one more level of dispatch: the result of sniffLoadCommands decides whether ImageLoaderMachOCompressed::instantiateMainExecutable or ImageLoaderMachOClassic::instantiateMainExecutable is called. ImageLoaderMachOCompressed and ImageLoaderMachOClassic are both subclasses of ImageLoaderMachO.

class ImageLoaderMachOCompressed : public ImageLoaderMachO. ImageLoaderMachOCompressed is the concrete subclass of ImageLoader which loads mach-o files that use the compressed LINKEDIT format.

class ImageLoaderMachOClassic : public ImageLoaderMachO. ImageLoaderMachOClassic is the concrete subclass of ImageLoader which loads mach-o files that use the traditional LINKEDIT format.

// create image for main executable
ImageLoader* ImageLoaderMachO::instantiateMainExecutable(const macho_header* mh, uintptr_t slide, const char* path, const LinkContext& context)
{
    //dyld::log("ImageLoader=%ld, ImageLoaderMachO=%ld, ImageLoaderMachOClassic=%ld, ImageLoaderMachOCompressed=%ld\n",
    // sizeof(ImageLoader), sizeof(ImageLoaderMachO), sizeof(ImageLoaderMachOClassic), sizeof(ImageLoaderMachOCompressed));
    
    bool compressed;
    unsigned int segCount;
    unsigned int libCount;
    const linkedit_data_command* codeSigCmd;
    const encryption_info_command* encryptCmd;
    
    sniffLoadCommands(mh, path, false, &compressed, &segCount, &libCount, context, &codeSigCmd, &encryptCmd);
    
    // instantiate concrete class based on content of load commands
    // (call instantiateMainExecutable of ImageLoaderMachOCompressed or ImageLoaderMachOClassic, as appropriate)
    
    if ( compressed ) 
        return ImageLoaderMachOCompressed::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
    else
#if SUPPORT_CLASSIC_MACHO
        return ImageLoaderMachOClassic::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
#else
        throw "missing LC_DYLD_INFO load command";
#endif
}
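The dispatch in instantiateMainExecutable is a small factory: one flag produced by a sniffing pass selects the concrete subclass. The sketch below illustrates that pattern only. MachOImage, CompressedImage, ClassicImage, and instantiate are hypothetical names, and the compile-time SUPPORT_CLASSIC_MACHO switch is modelled as a runtime flag so that both paths can be exercised.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical miniature of the ImageLoaderMachO hierarchy.
struct MachOImage {
    virtual ~MachOImage() = default;
    virtual std::string linkeditFormat() const = 0;
};
struct CompressedImage : MachOImage {
    std::string linkeditFormat() const override { return "compressed"; }
};
struct ClassicImage : MachOImage {
    std::string linkeditFormat() const override { return "classic"; }
};

// Pick the concrete class from the flag that the sniffing pass produced.
// supportClassic stands in for SUPPORT_CLASSIC_MACHO (an assumption of this sketch).
std::unique_ptr<MachOImage> instantiate(bool compressed, bool supportClassic) {
    if (compressed)
        return std::unique_ptr<MachOImage>(new CompressedImage());
    if (supportClassic)
        return std::unique_ptr<MachOImage>(new ClassicImage());
    throw std::runtime_error("missing LC_DYLD_INFO load command");
}
```

Callers only ever see the base-class pointer, which is why the rest of the launch code can treat every image uniformly.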

sniffLoadCommands, also a member function of the ImageLoaderMachO class, determines whether the Mach-O file uses the classic or the compressed LINKEDIT format and how many segments it has. (Its &segCount and &libCount out-parameters supply the arguments that are later passed to instantiateMainExecutable.)

sniffLoadCommands

Let’s look at the definition of the sniffLoadCommands function. This function is too long and we’ll only look at part of it.

// determine if this mach-o file has classic or compressed LINKEDIT and number of segments it has
void ImageLoaderMachO::sniffLoadCommands(const macho_header* mh, const char* path, bool inCache, bool* compressed,
                                            unsigned int* segCount, unsigned int* libCount, const LinkContext& context,
                                            const linkedit_data_command** codeSigCmd,
                                            const encryption_info_command** encryptCmd)
{
    *compressed = false;
    *segCount = 0;
    *libCount = 0;
    *codeSigCmd = NULL;
    *encryptCmd = NULL;

    const uint32_t cmd_count = mh->ncmds;
    const uint32_t sizeofcmds = mh->sizeofcmds;
    ...

The function determines whether the Mach-O file uses classic or compressed LINKEDIT and counts its segments. It then validates the various Load Commands in the Mach-O file, for example LC_DYLD_INFO, LC_DYLD_INFO_ONLY, LC_LOAD_DYLIB, LC_SEGMENT_64, and LC_CODE_SIGNATURE.

...
switch (cmd->cmd) {
    case LC_DYLD_INFO:
    case LC_DYLD_INFO_ONLY:
        if ( cmd->cmdsize != sizeof(dyld_info_command) )
            throw "malformed mach-o image: LC_DYLD_INFO size wrong";
        dyldInfoCmd = (struct dyld_info_command*)cmd;
        *compressed = true;
        break;
    case LC_DYLD_CHAINED_FIXUPS:
        if ( cmd->cmdsize != sizeof(linkedit_data_command) )
            throw "malformed mach-o image: LC_DYLD_CHAINED_FIXUPS size wrong";
        chainedFixupsCmd = (struct linkedit_data_command*)cmd;
        *compressed = true;
        break;
    case LC_DYLD_EXPORTS_TRIE:
        if ( cmd->cmdsize != sizeof(linkedit_data_command) )
            throw "malformed mach-o image: LC_DYLD_EXPORTS_TRIE size wrong";
        exportsTrieCmd = (struct linkedit_data_command*)cmd;
        break;
    case LC_SEGMENT_COMMAND:
        segCmd = (struct macho_segment_command*)cmd;
        ...
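The walk over load commands can be reproduced on a synthetic buffer. The following is a simplified sketch rather than dyld's implementation: the LC_ values are the ones from <mach-o/loader.h>, redeclared with a k prefix so the code builds on any platform, while LoadCommand, SniffResult, sniff, and appendCommand are hypothetical names. It counts every LC_SEGMENT_64 and LC_LOAD_DYLIB command and performs none of the per-command size checks that the real function does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint32_t kLC_REQ_DYLD       = 0x80000000;
constexpr uint32_t kLC_LOAD_DYLIB     = 0xc;
constexpr uint32_t kLC_SEGMENT_64     = 0x19;
constexpr uint32_t kLC_DYLD_INFO      = 0x22;
constexpr uint32_t kLC_DYLD_INFO_ONLY = 0x22 | kLC_REQ_DYLD;

// Every load command starts with these two fields.
struct LoadCommand {
    uint32_t cmd;
    uint32_t cmdsize;
};

struct SniffResult {
    bool compressed = false;
    unsigned segCount = 0;
    unsigned libCount = 0;
};

// Walk ncmds variable-sized commands stored back to back in buf.
SniffResult sniff(const std::vector<uint8_t>& buf, uint32_t ncmds) {
    SniffResult r;
    size_t offset = 0;
    for (uint32_t i = 0; i < ncmds; ++i) {
        if (offset + sizeof(LoadCommand) > buf.size())
            throw "malformed mach-o: load command extends past end";
        LoadCommand lc;
        std::memcpy(&lc, buf.data() + offset, sizeof lc);
        if (lc.cmdsize < sizeof(LoadCommand) || offset + lc.cmdsize > buf.size())
            throw "malformed mach-o: bad cmdsize";
        switch (lc.cmd) {
            case kLC_DYLD_INFO:
            case kLC_DYLD_INFO_ONLY: r.compressed = true; break;
            case kLC_SEGMENT_64:     ++r.segCount;        break;
            case kLC_LOAD_DYLIB:     ++r.libCount;        break;
            default: break;
        }
        offset += lc.cmdsize;  // cmdsize is what advances to the next command
    }
    return r;
}

// Helper for building test input: append a command of the given total size (payload zeroed).
void appendCommand(std::vector<uint8_t>& buf, uint32_t cmd, uint32_t cmdsize) {
    assert(cmdsize >= sizeof(LoadCommand));
    LoadCommand lc{cmd, cmdsize};
    size_t start = buf.size();
    buf.resize(start + cmdsize, 0);
    std::memcpy(buf.data() + start, &lc, sizeof lc);
}
```

The key point is that commands have different sizes, so the loop can only reach the next command by trusting, and therefore validating, cmdsize.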

We have now seen the call sniffLoadCommands(mh, path, false, &compressed, &segCount, &libCount, context, &codeSigCmd, &encryptCmd);. It is followed by either return ImageLoaderMachOCompressed::instantiateMainExecutable(mh, slide, path, segCount, libCount, context); or return ImageLoaderMachOClassic::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);, and in both cases the ImageLoaderMachO constructor is eventually called to create the ImageLoaderMachO object.

Following the execution of ImageLoaderMachOCompressed::instantiateMainExecutable, we see that it first allocates space for the object and then invokes the constructors of ImageLoaderMachOCompressed, ImageLoaderMachO, and ImageLoader in turn.
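The constructor chain is ordinary C++ behaviour and can be observed with a three-level hierarchy. In this sketch Loader, MachOLoader, and CompressedLoader are hypothetical stand-ins for ImageLoader, ImageLoaderMachO, and ImageLoaderMachOCompressed, and gCtorLog exists only to record the order.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which constructor bodies run.
static std::vector<std::string> gCtorLog;

struct Loader {
    Loader() { gCtorLog.push_back("Loader"); }
    virtual ~Loader() = default;
};
struct MachOLoader : Loader {
    MachOLoader() { gCtorLog.push_back("MachOLoader"); }
};
struct CompressedLoader : MachOLoader {
    CompressedLoader() { gCtorLog.push_back("CompressedLoader"); }
};
```

Creating a CompressedLoader enters the most derived constructor first, but each constructor delegates to its base before running its own body, so the bodies execute in the order Loader, MachOLoader, CompressedLoader.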

After sMainExecutable is created, it is assigned to gLinkContext.mainExecutable.

Now let’s walk through the implementation of dyld::_main from top to bottom and see what is worth analyzing.

getHostInfo

1️⃣

The call getHostInfo(mainExecutableMH, mainExecutableSlide); obtains the architecture the process is currently running on; its only job is to assign the sHostCPU and sHostCPUsubtype global variables.

getHostInfo takes two parameters, mainExecutableMH and mainExecutableSlide, but they are used only when __x86_64__ && !TARGET_OS_SIMULATOR holds. When __arm64e__ is defined, for example, the function performs only the assignments sHostCPU = CPU_TYPE_ARM64; and sHostCPUsubtype = CPU_SUBTYPE_ARM64E;.

static void getHostInfo(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide)
{
#if CPU_SUBTYPES_SUPPORTED
#if __ARM_ARCH_7K__
    sHostCPU        = CPU_TYPE_ARM;
    sHostCPUsubtype = CPU_SUBTYPE_ARM_V7K;
#elif __ARM_ARCH_7A__
    sHostCPU        = CPU_TYPE_ARM;
    sHostCPUsubtype = CPU_SUBTYPE_ARM_V7;
#elif __ARM_ARCH_6K__
    sHostCPU        = CPU_TYPE_ARM;
    sHostCPUsubtype = CPU_SUBTYPE_ARM_V6;
#elif __ARM_ARCH_7F__
    sHostCPU        = CPU_TYPE_ARM;
    sHostCPUsubtype = CPU_SUBTYPE_ARM_V7F;
#elif __ARM_ARCH_7S__
    sHostCPU        = CPU_TYPE_ARM;
    sHostCPUsubtype = CPU_SUBTYPE_ARM_V7S;
#elif __ARM64_ARCH_8_32__
    sHostCPU        = CPU_TYPE_ARM64_32;
    sHostCPUsubtype = CPU_SUBTYPE_ARM64_32_V8;
#elif __arm64e__
    sHostCPU        = CPU_TYPE_ARM64;
    sHostCPUsubtype = CPU_SUBTYPE_ARM64E;
#elif __arm64__
    sHostCPU        = CPU_TYPE_ARM64;
    sHostCPUsubtype = CPU_SUBTYPE_ARM64_V8;
#else
    struct host_basic_info info;
    mach_msg_type_number_t count = HOST_BASIC_INFO_COUNT;
    mach_port_t hostPort = mach_host_self();
    kern_return_t result = host_info(hostPort, HOST_BASIC_INFO, (host_info_t)&info, &count);
    if ( result != KERN_SUCCESS )
        throw "host_info() failed";
    sHostCPU        = info.cpu_type;
    sHostCPUsubtype = info.cpu_subtype;
    mach_port_deallocate(mach_task_self(), hostPort);
  #if __x86_64__
      // host_info returns CPU_TYPE_I386 even for x86_64. Override that here so that
      // we don't need to mask the cpu type later.
      sHostCPU = CPU_TYPE_X86_64;
    #if !TARGET_OS_SIMULATOR
      sHaswell = (sHostCPUsubtype == CPU_SUBTYPE_X86_64_H);
      // <rdar://problem/18528074> x86_64h: Fall back to the x86_64 slice if an app requires GC.
      if ( sHaswell ) {
        if ( isGCProgram(mainExecutableMH, mainExecutableSlide) ) {
            // When running a GC program on a haswell machine, don't use and 'h slices
            sHostCPUsubtype = CPU_SUBTYPE_X86_64_ALL;
            sHaswell = false;
            gLinkContext.sharedRegionMode = ImageLoader::kDontUseSharedRegion;
        }
      }
    #endif
  #endif
#endif
#endif
}
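Most branches of getHostInfo are decided at compile time by macros that the compiler predefines for the target. A minimal sketch of the same technique follows; hostArchName is a hypothetical helper, and only a few architectures are covered.

```cpp
#include <cassert>
#include <cstring>

// Resolve an architecture name from compiler-defined macros.
// The more specific macro is tested first, mirroring the order in getHostInfo,
// on the assumption that an arm64e build may also define __arm64__.
const char* hostArchName() {
#if defined(__arm64e__)
    return "arm64e";
#elif defined(__arm64__) || defined(__aarch64__)
    return "arm64";
#elif defined(__x86_64__)
    return "x86_64";
#else
    return "unknown";  // a real implementation would query the host at run time
#endif
}
```

Only the final #else branch of the real function needs to ask the kernel (through host_info); every other branch is resolved by the preprocessor before the code is even compiled.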

forEachSupportedPlatform

2️⃣

Here we meet an old friend, the block, used inside a C/C++ function.

This step determines which platform mainExecutableMH supports.

// Set the platform ID in the all image infos so debuggers can tell the process type
// FIXME: This can all be removed once we make the kernel handle it in rdar://43369446
// The host may not have the platform field in its struct, but there's space for it in the padding, so always set it
{
    // platformFound is marked __block because it is modified inside the block below
    __block bool platformFound = false;

    // forEachSupportedPlatform takes a block parameter: void (^handler)(Platform platform, uint32_t minOS, uint32_t sdk)
    // This is also the first time we have seen a block used in a C++ function

    ((dyld3::MachOFile*)mainExecutableMH)->forEachSupportedPlatform(^(dyld3::Platform platform, uint32_t minOS, uint32_t sdk) {
        if (platformFound) {
            halt("MH_EXECUTE binaries may only specify one platform");
        }

        // Record platform information
        gProcessInfo->platform = (uint32_t)platform;
        platformFound = true;
    });

    // If the platform is unknown, it is set to macOS on macOS; on the embedded platforms dyld prints a message and halts
    if (gProcessInfo->platform == (uint32_t)dyld3::Platform::unknown) {
        // There were no platforms found in the binary. This may occur on macOS for alternate toolchains and old binaries.
        // It should never occur on any of our embedded platforms.
#if TARGET_OS_OSX
        gProcessInfo->platform = (uint32_t)dyld3::Platform::macOS;
#else
        halt("MH_EXECUTE binaries must specify a minimum supported OS version");
#endif
    }
}
...
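Blocks are an Apple extension, so the same callback pattern is sketched here in standard C++ with a lambda, where capture by reference plays the role of __block. FakeMachO, detectPlatform, and the reduced Platform enum are hypothetical, and an exception replaces halt so that the error path can be observed.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <vector>

enum class Platform : uint32_t { unknown = 0, macOS = 1, iOS = 2 };

// Hypothetical stand-in for a Mach-O header: just the platforms it declares.
struct FakeMachO {
    std::vector<Platform> platforms;
    void forEachSupportedPlatform(const std::function<void(Platform)>& handler) const {
        for (Platform p : platforms) handler(p);
    }
};

// Return the single declared platform, or fallback when the binary declares none.
Platform detectPlatform(const FakeMachO& mh, Platform fallback) {
    bool platformFound = false;  // captured by reference, like a __block variable
    Platform result = Platform::unknown;
    mh.forEachSupportedPlatform([&](Platform p) {
        if (platformFound)
            throw std::runtime_error("MH_EXECUTE binaries may only specify one platform");
        result = p;
        platformFound = true;
    });
    return platformFound ? result : fallback;
}
```

The flag has to outlive each individual callback invocation, which is exactly why the original declares platformFound with __block.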

The call CRSetCrashLogMessage("dyld: launch started"); marks the point at which dyld begins launching the process.

setContext

3️⃣

setContext is a static global function. It mainly assigns the fields and function pointers of the global variable ImageLoader::LinkContext gLinkContext;, for example the crash and log addresses and other context information.

CRSetCrashLogMessage("dyld: launch started");

setContext(mainExecutableMH, argc, argv, envp, apple);
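The pattern behind setContext, one global context object whose data fields and function pointers are filled in once at startup, can be sketched as follows. Everything here is hypothetical (MiniLinkContext, gMiniContext, setMiniContext, and both callbacks); the real LinkContext has far more members.

```cpp
#include <cassert>
#include <string>

// Hypothetical, much reduced link context: plain data plus function pointers.
struct MiniLinkContext {
    int          argc = 0;
    const char** argv = nullptr;
    bool        (*verboseEnabled)() = nullptr;
    std::string (*describe)(const MiniLinkContext&) = nullptr;
};

static bool alwaysQuiet() { return false; }

static std::string describeArgs(const MiniLinkContext& c) {
    return c.argc > 0 ? std::string(c.argv[0]) : std::string("<no args>");
}

static MiniLinkContext gMiniContext;

// Model of setContext: populate the global's fields and callbacks once.
void setMiniContext(int argc, const char** argv) {
    gMiniContext.argc = argc;
    gMiniContext.argv = argv;
    gMiniContext.verboseEnabled = &alwaysQuiet;
    gMiniContext.describe = &describeArgs;
}
```

Code that later receives the context calls through these pointers instead of naming the functions directly, which keeps the image loader classes decoupled from the code that drives them.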

configureProcessRestrictions

4️⃣

This step sets up the environment variables. envp is a parameter of _main: the array that holds all of the environment variables. The function mainly assigns fields of the global variable ImageLoader::LinkContext gLinkContext;.

configureProcessRestrictions(mainExecutableMH, envp);

checkSharedRegionDisable

5️⃣

This step checks whether the shared cache is usable and loads it. Depending on the platform or environment, gLinkContext.sharedRegionMode is set to ImageLoader::kDontUseSharedRegion or ImageLoader::kUsePrivateSharedRegion. iOS cannot run without the shared region.

// load shared cache
checkSharedRegionDisable((dyld3::MachOLoaded*)mainExecutableMH, mainExecutableSlide);

instantiateFromLoadedImage

6️⃣ 🦩 ❤️

Instantiation of the main program. (This loads the executable and generates an ImageLoader instance object, as discussed in detail above.)

// instantiate ImageLoader for main executable
sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
gLinkContext.mainExecutable = sMainExecutable;
gLinkContext.mainExecutableCodeSigned = hasCodeSignatureLoadCommand(mainExecutableMH);

loadInsertedDylib

7️⃣

This step walks sEnv.DYLD_INSERT_LIBRARIES and loads each inserted dynamic library; loadInsertedDylib only needs to be passed the path of the library.

The dyld source describes the general loading strategy as follows. Given all the DYLD_ environment variables, the general case for loading libraries is that any given path expands into a list of possible locations to load. We also must take care to ensure two copies of the “same” library are never loaded. The algorithm used here is that there is a separate function for each “phase” of the path expansion. Each phase function calls the next phase with each possible expansion of that phase. The result is the last phase is called with all possible paths. To catch duplicates the algorithm is run twice. The first time, the last phase checks the path against all loaded images. The second time, the last phase calls open() on the path. Either time, if an image is found, the phases all unwind without checking for other paths.

// load any inserted libraries
if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
    for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
        loadInsertedDylib(*lib);
}
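DYLD_INSERT_LIBRARIES is given as a colon-separated list of paths, and the loop above consumes it as a NULL-terminated array of C strings. The sketch below shows both halves under stated assumptions: splitPaths and countEntries are hypothetical helpers, and skipping empty entries is a choice made for this example, not a documented dyld behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split a colon-separated value into paths, skipping empty entries.
std::vector<std::string> splitPaths(const std::string& value) {
    std::vector<std::string> paths;
    size_t start = 0;
    while (start <= value.size()) {
        size_t end = value.find(':', start);
        if (end == std::string::npos) end = value.size();
        if (end > start) paths.push_back(value.substr(start, end - start));
        start = end + 1;
    }
    return paths;
}

// Walk a NULL-terminated array with the same loop shape as the dyld code.
size_t countEntries(const char* const* list) {
    size_t n = 0;
    if (list != nullptr)
        for (const char* const* p = list; *p != nullptr; ++p) ++n;
    return n;
}
```

The NULL sentinel is what lets the original loop run without knowing the number of inserted libraries in advance.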

link

8️⃣

Link the main program.

link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
sMainExecutable->setNeverUnloadRecursive();

Then link all inserted dynamic libraries. As the code above and below shows, the main program must be linked first, and the inserted libraries are linked afterwards.

// link any inserted libraries
// do this after linking main executable so that any dylibs pulled in by inserted
// dylibs (e.g. libSystem) will not be in front of dylibs the program uses
if ( sInsertedDylibCount > 0 ) {
    for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
        ImageLoader* image = sAllImages[i+1];
        // Link the inserted image
        link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
        image->setNeverUnloadRecursive();
    }
    if ( gLinkContext.allowInterposing ) {
        // only INSERTED libraries can interpose
        // register interposing info after all inserted libraries are bound so chaining works
        for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
            ImageLoader* image = sAllImages[i+1];
            image->registerInterposing(gLinkContext);
        }
    }
}
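The index i+1 in both loops follows from the layout of the image list: the main executable was added first and occupies index 0, and the inserted libraries come directly after it. A tiny model of the resulting link order, where linkOrder is a hypothetical helper and image names stand in for ImageLoader objects:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return the order in which images get linked: the main executable (index 0)
// first, then the insertedCount images that follow it in the list.
std::vector<std::string> linkOrder(const std::vector<std::string>& allImages,
                                   size_t insertedCount) {
    assert(!allImages.empty());
    assert(insertedCount + 1 <= allImages.size());
    std::vector<std::string> order;
    order.push_back(allImages[0]);            // main executable
    for (size_t i = 0; i < insertedCount; ++i)
        order.push_back(allImages[i + 1]);    // skip index 0, hence i + 1
    return order;
}
```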

weakBind

9️⃣

Bind weak symbols.

// <rdar://problem/12186933> do weak binding only after all inserted images linked
sMainExecutable->weakBind(gLinkContext);

At this point we reach CRSetCrashLogMessage("dyld: launch, running initializers");. Running the initializers is the very core of dyld::_main, and we analyze it in detail below.

initializeMainExecutable

🔟

Run all initializers: runInitializers is applied to the images that were added earlier.

initializeMainExecutable() starts initializing the images that were linked in, mainly by calling runInitializers, which works recursively.

// run all initializers
initializeMainExecutable();
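The recursive nature of initialization can be sketched with a small dependency graph. This is a simplified model under stated assumptions, not dyld's algorithm: InitImage and recursiveInit are hypothetical, and the model assumes an image's dependencies are initialized before the image itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical image node: a name, its dependencies, and an "already initialized" flag.
struct InitImage {
    std::string name;
    std::vector<InitImage*> dependencies;
    bool initialized = false;
};

// Depth-first: initialize everything an image depends on before the image itself.
void recursiveInit(InitImage& image, std::vector<std::string>& log) {
    if (image.initialized) return;     // also stops cycles and repeated work
    image.initialized = true;
    for (InitImage* dep : image.dependencies) {
        assert(dep != nullptr);
        recursiveInit(*dep, log);
    }
    log.push_back(image.name);         // the image's own initializers run last
}
```

With an App that depends on libobjc and libSystem, where libobjc itself depends on libSystem, the recorded order is libSystem, libobjc, App: each library is initialized exactly once, before anything that uses it.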

notifyMonitoringDyldMain

1️⃣1️⃣

Find the entry point of the main function.

// notify any montoring proccesses that this process is about to enter main()
notifyMonitoringDyldMain();

In summary, the flow is: set up the runtime environment -> load the shared cache -> instantiate the main program -> load the inserted dynamic libraries -> link the main program -> link the inserted dynamic libraries -> bind weak symbols -> run the initializers -> find the entry point and return.

The dyld::_main function does several things:

  1. Set up the runtime environment: configure the environment variables, set the corresponding values according to them, and get the current running architecture.
  2. Load the shared cache.
  3. Instantiate the main program image (sMainExecutable).
  4. Load the inserted dynamic libraries (loadInsertedDylib).
  5. Link the main program.
  6. Link the inserted dynamic libraries.
  7. Bind weak symbols (weakBind).
  8. Run all initializers (initializeMainExecutable()).
  9. Return the entry point (main).

Now let’s analyze initializeMainExecutable. (Space limited, to be continued…)
