Preface

In the dyld program loading process, _objc_init is a key function. Because _objc_init registers callback functions with dyld, let's explore it.

Preparation

_objc_init

Exploring _objc_init means reading the source, so without further ado, here it is:

void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    runtime_init();
    exception_init();
#if __OBJC2__
    cache_t::init();
#endif
    _imp_implementationWithBlock_init();
    _dyld_objc_notify_register(&map_images, load_images, unmap_image);
 
#if __OBJC2__
    didCallDyldNotifyRegister = true ;
#endif
}
  • environ_init: reads the environment variables that affect the runtime and, if requested, prints help for them (export OBJC_HELP=1)
  • tls_init: sets up the thread keys, such as the destructor for per-thread data
  • static_init: runs the C++ static constructors. _objc_init is called before dyld would call libobjc's static constructors, so libobjc has to call its own C++ constructors first
  • runtime_init: initializes the runtime environment, mainly the two tables unattachedCategories and allocatedClasses
  • exception_init: initializes libobjc's exception handling system
  • cache_t::init: initializes the cache
  • _imp_implementationWithBlock_init: starts the callback mechanism. It usually does nothing, because all initialization is lazy, but for some processes it eagerly loads the trampolines dylib
  • _dyld_objc_notify_register: registers callbacks with dyld

environ_init

void environ_init(void)
{
    /* ... */
    // Print OBJC_HELP and OBJC_PRINT_OPTIONS output.
    if (PrintHelp || PrintOptions) {
        ...
        if (PrintOptions) {
            _objc_inform("OBJC_PRINT_OPTIONS is set");
        }

        for (size_t i = 0; i < sizeof(Settings)/sizeof(Settings[0]); i++) {
            const option_t *opt = &Settings[i];
            if (PrintHelp) _objc_inform("%s: %s", opt->env, opt->help);
            if (PrintOptions && *opt->var) _objc_inform("%s is set", opt->env);
        }
    }
}

So when PrintHelp or PrintOptions is set, every environment variable gets printed. Let's strip those conditions and print the environment variables directly

for (size_t i = 0; i < sizeof(Settings)/sizeof(Settings[0]); i++) {
    const option_t *opt = &Settings[i];
    _objc_inform("%s: %s", opt->env, opt->help);
    _objc_inform("%s is set", opt->env);
}

Note that the code above can only be run inside the objc source project. Without the objc source, you can run export OBJC_HELP=1 in the terminal to print the same list
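As a rough illustration of the table-driven approach seen in environ_init, the sketch below walks a small settings table and reads each variable with getenv. The Option struct, the two entries, their help strings, and the rule that any non-empty value switches a flag on are all assumptions made for this example, not the runtime's actual definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for the runtime's option_t.
struct Option {
    bool *var;          // flag that the variable switches on
    const char *env;    // environment variable name
    const char *help;   // one-line description
};

static bool PrintHelp = false;
static bool PrintLoadMethods = false;

// Hypothetical table; the real Settings table is much longer.
static const Option Settings[] = {
    {&PrintHelp, "OBJC_HELP", "describe available environment variables"},
    {&PrintLoadMethods, "OBJC_PRINT_LOAD_METHODS", "log calls to +load methods"},
};

// Reads every variable in the table and returns how many are switched on.
// Assumption: a variable counts as "on" when it is present and non-empty.
int read_settings() {
    int enabled = 0;
    for (std::size_t i = 0; i < sizeof(Settings) / sizeof(Settings[0]); i++) {
        const Option *opt = &Settings[i];
        const char *value = std::getenv(opt->env);
        *opt->var = (value != nullptr && value[0] != '\0');
        if (*opt->var) enabled++;
    }
    return enabled;
}

// Prints the help line of every entry, like the simplified loop above.
void print_settings_help() {
    for (const Option &opt : Settings) {
        std::printf("%s: %s\n", opt.env, opt.help);
    }
}
```

Keeping the variable name, the flag, and the help text in one table row means the print loop and the parse loop never get out of sync.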

Printing in the terminal is quite convenient too. These environment variables can also be configured in Xcode. Here are some common examples

Where environment variables are configured in Xcode: Select run target–> Edit Scheme… –> Run –> Arguments –> Environment Variables

OBJC_DISABLE_NONPOINTER_ISA

The environment variable OBJC_DISABLE_NONPOINTER_ISA controls the isa pointer optimization. YES means isa is stored as a plain pointer, and NO means the optimized nonpointer isa is used. First look at a nonpointer isa, without setting the environment variable

Bit 0 (the lowest bit) of isa is 1, which indicates that the isa is optimized, and the high bits carry other data

Now set the environment variable OBJC_DISABLE_NONPOINTER_ISA = YES and look at isa again

Bit 0 of isa is 0, which indicates that isa is a plain memory pointer, and the high bits carry no data other than cls
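The bit test described above can be written down in a few lines. This is a minimal sketch that only looks at bit 0 of a raw 64-bit isa value; the function name is made up for the example, and it does not decode any of the other bit fields.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when bit 0 of the raw isa value is set,
// which the text above describes as the "optimized isa" marker.
bool is_nonpointer_isa(uint64_t isa_bits) {
    return (isa_bits & 1ULL) != 0;
}
```

With made-up values, is_nonpointer_isa(0x011d800100004231) is true because the value is odd, while is_nonpointer_isa(0x0000000100004230) is false.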

OBJC_PRINT_LOAD_METHODS

The environment variable OBJC_PRINT_LOAD_METHODS prints every +load method in the program. Add a +load method to a custom class and set OBJC_PRINT_LOAD_METHODS = YES

+[LWPerson load] is the +load method of the custom LWPerson class; the others are all system-level +load methods. If too many +load methods make your application start slowly, or you suspect that someone is doing hidden work in a +load method, you can use this environment variable to find out who implements the most +load methods and then deal with them

tls_init

tls_init binds the thread keys, such as the destructor for per-thread data

void tls_init(void)
{
#if SUPPORT_DIRECT_THREAD_KEYS
    // Create a thread cache pool
    pthread_key_init_np(TLS_DIRECT_KEY, &_objc_pthread_destroyspecific);
#else
    // destructor
    _objc_pthread_key = tls_create(&_objc_pthread_destroyspecific);
#endif
}
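To show what "a destructor for per-thread data" means in practice, here is a small sketch built on the portable POSIX calls pthread_key_create and pthread_setspecific. It is not the runtime's code (which uses pthread_key_init_np or its own tls_create wrapper); the key, the counter, and the function names are invented for the example.

```cpp
#include <cassert>
#include <pthread.h>

static pthread_key_t demo_key;       // hypothetical key for this example
static int destructor_calls = 0;     // how often the destructor has run

// Runs at thread exit for every thread that stored a non-null value.
static void destroy_thread_data(void *value) {
    delete static_cast<int *>(value);
    destructor_calls++;
}

// Thread body: stores some per-thread data under the key, then exits.
static void *worker(void *) {
    pthread_setspecific(demo_key, new int(42));
    return nullptr;
}

// Creates the key, runs one thread that stores per-thread data,
// and returns how many times the destructor ran.
int run_tls_demo() {
    destructor_calls = 0;
    pthread_key_create(&demo_key, &destroy_thread_data);
    pthread_t thread;
    pthread_create(&thread, nullptr, &worker, nullptr);
    pthread_join(thread, nullptr);   // the destructor has run by now
    pthread_key_delete(demo_key);
    return destructor_calls;
}
```

The worker never frees its data explicitly: the destructor registered with the key does it when the thread exits.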

static_init

Runs the C++ static constructors. _objc_init is called before dyld would call libobjc's static constructors, so libobjc has to call its own C++ constructors itself. In short, libobjc runs its own global C++ constructor functions before dyld gets to them

static void static_init()
{
    size_t count;
    auto inits = getLibobjcInitializers(&_mh_dylib_header, &count);
    for(size_t i = 0; i < count; i++) {
        inits[i]();
    }
    auto offsets = getLibobjcInitializerOffsets(&_mh_dylib_header, &count);
    for(size_t i = 0; i < count; i++) {
        UnsignedInitializer init(offsets[i]);
        init();
    }
}

Let's verify this by adding a global C++ constructor function to the objc source and checking whether it is called by libobjc itself or by dyld

The debugging result shows that libobjc does call its own internal C++ constructor functions itself
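The core of static_init is simply "walk a list of function pointers and call each one". The sketch below shows that pattern in isolation; the initializers array is a hypothetical stand-in for the list that getLibobjcInitializers reads from the library's own Mach-O header.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Initializer = void (*)();

static std::vector<int> call_order;   // records which initializers ran

static void init_first()  { call_order.push_back(1); }
static void init_second() { call_order.push_back(2); }

// Hypothetical stand-in for the initializer list stored in the image.
static const Initializer initializers[] = {&init_first, &init_second};

// Calls every initializer in table order and returns how many were called.
std::size_t run_initializers() {
    std::size_t count = sizeof(initializers) / sizeof(initializers[0]);
    for (std::size_t i = 0; i < count; i++) {
        initializers[i]();
    }
    return count;
}
```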

runtime_init

Initializes the runtime environment, mainly the unattachedCategories and allocatedClasses tables

void runtime_init(void)
{
    objc::unattachedCategories.init(32); // Category table initialization
    objc::allocatedClasses.init();       // Class table initialization
}

exception_init

Initializes libobjc's exception handling system. Much like objc registering callbacks with dyld, it installs a hook so that upper layers get a chance to handle exceptions

void exception_init(void)
{
    old_terminate = std::set_terminate(&_objc_terminate);
}

When upper-layer code breaks the underlying rules of the system, the application crashes and the system raises an exception signal. An uncaught exception ends up in the _objc_terminate function

static void (*old_terminate)(void) = nil;
static void _objc_terminate(void)
{
    if (PrintExceptions) {
        _objc_inform("EXCEPTIONS: terminating");
    }
    if (! __cxa_current_exception_type()) {
        // No current exception.
        (*old_terminate)();
    }
    else {
        // There is a current exception. Check if it's an objc exception.
        @try {
            __cxa_rethrow();
        } @catch (id e) {
            // It's an objc object. Call Foundation's handler, if any.
            (*uncaught_handler)((id)e);
            (*old_terminate)();
        } @catch (...) {
            // It's not an objc object. Continue to C++ terminate.
            (*old_terminate)();
        }
    }
}
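The exception_init / _objc_terminate pair follows a common chaining pattern: install your own terminate handler and remember the previous one so it can still be called. The sketch below shows only that pattern in standard C++; logging_terminate and install_logging_terminate are names invented for the example, and nothing here inspects the exception type the way _objc_terminate does.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <exception>

// The handler that was installed before ours, kept so we can chain to it.
static std::terminate_handler previous_handler = nullptr;

// Our handler: log, then hand over to whatever was installed before.
static void logging_terminate() {
    std::fputs("EXCEPTIONS: terminating\n", stderr);
    if (previous_handler) previous_handler();
    std::abort();   // a terminate handler must never return
}

// Installs logging_terminate and returns the handler it replaced.
std::terminate_handler install_logging_terminate() {
    previous_handler = std::set_terminate(&logging_terminate);
    return previous_handler;
}
```

After install_logging_terminate() has run, std::get_terminate() returns logging_terminate, and the old handler is still reachable through previous_handler.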

When the exception is an objc object, _objc_terminate calls (*uncaught_handler)((id)e). Search globally for uncaught_handler

objc_uncaught_exception_handler 
objc_setUncaughtExceptionHandler(objc_uncaught_exception_handler fn)
{
    objc_uncaught_exception_handler result = uncaught_handler;
    uncaught_handler = fn;
    return result;
}

uncaught_handler = fn: fn is the function passed in by the caller, and the previously installed handler is returned
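The setter above is a plain "swap and return the old value" hook. The following sketch mirrors that shape with ordinary C++ types so it can run anywhere; the handler signature (a C string instead of an id), the default handler, and record_reason are assumptions made for this illustration.

```cpp
#include <cassert>
#include <string>

using UncaughtHandler = void (*)(const char *reason);

// Default handler: does nothing.
static void default_handler(const char *) {}

static UncaughtHandler uncaught_handler = &default_handler;
static std::string last_reason;   // filled in by record_reason

// Swaps in a new handler and returns the one it replaced.
UncaughtHandler set_uncaught_handler(UncaughtHandler fn) {
    UncaughtHandler result = uncaught_handler;
    uncaught_handler = fn;
    return result;
}

// Lower layer: forwards an uncaught problem to whoever registered.
void report_uncaught(const char *reason) {
    (*uncaught_handler)(reason);
}

// Upper layer: an example handler that just remembers the reason.
void record_reason(const char *reason) {
    last_reason = reason;
}
```

Returning the old handler lets the caller restore it later or chain to it.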

cache_t::init

Initializes the cache

void cache_t::init()
{
#if HAVE_TASK_RESTARTABLE_RANGES
    mach_msg_type_number_t count = 0;
    kern_return_t kr;
    while (objc_restartableRanges[count].location) {
        count++;
    }
    // Register the restartable ranges for the current task
    kr = task_restartable_ranges_register(mach_task_self(),
                                          objc_restartableRanges, count);
    if (kr == KERN_SUCCESS) return;
    _objc_fatal("task_restartable_ranges_register failed (result 0x%x: %s)",
                kr, mach_error_string(kr));
#endif // HAVE_TASK_RESTARTABLE_RANGES
}

_imp_implementationWithBlock_init

Starts the callback mechanism. Usually it does nothing, because all initialization is lazy, but for some processes it eagerly loads the trampolines dylib

void
_imp_implementationWithBlock_init(void)
{
#if TARGET_OS_OSX
    // Eagerly load libobjc-trampolines.dylib in certain processes. Some
    // programs (most notably QtWebEngineProcess used by older versions of
    // embedded Chromium) enable a highly restrictive sandbox profile which
    // blocks access to that dylib. If anything calls
    // imp_implementationWithBlock (as AppKit has started doing) then we'll
    // crash trying to load it. Loading it here sets it up before the sandbox
    // profile is enabled and blocks it.
    //
    // This fixes EA Origin (rdar://problem/50813789)
    // and Steam (rdar://problem/55286131)
    if (__progname &&
        (strcmp(__progname, "QtWebEngineProcess") == 0 ||
         strcmp(__progname, "Steam Helper") == 0)) {
        Trampolines.Initialize();
    }
#endif
}

_dyld_objc_notify_register

Registers callbacks with dyld. _dyld_objc_notify_register is only called by the objc runtime, and its implementation is in the dyld source code

// _dyld_objc_notify_register
void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
     dyld::registerObjCNotifiers(mapped, init, unmapped);
}

// registerObjCNotifiers
void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
    // record functions to call
    sNotifyObjCMapped = mapped;
    sNotifyObjCInit = init;
    sNotifyObjCUnmapped = unmapped;
    // call 'mapped' function with all images mapped so far
    try {
        notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
    }
    catch (const char* msg) {
        // ignore request to abort during registration
    }
    // call 'init' function on all images already init'ed (below libSystem)
    for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
        ImageLoader* image = *it;
        if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
            dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
            (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
        }
    }
}

_dyld_objc_notify_register takes three parameters

  • &map_images: called when dyld loads an image into memory
  • load_images: called when dyld initializes an image
  • unmap_image: called when an image is removed

We have already explored load_images, which essentially calls the +load methods. Now let's explore map_images. &map_images passes the address of the function, so dyld ends up calling the same implementation. Because &map_images is the first argument, search globally for sNotifyObjCMapped in dyld

sNotifyObjCMapped is called in the notifyBatchPartial function, and notifyBatchPartial is called in registerObjCNotifiers when objc registers its notifications during initialization. So map_images is called first, and load_images is called after it
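The order of events in registerObjCNotifiers (store the callbacks, replay "mapped" for images that already exist, then replay "init" for those already initialized) can be sketched without any dyld types. Everything below is hypothetical illustration code: the Image struct, the two sample images, and the event log are invented, and the unmapped callback is left out for brevity.

```cpp
#include <cassert>
#include <string>
#include <vector>

using MappedCallback = void (*)(const std::string &image);
using InitCallback = void (*)(const std::string &image);

struct Image {
    std::string name;
    bool initialized;
};

// Hypothetical images that were loaded before registration happens.
static std::vector<Image> all_images = {{"libSystem", true}, {"libobjc", false}};
static std::vector<std::string> events;   // log of callback invocations

static MappedCallback notify_mapped = nullptr;
static InitCallback notify_init = nullptr;

// Records the callbacks, then replays them for images that already exist:
// first "mapped" for every image, then "init" for the initialized ones.
void register_notifiers(MappedCallback mapped, InitCallback init) {
    notify_mapped = mapped;
    notify_init = init;
    for (const Image &image : all_images) notify_mapped(image.name);
    for (const Image &image : all_images) {
        if (image.initialized) notify_init(image.name);
    }
}

// Example callbacks that only log what they were called with.
void on_mapped(const std::string &image) { events.push_back("map " + image); }
void on_init(const std::string &image) { events.push_back("init " + image); }
```

Calling register_notifiers(&on_mapped, &on_init) logs both "map" events before the single "init" event, which matches the "map_images first, load_images second" order described above.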

read_images

Step into map_images; the source code is as follows

void
map_images(unsigned count, const char * const paths[],
           const struct mach_header * const mhdrs[])
{
    mutex_locker_t lock(runtimeLock);
    return map_images_nolock(count, paths, mhdrs);

}

Step into map_images_nolock. It contains a lot of code, so let's get straight to the point

The key call is _read_images. That function is a bit overwhelming at first because it contains so much code, but a closer read shows that Apple's developers have left log statements that mark each stage

void _read_images(header_info **hList, uint32_t hCount, int totalClasses, int unoptimizedTotalClasses)
{
    ... // Indicates that some code is omitted
#define EACH_HEADER \
    hIndex = 0;         \
    hIndex < hCount && (hi = hList[hIndex]); \
    hIndex++

    // Condition control so the one-time setup runs only once
    if (!doneOnce) { ... }

    // Fix @selector mix-ups from the precompile phase:
    // the same method name in different classes can have different addresses
    // Fix up @selector references
    static size_t UnfixedSelectors;
    { ... }
    ts.log("IMAGE TIMES: fix up selector references");

    // Fix up messy classes
    // Discover classes. Fix up unresolved future classes. Mark bundle classes.
    bool hasDyldRoots = dyld_shared_cache_some_image_overridden();
    for (EACH_HEADER) { ... }
    ts.log("IMAGE TIMES: discover classes");

    // Fix remapped classes that were not loaded by the image file
    // Fix up remapped classes
    // Class list and nonlazy class list remain unremapped.
    // Class refs and super refs are remapped for message dispatching.
    if (!noClassesRemapped()) { ... }
    ts.log("IMAGE TIMES: remap classes");

#if SUPPORT_FIXUP
    // Fix up some messages
    // Fix up old objc_msgSend_fixup call sites
    for (EACH_HEADER) { ... }
    ts.log("IMAGE TIMES: fix up objc_msgSend_fixup");
#endif

    // When a class has protocols: readProtocol
    // Discover protocols. Fix up protocol refs.
    for (EACH_HEADER) { ... }
    ts.log("IMAGE TIMES: discover protocols");

    // Fix up protocols that were not loaded
    // Fix up @protocol references
    // Preoptimized images may have the right
    // answer already but we don't know for sure.
    for (EACH_HEADER) { ... }
    ts.log("IMAGE TIMES: fix up @protocol references");

    // Category handling
    // Discover categories. Only do this after the initial category
    // attachment has been done. For categories present at startup,
    // discovery is deferred until the first load_images call after
    // the call to _dyld_objc_notify_register completes.
    if (didInitialAttachCategories) { ... }
    ts.log("IMAGE TIMES: discover categories");

    // Class loading
    // Category discovery MUST BE Late to avoid potential races
    // when other threads call the new category code before
    // this thread finishes its fixups.
    // +load handled by prepare_load_methods()
    // Realize non-lazy classes (for +load methods and static instances)
    for (EACH_HEADER) { ... }
    ts.log("IMAGE TIMES: realize non-lazy classes");

    // Handle classes that have not been processed yet
    // Realize newly-resolved future classes, in case CF manipulates them
    if (resolvedFutureClasses) { ... }
    ts.log("IMAGE TIMES: realize future classes");
    ...
#undef EACH_HEADER
}

The log messages let us sort out the main stages

  • Condition control so the one-time setup runs only once
  • Fix @selector mix-ups from the precompile phase
  • Fix up messy classes
  • Fix remapped classes that were not loaded by the image file
  • Fix up some messages
  • When a class has protocols: readProtocol
  • Fix up protocols that were not loaded
  • Category handling
  • Class loading
  • Handle classes that have not been processed yet by realizing the resolved future classes

The following is a separate analysis based on the key points of the log

Only load once

if (!doneOnce) {
    doneOnce = YES;
    launchTime = YES;
    ...
    // Preoptimized classes don't go in this table.
    // 4/3 is NXMapTable's load factor
    int namedClassesSize =
        (isPreoptimized() ? unoptimizedTotalClasses : totalClasses) * 4 / 3;
    // Create a hash table to store all classes
    gdb_objc_realized_classes =
        NXCreateMapTable(NXStrValueMapPrototype, namedClassesSize);
    ts.log("IMAGE TIMES: first time tasks");
}

doneOnce is set to YES on the first pass, so this branch is never entered again. It creates the table gdb_objc_realized_classes, which holds all classes, whether they are realized or not
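The table size is plain integer arithmetic: take the relevant class count and scale it by 4/3. A minimal sketch of that calculation, with a made-up function name and parameters that stand in for the runtime's variables:

```cpp
#include <cassert>

// Mirrors the sizing expression above: pick the class count that applies,
// then scale by 4/3 using integer arithmetic (the result is truncated).
int named_classes_size(bool preoptimized, int unoptimizedTotalClasses,
                       int totalClasses) {
    return (preoptimized ? unoptimizedTotalClasses : totalClasses) * 4 / 3;
}
```

For example, 300 classes without preoptimization give a table size of 400, and 10 unoptimized classes in the preoptimized case give 13 (40 / 3 truncated).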

Fix @selector mix-ups

static size_t UnfixedSelectors;
{
    mutex_locker_t lock(selLock);
    for (EACH_HEADER) {
        if (hi->hasPreoptimizedSelectors()) continue;
        bool isBundle = hi->isBundle();
        // Get the method names list from macho
        SEL *sels = _getObjc2SelectorRefs(hi, &count);
        UnfixedSelectors += count;
        for (i = 0; i < count; i++) {
            const char *name = sel_cname(sels[i]);
            SEL sel = sel_registerNameNoLock(name, isBundle);
            if (sels[i] != sel) {
                sels[i] = sel;
            }
        }
    }
}

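Different classes may contain a method with the same name, yet the selector references for that name can have different addresses, so this step fixes up those mixed-up references. Each selector reference read from the Mach-O file is registered by name with sel_registerNameNoLock, and if its address differs from the registered SEL it is replaced. The author's explanation for the different addresses is that methods are stored in classes and each class lives at a different location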
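The fix-up loop is a form of string interning: every name is looked up in one shared table, and each reference is rewritten to the table's canonical address. The sketch below shows the idea with standard containers; selector_table, register_selector_name, and fix_up_selector_refs are invented names, and a real SEL is not simply a std::string entry.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical selector table: one canonical string per selector name.
static std::unordered_set<std::string> selector_table;

// Returns the canonical pointer for a name, registering it on first use.
// Elements of an unordered_set never move, so the pointer stays valid.
const char *register_selector_name(const char *name) {
    auto result = selector_table.insert(std::string(name));
    return result.first->c_str();
}

// Rewrites each reference so equal names share one address;
// returns how many references had to be changed.
int fix_up_selector_refs(const char **refs, int count) {
    int fixed = 0;
    for (int i = 0; i < count; i++) {
        const char *canonical = register_selector_name(refs[i]);
        if (refs[i] != canonical) {
            refs[i] = canonical;
            fixed++;
        }
    }
    return fixed;
}
```

After the fix-up, two references to the same name compare equal as pointers, so later lookups can compare addresses instead of comparing strings.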

Fix up messy classes

for (EACH_HEADER) {
    if (! mustReadClasses(hi, hasDyldRoots)) {
        // Image is sufficiently optimized that we need not call readClass()
        continue;
    }
    // Read the class list information from macho
    classref_t const *classlist = _getObjc2ClassList(hi, &count);
    bool headerIsBundle = hi->isBundle();
    bool headerIsPreoptimized = hi->hasPreoptimizedClasses();
    for (i = 0; i < count; i++) {
        Class cls = (Class)classlist[i];
        Class newCls = readClass(cls, headerIsBundle, headerIsPreoptimized);

        // The class may be moved at runtime, but it is not deleted
        if (newCls != cls && newCls) {
            // Class was moved but not deleted. Currently this occurs
            // only when the new class resolved a future class.
            // Non-lazily realize the class below.
            resolvedFutureClasses = (Class *)
                realloc(resolvedFutureClasses,
                        (resolvedFutureClassCount+1) * sizeof(Class));
            resolvedFutureClasses[resolvedFutureClassCount++] = newCls;
        }
    }
}

Set a breakpoint at readClass and run the source code

cls points to an address. newCls has not been assigned yet, so it holds a stale garbage value and appears to contain data. Step past the breakpoint and see what it looks like once it has been assigned

The figure shows that readClass associates the class name with the address. If that is unclear, here is an analogy 🌰: a house that nobody has bought yet belongs to no one, but it already has an address (its street and number). Once Zhang San buys it, the property certificate carries his name, and the house is associated with Zhang San

Now verify this with custom classes. Define two classes, LWPerson and LWTeacher. classlist = _getObjc2ClassList is read from the __objc_classlist section of the Mach-O file, so look at the Mach-O file

The address of LWPerson is 0x0000000100004230 and the address of LWTeacher is 0x0000000100004280, which match the cls values above

The Mach-O file and the source agree exactly. Next, let's explore readClass. It contains a lot of code, so only the key points are covered

Class readClass(Class cls, bool headerIsBundle, bool headerIsPreoptimized)
{
    // Get the class name
    const char *mangledName = cls->nonlazyMangledName();
    if (missingWeakSuperclass(cls)) { ... }
    cls->fixupBackwardDeployingStableSwift();
    Class replacing = nil;
    if (mangledName != nullptr) { ... }
    if (headerIsPreoptimized && !replacing) { ... }
    else {
        if (mangledName) {
            // some Swift generic classes can lazily generate their names
            // Associate the class name with the address
            addNamedClass(cls, mangledName, replacing);
        } else { ... }
        // Insert the associated class into another hash table
        addClassTableEntry(cls);
    }
    // for future reference: shared cache never contains MH_BUNDLEs
    if (headerIsBundle) { ... }
    return cls;
}

You may wonder where cls->nonlazyMangledName() is called: it is in readClass

  • nonlazyMangledName: gets the name of the class
  • The assignment of rw and the retrieval of ro do not happen in readClass; this is verified below by running the source
  • addNamedClass: binds the class name to the address
  • addClassTableEntry: inserts the associated class into a hash table of initialized classes

How does nonlazyMangledName get the class name?

const char *nonlazyMangledName() const {
    return bits.safe_ro()->getName();
}

Enter safe_ro, the source code is as follows

const class_ro_t *safe_ro() const {
    class_rw_t *maybe_rw = data();
    if (maybe_rw->flags & RW_REALIZED) {
        // maybe_rw is rw
        // rw has a value, so get ro through rw
        return maybe_rw->ro();
    } else {
        // maybe_rw is actually ro
        // Get the data directly from ro, which is the data in macho
        return (class_ro_t *)maybe_rw;
    }
}
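safe_ro relies on one pointer that can refer to either of two layouts, with a flag in the first field telling them apart. The sketch below reproduces that idea with two simplified structs; RoData, RwData, and the kRealized bit are stand-ins invented for the example and do not match the real class_ro_t and class_rw_t layouts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bit that marks "this pointer refers to rw data".
constexpr uint32_t kRealized = 1u << 31;

// Both layouts start with a flags field, so the first 4 bytes can be
// read before we know which of the two structs we are looking at.
struct RoData {
    uint32_t flags;
    const char *name;
};

struct RwData {
    uint32_t flags;
    const RoData *ro;
};

// Returns the read-only data no matter which layout `data` points at.
const RoData *safe_ro(const void *data) {
    uint32_t flags = *static_cast<const uint32_t *>(data);
    if (flags & kRealized) {
        // data is rw: follow its pointer to ro
        return static_cast<const RwData *>(data)->ro;
    }
    // data is already ro
    return static_cast<const RoData *>(data);
}
```

Either way the caller ends up with the read-only data, which is why nonlazyMangledName can fetch the name before the class has been realized.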

Next, explore addNamedClass, which binds the class name to the address

static void addNamedClass(Class cls, const char *name, Class replacing = nil)
{
    runtimeLock.assertLocked();
    Class old;
    if ((old = getClassExceptSomeSwift(name)) && old != replacing) {
        inform_duplicate(name, old, cls);
        // getMaybeUnrealizedNonMetaClass uses name lookups.
        // Classes not found by name lookup must be in the
        // secondary meta->nonmeta table.
        addNonMetaClass(cls);
    } else {
        // Update the gdb_objc_realized_classes table: key is name, value is cls
        NXMapInsert(gdb_objc_realized_classes, name, cls);
    }
    ASSERT(!(cls->data()->flags & RO_META));
    // wrong: constructed classes are already realized when they get here
    // ASSERT(!cls->isRealized());
}

This updates the gdb_objc_realized_classes hash table, with name as the key and cls as the value
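Stripped of the runtime details, addNamedClass is a guarded insert into a name-to-address map. The sketch below uses std::unordered_map as a stand-in for the NXMapTable; the function names are invented, and the duplicate branch simply refuses the insert instead of reproducing inform_duplicate and addNonMetaClass.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for gdb_objc_realized_classes: name -> class address.
static std::unordered_map<std::string, const void *> named_classes;

// Associates name with cls. Returns false (and keeps the old entry)
// when a different class is already registered under that name.
bool add_named_class(const void *cls, const std::string &name) {
    auto found = named_classes.find(name);
    if (found != named_classes.end() && found->second != cls) {
        return false;   // duplicate name
    }
    named_classes[name] = cls;
    return true;
}

// Looks a class up by name; returns nullptr when the name is unknown.
const void *look_up_class(const std::string &name) {
    auto found = named_classes.find(name);
    return found == named_classes.end() ? nullptr : found->second;
}
```

This is the house analogy from earlier in code form: the address exists first, and the map is what ties a name to it.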

Now explore addClassTableEntry, which inserts the class into another table

static void
addClassTableEntry(Class cls, bool addMeta = true)
{
    runtimeLock.assertLocked();
    // This class is allowed to be a known class via the shared cache or via
    // data segments, but it is not allowed to be in the dynamic
    // table already.
    // allocatedClasses
    auto &set = objc::allocatedClasses.get();
    ASSERT(set.find(cls) == set.end());
    if (!isKnownClass(cls))
        set.insert(cls);
    if (addMeta)
        // Insert the metaclass into the hash table
        addClassTableEntry(cls->ISA(), false);
}

  • allocatedClasses is one of the two tables (unattachedCategories and allocatedClasses) that runtime_init sets up in _objc_init; this is the point where the class is inserted into the allocatedClasses table
  • addMeta = true also adds the metaclass to the allocatedClasses table

The assignment of rw and the retrieval of ro are not in readClass

The figure clearly shows that the breakpoints never enter the code that performs the assignment

Class loading

Class loading is complex and important, so today is only a brief look at the flow. The source is shown in the figure. The comment clearly states that non-lazy classes are realized here. What is a non-lazy class? One that implements a +load method or has static instances. In the figure, the breakpoint inside the hand-added condition is not hit, because LWPerson is a lazy class. Now add a +load method to LWPerson

  • Once LWPerson is a non-lazy class, the breakpoint is hit
  • nlclslist is obtained by calling _getObjc2NonlazyClassList, which reads the non-lazy class list from the __objc_nlclslist section of the Mach-O file; a quick check confirms this
  • realizeClassWithoutSwift may look familiar; it will be explored in detail later

There is only one entry in the __objc_nlclslist list, and its value is 0x0100004250. Because iOS is little-endian, the bytes are read from right to left. The first entry in the __objc_classlist list is the LWPerson class, whose address is 0x0100004250, so the entry in the non-lazy class list is the LWPerson class
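Reading "from right to left" means the first byte in the file is the least significant one. The helper below decodes an 8-byte little-endian entry; the function name is made up, and the sample byte sequence in the note is an assumption about how the 0x0100004250 entry would appear in a hex viewer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Combines 8 bytes stored least-significant-byte first into one value.
uint64_t decode_little_endian64(const unsigned char bytes[8]) {
    uint64_t value = 0;
    for (std::size_t i = 0; i < 8; i++) {
        value |= static_cast<uint64_t>(bytes[i]) << (8 * i);
    }
    return value;
}
```

With the bytes 50 42 00 00 01 00 00 00 the function returns 0x0000000100004250, the same value as the 0x0100004250 address quoted above.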

Conclusion

The whole path from dyld to _objc_init to _read_images is now gradually connected. A lot of the underlying knowledge has come to the surface, and the overall picture is getting clearer. What follows next is the very important, detailed process of class loading