This is the third day of my participation in the August More Text Challenge. For details, see: August More Text Challenge

In the previous article on application loading, we reviewed how dyld loads an app. Next, we analyze the code logic in detail, starting from _objc_init.

_objc_init analysis

void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    
    // fixme defer initialization until an objc-using image is found?
    // Read the runtime's environment variables; run export OBJC_HELP=1 in a terminal to print help for them
    environ_init();
    // Bind the thread key, e.g. the destructor for per-thread data
    tls_init();
    // libc calls _objc_init before dyld would call our static constructors, so we have to do it ourselves
    static_init();
    // Runtime environment initialization
    runtime_init();
    // Initialize the exception handling system
    exception_init();
#if __OBJC2__
    // Cache initialization; only compiled for Objective-C 2.0
    cache_t::init();
#endif
    // Start the callback mechanism. This usually does nothing because all initialization is lazy, but for some processes we eagerly load the trampolines dylib
    _imp_implementationWithBlock_init();

    _dyld_objc_notify_register(&map_images, load_images, unmap_image);

#if __OBJC2__
    didCallDyldNotifyRegister = true;
#endif
}
  • environ_init(): initializes the environment variables; run export OBJC_HELP=1 to print help for them;
  • tls_init(): binds the thread key, e.g. the destructor for per-thread data;
  • static_init(): runs the system-level C++ static constructors. libc calls _objc_init before dyld would call our static constructors, so we have to do it ourselves;
  • runtime_init(): initializes the runtime environment;
  • exception_init(): initializes the exception handling system;
  • cache_t::init(): initializes the cache; only compiled for Objective-C 2.0;
  • _imp_implementationWithBlock_init(): starts the callback mechanism. It usually does nothing because all initialization is lazy, but for some processes the trampolines dylib is loaded eagerly;

environ_init

environ_init mainly initializes the runtime's environment variables, which can help us while debugging. The source code is as follows:

The core code is the final for loop; let's pull it out and see what it prints:

This method prints out some environment variables that we may need to help with debugging as we develop;

Besides printing them in the loop, we can print the environment variable help in the terminal with export OBJC_HELP=1:

These environment variables can be configured under Target > Edit Scheme > Run > Arguments > Environment Variables;
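To make the mechanism concrete, here is a simplified C++ sketch (not the runtime's actual code) of an option table like the one environ_init walks: each boolean option is switched on only when its variable equals YES, and one help line is produced per option, similar in spirit to what OBJC_HELP=1 prints. Option, parse_options, help_lines and the help strings are hypothetical, and envp is passed in explicitly so the sketch never touches the real process environment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical, simplified option record (not the runtime's real type).
struct Option {
    const char *env;   // variable name, e.g. "OBJC_PRINT_LOAD_METHODS"
    const char *help;  // illustrative help text
    bool value;        // parsed setting
};

// Scan "NAME=VALUE" strings; an option is on only when VALUE is exactly "YES".
void parse_options(std::vector<Option> &options, const char *const *envp) {
    for (const char *const *p = envp; *p != nullptr; ++p) {
        const char *eq = std::strchr(*p, '=');
        if (eq == nullptr) continue;  // not a NAME=VALUE entry
        std::string name(*p, static_cast<std::size_t>(eq - *p));
        for (Option &opt : options) {
            if (name == opt.env) {
                opt.value = (std::strcmp(eq + 1, "YES") == 0);
                break;
            }
        }
    }
}

// Build "NAME: help" lines, similar in spirit to the OBJC_HELP=1 output.
std::vector<std::string> help_lines(const std::vector<Option> &options) {
    std::vector<std::string> lines;
    for (const Option &opt : options)
        lines.push_back(std::string(opt.env) + ": " + opt.help);
    return lines;
}
```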

Take OBJC_DISABLE_NONPOINTER_ISA as an example:

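The isa of the Person object before the environment variable is set:

Set the environment variable:

View the isa of the Person object again:

OBJC_DISABLE_NONPOINTER_ISA controls whether isa is a nonpointer isa: when it is set to YES, the nonpointer optimization is disabled and isa holds a plain class pointer.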
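The difference between the two isa values can be sketched as bit manipulation. This assumes the x86_64 layout from objc4's isa.h (bit 0 is the nonpointer flag and ISA_MASK is 0x00007ffffffffff8); arm64 uses a different mask, so treat the constants as illustrative. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: x86_64 values; other architectures use different masks.
constexpr uint64_t kNonpointerBit = 0x1;
constexpr uint64_t kIsaMask = 0x00007ffffffffff8ULL;

// True when isa packs extra bits (nonpointer isa) instead of a raw class pointer.
bool is_nonpointer(uint64_t isa) { return (isa & kNonpointerBit) != 0; }

// A raw isa already is the class address; a nonpointer isa must be masked.
uint64_t class_bits(uint64_t isa) {
    return is_nonpointer(isa) ? (isa & kIsaMask) : isa;
}
```

With OBJC_DISABLE_NONPOINTER_ISA=YES you would expect the second case: is_nonpointer is false and the isa value is the class address itself.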

Let’s try OBJC_PRINT_LOAD_METHODS again:

Run the program:

With OBJC_PRINT_LOAD_METHODS set, the runtime prints every +load method in the project, which gives us ideas for optimization (+load methods slow down app launch).

tls_init

tls_init binds the thread key, e.g. the destructor for per-thread data. The source code is as follows:
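As a plain POSIX illustration of a thread key bound to a destructor (this is not tls_init's own code, and all names are hypothetical), the sketch below creates a key with pthread_key_create, attaches heap data in a worker thread, and counts how often the destructor fires when that thread exits.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>

static pthread_key_t g_key;
static std::atomic<int> g_destructor_calls{0};

// Runs automatically at thread exit for each thread whose value for g_key is non-null.
static void per_thread_destructor(void *data) {
    delete static_cast<int *>(data);
    g_destructor_calls.fetch_add(1);
}

// Thread body: attach heap-allocated per-thread data to the key.
static void *worker(void *) {
    pthread_setspecific(g_key, new int(42));
    return nullptr;
}

// Create a key with a destructor, run one thread, and report how many
// times the destructor fired (once, when the worker exits).
int run_tls_demo() {
    g_destructor_calls = 0;
    if (pthread_key_create(&g_key, per_thread_destructor) != 0) return -1;
    pthread_t thread;
    if (pthread_create(&thread, nullptr, worker, nullptr) != 0) return -1;
    pthread_join(thread, nullptr);
    pthread_key_delete(g_key);
    return g_destructor_calls.load();
}
```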

static_init

libc calls _objc_init before dyld would call our static constructors, so we have to do it ourselves. The source code is as follows:

In the previous analysis, we found that C++ constructors are called during doInitialization -> doModInitFunctions, so why are they called here? The comment explains it: libc calls _objc_init before dyld gets around to calling objc's own static constructors, so to make sure they run in time, _objc_init calls libobjc's C++ constructors itself, in advance.

To verify this, we add a C++ constructor to the objc source and break in _objc_init:

Set a breakpoint and run the source:

Continue execution:

You can see that the C++ constructor is actually called in this method;
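The kind of function static_init runs can be illustrated with the GCC/Clang constructor attribute: for an ordinary executable, the loader runs such a function before main, which is the job dyld does in doModInitFunctions. The names below are hypothetical; the sketch only shows that the constructor has already run by the time normal code executes.

```cpp
#include <cassert>

// Zero-initialized at compile time, so no later initializer can overwrite it.
static bool g_constructor_ran = false;

// GCC/Clang extension: runs during image initialization, before main().
__attribute__((constructor)) static void early_init() {
    g_constructor_ran = true;
}

bool constructor_ran() { return g_constructor_ran; }
```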

runtime_init

Runtime environment initialization, source code as follows:

From the init method, we know that unattachedCategories and allocatedClasses are two tables, more on that later;

exception_init

Exception handling system initialization, source code as follows:

The implementation of _objc_terminate corresponds to:

A crash is a signal raised after an abnormal instruction. When an uncaught exception occurs, _objc_terminate is called and the exception is passed to uncaught_handler.

At the App layer we can define a function and pass it in as fn (objc_setUncaughtExceptionHandler stores it as uncaught_handler), so the App can use this function to receive and handle the exception;
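A simplified C++ model of this handler hook, assuming nothing about the real runtime types: set_uncaught_handler stores a function and returns the previous one (as objc_setUncaughtExceptionHandler does with uncaught_handler), and report_uncaught stands in for the part of a terminate hook that extracts the pending exception and passes its reason to the app's handler. All names are hypothetical, and the sketch deliberately never calls std::terminate.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical handler type, loosely mirroring the fn an app installs.
using UncaughtHandler = void (*)(const std::string &reason);

static void default_uncaught_handler(const std::string &) {}
static UncaughtHandler g_uncaught_handler = default_uncaught_handler;

// Install a new handler and return the previous one so callers can chain.
UncaughtHandler set_uncaught_handler(UncaughtHandler fn) {
    UncaughtHandler old = g_uncaught_handler;
    g_uncaught_handler = fn;
    return old;
}

// Stand-in for the reporting step of a terminate hook: rethrow the pending
// exception, extract a reason, and hand it to the installed handler.
void report_uncaught(std::exception_ptr ep) {
    try {
        std::rethrow_exception(ep);
    } catch (const std::exception &e) {
        g_uncaught_handler(e.what());
    } catch (...) {
        g_uncaught_handler("unknown exception");
    }
}
```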

The crash handling process:

cache_t::init

Cache initialization, source code as follows:

_imp_implementationWithBlock_init

This starts the callback mechanism. It usually does little because all initialization is lazy, but for some processes the trampolines dylib is loaded eagerly.

_dyld_objc_notify_register

_dyld_objc_notify_register(&map_images, load_images, unmap_image);

Here's a question: why is there an & in front of map_images but not in front of load_images?

The two callbacks do differ in weight: load_images does relatively simple work (essentially calling +load methods), while map_images maps image files and does much more. The &, however, makes no functional difference. In C and C++ a function name passed as an argument decays to a pointer to that function, so &map_images and load_images are both passed as plain function pointers, and dyld receives the same kind of value either way; the & is only a matter of style.
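A small check of the & point, using hypothetical stub callbacks and a stand-in registration function rather than dyld's real callback signatures: whether a function is passed as &map_images_stub or as load_images_stub, what gets stored is the same function pointer.

```cpp
#include <cassert>

// Hypothetical callback type; dyld's real callbacks take image arrays.
using Callback = void (*)(unsigned count);

static void map_images_stub(unsigned) {}
static void load_images_stub(unsigned) {}

static Callback g_registered[2];

// Stand-in for a registration API that stores function pointers for later use.
void register_callbacks(Callback mapped, Callback init) {
    g_registered[0] = mapped;
    g_registered[1] = init;
}
```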

map_images calls map_images_nolock, whose main implementation is as follows:

void 
map_images_nolock(unsigned mhCount, const char * const mhPaths[],
                  const struct mach_header * const mhdrs[])
{
    static bool firstTime = YES;
    header_info *hList[mhCount];
    uint32_t hCount;
    size_t selrefCount = 0;

    // Perform first-time initialization if necessary.
    // This function is called before ordinary library initializers. 
    // fixme defer initialization until an objc-using image is found?
    if (firstTime) {
        // Related initialization
        preopt_init();
    }

    if (PrintImages) {
        _objc_inform("IMAGES: processing %u newly-mapped images... \n", mhCount);
    }


    // Find all images with Objective-C metadata.
    hCount = 0;

    // Count classes. Size various table based on the total.
    // Count the classes so the tables can be sized from the total
    int totalClasses = 0;
    int unoptimizedTotalClasses = 0;
    {
        uint32_t i = mhCount;
        while (i--) {
            const headerType *mhdr = (const headerType *)mhdrs[i];

            auto hi = addHeader(mhdr, mhPaths[i], totalClasses, unoptimizedTotalClasses);
            if (!hi) { // no objc data in this entry
                continue;
            }
            
            if (mhdr->filetype == MH_EXECUTE) {
                // Size some data structures based on main executable's size
#if __OBJC2__
                // If dyld3 optimized the main executable, then there shouldn't
                // be any selrefs needed in the dynamic map so we can just init
                // to a 0 sized map
                if ( !hi->hasPreoptimizedSelectors() ) {
                  size_t count;
                  _getObjc2SelectorRefs(hi, &count);
                  selrefCount += count;
                  _getObjc2MessageRefs(hi, &count);
                  selrefCount += count;
                }
#else
                _getObjcSelectorRefs(hi, &selrefCount);
#endif
                
#if SUPPORT_GC_COMPAT
                // Halt if this is a GC app.
                if (shouldRejectGCApp(hi)) {
                    _objc_fatal_with_reason
                        (OBJC_EXIT_REASON_GC_NOT_SUPPORTED, 
                         OS_REASON_FLAG_CONSISTENT_FAILURE, 
                         "Objective-C garbage collection " 
                         "is no longer supported.");
                }
#endif
            }
            
            hList[hCount++] = hi;
            
            if (PrintImages) {
                _objc_inform("IMAGES: loading image for %s%s%s%s%s\n", 
                             hi->fname(),
                             mhdr->filetype == MH_BUNDLE ? " (bundle)" : "",
                             hi->info()->isReplacement() ? " (replacement)" : "",
                             hi->info()->hasCategoryClassProperties() ? " (has class properties)" : "",
                             hi->info()->optimizedByDyld() ? " (preoptimized)" : "");
            }
        }
    }

    // Perform one-time runtime initialization that must be deferred until 
    // the executable itself is found. This needs to be done before 
    // further initialization.
    // (The executable may not be present in this infoList if the 
    // executable does not contain Objective-C code but Objective-C 
    // is dynamically loaded later.
    if (firstTime) {
        sel_init(selrefCount);
        arr_init();

#if SUPPORT_GC_COMPAT
        // Reject any GC images linked to the main executable.
        // We already rejected the app itself above.
        // Images loaded after launch will be rejected by dyld.

        for (uint32_t i = 0; i < hCount; i++) {
            auto hi = hList[i];
            auto mh = hi->mhdr();
            if (mh->filetype != MH_EXECUTE && shouldRejectGCImage(mh)) {
                _objc_fatal_with_reason
                    (OBJC_EXIT_REASON_GC_NOT_SUPPORTED, 
                     OS_REASON_FLAG_CONSISTENT_FAILURE, 
                     "%s requires Objective-C garbage collection "
                     "which is no longer supported.", hi->fname());
            }
        }
#endif

#if TARGET_OS_OSX
        // Disable +initialize fork safety if the app is too old (< 10.13).
        // Disable +initialize fork safety if the app has a
        // __DATA,__objc_fork_ok section.

//        if (!dyld_program_sdk_at_least(dyld_platform_version_macOS_10_13)) {
//            DisableInitializeForkSafety = true;
//            if (PrintInitializing) {
//                _objc_inform("INITIALIZE: disabling +initialize fork "
//                             "safety enforcement because the app is "
//                             "too old.)");
//            }
//        }

        for (uint32_t i = 0; i < hCount; i++) {
            auto hi = hList[i];
            auto mh = hi->mhdr();
            if (mh->filetype != MH_EXECUTE) continue;
            unsigned long size;
            if (getsectiondata(hi->mhdr(), "__DATA", "__objc_fork_ok", &size)) {
                DisableInitializeForkSafety = true;
                if (PrintInitializing) {
                    _objc_inform("INITIALIZE: disabling +initialize fork "
                                 "safety enforcement because the app has "
                                 "a __DATA,__objc_fork_ok section");
                }
            }

            break;  // assume only one MH_EXECUTE image
        }
#endif

    }

    if (hCount > 0) {
        // Load the image file
        _read_images(hList, hCount, totalClasses, unoptimizedTotalClasses);
    }

    firstTime = NO;
    
    // Call image load funcs after everything is set up.
    // When everything is ready, call the image load function
    for (auto func : loadImageFuncs) {
        for (uint32_t i = 0; i < mhCount; i++) {
            func(mhdrs[i]);
        }
    }
}
  • preopt_init(): initializes the related environment;
  • hCount: stores the number of images that contain Objective-C metadata; it is incremented by the assignment hList[hCount++] = hi in the while loop;
  • totalClasses: stores the number of classes; the tables are sized from this total;
  • _read_images: loads the image files;
  • loadImageFuncs: after all the work is ready, the image load functions are called;
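The class-counting loop can be sketched with a hypothetical Header type that keeps only what the loop needs. Like the while (i--) loop above, it walks the images from back to front, skips images without Objective-C data, and appends the rest to hList. In the real code addHeader accumulates totalClasses; here the sketch sums a per-image count instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, minimal stand-in for header_info.
struct Header {
    const char *path;
    bool hasObjCMetadata;  // addHeader() returns null when this is false
    int classCount;
};

// Walk images back to front, skip those without objc data,
// and count classes for sizing the tables.
std::vector<const Header *> collect_objc_headers(const std::vector<Header> &images,
                                                 int &totalClasses) {
    std::vector<const Header *> hList;
    totalClasses = 0;
    std::size_t i = images.size();
    while (i--) {
        const Header &h = images[i];
        if (!h.hasObjCMetadata) continue;  // no objc data in this entry
        totalClasses += h.classCount;
        hList.push_back(&h);               // like hList[hCount++] = hi
    }
    return hList;
}
```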

So how exactly are the images loaded here? The core is this method:

_read_images(hList, hCount, totalClasses, unoptimizedTotalClasses);

_read_images process (key part)

This method looks complicated, so let’s close all the judgment branches and take a look at the function as a whole:

There is a lot of log output in _read_images. Based on the log messages, the main flow of _read_images is as follows:

  • 1. Conditional control for one-time loading
  • 2. Fix up @selector references that are messed up after precompilation
  • 3. Handle wrong or messed-up classes
  • 4. Fix up the remapping of classes that were not loaded by the image file
  • 5. Fix up some messages
  • 6. What to do when a class has a protocol: readProtocol
  • 7. Fix up protocols that were not loaded
  • 8. Category handling
  • 9. Class loading
  • 10. Unprocessed classes: optimize classes that were deleted but not reclaimed (future classes)

The following is a step-by-step analysis of the process:

1. Conditional control for one-time loading

  • initializeTaggedPointerObfuscator: tagged pointers (small objects); this mainly sets up obfuscation and is not our focus here;
  • namedClassesSize: calculates the size needed to create the table; it reverses the 3/4 load factor we saw earlier in cache to work out the capacity;
  • NXCreateMapTable: creates a hash table that stores class information, with size namedClassesSize;

The hash table created by NXCreateMapTable is gdb_objc_realized_classes:

// This is a misnomer: gdb_objc_realized_classes is actually a list of 
// named classes not in the dyld shared cache, whether realized or not.
// This list excludes lazily named classes, which have to be looked up
// using a getClass hook.
NXMapTable *gdb_objc_realized_classes;  // exported for debuggers in objc-gdb.h

In other words, this hash table stores the named classes that are not in the shared cache, whether or not they are realized, and its capacity is 4/3 of the number of classes;
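The 4/3 sizing is simple arithmetic. A minimal sketch, assuming the expression in _read_images is (isPreoptimized() ? unoptimizedTotalClasses : totalClasses) * 4 / 3; the function name is hypothetical.

```cpp
#include <cassert>

// Capacity for the named-class table: the inverse of a 3/4 load factor.
int named_classes_size(bool preoptimized, int totalClasses, int unoptimizedTotalClasses) {
    // Preoptimized classes live in the shared cache, so only the rest are counted.
    int count = preoptimized ? unoptimizedTotalClasses : totalClasses;
    return count * 4 / 3;
}
```

For 300 classes this gives a capacity of 400, and 3/4 of 400 brings us back to 300, the same 3/4 ratio seen in cache.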

2. Fix up @selector references that are messed up after precompilation

Fix up @selector references; a sel is a string with an address.

  • _getObjc2SelectorRefs: gets the static section __objc_selrefs from MachO;
  • sel_registerNameNoLock: registers the sel, adding it to the hash table namedSelectors;
  • sels[i] = sel: the sel address read from MachO is not the real address, so it is reassigned to the sel address obtained from dyld;

This block gets the static section __objc_selrefs from MachO through _getObjc2SelectorRefs, traverses the list, and registers each sel in the hash table through sel_registerNameNoLock. The address returned by sel_registerNameNoLock, which comes from dyld, takes precedence;

The source code for _getObjc2SelectorRefs is as follows:

GETSECT(_getObjc2SelectorRefs,        SEL,             "__objc_selrefs"); 

It returns the static section __objc_selrefs from MachO;

For example, the same retain method shows up at two different addresses, so the address needs to be reassigned.
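The reassignment can be modeled as string interning: each selector name is registered once in a table, and every reference is rewritten to point at the canonical copy, so two retain references that started at different addresses end up sharing one. The table and function names are hypothetical stand-ins for namedSelectors and sel_registerNameNoLock.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical selector table: one canonical string per name.
// Pointers to elements of an unordered_set stay valid across rehashing.
static std::unordered_set<std::string> g_named_selectors;

// Return the canonical pointer for a name, registering it on first use.
const char *register_name(const char *name) {
    auto result = g_named_selectors.emplace(name);
    return result.first->c_str();
}

// Rewrite each reference so equal names share one address (like sels[i] = sel).
void fixup_selector_refs(const char **sels, std::size_t count) {
    for (std::size_t i = 0; i < count; i++) {
        const char *sel = register_name(sels[i]);
        if (sels[i] != sel) sels[i] = sel;
    }
}
```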

3. Handle wrong or messed-up classes (class with name and address)

  • Class cls = (Class)classlist[i]: gets cls; at this point cls is just an address;
  • readClass: reads the class; after this, cls actually has a name;

Print cls at a breakpoint:

After executing readClass:

Breakpoint debugging shows that after the readClass step, the class has an address and a name;

if (newCls != cls && newCls) {
    // Class was moved but not deleted. Currently this occurs 
    // only when the new class resolved a future class.
    // Non-lazily realize the class below.
    resolvedFutureClasses = (Class *)realloc(resolvedFutureClasses, (resolvedFutureClassCount+1) * sizeof(Class));
    resolvedFutureClasses[resolvedFutureClassCount++] = newCls;
}

This branch is not hit when stepping through at a breakpoint. It deals with future classes: per the source comment, a class is "moved but not deleted" only when the newly read class resolves a future class, and such classes need extra handling here, where they are realized non-lazily.

4. Fix up the remapping of classes that were not loaded by the image file

Remap unmapped classes and superclasses:

  • _getObjc2ClassRefs: gets the static section __objc_classrefs from MachO, i.e. the references to classes;
  • _getObjc2SuperRefs: gets the static section __objc_superrefs from MachO, i.e. the references to superclasses;

The class that remapClassRef operates on is lazily loaded.
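remapClassRef patches a class reference in place when the referenced class has a replacement. A minimal sketch with a hypothetical remap table keyed by the old address; classes with no entry map to themselves, so their references stay untouched.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical class record; real classes are objc_class structures.
struct ClassRecord { const char *name; };

// Hypothetical remap table: old class address -> its replacement.
static std::unordered_map<ClassRecord *, ClassRecord *> g_remapped;

// Look up a replacement; classes that were never moved map to themselves.
ClassRecord *remap_class(ClassRecord *cls) {
    auto it = g_remapped.find(cls);
    return it == g_remapped.end() ? cls : it->second;
}

// Patch a class reference in place, in the spirit of remapClassRef.
void remap_class_ref(ClassRecord **clsref) {
    ClassRecord *newcls = remap_class(*clsref);
    if (*clsref != newcls) *clsref = newcls;
}
```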

5. Fix up some messages

  • _getObjc2MessageRefs: gets the static section __objc_msgrefs from MachO;
  • fixupMessageRef: registers the function pointer and fixes it up to the new pointer;

_getObjc2MessageRefs gets __objc_msgrefs from MachO; the code walks the list, registers each function pointer with fixupMessageRef, and fixes it up to the new pointer.

6. What to do when a class has a protocol:readProtocol

All protocol lists are traversed and loaded into the protocol hash table.

  • Class cls = (Class)&OBJC_CLASS_$_Protocol;: cls is the Protocol class. Protocols have a structure similar to objects, and their isa points to the Protocol class;
  • NXMapTable *protocol_map = protocols();: gets the protocol hash table, protocol_map;
  • _getObjc2ProtocolList: gets the protocol list from the static section __objc_protolist in MachO; these protocols are emitted by the compiler and initialized here;
  • readProtocol: loops over the list, adding each protocol to the protocol_map hash table via readProtocol.
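The handling of duplicates in readProtocol can be sketched as insert-if-absent: when a protocol with the same name is already in the map, the existing definition wins and the new one is not inserted. This mirrors only the shape of the logic; ProtocolRecord and read_protocol are hypothetical, and the real function has extra cases not shown here.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical protocol record.
struct ProtocolRecord { std::string name; };

using ProtocolMap = std::unordered_map<std::string, const ProtocolRecord *>;

// Add a protocol unless one with the same name is already registered.
// Returns the record that ends up in the map (the existing one wins).
const ProtocolRecord *read_protocol(ProtocolMap &protocol_map, const ProtocolRecord *proto) {
    auto result = protocol_map.emplace(proto->name, proto);
    return result.first->second;
}
```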

7. Fix up protocols that were not loaded

  • _getObjc2ProtocolRefs: obtains the static section __objc_protorefs from MachO
  • remapProtocolRef: compares the current protocol with the protocol at the same memory address in the protocol list; if they differ, replaces it

The source of remapProtocolRef:

/***********************************************************************
* remapProtocolRef
* Fix up a protocol ref, in case the protocol referenced has been reallocated.
* Locking: runtimeLock must be read- or write-locked by the caller
**********************************************************************/
static size_t UnfixedProtocolReferences;
static void remapProtocolRef(protocol_t **protoref)
{
    runtimeLock.assertLocked();
    // Look up the protocol at the same memory address in the protocol list
    protocol_t *newproto = remapProtocol((protocol_ref_t)*protoref);
    if (*protoref != newproto) {
        // If the current protocol differs from the one at the same memory address, replace it
        *protoref = newproto;
        UnfixedProtocolReferences++;
    }
}

_getObjc2ProtocolRefs gets __objc_protorefs from MachO, and the code then traverses the protocols that need fixing: remapProtocolRef checks whether the current protocol is the same as the protocol at the same memory address in the protocol list, and replaces it if not;

8. Category handling

This handles categories. It can only run after classes have been initialized and their data loaded. For categories present at startup, discovery is deferred until the first load_images call after _dyld_objc_notify_register has run.

9. Class loading (returns the real data structure of a class)

Realize non-lazy classes (classes with +load methods or static instances)

nlclslist source code:

const classref_t *header_info::nlclslist(size_t *outCount) const
{
#if __OBJC2__
    // This field is new, so temporarily be resilient to the shared cache
    // not generating it
    if (isPreoptimized() && hasPreoptimizedSectionLookups()) {
          *outCount = nlclslist_count;
          const classref_t *list = (const classref_t *)(((intptr_t)&nlclslist_offset) + nlclslist_offset);
      #if DEBUG
          size_t debugCount;
          assert((list == _getObjc2NonlazyClassList(mhdr(), &debugCount)) && (*outCount == debugCount));
      #endif
          return list;
    }
    return _getObjc2NonlazyClassList(mhdr(), outCount);
#else
    return NULL;
#endif
}

addClassTableEntry source code:

/***********************************************************************
* addClassTableEntry
* Add a class to the table of all classes. If addMeta is true,
* automatically adds the metaclass of the class as well.
* Locking: runtimeLock must be held by the caller.
**********************************************************************/
static void
addClassTableEntry(Class cls, bool addMeta = true)
{
    runtimeLock.assertLocked();

    // This class is allowed to be a known class via the shared cache or via
    // data segments, but it is not allowed to be in the dynamic table already.
    auto &set = objc::allocatedClasses.get();

    ASSERT(set.find(cls) == set.end());

    if (!isKnownClass(cls))
        set.insert(cls);
    if (addMeta)
        addClassTableEntry(cls->ISA(), false);
}
  • _getObjc2NonlazyClassList gets the static section __objc_nlclslist from MachO, i.e. the table of non-lazily loaded classes
  • addClassTableEntry(cls); inserts the class and its metaclass into the table
  • realizeClassWithoutSwift initializes the class (in step 3 it only got a name and address; its data was not loaded), allocates its read-write data (such as rw), and returns the real structure of the class

Note that this step handles non-lazily loaded classes.
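The recursion in addClassTableEntry can be sketched with a hypothetical MiniClass that only has an isa link: adding a class also adds its metaclass, with addMeta set to false on the recursive call so it stops after one level. The real function also skips classes already known through the shared cache or data segments (isKnownClass); this sketch omits that check.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical minimal class: only the isa link to its metaclass.
struct MiniClass { MiniClass *isa; };

static std::unordered_set<MiniClass *> g_allocated_classes;

// Insert a class and, by default, its metaclass, following addClassTableEntry's shape.
void add_class_table_entry(MiniClass *cls, bool addMeta = true) {
    // Like the ASSERT in the original: the class must not be in the table yet.
    assert(g_allocated_classes.find(cls) == g_allocated_classes.end());
    g_allocated_classes.insert(cls);
    if (addMeta) add_class_table_entry(cls->isa, false);
}
```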

10. Unprocessed classes: optimize classes that were deleted but not reclaimed (future classes)

  • realizeClassWithoutSwift: realizes the class
  • realizeAllClasses: realizes all classes

Walking through the process, we find that the core lies in readClass (step 3) and realizeClassWithoutSwift (step 9).