In the previous article we explored how the dyld dynamic linker works and what iOS does before the main function runs, which laid the groundwork for studying class loading. Now we begin to analyze how Objective-C classes are loaded.

I. How does class information get into memory?

1. Recap: the dyld loading process and how class information is obtained

dyld → images → memory → Person (methods, protocols, categories…)

Through the dyld dynamic linker we can read everything in a Mach-O image. That information is exposed as addresses: classes, the methods in each class, protocols, categories, and so on.

Images (Mach-O) → addresses → table → classes → initialization (ro, rw)

As the Mach-O structure diagram in the figure above shows, we can obtain the class information from Mach-O and then store it in a table by address, with entries distinguished by class. After that, ro and rw need to be initialized, and the information from Mach-O can then be accessed through ro and rw.

2. dyld and class loading

In the earlier dyld analysis there was one very important function: _dyld_objc_notify_register. We create a new project and add a symbolic breakpoint on it, as shown in the figure below:

_dyld_objc_notify_register is called from _objc_init, and it is where class loading begins.

II. Class loading: preparation

1. The class-loading entry point: _objc_init

Searching the runtime source for _dyld_objc_notify_register leads us to the _objc_init function:

Before it calls _dyld_objc_notify_register, _objc_init performs a number of other operations. Let's take a quick look at them.

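2. Important functions in _objc_init

// Important functions called by _objc_init
    environ_init();       // read the environment variables that affect the runtime
    tls_init();           // bind thread keys (thread-local storage)
    static_init();        // run C++ static constructors
    runtime_init();       // set up unattachedCategories and allocatedClasses
    exception_init();     // initialize libobjc's exception handling
#if __OBJC2__
    cache_t::init();      // initialize the caching conditions
#endif
    _imp_implementationWithBlock_init();   // start the callback (trampoline) machinery
    // dyld function call
    _dyld_objc_notify_register(&map_images, load_images, unmap_image);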
    

The functions above are commented according to their importance. Their shared purpose is to prepare for class loading. The following sections briefly introduce each one.

3. environ_init()

This function reads the environment variables that affect the runtime. Through its internals you can print information about the loading process and control some settings, for example:

  • 3.1 By enabling the help-printing code shown below, you can see what the objc library prints while the function executes

  • 3.2 You can control what is printed in a given environment by setting environment variables

The first setting controls whether non-pointer isa is used; the second prints every class that implements the +load method.

  • 3.3 You can also print the environment variables the runtime supports by setting OBJC_HELP in a terminal
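
To make the idea of environment-controlled switches concrete, here is a minimal C++ sketch. It is not the runtime's code: the struct, the function names, and the rule that a switch is on only when set to "YES" are assumptions for illustration. The variable names OBJC_DISABLE_NONPOINTER_ISA and OBJC_PRINT_LOAD_METHODS are my reading of the two settings mentioned above.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical, reduced option set; the real runtime has many more switches.
struct RuntimeOptions {
    bool disableNonpointerIsa = false;  // OBJC_DISABLE_NONPOINTER_ISA
    bool printLoadMethods = false;      // OBJC_PRINT_LOAD_METHODS
    bool printHelp = false;             // OBJC_HELP
};

// Assumption for this sketch: a switch is on only when its value is "YES".
inline bool envFlagEnabled(const char *value) {
    return value != nullptr && std::strcmp(value, "YES") == 0;
}

// Read the switches once from the process environment.
inline RuntimeOptions readRuntimeOptions() {
    RuntimeOptions opts;
    opts.disableNonpointerIsa =
        envFlagEnabled(std::getenv("OBJC_DISABLE_NONPOINTER_ISA"));
    opts.printLoadMethods =
        envFlagEnabled(std::getenv("OBJC_PRINT_LOAD_METHODS"));
    // Assumption: the help switch counts as on whenever it is set at all.
    opts.printHelp = std::getenv("OBJC_HELP") != nullptr;
    return opts;
}
```

In Xcode the same switches would be set in the scheme's environment variables, so no code changes are needed to turn the logging on or off.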

4. tls_init()

This binds the thread keys, for example the destructor for per-thread data. It is not covered in detail here.

5. static_init()

This runs the C++ static constructors. libc calls _objc_init() before dyld would call these static constructors, so the runtime has to run them itself. Here we run a test to see whether execution reaches our own constructor first.

5.1 We start by writing a custom constructor function in the source where _objc_init is defined.

5.2 Then we set a breakpoint where static_init() calls getLibobjcInitializerOffsets and debug. We find that the output of our custom constructor is printed.
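
The experiment above can be approximated outside the runtime. The sketch below is an assumption-laden illustration, not the code from the test: it registers one ordinary C++ static constructor and one constructor function (the `__attribute__((constructor))` form is a GCC/Clang extension) and records that both ran before main. All names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Log of which initializers ran. A function-local static is used so the
// vector exists before any global constructor touches it.
inline std::vector<int> &initLog() {
    static std::vector<int> log;
    return log;
}

// An ordinary C++ static constructor: runs when the global is initialized.
struct RecordsAtStartup {
    RecordsAtStartup() { initLog().push_back(1); }
};
static RecordsAtStartup recordsAtStartup;

// A constructor function (GCC/Clang extension). The relative order of this
// and the static object above is not relied on here.
__attribute__((constructor)) static void customConstructor() {
    initLog().push_back(2);
}

// True once both initializers have run, which is the case on entry to main.
inline bool initializersRanBeforeMain() {
    return initLog().size() == 2;
}
```

Calling `initializersRanBeforeMain()` as the first statement of main returns true, which mirrors what the breakpoint shows: the constructor has already executed by the time ordinary code runs.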

6. runtime_init()

This initializes the runtime environment. It mainly sets up two tables, unattachedCategories and allocatedClasses, which are analyzed later.

7. exception_init()

  • 7.1 Purpose and internal implementation

This initializes libobjc's exception handling system, which is responsible for handling things after the program throws an exception.

When an uncaught exception reaches this function, the exception information is passed to the registered callback and old_terminate, the previously installed terminate handler, is called.

  • 7.2 Exception capture example

By registering a custom function, void LGExceptionHandlers(NSException *exception), with NSSetUncaughtExceptionHandler, we can capture exception information while the program runs. Note: exceptions are not errors!
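
The old_terminate pattern can be sketched in plain C++. This is a minimal illustration that assumes handlers are chained with std::set_terminate, which returns the previously installed handler. The function names are hypothetical, and the Objective-C side (NSSetUncaughtExceptionHandler) is not modeled.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <exception>

// The handler that was installed before ours (the role of old_terminate).
static std::terminate_handler previousTerminate = nullptr;

// Report first, then defer to the previous handler. A terminate handler
// must not return, so abort is the last resort.
static void reportingTerminate() {
    std::fputs("uncaught exception reached the terminate handler\n", stderr);
    if (previousTerminate != nullptr) previousTerminate();
    std::abort();
}

// Install our handler and remember the old one.
inline void installReportingTerminate() {
    previousTerminate = std::set_terminate(&reportingTerminate);
}
```

Saving the previous handler instead of discarding it means our handler adds reporting without changing how the process finally terminates.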

8. cache_t::init()

The caching conditions are initialized here.

9. _imp_implementationWithBlock_init()

This starts the callback (trampoline) machinery. Usually it does very little, because everything is initialized lazily, but for certain processes the trampolines dylib is loaded eagerly. I won't analyze it further here.

10. Summary

In _objc_init we looked at how environment variables control the runtime and at the preparation done for class loading, which gives a general picture of what happens in _objc_init before classes are loaded. Next, let's go straight into the analysis of class loading itself.

III. Class loading: the main part

1. _dyld_objc_notify_register

_dyld_objc_notify_register(&map_images, load_images, unmap_image)


  • &map_images passes the address of map_images, so dyld calls back into the runtime's own function each time images are mapped (the call repeats as more images are loaded) and the results stay in step on both sides. Note that in C a function name already converts to a pointer, so load_images and unmap_image are passed the same way; the & is a matter of style.
  • load_images handles calling the +load methods
  • unmap_image handles unloaded libraries, printing and freeing their memory
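
The registration step can be illustrated with a small sketch. The callback types here are simplified assumptions (the real ones also receive mach_header pointers), and every name ending in Sketch, the registry, and the counters are hypothetical. The sketch also shows that `&f` and `f` produce the same function pointer.

```cpp
#include <cassert>

// Hypothetical, simplified callback types.
using MappedCallback = void (*)(unsigned count, const char *const paths[]);
using InitCallback = void (*)(const char *path);
using UnmappedCallback = void (*)(const char *path);

// Hypothetical registry standing in for the callbacks dyld stores.
struct NotifyRegistry {
    MappedCallback mapped = nullptr;
    InitCallback init = nullptr;
    UnmappedCallback unmapped = nullptr;
};

inline NotifyRegistry &notifyRegistry() {
    static NotifyRegistry registry;
    return registry;
}

// Models the registration call: remember the three callbacks.
inline void notifyRegisterSketch(MappedCallback mapped, InitCallback init,
                                 UnmappedCallback unmapped) {
    assert(mapped != nullptr && init != nullptr && unmapped != nullptr);
    notifyRegistry().mapped = mapped;
    notifyRegistry().init = init;
    notifyRegistry().unmapped = unmapped;
}

static unsigned mappedImageCount = 0;  // images seen by mapImagesSketch
static unsigned loadCallCount = 0;     // times loadImagesSketch ran

static void mapImagesSketch(unsigned count, const char *const paths[]) {
    (void)paths;
    mappedImageCount += count;
}
static void loadImagesSketch(const char *path) {
    (void)path;
    ++loadCallCount;
}
static void unmapImageSketch(const char *path) { (void)path; }
```

After `notifyRegisterSketch(&mapImagesSketch, loadImagesSketch, unmapImageSketch)`, the linker side only has to call through the stored pointers whenever images are mapped, initialized, or unmapped.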

2. map_images_nolock

  • The main job here is reading the images with _read_images. An image is a Mach-O binary, and the class information has to be read out of that binary. Everything before this call is necessary preparation for reading the images.

3. _read_images

  • The part above focuses on the conditions that control the loop and the loading. Next we look at the code below

The code above mainly handles printing during the run, creating the tables, and comparing and fixing up methods.

  • UnfixedSelectors: the SELs loaded from Mach-O are compared with the SELs obtained through dyld, with dyld's as the benchmark, and mismatched references are fixed up. A later section is devoted to the process; the function is only summarized here.

3.1 initializeTaggedPointerObfuscator

  • This function sets up obfuscation for tagged pointers. We won't go into it here.

3.2 NXCreateMapTable function

  • Creates a global table through which any class or function can be looked up from anywhere in the program

NXMapTable *NXCreateMapTable(NXMapTablePrototype prototype, unsigned capacity) {
    return NXCreateMapTableFromZone(prototype, capacity, malloc_default_zone());
}

  • Here a global table is created to store the data read from Mach-O. The detailed operations follow

  • Data is written to the table with hash inserts, which are described below

Here we can see a few tables:

  • gdb_objc_realized_classes: the global table of classes
  • objc::unattachedCategories: the table of categories not yet attached to their classes
  • objc::allocatedClasses: the table of classes that have been allocated

Table sizing:

  • Sizing factor: 4/3
  • Suppose a table of capacity 8 is loaded to 3/4, so it holds 8 × 3/4 = 6 classes
  • The sizing rule then gives x = 8 × 3/4 × 4/3 = 8
  • Result: x is the size the table is created with
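
The arithmetic above can be checked with a few lines of C++. This is only a sketch of the 4/3 rule under the assumption that the table is sized with integer arithmetic (count × 4 / 3, rounded down); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Capacity to request so that classCount entries fill about 3/4 of the table.
// Integer arithmetic, so the result is rounded down.
inline std::size_t namedClassesCapacity(std::size_t classCount) {
    return classCount * 4 / 3;
}

// Entries that fit in a table of the given capacity at a 3/4 load factor.
inline std::size_t entriesAtLoadFactor(std::size_t capacity) {
    return capacity * 3 / 4;
}
```

For example, `entriesAtLoadFactor(8)` is 6 and `namedClassesCapacity(6)` is 8, which matches the worked numbers in the list: multiplying by 4/3 undoes the 3/4 load factor.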

3.3 UnfixedSelectors function

  • This is a callback that hands a SEL parameter back to the caller. We don't trace where it is called from; we analyze its internal execution first.

What it actually does is take the SELs from Mach-O and put them into the global table. Because the SELs in Mach-O sit at different relative positions, they need to be remapped and fixed up. That is the main job of this code.

  • If the image hasPreoptimizedSelectors, the loop executes continue and skips that image.

  • Then _getObjc2SelectorRefs is used to get the SELs in Mach-O.

  • sel_registerNameNoLock looks up each SEL name through dyld and returns the SEL registered under that name.

  • Finally, the SELs from dyld and from Mach-O are compared one by one in a loop. If they differ, the Mach-O SEL reference is repointed to dyld's SEL, so the final value is always the one loaded by dyld.
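
The compare-and-repoint loop described above can be modeled as string interning. The sketch below is an illustration, not the runtime's implementation: Sel, the selector table, and both function names are hypothetical stand-ins, and a std::unordered_set plays the role of the table that owns the canonical selector names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for SEL: a pointer to a uniqued C string.
using Sel = const char *;

// Hypothetical global selector table. std::unordered_set is node-based, so
// pointers to stored strings stay valid when the table grows.
inline std::unordered_set<std::string> &selectorTable() {
    static std::unordered_set<std::string> table;
    return table;
}

// Models the registration lookup: return the canonical Sel for a name.
inline Sel registerSelectorName(const char *name) {
    assert(name != nullptr);
    return selectorTable().insert(name).first->c_str();
}

// Models the fix-up loop: repoint every reference at the canonical Sel.
// Returns how many references had to be changed.
inline std::size_t fixupSelectorRefs(std::vector<Sel> &refs) {
    std::size_t fixed = 0;
    for (Sel &ref : refs) {
        Sel canonical = registerSelectorName(ref);
        if (ref != canonical) {
            ref = canonical;
            ++fixed;
        }
    }
    return fixed;
}
```

After the fix-up, two references with the same name hold the same pointer, so selectors can be compared by address instead of by string content.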

4. Class loading and repair

After the SEL comparison, we set breakpoints in the conditionals that follow and step into the Discover classes phase.

4.1 Let's analyze what the LLDB output below means.

  • This is cls printed before and after readClass. Clearly readClass reads the class information from Mach-O: afterwards cls prints as a class such as __NSStackBlock__, which shows that the class has been processed along the way. The next section explores and analyzes readClass.

  • The handling of future classes and newly created classes is described in the function's comments and is not the focus here.

5. readClass exploration and analysis

First we read the implementation of readClass, then add breakpoints in several of its conditions and debug. The functions that actually execute are:

  • addNamedClass(cls, mangledName, replacing);
  • addClassTableEntry(cls);
  • addRemappedClass

5.1 Step-by-step analysis


  • (1) Obtain mangledName

It takes many rounds of comparison before LGPerson turns up, but we get it in the end! Once we have the cls whose mangledName we are looking for, we step into the following condition.

  • (2) Enter addNamedClass

Stepping into addNamedClass, the comment tells us that it adds the class name: the class is simply recorded in the global hash table.

Inside addNamedClass, the classes obtained from dyld and the classes from Mach-O are stored.

  • (3) Enter addClassTableEntry

Continuing to step, we arrive in the addClassTableEntry function. Looking inside:

  • If the incoming class is not yet a known class, it is inserted into the table of allocated classes

  • When a class is added, its metaclass is added as well

  • Has the class's information been added at this point?

We continue tracing readClass and find that everything has finished without any ro or rw operations. So why is nothing assigned here? My guess is that the system only reserves a place at this point, putting the class into the tables first and assigning the rest later. Let's continue exploring!

5.2 readClass summary

  • ① Insert the class into the table by matching its class name
  • ② Use the class to find its metaclass, and insert the metaclass into the table
  • ③ Nothing else is done to the class, so the information in the class is still empty

We've added the classes to the global table, but how are categories, properties, and so on bound to them? Let's keep exploring!
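
The three points of the summary can be sketched as a small data model. Everything here is a simplified assumption: ClassModel, RuntimeTables, and the functions ending in Sketch are hypothetical, and the two containers only stand in for gdb_objc_realized_classes and allocatedClasses.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical, heavily simplified class record.
struct ClassModel {
    std::string mangledName;
    ClassModel *isa = nullptr;  // the metaclass, for a non-meta class
    bool realized = false;      // stays false: this step never builds rw
};

// Hypothetical stand-ins for the name table and the allocated-class table.
struct RuntimeTables {
    std::unordered_map<std::string, ClassModel *> namedClasses;
    std::unordered_set<ClassModel *> allocatedClasses;
};

// Models addClassTableEntry: record the class, then its metaclass once.
inline void addClassTableEntrySketch(RuntimeTables &tables, ClassModel *cls,
                                     bool addMeta = true) {
    assert(cls != nullptr);
    tables.allocatedClasses.insert(cls);
    if (addMeta && cls->isa != nullptr) {
        addClassTableEntrySketch(tables, cls->isa, false);
    }
}

// Models the part of readClass summarized above: name table, class table,
// and nothing else.
inline ClassModel *readClassSketch(RuntimeTables &tables, ClassModel *cls) {
    assert(cls != nullptr);
    tables.namedClasses[cls->mangledName] = cls;  // the addNamedClass step
    addClassTableEntrySketch(tables, cls);        // the addClassTableEntry step
    return cls;
}
```

The `realized` flag is never set in this sketch, which corresponds to point ③: the class is registered by name and address, but its contents are not filled in yet.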

III. Class loading, continued (binding class information: introduction)

1. Where is the rest of the class's information loaded?

Picking up after readClass, we continue to look for the conditions under which classes are processed

Among other things we find these two conditions, and we follow the breakpoint into the function.

Stepping through in the debugger, we find that, apart from a few class and method fix-ups, execution ends up in realizeClassWithoutSwift. Let's go inside realizeClassWithoutSwift and look at the implementation:

2. The realizeClassWithoutSwift function

The figure above shows ro and rw being read. In other words, after the class itself has been handled, the rest of the class information is processed in this method.

The figure above shows the preparation of the class and its metaclass before binding.

The figure above shows how the relationship chain of classes and metaclasses is bound through superclass and isa.

3. methodizeClass

After ro, rw, and the classes are processed, the class is bound to its related data and its categories.

That completes the first part on class loading. It is a little long but full of substance. To finish, here is a summary that pulls together the main ideas of this article:
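
Since the figures are not reproduced here, the following sketch models the three stages they describe: creating rw from ro, realizing the superclass and metaclass, and linking the chain. It is a simplified assumption, not the body of realizeClassWithoutSwift; all type and function names are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical read-only data, as it would come from the Mach-O image.
struct ClassRo {
    std::string name;
    std::vector<std::string> baseMethods;
};

// Hypothetical read-write data created at realization time.
struct ClassRw {
    const ClassRo *ro = nullptr;
    std::vector<std::string> methods;  // left empty here; filled in later
};

struct RealizableClass {
    const ClassRo *ro = nullptr;
    RealizableClass *superclass = nullptr;
    RealizableClass *isa = nullptr;    // the metaclass
    std::unique_ptr<ClassRw> rw;       // null until realized
    std::vector<RealizableClass *> subclasses;
};

// Sketch of realization: build rw from ro, realize the superclass and the
// metaclass, then link this class under its superclass.
inline RealizableClass *realizeClassSketch(RealizableClass *cls) {
    if (cls == nullptr) return nullptr;
    if (cls->rw) return cls;           // already realized; also stops cycles

    assert(cls->ro != nullptr);
    cls->rw = std::make_unique<ClassRw>();
    cls->rw->ro = cls->ro;

    RealizableClass *supercls = realizeClassSketch(cls->superclass);
    realizeClassSketch(cls->isa);

    if (supercls != nullptr) supercls->subclasses.push_back(cls);
    return cls;
}
```

Setting rw before recursing matters: the root metaclass's isa points to itself, so without the early return the recursion would never end.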

IV. Summary

1. First, an overall flow chart

2. Process summary

  • We started from _objc_init to explore how classes are loaded. It performs a series of environment preparations and then calls _dyld_objc_notify_register(&map_images, load_images, unmap_image), which starts the class loading process.

  • Through &map_images we reach the _read_images function, which loads the classes in these steps:

1: conditional control so that the one-time setup runs only on the first load
2: fix up the @selector references that were left inconsistent at precompile time
3: handle classes that are in an incorrect or inconsistent state
4: fix up remapped classes that were not loaded by the image file
5: fix up some messages
6: when our classes have protocols: readProtocol
7: fix up protocols that were not loaded
8: process categories
9: load (realize) classes
10: process classes that have not been handled yet, optimizing those that were violated

🌺🌺🌺 More content is on the way. If you liked this article, leave a like and a follow, and I'll keep creating good content for you.