Preface
In our last article, iOS Class Loading Process Analysis (Part 2), we discussed the class loading process. Today we take a detailed look at how categories are loaded.
Key points:
- Conditions under which categories are loaded
- Category loading modes
1. Exploring the category loading process
1.1 Analyzing the compiled category source
Before exploring how categories are loaded, let's first look at the data structure a category becomes when the Objective-C code is rewritten into C++ source code. First, create a class and a category with the following code:
// Person.h
#import <Foundation/Foundation.h>

@interface Person : NSObject
@property (nonatomic, copy) NSString *name;
@property (nonatomic, assign) int age;
- (void)say;
+ (void)class_method;
@end

// Person.m
#import "Person.h"

@implementation Person
- (void)say { NSLog(@"%s", __func__); }
+ (void)class_method { NSLog(@"%s", __func__); }
@end

// Person+CA.h
#import "Person.h"

@interface Person (CA) <NSObject>
@property (nonatomic, copy) NSString *ca_name;
@property (nonatomic, assign) int ca_age;
- (void)ca_say;
- (void)ca_eat;
+ (void)ca_class_method;
@end

// Person+CA.m
#import "Person+CA.h"

@implementation Person (CA)
- (void)ca_say { NSLog(@"%s", __func__); }
- (void)ca_eat { NSLog(@"%s", __func__); }
+ (void)ca_class_method { NSLog(@"%s", __func__); }
@end

// main.m
#import "Person.h"
#import "Person+CA.h"

int main(int argc, const char * argv[]) {
    Person *p = [[Person alloc] init];
    p.name = @"zzz";
    return 0;
}
Use the clang command (for example, clang -rewrite-objc main.m -o main.cpp) to compile main.m into C++ source code, as shown in the figure below:
In the generated main.cpp, the category is compiled into a data structure like the one shown below:
The category defined above is compiled into a static variable, _OBJC_$_CATEGORY_Person_$_CA, of type struct _category_t. Its members are initialized as follows:
You can also see the list of protocols it conforms to, as shown below:
One difference between a class and a category is that the getter and setter methods for the class's properties can be found in the Person class's method list, as shown in the figure below:
However, the getters and setters for properties declared in the category do not appear in the category's method list, as shown in the figure below:
Properties declared in a category do not get backing instance variables or synthesized accessors, so they can only act as computed properties. You can, however, implement their accessors with associated objects to store and read the values.
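The runtime keeps associated values outside the object, in a global table keyed first by the object's address and then by a key pointer supplied by the caller. The C++ sketch below models only that lookup structure. AssociationTable, set, get, removeAll and kCANameKey are hypothetical names. The real runtime also handles retain/copy policies, locking, and cleanup when the object is deallocated.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical model of the associated-object side table: the outer map is
// keyed by the object's address, the inner map by a caller-chosen key pointer.
// Memory-management policies and locking are deliberately left out.
class AssociationTable {
public:
    void set(const void *object, const void *key, const std::string &value) {
        assert(object != nullptr && key != nullptr);
        table_[object][key] = value;
    }

    // Returns nullptr when nothing has been associated under this key.
    const std::string *get(const void *object, const void *key) const {
        auto objIt = table_.find(object);
        if (objIt == table_.end()) return nullptr;
        auto keyIt = objIt->second.find(key);
        if (keyIt == objIt->second.end()) return nullptr;
        return &keyIt->second;
    }

    // Models the cleanup that happens when the object goes away.
    void removeAll(const void *object) { table_.erase(object); }

private:
    std::unordered_map<const void *,
                       std::unordered_map<const void *, std::string>> table_;
};

// The address of a static variable is a conventional unique key.
static const char kCANameKey = 0;
```

In Objective-C, the ca_name getter and setter would call objc_getAssociatedObject and objc_setAssociatedObject with such a static key. That table is where the otherwise storage-less category property keeps its value.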
In the objc4 runtime source, _category_t corresponds to the category_t structure, as shown in the figure below:
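For orientation, here is a simplified C++ model of the parts of category_t that matter during loading. The field and method names follow objc4. Pointer wrapping and the header_info check in propertiesForMeta are omitted, and the placeholder list types are assumptions. The key point is that the class receives a category's instance methods, while the metaclass receives its class methods.

```cpp
#include <cassert>
#include <cstddef>

// Placeholder list types; in objc4 these are real list structures.
struct method_list_t   { std::size_t count; };
struct protocol_list_t { std::size_t count; };
struct property_list_t { std::size_t count; };
struct objc_class;  // opaque in this sketch

// Simplified model of objc4's category_t.
struct category_t {
    const char *name;
    objc_class *cls;
    method_list_t *instanceMethods;
    method_list_t *classMethods;
    protocol_list_t *protocols;
    property_list_t *instanceProperties;
    property_list_t *_classProperties;

    // The metaclass gets the class methods, the class gets the instance methods.
    method_list_t *methodsForMeta(bool isMeta) const {
        return isMeta ? classMethods : instanceMethods;
    }
    // Protocols are attached to the class only, never to the metaclass.
    protocol_list_t *protocolsForMeta(bool isMeta) const {
        return isMeta ? nullptr : protocols;
    }
    // Simplified: objc4 also checks whether the image supports class properties.
    property_list_t *propertiesForMeta(bool isMeta) const {
        return isMeta ? _classProperties : instanceProperties;
    }
};
```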
1.2 Analyzing the category loading process
In the previous analysis of class loading, we left two open questions, which we will now analyze one by one:
- Question 1: During class loading, when methodizeClass is called, the rwe of cls is nil. Under what circumstances is rwe not nil, and when is the rwe of cls initialized?
First, let’s look at the ObjC source code, as shown in the figure below:
In methodizeClass, the class_rw_ext_t * pointer is obtained by calling ext() on rw (the code in red box 2). Here we also notice extAllocIfNeeded, which likewise returns a class_rw_ext_t *. Its logic is as follows. It inspects the ro_or_rw_ext member of class_rw_t. If that member already holds a class_rw_ext_t *, the function returns that pointer. Otherwise it calls extAlloc to allocate a new class_rw_ext_t and returns a pointer to it. The code of extAlloc looks like this:
class_rw_ext_t *
class_rw_t::extAlloc(const class_ro_t *ro, bool deepCopy)
{
    runtimeLock.assertLocked();

    auto rwe = objc::zalloc<class_rw_ext_t>();

    rwe->version = (ro->flags & RO_META) ? 7 : 0;

    method_list_t *list = ro->baseMethods();
    if (list) {
        if (deepCopy) list = list->duplicate();
        rwe->methods.attachLists(&list, 1);
    }

    // See comments in objc_duplicateClass
    // property lists and protocol lists historically
    // have not been deep-copied
    //
    // This is probably wrong and ought to be fixed some day
    property_list_t *proplist = ro->baseProperties;
    if (proplist) {
        rwe->properties.attachLists(&proplist, 1);
    }

    protocol_list_t *protolist = ro->baseProtocols;
    if (protolist) {
        rwe->protocols.attachLists(&protolist, 1);
    }

    set_ro_or_rwe(rwe, ro);
    return rwe;
}
This function performs three steps:
- It attaches the method list, property list, and protocol list pointers from ro to rwe. Only the method list is deep-copied, and only when deepCopy is true.
- It stores ro in rwe's ro member.
- It stores rwe in the ro_or_rw_ext member of the Person class's rw.
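To make the switch from ro to rwe concrete, the following C++ sketch models ro_or_rw_ext as a single tagged word. While the word holds a class_ro_t *, the class has no rwe. extAllocIfNeeded replaces it with a freshly allocated class_rw_ext_t * that remembers ro.

The sketch makes a few simplifying assumptions:
- As objc4's PointerUnion does, it assumes pointer alignment leaves bit 0 free for the tag.
- Method lists are reduced to vectors of names.
- Unlike the real extAlloc, which attaches the existing list pointer, the sketch copies the names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified stand-ins for the runtime types.
struct class_ro_t {
    std::vector<const char *> baseMethods;  // method names only
};

struct class_rw_ext_t {
    const class_ro_t *ro = nullptr;
    std::vector<const char *> methods;      // a method_array_t in objc4
};

// Bit 0 set: the word holds a class_rw_ext_t *. Bit 0 clear: a const class_ro_t *.
class class_rw_t {
public:
    explicit class_rw_t(const class_ro_t *ro)
        : ro_or_rw_ext_(reinterpret_cast<std::uintptr_t>(ro)) {
        assert((ro_or_rw_ext_ & 1) == 0);  // relies on pointer alignment
    }
    class_rw_t(const class_rw_t &) = delete;
    class_rw_t &operator=(const class_rw_t &) = delete;
    ~class_rw_t() { delete ext(); }

    // Like ext(): nullptr until an extension has been allocated.
    class_rw_ext_t *ext() const {
        if (ro_or_rw_ext_ & 1)
            return reinterpret_cast<class_rw_ext_t *>(ro_or_rw_ext_ & ~std::uintptr_t(1));
        return nullptr;
    }

    // ro is reachable either directly or through rwe.
    const class_ro_t *ro() const {
        if (auto *rwe = ext()) return rwe->ro;
        return reinterpret_cast<const class_ro_t *>(ro_or_rw_ext_);
    }

    // Like extAllocIfNeeded(): reuse an existing rwe or create one.
    class_rw_ext_t *extAllocIfNeeded() {
        if (auto *rwe = ext()) return rwe;
        return extAlloc(ro());
    }

private:
    // Like extAlloc(): take over ro's lists, then retag the word to point at rwe.
    class_rw_ext_t *extAlloc(const class_ro_t *ro) {
        auto *rwe = new class_rw_ext_t;
        rwe->ro = ro;
        rwe->methods = ro->baseMethods;
        ro_or_rw_ext_ = reinterpret_cast<std::uintptr_t>(rwe) | 1;
        return rwe;
    }

    std::uintptr_t ro_or_rw_ext_;
};
```

Calling extAllocIfNeeded a second time returns the same rwe, which is why every runtime modification (category attachment, class_addMethod and so on) can call it unconditionally.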
A global search for extAllocIfNeeded finds 8 call sites. They are in demangledName, class_setVersion, addMethods_finish, class_addProtocol, _class_addProperty, objc_duplicateClass, and attachCategories. This matches the WWDC 2020 description of ro, rw, and rwe: rwe is created only when a class is modified at runtime, for example when methods, properties, or protocols are added to it, including through categories. The attachCategories function adds the methods, properties, and protocols of a class's categories to that class. The search results are as follows:
attachCategories is called from attachToClass, and attachToClass is called from methodizeClass. Besides the call inside the if (previously) block, methodizeClass also calls attachToClass directly in the code that follows. Because previously is nil throughout this call chain, the code in the if (previously) block is never executed. The code of attachToClass is as follows:
Apart from the call above, attachCategories is called only from load_categories_nolock. To see when attachCategories is called, add the following code before the attachCategories call in load_categories_nolock, as shown in the picture below:
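attachToClass works against a table of categories that are waiting for their class to be realized. The following C++ sketch models that table; all type and member names are hypothetical. Pending categories are stored per class, and attachToClass hands all of them to attachCategories in a single call before erasing the entry. The ATTACH_CLASS_AND_METACLASS handling in objc4 is left out.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

struct Category { std::string name; };

struct Class {
    std::string name;
    std::vector<std::string> attachedBatches;  // one entry per attachCategories call
};

// Hypothetical model of the "unattached categories" table.
class UnattachedCategories {
public:
    void addForClass(const Category &cat, Class *cls) { map_[cls].push_back(cat); }

    // Attach every pending category of cls in one batch, then forget them.
    void attachToClass(Class *cls) {
        auto it = map_.find(cls);
        if (it == map_.end()) return;
        attachCategories(cls, it->second);
        map_.erase(it);
    }

    bool hasPending(Class *cls) const { return map_.count(cls) != 0; }

private:
    // Records which categories arrived together in a single call.
    static void attachCategories(Class *cls, const std::vector<Category> &cats) {
        std::string batch;
        for (const auto &c : cats) batch += (batch.empty() ? "" : ",") + c.name;
        cls->attachedBatches.push_back(batch);
    }

    std::unordered_map<Class *, std::vector<Category>> map_;
};
```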
But first let’s take a look at what attachCategories does. The code looks like this:
The code logic in this function is as follows:
1. Allocate three pointer arrays of size 64: mlists, proplists, and protolists. They hold method_list_t *, property_list_t *, and protocol_list_t * pointers respectively.
2. Initialize three counters, mcount, propcount, and protocount. Each records how many pointers have been stored in the corresponding array.
3. Iterate over the class's categories, getting the method list, property list, and protocol list of each category.
4. For each list pointer that is not null, check whether the corresponding array is already full, that is, whether its counter has reached 64. If so, go to step 5. Otherwise, go to step 6.
5. Call attachLists on the matching member of rwe (methods, properties, or protocols) to flush the full array, and reset that array's counter to 0. Then go to step 6.
6. Store the list in the corresponding array, filling slots from the end of the array toward the front, and increment the counter.
7. If mcount is greater than zero, call prepareMethodLists. It fixes up the selector of every method in the collected method lists and sorts each list by selector address. Then call attachLists to attach all of the categories' method lists to the class's method list.
8. Call attachLists to attach all of the categories' property lists to the class's property list.
9. Call attachLists to attach all of the categories' protocol lists to the class's protocol list.
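Steps 1 to 6 can be sketched in C++ as follows, for method lists only. The sketch simplifies the runtime in several ways:
- A method list is reduced to a name.
- attachLists simply prepends a batch to the class's list array.
- BufSize stands in for the 64-entry buffer. It is a template parameter here only so that the flush in step 5 can be exercised with a small value.
- prepareMethodLists is omitted.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using MethodList = std::string;  // a method list reduced to a label

struct ClassLists {
    std::vector<MethodList> lists;  // front = searched first
    // Prepend a batch, as attachLists does for the array case.
    void attachLists(const MethodList *added, std::uint32_t count) {
        lists.insert(lists.begin(), added, added + count);
    }
};

template <std::uint32_t BufSize = 64>
void attachCategories(ClassLists &rwe, const std::vector<MethodList> &cats) {
    std::array<MethodList, BufSize> mlists;
    std::uint32_t mcount = 0;
    for (const MethodList &mlist : cats) {
        if (mcount == BufSize) {             // buffer full: flush it first
            rwe.attachLists(mlists.data(), mcount);
            mcount = 0;
        }
        mlists[BufSize - ++mcount] = mlist;  // fill from the back toward the front
    }
    if (mcount > 0)                          // attach the remainder in one batch
        rwe.attachLists(mlists.data() + BufSize - mcount, mcount);
}
```

Because the buffer is filled from the back, the batch handed to attachLists is in reverse category order, so the category processed last ends up in front. The flush path preserves the same overall order.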
That is the logic of attachCategories. It does not yet show how the categories' method, property, and protocol lists are actually attached to the corresponding lists of the class; that work is done by the attachLists function.
2. Category loading modes
Next, we verify under which combinations of class and category loading modes the attachCategories function is called. Write the verification code and set a breakpoint inside attachCategories, as shown below:
2.1 Lazily loaded main class and lazily loaded categories
2.1.1 A single lazily loaded category
Comment out the load method in the Person class, then add code and a breakpoint to the realizeClassWithoutSwift function, as shown below:
Compile and run the program. It stops at the breakpoint while the Person class is being lazily loaded. Print all the methods in the Person class:
At this point, attachCategories has not been called, yet the category's methods are already in the main class's method list, that is, in ro's baseMethodList. The class's rw_ext is nil. Clear the output and continue past the breakpoint: the program never executes attachCategories.
Conclusion: When both the main class and its category are lazily loaded, attachCategories is not called to attach the category's methods to the main class. The compiler does this instead. The main class's method list is one-dimensional, and the category's methods are placed before the main class's methods.
After the category data is loaded, the memory layout of the main class's method list is as follows:
2.1.2 Multiple lazily loaded categories
This raises a question: what happens when the main class has several categories? Add the following categories to the project and import them in main.m:
// Person+CB.h
#import "Person.h"

@interface Person (CB) <NSObject>
- (void)say;
- (void)cb_say;
- (void)cb_eat;
+ (void)cb_class_method;
@end

// Person+CB.m
#import "Person+CB.h"

@implementation Person (CB)
- (void)say { NSLog(@"%s -- CB", __func__); }
- (void)cb_say { NSLog(@"%s", __func__); }
- (void)cb_eat { NSLog(@"%s", __func__); }
+ (void)cb_class_method { NSLog(@"%s", __func__); }
@end

// Person+CC.h
#import "Person.h"

@interface Person (CC) <NSObject>
- (void)cc_say;
- (void)cc_eat;
+ (void)cc_class_method;
@end

// Person+CC.m
#import "Person+CC.h"

@implementation Person (CC)
- (void)cc_say { NSLog(@"%s", __func__); }
- (void)cc_eat { NSLog(@"%s", __func__); }
+ (void)cc_class_method { NSLog(@"%s", __func__); }
@end

// Person+CD.h
#import "Person.h"

@interface Person (CD) <NSObject>
- (void)say;
- (void)cd_say;
- (void)cd_eat;
+ (void)cd_class_method;
@end

// Person+CD.m
#import "Person+CD.h"

@implementation Person (CD)
- (void)say { NSLog(@"%s -- CD category", __func__); }
- (void)cd_say { NSLog(@"%s", __func__); }
- (void)cd_eat { NSLog(@"%s", __func__); }
+ (void)cd_class_method { NSLog(@"%s", __func__); }
@end

// main.m
#import <Foundation/Foundation.h>
#import "Person.h"
#import "Person+CA.h"
#import "Person+CB.h"
#import "Person+CD.h"
#import "Person+CC.h"

int main(int argc, const char * argv[]) {
    Person *p = [[Person alloc] init];
    [p say];
    return 0;
}
Note that the CB and CD categories both declare a say method with the same name as the main class's say method, and say is called in main. Set the compile order of these categories as follows:
Compile and run the program. The main class still goes through the lazy loading path. Print the method_t entries in the main class's method list, as shown in the following figure:
The order of methods in the main class's method list depends on the compile order of the categories. As each category is compiled, all of its methods are inserted at the front of the main class's method list, as shown in the diagram below:
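The ordering described above can be modeled in a few lines of C++. This sketches the observed result, not the compiler's actual merging code. Each category's methods are inserted in front of everything merged so far, so the category compiled last ends up first. Method, mergeInCompileOrder and ownerOfFirst are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct Method {
    std::string name;   // selector name
    std::string owner;  // "Person" or the category name
};

// Insert each category's methods at the front, in compile order.
std::vector<Method> mergeInCompileOrder(
        std::vector<Method> classMethods,
        const std::vector<std::vector<Method>> &categoriesInCompileOrder) {
    std::vector<Method> merged = std::move(classMethods);
    for (const auto &cat : categoriesInCompileOrder)
        merged.insert(merged.begin(), cat.begin(), cat.end());
    return merged;
}

// The front-most entry with a matching name wins.
std::string ownerOfFirst(const std::vector<Method> &list, const std::string &sel) {
    for (const auto &m : list)
        if (m.name == sel) return m.owner;
    return "";
}
```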
Clear the console output, continue running the program, and look at the output, as shown in the figure below:
Although we called say expecting the Person class's implementation, the output shows that the say method from the CB category was executed. We already saw why when studying the slow path of method lookup, namely the binary search algorithm, as shown in the figure below:
The key logic is the highlighted code. When the binary search finds a method with the requested selector, it keeps stepping backward (toward the start of the list) while the previous entry has the same selector. It therefore returns the first occurrence, which belongs to the category compiled last. In the example above, both the CD and CB categories define a say instance method with the same name as the main class's. The say method in CB is called because it sits at the front of the main class's method list. The underlying reason is that the CB category is compiled after the CD category.
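The rewind step is what makes category overrides work. The C++ sketch below is modeled on objc4's findMethodInSortedMethodList, with method_t reduced to a selector value and a label. It assumes, as the runtime arranges, that the list is sorted by selector address and that duplicates of a selector keep their original relative order (category first).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified method_t: the selector is represented by its address value.
struct Method {
    std::uintptr_t sel;
    std::string imp;  // stands in for the implementation pointer
};

// Binary search over a list sorted by selector value, then rewind to the
// first entry with the same selector so an earlier category method wins.
const Method *findMethodInSortedList(const std::vector<Method> &list, std::uintptr_t key) {
    const Method *first = list.data();
    const Method *base = first;
    for (std::size_t count = list.size(); count != 0; count >>= 1) {
        const Method *probe = base + (count >> 1);
        if (key == probe->sel) {
            while (probe > first && (probe - 1)->sel == key) --probe;  // rewind
            return probe;
        }
        if (key > probe->sel) {  // continue in the upper half
            base = probe + 1;
            count--;
        }
    }
    return nullptr;
}
```

With three entries for the same selector (CB, CD, Person), the search may land on any of them, but the rewind always returns the CB entry.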
2.2 Non-lazily loaded main class and lazily loaded categories
2.2.1 A single lazily loaded category
First, add the load class method to the main class and keep only the CA category. Compile and run the code. When the program stops at the breakpoint, look at the function call stack, as shown below:
Print out the list of methods in the main class, as shown in the figure below:
2.2.2 Multiple lazily loaded categories
Continue running the program: attachCategories is still not invoked, just as in 2.1. Next, try multiple categories. Add the CB, CC, and CD categories with the compile order shown below:
Compile and run the program to the breakpoint, look at the function call stack, and print the order of the methods in the Person class's method list, as shown in the figure below:
The schematic diagram is as follows:
Continue executing the program, and the output of the console is as shown in the figure below:
2.3 Non-lazily loaded main class and non-lazily loaded categories
2.3.1 A single non-lazily loaded category
First, add the load class method to the main class, keep only the CA category, and add the load class method to the CA category as well. Call the class method in main and set the following breakpoint:
Compile and run the code. When the program stops at the breakpoint, look at the function call stack, as shown below:
Print out the list of methods in the main class, as shown in the figure below:
The main class's method list contains only its own instance methods. Continue past the breakpoint, and the breakpoint in the following function is hit:
The function call stack looks like this:
So when the load class method is implemented in the category as well, the category is loaded when the runtime's load_images function is called. Execution reaches attachCategories, where the rwe of the Person class is initialized, and then attachLists is called to attach the CA category's methods to the main class's method list. The code looks like this:
attachLists handles three cases. Method lists, property lists, and protocol lists are all attached through this function, so we use method lists as the example when analyzing it.
- The first case, 0 lists -> 1 list: rwe's methods (of C++ class method_array_t) holds no list yet, and exactly one list is being attached (!list && addedCount == 1). The list's address is stored directly in the list member (a pointer). list then simply points to a one-dimensional list of method_t.
- The second case, 1 list -> many lists: methods holds a single list, so newCount = oldCount (1) + addedCount. An array_t with room for newCount method_list_t * pointers is allocated. The old list pointer is stored in the last slot, and the newly added list pointers fill the slots in front of it.
- The third case, many lists -> many lists: methods already holds an array of lists, so newCount = oldCount + addedCount. A new array_t with room for newCount method_list_t * pointers is allocated. The old list pointers are moved to the end, and the new list pointers are stored at the front.
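The three branches can be modeled in C++ as below. ListArray is a hypothetical simplification of list_array_tt. A single word holds either a List * or, with bit 0 set, a pointer to an array_t of List *; this assumes list pointers are at least 2-byte aligned. The array is kept in a std::vector instead of objc4's count plus trailing C array. The lists themselves are not owned.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

template <typename List>
class ListArray {
    struct array_t {
        std::vector<List *> lists;  // objc4: uint32_t count + trailing array
    };
    std::uintptr_t word_ = 0;       // models the list / arrayAndFlag union

    bool hasArray() const { return word_ & 1; }
    array_t *array() const { return reinterpret_cast<array_t *>(word_ & ~std::uintptr_t(1)); }
    List *list() const { return reinterpret_cast<List *>(word_); }
    void setArray(array_t *a) { word_ = reinterpret_cast<std::uintptr_t>(a) | 1; }

public:
    ListArray() = default;
    ListArray(const ListArray &) = delete;
    ListArray &operator=(const ListArray &) = delete;
    ~ListArray() { if (hasArray()) delete array(); }

    void attachLists(List *const *addedLists, std::uint32_t addedCount) {
        if (addedCount == 0) return;
        if (hasArray()) {
            // Case 3, many -> many: new lists in front, old ones moved to the back.
            array_t *oldArray = array();
            auto *newArray = new array_t;
            newArray->lists.assign(addedLists, addedLists + addedCount);
            newArray->lists.insert(newArray->lists.end(),
                                   oldArray->lists.begin(), oldArray->lists.end());
            delete oldArray;
            setArray(newArray);
        } else if (!list() && addedCount == 1) {
            // Case 1, 0 -> 1: store the single list pointer directly.
            word_ = reinterpret_cast<std::uintptr_t>(addedLists[0]);
        } else {
            // Case 2, 1 -> many: allocate an array, old list in the last slot.
            List *oldList = list();
            auto *newArray = new array_t;
            newArray->lists.assign(addedLists, addedLists + addedCount);
            if (oldList) newArray->lists.push_back(oldList);
            setArray(newArray);
        }
    }

    bool isArray() const { return hasArray(); }

    // Lists in search order, front first.
    std::vector<List *> lists() const {
        if (hasArray()) return array()->lists;
        if (list()) return {list()};
        return {};
    }
};
```

Attaching the class's own list first exercises the first branch. Attaching CA next exercises the second branch and gives [CA, Person]. A further batch exercises the third branch, with the new lists placed in front.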
First, before rwe is initialized, look at the address of ro in the Person class and the value of its baseMethodList member. Since baseMethodList is the start address of Person's method list, we can dump that memory with an LLDB memory-read command to see its contents:
The raw memory can be hard to interpret. By matching Person's method list data against the memory dump, I drew the memory layout of the Person class's method list, as shown below:
From this analysis, a method_list_t * is a pointer to the start of a contiguous block of memory: a small header followed by multiple method_t structures.
Now continue past the breakpoint. When attachLists runs, the code in the second branch is executed:
After this code runs, the class's method list array looks like this:
Notes:
- The method_list_t structure derives from the entsize_list_tt template, as do ivar_list_t and property_list_t; protocol_list_t is a separate struct type. It is an array-like storage structure: a header followed by contiguous elements, here method_t structs. The baseMethodList, ivars, baseProperties, and baseProtocols members of class_ro_t are pointers to these types.
- method_array_t, property_array_t, and protocol_array_t are C++ classes derived from list_array_tt. list_array_tt contains a union with two members: list, which points to a single list, and arrayAndFlag, a uintptr_t that holds a pointer to an array_t of list pointers with its low bit used as a flag. These three classes are the types of the methods, properties, and protocols members of class_rw_ext_t, respectively.
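The entsize_list_tt layout from the first note can be sketched as follows. EntsizeList and kFlagMask are hypothetical. objc4 defines a per-type flag mask as a template parameter, so the two-bit mask here is only illustrative. The point is that the element size is read from the header, and element i is located entsize * i bytes after the header rather than by ordinary array indexing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::uint32_t kFlagMask = 0x3;  // illustrative flag bits

struct EntsizeListHeader {
    std::uint32_t entsizeAndFlags;  // element size with flags in the low bits
    std::uint32_t count;
};

class EntsizeList {
public:
    // Lay out a header followed by `count` elements of `entsize` bytes each.
    EntsizeList(std::uint32_t entsize, std::uint32_t flags, const void *elems, std::uint32_t count)
        : storage_(sizeof(EntsizeListHeader) + std::size_t(entsize) * count) {
        assert((entsize & kFlagMask) == 0 && (flags & ~kFlagMask) == 0);
        EntsizeListHeader h{entsize | flags, count};
        std::memcpy(storage_.data(), &h, sizeof h);
        std::memcpy(storage_.data() + sizeof h, elems, std::size_t(entsize) * count);
    }

    std::uint32_t entsize() const { return header().entsizeAndFlags & ~kFlagMask; }
    std::uint32_t flags() const { return header().entsizeAndFlags & kFlagMask; }
    std::uint32_t count() const { return header().count; }

    // Element i starts entsize * i bytes after the header.
    const unsigned char *get(std::uint32_t i) const {
        assert(i < count());
        return storage_.data() + sizeof(EntsizeListHeader) + std::size_t(entsize()) * i;
    }

private:
    EntsizeListHeader header() const {
        EntsizeListHeader h;
        std::memcpy(&h, storage_.data(), sizeof h);
        return h;
    }
    std::vector<unsigned char> storage_;
};
```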
2.3.2 Multiple non-lazily loaded categories
Add the CB, CC, and CD categories to the project, with the compile order shown in the figure below:
Add the load class method to each of these categories. Compile and run the program to the following breakpoint, view the function call stack, and print the data in the class's method list, as shown below:
As you can see, the category data has not been merged into the main class at compile time.
Continue past the breakpoint, and the program stops in the load_categories_nolock function. The function call stack looks like this:
Continue past the breakpoint again, and the program stops in attachCategories. Print the value of mcount, as shown below:
mcount is 1, meaning one category is being attached. Print that category's information:
So the CB category's data is loaded first. When attachLists runs, the code in the second branch is executed:
The CB method list now precedes the main class's method list. Continue past the breakpoint: load_categories_nolock is hit again, and then attachCategories. Print the value of mcount, as well as information about the category currently being loaded, as shown below:
The second category loaded is CC. When attachLists runs, the code in the third branch is executed:
Continue past the breakpoint to reach attachCategories again, and print the value of mcount, as well as information about the category currently being loaded, as shown below:
Next, the CD category's data is loaded. When attachLists runs, the code in the third branch is executed:
Reach attachCategories once more and print the value of mcount, as well as information about the category currently being loaded, as shown below:
Finally, the CA category's data is loaded, and when attachLists runs, the code in the third branch is executed:
Why are the categories loaded one at a time like this? load_categories_nolock walks the category list, which contains all of Person's categories, and calls attachCategories for each category individually.
After all the category data has been loaded, we print the address of the method list array and the data of the last method list in it, as shown in the figure below:
Based on the data above, the class's method list array is structured as follows:
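The branch in load_categories_nolock that produces this one-at-a-time behavior can be sketched in C++ as below. Class, Category, Unattached and loadCategories are hypothetical, and a method list is reduced to a name. If the category's class is already realized, the category is attached immediately as a batch of one, inserted in front. Otherwise it is parked until the class is realized.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

struct Class {
    std::string name;
    bool realized;
    std::vector<std::string> methodLists;  // front = searched first
};

struct Category {
    std::string name;
    Class *cls;
};

// Categories whose class is not realized yet, grouped by class.
using Unattached = std::unordered_map<Class *, std::vector<std::string>>;

void loadCategories(const std::vector<Category> &catlist, Unattached &pending) {
    for (const Category &cat : catlist) {
        if (cat.cls->realized) {
            // Like attachCategories(cls, &lc, 1, ...): a batch of exactly one,
            // prepended in front of everything attached earlier.
            cat.cls->methodLists.insert(cat.cls->methodLists.begin(), cat.name);
        } else {
            pending[cat.cls].push_back(cat.name);
        }
    }
}
```

Feeding it the load order observed above (CB, CC, CD, CA) for an already realized Person yields CA, CD, CC, CB in front of the class's own list.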
2.4 Lazily loaded main class and non-lazily loaded categories
2.4.1 A single non-lazily loaded category
First, remove the load class method from the main class, keep only the CA category, and add the load class method to CA. Compile and run the program. It stops at the breakpoint, and the function call stack looks like this:
Although the main class does not implement the load class method, the category does, so the main class is still loaded non-lazily. Print the methods in the main class's method list, as shown below:
The compiler has already merged the category's data into the main class, and after continuing past the breakpoint, attachCategories is never executed.
2.4.2 Multiple non-lazily loaded categories
Add the CB, CC, and CD categories to the project, and add the load class method only in CA and CB, as shown in the figure below:
Run the program to the breakpoint. The function call stack is as follows:
Print the method list data in the class, as shown in the figure below:
In this case, the category data has not been merged into the main class. Continue past the breakpoint, and the program stops in the attachToClass function, as shown below:
Continue again, and the program enters attachCategories. Print the value of mcount along with the category information, as shown below:
This time all four categories are attached in a single call, even though CC and CD do not implement the load method. Continue past the breakpoint; attachLists executes the code in the second branch:
Following this branch's logic, the method lists in the class's method list array should then be, in order, those of CD, CC, CA, CB, and finally the main class, as shown in the figure below:
Based on the data above, the class's method list array is structured as follows:
3. Summary
Based on the experiments above, categories are loaded in five different ways:
- Main class lazily loaded and all categories lazily loaded: the class data is loaded when the class receives its first message, and by then all category data has already been merged into the class. The method lists, property lists, and protocol lists of the class and its categories are stored in ro as one-level pointers (a single list each). (objc_msgSend->lookUpImpOrForward->realizeClassWithoutSwift)
- Main class lazily loaded with exactly one non-lazily loaded category: the main class is loaded as a non-lazy class, and all category data has already been merged into it. The lists are stored in ro as one-level pointers. (map_images->map_images_nolock->_read_images->realizeClassWithoutSwift)
- Main class lazily loaded with multiple non-lazily loaded categories: the main class is loaded as a non-lazy class, and attachCategories attaches all category data to it at once. The method lists, property lists, and protocol lists of the class and its categories are stored in rwe as two-level pointers (arrays of list pointers). (Function call stack: load_images->prepare_load_methods->realizeClassWithoutSwift->methodizeClass->attachToClass->attachCategories)
- Main class non-lazily loaded and all categories lazily loaded: this behaves the same as case 2. The method lists, property lists, and protocol lists of the class and its categories are stored in ro as one-level pointers. (map_images->map_images_nolock->_read_images->realizeClassWithoutSwift)
- Main class non-lazily loaded with at least one non-lazily loaded category: in addition to realizeClassWithoutSwift, attachCategories is called to attach the category data one category at a time, in compile order. The method lists, property lists, and protocol lists of the class and its categories are stored in rwe as two-level pointers. (Function call stack: load_images->loadAllCategories->load_categories_nolock->attachCategories)
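The five cases can be condensed into a small C++ decision function. It encodes only the observations made in this article for the runtime version tested, not any runtime API. classify, Storage and Attach are hypothetical names.

```cpp
#include <cassert>

enum class Storage { RoOneLevel, RweTwoLevel };    // where the category lists end up
enum class Attach { None, AllAtOnce, OneByOne };   // how attachCategories is used

struct LoadOutcome {
    Storage storage;
    Attach attach;
};

// classImplementsLoad: does the main class implement +load?
// categoriesImplementingLoad: how many of its categories implement +load?
LoadOutcome classify(bool classImplementsLoad, int categoriesImplementingLoad) {
    assert(categoriesImplementingLoad >= 0);
    if (categoriesImplementingLoad == 0)                           // cases 1 and 4
        return {Storage::RoOneLevel, Attach::None};
    if (!classImplementsLoad && categoriesImplementingLoad == 1)   // case 2
        return {Storage::RoOneLevel, Attach::None};
    if (!classImplementsLoad)                                      // case 3
        return {Storage::RweTwoLevel, Attach::AllAtOnce};
    return {Storage::RweTwoLevel, Attach::OneByOne};               // case 5
}
```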