Building on the previous summary of the low-level implementation of alloc, which also ran some simple tests against the source code, this article explores and records how OC classes are implemented underneath. (The full article is long, about 6728 words.)

Key points of this paper:

  1. The underlying types of NSObject and Class, and their inheritance relationship
  2. objc_class data structure analysis
  3. isa data structure and function analysis
  4. superclass API analysis
  5. cache_t data structure analysis
  6. bits data structure analysis
Resource preparation:
  1. objc4-818 source code

  2. The code used throughout the dynamic analysis:

    // UIPerson
    #import <Foundation/Foundation.h>

    @interface UIPerson : NSObject
    - (void)say3;
    @property(atomic, copy) NSString *name;
    @property(nonatomic, copy) NSString *age;
    @end

    // Class extension: say2 is declared internally
    @interface UIPerson ()
    - (void)say2;
    @end

    @implementation UIPerson
    - (void)say1 { NSLog(@"say1"); } // not declared anywhere
    - (void)say2 { NSLog(@"say2"); } // internal declaration (class extension)
    - (void)say3 { NSLog(@"say3"); } // external declaration (public interface)
    @end

    // main.m
    int main(int argc, const char * argv[]) {
        @autoreleasepool {
            // insert code here...
            NSLog(@"I'm coming!");
            Class cls = [UIPerson class];
        }
        return 0;
    }

The data type of Class

Classes are everywhere in OC, and almost all of them ultimately inherit from NSObject. So what exactly are NSObject and Class underneath? Start by rewriting the prepared code with clang:

clang -rewrite-objc main.m -o outmain.cpp

After running it, searching the generated outmain.cpp shows that NSObject is an alias for struct objc_object and Class is a pointer to struct objc_class. Other types turn up as well, such as id.

The rewritten code also gives a first idea of why id can hold any object: id is simply a pointer to struct objc_object.
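A minimal C++ sketch of why id can hold any object. The type names below (objc_object_sketch, UIPerson_IMPL and so on) are simplified stand-ins modeled on the rewritten output, not the runtime's real definitions; the only assumption is that every object struct begins with its superclass's fields, so isa always sits at offset 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified stand-ins for the rewritten types (hypothetical names).
struct objc_class_sketch;                     // opaque in this sketch
typedef objc_class_sketch *Class;

struct objc_object_sketch { Class isa; };     // every object starts with isa
typedef objc_object_sketch *id;               // id: a pointer to objc_object

// The rewriter nests the superclass's ivars first (it names that struct
// NSObject_IMPL); objc_object_sketch plays that role here.
struct UIPerson_IMPL {
    objc_object_sketch NSObject_IVARS;        // inherited part, so isa is at offset 0
    void *_name;                              // NSString * in the rewritten code
    void *_age;
};

static_assert(offsetof(UIPerson_IMPL, NSObject_IVARS) == 0,
              "superclass ivars must come first");

// Reading isa through an id works for any object laid out this way.
inline Class isa_of(id obj) { return obj->isa; }
```

Because the first member of every object struct is ultimately the isa, casting any object pointer to id and reading isa gives the same value.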

Now that we know a class is an objc_class underneath, we can search the source for objc_class to see its internal data structure. It is defined in objc-runtime-new.h as struct objc_class : objc_object.

With the underlying types of NSObject and Class known, the classic isa diagram from the official documentation comes to mind: the inheritance between the underlying structs, their members, and the isa inherited from the parent struct are all tied to how OC classes inherit from one another.


Isa pointer

1) ISA diagram analysis

We start with the class's isa. When we first explored the underlying type of a class, we saw that the isa in objc_class is inherited from its parent (root) struct objc_object, so the natural starting point is the isa diagram in the official documentation.

This figure suggests that, at the bottom layer, there is no real difference between instance methods and class methods: they are all instance methods of some object. What differs is the level that the isa lookup reaches: the class object for instance methods, the metaclass object for class methods.

2) Understand ISA through the API

The most direct use of isa is the [XXX class] method: for an instance, it finds the class object through the instance's isa, and following isa further leads from the class to the metaclass and then to the root metaclass. How is the class object found from an instance through isa? The source code shows it.

Jumping to the implementation of [xx class] in NSObject.mm shows that a class returns itself and an instance finds its class through isa. Static analysis makes the answer look simple, but that is not the whole story: running the code from the resource preparation under dynamic debugging and reading the assembly shows that LLVM has actually turned the call to [xx class] into objc_opt_class.

objc_opt_class follows the path obj -> objc_object::getIsa() -> isa.bits & ISA_MASK. If obj is a tagged pointer, shift and & operations are used instead to recover the class.

bits & ISA_MASK can be printed and verified with the p and x/4gx commands. For Class cls = [UIPerson class];, the uintptr_t value of its metaclass is computed and verified in Figure 4: first po confirms that it is indeed UIPerson, then the uintptr_t is converted back to a memory address, the first address of UIPerson's class is printed, and its isa is compared with the result computed in Figure 3.
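The masking step can be reproduced outside the runtime as a small C++ sketch. It assumes the x86_64 value of ISA_MASK from objc4-818's isa.h (0x00007ffffffffff8); arm64 uses a different mask, and any class address used with it here is made up.

```cpp
#include <cassert>
#include <cstdint>

// Assumed x86_64 value of ISA_MASK in objc4-818 (isa.h); arm64 differs.
constexpr uintptr_t kIsaMaskX86_64 = 0x00007ffffffffff8ULL;

// Mirrors the idea of objc_object::ISA(): class address = isa.bits & ISA_MASK.
// All flag bits outside the shiftcls range are cleared by the mask.
inline uintptr_t class_address_from_isa(uintptr_t isaBits) {
    return isaBits & kIsaMaskX86_64;
}
```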

Note that if the class found through isa is a metaclass (that is, obj is itself a class object), objc_opt_class returns the caller obj itself.

This completes tracing the isa diagram from the API down to the underlying logic: the instance's isa & mask gives the first address of the class object, the class object's isa & mask gives the metaclass, and the metaclass's isa leads to the root metaclass.
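The isa walk can be modeled with plain structs. This is a toy C++ model, not runtime code: FakeObject and walk_isa are hypothetical, and each isa field stands for an isa value that has already been masked.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model: instances, classes and metaclasses all carry an isa pointer.
struct FakeObject {
    std::string name;
    FakeObject *isa = nullptr;
};

// Follow isa `steps` times, recording the name of every object visited.
inline std::vector<std::string> walk_isa(FakeObject *start, int steps) {
    std::vector<std::string> path{start->name};
    FakeObject *cur = start;
    for (int i = 0; i < steps; ++i) {
        cur = cur->isa;
        path.push_back(cur->name);
    }
    return path;
}
```

Walking four steps from an instance of UIPerson visits the instance, UIPerson, UIPerson's metaclass and the root metaclass, and then the root metaclass again, because its isa points to itself.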

Note one thing:

During analysis and verification, the console output showed that once the walk reached the class object, its isa value was already the address of its metaclass, with no masking needed, and the same held from the metaclass to the root metaclass. A possible reason is that a class object's isa does not need the optimized format to store extra information.

3) Understand superclass through the API

With the isa path explored, [XXX superclass] is much simpler: dynamic debugging directly shows which method is called underneath.

superclass is sent through objc_msgSend; setting breakpoints in NSObject.mm and stepping through shows the calls one by one.

So superclass comes in a class (+) and an instance (-) version: -superclass first gets the class through getIsa() and then reads its superclass from the objc_class struct, while +superclass reads the superclass directly from the class's own struct.
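A toy C++ model of the two superclass paths (FakeClass, FakeInstance and the helper functions are hypothetical). It also shows that walking superclass from a metaclass climbs the metaclass chain and, at the root metaclass, reaches NSObject itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model: a class object has an isa and a superclass; an instance only has an isa.
struct FakeClass {
    std::string name;
    FakeClass *isa = nullptr;
    FakeClass *superclass = nullptr;
};
struct FakeInstance { FakeClass *isa; };

// +superclass: read the field straight from the class struct.
inline FakeClass *class_superclass(const FakeClass *cls) { return cls->superclass; }

// -superclass: first get the class through isa ([self class]), then read its superclass.
inline FakeClass *instance_superclass(const FakeInstance *obj) {
    return obj->isa->superclass;
}

// Walk superclass links until nil, collecting class names.
inline std::vector<std::string> superclass_chain(const FakeClass *cls) {
    std::vector<std::string> names;
    for (const FakeClass *c = cls; c != nullptr; c = c->superclass) names.push_back(c->name);
    return names;
}
```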

4) isa_t data structure

As seen earlier, a class's isa comes from its parent struct objc_object, where it is declared with the type isa_t, a union, so its members share the same memory. Its bit fields also run in the opposite direction from the TaggedPointer flag bits, which are ordered from high to low. The layout is defined with macros in isa.h, and the field widths and meanings vary by architecture.

Then open isa.h and look at the macro definition of ISA_BITFIELD, where:

  • nonpointer: whether isa is an optimized (non-pointer) isa. 0 means a raw isa pointer; 1 means an optimized isa that, besides the class address, stores the platform-defined ISA_BITFIELD contents, such as the reference count and the C++ destructor flag, in bit fields.
  • has_assoc: associated-object flag. 0 means no associated objects, 1 means the object has associated objects.
  • has_cxx_dtor: whether the object has a C++ or ObjC destructor. If it does, the destructor logic must run on release; if not, that step is skipped and the object can be freed faster.
  • shiftcls: stores the value of the class pointer. With pointer optimization, 33 bits hold the class pointer on arm64 and 44 bits on x86_64.
  • magic: used by the debugger to tell whether the object is a real, initialized object or just uninitialized space.
  • weakly_referenced: whether the object is (or was) pointed to by an ARC weak variable. If it has weak references, they must be cleared when it is released; objects without weak references can be released faster.
  • has_sidetable_rc: set when the reference count no longer fits in extra_rc; the excess is then stored in the side table.
  • extra_rc: the reference count value of the object. The value is the real count minus 1; for example, if the object's reference count is 10, extra_rc is 9. When the count overflows extra_rc, has_sidetable_rc above comes into play.
  • isDeallocating: whether the object is being released. (This is a method in the struct, not a bit field.)
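To make the field list concrete, here is a C++ sketch that decodes an isa value with shifts and masks. The bit positions assume the x86_64 ISA_BITFIELD of objc4-818 (nonpointer:1, has_assoc:1, has_cxx_dtor:1, shiftcls:44, magic:6, weakly_referenced:1, unused:1, has_sidetable_rc:1, extra_rc:8), allocated from the least significant bit; arm64 uses different widths. Shifts are used instead of a union to avoid type punning.

```cpp
#include <cassert>
#include <cstdint>

// Assumed x86_64 field layout (objc4-818); arm64 uses shiftcls:33, extra_rc:19.
struct IsaFieldsX86_64 {
    uint64_t nonpointer, has_assoc, has_cxx_dtor, shiftcls, magic,
             weakly_referenced, unused, has_sidetable_rc, extra_rc;
};

// Extract `width` bits starting at bit `shift` (bit 0 = least significant).
constexpr uint64_t bits_at(uint64_t v, unsigned shift, unsigned width) {
    return (v >> shift) & ((uint64_t{1} << width) - 1);
}

inline IsaFieldsX86_64 decode_isa(uint64_t bits) {
    return IsaFieldsX86_64{
        bits_at(bits, 0, 1),     // nonpointer
        bits_at(bits, 1, 1),     // has_assoc
        bits_at(bits, 2, 1),     // has_cxx_dtor
        bits_at(bits, 3, 44),    // shiftcls: class address >> 3
        bits_at(bits, 47, 6),    // magic
        bits_at(bits, 53, 1),    // weakly_referenced
        bits_at(bits, 54, 1),    // unused
        bits_at(bits, 55, 1),    // has_sidetable_rc
        bits_at(bits, 56, 8),    // extra_rc
    };
}
```

Shifting shiftcls back left by 3 recovers the class address, which is the same result as masking with ISA_MASK.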

The isa is built, and its members are initialized and assigned, in objc_object::initIsa(). Recall the three steps found earlier when exploring the underlying implementation of alloc:

  1. Calculate object size
  2. Open up corresponding space
  3. Binding the isa

The third step, binding the class to the isa, is where initIsa() comes in.

5) API analysis of class object comparison

With the isa data structure, the diagram and the inheritance chain analyzed, we can look at two easily confused class-comparison APIs, isKindOfClass and isMemberOfClass. First, some code and its output:

 BOOL re1 = [(id)[NSObject class] isKindOfClass:[NSObject class]];       // YES
 BOOL re2 = [(id)[NSObject class] isMemberOfClass:[NSObject class]];     // NO
 BOOL re3 = [(id)[UIPerson class] isKindOfClass:[UIPerson class]];       // NO
 BOOL re4 = [(id)[UIPerson class] isMemberOfClass:[UIPerson class]];     // NO
 NSLog(@" re1: %hhd\n re2: %hhd\n re3: %hhd\n re4: %hhd\n", re1, re2, re3, re4);

 BOOL re5 = [(id)[NSObject alloc] isKindOfClass:[NSObject class]];       // YES
 BOOL re6 = [(id)[NSObject alloc] isMemberOfClass:[NSObject class]];     // YES
 BOOL re7 = [(id)[UIPerson alloc] isKindOfClass:[UIPerson class]];       // YES
 BOOL re8 = [(id)[UIPerson alloc] isMemberOfClass:[UIPerson class]];     // YES
 NSLog(@" re5: %hhd\n re6: %hhd\n re7: %hhd\n re8: %hhd\n", re5, re6, re7, re8);

1. Kind

NSObject.mm has two implementations of isKindOfClass, objc_opt_isKindOfClass and the isKindOfClass: methods, but their core logic is the same for loop:

   for (Class tcls = cls; tcls; tcls = tcls->getSuperclass()) {
       if (tcls == otherClass) return YES;
   }

The loop starts from the caller's isa (the class for an instance, the metaclass for a class object) and compares each class along the superclass chain with otherClass. The implementation logic of isKindOfClass can be summarized as:

obj->getIsa(), then ->getSuperclass() repeatedly, until a class == otherClass (YES) or the chain ends at nil (NO)

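re1 and re3 both call isKindOfClass on a class object, but only the NSObject case returns 1, because the superclass of NSObject's metaclass (the root metaclass) is NSObject itself. For UIPerson, the walk goes from UIPerson's metaclass to NSObject's metaclass, then NSObject, then nil, so it never meets UIPerson. re5 and re7 are called on instances, so the loop starts at the class itself and returns YES on the first comparison.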
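The core loop can be tried on a toy class graph. A C++ sketch (Node and is_kind_of are hypothetical names) that reproduces re1, re3, re5 and re7: the walk starts from the caller's isa and follows superclass links.

```cpp
#include <cassert>

// Toy node: instances, classes and metaclasses all have an isa;
// classes and metaclasses also have a superclass.
struct Node {
    const char *name;
    Node *isa;
    Node *superclass;
};

// Same shape as the loop in isKindOfClass / objc_opt_isKindOfClass.
inline bool is_kind_of(const Node *obj, const Node *otherClass) {
    for (const Node *tcls = obj->isa; tcls; tcls = tcls->superclass) {
        if (tcls == otherClass) return true;
    }
    return false;
}
```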


One point has not been investigated yet: when isKindOfClass is called, the assembly seen under dynamic debugging sometimes runs the + or - method in NSObject and sometimes is redirected to objc_opt_isKindOfClass(). The same happened earlier when exploring the API path of [XXX class].

2. Member

isMemberOfClass is simpler and compares only once. Jumping to its implementation in NSObject.mm shows both a + and a - method, with the same core logic:

self->ISA() == cls;

So [A isMemberOfClass:B] checks whether A->ISA() == B, that is, whether A's class (or, when A is a class object, its metaclass) is exactly B.

re2 and re4 therefore compare a class object with its own metaclass (self->ISA()), which is why both print 0.
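The same kind of toy graph reproduces re2, re4, re6 and re8 with a single comparison (Node and is_member_of are hypothetical names).

```cpp
#include <cassert>

struct Node {
    const char *name;
    Node *isa;
    Node *superclass;
};

// One comparison, no loop: is the object's isa exactly cls?
// For an instance that is its class; for a class object it is its metaclass.
inline bool is_member_of(const Node *obj, const Node *cls) {
    return obj->isa == cls;
}
```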


Third, cache_t cache

1) Cache_t data structure

What should a class cache? Methods? Properties? The data structure of cache_t gives some clues: a pointer, a union, some private methods and some public methods.

The member list alone does not say much, so dig one level deeper. Start by searching for _bucketsAndMaybeMask, which, as the comment on the method below notes, holds a pointer to bucket_t. What does bucket_t look like? Its definition shows that it stores a SEL and an IMP, with their order differing by architecture.

bucket_t is therefore the container that holds SEL/IMP pairs. Since bucket_t is a container, cache_t must provide methods to access it; the main ones found in the struct are insert(), buckets(), and the methods for allocating and emptying buckets.

2) Buckets’ get()

buckets() reads the address with _bucketsAndMaybeMask.load(memory_order_relaxed) and masks it. Since _bucketsAndMaybeMask is an atomic member of cache_t, there must be a load method for reading and a store method for writing, and the store is most likely reached from insert().

memory_order_relaxed is the multithreaded memory-ordering mode that guarantees only the atomicity of the current operation and imposes no synchronization between threads, so other threads may read either the new or the old value.

In the code, bucketsMask = ~0ul (0ul is an unsigned long 0, and ~ inverts every bit, giving 0xFFFF…), so the mask keeps every bit of the address.
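The read path can be approximated with std::atomic. This C++ sketch assumes the configuration described above, where bucketsMask is all ones; cache_t_sketch and bucket_t_sketch are hypothetical stand-ins, and the store is included only to pair with the load.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct bucket_t_sketch { uintptr_t sel; uintptr_t imp; };   // hypothetical

struct cache_t_sketch {
    std::atomic<uintptr_t> _bucketsAndMaybeMask{0};
    // ~0ul in the runtime: every bit set, so masking keeps the whole address.
    static constexpr uintptr_t bucketsMask = ~uintptr_t{0};

    bucket_t_sketch *buckets() const {
        // Relaxed: this read itself is atomic, but it orders nothing else,
        // so another thread may observe the old or the new value.
        uintptr_t addr = _bucketsAndMaybeMask.load(std::memory_order_relaxed);
        return reinterpret_cast<bucket_t_sketch *>(addr & bucketsMask);
    }

    // Matching store; the real runtime picks its ordering per architecture.
    void setBuckets(bucket_t_sketch *newBuckets) {
        _bucketsAndMaybeMask.store(reinterpret_cast<uintptr_t>(newBuckets),
                                   std::memory_order_relaxed);
    }
};
```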

3) Buckets’ Insert()

Next, analyze insert(), both to explore its internal structure and logic and to confirm that the store() just discussed is really in it. Jump into its implementation:

fastpath() appears when the condition is checked. It is a macro, defined as (__builtin_expect(bool(x), 1)), that tells the compiler the condition is very likely to be true. __builtin_expect was introduced by GCC so that programmers can tell the compiler which branch is most likely to run. Its form is __builtin_expect(EXP, N), meaning EXP == N with high probability.

As shown in the figure, insert() consists of two main parts, space management and inserting into the cache; the main logic of each is summarized below.
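The two hint macros can be used in ordinary C++ compiled with GCC or Clang. The macro definitions below have the same shape as the runtime's; the helper functions are hypothetical examples of where each hint would go.

```cpp
#include <cassert>

// __builtin_expect is a GCC/Clang builtin: it affects branch layout only,
// never the value of the condition.
#define fastpath(x) (__builtin_expect(bool(x), 1))
#define slowpath(x) (__builtin_expect(bool(x), 0))

inline int divide_or_zero(int a, int b) {
    if (slowpath(b == 0)) return 0;          // rare case, kept off the hot path
    return a / b;
}

inline int first_or_default(const int *p, int fallback) {
    if (fastpath(p != nullptr)) return *p;   // common case
    return fallback;
}
```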

1. Space management

  1. On the first insertion: newOccupied = 0 + 1, and capacity = oldCapacity = capacity() = 0.
  2. The cache is expanded when newOccupied + CACHE_END_MARKER (1) > cache_fill_ratio(capacity), which with the 3/4 fill ratio is equivalent to _occupied + 2 > (_maybeMask + 1) * 3 / 4.
  3. Apart from these conditions, the most important method is reallocate().
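A simplified simulation of the space-management branch. It assumes x86_64-style constants (INIT_CACHE_SIZE = 4, fill ratio 3/4, CACHE_END_MARKER = 1, MAX_CACHE_SIZE = 1 << 16) and ignores the arm64-only full-utilization case; CacheSizeModel is hypothetical. Note that reallocate() does not copy old entries over, so the occupied count restarts from the new entry after each expansion.

```cpp
#include <cassert>

// Assumed constants (x86_64 flavor of objc4-818).
constexpr unsigned kInitCacheSize = 4;
constexpr unsigned kMaxCacheSize = 1u << 16;
constexpr unsigned kCacheEndMarker = 1;

struct CacheSizeModel {
    unsigned capacity = 0;   // _maybeMask + 1
    unsigned occupied = 0;   // _occupied

    static unsigned fill_ratio(unsigned cap) { return cap * 3 / 4; }

    void insert() {
        unsigned newOccupied = occupied + 1;
        if (capacity == 0) {
            capacity = kInitCacheSize;                 // empty cache: first allocation
            occupied = 0;
        } else if (newOccupied + kCacheEndMarker <= fill_ratio(capacity)) {
            // still within 3/4: keep the current buckets
        } else {
            unsigned doubled = capacity * 2;
            capacity = doubled > kMaxCacheSize ? kMaxCacheSize : doubled;
            occupied = 0;                              // reallocate(): old entries dropped
        }
        ++occupied;                                    // the new entry is stored
    }
};
```

Under these assumptions the third insertion already triggers the first expansion, from 4 to 8 buckets, because 3 + 1 exceeds 4 * 3 / 4.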

Moving on to cache_t::reallocate(), its internal logic also does two things:

  1. Allocate newBuckets according to newCapacity, then call setBucketsAndMask() to set _bucketsAndMaybeMask, _maybeMask and _occupied;
  2. If the old buckets are to be freed, release oldBuckets.

2. Inserting into the cache

  1. Prepare to insert: get the start address b of the buckets, the mask m (capacity - 1), the hashed starting position begin, and the loop variable i, then start the do…while loop.
  2. The loop only does two things: if b[i] is empty, insert the entry; if b[i] already holds the same sel, it was already inserted and the function returns. The loop condition uses cache_next(), which, like cache_hash(), works on the mask m and moves i forward (+1) or backward (-1) depending on the architecture, probing b[i] until it comes back to begin.
  3. cache_hash() shows that entries are not inserted sequentially at 0, 1, 2, and so on.
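A C++ sketch of the probing loop with open addressing. Assumptions: the x86_64 variants, where cache_hash() is sel & mask and cache_next() moves to (i + 1) & mask; sel and imp are plain integers; the end marker in the last bucket is not modeled; Bucket and cache_insert are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Bucket { uintptr_t sel = 0; uintptr_t imp = 0; };

inline uint32_t cache_hash(uintptr_t sel, uint32_t mask) {
    return static_cast<uint32_t>(sel & mask);
}
inline uint32_t cache_next(uint32_t i, uint32_t mask) {
    return (i + 1) & mask;
}

// Returns the slot used for sel, or -1 if every slot was probed without success.
inline int cache_insert(std::vector<Bucket> &b, uintptr_t sel, uintptr_t imp) {
    uint32_t m = static_cast<uint32_t>(b.size() - 1);   // capacity must be a power of two
    uint32_t begin = cache_hash(sel, m);
    uint32_t i = begin;
    do {
        if (b[i].sel == 0) {                            // empty slot: insert here
            b[i] = Bucket{sel, imp};
            return static_cast<int>(i);
        }
        if (b[i].sel == sel) return static_cast<int>(i); // already cached
    } while ((i = cache_next(i, m)) != begin);
    return -1;
}
```

Two selectors whose low bits collide end up in neighboring slots rather than in insertion order.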

Why is sel converted to uintptr_t, and what is that type for?

uintptr_t first appeared in C99's <stdint.h> (C++ adopted it in C++11 through <cstdint>)

, where it is defined as an optional unsigned integer type. C does not allow bitwise operations directly on pointers, so to manipulate a pointer's bits you convert it to uintptr_t, operate on the integer, and convert it back. The type guarantees that any valid pointer to void can be converted to uintptr_t and back to a pointer to void, and the result compares equal to the original pointer.
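A small C++ illustration of the round trip through uintptr_t. Tagging the low bit (tag_pointer and untag_pointer are hypothetical helpers) assumes the object is at least 2-byte aligned; it is the same idea that lets the runtime pack extra bits next to an address.

```cpp
#include <cassert>
#include <cstdint>

// Bit operations are done on the integer, never on the pointer itself.
inline uintptr_t tag_pointer(void *p) {
    return reinterpret_cast<uintptr_t>(p) | 1u;           // set the low bit
}
inline void *untag_pointer(uintptr_t v) {
    return reinterpret_cast<void *>(v & ~uintptr_t{1});   // clear it again
}
```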

Continuing with the logic of bucket_t::set() in the loop:

The parameters b, sel, imp and cls are passed in first; together they are all the information the cache needs to store. Although the final store differs by architecture, the IMP is encoded the same way in both branches, via encodeImp(base, newImp, newSel, cls) in struct bucket_t. On __arm64__, LDP/STP is used to store SEL/IMP together; other architectures use atomic stores with explicit memory ordering such as memory_order_relaxed. The storage order matches the sel/imp order of struct bucket_t explored earlier.
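One illustrative C++ sketch of the encodeImp() idea, assuming the XOR variant (CACHE_IMP_ENCODING_ISA_XOR) that objc4-818 uses on some 64-bit targets without pointer authentication; builds with pointer authentication sign the IMP instead. The stored value is the IMP XORed with the class address, so decoding needs the same cls. This is a simplification: base and sel, unused by the XOR branch, are omitted.

```cpp
#include <cassert>
#include <cstdint>

inline uintptr_t encode_imp(uintptr_t imp, uintptr_t cls) {
    if (!imp) return 0;            // a null IMP is stored as 0
    return imp ^ cls;              // XOR with the class address
}

inline uintptr_t decode_imp(uintptr_t stored, uintptr_t cls) {
    if (!stored) return 0;
    return stored ^ cls;           // XOR again to undo the encoding
}
```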

4) _maybeMask changes

Note: setBucketsAndMask() stores newCapacity - 1 into _maybeMask, and capacity() adds the 1 back.

So why is _maybeMask = capacity - 1? One reason is that capacity is always a power of two, so capacity - 1 is an all-ones bit mask that cache_hash() can apply directly to sel. The other clue is in allocateBuckets(), called from reallocate(): it writes an end marker into the last bucket_t, whose contents depend on the architecture:

  • Arm: the marker's imp points to newBuckets - 1
  • Other: the marker's imp points to the first address of newBuckets


Four, class_data_bits_t bits

In objc_class, isa records the metaclass chain, superclass records the inheritance chain, cache is, as the name says, a cache, and bits stores the rest of the class's data. With the earlier members and data structures analyzed, we now start from bits to explore the class's data structure in more depth.

1) Method data structure

  1. Most methods of the class_data_bits_t struct are related to bits; most of them compute values from bits.
  2. Combined with objc_class, the key method of class_data_bits_t is data(), and its return type leads to the class_rw_t data structure.
  3. class_rw_t contains all the information about the class itself. The easiest parts to recognize are the methods(), properties() and protocols() methods; others, such as ro_or_rw_ext_t, class_ro_t * and ro, are lower level and are analyzed later.
  4. Starting from methods(), drill down through the method calls and their data types to find the list type that stores a class's methods: class method_array_t : public list_array_tt<method_t, method_list_t, method_list_t_authed_ptr> { ... }, which takes three template parameters. list_array_tt is already the base class, and reading its source did not make clear how the data is stored.
  5. Static analysis can only go so far, so combine it with dynamic analysis: in LLDB, follow bits.data(), found above, layer by layer and observe the structure directly.
  6. Before printing, work out the correct offset from objc_class: isa and superclass are both pointers of 8 bytes each, and cache_t cache contains a pointer and a union, 16 bytes in total. So to print a class's bits, offset 32 bytes from the first address of the class.
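The 32-byte offset can be double-checked with a simplified C++ layout. Assumptions: a 64-bit target, member types reduced to same-sized integers and pointers, and FAST_DATA_MASK = 0x00007ffffffffff8 as defined for __LP64__ in objc4-818; the *_layout structs and data_address are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct cache_t_layout {
    uintptr_t _bucketsAndMaybeMask;                 // 8 bytes
    union {                                         // 8 bytes
        struct { uint32_t _maybeMask; uint16_t _flags; uint16_t _occupied; } fields;
        void *_originalPreoptCache;
    } u;
};

struct class_data_bits_layout { uintptr_t bits; };

struct objc_class_layout {
    uintptr_t isa;                                  // from objc_object, 8 bytes
    objc_class_layout *superclass;                  // 8 bytes
    cache_t_layout cache;                           // 16 bytes
    class_data_bits_layout bits;                    // first address + 32
};

// Assumed FAST_DATA_MASK for __LP64__ (objc4-818).
constexpr uintptr_t kFastDataMask = 0x00007ffffffffff8UL;

// Like class_data_bits_t::data(): the class_rw_t address is the masked bits.
inline uintptr_t data_address(const class_data_bits_layout &b) {
    return b.bits & kFastDataMask;
}
```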

Although the offset calculated during static analysis was correct, the data structures themselves were far less transparent, and it took many attempts to print the desired results. Their relationship is now clear: method_array_t contains method_list_t, which in turn contains method_t.

The step-by-step test commands have been combined into one; to see each intermediate step, split the command apart. Here Class_Index stands for the first address of the class:

p ((class_rw_t *)((class_data_bits_t *)((uintptr_t)Class_Index + 32))->data())->methods().list.ptr->get(0).big()

2) Property and Protocol data structures

Class properties and protocols are retrieved from class_rw_t just like methods(); they are stored as property_array_t and protocol_array_t, the corresponding subclasses of list_array_tt. Since the first steps are the same, the difference lies in the basic element type each one stores (method_t in the case of methods). For properties, the element can be printed directly, so drop the big() call:

p ((class_rw_t *)((class_data_bits_t *)((uintptr_t)Class_Index + 32))->data())->properties().list.ptr->get(0)

Reading protocols differs from methods and properties. Reading ptr gives a protocol_list_t *, but printing it the same way as above fails with No member named 'get' in 'protocol_list_t': unlike method_list_t and property_list_t, protocol_list_t has no get() method.

After going through the source again, noticing the iterator-related methods and the list[0] member, and trying to print those parts, the contents of protocol_list_t come out:

Each element of list is a protocol_ref_t, which is a uintptr_t that actually holds a protocol_t *. Casting it to protocol_t * and dereferencing it prints a result consistent with the protocol_t structure.

The protocol_t structure inherits from objc_object and contains other data structures such as method_list_t and property_list_t, whose contents can be expanded in the same way as the methods and properties analyzed earlier.

The command to obtain the protocol_t is summarized as follows:

// protocol_t contents
p *(protocol_t *)((class_rw_t *)((class_data_bits_t *)((uintptr_t)Class_Index + 32))->data())->protocols().list.ptr->list[0]

// @required instance method
p (*(protocol_t *)((class_rw_t *)((class_data_bits_t *)((uintptr_t)Class_Index + 32))->data())->protocols().list.ptr->list[0]).instanceMethods->get(0).big()

// @optional instance method
p (*(protocol_t *)((class_rw_t *)((class_data_bits_t *)((uintptr_t)Class_Index + 32))->data())->protocols().list.ptr->list[0]).optionalInstanceMethods->get(0).big()

3) class_rw_t in bits

Before analyzing class_rw_t, two concepts need defining: dirty memory and clean memory. Dirty memory is memory that changes while the process runs; clean memory is memory that does not change after it is loaded.

Dirty memory is much more expensive than clean memory, especially on iOS: it must be kept as long as the process runs, whereas clean memory can be evicted and reloaded from disk whenever it is needed. Moreover, as soon as a class is used, its data structures become dirty (for example, a new cache is created and data is written), so the more data that stays clean, the better, and the more memory is left for the app's own data.

Next, analyze class_rw_t, mentioned in the method data structure, and its lower-level fields such as ro_or_rw_ext_t, class_ro_t * and ro. It all comes down to three parts: class_ro_t, class_rw_ext_t and class_rw_t.

  • class_ro_t holds the class's original, compile-time information, such as its name, methods, protocols and instance variables. The "ro" in the name means read-only ("rw" likewise means read-write), so once loaded it is clean memory.
  • class_rw_t is memory that the runtime allocates while the process runs to store read/write class data, i.e. dirty memory.
  • class_rw_ext_t holds the parts of class_rw_t, such as methods, properties and protocols, that about 90% of classes never modify at runtime. Splitting class_rw_t into these two parts keeps more data clean and saves memory, because class_rw_ext_t is only allocated when a class needs it.

In class_rw_t, get_ro_or_rwe() reads the atomic field ro_or_rw_ext, of type ro_or_rw_ext_t, which holds either a class_ro_t * or a class_rw_ext_t *. ro() checks which one it holds: for a class_rw_ext_t it returns class_rw_ext_t->ro, and for a class_ro_t it returns it directly.

The data structures recorded above can be verified with dynamic debugging by printing the properties() data from both ro and rw and comparing them. The combined test commands are:
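A simplified C++ sketch of the ro_or_rw_ext idea: one word that holds either a class_ro_t * or a class_rw_ext_t *, told apart by a tag in the lowest bit. The low-bit tag is an assumption about how the runtime's pointer union distinguishes the two cases, and the *_sketch structs are hypothetical, cut-down versions.

```cpp
#include <cassert>
#include <cstdint>

struct class_ro_sketch { const char *name; };
struct class_rw_ext_sketch { const class_ro_sketch *ro; };

// One word: a class_ro_sketch* (tag 0) or a class_rw_ext_sketch* (tag 1).
struct class_rw_sketch {
    uintptr_t ro_or_rw_ext = 0;

    bool has_ext() const { return (ro_or_rw_ext & 1) != 0; }

    // Like class_rw_t::ro(): go through the extension when there is one.
    const class_ro_sketch *ro() const {
        if (has_ext()) {
            auto *ext = reinterpret_cast<const class_rw_ext_sketch *>(
                ro_or_rw_ext & ~uintptr_t{1});
            return ext->ro;
        }
        return reinterpret_cast<const class_ro_sketch *>(ro_or_rw_ext);
    }

    void set_ro(const class_ro_sketch *r) { ro_or_rw_ext = reinterpret_cast<uintptr_t>(r); }
    void set_ext(class_rw_ext_sketch *e) { ro_or_rw_ext = reinterpret_cast<uintptr_t>(e) | 1; }
};
```

Either way, callers of ro() get the same read-only data; only classes that actually need the extension pay for the extra allocation.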

p ((class_rw_t *)((class_data_bits_t *)((uintptr_t)Class_Index + 32))->data())->ro()->baseProperties->get(0)

p ((class_rw_t *)((class_data_bits_t *)((uintptr_t)Class_Index + 32))->data())->properties().list.ptr->get(0)

The WWDC 2020 session Advancements in the Objective-C Runtime also covers changes to the Objective-C method list and tagged pointers.


Class attributes

1) Properties and member variables

Class member variables (ivars) are stored in class_ro_t, as the class_rw_t split-optimization image above shows. A property = ivar + setter (set()) + getter (get()).

2) Attribute modification

So how do get() and set() find the ivar?

Also, properties are usually declared with attribute keywords, and even when none are written, defaults such as strong and atomic apply. Do different keywords change how the ivar is found?

To answer both questions, here is a short piece of code, rewritten with clang for observation and comparison.

#import <Foundation/Foundation.h>

@interface UIPerson : NSObject
@property(nonatomic, copy)  NSString* judy_NC;
@property(atomic, copy)     NSString* judy_AC;
@property(copy)             NSString* judy_3_C;
@property(nonatomic)        NSString* judy_N;
@property(atomic)           NSString* judy_A;
@end
@implementation UIPerson
@end

clang -rewrite-objc main.m -o outmain.cpp

Comparing the converted .cpp code shows:

  • When a property is declared copy, its setter assigns through objc_setProperty().
  • When it is also atomic on top of copy (explicitly or by default), its getter reads through objc_getProperty().
  • Properties with the other keyword combinations read and assign the ivar directly in both getter and setter, using self plus the ivar's memory offset.

    Attributes                      get()                 set()
    nonatomic, copy                 self + offset         objc_setProperty()
    atomic, copy (or copy alone)    objc_getProperty()    objc_setProperty()
    nonatomic (strong)              self + offset         self + offset
    atomic (strong)                 self + offset         self + offset
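The "self + offset" entries mean the rewritten accessors simply read or write memory at a fixed offset from self. A C++ sketch of that pattern; UIPerson_IMPL_sketch and kIvarOffset_judy_N are hypothetical stand-ins for the rewritten struct and its ivar offset variable, and retain/release is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct UIPerson_IMPL_sketch {
    void *isa;
    const std::string *_judy_N;        // stands in for NSString *_judy_N
};

// The rewritten code keeps the offset in a variable; offsetof plays that role here.
const std::size_t kIvarOffset_judy_N = offsetof(UIPerson_IMPL_sketch, _judy_N);

// Getter: read the ivar at self + offset.
inline const std::string *get_judy_N(UIPerson_IMPL_sketch *self) {
    return *reinterpret_cast<const std::string **>(
        reinterpret_cast<char *>(self) + kIvarOffset_judy_N);
}

// Setter: write the ivar at self + offset (no lock, no copy).
inline void set_judy_N(UIPerson_IMPL_sketch *self, const std::string *value) {
    *reinterpret_cast<const std::string **>(
        reinterpret_cast<char *>(self) + kIvarOffset_judy_N) = value;
}
```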

The table answers the first two questions, but raises a new one: why do copy properties access their values through objc_setProperty() and objc_getProperty()?

In the getter, for an atomic property, the objc_retain of the value must happen under a lock (spinlock_t). In the setter, the assignment is likewise locked for atomic properties, and the runtime also checks whether copy or mutableCopy was requested in order to call the matching copy method.
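A C++ sketch of the locked accessor path. It is a model, not the runtime's code: std::mutex stands in for spinlock_t, std::shared_ptr copies stand in for retain/release, and lock_for is a hypothetical striped lock table keyed by slot address. The sketch makes the copy before taking the lock and drops the old value after unlocking, so only the read or the pointer swap is inside the critical section.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

using Object = std::shared_ptr<std::string>;   // stands in for an ObjC object pointer

// Hypothetical striped lock table: one lock per bucket of slot addresses.
inline std::mutex &lock_for(const void *slot) {
    static std::array<std::mutex, 8> locks;
    return locks[(reinterpret_cast<uintptr_t>(slot) >> 4) % locks.size()];
}

// Getter: copy ("retain") the value while holding the slot's lock.
inline Object atomic_get(const Object *slot) {
    std::lock_guard<std::mutex> guard(lock_for(slot));
    return *slot;
}

// Setter: copy first if the property is `copy`, then swap under the lock.
inline void atomic_set(Object *slot, const Object &newValue, bool shouldCopy) {
    Object value = (shouldCopy && newValue) ? std::make_shared<std::string>(*newValue)
                                             : newValue;
    Object old;
    {
        std::lock_guard<std::mutex> guard(lock_for(slot));
        old = std::exchange(*slot, std::move(value));
    }
}   // `old` is released here, after the lock has been dropped
```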


Six, summary

So far, the record of the low-level exploration of classes is essentially complete. Starting from classes at the OC level, we have explored the underlying data types, the various data structures, and how class information is stored. The summary:

  1. The underlying data types of NSObject and Class, and their inheritance relationship
    1. NSObject is an alias of objc_object
    2. Class is a pointer to the objc_class struct
    3. objc_class inherits from objc_object, which holds isa_t isa
  2. objc_class data structure analysis
    1. isa, superclass, cache, bits, public methods, private methods
  3. isa data structure and function analysis
    1. isa is used to look up the class object and the metaclass object.
  4. superclass API analysis
    1. superclass walks up the parent classes of a class object step by step.
  5. cache_t data structure analysis
    1. The _bucketsAndMaybeMask pointer, the union (_maybeMask, _flags, _occupied), private methods, public methods.
    2. The storage container is bucket_t
  6. bits data structure analysis
    1. class_ro_t, class_rw_ext_t, class_rw_t, and properties, protocols, and methods.

Seven, link summary:

  • objc4-818 source code
  • Advancements in the Objective-C Runtime (WWDC 2020)
  • A primer on C++11 multithreading: std::memory_order
  • Memory order in C++
  • __builtin_expect instructions