Deconstruct the implementation of the objc_msgSend function in depth

After reading this article, you will learn more about the implementation of Runtime, the practice of the share design pattern, memory storage optimization, compilation memory barrier, multi-threaded lockless read and write implementation, garbage collection and other related technical points.

Objc_class (Class object) structure introduction

Developers familiar with the Runtime and object method invocation mechanisms of the OC language know that all OC method calls are converted at compile time to calls to the C function objc_msgSend.

*/ / [view1 addSubview:view2]; */ / [addSubview:view2]; objc_msgSend(view1,"addSubview:", view2);
      
   // CGSize size = [view1 sizeThatFits:CGSizeZero];
   CGSize size = objc_msgSend(view1, "sizeThatFits:", CGSizeZero);

   //  CGFloat alpha = view1.alpha; 
   CGFloat alpha = objc_msgSend(view1, "alpha");
Copy the code

The Runtime library implements polymorphic and Runtime method lookup and execution through the objc_msgSend function and isa data members hidden in OC objects. The isa for each object holds a pointer to the object’s Class object. A Class object is data of type Class, and Class is an alias for the pointer type of an objC_class structure, which is defined as follows:

   typedef struct objc_class * Class;
Copy the code

Struct objc_class is defined in the #import

header file, but it is only defined in objC 1.0. The currently running objc2.0 runtime library does not expose the details defined by struct objc_class.

You can download it at https://opensource.apple.com/source/objc4/objc4-723/ and view the source of the latest version of the Runtime library source code. The Runtime library source code is implemented in a mixture of assembly and C++. You can see a detailed definition of the struct objc_class structure in the objc-runtimenew. h header file. The objC_class structure is used to describe the class information of an OC class, including the name of the class, the inherited base class, the description of the method list defined in the class, the description of the attribute list, the description of the implementation protocol, the description of the defined member variables, and so on. In OC, Class information is also an object, so it is also called a Class object. Here is a static class diagram defined by the objc_class structure:

The content shown on the far left of the image has an editing error and should not be NSObject but objc_class.

The data members in the objc_class structure are quite numerous and complex. I won’t go into detail here. This article focuses on the internal implementation of the objc_msgSend function, so the following code will hide most of the data members’ definitions. And lists only the data members that will be accessed and used within the objc_msgSend method without changing the actual structure definition.

Internal implementation of the objc_msgSend function

The objc_msgSend function is the core engine for all OC method calls. It is responsible for finding the implementation of real class or object methods and executing those method functions. Because the frequency of calls is so high, the internal implementation is required to achieve the highest possible performance. The internal code implementation of this function is written in assembly language and does not involve any thread synchronization or lock-related code. You can check out the Messengers on various architectures under the rolls listed above.

; Listed here is an assembly code implementation in ARM64-bit real mode. 0x18378c420 <+0>: cmp x0,#0x0 ; =0x0
    0x18378c424 <+4>:   b.le   0x18378c48c               ; <+108>
    0x18378c428 <+8>:   ldr    x13, [x0]
    0x18378c42c <+12>:  and    x16, x13, #0xffffffff8
    0x18378c430 <+16>:  ldp    x10, x11, [x16, #0x10]
    0x18378c434 <+20>:  and    w12, w1, w11
    0x18378c438 <+24>:  add    x12, x10, x12, lsl # 4
    0x18378c43c <+28>:  ldp    x9, x17, [x12]
    0x18378c440 <+32>:  cmp    x9, x1
    0x18378c444 <+36>:  b.ne   0x18378c44c               ; <+44>
    0x18378c448 <+40>:  br     x17
    0x18378c44c <+44>:  cbz    x9, 0x18378c720           ; _objc_msgSend_uncached
    0x18378c450 <+48>:  cmp    x12, x10
    0x18378c454 <+52>:  b.eq   0x18378c460               ; <+64>
    0x18378c458 <+56>:  ldp    x9, x17, [x12, #-0x10]!
    0x18378c45c <+60>:  b      0x18378c440               ; <+32>
    0x18378c460 <+64>:  add    x12, x12, w11, uxtw # 4
    0x18378c464 <+68>:  ldp    x9, x17, [x12]
    0x18378c468 <+72>:  cmp    x9, x1
    0x18378c46c <+76>:  b.ne   0x18378c474               ; <+84>
    0x18378c470 <+80>:  br     x17
    0x18378c474 <+84>:  cbz    x9, 0x18378c720           ; _objc_msgSend_uncached
    0x18378c478 <+88>:  cmp    x12, x10
    0x18378c47c <+92>:  b.eq   0x18378c488               ; <+104>
    0x18378c480 <+96>:  ldp    x9, x17, [x12, #-0x10]!
    0x18378c484 <+100>: b      0x18378c468               ; <+72>
    0x18378c488 <+104>: b      0x18378c720               ; _objc_msgSend_uncached
    0x18378c48c <+108>: b.eq   0x18378c4c4               ; <+164>
    0x18378c490 <+112>: mov    x10, #-0x1000000000000000
    0x18378c494 <+116>: cmp    x0, x10
    0x18378c498 <+120>: b.hs   0x18378c4b0               ; <+144>
    0x18378c49c <+124>: adrp   x10, 202775
    0x18378c4a0 <+128>: add    x10, x10, #0x220 ; =0x220
    0x18378c4a4 <+132>: lsr    x11, x0, # 60
    0x18378c4a8 <+136>: ldr    x16, [x10, x11, lsl # 3)
    0x18378c4ac <+140>: b      0x18378c430               ; <+16>
    0x18378c4b0 <+144>: adrp   x10, 202775
    0x18378c4b4 <+148>: add    x10, x10, #0x2a0 ; =0x2a0
    0x18378c4b8 <+152>: ubfx   x11, x0, # 52, # 8
    0x18378c4bc <+156>: ldr    x16, [x10, x11, lsl # 3)
    0x18378c4c0 <+160>: b      0x18378c430               ; <+16>
    0x18378c4c4 <+164>: mov    x1, #0x0
    0x18378c4c8 <+168>: movi   d0, # 0000000000000000
    0x18378c4cc <+172>: movi   d1, # 0000000000000000
    0x18378c4d0 <+176>: movi   d2, # 0000000000000000
    0x18378c4d4 <+180>: movi   d3, # 0000000000000000
    0x18378c4d8 <+184>: ret    
    0x18378c4dc <+188>: nop    
Copy the code

After all, assembly language code is relatively obscure, so here we disassemble the implementation of the function into PSEUDO-code in C:

// The following structure lists only the data structures and members used for internal access to the objc_msgSend function. /* a typedef char * SEL; /* a typedef char * SEL; /* THE IMP type is the prototype type for all OC methods. */ typedef id (*IMP)(id self, SEL _cmd, ...) ; */ struct bucket_t {SEL key; // method name IMP IMP; // imp is a function pointer type}; /* A cache structure used to speed up method execution. This structure is essentially a hash bucket based on open address conflict resolution. */ struct cache_t { struct bucket_t *buckets; // Cache method hash bucket array pointer, number of buckets = mask + 1 int mask; - 1 int occupied; // The number of methods that have been cached in the bucket. }; The class structure description of the OC object indicates that the first argument to all OC objects is an ISA pointer. */ struct objc_object { void *isa; }; /* The OC class information structure, where only the necessary data members are shown. */ struct objc_class : objc_object { struct objc_class * superclass; // Base class information structure. cache_t cache; // method cache hash table //... Other data members are ignored. }; String */ id objc_msgSend(id receiver, SEL op,...) {/ / 1... Object null value determination. // Return nil if the object passed in is nilif (receiver == nil)
        returnnil; / / 2... Gets or constructs isa data for an object. void *isa = NULL; // If the address bit of the object is 0, it is an ordinary OC object; otherwise, it is an object of type Tagged Pointerif ((receiver & 0x8000000000000000) == 0) {
        struct objc_object  *ocobj = (struct objc_object*) receiver;
        isa = ocobj->isa;
    }
    else{//Tagged Pointer Does not store ISA data directly. Therefore, special processing is required to find the corresponding ISA data. // If the highest four bits of the object address are 0xF, then it is a user-defined extended Tagged Pointer objectif(((NSUInteger) Receiver) >= 0xf000000000000000) {bits 52-59 in a custom extended Tagged Pointer type object store the index value of a global extended Tagged Pointer class array. int classidx = (receiver & 0xFF0000000000000) >> 52 isa = objc_debug_taggedpointer_ext_classes[classidx]; }else{// Bits 60 to 63 in a Tagged Pointer stored in the system store the index value of a global Tagged Pointer array. int classidx = ((NSUInteger) receiver) >> 60; isa = objc_debug_taggedpointer_classes[classidx]; }} // Because of memory address alignment and virtual memory space constraints, // and isa definition needs to be matched with the upper 0xffffff8 to get the Class object to which the object belongs. struct objc_class *cls = (struct objc_class *)(isa & 0xffffffff8); / / 3... Iterate through the cache hash bucket and look for method implementations in the cache. IMP imp = NULL; // CMD and mask in cache to calculate the index in the hash bucket to find whether the method has been put into the cache hash bucket. int index = cls->cache.mask & op;while (true) {// If the cache hash bucket matches the corresponding method implementation, it is saved to imp and exits the loop.if (cls->cache.buckets[index].key == op) {
              imp = cls->cache.buckets[index].imp;
              break; } // The method implementation is not cached and exits the loop if the corresponding bucket is emptyif (cls->cache.buckets[index].key == NULL) {
             break; } // If the item in the hash bucket is already occupied but is not the method to be executed, the open address method continues to find the bucket that caches the method.if(index == 0) { index = cls->cache.mask; }}else{ index--; // Index minus 1 to continue searching. } } /*endwhile* / / / 4... Execute method implementation or method does not hit cache handler functionif(imp ! = NULL)returnimp(receiver, op, ...) ; // Here... Is a parameter in the OC method passed to objc_msgSend.else
         returnobjc_msgSend_uncached(receiver, op, cls, ...) ; } /* method failed to hit cache handler: pseudocode implementation of objc_msgSend_uncached C version, also written in assembly language. */ id objc_msgSend_uncached(id receiver, SEL op, Struct objc_class * CLS) {/ / this function is very simple is the direct call _class_lookupMethodAndLoadCache3 to find the methods and cache to the struct objc_class in the cache, Finally, return the IMP type. IMP imp = _class_lookupMethodAndLoadCache3(receiver, op, cls);returnimp(receiver, op, ....) ; }Copy the code

As you can see, the implementation logic of objc_msgSend is divided into four parts:

1. Object null value judgment

If it is nil, the function directly returns. This means that when a method is called on a nil object, it will not crash, nor enter the corresponding method implementation. In fact, the whole process will not happen, but directly return nil.

2. Obtain or construct isa data of the object

Normally, every OC object starts with a hidden data member isa. Isa holds the class description, so we need to get this pointer value from the object before executing the method. To reduce memory waste, Apple introduced the concept of Tagged Pointer objects. For example, instances of NSString and NSNumber are defined as Tagged Pointer objects. An object of Tagged Pointer uses an integer as long as a machine word to represent an OC object. To distinguish it from a common OC object, the highest bit of each Tagged Pointer object is 1 and the highest bit of a common OC object is 0. Therefore, in the above code, if the highest bit of the object receiver address is 1, the object will be treated as Tagged Pointer. It can be seen from the code implementation that there are two types of Tagged Pointer objects in the system. If all four bits are 1, they are user-defined extended Tagged Pointer objects; otherwise, they are built-in Tagged Pointer objects. It is not possible to store an ISA in Tagged Pointer. Instead, some bits in Tagged Pointer are used to store the index value of the specified class. The system defines two global array variables:

   extern "C" { 
    extern Class objc_debug_taggedpointer_classes[16*2];
    extern Class objc_debug_taggedpointer_ext_classes[256];
}
Copy the code

To save all Tagged Pointer class information. For Tagged Pointer objects, the higher four bits store an index value. By using this index value, you can find the Class object to which the object belongs in the objC_debug_taggedPOinter_classes array. For custom extended Tagged Pointer objects, the 52-59 8-bit bits hold an index value. This index can be used to find the Class object that the object belongs to in the objC_debug_taggedPOinter_ext_classes array.

Thinking and Practice: The Tagged Pointer object obtains ISA data in the share design mode. This design mode can reduce the memory size of an object to some extent. For example, in the 256-color bitmap, the color index value is stored in each pixel position instead of the RGB value of the color, thus reducing the file storage space of the low-color bitmap. It might take eight bytes to save an object reference, and only one byte to save an index value.

In the second step, both common OC objects and Tagged Pointer objects need to find the ISA information to which the object belongs and further find the class object to which the object belongs. Only when the class object is found, can the implementation of the corresponding method be found.

Isa internal structure

In the above code implementation, there is also an and 0xffffff8 operation when converting ISA to struct objc_class. Although ISA is an 8-byte pointer value, the value it holds is not necessarily a pointer to a struct objc_class object. The maximum accessible virtual memory address range for user processes in ARM64-bit architecture is 0x0000000000-0x1000000000, which means that the available virtual memory space for each user process is 64GB. And because of the memory alignment factor for a pointer variable, the lowest three bits of the pointer variable must be 0. So the value of 0xffffff8 and the contents saved in ISA is the actual Class object pointer to the object. The ARM64 architecture is optimized for isa. In addition to storing Pointers to Class objects, it also stores information such as the reference count of the OC object itself, whether the object is weakly referenced, whether the object has an associated object flag, whether the object is being destroyed, and so on. If you want to more detailed understanding of the internal structure of the isa, please refer to the article: blog.csdn.net/u012581760/… Introduction to.

Thought and practice: for all pointer types of data, we can also use the features of the 0-2 and 36-63 segments to store and set some specific data, thus reducing some memory waste and overhead.

3. Iterate through the cache hash bucket and find the method implementation in the cache

The data member of a Class object has a method list array that holds the description of all the methods of the Class and the function address entry that implements them. If this lookup is done every time a method is called, and you have to traverse the base class for method lookups when a base class method is called, it can be very costly to performance. To solve this problem the system creates a hash table for each class to cache methods **(the data member cache in objC_class is an object of type cache_t)**. This hash table cache is implemented by the hash bucket. Every time a method call is made, the method lookup from this cache is always carried out first. If it is found, the method function stored in the cache is executed. When found, the method name and method function address are saved in the cache to speed up execution next time. So part 3 of the objc_msgSend function is to find the corresponding method in the cache hash table of the Class object:

The Aid function 3.1 first works with the method name op and the mask in the cache. The value of this mask is the number of buckets in the cache minus 1. The number of buckets in the initial cache of a class is 4, multiplied by 2 each time the number of buckets increases. In other words, all bits of the binary value of mask are 1, so when op and mask perform and, that is, the low mask bits in op are taken to match elements in the hash bucket. Therefore, the index value of the hash algorithm must be less than the number of buckets in the cache without crossing the threshold.

When the corresponding index value is obtained through the hash algorithm, whether the key value in the corresponding bucket is the same as op is determined. Each bucket is a struct bucket_t structure that holds the method name (key) and the method implementation address (IMP). Once the key value is equal to the OP value, it indicates that the cache hit, and then save the IMP value and end the search out of the loop; Once the key value is NULL, it indicates that the method has not been cached, and it is necessary to break out of the loop for method missing cache processing. When the key is not NULL but not equal to op, it indicates that there is a conflict. The mechanism to resolve the conflict is to use the open address method to reduce the index value by 1 to continue the loop to find the cache.

Question 1: How does the number of hash buckets in the cache increase dynamically as the number of method accesses increases?

Question 2: Is there an infinite loop in the cache loop lookup?

Problem 3: When the number of buckets increases, the value of mask will also change, so there will be inconsistent values of index calculated before and after. How to solve this problem?

Q4: Since the number of hash buckets can be added dynamically at runtime, how can synchronization and security be handled in a multi-threaded access environment?

All four questions are answered in the internal implementation of the objc_msgSend_uncached function in Step 4.

4. Execute method implementation or method does not hit cache handler function

When a method is hit in the hash bucket and an implementation exists, the implementation is called and the function returns, and the whole function is executed. The objc_msgSend_uncached function is called when the method is not cached. It is also implemented in assembly language and does two things internally: One is the call _class_lookupMethodAndLoadCache3 function in the Class object lookup method the implementation of the body function and returns; The other is to call the returned implementation body function to execute the corresponding method. Can see from the _class_lookupMethodAndLoadCache3 function name after the realization of the function of the it is to find the cache, and this function is implemented in C language, so you can clear the source code to read it. _class_lookupMethodAndLoadCache3 function source code implementation is mainly from the method of Class object list or the list of the base Class method to find the corresponding method and implementation, and updates to the Class object in the buffer cache. If you read the source code carefully, you can easily answer the four questions posed in Step 3:

💡 Question 1: How does the number of hash buckets in the cache increase dynamically as the number of method accesses increases? 🔑 answer: Each Class object is initialized with four buckets for the cache, and the cache contains a data member occupied to hold the number of buckets that are used by the cache. The number of occupied buckets is increased by one each time a method’s cache information is stored in the bucket. The system doubles the capacity of the bucket and continues to expand in this order.

💡 Question 2: Does the cache loop lookup have an infinite loop? 🔑 answer: No, because the system always ensures that the number of empty buckets is 1/4 free. Therefore, when the loop is iterated, the cache will be hit or key == NULL will occur and the loop will exit.

💡 Problem 3: When the number of buckets increases, the value of mask will also change, so there will be two calculated index value inconsistent situation, how to solve this problem? 🔑 A: After the number of hash buckets is expanded, the system allocates a new batch of empty buckets to the cache and does not maintain the information about the old buckets in the cache. This means that each method needs to be recached when the number of buckets is expanded, and all cached information is cleared and restarted. So there will be no inconsistency between index calculations.

💡 Q4: Since the number of hash buckets can be added dynamically at runtime, how can synchronization and security be handled in a multi-threaded access environment? 🔑 answer: The entire objc_msgSend function reads the method cache without adding any locking or synchronization information for maximum performance. In order to ensure data security and synchronous access in multi-threaded environment, security and synchronous processing are required in writing and reading scenarios: ☞ First, the processing method of multi-threaded simultaneously writing cache is investigated. If two threads are detected method is not in the cache and data or write bucket need to expand the cache, the expansion of the cache and write a bucket of data before using a global mutex to ensure that written by synchronous processing, in locked within the scope of and also made a check of the cache handling, so that even in the two threads to invoke the same method is also won’t appear twice to write cache. Therefore, the solution of multi-threaded simultaneous writing is simply to introduce a mutex to solve the problem.

☞ The processing method of multi-threading reading and writing the cache at the same time is discussed. As mentioned above, when expanding the hash bucket in the cache, the solution is to completely discard the old cache, create a new hash bucket and update all the data members in the Class object cache. Therefore, if not handled properly, an exception will occur in step 3 of the objc_msgSend function when accessing the data member in the cache. The fourth instruction of the objc_msgSend function solves this problem in a very clever way:

 0x18378c430 <+16>:  ldp    x10, x11, [x16, #0x10]
Copy the code

This instruction will be put in the cache in the hash bucket buckets and mask | occupied the whole structure data members respectively read x10 and x11 two registers. Because CPU can ensure the atomicity of single instruction execution, the function does not read buckets and mask in cache again in the whole subsequent assembly code, but uses the values in registers X10 and X11 to search the hash table. So even if other writer threads expand the number of hash buckets in the cache and reallocate memory, data access by the current reader thread will not be affected. Buckets and mask are updated when the writing thread expands the number of buckets in the cache. The implementation code for this part is as follows:

// Set the hash bucket memory and mask values for the update cache. void cache_t::setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask)
{
    // objc_msgSend uses mask and buckets with no locks.
    // It is safe for objc_msgSend to see new buckets but old mask.
    // (It will get a cache miss but not overrun the buckets' bounds). // It is unsafe for objc_msgSend to see old buckets and new mask. // Therefore we write new buckets, wait a lot, then write new mask. // objc_msgSend reads mask first, then buckets. // ensure other threads see buckets contents before buckets pointer mega_barrier(); buckets = newBuckets; // ensure other threads see new buckets before new mask mega_barrier(); mask = newMask; occupied = 0; }Copy the code

This code is written in C++ to achieve. In this code, the hash bucket data member BUCKETS are first modified and then the value of the mask is modified. To ensure that the assignment order is not optimized by the Compiler, mega_baerrier() is added to implement the Compiler Memory Barrier **. If you don’t add a compile memory barrier, the compiler might optimize the code so that mask is assigned first and buckets is assigned later. What would happen? LDP x10, X11, [X16, #0x10] can be used to read the new mask value and the old buckets value, and the new mask value is larger than the old mask value. This will cause a crash when the memory array is out of bounds. Adding a compile memory barrier ensures that the buckets assignment is performed before the mask assignment, so that even after the writer thread has done the buckets assignment, but before the mask assignment, If LDP x10, X11, [x16, #0x10] is executed by the reader thread, the new buckets value and the old mask value will not be abnormal. It can be seen that lockless read/write synchronization can be implemented to a certain extent with the help of compile memory barrier related techniques. Of course, if the code is written in assembly language instead of high-level language, the new buckets and mask values can be written with STP instruction without compiling memory barrier technology, and lockless read and write can be realized.

Thinking and practice: if you want to compile barrier related knowledge, please refer to the article introduction to https://blog.csdn.net/world_hello_100/article/details/50131497

In the case of multiple threads to read and write, there is a problem need to solve is to write because the thread cache for the expansion of the new hash bucket of memory allocated at the same time will destroy the old hash bucket of memory, while if the reader thread is visiting old cache, may read memory happens because of improper handling exceptions and system crash. To solve this problem, the system stores the starting and ending addresses of all six API functions that access the cache data in the Class object in two global arrays:

uintptr_t objc_entryPoints[] = {cache_getImp, objc_msgSend, objc_msgSendSuper, objc_msgSendSuper2, objc_msgLookup, objc_msgLookupSuper2}; //LExit indicates the end address of the function. uintptr_t objc_exitPoints[] = {LExit_cache_getImp,LExit_objc_msgSend, LExit_objc_msgSendSuper, LExit_objc_msgSendSuper2, LExit_objc_msgLookup,LExit_objc_msgLookupSuper2};Copy the code

When a writer thread extends the hash bucket in the Class cache, it first stores the allocated old hash bucket memory block address to a global garbage collection array variable, garbage_refs, and then iterates through all threads in the current process. And check to see if the current PC register in the thread state is in the objc_entryPoints and objc_exitPoints range. That is, check to see if any thread is executing a function in the objc_entryPoints list. If not, no function is accessing the cache data in the Class object. In this case, we can safely execute the actual destruction of all hash bucket memory blocks in the global garbage collection array variable Garbage_refs. If any thread is executing a function in the objc_entryPoints list, it does nothing but wait to check again and destroy it when appropriate. This ensures that the reader thread does not generate a memory access exception when accessing buckets in the cache of the Class object.

Think and practice: The technical solution described above is an implementation of a garbage collection technology. In garbage collection, memory is not immediately released. Instead, it is temporarily placed in a place for unified management. When certain conditions are met, all allocated memory is destroyed and released uniformly.

The ObjC2.0 Runtime uses LDP instructions, compile memory barriers, and memory garbage collection techniques to solve the lockless processing of multithreaded data reads and writes, improving system performance.

summary

That’s all there is to say about the internal implementation of the objc_msgSend function. Did you learn anything new in this article? Have you gained more insight into Runtime? In the introduction of these things, but also incidentally introduced the concept of meta-mode, as well as the pointer type of data memory optimization, but also introduced the multi-thread under the lock read and write related implementation skills and so on. If you enjoyed this post please give me a like at 👍,

Welcome to visit myMaking the address

Deconstruct the implementation of the objc_msgSend function in depth

Objc_class (Class object) structure introduction

Internal implementation of the objc_msgSend function

1. Object null value judgment

2. Obtain or construct isa data of the object

Isa internal structure

3. Iterate through the cache hash bucket and find the method implementation in the cache

4. Execute method implementation or method does not hit cache handler function

summary

Related Posts

What’s new in the Swift 5?

Study Notes on Flutter – Containers and Layouts (1)

Core Text: Create an indicator line