preface
The Runtime message sending and forwarding process gets a lot of attention, but it’s often overlooked that the method caching mechanism is behind objc_msgSend’s remarkable performance improvement.
This paper will comb the message sending and forwarding process through source code, focusing on analyzing the implementation details of method caching mechanism. Some assembly code is involved in the writing process, but it does not affect the understanding of the core logic.
Source code based on Runtime 750, ARM64 architecture.
Start with objc_msgSend
Note: ARM64 assembler code has a lot of p, which is actually a macro, x for 64 bits, W for 32 bits, p for register.
Before looking at the caching mechanism, let’s take a look at the message sending and forwarding process and find out when the cache is stored and read.
objc_msgSend
The objc_msgSend code looks like this:
ENTRY _objc_msgSend UNWIND _objc_msgSend, NoFram ... // Handle tagged pointer or nil (x0 stores objc_object address) LDR p13, [x0] // p13 = isa put the first 64 bits of x0 into p13 (i.e. isa member of objc_Object) GetClassFromIsa_p16 p13 // p16 = class through ISA Class LGetIsaDone: CacheLookup NORMAL // Find IMP from method cache or method list and call...Copy the code
The GetClassFromIsa_p16 macro code is:
.macro GetClassFromIsa_p16
...
and p16, $0.#ISA_MASK // #define ISA_MASK 0x0000000ffffffff8ULL.Copy the code
$0 gets the macro’s first argument, called p13, which is isa. This step is to use the ISA_MASK to find the Class in isa and place it in p16.
CacheLookup
CacheLookup contains the core logic for reading the method cache, which is analyzed later in the code.
All you need to know is that it queries the method cache of the current Class, producing two main results: if the cache hits, return IMP or call IMP; If cached and not cached, call __objc_msgSend_uncached (called when IMP is found) or __objc_msgLookup_uncached (not called when IMP is found).
STATIC_ENTRY __objc_msgSend_uncached
UNWIND __objc_msgSend_uncached, FrameWithNoSaves
MethodTableLookup
TailCallFunctionPointer x17
END_ENTRY __objc_msgSend_uncached
Copy the code
If an IMP is found, it is placed in an X17 register and passed to the TailCallFunctionPointer macro to call the method.
MethodTableLookup
.macro MethodTableLookup
// push frame
SignLR
stp fp, lr, [sp, # - 16]!mov fp, sp ... // save registers: x0.. x8, q0.. q7 // receiver and selector alreadyin x0 and x1
mov x2, x16
bl __class_lookupMethodAndLoadCache3
// IMP inx0 mov x17, x0 ... // restore registers mov sp, fp ldp fp, lr, [sp],# 16
AuthenticateLR
.endmacro
Copy the code
Since the macro has to jump function internally, which means the LR change, the previous FP/LR value needs to be stored on the stack after opening up the stack space to facilitate the reset state. The author removes the logic of save registers and restore registers. In fact, the values of each register are stored on the stack first so that the value of the register can be reset when the internal function frame is released.
After the call the __class_lookupMethodAndLoadCache3 return in x0 IMP value is copied into the x17.
__class_lookupMethodAndLoadCache3 is a C function, before the jump in the value is copied to the x2 x16 (x16 store now is GetClassFromIsa_p16 code to find the object of the Class), then register the layout at this time is: X0 -> receiver/x1 -> selector/x2 -> class, which corresponds to the argument list of this method:
IMP _class_lookupMethodAndLoadCache3(id obj, SEL sel, Class cls) {
return lookUpImpOrForward(cls, sel, obj,
YES/*initialize*/, NO/*cache*/, YES/*resolver*/);
}
Copy the code
lookUpImpOrForward
The lookUpImpOrForward method is complex and the simplified logic is as follows:
IMP lookUpImpOrForward(Class cls, SEL sel, id inst, bool initialize, bool cache, bool resolver) { IMP imp = nil; bool triedResolver = NO; . // cache for YES lookup method cacheif (cache) {
imp = cache_getImp(cls, sel);
if (imp) returnimp; } // lock runtimelock.lock (); // If necessary, initialize the class and call +initialize... IMP IMP = cache_getImp(CLS, sel);if (imp) goto done; // Look for IMP in the list of current class methodsif(find IMP) {save IMP method cache gotodone; } // Look for IMP in the parent class's method cache/method listwhile(Class cur = cls->superClass; cur ! = nil; cur = cur->superClass) {ifFind IMP in method cacheif (IMP == _objc_msgForward_impcache) { break; } put IMP into the method cache goTO of the current CLS classdone;
}
if(Find IMP in the list of methods) {store IMP in the method cache goto of the current CLS classdone; }} // IMP not found, try dynamic message processingif(resolver && ! triedResolver) { runtimeLock.unlock(); _class_resolveMethod(cls, sel, inst); runtimeLock.lock(); triedResolver = YES; goto retry; } // If dynamic message processing fails, IMP points to a function and stores IMP method cache IMP = (IMP)_objc_msgForward_impcache; cache_fill(cls, sel, imp, inst);done:
runtimeLock.unlock();
return imp;
}
Copy the code
Method cache access
Method cache storage follows general logic. Imps are cached whenever they are found, and cache_fill is called whenever a method cache is added. It is important to note that methods found in the parent chain will still be added to the cache list of the current class, which can greatly improve the efficiency of finding methods in the parent chain.
Readers may wonder why this method is even fetching the cache. By the time the previous assembly methods got here, theoretically the current class had no method cache for SEL. The previous cache_getImp method is used because lookUpImpOrForward is called by other functions and is not in the flow I analyzed earlier. Retry: The following cache_getImp is used because imPs may be inserted during dynamic message processing and then goto retry.
Query for a list of methods
The query for the class’s method list is handled with the getMethodNoSuper_nolock-> search_method_list method. Otherwise it’s a simple traversal query. Therefore, without method caching, method query efficiency is very low, and the time complexity is either O(logn) or O(n).
Logic of message forwarding
Unlock () and lock() are called in front of the _class_resolveMethod method to disable the protected state of the class, making it easy for developers to change the list of methods of the class, etc.
_class_resolveMethod sends the +resolveInstanceMethod (instance object) or +resolveClassMethod (class object) methods to the object. Developers can dynamically add IMPs to both methods. After _class_resolveMethod is removed from the stack, goto retry attempts to find the method logic.
Of course, if the developer did not do processing, IMP still can not find, through! TriedResolver avoids secondary dynamic message processing and then makes IMP = (IMP)_objc_msgForward_impcache. This way, when a lookUpImpOrForward frame is released, it still appears to the upper level that an IMP has been found. This method is called _objc_msgForward_impcache. Then the __objc_msgSend_uncached method analyzed earlier still calls the IMP, and it’s time for the actual message forwarding phase.
STATIC_ENTRY __objc_msgForward_impcache b __objc_msgForward END_ENTRY __objc_msgForward_impcache ENTRY __objc_msgForward adrp x17, __objc_forward_handler@PAGE ldr p17, [x17, __objc_forward_handler@PAGEOFF] TailCallFunctionPointer x17 END_ENTRY __objc_msgForwardCopy the code
__objc_forward_handler, which is a function pointer, is implemented by default under OBJC2:
__attribute__((noreturn)) void
objc_defaultForwardHandler(id self, SEL sel)
{
_objc_fatal("%c[%s %s]: unrecognized selector sent to instance %p "
"(no message forward handler is installed)",
class_isMetaClass(object_getClass(self)) ? '+' : The '-',
object_getClassName(self), sel_getName(sel), self);
}
void *_objc_forward_handler = (void*)objc_defaultForwardHandler;
Copy the code
Finally we see the familiar description of unrecognized selector sent to instance.
For developers familiar – forwardingTargetForSelector: redirect method, the – forwardInvocation: forwarding method, the Runtime no traces in the source code, There is only a function at the end of the file that changes the _objc_forward_handler pointer.
void objc_setForwardHandler(void *fwd, void *fwd_stret) { _objc_forward_handler = fwd; . }Copy the code
summary
So far, the overall message sending mechanism has been fairly clear, and a number of access operations to method caches have been discovered along the way, notably cache_getImp and cache_fill functions. Of course, method caching also has cleanup operations, which we’ll talk about later. The next section focuses on the implementation details of method caching.
Second, the data structure foundation of method cache
Cache_t is the data structure for the method cache; in objc_class the cache variable is offset by 64*2 bits:
struct objc_class : objc_object { // Class ISA; Class superclass; cache_t cache; class_data_bits_t bits; .Copy the code
Bits stores attributes, protocols, and methods of a class. Cache_t’s structure is also simple:
struct cache_t { struct bucket_t *_buckets; // bucket_t mask_t mask; Mask_t _occupied; // Number of valid caches...Copy the code
At first glance, it looks like a hash table, just like weak_table_t/weak_entry_t, the underlying data structure for weak references. Bucket_t under arm64:
struct bucket_t { MethodCacheIMP _imp; cache_key_t _key; .Copy the code
MethodCacheIMP is an IMP alias, and cache_key_t is an unsigned long.
Write to method cache
cache_fill
Cache_fill is the entry method for method cache writes:
void cache_fill(Class cls, SEL sel, IMP imp, id receiver) {
mutex_locker_t lock(cacheUpdateLock);
cache_fill_nolock(cls, sel, imp, receiver);
}
Copy the code
This lock looks strange, but when you look inside it is actually a class like this:
class locker : nocopy_t {
mutex_tt& lock;
public:
locker(mutex_tt& newLock)
: lock(newLock) { lock.lock(); }
~locker() { lock.unlock(); }};Copy the code
Locking at locker construction and unlocking at destructor precisely protects method calls in method scope. This is the same as the __attribute__((cleanup(AnyFUNC), unused) used extensively in EasyReact to automatically unlock the device.
cache_fill_nolock
Cache_fill_nolock is the core logic for writing (modified for brevity) :
static void cache_fill_nolock(Class cls, SEL sel, IMP imp, id receiver) { ... // Write to cache is not allowed until the class is initializedif(! cls->isInitialized())return; // At this point, it is possible that the cache was already written by another thread when the cacheUpdateLock was occupied, so the cache was queried firstif (cache_getImp(cls, sel)) return;
cache_t *cache = getCache(cls);
cache_key_t key = getKey(sel);
mask_t newOccupied = cache->occupied() + 1;
mask_t capacity = cache->capacity();
if(cache->isConstantEmptyCache()) {// If the cache is read-only, reallocate the cache->reallocate(capacity, capacity? : INIT_CACHE_SIZE); }else if(newOccupied > capacity /4 * 3) {// Expand the cache if the number of available buffers exceeds 3/4 ->expand(); Bucket_t *bucket = cache->find(key, receiver);if (bucket->key() == 0) cache->incrementOccupied();
bucket->set(key, imp);
}
Copy the code
The lock preemption
The cache_fill method is already locked, but it can be accessed simultaneously by multiple threads, all of which are adding the same SEL to the same Class. If one thread holds the lock and updates successfully, the other threads do not need to write to the cache again after idling or hanging for a while. So if (cache_getImp(CLS, sel)) return; This sentence is necessary.
This is also a insurance policy because the caller may call this method without determining whether a SEL of the Class is cached.
Hash table memory allocation
void cache_t::reallocate(mask_t oldCapacity, mask_t newCapacity) { bool freeOld = canBeFreed(); bucket_t *oldBuckets = buckets(); bucket_t *newBuckets = allocateBuckets(newCapacity); .setBucketsAndMask(newBuckets, newCapacity - 1);
if (freeOld) {
cache_collect_free(oldBuckets, oldCapacity);
cache_collect(false); }}Copy the code
AllocateBuckets (newCapacity * sizeof(bucket_t)); Then you can be sure that the method cache hash gives up the previous cache each time it allocates memory.
The following assignment method is interesting:
#define mega_barrier() \
__asm__ __volatile__( \
"dsb ish" \
: : : "memory")
void cache_t::setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask) {
mega_barrier();
_buckets = newBuckets;
mega_barrier();
_mask = newMask;
_occupied = 0;
}
Copy the code
Since the previous cache was discarded, _occupied is set to 0. Mega_barrier The mega_barrier inline assembly uses __volatile__ to prevent the compiler from caching variables into registers, uses memory barriers to prevent CPUS from using registers to optimize execution instructions, and uses DSB ish to isolate when all preceding memory accesses are complete. To execute the instructions following it. What’s the point of a macro that’s doing everything it can?
With cache_T, both _buckets and _mask are read unlocked, so it is important to ensure that _buckets is always longer than _mask. In the worst case, you simply cannot access the existing cache, or you are likely to access the wrong or illegal memory after a hash operation.
The second mega_barrier() ensures that the new _buckets is always assigned a value before the new _mask. This is on the premise that the new buckets are always longer than the old ones. In cache_T there is no logic for reducing the memory of _buckets, only a key/ IMP logic for empting each bucket of the _buckets array (memory readonly), so this is guaranteed.
Previously, the if (cache->isConstantEmptyCache()) branch of cache_fill_NOLock is the logic marked as readonly after memory has been emptied. INIT_CACHE_SIZE (8) is used to allocate memory. The new buckets are smaller than the old ones.
This is not true. Although the memory is not reduced when you empty the _BUCKETS, _occupied is set to 0, which means no other threads will access it.
The first mega_barrier() is a bit of a dream.
The simulation from the newBuckets pointer to assigning memory to _buckets is as follows:
X0 = 0x111; x0 = x0Copy the code
Since memory access is slower than register access, it is likely to be optimized by the operating system like this:
X0 = 0 x0 = 0 x0 = 0Copy the code
So the _buckets will already have a value before the third step, but the memory is still illegal, so the mega_barrier() plays a key role in making the heap open before the second step.
Hash table memory freed
CanBeFreed () is to check whether the old _buckets is read-only after cleaning. If not, it canBeFreed.
There are two steps to release:
Cache_collect_free (oldBuckets, oldCapacity); Insert oldBuckets to be released into a global two-dimensional array:
static bucket_t **garbage_refs = 0;
Copy the code
The exact algorithm is not detailed, but garbage_refs will double the capacity when full.
The second step cache_collect (false); Internal checks the size of garbage_refs and does nothing if it is less than 32*1024. Otherwise, a loop will be made to determine if there are no cached accesses in the process before the actual memory is freed.
This should also be done for access security, ensuring that a cache_t block of memory is not freed when it is accessed.
As you can see, it takes a lot of effort to access cache_t’s member variables without locking, but it’s worth it for such a high-frequency caching mechanism.
Expansion of hash table
void cache_t::expand() {... uint32_t oldCapacity = capacity(); uint32_t newCapacity = oldCapacity ? oldCapacity*2 : INIT_CACHE_SIZE; // Overstep processingif((uint32_t)(mask_t)newCapacity ! = newCapacity) { newCapacity = oldCapacity; } reallocate(oldCapacity, newCapacity); }Copy the code
The _mask member variable of cache_t is of type mask_T and is defined as:
#if __LP64__
typedef uint32_t mask_t; // x86_64 & arm64 asm are less efficient with 16-bits
#else
typedef uint16_t mask_t;
#endif
Copy the code
As noted, 64-bit systems use 32-bit shaping more efficiently. The above newCapacity uses uint32_t operation, so if the value of mask_t is 16 bits, it may exceed the limit. If it exceeds the limit, it will give up expansion and just call reallocate to reallocate to allocate memory of the same size as before.
Since the previous analysis of memory allocation method reallocate always creates new memory and abandons the old memory, so each expansion will abandon the old cache. You might worry that dropping the old cache would slow down message delivery, but the hash table grows twice as fast, starting with eight, and for most categories, a few times is enough.
Discarding the previous cache during capacity expansion has another benefit: Instead of writing the old cache into the hash table in sequence according to the hash algorithm (since the capacity of the hash table will change after expansion, which will directly affect the objects whose hash value will be intercepted by the mask, so all objects have to be re-inserted using the hash algorithm), imagine that if the old cache is not abandoned, It takes at least O(n) time to synchronize the old cache to the new hash table, and the process inevitably makes the cache no longer safe to read.
Write hash table
The core operation of the write operation is to read an available bucket_t via cache_t’s find function:
bucket_t * cache_t::find(cache_key_t k, id receiver) {
bucket_t *b = buckets();
mask_t m = mask();
mask_t begin = cache_hash(k, m);
mask_t i = begin;
do {
if (b[i].key() == 0 || b[i].key() == k) {
return&b[i]; }}while((i = cache_next(i, m)) ! = begin); . }Copy the code
The cache_hash hash algorithm is a simple operation :(mask_t)(key & mask), then directly compare bucket.key() in the array and return the address of the bucket if the key is 0 or matches the target.
When a hash collision occurs, cache_next increments the hash value by one, polling until a space is found. The cache_next code is (I +1) & mask, and even if the hash value has accumulated to the maximum value of the array and no space has been found, it goes back to the head of the array. Since the hash table expands at 3/4 capacity, this find operation is bound to find empty Spaces.
Bucket.key () == 0 indicates that the bucket is empty, so the upper-level method has this code (_occupied++) :
if (bucket->key() == 0) cache->incrementOccupied();
Copy the code
4. Read method cache
Each call to objc_msgSend or cache_getImp calls the CacheLookup macro. The difference is that they are called with different parameters:
objc_msgSend -> CacheLookup NORMAL
cache_getImp -> CacheLookup GETIMP
Copy the code
Here’s a look at the top half of CacheLookup:
.macro CacheLookup
// p1 = SEL, p16 = isa
1 ldp p10, p11, [x16, #CACHE] // p10 = buckets, p11 = occupied|mask
#if ! __LP64__
and w11, w11, 0xffff // p11 = mask
#endif
2 and w12, w1, w11 // x12 = _cmd & mask
3 add p12, p10, p12, LSL #(1+PTRSHIFT)
// p12 = buckets + ((_cmd & mask) << (1+PTRSHIFT))
4 ldp p17, p9, [x12] // {imp, sel} = *bucket
5 1: cmp p9, p1 // if(bucket->sel ! = _cmd) 6 b.ne 2f // scan more 7 CacheHit$0 // call or return imp
2: // not hit: p12 = not-hit bucket
8 CheckMiss $0 // miss if bucket->sel == 0
9 cmp p12, p10 // wrap if bucket == buckets
10 b.eq 3f
11 ldp p17, p9, [x12, #-BUCKET_SIZE]! // {imp, sel} = *--bucket
12 b 1b // loop
...
Copy the code
Note the initial register states p1 = SEL, p16 = ISA:
- Line 1: Defined
#define CACHE (2 * __SIZEOF_POINTER__)
, so 64-bit systemCACHE == 64*2
According to the data structure, that’s exactly what it isobjc_class
In thecache
The offset of the member variable, andcache_t
The first 64 bit in_buckets
A pointer,mask_t
It’s 32 bits, so the second 64 bit is_mask + _occupied
. - Line 2:
x11
Register-placed_mask + _occupied
thatw11
It’s 32 bits lower_mask
._cmd & mask
It’s the hash algorithm that caches the hash table, sox12
Now that’s the hash key. - Line 3: Calculate the pointer offset by the hash key and find its corresponding
bucket_t
.PTRSHIFT
The literal meaning is pointer offset, although I can’t find its definition, but can try to infer. Due to the< < 1
I’m going to double it, sobuckets + ((_cmd & mask) << (1+PTRSHIFT)
Can be translated into:Buckets + ((_cmd & mask) * (2 ^ 1+PTRSHIFT)
, abucket_t
128-bit size, that can be inferred from thisPTRSHIFT == 6
. We know thatmask
Is the value of the total length minus 1, which happens to apply to the algorithm here, so maybe that’s why it’s stored, rightmask
One reason I want minus 1. - Line 4:
x12
The corresponding hash key is savedbucket_t
Object address, willbucket
The two member variables are removed separately, nowp17 -> imp / p9 -> sel
. - Line 5:
p1
Save is the goalSEL
So here’s the comparison. - Line 6: Jumps to if the status register is not equel (ne)
2:
, line 8. - Line 7: Hit cache finds IMP, call
CacheHit
.CacheHit
According to the$0
Judge, ifNORMAL
The callIMP
; ifGETIMP
It returnsIMP
. - Line 8: Call
CheckMiss
To check if the cache is missing is to lookp9
(sel
) is 0. If it is 0, it will jump if cache is lost.CacheLookup
The assembly code behind it won’t go either. when$0
isNORMAL
The previously analyzed one is called__objc_msgSend_uncached
; when$0
isGETIMP
The jump toLGetImpMiss
Don’t be surprisedLGetImpMiss
Is what,CacheLookup
andCheckMiss
It’s all macros, and the upper level call might becache_getImp
(jump toLGetImpMiss
Reset) :
STATIC_ENTRY _cache_getImp
GetClassFromIsa_p16 p0
CacheLookup GETIMP
LGetImpMiss:
mov p0, # 0 / / reset
ret
END_ENTRY _cache_getImp
Copy the code
- Line 9:
p10
Is the header of the array pointer and the currently foundbucket
Comparison. - Line 10: Jumps to if equal indicates that the loop has not found the cache
3f
(Regardless of the implementation for the time being, it is just looking out of the hash algorithm). - Line 11: Indicates that the hash is conflicted and defined
#define BUCKET_SIZE (2 * __SIZEOF_POINTER__)
.bucket_t
It’s just two Pointers big, so we’re moving the pointer to the first index of the cache array (a little weird, the method cache writes to hash +1, -1 here, but it always iterates through). - Line 12: Jump to
1b
, forming a cycle.
CacheLookup
What did you do in the second half
3: // wrap: p12 = first bucket, w11 = mask
add p12, p12, w11, UXTW #(1+PTRSHIFT)// p12 = buckets + (mask << 1+PTRSHIFT) ... (Omits loop logic)Copy the code
Point the P12 to the end of the hash table and do the same forward traversal query as before.
Look carefully at the preceding instruction that jumps to 3:. If it reaches here, it means that the SEL found by hash key is not 0 all the time, but it is not equal to the target SEL, that is, it is always in hash conflict state, and the target SEL is not found after traversing the hash table forward.
So, this iterates from the end of the hash table to the head of the hash table:
Hash table header (top traversal portion)hashEnd of key hash tableCopy the code
The reader might think that this traversal would repeat the previous section of the code, but it doesn’t. Since the hash table expands when it is 3/4 full, the loop will be broken if the untraversed portion of 3: is either cached or lost (SEL == target or SEL == 0).
Clean up method cache
There are two modes of cache cleaning. One is to clean up the contents of the hash table instead of reducing the size of the hash table. One is to simply release the entire hash table.
Clean up the content
void cache_erase_nolock(Class cls) {
...
cache_t *cache = getCache(cls);
mask_t capacity = cache->capacity();
if (capacity > 0 && cache->occupied() > 0) {
auto oldBuckets = cache->buckets();
auto buckets = emptyBucketsForCapacity(capacity);
cache->setBucketsAndMask(buckets, capacity - 1); // also clears occupied
cache_collect_free(oldBuckets, capacity);
cache_collect(false); }}Copy the code
This is done by freeing the oldBuckets and retrieving a buckets array of the same capacity using emptyBucketsForCapacity. This array is not restricted to read-only language, but it should be understood as a read-only array.
emptyBucketsForCapacity
- if
capacity
Small enough to return a sumbucket_t *
Global variable of the same size_objc_empty_cache
. - Otherwise, from a static hash table
static bucket_t **emptyBucketsList = nil;
To obtain; If not found, initialize a space of equal size, store intoemptyBucketsList
And fill the empty array in the middle so that objects in between can get the hash keybucket_t
The array.
Remember the cache->isConstantEmptyCache() call to determine if the cache is read-only? This function essentially calls emptyBucketsForCapacity to determine whether the cache array is read-only.
Why do you have to do all this complicated logic to empty an array? As explained in the hash table memory allocation section above, this is to ensure that the cache hash table is read safely.
Search the source code and list just a few places to call this cleanup method:
attachCategories
Synchronizing Category information to Class._method_setImplementation / method_exchangeImplementations
Directly set the implementation of the method or when the exchange method is implemented.addMethod / addMethods
When adding methods.setSuperclass
When setting the parent class.
The case for needing to flush can be summarized in a nutshell: it may cause the cache to be invalidated.
Direct release
Cache_delete determines whether the contents of an array are read-only using isConstantEmptyCache. If they are not, cache_delete calls free to free them. Some readers may be worried that this release will make the method cache unsafe to read, but it won’t, because free_class is called when I only see it.
After the language
The method caching mechanism does a lot of extra complexity to make the reads secure by not locking the read logic for maximum efficiency, but the benefits are significant because the method cache reads very frequently.
The logic of objc_msgSend is undoubtedly complex, involving a lot of assembly and operating system knowledge, but it is not very difficult to analyze it according to the picture. Finally I have to say:
IOS is too hard.