bucket_t and cache_t were mostly covered in the previous article; only a couple of functions were left over because of space constraints, so we finish them here first. We will then move on to objc-cache.mm and objc-cache.h. It is these that make up the complete method caching implementation. ⛽️⛽️

Lately most of my study time has gone into reading assembly. My assembly level amounts to having read Wang Shuang's assembly book about a year ago and not touching the subject since. The source code we are studying next involves so much assembly that it is worth building an overall understanding of it, rather than just knowing what a few registers and individual instructions mean. The second half of this article examines every line of the objc-msg-arm64.s file. ⛽️⛽️

insert

cache_t::insert inserts the specified SEL and IMP into the cache_t. If the cache_t is still the initial empty cache, a hash array of capacity 4 is allocated before inserting. Otherwise the occupancy is checked against a threshold (3/4 of capacity): if the threshold has been reached, the cache is expanded before inserting; if not, the entry is inserted directly. If a hash collision occurs during insertion, successive +1/-1 hash probes are performed until a free slot is found.
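To make the flow concrete, here is a minimal standalone C sketch of the same open-addressing scheme (grow when 3/4 full, probe on collision). It is a toy model under stated assumptions, not the runtime's code: bucket_sketch, cache_sketch, and insert_sketch are made-up names, and the real cache_t uses atomic operations, an end-marker bucket on some platforms, and platform-specific probe directions.

#include <stdint.h>
#include <stdlib.h>

typedef struct { uintptr_t sel; uintptr_t imp; } bucket_sketch; // toy bucket_t

typedef struct {
    bucket_sketch *buckets;
    uint32_t mask;     // capacity - 1
    uint32_t occupied;
} cache_sketch;

static cache_sketch cache_sketch_make(void) {
    // INIT_CACHE_SIZE = 4; calloc gives zero-filled (= empty) buckets
    cache_sketch c = { calloc(4, sizeof(bucket_sketch)), 3, 0 };
    return c;
}

static void insert_sketch(cache_sketch *c, uintptr_t sel, uintptr_t imp) {
    uint32_t capacity = c->mask + 1;
    if (c->occupied + 1 > capacity / 4 * 3) {
        // Grow: double the capacity and discard the old entries
        // (the runtime deliberately does not copy them, for performance).
        uint32_t newCapacity = capacity * 2;
        free(c->buckets);
        c->buckets = calloc(newCapacity, sizeof(bucket_sketch));
        c->mask = newCapacity - 1;
        c->occupied = 0;
    }
    uint32_t begin = (uint32_t)(sel & c->mask); // cache_hash
    uint32_t i = begin;
    do {
        if (c->buckets[i].sel == 0) {           // empty slot: insert here
            c->buckets[i].sel = sel;
            c->buckets[i].imp = imp;
            c->occupied++;
            return;
        }
        if (c->buckets[i].sel == sel) return;   // already cached
        i = (i + 1) & c->mask;                  // linear probe (x86_64 direction)
    } while (i != begin);
    abort();                                    // bad_cache: should be unreachable
}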

ALWAYS_INLINE
void cache_t::insert(Class cls, SEL sel, IMP imp, id receiver)
{
#if CONFIG_USE_CACHE_LOCK
    cacheUpdateLock.assertLocked();
#else
    // Under __OBJC2__ it is runtimeLock that must be held;
    // if it is not, the assertion fires.
    runtimeLock.assertLocked();
#endif
    
    // Assert that sel is not 0 and cls has been initialized
    ASSERT(sel != 0 && cls->isInitialized());

    // Use the cache as-is if it is less than 3/4 full.
    
    // Record the new occupancy (old occupancy + 1)
    mask_t newOccupied = occupied() + 1;
    
    // The old capacity
    unsigned oldCapacity = capacity(), capacity = oldCapacity;
    
    if (slowpath(isConstantEmptyCache())) { // usually false
        // The initial empty cache comes from the static emptyBucketsList
        // (static bucket_t **emptyBucketsList) and cannot actually store
        // bucket_t, so new space must be allocated to replace it.
        // Cache is read-only. Replace it.
        
        if (!capacity) capacity = INIT_CACHE_SIZE; // if capacity is 0, assign the initial value 4
        
        // Allocate buckets and set the mask (capacity - 1) and _occupied.
        // The old buckets are statically prepared data that must not be freed,
        // so the last argument is false.
        reallocate(oldCapacity, capacity, /* freeOld */false);
    }
    else if (fastpath(newOccupied + CACHE_END_MARKER <= capacity / 4 * 3)) { 
        // Most insertions take this path.
        // Cache is less than 3/4 full. Use it as-is.
        
        // CACHE_END_MARKER is added on the left of the comparison because
        // on __arm__ || __x86_64__ || __i386__ buckets carry an extra
        // end-marker bucket_t at the end; __arm64__ does not need this +1.
    }
    else {
        // Third case: the hash table needs to grow.
        // Double the original capacity.
        // The old cache is not copied into the new space, for performance reasons.
        
        capacity = capacity ? capacity * 2 : INIT_CACHE_SIZE;
        
        // Capacity is capped at MAX_CACHE_SIZE (1 << 16)
        if (capacity > MAX_CACHE_SIZE) {
            capacity = MAX_CACHE_SIZE;
        }
        
        // Allocate new space and do some initialization.
        // Unlike the isConstantEmptyCache case, the old buckets must be
        // released after expansion, so the third parameter is true.
        // The old buckets are put on a garbage list; their memory is actually
        // freed later, once the accumulated garbage reaches a threshold.
        reallocate(oldCapacity, capacity, true);
    }

    // Temporary variables
    bucket_t *b = buckets();
    mask_t m = capacity - 1;
    
    // Hash sel with the mask to get the starting index
    mask_t begin = cache_hash(sel, m);
    mask_t i = begin;

    // Scan for the first unused slot and insert there.
    // There is guaranteed to be an empty slot because the
    // minimum size is 4 and we resized at 3/4 full.
    // (The old buckets were discarded rather than copied,
    // and their memory will be freed at the appropriate time.)
    
    // The do-while performs a linear hash probe (open addressing)
    // to find a slot for sel and imp.
    do {
        if (fastpath(b[i].sel() == 0)) {
            // sel is 0, so this slot is empty:
            // place sel and imp here.
            
            incrementOccupied();
            // Atomically store sel and imp into the bucket_t's _sel and _imp
            b[i].set<Atomic, Encoded>(sel, imp, cls);
            
            return;
        }
        if (b[i].sel() == sel) {
            // The entry was added to the cache by some other
            // thread before we grabbed the cacheUpdateLock (runtimeLock).
            
            return;
        }
        
        // Next hash probe; different platforms probe in +1 or -1 order
    } while (fastpath((i = cache_next(i, m)) != begin));

    // No suitable slot was found: bad_cache
    cache_t::bad_cache(receiver, (SEL)sel, cls);
}

INIT_CACHE_SIZE

/* Initial cache bucket count. INIT_CACHE_SIZE must be a power of two. */
enum {
    INIT_CACHE_SIZE_LOG2 = 2,
    INIT_CACHE_SIZE      = (1 << INIT_CACHE_SIZE_LOG2), // 1 << 2 = 0b100 = 4
    MAX_CACHE_SIZE_LOG2  = 16,
    MAX_CACHE_SIZE       = (1 << MAX_CACHE_SIZE_LOG2), // 1 << 16 = 2^16 
};
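
Keeping every capacity a power of two is what makes the cache_hash function below work: mask = capacity - 1 is then all 1-bits, so sel & mask is a cheap modulo. A quick hypothetical check (it relies on the enum above):

// a power of two has exactly one bit set, so x & (x - 1) == 0,
// and x - 1 is a valid all-ones mask
_Static_assert((INIT_CACHE_SIZE & (INIT_CACHE_SIZE - 1)) == 0, "power of two");
_Static_assert((MAX_CACHE_SIZE & (MAX_CACHE_SIZE - 1)) == 0, "power of two");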

cache_hash

// Class points to cache. SEL is key. Cache buckets store SEL+IMP (struct bucket_t).
// Caches are never built in the dyld shared cache.

static inline mask_t cache_hash(SEL sel, mask_t mask) 
{
    // ANDing with the mask keeps the result within [0, mask],
    // so the hash can never index past the end of the hash array
    return (mask_t)(uintptr_t)sel & mask;
}
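A quick hypothetical demo of the hash (the SEL value is the one from the LLDB session later in this article; mask 3 corresponds to the initial capacity of 4):

#include <stdio.h>
#include <stdint.h>

typedef uint32_t mask_t;

static inline mask_t cache_hash_demo(uintptr_t sel, mask_t mask) {
    return (mask_t)(sel & mask);
}

int main(void) {
    uintptr_t sel = 0x00007fff70893e54; // a SEL is just a method-name string pointer
    printf("%u\n", cache_hash_demo(sel, 3)); // 0x...54 & 0b11 = 0: slot 0 of 4
    return 0;
}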

cache_next

When a SEL hash collision occurs here, the direction in which the probe moves is handled differently on different platforms.

#if __arm__  ||  __x86_64__  ||  __i386__
// objc_msgSend has few registers available.
// Cache scan increments and wraps at special end-marking bucket.

// When CACHE_END_MARKER is 1, buckets carry an endMarker bucket.

#define CACHE_END_MARKER 1

// i moves forward by 1 each time; ANDing with the mask keeps it in bounds.
// When i reaches the end of buckets, the AND wraps it back to 0,
// and probing continues until begin is reached again. If no suitable
// slot is found by then, a memory error has occurred.

static inline mask_t cache_next(mask_t i, mask_t mask) {
    return (i+1) & mask;
}

#elif __arm64__
// objc_msgSend has lots of registers available.
// Cache scan decrements. No end marker needed.

// CACHE_END_MARKER is 0: no endMarker is assigned.

#define CACHE_END_MARKER 0

// i decrements, wrapping from 0 to mask
static inline mask_t cache_next(mask_t i, mask_t mask) {
    return i ? i-1 : mask;
}

#else

// Unknown architecture
#error unknown architecture

#endif
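A tiny hypothetical trace of the two probe directions with mask = 3 (capacity 4), starting from begin = 2:

// forward (arm/x86_64/i386): 2 -> 3 -> 0 -> 1 -> back to 2 (stop)
// backward (arm64):          2 -> 1 -> 0 -> 3 -> back to 2 (stop)
#include <stdio.h>
typedef unsigned mask_t;
static mask_t next_fwd(mask_t i, mask_t m)  { return (i + 1) & m; }
static mask_t next_back(mask_t i, mask_t m) { return i ? i - 1 : m; }
int main(void) {
    for (mask_t i = next_fwd(2, 3);  i != 2; i = next_fwd(i, 3))  printf("%u ", i); // 3 0 1
    printf("\n");
    for (mask_t i = next_back(2, 3); i != 2; i = next_back(i, 3)) printf("%u ", i); // 1 0 3
    printf("\n");
    return 0;
}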

bad_cache

void cache_t::bad_cache(id receiver, SEL sel, Class isa)
{
    // Log in separate steps in case the logging itself causes a crash.
    
    _objc_inform_now_and_on_crash
        ("Method cache corrupted. This may be a message to an "
         "invalid object, or a memory error somewhere else.");
         
    // The cache
    cache_t *cache = &isa->cache;
    
    // Different platforms store buckets and mask differently
#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_OUTLINED
    bucket_t *buckets = cache->_buckets.load(memory_order::memory_order_relaxed);
    _objc_inform_now_and_on_crash
        ("%s %p, SEL %p, isa %p, cache %p, buckets %p, "
         "mask 0x%x, occupied 0x%x", 
         receiver ? "receiver" : "unused", receiver, 
         sel, isa, cache, buckets,
         cache->_mask.load(memory_order::memory_order_relaxed),
         cache->_occupied);
    _objc_inform_now_and_on_crash
        ("%s %zu bytes, buckets %zu bytes", 
         receiver ? "receiver" : "unused", malloc_size(receiver), 
         malloc_size(buckets));
#elif (CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16 || \
       CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4)
    uintptr_t maskAndBuckets = cache->_maskAndBuckets.load(memory_order::memory_order_relaxed);
    _objc_inform_now_and_on_crash
        ("%s %p, SEL %p, isa %p, cache %p, buckets and mask 0x%lx, "
         "occupied 0x%x",
         receiver ? "receiver" : "unused", receiver,
         sel, isa, cache, maskAndBuckets,
         cache->_occupied);
    _objc_inform_now_and_on_crash
        ("%s %zu bytes, buckets %zu bytes",
         receiver ? "receiver" : "unused", malloc_size(receiver),
         malloc_size(cache->buckets()));
#else

// Unknown cache mask storage type
#error Unknown cache mask storage type.

#endif

// SEL is just a string holding the method name (cast to const char *):
// const char *sel_getName(SEL sel) 
// {
//     if (!sel) return "<null selector>";
//     return (const char *)(const void*)sel;
// }

    // The selector
    _objc_inform_now_and_on_crash
        ("selector '%s'", sel_getName(sel));
        
    // The name of the class
    _objc_inform_now_and_on_crash
        ("isa '%s'", isa->nameForLogging());

    _objc_fatal
        ("Method cache corrupted. This may be a message to an "
         "invalid object, or a memory error somewhere else.");
}

This concludes the analysis of bucket_t and cache_t. Next, let's analyze the contents of objc-cache.h. (objc-cache.h is a private header inside objc4-781, not a public system header.)

objc-cache.h

// The entire contents of objc-cache.h
#ifndef _OBJC_CACHE_H
#define _OBJC_CACHE_H

#include "objc-private.h"

__BEGIN_DECLS

extern void cache_init(void); // initialization
extern IMP cache_getImp(Class cls, SEL sel); // get the IMP for the specified SEL
extern void cache_fill(Class cls, SEL sel, IMP imp, id receiver); // insert SEL and IMP into the cache
extern void cache_erase_nolock(Class cls); // reset the cache
extern void cache_delete(Class cls); // free the buckets
extern void cache_collect(bool collectALot); // free the old garbage buckets

__END_DECLS

#endif

cache_init

// Define HAVE_TASK_RESTARTABLE_RANGES to enable
// usage of task_restartable_ranges_synchronize()
// (ranges of restartable tasks)

// #if TARGET_OS_SIMULATOR || defined(__i386__) || defined(__arm__) || !TARGET_OS_MAC
// #   define HAVE_TASK_RESTARTABLE_RANGES 0
// #else
// // both our x86_64 and arm64 platforms get 1 here
// #   define HAVE_TASK_RESTARTABLE_RANGES 1
// #endif

void cache_init()
{
#if HAVE_TASK_RESTARTABLE_RANGES
    // unsigned int
    mach_msg_type_number_t count = 0;
    // int
    kern_return_t kr;

    // typedef struct {
    //     uint64_t location;            // position
    //     unsigned short length;        // length
    //     unsigned short recovery_offs; // recovery offset
    //     unsigned int flags;           // flags
    // } task_restartable_range_t;
    
    // extern "C" task_restartable_range_t objc_restartableRanges[];
    
    // Count the restartable ranges (the table is zero-terminated)
    while (objc_restartableRanges[count].location) {
        count++;
    }

    // extern mach_port_t mach_task_self_;
    // #define mach_task_self() mach_task_self_
    // #define current_task() mach_task_self()
    
    // Register the ranges
    kr = task_restartable_ranges_register(mach_task_self(),
                                          objc_restartableRanges, count);
                                          
    if (kr == KERN_SUCCESS) return; // return on success
    
    // Crash on failure
    _objc_fatal("task_restartable_ranges_register failed (result 0x%x: %s)",
                kr, mach_error_string(kr));
                
#endif // HAVE_TASK_RESTARTABLE_RANGES
}

cache_getImp

cache_getImp is an assembly function. (A sudden, inexplicable excitement: I finally have a reason to seriously review assembly. What I learned from Wang Shuang's assembly book is almost completely forgotten, and now I finally get to pick it up again. 🎉🎉)

cache_fill

void cache_fill(Class cls, SEL sel, IMP imp, id receiver)
{
    // Assert that runtimeLock is held; CONFIG_USE_CACHE_LOCK decides
    // whether cacheUpdateLock or runtimeLock guards the cache.
    runtimeLock.assertLocked();

#if !DEBUG_TASK_THREADS
    // Never cache before +initialize is done.
    
    if (cls->isInitialized()) {
        // Get the class's cache
        cache_t *cache = getCache(cls);
        
#if CONFIG_USE_CACHE_LOCK // under __OBJC2__ the cache does not use its own lock
        mutex_locker_t lock(cacheUpdateLock);
#endif

        // Insert
        cache->insert(cls, sel, imp, receiver);
    }
    
#else

    // Verify
    _collecting_in_critical();
    
#endif
}

DEBUG_TASK_THREADS

/* objc_task_threads
   Replacement for task_threads(). Define DEBUG_TASK_THREADS to debug
   crashes when task_threads() is failing.
   
   A failure in task_threads() usually means somebody has botched their
   Mach or MIG traffic. For example, somebody's error handling was wrong
   and they left a message queued on the MIG reply port for task_threads()
   to trip over.
   
   The code below is a modified version of task_threads(). It logs the
   msgh_id of the reply message. The msgh_id can identify the sender of
   the message, which can help pinpoint the faulty code.
   
   DEBUG_TASK_THREADS also calls collecting_in_critical() during every
   message dispatch, which improves reproducibility of bugs.
   
   This code can be regenerated by running
   `mig /usr/include/mach/task.defs`.
*/

cache_erase_nolock

The cache_erase_nolock function is used to empty the cache and reclaim the old buckets.

// Reset this entire cache to the uncached lookup by reallocating it.
// This must not shrink the cache - that breaks the lock-free scheme.

void cache_erase_nolock(Class cls)
{
#if CONFIG_USE_CACHE_LOCK
    cacheUpdateLock.assertLocked();
#else
    // Under __OBJC2__ runtimeLock must be held; the assertion fires otherwise
    runtimeLock.assertLocked();
#endif
    
    // The cache
    cache_t *cache = getCache(cls);
    
    // Cache capacity: returns mask() ? mask()+1 : 0;
    mask_t capacity = cache->capacity();
    
    if (capacity > 0 && cache->occupied() > 0) {
        // Capacity is greater than 0 and some slots are occupied
        
        // Get the old buckets (bucket_t *)
        auto oldBuckets = cache->buckets();
        
        // Get replacement empty buckets: either the global cache_t::emptyBuckets()
        // or an entry prepared in the static emptyBucketsList.
        // emptyBucketsForCapacity's allocate parameter defaults to true,
        // so space is allocated when capacity exceeds what EMPTY_BYTES covers.
        auto buckets = emptyBucketsForCapacity(capacity);
        
        // Set _buckets and _mask; this also resets _occupied to 0
        cache->setBucketsAndMask(buckets, capacity - 1); // also clears occupied
        
        // Hand the old buckets over for deferred release
        cache_collect_free(oldBuckets, capacity);
    }
}

cache_delete

void cache_delete(Class cls)
{
#if CONFIG_USE_CACHE_LOCK
    mutex_locker_t lock(cacheUpdateLock);
#else
    runtimeLock.assertLocked(); // the assertion fires if the lock is not held
#endif

    // Determine whether the buckets can be freed:
    // canBeFreed() is !isConstantEmptyCache(), i.e.
    // !(occupied() == 0 && buckets() == emptyBucketsForCapacity(capacity(), false))
    if (cls->cache.canBeFreed()) {
        // Record the buckets about to be released
        if (PrintCaches) recordDeadCache(cls->cache.capacity());
        
        // Free the buckets' memory
        free(cls->cache.buckets());
    }
}

cache_collect

void cache_collect(bool collectALot) attempts to free the old garbage buckets. The collectALot parameter indicates whether to try to free them even when the accumulated garbage is below the threshold (32 * 1024 bytes). If the garbage size is below the threshold and collectALot is false, the function returns immediately. Otherwise it checks whether the buckets can actually be freed: if collectALot is false, it checks whether objc_msgSend (or some other cache reader) is currently looking up a cache and may still be using the buckets waiting to be released, and returns if so; if collectALot is true, it loops on _collecting_in_critical() until no objc_msgSend (or other cache reader) is looking up a cache. Then the garbage can be freed as normal, and the garbage counter is reset to 0 to indicate the initial state. See the previous article for more details.
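
A hedged sketch of that decision flow (garbage_byte_size, kGarbageThreshold, reader_in_critical_section and free_garbage are stand-in names for illustration; the real logic lives in objc-cache.mm):

#include <stdbool.h>
#include <stddef.h>

static const size_t kGarbageThreshold = 32 * 1024;             // the threshold mentioned above
static size_t garbage_byte_size;                               // stand-in for the real counter
static bool reader_in_critical_section(void) { return false; } // stand-in for _collecting_in_critical()
static void free_garbage(void) {}                              // stand-in for the actual freeing

void cache_collect_sketch(bool collectALot) {
    // Not enough garbage yet and nobody insisted: do nothing.
    if (garbage_byte_size < kGarbageThreshold && !collectALot) return;

    if (!collectALot) {
        // Give up if any cache reader might still be using the old buckets.
        if (reader_in_critical_section()) return;
    } else {
        // Insist: wait until no reader is inside the lookup code.
        while (reader_in_critical_section())
            ;
    }

    free_garbage();
    garbage_byte_size = 0; // back to the initial state
}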

Besides these, objc-cache.mm contains only the thread-local storage helpers (per-thread storage holding information under a few specific keys) and one of the most important functions: cache_getImp. Yes, it is implemented in assembly. My assembly level is limited to a book by Wang Shuang read about a year ago, and I know next to nothing else about assembly, but that doesn't matter: the instructions involved are not complicated. If you have studied the bucket_t and cache_t structures above, you will certainly be able to read it. The only unfamiliar part is that our everyday pointer operations become register operations, which is not hard to follow; we just need to focus on the execution flow.

Every developer has surely heard of Objective-C message sending, and the method cache exists precisely to serve that message flow. If we want to keep learning and build an understanding of the message flow, note that to send a message there must first be a message, so where does the message come from? This involves the execution of the objc_msgSend function, so let's study it.

objc_msgSend

Where does objc_msgSend come from

Let's first do some verification of the cache_t structure in the console.

// LGPerson.h
@interface LGPerson : NSObject

// Instance method
- (void)instanceMethod1;
- (void)instanceMethod2;
- (void)instanceMethod3;
- (void)instanceMethod4;
- (void)instanceMethod5;
- (void)instanceMethod6;
- (void)instanceMethod7;

@end

// Write the following call in main.m

LGPerson *person = [LGPerson alloc];
LGPerson *p = [person init]; // ⬅️ break on this line

[p instanceMethod1];
[p instanceMethod2];
[p instanceMethod3];
[p instanceMethod4];
[p instanceMethod5];
[p instanceMethod6];
[p instanceMethod7];

The console prints the following:

// Print the class information
(lldb) p [person class]
(Class) $0 = LGPerson

// Based on the objc_class layout, 0x1000021e8 is the starting address of the cache member variable

(lldb) x/4gx $0
0x1000021d8: 0x00000001000021b0 (isa) 0x00000001003ee140 (superclass)
0x1000021e8: 0x0000000100677860 0x0002801000000003 (cache_t)

(lldb) p (cache_t *)0x1000021e8 // cast to a cache_t pointer
(cache_t *) $1 = 0x00000001000021e8

// Dereference the cache_t pointer directly to see its contents
(lldb) p *$1

// We are currently on x86_64, so cache_t uses the layout with a
// separate mask (CACHE_MASK_STORAGE_OUTLINED)
(cache_t) $2 = {

  // bucket_t pointer; std::__1::atomic is a C++ atomic,
  // we only care about the template type inside <>
  _buckets = {
    std::__1::atomic<bucket_t *> = 0x0000000100677860 {
      _sel = {
        std::__1::atomic<objc_selector *> = 0x00007fff70893e54
      }
      _imp = {
        std::__1::atomic<unsigned long> = 4041432
      }
    }
  }
  
  // mask = 3, so capacity = 4:
  // the cache_t hash array has an initial length of 4
  _mask = {
    std::__1::atomic<unsigned int> = 3
  }
  
  _flags = 32784
  
  // We have just called the [LGPerson alloc] function,
  // and the occupancy is 2
  _occupied = 2
}

// Continue by printing the contents of _buckets
(lldb) p (bucket_t *)$1->buckets()
(bucket_t *) $4 = 0x0000000100677860

// buckets is a bucket_t array of length 4;
// bucket_t has only the _sel and _imp member variables

// Since I am currently on Xcode 12, where these runtime internals
// seem to be blocked by Apple, the code cannot be tested again:
// these prints are excerpts from my earlier tests on Xcode 11 😭,
// so for the time being they will have to do.

// Otherwise we could use NSString *NSStringFromSelector(SEL aSelector)
// to get the name of _sel and see who it is;
// for now we can only see a hexadecimal address.

(lldb) p $4[0]
(bucket_t) $5 = {
  _sel = {
    std::__1::atomic<objc_selector *> = 0x00007fff70893e54
  }
  _imp = {
    std::__1::atomic<unsigned long> = 4041432
  }
}

// empty
(lldb) p $4[1]
(bucket_t) $6 = {
  _sel = {
    std::__1::atomic<objc_selector *> = 0x0000000000000000
  }
  _imp = {
    std::__1::atomic<unsigned long> = 0
  }
}

// empty
(lldb) p $4[2]
(bucket_t) $7 = {
  _sel = {
    std::__1::atomic<objc_selector *> = 0x0000000000000000
  }
  _imp = {
    std::__1::atomic<unsigned long> = 0
  }
}

Worth mentioning: if we only call Class cls = NSClassFromString(@"LGPerson");, then _mask = 0 and _occupied = 0; after LGPerson *person = [LGPerson alloc];, we get _mask = 3 and _occupied = 2, as verified below.

// Only call NSClassFromString(@"LGPerson") to get the LGPerson class
Class cls = NSClassFromString(@"LGPerson");

// Print the cache_t
(cache_t) $3 = {
  _buckets = {
    std::__1::atomic<bucket_t *> = 0x00000001003e8490 {
      _sel = {
        std::__1::atomic<objc_selector *> = 0x0000000000000000
      }
      _imp = {
        std::__1::atomic<unsigned long> = 0
      }
    }
  }
  
  // The mask value is 0
  _mask = {
    std::__1::atomic<unsigned int> = 0
  }
  
  _flags = 16
  
  // _occupied is also 0
  _occupied = 0 
}

// Break at [person init] and print again
// (the command sequence is the same as above)
p [person class]
x/4gx $0
p (cache_t *)0x1000021f0
p *$1

// Before init:
// mask is 3
_mask = {
  std::__1::atomic<unsigned int> = 3
}
_flags = 32784

// occupied is 2
_occupied = 2

// Printing again after init:
// mask is still 3
_mask = {
  std::__1::atomic<unsigned int> = 3
}
_flags = 32784

// occupied is now 3
_occupied = 3

Continuing to execute the seven instance-method calls one by one, _capacity, _mask, and _occupied change as follows:

variable   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
_capacity  | 4 | 4 | 8 | 8 | 8 | 8 | 8 | 8 | 16
_mask      | 3 | 3 | 7 | 7 | 7 | 7 | 7 | 7 | 15
_occupied  | 2 | 3 | 1 | 2 | 3 | 4 | 5 | 6 | 1

It can be seen that whenever _occupied would exceed 3/4 of _capacity, the capacity doubles, the old contents are discarded, and _occupied starts counting again. Also remember that in these x86_64 tests, cache_t reserves one bucket at the end of the hash array as an end marker after each expansion.

So far objc_msgSend itself has still not appeared. Let's make it show up:

// LGPerson.h
@interface LGPerson : NSObject
- (void)method1;
- (NSString *)methodWithReturn;
- (NSString *)method:(NSInteger)param;
@end

// main.m
LGPerson *person = [[LGPerson alloc] init];
[person method1];
[person methodWithReturn];
[person method:11];

Then we execute the clang -rewrite-objc main.m command in the terminal to convert main.m into main.cpp.

int main(int argc, const char * argv[]) {

/* @autoreleasepool */ { __AtAutoreleasePool __autoreleasepool; 
 
NSLog((NSString *)&
                   __NSConstantStringImpl__var_folders_l0
                   _ntvl5rs97t30j69kh6g3vb_c0000gn_T
                   _main_416c0e_mi_0);
 
LGPerson *person = ((LGPerson *(*)(id, SEL))
                   (void *)objc_msgSend)((id)((LGPerson *(*)(id, SEL))
                   (void *)objc_msgSend)((id)objc_getClass("LGPerson"), 
                                        sel_registerName("alloc")), 
                   sel_registerName("init"));
 
((void (*)(id, SEL))(void *)objc_msgSend)((id)person,
                                          sel_registerName("method1"));
 
((NSString *(*)(id, SEL))(void *)objc_msgSend)((id)person, 
                                               sel_registerName("methodWithReturn"));

((NSString *(*)(id, SEL, NSInteger))(void *)objc_msgSend)((id)person, 
                                                          sel_registerName("method:"),
                                                          (NSInteger)11);

} // Corresponds to the right curly bracket at the end of autoreleasepool above

return 0;
}

// Looking at the calls, objc_msgSend is cast to a different function
// pointer type at each call site:

// [person method1];
// becomes (void (*)(id, SEL)):
//   id is the person object on which the method is called,
//   SEL is sel_registerName("method1")

// [person method:11];
// becomes (NSString *(*)(id, SEL, NSInteger)), with the NSString * return value.

Here we see that our everyday OC method calls are actually converted into objc_msgSend calls, which we have seen many times before; for example, ((id (*)(objc_object *, SEL))objc_msgSend)(this, @selector(retain)); is used when the retain function is overridden. When we studied bucket_t we said several times that SEL is the method name string and IMP is the function's address, and the essence of executing a method is finding the function's address and executing it. That is exactly what objc_msgSend does; more precisely, it finds the IMP for the SEL on the given id and executes it (a hand-written sketch follows the list below). So how is objc_msgSend implemented? At first glance it looks like a C/C++ function, but it is actually implemented in assembly. Besides the fact that assembly is fast and method lookup happens extremely frequently, there are other important reasons for using it:

  • Assembly is easier for the machine to execute directly, saving intermediate steps.
  • The parameters and their types are unknown ahead of time, which C/C++ cannot handle as flexibly as assembly.
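
As mentioned above, we can write that lowering out by hand. This is a sketch mirroring the rewritten main.cpp (LGPerson and method: come from the example above; the casts are required because objc_msgSend is declared without a usable prototype):

#import <objc/message.h>   // objc_msgSend
#import <objc/runtime.h>   // objc_getClass, sel_registerName

// equivalent of: LGPerson *person = [[LGPerson alloc] init];
id person = ((id (*)(id, SEL))objc_msgSend)(
                ((id (*)(id, SEL))objc_msgSend)(
                    (id)objc_getClass("LGPerson"), sel_registerName("alloc")),
                sel_registerName("init"));

// equivalent of: NSString *s = [person method:11];
NSString *s = ((NSString *(*)(id, SEL, NSInteger))objc_msgSend)(
                  person, sel_registerName("method:"), (NSInteger)11);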

objc_msgSend assembly implementation

Under objc4-781/Source we can see several files with the .s suffix: yes, they are assembly files. Each file name carries a platform suffix -arm/-arm64/-i386/-x86-64 or -simulator-i386/-simulator-x86-64, indicating the platform the assembly file targets. Let's look at objc-msg-arm64.s.

objc-msg-arm64.s

RestartableEntry

/* objc-msg-arm64.s - ARM64 code to support objc messaging */
#ifdef __arm64__ // compiled only for the __arm64__ platform

#include <arm/arch.h>
#include "isa.h"
#include "arm64-asm.h"
#include "objc-config.h"

// Assembler names beginning with "." are not mnemonics for instructions
// and are not translated into machine instructions. Instead they give
// the assembler special hints; they are called assembler directives
// (or pseudo-operations, since they are not real instructions).

// .section divides the code into sections. When the program is loaded
// by the operating system, each section is loaded at a different address,
// and the OS sets different read/write/execute permissions for different pages.

// .section .data 
// The .data section holds program data; it is readable and writable,
// equivalent to global variables in a C program.

// .section .text 
// The .text section holds code; it is read-only and executable,
// and all subsequent instructions belong to the .text section.

// A custom section can be declared with the .section pseudo-operation:
// .section expr  // expr can be .text/.data/.bss
// .text compiles the code that follows into the code section
// .data compiles the data that follows into the data section
// .bss  holds variables in the .bss section, which usually means an area
//       of memory for the program's uninitialized global variables, while
//       the data section usually holds initialized global variables
// Note: the .bss section should come before the .text section in the source.

.data // the data that follows goes into the data section

// _objc_restartableRanges is used by
// method dispatch caching code to figure out whether
// any threads are actively in the cache for dispatching. 
// The labels surround the asm code that do cache lookups.
// The tables are zero-terminated.

// Below, six private RestartableEntries are defined,
// the six "functions" related to message sending.

.macro RestartableEntry
#if __LP64__
    // On 64-bit systems,
    // .quad defines an 8-byte (double-word) value.
    // Labels that start with L are local labels, usable only within a function.
    .quad    LLookupStart$0  
#else
    .long    LLookupStart$0 // .long defines a 4-byte integer
    .long    0 // padded with 0
#endif

    .short    LLookupEnd$0 - LLookupStart$0 // .short defines a 2-byte integer
    .short    LLookupRecover$0 - LLookupStart$0
    .long    0 // padded with 0
    // .endmacro ends the macro definition; RestartableEntry is used for the
    // declarations below (together with the padding, a RestartableEntry is
    // exactly 16 bytes)
.endmacro 

    .align 4 // align to 2^4 = 16 bytes
    .private_extern _objc_restartableRanges // a private external symbol
_objc_restartableRanges:
    
    // Define six private RestartableEntry records, one for each function used
    // in everyday message sending; their implementations are read line by line below.
    
    RestartableEntry _cache_getImp 
    RestartableEntry _objc_msgSend
    RestartableEntry _objc_msgSendSuper
    RestartableEntry _objc_msgSendSuper2
    RestartableEntry _objc_msgLookup
    RestartableEntry _objc_msgLookupSuper2
    
    // .fill repeat, size, value
    // size and value are optional, defaulting to 1 and 0 respectively:
    // fill everything with 0
    
    .fill    16, 1, 0

// What follows are C #define macros mixed into the assembly

/* objc_super parameter to sendSuper */ 
// objc_super here is defined in Public Headers/message.h:
// struct objc_super has two members, id receiver and Class super_class (Class class under objc2)

#define RECEIVER         0

// __SIZEOF_POINTER__ is predefined by the compiler;
// the size of a pointer is 8 bytes here
#define CLASS            __SIZEOF_POINTER__

/* Selected field offsets in class structure */

// Members of the objc_class structure:
// we know its first member variable is isa_t isa, inherited from objc_object,
// followed by Class superclass and cache_t cache,
// so the superclass offset is 8 bytes and the cache offset is 16 bytes

#define SUPERCLASS       __SIZEOF_POINTER__
#define CACHE            (2 * __SIZEOF_POINTER__)

/* Selected field offsets in method structure */

// This corresponds to the method_t structure, which has three members:
// SEL name, const char *types, MethodListIMP imp.
// name is at offset 0 (SEL is really an 8-byte pointer, so types is at offset 8),
// and imp is at offset 2 * 8 = 16

#define METHOD_NAME      0
#define METHOD_TYPES     __SIZEOF_POINTER__
#define METHOD_IMP       (2 * __SIZEOF_POINTER__)

// BUCKET_SIZE is the size of bucket_t: its two members _imp and _sel
// are 8 bytes each, so it is 16 bytes
#define BUCKET_SIZE      (2 * __SIZEOF_POINTER__)
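A quick hypothetical check of those offsets using C stand-in layouts matching the descriptions above (these structs are illustrative models, not the runtime's real definitions):

#include <stdio.h>
#include <stddef.h>

struct objc_class_sketch { void *isa; void *superclass; void *cache; };
struct method_t_sketch   { void *name; const char *types; void *imp; };

int main(void) {
    printf("SUPERCLASS   = %zu\n", offsetof(struct objc_class_sketch, superclass)); // 8
    printf("CACHE        = %zu\n", offsetof(struct objc_class_sketch, cache));      // 16
    printf("METHOD_TYPES = %zu\n", offsetof(struct method_t_sketch, types));        // 8
    printf("METHOD_IMP   = %zu\n", offsetof(struct method_t_sketch, imp));          // 16
    return 0;
}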

GetClassFromIsa_p16

Get the class pointer out of isa and place it in the general-purpose register p16.

/*
 * GetClassFromIsa_p16 src
 * src is a raw isa field. Sets p16 to the corresponding class pointer.
 *
 * The raw isa might be an indexed isa to be decoded,
 * or a packed isa that needs to be masked:
 * either the class pointer is extracted from the relevant bits via a mask,
 * or a class index is read from the relevant bits and looked up in the
 * global class table
 * (uintptr_t shiftcls : 33; vs. uintptr_t indexcls : 15; in ISA_BITFIELD).
 *
 * On exit:
 *   p16 is a class pointer
 *   x10 is clobbered
 */

// SUPPORT_INDEXED_ISA is not supported on either x86_64 or arm64;
// it is mainly used on watchOS (__arm64__ && !__LP64__) (armv7k or arm64_32)
#if SUPPORT_INDEXED_ISA 
    // The optimized isa stores indexcls
    .align 3 // align to 2^3 = 8 bytes
    .globl _objc_indexed_classes // define a global label _objc_indexed_classes
_objc_indexed_classes:

    // PTRSIZE is defined in arm64-asm.h: 8 under arm64, 4 under arm64_32,
    // i.e. the width of a pointer in bytes.
    // ISA_INDEX_COUNT is defined in isa.h:
    
    // #define ISA_INDEX_BITS 15
    // #define ISA_INDEX_COUNT (1 << ISA_INDEX_BITS)
    
    // uintptr_t nonpointer : 1;
    // uintptr_t has_assoc : 1;
    // uintptr_t indexcls : 15;
    // ...
    // indexcls occupies 15 bits starting at bit 2 (ISA_INDEX_SHIFT)
    
    // .fill repeat, size, value
    // size and value are optional, defaulting to 1 and 0 respectively:
    // fill everything with 0
    
    .fill ISA_INDEX_COUNT, PTRSIZE, 0
#endif

// The assembler macro GetClassFromIsa_p16
.macro GetClassFromIsa_p16 /* src */

// Below are the three familiar cases:
// 1. isa stores an index of the class behind a mask
// 2. isa stores a pointer to the class behind a mask
// 3. isa is a raw class pointer

// SUPPORT_INDEXED_ISA is not supported on either x86_64 or arm64;
// it is mainly used on watchOS (__arm64__ && !__LP64__) (armv7k or arm64_32)
#if SUPPORT_INDEXED_ISA
    // Indexed isa
    // isa holds an index of the class
    
    // $<number> is a macro parameter: when the macro is expanded it is
    // replaced with the corresponding argument, like a formal parameter
    // of a function (default values can be given at definition time).
    // This is just like a C macro with arguments:
    // $0 is the macro's first argument.
    
    // Move $0 (the isa_t / Class isa) into p16
    
    mov    p16, $0            // optimistically set dst = src
    
    // #define ISA_INDEX_IS_NPI_BIT 0
    // defined in isa.h
    
    // The tbz tests bit 0 of p16, which is exactly
    // uintptr_t nonpointer : 1;
    // in our ISA_BITFIELD: 1 means an optimized isa, 0 means a raw pointer.
    
    // TBNZ X1, #3, label // jump to label if X1[3] != 0
    // TBZ  X1, #3, label // jump to label if X1[3] == 0
    
    // If p16[0] == 0, p16 already holds a raw (non-indexed) class pointer,
    // so jump to label 1 and end the macro
    
    tbz    p16, #ISA_INDEX_IS_NPI_BIT, 1f    // done if not non-pointer isa
    
    // The isa in p16 is indexed:
    // fetch the class pointer from the global class table using the index in isa.
    
    // ADR
    // Purpose: short-range address load. ADR adds a signed 21-bit PC-relative
    // offset to the PC and writes the result to a general-purpose register;
    // it can address any byte within +/- 1MB.
    
    // ADRP
    // Purpose: long-range, page-based address load. ADRP computes PC + imm
    // (offset), finds the 4KB page that the label lives in,
    // and yields the page's base address, to be combined with a page offset.
    
    // Load the base address of the page containing _objc_indexed_classes into x10
    adrp    x10, _objc_indexed_classes@PAGE
    
    // x10 = x10 + _objc_indexed_classes (offset within the page)
    add    x10, x10, _objc_indexed_classes@PAGEOFF
    
    // Unsigned bit-field extract instruction:
    // UBFX Wd, Wn, #lsb, #width ; 32-bit
    // UBFX Xd, Xn, #lsb, #width ; 64-bit
    // Extract width bits of Wn starting at bit lsb, zero-extend,
    // and place the result in Wd.
    
    // #define ISA_INDEX_SHIFT 2
    // #define ISA_INDEX_BITS 15
    
    // Starting at bit ISA_INDEX_SHIFT of p16, extract ISA_INDEX_BITS bits
    // into register p16, filling the remaining bits with zeros:
    // this pulls indexcls out of the bitfield
    
    ubfx    p16, p16, #ISA_INDEX_SHIFT, #ISA_INDEX_BITS  // extract index
    
    // Under __LP64__:  #define PTRSHIFT 3 // 1<<PTRSHIFT == PTRSIZE (8)
    // Under !__LP64__: #define PTRSHIFT 2 // 1<<PTRSHIFT == PTRSIZE (4)
    
    // Under __LP64__:  #define UXTP UXTX
    // Under !__LP64__: #define UXTP UXTW
    // The extend/shift scales the index in p16 by the pointer size,
    // so the load reads from x10 + (p16 << PTRSHIFT) into p16.
    // (So this is where the global class table lives.)
    
    // Load the class from the array into p16
    ldr    p16, [x10, p16, UXTP #PTRSHIFT]    // load class from array
    
1:  // the local label targeted by the tbz above (1f = forward reference to label 1)

#elif __LP64__
    // isa stores the class pointer behind a mask
    // #define ISA_MASK 0x0000000ffffffff8ULL
    // AND ISA_MASK with $0 (isa) to extract the class pointer into p16,
    // the same as our (Class)(isa.bits & ISA_MASK)

    // 64-bit packed isa
    and    p16, $0, #ISA_MASK

#else
    // The last case: isa is the raw class pointer
    
    // 32-bit raw isa
    // put isa into p16 directly
    mov    p16, $0

#endif

.endmacro // end of the macro definition
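For the common packed-isa case, a rough C equivalent of what the macro computes (ISA_MASK is the arm64 __LP64__ value quoted above; ClassSketch is a stand-in type):

#include <stdint.h>

#define ISA_MASK 0x0000000ffffffff8ULL  // from isa.h (arm64 __LP64__)

typedef struct objc_class_sketch *ClassSketch; // stand-in for Class

// what "and p16, $0, #ISA_MASK" computes
static inline ClassSketch class_from_packed_isa(uintptr_t isa_bits) {
    return (ClassSketch)(isa_bits & ISA_MASK);
}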

ENTRY/STATIC_ENTRY/END_ENTRY

/*
 * ENTRY functionName
 * STATIC_ENTRY functionName
 * END_ENTRY functionName
 */

// The assembler macro ENTRY defines a 32-byte-aligned global function
// in the text section; "$0:" also produces a label at the function's entry.
// ($0 is the macro's first argument, $1 the second, and so on.)

.macro ENTRY /* name */
    .text // .text defines a code section; this is where the processor executes. Required by GCC.
    .align 5 // align to 2^5 = 32 bytes
    .globl    $0 // the .globl keyword makes a symbol visible to the linker,
                 // so it can be used by other linked object modules.
                 // It tells the assembler that what follows is a globally
                 // visible name (a variable or a function name).
                 
                 // Here .globl $0 means every function declared with ENTRY
                 // is globally visible.
                 
                 // 00001:
                 // 00002: .text
                 // 00003: .global _start
                 // 00004:
                 // 00005: _start:
                 
                 // .global _start pairs with _start:,
                 // defining a global flag for the code start address _start.
                 // _start is the start address of a function as well as of
                 // the compiled and linked program. Since the program is
                 // loaded by a loader, a function named _start must be
                 // findable, so _start must be defined globally: it then
                 // exists in the global symbol table and can be found by
                 // other programs such as the loader.
                 
                 // .global _start makes _start a visible identifier, so the
                 // linker knows where to jump to start execution; Linux
                 // looks for the _start label as the default entry point.
                 
                 // (extern XXX, by contrast, declares XXX as an external
                 // function that can be found across files when called.)
                 
// In the GNU ARM compiler environment, assembly uses the .global
// pseudo-operation to declare an assembly function as global, so it can be
// called by external functions; likewise, C programs use extern to declare
// functions implemented in assembly.

$0:
.endmacro

// STATIC_ENTRY
.macro STATIC_ENTRY /* name */ // same as above,
    .text
    .align 5
    .private_extern $0 // except that this is private_extern (a private function)
$0:
.endmacro

// END_ENTRY: end of an entry
.macro END_ENTRY /* name */
LExit$0: // just an LExit$0 label (labels beginning with L are local labels, usable only inside functions)
.endmacro

UNWIND

Notice below that each UNWIND comes right after an ENTRY/STATIC_ENTRY.

/*
 * UNWIND name, flags
 * Unwind info generation
 */
.macro UNWIND
    .section __LD,__compact_unwind,regular,debug
    
    // Under __LP64__:  #define PTR .quad
    // Under !__LP64__: #define PTR .long
    
    PTR $0 // .quad defines an 8-byte value / .long defines a 4-byte value
    
    .set  LUnwind$0, LExit$0 - $0 // .set assigns a value to a global or local variable
    
    // .long defines a 4-byte integer (labels beginning with L are local labels)
    .long LUnwind$0  
    .long $1 // $1 is the macro's second argument (the flags)
    
    PTR 0     /* no personality */
    
    PTR 0  /* no LSDA */ 
    
    .text // back to the code section
.endmacro

// hard-coded flag values:
#define NoFrame 0x02000000  // no frame, no SP adjustment
#define FrameWithNoSaves 0x04000000  // frame, no non-volatile saves

TailCallCachedImp

The Project Headers/arm64-asm.h file defines several assembly macros to handle the different outcomes of CacheLookup NORMAL|GETIMP|LOOKUP. When a cache hit occurs in the NORMAL case, TailCallCachedImp is used: it authenticates the cached imp and jumps to it.

.macro TailCallCachedImp
    
    // EOR (exclusive or)
    // Format: EOR{cond}{S} Rd, Rn, operand
    // EOR performs a bitwise logical XOR of Rn and operand
    // (same bits give 0, different bits give 1) and stores the result in Rd.
    
    // $0 = cached imp, $1 = address of cached imp, $2 = SEL, $3 = isa
    
    // XOR the SEL into the address of the cached imp and put the result in $1
    eor    $1, $1, $2    // mix SEL into ptrauth modifier
    
    // XOR the isa into $1 as well
    eor    $1, $1, $3  // mix isa into ptrauth modifier
    
    // brab: unconditional branch to the address in $0,
    // authenticating it with key B and the modifier in $1
    brab    $0, $1
.endmacro
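What the two EORs compute, written as plain C (a sketch: the result is the ptrauth modifier used to authenticate the cached imp; the names are made up):

#include <stdint.h>

// $1 = address of the cached imp, $2 = SEL, $3 = isa, as in the macro above
static inline uintptr_t ptrauth_modifier(uintptr_t bucket_addr,
                                         uintptr_t sel,
                                         uintptr_t isa) {
    // Mixing the bucket address, SEL and isa means an imp signed for one
    // (class, selector, slot) combination cannot be replayed for another.
    return (bucket_addr ^ sel) ^ isa;
}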

AuthAndResignAsIMP

AuthAndResignAsIMP authenticates the cached imp and re-signs it as a plain IMP to be returned.

.macro AuthAndResignAsIMP
    // $0 = cached imp, $1 = address of cached imp,
    // $2 = SEL, $3 = isa
    // Note: assumes the IMP is not nil
    
    // XOR $1 and $2 bitwise and place the result in $1
    eor    $1, $1, $2    // mix SEL into ptrauth modifier
    
    // XOR $1 and $3 bitwise and place the result in $1
    eor    $1, $1, $3  // mix isa into ptrauth modifier
    
    // autib authenticates an instruction address using the modifier and key B
    autib    $0, $1        // authenticate cached imp
    
    // XZR is the zero register;
    // loading through $0 into xzr crashes if authentication failed
    ldr    xzr, [$0]    
    
    // re-sign with instruction key A and a zero modifier, i.e. as a plain IMP
    paciza    $0        // resign cached imp as IMP
.endmacro

CacheLookup

/*
 * CacheLookup NORMAL|GETIMP|LOOKUP <function>
 * (three different purposes: LOOKUP performs the lookup, GETIMP fetches
 * the IMP, NORMAL calls or returns the IMP)
 *
 * Locate the implementation for a selector in a class's method cache.
 *
 * When this is used in a function that doesn't hold the runtime lock,
 * this represents the critical section that may access dead memory.
 * If the kernel causes one of these functions to go down the recovery
 * path, we pretend the lookup failed by jumping to the JumpMiss branch.
 *
 * Takes:
 *   x1 = selector
 *   x16 = class to be searched
 *
 * Kills:
 *   x9, x10, x11, x12, x17
 *
 * On exit:
 *   (found)     calls or returns IMP, with x16 = class, x17 = IMP
 *   (not found) jumps to LCacheMiss
 */

#define NORMAL 0
#define GETIMP 1
#define LOOKUP 2
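
Before reading the assembly line by line, here is a C-level sketch of what the arm64 CacheLookup does in the NORMAL case (a toy model with made-up names, using the bucket layout discussed earlier, not the runtime's code):

#include <stdint.h>

typedef struct { uintptr_t sel; uintptr_t imp; } bucket_sketch;

// Returns the cached imp, or 0 for a miss
// (a miss corresponds to CheckMiss/JumpMiss -> __objc_msgSend_uncached).
static uintptr_t cache_lookup_sketch(bucket_sketch *buckets, uint32_t mask,
                                     uintptr_t sel) {
    uint32_t begin = (uint32_t)(sel & mask);  // x12 = _cmd & mask
    uint32_t i = begin;
    do {
        if (buckets[i].sel == sel)            // cmp p9, p1 ... CacheHit
            return buckets[i].imp;
        if (buckets[i].sel == 0)              // CheckMiss: empty slot, not cached
            return 0;
        i = i ? i - 1 : mask;                 // arm64 scans downward and wraps
    } while (i != begin);
    return 0;                                 // scanned everything: miss
}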

// CacheHit handles a cache hit, with different behavior for each mode.

// CacheHit: x17 = cached IMP, x12 = address of cached IMP, x1 = SEL, x16 = isa

// Cache hit macro:
.macro CacheHit

.if $0 == NORMAL
    // NORMAL: the function was found in the cache during normal execution
    
    // TailCallCachedImp is defined in arm64-asm.h:
    // authenticate and call imp
    TailCallCachedImp x17, x12, x1, x16    
    
.elseif $0 == GETIMP
    // GETIMP: only fetch the IMP from the cache
    
    // p17 holds the cached IMP; move it into p0
    mov    p0, p17
    
    // CBZ: compare and branch if zero
    // CBNZ: compare and branch if non-zero
    // Syntax:
    //   CBZ Rn, label
    //   CBNZ Rn, label
    // Rn holds the operand, label is the branch target
    
    // If p0 is 0, jump forward to label 9, which just executes ret
    cbz    p0, 9f            // don't ptrauth a nil imp
    
    // AuthAndResignAsIMP is defined in arm64-asm.h:
    // authenticate the imp and re-sign it as a plain IMP
    AuthAndResignAsIMP x0, x12, x1, x16    
    
    // return IMP
9:    ret                // return IMP
.elseif $0 == LOOKUP
    // LOOKUP: perform the lookup
    
    // No nil check for ptrauth: the caller would
    // crash anyway when they jump to a nil IMP.
    // We don't care if that jump also fails ptrauth.
    
    // Authenticate the imp and re-sign it as a plain IMP
    AuthAndResignAsIMP x17, x12, x1, x16 
    
    // return imp
    ret                // return imp via x17
.else

// .abort stops the assembly.
// (Much like the Linux kernel apologetically printing "Oops" when a fatal
// problem occurs: on a kernel panic it prints Oops together with the current
// register state, stack contents, and a complete call trace to help locate
// the error.)
.abort oops

.endif
.endmacro // end of the CacheHit assembly macro definition

// CheckMiss handles a cache miss, with different behavior for each mode.

// Cache miss macro:
.macro CheckMiss
    // miss if bucket->sel == 0
    // i.e. the probed bucket's sel is 0 while scanning the cache
.if $0 == GETIMP
    // GETIMP: only fetch the IMP from the cache

    // CBZ: compare and branch if zero
    // CBNZ: compare and branch if non-zero
    // Syntax: CBZ/CBNZ Rn, label
    // (Rn holds the operand, label is the branch target)

    // If p9 is 0, jump to LGetImpMiss.
    // Reaching a bucket_t whose sel is 0 means the +1/-1 hash probing
    // ran into an empty slot, i.e. the sel is not in the hash array at all.
    
    // LGetImpMiss simply puts 0 into p0 and rets
    cbz    p9, LGetImpMiss
    
.elseif $0 == NORMAL

    // If p9 is 0, jump to __objc_msgSend_uncached.
    // (Same reasoning: an empty slot means the sel is not cached.)
    cbz    p9, __objc_msgSend_uncached
    
.elseif $0 == LOOKUP

    // If p9 is 0, jump to __objc_msgLookup_uncached.
    // (Same reasoning: an empty slot means the sel is not cached.)
    cbz    p9, __objc_msgLookup_uncached
.else

// .abort stops the assembly (see the note in CacheHit above)
.abort oops

.endif
.endmacro // end of the CheckMiss assembly macro definition

// JumpMiss jumps to the appropriate handler when the cache misses.

.macro JumpMiss

.if $0 == GETIMP
    // Jump to the label LGetImpMiss
    b    LGetImpMiss
    
.elseif $0 == NORMAL
    // Jump to the label __objc_msgSend_uncached
    b    __objc_msgSend_uncached
    
.elseif $0 == LOOKUP
    // Jump to the label __objc_msgLookup_uncached
    b    __objc_msgLookup_uncached
    
.else

// .abort stops the assembly (see the note in CacheHit above)
.abort oops

.endif
.endmacro // end of the JumpMiss assembly macro definition

// CacheLookup performs the assembly-level cache lookup
.macro CacheLookup
    //
    // Restart protocol:
    //
    // As soon as we're past the LLookupStart$1 label we 
    // may have loaded an invalid cache pointer or mask.
    // 
    // When task_restartable_ranges_synchronize() is called
    // (or when a signal hits us) before we're past LLookupEnd$1,
    // then our PC will be reset to LLookupRecover$1 which forcefully
    // jumps to the cache-miss codepath, which has the following
    // requirements:
    //
    // GETIMP:
    //   The cache-miss is just returning NULL (setting x0 to 0)
    //
    // NORMAL and LOOKUP:
    //   - x0 contains the receiver
    //   - x1 contains the selector
    //   - x16 contains the isa
    //   - other registers are set as per calling conventions
    //
LLookupStart$1:

    // p1 = SEL, p16 = isa
    // #define CACHE (2 * __SIZEOF_POINTER__)
    // [x16, #CACHE] is the memory at address x16 (isa) + 16.
    
    // (objc_class's first member is isa_t isa,
    // its second member variable is Class superclass,
    // and its third is cache_t cache; from their types, cache is
    // exactly 16 bytes away from isa.)
    
    // Read the cache's contents into p11 (8 bytes are read at once).
    // Under __arm64__ && __LP64__ the top 16 bits are the mask
    // and the low 48 bits are the buckets pointer.
    ldr    p11, [x16, #CACHE]                // p11 = mask|buckets

// Handle the mask according to how it is stored
#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16
    // p11 & 0x0000ffffffffffff keeps the low 48 bits of p11, i.e. buckets,
    // and stores them in p10
    and    p10, p11, #0x0000ffffffffffff    // p10 = buckets
    
    // LSR: logical shift right.
    // p11, LSR #48 shifts _maskAndBuckets right by 48 bits, yielding _mask.
    // and is the same as C's "&".
    // p1 is the SEL; AND it with the _mask obtained above 👆 to get the
    // SEL's hash, stored in p12.
    and    p12, p1, p11, LSR #48        // x12 = _cmd & mask
    
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4
    // The mask shift is in the low 4 bits
    
    and    p10, p11, #~0xf            // p10 = buckets
    and    p11, p11, #0xf            // p11 = maskShift
    mov    p12, #0xffff
    lsr    p11, p12, p11                // p11 = mask = 0xffff >> p11
    
    // Store the SEL's hash in p12, as above
    and    p12, p1, p11                // x12 = _cmd & mask
#else

// Unsupported cache mask storage for ARM64
#error Unsupported cache mask storage for ARM64.

#endif
    // The PTRSHIFT macro is defined in Project Headers/arm64-asm.h:
    // #if __arm64__
    // #if __LP64__ // 64-bit architecture
    // #define PTRSHIFT 3 // 1<<PTRSHIFT == PTRSIZE
    // // "p" registers are pointer-sized
    // // true arm64
    // #else
    // // arm64_32 // 32-bit architecture
    // #define PTRSHIFT 2 // 1<<PTRSHIFT == PTRSIZE
    // // "p" registers are pointer-sized
    // // arm64_32
    // #endif

    // LSL: logical shift left.
    // p10 is buckets,
    // p12 is (_cmd & mask), the hash index.
    // p12 is first shifted left by (1+PTRSHIFT) = 4 bits (multiplying the
    // index by 16, the size of a bucket_t), then added to p10, and the
    // result is stored in p12:
    // p12 = buckets + ((_cmd & mask) << (1+PTRSHIFT))
    
    // (The left shift converts the index into a byte distance from the
    // start of buckets; each bucket_t is two 8-byte pointers = 16 bytes.)
    // So p12 holds the address of the bucket_t at the hash index in buckets.
    add    p12, p10, p12, LSL #(1+PTRSHIFT)
                     // p12 = buckets + ((_cmd & mask) << (1+PTRSHIFT))

    // LDR loads the contents of storage at a specified address into a Register
    // STR stores the contents of the register into memory
    // Examples:
    // LDR r1, =0x123456789 ; load a wide immediate/address: r1 = 0x123456789
    
    // LDR r1, [r2, #4]
    // Read the data at address r2 + 4 into r1, equivalent to a C dereference
    
    // LDR r1, [r2], #4
    // Post-indexed: read the data at address r2 into r1, then r2 = r2 + 4
    
    // STR r1, [r2, #4]
    // Store the value of r1 into memory at address r2 + 4
    
    // STR r1, [r2], #4
    // Post-indexed: store r1 into memory at address r2, then r2 = r2 + 4
    
    // LDP/STP is a derivative of LDR/STR. It can read and write two registers at the same time. LDR/STR can read and write only one
    // Example: LDP x1, x0, [sp, #0x10] loads the two values at sp + 0x10 into x1 and x0
    
    // p10 is buckets; p12 is the address of the bucket_t that the SEL hashed to
    // bucket_t's _sel and _imp are ordered differently per platform:
    // under __arm64__ _imp comes first and _sel second; on other platforms it is the other way around
    
    // Load _imp into p17 and _sel into p9
    ldp    p17, p9, [x12]        // {imp, sel} = *bucket 
    
    // CMP: compare instruction
    // p1 = SEL (p1 has not been changed since the beginning)
    // Check whether the _sel of the bucket_t found at the hashed slot is our SEL.
    // Because of hash collisions it may differ, in which case we must probe
    // the neighbouring buckets linearly; the probe direction depends on the
    // platform (cache_next differs per platform).
    // Under __arm64__ the probe moves backward: i ? i - 1 : mask
    
    // Compare p1 with p9; p9 is the _sel of the bucket_t found in the hash table,
    // p1 is the sel passed in (they are equal if there was no hash collision)
1:  cmp    p9, p1            // if (bucket->sel != _cmd)

    // If p9 and p1 are not equal, jump to label 2 (hash probing)
    b.ne    2f            // scan more
    
    // If p9 and p1 are equal we have a cache hit: call CacheHit
    CacheHit $0            // call or return imp
    
2:  // not hit: p12 = not-hit bucket
    // CheckMiss $0 -> checks whether p9 (_sel) is 0;
    // an unused bucket_t slot has _sel == 0
    // If it is 0, there is no cached method for this sel;
    // go to the class's method list to look the method up
    // If it is not 0, a hash collision occurred and the bucket_t may live elsewhere
    
    // (CheckMiss $0 checks whether sel is absent from the current cache)
    CheckMiss $0            // miss if bucket->sel == 0
    
    // Check whether probing has already reached the first bucket:
    // compare p12 (the current bucket) with p10 (buckets)
    cmp    p12, p10        // wrap if bucket == buckets
    
    // If p12 equals p10, the collision handling has reached the head of
    // the hash array; jump to label 3 below and continue probing from the
    // end of the hash array
    b.eq    3f
    
    // Otherwise keep probing toward lower addresses
    // #define BUCKET_SIZE (2 * __SIZEOF_POINTER__)
    // i.e. the size of a bucket_t
    // Move x12 back to the previous bucket_t (pre-indexed), then load its _imp into p17 and _sel into p9
    ldp    p17, p9, [x12, #-BUCKET_SIZE]!    // {imp, sel} = *--bucket
    
    // Jump back to label 1 and compare this bucket's _sel with _cmd again
    // (p9 holds the _sel of the bucket_t just loaded from the hash array;
    // p0-p7 hold the function arguments: p0 is self and p1 is _cmd)
    b    1b            // loop

3:   // wrap: p12 = first bucket, w11 = mask
#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16
    // p11 is _maskAndBuckets
    // Shifting p11 logically right by 44 bits is equivalent to first
    // shifting right by 48 bits (yielding mask) and then left by 4 bits
    // (scaling by sizeof(bucket_t)).
    // That byte offset is the total distance the buckets pointer must move
    // to reach the end of the hash array (mask = capacity - 1).
    
    add    p12, p12, p11, LSR #(48 - (1+PTRSHIFT))
                    // p12 = buckets + (mask << 1+PTRSHIFT)
                    // i.e. p12 now points to the last bucket
                    
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4
    // With the low-4-bit layout p11 already holds the mask, so shift it left directly
    add    p12, p12, p11, LSL #(1+PTRSHIFT)
                    // p12 = buckets + (mask << 1+PTRSHIFT)
#else

#error Unsupported cache mask storage for ARM64.

#endif

    // Clone scanning loop to miss instead of hang when cache is corrupt.
    // (The scan loop is cloned so that a corrupt cache produces a miss
    // rather than an infinite loop.)
    // The slow path may detect any corruption and halt later.
    
    // Scan the cache again, this time starting from the last bucket
    
    // Load the bucket_t at x12: _imp into p17 and _sel into p9
    ldp    p17, p9, [x12]        // {imp, sel} = *bucket
    
    // Compare the bucket's _sel (p9) with p1 (_cmd)
1:  cmp    p9, p1            // if (bucket->sel != _cmd)

    // If they are not equal, jump to label 2
    b.ne    2f            // scan more
    
    // If they are equal, the cache hit; call or return the imp
    CacheHit $0            // call or return imp
    
2:  // not hit: p12 = not-hit bucket
    // If the bucket's _sel is 0, sel is not in the cache array
    CheckMiss $0            // miss if bucket->sel == 0
    
    // Check whether p12 (the current bucket) equals p10 (buckets):
    // if so, the whole array has been scanned without a hit
    cmp    p12, p10        // wrap if bucket == buckets
    
    // If they are equal, jump to label 3 below
    b.eq    3f
    
    // Move x12 back by BUCKET_SIZE to the previous bucket_t
    // and load its _imp into p17 and _sel into p9
    ldp    p17, p9, [x12, #-BUCKET_SIZE]!    // {imp, sel} = *--bucket
    
    // Jump back to label 1 and compare _sel with _cmd again
    b    1b            // loop

LLookupEnd$1:    // matches LLookupStart$1 above

LLookupRecover$1:
3:    // double wrap
// Wrapped around twice without a hit; JumpMiss $0 jumps to __objc_msgSend_uncached
    JumpMiss $0

.endmacro
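
To make the two mask-storage layouts above concrete, here is a minimal, self-contained C sketch of what the bit operations compute. This is not the runtime's code: decode_high16 and decode_low4 are invented names, the addresses are made up, and it assumes the LP64 layouts described in the comments above (top-16-bit mask vs. low-4-bit mask shift, and a 16-byte bucket_t).

#include <stdint.h>
#include <stdio.h>

// Hypothetical illustration of CACHE_MASK_STORAGE_HIGH_16:
// top 16 bits of the word are the mask, low 48 bits are the buckets pointer.
static void decode_high16(uintptr_t maskAndBuckets, uintptr_t sel)
{
    uintptr_t buckets = maskAndBuckets & 0x0000ffffffffffffUL; // and p10, p11, #0x0000ffffffffffff
    uintptr_t mask    = maskAndBuckets >> 48;                  // p11, LSR #48
    uintptr_t index   = sel & mask;                            // and p12, p1, p11, LSR #48
    uintptr_t bucket  = buckets + (index << 4);                // add p12, p10, p12, LSL #(1+PTRSHIFT)
    printf("buckets=%#lx mask=%lu index=%lu bucket=%#lx\n",
           (unsigned long)buckets, (unsigned long)mask,
           (unsigned long)index, (unsigned long)bucket);
}

// Hypothetical illustration of CACHE_MASK_STORAGE_LOW_4:
// the low 4 bits hold a shift amount, and mask = 0xffff >> shift.
static void decode_low4(uintptr_t maskAndBuckets, uintptr_t sel)
{
    uintptr_t buckets   = maskAndBuckets & ~(uintptr_t)0xf;    // and p10, p11, #~0xf
    uintptr_t maskShift = maskAndBuckets & 0xf;                // and p11, p11, #0xf
    uintptr_t mask      = 0xffffUL >> maskShift;               // lsr p11, p12, p11
    uintptr_t index     = sel & mask;                          // and p12, p1, p11
    printf("buckets=%#lx mask=%lu index=%lu\n",
           (unsigned long)buckets, (unsigned long)mask, (unsigned long)index);
}

int main(void)
{
    // A made-up 8-slot cache at a made-up (aligned) address, mask = 7.
    decode_high16(((uintptr_t)7 << 48) | 0x100000000UL, 0x1001);
    decode_low4(0x100000000UL | 13, 0x1001); // 0xffff >> 13 == 7
    return 0;
}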
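The probe loop itself, including the wrap-around at label 3 and the cloned second scan, boils down to roughly the following sketch. Again, this is a hypothetical reconstruction rather than the runtime's implementation: fake_bucket_t and cache_lookup are made-up names, and the layout assumes __arm64__, where _imp precedes _sel and an unused slot has _sel == 0.

#include <stdint.h>
#include <stdio.h>

typedef struct {
    uintptr_t imp;   // _imp first under __arm64__, as noted above
    uintptr_t sel;   // _sel second; 0 means the slot is unused
} fake_bucket_t;

// Backward linear probe with one wrap, mirroring labels 1/2/3 above.
static uintptr_t cache_lookup(fake_bucket_t *buckets, uintptr_t mask, uintptr_t cmd)
{
    fake_bucket_t *b = buckets + (cmd & mask);       // start at the hashed slot
    for (int pass = 0; pass < 2; pass++) {           // the scan loop is cloned once
        for (;;) {
            if (b->sel == cmd) return b->imp;        // CacheHit
            if (b->sel == 0)   return 0;             // CheckMiss: sel not cached
            if (b == buckets)  break;                // reached the first bucket: wrap
            b--;                                     // ldp ..., [x12, #-BUCKET_SIZE]!
        }
        b = buckets + mask;                          // label 3: jump to the last bucket
    }
    return 0;                                        // double wrap: JumpMiss -> uncached path
}

int main(void)
{
    fake_bucket_t table[8] = {0};
    uintptr_t sel = 0x1001;
    // Simulate a collision: sel's bucket sits one slot below its hashed slot.
    table[(sel & 7) - 1] = (fake_bucket_t){ .imp = 0xcafe, .sel = sel };
    table[sel & 7]       = (fake_bucket_t){ .imp = 0xbeef, .sel = 0x2001 };
    printf("imp = %#lx\n", (unsigned long)cache_lookup(table, 7, sel)); // prints 0xcafe
    return 0;
}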

Due to length limits, the rest of the content is covered in the next article.

ARM’s stack is a full-descending stack that grows downward: the bottom of the stack sits at a high address and the top of the stack at a low address, which is why the stack area is generally placed at the top of the process's memory layout.
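As a quick sanity check of that direction of growth (a rough demo, not ARM-specific; compile without optimizations so the recursive frames stay distinct):

#include <stdio.h>

// Print a local's address at increasing call depth; on a descending stack
// each deeper frame's local lives at a lower address.
static void probe(int depth)
{
    int local;
    printf("depth %d: &local = %p\n", depth, (void *)&local);
    if (depth < 3)
        probe(depth + 1);
}

int main(void)
{
    probe(0);
    return 0;
}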
