1. The nature of method calls in OC

First, write the following code in main.m:

#import <Foundation/Foundation.h>

@interface Person : NSObject
@property (nonatomic, copy) NSString *name;
- (void)saySomething:(NSString *)worlds;
+ (void)think;
@end

@implementation Person
- (void)saySomething:(NSString *)worlds {
    NSLog(@"%@", worlds);
}
+ (void)think {
    NSLog(@"think");
}
@end

@interface Student : Person
- (void)sleep;
+ (void)drinking;
@end

@implementation Student
- (void)sleep {
    NSLog(@"sleep...");
}
+ (void)drinking {
    NSLog(@"drinking");
}
@end

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        // Group 1: create the objects
        Person *p = [Person alloc];
        Student *s = [Student alloc];
        // Group 2: the parent class calls its instance method and class method
        [p saySomething:@" hahaha "];
        [Person think];
        // Group 3: the subclass calls its own instance method and class method
        [s sleep];
        [Student drinking];
        // Group 4: the subclass calls the superclass's instance method and class method
        [s saySomething:@" whooooooooooooooooo "];
        [Student think];
    }
    return 0;
}

Then use clang in the terminal to rewrite main.m as a C++ file (for example, clang -rewrite-objc main.m -o main.cpp) and look at the code generated for each group of method calls.

// Group 1
Person *p = ((Person *(*)(id, SEL))(void *)objc_msgSend)((id)objc_getClass("Person"), sel_registerName("alloc"));
Student *s = ((Student *(*)(id, SEL))(void *)objc_msgSend)((id)objc_getClass("Student"), sel_registerName("alloc"));
// Group 2
((void (*)(id, SEL, NSString *__strong))(void *)objc_msgSend)((id)p, sel_registerName("saySomething:"), (NSString *)&__NSConstantStringImpl__var_folders_99_49qsqpv90l58q7813rrhltjc0000gn_T_main_a60d73_mi_4);
((void (*)(id, SEL))(void *)objc_msgSend)((id)objc_getClass("Person"), sel_registerName("think"));
// Group 3
((void (*)(id, SEL))(void *)objc_msgSend)((id)s, sel_registerName("sleep"));
((void (*)(id, SEL))(void *)objc_msgSend)((id)objc_getClass("Student"), sel_registerName("drinking"));
// Group 4
((void (*)(id, SEL, NSString *__strong))(void *)objc_msgSend)((id)s, sel_registerName("saySomething:"), (NSString *)&__NSConstantStringImpl__var_folders_99_49qsqpv90l58q7813rrhltjc0000gn_T_main_a60d73_mi_5);
((void (*)(id, SEL))(void *)objc_msgSend)((id)objc_getClass("Student"), sel_registerName("think"));
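One detail of the rewritten code is worth isolating: every call first casts objc_msgSend to a function pointer type that matches the method being invoked, and only then calls it. The plain C sketch below shows the same pattern without the runtime. The names generic_fn, find_function, add_ints and name_length are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* An untyped function pointer, playing the role of (void *)objc_msgSend. */
typedef void (*generic_fn)(void);

/* Two hypothetical implementations with different signatures. */
static int add_ints(int a, int b) { return a + b; }
static size_t name_length(const char *name) { return strlen(name); }

/* Hypothetical lookup by name; the result carries no signature information. */
static generic_fn find_function(const char *name) {
    if (strcmp(name, "add_ints") == 0) return (generic_fn)add_ints;
    if (strcmp(name, "name_length") == 0) return (generic_fn)name_length;
    return NULL;
}

/* The caller restores the real signature with a cast before calling,
   just as the rewritten code casts to void (*)(id, SEL, NSString *). */
static int call_add_ints(int a, int b) {
    int (*typed)(int, int) = (int (*)(int, int))find_function("add_ints");
    return typed(a, b);
}

static size_t call_name_length(const char *name) {
    size_t (*typed)(const char *) =
        (size_t (*)(const char *))find_function("name_length");
    return typed(name);
}
```

Calling call_add_ints(2, 3) goes through the untyped pointer and still passes both integers correctly, because the cast tells the compiler which calling convention to use.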

The alloc call, like every other call here, is compiled into a call to the objc_msgSend function, which message.h declares (in its non-strict form) as id objc_msgSend(id self, SEL op, ...).

As the declaration shows, objc_msgSend takes at least two arguments. The first is the message receiver, of the generic object pointer type id. The second is the selector, of type SEL. Any further arguments are those of the method itself.

From the rewritten source above, and from the bits field (of type class_data_bits_t) in the underlying objc_class, we know that instance methods are stored in the class's method list. So when an object calls one of its instance methods, what really happens is a call to objc_msgSend with the object's address and a sel as arguments. The object is essentially a struct whose first member is isa, so the object's address is also the address of the isa (objc_class *) pointer that leads to its class. The sel is obtained by calling the runtime function sel_registerName with the method name string.

Class methods are stored in the metaclass's method list. So calling a class method is a call to objc_msgSend with the address of the class object (which is also the address of the class's isa pointer, leading to the metaclass) and a sel, again obtained from sel_registerName with the class method's name.

At the bottom level, instance methods and class methods are the same thing: a sel (the selector) paired with an imp (the entry address of the function). One is stored in the class's method list, the other in the metaclass's method list.

The methods in main can also be called through the runtime API directly. First, import the objc header:

#import <objc/message.h>

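In current versions of Xcode, strict checking of objc_msgSend calls is enabled by default, so objc_msgSend is declared without parameters and calling it with arguments does not compile. To pass arguments directly, go to Build Settings and set Enable Strict Checking of objc_msgSend Calls to NO.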

The method calls in main can then be written with objc_msgSend directly, as shown below:

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        // Note: on arm64, cast objc_msgSend to the exact function pointer type
        // (as the rewritten C++ above does) instead of calling it uncast.
        // Group 1: create the objects
        Person *p = [Person alloc];
        Student *s = [Student alloc];
        // Group 2: the parent class calls its instance method and class method
        objc_msgSend(p, sel_registerName("saySomething:"), @" hahaha ");
        objc_msgSend(objc_getClass("Person"), sel_registerName("think"));
        // Group 3: the subclass calls its own instance method and class method
        objc_msgSend(s, @selector(sleep));
        objc_msgSend(objc_getClass("Student"), @selector(drinking));
        // Group 4: the subclass calls the superclass's instance method and class method
        objc_msgSend(s, sel_registerName("saySomething:"), @" saySomething ");
        objc_msgSend(object_getClass(s), sel_registerName("think"));
    }
    return 0;
}

Compile and run; each call should log the same output as its bracket-syntax counterpart.

In the fourth group of calls above, the subclass calls an instance method and a class method that it inherits from its superclass. Such calls can also be made through the objc_msgSendSuper function, which message.h declares as id objc_msgSendSuper(struct objc_super *super, SEL op, ...).

The first argument is a pointer to an objc_super struct, which has two fields. The first field, receiver, is a generic object pointer (id) to the message receiver. The second field is named class in Objective-C 1 and super_class in Objective-C 2. It is the class at which the method lookup starts, so it can be the class that implements the method or any subclass of that class, as follows:

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        Student *s = [Student alloc];

        // Instance method: the receiver is the object, the lookup starts at a class
        struct objc_super objS1;
        objS1.receiver = s;
        objS1.super_class = objc_getClass("Student");
        // or
        // objS1.super_class = objc_getClass("Person");
        objc_msgSendSuper(&objS1, sel_registerName("saySomething:"), @" la la la ");

        // Class method: the receiver is the class, the lookup starts at a metaclass
        struct objc_super objS2;
        objS2.receiver = object_getClass(s);
        objS2.super_class = objc_getMetaClass("Student");
        // or
        // objS2.super_class = objc_getMetaClass("Person");
        objc_msgSendSuper(&objS2, sel_registerName("think"));
    }
    return 0;
}

Execute the code; the console should show the saySomething: argument and then think being logged.
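To make the lookup rules of this section concrete, here is a minimal model in plain C, not the runtime's real data structures. It assumes a simplified layout in which every object and every class starts with an isa pointer, each class has a flat method array and a superclass pointer, and a metaclass's own isa is left NULL. All toy_ names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef const char *(*toy_imp)(void);

struct toy_method { const char *sel; toy_imp imp; };

struct toy_class {
    struct toy_class *isa;            /* class -> metaclass (NULL for a metaclass here) */
    struct toy_class *superclass;
    const struct toy_method *methods; /* instance methods, or class methods on a metaclass */
    size_t method_count;
};

struct toy_object { struct toy_class *isa; };

static const char *person_say(void)       { return "Person -saySomething:"; }
static const char *person_think(void)     { return "Person +think"; }
static const char *student_sleep(void)    { return "Student -sleep"; }
static const char *student_drinking(void) { return "Student +drinking"; }

static const struct toy_method person_instance[]  = {{"saySomething:", person_say}};
static const struct toy_method person_class[]     = {{"think", person_think}};
static const struct toy_method student_instance[] = {{"sleep", student_sleep}};
static const struct toy_method student_class[]    = {{"drinking", student_drinking}};

static struct toy_class PersonMeta  = { NULL, NULL, person_class, 1 };
static struct toy_class Person      = { &PersonMeta, NULL, person_instance, 1 };
static struct toy_class StudentMeta = { NULL, &PersonMeta, student_class, 1 };
static struct toy_class Student     = { &StudentMeta, &Person, student_instance, 1 };

/* Walk the superclass chain from `start`, the way a super send starts
   at the class stored in the second field of objc_super. */
static toy_imp toy_lookup_from(const struct toy_class *start, const char *sel) {
    for (const struct toy_class *c = start; c != NULL; c = c->superclass)
        for (size_t i = 0; i < c->method_count; i++)
            if (strcmp(c->methods[i].sel, sel) == 0) return c->methods[i].imp;
    return NULL;
}

/* Normal send: read the receiver's isa (its first word) and look up from there.
   An object's isa is its class; a class's isa is its metaclass. */
static const char *toy_send(const void *receiver, const char *sel) {
    const struct toy_class *cls = *(struct toy_class *const *)receiver;
    toy_imp imp = toy_lookup_from(cls, sel);
    return imp ? imp() : "unrecognized selector";
}
```

With struct toy_object s = { &Student }, toy_send(&s, "saySomething:") finds Person's instance method through the superclass chain, while toy_send(&Student, "think") finds Person's class method through the metaclass chain. Starting toy_lookup_from at either Student or Person yields the same imp for saySomething:, which is why both values of super_class work in the objc_msgSendSuper example.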

2. Explore the objc_msgSend function

From the rewritten source we know that calling a method in OC is, in essence, sending a message, that is, calling objc_msgSend. To study how objc_msgSend works we need its source, so search for objc_msgSend in the objc source code.

Apple's engineers wrote the runtime in C, C++, and assembly language, and objc_msgSend itself is implemented in assembly. You will therefore find only its declaration in the message.h header; the implementation lives in the assembly files (the .s files). There are several implementations of objc_msgSend, one per CPU architecture. As iOS developers we only need to focus on the implementation for physical devices, so we look at the assembly implementation of objc_msgSend in objc-msg-arm64.s and explore from there. First, we find ENTRY _objc_msgSend, as follows:

	ENTRY _objc_msgSend
	UNWIND _objc_msgSend, NoFrame
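        // p0 is the first argument of _objc_msgSend, i.e. the message receiver; cmp compares p0 with 0
	cmp	p0, #0			// nil check and tagged pointer check
        // If tagged pointers are supported: a tagged pointer has its most significant bit set, so as a signed value it is negative, i.e. less than 0
#if SUPPORT_TAGGED_POINTERS
        // Less than or equal: branch to LNilOrTagged (nil or tagged pointer). A Person object is clearly not a tagged pointer
	b.le	LNilOrTagged		//  (MSB tagged pointer looks negative)
#else
        // Equal: the receiver is nil. Here the receiver is not nil, so this branch is not taken
	b.eq	LReturnZero
#endif
        // ldr loads the value stored at the address held in x0 (the receiver) into p13; the first word of an object is its isa
	ldr	p13, [x0]		// p13 = isa
        // GetClassFromIsa_p16 is not defined here, so search the source for it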
	GetClassFromIsa_p16 p13, 1, x0	// p16 = class
        
        
        
// Result of the search: GetClassFromIsa_p16 is a macro. Here src = p13 (the raw isa value), needs_auth = 1, auth_address = x0 (the address of the object, where the isa is stored)
.macro GetClassFromIsa_p16 src, needs_auth, auth_address /* note: auth_address is not required if !needs_auth */

// SUPPORT_INDEXED_ISA is 0 when the project runs on macOS
#if SUPPORT_INDEXED_ISA
	// Indexed isa
	mov	p16, \src			// optimistically set dst = src
	tbz	p16, #ISA_INDEX_IS_NPI_BIT, 1f	// done if not non-pointer isa
	// isa in p16 is indexed
	adrp	x10, _objc_indexed_classes@PAGE
	add	x10, x10, _objc_indexed_classes@PAGEOFF
	ubfx	p16, p16, #ISA_INDEX_SHIFT, #ISA_INDEX_BITS  // extract index
	ldr	p16, [x10, p16, UXTP #PTRSHIFT]	// load class from array
1:

#elif __LP64__ // so execution reaches this branch

// needs_auth is 1
    .if \needs_auth == 0 // _cache_getImp takes an authed class already
	mov	p16, \src
    .else
// so this branch is taken; ExtractISA is defined elsewhere, so search the source for it
	// 64-bit packed isa; after this macro runs, p16 holds the class address
	ExtractISA p16, \src, \auth_address
    .endif
#else
	// 32-bit raw isa
	mov	p16, \src

#endif

.endmacro



// Definition of ExtractISA. __has_feature(ptrauth_calls) is true on A12 chips and later (arm64e); on macOS here it is false, so the #else branch is used
#if __has_feature(ptrauth_calls)
...
...
...
.macro ExtractISA
	and	$0, $1, #ISA_MASK
    #if ISA_SIGNING_AUTH_MODE == ISA_SIGNING_STRIP
	xpacd	$0
    #elif ISA_SIGNING_AUTH_MODE == ISA_SIGNING_AUTH
	mov	x10, $2
	movk	x10, #ISA_SIGNING_DISCRIMINATOR, LSL #48
	autda	$0, x10
    #endif
.endmacro
...
...
...
#else
...
...
...
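// This is the version that is executed
.macro ExtractISA
        // and is the bitwise AND instruction: it computes $1 & #ISA_MASK and stores the result in $0
        // $0 is p16 and $1 is the isa, so $1 & #ISA_MASK yields the class address, which is stored in p16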
	and    $0, $1, #ISA_MASK
.endmacro
// not JOP
#endif



// After GetClassFromIsa_p16 has run, p16 holds the class address and p13 holds the raw isa value; execution continues with the following code
LGetIsaDone:
	// calls imp or objc_msgSend_uncached
        // Execute the assembly code that the CacheLookup macro expands to
	CacheLookup NORMAL, _objc_msgSend, __objc_msgSend_uncached



// Search the source for CacheLookup to see the macro definition
// Here Mode is NORMAL, Function is _objc_msgSend, and MissLabelDynamic is __objc_msgSend_uncached
.macro CacheLookup Mode, Function, MissLabelDynamic, MissLabelConstant
        // Copy the value of x16 into x15, so x15 and x16 both hold the class address
	mov	x15, x16			// stash the original isa
LLookupStart\Function:
	// p1 = SEL, p16 = isa
// This branch is used for the simulator and macOS
#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16_BIG_ADDRS
        // #define CACHE            (2 * __SIZEOF_POINTER__)
        // CACHE is 16 bytes: x16 + 16 is the address of the class's cache field, i.e. of _bucketsAndMaybeMask, the first member of cache_t. ldr loads the value stored there into p10, so p10 = _bucketsAndMaybeMask
	ldr	p10, [x16, #CACHE]				// p10 = mask|buckets
        // lsr: shift p10 right by 48 bits and store the result in p11, so p11 = mask
	lsr	p11, p10, #48			// p11 = mask
        // p10 = p10 & #0xffffffffffff, which keeps the low 48 bits: the buckets pointer
	and	p10, p10, #0xffffffffffff	// p10 = buckets
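        // w1 is the low 32 bits of x1 (_cmd) and w11 is the low 32 bits of x11 (mask); this step plays the role of the cache's hash function and yields the index of the sel in buckets
	and	w12, w1, w11			// x12 = _cmd & mask
// Physical devices take this branch
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16
        // x16 (the class address) + 16 bytes is the address of the cache field (also the address of _bucketsAndMaybeMask in cache_t). ldr loads the value stored there into p11, so p11 = _bucketsAndMaybeMask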
	ldr	p11, [x16, #CACHE]			// p11 = mask|buckets
// On physical devices CONFIG_USE_PREOPT_CACHES is 1
    #if CONFIG_USE_PREOPT_CACHES
        // On A12 chips and later, __has_feature(ptrauth_calls) is 1
        #if __has_feature(ptrauth_calls)
        // tbnz: test a bit and branch if it is not zero
        // If bit 0 of p11 is not 0, branch to LLookupPreopt\Function
	tbnz	p11, #0, LLookupPreopt\Function
        // p10 = p11 & #0x0000ffffffffffff = buckets
	and	p10, p11, #0x0000ffffffffffff	// p10 = buckets
        #else // chips before A12
        // p10 = p11 & #0x0000fffffffffffe = buckets
	and	p10, p11, #0x0000fffffffffffe	// p10 = buckets
        // If bit 0 of p11 is not 0, branch to LLookupPreopt\Function
	tbnz	p11, #0, LLookupPreopt\Function
        #endif
        
        // eor (exclusive or): p12 = p1 ^ (p1 >> 7), where p1 is the sel; this matches the logic of cache_hash in cache_t
	eor	p12, p1, p1, LSR #7
        // p12 = p12 & (p11 >> 48): p11 is _bucketsAndMaybeMask, and shifting it right by 48 yields the mask; p12 & mask is the index idx that _cmd maps to in the hash table
	and	p12, p12, p11, LSR #48		// x12 = (_cmd ^ (_cmd >> 7)) & mask
    #else // not a physical device
        // p10 = p11 & #0x0000ffffffffffff, so p10 = buckets
	and	p10, p11, #0x0000ffffffffffff	// p10 = buckets
        // p12 = p1 & (p11 >> 48), i.e. p12 = _cmd & mask, the index idx that _cmd maps to in the hash table
	and	p12, p1, p11, LSR #48		// x12 = _cmd & mask
    #endif // CONFIG_USE_PREOPT_CACHES
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4 // the mask shift is stored in the low 4 bits
	ldr	p11, [x16, #CACHE]				// p11 = mask|buckets
        // p10 = p11 & ~0xf
	and	p10, p11, #~0xf			// p10 = buckets
	and	p11, p11, #0xf			// p11 = maskShift
	mov	p12, #0xffff
	lsr	p11, p12, p11			// p11 = mask = 0xffff >> p11
	and	p12, p1, p11			// x12 = _cmd & mask
#else
#error Unsupported cache mask storage for ARM64.
#endif
        // At this point: p16 = class address, p12 = idx, p10 = buckets()
        // PTRSHIFT is 3 with 8-byte pointers (__LP64__) and 2 on arm64_32
        // p13 = p10 + (p12 << (1 + PTRSHIFT)): the address of the bucket that _cmd hashes to (each bucket is 16 bytes)
	add	p13, p10, p12, LSL #(1+PTRSHIFT)
						// p13 = buckets + ((_cmd & mask) << (1+PTRSHIFT))

						// do {
        // Load the bucket_t at the address held in x13: its imp goes into p17 and its sel into p9; afterwards x13 is decremented by 16 bytes (BUCKET_SIZE)
1:	ldp	p17, p9, [x13], #-BUCKET_SIZE	//     {imp, sel} = *bucket--
        // Compare p9 (the bucket's sel) with _cmd
	cmp	p9, p1				//     if (sel != _cmd) {
        // Not equal: continue at label 3
	b.ne	3f				//         scan more
						//     } else {
        // Equal: cache hit, execute the CacheHit macro
2:	CacheHit \Mode				// hit:    call or return imp
						//     }
        // cbz: branch if zero. If p9 is 0 the bucket is empty, so _cmd is not in the cache; branch to MissLabelDynamic
3:	cbz	p9, \MissLabelDynamic		//     if (sel == 0) goto Miss;
        // Compare the current bucket address (p13) with the start of the table, buckets() (p10)
	cmp	p13, p10			// } while (bucket >= buckets)
        // Unsigned greater than or equal: branch back to label 1
	b.hs	1b
// macOS environment
#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16_BIG_ADDRS 
	add	p13, p10, w11, UXTW #(1+PTRSHIFT)
						// p13 = buckets + (mask << 1+PTRSHIFT)
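#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16 // physical devices
        // p10 = buckets(), p11 = _bucketsAndMaybeMask
        // p13 = p10 + (p11 >> (48 - (1+PTRSHIFT))): shifting p11 right by 48 would give the mask, and shifting the mask left by (1+PTRSHIFT) gives a byte offset into the hash table; the two shifts are combined into one. Adding the offset to the table address gives p13, the address of the last bucket in the table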
	add	p13, p10, p11, LSR #(48 - (1+PTRSHIFT))
						// p13 = buckets + (mask << 1+PTRSHIFT)
						// see comment about maskZeroBits
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4
	add	p13, p10, p11, LSL #(1+PTRSHIFT)
						// p13 = buckets + (mask << 1+PTRSHIFT)
#else
#error Unsupported cache mask storage for ARM64.
#endif
        // At this point: p12 = the index idx of _cmd in the buckets hash table, p10 = buckets
        // p12 = p10 + (p12 << (1+PTRSHIFT)): the address of the bucket that _cmd hashes to, i.e. the first bucket that was probed, now stored in p12
	add	p12, p10, p12, LSL #(1+PTRSHIFT)
						// p12 = first probed bucket

						// do {
        // Load the imp and sel stored at the address held in x13 into p17 and p9, then decrement x13 by 16 bytes to point at the next bucket to probe
4:	ldp	p17, p9, [x13], #-BUCKET_SIZE	//     {imp, sel} = *bucket--
        // Compare the sel with _cmd
	cmp	p9, p1				//     if (sel == _cmd)
        // Equal: branch back to label 2, i.e. cache hit
	b.eq	2b				//         goto hit
        // Compare p9 with 0
	cmp	p9, #0				// } while (sel != 0 &&
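        // ccmp: if the previous result was "not equal" (sel != 0), compare p13 with p12; otherwise set the flags to 0 so that the b.hi below is not taken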
	ccmp	p13, p12, #0, ne		//     bucket > first_probed)
        // Unsigned greater than: branch back to label 4
	b.hi	4b
...
...
...
.endmacro



// Assembly code for a cache hit; $0 is NORMAL
.macro CacheHit
.if $0 == NORMAL
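        // Execute the assembly code of the TailCallCachedImp macro
        // x17: the imp that was found, i.e. the (encoded) address of the function implementation
        // x10: buckets()
        // x1: the value of _cmd
        // x16: the class (obtained from the isa of the receiver in p0)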
	TailCallCachedImp x17, x10, x1, x16	// authenticate and call imp
.elseif $0 == GETIMP
	mov	p0, p17
	cbz	p0, 9f			// don't ptrauth a nil imp
	AuthAndResignAsIMP x0, x10, x1, x16	// authenticate imp and re-sign as IMP
9:	ret				// return IMP
.elseif $0 == LOOKUP
	// No nil check for ptrauth: the caller would crash anyway when they
	// jump to a nil IMP. We don't care if that jump also fails ptrauth.
	AuthAndResignAsIMP x17, x10, x1, x16	// authenticate imp and re-sign as IMP
	cmp	x16, x15
	cinc	x16, x16, ne			// x16 += 1 when x15 != x16 (for instrumentation ; fallback to the parent class)
	ret				// return imp via x17
.else
.abort oops
.endif
.endmacro



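// Definition of the TailCallCachedImp macro
#if __has_feature(ptrauth_calls)
// JOP
// A12 chips and later execute this branch
.macro TailCallCachedImp
	// $0 = cached imp, $1 = address of cached imp, $2 = SEL, $3 = isa
	eor	$1, $1, $2	// mix SEL into ptrauth modifier
	eor	$1, $1, $3  // mix isa into ptrauth modifier
	brab	$0, $1
.endmacro
...
#else
// not JOP
.macro TailCallCachedImp
	// $0 = cached imp, $1 = address of cached imp, $2 = SEL, $3 = isa
        // $0 ^= $3: when the imp for a sel is cached, the value stored is imp ^ cls (cls being the isa), so XOR-ing the stored value with cls again recovers the real imp
	eor	$0, $0, $3
        // Jump to the imp (the address of the function) and continue execution there
	br	$0
.endmacro
...
#endif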
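The bit manipulation in CacheLookup is easier to follow when written out in C. The sketch below assumes the CACHE_MASK_STORAGE_HIGH_16 layout walked through above (mask in the top 16 bits of _bucketsAndMaybeMask, buckets pointer in the low 48 bits) and 16-byte buckets (PTRSHIFT == 3). The toy_ function names are hypothetical; only the shifts and masks come from the assembly.

```c
#include <stdint.h>

/* Assumed layout: [ mask : 16 bits | buckets pointer : 48 bits ] */
#define TOY_MASK_SHIFT   48
#define TOY_BUCKETS_MASK 0x0000ffffffffffffULL
#define TOY_BUCKET_SHIFT 4 /* 1 + PTRSHIFT with PTRSHIFT == 3: 16 bytes per bucket */

/* and p10, p11, #0x0000ffffffffffff */
static uint64_t toy_buckets(uint64_t buckets_and_mask) {
    return buckets_and_mask & TOY_BUCKETS_MASK;
}

/* p11, LSR #48 */
static uint64_t toy_mask(uint64_t buckets_and_mask) {
    return buckets_and_mask >> TOY_MASK_SHIFT;
}

/* Device path: eor p12, p1, p1, LSR #7 ; and p12, p12, p11, LSR #48 */
static uint64_t toy_index_preopt(uint64_t sel, uint64_t mask) {
    return (sel ^ (sel >> 7)) & mask;
}

/* Other path: and p12, p1, p11, LSR #48 */
static uint64_t toy_index_plain(uint64_t sel, uint64_t mask) {
    return sel & mask;
}

/* add p13, p10, p12, LSL #(1+PTRSHIFT) */
static uint64_t toy_bucket_address(uint64_t buckets, uint64_t index) {
    return buckets + (index << TOY_BUCKET_SHIFT);
}
```

For example, packing a mask of 7 with a made-up buckets address 0x100004000 and unpacking it again returns both values unchanged, and toy_bucket_address(buckets, mask) is the address of the last bucket, which is where the second scan of CacheLookup starts.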

3. Summary of the objc_msgSend fast lookup

3.1 Key steps of the objc_msgSend fast lookup process

  1. Check whether receiver exists; return immediately if it does not.
  2. Read the isa value from receiver and mask it to get the class.
  3. From the class address, compute the in-memory address of the cache field.
  4. Load the member variable _bucketsAndMaybeMask of cache and use shift and mask operations to obtain buckets and mask.
  5. From the _cmd passed in, compute the index idx = sel & mask (on physical devices, sel ^= (sel >> 7) is applied first).
  6. Take the bucket at idx in buckets and read its imp and sel, then move the bucket pointer back one unit. If sel equals _cmd, go to CacheHit. If sel is empty, go to the slow lookup process. Otherwise, repeat this step while bucket >= buckets.
  7. If step 6 finishes without a result, the first half of the search has failed and the second half begins. Take the bucket at index mask (the last bucket in buckets) and read its imp and sel. If sel equals _cmd, go to CacheHit. If sel is empty, go to the slow lookup process. Otherwise, repeat this step while bucket is above the bucket at index idx.
  8. On a cache hit, compute imp ^ isa to obtain the real function address imp2, then jump to imp2 and execute it.
  9. Slow lookup process: execute the code in __objc_msgSend_uncached.
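The steps above can be sketched as a small C simulation. This is a simplified model under stated assumptions, not the runtime's code: buckets are {imp, sel} pairs of plain integers, the stored imp is encoded as imp ^ cls (the non-ptrauth path), and a return value of 0 stands for "go to __objc_msgSend_uncached". toy_cache_insert is a hypothetical helper that probes backwards with wraparound so the lookup has something to find.

```c
#include <stddef.h>
#include <stdint.h>

/* imp first, then sel, matching the order loaded by ldp p17, p9. */
struct toy_bucket { uintptr_t imp; uintptr_t sel; };

/* Step 5: index on physical devices. */
static size_t toy_hash(uintptr_t sel, uintptr_t mask) {
    return (size_t)((sel ^ (sel >> 7)) & mask);
}

/* Hypothetical insert: store imp ^ cls in the first empty slot, probing
   downwards from the hashed index and wrapping to the last slot.
   Returns 1 on success, 0 if the table is full. */
static int toy_cache_insert(struct toy_bucket *buckets, uintptr_t mask,
                            uintptr_t cls, uintptr_t sel, uintptr_t imp) {
    size_t i = toy_hash(sel, mask);
    for (uintptr_t n = 0; n <= mask; n++) {
        if (buckets[i].sel == 0) {
            buckets[i].sel = sel;
            buckets[i].imp = imp ^ cls;
            return 1;
        }
        i = i ? i - 1 : (size_t)mask;
    }
    return 0;
}

/* Steps 6 to 8: returns the decoded imp, or 0 on a miss. */
static uintptr_t toy_cache_lookup(const struct toy_bucket *buckets, uintptr_t mask,
                                  uintptr_t cls, uintptr_t cmd) {
    size_t first = toy_hash(cmd, mask);
    /* First scan: from the hashed index down to index 0. */
    for (size_t i = first + 1; i-- > 0; ) {
        if (buckets[i].sel == cmd) return buckets[i].imp ^ cls; /* CacheHit */
        if (buckets[i].sel == 0) return 0;                      /* miss */
    }
    /* Second scan: from the last bucket down to just above the first probe. */
    for (size_t i = (size_t)mask; i > first; i--) {
        if (buckets[i].sel == cmd) return buckets[i].imp ^ cls;
        if (buckets[i].sel == 0) return 0;
    }
    return 0;
}
```

With a four-slot table (mask 3), selectors 0x80, 0x1 and 0x5 all hash to index 1, so they land in slots 1, 0 and 3 respectively. Looking up 0x5 then fails the first scan (slots 1 and 0) and hits in the second scan at slot 3, while looking up an absent selector such as 0x9 stops at the empty slot 2 and reports a miss.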

3.2 objc_msgSend fast lookup flow chart

3.3 Conclusion

The investigation above shows that when the fast lookup cannot find the imp for a sel, it calls __objc_msgSend_uncached. How __objc_msgSend_uncached handles that case, the slow lookup process, will be discussed in detail in the next article. Thanks for reading.