OC Fundamentals 06: Fast lookup process for message flow analysis

In the previous article, we explored how the insert method writes a SEL and IMP into the cache when an instance object calls an instance method. Now let's look at how a method is retrieved from the cache when it is called: the fast SEL-to-IMP lookup.

Groundwork for objc_msgSend

1. View the source code

How does a method call end up in insert?

In the source file objc-cache.mm, besides the insert method from the previous article, you can also find the cache_fill method, and insert is called inside cache_fill. Searching the file again for cache_fill leads to the comment that lists cache readers and cache writers.

In other words, cache writers write to the cache through cache_fill, and before anything is written, cache readers first try to read from it. The cache readers are objc_msgSend and cache_getImp.

2. Clang

Use Clang to compile the following code:

#import <Foundation/Foundation.h>

@interface Person : NSObject

- (void)sayHello;
- (int)addNumber:(int)number;

@end

@implementation Person

- (void)sayHello {
    NSLog(@"Hello world");
}

- (int)addNumber:(int)number {
    return number + 1;
}

@end

// Wrapped in main so the snippet compiles on its own.
int main(int argc, const char * argv[]) {
    Person *p = [Person alloc];
    [p sayHello];
    int result = [p addNumber:2];
    return 0;
}

In the generated .cpp file we can see the compiled result: whether the call is to the class method alloc or to the instance method sayHello, it is compiled to objc_msgSend(message receiver, method selector, method arguments...).

The message receiver matters because it is the starting point of the lookup path: the method is found by starting from the receiver and following it to its class.
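To make the shape of that call concrete, here is a toy model in plain C. It is only a sketch: every `toy_` name is made up for illustration, selectors are plain strings here (real SELs are compared as pointers), and the method table is a linear list rather than the runtime's cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these types belong to the real runtime. */
typedef struct toy_object *toy_id;
typedef const char *toy_sel;
typedef int (*toy_imp)(toy_id self, toy_sel _cmd, int arg);

struct toy_method { toy_sel name; toy_imp imp; };
struct toy_class  { const char *name; const struct toy_method *methods; int count; };
struct toy_object { const struct toy_class *isa; };

/* Plays the role of -[Person addNumber:]. */
static int person_addNumber(toy_id self, toy_sel _cmd, int number) {
    (void)self; (void)_cmd;
    return number + 1;
}

static const struct toy_method person_methods[] = {
    { "addNumber:", person_addNumber },
};
static const struct toy_class Person = { "Person", person_methods, 1 };
static struct toy_object person_instance = { &Person };

/* The receiver is the starting point: its isa names the class to search. */
static int toy_msgSend(toy_id receiver, toy_sel sel, int arg) {
    assert(sel != NULL);
    if (receiver == NULL) return 0;              /* messaging nil yields zero */
    const struct toy_class *cls = receiver->isa;
    for (int i = 0; i < cls->count; i++) {
        if (strcmp(cls->methods[i].name, sel) == 0)
            return cls->methods[i].imp(receiver, sel, arg);
    }
    return -1;                                   /* toy stand-in for "method not found" */
}
```

Calling `toy_msgSend(&person_instance, "addNumber:", 2)` plays the role of `[p addNumber:2]`: the receiver comes first, the selector second, and the arguments follow.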

3. Going further

#import <objc/message.h>

@interface Tercher : Person
@end

@implementation Tercher
@end

// The statements below belong inside a function such as main().
Person *p = [Person alloc];
[p sayHello];
// The same message, sent by calling objc_msgSend directly
objc_msgSend(p, sel_registerName("sayHello"));

Tercher *t = [Tercher alloc];
[t sayHello];
// Send to t, but start the method lookup at the superclass Person
struct objc_super xsuper;
xsuper.receiver = t;
xsuper.super_class = [Person class];
objc_msgSendSuper(&xsuper, sel_registerName("sayHello"));

For these direct calls to compile, set Build Settings -> Enable Strict Checking of objc_msgSend Calls to NO.

objc_msgSend fast lookup

objc_msgSend

The arm64 assembly source is in objc-msg-arm64.s.

The objc_msgSend function is the core engine behind every OC method call: it finds the implementation of the method and executes it. Because it is called so often, its implementation has a large impact on performance, which is why it is written in assembly. Assembly is fast, and it can also forward calls whose parameters (number and types) are not known in advance.

Analysis:

1.
cmp p0, #0 	// nil check and tagged pointer check

The first instruction, cmp, is a comparison. As the comment says, it checks whether the receiver is nil and prepares the tagged pointer check. p0 is the first parameter of objc_msgSend, the message receiver.

2.
#if SUPPORT_TAGGED_POINTERS
	b.le	LNilOrTagged		// (MSB tagged pointer looks negative)
#else
	b.eq	LReturnZero
#endif

SUPPORT_TAGGED_POINTERS determines whether tagged pointers (small object types) are supported. If they are, b.le jumps to LNilOrTagged; otherwise b.eq LReturnZero returns nil.

When tagged pointers are supported, the result of cmp p0, #0 is still what decides whether to continue, and LReturnZero is also reached if the message receiver is nil.

le means less than or equal; eq means equal.
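Why a single b.le covers both cases can be seen in a small C sketch. It assumes, as the assembly comment says, that a tagged pointer has its most significant bit set and therefore looks negative when read as a signed 64-bit value; the enum and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { RECEIVER_NIL, RECEIVER_TAGGED, RECEIVER_OBJECT } receiver_kind;

/* Mirrors `cmp p0, #0` followed by `b.le LNilOrTagged`:
   one signed comparison against zero catches nil and tagged pointers together. */
static receiver_kind classify_receiver(uint64_t p0) {
    int64_t as_signed = (int64_t)p0;             /* MSB set => negative (two's complement) */
    if (as_signed > 0) return RECEIVER_OBJECT;   /* falls through to the isa load */
    if (as_signed == 0) return RECEIVER_NIL;     /* ends up at LReturnZero */
    return RECEIVER_TAGGED;                      /* handled on the tagged pointer path */
}
```

An ordinary heap address is positive, so only nil (zero) and tagged pointers (negative) take the branch.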

3.
ldr p13, [x0]       // p13 = isa 

Load the isa from the object (the address in x0) into register p13.

4.
GetClassFromIsa_p16 p13     // p16 = class 

On a 64-bit real device, $0 (the p13 passed in, that is, the isa) is ANDed with ISA_MASK to obtain the class. Once we have the class, an offset from it reaches the cache, where the method lookup takes place: the CacheLookup NORMAL fast lookup.
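The masking step can be sketched in C. The mask value below is an assumption: it is the arm64 ISA_MASK as we remember it from objc4 sources of this period, so check isa.h in the version you are reading. The function name is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed arm64 value of ISA_MASK; verify against isa.h for your objc4 version. */
#define SKETCH_ISA_MASK 0x0000000ffffffff8ULL

/* A non-pointer isa packs flags around the class address; the mask strips them. */
static uint64_t class_from_isa(uint64_t isa_bits) {
    return isa_bits & SKETCH_ISA_MASK;
}
```

For a made-up isa value of 0x001d800100008141, the result is 0x100008140: the flag bits above bit 35 and the low three bits are cleared, leaving only the class address.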

5.
LGetIsaDone: // The ISA has been obtained
	// calls imp or objc_msgSend_uncached
	CacheLookup NORMAL, _objc_msgSend

CacheLookup NORMAL

Source:

.macro CacheLookup
	//
	// Restart protocol:
	//
	//   As soon as we're past the LLookupStart$1 label we may have loaded
	//   an invalid cache pointer or mask.
	//
	//   When task_restartable_ranges_synchronize() is called,
	//   (or when a signal hits us) before we're past LLookupEnd$1,
	//   then our PC will be reset to LLookupRecover$1 which forcefully
	//   jumps to the cache-miss codepath which have the following
	//   requirements:
	//
	//   GETIMP:
	//     The cache-miss is just returning NULL (setting x0 to 0)
	//
	//   NORMAL and LOOKUP:
	//   - x0 contains the receiver
	//   - x1 contains the selector
	//   - x16 contains the isa
	//   - other registers are set as per calling conventions
	//
LLookupStart$1:

	// p1 = SEL, p16 = isa
	ldr	p11, [x16, #CACHE]		// p11 = mask|buckets

#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16
	and	p10, p11, #0x0000ffffffffffff	// p10 = buckets
	and	p12, p1, p11, LSR #48		// x12 = _cmd & mask
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4
	and	p10, p11, #~0xf			// p10 = buckets
	and	p11, p11, #0xf			// p11 = maskShift
	mov	p12, #0xffff
	lsr	p11, p12, p11			// p11 = mask = 0xffff >> p11
	and	p12, p1, p11			// x12 = _cmd & mask
#else
#error Unsupported cache mask storage for ARM64.
#endif

	add	p12, p10, p12, LSL #(1+PTRSHIFT)
					// p12 = buckets + ((_cmd & mask) << (1+PTRSHIFT))

	ldp	p17, p9, [x12]		// {imp, sel} = *bucket
1:	cmp	p9, p1			// if (bucket->sel != _cmd)
	b.ne	2f			//     scan more
	CacheHit $0			// call or return imp

2:	// not hit: p12 = not-hit bucket
	CheckMiss $0			// miss if bucket->sel == 0
	cmp	p12, p10		// wrap if bucket == buckets
	b.eq	3f
	ldp	p17, p9, [x12, #-BUCKET_SIZE]!	// {imp, sel} = *--bucket
	b	1b			// loop

3:	// wrap: p12 = first bucket, w11 = mask
#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16
	add	p12, p12, p11, LSR #(48 - (1+PTRSHIFT))
					// p12 = buckets + (mask << 1+PTRSHIFT)
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4
	add	p12, p12, p11, LSL #(1+PTRSHIFT)
					// p12 = buckets + (mask << 1+PTRSHIFT)
#else
#error Unsupported cache mask storage for ARM64.
#endif

	// Clone scanning loop to miss instead of hang when cache is corrupt.
	// The slow path may detect any corruption and halt later.

	ldp	p17, p9, [x12]		// {imp, sel} = *bucket
1:	cmp	p9, p1			// if (bucket->sel != _cmd)
	b.ne	2f			//     scan more
	CacheHit $0			// call or return imp

2:	// not hit: p12 = not-hit bucket
	CheckMiss $0			// miss if bucket->sel == 0
	cmp	p12, p10		// wrap if bucket == buckets
	b.eq	3f
	ldp	p17, p9, [x12, #-BUCKET_SIZE]!	// {imp, sel} = *--bucket
	b	1b			// loop

LLookupEnd$1:
LLookupRecover$1:
3:	// double wrap
	JumpMiss $0

.endmacro

1.
 // p1 = SEL, p16 = isa
 ldr p11, [x16, #CACHE]    // p11 = mask|buckets

Here #CACHE == 2*8 = 16. From the class structure we can see that the cache sits 16 bytes past the start of the class (after isa and superclass, 8 bytes each), so this load gives p11 = cache. But why does the comment say p11 = mask|buckets? On 64-bit systems, the mask and the buckets pointer are stored together in one word to save memory and make access easier.
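The 16-byte offset can be checked with a simplified C stand-in for the head of the class structure. The struct and function names below are made up, and the real objc_class has more members after these; only the first three words matter here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the first three words of a class on a 64-bit target. */
struct sketch_class {
    uint64_t isa;               /* offset 0  */
    uint64_t superclass;        /* offset 8  */
    uint64_t mask_and_buckets;  /* offset 16: first word of the cache */
};

#define SKETCH_CACHE offsetof(struct sketch_class, mask_and_buckets)

/* Equivalent in spirit to `ldr p11, [x16, #CACHE]`:
   read the 8-byte word that sits 16 bytes into the class. */
static uint64_t load_cache_word(const struct sketch_class *cls) {
    assert(cls != NULL);
    const unsigned char *base = (const unsigned char *)cls;
    uint64_t word;
    memcpy(&word, base + SKETCH_CACHE, sizeof word);
    return word;
}
```

SKETCH_CACHE evaluates to 16, matching the 2*8 computed above.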

2.
#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16
and	p10, p11, #0x0000ffffffffffff	// p10 = buckets
and	p12, p1, p11, LSR #48		// x12 = _cmd & mask

p11 (mask|buckets) is ANDed with 0x0000ffffffffffff, which clears its high 16 bits; the result is buckets, stored in p10.

and p12, p1, p11, LSR #48 works in two parts. First p11, LSR #48 is evaluated: p11 is shifted logically right by 48 bits, which yields the mask stored in the cache. Then p1 is ANDed with that mask and the result is stored in p12. p1 is the sel (_cmd), as the comment at the start of the CacheLookup source says. The final value in p12 is the index of the method within buckets.

This matches the previous article: the position used by the insert method is also the index computed as sel & mask.
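The same two instructions can be written out in C for the HIGH_16 layout. This is a sketch with made-up helper names; the constants are the ones visible in the assembly.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MASK_SHIFT   48
#define SKETCH_BUCKETS_MASK 0x0000ffffffffffffULL

/* and p10, p11, #0x0000ffffffffffff : clear the high 16 bits, keep buckets */
static uint64_t buckets_of(uint64_t mask_and_buckets) {
    return mask_and_buckets & SKETCH_BUCKETS_MASK;
}

/* p11, LSR #48 : the high 16 bits hold the mask */
static uint64_t mask_of(uint64_t mask_and_buckets) {
    return mask_and_buckets >> SKETCH_MASK_SHIFT;
}

/* and p12, p1, p11, LSR #48 : starting index = _cmd & mask */
static uint64_t start_index(uint64_t sel, uint64_t mask_and_buckets) {
    return sel & mask_of(mask_and_buckets);
}
```

With a mask of 7 packed above a buckets address of 0x100204000, a selector value of 0x1f3a5 starts at index 5, because only its low three bits survive the AND.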

3.
add	p12, p10, p12, LSL #(1+PTRSHIFT)
		             // p12 = buckets + ((_cmd & mask) << (1+PTRSHIFT))

This instruction also works in two parts.

The first part is p12, LSL #(1+PTRSHIFT). A global search for PTRSHIFT shows its definition:

On a 64-bit real device PTRSHIFT = 3, so this part shifts the method's index logically left by 4 bits. Shifting left by 4 bits is the same as multiplying by 2^4.

0000 0001 << 4  = 0001 0000 = 16 = 2^4

So this first part means the method's index * 2^4. The shifted value is not stored on its own; it is used directly as the last operand of add.

The second part is add p12, p10, ...

p10 holds the first address of buckets. Adding index * 2^4 bytes to it gives the address of the bucket_t for this method, and that address is stored in p12.

Why is the index multiplied by 16 bytes? Because a bucket_t stores a SEL and an IMP, each 8 bytes, so one bucket_t is 16 bytes; multiplying the index by the size of a bucket_t gives the offset of the bucket_t we want. The bucket in the assembly comments is the same thing as bucket_t: bucket_t is the structure as declared in the C++ source, and bucket is how the assembly refers to it.
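The address arithmetic can be sketched in C. The struct is a stand-in with made-up names; its field order follows the arm64 comment `{imp, sel} = *bucket`.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PTRSHIFT 3   /* 64-bit: a pointer is 2^3 bytes */

/* Stand-in for bucket_t: two pointer-sized fields. */
struct sketch_bucket {
    uint64_t imp;
    uint64_t sel;
};

/* One bucket is two pointers, hence the shift by 1 + PTRSHIFT. */
_Static_assert(sizeof(struct sketch_bucket) == (1u << (1 + SKETCH_PTRSHIFT)),
               "one bucket is 16 bytes");

/* add p12, p10, p12, LSL #(1+PTRSHIFT) : buckets + (index << 4) */
static uint64_t bucket_address(uint64_t buckets, uint64_t index) {
    return buckets + (index << (1 + SKETCH_PTRSHIFT));
}
```

For buckets at 0x100204000 and index 5, the bucket lives at 0x100204050, that is, 5 * 16 = 0x50 bytes past the start.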

4.
ldp	p17, p9, [x12]		// {imp, sel} = *bucket

From the bucket located above, the imp and sel are loaded into p17 and p9 respectively.

5.
1:	cmp	p9, p1			// if (bucket->sel != _cmd)
	b.ne	2f			// scan more
	CacheHit $0			// call or return imp

Compare the sel found in the bucket (p9) with p1 (_cmd). If they are equal, the cache is hit and CacheHit calls or returns the imp. If they are not equal, jump to label 2.

6.
2:	// not hit: p12 = not-hit bucket
	CheckMiss $0			// miss if bucket->sel == 0
	cmp	p12, p10		// wrap if bucket == buckets
	b.eq	3f
	ldp	p17, p9, [x12, #-BUCKET_SIZE]!	// {imp, sel} = *--bucket
	b	1b			// loop

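If the sel in the bucket is not equal to p1 (_cmd), CheckMiss first treats an empty bucket (sel == 0) as a cache miss. Otherwise, unless p12 is already the first bucket (in which case we jump to 3), ldp p17, p9, [x12, #-BUCKET_SIZE]! steps back to the previous bucket, loads its imp and sel, and jumps back to 1 to compare again.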

7.
3:	// wrap: p12 = first bucket, w11 = mask
#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16
	add	p12, p12, p11, LSR #(48 - (1+PTRSHIFT))
					// p12 = buckets + (mask << 1+PTRSHIFT)
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4
	add	p12, p12, p11, LSL #(1+PTRSHIFT)
					// p12 = buckets + (mask << 1+PTRSHIFT)
#else
#error Unsupported cache mask storage for ARM64.
#endif

add p12, p12, p11, LSR #(48 - (1+PTRSHIFT))

From the start we know that p11 = mask|buckets. Shifting p11 logically right by 44 bits can also be read as taking the mask and shifting it left by 4 bits, which is what the comment means: mask << (1+PTRSHIFT) == mask * 2^4.
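The single shift only yields exactly mask << 4 if nothing from the buckets pointer is left in the low bits of the result. As far as we can tell from the objc4 sources, the buckets pointer is kept within the low 44 bits for this reason, but treat that as an assumption and check it against your version. The sketch below, with a made-up function name, states that assumption as an assert.

```c
#include <assert.h>
#include <stdint.h>

/* p11, LSR #(48 - (1+PTRSHIFT)) with PTRSHIFT == 3, i.e. a right shift by 44. */
static uint64_t last_bucket_offset(uint64_t mask_and_buckets) {
    /* Assumption: bits 44..47 of the packed word are zero,
       so the shift leaves nothing but mask << 4. */
    assert(((mask_and_buckets >> 44) & 0xf) == 0);
    return mask_and_buckets >> (48 - 4);
}
```

For a mask of 7 the result is 112, the byte offset of the eighth and last bucket.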

From the previous article, mask equals capacity - 1, that is, the number of bucket_t entries in buckets minus one.

So mask * 2^4 is the byte offset of the last bucket. When the scan reaches the first bucket without finding a match, p12 is moved to the last bucket and the backward comparison continues from there.
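Putting the scan together, here is a C sketch of the control flow of CacheLookup: compare, check for an empty slot, step backwards, wrap once to the last bucket, and give up on a second wrap. The names are made up, and returning 0 stands in for the jump to the miss path that the real macro performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_bucket { uint64_t imp; uint64_t sel; };

/* Returns the cached imp, or 0 on a miss. */
static uint64_t sketch_cache_lookup(const struct sketch_bucket *buckets,
                                    uint64_t mask, uint64_t sel) {
    assert(buckets != NULL && sel != 0);
    const struct sketch_bucket *bucket = &buckets[sel & mask];
    int wrapped = 0;
    for (;;) {
        if (bucket->sel == sel) return bucket->imp;  /* CacheHit */
        if (bucket->sel == 0) return 0;              /* CheckMiss: empty slot */
        if (bucket == buckets) {                     /* reached the first bucket */
            if (wrapped) return 0;                   /* double wrap: JumpMiss */
            wrapped = 1;
            bucket = &buckets[mask];                 /* wrap to the last bucket */
        } else {
            bucket--;                                /* {imp, sel} = *--bucket */
        }
    }
}
```

The wrapped flag plays the role of the cloned second loop in the assembly: it guarantees the scan ends even when every bucket is occupied by other selectors.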

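Fast lookup process summary:

1. Check whether the receiver is nil or a tagged pointer.
2. Load the isa from the receiver and AND it with ISA_MASK to get the class.
3. Read the cache word 16 bytes into the class and split it into buckets and mask.
4. Compute the starting index as sel & mask and locate that bucket.
5. Compare the bucket's sel with _cmd: on a match, CacheHit calls or returns the imp; on an empty bucket, CheckMiss reports a miss; otherwise step back one bucket and compare again.
6. On reaching the first bucket, wrap to the last bucket and scan once more; a second wrap ends in JumpMiss.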
