One. Cache insertion process analysis

How methods are cached in a class was explored in the earlier article OC Low-level Exploration: cache_t Analysis! But when exactly is a method stored in the class cache?

Let’s start exploring when methods are inserted into the cache!

Open the objc source project and place a breakpoint at the method call:

After it breaks, set a breakpoint in the insert method, then use bt in LLDB to view the call stack:

On the stack you can see the entire flow from the object calling the method to entering the insert method:

_objc_msgSend_uncached -> lookUpImpOrForward -> log_and_fill_cache -> insert

You can see that the objc_msgSend method is called first when an object calls a method!

The objc_msgSend method involves the Runtime!

Two. Understanding the Runtime

1. Static and dynamic programming

The first thing we need to know is that programming languages can be divided into static and dynamic languages.

In a static language, all type checks are performed before the program runs, and the memory addresses of all members and methods of a class are determined at compile time. This means an object can only access its own member variables and methods; otherwise the compiler reports an error directly. Common static languages include Java, C++, C, and so on.

In a dynamic language, on the other hand, types and the memory locations of member variables and methods are determined at runtime, and member variables and methods can be added dynamically. This means that calling a method that doesn't exist still compiles, and an object's true type may not be what it appears to be on the surface: it can only be determined once the program runs. Dynamic languages are more flexible and extensible than static languages. OC is a dynamic language.
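For example, a minimal sketch of this dynamism (HPerson is the example class used later in this article):

id obj = [HPerson alloc];                      // static type is id; the real type only exists at runtime
NSLog(@"%@", NSStringFromClass([obj class]));  // the true type is only determined once this runs
SEL sel = NSSelectorFromString(@"sayHello");
[obj performSelector:sel];                     // resolved at runtime; compiles even though the compiler cannot verify it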

2. Compile time

Compile time, as the name implies, is when the code is being compiled. What is compilation? A compiler translates your source code into code that the machine can recognize. (Broadly speaking; it may actually be translated into some intermediate representation first.)

So compilation is simply a process of doing some translation work, such as checking if you accidentally miswrote any keywords, lexical analysis, grammatical analysis and so on. It’s like a teacher checking a student’s essay for typos and bad sentences. The compiler will tell you if it finds any errors.

If you are using Microsoft VS, click Build and compilation starts. If errors or warnings are displayed below, they were caught by the compiler. These are called compile-time errors, and any type checking done during this process is called compile-time type checking, or static type checking (static because the code is not actually running in memory; it is simply scanned as text).

So when people say that memory is allocated at compile time, they are simply wrong.

3. Runtime

At runtime, the code is loaded into memory and runs.

Your code is dead on disk until it is loaded into memory; only then does it come alive. Runtime type checking differs from the compile-time (static) type checking described earlier: it doesn't just scan the code as text, it makes judgments using what is actually in memory.

More details about the runtime can be found in the official documentation. (Objective-C Runtime Programming Guide)

4. Ways to invoke the Runtime

1. OC method calls.

2. NSObject methods.

3. objc dynamic library APIs.

The hierarchy of these three levels can be represented by a graph:
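In code, the three levels look roughly like this (a sketch; HPerson and sayHello are the examples used below):

#import <objc/runtime.h>

HPerson *p = [HPerson alloc];
[p sayHello];                                  // 1. OC method call (compiled into objc_msgSend)
[p respondsToSelector:@selector(sayHello)];    // 2. NSObject method
Method m = class_getInstanceMethod([HPerson class], @selector(sayHello)); // 3. objc dynamic library API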

5. The difference between runtime and compile time

Create a class with two methods, implement only one of them, and run:
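A sketch of the experiment (the class and method names are illustrative):

@interface HPerson : NSObject
- (void)sayHello;   // will be implemented
- (void)saySix;     // declared but never implemented
@end

@implementation HPerson
- (void)sayHello { NSLog(@"hello"); }
@end

// somewhere in main:
HPerson *p = [HPerson alloc];
[p saySix];         // compiles fine, crashes at runtime: unrecognized selector sent to instance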

Here is the difference between compile time and run time: compilation succeeds, but an error is reported as soon as it runs!

6. Analysis of the underlying code

Next, let's look at the underlying implementation of the OC code by rewriting it with clang (clang -rewrite-objc main.m) and finding the main function:

int main(int argc, const char * argv[]) {
    /* @autoreleasepool */ { __AtAutoreleasePool __autoreleasepool; 
        HPerson * p = ((HPerson *(*)(id, SEL))(void *)objc_msgSend)((id)objc_getClass("HPerson"), sel_registerName("alloc"));
        ((void (*)(id, SEL))(void *)objc_msgSend)((id)p, sel_registerName("saySix"));
        ((void (*)(id, SEL))(void *)objc_msgSend)((id)p, sel_registerName("sayHello"));
    }
    return 0;
}

We can see what the upper-layer code turns into after compilation!

The process of calling the method is to call the objc_msgSend function, or message send!

The underlying code shows that the objc_msgSend function takes two parameters:

One is (id)objc_getClass("HPerson") or (id)p, the receiver of the message!

The other is sel_registerName("xxx"), the sel!

So far all the methods are called without arguments. What if a method takes arguments?

Let's add a parameter:
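For example, something like this (the method name matches the rewritten code below):

// in HPerson:
- (void)saySomething:(NSString *)something {
    NSLog(@"%@", something);
}

// in main:
[p saySomething:@"hello"];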

Run clang again:

int main(int argc, const char * argv[]) {
    /* @autoreleasepool */ { __AtAutoreleasePool __autoreleasepool; 
        HPerson * p = ((HPerson *(*)(id, SEL))(void *)objc_msgSend)((id)objc_getClass("HPerson"), sel_registerName("alloc"));
        ((void (*)(id, SEL, NSString *))(void *)objc_msgSend)((id)p, sel_registerName("saySomething:"), (NSString *)&__NSConstantStringImpl__var_folders_1h_55lzq4fd39b0mz94wmqthqf860mpgy_T_main_189844_mi_2);
        ((void (*)(id, SEL))(void *)objc_msgSend)((id)p, sel_registerName("saySix"));
        ((void (*)(id, SEL))(void *)objc_msgSend)((id)p, sel_registerName("sayHello"));
    }
    return 0;
}

You can see that there is an NSString argument!

Therefore, the general form of message sending is:

objc_msgSend(message receiver, message body (sel + arguments))

7. Low-level code calls

Can we call it directly in code?

Set Enable Strict Checking of objc_msgSend Calls to NO in Build Settings.

This relaxes the compiler's checking of objc_msgSend's parameters.

Then we try calling it directly from the code:
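A minimal sketch, mirroring the rewritten code above:

#import <objc/runtime.h>
#import <objc/message.h>

HPerson *p = ((HPerson *(*)(id, SEL))objc_msgSend)(objc_getClass("HPerson"), sel_registerName("alloc"));
((void (*)(id, SEL))objc_msgSend)(p, sel_registerName("sayHello"));
((void (*)(id, SEL, NSString *))objc_msgSend)(p, sel_registerName("saySomething:"), @"hello");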

We find that the effect is exactly the same as a normal method call!

NSObject method call

So how is it called through NSObject's methods?

If you go to NSObject.h, you can see the related methods:

performSelector is obviously related to method invocation, so let's try it out:
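A quick sketch (using the same example methods as before):

HPerson *p = [HPerson alloc];
[p performSelector:@selector(sayHello)];
[p performSelector:@selector(saySomething:) withObject:@"hello"];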

It behaves exactly the same as our normal call!

Three. View the objc_msgSend source code

1. Assembly debugging

First, enable assembly debugging (in Xcode: Debug -> Debug Workflow -> Always Show Disassembly):

Then place a breakpoint in front of the method and run:

Create a breakpoint at objc_msgSend and hold down Control to step into it:

objc_msgSend is in the objc source code.

2. View the source code

Open objc source and search for objc_msgSend:

Since the underlying objc_msgSend is written in assembly, let's look directly at the .s file!

Since we're mostly targeting real devices, we'll look at arm64:

Find ENTRY _objc_msgSend:

	ENTRY _objc_msgSend
	UNWIND _objc_msgSend, NoFrame

	cmp	p0, #0			// nil check and tagged pointer check
#if SUPPORT_TAGGED_POINTERS
	b.le	LNilOrTagged		// (MSB tagged pointer looks negative)
#else
	b.eq	LReturnZero
#endif
	ldr	p13, [x0]		// p13 = isa
	GetClassFromIsa_p16 p13, 1, x0	// p16 = class
LGetIsaDone:
	// calls imp or objc_msgSend_uncached
	CacheLookup NORMAL, _objc_msgSend, __objc_msgSend_uncached

We can see that _objc_msgSend is written in assembly, whereas the source code we looked at before was written in C or C++. Why?

Because assembly is faster, and message sending is so dynamic (the arguments are unknown until runtime) that assembly suits it better!

Four. Assembly source code analysis

1. Common assembly instructions

b instructions

  • bl : branch to a label and execute (with link, i.e. a call)
  • b.le : branch to the label if the cmp result above is less than or equal, otherwise fall through
  • b.ge : branch if greater than or equal, otherwise fall through
  • b.lt : branch if less than, otherwise fall through
  • b.gt : branch if greater than, otherwise fall through
  • b.eq : branch if equal, otherwise fall through
  • b.hi : branch if unsigned higher (>), otherwise fall through
  • b.hs : branch if unsigned higher or same (>=), otherwise fall through
  • b.ls : branch if unsigned lower or same (<=), otherwise fall through
  • b.lo : branch if unsigned lower (<), otherwise fall through

ret : return from a subroutine

  • mov x0, #0x10 : x0 = 0x10
  • str w10, [sp] : store the value of register w10 into stack memory at sp
  • stp x0, x1, [sp, #0x10] : store the values of x0 and x1 into stack memory at sp + 0x10
  • orr x0, wzr, #0x1 : x0 = wzr | 0x1
  • stur w10, [sp] : store the value of register w10 into stack memory at sp (unscaled offset)
  • ldr w10, [sp] : w10 = the value in stack memory at sp
  • ldp x0, x1, [sp] : x0, x1 = the values in stack memory at sp

adrp : compute a page base address (base address + offset), used to reach global variables

  • cbz : compare, and branch if zero
  • cbnz : compare, and branch if non-zero
  • cmp : comparison, e.g. cmp OPR1, OPR2 computes (OPR1) - (OPR2) and sets the condition flags

2. Start analyzing

cmp	p0, #0			// nil check and tagged pointer check

Search p0:

#if __LP64__
// true arm64

#define SUPPORT_TAGGED_POINTERS 1
#define PTR .quad
#define PTRSIZE 8
#define PTRSHIFT 3  // 1<<PTRSHIFT == PTRSIZE
// "p" registers are pointer-sized
#define UXTP UXTX
#define p0  x0
#define p1  x1
#define p2  x2
#define p3  x3
#define p4  x4
#define p5  x5
#define p6  x6
#define p7  x7
#define p8  x8
#define p9  x9
#define p10 x10
#define p11 x11
#define p12 x12
#define p13 x13
#define p14 x14
#define p15 x15
#define p16 x16
#define p17 x17

// true arm64
#else
// arm64_32

#define SUPPORT_TAGGED_POINTERS 0
#define PTR .long
#define PTRSIZE 4
#define PTRSHIFT 2  // 1<<PTRSHIFT == PTRSIZE
// "p" registers are pointer-sized
#define UXTP UXTW
#define p0  w0
#define p1  w1
#define p2  w2
#define p3  w3
#define p4  w4
#define p5  w5
#define p6  w6
#define p7  w7
#define p8  w8
#define p9  w9
#define p10 w10
#define p11 w11
#define p12 w12
#define p13 w13
#define p14 w14
#define p15 w15
#define p16 w16
#define p17 w17

// arm64_32
#endif

You can see that p0 is register x0! That’s the first argument we pass in, p.

Comparing p0 with 0 checks whether p0 is null,

that is, whether the message receiver exists. If it doesn't:

#if SUPPORT_TAGGED_POINTERS
	b.le	LNilOrTagged		// (MSB tagged pointer looks negative)
#else
	b.eq	LReturnZero
#endif

If SUPPORT_TAGGED_POINTERS is 1, the branch goes to LNilOrTagged:

#if SUPPORT_TAGGED_POINTERS
LNilOrTagged:
	b.eq	LReturnZero		// nil check
	GetTaggedClass
	b	LGetIsaDone
// SUPPORT_TAGGED_POINTERS
#endif

Similarly, when SUPPORT_TAGGED_POINTERS is 0 (or the receiver really is nil), LReturnZero is executed:

LReturnZero:
	// x0 is already zero
	mov	x1, #0
	movi	d0, #0
	movi	d1, #0
	movi	d2, #0
	movi	d3, #0
	ret

	END_ENTRY _objc_msgSend

That is, zero out the return registers and return.

The normal case, of course, is to continue down:

ldr	p13, [x0]		// p13 = isa

This loads the memory pointed to by x0 into p13; x0 is the message receiver, the first argument p passed in.

The comment indicates that this puts isa into p13. Why isa?

Because isa occupies the first 8 bytes of p!
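This matches the layout of objc_object in the source: isa is the object's first member, so dereferencing the object pointer reads the raw isa. A simplified sketch:

struct objc_object {
    isa_t isa;   // the first 8 bytes of every OC object
};
// so: ldr p13, [x0]  reads  *(uintptr_t *)p , i.e. the raw isa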

3. Get the class: GetClassFromIsa_p16

Continue to:

GetClassFromIsa_p16 p13, 1, x0	// p16 = class

The comment indicates that class is assigned to p16!

Start exploring GetClassFromIsa_p16!

Call GetClassFromIsa_p16 and pass in p13, 1, x0.

Enter GetClassFromIsa_p16:

/********************************************************************
 * GetClassFromIsa_p16 src, needs_auth, auth_address
 *
 * src is a raw isa field. Sets p16 to the corresponding class pointer.
 * The raw isa might be an indexed isa to be decoded, or a
 * packed isa that needs to be masked.
 *
 * On exit:
 *   src is unchanged
 *   p16 is a class pointer
 *   x10 is clobbered
 ********************************************************************/
 
.macro GetClassFromIsa_p16 src, needs_auth, auth_address /* note: auth_address is not required if !needs_auth */

#if SUPPORT_INDEXED_ISA
	// Indexed isa
	mov	p16, \src			// optimistically set dst = src
	tbz	p16, #ISA_INDEX_IS_NPI_BIT, 1f	// done if not non-pointer isa
	// isa in p16 is indexed
	adrp	x10, _objc_indexed_classes@PAGE
	add	x10, x10, _objc_indexed_classes@PAGEOFF
	ubfx	p16, p16, #ISA_INDEX_SHIFT, #ISA_INDEX_BITS  // extract index
	ldr	p16, [x10, p16, UXTP #PTRSHIFT]	// load class from array
1:

#elif __LP64__
.if \needs_auth == 0 // _cache_getImp takes an authed class already
	mov	p16, \src
.else
	// 64-bit packed isa
	ExtractISA p16, \src, \auth_address
.endif
#else
	// 32-bit raw isa
	mov	p16, \src

#endif

.endmacro

.macro means this is a macro definition!

Look at the SUPPORT_INDEXED_ISA:

// Define SUPPORT_INDEXED_ISA=1 on platforms that store the class in the isa 
// field as an index into a class table.
// Note, keep this in sync with any .s files which also define it.
// Be sure to edit objc-abi.h as well.
#if __ARM_ARCH_7K__ >= 2 || (__arm64__ && !__LP64__)
#   define SUPPORT_INDEXED_ISA 1
#else
#   define SUPPORT_INDEXED_ISA 0
#endif

We're on 64-bit now, so we'll focus on the case where SUPPORT_INDEXED_ISA is 0 and __LP64__ is defined:

.if \needs_auth == 0 // _cache_getImp takes an authed class already
	mov	p16, \src
.else
	// 64-bit packed isa
	ExtractISA p16, \src, \auth_address
.endif

needs_auth is the second argument, which we passed as 1!

So we go to ExtractISA, passing in p16, p13 (the isa), and x0:

.macro ExtractISA
	and	$0, $1, #ISA_MASK
.endmacro

AND $1 with #ISA_MASK and assign the result to $0!

This `and` is very similar to the source we looked at before: isa & ISA_MASK gives the class!

So this step assigns the class to p16!
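A C sketch of what ExtractISA computes (the ISA_MASK value here is the arm64, non-ptrauth one from the objc source):

#define ISA_MASK 0x0000000ffffffff8ULL    // arm64, non-ptrauth
Class cls = (Class)(raw_isa & ISA_MASK);  // p16 = p13 & ISA_MASK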

Why do we fetch the class? Because the method cache is stored inside the class.

That completes GetClassFromIsa_p16!

4. Find the cache: CacheLookup

4.1 Overview

After class is obtained:

LGetIsaDone:
	// calls imp or objc_msgSend_uncached
	CacheLookup NORMAL, _objc_msgSend, __objc_msgSend_uncached

Then we enter CacheLookup, which looks up the cache:

/********************************************************************
 *
 * CacheLookup NORMAL|GETIMP|LOOKUP <function> MissLabelDynamic MissLabelConstant
 *
 * MissLabelConstant is only used for the GETIMP variant.
 *
 * Locate the implementation for a selector in a class method cache.
 *
 * When this is used in a function that doesn't hold the runtime lock,
 * this represents the critical section that may access dead memory.
 * If the kernel causes one of these functions to go down the recovery
 * path, we pretend the lookup failed by jumping the JumpMiss branch.
 *
 * Takes:
 *	 x1 = selector
 *	 x16 = class to be searched
 *
 * Kills:
 *	 x9,x10,x11,x12,x13,x15,x17
 *
 * Untouched:
 *	 x14
 *
 * On exit: (found) calls or returns IMP
 *                  with x16 = class, x17 = IMP
 *                  In LOOKUP mode, the two low bits are set to 0x3
 *                  if we hit a constant cache (used in objc_trace)
 *          (not found) jumps to LCacheMiss
 *                  with x15 = class
 *                  For constant caches in LOOKUP mode, the low bit
 *                  of x16 is set to 0x1 to indicate we had to fallback.
 *          In addition, when LCacheMiss is __objc_msgSend_uncached or
 *          __objc_msgLookup_uncached, 0x2 will be set in x16
 *          to remember we took the slowpath.
 *          So the two low bits of x16 on exit mean:
 *            0: dynamic hit
 *            1: fallback to the parent class, when there is a preoptimized cache
 *            2: slowpath
 *            3: preoptimized cache hit
 *
 ********************************************************************/

#define NORMAL 0
#define GETIMP 1
#define LOOKUP 2

.macro CacheLookup Mode, Function, MissLabelDynamic, MissLabelConstant
	//
	// Restart protocol:
	//
	// As soon as we're past the LLookupStart\Function label we may have
	// loaded an invalid cache pointer or mask.
	//
	// When task_restartable_ranges_synchronize() is called,
	// (or when a signal hits us) before we're past LLookupEnd\Function,
	// then our PC will be reset to LLookupRecover\Function which forcefully
	// jumps to the cache-miss codepath which have the following
	// requirements:
	//
	// GETIMP:
	// The cache-miss is just returning NULL (setting x0 to 0)
	//
	// NORMAL and LOOKUP:
	// - x0 contains the receiver
	// - x1 contains the selector
	// - x16 contains the isa
	// - other registers are set as per calling conventions
	//

	mov	x15, x16			// stash the original isa
LLookupStart\Function:
	// p1 = SEL, p16 = isa
#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16_BIG_ADDRS
	ldr	p10, [x16, #CACHE]				// p10 = mask|buckets
	lsr	p11, p10, #48			// p11 = mask
	and	p10, p10, #0xffffffffffff	// p10 = buckets
	and	w12, w1, w11			// x12 = _cmd & mask
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16
	ldr	p11, [x16, #CACHE]			// p11 = mask|buckets
#if CONFIG_USE_PREOPT_CACHES
#if __has_feature(ptrauth_calls)
	tbnz	p11, #0, LLookupPreopt\Function
	and	p10, p11, #0x0000ffffffffffff	// p10 = buckets
#else
	and	p10, p11, #0x0000fffffffffffe	// p10 = buckets
	tbnz	p11, #0, LLookupPreopt\Function
#endif
	eor	p12, p1, p1, LSR #7
	and	p12, p12, p11, LSR #48		// x12 = (_cmd ^ (_cmd >> 7)) & mask
#else
	and	p10, p11, #0x0000ffffffffffff	// p10 = buckets
	and	p12, p1, p11, LSR #48		// x12 = _cmd & mask
#endif // CONFIG_USE_PREOPT_CACHES
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4
	ldr	p11, [x16, #CACHE]				// p11 = mask|buckets
	and	p10, p11, #~0xf			// p10 = buckets
	and	p11, p11, #0xf			// p11 = maskShift
	mov	p12, #0xffff
	lsr	p11, p12, p11			// p11 = mask = 0xffff >> p11
	and	p12, p1, p11			// x12 = _cmd & mask
#else
#error Unsupported cache mask storage for ARM64.
#endif

	add	p13, p10, p12, LSL #(1+PTRSHIFT)
						// p13 = buckets + ((_cmd & mask) << (1+PTRSHIFT))

						// do {
1:	ldp	p17, p9, [x13], #-BUCKET_SIZE	// {imp, sel} = *bucket--
	cmp	p9, p1				// if (sel != _cmd) {
	b.ne	3f				// scan more
						// } else {
2:	CacheHit \Mode				// hit: call or return imp
						// }
3:	cbz	p9, \MissLabelDynamic		// if (sel == 0) goto Miss;
	cmp	p13, p10			// } while (bucket >= buckets)
	b.hs	1b

	// wrap-around:
	// p10 = first bucket
	// p11 = mask (and maybe other bits on LP64)
	// p12 = _cmd & mask
	//
	// A full cache can happen with CACHE_ALLOW_FULL_UTILIZATION.
	// So stop when we circle back to the first probed bucket
	// rather than when hitting the first bucket again.
	//
	// Note that we might probe the initial bucket twice
	// when the first probed slot is the last entry.


#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16_BIG_ADDRS
	add	p13, p10, w11, UXTW #(1+PTRSHIFT)
						// p13 = buckets + (mask << 1+PTRSHIFT)
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16
	add	p13, p10, p11, LSR #(48 - (1+PTRSHIFT))
						// p13 = buckets + (mask << 1+PTRSHIFT)
						// see comment about maskZeroBits
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4
	add	p13, p10, p11, LSL #(1+PTRSHIFT)
						// p13 = buckets + (mask << 1+PTRSHIFT)
#else
#error Unsupported cache mask storage for ARM64.
#endif
	add	p12, p10, p12, LSL #(1+PTRSHIFT)
						// p12 = first probed bucket

						// do {
4:	ldp	p17, p9, [x13], #-BUCKET_SIZE	// {imp, sel} = *bucket--
	cmp	p9, p1				// if (sel == _cmd)
	b.eq	2b				// goto hit
	cmp	p9, #0				// } while (sel != 0 &&
	ccmp	p13, p12, #0, ne		// bucket > first_probed)
	b.hi	4b

LLookupEnd\Function:
LLookupRecover\Function:
	b	\MissLabelDynamic

#if CONFIG_USE_PREOPT_CACHES
#if CACHE_MASK_STORAGE != CACHE_MASK_STORAGE_HIGH_16
#error config unsupported
#endif
LLookupPreopt\Function:
#if __has_feature(ptrauth_calls)
	and	p10, p11, #0x007ffffffffffffe	// p10 = buckets
	autdb	x10, x16			// auth as early as possible
#endif

	// x12 = (_cmd - first_shared_cache_sel)
	adrp	x9, _MagicSelRef@PAGE
	ldr	p9, [x9, _MagicSelRef@PAGEOFF]
	sub	p12, p1, p9

	// w9 = ((_cmd - first_shared_cache_sel) >> hash_shift & hash_mask)
#if __has_feature(ptrauth_calls)
	// bits 63..60 of x11 are the number of bits in hash_mask
	// bits 59..55 of x11 is hash_shift

	lsr	x17, x11, #55			// w17 = (hash_shift, ...)
	lsr	w9, w12, w17			// >>= shift

	lsr	x17, x11, #60			// w17 = mask_bits
	mov	x11, #0x7fff
	lsr	x11, x11, x17			// p11 = mask (0x7fff >> mask_bits)
	and	x9, x9, x11			// &= mask
#else
	// bits 63..53 of x11 is hash_mask
	// bits 52..48 of x11 is hash_shift
	lsr	x17, x11, #48			// w17 = (hash_shift, hash_mask)
	lsr	w9, w12, w17			// >>= shift
	and	x9, x9, x11, LSR #53		// &= mask
#endif

	ldr	x17, [x10, x9, LSL #3]		// x17 == sel_offs | (imp_offs << 32)
	cmp	x12, w17, uxtw

.if \Mode == GETIMP
	b.ne	\MissLabelConstant		// cache miss
	sub	x0, x16, x17, LSR #32		// imp = isa - imp_offs
	SignAsImp x0
	ret
.else
	b.ne	5f				// cache miss
	sub	x17, x16, x17, LSR #32		// imp = isa - imp_offs
.if \Mode == NORMAL
	br	x17
.elseif \Mode == LOOKUP
	orr x16, x16, #3 // for instrumentation, note that we hit a constant cache
	SignAsImp x17
	ret
.else
.abort  unhandled mode \Mode
.endif

5:	ldursw	x9, [x10, #-8]			// offset -8 is the fallback offset
	add	x16, x16, x9			// compute the fallback isa
	b	LLookupStart\Function		// lookup again with a new isa
.endif
#endif // CONFIG_USE_PREOPT_CACHES

.endmacro

4.2 Get buckets

First look at the arguments passed in: NORMAL, _objc_msgSend and __objc_msgSend_uncached,

corresponding to Mode, Function and MissLabelDynamic!

The macro declares four parameters, so the last one, MissLabelConstant, is simply left empty here!

Then follow along:

mov	x15, x16			// stash the original isa

Here we stash x16 (the class we just obtained) into x15.

Continue to:

LLookupStart\Function:	// p1 = SEL, p16 = isa

Function is _objc_msgSend, so this label expands to LLookupStart_objc_msgSend, the start of the lookup!

Then comes a judgment:

#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16_BIG_ADDRS
	// ...
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16
	// ...
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4
	// ...
#else
#error Unsupported cache mask storage for ARM64.
#endif

Let’s start with CACHE_MASK_STORAGE:

#if defined(__arm64__) && __LP64__ 
#if TARGET_OS_OSX || TARGET_OS_SIMULATOR
#define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_HIGH_16_BIG_ADDRS
#else
#define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_HIGH_16 // True 64-bit
#endif
#elif defined(__arm64__) && !__LP64__
#define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_LOW_4
#else
#define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_OUTLINED
#endif

Since we’re mainly looking at the 64-bit mode of the real machine, we just need to look at CACHE_MASK_STORAGE_HIGH_16:

	ldr	p11, [x16, #CACHE]			// p11 = mask|buckets
#if CONFIG_USE_PREOPT_CACHES
#if __has_feature(ptrauth_calls)
	tbnz	p11, #0, LLookupPreopt\Function
	and	p10, p11, #0x0000ffffffffffff	// p10 = buckets
#else
	and	p10, p11, #0x0000fffffffffffe	// p10 = buckets
	tbnz	p11, #0, LLookupPreopt\Function
#endif
	eor	p12, p1, p1, LSR #7
	and	p12, p12, p11, LSR #48		// x12 = (_cmd ^ (_cmd >> 7)) & mask
#else
	and	p10, p11, #0x0000ffffffffffff	// p10 = buckets
	and	p12, p1, p11, LSR #48		// x12 = _cmd & mask
#endif // CONFIG_USE_PREOPT_CACHES

ldr p11, [x16, #CACHE] loads the value at address x16 + #CACHE into p11!

Look at the definition of CACHE:

#define CACHE            (2 * __SIZEOF_POINTER__)

__SIZEOF_POINTER__ is the size of a pointer, 8 bytes, so CACHE is 16!

So x16 (the class) is offset by 16 bytes, which skips isa and superclass and lands exactly on the cache! p11 now holds the first member of the cache, _bucketsAndMaybeMask!
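This lines up with the layout of objc_class: 16 bytes past the start of the class is exactly the cache. A sketch:

struct objc_class /* : objc_object */ {
    Class isa;          // offset 0x00 (inherited from objc_object)
    Class superclass;   // offset 0x08
    cache_t cache;      // offset 0x10  <-- [x16, #CACHE]
    class_data_bits_t bits;
};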

Next look at CONFIG_USE_PREOPT_CACHES:

#if defined(__arm64__) && TARGET_OS_IOS && !TARGET_OS_SIMULATOR && !TARGET_OS_MACCATALYST
#define CONFIG_USE_PREOPT_CACHES 1
#else
#define CONFIG_USE_PREOPT_CACHES 0
#endif

On a real device it is 1! So we only need to look at this part:

#if __has_feature(ptrauth_calls)
	tbnz	p11, #0, LLookupPreopt\Function
	and	p10, p11, #0x0000ffffffffffff	// p10 = buckets
#else
	and	p10, p11, #0x0000fffffffffffe	// p10 = buckets
	tbnz	p11, #0, LLookupPreopt\Function
#endif

__has_feature: checks whether the compiler supports a given feature.

ptrauth_calls: pointer authentication, used by the arm64e architecture; devices with an Apple A12 or later A-series processor (iPhone XS, iPhone XS Max, iPhone XR and newer) use arm64e.

Let's look at the most common case, real devices below the A12, i.e. the else branch!

p11 & 0x0000fffffffffffe is assigned to p10! So what is p10?

Review how to get buckets:

struct bucket_t *cache_t::buckets(a) const
{
    uintptr_t addr = _bucketsAndMaybeMask.load(memory_order_relaxed);
    return (bucket_t *)(addr & bucketsMask);
}

p11 holds the start of the cache, _bucketsAndMaybeMask; 0x0000fffffffffffe is the buckets mask; so p10 is buckets!

After getting Buckets:

tbnz	p11, #0, LLookupPreopt\Function

tbnz: test a bit and branch if it is not 0; here, if bit 0 of p11 is set we jump to the preoptimized-cache path LLookupPreopt.

For ordinary buckets that bit is 0, so normally no jump happens.

4.3 Start searching

Keep going:

eor	p12, p1, p1, LSR #7
and	p12, p12, p11, LSR #48		// x12 = (_cmd ^ (_cmd >> 7)) & mask

lsr: logical shift right.

This is the same hash that the insert method used earlier:

static inline mask_t cache_hash(SEL sel, mask_t mask) 
{
    uintptr_t value = (uintptr_t)sel;
#if CONFIG_USE_PREOPT_CACHES
    value ^= value >> 7;
#endif
    return (mask_t)(value & mask);
}

So the lookup rehashes the sel the same way to get the index; p12 is the starting index, begin!

After getting the hash index:

add	p13, p10, p12, LSL #(1+PTRSHIFT)
						// p13 = buckets + ((_cmd & mask) << (1+PTRSHIFT))

lsl: logical shift left.

Then look for PTRSHIFT:

#if __LP64__
#define PTRSHIFT 3  // 1<<PTRSHIFT == PTRSIZE
#else
#define PTRSHIFT 2  // 1<<PTRSHIFT == PTRSIZE
#endif

So PTRSHIFT is 3!

p12 is the hash index; shifting it left by 1+PTRSHIFT = 4 bits multiplies it by 16, the size of one bucket_t (an 8-byte sel plus an 8-byte imp).

Adding that offset to p10 (buckets) advances the pointer by begin buckets!

So p13 is the bucket where the probe starts!
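In C terms this add is just pointer arithmetic into the buckets array (a sketch; sizeof(bucket_t) is 16 on arm64):

// p13 = buckets + ((_cmd & mask) << (1 + PTRSHIFT))
bucket_t *b = (bucket_t *)((uintptr_t)buckets + ((uintptr_t)begin << 4));
// equivalent to: bucket_t *b = buckets + begin;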

And then:

						// do {
1:	ldp	p17, p9, [x13], #-BUCKET_SIZE	// {imp, sel} = *bucket--
	cmp	p9, p1				// if (sel != _cmd) {
	b.ne	3f				// scan more
						// } else {
2:	CacheHit \Mode				// hit: call or return imp
						// }
3:	cbz	p9, \MissLabelDynamic		// if (sel == 0) goto Miss;
	cmp	p13, p10			// } while (bucket >= buckets)
	b.hs	1b

ldp x0, x1, [sp] : x0, x1 = the values in stack memory at sp.

cbz: compare, and branch if the value is zero.

Analysis:

1. Load the imp and sel of the current bucket into p17 and p9, then subtract 16 from x13 (x13 now points to the previous bucket). Compare p9 (the sel) with the selector we passed in; if they differ, jump to 3; otherwise fall through to 2.

2. We found the method we passed in: CacheHit, a cache hit.

3. If p9 (the sel) is empty, jump to MissLabelDynamic (__objc_msgSend_uncached, the third argument passed to CacheLookup). Otherwise check whether the bucket address x13 is still greater than or equal to the first bucket p10; if so, jump back to 1.

These 3 steps form a do...while loop!
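Rewritten as C-style pseudocode, the loop looks roughly like this (a sketch; cacheHit and miss stand in for the assembly labels):

bucket_t *b = buckets + begin;
do {
    SEL sel = b->sel;  IMP imp = b->imp;
    b--;                                    // step 1: move to the previous bucket
    if (sel == _cmd) return cacheHit(imp);  // step 2: hit
    if (sel == 0)    goto miss;             // step 3: empty slot, never cached
} while (b >= buckets);
// fell below the first bucket: wrap around (handled further down)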

Why does an empty p9 (sel) mean a miss?

1. The hash algorithm used to read the cache is the same one used to store it.

2. On a real device, when a hash collision occurs, the next slot is chosen like this:

static inline mask_t cache_next(mask_t i, mask_t mask) {
    return i ? i - 1 : mask;
}

On a real device the probe walks backwards (i - 1), so reaching an empty slot means every slot the selector could have been stored in has already been checked: the method was never cached, hence miss!

Similarly, when the bucket address drops below buckets, this first pass of the search is exhausted.

4.4 Cache hit: CacheHit

When we find the method we passed in, we hit the cache and enter CacheHit:

// CacheHit: x17 = cached IMP, x10 = address of buckets, x1 = SEL, x16 = isa
.macro CacheHit
.if $0 == NORMAL
	TailCallCachedImp x17, x10, x1, x16	// authenticate and call imp
.elseif $0 == GETIMP
	mov	p0, p17
	cbz	p0, 9f			// don't ptrauth a nil imp
	AuthAndResignAsIMP x0, x10, x1, x16	// authenticate imp and re-sign as IMP
9:	ret				// return IMP
.elseif $0 == LOOKUP
	// No nil check for ptrauth: the caller would crash anyway when they
	// jump to a nil IMP. We don't care if that jump also fails ptrauth.
	AuthAndResignAsIMP x17, x10, x1, x16	// authenticate imp and re-sign as IMP
	cmp	x16, x15
	cinc	x16, x16, ne			// x16 += 1 when x15 != x16 (for instrumentation; fallback to the parent class)
	ret				// return imp via x17
.else
.abort oops
.endif
.endmacro

$0 is the first value passed to CacheLookup, which is NORMAL!

So just look:

TailCallCachedImp x17, x10, x1, x16	// authenticate and call imp

Then enter the TailCallCachedImp macro:

#if __has_feature(ptrauth_calls)
// JOP
.macro TailCallCachedImp
	// $0 = cached imp, $1 = address of cached imp, $2 = SEL, $3 = isa
	eor	$1, $1, $2	// mix SEL into ptrauth modifier
	eor	$1, $1, $3  // mix isa into ptrauth modifier
	brab	$0, $1
.endmacro
#else
// not JOP
.macro TailCallCachedImp
	// $0 = cached imp, $1 = address of cached imp, $2 = SEL, $3 = isa
	eor	$0, $0, $3
	br	$0
.endmacro
#endif

Again, let's look at devices below the A12 (the non-ptrauth branch):

.macro TailCallCachedImp
	// $0 = cached imp, $1 = address of cached imp, $2 = SEL, $3 = isa
	eor	$0, $0, $3
	br	$0
.endmacro

eor: bitwise exclusive OR (XOR).

We passed in x17 (imp), x10 (buckets), x1 (sel), x16 (isa)!

So this XORs $0 (the cached imp) with $3 (the isa, i.e. the class) and assigns the result back to $0.

Why is there an XOR here?

Because when we insert into the cache, the imp is encoded:

// Sign newImp, with &_imp, newSel, and cls as modifiers.
    uintptr_t encodeImp(UNUSED_WITHOUT_PTRAUTH bucket_t *base, IMP newImp, UNUSED_WITHOUT_PTRAUTH SEL newSel, Class cls) const {
        if (!newImp) return 0;
#if CACHE_IMP_ENCODING == CACHE_IMP_ENCODING_PTRAUTH
        return (uintptr_t)
            ptrauth_auth_and_resign(newImp,
                                    ptrauth_key_function_pointer, 0,
                                    ptrauth_key_process_dependent_code,
                                    modifierForSEL(base, newSel, cls));
#elif CACHE_IMP_ENCODING == CACHE_IMP_ENCODING_ISA_XOR
        return (uintptr_t)newImp ^ (uintptr_t)cls;
#elif CACHE_IMP_ENCODING == CACHE_IMP_ENCODING_NONE
        return (uintptr_t)newImp;
#else
#error Unknown method cache IMP encoding.
#endif
    }

So the XOR here decodes the stored value back into the real imp!
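XOR is its own inverse, so applying the class once more recovers the original pointer (a sketch):

uintptr_t stored = (uintptr_t)imp ^ (uintptr_t)cls;  // what encodeImp stored in the bucket
IMP decoded = (IMP)(stored ^ (uintptr_t)cls);        // what TailCallCachedImp computes: == imp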

Finally, br jumps to the imp!

That is the whole process of objc_msgSend finding an imp from a sel!

4.5 Continue searching

If p13 (the current bucket) drops below p10 (buckets), we fall out of the loop and continue down:

// wrap-around:
	// p10 = first bucket
	// p11 = mask (and maybe other bits on LP64)
	// p12 = _cmd & mask
	//
	// A full cache can happen with CACHE_ALLOW_FULL_UTILIZATION.
	// So stop when we circle back to the first probed bucket
	// rather than when hitting the first bucket again.
	//
	// Note that we might probe the initial bucket twice
	// when the first probed slot is the last entry.


#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16_BIG_ADDRS
	add	p13, p10, w11, UXTW #(1+PTRSHIFT)
						// p13 = buckets + (mask << 1+PTRSHIFT)
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16
	add	p13, p10, p11, LSR #(48 - (1+PTRSHIFT))
						// p13 = buckets + (mask << 1+PTRSHIFT)
						// see comment about maskZeroBits
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4
	add	p13, p10, p11, LSL #(1+PTRSHIFT)
						// p13 = buckets + (mask << 1+PTRSHIFT)
#else
#error Unsupported cache mask storage for ARM64.
#endif

Just look at CACHE_MASK_STORAGE_HIGH_16:

	add	p13, p10, p11, LSR #(48 - (1+PTRSHIFT))
						// p13 = buckets + (mask << 1+PTRSHIFT)
						// see comment about maskZeroBits

p11 holds _bucketsAndMaybeMask, with the mask in its high 16 bits; shifting it right by 48 - (1+PTRSHIFT) is the same as extracting the mask and multiplying it by 16, and the result is added to p10 (buckets).

So p13 now points to the last bucket of buckets!

And then:

	add	p12, p10, p12, LSL #(1+PTRSHIFT)
						// p12 = first probed bucket

p12 (begin, the hash index) is shifted left by 4 bits (begin * 16) and added to p10 (buckets).

So p12 is the bucket at begin, the first probed bucket!

Continue to:

						// do {
4:	ldp	p17, p9, [x13], #-BUCKET_SIZE	// {imp, sel} = *bucket--
	cmp	p9, p1				// if (sel == _cmd)
	b.eq	2b				// goto hit
	cmp	p9, #0				// } while (sel != 0 &&
	ccmp	p13, p12, #0, ne		// bucket > first_probed)
	b.hi	4b

This loop is similar to the one above:

1. Load the imp and sel of the current bucket into p17 and p9, then bucket--.

2. Compare p9 (the sel) with the incoming sel; if they are the same, jump to CacheHit.

3. Check that p9 (the sel) is not 0 and that the bucket is still above p12 (the bucket at begin).

4. If both hold, jump back to 4 (the beginning of this loop).
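As C-style pseudocode (a sketch, same conventions as the earlier loop):

bucket_t *b = buckets + mask;               // resume from the last bucket
bucket_t *first_probed = buckets + begin;   // where the first pass started
SEL sel;
do {
    sel = b->sel;
    IMP imp = b->imp;
    b--;
    if (sel == _cmd) return cacheHit(imp);  // hit
} while (sel != 0 && b > first_probed);
// still not found: fall through to __objc_msgSend_uncached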

4.6 Not found

If p9 (the sel) is 0, or the bucket is no longer above p12 (the bucket at begin), we fall through:

LLookupEnd\Function:
LLookupRecover\Function:
	b	\MissLabelDynamic

That is, jump to MissLabelDynamic (__objc_msgSend_uncached)!

Five. Summary