Preface
If you want to be an iOS developer, you have to read source code. The following is a short series on Class and Object that I put together while exploring the Objective-C source code. You are welcome to read it and point out mistakes; I hope it helps you.
- OC source code analysis: object creation
- OC source code analysis: isa
- OC source code analysis: class structure
- OC source code analysis: method caching
- OC source code analysis: method lookup
- OC source code analysis: method resolution and forwarding
This article is an analysis of the method cache — cache_t (source code version objc4-756.2). Let's move on to the body.
1 cache_t source code analysis
When an OC project is compiled, the instance methods of a class (SEL and IMP) are stored in the class's method list. Because of OC's dynamic nature, a method call is really a lookup of an IMP by its SEL. Imagine if every method call had to search the class's method list (or even the method lists of its superclasses up to the root class) for the function address. To address this problem, OC uses a method caching mechanism — cache_t — to improve call efficiency by caching methods that have already been called. When a method is called, objc_msgSend looks in the cache first and, on a hit, executes the method directly. On a miss, the method lists of the class (and of its superclasses up to the root class) are searched; once found, the SEL and IMP are written into cache_t so the next call executes quickly.
1.1 cache_t structure
First, take a look at the structure of cache_t
struct cache_t {
    struct bucket_t *_buckets;  // cache array, i.e. the hash buckets
    mask_t _mask;               // cache array capacity minus one
    mask_t _occupied;           // number of methods cached in the array
    // ... some member functions
};

#if __LP64__
typedef uint32_t mask_t;
#else
typedef uint16_t mask_t;
#endif

struct bucket_t {
private:
#if __arm64__
    uintptr_t _imp;
    SEL _sel;
#else
    SEL _sel;
    uintptr_t _imp;
#endif
    // ... some member functions
};
As you can see from the source code, cache_t is 16 bytes on a 64-bit CPU architecture. Structurally, methods are cached in bucket_t structures (also known as hash buckets). Let's use an example to verify that cache_t caches invoked methods.
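To see where the 16 bytes come from, here is a minimal layout sketch. These are our own stand-in types (bucket_t_sketch, cache_t_sketch), not the real objc4 declarations, mirroring the LP64 layout: one 8-byte pointer plus two 4-byte mask_t fields.

```cpp
#include <cstdint>

typedef uint32_t mask_t;        // mask_t is uint32_t under __LP64__

struct bucket_t_sketch {        // stand-in for bucket_t (arm64 field order)
    uintptr_t _imp;
    void     *_sel;             // SEL is an opaque pointer type
};

struct cache_t_sketch {         // stand-in for cache_t
    bucket_t_sketch *_buckets;  // 8 bytes (pointer)
    mask_t           _mask;     // 4 bytes
    mask_t           _occupied; // 4 bytes
};
```

On LP64 that is 8 + 4 + 4 = 16 bytes, with no padding required.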
1.2 Verifying the method cache
- Create a simple Person class, with code as follows:
@interface Person : NSObject
- (void)methodFirst;
- (void)methodSecond;
- (void)methodThird;
@end
@implementation Person
- (void)methodFirst {
NSLog(@"%s", __FUNCTION__);
}
- (void)methodSecond {
NSLog(@"%s", __FUNCTION__);
}
- (void)methodThird {
NSLog(@"%s", __FUNCTION__);
}
@end
- cache_t before any method call

Set a breakpoint before the method call to see the state of cache_t.
Description:

- From the objc_class structure it is easy to derive that 0x1000011d8 is the first address of cache_t. (If you are interested in class structure, see "OC source code analysis: class structure".)
- Since no method has been called yet, _mask and _occupied are both zero.
- cache_t after a method call

cache_t changes as follows after alloc and init are executed:
As you can see from the figure above, after init is called, _mask is 3 and _occupied is 1. The _buckets pointer has changed (from 0x1003DB250 to 0x101700090), and the SEL and IMP of the init method are cached.
Consider: 1. Where is the cache after the alloc method is called? 2. Why is init not in the first slot of _buckets?
Next, execute methodFirst and look at cache_t again.

_mask is still 3 (unchanged) and _occupied is now 2; the _buckets pointer address remains the same; the SEL and IMP of methodFirst are cached.
Then execute methodSecond and take a look.

Obviously, _occupied becomes 3, the _buckets pointer remains the same, and the method cache for methodSecond is added.
Finally, after executing methodThird, look at the cache_t change
This time it was a different story. _mask is 7, _occupied is 1 again, and _buckets not only has a new start address, but the previously cached init, methodFirst, and methodSecond methods are gone, leaving only the new methodThird. It seems cache_t does not simply accumulate one cached entry per method called.
Think: what happened to the previously cached methods (init, methodFirst, methodSecond)?
1.3 cache_t summary
Let's go over the above example. After executing the Person instance methods init, methodFirst, methodSecond, and methodThird in turn, cache_t changes as follows:
| Methods called | _buckets | _mask | _occupied |
|---|---|---|---|
| (none) | empty | 0 | 0 |
| init | init | 3 | 1 |
| init, methodFirst | init, methodFirst | 3 | 2 |
| init, methodFirst, methodSecond | init, methodFirst, methodSecond | 3 | 3 |
| init, methodFirst, methodSecond, methodThird | methodThird | 7 | 1 |
As you can see, cache_t does cache invoked methods in real time.
The validation above also helps us understand the meaning of cache_t's three member variables: _buckets is the array of hash buckets, and _occupied is the number of cached methods — but what is _mask? At first glance we have no clue, but notice that cache_t has a function called capacity, whose source code is shown below:
struct cache_t {
    // ...
    mask_t mask();
    mask_t capacity();
    // ...
};

mask_t cache_t::mask()
{
    return _mask;
}

mask_t cache_t::capacity()
{
    return mask() ? mask()+1 : 0;
}
Therefore, if _mask is 0, no instance method has been called yet and the bucket capacity is 0; if _mask is not 0, instance methods have been called and the bucket capacity is _mask + 1. In other words, _mask indirectly reflects the capacity of the buckets.
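The relationship can be sketched as a standalone helper (capacity_from_mask is our own name, mirroring the logic of cache_t::capacity()):

```cpp
#include <cstdint>

typedef uint32_t mask_t;

// Mirrors cache_t::capacity(): a mask of 0 means no buckets yet;
// otherwise the bucket count is mask + 1 (masks are always 2^n - 1).
mask_t capacity_from_mask(mask_t mask) {
    return mask ? mask + 1 : 0;
}
```

With the values from example [1.2]: a mask of 3 means 4 buckets, and a mask of 7 means 8.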
2 cache_t method caching principle
Next, let's examine the principle of method caching in cache_t, starting with method invocation.
2.1 cache_fill
The essence of an OC method call is message sending (that is, objc_msgSend), which at bottom finds the IMP through the method's SEL. When a method is called, objc_msgSend first queries cache_t for the method's implementation (this lookup is implemented efficiently in assembly). When the method is not in the cache, the class's method lists are searched until it is found, and then cache_fill is called to write the method into cache_t. The source code is shown below:
void cache_fill(Class cls, SEL sel, IMP imp, id receiver)
{
#if !DEBUG_TASK_THREADS
mutex_locker_t lock(cacheUpdateLock);
cache_fill_nolock(cls, sel, imp, receiver);
#else
_collecting_in_critical();
return;
#endif
}
The specific process of objc_msgSend will be analyzed in another article, which will not be described here.
2.2 cache_fill_nolock
cache_fill calls through to cache_fill_nolock, which writes the SEL and IMP into _buckets and updates _mask and _occupied.
Its source code and detailed analysis are as follows:
static void cache_fill_nolock(Class cls, SEL sel, IMP imp, id receiver)
{
    cacheUpdateLock.assertLocked();

    // Return if the class is not initialized
    if (!cls->isInitialized()) return;

    // After taking cacheUpdateLock, make sure no other thread
    // has already written this method into the cache
    if (cache_getImp(cls, sel)) return;

    // Get the cache_t pointer of cls
    cache_t *cache = getCache(cls);

    // newOccupied: the new number of cached methods = current number + 1
    mask_t newOccupied = cache->occupied() + 1;
    // Total capacity of the current cache_t, i.e. mask + 1
    mask_t capacity = cache->capacity();
    if (cache->isConstantEmptyCache()) {
        // First call of an instance method of the class (as in example [1.2])
        cache->reallocate(capacity, capacity ?: INIT_CACHE_SIZE);
    }
    else if (newOccupied <= capacity / 4 * 3) {
        // The new number of cached methods does not exceed 3/4 of capacity:
        // cache directly
    }
    else {
        // The new number exceeds 3/4 of capacity: expand first
        cache->expand();
    }

    // Find a bucket whose sel is 0 (an empty bucket), or whose sel
    // equals sel (hash collision, very unlikely)
    bucket_t *bucket = cache->find(sel, receiver);
    // _occupied++ is executed if and only if the bucket's sel is 0
    if (bucket->sel() == 0) cache->incrementOccupied();
    // Update the bucket's SEL and IMP
    bucket->set<Atomic>(sel, imp);
}

// INIT_CACHE_SIZE is 4
enum {
    INIT_CACHE_SIZE_LOG2 = 2,
    INIT_CACHE_SIZE = (1 << INIT_CACHE_SIZE_LOG2)
};
As you can see from the source code above, cache_fill_nolock is essentially the dispatch center of cache_t's caching mechanism. It:

- decides which caching strategy to apply to _buckets (one of three: initialize then cache, cache directly, or expand then cache);
- then uses the method's sel to find a bucket and updates that bucket's sel and imp. (If the bucket's sel is 0, the bucket is empty and can hold the method, so _occupied++ is executed.)
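The three-way decision can be sketched as follows. This is a simplified model with our own names (decide, CacheAction); the real code operates on cache_t directly:

```cpp
#include <cstdint>

typedef uint32_t mask_t;

enum CacheAction { InitThenCache, CacheDirectly, ExpandThenCache };

// Models the branch in cache_fill_nolock: an empty cache is first
// allocated (INIT_CACHE_SIZE = 4); if occupied+1 stays within 3/4 of
// capacity the method is cached directly; otherwise expand first.
CacheAction decide(mask_t occupied, mask_t capacity, bool emptyCache) {
    mask_t newOccupied = occupied + 1;
    if (emptyCache)                      return InitThenCache;
    if (newOccupied <= capacity / 4 * 3) return CacheDirectly;
    return ExpandThenCache;
}
```

For example, with capacity 4 the fourth method (occupied == 3) triggers expansion, matching the methodThird observation in example [1.2].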
Consider: why is the expansion threshold 3/4?
2.3 reallocate
reallocate is executed in two cases:

- when _buckets is first initialized;
- when _buckets is expanded.

Let's look at what reallocate does:
void cache_t::reallocate(mask_t oldCapacity, mask_t newCapacity)
{
    // freeOld is true if and only if _buckets already caches methods
    bool freeOld = canBeFreed();

    // Get the current buckets pointer, i.e. _buckets
    bucket_t *oldBuckets = buckets();
    // Allocate a new buckets array
    bucket_t *newBuckets = allocateBuckets(newCapacity);

    // Cache's old contents are not propagated.
    // This is thought to save cache memory at the cost of extra cache fills.
    // fixme re-measure this
    assert(newCapacity > 0);
    assert((uintptr_t)(mask_t)(newCapacity-1) == newCapacity-1);

    // Set the new buckets and new mask (newCapacity - 1)
    // as the current _buckets and _mask
    setBucketsAndMask(newBuckets, newCapacity - 1);

    if (freeOld) {
        // Free the old buckets' memory
        cache_collect_free(oldBuckets, oldCapacity);
        cache_collect(false);
    }
}
reallocate perfectly explains several observations in example [1.2]: after init executes, the _buckets pointer address changes and _mask becomes 3; after methodThird executes, not only does the _buckets pointer address change, but the previously cached init, methodFirst, and methodSecond methods are gone.
Note that the _occupied update happens after returning to cache_fill_nolock.
Think: after expansion, why not just copy the already-cached methods into the new buckets?
2.4 expand
According to the cache_fill_nolock source, _buckets is expanded when the new number of cached methods (_occupied + 1) exceeds 3/4 of the total capacity (_mask + 1):
void cache_t::expand()
{
    cacheUpdateLock.assertLocked();

    // Get the current total capacity, i.e. _mask + 1
    uint32_t oldCapacity = capacity();
    // New capacity = old capacity * 2
    uint32_t newCapacity = oldCapacity ? oldCapacity*2 : INIT_CACHE_SIZE;

    if ((uint32_t)(mask_t)newCapacity != newCapacity) {
        // mask overflow - can't grow further
        // fixme this wastes one bit of mask
        newCapacity = oldCapacity;
    }

    reallocate(oldCapacity, newCapacity);
}
This function is very simple: it just computes the new capacity and calls reallocate. Note that:

- as long as doubling does not overflow mask_t, the capacity doubles each time;
- if doubling would overflow mask_t, a new buckets array of the same capacity as the original is allocated.
2.5 find
After the buckets strategy is carried out, the next step is to find a suitable bucket to store the method's SEL and IMP. All find does is return an appropriate bucket based on the method's SEL. Again, look at the source code:
bucket_t * cache_t::find(SEL s, id receiver)
{
    assert(s != 0);

    // Get the current buckets, i.e. _buckets
    bucket_t *b = buckets();
    // Get the current mask, i.e. _mask
    mask_t m = mask();
    // The start index comes from sel & mask
    mask_t begin = cache_hash(s, m);
    mask_t i = begin;
    do {
        // sel == 0: an empty bucket that can cache the method
        // sel == s: cache hit; the method at index i is already cached
        //           (possibly after a hash collision)
        if (b[i].sel() == 0  ||  b[i].sel() == s) {
            return &b[i];
        }
    } while ((i = cache_next(i, m)) != begin);

    // hack
    // No usable bucket found (error handling; you don't normally end up here!)
    Class cls = (Class)((uintptr_t)this - offsetof(objc_class, cache));
    cache_t::bad_cache(receiver, (SEL)s, cls);
}

static inline mask_t cache_hash(SEL sel, mask_t mask)
{
    return (mask_t)(uintptr_t)sel & mask;
}

#if __arm__  ||  __x86_64__  ||  __i386__
// objc_msgSend has few registers available.
// Cache scan increments and wraps at special end-marking bucket.
#define CACHE_END_MARKER 1
static inline mask_t cache_next(mask_t i, mask_t mask) {
    return (i+1) & mask;
}

#elif __arm64__
// objc_msgSend has lots of registers available.
// Cache scan decrements. No end marker needed.
#define CACHE_END_MARKER 0
static inline mask_t cache_next(mask_t i, mask_t mask) {
    return i ? i-1 : mask;
}

#else
#error unknown architecture
#endif
find uses a hash lookup: cache_hash produces the start index into _buckets (sel & mask), and the buckets are probed from there. Since the index comes from a hash, the resulting order is naturally unordered, which is why the init method is not in the first slot of _buckets in our example.
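A toy model of this probe looks like the following (our own simplified types — Slot and find_slot — using the incrementing cache_next variant):

```cpp
#include <cstdint>
#include <cstddef>

typedef uint32_t mask_t;

struct Slot { uintptr_t sel; uintptr_t imp; };  // stand-in for bucket_t

// Models cache_t::find(): start at sel & mask, then probe linearly
// (wrapping via the mask) until a matching or an empty slot is found.
Slot *find_slot(Slot *buckets, mask_t mask, uintptr_t sel) {
    mask_t begin = (mask_t)(sel & mask);        // cache_hash
    mask_t i = begin;
    do {
        if (buckets[i].sel == 0 || buckets[i].sel == sel)
            return &buckets[i];
        i = (i + 1) & mask;                     // cache_next (x86_64 form)
    } while (i != begin);
    return nullptr;   // table full: objc4 calls bad_cache here
}
```

Note that where a method lands depends only on its sel's address, which is why the cached entries appear in no particular order.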
3 The effect of multithreading on method caching
Since the number of hash buckets grows dynamically at runtime, is method caching affected when methods are called from multiple threads? Look at the analysis below.
3.1 Multiple threads reading the cache simultaneously

Throughout objc_msgSend, no locks are taken for reads of the method cache, for maximum performance. If multiple threads call already-cached methods at the same time, _buckets and _mask are not modified, so concurrent reads of the method cache are safe.
3.2 Multiple threads writing the cache simultaneously

As we saw in the source code, before the buckets are expanded or written, the system uses a global mutex (cacheUpdateLock.assertLocked()) to serialize writes, and checks the cache again after taking the lock (if (cache_getImp(cls, sel)) return;). This way, even if multiple threads try to cache the same method at the same time, it is only written once; concurrent writes to the cache pose no safety risk.
3.3 Multiple threads reading and writing the cache simultaneously
This situation is more complicated. Let’s first look at the code for objc_msgSend read cache (arm64 architecture assembler).
.macro CacheLookup
	// x1 = SEL, x16 = isa
	ldp	x10, x11, [x16, #CACHE]	// x10 = buckets, x11 = occupied|mask
	and	w12, w1, w11		// x12 = _cmd & mask
	add	x12, x10, x12, LSL #4	// x12 = buckets + ((_cmd & mask)<<4)

	ldp	x9, x17, [x12]		// {x9, x17} = *bucket
1:	cmp	x9, x1			// if (bucket->sel != _cmd)
	b.ne	2f			//     scan more
	CacheHit $0			// call or return imp

2:	// not hit: x12 = not-hit bucket
	CheckMiss $0			// miss if bucket->sel == 0
	cmp	x12, x10		// wrap if bucket == buckets
	b.eq	3f
	ldp	x9, x17, [x12, #-16]!	// {x9, x17} = *--bucket
	b	1b			// loop

3:	// wrap: x12 = first bucket, w11 = mask
	add	x12, x12, w11, UXTW #4	// x12 = buckets+(mask<<4)

	// Clone scanning loop to miss instead of hang when cache is corrupt.
	// The slow path may detect any corruption and halt later.

	ldp	x9, x17, [x12]		// {x9, x17} = *bucket
1:	cmp	x9, x1			// if (bucket->sel != _cmd)
	b.ne	2f			//     scan more
	CacheHit $0			// call or return imp

2:	// not hit: x12 = not-hit bucket
	CheckMiss $0			// miss if bucket->sel == 0
	cmp	x12, x10		// wrap if bucket == buckets
	b.eq	3f
	ldp	x9, x17, [x12, #-16]!	// {x9, x17} = *--bucket
	b	1b			// loop

3:	// double wrap
	JumpMiss $0

.endmacro
The ldp instruction reads data from memory into registers. The first ldp above loads the cache_t members _buckets and _occupied|_mask into registers x10 and x11 in a single instruction, and the subsequent CacheLookup code does all of its hash probing using only the values in x10 and x11, never re-reading cache_t. So as long as ldp x10, x11, [x16, #CACHE] sees a matching pair of _buckets and _mask (both pre-expansion or both post-expansion), there is no safety risk in multiple threads reading and writing the method cache at the same time.
3.3.1 Compiler memory barrier
How does the system guarantee this consistency between _buckets and _mask? Let's look at the code that writes these two variables:
void cache_t::setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask)
{
// objc_msgSend uses mask and buckets with no locks.
// It is safe for objc_msgSend to see new buckets but old mask.
// (It will get a cache miss but not overrun the buckets' bounds).
// It is unsafe for objc_msgSend to see old buckets and new mask.
// Therefore we write new buckets, wait a lot, then write new mask.
// objc_msgSend reads mask first, then buckets.
// ensure other threads see buckets contents before buckets pointer
mega_barrier();
_buckets = newBuckets;
// ensure other threads see new buckets before new mask
mega_barrier();
_mask = newMask;
_occupied = 0;
}
This C++ code modifies _buckets first and _mask second. To ensure this order is not reordered by the compiler, mega_barrier() is used as a compiler memory barrier.
Without the compiler memory barrier, the compiler might reorder the code to assign _mask first and _buckets second. If another thread executed ldp x10, x11, [x16, #0x10] in between, it would see the old _buckets with the new _mask, and an out-of-bounds read of the bucket array could crash the program.
With the compiler memory barrier in place, a reader can at worst see the new _buckets with the old _mask, which does not crash the program.
The compiler memory barrier guarantees that the assignment of _buckets happens before the assignment of _mask. That is, whenever ldp x10, x11, [x16, #CACHE] executes, the _buckets array's length is at least _mask + 1, which rules out crashes from out-of-bounds memory reads. As you can see, lock-free read-write schemes can, to some extent, be built with compiler memory barrier techniques.
(A memory barrier is also known as a memory fence.)
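In modern C++ the same publication order can be expressed with release stores instead of objc4's mega_barrier(). This is a sketch with our own names (Bucket, set_buckets_and_mask as free functions on globals), not the runtime's actual code:

```cpp
#include <atomic>
#include <cstdint>

typedef uint32_t mask_t;
struct Bucket { uintptr_t sel, imp; };

std::atomic<Bucket *> g_buckets{nullptr};
std::atomic<mask_t>   g_mask{0};

// Publish the new (larger) buckets array strictly before the new mask,
// so a reader that observes the new mask also observes the new buckets
// and can never index past the end of the array it sees.
void set_buckets_and_mask(Bucket *newBuckets, mask_t newMask) {
    g_buckets.store(newBuckets, std::memory_order_release);
    g_mask.store(newMask, std::memory_order_release);
}
```

A reader that acquire-loads g_mask and then loads g_buckets gets the same guarantee the runtime relies on: new mask implies new buckets.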
3.3.2 Memory garbage collection
As noted, a writer thread may expand _buckets (allocating new bucket memory and destroying the old) while another thread is reading the method cache, and a read of freed memory could crash the program. To solve this problem, OC uses two global arrays, objc_entryPoints and objc_exitPoints, to hold the start and end addresses of all functions that access the cache:
extern "C" uintptr_t objc_entryPoints[];
extern "C" uintptr_t objc_exitPoints[];
These functions are listed below (again using the arm64 assembly as an example):
.private_extern _objc_entryPoints
_objc_entryPoints:
.quad _cache_getImp
.quad _objc_msgSend
.quad _objc_msgSendSuper
.quad _objc_msgSendSuper2
.quad _objc_msgLookup
.quad _objc_msgLookupSuper2
.quad 0
.private_extern _objc_exitPoints
_objc_exitPoints:
.quad LExit_cache_getImp
.quad LExit_objc_msgSend
.quad LExit_objc_msgSendSuper
.quad LExit_objc_msgSendSuper2
.quad LExit_objc_msgLookup
.quad LExit_objc_msgLookupSuper2
.quad 0
When a thread expands the hash buckets, it stores the old bucket memory in a global garbage array, garbage_refs, and then iterates over all threads in the current process (in iOS, a process is an application), checking whether any thread is currently executing inside one of the objc_entryPoints functions. If not, no thread is accessing the cache, and it is safe to actually free all the bucket memory blocks held in garbage_refs. If so, some thread is accessing the cache; nothing is freed this time, and the check is repeated later until it is safe to destroy them.
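The idea can be sketched like this (a toy single-file model; garbage_refs_sketch and the boolean in-entry-points check are our simplifications of the runtime's real per-thread PC bookkeeping):

```cpp
#include <cstdlib>
#include <vector>

// Old bucket arrays are parked here instead of being freed immediately.
static std::vector<void *> garbage_refs_sketch;

void collect_free_sketch(void *oldBuckets) {
    garbage_refs_sketch.push_back(oldBuckets);
}

// Freed only when no thread is inside a cache-reading function
// (objc4 checks every thread's PC against objc_entryPoints /
// objc_exitPoints; here that check is reduced to a boolean).
void collect_sketch(bool someThreadIsReadingCache) {
    if (someThreadIsReadingCache) return;   // unsafe: try again later
    for (void *p : garbage_refs_sketch) free(p);
    garbage_refs_sketch.clear();
}
```

Deferring the free until no reader can possibly hold a stale _buckets pointer is what makes the lock-free read path safe.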
In short, the OC 2.0 runtime cleverly combines the ldp instruction, compiler memory barriers, and memory garbage collection into a lock-free scheme for concurrent reads and writes of the method cache that both guarantees safety and improves system performance.
Special thanks here to Ouyang! His post will help you learn more about the runtime implementation, as well as multithreaded reading and writing of the method cache. Highly recommended!
4 Discussion
By now you should have some understanding of how cache_t caches methods. Now consider the following questions:
4.1 Cache location of class methods
Q: Where is the cache after the Person class calls the alloc method?
A: It is cached in the cache_t of Person's metaclass. The proof is shown below.
4.2 The role of _mask
Q: Explain the function of _mask in cache_t
A: _mask indirectly reflects the number of hash buckets in cache_t (number of buckets = _mask + 1) and guarantees that bucket lookups never go out of bounds.
From the source code above, the number of hash buckets per cache_t is a power of two (at least 4), and _mask equals the number of buckets minus 1; that is, while methods are cached, _mask's binary bits are all 1s. When probing the buckets in a loop, the index is computed as xx & _mask, so the index is always less than the number of buckets (index <= _mask, hence index < capacity) and the lookup never crosses the boundary.
4.3 Discussion of the 3/4 expansion threshold
Q: Why is the critical point of expansion 3/4?
A: Setting a threshold is a trade-off between space utilization and time efficiency. At the 3/4 threshold, space utilization is high while a considerable number of hash collisions are still avoided, so time efficiency is also high.
The critical point of capacity expansion directly affects the efficiency of the circular hash bucket search. Imagine two extremes:
If the threshold were 1 — that is, expansion happened only when every hash bucket was occupied — the allocated memory would be 100% utilized, but a large number of hash collisions would occur, raising the cost of probing for an index; time efficiency would drop, which defeats the purpose of caching.

If the threshold were 0.5 — expansion as soon as half the buckets were occupied — hash collisions would be largely avoided and time efficiency would be high, but half the space would be wasted and space utilization would be low; this trade of space for time is equally undesirable.
When the critical point of expansion is 3/4, both space utilization and time utilization are relatively high.
4.4 Cache lookup loop
Q: Does the cache loop find hash buckets in an infinite loop?
A: No.
When hash bucket utilization reaches 3/4, the next write triggers expansion, so the number of empty buckets is always at least 1/4 of the total. Therefore, the probe loop always terminates — either at a hit or at an empty bucket.
5 Conclusion
Having verified the examples, analyzed the source code, and discussed the questions above, here are a few conclusions about cache_t:
- cache_t caches methods that have been called.
- Of cache_t's three member variables:
  - _buckets, of type struct bucket_t *, points to an array of hash buckets; each bucket can cache one method (the SEL and IMP of a called method).
  - _mask, of type mask_t (uint32_t on 64-bit, 4 bytes), equals the total number of hash buckets minus 1 (capacity - 1), so it indirectly reflects the number of buckets.
  - _occupied, of type mask_t, is the number of methods currently cached in _buckets.
- When the number of cached methods reaches the threshold (3/4 of the total number of buckets), caching the next new method first discards the old buckets and allocates new memory, i.e. expansion (after expansion all buckets are brand new, and every method has to be cached again); the new method is then cached and _occupied becomes 1.
- When multiple threads call methods at the same time, there are several cases:
  - Multithreaded reads: the cache read path is implemented in assembly, lock-free and fast; since reads do not modify _buckets and _mask, there is no safety hazard.
  - Multithreaded writes: OC uses a global mutex (cacheUpdateLock.assertLocked()) to ensure the cache is not written twice.
  - Multithreaded reads and writes: OC combines the ldp instruction, compiler memory barriers, and memory garbage collection into a lock-free scheme that guarantees safety while also improving system performance.
6 Reference Materials
- Deconstruct the implementation of the objc_msgSend function in depth
- Understanding Memory barriers
7 PS
- The source project has been placed on github; see Objc4-756.2 source code.
- You can also download Apple's official objc4 source code to study.
- Please indicate the source when reprinting. Thank you very much!