Preface
If you want to be an iOS developer, you have to read source code. The following is a short series on Class and Object that I put together while exploring the Objective-C source code. You are welcome to read it and point out mistakes; I hope it helps you.
- OC source code analysis: object creation
- OC source code analysis: isa
- OC source code analysis: class structure
- OC source code analysis: method caching
- OC source code analysis: method lookup
- OC source code analysis: method resolution and forwarding
This article is an analysis of the method cache — cache_t (source code version objc4-756.2). Let's move on to the body.
1 cache_t source code analysis
When an OC project is compiled, the instance methods of a class (SEL and IMP) are stored in the class's method list. Because of OC's dynamic nature, a method call is really a lookup of an IMP by its SEL. Imagine if every method call had to search the class's method list (or even the method lists of its superclasses up to the root class) for the function address. To address this problem, OC uses a method caching mechanism — cache_t — to improve call efficiency by caching methods that have already been called. When a method is called, objc_msgSend looks in the cache first and, on a hit, executes the method directly. On a miss, the method lists of the class (and of its superclasses up to the root class) are searched; once found, the SEL and IMP are written into cache_t so the next call executes quickly.
1.1 cache_t structure
First, take a look at the structure of cache_t
struct cache_t {
    struct bucket_t *_buckets;  // cache array, i.e. the hash buckets
    mask_t _mask;               // cache array capacity minus one
    mask_t _occupied;           // number of methods cached in the array
    // ... some member functions
};

#if __LP64__
typedef uint32_t mask_t;
#else
typedef uint16_t mask_t;
#endif

struct bucket_t {
private:
#if __arm64__
    uintptr_t _imp;
    SEL _sel;
#else
    SEL _sel;
    uintptr_t _imp;
#endif
    // ... some member functions
};
As you can see from the source code, cache_t is 16 bytes on a 64-bit CPU architecture. Structurally, methods are cached in bucket_t structures (also known as hash buckets). Let's use an example to verify that cache_t caches invoked methods.
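To see where the 16 bytes come from, here is a minimal layout sketch. These are our own stand-in types (bucket_t_sketch, cache_t_sketch), not the real objc4 declarations, mirroring the LP64 layout: one 8-byte pointer plus two 4-byte mask_t fields.

```cpp
#include <cstdint>

typedef uint32_t mask_t;        // mask_t is uint32_t under __LP64__

struct bucket_t_sketch {        // stand-in for bucket_t (arm64 field order)
    uintptr_t _imp;
    void     *_sel;             // SEL is an opaque pointer type
};

struct cache_t_sketch {         // stand-in for cache_t
    bucket_t_sketch *_buckets;  // 8 bytes (pointer)
    mask_t           _mask;     // 4 bytes
    mask_t           _occupied; // 4 bytes
};
```

On LP64 that is 8 + 4 + 4 = 16 bytes, with no padding required.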
1.2 Verifying the method cache
- Create a simple Person class, with code as follows:
@interface Person : NSObject
- (void)methodFirst;
- (void)methodSecond;
- (void)methodThird;
@end
@implementation Person
- (void)methodFirst {
NSLog(@"%s", __FUNCTION__);
}
- (void)methodSecond {
NSLog(@"%s", __FUNCTION__);
}
- (void)methodThird {
NSLog(@"%s", __FUNCTION__);
}
@end
- cache_t before any method call

Set a breakpoint before the method call to see the state of cache_t.
Description:

- From the objc_class structure it is easy to derive that 0x1000011d8 is the first address of cache_t. (If you are interested in class structure, see "OC source code analysis: class structure".)
- Since no method has been called yet, _mask and _occupied are both zero.
- cache_t after a method call

cache_t changes as follows after alloc and init are executed:
As you can see from the figure above, after init is called, _mask is 3 and _occupied is 1. The _buckets pointer has changed (from 0x1003DB250 to 0x101700090), and the SEL and IMP of the init method are cached.
Consider: 1. Where is the cache after the alloc method is called? 2. Why is init not in the first slot of _buckets?
Next, execute methodFirst and look at cache_t again.

_mask is still 3 (unchanged) and _occupied is now 2; the _buckets pointer address remains the same; the SEL and IMP of methodFirst are cached.
Then execute methodSecond and take a look.

Obviously, _occupied becomes 3, the _buckets pointer remains the same, and the method cache for methodSecond is added.
Finally, after executing methodThird, look at the cache_t change
This time it was a different story. _mask is 7, _occupied is 1 again, and _buckets not only has a new start address, but the previously cached init, methodFirst, and methodSecond methods are gone, leaving only the new methodThird. It seems cache_t does not simply accumulate one cached entry per method called.
Think: what happened to the previously cached methods (init, methodFirst, methodSecond)?
1.3 cache_t summary
Let's go over the above example. After executing the Person instance methods init, methodFirst, methodSecond, and methodThird in turn, cache_t changes as follows:
| Methods called | _buckets | _mask | _occupied |
|---|---|---|---|
| (none) | empty | 0 | 0 |
| init | init | 3 | 1 |
| init, methodFirst | init, methodFirst | 3 | 2 |
| init, methodFirst, methodSecond | init, methodFirst, methodSecond | 3 | 3 |
| init, methodFirst, methodSecond, methodThird | methodThird | 7 | 1 |
As you can see, cache_t does cache invoked methods in real time.
The validation above also helps us understand the meaning of cache_t's three member variables: _buckets is the array of hash buckets, and _occupied is the number of cached methods — but what is _mask? At first glance we have no clue, but notice that cache_t has a function called capacity, whose source code is shown below:
struct cache_t {
    // ...
    mask_t mask();
    mask_t capacity();
    // ...
};

mask_t cache_t::mask()
{
    return _mask;
}

mask_t cache_t::capacity()
{
    return mask() ? mask()+1 : 0;
}
Therefore, if _mask is 0, no instance method has been called yet and the bucket capacity is 0; if _mask is not 0, instance methods have been called and the bucket capacity is _mask + 1. In other words, _mask indirectly reflects the capacity of the buckets.
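The relationship can be sketched as a standalone helper (capacity_from_mask is our own name, mirroring the logic of cache_t::capacity()):

```cpp
#include <cstdint>

typedef uint32_t mask_t;

// Mirrors cache_t::capacity(): a mask of 0 means no buckets yet;
// otherwise the bucket count is mask + 1 (masks are always 2^n - 1).
mask_t capacity_from_mask(mask_t mask) {
    return mask ? mask + 1 : 0;
}
```

With the values from example [1.2]: a mask of 3 means 4 buckets, and a mask of 7 means 8.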
2 cache_t method caching principle
Next, let's examine the principle of method caching in cache_t, starting with method invocation.
2.1 cache_fill
The essence of an OC method call is message sending (that is, objc_msgSend), which at bottom finds the IMP through the method's SEL. When a method is called, objc_msgSend first queries cache_t for the method's implementation (this lookup is implemented efficiently in assembly). When the method is not in the cache, the class's method lists are searched until it is found, and then cache_fill is called to write the method into cache_t. The source code is shown below:
void cache_fill(Class cls, SEL sel, IMP imp, id receiver)
{
#if !DEBUG_TASK_THREADS
mutex_locker_t lock(cacheUpdateLock);
cache_fill_nolock(cls, sel, imp, receiver);
#else
_collecting_in_critical();
return;
#endif
}
The specific process of objc_msgSend will be analyzed in another article, which will not be described here.
2.2 cache_fill_nolock
cache_fill calls through to cache_fill_nolock, which writes the SEL and IMP into _buckets and updates _mask and _occupied.
Its source code and detailed analysis are as follows:
static void cache_fill_nolock(Class cls, SEL sel, IMP imp, id receiver)
{
    cacheUpdateLock.assertLocked();

    // Return if the class is not initialized
    if (!cls->isInitialized()) return;

    // After taking cacheUpdateLock, make sure no other thread
    // has already written this method into the cache
    if (cache_getImp(cls, sel)) return;

    // Get the cache_t pointer of cls
    cache_t *cache = getCache(cls);

    // newOccupied: the new number of cached methods = current number + 1
    mask_t newOccupied = cache->occupied() + 1;
    // Total capacity of the current cache_t, i.e. mask + 1
    mask_t capacity = cache->capacity();
    if (cache->isConstantEmptyCache()) {
        // First call of an instance method of the class (as in example [1.2])
        cache->reallocate(capacity, capacity ?: INIT_CACHE_SIZE);
    }
    else if (newOccupied <= capacity / 4 * 3) {
        // The new number of cached methods does not exceed 3/4 of capacity:
        // cache directly
    }
    else {
        // The new number exceeds 3/4 of capacity: expand first
        cache->expand();
    }

    // Find a bucket whose sel is 0 (an empty bucket), or whose sel
    // equals sel (hash collision, very unlikely)
    bucket_t *bucket = cache->find(sel, receiver);
    // _occupied++ is executed if and only if the bucket's sel is 0
    if (bucket->sel() == 0) cache->incrementOccupied();
    // Update the bucket's SEL and IMP
    bucket->set<Atomic>(sel, imp);
}

// INIT_CACHE_SIZE is 4
enum {
    INIT_CACHE_SIZE_LOG2 = 2,
    INIT_CACHE_SIZE = (1 << INIT_CACHE_SIZE_LOG2)
};
As you can see from the source code above, cache_fill_nolock is essentially the dispatch center of cache_t's caching mechanism. It:

- decides which caching strategy to apply to _buckets (one of three: initialize then cache, cache directly, or expand then cache);
- then uses the method's sel to find a bucket and updates that bucket's sel and imp. (If the bucket's sel is 0, the bucket is empty and can hold the method, so _occupied++ is executed.)
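The three-way decision can be sketched as follows. This is a simplified model with our own names (decide, CacheAction); the real code operates on cache_t directly:

```cpp
#include <cstdint>

typedef uint32_t mask_t;

enum CacheAction { InitThenCache, CacheDirectly, ExpandThenCache };

// Models the branch in cache_fill_nolock: an empty cache is first
// allocated (INIT_CACHE_SIZE = 4); if occupied+1 stays within 3/4 of
// capacity the method is cached directly; otherwise expand first.
CacheAction decide(mask_t occupied, mask_t capacity, bool emptyCache) {
    mask_t newOccupied = occupied + 1;
    if (emptyCache)                      return InitThenCache;
    if (newOccupied <= capacity / 4 * 3) return CacheDirectly;
    return ExpandThenCache;
}
```

For example, with capacity 4 the fourth method (occupied == 3) triggers expansion, matching the methodThird observation in example [1.2].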
Consider: why is the expansion threshold 3/4?
2.3 reallocate
reallocate is executed in two cases:

- when _buckets is first initialized;
- when _buckets is expanded.

Let's look at what reallocate does:
void cache_t::reallocate(mask_t oldCapacity, mask_t newCapacity)
{
    // freeOld is true if and only if _buckets already caches methods
    bool freeOld = canBeFreed();

    // Get the current buckets pointer, i.e. _buckets
    bucket_t *oldBuckets = buckets();
    // Allocate a new buckets array
    bucket_t *newBuckets = allocateBuckets(newCapacity);

    // Cache's old contents are not propagated.
    // This is thought to save cache memory at the cost of extra cache fills.
    // fixme re-measure this
    assert(newCapacity > 0);
    assert((uintptr_t)(mask_t)(newCapacity-1) == newCapacity-1);

    // Set the new buckets and new mask (newCapacity - 1)
    // as the current _buckets and _mask
    setBucketsAndMask(newBuckets, newCapacity - 1);

    if (freeOld) {
        // Free the old buckets' memory
        cache_collect_free(oldBuckets, oldCapacity);
        cache_collect(false);
    }
}
reallocate perfectly explains several observations in example [1.2]: after init executes, the _buckets pointer address changes and _mask becomes 3; after methodThird executes, not only does the _buckets pointer address change, but the previously cached init, methodFirst, and methodSecond methods are gone.
Note that the _occupied update happens after returning to cache_fill_nolock.
Think: after expansion, why not just copy the already-cached methods into the new buckets?
2.4 expand
According to the cache_fill_nolock source, _buckets is expanded when the new number of cached methods (_occupied + 1) exceeds 3/4 of the total capacity (_mask + 1):
void cache_t::expand()
{
    cacheUpdateLock.assertLocked();

    // Get the current total capacity, i.e. _mask + 1
    uint32_t oldCapacity = capacity();
    // New capacity = old capacity * 2
    uint32_t newCapacity = oldCapacity ? oldCapacity*2 : INIT_CACHE_SIZE;

    if ((uint32_t)(mask_t)newCapacity != newCapacity) {
        // mask overflow - can't grow further
        // fixme this wastes one bit of mask
        newCapacity = oldCapacity;
    }

    reallocate(oldCapacity, newCapacity);
}
This function is very simple: it just computes the new capacity and calls reallocate. Note that:

- as long as doubling does not overflow mask_t, the capacity doubles each time;
- if doubling would overflow mask_t, a new buckets array of the same capacity as the original is allocated.
2.5 find
After the buckets strategy is carried out, the next step is to find a suitable bucket to store the method's SEL and IMP. All find does is return an appropriate bucket based on the method's SEL. Again, look at the source code:
bucket_t * cache_t::find(SEL s, id receiver)
{
    assert(s != 0);

    // Get the current buckets, i.e. _buckets
    bucket_t *b = buckets();
    // Get the current mask, i.e. _mask
    mask_t m = mask();
    // The start index comes from sel & mask
    mask_t begin = cache_hash(s, m);
    mask_t i = begin;
    do {
        // sel == 0: an empty bucket that can cache the method
        // sel == s: cache hit; the method at index i is already cached
        //           (possibly after a hash collision)
        if (b[i].sel() == 0  ||  b[i].sel() == s) {
            return &b[i];
        }
    } while ((i = cache_next(i, m)) != begin);

    // hack
    // No usable bucket found (error handling; you don't normally end up here!)
    Class cls = (Class)((uintptr_t)this - offsetof(objc_class, cache));
    cache_t::bad_cache(receiver, (SEL)s, cls);
}

static inline mask_t cache_hash(SEL sel, mask_t mask)
{
    return (mask_t)(uintptr_t)sel & mask;
}

#if __arm__  ||  __x86_64__  ||  __i386__
// objc_msgSend has few registers available.
// Cache scan increments and wraps at special end-marking bucket.
#define CACHE_END_MARKER 1
static inline mask_t cache_next(mask_t i, mask_t mask) {
    return (i+1) & mask;
}

#elif __arm64__
// objc_msgSend has lots of registers available.
// Cache scan decrements. No end marker needed.
#define CACHE_END_MARKER 0
static inline mask_t cache_next(mask_t i, mask_t mask) {
    return i ? i-1 : mask;
}

#else
#error unknown architecture
#endif
find uses a hash lookup: cache_hash produces the start index into _buckets (sel & mask), and the buckets are probed from there. Since the index comes from a hash, the resulting order is naturally unordered, which is why the init method is not in the first slot of _buckets in our example.
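A toy model of this probe looks like the following (our own simplified types — Slot and find_slot — using the incrementing cache_next variant):

```cpp
#include <cstdint>
#include <cstddef>

typedef uint32_t mask_t;

struct Slot { uintptr_t sel; uintptr_t imp; };  // stand-in for bucket_t

// Models cache_t::find(): start at sel & mask, then probe linearly
// (wrapping via the mask) until a matching or an empty slot is found.
Slot *find_slot(Slot *buckets, mask_t mask, uintptr_t sel) {
    mask_t begin = (mask_t)(sel & mask);        // cache_hash
    mask_t i = begin;
    do {
        if (buckets[i].sel == 0 || buckets[i].sel == sel)
            return &buckets[i];
        i = (i + 1) & mask;                     // cache_next (x86_64 form)
    } while (i != begin);
    return nullptr;   // table full: objc4 calls bad_cache here
}
```

Note that where a method lands depends only on its sel's address, which is why the cached entries appear in no particular order.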
3 The effect of multithreading on method caching
Since the number of hash buckets grows dynamically at runtime, is method caching affected when methods are called from multiple threads? Look at the analysis below.
3.1 Multiple threads reading the cache simultaneously

Throughout objc_msgSend, no locks are taken for reads of the method cache, for maximum performance. If multiple threads call already-cached methods at the same time, _buckets and _mask are not modified, so concurrent reads of the method cache are safe.
3.2 Multiple threads writing the cache simultaneously

As we saw in the source code, before the buckets are expanded or written, the system uses a global mutex (cacheUpdateLock.assertLocked()) to serialize writes, and checks the cache again after taking the lock (if (cache_getImp(cls, sel)) return;). This way, even if multiple threads try to cache the same method at the same time, it is only written once; concurrent writes to the cache pose no safety risk.
3.3 Multiple threads reading and writing the cache simultaneously
This situation is more complicated. Let’s first look at the code for objc_msgSend read cache (arm64 architecture assembler).
.macro CacheLookup
	// x1 = SEL, x16 = isa
	ldp	x10, x11, [x16, #CACHE]	// x10 = buckets, x11 = occupied|mask
	and	w12, w1, w11		// x12 = _cmd & mask
	add	x12, x10, x12, LSL #4	// x12 = buckets + ((_cmd & mask)<<4)

	ldp	x9, x17, [x12]		// {x9, x17} = *bucket
1:	cmp	x9, x1			// if (bucket->sel != _cmd)
	b.ne	2f			//     scan more
	CacheHit $0			// call or return imp

2:	// not hit: x12 = not-hit bucket
	CheckMiss $0			// miss if bucket->sel == 0
	cmp	x12, x10		// wrap if bucket == buckets
	b.eq	3f
	ldp	x9, x17, [x12, #-16]!	// {x9, x17} = *--bucket
	b	1b			// loop

3:	// wrap: x12 = first bucket, w11 = mask
	add	x12, x12, w11, UXTW #4	// x12 = buckets+(mask<<4)

	// Clone scanning loop to miss instead of hang when cache is corrupt.
	// The slow path may detect any corruption and halt later.

	ldp	x9, x17, [x12]		// {x9, x17} = *bucket
1:	cmp	x9, x1			// if (bucket->sel != _cmd)
	b.ne	2f			//     scan more
	CacheHit $0			// call or return imp

2:	// not hit: x12 = not-hit bucket
	CheckMiss $0			// miss if bucket->sel == 0
	cmp	x12, x10		// wrap if bucket == buckets
	b.eq	3f
	ldp	x9, x17, [x12, #-16]!	// {x9, x17} = *--bucket
	b	1b			// loop

3:	// double wrap
	JumpMiss $0

.endmacro
The ldp instruction reads data from memory into registers. The first ldp above loads the cache_t members _buckets and _occupied|_mask into registers x10 and x11 in a single instruction, and the subsequent CacheLookup code does all of its hash probing using only the values in x10 and x11, never re-reading cache_t. So as long as ldp x10, x11, [x16, #CACHE] sees a matching pair of _buckets and _mask (both pre-expansion or both post-expansion), there is no safety risk in multiple threads reading and writing the method cache at the same time.
3.3.1 Compiler memory barrier
How does the system guarantee this consistency between _buckets and _mask? Let's look at the code that writes these two variables:
void cache_t::setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask)
{
// objc_msgSend uses mask and buckets with no locks.
// It is safe for objc_msgSend to see new buckets but old mask.
// (It will get a cache miss but not overrun the buckets' bounds).
// It is unsafe for objc_msgSend to see old buckets and new mask.
// Therefore we write new buckets, wait a lot, then write new mask.
// objc_msgSend reads mask first, then buckets.
// ensure other threads see buckets contents before buckets pointer
mega_barrier();
_buckets = newBuckets;
// ensure other threads see new buckets before new mask
mega_barrier();
_mask = newMask;
_occupied = 0;
}
This C++ code modifies _buckets first and _mask second. To ensure this order is not reordered by the compiler, mega_barrier() is used as a compiler memory barrier.
Without the compiler memory barrier, the compiler might reorder the code to assign _mask first and _buckets second. If another thread executed ldp x10, x11, [x16, #0x10] in between, it would see the old _buckets with the new _mask, and an out-of-bounds read of the bucket array could crash the program.
With the compiler memory barrier in place, a reader can at worst see the new _buckets with the old _mask, which does not crash the program.
The compiler memory barrier guarantees that the assignment of _buckets happens before the assignment of _mask. That is, whenever ldp x10, x11, [x16, #CACHE] executes, the _buckets array's length is at least _mask + 1, which rules out crashes from out-of-bounds memory reads. As you can see, lock-free read-write schemes can, to some extent, be built with compiler memory barrier techniques.
(A memory barrier is also known as a memory fence.)
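In modern C++ the same publication order can be expressed with release stores instead of objc4's mega_barrier(). This is a sketch with our own names (Bucket, set_buckets_and_mask as free functions on globals), not the runtime's actual code:

```cpp
#include <atomic>
#include <cstdint>

typedef uint32_t mask_t;
struct Bucket { uintptr_t sel, imp; };

std::atomic<Bucket *> g_buckets{nullptr};
std::atomic<mask_t>   g_mask{0};

// Publish the new (larger) buckets array strictly before the new mask,
// so a reader that observes the new mask also observes the new buckets
// and can never index past the end of the array it sees.
void set_buckets_and_mask(Bucket *newBuckets, mask_t newMask) {
    g_buckets.store(newBuckets, std::memory_order_release);
    g_mask.store(newMask, std::memory_order_release);
}
```

A reader that acquire-loads g_mask and then loads g_buckets gets the same guarantee the runtime relies on: new mask implies new buckets.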
3.3.2 Memory garbage collection
As noted, a writer thread may expand _buckets (allocating new bucket memory and destroying the old) while another thread is reading the method cache, and a read of freed memory could crash the program. To solve this problem, OC uses two global arrays, objc_entryPoints and objc_exitPoints, to hold the start and end addresses of all functions that access the cache:
extern "C" uintptr_t objc_entryPoints[];
extern "C" uintptr_t objc_exitPoints[];
These functions are listed below (again using the arm64 assembly as an example):
.private_extern _objc_entryPoints
_objc_entryPoints:
.quad _cache_getImp
.quad _objc_msgSend
.quad _objc_msgSendSuper
.quad _objc_msgSendSuper2
.quad _objc_msgLookup
.quad _objc_msgLookupSuper2
.quad 0
.private_extern _objc_exitPoints
_objc_exitPoints:
.quad LExit_cache_getImp
.quad LExit_objc_msgSend
.quad LExit_objc_msgSendSuper
.quad LExit_objc_msgSendSuper2
.quad LExit_objc_msgLookup
.quad LExit_objc_msgLookupSuper2
.quad 0
When a thread expands the hash buckets, it stores the old bucket memory in a global garbage array, garbage_refs, and then iterates over all threads in the current process (in iOS, a process is an application), checking whether any thread is currently executing inside one of the objc_entryPoints functions. If not, no thread is accessing the cache, and it is safe to actually free all the bucket memory blocks held in garbage_refs. If so, some thread is accessing the cache; nothing is freed this time, and the check is repeated later until it is safe to destroy them.
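The idea can be sketched like this (a toy single-file model; garbage_refs_sketch and the boolean in-entry-points check are our simplifications of the runtime's real per-thread PC bookkeeping):

```cpp
#include <cstdlib>
#include <vector>

// Old bucket arrays are parked here instead of being freed immediately.
static std::vector<void *> garbage_refs_sketch;

void collect_free_sketch(void *oldBuckets) {
    garbage_refs_sketch.push_back(oldBuckets);
}

// Freed only when no thread is inside a cache-reading function
// (objc4 checks every thread's PC against objc_entryPoints /
// objc_exitPoints; here that check is reduced to a boolean).
void collect_sketch(bool someThreadIsReadingCache) {
    if (someThreadIsReadingCache) return;   // unsafe: try again later
    for (void *p : garbage_refs_sketch) free(p);
    garbage_refs_sketch.clear();
}
```

Deferring the free until no reader can possibly hold a stale _buckets pointer is what makes the lock-free read path safe.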
In short, the OC 2.0 runtime cleverly combines the ldp instruction, compiler memory barriers, and memory garbage collection into a lock-free scheme for concurrent reads and writes of the method cache that both guarantees safety and improves system performance.
Special thanks here to Ouyang! His post will help you learn more about the runtime implementation, as well as multithreaded reading and writing of the method cache. Highly recommended!
4 Discussion
By now you should have some understanding of how cache_t caches methods. Now consider the following questions:
4.1 Cache location of class methods
Q: Where is the cache after the Person class calls the alloc method?
A: It is cached in the cache_t of Person's metaclass. The proof is shown below.
4.2 The role of _mask
Q: Explain the function of _mask in cache_t
A: _mask indirectly reflects the number of hash buckets in cache_t (number of buckets = _mask + 1) and guarantees that bucket lookups never go out of bounds.
From the source code above, the number of hash buckets per cache_t is a power of two (at least 4), and _mask equals the number of buckets minus 1; that is, while methods are cached, _mask's binary bits are all 1s. When probing the buckets in a loop, the index is computed as xx & _mask, so the index is always less than the number of buckets (index <= _mask, hence index < capacity) and the lookup never crosses the boundary.
4.3 Discussion of the 3/4 expansion threshold
Q: Why is the critical point of expansion 3/4?
A: Setting a threshold is a trade-off between space utilization and time efficiency. At the 3/4 threshold, space utilization is high while a considerable number of hash collisions are still avoided, so time efficiency is also high.
The critical point of capacity expansion directly affects the efficiency of the circular hash bucket search. Imagine two extremes:
If the threshold were 1 — that is, expansion happened only when every hash bucket was occupied — the allocated memory would be 100% utilized, but a large number of hash collisions would occur, raising the cost of probing for an index; time efficiency would drop, which defeats the purpose of caching.

If the threshold were 0.5 — expansion as soon as half the buckets were occupied — hash collisions would be largely avoided and time efficiency would be high, but half the space would be wasted and space utilization would be low; this trade of space for time is equally undesirable.
When the critical point of expansion is 3/4, both space utilization and time utilization are relatively high.
4.4 Cache lookup loop
Q: Does the cache loop find hash buckets in an infinite loop?
A: No.
When hash bucket utilization reaches 3/4, the next write triggers expansion, so the number of empty buckets is always at least 1/4 of the total. Therefore, the probe loop always terminates — either at a hit or at an empty bucket.
5 Conclusion
Having verified the examples, analyzed the source code, and discussed the questions above, here are a few conclusions about cache_t:
- cache_t caches methods that have been called.
- Of cache_t's three member variables:
  - _buckets, of type struct bucket_t *, points to an array of hash buckets; each bucket can cache one method (the SEL and IMP of a called method).
  - _mask, of type mask_t (uint32_t on 64-bit, 4 bytes), equals the total number of hash buckets minus 1 (capacity - 1), so it indirectly reflects the number of buckets.
  - _occupied, of type mask_t, is the number of methods currently cached in _buckets.
- When the number of cached methods reaches the threshold (3/4 of the total number of buckets), caching the next new method first discards the old buckets and allocates new memory, i.e. expansion (after expansion all buckets are brand new, and every method has to be cached again); the new method is then cached and _occupied becomes 1.
- When multiple threads call methods at the same time, there are several cases:
  - Multithreaded reads: the cache read path is implemented in assembly, lock-free and fast; since reads do not modify _buckets and _mask, there is no safety hazard.
  - Multithreaded writes: OC uses a global mutex (cacheUpdateLock.assertLocked()) to ensure the cache is not written twice.
  - Multithreaded reads and writes: OC combines the ldp instruction, compiler memory barriers, and memory garbage collection into a lock-free scheme that guarantees safety while also improving system performance.
6 Reference Materials
- Deconstruct the implementation of the objc_msgSend function in depth
- Understanding Memory barriers
7 PS
- The source project has been placed on github; see Objc4-756.2 source code.
- You can also download Apple's official objc4 source code to study.
- Please indicate the source when reprinting. Thank you very much!