In the previous chapter, we followed isa to the class, analyzed the underlying structure of the class (objc_class), and covered isa, superclass, and the class_data_bits_t bits. The remaining member, cache_t cache, only got a brief mention there. In this article, we explore the cache.

cache_t cache

As the name suggests, this is all about caching, something we deal with regularly in development. The textbook definition of a cache: a cache is memory that can exchange data at high speed; it sits between the CPU and main memory and exchanges data with the CPU first, so it is very fast. The cache matters both in this everyday sense and in the runtime, and the exploration below shows how much design work Apple has put into it in the underlying source code.

cache_t source code analysis

(Source code version: objc4-818.2)

struct cache_t {
private:
    explicit_atomic<uintptr_t> _bucketsAndMaybeMask;
    union {
        struct {
            explicit_atomic<mask_t>    _maybeMask;
#if __LP64__
            uint16_t                   _flags;
#endif
            uint16_t                   _occupied;
        };
        explicit_atomic<preopt_cache_t *> _originalPreoptCache;
    };

    // In between sit various masks and constants whose layout depends on
    // CACHE_MASK_STORAGE, i.e. on the architecture (see below)

    void incrementOccupied();
    void setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask);
    void reallocate(mask_t oldCapacity, mask_t newCapacity, bool freeOld);
    unsigned capacity() const;
    struct bucket_t *buckets() const;
    Class cls() const;
    void insert(SEL sel, IMP imp, id receiver);
    // ... the remaining methods are not relevant to this exploration
};

// How CACHE_MASK_STORAGE is chosen per architecture:
#if defined(__arm64__) && __LP64__
#if TARGET_OS_OSX || TARGET_OS_SIMULATOR
// arm64 macOS / arm64 simulator
#define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_HIGH_16_BIG_ADDRS
#else
// 64-bit arm64 device
#define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_HIGH_16
#endif
#elif defined(__arm64__) && !__LP64__
// 32-bit arm64 device
#define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_LOW_4
#else
// macOS / simulator (x86_64)
#define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_OUTLINED
#endif

CACHE_MASK_STORAGE

As the source code shows, CACHE_MASK_STORAGE is defined by macros for four different scenarios, depending on the architecture.

A quick review of Apple’s instruction set

armv7 | armv7s | arm64 are all instruction sets of ARM processors

i386 | x86_64 are the instruction sets of Intel Mac processors

__LP64__ simply indicates the 64-bit data model, i.e. the width of longs and pointers (addresses)

Related reading on __arm64__ and __LP64__:

Differences between ARM64 and LP64

Linux programming model ILP32 and LP64

iOS armv7, armv7s, and arm64 versus the app's 32-bit and 64-bit build settings

CONFIG_USE_PREOPT_CACHES

Internally, cache_t also relies on the configuration item CONFIG_USE_PREOPT_CACHES, which enables preoptimized caches on certain systems.

// __arm64__ && iOS && !simulator && !Mac Catalyst
#if defined(__arm64__) && TARGET_OS_IOS && !TARGET_OS_SIMULATOR && !TARGET_OS_MACCATALYST
#define CONFIG_USE_PREOPT_CACHES 1
#else
#define CONFIG_USE_PREOPT_CACHES 0
#endif

The exploration below is based on the x86_64 architecture; some details may differ on other architectures.

  • _bucketsAndMaybeMask is a uintptr_t, taking 8 bytes; like the bits in isa_t, it is a pointer-sized value that holds an address
  • A union contains an anonymous struct and a pointer _originalPreoptCache
  • The struct has three member variables: _maybeMask, _flags, _occupied
  • _originalPreoptCache and the struct share the same memory (they are mutually exclusive); _originalPreoptCache holds the preoptimized cache, which we hardly touch while exploring the method cache of a class
  • cache_t provides common methods to read the buckets and the mask; how they are stored depends on the architecture (CACHE_MASK_STORAGE)

cache_t exposes a buckets() method, similar to the methods() accessor provided by class_data_bits_t: values are read through getter methods. Let's look at the bucket_t source code:

struct bucket_t {
private:
    // IMP-first is better for arm64e ptrauth and no worse for arm64.
    // SEL-first is better for armv7* and i386 and x86_64.
#if __arm64__ // real device (arm64)
    explicit_atomic<uintptr_t> _imp;
    explicit_atomic<SEL> _sel;
#else
    explicit_atomic<SEL> _sel;
    explicit_atomic<uintptr_t> _imp;
#endif
    // ... unrelated source omitted
};
  • bucket_t distinguishes arm64 devices from other architectures, but the members are the same _sel and _imp; only their order differs
  • What bucket_t stores is a _sel and an _imp, so the cache is evidently caching methods

cache_t overall structure diagram

LLDB debugging verification

First, create an XJPerson class with a few custom instance methods, instantiate an XJPerson object in the main function, and then debug with LLDB.
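A minimal sketch of that setup (the class name XJPerson and the sayHello method match the LLDB session below; everything else, such as the file layout, is assumed):

#import <Foundation/Foundation.h>

@interface XJPerson : NSObject
- (void)sayHello;
@end

@implementation XJPerson
- (void)sayHello { NSLog(@"hello"); }
@end

int main(int argc, const char * argv[]) {
    XJPerson *p = [XJPerson alloc];      // create the instance, but call nothing yet
    Class pClass = [XJPerson class];     // pClass is what we print with p/x pClass below
    NSLog(@"%@ - %@", p, pClass);        // set a breakpoint here and switch to LLDB
    return 0;
}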

  • p/x pClass gives the class's first address 0x0000000100008740; adding the offset 0x10 yields the cache address
  • then p (cache_t *)$1 casts that address to the cache_t type
  • p *$2 dereferences the pointer and shows the cache contents: _maybeMask = 0 and _occupied = 0, because no method has been called yet
  • p/x $3.buckets() gets the buckets address, i.e. the first address of the array
  • p *$4 shows the memory inside the bucket: _sel and _imp
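Roughly, the session looks like this (a sketch only: the $n convenience variables assume each command below is evaluated in order, so match them against your own output, and your class address will differ):

(lldb) p/x pClass                        // -> $0, the class's first address (0x0000000100008740 here)
(lldb) p/x 0x0000000100008740 + 0x10     // -> $1, skip isa (8 bytes) + superclass (8 bytes) to reach the cache
(lldb) p (cache_t *)$1                   // -> $2, the cache address as a cache_t pointer
(lldb) p *$2                             // -> $3, shows _maybeMask = 0, _occupied = 0 before any call
(lldb) p/x $3.buckets()                  // -> $4, the first address of the buckets array
(lldb) p *$4                             // the first bucket's _sel and _imp (still empty)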

Since no method is cached yet, call an object method from LLDB — [p sayHello] — and continue debugging.

  1. After calling sayHello, _maybeMask and _occupied change, so these two variables must be related to the caching
  2. The bucket_t struct provides the sel() and imp(nil, pClass) methods for reading its contents
  3. The sel and imp of the sayHello method are in the bucket, and the bucket is in the cache

Summary

Combining LLDB debugging with the source code, we can see that methods are stored in the cache, and a method's sel and imp live in a bucket. The LLDB session above helps us analyze the cache's internal structure, but it is tedious and error-prone, and a single wrong step means starting over. That is inconvenient. Is there a more convenient way?

A more convenient approach: mirroring the source structures in our own project

Test code:

void testCase2() {
    Class cls = XJPerson.class;
    struct xj_objc_class *xj_cls = (__bridge  struct xj_objc_class *)cls;
    struct xj_cache_t cache = xj_cls->cache;
    struct xj_bucket_t *buckets = cache._buckets;
    for (uint32_t i = 0; i < cache._maybeMask; i++) {
        struct xj_bucket_t bucket = buckets[i];
        NSLog(@"%@", NSStringFromSelector(bucket._sel));
    }
}

int main(int argc, const char * argv[]) {
    XJPerson *p = [XJPerson alloc];
    [p say1];
    [p say2];
    testCase2();
    return 0;
}
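For testCase2 to compile, the mirrored structures have to be declared somewhere. A minimal sketch of what they could look like on x86_64 (hand-copied, simplified layouts; the xj_ names are our own, and only the fields testCase2 actually reads — _buckets, _maybeMask, _sel — need to be faithful):

typedef uint32_t mask_t;

struct xj_bucket_t {
    SEL _sel;                       // on x86_64 the sel comes first, then the imp
    uintptr_t _imp;                 // the real struct stores an encoded uintptr_t; not read by testCase2
};

struct xj_cache_t {
    struct xj_bucket_t *_buckets;   // x86_64 (OUTLINED): _bucketsAndMaybeMask is just the buckets pointer
    mask_t _maybeMask;              // capacity - 1
    uint16_t _flags;
    uint16_t _occupied;             // number of cached methods
};

struct xj_objc_class {
    Class isa;
    Class superclass;
    struct xj_cache_t cache;        // offset 0x10, matching the LLDB exploration above
    uintptr_t bits;                 // stand-in for class_data_bits_t
};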

Test results:

(null)
say1
say2
The (null) entry appears because buckets are stored at indexes computed by hashing the sel, so some slots in the table remain empty.

So how does the caching process actually work? We found the cache_t::insert method earlier, so let's dig into it and see what it does.

cache_t::insert

  • When a sel/imp pair is inserted, the new occupancy is compared against the cache's capacity to decide whether expansion is needed.
  • If the current cache is still empty, bucket memory is allocated first so the sel/imp can be stored.
  • If the new occupancy would exceed 3/4 of the capacity, the cache is expanded; in that case the previously allocated buckets are freed rather than copied over.

Note: hash tables commonly choose 3/4 as the load factor because below that threshold the probability of hash collisions on insertion stays low. Once collisions occur, the cost of rehashing, or of chaining entries under one index with a linked list or red-black tree, can be high.

  • The bucket index is computed by hashing the sel, and the sel/imp pair is inserted at that position (probing other slots on collision)
void cache_t::insert(SEL sel, IMP imp, id receiver)
{
    ...
    // Use the cache as-is if until we exceed our expected fill ratio.
    mask_t newOccupied = occupied() + 1;                    // _occupied + 1
    unsigned oldCapacity = capacity(), capacity = oldCapacity;
    if (slowpath(isConstantEmptyCache())) {                 // the cache is still empty
        // Cache is read-only. Replace it.
        if (!capacity) capacity = INIT_CACHE_SIZE;          // initial capacity: 4
        reallocate(oldCapacity, capacity, /* freeOld */false);  // allocate the buckets; nothing to free yet
    }
    else if (fastpath(newOccupied + CACHE_END_MARKER <= cache_fill_ratio(capacity))) {
        // Cache is less than 3/4 or 7/8 full. Use it as-is.
        // i.e. (occupied() + 1) + CACHE_END_MARKER <= capacity * 3/4 (7/8 on arm64)
    }
#if CACHE_ALLOW_FULL_UTILIZATION
    else if (capacity <= FULL_UTILIZATION_CACHE_SIZE &&
             newOccupied + CACHE_END_MARKER <= capacity) {
        // Allow 100% cache utilization for small buckets. Use it as-is.
    }
#endif
    else {
        // Over the fill ratio: expand. Double the capacity (e.g. 4 -> 8),
        // or fall back to INIT_CACHE_SIZE if it was 0.
        capacity = capacity ? capacity * 2 : INIT_CACHE_SIZE;
        if (capacity > MAX_CACHE_SIZE) {
            capacity = MAX_CACHE_SIZE;                      // MAX_CACHE_SIZE = 1 << 16 = 65536
        }
        reallocate(oldCapacity, capacity, true);            // allocate new buckets and free the old ones
    }

    bucket_t *b = buckets();
    mask_t m = capacity - 1;
    mask_t begin = cache_hash(sel, m);                      // cache_hash(sel, m) ==> value & mask
    mask_t i = begin;

    // Scan for the first unused slot and insert there.
    // There is guaranteed to be an empty slot.
    do {
        if (fastpath(b[i].sel() == 0)) {                    // empty slot: cache sel/imp here
            incrementOccupied();                            // _occupied++
            b[i].set<Atomic, Encoded>(b, sel, imp, cls());  // store sel/imp into the bucket
            return;
        }
        if (b[i].sel() == sel) {
            // The entry was added to the cache by some other thread
            // before we grabbed the cacheUpdateLock.
            return;
        }
        // Hash collision: probe another slot via cache_next(i, m).
        // 1. The result always stays within the mask range.
        // 2. e.g. on arm64, cache_next(7, 7) ==> (7 + 1) & 7 ==> 0, wrap around and keep scanning.
    } while (fastpath((i = cache_next(i, m)) != begin));

    // No free slot found even after wrapping back to begin: report a bad cache
    bad_cache(receiver, (SEL)sel);
#endif // !DEBUG_TASK_THREADS
}

(Source location: objc-cache.mm, lines 826-895)

Verifying Capacity Expansion Rules

@interface XJPerson : NSObject
@end

@implementation XJPerson
- (void)say1 {}
- (void)say2 {}
- (void)say3 {}
- (void)say4 {}
- (void)say5 {}
- (void)say6 {}
- (void)say7 {}
- (void)say8 {}
- (void)say9 {}
@end

void testCase1() {
    XJPerson *p = [XJPerson alloc];
    [p say1];   // initial capacity 4, minus 1 for the end marker
    [p say2];
    [p say3];
    [p say4];
    [p say5];
    [p say6];
    [p say7];
    [p say8];
    [p say9];
    NSLog(@"---");
}

int main(int argc, const char * argv[]) {
    testCase1();
    return 0;
}

(Screenshots: the cache before expansion and after expansion)

cache_t::insert flow chart

Supplement

Why is capacity expansion done at 3/4 of capacity?

  1. A load factor of 3/4 is a common convention in hash-table-like data structures; at 0.75 the space utilization is still relatively high.
  2. When storing into the cache, the value produced by the hash algorithm is used as the storage index, so the amount of free space is critical to whether indexes collide. With 3/4 as the load factor, the probability of hash collisions stays relatively low.

Analysis of the role of _bucketsAndMaybeMask

In the above exploration, we have obtained the following data structure:

(LLDB output omitted: printing the cache shows _bucketsAndMaybeMask holding the same value as the buckets() address)

You can see that _bucketsAndMaybeMask stores the starting address of buckets()

Official explanation:

// _bucketsAndMaybeMask is a buckets_t pointer;
// _maybeMask is the buckets mask
  1. Once _bucketsAndMaybeMask is obtained, the address of the next bucket can be reached by translating (offsetting) memory from it;
  2. The buckets() method simply operates on the value stored in _bucketsAndMaybeMask to obtain the bucket_t pointer;

The source code is as follows:

The definition of bucketsMask differs by architecture;

How buckets() reads its value
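Roughly, the relevant paths in objc4-818.2 look like this (abridged; on x86_64 the stored value is the pointer itself, while on arm64 devices the pointer lives in the low bits and the mask in the high 16 bits):

#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_OUTLINED        // e.g. x86_64
struct bucket_t *cache_t::buckets() const
{
    uintptr_t addr = _bucketsAndMaybeMask.load(memory_order_relaxed);
    return (struct bucket_t *)addr;                          // the value is the buckets pointer itself
}
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16       // arm64 device
struct bucket_t *cache_t::buckets() const
{
    uintptr_t addr = _bucketsAndMaybeMask.load(memory_order_relaxed);
    return (struct bucket_t *)(addr & bucketsMask);          // bucketsMask strips the mask stored in the high bits
}
#endif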

  • buckets() operates on _bucketsAndMaybeMask to obtain the bucket_t pointer.

  • buckets() returns a bucket_t pointer ($n in LLDB), and p $n[1] is often used to read the next bucket. That makes bucket_t look like an array, but buckets() does not return an array.

  • The bucket_t returned by buckets() is just a struct, but the buckets are stored in one contiguous block of memory, so as long as the location of one bucket_t is known, the adjacent one can be reached by memory translation.

  • Using p $n[1] has the same effect as p *($n + 1);

  • Arrays are read the same way, by memory translation: on +1, the step size is determined by the element type.
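A tiny sketch of the same idea in code (assuming cache is a cache_t pointer we already have):

bucket_t *b = cache->buckets();   // address of the first bucket
bucket_t first  = b[0];           // same memory as *b
bucket_t second = b[1];           // same as *(b + 1): the pointer advances by sizeof(bucket_t), i.e. 16 bytes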

Hash function (cache_hash) and rehash function (cache_next)

The first hash: the cache_hash function

mask_t m = capacity - 1;            // capacity - 1, e.g. 4 - 1 = 3
mask_t begin = cache_hash(sel, m);  // compute the hash from sel and the mask (capacity - 1)
mask_t i = begin;

static inline mask_t cache_hash(SEL sel, mask_t mask)
{
    uintptr_t value = (uintptr_t)sel;   // treat sel as an unsigned integer
#if CONFIG_USE_PREOPT_CACHES
    value ^= value >> 7;
#endif
    return (mask_t)(value & mask);      // AND with the mask keeps the result within [0, mask]
}

Secondary hash function cache_next

static inline mask_t cache_next(mask_t i, mask_t mask)
{
    return i ? i - 1 : mask;  // if i is not 0, step back to i - 1; otherwise wrap to mask (capacity - 1)
}
// In other words, when a collision happens we walk backwards through the buckets;
// once index 0 is reached we jump to capacity - 1 and keep walking
// until we arrive back at begin.
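A quick worked example of the probing order (x86_64 variant, hypothetical numbers): with capacity 8 the mask is 7, so if cache_hash lands on index 2 and every slot is occupied, the probe order is 2, 1, 0, 7, 6, 5, 4, 3 and then back to 2 (begin), at which point bad_cache is reached. A minimal sketch that prints that order:

#include <stdio.h>
#include <stdint.h>

typedef uint32_t mask_t;

// x86_64 variant of cache_next: walk backwards, wrapping from 0 to mask
static inline mask_t cache_next(mask_t i, mask_t mask) { return i ? i - 1 : mask; }

int main(void) {
    mask_t mask = 7;      // capacity 8 - 1
    mask_t begin = 2;     // pretend cache_hash(sel, mask) == 2
    mask_t i = begin;
    do {
        printf("%u ", i); // prints: 2 1 0 7 6 5 4 3
    } while ((i = cache_next(i, mask)) != begin);
    printf("\n");
    return 0;
}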