Preface

In the previous two articles, the nature of the OC-method (Part 1) and the nature of the OC-method (Part 2), we learned about the structure of classes and analyzed isa, superclass, bits, and cache. Today we take a closer look at the cache of a class.

cache_t data structure

In the last two articles, we saw how to read back the data stored in bits. Next, let's try to do the same for the cache.

(lldb) p/x TPerson.class
(Class) $0 = 0x0000000100008708 TPerson
(lldb) p (cache_t *)0x0000000100008718
(cache_t *) $1 = 0x0000000100008718
(lldb) p *$1
(cache_t) $2 = {
  _bucketsAndMaybeMask = {
    std::__1::atomic<unsigned long> = {
      Value = 4298515392
    }
  }
   = {
     = {
      _maybeMask = {
        std::__1::atomic<unsigned int> = {
          Value = 0
        }
      }
      _flags = 32820
      _occupied = 0
    }
    _originalPreoptCache = {
      std::__1::atomic<preopt_cache_t *> = {
        Value = 0x0000803400000000
      }
    }
  }
}
(lldb) p $2._bucketsAndMaybeMask
(explicit_atomic<unsigned long>) $3 = {
  std::__1::atomic<unsigned long> = {
    Value = 4298515392
  }
}
(lldb) p $3.Value
error: <user expression 4>:1:4: no member named 'Value' in 'explicit_atomic<unsigned long>'
$3.Value
~~ ^
(lldb) p $2._maybeMask
(explicit_atomic<unsigned int>) $4 = {
  std::__1::atomic<unsigned int> = {
    Value = 0
  }
}
(lldb) p $4.Value
error: <user expression 6>:1:4: no member named 'Value' in 'explicit_atomic<unsigned int>'
$4.Value
~~ ^
(lldb) p $2._originalPreoptCache
(explicit_atomic<preopt_cache_t *>) $5 = {
  std::__1::atomic<preopt_cache_t *> = {
    Value = 0x0000803400000000
  }
}
(lldb) p $5.Value
error: <user expression 8>:1:4: no member named 'Value' in 'explicit_atomic<preopt_cache_t *>'
$5.Value
~~ ^
(lldb)

cache_t contains _bucketsAndMaybeMask, _maybeMask, _flags, _occupied, and _originalPreoptCache, but none of their Value fields can be printed directly. So what do these numbers mean? Let's look at how the underlying source code describes and handles them (Apple objc source code, and objc source code compilation).

In the cache_t source, we can clearly see the declarations of _bucketsAndMaybeMask, _maybeMask, _flags, _occupied, and _originalPreoptCache.
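Why does lldb show _flags = 32820 while _originalPreoptCache shows 0x0000803400000000? In cache_t, the struct holding _maybeMask, _flags and _occupied and the _originalPreoptCache pointer are members of one anonymous union, so they are two views of the same 8 bytes. The C sketch below assumes a 64-bit little-endian machine (x86_64 or arm64); the names cache_union, cache_mirror and decode_cache_union are hypothetical mirrors for illustration, not runtime headers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t mask_t;

/* The 8 bytes that follow _bucketsAndMaybeMask in cache_t. */
typedef union {
    struct {
        mask_t   maybeMask; /* bytes 0-3 */
        uint16_t flags;     /* bytes 4-5 */
        uint16_t occupied;  /* bytes 6-7 */
    } fields;
    uint64_t originalPreoptCache; /* the same 8 bytes read as one value */
} cache_union;

/* Whole cache_t on a 64-bit target: 8-byte word + 8-byte union. */
typedef struct {
    uintptr_t   bucketsAndMaybeMask;
    cache_union u;
} cache_mirror;

/* Reinterpret a raw _originalPreoptCache value as the three small fields.
 * Union type punning is well defined in C; the field values it yields
 * assume little-endian byte order. */
static cache_union decode_cache_union(uint64_t raw) {
    cache_union u;
    u.originalPreoptCache = raw;
    return u;
}
```

For example, decoding 0x0000803400000000 gives maybeMask 0, flags 32820 (0x8034) and occupied 0, and decoding 0x0001803000000007 (from the second lldb session in this article) gives 7, 32816 and 1, which are exactly the values lldb printed. The 16-byte size also explains why bits sits at class address + 0x20 (8-byte isa + 8-byte superclass + 16-byte cache).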

So where does cache_t actually keep its cached data? Where are the IMP and SEL? Let's keep these questions in mind as we continue reading the source. How should we approach the source? Since cache_t caches data, it must provide methods for adding, removing, updating, and looking up entries, so we can start by looking at the methods cache_t provides.

Here we can see bucket_t, along with emptyBuckets() (empty buckets), allocateBuckets() (allocates buckets), emptyBucketsForCapacity(), endMarker(), and bad_cache().

Keep reading

Here we find an insert method. Let's take a look at how insert is implemented.

Inside insert, the work is done on bucket_t. Conclusion: the core of cache_t is bucket_t.

So let’s look at bucket_t.

When we open the bucket_t source, we immediately see what we have been looking for: IMP and SEL. This gives us a data structure diagram of cache_t.
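When mirroring bucket_t in our own code, field order matters. In the objc4 source, bucket_t stores _imp first on arm64 and _sel first on other architectures such as x86_64, which is why the lldb output on the Mac lists _sel before _imp. A minimal C sketch, using hypothetical stand-in types (sel_t, imp_bits) instead of the real SEL and explicit_atomic<uintptr_t>:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void *sel_t;        /* stand-in for SEL */
typedef uintptr_t imp_bits; /* _imp is stored as an integer, not an IMP */

/* Hypothetical mirror of bucket_t with the architecture-dependent order. */
struct mirror_bucket {
#if defined(__arm64__) || defined(__aarch64__)
    imp_bits imp; /* arm64: IMP first */
    sel_t    sel;
#else
    sel_t    sel; /* x86_64 and others: SEL first */
    imp_bits imp;
#endif
};

/* Byte offset of the selector inside a bucket for the current target. */
static size_t sel_offset(void) {
    return offsetof(struct mirror_bucket, sel);
}
```

Either way a bucket is two pointer-sized words; only the offsets swap, so a mirror struct written for one architecture reads the wrong field on the other.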

cache_t analysis

Now that we have a basic picture of the cache_t data structure, let's analyze cache_t itself.

1. Analysis with lldb

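We follow the same approach as our earlier analysis of bits. We already reached cache_t in the previous section (it sits at the class address + 0x10, right after the 8-byte isa and the 8-byte superclass), so how do we get to the IMP and SEL? With the cache_t data structure diagram in hand, let's start analyzing cache_t.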

(lldb) p [tp formatPerson]
2021-06-23 14:13:32.768517+0800 KCObjcBuild[5296:231526]  --- TPerson -- formatPerson ---
(lldb) p/x tClass
(Class) $0 = 0x0000000100008780 TPerson
(lldb) p (cache_t *)(0x0000000100008780+0x10)
(cache_t *) $1 = 0x0000000100008790
(lldb) p *$1
(cache_t) $2 = {
  _bucketsAndMaybeMask = {
    std::__1::atomic<unsigned long> = {
      Value = 4326470912
    }
  }
   = {
     = {
      _maybeMask = {
        std::__1::atomic<unsigned int> = {
          Value = 7
        }
      }
      _flags = 32816
      _occupied = 1
    }
    _originalPreoptCache = {
      std::__1::atomic<preopt_cache_t *> = {
        Value = 0x0001803000000007
      }
    }
  }
}
(lldb) p $2.buckets()
(bucket_t *) $3 = 0x0000000101e0b500
(lldb) p *$3
(bucket_t) $4 = {
  _sel = {
    std::__1::atomic<objc_selector *> = "" {
      Value = ""
    }
  }
  _imp = {
    std::__1::atomic<unsigned long> = {
      Value = 48368
    }
  }
}

Now that we have reached bucket_t, how do we get the IMP and SEL out of it? Let's keep reading the bucket_t source.

bucket_t provides the sel(), imp(), and rawImp() methods. Let's verify them one by one.

(lldb) p $4.sel()
(SEL) $15 = "formatPerson"
(lldb) p $4.imp(nil, TPerson.class)
(IMP) $16 = 0x0000000100003b70 (KCObjcBuild`-[TPerson formatPerson])

There are two points to note here. First, imp() takes a UNUSED_WITHOUT_PTRAUTH bucket_t *base parameter; the macro definition shows that this parameter is only used when pointer authentication is enabled, so here it can be set to nil. Second, p $2.buckets() sometimes does not show a value. buckets is a hash table, and the position of each element inside it is effectively random, so you can index into it with p $2.buckets()[1] and so on. Why did plain p $2.buckets() give me the value here? Maybe because I ran p [tp formatPerson] right at the beginning. You can tell that a bucket_t holds a value when its output looks like the following. In one run I had to use p $2.buckets()[6] to get the value, so be patient and careful!

(bucket_t) $4 = {
  _sel = {
    std::__1::atomic<objc_selector *> = "" {
      Value = ""
    }
  }
  _imp = {
    std::__1::atomic<unsigned long> = {
      Value = 48368
    }
  }
}
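The _imp Value of 48368 in this output is clearly not a code address. The numbers fit the ISA-XOR IMP encoding that the objc4 source uses when pointer authentication is unavailable (as on x86_64): the bucket stores imp ^ cls. With the class address 0x0000000100008780 and the IMP 0x0000000100003b70 from the lldb session above, 0x100003b70 ^ 0x100008780 = 0xbcf0 = 48368, which is why imp() needs the class argument. A minimal sketch; encode_imp and decode_imp are hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Encoding used when a bucket is filled: mix the IMP with the class. */
static uintptr_t encode_imp(uintptr_t imp, uintptr_t cls) {
    return imp ^ cls;
}

/* Decoding in imp(): XOR with the same class again (XOR is its own inverse). */
static uintptr_t decode_imp(uintptr_t stored, uintptr_t cls) {
    return stored ^ cls;
}
```

The high bits of the IMP and the class address are identical here, so they cancel out and only the small difference 0xbcf0 is left in the bucket.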

2. Analysis without the objc source

What if we don't have the source code, or we have it but cannot debug it? Since we cannot compile and debug the objc source directly in that case, we copy the relevant parts of the source into our own project and write code that compiles there.

We already know the data structure of objc_class, so let's define a tcd_objc_class.

One caveat: objc_class has a hidden member, isa, inherited from objc_object, and our struct needs a matching field for it. If it is left out, every field after it is shifted and the cache data we print will be wrong.

objc_class contains a cache_t, so we also define a tcd_cache_t.

objc_class contains a class_data_bits_t, so we also define a tcd_class_data_bits_t.

cache_t contains bucket_t, so we also define a tcd_bucket_t.

We have now mirrored everything we need from objc_class. The complete code is as follows:

#import <Foundation/Foundation.h>  // SEL, IMP, Class

typedef uint32_t mask_t;  // x86_64 & arm64 asm are less efficient with 16-bits

// SEL-first order matches x86_64; objc4 stores _imp first on arm64.
struct tcd_bucket_t {
    SEL _sel;
    IMP _imp;  // on x86_64 this holds the encoded value (imp ^ class), not the raw address
};

struct tcd_cache_t {
    uintptr_t _bucketsAndMaybeMask; // 8
    mask_t    _maybeMask; // 4
    uint16_t  _flags;  // 2
    uint16_t  _occupied; // 2
};

struct tcd_class_data_bits_t {
    uintptr_t bits;
};

// cache class
struct tcd_objc_class {
    Class isa;                            // hidden member inherited from objc_object
    Class superclass;
    struct tcd_cache_t cache;             // formerly cache pointer and vtable
    struct tcd_class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags
};

Everything we need is ready. Now let's see how to use it!

int main(int argc, char * argv[]) {
    NSString * appDelegateClassName;
    @autoreleasepool {
        TPerson * tp = [TPerson alloc];
        Class tpClass = tp.class;
        [tp formatPerson0];        
    }
    return UIApplicationMain(argc, argv, nil, appDelegateClassName);
}

With tp defined, we get tpClass. Then we need to cast tpClass (of type objc_class) to the tcd_objc_class type we defined.

struct tcd_objc_class *tcd_class = (__bridge struct tcd_objc_class *)(tpClass);

Next, we print the cache of tcd_class.

Output:
NSLog(@"-- TPerson -- formatPerson -- : %@", tcd_class->cache);
Print:
-- TPerson -- formatPerson -- : -[TPerson formatPerson0]

Next, we print _maybeMask and _occupied:

NSLog(@"-- _maybeMask: %u -- _occupied: %u ", tcd_class->cache._maybeMask, tcd_class->cache._occupied);
Print:
-- _maybeMask: 3 -- _occupied: 1

_occupied is 1 and _maybeMask is 3 (the mask is the capacity minus one, so 4 buckets have been allocated). This is very similar to what the LLDB analysis printed.

We learned from the LLDB analysis that the data is stored in buckets, so let's tweak tcd_cache_t: replace its first field (_bucketsAndMaybeMask) with a pointer to tcd_bucket_t. This works where that field holds only the buckets pointer, as on x86_64. The adjusted tcd_cache_t looks like this:

struct tcd_cache_t {
    struct tcd_bucket_t *_bukets; // 8
    mask_t    _maybeMask; // 4
    uint16_t  _flags;  // 2
    uint16_t  _occupied; // 2
};

We know buckets is a hash table, so we print its entries with a for loop.

TPerson * tp = [TPerson alloc];
Class tpClass = tp.class;
    
[tp formatPerson0];
struct tcd_objc_class *tcd_class = (__bridge struct tcd_objc_class *)(tpClass);
        
NSLog(@"-- _maybeMask : %u -- _occupied : %u ", tcd_class->cache._maybeMask, tcd_class->cache._occupied);
        
for (mask_t i = 0; i < tcd_class->cache._occupied; i++) {
    struct tcd_bucket_t bucket = tcd_class->cache._bukets[i];
    NSLog(@"-- %@ -- %p",NSStringFromSelector(bucket._sel), bucket._imp);
}

We didn’t get the data we wanted, so let’s print it in a different way.

for (mask_t i = 0; i < tcd_class->cache._maybeMask; i++) {
    struct tcd_bucket_t bucket = tcd_class->cache._bukets[i];
    NSLog(@"-- %@ -- %p",NSStringFromSelector(bucket._sel), bucket._imp);
}

The last iteration printed the data we wanted. Why? First, _occupied is the number of entries stored, which is only 1 here, so looping i < _occupied checks index 0 and nothing else. _maybeMask reflects how much space has been allocated: it is the capacity minus one, so 3 means 4 buckets. Second, buckets is a hash table, so the entry does not have to be in the first slot. In my run, the data was in the last slot the loop visited.
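The two loops can be compared on a hypothetical 4-bucket table (mask 3) with a single method at index 2 and the x86_64 end marker (sel == 1) in the last slot. The slot struct and first_used() are illustrative only, not runtime code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified bucket: sel 0 = empty, sel 1 = end marker. */
struct slot {
    uintptr_t sel;
    uintptr_t imp;
};

/* Scan indices [0, limit) and return the first slot holding a real method,
 * or -1 if none is found in that range. */
static int first_used(const struct slot *buckets, uint32_t limit) {
    for (uint32_t i = 0; i < limit; i++) {
        if (buckets[i].sel > 1) {
            return (int)i;
        }
    }
    return -1;
}
```

Scanning only _occupied (1) slots misses the entry, while scanning up to _maybeMask (3) visits indices 0 to 2 and finds it. On arm64, where the objc4 source uses no end marker, the slot at index mask can also hold a method, so a full scan there needs i <= mask.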

Next, let's try calling several methods.

int main(int argc, char * argv[]) {
    NSString * appDelegateClassName;
    @autoreleasepool {
        TPerson * tp = [TPerson alloc];
        Class tpClass = tp.class;
        [tp formatPerson0];
        [tp formatPerson1];
        [tp formatPerson2];
        [tp formatPerson3];
        
        struct tcd_objc_class *tcd_class = (__bridge struct tcd_objc_class *)(tpClass);
        
        NSLog(@"-- _maybeMask : %u -- _occupied : %u ", tcd_class->cache._maybeMask, tcd_class->cache._occupied);
        
        
        for (mask_t i = 0; i < tcd_class->cache._maybeMask; i++) {
            struct tcd_bucket_t bucket = tcd_class->cache._bukets[i];
            NSLog(@"-- %@ -- %p",NSStringFromSelector(bucket._sel), bucket._imp);
        }
        
    }
    return UIApplicationMain(argc, argv, nil, appDelegateClassName);
}

Here we call four methods, formatPerson0 through formatPerson3. Let's look at the output.

According to the output, _maybeMask is now 7 (8 buckets allocated), but only 2 entries are stored, and the methods printed are formatPerson3 and formatPerson2. Where did formatPerson0 and formatPerson1 go? Let's find out by analyzing the underlying principles.

3. Analysis of underlying principles

Let's start by finding the insert method in the cache_t source code.

void insert(SEL sel, IMP imp, id receiver); besides the SEL and IMP, it also takes the message receiver, id receiver. Let's look at how insert handles them.

1. Calculate the capacity

On the first insert, occupied() has no value yet, so it is 0; counting the entry being inserted gives newOccupied = occupied() + 1 = 1.

2. Create capacity

INIT_CACHE_SIZE      = (1 << INIT_CACHE_SIZE_LOG2)

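So capacity = INIT_CACHE_SIZE = 4, even though the mask we printed earlier was 3: reallocate stores capacity - 1 as the mask (setBucketsAndMask(newBuckets, newCapacity - 1)). Now let's look at what happens in reallocate.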

reallocate does three things: 1. allocateBuckets() allocates the memory for the buckets; 2. setBucketsAndMask sets the values of buckets and mask; 3. freeOld decides whether collect_free is called to release the old memory. Let's look at each of them in turn.

allocateBuckets()

allocateBuckets() does two main things: 1. bucket_t *newBuckets = (bucket_t *)calloc(bytesForCapacity(newCapacity), 1); allocates zero-filled memory; 2. end->set<NotAtomic, Raw>(newBuckets, (SEL)(uintptr_t)1, (IMP)newBuckets, nil); assigns the SEL and IMP of the last bucket, which serves as the end marker.
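A simplified C sketch of those two steps, assuming an end-marker architecture such as x86_64 and a plain-integer bucket in place of bucket_t (the real code goes through bucket_t::set<NotAtomic, Raw>). The struct and allocate_buckets() are hypothetical names. The last bucket gets sel == 1 and an imp pointing back at the start of the array, so that slot is never available for a method:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified bucket (SEL first, as on x86_64). */
struct bucket {
    uintptr_t sel;
    uintptr_t imp;
};

/* Step 1: zero-filled storage for `capacity` buckets.
 * Step 2: mark the last bucket as the end marker. */
static struct bucket *allocate_buckets(uint32_t capacity) {
    assert(capacity > 0);
    struct bucket *buckets = calloc(capacity, sizeof(struct bucket));
    if (buckets == NULL) {
        return NULL;
    }
    struct bucket *end = &buckets[capacity - 1];
    end->sel = (uintptr_t)1;       /* (SEL)(uintptr_t)1 */
    end->imp = (uintptr_t)buckets; /* (IMP)newBuckets */
    return buckets;
}
```

This is also why insert adds CACHE_END_MARKER to newOccupied before comparing against the fill limit: one bucket is always taken by the marker.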

setBucketsAndMask

setBucketsAndMask writes _bucketsAndMaybeMask and _maybeMask in a way that depends on the architecture.
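The lldb output matches the x86_64 case (CACHE_MASK_STORAGE_OUTLINED in the objc4 source): _bucketsAndMaybeMask = 4326470912 is exactly 0x101e0b500, the address buckets() returned, and the mask lives separately in _maybeMask. On 64-bit arm64 devices the source instead packs the mask into the top 16 bits of _bucketsAndMaybeMask. The sketch below shows both on a 64-bit machine; the packed variant is simplified (it treats the low 48 bits as the pointer, while the real runtime's buckets mask is somewhat narrower), and all names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t mask_t;

/* OUTLINED storage (x86_64): pointer and mask live in separate fields. */
struct outlined_cache {
    uintptr_t bucketsAndMaybeMask; /* just the buckets pointer */
    mask_t    maybeMask;           /* capacity - 1 */
};

static void set_outlined(struct outlined_cache *c, uintptr_t buckets, mask_t mask) {
    c->bucketsAndMaybeMask = buckets;
    c->maybeMask = mask;
}

/* HIGH_16-style storage (64-bit arm64 devices), simplified:
 * mask in the top 16 bits, pointer in the low 48 bits. */
#define MASK_SHIFT 48
#define LOW_48_BITS ((((uint64_t)1) << MASK_SHIFT) - 1)

static uint64_t pack_high16(uint64_t buckets, mask_t mask) {
    return ((uint64_t)mask << MASK_SHIFT) | (buckets & LOW_48_BITS);
}

static uint64_t unpack_buckets(uint64_t packed) {
    return packed & LOW_48_BITS;
}

static mask_t unpack_mask(uint64_t packed) {
    return (mask_t)(packed >> MASK_SHIFT);
}
```

This difference is also why the tcd_cache_t trick of treating the first field as a plain bucket pointer only works where the mask is stored separately.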

collect_free

collect_free mainly clears the old data and reclaims its memory.

3. Hash value

The starting index in buckets is computed with cache_hash(sel, m). A do-while loop then probes slot by slot: when the slot is already taken by another SEL (a hash collision), cache_next(i, m) computes the next index to try.
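A C sketch of this hash-and-probe step, modeled on cache_hash() and cache_next() in the objc4 source. On x86_64 the start index is sel & mask and probing moves forward with wrap-around; arm64 probes backwards instead. The extra value ^= value >> 7 mixing applies only in builds with CONFIG_USE_PREOPT_CACHES, modeled here by the mix_high_bits flag. probe() and the selector values used with it are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t mask_t;

/* Start index for a selector: optionally mix high bits, then keep the low bits. */
static mask_t cache_hash(uintptr_t sel, mask_t mask, int mix_high_bits) {
    uintptr_t value = sel;
    if (mix_high_bits) {
        value ^= value >> 7; /* CONFIG_USE_PREOPT_CACHES builds only */
    }
    return (mask_t)(value & mask);
}

/* End-marker architectures (x86_64): step forward and wrap to 0. */
static mask_t cache_next(mask_t i, mask_t mask) {
    return (i + 1) & mask;
}

/* arm64: step backward and wrap to the last index. */
static mask_t cache_next_arm64(mask_t i, mask_t mask) {
    return i ? i - 1 : mask;
}

/* Probe loop shaped like the one in insert(): stop at an empty slot
 * (0) or at the same selector; -1 means we wrapped back to begin. */
static int probe(const uintptr_t *sels, mask_t mask, uintptr_t sel) {
    mask_t begin = cache_hash(sel, mask, 0);
    mask_t i = begin;
    do {
        if (sels[i] == 0 || sels[i] == sel) {
            return (int)i;
        }
    } while ((i = cache_next(i, mask)) != begin);
    return -1;
}
```

Two selectors that both hash to index 2 end up in slots 2 and 3. A table with no free slot makes the loop wrap back to begin, which is the case where the real insert() falls through to bad_cache().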

4. 3/4 expansion

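If an insert would take the cache above 75% full (newOccupied + CACHE_END_MARKER > capacity * 3 / 4 on x86_64), the cache is expanded; otherwise the entry is inserted normally.

When expanding, the capacity doubles (capacity * 2), but never beyond MAX_CACHE_SIZE (1 << MAX_CACHE_SIZE_LOG2). The expansion calls reallocate with freeOld set to true: the old buckets are released and their entries are not copied into the new ones. This is consistent with the earlier result: calling formatPerson2 triggered an expansion from 4 to 8 buckets, which dropped formatPerson0 and formatPerson1, leaving only formatPerson2 and formatPerson3 in the cache.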
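To replay the four calls from the earlier experiment, here is a toy model of insert() under the x86_64 rules as I read them in the objc4 source: 4 initial buckets, an end marker in the last slot, expansion once newOccupied + CACHE_END_MARKER exceeds 3/4 of the capacity, and no copying of old entries when the cache grows. The names (toy_cache, toy_insert, toy_contains) and the selector values are hypothetical, and MAX_CACHE_SIZE = 1 << 16 is an assumption to check against your objc4 version:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t mask_t;

enum {
    CACHE_END_MARKER = 1,
    INIT_CACHE_SIZE  = 1 << 2,  /* 4 buckets */
    MAX_CACHE_SIZE   = 1 << 16, /* assumption */
};

/* Toy cache tracking selectors only: 0 = empty slot, 1 = end marker. */
struct toy_cache {
    uintptr_t *sels;
    mask_t     mask;     /* capacity - 1 once allocated */
    uint16_t   occupied;
};

static mask_t toy_capacity(const struct toy_cache *c) {
    return c->sels ? c->mask + 1 : 0;
}

/* Like reallocate(): fresh zeroed storage, end marker in the last slot,
 * mask = capacity - 1, occupied reset. Old entries are freed, not copied. */
static void toy_reallocate(struct toy_cache *c, mask_t new_capacity) {
    free(c->sels);
    c->sels = calloc(new_capacity, sizeof *c->sels);
    if (c->sels == NULL) abort();
    c->sels[new_capacity - 1] = 1;
    c->mask = new_capacity - 1;
    c->occupied = 0;
}

static void toy_insert(struct toy_cache *c, uintptr_t sel) {
    mask_t new_occupied = (mask_t)c->occupied + 1;
    mask_t capacity = toy_capacity(c);
    if (capacity == 0) {
        capacity = INIT_CACHE_SIZE;               /* first insert */
        toy_reallocate(c, capacity);
    } else if (new_occupied + CACHE_END_MARKER <= capacity * 3 / 4) {
        /* within the 3/4 fill ratio: insert in place */
    } else {
        capacity *= 2;                            /* expand, dropping old entries */
        if (capacity > (mask_t)MAX_CACHE_SIZE) capacity = MAX_CACHE_SIZE;
        toy_reallocate(c, capacity);
    }
    mask_t begin = (mask_t)(sel & c->mask);      /* cache_hash without mixing */
    mask_t i = begin;
    do {
        if (c->sels[i] == 0) { c->sels[i] = sel; c->occupied++; return; }
        if (c->sels[i] == sel) return;            /* already cached */
    } while ((i = (i + 1) & c->mask) != begin);   /* cache_next on x86_64 */
}

static int toy_contains(const struct toy_cache *c, uintptr_t sel) {
    if (c->sels == NULL) return 0;
    for (mask_t i = 0; i <= c->mask; i++) {
        if (c->sels[i] == sel) return 1;
    }
    return 0;
}
```

After the first two inserts the toy cache has mask 3 and occupied 2. The third insert expands it to 8 buckets, discarding the first two selectors, so after all four calls it reports mask 7 and occupied 2, the same numbers the tcd_objc_class output showed.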

4. Insert call process

Set a breakpoint in insert.

insert call stack: _objc_msgSend_uncached -> lookUpImpOrForward -> log_and_fill_cache -> insert

But we still don't know how [tp formatPerson] gets to _objc_msgSend_uncached. Next, let's look at the assembly.

In the assembly, we can see that [tp formatPerson] is implemented underneath by a call to objc_msgSend. That gives us the full flow of the insert method: [tp formatPerson] -> objc_msgSend -> _objc_msgSend_uncached -> lookUpImpOrForward -> log_and_fill_cache -> insert.

Attachment: cache_t flowchart

Supplement:

1. Processor architectures

Device: arm64; simulator: i386 (32-bit) or x86_64 (64-bit, on Intel Macs); Mac: x86_64

Conclusion

Through LLDB and source-code debugging, we learned that methods are cached in cache_t, with each method's SEL and IMP stored in buckets. We then imitated the objc_class source with a custom tcd_objc_class for debugging, which turned out to be simple to set up and easy to use. Finally, we examined the cache_t source to see how a method gets stored, and how [tp formatPerson] reaches the cache in the underlying code.