Preface
In the previous two articles, "The nature of the OC-method" (part 1) and (part 2), we learned a little about classes. We also looked at isa, superclass, bits, and cache. Today we examine the cache of a class in detail.
cache_t data structure
In the last two articles we saw how to recover the data stored in bits. Next we try to recover the data stored in the cache.
(lldb) p/x TPerson.class
(Class) $0 = 0x0000000100008708 TPerson
(lldb) p (cache_t *)0x0000000100008718
(cache_t *) $1 = 0x0000000100008718
(lldb) p *$1
(cache_t) $2 = {
_bucketsAndMaybeMask = {
std::__1::atomic<unsigned long> = {
Value = 4298515392
}
}
= {
= {
_maybeMask = {
std::__1::atomic<unsigned int> = {
Value = 0
}
}
_flags = 32820
_occupied = 0
}
_originalPreoptCache = {
std::__1::atomic<preopt_cache_t *> = {
Value = 0x0000803400000000
}
}
}
}
(lldb) p $2._bucketsAndMaybeMask
(explicit_atomic<unsigned long>) $3 = {
std::__1::atomic<unsigned long> = {
Value = 4298515392
}
}
(lldb) p $3.Value
error: <user expression 4>:1:4: no member named 'Value' in 'explicit_atomic<unsigned long>'
$3.Value
~~ ^
(lldb) p $2._maybeMask
(explicit_atomic<unsigned int>) $4 = {
std::__1::atomic<unsigned int> = {
Value = 0
}
}
(lldb) p $4.Value
error: <user expression 6>:1:4: no member named 'Value' in 'explicit_atomic<unsigned int>'
$4.Value
~~ ^
(lldb) p $2._originalPreoptCache
(explicit_atomic<preopt_cache_t *>) $5 = {
std::__1::atomic<preopt_cache_t *> = {
Value = 0x0000803400000000
}
}
(lldb) p $5.Value
error: <user expression 8>:1:4: no member named 'Value' in 'explicit_atomic<preopt_cache_t *>'
$5.Value
~~ ^
(lldb)
You can see that cache_t contains _bucketsAndMaybeMask, _maybeMask, _flags, _occupied, and _originalPreoptCache, but none of their values can be printed directly. So what do these members mean? Let's look at how the underlying source code describes and handles them (the Apple objc source and a compiled build of the objc source).
In the cache_t source we can clearly see the members _bucketsAndMaybeMask, _maybeMask, _flags, _occupied, and _originalPreoptCache.
So where does cache_t keep the cached data? Where are IMP and SEL? We continue reading the source with these questions in mind. How should we approach the source? Since cache_t is used to cache data, it must have methods for adding, deleting, modifying, and querying entries, so we can start from the methods that cache_t provides.
Here we see several methods that deal with bucket_t: emptyBuckets(), which empties the buckets; allocateBuckets(), which allocates the buckets; emptyBucketsForCapacity(); endMarker(); and bad_cache().
Reading on, we find an insert method. Let's look at how insert is implemented.
Inside insert, the operations are performed on bucket_t. We can conclude that bucket_t is at the heart of cache_t.
So let’s look at bucket_t.
When we open the bucket_t source, the first thing we see is the IMP and SEL we have been looking for. This gives us the data structure diagram of cache_t.
cache_t analysis
We now know a little about the cache_t data structure. Let's analyze cache_t itself.
1. Analysis with lldb
We follow the approach we used earlier for bits. Since we already obtained cache_t in the previous section, where are IMP and SEL? With the data structure diagram of cache_t in hand, we can start analyzing cache_t.
(lldb) p [tp formatPerson]
2021-06-23 14:13:32.768517+0800 KCObjcBuild[5296:231526] --- TPerson -- formatPerson ---
(lldb) p/x tClass
(Class) $0 = 0x0000000100008780 TPerson
(lldb) p (cache_t *)(0x0000000100008780+0x10)
(cache_t *) $1 = 0x0000000100008790
(lldb) p *$1
(cache_t) $2 = {
_bucketsAndMaybeMask = {
std::__1::atomic<unsigned long> = {
Value = 4326470912
}
}
= {
= {
_maybeMask = {
std::__1::atomic<unsigned int> = {
Value = 7
}
}
_flags = 32816
_occupied = 1
}
_originalPreoptCache = {
std::__1::atomic<preopt_cache_t *> = {
Value = 0x0001803000000007
}
}
}
}
(lldb) p $2.buckets()
(bucket_t *) $3 = 0x0000000101e0b500
(lldb) p *$3
(bucket_t) $4 = {
_sel = {
std::__1::atomic<objc_selector *> = "" {
Value = ""
}
}
_imp = {
std::__1::atomic<unsigned long> = {
Value = 48368
}
}
}
Now that we have reached bucket_t, how do we get IMP and SEL out of it? Let's keep reading the bucket_t source.
bucket_t provides the sel(), imp(), and rawImp() methods. Let's verify them one by one.
(lldb) p $4.sel()
(SEL) $15 = "formatPerson"
(lldb) p $4.imp(nil, TPerson.class)
(IMP) $16 = 0x0000000100003b70 (KCObjcBuild`-[TPerson formatPerson])
One point to note here: imp() takes a parameter declared as UNUSED_WITHOUT_PTRAUTH bucket_t *base. Looking at the macro definition shows that this parameter can be passed as nil.
Another point to note: p $2.buckets() does not always give you the value directly. buckets() is a hash-based collection, so the position of an element inside it is effectively unpredictable. You can fetch a specific slot by index, for example p $2.buckets()[1]. Why did p $2.buckets() return the value directly here? Perhaps because p [tp formatPerson] was executed at the start. In general, you have found the bucket_t that holds a value when the output has the following form. In one run I had to use p $2.buckets()[6] to get the value, so be patient and careful!
(bucket_t) $4 = {
_sel = {
std::__1::atomic<objc_selector *> = "" {
Value = ""
}
}
_imp = {
std::__1::atomic<unsigned long> = {
Value = 48368
}
}
}
2. Analysis without the source code
What if we don't have the source code, or have it but cannot debug it? Since the source cannot be compiled and run, we copy the relevant definitions into our own project and write an equivalent that compiles.
We already know the data structure of objc_class, so let's define a tcd_objc_class.
One caveat: objc_class has a hidden member, isa, and the members must correspond one-to-one. If isa is not added, the cache data in the later output will be wrong.
There is a cache_t in objc_class, so we also define a tcd_cache_t.
objc_class has a class_data_bits_t, so we also define a tcd_class_data_bits_t.
There is a bucket_t in cache_t, so we also define a tcd_bucket_t.
We’ve customized everything we need in objc_class. The complete code is as follows:
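#import <objc/runtime.h> // SEL, IMP, Class
typedef uint32_t mask_t; // x86_64 & arm64 asm are less efficient with 16-bits
// Mirrors bucket_t: one cached method
struct tcd_bucket_t {
SEL _sel;
IMP _imp;
};
// Mirrors cache_t; the first member must be pointer-sized (8 bytes), not uint16_t
struct tcd_cache_t {
uintptr_t _bucketsAndMaybeMask; // 8
mask_t _maybeMask; // 4
uint16_t _flags; // 2
uint16_t _occupied; // 2
};
// Mirrors class_data_bits_t
struct tcd_class_data_bits_t {
uintptr_t bits;
};
// Mirrors objc_class, including the hidden isa so that cache sits at the same offset
struct tcd_objc_class {
Class isa;
Class superclass;
struct tcd_cache_t cache; // formerly cache pointer and vtable
struct tcd_class_data_bits_t bits; // class_rw_t * plus custom rr/alloc flags
};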
Everything we need is now ready. Next, let's see how to use it!
int main(int argc, char * argv[]) {
NSString * appDelegateClassName;
@autoreleasepool {
TPerson * tp = [TPerson alloc];
Class tpClass = tp.class;
[tp formatPerson0];
}
return UIApplicationMain(argc, argv, nil, appDelegateClassName);
}
With tp defined, we get tpClass. We then need to cast tpClass (of type objc_class) to the tcd_objc_class type we defined.
struct tcd_objc_class *tcd_class = (__bridge struct tcd_objc_class *)(tpClass);
Next we print the cache of tcd_class.
Output:
NSLog(@"-- TPerson -- formatPerson -- : %@", tcd_class->cache);
Print:
-- TPerson -- formatPerson -- : -- [TPerson formatPerson0]
Next we print _maybeMask and _occupied
NSLog(@"-- _maybeMask: %u -- _occupied: %u ", tcd_class->cache._maybeMask, tcd_class->cache._occupied);
Print:
-- _maybeMask: 3 -- _occupied: 1
We find that _occupied is 1 and _maybeMask is 3, meaning 3 slots. This closely matches what our LLDB analysis printed.
From the LLDB analysis we learned that the data is stored in buckets, so we adjust the structure of tcd_cache_t: a tcd_bucket_t pointer replaces its first member. The adjusted tcd_cache_t looks like this:
struct tcd_cache_t {
struct tcd_bucket_t *_bukets; // 8
mask_t _maybeMask; // 4
uint16_t _flags; // 2
uint16_t _occupied; // 2
};
We know that buckets is a hash-based collection, so we print its entries with a for loop.
TPerson * tp = [TPerson alloc];
Class tpClass = tp.class;
[tp formatPerson0];
struct tcd_objc_class *tcd_class = (__bridge struct tcd_objc_class *)(tpClass);
NSLog(@"-- _maybeMask : %u -- _occupied : %u ", tcd_class->cache._maybeMask, tcd_class->cache._occupied);
for (mask_t i = 0; i < tcd_class->cache._occupied; i++) {
struct tcd_bucket_t bucket = tcd_class->cache._bukets[i];
NSLog(@"-- %@ -- %p",NSStringFromSelector(bucket._sel), bucket._imp);
}
We didn’t get the data we wanted, so let’s print it in a different way.
for (mask_t i = 0; i < tcd_class->cache._maybeMask; i++) {
struct tcd_bucket_t bucket = tcd_class->cache._bukets[i];
NSLog(@"-- %@ -- %p",NSStringFromSelector(bucket._sel), bucket._imp);
}
We find that the last entry printed is the data we wanted. Why? First, _occupied is the number of entries stored, and here there is only 1, while _maybeMask reflects how many slots were allocated, and here there are 3. Second, buckets is a hash-based collection, so the value does not have to sit in the first slot. In this run, the data was in the last slot.
Let’s continue trying to call multiple methods.
int main(int argc, char * argv[]) {
NSString * appDelegateClassName;
@autoreleasepool {
TPerson * tp = [TPerson alloc];
Class tpClass = tp.class;
[tp formatPerson0];
[tp formatPerson1];
[tp formatPerson2];
[tp formatPerson3];
struct tcd_objc_class *tcd_class = (__bridge struct tcd_objc_class *)(tpClass);
NSLog(@"-- _maybeMask : %u -- _occupied : %u ", tcd_class->cache._maybeMask, tcd_class->cache._occupied);
for (mask_t i = 0; i < tcd_class->cache._maybeMask; i++) {
struct tcd_bucket_t bucket = tcd_class->cache._bukets[i];
NSLog(@"-- %@ -- %p",NSStringFromSelector(bucket._sel), bucket._imp);
}
}
return UIApplicationMain(argc, argv, nil, appDelegateClassName);
}
Four methods, formatPerson0 to formatPerson3, are called here. Let's look at the output.
From the output we can see that 7 slots were allocated but only 2 entries are stored. The methods printed are formatPerson3 and formatPerson2. Where did formatPerson0 and formatPerson1 go? We will find out in the next section, the analysis of underlying principles.
3. Analysis of underlying principles
Let's start by finding the insert method in the cache_t source code.
Its signature is void insert(SEL sel, IMP imp, id receiver); besides SEL sel and IMP imp, it also takes the message receiver, id receiver. Let's look at how insert works.
1. Calculate the capacity
newOccupied is computed from occupied(). On the first insert, occupied() has no entries yet and returns 0; inserting one entry makes it 1, so newOccupied = 1.
2. Create capacity
INIT_CACHE_SIZE = (1 << INIT_CACHE_SIZE_LOG2)
This gives capacity = 4, yet the _maybeMask we printed earlier was 3, because the mask that gets stored is capacity - 1. Now let's look at what happens in reallocate.
reallocate does three things: 1. allocateBuckets() allocates memory for the buckets; 2. setBucketsAndMask sets the values of buckets and mask; 3. freeOld controls whether collect_free releases the old memory. Let's look at what each of them does in turn.
allocateBuckets()
allocateBuckets() does two main things: 1. bucket_t *newBuckets = (bucket_t *)calloc(bytesForCapacity(newCapacity), 1); allocates the memory; 2. end->set<NotAtomic, Raw>(newBuckets, (SEL)(uintptr_t)1, (IMP)newBuckets, nil); assigns the SEL and IMP of the end-marker bucket, end.
setBucketsAndMask
setBucketsAndMask writes data to _bucketsAndMaybeMask and _maybeMask, and how it does so depends on the architecture.
collect_free
collect_free mainly clears the old data and reclaims the memory.
3. Hash value
The starting index into buckets is computed by cache_hash(sel, m), and a do-while loop then searches from that index. To handle hash collisions, cache_next(i, m) computes the next index to probe.
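4. 3/4 capacity
If the new entry would push the cache past 75% of its capacity, the cache is expanded; at or below 75%, the entry is inserted normally. When the limit is exceeded, the capacity is doubled (capacity * 2). The expanded capacity never exceeds the maximum value of mask, 2^15. Because reallocate releases the old buckets through collect_free instead of copying them, entries cached before an expansion are lost. This is why formatPerson0 and formatPerson1 were missing from the earlier output.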
4. Insert call process
Add a breakpoint at insert.
insert call stack: _objc_msgSend_uncached -> lookUpImpOrForward -> log_and_fill_cache -> insert
But we don't yet know how execution gets from [tp formatPerson] to _objc_msgSend_uncached. We can find out by looking at the assembly.
In the assembly we can see that [tp formatPerson] is implemented underneath by a call to objc_msgSend. This gives us the full flow leading to insert: [tp formatPerson] -> objc_msgSend -> _objc_msgSend_uncached -> lookUpImpOrForward -> log_and_fill_cache -> insert.
Attachment: cache_t flowchart
Supplement:
1. Processor architectures
Device: arm64; simulator: i386; Mac: x86_64
Conclusion
With LLDB and source-code debugging, we learned that methods are stored in the cache and that each method's IMP and SEL are stored in buckets. We then learned to imitate the objc_class source with a custom tcd_objc_class for debugging, an approach that is simple to call and easy to work with. Finally, we examined the cache_t source to see how a method is stored, and how [tp formatPerson] is called in the underlying code.