Preface
The earlier articles "iOS underlying principles: the nature of objects and isa's association with classes" and "iOS underlying principles: class structure analysis" analyzed isa and bits respectively. A class's member variables also include superclass and cache. Today we explore the underlying principles of cache. To be honest, I used to think there was nothing to explore in cache: it's just a cache. I admit I was wrong, and I apologize to cache. Exploring the internals of cache turns out to be fairly involved, and Apple's low-level code contains a lot of design thinking.
Preparation
- Fast-acting heart pills
- Goji berry tea
- The objc4-818.2 source code
Analysis of the cache structure
First, look at the type of cache, cache_t. Its definition in the source code shows that cache_t is also a struct.
cache_t structure analysis
struct cache_t {
private:
    explicit_atomic<uintptr_t> _bucketsAndMaybeMask;
    union {
        struct {
            explicit_atomic<mask_t>    _maybeMask;
#if __LP64__
            uint16_t                   _flags;
#endif
            uint16_t                   _occupied;
        };
        explicit_atomic<preopt_cache_t *> _originalPreoptCache;
    };

    /*
     #if defined(__arm64__) && __LP64__
     #if TARGET_OS_OSX || TARGET_OS_SIMULATOR
     // __arm64__ simulator
     #define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_HIGH_16_BIG_ADDRS
     #else
     // __arm64__ real device
     #define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_HIGH_16
     #endif
     #elif defined(__arm64__) && !__LP64__
     // 32-bit real device
     #define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_LOW_4
     #else
     // macOS, simulator
     #define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_OUTLINED
     #endif

     The code omitted here consists of per-architecture conditionals,
     mainly the masks used to extract mask and buckets.
    */

public:
    void incrementOccupied();
    void setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask);
    void reallocate(mask_t oldCapacity, mask_t newCapacity, bool freeOld);
    unsigned capacity() const;
    struct bucket_t *buckets() const;
    Class cls() const;
    void insert(SEL sel, IMP imp, id receiver);
    // The remaining methods are omitted; they are mostly helpers
};
- _bucketsAndMaybeMask is a uintptr_t variable that occupies 8 bytes on 64-bit systems. Like bits in isa_t, it is a pointer-sized value that holds an address.
- The union contains a struct and a pointer, _originalPreoptCache.
- The struct has three member variables: _maybeMask, _flags and _occupied. __LP64__ refers to 64-bit Unix and Unix-like systems (Linux and macOS).
- _originalPreoptCache and the struct are mutually exclusive. _originalPreoptCache is the initial cache; when looking at the cache of ordinary classes, this variable is rarely used.
- cache_t provides common methods for reading these values, and it extracts mask and buckets with different masks depending on the architecture.
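To make the layout described above concrete, here is a minimal C sketch of a struct shaped like cache_t. It is only an illustration under assumptions: the sketch_ names are invented, plain fields stand in for explicit_atomic<>, a void pointer stands in for preopt_cache_t *, and the union members are given names so the code compiles as plain C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t mask_t;   /* assumption: mask_t is 32 bits wide, as on 64-bit targets */

/* Hypothetical mirror of cache_t; not the runtime's real definition. */
struct sketch_cache_t {
    uintptr_t bucketsAndMaybeMask;      /* pointer-sized: 8 bytes on 64-bit */
    union {
        struct {
            mask_t   maybeMask;         /* 4 bytes */
            uint16_t flags;             /* 2 bytes */
            uint16_t occupied;          /* 2 bytes */
        } fields;
        void *originalPreoptCache;      /* shares storage with the struct above */
    } u;
};

/* Checks that the struct and the pointer start at the same address,
   which is what "mutually exclusive" means for union members. */
static void sketch_check_layout(void) {
    struct sketch_cache_t c;
    assert((void *)&c.u.fields == (void *)&c.u.originalPreoptCache);
    assert(sizeof(c.u) == 8);
}
```

On a 64-bit system the whole sketch struct is 16 bytes: 8 for the first member and 8 for the union.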
cache_t provides buckets(), much like the methods() accessor reached through class_data_bits_t: the value is obtained by calling a method. Next, look at the source code of bucket_t.
struct bucket_t {
private:
    // IMP-first is better for arm64e ptrauth and no worse for arm64.
    // SEL-first is better for armv7* and i386 and x86_64.
#if __arm64__ // real device
    explicit_atomic<uintptr_t> _imp;
    explicit_atomic<SEL> _sel;
#else
    explicit_atomic<SEL> _sel;
    explicit_atomic<uintptr_t> _imp;
#endif
    // The methods that follow are omitted
};
- bucket_t distinguishes between arm64 (real devices) and other architectures, but the member variables do not change: they are _sel and _imp, only in a different order.
- bucket_t contains _sel and _imp, so what the cache stores should be methods.
cache_t overall structure diagram
lldb debug verification
First, create the LWPerson class with a few custom instance methods, instantiate an LWPerson object in the main function, and then debug with lldb.
- To get the address of the cache variable, offset the class's first address by 16 bytes, i.e. 0x10. The address of cache is the first address + 0x10.
- The buckets() method of cache_t points to the first address of a block of memory, which is the address of the first bucket. Use p/x $3.buckets()[index] to print the remaining buckets in memory and see their _sel and _imp.
- The LWPerson object has not called any instance method yet, so buckets holds no cached methods.
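The 0x10 offset follows from the two pointer-sized members, isa and superclass, that precede cache. The C sketch below checks this with offsetof. It is an illustration only: the sketch_ structs are invented stand-ins, not the runtime's objc_class.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cache_t (16 bytes on 64-bit). */
struct sketch_cache {
    uintptr_t bucketsAndMaybeMask;
    uint32_t  maybeMask;
    uint16_t  flags;
    uint16_t  occupied;
};

/* Hypothetical stand-in for objc_class: only the member order matters here. */
struct sketch_class {
    void *isa;                  /* 8 bytes on 64-bit */
    void *superclass;           /* 8 bytes on 64-bit */
    struct sketch_cache cache;  /* starts right after the two pointers */
    uintptr_t bits;
};

/* Byte offset of cache from the first address of the class. */
static size_t sketch_cache_offset(void) {
    return offsetof(struct sketch_class, cache);
}

/* Address of cache, given the first address of the class. */
static uintptr_t sketch_cache_address(uintptr_t class_address) {
    assert(class_address != 0);
    return class_address + sketch_cache_offset();
}
```

On a 64-bit system sketch_cache_offset() is 16, which is the 0x10 added to the class address in lldb.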
Call the instance method in lldb with [p sayHello], then continue debugging.
- After sayHello is called, _maybeMask and _occupied hold values, so these two variables should be related to the cache.
- The bucket_t structure provides the sel() and imp(nil, pClass) methods.
- The sel and imp of the sayHello method are in a bucket, and the bucket is in the cache.
Summary
Debugging with lldb, combined with reading the source, shows that the cache stores methods and that each method's sel and imp live in the buckets. lldb debugging can be awkward, though: for example, buckets() has to be fetched again after every method call, which is uncomfortable and far from smooth. Is there a smoother way? There has to be.
Code conversion test
With lldb debugging and the source code, the structure of cache_t is basically clear. We can imitate the cache_t code structure with our own structs, so that we no longer need lldb or the source environment. To call another method, just add a line of code and run again. This is the way of working we are most familiar with.
#import <Foundation/Foundation.h>
// LWPerson is the test class created earlier; import its header here as well.

typedef uint32_t mask_t;

// Mirrors bucket_t (non-arm64 order: _sel first, then _imp)
struct lw_bucket_t {
    SEL _sel;
    IMP _imp;
};

// Mirrors cache_t
struct lw_cache_t {
    struct lw_bucket_t *_buckets;
    mask_t   _maybeMask;
    uint16_t _flags;
    uint16_t _occupied;
};

struct lw_class_data_bits_t {
    uintptr_t bits;
};

// Mirrors objc_class
struct lw_objc_class {
    Class ISA;
    Class superclass;
    struct lw_cache_t cache;
    struct lw_class_data_bits_t bits;
};

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        LWPerson *p = [LWPerson alloc];
        [p sayHello1];
        [p sayHello2];
        //[p sayHello3];
        //[p sayHello4];
        //[p sayHello5];
        Class lwClass = [LWPerson class];
        struct lw_objc_class *lw_class = (__bridge struct lw_objc_class *)(lwClass);
        NSLog(@" - %hu - %u", lw_class->cache._occupied, lw_class->cache._maybeMask);
        // Walk the buckets and print each cached sel and imp
        for (mask_t i = 0; i < lw_class->cache._maybeMask; i++) {
            struct lw_bucket_t bucket = lw_class->cache._buckets[i];
            NSLog(@"%@ - %p", NSStringFromSelector(bucket._sel), bucket._imp);
        }
    }
    return 0;
}
2021-06-23 14:51:20.003332+0800 testClass[7899:291790] ---[LWPerson sayHello1]---
2021-06-23 14:51:20.003432+0800 testClass[7899:291790] ---[LWPerson sayHello2]---
2021-06-23 14:51:20.003516+0800 testClass[7899:291790] - 2 - 3
2021-06-23 14:51:20.003603+0800 testClass[7899:291790] sayHello2 - 0x80b0
2021-06-23 14:51:20.003688+0800 testClass[7899:291790] sayHello1 - 0x8360
2021-06-23 14:51:20.003778+0800 testClass[7899:291790] (null) - 0x0
- In objc_class, Class ISA is commented out because objc_class inherits from objc_object and therefore inherits its Class ISA. The custom struct lw_objc_class has no such parent, so Class ISA must be added manually; otherwise the members are misaligned and the cast produces wrong values.
- The simpler the struct, the better, as long as it shows the main information.
Now also call the sayHello3, sayHello4 and sayHello5 methods and look at the printed result.
2021-06-23 14:53:45.514704+0800 testClass[7944:294241] ---[LWPerson sayHello1]---
2021-06-23 14:53:45.514817+0800 testClass[7944:294241] ---[LWPerson sayHello2]---
2021-06-23 14:53:45.514899+0800 testClass[7944:294241] ---[LWPerson sayHello3]---
2021-06-23 14:53:45.514982+0800 testClass[7944:294241] ---[LWPerson sayHello4]---
2021-06-23 14:53:45.515069+0800 testClass[7944:294241] ---[LWPerson sayHello5]---
2021-06-23 14:53:45.515161+0800 testClass[7944:294241] - 3 - 7
2021-06-23 14:53:45.515235+0800 testClass[7944:294241] (null) - 0x0f
2021-06-23 14:53:45.515316+0800 testClass[7944:294241] sayHello3 - 0x180b8
2021-06-23 14:53:45.515411+0800 testClass[7944:294241] (null) - 0x0f
2021-06-23 14:53:45.515525+0800 testClass[7944:294241] sayHello4 - 0x180e8
2021-06-23 14:53:45.515610+0800 testClass[7944:294241] (null) - 0x0f
2021-06-23 14:53:45.515743+0800 testClass[7944:294241] sayHello5 - 0x180d8
2021-06-23 14:53:45.515827+0800 testClass[7944:294241] (null) - 0x0
This raises the following questions:
- What are _occupied and _maybeMask? Why do they keep changing?
- How did the sayHello1 and sayHello2 methods disappear? Did someone perform magic?
- Why are the cache slots filled out of order? For example, sayHello2 comes before sayHello1, and the slot in front of sayHello3 is empty.
With these questions in mind, what is next for cache_t? To find out what _occupied and _maybeMask are, we have to look at the source code and see where their values are assigned. And since the cache stores methods, we first need to find out how a method gets inserted into a bucket. With that in mind, let's wade through the cache_t source code.
cache_t source code exploration
First, find the entry point of method caching: insert(SEL sel, IMP imp, id receiver). Its parameters include sel and imp, the parts of a method we are already familiar with, and the method is named insert. Let's look at its concrete implementation. Because insert contains a lot of code, we go through it step by step.
Calculating the current occupancy
- occupied() returns the current occupancy, i.e. how many buckets in the cache are in use.
- newOccupied = occupied() + 1 is the occupancy after the method being inserted is counted.
- oldCapacity is recorded so that the old memory can be freed during capacity expansion.
Allocating the capacity
- Only the first time a method is cached is the default capacity allocated: capacity = INIT_CACHE_SIZE, i.e. capacity = 4, which is the memory size of 4 buckets.
- reallocate(oldCapacity, capacity, /* freeOld */false) allocates the memory. The freeOld parameter controls whether the old memory is freed.
reallocate method exploration
The reallocate method mainly does three things:
- allocateBuckets allocates the memory.
- setBucketsAndMask sets the values of mask and buckets.
- collect_free frees the old memory; whether this happens is controlled by freeOld.
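The three steps can be sketched in plain C as follows. This is a simplified illustration under assumptions, not the runtime's code: the sketch_ types are invented, buckets and mask are stored in separate fields, and the old block is freed immediately, whereas the real collect_free defers the release.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t mask_t;

/* Hypothetical simplified bucket and cache types. */
struct sketch_bucket { uintptr_t sel; uintptr_t imp; };
struct sketch_cache  { struct sketch_bucket *buckets; mask_t mask; uint16_t occupied; };

/* Simplified reallocate: allocate, publish buckets and mask, optionally free the old block. */
static void sketch_reallocate(struct sketch_cache *c, mask_t newCapacity, int freeOld) {
    /* capacity is expected to be a power of two so that capacity - 1 works as a mask */
    assert(newCapacity > 0 && (newCapacity & (newCapacity - 1)) == 0);
    struct sketch_bucket *oldBuckets = c->buckets;

    /* 1. allocateBuckets: a zero-filled block of newCapacity buckets */
    struct sketch_bucket *newBuckets = calloc(newCapacity, sizeof *newBuckets);
    assert(newBuckets != NULL);

    /* 2. setBucketsAndMask: new block, mask = capacity - 1, nothing occupied yet */
    c->buckets  = newBuckets;
    c->mask     = newCapacity - 1;
    c->occupied = 0;

    /* 3. collect_free: release the old block only when freeOld is set */
    if (freeOld) free(oldBuckets);
}
```

Note that the old entries are not copied into the new block, which matches the observation that previously cached methods vanish after an expansion.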
allocateBuckets method exploration
The allocateBuckets method does two things:
- calloc(bytesForCapacity(newCapacity), 1) allocates memory of size newCapacity * sizeof(bucket_t).
- end->set stores sel = 1 and imp = the address of the first bucket in the last slot of the newly allocated memory.
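As a rough illustration of these two steps, the C sketch below allocates a zeroed block and writes the end marker into its last slot. The sketch_ names are hypothetical and the bucket is reduced to two integer fields, so this only mirrors the idea, not the runtime's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified bucket: sel and imp kept as plain integers. */
struct sketch_bucket { uintptr_t sel; uintptr_t imp; };

/* Simplified allocateBuckets: zeroed block plus an end marker in the last slot.
   Returns NULL if the allocation fails; the caller frees the block. */
static struct sketch_bucket *sketch_allocate_buckets(uint32_t newCapacity) {
    assert(newCapacity >= 2);
    struct sketch_bucket *newBuckets = calloc(newCapacity, sizeof *newBuckets);
    if (newBuckets == NULL) return NULL;

    struct sketch_bucket *end = &newBuckets[newCapacity - 1];
    end->sel = 1;                       /* marker value, never a real selector */
    end->imp = (uintptr_t)newBuckets;   /* points back at the first bucket */
    return newBuckets;
}
```

With a capacity of 4, slots 0 to 2 stay empty and slot 3 holds the marker, so only three slots remain for real methods.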
setBucketsAndMask method exploration
Depending on the architecture, setBucketsAndMask writes the data into _bucketsAndMaybeMask and _maybeMask.
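One way to picture the packed case is the C sketch below, which stores a 16-bit mask in the top bits of a 64-bit word and the buckets address in the low bits. The shift of 48 and all sketch_ names are assumptions made for illustration; the real runtime reserves bits differently per architecture, and in the outlined case it simply keeps mask in _maybeMask.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the mask lives in the top 16 bits of a 64-bit word. */
#define SKETCH_MASK_SHIFT   48
#define SKETCH_BUCKETS_MASK ((UINT64_C(1) << SKETCH_MASK_SHIFT) - 1)

/* Packs a buckets address and a mask into a single 64-bit value. */
static uint64_t sketch_pack(uint64_t buckets, uint16_t mask) {
    assert((buckets & ~SKETCH_BUCKETS_MASK) == 0);  /* address must fit in 48 bits */
    return ((uint64_t)mask << SKETCH_MASK_SHIFT) | buckets;
}

/* Extracts the buckets address from the packed value. */
static uint64_t sketch_buckets(uint64_t packed) {
    return packed & SKETCH_BUCKETS_MASK;
}

/* Extracts the mask from the packed value. */
static uint16_t sketch_mask(uint64_t packed) {
    return (uint16_t)(packed >> SKETCH_MASK_SHIFT);
}
```

Reading either value back is then a single mask or shift, which is why cache_t needs architecture-specific masks to separate the two.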
collect_free method exploration
collect_free clears the old data and reclaims its memory.
Capacity at most three quarters full
- If the number of methods to be cached is at most three quarters of the total capacity, insert goes straight to the caching step.
- Apple's designs usually leave some headroom. One reason may be room for future optimization or extension; another may be safety. Memory alignment follows the same idea.
Capacity completely full
- Apple also provides a switch that lets the cache be filled completely if needed. By default it is not filled.
- Personally, I suggest leaving the default alone and not filling the cache completely. A full cache may cause other problems that are hard to troubleshoot.
Capacity more than three quarters full
- When the cache would be more than three quarters full, the system doubles the capacity. The expanded capacity cannot exceed the maximum size of mask, 2^15.
- During expansion an important step takes place: new memory is allocated and the old memory is released and reclaimed, because freeOld = true.
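The capacity rules above can be replayed with a small C simulation. It is a sketch under assumptions: the names are invented, one slot is assumed to be reserved for the end marker, the full-utilization switch and the maximum size are ignored, and only counters are tracked, not real buckets.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_INIT_CACHE_SIZE 4
#define SKETCH_END_MARKER      1   /* assumption: one slot reserved for the end marker */

/* Only the two counters that matter for the capacity decision. */
struct sketch_state { uint32_t capacity; uint32_t occupied; };

/* Applies the capacity decision for one new method, then counts it as cached. */
static void sketch_insert(struct sketch_state *s) {
    uint32_t newOccupied = s->occupied + 1;
    if (s->capacity == 0) {
        s->capacity = SKETCH_INIT_CACHE_SIZE;   /* first insert: allocate 4 buckets */
    } else if (newOccupied + SKETCH_END_MARKER <= s->capacity * 3 / 4) {
        /* at most three quarters full: keep the current buckets */
    } else {
        s->capacity *= 2;                       /* expand to twice the capacity */
        s->occupied = 0;                        /* old entries are dropped */
    }
    s->occupied += 1;
    assert(s->occupied < s->capacity);
}

/* State after caching methodCount distinct methods in an empty cache. */
static struct sketch_state sketch_after(int methodCount) {
    struct sketch_state s = {0, 0};
    for (int i = 0; i < methodCount; i++) sketch_insert(&s);
    return s;
}
```

sketch_after(2) gives occupied = 2 with mask = capacity - 1 = 3, and sketch_after(5) gives occupied = 3 with mask = 7, matching the two printouts from the code conversion test: the third method triggers the expansion that discards sayHello1 and sayHello2.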
Caching the method
- First get buckets(), which points to the first address of the allocated memory, i.e. the address of the first bucket. buckets() is neither an array nor a linked list, just a contiguous block of memory.
- The hash function computes the hash index from the sel being cached and mask. Why is mask needed? mask = capacity - 1 keeps the index inside the allocated block. For example, with capacity = 4 the last slot holds the end marker, so methods can only be stored in the first 3 slots.
- Start caching. If the slot at the index is empty, store the method there. If the slot already holds the same sel, the method is already cached, so return. If there is a hash collision (same index, different sel), compute the next index and try again until the collision is resolved and the method is cached.
cache_hash and cache_next
cache_hash generates the hash index, and cache_next resolves hash collisions.
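The lookup-and-store loop can be sketched in C as below. This is a simplified illustration with hypothetical names: the hash is reduced to sel & mask, the next index is assumed to be (i + 1) & mask, and a full table is reported with -1 instead of aborting.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t mask_t;

/* Hypothetical simplified bucket: sel == 0 means the slot is empty. */
struct sketch_bucket { uintptr_t sel; uintptr_t imp; };

/* Hash index: keep only the bits selected by mask. */
static mask_t sketch_hash(uintptr_t sel, mask_t mask) {
    return (mask_t)(sel & mask);
}

/* Collision step: move to the following slot, wrapping around. */
static mask_t sketch_next(mask_t i, mask_t mask) {
    return (i + 1) & mask;
}

/* Stores sel/imp and returns the slot index, or -1 when no slot is free. */
static int sketch_store(struct sketch_bucket *b, mask_t mask, uintptr_t sel, uintptr_t imp) {
    assert(sel != 0);
    mask_t begin = sketch_hash(sel, mask);
    mask_t i = begin;
    do {
        if (b[i].sel == 0) {            /* empty slot: cache the method here */
            b[i].sel = sel;
            b[i].imp = imp;
            return (int)i;
        }
        if (b[i].sel == sel) {          /* same sel: already cached */
            return (int)i;
        }
        i = sketch_next(i, mask);       /* collision: try the next slot */
    } while (i != begin);
    return -1;
}
```

Because the slot depends on the selector's bits rather than on call order, methods land in the table in what looks like random order.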
Writing to the cache: set
set writes sel and imp into the bucket, which is the actual caching of the method.
incrementOccupied
_occupied is incremented by 1. _occupied is the number of methods currently stored in the cache.
insert call flow
How does calling an instance method end up calling the cache's insert method? Set a breakpoint in the insert method and run the source project.
The stack information on the left shows how insert is reached: _objc_msgSend_uncached --> lookUpImpOrForward --> log_and_fill_cache --> cache_t::insert
The stack only goes back as far as _objc_msgSend_uncached, yet what we called was the instance method [p sayHello1], which eventually reaches cache_t::insert. So the part of the flow from _objc_msgSend_uncached to cache_t::insert is known, but the part from [p sayHello1] to _objc_msgSend_uncached is still unclear. How to explore it? When in doubt, look at the assembly.
The assembly shows that [p sayHello1] is compiled into a call to objc_msgSend, but we do not yet know what this function does.
Inside objc_msgSend we find _objc_msgSend_uncached. Suddenly everything is clear, like clouds parting to reveal the moon. At this point the whole flow is connected.
- insert call flow: [p sayHello1], implemented underneath as objc_msgSend --> _objc_msgSend_uncached --> lookUpImpOrForward --> log_and_fill_cache --> cache_t::insert
insert call flow chart
cache_t schematic diagram
Summary
Meaning of each variable in cache_t
- _bucketsAndMaybeMask stores both buckets and mask on real devices; on macOS or the simulator it stores only buckets.
- _maybeMask is the mask used to compute the index in the hash algorithm and in the hash-collision algorithm; _maybeMask = capacity - 1.
- _occupied increases as methods are cached and is reset to 0 on expansion.
- The earlier methods were lost because, on expansion, the old memory is reclaimed and all cached data is cleared.
- The buckets in the cache are filled out of order because each position is computed by hashing sel with mask, so it is not fixed.
Closing thoughts
The process of exploring was both painful and rewarding. It can be tedious, but persistence pays off. I thought the internals of cache were simple; the truth is that cache is really powerful.
Supplement
The insert method in cache_t
cache: superclass methods
Careful readers may have noticed that in the code conversion test all methods called belonged to the current class, not to a superclass. What happens when the method called belongs to the superclass?
int main(int argc, const char * argv[]) {
    @autoreleasepool {
        LWPerson *p = [LWPerson alloc];
        [p init];
        [p sayAllPerson]; // Superclass method
        Class pClass = [LWPerson class];
        // Reuses the lw_objc_class and lw_bucket_t structs defined in the code conversion test
        struct lw_objc_class *lw_pClass = (__bridge struct lw_objc_class *)(pClass);
        NSLog(@"%hu - %u", lw_pClass->cache._occupied, lw_pClass->cache._maybeMask);
        for (mask_t i = 0; i < lw_pClass->cache._maybeMask; i++) {
            struct lw_bucket_t bucket = lw_pClass->cache._buckets[i];
            NSLog(@"%@ - %p", NSStringFromSelector(bucket._sel), bucket._imp);
        }
    }
    return 0;
}
2021-06-24 19:27:47.309212+0800 KCObjcBuild[1140:17922] 2 - 3
2021-06-24 19:27:47.309558+0800 KCObjcBuild[1140:17922] init - 0x33dca0
2021-06-24 19:27:47.309859+0800 KCObjcBuild[1140:17922] sayAllPerson - 0x7cd0
2021-06-24 19:27:47.309923+0800 KCObjcBuild[1140:17922] (null) - 0x0
- NSObject's init method and the custom superclass method sayAllPerson are both cached in the class that calls them.
- When a subclass calls a method of its superclass, that method is cached in the subclass, so the subclass can find it more quickly the next time.
lldb debugging shows 7
In lldb debugging, a single instance method was called, yet _maybeMask = 7. In the code conversion test, calling a single method gives _maybeMask = 3. Why does lldb show 7? First, print sel and imp inside the insert source code, as follows.
Then call the instance method in lldb and look at the printed information.
At the moment sayHello1 is called, it has not been cached yet, but the memory has already been allocated once and three buckets already hold values.
- The methods already cached are NSObject's respondsToSelector method and NSObject's class method, plus one unknown method.
- The unknown method has sel = 0x1, i.e. sel = 1. Does this look familiar? In the allocateBuckets method explored above, one line stands out: end->set<NotAtomic, Raw>(newBuckets, (SEL)(uintptr_t)1, (IMP)newBuckets, nil). By default the last bucket stores sel = 1 and imp = the first address of the allocated memory, i.e. the address of buckets(). Note that the last parameter of set, cls, is passed as nil here, which is equivalent to not encoding imp.
To verify that the imp at sel = 0x1 is the address of buckets(), print the address of buckets() in the source code, as follows.
lldb analysis
- The imp address at sel = 0x1 is the same as the address of buckets().
- Why does the imp at sel = 0x1 have to be XORed with the class? The analysis follows below.
Summary
- Why lldb debugging shows _maybeMask = 7: when sayHello1 is called, 2 methods are already cached, respondsToSelector and class, and the sel = 0x1 entry was stored by default when the memory was allocated. So sayHello1 is the 3rd method to be cached, which exceeds three quarters of the current capacity. The cache is therefore expanded, the cached respondsToSelector and class methods are cleared, and sayHello1 is stored in the new cache. That is why lldb shows _maybeMask = 7 and _occupied = 1.
- Why respondsToSelector and class are called: my speculation is that the lldb environment differs from a normal run. In a normal run, the first [LWPerson alloc] initializes NSObject, i.e. [NSObject alloc], so respondsToSelector and class are cached in the NSObject class. In the lldb environment NSObject is not initialized this way, so when the instance method is called, these calls go up to NSObject and are then cached in the LWPerson class.
imp encoding and decoding
The imp address stored in a bucket is encoded into a uintptr_t value; decoding restores the original imp.
imp encoding
- b[i].set<Atomic, Encoded>(b, sel, imp, cls()) caches sel and imp. The set method calls encodeImp, which encodes imp as (uintptr_t)newImp ^ (uintptr_t)cls, i.e. an XOR operation.
- Whether the imp in a bucket is encoded depends, apart from an external configuration variable, mainly on whether the cls parameter of bucket_t::set(bucket_t *base, SEL newSel, IMP newImp, Class cls) is nil. If cls has a value, imp is encoded. If cls is nil, the XOR returns the same imp, which is equivalent to no encoding. That is why the last bucket of newly allocated memory, written by set with cls = nil, holds an unencoded imp.
imp decoding
- imp is decoded with an XOR operation, the same operation used to encode it.
- In the "lldb debugging shows 7" section above, the information was printed with imp(nil, cls()). imp(nil, cls()) applies an XOR to the imp of the last bucket as well, so to recover the original address of that imp, one more XOR has to be done manually.
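The encoding and decoding described above amount to a pair of XORs, which the following C sketch illustrates. The sketch_ function names are hypothetical and addresses are treated as plain integers; it covers only the XOR variant discussed here, not other encodings the runtime may use.

```c
#include <assert.h>
#include <stdint.h>

/* Encode: XOR the imp address with the class address.
   With cls == 0 (nil) the result equals imp, i.e. no encoding. */
static uintptr_t sketch_encode_imp(uintptr_t imp, uintptr_t cls) {
    return imp ^ cls;
}

/* Decode: the same XOR restores the original imp address. */
static uintptr_t sketch_decode_imp(uintptr_t stored, uintptr_t cls) {
    return stored ^ cls;
}

/* Returns 1 when encoding followed by decoding gives back the original imp. */
static int sketch_round_trips(uintptr_t imp, uintptr_t cls) {
    uintptr_t stored = sketch_encode_imp(imp, cls);
    assert(cls == 0 || stored != imp);   /* a non-nil cls always changes the value */
    return sketch_decode_imp(stored, cls) == imp;
}
```

This also shows why the end marker needs a manual XOR in lldb: it was stored with cls = nil, so decoding it with the real class changes the value, and XORing with the class once more brings the original address back.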
XOR operation
XOR compares two values bit by bit: the result bit is 0 if the corresponding bits are the same, and 1 otherwise.
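#import <Foundation/Foundation.h>

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        int a = 5; int b = 10; int c = 0;
        c = a ^ b;                                 // 15
        NSLog(@"%d - %d - %d", c, c ^ b, c ^ a);   // 15 - 5 - 10
    }
    return 0;
}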
c = a ^ b. To restore a, use a = c ^ b, which is the same as a = a ^ b ^ b. To restore b, use b = c ^ a, which is the same as b = a ^ b ^ a.
XOR operation details
a = 5        0000 0101
b = 10       0000 1010
c = a ^ b    0000 0101 ^ 0000 1010 = 0000 1111 = 15
a = c ^ b    0000 1111 ^ 0000 1010 = 0000 0101 = 5