We examined the size of cache_t in "iOS Low-Level Exploration: Class Structure Exploration (Part 1)". Today we’ll explore what is actually inside cache_t.
1 cache_t
1.1 Simple analysis of source code
First, let’s dig into the source code to see what cache_t actually looks like.
Before going further, we need to be clear about the following macros, which select how the cache mask is stored:
- CACHE_MASK_STORAGE_HIGH_16_BIG_ADDRS: the running environment is macOS or the simulator on arm64.
- CACHE_MASK_STORAGE_HIGH_16: the running environment is a 64-bit real device, usually the ARM64 architecture.
- CACHE_MASK_STORAGE_LOW_4: the running environment is a real device that is not 64-bit (arm64 with 32-bit pointers).
- CACHE_MASK_STORAGE_OUTLINED: the fallback for every other environment, for example x86_64.
The cache_t source code is long, and at first it is hard to tell which parts matter. Exploration like this is often tedious. After a long search, one type stood out: bucket_t.
Why bucket_t? Because its definition contains exactly what I was looking for.
A method cache must store methods, and bucket_t holds both an IMP and a SEL. That suggests we are on the right track, so we continue exploring from here.
1.2 Printing cached methods with LLDB
Now that we have a rough idea of how methods are stored in cache_t, let’s print them in the console.
We use the code from the previous article:
Our first LLDB attempt ran into a problem at the next step: where exactly are the cached methods inside cache_t? (Note: the pointer is shifted by 16 bytes here, to skip isa and superclass at 8 bytes each.)
The $3 structure in the figure above corresponds to the source data structure:
My guess was that _originalPreoptCache stores the cached methods. But as I kept exploring, I found no cached methods there. The process is as follows:
It is time to change approach and check whether cache_t offers a suitable method. It does: buckets().
Calling buckets() gives us:
Here we finally find SEL and IMP. However, they contain no data, because we have not called any method yet, so nothing has been cached.
Since there is no cached data, we call the method func to populate the cache. After calling func, the bucket we printed is still empty, although _maybeMask has changed:
The main reason is that the cache computes each entry's index from a hash of the selector, so the entry does not necessarily land in the first bucket. I ran it again, looked at the other buckets, and got the data I needed. (Hashing is discussed at the end of this article.)
We can use sel() and imp(UNUSED_WITHOUT_PTRAUTH bucket_t *base, Class cls) to obtain the SEL and the IMP:
- sel:
- imp:
2 Exploring the cache without running the source
The source code downloaded from the official website usually does not compile as is, and even after configuring it we may still fail to build it. (I am using a command-line project here.)
In that case we can take another approach that still lets us explore: copy part of the source into our own project (note: only part of it, not all of it). For example:
- Copy objc_class
When we explore the source code we always go through objc_class, so we copy part of the objc_class code, rename it, and keep the key members such as its properties.
struct jax_objc_class {
    Class isa;
    Class superclass;
    struct jax_cache_t cache;
    struct jax_class_data_bit_t bits;
};
- The entire copied code looks like this:
typedef uint32_t mask_t; // x86_64 & arm64 asm are less efficient with 16-bits
struct jax_bucket_t {
    SEL _sel;
    IMP _imp;
};
struct jax_cache_t {
    struct jax_bucket_t *_buckets; // 8
    mask_t _maybeMask; // 4
    uint16_t _flags; // 2
    uint16_t _occupied; // 2
};
struct jax_class_data_bit_t {
    uintptr_t bits;
};
struct jax_objc_class {
    Class isa;
    Class superclass;
    struct jax_cache_t cache;
    struct jax_class_data_bit_t bits;
};
- Create a Person class and implement some test methods:
- Then, in the main function, check whether the copied code works by printing the information inside cache:
- Since the cache holds several buckets, we can print them in a loop:
- Add more method calls and print again. This time the loop prints output that does not look normal:
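The loop print described in the steps above can be sketched in plain C. This is only an illustration, not the author's code: the demo_ types are hypothetical stand-ins that mirror the copied jax_ structures, and SEL and IMP are replaced with simple typedefs so the sketch builds without the Objective-C runtime.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins (assumptions) so the sketch builds without the Objective-C runtime. */
typedef const char *demo_SEL;     /* the real SEL is an opaque selector pointer */
typedef void (*demo_IMP)(void);   /* the real IMP is a function pointer */
typedef uint32_t demo_mask_t;

struct demo_bucket_t {
    demo_SEL _sel;
    demo_IMP _imp;
};

struct demo_cache_t {
    struct demo_bucket_t *_buckets;
    demo_mask_t _maybeMask;
    uint16_t _flags;
    uint16_t _occupied;
};

/* Print every slot and return how many slots hold a selector. */
static unsigned demo_dump_cache(const struct demo_cache_t *cache)
{
    assert(cache != NULL);
    printf("occupied=%u mask=%u\n",
           (unsigned)cache->_occupied, (unsigned)cache->_maybeMask);
    if (cache->_buckets == NULL) {
        return 0;   /* nothing allocated yet */
    }
    unsigned used = 0;
    /* Slots run from 0 to mask inclusive, because mask = capacity - 1. */
    for (demo_mask_t i = 0; i <= cache->_maybeMask; i++) {
        const struct demo_bucket_t *b = &cache->_buckets[i];
        if (b->_sel != NULL) {
            printf("[%u] %s\n", (unsigned)i, b->_sel);
            used++;
        } else {
            printf("[%u] (empty)\n", (unsigned)i);
        }
    }
    return used;
}
```

In an Objective-C project the same loop would run over the real SEL and IMP types, with the class pointer cast to the copied structure and the selector printed through NSStringFromSelector. Empty slots in the output are expected, since entries are placed by hash rather than appended.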
3 Basic principles of cache_t
When we called several object methods above, the loop printed unexpected results, and _occupied and _maybeMask changed as well.
Why is that? We need to find the answer in the source code.
3.1 occupied
First, look at void incrementOccupied();
That is, incrementOccupied() increments _occupied by one. So we need to find out where it is called.
A search shows that it is called inside cache_t's insert method:
3.2 insert
The name insert should already feel familiar: a cache needs an insert operation, and cache_t::insert is exactly that.
Let’s analyze the insert source code:
The part above deals with allocating the cache space. One of the methods it calls, reallocate, is worth a closer look.
reallocate is used for both initialization and expansion, but with different arguments.
reallocate
As you can see, setting up the cache space is simple. First, a new cache space is created with the capacity passed in. Then, if there is an old cache, the old cache is released.
Now that the cache space has been created, the next step is to store the SEL and IMP.
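The two steps of reallocate can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the runtime's code: the demo_ names are hypothetical, calloc and free stand in for the runtime's own allocation and collection routines, and the freeOld flag models the different arguments used for initialization and expansion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_bucket_t {
    const char *_sel;   /* stand-in for SEL */
    void *_imp;         /* stand-in for IMP */
};

struct demo_cache_t {
    struct demo_bucket_t *_buckets;
    uint32_t _maybeMask;
    uint16_t _flags;
    uint16_t _occupied;
};

/* Replace the bucket array with a new, empty one of newCapacity slots.
   Returns false (and leaves the cache untouched) if allocation fails. */
static bool demo_reallocate(struct demo_cache_t *cache, uint32_t newCapacity, bool freeOld)
{
    /* Capacity must be a non-zero power of two so that capacity - 1 is a mask. */
    assert(newCapacity != 0 && (newCapacity & (newCapacity - 1)) == 0);

    struct demo_bucket_t *oldBuckets = cache->_buckets;
    struct demo_bucket_t *newBuckets = calloc(newCapacity, sizeof *newBuckets);
    if (newBuckets == NULL) {
        return false;
    }

    /* Step 1: install the new, zeroed space. Old entries are not copied over. */
    cache->_buckets = newBuckets;
    cache->_maybeMask = newCapacity - 1;
    cache->_occupied = 0;

    /* Step 2: release the old space when the caller asks for it. */
    if (freeOld) {
        free(oldBuckets);
    }
    return true;
}
```

Because the new space starts out empty, every method cached before an expansion has to be inserted again the next time it is called.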
cache_hash
This is the function that calculates the hash value:
// Class points to cache. SEL is key. Cache buckets store SEL+IMP.
// Caches are never built in the dyld shared cache.
static inline mask_t cache_hash(SEL sel, mask_t mask)
{
    uintptr_t value = (uintptr_t)sel;
#if CONFIG_USE_PREOPT_CACHES
    value ^= value >> 7;
#endif
    return (mask_t)(value & mask);
}
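To make the masking step concrete, here is a small stand-alone version of the same arithmetic. It is a sketch under assumptions: demo_cache_hash is a hypothetical name, the selector is treated as a plain address, and the use_preopt_mix flag plays the role of CONFIG_USE_PREOPT_CACHES.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t demo_mask_t;

/* Hypothetical stand-in for cache_hash: returns the slot index for a
   selector address, given mask = capacity - 1. */
static demo_mask_t demo_cache_hash(uintptr_t sel_addr, demo_mask_t mask, int use_preopt_mix)
{
    /* A valid mask has all low bits set, e.g. 3 (capacity 4) or 7 (capacity 8). */
    assert((mask & (mask + 1)) == 0);

    uintptr_t value = sel_addr;
    if (use_preopt_mix) {
        value ^= value >> 7;   /* fold higher address bits into the low bits */
    }
    return (demo_mask_t)(value & mask);   /* keep only the bits selected by mask */
}
```

With mask 3, the addresses 0x1000 and 0x1004 both map to slot 0. That kind of collision is what cache_next resolves.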
cache_next
This is the function that handles hash collisions by computing the next index to try:
#if CACHE_END_MARKER
static inline mask_t cache_next(mask_t i, mask_t mask) {
    return (i+1) & mask;
}
#elif __arm64__
static inline mask_t cache_next(mask_t i, mask_t mask) {
    return i ? i-1 : mask;
}
#else
#error unexpected configuration
#endif
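Taken together, cache_hash and cache_next amount to open addressing with linear probing. The following sketch shows how an insert could walk the slots. It is an illustration only: the demo_ names are hypothetical, selectors are plain integers, and it uses the (i+1) & mask variant of the step.

```c
#include <assert.h>
#include <stdint.h>

struct demo_slot {
    uintptr_t sel;   /* 0 means the slot is empty */
    uintptr_t imp;
};

static uint32_t demo_hash(uintptr_t sel, uint32_t mask) { return (uint32_t)(sel & mask); }
static uint32_t demo_next(uint32_t i, uint32_t mask)    { return (i + 1) & mask; }

/* Store sel/imp in the first free slot at or after the hashed index.
   Returns the slot index used, or -1 if every slot holds another selector. */
static int demo_insert(struct demo_slot *slots, uint32_t mask, uintptr_t sel, uintptr_t imp)
{
    assert(sel != 0);   /* 0 is reserved as the "empty" marker */
    uint32_t begin = demo_hash(sel, mask);
    uint32_t i = begin;
    do {
        if (slots[i].sel == 0) {      /* free slot: store the pair here */
            slots[i].sel = sel;
            slots[i].imp = imp;
            return (int)i;
        }
        if (slots[i].sel == sel) {    /* already cached: nothing to do */
            return (int)i;
        }
        i = demo_next(i, mask);       /* collision: try the next slot */
    } while (i != begin);
    return -1;                        /* wrapped around without finding room */
}
```

Inserting 0x10, 0x14 and 0x13 into a four-slot table (mask 3) places them in slots 0, 1 and 3: the second selector collides with the first and moves on by one, and slot 2 stays empty. This is why a loop over the buckets can show gaps and an order that differs from the call order.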
3.3 Answers to the questions above
When we called several object methods above, the loop print went wrong. Having explored the insert method in the source code, we can now explain what happened.
- Why do _occupied and _maybeMask change as more object methods are called?
This is because four slots are allocated when the cache is initialized (INIT_CACHE_SIZE == 4). As more methods are called, the cache runs out of space, and the expansion algorithm in the source code doubles its capacity.
mask
This parameter appears in the hash-related functions. It is the mask: mask = capacity - 1, where capacity is the number of slots.
_occupied
The number of sel-imp pairs currently stored in the cache. Several things cause _occupied to change: init, property assignment, and method calls.
- Why does the loop print above show null values?
This is caused by the reallocation of the cache space: the old space is freed and a new space is allocated, so the entries cached earlier are gone.
- In what order are sel-imp pairs stored in the cache?
Because the index is computed by hashing, entries are not stored in a fixed order and do not follow the call order. You can see this in the second half of cache_t::insert.
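The way _occupied and the mask move together can be sketched with a small bookkeeping model. Several things here are assumptions, not taken from the article: the demo_ names are hypothetical, the 3/4 fill threshold is assumed because the text only says the cache doubles when space runs out, and the model ignores any slot reserved for an end marker.

```c
#include <assert.h>
#include <stdint.h>

enum { DEMO_INIT_CACHE_SIZE = 4 };   /* from the article: INIT_CACHE_SIZE == 4 */

struct demo_state {
    uint32_t capacity;   /* number of slots; 0 means not allocated yet */
    uint32_t occupied;   /* number of cached sel-imp pairs */
};

/* mask = capacity - 1 (0 while nothing is allocated) */
static uint32_t demo_mask(const struct demo_state *s)
{
    return s->capacity ? s->capacity - 1 : 0;
}

/* Assumed policy: grow once the new entry would exceed 3/4 of the capacity. */
static int demo_needs_grow(uint32_t capacity, uint32_t newOccupied)
{
    assert(capacity % 4 == 0);
    return newOccupied > capacity / 4 * 3;
}

/* Account for one newly cached method. */
static void demo_record_insert(struct demo_state *s)
{
    uint32_t newOccupied = s->occupied + 1;
    if (s->capacity == 0) {
        s->capacity = DEMO_INIT_CACHE_SIZE;   /* first use: allocate 4 slots */
    } else if (demo_needs_grow(s->capacity, newOccupied)) {
        s->capacity *= 2;                     /* expansion doubles the space */
        s->occupied = 0;                      /* old entries are dropped, not copied */
    }
    s->occupied += 1;
}
```

Under these assumptions, three inserts leave capacity 4, mask 3 and occupied 3. The fourth insert doubles the capacity to 8, so the mask becomes 7 and occupied drops back to 1. That matches the pattern seen in the loop print: the mask jumps and earlier entries disappear.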