iOS: cache_t analysis

iOS Martial Arts Secrets: article index

Foreword

The previous article covered the class structure in detail, but the cache member (of type cache_t) was left unexplored. This article examines cache_t at the source level.

Demo that may be used in this section

cache_t

1) cache_t structure

Here is the underlying structure of the class

In the cache_t structure, _bucketsAndMaybeMask holds the address of the buckets, that is, a pointer to bucket_t structures. The members and methods of bucket_t show that it is tied to imp: a bucket_t is a bucket that holds a method implementation (imp) together with its key (sel). From the source of these two structures we can conclude that cache stores sel-imp pairs. The overall structure is shown in the figure below
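The relationship between the two structures can be modeled roughly as follows. This is a simplified sketch, not the runtime source: the Mock names are hypothetical, the atomic wrappers are dropped, the sel-before-imp field order and the separately stored mask are assumptions that correspond to a macOS-style layout, and bucketAt is an illustrative helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the runtime's SEL and IMP types.
using MockSEL = const char *;
using MockIMP = void (*)();

// One bucket: the key (sel) and the cached method implementation (imp).
struct MockBucket {
    MockSEL sel;
    MockIMP imp;
};

// Simplified cache: the address of the bucket array plus bookkeeping fields.
struct MockCache {
    std::uintptr_t bucketsAndMaybeMask; // address of the first bucket
    std::uint32_t  maybeMask;           // capacity - 1
    std::uint16_t  flags;
    std::uint16_t  occupied;            // number of buckets in use

    // Reinterpret the stored address as a pointer to the bucket array.
    MockBucket *buckets() const {
        return reinterpret_cast<MockBucket *>(bucketsAndMaybeMask);
    }

    // Illustrative helper: access bucket i, which must lie inside the table.
    MockBucket &bucketAt(std::uint32_t i) const {
        assert(i <= maybeMask);
        return buckets()[i];
    }
};
```

In this model the cache itself stays small (16 bytes on a 64-bit target), while the sel-imp pairs live in a separately allocated bucket array.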

Finding sel-imp in cache_t

There are two ways to find the sel-imp pairs stored in cache_t:

Debug with LLDB inside the objc source project
Copy the relevant source into a standalone project

Preparation

Define a TCJPerson class with two properties and five instance methods, together with their implementations.
In main, create an object person of the TCJPerson class and call three of the instance methods, with a breakpoint on the first method call on person.

Debugging with LLDB in the source project

Run the program and stop at the breakpoint on [person sayHello];. Then carry out the following LLDB debugging steps:
To reach the cache member, offset the first address of pClass by 16 bytes (the isa pointer takes 8 bytes and the superclass pointer takes 8 bytes).
From the source we know that the sel-imp pairs live in the _buckets member of cache_t (on macOS in this case), and that cache_t provides the buckets() method to get it.
Once we have _buckets we can read each sel-imp: the bucket_t structure provides sel() and imp(UNUSED_WITHOUT_PTRAUTH bucket_t *base, Class cls) for that purpose.
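The 16-byte offset in the first step is plain pointer arithmetic. A minimal sketch, assuming a 64-bit target; MockClass, MockCache and cacheFromClassAddress are hypothetical names that only mimic the layout described above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified cache fields (assumed layout, for illustration only).
struct MockCache {
    std::uintptr_t bucketsAndMaybeMask;
    std::uint32_t  maybeMask;
    std::uint16_t  flags;
    std::uint16_t  occupied;
};

// Simplified class layout: isa, superclass, then the cache.
struct MockClass {
    void *isa;         // 8 bytes on a 64-bit target
    void *superclass;  // 8 bytes on a 64-bit target
    MockCache cache;   // so this member starts at offset 16
};

// Mirrors the LLDB step: start at the class address and skip two pointers.
inline const MockCache *cacheFromClassAddress(const void *cls) {
    assert(cls != nullptr);
    const auto *base = static_cast<const unsigned char *>(cls);
    return reinterpret_cast<const MockCache *>(base + 2 * sizeof(void *));
}
```

In the debugger the same idea is expressed by adding 0x10 to the class address and casting the result to cache_t *.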

A method that has never been called is not in the cache; once a method is called, it is cached.

Now we know how to read the sel-imp pairs in cache. How do we verify that the printed sel and imp are really the ones we called? Open the target executable with MachOView and compare the imp value in the method list, as shown below. The values match, so the printed sel-imp belongs to an instance method of TCJPerson

Following on from the previous step, call one more method and fetch the second sel. The LLDB session is as follows

The first sel-imp can be read directly at the first address of _buckets. What about the second one? As in the previous article, iOS Martial Arts Secrets ④, use a pointer offset: p *($9+1) prints the second bucket_t, where $9 is the buckets pointer in this LLDB session. If more methods have been called, p *($9+i) prints the bucket at offset i, from which sel and imp can be read.

Reproducing the source in a standalone project

To work outside the source environment, copy the required portion of the source code into a project. The complete code is as follows

One thing to note: in the source, the ISA member of objc_class is inherited from objc_object. The copied version drops that inheritance, so the ISA member has to be declared explicitly in objc_class; otherwise the printed result is wrong, as shown in the following figure

With the ISA member added and two methods called, the correct output looks like this

Add two more method calls by enabling sayMaster and sayNA. The output is as follows

These results raise several questions:

1. What is _mask?
2. What is _occupied?
3. Why do occupied and mask change as more methods are called?
4. Why is bucket data lost? For example, after calling four methods, only sayMaster and sayNA still have function pointers.

Diving into cache_t

Finding an entry point

Start the analysis from the _mask member of cache_t and look for the functions in cache_t that change these values. This leads to the incrementOccupied() function

The concrete implementation of this function is

A global search of the source for incrementOccupied() shows that it is called only from the insert method of cache_t
insert is the insertion into cache_t, and what the cache stores is sel-imp pairs, so the analysis starts from the insert method

A global search for cache_t::insert shows that before the write there is one more operation, the cache read, which looks up the sel-imp, as shown below

Analysis of the insert method

The source of the insert method is shown below. It divides into three main parts:

[Step 1] Calculate the current cache usage
[Step 2] Decide which operation to perform based on the cache usage
[Step 3] Assign sel and imp inside the bucket to be stored

[Step 1] Calculate the current cache usage

The current cache usage is calculated from occupied. When no property has been assigned and no method has been called, occupied() is 0 and newOccupied is 1, as shown below. A few notes on how the usage is counted:

alloc only creates the object; if the init method is then called, occupied also increases by 1
A property assignment implicitly calls the set method, so occupied increases as well: each property assigned adds 1
A method call also increases occupied: each newly called method adds 1

[Step 2] Decide which operation to perform based on the cache usage

If the cache is being created for the first time, 4 buckets are allocated by default
If the usage is less than or equal to 3/4 of the capacity, nothing needs to be done
If the usage exceeds 3/4, the capacity is doubled and the space is reallocated
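The three cases can be written as a small decision function. This is a sketch of the rule as described here, not the runtime source: CacheAction, fillLimit and decide are hypothetical names, and the end marker value of 1 is an assumption matching the macOS configuration discussed in this article.

```cpp
#include <cassert>
#include <cstdint>

enum class CacheAction { CreateInitial, UseAsIs, GrowAndReallocate };

constexpr std::uint32_t kInitCacheSize = 4; // initial capacity named in the text
constexpr std::uint32_t kEndMarker = 1;     // assumption: one slot counted as end marker

// Three-quarters of the capacity, using integer arithmetic.
inline std::uint32_t fillLimit(std::uint32_t capacity) {
    return capacity * 3 / 4;
}

// Decide what an insert has to do before it can store one more sel-imp.
inline CacheAction decide(std::uint32_t occupied, std::uint32_t capacity) {
    assert(occupied <= capacity);
    std::uint32_t newOccupied = occupied + 1;
    if (capacity == 0) {
        return CacheAction::CreateInitial;     // first use: allocate kInitCacheSize buckets
    }
    if (newOccupied + kEndMarker <= fillLimit(capacity)) {
        return CacheAction::UseAsIs;           // still within 3/4
    }
    return CacheAction::GrowAndReallocate;     // exceeds 3/4: double the capacity
}
```

Under these assumptions a cache of capacity 4 accepts two entries as-is, and the third insert triggers the doubling to 8.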

The reallocate method: allocating space

This method is used both on first creation and when doubling the capacity. Its source is shown in the figure. The main steps are:

allocateBuckets method: requests memory from the system, that is, allocates the new buckets. At this point the new bucket array is only a temporary variable

setBucketsAndMask method: stores the temporary buckets into the cache. There are two cases:

On a real device, buckets and mask are stored according to their bit positions, and occupied is set to 0
Not on a real device, buckets and mask are stored normally, and occupied is set to 0

If there are old buckets, the previous cache must be cleared by calling the collect_free method, whose source is as follows

The implementation of this method includes the following steps:

_garbage_make_room method: creates the garbage collection space
- On the first call, the collection space is allocated
- On later calls, the memory is enlarged to twice its original size
Record the buckets being discarded this time
cache_collect method: reclaims and frees the old buckets

[Step 3] Assign sel and imp inside the bucket to be stored

The cache_hash method, the hash algorithm, computes the hash index at which the sel-imp is to be stored. There are three cases:

If no sel is stored at the hash index, that is, the sel read from that index equals 0, the sel-imp is stored there and occupied is increased by 1
If the sel stored at the hash index equals the sel about to be inserted, return directly
If the sel stored at the hash index does not equal the sel about to be inserted, the cache_next method, the hash collision algorithm, computes a new index, and the comparison and storage are repeated
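The three cases amount to open addressing with linear probing. The following is a minimal model of that loop on a fixed-size table, not the runtime's insert: Slot and ProbeTable are hypothetical names, selectors are represented as plain integers, the forward probing step is an assumption, and the function returns an index only to make the behavior easy to observe.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One bucket in the model; sel == 0 means the slot is empty.
struct Slot {
    std::uintptr_t sel = 0;
    std::uintptr_t imp = 0;
};

struct ProbeTable {
    std::vector<Slot> slots;
    std::uint32_t occupied = 0;

    explicit ProbeTable(std::uint32_t capacity) : slots(capacity) {
        // The mask trick below only works for power-of-two capacities.
        assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    }

    // Returns the index holding sel afterwards, or -1 if the table is full.
    int insert(std::uintptr_t sel, std::uintptr_t imp) {
        const std::uint32_t mask = static_cast<std::uint32_t>(slots.size()) - 1;
        const std::uint32_t begin = static_cast<std::uint32_t>(sel & mask); // hash step
        std::uint32_t i = begin;
        do {
            if (slots[i].sel == 0) {       // case 1: empty slot, store and count it
                slots[i].sel = sel;
                slots[i].imp = imp;
                ++occupied;
                return static_cast<int>(i);
            }
            if (slots[i].sel == sel) {     // case 2: already cached, nothing to do
                return static_cast<int>(i);
            }
            i = (i + 1) & mask;            // case 3: collision, try the next index
        } while (i != begin);
        return -1;
    }
};
```

With a capacity of 4, the values 0x10 and 0x14 both hash to index 0, so the second one lands at index 1; inserting 0x10 again leaves occupied unchanged.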

The source of the two hash algorithms involved is as follows

cache_hash: the hash algorithm
cache_next: the hash collision algorithm
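As a rough sketch of what these two functions compute (the camel-case names are hypothetical and the bodies are simplified; the runtime's versions depend on the architecture and build configuration, and some builds mix extra bits into the hash before masking):

```cpp
#include <cassert>
#include <cstdint>

using mask_t = std::uint32_t;

// Hash step: keep the low bits of the selector address selected by mask.
inline mask_t cacheHash(std::uintptr_t sel, mask_t mask) {
    assert((mask & (mask + 1)) == 0); // mask must be capacity - 1 for a power-of-two capacity
    return static_cast<mask_t>(sel & mask);
}

// Collision step, forward variant: move to the next index and wrap at the end.
inline mask_t cacheNextForward(mask_t i, mask_t mask) {
    return (i + 1) & mask;
}

// Collision step, backward variant: move to the previous index and wrap to mask.
inline mask_t cacheNextBackward(mask_t i, mask_t mask) {
    return i ? i - 1 : mask;
}
```

Either collision variant visits every index exactly once before returning to the starting point, which is what lets the insert loop terminate.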

cache_t questions answered

① What is _mask?

_mask is the mask value used to compute the hash index in the hash algorithm and the hash collision algorithm. mask equals capacity - 1

② What is _occupied?

_occupied is the number of sel-imp pairs currently held in the hash table, that is, the number of buckets in use within the allocated memory

init causes occupied to change
A property assignment implicitly calls the set method, which causes occupied to change
A method call also causes occupied to change

③ Why do the printed occupied and mask change as more methods are called?

As more methods are called, the number of sel-imp pairs stored in the cache grows. Once newOccupied + CACHE_END_MARKER exceeds three-quarters of the capacity, the cache has to be doubled in size, so mask changes and occupied starts counting again. With a capacity of 4 and CACHE_END_MARKER equal to 1, for example, this happens on the insert made when occupied is already 2.

Why is the expansion performed at 3/4?

In hash-based data structures, the load factor indicates how full the table is. The larger the load factor, the fewer free slots remain, the more collisions occur, and the worse the hash table performs

A load factor of 3/4 keeps space utilization fairly high while still avoiding a large number of hash collisions, which balances space efficiency against lookup speed

Further reading: Why is the default load factor of HashMap 0.75?

④ Why is bucket data lost?

Because during expansion the old memory is discarded and new memory is allocated; the previously cached sel-imp pairs are not copied into the new buckets
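A small simulation makes the effect visible. This is a toy model under the assumptions used in this article (initial capacity 4, 3/4 threshold, one slot counted as end marker, forward probing); ModelCache is a hypothetical name and selectors are represented as plain integers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ModelCache {
    std::vector<std::uintptr_t> sels; // 0 means an empty bucket
    std::uint32_t occupied = 0;

    std::uint32_t capacity() const { return static_cast<std::uint32_t>(sels.size()); }
    std::uint32_t mask() const { return capacity() ? capacity() - 1 : 0; }

    void insert(std::uintptr_t sel) {
        assert(sel != 0);
        const std::uint32_t newOccupied = occupied + 1;
        if (capacity() == 0) {
            sels = std::vector<std::uintptr_t>(4);              // first use: 4 empty buckets
        } else if (newOccupied + 1 > capacity() * 3 / 4) {
            sels = std::vector<std::uintptr_t>(capacity() * 2); // grow: old entries are dropped
            occupied = 0;
        }
        const std::uint32_t m = mask();
        std::uint32_t i = static_cast<std::uint32_t>(sel & m);
        while (sels[i] != 0 && sels[i] != sel) {
            i = (i + 1) & m;                                    // probe on collision
        }
        if (sels[i] == 0) {
            sels[i] = sel;
            ++occupied;
        }
    }

    bool contains(std::uintptr_t sel) const {
        for (std::uintptr_t s : sels) {
            if (s == sel) return true;
        }
        return false;
    }
};
```

Inserting four distinct selectors into this model leaves mask at 7 and occupied at 2, with only the third and fourth selectors still present, which is the same pattern as the printout where only sayMaster and sayNA keep their function pointers.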

⑤ Is the method cache ordered?

No. The index at which a sel-imp is stored is computed by the hash algorithm, and if that index already holds another sel, the hash collision algorithm computes a new one. The indices therefore do not follow the call order and are not fixed

⑥ The relationship between bucket, mask, capacity, sel and imp

A class cls has a member cache of type cache_t. The buckets in cache_t contain multiple bucket entries, each storing the method implementation imp and the method selector sel, which serves as the key (cache_key_t)
mask belongs to the buckets and is mainly used by the hash algorithm during cache lookup
capacity gives the number of buckets in cache_t

The main purpose of the cache is to let the runtime send messages faster through a series of strategies

Afterword

Study harmoniously without being impatient. I’m still me, a different color of fireworks.