Preface:

We have already looked at the structure of a class: it contains isa, superclass, and bits (isa is a bitfield, superclass forms the inheritance chain, and bits stores the method list, properties, and protocols). The only member left to explore is cache. As the name implies it is a cache, but what exactly does a class cache? This post explores the class cache and its underlying principles.

The cache structure: what is cache, and what does its data structure look like?

The overview picture goes first, whether you study it now or not. The rest of this post roughly follows the flow it shows.

Method: explore cache the same way we explored bits.

As the class structure shows, cache starts 16 bytes into the class (isa and superclass, 8 bytes each, come before it), so we offset the class address by 16 bytes (0x10), not 16 bits. The result shows that cache mainly contains _bucketsAndMaybeMask, _maybeMask, and _originalPreoptCache.

To verify this, find the definition of the cache structure in the source:

Supplement:

On LP64 macOS the second member is a union. The members of a union are mutually exclusive, so here the upper struct (the one holding _maybeMask) is used, not _originalPreoptCache. The total size of cache is therefore 8 + 8 = 16 bytes.
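The size and offset arithmetic above can be checked with a small mock-up. This is only a sketch: the type names (MaskFlagsOccupied, CacheSketch, ClassSketch) and the helper cache_of are made up for illustration, and the field widths (a 32-bit mask plus two 16-bit fields) are an assumption about a 64-bit build, not a copy of the runtime header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the runtime types; names are illustrative only.
struct MaskFlagsOccupied {
    uint32_t maybeMask;   // assumed 32-bit mask on LP64
    uint16_t flags;
    uint16_t occupied;
};

struct CacheSketch {
    uintptr_t bucketsAndMaybeMask;   // 8 bytes on LP64
    union {                          // 8 bytes: the members share the same storage
        MaskFlagsOccupied fields;
        void *originalPreoptCache;
    } u;
};

struct ClassSketch {
    void *isa;           // offset 0
    void *superclass;    // offset 8
    CacheSketch cache;   // offset 16
    uintptr_t bits;      // offset 32
};

// Reach the cache the way the LLDB session does: class address + 16 bytes.
inline const CacheSketch *cache_of(const ClassSketch *cls) {
    const char *base = reinterpret_cast<const char *>(cls);
    return reinterpret_cast<const CacheSketch *>(base + 16);
}
```

On a 64-bit build, sizeof(CacheSketch) is 16 and offsetof(ClassSketch, cache) is 16, which matches the 8 + 8 count and the 0x10 offset used in the debugger.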

Analysis:

If this really is a cache for things like methods, it must store SEL or IMP somewhere, and the struct must have methods that operate on them. A quick browse of the source confirms it: there are plenty of methods, for example for allocating space, for emptying, and for inserting. Nothing can be in the cache before it has been inserted, so insert is the method to start from.

After the above investigation, the structure diagram can be roughly drawn as follows:

Verifying what is stored with LLDB

When we reach _bucketsAndMaybeMask, its structure is different from what we saw before, and printing it directly does not give a usable value, so we are stuck. From our earlier experience debugging bits, though, we know we can look for an accessor method that returns the value.

My approach here was to keep trying, but not blindly: I followed the methods defined inside the struct one by one and quickly found buckets() in cache_t. It looks like a method that builds and returns the underlying structure, so I tried printing it, and sure enough it worked.

Note

The buckets() method here is called on $2, which is cache, not on _bucketsAndMaybeMask, and the method is on line 460.

Printing $4 shows no data: no method has been called yet, so nothing has been cached. This supports the idea that cache stores methods rather than other data.

So at this point we call a method on the class and print again.

Supplement

There are two ways to call the print methods.

A pointer to a struct is accessed with ->, and a struct value (or object) is accessed with a dot.

Note

Printing buckets() directly does not show the entry, even though _occupied is 1, which means one method is cached. Indexing with buckets()[1] does print it. When printing, the source shows that the accessor takes two parameters; the first is base, and if you do not know what to pass, nil works.

Analysis

Objective-C has its own method cache. Methods are stored in and read from a hash table. A hash table combines the strengths of a linked list and an array: inserting, deleting, and looking up are all cheap, and the entries are stored unordered.

Exploring without the runtime source

Disadvantages of LLDB debugging

1. The threshold is high: you need a fairly deep understanding of the lower layers and a good feel for the methods, and debugging is time-consuming.

2. Every check means printing again, layer after layer.

So here I will simply post Cocci's version.

Underlying analysis of the method cache

cache_t is a struct that points to its buckets. Reading starts from the buckets address and uses pointer offsets to reach the slot for a method; as analyzed above, the storage is a hash array, so a lookup is really a pointer offset. Writing uses the same hash.

When nothing has been cached yet, the buckets are created first, and each bucket holds a sel and an imp. As more methods are cached, the cache keeps growing past any size fixed in advance, which a fixed-size struct cannot accommodate, so the buckets live in separately allocated memory.

The insert flow is as follows. On entry there is a check for whether this is the first insert or an expansion. On the first insert the default capacity is 1 << 2, that is 1 shifted left by 2 bits, which is 4; internally a minus 1 gives the mask, 3. If the new entry still fits under the load limit, which defaults to 3/4 of the capacity, the existing space is used; otherwise the capacity is doubled. A hash conflict is resolved inside a do-while loop that keeps probing for a free slot.

On expansion the system does not copy the old entries; it calls an emptying operation that releases the old memory directly. The reason is that memory cannot be resized once it has been allocated, so growth can only be a pseudo-add: discard the old block and allocate a new one. The old methods are not inserted again, because copying and moving an array is expensive, while simply replacing old with new saves memory and keeps lookups simple.
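To make the insert flow easier to follow, here is a toy simulation. It is a sketch under stated assumptions, not the runtime's code: the names Bucket and CacheSim are made up, selectors are plain integers, the hash is just sel & mask, probing moves forward, and one slot is reserved for an end marker when checking the 3/4 limit (an assumption that matches the boundary bucket discussed in this post).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bucket: a selector and an implementation, both as plain integers.
struct Bucket {
    uintptr_t sel;
    uintptr_t imp;
};

struct CacheSim {
    std::vector<Bucket> buckets;   // empty until the first insert
    uint32_t occupied = 0;

    uint32_t capacity() const { return static_cast<uint32_t>(buckets.size()); }
    uint32_t mask() const { return capacity() ? capacity() - 1 : 0; }

    // Expansion allocates a fresh table; old entries are dropped, not copied.
    void reallocate(uint32_t newCapacity) {
        buckets.assign(newCapacity, Bucket{0, 0});
        occupied = 0;
    }

    void insert(uintptr_t sel, uintptr_t imp) {
        uint32_t newOccupied = occupied + 1;
        if (capacity() == 0) {
            reallocate(1u << 2);                 // first insert: capacity 4, mask 3
        } else if (newOccupied + 1 > capacity() * 3 / 4) {
            reallocate(capacity() * 2);          // over 3/4 (plus assumed end marker): double
        }
        uint32_t m = mask();
        uint32_t begin = static_cast<uint32_t>(sel & m);
        uint32_t i = begin;
        do {
            if (buckets[i].sel == 0) {           // free slot: store here
                buckets[i] = Bucket{sel, imp};
                ++occupied;
                return;
            }
            if (buckets[i].sel == sel) return;   // already cached
            i = (i + 1) & m;                     // conflict: try the next slot
        } while (i != begin);
    }
};
```

With these assumptions, inserting three different selectors into an empty CacheSim leaves capacity 8, mask 7, and occupied 1: the third insert crosses the limit, the table is doubled, and the first two entries are discarded.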

Boy, very clear ~~!

Supplement

The earlier analysis did not explain _bucketsAndMaybeMask. To fill that in, we print its value and the address of the first element of the buckets array. As the output shows, _bucketsAndMaybeMask holds the starting address of buckets.

Expansion at 3/4: this is a load factor. Expanding at 0.75 keeps space utilization high while avoiding hash conflicts to a certain extent. Too many hash conflicts would slow down every lookup and insert.

Question:

In LLDB, only one method was called, so why is the value of _maybeMask 7?

Solution

Printing inside every insert call shows that respondsToSelector: and class were inserted before our own method, but counting inserts alone suggests expansion should not happen until after the third insert, so this printout still does not explain the result. Going one step further, print the sel, imp, and address of every bucket before inserting: the imp in the fourth bucket is the same address as the first bucket. In LLDB, the first p command calls the class method; calling it triggers allocateBuckets and then set. The last bucket stores a boundary marker, so by the time our method comes in the four slots are already accounted for and the cache must expand. That is why _maybeMask prints as 7.

Supplementary analysis of bucket structure

As the buckets() method shows, the value of _bucketsAndMaybeMask is first read with load, and then & with the buckets mask recovers the original address of the buckets array (each bucket occupies 16 bytes on 64-bit).

"High 16" means that the mask is stored in the high 16 bits of _bucketsAndMaybeMask, with the buckets address in the remaining low 48 bits. This layout is used on 64-bit arm64 devices (iOS); on x86_64 macOS the mask is stored separately in _maybeMask. It is not a matter of big-endian versus little-endian.

First, _bucketsAndMaybeMask gives the starting address of the buckets memory; every other bucket is reached by offsetting from that address.
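The packing can be tried out in isolation. The following is a minimal sketch that assumes the high-16 layout (mask in the top 16 bits, buckets address in the low 48 bits) and a 16-byte bucket; the constants and function names (kMaskShift, pack, buckets_of, mask_of, bucket_address) are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: [ mask : 16 bits | buckets address : 48 bits ]
constexpr unsigned kMaskShift = 48;
constexpr uint64_t kBucketsMask = (uint64_t{1} << kMaskShift) - 1;
constexpr uint64_t kBucketSize = 16;   // sel + imp, 8 bytes each on 64-bit

// Combine a buckets address and a mask into one 64-bit word.
inline uint64_t pack(uint64_t bucketsAddr, uint16_t mask) {
    return (uint64_t{mask} << kMaskShift) | (bucketsAddr & kBucketsMask);
}

// "& mask" step: keep the low 48 bits to recover the buckets address.
inline uint64_t buckets_of(uint64_t packed) {
    return packed & kBucketsMask;
}

// Shift step: the high 16 bits are the mask.
inline uint16_t mask_of(uint64_t packed) {
    return static_cast<uint16_t>(packed >> kMaskShift);
}

// Every other bucket is reached by offsetting from the first address.
inline uint64_t bucket_address(uint64_t packed, uint32_t index) {
    return buckets_of(packed) + uint64_t{index} * kBucketSize;
}
```

For example, packing the address 0x100004000 with mask 7 and then calling bucket_address with index 2 gives 0x100004020, which is the same arithmetic LLDB performs when indexing buckets()[2].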

About insert

Question

So far we know the whole process inside the insert method, but we still do not know when insert is called or what happens before it.

Approach

Looking around cache_t::insert, the objc-cache source already has a comment about this: cache_getImp reads the cache before insert writes to it, and before the read comes the message-sending mechanism. So let's look at objc_msgSend.

Hash conflict handling

An insert can hit a hash conflict; the 3/4 rule only reduces the chance. When a conflict happens, the system calls cache_next to get the next index, which moves forward or backward by one slot depending on the architecture. In the backward case, after index 0 it wraps around to mask, which is the last slot. If the probe goes all the way around and still finds no free slot (the outer layer is a do-while loop), bad_cache is reported.

As a result, a real device and the simulator insert in a different order.
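The two probing directions are easy to compare side by side. This sketch assumes the forward step is (i + 1) & mask and the backward step wraps from 0 to mask, as described above; the function names next_forward, next_backward, and probe_order are hypothetical helpers written for this post.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Forward probing: step to the next slot, wrapping with the mask.
inline uint32_t next_forward(uint32_t i, uint32_t mask) {
    return (i + 1) & mask;
}

// Backward probing: step to the previous slot; after 0, wrap to mask (the last slot).
inline uint32_t next_backward(uint32_t i, uint32_t mask) {
    return i ? i - 1 : mask;
}

// Collect the order in which slots are visited, starting at begin,
// until the probe comes back around to where it started.
inline std::vector<uint32_t> probe_order(uint32_t begin, uint32_t mask, bool backward) {
    std::vector<uint32_t> order;
    uint32_t i = begin;
    do {
        order.push_back(i);
        i = backward ? next_backward(i, mask) : next_forward(i, mask);
    } while (i != begin);
    return order;
}
```

With mask 3 and a starting index of 2, the forward order is 2, 3, 0, 1 and the backward order is 2, 1, 0, 3. Both visit every slot exactly once, so coming back to the start without finding a free slot means the table is in a bad state.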

About objc_msgSend

Supplement

Three ways to use the runtime

1. Our own object methods, such as -[Person say]

2. Methods provided by NSObject, such as -isKindOfClass:

3. Runtime APIs, such as class_getInstanceSize

Message sending mechanism

In Build Settings, set the strict checking option for objc_msgSend calls (Enable Strict Checking of objc_msgSend Calls) to NO, then add the required #import.

The call then has the form objc_msgSend(receiver, sel).

Understanding message sending through the underlying assembly

Open the objc source and search globally for objc_msgSend. Find objc-msg-arm64.s and go to ENTRY _objc_msgSend.

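1. cmp p0, #0: compare p0 with 0, where p0 holds the address of the message receiver. Comparing with #0 checks whether a receiver exists at all.

2. With SUPPORT_TAGGED_POINTERS, a nil or tagged-pointer receiver branches to LNilOrTagged; otherwise a nil receiver branches to LReturnZero, because there is no message receiver.

3. If the comparison with #0 shows there is a receiver address, execution continues downward.

4. ldr p13, [x0]: x0 is the register holding the receiver's address, and the value loaded into p13 is its isa.

5. GetClassFromIsa_p16: use isa to find the corresponding class, which ends up in p16.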

The steps above obtain the class from the receiver, because the class is what contains cache.

What happens after the class is obtained?

After getting the class, execution enters CacheLookup. Set the arguments aside for now, find where the macro is defined, and start reading from there.

ldr p11, [x16, #CACHE]: CACHE is 2 * sizeof(pointer), which is 16, so the class address offset by 16 bytes gives cache_t, and p11 now holds mask and buckets together. Then p10 = p11 & mask: printing the #0x constant used here shows that it covers the low 48 bits, so this step extracts buckets. On 64-bit, mask and buckets are stored in the same word, and the way they are read differs by architecture.

tbnz p11, #0, LLookupPreopt: if bit 0 of p11 is not 0, jump to LLookupPreopt.

eor p12, p1, p1, LSR #7: XOR p1 with p1 shifted right by 7 bits and store the result in p12. This is where the lookup starts.

Summary in words (the assembly is hard to follow...)

After obtaining the class, offsetting its address gives the cache. The cache holds buckets and mask together in _bucketsAndMaybeMask, so applying the buckets mask yields buckets and applying the mask shift yields mask. The same computation as cache_hash in insert, (mask_t)(value & mask), gives the index of the first probe, and buckets + index locates the corresponding bucket. If the sel in that bucket equals the _cmd passed in, CacheHit runs: imp ^ class decodes the real imp, which is then called. If it does not match, the index is moved (decremented) and the comparison repeats. The whole lookup runs in a do-while loop and never resolves hash conflicts itself; that is done at insert time. If nothing is found, __objc_msgSend_uncached is called.
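The lookup summarized above can be written out as ordinary code. This is only a sketch of the idea, not a transcription of the assembly: Bucket, hash_sel, store, and lookup are made-up names, selectors, classes, and imps are plain integers, and the imp is assumed to be stored XORed with the class value.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bucket: the imp is kept encoded as imp ^ class.
struct Bucket {
    uintptr_t sel;
    uintptr_t encodedImp;
};

// Index of the first probe: (sel ^ (sel >> 7)) & mask, as in the eor/and steps.
inline uint32_t hash_sel(uintptr_t sel, uint32_t mask) {
    return static_cast<uint32_t>((sel ^ (sel >> 7)) & mask);
}

// Helper for the example: place an entry, probing backward on conflict.
// Assumes the table size is a power of two and has at least one free slot.
inline void store(std::vector<Bucket> &buckets, uintptr_t cls, uintptr_t sel, uintptr_t imp) {
    uint32_t mask = static_cast<uint32_t>(buckets.size() - 1);
    uint32_t i = hash_sel(sel, mask);
    while (buckets[i].sel != 0) i = i ? i - 1 : mask;
    buckets[i] = Bucket{sel, imp ^ cls};
}

// Returns the decoded imp on a hit, or 0 on a miss
// (a miss is where the uncached slow path would take over).
inline uintptr_t lookup(const std::vector<Bucket> &buckets, uintptr_t cls, uintptr_t sel) {
    uint32_t mask = static_cast<uint32_t>(buckets.size() - 1);
    uint32_t begin = hash_sel(sel, mask);
    uint32_t i = begin;
    do {
        const Bucket &b = buckets[i];
        if (b.sel == sel) return b.encodedImp ^ cls;   // hit: decode with the class
        if (b.sel == 0) return 0;                      // empty slot: not cached
        i = i ? i - 1 : mask;                          // move to the previous slot
    } while (i != begin);
    return 0;
}
```

Storing two selectors that hash to the same slot and then looking both up shows the backward probe at work: the second one is found one slot before the first, and its imp still decodes correctly.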

Afterword.

The low-level method cache in Objective-C on iOS involves a lot of engineering, from the algorithm to storing and reading entries, including memory optimization. I have probably left out a good deal of the processing logic, but the main flow should be as described here.

To all iOS developers: the revolution is not yet successful, comrades still need to work hard!!