We all know that the isa pointer points to a class object. What else does it store?

Union & bitfield

Before we look at isa, we need to look at unions and bit fields, because that is what isa's data structure is made of. Many of the runtime's data structures are based on unions.

Union

All members of a union share the same block of memory, so modifying one member affects all the others. To explain what a union is, it helps to compare it with a struct, because the two are declared in exactly the same form.

The difference is that each member of a struct occupies its own memory and the members do not affect one another, whereas all members of a union occupy the same memory block, so modifying one member changes the value seen through every other member.

#import <Foundation/Foundation.h>

union test_a {
    long a;
    int b;
    char c[9];
};

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        union test_a test = {0};
        NSLog(@"\ntest_a size is %zu", sizeof(test));
        test.a = 97;
        NSLog(@"\na is : %ld\nb is %d \nc is %s", test.a, test.b, test.c);
        test.b = 122;
        NSLog(@"\na is : %ld\nb is %d \nc is %s", test.a, test.b, test.c);
    }
    return 0;
}

// output:
2021-09-02 22:30:33.201485+0800 objc[73997:948863]
test_a size is 16
2021-09-02 22:30:34.658421+0800 objc[73997:948863]
a is : 97
b is 97
c is a
2021-09-02 22:30:34.658615+0800 objc[73997:948862]
a is : 122
b is 122
c is z
Program ended with exit code: 0

The code above defines a union test_a with three members. The largest member, c, occupies 9 bytes, and the size of a union must be a multiple of its strictest member alignment (8 bytes for long on a 64-bit platform), so test_a occupies 16 bytes. After 97 is assigned to a, the values read through a, b, and c all change: a and b are 97, and c prints as "a", the ASCII character with code 97. Assigning 122 to b changes all three again, and c prints as "z".

Endianness

A union can be used to check the byte order (endianness) of the current system. The code is as follows:

union data {
    int a;   // 4 bytes
    char b;  // 1 byte
} data;

data.a = 1;  // a occupies 4 bytes with the value 0x00 00 00 01
// b shares its address with the lowest-addressed byte of a.
// If that byte holds 1, the low-order byte is stored at the low address,
// which means the system is little-endian; otherwise it is big-endian.
if (1 == data.b) {
    printf("Little_Endian\n");
} else {
    printf("Big_Endian\n");
}

iOS is little-endian.

Bit field

Some information does not need a full byte and only takes one or a few binary bits. For example, a switch variable has only the two states 0 and 1, so a single bit is enough to store it. To save storage space, C provides a data structure called a "bit field" (or "bit segment"). A bit field divides the bits of a storage unit into several regions and specifies how many bits each region occupies. Each region has its own name, and the program accesses it by that name. This allows several different values to be packed into the bits of a single byte.

The definition is as follows:

struct bit_field_struct_name {
    type_specifier bit_field_name_1 : bit_field_length_1;  // lowest bits
    type_specifier bit_field_name_2 : bit_field_length_2;
    type_specifier bit_field_name_3 : bit_field_length_3;
    ......
    type_specifier bit_field_name_n : bit_field_length_n;  // highest bits
};

On iOS and macOS (Clang, little-endian), the first member of the bit field occupies the lowest bits of the storage unit and the last member occupies the highest bits.

ISA

You can find the following code in the objc4 source code:

struct objc_object {
private:
    isa_t isa;
    ......
};

The objc_object structure represents Objective-C objects, and isa_t stores the isa information.

isa_t

isa_t is a union.

The isa_t data structure is as follows:

union isa_t {
    isa_t() { }
    isa_t(uintptr_t value) : bits(value) { }

    uintptr_t bits;

private:
    // Accessing the class requires custom ptrauth operations, so
    // force clients to go through setClass/getClass by making this
    // private.
    Class cls;

public:
#if defined(ISA_BITFIELD)
    struct {
        ISA_BITFIELD;  // defined in isa.h
    };
    ......
#endif
};

Since isa_t is a union, bits, cls, and the struct all share the same memory. When the isa is not nonpointer, the isa data is stored in cls; otherwise it is stored in bits.

The struct expands as follows (shown here for arm64 on a real device):

struct {
    uintptr_t nonpointer        : 1;                           
    uintptr_t has_assoc         : 1;                           
    uintptr_t has_cxx_dtor      : 1;                           
    uintptr_t shiftcls          : 33; /*MACH_VM_MAX_ADDRESS 0x1000000000*/ 
    uintptr_t magic             : 6;                           
    uintptr_t weakly_referenced : 1;                           
    uintptr_t unused            : 1;                           
    uintptr_t has_sidetable_rc  : 1;                           
    uintptr_t extra_rc          : 19;
};

This structure is a bit field.

  • nonpointer: whether isa pointer optimization is enabled. 0 means isa is a plain pointer that holds only the class address; 1 means isa is optimized and stores additional information in its bits;
  • has_assoc: whether the object has associated objects. If it does, deallocation is slower;
  • has_cxx_dtor: whether the object has a C++ destructor. If it does, deallocation is slower;
  • shiftcls: the memory address of the class or metaclass;
  • magic: used during debugging to tell whether the object has been initialized;
  • weakly_referenced: whether the object has been weakly referenced. If it has, deallocation is slower;
  • unused: not used;
  • has_sidetable_rc: whether the reference count is too large to fit in isa, in which case part of it is stored in the side table;
  • extra_rc: the reference count;

shiftcls

The illustration above shows the layout of isa_t in memory. To extract shiftcls from it, note that the 28 bits above shiftcls store other information, the 3 bits below it store other information, and the 33 bits in the middle are shiftcls.

  1. Shift right by 3 bits: this discards the lowest 3 bits and fills the highest 3 bits with 0;
  2. Shift left by 31 bits: this discards the highest 31 bits and fills the lowest 31 bits with 0;
  3. Finally, shift right by 28 bits: this discards the lowest 28 bits and fills the highest 28 bits with 0, so shiftcls is back in its original position.

Note: in the illustration, the blocks with a gray background are the data removed by the shift operations, and the blocks with a white background are the bits filled with 0.

Here is the LLDB debugging process. The final values of $3 and $4 are exactly the same.

ISA_MASK

The definition of ISA_MASK

The definition of ISA_MASK can be found in isa.h in the objc4 source code:

# if __arm64__
#   if __has_feature(ptrauth_calls) || TARGET_OS_SIMULATOR  /* pointer authentication or simulator */
#     define ISA_MASK        0x007ffffffffffff8ULL
      ......
#   else  /* real device, or Mac with M1 chip */
#     define ISA_MASK        0x0000000ffffffff8ULL
      ......
#   endif
# elif __x86_64__  /* Mac with Intel chip */
#   define ISA_MASK        0x00007ffffffffff8ULL
    ......
# endif

How is ISA_MASK calculated?

As the layout of isa_t above shows, shiftcls can also be extracted with a bitwise AND. The value to AND with has 28 zeros in the high bits, 33 ones in the middle, and 3 zeros in the low bits. In binary that is 0000000000000000000000000000111111111111111111111111111111111000, which is 0xffffffff8 in hexadecimal. Padded with leading zeros to 8 bytes it becomes 0x0000000ffffffff8, exactly the ISA_MASK value 0x0000000ffffffff8ULL, where the ULL suffix marks the literal as unsigned long long. This calculation is for arm64 on a real device; you can try the other platforms yourself.

The use of ISA_MASK

isa is stored in the first 8 bytes of an object, so taking the first 8 bytes of the object and ANDing them with ISA_MASK gives the memory address of the class object.

  1. x/4gx p: view the data stored in memory at the current pointer p;
  2. po 0x01000001000081b9 & 0x0000000ffffffff8ULL: take the first 8 bytes the object stores in memory and AND them with ISA_MASK, which yields the Person class;
  3. p/x 0x01000001000081b9 & 0x0000000ffffffff8ULL: view the result of the AND in hexadecimal, which is the memory address of the Person class;
  4. View the real memory address of the Person class;

As you can see, a nonpointer isa can store more data about the object, but an extra operation is needed to get the memory address of the class object from it. An isa that is not nonpointer points directly to the memory address of the class object.

Adding OBJC_DISABLE_NONPOINTER_ISA with the value YES under Environment Variables turns nonpointer off.

ISA Binding Process

The isa binding is done in _class_createInstanceFromZone after the memory has been allocated, through the following calls:

  1. obj->initInstanceIsa(cls, hasCxxDtor);
  2. initIsa(cls, true, hasCxxDtor);
  3. objc_object::initIsa(Class cls, bool nonpointer, UNUSED_WITHOUT_INDEXED_ISA_AND_DTOR_BIT bool hasCxxDtor)

The first two are ordinary calls with some special handling, so we mainly look at the third (some code in the middle is omitted):

inline void
objc_object::initIsa(Class cls, bool nonpointer, UNUSED_WITHOUT_INDEXED_ISA_AND_DTOR_BIT bool hasCxxDtor)
{
    ASSERT(!isTaggedPointer());

    isa_t newisa(0);  // create the isa struct

    if (!nonpointer) {
        // If it is not nonpointer, assign cls directly
        newisa.setClass(cls, this);
    } else {
        ASSERT(!DisableNonpointerIsa);
        ASSERT(!cls->instancesRequireRawIsa());

#if SUPPORT_INDEXED_ISA
        ......
#else
        newisa.bits = ISA_MAGIC_VALUE;
#   if ISA_HAS_CXX_DTOR_BIT
        newisa.has_cxx_dtor = hasCxxDtor;
#   endif
        // assign cls
        newisa.setClass(cls, this);
#endif
        // retain count
        newisa.extra_rc = 1;
    }

    isa = newisa;
}

If the isa is not nonpointer, cls is assigned directly. Otherwise, bits is first set to ISA_MAGIC_VALUE, then has_cxx_dtor is assigned, and after that cls is assigned:

isa_t::setClass(Class newCls, UNUSED_WITHOUT_PTRAUTH objc_object *obj)
{
    ......
    shiftcls = (uintptr_t)newCls >> 3;
    ......
}

The isa binding is completed by shifting the class's pointer address right by three bits and storing it in shiftcls.

class

Introduction to classes

A class can itself be regarded as an object; in essence it is a structure that inherits from objc_object. Only one copy of a class object exists in memory. It stores the properties, methods, protocols, and other data of the class's instances. The isa of an instance points to its class object, so the instance methods stored in the class object can be called directly. Each instance does not need to store its own copy of the methods, which reduces memory overhead. Here is the data stored by class objects:

struct objc_class : objc_object {
    ......
    // Class ISA;
    Class superclass;
    cache_t cache;             // formerly cache pointer and vtable
    class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags
    ......
};

  1. Like instances, a class object has an isa pointer, which points to the metaclass;
  2. superclass points to the parent class;
  3. cache is the method cache, which is a hash table;
  4. bits stores class_rw_t. rw stands for read-write: it can be read and written, and the structure stores the property list property_array_t, the method list method_array_t, and the protocol list protocol_array_t. There is also class_ro_t, which stores the class's original data; ro stands for read-only.

LLDB debugging explores class storage information

Get class_rw_t

Here is how the objc_class structure gets class_rw_t

class_rw_t *data() const {
    return bits.data();
}

class_rw_t can be obtained from data(). LLDB cannot directly cast a class to (objc_class *), so the data() method of objc_class cannot be called directly. Instead, bits can be reached by offsetting the class address: in the structure above, isa, superclass, and cache come before bits and together occupy 32 bytes (8 + 8 + 16 on a 64-bit platform), so bits is located 32 bytes after the start of the class object.

entsize_list_tt & list_array_tt

Before looking at property_array_t, method_array_t, and protocol_array_t, look at entsize_list_tt and list_array_tt.

list_array_tt

template <typename Element, typename List, template<typename> class Ptr>
class list_array_tt {
    ......
    union {
        Ptr<List> list;
        uintptr_t arrayAndFlag;
    };
    ......
};

property_array_t, method_array_t, and protocol_array_t all inherit from list_array_tt. A list_array_tt value has three possible states:

  • empty
  • A pointer points to a single list
  • An array of pointers, each pointing to a list

In particular, pay attention to the void attachLists(List* const * addedLists, uint32_t addedCount) method, which helps in understanding how classes and categories are loaded.

entsize_list_tt

entsize_list_tt is, simply put, a one-dimensional list that stores data such as the properties generated when the class is compiled. property_list_t and method_list_t, the one-dimensional arrays held by property_array_t and method_array_t, both inherit from entsize_list_tt. Its get(uint32_t i) method fetches an element by index. (protocol_list_t does not inherit from entsize_list_tt.)

Explore the storage of properties in a class with class_rw_t: property_array_t

Data structure of property_array_t

class property_array_t : 
    public list_array_tt<property_t, property_list_t, RawPtr>
{
    typedef list_array_tt<property_t, property_list_t, RawPtr> Super;
 public:
    property_array_t() : Super() { }
    property_array_t(property_list_t *l) : Super(l) { }
};

property_array_t inherits from list_array_tt; the stored element type is property_t, and the one-dimensional array type is property_list_t.

The property_t data structure is as follows:

struct property_t {
    const char *name;
    const char *attributes;
};

Debugging steps

  1. Get property_array_t through class_rw_t. class_rw_t has a properties() method that returns property_array_t directly;
  2. Get property_list_t through property_array_t. property_array_t holds a Ptr pointer that stores property_list_t; dereferencing it gives the one-dimensional array property_list_t;
  3. Get property_t through the get() method of property_list_t;

The Person class has a name property.

Explore the protocol list through class_rw_t: protocol_array_t

Data structure of protocol_array_t

class protocol_array_t : 
    public list_array_tt<protocol_ref_t, protocol_list_t, RawPtr>
{
    typedef list_array_tt<protocol_ref_t, protocol_list_t, RawPtr> Super;
 public:
    protocol_array_t() : Super() { }
    protocol_array_t(protocol_list_t *l) : Super(l) { }
};

protocol_array_t inherits from list_array_tt; the stored element type is protocol_ref_t, and the one-dimensional array type is protocol_list_t.

Debugging steps

  1. Get protocol_array_t through class_rw_t. class_rw_t has a protocols() method that returns protocol_array_t directly;
  2. Get protocol_list_t through protocol_array_t. protocol_array_t holds a Ptr pointer that stores protocol_list_t; dereferencing it gives the one-dimensional array protocol_list_t;
  3. Get protocol_ref_t through protocol_list_t. Unlike the property and method lists, the elements of protocol_list_t cannot be fetched with get(). Instead, use the begin() method of the protocol_list_t structure to reach the list array that stores the protocol_ref_t values;
  4. Get protocol_t through protocol_ref_t. protocol_ref_t is actually a pointer to protocol_t, so a cast is all that is needed; dereferencing the pointer shows the protocol's data;

The KKPerson class is defined as follows:

Get the first protocol, NSCopying. The debugging method is as follows:

Get the second protocol, KKPersonInterface. The debugging method is as follows:

Data structure of protocol_t

struct protocol_t : objc_object {
    const char *mangledName;
    struct protocol_list_t *protocols;
    method_list_t *instanceMethods;
    method_list_t *classMethods;
    method_list_t *optionalInstanceMethods;
    method_list_t *optionalClassMethods;
    property_list_t *instanceProperties;
    uint32_t size;   // sizeof(protocol_t)
    uint32_t flags;
    // Fields below this point are not always present on disk.
    const char **_extendedMethodTypes;
    const char *_demangledName;
    property_list_t *_classProperties;
    ......
};

It stores the protocol's name, the protocols it conforms to (inherits from), instance methods, class methods, optional instance methods, optional class methods, and properties. To explore the methods, see method_array_t below; to explore the properties, see the property exploration above.

Explore the storage of methods in classes with class_rw_t: method_array_t

Data structure of method_array_t

class method_array_t : 
    public list_array_tt<method_t, method_list_t, method_list_t_authed_ptr>
{
    typedef list_array_tt<method_t, method_list_t, method_list_t_authed_ptr> Super;
 public:
    method_array_t() : Super() { }
    method_array_t(method_list_t *l) : Super(l) { }
    const method_list_t_authed_ptr<method_list_t> *beginCategoryMethodLists() const {
        return beginLists();
    }
    const method_list_t_authed_ptr<method_list_t> *endCategoryMethodLists(Class cls) const;
};

method_array_t inherits from list_array_tt; the stored element type is method_t, and the one-dimensional array type is method_list_t.

Debugging steps

  1. Get method_array_t through class_rw_t. class_rw_t has a methods() method that returns method_array_t directly;
  2. Get method_list_t through method_array_t. method_array_t holds a Ptr pointer that stores method_list_t; dereferencing it gives the one-dimensional array method_list_t;
  3. Get method_t through the get() method of method_list_t;
  4. Get the SEL through method_t. The data structure of method_t differs from that of property_t, so its internal data cannot simply be printed directly; method_t has a name() method that returns the SEL directly.

Debugging shows that the class stores instance methods but no class methods. Class methods are stored in the metaclass.

Metaclass

The isa of a class points to its metaclass, which can be inferred from the fact that class methods are stored in the metaclass. Metaclasses do not appear in our own code, but they can be seen in the build artifacts. The metaclass can also be obtained through object_getClass; its class name is also KKPerson, but the memory addresses of the class and the metaclass are different.

Class inheritance relationships

Class inheritance is straightforward: a subclass inherits from its parent class, which inherits from its own parent class, all the way up to NSObject, whose parent class is nil.

The inheritance of metaclasses

A subclass's metaclass inherits from its parent class's metaclass, which in turn inherits from its own parent's metaclass, all the way up to the NSObject metaclass (the root metaclass). The superclass of the root metaclass is NSObject itself, not nil.

isa pointing and superclass inheritance

The following is Apple's official diagram of isa pointing and superclass inheritance. Solid lines are superclass (inheritance) relationships and dashed lines are isa relationships; both include metaclasses.

isa pointing

  1. The isa of an instance points to its class;
  2. The isa of a class points to its metaclass;
  3. The isa of a metaclass points to the root metaclass, that is, the metaclass of NSObject;
  4. The isa of the root metaclass points to itself;

Class inheritance

  1. A subclass inherits from its parent class, all the way up to the root class, whose superclass is nil;
  2. A subclass's metaclass inherits from its parent class's metaclass, all the way up to the root metaclass, and the root metaclass inherits from NSObject.

References

  • C language unions (Union)
  • C language union
  • [Embedded notes] Big-endian and little-endian explained (including code and detailed comments)