Object memory structure

An OC (Objective-C) object pointer is really a pointer to a structure: when an OC class is translated into C++ code, each object turns out to be a C structure. Take the following simple class as an example:

@interface Person : NSObject
@property (nonatomic,assign) NSUInteger age;
@property (nonatomic,copy) NSString *name;
-(void)say;
@end

@implementation Person
-(void)say{
    NSLog(@"person say");
}
@end

Translating this class into C++ with clang (for example with clang -rewrite-objc Person.m) shows that Person corresponds to the following structure (shown here with the inherited NSObject_IMPL flattened into its isa member):

struct Person_IMPL {
	Class isa;
	NSUInteger _age;
	NSString * _Nonnull _name;
};


So the Person class corresponds to struct Person_IMPL. Creating an object really means allocating, on the heap, a block of memory large enough for the class's structure and returning a pointer to that block. This pointer is the object pointer we work with every day (calling object methods, changing the object's properties). OC simply wraps the structure pointer as a pointer of an OC type (Person *).

To confirm that an OC class really is a C structure, we can force-cast the OC object pointer to a C structure pointer.

Run the following code:

Person *p = [[Person alloc] init];
p.age = 15;
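p.name = @"Mike";
// struct Person_IMPL must be declared in this file (copied from the clang output above)
struct Person_IMPL *sp = (__bridge struct Person_IMPL *)(p);
NSLog(@"The struct Person_IMPL value accessed after the OC object type pointer is converted to the struct pointer: _age = %lu, _name = %@", (unsigned long)sp->_age, sp->_name);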

Print result:

The struct Person_IMPL value accessed after the OC object type pointer is converted to the struct pointer: _age = 15, _name = Mike

The output matches the values we assigned, as expected.

Since the underlying data structure of an OC object is a C structure, accessing an object's properties or members works exactly like accessing the member variables of a C structure: the data in memory is reached by offsetting a pointer. In short, memberAddress = objectPointer + offset, and the property value is the data stored at that address. The relationship between object memory and its pointer is shown below:

The following code verifies that an OC object's memory can be accessed through pointer arithmetic:

Person *p = [[Person alloc] init];
p.age = 15;
p.name = @"Mike";
struct Person_IMPL *sp = (__bridge struct Person_IMPL *)(p);
// Offset the pointer to get the addresses of the members inside the object's structure
// (isa occupies bytes 0-7, so _age starts at +8 and _name at +16)
long long ageAddress = (long long)((char *)sp + 8);
long long nameAddress = (long long)((char *)sp + 16);
// Set a breakpoint here and inspect the memory with LLDB
NSLog(@"===");

With a breakpoint on the NSLog line, offset the p (or sp) pointer by 8 bytes towards higher addresses and print that memory with LLDB: it holds the value of the member variable _age, as shown in the figure.

Offsetting the p (or sp) pointer by 16 bytes gives the member variable _name.
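The pointer-offset rule can also be tried outside the Objective-C runtime. The sketch below is a plain C model, not runtime code: struct PersonLayout is a hypothetical mirror of Person_IMPL assuming a 64-bit layout, with isa modeled as void * and _name as a C string instead of NSString *. Each member is read by adding its byte offset to the object pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C mirror of Person_IMPL (64-bit layout assumed). */
struct PersonLayout {
    void *isa;          /* stands in for Class isa */
    uint64_t age;       /* stands in for NSUInteger _age */
    const char *name;   /* stands in for NSString *_name */
};

/* Read _age: start from the object pointer and add the member's byte offset. */
static uint64_t read_age(const void *object) {
    const char *base = (const char *)object;
    uint64_t value;
    memcpy(&value, base + offsetof(struct PersonLayout, age), sizeof value);
    return value;
}

/* Read _name the same way, one 8-byte slot further on a 64-bit platform. */
static const char *read_name(const void *object) {
    const char *base = (const char *)object;
    const char *value;
    memcpy(&value, base + offsetof(struct PersonLayout, name), sizeof value);
    return value;
}
```

On a typical 64-bit platform offsetof reports 8 for age and 16 for name, the same +8 and +16 used in the LLDB experiment. memcpy is used instead of casting the address to a typed pointer, which keeps the sketch free of strict-aliasing concerns.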

Object creation process

Studying how a class is instantiated requires reading Apple's open-source objc runtime; I used objc4-750 for this analysis.

Objects are created by calling +alloc, which goes through the following two functions (simplified):

static ALWAYS_INLINE id
callAlloc(Class cls, bool checkNil, bool allocWithZone=false)
{
    // This ends up calling _class_createInstanceFromZone below
    id obj = class_createInstance(cls, 0);
    return obj;
}

static __attribute__((always_inline)) id
_class_createInstanceFromZone(Class cls, size_t extraBytes, void *zone,
                              bool cxxConstruct = true,
                              size_t *outAllocatedSize = nil)
{
    bool hasCxxDtor = cls->hasCxxDtor();

    // Instance size: a multiple of 8 bytes and at least 16 bytes
    size_t size = cls->instanceSize(extraBytes);
    if (outAllocatedSize) *outAllocatedSize = size;

    id obj;
    obj = (id)calloc(1, size);
    if (!obj) return nil;
    // Initialize the isa pointer
    obj->initInstanceIsa(cls, hasCxxDtor);

    return obj;
}

The function above does three things:

  1. Computes the instance size (at least 16 bytes, rounded up so that the low 3 bits of the binary value are 0, i.e. a multiple of 8) and allocates that much memory with calloc.
  2. Casts the allocated memory pointer to id, which is a struct objc_object * pointer. This also reflects that NSObject * corresponds to struct objc_object * (analyzed below).
  3. Initializes the isa pointer and returns the pointer to the caller as the object pointer.
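The sizing rule in the first step can be sketched in plain C. The helper names are modeled on objc4's word_align and instanceSize, but the bodies are an illustrative re-implementation assuming a 64-bit build (8-byte words), not copied runtime source.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 64-bit word: WORD_MASK = 7, so sizes round up to a multiple of 8. */
#define WORD_MASK ((size_t)7)

/* Round x up so that its low 3 bits are 0. */
static size_t word_align(size_t x) {
    return (x + WORD_MASK) & ~WORD_MASK;
}

/* Aligned ivar size plus extra bytes, with a 16-byte minimum per object. */
static size_t instance_size(size_t unalignedIvarSize, size_t extraBytes) {
    size_t size = word_align(unalignedIvarSize) + extraBytes;
    if (size < 16) size = 16;
    return size;
}
```

For Person (isa + _age + _name = 24 bytes) this gives 24; for a bare NSObject, whose only ivar is the 8-byte isa, the 16-byte minimum applies.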

Object definition analysis

Before analyzing how isa is initialized, it helps to see which structures in the objc source correspond to OC object pointers (NSObject *, id).

// NSObject
@interface NSObject <NSObject> {
    Class isa OBJC_ISA_AVAILABILITY;
}

// struct objc_object
struct objc_object {
    Class _Nonnull isa OBJC_ISA_AVAILABILITY;
};

// struct objc_class
struct objc_class : objc_object {
    // Class ISA;
    Class superclass;
    cache_t cache;             // formerly cache pointer and vtable
    class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags
};

typedef struct objc_object *id;
typedef struct objc_class *Class;

An NSObject * or id pointer is a struct objc_object pointer, and the Class type is a struct objc_class pointer. You can also see that struct objc_class (the type behind Class) inherits from struct objc_object, which means a class is itself an object.
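C has no struct inheritance, but objc_class : objc_object can be mimicked by embedding the object struct as the first member. The model_* types below are hypothetical stand-ins, not runtime types; the sketch only shows that a class pointer can be read through an object pointer, which is what makes a class an object.

```c
#include <assert.h>
#include <stddef.h>

struct model_class;

/* Stand-in for struct objc_object: just an isa pointer. */
struct model_object {
    struct model_class *isa;
};

/* Stand-in for struct objc_class; the "inherited" isa comes first. */
struct model_class {
    struct model_object base;
    struct model_class *superclass;
    /* cache and bits omitted in this sketch */
};

/* Read the isa of anything that starts like an object. */
static struct model_class *isa_of(const struct model_object *obj) {
    return obj->isa;
}

/* View a class as an object; valid in C because base is the first member. */
static const struct model_object *as_object(const struct model_class *cls) {
    return (const struct model_object *)cls;
}
```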

In OC, objects, superclasses, classes and metaclasses are all linked by pointers. The superclass link is the superclass pointer of objc_class, and the class link is the isa pointer inherited from objc_object. On 64-bit platforms, however, isa is not a plain pointer: it uses bit-fields to store the address of the Class together with some additional information.

Isa pointer

The isa definition lives in objc-private.h and isa.h (the constants and bit widths below are the x86_64 ones):

# define ISA_MASK 0x00007ffffffffff8ULL
# define ISA_MAGIC_MASK 0x001f800000000001ULL
# define ISA_MAGIC_VALUE 0x001d800000000001ULL
# define RC_ONE (1ULL<<56)
# define RC_HALF (1ULL<<7)

union isa_t {
    isa_t() { }
    isa_t(uintptr_t value) : bits(value) { }

    Class cls;
    uintptr_t bits;

    struct {
        uintptr_t nonpointer        : 1;  // 0: raw isa pointer; 1: optimized isa using bit-fields
        uintptr_t has_assoc         : 1;  // whether the object has (or had) associated objects
        uintptr_t has_cxx_dtor      : 1;  // whether the object has a C++ / ARC destructor
        uintptr_t shiftcls          : 44; // memory address of the Class or MetaClass, shifted right by 3
        uintptr_t magic             : 6;  // lets the debugger tell real objects from uninitialized memory
        uintptr_t weakly_referenced : 1;  // whether a weak pointer refers (or referred) to the object
        uintptr_t deallocating      : 1;  // whether the object is being deallocated
        uintptr_t has_sidetable_rc  : 1;  // whether part of the reference count is stored in the global SideTable
        uintptr_t extra_rc          : 8;  // reference count storage (stored value = retain count - 1)
    };
};

isa_t is a union, so all of its members share the same 8 bytes (64 bits): the value can be read as cls, as bits, or through the bit-field struct.
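To see the shared storage in action, here is a plain C model of the x86_64 layout above. It is a sketch under stated assumptions: bit-fields declared as uintptr_t are a GCC/Clang extension, and these compilers allocate them from the least significant bit on little-endian targets. The union name isa_model and the helper magic_of are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of isa_t using the x86_64 bit widths. */
union isa_model {
    uintptr_t bits;
    void *cls;
    struct {
        uintptr_t nonpointer        : 1;
        uintptr_t has_assoc         : 1;
        uintptr_t has_cxx_dtor      : 1;
        uintptr_t shiftcls          : 44;
        uintptr_t magic             : 6;
        uintptr_t weakly_referenced : 1;
        uintptr_t deallocating      : 1;
        uintptr_t has_sidetable_rc  : 1;
        uintptr_t extra_rc          : 8;
    } f;
};

/* Write the raw 64-bit value, then read it back through the bit-field view. */
static unsigned magic_of(uintptr_t raw) {
    union isa_model isa;
    isa.bits = raw;
    return (unsigned)isa.f.magic;
}
```

Writing ISA_MAGIC_VALUE (0x001d800000000001) into bits and reading the bit-field view gives nonpointer = 1 and magic = 0x3b, with every other field 0.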

How struct objc_object initializes isa

inline void
objc_object::initIsa(Class cls, bool nonpointer, bool hasCxxDtor)
{
    assert(!isTaggedPointer());

    if (!nonpointer) {
        // Raw (non-optimized) isa: store the Class pointer directly
        isa.cls = cls;
    } else {
        // Optimized (nonpointer) isa, used by the objects we normally create
        isa_t newisa(0);
        // Sets nonpointer = 1 and the magic value in one go
        newisa.bits = ISA_MAGIC_VALUE;
        // Initialize has_cxx_dtor
        newisa.has_cxx_dtor = hasCxxDtor;
        // Store the Class address shifted right by 3 bits
        newisa.shiftcls = (uintptr_t)cls >> 3;
        // Update the objc_object isa
        isa = newisa;
    }
}

A Class object is 8-byte aligned, so the last three bits of its 64-bit address are always 0 and carry no information. initIsa therefore shifts the address right by 3 bits and stores the result in the 44-bit shiftcls field of isa. Printing Class addresses shows that they fit in 47 bits (44 + 3), so the full 64 bits are not needed. To get the Class back, the runtime masks isa's bits with ISA_MASK.
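The encode and decode steps can be reproduced with plain shifts and masks. This C sketch reuses the x86_64 constants quoted above; make_isa and class_from_isa are hypothetical helper names, and the class address in the usage example is made up (any 8-byte-aligned value below 2^47 behaves the same).

```c
#include <assert.h>
#include <stdint.h>

/* x86_64 constants from objc4-750's isa.h. */
#define ISA_MASK         0x00007ffffffffff8ULL
#define ISA_MAGIC_VALUE  0x001d800000000001ULL
#define HAS_CXX_DTOR_BIT (1ULL << 2)   /* has_cxx_dtor is bit 2 */

/* Mimics the nonpointer branch of initIsa using plain shifts. */
static uint64_t make_isa(uint64_t clsAddress, int hasCxxDtor) {
    uint64_t bits = ISA_MAGIC_VALUE;            /* nonpointer = 1, magic = 0x3b */
    if (hasCxxDtor) bits |= HAS_CXX_DTOR_BIT;
    uint64_t shiftcls = clsAddress >> 3;        /* drop the three always-zero bits */
    bits |= (shiftcls << 3) & ISA_MASK;         /* shiftcls field starts at bit 3 */
    return bits;
}

/* Recover the Class address: keep only bits 3..46. */
static uint64_t class_from_isa(uint64_t bits) {
    return bits & ISA_MASK;
}
```

Because the shift drops only zero bits, masking with ISA_MASK returns exactly the original address while the flag and magic bits stay in place.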

Relationships between objects, superclasses, classes, and metaclasses

When an object calls an instance method, the runtime first follows the isa pointer to the class object and looks the method up through the class's class_data_bits_t bits member. A class method called on a class object works the same way: isa leads to the MetaClass, and the class method is looked up through the MetaClass's class_data_bits_t bits. In both cases, if the method is not found in the current class, the search continues up the superclass chain until it is found. If it is still not found, the runtime tries dynamic method resolution and then message forwarding; if neither handles the message, an "unrecognized selector" error is thrown. The following picture shows the relationships between an instance object and its Class, MetaClass and SuperClass.

A closer look at the picture above reveals several points of interest:

  1. The superclass pointer of the root class (Root class) points to nil, and its isa points to the root metaclass (Root MetaClass).
  2. The isa pointer of the root metaclass points to itself, and its superclass pointer points to the root class.
  3. The isa pointers of all metaclasses point to the same object, the root metaclass.
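The lookup rule and the pointer relationships above can be modeled together in a small C sketch. Everything here is hypothetical and heavily simplified: struct node stands in for instances, classes and metaclasses, a tiny array replaces class_data_bits_t, and a NULL result marks the point where the real runtime would move on to dynamic method resolution and message forwarding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef const char *(*imp_t)(void);

struct method_entry {
    const char *sel;
    imp_t imp;
};

/* Instance, class or metaclass: isa, superclass and a small method table. */
struct node {
    struct node *isa;
    struct node *superclass;
    const struct method_entry *methods;
    size_t method_count;
};

/* Wire a root class and one subclass the way the figure shows. */
static void wire(struct node *rootCls, struct node *rootMeta,
                 struct node *subCls, struct node *subMeta) {
    rootCls->isa = rootMeta;          /* root class isa -> root metaclass */
    rootCls->superclass = NULL;       /* root class superclass -> nil */
    rootMeta->isa = rootMeta;         /* root metaclass isa -> itself */
    rootMeta->superclass = rootCls;   /* root metaclass superclass -> root class */
    subCls->isa = subMeta;
    subCls->superclass = rootCls;
    subMeta->isa = rootMeta;          /* every metaclass isa -> root metaclass */
    subMeta->superclass = rootMeta;   /* metaclass chain mirrors the class chain */
}

/* Search cls, then each superclass; NULL means "not found". */
static imp_t lookup_imp(const struct node *cls, const char *sel) {
    for (; cls != NULL; cls = cls->superclass) {
        for (size_t i = 0; i < cls->method_count; i++) {
            if (strcmp(cls->methods[i].sel, sel) == 0) return cls->methods[i].imp;
        }
    }
    return NULL;
}

/* A message starts at the receiver's isa: the class for an instance,
   the metaclass for a class. */
static imp_t send_message(const struct node *receiver, const char *sel) {
    return lookup_imp(receiver->isa, sel);
}

/* Sample implementations for the usage example (hypothetical). */
static const char *person_say(void) { return "person say"; }
static const char *root_description(void) { return "<root>"; }
static const char *person_create(void) { return "+create"; }
```

One detail the model makes visible: because the root metaclass's superclass is the root class, a class method that is not found anywhere in the metaclass chain falls back to the root class's instance methods.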

Now that we understand how these objects relate to each other, we can analyze how a class's instance methods are set up.

class_data_bits_t analysis

Source definition

struct class_data_bits_t {

    uintptr_t bits;
    
    class_rw_t* data() {
        return (class_rw_t *)(bits & FAST_DATA_MASK);
    }
};

The bits member of class_data_bits_t is essentially a pointer with some flag bits packed in. The methods are really stored in the memory that the pointer returned by data() points to, and that memory holds a class_rw_t value. Let's continue with the class_rw_t * it returns:
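A minimal C sketch of what data() does: the low bits of bits carry flags, and masking them off leaves the class_rw_t pointer. The FAST_DATA_MASK value below is the 64-bit one from objc4-750's objc-runtime-new.h and should be treated as an assumption; struct rw_model and data_of are hypothetical stand-ins. The example uses a made-up aligned address and never dereferences it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 64-bit FAST_DATA_MASK: keeps bits 3..46 of bits. */
#define FAST_DATA_MASK 0x00007ffffffffff8ULL

/* Hypothetical stand-in for class_rw_t (8-byte aligned because of ro). */
struct rw_model {
    uint32_t flags;
    uint32_t version;
    const void *ro;
};

/* Like class_data_bits_t::data(): strip the flag bits, keep the pointer. */
static struct rw_model *data_of(uintptr_t bits) {
    return (struct rw_model *)(bits & (uintptr_t)FAST_DATA_MASK);
}
```

Since the class_rw_t block is 8-byte aligned, its low 3 bits are free to hold flags, the same trick isa uses for shiftcls.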

struct class_rw_t {
    uint32_t flags;
    uint32_t version;

    const class_ro_t *ro;          // class information fixed at compile time (read-only)

    method_array_t methods;        // method list
    property_array_t properties;   // property list
    protocol_array_t protocols;    // protocol list

    Class firstSubclass;
    Class nextSiblingClass;

    char *demangledName;

#if SUPPORT_INDEXED_ISA
    uint32_t index;
#endif
};

struct class_ro_t {
    uint32_t flags;
    uint32_t instanceStart;
    uint32_t instanceSize;
#ifdef __LP64__
    uint32_t reserved;
#endif

    const uint8_t * ivarLayout;
    
    const char * name;                 // class name
    method_list_t * baseMethodList;    // methods fixed at compile time (no category methods)
    protocol_list_t * baseProtocols;
    const ivar_list_t * ivars;         // instance variable list

    const uint8_t * weakIvarLayout;
    property_list_t *baseProperties;

    method_list_t *baseMethods() const {
        return baseMethodList;
    }
};

From the documentation, the source code and some experiments, we can see that the compiler translates the properties and methods we define in a class into C/C++ code: under the hood, various structures and functions work together to produce a struct class_ro_t variable that holds the read-only methods and properties. Taking the Person class above as an example, here are some important snippets from the C++ code generated by clang:

static struct /*_prop_list_t*/ {
	unsigned int entsize;  // sizeof(struct _prop_t)
	unsigned int count_of_properties;
	struct _prop_t prop_list[2];
} _OBJC_$_PROP_LIST_Person __attribute__ ((used, section ("__DATA,__objc_const"))) = {
	sizeof(_prop_t),
	2,
	{{"age","TQ,N,V_age"},
	{"name","T@\"NSString\",C,N,V_name"}}
};

static struct /*_ivar_list_t*/ {
	unsigned int entsize;  // sizeof(struct _ivar_t)
	unsigned int count;
	struct _ivar_t ivar_list[2];
} _OBJC_$_INSTANCE_VARIABLES_Person __attribute__ ((used, section ("__DATA,__objc_const"))) = {
	sizeof(_ivar_t),
	2,
	{{(unsigned long int *)&OBJC_IVAR_$_Person$_age, "_age", "Q", 3, 8},
	 {(unsigned long int *)&OBJC_IVAR_$_Person$_name, "_name", "@\"NSString\"", 3, 8}}
};

static struct /*_method_list_t*/ {
	unsigned int entsize;  // sizeof(struct _objc_method)
	unsigned int method_count;
	struct _objc_method method_list[5];
} _OBJC_$_INSTANCE_METHODS_Person __attribute__ ((used, section ("__DATA,__objc_const"))) = {
	sizeof(_objc_method),
	5,
	{{(struct objc_selector *)"say", "v16@0:8", (void *)_I_Person_say /* function pointer */},
	{(struct objc_selector *)"age", "Q16@0:8", (void *)_I_Person_age},
	{(struct objc_selector *)"setAge:", "v24@0:8Q16", (void *)_I_Person_setAge_},
	{(struct objc_selector *)"name", "@16@0:8", (void *)_I_Person_name},
	{(struct objc_selector *)"setName:", "v24@0:8@16", (void *)_I_Person_setName_}}
};

// The _class_ro_t type variable
static struct _class_ro_t _OBJC_CLASS_RO_$_Person __attribute__ ((used, section ("__DATA,__objc_const"))) = {
	0, __OFFSETOFIVAR__(struct Person, _age), sizeof(struct Person_IMPL),
	0,
	"Person",
	(const struct _method_list_t *)&_OBJC_$_INSTANCE_METHODS_Person,
	0,
	(const struct _ivar_list_t *)&_OBJC_$_INSTANCE_VARIABLES_Person,
	0,
	(const struct _prop_list_t *)&_OBJC_$_PROP_LIST_Person
};

// The following declarations prepare the initialization of the Person class
extern "C" __declspec(dllexport) struct _class_t OBJC_CLASS_$_Person __attribute__ ((used, section ("__DATA,__objc_data"))) = {
	0, // &OBJC_METACLASS_$_Person,
	0, // &OBJC_CLASS_$_NSObject,
	0, // (void *)&_objc_empty_cache,
	0, // unused, was (void *)&_objc_empty_vtable,
	&_OBJC_CLASS_RO_$_Person
};

static void OBJC_CLASS_SETUP_$_Person(void ) {
	OBJC_METACLASS_$_Person.isa = &OBJC_METACLASS_$_NSObject;
	OBJC_METACLASS_$_Person.superclass = &OBJC_METACLASS_$_NSObject;
	OBJC_METACLASS_$_Person.cache = &_objc_empty_cache;
	OBJC_CLASS_$_Person.isa = &OBJC_METACLASS_$_Person;
	OBJC_CLASS_$_Person.superclass = &OBJC_CLASS_$_NSObject;
	OBJC_CLASS_$_Person.cache = &_objc_empty_cache;
}

It can be seen from the above source code that after the program is compiled, most of the class information has been processed by the compiler, and the rest of the work is handled by the Runtime mechanism.
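To illustrate how compile-time metadata such as the ivar list can drive member access at run time, here is a C sketch. It is a simplification: in the clang output above, each _ivar_t entry points at a separate offset variable (OBJC_IVAR_$_Person$_age), whereas this hypothetical ivar_model stores the offset inline. PersonLayout mirrors Person_IMPL under a 64-bit layout assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of Person_IMPL (64-bit layout assumed). */
struct PersonLayout {
    void *isa;
    uint64_t age;
    const char *name;
};

/* Hypothetical ivar entry: name, type encoding, byte offset. */
struct ivar_model {
    const char *name;
    const char *type;
    size_t offset;
};

/* Static metadata, analogous to _OBJC_$_INSTANCE_VARIABLES_Person. */
static const struct ivar_model person_ivars[] = {
    { "_age",  "Q",             offsetof(struct PersonLayout, age) },
    { "_name", "@\"NSString\"", offsetof(struct PersonLayout, name) },
};

/* Find an ivar's offset by name; returns (size_t)-1 when it is missing. */
static size_t ivar_offset(const char *ivarName) {
    size_t count = sizeof person_ivars / sizeof person_ivars[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(person_ivars[i].name, ivarName) == 0) return person_ivars[i].offset;
    }
    return (size_t)-1;
}
```

Combined with the object pointer, the offset found in the metadata gives the member's address, the same objectPointer + offset rule from the first section.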

The Runtime mechanism processes class information

The objc source has a function, realizeClass, which combines the compile-time information with runtime information into the real structure of the class. Below is a simplified version that keeps only the code handling class_ro_t and class_rw_t:

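static Class realizeClass(Class cls)
{
    const class_ro_t *ro;
    class_rw_t *rw;

    // Before realization, data() actually points to the compiler-generated class_ro_t
    ro = (const class_ro_t *)cls->data();
    // Allocate a zero-filled class_rw_t and attach the read-only data to it
    rw = (class_rw_t *)calloc(sizeof(class_rw_t), 1);
    rw->ro = ro;
    rw->flags = RW_REALIZED|RW_REALIZING;
    // From now on, data() returns the class_rw_t
    cls->setData(rw);

    return cls;
}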

Before realizeClass(cls) runs, cls->data() actually returns the class_ro_t pointer produced at compile time. realizeClass creates a class_rw_t, stores the original class_ro_t pointer in its ro member, and then sets the class_rw_t as the data of cls.
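The ro-to-rw swap can be modeled in a few lines of C. This is a sketch with hypothetical types (ro_model, rw_model, class_model) and a realize function that mirrors the simplified realizeClass above; the flag values are placeholders chosen for the sketch and are not meant to document the runtime's constants.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Placeholder flag values for this sketch. */
#define RW_REALIZED  (1u << 31)
#define RW_REALIZING (1u << 19)

struct ro_model { const char *name; uint32_t instanceSize; };
struct rw_model { uint32_t flags; const struct ro_model *ro; };

/* Before realization data points at an ro_model; afterwards at an rw_model. */
struct class_model {
    void *data;
};

/* Wrap the compile-time ro in a freshly allocated, zero-filled rw. */
static struct class_model *realize(struct class_model *cls) {
    const struct ro_model *ro = (const struct ro_model *)cls->data;
    struct rw_model *rw = calloc(1, sizeof *rw);
    if (rw == NULL) return NULL;
    rw->ro = ro;
    rw->flags = RW_REALIZED | RW_REALIZING;
    cls->data = rw;
    return cls;
}
```

After realize, the read-only data is still reachable through rw->ro, which is why the compile-time information survives even though data now returns the read-write structure.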

Next, let's verify this with the objc source code.

Set a conditional breakpoint (cls == the address of Person) at the start of realizeClass. In LLDB, offset the class pointer to reach its class_data_bits_t member (it follows isa, superclass and cache, i.e. 0x20 bytes in on 64-bit), then call data() to get the pointer. Casting that pointer to class_ro_t * and printing it shows values that match the information defined for the Person class. The debugging process is shown in the following figure.