Introduction

This article analyzes the data structure of Objective-C objects from the point of view of the source code. A basic understanding of Objective-C is required to read it. The source code discussed here is from objc4-706, available for download on this page. A runnable Runtime source is also attached.

1. Objective-C object definition

Objective-C is an object-oriented language, and NSObject is the base class of all classes. If we open the NSObject.h file, we see that the NSObject class is defined as follows:

@interface NSObject <NSObject> {
    Class isa  OBJC_ISA_AVAILABILITY;
}

This says that an NSObject has one member variable of type Class, so what does Class mean? Its definition can be found in objc-private.h of objc4-706:

typedef struct objc_class *Class;
typedef struct objc_object *id;

Class is a pointer to the C structure struct objc_class.

The id type is a pointer to the C structure struct objc_object. We know that any object can be declared with id, so this also shows that Objective-C objects are actually struct objc_object.

In objc4-680 we jump to the definition of objc_class:

// Note: The methods defined in the structure are not listed here
struct objc_class : objc_object {
    // Class ISA;
    Class superclass;
    cache_t cache;             // formerly cache pointer and vtable
    class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags
};

objc_class inherits from objc_object, so a class in Objective-C is itself an object. In addition to the member variable defined in objc_object, it has three more: superclass, cache, and bits.
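
The inheritance relationship can be illustrated with a stripped-down C++ sketch. The Sketch-prefixed types below are hypothetical stand-ins written for this illustration, not the runtime's definitions; they only keep the shape of the two structures.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical, simplified stand-ins for the runtime structures.
struct SketchObject {
    std::uintptr_t isa;          // the only member every object has
};

struct SketchClass : SketchObject {
    SketchClass *superclass;     // extra members that only classes carry
    void *cache;
    std::uintptr_t bits;
};

using SketchId  = SketchObject *;  // mirrors: typedef struct objc_object *id;
using SketchCls = SketchClass *;   // mirrors: typedef struct objc_class *Class;

// Because SketchClass derives from SketchObject, a class pointer converts
// implicitly to an object pointer: a class can be handled as an object.
inline SketchId asObject(SketchCls cls) {
    return cls;
}

static_assert(std::is_base_of<SketchObject, SketchClass>::value,
              "a class is itself an object");
```

Passing a class pointer to asObject needs no cast, which is the C++ way of saying that anything accepting an id also accepts a Class.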

So the most basic data structure in Objective-C is struct objc_object. The objc_object structure is defined as follows:

struct objc_object {
private:
    isa_t isa;
};

union isa_t {
    isa_t() { }
    isa_t(uintptr_t value) : bits(value) { }

    Class cls;
    uintptr_t bits;

#if __arm64__
#   define ISA_MASK        0x0000000ffffffff8ULL
#   define ISA_MAGIC_MASK  0x000003f000000001ULL
#   define ISA_MAGIC_VALUE 0x000001a000000001ULL
    struct {
        uintptr_t nonpointer        : 1;
        uintptr_t has_assoc         : 1;
        uintptr_t has_cxx_dtor      : 1;
        uintptr_t shiftcls          : 33; // MACH_VM_MAX_ADDRESS 0x1000000000
        uintptr_t magic             : 6;
        uintptr_t weakly_referenced : 1;
        uintptr_t deallocating      : 1;
        uintptr_t has_sidetable_rc  : 1;
        uintptr_t extra_rc          : 19;
#       define RC_ONE   (1ULL<<45)
#       define RC_HALF  (1ULL<<18)
    };
#elif __x86_64__
    // Definition for x86_64, not listed; check the source code in objc-private.h.
#else
#   error unknown architecture for packed isa
#endif
};

Note: The source code above has been simplified; the real source contains several conditional compilation directives. To avoid confusion when reading the source, here are a few of the conditional compilation macros it uses (if you have not read the source code, do not worry about the following explanation):

  1. SUPPORT_PACKED_ISA: indicates whether the platform supports storing information other than the Class in the isa pointer. If it does, the class information is placed in the struct defined inside isa_t together with other information, such as the nonpointer flag above. If not, the struct inside isa_t is not used, and isa_t only uses cls (the Class pointer). On iOS and Mac OS X, SUPPORT_PACKED_ISA is defined as 1.
  2. __arm64__ and __x86_64__ indicate the CPU architecture. Computers generally use the x86_64 architecture and mobile phones generally use the ARM architecture; the 64 indicates a 64-bit CPU. Only the definitions for __arm64__ are listed above.
  3. SUPPORT_INDEXED_ISA: for platforms where SUPPORT_PACKED_ISA is enabled (see point 1), indicates whether the class information stored in isa_t is the address of the class or an index used to look up the address of the class structure in a class table. SUPPORT_INDEXED_ISA is 0 on iOS devices.
  4. isa_t is declared as a union. In C/C++, the members of a union share the same storage, so cls, bits, and the bit-field struct are different views of the same memory.
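
Since isa_t is a union, its members overlap in memory. A minimal sketch of that property follows; SketchIsa and SketchClass are hypothetical names used only here, and the bit-field struct is left out to keep the example portable.

```cpp
#include <cassert>
#include <cstdint>

struct SketchClass;                      // hypothetical opaque class type

// Simplified stand-in for isa_t: both members occupy the same storage.
union SketchIsa {
    SketchClass *cls;                    // view 1: a plain class pointer
    std::uintptr_t bits;                 // view 2: the raw bit pattern
};

// True when both members start at the same address, i.e. share storage.
inline bool membersShareStorage(const SketchIsa &isa) {
    return static_cast<const void *>(&isa.cls) ==
           static_cast<const void *>(&isa.bits);
}

static_assert(sizeof(SketchIsa) == sizeof(std::uintptr_t),
              "the union is no larger than one pointer-sized word");
```

The size check is why storing flags in isa costs no extra memory: the union as a whole still occupies a single pointer-sized word.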

This section looked at the definitions of classes, objects, and related data structures in Objective-C through the source code. The isa_t structure is clearly critical, so it is analyzed next.

2. In-depth understanding of isa_t

The definition of isa_t for the __arm64__ architecture was given above, and the analysis continues with that definition (the __x86_64__ definition is very similar). The following figure shows the memory layout of the isa_t structure on the __arm64__ architecture:

All member variables of the struct inside isa_t are drawn in the figure above. The meaning of each field is analyzed one by one as follows:

nonpointer: indicates whether pointer optimization is enabled for the isa pointer

Before explaining what nonpointer means, let’s take a quick look at an optimization Apple introduced on 64-bit devices to save memory and improve execution efficiency: Tagged Pointer.

Imagine storing an NSNumber object whose value is an NSInteger on a 32-bit device and on a 64-bit device.

First, let’s analyze the memory footprint:

  1. Memory for the pointer to the NSNumber object. On 32-bit devices a pointer takes 4 bytes; on 64-bit devices it takes 8 bytes.

  2. Memory for the value of the NSNumber object. On 32-bit devices NSInteger takes 4 bytes; on 64-bit devices it takes 8 bytes.

  3. Objective-C manages memory with reference counting, so extra space is needed to store the reference count. If the reference count is an NSInteger, a 64-bit device uses 4 bytes more than a 32-bit device.

In terms of efficiency, the reference count, lifecycle flags, and so on are stored elsewhere, which requires a lot of extra processing logic (such as dynamically allocating memory for the reference count).

In general, 32 bits are enough to hold the integers and pointer addresses we normally work with, so on 64-bit devices 32 bits of each are wasted. Storing an NSNumber object with an NSInteger value therefore wastes 8 bytes (4 bytes of the pointer and 4 bytes of the value).

To save memory and improve execution efficiency, Apple introduced Tagged Pointer. Put simply, Tagged Pointer uses the memory that would hold the pointer to store the actual data.

For example, the pointer to an NSNumber takes 8 bytes on a 64-bit device. With pointer optimization, the value of the NSNumber can be placed, according to some rule, directly into those 8 bytes. This saves the 8 bytes that would otherwise be needed to store the NSInteger.

In addition, a Tagged Pointer is no longer an object pointer. It holds the actual data and is just an ordinary variable, so its memory does not need to be allocated and freed on the heap with calloc/free, which improves memory access efficiency. However, because a Tagged Pointer is not a valid object pointer, we cannot obtain isa information through it. For a more detailed introduction, see Understanding Tagged Pointer by Tang Qiao; it is not covered in depth here.
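
The idea of storing a value inside the pointer word itself can be sketched as follows. The encoding here (lowest bit as the tag, the remaining bits as payload) and all function names are assumptions made for illustration; this is not Apple's actual Tagged Pointer format.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding, chosen only for illustration: the lowest bit marks
// the word as tagged, and the remaining bits hold the integer payload.
constexpr std::uintptr_t kTagBit = 1;

inline bool isTagged(std::uintptr_t word) {
    return (word & kTagBit) != 0;
}

// Packs a small non-negative integer directly into a pointer-sized word.
inline std::uintptr_t makeTaggedInt(std::uintptr_t value) {
    // One bit is spent on the tag, so the top bit of value must be clear.
    assert((value >> (sizeof(value) * 8 - 1)) == 0);
    return (value << 1) | kTagBit;
}

// Recovers the integer without touching the heap.
inline std::uintptr_t taggedIntValue(std::uintptr_t word) {
    assert(isTagged(word));
    return word >> 1;
}
```

A scheme like this works because real heap pointers are aligned, so their lowest bits are zero and a set tag bit cannot be confused with a valid address.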

Now that Tagged Pointer is understood, look at the nonpointer field. It takes up 1 bit and can have two values, 0 and 1, which represent different kinds of isa_t:

  1. 0 means that pointer optimization is not enabled for isa_t and the struct defined inside isa_t is not used. Accessing the isa of an objc_object directly returns the cls member of isa_t, which points to the structure of the class the object belongs to and takes 8 bytes on 64-bit devices.

  2. 1 means that pointer optimization is enabled for isa_t, and the isa member of objc_object cannot be accessed directly (isa is no longer a valid memory pointer; compare Tagged Pointer). As the name nonpointer suggests, this isa is no longer a pointer. It does, however, contain the class information, the reference count of the object, and so on, making full use of memory on 64-bit devices.

When nonpointer is 1, isa_t uses the struct defined inside it, which contains the class information of the object and its reference count. Pointer optimization thus reduces memory usage, and because object information such as the reference count is stored in isa_t itself, much of the logic for obtaining that information is avoided and execution becomes more efficient.

shiftcls:

Stores the value of the class pointer. With pointer optimization enabled, 33 bits are used to store the class pointer on the arm64 architecture.

The remaining fields

The other fields are easy to understand and are only covered briefly here.
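
The bit positions of shiftcls and the other fields follow from the bit-field order in the arm64 definition listed earlier. The sketch below decodes a 64-bit isa word with explicit shifts; it assumes bit-fields are allocated starting from the least significant bit, and DecodedIsa, field, and decodeArm64Isa are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Field order on arm64: nonpointer(1) has_assoc(1) has_cxx_dtor(1)
// shiftcls(33) magic(6) weakly_referenced(1) deallocating(1)
// has_sidetable_rc(1) extra_rc(19), 64 bits in total.
struct DecodedIsa {                      // hypothetical helper type
    bool nonpointer, hasAssoc, hasCxxDtor;
    std::uint64_t shiftcls;
    unsigned magic;
    bool weaklyReferenced, deallocating, hasSidetableRc;
    std::uint32_t extraRc;
};

// Extracts `width` bits starting at bit `shift` (least significant bit = 0).
inline std::uint64_t field(std::uint64_t bits, unsigned shift, unsigned width) {
    assert(width > 0 && width < 64 && shift + width <= 64);
    return (bits >> shift) & ((std::uint64_t{1} << width) - 1);
}

inline DecodedIsa decodeArm64Isa(std::uint64_t bits) {
    DecodedIsa d;
    d.nonpointer       = field(bits, 0, 1) != 0;
    d.hasAssoc         = field(bits, 1, 1) != 0;
    d.hasCxxDtor       = field(bits, 2, 1) != 0;
    d.shiftcls         = field(bits, 3, 33);
    d.magic            = static_cast<unsigned>(field(bits, 36, 6));
    d.weaklyReferenced = field(bits, 42, 1) != 0;
    d.deallocating     = field(bits, 43, 1) != 0;
    d.hasSidetableRc   = field(bits, 44, 1) != 0;
    d.extraRc          = static_cast<std::uint32_t>(field(bits, 45, 19));
    return d;
}
```

The extra_rc field starting at bit 45 agrees with the definition of RC_ONE as 1ULL<<45 in the listing.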

  1. has_assoc indicates whether the object has associated references. When an object has associated references, additional logic is required to release it. Associated references are what we normally attach to objects with objc_setAssociatedObject. They are not analyzed further here; a later article on how associated references are implemented will dig into that code.

  2. has_cxx_dtor indicates whether the object has a C++ or Objective-C destructor. If it does, the destructor logic must run; if not, the object can be freed faster.

  3. magic is used by the debugger to tell whether the current object is a real object or uninitialized memory. With the arm64 definitions listed above, ISA_MAGIC_VALUE puts 0x1a in this field (on x86_64 the value is 0x3b).

  4. weakly_referenced indicates whether the object is, or has ever been, pointed to by an ARC weak variable. Objects without weak references can be released faster.

  5. deallocating indicates whether the object is being deallocated.

  6. extra_rc holds the object's reference count minus 1. For example, if the reference count is 10, extra_rc is 9. If the count grows too large to fit in extra_rc, the has_sidetable_rc flag below is used.

  7. has_sidetable_rc is set when the reference count is too large for extra_rc. The overflow is then stored in a side table, similar to a carry in addition and subtraction.

  8. ISA_MAGIC_MASK extracts the magic value (together with the nonpointer bit) by masking.

  9. ISA_MASK extracts the class pointer value from isa by masking.

  10. RC_ONE and RC_HALF are used in reference count calculations.
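
The interplay of extra_rc, has_sidetable_rc, RC_ONE, and RC_HALF can be sketched as below. This is a simplified model, assuming the overflow point is the capacity of the 19-bit extra_rc field; SketchCounted, its sideTableCount member, and the function names are hypothetical, and locking and the real side table are left out.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t RC_ONE  = 1ULL << 45;            // lowest bit of extra_rc
constexpr std::uint64_t RC_HALF = 1ULL << 18;            // half of the 19-bit range
constexpr std::uint64_t HAS_SIDETABLE_RC = 1ULL << 44;   // flag bit below extra_rc
constexpr std::uint64_t LOW_BITS_MASK = RC_ONE - 1;      // everything below extra_rc

// Hypothetical model: the isa word plus a separately stored overflow count.
struct SketchCounted {
    std::uint64_t isaBits = 0;
    std::uint64_t sideTableCount = 0;    // stands in for the side table entry
};

inline std::uint64_t extraRc(const SketchCounted &o) {
    return o.isaBits >> 45;
}

// Total reference count = inline count + 1 + whatever has overflowed.
inline std::uint64_t retainCount(const SketchCounted &o) {
    return extraRc(o) + 1 + o.sideTableCount;
}

inline void retain(SketchCounted &o) {
    std::uint64_t sum = o.isaBits + RC_ONE;   // adds 1 to extra_rc
    if (sum < o.isaBits) {                    // wrap-around: extra_rc was full
        // Keep half of the range inline and move the other half out of line.
        o.isaBits = (o.isaBits & LOW_BITS_MASK) | (RC_HALF << 45) | HAS_SIDETABLE_RC;
        o.sideTableCount += RC_HALF;
    } else {
        o.isaBits = sum;
    }
}
```

Adding RC_ONE to the whole word increments extra_rc without unpacking it, because extra_rc occupies the topmost bits and a carry out of it shows up as unsigned wrap-around.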

struct objc_object methods

Looking at the source code of struct objc_object, we can see that many methods are defined in it. For example, these two methods relate to the class pointer in isa_t:

Class ISA();
Class getIsa();

Why are methods needed to work with the isa pointer? Because an isa with pointer optimization enabled is no longer a valid pointer, operations on isa_t are wrapped in methods. We cannot manipulate the object’s isa directly and must go through these methods instead.
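
A minimal sketch of this encapsulation follows, using the arm64 ISA_MASK value from the listing. SketchObject is a hypothetical stand-in, and class addresses are modeled as plain integers; the branch on the nonpointer bit mirrors the explanation in this article rather than the exact runtime code.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kArm64IsaMask = 0x0000000ffffffff8ULL;  // ISA_MASK

// Hypothetical stand-in for objc_object: the isa word is private, so callers
// must go through ISA() instead of reading the field directly.
class SketchObject {
public:
    explicit SketchObject(std::uint64_t isaBits) : isaBits_(isaBits) {}

    // Returns the class address regardless of how isa is encoded.
    std::uint64_t ISA() const {
        if (isaBits_ & 1) {                   // nonpointer == 1: packed isa
            return isaBits_ & kArm64IsaMask;  // strip flags, magic, and count
        }
        return isaBits_;                      // nonpointer == 0: plain pointer
    }

private:
    std::uint64_t isaBits_;
};
```

Callers get the same class address from both encodings, so code that only needs the class never has to know whether pointer optimization is enabled.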

3. Objects, classes, and metaclasses

Part 1 looked at the definitions of NSObject, objc_object, and objc_class, and showed that objc_class inherits from objc_object, so a class is also an object in Objective-C. Part 2 took a deeper look at isa_t, the type of the only member variable of objc_object, and showed that isa_t holds the pointer to the class the object belongs to. Part 3 discusses the relationship between objects, classes, and metaclasses (meta-classes, introduced below) in Objective-C.

objc_class inherits from objc_object, so objc_class also has an isa member variable of type isa_t. What does shiftcls in that isa_t point to? A new concept is needed here: the metaclass. shiftcls in the isa_t of an objc_class points to the metaclass of that class. What is a metaclass?

Besides the isa_t inherited from objc_object, objc_class has three member variables:

  1. Class superclass: points to the objc_class of the superclass;
  2. cache_t cache: caches instance methods to speed up each call;
  3. class_data_bits_t bits: holds all instance methods.

The Objective-C Runtime method lookup is not covered here. However, the three member variables above show that the list of methods an object can call is stored in the structure of the object's class. If the method list were stored in the object structure, every newly created object would need its own list of instance methods, which would consume too many resources, so it is stored in the class structure instead. Besides instance methods there are also class methods, and these are stored in the metaclass mentioned above. The metaclass is also an objc_class structure, with its own isa and superclass pointers.

Note that the isa arrows in the figure above do not stand for the raw isa pointer but for shiftcls in isa_t.

Here are a few conclusions based on the analysis and the figure above:

  1. Each class has a corresponding metaclass;

  2. The class structure stores all instance methods of the class. An object reaches the class structure through isa and looks up instance methods there. If the class structure does not have the required instance method, the lookup follows the superclass pointer to the parent class structure, and so on up to the root class.

  3. The metaclass of a class stores all class methods of the class. A class object reaches the metaclass structure through isa and looks up class methods there. If the metaclass structure does not have the required class method, the lookup follows the superclass pointer to the parent metaclass structure, and so on up to the root class.

  4. In Objective-C, the root class is really just NSObject, and the superclass of NSObject is nil.

  5. In Objective-C, all objects (instances, classes, and meta-classes) can call the instance methods of NSObject.

  6. In Objective-C, all classes and meta-classes can call the class methods of NSObject.

For further reading, see What is a meta-class in Objective-C?, which explains meta-classes in an easy-to-understand way.
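
The lookup rules in these conclusions can be modeled with a small sketch. Everything here is a hypothetical model (SketchClass, SketchHierarchy, the string-based method table); it assumes the usual arrangement in which the root metaclass has the root class as its superclass, and it ignores the method cache.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model: each class structure has an isa, a superclass link,
// and a table of the methods it stores.
struct SketchClass {
    SketchClass *isa = nullptr;          // points to the metaclass
    SketchClass *superclass = nullptr;
    std::map<std::string, std::string> methods;  // selector -> implementation
};

// Walks the superclass chain starting at `cls` until the selector is found.
inline const std::string *lookUp(const SketchClass *cls, const std::string &sel) {
    for (; cls != nullptr; cls = cls->superclass) {
        auto it = cls->methods.find(sel);
        if (it != cls->methods.end()) return &it->second;
    }
    return nullptr;                      // not found anywhere up to the root class
}

// Class methods are looked up starting at the metaclass (the isa of the class).
inline const std::string *lookUpClassMethod(const SketchClass *cls,
                                            const std::string &sel) {
    return lookUp(cls->isa, sel);
}

// A root class, one subclass, and their two metaclasses.
struct SketchHierarchy {
    SketchClass rootClass, rootMeta, subClass, subMeta;
    SketchHierarchy() {
        rootClass.isa = &rootMeta;                            // superclass stays nil
        rootMeta.isa  = &rootMeta;  rootMeta.superclass = &rootClass;
        subClass.isa  = &subMeta;   subClass.superclass = &rootClass;
        subMeta.isa   = &rootMeta;  subMeta.superclass  = &rootMeta;
        rootClass.methods["description"] = "-[Root description]";
        rootMeta.methods["alloc"]        = "+[Root alloc]";
        subClass.methods["run"]          = "-[Sub run]";
    }
    SketchHierarchy(const SketchHierarchy &) = delete;        // holds self-pointers
    SketchHierarchy &operator=(const SketchHierarchy &) = delete;
};
```

In this model a class-method lookup for description on the subclass succeeds, because the chain runs from the sub metaclass through the root metaclass to the root class; that is conclusion 5 in action.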

4. Object initialization process

Part 4 analyzes the creation of an NSObject object from the point of view of the source code.

We know that the code for creating an NSObject object is [[NSObject alloc] init]. Another way is [NSObject new]; the source code shows that internally it does exactly the same as the first way.

+ (id)alloc {
    return _objc_rootAlloc(self);
}

id _objc_rootAlloc(Class cls) {
    return callAlloc(cls, false/*checkNil*/, true/*allocWithZone*/);
}

static ALWAYS_INLINE id
callAlloc(Class cls, bool checkNil, bool allocWithZone=false)
{
    // Same meaning as: if (checkNil && !cls) return nil;
    // slowpath additionally tells the compiler that the condition is
    // probably false, so the "return nil" branch is optimized as unlikely.
    // If cls is nil, no new object can be created and nil is returned.
    if (slowpath(checkNil && !cls)) return nil;

#if __OBJC2__   // Objective-C 2.0 or later
    // Check whether the class uses the default alloc/allocWithZone implementation
    if (fastpath(!cls->ISA()->hasCustomAWZ())) {
        // No alloc/allocWithZone implementation. Go straight to the allocator.
        // fixme store hasCustomAWZ in the non-meta class and
        // add it to canAllocFast's summary
        if (fastpath(cls->canAllocFast())) {   // whether memory can be allocated quickly (false here)
            // No ctors, raw isa, etc. Go straight to the metal.
            bool dtor = cls->hasCxxDtor();
            id obj = (id)calloc(1, cls->bits.fastInstanceSize());
            if (slowpath(!obj)) return callBadAllocHandler(cls);
            obj->initInstanceIsa(cls, dtor);
            return obj;
        }
        else {
            // Has ctor or raw isa or something. Use the slower path.
            id obj = class_createInstance(cls, 0);   // the key call that creates the object
            if (slowpath(!obj)) return callBadAllocHandler(cls);   // check whether the new object is valid
            return obj;
        }
    }
#endif

    // No shortcuts available.
    if (allocWithZone) return [cls allocWithZone:nil];   // allocWithZone: also ends up in class_createInstance
    return [cls alloc];
}

As you can see, creating an object mostly happens in the callAlloc function. This function uses slowpath, so let’s look at its definition:

#define fastpath(x) (__builtin_expect(bool(x), 1))
#define slowpath(x) (__builtin_expect(bool(x), 0))

__builtin_expect(EXP, N) is a GCC built-in function that tells the compiler EXP is very likely to equal N. With this hint, the compiler places the code of the more likely branch directly after the preceding code, reducing the performance cost of instruction jumps. So logically if (slowpath(x)) means the same as if (x), with a compiler optimization hint added.

Some statements in the code above are commented. Most of the statements in this function deal with the zone used by the new object; the key call is class_createInstance, which is defined as follows:
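
A small usage sketch of the two macros is shown below. It assumes a compiler that provides __builtin_expect (GCC or Clang); sumOrZero is a hypothetical function written only for this example.

```cpp
#include <cassert>
#include <cstddef>

// Same shape as the runtime macros; __builtin_expect is a GCC/Clang built-in.
#define fastpath(x) (__builtin_expect(bool(x), 1))
#define slowpath(x) (__builtin_expect(bool(x), 0))

// Hypothetical helper: sums an array, treating a null pointer as a rare case.
inline long sumOrZero(const int *values, std::size_t count) {
    if (slowpath(values == nullptr)) return 0;   // hinted as unlikely
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) total += values[i];
    return total;
}
```

The hint changes only how the compiler lays out the branches; the function returns the same results with or without it.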

id class_createInstance(Class cls, size_t extraBytes)
{
    return _class_createInstanceFromZone(cls, extraBytes, nil);
}

static __attribute__((always_inline))
id _class_createInstanceFromZone(Class cls, size_t extraBytes, void *zone,
                                 bool cxxConstruct = true,
                                 size_t *outAllocatedSize = nil)
{
    if (!cls) return nil;   // check whether cls is valid

    assert(cls->isRealized());

    // Read class's info bits all at once for performance
    bool hasCxxCtor = cls->hasCxxCtor();        // has a C++ constructor
    bool hasCxxDtor = cls->hasCxxDtor();        // has a C++ destructor
    bool fast = cls->canAllocNonpointer();      // can use a nonpointer isa

    size_t size = cls->instanceSize(extraBytes);   // memory needed by the instance
    if (outAllocatedSize) *outAllocatedSize = size;

    id obj;
    if (!zone && fast) {
        obj = (id)calloc(1, size);
        if (!obj) return nil;
        obj->initInstanceIsa(cls, hasCxxDtor);   // initialize the packed isa
    }
    else {
        if (zone) {
            obj = (id)malloc_zone_calloc ((malloc_zone_t *)zone, 1, size);
        } else {
            obj = (id)calloc(1, size);
        }
        if (!obj) return nil;

        // Use raw pointer isa on the assumption that they might be
        // doing something weird with the zone or RR.
        obj->initIsa(cls);
    }

    if (cxxConstruct && hasCxxCtor) {
        obj = _objc_constructOrFree(obj, cls);
    }

    return obj;
}

The isa of the new object is initialized by obj->initInstanceIsa(cls, hasCxxDtor). Following Part 2, that code can be simplified as follows:
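
Stripped of zones and C++ constructors, the allocation path boils down to "allocate zero-filled memory, then write isa at the start of it". The sketch below models just that; sketchCreateInstance and sketchIsaOf are hypothetical helpers, and the isa word is passed in as a ready-made integer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: allocates a zero-filled instance of `instanceSize`
// bytes and stores `isaWord` in its first 8 bytes, where isa lives.
inline void *sketchCreateInstance(std::uint64_t isaWord, std::size_t instanceSize) {
    assert(instanceSize >= sizeof(isaWord));
    void *obj = std::calloc(1, instanceSize);      // zero-filled, as with calloc in the runtime
    if (obj == nullptr) return nullptr;            // allocation failed
    std::memcpy(obj, &isaWord, sizeof(isaWord));   // initialize isa
    return obj;
}

// Reads the isa word back from the start of the instance.
inline std::uint64_t sketchIsaOf(const void *obj) {
    std::uint64_t word;
    std::memcpy(&word, obj, sizeof(word));
    return word;
}
```

Because the memory comes from calloc, every instance variable after isa starts out as zero, which is why freshly allocated objects have nil and 0 in their fields. The caller releases the memory with std::free.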

inline void
objc_object::initInstanceIsa(Class cls, bool hasCxxDtor)
{
    assert(!cls->instancesRequireRawIsa());
    assert(hasCxxDtor == cls->hasCxxDtor());

    initIsa(cls, true, hasCxxDtor);
}

inline void
objc_object::initIsa(Class cls, bool nonpointer, bool hasCxxDtor)
{
    assert(!isTaggedPointer());

    if (!nonpointer) {
        isa.cls = cls;
    } else {
        assert(!DisableNonpointerIsa);
        assert(!cls->instancesRequireRawIsa());

        isa_t newisa(0);
        newisa.bits = ISA_MAGIC_VALUE;
        newisa.has_cxx_dtor = hasCxxDtor;
        newisa.shiftcls = (uintptr_t)cls >> 3;

        isa = newisa;   // store the packed value into the object's isa
    }
}

In the code above, newisa.bits = ISA_MAGIC_VALUE; assigns an initial value to the isa structure. On arm64 the value of ISA_MAGIC_VALUE is 0x000001a000000001ULL (on x86_64 it is 0x001d800000000001ULL). From the analysis of the isa_t structure in Part 2, we know that this assignment only sets the nonpointer and magic fields.

newisa.shiftcls = (uintptr_t)cls >> 3; stores the address of the class in the object's isa structure. The address is shifted right by three bits to drop the last three bits of the class pointer, which are always meaningless zeros because class pointers are aligned to 8 bytes in memory. A detailed explanation of class pointer alignment can be found in Learn about isa from NSObject initialization.

After isa is initialized, the work of [NSObject alloc] is done. The init method is defined as follows:
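
The packing steps can be checked with plain integer arithmetic. This sketch uses the arm64 constants from the isa_t listing and models the class address as an integer so that it does not depend on where the host system places real objects; packIsa is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t ISA_MASK        = 0x0000000ffffffff8ULL;  // arm64
constexpr std::uint64_t ISA_MAGIC_VALUE = 0x000001a000000001ULL;  // arm64

// Builds a packed isa word from a class address, following the steps of
// initIsa for the nonpointer case.
inline std::uint64_t packIsa(std::uint64_t classAddress, bool hasCxxDtor) {
    assert((classAddress & 0x7) == 0);          // 8-byte aligned: low 3 bits are zero
    assert((classAddress & ~ISA_MASK) == 0);    // fits in the 33-bit shiftcls field
    std::uint64_t bits = ISA_MAGIC_VALUE;       // sets nonpointer and magic only
    bits |= static_cast<std::uint64_t>(hasCxxDtor) << 2;  // has_cxx_dtor is bit 2
    std::uint64_t shiftcls = classAddress >> 3; // drop the three zero bits
    bits |= shiftcls << 3;                      // shiftcls starts at bit 3
    return bits;
}
```

Since shiftcls sits exactly three bits up, masking the packed word with ISA_MASK yields the original class address directly, with no shift needed when reading it back.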

- (id)init {
    return _objc_rootInit(self);
}

id _objc_rootInit(id obj) {
    return obj;
}

As you can see, init just returns the pointer to the newly created object, with no extra logic.

That’s where all the logic for creating a new object ends.

5. Reference materials

1. Objective-C Runtime Day 1: isa and Class

2. Learn about isa from NSObject initialization

3. Understanding Tagged Pointer

4. ObjC Runtime Source Code Reading Notes (1)

5. What is a meta-class in Objective-C?

Note: This article does not claim to be authoritative; questions and discussion are welcome. Please cite the original address when reprinting.