I. The nature of objects

1. Introduction to Clang

Clang is an Apple-led, LLVM-based C/C++/Objective-C compiler.

Clang has the following advantages:

  • Faster compilation: Clang compiles significantly faster than GCC on some platforms.
  • Smaller memory footprint: the AST generated by Clang typically takes up about one-fifth of the memory of GCC's.
  • Modular design: Clang has a library-based, modular design that makes IDE integration and reuse for other purposes easier.
  • Readable diagnostics: Clang creates and retains a lot of detailed metadata during compilation, which helps debugging and error reporting.
  • Clearer, simpler design: the code base is easy to understand, extend, and improve. Compared with GCC's older code base, the learning curve is much flatter.

By having Clang rewrite Objective-C into C++, we can see the underlying implementation of Objective-C code.

2. Simple use of Clang

Create a new project, choose the macOS Command Line Tool template, and create an HPerson class with an NSString property named HName.

Then compile main.m in the terminal:

clang -rewrite-objc main.m -o main.cpp

The compilation succeeds and produces main.cpp.

If we instead create an app project, which references the system's dynamic libraries, and compile it the same way, we get an error.

In that case, tell Clang where the SDK is with -isysroot:

// clang -x objective-c -rewrite-objc -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator.sdk <file name>.<suffix>
// Example:
clang -x objective-c -rewrite-objc -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator.sdk main.m

The compilation succeeds and again produces main.cpp.

More options can be explored with -help:

clang -help

3. Viewing the C++ source code

  • Objects

Open the main.cpp generated first and search for HPerson:

#ifndef _REWRITER_typedef_HPerson
#define _REWRITER_typedef_HPerson
typedef struct objc_object HPerson;
typedef struct {} _objc_exc_HPerson;
#endif

extern "C" unsigned long OBJC_IVAR_$_HPerson$_HName;
struct HPerson_IMPL {
	struct NSObject_IMPL NSObject_IVARS;
	NSString *_HName;
};

So at the bottom layer, an object is a structure.

What, then, is NSObject_IVARS?

Look at the definition of NSObject_IMPL:

struct NSObject_IMPL {
	Class isa;
};

So NSObject_IVARS is just isa.
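The layout can be sketched in plain C. This is only an illustration: `MockClass`, `MockNSObject_IMPL` and `MockHPerson_IMPL` are hypothetical stand-ins for the rewriter's types, not runtime types. It shows that because the "inherited" NSObject ivars are the first member, isa sits at offset 0 of the instance and the subclass's own ivar follows it.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the rewriter's types; not the real runtime. */
typedef struct mock_class *MockClass;  /* mirrors: typedef struct objc_class *Class; */

struct MockNSObject_IMPL {
    MockClass isa;                     /* the only ivar NSObject contributes */
};

struct MockHPerson_IMPL {
    struct MockNSObject_IMPL NSObject_IVARS;  /* "inherited" ivars come first */
    const char *_HName;                       /* stands in for NSString *_HName */
};

/* Offset of isa inside an HPerson-like instance. */
size_t isa_offset(void) {
    return offsetof(struct MockHPerson_IMPL, NSObject_IVARS.isa);
}

/* Offset of the subclass ivar; it starts right after isa. */
size_t hname_offset(void) {
    return offsetof(struct MockHPerson_IMPL, _HName);
}
```

On common platforms `isa_offset()` is 0 and `hname_offset()` equals the size of one pointer, which is why a pointer to the instance can also be treated as a pointer to its isa.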

At the same time:

typedef struct objc_object HPerson;

This confirms the conclusion reached in the earlier article on alloc in this series on Objective-C object principles:

At the bottom layer, NSObject is objc_object.

You can also look at the underlying definition of Class:

typedef struct objc_class *Class;

So Class is a pointer to a structure.

id is similar:

typedef struct objc_object *id;

That is why id is declared without a *:

id person;

  • Properties

Next, search for HName:

// getter method
static NSString * _I_HPerson_HName(HPerson * self, SEL _cmd) { return (*(NSString **)((char *)self + OBJC_IVAR_$_HPerson$_HName)); }

// setter method
static void _I_HPerson_setHName_(HPerson * self, SEL _cmd, NSString *HName) { (*(NSString **)((char *)self + OBJC_IVAR_$_HPerson$_HName)) = HName; }

This finds the property's getter and setter. Both take two hidden parameters:

HPerson * self // a pointer to the object itself
SEL _cmd // the selector of the method being called

Now look at the return in the getter:

return (*(NSString **)((char *)self + OBJC_IVAR_$_HPerson$_HName));

This adds the member variable's offset (OBJC_IVAR_$_HPerson$_HName) to the address of the object to get the memory address of the member variable, then reads the value stored there.

The setter does the same thing: it computes the memory address of the member variable and assigns to it.
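The same "base address plus offset" access can be sketched in plain C. All names here (`MockPerson`, `MOCK_IVAR_OFFSET_name`, `mock_get_name`, `mock_set_name`) are hypothetical; the constant only plays the role that OBJC_IVAR_$_HPerson$_HName plays in the rewritten code.

```c
#include <stddef.h>

struct MockPerson {          /* hypothetical stand-in for HPerson_IMPL */
    void *isa;
    const char *name;        /* stands in for NSString *_HName */
};

/* Plays the role of OBJC_IVAR_$_HPerson$_HName: the ivar's byte offset. */
static const size_t MOCK_IVAR_OFFSET_name = offsetof(struct MockPerson, name);

/* Getter: base address + offset, viewed as a pointer to the ivar, then read. */
const char *mock_get_name(struct MockPerson *self) {
    return *(const char **)((char *)self + MOCK_IVAR_OFFSET_name);
}

/* Setter: compute the same address and store through it. */
void mock_set_name(struct MockPerson *self, const char *name) {
    *(const char **)((char *)self + MOCK_IVAR_OFFSET_name) = name;
}
```

The cast to `char *` matters: it makes the pointer arithmetic count in bytes, so the offset is applied exactly as stored.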

II. Bit fields and unions

1. Bit fields

Let’s take a look at this structure:

struct HCar1 {
    BOOL front;
    BOOL back;
    BOOL left;
    BOOL right;
};

A BOOL occupies 1 byte, so with four BOOL members and 1-byte alignment, HCar1 occupies 4 bytes.

One byte is 8 bits, so 4 bytes are 32 bits.

Each direction of the car only needs a 0 or a 1, so 4 bits are enough to meet the requirement, and they fit in a single byte.

Using 4 bytes therefore wastes 3 bytes of space.

So how can we optimize this?

We can use bit fields:

struct HCar2 {
    BOOL front: 1;
    BOOL back : 1;
    BOOL left : 1;
    BOOL right: 1;
};

The number after the colon is how many bits the member variable occupies.

So HCar2 is only 1 byte in size.

For a structure made up purely of flags, this cuts memory use to a quarter.
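The size difference can be checked with a small C sketch. It uses C's `bool` in place of Objective-C's BOOL, and the struct names are illustrative. Bit-field layout is implementation-defined, so the 1-byte result is what GCC and Clang typically produce, not a guarantee of the language.

```c
#include <stdbool.h>
#include <stddef.h>

struct Car1 {            /* one full byte per flag */
    bool front;
    bool back;
    bool left;
    bool right;
};

struct Car2 {            /* one bit per flag, packed into a single byte */
    bool front : 1;
    bool back  : 1;
    bool left  : 1;
    bool right : 1;
};

size_t car1_size(void) { return sizeof(struct Car1); }
size_t car2_size(void) { return sizeof(struct Car2); }

/* Bit-field members are read and written like ordinary members. */
bool turn_left_only(void) {
    struct Car2 car = {0};
    car.left = true;
    return car.left && !car.front && !car.back && !car.right;
}
```

One trade-off: you cannot take the address of a bit-field member, so `&car.left` does not compile for `struct Car2`.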

2. Unions

In a structure:

struct HTeacher1 {
    char * name;
    int age;
    double height;
};

Every member variable has its own memory and holds its own value.

But what if the same members are declared in a union (teacher2 below is such a union)?

Then only one member variable holds a valid value at a time. Why?

Print the memory address of each member variable:

(lldb) p &teacher2.name
(char **) $10 = 0x000000030412b3e8
(lldb) p &teacher2.age
(int *) $11 = 0x000000030412b3e8
(lldb) p &teacher2.height
(double *) $12 = 0x000000030412b3e8

The memory addresses are identical.

All member variables of a union share the same memory, so only the most recently assigned member holds a valid value.
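A small C sketch of the comparison. The union definition is an assumption: the article prints addresses from teacher2 but does not show its type, so `Teacher2` below simply reuses the members of HTeacher1.

```c
#include <stddef.h>

struct Teacher1 {        /* every member has its own storage */
    char  *name;
    int    age;
    double height;
};

union Teacher2 {         /* assumed definition: same members, shared storage */
    char  *name;
    int    age;
    double height;
};

/* Returns 1 if every member of the union starts at the same address. */
int union_members_share_address(void) {
    union Teacher2 t;
    return (void *)&t.name == (void *)&t.age
        && (void *)&t.age  == (void *)&t.height;
}

size_t struct_size(void) { return sizeof(struct Teacher1); }
size_t union_size(void)  { return sizeof(union Teacher2); }
```

On a typical 64-bit platform the struct takes 24 bytes (8 + 4 + 4 bytes of padding + 8), while the union takes only 8, the size of its largest member.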

3. Summary

All variables in a struct coexist.

The advantage is that it is "accommodating" and complete: every member keeps its own value.

The disadvantage is that memory allocation is coarse: every member gets its own space whether or not it is used.

The variables in a union are mutually exclusive.

The disadvantage is that it is less "accommodating": only one member is valid at a time.

The advantage is that memory use is finer-grained and more flexible, and it saves memory.

III. Analysis of nonPointerIsa

1. What is nonPointerIsa

In the earlier article on alloc in this series on Objective-C object principles, we explored alloc.

After alloc allocates memory for the object, it binds the class to it, that is, it creates isa:

obj->initIsa(cls);

Follow up:

inline void 
objc_object::initIsa(Class cls)
{
    initIsa(cls, false, false);
}

This is still just a forwarding method; follow it to the full initIsa method:

inline void 
objc_object::initIsa(Class cls, bool nonpointer, UNUSED_WITHOUT_INDEXED_ISA_AND_DTOR_BIT bool hasCxxDtor)
{ 
    ASSERT(!isTaggedPointer()); 

    isa_t newisa(0);

    if (!nonpointer) {
        newisa.setClass(cls, this);
    } else {
        ASSERT(!DisableNonpointerIsa);
        ASSERT(!cls->instancesRequireRawIsa());

#if SUPPORT_INDEXED_ISA
        ASSERT(cls->classArrayIndex() > 0);
        newisa.bits = ISA_INDEX_MAGIC_VALUE;
        // isa.magic is part of ISA_MAGIC_VALUE
        // isa.nonpointer is part of ISA_MAGIC_VALUE
        newisa.has_cxx_dtor = hasCxxDtor;
        newisa.indexcls = (uintptr_t)cls->classArrayIndex();
#else
        newisa.bits = ISA_MAGIC_VALUE;
        // isa.magic is part of ISA_MAGIC_VALUE
        // isa.nonpointer is part of ISA_MAGIC_VALUE
#   if ISA_HAS_CXX_DTOR_BIT
        newisa.has_cxx_dtor = hasCxxDtor;
#   endif
        newisa.setClass(cls, this);
#endif
        newisa.extra_rc = 1;
    }

    // This write must be performed in a single store in some cases
    // (for example when realizing a class because other threads
    // may simultaneously try to use the class).
    // fixme use atomics here to guarantee single-store and to
    // guarantee memory order w.r.t. the class index table
    // ...but not too atomic because we don't want to hurt instantiation
    isa = newisa;
}

This is where isa is created.
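To make the two branches concrete, here is a rough C sketch of what they amount to on x86_64. It is not the runtime's code. The bit positions come from the x86_64 ISA_BITFIELD layout quoted later in this article, the mask is the one used in the LLDB session later on, and `MOCK_ISA_MAGIC_VALUE` is an assumed value for the runtime's ISA_MAGIC_VALUE on x86_64, which the article does not quote. All `mock_` names are hypothetical.

```c
#include <stdint.h>

/* Assumption: value of ISA_MAGIC_VALUE on x86_64 (not quoted in this article).
   It sets the nonpointer bit and the 6-bit magic field. */
#define MOCK_ISA_MAGIC_VALUE   0x001d800000000001ULL
#define MOCK_ISA_MASK          0x00007ffffffffff8ULL  /* the 44 shiftcls bits */
#define MOCK_HAS_CXX_DTOR_BIT  2                      /* third bit from the bottom */
#define MOCK_EXTRA_RC_SHIFT    56                     /* extra_rc is the top 8 bits */

/* !nonpointer branch: isa is just the class address. */
uint64_t mock_init_raw_isa(uint64_t cls_address) {
    return cls_address;
}

/* nonpointer branch: magic value, optional dtor flag, class bits, extra_rc = 1. */
uint64_t mock_init_nonpointer_isa(uint64_t cls_address, int has_cxx_dtor) {
    uint64_t bits = MOCK_ISA_MAGIC_VALUE;
    if (has_cxx_dtor) {
        bits |= 1ULL << MOCK_HAS_CXX_DTOR_BIT;
    }
    bits |= cls_address & MOCK_ISA_MASK;   /* what setClass stores in shiftcls */
    bits |= 1ULL << MOCK_EXTRA_RC_SHIFT;   /* newisa.extra_rc = 1 */
    return bits;
}
```

Under these assumptions, `mock_init_nonpointer_isa(0x0000000100008270ULL, 1)` gives 0x011d800100008275, which matches the isa value printed in the LLDB session later in this article.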

The final statement assigns newisa to isa, so look at its type, isa_t:

union isa_t {
    isa_t() { }
    isa_t(uintptr_t value) : bits(value) { }

    uintptr_t bits;

private:
    // Accessing the class requires custom ptrauth operations, so
    // force clients to go through setClass/getClass by making this
    // private.
    Class cls;

public:
#if defined(ISA_BITFIELD)
    struct {
        ISA_BITFIELD;  // defined in isa.h
    };

    bool isDeallocating() {
        return extra_rc == 0 && has_sidetable_rc == 0;
    }
    void setDeallocating() {
        extra_rc = 0;
        has_sidetable_rc = 0;
    }
#endif

    void setClass(Class cls, objc_object *obj);
    Class getClass(bool authenticated);
    Class getDecodedClass(bool authenticated);
};

So isa_t is a union.

A class pointer takes 8 bytes, that is, 64 bits. Using all of them only to store an address would be wasteful.

So in Objective-C, isa stores more than the class pointer: whether the object is being deallocated, its reference count, whether it is weakly referenced, whether it has associated objects, whether it has a destructor, and so on. Such an isa is called a nonpointer isa, because it is no longer a plain pointer address.

2. Memory layout of isa

So what exactly is stored in isa?

Let’s look at ISA_BITFIELD:

# if __arm64__
// ARM64 simulators have a larger address space, so use the ARM64e
// scheme even when simulators build for ARM64-not-e.
#   if __has_feature(ptrauth_calls) || TARGET_OS_SIMULATOR
#     define ISA_BITFIELD                                                      \
        uintptr_t nonpointer        : 1;                                       \
        uintptr_t has_assoc         : 1;                                       \
        uintptr_t weakly_referenced : 1;                                       \
        uintptr_t shiftcls_and_sig  : 52;                                      \
        uintptr_t has_sidetable_rc  : 1;                                       \
        uintptr_t extra_rc          : 8
#   else
#     define ISA_BITFIELD                                                      \
        uintptr_t nonpointer        : 1;                                       \
        uintptr_t has_assoc         : 1;                                       \
        uintptr_t has_cxx_dtor      : 1;                                       \
        uintptr_t shiftcls          : 33; /*MACH_VM_MAX_ADDRESS 0x1000000000*/ \
        uintptr_t magic             : 6;                                       \
        uintptr_t weakly_referenced : 1;                                       \
        uintptr_t unused            : 1;                                       \
        uintptr_t has_sidetable_rc  : 1;                                       \
        uintptr_t extra_rc          : 19
#   endif

# elif __x86_64__
#   define ISA_BITFIELD                                                        \
      uintptr_t nonpointer        : 1;                                         \
      uintptr_t has_assoc         : 1;                                         \
      uintptr_t has_cxx_dtor      : 1;                                         \
      uintptr_t shiftcls          : 44; /*MACH_VM_MAX_ADDRESS 0x7fffffe00000*/ \
      uintptr_t magic             : 6;                                         \
      uintptr_t weakly_referenced : 1;                                         \
      uintptr_t unused            : 1;                                         \
      uintptr_t has_sidetable_rc  : 1;                                         \
      uintptr_t extra_rc          : 8
      
# else
#   error unknown architecture for packed isa
# endif

So ISA_BITFIELD is a bit field.

Taking arm64 as an example:

nonpointer: whether pointer optimization is enabled for isa. 0 means a plain isa pointer; 1 means isa holds not only the address of the class object but also class information, the object's reference count, and so on.

has_assoc: associated-object flag. 0 means the object has none; 1 means it has.

has_cxx_dtor: whether the object has a C++ or Objective-C destructor. If it does, the destructor logic must run; if not, the object can be freed faster.

shiftcls: the value of the class pointer. With pointer optimization enabled, arm64 uses 33 bits to store the class pointer.

magic: used by the debugger to tell whether the object is a real, initialized object or uninitialized memory.

weakly_referenced: whether the object is, or has ever been, pointed to by an ARC weak variable. Objects without weak references can be freed faster.

deallocating: whether the object is being deallocated. The layout quoted above has no such bit; isa_t::isDeallocating() instead checks extra_rc == 0 && has_sidetable_rc == 0.

has_sidetable_rc: set when the reference count is too large to fit in extra_rc; the overflow is then stored in a side table.

extra_rc: the object's reference count. In older runtime versions the field held the count minus 1 (a count of 10 was stored as 9); in the version quoted above, initIsa sets extra_rc = 1 for a new object, so the field appears to hold the count itself. When the count overflows the field, has_sidetable_rc comes into play.
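The fields can be read back with a union shaped like isa_t. The sketch below uses the x86_64 layout quoted above, because the sample isa value printed later in this article comes from an x86_64 run. `mock_isa_t`, `mock_decode` and `mock_class_address` are hypothetical names, and the code assumes a little-endian GCC or Clang target, where bit-fields are allocated starting from the least significant bit.

```c
#include <stdint.h>

/* Sketch of isa_t with the x86_64 ISA_BITFIELD layout.
   Assumes bit-fields are allocated from the least significant bit upward. */
union mock_isa_t {
    uint64_t bits;
    struct {
        uint64_t nonpointer        : 1;
        uint64_t has_assoc         : 1;
        uint64_t has_cxx_dtor      : 1;
        uint64_t shiftcls          : 44;
        uint64_t magic             : 6;
        uint64_t weakly_referenced : 1;
        uint64_t unused            : 1;
        uint64_t has_sidetable_rc  : 1;
        uint64_t extra_rc          : 8;
    } f;
};

/* Store the raw 64 bits; the fields are then visible through .f */
union mock_isa_t mock_decode(uint64_t raw) {
    union mock_isa_t isa;
    isa.bits = raw;
    return isa;
}

/* shiftcls holds the class address without its 3 low bits, so shift it back.
   The cast widens the 44-bit field before shifting. */
uint64_t mock_class_address(union mock_isa_t isa) {
    return (uint64_t)isa.f.shiftcls << 3;
}
```

Decoding the sample value 0x011d800100008275 this way gives nonpointer = 1, has_assoc = 0, has_cxx_dtor = 1, magic = 0x3b, extra_rc = 1, and a class address of 0x0000000100008270.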

IV. Deriving the class from isa

1. Get class by mask

Print the memory of an object and display its isa in binary:

(lldb) x/4gx p
0x10895e9f0: 0x011d800100008275 0x0000000000000000
0x10895ea00: 0x0000000000000000 0x0000000000000000
(lldb) p/t 0x011d800100008275
(long) $2 = 0b0000000100011101100000000000000100000000000000001000001001110101

Not all 64 bits are used; many of them are still zero.

Printing the class shows that it differs from isa:

(lldb) p/x p.class
(Class) $3 = 0x0000000100008270 LGPerson

So how does isa become the class?

As described in the alignment article of this series on Objective-C object principles, isa & ISA_MASK gives the class:

(lldb) p/x 0x011d800100008275 & 0x00007ffffffffff8ULL
(unsigned long long) $4 = 0x0000000100008270

Why does this work?

Because isa holds more than the class information, the class has to be extracted with a mask.

Placing the mask over isa clears every other bit and exposes only the class information.
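The mask step can be sketched in C. The mask value is the one used in the LLDB command above; `mock_mask_from_layout` is a hypothetical helper that rebuilds it from the x86_64 field widths (3 low bits, then 44 bits of shiftcls) to show where the constant comes from.

```c
#include <stdint.h>

/* x86_64 mask used in the LLDB command above. */
#define MOCK_ISA_MASK 0x00007ffffffffff8ULL

/* Build a mask of `width` one-bits starting at bit `low_bits`.
   Only valid for width < 64. */
uint64_t mock_mask_from_layout(unsigned low_bits, unsigned width) {
    return ((1ULL << width) - 1) << low_bits;
}

/* Keep only the shiftcls bits of isa: the result is the class address. */
uint64_t mock_class_from_isa(uint64_t isa_bits) {
    return isa_bits & MOCK_ISA_MASK;
}
```

`mock_mask_from_layout(3, 44)` equals 0x00007ffffffffff8, and applying the mask to 0x011d800100008275 gives 0x0000000100008270, the same value LLDB printed.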

2. Get class by bit operations

In the initIsa method we saw:

If the isa is not a nonpointer isa, setClass stores the class directly; otherwise the bit-field members are assigned.

So we can also recover the class from the bit field.

The following uses x86_64 as an example:

#   define ISA_BITFIELD                                                        \
      uintptr_t nonpointer        : 1;                                         \
      uintptr_t has_assoc         : 1;                                         \
      uintptr_t has_cxx_dtor      : 1;                                         \
      uintptr_t shiftcls          : 44; /*MACH_VM_MAX_ADDRESS 0x7fffffe00000*/ \
      uintptr_t magic             : 6;                                         \
      uintptr_t weakly_referenced : 1;                                         \
      uintptr_t unused            : 1;                                         \
      uintptr_t has_sidetable_rc  : 1;                                         \
      uintptr_t extra_rc          : 8

shiftcls is the member that stores the class information. It has 3 bits below it and 17 bits above it (64 - 3 - 44 = 17).

So we first shift right by 3 to discard the low 3 bits, then shift left by 20 (3 + 17) to discard the high 17 bits, and finally shift right by 17 to put the remaining bits back in their original position. The result is the class information:

(lldb) x/4gx p
0x108a09980: 0x011d800100008275 0x0000000000000000
0x108a09990: 0x0000000000000000 0x0000000000000000
(lldb) p/x 0x011d800100008275 >> 3
(long) $1 = 0x0023b0002000104e
(lldb) p/x 0x0023b0002000104e << 20
(long) $2 = 0x0002000104e00000
(lldb) p/x 0x0002000104e00000 >> 17
(long) $3 = 0x0000000100008270
(lldb) p/x p.class
(Class) $4 = 0x0000000100008270 LGPerson
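The three shifts from the LLDB session can be written as a small C function. `mock_class_by_shifts` is a hypothetical name. The value is kept in an unsigned 64-bit type so that the right shifts fill with zeros and the left shift simply drops the high bits.

```c
#include <stdint.h>

/* Recover the class address from an x86_64 nonpointer isa using shifts only. */
uint64_t mock_class_by_shifts(uint64_t isa_bits) {
    uint64_t v = isa_bits;
    v >>= 3;    /* drop nonpointer, has_assoc, has_cxx_dtor */
    v <<= 20;   /* 3 + 17: push the 17 high bits off the top */
    v >>= 17;   /* bring shiftcls back to bit position 3 */
    return v;
}
```

For 0x011d800100008275 the intermediate values are 0x0023b0002000104e and 0x0002000104e00000, and the result is 0x0000000100008270, matching both the mask approach and `p/x p.class`.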