Runtime Explained 2.0

The nature of classes and objects

Runtime is one of the biggest differences between Objective-C and C. The Runtime library implements the object-oriented and dynamic language features that OC has and C lacks. As the name suggests, runtime refers to the period after a program has been loaded and while it is running on a computer system, as opposed to compile time. The Runtime itself is a C/C++ library that contains most of the data structures and functions we use in OC. It is open source and can be downloaded from the Apple Open Source website.

This article is based on the objc4-781 source code.

Classes and objects

Classes and objects are the cornerstones of object-oriented languages. Every developer learning an object-oriented language for the first time faces a soul-searching question: what exactly are classes and objects?

In Objective-C, we define a class by declaring it with @interface in a .h file and implementing it with @implementation in a .m file. When we use this class to create an instance, that instance is called an object.

In the main function, we create an object named t1, which is an instance of class Test1.

So in Objective-C, what are the underlying definitions of classes and objects? Let's first convert main.m to main.cpp to see if we can find any clues.

OC to C++

The clang -rewrite-objc main.m command converts main.m into main.cpp, that is, into C++ code. This is a natural place to look, since the Runtime itself contains a lot of C++ code.

Looking at the converted main.cpp, we see what looks like a class definition in the first line.

Remember this struct objc_class. Also note that our build environment is __OBJC2__. This is a key point when reading the source code, because the objc4 code still contains some old OBJC1 code.

Scroll down to find our main function. Above the main function, we can find a definition like this.

From the naming, struct objc_object is the definition of the object in the Runtime, which we will verify later.

Next, let's turn our attention to the main function and see what Test1 *t1 = [[Test1 alloc] init]; has become.

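Note the objc_msgSend function: it is the soul of OC and what makes it a dynamic language. In main.cpp it is declared in the generic form void objc_msgSend(void);, so every call site casts it to the concrete function-pointer type of that call before invoking it.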
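To make the idea of dynamic dispatch concrete, here is a toy model in C++. It is not the real objc_msgSend, which is hand-written assembly with a method cache in front of the lookup; the types ToyClass, ToyObject and the function toy_msgSend are hypothetical. The point it illustrates is that the code to run is chosen at runtime: follow the receiver's isa to its class, then search method lists by selector up the superclass chain.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical, simplified stand-ins for runtime types (not objc4's definitions).
struct ToyObject;
using SEL = const char *;                          // selectors modeled as C strings
using IMP = int (*)(ToyObject *self, SEL _cmd);    // every method receives self and _cmd

struct ToyMethod { SEL name; IMP imp; };

struct ToyClass {
    const ToyClass *superclass;                    // nullptr for a root class
    std::vector<ToyMethod> methods;
};

struct ToyObject { const ToyClass *isa; };

// Walk class -> superclass -> ... and return the first IMP whose selector matches.
IMP toy_lookup(const ToyClass *cls, SEL sel) {
    for (; cls != nullptr; cls = cls->superclass)
        for (const ToyMethod &m : cls->methods)
            if (std::strcmp(m.name, sel) == 0) return m.imp;
    return nullptr;                                // the real runtime would start message forwarding here
}

// The "message send": only the receiver and the selector are known statically.
int toy_msgSend(ToyObject *self, SEL sel) {
    IMP imp = toy_lookup(self->isa, sel);          // start at the class the isa points to
    return imp ? imp(self, sel) : -1;              // -1 stands in for "unrecognized selector"
}

int baseValue(ToyObject *, SEL) { return 1; }
int subValue(ToyObject *, SEL) { return 2; }
```

With a subclass that overrides "value", toy_msgSend(&obj, "value") runs the subclass IMP, while a selector defined only in the superclass is still found by walking up the chain.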

struct objc_object

From the .cpp file we learn that the underlying type of an object is struct objc_object, and looking through the file shows where the saying "everything is an object" comes from: Protocol, NSArray and other common data types, as well as id, are all of type struct objc_object. So let's go to the objc4 source to find out what objc_object is.

As you can see, apart from a number of public functions, objc_object has only one private member: a variable of type isa_t.

Next, let's find out what isa_t is.

In effect, isa_t corresponds to the isa pointer in OBJC1. OBJC2 optimizes and encapsulates the isa pointer further, but its key function remains the same: it holds a value of type Class named cls.

The Class here is a typedef defined a little further up: Class is a pointer to struct objc_class (typedef struct objc_class *Class;).
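The relationships above can be condensed into a simplified C++ sketch. It keeps only the members discussed here: the real objc4-781 isa_t also contains an ISA_BITFIELD struct describing the individual bits, and objc_object has many member functions. The size checks assume a 64-bit target.

```cpp
#include <cassert>
#include <cstdint>

struct objc_class;                       // full definition lives in objc-runtime-new.h
typedef struct objc_class *Class;        // Class is a pointer to struct objc_class

// Simplified isa_t: the same machine word viewed either as a Class or as raw bits.
union isa_t {
    isa_t() : bits(0) {}
    explicit isa_t(uintptr_t value) : bits(value) {}
    Class cls;
    uintptr_t bits;
};

// Simplified objc_object: its only data member is the isa.
struct objc_object {
    isa_t isa;
};
```

Because objc_object holds nothing but this one word, every object's memory starts with its isa, which is what makes the pointer-offset tricks later in this article work.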

At this point we have found the underlying structures of both classes and objects. One key point: OBJC1 and OBJC2 define the class and object structures differently. We should refer to objc-private.h, objc-runtime-new.h and similar headers. objc.h no longer applies, because objc-private.h defines #define OBJC_TYPES_DEFINED 1.

struct objc_class

First, the objc_class structure of the old OBJC1.

Next comes the objc_class structure of the OBJC2 version, which is the structure used by the current version of OC. It differs so much from the OBJC1 objc_class that method lists, property lists and so on are no longer exposed directly in the structure.

In the new version, objc_class inherits from objc_object, so it also keeps an isa pointer. An object's isa pointer points to the class the object belongs to, and a class's isa pointer points to its metaclass.

The key members of struct objc_class are:

  • Class ISA: points to the class's metaclass (inherited from objc_object)
  • Class superclass: points to the parent class
  • cache_t cache: the method cache. A class may have many methods, so after a method is called, the Runtime adds it to the cache to shorten the lookup the next time and make the program run faster
  • class_data_bits_t bits: in the old version the method lists were exposed directly in the structure, but in the new version objc_class separates this data out and hides it in bits. From bits we can get data of type class_rw_t *, and from class_rw_t we can get class_ro_t. In other words, the class data is divided into two parts, class_rw_t and class_ro_t.
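As a rough illustration of what the method cache buys, the sketch below puts a cache in front of a slow method-list search. The real cache_t in objc4-781 is a custom hash table of bucket_t entries (selector plus IMP); std::unordered_map and the names ToyMethodCache and slowLookup are stand-ins chosen here for brevity. Keying by selector pointer mirrors the fact that runtime selectors are unique, so equal selectors share one address.

```cpp
#include <cassert>
#include <cstring>
#include <unordered_map>
#include <utility>
#include <vector>

using SEL = const char *;
using IMP = void (*)();

// Hypothetical toy: a method list plus a cache that remembers earlier lookups.
struct ToyMethodCache {
    std::vector<std::pair<SEL, IMP>> methodList;   // slow path: linear search
    std::unordered_map<SEL, IMP> cache;            // fast path, keyed by selector pointer
    int slowLookups = 0;                           // counts how often the slow path ran

    IMP slowLookup(SEL sel) {
        ++slowLookups;
        for (auto &entry : methodList)
            if (std::strcmp(entry.first, sel) == 0) return entry.second;
        return nullptr;
    }

    IMP lookup(SEL sel) {
        auto hit = cache.find(sel);
        if (hit != cache.end()) return hit->second; // cache hit: no list search
        IMP imp = slowLookup(sel);
        if (imp) cache[sel] = imp;                  // fill the cache after the first call
        return imp;
    }
};

void sayHello() {}
```

Calling lookup twice with the same selector searches the list only once; the second call is answered from the cache.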

Inheritance chain

From the analysis above, the relationship chain between classes and objects is not very different between the old and new versions; the new class structure is simply more optimized and encapsulated. The chain is still object -> class -> metaclass.
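The chain object -> class -> metaclass can be modeled in a few lines. In this hypothetical sketch every node, including the class and the metaclass, begins with an isa field, mirroring the way objc_class inherits its isa from objc_object. The node names are made up for illustration.

```cpp
#include <cassert>

// Every runtime entity starts with an isa, so one node type suffices for the toy.
struct Node {
    const Node *isa;
    const char *name;
};

// Follow the isa pointer `steps` times starting from `start`.
const Node *followIsa(const Node *start, int steps) {
    const Node *n = start;
    for (int i = 0; i < steps && n != nullptr; ++i) n = n->isa;
    return n;
}
```

In the test the metaclass's isa is left as nullptr only to end the toy chain; in the real runtime a metaclass's isa points to the root metaclass.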

The structure of the class

Unlike the OBJC1 code, OBJC2 does not expose properties, methods and so on directly in its class structure. All of this is hidden in the class_data_bits_t variable, where it is divided into class_rw_t and class_ro_t. Later, to reduce memory usage, Apple optimized this structure again and split class_rw_ext_t out of class_rw_t.

struct objc_class

This is the class structure of OBJC2; the meaning of each member was described above. To access class_rw_t, we first need to get class_data_bits_t bits.

Since we can't access bits directly, we use the memory layout of the structure to reach bits in LLDB. isa and superclass are both pointers, so each takes 8 bytes. Based on its members, cache takes 16 bytes, so the offset of bits within the structure should be 8 + 8 + 16 = 32 bytes.
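The 32-byte offset can be checked with offsetof on a mock of the layout. This is a sketch under two assumptions: a 64-bit target, and cache_t laid out as we read it in objc4-781 for x86_64 (an 8-byte bucket pointer, a 4-byte mask and two 2-byte fields). The _sketch types are hypothetical stand-ins, not the runtime's own definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef struct objc_class *Class;

// Stand-in for cache_t: bucket pointer (8) + mask (4) + flags (2) + occupied (2) = 16 bytes.
struct cache_t_sketch {
    void *buckets;
    uint32_t mask;
    uint16_t flags;
    uint16_t occupied;
};

struct class_data_bits_t_sketch { uintptr_t bits; };

// Field order mirrors objc_class: isa (inherited), superclass, cache, bits.
struct objc_class_sketch {
    void *isa;
    Class superclass;
    cache_t_sketch cache;
    class_data_bits_t_sketch bits;
};
```

offsetof(objc_class_sketch, bits) comes out as 0x20, which is why the LLDB steps below add 0x20 to the class address.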

Once we have bits this way, we call its data() method to get class_rw_t.

class_rw_t

Getting class_rw_t was covered above; now let's see what's inside it.

Skipping over a bunch of functions, we find a few familiar ones.

By calling these functions, we should be able to get the methods, properties and so on that the old OBJC1 structure exposed directly.

The ro() function retrieves the corresponding class_ro_t.

class_ro_t

In class_ro_t we can see values such as baseMethodList, baseProtocols, ivars and baseProperties. In OC, instance methods are stored in the class's class_rw_t, class methods are stored in the metaclass, and member variables are stored in class_ro_t. When the program starts, the Runtime builds class_rw_t from class_ro_t; when we add a method dynamically, it is class_rw_t that changes. class_ro_t is const and cannot be modified.
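The read-only/read-write split can be sketched as follows. This is a simplified, hypothetical model: in objc4-781 class_rw_t stores a tagged union (ro_or_rw_ext) and allocates class_rw_ext_t lazily, and method lists hold method_t entries rather than strings. The general idea is the same: compile-time data sits in an immutable ro, and runtime additions are layered on top without touching it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical simplified types; the _sketch suffix marks them as illustrations.
struct class_ro_sketch {                      // fixed at compile time
    const std::vector<std::string> baseMethods;
    const std::vector<std::string> ivars;
};

struct class_rw_ext_sketch {                  // created only once the class is modified
    std::vector<std::string> methods;
};

struct class_rw_sketch {
    const class_ro_sketch *ro;
    std::unique_ptr<class_rw_ext_sketch> ext;

    // Method list as seen at runtime: ro's list unless an ext exists.
    const std::vector<std::string> &methods() const {
        return ext ? ext->methods : ro->baseMethods;
    }

    // Adding a method never touches ro: copy its list into ext first, then append.
    void addMethod(const std::string &name) {
        if (!ext) ext.reset(new class_rw_ext_sketch{ro->baseMethods});
        ext->methods.push_back(name);
    }
};
```

After addMethod, methods() reports the extra entry while ro's baseMethods is unchanged, which is the property that lets class_ro_t stay const.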

LLDB debugging verification

The conclusions above come from reading and analyzing the source code. Next, we use the objc4-781 source and LLDB debugging to verify the structures and conclusions mentioned above.

Let’s start by writing a simple main function

The LGPerson class has a property, a class method, and an instance method.

Get class_data_bits_t

From the earlier analysis, we know that to get the data in the class structure we first need to fetch class_data_bits_t. Because of the memory layout, this value sits at an offset of 8 + 8 + 16 = 32 bytes from the address of the class object, which is 0x20 in hexadecimal.

After entering LLDB debugging, we try to get class_data_bits_t.

As you can see, we successfully fetched a pointer of type class_data_bits_t.

Get class_rw_t

class_rw_t is one of the key structures of a class. Although there is also class_rw_ext_t, it exists as an optimization, so in this article a mention of class_rw_t implicitly includes class_rw_ext_t.

Looking at the structure definition of class_data_bits_t, we can see that there is a public data() method in it, which returns a pointer to class_rw_t via bits & FAST_DATA_MASK.
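The masking in data() can be reproduced in isolation. The sketch assumes the 64-bit value of FAST_DATA_MASK as we read it in objc4-781 (0x00007ffffffffff8); the low three bits of bits hold flags, and the flag values used in the test are arbitrary. class_rw_t is left opaque because only its address matters here.

```cpp
#include <cassert>
#include <cstdint>

struct class_rw_t;                               // opaque here: only the pointer is needed

// 64-bit value; the low bits of `bits` carry flags rather than address bits.
constexpr uintptr_t FAST_DATA_MASK = 0x00007ffffffffff8UL;

struct class_data_bits_t {
    uintptr_t bits;

    // Strip the flag bits to recover the class_rw_t pointer, as data() does.
    class_rw_t *data() const {
        return reinterpret_cast<class_rw_t *>(bits & FAST_DATA_MASK);
    }
};
```

This is also why printing bits raw in LLDB is not enough: the value must be masked (or data() called) before it can be treated as a class_rw_t pointer.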

By calling data() on $2 (the class_data_bits_t pointer), we get the class's class_rw_t.

At this point, we can verify that the conclusion above is correct.

instance methods

Instance methods are stored in the class's class_rw_t, where we can find several related methods.

Let’s first try to get the list of instance methods of the class and see if we can see the instance methods we defined.

Calling methods() returns a method_array_t, which, judging by its definition and structure, is a two-dimensional container. To get at its contents, we need to get its list.
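Why "two-dimensional"? A method_array_t can hold several method lists (for example, the class's own list plus lists attached by categories), and each list holds method entries. The toy below flattens such a structure. The types are simplified, hypothetical stand-ins: objc4's list_array_tt additionally optimizes the common single-list case by storing that list pointer directly instead of an array.

```cpp
#include <cassert>
#include <string>
#include <vector>

using method_list_sketch = std::vector<std::string>;        // one list of method names

struct method_array_sketch {
    std::vector<const method_list_sketch *> lists;          // several lists, hence "2-D"

    // Visit every method in every list, in list order.
    std::vector<std::string> flatten() const {
        std::vector<std::string> all;
        for (const method_list_sketch *list : lists)
            all.insert(all.end(), list->begin(), list->end());
        return all;
    }
};
```

Getting one list out of the outer container, and then one method out of that list, is exactly the two-step walk performed in LLDB below.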

Once we have the list, we find an address inside it, and we read that address next.

As you can see, LLDB prints $5.ptr as a method_list_t *const address, so we now have the method list of the class.

Now that we have the address of the method list, we use the * operator to read the data from the address.

We can see that method_list_t is an entsize_list_tt structure (method_list_t inherits from it). We can find the definition of this structure in the source code.

This structure defines methods for getting the elements of the array.
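The indexing idea behind entsize_list_tt can be sketched with a byte buffer: a 32-bit entsizeAndFlags word, a 32-bit count, then count elements spaced entsize bytes apart. In the runtime, element i is found by pointer arithmetic, (uint8_t *)&first + i * entsize(); this hypothetical sketch uses memcpy instead so it stays free of aliasing issues. The flag mask 0x3 is the value we read for method_list_t in objc4-781; Entry and makeList are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy element; a real method_t would hold name, types and imp.
struct Entry { uint32_t id; uint32_t extra; };

// Buffer laid out like entsize_list_tt: header, then elements `entsize` bytes apart.
struct EntsizeListSketch {
    static constexpr uint32_t FlagMask = 0x3;  // low bits of entsizeAndFlags are flags
    std::vector<uint8_t> storage;

    uint32_t header(std::size_t offset) const {
        uint32_t v;
        std::memcpy(&v, storage.data() + offset, sizeof v);
        return v;
    }
    uint32_t entsize() const { return header(0) & ~FlagMask; }
    uint32_t count() const { return header(4); }

    // Element i starts at the first element plus i * entsize bytes.
    Entry get(uint32_t i) const {
        assert(i < count());
        Entry e;
        std::memcpy(&e, storage.data() + 8 + std::size_t(i) * entsize(), sizeof e);
        return e;
    }
};

// Build a list with the given elements and flag bits (hypothetical helper).
EntsizeListSketch makeList(const std::vector<Entry> &items, uint32_t flags) {
    EntsizeListSketch list;
    uint32_t entsizeAndFlags = uint32_t(sizeof(Entry)) | flags;
    uint32_t count = uint32_t(items.size());
    list.storage.resize(8 + items.size() * sizeof(Entry));
    std::memcpy(list.storage.data(), &entsizeAndFlags, 4);
    std::memcpy(list.storage.data() + 4, &count, 4);
    if (!items.empty())
        std::memcpy(list.storage.data() + 8, items.data(), items.size() * sizeof(Entry));
    return list;
}
```

Storing the element size in the list itself, rather than relying on sizeof, lets the runtime read lists whose entries were laid out by the compiler.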

Obviously, we can call get() to retrieve the contents of entsize_list_tt. But when we call get(), the data read is empty. Why? We then tried big(), even though big() cannot be found in the objc4-781 source. If you know where big() comes from, please point it out in the comments section.

After calling big(), LLDB finally outputs what we want to see.

Our instance method instanceMethod1 is printed successfully, and method_t::big is the classic three-part structure of method_t (name, types, imp).
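The name-types-imp triple can be mirrored in a small sketch. It assumes, for illustration only, that instanceMethod1 returns void and takes no arguments, which on a 64-bit target gives the type encoding "v16@0:8" (void return, 16 bytes of arguments, self at offset 0, _cmd at offset 8). method_t_sketch, ToyObject and invoke are hypothetical; the IMP is stored as a generic function pointer and cast back to its real signature before the call, the same pattern the rewritten objc_msgSend calls use.

```cpp
#include <cassert>
#include <cstring>

struct ToyObject {};                        // stand-in receiver type
using SEL = const char *;                   // selectors modeled as C strings here
using IMP = void (*)();                     // generic function-pointer type, cast before calling

// The three fields every method entry carries.
struct method_t_sketch {
    SEL name;           // selector, e.g. "instanceMethod1"
    const char *types;  // type encoding: return type, frame size, argument types and offsets
    IMP imp;            // pointer to the compiled function
};

static int callCount = 0;
void instanceMethod1Impl(ToyObject *self, SEL _cmd) { (void)self; (void)_cmd; ++callCount; }

// Call an IMP by casting it back to its real signature, (self, _cmd).
void invoke(const method_t_sketch &m, ToyObject *self) {
    auto fn = reinterpret_cast<void (*)(ToyObject *, SEL)>(m.imp);
    fn(self, m.name);
}
```

The types string is what lets the runtime (and tools such as LLDB) know how to call an IMP whose C signature is otherwise erased.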

class methods

We have now found the properties, the instance methods, and the member variables (stored in the class object's class_ro_t), so only the class methods are left. Class methods are actually stored in the metaclass. We know that objc_class inherits from objc_object, so its structure implicitly contains an isa pointer. An object's isa pointer points to the class the object belongs to; where does a class's isa pointer point? To the metaclass.
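The toy below shows why class methods have to live in the metaclass: method lookup always starts at the receiver's isa target, and when the receiver is a class object, that target is the metaclass. The type IsaNode and the method name classMethod1 are hypothetical, and the superclass chain is omitted to keep the focus on the isa step.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Instances, classes and metaclasses are all modeled as nodes that start with an isa.
struct IsaNode {
    const IsaNode *isa;                  // instance -> class, class -> metaclass
    std::vector<std::string> methods;    // empty for plain instances
};

// Lookup always begins at the receiver's isa target, whatever kind of receiver it is.
bool responds(const IsaNode &receiver, const std::string &sel) {
    const IsaNode *target = receiver.isa;
    if (target == nullptr) return false;
    for (const std::string &m : target->methods)
        if (m == sel) return true;
    return false;
}
```

An instance finds instanceMethod1 in its class, while the class object itself finds classMethod1 in the metaclass and not the instance method.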

class -> metaclass is the only link we have not verified yet. Next, we use LLDB to find the metaclass.

The first step is to find the isa pointer of the class object.

objc_object has a function called ISA().

Let's look at the implementation of this function.

On the platforms we are using, it simply returns (Class)(isa.bits & ISA_MASK): the class pointer is obtained by masking isa.bits with ISA_MASK.
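The same masking can be done by hand, which is what the LLDB steps below do. This sketch uses the x86_64 ISA_MASK quoted in this article (arm64 uses a different mask). On x86_64, bit 0 of a nonpointer isa is the nonpointer flag, and bits 47 and up hold fields such as magic and the extra reference count; the class address and the extra bit values in the test are arbitrary.

```cpp
#include <cassert>
#include <cstdint>

struct objc_class;
typedef struct objc_class *Class;

// x86_64 value of ISA_MASK: keeps bits 3..46, where the class address lives.
constexpr uintptr_t ISA_MASK = 0x00007ffffffffff8ULL;

union isa_t {
    uintptr_t bits;
    Class cls;
};

// Mirrors objc_object::ISA() on these platforms: mask off the non-address bits.
Class classFromIsa(isa_t isa) {
    return reinterpret_cast<Class>(isa.bits & ISA_MASK);
}
```

In LLDB the equivalent is taking the first 8-byte value printed by x/4gx and ANDing it with 0x00007ffffffffff8.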

We use x/4gx to read the memory at the class object's address. The first 8-byte value is the class's isa.

We AND that value with ISA_MASK (0x00007ffffffffff8ULL) and print the result with po. The output is indeed LGPerson, but the address is clearly different from the one returned by [objC2 class], which shows that this is the address of the metaclass.

The rest of the process is the same as before: add the 0x20 offset to this address to get the metaclass's class_data_bits_t, then call data() to get the relevant data.

Here we successfully found the class method, which shows that we located the metaclass correctly and that class methods are indeed stored in the metaclass.

In fact, class methods are originally stored in the metaclass's class_ro_t, so by printing the contents of class_ro_t we can also find the class method definition.

Final thoughts

I have read the underlying source code for classes and objects many times before and tried to verify the theory with LLDB, but this is the first time I have systematically documented and run through the whole verification. I think it is very important to understand the nature of classes and objects; black-magic techniques such as Method Swizzling are built on that understanding. Once you understand the structure of classes and objects, questions such as why a Category cannot add member variables at runtime answer themselves. This article took a long time, most of it spent on LLDB verification, but it was well worth it in the end.

Corrections for any mistakes or omissions are welcome.


Tino Wu.

more at tinowu.top