Before reading this article, you should be familiar with:
1. The basic concepts of compilers and linkers
2. What LLVM is
3. Binary and hexadecimal
4. The basic structure of executable formats (ELF, Mach-O)
5. The libobjc source code and the Objective-C language
Getting into the topic: __objc_nlclslist
To find out from a Mach-O file which classes implement +load, my first instinct was to search the symbol table for +[XXXXX load]. There is actually a simpler, more direct approach: the Mach-O file has a section dedicated to recording the classes that implement +load. This section, __objc_nlclslist, is the list of non-lazily loaded classes. The relationship between +load and class loading is not covered here.
Having answered that question, I had a couple of follow-up questions:
1. At what stage is this section produced?
2. Who detects the +load method, and who records the class in the section?
Before experimenting, here are my guesses (they may turn out to be wrong):
1: Each source file is assembled into a relocatable object file; the linker then merges sections with the same name, finally producing the __objc_nlclslist section of the executable.
2: During compilation, the compiler "detects" that a class implements +load and records it in the corresponding __objc_nlclslist section. In other words, this is the compiler's job.
Let's test these guesses.
I wrote a demo containing a class, SSObject, that implements +load.
Viewing the final executable in MachOView:
The __objc_nlclslist section clearly exists and contains a single entry.
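What MachOView shows at the section level can be approximated in code. The C sketch below redeclares a simplified copy of the 64-bit section header (modeled on struct section_64 in <mach-o/loader.h>; the name section64 and both helper functions are hypothetical) and counts the entries in __objc_nlclslist, assuming each entry is one 8-byte class pointer. Note that "__objc_nlclslist" is exactly 16 characters, so the name field has no NUL terminator and must be compared with a bounded strncmp.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified mirror of struct section_64 from <mach-o/loader.h>,
 * redeclared here (assumption) so the sketch builds on any platform. */
struct section64 {
    char     sectname[16];   /* NOT NUL-terminated if the name is 16 chars */
    char     segname[16];
    uint64_t addr;           /* virtual address of the section */
    uint64_t size;           /* size in bytes */
    uint32_t offset, align, reloff, nreloc;
    uint32_t flags, reserved1, reserved2, reserved3;
};

/* Look up a section by segment and section name. The bound of 16 matters:
 * "__objc_nlclslist" fills all 16 bytes, leaving no terminator. */
const struct section64 *find_section(const struct section64 *sects, size_t n,
                                     const char *seg, const char *sect)
{
    for (size_t i = 0; i < n; i++)
        if (strncmp(sects[i].segname, seg, 16) == 0 &&
            strncmp(sects[i].sectname, sect, 16) == 0)
            return &sects[i];
    return NULL;
}

/* Each entry is one 8-byte class pointer on a 64-bit target. */
uint64_t class_entry_count(const struct section64 *s)
{
    return s ? s->size / 8 : 0;
}
```

In the demo, a header for __objc_nlclslist with size 8 would give exactly one entry. Newer toolchains may place these sections in __DATA_CONST rather than __DATA, so a real lookup should not hard-code the segment name.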
Next, the __objc_classlist section:
__objc_classlist holds the same value as __objc_nlclslist. Given the demo project, both sections point to SSObject, and __objc_nlclslist is a subset of __objc_classlist.
Let's read the entry by hand: 0x0000000100008130 (the bytes appear in reverse order because the file is little-endian).
(All addresses in the Mach-O file are virtual addresses; this won't be repeated below.)
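Reading the bytes "backwards" is simply little-endian decoding. Here is a minimal C sketch, assuming the 8 raw bytes are copied straight from the file (the function name read_le64 is hypothetical):

```c
#include <stdint.h>

/* Decode 8 bytes stored little-endian (least significant byte first),
 * as in Mach-O files for x86_64 and arm64. */
uint64_t read_le64(const uint8_t b[8])
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)   /* start from the most significant byte */
        v = (v << 8) | b[i];
    return v;
}
```

Decoding the bytes 30 81 00 00 01 00 00 00 this way yields 0x0000000100008130.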
Address 0x8130 is the start of a class structure. According to the memory layout, the field 4*8 bytes into it points to the ro structure: 0x00000001000080B0, the start of class_ro_t. Following that address further down:
80A8 + 8 = 80B0. Offsetting ro by 4+4+4+4+8 bytes gives the name field, whose value 0x00000100003f9B is also the start of the __objc_classname section (see the libobjc source for these structure layouts).
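The two offsets follow from the 64-bit layouts of objc_class and class_ro_t in libobjc. The sketch below flattens those structures into plain integer fields (an assumption; the real definitions use cache_t, class_data_bits_t, unions, and pointers) and checks the offsets at compile time. The struct names and the helpers ro_field_addr and name_field_addr are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Flattened 64-bit layout modeled on objc_class. */
struct objc_class_sketch {
    uint64_t isa;
    uint64_t superclass;
    uint64_t cache_word0;      /* cache_t is 16 bytes in total */
    uint64_t cache_word1;
    uint64_t bits;             /* on disk: address of class_ro_t */
};

/* Flattened 64-bit layout modeled on class_ro_t. */
struct class_ro_sketch {
    uint32_t flags;
    uint32_t instanceStart;
    uint32_t instanceSize;
    uint32_t reserved;         /* present only on 64-bit targets */
    uint64_t ivarLayout;
    uint64_t name;             /* address of the C string in __objc_classname */
};

/* isa + superclass + 16-byte cache = 4 * 8 bytes */
_Static_assert(offsetof(struct objc_class_sketch, bits) == 4 * 8, "bits offset");
/* 4 + 4 + 4 + 4 + 8 bytes */
_Static_assert(offsetof(struct class_ro_sketch, name) == 4 + 4 + 4 + 4 + 8,
               "name offset");

/* Address of the field holding the class_ro_t pointer, given a class address. */
uint64_t ro_field_addr(uint64_t class_addr)
{
    return class_addr + offsetof(struct objc_class_sketch, bits);
}

/* Address of the name field, given a class_ro_t address. */
uint64_t name_field_addr(uint64_t ro_addr)
{
    return ro_addr + offsetof(struct class_ro_sketch, name);
}
```

For the class at 0x100008130, the ro pointer field sits at 0x100008150; for a class_ro_t at 0x1000080B0, the name field sits at 0x1000080C8.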
That address holds the class name SSObject (you can confirm it from the ASCII bytes). Other fields can be looked up the same way, so I won't go through them here.
So we have verified __objc_nlclslist by reading its entry, locating the corresponding class, and following offsets to the class name in the Mach-O file. Next, let's look at the relocatable file SSObject.o to see whether anything there is worth noting.
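The whole manual walk (list entry → class → class_ro_t → name) fits in one function. This is a sketch under simplifying assumptions: the image is treated as one flat buffer in which the virtual address base maps to offset 0 (real files need per-segment address translation), the low 3 bits of the class data pointer are masked off as flag bits (modeled loosely on the runtime's FAST_DATA_MASK, whose exact value varies by architecture), and struct image and class_name_from_entry are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat view of a Mach-O image: virtual address `base` maps to
 * bytes[0]. Real files need per-segment vmaddr -> file offset translation. */
struct image {
    uint64_t base;
    const uint8_t *bytes;
    size_t len;
};

/* Read a little-endian 64-bit value at a virtual address; 0 if out of range. */
static uint64_t read_le64_at(const struct image *img, uint64_t vmaddr)
{
    if (vmaddr < img->base || img->len < 8 || vmaddr - img->base > img->len - 8)
        return 0;
    const uint8_t *p = img->bytes + (vmaddr - img->base);
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Walk: list entry -> class -> class_ro_t -> name, using the 64-bit offsets
 * from the text (data pointer at 4*8, name at 4+4+4+4+8). The low 3 bits of
 * the data pointer are masked as flag bits (assumption). Returns NULL if any
 * step leaves the image; assumes the name is NUL-terminated inside it. */
const char *class_name_from_entry(const struct image *img, uint64_t entry_vmaddr)
{
    uint64_t cls = read_le64_at(img, entry_vmaddr);
    if (cls == 0)
        return NULL;
    uint64_t ro = read_le64_at(img, cls + 4 * 8) & ~(uint64_t)7;
    if (ro == 0)
        return NULL;
    uint64_t name = read_le64_at(img, ro + 4 + 4 + 4 + 4 + 8);
    if (name < img->base || name - img->base >= img->len)
        return NULL;
    return (const char *)img->bytes + (name - img->base);
}
```

Applied to the demo's executable, this follows the same chain traced by hand above: 0x100008130, then 0x1000080B0, then the name pointer, ending at the string SSObject.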
The __objc_nlclslist section is there as well, but without its actual data? Don't panic: the .o file is a relocatable file, and the sections that will later be merged don't contain correct addresses until relocation is complete. So let's look at the Relocations section, which stores the relocation information:
As you can see, __objc_nlclslist does need to be relocated. The details of the relocation process are not expanded here.
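This is why the .o shows no meaningful data: the slot in __objc_nlclslist typically holds only an addend (often zero when the relocation refers to a symbol), and the linker writes the final address once it is known. The sketch below models a 64-bit absolute ("unsigned") Mach-O relocation, where the addend is stored in place in the section bytes. struct reloc_sketch is a simplified, hypothetical stand-in for relocation_info, which really packs its fields into bit-fields, and the symbol address table is also an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for Mach-O relocation_info. */
struct reloc_sketch {
    uint32_t r_address;    /* offset of the 8-byte slot within the section */
    uint32_t r_symbolnum;  /* index into sym_addrs */
};

/* Apply absolute 64-bit relocations with in-place addends:
 * final value = symbol address + addend already stored in the slot.
 * Returns 0 on success, -1 on a malformed entry. */
int apply_relocs(uint8_t *sect, size_t sect_len,
                 const struct reloc_sketch *relocs, size_t nrelocs,
                 const uint64_t *sym_addrs, size_t nsyms)
{
    for (size_t r = 0; r < nrelocs; r++) {
        uint32_t off = relocs[r].r_address;
        if (relocs[r].r_symbolnum >= nsyms || sect_len < 8 || off > sect_len - 8)
            return -1;
        uint64_t addend = 0;
        for (int i = 7; i >= 0; i--)             /* read little-endian addend */
            addend = (addend << 8) | sect[off + i];
        uint64_t value = sym_addrs[relocs[r].r_symbolnum] + addend;
        for (int i = 0; i < 8; i++)              /* write back little-endian */
            sect[off + i] = (uint8_t)(value >> (8 * i));
    }
    return 0;
}
```

With a zeroed 8-byte slot and a hypothetical final class address of 0x100008130, the slot ends up holding the bytes 30 81 00 00 01 00 00 00, which is exactly what the executable's __objc_nlclslist contained.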
This exploration shows the overall life of the __objc_nlclslist section: it is generated in the corresponding .o file at compile time, then merged and relocated by the linker into the "aggregate" __objc_nlclslist section of the final Mach-O executable.
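The merge step can be pictured as the linker concatenating each object's __objc_nlclslist in link order and remembering where each input landed, so that each relocation (whose r_address is relative to its input section) can be applied at the new position. This is a hypothetical model with illustrative names (struct sect_input, merge_sections); it assumes every input is a whole number of 8-byte entries, and real linkers also handle alignment, dead stripping, and ordering rules that are ignored here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One object file's raw __objc_nlclslist bytes (a multiple of 8). */
struct sect_input {
    const uint8_t *data;
    size_t len;
};

/* Concatenate input sections in link order and record where each starts, so
 * relocations from input i can later be applied at starts[i] + r_address.
 * Returns the merged size, or (size_t)-1 if `out` is too small. */
size_t merge_sections(const struct sect_input *in, size_t n,
                      uint8_t *out, size_t cap, size_t *starts)
{
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i].len > cap || pos > cap - in[i].len)
            return (size_t)-1;
        if (in[i].len > 0)
            memcpy(out + pos, in[i].data, in[i].len);
        starts[i] = pos;
        pos += in[i].len;
    }
    return pos;
}
```

For example, merging an 8-byte section (one non-lazy class) with a 16-byte section (two non-lazy classes) gives a 24-byte output in which the second input starts at offset 8.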
How does the compilation stage produce sections?
Let's look at SSObject.cpp, generated from the source by clang (clang -rewrite-objc).
You can see that the sections __DATA,__objc_const, __DATA,__objc_data, and __DATA,__objc_classlist are specified in the code.
From these, you can find the corresponding structures in the code.
At the end, the .cpp defines a static array holding a single element. The non-lazy classes in the current file are stored in this static array, which confirms that the compiler already knows at compile time which classes are non-lazy.
Note, however, that the .cpp does not put anything into __DATA,__objc_nlclslist.
Next, SSObject.ll:
A note on the intermediate products of clang's compilation pipeline: .cpp → .ll → .bc → .s → .o
The .ll code shows that the compiler already knows which classes go into the non-lazy __objc_nlclslist section, consistent with the conclusion above. How is the .ll code generated? Broken down, the steps are preprocessing, lexical analysis, syntax analysis, semantic analysis, and intermediate code generation.
These steps are performed by clang, the LLVM front end, so let's look at its source code.
Given this pipeline, it is reasonable to infer that the compiler decides what to write into the section somewhere between the end of semantic analysis and intermediate code generation. The .cpp output already showed that the non-lazy classes are determined during compilation; the GenerateClass function in CGObjCMac.cpp shows where:
As you can see, when generating a class, clang checks whether the class is non-lazy and, if so, adds it to the DefinedNonLazyClasses array.
Then, via FinishNonFragileABIModule() → AddModuleClassList(), DefinedNonLazyClasses is written to the __objc_nlclslist section.
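The decision described above can be modeled in a few lines of C. This is a hypothetical, heavily simplified model, not clang's code: struct class_impl, is_non_lazy, and collect_non_lazy are illustrative names, and "non-lazy" is reduced to "defines a class method named load" (recent clang versions also treat classes marked with the objc_nonlazy_class attribute as non-lazy). In the real compiler, the collected list becomes a global array of class pointers placed in the __objc_nlclslist section.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one @implementation: its name and class methods. */
struct class_impl {
    const char *name;
    const char *const *class_methods;  /* NULL-terminated, or NULL if none */
};

/* Simplified non-lazy check: does the class define +load? */
static int is_non_lazy(const struct class_impl *c)
{
    for (const char *const *m = c->class_methods; m && *m; m++)
        if (strcmp(*m, "load") == 0)
            return 1;
    return 0;
}

/* Collect non-lazy classes in definition order (cf. DefinedNonLazyClasses).
 * Writes at most `cap` pointers to `out` and returns how many were written. */
size_t collect_non_lazy(const struct class_impl *classes, size_t n,
                        const struct class_impl **out, size_t cap)
{
    size_t k = 0;
    for (size_t i = 0; i < n && k < cap; i++)
        if (is_non_lazy(&classes[i]))
            out[k++] = &classes[i];
    return k;
}
```

For the demo, only SSObject defines +load, so the collected list has one element, matching the single entry seen in both the .o file and the final executable.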
At this point, we have a complete picture of how this section is generated.
Follow-up: having explored one section, the generation of other sections can be understood by analogy, at least to some extent.
(This article was originally written by the author)