Young and old are all gathered here, and the worthy have all arrived. (Wang Xizhi, Preface to the Orchid Pavilion Collection)

Object files

Object file structure

Programmers write source code, whereas a computer runs machine instructions that the CPU can recognize, so a series of tools is needed to convert source code into machine instructions. The conversion goes through two main stages: compilation and linking. Compilation converts each source file into an intermediate object file, whose suffix is .o. On iOS an object file is also a Mach-O file: the filetype field of struct mach_header describes the type of the file, and for an object file its value is MH_OBJECT. The layout and content of an object file are very similar to those of an executable file. The sections of the __TEXT segment in a compiled object file hold the binary code, that is, the compiled machine instructions. The layout of an object file is shown below:

Relocation table

The compiler processes one source file at a time, independently of the others. Programs usually reference functions, class methods, and global variables that are defined in other source files or in dynamic libraries, so the addresses of these external references cannot be determined at compile time. In the generated object file, the operand of every instruction that calls an external function, and the address of every external global variable symbol, are therefore set to 0. These operands have to be adjusted later, at link time, in a step called relocation. To make that possible, the compiler creates a relocation table in the object file for each section that contains external symbol references. Each entry in the table records the location of one instruction or data access that needs to be relocated, together with the external symbol it references, so that the linker can update it. The diagram below shows the structure:

Now suppose there is a source file test.m in the project, and its content is as follows:

int testfn(NSString *str)
{
      return [str length];
}

This source file contains an OC method call, [str length], which is converted at compile time into a call to objc_msgSend. Because objc_msgSend is defined in the dynamic library libobjc.dylib, it is an external symbol from the point of view of test.m. When it generates the call instruction, the compiler cannot determine the offset of objc_msgSend relative to the current instruction, so it cannot determine the operand of the call. In the call instruction at 0x00000094 above, only the opcode is filled in and the operand is temporarily set to 0.

To make it possible to relocate all external symbol references at link time, the section structure that describes the machine code in __text carries two relocation fields:

// The 64-bit variant is struct section_64.
struct section { /* for 32-bit architectures */
	char		sectname[16];	/* name of this section */
	char		segname[16];	/* segment this section goes in */
	uint32_t	addr;		/* memory address of this section */
	uint32_t	size;		/* size in bytes of this section */
	uint32_t	offset;		/* file offset of this section */
	uint32_t	align;		/* section alignment (power of 2) */
	uint32_t	reloff;		/* file offset of relocation entries */
	uint32_t	nreloc;		/* number of relocation entries */
	uint32_t	flags;		/* flags (section type and attributes)*/
	uint32_t	reserved1;	/* reserved (for offset or index) */
	uint32_t	reserved2;	/* reserved (for count or sizeof) */
};

The reloff and nreloc fields of this structure describe everything that needs to be relocated in the section. The “Relocations Offset” and “Number of Relocations” entries in the figure above show that the relocation table is at offset 0x116c in the file and that three entries need to be relocated. Each entry in the relocation table is a structure:

struct relocation_info {
   int32_t	r_address;	/* offset in the section to what is being
				   relocated */
   uint32_t     r_symbolnum:24,	/* symbol index if r_extern == 1 or section
				   ordinal if r_extern == 0 */
		r_pcrel:1, 	/* was relocated pc relative already */
		r_length:2,	/* 0=byte, 1=word, 2=long, 3=quad */
		r_extern:1,	/* does not include value of sym referenced */
		r_type:4;	/* if not 0, machine specific relocation type */
};

This structure is defined in <mach-o/reloc.h>, and I will describe it in more detail in a later article. For now you only need to understand that it mainly records the location of the instruction that needs to be relocated and the externally referenced symbol, as shown in the figure above.
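To make the bit-field layout of a relocation entry more concrete, here is a minimal sketch in C that decodes the 8 raw bytes of one entry. It assumes a little-endian object file in which r_symbolnum occupies the low 24 bits of the second word (the usual layout for little-endian bit-fields); the decoded_reloc type and the function names are hypothetical and are not part of <mach-o/reloc.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of one 8-byte relocation entry. */
typedef struct {
    int32_t  address;    /* r_address: offset within the section */
    uint32_t symbolnum;  /* r_symbolnum: symbol index when is_extern == 1 */
    unsigned pcrel;      /* r_pcrel */
    unsigned length;     /* r_length: 0=byte, 1=word, 2=long, 3=quad */
    unsigned is_extern;  /* r_extern */
    unsigned type;       /* r_type: machine-specific relocation type */
} decoded_reloc;

/* Read a 32-bit little-endian value from raw bytes. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode one entry. Assumption: little-endian bit-field layout,
   so the fields are packed starting from the least significant bit. */
decoded_reloc decode_reloc(const uint8_t raw[8])
{
    assert(raw != NULL);
    uint32_t word1 = read_le32(raw + 4);
    decoded_reloc r;
    r.address   = (int32_t)read_le32(raw);
    r.symbolnum = word1 & 0x00FFFFFFu;
    r.pcrel     = (word1 >> 24) & 0x1u;
    r.length    = (word1 >> 25) & 0x3u;
    r.is_extern = (word1 >> 27) & 0x1u;
    r.type      = (word1 >> 28) & 0xFu;
    return r;
}
```

Feeding it the bytes 94 00 00 00 05 00 00 2D would describe a PC-relative, 4-byte, external relocation at section offset 0x94 that refers to symbol index 5.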

A brief look at what linking does

Once the compiler has compiled all the source files, the next step is linking. The main job of linking is to merge the contents of identically named segments and sections from all the object files into a single executable file. Because the location of every referenced symbol is now known, the instructions in the object files that need relocation are adjusted at the same time. The linker also analyzes the dependencies between object files: within each section of the executable, the contents of object files that depend on nothing else are placed first, followed by the contents of the object files that depend on them.

Base address relocation (rebase)

Another important task at link time is to add rebase information. The reason is that the base address at which the executable and each dynamic library it references are loaded differs every time the process runs; it is randomized. Many places in our source code or in the system implementation store absolute address values. For example, the IMP part of each entry in the method list of an OC class in the runtime is a function address pointer. When a program is compiled and linked, a default virtual base address is assigned to the generated executable or dynamic library, and all absolute address values in the generated code are computed against this virtual base address. The default base address can be found in the Mach-O executable in the LC_SEGMENT or LC_SEGMENT_64 load command for the segment named __TEXT: the vmaddr member of that segment's struct segment_command holds the default load address. In general, the default base address of an executable is 0x100000000.

Although the base address is fixed when the program is generated, the address at which the program is actually loaded into memory differs on every launch and is a random value. The difference between the actual load address and the base address chosen at build time is called the slide. Because many address values in the program are based on the build-time virtual base address, these addresses need to be adjusted when the program is loaded. To support rebasing, the Mach-O file contains an LC_DYLD_INFO or LC_DYLD_INFO_ONLY load command, which is described by struct dyld_info_command. The rebase_off and rebase_size fields of this structure give the file offset of the rebase table and its size in bytes, respectively. The rebase table records everything that needs to be rebased, so that when the process is loaded these values can be adjusted by the slide between the default base address and the actual load address. Here are the contents of the rebase table:

LC_DYLD_INFO_ONLY contains not only the addresses that need to be rebased but also weak binding and lazy binding information. Each rebase entry records the operation (opcode) to perform, the segment that contains the address to be rebased, and the offset within that segment. I will cover the details of rebasing in a later article and will not go into them here.
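The arithmetic behind rebasing can be sketched in a few lines of C. This is a deliberately simplified illustration: it assumes the pointer slots that need fixing have already been collected into an array, whereas the real loader walks the opcode stream of the rebase table to find them. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* slide = actual load address - default (link-time) base address */
int64_t compute_slide(uint64_t preferred_base, uint64_t actual_base)
{
    return (int64_t)(actual_base - preferred_base);
}

/* Add the slide to every pointer-sized slot that needs rebasing.
   Assumption: the caller has already located the slots. */
void apply_rebase(uint64_t *slots, size_t count, int64_t slide)
{
    assert(slots != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        slots[i] += (uint64_t)slide;
}
```

For example, with a default base of 0x100000000 and an actual load address of 0x104A2C000, the slide is 0x4A2C000, and a stored function pointer 0x100003F00 becomes 0x104A2FF00 after rebasing.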

Static library code linking rules

The linker starts with the object files in the main program project: all of them are linked into the executable, regardless of whether their code is referenced or called. During linking, if a symbol is not defined in the main program project, the linker looks for it in the dynamic and static libraries that the project imports. If the symbol is defined in a dynamic library, stub code is generated for it (assuming the symbol is a function) and the reference is recorded in the import symbol table, so that the real function address can be bound dynamically when the program runs. If the symbol is defined in a static library, it is handled as follows:

  • By default, static libraries are linked one object file at a time. As soon as any symbol defined in an object file is referenced by the main program, all the code in that object file is linked into the executable. If that object file in turn references symbols defined in other object files, those are linked recursively in the same way. An object file in the static library whose code is not referenced anywhere is not linked into the executable.
  • The method list of an OC class is built at compile time, but calls to its methods are resolved dynamically at run time. A method call on an OC class defined in a static library is therefore not treated as a symbol reference and does not cause anything to be linked. Linking happens only when the OC class itself is referenced in the code, in which case all the methods of that class are linked into the executable (because the method list was already built at compile time). In other words, either all the methods of an OC class in the static library are linked into the executable or none of them are.

Suppose a static library defines an OC class named CA:

// This class defines two methods.
@interface CA : NSObject
- (void)fn1;
- (void)fn2;
@end

// Suppose the CB class is also defined in the same file.
@interface CB : NSObject
@end

Suppose there are two places in the main program where the CA class defined in the static library will be used:

// Although CA appears as a parameter type and one of its methods is called,
// this does not link in the CA class, because the call is an indirect
// method call that is resolved at run time.
void foo1(CA *p)
{
    [p fn1];
}

// Without the foo2 function, the code of the CA class would not be linked
// into the executable.
void foo2()
{
    // Only explicitly creating an object from the CA class counts as a
    // reference to CA. This links all the methods of CA into the executable,
    // including the implementation of fn2, even though fn2 is never called.
    CA *p = [CA new];
    [p fn1];
}

int main()
{
    foo1(nil);
    foo2();
    return 0;
}

Since CB and CA are implemented in the same .m file, CB is still linked into the executable under the file-by-file rule above even though it is never referenced. It would be left out only if CB and CA were implemented in different files.

  • By default, category methods defined for any OC class in a static library are not linked into the executable, even if they are called in the main program, because, as noted above, OC method calls are resolved at run time rather than at compile time. The exception is when the -ObjC option is added to Other Linker Flags in the main project. With this option, all OC methods defined in the static library are linked into the executable, regardless of whether the class is referenced and whether the method is a category method.

  • If a C function defined in a static library is not referenced anywhere, it is not linked into the executable. If other symbols in the same file are referenced, however, the function is linked in even though it is not referenced itself, under the file-by-file linkage rule.

  • If an ordinary member function of a C++ class defined in a static library is not referenced anywhere, it is not linked into the executable. A virtual function, however, is linked in as soon as the class is referenced, even if the function itself is never called, because virtual functions take part in building the vtable at compile time. If other symbols in the same file are referenced, the member functions of C++ classes defined in that file are linked into the executable under the file-by-file linkage rule.

  • If a Swift class defined in a file of the static library is not referenced anywhere, it is not linked into the executable. If the class itself or any of its methods is referenced, all the methods defined in the class are linked in. Methods defined in an extension of a Swift class are linked in together with the class once the class is referenced, provided the extension is defined in the same file as the class. If the extension is defined in a different file, its methods are linked in only when they are actually called.

Unlike OC method calls, Swift method calls are not resolved at run time by message dispatch. Swift implements polymorphism with a mechanism similar to C++ virtual functions, and its calls work much like virtual function calls. Methods defined in a Swift extension are called directly by function address, so they behave very much like ordinary C functions.

  • If we add the -all_load option to Other Linker Flags, the main program project links all the code in all static libraries into the executable, regardless of the language the code is written in and whether or not it is referenced or called. If we only want to link in all the code of one particular static library, we can instead add -force_load followed by the path of that static library to Other Linker Flags.

This is why calling a category method defined in a static library without -ObjC or -all_load raises an unrecognized selector exception. The drawback of these two options, however, is that the classes in the static library are linked into the executable whether or not they are referenced, which increases the size of the executable.

  • We can turn on the DEAD_CODE_STRIPPING (Dead Code Stripping) switch in the main program project to optimize the code in the executable. Note that this optimization runs after linking is complete. When it is on, the linker removes from the executable all C functions and ordinary C++ member functions that are never called. Uncalled member methods of OC classes, member methods of Swift classes, and virtual functions of C++ classes are not removed. The switch is on by default in Xcode.

As the rules above show, linking against static libraries can reduce the size of the executable. An application may reference third-party static libraries that are very large (a map SDK, for example, can be hundreds of megabytes), yet the executable generated for the application is nowhere near that large. Applications today tend to integrate many features, and some large ones have reached hundreds of megabytes. Downloading and installing such applications takes a long time, consumes the user's network traffic, and can even affect startup time.
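The default file-by-file rule described above can be modelled with a small C sketch. It is only a toy model of the rule as stated in this article, not the linker's actual algorithm: each object file in the library is represented by a hypothetical member structure listing the symbols it defines and references, and pull_in marks a member as linked when one of its symbols is needed, then recursively resolves that member's own references.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 4

/* Hypothetical model of one object file inside a static library. */
typedef struct {
    const char *name;
    const char *defined[MAX_SYMS];    /* symbols it defines, NULL-terminated */
    const char *undefined[MAX_SYMS];  /* symbols it references, NULL-terminated */
} member;

/* Does this member define the given symbol? */
bool defines(const member *m, const char *sym)
{
    for (size_t i = 0; i < MAX_SYMS && m->defined[i] != NULL; i++)
        if (strcmp(m->defined[i], sym) == 0)
            return true;
    return false;
}

/* Link the member that defines sym (if any), then recursively
   resolve every symbol that member references. */
void pull_in(const member *lib, size_t n, bool *linked, const char *sym)
{
    assert(lib != NULL && linked != NULL && sym != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!defines(&lib[i], sym))
            continue;
        if (linked[i])
            return;               /* already linked, nothing more to do */
        linked[i] = true;         /* the whole object file comes in */
        for (size_t j = 0; j < MAX_SYMS && lib[i].undefined[j] != NULL; j++)
            pull_in(lib, n, linked, lib[i].undefined[j]);
        return;
    }
}
```

With a library containing a.o (defines _fnA and _unusedA, references _fnB), b.o (defines _fnB), and c.o (defines _fnC), a reference to _fnA from the main program links a.o and b.o, including the never-called _unusedA, while c.o is left out.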

What static libraries do

Whenever we build a project, the system first compiles all source files into object files and then links the object files into an executable. It does so even when only one source file has changed and the rest are unchanged. To speed up builds, some files can therefore be supplied not as source code but as object files collected into a static library, so that they skip compilation and go straight to linking.

iOS does not support integrating third-party code into our project as a dynamic library and submitting it to the App Store. For reasons of security, intellectual property, and confidentiality, third parties are also unlikely to provide their libraries as source code, so they provide them as static libraries instead.

The main purposes of a static library are therefore faster builds, modularization, and protection of the code. A static library is the product of compilation, whereas a dynamic library is the product of compilation and linking; a static library is really just a collection of object files. The flow chart below shows how a static library and ordinary source code take part in compilation and linking, and from it you can see the role that the static library plays:

Static library file structure

A static library is a file made up of a header signature, a symbol table, and a collection of object files; in other words, it is a collection of files. Static libraries usually have the suffix .a on Unix/Linux and .lib on Windows. A static library is a kind of archive file, and there is no single unified standard for the archive format.

The static library format is not part of the Mach-O file format, but the static library formats and conventions of most current operating systems are very similar. Because iOS supports both x64 and ARM architectures, a static library on iOS can contain object files for several architectures at once. We call such a file a fat static library. The layout of a single-architecture static library and that of a multi-architecture static library are shown below:

1. Static library file signature

Just as most file formats begin with a so-called magic identifier, a single-architecture static library begins with an 8-byte string signature: !<arch>\n. This is the common header signature of all archive files, so you can read the first 8 bytes of a file and compare them with "!<arch>\n" to determine whether it is a valid static library. Note that \n is the escape sequence for the newline character.
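The signature check described above fits in a few lines of C. This is a minimal sketch: the macro and function names are made up for illustration, and the caller is assumed to have read the start of the file into a buffer already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ARCHIVE_MAGIC     "!<arch>\n"  /* 8-byte archive signature */
#define ARCHIVE_MAGIC_LEN 8

/* Returns true when the buffer starts with the archive signature. */
bool has_archive_signature(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    return len >= ARCHIVE_MAGIC_LEN &&
           memcmp(buf, ARCHIVE_MAGIC, ARCHIVE_MAGIC_LEN) == 0;
}
```

A buffer that is shorter than 8 bytes, or that starts with anything else (for example the header of a fat file), is rejected.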

2. Symbol table header structure

The second part of a static library is the symbol table header structure. The symbol table can also exist as a separate file, so the header is really a structure that describes the symbol table. It is a variable-length structure defined as follows:

struct symtab_header
{
    char identifier[16];  // Identifier (name) of the symbol table.
    char timestamp[12];   // Time the symbol table was generated: a decimal string giving the number of seconds since 1970-01-01.
    char ownerid[6];      // Owner ID.
    char groupid[6];      // Group ID.
    char mode[8];         // File access mode.
    char size[10];        // Size of the symbol table, as a decimal string.
    char end[2];          // End-of-header marker.
    char name[0];         // Optional symbol table file name.
};

All data members of the symbol table header are strings, and many of them correspond to file attributes, such as the timestamp, owner, group, and access mode. They are defined this way so that the attributes of the extracted file can be set when the symbol table is extracted from the static library; they also describe the symbol table file that was used to build the static library. The identifier and name members together describe the name of the symbol table, and the name part is optional. When identifier holds an ordinary string, it is itself the name of the symbol table. When identifier holds the special value "#1/<length>", the name is stored in the name field, and its length is the length given in identifier. For example, an identifier of "#1/20" means that the name of the symbol table is stored in the name field and is 20 characters long. In that case the name field contains __.SYMDEF or __.SYMDEF_64.
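As an illustration of the "#1/<length>" convention, the following C sketch extracts the announced name length from a 16-byte identifier field. The function name is hypothetical, and the field is assumed to be padded with spaces (or terminated) after the digits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the name length announced by a "#1/<length>" identifier,
   or -1 when the identifier is an ordinary name. */
long extended_name_length(const char identifier[16])
{
    assert(identifier != NULL);
    if (strncmp(identifier, "#1/", 3) != 0)
        return -1;
    long len = 0;
    size_t i = 3;
    for (; i < 16 && identifier[i] >= '0' && identifier[i] <= '9'; i++)
        len = len * 10 + (identifier[i] - '0');
    return (i == 3) ? -1 : len;  /* "#1/" without digits is not valid */
}
```

For the identifier "#1/20" the function reports 20, which is the number of bytes to read from the name field that follows the fixed part of the header.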

3. Symbol table

The symbol table of a static library gathers the symbol information of all its object files. When a program is linked, the linker reads the symbol tables of the object files to check whether the symbols referenced from other object files really exist. When a referenced symbol does not exist or cannot be found, the classic undefined symbol error is reported:

Undefined symbols for architecture arm64:
  "_fn", referenced from:
      -[ViewController viewDidLoad] in ViewController.o
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)


Since the symbol information already exists in each object file, why build a table of the symbols exported by all object files at the start of the static library? The answer is link speed: reading symbol information from every object file one by one is much slower than reading it from a single place in the static library.

The symbol table structure is also a variable length structure defined as follows:

struct symtab
{
    int size;                  // Size of the symbol table entries. Note that this is the size of the entire entry array, not the number of entries.
    struct ranlib ranlibs[0];  // Array of symbol table entries; struct ranlib_64 in the 64-bit form.
};

The definition of struct ranlib can be found in <mach-o/ranlib.h>. The structure is defined as follows:

struct ranlib {
    union {
	uint32_t	ran_strx;	// Offset of the symbol name within the string table that follows.
#ifndef __LP64__
	char		*ran_name;	/* symbol defined by */
#endif
    } ran_un;
    uint32_t		ran_off;	// Offset of the header structure of the object file that defines the symbol.
};

Each symbol entry has two parts. ran_strx is the offset at which the symbol name starts in the string table that follows. ran_off identifies the object file in which the symbol is defined: its value is the offset, within the static library, of the header structure of that object file. With this value you can quickly locate the object file that contains the symbol.

4. String table

The string table of a static library exists to serve the symbol table and immediately follows it. Its first four bytes hold the length of the string table, followed by a sequence of strings, each terminated by \0. The structure of the string table is defined as follows:

struct stringtab
{
    int size;         // Size of the string table.
    char strings[0];  // Contents of the string table; the strings are separated by \0.
};
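To show how the symbol table and the string table work together, here is a minimal C sketch of a symbol lookup. It assumes both tables have already been read into memory and that the string table is properly \0-terminated. The ranlib_entry type is a stand-in for the 32-bit struct ranlib so that the sketch compiles without Apple's headers, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the 32-bit form of struct ranlib. */
typedef struct {
    uint32_t ran_strx;  /* offset of the symbol name in the string table */
    uint32_t ran_off;   /* offset of the defining object file's header */
} ranlib_entry;

/* Returns the header offset of the object file that defines sym,
   or 0 when the symbol is not in the table (offset 0 holds the
   file signature, so it can never be a valid header offset). */
uint32_t find_member_offset(const ranlib_entry *entries, size_t count,
                            const char *strings, size_t strings_size,
                            const char *sym)
{
    assert(entries != NULL && strings != NULL && sym != NULL);
    for (size_t i = 0; i < count; i++) {
        if (entries[i].ran_strx >= strings_size)
            continue;  /* skip malformed entries */
        if (strcmp(strings + entries[i].ran_strx, sym) == 0)
            return entries[i].ran_off;
    }
    return 0;
}
```

Several entries may share the same ran_off, since one object file usually defines more than one symbol.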

5. Object file header structure

The object file header structure describes the object file that follows it. Its definition is exactly the same as that of the symbol table header structure, so I won’t repeat it here.
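Because every header has the same fixed layout (16 + 12 + 6 + 6 + 8 + 10 + 2 = 60 bytes) and records the size of what follows it, a single-architecture static library can be walked header by header. The C sketch below counts the headers in an in-memory buffer. It is an illustration under stated assumptions, not a full parser: the size field is taken to be a space-padded decimal string, each header is assumed to start at an even offset as in the common ar format, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AR_MAGIC_LEN  8   /* length of "!<arch>\n" */
#define AR_HEADER_LEN 60  /* fixed part of each header */
#define AR_SIZE_OFF   48  /* offset of the size field within a header */

/* Parse the space-padded decimal size field (10 bytes). */
size_t parse_size_field(const unsigned char *field)
{
    size_t value = 0;
    for (size_t i = 0; i < 10 && field[i] >= '0' && field[i] <= '9'; i++)
        value = value * 10 + (size_t)(field[i] - '0');
    return value;
}

/* Count the headers (symbol table plus object files) in the buffer.
   Returns 0 when the buffer does not start with the archive signature. */
size_t count_members(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < AR_MAGIC_LEN || memcmp(buf, "!<arch>\n", AR_MAGIC_LEN) != 0)
        return 0;
    size_t count = 0;
    size_t pos = AR_MAGIC_LEN;
    while (pos + AR_HEADER_LEN <= len) {
        size_t body = parse_size_field(buf + pos + AR_SIZE_OFF);
        if (body > len - pos - AR_HEADER_LEN)
            break;                /* truncated member, stop walking */
        count++;
        pos += AR_HEADER_LEN + body;
        pos += pos & 1;           /* headers start on even offsets */
    }
    return count;
}
```

The first header found this way belongs to the symbol table; the remaining ones each precede one object file.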

6. Object files

An object file is a file in Mach-O format, as described in the section on object files above. For more on the object file format, refer to documentation on the Mach-O format, which I will also cover in more detail in later articles.

Because a static library is a collection of object files, there are many object file header structures and object files in each static library file. Here is an example of a static library file structure:

7. Fat static library header structure

A static library file may contain code for a single architecture or for several. For example, a static library provided by a third party may include both a simulator version and a device version, so static libraries can also support multiple architectures. When a static library contains several architectures, the file does not begin with "!<arch>\n"; it begins with a fat static library header structure, defined as:

struct fat_header {
	uint32_t	magic;		/* FAT_MAGIC or FAT_MAGIC_64 */
	uint32_t	nfat_arch;	/* number of structs that follow */
};

This structure is defined in <mach-o/fat.h>. Whenever a file contains code for several architectures, whether it is a static library or an executable, it begins with a fat_header structure, which is followed by the descriptions of the individual architectures.

8. Architecture header

The architecture header describes information about the specific architecture, which is defined as follows:

struct fat_arch {
	cpu_type_t	cputype;	/* cpu specifier (int) */
	cpu_subtype_t	cpusubtype;	/* machine specifier (int) */
	uint32_t	offset;		/* file offset to this object file */
	uint32_t	size;		/* size of this object file */
	uint32_t	align;		/* alignment as a power of 2 */
};

This structure is also defined in <mach-o/fat.h>. It describes the specific CPU type together with the offset and size of the content for that architecture. For a static library, the content at the offset given by each fat_arch is a complete single-architecture static library file; for an executable, it is the image of the executable for that architecture.
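Reading the fat header and one architecture header can be sketched in C as follows. One detail worth noting is that the fields of fat_header and fat_arch are stored in big-endian byte order, so they have to be decoded byte by byte on little-endian machines. The sketch handles only the 32-bit FAT_MAGIC form; the fat_slice type and function names are hypothetical, and the CPU type is left as a raw number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC_VALUE 0xcafebabeu  /* value of FAT_MAGIC */
#define FAT_HEADER_LEN  8            /* magic + nfat_arch */
#define FAT_ARCH_LEN    20           /* five 32-bit fields */

/* Read a 32-bit big-endian value from raw bytes. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical decoded view of one fat_arch entry. */
typedef struct {
    uint32_t cputype;
    uint32_t cpusubtype;
    uint32_t offset;
    uint32_t size;
    uint32_t align;
} fat_slice;

/* Decode the architecture header at the given index.
   Returns 0 on success, -1 if buf is not a fat file or index is out of range. */
int read_fat_slice(const uint8_t *buf, size_t len, uint32_t index, fat_slice *out)
{
    assert(buf != NULL && out != NULL);
    if (len < FAT_HEADER_LEN || read_be32(buf) != FAT_MAGIC_VALUE)
        return -1;
    uint32_t count = read_be32(buf + 4);
    if (index >= count ||
        len < FAT_HEADER_LEN + ((size_t)index + 1) * FAT_ARCH_LEN)
        return -1;
    const uint8_t *p = buf + FAT_HEADER_LEN + (size_t)index * FAT_ARCH_LEN;
    out->cputype    = read_be32(p);
    out->cpusubtype = read_be32(p + 4);
    out->offset     = read_be32(p + 8);
    out->size       = read_be32(p + 12);
    out->align      = read_be32(p + 16);
    return 0;
}
```

For a fat static library, the bytes at offset through offset + size of each slice could then be checked for the "!<arch>\n" signature in the same way as a single-architecture file.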

That’s all I have to say about the static library file structure. You should now have a better understanding of what static libraries are for and how they are laid out. We can generate a static library with an Xcode project and build a multi-architecture static library with the lipo command. (Once you understand the static library file structure, writing your own lipo command is easy!)

Some commands for working with static libraries

For a static library file, we can use the lipo command to build a multi-architecture static library. We can use the ar command to create a static library, list the files in it, extract files from it, remove an object file from it, or add an object file to it. We can use the nm command to view all the symbols in a static library.

Guide to the lipo command: blog.csdn.net/SoaringLee_…

Guide to the ar command: www.cnblogs.com/woxinyijiu/…

Guide to the nm command: www.jianshu.com/p/6d5147347…

An application scenario for static libraries

The relocation information in an object file of a static library records the references to external symbols, so by changing this part of the object file we can replace a call to one function with a call to another without changing the source code. One very interesting application is to redirect all calls to objc_msgSend in order to hook OC method calls. The reason for modifying the object files in a static library is that Xcode compiles and links source code in one go, and we cannot insert a script that modifies the object files after compilation and before linking. The contents of a static library, however, can be freely changed in advance.

References

1. The description of the static library structure in this article is based mainly on the MachOView source code.
2. en.wikipedia.org/wiki/ar_ (Un…
