Mach-O files
Mach-O is short for Mach Object file format. It is the file format used on macOS and iOS for executables, object code, and dynamic libraries. It replaced the older a.out format and offers greater extensibility. It plays the same role as PE (Portable Executable) on Windows and ELF (Executable and Linkable Format) on Linux.
Mach-O file types
- Object files (.o)
- Library files
  - .a (static libraries)
  - .dylib (dynamic libraries)
  - .framework
- Executable files
- dyld (the dynamic linker)
- .dSYM (debug symbol files)
In actual development, Xcode lets you choose which type of Mach-O file a target produces: Targets → Build Settings → Linking → Mach-O Type.
Universal binary
- A format introduced by Apple that packages the binaries for several architectures into a single file.
- Each architecture gets its own native code, so the same package performs well on every architecture it supports.
- Because it stores one copy of the code per architecture, a universal binary is usually larger than a single-architecture binary.
- Only the part matching the current architecture is used at execution time, so no extra memory is needed to run it.
When building in Xcode, you can choose which architectures the Mach-O file is generated for, and add further architectures, under Targets → Build Settings → Architectures → Architectures.
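The layout behind a universal binary can be sketched in C. This is a minimal illustration, assuming the classic 32-bit fat header: a big-endian magic 0xcafebabe, a slice count, then one 20-byte fat_arch entry per architecture. The names `fat_slice` and `parse_fat_slices` are hypothetical helpers invented for this example, not part of any Apple API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC 0xcafebabeu /* magic of a 32-bit fat header */

/* Fat headers are stored big-endian regardless of the slices inside. */
static uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper type: the fat_arch fields this sketch keeps. */
struct fat_slice {
    uint32_t cputype;
    uint32_t cpusubtype;
    uint32_t offset; /* file offset of this architecture's Mach-O image */
    uint32_t size;   /* size of that image in bytes */
};

/*
 * Parse the slice table at the start of a universal binary.
 * Returns the number of slices written to out (at most max),
 * or -1 if buf is not a fat header or is truncated.
 */
int parse_fat_slices(const uint8_t *buf, size_t len,
                     struct fat_slice *out, int max) {
    assert(buf != NULL && out != NULL);
    if (len < 8 || read_be32(buf) != FAT_MAGIC)
        return -1;
    uint32_t n = read_be32(buf + 4);
    if (n > (len - 8) / 20) /* each fat_arch entry is 20 bytes */
        return -1;
    int count = 0;
    for (uint32_t i = 0; i < n && count < max; i++) {
        const uint8_t *e = buf + 8 + (size_t)i * 20;
        out[count].cputype = read_be32(e);
        out[count].cpusubtype = read_be32(e + 4);
        out[count].offset = read_be32(e + 8);
        out[count].size = read_be32(e + 12);
        count++;
    }
    return count;
}
```

Each returned offset points at a complete single-architecture Mach-O image inside the file, which is why only the matching slice needs to be loaded at run time.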
Device CPU architecture (instruction set)
- Simulator:
  - iPhone 4s to iPhone 5: i386
  - iPhone 5s to iPhone 6s Plus: x86_64
- Physical device (iOS):
  - armv6: iPhone, iPhone 3G, iPod touch (first generation), iPod touch (second generation)
  - armv7: iPhone 3GS, iPhone 4, iPhone 4s, iPad, iPad 2
  - armv7s: iPhone 5 and iPhone 5c
  - arm64: iPhone 5s and later models
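Inside a Mach-O file these architecture names appear as a numeric cputype and cpusubtype pair. The table below is a small sketch of that mapping for the architectures listed above. The numeric values follow `<mach/machine.h>`, while `arch_entry`, `arch_name`, and `arch_by_name` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CPU_ARCH_ABI64   0x01000000u /* set in cputype for 64-bit ABIs */
#define CPU_SUBTYPE_MASK 0xff000000u /* capability bits in cpusubtype */

/* Hypothetical lookup entry: architecture name plus its CPU identifiers. */
struct arch_entry {
    const char *name;
    uint32_t cputype;
    uint32_t cpusubtype;
};

static const struct arch_entry ARCHS[] = {
    {"i386",   7,                   3},  /* CPU_TYPE_X86 */
    {"x86_64", 7 | CPU_ARCH_ABI64,  3},  /* CPU_TYPE_X86_64 */
    {"armv6",  12,                  6},  /* CPU_TYPE_ARM */
    {"armv7",  12,                  9},
    {"armv7s", 12,                  11},
    {"arm64",  12 | CPU_ARCH_ABI64, 0},  /* CPU_TYPE_ARM64 */
};

/* Map a cputype/cpusubtype pair to a name, or NULL if not in the table. */
const char *arch_name(uint32_t cputype, uint32_t cpusubtype) {
    cpusubtype &= ~CPU_SUBTYPE_MASK; /* ignore capability bits */
    for (size_t i = 0; i < sizeof ARCHS / sizeof ARCHS[0]; i++)
        if (ARCHS[i].cputype == cputype && ARCHS[i].cpusubtype == cpusubtype)
            return ARCHS[i].name;
    return NULL;
}

/* Reverse lookup by name, or NULL if the name is not in the table. */
const struct arch_entry *arch_by_name(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof ARCHS / sizeof ARCHS[0]; i++)
        if (strcmp(ARCHS[i].name, name) == 0)
            return &ARCHS[i];
    return NULL;
}
```

The table is deliberately limited to the architectures named in this article; anything else returns NULL.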
Splitting and merging Mach-O architectures
- The lipo tool
- List the architectures in a Mach-O file
$ lipo -info <Mach-O file>
- Extract a single architecture
$ lipo <Mach-O file> -thin <architecture name> -output <output Mach-O file>
- Merge architectures
$ lipo -create <first Mach-O file> <second Mach-O file> -output <output Mach-O file>
- The file command: displays file information
$ file <file path>
Mach-O file structure
As shown in the figure, a Mach-O file consists of three parts: the Header, the Load Commands, and the Data.
Header
Contains general information about the binary: the CPU architecture, the byte order, the number of load commands, and so on.
struct mach_header_64 {
    uint32_t      magic;      /* mach magic number identifier */
    cpu_type_t    cputype;    /* cpu specifier */
    cpu_subtype_t cpusubtype; /* machine specifier */
    uint32_t      filetype;   /* type of file */
    uint32_t      ncmds;      /* number of load commands */
    uint32_t      sizeofcmds; /* the size of all the load commands */
    uint32_t      flags;      /* flags */
    uint32_t      reserved;   /* reserved */
};
magic: magic number that identifies the file as 64-bit or 32-bit and gives its byte order (e.g. MH_MAGIC_64)
cputype: CPU type (e.g. CPU_TYPE_ARM64)
cpusubtype: specific CPU variant (e.g. CPU_SUBTYPE_ARM64_ALL)
filetype: file type (e.g. MH_EXECUTE)
ncmds: number of load commands
sizeofcmds: total size of all load commands, in bytes
flags: flag bits that identify features the binary supports, mainly related to loading and linking
reserved: reserved field (present only in the 64-bit header)
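The header fields described above can be decoded from the first 32 bytes of a thin 64-bit Mach-O file. The following is a minimal sketch, assuming a little-endian file (as on arm64 and x86_64). `header64`, `parse_header64`, and `filetype_name` are hypothetical names for this example, and the struct only mirrors the field order of mach_header_64 with plain 32-bit integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MH_MAGIC    0xfeedfaceu /* 32-bit Mach-O */
#define MH_MAGIC_64 0xfeedfacfu /* 64-bit Mach-O */

/* Hypothetical mirror of the mach_header_64 fields, all as uint32_t. */
struct header64 {
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags, reserved;
};

/* Read a little-endian 32-bit value independent of host byte order. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode a little-endian 64-bit header. Returns 0 on success, -1 otherwise. */
int parse_header64(const uint8_t *buf, size_t len, struct header64 *h) {
    assert(buf != NULL && h != NULL);
    if (len < 32 || read_le32(buf) != MH_MAGIC_64)
        return -1;
    h->magic      = read_le32(buf);
    h->cputype    = read_le32(buf + 4);
    h->cpusubtype = read_le32(buf + 8);
    h->filetype   = read_le32(buf + 12);
    h->ncmds      = read_le32(buf + 16);
    h->sizeofcmds = read_le32(buf + 20);
    h->flags      = read_le32(buf + 24);
    h->reserved   = read_le32(buf + 28);
    return 0;
}

/* Name a few common filetype values from <mach-o/loader.h>. */
const char *filetype_name(uint32_t filetype) {
    switch (filetype) {
    case 0x1: return "MH_OBJECT";
    case 0x2: return "MH_EXECUTE";
    case 0x6: return "MH_DYLIB";
    case 0x7: return "MH_DYLINKER";
    case 0x8: return "MH_BUNDLE";
    case 0xa: return "MH_DSYM";
    default:  return "unknown";
    }
}
```

The load commands start immediately after these 32 bytes and occupy sizeofcmds bytes in total.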
Load Commands
Contains the locations of the segments, the symbol table, the dynamic symbol table, and so on. The load commands describe how the data in the file is organized; each kind of data is described by its own type of load command.
- LC_SEGMENT_64 (__PAGEZERO)
  - VM Size: size of the virtual memory region: 4 GB on 64-bit, 16 KB on 32-bit iOS. The region has no access permissions and takes up no space in the file; it exists to trap null-pointer dereferences.
- LC_SEGMENT_64 (__TEXT)
- LC_SEGMENT_64 (__DATA)
- LC_SEGMENT_64 (__LINKEDIT)
  - VM Address: virtual memory address of the segment
  - VM Size: virtual memory size of the segment
  - File Offset: start position of the segment's data in the file
  - File Size: size of the segment's data in the file
- LC_DYLD_INFO_ONLY (dynamic linking information)
  - Rebase: location of the rebase information. When the Mach-O file is loaded into memory, the system assigns it a random offset (ASLR); adding that offset to the locations recorded in the rebase information gives the actual locations in memory. Memory is then allocated according to the recorded sizes.
  - Binding: location of the binding information
  - Weak Binding: location of the weak binding information
  - Lazy Binding: location of the lazy binding information
  - Export: location of the exported symbol information
- LC_SYMTAB (symbol table location)
  - Symbol Table Offset: position of the symbol table, which associates function names with function addresses
  - Number of Symbols: number of symbols
  - String Table Offset: position of the symbol names
  - String Table Size: size of the symbol names
- LC_DYSYMTAB (dynamic symbol table location)
- LC_LOAD_DYLINKER (dyld)
  - Str Offset: position of the dynamic linker path string
  - Name: path of the dynamic linker (dyld)
- LC_UUID (unique identifier of the Mach-O file)
- LC_VERSION_MIN_IPHONEOS (lowest OS version the Mach-O file supports)
- LC_SOURCE_VERSION
- LC_MAIN (program entry point: sets the entry address and the stack size of the program's main thread)
  - Entry Offset: position of the entry point, as an offset from the start of __TEXT
  - Stack Size: stack size
  - The entry point's memory address is obtained at load time by adding the entry offset to the address where __TEXT is loaded.
- LC_ENCRYPTION_INFO_64 (encryption information)
  - Crypt Offset: position of the encrypted data
  - Crypt Size: size of the encrypted data
  - Crypt ID: ID of the encryption scheme; 0 means not encrypted, 1 means encrypted
- LC_LOAD_DYLIB (paths of dependent libraries, including third-party libraries)
  - Str Offset: position of the dynamic library path string
  - Time Stamp: time stamp of the dynamic library
  - Current Version: version of the dynamic library
- LC_RPATH (run-path search path, used to locate frameworks referenced through @rpath)
- LC_FUNCTION_STARTS (table of function start addresses)
- LC_DATA_IN_CODE (information about data embedded in code)
- LC_CODE_SIGNATURE (code signature information)
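Every load command listed above begins with the same two 32-bit fields, cmd and cmdsize, so the whole list can be walked without knowing each command's layout. The sketch below assumes a little-endian file and a buffer that starts right after the header. `find_load_command` is a hypothetical helper, and the LC_ constants are copied from `<mach-o/loader.h>`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LC_REQ_DYLD   0x80000000u
#define LC_SYMTAB     0x2u
#define LC_SEGMENT_64 0x19u
#define LC_MAIN       (0x28u | LC_REQ_DYLD)

/* Read a little-endian 32-bit value independent of host byte order. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Walk ncmds load commands stored in cmds[0 .. sizeofcmds).
 * Returns the byte offset of the first command whose type equals
 * wanted, or -1 if there is none or the list is malformed.
 */
long find_load_command(const uint8_t *cmds, uint32_t ncmds,
                       uint32_t sizeofcmds, uint32_t wanted) {
    assert(cmds != NULL);
    uint32_t off = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        if (sizeofcmds - off < 8) /* need room for cmd and cmdsize */
            return -1;
        uint32_t cmd = read_le32(cmds + off);
        uint32_t cmdsize = read_le32(cmds + off + 4);
        if (cmdsize < 8 || cmdsize > sizeofcmds - off)
            return -1;
        if (cmd == wanted)
            return (long)off;
        off += cmdsize; /* cmdsize covers the whole command */
    }
    return -1;
}
```

The ncmds and sizeofcmds arguments come straight from the header; once a command is found, its remaining fields can be decoded according to its type.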
Data
It consists of Segments and Sections, which store the concrete data: code, data, string constants, classes, methods, and so on.
- Segment names
#define SEG_PAGEZERO "__PAGEZERO" /* catches null pointers in MH_EXECUTE files */
#define SEG_TEXT "__TEXT" /* code / read-only data segment */
#define SEG_DATA "__DATA" /* data segment */
#define SEG_OBJC "__OBJC" /* Objective-C runtime segment */
#define SEG_LINKEDIT "__LINKEDIT" /* symbols and other tables used by the dynamic linker, including the symbol table and the string table */
- Segment data structure
struct segment_command_64 {
    uint32_t  cmd;         /* LC_SEGMENT_64 */
    uint32_t  cmdsize;     /* size of this command, including its section_64 structs */
    char      segname[16]; /* segment name */
    uint64_t  vmaddr;      /* virtual memory address of the segment */
    uint64_t  vmsize;      /* virtual memory size allocated for the segment */
    uint64_t  fileoff;     /* offset of the segment in the file */
    uint64_t  filesize;    /* number of bytes the segment occupies in the file */
    vm_prot_t maxprot;     /* maximum memory protection for the segment's pages */
    vm_prot_t initprot;    /* initial memory protection for the segment's pages */
    uint32_t  nsects;      /* number of sections in the segment */
    uint32_t  flags;       /* flags */
};
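A few of the segment fields are easier to read once decoded. The helpers below are a small sketch: the VM_PROT_ bit values follow `<mach/vm_prot.h>`, while `prot_string`, `zero_fill_bytes`, and `segment_contains` are hypothetical names. They show how maxprot and initprot translate to read/write/execute flags, and how vmsize relates to filesize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VM_PROT_READ    0x1
#define VM_PROT_WRITE   0x2
#define VM_PROT_EXECUTE 0x4

/* Render protection bits as "rwx"-style text; out must hold 4 bytes. */
void prot_string(int prot, char out[4]) {
    assert(out != NULL);
    out[0] = (prot & VM_PROT_READ) ? 'r' : '-';
    out[1] = (prot & VM_PROT_WRITE) ? 'w' : '-';
    out[2] = (prot & VM_PROT_EXECUTE) ? 'x' : '-';
    out[3] = '\0';
}

/* Bytes of the segment that exist only in memory (not backed by the file). */
uint64_t zero_fill_bytes(uint64_t vmsize, uint64_t filesize) {
    return vmsize > filesize ? vmsize - filesize : 0;
}

/* Nonzero if addr lies inside [vmaddr, vmaddr + vmsize). */
int segment_contains(uint64_t vmaddr, uint64_t vmsize, uint64_t addr) {
    return addr >= vmaddr && addr - vmaddr < vmsize;
}
```

For example, __TEXT is typically mapped with protection 5 ("r-x") and __DATA with 3 ("rw-"), while __PAGEZERO has protection 0 and a filesize of 0, so its entire vmsize exists only as address space.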
- Section data structure
Segments (mainly __TEXT and __DATA) are further divided into Sections.
struct section_64 {
    char     sectname[16]; /* section name */
    char     segname[16];  /* name of the segment the section belongs to */
    uint64_t addr;         /* memory address of the section */
    uint64_t size;         /* size of the section in bytes */
    uint32_t offset;       /* file offset of the section */
    uint32_t align;        /* section alignment, as a power of 2 */
    uint32_t reloff;       /* file offset of the relocation entries */
    uint32_t nreloc;       /* number of relocation entries */
    uint32_t flags;        /* section type and attributes */
    uint32_t reserved1;    /* reserved (for offset or index) */
    uint32_t reserved2;    /* reserved (for count or sizeof) */
    uint32_t reserved3;    /* reserved */
};
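Three of the section fields are commonly combined when inspecting a binary: align is an exponent rather than a byte count, the low byte of flags holds the section type, and addr together with offset lets a memory address be mapped back to a position in the file. The following sketch illustrates this with a hypothetical `sect` struct that holds only the fields it needs; the function names are likewise invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTION_TYPE       0x000000ffu /* low byte of flags: section type */
#define S_REGULAR          0x0u
#define S_CSTRING_LITERALS 0x2u

/* Hypothetical subset of section_64 used by this sketch. */
struct sect {
    uint64_t addr;   /* memory address of the section */
    uint64_t size;   /* size in bytes */
    uint32_t offset; /* file offset of the section */
    uint32_t align;  /* alignment exponent (power of 2) */
    uint32_t flags;  /* type in the low byte, attributes above it */
};

/* Alignment in bytes: align stores the exponent, so 4 means 16 bytes. */
uint64_t section_alignment(const struct sect *s) {
    assert(s != NULL && s->align < 64);
    return 1ull << s->align;
}

/* Section type, e.g. S_REGULAR or S_CSTRING_LITERALS. */
uint32_t section_type(const struct sect *s) {
    assert(s != NULL);
    return s->flags & SECTION_TYPE;
}

/*
 * Translate a memory address inside the section to a file offset.
 * Returns 0 and stores the offset in *out, or -1 if vmaddr is outside.
 */
int section_file_offset(const struct sect *s, uint64_t vmaddr, uint64_t *out) {
    assert(s != NULL && out != NULL);
    if (vmaddr < s->addr || vmaddr - s->addr >= s->size)
        return -1;
    *out = (uint64_t)s->offset + (vmaddr - s->addr);
    return 0;
}
```

The addresses here are the unslid ones recorded in the file; at run time the ASLR offset has to be subtracted from a live address before this translation applies.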
Here are some common sections:
__TEXT,__text: main program code
__TEXT,__stubs / __stub_helper: stubs used for dynamic linking
__TEXT,__objc_methname: Objective-C method names
__TEXT,__objc_classname: Objective-C class names
__TEXT,__objc_methtype: Objective-C method types
__TEXT,__cstring: C strings used in the program
__DATA,__got: table of non-lazily bound symbol pointers
__DATA,__la_symbol_ptr: table of lazily bound symbol pointers
__DATA,__objc_classlist: Objective-C class list
__DATA,__objc_protolist: Objective-C protocol list
__DATA,__objc_imageinfo: Objective-C image information
__DATA,__objc_const: Objective-C constants
__DATA,__objc_selrefs: Objective-C selector references
__DATA,__objc_superrefs: Objective-C superclass references (super)
__DATA,__objc_protorefs: Objective-C protocol references
__DATA,__objc_data / __DATA,__data: Objective-C data and general initialized data
Dynamic Loader Info: information needed by the dynamic linker (rebasing, symbol binding, lazy binding, and so on)
Function Starts: table of function start addresses
Symbol Table: the symbol table
Dynamic Symbol Table: the dynamic symbol table
String Table: the string table that holds the symbol names
Code Signature: the code signature information
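The Symbol Table and String Table work together: each symbol entry stores an index into the string table rather than the name itself. The sketch below shows that lookup under simplifying assumptions. `sym64` is a hypothetical flattened mirror of nlist_64 (the real struct wraps n_strx in a union), and `symbol_name` and `find_symbol` are illustrative helpers, not system APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flattened mirror of nlist_64 (16 bytes per entry). */
struct sym64 {
    uint32_t n_strx;  /* index of the name in the string table */
    uint8_t  n_type;  /* type flags */
    uint8_t  n_sect;  /* section number, or 0 */
    uint16_t n_desc;  /* extra description bits */
    uint64_t n_value; /* value, usually the symbol's address */
};

/*
 * Return the NUL-terminated name stored at n_strx, or NULL if the
 * index is out of range or the name runs past the end of the table.
 */
const char *symbol_name(const char *strtab, uint32_t strsize, uint32_t n_strx) {
    assert(strtab != NULL);
    if (n_strx >= strsize)
        return NULL;
    if (memchr(strtab + n_strx, '\0', strsize - n_strx) == NULL)
        return NULL;
    return strtab + n_strx;
}

/* Linear search for a symbol by name; returns NULL if it is not found. */
const struct sym64 *find_symbol(const struct sym64 *syms, uint32_t nsyms,
                                const char *strtab, uint32_t strsize,
                                const char *name) {
    assert(syms != NULL && name != NULL);
    for (uint32_t i = 0; i < nsyms; i++) {
        const char *n = symbol_name(strtab, strsize, syms[i].n_strx);
        if (n != NULL && strcmp(n, name) == 0)
            return &syms[i];
    }
    return NULL;
}
```

In a real file, the symbol array and the string table would be located through the offsets and sizes recorded in LC_SYMTAB.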