This article is a set of notes on the Mach-O format.
Mach-O is short for Mach Object file format. It is the file format used for executables, object code, shared libraries, dynamically loaded code, and core dumps.
The Mach-O file consists of three main parts:
Mach Header: describes the CPU architecture, the file type, and the load commands.
Load Commands: describe how the data in the file is organized. Different kinds of data are represented by different load commands.
Data: holds the data of each segment. Segments store both data and code.
The executable format used on macOS and iOS is called Mach-O. Its main components are as follows:
Parsing a binary with MachOView shows these details:
__TEXT, __DATA
Dynamic Loader Info
Function Starts
Symbol Table
Dynamic Symbol Table
String Table
The Mach-O header:
struct mach_header_64 {
    uint32_t      magic;      /* Mach magic identifier */
    cpu_type_t    cputype;    /* CPU type identifier, as defined in the universal binary format */
    cpu_subtype_t cpusubtype; /* CPU subtype identifier */
    uint32_t      filetype;   /* file type */
    uint32_t      ncmds;      /* number of load commands */
    uint32_t      sizeofcmds; /* total size of all load commands */
    uint32_t      flags;      /* flags */
    uint32_t      reserved;   /* 64-bit reserved field */
};
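As a rough illustration of how the header fields are read, here is a minimal sketch that copies a header out of a byte buffer and checks its magic number. The `sketch_` names are hypothetical, and the struct is a local mirror of `mach_header_64` so that the code builds without `<mach-o/loader.h>`. It assumes a native-endian 64-bit file and does not handle fat binaries or byte-swapped files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MH_MAGIC_64 0xfeedfacfu /* same value as MH_MAGIC_64 */

/* Local mirror of mach_header_64 (hypothetical name, same field layout). */
struct sketch_header_64 {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

/* Copies the header out of buf. Returns 0 on success, -1 if the buffer is
 * too short or the magic is not that of a native-endian 64-bit Mach-O. */
int sketch_read_header(const uint8_t *buf, size_t len,
                       struct sketch_header_64 *out)
{
    assert(buf != NULL && out != NULL);
    if (len < sizeof *out)
        return -1;
    memcpy(out, buf, sizeof *out); /* memcpy avoids unaligned reads */
    return out->magic == SKETCH_MH_MAGIC_64 ? 0 : -1;
}
```

A caller would typically read the first 32 bytes of the file into a buffer, pass it to `sketch_read_header`, and then use `ncmds` and `sizeofcmds` to locate the load commands that follow the header.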
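There are 11 file types:
#define MH_OBJECT      0x1 /* relocatable object file (.o) */
#define MH_EXECUTE     0x2 /* executable binary */
#define MH_FVMLIB      0x3 /* fixed VM shared library */
#define MH_CORE        0x4 /* core file */
#define MH_PRELOAD     0x5 /* preloaded executable file */
#define MH_DYLIB       0x6 /* dynamically bound shared library */
#define MH_DYLINKER    0x7 /* dynamic linker, such as /usr/lib/dyld */
#define MH_BUNDLE      0x8 /* dynamically bound bundle file */
#define MH_DYLIB_STUB  0x9 /* shared library stub, used for static linking only */
#define MH_DSYM        0xa /* companion file that holds only debug sections */
#define MH_KEXT_BUNDLE 0xb /* x86_64 kernel extension */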
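To make the table above easier to use when printing a header, the following sketch maps a `filetype` value to its constant name and back. The function and table names are hypothetical helpers, not part of any Apple header; only the numeric values come from the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Index = filetype value; slot 0 is unused because the values start at 0x1. */
static const char *const sketch_filetype_names[] = {
    NULL,           "MH_OBJECT",  "MH_EXECUTE",    "MH_FVMLIB",
    "MH_CORE",      "MH_PRELOAD", "MH_DYLIB",      "MH_DYLINKER",
    "MH_BUNDLE",    "MH_DYLIB_STUB", "MH_DSYM",    "MH_KEXT_BUNDLE"
};
#define SKETCH_NTYPES \
    (sizeof sketch_filetype_names / sizeof sketch_filetype_names[0])

/* Returns the constant name for a filetype, or "unknown". */
const char *sketch_filetype_name(uint32_t filetype)
{
    if (filetype == 0 || filetype >= SKETCH_NTYPES)
        return "unknown";
    return sketch_filetype_names[filetype];
}

/* Returns the filetype value for a constant name, or 0 if not found. */
uint32_t sketch_filetype_from_name(const char *name)
{
    assert(name != NULL);
    for (uint32_t i = 1; i < SKETCH_NTYPES; i++)
        if (strcmp(sketch_filetype_names[i], name) == 0)
            return i;
    return 0;
}
```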
Load commands and the segments they describe:
LC_SEGMENT_64: maps a (64-bit) segment into the address space of the process. The standard segment names are:
    SEG_PAGEZERO "__PAGEZERO" /* 4 GB in size; the first 4 GB of the process address space is mapped as unreadable, unwritable, and non-executable */
    SEG_TEXT     "__TEXT"     /* code and read-only data */
    SEG_DATA     "__DATA"     /* data segment */
    SEG_OBJC     "__OBJC"     /* Objective-C runtime segment */
    SEG_LINKEDIT "__LINKEDIT" /* symbols and other tables used by the dynamic linker, including the symbol table and string table */
LC_DYLD_INFO_ONLY: information for the dynamic linker (rebase addresses, weak binding, lazy binding, offsets of exported functions, and so on)
LC_SYMTAB: location of the symbol table
LC_DYSYMTAB: location of the dynamic symbol table
LC_LOAD_DYLINKER: path of the dynamic linker, /usr/lib/dyld
LC_UUID: unique identifier of the file, used to check whether a dSYM file matches a crash report
LC_VERSION_MIN_MACOSX / LC_VERSION_MIN_IPHONEOS: minimum operating system version the binary requires
LC_SOURCE_VERSION: version of the source code used to build the binary
LC_MAIN: entry address and stack size for the main thread of the program
LC_ENCRYPTION_INFO_64: encryption information for the binary
LC_LOAD_DYLIB: loads an additional dynamic library
LC_FUNCTION_STARTS: table of function start addresses, which lets debuggers and other tools tell whether an address lies inside a function
LC_DATA_IN_CODE: table of non-instruction ranges inside the code section
LC_CODE_SIGNATURE: code signature information of the application
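Every load command begins with the same two fields, `cmd` and `cmdsize`, so the whole list can be walked without knowing each command's full layout. The sketch below counts the commands of one type in a load-command area. The `sketch_` names are hypothetical; the numeric values are those of `LC_SEGMENT_64`, `LC_SYMTAB`, and `LC_UUID` in `<mach-o/loader.h>`, redefined locally so the code is self-contained. It assumes native-endian data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_LC_SYMTAB     0x2u
#define SKETCH_LC_SEGMENT_64 0x19u
#define SKETCH_LC_UUID       0x1bu

/* Common prefix of every load command (mirrors struct load_command). */
struct sketch_load_command {
    uint32_t cmd;     /* command type */
    uint32_t cmdsize; /* total size of this command in bytes */
};

/* Counts commands of type `wanted` among `ncmds` commands stored in the
 * `len` bytes at `cmds`. Returns -1 if a command is truncated or malformed. */
int sketch_count_commands(const uint8_t *cmds, size_t len,
                          uint32_t ncmds, uint32_t wanted)
{
    assert(cmds != NULL);
    size_t off = 0;
    int hits = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        struct sketch_load_command lc;
        if (len - off < sizeof lc)
            return -1;
        memcpy(&lc, cmds + off, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > len - off)
            return -1;
        if (lc.cmd == wanted)
            hits++;
        off += lc.cmdsize; /* the next command starts right after this one */
    }
    return hits;
}
```

In a real file, `cmds` would point just past the header, `len` would be the header's `sizeofcmds`, and `ncmds` would come from the header as well.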
Segment data structure:
struct segment_command_64 {
    uint32_t  cmd;         /* LC_SEGMENT_64 */
    uint32_t  cmdsize;     /* includes the size of the section_64 structs */
    char      segname[16]; /* segment name */
    uint64_t  vmaddr;      /* memory address of this segment */
    uint64_t  vmsize;      /* memory size of this segment */
    uint64_t  fileoff;     /* file offset of this segment */
    uint64_t  filesize;    /* number of bytes the segment occupies in the file */
    vm_prot_t maxprot;     /* maximum VM protection */
    vm_prot_t initprot;    /* initial VM protection */
    uint32_t  nsects;      /* number of sections in the segment */
    uint32_t  flags;       /* flags */
};
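Two of the fields above are easy to misread in a hex dump: the protection values and the `vmaddr`/`vmsize` pair. The sketch below, with hypothetical `sketch_` helper names, turns a protection value into the familiar `rwx` text and tests whether an address falls inside a segment. The bit values match `VM_PROT_READ`, `VM_PROT_WRITE`, and `VM_PROT_EXECUTE`, redefined locally so the code builds anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_VM_PROT_READ    0x1
#define SKETCH_VM_PROT_WRITE   0x2
#define SKETCH_VM_PROT_EXECUTE 0x4

/* Writes "r-x"-style text for a protection value into out (3 chars + NUL). */
void sketch_prot_string(int32_t prot, char out[4])
{
    assert(out != NULL);
    out[0] = (prot & SKETCH_VM_PROT_READ)    ? 'r' : '-';
    out[1] = (prot & SKETCH_VM_PROT_WRITE)   ? 'w' : '-';
    out[2] = (prot & SKETCH_VM_PROT_EXECUTE) ? 'x' : '-';
    out[3] = '\0';
}

/* Returns 1 when addr lies in [vmaddr, vmaddr + vmsize), otherwise 0. */
int sketch_segment_contains(uint64_t vmaddr, uint64_t vmsize, uint64_t addr)
{
    return addr >= vmaddr && addr - vmaddr < vmsize;
}
```

With these helpers, a `__PAGEZERO` segment with protection 0 prints as `---`, which is the "unreadable, unwritable, non-executable" mapping described earlier, and any address below 4 GB is reported as inside it.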
Some segments (mainly __TEXT and __DATA) are further divided into sections. The Segment -> Section structure is used because all sections in the same segment share the same permissions, and sections do not have to be padded to the page size, which saves memory. The segment is exposed as a whole and is mapped into one contiguous range of virtual memory when the program is loaded, which gives better memory alignment.
Section data structure:
struct section_64 {
    char     sectname[16]; /* section name */
    char     segname[16];  /* name of the segment this section belongs to */
    uint64_t addr;         /* memory address of this section */
    uint64_t size;         /* size of the section in bytes */
    uint32_t offset;       /* file offset of this section */
    uint32_t align;        /* section memory alignment boundary (power of 2) */
    uint32_t reloff;       /* file offset of the relocation entries */
    uint32_t nreloc;       /* number of relocation entries */
    uint32_t flags;        /* flags (section type and attributes) */
    uint32_t reserved1;    /* reserved field 1 (for offset or index) */
    uint32_t reserved2;    /* reserved field 2 (for count or sizeof) */
    uint32_t reserved3;    /* reserved field 3 */
};
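The `section_64` records of a segment are stored directly after its `segment_command_64`, and `nsects` says how many there are. The sketch below looks up a section by name under that layout. The structs are local mirrors with hypothetical `sketch_` names so the code builds without Apple headers, and native-endian data is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of segment_command_64 and section_64 (same field layout). */
struct sketch_segment_64 {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    int32_t  maxprot, initprot;
    uint32_t nsects, flags;
};
struct sketch_section_64 {
    char     sectname[16];
    char     segname[16];
    uint64_t addr, size;
    uint32_t offset, align, reloff, nreloc, flags;
    uint32_t reserved1, reserved2, reserved3;
};

/* Searches the sections that follow the segment command at seg_cmd for one
 * named `name`. On success copies it to *out and returns 0. Returns -1 if it
 * is absent or the data is truncated; *out is unspecified in that case. */
int sketch_find_section(const uint8_t *seg_cmd, size_t len,
                        const char *name, struct sketch_section_64 *out)
{
    assert(seg_cmd != NULL && name != NULL && out != NULL);
    struct sketch_segment_64 seg;
    if (len < sizeof seg)
        return -1;
    memcpy(&seg, seg_cmd, sizeof seg);
    if (seg.nsects > (len - sizeof seg) / sizeof *out)
        return -1;
    for (uint32_t i = 0; i < seg.nsects; i++) {
        memcpy(out, seg_cmd + sizeof seg + (size_t)i * sizeof *out,
               sizeof *out);
        /* sectname has no NUL terminator when the name is 16 bytes long,
         * so compare at most 16 bytes. */
        if (strncmp(out->sectname, name, sizeof out->sectname) == 0)
            return 0;
    }
    return -1;
}
```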
Common sections:
__TEXT.__cstring: C strings
__TEXT.__const: const constants
__TEXT.__stubs: placeholder code for stubs
__TEXT.__stub_helper: the code a stub ends up pointing to before its symbol is bound
__TEXT.__objc_methname: Objective-C method names
__TEXT.__objc_methtype: Objective-C method types
__TEXT.__objc_classname: Objective-C class names
__DATA.__data: initialized mutable data
__DATA.__la_symbol_ptr: lazy binding pointer table; each entry points to a symbol that is bound on first use
__DATA.__nl_symbol_ptr: non-lazy binding pointer table; each entry points to a symbol that is bound at load time
__DATA.__const: constants that need relocation at load time
__DATA.__cfstring: Core Foundation strings (CFStringRefs)
__DATA.__bss: BSS, stores uninitialized global variables
__DATA.__common: uninitialized symbol declarations
__DATA.__objc_classlist: Objective-C class list
__DATA.__objc_imageinfo: Objective-C image information
__DATA.__objc_selrefs: Objective-C selector references
__DATA.__objc_protorefs: Objective-C protocol references
__DATA.__objc_superrefs: Objective-C superclass references
A complete user-level Mach-O file ends with a block of link information. It contains the symbol table, the string table, and other tables that the dynamic loader uses to link the executable and its dependencies. See Reference 2 for details.
What is __TEXT.__stubs:
Wikipedia definition: A Stub is a program segment that replaces some functionality. A stub program can be used to simulate the behavior of an existing program (such as the process of a remote machine) or as a temporary substitute for code to be developed.
All entries in __la_symbol_ptr initially point into __stub_helper. On later calls, execution still jumps through the __stubs area, but because __la_symbol_ptr already holds the real address of the function from the first call, it no longer enters dyld_stub_binder and calls the function directly. This completes a delayed binding process that follows the lazy-evaluation idea.
To summarize the stub mechanism: it sets up function placeholders and binds them lazily. On macOS, a reference to an external function produces a placeholder in the __la_symbol_ptr section of the __DATA segment. On the first call, the symbol is dynamically linked. Once the address is found, the placeholder in the __la_symbol_ptr section of the __DATA segment is replaced with the real address of the function, so each symbol needs to be bound only once.
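The bind-once behavior can be imitated in plain C with a function pointer that patches itself on first use. This is only an analogy, not how dyld is implemented: all `sketch_` names are hypothetical, the "real" function is an ordinary local function standing in for one from a dynamic library, and no symbol lookup takes place.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*sketch_fn)(int);

/* Stands in for a function that lives in a dynamic library. */
static int sketch_real_square(int x) { return x * x; }

/* Counts how many times the simulated binder has run. */
int sketch_bind_count = 0;

static int sketch_stub_helper(int x);

/* Plays the role of one __la_symbol_ptr entry: it starts out aimed at the
 * helper rather than at the real function. */
static sketch_fn sketch_la_symbol_ptr = sketch_stub_helper;

/* Plays the role of __stub_helper plus dyld_stub_binder: resolve the symbol,
 * patch the lazy pointer, then continue into the real function. */
static int sketch_stub_helper(int x)
{
    sketch_bind_count++;
    sketch_la_symbol_ptr = sketch_real_square;
    return sketch_la_symbol_ptr(x);
}

/* Plays the role of the __stubs entry: always an indirect call through the
 * lazy pointer, whatever it currently holds. */
int sketch_call_square(int x)
{
    assert(sketch_la_symbol_ptr != NULL);
    return sketch_la_symbol_ptr(x);
}
```

Calling `sketch_call_square` twice runs the helper only on the first call; the second call goes through the same stub but lands directly in `sketch_real_square`, which mirrors the two-step description above.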
The main job of the first two parts (the header and the load commands) is to tell the kernel how to read the Mach-O file, to name the dynamic linker (dyld) that will later load the dynamic libraries, and to set the program entry point and other information before the program starts. The data and link information parts supply the actual content behind each instruction and address that is mapped into virtual memory while the program runs.
Executable file running process
Parse the Mach-O file.
Set up the runtime environment: VM mapping of the text segment, load commands, dynamic library information, symbol table address, dynamic symbol table, constant string table address, dynamic library loading, the dynamic libraries that functions depend on, and the dynamic linker address.
Using the load information of the dynamic libraries, set up the stub placeholders and the assembly instructions that call through __nl_symbol_ptr.
Call the entry offset address given by the entry point in LC_MAIN.
Execute the binary code at the entry offset (that is, run the assembly instructions).
The first time execution reaches a dynamic library function, a lazy dynamic binding is performed, and the dynamic linker rewrites the address in the __la_symbol_ptr area so that it points to the address of the symbol in the dynamic library.
The second time execution reaches that dynamic library function, it jumps (jmp) directly to the symbol address.
Note: many of the system's dynamic libraries are shared by all processes, so macOS and iOS optimize them with a shared library cache. If a process has already used the dynamic library, the dynamic linker points the stub's __la_symbol_ptr entry directly at the address of the corresponding symbol in the dynamic library.
Mach-O in practice: analyzing unused classes and methods
Reference 1:xiaozhuanlan.com/topic/67503…
Reference 2:blog.csdn.net/bjtufang/ar…