Mach-O is the file format of applications on iOS/macOS. Understanding the Mach-O format helps with the static analysis and dynamic debugging of applications covered later.
Mach-O file analysis tools
otool
You can view the available options from the command line (CLI):
man otool
...
-h     Display the Mach header.
-l     Display the load commands.
...
-h displays the Mach header:
otool -h <Mach-O file>
-l displays the load commands. If you are interested, try printing them yourself.
MachOView
MachOView is a free, open-source Mach-O file analysis tool
- GitHub link: gdbinit/MachOView
- Blue Cloud link: machoview.dmg
010 Editor
- Link: 010 Editor
- MachO template: machotemplate.bt
010 Editor is a powerful, paid product, but it needs a third-party template to analyze ARM64-based Mach-O programs.
Go to Templates -> View Installed Templates
Click the Add button, select the downloaded machotemplate.bt, configure the Name and Category, and click OK.
Go back to the main window, open the Mach-O file in 010 Editor, and select the template you just added from the Templates menu.
The analysis results are shown in the figure
The structure of Mach-O
As shown in the figure above, the Mach-O file consists of three parts
Part | Role |
---|---|
Mach-O header (Header) | Stores basic information such as CPU architecture, endianness, file type, and number of load commands |
Load commands (Load Commands) | Specify the logical structure of the file and its layout in virtual memory |
Data block (Data) | Raw data of the segments defined in Load Commands |
Header
The Mach-O header stores basic information such as the CPU architecture, endianness, file type, and number of load commands. It is used to check that a Mach-O file is valid and to determine the environment it can run in.
To view the header definition, press ⌘ + Shift + O in Xcode and type mach-o/loader.h
32-bit and 64-bit architectures describe the Mach-O header with the mach_header and mach_header_64 structures, respectively. This article focuses on 64-bit, which is defined as follows:
/*
 * The 64-bit mach header appears at the very beginning of object files for
 * 64-bit architectures.
 */
struct mach_header_64 {
    uint32_t      magic;       /* mach magic number identifier */
    cpu_type_t    cputype;     /* cpu specifier */
    cpu_subtype_t cpusubtype;  /* machine specifier */
    uint32_t      filetype;    /* type of file */
    uint32_t      ncmds;       /* number of load commands */
    uint32_t      sizeofcmds;  /* the size of all the load commands */
    uint32_t      flags;       /* flags */
    uint32_t      reserved;    /* reserved */
};
/* Constant for the magic field of the mach_header_64 (64-bit architectures) */
#define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */
#define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */
Field | Role |
---|---|
magic | Magic number (signature field) that identifies the file as 64-bit Mach-O and reveals its byte order. iOS is little-endian, so the value is the constant MH_MAGIC_64, fixed at 0xfeedfacf |
cputype | Identifies the CPU architecture; its type is cpu_type_t, defined in mach/machine.h |
cpusubtype | The specific CPU model within the architecture, distinguishing processor versions; its type is cpu_subtype_t, defined in mach/machine.h |
filetype | The Mach-O file type (executable, library, and so on); definitions and values are in mach-o/loader.h. Common values are MH_OBJECT (intermediate object file), MH_EXECUTE (executable), MH_DYLIB (dynamic library), and MH_DYLINKER (dynamic linker) |
ncmds | Number of load commands |
sizeofcmds | Total size of all load commands, in bytes |
flags | Flag bits; definitions and values are in mach-o/loader.h. Note #define MH_PIE 0x200000, which is used only in files of type MH_EXECUTE and indicates that ASLR is enabled, improving program security |
reserved | Reserved by the system |
ASLR: Address Space Layout Randomization
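The header fields described above can be decoded with a few lines of Python. This is a minimal sketch rather than a complete parser: it assumes a thin (non-fat) 64-bit image, the constants are copied from mach-o/loader.h and mach/machine.h, and the function name parse_mach_header_64 and the synthetic sample bytes are made up for illustration.

```python
import struct

# Constants copied from mach-o/loader.h and mach/machine.h
MH_MAGIC_64 = 0xFEEDFACF
MH_CIGAM_64 = 0xCFFAEDFE      # MH_MAGIC_64 with its bytes swapped
MH_EXECUTE = 0x2              # filetype of an executable
MH_PIE = 0x200000             # flag: position-independent executable
CPU_TYPE_ARM64 = 0x0100000C

HEADER_FIELDS = ("magic", "cputype", "cpusubtype", "filetype",
                 "ncmds", "sizeofcmds", "flags", "reserved")


def parse_mach_header_64(data):
    """Decode the 32-byte mach_header_64 at the start of `data`."""
    # Read the magic as little-endian first; a swapped value means the
    # rest of the file is stored big-endian.
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic == MH_MAGIC_64:
        endian = "<"
    elif magic == MH_CIGAM_64:
        endian = ">"
    else:
        raise ValueError(f"not a 64-bit Mach-O header: {magic:#x}")
    values = struct.unpack_from(endian + "IiiIIIII", data, 0)
    header = dict(zip(HEADER_FIELDS, values))
    header["little_endian"] = endian == "<"
    # MH_PIE is only meaningful for MH_EXECUTE files.
    header["pie"] = (header["filetype"] == MH_EXECUTE
                     and bool(header["flags"] & MH_PIE))
    return header


# Hypothetical sample: an arm64 executable header with 2 load commands.
sample = struct.pack("<IiiIIIII", MH_MAGIC_64, CPU_TYPE_ARM64, 0,
                     MH_EXECUTE, 2, 128, MH_PIE, 0)
header = parse_mach_header_64(sample)
print(hex(header["magic"]), header["ncmds"], header["pie"])
```

For a real file, the same function can be fed the first 32 bytes read in binary mode; the result should agree with what otool -h prints.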
Load Commands
Load Commands follow the header. They specify the logical structure of the file and its layout in virtual memory, telling the loader explicitly how to handle the binary data. Some commands are handled by the kernel and others by the dynamic linker (dyld).
Load Commands can be viewed as a list of commands, each identified by a type constant (cmd) prefixed with LC_, such as LC_SEGMENT.
The header file mach-o/loader.h defines every command. Each command has its own structure, but the first two fields are always cmd and cmdsize:
struct load_command {
    uint32_t cmd;      /* type of load command */
    uint32_t cmdsize;  /* total size of command in bytes */
};
Field | Role |
---|---|
cmd | Type of the current load command, such as LC_SEGMENT |
cmdsize | Size of the current load command in bytes, so that it can be parsed correctly and the next command located |
Depending on the command type (cmd), the kernel uses different functions for parsing.
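Because every command starts with cmd and cmdsize, a parser can walk the whole list without understanding each command. The sketch below illustrates that walk under some assumptions: little-endian data, a hypothetical helper named iter_load_commands, and a synthetic buffer in place of a real file (in a real 64-bit file the first command starts at offset 32, right after mach_header_64).

```python
import struct

# Command types copied from mach-o/loader.h
LC_LOAD_DYLIB = 0xC
LC_SEGMENT_64 = 0x19


def iter_load_commands(data, offset, ncmds):
    """Yield (cmd, cmdsize, offset) for each of the ncmds load commands."""
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmdsize < 8:
            # A command can never be smaller than struct load_command.
            raise ValueError("cmdsize smaller than load_command itself")
        yield cmd, cmdsize, offset
        offset += cmdsize  # cmdsize tells us where the next command begins


# Synthetic buffer: a 72-byte segment command followed by a 24-byte
# dylib command; the payload bytes are left as zeros.
blob = (struct.pack("<II", LC_SEGMENT_64, 72) + bytes(64)
        + struct.pack("<II", LC_LOAD_DYLIB, 24) + bytes(16))
commands = list(iter_load_commands(blob, 0, 2))
for cmd, cmdsize, where in commands:
    print(f"cmd={cmd:#x} cmdsize={cmdsize} offset={where}")
```

A dispatcher keyed on cmd can then hand each offset to a type-specific parser, which mirrors the "different functions per command type" idea above.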
Several important command types are described below.
LC_SEGMENT
LC_SEGMENT and LC_SEGMENT_64 are segment load commands. Each segment defines a virtual memory region that the dynamic linker maps to the process address space. Its structure is defined as follows:
struct segment_command_64 { /* for 64-bit architectures */
    uint32_t  cmd;          /* LC_SEGMENT_64 */
    uint32_t  cmdsize;      /* includes sizeof section_64 structs */
    char      segname[16];  /* segment name */
    uint64_t  vmaddr;       /* memory address of this segment */
    uint64_t  vmsize;       /* memory size of this segment */
    uint64_t  fileoff;      /* file offset of this segment */
    uint64_t  filesize;     /* amount to map from the file */
    vm_prot_t maxprot;      /* maximum VM protection */
    vm_prot_t initprot;     /* initial VM protection */
    uint32_t  nsects;       /* number of sections in segment */
    uint32_t  flags;        /* flags */
};
Field | Description |
---|---|
cmd | Type of the current command |
cmdsize | Size of the current command |
segname | Segment name, 16 bytes |
vmaddr | Virtual memory address of the segment |
vmsize | Virtual memory size of the segment |
fileoff | Offset of the segment in the file |
filesize | Size of the segment in the file |
maxprot | Maximum memory protection level for the segment's pages |
initprot | Initial memory protection level for the segment's pages |
nsects | Number of sections in the segment. A segment can contain zero or more sections |
flags | Segment flags (SG_HIGHVM, SG_FVMLIB, etc.) |
The system maps filesize bytes starting at file offset fileoff to virtual address vmaddr, in a region vmsize bytes long. The permissions of the segment's pages are initialized from initprot. They can be changed later but cannot exceed maxprot.
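The mapping rule above can be illustrated with a short Python sketch. It decodes one segment_command_64, renders the protection bits, and translates a file offset to its virtual address. The helper names, the sample __TEXT values, and the little-endian assumption are all illustrative; the VM_PROT_* values are the ones from mach/vm_prot.h.

```python
import struct

# cmd, cmdsize, segname[16], vmaddr, vmsize, fileoff, filesize,
# maxprot, initprot, nsects, flags  -> 72 bytes
SEGMENT_64 = struct.Struct("<II16sQQQQiiII")
SEGMENT_FIELDS = ("cmd", "cmdsize", "segname", "vmaddr", "vmsize", "fileoff",
                  "filesize", "maxprot", "initprot", "nsects", "flags")

VM_PROT_READ, VM_PROT_WRITE, VM_PROT_EXECUTE = 0x1, 0x2, 0x4
LC_SEGMENT_64 = 0x19


def prot_string(prot):
    """Render a vm_prot_t value in the familiar rwx form."""
    bits = ((VM_PROT_READ, "r"), (VM_PROT_WRITE, "w"), (VM_PROT_EXECUTE, "x"))
    return "".join(ch if prot & bit else "-" for bit, ch in bits)


def parse_segment_64(data, offset=0):
    seg = dict(zip(SEGMENT_FIELDS, SEGMENT_64.unpack_from(data, offset)))
    # segname is a fixed 16-byte field padded with NUL bytes.
    seg["segname"] = seg["segname"].split(b"\0", 1)[0].decode("ascii")
    return seg


def file_offset_to_vmaddr(seg, file_offset):
    """Translate a file offset inside the segment to its virtual address."""
    delta = file_offset - seg["fileoff"]
    if not 0 <= delta < seg["filesize"]:
        raise ValueError("offset is not backed by this segment")
    return seg["vmaddr"] + delta


# Hypothetical __TEXT segment: readable and executable, never writable.
raw = SEGMENT_64.pack(LC_SEGMENT_64, 72, b"__TEXT", 0x100000000, 0x4000,
                      0, 0x4000, VM_PROT_READ | VM_PROT_EXECUTE,
                      VM_PROT_READ | VM_PROT_EXECUTE, 0, 0)
seg = parse_segment_64(raw)
print(seg["segname"], prot_string(seg["initprot"]),
      hex(file_offset_to_vmaddr(seg, 0x100)))
```

Note that vmaddr here is the address before any ASLR slide is applied; at run time the loader adds a random slide to it for PIE executables.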
The four segments in the figure above serve the following purposes:
Segment | Description |
---|---|
__PAGEZERO | The static linker creates __PAGEZERO as the first segment of an executable. It sits at virtual address 0, occupies no space in the file, and cannot be read, written, or executed, so that null pointer accesses are caught |
__TEXT | Contains executable code and other read-only data. The static linker sets the virtual memory permissions of this segment to readable and executable, so the process can execute the code but not modify it |
__DATA | Contains data that will be modified. The static linker sets the virtual memory permissions of this segment to readable and writable |
__LINKEDIT | Contains raw data used by the dynamic linker, such as symbols, strings, and relocation table entries |
The 64-bit section structure is defined as follows:
struct section_64 { /* for 64-bit architectures */
    char     sectname[16];  /* name of this section */
    char     segname[16];   /* segment this section goes in */
    uint64_t addr;          /* memory address of this section */
    uint64_t size;          /* size in bytes of this section */
    uint32_t offset;        /* file offset of this section */
    uint32_t align;         /* section alignment (power of 2) */
    uint32_t reloff;        /* file offset of relocation entries */
    uint32_t nreloc;        /* number of relocation entries */
    uint32_t flags;         /* flags (section type and attributes)*/
    uint32_t reserved1;     /* reserved (for offset or index) */
    uint32_t reserved2;     /* reserved (for count or sizeof) */
    uint32_t reserved3;     /* reserved */
};
Field | Description |
---|---|
sectname | Section name, 16 bytes |
segname | Name of the segment the section belongs to, 16 bytes |
addr | Starting address of the section in memory |
size | Size of the section in memory |
offset | Offset of the section in the file |
align | Byte alignment of the section (as a power of 2) |
reloff | File offset of the relocation entries |
nreloc | Number of relocation entries |
flags | Section type and attributes |
reserved1/2/3 | Reserved by the system |
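As a rough illustration of the section fields, the sketch below unpacks one section_64 record, converts align from an exponent to a byte count, and splits flags into type and attributes using the SECTION_TYPE and SECTION_ATTRIBUTES masks from mach-o/loader.h. The function name and the sample __text values are hypothetical, and little-endian data is assumed.

```python
import struct

# sectname[16], segname[16], addr, size, offset, align, reloff, nreloc,
# flags, reserved1, reserved2, reserved3  -> 80 bytes
SECTION_64 = struct.Struct("<16s16sQQIIIIIIII")

# Masks copied from mach-o/loader.h
SECTION_TYPE = 0x000000FF
SECTION_ATTRIBUTES = 0xFFFFFF00


def parse_section_64(data, offset=0):
    (sectname, segname, addr, size, file_offset, align, reloff, nreloc,
     flags, _r1, _r2, _r3) = SECTION_64.unpack_from(data, offset)
    return {
        "sectname": sectname.split(b"\0", 1)[0].decode("ascii"),
        "segname": segname.split(b"\0", 1)[0].decode("ascii"),
        "addr": addr,
        "size": size,
        "offset": file_offset,
        "alignment": 1 << align,          # align stores the exponent
        "reloff": reloff,
        "nreloc": nreloc,
        "type": flags & SECTION_TYPE,
        "attributes": flags & SECTION_ATTRIBUTES,
    }


# Hypothetical __TEXT,__text section aligned to 4 bytes (align = 2).
raw = SECTION_64.pack(b"__text", b"__TEXT", 0x100003F00, 0x40, 0x3F00,
                      2, 0, 0, 0x80000400, 0, 0, 0)
sec = parse_section_64(raw)
print(sec["segname"], sec["sectname"], sec["alignment"], hex(sec["attributes"]))
```

In a real file, nsects of these records follow each segment_command_64 directly, which is why the segment's cmdsize includes them.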
LC_LOAD_DYLIB
LC_LOAD_DYLIB describes a library the program depends on and how to load it. These commands can be viewed with MachOView.
LC_LOAD_DYLIB uses the dylib_command structure:
struct dylib {
    union lc_str name;               /* library's path name */
    uint32_t timestamp;              /* library's build time stamp */
    uint32_t current_version;        /* library's current version number */
    uint32_t compatibility_version;  /* library's compatibility vers number*/
};
struct dylib_command {
    uint32_t     cmd;      /* LC_ID_DYLIB, LC_LOAD_{,WEAK_}DYLIB, LC_REEXPORT_DYLIB */
    uint32_t     cmdsize;  /* includes pathname string */
    struct dylib dylib;    /* the library identification */
};
Field | Description |
---|---|
name | Full path of the dependent library. The dynamic linker uses this path to load the library |
timestamp | Build time stamp of the dependent library |
current_version | Current version number |
compatibility_version | Compatibility version number |
LC_LOAD_WEAK_DYLIB also uses dylib_command. The difference is that the library it declares is optional: if the library is missing, the main program still runs. If a library declared with LC_LOAD_DYLIB cannot be found, the loader gives up and terminates the process.
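The name field is an lc_str, which in a file is an offset from the start of the command to a NUL-terminated path, and the version fields pack X.Y.Z into 16, 8, and 8 bits. The following sketch decodes both; the helper names and the hand-built sample command are illustrative assumptions, and little-endian data is assumed.

```python
import struct

# Command types copied from mach-o/loader.h
LC_LOAD_DYLIB = 0xC
LC_LOAD_WEAK_DYLIB = 0x80000018   # 0x18 | LC_REQ_DYLD


def version_string(packed):
    """Versions are packed as X.Y.Z in 16.8.8 bits."""
    return f"{packed >> 16}.{(packed >> 8) & 0xFF}.{packed & 0xFF}"


def parse_dylib_command(data, offset=0):
    cmd, cmdsize, name_off, timestamp, current, compat = struct.unpack_from(
        "<6I", data, offset)
    # The path lives inside the command, name_off bytes from its start.
    raw_name = data[offset + name_off:offset + cmdsize]
    return {
        "name": raw_name.split(b"\0", 1)[0].decode("utf-8"),
        "timestamp": timestamp,
        "current_version": version_string(current),
        "compatibility_version": version_string(compat),
        "weak": cmd == LC_LOAD_WEAK_DYLIB,
    }


# Hand-built sample command; the path is padded to a multiple of 8 bytes.
path = b"/usr/lib/libobjc.A.dylib\0"
path += bytes(-len(path) % 8)
raw = struct.pack("<6I", LC_LOAD_DYLIB, 24 + len(path), 24, 2,
                  228 << 16, 1 << 16) + path
info = parse_dylib_command(raw)
print(info["name"], info["current_version"], info["weak"])
```

Checking the weak flag this way is how a tool can tell optional dependencies from required ones without trying to load anything.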
You can use otool to see which libraries a binary depends on:
otool -arch arm64 -L LXFProtocolTool_Example
LXFProtocolTool_Example:
    /System/Library/Frameworks/Accelerate.framework/Accelerate (compatibility version 1.0.0, current version 4.0.0)
    @rpath/Alamofire.framework/Alamofire (compatibility version 1.0.0, ...)
    /usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)
    /usr/lib/swift/libswiftCoreMIDI.dylib (compatibility version 1.0.0, current version 5.0.0, weak)
    ...
In addition to system paths such as /System/Library/ and /usr/lib, you may encounter paths that start with @rpath or @executable_path
Path | Description |
---|---|
@executable_path | The directory that contains the executable |
@rpath | Specified by the LC_RPATH load command. On iOS it usually points to the application's own framework files; the default is @executable_path/Frameworks |
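To make the two placeholders concrete, here is a simplified sketch of how such an install name could be expanded into candidate file paths. It is not dyld's actual algorithm: it ignores @loader_path, environment variables, and fallback paths, and the function name, the /Applications/Demo.app location, and the @executable_path/Frameworks run path are assumptions chosen for the example.

```python
import posixpath

EXECUTABLE_PREFIX = "@executable_path/"
RPATH_PREFIX = "@rpath/"


def candidate_paths(install_name, executable_path, rpaths):
    """Return the file paths an install name could refer to (simplified)."""
    exe_dir = posixpath.dirname(executable_path)

    def expand(path):
        # @executable_path stands for the directory of the main executable.
        if path.startswith(EXECUTABLE_PREFIX):
            return posixpath.join(exe_dir, path[len(EXECUTABLE_PREFIX):])
        return path

    if install_name.startswith(RPATH_PREFIX):
        rest = install_name[len(RPATH_PREFIX):]
        # @rpath is tried against every LC_RPATH entry, in order.
        return [posixpath.join(expand(rpath), rest) for rpath in rpaths]
    return [expand(install_name)]


# Hypothetical app bundle with a single LC_RPATH entry.
paths = candidate_paths("@rpath/Alamofire.framework/Alamofire",
                        "/Applications/Demo.app/Demo",
                        ["@executable_path/Frameworks"])
print(paths)
```

With several LC_RPATH entries the function returns one candidate per entry, and the first candidate that exists on disk would be the one to load.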
These paths can be modified with install_name_tool, which ships with macOS. Note: this is required for injecting dynamic libraries on non-jailbroken devices!
# Modify the dependent library path
install_name_tool -change @rpath/Alamofire.framework/Alamofire @executable_path/Alamofire.framework/Alamofire LXFProtocolTool_Example
Universal binary
A universal binary (also known as a fat binary) is a package of Mach-O files for different architectures. A fat_header structure at the beginning of the file records the supported architectures and their offsets, as shown in the following figure:
The definition of universal binaries is in the header file mach-o/fat.h:
#define FAT_MAGIC 0xcafebabe
#define FAT_CIGAM 0xbebafeca /* NXSwapLong(FAT_MAGIC) */
struct fat_header {
    uint32_t magic;      /* FAT_MAGIC or FAT_MAGIC_64 */
    uint32_t nfat_arch;  /* number of structs that follow */
};
Field | Role |
---|---|
magic | Magic number (signature field), defined as the constant FAT_MAGIC with the fixed value 0xcafebabe |
nfat_arch | Number of architectures contained in the file |
The fat_header is followed by fat_arch structures, one per architecture, each describing the corresponding Mach-O file:
struct fat_arch {
    cpu_type_t    cputype;     /* cpu specifier (int) */
    cpu_subtype_t cpusubtype;  /* machine specifier (int) */
    uint32_t      offset;      /* file offset to this object file */
    uint32_t      size;        /* size of this object file */
    uint32_t      align;       /* alignment as a power of 2 */
};
Field | Role |
---|---|
offset | Offset of the corresponding architecture's data from the beginning of the file |
size | Size of the corresponding architecture's data |
align | Alignment boundary of the data, expressed as a power of 2 (2^N) |
cputype and cpusubtype were covered above, so they are not repeated here.
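A fat file can be split into its slices by reading fat_header and then nfat_arch fat_arch records. The sketch below does this on a synthetic buffer. One detail not stated above: the fat structures are stored big-endian on disk regardless of the host, which is why the format strings use ">". The function name and the sample offsets and sizes are made up, and the 64-bit fat variant (FAT_MAGIC_64) is not handled.

```python
import struct

FAT_MAGIC = 0xCAFEBABE
CPU_TYPE_X86_64 = 0x01000007
CPU_TYPE_ARM64 = 0x0100000C

FAT_HEADER = struct.Struct(">II")      # magic, nfat_arch (big-endian)
FAT_ARCH = struct.Struct(">iiIII")     # cputype, cpusubtype, offset, size, align


def parse_fat(data):
    """Return one dict per architecture slice in a 32-bit fat header."""
    magic, nfat_arch = FAT_HEADER.unpack_from(data, 0)
    if magic != FAT_MAGIC:
        raise ValueError(f"not a fat binary: {magic:#x}")
    slices = []
    for index in range(nfat_arch):
        cputype, cpusubtype, offset, size, align = FAT_ARCH.unpack_from(
            data, FAT_HEADER.size + index * FAT_ARCH.size)
        slices.append({"cputype": cputype, "cpusubtype": cpusubtype,
                       "offset": offset, "size": size,
                       "alignment": 1 << align})   # align is an exponent
    return slices


# Synthetic header describing two slices aligned to 16 KiB (2**14).
blob = (FAT_HEADER.pack(FAT_MAGIC, 2)
        + FAT_ARCH.pack(CPU_TYPE_X86_64, 3, 16384, 1000, 14)
        + FAT_ARCH.pack(CPU_TYPE_ARM64, 0, 32768, 2000, 14))
slices = parse_fat(blob)
for item in slices:
    print(hex(item["cputype"]), item["offset"], item["size"], item["alignment"])
```

Each slice at data[offset:offset + size] is itself a complete thin Mach-O file, so it can be passed on to a regular Mach-O header parser.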
References
- XNU source
- GitHub
- opensource.apple.com
- OS X ABI Mach-O File Format Reference