The source code for this article, which is the first in the Mach-O series, can be found here
For a program to run, its executable format must be understood by the operating system. ELF is the Linux executable format, PE32 / PE32+ is the Windows executable format, and Mach-O is the OS X and iOS executable format.
Executables, library files, dSYM files, dynamic libraries, and the dynamic linker all use this format. As shown in the following figure, a Mach-O file consists of a Header, Load Commands, and Data (the actual contents of the segments).
The structure of the Header
The Mach-O header allows quick confirmation of basic information, such as whether the file targets 32-bit or 64-bit, which processor it is built for, and what type of file it is.
Take the following code as an example
#include <stdio.h>
int main(int argc, const char * argv[]) {
    // insert code here...
    printf("Hello, World! \n");
    return 0;
}
Run the following command on the terminal to generate an executable file a.out
192:Test Joy$ gcc -g main.c
We can use MachOView (an open source tool for viewing Mach-O file information) to inspect the format of the a.out file
The output may look confusing at first, so let's look at the data structure of the header
32-bit structure
struct mach_header {
    uint32_t magic; /* mach magic number identifier */
    cpu_type_t cputype; /* cpu specifier */
    cpu_subtype_t cpusubtype; /* machine specifier */
    uint32_t filetype; /* type of file */
    uint32_t ncmds; /* number of load commands */
    uint32_t sizeofcmds; /* the size of all the load commands */
    uint32_t flags; /* flags */
};
64-bit structure
struct mach_header_64 {
    uint32_t magic; /* mach magic number identifier */
    cpu_type_t cputype; /* cpu specifier */
    cpu_subtype_t cpusubtype; /* machine specifier */
    uint32_t filetype; /* type of file */
    uint32_t ncmds; /* number of load commands */
    uint32_t sizeofcmds; /* the size of all the load commands */
    uint32_t flags; /* flags */
    uint32_t reserved; /* reserved */
};
There is no significant difference between the 32-bit and 64-bit headers, except that the 64-bit header has an extra reserved field
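As a minimal sketch of how the magic field at the start of both structures can be used, the helper below reads the first four bytes of a buffer and reports whether they look like a 32-bit or a 64-bit Mach-O header. The magic values are the ones defined in mach-o/loader.h, repeated here so the example compiles without that header. The helper name macho_bits is hypothetical, and fat (universal) binaries are not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Magic values from <mach-o/loader.h>, repeated so this sketch is
   self-contained. The CIGAM variants are the same numbers byte-swapped,
   seen when the file's byte order differs from the host's. */
#define MH_MAGIC    0xfeedfaceu
#define MH_CIGAM    0xcefaedfeu
#define MH_MAGIC_64 0xfeedfacfu
#define MH_CIGAM_64 0xcffaedfeu

/* Hypothetical helper: returns 32 or 64 for a thin Mach-O image,
   or 0 if the buffer does not start with a known Mach-O magic. */
static int macho_bits(const void *buf, size_t len)
{
    uint32_t magic;

    assert(buf != NULL);
    if (len < sizeof magic)
        return 0;

    /* memcpy avoids unaligned access on arbitrary buffers. */
    memcpy(&magic, buf, sizeof magic);

    switch (magic) {
    case MH_MAGIC:
    case MH_CIGAM:
        return 32;
    case MH_MAGIC_64:
    case MH_CIGAM_64:
        return 64;
    default:
        return 0;
    }
}
```

In practice the buffer would hold the first bytes of a file such as a.out, read with fread or mapped with mmap.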
magic: Magic number, used to quickly determine whether the file is 64-bit or 32-bit.
cputype: CPU type, for example ARM or ARM64.
cpusubtype: The specific subtype within that CPU type, such as ARMv7.
filetype: File type, such as executable file, library file, or dSYM file. In the demo the value is 2 (MH_EXECUTE), which stands for an executable file.
/*
 * Constants for the filetype field of the mach_header
 */
#define MH_OBJECT 0x1 /* relocatable object file */
#define MH_EXECUTE 0x2 /* demand paged executable file */
#define MH_FVMLIB 0x3 /* fixed VM shared library file */
#define MH_CORE 0x4 /* core file */
#define MH_PRELOAD 0x5 /* preloaded executable file */
#define MH_DYLIB 0x6 /* dynamically bound shared library */
#define MH_DYLINKER 0x7 /* dynamic link editor */
#define MH_BUNDLE 0x8 /* dynamically bound bundle file */
#define MH_DYLIB_STUB 0x9 /* shared library stub for static linking only, no section contents */
#define MH_DSYM 0xa /* companion file with only debug sections */
#define MH_KEXT_BUNDLE 0xb /* x86_64 kexts */
ncmds: Number of load commands.
sizeofcmds: Total size of all load commands.
reserved: Reserved field.
flags: Flag bits. The flags set in the demo are listed below; the rest can be found in the Mach-O source code if you are interested.
#define MH_NOUNDEFS 0x1 // The object file has no undefined references
#define MH_DYLDLINK 0x4 // This file is input for dyld and cannot be statically linked again
#define MH_PIE 0x200000 // The OS loads the main executable at a random address; only used in MH_EXECUTE files
#define MH_TWOLEVEL 0x80 // The image uses two-level namespace bindings
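Because flags is a bit field, each constant is tested with a bitwise AND. The following is a minimal sketch that turns a flags value into a readable list, covering only the four flags listed above. The function name describe_flags and the lookup table are hypothetical; the constant values are those shown in the listing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Flag values as listed above (from mach-o/loader.h). */
#define MH_NOUNDEFS 0x1u
#define MH_DYLDLINK 0x4u
#define MH_TWOLEVEL 0x80u
#define MH_PIE      0x200000u

/* Hypothetical helper: writes the names of the known flags that are set
   in `flags` into `out`, separated by single spaces. Unknown bits are
   ignored. The result is always NUL-terminated. */
static void describe_flags(uint32_t flags, char *out, size_t outlen)
{
    static const struct { uint32_t bit; const char *name; } known[] = {
        { MH_NOUNDEFS, "MH_NOUNDEFS" },
        { MH_DYLDLINK, "MH_DYLDLINK" },
        { MH_TWOLEVEL, "MH_TWOLEVEL" },
        { MH_PIE,      "MH_PIE"      },
    };
    size_t used = 0;

    assert(out != NULL && outlen > 0);
    out[0] = '\0';

    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if ((flags & known[i].bit) == 0)
            continue;
        int n = snprintf(out + used, outlen - used, "%s%s",
                         used ? " " : "", known[i].name);
        if (n < 0 || (size_t)n >= outlen - used)
            break; /* buffer too small: stop with a truncated result */
        used += (size_t)n;
    }
}
```

A value with all four bits set is 0x1 | 0x4 | 0x80 | 0x200000 = 0x200085.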
Random address space (ASLR)
Each time a process starts, the layout of its address space is randomized.
Address space layout randomization (ASLR) is an implementation detail that is irrelevant to most applications, but it matters a great deal to attackers.
Traditionally, the virtual memory image of a program was identical on every launch, so an attacker could rely on known addresses to overwrite memory and compromise the program. ASLR makes this kind of attack considerably harder.
dyld
When the kernel processes LC_LOAD_DYLINKER (more on this later), the dynamic linker starts, finds the dynamic libraries that the process depends on, and loads them into memory.
Two-level namespace
This is a feature specific to dyld: a symbol reference also records which library the symbol comes from, so two different libraries can export the same symbol name. Its counterpart is the flat namespace.
The Load commands structure
Following the header are the load commands, which tell the loader how to process the binary data. Some are handled by the kernel and some by the dynamic linker; comments in the source code indicate which ones are handled by the dynamic linker.
Here are a few that look familiar:
// Map a 32-bit or 64-bit segment of the file into the process address space
#define LC_SEGMENT 0x1
#define LC_SEGMENT_64 0x19
// The unique UUID that identifies the binary file
#define LC_UUID 0x1b /* the uuid */
// Start the dynamic linker, as mentioned earlier
#define LC_LOAD_DYLINKER 0xe /* load a dynamic linker */
// Code signing and encryption
#define LC_CODE_SIGNATURE 0x1d /* local of code signature */
#define LC_ENCRYPTION_INFO 0x21 /* encrypted segment information */
The structure of a load command is as follows
struct load_command {
    uint32_t cmd; /* type of load command */
    uint32_t cmdsize; /* total size of command in bytes */
};
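Every load command starts with these two fields, and cmdsize is what lets a reader step from one command to the next. The sketch below walks a buffer holding the load command area and counts the commands of one type. The function name count_load_commands is hypothetical, the struct is repeated so the example is self-contained, and byte-swapped files are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as load_command in mach-o/loader.h. */
struct load_command {
    uint32_t cmd;     /* type of load command */
    uint32_t cmdsize; /* total size of command in bytes */
};

/* Hypothetical helper: counts the commands whose cmd equals `wanted`.
   `cmds` points at the first load command, `ncmds` and `sizeofcmds`
   come from the header. Returns -1 if the data is malformed. */
static int count_load_commands(const uint8_t *cmds, uint32_t ncmds,
                               uint32_t sizeofcmds, uint32_t wanted)
{
    uint32_t off = 0;
    int found = 0;

    assert(cmds != NULL);
    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_command lc;

        /* There must be room for at least the two fixed fields. */
        if (sizeofcmds - off < sizeof lc)
            return -1;
        memcpy(&lc, cmds + off, sizeof lc);

        /* cmdsize must cover the fixed fields and stay in bounds. */
        if (lc.cmdsize < sizeof lc || lc.cmdsize > sizeofcmds - off)
            return -1;

        if (lc.cmd == wanted)
            found++;
        off += lc.cmdsize; /* step to the next command */
    }
    return found;
}
```

For a 64-bit file, cmds would point sizeof(struct mach_header_64) bytes past the start of the file, and wanted could be 0x19 (LC_SEGMENT_64) to count segments.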
LC_SEGMENT_64 and LC_SEGMENT are the main load commands; they instruct the kernel how to set up the memory space of the process
cmd: The type of the load command. Here it is LC_SEGMENT_64, which means mapping a 64-bit segment of the file into the address space of the process.
The structures for LC_SEGMENT_64 and LC_SEGMENT differ very little, so only the 64-bit one is listed below; if you are interested, you can read the source code
struct segment_command_64 { /* for 64-bit architectures */
uint32_t cmd; /* LC_SEGMENT_64 */
uint32_t cmdsize; /* includes sizeof section_64 structs */
char segname[16]; /* segment name */
uint64_t vmaddr; /* memory address of this segment */
uint64_t vmsize; /* memory size of this segment */
uint64_t fileoff; /* file offset of this segment */
uint64_t filesize; /* amount to map from the file */
vm_prot_t maxprot; /* maximum VM protection */
vm_prot_t initprot; /* initial VM protection */
uint32_t nsects; /* number of sections in segment */
uint32_t flags; /* flags */
};
cmdsize: The size of this load command.
vmaddr (VM Address): The virtual memory address of the segment.
vmsize (VM Size): The virtual memory size of the segment.
fileoff (File Offset): The offset of the segment in the file.
filesize (File Size): The size of the segment in the file.
Loading a segment means mapping the corresponding file contents into memory: filesize bytes starting at fileoff are loaded at the virtual address vmaddr. The segment shown here is __PAGEZERO (it has no access permissions and is used to trap null pointer dereferences), so its address, file offset, and file size are all zero.
There are other segments in the image, such as __TEXT, which holds the code, __DATA, which holds readable/writable data, and __LINKEDIT, which supports dyld and contains symbol tables and other data
nsects: The number of sections in the segment.
segname: The name of the segment, here __PAGEZERO.
Segment & Section
There is a naming convention to be aware of. As shown in the figure below, the uppercase __TEXT is a Segment name and the lowercase __text is a Section name
Section data structure
struct section { /* for 32-bit architectures */
char sectname[16]; /* name of this section */
char segname[16]; /* segment this section goes in */
uint32_t addr; /* memory address of this section */
uint32_t size; /* size in bytes of this section */
uint32_t offset; /* file offset of this section */
uint32_t align; /* section alignment (power of 2) */
uint32_t reloff; /* file offset of relocation entries */
uint32_t nreloc; /* number of relocation entries */
uint32_t flags; /* flags (section type and attributes)*/
uint32_t reserved1; /* reserved (for offset or index) */
uint32_t reserved2; /* reserved (for count or sizeof) */
};
sectname: The name of the section, such as __text or __stubs.
segname: The segment this section belongs to, such as __TEXT.
addr: The starting address of this section in memory.
size: The size of this section.
offset: The file offset of this section.
align: Byte alignment (a power of 2).
reloff: The file offset of the relocation entries.
nreloc: The number of relocation entries.
flags: Contains the section's type and attributes.
I’ve found that a lot of low-level knowledge builds on Mach-O, so I plan to spend some time in the near future writing relatively in-depth summaries around Mach-O, covering topics such as symbol resolution, bitcode, and reverse engineering
References
- Deep understanding of Mac OS X & iOS operating systems
- mach-o/loader.h
- Mach-o file format and program from load to execution process
- OS X ABI Mach-O File Format Reference