The source code for this article, which is the first in the Mach-O series, can be found here

For a program to run, its executable format must be understood by the operating system. ELF is the Linux executable format, PE32 / PE32+ is the Windows executable format, and Mach-O is the OS X and iOS executable format.

Executable files, library files, dSYM files, dynamic libraries, and the dynamic linker all use this format. The structure of a Mach-O file is shown in the following figure. It consists of a Header, Load Commands, and Data (the actual contents of each segment).

The structure of the Header

The Mach-O header makes it possible to quickly confirm basic information about the file: whether it targets 32-bit or 64-bit, which processor it is built for, and what type of file it is.

Take the following code as an example

#include <stdio.h>

int main(int argc, const char * argv[]) {
    // insert code here...
    printf("Hello, World! \n");
    return 0;
}

Run the following command on the terminal to generate an executable file a.out

192:Test Joy$ gcc -g main.c

We can use MachOView (an open source tool for viewing Mach-O file information) to see what the format of the a.out file looks like

The output can be confusing at first glance, so let's look at the data structure of the header

32-bit structure

struct mach_header {
    uint32_t    magic;        /* mach magic number identifier */
    cpu_type_t    cputype;    /* cpu specifier */
    cpu_subtype_t    cpusubtype;    /* machine specifier */
    uint32_t    filetype;    /* type of file */
    uint32_t    ncmds;        /* number of load commands */
    uint32_t    sizeofcmds;    /* the size of all the load commands */
    uint32_t    flags;        /* flags */
};

64-bit structure

struct mach_header_64 {
    uint32_t    magic;        /* mach magic number identifier */
    cpu_type_t    cputype;    /* cpu specifier */
    cpu_subtype_t    cpusubtype;    /* machine specifier */
    uint32_t    filetype;    /* type of file */
    uint32_t    ncmds;        /* number of load commands */
    uint32_t    sizeofcmds;    /* the size of all the load commands */
    uint32_t    flags;        /* flags */
    uint32_t    reserved;    /* reserved */
};

There is no significant difference between the 32-bit and 64-bit headers, except that the 64-bit header has an extra reserved field
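As a rough illustration of how the magic number distinguishes the two headers, here is a minimal sketch. The values of MH_MAGIC and MH_MAGIC_64 come from mach-o/loader.h and are not listed in this article; read_u32_le and macho_header_size are hypothetical helper names, and the sketch assumes a thin (non-fat) little-endian file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Magic values from mach-o/loader.h, redefined so the sketch builds anywhere. */
#define MH_MAGIC    0xfeedfaceu /* 32-bit Mach-O */
#define MH_MAGIC_64 0xfeedfacfu /* 64-bit Mach-O */

/* Hypothetical helper: read a little-endian uint32_t from a byte buffer. */
static uint32_t read_u32_le(const unsigned char *p) {
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: size in bytes of the header at the start of `file`,
 * or 0 if the magic is not recognised. mach_header has seven 4-byte fields
 * (28 bytes); mach_header_64 adds the reserved field (32 bytes). */
static size_t macho_header_size(const unsigned char *file, size_t len) {
    if (len < 4) return 0;
    switch (read_u32_le(file)) {
    case MH_MAGIC:    return 28;
    case MH_MAGIC_64: return 32;
    default:          return 0; /* fat binary, byte-swapped file, or not Mach-O */
    }
}
```

The load commands begin immediately after the header, so the returned size is also the file offset of the first load command.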

  • magic: magic number, used to quickly determine whether the file is 64-bit or 32-bit
  • cputype: CPU type, for example ARM
  • cpusubtype: the specific CPU variant, such as ARM64 or ARMv7
  • filetype: file type, such as executable file, library file, or dSYM file. In the demo the value is 2 (MH_EXECUTE), which stands for an executable file
/*
 * Constants for the filetype field of the mach_header
 */
#define MH_OBJECT 0x1 /* relocatable object file */
#define MH_EXECUTE 0x2 /* demand paged executable file */
#define MH_FVMLIB 0x3 /* fixed VM shared library file */
#define MH_CORE 0x4 /* core file */
#define MH_PRELOAD 0x5 /* preloaded executable file */
#define MH_DYLIB 0x6 /* dynamically bound shared library */
#define MH_DYLINKER 0x7 /* dynamic link editor */
#define MH_BUNDLE 0x8 /* dynamically bound bundle file */
#define MH_DYLIB_STUB 0x9 /* shared library stub for static */
#define MH_DSYM 0xa /* companion file with only debug */
#define MH_KEXT_BUNDLE 0xb /* x86_64 kexts */
  • ncmds: number of load commands
  • sizeofcmds: total size of all load commands
  • reserved: reserved field
  • flags: flag bits. The flags set in the demo are listed below; the rest can be found in the Mach-O source code if you are interested
#define MH_NOUNDEFS 0x1 // The object file has no undefined references
#define MH_DYLDLINK 0x4 // This file is input to dyld and cannot be statically linked again
#define MH_PIE 0x200000 // The main executable is loaded at a random address; only used in MH_EXECUTE files
#define MH_TWOLEVEL 0x80 // Two-level namespaces
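To make the filetype and flags fields more concrete, the following sketch turns their numeric values back into the constant names listed above. It only knows the subset of constants redefined in the snippet; filetype_name and describe_flags are hypothetical helper names, not part of any Apple API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Subset of the constants listed above, redefined so the sketch builds anywhere. */
#define MH_OBJECT   0x1
#define MH_EXECUTE  0x2
#define MH_DYLIB    0x6
#define MH_DSYM     0xa
#define MH_NOUNDEFS 0x1
#define MH_DYLDLINK 0x4
#define MH_TWOLEVEL 0x80
#define MH_PIE      0x200000

/* Hypothetical helper: name of a filetype value, or "unknown". */
static const char *filetype_name(uint32_t filetype) {
    switch (filetype) {
    case MH_OBJECT:  return "MH_OBJECT";
    case MH_EXECUTE: return "MH_EXECUTE";
    case MH_DYLIB:   return "MH_DYLIB";
    case MH_DSYM:    return "MH_DSYM";
    default:         return "unknown";
    }
}

/* Hypothetical helper: write the names of the known flag bits set in `flags`
 * into `out`, separated by spaces. Stops early if the buffer is too small. */
static void describe_flags(uint32_t flags, char *out, size_t out_size) {
    static const struct { uint32_t bit; const char *name; } known[] = {
        { MH_NOUNDEFS, "MH_NOUNDEFS" },
        { MH_DYLDLINK, "MH_DYLDLINK" },
        { MH_TWOLEVEL, "MH_TWOLEVEL" },
        { MH_PIE,      "MH_PIE" },
    };
    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if ((flags & known[i].bit) == 0) continue;
        size_t used = strlen(out);
        size_t name_len = strlen(known[i].name);
        size_t need = name_len + (used ? 1 : 0);
        if (used + need + 1 > out_size) break;   /* would not fit: stop */
        if (used) out[used++] = ' ';             /* separator between names */
        memcpy(out + used, known[i].name, name_len + 1);
    }
}
```

Because flags is a bit field, a single value can carry several of these constants at once, which is why each bit is tested separately with a mask.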

Random address space (ASLR)

Each time a process starts, the layout of its address space is randomized. This is known as address space layout randomization (ASLR).

Address space randomization is an implementation detail that is irrelevant to most applications, but it matters a great deal to attackers.

Traditionally, the virtual memory image of a program was identical every time it started, so an attacker who knew the layout could overwrite specific memory locations to break the program. ASLR makes this kind of attack much harder.
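The effect of the randomization can be sketched with a little arithmetic: the image keeps its internal layout, but every address is shifted by a random offset, commonly called the slide. The helper names below are hypothetical and the numbers in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: where a link-time address ends up once the image
 * has been shifted by `slide` bytes. */
static uint64_t runtime_address(uint64_t link_time_addr, uint64_t slide) {
    return link_time_addr + slide;
}

/* Hypothetical helper: the inverse mapping, as a symbolication tool would
 * need to translate a runtime address back into the file's address space. */
static uint64_t link_time_address(uint64_t runtime_addr, uint64_t slide) {
    assert(runtime_addr >= slide);
    return runtime_addr - slide;
}
```

For example, with a made-up slide of 0x4a3000, a function linked at 0x100000f50 would run at 0x1004a3f50. Since the slide changes on every launch, an attacker can no longer hard-code that address.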

dyld

When the kernel processes LC_LOAD_DYLINKER (more on this later), the dynamic linker starts, finds the dynamic libraries that the process depends on, and loads them into memory.

Two-level namespace

This is a feature unique to dyld: a symbol reference also records the library it comes from, so two different libraries can export the same symbol name. Its counterpart is the flat namespace.

The Load commands structure

The load commands follow the header and tell the loader how to process the binary data. Some are handled by the kernel and some by the dynamic linker; comments in the source code indicate which ones are handled by the dynamic linker.

Here are a few that look familiar:

// Map a 32-bit or 64-bit segment of the file to the process address space
#define LC_SEGMENT 0x1
#define LC_SEGMENT_64 0x19
// The unique UUID that identifies the binary file
#define LC_UUID 0x1b /* the uuid */
// Start the dynamic linker, as mentioned earlier
#define LC_LOAD_DYLINKER 0xe /* load a dynamic linker */
// Code signing and encryption
#define LC_CODE_SIGNATURE 0x1d /* local of code signature */
#define LC_ENCRYPTION_INFO 0x21 /* encrypted segment information */

The structure of a load command is as follows

struct load_command {
    uint32_t cmd;        /* type of load command */
    uint32_t cmdsize;    /* total size of command in bytes */
};
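Since every load command begins with cmd and cmdsize, a reader can step through all of them without understanding each type. The sketch below counts the commands of one type in a buffer that starts at the first load command. It assumes a little-endian file; read_u32_le and count_load_commands are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian uint32_t from a byte buffer. */
static uint32_t read_u32_le(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: count the load commands whose cmd equals `wanted`.
 * `cmds` points at the first load command, `sizeofcmds` and `ncmds` are the
 * values from the header. Returns -1 if a command is malformed. */
static int count_load_commands(const unsigned char *cmds, size_t sizeofcmds,
                               uint32_t ncmds, uint32_t wanted) {
    size_t off = 0;
    int found = 0;
    assert(cmds != NULL);
    for (uint32_t i = 0; i < ncmds; i++) {
        if (sizeofcmds - off < 8) return -1;       /* no room for cmd + cmdsize */
        uint32_t cmd = read_u32_le(cmds + off);
        uint32_t cmdsize = read_u32_le(cmds + off + 4);
        if (cmdsize < 8 || cmdsize > sizeofcmds - off) return -1;
        if (cmd == wanted) found++;
        off += cmdsize;                            /* next command starts here */
    }
    return found;
}
```

The bounds checks matter because cmdsize comes from the file itself; a corrupt value would otherwise send the loop outside the buffer.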

LC_SEGMENT_64 and LC_SEGMENT are the main load commands; they instruct the kernel how to set up the memory space of the process

  • cmd: the type of the load command. Here LC_SEGMENT_64 means mapping a 64-bit segment of the file into the address space of the process. The structures for LC_SEGMENT_64 and LC_SEGMENT differ very little, so only the 64-bit one is listed below; read the source code if you are interested in the other
struct segment_command_64 { /* for 64-bit architectures */
    uint32_t    cmd;        /* LC_SEGMENT_64 */
    uint32_t    cmdsize;    /* includes sizeof section_64 structs */
    char        segname[16];    /* segment name */
    uint64_t    vmaddr;        /* memory address of this segment */
    uint64_t    vmsize;        /* memory size of this segment */
    uint64_t    fileoff;    /* file offset of this segment */
    uint64_t    filesize;    /* amount to map from the file */
    vm_prot_t    maxprot;    /* maximum VM protection */
    vm_prot_t    initprot;    /* initial VM protection */
    uint32_t    nsects;        /* number of sections in segment */
    uint32_t    flags;        /* flags */
};
  • cmdsize: the size of the load command
  • VM Address (vmaddr): the virtual memory address of the segment
  • VM Size (vmsize): the virtual memory size of the segment
  • File Offset (fileoff): the offset of the segment in the file
  • File Size (filesize): the size of the segment in the file

The contents of the file that correspond to the segment are loaded into memory: filesize bytes, starting at fileoff, are mapped to the virtual address vmaddr. For the __PAGEZERO segment shown here both vmaddr and filesize are zero; this segment has no access permissions and exists to catch null pointer dereferences.

There are other segments in the image, such as __TEXT, which holds code; __DATA, which holds readable and writable data; and __LINKEDIT, which contains the symbol tables and other data that dyld needs

  • nsects: the number of sections in the segment
  • Segment name (segname): the name of the segment, here __PAGEZERO
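The comment on cmdsize in the struct above says it includes the section_64 structs, which means the section headers are stored directly after the segment command. A small sketch of the resulting arithmetic follows. The two sizes are what the struct layouts in mach-o/loader.h add up to (section_64 is not shown in this article), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    SEGMENT_COMMAND_64_SIZE = 72, /* size of struct segment_command_64 */
    SECTION_64_SIZE         = 80  /* size of struct section_64 */
};

/* Hypothetical helper: the cmdsize expected for an LC_SEGMENT_64 command
 * that carries `nsects` section headers. */
static uint64_t expected_segment_cmdsize(uint32_t nsects) {
    return SEGMENT_COMMAND_64_SIZE + (uint64_t)nsects * SECTION_64_SIZE;
}

/* Hypothetical helper: offset of section header `index`, measured from the
 * start of its segment command. */
static uint64_t section_header_offset(uint32_t index, uint32_t nsects) {
    assert(index < nsects);
    return SEGMENT_COMMAND_64_SIZE + (uint64_t)index * SECTION_64_SIZE;
}
```

A segment without sections, such as __PAGEZERO, therefore has a cmdsize of 72, and a segment with two sections has a cmdsize of 232.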

Segment & Section

One naming convention is worth noting. As shown in the figure below, the uppercase __TEXT is the name of a segment, while the lowercase __text is the name of a section

Section data structure

struct section { /* for 32-bit architectures */
    char        sectname[16];    /* name of this section */
    char        segname[16];    /* segment this section goes in */
    uint32_t    addr;        /* memory address of this section */
    uint32_t    size;        /* size in bytes of this section */
    uint32_t    offset;        /* file offset of this section */
    uint32_t    align;        /* section alignment (power of 2) */
    uint32_t    reloff;        /* file offset of relocation entries */
    uint32_t    nreloc;        /* number of relocation entries */
    uint32_t    flags;        /* flags (section type and attributes)*/
    uint32_t    reserved1;    /* reserved (for offset or index) */
    uint32_t    reserved2;    /* reserved (for count or sizeof) */
};
  • sectname: the name of the section, such as __text or __stubs
  • segname: the segment that the section belongs to, such as __TEXT
  • addr: the starting memory address of the section
  • size: the size of the section
  • offset: the file offset of the section
  • align: the alignment of the section, stored as a power of 2
  • reloff: the file offset of the relocation entries
  • nreloc: the number of relocation entries
  • flags: contains the type and attributes of the section
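Two of these fields are encoded rather than stored directly: align holds an exponent, and flags packs the section type and the attributes into one word. The sketch below decodes both. The two masks are taken from mach-o/loader.h and are not listed in this article; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Masks from mach-o/loader.h, redefined so the sketch builds anywhere. */
#define SECTION_TYPE       0x000000ffu /* low 8 bits: section type */
#define SECTION_ATTRIBUTES 0xffffff00u /* high 24 bits: section attributes */

/* Hypothetical helper: the section type stored in `flags`. */
static uint32_t section_type(uint32_t flags) {
    return flags & SECTION_TYPE;
}

/* Hypothetical helper: the attribute bits stored in `flags`. */
static uint32_t section_attributes(uint32_t flags) {
    return flags & SECTION_ATTRIBUTES;
}

/* Hypothetical helper: alignment in bytes for an `align` exponent. */
static uint64_t section_alignment_bytes(uint32_t align) {
    assert(align < 64);
    return (uint64_t)1 << align;
}
```

For instance, an align value of 4 means the section starts on a 16-byte boundary, and a flags value of 0x80000408 splits into type 0x8 and attributes 0x80000400.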

I've found that a lot of low-level knowledge builds on Mach-O, so in the near future I'm going to spend some time writing relatively in-depth summaries of Mach-O topics such as symbol resolution, bitcode, and reverse engineering.

References

  • Deep understanding of MAC OS X & iOS operating systems
  • mach-o/loader.h
  • Mach-o file format and program from load to execution process
  • OS X ABI Mach-O File Format Reference