Mach-O file

Mach-O is short for Mach Object file format, the executable file format used on macOS and iOS. It is a format for executables, object code, and dynamic libraries. As a replacement for the a.out format, Mach-O provides greater extensibility. It plays the same role as PE (Portable Executable) on Windows and ELF (Executable and Linkable Format) on Linux.

Mach-O file types

  • Object files: .o
  • Library files
    • .a
    • .dylib
    • .framework
  • Executable files
  • dyld
  • .dSYM

In practice there are many different types of Mach-O file, and the type a target produces can be set in Xcode under Targets → Build Settings → Linking → Mach-O Type.

Universal binary

  • A binary format introduced by Apple that packages code for multiple architectures in a single file.
  • Each architecture gets its own natively compiled code, so the same package runs at full performance on all of them.
  • Because it stores several copies of the code, a universal binary is usually larger than a single-architecture binary.
  • Since only the code for the current architecture is used at run time, no extra memory is required to run it.

When building in Xcode, you can choose which architectures the Mach-O file is generated for, and add further architectures, under Targets → Build Settings → Architectures → Architectures.

Device CPU architecture (instruction set)

  • Simulator:
    • iPhone 4s to iPhone 5: i386
    • iPhone 5s to iPhone 6s Plus: x86_64
  • Physical devices (iOS):
    • armv6: iPhone, iPhone 3G, iPod Touch (first generation), iPod Touch (second generation)
    • armv7: iPhone 3GS, iPhone 4, iPhone 4s, iPad, iPad 2
    • armv7s: iPhone 5 and iPhone 5c
    • arm64: iPhone 5s and later
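The architecture names above correspond to the cputype and cpusubtype values stored in a Mach-O header. The following is a minimal sketch of that mapping, not Apple's implementation: the constants are repeated from <mach/machine.h> so that it builds on any platform, `arch_name` is a hypothetical helper, and plain `uint32_t` is used where the system headers use signed types.

```c
#include <stdint.h>

/* Values as defined in <mach/machine.h>, repeated so the sketch builds anywhere. */
#define CPU_ARCH_ABI64      0x01000000u
#define CPU_TYPE_X86        7u
#define CPU_TYPE_X86_64     (CPU_TYPE_X86 | CPU_ARCH_ABI64)
#define CPU_TYPE_ARM        12u
#define CPU_TYPE_ARM64      (CPU_TYPE_ARM | CPU_ARCH_ABI64)
#define CPU_SUBTYPE_MASK    0xff000000u  /* capability bits, not part of the subtype */
#define CPU_SUBTYPE_ARM_V6  6u
#define CPU_SUBTYPE_ARM_V7  9u
#define CPU_SUBTYPE_ARM_V7S 11u

/* Hypothetical helper: map a cputype/cpusubtype pair to an architecture name. */
static const char *arch_name(uint32_t cputype, uint32_t cpusubtype)
{
    uint32_t sub = cpusubtype & ~CPU_SUBTYPE_MASK;

    switch (cputype) {
    case CPU_TYPE_X86:    return "i386";
    case CPU_TYPE_X86_64: return "x86_64";
    case CPU_TYPE_ARM64:  return "arm64";
    case CPU_TYPE_ARM:
        switch (sub) {
        case CPU_SUBTYPE_ARM_V6:  return "armv6";
        case CPU_SUBTYPE_ARM_V7:  return "armv7";
        case CPU_SUBTYPE_ARM_V7S: return "armv7s";
        default:                  return "arm";
        }
    default:
        return "unknown";
    }
}
```

For instance, `arch_name(CPU_TYPE_ARM, CPU_SUBTYPE_ARM_V7S)` yields "armv7s", the architecture listed for iPhone 5.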

Splitting and merging Mach-O architectures

  1. The lipo tool
  • Show the architectures in a Mach-O file

$ lipo -info 'MachO file'

  • Extract a single architecture

$ lipo 'MachO file' -thin 'architecture name' -output 'target MachO file'

  • Merge architectures into one file

$ lipo -create 'first MachO file' 'second MachO file' -output 'target MachO file'

  2. The file command: displays file type information

$ file 'file path'
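The architecture list that these tools report for a universal binary comes from the fat header at the start of the file. Below is a minimal sketch of reading it, assuming the 32-bit fat layout from <mach-o/fat.h> (a big-endian magic and count, followed by 20-byte fat_arch entries). `list_fat_slices` and `struct slice_info` are hypothetical names introduced for illustration, not part of any Apple API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC      0xcafebabeu  /* value from <mach-o/fat.h> */
#define FAT_HEADER_LEN 8u           /* magic + nfat_arch */
#define FAT_ARCH_LEN   20u          /* cputype, cpusubtype, offset, size, align */

/* Fat headers are stored big-endian regardless of the host. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* One architecture slice, as described by a fat_arch entry. */
struct slice_info {
    uint32_t cputype;
    uint32_t cpusubtype;
    uint32_t offset;  /* where the slice starts in the file */
    uint32_t size;    /* length of the slice in bytes */
};

/*
 * Hypothetical helper: copy up to `max` slice descriptions into `out`.
 * Returns the number of slices the header declares, or -1 if `buf` is
 * not a 32-bit fat file or is too short to hold the declared entries.
 */
static int list_fat_slices(const uint8_t *buf, size_t len,
                           struct slice_info *out, size_t max)
{
    assert(buf != NULL && (out != NULL || max == 0));
    if (len < FAT_HEADER_LEN || read_be32(buf) != FAT_MAGIC)
        return -1;

    uint32_t n = read_be32(buf + 4);
    if (n > (len - FAT_HEADER_LEN) / FAT_ARCH_LEN)
        return -1;

    for (uint32_t i = 0; i < n && i < max; i++) {
        const uint8_t *arch = buf + FAT_HEADER_LEN + (size_t)i * FAT_ARCH_LEN;
        out[i].cputype    = read_be32(arch);
        out[i].cpusubtype = read_be32(arch + 4);
        out[i].offset     = read_be32(arch + 8);
        out[i].size       = read_be32(arch + 12);
    }
    return (int)n;
}
```

Extracting one architecture, as -thin does, then amounts to copying `size` bytes starting at `offset` for the chosen slice.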

Mach-O file structure

As shown in the figure, Mach-O consists of three parts: Header, Load Commands, and Data.

Header

Contains general information about the binary: CPU architecture, byte order, number of load commands, and so on.

struct mach_header_64 {
	uint32_t	magic;          /* mach magic number identifier */
	cpu_type_t      cputype;        /* cpu specifier */
	cpu_subtype_t	cpusubtype;     /* machine specifier */
	uint32_t	filetype;	/* type of file */
	uint32_t	ncmds;          /* number of load commands */
	uint32_t	sizeofcmds;     /* the size of all the load commands */
	uint32_t	flags;          /* flags */
	uint32_t	reserved;	/* reserved */
};

magic: magic number; identifies the file as 64-bit or 32-bit (e.g. MH_MAGIC_64)

cputype: CPU type (e.g. CPU_TYPE_ARM64)

cpusubtype: specific CPU model (e.g. CPU_SUBTYPE_ARM64_ALL)

filetype: file type (e.g. MH_EXECUTE)

ncmds: number of load commands

sizeofcmds: total size of the load commands, in bytes

flags: flag bits describing features of the binary, mainly related to loading and linking

reserved: reserved field (present only in the 64-bit header)
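As an illustration of how these fields are consumed, here is a minimal sketch that reads a mach_header_64 from a byte buffer and applies two basic sanity checks. It assumes the file and the host share the same byte order, so the magic compares equal to MH_MAGIC_64 without swapping. The struct is redeclared locally with fixed-width types so the sketch builds without <mach-o/loader.h>, and `read_header_64` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MH_MAGIC_64 0xfeedfacfu  /* value from <mach-o/loader.h> */
#define MH_EXECUTE  0x2u

/* Local copy of the header layout shown above. */
struct mach_header_64 {
    uint32_t magic;
    int32_t  cputype;     /* cpu_type_t is a 32-bit integer */
    int32_t  cpusubtype;  /* cpu_subtype_t likewise */
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

/*
 * Hypothetical helper: copy the header out of `buf` and validate it.
 * Returns 0 on success, -1 if the buffer is too short, the magic is not
 * MH_MAGIC_64, or the load commands would extend past the buffer.
 */
static int read_header_64(const uint8_t *buf, size_t len,
                          struct mach_header_64 *hdr)
{
    assert(buf != NULL && hdr != NULL);
    if (len < sizeof *hdr)
        return -1;
    memcpy(hdr, buf, sizeof *hdr);  /* memcpy avoids unaligned reads */
    if (hdr->magic != MH_MAGIC_64)
        return -1;
    if (hdr->sizeofcmds > len - sizeof *hdr)
        return -1;
    return 0;
}
```

On success, `hdr->filetype` can be compared against MH_EXECUTE, and `hdr->ncmds` and `hdr->sizeofcmds` describe the load commands that immediately follow the header.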

Load Commands

Describe where the segments, symbol table, dynamic symbol table, and so on are located; in other words, how the data in the file is organized. Different kinds of data are described by different load commands.

  • LC_SEGMENT_64(__PAGEZERO)
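    • VM Size: size of the virtual memory region, 4 GB on 64-bit (much smaller on 32-bit, typically 16 KB on 32-bit ARM). The segment takes no space in the file and maps no accessible memory; on 64-bit it covers the whole 32-bit address range, so null pointers and truncated 32-bit addresses fault instead of reaching valid data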
  • LC_SEGMENT_64(__TEXT)
  • LC_SEGMENT_64(__DATA)
  • LC_SEGMENT_64(__LINKEDIT)
    • VM Address: virtual memory address of the segment
    • VM Size: virtual memory size of the segment
    • File Offset: start position of the segment's data in the file
    • File Size: size of the segment's data in the file
  • LC_DYLD_INFO_ONLY (dynamic linking information)
    • Rebase: location of the rebase information. When the Mach-O file is loaded into memory, the system applies a random load offset (ASLR); adding this offset to the addresses recorded in the rebase information gives the actual location of the code in memory. The memory itself is then allocated according to the size
    • Binding: location of the binding information
    • Weak Binding: location of the weak binding information
    • Lazy Binding: location of the lazy binding information
    • Export: location of the export information (symbols visible to other images)
  • LC_SYMTAB (symbol table location)
    • Symbol Table Offset: position of the symbol table, which associates function names with function addresses
    • Number of Symbols: number of entries in the symbol table
    • String Table Offset: position of the symbol names
    • String Table Size: size of the symbol names
  • LC_DYSYMTAB (dynamic symbol table location)
  • LC_LOAD_DYLINKER (dyld)
    • Str Offset: position of the dynamic linker path string
    • Name: path of the dynamic linker (dyld)
  • LC_UUID (unique identifier of the Mach-O file)
  • LC_VERSION_MIN_IPHONEOS (lowest OS version the Mach-O file supports)
  • LC_SOURCE_VERSION
  • LC_MAIN (program entry: sets the entry address and stack size of the program's main thread)
    • Entry Offset: offset of the entry point (main) from the start of __TEXT
    • Stack Size: initial stack size of the main thread
    • The entry point address in memory is the load address of __TEXT plus this offset.
  • LC_ENCRYPTION_INFO_64 (encryption information)
    • Crypt Offset: file offset of the encrypted range
    • Crypt Size: size of the encrypted range
    • Crypt ID: identifies the encryption system; 0 means not encrypted, 1 means encrypted
  • LC_LOAD_DYLIB (paths of dependent libraries, including third-party libraries)
    • Str Offset: position of the library path string
    • Time Stamp: time stamp of the dynamic library
    • Current Version: version of the dynamic library
  • LC_RPATH (run path used to locate frameworks)
  • LC_FUNCTION_STARTS (table of function start positions)
  • LC_DATA_IN_CODE (table of data embedded in code)
  • LC_CODE_SIGNATURE (code signature)
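Every load command begins with the same two fields, cmd and cmdsize, so the list can be walked without understanding each command type. The sketch below counts the commands of one type; it assumes the load-command bytes have already been read into memory in host byte order. The constants are repeated from <mach-o/loader.h>, and `count_load_commands` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values from <mach-o/loader.h>. */
#define LC_SYMTAB     0x2u
#define LC_LOAD_DYLIB 0xcu
#define LC_SEGMENT_64 0x19u

/* Common prefix shared by every load command. */
struct load_command {
    uint32_t cmd;      /* command type, e.g. LC_SEGMENT_64 */
    uint32_t cmdsize;  /* total size of this command in bytes */
};

/*
 * Hypothetical helper: walk `ncmds` commands stored in
 * cmds[0 .. sizeofcmds) and count those whose type is `wanted`.
 * Returns -1 if a command is truncated or reports an invalid size.
 */
static int count_load_commands(const uint8_t *cmds, uint32_t sizeofcmds,
                               uint32_t ncmds, uint32_t wanted)
{
    uint32_t off = 0;  /* invariant: off <= sizeofcmds */
    int found = 0;

    assert(cmds != NULL);
    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_command lc;

        if (sizeofcmds - off < sizeof lc)
            return -1;
        memcpy(&lc, cmds + off, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > sizeofcmds - off)
            return -1;
        if (lc.cmd == wanted)
            found++;
        off += lc.cmdsize;  /* the next command starts right after this one */
    }
    return found;
}
```

Counting LC_LOAD_DYLIB, for example, gives the number of dynamic libraries the binary links against; a real parser would cast each command to its specific structure instead of only counting.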

Data

Consists of segments, each of which is divided into sections. It stores the concrete data: code, data, string constants, classes, methods, etc.

  1. Segment names
#define SEG_PAGEZERO    "__PAGEZERO" /* Catches null pointers in MH_EXECUTE files */
#define SEG_TEXT    "__TEXT" /* Code/read only data segment */
#define SEG_DATA    "__DATA" /* Data segment */
#define SEG_OBJC    "__OBJC" /* Objective-C runtime segment */
#define SEG_LINKEDIT    "__LINKEDIT" /* Symbols and other tables used by the dynamic linker, including the symbol table, string table, etc. */
  2. Segment data structure
struct segment_command_64 { 
    uint32_t    cmd;        /* LC_SEGMENT_64 */
    uint32_t    cmdsize;    /* Size of this command, including its section_64 structures */
    char        segname[16];    /* Segment name */
    uint64_t    vmaddr;     /* Virtual memory address of the segment */
    uint64_t    vmsize;     /* Virtual memory size allocated for the segment */
    uint64_t    fileoff;    /* Offset of the segment in the file */
    uint64_t    filesize;   /* Number of bytes the segment occupies in the file */
    vm_prot_t   maxprot;    /* Maximum memory protection for the pages of the segment */
    vm_prot_t   initprot;   /* Initial memory protection for the pages of the segment */
    uint32_t    nsects;     /* Number of sections in the segment */
    uint32_t    flags;      /* Flags */
};
  3. Section data structure

Segments (mainly __TEXT and __DATA) are further divided into sections.

struct section_64 { 
    char        sectname[16];   /* Section name */
    char        segname[16];    /* Name of Segment */
    uint64_t    addr;       /* Memory address where the Section is located */
    uint64_t    size;       /* Size of Section */
    uint32_t    offset;     /* File offset of the Section's data */
    uint32_t    align;      /* Section's memory-aligned boundary (a power of 2) */
    uint32_t    reloff;     /* File offset for relocation information */
    uint32_t    nreloc;     /* Number of relocation entries */
    uint32_t    flags;      /* Flag attribute */
    uint32_t    reserved1;  /* Reserved field 1 (for offset or index) */
    uint32_t    reserved2;  /* Reserved field 2 (for count or sizeof) */
    uint32_t    reserved3;  /* Reserved field 3 */
};
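In the file, the section_64 structures of a segment are stored immediately after its segment_command_64, nsects of them in a row. The following is a minimal sketch of looking up a section by name under that layout. The two structs are redeclared with fixed-width types so the sketch builds without <mach-o/loader.h>, the data is assumed to be in host byte order, and `find_section` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the layouts shown above (vm_prot_t is a 32-bit int). */
struct segment_command_64 {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    int32_t  maxprot, initprot;
    uint32_t nsects, flags;
};

struct section_64 {
    char     sectname[16];
    char     segname[16];
    uint64_t addr, size;
    uint32_t offset, align, reloff, nreloc, flags;
    uint32_t reserved1, reserved2, reserved3;
};

/*
 * Hypothetical helper: `cmd` points at an LC_SEGMENT_64 command of `len`
 * bytes. Copies the section called `name` into *out and returns 0, or
 * returns -1 if it is absent or the command is too short for nsects.
 */
static int find_section(const uint8_t *cmd, size_t len, const char *name,
                        struct section_64 *out)
{
    struct segment_command_64 seg;
    struct section_64 sec;

    assert(cmd != NULL && name != NULL && out != NULL);
    if (len < sizeof seg)
        return -1;
    memcpy(&seg, cmd, sizeof seg);
    if (seg.nsects > (len - sizeof seg) / sizeof sec)
        return -1;

    for (uint32_t i = 0; i < seg.nsects; i++) {
        memcpy(&sec, cmd + sizeof seg + (size_t)i * sizeof sec, sizeof sec);
        /* Names are fixed 16-byte fields and may lack a terminating NUL,
           so the comparison is bounded to the field width. */
        if (strncmp(sec.sectname, name, sizeof sec.sectname) == 0) {
            *out = sec;
            return 0;
        }
    }
    return -1;
}
```

The bounded strncmp matters because a 16-character name such as __objc_classrefs fills the whole field with no room for a terminator.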

Here are some common sections:

__TEXT,__text: main program code

__TEXT,__stubs / __stub_helper: stubs for dynamic linking

__TEXT,__objc_methname: Objective-C method names

__TEXT,__objc_classname: Objective-C class names

__TEXT,__objc_methtype: Objective-C method types

__TEXT,__cstring: C strings in the program

__DATA,__got: non-lazy symbol pointer table

__DATA,__la_symbol_ptr: lazy symbol pointer table

__DATA,__objc_classlist: Objective-C class list

__DATA,__objc_protolist: Objective-C protocol list

__DATA,__objc_imageinfo: Objective-C image information

__DATA,__objc_const: Objective-C constants

__DATA,__objc_selrefs: Objective-C selector references

__DATA,__objc_superrefs: Objective-C superclass references (super)

__DATA,__objc_protorefs: Objective-C protocol references

__DATA,__objc_data / __DATA,__data: Objective-C data and general initialized data

Dynamic Loader Info: information needed by the dynamic linker (rebasing, symbol binding, lazy binding, etc.)

Function Starts: table of function start addresses

Symbol Table: the symbol table

Dynamic Symbol Table: the dynamic symbol table

String Table: the string table

Code Signature: code signature information
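The symbol table and the string table work together: each symbol entry stores an index into the string table instead of the name itself. Below is a minimal sketch of that lookup, assuming the 64-bit nlist_64 entry layout from <mach-o/nlist.h> (where n_strx sits inside a union, flattened here) and a string table already loaded into memory. `symbol_name` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flattened local copy of the 64-bit symbol table entry. */
struct nlist_64 {
    uint32_t n_strx;   /* index of the name in the string table */
    uint8_t  n_type;   /* type flags */
    uint8_t  n_sect;   /* section number, or 0 for none */
    uint16_t n_desc;   /* additional description bits */
    uint64_t n_value;  /* value of the symbol, e.g. its address */
};

/*
 * Hypothetical helper: return the NUL-terminated name of `sym`, or NULL
 * if its index is outside the string table or the name is unterminated.
 */
static const char *symbol_name(const char *strtab, uint32_t strsize,
                               const struct nlist_64 *sym)
{
    assert(strtab != NULL && sym != NULL);
    if (sym->n_strx >= strsize)
        return NULL;
    /* Make sure a terminator exists before the end of the table. */
    if (memchr(strtab + sym->n_strx, '\0', strsize - sym->n_strx) == NULL)
        return NULL;
    return strtab + sym->n_strx;
}
```

Here `strtab` and `strsize` correspond to the String Table Offset and String Table Size fields of LC_SYMTAB, and pairing the returned name with `n_value` gives the name-to-address association that the symbol table provides.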