Main Contents:

  1. Understand executable files
  2. Understand Mach-O files
  3. Mach-O file structure
  4. Mach Header
  5. Load Commands
  6. Data
  7. Understand big-endian and little-endian byte order
  8. Understand universal binaries

Understand executable files

1. Executable files
  1. A process is essentially the result of loading an executable file into memory;
  2. An executable file must be in a format the operating system understands, and different operating systems use different executable formats;
2. Executables for different platforms
  • Linux: ELF files
  • Windows: PE32 / PE32+ files
  • macOS and iOS: Mach-O (Mach Object) files
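The three formats can be told apart by the magic bytes at the start of the file. Below is a minimal C sketch that assumes the first bytes of the file are already in a buffer. `detect_format` and the `exe_format` enum are hypothetical names.

The sketch checks only the leading bytes:
  • `0x7f 'E' 'L' 'F'` for ELF;
  • the `MZ` DOS stub that precedes a PE header;
  • the Mach-O magic numbers: 0xfeedface / 0xfeedfacf for thin files and 0xcafebabe for universal files.

A real tool would validate further, because 0xcafebabe is also the magic number of Java class files.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical classification of an executable by its leading magic bytes. */
typedef enum { FMT_UNKNOWN, FMT_ELF, FMT_PE, FMT_MACHO, FMT_MACHO_FAT } exe_format;

/* Read 4 bytes as a little-endian 32-bit value. */
static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read 4 bytes as a big-endian 32-bit value. */
static uint32_t read_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

exe_format detect_format(const unsigned char *buf, size_t len) {
    /* ELF: 0x7f 'E' 'L' 'F' (string split so "\x7f" does not swallow the 'E') */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return FMT_ELF;
    /* PE files begin with a DOS stub whose first two bytes are "MZ" */
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return FMT_PE;
    if (len >= 4) {
        /* Universal (fat) headers are always stored big-endian */
        if (read_be32(buf) == 0xcafebabeu)
            return FMT_MACHO_FAT;
        /* MH_MAGIC (32-bit) or MH_MAGIC_64 (64-bit), in either byte order */
        uint32_t le = read_le32(buf), be = read_be32(buf);
        if (le == 0xfeedfaceu || le == 0xfeedfacfu ||
            be == 0xfeedfaceu || be == 0xfeedfacfu)
            return FMT_MACHO;
    }
    return FMT_UNKNOWN;
}
```

A 64-bit arm64 Mach-O file starts with the bytes `cf fa ed fe`, which read as little-endian give 0xfeedfacf.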

Understand Mach-O files

As the executable file format of iOS, iPadOS and macOS, Mach-O is involved in app launch and execution, bitcode analysis, crash symbolication and many other areas:

1. Mach-O files
  1. Mach-O is the executable file format of iOS, iPadOS and macOS. The system runs files in this format according to the platform's application binary interface (ABI);
  2. Mach-O replaced the a.out format used in BSD systems. It holds the machine code and data produced during compilation and linking, and provides a single file format for both statically and dynamically linked code;
  3. Mach-O offers better extensibility and faster access to symbol table information;
2. Common file types in Mach-O format
  1. Executable: an executable file (for example a.out); object files (.o) are also Mach-O files, but of a separate type;
  2. Dylib: a dynamic link library;
  3. Bundle: cannot be linked against; it can only be loaded at runtime with dlopen();
  4. Image: a general term covering Executable, Dylib and Bundle;
  5. Framework: a folder containing a Dylib together with resource files and header files;

Mach-O file structure

1. Two ways to inspect Mach-O files
  1. Use the MachOView app, which displays the Mach-O file structure directly;
  2. Use terminal commands such as objdump;
2. View the Mach-O file structure

Opening a Mach-O file in MachOView shows its structure.

A Mach-O file consists of three main parts:

  1. Header: describes the CPU type, the file type, the number and total size of the load commands, etc.;
  2. Load Commands: the load commands, whose number and total size are given in the header;
  3. Data: the data area;
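The header has a fixed size and records sizeofcmds, so the start and the end of the load command area can be computed directly. Below is a small sketch assuming the 28-byte mach_header and 32-byte mach_header_64 layouts from loader.h. The function names are hypothetical.

```c
#include <stdint.h>

/* Sizes of the header structs in <mach-o/loader.h>: mach_header has seven
   4-byte fields; mach_header_64 adds a 4-byte reserved field. */
enum { MACH_HEADER_SIZE = 28, MACH_HEADER_64_SIZE = 32 };

/* The load commands start immediately after the header. */
uint32_t load_commands_offset(int is_64bit) {
    return is_64bit ? MACH_HEADER_64_SIZE : MACH_HEADER_SIZE;
}

/* First byte after the load commands: section contents can only start at or
   after this offset (in practice there is often padding before them). */
uint32_t end_of_load_commands(int is_64bit, uint32_t sizeofcmds) {
    return load_commands_offset(is_64bit) + sizeofcmds;
}
```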

Other information:

  1. Dynamic Loader Info: dynamic library loading information
  2. Function Starts: the start addresses of functions
  3. Symbol Table: the symbol table
  4. Dynamic Symbol Table: the dynamic symbol table
  5. String Table: string table

Mach Header

1. Overview
  1. The header is the first part read when the file is loaded, because it identifies basic information such as the architecture and the file type;
  2. The header contains the key information of the whole Mach-O file, such as the CPU type, the file type and the number of load commands, so that the system can quickly determine the environment the file needs to run in;
  3. There are separate headers for 32-bit and 64-bit CPUs, defined by the mach_header and mach_header_64 structures;
2. Source code analysis

The header is defined in loader.h as follows:

struct mach_header_64 {
    uint32_t    magic;          // Magic number (MH_MAGIC_64 for 64-bit); lets the kernel recognize the file as Mach-O
    cpu_type_t  cputype;        // CPU type, e.g. ARM
    cpu_subtype_t   cpusubtype; // CPU subtype, i.e. the specific variant of that CPU type
    uint32_t    filetype;       // Mach-O file type: executable, object file, dynamic library, etc.
    uint32_t    ncmds;          // Number of load commands (they immediately follow the header)
    uint32_t    sizeofcmds;     // Total size in bytes of all load commands
    uint32_t    flags;          // Flags describing features of the binary, mostly related to loading and linking
    uint32_t    reserved;       // Reserved; present only in the 64-bit header
};

Because executables, object files, dynamic libraries and other kinds of binaries all use the Mach-O format, the filetype field is needed to tell them apart. Common file types are as follows:

#define MH_OBJECT   0x1     /* Relocatable object file */
#define MH_EXECUTE  0x2     /* Executable file */
#define MH_DYLIB    0x6     /* Dynamic library */
#define MH_DYLINKER 0x7     /* Dynamic linker */
#define MH_DSYM     0xa     /* Companion file holding symbol information for debugging */
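To make the header fields concrete, here is a hedged C sketch that decodes a 64-bit header from raw bytes and names its filetype.

Assumptions:
  • The file is little-endian, as arm64 and x86_64 Mach-O files are.
  • A local mirror struct replaces <mach-o/loader.h>, so the sketch also builds on non-Apple systems.
  • `parse_header64`, `filetype_name` and `mach_header_64_t` are illustrative names, not Apple APIs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of mach_header_64 from <mach-o/loader.h>. The real struct uses
   cpu_type_t / cpu_subtype_t (32-bit ints); uint32_t is used here for simplicity. */
typedef struct {
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags, reserved;
} mach_header_64_t;

#define MH_MAGIC_64 0xfeedfacfu

/* Read 4 bytes as a little-endian 32-bit value. */
static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode a little-endian 64-bit Mach-O header. Returns 0 on success, -1 otherwise. */
int parse_header64(const unsigned char *buf, size_t len, mach_header_64_t *out) {
    if (len < 32 || read_le32(buf) != MH_MAGIC_64)
        return -1;
    out->magic      = read_le32(buf + 0);
    out->cputype    = read_le32(buf + 4);
    out->cpusubtype = read_le32(buf + 8);
    out->filetype   = read_le32(buf + 12);
    out->ncmds      = read_le32(buf + 16);
    out->sizeofcmds = read_le32(buf + 20);
    out->flags      = read_le32(buf + 24);
    out->reserved   = read_le32(buf + 28);
    return 0;
}

/* Map the filetype values listed above to their macro names. */
const char *filetype_name(uint32_t filetype) {
    switch (filetype) {
    case 0x1: return "MH_OBJECT";
    case 0x2: return "MH_EXECUTE";
    case 0x6: return "MH_DYLIB";
    case 0x7: return "MH_DYLINKER";
    case 0xa: return "MH_DSYM";
    default:  return "other";
    }
}
```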

Analyze Load Commands

1. Overview
  1. Load Commands is a list of load commands that describe how the Data is laid out in the file and in virtual memory;
  2. Load Commands record a lot of information, such as the location of the dynamic linker, the program entry point, dependent libraries, the location of code, the location of the symbol table, etc.;
  3. Load commands are defined by the kernel, and the number of commands differs from file to file; their count and total size are recorded in the header;
  4. The type of a load command is a constant prefixed with LC_, for example LC_SEGMENT, LC_SYMTAB, etc.;
2. Source code analysis
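For reference, below are the numeric values of a few common LC_ constants as defined in loader.h, together with a small lookup helper. The list is only a subset, and `lc_name` is a hypothetical helper. Some commands, such as LC_MAIN, also have the LC_REQ_DYLD bit (0x80000000) set, marking them as commands dyld must understand.

```c
#include <stdint.h>
#include <string.h>

/* A subset of the load command constants from <mach-o/loader.h>. */
#define LC_REQ_DYLD      0x80000000u
#define LC_SEGMENT       0x1u
#define LC_SYMTAB        0x2u
#define LC_DYSYMTAB      0xbu
#define LC_LOAD_DYLIB    0xcu
#define LC_LOAD_DYLINKER 0xeu
#define LC_SEGMENT_64    0x19u
#define LC_UUID          0x1bu
#define LC_MAIN          (0x28u | LC_REQ_DYLD)

/* Hypothetical helper: return the macro name for a cmd value, or "unknown". */
const char *lc_name(uint32_t cmd) {
    switch (cmd) {
    case LC_SEGMENT:       return "LC_SEGMENT";
    case LC_SYMTAB:        return "LC_SYMTAB";
    case LC_DYSYMTAB:      return "LC_DYSYMTAB";
    case LC_LOAD_DYLIB:    return "LC_LOAD_DYLIB";
    case LC_LOAD_DYLINKER: return "LC_LOAD_DYLINKER";
    case LC_SEGMENT_64:    return "LC_SEGMENT_64";
    case LC_UUID:          return "LC_UUID";
    case LC_MAIN:          return "LC_MAIN";
    default:               return "unknown";
    }
}
```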

Load Command is defined in loader.h as follows:

struct load_command {
    uint32_t cmd;       /* Type of the load command */
    uint32_t cmdsize;   /* The size of the load command */
};
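Every load command starts with cmd and cmdsize, so a parser can step from one command to the next without understanding all of them. Below is a minimal sketch of that walk.

Assumptions:
  • The data is little-endian.
  • `cmds` points at the first command, just after the header.
  • `list_load_commands` is a hypothetical name, and the bytes in the test are synthetic. In real 64-bit files, cmdsize is a multiple of 8.

```c
#include <stddef.h>
#include <stdint.h>

/* Read 4 bytes as a little-endian 32-bit value. */
static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk ncmds load commands stored in a sizeofcmds-byte area starting at `cmds`.
   Writes up to max_out command types to out_cmds and returns ncmds,
   or -1 if a command is truncated or has an invalid cmdsize. */
int list_load_commands(const unsigned char *cmds, uint32_t ncmds, uint32_t sizeofcmds,
                       uint32_t *out_cmds, size_t max_out) {
    uint32_t off = 0;                          /* offset of the current command */
    for (uint32_t i = 0; i < ncmds; i++) {
        if (sizeofcmds - off < 8)              /* room for cmd + cmdsize? */
            return -1;
        uint32_t cmd     = read_le32(cmds + off);
        uint32_t cmdsize = read_le32(cmds + off + 4);
        if (cmdsize < 8 || cmdsize > sizeofcmds - off)
            return -1;
        if (i < max_out)
            out_cmds[i] = cmd;
        off += cmdsize;                        /* jump to the next command */
    }
    return (int)ncmds;
}
```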

Each load command has its own structure, but all of them begin with the same two fields. For example, LC_SEGMENT_64 describes a segment and the sections it contains:

struct segment_command_64 { /* for 64-bit architectures */
    uint32_t    cmd;          // Type of the load command (LC_SEGMENT_64)
    uint32_t    cmdsize;      // Size of this command, including the nsects section_64 structures that follow it
    char        segname[16];  // 16-byte segment name
    uint64_t    vmaddr;       // Virtual memory start address of the segment
    uint64_t    vmsize;       // Virtual memory size of the segment
    uint64_t    fileoff;      // Offset of the segment in the file
    uint64_t    filesize;     // Size of the segment in the file
    vm_prot_t   maxprot;      // Maximum VM protection (VM_PROT_READ = 1, VM_PROT_WRITE = 2, VM_PROT_EXECUTE = 4)
    vm_prot_t   initprot;     // Initial VM protection of the segment's pages
    uint32_t    nsects;       // Number of sections in the segment
    uint32_t    flags;        // Flags
};
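The maxprot and initprot fields are bit masks of VM_PROT_READ (1), VM_PROT_WRITE (2) and VM_PROT_EXECUTE (4) from <mach/vm_prot.h>. Below is a small sketch that renders such a mask in the familiar rwx style. `prot_to_string` is a hypothetical helper.

```c
#include <stdint.h>

/* VM protection bits as defined in <mach/vm_prot.h>. */
#define VM_PROT_READ    0x1
#define VM_PROT_WRITE   0x2
#define VM_PROT_EXECUTE 0x4

/* Render a vm_prot_t value (an int) as "rwx"-style text; buf needs 4 bytes. */
void prot_to_string(int prot, char buf[4]) {
    buf[0] = (prot & VM_PROT_READ)    ? 'r' : '-';
    buf[1] = (prot & VM_PROT_WRITE)   ? 'w' : '-';
    buf[2] = (prot & VM_PROT_EXECUTE) ? 'x' : '-';
    buf[3] = '\0';
}
```

For example, __TEXT is typically mapped r-x (5), __DATA rw- (3) and __LINKEDIT r-- (1).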

Data

1. Overview
  1. Data holds the actual data and code, mainly including methods, the symbol table, the dynamic symbol table and dynamic library loading information (rebasing, symbol binding, etc.);
  2. Data is laid out exactly as described by the Load Commands;
  3. Data is organized into segments and sections: there is usually more than one segment, and each segment contains zero or more sections;
  4. Each segment maps a range of virtual addresses into the process's address space;

Almost all Mach-O files contain the following three segments:
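Each segment covers the range [vmaddr, vmaddr + vmsize), so finding the segment that owns a given address is a simple range check. Below is a sketch under these assumptions: `segment_range` is a hypothetical, trimmed-down view of segment_command_64, and the addresses in the test are made-up values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trimmed-down view of segment_command_64: only the fields used here. */
typedef struct {
    const char *segname;
    uint64_t vmaddr;
    uint64_t vmsize;
} segment_range;

/* Return the segment whose [vmaddr, vmaddr + vmsize) range contains addr, or NULL. */
const segment_range *segment_for_address(const segment_range *segs, size_t n, uint64_t addr) {
    for (size_t i = 0; i < n; i++) {
        /* written as a subtraction so vmaddr + vmsize cannot overflow */
        if (addr >= segs[i].vmaddr && addr - segs[i].vmaddr < segs[i].vmsize)
            return &segs[i];
    }
    return NULL;
}
```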

  1. __TEXT: the code segment, read-only and executable; it stores the binary code of functions (__text), constant C strings (__cstring), Objective-C class and method names, and similar information;
  2. __DATA: the data segment, readable and writable; it stores Objective-C string literals (__cfstring), runtime metadata such as classes, protocols and methods, and global and static variables;
  3. __LINKEDIT: read-only; it stores the information needed to launch the app, such as rebase and bind information and the names and addresses of functions;
2. Source code analysis

Sections make up a large part of the Data area, and in Mach-O they are concentrated in the __TEXT and __DATA segments.

A section is defined in loader.h as follows:

struct section_64 { /* for 64-bit architectures */
    char        sectname[16];   // Name of the section
    char        segname[16];    // Name of the segment that contains the section
    uint64_t    addr;       // Virtual memory address of the section
    uint64_t    size;       // Size of the section in bytes
    uint32_t    offset;     // File offset of the section
    uint32_t    align;      // Alignment, as a power of 2
    uint32_t    reloff;     // File offset of the relocation entries
    uint32_t    nreloc;     // Number of relocation entries
    uint32_t    flags;      // Flags: section type and attributes
    uint32_t    reserved1;  // Reserved (used for an offset or index)
    uint32_t    reserved2;  // Reserved (used for a count or sizeof)
    uint32_t    reserved3;  // Reserved
};
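Within a section, the addr and offset fields relate a virtual address to its position in the file, and align stores the alignment as a power of two. Below is a sketch of both calculations.

Assumptions and limits:
  • `section_info` is a hypothetical, trimmed copy of section_64.
  • The numbers in the test are made up.
  • The sketch does not handle zero-fill sections such as __bss, which have no file contents.

```c
#include <stdint.h>

/* Hypothetical trimmed copy of the section_64 fields used here (names match loader.h). */
typedef struct {
    uint64_t addr;    /* virtual address of the section */
    uint64_t size;    /* size in bytes */
    uint32_t offset;  /* file offset of the section */
    uint32_t align;   /* alignment as a power of 2 */
} section_info;

/* Translate a virtual address inside the section into a file offset; -1 if outside. */
int64_t section_file_offset(const section_info *s, uint64_t vmaddr) {
    if (vmaddr < s->addr || vmaddr - s->addr >= s->size)
        return -1;
    return (int64_t)s->offset + (int64_t)(vmaddr - s->addr);
}

/* align = 2 means 2^2 = 4-byte alignment. */
uint64_t section_alignment_bytes(const section_info *s) {
    return 1ull << s->align;
}
```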

Understand big-endian and little-endian byte order

When analyzing Mach-O files you constantly look at memory addresses and raw bytes, which brings up the concept of byte order (endianness).

  1. Little-endian: the low-order byte of the data is stored at the low memory address;
  2. Big-endian: the low-order byte of the data is stored at the high memory address;

iOS devices use ARM processors, which by default read data in little-endian order (low byte at the low address), while network and Bluetooth transmission uses big-endian order (low byte at the high address):

Take unsigned int value = 0x12345678 as an example, viewed as its four bytes unsigned char buf[4]:

Little-endian (the low byte is stored at the low address):
Low address ----------------> High address
0x78  |  0x56  |  0x34  |  0x12

Big-endian (the high byte is stored at the low address):
Low address ----------------> High address
0x12  |  0x34  |  0x56  |  0x78
Memory address | Little-endian content | Big-endian content
0x4000         | 0x78                  | 0x12
0x4001         | 0x56                  | 0x34
0x4002         | 0x34                  | 0x56
0x4003         | 0x12                  | 0x78
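Both byte orders can be observed directly in C. The sketch below checks the host's byte order using the 0x12345678 example above and shows a portable byte swap. The function names are hypothetical.

This matters for Mach-O because thin arm64 and x86_64 files are little-endian, while the universal (fat) headers in <mach-o/fat.h> are always stored big-endian.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if the machine running this code is little-endian. */
int host_is_little_endian(void) {
    uint32_t value = 0x12345678u;
    unsigned char buf[4];
    memcpy(buf, &value, sizeof buf);  /* copy the in-memory bytes of value */
    return buf[0] == 0x78;            /* low byte at the low address */
}

/* Reverse the byte order of a 32-bit value (little-endian <-> big-endian). */
uint32_t swap32(uint32_t v) {
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Store v in big-endian order regardless of the host's byte order. */
void store_be32(uint32_t v, unsigned char out[4]) {
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}
```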

Understand universal binaries

1. Basic concepts
  1. A universal binary packs Mach-O files for multiple architectures into one file; when the binary is loaded, the system detects and selects the appropriate architecture;
  2. Because a universal binary stores several architectures at once, it is much larger than a single-architecture binary and takes up more disk space. However, at run time the system picks only the most suitable architecture, so code for the other architectures is not loaded into memory and does not slow down execution;
  3. The universal binary format is also known as the fat binary format;
2. Universal binary format analysis

The universal binary format is defined in <mach-o/fat.h>:

  1. After downloading xnu, you can find the file under xnu -> EXTERNAL_HEADERS -> mach-o;
  2. A universal binary has two important structures: fat_header and fat_arch;

The two structures are defined as follows:

/* fat_header:
   - magic: FAT_MAGIC (0xcafebabe)
   - nfat_arch: number of fat_arch structures that follow, i.e. how many Mach-O files the universal binary contains */
struct fat_header {
    uint32_t    magic;      /* FAT_MAGIC */
    uint32_t    nfat_arch;  /* number of structs that follow */
};

/* fat_arch describes one Mach-O file:
   - cputype and cpusubtype: the platform the Mach-O file targets
   - offset, size, align: where the Mach-O file is located within the universal binary */
struct fat_arch {
    cpu_type_t  cputype;    /* cpu specifier (int) */
    cpu_subtype_t   cpusubtype; /* machine specifier (int) */
    uint32_t    offset;     /* file offset to this object file */
    uint32_t    size;       /* size of this object file */
    uint32_t    align;      /* alignment as a power of 2 */
};
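Selecting a slice amounts to reading the big-endian fat_header and then scanning the fat_arch entries for a matching cputype. Below is a simplified sketch; the CPU type values come from <mach/machine.h>.

Simplifications:
  • `find_fat_slice` is a hypothetical name.
  • It matches cputype only, whereas the real loader also weighs cpusubtype.
  • It ignores the 64-bit fat_arch_64 variant.

```c
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC       0xcafebabeu
#define CPU_ARCH_ABI64  0x01000000u
#define CPU_TYPE_X86_64 (7u | CPU_ARCH_ABI64)
#define CPU_TYPE_ARM64  (12u | CPU_ARCH_ABI64)

/* fat.h structures are always stored big-endian on disk. */
static uint32_t read_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Find the slice for `cputype`. On success stores its offset and size and
   returns 0; returns -1 if the buffer is not fat, is truncated, or has no match. */
int find_fat_slice(const unsigned char *buf, size_t len, uint32_t cputype,
                   uint32_t *offset, uint32_t *size) {
    if (len < 8 || read_be32(buf) != FAT_MAGIC)
        return -1;
    uint32_t nfat_arch = read_be32(buf + 4);
    for (uint32_t i = 0; i < nfat_arch; i++) {
        size_t base = 8 + (size_t)i * 20;    /* fat_arch is five 4-byte fields */
        if (base + 20 > len)
            return -1;
        if (read_be32(buf + base) == cputype) {
            *offset = read_be32(buf + base + 8);
            *size   = read_be32(buf + base + 12);
            return 0;
        }
    }
    return -1;
}
```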

References

  1. xnu
  2. Mach-O official source code