In the previous article we introduced the compilation process. In this article we focus on the contents of its output: the Mach-O file.

Object files

The files the compiler generates from source code are called object files. Windows uses the Portable Executable (PE) format and Linux uses the Executable and Linkable Format (ELF); both are variations of the COFF format. An object file is source code that has been compiled but not yet linked (.obj on Windows, .o on Linux). Its structure is very similar to that of an executable file, so in a broad sense object files and executable files can be treated as the same kind of file. In this article we focus on Mach-O, the object file format of the iOS/macOS platform.

An overview of the Mach-O file contents

Using the MachOView tool, we can look at the contents of a Mach-O file.

It can be seen that it is mainly divided into three parts:

  • Header: describes the basic properties of the entire file, such as the magic number, the target CPU architecture, the file type, and the number and size of the load commands.
  • Load Commands: a sequence of commands that describe how the data in the file is organized. Different kinds of data are described by different load commands.
  • Data: the largest part of the file, holding the actual contents of each segment.

Header

Let’s look at the Header section first. The field values of the test project are as follows:

We can also find the definition in the loader.h file of the iOS SDK:

struct mach_header_64 {
  uint32_t  magic;    /* mach magic number identifier */
  cpu_type_t  cputype;  /* cpu specifier */
  cpu_subtype_t cpusubtype; /* machine specifier */
  uint32_t  filetype; /* type of file */
  uint32_t  ncmds;    /* number of load commands */
  uint32_t  sizeofcmds; /* the size of all the load commands */
  uint32_t  flags;    /* flags */
  uint32_t  reserved; /* reserved */
};

You can see that the Header mainly includes:

  • Magic Number: identifies the file as Mach-O and indicates its byte order (big-endian or little-endian) relative to the machine reading it. For a 64-bit file the value is MH_MAGIC_64, or MH_CIGAM_64 when the bytes are swapped.

  • CPU Type: indicates the CPU architecture, such as ARM64, X86_64, and I386.

  • CPU SubType: indicates the specific CPU variant within that architecture.

  • File Type: indicates the Mach-O file type, including MH_EXECUTE (executable file), MH_OBJECT (relocatable object file), and MH_DYLIB (dynamically bound shared library).

  • Number of Load Commands: the number of load commands in the file.

  • Size of Load Commands: the total size in bytes of all load commands in the file.

  • Flags: a combination of file flags. Each flag occupies one bit, so flags are combined with bitwise OR. Common flags include:

    • MH_NOUNDEFS: The file has no undefined references.
    • MH_DYLDLINK: The file is input for the dynamic linker and cannot be statically link-edited again.
    • MH_TWOLEVEL: The file uses two-level namespace bindings.
    • MH_PIE: The executable is loaded at a random address, which means ASLR is enabled. Only valid for MH_EXECUTE files.
  • Reserved: Reserved field.
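
To make the magic number and flag checks concrete, here is a minimal C sketch. The constant values are copied from loader.h so that the snippet also builds without the iOS/macOS SDK; `magic_info`, `classify_magic`, and `has_flag` are hypothetical names introduced only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from <mach-o/loader.h>. */
#define MH_MAGIC    0xfeedfaceu /* 32-bit, same byte order as the reader */
#define MH_CIGAM    0xcefaedfeu /* 32-bit, byte-swapped */
#define MH_MAGIC_64 0xfeedfacfu /* 64-bit, same byte order as the reader */
#define MH_CIGAM_64 0xcffaedfeu /* 64-bit, byte-swapped */

#define MH_NOUNDEFS 0x1u
#define MH_DYLDLINK 0x4u
#define MH_TWOLEVEL 0x80u
#define MH_PIE      0x200000u

/* Hypothetical helper type: what the magic number tells us. */
typedef struct {
    bool is_macho;   /* one of the four thin Mach-O magics */
    bool is_64bit;   /* header is mach_header_64 */
    bool needs_swap; /* fields must be byte-swapped before use */
} magic_info;

magic_info classify_magic(uint32_t magic) {
    magic_info info = { false, false, false };
    switch (magic) {
    case MH_MAGIC:    info.is_macho = true; break;
    case MH_CIGAM:    info.is_macho = true; info.needs_swap = true; break;
    case MH_MAGIC_64: info.is_macho = true; info.is_64bit = true; break;
    case MH_CIGAM_64: info.is_macho = true; info.is_64bit = true;
                      info.needs_swap = true; break;
    default: break; /* not a thin Mach-O file */
    }
    return info;
}

/* Flags are single bits, so a bitwise AND tests for one of them. */
bool has_flag(uint32_t flags, uint32_t flag) {
    return (flags & flag) != 0;
}
```

For example, a flags value of 0x200085 is MH_NOUNDEFS | MH_DYLDLINK | MH_TWOLEVEL | MH_PIE, so `has_flag(0x200085, MH_PIE)` is true.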

When the system loads a Mach-O file, it reads the Header first and uses it to locate the Load Commands. By following the load commands, it can then load the code we wrote.

Load Commands

Load Commands describe the segments in Data: the segment name, segment length, offset in the file, access permissions, and other attributes of each segment. They start immediately after the Header, so their starting position is determined by the size of the Header.

The Load Commands typically include the following:

  • LC_SEGMENT_64 (__PAGEZERO): This segment sits at address 0 in virtual memory and cannot be read, written, or executed. It is used to catch null pointer accesses: any attempt to access this segment crashes the program.
  • LC_SEGMENT_64 (__TEXT): Executable code and other read-only data, mapped into memory immediately after the end of __PAGEZERO.
  • LC_SEGMENT_64 (__DATA): Data that can be modified, mapped into memory immediately after the end of __TEXT.
  • LC_SEGMENT_64 (__LINKEDIT): Raw data for the dynamic linker, such as symbols, strings, and relocation table entries.
  • LC_DYLD_INFO_ONLY: Stores the information the dynamic linker needs to fix up addresses (rebasing and binding). Based on the offsets it records, we can find the relevant data in the Dynamic Loader Info.
  • LC_SYMTAB: The symbol table used by the file. It records the symbol table offset, the number of symbols, the string table offset, and the string table size.
  • LC_DYSYMTAB: The symbol table used by the dynamic linker. Among other things, it records the offset of the indirect symbol table.
  • LC_LOAD_DYLINKER: The path of the default loader (/usr/lib/dyld).
  • LC_UUID: The unique identifier (UUID) of the Mach-O file.
  • LC_VERSION_MIN_IPHONEOS: The lowest system version required by the Mach-O file, which corresponds to the deployment target configured in Xcode.
  • LC_SOURCE_VERSION: The source code version the binary was built from.
  • LC_MAIN: The program entry point, including the entry offset.
  • LC_ENCRYPTION_INFO_64: File encryption information, including the encryption flag and the offset and size of the encrypted data. A Crypt ID of 1 means the file is encrypted; 0 means it is not.
  • LC_LOAD_DYLIB: A dynamic library the file depends on, including its path, current version, and compatibility version. This corresponds to a framework dependency marked Required.
  • LC_LOAD_WEAK_DYLIB: A weakly linked dynamic library. If the library exists on the load path it is loaded; if it is missing, loading still succeeds, but using it when it is absent still raises an error. This corresponds to a framework dependency marked Optional.
  • LC_RPATH: A path used to resolve @rpath, added to the dynamic linker's list of search paths.
  • LC_FUNCTION_STARTS: A table of function start addresses.
  • LC_DATA_IN_CODE: A list of non-instruction data ranges located inside the code segment.
  • LC_CODE_SIGNATURE: Code signature information.
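
The walk from the Header through the Load Commands can be sketched in C as follows. This is a simplified illustration, not how dyld does it: it assumes the buffer holds a thin 64-bit Mach-O image in the host's byte order (no fat header, no byte swapping). The `header64` and `load_cmd` structs mirror the layouts of mach_header_64 and load_command from loader.h, and `count_load_commands` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LC_SEGMENT_64 0x19u /* value from <mach-o/loader.h> */

/* Same layout as mach_header_64 (cpu types spelled as int32_t). */
struct header64 {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

/* Every load command starts with these two fields. */
struct load_cmd {
    uint32_t cmd;     /* command type, e.g. LC_SEGMENT_64 */
    uint32_t cmdsize; /* total size of this command in bytes */
};

/* Counts the load commands of type `wanted`.
 * Returns -1 if the buffer is too short or a command is malformed. */
int count_load_commands(const uint8_t *buf, size_t len, uint32_t wanted) {
    struct header64 h;
    if (len < sizeof h) return -1;
    memcpy(&h, buf, sizeof h);
    if (h.sizeofcmds > len - sizeof h) return -1;

    size_t off = sizeof h;              /* commands start right after the header */
    size_t end = off + h.sizeofcmds;    /* and occupy sizeofcmds bytes */
    int count = 0;
    for (uint32_t i = 0; i < h.ncmds; i++) {
        struct load_cmd lc;
        if (end - off < sizeof lc) return -1;
        memcpy(&lc, buf + off, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > end - off) return -1;
        if (lc.cmd == wanted) count++;
        off += lc.cmdsize;              /* cmdsize leads to the next command */
    }
    return count;
}
```

Calling `count_load_commands(buf, len, LC_SEGMENT_64)` on such a buffer returns the number of segment commands. Because every command carries its own cmdsize, the loop can skip command types it does not understand.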

A segment load command is defined in loader.h as follows:

struct segment_command_64 { /* for 64-bit architectures */
  uint32_t  cmd;    /* LC_SEGMENT_64 */
  uint32_t  cmdsize;  /* includes sizeof section_64 structs */
  char    segname[16];  /* segment name */
  uint64_t  vmaddr;   /* memory address of this segment */
  uint64_t  vmsize;   /* memory size of this segment */
  uint64_t  fileoff;  /* file offset of this segment */
  uint64_t  filesize; /* amount to map from the file */
  vm_prot_t maxprot;  /* maximum VM protection */
  vm_prot_t initprot; /* initial VM protection */
  uint32_t  nsects;   /* number of sections in segment */
  uint32_t  flags;    /* flags */
};
  • cmd: The type of the load command (LC_SEGMENT_64 here). Together with the flags field below, it determines how the segment is loaded.
  • cmdsize: The total size of this load command, including the section_64 structures that follow it.
  • segname[16]: The name of the segment.
  • vmaddr: The virtual memory address of the segment.
  • vmsize: The virtual memory size allocated for the segment.
  • fileoff: The offset of the segment in the file.
  • filesize: The number of bytes the segment occupies in the file.
  • maxprot: The maximum memory protection allowed for the segment's pages, a bit mask of read, write, and execute permissions.
  • initprot: The initial memory protection of the segment's pages.
  • nsects: The number of sections in the segment.
  • flags: Miscellaneous segment flags.
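
As a small illustration of two of these fields, the sketch below decodes a protection value into an `rwx` style string and computes the cmdsize a segment command should have for a given nsects. The VM_PROT_* values are the standard Mach ones; the sizes 72 and 80 are the byte sizes of segment_command_64 and section_64 as laid out in loader.h. `prot_to_string` and `segment_cmdsize` are hypothetical helper names.

```c
#include <stdint.h>

/* Standard Mach VM protection bits. */
#define VM_PROT_READ    0x1
#define VM_PROT_WRITE   0x2
#define VM_PROT_EXECUTE 0x4

/* Byte sizes of the structs in <mach-o/loader.h>. */
#define SEGMENT_COMMAND_64_SIZE 72u
#define SECTION_64_SIZE         80u

/* Writes a three-character permission string plus terminator into out. */
void prot_to_string(int32_t prot, char out[4]) {
    out[0] = (prot & VM_PROT_READ)    ? 'r' : '-';
    out[1] = (prot & VM_PROT_WRITE)   ? 'w' : '-';
    out[2] = (prot & VM_PROT_EXECUTE) ? 'x' : '-';
    out[3] = '\0';
}

/* The section_64 headers follow the segment command directly,
 * so cmdsize grows by one section_64 per section. */
uint32_t segment_cmdsize(uint32_t nsects) {
    return SEGMENT_COMMAND_64_SIZE + nsects * SECTION_64_SIZE;
}
```

A protection value of 5 (read plus execute) decodes to `r-x`, 3 to `rw-`, and 0 to `---`, the last one matching a segment that cannot be read, written, or executed.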

Data

Data holds the contents of all sections: machine instructions, global and local static variables, symbol tables, debugging information, and so on are each stored in their corresponding section. The section structure is also defined in loader.h:

struct section_64 { /* for 64-bit architectures */
  char    sectname[16]; /* name of this section */
  char    segname[16];  /* segment this section goes in */
  uint64_t  addr;   /* memory address of this section */
  uint64_t  size;   /* size in bytes of this section */
  uint32_t  offset;   /* file offset of this section */
  uint32_t  align;    /* section alignment (power of 2) */
  uint32_t  reloff;   /* file offset of relocation entries */
  uint32_t  nreloc;   /* number of relocation entries */
  uint32_t  flags;    /* flags (section type and attributes)*/
  uint32_t  reserved1;  /* reserved (for offset or index) */
  uint32_t  reserved2;  /* reserved (for count or sizeof) */
  uint32_t  reserved3;  /* reserved */
};

Several important sections are as follows:

__TEXT (code segment) sections:

  • __text: The main program code.
  • __objc_methname: The names of OC methods.
  • __cstring: Read-only C strings.
  • __objc_methtype: OC method types (method signatures). A method signature encodes the method's return type and parameter types.

__DATA (data segment) sections:

  • __const: Constants.
  • __data: Mutable data that has been initialized.
  • __bss: Uninitialized global and local static variables.
  • __objc_classname: The names of OC classes.
  • __objc_classlist: The list of OC classes.
  • __objc_protolist: The list of OC protocols.
  • __la_symbol_ptr: The lazily bound symbol pointer table (there is also __nl_symbol_ptr, the non-lazily bound symbol pointer table).
  • __objc_nlclslist: The list of OC classes that implement the +load method.
  • __objc_catlist: The list of OC categories.
  • __objc_classrefs: The list of references to OC classes.
  • __objc_protorefs: The list of references to OC protocols.
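
When looking up one of these sections by name, note that sectname and segname are fixed 16-byte fields: a name that uses all 16 bytes (such as `__objc_classlist`) has no terminating NUL. The following C sketch shows one way to compare names safely. `section_names` is a cut-down, hypothetical stand-in for the first two fields of section_64, and `name_equals` and `section_matches` are illustrative helper names.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical cut-down type: only the two name fields of section_64. */
struct section_names {
    char sectname[16]; /* name of this section */
    char segname[16];  /* segment this section goes in */
};

/* Compares a fixed 16-byte field (possibly not NUL-terminated)
 * with an ordinary C string. */
bool name_equals(const char field[16], const char *name) {
    return strlen(name) <= 16 && strncmp(field, name, 16) == 0;
}

/* True if the section lives in segment `seg` and is called `sect`. */
bool section_matches(const struct section_names *s,
                     const char *seg, const char *sect) {
    return name_equals(s->segname, seg) && name_equals(s->sectname, sect);
}
```

Using a plain `strcmp` here would read past the end of the field for 16-character names, which is why the comparison is bounded to 16 bytes.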

Conclusion

In this article we looked at the structure of the Mach-O file, which mainly consists of:

  • Header: Used to quickly determine the CPU type and file type of the file.
  • Load Commands: Instruct the loader how to lay out and load the binary data.
  • Data: Stores the actual contents, such as code, data, string constants, classes, and methods.

Knowledge of the Mach-O file has many applications (more on that later), including, but not limited to, the following:

  • Understanding how fishhook works: it relies on __la_symbol_ptr and related structures to replace C functions.
  • Package size optimization: finding unused classes (__objc_classlist and __objc_classrefs).
  • Detecting use of the +load method (__objc_nlclslist).