  • Self-cultivation of an iOS Programmer (1): Compiling and Linking
  • Self-cultivation of an iOS Programmer (2): What’s in Mach-O
  • Self-cultivation of an iOS Programmer (3): Static Linking of Mach-O Files
  • Self-cultivation of an iOS Programmer (4): Loading Executable Files
  • Self-cultivation of an iOS Programmer (5): Dynamic Linking of Mach-O Files
  • Self-cultivation of an iOS Programmer (6): Dynamic Linking in Practice: How Fishhook Works
  • Self-cultivation of an iOS Programmer (7): Static Linking in Practice: How Static Library Instrumentation Works
  • Self-cultivation of an iOS Programmer (8): Memory

Mach-O

Mach-O files are widely described on the web, but most write-ups simply list the structures inside the file without explaining why they are arranged that way. Drawing on the book Self-cultivation of the Programmer, this article looks inside a Mach-O file with the MachOView tool.

Mach-O is the executable format on iOS, just as PE is on Windows and ELF is on Linux. PE and ELF are both variants of the COFF format, whose main contribution was the introduction of the “segment” mechanism, and Mach-O stores our applications in segments in the same way. In addition to machine-code instructions and data, Mach-O contains symbol tables, debugging information, string tables, and so on, all stored in segments. Apple’s official description of Mach-O’s structure is as follows:

This diagram is used below to analyze the internal details of Mach-O step by step. The whole Mach-O file is divided into three parts:

  • Header: The uppermost part is the Mach-O file Header, which describes the file format, the target CPU, the file type, and the number and size of the load commands.
  • Load Commands: Mostly descriptions of segments, each of which contains sections of a similar type. Why are they called load commands? Because the system uses them to load the file.
  • Data: The sections described by the Load Commands, including the machine code you wrote, the constants you defined, symbol tables, string tables, and other familiar sections. In other words, the application you write is split into sections and stored in a Mach-O file.

So why is Mach-O stored in this “segment” format? In fact, there are many benefits to this segmentation:

  1. Each segment can be mapped to a different memory region according to its read and write permissions. For example, the program’s instructions are read-only, so they are mapped to a read-only region, which prevents them from being overwritten intentionally or unintentionally.
  2. Modern CPUs have powerful cache architectures, and keeping instructions and data in separate segments improves locality and therefore the cache hit rate.
  3. When multiple copies of the program are running in the system, their instructions are the same, so keeping a single shared copy of the read-only instructions and data in memory saves a lot of memory.

Header

Below, the Ant Wealth app is opened in MachOView as an example to look at the internal details:

Because Data contains many sections, only some of them are captured here.

The Mach-O header structure and related constants are defined in the /usr/include/mach-o/loader.h file. Since iPhones from the 5s onward use 64-bit CPUs, here is the 64-bit version of the structure definition:

struct mach_header_64 {
	uint32_t	magic;		/* mach magic number identifier */
	cpu_type_t	cputype;	/* cpu specifier */
	cpu_subtype_t	cpusubtype;	/* machine specifier */
	uint32_t	filetype;	/* type of file */
	uint32_t	ncmds;		/* number of load commands */
	uint32_t	sizeofcmds;	/* the size of all the load commands */
	uint32_t	flags;		/* flags */
	uint32_t	reserved;	/* reserved */
};
  • magic: the “magic number” that identifies the format of the Mach-O file (MH_MAGIC_64, 0xfeedfacf, for a 64-bit file).
  • cputype: the CPU type.
  • cpusubtype: the specific machine model within that CPU type.
  • filetype: the type of file, such as executable, dynamic library, or object file.
  • ncmds: the number of Load Commands.
  • sizeofcmds: the total size in bytes of all the Load Commands.
  • flags: flag bits, many of which are read by the dynamic linker.
  • reserved: a reserved field.
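
As a rough illustration of how the magic field identifies the format, the sketch below classifies the first four bytes of a file. The DEMO_ constants repeat the values published in <mach-o/loader.h> and <mach-o/fat.h> so that the code also compiles away from macOS; the demo_ names are made up for this example and are not part of any Apple API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Magic values as published in <mach-o/loader.h> and <mach-o/fat.h>. */
#define DEMO_MH_MAGIC    0xfeedfaceu /* 32-bit Mach-O, host byte order */
#define DEMO_MH_CIGAM    0xcefaedfeu /* 32-bit Mach-O, byte-swapped    */
#define DEMO_MH_MAGIC_64 0xfeedfacfu /* 64-bit Mach-O, host byte order */
#define DEMO_MH_CIGAM_64 0xcffaedfeu /* 64-bit Mach-O, byte-swapped    */
#define DEMO_FAT_MAGIC   0xcafebabeu /* universal (fat) binary         */
#define DEMO_FAT_CIGAM   0xbebafecau /* universal binary, byte-swapped */

/* Hypothetical result type for this sketch. */
typedef enum { DEMO_UNKNOWN, DEMO_MACHO_32, DEMO_MACHO_64, DEMO_FAT } demo_kind;

/* Classifies a file by the 32-bit magic number stored in its first bytes. */
demo_kind demo_classify(const uint8_t *bytes, size_t len)
{
    uint32_t magic;

    assert(bytes != NULL);
    if (len < sizeof magic)
        return DEMO_UNKNOWN;

    /* memcpy avoids an unaligned read of the raw buffer. */
    memcpy(&magic, bytes, sizeof magic);

    switch (magic) {
    case DEMO_MH_MAGIC_64:
    case DEMO_MH_CIGAM_64:
        return DEMO_MACHO_64;
    case DEMO_MH_MAGIC:
    case DEMO_MH_CIGAM:
        return DEMO_MACHO_32;
    case DEMO_FAT_MAGIC:
    case DEMO_FAT_CIGAM:
        return DEMO_FAT;
    default:
        return DEMO_UNKNOWN;
    }
}
```

A caller would read the first four bytes of the file into a buffer and pass them to demo_classify before deciding whether to interpret what follows as a mach_header_64.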

The structure alone is rather abstract, so let’s take a look at the Header of the Ant Wealth app in MachOView:

The Header describes the Mach-O file as a whole. As can be seen from the figure above, the last field of the Header, reserved, sits at offset 0x1c from the start of the file. Although reserved holds no value, it still occupies 4 bytes, so the entire Header takes up 0x1c + 4 = 32 bytes. When the system loads a Mach-O file, it reads the Header first. The Load Commands follow the Header directly, and they in turn tell the loader how to load our code. An important field in the Header, sizeofcmds, gives the total size of the Load Commands, so the loader knows where they end.
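
The layout arithmetic above can be checked with a small sketch. demo_mach_header_64 is a stand-in that mirrors the fields of mach_header_64 with fixed-width types (cpu_type_t and cpu_subtype_t are 32-bit integers), so it compiles without the macOS SDK; the demo_ function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for mach_header_64 from <mach-o/loader.h>. */
struct demo_mach_header_64 {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

/* The last field starts at 0x1c and is 4 bytes wide, so the header is 32 bytes. */
_Static_assert(offsetof(struct demo_mach_header_64, reserved) == 0x1c,
               "reserved should sit at offset 0x1c");
_Static_assert(sizeof(struct demo_mach_header_64) == 32,
               "the 64-bit header should be 32 bytes");

/* File offset of the first load command: directly after the header. */
uint64_t demo_load_commands_start(void)
{
    return sizeof(struct demo_mach_header_64);
}

/* File offset of the first byte after the last load command. */
uint64_t demo_load_commands_end(const struct demo_mach_header_64 *h)
{
    assert(h != NULL);
    return demo_load_commands_start() + h->sizeofcmds;
}
```

The two static assertions fail at compile time if the layout ever differs from the 32-byte picture, which is why the sketch uses them instead of printing the values.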

Load Command

Apart from the Header, the Load Commands are the most important structure in a Mach-O file. They describe each segment in Data: its name, its length, its offset in the file, its read and write permissions, and other attributes. They begin immediately after the Header, so their position is determined by the size of the Header. Here are the start and end positions of the Load Commands:

MachOView shows the Load Commands starting at offset 0x20 and ending at offset 0x2380. 0x20 is 32 in decimal, which is exactly the size of the Header, so the Load Commands do begin immediately after the Header. The Header records sizeofcmds as 0x2368, which puts the end of the Load Commands at 0x20 + 0x2368 = 0x2388. The displayed end offset 0x2380 is 8 bytes short of that, presumably because it is the start offset of the last entry shown rather than the first byte after it.
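
To see how the load commands fill the range after the Header, here is a minimal sketch that walks them. Every load command in loader.h begins with a cmd field and a cmdsize field, and cmdsize is the distance to the next command. demo_load_command mirrors that common prefix, and demo_walk_commands is a hypothetical helper, not an Apple API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Common prefix of every load command (mirrors struct load_command). */
struct demo_load_command {
    uint32_t cmd;     /* type of load command, e.g. LC_SEGMENT_64 (0x19) */
    uint32_t cmdsize; /* total size of this command in bytes             */
};

/*
 * Walks ncmds load commands stored in buf, which holds the sizeofcmds
 * bytes that follow the header. Returns the number of bytes consumed,
 * or -1 if a command is truncated or has an implausible size.
 */
int64_t demo_walk_commands(const uint8_t *buf, uint32_t sizeofcmds,
                           uint32_t ncmds)
{
    uint32_t off = 0;

    assert(buf != NULL);
    for (uint32_t i = 0; i < ncmds; i++) {
        struct demo_load_command lc;

        /* off never exceeds sizeofcmds, so the subtraction cannot wrap. */
        if (sizeofcmds - off < sizeof lc)
            return -1;
        memcpy(&lc, buf + off, sizeof lc);

        if (lc.cmdsize < sizeof lc || lc.cmdsize > sizeofcmds - off)
            return -1;
        off += lc.cmdsize; /* step to the next command */
    }
    return off;
}
```

For a well-formed file the return value should equal sizeofcmds, which is the same consistency check the offsets in MachOView express visually.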

The main structure of Load Command in MachOView is as follows:

Most of the Load Commands describe segments, and each segment contains zero or more sections with similar attributes. The definition of the segment structure can also be found in /usr/include/mach-o/loader.h:

struct segment_command_64 { /* for 64-bit architectures */
	uint32_t	cmd;		/* LC_SEGMENT_64 */
	uint32_t	cmdsize;	/* includes sizeof section_64 structs */
	char		segname[16];	/* segment name */
	uint64_t	vmaddr;		/* memory address of this segment */
	uint64_t	vmsize;		/* memory size of this segment */
	uint64_t	fileoff;	/* file offset of this segment */
	uint64_t	filesize;	/* amount to map from the file */
	vm_prot_t	maxprot;	/* maximum VM protection */
	vm_prot_t	initprot;	/* initial VM protection */
	uint32_t	nsects;		/* number of sections in segment */
	uint32_t	flags;		/* flags */
};
  • cmd: the type of the load command (LC_SEGMENT_64 for a 64-bit segment). Together with the flags field below, it determines how the segment is loaded.
  • cmdsize: the size of this command, including the section_64 structures that follow it.
  • segname: the segment name.
  • vmaddr: the start address of the segment in virtual memory.
  • vmsize: the length of the segment in virtual memory.
  • fileoff: the offset of the segment in the file.
  • filesize: the number of bytes the segment occupies in the file.
  • maxprot and initprot: the maximum and initial protection of the segment in the process’s virtual address space, such as readable, writable, and executable.
  • nsects: the number of sections in the segment.
  • flags: additional flag bits for the segment.

vmaddr and vmsize are used when the application is loaded into virtual memory. When Mach-O is loaded, the loader starts at vmaddr in virtual memory and reserves vmsize bytes to hold this segment. This is the main difference between a Segment and a Section: the Segment is the unit of loading, so only the segment command carries the mapping size and protection fields, while a Section only records where it sits inside its Segment.

Data

Data holds the contents of every section: machine instructions, global and local static variables, symbol tables, debugging information, and so on, each stored in its corresponding section:

In Load Commands, the Segment entries give the location and size of each Section. In the figure above, the __text section starts at file offset 0x40E0, which is consistent with the offset field recorded for it under its Segment, as shown in the red box below:

The definition of the Section structure can also be found in loader.h:

struct section_64 { /* for 64-bit architectures */
	char		sectname[16];	/* name of this section */
	char		segname[16];	/* segment this section goes in */
	uint64_t	addr;		/* memory address of this section */
	uint64_t	size;		/* size in bytes of this section */
	uint32_t	offset;		/* file offset of this section */
	uint32_t	align;		/* section alignment (power of 2) */
	uint32_t	reloff;		/* file offset of relocation entries */
	uint32_t	nreloc;		/* number of relocation entries */
	uint32_t	flags;		/* flags (section type and attributes)*/
	uint32_t	reserved1;	/* reserved (for offset or index) */
	uint32_t	reserved2;	/* reserved (for count or sizeof) */
	uint32_t	reserved3;	/* reserved */
};

In particular, sectname indicates what information the Section holds. Here are some examples:

  • __text: executable machine code.
  • __cstring: C strings.
  • __const: constants.
  • __data: initialized mutable data.
  • __bss: uninitialized global and local static variables.
  • __objc_classname: Objective-C class names.
  • __objc_classlist: the list of Objective-C classes.
  • __objc_protolist: the list of Objective-C protocols.
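
One practical detail when reading these names: sectname and segname are fixed 16-byte fields, so a name that is exactly 16 characters long, such as __objc_classlist, is stored without a terminating NUL. The sketch below shows one careful way to copy and compare such a field; the demo_ helpers are hypothetical and not part of loader.h.

```c
#include <assert.h>
#include <string.h>

/*
 * Copies a 16-byte name field into out, which must hold at least
 * 17 bytes, and always NUL-terminates the result.
 */
void demo_copy_name(const char field[16], char out[17])
{
    assert(field != NULL && out != NULL);
    memcpy(out, field, 16);
    out[16] = '\0';
}

/* Returns 1 if the 16-byte field holds exactly the given name, else 0. */
int demo_name_equals(const char field[16], const char *name)
{
    assert(field != NULL && name != NULL);

    /* A name longer than the field can never match. */
    if (strlen(name) > 16)
        return 0;

    /* strncmp reads at most 16 bytes, so an unterminated field is safe. */
    return strncmp(field, name, 16) == 0;
}
```

Passing the raw field to strlen or printf("%s") would read past the end of a 16-character name, which is the mistake this sketch is meant to avoid.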

Section and Segment

The main difference between sections and segments is not explained in detail above. If there are sections, why should there be segments?

On iOS the page size is 16 KB, and memory is mapped in whole multiples of the page size. If every Section were mapped on its own, a section smaller than a page would still occupy a full page, so the wasted space (fragmentation) would grow with the number of sections. A Segment regroups the Mach-O sections from the point of view of loading: sections with the same permissions are merged and mapped together as one Segment. In other words, from the point of view of linking object files, a Mach-O file is organized by Section, while from the point of view of loading, it is divided by Segment. With that in mind, Apple’s official Mach-O structure diagram above should be easy to understand.
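
The saving can be made concrete with a little arithmetic. The sketch below compares the memory needed when each section is rounded up to a page on its own with the memory needed when the sections are merged first. It assumes the 16 KB page size mentioned above and ignores alignment padding between sections; all demo_ names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 16384u /* 16 KB, the page size assumed in the text */

/* Rounds a size up to the next multiple of the page size. */
uint64_t demo_round_up_to_page(uint64_t size)
{
    return (size + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE * DEMO_PAGE_SIZE;
}

/* Bytes mapped if every section gets its own page-aligned mapping. */
uint64_t demo_mapped_separately(const uint64_t *sizes, size_t n)
{
    uint64_t total = 0;

    assert(sizes != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += demo_round_up_to_page(sizes[i]);
    return total;
}

/* Bytes mapped if the sections are merged into one segment first. */
uint64_t demo_mapped_as_segment(const uint64_t *sizes, size_t n)
{
    uint64_t total = 0;

    assert(sizes != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += sizes[i];
    return demo_round_up_to_page(total);
}
```

For three small sections of 100, 2000, and 300 bytes, mapping them separately takes three pages (49152 bytes), while merging them into one segment takes a single page (16384 bytes).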

Reference

Self-cultivation of the Programmer