iOS underlying principles + reverse engineering: article summary

Virtual memory & physical memory

In the early days, data was accessed directly from physical addresses, which had two problems:

  • 1. Not enough memory

  • 2. Insecure in-memory data

Solution for running out of memory: virtual memory

For problem 1, a middle layer was added between processes and physical memory. This layer is called virtual memory, and it is mainly used to manage physical memory when multiple processes exist at the same time. It improves CPU utilization by allowing multiple processes to be loaded simultaneously and on demand. Virtual memory is therefore essentially a mapping table between virtual addresses and physical addresses.

  • Each process has its own independent virtual address space that starts at 0 and has a fixed size (4 GB for a 32-bit process). The space is divided into pages (16 KB on iOS, 4 KB on many other systems), and data is loaded one page at a time. Processes cannot access each other's memory, which keeps data isolated between processes.

  • Only part of a process is active at any time, so only the active part needs to be placed in physical memory, which avoids wasting physical memory.

  • When the CPU needs to access data, it issues a virtual address, which is then translated: the corresponding physical address is looked up in the mapping table, and that physical address is accessed.

  • If the page behind a virtual address has not been loaded into physical memory when it is accessed, a page fault exception occurs and the current process blocks. The data must then be loaded into physical memory before the address can be translated and read. Loading on demand in this way avoids wasting memory.

The following figure shows the relationship between virtual memory and physical memory
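The lookup and page-fault behaviour described above can be sketched in C. This is only an illustration under simplifying assumptions: the single-level table, the tiny 8-page address space, and the names `page_entry`, `translate`, and `handle_fault` are all hypothetical and are not how the kernel actually implements paging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 16384u   /* 16 KB pages, as on iOS */
#define NUM_PAGES 8u       /* tiny address space, for illustration only */

/* One entry of a hypothetical per-process mapping table. */
typedef struct {
    bool     present;      /* is the page currently in physical memory? */
    uint32_t frame;        /* physical frame number, valid when present */
} page_entry;

/* Translate a virtual address. Returns false to signal a page fault. */
static bool translate(const page_entry *table, uint32_t vaddr, uint32_t *paddr)
{
    assert(table != NULL && paddr != NULL);
    uint32_t page   = vaddr / PAGE_SIZE;   /* which virtual page */
    uint32_t offset = vaddr % PAGE_SIZE;   /* position inside that page */
    if (page >= NUM_PAGES || !table[page].present)
        return false;                      /* not loaded (or not mapped) */
    *paddr = table[page].frame * PAGE_SIZE + offset;
    return true;
}

/* What happens on a fault: the page is loaded into a free frame,
   after which the access can be retried. */
static void handle_fault(page_entry *table, uint32_t vaddr, uint32_t free_frame)
{
    assert(table != NULL);
    uint32_t page = vaddr / PAGE_SIZE;
    if (page < NUM_PAGES) {
        table[page].present = true;
        table[page].frame   = free_frame;
    }
}
```

A caller would first try `translate`; if it returns false, it would call `handle_fault` with a free frame and then retry the translation, which mirrors the "block, load, then address again" sequence in the list above.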

Security of in-memory data: ASLR technology

As explained above, the starting address and the size of each process's virtual memory are fixed. This means that the address of a given piece of data is the same on every launch, which makes the data easy to locate and attack. To solve this problem, Apple introduced ASLR starting with iOS 4.3.

Concept of ASLR: Address Space Layout Randomization is a security technique against buffer overflow attacks. By randomizing the layout of linear regions such as the heap, the stack, and shared library mappings, it makes it hard for attackers to predict target addresses and prevents them from jumping directly to their attack code.

Its purpose is to lay out the address space randomly, so that sensitive data (such as an app's login, registration, and payment code) ends up at addresses that a malicious program cannot know in advance, which makes attacks harder.

Because of ASLR, the load address of executables and dynamic libraries in virtual memory is different on each launch, so the pointers stored in an image must be fixed up when the image is loaded so that they point to the correct addresses. Correct memory address = ASLR address (the random slide) + offset value
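The formula above (correct address = ASLR slide + offset) can be written as a short C sketch. The function names `runtime_address` and `rebase_pointers` and the flat pointer table are assumptions made for illustration; the real fix-up is performed by the dynamic linker using information stored in the image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Runtime address = ASLR slide + address recorded in the file. */
static uint64_t runtime_address(uint64_t slide, uint64_t file_address)
{
    return slide + file_address;
}

/* Fix up a hypothetical table of stored pointers by adding the slide
   to each one, so that they point to where the image actually landed. */
static void rebase_pointers(uint64_t *pointers, size_t count, uint64_t slide)
{
    assert(pointers != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        pointers[i] += slide;
}
```

For example, with a slide of 0x2c000, a pointer recorded as 0x100004000 becomes 0x100030000 at run time. On Apple platforms the slide of a loaded image can be queried with `_dyld_get_image_vmaddr_slide` from `<mach-o/dyld.h>`; the sketch avoids that call so that it compiles anywhere.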

Executable file

Different operating systems use different executable file formats. The kernel reads the executable into memory and then uses the signature in its header (the magic number) to determine the format of the binary file.

PE, ELF, and Mach-O are all variants of, or borrow from, COFF (Common Object File Format). The main contribution of COFF was introducing the mechanism of “segments” in object files: different object files can contain different numbers and types of “segments”.
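As a rough sketch of how a magic number identifies a format, the function below inspects the first bytes of a file. The enum and the name `detect_format` are hypothetical; the byte patterns are the on-disk signatures of little-endian Mach-O (0xfeedface / 0xfeedfacf), universal binaries (0xcafebabe, stored big-endian), ELF, and the "MZ" stub that PE files start with.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum {
    FMT_UNKNOWN, FMT_MACHO_32, FMT_MACHO_64, FMT_FAT, FMT_ELF, FMT_PE
} file_format;

/* Guess the executable format from the leading bytes of a file. */
static file_format detect_format(const uint8_t *bytes, size_t len)
{
    static const uint8_t sig_elf[4]     = {0x7f, 'E', 'L', 'F'};
    static const uint8_t sig_macho32[4] = {0xce, 0xfa, 0xed, 0xfe}; /* 0xfeedface, little-endian */
    static const uint8_t sig_macho64[4] = {0xcf, 0xfa, 0xed, 0xfe}; /* 0xfeedfacf, little-endian */
    static const uint8_t sig_fat[4]     = {0xca, 0xfe, 0xba, 0xbe}; /* 0xcafebabe, big-endian */

    assert(bytes != NULL);
    if (len >= 4) {
        if (memcmp(bytes, sig_macho32, 4) == 0) return FMT_MACHO_32;
        if (memcmp(bytes, sig_macho64, 4) == 0) return FMT_MACHO_64;
        if (memcmp(bytes, sig_fat, 4) == 0)     return FMT_FAT;
        if (memcmp(bytes, sig_elf, 4) == 0)     return FMT_ELF;
    }
    if (len >= 2 && bytes[0] == 'M' && bytes[1] == 'Z')
        return FMT_PE;                          /* DOS stub at the start of PE files */
    return FMT_UNKNOWN;
}
```

A real loader checks far more than the first four bytes (0xcafebabe, for instance, is also the signature of Java class files), so this should be read as a first-pass classification only.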

Universal binary

Different CPU platforms support different instruction sets, for example arm64 and x86. Apple's universal binary format packages Mach-O files for multiple architectures together, and the system then selects the appropriate Mach-O for its CPU platform. The universal binary format is therefore also referred to as the fat binary format, as shown in the figure below

The universal binary format is defined in fat.h. You can download the xnu source and find the file under xnu -> EXTERNAL_HEADERS -> mach-o. A universal binary starts with the fat header structure fat_header, whose nfat_arch field indicates how many Mach-O files the universal binary contains; each individual Mach-O is described by a fat_arch structure. The two structures are defined as follows:

/*
 - magic: lets the kernel recognize the file as a universal binary
 - nfat_arch: the number of Mach-O files (fat_arch structures) that follow
 */
struct fat_header {
    uint32_t magic;      /* FAT_MAGIC */
    uint32_t nfat_arch;  /* number of structs that follow */
};

/*
 - cputype and cpusubtype: the CPU platform this Mach-O is built for
 - offset, size, align: where the Mach-O sits in the file, its size, and its alignment
 */
struct fat_arch {
    cpu_type_t    cputype;     /* cpu specifier (int) */
    cpu_subtype_t cpusubtype;  /* machine specifier (int) */
    uint32_t      offset;      /* file offset to this object file */
    uint32_t      size;        /* size of this object file */
    uint32_t      align;       /* alignment as a power of 2 */
};
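To make the layout concrete, here is a minimal sketch of how the slice for one architecture could be located using these two structures. It works on a raw byte buffer rather than on the system headers so that it compiles anywhere; the names `read_be32` and `find_slice` are hypothetical. It relies on two facts about the format: the fat header fields are stored big-endian on disk, and each fat_arch entry is five 32-bit fields (20 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC_VALUE 0xcafebabeu

/* Read a big-endian 32-bit value from a byte buffer. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Look for the slice built for `cputype`. On success, stores the slice's
   file offset and size and returns 1; otherwise returns 0. */
static int find_slice(const uint8_t *file, size_t len, uint32_t cputype,
                      uint32_t *offset, uint32_t *size)
{
    assert(file != NULL && offset != NULL && size != NULL);
    if (len < 8 || read_be32(file) != FAT_MAGIC_VALUE)
        return 0;                               /* not a universal binary */
    uint32_t nfat_arch = read_be32(file + 4);
    for (uint32_t i = 0; i < nfat_arch; i++) {
        size_t entry = 8 + (size_t)i * 20;      /* fat_header is 8 bytes */
        if (entry + 20 > len)
            return 0;                           /* truncated table */
        if (read_be32(file + entry) == cputype) {
            *offset = read_be32(file + entry + 8);
            *size   = read_be32(file + entry + 12);
            return 1;
        }
    }
    return 0;
}
```

For arm64 the `cputype` to pass is 0x0100000c (CPU_TYPE_ARM combined with the 64-bit ABI flag); for x86_64 it is 0x01000007. The returned offset is where the embedded Mach-O for that architecture begins.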

So, to sum up,

  • A universal binary is a binary storage structure introduced by Apple that can hold the binary code of multiple architectures at the same time, so that when the file is read the system can automatically detect and select the appropriate architecture and load it in the most suitable way

  • Because universal binaries store multiple architectures at once, they are much larger than single-architecture binaries and take up more disk space. However, because the system automatically selects the most appropriate architecture, the code for the other architectures does not occupy memory, so execution remains efficient

  • Mach-O files can also be merged and split with the lipo command

    • View the architectures of a Mach-O file: lipo -info <MachO file>

    • Merge: lipo -create <MachO1> <MachO2> -output <output file path>

    • Split: lipo <MachO file> -thin <architecture> -output <output file path>

Mach-O files

Mach-O is short for Mach Object file format, a file format for executables, dynamic libraries, and object code. As a replacement for the a.out format, Mach-O provides greater extensibility and faster access to symbol table information.

Familiarity with the Mach-O file format helps in understanding the underlying mechanisms of Apple's platforms and the steps dyld takes to load a Mach-O file.

Viewing Mach-O files

If you want to view the details of a specific Mach-O file, you can do so in one of two ways. The second is recommended because it is more intuitive

  • 【Method 1】 otool terminal command: otool -l <Mach-O file name>

  • 【Method 2】 MachOView tool (recommended): drag the Mach-O executable onto MachOView to open it

Mach-O file format

For OS X and iOS, Mach-O is the executable file format, which mainly includes the following file types

  • Executable: Executable file
  • Dylib: Dynamic link library
  • Bundle: a dynamic library that cannot be linked and can only be loaded at run time using dlopen
  • Image: a collective term for Executable, Dylib, and Bundle
  • Framework: a collection containing a Dylib, resource files, and header files

The Mach-O file format is shown in the figure below. A complete Mach-O file is divided into three main parts:

  • Header: records information about the Mach-O file such as its CPU architecture, its file type, and its load commands

  • Load Commands: describe how the data in the file is organized. Different data types are represented by different load commands

  • Data: the data of each segment is stored here. The concept of a segment is similar to that of a segment in an ELF file. Each segment has one or more sections that hold specific data and code, such as code, data, the symbol table, and the dynamic symbol table

Header

The Mach-O Header contains the key information about the whole Mach-O file, so that the system can quickly learn the file's basic properties. It is defined in loader.h (in the same directory as fat.h) for 32-bit and 64-bit architectures, using the mach_header and mach_header_64 structures respectively. The mach_header is the first thing the linker reads when it loads the file; it determines basic information such as the architecture, the file type, and the number of load commands. The mach_header_64 structure, used on 64-bit architectures, is defined as follows:

/*
 - magic: 0xfeedface (32-bit) or 0xfeedfacf (64-bit); used by the kernel to determine whether the file is in Mach-O format
 - cputype: the CPU type, such as ARM
 - cpusubtype: the specific CPU subtype, for example arm64 or armv7
 - filetype: executables, object files, static libraries, and dynamic libraries are all in Mach-O format, so filetype is needed to indicate which kind of file this is
 - ncmds: the number of load commands
 - sizeofcmds: the total size of the load commands
 - flags: flag bits that indicate the features supported by the binary, mainly related to loading and linking
 - reserved: reserved
 */
struct mach_header_64 {
    uint32_t      magic;       /* mach magic number identifier */
    cpu_type_t    cputype;     /* cpu specifier */
    cpu_subtype_t cpusubtype;  /* machine specifier */
    uint32_t      filetype;    /* type of file */
    uint32_t      ncmds;       /* number of load commands */
    uint32_t      sizeofcmds;  /* the size of all the load commands */
    uint32_t      flags;       /* flags */
    uint32_t      reserved;    /* reserved */
};

The filetype field records the type of the Mach-O file. The most commonly used values are:

#define MH_OBJECT   0x1  /* Object file */
#define MH_EXECUTE  0x2  /* Executable file */
#define MH_DYLIB    0x6  /* Dynamic library */
#define MH_DYLINKER 0x7  /* Dynamic linker */
#define MH_DSYM     0xa  /* Stores the binary's symbol information, used for debugging and analysis */
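A small sketch of reading these header fields follows. It assumes a little-endian 64-bit Mach-O (the usual case on arm64 and x86_64) held in a byte buffer, and it reads fields by their byte offsets in mach_header_64 instead of including the system header, so that it compiles on any platform. The names `read_le32`, `header_filetype`, and `filetype_name` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Map the filetype values listed above to their names. */
static const char *filetype_name(uint32_t filetype)
{
    switch (filetype) {
    case 0x1: return "MH_OBJECT";
    case 0x2: return "MH_EXECUTE";
    case 0x6: return "MH_DYLIB";
    case 0x7: return "MH_DYLINKER";
    case 0xa: return "MH_DSYM";
    default:  return "other";
    }
}

/* Extract filetype from a 64-bit Mach-O header.
   Returns 1 on success, 0 if the buffer is too short or the magic differs. */
static int header_filetype(const uint8_t *buf, size_t len, uint32_t *filetype)
{
    assert(buf != NULL && filetype != NULL);
    if (len < 32 || read_le32(buf) != 0xfeedfacfu)   /* mach_header_64 is 32 bytes */
        return 0;
    *filetype = read_le32(buf + 12);                 /* magic, cputype, cpusubtype come first */
    return 1;
}
```

Given the first 32 bytes of an app's main binary, `header_filetype` would be expected to yield 0x2, which `filetype_name` reports as MH_EXECUTE.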

The corresponding Header, as displayed in MachOView, is shown below

Load Commands

In a Mach-O file, the Load Commands tell the loader how the file is to be loaded. Their number and total size are given in the Header, and the structure is defined in loader.h as follows

struct load_command {
    uint32_t cmd;      /* type of load command */
    uint32_t cmdsize;  /* total size of command in bytes */
};

Viewing the Load Commands in MachOView shows that they record a lot of information, such as the dynamic linker location, the program entry point, dependent library information, the code location, the symbol table location, and so on, as follows
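Because every load command begins with the same two fields (its type and its total size), the whole list can be walked without knowing each command's layout. The sketch below counts the commands of one type in a little-endian 64-bit Mach-O held in a byte buffer; `read_le32` and `count_commands` are hypothetical names, and the bounds checks are only the minimum needed to keep the walk inside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Count load commands whose cmd field equals `wanted`.
   Returns -1 if the buffer is too short or a command size is invalid. */
static int count_commands(const uint8_t *buf, size_t len, uint32_t wanted)
{
    assert(buf != NULL);
    if (len < 32)
        return -1;                                /* no room for mach_header_64 */
    uint32_t ncmds      = read_le32(buf + 16);
    uint32_t sizeofcmds = read_le32(buf + 20);
    if (sizeofcmds > len - 32)
        return -1;
    size_t pos = 32;                              /* commands follow the header */
    size_t end = 32 + (size_t)sizeofcmds;
    int found = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        if (end - pos < 8)
            return -1;                            /* no room for cmd + cmdsize */
        uint32_t cmd     = read_le32(buf + pos);
        uint32_t cmdsize = read_le32(buf + pos + 4);
        if (cmdsize < 8 || cmdsize > end - pos)
            return -1;
        if (cmd == wanted)
            found++;
        pos += cmdsize;                           /* next command starts right after */
    }
    return found;
}
```

Passing 0x19 (the value of LC_SEGMENT_64) as `wanted` would count the 64-bit segment commands, and 0x2 (LC_SYMTAB) would find the symbol table command.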

The LC_SEGMENT_64 load command uses the segment_command_64 structure, which is defined as follows

/*
 segment_command_64 load command
 - cmd: the type of the load command
 - cmdsize: the size of the load command (including the nsects section structures that immediately follow it)
 - segname: the segment name, 16 bytes
 - vmaddr: the virtual memory address of the segment
 - vmsize: the virtual memory size of the segment
 - fileoff: the offset of the segment in the file
 - filesize: the size of the segment in the file
 - maxprot: the maximum memory protection allowed for the segment's pages (1 = r, 2 = w, 4 = x)
 - initprot: the initial memory protection of the segment's pages
 - nsects: the number of sections in the segment
 - flags: other miscellaneous flag bits

 - Starting at fileoff, filesize bytes of the file are mapped into the vmsize bytes of memory that start at vmaddr. (The filesize bytes starting at fileoff are what is called a "segment".)
 - All pages of a segment have the same permissions (put differently, at build time the compiler groups data with the same permissions into one segment), and these permissions are initialized from initprot, which specifies the initial protection level of the pages using the read/write/execute bits
 - The protection of a segment can be changed dynamically, but it cannot exceed the value specified in maxprot (in iOS, ...
 */
struct segment_command_64 { /* for 64-bit architectures */
    uint32_t  cmd;          /* LC_SEGMENT_64 */
    uint32_t  cmdsize;      /* includes sizeof section_64 structs */
    char      segname[16];  /* segment name */
    uint64_t  vmaddr;       /* memory address of this segment */
    uint64_t  vmsize;       /* memory size of this segment */
    uint64_t  fileoff;      /* file offset of this segment */
    uint64_t  filesize;     /* amount to map from the file */
    vm_prot_t maxprot;      /* maximum VM protection */
    vm_prot_t initprot;     /* initial VM protection */
    uint32_t  nsects;       /* number of sections in segment */
    uint32_t  flags;        /* flags */
};
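The protection fields and the vmsize/filesize pair can be illustrated with a few helper functions. This is a sketch only: the `SEG_PROT_*` macros mirror the Mach VM_PROT_READ (0x1), VM_PROT_WRITE (0x2), and VM_PROT_EXECUTE (0x4) bit values, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEG_PROT_READ    0x1   /* same bit as VM_PROT_READ */
#define SEG_PROT_WRITE   0x2   /* same bit as VM_PROT_WRITE */
#define SEG_PROT_EXECUTE 0x4   /* same bit as VM_PROT_EXECUTE */

/* Render a protection value as "rwx"-style text, e.g. 5 becomes "r-x". */
static void prot_string(int prot, char out[4])
{
    assert(out != NULL);
    out[0] = (prot & SEG_PROT_READ)    ? 'r' : '-';
    out[1] = (prot & SEG_PROT_WRITE)   ? 'w' : '-';
    out[2] = (prot & SEG_PROT_EXECUTE) ? 'x' : '-';
    out[3] = '\0';
}

/* A requested protection is acceptable only if it sets no bit
   that maxprot leaves clear. */
static int prot_allowed(int requested, int maxprot)
{
    return (requested & ~maxprot) == 0;
}

/* When vmsize exceeds filesize, the rest of the mapping has no
   file contents behind it and is zero-filled. */
static uint64_t zero_fill_bytes(uint64_t vmsize, uint64_t filesize)
{
    return vmsize > filesize ? vmsize - filesize : 0;
}
```

A typical __TEXT segment has initprot 5, which `prot_string` renders as "r-x", while a typical __DATA segment has initprot 3, rendered as "rw-".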

Data

After the Load Commands comes the Data area, which stores the actual content: read-only and read-write code and data, such as methods, symbol tables, string tables, code, and the data needed by the linker (relocations, symbol bindings, and so on). Most Mach-O files contain the following three segments:

  • __TEXT (code segment): read-only, including functions and read-only strings
  • __DATA (data segment): readable and writable, including global variables that can be read and written
  • __LINKEDIT: contains metadata (locations, offsets) for methods and variables, as well as information such as the code signature.

In the Data area, sections make up a large part of the content. A section is described by the section_64 structure (on arm64) in loader.h, which is defined below

/*
 - sectname: the name of the current section
 - segname: the name of the segment the section belongs to
 - addr: the starting address in memory
 - size: the section size
 - offset: the file offset of the section
 - align: the byte alignment
 - reloff: the file offset of the relocation entries
 - nreloc: the number of relocation entries
 - flags: the type and attributes of the section
 - reserved1: reserved (for offset or index)
 - reserved2: reserved (for count or sizeof)
 - reserved3: reserved
 */
struct section_64 { /* for 64-bit architectures */
    char     sectname[16];  /* name of this section */
    char     segname[16];   /* segment this section goes in */
    uint64_t addr;          /* memory address of this section */
    uint64_t size;          /* size in bytes of this section */
    uint32_t offset;        /* file offset of this section */
    uint32_t align;         /* section alignment (power of 2) */
    uint32_t reloff;        /* file offset of relocation entries */
    uint32_t nreloc;        /* number of relocation entries */
    uint32_t flags;         /* flags (section type and attributes) */
    uint32_t reserved1;     /* reserved (for offset or index) */
    uint32_t reserved2;     /* reserved (for count or sizeof) */
    uint32_t reserved3;     /* reserved */
};
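One common use of the addr, size, and offset fields is converting an address as recorded in the file (before any ASLR slide is applied) into a position in the file. The sketch below uses a hypothetical `section_range` struct holding just those three fields; `section_file_offset` is likewise an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The three section_64 fields needed for the conversion. */
typedef struct {
    uint64_t addr;     /* memory address of the section (unslid) */
    uint64_t size;     /* size of the section in bytes */
    uint32_t offset;   /* file offset of the section */
} section_range;

/* If `vmaddr` lies inside the section, store the matching file offset
   and return 1; otherwise return 0. */
static int section_file_offset(const section_range *s, uint64_t vmaddr,
                               uint64_t *file_offset)
{
    assert(s != NULL && file_offset != NULL);
    if (vmaddr < s->addr || vmaddr - s->addr >= s->size)
        return 0;
    *file_offset = (uint64_t)s->offset + (vmaddr - s->addr);
    return 1;
}
```

For a section with addr 0x100003f00, size 0x80, and offset 0x3f00, the address 0x100003f10 corresponds to file offset 0x3f10. This is the same arithmetic a reader does by hand when matching an address shown in a disassembler against the offsets shown in MachOView.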

Sections can be viewed in MachOView; they fall mainly within the __TEXT and __DATA segments, as shown below. The common sections are as follows

Section (__TEXT)           Description
__TEXT.__text              Main program code
__TEXT.__cstring           C language strings
__TEXT.__const             Constants declared with the const keyword
__TEXT.__stubs             Placeholder code for symbol stubs, often simply called stub code
__TEXT.__stub_helper       Where a stub ends up when the real symbol address has not yet been found
__TEXT.__objc_methname     Objective-C method names
__TEXT.__objc_methtype     Objective-C method types
__TEXT.__objc_classname    Objective-C class names

Section (__DATA)           Description
__DATA.__data              Initialized mutable data
__DATA.__la_symbol_ptr     Lazy binding pointer table; the pointers initially point to __stub_helper
__DATA.__nl_symbol_ptr     Non-lazy binding pointer table; each pointer points to a symbol that the dynamic linker has already resolved during loading
__DATA.__const             Constant data (typically constants that must be fixed up at load time)
__DATA.__cfstring          Core Foundation strings used in the program (CFStringRefs)
__DATA.__bss               BSS; holds uninitialized global variables, often referred to as static memory allocation
__DATA.__common            Uninitialized symbol declarations
__DATA.__objc_classlist    Objective-C class list
__DATA.__objc_protolist    Objective-C protocol list
__DATA.__objc_imageinfo    Objective-C image information
__DATA.__objc_selrefs      Objective-C selector references
__DATA.__objc_protorefs    Objective-C protocol references
__DATA.__objc_superrefs    Objective-C superclass references

Therefore, to sum up, the format diagram of Mach-O is shown below