iOS underlying principles + reverse engineering summary
Virtual memory & physical memory
In the early days, programs accessed data directly through physical addresses, which caused two problems:
1. Insufficient memory
2. Insecure memory data
Solution for running out of memory: virtual memory
To solve problem 1, a middle layer called virtual memory was added between processes and physical memory. It manages physical memory when multiple processes exist at the same time, improves CPU utilization, and lets multiple processes be loaded simultaneously and on demand. In essence, virtual memory is a mapping table between virtual addresses and physical addresses.
- Each process has its own independent virtual memory, whose addresses start at 0 and whose size is fixed (4 GB on a 32-bit system). Virtual memory is divided into pages (16 KB on iOS, 4 KB on other systems), and data is loaded one page at a time. Processes cannot access each other's memory, which keeps each process's data secure.
- Only part of a process is active at any given time, so only that active part needs to be placed in physical memory, avoiding wasted physical memory.
- When the CPU accesses data, it first works with a virtual address and then translates it through virtual memory: in effect, it looks up the corresponding physical address in the mapping table and then accesses that physical address.
- If the page containing the virtual address has not been loaded into physical memory yet, a page fault occurs and the current process blocks. The data must be loaded into physical memory before the address can be translated and read. Loading on demand like this avoids wasting memory.
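The lookup described in this list can be sketched as a toy model in C. It assumes a single-level page table with 16 KB pages (the iOS page size given above) and a tiny 8-page address space; real MMUs walk multi-level tables in hardware. The names page_table_entry, translate, TOY_PAGE_SIZE, and TOY_NUM_PAGES are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE (16u * 1024u) /* 16 KB pages, as on iOS (assumed for this model) */
#define TOY_NUM_PAGES 8u            /* hypothetical tiny virtual address space */

/* One entry of a hypothetical single-level page table. */
typedef struct {
    bool     present;       /* is this page currently in physical memory? */
    uint64_t physical_page; /* physical page number, valid only when present */
} page_table_entry;

/*
 * Translate a virtual address into a physical address.
 * Returns false to model a page fault (page not loaded, or address out of range);
 * a real OS would block the process, load the page, and retry the access.
 */
static bool translate(const page_table_entry table[TOY_NUM_PAGES],
                      uint64_t vaddr, uint64_t *paddr) {
    assert(table != NULL && paddr != NULL);
    uint64_t page   = vaddr / TOY_PAGE_SIZE; /* which virtual page */
    uint64_t offset = vaddr % TOY_PAGE_SIZE; /* position inside that page */
    if (page >= TOY_NUM_PAGES || !table[page].present) {
        return false;                        /* page fault */
    }
    *paddr = table[page].physical_page * TOY_PAGE_SIZE + offset;
    return true;
}
```

For example, if virtual page 1 maps to physical page 5, the virtual address 0x4010 (page 1, offset 0x10) translates to 5 * 0x4000 + 0x10 = 0x14010, while an address on an unmapped page makes translate return false, the model's stand-in for a page fault.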
The following figure shows the relationship between virtual memory and physical memory
Security of in-memory data: ASLR technology
In the explanation of virtual memory above, we said that the starting address and size of virtual memory are fixed. This means the address of any given piece of data is the same every time the program runs, which makes the data easy to attack. To solve this problem, Apple introduced ASLR starting with iOS 4.3.
Concept of ASLR: Address Space Layout Randomization is a security technique against buffer overflow attacks. By randomizing the layout of memory regions such as the heap, the stack, and shared library mappings, it makes target addresses hard for attackers to predict, which prevents them from jumping directly to their attack code and thus defeats overflow attacks.
Its purpose is to lay out the address space randomly, so that sensitive data (such as the code for app login, registration, and payment) is placed at addresses a malicious program cannot know in advance, making attacks harder.
Because of ASLR, the load address of the executable and of dynamic libraries in virtual memory changes on every launch, so the pointers inside the image must be fixed up (rebased) at load time to point to the correct addresses. Correct memory address = ASLR slide + offset value.
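The "ASLR + offset" formula above can be illustrated with a small rebasing sketch. It assumes a hypothetical image whose link-time base address is 0x100000000 (the usual start of __TEXT in 64-bit iOS executables) and a made-up slide; slid_address and rebase_pointers are hypothetical helpers, not dyld's implementation. On Apple platforms, the slide of a loaded image can be queried with _dyld_get_image_vmaddr_slide from <mach-o/dyld.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Runtime address = address recorded in the file at link time + ASLR slide. */
static uint64_t slid_address(uint64_t link_time_addr, uint64_t slide) {
    return link_time_addr + slide;
}

/*
 * Hypothetical rebase step: every pointer stored in the image was written
 * assuming the image loads at its preferred address, so once the real load
 * address is known the loader adds the slide to each of them.
 */
static void rebase_pointers(uint64_t *ptrs, size_t count, uint64_t slide) {
    assert(ptrs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        ptrs[i] = slid_address(ptrs[i], slide);
    }
}
```

For example, with a slide of 0x4000000 a pointer recorded in the file as 0x100008000 becomes 0x104008000 at runtime.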
Executable file
Different operating systems use different executable file formats. The kernel reads the executable into memory and then determines the binary's format from the signature in its header (the magic number).
PE and ELF are commonly described as variants of COFF (Common Object File Format); Mach-O is not derived from COFF but follows a similar design. COFF's main contribution was introducing the mechanism of "segments" into object files, so that different object files can have different numbers and types of segments.
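The magic-number check can be sketched in C. This is a simplified classifier, assuming the buffer holds the first bytes of a file: it compares them against the PE "MZ" stub, the ELF signature, and the Mach-O magic values (MH_MAGIC, MH_MAGIC_64, FAT_MAGIC) in both byte orders. binary_format and detect_format are hypothetical names, and a real loader validates far more than the first four bytes (Java class files, for instance, also begin with 0xcafebabe).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Formats this sketch can tell apart. */
typedef enum {
    FMT_UNKNOWN,
    FMT_PE,
    FMT_ELF,
    FMT_MACHO_FAT,
    FMT_MACHO_32,
    FMT_MACHO_64
} binary_format;

/* Read 4 bytes as a big-endian value, independent of the host's byte order. */
static uint32_t read_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Classify a file by the magic number at its start. */
static binary_format detect_format(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z') {
        return FMT_PE;                 /* DOS "MZ" stub that precedes a PE image */
    }
    if (len < 4) {
        return FMT_UNKNOWN;
    }
    if (buf[0] == 0x7f && buf[1] == 'E' && buf[2] == 'L' && buf[3] == 'F') {
        return FMT_ELF;
    }
    switch (read_be32(buf)) {
    case 0xcafebabeu:                  /* FAT_MAGIC: fat headers are stored big-endian */
        return FMT_MACHO_FAT;
    case 0xfeedfaceu:                  /* MH_MAGIC in a big-endian file */
    case 0xcefaedfeu:                  /* MH_MAGIC in a little-endian file */
        return FMT_MACHO_32;
    case 0xfeedfacfu:                  /* MH_MAGIC_64 in a big-endian file */
    case 0xcffaedfeu:                  /* MH_MAGIC_64 in a little-endian file (arm64, x86_64) */
        return FMT_MACHO_64;
    default:
        return FMT_UNKNOWN;
    }
}
```

Reading the bytes in a fixed big-endian order keeps the result the same on any host, which is why each Mach-O magic appears twice.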
Universal binary
Different CPU platforms support different instruction sets, for example arm64 and x86. Apple's universal binary format packages Mach-O files for multiple architectures into a single file, and the system then picks the Mach-O that matches its CPU platform. The universal binary format is therefore also called the fat binary format, as shown in the figure below.
The universal binary format is defined in <mach-o/fat.h>. You can download the xnu source and find it under xnu -> EXTERNAL_HEADERS -> mach-o. A universal binary starts with a fat_header structure, whose nfat_arch field gives the number of Mach-O files in the universal binary; each individual Mach-O is described by a fat_arch structure. The two structures are defined as follows:
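```c
/*
 * All fat structures are stored big-endian on disk.
 * - magic: lets the kernel recognize, when reading the file, that it is a universal binary
 * - nfat_arch: number of Mach-O files (fat_arch entries) that follow
 */
struct fat_header {
    uint32_t magic;     /* FAT_MAGIC */
    uint32_t nfat_arch; /* number of structs that follow */
};

/*
 * Describes one Mach-O inside the universal binary:
 * - cputype and cpusubtype: the CPU architecture of this Mach-O
 * - offset and size: where this Mach-O lives inside the file
 * - align: alignment of this Mach-O, as a power of 2
 */
struct fat_arch {
    cpu_type_t    cputype;    /* cpu specifier (int) */
    cpu_subtype_t cpusubtype; /* machine specifier (int) */
    uint32_t      offset;     /* file offset to this object file */
    uint32_t      size;       /* size of this object file */
    uint32_t      align;      /* alignment as a power of 2 */
};
```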
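The selection step described above can be sketched in C: walk the fat_arch array and pick the slice whose cputype matches the running CPU. This sketch parses raw big-endian bytes instead of including <mach-o/fat.h>, so it also builds off Apple platforms. find_slice, be32, and the CPUTYPE_* macros are hypothetical names; the cputype values mirror CPU_TYPE_X86_64 and CPU_TYPE_ARM64 from <mach/machine.h>. Only the classic 32-bit-offset fat format is handled, not FAT_MAGIC_64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC_VALUE 0xcafebabeu  /* fat_header.magic, read big-endian */
#define CPUTYPE_X86_64  0x01000007u  /* CPU_TYPE_X86 | CPU_ARCH_ABI64 */
#define CPUTYPE_ARM64   0x0100000cu  /* CPU_TYPE_ARM | CPU_ARCH_ABI64 */

/* fat.h structures are always big-endian on disk, whatever the host is. */
static uint32_t be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Find the slice for `want_cputype` in a universal binary held in memory.
 * On success, stores the slice's file offset and size (the byte range that
 * `lipo -thin` copies out) and returns true.
 */
static bool find_slice(const unsigned char *file, size_t len, uint32_t want_cputype,
                       uint32_t *offset, uint32_t *size) {
    assert(file != NULL && offset != NULL && size != NULL);
    if (len < 8 || be32(file) != FAT_MAGIC_VALUE) {
        return false;                          /* not a classic fat binary */
    }
    uint32_t nfat_arch = be32(file + 4);
    for (uint32_t i = 0; i < nfat_arch; i++) {
        size_t base = 8 + (size_t)i * 20;      /* fat_header: 8 bytes; each fat_arch: 20 */
        if (base + 20 > len) {
            return false;                      /* truncated fat_arch table */
        }
        /* fat_arch fields: cputype, cpusubtype, offset, size, align */
        if (be32(file + base) == want_cputype) {
            *offset = be32(file + base + 8);
            *size   = be32(file + base + 12);
            return (uint64_t)*offset + *size <= len; /* slice must fit in the file */
        }
    }
    return false;                              /* no slice for this CPU */
}
```

A loader on an arm64 device would, in this model, call find_slice with CPUTYPE_ARM64 and map only the returned byte range, which is why the other slices cost disk space but no memory.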
To sum up:
- A universal binary is a binary storage structure introduced by Apple that can hold instructions for multiple architectures at once, so that when the binary is read, the system automatically detects and selects the architecture best suited to the CPU.
- Because a universal binary stores several architectures, it is much larger than a single-architecture binary and takes more disk space. However, since the system loads only the code for the most suitable architecture, code for the other architectures does not occupy memory, and execution stays efficient.
- Mach-O files can also be merged and split by architecture with lipo:
  - View the architectures in a Mach-O file: lipo -info <MachO file>
  - Merge: lipo -create <MachO1> <MachO2> -output <output path>
  - Split: lipo <MachO file> -thin <architecture> -output <output path>
Mach-O files
Mach-O is short for Mach Object file format, a file format used for executables, dynamic libraries, and object code. As a replacement for the a.out format, Mach-O offers better extensibility and faster access to symbol table information.
Familiarity with the Mach-O format helps you understand how Apple's systems work under the hood and master the steps dyld takes to load a Mach-O file.
Viewing Mach-O files
There are two ways to inspect a Mach-O file. The second is recommended because it is more intuitive.
- Method 1: the otool terminal command: otool -l <Mach-O file>
- Method 2 (recommended): the MachOView tool: drag the Mach-O executable into MachOView to open it
Mach-O file format
On OS X and iOS, Mach-O is the executable file format. It mainly covers the following file types:
- Executable: an executable file
- Dylib: a dynamic link library
- Bundle: a dynamic library that cannot be linked against and can only be loaded at runtime with dlopen
- Image: an Executable, Dylib, or Bundle
- Framework: a collection containing a dylib, resource files, and header files
The Mach-O image file format is shown in the figure below. As it shows, a complete Mach-O file is divided into three main parts:
- Header: information about the file's CPU architecture, file type, load commands, and so on
- Load Commands: describe how the data in the file is organized; different kinds of data are described by different load commands
- Data: stores the data of each segment. A segment here is similar to a segment in an ELF file. Each segment contains one or more sections holding specific data and code, such as program code, data, the symbol table, and the dynamic symbol table
Header
The Mach-O header contains the key information about the whole file, letting the system quickly learn the basic properties of the Mach-O. It is defined in <mach-o/loader.h> (in the same directory as fat.h): on 32-bit and 64-bit CPUs the header is described by the mach_header and mach_header_64 structures, respectively. The header is the first thing the loader reads when loading the file; it identifies the CPU architecture, the file type, the number of load commands, and so on. On 64-bit architectures, the mach_header_64 structure is defined as follows:
```c
/*
 * - magic: 0xfeedface (32-bit) or 0xfeedfacf (64-bit); the kernel uses it to recognize the Mach-O format
 * - cputype: CPU type, such as ARM
 * - cpusubtype: CPU subtype, such as arm64 or armv7
 * - filetype: executables, object files, and dynamic libraries all use the Mach-O format,
 *   so filetype records which kind of file this Mach-O is
 * - ncmds: number of load commands
 * - sizeofcmds: total size in bytes of all load commands
 * - flags: flag bits describing features the binary supports, mostly related to loading and linking
 * - reserved: reserved
 */
struct mach_header_64 {
    uint32_t      magic;      /* mach magic number identifier */
    cpu_type_t    cputype;    /* cpu specifier */
    cpu_subtype_t cpusubtype; /* machine specifier */
    uint32_t      filetype;   /* type of file */
    uint32_t      ncmds;      /* number of load commands */
    uint32_t      sizeofcmds; /* the size of all the load commands */
    uint32_t      flags;      /* flags */
    uint32_t      reserved;   /* reserved */
};
```
The filetype field records the type of the Mach-O file. Commonly used values include:
```c
#define MH_OBJECT   0x1 /* Object file */
#define MH_EXECUTE  0x2 /* Executable file */
#define MH_DYLIB    0x6 /* Dynamic library */
#define MH_DYLINKER 0x7 /* Dynamic linker */
#define MH_DSYM     0xa /* Stores the binary's symbol information for debugging and analysis */
```
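Decoding the header by hand makes the field layout concrete. This sketch assumes a thin (non-fat) 64-bit Mach-O stored little-endian, as produced for arm64 and x86_64, and reads each field at its offset within mach_header_64 instead of including <mach-o/loader.h>. macho_info, le32, read_header64, filetype_name, and MACHO_MAGIC_64 are hypothetical names; the filetype values are the ones listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MACHO_MAGIC_64 0xfeedfacfu  /* same value as MH_MAGIC_64 */

/* The subset of mach_header_64 this sketch decodes. */
typedef struct {
    uint32_t cputype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
} macho_info;

/* Little-endian read: arm64 and x86_64 Mach-O headers are stored little-endian. */
static uint32_t le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the 32-byte mach_header_64 at the start of `buf`. */
static bool read_header64(const unsigned char *buf, size_t len, macho_info *out) {
    assert(buf != NULL && out != NULL);
    if (len < 32 || le32(buf) != MACHO_MAGIC_64) {
        return false;                /* too short, or not a little-endian 64-bit Mach-O */
    }
    out->cputype    = le32(buf + 4); /* cpusubtype at +8 is skipped here */
    out->filetype   = le32(buf + 12);
    out->ncmds      = le32(buf + 16);
    out->sizeofcmds = le32(buf + 20);
    return true;
}

/* Map the common filetype values to their macro names. */
static const char *filetype_name(uint32_t filetype) {
    switch (filetype) {
    case 0x1: return "MH_OBJECT";
    case 0x2: return "MH_EXECUTE";
    case 0x6: return "MH_DYLIB";
    case 0x7: return "MH_DYLINKER";
    case 0xa: return "MH_DSYM";
    default:  return "other";
    }
}
```

The values read_header64 returns are the same ones MachOView shows in its Mach64 Header view, so the two can be compared field by field.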
The corresponding header as displayed in MachOView is shown below.
Load Commands
In a Mach-O file, load commands tell the loader how to load the file's contents. Their number (ncmds) and total size (sizeofcmds) are given in the header. Every load command begins with the load_command structure, defined in loader.h as follows:
```c
struct load_command {
    uint32_t cmd;     /* type of load command */
    uint32_t cmdsize; /* total size of command in bytes */
};
```
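Because every command starts with cmd and cmdsize, the load commands can be walked as a chain. This is a minimal sketch, assuming a thin little-endian 64-bit Mach-O whose header has already been validated, with the commands starting right after the 32-byte mach_header_64. count_load_commands is a hypothetical helper; 0x19 is the value of LC_SEGMENT_64 in <mach-o/loader.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MACH_HEADER_64_SIZE 32u  /* sizeof(struct mach_header_64) */

/* Little-endian 32-bit read. */
static uint32_t le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Walk the load commands and count those whose cmd equals `wanted_cmd`.
 * Each command begins with { cmd, cmdsize }, and cmdsize tells us where the
 * next command starts. Returns -1 if the command list is malformed.
 */
static int count_load_commands(const unsigned char *buf, size_t len, uint32_t wanted_cmd) {
    assert(buf != NULL);
    if (len < MACH_HEADER_64_SIZE) {
        return -1;
    }
    uint32_t ncmds = le32(buf + 16);          /* mach_header_64.ncmds */
    size_t pos = MACH_HEADER_64_SIZE;         /* commands follow the header */
    int count = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        if (pos + 8 > len) {
            return -1;                        /* no room for cmd + cmdsize */
        }
        uint32_t cmd     = le32(buf + pos);
        uint32_t cmdsize = le32(buf + pos + 4);
        if (cmdsize < 8 || pos + cmdsize > len) {
            return -1;                        /* corrupt cmdsize */
        }
        if (cmd == wanted_cmd) {
            count++;
        }
        pos += cmdsize;                       /* jump to the next command */
    }
    return count;
}
```

Checking cmdsize before advancing matters: a zero or oversized cmdsize in a damaged or hostile file would otherwise loop forever or read past the buffer.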
Viewing the load commands in MachOView shows that they record a lot of information, such as the location of the dynamic linker, the program entry point, dependent libraries, the location of the code, and the location of the symbol table, as shown below.
The LC_SEGMENT_64 load command is described by the segment_command_64 structure, defined as follows:
```c
/*
 * segment_command_64 (the LC_SEGMENT_64 load command)
 * - cmd: type of the load command
 * - cmdsize: size of the load command, including the nsects section_64 structs that follow it
 * - segname: segment name, 16 bytes
 * - vmaddr: virtual memory address of the segment
 * - vmsize: virtual memory size of the segment
 * - fileoff: offset of the segment in the file
 * - filesize: size of the segment in the file
 * - maxprot: maximum memory protection allowed for the segment's pages (1 = r, 2 = w, 4 = x)
 * - initprot: initial memory protection of the segment's pages
 * - nsects: number of sections in the segment
 * - flags: other miscellaneous flags
 *
 * - The loader takes filesize bytes starting at fileoff in the file and maps them into
 *   vmsize bytes of memory at vmaddr; if vmsize is larger than filesize, the rest is
 *   zero-filled. The bytes from fileoff to fileoff + filesize form the "segment".
 * - All pages of a segment share the same permissions: at build time, data with the same
 *   permissions is grouped into the same segment. The pages are initialized with the
 *   read/write/execute bits given in initprot.
 * - A segment's protection can be changed at runtime, but never beyond maxprot.
 */
struct segment_command_64 { /* for 64-bit architectures */
    uint32_t  cmd;          /* LC_SEGMENT_64 */
    uint32_t  cmdsize;      /* includes sizeof section_64 structs */
    char      segname[16];  /* segment name */
    uint64_t  vmaddr;       /* memory address of this segment */
    uint64_t  vmsize;       /* memory size of this segment */
    uint64_t  fileoff;      /* file offset of this segment */
    uint64_t  filesize;     /* amount to map from the file */
    vm_prot_t maxprot;      /* maximum VM protection */
    vm_prot_t initprot;     /* initial VM protection */
    uint32_t  nsects;       /* number of sections in segment */
    uint32_t  flags;        /* flags */
};
```
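The mapping rule and the maxprot constraint described in the comments above can be modeled in a few lines. This is a user-space simulation only: real loaders map page-aligned file ranges with mmap rather than copying bytes. The fake address space, map_segment, protection_allowed, and the PROT_* macros are hypothetical; the bit values match VM_PROT_READ (1), VM_PROT_WRITE (2), and VM_PROT_EXECUTE (4).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same bit values as VM_PROT_READ, VM_PROT_WRITE and VM_PROT_EXECUTE. */
#define PROT_R 0x1
#define PROT_W 0x2
#define PROT_X 0x4

/*
 * Simulate mapping one segment into a fake address space `memory` whose first
 * byte corresponds to virtual address `base`: copy `filesize` bytes starting at
 * `fileoff` in the file to `vmaddr`, then zero-fill the remainder up to `vmsize`.
 */
static bool map_segment(uint8_t *memory, uint64_t base, uint64_t mem_len,
                        const uint8_t *file, uint64_t file_len,
                        uint64_t vmaddr, uint64_t vmsize,
                        uint64_t fileoff, uint64_t filesize) {
    assert(memory != NULL && file != NULL);
    if (filesize > vmsize || fileoff > file_len || filesize > file_len - fileoff) {
        return false;                          /* file range out of bounds */
    }
    if (vmaddr < base || vmaddr - base > mem_len || vmsize > mem_len - (vmaddr - base)) {
        return false;                          /* memory range out of bounds */
    }
    uint8_t *dst = memory + (vmaddr - base);
    memcpy(dst, file + fileoff, (size_t)filesize);          /* file-backed part */
    memset(dst + filesize, 0, (size_t)(vmsize - filesize)); /* zero-filled tail */
    return true;
}

/* A protection change is allowed only if it asks for no bit outside maxprot. */
static bool protection_allowed(int maxprot, int newprot) {
    return (newprot & ~maxprot) == 0;
}
```

For instance, a segment with maxprot r-x may drop to read-only, but protection_allowed rejects any request that adds the write bit.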
Data
After the load commands comes the Data area, which holds the actual read-only and read-write contents: methods, the symbol table, the string table, code, and the data needed by the dynamic linker (relocation, symbol binding, and so on). Most Mach-O files contain the following three segments:
- __TEXT (code segment): read-only; contains functions and read-only strings
- __DATA (data segment): readable and writable; contains writable global variables
- __LINKEDIT: contains metadata (locations, offsets) for methods and variables, as well as information such as the code signature
Each section in these segments is described by a section_64 structure (on 64-bit architectures such as arm64), defined in loader.h. These section headers sit in the load commands, immediately after their segment's segment_command_64:
```c
/*
 * - sectname: name of the section
 * - segname: name of the segment the section belongs to
 * - addr: starting address of the section in memory
 * - size: size of the section in bytes
 * - offset: file offset of the section
 * - align: section alignment (power of 2)
 * - reloff: file offset of the relocation entries
 * - nreloc: number of relocation entries
 * - flags: section type and attributes
 * - reserved1: reserved (for offset or index)
 * - reserved2: reserved (for count or sizeof)
 * - reserved3: reserved
 */
struct section_64 { /* for 64-bit architectures */
    char     sectname[16]; /* name of this section */
    char     segname[16];  /* segment this section goes in */
    uint64_t addr;         /* memory address of this section */
    uint64_t size;         /* size in bytes of this section */
    uint32_t offset;       /* file offset of this section */
    uint32_t align;        /* section alignment (power of 2) */
    uint32_t reloff;       /* file offset of relocation entries */
    uint32_t nreloc;       /* number of relocation entries */
    uint32_t flags;        /* flags (section type and attributes) */
    uint32_t reserved1;    /* reserved (for offset or index) */
    uint32_t reserved2;    /* reserved (for count or sizeof) */
    uint32_t reserved3;    /* reserved */
};
```
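A common reverse-engineering task is locating a section by name and converting a virtual address inside it to a file offset. This sketch uses a hypothetical trimmed-down section_info struct with the same name fields as section_64; name_equals, find_section, and addr_to_file_offset are illustrative helpers. Note that segname and sectname are fixed 16-byte fields with no terminating NUL when a name uses all 16 bytes, as __objc_classname does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Only the section_64 fields this sketch needs. */
typedef struct {
    char     sectname[16];
    char     segname[16];
    uint64_t addr;    /* address in memory (before any ASLR slide) */
    uint64_t size;    /* size in bytes */
    uint32_t offset;  /* offset in the file */
} section_info;

/* Compare a 16-byte name field (maybe not NUL-terminated) with a C string. */
static bool name_equals(const char field[16], const char *name) {
    return strlen(name) <= 16 && strncmp(field, name, 16) == 0;
}

/* Return the section named segname.sectname, or NULL if absent. */
static const section_info *find_section(const section_info *sects, size_t n,
                                        const char *segname, const char *sectname) {
    assert(sects != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (name_equals(sects[i].segname, segname) &&
            name_equals(sects[i].sectname, sectname)) {
            return &sects[i];
        }
    }
    return NULL;
}

/* Translate a virtual address inside `s` to its offset in the file. */
static bool addr_to_file_offset(const section_info *s, uint64_t vmaddr, uint64_t *fileoff) {
    assert(s != NULL && fileoff != NULL);
    if (vmaddr < s->addr || vmaddr - s->addr >= s->size) {
        return false;                  /* address not inside this section */
    }
    *fileoff = s->offset + (vmaddr - s->addr);
    return true;
}
```

Using strncmp with the field width, rather than strcmp, is what keeps the lookup safe for names that fill all 16 bytes.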
Section
As MachOView shows, sections mainly appear in the two segments __TEXT and __DATA, as shown below. The common sections are as follows:
Section (__TEXT) | Description
---|---
__TEXT.__text | Main program code
__TEXT.__cstring | C language strings
__TEXT.__const | Constants declared with the const keyword
__TEXT.__stubs | Stub code: placeholder code used to call symbols in other images
__TEXT.__stub_helper | Helper code a stub falls back to when the real symbol address has not been resolved yet
__TEXT.__objc_methname | Objective-C method names
__TEXT.__objc_methtype | Objective-C method type encodings
__TEXT.__objc_classname | Objective-C class names
Section (__DATA) | Description
---|---
__DATA.__data | Initialized mutable data
__DATA.__la_symbol_ptr | Lazy-binding pointer table; each pointer initially points to __stub_helper
__DATA.__nl_symbol_ptr | Non-lazy-binding pointer table; each entry points to a symbol that the dynamic linker already resolved during loading
__DATA.__const | Constant data that needs relocation (for example, tables of constant pointers)
__DATA.__cfstring | Core Foundation strings (CFStringRefs) used in the program
__DATA.__bss | BSS: uninitialized global variables, often called static memory allocation
__DATA.__common | Uninitialized symbol declarations (common symbols)
__DATA.__objc_classlist | Objective-C class list
__DATA.__objc_protolist | Objective-C protocol list
__DATA.__objc_imageinfo | Objective-C image information
__DATA.__objc_selrefs | Objective-C selector references
__DATA.__objc_protorefs | Objective-C protocol references
__DATA.__objc_superrefs | Objective-C superclass references
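The interplay between __stubs, __stub_helper, and __la_symbol_ptr can be modeled with plain function pointers. This is a conceptual simulation, not dyld's code: in a real binary the stub is a few machine instructions and the helper hands off to dyld's binder. Here real_add_one, stub_helper, lazy_ptr, call_add_one, and resolve_count are hypothetical names. The first call goes through the helper, which resolves the symbol and patches the pointer; later calls jump straight to the target.

```c
#include <assert.h>

typedef int (*func_ptr)(int);

/* Stands in for an external function that lives in another image. */
static int real_add_one(int x) { return x + 1; }

static int resolve_count = 0;   /* how many times the "binder" ran */

static int stub_helper(int x);

/* Like an entry in __la_symbol_ptr: starts out pointing at the helper. */
static func_ptr lazy_ptr = stub_helper;

/* Like __stub_helper plus the binder: resolve once, patch the pointer, call. */
static int stub_helper(int x) {
    assert(lazy_ptr == stub_helper);   /* only reached before binding */
    resolve_count++;
    lazy_ptr = real_add_one;           /* patch the table entry with the real address */
    return lazy_ptr(x);
}

/* Like a stub in __stubs: always jumps through the lazy pointer. */
static int call_add_one(int x) {
    return lazy_ptr(x);
}
```

The design defers symbol lookup until a function is actually used, so functions that are never called cost nothing at launch.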
To sum up, the overall format of a Mach-O file is shown in the figure below.