This article draws on and references "iOS Basic Principles 32: Startup Optimization (1): Basic Concepts". Thanks to the original author.

Virtual memory & physical memory

Early programs accessed data directly through physical addresses, which caused two problems:

  • 1. There is not enough memory
  • 2. Data in memory is not secure

Running out of memory: Virtual memory

For problem 1, an intermediate layer was added between the process and physical memory. This layer is called virtual memory, and its main job is to manage physical memory when several processes exist at the same time. It improves CPU utilization and lets multiple processes be loaded simultaneously and on demand. In essence, virtual memory is a mapping table that relates virtual addresses to physical addresses.

  • Each process has its own independent virtual memory. Addresses start at 0 and the size is fixed (4 GB for a 32-bit address space). Virtual memory is divided into pages (a page is 16 KB on iOS and 4 KB on other systems), and data is loaded one page at a time. Processes cannot access each other's virtual memory, which keeps the data of each process secure.

    • The five major memory regions, including the stack and the heap, all live in virtual memory
  • At any moment only part of a process is active, so only the active pages need to be in physical memory, which avoids wasting physical memory

  • When the CPU needs data, it first uses a virtual address, looks up the corresponding physical address in the mapping table, and then accesses that physical address

  • If the page being accessed has not been loaded into physical memory, a page fault occurs and the current process is blocked. The page is then loaded into physical memory and the access is retried. Loading pages only when they are needed avoids wasting memory

    • When many applications sit in the background, an inactive application may be relaunched when it returns to the foreground: memory was full, the pages of the inactive application were overwritten by other applications, and its pages have to be loaded into memory again from the start
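
The lookup and page-fault behavior described above can be sketched in a few lines of C. This is only a toy model under stated assumptions: the `page_table_t` type, the 8-page address space, and the "hand out frames in order" allocator are all hypothetical and exist purely for illustration. A real kernel and MMU are far more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE  16384u      /* 16 KB pages, as on iOS */
#define NUM_PAGES  8u          /* tiny virtual address space for the demo (assumption) */
#define NOT_LOADED UINT32_MAX  /* marks a page that has no physical frame yet */

/* Hypothetical per-process table: virtual page number -> physical frame number. */
typedef struct {
    uint32_t frame[NUM_PAGES];
    uint32_t next_free_frame;  /* naive allocator: frames are handed out in order */
    uint32_t page_faults;      /* number of pages loaded on demand */
} page_table_t;

void page_table_init(page_table_t *pt) {
    assert(pt != NULL);
    for (size_t i = 0; i < NUM_PAGES; i++) {
        pt->frame[i] = NOT_LOADED;
    }
    pt->next_free_frame = 0;
    pt->page_faults = 0;
}

/* Translate a virtual address to a physical one.
 * The first touch of a page counts as a page fault and "loads" the page. */
uint64_t translate(page_table_t *pt, uint64_t vaddr) {
    assert(pt != NULL && vaddr < (uint64_t)NUM_PAGES * PAGE_SIZE);
    uint64_t page = vaddr / PAGE_SIZE;    /* which virtual page */
    uint64_t offset = vaddr % PAGE_SIZE;  /* position inside the page */
    if (pt->frame[page] == NOT_LOADED) {
        pt->page_faults++;
        pt->frame[page] = pt->next_free_frame++;
    }
    return (uint64_t)pt->frame[page] * PAGE_SIZE + offset;
}
```

Touching the same page twice triggers only one fault, which mirrors the point that only active pages occupy physical memory.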

The following figure shows the relationship between virtual memory and physical memory:

Security of in-memory data: ASLR technology

As explained above, the start address and the size of virtual memory are fixed. That means the address of a given piece of data is the same on every run, which makes the data easy to locate and crack. To solve this problem, Apple introduced ASLR starting with iOS 4.3.

ASLR concept: Address Space Layout Randomization is a security technique that protects against buffer overflow attacks. By randomizing the layout of linear regions such as the heap, the stack, and shared library mappings, it makes it harder for an attacker to predict target addresses and prevents the attacker from locating attack code directly.

Its purpose is to lay out the address space randomly, so that sensitive data (such as an app's login, registration, and payment code) ends up at addresses that a malicious program cannot know in advance, which makes attacks harder.

Because of ASLR, the load address of executables and dynamic libraries in virtual memory is different on every launch, so the pointers inside an image have to be fixed up at launch so that they point to the correct addresses. The correct memory address = ASLR address + offset value
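
The formula above is plain addition, but writing it down makes the two directions explicit. A minimal sketch, assuming the ASLR address (often called the slide) is already known; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Correct memory address = ASLR address (slide) + offset value. */
uint64_t runtime_address(uint64_t aslr_slide, uint64_t offset) {
    return aslr_slide + offset;
}

/* The reverse direction: recover the offset from an address seen at run time. */
uint64_t offset_of(uint64_t runtime_addr, uint64_t aslr_slide) {
    assert(runtime_addr >= aslr_slide);
    return runtime_addr - aslr_slide;
}
```

For example, with a slide of 0x4000 an offset of 0x100003f50 ends up at 0x100007f50, and subtracting the slide gives the original offset back.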

Executable file

Different operating systems use different executable file formats. The kernel reads the executable into memory and determines the format of the binary from the signature in its header (the magic number).

PE and ELF are both variants of COFF (Common Object File Format), and Mach-O follows a similar design. The main contribution of COFF is that it introduced the "segment" mechanism into object files. Different object files can have different numbers and types of segments.
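
The magic-number check can be illustrated with a small classifier over the first bytes of a file. This is a sketch, not how any kernel implements it: the enum and function names are hypothetical, and it assumes thin Mach-O files are stored little-endian (as on arm64 and x86_64) while the fat header is big-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum {
    FMT_UNKNOWN, FMT_ELF, FMT_PE, FMT_MACHO_32, FMT_MACHO_64, FMT_FAT
} file_format_t;

/* Guess the executable format from the leading bytes of a file. */
file_format_t detect_format(const uint8_t *buf, size_t len) {
    assert(buf != NULL);
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0) return FMT_ELF;
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z') return FMT_PE;
    if (len < 4) return FMT_UNKNOWN;

    /* The same four bytes, read as little-endian and as big-endian. */
    uint32_t le = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
                  ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    uint32_t be = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                  ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];

    if (le == 0xfeedfaceu) return FMT_MACHO_32;  /* MH_MAGIC */
    if (le == 0xfeedfacfu) return FMT_MACHO_64;  /* MH_MAGIC_64 */
    if (be == 0xcafebabeu) return FMT_FAT;       /* FAT_MAGIC */
    return FMT_UNKNOWN;
}
```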

Universal binary

Because different CPU platforms support different instruction sets, such as ARM64 and x86, Apple's universal binary format packages Mach-O files for several architectures together, and the system then picks the appropriate Mach-O for its CPU platform. The universal binary format is therefore also called the fat binary format, as shown in the following figure:

The universal binary format is defined in fat.h, which can be found in xnu -> EXTERNAL_HEADERS -> mach-o. A universal binary starts with a fat_header. Its nfat_arch field records how many Mach-O files the universal binary contains, and each Mach-O is described by a fat_arch structure. The two structures are defined as follows:

/*
 - magic: lets the kernel know, when it reads the file, that this is a universal binary
 - nfat_arch: the number of fat_arch structures that follow
 */
struct fat_header {
    uint32_t magic;      /* FAT_MAGIC */
    uint32_t nfat_arch;  /* number of structs that follow */
};

/*
 fat_arch describes one Mach-O
 - cputype and cpusubtype: the CPU architecture of that Mach-O
 - offset, size, align: where the Mach-O sits in the file, how large it is, and its alignment
 */
struct fat_arch {
    cpu_type_t    cputype;     /* cpu specifier (int) */
    cpu_subtype_t cpusubtype;  /* machine specifier (int) */
    uint32_t      offset;      /* file offset to this object file */
    uint32_t      size;        /* size of this object file */
    uint32_t      align;       /* alignment as a power of 2 */
};
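
To make the layout concrete, here is a sketch that decodes a fat header from a byte buffer. It does not include the real fat.h, so it builds on any platform: `fat_arch_info_t`, `fat_arch_count`, and `fat_arch_at` are hypothetical names. It assumes the classic 32-bit FAT_MAGIC layout, whose fields are stored big-endian on disk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC       0xcafebabeu
#define FAT_HEADER_SIZE 8u   /* magic + nfat_arch */
#define FAT_ARCH_SIZE   20u  /* five 32-bit fields */

/* One fat_arch entry, decoded into host byte order (local helper type). */
typedef struct {
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t offset;
    uint32_t size;
    uint32_t align;
} fat_arch_info_t;

static uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Number of architectures in the file, or -1 if this is not a valid fat header. */
int fat_arch_count(const uint8_t *buf, size_t len) {
    assert(buf != NULL);
    if (len < FAT_HEADER_SIZE || read_be32(buf) != FAT_MAGIC) return -1;
    uint32_t n = read_be32(buf + 4);
    if ((len - FAT_HEADER_SIZE) / FAT_ARCH_SIZE < n) return -1;  /* truncated */
    return (int)n;
}

/* Decode entry `index` into *out. Returns 0 on success, -1 on error. */
int fat_arch_at(const uint8_t *buf, size_t len, uint32_t index,
                fat_arch_info_t *out) {
    int n = fat_arch_count(buf, len);
    if (n < 0 || index >= (uint32_t)n || out == NULL) return -1;
    const uint8_t *p = buf + FAT_HEADER_SIZE + (size_t)index * FAT_ARCH_SIZE;
    out->cputype = (int32_t)read_be32(p);
    out->cpusubtype = (int32_t)read_be32(p + 4);
    out->offset = read_be32(p + 8);
    out->size = read_be32(p + 12);
    out->align = read_be32(p + 16);
    return 0;
}
```

The Mach-O for a given architecture then starts at `offset` bytes into the file and is `size` bytes long.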

So, to sum up,

  • A universal binary is a binary storage structure proposed by Apple that can store machine code for multiple architectures at the same time, so that the system can detect and select the appropriate architecture when it reads the file
  • Because a universal binary stores multiple architectures at once, it is much larger than a single-architecture binary and takes up more disk space. However, since the system automatically selects the most appropriate architecture, code for the other architectures does not take up memory, and execution efficiency stays high
  • You can also merge and split Mach-O files with the lipo command
    • View the architectures of a Mach-O: lipo -info <MachO file>
    • Merge: lipo -create <MachO1> <MachO2> -output <output file path>
    • Split: lipo <MachO file> -thin <architecture> -output <output file path>

Mach-O files

Mach-O is short for Mach Object file format, a file format for executables, dynamic libraries, and object code. As a replacement for the a.out format, Mach-O offers better extensibility and faster access to symbol table information.

Familiarity with the Mach-O file format helps you understand the low-level mechanisms of Apple platforms and follow the steps dyld takes to load a Mach-O.

Viewing a Mach-O file

You can view information about a Mach-O file in either of the following ways. The second method is recommended.

  • [Method 1] otool terminal command: otool -l <Mach-O file name>

  • [Method 2] MachOView tool (recommended): drag the Mach-O executable into MachOView to open it

Mach-O file format

For OS X and iOS, Mach-O is the executable file format, which includes the following file types

  • Executable file

  • Dylib: dynamic link library

  • Bundle: a dynamic library that cannot be linked against and can only be loaded at run time with dlopen

  • Image: refers to one of Executable, Dylib, and Bundle

  • Framework: A collection of Dylib, resource files, and header files

The following illustration shows the Mach-O image file format

The figure above shows the Mach-O file format. A complete Mach-O file is divided into three main parts:

  • Header: the Mach-O header, which records the CPU architecture, the file type, and information about the load commands
  • Load Commands: describe how the data in the file is organized. Different data types are represented by different load commands
  • Data: stores the data of each segment. A segment here is similar to a segment in an ELF file. Each segment has zero or more sections that hold the actual code and data, such as symbol tables and dynamic symbol tables

Header

The Mach-O header holds the key information about the whole Mach-O file, so that the system can quickly learn the basics of the file. It is defined in loader.h (in the same path as fat.h above). The mach_header and mach_header_64 structures describe the header for 32-bit and 64-bit architectures respectively. The header is the first thing the linker reads when the file is loaded, and it determines the architecture, the file type, the number of load commands, and so on. The mach_header_64 structure for the 64-bit architecture is shown here. Compared with the 32-bit mach_header, it only adds a reserved field.

/*
 - magic: 0xfeedface (32-bit) or 0xfeedfacf (64-bit)
 - cputype: the CPU type, such as ARM
 - cpusubtype: the specific CPU type, such as arm64 or armv7
 - filetype: executables, object files, static libraries, and dynamic libraries are all in Mach-O format, so filetype is needed to say which kind of file this is
 - ncmds: the number of load commands
 - sizeofcmds: the total size of the load commands
 - flags: flag bits for the features the binary supports, mainly related to loading and linking
 - reserved: reserved field
 */
struct mach_header_64 {
    uint32_t      magic;       /* mach magic number identifier */
    cpu_type_t    cputype;     /* cpu specifier */
    cpu_subtype_t cpusubtype;  /* machine specifier */
    uint32_t      filetype;    /* type of file */
    uint32_t      ncmds;       /* number of load commands */
    uint32_t      sizeofcmds;  /* the size of all the load commands */
    uint32_t      flags;       /* flags */
    uint32_t      reserved;    /* reserved */
};

filetype records the type of the Mach-O file:

#define MH_OBJECT   0x1  /* object file */
#define MH_EXECUTE  0x2  /* executable file */
#define MH_DYLIB    0x6  /* dynamic library */
#define MH_DYLINKER 0x7  /* dynamic linker */
#define MH_DSYM     0xA  /* symbol information of a binary, used for debugging and analysis */
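
As an illustration of how these fields are read in practice, the sketch below decodes a 64-bit header from raw bytes and turns the filetype into a readable name. It is a simplified example, not the loader's code: `header64_t`, `parse_header64`, and `filetype_name` are hypothetical names, the structure is a local copy so the example does not depend on Apple headers, and the file is assumed to be little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MH_MAGIC_64 0xfeedfacfu
#define HEADER64_SIZE 32u  /* eight 32-bit fields */

/* Local copy of the mach_header_64 fields, in host byte order. */
typedef struct {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
} header64_t;

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the header at the start of buf. Returns 0 on success, -1 otherwise. */
int parse_header64(const uint8_t *buf, size_t len, header64_t *out) {
    assert(buf != NULL && out != NULL);
    if (len < HEADER64_SIZE || read_le32(buf) != MH_MAGIC_64) return -1;
    out->magic = read_le32(buf);
    out->cputype = (int32_t)read_le32(buf + 4);
    out->cpusubtype = (int32_t)read_le32(buf + 8);
    out->filetype = read_le32(buf + 12);
    out->ncmds = read_le32(buf + 16);
    out->sizeofcmds = read_le32(buf + 20);
    out->flags = read_le32(buf + 24);
    out->reserved = read_le32(buf + 28);
    return 0;
}

/* Name of a filetype value, limited to the values listed above. */
const char *filetype_name(uint32_t filetype) {
    switch (filetype) {
    case 0x1: return "MH_OBJECT";
    case 0x2: return "MH_EXECUTE";
    case 0x6: return "MH_DYLIB";
    case 0x7: return "MH_DYLINKER";
    case 0xA: return "MH_DSYM";
    default:  return "unknown";
    }
}
```

The load commands start immediately after these 32 bytes, and `ncmds` and `sizeofcmds` say how many there are and how much space they take.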

The corresponding Header looks like this in MachOView:

Load Commands

In a Mach-O file, the load commands tell the loader how to load the file. Their number and total size are given in the Header, and each command starts with the load_command structure, which is defined in loader.h as follows:

struct load_command {
    uint32_t cmd;      /* type of load command */
    uint32_t cmdsize;  /* total size of command in bytes */
};
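
Because every command begins with cmd and cmdsize, the whole list can be walked by repeatedly skipping cmdsize bytes. The sketch below counts commands of one type this way. The function name and the bounds checks are assumptions made for the example, and the buffer is assumed to be little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LC_SEGMENT_64 0x19u
#define LOAD_COMMAND_SIZE 8u  /* cmd + cmdsize */

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk ncmds load commands stored in cmds[0 .. sizeofcmds) and count those
 * whose type equals `wanted`. Returns -1 if the list is malformed. */
int count_load_commands(const uint8_t *cmds, uint32_t sizeofcmds,
                        uint32_t ncmds, uint32_t wanted) {
    assert(cmds != NULL);
    uint32_t off = 0;
    int found = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        if (sizeofcmds - off < LOAD_COMMAND_SIZE) return -1;  /* ran out of bytes */
        uint32_t cmd = read_le32(cmds + off);
        uint32_t cmdsize = read_le32(cmds + off + 4);
        if (cmdsize < LOAD_COMMAND_SIZE || cmdsize > sizeofcmds - off) return -1;
        if (cmd == wanted) found++;
        off += cmdsize;  /* the next command starts right after this one */
    }
    return found;
}
```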

In MachOView we can inspect the Load Commands, which record a lot of information, for example the location of the dynamic linker, the entry point of the program, information about dependent libraries, the location of the code, and the location of the symbol table, as follows:

Among them, commands of type LC_SEGMENT_64 use the segment_command_64 structure, which is defined as follows:

/*
 segment_command
 - cmd: the type of the load command
 - cmdsize: the size of the load command (including the nsects section structures that immediately follow it)
 - segname: the 16-byte segment name
 - vmaddr: the virtual memory address of the segment
 - vmsize: the virtual memory size of the segment
 - fileoff: the offset of the segment in the file
 - filesize: the size of the segment in the file
 - initprot: the initial memory protection of the segment's pages
 - nsects: the number of sections in the segment
 - flags: miscellaneous flag bits

 - Loading takes filesize bytes of binary data starting at fileoff and places them at vmaddr in memory, where they occupy vmsize bytes.
 - All pages of a segment have the same permissions (in other words, the compiler groups data with the same permissions into one segment), and the permissions are initialized from initprot, which specifies the protection level of the pages with read/write/execute bits.
 - The protection of a segment can change dynamically, but it cannot exceed the value specified in maxprot.
 */
struct segment_command_64 {  /* for 64-bit architectures */
    uint32_t  cmd;          /* LC_SEGMENT_64 */
    uint32_t  cmdsize;      /* includes sizeof section_64 structs */
    char      segname[16];  /* segment name */
    uint64_t  vmaddr;       /* memory address of this segment */
    uint64_t  vmsize;       /* memory size of this segment */
    uint64_t  fileoff;      /* file offset of this segment */
    uint64_t  filesize;     /* amount to map from the file */
    vm_prot_t maxprot;      /* maximum VM protection */
    vm_prot_t initprot;     /* initial VM protection */
    uint32_t  nsects;       /* number of sections in segment */
    uint32_t  flags;        /* flags */
};
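
The mapping rule and the maxprot limit in the comment above can be expressed as two small checks. This is a sketch with a hypothetical `segment_info_t` type that keeps only the fields involved. Addresses are the unslid addresses recorded in the file, and the protection constants use the usual read/write/execute bit values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VM_PROT_READ    0x1
#define VM_PROT_WRITE   0x2
#define VM_PROT_EXECUTE 0x4

/* The segment_command_64 fields that matter for mapping (local helper type). */
typedef struct {
    uint64_t vmaddr, vmsize, fileoff, filesize;
    int32_t  maxprot, initprot;
} segment_info_t;

/* initprot may only contain permission bits that maxprot also allows. */
int prot_is_valid(const segment_info_t *seg) {
    assert(seg != NULL);
    return (seg->initprot & ~seg->maxprot) == 0;
}

/* Map a virtual address inside the segment to its offset in the file.
 * Returns 0 on success, or -1 if the address is outside the file-backed
 * part of the segment (for example in the zero-filled tail past filesize). */
int vmaddr_to_fileoff(const segment_info_t *seg, uint64_t addr,
                      uint64_t *fileoff) {
    assert(seg != NULL && fileoff != NULL);
    if (addr < seg->vmaddr || addr - seg->vmaddr >= seg->filesize) return -1;
    *fileoff = seg->fileoff + (addr - seg->vmaddr);
    return 0;
}
```

With vmaddr 0x100004000 and fileoff 0x4000, the address 0x100004010 maps to file offset 0x4010.
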
Data

The Load Commands are followed by the Data area, which stores the actual content, both read-only and read-write: code, methods, symbol tables, string tables, and the data the linker needs (relocations, symbol bindings, and so on). Most Mach-O files contain three segments:

  • __TEXT: the code segment, read-only, including functions and read-only strings

  • __DATA: the data segment, readable and writable, including readable and writable global variables

  • __LINKEDIT: contains metadata (location, offset) for methods and variables, as well as information such as the code signature

Sections make up most of the Data area. In loader.h a section is represented by the section_64 structure (on arm64), which is defined as follows:

/*
 - sectname: the name of the section
 - segname: the name of the segment the section belongs to
 - addr: the start address in memory
 - size: the size of the section
 - offset: the file offset of the section
 - align: byte alignment
 - reloff: the file offset of the relocation entries
 - nreloc: the number of relocation entries
 - flags: section type and attributes
 - reserved1: reserved (for offset or index)
 - reserved2: reserved (for count or sizeof)
 - reserved3: reserved
 */
struct section_64 {  /* for 64-bit architectures */
    char     sectname[16];  /* name of this section */
    char     segname[16];   /* segment this section goes in */
    uint64_t addr;          /* memory address of this section */
    uint64_t size;          /* size in bytes of this section */
    uint32_t offset;        /* file offset of this section */
    uint32_t align;         /* section alignment (power of 2) */
    uint32_t reloff;        /* file offset of relocation entries */
    uint32_t nreloc;        /* number of relocation entries */
    uint32_t flags;         /* flags (section type and attributes) */
    uint32_t reserved1;     /* reserved (for offset or index) */
    uint32_t reserved2;     /* reserved (for count or sizeof) */
    uint32_t reserved3;     /* reserved */
};
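
Since cmdsize of a segment command includes the section structures that follow it, the sections of a segment can be found by stepping through fixed-size records after the command. The sketch below looks a section up by name. The function name is hypothetical, and the sizes 72 and 80 are derived from the field lists of segment_command_64 and section_64 shown above (nsects sits at byte 64 of the command).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SEGMENT_CMD_64_SIZE 72u  /* size of segment_command_64 */
#define SECTION_64_SIZE     80u  /* size of section_64 */
#define NSECTS_OFFSET       64u  /* offset of nsects inside the command */

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Find a section by name inside one LC_SEGMENT_64 command.
 * `cmd` points at the start of the command and `cmdsize` is its total size.
 * Returns the index of the section, or -1 if it is missing or the command
 * is malformed. */
int find_section(const uint8_t *cmd, uint32_t cmdsize, const char *sectname) {
    assert(cmd != NULL && sectname != NULL);
    if (cmdsize < SEGMENT_CMD_64_SIZE) return -1;
    uint32_t nsects = read_le32(cmd + NSECTS_OFFSET);
    if ((cmdsize - SEGMENT_CMD_64_SIZE) / SECTION_64_SIZE < nsects) return -1;
    for (uint32_t i = 0; i < nsects; i++) {
        const char *name =
            (const char *)(cmd + SEGMENT_CMD_64_SIZE + (size_t)i * SECTION_64_SIZE);
        /* sectname is a fixed 16-byte field and may lack a terminating NUL. */
        if (strncmp(name, sectname, 16) == 0) return (int)i;
    }
    return -1;
}
```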

Sections can be seen in MachOView, mainly under the __TEXT and __DATA segments, as follows

Common sections include the following

Section (__TEXT)          Description
__TEXT.__text             Main program code
__TEXT.__cstring          C language strings
__TEXT.__const            Constants declared with the const keyword
__TEXT.__stubs            Placeholder code for stubs, often simply called stub code
__TEXT.__stub_helper      Where a stub ends up when it cannot find the real symbol address
__TEXT.__objc_methname    Objective-C method names
__TEXT.__objc_methtype    Objective-C method types
__TEXT.__objc_classname   Objective-C class names

Section (__DATA)          Description
__DATA.__data             Initialized mutable data
__DATA.__la_symbol_ptr    Lazy binding pointer table; each pointer initially points to __stub_helper
__DATA.__nl_symbol_ptr    Non-lazy binding pointer table; each pointer points to a symbol that the dynamic linker looks up during loading
__DATA.__const            Constant data that needs to be relocated at load time
__DATA.__cfstring         Core Foundation strings used in the program (CFStringRefs)
__DATA.__bss              BSS, which stores uninitialized global variables, also known as static memory allocation
__DATA.__common           Uninitialized symbol declarations
__DATA.__objc_classlist   Objective-C class list
__DATA.__objc_protolist   Objective-C protocol list
__DATA.__objc_imginfo     Objective-C image information
__DATA.__objc_selrefs     Objective-C selector references
__DATA.__objc_protorefs   Objective-C protocol references
__DATA.__objc_superrefs   Objective-C superclass references
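
The interplay between __stubs, __stub_helper, and __la_symbol_ptr listed above can be imitated with an ordinary function pointer. The following is only a simulation of the idea of lazy binding, not dyld's actual mechanism. Every name in it is hypothetical, and `real_square` stands in for a function that would normally live in a dynamic library.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*func_t)(int);

int bind_count = 0;  /* how many times the symbol had to be resolved */

/* Stands in for the real function inside a dynamic library. */
int real_square(int x) { return x * x; }

int stub_helper(int x);  /* forward declaration */

/* One slot of a simulated __la_symbol_ptr table: it starts out pointing
 * at the helper instead of the real function. */
func_t la_symbol_ptr = stub_helper;

/* Simulated __stub_helper: resolve the symbol, patch the slot so later
 * calls go straight to the target, then forward the current call. */
int stub_helper(int x) {
    bind_count++;
    la_symbol_ptr = real_square;
    return real_square(x);
}

/* Simulated stub in __stubs: callers always jump through the slot. */
int square_stub(int x) {
    assert(la_symbol_ptr != NULL);
    return la_symbol_ptr(x);
}
```

The first call through `square_stub` goes via the helper and binds the symbol. Every later call reaches `real_square` directly, so the lookup cost is paid only once and only for functions that are actually called.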

So, to sum up, the format diagram for Mach-O is as follows: