Main Contents:
- Understand executable files
- Understand Mach-O files
- Mach-O file structure: Mach Header, Load Commands, Data
- Understand big-endian and little-endian modes
- Understand universal binaries
Understand executable files
1. Executable files
- A process is, in fact, the result of loading an executable file into memory;
- An executable file must be in a format that the operating system understands, and the executable file format differs from one operating system to another;
2. Executables for different platforms
- Linux: ELF files;
- Windows: PE32 / PE32+ files;
- macOS and iOS: Mach-O (Mach Object) files;
Understand Mach-O files
As the executable file format for the iOS, iPadOS, and macOS platforms, Mach-O is involved in App startup and execution, bitcode analysis, crash symbolication, and many other features:
1. Mach-O files
- Mach-O is the executable file format for the iOS, iPadOS, and macOS platforms. The system runs files in this format through its application binary interface (ABI);
- The Mach-O format replaces the a.out format used in BSD systems. It stores the machine code and data generated during compilation and linking, and provides a single file format for both statically linked and dynamically linked code;
- Mach-O offers better extensibility and faster access to symbol table information;
2. Common file types in Mach-O format
- Executable: an executable file (.out); object files (.o) are also in Mach-O format;
- Dylib: a dynamic link library;
- Bundle: cannot be linked against and can only be loaded at run time with dlopen();
- Image: a collective term that covers Executable, Dylib, and Bundle;
- Framework: a folder that contains a Dylib, resource files, and header files;
Mach-O file structure
1. Two ways to inspect a Mach-O file
- Use the MachOView application, which shows the Mach-O file structure directly;
- Use the terminal command objdump;
2. View the Mach-O file structure
Opening a Mach-O file in MachOView shows that the file consists of three main parts:
- Header: describes the CPU type, the file type, the number and total size of the load commands, and so on;
- Load Commands: the load commands, whose number and total size are already given in the Header;
- Data: the data area;
Other information:
- Dynamic Loader Info: dynamic library loading information
- Function Starts: start addresses of functions
- Symbol Table: the symbol table
- Dynamic Symbol Table: the dynamic library symbol table
- String Table: the string table
Mach Header
1. Function summary
- The Header is the first thing the linker reads when it loads the file, because it determines basic information such as the architecture and the system type;
- The Header contains key information about the whole Mach-O file, such as the CPU type, the file type, and the number of load commands, so that the system can quickly determine the environment the file is meant to run in;
- For 32-bit and 64-bit CPU architectures, the Header corresponds to the mach_header and mach_header_64 structures respectively;
2. Source code analysis
Header is defined in loader.h as follows:
struct mach_header_64 {
    uint32_t magic;           // Magic number: lets the kernel recognize the file as Mach-O and tell 32-bit from 64-bit
    cpu_type_t cputype;       // CPU architecture type, such as ARM
    cpu_subtype_t cpusubtype; // CPU subtype within that architecture, for example armv7 or arm64e
    uint32_t filetype;        // Mach-O file type: executable, object file, dynamic library, and so on
    uint32_t ncmds;           // Number of load commands (they immediately follow the header)
    uint32_t sizeofcmds;      // Total size in bytes of all load commands
    uint32_t flags;           // Flag bits describing features of the binary, mainly related to loading and linking
    uint32_t reserved;        // Reserved field (not present in the 32-bit mach_header)
};
Because executables, object files, static and dynamic libraries, and so on all use the Mach-O format, the filetype field is needed to tell them apart. Common file types are as follows:
#define MH_OBJECT   0x1 /* Object file */
#define MH_EXECUTE  0x2 /* Executable file */
#define MH_DYLIB    0x6 /* Dynamic library */
#define MH_DYLINKER 0x7 /* Dynamic linker */
#define MH_DSYM     0xa /* Companion file that stores symbol information for debugging and analysis */
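As a minimal sketch of how the header fields above might be used, the following C code copies a mach_header_64 out of a byte buffer, checks the magic number, and maps filetype to a readable label. The structure and constants are redeclared locally (mirroring loader.h) so the example builds without the macOS SDK. MH_MAGIC_64 is the 64-bit magic value from loader.h; read_header_64 and filetype_name are hypothetical helper names, and the code assumes the file and the host share the same byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values mirror <mach-o/loader.h>; redeclared here so the sketch is portable. */
#define MH_MAGIC_64 0xfeedfacfu
#define MH_OBJECT   0x1
#define MH_EXECUTE  0x2
#define MH_DYLIB    0x6
#define MH_DYLINKER 0x7
#define MH_DSYM     0xa

struct mach_header_64 {
    uint32_t magic;
    int32_t  cputype;    /* cpu_type_t is an int */
    int32_t  cpusubtype; /* cpu_subtype_t is an int */
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

/* Hypothetical helper: turn a filetype value into a short label. */
const char *filetype_name(uint32_t filetype) {
    switch (filetype) {
    case MH_OBJECT:   return "object file";
    case MH_EXECUTE:  return "executable";
    case MH_DYLIB:    return "dynamic library";
    case MH_DYLINKER: return "dynamic linker";
    case MH_DSYM:     return "dSYM companion file";
    default:          return "other";
    }
}

/* Hypothetical helper: copy the header from the start of a buffer.
 * Returns 0 if the buffer is large enough and carries the 64-bit magic,
 * -1 otherwise. Assumes file and host byte order match. */
int read_header_64(const void *buf, size_t len, struct mach_header_64 *out) {
    assert(buf != NULL && out != NULL);
    if (len < sizeof *out)
        return -1;
    memcpy(out, buf, sizeof *out); /* memcpy avoids unaligned access */
    return out->magic == MH_MAGIC_64 ? 0 : -1;
}
```

In practice the buffer would hold the first bytes of a file read from disk; a 32-bit file or a universal binary fails the magic check here and would need its own handling.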
3. MachOView demonstration
Load Commands
1. Function summary
- Load Commands is a list of load commands that describe how the Data is laid out in the binary file and in virtual memory;
- Load Commands record a great deal of information, such as the location of the dynamic linker, the program entry point, the dependent libraries, the location of the code, and the location of the symbol table;
- Load commands are defined by the kernel, and the set of commands differs between versions; the number of commands and their total size are recorded in the header;
- The type of a load command is a constant with the LC_ prefix, for example LC_SEGMENT, LC_SYMTAB, and so on;
2. Source code analysis
The load command is defined in loader.h as follows:
struct load_command {
    uint32_t cmd;     /* Type of the load command */
    uint32_t cmdsize; /* Size of the load command in bytes */
};
Each load command has its own structure, but the first two fields of every structure are the same (cmd and cmdsize). For example, LC_SEGMENT_64 is the command that describes a segment and its sections. Its structure is as follows:
struct segment_command_64 { /* for 64-bit architectures */
    uint32_t cmd;       // Type of the load command
    uint32_t cmdsize;   // Size of the load command, including the nsects section structures that immediately follow it
    char segname[16];   // 16-byte segment name
    uint64_t vmaddr;    // Virtual memory start address of the segment
    uint64_t vmsize;    // Virtual memory size of the segment
    uint64_t fileoff;   // Offset of the segment in the file
    uint64_t filesize;  // Size of the segment in the file
    vm_prot_t maxprot;  // Maximum memory protection (1 = r, 2 = w, 4 = x)
    vm_prot_t initprot; // Initial memory protection of the segment's pages
    uint32_t nsects;    // Number of sections in the segment
    uint32_t flags;     // Flags
};
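Because every load command starts with cmd and cmdsize, a reader can step through the command list without knowing each command's full layout. The sketch below illustrates that idea: it advances by cmdsize and counts the commands of one type. count_load_commands is a hypothetical helper, the structure and the LC_ values are redeclared locally to mirror loader.h, and the buffer is assumed to hold the load command area that follows the header, in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values mirror <mach-o/loader.h>; redeclared so the sketch is portable. */
#define LC_SYMTAB     0x2
#define LC_SEGMENT_64 0x19

struct load_command {
    uint32_t cmd;
    uint32_t cmdsize;
};

/* Hypothetical helper: walk `ncmds` load commands stored in `cmds`
 * (`sizeofcmds` bytes in total) and count those whose type is `wanted`.
 * Returns -1 if the list is malformed or runs past the buffer. */
int count_load_commands(const uint8_t *cmds, uint32_t ncmds,
                        uint32_t sizeofcmds, uint32_t wanted) {
    assert(cmds != NULL);
    uint32_t offset = 0; /* invariant: offset <= sizeofcmds */
    int found = 0;

    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_command lc;
        if (sizeofcmds - offset < sizeof lc)
            return -1; /* not enough room for the common prefix */
        memcpy(&lc, cmds + offset, sizeof lc);

        /* cmdsize must cover at least the prefix and stay inside the area */
        if (lc.cmdsize < sizeof lc || lc.cmdsize > sizeofcmds - offset)
            return -1;

        if (lc.cmd == wanted)
            found++;
        offset += lc.cmdsize; /* the next command starts right after this one */
    }
    return found;
}
```

The ncmds and sizeofcmds arguments would normally come straight from the Mach header; validating cmdsize on every step keeps a corrupt file from sending the walk out of bounds.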
Data
1. Function summary
- Data stores the actual data and code, mainly methods, the symbol table, the dynamic symbol table, and dynamic library loading information (rebasing, symbol binding, and so on);
- Data is laid out exactly as described by the Load Commands;
- Data is organized into Segments and Sections. Usually Data has more than one segment, and each segment can contain zero or more sections;
- Each segment has its own range of virtual addresses, which is mapped into the address space of the process;
Almost all Mach-O files contain three segments:
- __TEXT: the code segment, read-only and executable; it stores the binary code of functions (__text), constant strings (__cstring), Objective-C class and method names, and similar information;
- __DATA: the data segment, readable and writable; it stores Objective-C strings (__cfstring), runtime metadata (classes, protocols, methods), global variables, static variables, and so on;
- __LINKEDIT: read-only; it stores information needed to launch the App, such as bind and rebase addresses and the names and addresses of functions;
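The access rights listed above correspond to the maxprot and initprot fields of the segment command. As a small illustration, assuming the usual Mach protection bits (VM_PROT_READ = 1, VM_PROT_WRITE = 2, VM_PROT_EXECUTE = 4, redeclared here rather than taken from a system header), a hypothetical helper prot_string can render a protection value in the familiar rwx form:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Mach VM protection bits, redeclared locally for portability. */
#define VM_PROT_READ    0x1
#define VM_PROT_WRITE   0x2
#define VM_PROT_EXECUTE 0x4

/* Hypothetical helper: write "rwx"-style text for a protection value.
 * `out` must have room for 4 characters (3 flags plus the terminator). */
void prot_string(int32_t prot, char out[4]) {
    assert(out != NULL);
    out[0] = (prot & VM_PROT_READ)    ? 'r' : '-';
    out[1] = (prot & VM_PROT_WRITE)   ? 'w' : '-';
    out[2] = (prot & VM_PROT_EXECUTE) ? 'x' : '-';
    out[3] = '\0';
}
```

With these bits, a read-only executable segment such as __TEXT has the value 5 (r-x), a readable and writable segment such as __DATA has 3 (rw-), and a read-only segment such as __LINKEDIT has 1 (r--).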
2. Source code analysis
Sections make up most of the Data part, and in a Mach-O file they are concentrated in the __TEXT and __DATA segments.
The section structure is defined in loader.h as follows:
struct section_64 { /* for 64-bit architectures */
    char sectname[16];  // Name of the section
    char segname[16];   // Name of the segment that contains the section
    uint64_t addr;      // Start address in memory
    uint64_t size;      // Size of the section
    uint32_t offset;    // File offset of the section
    uint32_t align;     // Alignment, as a power of 2
    uint32_t reloff;    // File offset of the relocation entries
    uint32_t nreloc;    // Number of relocation entries
    uint32_t flags;     // Flags: section type and attributes
    uint32_t reserved1; // Reserved (for an offset or index)
    uint32_t reserved2; // Reserved (for a count or sizeof)
    uint32_t reserved3; // Reserved
};
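The section structures sit directly after their segment command, nsects of them in a row. The following sketch shows one way to look up a section by name under that layout. find_section is a hypothetical helper; both structures are redeclared locally (with vm_prot_t written as int32_t) so the example builds without the macOS SDK, and the input is assumed to be a single LC_SEGMENT_64 command in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct segment_command_64 {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    int32_t  maxprot, initprot; /* vm_prot_t is an int */
    uint32_t nsects, flags;
};

struct section_64 {
    char     sectname[16];
    char     segname[16];
    uint64_t addr, size;
    uint32_t offset, align, reloff, nreloc, flags;
    uint32_t reserved1, reserved2, reserved3;
};

/* Hypothetical helper: `seg` points at one LC_SEGMENT_64 command of
 * `seg_len` bytes. Copies the section called `sectname` into `out` and
 * returns 0, or returns -1 if it is absent or the command is malformed. */
int find_section(const uint8_t *seg, size_t seg_len,
                 const char *sectname, struct section_64 *out) {
    assert(seg != NULL && sectname != NULL && out != NULL);
    struct segment_command_64 sc;
    if (seg_len < sizeof sc)
        return -1;
    memcpy(&sc, seg, sizeof sc);
    if (sc.cmdsize < sizeof sc || sc.cmdsize > seg_len)
        return -1;

    /* cmdsize covers the command plus all of its section structures */
    size_t avail = (sc.cmdsize - sizeof sc) / sizeof *out;
    if (sc.nsects > avail)
        return -1;

    for (uint32_t i = 0; i < sc.nsects; i++) {
        memcpy(out, seg + sizeof sc + (size_t)i * sizeof *out, sizeof *out);
        /* names are 16 bytes and not always NUL-terminated */
        if (strncmp(out->sectname, sectname, sizeof out->sectname) == 0)
            return 0;
    }
    return -1;
}
```

Comparing with strncmp over the full 16 bytes matters because a name that fills the whole field has no terminating NUL.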
Understand big-endian and little-endian modes
When analyzing Mach-O files, you often look at the bytes stored at memory addresses, which brings up the concept of byte order (endianness).
- Little-endian mode: the low-order byte of the data is stored at the low address in memory;
- Big-endian mode: the low-order byte of the data is stored at the high address in memory;
The processors of iOS devices are based on the ARM architecture and by default read data in little-endian mode (low byte at the low address), while data sent over the network or Bluetooth is typically transmitted in big-endian mode (low byte at the high address):
Take unsigned int value = 0x12345678 as an example, with unsigned char buf[4] holding the bytes of value.
Little-endian: the low address stores the low-order byte, as follows:
Low address ------------------> High address
0x78 | 0x56 | 0x34 | 0x12
Big-endian: the low address stores the high-order byte, as follows:
Low address ------------------> High address
0x12 | 0x34 | 0x56 | 0x78
Memory address | Content in little-endian mode | Content in big-endian mode |
---|---|---|
0x4000 | 0x78 | 0x12 |
0x4001 | 0x56 | 0x34 |
0x4002 | 0x34 | 0x56 |
0x4003 | 0x12 | 0x78 |
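The same comparison can be checked in code. The sketch below is only an illustration: host_is_little_endian inspects how the running machine stores 0x12345678, while load_be32 and load_le32 assemble a 32-bit value from four bytes in a fixed byte order regardless of the host. All three function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the low-order byte at the lowest address. */
int host_is_little_endian(void) {
    uint32_t value = 0x12345678u;
    unsigned char buf[4];
    memcpy(buf, &value, sizeof value); /* view the value as raw bytes */
    return buf[0] == 0x78;
}

/* Interpret 4 bytes as a big-endian value: p[0] is the high-order byte. */
uint32_t load_be32(const unsigned char *p) {
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Interpret 4 bytes as a little-endian value: p[0] is the low-order byte. */
uint32_t load_le32(const unsigned char *p) {
    assert(p != NULL);
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

Building the value with shifts, rather than casting the byte pointer to uint32_t *, gives the same result on any host and avoids alignment problems.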
Understand universal binaries
1. Basic concepts
- A universal binary packs the Mach-O files for several architectures together into one file; when the binary is loaded, the system automatically detects and selects the matching architecture;
- Because a universal binary stores several architectures at once, it is much larger than a single-architecture binary and takes up more disk space. At run time, however, the system selects only the most suitable architecture, so the code for the other architectures does not occupy memory and execution remains efficient;
- The universal binary format is also known as the fat binary format;
2. Universal binary format analysis
The universal binary format is defined in <mach-o/fat.h>:
- After downloading xnu, the file can be found under xnu -> EXTERNAL_HEADERS -> mach-o;
- A universal binary has two important structures: fat_header and fat_arch;
The two structures are defined as follows:
/* nfat_arch: the number of fat_arch structures that follow, i.e. how many Mach-O files the universal binary contains */
struct fat_header {
    uint32_t magic;     /* FAT_MAGIC */
    uint32_t nfat_arch; /* number of structs that follow */
};
/* fat_arch describes one Mach-O file. cputype and cpusubtype describe the platform it targets; offset, size, and align describe where that Mach-O file sits inside the universal binary */
struct fat_arch {
    cpu_type_t cputype;       /* cpu specifier (int) */
    cpu_subtype_t cpusubtype; /* machine specifier (int) */
    uint32_t offset;          /* file offset to this object file */
    uint32_t size;            /* size of this object file */
    uint32_t align;           /* alignment as a power of 2 */
};
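The two structures can be read with a few lines of C. One detail matters here: the fields of fat_header and fat_arch are stored in big-endian byte order in the file, so on a little-endian host each field has to be converted. The sketch below reads the fields byte by byte to stay independent of the host. fat_arch_count, fat_arch_at, and struct fat_arch_host are hypothetical names introduced for this example; FAT_MAGIC (0xcafebabe) mirrors the value in fat.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define FAT_MAGIC       0xcafebabeu /* mirrors <mach-o/fat.h> */
#define FAT_HEADER_SIZE 8u          /* magic + nfat_arch */
#define FAT_ARCH_SIZE   20u         /* five 32-bit fields */

/* Hypothetical host-order copy of one fat_arch entry. */
struct fat_arch_host {
    int32_t  cputype, cpusubtype;
    uint32_t offset, size, align;
};

/* Read a big-endian 32-bit value from 4 bytes. */
static uint32_t be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns the number of architectures, or -1 if this is not a
 * universal binary or the buffer is too short for the arch table. */
int fat_arch_count(const uint8_t *file, size_t len) {
    assert(file != NULL);
    if (len < FAT_HEADER_SIZE || be32(file) != FAT_MAGIC)
        return -1;
    uint32_t n = be32(file + 4);
    if (n > (len - FAT_HEADER_SIZE) / FAT_ARCH_SIZE)
        return -1;
    return (int)n;
}

/* Copies entry `index` into `out` in host byte order; 0 on success. */
int fat_arch_at(const uint8_t *file, size_t len, uint32_t index,
                struct fat_arch_host *out) {
    assert(out != NULL);
    int n = fat_arch_count(file, len);
    if (n < 0 || index >= (uint32_t)n)
        return -1;
    const uint8_t *p = file + FAT_HEADER_SIZE + (size_t)index * FAT_ARCH_SIZE;
    out->cputype    = (int32_t)be32(p);
    out->cpusubtype = (int32_t)be32(p + 4);
    out->offset     = be32(p + 8);
    out->size       = be32(p + 12);
    out->align      = be32(p + 16);
    return 0;
}
```

The offset and size of an entry then locate an ordinary Mach-O file inside the universal binary, which can be parsed starting from its own Mach header.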
Reference links
- xnu
- Mach-O official source code