Preface
In previous posts:
Automatic re-signing with shell scripts, and code injection
Application signing principles and re-signing (re-signing WeChat in practice)
Debugging re-signed applications and modifying code (Hook)
we learned a little about re-signing and code injection, and we kept coming back to one of the most important files: the Mach-O file.
So what exactly is Mach-O? Since it is so important, it is worth a closer look.
(For those of you who are less interested in the concept, skip to chapter 2 on Mach-O file structures.)
Mach-O files
Mach-O is short for Mach Object file format, the executable file format on macOS and iOS. It is similar to the PE (Portable Executable) format on Windows and the ELF format on Linux.
It is a file format for executables, object code, and dynamic libraries. As a replacement for the a.out format, Mach-O offers greater extensibility.
Besides executables, a number of other files also use the Mach-O format.
Common files in the Mach-O format
- Object files: .o
- Library files
  - .a
  - .dylib
  - Framework
- Executable files
- dyld (the dynamic linker)
- .dSYM (symbol table)
Tip: use the file command to view a file's type.
In other words, Mach-O is not necessarily an executable. It is a file format that covers Mach-O object files, Mach-O executables, Mach-O dynamically linked shared libraries, Mach-O dynamic linker files, Mach-O dSYM companion (symbol table) files, and so on.
To get a better feel for these files and how they are produced, write a small .c file in vim, compile it with clang into a .o object file and an executable, and inspect the results.
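The distinction between these file types can be checked by hand, because it is stored in the first few bytes of the file. Below is a minimal sketch of a `file`-like check, assuming the magic and filetype constants from `<mach-o/loader.h>` and `<mach-o/fat.h>`; the function name `describe_macho` and the sample bytes are made up for illustration.

```python
import struct

# Magic numbers from <mach-o/loader.h> and <mach-o/fat.h>.
MH_MAGIC, MH_MAGIC_64 = 0xFEEDFACE, 0xFEEDFACF
FAT_MAGIC = 0xCAFEBABE  # note: Java class files start with the same value

# A small subset of the filetype values from <mach-o/loader.h>.
FILETYPES = {
    0x1: "object file (MH_OBJECT)",
    0x2: "executable (MH_EXECUTE)",
    0x6: "dynamic library (MH_DYLIB)",
    0x7: "dynamic linker (MH_DYLINKER)",
    0xA: "dSYM companion file (MH_DSYM)",
}


def describe_macho(data: bytes) -> str:
    """Return a rough, file-like description from the first 16 bytes."""
    if len(data) < 16:
        return "not Mach-O"
    # Fat headers are always stored big-endian.
    if struct.unpack(">I", data[:4])[0] == FAT_MAGIC:
        return "universal (fat) binary"
    # Thin headers use the byte order of the target CPU, so try both.
    for endian in ("<", ">"):
        magic, _cputype, _cpusubtype, filetype = struct.unpack(
            endian + "IiiI", data[:16]
        )
        if magic in (MH_MAGIC, MH_MAGIC_64):
            bits = "64-bit" if magic == MH_MAGIC_64 else "32-bit"
            kind = FILETYPES.get(filetype, f"filetype {filetype:#x}")
            return f"Mach-O {bits} {kind}"
    return "not Mach-O"


# Synthetic header: 64-bit magic, arm64 CPU type, subtype 0, MH_EXECUTE.
sample = struct.pack("<IiiI", MH_MAGIC_64, 0x0100000C, 0, 0x2)
print(describe_macho(sample))  # Mach-O 64-bit executable (MH_EXECUTE)
```

On a real file you would pass the first bytes read with `open(path, "rb").read(16)`.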
In the picture above we also see arm64. What does that mean?
- The build is in Release mode
- The deployment target is earlier than iOS 11.0
When both conditions are met, the Mach-O executable of our application is built for both the arm64 and armv7 architectures. arm64 is the 64-bit architecture used by the iPhone 5s and later.
A Mach-O executable that contains code for several architectures is called a universal binary, meaning it can be loaded and run on any of those architectures.
The architectures of the generated Mach-O executable can also be changed through the Architectures build setting in Xcode.
When generating the Mach-O file, the compiler uses the intersection of Architectures and Valid Architectures, so to support more architectures, make sure each one appears in both settings. After compiling, use the file command to check the result.
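The intersection rule is easy to picture with a few lines of code. This is only an illustration of the rule as described above, not how Xcode implements it; the function name `built_architectures` is hypothetical.

```python
def built_architectures(architectures, valid_architectures):
    """Architectures that end up in the binary: those listed in both settings."""
    valid = set(valid_architectures)
    # Keep the order of the Architectures setting.
    return [arch for arch in architectures if arch in valid]


print(built_architectures(
    ["armv7", "arm64"],
    ["arm64", "arm64e", "armv7", "armv7s"],
))  # ['armv7', 'arm64']
```

An architecture listed in only one of the two settings simply drops out of the result.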
Universal binary
- A binary format introduced by Apple that packs code for multiple architectures into one file, so the same binary runs on all of them.
- It provides native performance on each architecture from a single package.
- Because it stores several copies of the code, a universal binary is generally larger than a single-architecture binary.
- However, because the architectures share the non-executable resources, it is not twice the size of a single-architecture version.
- And because only the matching part of the code is used at run time, running it requires no extra memory.
A binary that supports multiple architectures is usually called a universal binary; MachOView and similar tools call it a fat binary. Such a binary can be split apart and recombined freely, so let's play with that.
Fat binary: merging and splitting
1 – Create a new project and set the deployment target to iOS 10.3.
2 – Edit the scheme's run mode
Select Release (change it back when you are done testing, otherwise running is too slow).
3 – Build Settings
Deployment target: iOS 10.3
Configuration: Release
Architectures: arm64 + armv7, plus armv7s and arm64e
4 – Select a real device and run
Run, then locate the Mach-O file.
As you can see, our fat binary has been generated.
The supported architectures can also be viewed with the lipo -info command.
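To see where that architecture list comes from, here is a rough sketch that reads the fat header directly. It assumes the `fat_header` / `fat_arch` layout from `<mach-o/fat.h>` (all fields 32-bit and big-endian) and only knows the four CPU subtypes used in this post; `fat_archs` and the demo bytes are made up for illustration.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # from <mach-o/fat.h>; fat headers are big-endian

# (cputype, cpusubtype) -> name, a small subset of <mach/machine.h>.
CPU_NAMES = {
    (12, 9): "armv7",
    (12, 11): "armv7s",
    (0x0100000C, 0): "arm64",
    (0x0100000C, 2): "arm64e",
}


def fat_archs(data: bytes):
    """Return [(name, offset, size)] for each slice listed in a fat header."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a fat header")
    archs = []
    for i in range(nfat_arch):
        # struct fat_arch: cputype, cpusubtype, offset, size, align (20 bytes)
        cputype, cpusubtype, offset, size, _align = struct.unpack_from(
            ">iiIII", data, 8 + 20 * i
        )
        # The top byte of cpusubtype holds capability bits; mask them off.
        key = (cputype, cpusubtype & 0x00FFFFFF)
        archs.append((CPU_NAMES.get(key, f"cpu {key[0]}/{key[1]}"), offset, size))
    return archs


# Synthetic header describing two slices (offsets and sizes are invented).
demo = struct.pack(">II", FAT_MAGIC, 2)
demo += struct.pack(">iiIII", 12, 9, 16384, 1000, 14)
demo += struct.pack(">iiIII", 0x0100000C, 0, 32768, 2000, 14)
print(fat_archs(demo))  # [('armv7', 16384, 1000), ('arm64', 32768, 2000)]
```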
Splitting a fat binary
lipo <Mach-O file> -thin <architecture to extract> -output <output file>
For example:
lipo MachO_Test -thin armv7s -output macho_armv7s
Repeat this for each architecture to produce macho_arm64, macho_arm64e, macho_armv7, and macho_armv7s.
Note that the source file is not changed by splitting.
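Conceptually, thinning just copies one slice out of the fat file using the offset and size recorded in the fat header. The sketch below shows that idea on a toy file; it is not how lipo is implemented. `make_fat` is a hypothetical helper that builds a tiny fat file without the page alignment real tools apply, and the payloads are placeholder bytes rather than real Mach-O slices.

```python
import struct

FAT_MAGIC = 0xCAFEBABE
CPU_TYPE_ARM, CPU_SUBTYPE_ARM_V7, CPU_SUBTYPE_ARM_V7S = 12, 9, 11


def thin(fat: bytes, cputype: int, cpusubtype: int) -> bytes:
    """Return a copy of one architecture's slice; `fat` itself is untouched."""
    magic, nfat_arch = struct.unpack_from(">II", fat, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a fat binary")
    for i in range(nfat_arch):
        ctype, csub, offset, size, _align = struct.unpack_from(
            ">iiIII", fat, 8 + 20 * i
        )
        if ctype == cputype and (csub & 0x00FFFFFF) == cpusubtype:
            return fat[offset:offset + size]
    raise KeyError("architecture not present")


def make_fat(slices):
    """Hypothetical helper: pack [(cputype, cpusubtype, payload)] back to back."""
    offset = 8 + 20 * len(slices)  # fat_header plus one fat_arch per slice
    table, body = b"", b""
    for cputype, cpusubtype, payload in slices:
        # align=0 means 1-byte alignment; real tools align slices to pages.
        table += struct.pack(">iiIII", cputype, cpusubtype, offset, len(payload), 0)
        body += payload
        offset += len(payload)
    return struct.pack(">II", FAT_MAGIC, len(slices)) + table + body


fat = make_fat([
    (CPU_TYPE_ARM, CPU_SUBTYPE_ARM_V7, b"ARMV7-CODE"),
    (CPU_TYPE_ARM, CPU_SUBTYPE_ARM_V7S, b"ARMV7S-CODE"),
])
print(thin(fat, CPU_TYPE_ARM, CPU_SUBTYPE_ARM_V7S))  # b'ARMV7S-CODE'
```

Because `thin` only reads from `fat` and returns a new bytes object, the input stays as it was, which mirrors the observation that the source file does not change.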
Merging into a fat binary
lipo -create macho_arm64 macho_arm64e macho_armv7 macho_armv7s -output newMachO
After the merge, let's compare the hash of the newly generated file with that of the original file.
They are exactly the same.
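One way to do this comparison in a script is sketched below. The post does not say which hash was used, so SHA-256 is an assumption here, and the helper names are made up.

```python
import hashlib


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(path_a: str, path_b: str) -> bool:
    """True when both files have identical bytes (same SHA-256 digest)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

With the file names from this post, the check would be `same_file_contents("MachO_Test", "newMachO")`.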
Tips:
This is often used when merging static libraries, which are themselves Mach-O files. When reverse engineering, we also sometimes split binaries this way, because we only need to analyze a single architecture rather than the whole bloated fat binary.
Supplement
A slightly more subtle point: when binaries for several architectures are combined into a universal binary, the code is not shared, because the same machine code may mean different things on different CPUs. Common resource files are shared.
Mach-O file structure
A Mach-O file is made up of the parts shown in the figure:
- Header: contains general information about the binary file
  - Byte order, architecture type, number of load commands, etc.
  - This lets you quickly check whether the file is 32-bit or 64-bit, which processor it targets, and what type of file it is
- Load Commands: a table with many entries
  - The entries describe where regions are located, the symbol table, the dynamic symbol table, etc.
- Data: usually the largest part of the object file
  - Contains the actual data of the segments
Let's find a Mach-O file and use MachOView or otool to look at its structure.
So what exactly does a Mach-O file store? Let's explore the parts one by one.
Mach Header
The contents stored in the header look roughly like the figure above. What does each field correspond to? In Xcode, press Cmd + Shift + O, open loader.h, and look up mach_header_64.
struct mach_header_64 {
    uint32_t      magic;       /* magic number; quickly identifies 64-bit vs. 32-bit */
    cpu_type_t    cputype;     /* CPU type, such as ARM */
    cpu_subtype_t cpusubtype;  /* specific CPU subtype, such as arm64 or armv7 */
    uint32_t      filetype;    /* file type, such as executable */
    uint32_t      ncmds;       /* number of load commands */
    uint32_t      sizeofcmds;  /* total size of all load commands */
    uint32_t      flags;       /* flags describing features of the binary, mainly related to loading and linking */
    uint32_t      reserved;    /* reserved */
};
mach_header_64 has one more field, reserved, than mach_header, which is the 32-bit header. The header is the first thing the loader reads when it loads the file; it determines basic information such as the architecture, the file type, and the number of load commands.
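The eight fields of mach_header_64 are all 32 bits wide, so the header can be decoded in a couple of lines. A minimal sketch, assuming a little-endian file (as on ARM and x86); the `MachHeader64` tuple, the function name, and the sample values are invented for the demonstration.

```python
import struct
from collections import namedtuple

MachHeader64 = namedtuple(
    "MachHeader64",
    "magic cputype cpusubtype filetype ncmds sizeofcmds flags reserved",
)
MH_MAGIC_64 = 0xFEEDFACF
HEADER_FORMAT = "<IiiIIIII"  # eight 32-bit fields, little-endian


def parse_mach_header_64(data: bytes) -> MachHeader64:
    """Decode the first 32 bytes of a little-endian 64-bit Mach-O file."""
    header = MachHeader64._make(struct.unpack_from(HEADER_FORMAT, data, 0))
    if header.magic != MH_MAGIC_64:
        raise ValueError("not a little-endian 64-bit Mach-O header")
    return header


# Made-up sample: arm64 executable with 22 load commands.
sample = struct.pack(
    HEADER_FORMAT, MH_MAGIC_64, 0x0100000C, 0, 2, 22, 2800, 0x00200085, 0
)
header = parse_mach_header_64(sample)
print(struct.calcsize(HEADER_FORMAT), header.ncmds, hex(header.flags))
# 32 22 0x200085
```

The 32-bit mach_header would use the same format without the final field, which is the one-field difference mentioned above.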
Load Commands
Load Commands lists the load commands in detail; they tell the loader how to load the Mach-O file.
Looking at the addresses, we can see that the load commands come immediately after the mach_header in memory.
So what do these load commands correspond to? Let's take arm64 as an example.
The __TEXT segment and the __DATA segment are the ones we most often need to study, and MachOView lists them in detail further down.
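Since the load commands sit directly after the header, and each one starts with its own type and size, they can be walked with a simple loop. The sketch below assumes a little-endian 64-bit file and names only a handful of the command constants from `<mach-o/loader.h>`; the function name and the synthetic bytes are made up.

```python
import struct

# A few load command types from <mach-o/loader.h>.
LC_NAMES = {
    0x2: "LC_SYMTAB",
    0xB: "LC_DYSYMTAB",
    0xC: "LC_LOAD_DYLIB",
    0x19: "LC_SEGMENT_64",
    0x1B: "LC_UUID",
    0x80000028: "LC_MAIN",
}
HEADER_SIZE_64 = 32  # sizeof(struct mach_header_64)


def load_commands(data: bytes):
    """Yield (name, offset, cmdsize) for each load command."""
    # ncmds and sizeofcmds are the 5th and 6th header fields (offset 16).
    ncmds, sizeofcmds = struct.unpack_from("<II", data, 16)
    offset = HEADER_SIZE_64
    end = HEADER_SIZE_64 + sizeofcmds
    for _ in range(ncmds):
        # Every load command begins with: uint32 cmd, uint32 cmdsize.
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmdsize < 8 or offset + cmdsize > end:
            raise ValueError("malformed load command")
        yield LC_NAMES.get(cmd, hex(cmd)), offset, cmdsize
        offset += cmdsize


# Synthetic file: header + LC_UUID (24 bytes) + LC_SYMTAB (24 bytes).
cmds = struct.pack("<II16s", 0x1B, 24, bytes(16))
cmds += struct.pack("<IIIIII", 0x2, 24, 0, 0, 0, 0)
macho = struct.pack(
    "<IiiIIIII", 0xFEEDFACF, 0x0100000C, 0, 2, 2, len(cmds), 0, 0
) + cmds
print(list(load_commands(macho)))
# [('LC_UUID', 32, 24), ('LC_SYMTAB', 56, 24)]
```

The first command starts at offset 32, right where the 64-bit header ends.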
__TEXT segment
Let's look at what is in the __TEXT segment. Reading actually starts from the __TEXT segment.
Name | Content |
---|---|
__text | Main program code |
__stubs, __stub_helper | Dynamic linking |
__objc_methname | Method names |
__objc_classname | Class names |
__objc_methtype | Method types (v@:) |
__cstring | Static string constants |
__DATA segment
The __DATA segment comes immediately after the __TEXT segment in memory.
Name | Content |
---|---|
__got: Non-Lazy Symbol Pointers | Non-lazy symbol pointer table |
__la_symbol_ptr: Lazy Symbol Pointers | Lazy symbol pointer table |
__objc_classlist | Class list |
Some other data sections are not listed here.
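The segment and section names in these tables are stored inside LC_SEGMENT_64 load commands, each followed by its section headers. As a rough sketch of how a tool could list them, assuming the `segment_command_64` (72 bytes) and `section_64` (80 bytes) layouts from `<mach-o/loader.h>` and a little-endian file; `demo_segment` is a hypothetical helper that fabricates test bytes, and the section sizes are invented.

```python
import struct

LC_SEGMENT_64 = 0x19
SEGMENT_FMT = "<II16sQQQQiiII"     # struct segment_command_64, 72 bytes
SECTION_FMT = "<16s16sQQIIIIIIII"  # struct section_64, 80 bytes


def c_name(raw: bytes) -> str:
    """Decode a fixed-width, NUL-padded 16-byte name."""
    return raw.split(b"\0", 1)[0].decode("ascii")


def sections(data: bytes):
    """Return [(segname, sectname, size)] for every section in the file."""
    ncmds = struct.unpack_from("<I", data, 16)[0]
    offset, found = 32, []  # load commands start after the 32-byte header
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmd == LC_SEGMENT_64:
            nsects = struct.unpack_from(SEGMENT_FMT, data, offset)[9]
            for i in range(nsects):
                sect = struct.unpack_from(SECTION_FMT, data, offset + 72 + 80 * i)
                # sect[0] = sectname, sect[1] = segname, sect[3] = size
                found.append((c_name(sect[1]), c_name(sect[0]), sect[3]))
        offset += cmdsize
    return found


def demo_segment(segname: bytes, sectnames):
    """Hypothetical helper: build one segment command with empty sections."""
    sects = b"".join(
        struct.pack(SECTION_FMT, s, segname, 0, 64, 0, 0, 0, 0, 0, 0, 0, 0)
        for s in sectnames
    )
    head = struct.pack(
        SEGMENT_FMT, LC_SEGMENT_64, 72 + len(sects), segname,
        0, 0, 0, 0, 7, 5, len(sectnames), 0,
    )
    return head + sects


cmds = demo_segment(b"__TEXT", [b"__text", b"__cstring"])
cmds += demo_segment(b"__DATA", [b"__got"])
macho = struct.pack(
    "<IiiIIIII", 0xFEEDFACF, 0x0100000C, 0, 2, 2, len(cmds), 0, 0
) + cmds
for row in sections(macho):
    print(row)
```

For the synthetic input this prints one tuple per section, for example `('__TEXT', '__text', 64)`.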
Supplement
Another point worth mentioning is system library functions. They are shared by all processes and stored in the shared cache, so when our Mach-O calls a system function,
for example NSLog(@"%@", @"haha");
the implementation is definitely not in our Mach-O. So how does the call find it?
When the Mach-O is loaded, dyld binds the symbol it references to the implementation stored in the shared cache. Symbols are stripped automatically in Release builds, which is why we often need to restore the symbol table when using crash collection tools. It is also why hooking system functions with fishhook is called rebinding.
We'll talk more about symbol binding when we get to fishhook.
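Until then, the idea can be pictured with a toy model. The sketch below is only an analogy and not how dyld or fishhook are implemented: a dictionary stands in for the shared cache, a table of slots stands in for the lazy symbol pointers, and all class and function names are invented.

```python
# Stand-in for the shared cache: symbol name -> implementation.
SHARED_CACHE = {"NSLog": lambda msg: f"NSLog: {msg}"}


class LazySymbolTable:
    """Toy model of a lazy symbol pointer table: one slot per imported symbol."""

    def __init__(self, names):
        self.slots = {name: None for name in names}  # None = not bound yet
        self.bind_count = 0

    def call(self, name, *args):
        if self.slots[name] is None:
            # First call: look the symbol up and store the result in the slot.
            self.slots[name] = SHARED_CACHE[name]
            self.bind_count += 1
        # Later calls go straight through the slot.
        return self.slots[name](*args)

    def rebind(self, name, make_replacement):
        """Point the slot at a replacement built around the original."""
        original = self.slots[name] or SHARED_CACHE[name]
        self.slots[name] = make_replacement(original)
        return original


table = LazySymbolTable(["NSLog"])
print(table.call("NSLog", "haha"))  # NSLog: haha
table.rebind("NSLog", lambda orig: (lambda msg: "hooked " + orig(msg)))
print(table.call("NSLog", "haha"))  # hooked NSLog: haha
```

The caller never changes: it always goes through the slot, so swapping what the slot points to is enough to redirect the call, which is the intuition behind the word "rebind".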
At this point, we have covered the overall Mach-O file structure. We will continue with the details of what is stored in it later, as part of the reverse engineering process.