The previous article briefly mentioned MachO in the context of code injection. Before a Framework can be used for code injection, its relative path must first be inserted into the Load Commands of the MachO file, so that the iPhone recognizes and loads the Framework when it executes the MachO!

From this we can already see how important MachO is to an app. Likewise, in reverse engineering practice, MachO is a hurdle we cannot get around!

This article will use the following tools:

  • MachOView
  • dyld source code
  • objc source code

Without further ado, this article explains MachO through the following points.

  • What is MachO
  • MachO file structure
  • APP startup process from the perspective of the dyld source code (key!!)

I. What is MachO

Mach-O is short for Mach Object file format. It is the executable file format on macOS and iOS, similar to the Portable Executable (PE) format on Windows and the Executable and Linkable Format (ELF) on Linux.

1. Common MachO files

a. Object files: .o
b. Library files: .a, .dylib, Framework
c. Executable files: dyld, .dsym

2. How to view the file format

We can use the file command to see the specific format of the file

The known architectures include armv7, armv7s, arm64, i386, x86_64 and so on, and a MachO file can actually be a collection of several of these architectures. Create an empty project, Demo1 (since it is an empty project, no demo is provided).

Look at the MachO in the built Demo1.ipa

Set the minimum version to iOS 12 and look at the MachO in Demo1.ipa packaged with a Release build

Set the minimum version to iOS 8 and look at the MachO in Demo1.ipa packaged with a Release build

You can tell from the three figures above that MachO can be a multi-architecture binary, called a “universal binary.”

A universal binary is a kind of program code introduced by Apple: a binary that can run on multiple architectures at the same time.

a. It provides optimal performance for multiple architectures in the same package.
b. Because it has to store code for several architectures, a universal binary is usually larger than a single-platform binary.
c. However, because the architectures share the non-executable resources, it is not twice the size of a single-architecture version.
d. And because only the code for the current architecture is used during execution, no extra memory is needed at run time.
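To make the idea of "several architectures in one file" more concrete, here is a minimal sketch in C of how the table at the start of a universal binary could be read. It assumes the layout from Apple's <mach-o/fat.h>: a big-endian magic of 0xcafebabe, a slice count, then one 20-byte record per architecture whose first field is the CPU type. The names read_be32 and demo_fat_cputypes are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_FAT_MAGIC 0xcafebabeu   /* FAT_MAGIC in <mach-o/fat.h> */
#define DEMO_FAT_ARCH_SIZE 20u       /* cputype, cpusubtype, offset, size, align */

/* The fat header is always stored big-endian, whatever the host CPU is. */
static uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: copies up to `max` CPU types into `out` and returns
 * the number of slices, or -1 if `buf` is not a (complete) fat header. */
int demo_fat_cputypes(const uint8_t *buf, size_t len, uint32_t *out, size_t max) {
    if (len < 8 || read_be32(buf) != DEMO_FAT_MAGIC) return -1;
    uint32_t count = read_be32(buf + 4);
    if ((len - 8) / DEMO_FAT_ARCH_SIZE < count) return -1;   /* truncated table */
    for (uint32_t i = 0; i < count && i < max; i++) {
        out[i] = read_be32(buf + 8 + (size_t)i * DEMO_FAT_ARCH_SIZE);
    }
    return (int)count;
}
```

For an armv7 + arm64 build, the two CPU types returned would be 12 (CPU_TYPE_ARM) and 0x0100000c (CPU_TYPE_ARM64), which is the same information lipo -info prints in readable form.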

Note: in addition to changing the minimum version number, you can also change the architectures of the MachO in Xcode.

3. Split and merge MachO

# Show the architectures contained in a MachO file
$ lipo -info <MachO file>
# Extract a single architecture
$ lipo <MachO file> -thin <architecture> -output <output file path>
# Merge several architectures into one file
$ lipo -create MachO1 MachO2 -output <output file path>

II. MachO file structure

First of all, a picture from the official website: MachO is divided into three parts, the Header, the Load Commands, and the Data.

1. The Header

The Header contains general information about the binary file: byte order, architecture type, number of load commands, and so on. This lets you quickly check whether the current file is 32-bit or 64-bit, which processor it targets, and what type of file it is.

This article analyzes the Header from two perspectives: the visual view in MachOView and the system source code.

  • Visual view with MachOView

You can open a MachO file directly in MachOView, as shown in the previous article.

  • System source code

The corresponding fields are also defined in the system's MachO header file (struct mach_header_64 in loader.h), as the figure shows.
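For readers who do not have the header file at hand, the following is a minimal sketch in C that mirrors the field order of struct mach_header_64 and decodes it from raw bytes. It assumes a little-endian 64-bit file (magic 0xfeedfacf), which is the case for arm64 and x86_64 binaries. The struct and function names prefixed with demo_ are invented for this example and are not part of any Apple header.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MH_MAGIC_64 0xfeedfacfu  /* MH_MAGIC_64 in <mach-o/loader.h> */

/* Same field order as struct mach_header_64. */
struct demo_mach_header_64 {
    uint32_t magic;       /* identifies a 64-bit MachO file */
    int32_t  cputype;     /* e.g. 0x0100000c for arm64 */
    int32_t  cpusubtype;
    uint32_t filetype;    /* 2 (MH_EXECUTE) for an app's main binary */
    uint32_t ncmds;       /* number of load commands */
    uint32_t sizeofcmds;  /* total size of all load commands in bytes */
    uint32_t flags;
    uint32_t reserved;
};

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills `out` if `buf` starts with a 64-bit MachO header, else 0. */
int demo_read_header(const uint8_t *buf, size_t len, struct demo_mach_header_64 *out) {
    if (len < 32 || read_le32(buf) != DEMO_MH_MAGIC_64) return 0;
    out->magic      = read_le32(buf);
    out->cputype    = (int32_t)read_le32(buf + 4);
    out->cpusubtype = (int32_t)read_le32(buf + 8);
    out->filetype   = read_le32(buf + 12);
    out->ncmds      = read_le32(buf + 16);
    out->sizeofcmds = read_le32(buf + 20);
    out->flags      = read_le32(buf + 24);
    out->reserved   = read_le32(buf + 28);
    return 1;
}
```

The values decoded this way should match what MachOView shows under "Mach64 Header" for the same file.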

2. The Load Commands

Load Commands is a table that contains many entries.

Its contents include the locations of the segments, the symbol table, the dynamic symbol table, and so on.

Most of the entries in the Load Commands above can be found in the table below.

Name                     Meaning
LC_SEGMENT_64            Maps a segment of the file into the process address space (LC_SEGMENT is the 32-bit form, LC_SEGMENT_64 the 64-bit form)
LC_DYLD_INFO_ONLY        Information related to dynamic linking
LC_SYMTAB                Symbol table address
LC_DYSYMTAB              Dynamic symbol table address
LC_LOAD_DYLINKER         Which loader to use; we use dyld
LC_UUID                  UUID of the file
LC_VERSION_MIN_MACOSX    Minimum supported operating system version
LC_SOURCE_VERSION        Source code version
LC_MAIN                  Sets the entry address of the program's main thread and its stack size
LC_LOAD_DYLIB            Paths of the dependent libraries, including third-party libraries
LC_FUNCTION_STARTS       Function start address table
LC_CODE_SIGNATURE        Code signature
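Every entry in the table has the same two leading fields, a command type (cmd) and a total size in bytes (cmdsize), and that is what lets a reader walk the table without understanding every command. The sketch below, in C, counts the commands of one type in a raw load-command region. The command values 0x19 (LC_SEGMENT_64) and 0x0c (LC_LOAD_DYLIB) come from <mach-o/loader.h>. The function name demo_count_commands is an assumption of this example, and the input is assumed to be little-endian.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_LC_SEGMENT_64 0x19u  /* LC_SEGMENT_64 */
#define DEMO_LC_LOAD_DYLIB 0x0cu  /* LC_LOAD_DYLIB */

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* `cmds` points just past the MachO header and `ncmds` comes from the header.
 * Returns how many commands have type `wanted`, or -1 if the table is malformed. */
int demo_count_commands(const uint8_t *cmds, size_t len, uint32_t ncmds, uint32_t wanted) {
    size_t off = 0;
    int hits = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        if (len - off < 8) return -1;                 /* no room for cmd + cmdsize */
        uint32_t cmd     = read_le32(cmds + off);
        uint32_t cmdsize = read_le32(cmds + off + 4);
        if (cmdsize < 8 || cmdsize > len - off) return -1;
        if (cmd == wanted) hits++;
        off += cmdsize;                               /* jump to the next command */
    }
    return hits;
}
```

Counting DEMO_LC_LOAD_DYLIB before and after injecting a Framework is a quick way to confirm that a new entry was really added.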

LC_LOAD_DYLINKER and LC_LOAD_DYLIB

  • LC_LOAD_DYLINKER

This field indicates who loads our MachO. Think of it this way: the path that LC_LOAD_DYLINKER points to is like the engine with which the WeChat app loads a mini program, and our MachO is the mini program. In the figure above, you can see that the LC_LOAD_DYLINKER of Demo1 points to the path of dyld, so it is indeed dyld that loads our app. A later section analyzes the dyld source code to describe how dyld loads MachO.

  • LC_LOAD_DYLIB

This field records the paths of all dynamic libraries. Only a dynamic library outside the MachO (for example a Framework) that is recorded in LC_LOAD_DYLIB can be referenced properly by dyld; otherwise dyld will not load it on its own. This is the key to the code injection in the previous article!
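As a rough illustration of what such an entry holds, the C sketch below pulls the library path out of a single LC_LOAD_DYLIB command. It assumes the dylib_command layout from <mach-o/loader.h>: cmd, cmdsize, then the offset of the name, a timestamp and two version numbers (24 bytes in total), followed by the NUL-terminated path. The function name demo_dylib_path is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_LC_LOAD_DYLIB 0x0cu  /* LC_LOAD_DYLIB in <mach-o/loader.h> */

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns a pointer to the library path stored inside the command,
 * or NULL if `lc` is not a well-formed LC_LOAD_DYLIB command. */
const char *demo_dylib_path(const uint8_t *lc, size_t len) {
    if (len < 24) return NULL;
    uint32_t cmd      = read_le32(lc);
    uint32_t cmdsize  = read_le32(lc + 4);
    uint32_t name_off = read_le32(lc + 8);   /* measured from the start of the command */
    if (cmd != DEMO_LC_LOAD_DYLIB) return NULL;
    if (cmdsize > len || name_off < 24 || name_off >= cmdsize) return NULL;
    /* The path must be NUL-terminated inside the command. */
    if (memchr(lc + name_off, '\0', cmdsize - name_off) == NULL) return NULL;
    return (const char *)(lc + name_off);
}
```

For an injected Framework, the returned string would be a relative path such as @executable_path/Frameworks/... (the exact path depends on how the Framework was inserted).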

3. The Data

Data is usually the largest part of the object file. It contains the actual data of the segments, such as static C strings, OC methods with and without parameters, and C functions with and without parameters.

Write some code in Demo1

  • Static C string
  • Static OC string
  • OC method with arguments
  • OC method with no arguments
  • C function with arguments
  • C function with no arguments

As shown in the figure:

Look at the corresponding sections in the Data part of the MachO, cstring and methname, in the following two figures:

As you can see, the global static C string (myCString) and the string literals used inside the C functions and OC methods (for example myCFuncAString:%d and myCFuncString) are all stored in the cstring section of Data. Even format specifiers such as %d and %s are saved as part of those strings. Identical strings, however, are stored only once. In addition, all OC method names are stored in methname.

The global static OC strings and the C functions (myCFuncA(int a), myCFunc()) do not show up in these sections. Where, and in what form, are they stored?

The Data part is divided into sections such as cstring and methname, and class names, protocol names and so on are all stored in a similar way.

With this general understanding of MachO in place, the next part takes a first look at something just as important: dyld.

III. APP startup process from the perspective of the dyld source code

1. View at the breakpoint of the main function

First of all, can a breakpoint in main show us the call stack of the app startup?

This is hard to answer by thinking and guessing, so let's try it directly in Xcode:

You can see that at the breakpoint in main the startup stack is not visible. This tells us that main is itself called by someone else, but the stack does not show the app startup.

Since the startup stack cannot be found from main, can it be found from a load method, which runs earlier than main?

2. View at the load method breakpoint

Again, debug directly in Xcode:

Much more information is available here. In the assembly at the bottom of the stack (this is on-device debugging, so the architecture is arm64) you can clearly see that the call comes from the start function in dyld's dyldbootstrap file. Without further delay, open the dyld source code and find the start function in the dyldbootstrap file. The dyld source code is available for download.

3. Check the start function in dyldbootstrap

//
// This is code to bootstrap dyld. This work in normally done for a program by dyld and crt.
// In dyld we have to do this manually.
//
uintptr_t start(const struct macho_header* appsMachHeader, int argc, const char* argv[],
                intptr_t slide, const struct macho_header* dyldsMachHeader,
                uintptr_t* startGlue)
{
    // if kernel had to slide dyld, we need to fix up load sensitive locations
    // we have to do this before using any global variables
    // slide: ASLR, the address offset
    slide = slideOfMainExecutable(dyldsMachHeader);
    bool shouldRebase = slide != 0;
#if __has_feature(ptrauth_calls)
    shouldRebase = true;
#endif
    if (shouldRebase) {
        // rebase (relocate) dyld itself
        rebaseDyld(dyldsMachHeader, slide);
    }

    // allow dyld to use mach messaging
    mach_init();

    // kernel sets up env pointer to be just past end of agv array
    const char** envp = &argv[argc+1];

    // kernel sets up apple pointer to be just past end of envp array
    const char** apple = envp;
    while (*apple != NULL) { ++apple; }
    ++apple;

    // set up random value for stack canary
    __guard_setup(apple);

#if DYLD_INITIALIZER_SUPPORT
    // run all C++ initializers inside dyld
    runDyldInitializers(dyldsMachHeader, slide, argc, argv, envp, apple);
#endif

    // now that we are done bootstrapping dyld, call dyld's main
    uintptr_t appsSlide = slideOfMainExecutable(appsMachHeader);
    return dyld::_main(appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}

dyld works out the address offset at which the MachO sits in memory; this is what ASLR, the memory offset (slide), refers to. Finally, the start function calls a main function (not our app's main function, but dyld's own) and returns its result. Again, we can't just scratch the surface, we have to get in there!
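The arithmetic behind the slide is small enough to write down. As a simplified sketch in C (not dyld's actual code): the slide is the difference between where the image really ended up in memory and the address its __TEXT segment asked for in the file, and every address recorded in the file is shifted by that amount at run time. The function names and the sample addresses below are invented for illustration.

```c
#include <stdint.h>

/* slide = actual load address of the image minus the vmaddr that the
 * __TEXT segment requested in the file. */
uint64_t demo_slide(uint64_t load_address, uint64_t text_vmaddr) {
    return load_address - text_vmaddr;
}

/* An address as recorded in the MachO file, translated to its run-time address. */
uint64_t demo_runtime_address(uint64_t file_vmaddr, uint64_t slide) {
    return file_vmaddr + slide;
}
```

For example, if __TEXT asks for 0x100000000 and the image is loaded at 0x104a28000, the slide is 0x4a28000, and a function recorded at 0x100006f40 in the file is found at 0x104a2ef40 in the running process. This is also why addresses read from MachOView have to be adjusted by the slide before they can be used in a debugger.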

4. View the main function in dyld

This function is awesome: as shown below, it is nearly 500 lines long!

We grab the key code and step through what dyld does for us before main.

1. Configure environment variables

Everything from the beginning of main up to the function getHostInfo() is configuration of environment variables plus some thread-related setup. The content involved is too low-level, so it is not analyzed here (to be honest, I'm not up to it 😆).

There are many if checks in this step. Each of them corresponds to an environment variable that can be configured in Xcode to trigger the corresponding behavior (such as printing log information).

2. Load the shared cache library

On iOS, the dynamic libraries that each program depends on have to be loaded into memory one by one by dyld (located at /usr/lib/dyld). If every program loaded them again from scratch, startup would inevitably be slow. To speed up startup and improve performance, the shared cache mechanism was created: all the default dynamic libraries are merged into one large cache file, stored in the /System/Library/Caches/com.apple.dyld/ directory, with a separate file per architecture. It contains basic libraries such as UIKit and Foundation.

In the source code you can see that on iOS the shared cache is loaded explicitly.

Because of this mechanism, iOS saves time and memory when loading these basic libraries!

But the shared cache mechanism also has a side effect: the C functions in the shared cache, that is, the system C functions, are not quite so static after all, and take on a bit of the dynamic nature of the OC runtime!

This will be highlighted in the next article! Look at Runtime from a different angle!

3. Instantiate the main program

Loading the main program simply means loading a series of entries from the Load Commands part of the MachO file! We keep following the code, as shown in the following 6 pictures:

Addendum: after instantiation, addImage(image) is called to add the image to sAllImages, the list of all images. The main program is always the first object in sAllImages!

As the source code shows, loading the main program is actually quite simple: parts of the MachO file are read into memory step by step.

From the last picture, we can know:

  • The maximum number of segments is 256!
  • The maximum number of dynamic libraries (system and custom ones combined) is 4096!

4. Load the dynamic link library

Dynamic libraries such as Xcode's ViewDebug and MainThreadChecker are loaded in this step, and the libraries we inject later are added this way too!

5. Link the main program

The link function performs some lower-level operations on the images loaded earlier (images here means binary images, not pictures). Apple has not fully open-sourced this part, so only some of the source code can be seen. If you want to check it out:

6. The load method and C++ constructor functions

Both the earlier breakpoint in the load method and our step-by-step reading of the source code show that dyld's initializeMainExecutable is the entry point that leads to load:

After a series of jumps, dyld's notifySingle function ends up in the call_load_methods function in the objc source code.

So what is the process in between? Check out the GIF below:

In the end we arrive at the function _dyld_objc_notify_register, but no call to it can be found anywhere in dyld. In fact, this function is not meant to be called by dyld itself; it is provided for external callers. So how do we find out who calls _dyld_objc_notify_register?

Continue with the previous Demo1 and add a symbolic breakpoint on _dyld_objc_notify_register to the project.

Run the project and check the call stack again once it stops: you can see very clearly that _objc_init calls _dyld_objc_notify_register.

Likewise, open the objc source code (click to download the objc source code) and quickly locate where _dyld_objc_notify_register is called, as shown in the figure:

That is how dyld ends up calling our load methods. Careful readers may have noticed that notifySingle is immediately followed by a function called doInitialization; this is where the system calls C++ constructor functions.
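The relationship described above is a plain callback registration: the runtime hands dyld a function pointer during its own initialization, and dyld calls it back each time an image is initialized. The toy model below, in C, shows only that shape. Every name starting with toy_ is invented for this sketch; the real _dyld_objc_notify_register takes three callbacks (mapped, init, unmapped), and the real callbacks do far more than count.

```c
#include <stddef.h>

/* --- "dyld" side of the toy model --- */
typedef void (*toy_init_callback)(const char *image_path);
static toy_init_callback g_init_callback = NULL;

/* Stand-in for _dyld_objc_notify_register: just remembers the callback. */
void toy_notify_register(toy_init_callback callback) {
    g_init_callback = callback;
}

/* Stand-in for notifySingle: forwards the event if anyone registered. */
void toy_notify_single(const char *image_path) {
    if (g_init_callback != NULL) g_init_callback(image_path);
}

/* --- "objc runtime" side of the toy model --- */
static int g_load_calls = 0;

/* Stand-in for load_images, which would go on to call_load_methods. */
static void toy_load_images(const char *image_path) {
    (void)image_path;
    g_load_calls++;
}

/* Stand-in for _objc_init: the runtime registers itself with dyld. */
void toy_objc_init(void) {
    toy_notify_register(toy_load_images);
}

int toy_load_calls(void) {
    return g_load_calls;
}
```

In this model, calling toy_notify_single before toy_objc_init does nothing, which mirrors why the symbolic breakpoint shows the runtime registering first and the load methods being called only afterwards.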

A C++ constructor function is written as follows:

#include <stdio.h>

// Runs automatically before main()
__attribute__((constructor)) void CPFunc(){
    printf("C++Func1");
}

If you are interested, try it yourself and look for the corresponding function in the MachO file! Of course, it is also included in Demo1.
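A small way to convince yourself that such a function really runs before main is to have it leave a trace. The C sketch below does that with a counter. The names demo_before_main and demo_constructor_calls are made up for this example, and the attribute is a GCC/Clang extension rather than standard C.

```c
static int g_constructor_calls = 0;

/* The loader calls this before main() starts; no code calls it by name. */
__attribute__((constructor))
static void demo_before_main(void) {
    g_constructor_calls++;
}

/* Reports how many times the constructor function has run so far. */
int demo_constructor_calls(void) {
    return g_constructor_calls;
}
```

Calling demo_constructor_calls() as the first statement of main returns 1, even though nothing in main ever invoked demo_before_main. In a MachO built this way, the pointer to the function is typically recorded in a section named __mod_init_func, which is the place to look in MachOView.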

7. Find the main function of APP and call it

After the load methods and C++ constructor functions above have been called, control returns to dyld's main function, which finds the main function of the app and calls it.
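Finding the app's main function ties back to the LC_MAIN entry in the Load Commands table: the command stores the offset of the entry point, and the address to call is the load address of the MachO header plus that offset. The C sketch below shows this under the assumption of the entry_point_command layout from <mach-o/loader.h> (cmd, cmdsize, a 64-bit entryoff, a 64-bit stacksize). The function name demo_entry_address and the sample addresses are invented for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_LC_MAIN 0x80000028u  /* LC_MAIN in <mach-o/loader.h> */

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t read_le64(const uint8_t *p) {
    uint64_t value = 0;
    for (int i = 7; i >= 0; i--) value = (value << 8) | p[i];
    return value;
}

/* Given one LC_MAIN command and the address at which the MachO header was
 * loaded, stores the run-time address of main in `entry`.
 * Returns 1 on success, 0 if `lc` is not an LC_MAIN command. */
int demo_entry_address(const uint8_t *lc, size_t len,
                       uint64_t header_address, uint64_t *entry) {
    if (len < 24 || read_le32(lc) != DEMO_LC_MAIN) return 0;
    uint64_t entryoff = read_le64(lc + 8);   /* offset of main from the header */
    *entry = header_address + entryoff;
    return 1;
}
```

With a header loaded at 0x104a28000 and an entryoff of 0x6f40, the entry address comes out as 0x104a2ef40.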

With that, the main flow of dyld's main function is complete. Of course, these 7 steps are only the main line; there are many other steps along the way, and the process is very complicated, so I won't go through them one by one. You can read the dyld source code yourself.

IV. Summary

This article gave an overview of MachO and its file structure, followed LC_LOAD_DYLINKER in the Load Commands to dyld, and then analyzed the app startup process based on the dyld source code. The steps are: 1. Configure environment variables. 2. Load the shared cache. 3. Instantiate the main program. 4. Load the dynamic link library. 5. Link the main program. 6. Call the load methods and C++ constructor functions. 7. Find the main function of the app and call it. In addition, the LC_LOAD_DYLIB load command (load dynamic link library), which dyld honors, opens up endless possibilities for injecting code in reverse engineering. MachO also contains symbol tables that give the system a way to look up the corresponding method names; these will be covered in more detail in the next article.

V. References

2. iOS App Reverse Engineering, by Sha Zishe and Wu Hang, China Machine Press