Green trees cast deep shade through the long summer day; the pavilion's reflection falls into the pond. - Gao Pian (Tang dynasty), Summer Day at the Mountain Pavilion

Mach-O files and process images

On iOS, executables and dynamic libraries are stored in the Mach-O format. The file holds the program's code and data. When the program runs, the system creates a process for it, allocates a virtual address space, and loads the contents of the program file into that address space, generally by memory-mapping the file. An image is the contents of a program file as loaded into the virtual memory of a process; in other words, the process image is the in-memory copy of the program's file on disk. Generally speaking, the contents and layout of an image in memory match the contents and layout of the program file. The start address of an image is a pointer to a struct mach_header, and both the image and the program file are laid out in segments. However, there are cases where the contents and layout of the image differ from those of the program file:

  1. The data segment of the image. Data segments are mostly readable and writable, so they can be modified at run time, and some of their contents are adjusted by rebasing. Data segments therefore cannot be shared between processes; each process keeps its own copy. For efficiency, the system uses copy-on-write to create these per-process copies. Usually only the immutable code portions are shared by multiple processes and stay identical to the file contents. A common example is the code segments of the dynamic libraries and frameworks loaded in a process, which are often shared by all processes.

  2. Even a code segment can differ between the image and the program file. In some cases, the contents of certain segments in the image come from a system cache rather than from the corresponding segment in the program file. A good example is the CoreLocation library: when it is loaded, you will find that some code segments in the image are actually the contents of the system cache rather than the contents of the program file.

So there is no one-to-one correspondence between the program file and the memory image of the program after it is loaded. The relationship between program files and images is the same as the relationship between programs and processes. Access to all of the Mach-O data structures in the process after the program runs is image-based rather than file-based.

Slide mechanism

When a program is built, it is assigned a default base address at which it expects to be loaded in memory, which simplifies address computation. All values in the program that store addresses are therefore relative to this base address. For example, a variable in our code may hold the address of a function, and in the Objective-C runtime the imp member of struct method_t holds the address of a function. If the program is loaded into virtual memory at exactly the base address it specifies, everything works and nothing needs to change. But reality is different:

  1. Every library or executable specifies a base address at build time, but nothing guarantees that this base address is unique or that the address ranges of different images do not overlap. Images could therefore collide when multiple libraries are loaded into memory.
  2. iOS uses a technique called **ASLR (Address Space Layout Randomization)** to improve application security. It makes the base address at which each program or library is loaded random on every run rather than fixed, which makes attacks harder.

In both cases, the actual base address at which a program or library is loaded differs from the base address specified at build time. The system chooses non-overlapping regions for the executable and each library. However, every pointer in the program that was computed from the build-time base address would then be invalid, because those address values do not correspond to where the code and data actually are in memory.

To fix this, the linker adds a special load command to each program or library it builds: LC_DYLD_INFO or LC_DYLD_INFO_ONLY. It records every location whose address needs adjusting. When the program is loaded into memory, the loader adjusts each of these locations so that it holds a real memory address. This process is called rebasing.

Suppose the program is built with base address A, a function pointer stored somewhere in the program holds the address X, and the program is actually loaded at base address B. The offset between the actual base address and the build-time base address is B - A. We call this offset the slide value. The stored address X is therefore adjusted to X + (B - A).

The build-time base address of a program can be read from the vmaddr member of the struct segment_command that describes the program's first code segment, named __TEXT. The pointer to the struct mach_header at the start of the loaded image is the actual base address of the image, so:

Slide value of the image = pointer to the image's mach_header structure - value of the vmaddr member of the struct segment_command describing the image's first __TEXT segment.
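
The arithmetic above can be sketched in a few lines of C. This is only an illustration, not system code: the function names compute_slide and rebase_address are made up for the example, and 64-bit integers stand in for the pointers so that the sketch compiles anywhere.

```c
#include <stdint.h>

/* Slide = actual load address of the image (its mach_header pointer)
 * minus the build-time base address (vmaddr of the __TEXT segment). */
int64_t compute_slide(uint64_t actual_base, uint64_t build_base)
{
    return (int64_t)(actual_base - build_base);
}

/* Rebase one build-time address X into its run-time address X + (B - A). */
uint64_t rebase_address(uint64_t build_time_address, int64_t slide)
{
    return build_time_address + (uint64_t)slide;
}
```

For instance, with a build-time base A = 0x100000000 and an actual base B = 0x104a3c000, the slide is 0x4a3c000, and a stored pointer X = 0x100004f20 becomes 0x104a40f20.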

Of course, the system also provides APIs to get the slide value of an executable or library image. These are covered below.

Segment and Section

A Mach-O file contains a number of load commands, each describing one kind of data. For example, some load commands hold program code and global variable data, some hold the symbol table, and some hold code-signing information. Every load command begins with the fields of struct load_command, whose cmd field identifies the type of the load command.
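
To make the load-command layout concrete, here is a minimal sketch of walking the commands that follow a 64-bit header. So that it compiles off Apple platforms, it declares simplified stand-in structures (demo_header_64 and demo_load_command) that mirror only the fields used here; on macOS or iOS you would include <mach-o/loader.h> and use struct mach_header_64 and struct load_command instead. The function name count_load_commands is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for struct mach_header_64 and struct load_command
 * from <mach-o/loader.h>; same field order, for illustration only. */
struct demo_header_64 {
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds;       /* number of load commands */
    uint32_t sizeofcmds;  /* total size of all load commands in bytes */
    uint32_t flags, reserved;
};

struct demo_load_command {
    uint32_t cmd;      /* type of load command, e.g. LC_SEGMENT_64 */
    uint32_t cmdsize;  /* total size of this command, including its payload */
};

#define DEMO_LC_SEGMENT_64 0x19u  /* value of LC_SEGMENT_64 in loader.h */

/* Count the load commands of type `wanted` in an image that starts at
 * `image` and is `image_size` bytes long. Returns -1 if it is malformed. */
int count_load_commands(const void *image, size_t image_size, uint32_t wanted)
{
    struct demo_header_64 hdr;
    if (image_size < sizeof hdr)
        return -1;
    memcpy(&hdr, image, sizeof hdr);

    const unsigned char *p = (const unsigned char *)image + sizeof hdr;
    size_t remaining = image_size - sizeof hdr;
    int count = 0;

    for (uint32_t i = 0; i < hdr.ncmds; i++) {
        struct demo_load_command lc;
        if (remaining < sizeof lc)
            return -1;
        memcpy(&lc, p, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > remaining)
            return -1;
        if (lc.cmd == wanted)
            count++;
        /* Commands vary in size, so step by cmdsize, not by sizeof lc. */
        p += lc.cmdsize;
        remaining -= lc.cmdsize;
    }
    return count;
}
```

The key design point is that the walk advances by each command's cmdsize: a segment command, for example, is followed by its section descriptions, so stepping by a fixed structure size would land in the middle of a command.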

Load commands of type LC_SEGMENT or LC_SEGMENT_64 are called segments. The code and global variable data of an executable are stored in segments. A segment is described by a struct segment_command. A program can have many segments, each with a unique name. For example, all code in an executable is stored in a code segment named __TEXT, and all data is stored in a data segment named __DATA. Segments are aligned to page boundaries.

Each segment consists of several sections. A section is the smallest unit for classifying content. Each section is described by a struct section and has a unique name that identifies it. For example, the section named __text holds the machine instructions generated from the source code written by the user, and the section named __stub_helper holds the stub code for all external functions that are called. The following diagram shows the layout of segments and sections in an application:

Process Image manipulation API

The APIs that operate on images are all declared in <mach-o/dyld.h>. Import this header to use the functions it declares. I'm going to go through each of these functions.

1. Get the number of images loaded in the current process
uint32_t _dyld_image_count(void)

2. Get the pointer to the Mach-O header structure of an image
const struct mach_header* _dyld_get_image_header(uint32_t image_index)

The function takes the index of an image in the process and returns a pointer to the image's Mach-O header; on 64-bit systems the pointer actually refers to a struct mach_header_64. Through the header returned by this function you can traverse and access all the information and data in the image.

The pointer to an image's header structure is actually the base address at which the image is loaded in memory.

In general, the image at index 0 is the dyld image and the image at index 1 is the executable image of the current process. This ordering is an observation rather than a documented guarantee, so confirm it with _dyld_get_image_name (described below) instead of relying on a fixed index.

The system also provides a function that is not declared in the header file:

const struct mach_header* _NSGetMachExecuteHeader()

This function returns a pointer to the header structure of the executable image of the current process. Since this function is not declared in a specific header file, you need to declare it at the beginning of the source file when you use it:

 extern const struct mach_header* _NSGetMachExecuteHeader();
3. Get the Slide value loaded by an image in the process
intptr_t   _dyld_get_image_vmaddr_slide(uint32_t image_index) 

The function takes the index of an image in the process and returns the slide value with which that image was loaded. Slide values are described in detail above. Add this value to any address field in the image's Mach-O description structures to get the real memory address.

4. Obtain the name of an image in the process
const char*  _dyld_get_image_name(uint32_t image_index)

The input parameter of the function is the index number of the image in the process. The return value of the function is the full path name of the library corresponding to the image. The returned string cannot be modified or destroyed.

5. Register the callback notification functions for image load and unload
void _dyld_register_func_for_add_image(void (*func)(const struct mach_header* mh, intptr_t vmaddr_slide))
void _dyld_register_func_for_remove_image(void (*func)(const struct mach_header* mh, intptr_t vmaddr_slide))

If you register an image-load callback with _dyld_register_func_for_add_image, the callback is called every time a new image has been loaded but not yet initialized. The callback's two parameters are the header structure of the loaded image and its slide value. If some images are already loaded when _dyld_register_func_for_add_image is called, the callback is called once for each of them.

If you register an image-unload callback with _dyld_register_func_for_remove_image, the callback is called every time an image is unloaded. The callback's two parameters are the header structure of the unloaded image and its slide value.

These two functions are usually used to monitor which images an application loads and to collect statistics.

6. Get the link-time and runtime version numbers of a library
// Get the runtime version number of a library
int32_t NSVersionOfRunTimeLibrary(const char* libraryName)
// Get the link-time version number of a library
int32_t NSVersionOfLinkTimeLibrary(const char* libraryName)

When we link system dynamic libraries in an Xcode project, we sometimes select a specific version of a library, but some operating system versions may not provide that version, so the version loaded at run time differs from the version specified at link time. Another scenario is that the project does not link a dynamic library directly, but the library is still loaded because other libraries link against it. The system therefore provides these two APIs to get the link-time and runtime version numbers of a dynamic library. Both functions take the name of the dynamic library: the file name without path, without extension, and without the lib prefix. They return the version number of the library, or -1 if the library does not exist, is not loaded, or is not linked. For example:

// The name "c++" refers to the library libc++.dylib.
int32_t v1 = NSVersionOfRunTimeLibrary("c++");
int32_t v2 = NSVersionOfLinkTimeLibrary("c++");

The latter function returns -1 if our program does not explicitly link libc++.dylib. The former usually returns the version number of the loaded libc++.

These two functions are used for library analysis and runtime monitoring, such as checking whether a library was loaded dynamically at run time rather than linked in.
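
Dylib version numbers are packed as X.Y.Z into 32 bits, with 16 bits for X and 8 bits each for Y and Z, so the value returned by these functions is easier to read once unpacked. The following is a small sketch of that unpacking; the structure and function names are made up for illustration, and -1 is treated as "no version available", as described above.

```c
#include <stdint.h>

struct lib_version {
    unsigned major;  /* X: upper 16 bits */
    unsigned minor;  /* Y: next 8 bits   */
    unsigned patch;  /* Z: lowest 8 bits */
};

/* Unpack a value such as the one returned by NSVersionOfRunTimeLibrary.
 * Returns 1 and fills *out, or returns 0 if `packed` is -1. */
int unpack_library_version(int32_t packed, struct lib_version *out)
{
    if (packed == -1)
        return 0;
    uint32_t v = (uint32_t)packed;
    out->major = v >> 16;
    out->minor = (v >> 8) & 0xffu;
    out->patch = v & 0xffu;
    return 1;
}
```

On an Apple platform you could pass the result of NSVersionOfRunTimeLibrary("c++") straight to unpack_library_version and print the three fields.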

7. Get the path of the executable of the current process
int _NSGetExecutablePath(char* buf, uint32_t* bufsize)

buf is the buffer that receives the path of the executable, and bufsize points to the size of that buffer. The function returns 0 when the path was copied successfully. It returns -1 when the buffer is too small, in which case *bufsize is set to the size required. For example:

char buf[256];
uint32_t bufsize = sizeof(buf);
if (_NSGetExecutablePath(buf, &bufsize) != 0) {
    // buf was too small; bufsize now holds the required size.
}

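
Because a fixed 256-byte buffer can be too small, one option is to retry with the size the function reports. The sketch below shows that pattern under some assumptions: copy_executable_path and the path_getter_fn type are hypothetical helpers, and fake_get_executable_path is a stand-in with the same contract as _NSGetExecutablePath so that the sketch can be exercised off Apple platforms.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same shape as _NSGetExecutablePath: returns 0 on success, or -1 and
 * stores the required size in *bufsize when buf is too small. */
typedef int (*path_getter_fn)(char *buf, uint32_t *bufsize);

/* Try a small buffer first, then retry once with the size requested.
 * Returns a malloc'ed string that the caller must free, or NULL. */
char *copy_executable_path(path_getter_fn getter)
{
    uint32_t size = 64;
    char *buf = malloc(size);
    if (buf == NULL)
        return NULL;

    if (getter(buf, &size) != 0) {
        /* size now holds the number of bytes needed. */
        char *bigger = realloc(buf, size);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        if (getter(buf, &size) != 0) {
            free(buf);
            return NULL;
        }
    }
    return buf;
}

/* Hypothetical stand-in for _NSGetExecutablePath with a made-up path. */
int fake_get_executable_path(char *buf, uint32_t *bufsize)
{
    static const char path[] =
        "/private/var/containers/Bundle/Application/DEMO-UUID-PLACEHOLDER/Demo.app/Demo";
    if (*bufsize < sizeof path) {
        *bufsize = (uint32_t)sizeof path;
        return -1;
    }
    memcpy(buf, path, sizeof path);
    return 0;
}
```

On macOS or iOS the call would be copy_executable_path(_NSGetExecutablePath) after including <mach-o/dyld.h>.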
8. Register the callback function when the current thread terminates
void _tlv_atexit(void (*termFunc)(void* objAddr), void* objAddr)

Sometimes we want to know when a thread ends, and this function can be used for that. It monitors the current thread and calls the registered callback when the thread terminates. _tlv_atexit takes two arguments: the first is a pointer to the callback function, and the second is an extra argument that is passed to the callback as its parameter.

I don't understand why this function is declared in <mach-o/dyld.h>; it seems completely out of place!

Segment and Section manipulation APIs

The APIs that operate on segments and sections are declared in <mach-o/getsect.h>. Import this header to use the functions it declares. Of course, if you know the Mach-O file format, you can skip these APIs and traverse the segments and sections directly from the image's struct mach_header. However, since the system already provides these APIs, it is best to prefer them. I'm going to go through each of these functions.

The segment and section manipulation APIs are implemented in the libmacho.dylib library, which is not open source yet.

1. Get the unslid data pointer and size of a section in a segment of an in-process image
// Get the data pointer and size of a section in a segment of the executable image in the process.
char *getsectdata(const char *segname, const char *sectname, unsigned long *size)
// Get the data pointer and size of section sectname in segment segname of a library loaded by the process.
char *getsectdatafromFramework(const char *FrameworkName, const char *segname, const char *sectname, unsigned long *size);

These two functions return the data pointer and size of a section in a segment of the executable image, or of a loaded dynamic library, in the process. The values come from the addr and size members of the struct section. Note that the returned address does not include the slide value, so add the image's slide value to it before accessing the real address in the process. Here is some example code:

intptr_t slide = _dyld_get_image_vmaddr_slide(1);
unsigned long size = 0;
char *paddr = getsectdata("__TEXT", "__text", &size);
char *prealaddr = paddr + slide;  // This is the real address to access.

The code for getsectdata is as follows:

char *getsectdata(const char *segname, const char *sectname, unsigned long *size)
{
    const struct mach_header_64 *mhp = (const struct mach_header_64 *)_NSGetMachExecuteHeader();
    // getsectdatafromheader_64 is described below.
    return getsectdatafromheader_64(mhp, segname, sectname, size);
}

I personally do not recommend using this function; prefer the getsectiondata function described below.

2. Get the boundary information of segments and sections
// Get the first address after the last segment of the current process's executable image.
unsigned long get_end(void);
// Get the first address after the __text section of the first __TEXT segment of the current process's executable image.
unsigned long get_etext(void);
// Get the first address after the __data section of the first __DATA segment of the current process's executable image.
unsigned long get_edata(void);

These functions return the end of a specified segment or section and can be used to determine whether an address lies within a given boundary. Note that the boundary values they return do not include the slide value. These functions are implemented internally roughly as follows:

unsigned long get_end()
{
   unsigned long end = 0;
   const struct mach_header_64 *mhp = (const struct mach_header_64 *)_NSGetMachExecuteHeader();
   // Load commands start right after the header and vary in size,
   // so advance by each command's cmdsize.
   const struct load_command *plc = (const struct load_command *)(mhp + 1);
   for (uint32_t i = 0; i < mhp->ncmds; i++)
   {
       if (plc->cmd == LC_SEGMENT_64)
       {
           const struct segment_command_64 *psegcmd = (const struct segment_command_64 *)plc;
           end = psegcmd->vmaddr + psegcmd->vmsize;
       }
       plc = (const struct load_command *)((const char *)plc + plc->cmdsize);
   }
   return end;
}

unsigned long get_etext()
{
   const struct section_64 *psection = getsectbyname("__TEXT", "__text");
   return psection->addr + psection->size;
}

unsigned long get_edata()
{
   const struct section_64 *psection = getsectbyname("__DATA", "__data");
   return psection->addr + psection->size;
}

3. Get the description of a segment of the executable image in the process
const struct segment_command *getsegbyname(const char *segname)
// 64-bit version of the above function
const struct segment_command_64 *getsegbyname(const char *segname)

These two functions return the description of a segment of the executable image in the process, on 32-bit and 64-bit systems respectively. The description is a struct segment_command or struct segment_command_64.

For example, the following code returns the description of the __TEXT segment of the executable image in the process.

const struct segment_command_64 *psegment = getsegbyname("__TEXT");

4. Get the description of a section in a segment of the executable image in the process
// Get the description of a section in a segment of the executable image in the process.
const struct section *getsectbyname(const char *segname, const char *sectname)
// 64-bit version of the above function
const struct section_64 *getsectbyname(const char *segname, const char *sectname)

These two functions return the description of the section sectname in the segment segname of the executable image in the process, on 32-bit and 64-bit systems respectively. The description is a struct section or struct section_64. For example, the following code returns the description of the __text section in the __TEXT segment:

  const struct section_64 *psection = getsectbyname("__TEXT", "__text");

5. Get the data of a segment of an image in the process
// Get the data of the specified segment of the specified image.
uint8_t *getsegmentdata(const struct mach_header *mhp, const char *segname, unsigned long *size)
// 64-bit version of the above function
uint8_t *getsegmentdata(const struct mach_header_64 *mhp, const char *segname, unsigned long *size)

The function returns a pointer to the contents of the segment named segname in the image mhp, and stores the size of the whole segment in the variable that size points to. Internally, it returns the vmaddr member of the segment's struct segment_command plus the slide value of the image mhp, and sets *size to the vmsize member of that structure.

As mentioned earlier, because an image is loaded with a slide, the slide value must be added to the address-related members of the Mach-O structures in the image to get the actual addresses in memory.

The data address of the first __TEXT segment of every image in the process is actually the address of the image's mach_header structure. This is a special case.

The following code gets the data of the __DATA segment of the image at index 0 in the process.

const struct mach_header_64 *mhp = (const struct mach_header_64 *)_dyld_get_image_header(0);
unsigned long size = 0;
uint8_t *pdata = getsegmentdata(mhp, "__DATA", &size);

6. Get the data of a section in a segment of an image in the process
// Get the data address and size of a section in a segment of an image in the process.
uint8_t *getsectiondata(const struct mach_header *mhp, const char *segname, const char *sectname, unsigned long *size)
// 64-bit version of the above function
uint8_t *getsectiondata(const struct mach_header_64 *mhp, const char *segname, const char *sectname, unsigned long *size)

The function returns a pointer to the contents of the section sectname in the segment segname of the image mhp, and stores the size of the whole section in the variable that size points to. Internally, it returns the addr member of the section's struct section plus the slide value of the image mhp, and sets *size to the size member of that section description structure.

The following example gets the data address and size of the __text section of the __TEXT segment of the image at index 0 in the process:

const struct mach_header_64 *mhp = (const struct mach_header_64 *)_dyld_get_image_header(0);
unsigned long size = 0;
uint8_t *pdata = getsectiondata(mhp, "__TEXT", "__text", &size);

7. Get the description of a section in a segment of a Mach-O file
// Get the description of a section in a segment of the specified Mach-O file.
const struct section *getsectbynamefromheader(const struct mach_header *mhp, const char *segname, const char *sectname)
// Get the description of a section in a segment of the specified Mach-O file. fSwap takes an NXByteOrder enumeration value.
const struct section *getsectbynamefromheaderwithswap(struct mach_header *mhp, const char *segname, const char *sectname, int fSwap)
// 64-bit versions of the above functions
const struct section_64 *getsectbynamefromheader_64(const struct mach_header_64 *mhp, const char *segname, const char *sectname)
const struct section *getsectbynamefromheaderwithswap_64(struct mach_header_64 *mhp, const char *segname, const char *sectname, int fSwap)

These functions return the description of a section in a Mach-O file, for 32-bit and 64-bit systems respectively. segname and sectname specify the segment name and the section name to look up, and mhp is a pointer to the header structure of the Mach-O file. Some systems or Mach-O files store values in big-endian byte order, so for those structures you pass fSwap to indicate whether the bytes need to be swapped.

The mhp parameter of these functions is not limited to the header of an image in the process; it can also be the header of a Mach-O file. If you don't know the difference between images and files, see the introduction at the beginning of this article.

Because the sections of an image in the process are arranged the same way as the sections in the Mach-O file, getsectbyname above is implemented with the help of the functions in this section, as follows:

const struct section_64 *getsectbyname(
    const char *segname,
    const char *sectname)
{
   const struct mach_header_64 *mhp = (const struct mach_header_64 *)_NSGetMachExecuteHeader();
   return getsectbynamefromheader_64(mhp, segname, sectname);
}

8. Get the data pointer and size of a section in a segment of a Mach-O file
char *getsectdatafromheader(const struct mach_header *mhp, const char *segname, const char *sectname, uint32_t *size)
// 64-bit version of the above function
char *getsectdatafromheader_64(const struct mach_header_64 *mhp, const char *segname, const char *sectname, uint64_t *size)

These two functions return the data pointer and size of a section in a segment of a Mach-O file, on 32-bit and 64-bit systems respectively. The values are the addr and size members of the section's struct section. Although these functions are designed for Mach-O files, they can also be applied to a loaded library image. In that case, remember to add the image's slide value to the returned address to get the real address of the section data!

A very useful DEMO

iOS offers a piece of black magic called method swizzling, which can replace the default implementation of a method of a class at run time. However, the technique cuts both ways. On a jailbroken system, a malicious developer can inject a dynamic library and use method swizzling to change the program's original logic, bypassing routine checks and gaining illegitimate benefits.

Every attack has a defense. With the APIs introduced in this article, we can detect to some extent whether a method of a class has been hooked illegitimately. Take an instance method of a class defined in the executable. Its implementation address always lies within the address range of the executable image. Even if the method is hooked by another method in the executable, the hooking method's address is still within the executable image, and we consider this a legitimate hook. If instead an attacker injects a dynamic library and swizzles the method, the hooking method's address lies within the address range of the injected library's image. We can therefore tell whether a method has been maliciously hooked by checking whether its implementation address lies within the address range of the executable image. The code below implements this check. It is best to implement the check as a C function rather than an Objective-C method; otherwise the check itself could be hooked.

// Author: Ouyang Da
#import <Foundation/Foundation.h>
#import <mach-o/dyld.h>
#import <mach-o/getsect.h>

// Not declared in the headers above, so declare it here.
extern const struct mach_header* _NSGetMachExecuteHeader();

BOOL checkMethodBeHooked(Class class, SEL selector)
{
    // You can also use the C runtime functions to get the method implementation address.
    IMP imp = [class instanceMethodForSelector:selector];
    if (imp == NULL)
         return NO;

    // Calculate the slide value of the executable program.
    intptr_t pmh = (intptr_t)_NSGetMachExecuteHeader();
#ifdef __LP64__
    const struct segment_command_64 *psegment = getsegbyname("__TEXT");
#else
    const struct segment_command *psegment = getsegbyname("__TEXT");
#endif
    intptr_t slide = pmh - psegment->vmaddr;

    unsigned long startpos = (unsigned long)pmh;
    unsigned long endpos = get_end() + slide;  // first address after the image
    unsigned long imppos = (unsigned long)imp;

    // Hooked if the implementation lies outside [startpos, endpos).
    return (imppos < startpos) || (imppos >= endpos);
}
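
The last step of checkMethodBeHooked is a plain range test, which is easy to get wrong at the boundaries. The sketch below isolates it in portable C with hypothetical function names, using 64-bit integers in place of pointers. It assumes the image occupies the half-open range [start, end), since get_end() yields the first address after the last segment.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when `address` lies inside the half-open range [start, end). */
bool address_in_image(uint64_t address, uint64_t start, uint64_t end)
{
    return address >= start && address < end;
}

/* A method implementation is treated as suspicious when its address lies
 * outside the executable image, mirroring the check in the demo above. */
bool implementation_looks_hooked(uint64_t imp, uint64_t image_start, uint64_t image_end)
{
    return !address_in_image(imp, image_start, image_end);
}
```

Keeping the comparison in one small function makes it simple to reuse the same test for other images, for example to allow implementations that live in a known set of trusted libraries.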
