This is the second article in the Interesting Mach-O series. It builds on the first article, Interesting Mach-O: File Format Analysis.
We all know that Mach-O is the executable file format on OS X, and where there are executables there must be processes. On Linux, we call fork() to create a new child process and then call exec() in the child to replace its image with another executable. Why do it this way?
Reason: this follows from how the operating system works. A process can create a child with the fork() system call. The child gets a separate copy of the parent's address space, including the text, data and BSS segments, the heap, and the user stack, but it contains only a copy of the thread that called fork(). All other threads of the parent process simply "evaporate" in the child.
Threading usually involves locks, and every lock has a holder (the thread that most recently locked it). fork() copies the lock objects into the child along with the rest of the address space, but the child contains only the thread that called fork(), and quite possibly not the thread that holds a given lock. In that case nothing in the child can ever unlock it, which leads to deadlocks and memory leaks.
How to avoid the deadlock: call an exec function immediately in the child process. Once a process calls a function of the exec family, the old program "dies": the system replaces its code with the new program's code, discards the original data and stack segments, and allocates new data and stack segments for the new program. The only thing left is the process ID. In other words, to the system it is still the same process, but it is now running a different program.
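The fork-then-exec pattern described above can be sketched in a few lines of POSIX C. This is a minimal sketch, not code from XNU; the helper name spawn_and_wait and the exit code 127 for a failed exec are assumptions chosen for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, exec argv[0] right away in the child,
 * and return the child's exit status (or -1 on failure). */
int spawn_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */

    if (pid == 0) {
        /* Child: only the thread that called fork() exists here.
         * Touch no locks, just replace the image immediately. */
        execv(argv[0], argv);
        _exit(127);                /* reached only if execv failed */
    }

    /* Parent: wait for the child and report how it exited. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with an argument vector such as {"/bin/sh", "-c", "exit 7", NULL} returns the shell's exit code. The child does nothing between fork() and execv(), which is exactly what keeps it away from locks inherited from threads that no longer exist.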
To sum up, in user mode we load an executable through the exec* family of functions, and the exec* functions are just wrappers around the execve system call. Let's start with how Mach-O is loaded. Mach-O has many file types: MH_DYLIB and MH_BUNDLE files are loaded by dyld, MH_EXECUTE files are mapped by the kernel and then dynamically linked by dyld, and there are others such as MH_OBJECT. So starting a process usually takes more than the kernel loader; dyld is also needed to do the dynamic loading. Taking both the kernel side and the dyld side into account, the loading flow goes through the functions described below.
execve
This function simply calls __mac_execve() directly. For the implementation details, see the XNU source code.
__mac_execve()
bsd/kern/kern_exec.c
It initializes the data needed to load the image and performs the resource-related setup. It then calls exec_activate_image(), which does the actual image loading.
int __mac_execve(proc_t p, struct __mac_execve_args *uap, int32_t *retval)
{
    struct image_params *imgp;
    // ... initialize imgp data ...
    exec_activate_image(imgp);
}
exec_activate_image
bsd/kern/kern_exec.c
It reads the executable file into memory and selects a loading function according to the executable's type. Every image load either stops on an error or runs through to completion. In OS X, the routines that handle the individual executable formats are the image loaders registered in the execsw table.
OS X has three types of executables: Mach-O binaries, handled by exec_mach_imgact; fat binaries, handled by exec_fat_imgact; and interpreter scripts, handled by exec_shell_imgact.
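The idea of a table of format handlers can be modelled in user space as follows. This is a simplified sketch under stated assumptions: the struct image type, the handler names, and the return convention (0 for "claimed", -1 for "not my format") are hypothetical and are not the kernel's actual image_params or return codes. Only the magic numbers are the real values from the Mach-O headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Magic values as defined in <mach-o/loader.h> and <mach-o/fat.h>. */
#define MH_MAGIC     0xfeedfaceu
#define MH_MAGIC_64  0xfeedfacfu
#define FAT_MAGIC    0xcafebabeu
#define FAT_CIGAM    0xbebafecau   /* FAT_MAGIC read with swapped byte order */

/* Hypothetical stand-in for the kernel's image_params. */
struct image {
    const unsigned char *data;
    size_t               size;
};

/* Read the first 32-bit word in host byte order (0 if the file is too short). */
static uint32_t first_word(const struct image *img)
{
    uint32_t w = 0;
    if (img->size >= sizeof w)
        memcpy(&w, img->data, sizeof w);
    return w;
}

/* Each handler returns 0 if it claims the image, -1 otherwise. */
static int mach_imgact(const struct image *img)
{
    uint32_t m = first_word(img);
    return (m == MH_MAGIC || m == MH_MAGIC_64) ? 0 : -1;
}

static int fat_imgact(const struct image *img)
{
    uint32_t m = first_word(img);
    return (m == FAT_MAGIC || m == FAT_CIGAM) ? 0 : -1;
}

static int shell_imgact(const struct image *img)
{
    return (img->size >= 2 && img->data[0] == '#' && img->data[1] == '!') ? 0 : -1;
}

/* Table of handlers, tried in order. */
static const struct {
    int (*imgact)(const struct image *);
    const char *name;
} activators[] = {
    { mach_imgact,  "Mach-O binary" },
    { fat_imgact,   "fat binary" },
    { shell_imgact, "interpreter script" },
};

/* Return the name of the first handler that accepts the image, or NULL. */
const char *activate_image(const struct image *img)
{
    for (size_t i = 0; i < sizeof activators / sizeof activators[0]; i++)
        if (activators[i].imgact(img) == 0)
            return activators[i].name;
    return NULL;
}
```

The table-driven design means that supporting another format only requires one more row; the dispatch loop itself never changes.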
exec_mach_imgact
bsd/kern/kern_exec.c
It checks the Mach-O header, works out the architecture, validates imgp, and rejects files such as dylibs and bundles, which are loaded by dyld instead.
It then maps the Mach-O file into memory and calls load_machfile().
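The header check can be illustrated with a small user-space sketch. The struct below mirrors the field layout of mach_header_64 from <mach-o/loader.h>, and the constants are the real values from that header, but the function kernel_can_exec is a hypothetical simplification: it covers 64-bit headers only and does none of the architecture or imgp validation the kernel performs.

```c
#include <stdint.h>

#define MH_MAGIC_64 0xfeedfacfu

/* File types from <mach-o/loader.h>. */
#define MH_OBJECT   0x1u
#define MH_EXECUTE  0x2u
#define MH_DYLIB    0x6u
#define MH_BUNDLE   0x8u

/* Same field layout as struct mach_header_64. */
struct macho_header64 {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

/* Hypothetical check: 1 if exec may start this image directly, 0 otherwise.
 * Dylibs and bundles are refused here because dyld loads them later. */
int kernel_can_exec(const struct macho_header64 *mh)
{
    if (mh->magic != MH_MAGIC_64)
        return 0;
    return mh->filetype == MH_EXECUTE;
}
```

With this check, a header whose filetype is MH_DYLIB or MH_BUNDLE is turned away even though its magic number is valid.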
load_machfile
bsd/kern/mach_loader.c
load_machfile processes the load commands in the Mach-O file. Internally it disables execution of the data segment to guard against overflow attacks, sets up address space layout randomization (ASLR), and makes some mapping adjustments.
The actual parsing of the load commands is done by parse_machfile().
parse_machfile
bsd/kern/mach_loader.c
parse_machfile calls a different loading function depending on the type of each load_command, which is implemented internally with a switch statement.
Common commands include LC_SEGMENT_64, LC_LOAD_DYLINKER, LC_CODE_SIGNATURE, LC_UUID, and so on. For more, see the Mach-O file format, covered in the earlier article Interesting Mach-O: File Format Analysis.
The load commands are scanned in several passes. After three passes, if a dylinker_command exists, load_dylinker() is called to start the dynamic linker (dyld).
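Walking a load-command list with a switch statement looks roughly like this. It is a single-pass user-space sketch, not the kernel's multi-pass parse_machfile; the names lc_header, scan_result and scan_load_commands are hypothetical, while the LC_* values are the ones defined in <mach-o/loader.h>.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load command types from <mach-o/loader.h>. */
#define LC_LOAD_DYLINKER  0xeu
#define LC_SEGMENT_64     0x19u
#define LC_UUID           0x1bu
#define LC_CODE_SIGNATURE 0x1du

/* Every load command starts with these two fields. */
struct lc_header {
    uint32_t cmd;
    uint32_t cmdsize;
};

/* Hypothetical summary of what the scan found. */
struct scan_result {
    unsigned segments;
    int      has_dylinker;
    int      has_uuid;
    int      has_code_signature;
};

/* Walk ncmds load commands stored in cmds[0..size).
 * Returns 0 on success, -1 if a command is truncated or malformed. */
int scan_load_commands(const unsigned char *cmds, size_t size,
                       uint32_t ncmds, struct scan_result *out)
{
    size_t offset = 0;
    memset(out, 0, sizeof *out);

    for (uint32_t i = 0; i < ncmds; i++) {
        struct lc_header lc;
        if (size - offset < sizeof lc)
            return -1;                       /* header does not fit */
        memcpy(&lc, cmds + offset, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > size - offset)
            return -1;                       /* bogus command size */

        switch (lc.cmd) {
        case LC_SEGMENT_64:     out->segments++;             break;
        case LC_LOAD_DYLINKER:  out->has_dylinker = 1;       break;
        case LC_UUID:           out->has_uuid = 1;           break;
        case LC_CODE_SIGNATURE: out->has_code_signature = 1; break;
        default:                /* ignore everything else */ break;
        }
        offset += lc.cmdsize;                /* advance to the next command */
    }
    return 0;
}
```

A caller would inspect has_dylinker after the scan to decide whether the dynamic linker needs to be started, which corresponds to the load_dylinker() step above.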
Dynamic linking process
Dynamically linked libraries come in two kinds: dylibs specified statically by load commands stored in the binary (the case most blogs cover), and libraries specified dynamically through DYLD_INSERT_LIBRARIES. The discussion below takes the first point of view, libraries specified in advance in the binary; the dynamically specified case is left for later study.
Set a breakpoint in a load method and look at the call stack:
0 +[XXObject load]
1 call_class_loads()
2 call_load_methods
3 load_images
4 dyld::notifySingle(dyld_image_states, ImageLoader const*)
11 _dyld_start
_dyld_start stands out here and looks like dyld's entry point. A global search for _dyld_start in the dyld source turns up a comment, and reading on from that comment leads to:
dyld/src/dyld.cpp
The kernel loads dyld and calls _dyld_start, which in turn calls _main(). After some data initialization, _main() calls instantiateFromLoadedImage to create the ImageLoader instance for the main executable.
// instantiate ImageLoader for main executable
sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
instantiateFromLoadedImage
// The kernel maps in main executable before dyld gets control. We need to
// make an ImageLoader* for the already mapped in main executable.
static ImageLoader* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
    // try mach-o loader
    if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
        ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
        addImage(image);
        return image;
    }
    throw "main executable not a known format";
}
The code inside instantiateFromLoadedImage is easy to follow: it checks whether the Mach-O file is valid and, if so, creates an ImageLoader instance and adds it to a global array that manages all ImageLoader objects.
isCompatibleMachO compares some of the information in the Mach-O header with the current platform to decide whether the file is valid.
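A compatibility check of this kind can be sketched as follows. This is only an illustration of the idea of comparing header fields with the host, not dyld's actual isCompatibleMachO; the function is_compatible and its host_cputype parameter are hypothetical, and the real check looks at more than these two fields. The CPU type constants are the values from <mach/machine.h>.

```c
#include <stdint.h>

#define MH_MAGIC_64      0xfeedfacfu

/* CPU types from <mach/machine.h>. */
#define CPU_ARCH_ABI64   0x01000000
#define CPU_TYPE_X86_64  (7  | CPU_ARCH_ABI64)
#define CPU_TYPE_ARM64   (12 | CPU_ARCH_ABI64)

/* Same field layout as struct mach_header_64. */
struct macho_header64 {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

/* Hypothetical check: the image is usable only if it is a 64-bit Mach-O
 * built for the CPU type of the machine we are running on. */
int is_compatible(const struct macho_header64 *mh, int32_t host_cputype)
{
    return mh->magic == MH_MAGIC_64 && mh->cputype == host_cputype;
}
```

An arm64 image passed together with CPU_TYPE_X86_64 as the host type is rejected, which is the kind of mismatch that ends in the "main executable not a known format" error.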
ImageLoader
//
// ImageLoader is an abstract base class. To support loading a particular executable
// file format, you make a concrete subclass of ImageLoader.
//
// For each executable file (dynamic shared object) in use, an ImageLoader is instantiated.
//
// The ImageLoader base class does the work of linking together images, but it knows nothing
// about any particular file format.
//
//
class ImageLoader {
    // ...
};
Note that ImageLoader is an abstract base class, and one ImageLoader instance is created for every executable file (dynamic shared object) in use.
instantiateMainExecutable
For the source code, see dyld/src/ImageLoaderMachO.cpp
// create image for main executable
ImageLoader* ImageLoaderMachO::instantiateMainExecutable(const macho_header* mh, uintptr_t slide, const char* path, const LinkContext& context)
{
    bool compressed;
    unsigned int segCount;
    unsigned int libCount;
    const linkedit_data_command* codeSigCmd;
    const encryption_info_command* encryptCmd;
    sniffLoadCommands(mh, path, false, &compressed, &segCount, &libCount, context, &codeSigCmd, &encryptCmd);
    // instantiate concrete class based on content of load commands
    if ( compressed )
        return ImageLoaderMachOCompressed::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
    else
#if SUPPORT_CLASSIC_MACHO
        return ImageLoaderMachOClassic::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
#else
        throw "missing LC_DYLD_INFO load command";
#endif
}
It uses the bool variable compressed to decide which kind of instance to return.
What do these two classes do?
ImageLoaderMachOCompressed and ImageLoaderMachOClassic both inherit from ImageLoaderMachO, and ImageLoaderMachO inherits from ImageLoader.
sniffLoadCommands determines whether the Mach-O file is classic or compressed.
instantiateMainExecutable then lets ImageLoaderMachOCompressed or ImageLoaderMachOClassic do the initialization and process the load commands. The call flow is simple.
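The classic-versus-compressed decision can be sketched in C. This is a simplified illustration and not dyld's sniffLoadCommands, which also counts segments and libraries and validates much more. It assumes, consistent with the "missing LC_DYLD_INFO load command" error in the code above, that an image counts as compressed when it carries an LC_DYLD_INFO or LC_DYLD_INFO_ONLY command. The function name is_compressed is hypothetical; the constants are from <mach-o/loader.h>.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load command types from <mach-o/loader.h>. */
#define LC_REQ_DYLD        0x80000000u
#define LC_DYLD_INFO       0x22u
#define LC_DYLD_INFO_ONLY  (0x22u | LC_REQ_DYLD)

/* Every load command starts with these two fields. */
struct lc_header {
    uint32_t cmd;
    uint32_t cmdsize;
};

/* Returns 1 if the commands contain LC_DYLD_INFO(_ONLY) ("compressed"),
 * 0 if they do not ("classic"), and -1 if the list is malformed. */
int is_compressed(const unsigned char *cmds, size_t size, uint32_t ncmds)
{
    size_t offset = 0;
    int compressed = 0;

    for (uint32_t i = 0; i < ncmds; i++) {
        struct lc_header lc;
        if (size - offset < sizeof lc)
            return -1;                       /* header does not fit */
        memcpy(&lc, cmds + offset, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > size - offset)
            return -1;                       /* bogus command size */

        if (lc.cmd == LC_DYLD_INFO || lc.cmd == LC_DYLD_INFO_ONLY)
            compressed = 1;
        offset += lc.cmdsize;                /* advance to the next command */
    }
    return compressed;
}
```

The result plays the role of the compressed flag: 1 would select the ImageLoaderMachOCompressed path and 0 the ImageLoaderMachOClassic path.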
This article is fairly brief. I later found better articles on the topic; readers can refer to these three to study the loading process. (Updated on 2017.09.25)
- XNU and dyld source analysis: Mach-O and dynamic library loading process (Part 1)
- XNU and dyld source analysis: Mach-O and dynamic library loading process (Part 2)
- Analysis of the dylib dynamic library loading process
References
- dyld source code analysis
- Hook with root permission to run the App
- Do you really understand the load method?
- In-depth understanding of Mac OS X & iOS operating systems
- XNU source code
- dyld source code