Preface

An application never starts from nothing. Its code relies on many base libraries such as UIKit, CoreFoundation, and objc. These libraries are executable binaries that the operating system can load into memory, and they come in two kinds: static libraries and dynamic libraries. When you run a project and stop at a breakpoint, the left side of Xcode shows part of the loading process, from start up to the method that contains the breakpoint. But what happens between the start of the program and the call to main? That is what this article sets out to answer.

Resources to prepare

  • dyld source: multiple versions of dyld
  • objc source: multiple versions of objc
  • libdispatch source: download the libdispatch library source code
  • libSystem source: download the libSystem library source code
  • An ice-cold 🍺

Main content

We just mentioned static and dynamic libraries. How does a program load them? It needs the dynamic linker, dyld. How exactly does the linking work? The sections below analyze it in detail.

  • The difference between static and dynamic libraries

A static library is copied into every binary that links it, so the same code can be added repeatedly and waste memory. A dynamic library avoids this problem, which is why the system's own libraries are mostly dynamic libraries.

  • The build process

Introducing the case

The question: why does execution end up in the _objc_init function?

First, run the objc source project as it is, without changing anything: execution goes straight into the _objc_init function. This is the objc source, so how did execution get here?

Preparing the case

Create a new project and implement the +load method in ViewController. Set a breakpoint in +load and run the project: the breakpoint in +load is hit first. Then, in lldb, run the bt command to print the stack. The stack shows that the program starts at _dyld_start and takes 13 steps in total to reach +[ViewController load]. The left side of Xcode shows only three of these frames, _dyld_start –> load_images –> +[ViewController load]. The many internal frames in between are hidden there, but printing the stack reveals them.

You can likewise follow the whole loading process step by step in the debugger.

With this first impression of the loading process, the next step is to analyze the dyld source to understand it in detail. The analysis in this article is based on dyld-852. (This source cannot be built and run on its own, because it depends too heavily on the underlying system libraries.)

The dyld process at a high level

From the analysis above, we know that the program starts at _dyld_start, so that is where to begin in dyld-852. As usual, search the whole dyld source for _dyld_start. Starting with dyld2, to reduce the amount of pre-binding work, the code is split across several architectures, such as x86, x86_64, arm, and arm64. Taking arm as an example, the search lands on the assembly entry point.

The assembly is awkward to read directly, but it contains a comment with a line of C++ code: dyldbootstrap::start(app_mh, argc, argv, dyld_mh, &startGlue). Comparing this C++ call with the assembly and its comments makes the picture roughly clear: the assembly computes the corresponding arguments, passes them, and jumps. To keep things intuitive, we explore through the C++ functions from here on. Because of how C++ defines functions inside a namespace, searching for dyldbootstrap::start directly returns no results. Search for the dyldbootstrap namespace first, then find the start function inside that scope.

From assembly _dyld_start to C++ dyldbootstrap::start

Next, look at the start function in the dyldbootstrap scope. (We are exploring the dyld process at a high level for now, so the implementation of each function is not explained in detail; the same applies to the functions below.)

uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[], const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{
	// ... code omitted ...

	// now that we are done bootstrapping dyld, call dyld's main
	uintptr_t appsSlide = appsMachHeader->getSlide();
	return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}

As the return statement of start shows, it returns dyld::_main(), that is, the _main() function in the dyld scope. The next step is to enter dyld::_main().
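The scoping point above can be made concrete with a minimal C++ sketch. The namespace and function names mirror the ones in the dyld source, but the bodies, the simplified signatures, and the `trace` vector are stand-ins assumed for illustration; this is not the real bootstrap code.

```cpp
#include <string>
#include <vector>

// Records the order in which the stand-in functions run (illustration only).
static std::vector<std::string> trace;

namespace dyld {
    // Stand-in for dyld::_main. The real function takes the Mach-O header,
    // the slide, argc/argv and more; here it only records that it ran.
    unsigned long _main(int argc) {
        trace.push_back("dyld::_main");
        return static_cast<unsigned long>(argc);
    }
}

namespace dyldbootstrap {
    // Stand-in for dyldbootstrap::start. A text search for
    // "dyldbootstrap::start" finds no definition because the function is
    // written as plain "start" inside this namespace block.
    unsigned long start(int argc) {
        trace.push_back("dyldbootstrap::start");
        return dyld::_main(argc);  // hand over to dyld's _main, as in the source
    }
}
```

Calling `dyldbootstrap::start(...)` from outside the namespace needs the qualified name, while the definition itself never spells it out, which is why the search has to go through the namespace first.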

From dyldbootstrap::start into dyld::_main()

The _main() function is close to 1000 lines long. Note that this _main() is not the main() of the program being loaded; it is a function inside dyld. How do you analyze such a large piece of code? We know that dyld's job is to dynamically link image files (images), so it is enough to find the code that does this. Start from the return value of _main, result, and search for where result is assigned inside _main. The search turns up two candidates: one is the fake_main function and the other is sMainExecutable. fake_main is clearly not what we are looking for.

int
fake_main()
{
	return 0;
}

That leaves sMainExecutable, so look at how it is used inside _main. _main uses sMainExecutable for weak reference binding (line 7229), data binding (line 7215), image linking (line 7136), instantiation (line 7009), and so on. This confirms from another angle that sMainExecutable is the right thing to follow, because it matches our goal: loading all image files and everything that goes with them. This is reasoning backwards from the result. Having settled on sMainExecutable, the next step is to look directly at where it is instantiated:

sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);

This instantiation produces the image loader for the main program. Next, go directly into the instantiateFromLoadedImage function.

Instantiating the main program: instantiateFromLoadedImage()

The implementation of instantiateFromLoadedImage shows that it needs a macho_header, a slide, and a path, and then loads the image:

static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
	// try mach-o loader
//	if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
		ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
		addImage(image);
		return (ImageLoaderMachO*)image;
//	}
	
//	throw "main executable not a known format";
}

What are the parameters that are passed in? In Finder, locate the app built from the previous case, choose Show Package Contents, and take out the Mach-O file. Open it with the MachOView tool to view its contents: the Mach-O header (macho_header), the Load Commands, the code segment (TEXT), the data segment (DATA), the Symbol Table, the String Table, and so on.
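As a rough illustration of what sits at the very start of that file, the sketch below mirrors the layout of the 64-bit Mach-O header and reads it from a byte buffer. The struct is redeclared locally (as `MachHeader64`, a name assumed here) so the example does not depend on Apple's headers; the helper `readHeader` is hypothetical and is not part of dyld.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Local mirror of the 64-bit Mach-O header layout (eight 32-bit fields).
struct MachHeader64 {
    uint32_t magic;       // 0xfeedfacf for a 64-bit Mach-O file
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;    // 0x2 means a main executable (MH_EXECUTE)
    uint32_t ncmds;       // number of load commands that follow the header
    uint32_t sizeofcmds;  // total size in bytes of those load commands
    uint32_t flags;
    uint32_t reserved;
};

constexpr uint32_t kMagic64 = 0xfeedfacf;  // MH_MAGIC_64
constexpr uint32_t kExecute = 0x2;         // MH_EXECUTE

// Hypothetical helper: copies the header out of a buffer and reports whether
// it looks like a 64-bit Mach-O header in the host's byte order.
bool readHeader(const unsigned char* bytes, std::size_t size, MachHeader64& out) {
    if (bytes == nullptr || size < sizeof(MachHeader64)) return false;
    std::memcpy(&out, bytes, sizeof(MachHeader64));  // avoids unaligned reads
    return out.magic == kMagic64;
}
```

The `ncmds` and `sizeofcmds` fields are what let a loader walk the Load Commands that MachOView lists right after the header.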

Loading inserted dynamic libraries: loadInsertedDylib

The source comment for this step reads: load any inserted libraries.

Loading the shared cache: mapSharedCache

The source comment for this step reads: load shared cache.

Linking the main program: link

This step links the main executable and the inserted dynamic libraries.

Weak-binding the main program: weakBind

Weak reference binding is performed only after all images have been linked.

Notifying that dyld is about to enter main()

This step notifies all monitoring processes that this process is about to enter main().

Initialization: initializeMainExecutable

This step runs the initializers of everything that has been instantiated.

Inside initializeMainExecutable

  • Initializes the image files

  • Initializes the main program executable

Both steps call the runInitializers function, so let's look at its implementation next.

How runInitializers executes

It performs the preparation for initialization.

processInitializers: preparing for initialization

It loops over the image files with a for loop.

recursiveInitialization: recursive initialization

void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize, InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
	recursive_lock lock_info(this_thread);
	recursiveSpinLock(lock_info);

	if ( fState < dyld_image_state_dependents_initialized-1 ) {
		uint8_t oldState = fState;
		// break cycles
		fState = dyld_image_state_dependents_initialized-1;
		try {
			// initialize lower level libraries
			for(unsigned int i=0; i < libraryCount(); ++i) {
				ImageLoader* dependentImage = libImage(i);
				if ( dependentImage != NULL ) {
					// don't try to initialize stuff "above" me yet
					if ( libIsUpward(i) ) {
						uninitUps.imagesAndPaths[uninitUps.count] = { dependentImage, libPath(i) };
						uninitUps.count++;
					}
					else if ( dependentImage->fDepth >= fDepth ) {
						dependentImage->recursiveInitialization(context, this_thread, libPath(i), timingInfo, uninitUps);
					}
				}
			}

			// record termination order
			if ( this->needsTermination() )
				context.terminationRecorder(this);

			// let objc know we are about to initialize this image
			uint64_t t1 = mach_absolute_time();
			fState = dyld_image_state_dependents_initialized;
			oldState = fState;
			context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);

			// initialize this image
			bool hasInitializers = this->doInitialization(context);

			// let anyone know we finished initializing this image
			fState = dyld_image_state_initialized;
			oldState = fState;
			context.notifySingle(dyld_image_state_initialized, this, NULL);

			if ( hasInitializers ) {
				uint64_t t2 = mach_absolute_time();
				timingInfo.addTime(this->getShortName(), t2-t1);
			}
		}
		catch (const char* msg) {
			// this image is not initialized
			fState = oldState;
			recursiveSpinUnLock();
			throw;
		}
	}

	recursiveSpinUnLock();
}

Three main things are done here:

1. Initialize the dependencies (①):

  • Send the single notification context.notifySingle to prepare for what follows. Only once this preparation is complete can the next stage, initializing the image file, begin;

2. Initialize the image itself (②, ③):

  • Call the initialization method doInitialization: initialization of the image file starts.
  • Send the completion notification context.notifySingle: the image file has been initialized.
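The order of operations described above can be modelled with a small stand-alone sketch. `Image`, `State`, and `recursiveInit` are simplified stand-ins assumed for illustration; upward links, locking, timing, and error handling from the real function are left out.

```cpp
#include <string>
#include <vector>

// Simplified image states, loosely following the states named in the source.
enum class State { Loaded, DependentsInitializing, DependentsInitialized, Initialized };

struct Image {
    std::string name;
    std::vector<Image*> deps;      // lower-level libraries this image depends on
    State state = State::Loaded;
};

// Appends one line per event so the order can be inspected afterwards.
void recursiveInit(Image& image, std::vector<std::string>& log) {
    if (image.state != State::Loaded) return;   // already started: breaks cycles
    image.state = State::DependentsInitializing;

    for (Image* dep : image.deps)               // dependencies come first
        if (dep != nullptr) recursiveInit(*dep, log);

    image.state = State::DependentsInitialized;
    log.push_back("notify about-to-init " + image.name);  // first notifySingle
    log.push_back("doInitialization " + image.name);      // run the initializers
    image.state = State::Initialized;
    log.push_back("notify initialized " + image.name);    // second notifySingle
}
```

Marking the image as in progress before visiting its dependencies is what keeps a dependency cycle from recursing forever, which is the role of the "break cycles" assignment in the original function.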

notifySingle

Search for where notifySingle is assigned to see what it does. We are mainly looking for the loading of image files, and the following piece of code fits:

static void notifySingle(dyld_image_states state, const ImageLoader* image, ImageLoader::InitializerTimingList* timingInfo)
{
	// ... code omitted ...
	if ( (state == dyld_image_state_dependents_initialized) && (sNotifyObjCInit != NULL) && image->notifyObjC() ) {
		uint64_t t0 = mach_absolute_time();
		dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
		//----- key
		(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
		uint64_t t1 = mach_absolute_time();
		uint64_t t2 = mach_absolute_time();
		uint64_t timeInObjC = t1-t0;
		uint64_t emptyTime = (t2-t1)*100;
		if ( (timeInObjC > emptyTime) && (timingInfo != NULL) ) {
			timingInfo->addTime(image->getShortName(), timeInObjC);
		}
	}
	// ... code omitted ...
}

The image is handed over by the call (*sNotifyObjCInit)(image->getRealPath(), image->machHeader()). The thing being called is sNotifyObjCInit, so look at sNotifyObjCInit next.

Details of sNotifyObjCInit

A search shows its declaration: static _dyld_objc_notify_init sNotifyObjCInit. Looking at where sNotifyObjCInit is assigned, it is simply set to init, and the function that performs the assignment is registerObjCNotifiers. It makes three assignments:

  • sNotifyObjCMapped = mapped;

  • sNotifyObjCInit = init;

  • sNotifyObjCUnmapped = unmapped;

Who calls registerObjCNotifiers

registerObjCNotifiers is called from the _dyld_objc_notify_register function. That name connects back to the beginning of the article: the implementation of _objc_init in the objc source also calls _dyld_objc_notify_register.

  • notifySingle –> _dyld_objc_notify_register: the objc source shows that _dyld_objc_notify_register is called in objc's _objc_init, and the assembly shows the call being made inside the libobjc.A.dylib image.

The values that _objc_init passes to _dyld_objc_notify_register in the objc source are:

  • &map_images: the address of the map_images function, which handles the key data that ties the earlier content together, such as classes, protocols, properties, method lists, and so on;
  • load_images: the function implementation;
  • unmap_image: the function implementation.

Combined with the assignments in dyld's registerObjCNotifiers above:

  • sNotifyObjCMapped = mapped = &map_images;

  • sNotifyObjCInit = init = load_images;

  • sNotifyObjCUnmapped = unmapped = unmap_image;

So _objc_init registers three functions with dyld, and dyld calls them while loading image files whenever the conditions are met.
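A minimal sketch of this registration pattern follows. The three static pointers and `registerObjCNotifiers` mirror the names in the dyld source, but the callback signatures are simplified assumptions (the real ones carry image counts, paths, and Mach-O headers), and `notifyInit` is a hypothetical helper that stands in for the guarded call inside notifySingle.

```cpp
#include <string>
#include <vector>

// Simplified callback types (assumption: the real ones take more arguments).
using MappedFn   = void (*)(const std::vector<std::string>& paths);
using InitFn     = void (*)(const std::string& path);
using UnmappedFn = void (*)(const std::string& path);

static MappedFn   sNotifyObjCMapped   = nullptr;
static InitFn     sNotifyObjCInit     = nullptr;
static UnmappedFn sNotifyObjCUnmapped = nullptr;

// Stores the three callbacks handed over by the objc side.
void registerObjCNotifiers(MappedFn mapped, InitFn init, UnmappedFn unmapped) {
    sNotifyObjCMapped   = mapped;
    sNotifyObjCInit     = init;
    sNotifyObjCUnmapped = unmapped;
}

// Hypothetical stand-in for the guarded call in notifySingle: the callback
// runs only if one has been registered. Returns whether it was invoked.
bool notifyInit(const std::string& path) {
    if (sNotifyObjCInit == nullptr) return false;
    (*sNotifyObjCInit)(path);
    return true;
}
```

Until the objc side registers, `notifyInit` does nothing, which matches the `sNotifyObjCInit != NULL` check in the source: images initialized before registration are not reported through this path.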

map_images and load_images are the bridge between objc and dyld. So when are they called? See the notes at the end of this article.

Summary 1

This has been a fairly long walk, and some readers may feel a little lost, so let's recap. With the analysis above in mind, the stack from the case no longer looks so unfamiliar:

  • The flow so far: _dyld_start –> dyldbootstrap::start –> dyld::_main –> dyld::initializeMainExecutable –> runInitializers –> processInitializers –> recursiveInitialization.

This is only the part of the flow inside the dyld library. Other libraries are involved in reaching the _objc_init function, so we need to explore further.

Working backwards from _objc_init

Now we need to figure out what happens next. We already know that running the empty project with the objc source reaches _objc_init, so set a breakpoint on that function and check its stack. The stack shows that several more functions run between recursiveInitialization, where our derivation stopped, and _objc_init. Both ends are now known, so we work backwards, starting from _objc_init.

The _os_object_init call

According to the stack, _objc_init is called by _os_object_init, which lives in the libdispatch library, so open that library's source and look for the function. The source confirms that _os_object_init calls _objc_init. The stack then shows that libdispatch_init calls _os_object_init, so search globally for libdispatch_init:

void libdispatch_init(void)
{
	// ... configuration code omitted ...
#endif
	_dispatch_hw_config_init();
	_dispatch_time_init();
	_dispatch_vtable_init();
	_os_object_init();	//-------- called here
	_voucher_init();
	_dispatch_introspection_init();
}

The source confirms that libdispatch_init calls _os_object_init.

The libSystem_initializer call

According to the stack, libSystem_initializer calls libdispatch_init, and libSystem_initializer lives in the libSystem library, so open that library's source and look for it. The source confirms that libSystem_initializer calls libdispatch_init.

The doModInitFunctions call

Back to the stack: libSystem_initializer is called by doModInitFunctions, which belongs to the dyld library. The implementation of doModInitFunctions contains the comment libSystem initializer must run first (that is, the libSystem library must be initialized first). From the earlier analysis we know that dyld loads all image files, and that the libdispatch and objc libraries depend on libSystem. So libSystem is the first library to be initialized, and from that we can infer that doModInitFunctions is what initializes libSystem.
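The ordering constraint in that comment can be sketched as a simple guard. Everything here is assumed for illustration: `ProcessInfo`, `runModInitFunctions`, and the exception text are hypothetical and only model the idea that no other image's initializers may run until libSystem has been initialized.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical process-wide flag set once libSystem's initializer has run.
struct ProcessInfo {
    bool libSystemInitialized = false;
};

// Runs an image's initializer functions in order. Any image other than
// libSystem is rejected while libSystem is still uninitialized.
void runModInitFunctions(ProcessInfo& info, const std::string& imageName,
                         bool isLibSystem,
                         const std::vector<std::function<void()>>& initializers) {
    if (!isLibSystem && !info.libSystemInitialized)
        throw std::runtime_error("libSystem initializer must run first: " + imageName);

    for (const auto& init : initializers) init();

    if (isLibSystem) info.libSystemInitialized = true;
}
```

Because libdispatch and objc sit on top of libSystem, a guard of this shape is satisfied automatically when initialization proceeds from the lowest dependency upwards.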

Put another way, doModInitFunctions is what runs the C++ initializer functions. Why say that? Here is an example: the stack on the left side of Xcode shows that doModInitFunctions executes first on the way to the kcFunc() function.
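A function like kcFunc() is typically marked as a constructor function so that it runs before main(). The sketch below assumes a Clang or GCC toolchain, where `__attribute__((constructor))` is available. The body of `kcFunc` and the counter are made up for illustration, since the code of the original example is not shown in this article.

```cpp
// Plain counter: zero-initialized at load time, so it is safe to touch from a
// constructor function regardless of static-initialization order.
static int gConstructorRuns = 0;

// Assumption: Clang/GCC extension. The loader calls this before main(); in
// the stack discussed in this article that call arrives via doModInitFunctions.
__attribute__((constructor))
static void kcFunc() {
    ++gConstructorRuns;
}

// Reports how many times the constructor function has run so far.
int constructorRuns() {
    return gConstructorRuns;
}
```

Setting a breakpoint inside such a function and printing the stack is one way to reproduce the frames described in the example.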

The doInitialization call

Back to the stack once more: doInitialization calls doModInitFunctions (you can also find this by searching the dyld source for doModInitFunctions).

From here, or by searching the dyld source for doInitialization, we find that doInitialization is called inside recursiveInitialization.

That closes the gap in the derivation.

Summary 2

From the dyld library through the libSystem and libdispatch libraries, the flow is now continuous:

_dyld_start –> dyldbootstrap::start –> dyld::_main –> dyld::initializeMainExecutable –> ImageLoader::runInitializers –> ImageLoader::processInitializers –> ImageLoader::recursiveInitialization –> doInitialization –> doModInitFunctions –> libSystem_initializer (libSystem.B.dylib) –> libdispatch_init (libdispatch.dylib) –> _os_object_init (libdispatch.dylib) –> _objc_init (libobjc.A.dylib)

  • With a general flow chart:

The main() call

After _objc_init has run, execution eventually reaches the main() function. How does it get there? In the dyld source, the comments in the _dyld_start assembly show how the jump into main() is made. We can also confirm this in the case project: the assembly shows that the value is held in the rax register, so read that register in the project.

  • main is also written into dyld as a fixed symbol, so it cannot be changed.
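The hand-off can be pictured as follows: the loader's _main returns the address of the program's entry point as an integer, and the start-up code then calls through it. This is a sketch under assumptions; `fakeProgramMain`, `loaderMain`, and `startAndJump` are hypothetical names, and the real jump is made in the _dyld_start assembly, not in C++.

```cpp
#include <cstdint>

using EntryFn = int (*)(int argc, const char* argv[]);

// Hypothetical stand-in for the application's main().
int fakeProgramMain(int argc, const char* argv[]) {
    (void)argv;
    return argc * 10;  // arbitrary value so the call is observable
}

// Stand-in for dyld::_main: after all loading work it returns the entry
// address as an integer (the article calls this value `result`).
std::uintptr_t loaderMain() {
    return reinterpret_cast<std::uintptr_t>(&fakeProgramMain);
}

// Stand-in for the tail of _dyld_start: turn the returned address back into
// a function pointer and call it.
int startAndJump(int argc, const char* argv[]) {
    EntryFn entry = reinterpret_cast<EntryFn>(loaderMain());
    return entry(argc, argv);
}
```

On x86_64 an integer return value comes back in rax, which is consistent with the article reading that register to find the entry address.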

Notes: how map_images and load_images are called

From the earlier analysis, map_images and load_images are the bridge between objc and dyld. The details are sorted out here.

map_images and load_images are passed in by the _dyld_objc_notify_register call inside objc's _objc_init. _objc_init itself is reached from dyld's doModInitFunctions during initialization. At the same time, dyld assigns them in registerObjCNotifiers.

Following the assignments, start with map_images by finding where sNotifyObjCMapped is used. Searching the dyld source shows that inside notifyBatchPartial, sNotifyObjCMapped is called directly when it is not NULL. That is the call to map_images. Back in registerObjCNotifiers, you can see where notifyBatchPartial is called. load_images is also called in registerObjCNotifiers, and in code order map_images runs first and load_images after it. Registration is then complete.

We can also test this in the objc source by setting breakpoints in map_images and load_images to see the order in which they execute. map_images runs first; the stack on the left shows _objc_init –> _dyld_objc_notify_register –> notifyBatchPartial –> map_images. load_images runs next; the stack on the left shows _objc_init –> _dyld_objc_notify_register –> registerObjCNotifiers –> load_images.

  • Summary: when dyld initializes the main program, it loads the libobjc.A.dylib image. At that point objc registers three functions with dyld: map_images, load_images, and unmap_image. map_images is executed first and load_images after it!
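The ordering in this summary can be sketched as follows. The model is an assumption made for illustration: `LoadedImage`, `registerAndReplay`, and the callback signatures are hypothetical, and the real registerObjCNotifiers does more work. It shows only that, at registration time, the mapped callback is delivered for the images that are already loaded before the init callback is delivered for the images that are already initialized.

```cpp
#include <functional>
#include <string>
#include <vector>

struct LoadedImage {
    std::string name;
    bool initialized;  // true once the image's initializers have run
};

using MappedCallback = std::function<void(const std::vector<std::string>& names)>;
using InitCallback   = std::function<void(const std::string& name)>;

// Hypothetical model of registration: first one batched "mapped" call for
// every image already loaded, then one "init" call per initialized image.
void registerAndReplay(const std::vector<LoadedImage>& images,
                       const MappedCallback& mapped, const InitCallback& init) {
    std::vector<std::string> names;
    for (const LoadedImage& image : images) names.push_back(image.name);
    if (mapped) mapped(names);                            // map_images side

    for (const LoadedImage& image : images)
        if (image.initialized && init) init(image.name);  // load_images side
}
```

With callbacks that simply append to a log, the map_images entry always precedes any load_images entry, matching the order seen with the two breakpoints.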