Preface
In dyld's program loading process, _objc_init is a key function. Because _objc_init registers callback functions with dyld, let's explore it.
Preparation
_objc_init
Exploring _objc_init means reading the source, so without further ado, here it is:
void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;

    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    runtime_init();
    exception_init();
#if __OBJC2__
    cache_t::init();
#endif
    _imp_implementationWithBlock_init();

    _dyld_objc_notify_register(&map_images, load_images, unmap_image);

#if __OBJC2__
    didCallDyldNotifyRegister = true;
#endif
}
- environ_init: reads the environment variables that affect the runtime and, if requested, prints help for them (export OBJC_HELP=1)
- tls_init: binds the thread keys, such as the destructor for per-thread data
- static_init: runs the C++ static constructors. libobjc's _objc_init runs before dyld calls our static constructors, so libobjc calls its own C++ constructors first
- runtime_init: initializes the runtime environment, mainly the two tables unattachedCategories and allocatedClasses
- exception_init: initializes libobjc's exception handling system
- cache_t::init: initializes the cache conditions
- _imp_implementationWithBlock_init: starts the callback mechanism. It usually does nothing because all initialization is lazy, but for certain processes the trampolines dylib is loaded eagerly
- _dyld_objc_notify_register: registers callbacks with dyld
environ_init
void environ_init(void)
{
    /* ... */
    // Print OBJC_HELP and OBJC_PRINT_OPTIONS output.
    if (PrintHelp || PrintOptions) {
        ...
        if (PrintOptions) {
            _objc_inform("OBJC_PRINT_OPTIONS is set");
        }
        for (size_t i = 0; i < sizeof(Settings)/sizeof(Settings[0]); i++) {
            const option_t *opt = &Settings[i];
            if (PrintHelp) _objc_inform("%s: %s", opt->env, opt->help);
            if (PrintOptions && *opt->var) _objc_inform("%s is set", opt->env);
        }
    }
}
The environment variables are printed only when PrintHelp or PrintOptions is set. To see all of them, strip those conditions and print every entry directly:
for (size_t i = 0; i < sizeof(Settings)/sizeof(Settings[0]); i++) {
    const option_t *opt = &Settings[i];
    _objc_inform("%s: %s", opt->env, opt->help);
    _objc_inform("%s is set", opt->env);
}
Note that this modification only works when you build and run the objc source yourself. Without the objc source, you can get the same list in a terminal by setting export OBJC_HELP=1.
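The table-driven idea behind Settings can be sketched in plain C++. This is only an illustration: DemoOption, DemoSettings, the DEMO_* variable names, and the rule that a value of "YES" enables a flag are all assumptions made for the sketch, not the runtime's actual definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical flags and variable names, invented for this sketch.
static bool DemoPrintHelp = false;
static bool DemoPrintLoadMethods = false;

struct DemoOption {
    bool *var;         // flag switched on by the variable
    const char *env;   // environment variable name
    const char *help;  // one-line description
};

static const DemoOption DemoSettings[] = {
    { &DemoPrintHelp, "DEMO_HELP", "describe available environment variables" },
    { &DemoPrintLoadMethods, "DEMO_PRINT_LOAD_METHODS", "log calls to +load" },
};
static const size_t DemoSettingsCount = sizeof(DemoSettings) / sizeof(DemoSettings[0]);

// Set each flag whose variable equals "YES" (an assumption of this sketch)
// and return how many flags ended up enabled.
int demoEnvironInit() {
    int enabled = 0;
    for (size_t i = 0; i < DemoSettingsCount; i++) {
        const DemoOption *opt = &DemoSettings[i];
        assert(opt->var != nullptr);
        const char *value = std::getenv(opt->env);
        *opt->var = (value != nullptr && std::strcmp(value, "YES") == 0);
        if (*opt->var) enabled++;
    }
    return enabled;
}

// Print one "NAME: help" line per entry, like the unconditional loop above.
void demoPrintAllOptions() {
    for (size_t i = 0; i < DemoSettingsCount; i++) {
        std::printf("%s: %s\n", DemoSettings[i].env, DemoSettings[i].help);
    }
}
```

Keeping the name, the help text, and the target flag in one table means a new variable only needs one new row; the parsing and help-printing loops stay untouched.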
The terminal route is quite convenient too. These environment variables can also be configured in Xcode; here are some common examples.
To configure environment variables in Xcode: select the run target –> Edit Scheme… –> Run –> Arguments –> Environment Variables
OBJC_DISABLE_NONPOINTER_ISA
The environment variable OBJC_DISABLE_NONPOINTER_ISA controls the isa pointer optimization. YES stores a plain class pointer in isa; NO uses the optimized non-pointer isa. First, look at isa without setting the environment variable:
Bit 0 of isa is 1, which indicates an optimized (non-pointer) isa, and the higher bits carry other data.
Now set OBJC_DISABLE_NONPOINTER_ISA = YES and look at isa again:
Bit 0 of isa is 0, which indicates that isa is a plain memory pointer, and the higher bits hold nothing besides cls.
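The bit test described above can be sketched as follows. The names are hypothetical, and the mask is the value commonly cited for x86_64; other architectures use a different layout, so treat the constants as assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Assumed for illustration: bit 0 flags an optimized isa, and the class
// pointer sits under this mask (x86_64 value; other targets differ).
constexpr uint64_t kDemoNonpointerBit = 0x1;
constexpr uint64_t kDemoIsaMask = 0x00007ffffffffff8ULL;

// True when the low bit marks the isa as optimized (non-pointer).
bool demoIsNonpointerIsa(uint64_t isaBits) {
    return (isaBits & kDemoNonpointerBit) != 0;
}

// Recover the class address: mask off the extra bits for an optimized isa,
// or return the value unchanged when it is already a plain pointer.
uint64_t demoClassFromIsa(uint64_t isaBits) {
    return demoIsNonpointerIsa(isaBits) ? (isaBits & kDemoIsaMask) : isaBits;
}
```

With OBJC_DISABLE_NONPOINTER_ISA = YES the second branch is the one that applies: the isa value is the class address itself.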
OBJC_PRINT_LOAD_METHODS
The environment variable OBJC_PRINT_LOAD_METHODS prints every load method in the program. Add a load method to a custom class and set OBJC_PRINT_LOAD_METHODS = YES:
+[LWPerson load] is the load method of the custom LWPerson class; the others are all system-level load methods. If too many load methods slow down your app's launch, or someone is doing hidden work in a load method, you can use this environment variable to find out who implements load methods and then deal with them.
tls_init
tls_init binds the thread keys, such as the destructor for per-thread data.
void tls_init(void)
{
#if SUPPORT_DIRECT_THREAD_KEYS
    // Register the destructor for the direct thread key
    pthread_key_init_np(TLS_DIRECT_KEY, &_objc_pthread_destroyspecific);
#else
    // Create a thread key with the destructor attached
    _objc_pthread_key = tls_create(&_objc_pthread_destroyspecific);
#endif
}
static_init
Runs the C++ static constructors. libobjc's _objc_init runs before dyld calls our static constructors, so libobjc calls its own C++ constructors first. In short, libobjc calls its own global C++ functions before dyld gets to them.
static void static_init()
{
    size_t count;
    auto inits = getLibobjcInitializers(&_mh_dylib_header, &count);
    for (size_t i = 0; i < count; i++) {
        inits[i]();
    }
    auto offsets = getLibobjcInitializerOffsets(&_mh_dylib_header, &count);
    for (size_t i = 0; i < count; i++) {
        UnsignedInitializer init(offsets[i]);
        init();
    }
}
Let's verify that libobjc calls its own C++ functions by adding one to the objc source:
The debugging result shows that libobjc does call its own internal C++ functions.
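The first loop in static_init boils down to walking a table of function pointers and calling each entry. A minimal sketch, assuming a hand-written table in place of the list that libobjc reads from its own Mach-O header (all names here are hypothetical):

```cpp
#include <cassert>
#include <cstddef>

using DemoInitializer = void (*)();

static int demoInitTotal = 0;
static void demoInitA() { demoInitTotal += 1; }
static void demoInitB() { demoInitTotal += 10; }

// Hypothetical stand-in for the initializer list found in the image header.
static const DemoInitializer demoInitializers[] = { demoInitA, demoInitB };

// Call every initializer in table order and return how many were run.
size_t demoRunInitializers() {
    const size_t count = sizeof(demoInitializers) / sizeof(demoInitializers[0]);
    for (size_t i = 0; i < count; i++) {
        assert(demoInitializers[i] != nullptr);
        demoInitializers[i]();
    }
    return count;
}
```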
runtime_init
Initializes the runtime environment, mainly the two tables unattachedCategories and allocatedClasses.
void runtime_init(void)
{
    objc::unattachedCategories.init(32); // Category table initialization
    objc::allocatedClasses.init();       // Class table initialization
}
exception_init
Initializes libobjc's exception handling system. Much like objc registering callbacks with dyld, it installs a callback so that others get a chance to handle exceptions.
void exception_init(void)
{
    old_terminate = std::set_terminate(&_objc_terminate);
}
A crash happens when upper-layer code violates the rules of the underlying system and the system raises an exception. When that happens, _objc_terminate is entered:
static void (*old_terminate)(void) = nil;
static void _objc_terminate(void)
{
    if (PrintExceptions) {
        _objc_inform("EXCEPTIONS: terminating");
    }
    if (! __cxa_current_exception_type()) {
        // No current exception.
        (*old_terminate)();
    }
    else {
        // There is a current exception. Check if it's an objc exception.
        @try {
            __cxa_rethrow();
        } @catch (id e) {
            // It's an objc object. Call Foundation's handler, if any.
            (*uncaught_handler)((id)e);
            (*old_terminate)();
        } @catch (...) {
            // It's not an objc object. Continue to C++ terminate.
            (*old_terminate)();
        }
    }
}
When the exception is an Objective-C object, _objc_terminate passes it to (*uncaught_handler)((id)e). A global search for uncaught_handler finds:
objc_uncaught_exception_handler
objc_setUncaughtExceptionHandler(objc_uncaught_exception_handler fn)
{
    objc_uncaught_exception_handler result = uncaught_handler;
    uncaught_handler = fn;
    return result;
}
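uncaught_handler = fn means the handler is whatever function fn the caller passes in, so upper-layer code can supply its own function to handle uncaught exceptions.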
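The "install a handler, get the previous one back" pattern used by objc_setUncaughtExceptionHandler can be sketched without any Objective-C. Everything here is hypothetical: the handler takes a C string instead of an object, and demoRaise stands in for the point where the runtime would invoke the handler.

```cpp
#include <cassert>
#include <cstring>

using DemoHandler = void (*)(const char *reason);

static const char *demoLastReason = nullptr;

// Default handler: does nothing, like a runtime with no handler installed.
static void demoDefaultHandler(const char *) {}
static DemoHandler demoUncaughtHandler = demoDefaultHandler;

// Install fn and hand back the handler it replaces, so the caller can
// chain to it or restore it later.
DemoHandler demoSetUncaughtHandler(DemoHandler fn) {
    DemoHandler previous = demoUncaughtHandler;
    demoUncaughtHandler = fn;
    return previous;
}

// An application-level handler that just records the reason.
void demoAppHandler(const char *reason) { demoLastReason = reason; }

// Stand-in for the runtime calling whatever handler is currently installed.
void demoRaise(const char *reason) { (*demoUncaughtHandler)(reason); }
```

Returning the previous handler is what lets several layers cooperate: each one can save what it replaced and call it after doing its own work.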
cache_t::init
Initializes the cache conditions.
void cache_t::init()
{
#if HAVE_TASK_RESTARTABLE_RANGES
    mach_msg_type_number_t count = 0;
    kern_return_t kr;
    while (objc_restartableRanges[count].location) {
        count++;
    }
    // Register the restartable ranges
    kr = task_restartable_ranges_register(mach_task_self(),
                                          objc_restartableRanges, count);
    if (kr == KERN_SUCCESS) return;
    _objc_fatal("task_restartable_ranges_register failed (result 0x%x: %s)",
                kr, mach_error_string(kr));
#endif // HAVE_TASK_RESTARTABLE_RANGES
}
_imp_implementationWithBlock_init
Starts the callback mechanism. It usually does nothing because all initialization is lazy, but for certain processes the trampolines dylib is loaded eagerly.
void
_imp_implementationWithBlock_init(void)
{
#if TARGET_OS_OSX
    // Eagerly load libobjc-trampolines.dylib in certain processes. Some
    // programs (most notably QtWebEngineProcess used by older versions of
    // embedded Chromium) enable a highly restrictive sandbox profile which
    // blocks access to that dylib. If anything calls
    // imp_implementationWithBlock (as AppKit has started doing) then we'll
    // crash trying to load it. Loading it here sets it up before the sandbox
    // profile is enabled and blocks it.
    //
    // This fixes EA Origin (rdar://problem/50813789)
    // and Steam (rdar://problem/55286131)
    if (__progname &&
        (strcmp(__progname, "QtWebEngineProcess") == 0 ||
         strcmp(__progname, "Steam Helper") == 0)) {
        Trampolines.Initialize();
    }
#endif
}
_dyld_objc_notify_register
Registers callbacks with dyld. _dyld_objc_notify_register is called only by the objc runtime, and its implementation lives in the dyld source:
// _dyld_objc_notify_register
void _dyld_objc_notify_register(_dyld_objc_notify_mapped mapped,
                                _dyld_objc_notify_init init,
                                _dyld_objc_notify_unmapped unmapped)
{
    dyld::registerObjCNotifiers(mapped, init, unmapped);
}
// registerObjCNotifiers
void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
    // record functions to call
    sNotifyObjCMapped = mapped;
    sNotifyObjCInit = init;
    sNotifyObjCUnmapped = unmapped;
    // call 'mapped' function with all images mapped so far
    try {
        notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
    }
    catch (const char* msg) {
        // ignore request to abort during registration
    }
    // call 'init' function on all images already init'ed (below libSystem)
    for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
        ImageLoader* image = *it;
        if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
            dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
            (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
        }
    }
}
_dyld_objc_notify_register takes three parameters:
- &map_images: dyld calls this function when an image is loaded into memory
- load_images: dyld calls this function when it initializes an image
- unmap_image: called when an image is removed
We have already explored load_images, which essentially calls the load methods. Now for map_images: &map_images passes the function's address, so dyld and objc refer to the same implementation. Because &map_images is the first argument, search dyld globally for sNotifyObjCMapped.
sNotifyObjCMapped is called in notifyBatchPartial, and notifyBatchPartial is called in registerObjCNotifiers when objc registers its notifications during initialization. So map_images is called first, and then load_images.
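The order "map first, then init" falls out of how registration replays past events. A minimal sketch of that replay, with hypothetical types and a log in place of real image loading (nothing here is dyld's actual API):

```cpp
#include <cassert>
#include <string>
#include <vector>

using DemoMapped = void (*)(const std::vector<std::string> &paths);
using DemoInit = void (*)(const std::string &path);

struct DemoImage {
    std::string path;
    bool initialized;
};

static std::vector<DemoImage> demoAllImages;  // images loaded before registration
static std::vector<std::string> demoLog;      // records the callback order
static DemoMapped demoNotifyMapped = nullptr;
static DemoInit demoNotifyInit = nullptr;

// Store the callbacks, then replay what already happened: one batched
// "mapped" call for every known image, followed by one "init" call per
// image that has already been initialized.
void demoRegisterNotifiers(DemoMapped mapped, DemoInit init) {
    assert(mapped != nullptr && init != nullptr);
    demoNotifyMapped = mapped;
    demoNotifyInit = init;

    std::vector<std::string> paths;
    for (const DemoImage &image : demoAllImages) paths.push_back(image.path);
    if (!paths.empty()) (*demoNotifyMapped)(paths);

    for (const DemoImage &image : demoAllImages) {
        if (image.initialized) (*demoNotifyInit)(image.path);
    }
}

// Stand-ins for map_images and load_images that only log the call.
void demoMapImages(const std::vector<std::string> &paths) {
    demoLog.push_back("map:" + std::to_string(paths.size()));
}
void demoLoadImages(const std::string &path) {
    demoLog.push_back("load:" + path);
}
```

Replaying on registration means the subscriber never misses images that were loaded before it had a chance to register.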
read_images
Step into map_images; the source is as follows:
void
map_images(unsigned count, const char * const paths[],
           const struct mach_header * const mhdrs[])
{
    mutex_locker_t lock(runtimeLock);
    return map_images_nolock(count, paths, mhdrs);
}
Step into map_images_nolock. It contains a lot of code, so let's go straight to the key part:
That leads to the _read_images method. It looks daunting at first because it is so long, but collapsing the code and skimming it shows that Apple's developers left log messages that outline the flow:
void _read_images(header_info **hList, uint32_t hCount, int totalClasses, int unoptimizedTotalClasses)
{
    ... // Indicates that some code is omitted
#define EACH_HEADER \
    hIndex = 0; \
    hIndex < hCount && (hi = hList[hIndex]); \
    hIndex++
    // Condition control so the one-time setup runs only once
    if (!doneOnce) { ... }

    // Fix up @selector confusion left over from the precompile phase:
    // the same method name can sit at different addresses
    // Fix up @selector references
    static size_t UnfixedSelectors;
    { ... }
    ts.log("IMAGE TIMES: fix up selector references");

    // Handle mixed-up classes
    // Discover classes. Fix up unresolved future classes. Mark bundle classes.
    bool hasDyldRoots = dyld_shared_cache_some_image_overridden();
    for (EACH_HEADER) { ... }
    ts.log("IMAGE TIMES: discover classes");

    // Fix up remapped classes that were not loaded by the image file
    // Fix up remapped classes
    // Class list and nonlazy class list remain unremapped.
    // Class refs and super refs are remapped for message dispatching.
    if (!noClassesRemapped()) { ... }
    ts.log("IMAGE TIMES: remap classes");

#if SUPPORT_FIXUP
    // Fix up some messages
    // Fix up old objc_msgSend_fixup call sites
    for (EACH_HEADER) { ... }
    ts.log("IMAGE TIMES: fix up objc_msgSend_fixup");
#endif

    // When a class has protocols: readProtocol
    // Discover protocols. Fix up protocol refs.
    for (EACH_HEADER) { ... }
    ts.log("IMAGE TIMES: discover protocols");

    // Fix up protocols that were not loaded
    // Fix up @protocol references
    // Preoptimized images may have the right
    // answer already but we don't know for sure.
    for (EACH_HEADER) { ... }
    ts.log("IMAGE TIMES: fix up @protocol references");

    // Handle categories
    // Discover categories. Only do this after the initial category
    // attachment has been done. For categories present at startup,
    // discovery is deferred until the first load_images call after
    // the call to _dyld_objc_notify_register completes.
    if (didInitialAttachCategories) { ... }
    ts.log("IMAGE TIMES: discover categories");

    // Handle class loading
    // Category discovery MUST BE Late to avoid potential races
    // when other threads call the new category code before
    // this thread finishes its fixups.
    // +load handled by prepare_load_methods()
    // Realize non-lazy classes (for +load methods and static instances)
    for (EACH_HEADER) { ... }
    ts.log("IMAGE TIMES: realize non-lazy classes");

    // Handle classes that have not been processed yet (future classes)
    // Realize newly-resolved future classes, in case CF manipulates them
    if (resolvedFutureClasses) { ... }
    ts.log("IMAGE TIMES: realize future classes");
    ...
#undef EACH_HEADER
}
Sorting out the main flow according to the log messages:
- Condition control so the one-time setup runs only once
- Fix up @selector confusion from the precompile phase
- Handle mixed-up classes
- Fix up remapped classes that were not loaded by the image file
- Fix up some messages
- When a class has protocols: readProtocol
- Fix up protocols that were not loaded
- Handle categories
- Handle class loading
- Handle classes that have not been processed, optimizing those that were violated
The following sections analyze the key points one by one, following the log messages.
Only load once
if (!doneOnce) {
    doneOnce = YES; // set on the first pass, so this branch runs only once
    launchTime = YES;
    ...
    // Preoptimized classes don't go in this table.
    // 4/3 is NXMapTable's load factor
    int namedClassesSize =
        (isPreoptimized() ? unoptimizedTotalClasses : totalClasses) * 4 / 3;
    // Create a hash table to store all classes
    gdb_objc_realized_classes =
        NXCreateMapTable(NXStrValueMapPrototype, namedClassesSize);
    ts.log("IMAGE TIMES: first time tasks");
}
Once doneOnce = YES is set, later calls skip this branch, so the setup runs only once. It creates the table gdb_objc_realized_classes, which holds all classes, whether realized or not.
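The sizing expression is plain integer arithmetic: pick the class count that applies, multiply by 4, then divide by 3. A small sketch with a hypothetical helper name makes the effect of the 4/3 factor concrete:

```cpp
#include <cassert>

// Capacity requested for the name table: the relevant class count scaled
// by 4/3. The multiplication happens first, so integer division only
// truncates once at the end.
int demoNamedClassesSize(bool preoptimized, int totalClasses, int unoptimizedTotalClasses) {
    assert(totalClasses >= 0 && unoptimizedTotalClasses >= 0);
    return (preoptimized ? unoptimizedTotalClasses : totalClasses) * 4 / 3;
}
```

For 300 classes this asks for room for 400 entries; when the image is preoptimized, only the unoptimized classes count toward the size.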
Fixing up @selector confusion
static size_t UnfixedSelectors;
{
    mutex_locker_t lock(selLock);
    for (EACH_HEADER) {
        if (hi->hasPreoptimizedSelectors()) continue;
        bool isBundle = hi->isBundle();
        // Get the method names list from macho
        SEL *sels = _getObjc2SelectorRefs(hi, &count);
        UnfixedSelectors += count;
        for (i = 0; i < count; i++) {
            const char *name = sel_cname(sels[i]);
            SEL sel = sel_registerNameNoLock(name, isBundle);
            if (sels[i] != sel) {
                sels[i] = sel;
            }
        }
    }
}
Different classes can have methods with the same name, and because each class stores its methods in a different place, the same method name ends up at different addresses. This pass fixes that: it registers each name through sel_registerNameNoLock and overwrites the reference whenever the registered SEL differs, so identical names share one address.
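The uniquing step can be sketched with an ordinary string set. This is an illustration under assumptions: demoSelectorTable and both functions are hypothetical, and a selector is modeled as nothing more than a pointer to a canonical copy of its name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical name table holding one canonical copy of every name.
static std::unordered_set<std::string> demoSelectorTable;

// Return the canonical pointer for a name, registering it on first use.
// Set elements stay at a fixed address, so the pointer remains valid.
const char *demoRegisterName(const char *name) {
    assert(name != nullptr);
    return demoSelectorTable.insert(name).first->c_str();
}

// Rewrite each reference so that equal names share a single address.
// Returns how many references had to be changed.
size_t demoFixupSelectorRefs(const char **refs, size_t count) {
    size_t fixed = 0;
    for (size_t i = 0; i < count; i++) {
        const char *sel = demoRegisterName(refs[i]);
        if (refs[i] != sel) {
            refs[i] = sel;
            fixed++;
        }
    }
    return fixed;
}
```

After the fix-up, comparing two selectors is a pointer comparison rather than a string comparison, which is the point of uniquing them.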
Handling mixed-up classes
for (EACH_HEADER) {
    if (! mustReadClasses(hi, hasDyldRoots)) {
        // Image is sufficiently optimized that we need not call readClass()
        continue;
    }
    // Read the class list information from macho
    classref_t const *classlist = _getObjc2ClassList(hi, &count);
    bool headerIsBundle = hi->isBundle();
    bool headerIsPreoptimized = hi->hasPreoptimizedClasses();
    for (i = 0; i < count; i++) {
        Class cls = (Class)classlist[i];
        Class newCls = readClass(cls, headerIsBundle, headerIsPreoptimized);
        // The class may be moved at runtime, but it is not deleted
        if (newCls != cls && newCls) {
            // Class was moved but not deleted. Currently this occurs
            // only when the new class resolved a future class.
            // Non-lazily realize the class below.
            resolvedFutureClasses = (Class *)
                realloc(resolvedFutureClasses,
                        (resolvedFutureClassCount+1) * sizeof(Class));
            resolvedFutureClasses[resolvedFutureClassCount++] = newCls;
        }
    }
}
Add a breakpoint at readClass and run the source:
cls points to an address. newCls has not been assigned yet, so it holds a stale garbage address and appears to contain data. Step past the breakpoint to see what it looks like after the assignment:
The figure shows that readClass associates the class name with the address. If that is unclear, here is an analogy: a house that nobody has bought belongs to no one, but it still has a street address. Once Zhang San buys it, his name goes on the property certificate, and the house is associated with him.
Now verify this with custom classes: define two classes, LWPerson and LWTeacher. classlist = _getObjc2ClassList(...) is read from the __objc_classlist section of the Mach-O file, so look at the Mach-O file:
The address of LWPerson is 0x0000000100004230 and the address of LWTeacher is 0x0000000100004280, which match the cls values seen above.
The Mach-O file and the source line up exactly. Next, let's explore readClass; it has a lot of code, so we focus on the main points.
Class readClass(Class cls, bool headerIsBundle, bool headerIsPreoptimized)
{
    // Get the class name
    const char *mangledName = cls->nonlazyMangledName();
    if (missingWeakSuperclass(cls)) { ... }
    cls->fixupBackwardDeployingStableSwift();
    Class replacing = nil;
    if (mangledName != nullptr) { ... }
    if (headerIsPreoptimized && !replacing) { ... } else {
        if (mangledName) {
            //some Swift generic classes can lazily generate their names
            // Associate the class name with the address
            addNamedClass(cls, mangledName, replacing);
        } else { ... }
        // Insert the associated class into another hash table
        addClassTableEntry(cls);
    }
    // for future reference: shared cache never contains MH_BUNDLEs
    if (headerIsBundle) { ... }
    return cls;
}
You may wonder what happens around cls->nonlazyMangledName() in readClass. The key points are:
- nonlazyMangledName gets the class name
- The assignment of rw and the retrieval of ro are not in readClass; this is checked later by running the source
- addNamedClass binds the class name to the address
- addClassTableEntry inserts the associated class into a hash table of initialized classes
How does nonlazyMangledName get the class name?
const char *nonlazyMangledName() const {
    return bits.safe_ro()->getName();
}
Step into safe_ro; the source is as follows:
const class_ro_t *safe_ro() const {
    class_rw_t *maybe_rw = data();
    if (maybe_rw->flags & RW_REALIZED) {
        // maybe_rw is rw
        // rw has a value, so get ro from rw
        return maybe_rw->ro();
    } else {
        // maybe_rw is actually ro
        // Get the data directly from ro, which is the data in macho
        return (class_ro_t *)maybe_rw;
    }
}
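The "maybe rw, maybe ro" trick relies on both structures starting with a flags field, so one bit can say which structure the pointer really refers to. A minimal sketch with hypothetical structs (the flag value is an assumption of the sketch):

```cpp
#include <cassert>
#include <cstdint>

// Assumed flag value for this sketch: set only on the read-write struct.
constexpr uint32_t kDemoRealized = 1u << 31;

struct DemoRO {            // read-only data, as laid out in the binary
    uint32_t flags;
    const char *name;
};
struct DemoRW {            // read-write data, created once the class is realized
    uint32_t flags;
    const DemoRO *ro;
};

// data points at a DemoRW when the realized bit is set, otherwise it
// points directly at a DemoRO. Both start with flags, so reading the
// first field is valid either way.
const DemoRO *demoSafeRO(const void *data) {
    assert(data != nullptr);
    const uint32_t flags = *static_cast<const uint32_t *>(data);
    if (flags & kDemoRealized) {
        return static_cast<const DemoRW *>(data)->ro;
    }
    return static_cast<const DemoRO *>(data);
}

// Class name lookup that works before and after realization.
const char *demoClassName(const void *data) {
    return demoSafeRO(data)->name;
}
```

This is why the name can be read inside readClass even though rw has not been set up yet: before realization the data pointer is simply the ro from the binary.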
Next, explore addNamedClass, which binds the class name to the address:
static void addNamedClass(Class cls, const char *name, Class replacing = nil)
{
    runtimeLock.assertLocked();
    Class old;
    if ((old = getClassExceptSomeSwift(name)) && old != replacing) {
        inform_duplicate(name, old, cls);
        // getMaybeUnrealizedNonMetaClass uses name lookups.
        // Classes not found by name lookup must be in the
        // secondary meta->nonmeta table.
        addNonMetaClass(cls);
    } else {
        // Update the gdb_objc_realized_classes table with key set to name and value set to cls
        NXMapInsert(gdb_objc_realized_classes, name, cls);
    }
    ASSERT(!(cls->data()->flags & RO_META));
    // wrong: constructed classes are already realized when they get here
    // ASSERT(!cls->isRealized());
}
This updates the gdb_objc_realized_classes hash table, where the key is name and the value is cls.
Now explore addClassTableEntry, which inserts into another table:
static void
addClassTableEntry(Class cls, bool addMeta = true)
{
    runtimeLock.assertLocked();
    // This class is allowed to be a known class via the shared cache or via
    // data segments, but it is not allowed to be in the dynamic
    // table already.
    // allocatedClasses
    auto &set = objc::allocatedClasses.get();
    ASSERT(set.find(cls) == set.end());
    if (!isKnownClass(cls))
        set.insert(cls);
    if (addMeta)
        // Insert the metaclass into the hash table
        addClassTableEntry(cls->ISA(), false);
}
- allocatedClasses is created in runtime_init inside _objc_init (the runtime environment initialization, mainly the two tables unattachedCategories and allocatedClasses); at this point the class is inserted into the allocatedClasses table
- addMeta = true also adds the metaclass to the allocatedClasses table
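The two tables serve different lookups: one maps a name to a class, the other records which class addresses are known. A compact sketch with hypothetical types, using standard containers in place of the runtime's own tables:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>

struct DemoClass {
    std::string name;
    DemoClass *meta;  // metaclass, or nullptr for the metaclass itself
};

static std::unordered_map<std::string, DemoClass *> demoNamedClasses;  // name -> class
static std::unordered_set<DemoClass *> demoAllocatedClasses;           // known addresses

// Bind the class name to the class address.
void demoAddNamedClass(DemoClass *cls) {
    assert(cls != nullptr);
    demoNamedClasses[cls->name] = cls;
}

// Record the class as known and, on the first call only, recurse once to
// record its metaclass as well.
void demoAddClassTableEntry(DemoClass *cls, bool addMeta = true) {
    assert(cls != nullptr);
    demoAllocatedClasses.insert(cls);
    if (addMeta && cls->meta != nullptr) {
        demoAddClassTableEntry(cls->meta, false);
    }
}
```

Passing false on the recursive call is what stops the recursion after the metaclass, mirroring the addMeta parameter in the excerpt.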
The assignment of rw and the retrieval of ro do not happen in readClass
The figure clearly shows that the breakpoints in the code that would assign them are never hit.
Class loading processing
Class loading is complex and important; today is only a brief introduction to the flow. The source is as follows:
The comment clearly says this step realizes non-lazy classes. What is a non-lazy class? One that implements the load method or has static instances.
In the figure, the breakpoint inside the custom check is not hit, because LWPerson is a lazily loaded class. Now add a load method to LWPerson:
- Once LWPerson is a non-lazy class, the breakpoint is hit
- nlclslist comes from the call to _getObjc2NonlazyClassList, which reads the non-lazy class list from the __objc_nlclslist section of the Mach-O file; let's quickly verify that
- realizeClassWithoutSwift may look familiar; it will be explored in more detail later
The __objc_nlclslist section contains only one entry, 0x0100004250. iOS is little-endian, so the stored bytes are read from right to left. The first entry in the __objc_classlist section is the LWPerson class, whose address is also 0x0100004250, so the entry in the non-lazy class list is the LWPerson class.
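Reading "from right to left" can be made concrete with a small decoder. The byte sequence used in the usage note is inferred from the address quoted above rather than copied from the Mach-O viewer, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assemble a 64-bit value from 8 bytes stored least-significant byte first,
// which is how a little-endian machine lays out a pointer in memory.
uint64_t demoReadLittleEndian64(const unsigned char *bytes) {
    assert(bytes != nullptr);
    uint64_t value = 0;
    for (size_t i = 0; i < 8; i++) {
        value |= static_cast<uint64_t>(bytes[i]) << (8 * i);
    }
    return value;
}
```

Given the bytes 50 42 00 00 01 00 00 00 in file order, the function yields 0x0000000100004250, the same address as 0x0100004250 with leading zeros written out.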
Conclusion
The whole path from dyld to _objc_init to _read_images is gradually coming together. A lot of knowledge is surfacing along the way, and the overall picture is getting clearer. Next up is the very important, detailed topic of class loading.