In the previous article on application loading, we reviewed how dyld loads an app. Next, we analyze the code in detail, using _objc_init as the entry point.
_objc_init analysis
void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;

    // fixme defer initialization until an objc-using image is found?
    // Initialize the environment variables; run `export OBJC_HELP=1` in a terminal to print their help
    environ_init();
    // Bind the thread keys, such as the per-thread data destructor
    tls_init();
    // Run the C++ static constructors. libc calls _objc_init() before dyld
    // would call our static constructors, so we have to do it ourselves
    static_init();
    // Initialize the runtime environment
    runtime_init();
    // Initialize the exception handling system
    exception_init();
#if __OBJC2__
    // Initialize the cache; only compiled in for Objective-C 2.0
    cache_t::init();
#endif
    // Initialize the trampoline machinery. This usually does nothing because
    // everything is initialized lazily, but for some processes the
    // trampolines dylib is loaded eagerly
    _imp_implementationWithBlock_init();

    _dyld_objc_notify_register(&map_images, load_images, unmap_image);

#if __OBJC2__
    didCallDyldNotifyRegister = true;
#endif
}
- environ_init(): initializes the environment variables; to print their help, run export OBJC_HELP=1.
- tls_init(): binds the thread keys, such as the per-thread data destructor.
- static_init(): runs the system-level C++ static constructors. libc calls _objc_init before dyld would call our static constructors, so objc has to run them itself.
- runtime_init(): initializes the runtime environment.
- exception_init(): initializes the exception handling system.
- cache_t::init(): initializes the cache; only effective for Objective-C 2.0.
- _imp_implementationWithBlock_init(): starts the trampoline (callback) machinery. It usually does nothing, because all the initialization is lazy, but for some processes the trampolines dylib is loaded eagerly.
environ_init
environ_init mainly initializes the runtime's environment variables, which can help us when debugging. The source code is as follows:
The core code is the final for loop. Pulling it out on its own shows what it prints:
The loop prints the environment variables that we may need for debugging during development.
Besides running the loop, we can print the environment variables by running export OBJC_HELP=1 in the terminal:
These environment variables can be configured in Xcode under Target > Edit Scheme > Run > Arguments > Environment Variables.
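The printing loop can be pictured with a small sketch. This is not the objc4 source: the Option struct, the kOptions table and formatHelp are hypothetical names, the descriptions are illustrative, and only the two variables discussed in this article are listed.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of the runtime's option table.
struct Option {
    const char *env;   // environment variable name
    const char *help;  // one-line description (illustrative wording)
};

// Only the two variables discussed in this article; the real table is longer.
static const std::vector<Option> kOptions = {
    {"OBJC_DISABLE_NONPOINTER_ISA", "disable non-pointer isa fields"},
    {"OBJC_PRINT_LOAD_METHODS", "log calls to class and category +load methods"},
};

// Build the help text: one "NAME: description" line per option.
std::string formatHelp(const std::vector<Option> &options) {
    std::string out;
    for (const Option &opt : options) {
        out += opt.env;
        out += ": ";
        out += opt.help;
        out += "\n";
    }
    return out;
}
```

Calling formatHelp(kOptions) returns one line per variable, which is roughly the shape of the output seen with OBJC_HELP=1.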
Take OBJC_DISABLE_NONPOINTER_ISA as an example.
The isa of the Person object before the environment variable is set:
Set the environment variable:
The isa of the Person object afterwards:
OBJC_DISABLE_NONPOINTER_ISA controls whether isa is a non-pointer isa.
Now let's try OBJC_PRINT_LOAD_METHODS:
Run the program:
With OBJC_PRINT_LOAD_METHODS set, all the load methods in the project are printed, which gives us a starting point for optimization (load methods affect program startup time).
tls_init
tls_init binds the thread keys, such as the per-thread data destructor. The source code is as follows:
static_init
libc calls _objc_init before dyld would call our static constructors, so objc has to run them itself. The source code is as follows:
In the previous analysis, we saw that C++ constructors are called during doInitialization -> doModInitFunctions, so why are they called here as well? The comment explains it: _objc_init runs before dyld would get to objc's own static constructors, so objc calls its C++ constructors itself, ahead of time, to make sure they have run before the rest of the initialization.
To verify this in code, we add a C++ constructor alongside the _objc_init method:
Set a breakpoint and run the source:
Continue execution:
You can see that the C++ constructor is indeed called in this method.
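As a rough picture of what "running the static constructors ourselves" means, here is a minimal sketch. It assumes the constructors are available as a plain array of function pointers; Initializer, kInitializers, initA, initB and runStaticInitializers are hypothetical names, not the objc4 implementation.

```cpp
#include <cstddef>

// Hypothetical initializer type: a plain function taking no arguments.
using Initializer = void (*)();

int gInitCount = 0;                 // records which initializers ran
void initA() { gInitCount += 1; }
void initB() { gInitCount += 10; }

// Stand-in for the list of C++ static constructors found in the library.
static const Initializer kInitializers[] = {initA, initB};

// Call every initializer once, in order; returns how many were called.
std::size_t runStaticInitializers() {
    const std::size_t count = sizeof(kInitializers) / sizeof(kInitializers[0]);
    for (std::size_t i = 0; i < count; i++) {
        kInitializers[i]();
    }
    return count;
}
```

The point of the sketch is only the ordering: the library walks its own initializer list explicitly instead of waiting for the loader to do it.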
runtime_init
runtime_init initializes the runtime environment. The source code is as follows:
From the init calls, we can see that unattachedCategories and allocatedClasses are two tables; more on them later.
exception_init
exception_init initializes the exception handling system. The source code is as follows:
The corresponding implementation of _objc_terminate:
A crash is a signal sent by the system after an abnormal instruction. When a crash occurs, _objc_terminate is called and the exception is passed to uncaught_handler.
At the app layer, we can define a function and assign it to fn, which becomes uncaught_handler; the app can then use this function to receive the exception and handle it.
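The "assign a function to fn" idea can be sketched as a replaceable handler pointer. This is a simplified model under stated assumptions, not the runtime's code: UncaughtHandler, setUncaughtHandler, simulateTerminate and appHandler are hypothetical names, and the exception is reduced to a string.

```cpp
#include <string>

// Hypothetical handler type: receives a description of the uncaught exception.
using UncaughtHandler = void (*)(const std::string &reason);

void defaultHandler(const std::string &) {}       // default: do nothing
static UncaughtHandler gUncaughtHandler = defaultHandler;

// Install fn as the handler and return the previous one.
UncaughtHandler setUncaughtHandler(UncaughtHandler fn) {
    UncaughtHandler previous = gUncaughtHandler;
    gUncaughtHandler = fn;
    return previous;
}

// Stand-in for the terminate path: hand the exception to the handler.
void simulateTerminate(const std::string &reason) {
    gUncaughtHandler(reason);
}

// App-side handler that records what it received.
std::string gLastReason;
void appHandler(const std::string &reason) { gLastReason = reason; }
```

After setUncaughtHandler(appHandler), a call to simulateTerminate reaches appHandler, which is the hook an app could use to log the exception before the process ends.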
The crash handling flow:
cache_t::init
cache_t::init initializes the cache. The source code is as follows:
_imp_implementationWithBlock_init
This starts the trampoline (callback) machinery. It usually does nothing, because all the initialization is lazy, but for some processes the trampolines dylib is loaded eagerly.
_dyld_objc_notify_register
_dyld_objc_notify_register(&map_images, load_images, unmap_image);
A natural question: why is there an & in front of map_images but not in front of load_images?
In C and C++, a function name used as an argument converts to a pointer to that function, so map_images and &map_images produce the same pointer. All three arguments are function pointers that dyld calls back later; the & is only a difference in writing style and does not change how the callbacks are invoked. What does differ is the work the callbacks do: load_images is relatively simple and mainly calls the load methods, while map_images has to map the image files, which is more complex and time-consuming.
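For reference, in C and C++ a function name passed as an argument converts to a pointer to that function, so writing the & is optional. The sketch below checks this with hypothetical names (Callback, registerCallbacks, mapImagesDemo, loadImagesDemo); it does not use the real dyld API.

```cpp
// Hypothetical callback type and registration function, for illustration only.
using Callback = int (*)(int);

static Callback gMapped = nullptr;
static Callback gInit = nullptr;

// Store both callbacks so they can be invoked later.
void registerCallbacks(Callback mapped, Callback init) {
    gMapped = mapped;
    gInit = init;
}

int mapImagesDemo(int n) { return n + 1; }
int loadImagesDemo(int n) { return n * 2; }

// True when passing f and &f yields the same function pointer.
bool sameWithOrWithoutAmpersand() {
    Callback a = mapImagesDemo;    // function name converts to a pointer
    Callback b = &mapImagesDemo;   // explicit address-of
    return a == b;
}
```

registerCallbacks(&mapImagesDemo, loadImagesDemo) mirrors the mixed style of the real call: both arguments arrive as ordinary function pointers.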
map_images calls map_images_nolock. The main implementation of this method is as follows:
void
map_images_nolock(unsigned mhCount, const char * const mhPaths[],
const struct mach_header * const mhdrs[])
{
static bool firstTime = YES;
header_info *hList[mhCount];
uint32_t hCount;
size_t selrefCount = 0;
// Perform first-time initialization if necessary.
// This function is called before ordinary library initializers.
// fixme defer initialization until an objc-using image is found?
if (firstTime) {
// Related initialization
preopt_init();
}
if (PrintImages) {
_objc_inform("IMAGES: processing %u newly-mapped images... \n", mhCount);
}
// Find all images with Objective-C metadata.
hCount = 0;
// Count classes. Size various table based on the total.
// Count the classes, then size the various tables based on the total
int totalClasses = 0;
int unoptimizedTotalClasses = 0;
{
uint32_t i = mhCount;
while (i--) {
const headerType *mhdr = (const headerType *)mhdrs[i];
auto hi = addHeader(mhdr, mhPaths[i], totalClasses, unoptimizedTotalClasses);
if (!hi) {
// no objc data in this entry
continue;
}
if (mhdr->filetype == MH_EXECUTE) {
// Size some data structures based on main executable's size
#if __OBJC2__
// If dyld3 optimized the main executable, then there shouldn't
// be any selrefs needed in the dynamic map so we can just init
// to a 0 sized map
if ( !hi->hasPreoptimizedSelectors() ) {
size_t count;
_getObjc2SelectorRefs(hi, &count);
selrefCount += count;
_getObjc2MessageRefs(hi, &count);
selrefCount += count;
}
#else
_getObjcSelectorRefs(hi, &selrefCount);
#endif
#if SUPPORT_GC_COMPAT
// Halt if this is a GC app.
if (shouldRejectGCApp(hi)) {
_objc_fatal_with_reason
(OBJC_EXIT_REASON_GC_NOT_SUPPORTED,
OS_REASON_FLAG_CONSISTENT_FAILURE,
"Objective-C garbage collection "
"is no longer supported.");
}
#endif
}
hList[hCount++] = hi;
if (PrintImages) {
_objc_inform("IMAGES: loading image for %s%s%s%s%s\n",
hi->fname(),
mhdr->filetype == MH_BUNDLE ? " (bundle)" : "",
hi->info()->isReplacement() ? " (replacement)" : "",
hi->info()->hasCategoryClassProperties() ? " (has class properties)" : "",
hi->info()->optimizedByDyld() ? " (preoptimized)" : "");
}
}
}
// Perform one-time runtime initialization that must be deferred until
// the executable itself is found. This needs to be done before
// further initialization.
// (The executable may not be present in this infoList if the
// executable does not contain Objective-C code but Objective-C
// is dynamically loaded later.
if (firstTime) {
sel_init(selrefCount);
arr_init();
#if SUPPORT_GC_COMPAT
// Reject any GC images linked to the main executable.
// We already rejected the app itself above.
// Images loaded after launch will be rejected by dyld.
for (uint32_t i = 0; i < hCount; i++) {
auto hi = hList[i];
auto mh = hi->mhdr();
if (mh->filetype != MH_EXECUTE && shouldRejectGCImage(mh)) {
_objc_fatal_with_reason
(OBJC_EXIT_REASON_GC_NOT_SUPPORTED,
OS_REASON_FLAG_CONSISTENT_FAILURE,
"%s requires Objective-C garbage collection "
"which is no longer supported.", hi->fname());
}
}
#endif
#if TARGET_OS_OSX
// Disable +initialize fork safety if the app is too old (< 10.13).
// Disable +initialize fork safety if the app has a
// __DATA,__objc_fork_ok section.
// if (!dyld_program_sdk_at_least(dyld_platform_version_macOS_10_13)) {
// DisableInitializeForkSafety = true;
// if (PrintInitializing) {
// _objc_inform("INITIALIZE: disabling +initialize fork "
// "safety enforcement because the app is "
// "too old.)");
// }
// }
for (uint32_t i = 0; i < hCount; i++) {
auto hi = hList[i];
auto mh = hi->mhdr();
if (mh->filetype != MH_EXECUTE) continue;
unsigned long size;
if (getsectiondata(hi->mhdr(), "__DATA", "__objc_fork_ok", &size)) {
DisableInitializeForkSafety = true;
if (PrintInitializing) {
_objc_inform("INITIALIZE: disabling +initialize fork "
"safety enforcement because the app has "
"a __DATA,__objc_fork_ok section");
}
}
break; // assume only one MH_EXECUTE image
}
#endif
}
if (hCount > 0) {
// Load the image file
_read_images(hList, hCount, totalClasses, unoptimizedTotalClasses);
}
firstTime = NO;
// Call image load funcs after everything is set up.
// When everything is ready, call the image load function
for (auto func : loadImageFuncs) {
for (uint32_t i = 0; i < mhCount; i++) {
func(mhdrs[i]);
}
}
}
- preopt_init(): initializes the relevant environment.
- hCount: the number of images that contain Objective-C metadata; it is incremented in the while loop by hList[hCount++] = hi.
- totalClasses: the number of classes; the various tables are sized based on this total.
- _read_images: loads the image files.
- loadImageFuncs: once everything is set up, the image load functions are called.
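The counting loop in map_images_nolock can be reduced to the following sketch. It keeps only the shape of the loop (walk the images from last to first, skip those without Objective-C data, collect the rest and total their classes); DemoImage, Collected and collectObjCImages are hypothetical names and the per-image data is made up for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified image record.
struct DemoImage {
    const char *path;
    bool hasObjC;      // does the image contain Objective-C metadata?
    int classCount;    // number of classes it declares
};

struct Collected {
    std::vector<const DemoImage *> list;  // plays the role of hList
    int totalClasses;                     // plays the role of totalClasses
};

Collected collectObjCImages(const DemoImage *images, std::uint32_t count) {
    Collected out{{}, 0};
    std::uint32_t i = count;
    while (i--) {                         // walk from the last image to the first
        const DemoImage &img = images[i];
        if (!img.hasObjC) continue;       // no objc data in this entry
        out.totalClasses += img.classCount;
        out.list.push_back(&img);         // same role as hList[hCount++] = hi
    }
    return out;
}
```

Because the loop runs backwards, the collected list is in reverse order of the input array; its size corresponds to hCount.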
So how exactly are the images loaded here? The core is this call:
_read_images(hList, hCount, totalClasses, unoptimizedTotalClasses);
_read_images process (key point)
This method looks complicated, so let's collapse all the branches and look at the function as a whole:
The _read_images function contains many log statements. Based on them, its main flow breaks down as follows:
- 1. Condition control for the first load
- 2. Fix up @selector references confused at precompile time
- 3. Fix up wrong or messed-up classes
- 4. Remap classes that were not loaded by the image file
- 5. Fix up some messages
- 6. When a class has protocols: readProtocol
- 7. Fix up protocols that were not loaded
- 8. Process categories
- 9. Class loading
- 10. Handle unprocessed classes: realize newly resolved future classes
The following is a step-by-step analysis of the process:
1. Condition control for the first load
- initializeTaggedPointerObfuscator: handles tagged pointers (small objects), mainly by applying some obfuscation; not the focus here.
- namedClassesSize: computes the size of the table to create. This is the inverse of the 3/4 load factor seen earlier in cache: the capacity is the class count times 4/3.
- NXCreateMapTable: creates a hash table of size namedClassesSize to store class information.
The table created by NXCreateMapTable is gdb_objc_realized_classes:
// This is a misnomer: gdb_objc_realized_classes is actually a list of
// named classes not in the dyld shared cache, whether realized or not.
// This list excludes lazily named classes, which have to be looked up
// using a getClass hook.
NXMapTable *gdb_objc_realized_classes; // exported for debuggers in objc-gdb.h
In other words, this hash table stores the named classes that are not in the shared cache, whether they are realized or not, and its total capacity is four thirds of the number of classes.
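The four-thirds sizing is simple integer arithmetic. A minimal sketch, assuming the capacity is computed as count * 4 / 3 with integer division (namedClassesSizeFor is a hypothetical helper name):

```cpp
// Capacity for a table meant to stay at most about 3/4 full:
// count * 4 / 3, using integer division.
int namedClassesSizeFor(int classCount) {
    return classCount * 4 / 3;
}
```

For example, 300 classes give a capacity of 400, so the table is three quarters full once every class has been added.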
2. Fix up @selector references confused at precompile time
Fix up the @selector references; a sel is not a plain string but a string with an address.
- _getObjc2SelectorRefs: gets the __objc_selrefs static section from the Mach-O.
- sel_registerNameNoLock: registers the sel, adding it to the namedSelectors hash table.
- sels[i] = sel: the sel address read from the Mach-O is not the real address and has to be reassigned; the address read from dyld prevails.
This block of code gets the __objc_selrefs static section from the Mach-O through _getObjc2SelectorRefs, traverses the list, and adds each sel to the hash table through sel_registerNameNoLock. The sel returned by sel_registerNameNoLock is the one read from dyld.
The source code of _getObjc2SelectorRefs is as follows:
GETSECT(_getObjc2SelectorRefs, SEL, "__objc_selrefs");
It returns the __objc_selrefs static section of the Mach-O.
Both entries refer to the same retain method, but their addresses differ, so the address has to be reassigned.
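The fix-up amounts to uniquing names: every reference with the same name should end up holding the same address. A minimal sketch of that idea follows, using a standard hash set in place of the runtime's selector table; gNamedSelectors, registerSelectorName and fixupSelectorRefs are hypothetical names.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the selector table: owns one copy of each name.
static std::unordered_set<std::string> gNamedSelectors;

// Return the canonical pointer for name, inserting the name on first use.
const char *registerSelectorName(const char *name) {
    auto result = gNamedSelectors.insert(name);
    return result.first->c_str();
}

// Rewrite each reference so that equal names share one address.
void fixupSelectorRefs(const char **sels, int count) {
    for (int i = 0; i < count; i++) {
        sels[i] = registerSelectorName(sels[i]);
    }
}
```

Two separate copies of "retain" start with different addresses; after fixupSelectorRefs both references point at the single copy owned by the table, so comparing selectors becomes a pointer comparison.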
3. Fix up wrong or messed-up classes (give the class a name as well as an address)
- Class cls = (Class)classlist[i]: gets cls; at this point cls is just an address.
- readClass: reads the class; after this call cls really has a name.
Print cls at a breakpoint:
After executing readClass:
Breakpoint debugging shows that after the readClass step, the class has both an address and a name.
if (newCls != cls && newCls) {
// Class was moved but not deleted. Currently this occurs
// only when the new class resolved a future class.
// Non-lazily realize the class below.
resolvedFutureClasses = (Class *)realloc(resolvedFutureClasses, (resolvedFutureClassCount+1) * sizeof(Class));
resolvedFutureClasses[resolvedFutureClassCount++] = newCls;
}
Breakpoint debugging shows that this branch is not executed here. It deals with future classes: according to the source comment, the branch is taken when a class was moved but not deleted, which currently happens only when the new class resolved a future class. Such a class is recorded in resolvedFutureClasses and realized non-lazily later.
4. Remap classes that were not loaded by the image file
Remap the unmapped classes and superclasses:
- _getObjc2ClassRefs: gets the __objc_classrefs static section from the Mach-O, i.e. the references to classes.
- _getObjc2SuperRefs: gets the __objc_superrefs static section from the Mach-O, i.e. the references to superclasses.
The classes that remapClassRef operates on are lazily loaded classes.
5. Fix up some messages
- _getObjc2MessageRefs: gets the __objc_msgrefs static section from the Mach-O.
- fixupMessageRef: registers the function pointer and fixes it up to the new pointer.
This step gets __objc_msgrefs from the Mach-O through _getObjc2MessageRefs, traverses it, and uses fixupMessageRef to register each function pointer and fix it up to the new pointer.
6. When a class has protocols: readProtocol
All protocol lists are traversed and loaded into the protocol hash table.
- Class cls = (Class)&OBJC_CLASS_$_Protocol: cls is the Protocol class. Protocols have a structure similar to objects, and their isa corresponds to the Protocol class.
- NXMapTable *protocol_map = protocols(): creates the protocol hash table, protocol_map.
- _getObjc2ProtocolList: gets the __objc_protolist static section from the Mach-O, i.e. the protocol list emitted by the compiler, from which the protocols are read and initialized.
- readProtocol: called in a loop to add each protocol to the protocol_map hash table.
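Loading protocols into a hash table keyed by name can be sketched like this. The sketch uses a standard map instead of NXMapTable, and keeping the first definition when two protocols share a name is an assumption of the sketch; DemoProtocol, gProtocolMap and readProtocolDemo are hypothetical names.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical, minimal protocol record.
struct DemoProtocol {
    std::string name;
};

// Stand-in for protocol_map: protocol name -> protocol.
static std::unordered_map<std::string, DemoProtocol *> gProtocolMap;

// Add proto under its name and return the protocol stored for that name.
// If the name is already present, the existing entry is kept (assumption).
DemoProtocol *readProtocolDemo(DemoProtocol *proto) {
    auto result = gProtocolMap.emplace(proto->name, proto);
    return result.first->second;
}
```

Calling readProtocolDemo for every entry of a protocol list leaves exactly one protocol per name in the map.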
7. Fix up protocols that were not loaded
- _getObjc2ProtocolRefs: gets the __objc_protorefs static section from the Mach-O.
- remapProtocolRef: compares the current protocol with the protocol at the same memory address in the protocol list, and replaces it if they differ.
The source code of remapProtocolRef is as follows:
/***********************************************************************
* remapProtocolRef
* Fix up a protocol ref, in case the protocol referenced has been reallocated.
* Locking: runtimeLock must be read- or write-locked by the caller
**********************************************************************/
static size_t UnfixedProtocolReferences;
static void remapProtocolRef(protocol_t **protoref)
{
    runtimeLock.assertLocked();
    // Get the protocol stored at the same memory address in the protocol list
    protocol_t *newproto = remapProtocol((protocol_ref_t)*protoref);
    if (*protoref != newproto) {
        // The current protocol differs from the one in the list, so replace it
        *protoref = newproto;
        UnfixedProtocolReferences++;
    }
}
_getObjc2ProtocolRefs gets __objc_protorefs from the Mach-O, and the code then traverses the protocol references that need fixing. For each one, remapProtocolRef checks whether the current protocol is the same as the protocol at the same memory address in the protocol list and, if not, replaces it.
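The compare-and-replace step can be tried out in isolation with the sketch below. The lookup is modelled as a map from a stale pointer to its canonical copy, which is an assumption of the sketch; Protocol, gRemapped, remapProtocolDemo and remapProtocolRefDemo are hypothetical names.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical, minimal protocol record.
struct Protocol {
    const char *name;
};

// Hypothetical table: maps a stale protocol pointer to its canonical copy.
static std::unordered_map<Protocol *, Protocol *> gRemapped;
static std::size_t gFixedRefs = 0;   // how many references were replaced

// Return the canonical protocol for proto (proto itself if not remapped).
Protocol *remapProtocolDemo(Protocol *proto) {
    auto it = gRemapped.find(proto);
    return it == gRemapped.end() ? proto : it->second;
}

// Replace *ref only when it differs from the canonical protocol.
void remapProtocolRefDemo(Protocol **ref) {
    Protocol *canonical = remapProtocolDemo(*ref);
    if (*ref != canonical) {
        *ref = canonical;
        gFixedRefs++;
    }
}
```

A reference that already points at the canonical protocol is left untouched, so the counter only reflects references that actually needed fixing.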
8. Process categories
This step processes categories. It can only run after the classes have been initialized and their data has been loaded. Categories present at startup are discovered only at the first load_images call after _dyld_objc_notify_register completes.
9. Class loading (returns the real data structure of a class)
This step realizes the non-lazy classes, i.e. classes that implement a load method or have static instances.
The source code of the nlclslist method:
const classref_t *header_info::nlclslist(size_t *outCount) const
{
#if __OBJC2__
// This field is new, so temporarily be resilient to the shared cache
// not generating it
if (isPreoptimized() && hasPreoptimizedSectionLookups()) {
*outCount = nlclslist_count;
const classref_t *list = (const classref_t *)(((intptr_t)&nlclslist_offset) + nlclslist_offset);
#if DEBUG
size_t debugCount;
assert((list == _getObjc2NonlazyClassList(mhdr(), &debugCount)) && (*outCount == debugCount));
#endif
return list;
}
return _getObjc2NonlazyClassList(mhdr(), outCount);
#else
return NULL;
#endif
}
The source code of the addClassTableEntry method:
/***********************************************************************
* addClassTableEntry
* Add a class to the table of all classes. If addMeta is true,
* automatically adds the metaclass of the class as well.
* Locking: runtimeLock must be held by the caller.
**********************************************************************/
static void
addClassTableEntry(Class cls, bool addMeta = true)
{
    runtimeLock.assertLocked();

    // This class is allowed to be a known class via the shared cache or via
    // data segments, but it is not allowed to be in the dynamic table already.
    auto &set = objc::allocatedClasses.get();

    ASSERT(set.find(cls) == set.end());

    if (!isKnownClass(cls))
        set.insert(cls);
    if (addMeta)
        addClassTableEntry(cls->ISA(), false);
}
- _getObjc2NonlazyClassList: gets the __objc_nlclslist static section from the Mach-O, i.e. the table of non-lazily loaded classes.
- addClassTableEntry(cls): inserts the class and its metaclass into the table.
- realizeClassWithoutSwift: initializes the class (in step 3 it only had a name and an address; its data was not loaded), assigns its read-write data (such as rw), and returns the real structure of the class.
Note that this applies to non-lazily loaded classes.
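The "class plus metaclass" insertion can be sketched on its own. This simplified version drops the locking and the known-class check of the real function; DemoClass, gAllocatedClasses and addClassTableEntryDemo are hypothetical names.

```cpp
#include <unordered_set>

// Hypothetical minimal class record: isa points to the metaclass.
struct DemoClass {
    const char *name;
    DemoClass *isa;
};

// Stand-in for the table of allocated classes.
static std::unordered_set<DemoClass *> gAllocatedClasses;

// Insert cls into the table; when addMeta is true, insert its metaclass too.
void addClassTableEntryDemo(DemoClass *cls, bool addMeta = true) {
    gAllocatedClasses.insert(cls);
    if (addMeta) {
        // The recursive call passes false, so the chain stops at the metaclass.
        addClassTableEntryDemo(cls->isa, false);
    }
}
```

One call for a class therefore adds exactly two entries: the class itself and the metaclass reached through isa.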
10. Handle unprocessed classes: realize newly resolved future classes
- realizeClassWithoutSwift: realizes the class.
- realizeAllClasses: realizes all classes.
Having walked through the whole flow, we can see that the core content is in readClass (step 3) and realizeClassWithoutSwift (step 9).