
Background

These days it's hard for an iOS developer to walk into an interview without having read some Runtime source code; topics like Method Swizzling come up again and again, and what you find online is much the same. So let's set the interview angle aside: if we dig into the weak-related source code in the Runtime, what new things can we learn? Here I'll try to read the source from some other angles. You're welcome to explore along.

Preparation

The Runtime source code version used here is 750; the official source can be downloaded from Apple's website. This article uses the buildable version provided on GitHub, which compiles and runs with Xcode 10.1. Since the Runtime is written in C/C++, some knowledge of C++ is necessary.

A glimpse

Start with comments

Why start with comments? Comments are an important part of the code. For logically complex code, core code, 'magic' code, exposed interfaces, and so on, good comments are a valuable supplement, and an asset you leave for whoever studies or takes over the code. We know Apple's Runtime is open source, but how is that open source code commented? What can we take away as a reference? Open the objc-weak.h header and let's look at a few examples:

Design specification

/* The weak table is a hash table governed by a single spin lock.
   An allocated blob of memory, most often an object, but under GC any such
   allocation, may have its address stored in a __weak marked storage location
   through use of compiler generated write-barriers or hand coded uses of the
   register weak primitive. Associated with the registration can be a callback
   block for the case when one of the allocated chunks of memory is reclaimed.
   The table is hashed on the address of the allocated memory. When __weak
   marked memory changes its reference, we count on the fact that we can still
   see its previous reference.

   So, in the hash table, indexed by the weakly referenced item, is a list of
   all locations where this address is currently being stored.

   For ARC, we also keep track of whether an arbitrary object is being
   deallocated by briefly placing it in the table just prior to invoking
   dealloc, and removing it via objc_clear_deallocating just prior to memory
   reclamation. */

A rough translation: the weak reference table is a hash table guarded by a single spinlock (i.e., thread-safe). An allocated block of memory, usually an object (under GC, any allocation), can have its address stored in a __weak-marked storage location through compiler-generated write barriers or hand-coded uses of the register-weak primitive…

Type specification

// The address of a __weak variable.
// These pointers are stored disguised so memory analysis tools
// don't see lots of interior pointers from the weak table into objects.
typedef DisguisedPtr<objc_object *> weak_referrer_t;

The pointers are stored in disguised form so that memory analysis tools don't report lots of interior pointers from the weak table into objects: the tools see only opaque values, while the weak table can still recover the original addresses.
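The disguising itself is a neat trick worth a look. As far as I can tell, DisguisedPtr stores the arithmetic negation of the pointer's bits; here is a simplified sketch of that idea (not the runtime's exact code, which lives in objc-private.h):

#include <assert.h>
#include <stdint.h>

// Sketch: store -(uintptr_t)ptr so naive memory scans don't recognize the
// stored value as a pointer, while the original address stays recoverable.
template <typename T>
class DisguisedPtrSketch {
    uintptr_t value;   // negated pointer bits; nullptr still encodes as 0
public:
    DisguisedPtrSketch(T *ptr) : value(-(uintptr_t)ptr) {}
    operator T*() const { return (T *)-value; }
};

int main() {
    int x = 42;
    DisguisedPtrSketch<int> p(&x);
    int *q = p;   // undisguise via the conversion operator
    assert(q == &x && *q == 42);
    return 0;
}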

Class/struct specification

/**
 * The global weak references table. Stores object ids as keys,
 * and weak_entry_t structs as their values.
 */
struct weak_table_t {
    weak_entry_t *weak_entries;
    size_t    num_entries;
    uintptr_t mask;
    uintptr_t max_hash_displacement;
};

The comment explains clearly what the weak_table_t struct is for.

Constant specification

// out_of_line_ness field overlaps with the low two bits of inline_referrers[1].
// inline_referrers[1] is a DisguisedPtr of a pointer-aligned address.
// The low two bits of a pointer-aligned DisguisedPtr will always be 0b00
// (disguised nil or 0x80..00) or 0b11 (any other address).
// Therefore out_of_line_ness == 0b10 is used to mark the out-of-line state.
#define REFERRERS_OUT_OF_LINE 2

The comment here makes it clear why the constant is defined as 2.

Interface specification

/// Adds an (object, weak pointer) pair to the weak table.
id weak_register_no_lock(weak_table_t *weak_table, id referent, 
                         id *referrer, bool crashIfDeallocating);


The description here is also clear: add an (object, weak pointer) pair to the weak reference table. There are two details: the _no_lock suffix indicates the function does no locking itself, so it is not thread-safe unless the caller holds the lock; the crashIfDeallocating parameter controls whether calling weak_register_no_lock while the object is deallocating triggers a crash.

Summary

When we write business code we generally don't pay much attention to comments, including comments in SDKs handed to other teams, which are often incomplete. From open source code like this we can learn how to write and maintain good comments. Code maintained this way is much clearer for whoever comes next, and leaves us less "technical customer service" work. I also used to skip the comments and go straight to the code. After reading these comments closely, you get a better sense of the code's history and intent, and perhaps some unexpected insights. Going back to GitHub and looking at common third-party libraries, I found that the most popular open source libraries are not stingy with comments either.

The use of union

The source code

struct weak_entry_t {
    DisguisedPtr<objc_object> referent;
    union {
        struct {
            weak_referrer_t *referrers;
            uintptr_t        out_of_line_ness : 2;
            uintptr_t        num_refs : PTR_MINUS_2;
            uintptr_t        mask;
            uintptr_t        max_hash_displacement;
        };
        struct {
            // out_of_line_ness field is low bits of inline_referrers[1]
            weak_referrer_t  inline_referrers[WEAK_INLINE_COUNT];
        };
    };
    // ...
};

The weak_entry_t struct uses a union. Why a union here? And why bitfields inside the struct? Let's look at them one by one.

An introduction to union

A union? It's used a lot in C/C++ but rarely in day-to-day iOS development, so personally it felt a little novel. What is it good for? Let's start with an example:

struct s1 {
    char mark;
    int num;
    double score;
};
union u1 {
    char mark;
    int num;
    double score;
};

Here we define a struct s1 and a union u1. On Mac (x86_64), sizeof(struct s1) and sizeof(union u1) give 16 and 8. For s1 on an x86_64 machine: char is 1 byte, int is 4 bytes, and double is 8 bytes. Because of struct memory alignment, char is padded out to int's 4-byte alignment, so the total is 4 + 4 + 8 = 16. For u1, all members share the same storage, so its size is that of the widest member, the 8-byte double. In other words, the intuitive advantage of a union is memory savings, which is a considerable optimization for weak operations, which happen relatively frequently.

But there are some caveats to using union:

  1. The same piece of memory can hold members of several different types, but only one of them at any given time. In other words, only one member is active at a time; the members never exist side by side.
  2. The active member of a union variable is the one most recently stored. Once a new member is written, the bytes of the previous member are overwritten and lost.

So you need to be very careful with union, otherwise it’s easy to make mistakes.
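A minimal demo of the second caveat, using the u1 union from the example above:

#include <stdio.h>

union u1 {
    char mark;
    int num;
    double score;
};

int main() {
    union u1 u;
    u.num = 65;               // num is now the active member
    printf("%d\n", u.num);    // prints 65
    u.score = 3.14;           // writing score reuses the same bytes;
                              // the previously stored num is gone
    printf("%f\n", u.score);  // prints 3.140000
    // Reading u.num at this point would just reinterpret part of the
    // double's bytes -- exactly the kind of mistake to watch out for.
    return 0;
}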

Analysis of the union in weak_entry_t

Now let's look at how the union is used in weak_entry_t. We can run the Runtime project and inspect it at a breakpoint:

(lldb) po sizeof(weak_entry_t)
40

Let’s analyze:

  • DisguisedPtr<objc_object> takes 8 bytes.
  • In the first struct of the union: referrers is 8 bytes; out_of_line_ness is 2 bits and num_refs is 62 bits, so together they add up to 8 bytes; mask is 8 bytes; max_hash_displacement is 8 bytes. So 8 * 4 = 32 bytes.
  • In the second struct of the union, WEAK_INLINE_COUNT is 4, so inline_referrers takes 4 * 8 = 32 bytes.
  • Both structs take 32 bytes, so the total is 8 + 32 = 40 bytes.
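We can sanity-check this arithmetic with a standalone sketch. The types below are stand-ins, not the runtime's (LP64 assumed; anonymous structs inside a union are a Clang/GCC extension, the same one the runtime source relies on):

#include <stdint.h>

#define WEAK_INLINE_COUNT 4
#define PTR_MINUS_2 62

typedef void *weak_referrer_sketch_t;   // pointer-sized, like weak_referrer_t

struct weak_entry_sketch_t {
    void *referent;                                  // 8 bytes
    union {
        struct {
            weak_referrer_sketch_t *referrers;       // 8 bytes
            uintptr_t out_of_line_ness : 2;          // 2 bits
            uintptr_t num_refs : PTR_MINUS_2;        // + 62 bits = 8 bytes
            uintptr_t mask;                          // 8 bytes
            uintptr_t max_hash_displacement;         // 8 bytes
        };
        struct {
            weak_referrer_sketch_t inline_referrers[WEAK_INLINE_COUNT]; // 32 bytes
        };
    };
};

static_assert(sizeof(weak_entry_sketch_t) == 40,
              "8 (referent) + 32 (union) on LP64");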

The use of bitfields

Another trick we see in the source code is the use of bitfields:

uintptr_t        out_of_line_ness : 2;
uintptr_t        num_refs : PTR_MINUS_2;

The advantage, of course, is memory savings: out_of_line_ness is only a flag bit, so there is no need for it to store large numbers; two bits are enough, as we can see from the comment:

// out_of_line_ness field overlaps with the low two bits of inline_referrers[1].
// inline_referrers[1] is a DisguisedPtr of a pointer-aligned address.
// The low two bits of a pointer-aligned DisguisedPtr will always be 0b00
// (disguised nil or 0x80..00) or 0b11 (any other address).
// Therefore out_of_line_ness == 0b10 is used to mark the out-of-line state.
#define REFERRERS_OUT_OF_LINE 2

That is, the code only ever checks out_of_line_ness == REFERRERS_OUT_OF_LINE, so two bits suffice, and the remaining 62 bits are plenty for the num_refs reference count. The memory savings are pushed to the extreme.
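The packing itself is easy to check in a standalone sketch (LP64 assumed):

#include <stdint.h>

// A 2-bit flag and a 62-bit counter share one pointer-sized word,
// just like out_of_line_ness and num_refs in weak_entry_t.
struct packed_sketch {
    uintptr_t flag  : 2;
    uintptr_t count : 62;
};

static_assert(sizeof(packed_sketch) == sizeof(uintptr_t),
              "both fields fit in a single 8-byte word on LP64");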

The rationale of the design

We know that only one struct is active in a union at a time, so here’s the code:

id obj = [[NSObject alloc] init];
// The first 4 weak references live in inline_referrers,
// i.e. the second struct of the union is active.
__weak id weakObj1 = obj;
__weak id weakObj2 = obj;
__weak id weakObj3 = obj;
__weak id weakObj4 = obj;
// From the 5th on, storage spills to referrers,
// i.e. the first struct of the union takes over.
__weak id weakObj5 = obj;

Therefore, when an object has few weak references (<= 4), they are stored directly in the inline array, which is fast and needs no extra allocation. When the count grows beyond that, storage is reallocated on the heap to make room, which must be managed manually and is slower and logically more complex. Most objects never exceed the threshold, so the design balances memory and performance.
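The same "inline first, spill to the heap" pattern, reduced to a standalone sketch with hypothetical names (this is not runtime API; the struct is assumed to start zero-filled):

#include <stdlib.h>
#include <string.h>

enum { kInline = 4 };

struct tiny_set {
    void  *inline_items[kInline];  // used while count <= kInline
    void **heap_items;             // NULL until we spill to the heap
    size_t count, capacity;
};

static void tiny_set_add(struct tiny_set *s, void *item) {
    if (s->heap_items == NULL && s->count < kInline) {
        s->inline_items[s->count++] = item;       // fast path: no allocation
        return;
    }
    if (s->heap_items == NULL) {                  // first spill: move inline items out
        s->capacity = kInline * 2;
        s->heap_items = (void **)calloc(s->capacity, sizeof(void *));
        memcpy(s->heap_items, s->inline_items, sizeof(s->inline_items));
    } else if (s->count == s->capacity) {         // grow when full
        s->capacity *= 2;
        s->heap_items = (void **)realloc(s->heap_items, s->capacity * sizeof(void *));
    }
    s->heap_items[s->count++] = item;             // slow path: heap storage
}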

Design HashMap

Warm up

Let's look at LeetCode's Design HashMap problem. If memory optimization is not a concern, it can be implemented with an array on the order of 1,000,000 entries, using the key itself as the array index and storing the value at that slot. But if memory is a consideration, a fast hash function and hash-collision handling are essential. I wrote a Golang solution to this problem, and it currently passes LeetCode's tests.
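Since this article's code is all C/C++, here is the no-optimization version sketched in C++ rather than Golang (assuming the problem's constraints: keys and values lie in [0, 1000000], so the key can index the array directly and -1 can mark an absent key):

#include <vector>

class MyHashMap {
    std::vector<int> slots;
public:
    MyHashMap() : slots(1000001, -1) {}            // one slot per possible key
    void put(int key, int value) { slots[key] = value; }
    int  get(int key)            { return slots[key]; }   // -1 if absent
    void remove(int key)         { slots[key] = -1; }
};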

The processing of weak_table_t

weak_table_t is in essence a hash map too, and the solution above partly borrows from its implementation. There is still plenty worth pondering and learning in how weak_table_t is implemented.

Fast hash function

The hash function matters because ultimately we need to map keys to array subscripts; a fast hash function keeps frequent operations efficient. At the same time it needs to be uniform and random, which reduces the probability of hash collisions and keeps storage efficient.

Let's look at the hash function the Runtime uses:


// ...
size_t begin = w_hash_pointer(new_referrer) & (entry->mask);
// ...

/**
 * Unique hash function for weak object pointers only.
 *
 * @param key The weak object pointer.
 *
 * @return Size unrestricted hash of pointer.
 */
static inline uintptr_t w_hash_pointer(objc_object **key) {
    return ptr_hash((uintptr_t)key);
}

// Pointer hash function.
// This is not a terrific hash, but it is fast 
// and not outrageously flawed for our purposes.

// Based on principles from http://locklessinc.com/articles/fast_hash/
// and evaluation ideas from http://floodyberry.com/noncryptohashzoo/
#if __LP64__
static inline uint32_t ptr_hash(uint64_t key)
{
    key ^= key >> 4;
    key *= 0x8a970be7488fda55;
    key ^= __builtin_bswap64(key);
    return (uint32_t)key;
}
#else
static inline uint32_t ptr_hash(uint32_t key)
{
    key ^= key >> 4;
    key *= 0x5052acdb;
    key ^= __builtin_bswap32(key);
    return key;
}
#endif


ptr_hash is that fast hash function. Admittedly I don't fully understand the logic behind it; locklessinc.com/articles/fa… (the reference given in the comment) can be consulted, though I won't delve into the details, as the article is fairly involved and even drops into assembly. As the comment says, ptr_hash is not a terrific hash, but it is fast and not outrageously flawed for its purposes; there's no telling how much trial and measurement Apple put into it. And as the commented-out code below shows, even this simple function was carefully weighed against alternatives.

/*
  Higher-quality hash function. This is measurably slower in some workloads.
#if __LP64__
uint32_t ptr_hash(uint64_t key)
{
    key -= __builtin_bswap64(key);
    key *= 0x8a970be7488fda55;
    key ^= __builtin_bswap64(key);
    key *= 0x8a970be7488fda55;
    key ^= __builtin_bswap64(key);
    return (uint32_t)key;
}
#else
static uint32_t ptr_hash(uint32_t key)
{
    key -= __builtin_bswap32(key);
    key *= 0x5052acdb;
    key ^= __builtin_bswap32(key);
    key *= 0x5052acdb;
    key ^= __builtin_bswap32(key);
    return key;
}
#endif
*/

There is one more detail in the source worth studying:

size_t begin = w_hash_pointer(new_referrer) & (entry->mask);

Note that bitwise AND (&) is used instead of the usual modulo (%) operation. Why does that work? The weak table's capacity is always grown in powers of 2, so the mask (capacity - 1) is all 1s in its low bits, and h & mask equals h % capacity. Keep in mind this trick only works when the divisor is a power of two. Bitwise AND is cheaper than modulo, so this is a nice little optimization.
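A quick check of the equivalence, and of where it breaks down:

#include <assert.h>
#include <stdint.h>

int main() {
    uint64_t h = 0x8a970be7488fda55;   // any hash value
    // For a power-of-two size, mask = size - 1 makes & equivalent to %:
    for (uint64_t size = 1; size <= 1024; size *= 2) {
        assert((h % size) == (h & (size - 1)));
    }
    // Not true for other sizes, e.g. 10 % 6 == 4 but (10 & 5) == 0.
    return 0;
}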

Hash collisions

As mentioned above, ptr_hash is not a perfect hash function (there is no perfect hash function), so hash collisions must be handled. The two main approaches are open addressing and separate chaining; the weak table uses open addressing. Let's walk through the source:

if (entry->num_refs >= TABLE_SIZE(entry) * 3/4) {
    return grow_refs_and_insert(entry, new_referrer);
}
size_t begin = w_hash_pointer(new_referrer) & (entry->mask);
size_t index = begin;
size_t hash_displacement = 0;
while (entry->referrers[index] != nil) {
    hash_displacement++;
    index = (index+1) & entry->mask;
    if (index == begin) bad_weak_table(entry);
}
if (hash_displacement > entry->max_hash_displacement) {
    entry->max_hash_displacement = hash_displacement;
}
weak_referrer_t &ref = entry->referrers[index];
ref = new_referrer;
entry->num_refs++;

The hash function plus the mask yields the array subscript. If that slot is already occupied, the index is advanced by 1 (wrapping via the mask) until a free slot is found, with some boundary-condition checks along the way. This is linear probing. Besides it there are two other classical schemes, quadratic probing and double hashing, which I won't introduce here; Google them if you're interested. Linear probing has a notable drawback: as more data is inserted into the table, collisions become more likely, free slots get scarcer, and probe sequences get longer. To mitigate this, the code checks the table's occupancy up front and grows it once it reaches 3/4 of capacity.

The code is short, but the design is thoughtful; it rewards careful reading.

Delete operation

With linear probing, deletion also requires special care: you cannot simply empty the element's slot. Lookup treats the first empty slot it meets as "not found", so punching a hole in the middle of a probe chain would cut off entries stored beyond it, and existing data would be reported as missing. How is this solved? Let's go straight to the source:

/**
 * Remove entry from the zone's table of weak references.
 */
static void weak_entry_remove(weak_table_t *weak_table, weak_entry_t *entry)
{
    // remove entry
    if (entry->out_of_line()) free(entry->referrers);
    bzero(entry, sizeof(*entry));

    weak_table->num_entries--;

    weak_compact_maybe(weak_table);
}

The curious thing here is that after free(entry->referrers), the entry is not marked with a "deleted" tombstone; bzero simply wipes it to zero. This is safe because, as the read operation below shows, lookup does not stop at the first empty slot; it is bounded by max_hash_displacement instead. And if deleted slots were kept as markers rather than emptied, repeated inserts and deletes would leave stale slots occupying space, causing a lot of waste; so weak_entry_remove finally calls weak_compact_maybe, which compacts the table once the unused space reaches a threshold.
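To make the hazard from the delete discussion concrete, here is a toy linear-probe table (hypothetical code, not the runtime's) where a naive delete hides a live key from a lookup that stops at the first empty slot:

#include <assert.h>

enum { N = 8 };
static int slots[N];                              // 0 means empty
static int hash_of(int key) { return key & (N - 1); }

static void insert(int key) {
    int i = hash_of(key);
    while (slots[i] != 0) i = (i + 1) & (N - 1);  // probe to the next free slot
    slots[i] = key;
}

// Lookup that treats the first empty slot as "not found" -- fine until a
// delete punches a hole in the middle of a probe chain.
static int contains(int key) {
    int i = hash_of(key);
    while (slots[i] != 0) {
        if (slots[i] == key) return 1;
        i = (i + 1) & (N - 1);
    }
    return 0;
}

int main() {
    insert(1);               // hashes to slot 1
    insert(9);               // also hashes to slot 1, probes into slot 2
    slots[1] = 0;            // naive delete of key 1
    assert(!contains(9));    // key 9 is still stored, yet lookup misses it
    return 0;
}

The runtime sidesteps this by bounding its lookups with max_hash_displacement instead of stopping at the first empty slot, which is what makes the bzero-style removal safe.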

Read operation

Because of hash collisions, the read operation needs one extra check:

/**
 * Return the weak reference table entry for the given referent.
 * If there is no entry for referent, return NULL.
 * Performs a lookup.
 *
 * @param weak_table
 * @param referent The object. Must not be nil.
 *
 * @return The table of weak referrers to this object.
 */
static weak_entry_t *
weak_entry_for_referent(weak_table_t *weak_table, objc_object *referent)
{
    assert(referent);

    weak_entry_t *weak_entries = weak_table->weak_entries;

    if (!weak_entries) return nil;

    size_t begin = hash_pointer(referent) & weak_table->mask;
    size_t index = begin;
    size_t hash_displacement = 0;
    // When there are hash collisions, the index computed by the hash
    // function may not point at the right entry, so we iterate until the
    // entry is found, or return nil once max_hash_displacement is exceeded.
    // With many collisions, lookup performance degrades noticeably.
    while (weak_table->weak_entries[index].referent != referent) {
        index = (index+1) & weak_table->mask;
        if (index == begin) bad_weak_table(weak_table->weak_entries);
        hash_displacement++;
        if (hash_displacement > weak_table->max_hash_displacement) {
            return nil;
        }
    }
    return &weak_table->weak_entries[index];
}

Growing and shrinking

There is nothing exotic about growing and shrinking; it's mainly threshold settings and checks.

// Grow the given zone's table of weak references if it is full.
static void weak_grow_maybe(weak_table_t *weak_table)
{
    size_t old_size = TABLE_SIZE(weak_table);

    // Grow if at least 3/4 full.
    if (weak_table->num_entries >= old_size * 3 / 4) {
        weak_resize(weak_table, old_size ? old_size*2 : 64);
    }
}

// Shrink the table if it is mostly empty.
static void weak_compact_maybe(weak_table_t *weak_table)
{
    size_t old_size = TABLE_SIZE(weak_table);

    // Shrink if larger than 1024 buckets and at most 1/16 full.
    if (old_size >= 1024  && old_size / 16 >= weak_table->num_entries) {
        weak_resize(weak_table, old_size / 8);
        // leaves new table no more than 1/2 full
    }
}

If the number of entries reaches 3/4 of capacity, the table grows to twice its size (starting at 64). If capacity is at least 1024 buckets and the table is at most 1/16 full, it shrinks to 1/8 of its size; for example, a 1024-bucket table holding 64 or fewer entries shrinks to 128 buckets. These two dynamic resizing policies keep the weak table's memory usage under control.

Around the weak table

That's the broad analysis of the weak table. There are actually many interesting things around it, so let's keep digging.

SideTable

Thread safety

As mentioned above, almost all the functions exposed in objc-weak.h end with _no_lock, meaning thread safety is left to the caller. So who actually takes care of it? The answer is SideTable. Let's take a quick look at the code:

struct SideTable {
    spinlock_t slock;
    RefcountMap refcnts;
    weak_table_t weak_table;

    SideTable() {
        memset(&weak_table, 0, sizeof(weak_table));
    }

    ~SideTable() {
        _objc_fatal("Do not delete SideTable.");
    }

    void lock() { slock.lock(); }
    void unlock() { slock.unlock(); }
    void forceReset() { slock.forceReset(); }

    // Address-ordered lock discipline for a pair of side tables.

    template<HaveOld, HaveNew>
    static void lockTwo(SideTable *lock1, SideTable *lock2);
    template<HaveOld, HaveNew>
    static void unlockTwo(SideTable *lock1, SideTable *lock2);
};

SideTable contains a member slock of type spinlock_t and a member weak_table of type weak_table_t, plus some locking methods. There are two locking patterns:

// Call a static function to lock two SideTables at once
SideTable::lockTwo<haveOld, haveNew>(oldTable, newTable);
// ...
SideTable::unlockTwo<haveOld, haveNew>(oldTable, newTable);

// Lock a single SideTable
table.lock();
// ...
table.unlock();

Global variable design

The weak table is "The global weak references table". So how is this "global" implemented? A weak_table is held by a SideTable, and the SideTables live in a global structure. Why go through all that? In my opinion, it's for storage and lookup efficiency: if every weak variable lived in one single weak table, that table's burden would be rather heavy. Let's look at how SideTables is implemented:

// We cannot use a C++ static initializer to initialize SideTables because
// libc calls us before our C++ initializers run. We also don't want a global 
// pointer to this struct because of the extra indirection.
// Do it the hard way.
alignas(StripedMap<SideTable>) static uint8_t 
    SideTableBuf[sizeof(StripedMap<SideTable>)];

static void SideTableInit() {
    new (SideTableBuf) StripedMap<SideTable>();
}

static StripedMap<SideTable>& SideTables() {
    return *reinterpret_cast<StripedMap<SideTable>*>(SideTableBuf);
}

First, alignas is used to give SideTableBuf the alignment that StripedMap<SideTable> requires. The initialization is also special, for the reason spelled out in the comment: libc calls into the runtime before C++ static initializers run, so the map is constructed into a plain static byte buffer with placement new, and a global pointer is avoided because of the extra indirection it would add. So the loading order of the underlying libraries shapes the implementation. We also meet a new type here, StripedMap. What is it? StripedMap is essentially a simplified hash map; I won't expand its implementation here, you can look it up in objc-private.h.
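The SideTableBuf trick, a static suitably-aligned byte buffer initialized by placement new, can be reproduced in isolation. A minimal sketch with hypothetical names (Table, TableBuf), not the runtime's code:

#include <new>  // for placement new

struct Table {
    int data[16];
    Table() { /* real initialization would go here */ }
};

// Static storage with the right size and alignment, but no constructor of
// its own -- so it is usable even before C++ static initializers have run.
alignas(Table) static unsigned char TableBuf[sizeof(Table)];

static void TableInit() {
    new (TableBuf) Table();  // construct the Table in-place inside TableBuf
}

static Table &GetTable() {
    return *reinterpret_cast<Table *>(TableBuf);
}

With that pattern in mind, here's a detail in StripedMap: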

enum { CacheLineSize = 64 };

// StripedMap<T> is a map of void* -> T, sized appropriately 
// for cache-friendly lock striping. 
// For example, this may be used as StripedMap<spinlock_t>
// or as StripedMap<SomeStruct> where SomeStruct stores a spin lock.
template<typename T>
class StripedMap {
#if TARGET_OS_IPHONE && !TARGET_OS_SIMULATOR
    enum { StripeCount = 8 };
#else
    enum { StripeCount = 64 };
#endif

    struct PaddedT {
        T value alignas(CacheLineSize);
    };
    // omit N lines of code here...
};

On iPhone, StripeCount is currently 8, and each PaddedT is aligned to CacheLineSize (64 bytes), so the SideTables are stored in 8 separate stripes. So how do you tell which stripe an object ends up in? Again by bit operations and modulo:

static unsigned int indexForPointer(const void *p) {
	uintptr_t addr = reinterpret_cast<uintptr_t>(p);
	return ((addr >> 4) ^ (addr >> 9)) % StripeCount;
}

Let’s see how it works:

// Get a SideTable from SideTables
table = &SideTables()[obj];

template<typename T>
class StripedMap {
  // omit N lines of code here...
  // The code that actually gets invoked
 public:
    T& operator[] (const void *p) { 
        return array[indexForPointer(p)].value; 
    }
    const T& operator[] (const void *p) const { 
        return const_cast<StripedMap<T>>(this)[p]; 
    }
  // omit N lines of code here...
};

So now we've seen how a global storage data structure can be built, and there is plenty to borrow from implementations like this: the thread-safety design, the data storage structure design, and so on.
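The striping idea generalizes well beyond the runtime. A minimal lock-striping sketch with hypothetical names (std::mutex instead of a spinlock, std::unordered_map instead of a hand-rolled table):

#include <mutex>
#include <unordered_map>
#include <stdint.h>

// Hash each key to one of a few stripes so that unrelated keys rarely
// contend on the same lock, instead of serializing on one global mutex.
template <typename V, int StripeCount = 8>
class StripedStore {
    struct Stripe {
        std::mutex lock;
        std::unordered_map<const void *, V> map;
    };
    Stripe stripes[StripeCount];

    static unsigned indexForPointer(const void *p) {
        uintptr_t addr = reinterpret_cast<uintptr_t>(p);
        return ((addr >> 4) ^ (addr >> 9)) % StripeCount;  // same mixing as SideTables
    }

public:
    void set(const void *key, const V &value) {
        Stripe &s = stripes[indexForPointer(key)];
        std::lock_guard<std::mutex> guard(s.lock);  // lock only this stripe
        s.map[key] = value;
    }

    bool get(const void *key, V &out) {
        Stripe &s = stripes[indexForPointer(key)];
        std::lock_guard<std::mutex> guard(s.lock);
        auto it = s.map.find(key);
        if (it == s.map.end()) return false;
        out = it->second;
        return true;
    }
};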

Conclusion

Of course, this only summarizes the analysis of part of the weak implementation; the whole NSObject.mm file is still very complex, full of C/C++ techniques (templates, for example, are used heavily and are worth careful study later) as well as many subtle designs. It deserves real study time, not just interview prep. Besides, every reading of the source yields a different harvest, and you can also step back from the language level and think about the overall design, all of which is very interesting.