In daily development, we often use code like the following to create objects:

Person *p = [[Person alloc] init];

You’ll notice that to create an object, you need to call both alloc and init. So today we’re going to explore what that alloc does.


As iOS developers know, alloc is what allocates memory for an object. But how exactly is that memory allocated? That is the focus of today’s exploration. First, download the objc4 source code, preferably the latest version. Second, the downloaded source does not compile as-is; for that I recommend the article “iOS_OBJC4-756.2 latest source code compile and debug”, which you can follow to build a debuggable copy of the source.


### Exploring the alloc source

#### 1. Create a Person object

Create a Person object as shown below. Set a breakpoint on that line and step into the alloc function, and you will see the following call stack:

  • “Step 1”: the entry point, the alloc function:
+ (id)alloc {
    return _objc_rootAlloc(self);
}

  • “Step 2”: inside alloc, the call goes into _objc_rootAlloc, so we follow it in to check:
// Base class implementation of +alloc. cls is not nil.
// Calls [cls allocWithZone:nil].
id
_objc_rootAlloc(Class cls)
{
    return callAlloc(cls, false/*checkNil*/, true/*allocWithZone*/);
}

  • “Step 3”: with the same approach, we step into the callAlloc method:
static ALWAYS_INLINE id
callAlloc(Class cls, bool checkNil, bool allocWithZone=false)
{
#if __OBJC2__
    if (slowpath(checkNil && !cls)) return nil;
    if (fastpath(!cls->ISA()->hasCustomAWZ())) {
        return _objc_rootAllocWithZone(cls, nil);
    }
#endif

    // No shortcuts available.
    if (allocWithZone) {
        return ((id(*)(id, SEL, struct _NSZone *))objc_msgSend)(cls, @selector(allocWithZone:), nil);
    }
    return ((id(*)(id, SEL))objc_msgSend)(cls, @selector(alloc));
}

At this point it gets a little confusing, because the source forks into several branches. This is where breakpoint debugging comes in.

Before we start debugging, let’s clarify a few key checks in the callAlloc method.

1. hasCustomAWZ. Looking at hasCustomAWZ first, we find that its implementation is:

bool hasCustomAWZ() const {
    return !cache.getBit(FAST_CACHE_HAS_DEFAULT_AWZ);
}

Moving on to FAST_CACHE_HAS_DEFAULT_AWZ, we find what we need (note the source comment):

// class or superclass has default alloc/allocWithZone: implementation
// Note this is is stored in the metaclass.
#define FAST_CACHE_HAS_DEFAULT_AWZ    (1<<14)

So cls->ISA()->hasCustomAWZ() checks whether the class or one of its superclasses provides a custom alloc/allocWithZone: implementation. (Obviously, our Person does not at this point.)
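For illustration only, here is a hypothetical class, CustomAllocPerson (not from this article’s project), that does override allocWithZone:. For a class like this, the runtime records a custom AWZ implementation, so callAlloc cannot take the _objc_rootAllocWithZone shortcut and falls back to the objc_msgSend branch:

#import <Foundation/Foundation.h>

// Hypothetical example: a class with a custom allocWithZone: override.
@interface CustomAllocPerson : NSObject
@end

@implementation CustomAllocPerson
// Overriding +allocWithZone: marks the class as having a non-default
// alloc/allocWithZone: implementation, i.e. hasCustomAWZ() becomes true.
+ (instancetype)allocWithZone:(struct _NSZone *)zone {
    NSLog(@"custom allocWithZone: called");
    return [super allocWithZone:zone];
}
@end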

2. fastpath and slowpath. Now that we know what hasCustomAWZ means, let’s see what fastpath means.

Following it in, we find:

#define fastpath(x) (__builtin_expect(bool(x), 1))
#define slowpath(x) (__builtin_expect(bool(x), 0))

fastpath and slowpath are two macros defined in the objc source; both are wrappers around __builtin_expect.

The main purpose of __builtin_expect(bool exp, probability) is conditional branch prediction. The first argument is a Boolean expression; the second indicates how likely that expression is to be true and can only be 1 or 0. A value of 1 means the expression is true in most cases, while 0 means it is false most of the time. The return value of the function is simply the value of the first argument.

That is, fastpath(!cls->ISA()->hasCustomAWZ()) tells the compiler that !cls->ISA()->hasCustomAWZ() is expected to be true in most cases. So “Step 4” goes into the _objc_rootAllocWithZone call inside the if. (You can confirm this with breakpoint debugging as well.)

3. There is one more conditional here: __OBJC2__. The source does not explain this macro at length, but it indicates that the modern Objective-C 2.0 runtime is in use, which is the case on all current Apple platforms, so the optimized branch above is the one that gets compiled.


  • “Step 4”: _objc_rootAllocWithZone
NEVER_INLINE
id
_objc_rootAllocWithZone(Class cls, malloc_zone_t *zone __unused)
{
    // allocWithZone under __OBJC2__ ignores the zone parameter
    return _class_createInstanceFromZone(cls, 0, nil,
                                          OBJECT_CONSTRUCT_CALL_BADALLOC);
}

  • “Step 5”: _class_createInstanceFromZone

_class_createInstanceFromZone is the heart of the alloc process.

static ALWAYS_INLINE id
_class_createInstanceFromZone(Class cls, size_t extraBytes, void *zone,
                              int construct_flags = OBJECT_CONSTRUCT_NONE,
                              bool cxxConstruct = true,
                              size_t *outAllocatedSize = nil)
{
    // The class must already be realized at this point.
    ASSERT(cls->isRealized());

    // Read class's info bits all at once for performance
    bool hasCxxCtor = cxxConstruct && cls->hasCxxCtor();
    bool hasCxxDtor = cls->hasCxxDtor();
    bool fast = cls->canAllocNonpointer();
    size_t size;

    // extraBytes is 0 when we come from _objc_rootAllocWithZone
    size = cls->instanceSize(extraBytes);
    if (outAllocatedSize) *outAllocatedSize = size;

    id obj;
    if (zone) {
        obj = (id)malloc_zone_calloc((malloc_zone_t *)zone, 1, size);
    } else {
        // From _objc_rootAllocWithZone we also know zone is nil,
        // so under __OBJC2__ we end up here.
        obj = (id)calloc(1, size);
    }
    if (slowpath(!obj)) {
        if (construct_flags & OBJECT_CONSTRUCT_CALL_BADALLOC) {
            return _objc_callBadAllocHandler(cls);
        }
        return nil;
    }

    if (!zone && fast) {
        obj->initInstanceIsa(cls, hasCxxDtor);
    } else {
        // Use raw pointer isa on the assumption that they might be
        // doing something weird with the zone or RR.
        obj->initIsa(cls);
    }

    if (fastpath(!hasCxxCtor)) {
        return obj;
    }

    construct_flags |= OBJECT_CONSTRUCT_FREE_ONFAILURE;
    return object_cxxConstructFromClass(obj, cls, construct_flags);
}

In _class_createInstanceFromZone, there are three main steps:

  • size = cls->instanceSize(extraBytes); — work out how much memory needs to be allocated.
  • obj = (id)calloc(1, size); — actually allocate that memory.
  • obj->initInstanceIsa(cls, hasCxxDtor); — associate the object with its class by initializing the isa pointer.

  • “Step 6”: object_cxxConstructFromClass

The source of this function is also easy to follow because the official comments are clear, so I’ll only call out the key part rather than the full listing. Look at the return value of object_cxxConstructFromClass: if it is self, construction succeeded; if it is nil, construction failed.
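A paraphrased sketch of its logic (not the verbatim source):

// Paraphrased logic of object_cxxConstructFromClass (not verbatim source):
// 1. Recursively run the superclasses' C++ constructors (.cxx_construct) first.
// 2. Run this class's own .cxx_construct, if it has one.
// 3. On success, return obj (i.e. self).
//    If a constructor fails, destroy what was already constructed,
//    free obj when OBJECT_CONSTRUCT_FREE_ONFAILURE is set, and return nil.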


#### Alloc flow chart

From the exploration above, we can summarize the call flow of the alloc function in the following diagram:
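Here is also a comment-style sketch of the same chain, based only on the steps traced above:

// The __OBJC2__ fast path of alloc, as traced above:
//
// +alloc
//   └─ _objc_rootAlloc(cls)
//        └─ callAlloc(cls, false /*checkNil*/, true /*allocWithZone*/)
//             └─ _objc_rootAllocWithZone(cls, nil)         // no custom allocWithZone:
//                  └─ _class_createInstanceFromZone(cls, 0, nil, ...)
//                       ├─ cls->instanceSize(extraBytes)   // how many bytes to allocate
//                       ├─ calloc(1, size)                 // allocate them
//                       ├─ obj->initInstanceIsa(cls, ...)  // bind isa to the class
//                       └─ object_cxxConstructFromClass(...) // only if C++ ctors exist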


Now that we have explored the alloc process, there are a few related knowledge points worth expanding on:

1: Byte alignment && memory alignment

This refers to size = cls->instanceSize(extraBytes); in _class_createInstanceFromZone. Let’s step into instanceSize:

inline size_t instanceSize(size_t extraBytes) const {
    if (fastpath(cache.hasFastInstanceSize(extraBytes))) {
        return cache.fastInstanceSize(extraBytes);
    }

    size_t size = alignedInstanceSize() + extraBytes;
    // CF requires all objects be at least 16 bytes.
    if (size < 16) size = 16;
    return size;
}

When there is no cached size, alignedInstanceSize() is called, so let’s follow that:

// Class's ivar size rounded up to a pointer-size boundary.
uint32_t alignedInstanceSize() const {
    return word_align(unalignedInstanceSize());
}

unalignedInstanceSize() returns the class’s instanceSize, which depends on the size of its ivars:

// May be unaligned depending on class's ivars.
uint32_t unalignedInstanceSize() const {
    ASSERT(isRealized());
    return data()->ro()->instanceSize;
}

At this point our Person has no custom member variables, yet the return value is 8. That’s because NSObject itself contains an isa. As shown below, Class is a struct pointer type:

And objc_class in turn inherits from objc_object:
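The relevant declarations look roughly like this (simplified from the objc4 source, most members omitted):

typedef struct objc_class *Class;
typedef struct objc_object *id;

struct objc_object {
    isa_t isa;              // every object starts with an isa: 8 bytes on 64-bit
    // ... methods omitted
};

struct objc_class : objc_object {
    // inherits isa from objc_object
    Class superclass;
    cache_t cache;          // method cache
    class_data_bits_t bits; // class_rw_t * plus flags
    // ... methods omitted
};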

  • Byte alignment is 8-byte alignment

Going into the word_align function makes this clear. Note that WORD_MASK is 7; by this algorithm, the return value of word_align is always a multiple of 8.
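For reference, word_align and WORD_MASK in the objc4 source look roughly like this:

#ifdef __LP64__
#   define WORD_MASK 7UL            // 64-bit: round up to 8 bytes
#else
#   define WORD_MASK 3UL            // 32-bit: round up to 4 bytes
#endif

static inline size_t word_align(size_t x) {
    // round x up to the next multiple of (WORD_MASK + 1)
    return (x + WORD_MASK) & ~WORD_MASK;
}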

Take x = 8 and WORD_MASK = 7. The expression becomes (8 + 7) & ~7, that is, 15 & ~7. In binary, 15 is 0000 1111 and 7 is 0000 0111, so ~7 is 1111 1000. Therefore 15 & ~7 = 0000 1111 & 1111 1000 = 0000 1000, which equals 8.

  • Memory alignment is 16-byte alignment

In instanceSize, if there is a cache, the fastInstanceSize function is called:

size_t fastInstanceSize(size_t extra) const {
    ASSERT(hasFastInstanceSize(extra));

    if (__builtin_constant_p(extra) && extra == 0) {
        return _flags & FAST_CACHE_ALLOC_MASK16;
    } else {
        size_t size = _flags & FAST_CACHE_ALLOC_MASK;
        // remove the FAST_CACHE_ALLOC_DELTA16 that was added
        // by setFastInstanceSize
        return align16(size + extra - FAST_CACHE_ALLOC_DELTA16);
    }
}

As you can see, the function ends with a call to align16:

static inline size_t align16(size_t x) {
    return (x + size_t(15)) & ~size_t(15);
}

This is not hard to understand: the algorithm is the same as the byte-alignment one above, except the return value is guaranteed to be a multiple of 16.
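As a quick sanity check of these two numbers, you can query them from outside with the public runtime APIs. A minimal sketch, assuming Person is a plain NSObject subclass with no extra ivars:

#import <Foundation/Foundation.h>
#import <objc/runtime.h>
#import <malloc/malloc.h>

@interface Person : NSObject
@end
@implementation Person
@end

int main(void) {
    @autoreleasepool {
        Person *p = [[Person alloc] init];
        // 8-byte-aligned instance size: just the isa, so 8 on 64-bit
        NSLog(@"class_getInstanceSize: %zu", class_getInstanceSize([Person class]));
        // what the allocator actually handed out: at least 16
        NSLog(@"malloc_size: %zu", malloc_size((__bridge const void *)p));
    }
    return 0;
}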

2: __builtin_expect(bool exp, probability)

This function was briefly introduced above, but is covered in a bit more detail here (see the article "Built-in functions in the LLVM compiler"). Its main purpose is conditional branch prediction. It takes two arguments: the first is a Boolean expression, and the second indicates how likely that expression is to be true; it can only take the value 1 or 0.

A value of 1 indicates that the Boolean expression is true most of the time, while a value of 0 indicates that the Boolean expression is false most of the time.

The return value of the function is the expression value of the first argument.

While executing an instruction, the CPU can already fetch the next one thanks to its pipeline, which improves CPU utilization. When executing a branch instruction, the CPU likewise prefetches the next instruction, but if the conditional branch jumps elsewhere, the prefetched instruction is wasted, which reduces pipeline efficiency. __builtin_expect lets the compiler rearrange the generated instruction sequence so that execution stays as sequential as possible, improving the hit rate of the CPU’s instruction prefetch. For example:

if (__builtin_expect (x, 0))
    foo();

This says that the value of x is expected to be false most of the time, so foo() is unlikely to be executed. When compiling this code, the compiler therefore does not place foo()’s instructions right after the conditional jump. Conversely:

if (__builtin_expect (x, 1))
    foo();

Here the value of x is expected to be true most of the time, so foo() has a good chance of being executed, and the compiler places foo()’s assembly instructions right after the conditional jump.

To simplify usage, the objc source wraps this in two macros, fastpath and slowpath, which implement exactly this branch-prediction hint:

#define fastpath(x) (__builtin_expect(bool(x), 1))
#define slowpath(x) (__builtin_expect(bool(x), 0))
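These macros are internal to the runtime’s headers, but the same pattern is easy to reproduce in your own code. A hypothetical sketch with locally defined equivalents (my_fastpath/my_slowpath and readConfig are made-up names):

#import <Foundation/Foundation.h>

// Local stand-ins for the runtime's fastpath/slowpath macros, for illustration only.
#define my_fastpath(x) (__builtin_expect(!!(x), 1))
#define my_slowpath(x) (__builtin_expect(!!(x), 0))

// Hypothetical helper: the nil check is expected to fail only rarely,
// so the compiler keeps the common path laid out right after the branch.
static NSData *readConfig(NSString *path) {
    if (my_slowpath(path == nil)) {
        return nil;                                  // rare error path
    }
    return [NSData dataWithContentsOfFile:path];     // expected, fast path
}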