Tagged Pointer

1. Background

In 2013, Apple shipped the iPhone 5s with the A7, its first 64-bit dual-core processor. To save memory and improve execution efficiency, Apple introduced the concept of Tagged Pointer.

Suppose we want to store an NSNumber object whose value is an integer. If this integer were just a plain NSInteger variable, the memory it consumes would depend on the CPU word size: 4 bytes on a 32-bit CPU and 8 bytes on a 64-bit CPU. The size of a pointer also generally follows the word size, with a pointer occupying 4 bytes on a 32-bit CPU and 8 bytes on a 64-bit CPU.

Therefore, without Tagged Pointer, the memory used by objects such as NSNumber and NSDate in an ordinary iOS application would double after migrating from a 32-bit machine to a 64-bit machine, even though the program logic does not change.

There is also an efficiency cost. To store and access an NSNumber object, we need to allocate memory for it on the heap, maintain its reference count, and manage its lifetime. All of this adds extra logic to the program and costs performance.

2. Tagged Pointer

2.1 Implementation

To address the memory footprint and efficiency issues mentioned above, Apple introduced Tagged Pointer. The values held by objects such as NSNumber and NSDate usually do not need 8 bytes of memory. A 4-byte signed integer, for example, already reaches more than 2 billion (note: 2^31 = 2147483648, with 1 bit used for the sign), which is enough for most cases.

So we can split a pointer to an object into two parts: one holds the data directly, and the other is a special marker indicating that this is a special pointer that does not point to any address. After Tagged Pointer is introduced, an NSNumber on a 64-bit CPU is therefore represented by the pointer value alone.

Since Tagged Pointer stores the value in the pointer itself, no space is allocated on the heap.
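The idea of splitting one word into a marker and a value can be sketched in a few lines of C. This is a toy layout chosen only for illustration (bit 0 as the marker, the remaining 63 bits as the value); the names toy_box and toy_unbox are hypothetical and the runtime's real layouts are covered in section 3.

```c
#include <assert.h>
#include <stdint.h>

/* Toy layout, for illustration only: bit 0 is the marker and the
 * upper 63 bits hold the integer. This is not the runtime's layout. */
#define TOY_MARKER UINT64_C(1)

/* Store a small unsigned integer directly in a pointer-sized word. */
static uint64_t toy_box(uint64_t n)
{
    assert(n <= (UINT64_MAX >> 1));     /* must fit in 63 bits */
    return (n << 1) | TOY_MARKER;
}

/* Recover the integer; no memory is dereferenced. */
static uint64_t toy_unbox(uint64_t word)
{
    assert(word & TOY_MARKER);          /* must carry the marker */
    return word >> 1;
}
```

Boxing 42 this way yields the word 85 (binary 1010101), and unboxing it is a single shift with no heap access.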

2.2 Characteristics

Tagged Pointer features:

  • Tagged Pointer is used to store small objects, such as NSNumber and NSDate.
  • The value of a Tagged Pointer is no longer an address but the real value. So it is not really an object anymore; it is just a normal variable in an object's skin. Its memory is therefore not stored on the heap and does not require malloc and free.
  • Memory reads are about 3 times faster and creation is about 106 times faster.

3. Source code exploration

3.1 Whether Tagged Pointer is supported

The SUPPORT_TAGGED_POINTERS definition in objc-config.h indicates that Tagged Pointer is supported on 64-bit systems.

#if !__LP64__
#   define SUPPORT_TAGGED_POINTERS 0
#else
#   define SUPPORT_TAGGED_POINTERS 1
#endif

3.2 Checking whether a pointer is a Tagged Pointer

In objc-object.h, there is an isTaggedPointer function that determines whether a pointer is a Tagged Pointer.

inline bool 
objc_object::isTaggedPointer() 
{
    return _objc_isTaggedPointer(this);
}

isTaggedPointer calls _objc_isTaggedPointer in objc-internal.h, which casts the pointer to uintptr_t, ANDs it with _OBJC_TAG_MASK, and checks whether the result still equals _OBJC_TAG_MASK.

static inline bool 
_objc_isTaggedPointer(const void * _Nullable ptr)
{
    return ((uintptr_t)ptr & _OBJC_TAG_MASK) == _OBJC_TAG_MASK;
}

Continue to look at the _OBJC_TAG_MASK macro definition.

#if OBJC_SPLIT_TAGGED_POINTERS
    // ARM64
#   define _OBJC_TAG_MASK (1UL<<63)
#elif OBJC_MSB_TAGGED_POINTERS
    // MSB
#   define _OBJC_TAG_MASK (1UL<<63)
#else
    // LSB
#   define _OBJC_TAG_MASK 1UL
#endif

It turns out that _OBJC_TAG_MASK has two different values: either the highest bit is 1 or the lowest bit is 1.

Therefore, depending on the 64-bit platform, we can tell whether a pointer is a Tagged Pointer by checking whether the highest or the lowest bit of the pointer value is 1.
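The mask comparison can be tried outside the runtime with plain integers. The following is a minimal sketch; the names SKETCH_TAG_MASK_MSB, SKETCH_TAG_MASK_LSB and sketch_is_tagged are hypothetical stand-ins, and the mask is passed as a parameter here instead of being selected at compile time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TAG_MASK_MSB (UINT64_C(1) << 63)  /* highest bit */
#define SKETCH_TAG_MASK_LSB UINT64_C(1)          /* lowest bit  */

/* Same test as _objc_isTaggedPointer: AND with the mask and
 * compare the result against the mask itself. */
static bool sketch_is_tagged(uint64_t bits, uint64_t mask)
{
    assert(mask == SKETCH_TAG_MASK_MSB || mask == SKETCH_TAG_MASK_LSB);
    return (bits & mask) == mask;
}
```

A value such as 0x0000600001234560, which has neither the top nor the bottom bit set, is classified as an ordinary pointer under both masks.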

3.3 Tagged Pointer Storage Structure

3.3.1 OBJC_SPLIT_TAGGED_POINTERS

Before analyzing Tagged Pointer’s storage structure, take a look at two macro definitions, OBJC_SPLIT_TAGGED_POINTERS and OBJC_MSB_TAGGED_POINTERS.

#if __arm64__
// ARM64 uses a new tagged pointer scheme where normal tags are in
// the low bits, extended tags are in the high bits, and half of the
// extended tag space is reserved for unobfuscated payloads.
#   define OBJC_SPLIT_TAGGED_POINTERS 1
#else
#   define OBJC_SPLIT_TAGGED_POINTERS 0
#endif

ARM64 uses a new tagged pointer scheme that splits the tags: normal tags are in the low bits and extended tags are in the high bits. Half of the extended tag space is reserved for unobfuscated payloads.

3.3.2 OBJC_MSB_TAGGED_POINTERS

#if (TARGET_OS_OSX || TARGET_OS_MACCATALYST) && __x86_64__
    // 64-bit Mac - tag bit is LSB
#   define OBJC_MSB_TAGGED_POINTERS 0
#else
    // Everything else - tag bit is MSB
#   define OBJC_MSB_TAGGED_POINTERS 1
#endif

Macs with the 64-bit x86 architecture use the LSB as the tag bit, and everything else uses the MSB. MSB means that the flag sits in the most significant (highest) bit of the pointer, while LSB means that it sits in the least significant (lowest) bit.

3.3.3 Storage Structure

A comment in objc-runtime-new.mm describes the storage structure of Tagged Pointer:

/***********************************************************************
* Tagged pointer objects.
*
* Tagged pointer objects store the class and the object value in the
* object pointer; the "pointer" does not actually point to anything.
*
* Tagged pointer objects currently use this representation:
* (LSB)
*  1 bit   set if tagged, clear if ordinary object pointer
*  3 bits  tag index
* 60 bits  payload
* (MSB)
* The tag index defines the object's class.
* The payload format is defined by the object's class.
*
* If the tag index is 0b111, the tagged pointer object uses an
* "extended" representation, allowing more classes but with smaller payloads:
* (LSB)
*  1 bit   set if tagged, clear if ordinary object pointer
*  3 bits  0b111
*  8 bits  extended tag index
* 52 bits  payload
* (MSB)
*
* Some architectures reverse the MSB and LSB in these representations.
*
* This representation is subject to change. Representation-agnostic SPI is:
* objc-internal.h for class implementers.
* objc-gdb.h for debuggers.
**********************************************************************/

Under LSB, the least significant bit is the Tagged Pointer flag. The next three bits store the tag index, which defines the class of the current object. The remaining 60 bits are the payload, which is the actual value of the object. The format of the payload is determined by its class. The MSB storage structure is the mirror image of the LSB one.

Under LSB, when the tag index is 0b111 (7), the tag index no longer identifies the class of the object directly. Instead, an additional 8 bits store an extended tag index. The extended tag index can represent more classes, but the space left for data shrinks by 8 bits, to a maximum of 52 bits.
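The LSB layout described in the comment can be read back with shifts and masks. The sketch below works on an already de-obfuscated word and follows the bit positions listed in the comment (bit 0 flag, bits 1-3 tag index, then either a 60-bit payload or an 8-bit extended index plus a 52-bit payload). The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the 3-bit tag index is 0b111, i.e. the extended form. */
static bool sketch_is_extended(uint64_t bits)
{
    assert(bits & 1);                    /* bit 0: tagged flag */
    return ((bits >> 1) & 0x7) == 0x7;   /* bits 1..3: tag index */
}

/* Basic index (0..6) or, in the extended form, the 8-bit index. */
static unsigned sketch_tag_index(uint64_t bits)
{
    if (sketch_is_extended(bits))
        return (unsigned)((bits >> 4) & 0xff);   /* bits 4..11 */
    return (unsigned)((bits >> 1) & 0x7);
}

/* 60-bit payload, or 52-bit payload in the extended form. */
static uint64_t sketch_payload(uint64_t bits)
{
    return sketch_is_extended(bits) ? bits >> 12 : bits >> 4;
}
```

For example, the word 0x2A7 decodes to tag index 3 with payload 42, while 0x509F decodes to extended index 9 with payload 5.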

objc-internal.h provides macro definitions of the shift values that are used when operating on a Tagged Pointer.

#if OBJC_SPLIT_TAGGED_POINTERS
    // ARM64: flag in the highest bit, basic tag index in the low bits
#   define _OBJC_TAG_MASK (1UL<<63)
#   define _OBJC_TAG_INDEX_SHIFT 0
#   define _OBJC_TAG_SLOT_SHIFT 0
#   define _OBJC_TAG_PAYLOAD_LSHIFT 1
    // Shifting left and then right strips the flag and the tag,
    // leaving the actual payload value.
#   define _OBJC_TAG_PAYLOAD_RSHIFT 4
    // Bit 63 plus 0b111 in the low three bits
#   define _OBJC_TAG_EXT_MASK (_OBJC_TAG_MASK | 0x7UL)
#   define _OBJC_TAG_NO_OBFUSCATION_MASK ((1UL<<62) | _OBJC_TAG_EXT_MASK)
#   define _OBJC_TAG_CONSTANT_POINTER_MASK \
        ~(_OBJC_TAG_EXT_MASK | ((uintptr_t)_OBJC_TAG_EXT_SLOT_MASK << _OBJC_TAG_EXT_SLOT_SHIFT))
#   define _OBJC_TAG_EXT_INDEX_SHIFT 55
#   define _OBJC_TAG_EXT_SLOT_SHIFT 55
#   define _OBJC_TAG_EXT_PAYLOAD_LSHIFT 9
#   define _OBJC_TAG_EXT_PAYLOAD_RSHIFT 12
#elif OBJC_MSB_TAGGED_POINTERS
    // MSB: flag in the highest bit
#   define _OBJC_TAG_MASK (1UL<<63)
    // Shift right by 60 to get the top 4 bits, then AND with the
    // index mask (0b111) to obtain the tag index.
#   define _OBJC_TAG_INDEX_SHIFT 60
#   define _OBJC_TAG_SLOT_SHIFT 60
#   define _OBJC_TAG_PAYLOAD_LSHIFT 4
#   define _OBJC_TAG_PAYLOAD_RSHIFT 4
    // Extended tag mask: bits 60-63
#   define _OBJC_TAG_EXT_MASK (0xfUL<<60)
#   define _OBJC_TAG_EXT_INDEX_SHIFT 52
#   define _OBJC_TAG_EXT_SLOT_SHIFT 52
#   define _OBJC_TAG_EXT_PAYLOAD_LSHIFT 12
#   define _OBJC_TAG_EXT_PAYLOAD_RSHIFT 12
#else
    // LSB: flag in the lowest bit
#   define _OBJC_TAG_MASK 1UL
#   define _OBJC_TAG_INDEX_SHIFT 1
#   define _OBJC_TAG_SLOT_SHIFT 0
#   define _OBJC_TAG_PAYLOAD_LSHIFT 0
#   define _OBJC_TAG_PAYLOAD_RSHIFT 4
    // Extended tag mask: 0b1111
#   define _OBJC_TAG_EXT_MASK 0xfUL
#   define _OBJC_TAG_EXT_INDEX_SHIFT 4
#   define _OBJC_TAG_EXT_SLOT_SHIFT 4
#   define _OBJC_TAG_EXT_PAYLOAD_LSHIFT 0
    // Bits to the right of the payload with an extended tag: 8 + 3 + 1
#   define _OBJC_TAG_EXT_PAYLOAD_RSHIFT 12
#endif

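Using the macro definitions above, we can work out the structure under MSB:

  • Without an extension tag: bit 63 is the tagged flag, bits 60-62 are the tag index, and bits 0-59 are the payload.
  • With an extension tag: bits 60-63 are all 1, bits 52-59 are the extended tag index, and bits 0-51 are the payload.

On ARM64:

  • Without an extension tag: bit 63 is the tagged flag, bits 0-2 are the tag index, and bits 3-62 are the payload.
  • With an extension tag: bit 63 is the tagged flag, bits 0-2 are 0b111, bits 55-62 are the extended tag index, and bits 3-54 are the payload.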
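The ARM64 shift pairs (left 1 then right 4, or left 9 then right 12 with an extension tag) are easier to follow with concrete numbers. The sketch below hard-codes those shift values from the macros and ignores obfuscation; sk_split_payload and sk_split_ext_index are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Payload of a de-obfuscated ARM64 (split scheme) tagged word. */
static uint64_t sk_split_payload(uint64_t bits)
{
    assert(bits >> 63);                 /* bit 63: tagged flag */
    if ((bits & 0x7) == 0x7)            /* low bits 0b111: extended */
        return (bits << 9) >> 12;       /* drop bits 55..63, then 0..2 */
    return (bits << 1) >> 4;            /* drop bit 63, then bits 0..2 */
}

/* Extended tag index stored in bits 55..62. */
static unsigned sk_split_ext_index(uint64_t bits)
{
    assert((bits & 0x7) == 0x7);
    return (unsigned)((bits >> 55) & 0xff);
}
```

With these shifts, 0x8000000000000153 (tag index 3) carries payload 42, and 0x848000000000002F carries extended index 9 with payload 5.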

3.4 Encoding/decoding

3.4.1 Obfuscation value

extern uintptr_t objc_debug_taggedpointer_obfuscator;

objc_debug_taggedpointer_obfuscator is a global value that stays constant once it has been initialized.

In objc-runtime-new.mm, there is a comment describing objc_debug_taggedpointer_obfuscator and initializeTaggedPointerObfuscator:

/***********************************************************************
* initializeTaggedPointerObfuscator
* Initialize objc_debug_taggedpointer_obfuscator with randomness.
*
* The tagged pointer obfuscator is intended to make it more difficult
* for an attacker to construct a particular object as a tagged pointer,
* in the presence of a buffer overflow or other write control over some
* memory. The obfuscator is XORed with the tagged pointers when setting
* or retrieving payload values. They are filled with randomness on first
* use.
**********************************************************************/

The Tagged Pointer obfuscation value is intended to make it more difficult for an attacker to construct a particular object as a tagged pointer when there is a buffer overflow or other write control over some memory. In short, it exists for security. The obfuscation value is XORed with the tagged pointer.
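Because XOR is its own inverse, the same value both hides and recovers the raw bits. Here is a minimal sketch of that round trip, assuming the MSB mask; sk_make_obfuscator, sk_encode and sk_decode are hypothetical names, and the random input is supplied by the caller so the result is reproducible.

```c
#include <assert.h>
#include <stdint.h>

#define SK_TAG_MASK (UINT64_C(1) << 63)

/* Clear the tag bit so that XOR never flips the tagged flag. */
static uint64_t sk_make_obfuscator(uint64_t random_bits)
{
    return random_bits & ~SK_TAG_MASK;
}

/* Hide the raw tagged value. */
static uint64_t sk_encode(uint64_t raw, uint64_t obfuscator)
{
    assert((obfuscator & SK_TAG_MASK) == 0);
    return raw ^ obfuscator;
}

/* XOR with the same value restores the raw tagged value. */
static uint64_t sk_decode(uint64_t encoded, uint64_t obfuscator)
{
    return encoded ^ obfuscator;
}
```

The encoded word still has its top bit set, so the isTaggedPointer check works without decoding first.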

initializeTaggedPointerObfuscator is used to initialize the obfuscation value.

static void
initializeTaggedPointerObfuscator(void)
{
    if (!DisableTaggedPointerObfuscation
        // && dyld_program_sdk_at_least(dyld_fall_2018_os_versions)
        ) {
        // Pull random data into the variable, then shift away all non-payload bits.
        arc4random_buf(&objc_debug_taggedpointer_obfuscator,
                       sizeof(objc_debug_taggedpointer_obfuscator));
        // Clear the tag flag bit of the random value.
        objc_debug_taggedpointer_obfuscator &= ~_OBJC_TAG_MASK;

#if OBJC_SPLIT_TAGGED_POINTERS
        // The obfuscator doesn't apply to any of the extended tag mask
        // or the no-obfuscation bit, so set those positions to 0.
        objc_debug_taggedpointer_obfuscator &= ~(_OBJC_TAG_EXT_MASK | _OBJC_TAG_NO_OBFUSCATION_MASK);

        // Shuffle the first seven entries of the tag permutator,
        // i.e. randomly swap the values in objc_debug_tag60_permutations.
        int max = 7;
        for (int i = max - 1; i >= 0; i--) {
            int target = arc4random_uniform(i + 1);
            swap(objc_debug_tag60_permutations[i],
                 objc_debug_tag60_permutations[target]);
        }
#endif
    } else {
        // Set the obfuscator to zero for apps linked against older SDKs,
        // in case they're relying on the tagged pointer representation.
        objc_debug_taggedpointer_obfuscator = 0;
    }
}
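The loop that shuffles the tag permutation table has the shape of a Fisher-Yates shuffle: walk from the last entry down and swap each entry with a randomly chosen one at or below it. The sketch below reproduces that loop on a small byte table. The linear congruential generator sk_next is only a deterministic stand-in for arc4random_uniform (an assumption made so the example is reproducible), and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Deterministic stand-in for a random source (not arc4random). */
static uint32_t sk_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 16;
}

/* Shuffle the first `count` entries, same loop shape as the runtime. */
static void sk_shuffle(uint8_t *table, int count, uint32_t seed)
{
    assert(count > 0);
    for (int i = count - 1; i >= 0; i--) {
        int target = (int)(sk_next(&seed) % (uint32_t)(i + 1));
        uint8_t tmp = table[i];
        table[i] = table[target];
        table[target] = tmp;
    }
}

/* Check that the first `count` entries hold 0..count-1 exactly once. */
static int sk_is_permutation(const uint8_t *table, int count)
{
    unsigned seen = 0;
    for (int i = 0; i < count; i++) {
        if (table[i] >= count) return 0;
        seen |= 1u << table[i];
    }
    return seen == (1u << count) - 1u;
}
```

Shuffling only the first seven of eight entries leaves the eighth in place, which matches the idea that index 7 (0b111) must keep meaning "extended tag".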

3.4.2 Encoding

static inline void * _Nonnull
_objc_encodeTaggedPointer(uintptr_t ptr)
{
    // XOR with the obfuscation value
    uintptr_t value = (objc_debug_taggedpointer_obfuscator ^ ptr);
#if OBJC_SPLIT_TAGGED_POINTERS
    // Pointers in the unobfuscated range are returned unchanged
    if ((value & _OBJC_TAG_NO_OBFUSCATION_MASK) == _OBJC_TAG_NO_OBFUSCATION_MASK)
        return (void *)ptr;
    // Extract the basic tag index
    uintptr_t basicTag = (value >> _OBJC_TAG_INDEX_SHIFT) & _OBJC_TAG_INDEX_MASK;
    // Permute the tag index
    uintptr_t permutedTag = _objc_basicTagToObfuscatedTag(basicTag);
    // Clear the tag index bits in value
    value &= ~(_OBJC_TAG_INDEX_MASK << _OBJC_TAG_INDEX_SHIFT);
    // Write the permuted tag index back into value
    value |= permutedTag << _OBJC_TAG_INDEX_SHIFT;
#endif
    return (void *)value;
}

3.4.3 Decoding

To decode a Tagged Pointer, XOR it with objc_debug_taggedpointer_obfuscator.

static inline uintptr_t
_objc_decodeTaggedPointer_noPermute(const void * _Nullable ptr)
{
    uintptr_t value = (uintptr_t)ptr;
#if OBJC_SPLIT_TAGGED_POINTERS
    if ((value & _OBJC_TAG_NO_OBFUSCATION_MASK) == _OBJC_TAG_NO_OBFUSCATION_MASK)
        return value;
#endif
    
    return value ^ objc_debug_taggedpointer_obfuscator;
    
}

Here, the permuted tag index is also restored to its original value:

static inline uintptr_t
_objc_decodeTaggedPointer(const void * _Nullable ptr)
{
    uintptr_t value = _objc_decodeTaggedPointer_noPermute(ptr);
#if OBJC_SPLIT_TAGGED_POINTERS
    uintptr_t basicTag = (value >> _OBJC_TAG_INDEX_SHIFT) & _OBJC_TAG_INDEX_MASK;

    value &= ~(_OBJC_TAG_INDEX_MASK << _OBJC_TAG_INDEX_SHIFT);
    value |= _objc_obfuscatedTagToBasicTag(basicTag) << _OBJC_TAG_INDEX_SHIFT;
#endif
    return value;
}

3.5 Creating a TaggedPointer

static inline void * _Nonnull
_objc_makeTaggedPointer(objc_tag_index_t tag, uintptr_t value)
{
    // PAYLOAD_LSHIFT and PAYLOAD_RSHIFT are the payload extraction shifts.
    // They are reversed here for payload insertion.

    // ASSERT(_objc_taggedPointersEnabled());
    if (tag <= OBJC_TAG_Last60BitPayload) {
        // ASSERT(((value << _OBJC_TAG_PAYLOAD_RSHIFT) >> _OBJC_TAG_PAYLOAD_LSHIFT) == value);
        uintptr_t result =
            (_OBJC_TAG_MASK |
             ((uintptr_t)tag << _OBJC_TAG_INDEX_SHIFT) |
             ((value << _OBJC_TAG_PAYLOAD_RSHIFT) >> _OBJC_TAG_PAYLOAD_LSHIFT));
        return _objc_encodeTaggedPointer(result);
    } else {
        // ASSERT(tag >= OBJC_TAG_First52BitPayload);
        // ASSERT(tag <= OBJC_TAG_Last52BitPayload);
        // ASSERT(((value << _OBJC_TAG_EXT_PAYLOAD_RSHIFT) >> _OBJC_TAG_EXT_PAYLOAD_LSHIFT) == value);
        uintptr_t result =
            (_OBJC_TAG_EXT_MASK |
             ((uintptr_t)(tag - OBJC_TAG_First52BitPayload) << _OBJC_TAG_EXT_INDEX_SHIFT) |
             ((value << _OBJC_TAG_EXT_PAYLOAD_RSHIFT) >> _OBJC_TAG_EXT_PAYLOAD_LSHIFT));
        return _objc_encodeTaggedPointer(result);
    }
}

When a TaggedPointer is created, there are two cases:

  • With an extended tag
  • Without an extended tag

Without an extended tag, the tagged pointer consists of three parts: the tag mask, the tag index, and the payload. Each part is shifted into its correct position, and the parts are then combined with an OR operation.

The process is similar with an extended tag. The mask _OBJC_TAG_EXT_MASK combines the tag bit with a tag index of 7 (0b111).
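To see the shift-and-OR composition with concrete numbers, here is a sketch that builds the raw (not yet obfuscated) word for the MSB layout. The SK_ constants copy the MSB branch of the macros shown earlier; sk_make and sk_make_ext are hypothetical names, and the final call to _objc_encodeTaggedPointer is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the MSB branch of the shift macros. */
#define SK_TAG_MASK            (UINT64_C(1) << 63)
#define SK_INDEX_SHIFT         60
#define SK_PAYLOAD_LSHIFT      4
#define SK_PAYLOAD_RSHIFT      4
#define SK_EXT_MASK            (UINT64_C(0xf) << 60)
#define SK_EXT_INDEX_SHIFT     52
#define SK_EXT_PAYLOAD_LSHIFT  12
#define SK_EXT_PAYLOAD_RSHIFT  12

/* Basic form: flag | tag index | 60-bit payload. */
static uint64_t sk_make(unsigned tag, uint64_t value)
{
    assert(tag < 7);
    assert(((value << SK_PAYLOAD_RSHIFT) >> SK_PAYLOAD_LSHIFT) == value);
    return SK_TAG_MASK
         | ((uint64_t)tag << SK_INDEX_SHIFT)
         | ((value << SK_PAYLOAD_RSHIFT) >> SK_PAYLOAD_LSHIFT);
}

/* Extended form: 0b1111 | 8-bit extended index | 52-bit payload. */
static uint64_t sk_make_ext(unsigned ext_index, uint64_t value)
{
    assert(ext_index < 256);
    assert(((value << SK_EXT_PAYLOAD_RSHIFT) >> SK_EXT_PAYLOAD_LSHIFT) == value);
    return SK_EXT_MASK
         | ((uint64_t)ext_index << SK_EXT_INDEX_SHIFT)
         | ((value << SK_EXT_PAYLOAD_RSHIFT) >> SK_EXT_PAYLOAD_LSHIFT);
}
```

Tag index 3 with payload 42 gives 0xB00000000000002A, and extended index 9 with payload 5 gives 0xF090000000000005.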

3.6 Obtaining the Value of TaggedPointer

static inline uintptr_t
_objc_getTaggedPointerValue(const void * _Nullable ptr)
{
    // ASSERT(_objc_isTaggedPointer(ptr));
    uintptr_t value = _objc_decodeTaggedPointer_noPermute(ptr);
    uintptr_t basicTag = (value >> _OBJC_TAG_INDEX_SHIFT) & _OBJC_TAG_INDEX_MASK;
    if (basicTag == _OBJC_TAG_INDEX_MASK) {
        return (value << _OBJC_TAG_EXT_PAYLOAD_LSHIFT) >> _OBJC_TAG_EXT_PAYLOAD_RSHIFT;
    } else {
        return (value << _OBJC_TAG_PAYLOAD_LSHIFT) >> _OBJC_TAG_PAYLOAD_RSHIFT;
    }
}

static inline intptr_t
_objc_getTaggedPointerSignedValue(const void * _Nullable ptr)
{
    // ASSERT(_objc_isTaggedPointer(ptr));
    uintptr_t value = _objc_decodeTaggedPointer_noPermute(ptr);
    uintptr_t basicTag = (value >> _OBJC_TAG_INDEX_SHIFT) & _OBJC_TAG_INDEX_MASK;
    if (basicTag == _OBJC_TAG_INDEX_MASK) {
        return ((intptr_t)value << _OBJC_TAG_EXT_PAYLOAD_LSHIFT) >> _OBJC_TAG_EXT_PAYLOAD_RSHIFT;
    } else {
        return ((intptr_t)value << _OBJC_TAG_PAYLOAD_LSHIFT) >> _OBJC_TAG_PAYLOAD_RSHIFT;
    }
}

The implementation of these functions is simple:

  • First, decode the tagged pointer by XORing it with objc_debug_taggedpointer_obfuscator
  • Then check whether there is an extension tag
  • Finally, obtain the value with the shifts defined for the platform
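The only difference between the unsigned and signed getters is the type used for the right shift, which decides whether the top payload bit is sign-extended. The sketch below shows this for the basic MSB form (both shifts are 4) on a de-obfuscated word. The function names are hypothetical, and it assumes that converting to int64_t and right-shifting a negative value behave as two's-complement arithmetic shifts, which is implementation-defined in C but holds on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

#define SK_TAG_MASK (UINT64_C(1) << 63)

/* Logical shift: the vacated top bits are filled with zeros. */
static uint64_t sk_value(uint64_t bits)
{
    assert(bits & SK_TAG_MASK);
    return (bits << 4) >> 4;
}

/* Arithmetic shift: payload bit 59 is copied into the top bits.
 * The left shift is done unsigned to avoid signed overflow. */
static int64_t sk_signed_value(uint64_t bits)
{
    assert(bits & SK_TAG_MASK);
    return (int64_t)(bits << 4) >> 4;
}
```

A payload whose 60 bits encode -5 reads back as -5 through the signed getter, but as the large positive number 0x0FFFFFFFFFFFFFFB through the unsigned one.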

3.7 Obtaining the Tag of TaggedPointer

static inline objc_tag_index_t
_objc_getTaggedPointerTag(const void * _Nullable ptr)
{
    // ASSERT(_objc_isTaggedPointer(ptr));
    uintptr_t value = _objc_decodeTaggedPointer(ptr);
    uintptr_t basicTag = (value >> _OBJC_TAG_INDEX_SHIFT) & _OBJC_TAG_INDEX_MASK;
    uintptr_t extTag = (value >> _OBJC_TAG_EXT_INDEX_SHIFT) & _OBJC_TAG_EXT_INDEX_MASK;
    if (basicTag == _OBJC_TAG_INDEX_MASK) {
        return (objc_tag_index_t)(extTag + OBJC_TAG_First52BitPayload);
    } else {
        return (objc_tag_index_t)basicTag;
    }
}

  • First, decode the Tagged Pointer
  • Then obtain basicTag and extTag with shift operations
  • Finally, check whether there is an extension tag and return the corresponding result

3.8 Obtaining a Class pointer based on objc_tag_index_t

A Class pointer is obtained by using the tag index as a subscript into the objc_tag_classes or objc_tag_ext_classes array.

static Class *
classSlotForBasicTagIndex(objc_tag_index_t tag)
{
#if OBJC_SPLIT_TAGGED_POINTERS
    uintptr_t obfuscatedTag = _objc_basicTagToObfuscatedTag(tag);
    return &objc_tag_classes[obfuscatedTag];
#else
    uintptr_t tagObfuscator = ((objc_debug_taggedpointer_obfuscator
                                >> _OBJC_TAG_INDEX_SHIFT)
                               & _OBJC_TAG_INDEX_MASK);
    uintptr_t obfuscatedTag = tag ^ tagObfuscator;

    // Array index in objc_tag_classes includes the tagged bit itself
#   if SUPPORT_MSB_TAGGED_POINTERS
    return &objc_tag_classes[0x8 | obfuscatedTag];
#   else
    return &objc_tag_classes[(obfuscatedTag << 1) | 1];
#   endif
#endif
}

static Class *
classSlotForTagIndex(objc_tag_index_t tag)
{
    if (tag >= OBJC_TAG_First60BitPayload && tag <= OBJC_TAG_Last60BitPayload) {
        return classSlotForBasicTagIndex(tag);
    }

    if (tag >= OBJC_TAG_First52BitPayload && tag <= OBJC_TAG_Last52BitPayload) {
        int index = tag - OBJC_TAG_First52BitPayload;
#if OBJC_SPLIT_TAGGED_POINTERS
        if (tag >= OBJC_TAG_FirstUnobfuscatedSplitTag)
            return &objc_tag_ext_classes[index];
#endif
        uintptr_t tagObfuscator = ((objc_debug_taggedpointer_obfuscator
                                    >> _OBJC_TAG_EXT_INDEX_SHIFT)
                                   & _OBJC_TAG_EXT_INDEX_MASK);
        return &objc_tag_ext_classes[index ^ tagObfuscator];
    }

    return nil;
}
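One consequence of storing classes at the obfuscated index is that the class of an encoded pointer can be looked up from its top bits without decoding it first. The following sketch illustrates this for the non-split MSB case only. The table sk_classes stands in for objc_tag_classes and holds strings instead of Class values; all names, and the use of tag index 3 for NSNumber, are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for objc_tag_classes: 16 slots indexed by the top 4 bits. */
static const char *sk_classes[16];

/* Slot for a basic tag, mirroring the MSB branch:
 * XOR the tag with the obfuscator's tag bits, then set the flag bit. */
static const char **sk_slot_for_tag(unsigned tag, uint64_t obfuscator)
{
    assert(tag < 7);
    unsigned tag_obfuscator = (unsigned)((obfuscator >> 60) & 0x7);
    return &sk_classes[0x8 | (tag ^ tag_obfuscator)];
}

/* Look up the class straight from the encoded pointer's top 4 bits. */
static const char *sk_class_of(uint64_t encoded)
{
    return sk_classes[(encoded >> 60) & 0xf];
}
```

Registering a class for tag 3 under a given obfuscator and then XORing a raw tag-3 pointer with the same obfuscator lands both on the same slot, because the same three obfuscator bits are applied on each side.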