Two or three things about NSArray

In iOS development, we use arrays in many, many ways. And there are a lot of details about arrays that need to be noticed and optimized, and we need to dive in and see. The following are the fragments that I have accumulated during my long time of work and study. I have accumulated enough pieces that I should take them out to shine a light.

Reading out of the question

When I was reading Effective Objective-C 2.0 to learn English a few days ago, I noticed something I hadn’t noticed before:

In the case of NSArray, when an instance is allocated, Allocated tokens it’s an instance of another class that’s allocated, known as a placeholder array. This placeholder array is then converted to an instance of another class, which is a concrete subclass of NSArray. When you use NSArray’s alloc method to get an instance, it first classifies an instance of a class that acts as a placeholder array. That array is later converted to an instance of another class that is an entity subclass of NSArray.

Without further ado, write two lines of code:

    NSArray *placeholder = [NSArray alloc];
    NSArray *arr1 = [[NSArray alloc] init];
    NSArray *arr2 = [[NSArray alloc] initWithObjects:@0, nil];
    NSArray *arr3 = [[NSArray alloc] initWithObjects:@0, @1, nil];
    NSArray *arr4 = [[NSArray alloc] initWithObjects:@0, @1, @2, nil];
    
    NSLog(@"placeholder: %s", object_getClassName(placeholder));
    NSLog(@"arr1: %s", object_getClassName(arr1));
    NSLog(@"arr2: %s", object_getClassName(arr2));
    NSLog(@"arr3: %s", object_getClassName(arr3));
    NSLog(@"arr4: %s", object_getClassName(arr4));
    
    NSMutableArray *mPlaceholder = [NSMutableArray alloc];
    NSMutableArray *mArr1 = [[NSMutableArray alloc] init];
    NSMutableArray *mArr2 = [[NSMutableArray alloc] initWithObjects:@0, nil];
    NSMutableArray *mArr3 = [[NSMutableArray alloc] initWithObjects:@0, @1, nil];
    
    NSLog(@"mPlaceholder: %s", object_getClassName(mPlaceholder));    
    NSLog(@"mArr1: %s", object_getClassName(mArr1));
    NSLog(@"mArr2: %s", object_getClassName(mArr2));
    NSLog(@"mArr3: %s", object_getClassName(mArr3));
Copy the code

The printout looks something like this:

The 2018-02-25 09:09:15. 628381 + 0800 NSArrayTest [44716-5228210] placeholder: __NSPlaceholderArray 2018-02-25 09:09:15.628749+0800 NSArrayTest[44716:5228210] arr1: __NSArray0 2018-02-25 09:09:15.629535+0800 NSArrayTest[44716:5228210] __NSSingleObjectArrayI 2018-02-25 09:09:15.630635+0800 NSArrayTest[44716:5228210] arr3: __NSArrayI 2018-02-25 09:09:15.630789+0800 NSArrayTest[44716:5228210] ARR4: __NSArrayI 2018-02-25 09:09:15.630993+0800 NSArrayTest[44716:5228210] mPlaceholder: __NSPlaceholderArray 2018-02-25 09:09:15.631095+0800 NSArrayTest[44716:5228210] __NSArrayM 2018-02-25 09:09:15.631954+0800 NSArrayTest[44716:5228210] __NSArrayM 2018-02-25 09:09:15.632702+0800 NSArrayTest[44716:5228210] mArr3: __NSArrayMCopy the code

When an array is empty the result is an __NSPlaceholderArray when alloc is empty. When we init an immutable empty array, we get __NSArray0; If there is one and only one element, it is __NSSingleObjectArrayI; Those with more than one element are called __NSArrayI; Init produces a mutable array, all __NSArrayM.

    NSArray *placeholder1 = [NSArray alloc];
    NSArray *placeholder2 = [NSArray alloc];
    NSLog(@"placeholder1: %p", placeholder1);
    NSLog(@"placeholder2: %p", placeholder2);
Copy the code

The printed results are interesting

The 2018-02-25 09:41:45. 097431 + 0800 NSArrayTest [45228-5277101] placeholder1: 0x604000005d90 2018-02-25 09:41:45.097713+0800 NSArrayTest[4522:5277101] meas 2: 0x604000005d90Copy the code

These two memory addresses are the same, and we can guess that a singleton was generated that was replaced by a new instance after init. There is only one ISA pointer inside the class and nothing else. Since Apple did not release the source code here, I looked at other similar open source sources and came to the following conclusions:

1. When the element is null, a singleton of __NSArray0 is returned; If there is only one element, an instance of __NSSingleObjectArrayI is returned. If the element is greater than one, an instance of __NSArrayI is returned. 4, most of the information on the Internet does not mention __NSSingleObjectArrayI, may be added later, the reason is probably for efficiency, here not to explore.

So in order to distinguish between mutable and immutable cases, when we init it, we’re going to create immutablePlaceholder and mutablePlaceholder depending on whether it’s NSArray or NSMutableArray, They are all __NSPlaceholderArray.

Create an array

In each of the above methods, initWithObjects:count: is called at the end.

@interface NSArray<__covariant ObjectType> : NSObject <NSCopying, NSMutableCopying, NSSecureCoding, NSFastEnumeration>

@property (readonly) NSUInteger count;
- (ObjectType)objectAtIndex:(NSUInteger)index;
- (instancetype)init NS_DESIGNATED_INITIALIZER;
- (instancetype)initWithObjects:(const ObjectType _Nonnull [_Nullable])objects count:(NSUInteger)cnt NS_DESIGNATED_INITIALIZER;
- (nullable instancetype)initWithCoder:(NSCoder *)aDecoder NS_DESIGNATED_INITIALIZER;

@end
Copy the code

That’s the beauty of a class family. When you create a subclass of a class family, you don’t need to implement everything. In the Primitive methods referred to in CoreFoundation’s abstract factory base classes (NSArray, NSString, NSNumber, etc.), the core methods that must be overridden to create subclasses are referred to as Primitive methods. It is usually declared in the interface of the class, and it is usually stated in the documentation. Other methods that can optionally be implemented are declared in a Category. At the same time, it should be noted that the Primitive methods of the ancestors of the whole inheritance tree should be implemented.

CFArray and NSMutableArray

CFArray is in CoreFoundation, and as opposed to NSArray in Foundation, it’s toll-free Bridged. As you can see from ibireme’s blog post, CFArray was originally implemented using a dual-ended queue, but it was later changed due to performance issues. Since there was no open source code, Ibireme had to guess through testing that it might be implemented with a circular buffer (although it is now confirmed that it is still dual-ended). Any typical programmer knows how C arrays work. It boils down to a contiguous memory space that can be easily read and written. Arrays and Pointers are not the same thing (see Expert C Programming or this article for details). It cannot be said that a malloc memory space is the same as an array (an overused term).

One of the most obvious disadvantages of using a linear memory space is that when an element is inserted at subscript 0, all other elements need to be moved. This is how memmove works:

This blog post
__NSArrayM
Circular buffer

Ring buffers have some really cool properties. In particular, insertion or deletion at either end does not require any memory to be moved unless the buffer is full. Let’s look at how this class makes full use of the ring buffer to make itself much more powerful than C arrays. We learned a few interesting things here: Pointers are not cleared on delete. The most interesting thing is that if we insert or delete in the middle, only the elements on the least side will be moved.

NSMutableArray Class Reference
NSMutableArray

– count
– objectAtIndex:
– insertObject:atIndex:
– removeObjectAtIndex:
– addObject:
– removeLastObject
– replaceObjectAtIndex:withObject:

Not surprisingly, __NSArrayM fulfills this rule. However, the list of all implementation methods for __NSArrayM is quite short and does not contain the 21 additional methods listed in the NSMutableArray header. Who is responsible for implementing these methods?

This proves that they are just part of the NSMutableArray class itself. This can be quite handy: any NSMutableArray subclass only needs to implement the seven most basic methods. All other high-level abstractions are built upon them. For example, the -removeAllobjects method simply iterates back, calling -removeObjectatIndex: one by one.

N ways to iterate over a set of numbers

1. The for loop

for (int i = 0;  i < array.count; ++i) {
       id object = array[i];
  }
Copy the code

2.NSEnumerator

NSArray *anArray = /*... * /. NSEnumerator *enumerator = [anArray objectEnumerator]; id object;while((object = [enumerator nextObject])! = nil){ }Copy the code

3.forin

Rapid traverse

NSArray *anArray = /*... * /.for (id object in anArray) {

  }
Copy the code

4.enumerateObjectsWithOptions:usingBlock:

With a block callback, iterating through the child thread, the object’s callback order is out of order, and the calling thread waits for the loop to complete:

[array enumerateObjectsWithOptions:NSEnumerationConcurrent usingBlock:^(id  _Nonnull obj, NSUInteger idx, BOOL * _Nonnull stop) {
        xxx
  }];
Copy the code

Performance comparison figure

1. __NSArrayI

@interface __NSArrayI : NSArray
{
    NSUInteger _used;
    id _list[0];
}
@end

Copy the code

_used is the number of elements in the array and is returned when [array count] is called. Here we can use id _list[0] as id *_list, which is a buff to store id objects. Due to the immutable nature of __NSArrayI, once _list is allocated, there is no movement or deletion until it is freed, only the object is retrieved. The __NSSingleObjectArrayI structure is defined as:

@interface __NSSingleObjectArrayI : NSArray
{
    id object;
}
@end

Copy the code

Since the __NSSingleObjectArrayI object is only available when “creating an immutable array containing only one object”, its internal structure is much simpler.

@interface __NSArrayM : NSMutableArray
{
    NSUInteger _used;
    NSUInteger _offset;
    int _size:28;
    int _unused:4;
    uint32_t _mutations;
    id *_list;
}
@end

Copy the code

__NSArrayM is a little more complicated, but again, its internal array of objects is a contiguous memory id* _list, Just as __NSArrayI’s id _list[0] _used: the current number of objects _offset: the starting offset of the actual array of objects, the use of which will be discussed later _size: the allocated size of the _list (the number of objects that can be stored, not the number of bytes) _mutations: Change the token. Every change to __NSArrayM causes _mutations to add 1 to id *_list, which is a circular array. In addition, it dynamically reallocates data to meet current storage requirements.

As we said above, __NSArrayM uses circular buffers. In addition, it dynamically reallocates data to meet current storage requirements. For example, a _list that initially contains five objects with a total size of _size 6: _offset = 0,_used = 5,_size=6

After three objects are appended to the end: _offset = 0,_used =8,_size=8 _list has been reassigned

Delete object A: _offset = 1,_used = 7,_size=8

Delete object E: _offset = 2,_used = 6,_size=8

Append two objects to the end: _offset = 2,_used =8,_size=8 _list is sufficient to store the two new objects, so the two new objects are stored at the beginning of the _list instead of being reallocated

Probe into the speed characteristics of traversal

1. For loop &for in

These are the two fastest, so let’s take Forin as an example. Forin complies with the NSFastEnumeration protocol and has only one method:

- (NSUInteger)countByEnumeratingWithState:
                        (NSFastEnumerationState *)state
                             objects:(id *)stackbuffer 
                                  count:(NSUInteger)len;
Copy the code

It takes the object directly from the C array. For mutable arrays, it takes at most two times to get full speed. If the array is not already looped, all elements are retrieved the first time, just like an immutable array. But if the array forms a loop, you need to get the beginning of the object array offset to the end of the loop array twice, and the remaining elements stored at the beginning of the loop array twice. The for loop is slower because objectAtIndex is called every time: if we don’t need to get the index for the current loop, we can choose Forin.

2. Block cycle

This loop is the slowest, but we can get more information directly from the block as we traverse, and we can change the block’s method signature to avoid casting.

for(NSString *key inaDictionary){ NSString *object = (NSString *)aDictionary[key]; } NSDictionary *aDictionary = /*... * /. [aDictionary enumerateKeysAndObjectsUsingBlock: ^(NSString *key,NSString *obj,BOOL *stop){ }];Copy the code

In addition, dispatch group can also be conveniently used in case of concurrency. One more point: If the number of arrays is too large, all traversal methods other than block traversal need to be optimized by adding an autoreleasePool method. Blocks are not needed because the system already implements the relevant processing when it implements them.

reference

Effective Objective-C 2.0: 52 Effective ways to write high quality iOS and OS X code Performance and principles of array traversal