1. Understand the importance of hash

Aren’t we curious to see Hash everywhere in iOS development?

The figure below is just a partial list of the points(HashApplication analysis in iOS

A quote from Zhihu:

Algorithms, data structures, communication protocols, file systems, drivers — even if you don’t write them yourself, understanding how they work can help you troubleshoot and optimize your code, just as if you don’t design and build cars, but if you know how engines, transmissions, airbags, etc., How you drive fuel-efficient, prolong service life and guarantee their safety benefits learning without thought is labor lost; thought without learning is perilous, developers will be one with wave into the industry, no matter when and where, keep the depth and breadth of learning for its own development is very important, who all don’t want to retire at 60 still look up to in the add or delete.

1.1 Implementation principle of associated object:

See other resources for more details on the principles, but I’ll just cover the basic data structures used in the implementation. The associated object uses the structure of HashMap nested HashMap to store data. Simply speaking, the second HashMap of all the associated objects that store the object is extracted from the first HashMap according to the object, and then the corresponding value and policy of the attribute is extracted from the second HashMap according to the attribute name.

Associative objects are designed so that the value of the attribute can be found by passing in the object + attribute name. After the schema design is implemented, the basic steps for finding the associated objects of an object are as follows:

  • 1.Given condition one: object, hence the first oneHashMap(AssociationsHashMap), using a value that uniquely represents the objectkey, using a structure (name: value + policy) that stores all associated objects of the objectvalue
  • 2,Known condition two: attribute nameHence the secondHashMap(ObjectAssociationMap), using the attribute namekey, using the structure (value + policy) corresponding to the attribute namevalue

References:

Summary of iOS underlying principles – associated object implementation principle

The AssociatedObject AssociatedObject is fully resolved

1.2 Implementation principle of Weak:

The same detailed principles can be found elsewhere, but the basic data structures used in the implementation are described here. Weak uses a global HashMap nested array structure to store data. When the object (the object to which the weak pointer points) is destroyed, the array containing all the weak Pointers to this object is found from the HashMap according to the object, and then all elements in the array (the weak pointer) are set to nil.

The most important feature of weak is that when the object is destroyed, it automatically sets nil to reduce the risk of accessing wild Pointers. This is also the original intention of weak design. After the scheme design is implemented, the basic steps of setting weak pointer to nil are as follows:

  • Dealloc () select an array of weak Pointers to this object from the global HashMap based on a unique value as the key

  • Set all elements of the array to nil

Apple’s implementation of weak is similar to the implementation of notifications, specifying who (the weak pointer) listens to whom (the assignment object) what event (the dealloc operation) does what (the nil operation).

References:

The basic level of iOS analyzes the implementation principle of weak (including initialization, reference, and release analysis of the weak object)

1.3 basic data structure used by KVO implementation

It’s complicated. An object can be observed by n objects, and n properties of an object can be observed by N objects.

GNUstep KVC/KVO exploration (ii) : internal implementation of KVO

1.4 Principle of iOS App signature

In a word: consistent hash algorithm + asymmetric encryption and decryption algorithm

For details, see the principle of iOS App signature

1.5. The location where the object reference count is stored

Apple iOS source code thinking: object reference count stored in where? Inspiration from runtime source code

If objects support TaggedPointer {return directly returns the pointer value of the object as a reference count} else if the device is a 64-bit environment && Objective-C2.0 {return Part of the object isa pointer space (bits_extra_rc)} else {return hash table}Copy the code

1.6. Storage relationship between Runloop and thread

There is a one-to-one correspondence between threads and runloops (child threads may not have one), and the relationship is stored in a global Dictionary. There is no RunLoop when the thread is created, and if you don’t grab it, it never will. RunLoop creation occurs at the first fetch and RunLoop destruction occurs at the end of the thread. You can only get runloops inside a thread (except for the main thread).

1.7 Principle of NSDictionary:

After explaining the Hash table, the following is a brief explanation

Second, hash table

Bucket sort

2.1 hash table definition

A hash table (also called a hash table) is a data structure that accesses data stored in memory directly by key. A hash table is essentially an array, and each element in the array becomes a box that holds key-value pairs. Retrieves value from the array based on index. The key is how to get the index, which requires a fixed function (hash function) to convert the key to index. No matter how well a hash function is designed, it is possible for different keys to get the same hash value after being hashed, which requires handling hash conflicts.

2.2 advantages and disadvantages of hash table

Advantages: Hash tables can provide fast operations.

Disadvantages: Hash tables are usually based on arrays and are difficult to expand once arrays are created. There is also no easy way to traverse the data items in a table in any order, such as from smallest to largest.

In summary, if the data is not traversed in order, the well can predict the size of the data volume in advance. So hash tables are unmatched in terms of speed and ease of use.

2.3. Hash search steps

  • 1. Use a hash function to transform the searched key map into the index of the array. Ideally, different key maps have different array subscripts, and all the search time is O(1). But that’s not the case, so the second step in hash lookup is to deal with hash conflicts.

  • 2. Deal with Hash collision. There are many processing methods, such as zipper method, linear detection method.

2.4 Hash table stored procedure

  • 1. Use the hash function to obtain the hash value h based on key

  • 2, if the number of bins is N, then the value should be stored in the bottom (h% N) bins. H %n has a value range of [0, n-1].

  • 3. If the box is not empty (a value has been stored), that is, different keys get the same H, resulting in a hash conflict, in this case, the use of zipper method or open addressing linear detection method to resolve the conflict.

Hash (" hash ") = 23; Hash (" hash ") = 30; Hash (" hash ") = 23;Copy the code

2.5. Common hash functions:

The first step in hash lookup is to map keys to indexes using hash functions. This mapping function is the hash function. If we have an array that holds 0-m, then we need a hash function that can convert any key to an index (0 to m-1) in the range of that array. The hash function needs to be easy to compute and distribute all the keys evenly. For a simple example, using the last three digits of a mobile phone number as a key is better than using the first three digits because of the high repetition rate. For example, it is better to use an ID number and date of birth than to use the first few digits.

In practice, our keys are not all numbers, they may be strings, they may be combinations of several values, etc., so we need to implement our own hash function.

  • 1. Direct addressing
  • 2. Numerical analysis
  • 3. Square the middle method
  • 4, folding method
  • Random number method
  • 6, divide the remainder method

It is not easy to design an excellent hash algorithm. Based on experience, the following requirements are summarized:

  • The original data cannot be derived backwards from the hash value (so the hash algorithm is also called one-way hash algorithm);
  • Very sensitive to input data, even if the original data only changed a Bit, the final hash value will be very different;
  • The probability of hash conflict is very small, for different original data, the probability of the same hash value is very small;
  • The efficiency of hashing algorithm should be as efficient as possible. For long text, hashing value can also be calculated quickly.

2.6 load factor = total keyed logarithm/number of arrays

Load factor is an important attribute of hash table, which is used to measure the degree of empty/full hash table, and can also improve the efficiency of query to a certain extent. The larger the load factor, the fuller the hash table, the more likely it is to cause conflicts, and the lower the performance. So when the load factor is greater than a constant (typically 0.75), the hash table will automatically expand. When a hash table is expanded, it typically creates twice the size of the original array. Therefore, even if the hash value of the key does not change, the result of modulating the number of arrays will change as the number of arrays increases, so the positions of the key-value pairs may change. This process is also known as rehash.

When a large number of arrays are added to the hash table, data needs to be rehashed and moved, which affects performance greatly.

Although hash table expansion can reduce the load factor, it can not always improve the query performance of hash table effectively. For example, the unreasonable design of hash function leads to the same hash value calculated by all keys, so their positions are still in the same linked list and become a linear list even if the capacity is expanded. The performance is extremely low, and the time complexity of query becomes O(n).

2.7 Solutions to Hash conflicts:

Method one: zipper method

It’s simply an array plus a linked list. The key is mapped to the subscript index of an array of size M by the hash function. Each element of the array points to a linked list, and each node in the linked list stores a key-value pair with the hash index as the subscript of the node.

Java 8 uses the zipper approach to resolve hash conflicts. When the design of hash function is not reasonable, the linked list is very long (if the length of the list is over 8, it is switched to a red-black tree, and if the length is less than 6, it is degraded to a linked list again). Switching the linked list to a red-black tree can ensure the efficiency of insertion and search, but the disadvantage is that when the hash table is large, the expansion of the hash table will lead to the instantaneous efficiency reduction.

Redis also uses the zipper method to resolve hash conflicts. Incremental capacity expansion solves the problem of instantaneous capacity reduction in Java 8, and the zipper approach (inserting new key pairs at the head of the list) brings two benefits:

  • First, the head insertion method can save the insertion time. If it goes to the tail, the time is zeroO(n)Find the tail of the list, or need an additional memory location to hold the tail of the list.
  • Two, the head plug method can save the search time. For a data system, newly inserted data is likely to be queried frequently.

Method two: open addressing linear detection

Use two arrays of size N (one for keys and one for values). When a collision occurs (that is, the index of the hash value of one key is occupied by the index of the array of another key), the index is increded by one (index += 1), which produces three results:

  • 1, missed (array subscript value is empty, not occupied).
keys[index] = key
values[index] = value
Copy the code
  • 2, hit (array subscript value is not empty, occupied).
if keys[index] == key {
	values[index] = value
}
Copy the code
  • 3, hit (array subscript value is not empty, occupied).
if keys[index] ! = key {index += 1 // Repeat the above steps until either result 1 or result 2 above is stopped. / /... / /... }Copy the code

Advantages of the zipper method

Compared with open addressing linear probe, the zipper method has the following advantages:

  • (1),Zipper methodConflict processing is simple and there is no stacking phenomenon, that is, non-synonyms will never conflict, so the average search length is short.
  • (2), becauseZipper methodThe node space of each linked list is applied dynamically, so it is more suitable for the situation that the table length cannot be determined before table construction.
  • (3),Open addressing linear probe sendsIn order to reduce conflicts, the filling factor α is required to be small, so when the node scale is large, a lot of space will be wasted. whileZipper methodCan be taken as α≥1, and the node is large,Zipper methodThe pointer field added in, can be ignored, so it saves space;
  • (4), and in useZipper methodIn the constructed hash table, the operation of deleting nodes is easy to implement. Simply delete the corresponding nodes on the linked list. And forOpen addressing linear probe sendsThe deletion node cannot simply empty the space of the deleted node, otherwise it will truncate the lookup path of the synonym node that fills in the hash after it. This is because of variousOpen addressing linear probe sends, empty address cell (open address) is a search failure condition. So in useOpen addressing linear probe sendsDeleting a hash table that handles a conflict can only mark the node to be deleted, not actually delete the node.

Disadvantages of the zipper method

Pointers need extra space, so when the node size is small, open addressing linear probe saves space, and if the saved pointer space is used to expand the size of the hash table, the loading factor can be reduced, which reduces the conflict in open addressing linear probe, so as to improve the average search speed.

Disadvantages of open addressing linear detection method

  • 1, easy to produce stacking problems;
  • 2. Not suitable for large-scale data storage;
  • 3. The design of hash functions has a great influence on conflicts;
  • 4. Multiple conflicts may occur during insertion. The deleted element is one of multiple conflicting elements.
  • 5. Large nodes will waste a lot of space;

2.8. Average search length of Hash table

The average search length of the Hash table includes the average search length when the search succeeds and the average search length when the search fails.

Average search length when the search is successful = sum of comparison times when each element in the table is searched successfully/number of elements in the table;

Search is not successful when the average search length of equivalent element in the table lookup is not successful, on average, more often, in the table can be understood as to insert an element, the element in each position is possible, and then calculate the comparison are required for each position to be able to insert the number of times, and then divided by the long table is the average search length of search is not successful.

Example:

Given a set of data {32,14,23,01,42,20,45,27,55,24,10,53}, assume that the length of the hash table is 13 (the nearest prime number to n) and that the hash function is H(k) = k%13. Draw the hash table constructed by linear detection method and zipper method respectively, and calculate the average search length of successful and unsuccessful search in the case of equal probability.

First, zipper method

Average search length when a search is successful:

ASL = (1*6+2*4+3*1+4*1)/12 = 7/4
Copy the code

Average search length when the search is unsuccessful:

ASL = (4+2+2+1+2+1)/13
Copy the code

Two, linear detection method

Number of searches when the search is successful = number of comparisons when the element is inserted, and average length of searches when the search is successful:

ASL = (1 + 2 + 1 + 4 + 3 + 1 + 1 + 1 + 3 + 9 + 1 + 1 + 3) / 12 = 2.5Copy the code

The number of searches when the search fails = the number of comparisons when the search fails is, the distance between the NTH location and the first location without data: for example, the 0th location is 1, and the first location is 2. Average search length when the search is unsuccessful:

ASL = (1+2+3+4+5+6+7+8+9+10+11+12)/ 13 = 91/13
Copy the code

NSDictionary version 1: YesNSMapTableImplemented, adoptedZipper methodResolve hash conflicts

typedef struct {
   NSMapTable        *table;
   NSInteger          i;
   struct _NSMapNode *j;
} NSMapEnumerator;
Copy the code

The above structure describes a pointer object that traverses an NSMapTable, including a pointer to the table object itself, a count, and a node pointer.

typedef struct {
   NSUInteger (*hash)(NSMapTable *table,const void *);
   BOOL (*isEqual)(NSMapTable *table,const void *,const void *);
   void (*retain)(NSMapTable *table,const void *);
   void (*release)(NSMapTable *table,void *);
   NSString  *(*describe)(NSMapTable *table,const void *);
   const void *notAKeyMarker;
} NSMapTableKeyCallBacks;
Copy the code

The above structure stores Pointers to functions used to calculate the hash value of keys, determine whether keys are equal, retain, and release operations.

typedef struct {
   void       (*retain)(NSMapTable *table,const void *);
   void       (*release)(NSMapTable *table,void *);
   NSString  *(*describe)(NSMapTable *table, const void *);
} NSMapTableValueCallBacks;
Copy the code

The above three function Pointers define the operation on the value object when a pair of key-values is inserted into the NSMapTable.

@interface NSMapTable : NSObject {
   NSMapTableKeyCallBacks   *keyCallBacks;
   NSMapTableValueCallBacks *valueCallBacks;
   NSUInteger             count;
   NSUInteger             nBuckets;
   struct _NSMapNode  **buckets;
}
Copy the code

Is NSMapTable really a hash + linked list data structure? The find time for inserting or deleting an object in NSMapTbale = O(1) + O(m), m being the worst possible n.

O(1) : Hash the key to obtain the location of the bucket

O(m) : the value of different keys with the same hash value is stored in the linked list to traverse the time of the linked list

The above conclusion and the corresponding explanation seem reasonable, right? Take a look at the analysis below!

NSDictionary version 2: Yes_CFDictionaryThe encapsulation used to resolve hash conflicts isOpen addressing linear detection method

struct __CFDictionary {
    CFRuntimeBase _base;
    CFIndex _count;
    CFIndex _capacity;
    CFIndex _bucketsNum;
    uintptr_t _marker;
    void *_context;
    CFIndex _deletes;
    CFOptionFlags _xflags;
    const void **_keys;	
    const void **_values;
};
Copy the code

As you can see from the above data structure, NSDictionary uses two pointer arrays to hold keys and values respectively. The key value pairs are stored in a continuous fashion. The zipper method wraps the key and value into a single result store (linked list nodes), while the Dictionary structure has keys and values (open addressing linear detection resolves hash collisions using two arrays), indicating that the two data are stored separately, unlike the zipper method.

NSDictionary uses open addressing linear detection to resolve hash conflicts:

As you can see, the keys and values set by the NSDictionary will calculate the empty bucket array based on the specific hash function. The keys and values will be the same as many keys and values. Then when storing the data, the index index will be found according to the value calculated by the hash function. Move inserts after opening addressing. If the empty bucket array reaches the data threshold, the empty bucket array is expanded and hashed again.

This inserts discrete key-values into the hash table that can establish a relationship. When we look up the key, we calculate the key from the hash > value, and then we access the hash table keys and values directly from the index. In this way, the query speed can be as close to O(1) as continuous linear storage data, but it takes up a bit of space, and the performance is very strong.

If deleted, it will also be logically deleted according to the _maker flag, unless the NSDictionary memory is removed.

NSDictionary uses this design for query performance. Second, NSDictionary will always be released quickly during use and will not occupy memory for a long time.

2.11 Apple scheme Selection:

Apple uses both zip and open addressing linear detection to resolve hash conflicts. Which one to use depends on the life cycle and characteristics of the stored data.

  • @synchronizedUsing theZipper method. The data mostly stored by the zipper method is a universal type and can be repeatedly used. Just like the lock stored by @synchronized is an implementation structure of irrelevant business, the probability of multiple objects using the same lock is quite high when the program is running, which effectively saves memory.
  • Weak object associatedObjectUSES aOpen addressing linear detection method. Open addressing linear probe is used to store data that is temporary and released as soon as it runs out, like associatedObject, weak.

2.12 NSDictionary stored procedure:

See another blog I wrote: iOS notes: learn more about ==, isEqual, and hash

  • 1, through the method- (void)setObject:(id)anObject forKey:(id <NSCopying>)aKey;You can see that the key must be followedNSCopyAgreement, that is to sayNSDictionaryThe key of the copy is a new copy, and the value is a shallow copy (if you want to make a deep copy)NSMapTable).
  • 2, But this is not enough, the key must be inheritedNSObjectAnd rewrite-(NSUInteger)hash;and-(BOOL)isEqual:(id)object;Two methods. The first function is used to compute the hash value, and the second function is used to determine if the values are the same when the hash value is the same (resolveHash conflict).

Refer to the blog

A brief introduction to algorithms and data structures: eleven hash tables

NSDictionary and NSMutableArray basic principles (hash tables and ring buffers)

Swift dictionary implementation principle

Understand hash tables in depth

Unscrambles objective-C NSDictionary

IOS redo the wheel, write an NSDictionary

IOS redo the wheel, write an NSDictionary

Hash table – linear probe, chain address method, average length of successful search, unsuccessful search

Hash table, hash algorithm, consistent hash table

Detail @synchronized and dispatch_once