The basic principles of hash tables

Let’s start by reviewing arrays and lists we’ve implemented before, whether it’s a dynamic array, a linked list, or a circular linked list, and if you’re looking for a particular value from one of them, you inevitably have to iterate over all the elements to compare them.

As users of object-oriented development languages, you’ve probably used objects like Maps, which store a key-value pair through key: value. Such objects in Objective-C are dictionaries: NSDictionary, NSMutableDictionary, and Dictionary in Swift. The time complexity of the key value is O(1). We used to just use them, but now we’re thinking about how we can implement a custom dictionary ourselves using the data structures we already know.

As mentioned above, the time complexity of searching values through arrays and linked lists is O(n) level, so how to achieve only O(1) level to query values? Let’s start by reviewing a few data structures: arrays, unidirectional lists, and bidirectional lists. Which data structure can be O(1) anywhere?

The answer is an array, and the data can be evaluated by subscript anywhere in the array in order one time. In this case, we decided to use static arrays as the underlying structure for storing data. In this case, our key-value pairs are stored in arrays. But the time complexity of O(1) is only possible with subscripts, and looking up values directly still requires sequential comparisons of elements for equality.

So the next question is, how do I convert the traversal lookup of the key into a lookup of the array directly through the index? To do this, consider a function that converts different keys to an index of the corresponding array length, and each key converts to a different index. And this function is O(1) level of data complexity, so when you need to store a key-value pair in an array, all you need to do is use this function to calculate the corresponding index in the array, and then store the key-value pair in the corresponding location in the array. When you want to fetch a key from an array, you just need to use this function to calculate the index of the key in the array, and then use the index to fetch the key from the array. Save, fetch, and search are all O(1) time complexities.

I don’t need to explain what a hash table is, but that’s how a hash table works, a static array plus a function that evaluates index by key, is a hash table. Here is a look at the data storage process of the hash table analyzed above:

We have two key-value pairs that need to be stored in our custom dictionary. Key is the name and value is the age, which are Whip:18 and Jack:20 respectively.

If Whip:18 is stored, the hash function is used to calculate that the index of the hash table corresponding to the Whip is 2. Then, a node object in the hash table is created with key pointing to the Whip and value pointing to 18. Then, the node is stored at the position with index 2 in the hash table.

When storing Jack:20, first calculate the index of Jack in the hash table as 6 through the hash function, then create a node object of the hash table with key pointing to Jack and value pointing to 20, and then store the node at the position of index 6 in the hash table.

Let’s look at the process of evaluating a hash table:

To obtain the age of the Whip, the hash table uses the hash function to calculate the location where the Whip is stored: index = 2. Then, the hash table directly takes the node where the Whip is stored and returns the value of the node 18.

The hash function

Since we need a hash function to calculate the index of different keys in the array, we need to implement a hash function first. To qualify, a hash function must satisfy the following conditions:

  • Different keys produce as different an index as possible.
  • The generated index must be within the length of the hash table.

In Objective-C, the NSObject object comes with a hash method that returns the hash value of an object:

Person *p1 = [Person personWithAge:1];
Person *p2 = [Person personWithAge:1];
NSLog(@"%zd %zd", p1.hash, p2.hash); // Print 4345476000 4345477856Copy the code

You can see that even objects of the same type with the same attribute value return different hashes. We can also override the hash method of our custom object to return the desired hash value, so that objects with the same property value return the same hash value:

- (NSUInteger)hash {
    NSUInteger hashCode = self.age; / / odd prime NumbershashCode * 31 == (hashCode<<5) - hashCode
    hashCode = (hashCode<<5) - hashCode;
    return hashCode;
}
Copy the code

The above calculation is done the way the JDK does, multiplying an integer by 31 to get the hash value. Since 31 is an odd prime, multiplying by it makes it easier to achieve uniqueness and reduce collisions.

Since the hash method has been overwritten, it needs to be matched with the isEqual method of the overwritten object. A correct judgment logic needs to satisfy:

  • Two objects isEqual: If the value is true, the hash value returned by the two objects must be equal.
  • Two objects whose hashes are equal, isEqual: not necessarily equal.

Imagine if two objects are equal: returns true, which means they are equal, but their hashes do not want to be equal, which means they may be stored in different places in the hash table, which is equivalent to having two key-value pairs with the same key in a Map or AN NSDictionary, which is obviously illogical.

Now that we know how to use the default and custom methods to return a key, let’s calculate its index in an array using the key hash. First, the hash value of key is also an integer, and the length is not necessarily the same. For example, the default return is 4345477856. After we customize the hash method, 31 is returned. Assuming that our array is only 8 in length, we certainly cannot treat the hash of the key as the index of the array.

If the length of the array is 2^3 = 8, 8-1 = 7, the corresponding binary bit is: 0111.

Here we do ampersand with 0111:

// Decimal 253 1111 1101&0111 --------- 0101 = 5 // Decimal 31 0001 1111&0111 --------- 0111 = 7 // Decimal 8 0000 1000&0111 -- -- -- -- -- -- -- -- -- 0000 = 0Copy the code

If the key hash is ampersand of length -1 (a power of 2), then the index of the key hash can be obtained:

- (NSUInteger)indexWithKey:(id)key {
    if(! key)return 0;
    NSUInteger hash = [key hash];
    return hash & (self.array.length - 1);
}
Copy the code

Now this function is not very useful, for example, if the hash table is 2^6, 2^ 6-1 = 63, and the binary bits are 0011 1111

// 6390901416293117903 0101 1000 1011 0001 0000 1010 1111 1010 0100 1000 1011 0001 0010 1111 1100 1111&011 1111 ---------------------- 000 1111 = 15 // 6390901416293511119 0110 1000 1011 0010 0000 1010 1111 1010 0100 1000 1011 0111 0010 1111 1100 1111 & 011, 1111 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 000, 1111 = 15Copy the code

Since the last 7 binary bits of 6390901416293117903 and 6390901416293511119 are 1001111, the result of the bit-operation with 011 1111 is 000 1111 = 15, and the high position is not involved in the operation. The result is that the index of the array will be the same as long as the last 7 hash digits are the same, and we should try to make all the hash digits count.

The hash value is shifted 16 bits to the right, and the hash value is ^, and the array length is -1.

(6390901416293117903 ^ (6390901416293117903 >> 16)) & 63); // 62
(6390901416293511119 ^ (6390901416293511119 >> 16)) & 63); // 56
Copy the code

Here is the hash function of the optimized final hash table:

- (NSUInteger)indexWithKey:(id)key {
    if(! key)return 0;
    NSUInteger hash = [key hash];
    return (hash ^ (hash >> 16)) & (self.array.length - 1);
}
Copy the code

Although the above optimization can avoid the calculation of the same index with different hashes to a certain extent, it still cannot completely avoid, for example: 6390901416293117903 and 6681946542211936207, because the last eight bits of binary are the same, and the last eight bits of the right shift are the same, and the last few bits of the 63 are the same, so it’s inevitable that there will be different keys computed by the hash function, Same thing with index in an array.

Hash conflict

The key value pair needs to be stored in a hash table. After index is calculated using key, other key value pairs are already stored in the hash table. The problem occurs when the key of the index pair in the array is the same as the key of the key-value pair that is being added by the hash function. There are two types of problems:

  • The current key is the same as the key at the index position in the array, so you just need to overwrite the new key.
  • The current key is not the same as the key at the index position in the array.

The second situation is the so-called hash conflict, which is inevitable, and there are many ways to solve the hash conflict:

  • Index is evaluated again by other rules until an empty position is found.
  • Find the first non-empty position before or after index.
  • Still place the element at index and store the two elements as a linked list, etc.

We use the third method, that is, to find conflicts, in the form of a linked list of all elements in an index, as shown in the following figure:

When storing Jack:20, index = 2 was calculated by hash function, and it was found that the position with index = 2 in the hash table already had other nodes. In this case, next of the last node was pointed to the new node.

The value in this case is shown below:

First, the index corresponding to the key is calculated by the hash function, and then the first node corresponding to the array is found, and the key stored by the node is compared with whether the key queried is equal. If not, the next node is found by the next pointer of the node, and the judgment process is repeated until the node with the same key is found.

The basic structure of hash tables

Through the above analysis, the structure of the hash table is already clear. First, the hash table is a static array, and the data stores the nodes of the hash table. The nodes of the hash table are one-way linked list structure, and each node points to the next node through the next pointer. We also need to implement a hash function that uses key to calculate its position in the hash table, which we’ve already implemented.

Here I have encapsulated the hash table as a dictionary-like class, the external interface is completely consistent with the system’s NSMutableDictionary, the implementation of the same function. Start by creating a JKRHashMap_LinkedList class.

_size is used to store the number of key/value pairs in the current hash table.

#import <Foundation/Foundation.h>NS_ASSUME_NONNULL_BEGIN @interface JKRHashMap_LinkedList<KeyType, ObjectType> : NSObject{@protected // number of nodes NSUInteger _size; } /// Number of elements - (NSUInteger)count; // clear all elements - (void)removeAllObjects; - (void)removeObjectForKey:(KeyType)key; /// Add an element - (void)setObject:(nullable ObjectType)object forKey:(nullable KeyType)key; // get element - (nullable ObjectType)objectForKey:(nullable KeyType)key; - (BOOL)containsObject:(nullable ObjectType)object; /// containsKey - (BOOL)containsKey:(nullable KeyType)key; @end @interface JKRHashMap_LinkedList<KeyType, ObjectType> (JKRExtendedHashMap) - (nullable ObjectType)objectForKeyedSubscript:(nullable KeyType)key; - (void)setObject:(nullable ObjectType)obj forKeyedSubscript:(nullable KeyType)key;

@end


NS_ASSUME_NONNULL_END
Copy the code

Then create the node object of the hash table:

@interface JKRHashMap_LinkedList_Node : NSObject

@property (nonatomic, strong) id key;
@property (nonatomic, strong) id value;
@property (nonatomic, strong) JKRHashMap_LinkedList_Node *next;

@end
Copy the code

Basic functions of hash tables

Initialize the

First, there should be a static array in the hash table. Since Objective-C does not provide one, a static array was implemented in advance in the first article. It needs to be created when the hash table is initialized and has a power of two:

@interface JKRHashMap_LinkedList ()

@property (nonatomic, strong) JKRArray *array;

@end

@implementation JKRHashMap_LinkedList

- (instancetype)init {
    self = [super init];
    self.array = [JKRArray arrayWithLength:1 << 4];
    return self;
}

@end
Copy the code

The hash function

You can use it directly:

- (NSUInteger)indexWithKey:(id)key {
    if(! key)return 0;
    NSUInteger hash = [key hash];
    return (hash ^ (hash >> 16)) & (self.array.length - 1);
}
Copy the code

Find nodes by key

As the search process analyzed above,

  • 1, first find the index corresponding to the key through the hash function.
  • 2. Obtain the node at the corresponding position in the hash table through index.
  • 3. If the node is empty, the key is not in the hash table, and null is returned.
  • 4. If the node exists, check whether the query key of the node is equal. If the node is equal, the query key of the node is returned directly.
  • 5, get the next node of this node, repeat step 3.
- (JKRHashMap_LinkedList_Node *)nodeWithKey:(id)key {
    NSUInteger index = [self indexWithKey:key];
    JKRHashMap_LinkedList_Node *node= self.array[index];
    while (node) {
        if (node.key == key || [node.key isEqual:key]) {
            return node;
        } else if (key && node.key && [key class] == [node.key class] && [key respondsToSelector:@selector(compare:)] && [key compare:node.key] == 0){
            return node;
        }
        node = node.next;
    }
    return node;
}
Copy the code

Obtain an object using the key

The value of the returned node is returned as follows:

- (id)objectForKey:(id)key {
    JKRHashMap_LinkedList_Node *node = [self nodeWithKey:key];
    return node ? node.value : nil;
}
Copy the code

Whether to store contain keys

The node returned by key is null;

- (BOOL)containsKey:(id)key {
    return[self nodeWithKey:key] ! = nil; }Copy the code

Returns the number of key-value pairs in the hash table

Just return _size:

- (NSUInteger)count {
    return _size;
}
Copy the code

Add key-value pairs

Steps for adding key-value pairs:

  • 1. Use key to calculate index.
  • 2. Fetch the node corresponding to the hash table through index.
  • 3, if the node is empty, create a new node and store the passed key and value, and store the node pointer to the hash table, _size++. Otherwise, go to the next step.
  • 4. Determine whether the key of the node is equal to the incoming key. If so, overwrite the key and value of the node with the incoming key and value. Otherwise, go to the next step.
  • If the node is empty, create a new node, store the key and value to the key and value of the new node, and point the next preNode to the new node, _size++. Otherwise, return to Step 4.
- (void)setObject:(id)object forKey:(id)key {
    NSUInteger index = [self indexWithKey:key];
    JKRHashMap_LinkedList_Node *node = self.array[index];
    if(! node) { node = [JKRHashMap_LinkedList_Node new]; node.key = key; node.value = object; self.array[index] = node; _size++;return;
    }
    
    JKRHashMap_LinkedList_Node *preNode = nil;
    while (node) {
        if (node.key == key || [node.key isEqual:key]) {
            break;
        } else if (key && node.key && [key class] == [node.key class] && [key respondsToSelector:@selector(compare:)] && [key compare:node.key] == 0) {
            break;
        }
        preNode = node;
        node = node.next;
    }
    
    if (node) {
        node.key = key;
        node.value = object;
        return;
    }
    
    JKRHashMap_LinkedList_Node *newNode = [JKRHashMap_LinkedList_Node new];
    newNode.key = key;
    newNode.value = object;
    preNode.next = newNode;
    _size++;
}
Copy the code

Delete key-value pairs

  • 1. Use key to calculate index.
  • 2. Fetch the node corresponding to the current hash table through index.
  • 3. Check whether the node is empty. If the node is empty, return directly. Otherwise, go to the next step.
  • 4, check whether the key of this node is equal to the key passed in, if so, delete the node, _size–. Otherwise, go to the next step.
  • 5. Get the next node of this node and repeat Step 4.
- (void)removeObjectForKey:(id)key {
    NSUInteger index = [self indexWithKey:key];
    JKRHashMap_LinkedList_Node *node= self.array[index];
    JKRHashMap_LinkedList_Node *preNode = nil;
    while (node) {
        if (node.key == key || [node.key isEqual:key]) {
            if (preNode) {
                preNode.next = node.next;
            } else {
                self.array[index] = node.next;
            }
            _size--;
            return;
        } else if (key && node.key && [key class] == [node.key class] && [key respondsToSelector:@selector(compare:)] && [key compare:node.key] == 0){
            if (preNode) {
                preNode.next = node.next;
            } else {
                self.array[index] = node.next;
            }
            _size--;
            return; } preNode = node; node = node.next; }}Copy the code

Whether an element is contained

Because the value of the hash table is stored in the node, and its location cannot be directly found, it can only be achieved by traversing all the nodes of the hash table:

- (BOOL)containsObject:(id)object {
    if (_size == 0) {
        return NO;
    }
    
    for (NSUInteger i = 0; i < self.array.length; i++) {
        JKRHashMap_LinkedList_Node *node= self.array[i];
        while (node) {
            if (node.value == object || [node.value isEqual:object]) {
                returnYES; } node = node.next; }}return NO;
}
Copy the code

Let custom hash tables support dictionary operators

- (id)objectForKeyedSubscript:(id)key {
    return [self objectForKey:key];
}

- (void)setObject:(id)obj forKeyedSubscript:(id)key {
    [self setObject:obj forKey:key];
}
Copy the code

Overwrite printing makes it easy to view the hash table structure

- (NSString *)description {
    NSMutableString *string = [NSMutableString string];
    [string appendString:[NSString stringWithFormat:@"<%@, %p>: \ncount:%zd length:%zd\n{\n", self.className, self, _size, self.array.length]];
    for (NSUInteger i = 0; i < self.array.length; i++) {
        [string appendString:[NSString stringWithFormat:@"\n\n--- index: %zd ---\n\n", i]];
        JKRHashMap_LinkedList_Node *node= self.array[i];
        if (node) {
            while (node) {
                [string appendString:[NSString stringWithFormat:@"[% @ : % @ - > % @ % @]", node.key , node.value, node.next ? [NSString stringWithFormat:@"% @.", node.next.key] : @"NULL", node.next ? node.next.value : @""]];
                node = node.next;
                if (i) {
                    [string appendString:@","]; }}}else {
            [string appendString:@""];
            [string appendString:@"NULL"];;
        }
    }
    [string appendString:@"\n}"];
    return string;
}

Copy the code

Functional testing of hash tables

JKRHashMap_LinkedList *dic = [JKRHashMap_LinkedList new];
for (NSUInteger i = 0; i < 30; i++) {
    NSString *key = getRandomStr();
    dic[key] = [NSString stringWithFormat:@"%zd", i];
}
NSLog(@"% @", dic); // Print: <JKRHashMap_LinkedList, 0x102814A10 >: count:30 length:16 {-- index: 0 --- [Wlqvuq:2 -> Xecsbw:9] [Xecsbw:9 -> Kvfexi:11] [Kvfexi:11 -> NULL] --- index: 1 --- [Ifaeuy:15 -> NULL] , --- index: 2 --- [Bmitqy:3 -> Ynqbcw:12] , [Ynqbcw:12 -> NULL] , --- index: 3 --- [Djwmew:0 -> Epzzlc:4] , [Epzzlc:4 -> Jqjrvq:22] , [Jqjrvq:22 -> NULL] , --- index: 4 --- [Myvwre:28 -> NULL] , --- index: 5 --- [Mrgpfv:8 -> Ltdazq:25] , [Ltdazq:25 -> Tzweni:27] , [Tzweni:27 -> NULL] , --- index: 6 --- NULL --- index: 7 --- [Eyvque:5 -> Ltmzik:24] , [Ltmzik:24 -> NULL] , --- index: 8 --- [Rvnupm:7 -> NULL] , --- index: 9 --- [Ryrort:16 -> NULL] , --- index: 10 --- [Rsdkaw:1 -> Hgszuk:20] , [Hgszuk:20 -> Jtrtes:26] , [Jtrtes:26 -> NULL] , --- index: 11 --- [Txonlm:29 -> NULL] , --- index: 12 --- [Bvbdbe:14 -> NULL] , --- index: 13 --- [Pszvix:6 -> Dtizif:19] , [Dtizif:19 -> Czkxyj:21] , [Czkxyj:21 -> Kzatxv:23] , [Kzatxv:23 -> NULL] , --- index: 14 --- [Ustobp:10 -> Erqclk:13] , [Erqclk:13 -> Fbliqs:17] , [Fbliqs:17 -> Jpcvbm:18] , [Jpcvbm:18 -> NULL] , --- index: 15 --- NULL }Copy the code

As you can see, the values in the hash table are evenly scattered across the array and stored as a one-way linked list when hash conflicts occur.

Compare hash table performance test with NSMutableDictionary

Since to do performance contrast test, first of all need data, used here apple objc4 released the source code, the official download address: opensource.apple.com/tarballs/ob… The runtime source folder is used as a resource file:

Read the code of all the files in it, take out all the words in it, and count the number of occurrences of different words.

This requirement can just use the dictionary or our self-defined hash table to take the word as key and the number of occurrences of the word as value. The calculation logic is as follows:

  • First read all the files and intercept all the words (including duplicates).
  • Iterate over all the words, adding the words as keys to the hash table.
  • If the value is obtained, the value is the obtained value +1; otherwise, the value is 1. The key and value are stored in the hash table.

This way, when you add all the words in sequence, the hash table stores the number of occurrences of each word.

Take out all the words first:

NSMutableArray * allFileStrings() {
    NSFileManager *fileManager = [NSFileManager defaultManager];
    NSError *fileManagerError;
    NSString *fileDirectory = @"/Users/Lucky/Documents/SourceCode/runtime";
    
    NSArray<NSString *> *array = [fileManager subpathsOfDirectoryAtPath:fileDirectory error:&fileManagerError];
    if (fileManagerError) {
        NSLog(@"Failed to read folder");
        nil;
    }
    NSLog(@"File path: %@", fileDirectory);
    NSLog(@"File number: %zd", array.count); NSMutableArray *allStrings = [NSMutableArray array]; [array enumerateObjectsUsingBlock:^(NSString * _Nonnull obj, NSUInteger idx, BOOL * _Nonnull stop) { NSString *filePath = [fileDirectory stringByAppendingPathComponent:obj]; NSError *fileReadError;  NSString *str = [NSString stringWithContentsOfFile:filePath encoding:NSUTF8StringEncoding error:&fileReadError];if (fileReadError) {
            return;
        }
        [str enumerateSubstringsInRange:NSMakeRange(0, str.length) options:NSStringEnumerationByWords usingBlock:^(NSString * _Nullable substring, NSRange substringRange, NSRange enclosingRange, BOOL * _Nonnull stop) {
            [allStrings addObject:substring];
        }];
    }];
    NSLog(@"Number of words: %zd", allStrings.count);
    return allStrings;
}
Copy the code

Then iterate through and add to the hash table:

JKRHashMap_LinkedList *map = [JKRHashMap_LinkedList new]; [JKRTimeTool teskCodeWithBlock:^{ NSMutableDictionary *map = [NSMutableDictionary new];  [allStrings enumerateObjectsUsingBlock:^(id _Nonnull obj, NSUInteger idx, BOOL * _Nonnull stop) { NSNumber *count = map[obj];if (count) {
            count = [NSNumber numberWithInteger:count.integerValue+1];
        } else {
            count = [NSNumber numberWithInteger:1];
        }
        map[obj] = count;
    }];
    NSLog(@"NSMutableDictionary counts the number and occurrences of non-repeating words %zd", map.count);
    NSLog(@"NSMutableDictionary counts the number of occurrences of words NSObject: %@", map[@"NSObject"]);
    NSLog(@"NSMutableDictionary counts the number of occurrences of a word include: %@", map[@"include"]);
    NSLog(@"NSMutableDictionary counts the number of occurrences of a word return: %@", map[@"return"]);
    
    __block NSUInteger allCount = 0;
    [allStrings enumerateObjectsUsingBlock:^(id  _Nonnull obj, NSUInteger idx, BOOL * _Nonnull stop) {
        allCount += [map[obj] integerValue];
        [map removeObjectForKey:obj];
    }];
    
    NSLog(@"NSMutableDictionary adds up all words %zd", allCount); }]; // Print: Number of files: 104 Number of all words: 165627 JKRHashMap_LinkedList Counts number of non-repeating words and occurrences 34 JKRHashMap_LinkedList Counts the number of occurrences of words include: 379 JKRHashMap_LinkedList counts the number of occurrences of wordsreturnJKRHashMap_LinkedList: 2681 JKRHashMap_LinkedList: 14.768 sCopy the code

Use NSMutableDictionary to test this:

[JKRTimeTool teskCodeWithBlock:^{ NSMutableDictionary *map = [NSMutableDictionary new];  [allStrings enumerateObjectsUsingBlock:^(id _Nonnull obj, NSUInteger idx, BOOL * _Nonnull stop) { NSNumber *count = map[obj];if (count) {
            count = [NSNumber numberWithInteger:count.integerValue+1];
        } else {
            count = [NSNumber numberWithInteger:1];
        }
        map[obj] = count;
    }];
    NSLog(@"NSMutableDictionary counts the number and occurrences of non-repeating words %zd", map.count);
    NSLog(@"NSMutableDictionary counts the number of occurrences of words NSObject: %@", map[@"NSObject"]);
    NSLog(@"NSMutableDictionary counts the number of occurrences of a word include: %@", map[@"include"]);
    NSLog(@"NSMutableDictionary counts the number of occurrences of a word return: %@", map[@"return"]);
    
    __block NSUInteger allCount = 0;
    [allStrings enumerateObjectsUsingBlock:^(id  _Nonnull obj, NSUInteger idx, BOOL * _Nonnull stop) {
        allCount += [map[obj] integerValue];
        [map removeObjectForKey:obj];
    }];
    
    NSLog(@"NSMutableDictionary adds up all words %zd", allCount); }]; // Print: Number of files: 104 Number of all words: 165627 NSMutableDictionary counts number of non-repeating words and occurrences 34 NSMutableDictionary counts the number of occurrences of words include: 379 NSMutableDictionary counts the number of occurrences of wordsreturn: 2681 NSMutableDictionary total number of words 165627 Time: 0.058sCopy the code

It turns out that all the calculations are the same, proving that our hash table can indeed perform the same function as the system’s NSMutableDictionary, but with a huge time difference: 14.768s vs. 0.058s, which is, well, unbearably slow.

Hash table optimization – capacity expansion

The default capacity of our hash table is 1 << 4 = 16. 10490 pieces of data are scattered in the array of length only 16. The average location of the array stores 655 nodes, that is, the average length of each one-way linked list reaches 655. However, the storage, fetching and reading of the hash table is traversed from the beginning node of the one-way linked list. Therefore, when there are too many hash table elements and the length of the linked list is longer, the traversal time is bound to be longer.

To prevent each list from getting too long, we need to expand the hash array as soon as it reaches a certain number of elements, much like a dynamic array expansion operation. At the same time, if the length of the hash table array changes, the index calculated by the hash function corresponding to each key will inevitably change, so the nodes stored in the hash table need to be repositioned in the hash table.

So when to expand the capacity of the hash table? According to scientific statistics, when the number of elements stored in the hash table is greater than the length of the array * 0.75, the expansion is optimal, we use this rule.

Add a capacity expansion method at the beginning of the method to add key-value pairs:

- (void)setObject:(id)object forKey:(id)key {
    [self resize];
    // ...
}
Copy the code

If the number of elements in the array is smaller than the length of the hash table x 0.75, do not expand the array; otherwise, expand the array.

- (void)resize {
    if (_size <= self.array.length * 0.75) return;
    
}
Copy the code

Capacity expansion requires the creation of a new array with a larger capacity, here we use the expanded array is twice the original array:

JKRArray *oldArray = self.array;
self.array = [JKRArray arrayWithLength:oldArray.length << 1];
Copy the code

We need to rearrange all the nodes in the original array into the hash table. In this case, we reuse the original nodes and only rearrange their positions, instead of taking out the values one by one and adding them to the hash table again. In this way, we need to rebuild all the nodes and save unnecessary overhead:

for (NSUInteger i = 0; i < oldArray.length; i++) {
    JKRHashMap_LinkedList_Node *node = oldArray[i];
    while(node) { JKRHashMap_LinkedList_Node *moveNode = node; node = node.next; moveNode.next = nil; // Rearrange the nodes [self moveNode:moveNode]; }}Copy the code

The logic for rearranging nodes is as follows:

  • 1. Obtain the key of the node to calculate index.
  • 2. Fetch the node through index.
  • 3. Check whether the node is empty. If it is empty go to step 4, otherwise go to step 5.
  • 4, place the node in the array index position.
  • 5, iterate to the last node in turn, and point next of the last node to the node.

The complete expansion logic is as follows:

- (void)resize {
    if (_size <= self.array.length * 0.75) return;
    JKRArray *oldArray = self.array;
    self.array = [JKRArray arrayWithLength:oldArray.length << 1];
    for (NSUInteger i = 0; i < oldArray.length; i++) {
        JKRHashMap_LinkedList_Node *node = oldArray[i];
        while (node) {
            JKRHashMap_LinkedList_Node *moveNode = node;
            node = node.next;
            moveNode.next = nil;
            [self moveNode:moveNode];
        }
    }
}

- (void)moveNode:(JKRHashMap_LinkedList_Node *)newNode {
    NSUInteger index = [self indexWithKey:newNode.key];
    JKRHashMap_LinkedList_Node *node = self.array[index];
    if(! node) { self.array[index] = newNode;return;
    }
    JKRHashMap_LinkedList_Node *preNode = nil;
    while (node) {
        preNode = node;
        node = node.next;
    }
    preNode.next = newNode;
}
Copy the code

Performance comparison after capacity expansion

Repeat the preceding test after capacity expansion. The following output is displayed:

NSMutableDictionary: 0.066s JKRHashMap_LinkedList: 0.192sCopy the code

The time has been reduced from 14.768 seconds to 0.192 seconds.

The following

The hash table realized only by using one-way linked list cannot guarantee the search speed in all cases. When there are problems in the calculation of hash function or when the amount of data is very large, it is likely to have a very long one-way linked list.

In the JDK’s open source hash table solution, when the length of a single list exceeds a certain value, the linked list is converted into a red-black tree to solve this problem. Later, the hash table will be re-implemented in a red-black tree way. But before we do that, we’ll cover the implementation of queues and stacks, as they are the basis for binary tree operations.

The source code

Click to view the source code –