What is a Hash table
Let’s look at the structure of the hash table:
Array + linked list
A Hash table (also called a Hash table) is a data structure that accesses data stored in memory based on a Key. That is, it accesses records by computing a function on key values that maps the data for the desired query to a location in the table, which speeds up lookups. This mapping function is called a hash function, and the array of records is called a hash table
In plain English, the Key is converted to an integer number by a fixed algorithm function (hash function), and then the number is mod to the length of the array. The remainder result is used as the index of the array, and the value is stored in the array space with the number as the index.
When querying using a hash table, the hash function will be usedkey
Convert to the corresponding array subscript, and locate in the array space of the subscript to obtain value, so as to make full use of the array positioning performance for data positioning.
(Read the above words carefully.)
Let’s get to know what the following key words are:
Key: We enter the value to be found
Value: The content we want to get
Hash: key The value computed by the hash function (modulo the array length to obtain the array subscript)
Hash function: There is a function F that can directly determine the location of the search value based on this function and the search key without iterating through the comparison. In this way, the location of the key is known in advance, and the data can be directly found to improve efficiency. That is, the hash function index=F(key) calculates the location of the storage address based on the key. A hash table is a lookup table created based on the hash function.
Hash function constructor
methods
There are many kinds of methods, such as direct addressing method, numerical analysis method, square method, folding method, random number method, division residual method, etc., there are many online related introduction, here will not focus on this
Considerations for designing hash functions
- The time required to compute the hash address (there is no need for a complicated function to compute it)
- The length of the keyword
- Long table
- Whether the distribution of keywords is even, whether there are rules to follow
- Minimize conflict
Hash conflict
What is a hash conflict
The same hash address may be obtained for different keywords, that is, K1 ≠ K2, and F (k1)= F (k2), or f(k1) MOD capacity = F (k2) MOD capacity, which is called collision, also known as conflict.
By building a well-behaved hash function, you can reduce conflicts, but it is generally impossible to avoid them completely, so resolving conflicts is another key issue for hash tables.
Conflicts occur when creating and searching hash tables, and the methods for resolving conflicts should be the same in both cases.
Resolving Hash Conflicts
- Open addressing
The basic idea of this method is as follows: When the hash address p=F(key) of the key keyword conflicts, another hash address p1 is generated based on P. If P1 still conflicts, another hash address p2 is generated based on P. Find a non-conflicting hash address PI and store the elements there.
General rehash function form:
H = (F(key) + di) MOD m
Where I = 1,2… , m-1 is the number of collisions
M is the length of the table.
F is a hash function.
Di is an incremental sequence, and the corresponding rehash mode varies with the value mode of the incremental sequence.
1) Linear detection and hashing
Di = 1,2,3… , m – 1
When a conflict occurs, the next cell in the table is viewed sequentially until an empty cell is found or the entire table is searched.
2) Second detection and hash
di = 1^2, -1^2, 2^2, -2^2,… , k^2, -k^2 (k <= m-1)
In the event of a conflict, it is more flexible to jump around the table.
3) Pseudo-random number detection and hashing
Di = pseudo-random sequence
Here is a list on the web: there is a hash table of length 11 with three records filled with the keywords 17, 60 and 29. The hash function is f(key)= key MOD 11. A fourth record is available with the keyword 38. According to the above hash algorithm, the hash address is 5, which is the same as the hash address of keyword 60, resulting in a conflict. Depending on how incremental D is picked, there are three scenarios:
Linear detection: When a conflict occurs, f(key) + D, 5 + 1 = 6, and the next hash address is 6, and the conflict occurs again, and so on. Finally, the free hash address is 8, and the data is filled into the free hash address is 8.
Second detection: When a conflict occurs, the next hash address is 6 because d = 1^2, so 5 + (-1) = 6 because d = -1^2, so 5 + (-1) = 4. The next hash address is 4 because d = -1^2, so the next hash address is 4.
Pseudorandom number detection method: the random number method is completely determined by the pseudorandom sequence. If a random number seed is obtained by a pseudorandom sequence {1, -2, 2… , k}, then the first address is 6, the second address is 3, and so on. If it is idle, the data will be filled in.
There are many applications of open addressing in iOS. For details, please refer to the underlying implementation principles of note-set NSSet and dictionary NSDictionary
- Chain address method (zipper method, bit bucket method)
Store the data for the conflicting keywords in a linear linked list of conflicting hash addresses. When implemented, one strategy is to place all conflicts in the hash table on a stack, with new elements inserted at the front or back of the table depending on how convenient it is.
Load factor
There are two parameters mentioned here: initial capacity and load factor, which are important parameters that affect the performance of the hash table.
Capacity: Indicates the length of the array in the hash table. The initial capacity is the capacity when the hash table is created.
Load factor: A measure of how full a hash table can be before its capacity automatically increases (the number of stored elements). It measures how much space is used in a hash table.
LoadFactor = loadFactor/capacity
In general, the expected complexity of hash table lookups is O(1) when loadFactor <= 1.
For hash tables using linked lists, the larger the load factor is, the more space is utilized, and the result is the lower efficiency of lookups. If the load factor is too small, the data in the hash table will be too sparse, resulting in a serious waste of space. The default load factor is 0.75.
capacity
As the number of elements in the hash table increases, the probability of collisions increases (because the array length is fixed), so you need to expand the array to improve query efficiency. After the array is expanded, the most expensive point of performance occurs, and the data in the original array must be recalculated and inserted into the new array. This is called expansion.
When is the expansion going to happen? Array expansion occurs when the number of elements in the table exceeds the capacity * loadFactor.
Using OC to roughly achieve the following expansion:
- (void)resizeOfNewCapacity:(NSInteger)newCapacity {
NSInteger oldCapacity = _elementArray.count;
if(oldCapacity == MAX_CAPACITY) {// If the array size before capacity expansion has reached the maximum 2^ 30_threshold = oldCapacity - 1; // Change the threshold to the maximum value of int (2^ 30-1), so that the capacity will not be expanded laterreturn; } / / initializes a new array NSMutableArray * newArray = [NSMutableArray arrayWithCapacity: newCapacity];for (int i = 0; i < newCapacity; i ++) {
[newArray addObject:@""]; } [self transferWithNewTable:newArray]; [_elementArray removeAllObjects]; [_elementArray addObjectsFromArray:newArray]; //hash_threshold = (NSInteger)_capacity * _loadFactor; // change the threshold} - (void)transferWithNewTable:(NSMutableArray *)array {// iterate over the old array to transfer the element to the new arrayfor (int i = 0; i < _elementArray.count; i ++) {
if ([[[_elementArray objectAtIndex:i] class] isEqual:[SingleLinkedNode class]]) {
SingleLinkedNode *node = _elementArray[i];
if(node ! = NULL) {do {
[self insertElementToArrayWith:array andNode:node];
node = node.next;
} while(node ! = NULL); } } } } - (void)insertElementToArrayWith:(NSMutableArray *)array andNode:(SingleLinkedNode *)node { NSInteger index = [node.keyintegerValue] % _capacity; // Calculates the position of each element in the new arrayif(! [[[array objectAtIndex:index] class] isEqual:[SingleLinkedNode class]]) { [array replaceObjectAtIndex:index withObject:node]; }else {
SingleLinkedNode *headNode = [array objectAtIndex:index];
while(headNode ! = NULL) { headNode = headNode.next; } // Insert the element directly into headNode.next = node; }}Copy the code
The implementation of OC language
Build the HashTable object
HashTable. H file
#import <Foundation/Foundation.h>@class SingleLinkedNode; NS_ASSUME_NONNULL_BEGIN @interface HashTable : NSObject @property (nonatomic, strong) NSMutableArray *elementArray; @property (nonatomic, assign) NSInteger capacity; // The capacity array (hashTable) length @property (nonatomic, assign) NSInteger modCount; @property (nonatomic, assign) @property (nonatomic, assign)floatthreshold; // threshold @property (nonatomic, assign)floatloadFactor; // Load factor /** Initialize Hash table @param capacity Array length @return hashTable * / - (instancetype) initWithCapacity (NSInteger) capacity; /** insert @param newNode into newNode */ - (void)insertElementByNode:(SingleLinkedNode *)newNode; /** Query @param keyreturnValue */ - (NSString *)findElementByKey (NSString *)key; @end NS_ASSUME_NONNULL_ENDCopy the code
HashTable. M file:
#define MAX_CAPACITY pow(2, 30)
#import "HashTable.h"
#import "SingleLinkedNode.h"
@implementation HashTable
- (instancetype)initWithCapacity:(NSInteger)capacity {
self = [super init];
if(self) { _capacity = capacity; _loadFactor = 0.75; _threshold = (NSInteger) _loadFactor * _capacity; _modCount = 0; // Initialize the array directly for ease of understandinghash, so a given capacity directly, the default is 16 in the Java _elementArray = [NSMutableArray arrayWithCapacity: capacity].for (int i = 0; i < capacity; i ++) {
[_elementArray addObject:@""]; }}return self;
}
- (void)insertElementByNode:(SingleLinkedNode *)newNode {
if (newNode.key.length == 0) {
return; } // Determine whether capacity expansion is requiredif(_threshold < _modCount * _capacity) { _capacity *= 2; [self resizeOfNewCapacity:_capacity]; } // Calculate the storage location NSInteger keyValue = [newNode.keyintegerValue]; // F(x) = x; gethashValue NSInteger index = keyValue % _capacity; //hashValue MOD capacity = array subscript newNode.hashValue = keyValue; // If the inserted region is free, the data is stored directly to that regionif(! [[[_elementArray objectAtIndex:index] class] isEqual:[SingleLinkedNode class]]) { [_elementArray replaceObjectAtIndex:index withObject:newNode]; _modCount++; }elseSingleLinkedNode *headNode = [_elementArray objectAtIndex:index];while(headNode ! = NULL) {// Insert a duplicate key, then overwrite the original elementif ([headNode.key isEqualToString:newNode.key]) {
headNode.value = newNode.value;
return; } headNode = headNode.next; } _modCount++; // Insert the element directly into headNode.next = newNode; } } - (NSString *)findElementByKey:(NSString *)key {if (key.length == 0) {
returnnil; } NSInteger keyValue = [keyintegerValue];
NSInteger index = keyValue % _capacity; // hashFunction keyValue % _capacity (0~9)if (index >= _capacity) {
return nil;
}
if(! [[[_elementArray objectAtIndex:index] class] isEqual:[SingleLinkedNode class]]) {return nil;
}elseSingleLinkedNode *headNode = [_elementArray objectAtIndex:index];while(headNode ! = NULL) {if ([headNode.key isEqualToString:key]) {
return headNode.value;
}
headNode = headNode.next;
}
return nil;
}
}
- (void)resizeOfNewCapacity:(NSInteger)newCapacity {
NSInteger oldCapacity = _elementArray.count;
if(oldCapacity == MAX_CAPACITY) {// If the array size before capacity expansion has reached the maximum 2^ 30_threshold = oldCapacity - 1; // Change the threshold to the maximum value of int (2^ 30-1), so that the capacity will not be expanded laterreturn; } / / initializes a new array NSMutableArray * newArray = [NSMutableArray arrayWithCapacity: newCapacity];for (int i = 0; i < newCapacity; i ++) {
[newArray addObject:@""]; } [self transferWithNewTable:newArray]; [_elementArray removeAllObjects]; [_elementArray addObjectsFromArray:newArray]; //hash_threshold = (NSInteger)_capacity * _loadFactor; // change the threshold} - (void)transferWithNewTable:(NSMutableArray *)array {// iterate over the old array to transfer the element to the new arrayfor (int i = 0; i < _elementArray.count; i ++) {
if ([[[_elementArray objectAtIndex:i] class] isEqual:[SingleLinkedNode class]]) {
SingleLinkedNode *node = _elementArray[i];
if(node ! = NULL) {do {
[self insertElementToArrayWith:array andNode:node];
node = node.next;
} while(node ! = NULL); } } } } - (void)insertElementToArrayWith:(NSMutableArray *)array andNode:(SingleLinkedNode *)node { // This method have no access to the new location of the array of success / / NSInteger index = [self indexForHashValue: node. HashValue andNewCapacity: array. Count]; NSInteger index = [node.keyintegerValue] % _capacity; // Calculates the position of each element in the new arrayif(! [[[array objectAtIndex:index] class] isEqual:[SingleLinkedNode class]]) { [array replaceObjectAtIndex:index withObject:node]; }else {
SingleLinkedNode *headNode = [array objectAtIndex:index];
while(headNode ! = NULL) { headNode = headNode.next; } // Insert the element directly into headNode.next = node; } } - (NSInteger)indexForHashValue:(NSInteger)hash andNewCapacity:(NSInteger)newCapacity {
return hash & (newCapacity - 1);
}
@end
Copy the code
SingleLinkedNode. H file:
#import <Foundation/Foundation.h>
NS_ASSUME_NONNULL_BEGIN
@interface SingleLinkedNode : NSObject <NSCopying>
@property (nonatomic, strong) NSString *key;
@property (nonatomic, strong) NSString *value;
@property (nonatomic, strong) SingleLinkedNode *next;
@property (nonatomic, assign) NSInteger hashValue;
- (instancetype)initWithKey:(NSString *)key value:(NSString *)value;
@end
NS_ASSUME_NONNULL_END
Copy the code
SingleLinkedNode. M file:
#import "SingleLinkedNode.h"
@implementation SingleLinkedNode
- (instancetype)initWithKey:(NSString *)key value:(NSString *)value {
if (self = [super init]) {
_key = key;
_value = value;
}
return self;
}
@end
Copy the code
The code above may look a bit messy, but if you are interested you can download the Demo portal here
Conclusion: This article is mainly about understanding hash table and its implementation features, and using OC language to implement hash table in a simple and rough way. If there is any error, please leave a message and let me know. Thank you.