Introduction to hash tables
1. Concepts related to hashing
Hash, we say that a Hash or a Hash is really the same thing
A Hash is an input of arbitrary length that, through a Hash algorithm (Hash algorithm), is transformed into a fixed-length output, which is the Hash value (Hash value).
- This transformation is a compression mapping (the space for hash values is usually smaller than the space for input)
If the hash value calculated from the same hash function is different, the input value must be different
If the hash value calculated from the same hash function is the same, the hash value may not be the same, that is, different inputs may output the same hash
When two different inputs compute the same hash value from the same hash function, the phenomenon is called a collision (there is a collision resolution method, described below).
2. Data structure
- Array features: easy to address, difficult to insert and delete
- Linked list features: difficult to address, easy to insert and delete
Combine the characteristics of both to achieve a data structure that is easy to address and easy to insert and delete — this is a HashMap, or hash table, which we can think of as an array of linked lists
From the above figure, we can see that each member of the array is a linked list. We assign the elements to different linked lists according to the characteristics of each element. In turn, we find the correct linked list by these characteristics, and then find the corresponding elements from the linked list
Among them, the method of calculating the subscript of element array according to element characteristics is hashing algorithm
Hash functions
When a hash table is used to query, a hash function is used to convert the key to the corresponding array subscript. The following three hash functions are commonly used
- Division and hash
Index = value % 16 to find the modulus, we call it division because we do it by division
- Square hash
- Fibonacci hashing
(I’m not going to go into the depth here, there’s a nod of the head)
4. Resolve the collision
Conflict resolution techniques fall into two categories:
- Open hash (also known as the zipper method) : Stores the conflicting keys outside the main table of the hash table
- The separation of link
- Closed hash (also known as open address) : The conflicting keys are stored in another slot in the table
- Linear detection
- Double hash
- Random hash
Linear detection
Linear detection is basically a linear search for empty slots when conflicts occur
- Index = H (K)
- Index = (index + 1) mod M (M is the size of the table)
For example: hash table size M = 7, hash function: H(K) = K mod M insert these values: 701, 145, 217, 19, 13, 749 H(K) = 701%7 = 1 H(K) = 145%7 = 5 H(K) = 217%7 = 0 H(K) = 19%7 = 2 H(K) = 13%7 = 1(conflict) --> 2(already have value) --> Three (3) insert position H (K) = 749% 7 = 2 (conflict) -- - > 3 have been (value) - - > 4 (insert point 4)Copy the code
It can be seen that with data insertion, the number of probe traversal times will gradually decrease, and the retrieval process is called exhaustive
Double hash
The offset to the next detected location depends on the key value, so the position can be different for different keys
We need to introduce a second hash function, H 2 (K), to be used as the offset in the detection sequence (so to speak, linear detection is the double hash of H 2 (K) = 1)
For hash tables of size M, H2 (K) has values in the range 1 to m-1. If M is prime, a common choice is H2 (K) = 1 + ((K/M) mod (m-1)).
- Index = H (K), offset = H 2 (K)
- If location index already has a key, then
Index = (index + offset) mod M
(M is the size of the table)
Random hash
In contrast to dual hashing applications, random hashing avoids clustering by making the detection sequence dependent on the key
With random hashes, the probe sequence is generated from the output of a keyseeded pseudorandom number generator
- Create RNG with K as seed, set
index = RNG.next() mod M
- If location index already has a key, then
index = RNG.next() mod M
Split link (zipper method)
The key is to put all the elements in the same hash slot (each slot in the array) into a linked list
The index is computed by the hash function
- If there is no element in the index slot, insert it directly
- If you have elements in the slot
- If the key value is different, the data is inserted into the head of the list
- If the key values are the same, the values are updated
Suppose the hash function is H (K) = K mod 8
- No hashing occurs
Directly inserted into the
- Hash clash but different key
Insert into the chain header
- A hash conflict occurs and the key is the same
Update the value
Hash table application in iOS
1. The weak form
Weak uses a global HashMap. When an object is destroyed, an array of Pointers to weak is found in the hash table based on the object, and all elements in the array are set to nil
Basic steps
- (dealloc) Find an array of weak Pointers to the global hashMap, based on a unique value representing the object as the key
- Sets all elements of the array (weak Pointers) to nil
2. Runloop
Threads and runloops have a one-to-one relationship that is stored in a global Dictionary
3. The other
Refer to the blog
Map Overview part 1: Understand HashMap from top to bottom
The most thorough hash() analysis of the Map in the entire web.
Data structure used in iOS _ hash table
You should also know about hashing conflict resolution strategies
The zipper method resolves hash conflicts and several common hash functions