Introduction to hash tables

1. Concepts related to hashing

Hash, we say that a Hash or a Hash is really the same thing

A Hash is an input of arbitrary length that, through a Hash algorithm (Hash algorithm), is transformed into a fixed-length output, which is the Hash value (Hash value).

  • This transformation is a compression mapping (the space for hash values is usually smaller than the space for input)

If the hash value calculated from the same hash function is different, the input value must be different

If the hash value calculated from the same hash function is the same, the hash value may not be the same, that is, different inputs may output the same hash

When two different inputs compute the same hash value from the same hash function, the phenomenon is called a collision (there is a collision resolution method, described below).

2. Data structure

  • Array features: easy to address, difficult to insert and delete
  • Linked list features: difficult to address, easy to insert and delete

Combine the characteristics of both to achieve a data structure that is easy to address and easy to insert and delete — this is a HashMap, or hash table, which we can think of as an array of linked lists

From the above figure, we can see that each member of the array is a linked list. We assign the elements to different linked lists according to the characteristics of each element. In turn, we find the correct linked list by these characteristics, and then find the corresponding elements from the linked list

Among them, the method of calculating the subscript of element array according to element characteristics is hashing algorithm

Hash functions

When a hash table is used to query, a hash function is used to convert the key to the corresponding array subscript. The following three hash functions are commonly used

  • Division and hash

Index = value % 16 to find the modulus, we call it division because we do it by division

  • Square hash
  • Fibonacci hashing

(I’m not going to go into the depth here, there’s a nod of the head)

4. Resolve the collision

Conflict resolution techniques fall into two categories:

  • Open hash (also known as the zipper method) : Stores the conflicting keys outside the main table of the hash table
    • The separation of link
  • Closed hash (also known as open address) : The conflicting keys are stored in another slot in the table
    • Linear detection
    • Double hash
    • Random hash

Linear detection

Linear detection is basically a linear search for empty slots when conflicts occur

  1. Index = H (K)
  2. Index = (index + 1) mod M (M is the size of the table)
For example: hash table size M = 7, hash function: H(K) = K mod M insert these values: 701, 145, 217, 19, 13, 749 H(K) = 701%7 = 1 H(K) = 145%7 = 5 H(K) = 217%7 = 0 H(K) = 19%7 = 2 H(K) = 13%7 = 1(conflict) --> 2(already have value) --> Three (3) insert position H (K) = 749% 7 = 2 (conflict) -- - > 3 have been (value) - - > 4 (insert point 4)Copy the code

It can be seen that with data insertion, the number of probe traversal times will gradually decrease, and the retrieval process is called exhaustive

Double hash

The offset to the next detected location depends on the key value, so the position can be different for different keys

We need to introduce a second hash function, H 2 (K), to be used as the offset in the detection sequence (so to speak, linear detection is the double hash of H 2 (K) = 1)

For hash tables of size M, H2 (K) has values in the range 1 to m-1. If M is prime, a common choice is H2 (K) = 1 + ((K/M) mod (m-1)).

  1. Index = H (K), offset = H 2 (K)
  2. If location index already has a key, thenIndex = (index + offset) mod M(M is the size of the table)

Random hash

In contrast to dual hashing applications, random hashing avoids clustering by making the detection sequence dependent on the key

With random hashes, the probe sequence is generated from the output of a keyseeded pseudorandom number generator

  1. Create RNG with K as seed, setindex = RNG.next() mod M
  2. If location index already has a key, thenindex = RNG.next() mod M

Split link (zipper method)

The key is to put all the elements in the same hash slot (each slot in the array) into a linked list

The index is computed by the hash function

  • If there is no element in the index slot, insert it directly
  • If you have elements in the slot
    • If the key value is different, the data is inserted into the head of the list
    • If the key values are the same, the values are updated

Suppose the hash function is H (K) = K mod 8

  1. No hashing occurs

Directly inserted into the

  1. Hash clash but different key

Insert into the chain header

  1. A hash conflict occurs and the key is the same

Update the value

Hash table application in iOS

1. The weak form

Weak uses a global HashMap. When an object is destroyed, an array of Pointers to weak is found in the hash table based on the object, and all elements in the array are set to nil

Basic steps

  • (dealloc) Find an array of weak Pointers to the global hashMap, based on a unique value representing the object as the key
  • Sets all elements of the array (weak Pointers) to nil

2. Runloop

Threads and runloops have a one-to-one relationship that is stored in a global Dictionary

3. The other


Refer to the blog

Map Overview part 1: Understand HashMap from top to bottom

The most thorough hash() analysis of the Map in the entire web.

Data structure used in iOS _ hash table

You should also know about hashing conflict resolution strategies

The zipper method resolves hash conflicts and several common hash functions