The issue:
Simply put, a hash table is a data structure that relies on hash functions to organize data to achieve constant time complexity, and is highly efficient at insertion and search.
Two types of hash tables:
Hash set
isA collection of
An implementation of a data structure used for storageThe duplicate values
.Hash map
ismapping
An implementation of a data structure used for storage(key, value)
Key/value pair.
Hash table templates are built into most high-level programming language standard libraries.
1, the principle of hash table
The key idea of hash tables is to use hash functions to map keys to buckets. To be more precise,
- When we insert a new key, the hash function determines which bucket the key should be assigned to and stores the key in the appropriate bucket;
- When we want to search for a key, the hash table uses the same hash function to find the corresponding bucket, and searches only in specific buckets.
The sample
In our example, we use y = x % 5 as the hash function. Let’s use this example to complete the insert and search strategy:
- Insert: We parse the keys through hash functions, mapping them to the appropriate bucket.
- For example, 1987 is assigned to bucket 2, while 24 is assigned to bucket 4.
- Search: We resolve the keys using the same hash function and search only within specific buckets.
- If we search for 1987, we will map 1987 to 2 using the same hash function. So we searched in bucket 2, and we found 1987 in that bucket.
- For example, if we search for 23, we map 23 to 3 and search in bucket 3. We find 23 is not in bucket 3, which means 23 is not in the hash table.
Hash function:
It can be seen that there is a corresponding relation F between the location of the element and its keyword. When searching, the index position (bucket) of the element can be mapped by the key through the hash function, and the corresponding relation F is hash hash function. The hash function is the most important component of the hash table used to map keys to specific buckets. In the above example y = x % 5 is used as the hash function, where x is the key value and y ‘is the index of the allocated bucket.
The hash function will depend on the range of key values and the number of buckets.
Here are some examples of hash functions:
The design of hash functions is an open question. The idea is to assign keys to buckets whenever possible, and ideally the perfect hash function would be a one-to-one mapping between keys and buckets. In most cases, however, the hash function is not perfect and requires a tradeoff between the number of buckets and the capacity of the buckets.
2. Conflict resolution
Ideally, if our hash function is a perfect one-to-one mapping, we will not need to deal with conflicts. Unfortunately, in most cases, conflict is almost inevitable. For example, in our previous hash function (y = x % 5), 1987 and 2 were both assigned to bucket 2, which was a conflict (so the mapping location is called a bucket, because a conflict also requires a secondary lookup inside the bucket to find the location of the element).
You can simply use an array to store keys in the same bucket. If N is variable or large, we may need to use a highly balanced binary tree instead.
3. Complexity analysis
If there are M keys in total, then the spatial complexity of O(M) can be achieved when using hash tables.
The time complexity of hash tables has a strong relationship with design.
Take the example of using arrays to store values in the same bucket, which ideally is small enough to be considered a constant. The time complexity of both insertion and search is O(1).
But in the worst case, the maximum bucket size will be N. The time complexity is O(1) for insertion and O(N) for search.
Principles of built-in hash tables
A typical design of a high-level programming language’s built-in hash table is:
- The key value can be anything
Hashable
Type. And values of the hashable type will haveHash code
. This hash code will be used by the mapping function to get the index of the store. - Each bucket contains one
An array of
Is used to initially store all values in the same bucket. - If there are too many values in the same bucket, those values are kept in one
Highly balanced binary tree search tree
In the.
The average time complexity of insert and search is still O(1). The worst-case time complexity of insert and search is O(logN), using a highly balanced BST. It’s a tradeoff between insertion and search.