Combine LeetCode to talk about the application of hash table in algorithm problems
LeetCode’s first 100 problems summarized some scenarios in which hash tables (unordered_map) are applied to algorithmproblems. Using hash tables at the right time can greatly improve algorithmefficiency, such as counting the number of occurrences of each character or word in a string, or selecting two numbers from a one-dimensional array to be equal to a number.
Before we get started, let’s take a quick look at hash tables (also known as hash tables) and jump to the LeetCode section if you’re in a hurry.
Introduction to hash tables
The worst time for hash table lookups is O(n) and the average time is O(1), so ideally hash tables are used much like arrays.
A hash table uses some algorithmic operation (hash function) to convert a Key into an index of the array to access the data in the array. This allows access to the data in a key-value manner, achieving constant access efficiency. Today’s NOSQL databases use key-value to access stored data.
A hash table is a classic example of an algorithm making trade-offs in time and space. A hash function maps the key value to the access address of the record for quick lookups. If there is no memory limit, we can use the key as the index of the array directly, and all operations can be done with only one access to memory.
The hash function
A hash function is the process of converting a key into an array index. It should be easy to calculate and distribute all keys evenly.
The most commonly used method of hash function is the divisor remainder method, usually choose prime number, so as to ensure uniform distribution of key values.
The hash function depends on the type of key, and each data type requires a hash function. For example, if the key type is an integer, we can use the remainder method directly; If the key type is a string we can still use division and remainder, we can treat the string as a particularly large integer.
int hash = 0;
for (int i=0; i<s.length(); i++){ hash = (R*hash +s.charAt(i)%M); }Copy the code
or
Hash hashCode(char *key){
int offset = 5;
Hash hashCode = 0;
while(*key){
hashCode = (hashCode << offset) + *key++;
}
return hashCode;
}
Copy the code
Use hashCode(key) & (size-1) to get a size-1 hash value
To solve collision
Different keywords get the same hash address f(key1)=f(key2), which is a collision. This is something we need to avoid as much as possible. Common treatment methods are:
- Zipper method
- Open address method
Zipper method
Points each element in an array of size M to a linked list, where each node stores a key-value pair whose hash value is the index of that element. The average length of each list is N/M, where N is the total number of key-value pairs. The diagram below:
Add operation:
- You get hashCode by using the hash function
- Get index by hashcode
- If there is no list at index, create a new node as the first node of the new list
- If there is already a list at index, the key is first iterated to see if it already exists. If there is, the key is returned. If not, the key is added to the list header
Delete operation:
- You get hashCode by using the hash function
- Get index by hashcode
- Walk through the linked list and delete the nodes
Open address method
Use an array of size M to hold N key-value pairs, and when a collision occurs, directly check the next position in the hash table. The inspection method can be linear detection, square detection, double hashing and other methods, the main difference lies in the detection of the next position. Double hashing is recommended in Introduction to Algorithms.
// Insert algorithm
HASH-INSERT(T, k)
i = 0
repeat
j = h(k, i)
if T[j] == NIL
T[j] = k
return j
else
i++
until i == m
error "hash table overflow"
// Search algorithm
HASH-SEARCH(T, k)
i = 0
repeat
j = j(k, i)
if T[j] == k
return j
i++
until T[j] == NIL or i == m
return NIL
Copy the code
Hash table title in LeetCode
In practicing LeetCode, I came across several scenarios where tables could be used to simplify the problem.
The problem of finding indexes by element values
Given an array nums = [2, 7, 11, 15], find two numbers whose sum is target = 9.
This simple problem, if you use cyclic violence, can be solved very quickly, but the time complexity is O(n^2), and if the array is very long, on the order of millions and millions, it takes an enormous amount of time to loop through.
A quick solution is to use a table to write down the index of each value, then loop once to determine whether target-nums[I] is in the table, and quickly find a set of solutions. Here, key is a value and value is the index corresponding to the value, which is a method to quickly find the index using the value.
vector<int> twoSum(vector<int>& nums, int target)
{
unordered_map<int.int> maps;
int size = nums.size();
for(int i = 0; i < size; ++i)
maps[nums[i]] = i;
for (int i = 0; i < size; ++i) {
int left = target - nums[i];
if(maps.count(left) > 0&& maps[left] ! = i) {return vector<int>({i, maps[left]}); }}return vector<int> (); }Copy the code
Write down the number of occurrences of an element
The problem of legitimacy judgment of sudoku and the problem of shortest substring containing all characters are both problems of calculating the occurrence times of an element by using hash table.
Unordered_map
, unordered_map
, unordered_map
, unordered_map
The total number of possible ASCII elements is 128, which is essentially the hash function f(int x){return x; }.
Sudoku problem
Sudoku requires that each row, column, and table be divided into 9 sub-blocks. Within each sub-block, 1-9 can only appear once and cannot be repeated. We can create a hash table for each row and column, a total of 27 tables, and add elements to the table. If any element in the table is greater than 1, we can judge that Sudoku is not qualified.
bool isValidSudoku(vector<vector<char>>& board)
{
vector<unordered_map<char.int>> rows(9);
vector<unordered_map<char.int>> cols(9);
vector<unordered_map<char.int>> subs(9);
for (int i=0; i<9; i++)
{
for (int j=0; j<9; j++)
{
// row
char ch = board[i][j];
if(ch ! ='. ')
{
rows[i][ch]++;
if(rows[i][ch] > 1)
return false;
cols[j][ch]++;
if(cols[j][ch] > 1)
return false;
int idx = i/3 + j-(j%3);
subs[idx][ch]++;
if(subs[idx][ch] > 1)
return false; }}}return true;
}
Copy the code
Similarly, when solving sudoku problems, we can use dynamic programming method to check whether the inserted element is legal by using hash table before inserting each element. If not, we can restore the “crime scene” and return to the upper layer.
Convert data with different keys to the same key
Some problems, such as Group Anagrams problem, find all the same elements in a list with different permutations.
Input: ["eat"."tea"."tan"."ate"."nat"."bat"],
Output:
[
["ate"."eat"."tea"],
["nat"."tan"],
["bat"]]Copy the code
We can use the principle of hash table, combined with the characteristics of permutation and combination, all elements according to the lexicographical conversion, so that the results of these elements are the same, pointing to the same index. F (string){return sort(string); }. The solution to the problem is as follows:
vector<vector<string>> groupAnagrams(vector<string>& strs)
{
vector<vector<string>> result;
unordered_map<string.vector<string>> map;
for(int i = 0; i < strs.size(); i++)
{
string s = strs[i];
sort(s.begin(), s.end());
map[s].push_back(strs[i]);
}
for(pair<string.vector<string>> pa:map)
{
result.push_back(pa.second);
}
return result;
}
Copy the code
conclusion
Combining the above problems with hash table in LeetCode, we can summarize some ideas about how to improve algorithm efficiency of hash table in algorithm problems.
- In those problems that require iterating over and over to find an element, the problem can be converted to finding an index based on the content of the element, and the time efficiency of the hash table in this area is very high;
- In some problems such as string word frequency statistics and sudoku, hash function can be used to calculate the number of occurrences of an element as an auxiliary tool of algorithm.
- There are also some problems, you can use the idea of hash function, let several different elements get the same result, so as to achieve a cluster.
references
- Algotithm 3rd
- LeetCode hash table