Hash table Hash
Goal:
- basis
- Design of hash functions
- Hash conflict handling
- Chain address method Separate Chaining
- Implement your own hash table
- Dynamic spatial processing and complexity of hash tables
1.1 Understanding hash tables
- Each character corresponds to an index (hash function), O(1) lookup operation
- It is difficult to ensure that each “key” is converted by a hash function to a different “index”.
- Hash conflict
- The designer of hash functions is very important
- The more evenly distributed the “index” obtained by the “key” hash function, the better
- Hash table fully embodies the classical idea of algorithm design: space for time. A hash table is a balance between time and space
1.2 Design of hash function
Every Object class in Java has an implementation of hashCode() by default.
-
The integer
-
Small positive integers are used directly
-
Small range of negative integers offset -100 100- > 0 200
-
Large integer
-
Modulus: die a prime number (relating to the mathematical theory, but more discussion) reference: planetmath.org/goodhashtab…
The test code is as follows:
int a = 42; System.out.println(((Integer)a).hashCode()); int b = -42; System.out.println(((Integer)b).hashCode()); Copy the code
-
-
-
floating-point
Both are 32-bit and 64-bit binary representations, but the computer interprets them as floating-point numbers
To an integer.
double c = 3.1415926; System.out.println(((Double)c).hashCode()); Copy the code
-
String, compound type
- To an integer
String d = "java"; System.out.println(d.hashCode()); Copy the code
-
The principle of
- Consistency: Hash (a) == hash(b) if a == b
- High efficiency: the calculation is efficient and simple
- Uniformity: Hash values are evenly distributed
1.3 Handling of hash conflict
-
Chain address method
-
For the entire hash table, I open m Spaces, and for each space, because of the hash conflicts, I’m essentially storing a linked list
For each position, we say save a linked list, which is actually the lookup table. After previous learning, we can also use the balanced tree structure (TreeMap, TreeSet) to achieve the lookup table.
Before Java8, there was a linked list for each location. Starting with Java8, when hash collisions reached a certain level, each location changed from a linked list to a red-black tree.
-
-
TreeMap implementation
The code is as follows:
/ / add public void add(K key, V value) { / / reuse TreeMap TreeMap<K, V> map = hashtable[hash(key)]; if (map.containsKey(key)) { map.put(key, value); } else{ map.put(key, value); size++; }}/ / delete public V remove(K key){ TreeMap<K, V> map = hashtable[hash(key)]; V ret = null; if (map.containsKey(key)){ ret = map.remove(key); size--; } return ret; } / / change public void set(K key, V value){ TreeMap<K,V> map = hashtable[hash(key)]; if(! map.containsKey(key)){throw new IllegalArgumentException(key + "doesn't exist"); } map.put(key, value); } public boolean contains(K key){ return hashtable[hash(key)].containsKey(key); } public V get(K key){ return hashtable[hash(key)].get(key); } Copy the code
-
Complexity analysis
-
Suppose there are a total of M addresses. If you put them in a hash table, you get N elements, and when you implement them in a linked list, the time complexity of each address is O(N/M), and when you implement them in a balanced tree, the time complexity of each address is O(logN/M).
-
How do I get to order one? Dynamic space allocation.
- On average, each address carries more elements than a certain level, that is, capacity expansion. N/M >= upperTol
- The average capacity of each address is reduced to a certain extent. N/M < lowerTol
The specific implementation code is as follows:
private void resize(int newM){ TreeMap<K, V>[] newHashTable = new TreeMap[newM]; for (int i = 0; i<newM; i++){ newHashTable[i] =new TreeMap<>(); } int oldM = M; this.M = newM; for (int i = 0; i<oldM; i++){ TreeMap<K, V> map = hashtable[i];for(K key:map.keySet()){ newHashTable[hash(key)].put(key, map.get(key)); }}this.hashtable = newHashTable; } // Modify the added element public void add(K key, V value) { / / reuse TreeMap TreeMap<K, V> map = hashtable[hash(key)]; if (map.containsKey(key)) { map.put(key, value); } else { map.put(key, value); size++; if (size >= upperTol * M) { resize(2* M); }}}/ / delete public V remove(K key) { TreeMap<K, V> map = hashtable[hash(key)]; V ret = null; if (map.containsKey(key)) { ret = map.remove(key); size--; if (size < lowerTol * M && M/2>=intitCapacity) { resize(M / 2); }}return ret; } Copy the code
-
For hash tables, the number of elements increases from N to upperTol*N, the address space doubles, and the average complexity is O(1). Each operation ranges from O(lowerTol) to O(upperTol)–>O(1).
-
Expand by 2 by M, not prime. Solution: according to planetmath.org/goodhashtab…
/ / prime tables private final int[] capacity = {53.97.193.389.769.1543.3079.6151.12289.24593.49157.98317.196613.393241.786433.1572869.3145739.6291469.12582917.25165843.50331653.100663319.201326611.402653189.805306457.1610612741}; / / add public void add(K key, V value) { / / reuse TreeMap TreeMap<K, V> map = hashtable[hash(key)]; if (map.containsKey(key)) { map.put(key, value); } else { map.put(key, value); size++; if (size >= upperTol * M && capacityIndex + 1< capacity.length) { capacityIndex++; resize(capacity[capacityIndex]); }}}/ / delete public V remove(K key) { TreeMap<K, V> map = hashtable[hash(key)]; V ret = null; if (map.containsKey(key)) { ret = map.remove(key); size--; if (size < lowerTol * M && capacityIndex - 1> =0) { capacityIndex--; resize(capacity[capacityIndex]); }}return ret; } Copy the code
-
Conclusion:
- The amortized complexity of hash tables is O(1), sacrificing orderality. The implementation of sets and maps can be linked lists, trees, or hash tables.
- Ordered sets, ordered maps: balanced trees
- Unordered set without mapping: hash table
-