This is the seventh day of my participation in the August More text Challenge. For details, see:August is more challenging
Hash table
Almost all programming languages use hash tables directly or indirectly as data structures. The magic of a hash table, which is usually based on an array, is a transformation of the values of the subscript, called a hash function, which gives you a HashCode. It has the advantage of being faster and more efficient than arrays.
1. Know the hash table
- Hash: The process of converting large numbers into subscripts of an array range
- Hash function: Usually we convert words to large numbers (each letter corresponds to
unicode
), the process of converting large numbers into hashed code is encapsulated into a function called a hash function - Hash table: The encapsulation of the entire structure into which the data is eventually inserted is called a hash table
Each key in the hash table has a unique value, and each key has its corresponding index value. Through this index value, the value of the key can be directly found in the hash table, which greatly improves the efficiency of the search. If the data is large and each index value corresponds to more than one key, conflicts will occur. There are two common ways to resolve this conflict: chain-address and open-address.
In the following encapsulation, the chain address method based on array is adopted.
Encapsulate the hash table
Set the total length of the hash table when the hash table is encapsulated. We usually set the total length to be twice the number, so if we have 50 pieces of data, let’s set the total length of this hash table to be 100.
function HashTable() { this.storage = []; This.count = 0; This. limit = 7; // The total length of the current hash table}Copy the code
The hash function
Hash functions are crucial in a hash table and are required for almost every operation. What it does is it takes the index value of a key in the hash table, calculates its hashCode by some calculation method, and then takes the mod of the hashCode (the value of hashCode is very large, in order to make the data quantity smaller), and this size is the value of this. Limit, That’s the length of the hash table.
This 37 is actually a base, so for example, ABC is equal to 1 times 37^2 + 2 times 37^1 + 3. I just want to pick a good number here, not necessarily 37, but preferably prime.
HashTable.prototype.hashFunc = function (str, size) { var hashCode = 0; For (var I = 0; i < str.length; i++) { hashCode = 37 * hashCode + str.charCodeAt(i); } var index = hashCode % size; return index; }Copy the code
Three, common operations
Here we use the chained address method (based on arrays, where each subscript corresponds to an array)
A. Insert and modify data
Since there are no duplicate values in a hash table, when a user passes in a (key, value) that doesn’t already exist, it’s an insert. If it exists, it is a modification operation. Operation steps:
-
Get the index value based on the key (via a hash algorithm) and insert the data into the corresponding location
var index = this.hashFunc(key, this.limit); Copy the code
-
Retrieve the location based on the index value: first determine if there is an array at the location, if not, create it
var bucket = this.storage[index]; if (bucket == null) { bucket = []; this.storage[index] = bucket; } Copy the code
-
Determine whether to add or modify the original value (if the key is the same, it is modified, if the key is different, it is added)
If the key already exists, change its value;
Here the bucket is a reference to one of the indexes in the hash table, and stores an array of keys: values (the keys calculated by the hash function are intended to pass through).
for (var i = 0; i < bucket.length; i++) { var tuple = bucket[i] if (tuple[0] == key) { tuple[1] = value; return; }}Copy the code
-
If no, perform subsequent operations. Notice that we’re pushing an array object
bucket.push([key, value]); this.count += 1; Copy the code
B. Obtain and delete operations
The first two steps here are the same as adding or modifying data above. So let’s hash out the index of this key, and then find the array of the key, if it exists, and then find the value of this key from that array.
-
The index is obtained based on the key
-
Get the bucket(array) for the location (index)
-
Check whether the bucket is null. If the bucket is null, return null
if (bucket == null) return null; Copy the code
-
Look linearly for each key in the bucket equal to the passed key
Returns value if it is equal to; Return null if no one has been found at the end of the traversal
for (var i = 0; i < bucket.length; i++) { var tuple = bucket[i] if (tuple[0] == key) { return tuple[1]; } } return null; Copy the code
The delete operation is similar here, except the get operation gets the value of this key and returns it, and the delete operation deletes it; Add the following two lines of code to the statement to delete the obtained value and the key.
bucket.splice(i, 1);
this.count -= 1;
Copy the code
C. Test code
Where 0, 3, 4, and 5 are the converted hash table index values, and when their index values are the same, they are stored in an array as key = value.
var hash = new HashTable();
hash.put('mannqo', 123);
hash.put('mama', 222);
hash.put('Ytao', 321);
hash.put('fafa', 213);
hash.put('neinei', 123);
hash.remove('neinei');
console.log(hash.get('neinei'));
console.log(hash.get('mannqo'));
console.log(hash);
Copy the code
Output results:
As you can see, the deleted element is againget
It will return null, and it will not be visible in the hash table. With the same indexmama
andfafa
Is stored in an array with index 0.
4. Expand the hash table
As the amount of data increases, the length of the bucket(array) corresponding to each index value will become longer, resulting in a decrease in efficiency. This is whether you need to expand the hash table. It is possible to increase efficiency by expanding an array when the loadFactor is greater than 0.75. After scaling, remember to change the location of all the data items (re-call the hash function to get a different location), because the location of each element will also change after scaling.
-
Var oldStorage = this.storage;
-
Reset all properties
this.storage = []; this.count = 0; this.limit = newLimt; Copy the code
-
Iterate over all buckets in oldStorage
The first step is to determine if each of the old buckets has anything. If not, the program continues, and if it does, it is stored in the new bucket.
for (var i = 0; i < oldStorage.length; i++) { var bucket = oldStorage[i]; if (bucket == null) { continue; } for (var j = 0; j < bucket.length; j++) { var tuple = bucket[j]; this.put(tuple[0], tuple[1]); }}Copy the code
Determine whether to expand the capacity
If (this.count > this.limit * 0.75) {this.resize(this.limit * 2); }Copy the code
If there are too many elements to delete, you can also reduce the size of the hash table by adding the following statement to the delete operation.
If (this.limit > 7 && this.count < this.limit * 0.25) {this.resize(floor(this.limit / 2))}Copy the code
Five, the summary
- Hash tables are very powerful, the most prominent feature is that it is fast to find. Each key has its corresponding index value, according to the index value can be found very quickly. There may be more than one element in this position (conflicts are inevitable), but the number of conflicting elements is usually small and efficient.
- However, hash table has its disadvantages. In a hash table, the data is not ordered, and the key of the hash table is not allowed to repeat, each key corresponds to a value, which is used to hold different elements. Different places the same key.