This is the seventh day of my participation in the August More text Challenge. For details, see:August is more challenging

Hash table

Almost all programming languages use hash tables directly or indirectly as data structures. The magic of a hash table, which is usually based on an array, is a transformation of the values of the subscript, called a hash function, which gives you a HashCode. It has the advantage of being faster and more efficient than arrays.

1. Know the hash table

  • Hash: The process of converting large numbers into subscripts of an array range
  • Hash function: Usually we convert words to large numbers (each letter corresponds tounicode), the process of converting large numbers into hashed code is encapsulated into a function called a hash function
  • Hash table: The encapsulation of the entire structure into which the data is eventually inserted is called a hash table

Each key in the hash table has a unique value, and each key has its corresponding index value. Through this index value, the value of the key can be directly found in the hash table, which greatly improves the efficiency of the search. If the data is large and each index value corresponds to more than one key, conflicts will occur. There are two common ways to resolve this conflict: chain-address and open-address.

In the following encapsulation, the chain address method based on array is adopted.

Encapsulate the hash table

Set the total length of the hash table when the hash table is encapsulated. We usually set the total length to be twice the number, so if we have 50 pieces of data, let’s set the total length of this hash table to be 100.

function HashTable() { this.storage = []; This.count = 0; This. limit = 7; // The total length of the current hash table}Copy the code

The hash function

Hash functions are crucial in a hash table and are required for almost every operation. What it does is it takes the index value of a key in the hash table, calculates its hashCode by some calculation method, and then takes the mod of the hashCode (the value of hashCode is very large, in order to make the data quantity smaller), and this size is the value of this. Limit, That’s the length of the hash table.

This 37 is actually a base, so for example, ABC is equal to 1 times 37^2 + 2 times 37^1 + 3. I just want to pick a good number here, not necessarily 37, but preferably prime.

HashTable.prototype.hashFunc = function (str, size) { var hashCode = 0; For (var I = 0; i < str.length; i++) { hashCode = 37 * hashCode + str.charCodeAt(i); } var index = hashCode % size; return index; }Copy the code

Three, common operations

Here we use the chained address method (based on arrays, where each subscript corresponds to an array)

A. Insert and modify data

Since there are no duplicate values in a hash table, when a user passes in a (key, value) that doesn’t already exist, it’s an insert. If it exists, it is a modification operation. Operation steps:

  1. Get the index value based on the key (via a hash algorithm) and insert the data into the corresponding location

    var index = this.hashFunc(key, this.limit);
    Copy the code
  2. Retrieve the location based on the index value: first determine if there is an array at the location, if not, create it

    var bucket = this.storage[index];
    if (bucket == null) {
        bucket = [];
        this.storage[index] = bucket;
    }
    Copy the code
  3. Determine whether to add or modify the original value (if the key is the same, it is modified, if the key is different, it is added)

    If the key already exists, change its value;

    Here the bucket is a reference to one of the indexes in the hash table, and stores an array of keys: values (the keys calculated by the hash function are intended to pass through).

    for (var i = 0; i < bucket.length; i++) { var tuple = bucket[i] if (tuple[0] == key) { tuple[1] = value; return; }}Copy the code
  4. If no, perform subsequent operations. Notice that we’re pushing an array object

    bucket.push([key, value]);  
    this.count += 1;
    Copy the code

B. Obtain and delete operations

The first two steps here are the same as adding or modifying data above. So let’s hash out the index of this key, and then find the array of the key, if it exists, and then find the value of this key from that array.

  1. The index is obtained based on the key

  2. Get the bucket(array) for the location (index)

  3. Check whether the bucket is null. If the bucket is null, return null

    if (bucket == null)   return null;
    Copy the code
  4. Look linearly for each key in the bucket equal to the passed key

    Returns value if it is equal to; Return null if no one has been found at the end of the traversal

    for (var i = 0; i < bucket.length; i++) {
        var tuple = bucket[i]
        if (tuple[0] == key) {
            return tuple[1];
        }
    }
    return null;
    Copy the code

The delete operation is similar here, except the get operation gets the value of this key and returns it, and the delete operation deletes it; Add the following two lines of code to the statement to delete the obtained value and the key.

bucket.splice(i, 1);
this.count -= 1;
Copy the code

C. Test code

Where 0, 3, 4, and 5 are the converted hash table index values, and when their index values are the same, they are stored in an array as key = value.

var hash = new HashTable();
 hash.put('mannqo', 123);
 hash.put('mama', 222);
 hash.put('Ytao', 321);
 hash.put('fafa', 213);
 hash.put('neinei', 123);
 hash.remove('neinei');
 console.log(hash.get('neinei'));
 console.log(hash.get('mannqo'));
 console.log(hash);
Copy the code

Output results:

As you can see, the deleted element is againgetIt will return null, and it will not be visible in the hash table. With the same indexmamaandfafaIs stored in an array with index 0.

4. Expand the hash table

As the amount of data increases, the length of the bucket(array) corresponding to each index value will become longer, resulting in a decrease in efficiency. This is whether you need to expand the hash table. It is possible to increase efficiency by expanding an array when the loadFactor is greater than 0.75. After scaling, remember to change the location of all the data items (re-call the hash function to get a different location), because the location of each element will also change after scaling.

  1. Var oldStorage = this.storage;

  2. Reset all properties

    this.storage = [];
    this.count = 0;
    this.limit = newLimt;
    Copy the code
  3. Iterate over all buckets in oldStorage

    The first step is to determine if each of the old buckets has anything. If not, the program continues, and if it does, it is stored in the new bucket.

    for (var i = 0; i < oldStorage.length; i++) { var bucket = oldStorage[i]; if (bucket == null) { continue; } for (var j = 0; j < bucket.length; j++) { var tuple = bucket[j]; this.put(tuple[0], tuple[1]); }}Copy the code

Determine whether to expand the capacity

If (this.count > this.limit * 0.75) {this.resize(this.limit * 2); }Copy the code

If there are too many elements to delete, you can also reduce the size of the hash table by adding the following statement to the delete operation.

If (this.limit > 7 && this.count < this.limit * 0.25) {this.resize(floor(this.limit / 2))}Copy the code

Five, the summary

  • Hash tables are very powerful, the most prominent feature is that it is fast to find. Each key has its corresponding index value, according to the index value can be found very quickly. There may be more than one element in this position (conflicts are inevitable), but the number of conflicting elements is usually small and efficient.
  • However, hash table has its disadvantages. In a hash table, the data is not ordered, and the key of the hash table is not allowed to repeat, each key corresponds to a value, which is used to hold different elements. Different places the same key.