This is the sixth day of my participation in the First Challenge 2022. For details: First Challenge 2022.

Hash table

There is no hash table in JS. In JS we use objects, sets or maps, which are essentially collections of key-value pairs (hash tables).

In addition, there are many nicknames for hash tables, such as hash table, hash map, map, dictionary, associative array and so on. Usually, hash table is called more frequently. In this book, hash table is called uniformly.

Why hash tables?

The array query is too slow.

Even with binary lookup, it takes O(logn) time to find elements in an array.

Is there a way to implement queries in O(1) time complexity?

Yes, scientists have invented data structures such as hash tables, which combine the advantages of arrays and linked lists. The insertion and deletion time complexity is O(1), and the search time complexity is also O(1).

The bottom layer of a hash table is actually an array plus a hash function, and the hash function is the key to achieving O(1) search time.

The hash function

A hash function is a function that takes in a piece of data, internally computes a number somehow, and outputs that number.

Hash functions map input to numbers and must meet the following requirements:

  • Each input and output must be consistent.
  • Different inputs map to different numbers.

Let’s say we have some goods, milk for $3, apples for $4, pears for $6, and let’s create a hash table of goods.

Suppose the hash function returns the SUM of the ASCII values of the input characters and then mod 3.

Then we can get the mapping between goods and the output of the hash function:

Milk -> 429% 3 -> 0 price is 3 yuan Apple -> 424% 3 -> 1 price is 4 yuan PEAR -> 530%3 -> 2 price is 6 yuanCopy the code

We can then store the prices of these three items in an array indexed by the hash function’s output.

[3, 4, 6] Array index positions 0, 1, and 2 store the prices of the three items respectivelyCopy the code

This gives us O(1) time complexity for the query, because the hash function tells us exactly where the price is stored, and we don’t need to look it up at all.

For example, if we want to query the price of apples, we compute the hash function to get index 1, and then return the value of index 1 stored in the array, which is 4.

Let’s look at the requirements of the hash function:

Each input and output must be consistent

They have to be consistent, because if they’re not, let’s say I look up the price of apples, this time it’s 3, the next time it’s 5, how does that work?

Different inputs map to different numbers

It cannot be repeated. If it is repeated, the prices of two goods are stored in the same index, which will cause an error.

conflict

In the previous example, we had very few items, only three.

So let’s say we have more products, and we have a new product called mlik, and milk is just in alphabetical order.

When placing the commodity mlik into the commodity hash table, the same hash function is used to calculate the same value as milk, which is 0.

The index 0 is already occupied by milk.

If mlik is also placed at 0, it will overwrite the previous milk.

This situation is called a collision, also called a collision.

How to resolve conflicts?

There are many ways to resolve conflict, such as the zipper method:

Store a linked list where mlik and milk collide, and store the prices of mlik and milk respectively in the linked list.

One problem with this is that it’s still fast to look up the price of apples and pears, and slower to look up milk, because you have to look up the list one by one.

And there is another problem. If we store data in this way, the result is that the hash table is distributed to the linked list, as shown below:

Then our hash table almost becomes a linked list, which is not good!

Scientists have come up with some ways to try to avoid this problem:

  • The loading factor
  • Good hash functions

The loading factor

Fill factor === Number of elements contained in the hash/total number of hash positions [NULL, 1, NULL, 0, NULL] -> Fill factor is 2/5 -> 0.4 [1, 2, 3, 0, 5] -> Fill factor is 5/5 -> 1Copy the code

The lower the fill factor, the lower the likelihood of collisions and the higher the hash table performance.

If the fill factor is too large, adjust the length of the hash table, for example:

[NULL, 2, 3] changed to [NULL, 2, 3, NULL, NULL, NULL, NULL] from 2/3 to 1/3Copy the code

Is it very simple? Although the method is very simple, it can effectively reduce the occurrence of conflict.

A good rule of thumb is to adjust the length of the hash table once the fill factor is greater than 0.7.

Good hash functions

Bad hash functions allow values to pile up, leading to a lot of collisions;

Good hash functions distribute the values of the array evenly.

What is a good hash function? In fact, hash tables are already handled internally in various programming languages, such as the SHA function.

Learning a knowledge, a lot of time to dig up endless, I think know the concept, understand.

Application scenarios of hash tables

These are examples in the book

  • Phone book, DNS resolution
  • Prevent duplication of votes
  • Cache/remember the data so the server doesn’t have to process it again to generate it.

In my opinion, it’s all about creating mappings and making quick lookups.

Hash table in JS implementation

Take Object as an example to realize the above commodity list:

const priceMap = {
  milk: 3,
  apple: 4,
  pear: 6
}
Copy the code

The internal hash function calculates the hash value of the three items, and then stores it in the corresponding index of the array. However, the calculation method of the hash function inside JS is more complicated.

Js determines whether an attribute exists on an object (hash lookup)

  • Let’s just look it up and see if it’s empty

  • The in operator

  • Object.prototype.hasOwnProperty()

const obj = { name: 'lin' } obj.name ! == undefined // obj['name']! == undefined 'name' in obj // in operator obj.hasOwnProperty('name') // hasOwnPropertyCopy the code

Of course, Map, WeakMap, Set and WeakSet in JS are also collections of key value pairs, but they are different. For details, please refer to the third edition of RUan Yipin ECMAScript 6 (ES6) standard Introductory tutorial

summary

You almost never have to implement hash tables yourself; various languages already implement them internally.

Hash tables have the following features:

  • The hash table is internally implemented by arrays and hash functions.
  • Hash tables are very fast to find, insert, and delete.
  • Hash tables are suitable for simulating mapping relationships.

The hash table compares the time complexity of arrays and linked lists as follows:

operation Hash table (average) Hash table (worst) An array of The list
To find the O(1) O(n) O(n) O(n)
insert O(1) O(n) O(n) O(1)
delete O(1) O(n) O(n) O(1)