“This is the 18th day of my participation in the First Challenge 2022. For details: First Challenge 2022.”
Hey, guys. I’m your hot ass.
I said earlier when I was talking about arrays that it’s too hard to find a number in an array, so you have to start from scratch and compare it until you find an equal number.
This method, to be honest, is too stupid for a lazy person like me.
Can’t there be a way to just look at this number, just look it up in the array? !
Oh, don’t tell me. There is.
Because there is always more lazy than me, eggs lazy is only lying, somebody else big guy lazy is directly start work to solve, as expected that sentence “lazy is the first productive force”, sincere not I deceive!
That’s what I’m going to talk to bitches about today: hash tables.
Hash table
A Hash Table is a Hash Table.
I said up here that I wanted to have a way of looking at a number and knowing where it is in an array, and that’s actually using the idea of hashing.
The idea is that you can find the location of the key without any useless comparisons.
Let me give you a scenario that might be clearer:
There are 40 students in dogegg class. The student ID number of each student is composed of the year of admission + grade + class + number. For example, 2021210129 is the 29th student of Class 01, grade 21, who will be admitted in 2021.
Now we have a requirement: we need to find someone’s personal information quickly.
That time we have to establish a table, in principle, if I want to know a student’s personal information, in fact, know the student number is good, but not in this, the value of the student number is too big, we can not take the student number as a subscript.
Student number can not, that what can? We fixed a look, yi, the number can ah, the number is from 1 to 40.
How do I get the number? Is not student number to 2021210100 mod ok.
At this point, if we want to check the personal information of the student whose student number is 2021210129, we only need to access the data with subscript 29.
Actually, this can be solved in order one time.
Second man solid hammer.
The formula is: storage location = F (key word).
F here is also called a hash function, and each key corresponds to the range of 0 to N minus 1 and is placed in the appropriate position.
In the example above f(key) = key % 2021210100.
The hash address of the key is calculated by the same hash function, and the key is stored according to the hash address.
The final form of the table is hash table, it is mainly search oriented storage structure, simplify the process of comparison, improve efficiency.
Hash sample
If you can see clearly above, let’s give a small example to deepen the impression.
If you have an array of n = 10, the hash function f(key) = key % 10 hashes 4,10,11,19,29,39 into the array.
Once the hash function is determined, all that is left is to calculate the corresponding storage location of the keyword.
- 4%10 = 4, so put 4 at index 4.
- 10% 10 = 0, so put 10 at 0.
- 11%10 = 1, so put 11 at index 1.
- 19%10 = 9, so put 19 at 9.
- 29%10 = 9, so put 29 at 9.
But now the problem is, the hole with index 9 is already occupied by 19, and the indices of 29 are in conflict.
There’s a scientific name for this: hashing.
Handle hash conflicts
For two unequal keywords key1 and key2, if f(key1) = f(key2), this is a hash conflict.
Key1 takes the hole, so KEY2 has to find another way, what way?
Generally, there are several ways to deal with hash conflicts:
- Open addressing
- Hash function method
- Chain address method
- .
Open addressing
Open addressing means that in case of conflict, another available location is selected.
Open addressing has two most commonly used strategies:
- Linear detection method
- Secondary detection method
Linear detection method
Linear detection is, as the name suggests, straight-forward detection.
And look at the formula:
f(key) = (f(key) + di) % m (di = 1, 2, 3, … , m – 1).
I’ll use the example from the “hash example” :
Hash f(key) = key % 10, hash 4,10,11,19,29,39 into the array.
At 29, 29 percent 10 is 9.
But the subscript 9 already has element 19, so it probes the next position (9 + 1) % 10 = 0.
0 already has element 10, so go on to the next position (9 + 2) % 10 = 1.
We also have element 11 at 1, so we have to go to the next location (9 + 3) % 10 = 2.
The subscript 2 position is finally empty, so 29 finds a home:
I don’t know if you noticed, but for the 29, it was just a conflict with the 19, but it’s a conflict with the 10 and 11.
This causes several collisions to be handled at a time, and a large number of numbers can be concentrated in one area, greatly reducing the efficiency of inserts and lookups.
Later, I don’t know which big guy did improvement on the basis of linear, tamping out a second detection method.
Secondary detection method
The second detection method is to square the previous di as follows:
(key) = f (f (key) + di) % m (di = 1 squared, 1 squared, 2 squared, – 2 squared,… , q², -q², q ≤ m/2)
For example, for 29, the subscript 9 is 19, so it probes the next position (9 + 1²) % 10 = 0.
The position with subscript 0 is occupied by element 10, and the next time it probes the previous position (9-1²) % 10 = 8.
The position with subscript 8 above and directly occupied:
Hash function method
To hash, instead of just one hash function, use a set of hash functions F1 (key), F2 (key), f3(key)…. .
When the storage space calculated using the first hash function is occupied, use the second hash function, and there will always be a hash function to resolve the conflict.
And so on until you find an empty space, and then you hold it.
Of course, this method does not result in a large number of numbers clustered in one area, but this situation significantly increases the calculation time.
Chain address method
Who said that conflict can only be found after another pit, several people squatting in a pit it is not sweet.
Maybe some troll really liked it, and then the whole chain address thing.
The linked address method puts the keys that yield the same result in a single linked list, and the hash table stores the pointer to the head of each single linked list.
The hash function f(key) = key % 10 hashes 4,10,11,19,29,39.
Finally, the following figure is obtained:
You see, the chain address method doesn’t have this pit occupied, so you have to find another pit.
We squat together, so that there will never be a case of failure to find the address, but directly inserted into the corresponding single linked list, so the insertion time complexity is O(1).
Of course, there are some gains and some losses, and this situation will inevitably cause performance loss on the search. Here, the search time complexity is O(k), where K = N/the number of single-linked lists.
Okay, so that’s it for hash tables, doesn’t it seem pretty simple.
Hash table as a very high and efficient search data structure, lost the keyword repeated meaningless comparison, directly in one step to find the results, very top.
So it doesn’t have to be so annoying to deal with conflict and shit, because you win some and you lose some.
Hopefully bitches will learn after watching this, and don’t forget my likes, by the way.
I’m Handsome. See you next time