Advance one pawn every day; don't expect instant success!
Introduction
Hash is a word we hear a lot at work: hash table, hash function, hashCode, HashTable, HashMap, and so on. What is the love-hate relationship between them all? Let's take a look.
Arrays
Before we talk about hash tables, let’s look at the granddaddy of data structures: arrays.
Arrays are pretty simple, so I won't go on about them; just look at the picture below.
Array subscripts usually start at 0, and elements are stored one after another. Likewise, an element can only be found by scanning from the beginning (or from the end).
For example, to find the element 4, it would take three searches from the beginning and two searches from the end.
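To make that cost concrete, here's a tiny Java snippet (my own illustration, using the element values from the example) that counts the comparisons a front-to-back linear scan needs:

```java
public class LinearScan {
    public static void main(String[] args) {
        int[] a = {3, 5, 4, 1};                // a plain array, elements stored in order
        int steps = 0;
        for (int i = 0; i < a.length; i++) {   // scan from the front until we hit a match
            steps++;
            if (a[i] == 4) {
                System.out.println("found 4 at index " + i + " after " + steps + " comparisons");
                break;
            }
        }
    }
}
```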
Early hash tables
The disadvantage of an array is that you can only search for an element by scanning from the beginning or from the end until it matches, so the average time complexity is O(n).
So, is there a way to quickly find elements using arrays?
The smart programmers came up with a way to compute the value of an element using a hash function, and use that value to determine the location of the element in the array, so that the time complexity can be reduced to O(1).
For example, take the elements 3, 5, 4, and 1: instead of placing them one after another as in a plain array, you hash each one and place it exactly where the hash says.
If the array size is 8, we can define a hash function hash(x) = x % 8, and the final layout looks like this:
For example, hash(4) = 4 % 8 = 4, so element 4 goes straight to index 4.
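Here's a minimal sketch of that idea in Java (my own toy code, using the 8-slot array and hash(x) = x % 8 from the example; collisions are ignored for now, we get to those next):

```java
public class SimpleHashTable {
    private final int[] slots = new int[8];       // 0 means "empty" in this toy example

    private static int hash(int x) {              // hash(x) = x % 8, as in the text
        return x % 8;
    }

    public void put(int x) {
        slots[hash(x)] = x;                       // place the element directly at its hash index
    }

    public boolean contains(int x) {
        return slots[hash(x)] == x;               // a single array access: O(1) lookup
    }

    public static void main(String[] args) {
        SimpleHashTable table = new SimpleHashTable();
        for (int x : new int[]{3, 5, 4, 1}) {     // the elements from the example
            table.put(x);
        }
        System.out.println(table.contains(4));    // true: found at index hash(4) = 4
    }
}
```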
The hash table evolves
Now insert element 13: hash(13) = 13 % 8 = 5. Whoops, index 5 is already occupied by element 5 (hash(5) = 5 % 8 = 5).
This is a hash collision.
Why do hash collisions happen?
Because the array we allocate has finite length, mapping an unbounded set of values onto a finite array is bound to produce collisions, that is, multiple elements mapped to the same location.
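A two-line check (my own illustration) makes that concrete:

```java
public class CollisionDemo {
    public static void main(String[] args) {
        // 13 and 5 are different values, but an 8-slot array maps them to the same index.
        System.out.println(5 % 8);    // 5
        System.out.println(13 % 8);   // 5 -> both want slot 5: a hash collision
    }
}
```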
Okay, so there’s a hash conflict, so we have to solve it, we have to do it!
How to?
Linear probing
Position 5 is occupied, but I can't just give up on element 13: I move back one slot and put it at position 6. That's linear probing: when there's a collision, keep moving back until you find an empty slot.
Now insert element 12: hash(12) = 12 % 8 = 4, but positions 4, 5, and 6 are all taken, so it has to move back 3 times before it finds the free slot at position 7. That makes inserts inefficient, and lookups suffer the same way: locate position 4 first, see it's not the element we want, and keep moving back until we reach position 7.
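Here's a hedged sketch of linear probing in Java (my own toy code, using 0 as the "empty" marker and the same 8-slot array; a real implementation would also guard against a completely full table):

```java
public class LinearProbingTable {
    private final int[] slots = new int[8];           // 0 marks an empty slot in this toy example

    public void put(int x) {
        int i = x % 8;
        while (slots[i] != 0) {                       // slot taken: probe the next one
            i = (i + 1) % 8;                          // wrap around at the end of the array
        }
        slots[i] = x;
    }

    public int indexOf(int x) {
        int i = x % 8;
        while (slots[i] != 0) {                       // keep scanning until we hit an empty slot
            if (slots[i] == x) return i;              // found it
            i = (i + 1) % 8;
        }
        return -1;                                    // hit an empty slot: not present
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable();
        for (int x : new int[]{1, 3, 4, 5, 13}) t.put(x);
        t.put(12);                                    // 12 % 8 = 4, but 4, 5 and 6 are taken...
        System.out.println(t.indexOf(12));            // ...so it lands at index 7
    }
}
```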
Quadratic probing
Linear probing has a big drawback: colliding elements tend to pile up together. For example, with 12 now sitting at position 7, insert another element, 14: it collides, probes backward to the end of the array, wraps around, and finally lands at position 0. You can see the colliding elements clustering, which is bad for lookups and bad for inserting new elements.
At this point a smart programmer came up with a new idea: quadratic probing. When there's a collision, instead of checking slots one by one, probe at the original hash value plus i squared, with i = 1, 2, 3, ... until an empty slot is found.
Take the example above and insert element 12; the process goes like this: hash(12) = 4 is taken, 4 + 1² = 5 is taken, 4 + 2² = 8 wraps around to 0, which is free, so 12 goes to position 0.
This allows you to quickly find a place to put new elements without causing conflicting elements to pile up.
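A minimal Java sketch of that probing rule (my own code; the probe sequence is (h + i²) % 8 for i = 1, 2, 3, ...):

```java
public class QuadraticProbingTable {
    private final int[] slots = new int[8];            // 0 marks an empty slot

    public int put(int x) {
        int h = x % 8;
        if (slots[h] == 0) { slots[h] = x; return h; }
        for (int i = 1; i < 8; i++) {                  // probe h + 1², h + 2², h + 3², ...
            int idx = (h + i * i) % 8;
            if (slots[idx] == 0) { slots[idx] = x; return idx; }
        }
        return -1;                                     // gave up: no free slot found
    }

    public static void main(String[] args) {
        QuadraticProbingTable t = new QuadraticProbingTable();
        for (int x : new int[]{1, 3, 4, 5, 13}) t.put(x);
        System.out.println(t.put(12));                 // 4 and 4+1²=5 are taken, 4+2²=8 wraps to 0
    }
}
```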
But goose, there’s a new element 20. Where do you put it?
AYMY, it doesn’t fit anywhere.
Studies have shown that when more than half of the elements are placed in a hash table using the quadratic detection method, the new element cannot find its place.
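You can see the failure by simply printing the probe sequence for element 20 against the table above (my own illustration): the probes cycle through a handful of already-occupied slots and never reach the free ones.

```java
public class QuadraticProbeCycle {
    public static void main(String[] args) {
        // State from the example: slots 0 (12), 1, 3, 4, 5 and 6 (13) are taken; 2 and 7 are free.
        int h = 20 % 8;                                // 4, already taken
        for (int i = 1; i <= 8; i++) {
            System.out.print((h + i * i) % 8 + " ");   // prints: 5 0 5 4 5 0 5 4
        }
        // The probe sequence only ever visits 4, 5 and 0, all occupied, so 20 never finds a home.
    }
}
```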
So we need a new concept: resizing (capacity expansion).
What is capacity expansion?
When the number of placed elements reaches x times the total capacity, the table needs to grow. This x is called the load factor (expansion factor).
All else being equal, a larger load factor is better, because it means higher space utilization of the hash table.
Unfortunately, quadratic probing falls short here: its load factor can be at most 0.5, so half of the space is wasted.
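In code, the load factor is just a resize trigger. A minimal sketch (the names and the wrapper class are mine):

```java
public class LoadFactorCheck {
    // Resize when the number of stored elements reaches loadFactor * capacity.
    static boolean needsResize(int size, int capacity, double loadFactor) {
        return size >= capacity * loadFactor;
    }

    public static void main(String[] args) {
        // With quadratic probing the load factor cannot safely exceed 0.5:
        System.out.println(needsResize(4, 8, 0.5));   // true -> must grow before the table is half full
    }
}
```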
This is where programmers get to show off their smarts again. After a 996-style brainstorm, they came up with a new hash table implementation: the linked-list (chaining) method.
The linked list method
It’s conflict resolution! If there is a conflict, I will not add it to the array. Instead, I will use a linked list to connect the elements in the same index position of the array. Then I can make full use of space
Heh heh, perfect!
Is it really perfect? Suppose I'm an attacker and I keep inserting elements whose value % 8 == 4. You'll find that almost all of them end up in the same linked list, so your hash table degrades into one long linked list and query/insert efficiency drops to O(n).
Of course there's still a remedy at this point: what do you think the load factor is for?
For example, with the load factor set to 1, once the number of elements reaches 8 the capacity is doubled to 16. After rehashing, half of those elements stay at position 4 and half move to position 12, which takes some pressure off the hash table.
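The arithmetic behind that split, as a small illustration of mine: after the capacity doubles from 8 to 16, an element that used to land at index 4 now lands at either 4 or 12, depending on one extra bit of its value.

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int[] attackers = {4, 12, 20, 28, 36, 44, 52, 60};   // all hash to 4 while the capacity is 8
        for (int x : attackers) {
            // After doubling the capacity, x % 16 is either 4 or 12 (4 + the old capacity 8).
            System.out.println(x + " -> old index " + (x % 8) + ", new index " + (x % 16));
        }
    }
}
```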
Still, it's not quite perfect.
The smart programmers kicked off another brainstorm, a 9127-style one this time, and finally came up with a new structure: the linked-list-plus-tree method (the name is Tong Brother's own coinage, of course).
The linked-list-plus-tree method
The resizing above does solve part of the problem: while the number of elements is small, the overall lookup and insert efficiency isn't too bad, precisely because there aren't many elements.
But the attackers keep attacking, the number of elements keeps growing, and past a certain point the lookup and insert efficiency always becomes very low again.
So think about it differently: since a long linked list is inefficient, upgrade it. When a list gets long, how about turning it into a red-black tree?
Sounds good to me. Say it, do it.
Great, now mom doesn't have to worry about me getting attacked by hackers anymore.
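A hedged sketch of the "treeify" idea (Java 8's HashMap really does convert a bucket's list into a red-black tree once the list grows past a threshold of 8; the class below is just my simplified illustration, using TreeSet, which is backed by a red-black tree, as the stand-in):

```java
import java.util.LinkedList;
import java.util.TreeSet;

public class TreeifyingBucket {
    static final int TREEIFY_THRESHOLD = 8;        // same threshold Java 8's HashMap uses

    private LinkedList<Integer> list = new LinkedList<>();
    private TreeSet<Integer> tree;                 // TreeSet is backed by a red-black tree

    public void add(int x) {
        if (tree != null) { tree.add(x); return; } // already treeified: O(log n) insert
        list.add(x);
        if (list.size() >= TREEIFY_THRESHOLD) {    // list got long: upgrade it to a tree
            tree = new TreeSet<>(list);
            list = null;
        }
    }

    public boolean contains(int x) {
        return tree != null ? tree.contains(x)     // O(log n) in the tree
                            : list.contains(x);    // O(n) while it is still a short list
    }

    public static void main(String[] args) {
        TreeifyingBucket bucket = new TreeifyingBucket();
        for (int x = 4; x <= 100; x += 8) bucket.add(x);   // lots of keys that all collide
        System.out.println(bucket.contains(92));           // true, answered by the tree
    }
}
```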
So, this is the end of it?
You’re overthinking it, NM. You have to move half the elements every time you expand, okay? Is that really okay?
Programmers are too hard, this time after 12127 brainstorming, finally came up with a new thing – consistency Hash.
Consistency of the Hash
Consistent hashing is more commonly used in distributed systems. For example, suppose a Redis cluster has four nodes deployed; we map all hash values onto a ring from 0 to 2^32, with roughly a quarter of the elements landing on each node.
This is just an example; the real Redis cluster works on a similar idea, though the actual numbers are different.
Now suppose you need to add a node, say node5, between node3 and node4. You only need to move the elements that now fall between node3 and node5 (previously stored on node4) over to node5; all the other elements stay put.
This makes scaling out much faster, affects relatively few elements, and is nearly invisible to most requests.
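A common way to sketch a consistent-hash ring in Java is a sorted map from ring position to node; the version below is my own simplified illustration (real implementations add virtual nodes per physical node and hash keys with something like CRC or MD5):

```java
import java.util.Map;
import java.util.TreeMap;

public class ConsistentHashRing {
    // Ring position -> node name; TreeMap keeps positions sorted so we can find the next node clockwise.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addNode(String node, long position) {
        ring.put(position, node);
    }

    public String nodeFor(long keyHash) {
        // First node at or after the key's position; wrap to the start of the ring if there is none.
        Map.Entry<Long, String> entry = ring.ceilingEntry(keyHash);
        return entry != null ? entry.getValue() : ring.firstEntry().getValue();
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing();
        long quarter = (1L << 32) / 4;                       // 0 .. 2^32 split across four nodes
        ring.addNode("node1", 1 * quarter);
        ring.addNode("node2", 2 * quarter);
        ring.addNode("node3", 3 * quarter);
        ring.addNode("node4", 4 * quarter);
        System.out.println(ring.nodeFor(3L * quarter + 42)); // node4
        // Adding node5 between node3 and node4 only re-homes keys in (node3, node5]; the rest stay put.
        ring.addNode("node5", 3L * quarter + quarter / 2);
        System.out.println(ring.nodeFor(3L * quarter + 42)); // now node5
    }
}
```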
Conclusion
How was it? Wasn’t it wonderful?
Want to pick up more programming know-how? Come follow my public account "Tong Elder brother read source code"!