preface
A HashMap is a very common data structure that is examined in both jobs and interviews.
For example, the optimal solution of a variant of Two Sum in Leetcode 1 requires a HashMap, and the LRU Cache requires a LinkedHashMap.
HashMap is simple to use and the underlying implementation isn’t complicated, so let’s look at some common interview questions. If you don’t know, read this article. This article will take you to the ancestors of HashMap and guarantee you an interview
- What’s the difference between equals and equals?
- Why does overriding equals() require overriding hashCode()?
- The differences and connections between Hashtable, HashSet and HashMap
- What are the methods for handling hash conflicts? Which one is in Java? Why is that? Have you used the other method in your work? Under what circumstances is it used more?
- Implement a HashMap with your bare hands
This paper is divided into the following chapters:
- Introduction to the Set and Map families
- Implementation principle of HashMap
- Hashing conflict details
- HashMap basic operations
- Problem sets Two Sum
- Problem sets LRU
The family of the Set
Before we talk about Map, let’s look at Set.
The concept of set, which we learned in junior high school mathematics, is that there can’t be repeated elements in it, and it’s the same here.
Set is an interface in Java, and as you can see it is a collection framework class in the java.util package.
There are three commonly used ones:
HashSet: Uses the key of Hashmap to store elements. The main feature is unordered, and the basic operations are O(1) time complexity, fast.
LinkedHashSet: This is a HashSet + LinkedList structure that has O(1) time complexity while preserving insertion order.
TreeSet: adopts red-black tree structure, which can be ordered by natural sorting or custom comparator. The disadvantage is that the query is not as fast as HashSet.
The family of the Map
A Map is a key-value pair in which keys cannot be repeated because the keys in a set are there.
So Map also has these three implementation classes corresponding to Set:
HashMap: Corresponds to HashSet, also unordered, O(1).
LinkedHashMap: This is a “HashMap + bidirectly-linked list” structure that ends on a HashMap, so it has all the features of a HashMap and order.
TreeMap: is ordered, essentially implemented by binary search trees.
Implementation principle of HashMap
For each key in the HashMap, a hash function is used to compute a hash value that represents the number in the buckets, which is actually an array. So you take the length of the array on this number module and you get its index in the array, and you put it in the array.
So here are a few questions:
If different elements compute the same hash value, how do we store it?
A: This is a hash collision, where multiple keys correspond to the same bucket.
How is element uniqueness guaranteed in a HashMap? Can the same element get different hashes?
A: Element uniqueness is guaranteed through hashCode() and equals() methods.
If there are too many pairs and buckets too few, how can they break?
A: Rehasing. If there are too many collisions, the array will be doubled (default). In this case, the hash value is the same, but the length of the array is changed, so the calculated index is changed, and it will be allocated to different positions. We will see you later
So when is rehashing? How do you measure if the bucket is crowded enough to expand?
A: The load factor is the number of pairs divided by the number of buckets. The default in Java is 0.75F, beyond which rehashing occurs.
About hashCode() and equals()
If the key hashCode() values are the same, then it is possible that a Hash collision has occurred, or that it has actually encountered another version of itself. So how do you tell? Continue to compare with equals().
In other words,
HashCode () determines the number of keys in the bucket, which is the index in the array;
Equals () is used to compare whether two objects are the same.
So how to answer this classic interview question:
Why override equals(), but must override hashCode()?
Answer: First we assume that the hashcodes of any two objects are different.
If you do not rewrite hashCode(), you will have different hashes, and then you will not be able to recognize each other, thus contradicting equals().
Scatter flowers ~ ~ 🎉🎉🎉
Let’s take a closer look at these two methods:
The hashCode() and equals() methods are both defined in Object Class. Object is the ancestor of all Java classes.
Since it’s free, let’s take a look at what’s in the gift bag. Oracle files for Google Object:
So these methods can be used directly ~
Returning to hashCode() and equals(), if the new class does not override the two methods, it inherits the Object class definition by default.
So let’s click in and see how equals() is defined:
Notes:
The equals() method compares whether these two references refer to the same object.
Huh??????? Are you kidding me? Isn’t that the same as ==?
For primitive type == = is a symbol for comparing values. In the case of reference Type, the comparison is made to see if both references refer to the same object.
There are only 8 Primitive types for Java: byte, short, int, long, float, double, char, Boolean. Everything else is Reference type. So although Java claims “Everything is object”, there are still non-object data types.
I don’t believe it. I’ll go to the source code to see how it works.
Equals () ==
So why do we do this?
A: To make you override ~
For example, when we compare strings, we want to compare the contents of the two strings. Then:
Str1 = "tianxiaoqi";str2 = newString (" tianxiaoqi ");
str1 == str2; // return false
str1.equals(str2); // return true
Copy the code
Because String overrides equals() :
Ancestors left you is to let you use, if you do not use, that other people also provided the default method, is also enough meaning.
Ok, let’s go back to the introduction of hashCode() :
What hashCode() returns is not relevant to this article, but you can refer to this article [1].
It is not necessarily the object’s (virtual) memory address that is returned, depending on the runtime library and JVM implementation.
However, no matter how it is implemented, it follows the documented convention that a unique hash value is returned for different objects.
Hashing conflict details
Generally speaking, there are two kinds of solutions to hash conflicts [2]
- Separate chaining
- Open addressing
Java uses the first Separate chaining, which is to add a “chain” to the end of the bucket to store the collision. Different versions of the “chain” use different data structures:
In JDK1.6 and 1.7, linked lists are stored so that if there are a lot of collisions, it becomes a lookup on the list. Worst case is O(n);
It was optimized in JDK 1.8 to store lists with red-black trees when they are large (over 8), which greatly improves lookups.
(In other words, this is really like to test, has been asked in many interviews, and the interviewer asked why is more than “8” to use red black tree 🤔)
The second method, Open Addressing, is also an important idea because in real distributed systems, there are many places where hash is used but seprate chaining is inappropriate.
This method is sequential search, if the bucket is already occupied, then “somehow” continue to find the next bucket that is not occupied until the first empty bucket is found.
As shown in the figure, John Smith and Sandra Dee have hash conflicts, both of which are calculated to bucket 152, so Sandra goes to the next empty bucket 153, which also affects the following key: Ted Baker calculated that it should have been placed at 153, but since Sandra had already taken it, we had to go to the next slot, so 154.
This is called the Linear probing, and you look for the next slot one by one, as shown in the picture above. There are other ways, like finding squares, or Double hashing.
HashMap basic operations
The basic operation of every data structure is nothing more than add, delete, change, and query.
- Add: put(K key, V value)
- Delete: remove (Object key)
- Put (K key, V value)
- Get (Object key)/containsKey(Object key)
Some keys are of type Object and some are of type K. This is not because of equals()…
This is because you don’t have to use the same object when you do get/remove.
Remember that str1 and STR2 were both Tian xiaoqi? For example, if I put(str1, value) and then use get(str2), I want to get the corresponding value of tianxiaoqi. Just because I changed my clothes doesn’t mean you don’t recognize me! Therefore, there is no limit to the type of key in get/remove, which is convenient for the other user to recognize it.
In fact, the operation process of these apis is much the same, let’s use the most complex PUT (K key, V value) :
- The first step is to get the index of the location in the array
- So how do we find index? Well, we can do that with getIndex() method alone;
- How do I do that? I’m going to compute the value from the hash function, the length of the array on the module;
- So with the Node at this location, we’ll traverse the LinkedList, that’s what we’re doing on the LinkedList,
- If you find it, update the value;
- If you don’t find it, you put it on the list, either at the head or at the end, but I like to put it on the head, because it’s more likely to be used, but it doesn’t affect the time complexity.
The code is as follows:
public V put(K key, V value) {
int index = getIndex(key);
Node<K, V> node = array[index];
Node<K, V> head = node;
while(node ! =null) {
// There is a key, just update the value if (checkEquals(key, node)) { V preValue = node.value; node.value = value; return preValue; } node = node.next; } // Add a node instead of a key Node<K, V> newNode = new Node(key, value); newNode.next = head; array[index] = newNode; return null; } Copy the code
For more details like rehashing and load factor, check out the source code.
Read the source code we can do Leetcode 706 exercises, so easy~
Differences from Hashtable
This is an age exposed post. HashMap is related to Hashtable, just as ArrayList is related to Vector, and StringBuilder is related to StringBuffer.
Hashtable is an interface provided by earlier JDKS, and HashMap is a new version; The most significant difference between them is that Hashtable is thread-safe, whereas HashMap is not thread-safe.
This is because after Java 5.0, data structures are allowed to be thread-safe. In practice, we found that locking at the level of data structures is not necessary. Locking and releasing locks are expensive in the system, and internal locking can sometimes become a bottleneck in programs.
So HashMap, ArrayList, StringBuilder no longer cares about thread safety, performance is much improved, and of course, thread safety is shifted to us programmers.
Another difference is that a HashMap allows null values in keys, whereas a Hashtable does not. This has the advantage of giving a default value.
All right, let’s do one more problem at the end.
Top K problem
Very often test the Top K question, is also a big factory interview in the rules of the question, the two questions are similar, here to the first subject for example.
Given a group of words, count k words with the highest frequency. For example, in “I love leetcode, I love coding”, the two most frequent items are “I” and “love”.
Some students think this is very easy, but in fact, this is only the mother question, it can be advanced to the system design level to ask:
On an e-commerce site, the most k items sold in the past hour.
Let’s look at the algorithm level first:
Ideas:
Count the frequency of all the words, and take the highest k words in order of frequency.
Details:
Use HashMap to store the frequency of words, using minHeap/maxHeap to fetch the first k.
Implementation:
- To build a
HashMap <key = word, value = frequency of occurrence >
, iterating through the array, incrementing the number of occurrences of the word by 1.
This step is order n.
- Use the minHeap size = k to store the results, and define the comparison order specified in the problem a. a. First, sort by frequency of occurrence; B. Same frequency, in alphabetical order.
- Iterate through the map and add if the number of words in a. minHeap is less than K; B. Or replace it with a more frequent word.
Temporal and spatial complexity analysis:
The first step is order n, and the third step is nlogk, so it adds up to order N logk.
With an extra heap and map, the space complexity is O(n).
Code:
class Solution {
public List<String> topKFrequent(String[] words, int k) {
// Step 1
Map<String, Integer> map = new HashMap<>();
for (String word : words) {
Integer count = map.getOrDefault(word, 0); count++; map.put(word, count); } // Step 2 PriorityQueue<Map.Entry<String, Integer>> minHeap = new PriorityQueue<>(k+1.new Comparator<Map.Entry<String, Integer>>() { @Override public int compare(Map.Entry<String, Integer> e1, Map.Entry<String, Integer> e2) { if(e1.getValue() == e2.getValue()) { return e2.getKey().compareTo(e1.getKey()); } return e1.getValue().compareTo(e2.getValue()); } }); // Step 3 List<String> res = new ArrayList<>(); for(Map.Entry<String, Integer> entry : map.entrySet()) { minHeap.offer(entry); if(minHeap.size() > k) { minHeap.poll(); } } while(! minHeap.isEmpty()) { res.add(minHeap.poll().getKey()); } Collections.reverse(res); return res; } } Copy the code
LRU Cache
This is a very popular question for both domestic and North American interviews.
You can refer to the previous article “LRU Cache takes you through the Nature of interviews”.
That’s all for this article. If you feel good, please remember to bookmark and forward. Also want to see more data structures and algorithms with me friends, remember to pay attention to me, I am Tian Xiaoqi, Java is such a thing.
The resources
[1]
HashCode () refer to the article: https://blog.csdn.net/xusiwei1236/article/details/45152201
[2]
Hash conflict wiki: https://en.wikipedia.org/wiki/Hash_table