The first K high-frequency elements, which is a very representative problem, are also applied in many scenarios in real life, such as real-time hot information statistics in Weibo every day.
Let’s look at the problem first:
Given an array of integers, nums, and k, return the elements with the highest frequency of k. You can return the answers in any order.
Example 1:
Input: nums = [1,1,1,2,2,3], k = 2 output: [1,2]
Advanced: The time complexity of your algorithm must be better than O(n log n), where n is the size of the array.
So this is going to take a lot more time than nlogn, so you can get rid of quicksort, merge, and so on.
And the time complexity of logn is very quickly associated with binary, binary search trees and heap. The first two think about it this scenario doesn’t work.
Use the heap
So let’s figure out can the heap be less than order nlogn? The answer is yes, the heap can wait for the result in order of nlogk (k<n).
Bucket sort
So what else can we do about it besides the heap? Of course there is, and that’s bucket sort.
Train of thought
Let’s see why we can do both heap and bucket sort. My derivation is as follows, you can take it as a reference, if you have a good plan, you are welcome to discuss with me.
To count the first k high-frequency elements, we must traverse the entire array at least once. And we immediately thought of Map to keep track of the number of occurrences of each element, which we did in the allophanumeric problem.
Then you need to sort the values in the Map, and then the first k elements from the largest to the smallest are the results. The difference in the solution is in the implementation of sorting.
You can have various sorts, quicksort, merge, insert, bucket, bubble, hill, heap sort, etc., and the time complexity of the various sorts determines the final time complexity.
And then since the only things that are less than the advanced order of nlogn in each sort are O of nlogk for heap sort and O of n for bucket sort.
The specific implementation
The heap
Method a
int[] topK = new int[k]; Map<Integer, Integer> countMap = new HashMap<>(); PriorityQueue<int[]> PriQueue = new PriorityQueue<>((O1, O2)->(O2 [1]-o1[1])); for (int num : nums) { countMap.put(num, countMap.getOrDefault(num, 0) + 1); } for (Map.Entry<Integer, Integer> entry : countMap.entrySet()) { int value = entry.getKey(), count = entry.getValue(); priQueue.offer(new int[]{value, count}); } for (int i = 0; i < k; i++) { topK[i] = priQueue.poll()[0]; } return topK;
This is a mistake that I wrote at the beginning. It should be a mistake that beginners tend to make. The time complexity of this solution is order nlogn, not order nlogk, and I’ll show you why in the next solution. PriorityQueue’s generic type is int[]. There is no need to create a new structure for PriorityQueue. Instead, use the existing Map.entry
.
Method 2
int[] topK = new int[k]; Map<Integer, Integer> countMap = new HashMap<>(); PriorityQueue<int[]> PriQueue = new PriorityQueue<>((o1, O2)->(o1[1] -O2 [1])); for (int num : nums) { countMap.put(num, countMap.getOrDefault(num, 0) + 1); } for (Map.Entry<Integer, Integer> entry : countMap.entrySet()) { int value = entry.getKey(), count = entry.getValue(); If (priqueue.size () < k) {priqueue.offer (new Int []{value, count}); } else { if (priQueue.peek()[1] < count) { priQueue.poll(); priQueue.offer(new int[]{value, count}); } } } for (int i = 0; i < k; i++) { topK[i] = priQueue.poll()[0]; } return topK;
The difference between method one and method one is that method one builds a large top heap and puts all n elements into the heap priorityQueue, so that the time complexity of the whole heap is O(nlogn), while method two builds a small top heap and keeps only k elements in the heap. If the number of elements is greater than k, the top elements will be removed from the heap before other elements are added to the heap.
How do you decide whether to use a large top heap or a small top heap in order of nlogk? The way I remember it is, if you want to get the top k, you build a small top heap, and you keep adding more than the top of the heap into the small top heap, and you end up with the largest number of k left in the heap; And vice versa.
Method three
if (nums.length == 0 || nums.length < k) {
return new int[]{};
}
Map<Integer, Integer> numsMap = new HashMap<>();
for (int num : nums) {
numsMap.put(num, numsMap.getOrDefault(num, 0) + 1);
}
PriorityQueue<Map.Entry<Integer, Integer>> pri = new PriorityQueue<>((o1, o2) -> o1.getValue() - o2.getValue());
for (Map.Entry<Integer, Integer> num : numsMap.entrySet()) {
if (pri.size() < k) {
pri.offer(num);
} else if (pri.peek().getValue() < num.getValue()) {
pri.poll();
pri.offer(num);
}
}
int[] res = new int[k];
for (int i = 0; i < k; i++) {
res[i] = pri.poll().getKey();
}
return res;
Method three is Map.Entry
implementation, this code will be much simpler, much better readability.
What is the difference between method 3 and method 4?
I first saw and thought for a while before I found the implementation idea.
Method four
Map<Integer, Integer> map = new HashMap(); for(int n : nums) { int freq = map.getOrDefault(n, 0) + 1; map.put(n, freq); } // compare the entries of the map directly so that there are multiple elements, Queue< map.entry <Integer,Integer>> heap = new PriorityQueue<>((a, b) -> a.getValue() -b.getValue ())); / * * * here is also a small root stacks, maintenance and normal here but a little difference, the normal way of thinking is to be able to determine whether elements after k element with larger than the roof, and then out of the heap heap, * implement a slightly different, here is the first pile, and then in a pile of pile top elements, the time complexity is O (nlog (k + 1)), */ for(Map.entry <Integer,Integer> Entry: Map.entrySet ()) {heap.offer(Entry); if(heap.size() > k) { heap.poll(); } } int[] res = new int[k]; for(int i = 0; i < k; i++) { res[i] = heap.poll().getKey(); } return res;
Strictly speaking, the time complexity here is O(nlog(k+1)), because there will be k+1 elements in the heap, and when the element is detected larger than k, the smallest element at the top of the heap will be taken out of the heap and put into the heap. This avoids the comparison process, but the time complexity is a little higher, and this solution can also be used when the performance requirements are not so strict.
Bucket sort
The idea of bucket sort is to take as many buckets as 1-nums.length, and then traverse the whole Map, putting the Entry of the Map into the bucket of value. At this time, there may be multiple elements in a bucket, and then remove the elements from the largest to the smallest.
So what I was thinking at the beginning was a simple way of replacing a bucket with an array, but that’s not going to work, because there might be multiple elements in a bucket, and there might be overwriting, and then counting sort scenarios where you can replace a bucket with an array.
Map<Integer, Integer> countMap = new HashMap<>(); for (int num : nums) { countMap.put(num, countMap.getOrDefault(num, 0) + 1); } // You can't write a list of numbers in an array, because there will be multiple numbers in the array, and the list will be overwritten. List<Integer>[] tong = new arrayList [nums.length + 1]; // ArrayList[nums.length + 1]; for (Map.Entry<Integer, Integer> num : countMap.entrySet()) { if (tong[num.getValue()]==null) tong[num.getValue()] = new ArrayList<>(); tong[num.getValue()].add(num.getKey()); } int[] res = new int[k]; int index = 0; Int I = nums.length; int I = nums.length; int I = nums.length; int I = nums.length; int I = nums.length; i >= 0; i--) { if (tong[i] == null) continue; for (int n : tong[i]) { res[index++] = n; if (index == k) return res; } } return res;
Write in the last
When we do problems, don’t just try to solve them, but try out all the possible solutions, and then compare the pros and cons, and see if we can optimize some of the slow solutions to improve their efficiency.
Sometimes you have increased the efficiency of an algorithm tenfold or even dozens of times. The sense of achievement is unprecedented!
Enjoy the process and you’ll write less and less bad-smelling code later on.