The first K high-frequency elements, this is a very representative question, and there are many application scenarios in real life, such as daily statistics of real-time hot information in weibo.
Let’s look at the problem:
Given an integer array nums and an integer k, return the element with the highest frequency k. You can return the answers in any order.
Example 1:
Input: nums = [1,1,1,2, 3], k = 2 output: [1,2]
Advanced: The time complexity of your algorithm must be better than O(n log n), where n is the size of the array.
This problem requires more time than O(nlogn), so the general quicksort, merge and so on can be discarded.
For logn, we can think of binary, binary search trees and heaps very quickly. The first two on second thought don’t apply to this scenario.
Use the heap
So can the heap be less than O(nlogn)? The answer is yes, the heap can wait for the result in O(nlogk)(k<n) time complexity.
Bucket sort
Is there any other way we can do it besides heap? Of course there is, and that’s bucket sort.
Train of thought
Let’s figure out why we can use heap and bucket sort to get the answer. My derivation process is as follows, you can take it as a reference, if there is a good plan, welcome to discuss with me.
To count the first k high-frequency elements, we must traverse the entire array at least once. We quickly came up with the idea of a Map to record the number of occurrences of each element, as we did in the problem of letter heterotopic words.
Then we need to sort the values in the Map. After sorting, the first K elements from the largest to the smallest are the results. The solution difference is in the implementation of sorting.
You can have various sorts, quicksort, merge, insert, bucket, bubble, hill, heap sort, etc., and the time complexity of each sort determines the final time complexity.
And since the only ones that are less than order nlogn in each sort are order N logk for heap sort and order n for bucket sort.
The specific implementation
The heap
Method a
int[] topK = new int[k];
Map<Integer, Integer> countMap = new HashMap<>();
// The first is the array value, and the second is the number of occurrences
PriorityQueue<int[]> priQueue = new PriorityQueue<>((o1,o2)->(o2[1]-o1[1]));
for (int num : nums) {
countMap.put(num, countMap.getOrDefault(num, 0) + 1);
}
for (Map.Entry<Integer, Integer> entry : countMap.entrySet()) {
int value = entry.getKey(), count = entry.getValue();
priQueue.offer(new int[]{value, count});
}
for (int i = 0; i < k; i++) {
topK[i] = priQueue.poll()[0];
}
return topK;
Copy the code
This is a mistake I wrote at the beginning of the way to write, should be beginners more prone to make mistakes. The time complexity of this solution is O(nlogn), not O(nlogk), and I’ll explain why in the next solution. In addition, the PriorityQueue uses an int[], which is a bit inelegant. There is no need to create a new structure to record the queue, just use the existing map. Entry
.
Method 2
int[] topK = new int[k];
Map<Integer, Integer> countMap = new HashMap<>();
// The first is the array value, and the second is the number of occurrences
PriorityQueue<int[]> priQueue = new PriorityQueue<>((o1,o2)->(o1[1]-o2[1]));
for (int num : nums) {
countMap.put(num, countMap.getOrDefault(num, 0) + 1);
}
for (Map.Entry<Integer, Integer> entry : countMap.entrySet()) {
int value = entry.getKey(), count = entry.getValue();
// Always keep k elements in the heap so that logk is faster
if (priQueue.size() < k) {
priQueue.offer(new int[]{value, count});
} else {
if (priQueue.peek()[1] < count) {
priQueue.poll();
priQueue.offer(new int[]{value, count}); }}}for (int i = 0; i < k; i++) {
topK[i] = priQueue.poll()[0];
}
return topK;
Copy the code
The difference between method 2 and method 1 is that method 1 builds the big top heap and loads all the n elements into the PriorityQueue, so the time to build the whole heap is O(nlogn), whereas method 2 builds the small top heap and keeps the number of elements in the heap to k, and if the number of elements in the heap is greater than K, the top element is removed from the heap before the other elements are added to the heap.
How do you decide whether to use a big top heap or a small top heap for order nlogk? The way I remember it is, if you want to get the first k, you build the small top heap, and you keep putting things that are bigger than the top of the heap into the small top heap, and you end up with the largest number of k left in the heap; And vice versa.
Method three
if (nums.length == 0 || nums.length < k) {
return new int[] {}; } Map<Integer, Integer> numsMap =new HashMap<>();
for (int num : nums) {
numsMap.put(num, numsMap.getOrDefault(num, 0) + 1);
}
PriorityQueue<Map.Entry<Integer, Integer>> pri = new PriorityQueue<>((o1, o2) -> o1.getValue() - o2.getValue());
for (Map.Entry<Integer, Integer> num : numsMap.entrySet()) {
if (pri.size() < k) {
pri.offer(num);
} else if(pri.peek().getValue() < num.getValue()) { pri.poll(); pri.offer(num); }}int[] res = new int[k];
for (int i = 0; i < k; i++) {
res[i] = pri.poll().getKey();
}
return res;
Copy the code
Method three is map. Entry
implementation, so the code will be much simpler, much more readable.
The following four is from other places to see the implementation method, see and method 3 is different?
I saw it at the beginning and thought for a while before I found the implementation idea.
Method four
Map<Integer, Integer> map = new HashMap();
for(int n : nums) {
int freq = map.getOrDefault(n, 0) + 1;
map.put(n, freq);
}
// Compare the map entries directly, so that there can be multiple elements, without creating new arrays or entity classes
Queue<Map.Entry<Integer,Integer>> heap = new PriorityQueue<>((a, b) -> a.getValue() - b.getValue());
/** * maintain a small root of the heap, but there is a difference between the normal way of doing this, and the normal way of doing this is to check whether the element is larger than the top of the heap after k elements are full. * here is a different way of doing this: first enter the heap, and then exit the top of the heap, which is O(nlog(k+1)). Theoretically, it should be slower than order nlogk, */
for(Map.Entry<Integer,Integer> entry: map.entrySet()) {
heap.offer(entry);
if(heap.size() > k) { heap.poll(); }}int[] res = new int[k];
for(int i = 0; i < k; i++) {
res[i] = heap.poll().getKey();
}
return res;
Copy the code
The time complexity here is strictly O(nlog(k+1)), because there will be k+1 elements in the heap, and when the element is detected to be greater than k, the smallest element at the top of the heap will be removed from the heap and then added to the heap. This avoids the comparison process, but the time complexity is higher, and it can also be used when the performance requirements are not so strict.
Bucket sort
The idea of sorting buckets is to use 1-nums.length, and then traverse the entire Map, put the Map Entry into the bucket value. In this case, a bucket may have multiple elements, and then extract the elements from the largest to the smallest.
So what I started with was a simple idea of replacing a bucket with an array, and that’s not going to work, because buckets might have multiple elements, they might be overwritten, and then we can replace buckets with arrays in counting sort scenarios.
Map<Integer, Integer> countMap = new HashMap<>();
for (int num : nums) {
countMap.put(num, countMap.getOrDefault(num, 0) + 1);
}
// Can not directly use the array record, because there will be more than one number of times, such array will be overwritten
// So just put a list at each array location.
// The number of elements in the bucket is multiple
// A bucket contains all the numbers that occur in the bucket
List<Integer>[] tong = new ArrayList[nums.length + 1];
for (Map.Entry<Integer, Integer> num : countMap.entrySet()) {
if (tong[num.getValue()]==null) tong[num.getValue()] = new ArrayList<>();
tong[num.getValue()].add(num.getKey());
}
int[] res = new int[k];
int index = 0;
// Because the array element is assigned to nums.length, it is nums.length instead of nums.length-1
for (int i = nums.length; i >= 0; i--) {
if (tong[i] == null) continue;
for (int n : tong[i]) {
res[index++] = n;
if (index == k) returnres; }}return res;
Copy the code
Write in the last
When we are doing the problem, we should not only pursue to do it, but also try all possible solutions, and then compare the advantages and disadvantages. For some slow solutions, see if we can optimize and improve their efficiency.
Sometimes you can increase the efficiency of an algorithm by ten times or even tens of times, and the sense of accomplishment is unprecedented!
Enjoy the process, and you’ll write less “bad” code later on.