Originally published on the public account Bigsai; please attach a link to this article when reprinting.

preface

Hello everyone, I am Bigsai. Long time no see, I've missed you all 🤩!

Today I'm going to share the TopK problem with you. I won't consider huge distributed-scale solutions here; this is the common algorithm-problem version.

First of all, what is a topK problem?

The TopK problem asks for the k largest (or smallest) numbers in a sequence. Its solution is essentially the same as finding the kth largest (or smallest) element.

The TopK question is a very classic one that comes up very, very often in both written tests and interviews (no lie). Below, starting from a beginner's perspective, we treat TopK as the problem of finding the top k elements and learn it together!

Since the TopK problem and the kth-largest problem are solved in almost the same way, I will use LeetCode 215, Kth Largest Element in an Array, as the demonstration problem. Before learning TopK, you should also know the top ten sorting algorithms every programmer must master.

The sorting method

Find TopK and sort TopK

What, you want me to find the TopK? Not just find it: however many you want, I'll give you that many, and in sorted order too. Now, which sort am I most familiar with?

If bubble sort, at O(n^2), is what first comes to mind, you're being careless.

Even if we use an O(n^2)-level sorting algorithm, it can be optimized: in bubble sort and simple selection sort, each pass fixes one maximum (or minimum) value, so we do not need to sort all the data, only run k passes. The time complexity then drops to O(nk).

Here’s a review of the difference between bubble sort and simple selection sort:

Bubble sort and simple selection sort both make multiple passes, and each pass fixes one maximum or minimum. The difference: bubble sort only compares each element with its right neighbor during the pass, swapping if it is larger; simple selection sort instead records the position of the maximum (or minimum) during the pass and then swaps it with the last position of that pass (the enumerated range shrinks each pass).

Here’s a diagram of the process:

The code below also implements simple selection. The figure above selects the smallest each time; selecting the largest each time works the same way.

// Swap two positional elements in an array
private void swap(int[] arr, int i, int j) {
  int temp = arr[i];
  arr[i] = arr[j];
  arr[j] = temp;
}
// Bubble sort implementation
public int findKthLargest1(int[] nums, int k) {
  for (int i = nums.length - 1; i >= nums.length - k; i--) { // only k passes are needed
    for (int j = 0; j < i; j++) {
      if (nums[j] > nums[j + 1]) { // compare with the right neighbor
        swap(nums, j, j + 1);
      }
    }
  }
  return nums[nums.length - k];
}
// Simple selection sort implementation
public int findKthLargest2(int[] nums, int k) {
  for (int i = 0; i < k; i++) { // only k passes are needed
    int max = i; // position of the current maximum
    for (int j = i + 1; j < nums.length; j++) {
      if (nums[j] > nums[max]) {
        max = j; // update the maximum position
      }
    }
    if (max != i) {
      swap(nums, i, max); // swap into the ith position
    }
  }
  return nums[k - 1];
}

Of course, you can also use quicksort, merge sort, or even heap sort, all O(nlogn): sort all the data and then return the result. I won't go into those here; you can either call the library API or write them by hand.

Comparing the two approaches: except when k is very small, where O(nk) wins, O(nlogn) is actually faster in most cases. Still, going from O(n^2) to O(nk) teaches us something.
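As a baseline, here is a minimal sketch of the "just sort everything" approach using Java's built-in Arrays.sort (the class name SortBaseline is mine; the method signature mirrors LeetCode 215):

```java
import java.util.Arrays;

public class SortBaseline {
    // Baseline: sort the whole array in O(n log n), then read the answer off the end.
    public static int findKthLargest(int[] nums, int k) {
        Arrays.sort(nums);            // ascending order
        return nums[nums.length - k]; // kth largest counted from the end
    }

    public static void main(String[] args) {
        System.out.println(findKthLargest(new int[]{3, 2, 1, 5, 6, 4}, 2)); // prints 5
    }
}
```

For the problem's constraint of n up to 10^4 this is usually fast enough in practice; the approaches below only win asymptotically.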

Heap-based optimization

I've written about priority queues and heap sort in the past, so I won't repeat them here; you can take a look:

Priority queues don’t know, look at heap sort

Hardcore, handwritten priority queue

Full heap sort is O(nlogn), but for TopK a heap can do better than that.

The heap data structure comes in two flavors: the min-heap, where each parent's value is less than its children's, and the max-heap, where each parent's value is greater than its children's. Here we definitely want a max-heap.

A heap looks like a tree structure, but since a heap is a complete binary tree, we can store it very efficiently in an array: parent and child nodes are found directly by index arithmetic. So we implement the heap with an array; each time, the extracted element moves to the end of the array, and the remaining elements form a new heap and we continue.

Heap sort has two phases: building the heap from the unordered numbers, and repeatedly extracting the top of the heap. Building the heap takes O(n). After that, each step takes the top element, moves the last element to the top, and sifts it down to restore the heap, costing O(logn) per step; doing this n times gives O(nlogn). But we only need k steps, so extracting the top k costs O(klogn), and the whole TopK takes O(n + klogn).

I've drawn a picture to help you understand: doing this twice gives the Top2, and doing it k times gives the TopK.

The implementation code is:

class Solution {
    private void swap(int[] arr, int i, int j) {
        int temp = arr[i];
        arr[i] = arr[j];
        arr[j] = temp;
    }
    // Sift the node at index down so its subtree becomes a valid max-heap
    public void shiftDown(int arr[], int index, int len) {
        int leftchild = index * 2 + 1;  // left child
        int rightchild = index * 2 + 2; // right child
        if (leftchild >= len) {
            return;
        } else if (rightchild < len && arr[rightchild] > arr[index] && arr[rightchild] > arr[leftchild]) {
            // the right child is in range and is the largest: swap it up
            swap(arr, index, rightchild);
            shiftDown(arr, rightchild, len); // the child's subtree may now violate the heap, keep sifting down
        } else if (arr[leftchild] > arr[index]) {
            // the left child is the largest: swap it up
            swap(arr, index, leftchild);
            shiftDown(arr, leftchild, len);
        }
    }
    // Build a max-heap from the array
    public void creatHeap(int arr[]) {
        for (int i = arr.length / 2; i >= 0; i--) {
            shiftDown(arr, i, arr.length);
        }
    }
    public int findKthLargest(int nums[], int k) {
        // step 1: build the heap
        creatHeap(nums);
        // step 2: pop the top of the heap k times, placing it at the end each time
        for (int i = 0; i < k; i++) {
            int team = nums[0];
            nums[0] = nums[nums.length - 1 - i]; // remove the top and move the last element to the top
            nums[nums.length - 1 - i] = team;
            shiftDown(nums, 0, nums.length - i - 1); // restore the max-heap; note the (logical) length shrinks
        }
        return nums[nums.length - k];
    }
}
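A related heap trick, sketched here with Java's built-in PriorityQueue (a min-heap by default; the class name TopKWithPQ is mine): keep a min-heap of size k, so its root is always the kth largest seen so far. This runs in O(nlogk), never needs the whole array at once (handy for streaming data), and does not modify the input.

```java
import java.util.PriorityQueue;

public class TopKWithPQ {
    // Maintain a min-heap of size k; its root is the kth largest element seen so far.
    public static int findKthLargest(int[] nums, int k) {
        PriorityQueue<Integer> minHeap = new PriorityQueue<>(); // min-heap by default
        for (int num : nums) {
            minHeap.offer(num);
            if (minHeap.size() > k) {
                minHeap.poll(); // drop the smallest; the heap keeps only the k largest
            }
        }
        return minHeap.peek();
    }

    public static void main(String[] args) {
        System.out.println(findKthLargest(new int[]{3, 2, 3, 1, 2, 4, 5, 5, 6}, 4)); // prints 4
    }
}
```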

Quicksort-based optimization

Heap sorting can be optimized, but what about fast sorting?

Of course I can. How can something so awesome be without me?

This part requires some understanding of quicksort; I wrote about it long ago in my illustrated hand-written bubble sort and quicksort (with optimizations) post. The core idea of quicksort is partitioning: each round fixes the final position of one number, splitting the array into two parts, everything smaller on its left and everything larger on its right, and then recursing on both parts. Each partition costs O(n) and the average recursion depth is logn, so the average time complexity is O(nlogn).

But what does this have to do with finding TopK?

If we want the TopK, we want the k largest numbers. Suppose a partition round picks 5 as the pivot, leaving 4 numbers to the left of 5 and 4 numbers to its right. Then:

① If k-1 equals the count of numbers to the right of 5, then 5 is exactly the kth largest, and 5 together with everything to its right is the TopK.

② If k-1 is less than the count of numbers to the right of 5, the whole TopK lies to the right of 5; compress the search range to the right part and recursively apply the same method.

③ If k-1 is greater than the count of numbers to the right of 5, then 5 and everything to its right are all in the TopK; compress the search range to the left part and shrink k by (1 + the count on the right). For example, if k equals 7, then 5 together with the 4 numbers to its right (5 numbers in total) is already in the Top7, and we just need to find the Top2 to the left of 5.

Because the range is compressed every round and we never recurse into both sides, the time complexity is not O(nlogn) but O(n) on average (you can find proofs online). However, some test samples contain extreme cases, such as already-sorted input 1, 2, 3, 4, 5, 6... when looking for the Top1, which degrades the partition. In practice you would swap a random element with the first one to guard against such cases (just to pass the problem); I do not add the random swap in the code below, and note that the TopK obtained this way is not sorted.

The detailed logic can be seen in the implementation code:

class Solution {
    public int findKthLargest(int[] nums, int k) {
        quickSort(nums,0,nums.length-1,k);
        return nums[nums.length-k];
    }
    private void quickSort(int[] nums,int start,int end,int k) {
        if(start>end)
            return;
        int left=start;
        int right=end;
        int number=nums[start];
        while (left<right){
            while (number<=nums[right]&&left<right){
                right--;
            }
            nums[left]=nums[right];
            while (number>=nums[left]&&left<right){
                left++;
            }
            nums[right]=nums[left];
        }
        nums[left]=number;
        int num=end-left+1;
        if(num==k)// stop when k is found
            return;
        if(num>k){
            quickSort(nums,left+1,end,k);
        }else {
            quickSort(nums,start,left-1,k-num);
        }
    }
}
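For completeness, here is a sketch of the random-pivot guard mentioned above (the class name RandomizedQuickSelect is mine): before each partition, a random element is swapped into the start position so that already-sorted input no longer degrades to O(n^2).

```java
import java.util.Random;

public class RandomizedQuickSelect {
    private static final Random RAND = new Random();

    public static int findKthLargest(int[] nums, int k) {
        quickSelect(nums, 0, nums.length - 1, k);
        return nums[nums.length - k];
    }

    private static void quickSelect(int[] nums, int start, int end, int k) {
        if (start >= end) return;
        // Random-pivot guard: swap a random element to the front before partitioning
        int r = start + RAND.nextInt(end - start + 1);
        int tmp = nums[start]; nums[start] = nums[r]; nums[r] = tmp;

        // Standard partition: smaller values end up left of the pivot, larger right
        int left = start, right = end, pivot = nums[start];
        while (left < right) {
            while (left < right && nums[right] >= pivot) right--;
            nums[left] = nums[right];
            while (left < right && nums[left] <= pivot) left++;
            nums[right] = nums[left];
        }
        nums[left] = pivot;

        int num = end - left + 1;          // pivot plus everything to its right
        if (num == k) return;              // pivot is exactly the kth largest
        if (num > k) quickSelect(nums, left + 1, end, k);
        else quickSelect(nums, start, left - 1, k - num);
    }

    public static void main(String[] args) {
        // Sorted input, the case that hurts a fixed-pivot version
        System.out.println(findKthLargest(new int[]{1, 2, 3, 4, 5, 6}, 1)); // prints 6
    }
}
```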

Counting sort extras

There is also linear-time sorting, so you might ask: would bucket sort work?

It can, but that depends on the value range: bucket sort suits uniformly dense data, while counting sort wants the value range to be small.

So what's the core idea of the counting-sort approach?

Use counting sort to count the occurrences of each number, then accumulate a suffix-sum array from back to front.

This approach suits the case where there are many numbers but their value range is small.

I didn't plan to write the code, but it's only a few lines, so here it is:

// LeetCode 215
// 1 <= k <= nums.length <= 10^4
// -10^4 <= nums[i] <= 10^4
public int findKthLargest(int nums[], int k) {
  int arr[] = new int[20001];  // counts: index = value + 10000
  int sum[] = new int[20002];  // suffix counts, one extra slot as a sentinel

  for (int num : nums) {
    arr[num + 10000]++;
  }
  for (int i = 20000; i >= 0; i--) { // accumulate from the largest value down
    sum[i] = sum[i + 1] + arr[i];
    if (sum[i] >= k)
      return i - 10000;
  }
  return 0;
}

Conclusion

Well, that’s all for today’s TopK question, and I’m sure you’ll be able to handle it next time.

TopK is not hard; it's just clever use of sorting. Sorting matters a lot and comes up very frequently in interviews.

Let me lay my cards on the table: how might an interviewer steer you toward the TopK question?

The Crafty interviewer:

Well, let's talk about data structures and algorithms. Let's talk about sorting; you've seen that before, right? Tell me the three sorts you're most familiar with and explain how they work.

Humble me:

Blah blah blah blah

If you mention quicksort or bucket sort, they may ask you to solve the TopK problem with that sort; other sorts are possible too, so mastering the top ten sorts is very necessary!

Personal original public account: Bigsai. Welcome to follow; I spent half a year writing an original data structures and algorithms PDF.