Find the NTH largest number in a very large array

preface

This is a big factory interview question, the requirement is to be as fast as possible, so the first time to deny the first sort after taking the subscript, this takes a long time for sure. In the interview I was told that I would do something similar to the median in the data stream: change the size of the two heapsto size N and length-n, and then do everything else as the median does

But the interviewer was not satisfied with my answer and asked me, do you think this question is the same as finding the median? Then he showed me a way to use quicksort to get the NTH largest number without sorting all of it. He didn’t say exactly how, but I forgot about it. (Finally, I got the interviewer’s comment that an algorithm still needs efforts)

The title

In a very large random number group, all the numbers are not repeated, how to find the NTH largest number in the fastest timeCopy the code

thinking

Here is the method that I thought of using two piles in the interview process. After the interview, I started to implement it again. I don’t have to analyze this method much, and the general process is like this:

  • Set up two heaps, one big top heapA, a small top heapB
  • At the top of the small pileBThe length is less thann, puts the current array elements into the big top heapAAnd thenAThe top of the heap element is removed and placedB
  • At the top of the small pileBLength is equal to then, puts the current array element intoBAnd thenBThe top of the heap element is removed and placedA

It turns out that only one small top heap B is enough. There is no need to use two heaps like the median in the data stream.

 // Put the current array element into the heap
 B.insert(num)
 ​
 // When the heap size reaches n, pop the top element of the heap
 if(B.size() == n) {
     B.pop()
 }
Copy the code

On second thought, isn’t this the idea of using heap sort, and then finding the number after sorting, as the interviewer said, is it the same as finding the median? Using the quicksort method, you can get the NTH largest number without sorting the entire array. How to do this will be revealed in the following article

answer

Here’s how quicksort works:

Quick sort is a kind of based on the thought of “divide and conquer” the ranking method of implementation, every time will choose an element as a benchmark elements, compared with the benchmark element to carry on the exchange, is greater than the benchmark aside, put on the other side is less than the benchmark, then array to benchmark where elements segmentation, on both sides of the partition of each of the above operation again, The array is sorted until all the pieces are 1 in length

Here’s a GIF to help you understand:

In the process of this sort, each selected benchmark element (usually choose the current segmentation of the array first) after completed the exchange, in fact already in the sorted array in the right position, so can judge whether the subscript of the base element is equal to n – 1, to get the number n of large Numbers

The specific implementation code is as follows:

 const getNthNum = (arr, n, start = 0, end = arr.length - 1) = > {
     if(start >= end) return arr[start]
 ​
     const pivotIndex = partition(arr, start, end)
 ​
     if(pivotIndex == n - 1) {
         return arr[pivotIndex]
     } else if(pivotIndex > n - 1) {
         // If the index of the current base element is greater than n-1, keep looking to the left
         return getNthNum(arr, n, start, pivotIndex - 1)}else {
         // If the index of the current base element is greater than n-1, continue to look to the right
         return getNthNum(arr, n, pivotIndex + 1, end)
     }
 }
 ​
 const partition = (arr, start, end) = > {
     let pivot = arr[start]
 ​
     // Used to mark boundaries that are currently in the correct order
     let mark = start
 ​
     for(let i = start + 1; i <= end; i++) {
         if(arr[i] > pivot) {
             // mark++ is equivalent to extending the bounds outward
             mark++
             // Then swap the part that belongs outside the boundary with the newly found correct element
             [arr[mark], arr[i]] = [arr[i], arr[mark]]
         }
     }
 ​
     arr[start] = arr[mark]
     arr[mark] = pivot
 ​
     return mark
 }
Copy the code

To verify that this method is actually faster than fetching elements after sorting, I wrote a piece of code for a simple test. Here is the part that automatically generates the test data:

 // Shuffle algorithm, used to generate monotonically increasing arrays out of order
 Array.prototype.shuffle = function() {
     var array = this;
     var m = array.length,
         t, i;
     while (m) {
         i = Math.floor(Math.random() * m--);
         t = array[m];
         array[m] = array[i];
         array[i] = t;
     }
     return array;
 }
 ​
 // Use keys() to fill the 20480 size array with 0 to 20479 strokes
 const array = [...new Array(20480).keys()]
 ​
 array.shuffle()
 ​
 const n = parseInt(Math.random() * 10000)
Copy the code

Array.prototype.sort (array.prototype.sort); array.prototype.sort (array.prototype.sort); So the direct sort subscript here is using JS code to write insert sort, for reference only

The results are as follows:

It can be seen in the array length of 20480 age, insertion sort time consuming and the rest of the gap is very large, and the quick sort is almost twice the heap sort is fast, so you can think quick sort to a more optimal solution (dare not say the optimum solution here, feel there may be better, if you have know bosses also please tell in the comments section)