Interview question: Find the KTH element (topK)

I added algorithm E to the original.

One, the introduction

So this is sort of a Top of K problem. What does that mean? How do I find the largest (K) element in an unordered array? We don’t care about a lot of data here, we can fit it into memory.

Second, common algorithms

Algorithm is A:

Sort the elements of the array in ascending order, find the array subscript k-1 element. This is the easiest way to think about it, if you use a simple sorting algorithm, order n^2.

Method B:

Step 1: Initialize an array of length K, read K elements first, sort the elements in descending order (or ascending order is ok), then the KTH element is the last one.
Step 2: Read the next element and compare it with the sorted KTH element. If it is greater than, delete the current KTH element and place it in the first k-1 correct position (here is a simple insertion sort. If you don’t know insertion sort, you can see the illustration here.
Time complexity: the first step is to use the ordinary sorting algorithm time complexity is O(k^2); Step 2: N minus k times k minus 1 = Nk minus k squared plus k. So the time complexity of algorithm B is O(NK). When k=N/2 (rounded down), the time complexity is still O(N ^2).

In fact, the KTH big problem, you can also do the opposite, that is, the N minus K plus 1 small problem. This is equivalent. So when K is equal to N over 2, which is the hardest part, but it’s also interesting, when K is equal to the median.

Better algorithm

Algorithm C:

The idea: read the data into an array, do a buildHeap on the array (we’re building the big top heap here), and then deleteMax the heap K times, and the KTH time the result is the value we need. (Because it is a big top heap, the data is sorted from large to small, more on heap sorting later).

Now let’s address the question left over from the previous section, why is buildHeap linear? For those unfamiliar with heaps, take a look at the schematic priority queue (heap). Let’s look at the code implementation first.

public PriorityQueue(T[] items) {

       // The number of elements in the heap

       currentSize = items.length;

       // Self-implement declaration

       array = (T[]) new Comparable[currentSize +1];

       int i = 1;

       for (T item : items){

           array[i++] = item;

       }

       buildHeap();

   }



private void buildHeap(a) {

       for (int i = currentSize / 2; i > 0; i--){

           // Heap filter method, refer to the link above

           percolateDown(i);

       }

   }

Copy the code

In the figure, an unordered tree is initialized, and after seven percolate downs, a large top heap is generated. As you can see from the figure, there are 9 dashed lines, each corresponding to 2 comparisons, for a total of 18 comparisons. To determine the buildHeap time bound, we need to count the number of dashed lines, which is the maximum number of dashed lines we can get by adding the height of all the nodes in the heap. The sum is order N.

Theorem: The sum of the heights of an ideal binary tree (full binary tree) containing 2h+1-1 nodes of height h is 2h+1-1- (h+1).

What is a full binary tree? A full binary tree is a completely filled binary tree, with the last layer filled, as shown in the figure. A complete binary tree is filled except for the last layer, and the last layer must be filled from left to right, which is the structure of the heap described in the previous text. A full binary tree is always a complete binary tree, and a complete binary tree is not always a full binary tree.
Prove the theorem:

It’s easy to see that in a full binary tree, at height H, there’s 1 node; Two nodes on height H-1, two nodes on height H-2 and two nodes on height H-i in generalⁱConsists of two nodes.

Multiply both sides of the equation by 2 to get:

Subtract the two equations to get:

So the theorem has to be proved.Because the heap consists of a complete binary tree, the number of nodes in the heap is 2^hAnd 2^h+1Between, so that means that this sum is order N.So buildHeap is linear. So the time complexity of algorithm C is: initialize array O(N), buildHeap O(N), K deleeMax requires O(klogN), soTotal time complexity: O (N + N + klogN) =O(N+klogN).If K is N over 2, the running time is order NlogN..

Algorithm D:

Algorithm idea: We adopt the idea of algorithm B, except that we build a small top heap with K nodes. As long as the next number is larger than the root node, the root node will be deleted and the number will be filtered down. So the KTH greatest number in the algorithm is the number of roots.
Time complexity: buildHeap for K numbers is O(K), in the worst case assuming that all remaining N-k numbers go into the heap for downfiltering, the total is O(K +(n-k)logk). If K is N over 2, you need order NlogN.

Algorithm E: Fast selection

Reference quicksort and its optimization, using the idea of quicksort, make a little change.

Algorithm idea:

If the number of elements in S is 1, then k=1 and return the elements in S. If you are using a small array by the method and | S | < = CUTOFF, will return to the first K S order and the largest element
Take an element V of S and call it pivot;
Divide S-{v} (the elements of S except the pivot elements) into two disjoint sets S₁And S₂, S₁All the elements in this set are less than or equal to the pivot elements v, S₂All elements in are greater than or equal to the pivot element;
If k < = | S₁|, then the first k yuan is largest in S₁In the. At this point, we return quickSelect(S₁, k). If k = 1 + | S₁|, then hub for the yuan is the first largest k yuan, we return to the hub for the yuan directly. Otherwise, the KTH largest element must be in S₂Phi, it’s S₂The first (k – | S₁| – $1) most. We make a recursive call and return quickSelect(S₂K – | S₁| – 1).

Java implementation:

public class QuickSelect {

    / * *

* Cut-off range

* /

    private static final int CUTOFF = 5;



    public static void main(String[] args) {

        Integer[] a = {8.1.4.9.6.3.5.2.7.0.12.11.15.14.13.20.18.19.17.16};

        int k = 5;

        quickSelect(a, a.length - k + 1);

        System.out.println("The first" + k + "The big element is:" + a[a.length - k]);

    }



    public static <T extends Comparable<? super T>> void quickSelect(T[] a, int k) {

        quickSelect(a, 0, a.length - 1, k);

    }



    private static <T extends Comparable<? super T>> void quickSelect(T[] a, int left, int right, int k) {

        if (left + CUTOFF <= right) {

            // Get the hub element by splitting the median of three numbers

            T pivot = median3(a, left, right);



            // Start to split the sequence

            int i = left, j = right - 1;

            for(; ;) {

                while (a[++i].compareTo(pivot) < 0) {

                }

                while (a[--j].compareTo(pivot) > 0) {

                }

                if (i < j) {

                    swapReferences(a, i, j);

                } else {

                    break;

                }

            }

            // Swap the hub element with the element of position I

            swapReferences(a, i, right - 1);



            if (k <= i) {

                quickSelect(a, left, i - 1, k);

            } else if (k > i + 1) {

                quickSelect(a, i + 1, right, k);

            }

        } else {

            insertionSort(a, left, right);

        }

    }



    private static <T extends Comparable<? super T>> T median3(T[] a, int left, int right) {

        int center = (left + right) / 2;

        if (a[center].compareTo(a[left]) < 0) {



            swapReferences(a, left, center);

        }

        if (a[right].compareTo(a[left]) < 0) {

            swapReferences(a, left, right);

        }

        if (a[right].compareTo(a[center]) < 0) {

            swapReferences(a, center, right);

        }

        // Place the hub element at right-1

        swapReferences(a, center, right - 1);

        return a[right - 1];

    }



    public static <T> void swapReferences(T[] a, int index1, int index2) {

        T tmp = a[index1];

        a[index1] = a[index2];

        a[index2] = tmp;

    }



    private static <T extends Comparable<? super T>> void insertionSort(T[] a, int left, int right) {

        for (int p = left + 1; p <= right; p++) {

            T tmp = a[p];

            int j;



            for (j = p; j > left && tmp.compareTo(a[j - 1]) < 0; j--) {

                a[j] = a[j - 1];

            }



            a[j] = tmp;

        }

    }

}



// Output the result

// The fifth element is: 16

Copy the code

Because I’m doing ascending sort here, taking the KTH large is the same thing as taking the N minus K plus 1.

Worst time complexity: As with quicksort, O(N2) is the worst when a subsequence is empty.
Average time complexity: As you can see, quick selection recurses only one subsequence at a time. The average time complexity is O(N).

Four,

In this paper, several solutions to the problem of top(K) are described in detail. The first two are quite ordinary, while the last two are better. It is temporarily given that the median takes O(NlogN) time to solve. We will show later that the average O(N) time complexity can be achieved using a fast selection method that recurses only one subsequence at a time.