The introduction

Today, this article explains an active topic in the interview of various big factories (Tencent, Byte, Ali, etc.) : Top K problem, will be introduced in the following vein:

  • What is the Top K problem
  • Five classical solutions of Top K problem
  • Quick review and summary
  • Top K swordsmen: minimum K number, first K high-frequency elements, KTH minimum element (with solutions)

Let’s get started at 👇

What is Top K problem

What is the Top K problem? In simple terms, it is to find the number with the highest frequency of the first K in a set of data, or the number with the highest frequency of the first K (or of course, the first K can be small).

The classical Top K problems include the largest (small) K number, the first K high-frequency elements, and the KTH largest (small) element

The following takes the KTH largest element in the array as an example to introduce the solution of the Top K problem

Topic:

Finds the KTH largest element in the unsorted array. Notice that you want to find the KTH largest element in the sorted array.

Example:

Input:4.5.1.6.2.7.3.8] and k =4Output:5
Copy the code

Two, solution: global sort, take the KTH number

The simplest thing we can think of is to sort the array (which can be the simplest sort) by taking the first K numbers, so easy

Code implementation:

let findKthLargest = function(nums, k) {
    nums.sort((a, b) = > b - a);
    return nums[k- 1]};Copy the code

Complexity analysis:

  • Time complexity: O(nlogn)
  • Space complexity: O(logn)

Note:

Before version 7.0 of the V8 engine, array.prototype.sort () used insert sort when the Array length was less than 10, and quicksort otherwise.

Quicksort was abandoned after V8 7.0 because it was not a stable sorting algorithm and in the worst case time complexity was degraded to O(n2).

Instead, a hybrid sorting algorithm is used: TimSort.

This algorithm was originally used in Python, and it is not strictly one of the above 10 sorting algorithms, but a hybrid sorting algorithm:

Insert sort is used in small subarrays, and then merge sort is used to merge the ordered subarrays, with O(nlogn) time complexity.

Three, solution: local sorting, bubbling

They just want to figure out the KTH largest element in the array, they don’t have to sort the whole array

You can use bubble sort, bubbling the largest number to the far right only k times

Dynamic graph demonstration

Code implementation

let findKthLargest = function(nums, k) {
    // Do k-round bubble sort
    bubbleSort(nums, k)
    return nums[nums.length-k]
}

let bubbleSort = function(arr, k) {
    for (let i = 0; i < k; i++) {
        // Early exit bubble loop identifier bit
        let flag = false;
        for (let j = 0; j < arr.length - i - 1; j++) {
            if (arr[j] > arr[j + 1]) {
                const temp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
                flag = true;
                // Indicates that a data exchange has occurred}}// There is no data exchange
        if(! flag)break}}Copy the code

Complexity analysis:

  • Time complexity: Best time complexity O(n), average time complexity O(n*k)
  • Space complexity: O(1)

Four, solution: before the structurekMaximum element small top heap, take the top heap

We can also solve this problem by constructing a small top heap of the first k largest elements, where any node value on the small top heap must be less than or equal to the value of its left and right child nodes, i.e. the top of the heap is the minimum.

So we can take k elements out of the array and make a small top heap, then compare the rest of the elements to the small top heap, replace the top if it is larger than the top of the heap, and then heapize. After all the elements are iterated, the top of the heap in the heap is the KTH maximum

The specific steps are as follows:

  • Take the front of the arraykThe number (0 到 k-1Bit), construct a small top heap
  • fromkBits start traversing the number group, each data is compared with the top element of the small top heap, if smaller than the top element, no processing is done, continue traversing the next element; If it is larger than the top of the heap, the element is replaced with the top of the heap, and then heaped into a small top heap.
  • After traversal, the data at the top of the heap is the KTH largest data

Code implementation:

let findKthLargest = function(nums, k) {
    // Build a small top heap by taking the first k numbers from nums
    let heap = [,], i = 0
    while(i < k) {
       heap.push(nums[i++]) 
    }
    buildHeap(heap, k)
    
    // Start with k bits
    for(let i = k; i < nums.length; i++) {
        if(heap[1] < nums[i]) {
            // Replace and heap
            heap[1] = nums[i]
            heapify(heap, k, 1)}}// Return the top of the heap element
    return heap[1]};// Build the heap in situ, from back to front, from top to bottom
let buildHeap = (arr, k) = > {
    if(k === 1) return
    // From the last non-leaf node, top-down heap
    for(let i = Math.floor(k/2); i>=1 ; i--) {
        heapify(arr, k, i)
    }
}

/ / heap
let heapify = (arr, k, i) = > {
    // Top-down heap
    while(true) {
        let minIndex = i
        if(2*i <= k && arr[2*i] < arr[i]) {
            minIndex = 2*i
        }
        if(2*i+1 <= k && arr[2*i+1] < arr[minIndex]) {
            minIndex = 2*i+1
        }
        if(minIndex ! == i) { swap(arr, i, minIndex) i = minIndex }else {
            break}}}/ / exchange
let swap = (arr, i , j) = > {
    let temp = arr[i]
    arr[i] = arr[j]
    arr[j] = temp
}
Copy the code

Complexity analysis:

  • Time complexity: Traversing the number group requires O(n) time complexity, and once heapification requires O(logk) time complexity, so the time complexity of solving the problem of Top K using heap is O(nlogk).
  • Space complexity: O(k)

Take advantage of the heap to solve the Top K problem

The problem of finding Top K can be handled by sorting, but what if we need to find Top K in a dynamic array?

Dynamic arrays can insert or delete elements, so do we need to reorder the array every time we solve the Top K problem? That’s order nlogn each time.

Here we can use the heap, we can maintain a small top heap of K size, and when something is added to the array, compare it to the top element, and if it is larger than the top element, replace that element with the top element, and then heap into a small top heap; If it is smaller than the top of the heap, it is not processed. So the time to solve the Top k problem is only O(logk).

Heap sorting, Top K, median, etc

Five, solution: optimization, quickselect algorithm

No matter sorting algorithm or constructing heap to solve Top K problem, we have gone through a certain amount of unnecessary operations:

  • With the sorting algorithm, we only want the KTH maximum, but we sort the array globally or locally
  • If heap sort is used, you need to maintain a size ofkIs a heap (big top heap, small top heap), and also takes a very expensive time, the time complexity isO(nlogk)

So is there an algorithm that doesn’t require sorting, or extra space, to get the KTH largest element?

Which brings us to the fast selection algorithm 😊

The quickSelect algorithm is similar to the quick sorting algorithm. The following is a general introduction to the quick sorting algorithm

Fast row

Fast row using the idea of divide and conquer strategy, and the so-called partition, as the name suggests, is to divide and rule, will be a complicated problem, is divided into two or more sub-problems, similar to that of the sub-problems into smaller sub-problems, until the smaller subproblem can be simple to solve, to solve the subproblems, the original problem solution, the solution for the son of a merger.

There are only three steps in the process of fast sorting:

  • First, select a number from the sequence as the reference number
  • Put everything greater than this number to its right, and everything less than or equal to this number to its left (one quick sorting)partition)
  • Repeat to the left and right of the base until the array is sorted

Specific steps are as follows:

  • Create two Pointers to the left and right ends of the array
  • 2. Take an arbitrary element from the array as the base
  • 3, the left pointer begins to move to the right and encounters a stop larger than the baseline
  • 4. The right pointer starts to move to the left, stops when it encounters an element smaller than the base, and swaps the element pointed to by the left and right Pointers
  • 5. Repeat 3,4 until the left pointer exceeds the right pointer, at which point values less than the baseline are placed to the left and values greater than the baseline are placed to the right
  • 6, and repeat for the left and right sides of the base until the array is sorted

Notice how the benchmark here is nan? The easiest way to do this is to always select the leftmost element as the baseline:

But this is not the best choice for almost already ordered sequences, and it will result in the worst performance of the algorithm. Another option is to select the middle number or use math.random () to randomly select a number as the base. The code implementation below uses random numbers as the base.

Code implementation

let quickSort = (arr) = > {
  quick(arr, 0 , arr.length - 1)}let quick = (arr, left, right) = > {
  let index
  if(left < right) {
    // Divide the array
    index = partition(arr, left, right)
    if(left < index - 1) {
      quick(arr, left, index - 1)}if(index < right) {
      quick(arr, index, right)
    }
  }
}

// A quick queue
let partition = (arr, left, right) = > {
  // take the middle term as the benchmark
  var datum = arr[Math.floor(Math.random() * (right - left + 1)) + left],
      i = left,
      j = right
  // Start adjusting
  while(i <= j) {
    
    // The left pointer moves right
    while(arr[i] < datum) {
      i++
    }
    
    // Right pointer moves left
    while(arr[j] > datum) {
      j--
    }
    
    / / exchange
    if(i <= j) {
      swap(arr, i, j)
      i += 1
      j -= 1}}return i
}

/ / exchange
let swap = (arr, i , j) = > {
    let temp = arr[i]
    arr[i] = arr[j]
    arr[j] = temp
}

/ / test
let arr = [1.3.2.5.4]
quickSort(arr)
console.log(arr) // [1, 2, 3, 4, 5]
// The second maximum value
console.log(arr[arr.length - 2])  / / 4
Copy the code

Quicksort is sort from smallest to largest, so the KTH maximum is in the n-k position

Complexity analysis

  • Time complexity: O(nlogn)
  • Space complexity: O(nlogn)

Quickselect algorithm

So we did quicksort to get the KTH maximum, but we didn’t have to bother,

We just need to compare whether the reference position is in the n-k position every time we perform quicksort,

  • If less thann-k, then the KTH maximum value is to the right of the reference value, we just need to recursively fast-arrange the sub-sequence to the right of the reference value;
  • If more thann-k, then the KTH maximum value is on the edge of the reference value, we only need to recursively fast-arrange the sub-sequence to the left of the reference value;
  • If a is equal to then-k, the KTH maximum value is the reference value

Code implementation:

let findKthLargest = function(nums, k) {
    return quickSelect(nums, nums.length - k)
};

let quickSelect = (arr, k) = > {
  return quick(arr, 0 , arr.length - 1, k)
}

let quick = (arr, left, right, k) = > {
  let index
  if(left < right) {
    // Divide the array
    index = partition(arr, left, right)
    // Top k
    if(k === index) {
        return arr[index]
    } else if(k < index) {
        // Top k is on the left
        return quick(arr, left, index- 1, k)
    } else {
        // Top k is on the right
        return quick(arr, index+1, right, k)
    }
  }
  return arr[left]
}

let partition = (arr, left, right) = > {
  // take the middle term as the benchmark
  var datum = arr[Math.floor(Math.random() * (right - left + 1)) + left],
      i = left,
      j = right
  // Start adjusting
  while(i < j) {
    
    // The left pointer moves right
    while(arr[i] < datum) {
      i++
    }
    
    // Right pointer moves left
    while(arr[j] > datum) {
      j--
    }
    
    / / exchange
    if(i < j) swap(arr, i, j)

    // When there is duplicate data in the array, they are all datum, but in different positions
    // Continue incrementing I to prevent an infinite loop
    if(arr[i] === arr[j] && i ! == j) { i++ } }return i
}

/ / exchange
let swap = (arr, i , j) = > {
    let temp = arr[i]
    arr[i] = arr[j]
    arr[j] = temp
}
Copy the code

Complexity analysis:

  • Time complexity: Average time complexity O(n), worst case time complexity O(n2)
  • Space complexity: O(1)

Six, solution: continue to optimize the median (BFPRT) algorithm

The median algorithm, also known as the median algorithm, whose worst-time complexity is O(n), is proposed by Blum, Floyd, Pratt, Rivest and Tarjan. The idea of this algorithm is to improve the worst case time complexity of fast selection algorithm by modifying the principal element selection method.

In BFPTR algorithm, only the selection of reference value in Partion in QuickSelect algorithm is changed. In QuickSelect algorithm, we can choose the first element or the last element as the reference element, and the optimized one can choose a random element as the reference element. In the BFPTR algorithm, the median of the median quintile is chosen as the base element (also called pivot) each time. The purpose of this is to make the partition reasonable and avoid the worst case.

BFPRT algorithm steps are as follows:

  • Select principal component
    • Divide n elements into ordern/5Each group has 5 elements. If there is any surplus, remove it
    • For thisn/5Use insertion sort to find their respective median for each of the groups
    • For all the medians found in the previous step, the BFPRT algorithm is called to find their medians, as the principal element;
  • Take the pivot element as the cut-off point, put the less than the pivot element on the left and the greater than the pivot element on the right;
  • Determine the position of the pivot and the magnitude of k, selectively recurse to the left or right

Code implementation:

let findKthLargest = function(nums, k) {
    return nums[bfprt(nums, 0, nums.length - 1, nums.length - k)]
}

let bfprt = (arr, left , right, k) = > {
  let index
  if(left < right) {
    // Divide the array
    index = partition(arr, left, right)
    // Top k
    if(k === index) {
        return index
    } else if(k < index) {
        // Top k is on the left
        return bfprt(arr, left, index- 1, k)
    } else {
        // Top k is on the right
        return bfprt(arr, index+1, right, k)
    }
  }
  return left
}

let partition = (arr, left, right) = > {
  / / benchmark
  var datum = arr[findMid(arr, left, right)],
      i = left,
      j = right
  // Start adjusting
  while(i < j) {
    // The left pointer moves right
    while(arr[i] < datum) {
      i++
    }
    
    // Right pointer moves left
    while(arr[j] > datum) {
      j--
    }
    
    / / exchange
    if(i < j) swap(arr, i, j)

    // When there is duplicate data in the array, they are all datum, but in different positions
    // Continue incrementing I to prevent an infinite loop
    if(arr[i] === arr[j] && i ! == j) { i++ } }return i
}

/** * array arr[left, right] takes a group of five elements and computes the median of each group, * finally returns the median subscripts of these medians (i.e. pivot subscripts). * * @attention returns the last argument in the statement with an extra 1, so that k is always greater than 0. * /
let findMid = (arr, left, right) = > {
    if (right - left < 5)
        return insertSort(arr, left, right);

    let n = left - 1;

    // Find the median in groups of five and move the median to the left of the array in turn
    for (let i = left; i + 4 <= right; i += 5)
    {
        let index = insertSort(arr, i, i + 4);
        swap(arr[++n], arr[index]);
    }

    // Use BFPRT to get the median subscripts of these medians.
    return findMid(arr, left, n);
}

/** * insert sort arr[left, right] and return the median of [left, right] *. * /
let insertSort = (arr, left, right) = > {
    let temp, j
    for (let i = left + 1; i <= right; i++) {
        temp = arr[i];
        j = i - 1;
        while (j >= left && arr[j] > temp)
        {
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = temp;
    }
    return ((right - left) >> 1) + left;
}

/ / exchange
let swap = (arr, i , j) = > {
    let temp = arr[i]
    arr[i] = arr[j]
    arr[j] = temp
}
Copy the code

Complexity analysis:

Why 5?

In BFPRT algorithm, why choose 5 as grouping?

First, even numbers are excluded, because the median is easier to calculate for odd numbers.

If YOU choose 3, there is

n

If I pick 7,9 or more, it’s going to take me longer to insert sort, and the constant c is going to be so big that it’s not worth it.

Vii. Review and summary

Finally, we summarize that it is not difficult to solve the topk problem, and there are mainly the following ideas:

  • Overall sort: O(nlogn)
  • Local sort: only bubble sort the first k Max, O(n*k)
  • Using heap: O(nlogk)
  • Counting or bucket sort: Counting sort is used for the first K maximum values, and the time complexity is O(n + M), where M represents the data range. Bucket sorting is used for the highest frequency k, and the time complexity is O(n). However, both require that the input data be integers with a definite range, which is not covered in this article and can be found at github.com/sisterAn/Ja… See the heap series for more information
  • Optimization: QuickSelect algorithm: average O(n), worst O(N2)
  • Continue optimization: Median of median (BFPRT) algorithm: worst O(n)

Here is a brief description of the latter two options:

  • First of all, we need to know what fast sorting is. The process of fast sorting simply has three steps: first, select a number from the sequence as the reference number; Place all numbers greater than this number to its right, and all numbers less than or equal to this number to its left (a quick partition); Repeat to the left and right of the base until the array is sorted

  • Quickselect algorithm: Quicksort will sort all the data, and we only want the KTH maximum value, so QuickSelect optimizes quicksort: When each partition is performed, whether the reference value position is in n-k (the KTH large = the n-k small, n is the array length) position, if less than n-k, then the KTH maximum value is to the right of the reference value, we only need to recurse the subsequence to the right of the reference value. If it is greater than n-k, then the KTH maximum value is on the edge of the reference value. We only need to recursively fast-sort the subsequence to the left of the reference value. If it’s n minus k, then the KTH maximum is the base value, the average time is O(n), the worst case is O(n2), and the worst case is very high

  • Median of median (BFPRT) algorithm: The idea of this algorithm is to modify the quickSelect algorithm’s pivot selection method to improve the time complexity of the algorithm in the worst case

Finally, attached is the classic exercise of Top K problem. There is too much content to solve the problem. Go to git repository to check the solution

  • Tencent & bytes, etc. : minimum number of K:

    Github.com/sisterAn/Ja…

  • Leetcode347: The first K high-frequency elements:

    Github.com/sisterAn/Ja…

  • Byte &leetcode215: the KTH largest element in the array:

    Github.com/sisterAn/Ja…

Past wonderful

  • Front-end advanced algorithm 9: after reading this article, no longer afraid of heap sort, Top K, median question interview
  • Front-end advanced algorithm 8: the headline is the underlying hash table problem
  • Front-end advanced algorithm 7: small white can understand the tree and binary tree
  • Front-end advanced algorithm 6: queue, double-ended queue, sliding window and matching algorithm
  • Front-end advanced algorithm: common algorithm problems and perfect problem solutions
  • Video interview uHF online programming questions, enough to handle most companies
  • 10 questions and 10 answers, with a quick introduction to front-end algorithms
  • Front-end advanced Algorithm 5: Fully read the stack structure used by the front-end (+ LeetCode
  • Front-end advanced algorithm 4: Linked lists are so simple
  • Front-end advanced algorithm 3: Learning the LRU algorithm from the browser cache elimination strategy and Vue’s keep-alive
  • Bottle jun front-end advanced algorithm camp first week summary
  • Front-end advanced algorithm 2: JavaScript array from Chrome V8 source code
  • Front-end advanced algorithm 1: How to analyze and count the execution efficiency and resource consumption of algorithms?

Thank you for reading

Welcome to pay attention to “front-end bottle Gentleman”, reply to “communication” to join the front-end communication group!

Welcome to pay attention to “front-end bottle gentleman”, reply to “algorithm” automatically join, from 0 to 1 to build a complete data structure and algorithm system!

Here, I not only introduce algorithms, but also combine algorithms with various front-end areas, including browsers, HTTP, V8, React, Vue source code, and so on.

Here (algorithm group), you can learn a dachang algorithm programming problem (Ali, Tencent, Baidu, Byte, etc.) or Leetcode every day, Mr. Aquarius will solve it the next day!

In addition, there are handwritten source code problems every week, Aquarius will also solve!