A variety of TopK algorithm implementation

I am the front-end watermelon brother, today to complete the TopK algorithm.

TopK, that is, find the smallest (or largest) number of k in the array, and do not require these numbers to be returned in order.

This is a classic interview question. There are also a lot of solutions, which can better investigate the data structure and algorithm basis of the interviewer.

So the idea is the same, let’s say we’re trying to find the smallest number of k.

The corresponding LeetCode subject address has two:

“Sword finger Offer (2nd edition)” question 40: leetcode-cn.com/problems/zu…
“Programmer Interview Classic (6th edition)” 17.14: leetcode-cn.com/problems/sm…

The sorting

The simplest way is full sort, which is to sort all the elements of the array in ascending order and then take the first k number. The sorting API of the programming language is usually used to ensure correctness.

function getLeastNumbers(arr: number[], k: number) :number[] {
  return arr.sort((a, b) = > a - b).slice(0, k);
};
Copy the code

In the actual development, if there is a need for this, and the amount of data is not large, the built-in sorting method is the most reliable.

Because the underlying sorting method is usually fast sorting algorithms with good time complexity, the time complexity of full sorting is order n log of n.

In addition to the full sort, you can actually do an optimization, do a local sort. In some algorithms (bubble and selection sort), the ith iteration of sorting puts the ith element in the same position as the last sorted element.

Here we have a look at the magic selection sort implementation:

function getLeastNumbers(arr: number[], k: number) :number[] {
  let min: number;
  let minIdx: number;
  for (let i = 0; i < k; i++) { // This iterates k times
    min = arr[i];
    minIdx = i;
    for (let j = i + 1; j < arr.length; j++) {
      if (arr[j] < min) {
        min = arr[j];
        minIdx = j;
      }
    }
    [arr[i], arr[minIdx]] = [arr[minIdx], arr[i]]; / / exchange
  }
  return arr.slice(0, k);
};
Copy the code

As long as we go from n iterations to k iterations, we get an array with only the first K elements sorted.

Local sorting algorithm is optimized to O(k*n) on the basis of O(n^2).

When k is very small, local sorting is more efficient than full sorting.

Fast row thought

At the heart of quicksort are divide and conquer and partition.

Limited to space, the specific principle and implementation of quicksort can see my article: the classic implementation of quicksort, you really can write?

In simple terms, the implementation of quicksort is to take a pivot value at a time and place whatever is less than or equal to pivot on the left interval and whatever is greater than pivot on the right interval. And then you keep doing the same thing for both intervals, until the interval is less than or equal to 1, recursively from top to bottom.

Quicksort used to recurse on both intervals. Now it recurses selectively.

After each partition:

If pivot is less than k, the recursion ends, and the first K elements of the array are the smallest K elements;
If pivot is to the left of K, then the left range of pivot can be sorted, with values less than or equal to pivot on the left. Then we recurse on the right interval;
If pivot is to the right of K, then pivot’s right interval is ignored and the left interval is recursed.

There is an important point here: the value of the index on which pivot is placed after each partition is the value that the entire array should be sorted. That’s why we can ignore one of these intervals.

function getLeastNumbers(arr: number[], k: number) :number[] {
  partition(arr, 0, arr.length - 1, k);
  return arr.slice(0, k);
};

function partition(arr: number[], lo: number, hi: number, k: number) {
  if (lo >= hi) return;
  const pivot = arr[hi];  // Choose pivot at random instead
  let i = lo;
  for (let j = lo; j < hi; j++) {
    if (arr[j] <= pivot) {
      swap(arr, i, j);
      i++;
    }
  }
  swap(arr, i, hi);
  
  // Select * from the partition where the last two rows are located
  // partition(arr, i + 1, hi, k);
  // partition(arr, lo, i - 1, k);
  if (i < k) {
    partition(arr, i + 1, hi, k);
  } else if (i > k) {
    partition(arr, lo, i - 1, k); }}function swap(arr: number[], i: number, j: number) {
  let tmp = arr[i];
  arr[i] = arr[j];
  arr[j] = tmp;
}
Copy the code

The average time complexity is O(n), the worst time complexity is O(n^2). It’s generally more efficient than fast exhaust, except in extreme cases.

Big pile top

The big top heap is a complete binary tree. The value of any node is greater than or equal to the value of any node in the subtree.

Let’s create a big top heap of size K. Let’s put k elements in. When you want to add new elements later, compare them to the top of the heap (the maximum value of the heap). If it is smaller than the largest element in the heap, the top element is not qualified and cannot be a member of TopK, so it is removed from the heap and the new element is added to the heap; Otherwise, the new element doesn’t go into the heap.

Keep doing this until you have traversed the entire array. The last heap is our TopK.

JavaScript doesn’t have built-in data structures like heaps or priority queues, so we’ll leave that out for now.

The time to load the heap is order log k, n times, so the total time is order n log k.

If you need to maintain a TopK dynamically, such as your website’s daily leaderboard, using a big top heap is more appropriate.

At the end

In general, the scheme of fast sorting has the lowest time complexity, the big top heap is suitable for dynamic maintenance of TopK, and the full sort is suitable for business code writing and data volume is not large.

A variety of TopK algorithm implementation

The sorting

Fast row thought

Big pile top

At the end

Related Posts

Micro front End Qiankun Project Practice!!

All the front end to see 2D gamification of interaction primer

Webpack4 Process Analysis 4