One, the introduction

As the name implies, quicksort is a quicksort algorithm in practice that is particularly useful in C++ or in sorting Java base types. Its average running time is O(NlogN); But the worst case performance is O(N2). I’ll talk about quicksort first and then how to optimize it.

2. Quicksort

  • Algorithm idea:

Divide and conquer the array into two parts and call it recursively. The quicksort process of sorting the array S

  1. If the number of elements in S is 0 or 1, it returns directly;
  2. Take any element V in S and call it pivot; [The selection strategy of hub elements is very important, which will be detailed below]
  3. Divide S-{v} (the elements of S except the pivot elements) into two disjoint sets S1And S2, S1All the elements in this set are less than or equal to the pivot elements v, S2All elements in are greater than or equal to the pivot element;
  4. Returns the quicksort (S1), hub element v, Quicksort (S1 < / sub2).
  • Selection strategy of hub element
  1. Take the first or the last:Simple butVery silly(Ah, 9 dragons, the picture above??) . When the input sequence is in ascending or descending order, this results in S1The set is empty and all elements except the pivot element are in S2Set, that’s the way to do it,The worst time complexity is O(N)2).
  2. Random selection: This is a safer approach. Unless there is an error in the random number generator, and the probability of continuous poor segmentation is relatively low. However, random number generation is expensive, which increases the running time.
  3. Three-digit median segmentation method: the median (median) of a set of sequences is the best choice of hub element (because the sequences can be divided into two sub-sequences, merge sort tells us that at this time is O(NlogN); But calculating the median of an array can be time-consuming and slow down the efficiency of quicksort. But you can do that by calculating the median of the first, middle, and last element of the array. Take the sequence: [8,1,4,9,6,3,5,2,7,0]. The first element is 8, the middle (left+right)/2(rounded down) element is 6, and the last element is 0. So the median is 6, so the pivot element is 6. It is clear that the bad case of pre-sorted input is eliminated by using the three-digit split method, and the comparison is actually reduced by 14%.
  • Fast row process
  1. Swap the hub element with the last element of the array, so that the hub element leaves the data segment to be split;

  2. Initialize two indexes, left and right, to the first and penultimate element of the array.

  3. If the left index points to an element smaller than the pivot element, left++; Otherwise, left stops. The right index points to an element larger than the pivot element, right–; Otherwise, right stops.

  4. If left<right, the two elements are swapped and the cycle continues with steps 3 and 4. Otherwise, the loop breaks out and the left element is swapped with the pivot element (this is when the split is complete). Call these two subsequences recursively.

    Assume that all elements are different (that is, not equal). We’ll talk about what to do with repeating elements.

    The next step is to move the elements smaller than the pivot elements to the left of the array, and the elements larger than the pivot elements to the right of the array.

    When left is to the left of right, we move left to the right over those elements that are smaller than the pivot element, and right to the left over those elements that are larger than the pivot element. When left and right stop, left points to an element larger than the pivot element and right to an element smaller than the pivot element. If left<right, the two elements are swapped. This pushes an element larger than the pivot element to the right and an element smaller than the pivot element to the left. Let’s visualize the process: left does not move and right moves one position to the left, as shown below:

    We swap elements pointing to left and right, repeating the process until left>right.

    At this point, we can see that the elements to the left of the left are all smaller than the pivot element, and the elements to the right are all larger than the pivot element. We continue to recurse the left and right sequences, and eventually we can finish sorting.

    Above we assume that elements differ from each other. Next we discuss the handling of duplicate elements.

  • Duplicate element processing: simply say, should the left and right indexes stop when an element is equal to a hub element?
  1. If only one of them is stopped: this includes two types, if only the left and right indexes are stopped, this will cause all elements equal to the hub element to be moved into a collection. If you think of a sequence where everything is repeating, that’s the worst case O(N2).
  2. If neither stops: This needs to prevent left and right indexes from crossing bounds and does not swap elements. But the correct process, as illustrated above, is that the pivot element needs to be swapped with the element pointed to by the left index. Again, if everything is the same, that’s going to split the sequence to the left, which is the worst case O(N2).
  3. Stop all: Still consider the case of all elements being equal, which seems to make many “meaningless” swaps; But the positive effect is that the left and right interleaving happens in the middle, which is exactly what we do when we split the sequence into two subsequences, and again, merge sort, which is O(NlogN). Our analysis shows that this is the only case in which quadratic power can be avoided.

In large numbers of inputs, there are a lot of repeating elements. Again, it’s important to think about efficient sorting of these repeating elements.

Is quicksort really fast? For input sequences of small arrays (N<=20), quicksort is better than insert sort; In addition, in the above optimization, the result of recursion can be only one or two elements when the median of three numbers is divided, which will lead to errors. So, continuing optimization is to replace small sequences with insertion sort, which reduces running time by about 15%. A good cutoff range is 10 (actually 5-20 works about the same).

The trinary median segmentation can also be optimized: Assuming the input sequence is A, select a[left], a[center], a[right], select the hub value, and place the minimum and maximum values respectively at a[left], a[right], and place the hub value at a[right-1], which is also the correct position. And it prevents the right from going to the right without crossing the line; So the starting position is left plus 1,right minus 2.

Third, optimize summary Java implementation quicksort:

public class Quicksort {

    / * *

* Cut-off range

* /


    private static final int CUTOFF = 10;



    public static void main(String[] args) {

        Integer[] a = {8.1.4.9.6.3.5.2.7.0};

        System.out.println("Before quicksort:" + Arrays.toString(a));

        quicksort(a);

        System.out.println("After quicksort:" + Arrays.toString(a));

    }



    public static <T extends Comparable<? super T>> void quicksort(T[] a) {

        quicksort(a, 0, a.length - 1);

    }



    private static <T extends Comparable<? super T>> void quicksort(T[] a, int left, int right) {

        if (left + CUTOFF <= right) {

            // Get the hub element by splitting the median of three numbers

            T pivot = median3(a, left, right);



            // Start to split the sequence

            int i = left, j = right - 1;

            for(; ;) {

                while (a[++i].compareTo(pivot) < 0) {

                }

                while (a[--j].compareTo(pivot) > 0) {

                }

                if (i < j) {

                    swapReferences(a, i, j);

                } else {

                    break;

                }

            }

            // Swap the hub element with the element of position I

            swapReferences(a, i, right - 1);

            // Sort the sequence smaller than the pivot element

            quicksort(a, left, i - 1);

            // Sort the sequence larger than the pivot element

            quicksort(a, i + 1, right);

        } else {

            // Insert sort

            insertionSort(a, left, right);

        }

    }



    private static <T extends Comparable<? super T>> median3(T[] a, int left, int right) {

        int center = (left + right) / 2;

        if (a[center].compareTo(a[left]) < 0) {



            swapReferences(a, left, center);

        }

        if (a[right].compareTo(a[left]) < 0) {

            swapReferences(a, left, right);

        }

        if (a[right].compareTo(a[center]) < 0) {

            swapReferences(a, center, right);

        }

        // Place the hub element at right-1

        swapReferences(a, center, right - 1);

        return a[right - 1];

    }



    public static <T> void swapReferences(T[] a, int index1, int index2) {

        T tmp = a[index1];

        a[index1] = a[index2];

        a[index2] = tmp;

    }



    private static <T extends Comparable<? super T>> void insertionSort(T[] a, int left, int right) {

        for (int p = left + 1; p <= right; p++) {

            T tmp = a[p];

            int j;



            for (j = p; j > left && tmp.compareTo(a[j - 1]) < 0; j--) {

                a[j] = a[j - 1];

            }



            a[j] = tmp;

        }

    }



}

// Output the result

// Before quick sort: [8, 1, 4, 9, 6, 3, 5, 2, 7, 0]

// Quick sort: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Copy the code

Quicksort analysis

  1. Worst time complexity: that is, the element is divided into a subsequence, the other subsequence is empty, the time complexity is O(N2).
  2. The best time complexity: that is, the sequence is divided into two sub-sequences, the time complexity is O(NlogN), analysis and merge sort are similar.
  3. Average time complexity: O(NlogN).
  4. Space complexity: O(logN)

Five, the summary

In this paper, the optimization of quicksort is carried out from three aspects: how to choose the hub element, analyze the processing of repeated elements and replace it with insertion sort when dividing into small array recursively. The principle, process and optimization of quicksort are systematically described in detail. Quicksort takes an average time O(NlogN) and is the sort algorithm used for basic types in Java. Take a look at the arrays.sort method. At this point, I’m going to go back and perfect the topK problem, and I can use the idea of quicksort, to average order N to solve topK.

If you think you can do it, you can recommend it or give it a thumbs-up.