Quote:

The previous articles covered how to optimize bubble sort and walked through selection sort and insertion sort with diagrams. The average time complexity of these simple sorting algorithms is O(N^2). Shellsort was one of the first algorithms to break this quadratic time barrier. Let's look at why Shellsort can beat quadratic time.

1. A lower bound for simple sorting algorithms

Inversion: a pair (a[i], a[j]) with the property i < j but a[i] > a[j]. For example, the sequence 34, 8, 64, 51, 32, 21 has nine inversions, namely (34,8), (34,32), (34,21), (64,51), (64,32), (64,21), (51,32), (51,21) and (32,21). Note that this is exactly the number of (implicit) swaps insertion sort has to perform, because swapping two adjacent out-of-order elements eliminates exactly one inversion, and a sorted array has no inversions. Since the algorithm also does O(N) other work, if I is the number of inversions in the original array, the running time of insertion sort is O(I + N). So if the number of inversions is O(N), insertion sort runs in linear time; bubble sort can also achieve this optimal O(N) on already-sorted input by adding a swap flag.
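To make this concrete, here is a minimal sketch (my own illustration, not from the original article; the class name InversionDemo is made up) of an insertion sort that counts its element moves; that count equals the number of inversions I, so the total work is O(I + N):

// Sketch: insertion sort that counts element moves.
// Each move removes exactly one inversion, so the count equals I.
public class InversionDemo {
    static <T extends Comparable<? super T>> long insertionSort(T[] a) {
        long moves = 0;
        for (int i = 1; i < a.length; i++) {
            T tmp = a[i];
            int j = i;
            for (; j > 0 && tmp.compareTo(a[j - 1]) < 0; j--) {
                a[j] = a[j - 1];      // shift one position to the right
                moves++;              // one inversion removed
            }
            a[j] = tmp;
        }
        return moves;
    }

    public static void main(String[] args) {
        Integer[] a = {34, 8, 64, 51, 32, 21};
        System.out.println("inversions removed: " + insertionSort(a)); // prints 9
        System.out.println(java.util.Arrays.toString(a));
    }
}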

So we can obtain a precise average-case bound for these simple sorts by computing the average number of inversions in the original sequence.

This assumes all elements are distinct (if duplicates are allowed, it is not even clear how the average number of inversions should be defined).

Theorem 1: The average number of inversions in an array of N distinct numbers is N(N-1)/4.

Proof: For any list L of N distinct elements, consider its reverse list Lr. Any pair of elements (x, y) with y < x is an inversion in exactly one of L and Lr, and the number of such pairs is C(N,2) = N(N-1)/2 (this is just the number of ways to choose two of the N distinct elements). So L and Lr together contain exactly N(N-1)/2 inversions, and an average list has half of that: N(N-1)/4 inversions.
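Theorem 1 is also easy to check empirically. The following brute-force sketch (my own illustration; the class name and trial count are arbitrary) averages the inversion count over random permutations and compares it with N(N-1)/4:

import java.util.Random;

// Sketch: estimate the average number of inversions of a random permutation.
public class AverageInversions {
    static int countInversions(int[] a) {
        int inv = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; j++)
                if (a[i] > a[j]) inv++;   // pair (a[i], a[j]) with i < j but a[i] > a[j]
        return inv;
    }

    public static void main(String[] args) {
        int n = 6, trials = 100_000;
        Random rnd = new Random();
        long total = 0;
        for (int t = 0; t < trials; t++) {
            int[] a = new int[n];
            for (int i = 0; i < n; i++) a[i] = i;
            for (int i = n - 1; i > 0; i--) {          // Fisher-Yates shuffle
                int k = rnd.nextInt(i + 1);
                int tmp = a[i]; a[i] = a[k]; a[k] = tmp;
            }
            total += countInversions(a);
        }
        System.out.printf("empirical average = %.3f, N(N-1)/4 = %.3f%n",
                (double) total / trials, n * (n - 1) / 4.0);
    }
}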

Theorem 2: Any algorithm that sorts by swapping adjacent elements requires Ω(N^2) time on average.

Proof: The initial average number of inversions is N(N-1)/4 = Ω(N^2), and each swap of adjacent elements removes at most one inversion, so Ω(N^2) swaps are required on average.

Conclusion: This lower bound tells us that for a sorting algorithm to run in subquadratic, i.e. o(N^2), time, it must perform comparisons and, in particular, exchanges between elements that are far apart. A sorting algorithm makes progress by removing inversions, and to run efficiently it must remove more than one inversion per exchange. Now let's see how Shellsort breaks the quadratic time bound.

2. Shellsort

Shellsort is named after its inventor, Donald Shell. It is implemented as multiple passes of insertion sort: it compares elements that are a certain gap apart, and the gap used for the comparisons shrinks as the algorithm progresses, until the final pass compares adjacent elements. For this reason Shellsort is also called diminishing increment sort.

  • Algorithm idea: Use a sequence h1, h2, …, ht, called the increment sequence. Any increment sequence works as long as h1 = 1. After a pass using increment hk, we have a[i] <= a[i + hk] for every i; all elements spaced hk apart are sorted, and the array is said to be hk-sorted. An hk-sort is simply an insertion sort applied to each of the hk interleaved subarrays. Shellsort is correct as long as the last increment is h1 = 1 (that final pass is just an ordinary insertion sort).

  • Sorting process

  1. Choose an increment sequence h1, h2, …, ht with h1 = 1.
  2. Run one pass per increment, i.e. loop t times, starting with the largest increment ht and switching to the next smaller increment hk-1 after each pass.
  3. For the current increment hk, view the original array as hk interleaved subarrays and insertion-sort each subarray.

Below is a diagram of Shellsort using the increment sequence {1, 3, 5} (applied from largest to smallest: 5, then 3, then 1) on the sequence {3, 7, 1, 13, 9, 11, 5, 8, 2, 4, 12, 6, 10}.

[Figure: Shellsort pass diagram]

Each color represents one subarray, which makes it easy to see each pass and its result.
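To make the diagram concrete in code, here is a small trace sketch (my own illustration, not from the article) that runs the passes with gaps 5, 3, 1 on the example array and prints the array after each pass:

import java.util.Arrays;

// Sketch: trace the example from the diagram, printing the array after each gap pass.
public class ShellsortTrace {
    public static void main(String[] args) {
        int[] a = {3, 7, 1, 13, 9, 11, 5, 8, 2, 4, 12, 6, 10};
        int[] gaps = {5, 3, 1};                 // increment sequence {1, 3, 5}, largest first
        for (int gap : gaps) {
            // One gap-sort: insertion sort on each of the gap interleaved subarrays.
            for (int i = gap; i < a.length; i++) {
                int tmp = a[i];
                int j = i;
                for (; j >= gap && tmp < a[j - gap]; j -= gap) {
                    a[j] = a[j - gap];
                }
                a[j] = tmp;
            }
            System.out.println("after " + gap + "-sort: " + Arrays.toString(a));
        }
    }
}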

  • Java implementation of Shellsort: The code below uses the increment sequence suggested by Shell himself (simple, but not very efficient): the first gap is ht = N/2 and each subsequent gap is half the previous one, hk = hk+1 / 2 (integer division). Once you understand insertion sort, the implementation is straightforward.
 private static <T extends Comparable<? super T>> void shellsort(T[] a) {
        int j;
        T tmp;
        // Shell's increments: start with gap = N/2 and halve it after every pass.
        for (int gap = a.length / 2; gap > 0; gap /= 2) {
            // One pass: insertion sort on each of the gap interleaved subarrays.
            for (int i = gap; i < a.length; i++) {
                tmp = a[i];
                for (j = i; j >= gap && tmp.compareTo(a[j - gap]) < 0; j -= gap) {
                    a[j] = a[j - gap];   // shift the larger element gap positions to the right
                }
                a[j] = tmp;              // place the saved element into the hole
            }
        }
    }

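A quick, self-contained usage sketch (the demo class and test run are my own assumptions, not from the article; it repeats the shellsort method above so it compiles on its own):

import java.util.Arrays;

// Sketch: run the shellsort method above on the example array from the diagram.
public class ShellsortDemo {

    private static <T extends Comparable<? super T>> void shellsort(T[] a) {
        int j;
        T tmp;
        for (int gap = a.length / 2; gap > 0; gap /= 2) {       // Shell's increments
            for (int i = gap; i < a.length; i++) {
                tmp = a[i];
                for (j = i; j >= gap && tmp.compareTo(a[j - gap]) < 0; j -= gap) {
                    a[j] = a[j - gap];
                }
                a[j] = tmp;
            }
        }
    }

    public static void main(String[] args) {
        Integer[] a = {3, 7, 1, 13, 9, 11, 5, 8, 2, 4, 12, 6, 10};
        shellsort(a);
        System.out.println(Arrays.toString(a));   // prints the elements in ascending order
    }
}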
  • Time and space complexity and stability analysis:
  1. The running time of Shellsort depends on the choice of increment sequence, and the proofs are involved [see other sources for details].
  2. The worst-case time complexity of Shellsort with Shell's increments is O(N^2).
  3. With Hibbard's increments (1, 3, 7, …, 2^k - 1), the worst-case time complexity is O(N^(3/2)); the average case, based on simulations, is thought to be about O(N^(5/4)).
  4. With Sedgewick's increment sequences, the worst-case time complexity is O(N^(4/3)) and the average is conjectured to be O(N^(7/6)). The best of his sequences is {1, 5, 19, 41, 109, …}, whose terms are of the form 9·4^i - 9·2^i + 1 or 4^i - 3·2^i + 1 (a small generator sketch for these sequences follows this list).
  5. Space complexity: only one temporary variable is used, so the space complexity is O(1).
  6. Stability: Shellsort is not stable. Because different passes use different gaps, a pass with a large gap can move an element past an equal element, so the relative order of equal elements may change.
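Because the increment formulas above are easy to get wrong, here is a small generator sketch (my own illustration; class and method names are made up) that produces Hibbard's and Sedgewick's increments below a given bound:

import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Sketch: generate the Hibbard and Sedgewick increment sequences smaller than n.
public class IncrementSequences {
    // Hibbard: 1, 3, 7, 15, ...  (2^k - 1)
    static List<Long> hibbard(long n) {
        List<Long> gaps = new ArrayList<>();
        for (long g = 1; g < n; g = 2 * g + 1) {
            gaps.add(g);
        }
        return gaps;
    }

    // Sedgewick: merge 9*4^i - 9*2^i + 1 and 4^i - 3*2^i + 1  ->  1, 5, 19, 41, 109, ...
    static List<Long> sedgewick(long n) {
        TreeSet<Long> gaps = new TreeSet<>();
        for (int i = 0; ; i++) {
            long a = 9L * (1L << (2 * i)) - 9L * (1L << i) + 1;   // 9*4^i - 9*2^i + 1
            long b = (1L << (2 * i)) - 3L * (1L << i) + 1;        // 4^i - 3*2^i + 1
            boolean added = false;
            if (a > 0 && a < n) { gaps.add(a); added = true; }
            if (b > 0 && b < n) { gaps.add(b); added = true; }
            if (!added && a >= n) break;                          // both formulas exceeded n
        }
        return new ArrayList<>(gaps);
    }

    public static void main(String[] args) {
        System.out.println("Hibbard   < 1000: " + hibbard(1000));
        System.out.println("Sedgewick < 1000: " + sedgewick(1000));
    }
}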

3. Summary

This article has explained that to break the quadratic time bound, a sort must compare and exchange elements that are far apart; a single such exchange can then remove several inversions at once. Shellsort uses an increment sequence to split the original sequence into multiple interleaved subsequences and performs insertion sort on each. Shellsort is correct as long as the final increment is h1 = 1. Its running time depends heavily on the choice of increment sequence; in practice we can precompute ("tabulate") the increment sequence in an array so that it does not have to be recalculated for every sort.

Statement: drawing figures and writing take real effort; if you reproduce this article, please indicate the source. The material is based on Data Structures and Algorithm Analysis in Java (3rd Edition).