This post was originally posted on my blog
Preface
This series covers ten classic sorting algorithms.
- The language used is Java
- The structure: an abstract class `Sort` implements common operations such as `swap` and comparison (for example, to swap two values you just pass in the indices). All concrete sort classes inherit from the abstract class `Sort`, so we can focus on the algorithm itself.
```java
/*
 * Return value == 0: array[i1] == array[i2]
 * Return value  < 0: array[i1]  < array[i2]
 * Return value  > 0: array[i1]  > array[i2]
 */
protected int cmp(int i1, int i2) {
    return array[i1].compareTo(array[i2]);
}

protected int cmp(T v1, T v2) {
    return v1.compareTo(v2);
}

protected void swap(int i1, int i2) {
    T tmp = array[i1];
    array[i1] = array[i2];
    array[i2] = tmp;
}
```
What is Shell sort
- Shellsort, also known as the diminishing increment sort, is a more efficient, improved version of insertion sort. Shell sort is an unstable sorting algorithm.
Improvement
Shell sort is an improvement based on the following two properties of insertion sort:
- Insertion sort is highly efficient on almost-sorted data: it can approach linear time
- But insertion sort is generally inefficient, because it moves data only one position at a time
History of Shell sort
Shell sort, named after its designer Donald Shell, was published in 1959. Some older textbooks and reference manuals call the algorithm Shell-Metzner, adding Marlene Metzner Norton's name, but according to Metzner herself, "I did nothing for the algorithm, and my name should not appear in the algorithm's name."
Algorithm implementation
- The original implementation requires O(n²) comparisons and swaps in the worst case. A minor modification by V. Pratt improves the worst case to O(n log² n). This is slightly worse than the O(n log n) of the best comparison sorts.
- Shell sort improves on insertion sort by dividing the elements to be compared into several groups. This lets an element take one big step toward its final position at a time. The algorithm then sorts with smaller and smaller steps. The last pass is an ordinary insertion sort, but by that point the data is almost sorted (the case where insertion sort is fast).
- Suppose a small value sits at the end of an array otherwise sorted in ascending order. With an O(n²) sort (bubble sort or insertion sort) it may take close to n comparisons and swaps to move that value into place. Shell sort moves data with larger steps, so the small value reaches the right place after only a few comparisons and swaps.
- A more intuitive way to picture Shell sort is to lay the array out in a table and sort the columns (by insertion sort). Repeat the process, each time with fewer, longer columns, until only one column remains. The table is just a mental aid: the algorithm itself sorts the original array, using an increased index step (i += step_size instead of i++).
- For example, take the numbers [13 14 94 33 82 25 59 94 65 23 45 27 73 25 39 10] and start with a step of 5. We can describe this pass by laying the list out in a table with five columns:
```
13 14 94 33 82
25 59 94 65 23
45 27 73 25 39
10
```
Then we sort each column:
```
10 14 73 25 23
13 27 94 33 39
25 59 94 65 82
45
```
which gives [10 14 73 25 23 13 27 94 33 39 25 59 94 65 82 45]. At this point 10 has already moved to its correct position. Then sort with a step of 3:
```
10 14 73
25 23 13
27 94 33
39 25 59
94 65 82
45
```
After sorting, it becomes:
```
10 14 13
25 23 33
27 25 59
39 65 73
45 94 82
94
```
Finally sort with a step of 1 (an ordinary insertion sort).
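The column-wise passes above can be run directly as gap passes over a plain int array. This is a minimal standalone sketch (the class and method names are mine, not the series' `Sort` class):

```java
import java.util.Arrays;

public class GapPassDemo {
    // One pass: insertion-sort every "column" of elements that are `gap` apart.
    static void gapPass(int[] a, int gap) {
        for (int begin = gap; begin < a.length; begin++) {
            int cur = begin;
            int v = a[cur];
            while (cur >= gap && v < a[cur - gap]) {
                a[cur] = a[cur - gap]; // shift instead of swap
                cur -= gap;
            }
            a[cur] = v;
        }
    }

    public static void main(String[] args) {
        int[] a = {13, 14, 94, 33, 82, 25, 59, 94, 65, 23, 45, 27, 73, 25, 39, 10};
        gapPass(a, 5); // the step-5 pass from the tables above
        System.out.println(Arrays.toString(a));
        // [10, 14, 73, 25, 23, 13, 27, 94, 33, 39, 25, 59, 94, 65, 82, 45]
        gapPass(a, 3); // the step-3 pass
        gapPass(a, 1); // final ordinary insertion sort
        System.out.println(Arrays.toString(a));
    }
}
```

Note that a pass walks the array left to right in steps of 1 rather than finishing one column before the next; since each column is sorted independently, the result is the same as sorting column by column.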
Algorithm stability
- Shell sort is not a stable sorting algorithm.
Is it in-place
- What is an in-place algorithm?
- One that does not depend on extra resources (or depends on only a few), relying only on overwriting the input with the output
- A space complexity of O(1) can be considered in-place
- An algorithm that is not in-place is called out-of-place
- Shell sort is in-place
Time and space complexity
- Best time complexity: O(n)
- Worst time complexity: O(n^(4/3)) ~ O(n²), depending on the step sequence
- Average time complexity: depends on the step sequence
- Space complexity: O(1)
Step sequence
- The step sequence Shell himself proposed is 1, 2, 4, 8, 16, 32, 64, …; its worst-case time complexity is O(n²).
- The best known step sequence, with worst-case time complexity O(n^(4/3)), was proposed by Robert Sedgewick in 1986:
- 1, 5, 19, 41, 109, …
- Sedgewick's step sequence can be implemented as follows
```java
private List<Integer> sedgewickStepSequence() {
    List<Integer> stepSequence = new LinkedList<>();
    int k = 0, step = 0;
    while (true) {
        if (k % 2 == 0) { // even k: 9 * (4^(k/2) - 2^(k/2)) + 1
            int pow = (int) Math.pow(2, k >> 1);
            step = 1 + 9 * (pow * pow - pow);
        } else {          // odd k: 8 * 2^((k-1)/2) * 2^((k+1)/2) - 6 * 2^((k+1)/2) + 1
            int pow1 = (int) Math.pow(2, (k - 1) >> 1);
            int pow2 = (int) Math.pow(2, (k + 1) >> 1);
            step = 1 + 8 * pow1 * pow2 - 6 * pow2;
        }
        if (step >= array.length) break;
        stepSequence.add(0, step); // prepend, so the largest step comes first
        k++;
    }
    return stepSequence;
}
```
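To see what the formula produces, here is a standalone version of the same computation (class and method names are mine) for an array length of 100:

```java
import java.util.LinkedList;
import java.util.List;

public class SedgewickSteps {
    // Sedgewick's sequence (1, 5, 19, 41, 109, ...) truncated below n, largest step first.
    static List<Integer> steps(int n) {
        List<Integer> seq = new LinkedList<>();
        int k = 0;
        while (true) {
            int step;
            if (k % 2 == 0) {
                int pow = (int) Math.pow(2, k >> 1);
                step = 1 + 9 * (pow * pow - pow);
            } else {
                int pow1 = (int) Math.pow(2, (k - 1) >> 1);
                int pow2 = (int) Math.pow(2, (k + 1) >> 1);
                step = 1 + 8 * pow1 * pow2 - 6 * pow2;
            }
            if (step >= n) break;
            seq.add(0, step); // prepend: largest step first
            k++;
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(steps(100)); // [41, 19, 5, 1]
    }
}
```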
Code
The simplest implementation uses Shell's own step sequence 1, 2, 4, 8, 16, 32, … (the array length halved repeatedly):
```java
import java.util.ArrayList;
import java.util.List;

public class ShellSort<T extends Comparable<T>> extends Sort<T> {

    @Override
    protected void sort() {
        List<Integer> stepSequence = shellStepSequence();
        for (Integer step : stepSequence) {
            sort(step);
        }
    }

    // Insertion-sort each column whose elements are `step` apart:
    // column col holds col, col + step, col + 2*step, ...
    private void sort(int step) {
        for (int col = 0; col < step; col++) {
            for (int begin = col + step; begin < array.length; begin += step) {
                int cur = begin;
                while (cur > col && cmp(cur, cur - step) < 0) {
                    swap(cur, cur - step);
                    cur -= step;
                }
            }
        }
    }

    // Obtain Shell's step sequence: n/2, n/4, ..., 1
    private List<Integer> shellStepSequence() {
        List<Integer> stepSequence = new ArrayList<>();
        int step = array.length;
        while ((step >>= 1) > 0) {
            stepSequence.add(step);
        }
        return stepSequence;
    }
}
```
Optimization
Ideas:
- Optimize the insertion-sort pass (shift elements instead of swapping, as in the insertion sort post of this series)
- Optimize the step sequence
```java
// Optimized pass: shift elements instead of swapping.
private void sort2(int step) {
    for (int col = 0; col < step; col++) { // sort the col-th column
        for (int begin = col + step; begin < array.length; begin += step) {
            int cur = begin;
            T v = array[cur];
            while (cur > col && cmp(v, array[cur - step]) < 0) {
                array[cur] = array[cur - step];
                cur -= step;
            }
            array[cur] = v;
        }
    }
}
```
The optimized code is:
```java
package YZ.Sort;

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ShellSort<T extends Comparable<T>> extends Sort<T> {

    @Override
    protected void sort() {
        List<Integer> stepSequence = sedgewickStepSequence();
        for (Integer step : stepSequence) {
            sort2(step);
        }
    }

    // Original pass: insertion sort by swapping.
    private void sort(int step) {
        for (int col = 0; col < step; col++) { // col-th column
            // elements of the column: col, col + step, col + 2*step, ...
            for (int begin = col + step; begin < array.length; begin += step) {
                int cur = begin;
                while (cur > col && cmp(cur, cur - step) < 0) {
                    swap(cur, cur - step);
                    cur -= step;
                }
            }
        }
    }

    // Optimized pass: shift elements instead of swapping.
    private void sort2(int step) {
        for (int col = 0; col < step; col++) { // sort the col-th column
            for (int begin = col + step; begin < array.length; begin += step) {
                int cur = begin;
                T v = array[cur];
                while (cur > col && cmp(v, array[cur - step]) < 0) {
                    array[cur] = array[cur - step];
                    cur -= step;
                }
                array[cur] = v;
            }
        }
    }

    // Shell's step sequence: n/2, n/4, ..., 1
    private List<Integer> shellStepSequence() {
        List<Integer> stepSequence = new ArrayList<>();
        int step = array.length;
        while ((step >>= 1) > 0) {
            stepSequence.add(step);
        }
        return stepSequence;
    }

    // Sedgewick's step sequence (1986): 1, 5, 19, 41, 109, ...
    private List<Integer> sedgewickStepSequence() {
        List<Integer> stepSequence = new LinkedList<>();
        int k = 0, step = 0;
        while (true) {
            if (k % 2 == 0) {
                int pow = (int) Math.pow(2, k >> 1);
                step = 1 + 9 * (pow * pow - pow);
            } else {
                int pow1 = (int) Math.pow(2, (k - 1) >> 1);
                int pow2 = (int) Math.pow(2, (k + 1) >> 1);
                step = 1 + 8 * pow1 * pow2 - 6 * pow2;
            }
            if (step >= array.length) break;
            stepSequence.add(0, step); // prepend, so the largest step comes first
            k++;
        }
        return stepSequence;
    }
}
```
Results
Data source: 20,000 random integers between 1 and 80,000, generated by:
```java
Integer[] array = Integers.random(20000, 1, 80000);
```
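`Integers.random` is a test helper from this series; its source is not shown here. A hypothetical stand-in, assuming the signature is (count, min, max) with an inclusive range, might look like:

```java
import java.util.Random;

public class Integers {
    // Hypothetical stand-in: `count` random Integers in [min, max].
    static Integer[] random(int count, int min, int max) {
        Random r = new Random();
        Integer[] a = new Integer[count];
        for (int i = 0; i < count; i++) {
            a[i] = min + r.nextInt(max - min + 1);
        }
        return a;
    }
}
```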
The results are as follows
[MergeSort] stability: true, time: 0.011s (11ms), comparisons: 261,000, swaps: 0
[QuickSort] stability: false, time: 0.012s (12ms), comparisons: 345,500, swaps: 13,200
[HeapSort] stability: false, time: 0.018s (18ms), comparisons: 511,000, swaps: 20,000
[ShellSort] stability: false, time: 0.02s (20ms), comparisons: 430,400, swaps: 0
time: 0.485s (485ms), comparisons: 200 million, swaps: 200 million
[InsertionSort3] stability: true, time: 0.526s (526ms), comparisons: 25800, swaps: 0
[InsertionSort2] stability: true, time: 0.801s (801ms), comparisons: 996322,900, swaps: 0
[InsertionSort1] stability: true, time: 1.281s (1281ms), comparisons: 996322,900, swaps: 996122,900
[BubbleSort2] stability: true, time: 2.271s (2271ms), comparisons: 200 million, swaps: 996122,900
[BubbleSort] stability: true, time: 2.339s (2339ms), comparisons: 200 million, swaps: 99612,900
[BubbleSort1] stability: true, time: 2.403s (2403ms), comparisons: 200 million, swaps: 996122,900
You can see that Shell sort performs very well; compared with insertion sort, the improvement is substantial.
Inversions
What is an inversion?
The inversions of the array [2, 4, 1] are <2, 1> and <4, 1>
The time complexity of insertion sort is proportional to the number of inversions
The more inversions there are, the higher the time complexity of insertion sort
When the number of inversions is very small, insertion sort is particularly efficient
Sometimes it is even faster than O(n log n) sorts such as quicksort
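The definition above can be checked with a brute-force count (a quick sketch of mine; `countInversions` is not part of the series' code):

```java
public class Inversions {
    // Count pairs (i, j) with i < j and a[i] > a[j] — the inversions.
    static int countInversions(int[] a) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            for (int j = i + 1; j < a.length; j++) {
                if (a[i] > a[j]) count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // [2, 4, 1] has the two inversions <2, 1> and <4, 1>
        System.out.println(countInversions(new int[]{2, 4, 1})); // prints 2
    }
}
```

A fully sorted array has zero inversions, which is why insertion sort is linear on sorted input.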
Code address:
- The code is available on GitHub