This post was originally posted on my blog
Preface
This series covers ten classic sorting algorithms.
- The language used is Java
- The structure: an abstract class `Sort` implements common operations such as `swap` and comparison (for example, to swap two values you just pass in the indices). All concrete sort classes inherit from the abstract class `Sort`, so we can focus on the algorithm itself.
```java
/*
 * Return value == 0: array[i1] == array[i2]
 * Return value  < 0: array[i1]  < array[i2]
 * Return value  > 0: array[i1]  > array[i2]
 */
protected int cmp(int i1, int i2) {
    return array[i1].compareTo(array[i2]);
}

protected int cmp(T v1, T v2) {
    return v1.compareTo(v2);
}

protected void swap(int i1, int i2) {
    T tmp = array[i1];
    array[i1] = array[i2];
    array[i2] = tmp;
}
```
What is Shell sort
- Shellsort, also known as the diminishing increment sort, is a more efficient, improved version of insertion sort. Shell sort is an unstable sorting algorithm.
Improvement
Shell sort is an improvement based on the following two properties of insertion sort:
- Insertion sort is highly efficient on almost-sorted data: it can approach linear time
- But insertion sort is generally inefficient, because it moves data only one position at a time
History of Shell sort
Shell sort, named after its designer Donald Shell, was published in 1959. Some older textbooks and reference manuals call the algorithm Shell-Metzner, adding Marlene Metzner Norton's name, but according to Metzner herself, "I did nothing for the algorithm, and my name should not appear in the algorithm's name."
Algorithm implementation
- The original implementation requires O(n²) comparisons and swaps in the worst case. A minor modification by V. Pratt improves the worst case to O(n log² n). This is slightly worse than the O(n log n) of the best comparison sorts.
- Shell sort improves on insertion sort by dividing the elements to be compared into several groups. This lets an element take one big step toward its final position at a time. The algorithm then sorts with smaller and smaller steps. The last pass is an ordinary insertion sort, but by that point the data is almost sorted (the case where insertion sort is fast).
- Suppose a small value sits at the end of an array otherwise sorted in ascending order. With an O(n²) sort (bubble sort or insertion sort) it may take close to n comparisons and swaps to move that value into place. Shell sort moves data with larger steps, so the small value reaches the right place after only a few comparisons and swaps.
- A more intuitive way to picture Shell sort is to lay the array out in a table and sort the columns (by insertion sort). Repeat the process, each time with fewer, longer columns, until only one column remains. The table is just a mental aid: the algorithm itself sorts the original array, using an increased index step (i += step_size instead of i++).
- For example, take the numbers [13 14 94 33 82 25 59 94 65 23 45 27 73 25 39 10] and start with a step of 5. We can describe this pass by laying the list out in a table with five columns:
```
13 14 94 33 82
25 59 94 65 23
45 27 73 25 39
10
```
Then we sort each column:
```
10 14 73 25 23
13 27 94 33 39
25 59 94 65 82
45
```
which gives [10 14 73 25 23 13 27 94 33 39 25 59 94 65 82 45]. At this point 10 has already moved to its correct position. Then sort with a step of 3:
```
10 14 73
25 23 13
27 94 33
39 25 59
94 65 82
45
```
After sorting, it becomes:
```
10 14 13
25 23 33
27 25 59
39 65 73
45 94 82
94
```
Finally sort with a step of 1 (an ordinary insertion sort).
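The column-wise passes above can be run directly as gap passes over a plain int array. This is a minimal standalone sketch (the class and method names are mine, not the series' `Sort` class):

```java
import java.util.Arrays;

public class GapPassDemo {
    // One pass: insertion-sort every "column" of elements that are `gap` apart.
    static void gapPass(int[] a, int gap) {
        for (int begin = gap; begin < a.length; begin++) {
            int cur = begin;
            int v = a[cur];
            while (cur >= gap && v < a[cur - gap]) {
                a[cur] = a[cur - gap]; // shift instead of swap
                cur -= gap;
            }
            a[cur] = v;
        }
    }

    public static void main(String[] args) {
        int[] a = {13, 14, 94, 33, 82, 25, 59, 94, 65, 23, 45, 27, 73, 25, 39, 10};
        gapPass(a, 5); // the step-5 pass from the tables above
        System.out.println(Arrays.toString(a));
        // [10, 14, 73, 25, 23, 13, 27, 94, 33, 39, 25, 59, 94, 65, 82, 45]
        gapPass(a, 3); // the step-3 pass
        gapPass(a, 1); // final ordinary insertion sort
        System.out.println(Arrays.toString(a));
    }
}
```

Note that a pass walks the array left to right in steps of 1 rather than finishing one column before the next; since each column is sorted independently, the result is the same as sorting column by column.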
Algorithm stability
- Shell sort is not a stable sorting algorithm.
Is it in-place
- What is an in-place algorithm?
- One that does not depend on extra resources (or depends on only a few), relying only on overwriting the input with the output
- A space complexity of O(1) can be considered in-place
- An algorithm that is not in-place is called out-of-place
- Shell sort is in-place
Time and space complexity
- Best time complexity: O(n)
- Worst time complexity: O(n^(4/3)) ~ O(n²), depending on the step sequence
- Average time complexity: depends on the step sequence
- Space complexity: O(1)
Step sequence
- The step sequence Shell himself proposed is 1, 2, 4, 8, 16, 32, 64, …; its worst-case time complexity is O(n²).
- The best known step sequence, with worst-case time complexity O(n^(4/3)), was proposed by Robert Sedgewick in 1986:
- 1, 5, 19, 41, 109, …
- Sedgewick's step sequence can be implemented as follows
```java
private List<Integer> sedgewickStepSequence() {
    List<Integer> stepSequence = new LinkedList<>();
    int k = 0, step = 0;
    while (true) {
        if (k % 2 == 0) { // even k: 9 * (4^(k/2) - 2^(k/2)) + 1
            int pow = (int) Math.pow(2, k >> 1);
            step = 1 + 9 * (pow * pow - pow);
        } else {          // odd k: 8 * 2^((k-1)/2) * 2^((k+1)/2) - 6 * 2^((k+1)/2) + 1
            int pow1 = (int) Math.pow(2, (k - 1) >> 1);
            int pow2 = (int) Math.pow(2, (k + 1) >> 1);
            step = 1 + 8 * pow1 * pow2 - 6 * pow2;
        }
        if (step >= array.length) break;
        stepSequence.add(0, step); // prepend, so the largest step comes first
        k++;
    }
    return stepSequence;
}
```
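To see what the formula produces, here is a standalone version of the same computation (class and method names are mine) for an array length of 100:

```java
import java.util.LinkedList;
import java.util.List;

public class SedgewickSteps {
    // Sedgewick's sequence (1, 5, 19, 41, 109, ...) truncated below n, largest step first.
    static List<Integer> steps(int n) {
        List<Integer> seq = new LinkedList<>();
        int k = 0;
        while (true) {
            int step;
            if (k % 2 == 0) {
                int pow = (int) Math.pow(2, k >> 1);
                step = 1 + 9 * (pow * pow - pow);
            } else {
                int pow1 = (int) Math.pow(2, (k - 1) >> 1);
                int pow2 = (int) Math.pow(2, (k + 1) >> 1);
                step = 1 + 8 * pow1 * pow2 - 6 * pow2;
            }
            if (step >= n) break;
            seq.add(0, step); // prepend: largest step first
            k++;
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(steps(100)); // [41, 19, 5, 1]
    }
}
```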
Code
The simplest implementation uses Shell's own step sequence 1, 2, 4, 8, 16, 32, … (the array length halved repeatedly):
```java
import java.util.ArrayList;
import java.util.List;

public class ShellSort<T extends Comparable<T>> extends Sort<T> {

    @Override
    protected void sort() {
        List<Integer> stepSequence = shellStepSequence();
        for (Integer step : stepSequence) {
            sort(step);
        }
    }

    // Insertion-sort each column whose elements are `step` apart:
    // column col holds col, col + step, col + 2*step, ...
    private void sort(int step) {
        for (int col = 0; col < step; col++) {
            for (int begin = col + step; begin < array.length; begin += step) {
                int cur = begin;
                while (cur > col && cmp(cur, cur - step) < 0) {
                    swap(cur, cur - step);
                    cur -= step;
                }
            }
        }
    }

    // Obtain Shell's step sequence: n/2, n/4, ..., 1
    private List<Integer> shellStepSequence() {
        List<Integer> stepSequence = new ArrayList<>();
        int step = array.length;
        while ((step >>= 1) > 0) {
            stepSequence.add(step);
        }
        return stepSequence;
    }
}
```
Optimization
Ideas:
- Optimize the insertion-sort pass (shift elements instead of swapping, as in the insertion sort post of this series)
- Optimize the step sequence
```java
// Optimized pass: shift elements instead of swapping.
private void sort2(int step) {
    for (int col = 0; col < step; col++) { // sort the col-th column
        for (int begin = col + step; begin < array.length; begin += step) {
            int cur = begin;
            T v = array[cur];
            while (cur > col && cmp(v, array[cur - step]) < 0) {
                array[cur] = array[cur - step];
                cur -= step;
            }
            array[cur] = v;
        }
    }
}
```
The optimized code is:
```java
package YZ.Sort;

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ShellSort<T extends Comparable<T>> extends Sort<T> {

    @Override
    protected void sort() {
        List<Integer> stepSequence = sedgewickStepSequence();
        for (Integer step : stepSequence) {
            sort2(step);
        }
    }

    // Original pass: insertion sort by swapping.
    private void sort(int step) {
        for (int col = 0; col < step; col++) { // col-th column
            // elements of the column: col, col + step, col + 2*step, ...
            for (int begin = col + step; begin < array.length; begin += step) {
                int cur = begin;
                while (cur > col && cmp(cur, cur - step) < 0) {
                    swap(cur, cur - step);
                    cur -= step;
                }
            }
        }
    }

    // Optimized pass: shift elements instead of swapping.
    private void sort2(int step) {
        for (int col = 0; col < step; col++) { // sort the col-th column
            for (int begin = col + step; begin < array.length; begin += step) {
                int cur = begin;
                T v = array[cur];
                while (cur > col && cmp(v, array[cur - step]) < 0) {
                    array[cur] = array[cur - step];
                    cur -= step;
                }
                array[cur] = v;
            }
        }
    }

    // Shell's step sequence: n/2, n/4, ..., 1
    private List<Integer> shellStepSequence() {
        List<Integer> stepSequence = new ArrayList<>();
        int step = array.length;
        while ((step >>= 1) > 0) {
            stepSequence.add(step);
        }
        return stepSequence;
    }

    // Sedgewick's step sequence (1986): 1, 5, 19, 41, 109, ...
    private List<Integer> sedgewickStepSequence() {
        List<Integer> stepSequence = new LinkedList<>();
        int k = 0, step = 0;
        while (true) {
            if (k % 2 == 0) {
                int pow = (int) Math.pow(2, k >> 1);
                step = 1 + 9 * (pow * pow - pow);
            } else {
                int pow1 = (int) Math.pow(2, (k - 1) >> 1);
                int pow2 = (int) Math.pow(2, (k + 1) >> 1);
                step = 1 + 8 * pow1 * pow2 - 6 * pow2;
            }
            if (step >= array.length) break;
            stepSequence.add(0, step); // prepend, so the largest step comes first
            k++;
        }
        return stepSequence;
    }
}
```
Results
Data source: 20,000 random integers between 1 and 80,000, generated by:
```java
Integer[] array = Integers.random(20000, 1, 80000);
```
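`Integers.random` is a test helper from this series; its source is not shown here. A hypothetical stand-in, assuming the signature is (count, min, max) with an inclusive range, might look like:

```java
import java.util.Random;

public class Integers {
    // Hypothetical stand-in: `count` random Integers in [min, max].
    static Integer[] random(int count, int min, int max) {
        Random r = new Random();
        Integer[] a = new Integer[count];
        for (int i = 0; i < count; i++) {
            a[i] = min + r.nextInt(max - min + 1);
        }
        return a;
    }
}
```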
The results are as follows
[MergeSort] stability: true, time: 0.011s (11ms), comparisons: 261,000, swaps: 0
[QuickSort] stability: false, time: 0.012s (12ms), comparisons: 345,500, swaps: 13,200
[HeapSort] stability: false, time: 0.018s (18ms), comparisons: 511,000, swaps: 20,000
[ShellSort] stability: false, time: 0.02s (20ms), comparisons: 430,400, swaps: 0
time: 0.485s (485ms), comparisons: 200 million, swaps: 200 million
[InsertionSort3] stability: true, time: 0.526s (526ms), comparisons: 25800, swaps: 0
[InsertionSort2] stability: true, time: 0.801s (801ms), comparisons: 996322,900, swaps: 0
[InsertionSort1] stability: true, time: 1.281s (1281ms), comparisons: 996322,900, swaps: 996122,900
[BubbleSort2] stability: true, time: 2.271s (2271ms), comparisons: 200 million, swaps: 996122,900
[BubbleSort] stability: true, time: 2.339s (2339ms), comparisons: 200 million, swaps: 99612,900
[BubbleSort1] stability: true, time: 2.403s (2403ms), comparisons: 200 million, swaps: 996122,900
You can see that Shell sort performs very well; compared with insertion sort, the improvement is substantial.
Inversions
What is an inversion?
The inversions of the array [2, 4, 1] are <2, 1> and <4, 1>
The time complexity of insertion sort is proportional to the number of inversions
The more inversions there are, the higher the time complexity of insertion sort
When the number of inversions is very small, insertion sort is particularly efficient
Sometimes it is even faster than O(n log n) sorts such as quicksort
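The definition above can be checked with a brute-force count (a quick sketch of mine; `countInversions` is not part of the series' code):

```java
public class Inversions {
    // Count pairs (i, j) with i < j and a[i] > a[j] — the inversions.
    static int countInversions(int[] a) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            for (int j = i + 1; j < a.length; j++) {
                if (a[i] > a[j]) count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // [2, 4, 1] has the two inversions <2, 1> and <4, 1>
        System.out.println(countInversions(new int[]{2, 4, 1})); // prints 2
    }
}
```

A fully sorted array has zero inversions, which is why insertion sort is linear on sorted input.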
Code address:
- The code is available on GitHub