Introduction to insertion sort

Insertion sort is also a relatively simple sorting algorithm. The idea is to take the first element of the array as an ordered sequence, and then insert the rest of the array, one element at a time, from left to right, into the appropriate position in the ordered sequence on the left, increasing the length of the ordered sequence on the left until the entire array is ordered.

The illustration below

Suppose you want to do an insertion sort on the following array

The first element is added to the ordered sequence

Then take the first element of the unsorted portion and insert that element into the appropriate position in the ordered sequence on the left

The first element of the unsorted part is 3, and it is found that 3 should be inserted before 9

So the result of the first insertion is as follows

Next round, take the first element of the unsorted part, 6

Insert 6 into the appropriate position in the ordered sequence on the left, between 3 and 9

In the next round, take the first element of the unsorted part of the array, 1, and find that 1 should be inserted to the left of 3

Moving on to the next round, 4 is inserted between 3 and 6

In the next round, 2 is inserted between 1 and 3

. Finally, the whole array is sorted

Note: There are many ways to insert a number into an ordered sequence. The simplest is to use a bubbling process, constantly comparing and swapping two adjacent numbers, always swapping the inserted elements to the appropriate position. With this in mind, the code for inserting sort is as follows

public void insertSort(int[] array) {
		/* Continuously insert numbers into the appropriate position, extending the bounds of the ordered sequence */
		for (int sortedSize = 1; sortedSize < array.length; sortedSize++) {
			for (int i = sortedSize - 1; i >= 0 ; i--) {
				if (array[i + 1] < array[i]) {
					swap(array, i + 1, i);
				} else {
					break; }}}}Copy the code

Optimization idea

Thinking a

Change compare and swap to one-way assignment

There is actually room for optimization of the insertion sort described above. Because for each round of sorting, you really just have to find the right place. In the above implementation, finding the right place to insert is done by constantly swapping elements. It takes three assignments to swap two numbers each time. In a train of thought, we can actually put to insert the element of staging first, and then to be inserted in a sequence of elements and orderly element in turn from right to left, as long as the ordered sequence of elements in the larger than to insert elements, will be its position in the ordered sequence moves to the right one, so that the original every time need to exchange, to ChanXiangFu value, I only need one assignment. Finally, after the appropriate location is found, the temporary element to be inserted can be directly assigned. The illustration below

Suppose that the state of the array at some point in the insertion sort process is as follows

The next to be inserted into the elements of 5, it is temporary, then starting from the orderly sequence is the most the right side of the elements, and in turn 5 comparison, found 9 to 5, then 9 one-way right assignment, covering off 5, the logic goes, the location of the original 9 actually is useless, because 9 has been moves to the right one, but due to our temporary 5, so don’t have to worry about lost. Useless elements are highlighted in purple in the image below

And then to the left, the next element in the ordered sequence, 8

If 8 and 5 are compared and 8 is larger than 5, 8 is also assigned to the right, covering the purple 9

Continue to the left,

When comparing 2 and 5, it is found that 2 is less than 5, indicating that an appropriate insertion position has been found. Assign the previously temporary 5 to the position of the purple element

I’m done with this round of insertion sort

Logically speaking, it is equivalent to moving the larger elements in the ordered sequence one place to the right, leaving a space for the element to be inserted. Because it is one-way assignment, compared with the previous way of comparison and exchange, there are fewer assignment operations, so the performance will be improved. But you need an extra space to temporarily store the elements to be inserted. According to this idea, the code is as follows

	public void insertSortV1(int[] array) {
		for (int sortedSize = 1; sortedSize < array.length; sortedSize++) {
			int temp = array[sortedSize];
			int i = sortedSize;
			while (i > 0 && array[i - 1] > temp) {
				array[i] = array[i - 1]; i--; } array[i] = temp; }}Copy the code

Idea 2

Let’s change linear search to binary search

With each round of insertion sort, the key is to find the right insertion position, which is actually a search process. All of the above implementations use linear look-ups, which start at the right end of an ordered sequence and work their way to the left until they find the right place. The time complexity of this search is linear, that is, O(n). We can according to this process of search optimization, will replace the linear search for binary search, binary search is by leaps and bounds to find, take time to find the sequence of the middle, and the target, if less than the target value, continued to look up in the right half part, if it is greater than the target value, will continue to find the part on the left side. Binary search reduces the search space by half each time in order log(n) time.

The optimized code using binary lookup is shown below

	/** * binary search *@paramLeft left boundary (inclusive) *@paramThere is no (inclusive) * */
	private int binarySearch(int[] array, int left, int right, int target) {
		while (left <= right) {
            /* take the middle position */
			int mid = (left + right) >> 1;
			If (arr[mid] > target) select * from mid (arr[mid] > target) select * from mid (arr[mid] > target); If arr[mid] < target, mid (arr + 1) < target (left * 1.2), mid (arr + 1) < target (left * 1.2), mid (arr + 1) < target (left * 1.2) Right = left + 2, mid = left + 1 * 2.1 arr[mid] > target If arr[mid] < target, select left = right = mid; if arr[mid] > target, select left = right = mid; If arr[mid] < target, then left is the insertion position * */
			if (array[mid] > target) {
				/* Look in the left half */
				right = mid - 1;
			} else if (array[mid] < target) {
				/* Search the right half of */
				left = mid + 1;
			} else {
				/* = mid + 1 */
				return mid + 1; }}return left;
	}

	@Override
	public void insertSortBinarySearch(int[] array) {
		for (int sortedSize = 1; sortedSize < array.length; sortedSize++) {
			int temp = array[sortedSize];
			/* Get the position to insert */
			int insertPos = binarySearch(array, 0, sortedSize - 1, temp);
			/* Move all ordered sequences after the insertion position one */
			for (int i = sortedSize; i > insertPos ; i--) {
				array[i] = array[i - 1]; } array[insertPos] = temp; }}Copy the code

Thought three

Hill sorting

Hill sorting is a variation of insertion sort, its core idea is to introduce the concept of a step, hill sorting using step length cut array is divided into many small array, in each small array using insertion sort, then gradually narrowing the step length, the last step to a long, step length for a hill sorting, is the simple insertion sort. Hill sorts have the advantage of being able to move elements across multiple positions, as opposed to simple insert sorts, where you have to compare elements one by one to find the right insertion position.

The hill sort diagram is shown below. The initial step size is usually set to half the array length

The array is then divided into four groups of elements according to gap=4. Elements with positions equal to gap are grouped into the same group. The first group is [1,5,9], (5 and 1 are separated by 4 positions,9 and 5 are separated by another 4 positions…). The second group is [4,7], which is light yellow, the third group is [2,6], which is light purple, and the fourth group is [8,3], which is orange. For each group of elements, a simple insertion sort is carried out separately, then the value of gap is narrowed, and insertion sort is continued for each group until gap is narrowed to 1. The last round is equivalent to a simple insertion sort. Because insert sort can achieve high efficiency when the array is basically ordered, hill sort, by introducing the concept of step size, can allow elements to be inserted across multiple positions without having to operate one by one. After each round of Hill sort, the array becomes more globally ordered.

After the first hill sort, the array state is as follows (only the 3 and 8 positions are swapped)

In the second round of Hill ranking, the gap is halved to 2

Insert and sort the two groups of elements respectively, and the results are as follows (group 1, marked with light blue, no change; Group 2, shown in orange, only switched 3 and 4.)

Finally, gap=1, the last Hill sort (this is equivalent to a simple insertion sort)

So if you look at the last round, you just swap 2 and 3,6 and 7. Swap twice, complete the sort

The above array is sorted by Hill, and the number of swap operations is more than 4 if simple insertion sort is used. This shows the power of Hill sort. According to the above, we can also better understand the meaning of Hill sort, also known as reduced incremental sort. Step size, delta, gap are all the same thing. The efficiency of Hill sorting is also affected by the selection of incremental sequences, which are usually selected with an initial increment of half the array length and then halved each time.

Hill sort code implementation is as follows

	public void shellSort(int[] array) {
		for (int gap = array.length / 2; gap > 0; gap /= 2) {
			for (int i = gap; i < array.length; i++) {
				int j = i;
				int temp = array[i];
                /* Instead of binary search, use linear search */
				while (j - gap >= 0&& array[j - gap] > temp) { array[j] = array[j - gap]; j -= gap; } array[j] = temp; }}}Copy the code

Notice that in the code implementation, it is directly from the position after the gap to the last element of the array, and for each element, the gap is incremented by the direct insertion sort forward. This is a little different from the illustration where you sort each group of elements separately by insertion. The advantage of this is that the entire Hill sort code requires only three levels of loop and is relatively easy to understand. If, as the diagram describes, the insertion sort of one set of elements at a time was looped, the code would require four loops and be difficult to understand. Interested readers can refer to the code below

	public void shellSort(int[] array) {
		int gap = array.length / 2;
		while (gap >= 1) {
			for (int i = 0; i < gap; i++) {
				/* This loop is the insertion sort for each set of elements */
				for (int j = i; j + gap < array.length; j += gap) {
					int pos = j + gap;
					while (pos - gap >= 0) {
						if (array[pos] < array[pos - gap]) {
							swap(array, pos, pos - gap);
							pos -= gap;
						} else {
							break;
						}
					}
				}
			}
			gap /= 2; }}Copy the code

The performance test

Random number groups ranging from 10,000 to 500,000 are used to test the performance of each insertion sorting algorithm, and the broken line graph is drawn as follows

It can be seen that the performance of Hill sort is better than other versions, using binary search optimization second, using one-way assignment optimization third, not optimized performance is the worst.