Overview of sorting algorithms

Sorting is the process of rearranging a group of objects into some logical order. For example, a list of files or messages displayed in date order has probably been arranged by a sorting algorithm. In the early days of computing, it was thought that about 30% of all computing cycles were spent on sorting. If the ratio is lower today, one reason might be that sorting algorithms are more efficient, not that sorting is less important. The widespread use of computers has made data ubiquitous, and the first step in making sense of data is usually to sort it. Almost all computer systems provide a variety of sorting implementations for both the system and its users.

Even if you only ever call the sorting functions in the standard library, there are three practical reasons to learn sorting algorithms:

  1. They are a necessary skill for IT professionals and a staple of technical interviews at Internet companies;

  2. Similar techniques can be effective for other types of problems;

  3. Sorting algorithms are often the first step in solving other problems.

Sorting plays an important role in commercial data processing and in modern scientific computing. It is applied in transaction processing, combinatorial optimization, astrophysics, molecular dynamics, linguistics, genomics, weather forecasting, and many other fields. One sorting algorithm (quicksort) has even been hailed as one of the top ten algorithms of 20th-century science and engineering.

In classic data structures and algorithms courses there are ten sorting algorithms: bubble sort, simple selection sort, simple insertion sort, merge sort, heap sort, quicksort, Shell sort, counting sort, radix sort, and bucket sort.

Quicksort and merge sort come up most often in interviews, and interviewers frequently ask candidates to write them on the spot, so the code for both should be at your fingertips. For the other sorts, you should be able to compare their pros and cons, explain the idea behind each algorithm and its typical use cases, and know each algorithm's time and space complexity.

Here are the ten algorithms, ordered from easy to difficult:

1. Bubble sort

Bubble sort is a simple sorting algorithm. It repeatedly passes through the sequence to be sorted, comparing adjacent elements two at a time and swapping them if they are in the wrong order. The passes are repeated until no more swaps are needed, at which point the sequence is sorted. The algorithm gets its name from the way smaller (or larger) elements slowly "float" to the top of the sequence through repeated swaps.

Basic ideas:

1. Compare adjacent elements. If the first is larger (or smaller) than the second, swap them;

2. Do the same for each pair of adjacent elements, from the first pair at the start to the last pair at the end; after this pass, the last element is the largest (or smallest);

3. Repeat the above steps for all elements except those already settled at the end.

Repeat steps 1 to 3 until no swaps are needed and the sequence is sorted.
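As a concrete reference, here is a minimal Python sketch of these steps (the function name `bubble_sort` is mine; it sorts ascending, and flipping the comparison to `<` gives the descending order used in the example below):

```python
def bubble_sort(a):
    """Bubble sort in place, ascending; use `<` instead of `>` for descending."""
    n = len(a)
    for i in range(n - 1):              # after pass i, the last i+1 slots are final
        swapped = False
        for j in range(n - 1 - i):      # compare each pair of adjacent elements
            if a[j] > a[j + 1]:         # wrong order: swap
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:                 # a full pass with no swaps: already sorted
            break
    return a
```

The `swapped` flag implements the stopping rule above: once a complete pass makes no exchanges, the sequence is sorted and the remaining passes can be skipped.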

*Bubble sort (descending) example:*

(Figure: the original array and the state after each pass.)

2. Simple selection sort

The idea of selection sort is somewhat like bubble sort: both place the smallest (or largest) element at one end first. But the process differs. Bubble sort works by comparing and swapping neighbors, while selection sort selects from the whole remaining sequence.

For example, to sort the unordered sequence 5, 3, 8, 6, 4 by simple selection in ascending order, you first find the smallest number in the sequence and swap it with 5: you choose 3, swap it with 5, and get 3, 5, 8, 6, 4. You continue selecting and swapping in the rest of the sequence until it is fully ordered. In fact, you can think of selection sort as an optimization of bubble sort: both have the same goal, but selection sort swaps only once it has found the minimum, which greatly reduces the number of swaps.

*Specific steps:*

1. Find the largest (or smallest) element in the array;

2. Swap it with the first element of the array (if the first element is already the largest or smallest, it is swapped with itself);

3. Find the largest (or smallest) of the remaining elements and swap it with the second element of the array, and so on, until the entire array is sorted.
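A minimal ascending Python sketch of these steps (the function name is mine); note that each pass performs at most one swap, after the minimum has been found:

```python
def selection_sort(a):
    """Selection sort in place, ascending; flip the comparison for descending."""
    n = len(a)
    for i in range(n - 1):
        m = i                           # index of the smallest remaining element
        for j in range(i + 1, n):
            if a[j] < a[m]:
                m = j
        if m != i:                      # a single swap once the minimum is known
            a[i], a[m] = a[m], a[i]
    return a
```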

*Simple selection sort (descending) example:*

(Figure: the raw array and the state after each pass.)

3. Simple insertion sort

Insertion sort works not by swapping positions but by comparing to find where each element should be inserted. Most of us have played cards, especially with a large hand. How do you tidy up the cards you are holding? You pick up a card and insert it into the right place among the cards already in your hand. Insertion sort follows the same principle.

For example, suppose we receive the cards 5, 3, 4, 8, 6. We receive 5 first; on its own it is already in place, so there is nothing to sort. Then we get 3 and put it in front of the 5, moving the 5 back one place, which gives 3, 5. Then we get a 4. Now what do we do? Insert the 4 in front of the 5 and move the 5 one place back, giving 3, 4, 5, and so on.

*Specific steps:*

  • For each unsorted element, scan the sorted sequence from back to front to find the appropriate position and insert it.

  • To make room for the element being inserted, shift the sorted elements after the insertion position one place to the right.

  • The time insertion sort needs depends on the initial order of the input. For example, sorting a large array whose elements are already ordered (or nearly ordered) is much faster than sorting a randomly ordered array or one in reverse order.

In general, insertion sort is very efficient for partially ordered arrays, and it works well for small arrays.
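Here is a minimal ascending Python sketch of the steps above (the function name is mine); the inner loop is the back-to-front scan and right shift just described:

```python
def insertion_sort(a):
    """Insertion sort in place, ascending; flip the comparison for descending."""
    for i in range(1, len(a)):
        key = a[i]                      # next unsorted element
        j = i - 1
        while j >= 0 and a[j] > key:    # scan the sorted part from back to front
            a[j + 1] = a[j]             # shift larger elements one slot right
            j -= 1
        a[j + 1] = key                  # insert into the freed position
    return a
```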

*Simple insertion sort (descending) example:*

(Figure: the raw array and the state after each pass.)

4. Shell sort

Shell sort is a fast sorting algorithm based on insertion sort (learn insertion sort first and understand its basic idea). Insertion sort is slow for large out-of-order arrays because elements can only move one position at a time from one end of the array toward the other. For example, if the smallest element happens to be at the end of the array, it takes n-1 moves to bring it to its correct position.

Shell sort improves on insertion sort to speed things up; it is also known as diminishing increment sort, and it was one of the first algorithms to break the O(n^2) barrier.

Shell sort groups the array by a certain increment (gap) and applies direct insertion sort within each group. As the increment decreases, each group contains more and more elements; when the increment shrinks to 1, the whole array becomes a single group and the sort completes. The successively smaller increments form an increment sequence.

*Shell sort (descending) example:*

Suppose we compute the increments as gap = array length / 2 and keep shrinking them with gap = gap / 2; the resulting increment sequence is {7, 3, 1}.


When the third increment is 1, the array produced by the second pass forms a single group, and one final insertion sort yields the fully sorted array.
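Putting the pieces together, here is a minimal ascending Python sketch using the gap = gap/2 scheme described above (the function name is mine); each pass is a gapped insertion sort:

```python
def shell_sort(a):
    """Shell sort in place, ascending, with Shell's increments N/2, N/4, ..., 1."""
    n = len(a)
    gap = n // 2
    while gap > 0:
        for i in range(gap, n):         # gapped insertion sort of each group
            key = a[i]
            j = i - gap
            while j >= 0 and a[j] > key:
                a[j + gap] = a[j]       # shift within the gap-separated group
                j -= gap
            a[j + gap] = key
        gap //= 2                       # shrink the increment
    return a
```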

The increment sequence in Shell sort

Under the early, large increments, each subsequence is small and direct insertion sort handles it efficiently. As the increments shrink, the subsequences grow larger, but because the whole sequence becomes more and more ordered, insertion sort remains efficient.

In theory, any decreasing sequence of increments whose last value is 1 can serve as an increment sequence. Is there a sequence that needs relatively few comparisons and moves, giving asymptotically optimal time complexity regardless of the number of records to be sorted? At present it cannot be proved mathematically that any particular sequence is the "best."

*Common increment sequences:*

Shell's sequence: {N/2, (N/2)/2, ..., 1}, where N is the length of the original array. This is the most common sequence, but not the best;

Hibbard's sequence: {2^k - 1, ..., 7, 3, 1};

Sedgewick's sequence: {..., 109, 41, 19, 5, 1}, produced by interleaving the terms 9*4^k - 9*2^k + 1 and 4^k - 3*2^k + 1 (keeping the positive values).
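For illustration, here are small Python helpers (hypothetical names, based on the formulas above) that generate the Hibbard and Sedgewick increments below a given array length:

```python
def hibbard_gaps(n):
    """Hibbard increments 2^k - 1 that are smaller than n, largest first."""
    gaps, k = [], 1
    while (1 << k) - 1 < n:
        gaps.append((1 << k) - 1)
        k += 1
    return gaps[::-1]

def sedgewick_gaps(n):
    """Sedgewick increments below n, interleaving 9*4^k - 9*2^k + 1
    and 4^k - 3*2^k + 1 (positive values only), largest first."""
    gaps, k = set(), 0
    while True:
        g1 = 9 * 4**k - 9 * 2**k + 1
        g2 = 4**k - 3 * 2**k + 1
        if g1 >= n and g2 >= n:
            break
        for g in (g1, g2):
            if 0 < g < n:
                gaps.add(g)
        k += 1
    return sorted(gaps, reverse=True)

# e.g. hibbard_gaps(100) -> [63, 31, 15, 7, 3, 1]
#      sedgewick_gaps(100) -> [41, 19, 5, 1]
```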

5. Merge sort

Merge sort is an efficient sorting algorithm built on the merge operation and a typical application of divide and conquer: already ordered subsequences are merged to obtain a completely ordered sequence. That is, first make each subsequence ordered, then make the whole sequence ordered segment by segment.

Merging two ordered lists into a single ordered list is called 2-way merging; merging more than two at once is multi-way merging.

For a given data set, recursion and divide and conquer split the sequence into smaller and smaller halves; once the halves are sorted, they are merged recursively into larger and larger ordered sequences.

To improve performance, implementations sometimes switch to another sorting algorithm, such as insertion sort, once a half falls below a certain size (such as 15 elements).
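As a sketch, here is a minimal top-down merge sort in Python (ascending; it returns a new list rather than sorting in place, and the function name is mine):

```python
def merge_sort(a):
    """Top-down merge sort, ascending; returns a new sorted list."""
    if len(a) <= 1:                     # a production version might switch to
        return list(a)                  # insertion sort below ~15 elements
    mid = len(a) // 2
    left = merge_sort(a[:mid])          # sort each half recursively
    right = merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # merge the two ordered halves
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])             # append whichever half has leftovers
    merged.extend(right[j:])
    return merged
```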

*Merge sort (descending) example:*

6. Quick sort

Quicksort has been hailed as one of the top ten algorithms in science and engineering of the 20th century. Quicksort is an improvement on bubble sort and a typical application of divide and conquer.

First, select an arbitrary element (such as the first number of the array) as the key element, called the pivot (or base number). Then move all smaller numbers in front of it and all larger numbers behind it; this pass is called a partition operation. In practice, we usually operate directly on the original array.

One partition pass splits the data into two independent parts, with every element of one part smaller than every element of the other. Each part is then quicksorted in the same way, and the whole process proceeds recursively until the entire sequence is ordered.

To improve performance, implementations sometimes switch to another sorting algorithm, such as insertion sort, once a partition falls below a certain size (such as 15 elements).

Quicksort principle

After one partition pass, all numbers smaller than the pivot 48 are to its left and all numbers larger than 48 are to its right. Quicksorting the left and right parts recursively completes the sort.
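Here is a minimal in-place Python sketch that uses the first element as the pivot, as described above (ascending; the function name is mine):

```python
def quick_sort(a, lo=0, hi=None):
    """In-place quicksort, ascending, with the first element as the pivot."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return a
    pivot = a[lo]                       # the base number
    i, j = lo, hi
    while i < j:
        while i < j and a[j] >= pivot:  # scan right-to-left for a smaller element
            j -= 1
        a[i] = a[j]                     # move it into the hole on the left
        while i < j and a[i] <= pivot:  # scan left-to-right for a larger element
            i += 1
        a[j] = a[i]                     # move it into the hole on the right
    a[i] = pivot                        # the pivot lands in its final position
    quick_sort(a, lo, i - 1)            # quicksort the smaller part...
    quick_sort(a, i + 1, hi)            # ...and the larger part recursively
    return a
```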

Choosing the pivot in quicksort

Pivot selection: the optimal case is when the pivot lands exactly in the middle of the unsorted region, which balances the work on both sides and minimizes the recursion depth, but this optimum is generally hard to achieve. There are three common ways to choose the pivot: take the first element of the array; take the last element; or take the median of the first, last, and middle elements (for example, with 4, 5, 6, 7, the first is 4, the last is 7, and the middle is 5; the median of these three numbers is 5, so 5 is chosen as the pivot).
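As an illustration of the median-of-three rule, here is a small hypothetical Python helper; the chosen element can be swapped to the front so a first-element-pivot quicksort (like the sketch above) picks it up:

```python
def median_of_three(a, lo, hi):
    """Return the index of the median of a[lo], a[mid], a[hi]."""
    mid = (lo + hi) // 2
    x, y, z = a[lo], a[mid], a[hi]
    if (y <= x <= z) or (z <= x <= y):
        return lo
    if (x <= y <= z) or (z <= y <= x):
        return mid
    return hi

# Usage: move the median to the front before partitioning.
# m = median_of_three(a, lo, hi)
# a[lo], a[m] = a[m], a[lo]
```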

Dual-pivot quicksort: a variant that uses two pivots to partition the array into three parts; experiments show it saves roughly 10% of running time compared with classic quicksort.
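The text does not give an implementation, but a common scheme is Yaroslavskiy's three-way partition; here is a minimal Python sketch under that assumption (the function name is mine):

```python
def dual_pivot_quicksort(a, lo=0, hi=None):
    """Quicksort with two pivots (Yaroslavskiy-style three-way partition)."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return a
    if a[lo] > a[hi]:                   # ensure p <= q
        a[lo], a[hi] = a[hi], a[lo]
    p, q = a[lo], a[hi]                 # the two pivots
    lt, gt, i = lo + 1, hi - 1, lo + 1
    while i <= gt:
        if a[i] < p:                    # belongs in the left part
            a[i], a[lt] = a[lt], a[i]
            lt += 1
            i += 1
        elif a[i] > q:                  # belongs in the right part
            a[i], a[gt] = a[gt], a[i]
            gt -= 1                     # the swapped-in element is unexamined
        else:                           # p <= a[i] <= q: middle part
            i += 1
    lt -= 1
    gt += 1
    a[lo], a[lt] = a[lt], a[lo]         # place pivot p in its final slot
    a[hi], a[gt] = a[gt], a[hi]         # place pivot q in its final slot
    dual_pivot_quicksort(a, lo, lt - 1)
    dual_pivot_quicksort(a, lt + 1, gt - 1)
    dual_pivot_quicksort(a, gt + 1, hi)
    return a
```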