Today we summarize two excellent sorting algorithms: merge sort and quicksort.
First, both of them build on two important ideas: recursion and divide-and-conquer. I won’t go into recursion here.
Divide and conquer: as the name implies, this is a very common idea in sorting, but it is also a good way to solve problems in other scenarios and even in everyday life. When we encounter a big problem and don’t know where to start, we often divide it into several small pieces; once we have dealt with each small piece, we combine the results and the big problem is solved. Similarly, we can take advantage of the divide-and-conquer idea to improve performance when dealing with sorting problems.
So let’s start by summarizing merge sort.
Merge sort (top down, followed by a simple bottom-up explanation)
The idea of merge sort is actually pretty simple: two steps, divide and conquer.
When we’re dealing with a large array, the bubble, insertion, and selection sorts we learned before are always a little too slow. Shell sort works, but it’s not the best choice either, because it takes sub-quadratic rather than linearithmic time. If we have enough space in memory, we can trade increased space complexity for decreased time complexity so that sorting can be done faster.
So in merge sort, we split the array we’re going to sort into two parts, which we’ll call Left and Right. Suppose we’ve managed to sort both of them, so that the subarrays Left and Right are each ordered. If Left[0] < Right[0], place Left[0] at index 0 of the new array; otherwise place Right[0] there. Then compare the front elements that remain (say Left[1] and Right[0]), and so on: proceeding this way in ascending (or descending) order, we can copy all of Left and Right into the new array in order, and at that point the sort of the array is complete.
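As a concrete sketch of that merge step (the method name `merge` and the temporary-array approach are my own illustrative choices, not a fixed API):

```java
// Merge two sorted halves arr[lo..mid] and arr[mid+1..hi] into ascending order.
static void merge(int[] arr, int lo, int mid, int hi) {
    int[] tmp = java.util.Arrays.copyOfRange(arr, lo, hi + 1); // working copy
    int i = lo, j = mid + 1;                 // i walks Left, j walks Right
    for (int k = lo; k <= hi; k++) {
        if (i > mid)                arr[k] = tmp[j++ - lo];    // Left used up
        else if (j > hi)            arr[k] = tmp[i++ - lo];    // Right used up
        else if (tmp[i - lo] <= tmp[j - lo]) arr[k] = tmp[i++ - lo];
        else                        arr[k] = tmp[j++ - lo];
    }
}
```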
So how do we sort the Left and Right arrays themselves? In fact, merge sort does not stop at dividing the array into two pieces: it keeps dividing until it reaches the smallest pieces, the individual elements, which are trivially ordered, and then merges them back up. This kind of processing is naturally realized by recursion.
Let’s explain it in detail with pictures.
We can see that merge sort breaks the array up a finite number of times and then merges the pieces back together, after which the entire array is ordered. When we implement merge sort with recursion, we don’t need to pay too much attention to how every step operates; we just need to get the general idea clear.
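A minimal sketch of the top-down recursion, reusing the `merge` step above (names are illustrative):

```java
// Top-down merge sort: split until subarrays hold one element, then merge
// the sorted halves back together on the way up the recursion.
static void mergeSort(int[] arr, int lo, int hi) {
    if (lo >= hi) return;            // 0 or 1 elements: already ordered
    int mid = lo + (hi - lo) / 2;    // midpoint, written to avoid overflow
    mergeSort(arr, lo, mid);         // sort Left
    mergeSort(arr, mid + 1, hi);     // sort Right
    merge(arr, lo, mid, hi);         // combine the two sorted halves
}
```

Calling `mergeSort(arr, 0, arr.length - 1)` sorts the whole array.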
Use insertion sort for small arrays
We know that insertion sort is well suited to small arrays and to data that is already in roughly sorted order. When we use merge sort, the recursion keeps going even when the subarray size is very small, and the frequent recursive function calls cost some performance (though in general the loss is bearable). So we can consider a compromise: set a threshold, and when the recursion reaches subarrays of that size, stop recursing and use the faster insertion sort instead, which improves performance. The right threshold is not fixed and requires repeated testing. This is a small addition to merge sort and, according to some measurements, improves performance by about 20 percent; a sketch follows.
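As a sketch (the threshold value and the `insertionSort` helper are my own illustrative choices), the base case of the recursion might become:

```java
static final int M = 7;  // cutoff threshold; the right value needs testing

// Plain insertion sort on the range arr[lo..hi].
static void insertionSort(int[] arr, int lo, int hi) {
    for (int i = lo + 1; i <= hi; i++)
        for (int j = i; j > lo && arr[j] < arr[j - 1]; j--) {
            int t = arr[j]; arr[j] = arr[j - 1]; arr[j - 1] = t;
        }
}

// Merge sort with a cutoff: small ranges go straight to insertion sort.
static void mergeSortWithCutoff(int[] arr, int lo, int hi) {
    if (hi <= lo + M) { insertionSort(arr, lo, hi); return; }
    int mid = lo + (hi - lo) / 2;
    mergeSortWithCutoff(arr, lo, mid);
    mergeSortWithCutoff(arr, mid + 1, hi);
    merge(arr, lo, mid, hi);
}
```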
Bottom-up merge sort
So let’s talk a little about bottom-up merge sort. The idea is to merge the mini-arrays first, and then merge the resulting subarrays in pairs until the entire array is sorted. The code is much cleaner if you implement merging this way.
First we merge elements in pairs (treating each element as an array of size 1), then we merge the resulting arrays of size 2 in pairs into arrays of size 4, and so on. In the final merge the two subarrays may not be the same size, but together they still make up the original array. Instead of splitting from the top down, we merge upward from the smallest pieces.
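A sketch of the bottom-up version, reusing the same `merge` step (the loop bounds follow the standard formulation; the last run in a pass may be shorter, hence the `Math.min`):

```java
// Bottom-up merge sort: merge runs of size 1, then 2, then 4, and so on.
static void mergeSortBU(int[] arr) {
    int n = arr.length;
    for (int size = 1; size < n; size *= 2)              // current run length
        for (int lo = 0; lo + size < n; lo += 2 * size)  // pair of adjacent runs
            merge(arr, lo, lo + size - 1, Math.min(lo + 2 * size - 1, n - 1));
}
```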
So what’s the difference between these two approaches? When the array length is a power of 2, both access the array in the same way and take the same time. At other lengths, the two access the array differently and in a different order.
Bottom-up merge sort is better suited to organizing data in linked lists: we sort sublists of size 1, then size 2, then size 4, and so on. We just need to relink the nodes to sort the list in place; no extra space is needed.
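As a hint of why no extra space is needed, here is a sketch of merging two already-sorted singly linked lists purely by relinking nodes (the `Node` class is illustrative); a bottom-up pass would apply this to sublists of size 1, then 2, then 4, and so on:

```java
// Illustrative singly linked list node.
static class Node {
    int val;
    Node next;
    Node(int val) { this.val = val; }
}

// Merge two already-sorted lists by relinking nodes; nothing is copied
// and no new data nodes are allocated.
static Node mergeLists(Node a, Node b) {
    Node dummy = new Node(0), tail = dummy;  // dummy head simplifies relinking
    while (a != null && b != null) {
        if (a.val <= b.val) { tail.next = a; a = a.next; }
        else                { tail.next = b; b = b.next; }
        tail = tail.next;
    }
    tail.next = (a != null) ? a : b;         // attach whichever list remains
    return dummy.next;
}
```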
The relative speed of Shell sort and merge sort has always been a contested question. Algorithms, 4th edition points out that in practical applications the difference between the two running times is within a constant factor (for Shell sort with a proven increment sequence), so performance depends on the implementation. Theoretically, no one has been able to prove that Shell sort runs in linearithmic time on random data, so on average Shell sort’s running time may grow at a higher rate; in the worst case the gap has been demonstrated, but it has little effect on actual use.
Quick sort
As the most widely used sorting algorithm, quicksort is popular mainly because of its simple implementation. Quicksort has two great advantages:
- Quicksort sorts in place (it requires only a very small auxiliary stack)
- Quicksort is fast: the time to sort an array of length N is proportional to N lg N
Most current sorting algorithms cannot combine these two points. Moreover, quicksort’s inner loop is much shorter than most other sorts’ inner loops, making it faster both theoretically and in practice. But quicksort’s main disadvantage is that it is fragile and must be implemented with great care to avoid performance degradation. There are many examples where some small error makes quicksort’s time consumption quadratic.
Now let’s look at the definition of quicksort.
Quicksort is also a divide-and-conquer sorting method. It divides an array into two subarrays and sorts the two parts independently. Quicksort and merge sort are complementary: merge sort splits the array into two subarrays, sorts them separately, and merges the ordered subarrays to order the whole array; quicksort partitions the array in such a way that once the two subarrays are ordered, the entire array is naturally ordered. This definition comes from Algorithms, 4th edition, and it can seem confusing, so below I’ll use my own summary to describe how quicksort is implemented.
When we face an array to sort, we first pick a base number, or pivot (there is no special restriction on the choice; we usually take the first element of the array). Then we find a way to put the data larger than the pivot on one side of it and the data smaller than the pivot on the other side. Now we have two subarrays, and even though neither is sorted yet, we are sure that every number in one of them is less than every number in the other, which means that once we sort the two subarrays, the whole array is sorted. This is where the divide-and-conquer idea comes in: we take the two subarrays and apply the same idea to each, picking a pivot and dividing the data around it, which gives us four subarrays. Continuing this way, once every one of our subarrays is ordered, the sorting is done. We can do this by recursion.
The above process is represented in the picture below.
So how do we do that in code?
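At the top level, the recursive skeleton might look like this (a sketch; `partition`, sketched next, splits the array around the pivot and returns the pivot’s final index):

```java
// Quicksort skeleton: partition around a pivot, then sort both sides.
static void quicksort(int[] arr, int lo, int hi) {
    if (lo >= hi) return;              // termination condition: 0 or 1 elements
    int p = partition(arr, lo, hi);    // split around the pivot
    quicksort(arr, lo, p - 1);         // sort the smaller-than-pivot side
    quicksort(arr, p + 1, hi);         // sort the larger-than-pivot side
}
```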
Inside the partition step, the left pointer scans the array to the right and stops when it finds data greater than the pivot (assuming ascending order). The right pointer scans to the left and stops when it finds data less than the pivot. When both have stopped, we swap the elements they point to. When the pointers cross, the scan ends; finally we swap the pivot with the element at the boundary, completing the whole process. At this point the array is split by the pivot.
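Here is a sketch of that partitioning scheme, modeled on the partition method in Algorithms, 4th edition; note the two bounds checks that guard against the pointers running off the ends of the array:

```java
// Partition arr[lo..hi] around the pivot arr[lo]; return the pivot's final index.
static int partition(int[] arr, int lo, int hi) {
    int v = arr[lo];                   // pivot: the first element of the range
    int i = lo, j = hi + 1;
    while (true) {
        while (arr[++i] < v)           // left pointer: stop at an item >= pivot
            if (i == hi) break;        // bounds check: pivot may be the maximum
        while (arr[--j] > v)           // right pointer: stop at an item <= pivot
            if (j == lo) break;        // bounds check: pivot may be the minimum
        if (i >= j) break;             // pointers have crossed: scan is over
        int t = arr[i]; arr[i] = arr[j]; arr[j] = t;  // swap the stopped pair
    }
    int t = arr[lo]; arr[lo] = arr[j]; arr[j] = t;    // pivot to the boundary
    return j;                          // arr[lo..j-1] <= arr[j] <= arr[j+1..hi]
}
```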
Let’s summarize some key points of quicksort.
Preventing out-of-bounds access
When the pivot is the largest or smallest element of the array, pay careful attention to how the pointers advance; it is very easy to throw an out-of-bounds array index exception here (this is what the two bounds checks in the partition sketch above guard against).
Terminating the recursion
Because quicksort is implemented with recursion, you need to set the condition that ends it. Forgetting it entirely shows up immediately, but if you set it subtly wrong, the program may well fall into an infinite loop, and since our first instinct when dealing with an infinite loop is to inspect the code inside the loop, rediscovering the problem is very difficult. This requires us to design the recursion’s termination condition first when designing the program; in the quicksort sketch above it is the guard at the top of the method.
Now let’s look at algorithmic improvements to quicksort.
When quicksort is executed many times, or sits in a widely used library function, we should think harder about optimizing its performance. Good optimization can greatly improve performance.
Switch to insertion sort
This is the idea we mentioned earlier: when operating on small arrays, insertion sort may be faster than continuing the recursion (Algorithms points out that the performance is better for arrays of about 5 to 15 elements), so switching to insertion sort on small arrays is also very simple to implement:
```java
// Inside the recursive sort: once the range is small enough, insertion-sort
// it and stop recursing (insertionSort is the helper sketched earlier).
if (left + M >= right) {
    insertionSort(arr, left, right);
    return;
}
```
You just need to change the statement that ends the recursion. M is the array-size parameter: when the distance between the left and right pointers has shrunk to M, we no longer recurse; we insertion-sort the range and return.
Entropy-optimal sorting
In practice we often meet arrays with a lot of duplicate data, such as people’s birthdays. In this case quicksort’s performance is acceptable, but there is huge room for improvement, with the potential to turn linearithmic time into linear time. When an array contains a lot of equal data, it is very likely that after a split the data in a subarray will be entirely equal. Such a subarray needs no further sorting, but because of the recursion the program will still keep splitting it, which is bound to waste a lot of resources.
This optimization is called quicksort with three-way partitioning; the specific implementation is not shown here. When there is time, I’ll write a separate article to summarize it in detail.