Note: this article is a set of quicksort study notes I compiled after reading two experts' blog posts, shared here for everyone to learn from. For more details, see the original posts.

One. What is quicksort

Quicksort was proposed by Turing Award winner C. A. R. Hoare (b. 1934) in 1960.

Quicksort is an improvement on bubble sort. Its basic idea is: in one pass, partition the data to be sorted into two independent parts, such that every element in one part is no greater than any element in the other; then sort each part recursively by the same method, so that the whole sequence becomes ordered.

The whole sorting process takes only three steps:

(1) From the data set, select an element as the "pivot".
(2) Move all elements less than the pivot to its left, and all elements greater than the pivot to its right.
(3) Repeat steps 1 and 2 on the two subsets to the left and right of the pivot, until every subset contains at most one element.

Two. Quicksort time complexity

Norahiko, a Japanese programmer, has written an animated demonstration of sorting algorithms that visually compares the speed of quicksort with that of other sorting methods.

Why is quicksort so fast

According to Liu Weipeng's essay "The Beauty of Mathematics: Why Is Quicksort So Fast?", quicksort is essentially a strategy of asking questions to narrow down the range of possible outcomes.

We’ll start with the basics of guessing numbers and weighing balls, summarize the underlying rules, and then move on to sorting.

1. Guess the number

Suppose there is a number between 1 and 64 and you have to guess it (you may only ask yes/no questions). What strategy guarantees the fewest guesses in the worst case? Binary search, obviously: ask whether it is between 1 and 32, eliminating half of the possibilities, and keep halving. This strategy ensures that no matter how the number hides from you, you will pin it down within log_2(n) questions — here log_2(64) = 6. In algorithmic terms, this worst-case bound is optimal.
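The halving strategy can be sketched in a few lines of JavaScript (the function name and question phrasing are my own, for illustration):

```javascript
// Sketch: count how many yes/no questions the halving strategy needs
// to pin down a secret number in the range [lo, hi].
function guessCount(secret, lo, hi) {
  var questions = 0;
  while (lo < hi) {
    var mid = Math.floor((lo + hi) / 2);
    questions++;            // ask: "is it <= mid?"
    if (secret <= mid) {
      hi = mid;             // yes: keep the lower half
    } else {
      lo = mid + 1;         // no: keep the upper half
    }
  }
  return questions;
}

// With 64 candidates, every secret is found in exactly log2(64) = 6 questions.
```

Because each question splits the remaining range exactly in half, the count is 6 for every secret between 1 and 64 — there is no weak spot.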

Why does this strategy have an optimal worst case? The short answer is that it is balanced. If the strategy is unbalanced — for example, asking "is it between 1 and 10?" — then whenever the answer is "no", more than n/2 possibilities are left to examine.

The essence of this strategy can be summed up as "leave the unknown nowhere to hide". It has no weak spot: every branch of the answer is equally likely. Conversely, once one branch carries more possibilities, you are in trouble whenever the answer falls on that branch. The worst strategy in the guessing game is to guess one by one: "Is it 1? Is it 2? …" — in the worst case it takes 64 guesses. What makes binary search good is that it rules out exactly half the possibilities no matter which answer it gets; it is the best at its worst.

2. Weighing balls

Twelve balls, one of which is defective (either lighter or heavier than the rest). You have a balance scale. What is the minimum number of weighings that always identifies which ball is defective and whether it is light or heavy? This is an age-old puzzle.

Let's review the number-guessing game. To guarantee the fewest guesses in any situation, our strategy was to rule out exactly half the possibilities each time. Carrying the analogy over to the balls: the bad ball could be any one of the 12, which gives 12 possibilities, and in each case it could be light or heavy, so the question "which ball is bad, and is it light or heavy?" has 12 × 2 = 24 possible answers. Each weighing is a question asked of these 24 possibilities, and the balance has three outputs — tilt left, balance, tilt right — so each question has three possible answers, cutting the possibilities into three parts. By the logic of the guessing game, we should make the three branches as equally likely as possible, i.e. divide the possibilities into three roughly equal parts. Then one weighing cuts the possibilities to a third, and three weighings cut them by a factor of 3^3 = 27. Since there are only 24 possibilities, it is in principle perfectly possible to solve the puzzle in three weighings.
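The counting argument above can be written down directly (the helper name is mine; the numbers are from the text):

```javascript
// 12 balls, each possibly light or heavy => 24 possible answers.
var possibilities = 12 * 2;

// Each weighing has 3 outcomes (tilt left, balance, tilt right),
// so k weighings can distinguish at most 3^k answers.
// Find the smallest k with 3^k >= n:
function minWeighings(n) {
  var k = 0;
  while (Math.pow(3, k) < n) k++;
  return k;
}

// minWeighings(24) is 3, since 3^2 = 9 < 24 <= 27 = 3^3,
// so three weighings can suffice in principle.
```

This is only a lower bound from counting; the text goes on to show that a concrete three-weighing strategy actually exists.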

With this guideline for how to weigh, constructing a weighing strategy is not too difficult. First, why isn't the most intuitive weighing — 6 against 6 — optimal? Because with 6 against 6 the probability of the scale balancing is 0; the "balanced" outcome gives us zero information. As noted above, the optimal strategy makes the three states of the balance equally likely, dividing the possible answers evenly.

Keep this as the guideline: choose a weighing such that the number of answers remaining when the scale balances is the same as when it tilts left (or right). In effect this means choosing a weighing for which the three outcomes are equally likely, because the probability of each outcome equals the sum over all answer possibilities that support it (tilt left, tilt right, balance), and each possibility of the answer is equally likely.

MacKay devoted section 4.1 of his book Information Theory: Inference and Learning Algorithms to this ball-weighing problem, and drew a nice graph:

The "1+" in the figure denotes the possibility "ball 1 is heavy". There are 24 possibilities at the start. After weighing 4 against 4, exactly 8 possibilities remain, whichever branch we take — a perfect three-way split. Then construct a second weighing for each branch; with a little arithmetic you can verify that for the second weighing on the first branch — 1, 2, 6 against 3, 4, 5 — the three outcomes of the balance are equally likely (strictly speaking, almost equally likely). That is why this scheme is optimal in the worst case: no branch is a weak spot, and each weighing is guaranteed to cut the possibilities to roughly a third.

3. Sorting (the ideal case)

Viewed through the lens of the previous problems, the nature of sorting can be stated like this: a set of N unsorted numbers has N! possible permutations, of which only one is the arrangement we want (say, smallest to largest); so the sorting problem starts with N! possibilities. Any comparison-based sort has as its basic operation "compare a and b", which is exactly one question in the guessing game. The answer can only be "yes" or "no" — a two-output question can at best cut the possibility space in half — and by the reasoning above, the best cut is half and half. In other words, after comparing a and b we would like each outcome to leave N!/2 possibilities: since every permutation is assumed equally likely, this means the permutations supporting a < b and those supporting a > b should each number N!/2, i.e. a < b should have probability 1/2.

We want every comparison of a and b to be one for which a < b and a > b are equally likely; then, whatever the answer, the possibilities are cut in half — the optimal worst case.

A direct corollary: if every comparison were this perfect, then reducing the N! possible permutations down to 1 would take just log_2(N!) comparisons, and log_2(N!) is approximately N·log N. That is the complexity of quicksort.
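The approximation log_2(N!) ≈ N·log_2(N) is easy to check numerically (helper name mine):

```javascript
// Compute log2(N!) by summing logs, avoiding overflow of N! itself.
function log2Factorial(n) {
  var sum = 0;
  for (var i = 2; i <= n; i++) sum += Math.log2(i);
  return sum;
}

var n = 1000;
var lower = log2Factorial(n);  // ~8530: minimum comparisons in the worst case
var nLogN = n * Math.log2(n);  // ~9966
// log2(N!) is close to (and never above) N*log2(N), so N log N is the
// right order of magnitude for any comparison sort.
```

Since log_2(a·b) = log_2(a) + log_2(b), summing logarithms gives the exact value without ever forming the astronomically large N!.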

4. Sorting (actual situation)

Now consider the quicksort procedure: pick an element (at random, say) as the "pivot", then move every element smaller than the pivot to its left and the rest to its right. By this procedure, the first comparison quicksort makes is between some element and the pivot, and clearly the chances of "greater than" and "less than" are 50/50. It is a perfect comparison.

The second comparison, however, is less clever. Suppose the first comparison found a1 < pivot, and the second compares a2 with the pivot. If a2 > pivot, then the relationship among a1, a2 and the pivot is fully determined — a1 < pivot < a2 — and the number of permutations of the remaining elements is some P (no need to compute it). But what if a2 < pivot? Then the relationship between a1 and a2 is still unknown: that branch contains two cases, a1 < a2 < pivot and a2 < a1 < pivot, each leaving P permutations for the rest, so the branch holds 2P possibilities. In other words, when the answer is "a2 < pivot", 2/3 of the possibilities remain to be examined.
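These branch probabilities can be checked by brute force over three distinct values, taking the last element as the pivot (the variable names are my own):

```javascript
// All 6 orderings of three distinct values, read as [a1, a2, pivot].
var perms = [
  [1, 2, 3], [1, 3, 2], [2, 1, 3],
  [2, 3, 1], [3, 1, 2], [3, 2, 1]
];

// First comparison: a1 vs pivot is a perfect 50/50 split.
var a1Less = perms.filter(function (p) { return p[0] < p[2]; });
// a1Less.length === 3, half of the 6 permutations.

// Second comparison, inside the a1 < pivot branch:
var a2Less = a1Less.filter(function (p) { return p[1] < p[2]; });
// a2Less.length === 2: "a2 < pivot" keeps 2 of the 3 remaining cases,
// an uneven 2/3 split — exactly the imbalance described above.
```

The enumeration confirms the argument: the first question splits evenly, the second does not.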

Worse still, if the second comparison does find a2 < pivot, the third comparison degrades further: by the same reasoning, the branch where a3 < pivot retains 3/4 of the remaining possibilities.

That is why quicksort is not ideally fast: it does not cut the remaining possibilities in half with every comparison.

5. Summary

Thinking of sorting, like number guessing, as narrowing the range of possible outcomes by asking questions makes it clear that the "best questions" are those that divide all the possibilities evenly: whatever the answer, they rule out (k−1)/k of the possibilities (for a question with k possible answers). An unbalanced question always has some answer branch that rules out less than (k−1)/k, and that branch drags down the worst case of the whole strategy.

Three. The concrete implementation of quicksort

Now that we know the idea and principle behind quicksort, let's implement it concretely in JavaScript.

First, define a quickSort function that takes an array.

```javascript
var quickSort = function (arr) {
};
```

Then, check the number of elements in the array and return if it is less than or equal to 1.

```javascript
var quickSort = function (arr) {
  if (arr.length <= 1) { return arr; }
};
```

Next, select the pivot, remove it from the original array, and define two empty arrays to hold the left and right subsets.

```javascript
var quickSort = function (arr) {
  if (arr.length <= 1) { return arr; }
  var pivotIndex = Math.floor(arr.length / 2);
  var pivot = arr.splice(pivotIndex, 1)[0];
  var left = [];
  var right = [];
};
```

We then iterate over the array: elements smaller than the pivot go into the left subset, and the rest go into the right subset.

```javascript
var quickSort = function (arr) {
  if (arr.length <= 1) { return arr; }
  var pivotIndex = Math.floor(arr.length / 2);
  var pivot = arr.splice(pivotIndex, 1)[0];
  var left = [];
  var right = [];
  for (var i = 0; i < arr.length; i++) {
    if (arr[i] < pivot) {
      left.push(arr[i]);
    } else {
      right.push(arr[i]);
    }
  }
};
```

Finally, repeat the process over and over again using recursion to get the sorted array.

```javascript
var quickSort = function (arr) {
  if (arr.length <= 1) { return arr; }
  var pivotIndex = Math.floor(arr.length / 2);
  var pivot = arr.splice(pivotIndex, 1)[0];
  var left = [];
  var right = [];
  for (var i = 0; i < arr.length; i++) {
    if (arr[i] < pivot) {
      left.push(arr[i]);
    } else {
      right.push(arr[i]);
    }
  }
  return quickSort(left).concat([pivot], quickSort(right));
};
```

To use it, simply call quickSort() with an array.
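For example (the input values are arbitrary; the function is repeated here so the snippet runs on its own):

```javascript
var quickSort = function (arr) {
  if (arr.length <= 1) { return arr; }
  var pivotIndex = Math.floor(arr.length / 2);
  var pivot = arr.splice(pivotIndex, 1)[0];
  var left = [];
  var right = [];
  for (var i = 0; i < arr.length; i++) {
    if (arr[i] < pivot) {
      left.push(arr[i]);
    } else {
      right.push(arr[i]);
    }
  }
  return quickSort(left).concat([pivot], quickSort(right));
};

console.log(quickSort([85, 24, 63, 45, 17, 31, 96, 50]));
// [17, 24, 31, 45, 50, 63, 85, 96]
```

Note that splice mutates its argument, so this version consumes the input array; pass a copy if you need to keep the original.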

Source: Ruan Yifeng, Liu Weipeng