• Sorting Algorithms in Python
• Author: Marcus Sanatan
• The Nuggets Translation Project
• Permanent link to this article: github.com/xitu/gold-m…
• Translator: fireairforce
• Proofreaders: JalanJiang, csming1995

Sorting Algorithms in Python

Introduction

Sometimes the data we store or retrieve in an application may be out of order. We may need to reorder the data if we want to process it properly or use it efficiently. Over the years, computer scientists have created many sorting algorithms to process data.

In this article, we’ll look at some popular sorting algorithms, see how they work, and implement them in Python. We will also compare how quickly each of them sorts the elements of a list.

For simplicity, these algorithms sort all the numbers in the list in ascending order. Of course, you can adjust it according to your own needs.

Bubble sort

This simple sorting algorithm iterates over a list, comparing elements in pairs and swapping them until the larger elements “bubble up” to the end of the list and the smaller elements stay at the “bottom.”

Introduction

Let’s start by comparing the first two elements of the list. If the first element is greater than the second, we swap them. If they are already sorted, we leave them as they are. Then we move on to the next pair of elements, compare their values, and swap as needed. This process continues until the last pair of elements in the list.

When the end of the list is reached, the process repeats for every pair of elements. However, this is inefficient. What if the array needs only a single swap? Why should we still iterate n^2 times even though the array is already sorted?

Obviously, in order to optimize the algorithm, we need to stop it when we’re done sorting.

So how do we know we’re done sorting? If the elements are in order, we don’t have to keep swapping. Therefore, whenever values are swapped, we set a flag to True so the process repeats. If no swap occurs, the flag remains False and the algorithm stops.

implementation

After optimization, we can implement bubble sort with the following Python code:

def bubble_sort(nums):
    # We set the flag swapped to True so the loop runs at least once
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(nums) - 1):
            if nums[i] > nums[i + 1]:
                # Swap the elements
                nums[i], nums[i + 1] = nums[i + 1], nums[i]
                # Set the flag to True so we loop again
                swapped = True

# Verify that the algorithm works correctly
random_list_of_nums = [5, 2, 1, 8, 4]
bubble_sort(random_list_of_nums)
print(random_list_of_nums)

The algorithm runs in a while loop that breaks out only when no elements can be swapped. We start by setting swapped to True to ensure that the algorithm can be executed at least once.

Time complexity

In the worst case (when the list is in reverse order), the algorithm must swap each item of the array. At each iteration, the flag value swapped is set to True. Therefore, if we have n elements in the list, we will iterate over each element n times, so the time complexity of bubble sort is O(n^2).

Selection sort

The algorithm divides the list into two parts: the sorted part and the unsorted part. We keep removing the smallest element from the unsorted part of the list and adding it to the sorted part.

introduce

In fact, we don’t need to create a new list for the sorted elements; instead, we treat the leftmost part of the list as the sorted part. We then search the entire list for the smallest element and swap it with the first element.

Now that we know that the first element of the list is ordered, we continue searching for the smallest of the remaining elements and swap it with the second element. This iterates until the element to be checked is the last item in the remaining list.

implementation

def selection_sort(nums):
    # The value of i corresponds to the number of values already sorted
    for i in range(len(nums)):
        # We assume the first item of the unsorted part is the smallest
        lowest_value_index = i
        # This loop iterates over the unsorted items
        for j in range(i + 1, len(nums)):
            if nums[j] < nums[lowest_value_index]:
                lowest_value_index = j
        # Swap the smallest value of the unsorted part with the first unsorted element
        nums[i], nums[lowest_value_index] = nums[lowest_value_index], nums[i]

# Verify that the algorithm works correctly
random_list_of_nums = [12, 8, 3, 20, 11]
selection_sort(random_list_of_nums)
print(random_list_of_nums)

You can see that as i increases, we have fewer and fewer elements left to check.

Time complexity

In the selection sort algorithm, we can easily derive the time complexity by examining the for loops. For a list of n elements, the outer loop iterates n times. The inner loop iterates n-1 times on the first pass, n-2 times on the second, and so on.

The total number of comparisons is (n-1) + (n-2) + … + 1, so the time complexity of selection sort is O(n^2).

Insertion sort

Like selection sort, the algorithm divides the list into sorted parts and unsorted parts. It inserts the iterated elements into the correct position in the sorted list by iterating over the unsorted portion.

introduce

Let’s assume that the first element of the list is sorted. We then move to the next element, which we’ll call x. If x is greater than the first element, we leave it in place and keep iterating. If x is smaller, we copy the value of the first element to the second position and then set the value of the first element to x.

As we work with other elements in the unsorted part, we keep moving the larger elements in the sorted part up until we encounter an element less than x or reach the end of the sorted part, and then place x in the correct place.

implementation

def insertion_sort(nums):
    # We assume the first element is sorted and start iterating from the second
    for i in range(1, len(nums)):
        item_to_insert = nums[i]
        # Keep a reference to the index of the previous element
        j = i - 1
        # While the items in the sorted part are larger than the item to insert, move them forward
        while j >= 0 and nums[j] > item_to_insert:
            nums[j + 1] = nums[j]
            j -= 1
        # Insert the element
        nums[j + 1] = item_to_insert

# Verify that the algorithm works correctly
random_list_of_nums = [9, 1, 15, 28, 6]
insertion_sort(random_list_of_nums)
print(random_list_of_nums)

Time complexity

In the worst case, the array will be sorted in reverse order. The outer for loop in the insertion sort always iterates n-1 times.

In the worst case, the inner loop swaps once on the first pass, twice on the second, and so on. The total number of swaps is 1 + 2 + … + (n - 3) + (n - 2) + (n - 1), which gives insertion sort a time complexity of O(n^2).

Heap sort

This popular sorting algorithm, like insertion sort and selection sort, divides the list into sorted and unsorted parts. It converts the unsorted part of the list into a heap data structure so that we can efficiently determine the largest element.

introduce

We first convert the list into a max heap — a binary tree whose largest element is the root node. We then place that node at the end of the list. Next we rebuild the now-diminished max heap and place the new maximum value before the last item in the list.

We then repeat the heap-building process until all nodes are removed.

implementation

We create a helper function, heapify, to help implement this algorithm:

def heapify(nums, heap_size, root_index):
    # Assume the index of the largest element is the root index
    largest = root_index
    left_child = (2 * root_index) + 1
    right_child = (2 * root_index) + 2

    # If the left child of the root is a valid index and its element is larger
    # than the current largest, update the largest element
    if left_child < heap_size and nums[left_child] > nums[largest]:
        largest = left_child

    # Do the same for the right child of the root
    if right_child < heap_size and nums[right_child] > nums[largest]:
        largest = right_child

    # If the largest element is no longer the root element, swap them
    if largest != root_index:
        nums[root_index], nums[largest] = nums[largest], nums[root_index]
        # Heapify the new root element to ensure it is the largest
        heapify(nums, heap_size, largest)

def heap_sort(nums):
    n = len(nums)

    # Create a max heap from the list
    # The 2nd argument of range means we stop before the element with index -1,
    # i.e. the first element of the list
    # The 3rd argument of range means we iterate backwards, decreasing i by 1
    for i in range(n, -1, -1):
        heapify(nums, n, i)

    # Move the root of the max heap to the end of the list
    for i in range(n - 1, 0, -1):
        nums[i], nums[0] = nums[0], nums[i]
        heapify(nums, i, 0)

# Verify that the algorithm works correctly
random_list_of_nums = [35, 12, 43, 8, 51]
heap_sort(random_list_of_nums)
print(random_list_of_nums)

Time complexity

Let’s first look at the time complexity of the heapify function. In the worst case, the largest element is never the root element, which causes recursive calls to heapify. While recursive calls may seem like a performance drain, remember that we are working with a binary tree.

Visualize a binary tree with three elements; it has a height of 2. Now visualize a binary tree with seven elements; it has a height of 3. The height of the tree grows logarithmically with n, so the heapify function traverses it in O(log(n)) time.

heap_sort iterates through the array n times, calling heapify each time. Therefore, the total time complexity of the heap sort algorithm is O(nlog(n)).
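As an aside (this is not part of the implementation above), Python’s standard library ships a binary-heap module, heapq, that expresses the same idea in a few lines. Because heapq maintains a min-heap rather than the max heap used above, repeatedly popping the smallest element yields ascending order directly; heap_sort_stdlib is our own name for this sketch:

```python
import heapq

def heap_sort_stdlib(nums):
    # heapify turns the list into a min-heap in place, in O(n) time
    heap = list(nums)
    heapq.heapify(heap)
    # Each heappop removes the smallest element and re-sifts the heap
    # in O(log n), so n pops give O(n log n) overall
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heap_sort_stdlib([35, 12, 43, 8, 51]))  # [8, 12, 35, 43, 51]
```

Unlike the heap_sort function above, this sketch returns a new list instead of sorting in place.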

Merge sort

This divide-and-conquer algorithm splits a list into two parts and keeps splitting the rest of the list in half until there is only one element left.

Adjacent elements then become sorted pairs, which are in turn merged and sorted with other pairs. This process continues until we end up with a single sorted list containing every element of the unsorted input list.

introduce

We recursively split the list in half until we get a list of length 1. We then merge each of the pieces that have been partitioned, sorting them in the process.

Sorting is done by comparing the smallest elements in each half. The first element of each list is the first element to be compared. If the first half begins with a smaller value, we add it to the sorted list. We then compare the second minimum of the first half with the first minimum of the second half.

Whenever we take the smaller value from the front of one half, we advance the index of the next item to compare in that half.

Implementation

def merge(left_list, right_list):
    sorted_list = []
    left_list_index = right_list_index = 0

    # We use the list lengths often, so make them variables for convenience
    left_list_length, right_list_length = len(left_list), len(right_list)

    for _ in range(left_list_length + right_list_length):
        if left_list_index < left_list_length and right_list_index < right_list_length:
            # We check which value at the beginning of each list is smaller
            # If the item at the beginning of the left list is smaller, add it to the sorted list
            if left_list[left_list_index] <= right_list[right_list_index]:
                sorted_list.append(left_list[left_list_index])
                left_list_index += 1
            # If the item at the beginning of the right list is smaller, add it to the sorted list
            else:
                sorted_list.append(right_list[right_list_index])
                right_list_index += 1

        # If we have reached the end of the left list, add the elements from the right list
        elif left_list_index == left_list_length:
            sorted_list.append(right_list[right_list_index])
            right_list_index += 1
        # If we have reached the end of the right list, add the elements from the left list
        elif right_list_index == right_list_length:
            sorted_list.append(left_list[left_list_index])
            left_list_index += 1

    return sorted_list

def merge_sort(nums):
    # If the list has only one element, return it
    if len(nums) <= 1:
        return nums

    # Use floor division so the index is an integer
    mid = len(nums) // 2

    # Sort and merge each half
    left_list = merge_sort(nums[:mid])
    right_list = merge_sort(nums[mid:])

    # Merge the sorted lists into a new one
    return merge(left_list, right_list)

# Verify that the algorithm works correctly
random_list_of_nums = [120, 45, 68, 250, 176]
random_list_of_nums = merge_sort(random_list_of_nums)
print(random_list_of_nums)

Note that the merge_sort() function differs from previous sorting algorithms in that it returns a sorted new list rather than sorting an existing list.

Therefore, merge sort requires space to create a new list of the same size as the input list.

Time complexity

Let’s first look at the merge function. It takes two lists and iterates n times, where n is their combined size. The merge_sort function splits the given array in two and recursively sorts the subarrays. Since each recursive input is half of the given array, the depth of recursion, like the height of a binary tree, grows logarithmically with n.

Thus, the overall time complexity of merge sort is O(nlog(n)).

Quick sort

This divide-and-conquer algorithm is the most widely used of the sorting algorithms covered in this article. Used wisely, it can be very efficient, and it does not need the extra space that merge sort requires. We partition the list around a pivot element and sort the values around the pivot.

introduce

Quicksort starts by partitioning the list — picking one value that the list will be sorted around. This value is called the pivot. All elements smaller than the pivot are moved to its left.

The pivot is then in its correct place, and we recursively sort the elements on either side of it until the whole list is in order.

implementation

There are different ways to perform the partitioning in quicksort. Here we use the Hoare partition scheme, devised by Tony Hoare, who also created the quicksort algorithm.

def partition(nums, low, high):
    # We choose the middle element as the pivot
    # Some implementations pick the first or the last element as the pivot
    # Sometimes a random element is used instead
    # There are many more strategies you can choose or create
    pivot = nums[(low + high) // 2]
    i = low - 1
    j = high + 1
    while True:
        i += 1
        while nums[i] < pivot:
            i += 1

        j -= 1
        while nums[j] > pivot:
            j -= 1

        if i >= j:
            return j

        # If the element at i (to the left of the pivot) is larger than the
        # element at j (to the right of the pivot), swap them
        nums[i], nums[j] = nums[j], nums[i]

def quick_sort(nums):
    # Create a helper function that is called recursively
    def _quick_sort(items, low, high):
        if low < high:
            # This is the index after the pivot, where our list is split
            split_index = partition(items, low, high)
            _quick_sort(items, low, split_index)
            _quick_sort(items, split_index + 1, high)

    _quick_sort(nums, 0, len(nums) - 1)

# Verify that the algorithm works correctly
random_list_of_nums = [22, 5, 1, 18, 99]
quick_sort(random_list_of_nums)
print(random_list_of_nums)

Time complexity

The worst case occurs when we always select the minimum or maximum element as the pivot. This creates partitions of size n-1, leading to n-1 recursive calls and a worst-case time complexity of O(n^2).

Although its worst case is worse, quicksort is still heavily used because its average time complexity is much better than that of many other sorting algorithms. Although the partition function contains nested while loops, each element of the array is compared and possibly swapped only once per call, so a single partition runs in O(n).

If a good pivot is chosen, quicksort splits the array into roughly equal halves, so the depth of recursion grows logarithmically with n. Therefore, the average time complexity of the quicksort algorithm is O(nlog(n)).

Python’s built-in sorting functions

While it is beneficial to understand these sorting algorithms, in most Python projects you will probably use sorting functions that are already provided in the language.

We can sort a list in place with sort():

apples_eaten_a_day = [2, 1, 1, 3, 1, 2, 2]
apples_eaten_a_day.sort()
print(apples_eaten_a_day)  # [1, 1, 1, 2, 2, 2, 3]

Or we can use the sorted() function to create a new sorted list:

apples_eaten_a_day_2 = [2, 1, 1, 3, 1, 2, 2]
sorted_apples = sorted(apples_eaten_a_day_2)
print(sorted_apples)  # [1, 1, 1, 2, 2, 2, 3]

Both sort in ascending order, but you can easily sort in descending order by setting the reverse flag to True:

# Sort the list in place in descending order
apples_eaten_a_day.sort(reverse=True)
print(apples_eaten_a_day)  # [3, 2, 2, 2, 1, 1, 1]

# Get a new list sorted in descending order
sorted_apples_desc = sorted(apples_eaten_a_day_2, reverse=True)
print(sorted_apples_desc)  # [3, 2, 2, 2, 1, 1, 1]

Unlike the sorting functions we wrote ourselves, both of these can also sort lists of tuples and objects. The sorted() function can sort any iterable, including lists, strings, tuples, dictionaries, sets, and any custom iterators you create.
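For example, both functions accept a key argument that selects the value to sort by (the list of tuples below is our own illustration, not data from this article):

```python
# Sort (name, age) tuples by age rather than by name
people = [("Ada", 36), ("Linus", 28), ("Grace", 45)]
people_by_age = sorted(people, key=lambda person: person[1])
print(people_by_age)  # [('Linus', 28), ('Ada', 36), ('Grace', 45)]
```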

Both of these sorting functions implement the Timsort algorithm, which is inspired by merge sort and insertion sort.

Speed comparison

To get an idea of how quickly they execute, we generated a list of 5,000 numbers between 0 and 1,000 and timed how long each algorithm took to finish. Each algorithm was run 10 times so we could establish a more reliable performance profile.
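The article does not show the benchmark harness itself, so the sketch below is our own reconstruction of that setup; the name time_algorithm and its parameters are assumptions, not code from the original:

```python
import random
import time

def time_algorithm(sort_fn, size=5000, runs=10):
    # Time sort_fn over `runs` fresh random lists and return the average seconds
    total = 0.0
    for _ in range(runs):
        nums = [random.randint(0, 1000) for _ in range(size)]
        start = time.perf_counter()
        sort_fn(nums)
        total += time.perf_counter() - start
    return total / runs

# Example: time the built-in list.sort for comparison
print(f"built-in sort: {time_algorithm(lambda nums: nums.sort()):.5f} s")
```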

Here are the results, in seconds:

Run      Bubble    Selection  Insertion  Heap     Merge    Quick
1        5.53188   1.23152    1.60355    0.04006  0.02619  0.01639
2        4.92176   1.24728    1.59103    0.03999  0.02584  0.01661
3        4.91642   1.22440    1.59362    0.04407  0.02862  0.01646
4        5.15470   1.25053    1.63463    0.04128  0.02882  0.01860
5        4.95522   1.28987    1.61759    0.04515  0.03314  0.01885
6        5.04907   1.25466    1.62515    0.04257  0.02595  0.01628
7        5.05591   1.24911    1.61981    0.04028  0.02733  0.01760
8        5.08799   1.25808    1.62603    0.04264  0.02633  0.01705
9        5.03289   1.24915    1.61446    0.04302  0.03293  0.01762
10       5.14292   1.22021    1.57273    0.03966  0.02572  0.01606
Average  5.08488   1.24748    1.60986    0.04187  0.02809  0.01715

If you test yourself, you will get different values, but the observed performance model should be the same or similar. Bubble sort is the slowest and worst performing of all algorithms. While it is useful as an introduction to sorting and algorithms, it is not suitable for practical use.

We also notice that quicksort is very fast, it’s almost twice as fast as merge sort, and it doesn’t require extra space to run. Recall that our partitioning is based on the middle element of the list, and different partitioning methods may have different results.

Since insertion sort performs far fewer comparisons than selection sort, insertion sort implementations are usually faster; in our tests, however, selection sort was slightly faster.

Insertion sort swaps elements much more often than selection sort does. If swapping values costs more time than comparing them, this “opposite” result is plausible.

Be careful when choosing a sorting algorithm because it can affect performance.

conclusion

Sorting algorithms give us a number of ways to sort data. We looked at six different algorithms — bubble sort, selection sort, insertion sort, merge sort, heap sort, and quicksort — and their implementation in Python.

The amount of comparison and exchange performed by the algorithm and the environment in which the code is run are key determinants of performance. In real Python applications, it is recommended that we stick with the built-in Python sorting functions because of their flexibility in input and speed.
