- Sorting Algorithms in Python
- Marcus Sanatan
- The Nuggets translation Project
- Permanent link to this article: github.com/xitu/gold-m…
- Translator: fireairforce
- Proofreader: JalanJiang, csming1995
Sorting Algorithms in Python
Introduction
Sometimes the data we store or retrieve in an application may be out of order. We may need to reorder the data if we want to process it properly or use it efficiently. Over the years, computer scientists have created many sorting algorithms to process data.
In this article, we’ll look at some popular sorting algorithms, see how they work, and implement them in Python. We’ll also compare how quickly each of them sorts the elements of a list.
For simplicity, these algorithms sort all the numbers in a list in ascending order. Of course, you can adjust them to your own needs.
Bubble sort
This simple sorting algorithm compares the elements in the list by iterating through the list in pairs and swapping them until the larger elements “bubble” to the end of the list and the smaller elements remain “at the bottom.”
Introduction
Let’s start by comparing the first two elements of the list. If the first element is greater than the second, we swap them. If they are already sorted, we leave them as they are. Then we move on to the next pair of elements, compare their values, and swap as needed. This process continues until the last pair of elements in the list.
When the end of the list is reached, the process repeats for each pair of elements. However, doing this blindly is inefficient. What if the array only needs a single swap? Why would we still iterate roughly n^2 times, even though the array is already sorted?
Obviously, in order to optimize the algorithm, we need to stop it when we’re done sorting.
So how do we know we’re done sorting? If the elements are in order, we don’t have to keep swapping. Therefore, whenever values are swapped we set a flag to True so the process repeats; if no swap occurs on a pass, the flag stays False and the algorithm stops.
implementation
After optimization, we can implement bubble sort with the following Python code:
```python
def bubble_sort(nums):
    # We set swapped to True so the loop runs at least once
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(nums) - 1):
            if nums[i] > nums[i + 1]:
                # Swap the elements
                nums[i], nums[i + 1] = nums[i + 1], nums[i]
                # Set the flag to True so we'll loop again
                swapped = True

# Verify it works
random_list_of_nums = [5, 2, 1, 8, 4]
bubble_sort(random_list_of_nums)
print(random_list_of_nums)
```
The algorithm runs in a while loop that breaks out only when no elements can be swapped. We start by setting swapped to True to ensure that the algorithm can be executed at least once.
Time complexity
In the worst case (when the list is in reverse order), the algorithm must swap each item of the array. At each iteration, the flag value swapped is set to True. Therefore, if we have n elements in the list, we will iterate over each element n times, so the time complexity of bubble sort is O(n^2).
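To make that worst case concrete, here’s a quick sketch (the comparison counter is added purely for illustration and isn’t part of the implementation above):

```python
def bubble_sort_counted(nums):
    """Bubble sort with a counter for the number of comparisons made."""
    comparisons = 0
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(nums) - 1):
            comparisons += 1
            if nums[i] > nums[i + 1]:
                nums[i], nums[i + 1] = nums[i + 1], nums[i]
                swapped = True
    return comparisons

# A reversed list of n = 5 elements takes n passes of n - 1 comparisons each
print(bubble_sort_counted([5, 4, 3, 2, 1]))  # 20, i.e. n * (n - 1)
```

Note that the swapped flag still pays off on friendly input: an already sorted five-element list needs only a single pass of 4 comparisons.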
Selection sort
The algorithm divides the list into two parts: the sorted part and the unsorted part. We keep removing the smallest element from the unsorted part of the list and adding it to the sorted part.
Introduction
In fact, we don’t need to create a new list for the sorted elements; what we do is make the leftmost part of the list the sorted part. We then search for the smallest element in the entire list and swap it with the first element.
Now that we know that the first element of the list is ordered, we continue searching for the smallest of the remaining elements and swap it with the second element. This iterates until the element to be checked is the last item in the remaining list.
Implementation
```python
def selection_sort(nums):
    # The value of i corresponds to how many values have been sorted
    for i in range(len(nums)):
        # We assume that the first item of the unsorted segment is the smallest
        lowest_value_index = i
        # This loop iterates over the unsorted items
        for j in range(i + 1, len(nums)):
            if nums[j] < nums[lowest_value_index]:
                lowest_value_index = j
        # Swap the smallest value of the unsorted segment with the first unsorted element
        nums[i], nums[lowest_value_index] = nums[lowest_value_index], nums[i]

# Verify the algorithm works
random_list_of_nums = [12, 8, 3, 20, 11]
selection_sort(random_list_of_nums)
print(random_list_of_nums)
```
You can see that as i increases, we have fewer and fewer elements left to check.
Time complexity
In the selection sort algorithm, we can get the time complexity simply by inspecting the for loops. For a list of n elements, the outer loop iterates n times. The inner loop iterates n - 1 times when i is 0, n - 2 times when i is 1, and so on.
The total number of comparisons is (n - 1) + (n - 2) + … + 1, giving selection sort a time complexity of O(n^2).
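We can check that count directly with a small sketch (the counter is added only for illustration):

```python
def selection_sort_comparisons(nums):
    """Selection sort that returns how many comparisons it performed."""
    comparisons = 0
    for i in range(len(nums)):
        lowest_value_index = i
        for j in range(i + 1, len(nums)):
            comparisons += 1
            if nums[j] < nums[lowest_value_index]:
                lowest_value_index = j
        nums[i], nums[lowest_value_index] = nums[lowest_value_index], nums[i]
    return comparisons

# For n = 5: 4 + 3 + 2 + 1 = 10 comparisons, i.e. n * (n - 1) / 2
print(selection_sort_comparisons([12, 8, 3, 20, 11]))  # 10
```

Unlike bubble sort, the count is the same regardless of input order, since selection sort always scans the entire unsorted segment.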
Insertion sort
Like selection sort, the algorithm divides the list into sorted parts and unsorted parts. It inserts the iterated elements into the correct position in the sorted list by iterating over the unsorted portion.
Introduction
Let’s assume that the first element of the list is sorted. We then go to the next element, which we’ll call x. If x is greater than the first element, we leave it in place. If x is smaller, we copy the value of the first element to the second position, and then set the first element to x.
As we work with other elements in the unsorted part, we keep moving the larger elements in the sorted part up until we encounter an element less than x or reach the end of the sorted part, and then place x in the correct place.
Implementation
```python
def insertion_sort(nums):
    # We assume the first element is sorted, so we start iterating from the second element
    for i in range(1, len(nums)):
        item_to_insert = nums[i]
        # Keep a reference to the index of the previous element
        j = i - 1
        # Move all items of the sorted segment forward if they are larger than the item to insert
        while j >= 0 and nums[j] > item_to_insert:
            nums[j + 1] = nums[j]
            j -= 1
        # Insert the item
        nums[j + 1] = item_to_insert

# Verify the algorithm works
random_list_of_nums = [9, 1, 15, 28, 6]
insertion_sort(random_list_of_nums)
print(random_list_of_nums)
```
Time complexity
In the worst case, the array is sorted in reverse order. The outer for loop in insertion sort always iterates n - 1 times.
In the worst case, the inner loop shifts one element, then two, and so on. The number of shifts is 1 + 2 + … + (n - 3) + (n - 2) + (n - 1), which gives insertion sort a time complexity of O(n^2).
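A short sketch makes this concrete (the shift counter is added only for illustration):

```python
def insertion_sort_shifts(nums):
    """Insertion sort that counts how many elements it shifts forward."""
    shifts = 0
    for i in range(1, len(nums)):
        item_to_insert = nums[i]
        j = i - 1
        while j >= 0 and nums[j] > item_to_insert:
            nums[j + 1] = nums[j]
            shifts += 1
            j -= 1
        nums[j + 1] = item_to_insert
    return shifts

# A reversed list of n = 5 elements needs 1 + 2 + 3 + 4 = 10 shifts
print(insertion_sort_shifts([5, 4, 3, 2, 1]))  # 10
```

On an already sorted list the inner loop never runs, so no shifts happen at all; this is why insertion sort behaves so well on nearly sorted data.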
Heap sort
This popular sorting algorithm, like insertion sort and selection sort, divides the list into sorted and unsorted parts. It converts the unsorted segment of the list into a heap data structure, so that we can efficiently determine the largest element.
Introduction
We first convert the list into a max heap, a binary tree whose largest element is the root node. We then place that element at the end of the list, rebuild the max heap without it, and place the new largest element just before the last item of the list.
We then repeat the heap-building process until all nodes are removed.
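As an aside, Python’s standard library ships a binary heap in the heapq module, so a heap-based sort can also be sketched with it. Note that heapq is a min heap rather than the max heap described above, so it yields ascending order by popping the smallest element, and unlike the in-place version below it builds a new list:

```python
import heapq

def heapq_sort(nums):
    """Sort by heapifying a copy of the list and repeatedly popping the minimum."""
    heap = list(nums)
    heapq.heapify(heap)  # builds the heap in O(n)
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heapq_sort([35, 12, 43, 8, 51]))  # [8, 12, 35, 43, 51]
```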
Implementation
We create a helper function, heapify, to help implement this algorithm:
```python
def heapify(nums, heap_size, root_index):
    # Assume the index of the largest element is the root index
    largest = root_index
    left_child = (2 * root_index) + 1
    right_child = (2 * root_index) + 2

    # If the left child of the root is a valid index, and the element is larger
    # than the current largest element, update the largest element
    if left_child < heap_size and nums[left_child] > nums[largest]:
        largest = left_child

    # Do the same for the right child of the root
    if right_child < heap_size and nums[right_child] > nums[largest]:
        largest = right_child

    # If the largest element is no longer the root element, swap them
    if largest != root_index:
        nums[root_index], nums[largest] = nums[largest], nums[root_index]
        # Heapify the new root element to ensure it's the largest
        heapify(nums, heap_size, largest)

def heap_sort(nums):
    n = len(nums)

    # Create a max heap from the list
    # The 2nd argument of range means we stop at the element before -1, i.e. the first element of the list
    # The 3rd argument of range means we iterate backwards, decreasing i by 1
    for i in range(n, -1, -1):
        heapify(nums, n, i)

    # Move the root of the max heap to the end of the list
    for i in range(n - 1, 0, -1):
        nums[i], nums[0] = nums[0], nums[i]
        heapify(nums, i, 0)

# Verify the algorithm works
random_list_of_nums = [35, 12, 43, 8, 51]
heap_sort(random_list_of_nums)
print(random_list_of_nums)
```
Time complexity
Let’s first look at the time complexity of the heapify function. In the worst case, the largest element is never the root element, which causes recursive calls to heapify. While recursive calls might seem expensive, remember that we are working with a binary tree.
Visualize a binary tree with three elements; it has a height of 2. Now visualize a binary tree with seven elements; it has a height of 3. The tree’s height grows logarithmically with n, and heapify traverses the tree in O(log(n)) time.
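The heights just mentioned follow from the shape of a complete binary tree: with n nodes it has floor(log2(n)) + 1 levels, which is a quick check away:

```python
import math

def tree_levels(n):
    """Number of levels in a complete binary tree with n nodes."""
    return math.floor(math.log2(n)) + 1

print(tree_levels(3))  # 2
print(tree_levels(7))  # 3
```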
heap_sort iterates over the array n times. Therefore, the overall time complexity of the heap sort algorithm is O(nlog(n)).
Merge sort
This divide-and-conquer algorithm splits a list into two parts and keeps splitting the rest of the list in half until there is only one element left.
Adjacent elements become sorted pairs, which are then merged and sorted with other pairs. This continues until we end with a single sorted list containing every element of the unsorted input list.
Introduction
We recursively split the list in half until we get a list of length 1. We then merge each of the pieces that have been partitioned, sorting them in the process.
Sorting is done by comparing the smallest elements in each half. The first element of each list is the first element to be compared. If the first half begins with a smaller value, we add it to the sorted list. We then compare the second minimum of the first half with the first minimum of the second half.
Each time we select the smaller value at the front of one of the halves, we advance to that half’s next item for the following comparison.
Implementation
```python
def merge(left_list, right_list):
    sorted_list = []
    left_list_index = right_list_index = 0

    # We use the list lengths often, so it's handy to make variables
    left_list_length, right_list_length = len(left_list), len(right_list)

    for _ in range(left_list_length + right_list_length):
        if left_list_index < left_list_length and right_list_index < right_list_length:
            # We check which value at the start of each list is smaller
            # If the item at the start of the left list is smaller, add it to the sorted list
            if left_list[left_list_index] <= right_list[right_list_index]:
                sorted_list.append(left_list[left_list_index])
                left_list_index += 1
            # If the item at the start of the right list is smaller, add it to the sorted list
            else:
                sorted_list.append(right_list[right_list_index])
                right_list_index += 1

        # If we've reached the end of the left list, add the elements of the right list
        elif left_list_index == left_list_length:
            sorted_list.append(right_list[right_list_index])
            right_list_index += 1
        # If we've reached the end of the right list, add the elements of the left list
        elif right_list_index == right_list_length:
            sorted_list.append(left_list[left_list_index])
            left_list_index += 1

    return sorted_list

def merge_sort(nums):
    # If the list has a single element, return it
    if len(nums) <= 1:
        return nums

    # Use floor division so the index is an integer
    mid = len(nums) // 2

    # Sort and merge each half
    left_list = merge_sort(nums[:mid])
    right_list = merge_sort(nums[mid:])

    # Merge the sorted lists into a new one
    return merge(left_list, right_list)

# Verify the algorithm works
random_list_of_nums = [120, 45, 68, 250, 176]
random_list_of_nums = merge_sort(random_list_of_nums)
print(random_list_of_nums)
```
Note that the merge_sort() function differs from previous sorting algorithms in that it returns a sorted new list rather than sorting an existing list.
Therefore, merge sort requires space to create a new list of the same size as the input list.
Time complexity
Let’s first look at the merge function. It takes two lists and iterates n times, where n is the combined size of the two lists. The merge_sort function splits the given array in two and recursively sorts the subarrays. As with a binary tree, since the input of each recursion is half of the given array, the depth of the recursion grows logarithmically with n.
Thus, the overall time complexity of merge sort is O(nlog(n)).
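To see where the log factor comes from, count how many times a list can be halved before every piece has a single element (an illustrative sketch):

```python
def split_depth(n):
    """How many times a list of length n is halved before reaching length 1."""
    depth = 0
    while n > 1:
        n = (n + 1) // 2  # the size of the larger half after a split
        depth += 1
    return depth

print(split_depth(8))     # 3
print(split_depth(5000))  # 13, i.e. ceil(log2(5000))
```

Each of those levels of splitting costs O(n) merging work, which is how the O(nlog(n)) total arises.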
Quick sort
This divide-and-conquer algorithm is the most frequently used of the sorting algorithms covered in this article. When configured well, it’s extremely efficient and does not require the extra space merge sort uses. We partition the list around a pivot element, sorting values around the pivot.
Introduction
Quicksort begins by partitioning the list, that is, picking a value of the list around which to sort the other elements. This value is called the pivot. All elements smaller than the pivot are moved to its left.
The pivot then sits in its correct place, and we recursively sort the elements on either side of it until the whole list is sorted.
Implementation
There are different versions of quicksort, which differ in how they partition. The implementation below uses Hoare partitioning, named after Tony Hoare, who also created the quicksort algorithm.
```python
def partition(nums, low, high):
    # We select the middle element to be the pivot. Some implementations select
    # the first element or the last element. Sometimes the median element or a
    # random element is used. There are many more strategies to choose or create.
    pivot = nums[(low + high) // 2]
    i = low - 1
    j = high + 1
    while True:
        i += 1
        while nums[i] < pivot:
            i += 1

        j -= 1
        while nums[j] > pivot:
            j -= 1

        if i >= j:
            return j

        # If an element at i (to the left of the pivot) is larger than an
        # element at j (to the right of the pivot), swap them
        nums[i], nums[j] = nums[j], nums[i]

def quick_sort(nums):
    # Create a helper function that will be called recursively
    def _quick_sort(items, low, high):
        if low < high:
            # This is the index after the pivot, where our lists are split
            split_index = partition(items, low, high)
            _quick_sort(items, low, split_index)
            _quick_sort(items, split_index + 1, high)

    _quick_sort(nums, 0, len(nums) - 1)

# Verify the algorithm works
random_list_of_nums = [22, 5, 1, 18, 99]
quick_sort(random_list_of_nums)
print(random_list_of_nums)
```
Time complexity
The worst case is when the smallest or largest element is always selected as the pivot. This creates partitions of size n - 1, causing n - 1 recursive calls. This leads to a worst-case time complexity of O(n^2).
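We can watch this degradation happen with a deliberately bad pivot choice. The sketch below uses Lomuto partitioning with the last element as the pivot (not the Hoare scheme above), precisely because already sorted input then triggers the worst case:

```python
def quick_sort_counting(nums, low, high, counter):
    """Quicksort with Lomuto partitioning (last element as pivot),
    counting partition calls in counter[0]."""
    if low < high:
        counter[0] += 1
        pivot = nums[high]
        i = low - 1
        for j in range(low, high):
            if nums[j] <= pivot:
                i += 1
                nums[i], nums[j] = nums[j], nums[i]
        nums[i + 1], nums[high] = nums[high], nums[i + 1]
        split = i + 1
        quick_sort_counting(nums, low, split - 1, counter)
        quick_sort_counting(nums, split + 1, high, counter)

counter = [0]
quick_sort_counting(list(range(10)), 0, 9, counter)
print(counter[0])  # 9 partition calls for n = 10, i.e. n - 1
```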
While this is a terrible worst case, quicksort is used heavily because its average time complexity is much quicker than most other sorting algorithms. Although the partition function uses nested while loops, it performs comparisons across all elements of the array to make its swaps, so by itself it has a time complexity of O(n).
With a good pivot, the quicksort function splits the array into halves, which grows logarithmically with n. Therefore, the average time complexity of the quicksort algorithm is O(nlog(n)).
Python’s built-in sorting functions
While it is beneficial to understand these sorting algorithms, in most Python projects you will probably use sorting functions that are already provided in the language.
We can sort a list in place, mutating its contents, with sort():
```python
apples_eaten_a_day = [2, 1, 1, 3, 1, 2, 2]
apples_eaten_a_day.sort()
print(apples_eaten_a_day)  # [1, 1, 1, 2, 2, 2, 3]
```
Or we can use the sorted() function to create a new sorted list:
```python
apples_eaten_a_day_2 = [2, 1, 1, 3, 1, 2, 2]
sorted_apples = sorted(apples_eaten_a_day_2)
print(sorted_apples)  # [1, 1, 1, 2, 2, 2, 3]
```
Both sort in ascending order, but you can easily sort in descending order by setting the reverse flag to True:
```python
# Reverse sort the list in place
apples_eaten_a_day.sort(reverse=True)
print(apples_eaten_a_day)  # [3, 2, 2, 2, 1, 1, 1]

# Reverse sort to get a new list
sorted_apples_desc = sorted(apples_eaten_a_day_2, reverse=True)
print(sorted_apples_desc)  # [3, 2, 2, 2, 1, 1, 1]
```
Unlike the sorting algorithm functions we wrote ourselves, both of these functions can sort lists of tuples and class instances. The sorted() function can sort any iterable, which includes lists, strings, tuples, dictionaries, sets, and any custom iterators you create.
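For instance, tuples can be ordered by any field via the key parameter (the (name, age) data here is made up purely for illustration):

```python
people = [("Alice", 31), ("Bob", 24), ("Carol", 27)]

# Sort by the second field of each tuple (the age)
by_age = sorted(people, key=lambda person: person[1])
print(by_age)  # [('Bob', 24), ('Carol', 27), ('Alice', 31)]
```

The same key callable works with sort() and with class instances, e.g. key=lambda obj: obj.some_attribute.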
These sort functions implement the Timsort algorithm, an algorithm inspired by merge sort and insertion sort.
Speed comparison
To get a sense of how quickly they perform, we generate a list of 5,000 numbers between 0 and 1,000, then time how long each algorithm takes to finish. Each algorithm is run 10 times so that we can establish a more reliable pattern of performance.
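The exact harness isn’t shown in the article, but a benchmark along these lines might be sketched as follows (the block re-defines bubble_sort so it’s self-contained, and is shrunk to 1,000 numbers and 3 runs so it finishes quickly):

```python
import random
import time

def bubble_sort(nums):
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(nums) - 1):
            if nums[i] > nums[i + 1]:
                nums[i], nums[i + 1] = nums[i + 1], nums[i]
                swapped = True

runs, size = 3, 1000  # the article times 10 runs of 5,000 numbers
for run in range(1, runs + 1):
    nums = [random.randint(0, 1000) for _ in range(size)]
    start = time.perf_counter()
    bubble_sort(nums)
    elapsed = time.perf_counter() - start
    print(f"Run {run}: {elapsed:.5f} seconds")
```

Timing each algorithm the same way on identical input sizes yields a table like the one below.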
Here are the results, in seconds:
| Run | Bubble | Selection | Insertion | Heap | Merge | Quick |
|---|---|---|---|---|---|---|
| 1 | 5.53188 | 1.23152 | 1.60355 | 0.04006 | 0.02619 | 0.01639 |
| 2 | 4.92176 | 1.24728 | 1.59103 | 0.03999 | 0.02584 | 0.01661 |
| 3 | 4.91642 | 1.22440 | 1.59362 | 0.04407 | 0.02862 | 0.01646 |
| 4 | 5.15470 | 1.25053 | 1.63463 | 0.04128 | 0.02882 | 0.01860 |
| 5 | 4.95522 | 1.28987 | 1.61759 | 0.04515 | 0.03314 | 0.01885 |
| 6 | 5.04907 | 1.25466 | 1.62515 | 0.04257 | 0.02595 | 0.01628 |
| 7 | 5.05591 | 1.24911 | 1.61981 | 0.04028 | 0.02733 | 0.01760 |
| 8 | 5.08799 | 1.25808 | 1.62603 | 0.04264 | 0.02633 | 0.01705 |
| 9 | 5.03289 | 1.24915 | 1.61446 | 0.04302 | 0.03293 | 0.01762 |
| 10 | 5.14292 | 1.22021 | 1.57273 | 0.03966 | 0.02572 | 0.01606 |
| Average | 5.08488 | 1.24748 | 1.60986 | 0.04187 | 0.02809 | 0.01715 |
If you run the tests yourself, you will get different values, but the observed pattern of performance should be the same or similar. Bubble sort is the slowest and worst-performing of all the algorithms. While it’s useful as an introduction to sorting and algorithms, it’s not suited for practical use.
We also notice that quicksort is very fast: nearly twice as fast as merge sort, and it doesn’t need the extra space merge sort requires to run. Recall that our partition was based on the middle element of the list; other partitioning schemes may perform differently.
Since insertion sort performs far fewer comparisons than selection sort, insertion sort implementations are usually faster, yet selection sort was slightly faster in our tests.
Insertion sort swaps elements much more often than selection sort does. If swapping values takes considerably more time than comparing them, this “opposite” result is plausible.
Be careful when choosing a sorting algorithm because it can affect performance.
Conclusion
Sorting algorithms give us a number of ways to sort data. We looked at six different algorithms — bubble sort, selection sort, insertion sort, merge sort, heap sort, and quicksort — and their implementation in Python.
The number of comparisons and swaps an algorithm performs, along with the environment the code runs in, are key determinants of performance. In real Python applications, it’s recommended that we stick to the built-in Python sort functions for their flexibility of input and speed.
If you find any mistakes in this translation or other areas that could be improved, you are welcome to revise it and open a PR with the Nuggets Translation Project, for which you can also earn the corresponding reward points. The permanent link at the beginning of this article is the Markdown link to this article on GitHub.
The Nuggets Translation Project is a community that translates quality technical articles from across the Internet and shares them on Nuggets. It covers Android, iOS, front end, back end, blockchain, product, design, artificial intelligence, and other fields. For more high-quality translations, please follow the Nuggets Translation Project, its official Weibo, and its Zhihu column.