Have you ever wondered what algorithm Swift uses for sorting? There are many sorting algorithms out there, and you'll rarely need anything other than the language's built-in sort(). However, if you want to avoid unwanted behavior and annoying edge cases, it's important to understand the properties of the sorting algorithm built into your language.
When analyzing a sorting algorithm, there are two properties to look for:
1 – Stability
The stability of a sorting algorithm indicates whether it preserves the original order of equal elements. An unstable sorting algorithm does not guarantee that equal elements will keep their relative order after sorting, whereas a stable one guarantees that they will.
This may sound strange: after all, if the elements are equal, why should I care about their order? It rarely matters when you sort plain values, but when you sort elements by some arbitrary priority, an unstable algorithm can give you undesirable results.
Let’s say we’re building a music player and our current task is to sort songs according to their popularity:
struct Music: Comparable, Equatable, CustomStringConvertible {
    let name: String
    let popularityValue: Int

    static func < (lhs: Music, rhs: Music) -> Bool {
        return lhs.popularityValue < rhs.popularityValue
    }

    var description: String {
        return name
    }
}

var songs: [Music] = [
    Music(name: "I'm that swifty", popularityValue: 3),
    Music(name: "Swift boi", popularityValue: 5),
    Music(name: "Swift That Walk", popularityValue: 1),
    Music(name: "Too Swift", popularityValue: 5),
]
If we were to use Quicksort, we would get the following result:
extension Array where Element: Equatable & Comparable {
    func quicksort(comparison: ((Element, Element) -> Bool)) -> [Element] {
        var copy = self
        copy.quick(0, count - 1, comparison: comparison)
        return copy
    }

    private mutating func quick(_ i: Int, _ j: Int, comparison: ((Element, Element) -> Bool)) {
        guard i < j else {
            return
        }
        let pivot = partition(i, j, comparison: comparison)
        quick(i, pivot - 1, comparison: comparison)
        quick(pivot + 1, j, comparison: comparison)
    }

    private mutating func partition(_ i: Int, _ j: Int, comparison: ((Element, Element) -> Bool)) -> Int {
        let pivotElement = self[j]
        var indexToAdd = i - 1
        for k in i..<j {
            if comparison(self[k], pivotElement) {
                indexToAdd += 1
                swapAt(indexToAdd, k)
            }
        }
        swapAt(indexToAdd + 1, j)
        return indexToAdd + 1
    }
}
songs = songs.quicksort {
    $0.popularityValue > $1.popularityValue
}
print(songs)
// [Too Swift, Swift boi, I'm that swifty, Swift That Walk]
Although "Swift boi" came before "Too Swift" in the original array, Quicksort swapped their positions!
We never really relied on the order of the unsorted array, so on its own this isn't too bad. But consider what happens if we sort the array multiple times:
songs = songs.quicksort {
    $0.popularityValue > $1.popularityValue
}
print(songs)
songs = songs.quicksort {
    $0.popularityValue > $1.popularityValue
}
print(songs)
songs = songs.quicksort {
    $0.popularityValue > $1.popularityValue
}
print(songs)
songs = songs.quicksort {
    $0.popularityValue > $1.popularityValue
}
print(songs)
songs = songs.quicksort {
    $0.popularityValue > $1.popularityValue
}
print(songs)
// [Too Swift, Swift boi, I'm that swifty, Swift That Walk]
// [Swift boi, Too Swift, I'm that swifty, Swift That Walk]
// [Too Swift, Swift boi, I'm that swifty, Swift That Walk]
// [Swift boi, Too Swift, I'm that swifty, Swift That Walk]
// [Too Swift, Swift boi, I'm that swifty, Swift That Walk]
Their relative order is constantly changing!
The reason is that Quicksort is an unstable sorting algorithm. If for some reason we need to repeatedly refresh this list in the UI, the user will see songs swap positions in the ranking even though they have the same priority. That's not great.
To preserve the order, we need to use a stable algorithm like Mergesort.
extension Array where Element: Equatable & Comparable {
    func mergesort(comparison: ((Element, Element) -> Bool)) -> [Element] {
        return merge(0, count - 1, comparison: comparison)
    }

    private func merge(_ i: Int, _ j: Int, comparison: ((Element, Element) -> Bool)) -> [Element] {
        guard i <= j else {
            return []
        }
        guard i != j else {
            return [self[i]]
        }
        let half = i + (j - i) / 2
        let left = merge(i, half, comparison: comparison)
        let right = merge(half + 1, j, comparison: comparison)
        var i1 = 0
        var i2 = 0
        var new = [Element]()
        new.reserveCapacity(left.count + right.count)
        while i1 < left.count && i2 < right.count {
            if comparison(right[i2], left[i1]) {
                new.append(right[i2])
                i2 += 1
            } else {
                new.append(left[i1])
                i1 += 1
            }
        }
        while i1 < left.count {
            new.append(left[i1])
            i1 += 1
        }
        while i2 < right.count {
            new.append(right[i2])
            i2 += 1
        }
        return new
    }
}
// [Swift boi, Too Swift, I'm that swifty, Swift That Walk]
// [Swift boi, Too Swift, I'm that swifty, Swift That Walk]
// [Swift boi, Too Swift, I'm that swifty, Swift That Walk]
// [Swift boi, Too Swift, I'm that swifty, Swift That Walk]
// [Swift boi, Too Swift, I'm that swifty, Swift That Walk]
2 – Time/space complexity
The second important thing to note is how much extra memory the algorithm needs to run, and what its best and worst cases are.
My favorite example is Counting Sort: it sorts an array by simply counting the number of occurrences of each element and then emitting the elements in order. If the range of values is small, for example [3, 1, 4, 2, 5], this algorithm can sort the array in time very close to O(n) – but if the range is large, for example [0, 0, 0, 1100000], Counting Sort will take a long time to run even though the array itself is small.
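To make this concrete, here is a minimal Counting Sort sketch for non-negative integers (the function name and shape are illustrative, not from any library). Note how the `counts` array is sized by the maximum value, not by the number of elements:

```swift
// Minimal counting sort sketch for non-negative integers.
// Runtime is O(n + k), where k is the largest value in the array,
// so it degrades badly when k is much larger than n.
func countingSort(_ array: [Int]) -> [Int] {
    guard let maxValue = array.max() else { return [] }
    // One slot per possible value: this is where a huge value range hurts.
    var counts = [Int](repeating: 0, count: maxValue + 1)
    for value in array {
        counts[value] += 1
    }
    var result = [Int]()
    result.reserveCapacity(array.count)
    // Emit each value as many times as it occurred, in ascending order.
    for (value, count) in counts.enumerated() where count > 0 {
        result.append(contentsOf: repeatElement(value, count: count))
    }
    return result
}

print(countingSort([3, 1, 4, 2, 5])) // [1, 2, 3, 4, 5]
```

For [3, 1, 4, 2, 5] the `counts` array has only 6 slots, but sorting [0, 0, 0, 1100000] this way would allocate 1,100,001 slots for just 4 elements.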
Similarly, the famous Quicksort is known as a fast, in-place algorithm with an average case of O(n log n), but it has a terrible worst case of O(n²) when the partition's pivot is always the highest/smallest element. If you are dealing with a large amount of data or a constrained environment, there will be a specific sorting algorithm that best suits your needs.
Pre-Swift 5 algorithm: Introsort
Prior to Swift 5, Swift's sorting algorithm was a hybrid called Introsort, which mixes Insertion Sort, Heapsort, and Quicksort to guarantee a worst case of O(n log n).
The idea behind Introsort is simple: first, if a partition has fewer than 20 elements, use Insertion Sort. Although that algorithm has a worst case of O(n²), it also has a best case of O(n), and compared with the usual O(n log n) algorithms, Insertion Sort tends to perform better on small inputs.
If the partition is not small, Quicksort is used. This takes our best case to O(n log n), but also keeps the O(n²) worst case. However, Introsort can avoid it – if Quicksort's recursion tree gets too deep, the partition switches to Heapsort instead. In this case, "too deep" is defined as 2 * floor(log2(array.count)).
internal mutating func _introSortImpl(within range: Range<Index>,
                                      by areInIncreasingOrder: (Element, Element) throws -> Bool,
                                      depthLimit: Int) rethrows {
    // Insertion sort is better at handling smaller regions.
    if distance(from: range.lowerBound, to: range.upperBound) < 20 {
        try _insertionSort(within: range, by: areInIncreasingOrder)
    } else if depthLimit == 0 {
        try _heapSort(within: range, by: areInIncreasingOrder)
    } else {
        // Partition and sort.
        // We don't check the depthLimit variable for underflow because this
        // variable is always greater than zero (see check above).
        let partIdx = try _partition(within: range, by: areInIncreasingOrder)
        try _introSortImpl(within: range.lowerBound..<partIdx, ...
    }
}
Introsort greedily tries to pick the best algorithm for a given situation, falling back to a different option when the previous one goes wrong. It has a best case of O(n) and a worst case of O(n log n), making it a suitable one-size-fits-all algorithm.
In terms of memory usage, it is slightly worse than the usual sorting algorithms. Although all three component algorithms can sort in place, Introsort is implemented recursively in Swift. Since Swift does not guarantee that tail-recursive calls will be optimized, the pre-Swift 5 built-in sort() is not the best option if you need to keep memory usage low.
Most notably, Introsort is unstable. While Insertion Sort is stable, the default implementations of Quicksort and Heapsort are not. If the order of equal elements matters to you, the pre-Swift 5 sort() is not a good choice.
After Swift 5 – Timsort
Several threads about adding a stable sorting algorithm to Swift that didn't rely on recursion appeared between 2015 and 2018, but promising discussions only took off in early 2018. In October 2018, a pull request was finally merged that replaced Introsort with a modified version of Timsort.
Timsort is a hybrid algorithm like Introsort, but with a different approach. It works by dividing the array into smaller portions, sorting those portions with Insertion Sort, and then merging them with Mergesort. Because both Insertion Sort and Mergesort are stable, Timsort is stable too, and while it also has a worst case of O(n log n) and non-constant space complexity, it tends to be considerably faster than the more naive algorithms in real-world scenarios. The reason Timsort is so fast is that while the description sounds simple, each step is highly tuned for efficiency.
Find the next “run” (partition)
Instead of splitting everything first and merging as a final step like Mergesort, Timsort scans the array once and gradually merges regions of it (called "runs").
The beauty is that, unlike in Mergesort, a run isn't just the array divided in half. Timsort exploits the fact that any array you sort is likely to contain contiguous subsequences that are already nearly or fully sorted, which is exactly the best case for Insertion Sort. To find the next run, Timsort advances a pointer until the current sequence stops being ascending/descending:
[1, 2, 3, 1, 9, 6]
 i     j
Note: This example is for visualization purposes only – because this array is so small, Timsort would just sort it right away with Insertion Sort.
The range from i to j defines our first run, but the optimizations don't stop there.
First: If the sequence is descending, we can already sort it in linear time by reversing its elements.
Second: To speed up Insertion Sort and balance the number of merge operations done later, Timsort requires each run to have a minimum size that is a power of 2 between 16 and 128, or at least very close to one. If the run we found is smaller than this minimum run size, we extend it to that length and sort it with Insertion Sort.
// Find the next consecutive run, reversing it if necessary.
var (end, descending) = try _findNextRun(in: self, from: start, by: areInIncreasingOrder)
if descending {
    _reverse(within: start..<end)
}

// If the current run is shorter than the minimum length, use the
// insertion sort to extend it.
if end < endIndex && end - start < minimumRunLength {
    let newEnd = Swift.min(endIndex, start + minimumRunLength)
    try _insertionSort(within: start..<newEnd, sortedEnd: end, by: areInIncreasingOrder)
    end = newEnd
}
For the actual size, Swift specifically picks values between 32 and 64, varying with the size of the array. Finally, after a run is found, it is added to a stack that holds all the previous runs we found.
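One way to derive such a minimum run length is to take the six most significant bits of the array's count, rounding up if any lower bit is set; the sketch below models the standard library's approach, though the actual implementation and names may differ:

```swift
// Sketch: derive a minimum run length from the array count by taking
// its six most significant bits, rounding up if any lower bit is set.
// For counts >= 64 this always yields a value between 32 and 64.
func minimumRunLength(_ count: Int) -> Int {
    let bitsToUse = 6
    if count < 1 << bitsToUse {
        // Small arrays become a single insertion-sorted run.
        return count
    }
    let offset = (Int.bitWidth - bitsToUse) - count.leadingZeroBitCount
    let mask = (1 << offset) - 1
    return (count >> offset) + ((count & mask) == 0 ? 0 : 1)
}

print(minimumRunLength(10_000))  // 40
print(minimumRunLength(1 << 20)) // 32
```

Rounding up this way keeps count / minimumRunLength close to a power of 2, which is what balances the sizes of the merges later on.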
Merge operation
Each time a run is found, Timsort tries to merge the top three runs of the stack into one until the following conditions hold:
(runs.count < 3 || runs[runs.count - 3].count > runs[runs.count - 2].count + runs[runs.count - 1].count) &&
(runs.count < 2 || runs[runs.count - 2].count > runs[runs.count - 1].count)
This is done to reduce the number of runs we need to remember, but mainly to balance the sizes of the runs, since runs of similar size are what makes Timsort's merging phase efficient.
At first glance, Timsort's merging phase looks just like Mergesort: we compare the first elements of each run, pick the smaller one and move it to its position in the final array.
However, Timsort beautifully exploits the fact that if a particular run keeps "winning" the comparisons, it is likely to keep doing so. When this happens, instead of continuing to compare elements one by one, we can binary search for the correct position of the "losing" run's next element within the "winning" run, and move that entire winning block to the final array at once. This is called galloping, and it saves a lot of time by letting us skip whole chunks of the winning run.
If galloping starts "losing" to the regular comparison process, it can be aborted. You can see all of these optimizations explained in the official Timsort description. Finally, after all runs are found, the remaining runs in the stack are merged together until the entire array is sorted.
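The search at the heart of galloping can be sketched as a plain binary search (the helper name below is my own, not a stdlib API; real implementations first probe in exponentially growing steps before binary searching):

```swift
// Sketch of the search used while galloping: find the position in the
// "winning" run where the "losing" run's next element belongs, so the
// entire block before that position can be copied at once.
// `insertionPoint(of:in:)` is an illustrative helper, not a stdlib API.
func insertionPoint(of element: Int, in run: [Int]) -> Int {
    var low = 0
    var high = run.count
    while low < high {
        let mid = (low + high) / 2
        if run[mid] <= element {
            low = mid + 1
        } else {
            high = mid
        }
    }
    return low
}

let winning = [1, 3, 5, 7, 9]
// Everything before index 3 can be moved in one block instead of
// being compared element by element.
print(insertionPoint(of: 5, in: winning)) // 3
```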
The main difference in Swift is that the standard library's Timsort implementation does not use galloping, and tries to collapse based on the last four runs instead of the last three. Nonetheless, it outperforms Introsort in almost every case.
Conclusion
Timsort is one of the fastest sorting algorithms for real world problems. Knowing the algorithms your language uses can help you optimize your code by making better decisions based on the data you’re working with.
Thanks for reading! 🚀
Original article: swiftrocks.com/introsort-t…