The previous "Code Byte" post introduced the classic sorting algorithms, whose best time complexity is O(nlogn). So how would you sort millions of orders by amount? You could implement it with the merge sort or quicksort you learned before, and it would work, but it would be inefficient. Today, "Code Brother" will take you through sorting algorithms for special scenarios, which achieve sorting in O(n) time.
An algorithm is like a supercar: under the right conditions it reaches extreme speed. Quicksort and merge sort are economical, practical family cars. Bucket sort, counting sort, and radix sort are race cars built for the track; they are also known as linear sorts.
There's no time to explain. Get in the car! Let's start today with bucket sort.
Bucket Sort
As the name implies, bucket sort uses "buckets". The core idea is to divide the data to be sorted into several ordered buckets, sort the data within each bucket separately, and then take the data out bucket by bucket in order; the resulting sequence is sorted.
For bucket sorting to be efficient, we need to do the following:
- Increase the number of buckets as much as the extra space allows.
- Use a mapping function to distribute the n input values evenly across the k buckets (see the snippet below).
At the same time, the choice of sorting algorithm for the elements within each bucket matters a great deal for performance.
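As a concrete illustration, the mapping function used by the implementation later in this post assigns each value to a bucket by its offset from the minimum value:

// minValue is the smallest value in the input; bucketSize is the width
// of each bucket's value range
int bucketIndex = (value - minValue) / bucketSize;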
The basic idea of bucket sort is to divide the array arr into several equal-width subintervals (buckets), sort each subinterval separately, and finally merge them.
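A tiny made-up example makes this concrete. Sorting [22, 5, 11, 41, 45, 26, 29, 10] with a bucket width of 10:

bucket [0, 9]: 5
bucket [10, 19]: 11, 10 -> sorted: 10, 11
bucket [20, 29]: 22, 26, 29 (already in order)
bucket [40, 49]: 41, 45 (already in order)
concatenated in bucket order: 5, 10, 11, 22, 26, 29, 41, 45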
Why is the time complexity O(n)? Let's find out.
If we have n pieces of data to sort, we use a mapping function to distribute them evenly into k buckets, with y = n/k elements per bucket.
Each bucket is then quicksorted internally, which takes O(y * logy) per bucket, so the k buckets together take O(k * y * logy). Since y = n/k, the whole bucket sort takes O(n * log(n/k)). When the number of buckets k approaches the number of elements n, log(n/k) becomes a small constant, and the bucket sorting time approaches O(n).
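A quick sanity check with made-up numbers: take n = 1,000,000 elements and k = 10,000 buckets, so y = n/k = 100 elements per bucket.

per bucket: y * logy ≈ 100 × log(100) ≈ 100 × 6.6 ≈ 660 comparisons
all buckets: k × 660 ≈ 6.6 × 10^6 comparisons
quicksort on the whole array: n * logn ≈ 1,000,000 × 20 ≈ 2 × 10^7 comparisons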
This looks great. Can bucket sort replace the O(nlogn) sorting algorithms introduced in the previous post?
Unfortunately, the answer is no. A race car can only run on a special track; it cannot replace the family car. In fact, bucket sort is only usable when some fairly demanding conditions are met:
- The data to be sorted must divide naturally into k buckets with a natural size order between the buckets. That way, once the data within each bucket is sorted, no further sorting across buckets is needed.
- The data must be distributed evenly across the buckets. If the distribution is uneven, some buckets hold many elements and others hold few, so the time to sort within a bucket is no longer constant. In the extreme case where all the data lands in a single bucket, bucket sort degenerates to O(nlogn).
Applicable scenario
Bucket sort is particularly suited to external sorting. External sorting means the data is stored on external disk, the data set is large, and memory is limited, so the data cannot all be loaded into memory at once.
Let's say we have 10GB of order data and we want to sort it by order amount (assume the amounts are all positive integers), but memory is limited to a few hundred megabytes, so there is no way to load all 10GB into memory at once. What do we do?
Solution
Back to sorting the 10GB of order data by order amount. Suppose the minimum order amount is 1 yuan and the maximum is 100,000 yuan. We divide all the orders into 100 buckets by amount: the first bucket stores orders of 1 to 1,000 yuan, the second stores orders of 1,001 to 2,000 yuan, and so on. Each bucket corresponds to a file, numbered and named according to the size order of its amount range (00, 01, 02, ..., 99).
Ideally, if the order amounts are evenly distributed between 1 yuan and 100,000 yuan, the orders will be evenly divided into 100 files, each holding about 100MB of order data. We can then read each small file into memory one at a time and sort it with quicksort. After all the files are sorted, we only need to read the order data from each small file in file-number order, from small to large, and append it to a single output file. That file then stores all the orders sorted by amount from small to large.
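Here is a minimal sketch of the partitioning step, assuming one positive integer order amount per line of input; the file names orders.txt and bucket-XX.txt are made up for illustration:

import java.io.*;

public class OrderPartitioner {
    public static void main(String[] args) throws IOException {
        int bucketCount = 100;
        int bucketSize = 1000; // each bucket covers 1,000 yuan: [1, 1000], [1001, 2000], ...
        // One output file per bucket, named by amount range: bucket-00.txt ... bucket-99.txt
        BufferedWriter[] writers = new BufferedWriter[bucketCount];
        for (int i = 0; i < bucketCount; i++) {
            writers[i] = new BufferedWriter(new FileWriter(String.format("bucket-%02d.txt", i)));
        }
        try (BufferedReader reader = new BufferedReader(new FileReader("orders.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int amount = Integer.parseInt(line.trim());
                // Map amounts 1..100000 to bucket indexes 0..99
                int index = Math.min((amount - 1) / bucketSize, bucketCount - 1);
                writers[index].write(line);
                writers[index].newLine();
            }
        }
        for (BufferedWriter w : writers) {
            w.close();
        }
        // Each bucket-XX.txt now fits in memory on its own: load it, quicksort it,
        // and append the result to the final output in file-number order.
    }
}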
Code implementation
/**
* Bucket sort: divide arr into several equal-width subintervals (buckets),
* sort each subinterval separately, and merge.
*/
public class BucketSort implements LineSort {
private static final QuickSort quickSort = new QuickSort();
@Override
public int[] sort(int[] sourceArray, int bucketSize) {
// Find the maximum and minimum values
int minValue = sourceArray[0];
int maxValue = sourceArray[0];
for (int value : sourceArray) {
minValue = Math.min(minValue, value);
maxValue = Math.max(maxValue, value);
}
// Number of buckets
int bucketCount = (maxValue - minValue) / bucketSize + 1;
int[][] buckets = new int[bucketCount][bucketSize];
// indexArr[i] holds the number of elements currently in bucket i (i.e. the next free slot); defaults to 0
int[] indexArr = new int[bucketCount];
// Allocate the values from the array to buckets
for (int value : sourceArray) {
int bucketIndex = (value - minValue) / bucketSize;
// The current bucket array has reached its capacity; grow it
if (indexArr[bucketIndex] == buckets[bucketIndex].length) {
ensureCapacity(buckets, bucketIndex);
}
// Put the value into its bucket and advance that bucket's index by 1
buckets[bucketIndex][indexArr[bucketIndex]++] = value;
}
// Sort each bucket, using quicksort
int k = 0;
for (int i = 0; i < buckets.length; i++) {
if (indexArr[i] == 0) {
continue;
}
// Only the first indexArr[i] slots of the bucket hold real data; the remaining slots are the default 0, so sort only the occupied part
quickSort.quickSortInternal(buckets[i], 0, indexArr[i] - 1);
for (int j = 0; j < indexArr[i]; j++) {
sourceArray[k++] = buckets[i][j];
}
}
return sourceArray;
}
/**
* Expand the bucket's backing array and keep the existing data.
*
* @param buckets all the buckets
* @param bucketIndex index of the bucket to expand
*/
private void ensureCapacity(int[][] buckets, int bucketIndex) {
int[] tempArr = buckets[bucketIndex];
int[] newArr = new int[tempArr.length * 2];
for (int j = 0; j < tempArr.length; j++) {
newArr[j] = tempArr[j];
}
buckets[bucketIndex] = newArr;
}
}
Unit testing
Generate random test data in the range [1, length]; length is set to 100 here for a quick demo, and can be raised to a million to match the scenario above.
@displayName (" Linear Sorting algorithm test ")
public class LineSortTest {
private static int length = 100;
private int[] array = new int[length];
@BeforeEach
public void beforeEach() {
Random rand = new Random();
for (int i = 0; i < length; i++) {
// Randomly generate values in the range [1, length]
array[i] = rand.nextInt(length) + 1;
}
}

@DisplayName("Bucket sort")
@Test
public void testBucketSort() {
BucketSort bucketSort = new BucketSort();
// 100 values with bucket width 10, which yields about 10 buckets
int[] sort = bucketSort.sort(array, 10);
System.out.println(Arrays.toString(sort));
}
}
Conclusion
How would you rank one million users by age? Does the problem seem easy now? Let me share my solution.
In fact, sorting one million users by age is just like sorting half a million students by grade. Suppose the age range is 1 to 120 years. We iterate through the million users, dividing them by age into 120 buckets, and then traverse the elements of those 120 buckets in sequence. The result is one million users sorted by age.
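Here is a minimal sketch of that idea, assuming ages fall in [1, 120]; the class name AgeSort is made up for illustration. Because users of the same age need no further ordering here, each bucket collapses to a simple counter (this is effectively counting sort):

public class AgeSort {
    public static int[] sortByAge(int[] ages) {
        // One bucket per age: index = age, value = number of users of that age
        int[] buckets = new int[121];
        for (int age : ages) {
            buckets[age]++;
        }
        // Walk the buckets in age order and write the ages back out
        int k = 0;
        for (int age = 1; age <= 120; age++) {
            for (int c = 0; c < buckets[age]; c++) {
                ages[k++] = age;
            }
        }
        return ages;
    }
}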