Bucket sort
Foreword: the purpose that writes these articles in the blog is used at record learning, be afraid later forget, if where write wrong welcome correct, thank!!
Learning objective: Master the principle and idea of bucket sorting algorithm
First, premise knowledge
Sorting algorithm concept, time complexity. You can go to this website to learn the sorting algorithm 01_ algorithm basic introduction to read
Introduction to bucket sorting
Bucket sort is an updated version of the counting sort we discussed in the previous article, so Bucket sort is the same as counting sort, by putting the elements of the original sequence into “a new location”, and then removing the sort one by one.
And bucket sort is not comparison sort, bucket sort is stable
Three, bucket sorting principle
We learned about counting sort, and we learned that counting sort takes the elements of the original sequence and puts them in a new array and sorts them.
After the upgrade, bucket sorting is no longer limited to an array, but to create multiple buckets to place the elements of the original sequence, and set a range for each bucket, and then distribute the elements of the original sequence into exclusive buckets according to the size, and then take out from each bucket in sequence, then a good sorting result can be obtained
Illustration:
Four, bucket sorting design ideas
First, the upgraded bucket sort, unlike the original count sort, according to the index, but set a range on each bucket, the range will be used to receive the corresponding elements in the original sequence, so
-
The first step is to create enough buckets to store all the elements of the original sequence. Must be at least larger than the array length
-
Since bucket sorting is not a comparison sort, we need an interval (that is, a range) so that any element entering a bucket from the original sequence can enter the corresponding bucket according to this interval
2. When the elements of the original sequence are saved, the inner parts of each bucket should be sorted. For example, in the figure above, if 14 comes before 11, the inner parts of each bucket should be sorted again
- The internal bucket sorting will directly affect the performance of bucket sorting
According to the above two design schemes: there are two schemes as FAR as I know: 1. The number of buckets is set 2 according to the original sequence length. The number of buckets is set based on the difference between the maximum and minimum values
- aboutBucket number, we choose to use this timeTo set the length of the original sequence. The data structure used by bucketsJDKTo provide theArrayListclass
- Two advantages: 1. Automatic expansion
- 2. The main reason for using this container is that containers can also be placed as elements in this container, so that multiple buckets can be grouped together for easy processing
- aboutintervalSetup, this is the core of bucket sorting problem
- Interval selection
- Use the original sequence maximum minus the minimum value to a total interval
- Then divide the total interval by the original sequence length so that each bucket is evenly divided into an interval
- We’re going to introduce another technical term here, the one used when the data goes into the bucketThe mapping function, this function can help usFind the bucket where the corresponding range resides by the value.
- As far as I know, there are two ways to set up mapping functions
- The first is the one I use
- Bucket location == Value to be discharged/range
- The second,The number of buckets is set based on the maximum or minimum valueWhen (max-min)/arr.length + 1;
- Bucket position == (value to be emptied – minimum sequence value)/sequence length
- There is no need to set the interval to use this scheme
- Optimal performance, but space consumption
- Interval selection
- For bucket sorting, I directly use the SORT method of the Collections utility class provided by the JDK
Five, code implementation
package com.migu.sortingAlgorithm;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
public class BucketSort {
public static void main(String[] args) {
int[] arr = {50.350.150.250.500.0};
sort(arr);
System.out.println(Arrays.toString(arr));
}
public static void sort(int[] arr){
int max = arr[0]; / / the biggest
int min = arr[0]; / / minimum
for(int i=1; i<arr.length; i++) {
if(arr[i] > max) {
max = arr[i];
}
if(arr[i] < min) { min = arr[i]; }}// The maximum size element difference of the original sequence is used to allocate the interval
int differ = max - min;
// The storage range of each bucket is rounded up
int section = (int)Math.ceil(differ / (arr.length - 1));/ / bucket list
ArrayList<ArrayList<Integer>> bucketList = new ArrayList<>();
// The number of buckets is set according to the sequence length
for(int i = 0; i < arr.length; i++){
bucketList.add(new ArrayList<>());
}
// Iterate over the original sequence into the bucket
for(int i = 0; i < arr.length; i++){
// Divide the current number by the range to get the location of the bucket
int num = arr[i] / section - 1;
if(num < 0){
num = 0;
}
bucketList.get(num).add(arr[i]);
}
// Sort within buckets
for(int i = 0; i < bucketList.size(); i++){
// Call the JDK utility class to help sort
Collections.sort(bucketList.get(i));
}
// Write the original sequence
int index = 0;
for(ArrayList<Integer> arrayList : bucketList){
for(intvalue : arrayList){ arr[index++] = value; }}}}Copy the code
6. Time complexity
As long as the number of buckets is properly allocated and the time complexity is O(n + k), this K is the complexity of the internal bucket sorting. If there is a large amount of data in a bucket list, time complexity is a different matter
Seven,
If the memory is sufficient, it is recommended to increase the number of buckets and of course change the allocation degree of the interval accordingly, so that the sorting operation within buckets can be avoided and the time complexity can reach O(n).
Bucket sorting is not friendly to data sorting that is not evenly distributed. If there is a large number of data in a certain interval, bucket sorting makes no sense at all