Bucket sort simple introduction ^-^
We live in a world full of things that have been sorted. When standing in line will be sorted according to the height, the rank of the exam needs to be sorted according to the score, online shopping will be sorted according to the price, E-mail in accordance with the time sorted…… Anyway, a lot of things need to sort, so sort is everywhere. Now let’s take a concrete example to introduce sorting algorithms.
The first appearance is our hero small hum, above this lovely baby is. When the final examination is over, the teacher will rank the students’ scores from highest to lowest. There were only 5 students in Her class, and they scored 5, 3, 5, 2 and 8 points respectively. And then you sort the scores from highest to lowest, and then you get 8, 5, 5, 3, 2. Do you have a good way to write a program that tells the computer to read five random numbers and print them out from the largest to the smallest? Please think about it for at least 15 minutes before you read (*^__^*)
We can solve this problem here with just a one-dimensional array. Make sure you really think about it before you read on.
First we need to apply an array of size 11 int a[11]. OK, now you have 11 variables, numbered from a[0] to a[10]. At the beginning, we initialize a[0] to a[10] to zero, indicating that none of these scores have been scored yet. For example, a[0] = 0 means that no one has scored a 0 yet, and a[1] = 0 means that no one has scored a 1 yet… A [10] = 0 means no one has ever scored a 10.
The first person’s score is 5. We will increase the corresponding value of A [5] by 1 on the original basis, that is, change the value of A [5] from 0 to 1, indicating that 5 points have occurred once.
The second person’s score is 3, so we increase the corresponding value of A [3] by 1 on the original basis, that is, the value of A [3] is changed from 0 to 1, indicating that 3 points have occurred once.
Pay attention! The third person’s score is also 5, so the value of A [5] needs to be increased by 1 on this basis, that is, the value of A [5] is changed from 1 to 2, indicating that 5 has occurred twice.
Do the same for the fourth and fifth person. The end result is the picture below.
Did you notice that the values in a[0] to a[10] are actually the number of occurrences of each score from 0 to 10? Next, we just need to print out the scores that have appeared, as shown below.
If a[0] is 0, 0 does not appear and is not printed.
A [1] is 0, indicating that “1” does not appear and will not be printed.
A [2] is 1, indicating that “2” has appeared once. Print 2.
A [3] is 1, indicating that “3” has appeared once. Print 3.
A [4] is 0, indicating that “4” does not appear, so it is not printed.
A [5] is 2, indicating that “5” has appeared twice. Print 5, 5.
A [6] is 0, indicating that “6” does not appear, so it is not printed.
A [7] is 0, indicating that “7” does not appear, so it is not printed.
A [8] is 1, indicating that “8” has appeared once. Print 8.
A [9] is 0, indicating that “9” does not appear, so it is not printed.
A [10] is 0, indicating that “10” does not appear and will not be printed.
The final screen output “2 3 5 5 8”, the complete code is as follows.
1 #include <stdio.h> 2 int main() 3 { 4 int a[11],i,j,t; 5 for(i=0; i<=10; i++) 6 a[i]=0; // initialize to 0 7 8 for(I =1; i<=5; 9 {10 scanf("%d",&t); A [t]++; For (I =0; i<=10; A [0]~a[10] 15 for(j=1; j<=a[i]; J++) // printf("%d ", I); 17 18 getchar(); getchar(); 19 // getchar(); 20 // You can also use system("pause"); Return 0; 22}Copy the code
The input data is: \
5, 3, 5, 2, 8
And if you look closely, you’ll see that this is sort from smallest to largest. But we want to sort from largest to smallest, so how do we do that? Or first think about yourself and then look down oh.
It’s really simple. You just take for(I =0; i<=10; I++) instead of the for (I = 10; i>=0; I –) OK, go and have a try.
This sort method is called “bucket sort”. Because the real bucket sorting is a little bit more complicated than this, and we’ll talk more about it later, this algorithm is pretty good for us.
This algorithm is like having 11 buckets, numbered from 0 to 10. Each time a number appears, a small flag is placed in the bucket with the corresponding number. Finally, just count how many small flags are in each bucket. For example, there is a small flag in bucket 2, indicating that 2 appears once. There is a small flag in bucket 3, indicating that 3 appears once; There are two small flags in bucket 5, indicating that 5 appears twice; Bucket 8 has a little flag in it, indicating that 8 appears once.
Now you can try typing n integers between 0 and 1000, sorting them from largest to smallest. As a reminder, if we need to sort integers between 0 and 1000, we need 1001 buckets to represent the number of occurrences of each number between 0 and 1000. In addition, each bucket here is simply “marking” the number of occurrences of each number, so I like to change the previous array A to a more appropriate name: book (the word “book” means a record, mark).
1 #include <stdio.h> 2 3 int main() 4 { 5 int book[1001],i,j,t,n; 6 for(i=0; i<=1000; i++) 7 book[i]=0; 8 scanf("%d",&n); 9 for(I =1; i<=n; If ("%d",&t); if ("%d",&t); // read each number into the variable t 12 book[t]++; 13} 14 for(I =1000; i>=0; I --) // Check buckets 1000 to 0 for(j=1; j<=book[i]; J++) // printf("%d ", I); 17 18 getchar(); getchar(); 19 return 0; 20}Copy the code
You can enter the following data for verification. \
10
8 100 50 22 15 6 1 1000 999 0
The result is: \
1000 999 100 50 22 15 8 6 10
Finally, time complexity. Line 6 loops m times (m is the number of buckets), line 9 loops N times (n is the number of buckets to sort), and lines 14 and 15 loop m+n times. So the whole sorting algorithm is done m+n+m+n times. We use capital letter O to represent the time complexity, so the algorithm’s time complexity is O(m+n+m+n), that is, O(2*(m+n)). We can ignore the smaller constants when we talk about the time complexity, and the final bucket sorting time is O(m+n). One other thing, when you’re talking about time complexity, n and m are usually capitalized O(m + n).
This is a very fast sorting algorithm. Bucket sorting has been used since 1956. The basic idea of this algorithm was proposed by E.J. Issac and R.C. Singleton. As I said earlier, this is not a real bucket sorting algorithm, a real bucket sorting algorithm is much more complicated than this!
Below specifically say radix sort and bucket sort!
Radix sort
The basic idea
Instead of comparing keywords, “allocate” and “collect” are used.
PS: Take the decimal system as an example. The base refers to the bits of a number, such as the ones, tens, hundreds, etc. In hexadecimal, for example, 0xB2, there are two radices (the plurality of radix).
Least Significant Digit (LSD)
Short keywords are considered small and placed first, and then keywords of the same length are sorted by dictionary order or numeric size, etc. Like 1,2,3,4,5,6,7,8,9,10,11 or “b, c, d, e, f, g, h, I, j, ba.”
Most Significant Digit (MSD)
Sort directly in dictionary order, suitable for strings, words, or integers of fixed length. For example: 1, 10, 2, 3, 4, 5, 6, 7, 8, 9 and “b, ba, c, d, e, f, g, h, I, j”.
Radix sort diagram
As can be seen from the diagram, the basic flow of radix sort (LSD) is as follows.
Radix sort process
Will be based on the right side of the integer Numbers will be thrown into the corresponding basket, 0 to 9, for the same number to keep their original relative order (to ensure the stability of the sorting algorithm), then put the basket in the number as shown in the list, and then a second trip to collect (according to the number of the second collection), thus constantly repeated, when there is no more, String numbers are sorted numbers.
Algorithm analysis
space
Using sequential allocation is obviously not appropriate. Because each pocket has the potential to hold all the integers to be sorted. So, the need for extra space is 10N, which is too much. It makes sense to use link allocation. The extra space required is N, and it is usually enough to add a pointer to each pocket. In general, let the value range of each key be D, the beginning and end Pointers total 2*radix, and the total space O(N +2*radix). \
time
Each counter in the figure above has two bits, so two allocations and two collections are sufficient. In general, each node has a D-bit keyword and must perform d allocation and collection operations.
- Cost per allocation: O(n)O(n)
- Cost per collection: O(radix)O(radix)
- Total cost: O(D *(N +radix))O(D ×(N +radix))
Algorithm c++ plus implementation
Radix sort algorithm based on LSD:
1 #include <iostream>
2
3 using namespace std;
4 const int MAX = 10;
5
6 void print(int *a,int sz) {
7 for(int i = 0; i < sz; i++)
8 cout << a[i] << " ";
9 cout << endl;
10 }
11
12 void RadixSortLSD(int *a, int arraySize)
13 {
14 int i, bucket[MAX], maxVal = 0, digitPosition =1 ;
15 for(i = 0; i < arraySize; i++) {
16 if(a[i] > maxVal) maxVal = a[i];
17 }
18
19 int pass = 1; // used to show the progress
20 /* maxVal: this variable decide the while-loop count
21 if maxVal is 3 digits, then we loop through 3 times */
22 while(maxVal/digitPosition > 0) {
23 /* reset counter */
24 int digitCount[10] = {0};
25
26 /* count pos-th digits (keys) */
27 for(i = 0; i < arraySize; i++)
28 digitCount[a[i]/digitPosition%10]++;
29
30 /* accumulated count */
31 for(i = 1; i < 10; i++)
32 digitCount[i] += digitCount[i-1];
33
34 /* To keep the order, start from back side */
35 for(i = arraySize - 1; i >= 0; i--)
36 bucket[--digitCount[a[i]/digitPosition%10]] = a[i];
37
38 /* rearrange the original array using elements in the bucket */
39 for(i = 0; i < arraySize; i++)
40 a[i] = bucket[i];
41
42 /* at this point, a array is sorted by digitPosition-th digit */
43 cout << "pass #" << pass++ << ": ";
44 print(a,arraySize);
45
46 /* move up the digit position */
47 digitPosition *= 10;
48 }
49 }
50
51 int main()
52 {
53 int a[] = {170, 45, 75, 90, 2, 24, 802, 66};
54 const size_t sz = sizeof(a)/sizeof(a[0]);
55
56 cout << "pass #0: ";
57 print(a,sz);
58 RadixSortLSD(&a[0],sz);
59 return 0;
60 }
Copy the code
The output is:
1 pass #0: 170 45 75 90 2 24 802 66
2 pass #1: 170 90 2 802 24 45 75 66
3 pass #2: 2 802 24 45 66 170 75 90
4 pass #3: 2 24 45 66 75 90 170 802
Copy the code
First count how many numbers are in each of the 10 baskets (or pockets), then the frequency distribution of numbers from 0 to 9 (rather than frequency density, which has a cumulative process) to determine where the subscripts of the positions are when the integers are “collected”. At the same time, in order to ensure the stability of the sorting algorithm, the same number keeps the original relative position unchanged, and the original data table is traversed in reverse order to form the collected data table one by one. For example, as shown in the figure above, the number 66 has a frequency distribution of 8, which means it should be ranked in the eighth place and subscript 7 in the array. If the frequency distribution is 4 for numbers 2 and 802, the subscript of 802 is 3 and the subscript of 2 is 2 when the table is traversed backwards, which in fact guarantees the stability of the sorting algorithm.
Bucket sort
The basic idea
The basic idea of bucket sort is to divide a data table into buckets, and then sort each bucket separately, either using a different sorting algorithm, or recursively using bucket sort algorithm. This is also a typical divide-and-conquer strategy. It is a distributed sort, between MSD radix sort and LSD radix sort.
The basic flow
Set up buckets; Iterate through the raw array and put the data into their buckets; Sort buckets that are not empty; The sorted array is made by walking through buckets and putting them back into the original array.
graphic
C++ plus description of the algorithm
1 #include <iostream> 2 #include <iomanip> 3 using namespace std; 4 5 #define NARRAY 8 /* array size */ 6 #define NBUCKET 5 /* bucket size */ 7 #define INTERVAL 10 /* bucket range */ 8 9 struct Node 10 { 11 int data; 12 struct Node *next; 13}; 14 15 void BucketSort(int arr[]); 16 struct Node *InsertionSort(struct Node *list); 17 void print(int arr[]); 18 void printBuckets(struct Node *list); 19 int getBucketIndex(int value); 20 21 void BucketSort(int arr[]) 22 { 23 int i,j; 24 struct Node **buckets; 25 26 /* allocate memory for array of pointers to the buckets */ 27 buckets = (struct Node **)malloc(sizeof(struct Node*) * NBUCKET); 28 29 /* initialize pointers to the buckets */ 30 for(i = 0; i < NBUCKET; ++i) { 31 buckets[i] = NULL; 32 } 33 34 /* put items into the buckets */ 35 for(i = 0; i < NARRAY; ++i) { 36 struct Node *current; 37 int pos = getBucketIndex(arr[i]); 38 current = (struct Node *) malloc(sizeof(struct Node)); 39 current->data = arr[i]; 40 current->next = buckets[pos]; 41 buckets[pos] = current; 42 } 43 44 /* check what's in each bucket */ 45 for(i = 0; i < NBUCKET; i++) { 46 cout << "Bucket[" << i << "] : "; 47 printBuckets(buckets[i]); 48 cout << endl; 49 } 50 51 /* sorting bucket using Insertion Sort */ 52 for(i = 0; i < NBUCKET; ++i) { 53 buckets[i] = InsertionSort(buckets[i]); 54 } 55 56 /* check what's in each bucket */ 57 cout << "-------------" << endl; 58 cout << "Bucktets after sorted" << endl; 59 for(i = 0; i < NBUCKET; i++) { 60 cout << "Bucket[" << i << "] : "; 61 printBuckets(buckets[i]); 62 cout << endl; 63 } 64 65 /* put items back to original array */ 66 for(j =0, i = 0; i < NBUCKET; ++i) { 67 struct Node *node; 68 node = buckets[i]; 69 while(node) { 70 arr[j++] = node->data; 71 node = node->next; 72 } 73 } 74 75 /* free memory */ 76 for(i = 0; i < NBUCKET; ++i) { 77 struct Node *node; 78 node = buckets[i]; 79 while(node) { 80 struct Node *tmp; 81 tmp = node; 82 node = node->next; 83 free(tmp); 84 } 85 } 86 free(buckets); 87 return; 88 } 89 90 /* Insertion Sort */ 91 struct Node *InsertionSort(struct Node *list) 92 { 93 struct Node *k,*nodeList; 94 /* need at least two items to sort */ 95 if(list == 0 || list->next == 0) { 96 return list; 97 } 98 99 nodeList = list; 100 k = list->next; 101 nodeList->next = 0; /* 1st node is new list */ 102 while(k ! = 0) { 103 struct Node *ptr; 104 /* check if insert before first */ 105 if(nodeList->data > k->data) { 106 struct Node *tmp; 107 tmp = k; 108 k = k->next; 109 tmp->next = nodeList; 110 nodeList = tmp; 111 continue; 112 } 113 114 for(ptr = nodeList; ptr->next ! = 0; ptr = ptr->next) { 115 if(ptr->next->data > k->data) break; 116 } 117 118 if(ptr->next! =0){ 119 struct Node *tmp; 120 tmp = k; 121 k = k->next; 122 tmp->next = ptr->next; 123 ptr->next = tmp; 124 continue; 125 } 126 else{ 127 ptr->next = k; 128 k = k->next; 129 ptr->next->next = 0; 130 continue; 131 } 132 } 133 return nodeList; 134 } 135 136 int getBucketIndex(int value) 137 { 138 return value/INTERVAL; 139 } 140 141 void print(int ar[]) 142 { 143 int i; 144 for(i = 0; i < NARRAY; ++i) { 145 cout << setw(3) << ar[i]; 146 } 147 cout << endl; 148 } 149 150 void printBuckets(struct Node *list) 151 { 152 struct Node *cur = list; 153 while(cur) { 154 cout << setw(3) << cur->data; 155 cur = cur->next; Int main(void) 160 {161 int array[NARRAY] = {29,25,3,49,9,37,21,43}; 162 163 cout << "Initial array" << endl; 164 print(array); 165 cout << "-------------" << endl; 166 167 BucketSort(array); 168 cout << "-------------" << endl; 169 cout << "Sorted array" << endl; 170 print(array); 171 return 0; 172}Copy the code
The output is:
1 Initial array 2 29 25 3 49 9 37 21 43 3 ------------- 4 Bucket[0] : 9 3 5 Bucket[1] : 6 Bucket[2] : 21 25 29 7 Bucket[3] : 37 8 Bucket[4] : 43 49 9 ------------- 10 Bucktets after sorted 11 Bucket[0] : 3 9 12 Bucket[1] : 13 Bucket[2] : 21 25 29 14 Bucket[3] : 37 15 Bucket[4] : 43 49 16 ------------- 17 Sorted array 18 3 9 21 25 29 37 43 49 Copy the code
Although the bucket in the program uses a linked list structure to make full use of space resources, and the bucket is also cleverly constructed, the data structure is similar to the form of stack linked list, inserts are inserted at the top, so the data after traversal will always be on the top, so the output after the bucket can be seen from the comparison with the diagram. The discovery is indeed the reverse of the original relative order.
Counting sort
The sorting time complexity of the data table with the length of N cannot be lower than O(nlogn). But if you know something about the table, you can sort in a more unique way, even in linear time.
The basic idea
When the length of the data table is N, it is known that the range of data in the data table is limited, for example, between 0 and K, and k is much smaller than N. Thus, counting sort can be achieved by counting the frequency of data at each range point.
Basic operation
According to the scope of the data table obtained, the data is divided into buckets, and then the frequency of data on buckets is directly counted. Then the sorted data table can be obtained by traversing buckets in sequence.
Algorithm c++ plus implementation
1 #include <iostream> 2 3 using namespace std; 4 5 void print(int a[], int sz) { 6 for (int i = 0; i < sz; i++ ) cout << a[i] << " "; 7 cout << endl; 8 } 9 10 void CountingSort(int arr[], int sz) { 11 int i, j, k; 12 int idx = 0; 13 int min, max; 14 15 min = max = arr[0]; 16 for(i = 1; i < sz; i++) { 17 min = (arr[i] < min) ? arr[i] : min; 18 max = (arr[i] > max) ? arr[i] : max; 19 } 20 21 k = max - min + 1; 22 /* creates k buckets */ 23 int *B = new int [k]; 24 for(i = 0; i < k; i++) B[i] = 0; 25 26 for(i = 0; i < sz; i++) B[arr[i] - min]++; 27 for(i = min; i <= max; i++) 28 for(j = 0; j < B[i - min]; j++) arr[idx++] = i; 29 30 print(arr,sz); 31 32 delete [] B; 33} 34 35 int main() 36 {37 int a[] = {5,9,3,9,10,9,2,4,13,10}; 38 const size_t sz = sizeof(a)/sizeof(a[0]); 39 print(a,sz); 40 cout << "----------------------\n" ; 41 CountingSort(a, sz); 42}Copy the code
The output is:
2 ---------------------- 3 2 3 4 5 9 9 9 10 10 13Copy the code
K buckets are constructed to calculate the data frequency. There are two steps to do the sorting. The first step is to make incremental counting and the second step is to write the corresponding number of counting statistics into the original data table.
Because this sort does not use comparison, it breaks the upper limit of time complexity O(nlogn). However, counting sort is not like a sort, because in complex data structures, it cannot sort structures by sorting the sorting codes in the same structure. Not to mention stability.