Pay attention to the “code brother byte” star label, receive the latest technology dry goods to improve themselves. Github: github.com/UniqueDong/…

Before we learned O(n²) classical sorting algorithm: bubble sort, insert sort, select sort, today we will learn O(nlogn) merge sort, this kind of sorting idea is also more commonly used.

Merge sort and quicksort both use divide-and-conquer.

As a typical divide-and-conquer algorithm application, merge sort is implemented by two methods:

  • Top-down recursion (all recursive methods can be iteratively rewritten, hence the second method);
  • Bottom-up iteration;

The principle of

Divide the array in the middle into the left and right parts, sort the left and right parts separately, and merge the two parts of the sort number together until the whole sequence is sorted. It’s actually using the idea of divide and conquer, which is, as the name suggests, divide and conquer, breaking up a big problem into smaller sub-problems.

It’s a little bit like recursion, but divide and conquer is always done recursively. Divide and conquer is a problem solving processing idea, recursion is a programming skill.

The recursive formula

As we said earlier in recursion, the trick to recursive code is to analyze the recursive formula, find the termination condition, and then translate the recursive formula into code.

Merge_sort (p... R) = merge (merge_sort (p... Q), merge_sort (q + 1... R)) termination condition: P >= r no further decompositionCopy the code

Let me explain the recursion.

Merge_sort (p… R) means, sort arrays with subscripts from p to r. We convert this sorting problem into two subproblems, merge_sort(p… Q) and merge_sort (q + 1… R), where q is equal to the middle position between p and r, which is (p+r)/2. Once the subarrays p to q and q+1 to r are sorted, we merge the two ordered subarrays together so that the subarrays p to r are sorted.

public static void mergeSort(int[] arr) {
    sort(arr, 0, arr.length - 1);
}

public static void sort(int[] arr, int left, int right) {
    // Recursive termination condition
    if(left >= right) {
        return;
    }
    // Get the middle position between left right
    int mid = left + ((right - left) >> 1);
    // Divide and conquer recursively
    sort(arr, left, mid);
    sort(arr, mid + 1, right);
    merge(arr, left, mid, right);
}

// Merge data
public static void merge(int[] arr, int left, int mid, int right) {
    int[] temp = new int[right - left + 1];
    int i = 0;
    int p1 = left;
    int p2 = mid + 1;
    // Compare the left and right elements, which is smaller, and fill that element in temp
    while(p1 <= mid && p2 <= right) {
        temp[i++] = arr[p1] < arr[p2] ? arr[p1++] : arr[p2++];
    }
    // After the above loop exits, fill the remaining elements into temp
    // Only one of the following two while will be executed
    while(p1 <= mid) {
        temp[i++] = arr[p1++];
    }
    while(p2 <= right) {
        temp[i++] = arr[p2++];
    }
    // Copy the final sorting result to the original array
    for(i = 0; i < temp.length; i++) { arr[left + i] = temp[i]; }}Copy the code

Performance analysis

First, is merge sort a stable sorting algorithm?

As you can see, the key to a stable merge sort is the merge() function, which is the part of the code that merges two ordered subarrays into one ordered array.

If A[left… mid] and A[mid+1… right] have the same value, we can put the elements from A[left… mid] into TMP array as in the pseudocode. This ensures that elements with the same value are placed in the same order before and after the merge. So merge sort is a stable sorting algorithm.

Second, what is the time complexity of merge sort?

Merge sort involves recursion, and the analysis of time complexity is a little more complicated. This is a good opportunity to learn how to analyze the time complexity of recursive code.

In the recursion section, we said that recursion is applicable when a problem A can be decomposed into multiple subproblems B and C, so solving problem A can be decomposed into solving problems B and C. After problems B and C are solved, we combine the results of B and C into the results of A.

If we define the time to solve problem A to be T(a), and the time to solve problem B and C to be T(b) and T(c) respectively, then we can obtain such a recursive relation: where K is equal to the time to combine the results of two subproblems B and C into the result of problem A.

T(a) = T(b) + T(c) + K

We can draw an important conclusion: not only the problem of recursive solution can be written as a recursive formula, but also the time complexity of recursive code can be written as a recursive formula.

Let’s say it takes T(n) to merge n elements, so it takes T(n/2) to split into two subarrays. We know that merge() merges two ordered subarrays in O(n) time. So, using the previous formula, the time complexity of merge sort can be calculated by:

T (1) = C; When n=1, only constant execution time is required, so it is denoted by C. T(n) = 2 times T(n/2) + n; n>1Copy the code

So how do WE solve for T(n)? Not intuitive enough? So let’s break it down a little bit more.

T(n) = 2*T(n/2) + n = 2*(2*T(n/4) + n/2) + n = 4*T(n/4) + 2*n = 4*(2*T(n/8) + n/4) + 2*n = 8*T(n/8) + 3*n = 8*(2*T(n/16)  + n/8) + 3*n = 16*T(n/16) + 4*n ...... = 2^k * T(n/2^k) + k * n ......Copy the code

So by doing this step by step, we get T of n is equal to 2 to the kT of n over 2 to the k plus kN. When T of n over 2 to the k is equal to T of 1, which is n over 2 to the k is equal to 1, we get k is equal to log base 2n. We substitute the value of k into the formula above, and we get T(n)=Cn+nlog2n. If we use big O notation, T of n is equal to order nlogn. So merge sort is order nlogn in the best case, the worst case, the average case, order nlogn.

Third, what is the space complexity of merge sort?

Merge sort is order nlogn in any case, which looks pretty good. (As you’ll see, even quicksort is O(n ^ 2) in the worst case.) But merge sort is not as widely used as quicksort. Why is that? Because it has a fatal “weakness”, that is, merge sort is not an in-place sort algorithm.

This is because the merge function of merge sort requires extra storage space to merge two ordered arrays into one.

In fact, the spatial complexity of recursive code does not add up in the same way as the temporal complexity. We just forgot the most important point, which is that although each merge requires additional memory, the temporary space is freed up after the merge is complete. There is only one function executing on the CPU at any one time, so there is only one temporary memory space in use. The temporary memory space can’t exceed the size of n data, so the space complexity is O(n).

After thinking about

You now have 10 interfaces to log files, each about 300MB in size, and the logs in each file are sorted by time stamp from smallest to largest. You want to merge the 10 smaller log files into a single log file, and the merged logs will still be sorted by timestamp from smallest to largest. If the machine handling the above sorting task has only 1GB of memory, do you have a good solution to merge the 10 log files “quickly”?

Back to Taiwan reply: log sort, you can get the answer oh.

“Add group” to join the technical group to get more growth, the latest content in hand