Binary search and its variant algorithm

preface

Binary Search (Binary Search) is a Search algorithm for ordered data sets.

Idea: Binary lookup is an ordered set of data (ascending or descending). The idea of lookup is somewhat similar to divide-and-conquer. Each time, the interval to be searched is reduced by half by comparing with the middle element of the interval, until the element to be searched is found, or the interval is reduced to 0

Step: Define low, high, mid Pointers to the middle of the beginning and end respectively. Value is the value we are looking for

Perform the following algorithm

(1) if arr[mid] = value, return mid subscript

(2) When ARr [mid] < value, it indicates that the target element is in the right interval divided by mid, so the initial position low of the interval is assigned as mid+1

(3) When ARr [mid] > value, it indicates that the target element is in the left interval divided by mid, so the end position of the interval high is assigned as mid-1

Illustration:

In an ordered set,11,19,23,27,33,45,55,67,98 {8}, for example, to find value value = 19 were analyzed

Complexity analysis

So let’s say the size of the data is n, and each round of the search is going to be half the size of the data, which is divided by 2. At best, it is found on initialization, at worst, it does not stop until the search interval is reduced to empty.

NNN, n2\frac{n}{2}2n, n4\frac{n}{4}4n, n8\frac{n}{8}8n,… N2k \frac{n}{2^k}2kn, n2k\frac{n}{2^k}2kn Where n2k\frac{n}{2^k}2kn = 111, the value of k is the total number of reduction. Each reduction operation only involves the size comparison of two data. Therefore, after k interval reduction operations, the time complexity is O(k). N2k \frac{n}{2^k}2kn = 111, we can find KKK = log⁡2n\log_2nlog2n, so the time complexity is O(logn).

Advantages: Efficient binary search, with amazing search speed, because logn is a very “scary” order of magnitude, even if n is very, very large, the corresponding logn is also very small. For example, n is equal to 2 to the 32nd, which is a big number, right? It’s about 4.2 billion. In other words, if we were to use binary search to find one data in 4.2 billion data sets, we would need at most 32 comparisons

coding

conventional

Iterative implementation

/** **@param arr
 * @param value
 * @return* /
public static int binarySerach1(int[] arr, int  value) {
    // header pointer
    int low = 0;
    // end pointer
    int high = arr.length - 1;

    while (low <= high){
        int mid = low + ((high - low) >> 1);

        if (arr[mid] == value){
            return mid;
        }else if (arr[mid] > value){
            high = mid - 1;
        }else {
            low = mid + 1; }}return -1;
}
Copy the code

Note:

Loop exit condition: note low <= high, not low < high
Value of mid:
- Mid =(low + high)/2 is problematic. Because if low and high are high, the sum of the two might overflow
- The improved method is to write the calculation method of mid as low + (high-low)/2
- Change the division operation to bitwise because computers can process bitwise operations much faster than division

Recursive implementation

/** * recursion *@param arr
 * @param value
 * @return* /
public static int binarySerach2(int[] arr, int  value){
    return bsearch(arr,0,arr.length -1,value);
}

private static int bsearch(int[] arr, int low, int high, int value) {

    if (low > high) return -1;

    int mid =  low + ((high - low) >> 1);
    if (arr[mid] == value) {
        return mid;
    } else if (arr[mid] > value) {
        return bsearch(arr, low, mid-1, value);

    } else {
        return bsearch(arr, mid+1, high, value); }}Copy the code

variant

1. Find the element whose first value is equal to the given value

The binary search algorithm implemented above is relatively simple and has some limitations, for example, it is the default element is not repeated. So, assuming that the ordered combination has repeating elements, then using the above algorithm to solve it is definitely problematic. Take the ordered array {1,3,4,5,6,8,8,8,11,18}, where the values of a[5], a[6], a[7] are all equal to 8, which are repeated data. We want to find the first element equal to 8, which is the element with subscript 5. The illustration is as follows:

Code:

public static int binarySerach3(int[] arr, int value) {
    int low = 0;
    int high = arr.length - 1;
    while (low <= high) {
        int mid =  low + ((high - low) >> 1);
        if (arr[mid] > value) {
            high = mid - 1;
        } else if (arr[mid] < value) {
            low = mid + 1;
        } else {
            if ((mid == 0) || (arr[mid - 1] != value)) return mid;
            else high = mid - 1; }}return -1;
}
Copy the code

Key: Line 11 is the key of the algorithm. The previous code is the same idea as before, but it is different from line 11. We need to find the element whose first value is equal to the given value, so we only need to pay attention to two places

mid == 0“, which means that the value found is the value at index 0 of the array, which is the starting position, so it must be the first value
ifmidThe preceding digit is not to be matchedvalue, then the first value is found, otherwise,highPoint to themid - 1, re-compare

Find the element whose last value is equal to the given value

The ordered array {1,3,4,5,6,8,8,8,11,18} is used again, where the values of a[5], a[6], a[7] are all equal to 8, which are repeated data. We want to find the last element equal to 8, the element with subscript 7.

Code:

/** * finds the element * whose last value is equal to the given value@param arr
 * @param value
 * @return* /
public static int binarySerach4(int[] arr, int value) {
    int low = 0;
    int high = arr.length - 1;
    while (low <= high) {
        int mid =  low + ((high - low) >> 1);
        if (arr[mid] > value) {
            high = mid - 1;
        } else if (arr[mid] < value) {
            low = mid + 1;
        } else {
            if ((mid == arr.length - 1) || (arr[mid + 1] != value)) return mid;
            else high = mid + 1; }}return -1;
}
Copy the code

The case for this variant is very similar to the first one, with a slight modification of the code condition after line 11 without further analysis

Find the first element that is greater than or equal to the given value

Let’s look at another kind of deformation problem. In an ordered array, find the first element greater than or equal to a given value. For example, an array stores a sequence like {3,4,6,7,10}. If you look for the first element that is greater than or equal to 5, it is 6.

Code:

/** * find the first element that is greater than or equal to the given value *@param arr
 * @param value
 * @return* /
public static int binarySerach3(int[] arr, int value) {
    int low = 0;
    int high = arr.length - 1;
    while (low <= high) {
        int mid =  low + ((high - low) >> 1);
        if (arr[mid] >= value) {
            if ((mid == 0) || (arr[mid - 1] < value)) return mid;
            else high = mid - 1;
        } else {
            low = mid + 1; }}return -1;
}
Copy the code

Key: this variant is also the first value searched, and the search logic of variant 1 is the same, except that the matching value value is changed to the search ratio value larger, and arr[mid] >= value is uniformly processed

Find the last element whose value is less than or equal to the given value

Again, the valid sequence {3,4,6,7,10} is used to find the last element that is less than or equal to the given value. If the given value is 5, then the search process is 4.

/** * find the first element that is greater than or equal to the given value *@param arr
 * @param value
 * @return* /
public static int binarySerach4(int[] arr, int value) {
    int low = 0;
    int high = arr.length - 1;
    while (low <= high) {
        int mid =  low + ((high - low) >> 1);
        if (arr[mid] > value) {
            high = mid - 1;
        } else {
            if ((mid == arr.length - 1) || (arr[mid + 1] > value)) return mid;
            else low = mid + 1; }}return -1;
}
Copy the code

Train of thought and above roughly same, do not make analysis

limitations

The time complexity of binary search is O(logn), and the efficiency of searching data is very high. However, binary search can not be used in every situation, and its application scenarios are very limited. So when is binary search appropriate and when is it not appropriate?

Binary lookup relies on a sequential table structure, which is simply an array

Can binary lookup rely on other data structures? Like linked lists. The answer is no, mainly because binary search algorithms require random access to elements by subscript. We saw in arrays and linked lists that the time for an array to randomly access data by index is O(1), while the time for a linked list to randomly access data is O(n). So, if the data is stored in linked lists, the time complexity of binary lookup becomes high.

Binary lookup is for ordered data

Binary search is more demanding at this point, the data must be ordered. If the data is not in order, we need to sort it first. The lowest time complexity of sorting is O(nlogn). So, if we’re dealing with a static set of data, without frequent insertions and deletions, we can do one sort, multiple binary searches. In this way, the sorting cost can be evenly amortized, and the marginal cost of binary search will be low.

Therefore, binary search can only be used in scenarios where insert and delete operations are infrequent and multiple searches are sorted at one time. Binary lookup is no longer suitable for dynamically changing data sets.

3, the amount of data is too small for binary search.

If the amount of data to be processed is small, binary lookup is not necessary at all, and sequential traversal is sufficient. For example, if we are looking for an element in an array of size 10, the speed of the search is about the same whether we use binary search or sequential traversal. Only when the amount of data is relatively large, the advantage of binary search will be more obvious.

There is, however, an exception. If comparing data is time-consuming, I recommend using binary lookup regardless of the size of the data. For example, arrays are filled with strings of more than 300 length, which would be time-consuming to compare the size of two strings. We need to reduce the number of comparisons as much as possible, and reducing the number of comparisons greatly improves performance, where binary lookup has an advantage over sequential traversal.

4, the amount of data is too large for binary search.

The bottom layer of binary search depends on the data structure of array, which requires continuous memory space to support random access. For example, if we have 1GB of data and want to store it in arrays, we need 1GB of contiguous memory space.

Notice the word “contiguous” here, which means that even if there is 2GB of memory left, if the remaining 2GB is scattered, there is no contiguous 1GB of memory, then there is no way to apply for a 1GB array. And our binary search is the function of the array data structure above, so too large data with array storage is more difficult, it can not use binary search.

The statement

** Wang — “The Beauty of Data Structures and Algorithms”, Why the minimum time complexity of sorting is O (nlogn)

The article is original, welcome to reprint, just indicate the source

Personal ability is limited, have incorrect place, also please correct

The code for this article has been uploadedgithubWelcome star –Making the address

Binary search and its variant algorithm

preface

Complexity analysis

coding

conventional

variant

limitations

The statement

Related Posts

Volatile does not guarantee atomicity

Golang needs to avoid stepping on 50 pits

Flask’s journey: Get started fast