Reprint please indicate the source leonchen1024.com/2018/08/14/…

Dsmudge Search (BINARY search), also called half-interval search, logarithmic search (binary chop), is a search algorithm that looks for a specific element in an ordered array.

There are several variations on binary search. In particular, fractional cascading (gathering the values of each array into an array of 11[0,3,2,0] Improves the efficiency of finding the same value in multiple arrays, effectively solving a series of search problems in computational geometry and other fields. Binary search: Exponential search extends binary search to a list that has no bounds. Binary search tree and B-tree are binary search extensions.

The principle of

The search starts with the middle element of the array. If the middle element is exactly the element to be searched, the search process ends. If the middle element is greater than or less than the element being looked for, the search continues in half of the array that is greater than or less than the element being looked for, and the process is repeated until the element is found, or if the half is empty, it is not found. Each comparison reduces the search area by half.

steps

Given an ordered array A is A0… , an-1 and ensure that A0<=… <=An-1, and the target value T.

  1. Let L be 0 and R be n minus 1.
  2. If L>R, the search fails
  3. Let m(intermediate element index) be the largest integer less than (L+R)/2
  4. If Am
  5. If Am>T, set R=m-1 and go back to step 2;
  6. When Am=T, the search ends; The index position of T is m.

variant

  1. Let L be 0 and R be n minus 1.
  2. Let m(index of intermediate elements) be the upper bound, that is, the smallest value greater than (L+R)/2.
  3. If Am>T, set R to m-1 and return to step 2
  4. If Am<=T, set L to m and return to step 2.
  5. Until L=R, the search is complete. If T=Am, m is returned; otherwise, the search fails.

Reprint please indicate the source leonchen1024.com/2018/08/14/…

This variant sets L to m instead of m+1 when Am<=T. This method of comparison is faster because it omits one comparison per loop. But on average, there will be one more cycle. This variant always returns the rightmost element index if the array contains duplicate elements. For example, if A is [1,2,3,4,4,5,6,7], the method will return index 4 instead of index 3.

Roughly match the

Due to the sequential nature of ordered arrays, binary search can be extended to approximate matches. It can be used to calculate the rank (or rank, the number of elements smaller than it), the advance (next smallest element), the successor (next largest element), and the nearest neighbor of an assignment. You can also use two ranking queries to perform range queries.

  • Ranking queries can be performed using a modified binary search. Return m on success and L on failure, which equals the number of elements less than the target value.
  • Moving forward and succeeding can be done using ranking queries. When the ranking of the target value is known, the forward is the element of the previous ranking position on success and the element of the ranking position on failure. It is followed by the next element in the ranking position, or the next element that moves forward. The nearest lead to the target value may be a forward or a successor, depending on which is closer to the target value.
  • Range query, once the ranking of the values on both sides of the range is known, then the ranking of the elements greater than the boundary minimum and less than the boundary maximum is their range, whether or not to contain the boundary value is handled as needed.

Performance analysis

Time complexity Binary search reduces the search area by half each time, and time complexity is


(n is the number of elements in the set.) The worst case is when the traversal reaches the last level, or when the element is not found .

The comprehensive complexity is

Fractional cascading can improve the efficiency of querying for the same value across a large number of groups. K is the number of arrays to query for the target value in each arrayDispersion layering can reduce it to.

Variant efficiency analysis reduces the number of comparisons per cycle compared to normal binary search, but it must complete the cycle without getting the answer in the middle. However, in the case of large n, the increase in the number of comparisons reduced cannot offset the consumption of redundant cycles.

Reprint please indicate the source leonchen1024.com/2018/08/14/…

Space complexity O(1). Tail recursion, can be rewritten as a loop.

application

Finds elements in an array, or for insertion sort.

Binary search compared to other schemes

An ordered array using binary search is inefficient in insert and delete operations, consuming O(n) time per operation. Other data structures provide more efficient inserts and deletions, and provide equally efficient full matching. However, binary search is suitable for many search problems and only consumesThe time.

Hashing

For Associative Arrays, hash tables, which use hash functions to map keys to data structures on records, are generally faster than binary look-up in the case of ordered arrays. Most of the implementation average overhead is constant. However, hashing does not apply to fuzzy matching, such as calculating forward, subsequent, and nearest keys. The only information that hashing can give us in the case of a failed query is that the target does not exist in the record. Binary lookup is an ideal pattern for this kind of matching, consuming logarithmic time.

Trees

Binary search tree is a binary tree data structure based on binary search principle. The tree records are arranged in order, and each record in each tree can be searched using a binary search method that takes logarithmic time on average. The average time to insert and delete is also logarithmic. This will consume linear time faster than ordered arrays, and binary trees have all the operations that ordered arrays can perform, including range and fuzzy lookups.

However, binary search is generally more efficient than binary search tree search, because binary search trees are likely to be completely unbalanced, resulting in poor performance. The same applies to balanced binary search trees, which balance their own nodes slightly towards a fully balanced tree. Although unlikely, it is possible for a tree to have only a few nodes with two child nodes resulting in a severe imbalance, in which case the average time loss and the worst case are almost O(n). Binary search trees take up more space than ordered arrays.

Binary search trees can be efficiently structured in the file system, so they can be searched quickly on the hard disk. B-tree generalizes this approach to tree structure. B-tree is used to organize long-term storage such as databases and filesystems.

Linear search

Linear Search is a simple Search algorithm that searches every record until it finds the target value. Linear search can be used on linked lists, where insertion and deletion are faster than on arrays. Binary search is faster than linear search unless the array is very short. If the array must be sorted first, the cost must be amortized in the search. Sorting arrays also allows for efficient approximate matching and other operations.

Set membership algorithms

A search related issue is set membership. All search algorithms, such as binary search, can be applied to set members. There are other algorithms that are more suitable for collection members. Bit arrays are the simplest and are useful when the range of keys is limited. It’s very fast, order one time. Judy array can handle 64 – bit keys efficiently.

For approximate results, Bloom filters are another hash-based probabilistic data structure that stores collections of keys encoded using bit arrays and multiple hash functions. Bloom filters are more space-efficient than bit Arrays in most cases, but not much slower: member lookup takes O(k) time using k-weighted hash functions. However, Bloom filters are prone to misjudgment.

Other data structures

Reprint please indicate the source leonchen1024.com/2018/08/14/…

There are some data structures that in some cases are more efficient than binary searches on ordered arrays for lookups or whatever. For example, searches, approximate matches, and other operations on van Emde Boas trees, fusion trees, prefix trees, and bit arrays can be more efficient than binary searches on ordered arrays. However, although these operations can be more efficient than using ordered arrays if keys are ignored, such data structures usually take advantage of certain key properties (keys are usually small integers), so a key lacking those properties will consume more space or time. Some structures, such as the Judy matrix, use a combination of ways to ensure efficiency and the ability to perform approximate matching.

variant

Uniform binary search

Uniform Binary Search is not a store of bound values for lower and upper limits, but rather an index of intermediate elements and changes from the middle element in one loop to the middle element in the next. The change at each step is reduced by half. For example, the array to search is [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] and the middle element is 6.Uniform binary search operates on both the left and right subarrays. In this case, the middle element of the left subarray ([1, 2, 3, 4, 5]) is 3 and the middle element of the right subarray ([7, 8, 9, 10, 11]) is 9. Then store 3 as the difference between the two intermediate elements and 6. To reduce the space used in the search, the algorithm adds or subtracts both this and the intermediate element changes. The advantage of this algorithm is that the index differences of each loop can be stored in a table, which can improve the performance of the algorithm in some systems.

Exponential search

Exponential Search extends binary Search to unbounded arrays. It starts by looking for the index of an element whose first index is a power of 2 and is greater than the target value. It then sets the element index to the upper boundary and begins a binary search. Exponential search consumptionSecond cycle, then binary search consumptionThe second loop, x is the position of the target value. Exponential lookups work for bounded lists and improve performance over binary lookups when the target value is near the beginning of the array. Please indicate the source of reprintLeonchen1024.com/2018/08/14/…

Interpolation search

The Interpolation search ignores the location of the target value and computes the distance between the lowest and highest element of the array which is the length of the array. This can only be used if the array elements are numbers. It applies when the median is not the best guess choice. For example, if the target value is near the highest element of the array, it is best to locate it at the end of the array. If the array is evenly distributed or nearly evenly distributed, it consumesOnce more.

In fact, an interpolated search is slower than a binary search with fewer array elements because it requires extra computation. Although the increase in time complexity is less than that of binary search, only in the case of large arrays can the computational loss be compensated for.

Fractional cascading

Fractional cascading can improve the efficiency of finding the same elements or approximate matches in multiple ordered arrays, as needed to find the same elements in each array separatelyK is the number of arrays. Decentralized cascade reduces this time by storing the information in each array in the specified way .

Reprint please indicate the source leonchen1024.com/2018/08/14/…

It aggregates the values of each array into an array of the form 11[0,3,2,0], and the numbers in parentheses are the numbers that the value should return in the corresponding array.) it improves the efficiency of finding the same value in multiple arrays, effectively solving a series of search problems in computational geometry and other fields

Decentralized cascade was invented for efficient solutions to various kinds of computational geometry problems, but it also applies to other applications, such as data mining and Internet Protocol.

Implementation problems

Note that the intermediate value method, if used (L+R)/2 when the number of elements in the array is too large, will cause overflow. So I’m going to use L plus R minus L over 2.

The sample

Version C – Recursive

int binary_search(const int arr[], int start , int end , int khey){
    if (start > end)
      return - 1;

    int mid = start +(end - start)/2;   // Direct averaging may overflow, so use this algorithm
    if (arr[mid] > khey)
        return binary_search(arr , start , mid - 1 , khey);
    else if (arr[mid] < khey)
        return binary_search(arr , mid + 1 , end , khey);
    else
        return mid;    // Equality is detected last because most searches are either greater than or less than

}

Copy the code

C version – while loop

int binary_search(const int arr[], int start, int end, int khey){
    int result = - 1;    // Return -1 if no data is found

    int mid;
    while (start <= end){
      mid = start + (end - start)/2 ;    // Direct averaging may overflow, so use this algorithm
      if (arr[mid] > khey)
          end = mid- 1;
      else if (arr[mid] < khey)
          start = mid + 1;
      else{    // Equality is detected last because most searches are either greater than or less than
          result = mid;
          break; }}return result;

}

Copy the code

Python3 recursive

def binary_search(arr, start, end, hkey):
    if start > end:
        return - 1

    mid = start + (end - start) / 2
    if arr[mid] > hkey:
        return binary_search(arr, start , mid - 1,hkey)
    if arr[mid] < hkey:
        return binary_search(arr, mid + 1, end, hkey)
    return mid

Copy the code

Python3 while loop

def binary_search(arr, start, end, hkey):
    result = - 1

    while start <= end:
        mid = start + (end - start) / 2
        if arr[mid] > hkey :
            end = mid - 1
        elif arr[mid] < hkey :
            start = mid + 1
        else :
            result = mid
            break

    return result

Copy the code

Java recursive

public static int binarySearch(int[] arr, int start, int end, int hkey){
    if (start > end)
        return -1;

    int mid = start + (end - start)/2;    // Prevent overflow
    if (arr[mid] > hkey)
        return binarySearch(arr, start, mid - 1, hkey);
    if (arr[mid] < hkey)
        return binarySearch(arr, mid + 1, end, hkey);
    return mid;  

}

Copy the code

The Java while loop


public static int binarySearch(int[] arr, int start, int end, int hkey){
    int result = -1;

    while (start <= end){
        int mid = start + (end - start)/2;    // Prevent overflow
        if (arr[mid] > hkey)
            end = mid - 1;
        else if (arr[mid] < hkey)
            start = mid + 1;
        else {
            result = mid ;  
            break; }}return result;

}

Copy the code

References

En.wikipedia.org/wiki/Binary…

Reprint please indicate the source leonchen1024.com/2018/08/14/…

About Me

My blog is leonchen1024.com

My lot github.com/LeonChen102…

Wechat official account