1 Time Complexity

1.1 Asymptotic notation

$\Theta$ notation (pronounced "big Theta", asymptotically tight bound):

$$\Theta(g(n)) = \{f(n) : \exists c_1, c_2, n_0 \in \mathbb{R}^+,\ \forall n \ge n_0,\ 0 \le c_1 g(n) \le f(n) \le c_2 g(n)\}$$

For any two functions $f(n)$ and $g(n)$, $f(n) = \Theta(g(n))$ if and only if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$.

$O$ notation (pronounced "big O", asymptotic upper bound):

$$O(g(n)) = \{f(n) : \exists c, n_0 \in \mathbb{R}^+,\ \forall n \ge n_0,\ 0 \le f(n) \le c g(n)\}$$

$\Omega$ notation (pronounced "big Omega", asymptotic lower bound):

$$\Omega(g(n)) = \{f(n) : \exists c, n_0 \in \mathbb{R}^+,\ \forall n \ge n_0,\ 0 \le c g(n) \le f(n)\}$$

$o$ notation (pronounced "little o", upper bound that is not asymptotically tight):

$$o(g(n)) = \{f(n) : \forall c \in \mathbb{R}^+,\ \exists n_0 > 0,\ \forall n \ge n_0,\ 0 \le f(n) < c g(n)\}$$

$\omega$ notation (pronounced "little omega", lower bound that is not asymptotically tight):

$$\omega(g(n)) = \{f(n) : \forall c \in \mathbb{R}^+,\ \exists n_0 > 0,\ \forall n \ge n_0,\ 0 \le c g(n) < f(n)\}$$

1.2 Master method

The master theorem is as follows:

Let $a \ge 1$ and $b > 1$ be constants, let $f(n)$ be a function, and let $T(n)$ be defined on the non-negative integers by the recurrence:

$$T(n) = aT(n/b) + f(n)$$

where $n/b$ is interpreted as either $\lfloor n/b \rfloor$ or $\lceil n/b \rceil$. Then $T(n)$ has the following asymptotic bounds:

  1. If $f(n) = O(n^{\log_b a - \epsilon})$ for some constant $\epsilon > 0$, then $T(n) = \Theta(n^{\log_b a})$.
  2. If $f(n) = \Theta(n^{\log_b a}\lg^k{n})$ for some constant $k \ge 0$, then $T(n) = \Theta(n^{\log_b a}\lg^{k+1}{n})$.
  3. If $f(n) = \Omega(n^{\log_b a + \epsilon})$ for some constant $\epsilon > 0$, and if $af(n/b) \le cf(n)$ for some constant $c < 1$ and all sufficiently large $n$, then $T(n) = \Theta(f(n))$.

Example 1:


$$T(n)=9T(n/3)+n$$

Here $n^{\log_3 9} = n^2$ and $f(n) = n = O(n^{2-\epsilon})$, so by case 1 of the master theorem, $T(n) = \Theta(n^2)$.

Example 2:


$$T(n)=T(2n/3)+1$$

Here $n^{\log_{3/2} 1} = 1$ and $f(n) = \Theta(1)$, so by case 2 of the master theorem, $T(n) = \Theta(\lg{n})$.

Example 3:


$$T(n)=3T(n/4)+n \lg{n}$$

Here $n^{\log_4 3} \approx n^{0.793}$ and $f(n) = n\lg{n} = \Omega(n^{0.793+\epsilon})$; the regularity condition holds since $3f(n/4) \le \frac{3}{4}n\lg{n}$, so by case 3 of the master theorem, $T(n) = \Theta(n\lg{n})$.

1.3 Important Theorems

Stirling's formula:

$$\lim_{n\to+\infty}\frac{n!}{\sqrt{2\pi n}\left(\frac{n}{e}\right)^n} = 1$$

or

$$n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n$$

Closed-form (Binet) formula for the Fibonacci numbers:

$$\begin{aligned} F_0 &= 0\\ F_1 &= 1\\ F_n &= F_{n-1}+F_{n-2}\\ &=\frac{1}{\sqrt5}\left[\left(\frac{1+\sqrt5}{2}\right)^n-\left(\frac{1-\sqrt5}{2}\right)^n\right]\\ &=\left\lfloor\frac{1}{\sqrt5}\left(\frac{1+\sqrt5}{2}\right)^n+\frac{1}{2}\right\rfloor \end{aligned}$$

2 Sorting

2.1 Insertion Sort

Insertion sort mimics sorting a hand of playing cards: starting from the second card, each element is inserted backward into its correct position among the already-sorted elements before it. When the input array is already sorted, the running time is the best case, $O(n)$. A sketch follows the complexity summary below.

  • Average time complexity: $O(n^2)$
  • Space complexity: $O(1)$
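A minimal sketch in C++ (function and variable names are illustrative):

```cpp
#include <vector>

// Insertion sort sketch: shift larger elements right, then drop the
// saved key into the gap left behind.
void insertionSort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int key = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];  // shift right
            --j;
        }
        a[j] = key;
    }
}
```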

2.2 Merge Sort

Merge sort partitions the array $A[l..r]$ at $m = \lfloor\frac{l+r}{2}\rfloor$, recursively sorts $A[l..m]$ and $A[m+1..r]$, and then merges the two sorted halves. Merging requires $O(n)$ extra space. A sketch follows the complexity summary below.

  • Average time complexity: $O(n\lg{n})$
  • Space complexity: $O(n)$
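A recursive sketch with a single scratch buffer (names are illustrative); preferring the left element on ties keeps the sort stable:

```cpp
#include <vector>

// Merge sort sketch: sort a[l..r] (inclusive) using scratch buffer buf.
void mergeSort(std::vector<int>& a, std::vector<int>& buf, int l, int r) {
    if (l >= r) return;
    int m = l + (r - l) / 2;
    mergeSort(a, buf, l, m);
    mergeSort(a, buf, m + 1, r);
    int i = l, j = m + 1, k = l;
    while (i <= m && j <= r) buf[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i <= m) buf[k++] = a[i++];
    while (j <= r) buf[k++] = a[j++];
    for (int t = l; t <= r; ++t) a[t] = buf[t];  // copy the merged run back
}

void mergeSort(std::vector<int>& a) {
    std::vector<int> buf(a.size());
    if (!a.empty()) mergeSort(a, buf, 0, (int)a.size() - 1);
}
```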

2.3 Heap Sort

Heap sort first builds a heap in $O(n)$ time, then performs $n$ pops of $O(\lg{n})$ each to complete the sort. A sketch follows the complexity summary below.

  • Average time complexity: $O(n\lg{n})$
  • Space complexity: $O(1)$
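A sketch built on the standard library's heap primitives: std::make_heap builds a max-heap in $O(n)$, and each std::pop_heap moves the current maximum to the end of the shrinking range in $O(\lg{n})$ (this is exactly what std::sort_heap does):

```cpp
#include <algorithm>
#include <vector>

// Heap sort sketch: build once, then pop n times.
void heapSort(std::vector<int>& a) {
    std::make_heap(a.begin(), a.end());            // O(n) build
    for (auto end = a.end(); end != a.begin(); --end)
        std::pop_heap(a.begin(), end);             // O(lg n) per pop
}
```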

2.4 QuickSort

Pick an element $A[p]$ of array $A$ as the pivot, move the elements smaller than $A[p]$ to the left part of the array and the elements larger than $A[p]$ to the right part, then recursively sort the two parts.

The pivot can be chosen at random, or as the median of three values: the leftmost $A[l]$, the middle $A[m]$, and the rightmost $A[r]$.

When the subarray being recursed on is very small (for example, 16 elements), insertion sort can be used instead of quicksort to reduce overhead and recursion depth. A sketch follows the complexity summary below.

  • Average time complexity: $O(n\lg{n})$
  • Space complexity: $O(1)$
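A sketch with Lomuto partitioning, using the last element of each range as the pivot (illustrative, not tuned; a production version would pick the pivot as described above and cut over to insertion sort on small ranges):

```cpp
#include <utility>
#include <vector>

// Partition a[l..r] around the pivot a[r]; returns the pivot's final index.
int partition(std::vector<int>& a, int l, int r) {
    int pivot = a[r], i = l;
    for (int j = l; j < r; ++j)
        if (a[j] < pivot) std::swap(a[i++], a[j]);  // grow the "smaller" side
    std::swap(a[i], a[r]);
    return i;
}

void quickSort(std::vector<int>& a, int l, int r) {
    if (l >= r) return;
    int p = partition(a, l, r);
    quickSort(a, l, p - 1);   // left part: elements < pivot
    quickSort(a, p + 1, r);   // right part: elements >= pivot
}
```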

2.5 Quickselect

Quickselect uses the quicksort partitioning method to find the $k$-th smallest element. It works like quicksort, except that after partitioning it recurses only into the side containing the target rather than into both sides, giving an expected time of $O(n)$.

The worst-case time complexity of quickselect is $O(n^2)$. By improving the pivot selection (median of medians), a selection algorithm with worst-case $O(n)$ time can be obtained, but its large constant factor usually makes it slower in practice than ordinary quickselect.
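In C++ this is already available as std::nth_element (implemented in the STL as introselect, as noted in section 2.6):

```cpp
#include <algorithm>
#include <vector>

// Quickselect in practice: after the call, a[k] holds the k-th smallest
// element (0-based), in expected O(n) time. The vector is taken by value
// so the caller's data is left untouched.
int kthSmallest(std::vector<int> a, std::size_t k) {
    std::nth_element(a.begin(), a.begin() + k, a.end());
    return a[k];
}
```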

2.6 Introsort

Introsort starts with quicksort, switches to heap sort when the recursion depth exceeds a threshold (proportional to the logarithm of the number of elements), and switches to insertion sort when the subrange is small.

With this approach, introsort achieves the high performance of quicksort on typical data sets while retaining a worst-case time complexity of $O(n\lg{n})$.

The C++ STL sort currently uses the introsort algorithm. Introselect, based on the same idea, is likewise adopted by the C++ STL as the default selection algorithm (nth_element).

  • Average time complexity: $O(n\lg{n})$
  • Space complexity: $O(1)$

2.7 Counting Sort

Counting sort uses an auxiliary array $C$, where $C[i]$ is the number of elements in the input array $A$ whose value equals $i$. The elements of $A$ are then placed into their correct positions according to $C$. Because the values in $A$ may be large, the array $C$ may also need to be large, which limits applicability. A sketch follows the complexity summary below.

Counting sort is not cache-friendly, and its efficiency is worst when $C$ exceeds the size of the L3 cache. For sorting large arrays, counting sort is in practice less efficient than radix sort.

  • Average time complexity: $O(n+r)$, where $r$ is the maximum value of an element
  • Space complexity: $O(r)$
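A stable sketch for non-negative integers bounded by a caller-supplied maxVal (an assumed parameter for illustration):

```cpp
#include <vector>

// Counting sort sketch: count occurrences, turn counts into positions
// via prefix sums, then place elements from the back to stay stable.
std::vector<int> countingSort(const std::vector<int>& a, int maxVal) {
    std::vector<int> count(maxVal + 1, 0), out(a.size());
    for (int x : a) ++count[x];
    for (int i = 1; i <= maxVal; ++i) count[i] += count[i - 1];
    for (auto it = a.rbegin(); it != a.rend(); ++it)
        out[--count[*it]] = *it;  // backward pass preserves stability
    return out;
}
```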

2.8 Radix Sort

Radix sort splits integers into digits (or bytes) and sorts on one digit (or byte) at a time, from the least significant to the most significant.

Radix sort usually uses one byte as the radix, which keeps the bucket counters L1-cache-friendly and efficient. Efficiency suffers when the keys contain many bytes, since each byte adds a pass. A sketch follows the complexity summary below.

  • Average time complexity: $O(d(n+k))$, where $d$ is the number of comparison passes per key and $k$ is the number of possible digit values
  • Space complexity: $O(n+k)$
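An LSD sketch for 32-bit unsigned keys, one byte per pass (four stable counting-sort passes over 256 buckets):

```cpp
#include <cstdint>
#include <vector>

// Radix sort sketch: each pass is a stable counting sort on one byte,
// from the least significant byte to the most significant.
void radixSort(std::vector<uint32_t>& a) {
    std::vector<uint32_t> buf(a.size());
    for (int shift = 0; shift < 32; shift += 8) {
        std::size_t count[257] = {0};
        for (uint32_t x : a) ++count[((x >> shift) & 0xFF) + 1];
        for (int i = 0; i < 256; ++i) count[i + 1] += count[i];  // bucket starts
        for (uint32_t x : a) buf[count[(x >> shift) & 0xFF]++] = x;
        a.swap(buf);  // even number of passes: result ends up in a
    }
}
```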

2.9 Summary of sorting algorithms

| Sorting algorithm | Average | Best | Worst | Space complexity | Stability |
| --- | --- | --- | --- | --- | --- |
| Insertion sort | $\Theta(n^2)$ | $\Omega(n)$ | $O(n^2)$ | $O(1)$ | stable |
| Merge sort | $\Theta(n\lg{n})$ | $\Omega(n\lg{n})$ | $O(n\lg{n})$ | $O(n)$ | stable |
| Heap sort | $\Theta(n\lg{n})$ | $\Omega(n\lg{n})$ | $O(n\lg{n})$ | $O(1)$ | unstable |
| Quick sort | $\Theta(n\lg{n})$ | $\Omega(n\lg{n})$ | $O(n^2)$ | $O(1)$ | unstable |
| Counting sort | $\Theta(n+r)$ | $\Omega(n+r)$ | $O(n+r)$ | $O(r)$ | stable |
| Radix sort | $\Theta(d(n+k))$ | $\Omega(d(n+k))$ | $O(d(n+k))$ | $O(n+k)$ | stable |

In the table, $d$ is the number of comparison passes per key, $k$ is the number of possible digit values, and $r$ is the maximum value of an element.

Among comparison sorts, Introsort, which is the combination of quick sort, heap sort and insertion sort, has the best performance at present.

Among non-comparison sorts, radix sort has the best performance, beating introsort in single-threaded settings.

For multi-threaded sorting, introsort parallelized across multiple threads currently performs best.

For sorting linked lists, merge sort currently has the best performance.

2.10 External Sort

External sorting generally uses external merge sort with loser-tree optimization to sort data that cannot fit entirely into memory. The procedure is as follows:

  1. For a data set of total size $n$, read $m$ records at a time (as many as memory allows), sort them with an internal sort, and write each sorted run out to disk.
  2. Perform a $k$-way merge (usually implemented with a loser tree) over the $k$ run files: data is read from the $k$ files into input buffers, and the merged result is collected in an output buffer before being written back to disk.

Here $k$ is generally a power of 2 (maximizing the loser tree's effectiveness). The process needs $\lceil n/m \rceil$ internal sorts and $\lceil \log_k(n/m) \rceil$ merge passes over the disk, so the total time complexity is $O(n\lg{m} + n\log_k(n/m)) = O(n\lg{n})$, the same as an internal sort. But when memory is small relative to the total data volume, the dominant cost is disk I/O, so external sorting is generally much slower than internal sorting that touches only memory. A merge sketch follows:
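A sketch of the $k$-way merge step over in-memory runs standing in for run files; for brevity a std::priority_queue is used here in place of the loser tree the text describes:

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// k-way merge sketch: the min-heap always exposes the smallest head
// element among the k runs; popping one value pushes its successor.
std::vector<int> kWayMerge(const std::vector<std::vector<int>>& runs) {
    using Item = std::pair<int, std::size_t>;  // (value, run index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    std::vector<std::size_t> pos(runs.size(), 0);
    for (std::size_t i = 0; i < runs.size(); ++i)
        if (!runs[i].empty()) heap.push({runs[i][0], i});
    std::vector<int> out;
    while (!heap.empty()) {
        auto [v, i] = heap.top();
        heap.pop();
        out.push_back(v);
        if (++pos[i] < runs[i].size()) heap.push({runs[i][pos[i]], i});
    }
    return out;
}
```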

Ways to optimize performance:

  1. Parallel computing:

    • Sequential disk reads and writes can be accelerated by processing data in parallel with multiple disk drives.
    • Optimize internal sort and merge operations using multiple threads.
    • With asynchronous input and output, you can sort and merge simultaneously, and read and write simultaneously.
    • Use multiple computers to build distributed computing and share computing tasks.
  2. Optimization algorithm:

    • To optimize the internal sorting algorithm, radix sorting can be considered to replace ordinary comparison sorting for special data.
    • Optimize the merge using the characteristics of data structures such as the binary heap, pairing heap, or loser tree.

3 Data Structures

3.1 Basic data structures

3.1.1 Linked list

A linked list is a chained storage structure in which nodes are connected by pointers and each node is allocated separately. Consequently, when the nodes are small, linked lists are not cache-friendly and perform poorly.

Insertion and deletion in a linked list take $O(1)$ time, while random access takes $O(n)$. Linked lists can be sorted with $O(n^2)$ insertion sort or $O(n\lg{n})$ merge sort.

3.1.2 Stacks and queues

The stack is a last-in, first-out (LIFO) ADT that pops only the most recently inserted element. The queue is the opposite: a first-in, first-out (FIFO) ADT that pops only the earliest inserted element still in the queue.

Both stacks and queues can be implemented with linked lists or arrays. When the memory requirement is small, a fixed-size array implementation is more efficient; when the required memory is large or must be allocated dynamically, a dynamic array implementation is more efficient. Stack and queue insertion and deletion take $O(1)$ time, but random access is not supported.

A double-ended queue (deque) can pop both the most recently inserted element and the earliest inserted one, resembling a combination of a stack and a queue.

Unlike ordinary queues, priority queues are generally implemented with a max-heap or min-heap and can pop only the maximum or minimum value. Owing to this internal structure, priority queue insertion and deletion usually take $O(\lg{n})$.

3.1.3 Binary tree

A tree is a linked structure in which a node may point to several others. A binary tree is a tree with at most two pointers per node; the two children a node points to are called the left child and the right child, and the node itself is called their parent. A tree has exactly one root, at the top; the childless nodes are called leaves and sit at the bottom of the tree.

In addition to the chained representation, binary trees can also be implemented with arrays.

3.2 Hash table

A hash table is a data structure that accesses stored data directly by key: it applies a function to the key (commonly called the hash function) that maps the queried data to a position in the table, speeding up lookups.

Hash tables usually use chaining or open addressing to resolve collisions.

Chaining resolves collisions by attaching a linked list to each position of the hash table. Insertion into a chaining hash table takes $O(1)$, and the worst-case lookup takes $O(n)$. Under the assumption of simple uniform hashing, the average lookup time is $O(1+\alpha)$, where $\alpha$ is the expected list length, i.e., the load factor of the hash table.

Open addressing stores all elements directly in the hash table. When an insertion collides, it probes for an empty slot in which to place the element. Linear probing, quadratic probing, and double hashing are commonly used; double hashing probes with two different hash functions, so collisions are less likely than with linear or quadratic probing, making it one of the best open addressing methods. For an open addressing hash table with load factor $\alpha$ ($\alpha < 1$), the expected number of probes in an unsuccessful lookup is at most $1/(1-\alpha)$, in a successful lookup at most $\frac{1}{\alpha}\ln\frac{1}{1-\alpha}$, and in an insertion at most $1/(1-\alpha)$.

If the hash table is half full, the expected number of probes for a successful lookup is below 1.387; if it is 90% full, below 2.559.

3.2.1 Hash table implementation of unordered_map

GCC's C++ unordered_map implementation uses a dynamic array plus chaining. The hash table grows dynamically: whenever the number of elements exceeds the array size, the array is doubled and rounded up to the next prime, and the original elements are rehashed into the new table. The buckets of unordered_map are threaded together by pointers (the last node of each bucket's list points to the next bucket) to make traversal convenient; otherwise every traversal would have to scan the entire dynamic array. Each bucket's list stores the key-value pair together with its size_t hash code; comparing hash codes before comparing keys effectively speeds up lookups when key comparison is expensive.

3.3 Trees

3.3.1 Definition of complete binary tree

Perfect, complete, and full binary trees are defined as follows:

| Tree | Description |
| --- | --- |
| Perfect binary tree | Every node except the leaves has two children, and every level is completely filled. |
| Complete binary tree | Every level except possibly the last is completely filled, and all nodes are kept as far left as possible. |
| Full binary tree | Every node except the leaves has two children. |

3.3.2 Binary search tree

A binary search tree is a search tree organized as a binary tree, in which each node's value is greater than or equal to every node in its left subtree and less than or equal to every node in its right subtree. Insertion, deletion, and (ordered) lookup in a binary search tree take $O(\lg{n})$ time when the tree is balanced.

To delete node $x$ from a binary search tree, the following three cases must be considered:

  1. If $x$ has no children, simply remove it and modify its parent to replace $x$ with NIL as its child.
  2. If $x$ has only one child, promote that child to $x$'s position, modifying $x$'s parent to replace $x$ with $x$'s child.
  3. If $x$ has two children, find $x$'s successor $y$ (the leftmost node in $x$'s right subtree, which is greater than $x$) and let $y$ take $x$'s position. Two sub-cases must be considered:

    1. If $x$ is not $y$'s parent, let $y$ take $x$'s position, making $x$'s right subtree $y$'s new right subtree and $x$'s left subtree $y$'s new left subtree. If $y$ originally had a right subtree, promote it one position to replace $y$'s original position.
    2. If $x$ is $y$'s parent, promote $y$ to $x$'s position and make $x$'s left subtree $y$'s new left subtree.

3.3.3 AVL tree

The AVL tree was the first self-balancing binary search tree invented. In an AVL tree, the heights of the two subtrees of any node differ by at most 1, so it is also called a height-balanced tree. Lookup, insertion, and deletion take $O(\lg{n})$ time in both the average and worst cases. Inserting or removing an element may require one or more tree rotations to rebalance the tree.

AVL tree insertion requires at most two rotations; deletion may require up to $O(\lg{n})$ rotations.

An AVL tree of height $h$ contains at least $\mathrm{Fibonacci}(h+3) - 1$ nodes. Hence the minimum height of an AVL tree is $\lceil \log_2(n+1) \rceil$, and the maximum height is $\lfloor 1.44\log_2(n+2) - 0.328 \rfloor$.

3.3.4 Splay tree

The splay tree is a self-balancing binary search tree that performs insertion, lookup, modification, and deletion in amortized $O(\lg{n})$ time, all built on the splay operation. The worst-case time of a single splay tree operation is $O(n)$, but the worst-case amortized time is $O(\lg{n})$.

When a node $x$ is accessed, the splay operation moves $x$ to the root through a series of rotations, each bringing $x$ closer to the root. By splaying after every access, recently accessed nodes stay near the root and the tree remains roughly balanced, which yields the desired amortized bound of $O(\lg{n})$.

Insertion, lookup, and deletion in a splay tree proceed as in an ordinary BST; after each operation the accessed node is splayed to the root (for a deletion, the parent of the deleted node is splayed).

3.3.5 Red-black tree

The red-black tree is the most widely used self-balancing binary search tree; it needs only one extra bit per node, the color (red/black), to maintain balance. Red-black trees keep the tree approximately balanced through rotations and color adjustments (the node counts of a node's left and right subtrees may differ by more than 1).

Because a red-black tree guarantees exact balance only for black nodes, not red ones, its height is at most $2\log_2(n+1)$. For a node of black height $d$, the heights of its left and right subtrees may differ by up to $d$, so their node counts may differ by up to $2^{2d} - 2^d$. When $d = 10$, the node counts may differ by as much as $2^{20} - 2^{10} = 1047552$.

Because the red-black tree is not strictly balanced, its maximum height in extreme cases approaches twice that of an AVL tree, so it is generally not implemented as an array. Assuming a red-black tree with 1023 nodes, its height can reach $2\log_2(1023+1) = 20$; an array implementation would need at least $2^{20} = 1048576$ elements, a huge waste of space.

Red-black trees have the following properties:

  1. Each node is either red or black.
  2. The root is black.
  3. Each leaf (NIL) is black.
  4. If a node is red, both of its children are black.
  5. For each node, all simple paths from the node down to its descendant leaves contain the same number of black nodes (the same black height).

3.3.5.1 Rotation of red-black trees

Rotation is divided into left rotation and right rotation, as shown in the figure below:

3.3.5.2 Insertion into a red-black tree

As in an ordinary BST, the node is inserted at the corresponding leaf position. After node $z$ is inserted, the following three cases must be considered and repaired by rotation and recoloring:

  • Case 1: $z$'s uncle node $y$ is red.
  • Case 2: $z$'s uncle node $y$ is black and $z$ is a right child.
  • Case 3: $z$'s uncle node $y$ is black and $z$ is a left child.

Notice that case 2 can be turned into case 3.

A red-black tree requires at most two rotations and one color flip (case 1) after insertion to maintain the properties of a red-black tree.

3.3.5.3 Deletion from a red-black tree

When node $z$ is deleted, if $z$ has at most one subtree $x$, move $x$ to $z$'s position; in this case node $y$ is the same as node $z$.

If $z$ has two subtrees, let $y$ be $z$'s successor and $x$ be $y$'s right subtree. Replace node $z$ with node $y$, give $y$ the same color as $z$, and move $y$'s right subtree $x$ to $y$'s original position.

When the original color of node $y$ is black, the following four cases must be considered and repaired by rotation and recoloring:

  • Case 1: $x$'s sibling $w$ is red.
  • Case 2: $x$'s sibling $w$ is black, and both of $w$'s children are black.
  • Case 3: $x$'s sibling $w$ is black, $w$'s left child is red, and $w$'s right child is black.
  • Case 4: $x$'s sibling $w$ is black, and $w$'s right child is red.

Note that case 1 can be converted into case 2, 3, or 4, and case 3 can be converted into case 4.

A red-black tree requires up to three rotations (and two color flips) after nodes are removed to maintain the properties of a red-black tree.

3.3.5.4 Traversal of red-black tree

Since red-black tree implementations store a parent pointer in each node, the parent pointer can be used to implement a red-black tree iterator. The algorithm for finding the successor of node $x$ is as follows (a sketch in code follows the list):

  1. If $x$ has a right subtree, return the leftmost node of $x$'s right subtree.
  2. If $x$ is the right child of its parent, set $x = x.parent$; otherwise, go to step 4.
  3. Repeat step 2.
  4. Return $x.parent$.
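A sketch over an illustrative node layout with parent pointers:

```cpp
// Node layout is illustrative; real red-black tree nodes also carry a color.
struct Node {
    int key;
    Node *left = nullptr, *right = nullptr, *parent = nullptr;
};

// Successor of x in an in-order traversal, following the steps above.
Node* successor(Node* x) {
    if (x->right) {                       // step 1: leftmost of right subtree
        Node* y = x->right;
        while (y->left) y = y->left;
        return y;
    }
    Node* y = x->parent;                  // steps 2-3: climb while x is a right child
    while (y && x == y->right) { x = y; y = y->parent; }
    return y;                             // step 4 (nullptr past the maximum)
}
```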

3.3.6 Choosing a balanced binary tree

Based on the respective strengths of these balanced binary trees, the choice can be made as follows:

| Data structure | Insertion pattern | Lookup pattern |
| --- | --- | --- |
| Red-black tree | Random, occasionally sequential | Random, occasionally sequential |
| AVL tree | Sequential | Random |
| Splay tree | Sequential | Sequential |

There is little difference between an optimized AVL tree and a red-black tree: AVL tree lookups are slightly faster than red-black tree lookups, while AVL tree insertions and deletions are slightly slower. Therefore red-black trees are generally chosen for insertion-intensive tasks, and AVL trees for lookup-intensive tasks.

(comparison figure from www.zhihu.com/question/19…)

In practice, red-black trees are usually the first choice: the C++ STL, Java, and the Linux kernel all use red-black trees as their balanced binary tree. Redis instead chose the skip list, which is better suited to concurrent access and modification and simpler to implement.

3.3.7 Skip list

Each node of a skip list is assigned a random level, and each level can be viewed as a sorted singly linked list. The probability that a node has level $x$ is $(\frac{1}{2})^x$; that is, the probability halves with each additional level.

To find a node in a skip list, search rightward starting from the topmost level; whenever the search cannot proceed, drop down one level and continue rightward, until the node is found or the bottom level is exhausted.

The expected time of skip list insertion, lookup, and deletion is $O(\lg{n})$, but the worst case is $O(n)$. The expected space complexity is $O(n)$, and $O(n\lg{n})$ in the worst case.

Under a single thread, an ordinary red-black tree is faster than a skip list; even a deeply optimized skip list backed by a memory pool is still slightly slower than an unoptimized CLRS-style red-black tree. Skip lists are typically used in scenarios that can be optimized across multiple threads and processors. A search sketch follows:
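A search-only sketch (the node layout, kMaxLevel, and randomLevel are illustrative; randomLevel realizes the $(\frac{1}{2})^x$ level distribution described above):

```cpp
#include <cstdlib>
#include <vector>

constexpr int kMaxLevel = 32;

struct SkipNode {
    int key;
    std::vector<SkipNode*> forward;  // forward[i] = next node on level i
};

// Flip fair coins: level x occurs with probability (1/2)^x.
int randomLevel() {
    int lvl = 1;
    while (lvl < kMaxLevel && (std::rand() & 1)) ++lvl;
    return lvl;
}

// Search from the top level downward; head->forward must span `level` levels.
SkipNode* find(SkipNode* head, int level, int key) {
    SkipNode* x = head;
    for (int i = level - 1; i >= 0; --i)
        while (x->forward[i] && x->forward[i]->key < key)
            x = x->forward[i];  // move right while the next key is too small
    x = x->forward[0];
    return (x && x->key == key) ? x : nullptr;
}
```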

3.3.8 Treap

Treap = Tree + Heap: a binary search tree that also satisfies heap order. A treap maintains heap order by recording one extra piece of data per node, the priority. A treap needs only left and right rotations (single rotations) to maintain the BST property and heap order, making it easier to implement than a splay tree.

Treap insertion: as in an ordinary BST, insert at a leaf, then repeatedly compare priorities, rotating the node toward the root whenever its priority exceeds its parent's. Expected time $O(\lg{n})$, worst case $O(n)$.

Treap lookup: same as an ordinary BST.

Treap deletion: find the node to be deleted, rotate it down to a leaf, then remove it. Expected time $O(\lg{n})$, worst case $O(n)$. An insertion sketch follows:
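A recursive insertion sketch with random priorities (layout and names are illustrative):

```cpp
#include <cstdlib>

// Treap node: BST-ordered by key, max-heap-ordered by random priority.
struct TNode {
    int key, pri;
    TNode *l = nullptr, *r = nullptr;
    explicit TNode(int k) : key(k), pri(std::rand()) {}
};

TNode* rotateRight(TNode* y) { TNode* x = y->l; y->l = x->r; x->r = y; return x; }
TNode* rotateLeft(TNode* x)  { TNode* y = x->r; x->r = y->l; y->l = x; return y; }

// Insert as in a BST, then rotate the child up while it outranks its parent.
TNode* insert(TNode* t, int key) {
    if (!t) return new TNode(key);
    if (key < t->key) {
        t->l = insert(t->l, key);
        if (t->l->pri > t->pri) t = rotateRight(t);
    } else {
        t->r = insert(t->r, key);
        if (t->r->pri > t->pri) t = rotateLeft(t);
    }
    return t;
}
```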

3.3.9 B-tree

The B-tree is a balanced search tree designed for disks and other direct-access secondary storage devices. B-trees resemble red-black trees, but they are better at reducing the number of disk I/O operations. Database systems generally use B-trees or B-tree variants (such as the B+ tree) to store information. The structure of a B-tree (here a 2-3-4 tree) is as follows:

All leaves of a B-tree have the same depth, namely the height of the tree. The number of keys in each node has an upper and a lower bound (except for the root). If $t$ is the minimum degree of the B-tree, the lower bound on the number of keys per node is $t-1$ and the upper bound is $2t-1$. The number of children of a node is its number of keys plus 1, so when $t=2$ an internal node can have only two, three, or four children: a 2-3-4 tree.

Suppose the total number of keys in the B-tree is $n$, the maximum number of children of an internal node is $M$, and the minimum is $m$ (generally $m = \lceil M/2 \rceil$; for a B* tree, $m = \lceil 2M/3 \rceil$). Then the minimum height of the B-tree is:


$$h_{min} = \lceil\log_M(n+1)\rceil-1$$

The maximum height of the B-tree is:


$$h_{max} = \left\lfloor\log_m\frac{n+1}{2}\right\rfloor$$

B-tree search: compare against the keys of the current node to find the smallest index $i$ such that the $i$-th key is $\ge$ the key being sought, then either return a match or continue the search in the corresponding child.

B-tree insertion: insert as in a multiway tree. When a node on the path is full (holding $2t-1$ keys), split it into two nodes of $t-1$ keys each and promote the middle key into the parent as the separator of the two new subtrees. Splitting every full node encountered on the way down, rather than only the node finally inserted into, guarantees that whenever a node must be split, its parent is not full.

B-tree deletion: deletion from a B-tree is complicated; the details are as follows:

  1. If the key $k$ is in node $x$ and $x$ is a leaf, delete $k$ from $x$.
  2. If the key $k$ is in node $x$ and $x$ is an internal node, do the following:

    • A. If the child $y$ that precedes $k$ in node $x$ contains at least $t$ keys, find $k$'s predecessor $k'$ in the subtree rooted at $y$, recursively delete $k'$, and replace $k$ with $k'$ in $x$.
    • B. Symmetrically, if $y$ has fewer than $t$ keys, examine the child $z$ that follows $k$ in node $x$; if $z$ has at least $t$ keys, find $k$'s successor $k'$ in the subtree rooted at $z$, recursively delete $k'$, and replace $k$ with $k'$ in $x$.
    • C. Otherwise, if both $y$ and $z$ contain only $t-1$ keys, merge $k$ and all of $z$ into $y$, free $z$, and recursively delete $k$ from $y$.
  3. If the key $k$ is not present in the current internal node $x$, determine the root $x.c_i$ of the subtree that must contain $k$ (if $k$ is in the tree at all). If $x.c_i$ has only $t-1$ keys, perform step 3a or 3b as needed to guarantee descending into a node with at least $t$ keys, then recurse into the appropriate child of $x$.

    • A. If $x.c_i$ has only $t-1$ keys but an adjacent sibling has at least $t$ keys, move a key of $x$ down into $x.c_i$, raise a key of that adjacent left or right sibling up into $x$, and move the corresponding child pointer from the sibling into $x.c_i$, giving $x.c_i$ an extra key.
    • B. If $x.c_i$ and both of its adjacent siblings each have only $t-1$ keys, merge $x.c_i$ with one sibling, moving a key of $x$ down into the newly merged node to become its middle key.

The specific deletion is shown in the following figure:

B-tree deletion adjusts the structure of the tree while searching downward, so it must first be determined whether the key is actually in the tree; this differs from other balanced trees.

3.3.9.1 B+ tree

All values of a B+ tree are stored in its leaf nodes, while internal nodes hold only keys for indexing, so deletion in a B+ tree is simpler than in a B-tree. Since all values are in the leaves and the leaves are linked together in a list, B+ trees conveniently support range lookups.

Assuming the B+ tree has order $b$ and height $h$, then:

  1. The maximum number of values stored in a B+ tree is $v_{max} = b^h - b^{h-1}$.
  2. The minimum number of values stored in a B+ tree is $v_{min} = 2\left\lceil\frac{b}{2}\right\rceil^{h-1} - 2\left\lceil\frac{b}{2}\right\rceil^{h-2}$.
  3. The maximum number of keys stored in a B+ tree is $k_{max} = b^h - 1$.
  4. The minimum number of keys stored in a B+ tree is $k_{min} = 2\left\lceil\frac{b}{2}\right\rceil^{h-1} - 1$.

3.3.9.2 B* tree

The B* tree raises the minimum node utilization from $1/2$ to $2/3$. If $2t$ is the minimum degree of the B* tree, the lower bound on the number of keys per node is $2t-1$ and the upper bound is $3t-1$.

3.4 Heaps

3.4.1 Binary heap

A binary heap is a complete binary tree satisfying heap order: the key of a parent node always stands in a fixed order relation (max-heap or min-heap) to the keys of its children, and the left and right subtrees of every node are themselves binary heaps. The binary heap is the simplest heap to implement, requiring nothing more than an array.

Binary heaps are commonly used for heap sort and for implementing priority queues. Binary heap insertion and deletion take $O(\lg{n})$, building a heap takes $O(n)$, and finding the minimum (min-heap) or maximum (max-heap) takes $O(1)$, but merging takes $O(n)$. When frequent merging is required, the leftist heap, binomial heap, or Fibonacci heap is usually considered instead.

3.4.2 Tournament tree

A tournament tree is built by comparing adjacent elements pairwise: in a knockout fashion, the winner of each comparison advances to the next round until the overall winner emerges, so the tree satisfies heap order. The structure of a tournament tree is shown below:

Tournament trees come in two kinds: the winner tree and the loser tree. The winner tree stores the winner of each comparison; the loser tree stores the loser at each internal node while passing the winner upward, and needs one extra node to hold the final winner. In $k$-way merging, each pop yields the minimum (min-heap variant) and the value inserted next is always larger than the popped value, so the loser tree works better: insertion and deletion combine into a single update operation requiring only one round of comparisons. The structure of a loser tree is shown below:

A tournament tree of fixed size (as in $k$-way merging) can be implemented with an array. A linked tree structure or a pair of dynamic arrays should be considered only when the size is not fixed and frequent deletion is required; these are not cache-friendly and are inefficient.

3.4.3 $d$-ary heap vs. loser tree

Compared with the binary heap, the $d$-ary heap mainly trades faster insertion for slower deletion. Binary heap insertion and deletion each take $\log_2{n}$ comparisons, whereas $d$-ary heap insertion takes $\log_d{n}$ comparisons and deletion takes $(d-1)\log_d{n}$. Hence in settings with no insertions, such as selection and sorting, the binary heap beats the $d$-ary heap.

The loser tree is the tournament tree optimized for $k$-way merge scenarios. Tournament trees are expensive when the number of nodes is not a power of two, so their size is generally fixed to a power of two (as in $k$-way merging).

  1. Selecting the $k$ smallest elements: the loser tree uses $2n$ space, building it takes $n$ comparisons, and each pop takes $\log_2(2n) = \log_2(n)+1$ comparisons, so selecting the $k$ smallest takes $n + k(\log_2(n)+1)$ comparisons in total. The binary heap uses $n$ space, building it takes $2n$ comparisons, and each pop takes $\log_2{n}$ comparisons, for a total of $2n + k\log_2(n)$. For small $k$, the loser tree needs notably fewer comparisons than the binary heap (by nearly a factor of 2).
  2. Sorting: the loser tree takes roughly $n + n\log_2(2n) = 2n + n\log_2{n}$ comparisons; the binary heap takes roughly $2n + n\log_2{n}$. Their time costs are thus similar, but the loser tree uses twice the space, and it costs more when the element count is not a power of two.
  3. $k$-way merging: $k$-way merging is mainly used in multiway external sorting, where each step combines an insertion and a deletion. Merging $n$ elements in total, the loser tree needs only a single update operation per step, for $n(1+\log_2{k})$ comparisons in total, while the $d$-ary heap needs $nd\log_d{k}$. The loser tree therefore beats the $d$-ary heap, and its advantage grows as $k$ increases.

In summary, the binary heap is better in the general case (such as sorting), while the loser tree is better in the special cases of selecting the $k$ smallest elements and $k$-way merging.

3.4.4 Leftist heap

In a leftist heap (or leftist tree), the rank of every node's left child is no less than that of its right child. The rank of the heap is the rank of the root's right subtree; a node's rank is its distance to the nearest external node, with an empty node having rank $-1$ and a leaf rank $0$. Leftist heap insertion and deletion are both implemented via merging: when merging two leftist heaps, the root with the larger key (max-heap) becomes the root of the new heap, and its right subtree is recursively merged with the other heap. After each merge, the ranks of the left and right subtrees are compared, and the two subtrees are swapped if the left rank is smaller.

Leftist heap insertion, deletion, and merging all take $O(\lg{n})$, but the constant factors of insertion and deletion are large and the linked storage structure is not cache-friendly, so the leftist heap is chosen only when heaps must be merged frequently; in the general case its performance falls short of the binary heap.

3.4.5 Binomial heap

A binomial heap is a set of binomial trees that satisfies the following properties:

  1. Every binomial tree is heap-ordered: in a min-heap, the key of any node is greater than or equal to its parent's key.
  2. No two (or more) binomial trees have the same degree (including degree 0); that is, for each degree $k$ there are either 0 or 1 binomial trees.

The set of binomial trees in a binomial heap is uniquely determined by the binary representation of the node count, so a binomial heap of $n$ nodes contains at most $\log_2{n}$ binomial trees. For example, a node count of 13 is $2^3+2^2+2^0$, giving three binomial trees of degrees 3, 2, and 0, as shown below:

The fundamental operation of the binomial heap is merging; its insertion and deletion are built from the merge operation.

Binomial heap merging: starting from the degree-0 trees and proceeding like binary addition, whenever there are two binomial trees of degree $k$, take the larger root (max-heap) as the new root and combine them into one binomial tree of degree $k+1$, carrying upward bitwise. Thus merging heaps of maximum degree $n$ and $m$ yields a heap of maximum degree at most $\max(n,m)+1$. The time complexity of binomial heap merging is $O(\lg{n})$.

Binomial heap insertion: can be viewed as merging with a binomial heap of one node, so the worst-case time complexity is $O(\lg{n})$, while the amortized time complexity is $O(1)$.

Binomial heap lookup: a dedicated pointer can track the root with the largest key among all the binomial tree roots and be maintained during the other operations, so the time complexity is $O(1)$.

Binomial heap deletion: follow the dedicated maximum pointer to the node to be deleted; after removal, its subtrees are split off into a new binomial heap, which is then merged back into the original heap. Deletion therefore takes $O(\lg{n})$.

Decrease key: after a node's key is decreased (in a min-heap), it must be repeatedly swapped with its parent to restore heap order, taking $O(\lg{n})$ in the worst case.

3.4.6 Fibonacci heap

The amortized performance of the Fibonacci heap is better than the binomial heap's: its $O(1)$ amortized decrease-key is a huge improvement over the binomial heap's $O(\lg{n})$, especially on dense graphs. Fibonacci heaps are commonly used to optimize Dijkstra's algorithm and as mergeable priority queues needing good amortized performance. Because the implementation is complex, they are rarely used otherwise.

3.4.7 Pairing heap

The pairing heap is a data structure that is simple to implement yet has excellent amortized complexity; it can be viewed as a simplified Fibonacci heap. A pairing heap is a heap-ordered multiway tree whose sibling nodes are usually connected by a doubly linked list. The structure is shown below:

Pairing heap merge: simply take the heap whose root key is larger as the new heap and attach the other heap as a child of the new root. Amortized time $O(1)$.

Pairing heap insertion: can be viewed as merging with a heap of only one node. Amortized time $O(1)$.

Pairing heap lookup: return the root of the heap directly. Amortized time $O(1)$.

Pairing heap deletion: remove the root, merge the child heaps in pairs from left to right, then merge the results one by one from right to left into the new heap. The amortized time complexity is $O(\lg{n})$.

Decrease key: if heap order still holds after the key change, the structure of the heap is unchanged; otherwise the node's subtree is cut off and merged back with the root, equivalent to deleting that subtree and re-inserting it after the key change. The amortized time complexity is $O(2^{2\sqrt{\lg{\lg{n}}}})$.

3.4.8 Performance comparison of heaps

| Max-heap | Find max | Delete max | Insert | Decrease key | Merge |
| --- | --- | --- | --- | --- | --- |
| Binary heap | $O(1)$ | $O(\lg{n})$ | $O(\lg{n})$ | $O(\lg{n})$ | $O(n)$ |
| Loser tree | $O(1)$ | $O(\lg{n})$ | $O(\lg{n})$ | $O(\lg{n})$ | $O(1)$ |
| Leftist heap | $O(1)$ | $O(\lg{n})$ | $O(\lg{n})$ | $O(\lg{n})$ | $O(\lg{n})$ |
| Binomial heap | $O(1)$ | $O(\lg{n})$ | $O(1)$ | $O(\lg{n})$ | $O(\lg{n})$ |
| Fibonacci heap | $O(1)$ | $O(\lg{n})$ | $O(1)$ | $O(1)$ | $O(1)$ |
| Pairing heap | $O(1)$ | $O(\lg{n})$ | $O(1)$ | $O(2^{2\sqrt{\lg{\lg{n}}}})$ | $O(1)$ |

3.5 Disjoint-set (union-find)

The disjoint-set (union-find) is a tree-based data structure for handling the merging and querying of disjoint sets. A union-find structure usually supports the following two operations:

  1. Find: determine which subset an element belongs to. This can be used to determine whether two elements are in the same subset.
  2. Union: merge two subsets into one.

The expected time complexity of each union-find operation is $O(\alpha(n))$, where $\alpha(n)$ is the inverse of $n = A(x, x)$ and $A$ is the rapidly growing Ackermann function, so $\alpha(n)$ grows extremely slowly: even for $n = 10^{600}$, $\alpha(n)$ is only 4. The time complexity of each operation can therefore be treated as approximately $O(1)$. A sketch follows:
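A sketch with path compression and union by rank (type and member names are illustrative):

```cpp
#include <numeric>
#include <utility>
#include <vector>

struct DisjointSet {
    std::vector<int> parent, rank_;
    explicit DisjointSet(int n) : parent(n), rank_(n, 0) {
        std::iota(parent.begin(), parent.end(), 0);  // each element is its own set
    }
    int find(int x) {  // path compression: hang x directly under the root
        return parent[x] == x ? x : parent[x] = find(parent[x]);
    }
    bool unite(int a, int b) {  // union by rank: attach the shorter tree
        a = find(a); b = find(b);
        if (a == b) return false;
        if (rank_[a] < rank_[b]) std::swap(a, b);
        parent[b] = a;
        if (rank_[a] == rank_[b]) ++rank_[a];
        return true;
    }
};
```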

4 Graphs

4.1 Basic graph algorithms

A graph is denoted $G=(V,E)$, where $|V|$ is the number of vertices and $|E|$ is the number of edges. Graphs are generally implemented with an adjacency list or an adjacency matrix. For sparse graphs (few edges: $|E|$ far smaller than $|V|^2$), the adjacency list is generally used, saving space; the adjacency matrix is usually used for dense graphs.

Adjacency list: an array of $|V|$ linked lists in which each list node represents an edge; its space is $O(V+E)$.

Adjacency matrix: a two-dimensional array (matrix) of size $|V| \times |V|$; its space is $O(|V|^2)$.

Breadth-first search (BFS): starting from the source $s$, visit nodes in order of increasing distance from $s$; that is, all nodes at distance $k$ from $s$ are searched before any node at distance $k+1$. BFS is typically implemented with a queue, as in the sketch below.
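A sketch over an adjacency list of vertex indices:

```cpp
#include <queue>
#include <vector>

// BFS sketch: returns the distance (in edges) from s to every vertex,
// or -1 for unreachable vertices.
std::vector<int> bfs(const std::vector<std::vector<int>>& adj, int s) {
    std::vector<int> dist(adj.size(), -1);
    std::queue<int> q;
    dist[s] = 0;
    q.push(s);
    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (int v : adj[u])
            if (dist[v] == -1) {          // first visit is the shortest
                dist[v] = dist[u] + 1;
                q.push(v);
            }
    }
    return dist;
}
```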

Depth-first search (DFS): starting from the source $s$, search along one edge as deep as possible, then backtrack and search along the next edge, until the whole graph has been searched. DFS is usually implemented with recursion or an explicit stack.

Topological sort: a directed acyclic graph can be topologically sorted via DFS or BFS, yielding an order in which the graph's nodes can be processed. The BFS-based algorithm first finds the nodes of in-degree 0 (the nodes that come first in the order), then repeatedly decrements by 1 the in-degree of the nodes they reach, until all nodes have been sorted. A sketch follows:
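A BFS-based (Kahn's algorithm) sketch:

```cpp
#include <queue>
#include <vector>

// Topological sort sketch: repeatedly output vertices of in-degree 0.
// Returns an empty vector if the graph contains a cycle.
std::vector<int> topoSort(const std::vector<std::vector<int>>& adj) {
    int n = (int)adj.size();
    std::vector<int> indeg(n, 0), order;
    for (int u = 0; u < n; ++u)
        for (int v : adj[u]) ++indeg[v];
    std::queue<int> q;
    for (int u = 0; u < n; ++u)
        if (indeg[u] == 0) q.push(u);
    while (!q.empty()) {
        int u = q.front(); q.pop();
        order.push_back(u);
        for (int v : adj[u])
            if (--indeg[v] == 0) q.push(v);  // v's last prerequisite done
    }
    return (int)order.size() == n ? order : std::vector<int>{};
}
```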

Strongly connected component: a strongly connected component of a directed graph $G=(V,E)$ is a maximal set of vertices $C \subseteq V$ such that for every pair of vertices $u$ and $v$ in $C$, the paths $u \to v$ and $v \to u$ both exist; that is, starting from any vertex in the set, every other vertex in the set can be reached.

Strongly connected components can be found with Tarjan's algorithm, whose steps are as follows:

TODO

4.2 Minimum Spanning Tree

A subgraph of a connected undirected graph that connects all vertices and contains no cycle is called a spanning tree. The minimum spanning tree is the spanning tree whose total edge weight is smallest. The usual algorithms for obtaining the minimum spanning tree are Kruskal's algorithm and Prim's algorithm.

The counterpart of the minimum spanning tree in a directed graph is the minimum arborescence: a directed spanning tree of minimum total edge weight from one vertex to all other vertices. The Chu-Liu/Edmonds algorithm is generally used to obtain the minimum arborescence.

4.2.1 Kruskal's algorithm

The steps of Kruskal's algorithm are as follows:

  1. Sort all edges of the graph by edge weight, from smallest to largest.
  2. Iterate over the edges in order, adding an edge to the spanning tree if its two endpoints are not yet connected.
  3. Repeat step 2 until the minimum spanning tree is complete ($|V|-1$ edges joined in total).

Because Kruskal's algorithm must determine whether the two endpoints of an edge are already connected, it maintains a union-find structure over all the vertices. The time complexity of Kruskal's algorithm is $O(E\lg{V})$. A sketch follows:
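A sketch returning the total MST weight, assuming the graph is connected (the union-find is inlined here with path halving):

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

struct Edge { int w, u, v; };

int kruskal(int n, std::vector<Edge> edges) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.w < b.w; });
    std::vector<int> parent(n);
    std::iota(parent.begin(), parent.end(), 0);
    auto find = [&](int x) {
        while (parent[x] != x) x = parent[x] = parent[parent[x]];  // path halving
        return x;
    };
    int total = 0, used = 0;
    for (const Edge& e : edges) {
        int a = find(e.u), b = find(e.v);
        if (a == b) continue;            // endpoints already connected
        parent[a] = b;
        total += e.w;
        if (++used == n - 1) break;      // |V| - 1 edges chosen
    }
    return total;
}
```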

4.2.2 Prim's algorithm

The steps of Prim's algorithm are as follows:

  1. Choose a start vertex $s$ and add all edges incident to $s$ to a priority queue (min-heap).
  2. Extract the edge $(u,v)$ of smallest weight from the priority queue, add it to the spanning tree $A$, and add all edges of $v$ to vertices not yet in $A$ to the priority queue.
  3. Repeat step 2 until $|V|-1$ edges have been added.

With a binary heap, the time complexity of Prim's algorithm is $O(V\lg{V}+E\lg{V}) = O(E\lg{V})$; with Fibonacci heap optimization (decrease-key in $O(1)$), it is $O(E+V\lg{V})$. A sketch follows:
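A lazy-deletion sketch over a weighted adjacency list adj[u] = {(weight, vertex), ...}, starting from vertex 0 and assuming a connected graph:

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

int prim(const std::vector<std::vector<std::pair<int, int>>>& adj) {
    int n = (int)adj.size(), total = 0;
    std::vector<bool> inTree(n, false);
    using Item = std::pair<int, int>;  // (edge weight, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    pq.push({0, 0});
    while (!pq.empty()) {
        auto [w, u] = pq.top();
        pq.pop();
        if (inTree[u]) continue;        // stale queue entry, skip
        inTree[u] = true;
        total += w;
        for (auto [wv, v] : adj[u])
            if (!inTree[v]) pq.push({wv, v});
    }
    return total;
}
```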

4.2.3 Chu-Liu/Edmonds algorithm

TODO

4.3 Shortest Path

4.3.1 Bellman-Ford algorithm

The Bellman-Ford algorithm solves the single-source shortest path problem in the general case and can also be used to detect negative cycles. Its steps are as follows:

  1. Relax all edges.
  2. Repeat step 1 a total of $|V|-1$ times.

In a graph without negative cycles, the shortest path from the source $s$ to any vertex contains at most $|V|-1$ edges, so relaxing all edges $|V|-1$ times suffices.

Negative cycle detection: suppose that in graph $G=(V,E)$ the shortest-path estimate from source $s$ to vertex $v$ is $v.d$. If, after the rounds above, there is any edge $(u,v) \in E$ with $v.d > u.d + w(u,v)$, where $w(u,v)$ is the edge weight, then $G$ contains a negative cycle.

The time complexity of the Bellman-Ford algorithm is $O(VE)$. A sketch follows:
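A sketch that returns false when a negative cycle is reachable from the source:

```cpp
#include <limits>
#include <vector>

struct Edge { int u, v, w; };
const long long INF = std::numeric_limits<long long>::max() / 4;

bool bellmanFord(int n, const std::vector<Edge>& edges, int s,
                 std::vector<long long>& dist) {
    dist.assign(n, INF);
    dist[s] = 0;
    for (int round = 0; round < n - 1; ++round)  // |V| - 1 rounds
        for (const Edge& e : edges)
            if (dist[e.u] < INF && dist[e.u] + e.w < dist[e.v])
                dist[e.v] = dist[e.u] + e.w;     // relax (u, v)
    for (const Edge& e : edges)                  // one more pass
        if (dist[e.u] < INF && dist[e.u] + e.w < dist[e.v])
            return false;                        // still relaxable: negative cycle
    return true;
}
```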

4.3.2 Dijkstra's algorithm

Dijkstra's algorithm solves the single-source shortest path problem when all edge weights are non-negative. Its steps are as follows:

  1. Put all vertices $v$ of graph $G$ into a priority queue (min-heap).
  2. Extract the vertex $u$ with the smallest distance (i.e., smallest $u.d$) from the queue.
  3. Relax all outgoing edges $(u,v)$ of vertex $u$.
  4. Repeat from step 2 until all vertices have been extracted ($|V|$ times).

The time complexity of Dijkstra's algorithm with a binary heap is $O(E\lg{V})$; with Fibonacci heap optimization it is $O(V\lg{V}+E)$. A sketch follows:
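A lazy-deletion sketch (stale queue entries are skipped instead of using decrease-key):

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// adj[u] = {(weight, vertex), ...}, all weights non-negative.
std::vector<long long> dijkstra(
        const std::vector<std::vector<std::pair<int, int>>>& adj, int s) {
    const long long INF = std::numeric_limits<long long>::max() / 4;
    std::vector<long long> dist(adj.size(), INF);
    using Item = std::pair<long long, int>;  // (distance, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[s] = 0;
    pq.push({0, s});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue;           // stale entry, skip
        for (auto [w, v] : adj[u])
            if (dist[u] + w < dist[v]) {     // relax (u, v)
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});
            }
    }
    return dist;
}
```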

4.3.3 DAG single-source shortest paths

In a directed acyclic graph (DAG), shortest paths can be obtained simply by topologically sorting the graph $G$ and then relaxing all outgoing edges of each vertex in that order. The time complexity of this algorithm is $O(V+E)$.

4.3.4 All-pairs shortest paths

For dense graphs, the Floyd-Warshall algorithm can be used to obtain the shortest paths between all pairs of vertices. Floyd-Warshall is a dynamic programming algorithm with the following idea: for all vertices $i, j, k$, if $d_{ij} > d_{ik} + d_{kj}$, set $d_{ij} = d_{ik} + d_{kj}$. The time complexity of the Floyd-Warshall algorithm is $O(V^3)$. A sketch follows:
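A sketch over an adjacency matrix (d[i][j] holds the direct edge weight, with a large-but-not-overflowing value for "no edge" and 0 on the diagonal):

```cpp
#include <vector>

// Floyd-Warshall sketch: after the triple loop, d[i][j] is the
// shortest-path weight from i to j via any intermediate vertices.
void floydWarshall(std::vector<std::vector<long long>>& d) {
    int n = (int)d.size();
    for (int k = 0; k < n; ++k)          // allow k as an intermediate vertex
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (d[i][k] + d[k][j] < d[i][j])
                    d[i][j] = d[i][k] + d[k][j];
}
```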

For sparse graphs, Johnson's algorithm computes all-pairs shortest paths in $O(V^2\lg{V}+VE)$ time. For graphs without negative weights, simply running Fibonacci-heap Dijkstra from every vertex yields all-pairs shortest paths, also in $O(V^2\lg{V}+VE)$ time. For graphs with negative edge weights, a new vertex $s$ is added and connected to every vertex $v \in V$ with $w(s,v) = 0$. Bellman-Ford is then run once from $s$, detecting negative cycles and computing a potential $h(v) = v.d$ for every vertex, where $v.d$ is the shortest distance from $s$ to $v$. For every edge $(u,v) \in E$, the reweighted cost is $\hat{w}(u,v) = w(u,v) + h(u) - h(v)$. All reweighted edge weights are then non-negative, so Dijkstra can be run from every vertex to obtain the reweighted shortest paths $\hat{\delta}(u,v)$, and the true shortest paths are recovered via $\delta(u,v) = \hat{\delta}(u,v) - h(u) + h(v)$.

4.4 Maximum flow

TODO

5 String Matching

5.1 KMP algorithm

TODO

5.2 BM algorithm

TODO

5.3 Karp-Rabin algorithm

TODO

5.4 Trie

TODO

5.5 Suffix Array

TODO

6 Dynamic Programming

TODO

References

  • Introduction to Algorithms
  • Data Structures — Deng Junhui

Some of the pictures are from the Internet, but I forget where they came from.