Algorithms + Data Structures = Programs



1.1 Why learn data structures and algorithms

Many people think that learning data structures and algorithms is only about coping with interviews, and that there is little chance to use them in daily development. Some students grind through practice problems, only to find that many of them take a great deal of effort even to understand at first glance. Others are outright hostile to the very term "data structures and algorithms".

A big part of this comes from not really understanding data structures. Have you ever wondered why big companies test data structures and algorithms? Why do candidates pass the technical rounds but keep failing on the algorithm questions?

Learning data structures and algorithms is not just about memorizing the classical structures (arrays, linked lists, queues, stacks, trees, graphs, and so on), nor is it only about preparing algorithm questions for interviews.

In fact, data structures are one of the key ways to test whether your fundamentals are solid. Most importantly, you should learn a way of thinking: how to translate real-world problems into expressions in a computer language.

Algorithms + Data Structures = Programs

This article introduces the basic concepts of data structures. After covering the basic concepts, we will do some related exercises to deepen our understanding of them. I hope it can be of some help to you.

This topic will continue to be updated. If any picture or description is wrong or unclear, please point it out and I will correct it promptly.

The basic concepts of data structures are now complete. Related algorithm problems will be added in the algorithms section. Thank you for your attention.

1.2 What are data structures and algorithms

Data structures are the way computers store and organize data. A data structure is a collection of data elements that have one or more specific relationships with each other. More often than not, a carefully chosen data structure can lead to higher running or storage efficiency. Data structures are often associated with efficient retrieval algorithms and indexing techniques.

Data structures include:

  1. Linear structures: linear lists (arrays, linked lists, queues, stacks, hash tables)
  2. Tree structures: binary tree, AVL tree, red-black tree, B-tree, heap, Trie, Huffman tree, disjoint-set (union-find)
  3. Graph structures: adjacency matrix, adjacency list

An algorithm is an accurate and complete description of a problem-solving scheme: a finite series of clear instructions for solving a problem. An algorithm represents a systematic strategy for solving a class of problems.

Algorithms include: recurrence, recursion, exhaustive search, greedy algorithms, divide and conquer, dynamic programming, iteration, branch and bound, and backtracking.

I. Basic concepts of algorithm


The advantages and disadvantages of an algorithm can be measured by space complexity and time complexity.

Features:

  1. Finiteness: An algorithm must be able to terminate after a finite number of steps are performed.
  2. Definiteness: Each step of an algorithm must have a definite, unambiguous definition.
  3. Input: An algorithm has zero or more inputs describing the initial condition of its operands. Zero inputs means the algorithm itself determines its initial conditions.
  4. Output: An algorithm has one or more outputs that reflect the results of processing the input data. An algorithm without output is meaningless.
  5. Effectiveness: Every computational step in an algorithm can be decomposed into basic executable operations, each of which can be completed within a finite time.

1.1 Time Complexity

The time complexity of an algorithm measures the computational effort required to run it, that is, the number of times its basic statements are executed.

Big O notation

Big O notation is generally used to describe an algorithm's complexity as a function of the data size n.

  • Ignore constants, coefficients, and lower-order terms

    Number of executions    Complexity    Informal term
    12                      O(1)          Constant order
    2n+3                    O(n)          Linear order
    4n^2+2n+6               O(n^2)        Quadratic order
    4log2(n)+25             O(logn)       Logarithmic order
    3n+2nlog3(n)+15         O(nlogn)      nlogn order
    4n^3+3n^2+n+12          O(n^3)        Cubic order
    2^n                     O(2^n)        Exponential order

O(1) < O(logn) < O(n) < O(nlogn) < O(n^2) < O(n^3) < O(2^n) < O(n!) < O(n^n)
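
As a quick check on the table above, here is a minimal Swift sketch (the function name is illustrative) that counts the basic statements of a loop whose exact cost is 2n + 3; Big O drops the coefficient and the constant, leaving O(n):

```swift
// Count the basic statements executed by a simple loop whose exact
// cost is 2n + 3: 2 initializations + (n + 1) comparisons + n body runs.
func executions(_ n: Int) -> Int {
    var count = 0
    var i = 0                  // initialization: 1 statement
    count += 1
    var sum = 0                // initialization: 1 statement
    count += 1
    while true {
        count += 1             // the comparison i < n runs n + 1 times
        if i >= n { break }
        i += 1                 // the loop body runs n times
        sum += i
        count += 1
    }
    _ = sum
    return count               // total = 2n + 3, which is O(n)
}
```

For n = 10 this counts 23 statements, for n = 100 it counts 203; the exact numbers differ, but both grow linearly.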

1.2 Space Complexity

Space complexity is a measure of the amount of storage space an algorithm temporarily occupies while running, denoted S(n) = O(f(n)). For example, direct insertion sort has O(n^2) time complexity but O(1) space complexity, while an ordinary recursive algorithm has O(n) space complexity, because each recursive call must store information for its return. The quality of an algorithm is measured mainly by its execution time and its storage space.

II. Data Structure - Linear Structures

A linear list is a finite sequence of n elements of the same type (n>=0)

    index:   0     1     2     3            n-1
             a1 >> a2 >> a3 >> a4 >> ... >> an

a1 is the first node/element and an is the last node/element

a1 is the predecessor of a2, and a2 is the successor of a1

Common linear lists: arrays, linked lists, stacks, queues, hash tables

2.1 Array

An Array is a linear list that is stored sequentially, with contiguous memory addresses for all elements.

Problem set for related algorithms
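
A minimal Swift sketch of the two defining costs of sequential storage (Swift's built-in `Array` stands in for the structure):

```swift
// Sequential storage: the element at index i lives at
// baseAddress + i * MemoryLayout<Element>.stride, so access by index
// is a single address computation, O(1), with no traversal.
let numbers = [10, 20, 30, 40]
let third = numbers[2]

// The price of contiguity: inserting in the middle must shift every
// later element, so insertion is O(n) in the worst case.
var a = [1, 2, 4]
a.insert(3, at: 2)
```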

2.2 Linked List

A linked list is a storage structure whose elements need not occupy contiguous physical storage units; the logical order of the data elements is realized through the chain of pointers between nodes.
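
A minimal Swift sketch of a singly linked list node (the `ListNode` name is illustrative, not from the original article); note that once a node's predecessor is known, unlinking it is O(1), with no element shifting as in an array:

```swift
// Minimal singly linked list node: value + pointer to the next node.
final class ListNode {
    var val: Int
    var next: ListNode?
    init(_ val: Int, _ next: ListNode? = nil) {
        self.val = val
        self.next = next
    }
}

// Build 1 -> 2 -> 3, then unlink the middle node in O(1):
// the predecessor simply points past it.
let head = ListNode(1, ListNode(2, ListNode(3)))
head.next = head.next?.next     // drop node 2
```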

2.2.1 Unidirectional linked lists

Leetcode – Removes nodes in the linked list

2.2.2 Circular linked lists

Leetcode – Circular linked list

2.2.3 Doubly linked lists

Leetcode – Bidirectional linked list

2.3 Stack

A stack is a special kind of linear list that can only be operated on at one end

  • Adding an element to the stack is called push
  • Removing an element from the stack is called pop
  • The stack follows the Last In First Out (LIFO) principle
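
The push/pop interface can be sketched in a few lines of Swift, backed by an array (the `Stack` type here is illustrative):

```swift
// Minimal stack sketch: all operations happen at the same end (the top).
struct Stack<Element> {
    private var storage: [Element] = []
    var isEmpty: Bool { storage.isEmpty }
    mutating func push(_ value: Element) { storage.append(value) }  // O(1) amortized
    mutating func pop() -> Element? { storage.popLast() }           // O(1)
    func peek() -> Element? { storage.last }
}

var s = Stack<Int>()
s.push(1); s.push(2); s.push(3)
let top = s.pop()    // last in, first out
```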

Leetcode – Implement stack using queues

Leetcode – Implement queue using stacks

2.4 Queue

A queue is a special kind of linear list: it only allows deletion at the front end of the list and insertion at the rear end. Like a stack, a queue is a restricted linear list. The end where insertion happens is called the rear of the queue, and the end where deletion happens is called the front (head) of the queue.

  • Rear: elements can only be added at the rear of the queue, an operation usually called enQueue
  • Front (head): elements can only be removed from the front of the queue, an operation usually called deQueue
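
The enQueue/deQueue interface above can be sketched as follows (a minimal illustration; `removeFirst()` on an array is O(n), so a real implementation would use a ring buffer or linked list):

```swift
// Minimal queue sketch: enqueue at the rear, dequeue at the front (FIFO).
struct Queue<Element> {
    private var storage: [Element] = []
    var isEmpty: Bool { storage.isEmpty }
    mutating func enQueue(_ value: Element) { storage.append(value) }
    mutating func deQueue() -> Element? {
        isEmpty ? nil : storage.removeFirst()
    }
}

var q = Queue<Int>()
q.enQueue(1); q.enQueue(2); q.enQueue(3)
let first = q.deQueue()    // first in, first out
```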

Problem set for related algorithms

2.4.1 Dual-ended Queue

A double-ended queue (deque) is a queue that allows elements to be added or removed at both ends

2.4.2 Circular queue

Think of the storage space as a ring connected end to end; such a vector is called a circular vector, and a queue stored in it is called a circular queue. A circular queue joins the two ends of a sequential queue so that the table storing the queue elements is logically treated as a ring.
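
A minimal Swift sketch of the ring idea (the `CircularQueue` type is illustrative): the front and rear indices wrap around with the modulo operator, and one slot is deliberately wasted to tell "full" apart from "empty":

```swift
// Circular queue sketch: a fixed-size array used as a ring.
struct CircularQueue {
    private var storage: [Int]
    private var front = 0   // index of the current head
    private var rear = 0    // index one past the current tail
    init(capacity: Int) { storage = Array(repeating: 0, count: capacity + 1) }
    var isEmpty: Bool { front == rear }
    var isFull: Bool { (rear + 1) % storage.count == front }
    mutating func enQueue(_ value: Int) -> Bool {
        if isFull { return false }
        storage[rear] = value
        rear = (rear + 1) % storage.count   // wrap around the ring
        return true
    }
    mutating func deQueue() -> Int? {
        if isEmpty { return nil }
        let value = storage[front]
        front = (front + 1) % storage.count // wrap around the ring
        return value
    }
}

var ring = CircularQueue(capacity: 2)
_ = ring.enQueue(1); _ = ring.enQueue(2)
let full = ring.isFull        // capacity reached
let head = ring.deQueue()     // frees a slot at the front
```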

2.5 Hash Tables

A hash table is a data structure that accesses records directly based on key values: it maps a key to a position in the table in order to speed up lookups. The mapping function is called a hash function, and the array that holds the records is called a hash table.

A hash table is essentially an array in which each element acts as a bucket holding key-value pairs. A value is retrieved from the array by index, and the index is produced by a fixed hash function that converts the key into an index. No matter how well a hash function is designed, different keys can still produce the same hash value, so hash conflicts must be handled.

Hash function:

A hash function maps the key of an element in the hash table to that element's storage position. In plain terms, it is a function that directly computes the position of a key in the table.

The main steps of a hash function:

  1. Hash the key (the result must be an integer)
  2. Relate the key's hash to the size of the array (for example, take it modulo the array length) to generate an index
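
The two steps can be sketched with Swift's standard `Hasher` (the `index(forKey:capacity:)` helper is illustrative):

```swift
// Step 1: hash the key to an integer.
// Step 2: relate that integer to the array size to get a bucket index.
func index(forKey key: String, capacity: Int) -> Int {
    var hasher = Hasher()
    key.hash(into: &hasher)
    let hash = hasher.finalize()    // step 1: integer hash of the key
    return abs(hash % capacity)     // step 2: fold it into 0..<capacity
}

let i = index(forKey: "apple", capacity: 16)
// i is always a valid bucket index in 0..<16, and the same key
// always maps to the same bucket within one run of the program
```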

Load factor = total number of nodes (total number of key-value pairs) / length of the hash bucket array

The load factor measures how empty or full a hash table is, and to some extent reflects query efficiency. A larger load factor means the table is fuller, conflicts are more likely, and performance is lower. Therefore, in general, when the load factor exceeds a constant (typically 0.75), the hash table automatically expands, usually by creating an array twice the original size. Even though a key's hash value does not change, the result of taking it modulo the new array length can change as the array grows, so key-value pairs may move to new positions. This is also called rehashing.

2.5.1 Methods for constructing a hash function

1. Direct addressing method:

Take the keyword itself, or a linear function of the keyword, as the hash address: H(key) = key or H(key) = a·key + b, where a and b are constants (such hash functions are called self functions). If the address H(key) is already occupied, move on to the next address until an empty one is found, and place the element there.

Example: H(key) = a·key + b with a = 1/10, b = 5

Key     Hash(Key)
1630    168
80      13
4780    483
1820    187
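
The example works out with integer division (a sketch; the function name is illustrative):

```swift
// Direct addressing from the table above:
// H(key) = a*key + b with a = 1/10 and b = 5, i.e. key / 10 + 5.
func directAddress(_ key: Int) -> Int {
    return key / 10 + 5
}
```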

2. Digit analysis method:

Analyze a set of keys. For example, with a set of employees' dates of birth, the first few digits (the year) are often identical, so using them would make conflicts very likely; the trailing digits for month and day differ much more, so forming the hash address from those digits significantly reduces the chance of conflict. Digit analysis, then, means finding the patterns in the digits and constructing the hash address from the digits with the lowest possible probability of conflict.

3. Mid-square method:

When it is impossible to tell which digits of the keyword are evenly distributed, first compute the square of the keyword, then take the middle digits of the square as the hash address. The middle digits of the square depend on every digit of the keyword, so different keywords produce different hash addresses with high probability.

Example: we use a letter's position in the alphabet as its internal code. For example, the internal code of K is 11, E is 05, Y is 25, A is 01, and B is 02. Thus the internal code of the keyword "KEYA" is 11052501, and similarly for "KYAB", "AKEY", and "BKEY". After squaring a keyword's internal code, digits 7 to 9 are taken as the keyword's hash address, as shown in the following table:

Keyword    Internal code    Square of the internal code    Hash address H(k)
KEYA       11052501         122157778355001                778
KYAB       11250102         126564795010404                795
AKEY       01110525         001233265775625                265
BKEY       02110525         004454315775625                315
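
A sketch reproducing the table; the values work out if "digits 7 to 9" are counted from the low-order end of the square, i.e. (square / 10^6) % 10^3 (this counting convention is an assumption inferred from the table):

```swift
// Mid-square sketch: square the internal code, then keep three middle
// digits, here digits 7-9 counted from the low-order end.
func midSquare(_ internalCode: Int) -> Int {
    let square = internalCode * internalCode   // fits in Int64 for 8-digit codes
    return (square / 1_000_000) % 1_000
}
```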

4. Folding method:

Divide the keyword into parts with the same number of digits (the last part may have fewer), then take the sum of these parts (discarding any carry) as the hash address. There are two ways to fold: shift folding and boundary folding. Shift folding aligns the lowest-order digits of each part and then adds; boundary folding folds back and forth along the part boundaries from one end to the other, then aligns and adds.

5. Random number method:

Take a random-function value of the keyword as the hash address: H(key) = random(key), where random is a random function. This is usually used when keyword lengths differ.

6. Division-remainder method:

Take the remainder of the keyword divided by some number p, no larger than the hash table length m, as the hash address: H(key) = key MOD p, p <= m. The modulo can be applied to the key directly, or after folding or mid-square operations. The choice of p is important, usually a prime or m itself; a poor choice of p easily produces synonyms (collisions).

Example:

H(key) = key % p (p <= m)

In practice it is not always that simple: the key may go through other computations before the modulo is taken.

No matter how well a hash function is designed, it is possible for different keys to get the same hash value after being hashed, which requires handling hash conflicts.

2.5.2 Hashing Conflict Resolution

1. Separate chaining (the "zipper" method):

A combination of linked lists and arrays

A linked list is stored at each bucket position of the hash table; when duplicate hash values occur, the new element is linked via next to the elements already in that bucket.

In JDK 1.8's HashMap, a singly linked list is used by default to string the elements of a bucket together. When adding elements, a bucket's list is converted to a red-black tree once the table size is greater than 64 and the list has more than 8 nodes.

When the number of nodes in such a red-black tree becomes small enough, it is turned back into a singly linked list.
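
A minimal Swift sketch of separate chaining (the `ChainedHashTable` type is illustrative; an array of pairs stands in for the linked list in each bucket):

```swift
// Separate-chaining sketch: colliding keys simply share a bucket.
struct ChainedHashTable {
    private var buckets: [[(key: String, value: Int)]]
    init(capacity: Int) { buckets = Array(repeating: [], count: capacity) }
    private func bucketIndex(_ key: String) -> Int {
        var hasher = Hasher()
        key.hash(into: &hasher)
        return abs(hasher.finalize() % buckets.count)
    }
    mutating func set(_ key: String, _ value: Int) {
        let i = bucketIndex(key)
        if let j = buckets[i].firstIndex(where: { $0.key == key }) {
            buckets[i][j].value = value       // update an existing key
        } else {
            buckets[i].append((key, value))   // chain a (possibly colliding) entry
        }
    }
    func get(_ key: String) -> Int? {
        buckets[bucketIndex(key)].first(where: { $0.key == key })?.value
    }
}

var table = ChainedHashTable(capacity: 4)   // tiny capacity to force collisions
table.set("a", 1); table.set("b", 2); table.set("c", 3)
```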

2. Rehashing (multiple hash functions):

Design two or more hash functions; when one produces a conflict, use the next. Conflicts can never be ruled out entirely, but the more (and the better) hash functions you design, the less likely a collision becomes.

3. Open addressing method:

The open addressing method uses the formula: Hi = (H(key) + di) MOD m, i = 1, 2, …, k (k <= m-1)

where m is the length of the hash table and di is the increment sequence used when a conflict occurs.

  1. Linear probing

    di takes the constants 1, 2, 3, …, m-1

    Probe bucket by bucket until an empty one is found, then store the data in that bucket

  2. Quadratic probing

    di = a·i^2 (i <= m/2), where m is the size of the key set and a is a constant

    Probe the cell at interval i^2 and store the element there if it is empty

  3. Pseudo-random probing

    di = random(key)

    The probe interval is a pseudo-random number
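
Linear probing can be sketched directly from the formula Hi = (H(key) + di) MOD m (the `LinearProbingTable` type is illustrative; H(key) is taken as key % m):

```swift
// Linear-probing sketch: on a conflict, step forward one bucket at a
// time, wrapping around with MOD m, until a free bucket is found.
struct LinearProbingTable {
    private var keys: [Int?]
    init(capacity: Int) { keys = Array(repeating: nil, count: capacity) }
    // Returns the bucket index where key was stored, or nil if full.
    mutating func insert(_ key: Int) -> Int? {
        let m = keys.count
        let home = key % m                  // H(key)
        for di in 0..<m {
            let i = (home + di) % m         // (H(key) + di) MOD m
            if keys[i] == nil || keys[i] == key {
                keys[i] = key
                return i
            }
        }
        return nil                          // every bucket is occupied
    }
}

var t = LinearProbingTable(capacity: 7)
let p12 = t.insert(12)   // 12 % 7 = 5, bucket 5 is free
let p19 = t.insert(19)   // 19 % 7 = 5 -> conflict, probes on to bucket 6
```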

2.5.3 Related algorithms

Leetcode – 1. Two Sum

Leetcode – 3. Longest Substring Without Repeating Characters

3. Data Structure – Tree Structure

A tree structure is a hierarchy of nested structures. The outer and inner layers of a tree structure have similar structures, so the structure can be recursively represented. The various tree graphs in classical data structures are typical tree structures: a tree can be simply represented as a root, a left subtree, and a right subtree. The left and right subtrees have their own subtrees. – wikipedia


The basic concept

  • Kinds of trees:

    • Ordered tree: there is a defined order among the subtrees of any node in the tree
    • Unordered tree: there is no defined order among the subtrees of any node in the tree
    • Multi-way tree / N-ary tree: each node of the tree can have more than two child nodes
    • Binary tree, full binary tree, complete binary tree, Huffman tree: defined in the sections below
  • Node: Each element in the tree is called a node

  • Empty tree: A tree without any nodes is called empty

  • Root node: the node that has no parent (a tree has a single root node)

  • Parent node: A node that has at least one child node directly subordinate to it.

  • Child node: the node at the next level from the parent node

  • Sibling: Children of the same parent are siblings of each other

  • Subtree: A tree consisting of a child node and all its descendants

  • Left subtree: A tree consisting of the left child node and all the children of the child node

  • Right subtree: The tree consisting of the right child node and all the descendants of the child node

  • Degree of nodes: the number of subtrees

  • Tree degree: The maximum degree of all nodes is the tree degree

  • Leaf node: node whose degree is 0

  • Non-leaf node: a node whose degree is not 0

  • Number of levels of a tree: the root node is on level 1 (some texts count from level 0), its children on level 2, and so on

  • Node depth: the total number of nodes on the path from the root node to the current node

  • Node height: the total number of nodes on the path from the current node to its farthest leaf node

  • Tree depth: the maximum depth of all nodes

  • Tree height: the maximum height of all nodes (the height of the root node)

  • Forest: a collection of m(m>=0) trees that do not intersect each other is called a forest.

3.1 Binary Tree

A binary tree is an ordered tree whose nodes have degree 2 or less. It is the simplest and most important kind of tree. Recursively, a binary tree is either an empty tree, or a non-empty tree consisting of a root node and two disjoint binary trees called the left and right subtrees of the root.

Features of binary trees:

  • Each node has at most 2 subtrees (maximum degree of 2)
  • The subtrees are ordered: left and right are distinguished
  • Even when a node has only one subtree, it still matters whether it is the left or the right subtree

Properties of binary trees:

  • There are at most 2^(i-1) nodes on the i-th level of a binary tree (i ≥ 1)

  • A binary tree of depth h contains at most 2^h - 1 nodes

  • In any binary tree, if there are n0 leaf nodes and n2 nodes of degree 2, then n0 = n2 + 1

  • A complete binary tree with n nodes has depth floor(log2(n)) + 1 (where floor(x) is the largest integer not greater than x)

  • If the nodes of a complete binary tree with n nodes are numbered level by level from 1 to n, then for the node numbered i (1 ≤ i ≤ n):

    When i = 1, the node is the root and has no parent
    When i > 1, its parent is the node numbered floor(i/2)
    If 2i ≤ n, its left child is numbered 2i; otherwise it has no left child
    If 2i+1 ≤ n, its right child is numbered 2i+1; otherwise it has no right child
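
The numbering rules above can be sketched as small functions (a minimal Swift sketch; the function names are illustrative) for a complete binary tree with n nodes numbered 1…n:

```swift
// Parent/child numbering in a complete binary tree, 1-based.
func parent(_ i: Int) -> Int? { i > 1 ? i / 2 : nil }                    // floor(i/2)
func leftChild(_ i: Int, n: Int) -> Int? { 2 * i <= n ? 2 * i : nil }
func rightChild(_ i: Int, n: Int) -> Int? { 2 * i + 1 <= n ? 2 * i + 1 : nil }
```

For example, in a tree with n = 7 nodes, node 3 has parent 1 and children 6 and 7, while node 4 is a leaf.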

Proper binary tree: a binary tree that has only nodes of degree 0 and degree 2 (a tree with any node of degree 1 is not proper).

Full binary tree: a binary tree that has only nodes of degree 0 and degree 2, with all degree-0 nodes on the last level. A full binary tree is also a proper binary tree.

Complete binary tree: a binary tree of depth k with n nodes is complete if and only if its nodes correspond one-to-one with the nodes numbered 1 to n of the full binary tree of depth k. Leaf nodes appear only on the last two levels, the leaves on the last level are aligned to the left, and the levels from the root down to level k-1 form a full binary tree.

Properties of complete binary trees:

  • Nodes of degree 1 have only a left subtree

  • There is either 1 or 0 nodes of degree 1

  • Among binary trees with the same number of nodes, the complete binary tree has the smallest height (a binary tree with 15 nodes may need 5 or more levels, but the full binary tree with 15 nodes has exactly 4)

  • Assume the height of the complete binary tree is h (h >= 1)

    • It has at least 2^(h-1) nodes: 2^0 + 2^1 + 2^2 + ... + 2^(h-2) + 1 = 2^(h-1)

    • It has at most 2^h - 1 nodes: 2^0 + 2^1 + 2^2 + ... + 2^(h-1) = 2^h - 1 (a full binary tree)

    • If the total number of nodes in a complete binary tree is n.

      • Number of leaf nodes

        Let n0 be the number of leaf nodes (degree 0), n1 the number of degree-1 nodes, and n2 the number of degree-2 nodes; then n = n0 + n1 + n2.
        In any binary tree, n0 = n2 + 1, so n2 = n0 - 1 and n = n0 + n1 + n0 - 1 = 2*n0 + n1 - 1.
        By the complete binary tree property, n1 is either 1 or 0:
          if n1 = 1, then n = 2*n0 is even, so n0 = n / 2
          if n1 = 0, then n = 2*n0 - 1 is odd, so n0 = (n + 1) / 2
        Both cases combine into: n0 = floor((n + 1) / 2)  // round down
        In code: leaf count = (n + 1) >> 1
      • Number of non-leaf nodes

        Number of non-leaf nodes: n1 + n2 = n - n0
          if n1 = 1 (n even): n1 + n2 = n - n/2 = n/2
          if n1 = 0 (n odd):  n1 + n2 = n - (n+1)/2 = (n-1)/2
        Both cases combine into: n1 + n2 = floor(n / 2)  // round down
        In code: non-leaf count = n >> 1
      • The height of a complete binary tree

        A complete binary tree of height h has at least 2^(h-1) and at most 2^h - 1 nodes, so:
          2^(h-1) <= n < 2^h
          h - 1 <= log2(n) < h   // e.g. log2(n) = 5.5 gives h - 1 <= 5.5 < h, so h = 6
          h = floor(log2(n)) + 1  // round down
    • For a complete binary tree with n nodes (n > 0), numbered from top to bottom and left to right starting from 0 or 1, and any node numbered i:

      Condition       Numbering starts at 0 (x = 0)     Numbering starts at 1 (x = 1)
      i = x           the node is the root              the node is the root
      i > x           parent is floor((i - 1) / 2)      parent is floor(i / 2)
      left child      2i + 1 if 2i + 1 <= n - 1,        2i if 2i <= n,
                      otherwise none                    otherwise none
      right child     2i + 2 if 2i + 2 <= n - 1,        2i + 1 if 2i + 1 <= n,
                      otherwise none                    otherwise none

      Note that the formulas shift by one depending on whether numbering starts from 0 or 1

3.2 Binary Search Tree

A binary search tree (also called a binary sort tree) is either an empty tree, or a binary tree with the following properties:

  • If its left subtree is not empty, the value of any node in the left subtree is less than the value of its root node.
  • If its right subtree is not empty, the value of any node in the right subtree is greater than the value of its root node.
  • Its left and right subtrees are binary search trees respectively.

As a classical data structure, the binary search tree combines the fast insertion and deletion of a linked list with the fast search of an ordered array.

It is widely used in file systems and database systems for efficient sorting and retrieval operations.

Binary search tree elements must be comparable (e.g. Int or Double); if you use a custom type, you must supply a comparison method.

For example, consider dynamically searching data kept in sorted order, such as [3, 4, 6, 7, 8, 9, 10, 11, 14, 16, 18]:

  • In a sorted array, binary search takes O(logn) in the worst case, but adding and removing take O(n) in the worst case
  • In a binary search tree, search, add, and delete all take O(logn) in the worst case, provided the tree stays balanced
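
A minimal Swift sketch of the structure and its two basic operations (the `BSTNode` name and helper functions are illustrative): every comparison discards one subtree, which is where the O(logn) behavior of a balanced tree comes from.

```swift
// Minimal binary search tree: left subtree < node < right subtree.
final class BSTNode {
    var val: Int
    var left: BSTNode?
    var right: BSTNode?
    init(_ val: Int) { self.val = val }
}

// Insert by walking down: smaller keys go left, larger keys go right.
func insert(_ root: BSTNode?, _ val: Int) -> BSTNode {
    guard let root = root else { return BSTNode(val) }
    if val < root.val { root.left = insert(root.left, val) }
    else if val > root.val { root.right = insert(root.right, val) }
    return root
}

// Search discards half of the remaining tree at every step.
func contains(_ root: BSTNode?, _ val: Int) -> Bool {
    guard let root = root else { return false }
    if val == root.val { return true }
    return val < root.val ? contains(root.left, val) : contains(root.right, val)
}

var root: BSTNode? = nil
for v in [8, 4, 10, 6, 3] { root = insert(root, v) }
```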

3.2.1 Preorder traversal

Root node – left subtree – right subtree

Preorder traversal – recursive and iterative implementations

3.2.2 Inorder traversal

Left subtree – root node – right subtree

Inorder traversal – recursive and iterative implementations

3.2.3 Post-order traversal

Left subtree – right subtree – root node

Post-order traversal – recursive and iterative implementations

3.2.4 Level-order traversal

Access each node once from top to bottom and left to right

Level-order traversal – recursive and iterative implementations
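
The four orders above can be sketched on a tiny tree (a minimal Swift sketch; `Node` and the traversal function names are illustrative):

```swift
// A small tree:      1
//                   / \
//                  2   3
final class Node {
    var val: Int
    var left: Node?
    var right: Node?
    init(_ val: Int, _ left: Node? = nil, _ right: Node? = nil) {
        self.val = val; self.left = left; self.right = right
    }
}

func preorder(_ n: Node?) -> [Int] {
    guard let n = n else { return [] }
    return [n.val] + preorder(n.left) + preorder(n.right)    // root, left, right
}
func inorder(_ n: Node?) -> [Int] {
    guard let n = n else { return [] }
    return inorder(n.left) + [n.val] + inorder(n.right)      // left, root, right
}
func postorder(_ n: Node?) -> [Int] {
    guard let n = n else { return [] }
    return postorder(n.left) + postorder(n.right) + [n.val]  // left, right, root
}
func levelOrder(_ root: Node?) -> [Int] {
    var result: [Int] = []
    var queue: [Node] = root.map { [$0] } ?? []
    while !queue.isEmpty {
        let n = queue.removeFirst()          // top to bottom, left to right
        result.append(n.val)
        if let l = n.left { queue.append(l) }
        if let r = n.right { queue.append(r) }
    }
    return result
}

let tree = Node(1, Node(2), Node(3))
```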

3.2.5 Binary search tree related algorithms

The height of a binary tree

Leetcode – 94. Binary Tree Inorder Traversal

3.3 AVL Tree

In computer science, the AVL tree was the first self-balancing binary search tree to be invented. In an AVL tree, the heights of the two subtrees of any node differ by at most 1, so it is also called a height-balanced tree. Additions and deletions may require one or more tree rotations to rebalance the tree.

The AVL tree takes its name from its inventors G. M. Adelson-Velsky and E. M. Landis, who published it in their 1962 paper An Algorithm for the Organization of Information.

3.3.1 Basic concepts and features

The basic concept

  • Balance factor: the height of a node's left subtree minus the height of its right subtree

Characteristics of AVL trees

  • It is first and foremost a binary search tree.

  • It maintains a balance condition: the balance factor of every node can only be -1, 0, or 1 (absolute value at most 1)

  • If the absolute value of any node's balance factor is greater than 1, the tree is out of balance.

  • Search, add, and delete all have O(logn) time complexity

    An example of imbalance:

    Suppose the tree on the left is a balanced binary search tree, and a new node 14 is added under node 13

    The result is that node 15 and its parent node 18 both go out of balance; in the worst case, all of the ancestor nodes go out of balance

3.3.2 Restoring balance: LL – right rotation

3.3.3 Restoring balance: RR – left rotation

3.3.4 Restoring balance: LR – rotate left, then right

3.3.5 Restoring balance: RL – rotate right, then left
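
The two single rotations that the four cases above are built from can be sketched as follows (a minimal Swift sketch without height bookkeeping; `AVLNode` and the function names are illustrative):

```swift
// Pointer surgery for the two single rotations used to rebalance an AVL tree.
final class AVLNode {
    var val: Int
    var left: AVLNode?
    var right: AVLNode?
    init(_ val: Int, _ left: AVLNode? = nil, _ right: AVLNode? = nil) {
        self.val = val; self.left = left; self.right = right
    }
}

// LL case: the left subtree is too tall, rotate right around `root`.
func rotateRight(_ root: AVLNode) -> AVLNode {
    let pivot = root.left!          // non-nil by assumption in the LL case
    root.left = pivot.right         // pivot's right subtree moves across
    pivot.right = root
    return pivot                    // pivot becomes the new subtree root
}

// RR case: the mirror image, rotate left around `root`.
func rotateLeft(_ root: AVLNode) -> AVLNode {
    let pivot = root.right!
    root.right = pivot.left
    pivot.left = root
    return pivot
}

// LL example: inserting 3, 2, 1 produces a left-leaning chain;
// one right rotation rebalances it around 2.
let unbalanced = AVLNode(3, AVLNode(2, AVLNode(1)))
let balanced = rotateRight(unbalanced)
```

The LR and RL cases are just a rotation on the child followed by the opposite rotation on the node itself.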

3.3.6 AVL summary

add

  • Any of the ancestor nodes may go out of balance
  • Rebalancing the lowest unbalanced node restores balance to the whole tree, so only O(1) adjustments are needed

delete

  • The parent or an ancestor node may go out of balance (only one node at a time)
  • After balance is restored, a higher-level ancestor may go out of balance in turn, so at most O(logn) adjustments are needed, checking upward as far as the root

Average complexity

  • Search: O(logn)
  • Add: O(logn), requiring only O(1) adjustment operations
  • Delete: O(logn), requiring at most O(logn) adjustment operations

3.3.7 AVL related algorithms

Problem set for related algorithms


3.4 B Tree

3.4.1 Basic Concepts

To find a given keyword in a B-tree, start at the root node, which contains the keywords K1, …, Kn. Search the node for the given keyword (sequential search or binary search can be used). If a keyword equal to the given value is found, the search succeeds. Otherwise, determine that the target keyword lies between Ki and Ki+1, take the child pointer Pi, and continue searching in the node it points to, until the keyword is found, or until the pointer Pi is empty, in which case the search fails.

A B-tree is a balanced multi-way search tree, used in file system and database implementations

  • A node can store more than two elements and can have more than two child nodes
  • It retains some properties of binary search trees
  • It is balanced: all the subtrees of each node have the same height
  • Compared with a binary search tree holding the same elements, its height is much lower

B tree properties

  • Assume a node stores x elements, and the order of the tree is m (m >= 2)
    • Number of elements: x
      • Root node: 1 <= x <= m-1
      • Non-root node: ⌈m/2⌉-1 <= x <= m-1 (⌈⌉ : rounded up)
    • If the node has children, the number of children is y = x+1
      • Root node: 2 <= y <= m
      • Non-root node: ⌈m/2⌉ <= y <= m
        • m = 3: 2 <= y <= 3, so it can be called a (2,3) tree or 2-3 tree
        • m = 4: 2 <= y <= 4, so it can be called a (2,4) tree or 2-3-4 tree
        • ...

B-trees and binary search trees are logically equivalent

  • A supernode can be obtained by merging several generations of nodes
  • Merging 2 generations gives a supernode with at most 4 children (at least an order-4 B-tree)
  • Merging 3 generations gives a supernode with at most 8 children (at least an order-8 B-tree)
  • Merging n generations gives a supernode with at most 2^n children (at least an order-2^n B-tree)

For an m-order B-tree, at most log2(m) generations are merged

3.4.2 Add – Overflow

Properties:

  • A new element is always added to a leaf node
  • A node overflows when its element count reaches m (the order)
  • Let k be the index of the middle element of the overflowed node; split the node:
    • Move the element at position k up into the parent node
    • Split the elements at [0, k-1] and [k+1, m-1] into 2 child nodes
  • After a split, the parent node may overflow in turn; in the worst case the above procedure propagates up to the root node

3.4.3 Delete – Underflow

  1. To delete an element in a leaf node, just delete it

  2. To delete an element in a non-leaf node:

    • Find its predecessor or successor element and overwrite the element to be deleted with it
    • Then delete that predecessor or successor from its leaf node
  3. Delete underflow – borrow an element from a sibling node

    • An underflowed node has exactly ⌈m/2⌉ - 2 elements
    • If a sibling of the underflowed node has at least ⌈m/2⌉ elements, it can lend one:
    • Insert the separating element from the parent node at the minimum position of the underflowed node
    • Replace that element of the parent node with an element from the sibling node
    • This amounts to a rotation
  4. Delete underflow – merge with the left or right sibling

    • If the adjacent sibling has only ⌈m/2⌉ - 1 elements (nothing to lend):
    • Take the separating element down from the parent node and merge it with the left and right child nodes
    • This operation can cause underflow to propagate all the way up

3.4.4 Recommended animation

To better understand B-trees, I recommend this visualization site: www.cs.usfca.edu/~galles/vis…

3.4.5 B-tree related algorithms

Problem set for related algorithms

3.5 Red-Black Tree

A red-black tree is a self-balancing binary search tree, a data structure used in computer science typically to implement associative arrays. It was invented by Rudolf Bayer in 1972 under the name "symmetric binary B-tree"; its modern name comes from a paper written by Leo J. Guibas and Robert Sedgewick in 1978. Red-black trees have a complex structure, but their operations have good worst-case running time and are efficient in practice: lookup, insert, and delete all run in O(logn) time, where n is the number of elements in the tree.

A red-black tree is equivalent to a 2-3-4 tree. In other words, for every 2-3-4 tree, there exists at least one red-black tree whose data elements are in the same order.

The discussion of red-black trees below also involves some B-tree concepts. If you have not read the B-tree section, scroll up first and browse through the basic B-tree operations such as adding and deleting.

3.5.1 Properties of red-black trees

A red-black tree is a binary search tree in which each node has a color attribute, either red or black. In addition to the general requirements of a binary search tree, any valid red-black tree must satisfy the following:

  1. Every node is red or black

  2. The root node must be black

  3. All leaf nodes are black (the leaves here are the null nodes)

  4. Both children of every red node are black

    • The parent of a red node must be black
    • No path from a leaf node to the root contains two consecutive red nodes
  5. Every path from any node to each of its leaf nodes contains the same number of black nodes

All we need to do to maintain a red-black tree is keep these five properties satisfied.

In fact, a red-black tree can also be converted into a 2-3-4 tree, that is, a 4th-order B-tree. The operations below, such as adding and deleting, will be clearer if you understand them together with 2-3-4 trees.

3.5.2 Add – maintaining the properties

Because of the B-tree correspondence, a new element is always added at a leaf node, and a node of a 4th-order B-tree holds 1 to 3 elements (to delete from a non-leaf node, the element is first replaced by its predecessor or successor, which is then deleted).

We first add the node as in an ordinary binary search tree and mark it red. (If it were set to black, one root-to-leaf path would gain an extra black node, which is hard to adjust; setting it to red may instead produce two consecutive red nodes, which can be fixed with color flips and rotations.) What to do next depends on the colors of the neighboring nodes. As in a family tree, we will use the term uncle node for the sibling of a node’s parent. Note:

  • Properties 1 and 3 always hold.
  • Property 4 is only threatened when adding a red node, recoloring a black node to red, or rotating.
  • Property 5 is only threatened when adding a black node, recoloring a red node to black, or rotating.
  • If the new node becomes the root, simply color it black.

Insertion into a red-black tree breaks down into the cases below, classified by the position of the new node and the colors of its parent and uncle.

Before that, we need to understand the relationships between the nodes of a binary tree (parent, grandparent/grand, uncle, sibling), as shown in the figure below:

Note: all newly added nodes are red by default.
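
Since the figure is not reproduced here, the following sketch shows how the node relationships (`parent`, `grand`, `uncle`) and the two rotations used throughout the snippets below might be implemented. This is an assumed minimal version for illustration, not the author’s original `TreeNode`:

```swift
final class TreeNode<T> {
    var value: T
    var parent: TreeNode?
    var left: TreeNode?
    var right: TreeNode?
    init(_ value: T) { self.value = value }

    // grand: the parent's parent
    var grand: TreeNode? { parent?.parent }
    // uncle: the sibling of the parent
    var uncle: TreeNode? {
        guard let p = parent, let g = p.parent else { return nil }
        return p === g.left ? g.right : g.left
    }
}

func isLeftChild<T>(_ node: TreeNode<T>?) -> Bool {
    node != nil && node === node?.parent?.left
}
func isRightChild<T>(_ node: TreeNode<T>?) -> Bool {
    node != nil && node === node?.parent?.right
}

/// Left rotation: node's right child rises, node becomes its left child.
func rotate_left<T>(_ node: TreeNode<T>?) {
    guard let n = node, let r = n.right else { return }
    n.right = r.left
    r.left?.parent = n
    r.parent = n.parent
    if isLeftChild(n) { n.parent?.left = r }
    else if isRightChild(n) { n.parent?.right = r }
    r.left = n
    n.parent = r
}

/// Right rotation: the mirror image of rotate_left.
func rotate_right<T>(_ node: TreeNode<T>?) {
    guard let n = node, let l = n.left else { return }
    n.left = l.right
    l.right?.parent = n
    l.parent = n.parent
    if isLeftChild(n) { n.parent?.left = l }
    else if isRightChild(n) { n.parent?.right = l }
    l.right = n
    n.parent = l
}
```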

 func afterAdd(_ node: Any) {
     guard let n = node as? TreeNode<Any> else { return } // Bail out if node is not a tree node
     insert_case1(n)
 }

Case 1:

Assume that the new node is the root of the tree and has no parent.

Violated property:

  • Property 2: The root node must be black

Solution:

func insert_case1(_ node: TreeNode<Any>) {
    if node.parent == nil {
        dyed_black(node)
    } else {
        insert_case2(node)
    }
}

Case 2:

Assume the parent of the new node is black (the new node itself is red), so all red-black tree properties are satisfied.

Violated property:

  • None

Solution:

func insert_case2(_ node: TreeNode<Any>) {
    if isBlack(node.parent) {
        return
    } else {
        insert_case3(node)
    }
}

Case 3:

Assume that the parent and uncle of the new node are both red (the newly inserted node is red).

In the example below, node 10 is the newly added node.

Violated property:

  • Property 4: Both children of every red node are black.
    • The parent of a red node must be black
    • There cannot be two consecutive red nodes on any path from the root to a leaf

Solution:

  1. Color the parent node and uncle node black

    In B-tree terms, the corresponding B-tree node now holds too many elements and no longer satisfies the B-tree constraints, so its middle element must be pushed up and the node split

  2. grand is pushed up, which corresponds to the B-tree split

  3. This may in turn make grand violate the red-black properties; simply treat grand as a newly added node and run the procedure again

    If grand happens to be the root, afterAdd will reach insert_case1, which re-colors the red-dyed root black

func insert_case3(_ node: TreeNode<Any>) {
    // The parent must be red here
    if isRed(node.uncle) { // Parent and uncle are both red
        dyed_black(node.parent)
        dyed_black(node.uncle)
        afterAdd(dyed_red(node.grand)) // Treat the red-dyed grand as a new node
    } else { // Uncle is black or missing
        insert_case4(node)
    }
}

Case 4:

Suppose the parent node is red and the uncle node is black or missing, and

node is the left child of its parent while the parent is the right child of grand,

or

node is the right child of its parent while the parent is the left child of grand.

Violated property:

  • Property 4: Both children of every red node are black.
    • The parent of a red node must be black
    • There cannot be two consecutive red nodes on any path from the root to a leaf

Solution:

If node is the left child of its parent and the parent is the right child of grand (RL):

rotate the parent to the right.

If node is the right child of its parent and the parent is the left child of grand (LR):

rotate the parent to the left.


After the rotation we are in case 5, and we handle it exactly as in case 5.

Note: at this point the node of interest has changed, and a further rotation (case 5) is still needed.

func insert_case4(_ node: TreeNode<Any>) {
    // Uncle is black or missing
    var tempNode = node
    if isRightChild(node) && isLeftChild(node.parent) { // LR
        rotate_left(node.parent)
        tempNode = node.left!   // The old parent is now node's left child
    } else if isLeftChild(node) && isRightChild(node.parent) { // RL
        rotate_right(node.parent)
        tempNode = node.right!  // The old parent is now node's right child
    }
    // Note: the node of interest has changed before entering case 5
    insert_case5(tempNode)
}

Case 5:

Suppose the parent node is red and the uncle node is black or missing, and

node is the left child of its parent while the parent is the left child of grand,

or

node is the right child of its parent while the parent is the right child of grand.

This is exactly the configuration that case 4 produces after its rotation.

Violated property:

  • Property 4: Both children of every red node are black.
    • The parent of a red node must be black
    • There cannot be two consecutive red nodes on any path from the root to a leaf

Solution:

Color the parent black and grand red.

If node is the right child of its parent and the parent is the right child of grand (RR):

rotate grand to the left.

If node is the left child of its parent and the parent is the left child of grand (LL):

rotate grand to the right.


func insert_case5(_ node: TreeNode<Any>) {
    // Uncle is black or missing
    dyed_black(node.parent)
    dyed_red(node.grand)
    if isLeftChild(node) && isLeftChild(node.parent) { // LL
        rotate_right(node.grand)
    } else if isRightChild(node) && isRightChild(node.parent) { // RR
        rotate_left(node.grand)
    }
}

3.5.3 Deletion – Property Maintenance

Deleting a node must also preserve the five properties of a red-black tree.

In a 2-3-4 tree (4th-order B-tree) deletion always ultimately happens at a leaf node (when deleting a non-leaf node, we first replace it with its predecessor or successor and delete there). Thinking of red-black tree deletion the same way, the real removal always takes place at a leaf node.

Note: afterRemove below has been adapted to also handle nodes of degree 1 in the binary tree.

If afterRemove is reached from a non-leaf node, the node passed in is the predecessor or successor of the deleted node.

func remove(_ n: BSNode<Any>?) {
    guard var node = n else { return } // Nothing to do if node is nil
    size -= 1
    if node.hasTwoChild() {
        let succeeding = successor(node) // Successor node
        node.value = succeeding.value
        node = succeeding                // Actually delete the successor
    }
    let replacement = node.left != nil ? node.left : node.right
    if replacement != nil { // Node has degree 1; its child replaces it
        replacement?.parent = node.parent
        // ... code
        afterRemove(replacement!)
    } else if node.parent == nil { // Node is a leaf node and the root
        // ... code
        afterRemove(node)
    } else { // Node is a leaf node but not the root
        // ... code
        afterRemove(node)
    }
}

/// RBTree: handle all the red-black cases after deletion
func afterRemove(_ node: Any) {
    guard let n = node as? TreeNode<Any> else { return } // Bail out if not a tree node
    delete_case1(n) // Enter the red-black case analysis
}

Case 1:

Assume the node passed in (the replacement child) is red (the deleted node was red, or the deleted node was black with degree 1 and its red child replaced it).

Violated property:

  • None

Solution:

In the example, deleting 25 really deletes its predecessor or successor, 17 or 33. Both of those nodes are red, so it suffices to color them black.

func delete_case1(_ node: TreeNode<Any>) {
    if isRed(node) {
        dyed_black(node)
        return
    } else {
        delete_case2(node)
    }
}

Case 2:

Assume that the node being processed is the root.

Violated property:

  • None

Solution:

If node is the root, the tree now contains only the root, so nothing further needs to be done.

func delete_case2(_ node: TreeNode<Any>) {
    if node.parent == nil {
        return
    } else {
        delete_case3(node)
    }
}

Case 3:

Assumptions:

  • The deleted node is black
  • The deleted node was its parent’s right child
  • The sibling of the deleted node is red

Violated property:

  • Property 5: All paths from any node to each leaf node have the same number of black nodes.

Solution:

First dye the sibling black and the parent red.

Then rotate the parent to the right, so that 88 becomes the child of 55 and 76 becomes the left child of 88.

After this we are in case 4 or case 5, and we simply handle it as in case 4 or case 5.

// The deleted node is black && it was a right child && its sibling is red
func delete_case3(_ node: TreeNode<Any>) {
    // The node has already been detached, so sibling cannot be read directly.
    // Determine whether the deleted node was a left or right child:
    // if parent.right == nil, the right child was the one deleted
    let isRight = node.parent?.right == nil || isRightChild(node)
    // Sibling node
    var sibling = isRight ? node.parent?.left : node.parent?.right
    if isRight { // The right child was deleted
        if isRed(sibling) { // Sibling is red
            dyed_black(sibling)
            dyed_red(node.parent)
            rotate_right(node.parent)
            sibling = node.parent?.left
        }
        delete_case4(node, sibling!)
    } else { // The left child was deleted: go to the mirrored cases
        delete_case6(node, sibling!)
    }
}

Case 4:

Assumptions:

  • The deleted node is black
  • The deleted node was its parent’s right child
  • The sibling of the deleted node is black
  • The children of the sibling are black

Violated property:

  • Property 5: All paths from any node to each leaf node have the same number of black nodes.

Solution:

Dye the parent black and the sibling red.

If the parent was originally black, this causes the parent’s B-tree node to underflow; in that case treat the parent as a newly deleted node and repeat the procedure.

func delete_case4(_ node: TreeNode<Any>, _ s: TreeNode<Any>) {
    let sibling = s
    // Both of the sibling's children are black (missing children count as black)
    if isBlack(sibling.left) && isBlack(sibling.right) {
        let parentBlack = isBlack(node.parent)
        dyed_black(node.parent)
        dyed_red(sibling)
        if parentBlack {
            afterRemove(node.parent!) // Parent underflows: treat it as newly deleted
        }
    } else {
        // The sibling has at least one red child
        delete_case5(node, sibling)
    }
}


Case 5:

Assumptions:

  • The deleted node is black
  • The deleted node was its parent’s right child
  • The sibling of the deleted node is black
  • The sibling has at least one red child

Violated property:

  • Property 5: All paths from any node to each leaf node have the same number of black nodes.

Solution:

We can reason about this via the 2-3-4 tree (4th-order B-tree): it is handled exactly like a B-tree underflow, by borrowing from the sibling.

Sub-case 1: after the underflow, perform a right rotation; the node rotated into the parent’s position inherits the parent’s color.

Sub-case 2: after the underflow, first rotate left and then rotate right; the node rotated into the parent’s position inherits the parent’s color.

Sub-case 3: the situation shown can be handled like either sub-case 1 or sub-case 2; since sub-case 1 needs only one rotation, we handle it like sub-case 1.

func delete_case5(_ node: TreeNode<Any>, _ s: TreeNode<Any>) {
    var sibling = s
    if isBlack(sibling.left) { // The red child is on the sibling's right: rotate it over first
        rotate_left(sibling)
        sibling = (node.parent?.left)!
    }
    // The new sibling inherits the parent's color
    if node.parent?.color == RED {
        dyed_red(sibling)
    } else {
        dyed_black(sibling)
    }
    dyed_black(sibling.left)
    dyed_black(node.parent)
    rotate_right(node.parent)
}

Case 6:

Assumptions:

  • The deleted node is black
  • The deleted node was its parent’s left child
  • The sibling of the deleted node is red

The only difference from case 3 is whether the left or the right child was deleted.

The rotation direction and the sibling lookup are mirrored relative to case 3; I’ll just attach the code here.

Violated property:

  • Property 5: All paths from any node to each leaf node have the same number of black nodes.

Solution:

// The left child was deleted
func delete_case6(_ node: TreeNode<Any>, _ s: TreeNode<Any>) {
    var sibling = s
    if isRed(sibling) { // Sibling is red
        dyed_black(sibling)
        dyed_red(node.parent)
        rotate_left(node.parent)
        sibling = (node.parent?.right)!
    }
    delete_case7(node, sibling) // Mirror of case 3 falling through to case 4
}

Case 7:

Assumptions:

  • The deleted node is black
  • The deleted node was its parent’s left child
  • The sibling of the deleted node is black
  • The children of the sibling are black

The only difference from case 4 is whether the left or the right child was deleted.

The rotation direction and the sibling lookup are mirrored relative to case 4; I’ll just attach the code here.

Violated property:

  • Property 5: All paths from any node to each leaf node have the same number of black nodes.

Solution:

func delete_case7(_ node: TreeNode<Any>, _ s: TreeNode<Any>) {
    let sibling = s
    // Both of the sibling's children are black
    if isBlack(sibling.left) && isBlack(sibling.right) {
        let parentBlack = isBlack(node.parent)
        dyed_black(node.parent)
        dyed_red(sibling)
        if parentBlack {
            afterRemove(node.parent!) // Parent underflows: treat it as newly deleted
        }
    } else {
        delete_case8(node, sibling)
    }
}

Case 8:

Assumptions:

  • The deleted node is black
  • The deleted node was its parent’s left child
  • The sibling of the deleted node is black
  • The sibling has at least one red child

This differs from case 5 only in whether the left or the right child was deleted.

The rotation direction and the sibling lookup are mirrored relative to case 5; I’ll just attach the code here.

Violated property:

  • Property 5: All paths from any node to each leaf node have the same number of black nodes.

Solution:

func delete_case8(_ node: TreeNode<Any>, _ s: TreeNode<Any>) {
    var sibling = s
    if isBlack(sibling.right) { // The red child is on the sibling's left: rotate it over first
        rotate_right(sibling)
        sibling = (node.parent?.right)!
    }

    // The new sibling inherits the parent's color
    if node.parent?.color == RED {
        dyed_red(sibling)
    } else {
        dyed_black(sibling)
    }
    dyed_black(sibling.right)
    dyed_black(node.parent)
    rotate_left(node.parent)
}

3.5.4 Complexity analysis of red-black tree

AVL tree:

  • The balance criterion is strict: the height difference between the left and right subtrees cannot exceed 1 (balance factor).
  • Maximum height: about 1.44 * log2(n+2) - 0.328 (with 1,000,000 nodes, the maximum height of an AVL tree is 28)
  • Search, insertion, and deletion are all O(logn); insertion needs only O(1) rotation adjustments, but deletion may need up to O(logn) rotation adjustments

Red and black tree:

  • The balance criterion is loose: no root-to-leaf path is more than twice as long as any other
  • Maximum height: 2 * log2(n+1) (with 1,000,000 nodes, the maximum height of a red-black tree is 40)
  • Search, insertion, and deletion are all O(logn); insertion and deletion need only O(1) rotation adjustments

How to choose:

  • If searches far outnumber insertions and deletions, choose an AVL tree
  • If searches, insertions, and deletions are roughly equally frequent, choose a red-black tree
  • Compared with an AVL tree, a red-black tree relaxes the balance criterion in exchange for fewer rotations when restoring balance
  • The average performance of red-black trees is better than that of AVL trees, so in practice red-black trees are usually chosen.

3.5.5 Red-black tree correlation algorithm

Problem set for related algorithms

3.6 Binary Heap

A binary heap is a special kind of heap: it is a complete binary tree (or a nearly complete binary tree).

There are two types of binary heap: maximum heap and minimum heap.

Maximum heap: the key value of the parent node is always greater than or equal to that of any child node;

Minimum heap: The key value of the parent node is always less than or equal to the key value of any child node.

Typical application: repeatedly getting the maximum value in a data set (the Top-K problem)

Approach | Get maximum | Delete maximum | Add element
Unsorted dynamic array / doubly linked list | O(n) | O(n) | O(1)
Sorted dynamic array / doubly linked list | O(1) | O(1) | O(n)
Balanced binary search tree | O(logn) | O(logn) | O(logn)
Heap | O(1) | O(logn) | O(logn)

An important property of the heap:

  • A heap in which every node’s value is greater than or equal to its children’s is called a maximum heap (also big-root heap or big-top heap)
  • A heap in which every node’s value is less than or equal to its children’s is called a minimum heap (also small-root heap or small-top heap)

Due to the nature of complete binary trees, the underlying implementation of binary heap generally uses arrays

  1. For the node at index i (n is the number of elements):
    1. If i = 0, it is the root node
    2. If i > 0, the index of its parent is floor((i-1)/2)
    3. If 2i+1 <= n-1, its left child has index 2i+1
    4. If 2i+1 > n-1, it has no left child
    5. If 2i+2 <= n-1, its right child has index 2i+2
    6. If 2i+2 > n-1, it has no right child
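
The index rules above map directly onto an array. The following is a minimal max-heap sketch (illustrative, not from the original article) whose `add` and `removeMax` use exactly the parent/child index arithmetic listed above:

```swift
// Max-heap stored in an array, using the index rules above:
// parent(i) = (i - 1) / 2, left(i) = 2i + 1, right(i) = 2i + 2.
struct MaxHeap {
    private(set) var elements: [Int] = []

    mutating func add(_ value: Int) {          // O(logn): sift up
        elements.append(value)
        var i = elements.count - 1
        while i > 0 {
            let parent = (i - 1) / 2
            if elements[i] <= elements[parent] { break }
            elements.swapAt(i, parent)
            i = parent
        }
    }

    var max: Int? { elements.first }           // O(1)

    mutating func removeMax() -> Int? {        // O(logn): sift down
        guard let top = elements.first else { return nil }
        elements[0] = elements[elements.count - 1]
        elements.removeLast()
        var i = 0
        let n = elements.count
        while true {
            let left = 2 * i + 1, right = 2 * i + 2
            var largest = i
            if left <= n - 1 && elements[left] > elements[largest] { largest = left }
            if right <= n - 1 && elements[right] > elements[largest] { largest = right }
            if largest == i { break }
            elements.swapAt(i, largest)
            i = largest
        }
        return top
    }
}
```

`add` sifts the new element up in O(logn); `removeMax` moves the last element to the root and sifts it down, which is the table’s O(logn) delete-maximum.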

Recommended reading: Data Structures | How is Weibo’s Top-10 hot search list computed? (Binary heap)

3.6.1 Heap-related algorithms

TopK problem

3.7 Trie tree

A Trie, also known as a word search tree or prefix tree, is a tree structure and a variant of the hash tree. It is typically used to count, sort, and store large numbers of strings (though not only strings), so it is often used by search-engine systems for word-frequency statistics. Its advantage is that it uses the common prefixes of strings to reduce query time and minimize unnecessary string comparisons, achieving better query efficiency than a hash tree.

A dictionary entry can be looked up as follows:

(1) Start the search at the root node;

(2) Take the first character of the key, and descend into the corresponding subtree to continue the search;

(3) In that subtree, take the second character of the key and again descend into the corresponding subtree;

(4) Iterate in this way...

(5) When every character of the key has been consumed at some node, read the information attached to that node; the search is complete.

Other operations are similar
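
The lookup procedure in steps (1)-(5) can be sketched as follows (a minimal illustrative Trie; the names `insert`, `contains`, and `startsWith` are assumptions, not from the original):

```swift
final class TrieNode {
    var children: [Character: TrieNode] = [:]
    var isWord = false        // the "information attached to the node"
}

final class Trie {
    private let root = TrieNode()

    func insert(_ word: String) {
        var node = root
        for ch in word {
            if node.children[ch] == nil { node.children[ch] = TrieNode() }
            node = node.children[ch]!
        }
        node.isWord = true
    }

    // Steps (1)-(5) above: walk down one character at a time from the root.
    func contains(_ word: String) -> Bool {
        var node = root
        for ch in word {
            guard let next = node.children[ch] else { return false }
            node = next
        }
        return node.isWord
    }

    // Prefix search: the cost depends mainly on the prefix length.
    func startsWith(_ prefix: String) -> Bool {
        var node = root
        for ch in prefix {
            guard let next = node.children[ch] else { return false }
            node = next
        }
        return true
    }
}
```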

Advantage of a Trie: the cost of a prefix search depends mainly on the length of the prefix

Downside of a Trie: it consumes a lot of memory

3.8 Huffman Tree

Given N weights for N leaf nodes, construct a binary tree; if the tree’s weighted path length is minimal, it is called an optimal binary tree, also known as a Huffman tree. A Huffman tree is the tree with the shortest weighted path length, and nodes with larger weights sit closer to the root.

Example:

  • The Huffman tree is the basis of today’s compression algorithms
  • Suppose you want to transmit the string ABBBCCCCCCCCDDDDDDEE
    • It could be sent in ASCII (‘A’ is 65, i.e. 1000001 in binary), but that is very long. How can it be made shorter?
    • Agree on fixed binary codes A: 000, B: 001, C: 010, D: 011, E: 100
    • The binary encoding becomes: 000 001001001 010010010010010010010010 011011011011011011 100100
    • Twenty letters turn into sixty bits
  • Using a Huffman tree, this can be compressed to 41 bits
    • Count the frequency of each letter: A: 1, B: 3, C: 8, D: 6, E: 2
    • Use these frequencies as weights to build a Huffman tree
  1. How do you build a Huffman tree?

    1. Build n single-node binary trees, one per weight, to form a forest
    2. From the forest, pick the two trees with the smallest root weights and merge them as the left and right subtrees of a new tree; the new root’s weight is the sum of its two subtrees’ root weights
    3. Delete the two chosen trees from the forest and add the new tree to it
    4. Repeat steps 2 and 3 until the forest is a single tree.

  2. Labeling left edges 0 and right edges 1 gives codes for the five letters:

    A: 1110, B: 110, C: 0, D: 10, E: 1111

    The Huffman encoding is: 1110 110110110 00000000 101010101010 11111111 (41 bits)

  3. Conclusions:

    1. A Huffman tree built from n weights has n leaf nodes (and 2n-1 nodes in total)

    2. No Huffman code is a prefix of another Huffman code

    3. The Huffman tree is the tree with the shortest weighted path length, and nodes with larger weights sit closer to the root

      Weighted path length: the sum over all leaf nodes of (weight x length of the path from the leaf to the root); this equals the total length of the final Huffman-encoded message.
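
The construction steps above can be sketched as follows. This is an illustrative version that sorts an array instead of using a priority queue; the exact 0/1 code assigned to each letter depends on tie-breaking, but for the example weights the total encoded length always comes out to 41 bits:

```swift
final class HuffmanNode {
    let weight: Int
    let symbol: Character?      // nil for internal (merged) nodes
    let left: HuffmanNode?
    let right: HuffmanNode?
    init(weight: Int, symbol: Character? = nil,
         left: HuffmanNode? = nil, right: HuffmanNode? = nil) {
        self.weight = weight; self.symbol = symbol
        self.left = left; self.right = right
    }
}

/// Steps 1-4: repeatedly merge the two smallest trees in the forest.
func buildHuffmanTree(_ weights: [Character: Int]) -> HuffmanNode {
    var forest = weights.map { HuffmanNode(weight: $0.value, symbol: $0.key) }
    while forest.count > 1 {
        forest.sort { $0.weight < $1.weight } // a priority queue would do this in O(logn)
        let a = forest.removeFirst()
        let b = forest.removeFirst()
        forest.append(HuffmanNode(weight: a.weight + b.weight, left: a, right: b))
    }
    return forest[0]
}

/// Left edges are 0, right edges are 1.
func huffmanCodes(_ node: HuffmanNode, prefix: String = "",
                  into table: inout [Character: String]) {
    if let s = node.symbol { table[s] = prefix.isEmpty ? "0" : prefix; return }
    if let l = node.left  { huffmanCodes(l, prefix: prefix + "0", into: &table) }
    if let r = node.right { huffmanCodes(r, prefix: prefix + "1", into: &table) }
}

// Example from the text: ABBBCCCCCCCCDDDDDDEE
let weights: [Character: Int] = ["A": 1, "B": 3, "C": 8, "D": 6, "E": 2]
var codes: [Character: String] = [:]
huffmanCodes(buildHuffmanTree(weights), into: &codes)
let totalBits = weights.reduce(0) { $0 + $1.value * codes[$1.key]!.count }
```

For the weights A:1, B:3, C:8, D:6, E:2 this yields code lengths 4, 3, 1, 2, 4 and a 41-bit message, matching the calculation above.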


Reference:

  • Love Data Structures and Algorithms, Season 1
  • blog.csdn.net/yt618121/ar…
  • blog.csdn.net/yt618121/ar…
  • baike.baidu.com
  • juejin.cn/post/684490…
  • www.wikipedia.org
  • www.conardli.top/docs/dataSt…
  • Data Structures | How is Weibo’s Top-10 hot search list computed? (Binary heap)