List of data structures

One, the introduction

Definition:

  • Data: a symbolic representation of objective things that can be entered into a computer and processed
  • Data element: a basic unit of data, which may be composed of several data items
  • Data item: the smallest unit of data that has independent meaning and is not divisible
  • Data object: a collection of data elements of the same nature; a subset of the data
  • Data structure: A collection of data elements that have some relationship to each other
  1. Logical structure: the logical relationships between data elements, i.e. how the data structure is presented to the user.
    • Linear structure: there is a one-to-one relationship between data elements; apart from the first and last nodes, every node has exactly one predecessor and one successor [linear list, stack, queue, ordered list]
    • Tree structure: there is a one-to-many relationship between data elements; every node has at most one predecessor and may have several successors [tree, binary tree, forest]
    • Graph structure: there is a many-to-many relationship between data elements; every node may have several predecessors and several successors [connected graph, strongly connected graph, Euler graph, dense graph, sparse graph]

  2. Storage structure: the way data elements and their relationships are stored in computer memory.
    • Sequential storage structure: logically adjacent nodes are also physically adjacent; usually implemented with arrays [sequential list]. Advantages: random access, high storage utilization. Disadvantages: each insertion or deletion requires moving many nodes.
    • Linked storage structure: logically adjacent nodes need not be physically adjacent; usually implemented with pointers [linked list]. Advantages: insertion and deletion are convenient, only pointer fields need to be modified. Disadvantages: lower storage-space utilization, no random access.
    • Indexed storage structure: in addition to the node information, an index table is built, where each index entry has the form (keyword, address) [threaded binary tree]. Advantages: fast lookup; combined with a linear structure it allows random access, and insertion or deletion only needs to update the stored addresses in the index table. Disadvantages: lower space utilization.
    • Hash storage structure: the storage address of a node is computed from the keyword of the node to be queried. Advantages: very fast query speed. Disadvantages: only suitable for scenarios that require fast data search and insertion.

  3. Operations on data: the operations defined on a logical structure, which are realized on the various storage structures

    · Data type: the general term for a set of values of the same nature together with a set of operations defined on that set; it is a data structure that has already been implemented in a programming language

    1. Atomic (non-structured) types: indivisible, such as the primitive types (integer, real, character, enumeration), pointer types, and the void type in C
    2. Structured types: their values are composed of several components organized in a certain structure, and the components may themselves be structured or atomic. In a sense a data structure can be viewed as "a set of values with the same structure", and a structured type can be viewed as a data structure plus a set of operations defined on it. Data structure = data elements + relations; data type = data structure + operations
    3. Abstract data type (ADT): the logical data structure and the operations on it, abstracted from the mathematical model of a problem, independent of the concrete storage structure in the computer and of the concrete implementation of the operations.
Example ADT:

    ADT Complex {
      Data objects: D = {e1, e2 | e1 and e2 are real numbers}
      Data relations: R = {<e1, e2> | e1 is the real part of the Complex number, e2 is the imaginary part}
      Basic operations:
        AssignComplex(&z, v1, v2): construct a complex number z with real part v1 and imaginary part v2
        DestroyComplex(&z): destroy the complex number z
        GetReal(z, &real): return the real part of z through the variable real
        GetImag(z, &imag): return the imaginary part of z through the variable imag
        Add(z1, z2, &sum): return the sum of the complex numbers z1 and z2 through the variable sum
    } ADT Complex
  1. Algorithm: a description of a sequence of steps for solving a particular problem. Algorithms are designed for correctness, readability, robustness, efficiency, and low storage requirements

Time complexity: O(1) < O(log₂n) < O(n) < O(n·log₂n) < O(n²) < O(n³) < O(2ⁿ) < O(n!)

Two, linear table

Definition:

Is a finite sequence of data elements with the same properties

ADT List {
  Data objects: D = {a(i) | 1 <= i <= n, n >= 0, a(i) is of type ElemType}
  Data relations: R = {<a(i), a(i+1)> | a(i), a(i+1) ∈ D, i = 1, ..., n-1}
  Basic operations: omitted
}

typedef struct {
  ElemType data[MaxSize];
  int length;
} SqList;                  // sequential list

typedef struct LNode {
  ElemType data;
  struct LNode *next;
} LNode, *LinkList;        // singly linked list

typedef struct DNode {
  ElemType data;
  struct DNode *prior;
  struct DNode *next;
} DNode, *DLinkList;       // doubly linked list
  1. Operations on its storage structures (initialize, insert, delete, update, search, destroy); static linked lists, circular linked lists, and the operations on and differences between lists with and without a head node.
  2. To delete or insert a node in a linked list, you need to find the node's predecessor (and successor); think about which list variants (with/without a head node, with/without a tail pointer) suit which operations best.
  3. The purpose of the head node in a singly linked list is to simplify the algorithms for inserting and removing nodes
  4. In a circular singly linked list with head node L, p points to the tail node exactly when p->next == L
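The head-node insert and delete operations above can be sketched in C (a minimal illustration with an int payload; names such as `list_insert` are ours, not from any particular textbook):

```c
#include <stdlib.h>

typedef struct LNode {
    int data;
    struct LNode *next;
} LNode, *LinkList;

/* Create an empty list with a head (dummy) node. */
LinkList list_init(void) {
    LinkList L = malloc(sizeof(LNode));
    L->next = NULL;
    return L;
}

/* Insert v at position pos (1-based); thanks to the head node,
   inserting at position 1 needs no special case. */
int list_insert(LinkList L, int pos, int v) {
    LNode *p = L;
    for (int i = 0; p && i < pos - 1; i++) p = p->next;
    if (!p) return 0;
    LNode *s = malloc(sizeof(LNode));
    s->data = v;
    s->next = p->next;
    p->next = s;
    return 1;
}

/* Delete the node at position pos by redirecting its predecessor. */
int list_delete(LinkList L, int pos) {
    LNode *p = L;
    for (int i = 0; p && i < pos - 1; i++) p = p->next;
    if (!p || !p->next) return 0;
    LNode *q = p->next;
    p->next = q->next;
    free(q);
    return 1;
}

/* Return the data at pos (assumes pos is valid). */
int list_get(LinkList L, int pos) {
    LNode *p = L->next;
    for (int i = 1; i < pos; i++) p = p->next;
    return p->data;
}

int list_length(LinkList L) {
    int n = 0;
    for (LNode *p = L->next; p; p = p->next) n++;
    return n;
}
```

Note how both insert and delete only touch pointer fields of the predecessor, which is exactly the advantage of linked storage listed earlier.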

Three, stack and queue

Stack definition:

Is a special linear list, also known as a LIFO (last in, first out) table

ADT Stack {
  Data objects: D = {a(i) | 1 <= i <= n, n >= 0, a(i) is of type ElemType}
  Data relations: R = {<a(i), a(i+1)> | a(i), a(i+1) ∈ D, i = 1, ..., n-1}
  Basic operations: omitted
}

typedef struct {
  ElemType data[MaxSize];
  int top;
} SqStack;                 // sequential stack

typedef struct linknode {
  ElemType data;
  struct linknode *next;
} LiStack;                 // linked stack

Operations on its storage structures (initialize, push, pop, peek, destroy): how does the top pointer change on each push and pop? How is an arithmetic expression converted into a postfix expression? Into a prefix expression?
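A minimal sequential-stack sketch in C showing how `top` moves on push and pop (an illustration assuming int elements; `top == -1` means empty, `top == MAXSIZE-1` means full):

```c
#include <stdbool.h>

#define MAXSIZE 100

typedef struct {
    int data[MAXSIZE];
    int top;                 /* index of the current top element */
} SqStack;

void stack_init(SqStack *s) { s->top = -1; }

bool stack_push(SqStack *s, int v) {
    if (s->top == MAXSIZE - 1) return false;   /* overflow: stack full */
    s->data[++s->top] = v;                     /* top moves up on push */
    return true;
}

bool stack_pop(SqStack *s, int *v) {
    if (s->top == -1) return false;            /* underflow: stack empty */
    *v = s->data[s->top--];                    /* top moves down on pop */
    return true;
}
```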

Rule: the elements are pushed in order and numbered 1..n from front to back. In any valid pop sequence, the elements that appear after a given element but carry smaller numbers than it must form a descending sequence

Push order: 1 2 3 4
    1 4 3 2 — valid   (push 1, pop 1; push 2, 3, 4; pop 4, 3, 2)
    1 4 2 3 — invalid (after 4 is popped, 3 and 2 are still stacked and can only come out as 3, 2)
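The rule above can also be checked mechanically by simulating the stack (a sketch: push 1..n in order and greedily pop whenever the top matches the next wanted element):

```c
#include <stdbool.h>

/* Return true if seq (a permutation of 1..n, n <= 64) is a possible
   pop order when 1..n are pushed in increasing order. */
bool valid_pop_order(const int *seq, int n) {
    int stack[64], top = -1, next = 1;
    for (int i = 0; i < n; i++) {
        /* keep pushing until seq[i] is on top or we run out */
        while ((top == -1 || stack[top] != seq[i]) && next <= n)
            stack[++top] = next++;
        if (top == -1 || stack[top] != seq[i])
            return false;    /* seq[i] is buried below a larger element */
        top--;               /* pop the match */
    }
    return true;
}
```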

Catalan number:

If n elements, numbered 1 to n, are pushed onto a stack in order, the number of possible pop sequences is the Catalan number C(2n, n)/(n+1)

Attached: Equivalent to the following problems

  • The number of unstack sequences with n different elements pushed onto the stack;
  • The number of binary trees with different forms formed by n different elements;
  • The number of binary trees in which n different elements are preordered

For a sequential stack whose top pointer always points to the current top element: the stack is full when top == MaxSize-1 and empty when top == -1
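The Catalan count above can be computed with the standard recurrence C(0) = 1, C(n) = Σ C(j)·C(n-1-j) (a sketch; this avoids the overflow-prone factorial form for small n):

```c
/* Number of valid pop sequences for n pushed elements: the Catalan number. */
long catalan(int n) {
    long c[32] = {1};                    /* c[0] = 1, rest zero-initialized */
    for (int i = 1; i <= n; i++) {
        c[i] = 0;
        for (int j = 0; j < i; j++)
            c[i] += c[j] * c[i - 1 - j]; /* split around one element's stack lifetime */
    }
    return c[n];
}
```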

Converting to postfix expressions by hand:

Convert the infix expression "5+2×(1+6)-8/2" into a postfix expression: fully parenthesize it according to the order of operations, "((5+(2×(1+6)))-(8/2))", move each operator to the position of its matching closing parenthesis, and remove all parentheses to obtain the postfix expression "5216+×+82/-".
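The parenthesize-and-move trick above is what the classical stack-based conversion computes; here is a sketch for single-digit operands with + - * / and parentheses ('*' standing in for '×'):

```c
#include <string.h>

static int prec(char op) { return (op == '+' || op == '-') ? 1 : 2; }

/* Convert an infix expression with single-digit operands to postfix. */
void infix_to_postfix(const char *in, char *out) {
    char stack[64];
    int top = -1, k = 0;
    for (; *in; in++) {
        char c = *in;
        if (c >= '0' && c <= '9') {
            out[k++] = c;                        /* operands pass straight through */
        } else if (c == '(') {
            stack[++top] = c;
        } else if (c == ')') {
            while (stack[top] != '(') out[k++] = stack[top--];
            top--;                               /* discard the '(' */
        } else {                                 /* operator */
            while (top >= 0 && stack[top] != '(' && prec(stack[top]) >= prec(c))
                out[k++] = stack[top--];         /* pop higher/equal precedence */
            stack[++top] = c;
        }
    }
    while (top >= 0) out[k++] = stack[top--];    /* flush remaining operators */
    out[k] = '\0';
}
```

Running it on the section's example reproduces the hand-derived result.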

Recursion: the eight queens problem? The maze problem? Towers of Hanoi? Fibonacci numbers? And their non-recursive versions?

Queue definition:

Is a special linear list, also known as a FIFO (first in, first out) table

ADT Queue {
  Data objects: D = {a(i) | 1 <= i <= n, n >= 0, a(i) is of type ElemType}
  Data relations: R = {<a(i), a(i+1)> | a(i), a(i+1) ∈ D, i = 1, ..., n-1}
  Basic operations: omitted
}

typedef struct {
  ElemType data[MaxSize];
  int front, rear;
} SqQueue;                 // circular (sequential) queue

typedef struct {
  ElemType data[MaxSize];
  int front, count;
} QuType;                  // queue storing a count instead of rear

typedef struct qnode {
  ElemType data;
  struct qnode *next;
} QNode;                   // linked-queue data node

typedef struct {
  QNode *front;
  QNode *rear;
} LiQueue;                 // linked queue

Operations on its storage structure (initialization, add, delete, change, search, destroy), count solving problems, maze problems

Sequential storage structure:

  • Non-circular queue:

    Empty: rear == front

    Full: rear == MaxSize-1

    Initial state: front == rear == -1

    The front pointer points to the position before the head element; the rear pointer points to the rear element

    Number of elements: rear - front

  • Circular queue:

    Empty: rear == front

    Full: (rear+1) % MaxSize == front

    Initial state: front == rear == 0

    The front pointer points to the position before the head element; the rear pointer points to the rear element

    Number of elements: (rear - front + MaxSize) % MaxSize

  • Overflow

    Overflow: enqueueing when the queue is full

    Underflow: dequeueing when the queue is empty

    False overflow: there are free positions in the queue, yet rear == MaxSize-1 makes the queue look full

The criteria for chain storage structure are the same as those for linked lists
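The circular-queue conditions above, in C (a sketch; one slot is sacrificed so that full and empty are distinguishable, which is what eliminates false overflow):

```c
#include <stdbool.h>

#define MAXSIZE 4      /* small on purpose: capacity is MAXSIZE - 1 elements */

typedef struct {
    int data[MAXSIZE];
    int front, rear;   /* front: one before the head element; rear: the tail slot */
} SqQueue;

void q_init(SqQueue *q) { q->front = q->rear = 0; }

bool q_empty(const SqQueue *q) { return q->front == q->rear; }

bool q_enqueue(SqQueue *q, int v) {
    if ((q->rear + 1) % MAXSIZE == q->front) return false;   /* full */
    q->rear = (q->rear + 1) % MAXSIZE;                       /* wrap around */
    q->data[q->rear] = v;
    return true;
}

bool q_dequeue(SqQueue *q, int *v) {
    if (q_empty(q)) return false;                            /* underflow */
    q->front = (q->front + 1) % MAXSIZE;
    *v = q->data[q->front];
    return true;
}

int q_count(const SqQueue *q) {
    return (q->rear - q->front + MAXSIZE) % MAXSIZE;
}
```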

4. Trees and binary trees

Tree definition:

Is a nonlinear structure consisting of a finite set of N (n>=0) nodes with one and only one root node

ADT Tree {
  Data objects: D = {a(i) | 1 <= i <= n, n >= 0, a(i) is of type ElemType}
  Data relations: R = {<a(i), a(j)> | a(i), a(j) ∈ D, 1 <= i, j <= n, where every element except the root has exactly one predecessor (parent) node and zero or more successor (child) nodes}
  Basic operations: omitted
}

Tree representations: node-and-edge diagram, Venn (nested-set) diagram, indented notation, parenthesized (bracket) notation

Properties:

  1. Number of nodes in a tree = sum of the degrees of all nodes + 1

    Proof: in a tree, every node except the root has exactly one predecessor, i.e. hangs from exactly one branch; the sum of the degrees of all nodes counts every branch, and adding the root, which has no predecessor, gives all the nodes of the tree

  2. In an m-degree tree, layer i contains at most m^(i-1) nodes (i >= 1). Proof: when every node is full, the number of nodes at layer i+1 is m times that at layer i

  3. An m-degree tree of height h has at most (m^h - 1)/(m - 1) nodes (for a given number of nodes, the full m-degree tree has the smallest height)

  4. The minimum height of an m-degree tree with n nodes is ⌈log_m(n(m-1)+1)⌉. Proof: let the height of the tree be h; the height is minimal when the first h-1 layers are full, which gives the inequality (m^(h-1) - 1)/(m - 1) < n <= (m^h - 1)/(m - 1), hence h = ⌈log_m(n(m-1)+1)⌉

  5. Leaf nodes: n(0) = n(2) + 2×n(3) + ...... + (m-1)×n(m) + 1

The storage structure of the tree:

Parent (array) storage structure:

    typedef struct {
      ElemType data;
      int parent;                // index of the parent node (-1 for the root)
    } PTree[MaxSize];

This makes full use of the property that every node has exactly one parent: finding a node's parent is easy, but finding a node's children requires traversing the whole structure.

Child storage structure: give each node as many child-pointer fields as the maximum degree of the tree, which avoids the complication of a varying number of pointer fields per node:

    typedef struct tnode {
      ElemType data;
      struct tnode *child[M];
    } CTree;

This storage structure suits binary trees, quadtrees, and the like, but wastes storage space for other trees.

Child-sibling storage structure:

    typedef struct cbnode {
      ElemType data;
      struct cbnode *child;      // first child
      struct cbnode *brother;    // next sibling
    } CBTree;

This stores the tree as a binary tree, so its biggest advantage is that conversion between trees and binary trees is easy; however, finding a node's parent is troublesome and requires searching from the root.

The concept of binary trees:

Definition: a binary tree is another kind of tree structure in which no node has degree greater than 2, and the subtrees of every node are divided into left and right, which cannot be swapped (unlike in an ordinary tree of degree 2, where the children are unordered)

!!! Know these: full binary tree, complete binary tree, binary sort tree, Huffman tree, balanced binary tree, B+ tree, B- tree !!!

Properties:

  1. Number of leaf nodes in a non-empty binary tree = number of nodes of degree 2 + 1 (n₀ = n₂ + 1)
  2. A non-empty binary tree has at most 2^(k-1) nodes at the k-layer (k >= 1)
  3. A binary tree of height h has at most 2^h - 1 nodes (h >= 1)
  4. If the nodes of a complete binary tree are numbered 1, 2, ..., n in level order, then: a. when i > 1, the parent of node i is numbered ⌊i/2⌋; when i is even its parent is i/2 and node i is a left child, and when i is odd its parent is (i-1)/2 and node i is a right child. b. when 2i <= n, the left child of node i is numbered 2i; otherwise node i has no left child. c. when 2i+1 <= n, the right child of node i is numbered 2i+1; otherwise node i has no right child. d. the level (depth) of node i is ⌊log₂i⌋ + 1, and the height of the tree is ⌈log₂(n+1)⌉
  5. n distinct nodes can form C(2n, n)/(n+1) structurally different binary trees [the Catalan number]
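Property 4 can be sanity-checked directly in code (a sketch over 1-based array indices):

```c
int parent(int i) { return i / 2; }             /* ⌊i/2⌋, valid for i > 1 */

/* Return the child's index, or 0 if the child does not exist (index > n). */
int left_child(int i, int n)  { return 2 * i     <= n ? 2 * i     : 0; }
int right_child(int i, int n) { return 2 * i + 1 <= n ? 2 * i + 1 : 0; }

int level(int i) {                               /* ⌊log₂i⌋ + 1 */
    int d = 0;
    while (i) { i /= 2; d++; }
    return d;
}
```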

Binary tree storage structure:

Sequential storage structure: best suited to complete and full binary trees, where the node numbering uniquely reflects the logical relationships between nodes; it not only saves space, but the array subscript also determines each node's position in the tree and its logical relationships

Linked (binary list) storage structure:

    typedef struct bitnode {
      ElemType data;
      struct bitnode *lchild;
      struct bitnode *rchild;
    } BiTNode, *BiTree;

Binary tree traversal:

Preorder traversal (root, left, right), inorder traversal (left, root, right), postorder traversal (left, right, root). Ps. know both the recursive and non-recursive algorithms for all three; "preorder + inorder", "postorder + inorder", or "level order + inorder" uniquely identifies a binary tree
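A minimal recursive sketch of the three traversals (it appends node labels to a string; the small tree built in the check below is our own example):

```c
#include <stdlib.h>
#include <string.h>

typedef struct BiTNode {
    char data;
    struct BiTNode *lchild, *rchild;
} BiTNode;

BiTNode *node(char d, BiTNode *l, BiTNode *r) {
    BiTNode *p = malloc(sizeof(BiTNode));
    p->data = d; p->lchild = l; p->rchild = r;
    return p;
}

void preorder(BiTNode *t, char *out) {          /* root, left, right */
    if (!t) return;
    strncat(out, &t->data, 1);
    preorder(t->lchild, out);
    preorder(t->rchild, out);
}

void inorder(BiTNode *t, char *out) {           /* left, root, right */
    if (!t) return;
    inorder(t->lchild, out);
    strncat(out, &t->data, 1);
    inorder(t->rchild, out);
}

void postorder(BiTNode *t, char *out) {         /* left, right, root */
    if (!t) return;
    postorder(t->lchild, out);
    postorder(t->rchild, out);
    strncat(out, &t->data, 1);
}
```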

  • Clue binary tree:

Objective: to speed up finding the predecessors and successors of nodes (traditional linked storage only records the parent-child relationship and cannot reflect the traversal predecessor/successor order the way a linear list does)

    typedef struct threadnode {
      ElemType data;
      struct threadnode *lchild;
      struct threadnode *rchild;
      int ltag, rtag;
    } ThreadNode;

    // ltag == 0: lchild points to the node's left child;  ltag == 1: lchild points to the node's predecessor
    // rtag == 0: rchild points to the node's right child; rtag == 1: rchild points to the node's successor

Question:

  1. In a preorder threaded binary tree, the preorder predecessor of a node cannot be found directly, because finding it requires knowing the node's parent, and the binary linked list has no pointer to the parent

  2. In a postorder threaded binary tree, the postorder successor of a node cannot be found directly, for the same reason: there is no pointer to the parent in the binary linked list

    Ps. be able to explain the structure and the reasoning clearly

A given tree uniquely determines a binary tree, and converting a binary tree back into a tree or forest is also unique. Tree preorder traversal = forest preorder traversal = binary tree preorder traversal; tree postorder traversal = forest inorder traversal = binary tree inorder traversal

  • Huffman tree: definition: the length of the path from the root to a node (the number of edges passed) multiplied by the weight of that node is called the node's weighted path length; the sum of the weighted path lengths of all leaf nodes is called the weighted path length of the tree (WPL). The binary tree with minimum WPL is called a Huffman tree, also known as the optimal binary tree. Prefix code: no code is a prefix of another code (by default 0 labels the left branch and 1 the right, or vice versa). A Huffman tree contains no node of degree 1, and the 0/1 labels on the path from the root to a leaf form that leaf's Huffman code

Example: a Huffman code designed from the usage frequencies of 5 characters cannot be 00, 100, 101, 110, 111. Reason: the 4 codes of length 3 require 2 internal nodes at depth 2; together with the leaf for 00 there are 3 nodes at depth 2, so the internal node at depth 1 above 00 would have only one child, which contradicts the definition of a Huffman tree (no node of degree 1)

[Ps. The following are tree-table searches: when inserting or deleting, records in the table need not be moved, which reduces the extra time overhead.]

  • Binary sort tree (BST): definition: also known as a binary search tree; its characteristic is that left-subtree values < root value < right-subtree values, and recursively every subtree is itself a BST. A BST is a dynamic set: the tree is usually not built in one step, but keys are inserted during search whenever a key is not yet in the tree. When the insertion sequence is sorted, the height of the BST is maximal. Be able to compute the average length of a successful search and of a failed search (both depend mainly on the height of the tree, O(log₂n) to O(n))

  • Balanced binary tree (AVL): objective: to prevent the height of a binary sort tree from growing too fast and improve its performance (an AVL tree is also a binary sort tree; distinguish it from a complete binary tree). Characteristics: for every node, the height difference between its left and right subtrees is -1, 0, or 1; every AVL adjustment is applied at the node nearest to the inserted node whose balance factor is out of range (LL, RR, LR, RL rotations)

Ps. the minimum number of nodes in a balanced binary tree of height n: N(0) = 0, N(1) = 1, N(2) = 2, N(n) = N(n-1) + N(n-2) + 1; the maximum number of nodes: 2^n - 1
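That recurrence in code (a minimal sketch):

```c
/* Minimum number of nodes in an AVL tree of height h:
   N(0) = 0, N(1) = 1, N(h) = N(h-1) + N(h-2) + 1 */
long avl_min_nodes(int h) {
    long a = 0, b = 1;                /* N(0), N(1) */
    if (h == 0) return 0;
    for (int i = 2; i <= h; i++) {
        long c = b + a + 1;           /* taller subtree + shorter subtree + root */
        a = b;
        b = c;
    }
    return b;
}
```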

  • B-tree: definition: also known as a multiway balanced search tree, it is a very effective data structure for organizing and maintaining external file systems. Characteristics: the maximum number of children of any node in the tree is called the order of the B-tree, usually denoted m. An m-order B-tree is either an empty tree or an m-degree tree that meets the following requirements:

    • All leaf nodes are in the same layer and carry no information
    • Each node in the tree has at most m subtrees (that is, at most m-1 keywords)
    • If the root node is not a terminal node, the root node has at least two subtrees
    • In addition to root nodes, other non-leaf nodes have at least ⌈m/2⌉ subtrees (i.e., contain at least ⌈m/2⌉-1 keywords)
    • Except the root node, the number of keywords on other non-leaf nodes is ⌈m/2⌉-1 <= n <= m-1
    • Operations: insert, delete, search, build (like squeezing toothpaste: when a node has too few keywords, borrow or merge; when it has too many, split; first determine the keyword limits of each node)
  • B+ tree: definition: an m-order B+ tree differs as follows:

    • Each branch node has at most m subtrees

    • The root node either has no subtrees, or at least two

    • Except the root node, other branch nodes have at least ⌈m/2⌉ subtrees

    • A node with n subtrees contains exactly n keywords

    • All leaf nodes contain all the keywords and pointers to the corresponding records; the leaf nodes are ordered by keyword size and linked together sequentially

    • All branch nodes contain only the largest keyword within each of their children and pointers to those children
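The BST property described earlier in this section (left < root < right, with search-while-inserting) can be sketched in C (names such as `bst_insert` are ours):

```c
#include <stdlib.h>
#include <stdbool.h>

typedef struct BSTNode {
    int key;
    struct BSTNode *lchild, *rchild;
} BSTNode;

/* Insert key, preserving the BST property; duplicates are ignored. */
BSTNode *bst_insert(BSTNode *t, int key) {
    if (!t) {                        /* search failed here: insert at this spot */
        t = malloc(sizeof(BSTNode));
        t->key = key;
        t->lchild = t->rchild = NULL;
    } else if (key < t->key) {
        t->lchild = bst_insert(t->lchild, key);
    } else if (key > t->key) {
        t->rchild = bst_insert(t->rchild, key);
    }
    return t;
}

bool bst_search(const BSTNode *t, int key) {
    while (t) {
        if (key == t->key) return true;
        t = key < t->key ? t->lchild : t->rchild;
    }
    return false;
}

int bst_height(const BSTNode *t) {
    if (!t) return 0;
    int l = bst_height(t->lchild), r = bst_height(t->rchild);
    return 1 + (l > r ? l : r);
}
```

The second check below illustrates the degeneration noted above: inserting a sorted sequence yields height n, i.e. a linked list.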


Five, graphs

Definition:

  • Graph: Graph G consists of two sets V(finite non-empty set of vertices) and E(finite set of vertex edges), denoted as G=(V, E).
  • Undirected graph: In graph G, the set E is unordered
  • Directed graph: In graph G, the set E is ordered
  • Complete graph: an undirected complete graph has n(n-1)/2 edges; a directed complete graph has n(n-1) edges
  • Endpoints and adjacent points: for an edge (i,j) or <i,j>, vertices i and j are adjacent to each other and are the two endpoints of the edge
  • Degree of a vertex: in an undirected graph, the number of edges incident to a vertex is its degree. In a directed graph, a vertex's degree splits into in-degree and out-degree; in-degree + out-degree = degree.
  • Subgraph: A graph consisting of a set of vertices and edges that are subsets, respectively
  • Path and path length: in a graph, a path from vertex i to vertex j is a sequence of vertices, and the number of edges traversed along the path is the path length.
  • Free tree: an undirected connected graph in which there is exactly one path between any two vertices.
  • Cycle (loop): a path that begins and ends at the same vertex.
  • Connected, connected graph, connected component: in undirected graph G, if there is a path from vertex I to vertex J, then I to j is connected; If any two vertices in the graph are connected, G is called a connected graph; otherwise, it is not a connected graph. The maximally connected subgraph of an undirected graph G is called the connected component of G.
  • Strongly connected graph and strongly connected component: In directed graph G, if any two vertices I and j are connected, graph G is strongly connected, otherwise it is not strongly connected. The maximally connected subgraph of a directed graph G is called the strongly connected component of G.
  • Dense graph, sparse graph: when a graph is close to complete graph, it is called dense graph; When a graph contains fewer edges, it is called sparse (e << n(n-1))

Properties:

  1. Let the number of vertices of degree i in the graph be n(i); for an undirected connected graph, n(0) = 0
  2. If an undirected connected graph has n vertices and e edges, and the degree of every vertex is at most m, then:
    • n = n(m) + ... + n(2) + n(1) + n(0)
    • the sum of the degrees of all vertices equals 2e
    • the sum of the degrees of all vertices = m×n(m) + ...... + 2×n(2) + 1×n(1)

Diagram storage:

Adjacency matrix

It is often used for dense undirected graphs, where compressed storage of the symmetric matrix saves a lot of space; what is stored is edge information. [How do you read a vertex's degree, in-degree, and out-degree? What is the relationship between the number of edges and the matrix entries? Which operations are convenient and which are not?]

    #define MaxVertexNum 100
    typedef char VertexType;   // vertex type
    typedef int EdgeType;      // edge-weight type
    typedef struct {
      VertexType Vex[MaxVertexNum];                // vertex table
      EdgeType Edge[MaxVertexNum][MaxVertexNum];   // adjacency matrix (edge table)
      int vexnum, arcnum;                          // current number of vertices and arcs
    } MGraph;

Features:

  1. The adjacency matrix of an undirected graph is a symmetric (and unique) matrix, so the upper (lower) triangular matrix is usually stored in storage

  2. For the adjacency matrix of an undirected graph, the number of non-zero (or non-infinite) elements in the ith row (or column) is exactly the degree of the ith vertex

  3. For the adjacency matrix of a digraph, the number of non-zero (or non-infinite) elements in the ith row (or column) is exactly the out (or in) of the ith vertex

  4. Storing graphs with an adjacency matrix makes it easy to determine whether any two vertices have edges between them, but to determine how many edges there are you have to traverse the entire matrix

  5. Dense graphs are suitable for storage using adjacency matrices

Adjacency list

It is often used for directed graphs; the combination of a sequential table and linked lists saves a lot of space, and it stores node information. [How do you read a vertex's degree, in-degree, and out-degree? What is the relationship between the number of edges and the number of edge nodes? Which operations are convenient and which are not?]

    #define MaxVertexNum 100
    typedef struct ArcNode {   // edge node
      int adjvex;              // index of the adjacent vertex
      struct ArcNode *next;    // next edge node
    } ArcNode;
    typedef struct VNode {     // vertex node
      VertexType data;
      ArcNode *first;          // first edge leaving this vertex
    } VNode, AdjList[MaxVertexNum];
    typedef struct {
      AdjList adjlist;         // adjacency list
      int n, e;                // number of vertices and edges
    } AGraph;

Features:

  1. For an undirected graph, the storage space required is O(|V| + 2|E|) (each edge appears twice); for a directed graph it is O(|V| + |E|)

  2. Adjacency lists are suitable for storing sparse graphs

  3. With an adjacency list it is easy to find all edges adjacent to a vertex, but to determine whether an edge exists between two vertices one must scan the corresponding vertex's edge list

  4. In a directed graph, a row of the adjacency list gives the vertex's out-degree, while computing the in-degree requires traversing the whole adjacency list

  5. Adjacency list representation is not unique, because in the process of building a table, the link order of each edge node is arbitrary, depending on the table building algorithm and edge input order

    • Cross-linked list (orthogonal list)

      Another chained storage structure for directed graphs

    • Adjacency multilist

      Another chained storage structure for undirected graphs

Graph traversal:

  • Breadth-first traversal (BFS)

    Binary tree-like hierarchical traversal (using queues)

  • Depth-first traversal (DFS)

    Similar to the preorder traversal of a binary tree (using a stack or recursion)

  • Minimum spanning tree

    A connected undirected graph without cycles that spans all n vertices using n-1 edges is a spanning tree. In a weighted graph, the spanning tree with the smallest total weight is called the minimum spanning tree; it is not necessarily unique (for example, when several edges share the same weight).

    • Prim's algorithm: a constructive algorithm: starting from one vertex, each time select the minimum-weight edge connecting the current tree to a vertex not yet in it, add that vertex, and repeat the same selection from the enlarged tree; a vertex already in the tree is never added again. Continue until all n vertices have joined. Suitable for dense graphs, because the time complexity O(n²) is independent of the number of edges

    • Kruskal algorithm

      Constructs a minimum spanning tree by selecting edges in increasing order of weight, starting with the smallest; an edge that would form a cycle is skipped and the next smallest chosen, until n-1 edges have been added

      Suitable for sparse graphs, because the time complexity O(e·log₂e) is independent of the number of vertices

  • The shortest path

    • Dijkstra algorithm

      Also known as the single-source shortest path algorithm (the shortest paths from one vertex to all other vertices; no negative-weight edges allowed). Process:

      1. Select a source vertex i and put the weights from i to every other vertex into the array dist[], setting unreachable vertices to infinity;

      2. Among the not-yet-selected vertices, select the vertex j with the smallest dist value;

      3. Compute the weights of the paths that go through j to the remaining vertices;

      4. Compare each such weight with the current value in dist[] and update dist[] if it is smaller;

      5. Repeat until all vertices have been selected

      The time is order n^2.

    • Freud’s algorithm (Floyd)

      Also known as multi-source shortest path algorithm (i.e. shortest path per pair of vertices)

    • AOV network and topological sorting: a directed graph without cycles is called a directed acyclic graph. A sequence of its vertices in which every vertex appears only after all of its predecessor vertices is called a topological sequence; such a directed graph with activities on its vertices is called an AOV network. Finding a topological sequence of a directed graph is called topological sorting. Process: 1. select a vertex with no predecessor (in-degree 0) from the directed graph and output it; 2. delete that vertex and all edges starting from it; 3. repeat until no vertex with in-degree 0 remains (then either nothing is left, or the remainder contains a cycle). The topological sequence produced is not unique, and the time complexity is O(n + e)

    • AOE network and critical path
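The BFS described above, on the adjacency-matrix representation (a sketch; it visits neighbors in index order using an explicit queue, just like the level-order traversal of a binary tree):

```c
#define N 5

/* Breadth-first traversal from vertex s over an N x N adjacency matrix;
   writes the visit order into order[] and returns how many vertices were reached. */
int bfs(int g[N][N], int s, int order[N]) {
    int visited[N] = {0}, queue[N], front = 0, rear = 0, k = 0;
    visited[s] = 1;
    queue[rear++] = s;
    while (front < rear) {
        int v = queue[front++];            /* dequeue */
        order[k++] = v;
        for (int w = 0; w < N; w++)
            if (g[v][w] && !visited[w]) {  /* enqueue each unvisited neighbor */
                visited[w] = 1;
                queue[rear++] = w;
            }
    }
    return k;
}
```

Depth-first traversal differs only in replacing the queue with a stack (or recursion).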


6. Generalized tables

Definition:

  • A generalized table is a finite sequence of n (n >= 0) elements, GL = (a1, a2, a3, ..., an)

  • Usually expressed as:

    Example | Explanation
    A = (#) | A is an empty table; '#' does not denote any element. Length 0, depth 1, no header and no tail
    B = (e) | B is a table with the single atom e; length 1, depth 1
    C = (a, (b, c, d)) | C is a table with one atom and one subtable; length 2, depth 2
    D = (A, B, C) = ((#), (e), (a, (b, c, d))) | D is a table with three subtables; length 3, depth 3
    E = ((a, (a, b), ((a, b), c))) | E is a table with one subtable; length 1, depth 4
  • Generalized table depth: refers to the multiplicity of parentheses contained in the generalized table

  • Header: the first element a1 of a generalized table is called the table header. For example, the header of B is e and the header of D is A = (#).

  • Table tail: the subtable consisting of all elements other than the header is called the table tail. For example, the tail of B is the empty table, and the tail of D is (B, C) = ((e), (a, (b, c, d))). The tail is always a generalized table. head[GL] = a1, tail[GL] = (a2, a3, ..., an)

Storage structure of generalized tables:

Generalized tables generally use a chained storage structure, mainly the child-sibling storage structure [a generalized table can be viewed as a tree and then stored the way a binary tree is]

    typedef struct GLNode {
      int tag;                   // 0: atom node; 1: subtable node
      struct GLNode *link;       // pointer to the next element (sibling)
      union {
        ElemType data;           // atom value (when tag == 0)
        struct GLNode *sublist;  // pointer to the first element of the subtable (when tag == 1)
      } val;
    } GLNode;

Relationships between generalized tables, arrays, and linear tables:

  • An array is a sequence of finite data elements with the same properties, and each element has a unique subscript qualification. A multidimensional array can be regarded as a linear table in a linear table
  • A generalized table is a sequence of finite data elements, each element can be an atom or a subtable, and the elements are also linear
  • A linear table is a set of finite data elements with the same properties, and there is a linear relationship between the elements. Array and generalized table are the generalizations of linear table

Seven, searching

Operations on lookup tables typically have the following:

  1. Queries whether a particular data element exists in the lookup table
  2. Retrieves various attributes of a particular data element that satisfy the criteria
  3. Inserts a data element into the lookup table
  4. Removes a specific data element from the lookup table

Static lookup table

  • Sequential search

    Also known as linear search, used mainly for keyword search in both ordered and unordered linear lists

    Unordered table: average length of a successful search: (n+1)/2; of a failed search: n+1

    Ordered table: average length of a successful search: (n+1)/2; of a failed search: n/2 + n/(n+1)

  • Binary search, also known as half-interval search: applies only to ordered sequential lists; it cannot be used with linked storage

    Average length of a successful search: approximately log₂(n+1) - 1

    Average length of a failed search: the depth of the reached external node in the decision tree, roughly ⌈log₂(n+1)⌉

    Ps. The average search length will be calculated by looking at the graph

  • Block search

    Also called indexed sequential search: elements within a block are unordered, but the blocks are ordered among themselves (every keyword in one block is smaller than all keywords in the next block). The index table records the largest keyword of each block and the block's starting position in the table. If a lookup table of length n is divided evenly into b blocks of s records each, then under equal search probability: when both the index table and the blocks use sequential search, the average length of a successful search is (s² + 2s + n)/(2s), which reaches its minimum √n + 1 when s = √n; when the index table uses binary search, the average length of a successful search is ⌈log₂(b+1)⌉ + (s+1)/2
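The binary (half-interval) search mentioned above, as code (a sketch; note that it needs random access, which is exactly why linked storage does not qualify):

```c
/* Return the index of key in the sorted array a[0..n-1], or -1 if absent. */
int binary_search(const int *a, int n, int key) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   /* avoids overflow of (lo + hi) */
        if (a[mid] == key) return mid;
        if (a[mid] < key)
            lo = mid + 1;               /* key lies in the right half */
        else
            hi = mid - 1;               /* key lies in the left half */
    }
    return -1;
}
```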

Dynamic lookup table

  • Binary sort tree lookup

    Plus B-trees and B+ trees (see "Trees and binary trees" above)

  • Hash tables (both dynamic and static lookup tables)

    Also known as a hash table: storage addresses are constructed by a hash function. Common hash functions:

    1. Direct addressing: H(key) = a

    2. H(key) = key%p

    3. Number analysis method: select a number of more evenly distributed digits as the hash address

    4. Square take middle method: take the key word square several bits as hash address

    5. Fold method: divide the keyword into several parts with the same number of digits, and then take the sum of these parts as the hash address

    6. Random number method: H(key) = random(key)

    Address conflict handling methods:

    1. Open addressing: treat the conflicting hash address as input to a probe function and compute a new free hash address

      • Linear probing: Hi = (H(key) + i) % m, 0 <= i <= m−1
      • Quadratic probing: Hi = (H(key) + di) % m, where di = 1^2, −1^2, 2^2, −2^2, …
      • Double hashing: Hi = (H(key) + i × H2(key)) % m, where H2 is a second hash function
    2. Chaining (the “zipper” method): all synonyms are linked together in one singly linked list; each cell of the hash table stores a pointer to the head of the corresponding synonym list

      Load factor: α = n/m (n is the number of stored elements, m is the table length). The average search length of a hash table depends on the load factor (records in the table / table length), not directly on the number of records or the table length alone. The larger the load factor, the more likely conflicts become. (Review questions: how is the table filled under each method? What are the average search lengths on success and on failure?)
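The chaining method above can be sketched as follows. This is a minimal illustration, not a production hash table; the class name and bucket layout are my own, and the hash function is the division method from the list above:

```python
class ChainedHashTable:
    """Separate-chaining ("zipper") hash table using H(key) = key % m."""

    def __init__(self, m=11):
        self.m = m                           # table length (ideally prime)
        self.buckets = [[] for _ in range(m)]  # one chain per cell
        self.n = 0                           # number of stored keys

    def insert(self, key):
        chain = self.buckets[key % self.m]   # division-method hash
        if key not in chain:                 # synonyms share one chain
            chain.append(key)
            self.n += 1

    def contains(self, key):
        # Only the chain for H(key) is searched, not the whole table
        return key in self.buckets[key % self.m]

    def load_factor(self):
        return self.n / self.m               # alpha = n / m
```

Keys 7, 18, and 29 are all synonyms under m = 11 (each maps to address 7), so they end up on the same chain; the load factor grows with n while m stays fixed.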

8. Internal sort

Stability of sorting: If there are multiple elements with the same keywords in the table to be sorted and the relative order of these elements with the same keywords remains unchanged, the sorting is stable; otherwise, it is unstable

L[] denotes a (sub)table and L(i) denotes a single element

Insertion sort:

The idea: at each step, insert one record into the previously sorted subsequence according to its keyword, until all records have been inserted

  • Direct insertion sort

    1. Find the insert position k of L(i) in L[1…i−1]
    2. Move all elements in L[k…i−1] back one place
    3. Copy L(i) into L(k)

    Space complexity: only constant auxiliary units are used, O(1)

    Time complexity: O(n^2)

    Stability: Because the elements are inserted from back to front, the relative positions of the elements do not change, belonging to stable sorting

    Applicability: Suitable for sequential storage and chain storage

    Features: When the initial elements are in reverse order, the execution efficiency is the worst. The closer to positive order, the higher the execution efficiency
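The three steps above can be sketched in Python (0-indexed rather than the 1-indexed L[] notation):

```python
def insertion_sort(a):
    """Direct insertion sort, in place; stable, O(n^2) worst case."""
    for i in range(1, len(a)):
        key = a[i]                   # the record to insert
        j = i - 1
        # Shift larger elements of the sorted prefix one slot right;
        # scanning back-to-front with > (not >=) keeps equal keys stable
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key               # drop the record into its slot
    return a
```

On nearly sorted input the inner while loop exits almost immediately, which is the "closer to positive order, higher efficiency" behavior noted above.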

  • Binary insertion sort

    1. The first step is to find the position of the element to be inserted

    2. Move other elements uniformly

    3. Insert elements

    Space complexity: only constant auxiliary units are used, O(1)

    Time complexity: O(n^2)

    Stability: Belongs to stable order

    Applicability: Suitable for sequential storage and chain storage

    Features: Compared with direct insertion sort, only the number of keyword comparisons is reduced; the number of element moves is unchanged

  • Hill sort (reduced incremental sort)

    1. Choose a gap d(1) < n and divide the records in the table into d(1) groups

    2. Records whose distance apart is a multiple of d(1) fall into the same group; perform direct insertion sort within each group

    3. Choose a second gap d(2) < d(1) and repeat the two steps above

    4. Continue until the gap reaches 1, then perform a final direct insertion sort

    Space complexity: only constant auxiliary units are used, O(1)

    Time complexity: about O(n^1.3) on average

    Stability: Records with the same keyword may fall into different subtables, so their relative order can change; unstable sorting

    Applicability: Only suitable for sequential storage

    Features: Efficiency is highest when the sequence is already in order, and worst when it is in reverse order
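A sketch of Shell sort using the simple halving gap sequence n/2, n/4, …, 1 (many other gap sequences exist; this one is only illustrative):

```python
def shell_sort(a):
    """Shell sort (diminishing-increment sort) with gaps n//2, n//4, ..., 1."""
    gap = len(a) // 2
    while gap > 0:
        # Gapped direct insertion sort: elements gap apart form one group
        for i in range(gap, len(a)):
            key = a[i]
            j = i - gap
            while j >= 0 and a[j] > key:
                a[j + gap] = a[j]    # shift within the group
                j -= gap
            a[j + gap] = key
        gap //= 2                    # shrink the gap; final pass is gap == 1
    return a
```

The final gap-1 pass is an ordinary direct insertion sort, but by then the table is nearly ordered, so it runs fast.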

Swap sort:

  • Bubble sort

    1. Starting with the first element, compare the values of adjacent elements from front to back
    2. If L[i−1] > L[i], swap the two values
    3. Repeat, starting from the second element
    4. Continue until only the last two elements remain to compare, and the sort is done

    Space complexity: only constant units are used, O(1)

    Time complexity: O(n^2)

    Stability: Belongs to stable order

    Applicability: Suitable for sequential storage and chain storage

    Characteristics: The closer the sequence is to the positive sequence, the higher the efficiency
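The adjacent-swap passes above can be sketched as follows; the early-exit flag is what makes nearly sorted input fast:

```python
def bubble_sort(a):
    """Bubble sort with early exit when a full pass makes no swaps."""
    n = len(a)
    for i in range(n - 1):
        swapped = False
        # After pass i, the largest i+1 elements are in final position,
        # so the inner scan can stop i elements early
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:      # already sorted: best case is O(n)
            break
    return a
```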

  • Quick sort

    The improvement of bubble sort based on divide-and-conquer is the algorithm with the best average performance among all internal sorts

    1. Select a benchmark element pivot from the table to be sorted, dividing the table into two parts, L[1…k−1] and L[k+1…n]
    2. Compare the elements with pivot so that every element in L[1…k−1] is no greater than pivot and every element in L[k+1…n] is no less than it; pivot ends up in its final position L(k)
    3. At the end of one quicksort pass, select a new pivot in each of the two parts for the next pass
    4. Repeat the process until each part holds at most one element

    Space complexity: a recursion stack is used; O(n) at worst, O(log(2)n) on average

    Time complexity: depends on the partitioning; O(n^2) at worst, O(n·log(2)n) on average and at best

    Stability: when two equal keywords end up in the same partition, their relative order can change after one pass; unstable sorting

    Applicability: only suitable for sequential storage

    Features: the more randomly distributed the data, the higher the efficiency; the more ordered the data, the lower the efficiency
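The partition-then-recurse steps can be sketched as follows, using the first element of each range as the pivot (other pivot choices are common):

```python
def partition(a, lo, hi):
    """Hoare-style fill partition: pivot ends in its final slot, index returned."""
    pivot = a[lo]                         # first element as the benchmark
    while lo < hi:
        while lo < hi and a[hi] >= pivot:
            hi -= 1
        a[lo] = a[hi]                     # move a smaller element left
        while lo < hi and a[lo] <= pivot:
            lo += 1
        a[hi] = a[lo]                     # move a larger element right
    a[lo] = pivot                         # pivot lands in final position L(k)
    return lo

def quick_sort(a, lo=0, hi=None):
    """In-place quicksort over a[lo..hi]."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        k = partition(a, lo, hi)
        quick_sort(a, lo, k - 1)          # sort the left part
        quick_sort(a, k + 1, hi)          # sort the right part
    return a
```

With a first-element pivot, already sorted input degrades every partition to size n−1, which is the O(n^2) worst case noted above.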

Selection sort:

  • Simple selection sort

    1. Divide the array into an ordered part L[0…i−1] and an unordered part L[i…n−1], where every keyword in the ordered region is smaller than those in the unordered region
    2. Select the smallest keyword L(i) in the unordered region L[i…n−1]
    3. Append L(i) to L[0…i−1] so that L[0…i] becomes ordered
    4. Repeat until the unordered region is empty

    Space complexity: only constant auxiliary space is used, O(1)

    Time complexity: O(n^2)

    Stability: swapping the minimum into place can carry an element past equal keywords, so relative order may change; unstable sorting

    Features: sorting effort is independent of the input data, because the minimum keyword is selected on every pass and the number of comparisons is always the same
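A sketch of the select-minimum-and-swap loop:

```python
def selection_sort(a):
    """Simple selection sort: grow the ordered prefix one minimum at a time."""
    n = len(a)
    for i in range(n - 1):
        m = i
        # Scan the unordered region a[i..n-1] for the smallest keyword;
        # the comparison count is the same regardless of input order
        for j in range(i + 1, n):
            if a[j] < a[m]:
                m = j
        # The long-distance swap can jump over equal keys -> unstable
        a[i], a[m] = a[m], a[i]
    return a
```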

  • Heap sort

    A heap in which every node’s keyword is no less than its children’s is called a “big root heap” (max-heap); the reverse is a “small root heap” (min-heap)

    1. Build the initial keyword sequence to be sorted (L1, L2, …, Ln) into a big root heap; this is the initial unordered region
    2. Swap the heap top L[1] with the last element L[n], producing a new unordered region (L1, L2, …, Ln−1) and a new ordered region (Ln), with L[1…n−1] <= L[n]
    3. Because the new heap top L[1] may violate the heap property, re-heapify (sift down) the current unordered region (L1, L2, …, Ln−1), then swap L[1] with the last element of the unordered region again, obtaining a new unordered region (L1, L2, …, Ln−2) and a new ordered region (Ln−1, Ln). Repeat until the ordered region holds n−1 elements (the heap’s last remaining element completes the sort)

    Space complexity: sorts in place; heapification (sifting) needs only constant auxiliary space, O(1)

    Time complexity: O(n·log(2)n)

    Stability: the long-distance top-to-tail swaps can reorder equal keywords; unstable sorting

    Applicability: suitable for sequential storage (the heap is a complete binary tree stored sequentially)

    Features: sorting effort is largely independent of the input data; each pass selects the maximum (or minimum) keyword
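The build-heap / swap / re-sift steps above can be sketched as follows, with the complete binary tree stored in the array (children of index i at 2i+1 and 2i+2):

```python
def heap_sort(a):
    """Heap sort with an in-array max-heap ("big root heap")."""
    n = len(a)

    def sift_down(root, end):
        # Push a[root] down until the heap property holds in a[root..end]
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            if child + 1 <= end and a[child + 1] > a[child]:
                child += 1                      # pick the larger child
            if a[root] >= a[child]:
                break                           # heap property restored
            a[root], a[child] = a[child], a[root]
            root = child

    # Step 1: build the initial max-heap (last internal node downward)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n - 1)
    # Steps 2-3: swap top with the unordered region's tail, then re-sift
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end - 1)
    return a
```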

Merge sort:

Treat the sequence as n ordered tables of length 1; in each round, merge adjacent ordered tables pairwise, sorting as they are combined

Space complexity: O(n) auxiliary space for merging (recursive implementation)

Time complexity: O(nlog(2)n)

Stability: Stable sort

Applicability: Suitable for sequential storage

Characteristics: Sorting efficiency has nothing to do with the data to be arranged, and the ordered region generated in each round is not necessarily the global order
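A top-down sketch of the merge process (the iterative bottom-up form described above behaves the same asymptotically):

```python
def merge_sort(a):
    """Top-down merge sort; stable, O(n log n) time, O(n) extra space."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    # Merge the two ordered halves into one ordered table
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # <= takes from the left on ties -> stable
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]   # append the leftover run
```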

Radix sort:

Instead of comparing the size of keywords, sort by the values of each of the keywords (ones, tens, hundreds…).

Space complexity: O(n + r) for the r buckets (r is the radix)

Time complexity: O(d(n + r)), where d is the number of digit positions

Stability: Stable sort

Applicability:

Characteristics: The ordered region generated in each round is not necessarily the global order
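A least-significant-digit sketch for non-negative integers; each pass does a stable distribution into r = base buckets and then collects them in order:

```python
def radix_sort(a, base=10):
    """LSD radix sort for non-negative integers: d stable bucketing passes."""
    if not a:
        return a
    digits = len(str(max(a)))            # d = number of digit positions
    for d in range(digits):              # ones, tens, hundreds, ...
        buckets = [[] for _ in range(base)]   # r = base buckets
        for x in a:
            buckets[(x // base ** d) % base].append(x)
        # Collect buckets in order; appending preserves the previous
        # pass's order, which is what makes the sort stable
        a = [x for bucket in buckets for x in bucket]
    return a
```

Because earlier passes order only low digits, the table can look globally unordered until the final (highest-digit) pass, matching the note above.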

Ps. Bubble sort, heap sort, and simple selection sort extend a globally ordered region each pass

Hill sort, direct insertion sort, and merge sort do not produce a globally ordered region; quicksort fixes at least one element (the pivot) in its final position per pass; radix sort may be globally ordered or globally unordered until it finishes

Comparison chart of sorting algorithms: