Graph of data structure and algorithm

0x01 Basic Concepts

The core concept

Photo: a set of nodes connected by edges (or vertex) | some discrete nodes and the number of a collection of edges connecting them, is the abstract model of network structure.

** edge: ** A line segment connecting two nodes.

** weight: the weight of ** edges

** Path: ** A continuous sequence of nodes, with a-b-F representing the path from node A to node F. A path that does not contain duplicate nodes is called a “simple path,” and the paths we’ll talk about in this installment are all simple paths

** Node distance: ** sum of path weights

Nodes and edges are the basic units of a graph

Other concepts

Ring: A non-empty path that repeats only the first and last nodes. A graph with a ring is called a ring graph, and a graph without a ring is called an acyclic graph

Parallel edge: An edge connected to the same node

Adjacent node: Two nodes connected on the same edge

Node degree: the number of adjacent nodes of a node

Connected graph: GRAPH in which paths exist on any two nodes

Directed graph & undirected graph

Directed graphs and undirected graphs are divided according to whether an edge has direction or not

In a graph, either all sides have direction, or all sides have no direction

0x02 Storage structure

Graphs can have multiple storage structures, and there is no one right way to use a graph. The actual way to use a graph depends only on the problem to be solved and the type of graph

1. Adjacency matrix

The rows and columns of a matrix represent nodes, and the values of the matrix represent the weights of the edges. The weight of the unconnected edge is 0. Symmetric matrices for undirected graphs. For many graphs, a large number of values are zero, which is a waste of storage space

2. Association matrix

The rows of a matrix represent nodes and the columns represent edges. It is applicable to the situation where the number of edges is larger than that of nodes, greatly saving space and memory.

3. Adjacency list

The adjacency list consists of a list of adjacent nodes for each node in the graph. The benefit is that you don’t waste storage space.

Compared with the storage mode of matrix, adjacency list can be stored by a variety of data structures, such as number group, linked list, etc., matrix can only be stored by two-dimensional array. The advantage of matrix storage is that it can be optimized for a specific type of graph, whether in storage space or search efficiency.

0x03 Traversal/search

The only difference between traversal and search is that there is no specified end point, there is no difference in the algorithm core

1. Breadth-first search

A search algorithm that preferentially traverses adjacent nodes

There are only two core steps:

Access a node, mark the node as visited, and clear it from the node to be accessed.
The adjacent nodes of the marked node are nodes to be accessed. If they have been visited or are already nodes to be accessed, they are not marked.

The specific traversal process starting from A is as follows:

Select A as the starting node;
Access A, and its adjacent nodes are B and C. Add nodes B and C to be accessed. 【B, C】
Access B, and its adjacent nodes are A, D, and E. Add nodes D and E to be accessed. 【C, D, E】;
Access C, its adjacent nodes are A, D, E. No new node to be accessed is added. 【D, E】;
Access D, and its adjacent nodes are B, C and F. Add node F to be accessed. 【E, F】;
Access E, and its adjacent nodes are B, C and F. No new node to be accessed is added. [F];
Access F, and its adjacent nodes are D and E. No new node to be accessed is added. 【】.
If the node to be accessed is empty, the traversal is complete.

The access sequence is A, B, C, D, E, and F

Depth-first search

A search algorithm for adjacent nodes that traverse adjacent nodes first

The core steps are also two steps :(exactly the same as breadth first)

Access a node, mark the node as visited, and clear it from the node to be accessed.
The adjacent nodes of the marked node are nodes to be accessed. If they have been visited or are already nodes to be accessed, they are not marked.

The difference between breadth-first search and breadth-first search lies in the different storage modes of nodes to be accessed: 1. Breadth-first: queue-fifO 2. Depth-first: stack – First in, last out

The specific traversal process starting from A is as follows:

Select A as the starting node;
Access A, and its adjacent nodes are B and C. Add nodes B and C to be accessed. 【C, B】
Access C, its adjacent nodes are A, D, E. Add nodes D and E to be accessed. 【E, D, B】;
Access E, and its adjacent nodes are B, C and F. Add node F to be accessed. 【F, D, B】;
Access F, and its adjacent nodes are D and E. No new node to be accessed is added. 【D, B】;
Access D, and its adjacent nodes are B, C and F. No new node to be accessed is added. [B];
Access B, and its adjacent nodes are A, D, and F. No new node to be accessed is added. 【】.
If the node to be accessed is empty, the traversal is complete.

The access sequence is A, C, E, F, D, and B

0x04 Two classical algorithm applications

1. Shortest/long path (minimum/maximum distance) of two nodes

Among all paths between two nodes, the path with the least weight overlap is the shortest path

There are many classical algorithms to find the shortest path. Here is only one that is easier to understand:

Bellman-ford algorithm (edge as basic unit)

Algorithm is the core

Set the starting weight to 0 and the other nodes to infinity. The weight of a node is the temporary distance of the shortest path from the starting point to the node.
Iterate over all edges, updating the weights of nodes on both sides of edges
Weight calculation rule: Take the two directions of the edge to calculate the weight of the two nodes respectively, end weight = math.min (original end weight, starting weight + edge weight)
In an undirected graph, you actually only need to recalculate the weighted node of the two nodes
Repeat step 2 until no more node weights are changed

Algorithm demo

A is the starting point, F is the ending point, find the shortest path from A to F

There are 8 edges, A-B, A-C, B-D, B-E, C-D, C-E, D-F, e-F

First traversal

A-B: the weight of B = 0 + 9 = 9, the original weight ∞, the final weight is 9

2. A-c: the weight of C = 0 + 1 = 1, the original weight ∞, the final weight is 1

3. B-d: the weight of D = 9 + 2 = 11, the original weight ∞, the final weight is 11

4. B-e: the weight of E = 9 + 5 = 14, the original weight ∞, the final weight is 14

5. C-d: the weight of D = 1 + 3 = 4, the original weight is 11, the final weight is 4

6. C-e: the weight of E = 1 + 7 = 8, the original weight is 14, the final weight is 8

7. D-F: C weight = 4 + 3 = 7, the original weight ∞, the final weight is 7

8. E-f: the weight of E = 7 + 5 = 12, the original weight is 8, the final weight is 8

After the first iteration, all the infinity is eliminated

The second time through, the final result is as follows:

Compared to the first traversal, only node B’s weight changes from 9 to 6. So the shortest path from A to B is not a-B, at least for now a-C-D-B is shorter.

The third time through, the final result is as follows:

Compared with the second traversal, no node weight changes, the algorithm ends.

The final conclusion is that the shortest path from A to F is a-C-d-f, and the weight is 7. As shown in “Get color # ff6B00” below

Other algorithms

Dijkstra Algorithm, A* (A-STAR) Algorithm, Shortest Path Faster Algorithm (SPFA) Algorithm, Floyd Algorithm, etc. (see recommended books for details)

2. Minimum spanning tree

Construct the minimum cost spanning tree of the connected network. To connect multiple or all nodes in a connected graph with a minimum sum of paths. In simple terms, it means connecting n nodes with n minus 1 edges and making the path shortest

Prim algorithm (node as basic unit)

Algorithm is the core

Start with the start node and add it to the minimum spanning tree
Through all non-minimum spanning tree nodes, find the nearest adjacency to any node in the spanning tree, and add the adjacency to the minimum spanning tree.
If new members have been added to the minimum spanning tree, you need to update the minimum value of the remaining nodes
Repeat steps 2 and 3 until the minimum spanning tree is constructed (i.e., the minimum number of spanning tree nodes is reached)

Algorithm demo

Add node A to the minimum spanning tree

2. Traverse the remaining nodes, find that C is closest to A, and add it to the minimum spanning tree

3. Traverse the remaining nodes, find that D is closest to C, and add it to the minimum spanning tree

4. Traverse the remaining nodes, find that B is closest to D, and add it to the minimum spanning tree

5. Traverse the remaining nodes, find that F is closest to D, and add it to the minimum spanning tree

6. Traverse the remaining nodes, find that E is closest to B or F, and add it to the minimum spanning tree

7. There are no remaining nodes. Since there are two nearest paths with equal distances in Step 6, the figure has two minimum spanning trees

Other algorithms

Kruskal algorithm – to generate the smallest tree in edge units. (See recommended books for details)

A practical example of the 0x05 diagram

1. Database Neo4j based on picture storage

Neo4j Graph Platform – The Leader in Graph Databases

neo4j.com

2. Six degrees of segmentation theory

Any two people in the world who don’t know each other need very few middlemen to establish a connection. Harvard psychology professor Stanley Milgram did a linkage experiment based on this concept in 1967, trying to prove that it took an average of six steps to contact any two people who didn’t know each other

3. Map navigation – path planning, shortest path, subway path planning, travel and flight selection, etc

Graphs are better at studying distances and relationships between nodes

0x06 Algorithm Books Recommended

An introduction to

Book.douban.com/subject/303…

Book.douban.com/subject/258…

The advanced

Book.douban.com/subject/199…

Book.douban.com/subject/266…

Book.douban.com/subject/204…