Graph
Graph theory is a branch of mathematics that takes graphs as its object of study. In graph theory, a graph is a figure composed of a number of given points and lines connecting pairs of points. Such a figure is usually used to describe a specific relationship between things: the points represent the things, and a line connecting two points indicates that the relationship holds between them.
Here is a logical diagram structure:
The graph is one of the most complex data structures; all the data structures mentioned earlier can be regarded as special cases of graphs. So why not just use graphs for everything instead of dividing them into multiple data structures?
This is because most of the time we don't need such complex capabilities, and many features of general graphs simply aren't present. If everything were just called a graph, communication would suffer: you wouldn't want to say "this problem examines a particular kind of graph, which is..." every time. That is too verbose, so the special kinds of graphs are given special names to make communication easier. We only reach for "real" graphs in genuinely complex situations.
As mentioned in the previous chapter, data structures serve algorithms: they store data so that algorithms can run more efficiently. So when do you need a graph to store data, and how effective is a graph in that case? The simple answer is: if you can't store the data well with a simpler data structure, use a graph. For example, if we need to store a two-way, many-to-many friend relationship, we must use a graph, because no other data structure can model it.
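As a minimal sketch of the friend example (the names and the helper function are made up for illustration), a two-way many-to-many relationship can be stored as a mapping from each person to the set of their neighbors:

```python
from collections import defaultdict

# A two-way, many-to-many friendship is an undirected graph:
# each vertex maps to the set of its neighbors.
friends = defaultdict(set)

def add_friendship(a, b):
    # undirected: record the edge in both directions
    friends[a].add(b)
    friends[b].add(a)

add_friendship("alice", "bob")
add_friendship("alice", "carol")

print("bob" in friends["alice"])  # True; membership check is O(1)
print(sorted(friends["alice"]))   # ['bob', 'carol']
```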
Basic concepts
Undirected Graph & Directed Graph
Just as binary trees can fully represent other tree structures, directed graphs can fully represent undirected graphs and mixed graphs, so directed graphs have always been the focus of study.
All graphs discussed in this article are directed graphs.
I mentioned earlier that we use a line connecting two points to indicate that a relationship exists between two things. If that relationship is directional, the graph is directed; otherwise it is undirected. For example, if A knows B, B may not know A, so the relationship is one-way and we need a directed graph to represent it. With an undirected graph, we could not tell whether the edge between A and B means that A knows B or that B knows A.
Traditionally, when we draw graphs, we use directed graphs with arrows, and undirected graphs without arrows.
Weighted Graph & Unweighted Graph
If the edges carry weights, it is a weighted graph; otherwise it is an unweighted graph. So what is a weight? Exchange rates, for example, form a weighted graph: 1 unit of currency A buys 5 units of currency B, so the weight of the edge from A to B is 5. A friendship graph, on the other hand, can be viewed as unweighted.
Indegree & Outdegree
The number of edges pointing to node A is node A's in-degree. Similarly, node A's out-degree is the number of edges leaving A.
Again taking the figure above as an example, every node in it has an in-degree of 1 and an out-degree of 1.
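In- and out-degrees can be computed directly from an edge list; a minimal sketch (the edge list here is made up for illustration):

```python
from collections import Counter

# directed edges (u, v): an edge pointing from u to v
edges = [("A", "B"), ("B", "C"), ("C", "A")]

outdegree = Counter(u for u, v in edges)  # edges leaving each node
indegree = Counter(v for u, v in edges)   # edges entering each node

# in this small cycle, every node has in-degree 1 and out-degree 1
print(indegree["A"], outdegree["A"])  # 1 1
```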
Path & Cycle
- Cyclic Graph: a graph is cyclic if we can start from some point in the graph and travel along edges back to the starting point, just like a real ring.
- Acyclic Graph
We can modify the figure above into an acyclic graph, with no cycles at all.
Connected graph & strongly connected graph
In an undirected graph, if there is a path between every pair of vertices i and j, it is called a connected graph.
In a directed graph, if there is a path between every pair of vertices i and j (in both directions), it is called a strongly connected graph.
Spanning tree
A spanning tree of a connected graph is a connected subgraph that contains all n vertices of the graph but only n-1 edges, just enough to form a tree. A spanning tree with n vertices has exactly n-1 edges, and adding any edge to a spanning tree necessarily creates a cycle. Among all spanning trees of a connected network, the one whose edges have the minimum total cost is called the minimum spanning tree, where the total cost is the sum of the weights of all its edges.
Building the graph
Most graph problems don't hand you a ready-made graph data structure, so when you recognize a graph problem, the first step is usually to build the graph.
Everything above concerns the logical structure of a graph; how do we actually store a graph in a computer?
We know that a graph consists of points and edges. In theory, we only need to store all the edges of the graph, since each edge already encodes the relationship between two points.
Here I briefly introduce two common graph building methods: adjacency matrix (common, important) and adjacency list.
Adjacency Matrix
The first way is to use an array or hash table to store the graph, in this case we use a two-dimensional array.
The graph is described by an N * N two-dimensional matrix, in which graph[i][j] describes the relationship of the edges.
In general, for unweighted graphs I use graph[i][j] = 1 to indicate that there is an edge between vertex i and vertex j, pointing from i to j, and graph[i][j] = 0 to indicate that there is no edge between them. For weighted graphs, we can store the weight itself instead.
You can see that the matrix in the figure is symmetric about the diagonal (the graph shown is undirected), so we only need to look at half of it; the other half is wasted space.
The space complexity of this storage is O(n^2), where n is the number of vertices. If the graph is sparse (the number of edges is far smaller than n^2), the space is largely wasted. And if the graph is undirected, at least 50 percent of the space is always wasted, as the figure below shows intuitively.
The advantages of the adjacency matrix are:

- Intuitive and simple.
- Checking whether two vertices are connected, and reading or updating an edge's weight, all take O(1). (Computing a vertex's in-degree or out-degree requires scanning a row or column, which is O(n).)
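A quick sketch of those operations on a small, made-up adjacency matrix:

```python
n = 3
graph = [[0] * n for _ in range(n)]
graph[0][1] = 5  # weighted edge 0 -> 1 with weight 5

# checking whether an edge exists, reading its weight,
# and updating it are all single lookups: O(1)
connected = graph[0][1] != 0
weight = graph[0][1]
graph[0][1] = 7  # update in O(1)
print(connected, weight, graph[0][1])  # True 5 7
```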
Because it is relatively easy to use, I basically use this method for all problems that require building a graph.
For example, LeetCode 743. Network Delay Time. Problem description:
There are N network nodes, labeled 1 through N. You are given a list times of travel times along directed edges: times[i] = (u, v, w), where u is the source node, v is the target node, and w is the time it takes for a signal to travel from source to target. Now, we send a signal from a certain node K. How long will it take for all nodes to receive the signal? If not all nodes can receive the signal, return -1.

Example: input: times = [[2,1,1],[2,3,1],[3,4,1]], N = 4, K = 2; output: 2.

Note: N is in the range [1, 100]. K is in the range [1, N]. The length of times is in the range [1, 6000]. All edges times[i] = (u, v, w) have 1 <= u, v <= N and 0 <= w <= 100.
This is a typical graph problem, and for this problem, how do we construct a graph with an adjacency matrix?
Typical graph-building code:
Construct the graph as an adjacency list using a hash table:
graph = collections.defaultdict(list)
for fr, to, w in times:
graph[fr - 1].append((to - 1, w))
Construct adjacency matrix using two-dimensional array:
graph = [[0] * n for _ in range(n)]  # create an n * n two-dimensional matrix
for fr, to, w in times:
graph[fr-1][to-1] = w
This creates an adjacency matrix, and we can then traverse the graph based on it.
The Adjacency List
For each point, a linked list is stored that points to all points directly connected to that point. For weights, the values of the elements in the list correspond to weights.
For example, in an undirected and unweighted graph:
(Image from zhuanlan.zhihu.com/p/25498681)
You can see that in an undirected graph, the adjacency matrix is symmetric about the diagonal, and the adjacency list always has two symmetric edges.
In a directed and unweighted graph:
(image from zhuanlan.zhihu.com/p/25498681)
Because adjacency lists are somewhat more cumbersome to use, they are also less commonly used. To reduce the cognitive burden on beginners, I won't post the code.
Graph traversal
Now that the graph is set up, it’s time to iterate.
No matter what the algorithm is, you will have to traverse the graph, and there are usually two ways to do it: depth-first search and breadth-first search. (Other exotic traversal orders rarely matter and don't need to be learned.)
Regardless of the traversal type, if the graph has cycles, it is essential to record which nodes have been visited to avoid looping forever. Of course, you may not need an actual set to record visits; for example, you can mark data in place with a value outside the data range, which costs only O(1) extra space.
A directed graph is used as the example here; undirected graphs are similar and will not be described separately.
Graph search will be covered in detail in the following topics, so I will stop here.
Depth First Search, DFS
Depth-first traversal of a graph starts from some vertex v in the graph and keeps visiting a neighbor, then that neighbor's neighbor, going as deep as possible until the traversal is complete.
As shown in the figure above, if we use DFS and start from node A, one possible access order is: A -> C -> B -> D -> F -> G -> E, and of course A -> D -> C -> B -> F -> G -> E, depending on your code, but they are depth-first.
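A minimal recursive DFS sketch in the adjacency-list style used later in this article (the graph below is made up for illustration; the `visited` set guards against cycles, as discussed above):

```python
def dfs(graph, start, visited=None):
    # returns the vertices in one possible depth-first visiting order
    if visited is None:
        visited = set()
    visited.add(start)
    order = [start]
    for neighbor in graph.get(start, []):
        if neighbor not in visited:
            order += dfs(graph, neighbor, visited)
    return order

graph = {"A": ["C", "D"], "C": ["B"], "B": [], "D": []}
print(dfs(graph, "A"))  # ['A', 'C', 'B', 'D']
```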
Breadth First Search, BFS
Breadth-first search can be pictured as spreading outward layer by layer, like ripples. It requires a queue to maintain the order of traversed vertices, so that their adjacent vertices can be visited in queue order.
As shown above, if we use BFS and start from node A, one possible access order is: A -> B -> C -> F -> E -> G -> D and so on, depending on your code, but they are breadth-first.
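And a minimal BFS sketch using a queue, as described above (the graph is made up for illustration):

```python
from collections import deque

def bfs(graph, start):
    # returns the vertices in breadth-first order, using a queue
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph.get(u, []):
            if v not in visited:
                visited.add(v)
                queue.append(v)
    return order

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs(graph, "A"))  # ['A', 'B', 'C', 'D']
```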
It is important to note that DFS and BFS are algorithmic ideas, not specific algorithms. They therefore adapt very broadly rather than being tied to one data structure: they apply to the graphs described in this article as well as to the trees mentioned earlier. In fact, they can be used on any nonlinear data structure.
Common algorithms
Graph algorithm problems lend themselves well to templates.
Here are some common template problems. They mainly include:
- Dijkstra
- Floyd-Warshall
- Minimum spanning tree (Kruskal & Prim). (This section has been removed for now; I feel it was not written in enough detail, and it will be reopened once the supplement is complete.)
- A star pathfinding algorithm
- Bipartite graph (coloring method)
- Topological Sort
Examples of common algorithms are listed below.
All of the following templates are based on the graph representations described above.
It is highly recommended that you study the following classical algorithms after finishing the search topic. Take a few common search problems as a test, and if you can solve them, move on. Recommended problem: Maximize path value in a graph.
Shortest distance, shortest path
Dijkstra algorithm
Dijkstra's basic idea is breadth-first traversal. In fact, the basic idea of most shortest-path algorithms is breadth-first search; only the specific expansion strategies differ.
Dijkstra's algorithm mainly solves the shortest distance from one point in the graph to every other point, that is, the single-source shortest path.
Dijkstra is a little hard to remember, but you can just call it the DJ algorithm, is it a lot easier to remember?
For example, give you several cities and the distance between them. Let you plan the shortest route from city A to city B.
For this problem, we can first graph the distance between cities, and then use Dijkstra to do it. So how exactly does Dijkstra calculate shortest paths?
The basic idea of the DJ algorithm is greedy. Starting at start, each time we traverse all neighbors and pick the one with the smallest distance, which is essentially a breadth-first traversal. We use a heap so that finding the lowest-cost vertex takes O(logN) time.
If you use a plain queue instead, it only works in the special case where all edge weights in the graph are equal.
Let’s say we want to find the shortest distance from start to end. This is how we expect the DJ algorithm to be used.
For example, a graph looks like this:
E -- 1 --> B -- 1 --> C -- 1 --> D -- 1 --> F
 \                                          ^
  \                                         |
   ------- 2 --------> G ------- 1 ---------
We construct it as an adjacency list (a hash table mapping each vertex to its outgoing edges):
G = {
    "B": [["C", 1]],
    "C": [["D", 1]],
    "D": [["F", 1]],
    "E": [["B", 1], ["G", 2]],
    "F": [],
    "G": [["F", 1]],
}

shortDistance = dijkstra(G, "E", "C")
print(shortDistance)  # 2 (E -> B -> C)
Specific algorithm:
- Initialize the heap. Each item in the heap is a tuple (cost, v), meaning "the distance from start to v is cost". So initially, the heap holds the tuple (0, start).
- Pop a tuple (cost, v) from the heap; the first pop is necessarily (0, start). If v has already been visited, skip it to prevent cycles.
- If v is the end point we are looking for, just return cost, which is the shortest distance from start to that point.
- Otherwise, push v's neighbors into the heap, i.e. push (cost + c, neighbor), where neighbor is a neighbor of v and c is the distance from v to neighbor (the cost of the transfer).
Repeat steps 2-4.
Code template:
Python
import heapq


def dijkstra(graph, start, end):
    # each item in the heap is a tuple (cost, v): "the distance from start to v is cost"
    heap = [(0, start)]
    visited = set()
    while heap:
        (cost, u) = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == end:
            return cost
        for v, c in graph[u]:
            if v in visited:
                continue
            next = cost + c
            heapq.heappush(heap, (next, v))
    return -1
JavaScript
const dijkstra = (graph, start, end) => {
  const visited = new Set();
  const minHeap = new MinPriorityQueue();
  // new MinPriorityQueue() uses LeetCode's built-in API. Its enqueue takes two parts:
  // element and priority.
  // The heap is sorted by priority, and element can be used to record extra information.
  minHeap.enqueue(start, 0);
  while (!minHeap.isEmpty()) {
    const { element, priority } = minHeap.dequeue();
    // the following two variables are not required, just easier to understand
    const curPoint = element;
    const curCost = priority;
    if (curPoint === end) return curCost;
    if (visited.has(curPoint)) continue;
    visited.add(curPoint);
    if (!graph[curPoint]) continue;
    for (const [nextPoint, nextCost] of graph[curPoint]) {
      if (visited.has(nextPoint)) continue;
      // note: the distance kept in the heap must always be from start to some point;
      // curPoint to nextPoint costs nextCost, but curPoint is not necessarily start.
      const accumulatedCost = nextCost + curCost;
      minHeap.enqueue(nextPoint, accumulatedCost);
    }
  }
  return -1;
};
With this algorithm template, you can go and AC LeetCode 743. Network Delay Time.
Here is the complete code for your reference:
Python
class Solution:
    def dijkstra(self, graph, start, end):
        heap = [(0, start)]
        visited = set()
        while heap:
            (cost, u) = heapq.heappop(heap)
            if u in visited:
                continue
            visited.add(u)
            if u == end:
                return cost
            for v, c in graph[u]:
                if v in visited:
                    continue
                next = cost + c
                heapq.heappush(heap, (next, v))
        return -1

    def networkDelayTime(self, times: List[List[int]], N: int, K: int) -> int:
        graph = collections.defaultdict(list)
        for fr, to, w in times:
            graph[fr - 1].append((to - 1, w))
        ans = -1
        for to in range(N):
            dist = self.dijkstra(graph, K - 1, to)
            if dist == -1:
                return -1
            ans = max(ans, dist)
        return ans
JavaScript
const networkDelayTime = (times, n, k) => {
  // ahem, Dijkstra is not the best solution for this problem
  const graph = {};
  for (const [from, to, weight] of times) {
    if (!graph[from]) graph[from] = [];
    graph[from].push([to, weight]);
  }

  let ans = -1;
  for (let to = 1; to <= n; to++) {
    let dist = dijkstra(graph, k, to);
    if (dist === -1) return -1;
    ans = Math.max(ans, dist);
  }
  return ans;
};
const dijkstra = (graph, startPoint, endPoint) => {
  const visited = new Set();
  const minHeap = new MinPriorityQueue();
  // new MinPriorityQueue() uses LeetCode's built-in API. Its enqueue takes two parts:
  // element and priority.
  // The heap is sorted by priority, and element can be used to record extra information.
  minHeap.enqueue(startPoint, 0);
  while (!minHeap.isEmpty()) {
    const { element, priority } = minHeap.dequeue();
    // the following two variables are not required, just easier to understand
    const curPoint = element;
    const curCost = priority;
    if (visited.has(curPoint)) continue;
    visited.add(curPoint);
    if (curPoint === endPoint) return curCost;
    if (!graph[curPoint]) continue;
    for (const [nextPoint, nextCost] of graph[curPoint]) {
      if (visited.has(nextPoint)) continue;
      // note: the distance kept in the heap must always be from startPoint to some point;
      // curPoint to nextPoint costs nextCost, but curPoint is not necessarily startPoint.
      const accumulatedCost = nextCost + curCost;
      minHeap.enqueue(nextPoint, accumulatedCost);
    }
  }
  return -1;
};
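To sanity-check the template, here is a condensed, self-contained version of the same approach (the function names `network_delay_time` and `dijkstra` here are my own, not from LeetCode):

```python
import heapq
from collections import defaultdict

def dijkstra(graph, start, end):
    # standard heap-based Dijkstra, as in the template above
    heap = [(0, start)]
    visited = set()
    while heap:
        cost, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == end:
            return cost
        for v, c in graph[u]:
            heapq.heappush(heap, (cost + c, v))
    return -1

def network_delay_time(times, n, k):
    graph = defaultdict(list)
    for fr, to, w in times:
        graph[fr - 1].append((to - 1, w))
    ans = -1
    for to in range(n):
        dist = dijkstra(graph, k - 1, to)
        if dist == -1:
            return -1
        ans = max(ans, dist)
    return ans

# sample input from the problem statement
print(network_delay_time([[2, 1, 1], [2, 3, 1], [3, 4, 1]], 4, 2))  # 2
```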
The time complexity of the DJ algorithm is O(VlogV + E), where V and E are the numbers of vertices and edges in the graph, respectively.
So one last thought for you: What if you calculate the distance from one point to all of the points on the graph? What adjustments will be made to our algorithm?
Tip: you can do this with a dist hash table that records the shortest distance from the starting point to each point. If you figure it out, you can verify it with LeetCode 882.
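One possible answer to the thought question, as a hedged sketch: instead of returning at `end`, keep popping until the heap is empty and record each vertex's first (smallest) popped cost in a `dist` table (the graph below is the example from earlier):

```python
import heapq

def dijkstra_all(graph, start):
    # dist[v] = shortest distance from start to v, for every reachable v
    dist = {}
    heap = [(0, start)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u in dist:  # already settled with a smaller (or equal) cost
            continue
        dist[u] = cost
        for v, c in graph.get(u, []):
            if v not in dist:
                heapq.heappush(heap, (cost + c, v))
    return dist

graph = {"E": [("B", 1), ("G", 2)], "B": [("C", 1)], "C": [("D", 1)],
         "D": [("F", 1)], "G": [("F", 1)], "F": []}
print(dijkstra_all(graph, "E"))
# {'E': 0, 'B': 1, 'C': 2, 'G': 2, 'D': 3, 'F': 3}
```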
It is worth noting that Dijkstra cannot handle negative edge weights: if there are edges with negative weights, the answer may be incorrect. Shortest-path algorithms based on dynamic programming (described below) can handle this case.
Floyd-Warshall algorithm
Floyd-Warshall solves the distance between every pair of points, i.e. the multi-source shortest path, which differs from the DJ algorithm.
In addition, the Bellman-Ford algorithm is also a classical dynamic-programming algorithm for shortest paths, again different from DJ, which is based on greed.
Compared with the Dijkstra algorithm above, it is especially suitable for computing the distance between any two points in the graph, as in LeetCode 1462. Course Schedule IV, because the computation saves intermediate results and prevents repeated work. Apart from that, the biggest difference from the Bellman-Ford algorithm described below is that this algorithm is a multi-source shortest path, while Bellman-Ford is a single-source shortest path. The Bellman-Ford algorithm is also simpler in terms of complexity and notation, as we will see later.
This is not to say that Bellman-Ford and Dijkstra do not support multi-source shortest paths; you just need to add a for loop to enumerate all starting points.
Another important point is that the Floyd-Warshall algorithm can handle negative weights because it uses the idea of dynamic programming rather than greed; this needs special attention. For more details on dynamic programming, see the dynamic programming and knapsack topics later.
The algorithm is also easy to understand. In simple terms: the shortest path from i to j is the minimum over all k of (the shortest path from i to k + the shortest path from k to j). As the figure below shows:
The shortest distance from u to v is the shortest distance from u to x plus the shortest distance from x to v, where x is an intermediate point on a path from u to v. If one intermediate point is not enough, we allow multiple intermediate nodes and take the smallest result.
The correctness of the algorithm is self-evident: to get from i to j, we either go directly or pass through some other point k in the graph. There may be several candidate intermediate points k, and taking the smallest result over all of them naturally yields the shortest distance from i to j.
Question: Can the longest acyclic path be solved by dynamic programming?
The time complexity of the algorithm is O(N^3) and the space complexity is O(N^2), where N is the number of vertices.
Code template:
Python
# graph is the adjacency matrix and n is the number of vertices
# graph[u][v] = w means there is an edge from u to v with weight w
def floyd_warshall(graph, n):
    dist = [[float("inf") for _ in range(n)] for _ in range(n)]

    for i in range(n):
        for j in range(n):
            dist[i][j] = graph[i][j]

    # check vertex k against all other vertex pairs (i, j)
    for k in range(n):
        # looping through rows of the graph array
        for i in range(n):
            # looping through columns of the graph array
            for j in range(n):
                if (
                    dist[i][k] != float("inf")
                    and dist[k][j] != float("inf")
                    and dist[i][k] + dist[k][j] < dist[i][j]
                ):
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
JavaScript
const floydWarshall = (graph, v) => {
  const dist = new Array(v).fill(0).map(() => new Array(v).fill(Number.MAX_SAFE_INTEGER));

  for (let i = 0; i < v; i++) {
    for (let j = 0; j < v; j++) {
      // the distance from a point to itself is 0
      if (i === j) dist[i][j] = 0;
      // the distance between i and j is known
      else if (graph[i][j]) dist[i][j] = graph[i][j];
      // the distance between i and j is unknown
      else dist[i][j] = Number.MAX_SAFE_INTEGER;
    }
  }

  // check whether some point k makes the distance between i and j shorter; if so, update
  for (let k = 0; k < v; k++) {
    for (let i = 0; i < v; i++) {
      for (let j = 0; j < v; j++) {
        dist[i][j] = Math.min(dist[i][j], dist[i][k] + dist[k][j]);
      }
    }
  }
  return dist;
};
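As a quick sanity check, here is a self-contained sketch of the same idea on a tiny made-up graph (the function is restated compactly so the snippet runs on its own):

```python
INF = float("inf")

def floyd_warshall(graph, n):
    # same dynamic-programming idea as the template above
    dist = [row[:] for row in graph]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# 0 -> 1 costs 4, 1 -> 2 costs 1, and the direct edge 0 -> 2 costs 100
graph = [
    [0, 4, 100],
    [INF, 0, 1],
    [INF, INF, 0],
]
print(floyd_warshall(graph, 3)[0][2])  # 5, going through vertex 1
```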
Let's go back and see how to adapt the template to solve LeetCode 1462. Course Schedule IV. Problem description:
You need to take a total of n courses, numbered from 0 to n-1. Some courses have direct prerequisites; for example, if you want to take course 0 you must first take course 1, which is given as the pair [1, 0]. You are given the total number of courses n, a list of direct prerequisite pairs, and a list of queries. For each queries[i], determine whether queries[i][0] is a prerequisite of queries[i][1]. Return a list of booleans, one per query. Note: if course a is a prerequisite of course b, and course b is a prerequisite of course c, then course a is also a prerequisite of course c.

Example 1: input: n = 2, prerequisites = [[1,0]], queries = [[0,1],[1,0]]; output: [false,true]. Course 0 is not a prerequisite of course 1, but course 1 is a prerequisite of course 0.
Example 2: input: n = 2, prerequisites = [], queries = [[1,0],[0,1]]; output: [false,false].
Example 3: input: n = 3, prerequisites = [[1,2],[1,0],[2,0]], queries = [[1,0],[1,2]]; output: [true,true].
Example 4: input: n = 3, prerequisites = [[1,0],[2,0]], queries = [[0,1],[2,0]]; output: [false,true].
Example 5: input: n = 5, prerequisites = [[0,1],[1,2],[2,3],[3,4]], queries = [[0,4],[4,0],[1,3],[3,0]]; output: [true,false,true,false].

Constraints: 2 <= n <= 100; 0 <= prerequisites.length <= (n * (n - 1) / 2); 0 <= prerequisites[i][0], prerequisites[i][1] < n; prerequisites[i][0] != prerequisites[i][1]; the prerequisite graph has no cycles and no duplicate edges; 1 <= queries.length <= 10^4; queries[i][0] != queries[i][1].
This problem can also be solved with Floyd-Warshall. You can think of it this way: if the distance from i to j is greater than 0, then i is a prerequisite of j. The number of queries is about 10^4, so running the Dijkstra algorithm above per query would certainly time out; Floyd-Warshall is the wise choice here.
We just adapt the template slightly. Full code: Python
class Solution:
    def floyd_warshall(self, dist, v):
        for k in range(v):
            for i in range(v):
                for j in range(v):
                    dist[i][j] = dist[i][j] or (dist[i][k] and dist[k][j])
        return dist

    def checkIfPrerequisite(self, n: int, prerequisites: List[List[int]], queries: List[List[int]]) -> List[bool]:
        graph = [[False] * n for _ in range(n)]
        ans = []

        for to, fr in prerequisites:
            graph[fr][to] = True
        dist = self.floyd_warshall(graph, n)

        for to, fr in queries:
            ans.append(bool(dist[fr][to]))
        return ans
JavaScript
// this is not the best way
var checkIfPrerequisite = function (numCourses, prerequisites, queries) {
  const graph = {};
  for (const [course, pre] of prerequisites) {
    if (!graph[pre]) graph[pre] = {};
    graph[pre][course] = true;
  }
  const ans = [];
  const dist = floydWarshall(graph, numCourses);
  for (const [course, pre] of queries) {
    ans.push(dist[pre][course]);
  }
  return ans;
};

var floydWarshall = function (graph, n) {
  const dist = Array.from({ length: n + 1 }).map(() =>
    Array.from({ length: n + 1 }).fill(false)
  );
  for (let k = 0; k < n; k++) {
    for (let i = 0; i < n; i++) {
      for (let j = 0; j < n; j++) {
        if (graph[i] && graph[i][j]) dist[i][j] = true;
        dist[i][j] = dist[i][j] || (dist[i][k] && dist[k][j]);
      }
    }
  }
  return dist;
};
If you can solve this problem, I also recommend LeetCode 1617. Count Subtrees With Max Distance Between Cities. The international site has a quite clear and understandable solution, though it does not use state compression, so its performance is not great. Address: leetcode.com/problems/co…
You can also practice dynamic programming on graphs with this problem:
- 787. Cheapest Flights Within K Stops
Bellman-Ford algorithm
It is similar to the algorithm above. This method mainly solves the single-source shortest path, that is, the shortest distance from one point in the graph to every other point.
The basic idea is also dynamic programming.
The core algorithm is:
- Initialize the distance of the start point to 0.
- Process all edges in the graph several passes until stable. The rule for each pass: for every directed edge (u, v) with weight w, if dist[u] + w is less than dist[v], we have found a closer way to reach v, so update it.
- The number of passes needed is at most the number of vertices V, so we might as well simply run n passes.
- Finally, check whether a negative-weight cycle exists: if any edge can still be relaxed after the passes, report it.
Let me give you an example. In the graph below there exists a cycle B -> C -> D -> B such that the distances to B, C, and D can theoretically be made arbitrarily small. We need to detect this situation and exit.
The algorithm's time complexity is O(V * E) and its space complexity is O(V).
Code example: Python
# return -1 if a negative cycle exists
# else return dis, where dis[v] is the least cost from point s to point v
def bell_man(edges, s, n):
    # n is the number of vertices
    dis = defaultdict(lambda: math.inf)
    dis[s] = 0
    for _ in range(n):
        for u, v, w in edges:
            if dis[u] + w < dis[v]:
                dis[v] = dis[u] + w
    # one more pass: if any edge can still be relaxed, there is a negative cycle
    for u, v, w in edges:
        if dis[u] + w < dis[v]:
            return -1
    return dis
JavaScript
const BellmanFord = (edges, startPoint, n) => {
  // n is the number of vertices
  const dist = new Array(n).fill(Number.MAX_SAFE_INTEGER);
  dist[startPoint] = 0;
  for (let i = 0; i < n; i++) {
    for (const [u, v, w] of edges) {
      if (dist[u] + w < dist[v]) {
        dist[v] = dist[u] + w;
      }
    }
  }
  for (const [u, v, w] of edges) {
    if (dist[u] + w < dist[v]) return -1;
  }
  return dist;
};
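A quick, self-contained check of the template with a negative edge and then a negative cycle (the sample edges are made up for illustration, and the vertex count `n` is passed explicitly here):

```python
import math
from collections import defaultdict

def bell_man(edges, s, n):
    # n is the number of vertices; restated from the template above
    dis = defaultdict(lambda: math.inf)
    dis[s] = 0
    for _ in range(n):
        for u, v, w in edges:
            if dis[u] + w < dis[v]:
                dis[v] = dis[u] + w
    # if any edge can still be relaxed, a negative cycle is reachable
    for u, v, w in edges:
        if dis[u] + w < dis[v]:
            return -1
    return dis

# a negative edge but no negative cycle: 0 -> 1 costs 4, 1 -> 2 costs -2
edges = [(0, 1, 4), (1, 2, -2)]
print(bell_man(edges, 0, 3)[2])  # 2

# adding 2 -> 1 with weight -1 creates the negative cycle 1 -> 2 -> 1
print(bell_man(edges + [(2, 1, -1)], 0, 3))  # -1
```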
Recommended reading:
- bellman-ford-algorithm
Recommend:
- Best Currency Path
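The currency recommendation connects back to the exchange-rate example from earlier: multiplying rates along a path is the same as summing -log(rate), so a profitable cycle (product of rates greater than 1) becomes a negative cycle that Bellman-Ford can detect. A hedged sketch with made-up rates (`has_arbitrage` is my own helper, not taken from the recommended problem):

```python
import math

def has_arbitrage(rates):
    # rates: list of (from_currency, to_currency, rate); an arbitrage exists
    # iff the graph with edge weight -log(rate) has a negative cycle
    vertices = {c for fr, to, _ in rates for c in (fr, to)}
    edges = [(fr, to, -math.log(r)) for fr, to, r in rates]
    dist = {v: 0.0 for v in vertices}  # start all at 0 to catch any cycle
    for _ in range(len(vertices)):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # still relaxable after |V| passes => negative cycle => arbitrage
    return any(dist[u] + w < dist[v] - 1e-9 for u, v, w in edges)

# A -> B -> C -> A multiplies to 1.1 * 1.1 * 1.1 > 1: arbitrage
print(has_arbitrage([("A", "B", 1.1), ("B", "C", 1.1), ("C", "A", 1.1)]))  # True
print(has_arbitrage([("A", "B", 2.0), ("B", "A", 0.5)]))                   # False
```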
Topological sort
In computer science, a topological sort of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering. A topological sort is possible if and only if the graph has no directed cycles, that is, it is a directed acyclic graph.
A typical question gives you a bunch of courses with prerequisite relationships between them and asks you to produce a feasible learning order, taking prerequisite courses first. Any directed acyclic graph has at least one topological sort, and there are algorithms that construct a topological sort of any directed acyclic graph in linear time.
Kahn algorithm
In simple terms: let L be the result list. First find the nodes with zero in-degree and put them into L, because they have no parents. Then remove the edges leaving these nodes from the graph, and again look for nodes with zero in-degree. For each newly found zero-in-degree node, its parents are already in L, so it can be appended to L as well. Repeat until no zero-in-degree node can be found. If the number of elements in L equals the total number of nodes, the sort is complete; if not, the original graph contains a cycle and no topological sort exists.
def topologicalSort(graph):
    """
    Kahn's Algorithm is used to find Topological ordering of
    Directed Acyclic Graph using BFS
    """
    indegree = [0] * len(graph)
    queue = collections.deque()
    topo = []
    cnt = 0

    for key, values in graph.items():
        for i in values:
            indegree[i] += 1

    for i in range(len(indegree)):
        if indegree[i] == 0:
            queue.append(i)

    while queue:
        vertex = queue.popleft()
        cnt += 1
        topo.append(vertex)
        for x in graph[vertex]:
            indegree[x] -= 1
            if indegree[x] == 0:
                queue.append(x)

    if cnt != len(graph):
        print("Cycle exists")
    else:
        print(topo)


# Adjacency List of Graph
graph = {0: [1, 2], 1: [3], 2: [3], 3: [4, 5], 4: [], 5: []}
topologicalSort(graph)
Minimum spanning tree
First let’s look at what a spanning tree is.
First, a spanning tree is a subgraph of the original graph. It is essentially a tree, which is why it is called a spanning tree rather than a spanning graph. Second, the spanning tree must include all vertices of the graph. The figure below is not a spanning tree because it does not contain all vertices; in other words, not all vertices are in the same connected component.
Yellow vertices are not included
You can think of a spanning tree as a multi-way tree with an indeterminate root node; since it is a tree, it must contain no cycles. The following figure is therefore not a spanning tree.
Therefore, it is not difficult to see that the number of edges of a spanning tree is n-1, where n is the number of vertices.
Now let’s look at what a minimum spanning tree is.
Minimum spanning tree (MST) is short for minimum-weight spanning tree, adding the keyword "minimum" to "spanning tree". From this name it is also clear that minimum spanning trees deal with weighted graphs. The weight of a spanning tree is the sum of the weights of all its edges, and the minimum spanning tree is the one with the smallest such sum. It follows that neither spanning trees nor minimum spanning trees are necessarily unique.
Minimum spanning trees have great practical value. For example, if I want to build a subway covering n stations such that all stations are mutually reachable (in the same connected component), how can the construction cost be minimized? Since the routes between stations differ and so do their costs, this is a real-life minimum spanning tree scenario, and there are many similar examples.
(Image from Wikipedia)
It is not difficult to see that computing the minimum spanning tree means selecting n-1 edges from the edge set that form a spanning tree with the minimum sum of weights.
Kruskal and Prim are two classical algorithms for minimum spanning trees. How do these two algorithms calculate minimum spanning trees? Let’s take a look at them in this section.
Kruskal
Kruskal is relatively easy to understand and recommended to master.
The Kruskal algorithm, also vividly called the edge-adding method, selects the edge with the least weight at each step and adds it to the result set. To prevent creating cycles (adding a cycle is pointless: as long as weights are positive, it can only make the result worse), we need to check whether the two endpoints of the candidate edge are already connected; if they are, adding the edge would close a cycle, so we skip it. Algorithmically, we can use union-find (disjoint set) to assist with this check. We'll cover union-find in a later chapter.
The find method in the code below is actually the core of union-find; here we use it directly rather than introducing it formally.
Kruskal’s specific algorithm:
- Sort the edges in ascending order of weight.
- Initialize the n vertices as n separate connected components.
- Select edges in ascending order of weight, always taking the smallest remaining edge. If the endpoints of the current edge are already connected (adding it would form a cycle), discard it; otherwise add it to the result set.
- Repeat step 3 until the result set contains n - 1 edges, i.e., it spans all n vertices.
Code template:
edges is an array, and each item has the form (cost, fr, to), meaning there is an edge of weight cost from fr to to.
class DisjointSetUnion:
    def __init__(self, n):
        self.n = n
        self.rank = [1] * n
        self.f = list(range(n))

    def find(self, x: int) -> int:
        if self.f[x] == x:
            return x
        self.f[x] = self.find(self.f[x])  # path compression
        return self.f[x]

    def unionSet(self, x: int, y: int) -> bool:
        fx, fy = self.find(x), self.find(y)
        if fx == fy:
            return False
        if self.rank[fx] < self.rank[fy]:
            fx, fy = fy, fx
        self.rank[fx] += self.rank[fy]
        self.f[fy] = fx
        return True


class Solution:
    def Kruskal(self, n, edges) -> int:
        # n is the number of vertices; edges are (cost, fr, to) triples
        dsu = DisjointSetUnion(n)
        edges.sort()
        ret, num = 0, 1  # num counts the vertices already in the tree
        for length, x, y in edges:
            if dsu.unionSet(x, y):
                ret += length
                num += 1
                if num == n:
                    break
        return ret
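The template above can be sanity-checked with a tiny standalone version of the same idea. The edge list and weights below are invented for illustration; the union-find here uses path halving rather than the recursive compression of the template, but the result is the same.

```python
# Quick sanity check of Kruskal on a 4-vertex weighted graph.
# Edges are (cost, u, v); the MST should pick weights 1 + 2 + 3 = 6.
edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 1, 3)]
n = 4

parent = list(range(n))

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

total, picked = 0, 0
for cost, u, v in sorted(edges):
    ru, rv = find(u), find(v)
    if ru != rv:             # different components: adding this edge makes no cycle
        parent[ru] = rv
        total += cost
        picked += 1
        if picked == n - 1:  # a spanning tree has exactly n - 1 edges
            break

print(total)  # 6
```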
Prim
The Prim algorithm is also vividly known as the vertex-adding method: at each step it selects the vertex that is cheapest to connect to the tree and adds it to the result set. Visually, it looks like a real-world tree that keeps growing.
Prim specific algorithm:
- Initialize the minimum spanning tree vertex set MV with any single vertex of the graph, and the minimum spanning tree edge set ME as empty. Our goal is to grow MV until it equals V; the edge set ME is filled in automatically as MV grows.
- From the edge set E of the original graph, select the smallest edge (u, v) such that u is already in MV and v is not (like the growing real-world tree mentioned above); add v to MV and (u, v) to ME.
- Repeat step 2 until MV contains all n vertices.
Code template:
dist[i][j] = x indicates that there is an edge of weight x from vertex i to vertex j.
class Solution:
    def Prim(self, dist) -> int:
        n = len(dist)
        # d[i]: minimum distance from vertex i to the tree built so far
        d = [float("inf")] * n
        # vis[i]: whether vertex i has been added to the minimum spanning tree
        vis = [False] * n
        d[0] = 0
        ans = 0
        for _ in range(n):
            # find the minimum d of the current round
            M = float("inf")
            for i in range(n):
                if not vis[i] and d[i] < M:
                    node = i
                    M = d[i]
            vis[node] = True
            ans += M
            for i in range(n):
                if not vis[i]:
                    d[i] = min(d[i], dist[i][node])
        return ans
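Here is a minimal standalone run of the same procedure on a small adjacency matrix (the graph is invented for illustration, with inf marking missing edges); it should find the same MST weight as the Kruskal example.

```python
import math

# A 4-vertex undirected graph as an adjacency matrix; math.inf marks "no edge".
INF = math.inf
dist = [
    [0,   1,   INF, 4],
    [1,   0,   2,   5],
    [INF, 2,   0,   3],
    [4,   5,   3,   0],
]

n = len(dist)
d = [INF] * n          # cheapest known edge from the tree to each vertex
vis = [False] * n
d[0] = 0               # start growing the tree from vertex 0
ans = 0
for _ in range(n):
    # pick the unvisited vertex that is cheapest to attach to the tree
    node = min((i for i in range(n) if not vis[i]), key=lambda i: d[i])
    vis[node] = True
    ans += d[node]
    for i in range(n):
        if not vis[i]:
            d[i] = min(d[i], dist[node][i])

print(ans)  # 6
```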
Compare the two algorithms
For the sake of description, let V be the number of vertices in the graph and E be the number of edges. Kruskal's time complexity is O(E log E), and Prim's is O(E + V log V) with a Fibonacci heap (the adjacency-matrix version above runs in O(V^2)). Therefore, Prim is better suited to dense graphs while Kruskal is better suited to sparse graphs.
You can also refer to wikipedia – Minimum spanning tree as a supplement.
Here is another video learning resource with well-made animations that you can use as a reference: www.bilibili.com/video/BV1Eb…
You can practice with LeetCode's 1584. Min Cost to Connect All Points.
Other algorithms
A* pathfinding algorithm
The A* pathfinding problem is to find the shortest distance or shortest path between any two points on a two-dimensional grid. It is a common heuristic algorithm, used for example to compute NPC movement in games. These problems always involve obstacles; in addition, LeetCode problems often add extra restrictions that make them harder.
Such problems are usually rated hard on LeetCode. They are easy to understand, but not so easy to implement completely bug-free.
In this algorithm, we start at the starting point, examine its four adjacent squares, and keep expanding until we find the target. The A* algorithm has more than one variant; interested readers can explore them on their own.
The formula is: f(n) = g(n) + h(n).
Among them:
- f(n) is the estimated cost from the initial state to the target state via state n,
- g(n) is the actual cost from the initial state to state n in the state space,
- h(n) is the estimated cost of the best path from state n to the target state.
If g(n) is 0, that is, we only compute the evaluation function h(n) from each vertex n to the target and ignore the distance from the start to n, the algorithm degenerates into best-first search with a greedy strategy. This is the fastest, but may not find the optimal solution. If h(n) never exceeds the actual distance from vertex n to the target, the optimal solution can be found; the smaller h(n) is, the more nodes need to be examined and the less efficient the algorithm. Common evaluation functions include the Euclidean distance, Manhattan distance, and Chebyshev distance. If h(n) is 0, that is, we only compute the shortest path g(n) from the start to each vertex n with no evaluation function at all, the problem degenerates into the single-source shortest path problem, i.e., Dijkstra's algorithm, which examines the most vertices.
An important concept here is the evaluation function. Usually we use the Manhattan distance for the estimate, i.e., h(n) = D * (abs(n.x - goal.x) + abs(n.y - goal.y)).
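The three distance functions mentioned above can be sketched as follows. The function names and the D = 1 scaling factor are illustrative; whichever you pick, it should never overestimate the true cost under your movement rules, or optimality is lost.

```python
import math

# Three common evaluation functions for grid pathfinding.
# Points are (row, col) tuples; D scales the per-step cost.
def manhattan(n, goal, D=1):
    return D * (abs(n[0] - goal[0]) + abs(n[1] - goal[1]))

def chebyshev(n, goal, D=1):
    return D * max(abs(n[0] - goal[0]), abs(n[1] - goal[1]))

def euclidean(n, goal, D=1):
    return D * math.hypot(n[0] - goal[0], n[1] - goal[1])

print(manhattan((0, 0), (3, 4)))  # 7 (4-directional movement)
print(chebyshev((0, 0), (3, 4)))  # 4 (8-directional movement)
print(euclidean((0, 0), (3, 4)))  # 5.0 (free movement)
```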
(Image from Wikipedia zh.wikipedia.org/wiki/A*%E6%…)
A complete code template:
grid = [
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],  # 0 are free path whereas 1's are obstacles
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
]

init = [0, 0]
goal = [len(grid) - 1, len(grid[0]) - 1]  # all coordinates are given in format [y, x]
cost = 1

# the cost map which pushes the path closer to the goal
heuristic = [[0 for row in range(len(grid[0]))] for col in range(len(grid))]
for i in range(len(grid)):
    for j in range(len(grid[0])):
        heuristic[i][j] = abs(i - goal[0]) + abs(j - goal[1])
        if grid[i][j] == 1:
            heuristic[i][j] = 99  # added extra penalty in the heuristic map

# the actions we can take
delta = [[-1, 0], [0, -1], [1, 0], [0, 1]]  # go up, go left, go down, go right

# function to search the path
def search(grid, init, goal, cost, heuristic):
    closed = [
        [0 for col in range(len(grid[0]))] for row in range(len(grid))
    ]  # the reference grid
    closed[init[0]][init[1]] = 1
    action = [
        [0 for col in range(len(grid[0]))] for row in range(len(grid))
    ]  # the action grid

    x = init[0]
    y = init[1]
    g = 0
    f = g + heuristic[init[0]][init[1]]
    cell = [[f, g, x, y]]

    found = False  # flag that is set when search is complete
    resign = False  # flag set if we can't expand

    while not found and not resign:
        if len(cell) == 0:
            return "FAIL"
        else:  # choose the least costly action so as to move closer to the goal
            cell.sort()
            cell.reverse()
            nxt = cell.pop()
            x = nxt[2]
            y = nxt[3]
            g = nxt[1]

            if x == goal[0] and y == goal[1]:
                found = True
            else:
                for i in range(len(delta)):  # try out different valid actions
                    x2 = x + delta[i][0]
                    y2 = y + delta[i][1]
                    if 0 <= x2 < len(grid) and 0 <= y2 < len(grid[0]):
                        if closed[x2][y2] == 0 and grid[x2][y2] == 0:
                            g2 = g + cost
                            f2 = g2 + heuristic[x2][y2]
                            cell.append([f2, g2, x2, y2])
                            closed[x2][y2] = 1
                            action[x2][y2] = i

    invpath = []
    x = goal[0]
    y = goal[1]
    invpath.append([x, y])  # we get the reverse path from here
    while x != init[0] or y != init[1]:
        x2 = x - delta[action[x][y]][0]
        y2 = y - delta[action[x][y]][1]
        x = x2
        y = y2
        invpath.append([x, y])

    path = []
    for i in range(len(invpath)):
        path.append(invpath[len(invpath) - 1 - i])

    print("ACTION MAP")
    for i in range(len(action)):
        print(action[i])
    return path


a = search(grid, init, goal, cost, heuristic)
for i in range(len(a)):
    print(a[i])
Typical problem: 1263. Minimum Moves to Move a Box to Their Target Location (push boxes).
Bipartite graph
I introduced bipartite graphs in the solutions to these two problems; you just need to read them and do both. There is really no difference between the two questions.
- 0886. Possible Bipartition
- 0785. Is Graph Bipartite?
Recommended order: do 886 first and then 785.
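The core idea behind both problems is two-coloring. A minimal sketch, using BFS and LeetCode 785's input format (graph[i] lists the neighbors of vertex i):

```python
from collections import deque

# BFS two-coloring: a graph is bipartite iff every edge can connect
# vertices of opposite colors. An odd cycle forces a conflict.
def is_bipartite(graph):
    n = len(graph)
    color = [0] * n  # 0 = uncolored; 1 and -1 are the two colors
    for start in range(n):
        if color[start] != 0:
            continue  # this component is already colored
        color[start] = 1
        q = deque([start])
        while q:
            u = q.popleft()
            for v in graph[u]:
                if color[v] == 0:
                    color[v] = -color[u]  # give the neighbor the opposite color
                    q.append(v)
                elif color[v] == color[u]:  # same color on both ends: not bipartite
                    return False
    return True

print(is_bipartite([[1, 3], [0, 2], [1, 3], [0, 2]]))        # True (a 4-cycle)
print(is_bipartite([[1, 2, 3], [0, 2], [0, 1, 3], [0, 2]]))  # False (contains a triangle)
```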
conclusion
By understanding the common concepts of graphs, we have gotten started. Then we can move on to doing problems.
There are generally two kinds of graph problems: search problems and dynamic programming problems.
For search problems, we can:
- Step 1: always build the graph first
- Step 2: traverse the graph built in step 1 to find feasible solutions
If the problem states the graph is acyclic, we may be able to skip the visited array; otherwise most problems need one. Of course, you can also choose an in-place algorithm to reduce space complexity. Specific search techniques will be discussed in the search section of this series.
Graph problems are relatively difficult, especially when it comes to writing the code. But as far as interview questions are concerned, there are only a few graph question types.
- For search problems, many can be solved with templates. Therefore, it is recommended that you practice the templates and make sure you can write them yourself.
- For dynamic programming problems, a classic example is the Floyd-Warshall algorithm. After you understand it, you can practice with 787. Cheapest Flights Within K Stops. Of course, this requires you to learn dynamic programming first; we'll discuss it in depth later in Dynamic Programming and Knapsack Problems.
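For reference, the Floyd-Warshall recurrence fits in a few lines: dp[i][j] is the shortest distance from i to j using only the first k vertices as intermediate stops. The adjacency matrix below is invented for illustration.

```python
import math

# Minimal Floyd-Warshall sketch: after the triple loop, dp[i][j] holds
# the shortest distance from i to j. math.inf marks "no edge".
INF = math.inf
g = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]
n = len(g)
dp = [row[:] for row in g]
for k in range(n):          # allow vertex k as an intermediate stop
    for i in range(n):
        for j in range(n):
            dp[i][j] = min(dp[i][j], dp[i][k] + dp[k][j])

print(dp[1][3])  # 3  (1 -> 2 -> 3)
print(dp[3][2])  # 7  (3 -> 0 -> 1 -> 2)
```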
Common graph questions on LeetCode fall into the following categories:
- Shortest path. Algorithms include Dijkstra, Floyd-Warshall, and Bellman-Ford. Some are single-source, some are multi-source; some are greedy, some are dynamic programming.
- Topological sort. Topological sorting can be done with either BFS or DFS. Compared with shortest paths, these problems are on the easier side once you know the idea.
- Minimum spanning tree. Minimum spanning trees appear least frequently of the three types and can be tackled last.
- A* pathfinding and bipartite graph problems make up a very small proportion; you can decide whether to master them based on your own situation.