What is topological sort? How does it work?

After reading this article, you can try to solve the following problems:

207. Curriculum schedule (Medium)

210. Curriculum II (Medium)

Many readers said that they want to see the “graph” related algorithm, that is to meet you, combined with the algorithm to give you the graph related skills again.

As mentioned in the previous article, data structure related algorithms are no more than two points: traversal + access. So the basic traversal method of graph is also very simple, the graph algorithm in the preceding text on the basis of how to extend from the traversal framework of multi-fork tree to graph traversal.

There are some special algorithms for graph data structure, such as dichotomy graph judgment, looped graph acyclic graph judgment, topological sorting, and the most classical minimum spanning tree, single-source shortest path problem, and even more difficult problems like network flow.

But in my experience, with things like network streaming, you’re not in a competition, so unless you’re really interested, you don’t need to learn it; For example, the minimum spanning tree and the shortest path problem, although from the point of view of brush problems, but they belong to the classical algorithm, learn to have the power to master; Topological sorting, for example, is one of the more basic and useful algorithms that you should be familiar with.

** So this paper combined with the specific algorithm, to say the principle of topological sorting algorithm, ** because the object of topological sorting is directed acyclic graph, so incidentally talk about how to judge whether the graph has a ring.

Determine whether a directed graph has a ring

Let’s take a look at question 207 “Curriculum” :

The function signature is as follows:

int[] findOrder(int numCourses, int[][] prerequisites);

Copy the code

The questions should be easy to understand. When can I not complete all the courses? When there are cyclic dependencies.

In fact, this scenario is also very common in real life, for example, we write code import package is also an example, the code directory structure must be properly designed, otherwise there will be loop dependence, the compiler will report errors, so the compiler actually uses a similar algorithm to determine whether your code can compile successfully.

When you see a dependency problem, the first thing that comes to mind is to turn the problem into a data structure called a “directed graph.” As long as there are cycles in the graph, there are cyclic dependencies.

Specifically, we can first regard the course as a node in the “directed graph”, the node numbers are 0, 1… , numcourses-1, regard the dependence between courses as directed edges between nodes.

For example, if you have to take course 1 before you can take course 3, you have a directed edge that goes from node 1 to node 3.

So we can generate a diagram that looks like this from the Letter array that the problem inputs:

If there are rings in this directed graph, it means that there are cyclic dependencies between courses, and it is definitely impossible to finish all courses; On the other hand, if you don’t have a ring, you can definitely take the whole course.

Ok, so to solve this problem, first we need to convert the input from the problem into a directed graph, and then determine whether there are rings in the graph.

How do I convert that into a graph? We have written two storage forms of graph, adjacency matrix and adjacency list, based on graph theory.

In my experience, the common storage is to use the adjacency list, such as the following structure:

List<Integer>[] graph;

Copy the code

Graph [s] is a list of nodes pointed to by node S.

So we can first write a graph function:

List<Integer>[] buildGraph(int numCourses, List<Integer>[] graph = new LinkedList[numCourses]; for (int i = 0; i < numCourses; i++) { graph[i] = new LinkedList<>(); } for (int[] edge : prerequisites) { int from = edge[1]; int to = edge[0]; // Add a directed edge graph[from]. Add (to); } return graph; }Copy the code

So once you have a graph, how do you tell if there’s a ring in the graph?

So let’s not worry, but let’s think about how to iterate over this graph, and if we can iterate, we can tell if there’s a ring in the graph.

The framework of DFS algorithm traversal graph is written on the basis of graph theory above, which is nothing more than an extension from the multi-fork tree traversal framework, adding a Visited array:

// Prevent revisiting the same node Boolean [] visited; // Start BFS traversal with true void traverse(List<Integer>[] graph, int s) {if (visited[s]) {return; } // mark the current node as visited[s] = true; for (int t : graph[s]) { traverse(graph, t); } /* After traversing the code position */}Copy the code

Then we can simply apply this traversal code:

// Prevent revisiting the same node Boolean [] visited; boolean canFinish(int numCourses, int[][] prerequisites) { List<Integer>[] graph = buildGraph(numCourses, prerequisites); visited = new boolean[numCourses]; for (int i = 0; i < numCourses; i++) { traverse(graph, i); }} void traverse(List<Integer>[] graph, int s) {Copy the code

Note that not all nodes in the diagram are connected, so use a for loop to call the DFS search algorithm once for all nodes as a starting point.

So, you can iterate through all the nodes in this graph. If you print the Visited array, it should all be true.

As we said earlier in the framework thinking of data structures and algorithms, traversing a graph is similar to traversing a multi-fork tree, so it should be pretty easy for you to understand by now.

So how do you tell if there are rings in this picture?

As we explained earlier, you can think of a recursive function as a pointer to a recursion tree. Here’s the same thing:

You can also think of traverse as Pointers to traverse the nodes on the graph, adding a Boolean onPath to the current path of the traverse:

boolean[] onPath; boolean hasCycle = false; boolean[] visited; Void traverse(List<Integer>[] graph, int s) {if (onPath[s]) {// find loop!! hasCycle = true; } if (visited[s]) { return; } // Mark the node s as visited[s] = true; // start traversing the node s onPath[s] = true; for (int t : graph[s]) { traverse(graph, t); } // node s traversal completes onPath[s] = false; }Copy the code

OnPath [s] is marked as true when entering node S and false when leaving. If onPath[s] is already marked, a loop is present.

PS: Refer to the scene where the snake fails to get around the bend and bites itself.

In this way, you can determine whether there is a ring in the process of traversing the graph, the complete code is as follows:

// Record a traverse recursion. Boolean [] onPath; Boolean [] visited; Boolean hasCycle = false; boolean canFinish(int numCourses, int[][] prerequisites) { List<Integer>[] graph = buildGraph(numCourses, prerequisites); visited = new boolean[numCourses]; onPath = new boolean[numCourses]; for (int i = 0; i < numCourses; I ++) {// traverse all nodes of the graph (graph, I); } // Complete all classes as long as there are no loop dependencies hasCycle; } void traverse(List<Integer>[] graph, int s) {if (onPath[s]) {// hasCycle = true; } the if (visited [s] | | hasCycle) {/ / if you have already found the ring, also need not traverse the return; } // visit [s] = true; onPath[s] = true; for (int t : graph[s]) { traverse(graph, t); } // onPath[s] = false; } List<Integer>[] buildGraph(int numCourses, int[][] prerequisites) {Copy the code

And that solves the problem, which is to determine whether there are rings in a directed graph.

But what if the problem maker continues to nag you and asks you not only to determine whether there is a ring, but also to return what nodes the ring has?

You might say, isn’t the index true in onPath the number of the nodes that make up the ring?

No, assuming that the green nodes in the image below are recursive paths, they are all true in onPath, but clearly the looped nodes are only part of the equation:

I’ll leave this question for you to think about, and I’ll put the correct answer at the top of the comment section of our official account.

So next, let’s talk about a classic graph algorithm: topological sorting.

A topological sort

Look at question 210, “Schedule II” :

This is an advanced version of the previous question, not just to determine whether you can complete all the courses, but to go further and return you to a reasonable order so that you have completed all the previous courses by the time you start each course.

The function signature is as follows:

int[] findOrder(int numCourses, int[][] prerequisites);

Copy the code

The word Topological Sorting is mathematically defined on the Internet. Here’s an illustration from Baidu.com to help you visualize it:

The intuition is that you want to “flatten” a picture, and all the arrows in the “flatten” picture are in the same direction, like the one above, where all the arrows are pointing to the right.

Obviously, if there are rings in a directed graph, you can’t do topological sorting, because you can’t make all the arrows go in the same direction; Conversely, if a graph is a “directed acyclic graph”, topological sorting must be possible.

But what does our problem have to do with topological sorting?

In fact, it is not hard to see. If the courses are abstracted into nodes and the dependencies between courses are abstracted into directed edges, the topological ordering result of this graph is the order of classes.

First of all, let’s judge whether the course dependence of the question input is looped or not. If looped, topological sorting cannot be carried out, so we can reuse the main function of the previous question:

public int[] findOrder(int numCourses, int[][] prerequisites) { if (! CanFinish (numCourses, resident)) {return new int[]{}; } / /... }Copy the code

PS: For simplicity, canFinish reuses the previously implemented functions, but can actually combine the loop checking logic with the topological sorting logic, with traverse functions, which can be left to your own implementation.

So the key question is, how do you sort topologically? Is it time to show off some fancy skills?

In fact, it is very simple to reverse the result of the back-order traversal, which is the result of topological sorting.

Look directly at the solution code:

boolean[] visited; Postorder = new ArrayList<>(); postorder = new ArrayList<>(); Int [] findOrder(int numCourses, int[][] prerequisites) {// If (! canFinish(numCourses, prerequisites)) { return new int[]{}; } // List<Integer>[] graph = buildGraph(numCourses, prerequisites); // visit = new Boolean [numCourses]; for (int i = 0; i < numCourses; i++) { traverse(graph, i); } // Reverse the collections.reverse (postorder) of type int[]; int[] res = new int[numCourses]; for (int i = 0; i < numCourses; i++) { res[i] = postorder.get(i); } return res; } void traverse(List<Integer>[] graph, int s) { if (visited[s]) { return; } visited[s] = true; for (int t : graph[s]) { traverse(graph, t); } // postorder.add(s); Boolean canFinish(int numCourses, int[][] prerequisites); List<Integer>[] buildGraph(int numCourses, int[][] prerequisites);Copy the code

Although the code looks a lot, the logic should be clear. As long as the graph has no loops, we will call the traverse function to traverse the graph through BFS, record the post-traverse results, and finally reverse the post-traverse results as the final answer.

So why is the reverse of a back-order traversal topological sort?

I’m also avoiding mathematical proof here, but just to give you an intuitive example, let’s just say binary trees, and this is the binary tree traversal framework that we’ve talked about many times:

Void traverse(TreeNode root) {// traverse(root.left) // traverse(root.right) // traverse(root.right)}Copy the code

When is the backward traversal of a binary tree? Postorder position traversal code is executed after traversing the left and right subtrees. In other words, the root node is loaded only when the left and right subtrees are loaded into the result list.

This feature is important because topological sorting is based on post-order traversal because a task cannot begin to execute until all dependent tasks have been completed.

If you think of each task as a node in a binary tree, and the task that the task depends on as a child node, should you process all the child nodes first and then the parent node? Is this a post-order traversal?

The following figure is the result of a sequential traversal of a binary tree:

Combine this figure to say why we need to reverse the result of the post-order traversal, is the final topological sorting result.

We said that A node can be understood as A task, and the children of this node can be understood as the dependencies of this task, but notice the expression of the dependency we said before: if A is done before B can be done, then there is A directed edge from A to B, indicating that B depends on A.

So, the parent node depends on the child node, and the binary tree should look like this:

Is it the opposite of our normal binary tree pointer? Therefore, the normal post-order traversal results should be reversed, which is the result of topological sorting.

Above, I briefly explained why “the result of topological sorting is the result of backward order traversal after reverse”, of course, although my explanation is more intuitive, but there is no strict mathematical proof, interested readers can check for themselves.

Anyway, it’s enough to remember that topological sorting is the result of backward traversal, and topological sorting is only for directed acyclic graphs, so ring detection should be done before topological sorting.

View more quality algorithm articles click here, hand with your brush force buckle, committed to the algorithm to speak clearly! My algorithm tutorial has received 90K star, welcome to like!

What is topological sort? How does it work?

Determine whether a directed graph has a ring

A topological sort

Related Posts

2021, fourth year of learning programming, what did I do in this year?

I’ve been writing for a year, and all the tools I need are here!

What should code review focus on