Depth - first search and breadth - first search, ultra - detailed text and text parsing

preface

Depth First Search (DFS) and Breath First Search (Breath First Search) are two very important algorithms in graph theory. They are widely used in topology sorting, pathfinding (mazes), Search engines, crawlers, etc. It’s also used a lot in Leetcode, a lot of interview questions.

This article will be from the following aspects to tell about the depth first traversal, breadth first traversal, I think we will see the harvest.

Depth-first traversal, breadth-first traversal brief
Problem sets drill
DFS, BFS application in search engine

Depth-first traversal, breadth-first traversal brief

Depth-first traversal

The main idea of depth-first traversal is to start from an unvisited vertex V in the graph, follow a path all the way to the end, then back to the previous node at the end of the path, and start from another path to the end… , recursively repeat this process until all vertices have been traversed. It is characterized by not hitting the south wall and not turning back. It first goes one way and then moves on another way.

Trees are a special case of graphs (connected acyclic graphs are trees), so let’s see how trees can be traversed using depth-first traversal.

1. We traverse from the root node 1, whose adjacent nodes are 2, 3, and 4. We traverse node 2 first, then child node 5 of 2, and then child node 9 of 5.

Has walked exactly 2, above a road (9 is the leaf nodes, and no traverse the nodes), at this time is from 9 backs up a node 5, see whether there is other than the 9 5 nodes, no continue to back to 2, 2, and no other than the 5 nodes, back to 1, 1 have a node other than the 2 3, Therefore, depth-first traversal is carried out from node 3, as follows

3. Similarly, from 10 to 6, there is no child node except 10 in 6. If you go back up again, you find that 3 has child point 7 except 6, so 7 will be traversed at this time

3. Backtracking from 7 to 3 and 1, it is found that node 4 of 1 has not been traversed, so the traversal is carried out along 4 and 8 at this time, and the traversal is completed

The complete nodes are traversed in the following order (represented by the blue numbers on the nodes)

I believe that it is not difficult for you to see the above traversal to find that this is the pre-order traversal of the tree. In fact, whether it is pre-order traversal, middle-order traversal, or post-order traversal, all belong to depth-first traversal.

So how to implement depth-first traversal, there are two forms of recursion and non-recursion, let’s take a binary tree as an example to see how to implement depth-first traversal using recursion and non-recursion respectively.

1, recursive implementation

Recursive implementation is relatively simple, because it is a front-order traversal, so we traverse the current node, the left node, the right node, for left and right nodes, traversal their left and right nodes, and so on, until the leaf node (recursive termination condition), the code is as follows

public class Solution {
    private static class Node {
        /** * Node value */
        public int value;
        /** ** left node */
        public Node left;
        /** ** right node */
        public Node right;

        public Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right; }}public static void dfs(Node treeNode) {
        if (treeNode == null) {
            return;
        }
        // Iterate over the node
        process(treeNode)
        // Iterate over the left node
        dfs(treeNode.left);
        // Iterate over the right nodedfs(treeNode.right); }}Copy the code

Recursion is expressive and easy to understand, but if the hierarchy is too deep, it can easily lead to stack overflow. So let’s focus on non-recursive implementations

2. Non-recursive implementation

Observing the characteristics of depth-first traversal carefully, for binary trees, since it is sequential traversal (traversing the current node first, then traversing the left node, then traversing the right node), we have the following idea

For each node, the current node is traversed first, then the right node is pressed, then the left node is pressed (so that the left node is traversed first when the stack is flipped, which meets the depth-first traversal requirements).
Pop the stack and get the node at the top of the stack. If the node is not empty, repeat step 1. If the node is empty, the traversal is finished.

Let’s use the following binary tree as an example to see how to implement DFS using stacks.

The overall GIF is shown below

The overall idea is still quite clear, use the stack to push the node to be traversed, and then check whether there are any untraversed nodes after the stack, if there are, push the stack, if there is no continuous backtracking (out of the stack), with the idea, it is not difficult to write the following binary tree depth first traversal code:

/** * use stack to implement DFS *@param root
 */
public static void dfsWithStack(Node root) {
    if (root == null) {
        return;
    }

    Stack<Node> stack = new Stack<>();
    // Push the root node first
    stack.push(root);
    while(! stack.isEmpty()) { Node treeNode = stack.pop();// Iterate over the node
        process(treeNode)

        // Press the right node first
        if(treeNode.right ! =null) {
            stack.push(treeNode.right);
        }

        // press the left node again
        if(treeNode.left ! =null) { stack.push(treeNode.left); }}}Copy the code

As you can see, using a stack to implement depth-first traversal is not complicated code, and you don’t have to worry about stack overflow because of the deep level of recursion.

Breadth-first traversal

Breadth-first traversal refers to starting from an untraversed node of the graph, traversing the adjacent nodes of the node first, and then traversing the adjacent nodes of each adjacent node.

The breadth-first traversal GIF of the tree mentioned above is as follows, and the value of each node is their traversal order. Therefore, breadth-first traversal is also called sequential traversal. It traverses the first layer (node 1), then the second layer (node 2,3,4), the third layer (node 5,6,7,8), and the fourth layer (9,10).

Depth-first traversal uses stacks, while breadth-first traversal uses queues. Let’s take the following binary tree as an example to see how to use queues for depth-first traversal

Diagram below

I believe that seeing the above GIF, it is not difficult to write the following code

/** * use queue to implement BFS *@param root
 */
private static void bfs(Node root) {
    if (root == null) {
        return;
    }
    Queue<Node> stack = new LinkedList<>();
    stack.add(root);

    while(! stack.isEmpty()) { Node node = stack.poll(); System.out.println("value = " + node.value);
        Node left = node.left;
        if(left ! =null) {
            stack.add(left);
        }
        Node right = node.right;
        if(right ! =null) { stack.add(right); }}}Copy the code

Problem sets drill

Let’s take a look at some of the problems in LeetCode that use DFS and BFS to solve problems:

Leetcode 104,111: Given a binary tree, find its maximum/minimum depth.

For example, given a binary tree [3,9,20,null,null,15,7],

    3
   / \
  9  20
    /  \
   15   7
Copy the code

It has a minimum depth of 2 and a maximum depth of 3

How to calculate the depth of the left and right subtrees? For each recursive call, the depth is increased by one. It is not difficult to write the following code

/** * leetcode 104: Find the maximum depth of the tree *@param node
 * @return* /
public static int getMaxDepth(Node node) {
    if (node == null) {
        return 0;
    }
    int leftDepth = getMaxDepth(node.left) + 1;
    int rightDepth = getMaxDepth(node.right) + 1;
    return Math.max(leftDepth, rightDepth);
}

/** * leetcode 111: Find the minimum depth of tree *@param node
 * @return* /
public static int getMinDepth(Node node) {
     if (node == null) {        return 0;    } 
     if (node.left == null) {        return 1 + getMinDepth(node.right);    } 
     if (node.right == null) {        return 1 + getMinDepth(node.left);    } 
     int leftDepth = getMinDepth(node.left);
     int rightDepth = getMinDepth(node.right); 
     return 1 + Math.min(leftDepth, rightDepth);
}
Copy the code

Leetcode 102: Given a binary tree, return the value of the node traversed in order. (That is, layer by layer, all nodes are accessed from left to right). Example, given a binary tree: [3,9,20,null,null,15,7]

    3
   / \
  9  20
    /  \
   15   7
Copy the code

Returns the result of its hierarchical traversal:

[[3], [9,20], [15,7]Copy the code

Answer: Clearly the problem is a variation of breadth-first traversal, just in the process of breadth-first traversal, each layer of the nodes are added to the same array, the key is to traverse the same layer node before, must first work out how much the same layer node has, or because of BFS is queue, traverse is constantly in the process of the child nodes around the team, If you do not calculate the nodes of the same layer in advance, the left and right nodes of the same layer will be added to the nodes of the same layer, remember this! Diagram below

According to the above GIF ideas, it is not difficult to get the code as follows:

Java code

/** * leetcdoe 102: Binary tree sequence traversal, using BFS *@param root
 */
private static List<List<Integer>> bfsWithBinaryTreeLevelOrderTraversal(Node root) {
    if (root == null) {
        // If the root node is empty, the binary tree does not exist
        return Arrays.asList();
    }

    // The final sequence traversal result
    List<List<Integer>> result = new ArrayList<>();

    Queue<Node> queue = new LinkedList<>();
    queue.offer(root);

    while(! queue.isEmpty()) {// Record each layer
        List<Integer> level = new ArrayList<>();
        int levelNum = queue.size();
        // Traverses the nodes of the current layer
        for (int i = 0; i < levelNum; i++) {
            Node node = queue.poll();
            // The left and right children of the first node join the team. Since levelNum is calculated before joining the team, the left and right children of the first node are not traversed at the current level
            if(node.left ! =null) {
                queue.add(node.left);
            }
            if(node.right ! =null) {
                queue.add(node.right);
            }
            level.add(node.value);
        }
        result.add(level);
    }

    return result;
}
Copy the code

Python code:

class Solution:
    def levelOrder(self, root) :
        """ :type root: TreeNode :rtype: List[List[int]] """
        res = []  # nested list, save the final result
        if root is None:
            return res
        
        from collections import deque
        que = deque([root])  # queue, save nodes to be processed
        while len(que)! =0:
            lev = []  # list, which holds the values of the nodes of this layer
            thislevel = len(que)  # Number of nodes at this layer
            whilethislevel! =0:
                head = que.popleft()  # eject queue head node
                # The left and right children of the first node join the team
                if head.left is not None:
                    que.append(head.left)
                if head.right is not None:
                    que.append(head.right)
                lev.append(head.val)  # Push the value of the first node into this layer
                thislevel-=1
            res.append(lev)
        return res
Copy the code

This is obvious with BFS, but it could also be done with DFS, and it would be a big plus if you could handle it in an interview.

DFS can be implemented recursively by adding a “layer” of variables to the recursive function. As long as the node belongs to this layer, the node is placed in an array of the corresponding layer. The code is as follows:

private static final List<List<Integer>> TRAVERSAL_LIST  = new ArrayList<>();
/** * leetcdoe 102: Binary tree sequence traversal using DFS *@param root
 * @return* /
private static void dfs(Node root, int level) {
    if (root == null) {
        return;
    }

    if (TRAVERSAL_LIST.size() < level + 1) {
        TRAVERSAL_LIST.add(new ArrayList<>());
    }

    List<Integer> levelList = TRAVERSAL_LIST.get(level);
    levelList.add(root.value);

    // iterate over the left node
    dfs(root.left, level + 1);

    // iterate over the right node
    dfs(root.right, level + 1);
}
Copy the code

DFS, BFS application in search engine

We use search engines like Google and Baidu almost every day. Do you know how these search engines work? There are three simple steps

1, web crawling

Search engine will crawl the web page through the crawler, the page HTML code stored in the database

2. Pretreatment

Indexing procedures to crawl to the page data for text extraction, Chinese word segmentation, (inverted) index processing, for use by ranking procedures

3, ranking

After a user enters a keyword, the ranking program calls the indexed database data, calculates relevance, and then generates a page of search results in a certain format.

Let’s focus on the first step, web scraping.

The general operation of this step is as follows: assign a group of initial web pages to the crawler, and we know that the web page actually contains many hyperlinks. After crawler crawls a web page, it analyzes and extracts all the hyperlinks in the web page, and then crawls and extracts these hyperlinks, and then extracts the hyperlinks… “, so that you can continue to extract pages based on hyperlinks. The following chart

As shown above, you end up with a graph, and the question then becomes how to traverse the graph, obviously in a depth-first or breadth-first way.

If it is breadth-first traversal, first climb the first layer of the start page, then climb the hyperlinks in each page, if it is depth-first traversal, first climb the start page 1, then climb the links in this page… , then climb to the start page 2…

In fact, crawler is used together with depth-first and breadth-first strategies. For example, in the initial web page, if some pages are important (with high weight), depth-first traversal will be performed on this page first, and then breadth-first traversal will be performed on other (with the same weight) initial web pages after traversal.

conclusion

DFS and BFS are two very important algorithms, you must master, in order to explain the convenience of this article, only for trees DFS, BFS, you can try to use graph code, the principle is the same, but graph and tree representation is different, DFS is generally to solve the connectivity problem, BFS is generally used to solve the shortest path problem. Later, we will study together and look up sets, Dijkstra, Prism algorithm, etc. Please look forward to it!

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Depth – first search and breadth – first search, ultra – detailed text and text parsing

preface

Depth-first traversal, breadth-first traversal brief

Depth-first traversal

Breadth-first traversal

Problem sets drill

DFS, BFS application in search engine

conclusion

Depth – first search and breadth – first search, ultra – detailed text and text parsing

preface

Depth-first traversal, breadth-first traversal brief

Depth-first traversal

Breadth-first traversal

Problem sets drill

DFS, BFS application in search engine

conclusion

Related Posts

Servlet Call Diagram, Servlet Details, ServletConfig, ServletContext

Implementing a simple LRU for linked Lists (Java)

Chaos Engineering from zero