Golang data structure: tree

Golang Datastructures: Trees

Original author: Ilija Eftimov

The Nuggets translation Project

Permanent link to this article: github.com/xitu/gold-m…

Translator: steinliber

Proofread by: Endone, LeoooY

For most of your programming career you will never have to deal with trees as data structures, or even if you don’t understand them, you can easily avoid using them (which is what I’ve been doing).

Now, don’t get me wrong — arrays, lists, stacks, and queues are all very powerful data structures that can take you a long way down the programming path, but they don’t solve every problem, regardless of how you use them and how efficiently. When you put hash tables into this combination, you can solve quite a few problems, but for many problems, if you can master the tree structure, it’s a powerful (and perhaps only) tool.

So let’s look at tree structures, and then we can learn how to use them with a little exercise.

A theory of

Arrays, lists, queues, and stacks store data in collections with heads and tails, so they are called “linear structures.” But when it comes to data structures like trees and graphs, this can get confusing because the data is not stored in the structure in a linear fashion.

Trees are called nonlinear structures. In fact, you could also say that a tree is a hierarchical data structure because its data is stored in a hierarchical manner.

For your reading pleasure, here’s wikipedia’s definition of a tree structure:

A tree is a data structure consisting of nodes (or vertices) and edges without any rings. A tree with no nodes is called an empty tree. A non-empty tree is composed of a root node and a hierarchy of additional nodes that may consist of multiple levels.

What this definition means is that a tree is just a collection of nodes (or vertices) and edges (or connections between nodes), and it does not contain any loops.

For example, the data structure represented in the diagram is A combination of nodes, named A through F, with six edges. Although all of its elements make it look like they are constructing A tree, nodes A, D, and F all have A loop, so the data structure is not A tree.

If we break the edge between nodes F and E and add a node G, joining G and F with edges, we get something like the following:

Now, because we have eliminated loops in the diagram, we can say that we now have a valid tree structure. It has A root node called A, and it has seven nodes. Node A has three child nodes (B, D, and F) and the nodes below these nodes (C, E, and G, respectively). Therefore, node A has six descendants. In addition, the tree has three leaf nodes (C, E, and G) or let’s call them nodes with no children.

What do nodes B, D and F have in common? They are siblings because they have the same parent (node A). They’re all in the first layer because each of them only takes one step to get to the root node. For example, node G is at layer 2, and since the path from G to A is G -> F -> A, we need to walk two edges to reach node A.

Now that we’ve looked at a bit of tree theory, let’s look at how trees can be used to solve some problems.

Model HTML documents

If you’re a software developer who’s never written any HTML, I’m going to assume you’ve seen (or know) what HTML looks like. If you still don’t know, I suggest you right click on the page you are currently reading and click “View Source” to see it.

Seriously, go check it out, I’ll be right here…

Browsers have a built-in thing called DOM — a cross-platform, language-independent application programming interface that treats these Web documents as a tree structure, with each node as an object representing part of the document. This means that when the browser reads the HTML code in your document it will load the document and create a DOM based on it.

So, let’s imagine for a moment that we’re developers of Chrome or Firefox, and we need to model the DOM. Well, to make this exercise easier, let’s look at a small HTML document:

<html> <h1>Hello, World! </h1> <p>This is a simple HTML document.</p> </html>Copy the code

So, if we modeled the document as a tree structure, it would look like this:

Now, we can think of text nodes as separate nodes, but for simplicity, we can assume that any HTML element can contain text.

The HTML node will have two child nodes, h1 and P, which contain the fields tag, text, and children. Let’s put this in code:

type Node struct {
    tag      string
    text     string
    children []*Node
}
Copy the code

A Node will have only label names and child nodes to choose from. Let’s try our hand at creating this HTML document using the Node tree seen above:

func main() {
        p := Node{
                tag:  "p",
                text: "This is a simple HTML document.",
                id:   "foo",
        }

        h1 := Node{
                tag:  "h1",
                text: "Hello, World!",
        }

        html := Node{
                tag:      "html",
                children: []*Node{&p, &h1},
        }
}
Copy the code

This looks ok, so we’ve got a basic tree structure up and running.

Build myDOM-dom as a direct alternative to 😂

Now that we have some tree structure in place, let’s step back and see what the DOM does. For example, if you replace DOM with MyDOM (TM) in the real world, you should be able to access the nodes and modify them using JavaScript.

The easiest way to do this with JavaScript is to use the following code

document.getElementById('foo')
Copy the code

This function will look for a node in the Document tree with foo as its ID. Let’s update our Node structure to get more functionality, and write a query function for our tree structure:

type Node struct {
  tag      string
  id       string
  class    string
  children []*Node
}
Copy the code

Now, each of our Node structures will have a tag, children, which is a slice of a pointer to that Node’s children, an ID representing the ID in that DOM Node, and a class referring to the class that can be applied to that DOM Node.

Now go back to our previous getElementById query function. How to do it. First, let’s construct a tree structure that can be used to test our query algorithm:

<html>
  <body>
    <h1>This is a H1</h1>
    <p>
      And this is some text in a paragraph. And next to it there's an image. s Logo"/>  
      
        This is the footer of the page. 
       2019 © Ilija Eftimov 
      
  Copy the code

This is a very complex HTML document. Let’s use Node as a structure in the Go language to represent its structure:

image := Node{
        tag: "img",
        src: "http://example.com/logo.svg",
        alt: "Example's Logo",
}

p := Node{
        tag:      "p",
        text:     "And this is some text in a paragraph. And next to it there's an image.",
        children: []*Node{&image},
}

span := Node{
        tag:  "span",
        id:   "copyright",
        text: "2019 &copy; Ilija Eftimov",
}

div := Node{
        tag:      "div",
        class:    "footer",
        text:     "This is the footer of the page.",
        children: []*Node{&span},
}

h1 := Node{
        tag:  "h1",
        text: "This is a H1",
}

body := Node{
        tag:      "body",
        children: []*Node{&h1, &p, &div},
}

html := Node{
        tag:      "html",
        children: []*Node{&body},
}
Copy the code

We started building this tree from the bottom up. This means building the structure from the deepest nested structure all the way up to the body and HTML nodes. Let’s take a look at the tree structure:

Implement node query 🔎

Let’s move on to our goal of having JavaScript call getElementById in our document and find the Node it wants to find.

To do this, we need to implement a tree query algorithm. The most popular methods for searching (or traversing) graph structures and tree structures are breadth-first search (BFS) and depth-first search (DFS).

Search ⬅ Make it a priority

As the name implies, the traversal approach adopted by BFS considers the “width” of the node to be explored first and then the “depth”. Here is a visualization of the BFS algorithm traversing the entire tree structure:

As you can see, the algorithm takes two steps in depth (through HTML and the body node), then traverses all of the child nodes of the body, and finally dives down to the next level to access the SPAN and IMG nodes.

If you wanted a step-by-step explanation, it would be:

We start at the roothtmlnode
Let’s push it toqueue
We start going into a cycle where ifqueueIt’s not null, and the loop keeps running
We checkqueueWhether the next element in the And if it matches, we just return this node and we’re done
When no match is found, we queue all the children of the checked node so they can be checked later
GOTOThe fourth step

Let’s look at a simple implementation of this algorithm in Go, and I’ll share some tips on how to easily remember the algorithm.

func findById(root *Node, id string) *Node {
        queue := make([]*Node, 0)
        queue = append(queue, root)
        for len(queue) > 0 {
                nextUp := queue[0]
                queue = queue[1:]
                if nextUp.id == id {
                        return nextUp
                }
                if len(nextUp.children) > 0 {
                        for _, child := range nextUp.children {
                                queue = append(queue, child)
                        }
                }
        }
        return nil
}
Copy the code

This algorithm has three key points:

queueIt will contain all nodes accessed by the algorithm
To obtainqueueCheck if it matches, and proceed to the next node if it does not
In the viewqueueAll children of the node are preceded by the next element ofIn the queue.

In essence, the whole algorithm is implemented around pushing child nodes into the queue and detecting nodes already in the queue. And of course, if there’s still no match at the end of the queue we return nil instead of a pointer to Node.

Depth-first search 缟

For completeness, let’s look at how DFS works.

As mentioned earlier, depth-first search first accesses as many nodes as possible in depth until it reaches a leaf node in the tree structure. When that happens, it goes back to the node above and finds another branch in the tree and continues down.

Let’s take a look at what this looks like:

If this confuses you, don’t worry — I’ve added more details in the steps to support my explanation.

The algorithm starts out just like BFS — it goes from HTML to body to div nodes. Then, instead of continuing to traverse the H1 node, the algorithm advances one step towards the leaf node span. Once it discovers that span is a leaf node, it returns to the DIV node to find other branches to explore. Since it can’t find it in div either, it moves back to the body node, where it finds a new branch and accesses the H1 node in that branch. It then continues through the same steps before — returning to the body node and finding another branch to explore — and finally accessing the P and IMG nodes.

If you want to know “how can we return to the parent without a pointer to the parent”, then you’ve forgotten one of the oldest tricks in the book, recursion. Let’s look at a simple recursive implementation of this algorithm in Go:

func findByIdDFS(node *Node, id string) *Node {
        if node.id == id {
                return node
        }

        if len(node.children) > 0 {
                for _, child := range node.children {
                        findByIdDFS(child, id)
                }
        }
        return nil
}
Copy the code

Search 🔎 by class name

Another feature MyDOM (TM) should have is to find nodes by class name. Basically, when a JavaScript script executes getElementsByClassName, MyDOM should know how to collect all the nodes with a particular class name.

As you can imagine, this is also an algorithm that must explore the entire MyDOM (TM) structure tree to obtain nodes that meet certain conditions.

For simplicity, we’ll implement a Node structure called hasClass:

func (n *Node) hasClass(className string) bool {
        classes := strings.Fields(n.classes)
        for _, class := range classes {
                if class == className {
                        return true}}return false
}
Copy the code

HasClass takes the Classes field of the Node structure, separates it with a space character, and then loops through the slice of the classes and tries to find the class name we want. Let’s write a few test cases to verify this function:

type testcase struct {
        className      string
        node           Node
        expectedResult bool
}

func TestHasClass(t *testing.T) {
        cases := []testcase{
                testcase{
                        className:      "foo",
                        node:           Node{classes: "foo bar"},
                        expectedResult: true,
                },
                testcase{
                        className:      "foo",
                        node:           Node{classes: "bar baz qux"},
                        expectedResult: false,
                },
                testcase{
                        className:      "bar",
                        node:           Node{classes: ""},
                        expectedResult: false,}}for _, case := range cases {
                result := case.node.hasClass(test.className)
                ifresult ! = case.expectedResult { t.Error("For node", case.node,
                                "and class", case.className,
                                "expected", case.expectedResult,
                                "got", result,
                        )
                }
        }
}
Copy the code

As you can see, the hasClass function checks whether the Node class name is in the class list. Now, let’s move on to the implementation of MyDOM, which looks up all matching nodes by class name.

func findAllByClassName(root *Node, className string) []*Node {
        result := make([]*Node, 0)
        queue := make([]*Node, 0)
        queue = append(queue, root)
        for len(queue) > 0 {
                nextUp := queue[0]
                queue = queue[1:]
                if nextUp.hasClass(className) {
                        result = append(result, nextUp)
                }
                if len(nextUp.children) > 0 {
                        for _, child := range nextUp.children {
                                queue = append(queue, child)
                        }
                }
        }
        return result
}
Copy the code

Does this algorithm look familiar? That’s because you’re looking at a modified findById function. FindAllByClassName works like findById, but instead of returning a match, findAllByClassName adds the matched Node to the Result slice. It will continue to loop until all nodes have been iterated.

If no match is found, the result slice will be empty. If any of them match, they are returned as part of result.

The last thing to note here is that we are using breadth-first traversal of the tree — this algorithm uses a queue to store each Node structure and loops through the queue to add matches to the result slice if they are found.

Deleting a node 🗑

Another feature that is often used in the Dom is node deletion. Just like DOM can do this, MyDOM (TM) should be able to do this.

The easiest way to do this in Javascript is:

var el = document.getElementById('foo');
el.remove();
Copy the code

Although our document knows how to handle getElementById (by calling findById later), our Node doesn’t know how to handle a remove function. Removing a Node from MyDOM (TM) will require two steps:

We findNodeThen delete it from the set of children of the parent node;
If you want to deleteNodeThere are child nodes, and we have to remove those child nodes from the DOM. This means that we must remove all Pointers to these children and their parents (the nodes to be deleted) so that the garbage collector in Go can free up the memory.

Here’s a simple way to do this:

func (node *Node) remove() {
        // Remove the node from it's parents children collection for idx, sibling := range n.parent.children { if sibling == node { node.parent.children = append( node.parent.children[:idx], node.parent.children[idx+1:]... , ) } } // If the node has any children, set their parent to nil and set the node's children collection to nil
        iflen(node.children) ! = 0 {for _, child := range node.children {
                        child.parent = nil
                }
                node.children = nil
        }
}
Copy the code

A *Node will have a remove function that performs the two steps described above to remove the Node.

In the first step, we take the node out of the parent node’s list of children and remove it by iterating through the children, merging the elements before and after the node into a new list.

In the second step, after checking to see if the Node has any child nodes, we remove the parent references from all child nodes, and then set the Node’s child Node field to nil.

What’s next?

Obviously, our MyDOM (TM) implementation will never replace DOM. But, I believe this is an interesting example to help you learn, and it’s also an interesting question. We interact with browsers every day, so it’s an interesting exercise to think about how they work in the dark.

If you want to use our tree structure and write more functionality for it, you can access the WC3 JavaScript HTML DOM documentation and consider adding more functionality to MyDOM.

Obviously, the purpose of this article is to give you more information about tree (graph) structures and the current popular search/traversal algorithms. However, keep exploring and experimenting anyway, and leave a comment below if there are any improvements to your MyDOM implementation.

If you find any mistakes in your translation or other areas that need to be improved, you are welcome to the Nuggets Translation Program to revise and PR your translation, and you can also get the corresponding reward points. The permanent link to this article at the beginning of this article is the MarkDown link to this article on GitHub.

The Nuggets Translation Project is a community that translates quality Internet technical articles from English sharing articles on nuggets. The content covers Android, iOS, front-end, back-end, blockchain, products, design, artificial intelligence and other fields. If you want to see more high-quality translation, please continue to pay attention to the Translation plan of Digging Gold, the official Weibo, Zhihu column.

A theory of

Model HTML documents

Build myDOM-dom as a direct alternative to 😂

Implement node query 🔎

Search ⬅ Make it a priority

Depth-first search 缟

Search 🔎 by class name

Deleting a node 🗑

What’s next?

Related Posts

One-click Install of supercar Nginx in server! Let website performance take off, don’t spend money!

Big data Hadoop-MapReduce learning journey chapter 4

[Spring Cloud Foundation]- Use of Eureka