Arrays are static data structures that are easy to implement programmatically and support random access well. Linked lists have more dynamic features. Linked lists are suitable for frequent addition, deletion and update operations. The time complexity of linked lists, which we also introduced, has the disadvantage of O(n) random access time compared to arrays.

There are also many derived data structures based on arrays and linked lists, such as queues and stacks, where queues are fifO and stacks are fifO, and they can be thought of as custom arrays or queues to solve specific problems. Then we introduced the data structure of hash table. The important application scenario of hash table is fast query and update. As we learned before, the time complexity of query and update of hash table is O(1) evenly.

Arrays and linked lists are limited by their linear nature. When you need to query an element in an array or linked list, you need to traverse the list from beginning to end in order n time. Imagine a set of 10,000 elements stored in an array or linked list. How long would it take to make 10,000 queries? The time complexity is O(108), and depending on your memory access speed and CPU speed, it can take more than ten minutes.

In modern Internet applications, 10,000 QPS is quite common. If you want to support up to 10,000 accesses per second, obviously using arrays and linked lists alone is not enough.

Tree definitions and examples

A tree consists of nodes and edge joins and is a nonlinear data structure, unlike arrays and linked lists

In the figure above, A is the root of the tree. Every node except the root node has one and only one edge that points to itself. The direction of this edge indicates that the parent node points to the child node. For example, A is the parent node of B, C, and D. B is also called the child node of A, and B is also the parent node of E, F, and K.

Each node can be connected to any number of children (including 0). Nodes without children are also called leaf nodes. In the example above, C, E, F, L, and G are leaf nodes. Nodes that share the same parent are called siblings. In the example above, B, C, and D are siblings.

The depth of a node is the number from the root node to its own edge, for example, the depth of K is 2. The height of a node is the number from the deepest node to its own edge, for example, B is 2. The height of a tree is also the height of its root node

The recursion of a tree is defined as a finite set of n (n>=0) nodes, each of which contains a value and points from edge to other nodes (child nodes), edges cannot be repeated, and no node can point to the root node. In a generalized tree, a node can have zero or more child nodes. Binary trees, which are described in later sections, are special types of generalized trees (each node has two children).

A generalized tree can be used to express hierarchical data relationships. For example, a File System is a hierarchical collection. Each Directory in a File System can contain multiple directories, and the leaf nodes without subdirectories are files.

Tree programming implementation

There are many ways to realize the tree, two common implementation methods are introduced here, one is based on linked list implementation, the other is based on array implementation. That’s right, you read that right. It’s based on what we learned earlier about arrays and linked lists, which is why I told you at the beginning of this column that arrays and linked lists are the building blocks of data structures. Advanced data structures require an in-depth understanding of these two basic data structures.

List-based implementations typically maintain a child node pointer and a list of siblings for each node type, which we call the left child sibling method. The code looks like this:

class TreeNode {
      Data data;
      LinkedList siblings;
      TreeNode left_child;
}
Copy the code

In our graph tree example above, what would node B look like in this code if we implemented node B using the left child sibling list? As shown in the figure below, the siblisings node of node B is a linked list that points to C and then to D, and the left_child node of node B points to E.

Another array-based approach is for each node to maintain an array containing all of its children, which we call the child array method.

class TreeNode {
      Data data;
      ArrayList children;
}
Copy the code

Let’s also see what this array of children method looks like in action! As shown in the figure below, the same node B needs to maintain an array containing E, K and F if it is implemented using the child array method.

It might seem easy to implement either way, but it’s not

The most common mistakes to make in implementing a tree are memory management and adding, deleting, and changing nodes.

class TreeNode {
      Data data;
      std::vector<TreeNode> children;
};
Copy the code

You can see that the memory of all child nodes is managed by the parent node. That is, when the parent node is deleted, the memory of the child node is automatically cleared. That creates a serious problem. It’s hard to easily delete any non-leaf nodes. So in fact with child arrays we tend to maintain only one set of Pointers to child nodes, like this:

class TreeNode {
      Data data;
      std::vector<TreeNode*> children;
};
Copy the code

This raises another question: who manages the memory of the child nodes? One solution is to implement a NodePool class to manage all node memory in addition to the TreeNode class. For example, the following code maintains memory with a simple dynamic array:

class NodePool {
  std::vector<TreeNode> nodes;
}
Copy the code

Another solution is to mimic the memory management of a linked list, which is easier to implement in the left child sibling list method because it is essentially a binary list.

In addition to memory management problems, tree node insertion and deletion are also difficult. If you remember the basics of arrays and linked lists, you know that random additions and deletions in arrays are order n, and linked lists are order 1. The tree implementation is similar here.

The complexity of inserting a tree node is O(1), which is equivalent to inserting an element into the siblings list or adding a left_child node. Inserting a node in a child array is like inserting an element in an array, moving all the nodes behind it, order n.

Deleting a node is more complicated than inserting a node. Similarly, deleting a single node is O(n) in the child array method, because you need to copy the subtree of that node to the upper node. In the left sibling method, deleting a single node is O(1) and you just need to rearrange a few pointer references.

So it turns out that left child sibling lists perform better than child arrays, and indeed, left child sibling lists are textbook implementations of trees. But in practice we often see children array method, why? Because the children array method is simpler from an API perspective, the user can access any child at random. Especially when we want to provide a read-only View of an array, the child array method can be provided as an API, but the modifiable tree behind it is still implemented by more complex private methods.

Tree traversal and basic algorithm

In addition to adding, deleting, and modifying trees, the most common tree operation is the search operation, also known as traversal. Without traversal, adding, deleting, and changing are impossible because you don’t know which node to operate on! Tree traversal can be divided into pre-order traversal, subsequent traversal and layer traversal according to the access order of the root node. The so-called preorder traversal is represented by pseudo-code:

  • First access the root node N
  • Recursively accesses the subtree of N

As you might have guessed, the so-called follow-up traversal is similar:

  • Recursively accesses the subtree of N
  • Then access the root node N

What about layer by layer? Top to bottom, left to right. For example, in the tree example above, the result of traversing layers is: A, B, C, D, E, K, F, G, L. Let’s take a look at how to implement the forward traversal in the left child sibling list method.

class TreeNode {
  TreeNode* left_child;
  TreeNode* sibling;
}

void PreorderVisit(TreeNode root) {
  Visit(root);
  for (TreeNode* child = root->left_child; child != nullptr; child = child->sibling) {
    PreorderVisit(child);
  }
}
Copy the code

Focus on system architecture, high availability, high performance, high concurrency technology sharing