Binary Search tree (BST)

A Binary Search Tree, also called a ordered Binary Tree, is an empty Tree or a Binary Tree with the following properties:

If the left subtree of any node is not empty, the values of all nodes in the left subtree are less than the values of its root node. If the right subtree of any node is not empty, the values of all nodes in the right subtree are greater than the values of its root node. The left and right subtrees of any node are binary search trees respectively. There are no nodes with equal keys

Binary search trees degenerate into linear structures in extreme cases, reducing search efficiency

The depth of the tree directly determines the search times under the worst conditions, which also determines the search efficiency. Therefore, it is important to reduce the depth of the tree, which leads to AVL tree and red-black tree

Advantages: Ordered disadvantages: Under extreme conditions will degenerate into a linked list, reduce the search efficiency

AVL Tree balanced binary Tree

AVL tree is a balanced binary tree, named after its inventor (Adelson-Velskii and Landis). Balanced binary tree stands for balanced binary search tree, also known as AVL tree. It’s a self-balancing tree.

Balanced binary tree is also a special binary search tree, which meets the characteristics of binary search tree

AVL trees also stipulate that the left node is smaller than the root node and the right node is larger than the root node. It is also stipulated that the height difference between the left subtree and the right subtree should not exceed 1. This ensures that it doesn’t become a linear linked list. The AVL tree is stable in search, insertion and deletion. The time complexity of search, insertion and deletion is O (logN). However, due to maintaining its own balance, nodes need to be rotated frequently when inserting and deleting nodes.

Each AVL tree node can hold only one element, and each node has only two child nodes. Data is stored on disk. Each query is to add a page of data from disk to memory. Nodes of each layer of the tree are stored on one page, and data of different layers are stored on different pages. This will require multiple disk IO if multiple layers of queries are required. To solve this problem of AVL trees, B trees appear.

Advantages: orderly, which solves the problem that BST will degenerate into a linear structure. Disadvantages: Frequent rotation of nodes is required when inserting and deleting nodes

Red black Tree (R-B Tree)

R-b Tree, full name is red-black Tree, also known as “Red Black Tree”, it is a special binary search Tree. Red-black trees have memory bits on each node to indicate the color of the node, which can be Red or Black.

The characteristics of red-black trees are as follows: (1) Each node is either black or red. (2) The root node is black. (3) Each leaf node (NIL) is black. [Note: leaf nodes are NIL or NULL leaf nodes!] (4) If a node is red, its children must be black. (5) All paths from a node to its descendants contain the same number of black nodes.

The red-black tree is shown as follows:

Red-black trees are almost identical to AVL tree operations in terms of lookup. However, in the operation of insertion and deletion, AVL tree will carry out a large number of balance calculation every time insertion and deletion, red black tree is at the expense of the superior conditions of strict height balance, it only requires partial balance requirements, combined with color change, reduce the requirements for rotation, thus improving the performance. Red-black trees can search, insert, and delete with O(log2 n) time complexity. Moreover, because of its design, any imbalance is resolved within three rotations.

Compared with BST, red-black tree can ensure that the longest path of the tree is not more than twice the length of the shortest path, so it can be seen that its search effect is minimally guaranteed. In the worst case, order logN is guaranteed, which is better than binary search trees. Because the binary search tree at worst can get the search to order N.

The time complexity of the algorithm of red-black tree is the same as that of AVL, but the statistical performance is higher than that of AVL tree. In the insertion and deletion, the maintenance operation of AVL tree is definitely much more time-consuming than that of red-black tree, but their search efficiency is O(logN), so the application of red-black tree is still higher than that of AVL tree. The speed at which you actually insert AVL and red-black trees depends on the data you insert. AVL trees are better if your data is well distributed (such as randomly generating series numbers), but red-black trees are faster if you want to deal with more clutter.

Red-black trees are widely used in TreeMap, TreeSet, and post-JDK1.8 HashMaps.


All three are based on binary trees

A binary tree can hold only one element per node, and each node has only two child nodes. In practical applications, data is stored on disks. Each query adds one page of data on disks to memory. Nodes at each layer of the tree are stored on one page, and data at different layers are stored on different pages. This will require multiple disk IO if multiple layers of queries are required. To solve this problem, we have B trees

B tree

B tree is not a binary tree. Each layer of B tree has more nodes, and the “thin and tall” AVL tree has become “short and fat”. B tree belongs to multi-fork tree, also known as balanced multi-path search tree (search path is not more than two). The order of B tree means that each node of the tree has at most M subtrees. M is the order of the tree.

A B-tree of order m specifies:

  1. The root node has at least two children.
  2. Each intermediate node contains K-1 elements and K children, where m/2 <= k <= m, that is, k is between m/2 and m.
  3. Each leaf node contains k-1 elements, where m/2 <= k <= m.
  4. All the leaf nodes are in the same layer.

5. Elements in each node are arranged from small to large, and k-1 element in the node is exactly the range division of elements contained by K children.

Features:

B tree is also a self-balancing tree, which requires rotation of nodes during insertion and deletion. However, compared with AVL tree and red-black tree, each node contains more keywords, thus reducing the depth of the tree and reducing the number of DISK I/O. Especially in the B tree is applied to the database, the database to make full use of the principle of the disk blocks (disk data is stored in the form of block storage, the size of each block of 4 k, data reading, every time an IO the same disk blocks of data can be read at once) the node size limit and fully used in the disk size range; After increasing the number of node keywords in the tree, the tree level is less than the original binary tree, reducing the number and complexity of data search, so B tree is widely used in the file system and database

Disadvantages: B-tree lookups are unstable, at best at the root, at worst at the leaf. B tree traversal is troublesome, because it requires an intermediate traversal, so it also performs a certain amount of disk I/O.

The maximum length of a key cannot be changed unless the database is completely rebuilt. This causes many database systems to truncate people’s names to less than 70 characters.

To solve these problems, B+ trees emerged.

Application Scenarios:

  • Windows: HPFS file system
  • Mac: HFS, HFS+ file system
  • Linux: ResiserFS, XFS, Ext3FS, JFS file system
  • MongoDB

Compared with B+ tree, the advantage of B tree is that if the frequently accessed data is close to the root node, and the non-leaf node of B tree itself has the address of the keyword and its data, the data retrieval will be faster than that of B+ tree.

B + tree

An M order B+ tree has the following characteristics:

1. The middle node of a k subtree contains K elements (k-1 elements in a B tree). Each element does not store data, but is only used for indexing. All data is stored in the leaf node. 2. All leaf nodes contain information of all elements and Pointers to records containing these elements, and leaf nodes themselves are linked in large order according to the size of keywords. 3. All intermediate node elements exist at the same time in the child node, which is the largest (or smallest) element in the child node element.

Advantages of B+ trees:

  1. B+ trees have fewer levels: elements in non-leaf nodes do not hold data, so each layer can hold more elements, which is “chunkier” than a B-tree, meaning more elements per page on disk. This reduces the number of disk I/O searches
  2. B+ tree queries are more stable: each lookup must match a leaf node, so each lookup is stable. Since the middle node also carries data, b-tree only needs to match the element to be searched. In the best case, it ends the search at the root node, and in the worst case, it ends the search at the leaf node, which results in unstable search performance
  3. B+ tree has the natural sorting function: all the leaf node data of B+ tree forms an ordered linked list, which is more convenient for data query in the range, with high data tightness and higher cache hit ratio than B- tree.
  4. B+ tree traversal of all nodes is faster: A B+ tree traversal of the whole tree only requires traversal of all leaf nodes, rather than traversal of each layer like a B tree, which is conducive to the database to perform full table scan

Zhihu. gray