①- Binary tree, red-black tree, Hash, B+ tree

Indexes: Indexes are ordered data structures that help MySQL fetch efficiently

Index data structure:

  • Binary tree

  • Red and black tree

  • The Hash table

  • B-Tree

Binary Search Trees

The left node is smaller than the parent, and the right node is larger than the parent. His height determines the efficiency of the search.

Balanced binary search tree:

Anomalously tilted binary search tree:

If a column of data encounter like “tilted binary search tree”, then the binary tree index, in fact, degenerate into a “linked list”, query this column of data or the full table scan, lost the meaning of adding indexes.

Tree insertion is very likely to cause tilt, and different insertion sequences will lead to different tree heights, which directly affect the search efficiency of the tree. Unbalanced binary search trees are naturally less efficient.

2. Red-black Trees

It’s essentially a binary tree, but it’s called a binary balanced tree. Solve the possibility of binary tree lopsided. When a balanced tree is inserted or deleted, the left and right nodes of the tree are balanced by rotation.

HashMap and TreeSet in Java, implementation of HashMap in Java8 because use red-black tree instead of linked list (linked list length >8).

Red black tree rule definition:

1. Each node has a color, red or black

2. The root node is black

3. Two consecutive red nodes cannot exist between parent and child nodes

4. The number of black nodes traversed by any root node to its descendants must be the same

5. Empty nodes are considered black

Because of the limitations of the above rules, the self-equilibrium of red-black trees is guaranteed. The longest path from root to leaf in a red-black tree is no more than twice the shortest path.

3. A Hash index

A hash is a key-value data structure. The implementation is usually an array + list structure, using the hash function to calculate the key position in the array, and then using the list to resolve hash conflicts if they occur (zipper method). Of course, there are other ways to resolve hash conflicts. Hash data structure is very common, for example, our system uses HashMap to build hot data cache, which is very efficient in accessing.

The hash structure first calculates the hash value of the key to determine its position in the array. If there is a conflict, a linked list is created at the position of the array. There are obviously several problems with this:

Even if the positions calculated by keys with the same characteristics may be far apart, continuous query is inefficient. That is, range query is not supported.

A hash index stores computed hash values and row Pointers, not row values, so searching for data using a hash index requires two queries (first to find the position of the row and then to find the specific data).

The prerequisite for querying data with a hash index is to calculate the hash value, that is, the key must be a key that can accurately point to a data. Therefore, matching queries such as like are not supported.

So what we know is that hash indexes are good for picking a row of data quickly. A search for a range lookup cannot be satisfied by Hash.

4. B + tree

The MySQL index uses a tree structure, but it is not a binary tree. Because data in a database is ultimately stored on disk, moving between nodes can take a lot of time if there are too many nodes in the tree. In the implementation of MySQL, we choose to put more content on the same node, and transfer the operation of the same node to the memory, so as to reduce the number of transfer between nodes in the external memory, so as to achieve the purpose of improving efficiency. This is B+Tree, in the implementation of B+Tree a three-layer Tree structure can basically meet almost all of our needs.

3. Why doesn’t MySQL adopt red-black trees because they are so efficient

In practical application scenarios, MySQL table data is generally large and massive. If you use a red-black tree, it’s going to be extremely tall, and red-black trees are very efficient queries. But in the case of massive data, the height of the tree is not controllable. If we’re looking for data that happens to be on the leaf of the tree. That query will be very slow. Therefore, MySQL does not use red-black trees to organize indexes.

Finally recommend a visual data structure site:

www.cs.usfca.edu/~galles/vis…